Dataproc master node configuration

How powerful should the master node be for Spark (in terms of machine type)? I have seen people discuss worker nodes and executor cores/instances, but I couldn't find any advice for the master node. I am running my applications in cluster mode. Any advice?

1 Answer

It depends on the cluster size. The master node runs the HDFS NameNode, which keeps the directory tree of all files in the file system in memory and tracks where across the cluster the file data is kept.

So if you have a large cluster you need to use a master with more memory.
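
To make the memory argument concrete, here is a rough back-of-the-envelope sketch. It is not from the original answer: the 150 bytes per namespace object figure is a commonly quoted rule of thumb, and `blocks_per_file`, `overhead_factor`, and the function name are assumptions chosen purely for illustration. Larger clusters tend to hold more files and blocks, which is why the NameNode heap grows with them.

```python
# Rough NameNode heap estimate. Every constant here is an assumption,
# not a measured value; use real file and block counts from your cluster.
BYTES_PER_OBJECT = 150  # assumed rule-of-thumb heap cost per file or block


def estimate_namenode_heap_gib(num_files, blocks_per_file=1.5, overhead_factor=2.0):
    """Return an approximate heap size in GiB needed to hold the HDFS namespace.

    num_files:        number of files stored in HDFS
    blocks_per_file:  assumed average number of blocks per file
    overhead_factor:  assumed safety margin for GC and other JVM overhead
    """
    objects = num_files + num_files * blocks_per_file  # file objects + block objects
    raw_bytes = objects * BYTES_PER_OBJECT
    return raw_bytes * overhead_factor / 1024**3


if __name__ == "__main__":
    for n in (1_000_000, 100_000_000, 1_000_000_000):
        print(f"{n:>13,} files -> ~{estimate_namenode_heap_gib(n):.1f} GiB heap")
```

The estimate grows linearly with the number of files, so it is mainly useful for checking whether a planned master machine type has comfortable headroom, not for exact sizing.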

For example, if you have around 500 i3.8xlarge machines in a cluster, an i3.8xlarge box can serve as the master. However, if you have 1000+ such boxes, you really need a memory-optimized R4 master node.

If you have a relatively small cluster, the master node's size doesn't really matter. If you run a Spark job in cluster mode, the Spark driver starts on one of the core nodes rather than on the master node, so as far as Spark is concerned the master node doesn't matter much. For managing a large cluster, however, the master node needs to be bigger.
