Dataproc master node configuration




I am wondering how powerful the master node (machine type) should be for Spark. I have seen people discuss worker nodes and executor cores/instances, but I couldn't find any advice on the master node. I am running my applications in cluster mode. Any advice?




1 Answer



It depends on the cluster size. The NameNode, which runs on the master node, keeps the directory tree of all files in the file system and tracks where across the cluster the file data is kept. It holds this metadata in memory.



So if you have a large cluster, you need a master with more memory.
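To make the memory argument concrete, here is a rough back-of-the-envelope sketch. It assumes the commonly quoted rule of thumb of about 1 GB of NameNode heap per million file-system objects (files plus blocks); that figure, the function name and the default values are illustrative assumptions, not numbers from this answer, so treat the output as an order-of-magnitude estimate only.

```python
# Assumption: roughly 1 GB of NameNode heap per million file-system objects
# (files + blocks). This is a rule of thumb, not a measured value.
GB_PER_MILLION_OBJECTS = 1.0


def estimate_namenode_heap_gb(num_files, avg_blocks_per_file=1.5, base_gb=1.0):
    """Return a rough NameNode heap estimate in GB (hypothetical helper)."""
    # Each file contributes one file object plus its blocks.
    objects = num_files * (1 + avg_blocks_per_file)
    return base_gb + GB_PER_MILLION_OBJECTS * objects / 1_000_000


if __name__ == "__main__":
    for files in (1_000_000, 50_000_000, 200_000_000):
        heap = estimate_namenode_heap_gb(files)
        print(f"{files:>12,} files -> ~{heap:.1f} GB heap")
```

The estimate grows with the amount of data tracked, not with the number of Spark jobs, which is why the master only needs to grow once the cluster and its file system get large.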



For example, if you have around 500 i3.8xlarge machines in a cluster, an i3.8xlarge box can serve as the master. However, if you have 1000 or more such boxes, you really need a memory-optimized R4 master node.



If you have a relatively small cluster, the master node's size doesn't really matter. If you run a Spark job in cluster mode, the Spark driver starts on one of the core (worker) nodes rather than on the master node, so as far as Spark is concerned the master node doesn't really matter. For managing a large cluster, however, the master node needs to be bigger.
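As an illustration of the cluster-mode point, the sketch below builds a `spark-submit` command with `--deploy-mode cluster`, so that the driver's resources are requested from YARN on a worker rather than taken from the master. The helper name, the application path `my_job.py` and the memory and executor values are hypothetical placeholders; adjust them to your own job.

```python
import shlex


def build_spark_submit(app_path, driver_memory="4g",
                       executor_memory="8g", num_executors=10):
    """Build a spark-submit argument list for YARN cluster mode (hypothetical helper)."""
    return [
        "spark-submit",
        "--master", "yarn",
        # In cluster mode the driver runs inside a YARN container on a worker.
        "--deploy-mode", "cluster",
        # Driver memory is therefore taken from a worker node, not the master.
        "--driver-memory", driver_memory,
        "--executor-memory", executor_memory,
        "--num-executors", str(num_executors),
        app_path,
    ]


cmd = build_spark_submit("my_job.py")
print(shlex.join(cmd))
```

With this setup, driver sizing is controlled by `--driver-memory` and the worker machine type, so the master machine type can be chosen on cluster-management grounds alone.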





