A minimum of a 3 Node Galera Cluster is Recommended, but is it better to have 5 nodes?



Right now I have a 400GB database on a 5 node Galera Cluster. All nodes run on RAID 10 SSDs.



I've read the following:



If the node goes missing due to a network problem or otherwise leaves
without telling the rest of the cluster, then problems can arise. For
the cluster to function, it needs a quorum, a majority of nodes active
in the cluster. The two other nodes will continue to function normally
since their partition has more than half of the known nodes but the
node that left will stop accepting queries when it realizes that it is
no longer in contact with the active partition. In this case, assuming
an application can access the two active nodes, the failure can go
mostly unnoticed.
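The majority rule in the quoted passage can be written as a one-line check. This is only a minimal sketch: `has_quorum` is a hypothetical helper, not a Galera API, and it assumes every node carries equal weight and that no arbitrator is involved.

```python
def has_quorum(partition_size: int, known_cluster_size: int) -> bool:
    """Return True if a partition holds strictly more than half of the known nodes.

    Assumption: all nodes have equal weight (hypothetical simplification).
    """
    return 2 * partition_size > known_cluster_size


# A 3-node cluster where one node is cut off by a network problem:
print(has_quorum(2, 3))  # the two nodes that still see each other
print(has_quorum(1, 3))  # the isolated node
```

Under these assumptions the two-node side keeps the majority and carries on, while the isolated node does not and stops accepting queries, which matches the behaviour the quote describes.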



I am trying to reduce my cost and make things more optimal. My cluster is handling a few thousand queries per minute. Is it safe to have a 3 node cluster?



What happens if 1 or 2 of the nodes go down, would there be a total outage of the database?



Is it recommended to have a 5 node cluster over a 3 node cluster?



Should I put them on RAID 1, RAID 0, or RAID 10? Which would suffice?








Thank you @jww I will take a note of this.
– Kevin
Jul 2 at 5:52




1 Answer



If the criterion for "safe" is "no single point of failure", then 3 suffices.



If "safe" means surviving "any two points of failure", then there is no solution. 5 only handles the case where two servers go down, not arbitrary combinations of things.
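To make the 3 versus 5 comparison concrete, here is a rough sketch of how many servers can go down at the same moment while the rest still form a majority. The function name is made up for illustration, and the arithmetic assumes equally weighted nodes, unplanned simultaneous failures, and no arbitrator; it says nothing about other kinds of failure such as a shared network or power outage.

```python
def max_simultaneous_failures(nodes: int) -> int:
    """Largest number of nodes that can fail at once while survivors keep a strict majority.

    Assumption: equal node weights, no arbitrator (hypothetical simplification).
    """
    return (nodes - 1) // 2


for n in (3, 5):
    print(f"{n} nodes: survives {max_simultaneous_failures(n)} simultaneous failure(s)")
```

With these assumptions a 3-node cluster rides out one server failure and a 5-node cluster rides out two; losing two of three at once would leave the last node without a majority.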



RAID 10 (or 1 or 5, but not 0) provides recovery from a single disk failure on a single machine. Since a 3+ node cluster can survive the failure of one entire disk subsystem, RAID is not required; it just gives you an extra level of comfort.



I do like RAID 10 with Battery Backed Write Cache -- this has the bonus of making writes virtually instantaneous.



Here is a situation that can happen with a 3-node cluster (N1, N2, N3). Let's say N1 dies. After you put a new (or repaired) N1 into the cluster, it will rebuild its data, using N2 as the 'donor'. That leaves only N3 at full functionality. (N2 will be somewhat busy sending data to N1.) The cluster is still alive, though "slowed".
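The rebuild scenario can be modelled with a small sketch. The state labels below ("joining", "donor", "synced") are illustrative names chosen for this example, not values read from a real server, and `fully_available` is a hypothetical helper.

```python
def fully_available(nodes: dict) -> list:
    """Return the names of nodes that are neither rebuilding nor acting as donor."""
    return sorted(name for name, state in nodes.items() if state == "synced")


# Illustrative snapshot while N1 is being rebuilt from N2:
during_rebuild = {"N1": "joining", "N2": "donor", "N3": "synced"}
after_rebuild = {"N1": "synced", "N2": "synced", "N3": "synced"}

print(fully_available(during_rebuild))
print(fully_available(after_rebuild))
```

In this model only N3 is at full capacity during the rebuild, so a 3-node cluster is briefly running on one unencumbered node; with 5 nodes, three would remain fully available in the same situation.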





Thank you for this answer; it answers my question perfectly. I was worried about the "quorum" being messed up if 1 node goes offline. Should I be worried about 2 nodes going down at the same time, and would a 5 node setup save me in that case? Or should a 3 node setup be solid too (I guess there would still be 1 node left online)?
– Kevin
Jul 2 at 6:31





Are the 3 nodes in the same building? If so, then natural disasters are perhaps your worst fear.
– Rick James
Jul 2 at 13:06





