sizing for failover

Hello,

We have a 3 node cluster

18 OSDs of 4 TB each, 6 OSDs per node

2048K active pgs

256k RAM in each node

Storage shows 34.89TB / 65.48TB (53.29%)

If we lost a node, would the system continue to function? We just had an issue: when we lost a 4 TB drive and turned the node off to replace it, the system crashed. We got it back up, thanks to the PetaSAN folks, but we need to understand more about failover sizing. This could also affect rebooting nodes for upgrades.

Thanks,

 

One node going down will not cause your cluster to go down, unless something is not set up correctly.

Normally, losing one node in a 3-node cluster will not cause issues unless you are using 2x replication. In that case the cluster starts recovering onto the remaining nodes, and the recovery load can make it act as if it is unresponsive. By adding two more nodes and switching to 3x replication, you can withstand losing a node, and the burden on the cluster will be a lot smaller than on a 3-node cluster. This improves with scale: as you add more nodes, the data is spread out further, which reduces the amount that has to be re-replicated at one time.
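
To put rough numbers on this, here is a minimal sketch of the arithmetic, using the 34.89 TB / 65.48 TB figures from the original post. It assumes all nodes have equal capacity, that data is spread evenly, and that everything on the lost node is re-replicated onto the survivors (the 2x replication case described above). The function name `node_loss_estimate` and the 5-node comparison (same data, same per-node capacity) are illustrative assumptions, not PetaSAN output.

```python
def node_loss_estimate(used_tb, total_tb, nodes):
    """Estimate the effect of losing one node in a cluster of equal-sized nodes.

    Assumes data is spread evenly and that everything on the lost node
    is re-replicated onto the surviving nodes.
    """
    if nodes < 2:
        raise ValueError("need at least two nodes to survive a node loss")
    remaining_tb = total_tb * (nodes - 1) / nodes  # raw capacity left on survivors
    lost_data_tb = used_tb / nodes                 # data that lived on the lost node
    return {
        # how full the survivors would be once recovery completes
        "fill_ratio": used_tb / remaining_tb,
        # how much data each survivor has to absorb during recovery
        "recovery_per_survivor_tb": lost_data_tb / (nodes - 1),
    }


# Figures from the original post: 3 nodes, 34.89 TB used of 65.48 TB raw.
three = node_loss_estimate(34.89, 65.48, nodes=3)

# Hypothetical comparison: the same data on 5 nodes of the same size.
five = node_loss_estimate(34.89, 65.48 * 5 / 3, nodes=5)

for label, result in (("3 nodes", three), ("5 nodes", five)):
    print(
        f"{label}: survivors {result['fill_ratio']:.1%} full, "
        f"{result['recovery_per_survivor_tb']:.2f} TB recovered per survivor"
    )
```

Under these assumptions, the two surviving nodes of the 3-node cluster would end up about 80% full and each would have to absorb close to 6 TB, while in the 5-node case each survivor absorbs under 2 TB. Whether the cluster stays usable during that recovery also depends on the pool's replication settings and the configured full thresholds, which this sketch does not model.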

There are also iSCSI issues: your targets may be on the node you lost, and they can get stuck instead of moving to another node. The best option is to make sure all three nodes are serving iSCSI targets and to set up MPIO on your servers so that node loss is handled gracefully.