Forums - PetaSAN

sizing for failover

khopkins
96 Posts

June 4, 2021, 9:08 pm
Hello,

We have a 3-node cluster

18 OSDs of 4 TB each, 6 OSDs per node

2048 active PGs

256 GB RAM in each node

Storage shows 34.89TB / 65.48TB (53.29%)

If we lost a node, would the system continue to function? We recently had an issue: when we lost a 4 TB drive and turned the node off to replace it, the system crashed. We got it back up, thanks to the PetaSAN folks, but we need to understand more about failover sizing. This could also affect rebooting nodes for upgrades.

Thanks,

#1

admin
2,930 Posts

June 6, 2021, 10:10 pm
One node going down will not cause your cluster to go down, unless something is not set up correctly.
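
One setting worth checking here is the pool's replica count (size) against its min_size. The following is only a rough sketch, not PetaSAN's own logic: it assumes a replicated pool whose CRUSH rule puts each replica on a different host, and the function name and parameters are made up for illustration. Ceph stops serving I/O for a placement group once fewer than min_size replicas are available.

```python
def survives_one_node_loss(size: int, min_size: int, hosts: int) -> bool:
    """Return True if client I/O should continue with one host down.

    Hypothetical helper. Assumes a replicated pool with a host-level
    failure domain, so one host failure removes at most one replica
    from any placement group.
    """
    # Replicas that can be placed at all (one per host, at most `size`),
    # minus the one lost with the failed host.
    replicas_left = min(size, hosts) - 1
    # A PG keeps serving I/O only while it has at least min_size replicas.
    return replicas_left >= min_size


print(survives_one_node_loss(size=3, min_size=2, hosts=3))  # True
print(survives_one_node_loss(size=2, min_size=1, hosts=3))  # True
print(survives_one_node_loss(size=2, min_size=2, hosts=3))  # False
```

The last case (size 2 with min_size 2) is one example of a setup where a single node going down would block I/O even though no data is lost.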

#2

Shiori
86 Posts

December 17, 2021, 2:14 pm
Normally, losing one node in a 3-node cluster will not cause issues unless you are using 2x replication. In that case the cluster starts recovering onto the remaining nodes, and the recovery load can make it appear unresponsive. By adding two more nodes and switching to 3x replication, you can withstand losing a node, and the burden on the cluster will be much smaller than on a three-node cluster. This improves with scale: as you add more nodes, the data is spread out, which reduces the amount that has to be re-replicated at one time.
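
To put rough numbers on the recovery case, here is a back-of-the-envelope sketch using the figures from the original post. It assumes equally sized nodes, that the reported usage is raw usage, and that all of it ends up on the surviving nodes after recovery (the 2x replication case). The function name is made up for illustration; the 0.85 and 0.95 thresholds are Ceph's default nearfull and full ratios.

```python
RAW_TOTAL_TB = 65.48    # raw capacity reported in the original post
RAW_USED_TB = 34.89     # raw usage reported in the original post
NODES = 3
NEARFULL_RATIO = 0.85   # Ceph default: health warning above this
FULL_RATIO = 0.95       # Ceph default: writes are blocked above this


def fill_after_node_loss(raw_used: float, raw_total: float, nodes: int) -> float:
    """Fill ratio of the surviving nodes once recovery has finished.

    Hypothetical helper; assumes evenly sized nodes and that every lost
    replica is re-created on the remaining nodes.
    """
    remaining_capacity = raw_total * (nodes - 1) / nodes
    return raw_used / remaining_capacity


ratio = fill_after_node_loss(RAW_USED_TB, RAW_TOTAL_TB, NODES)
print(f"Fill after losing one node: {ratio:.1%}")
print("Below nearfull:", ratio < NEARFULL_RATIO)
print("Below full:", ratio < FULL_RATIO)
```

On these figures the two surviving nodes would end up roughly 80% full, under the nearfull threshold but without much headroom. Data is rarely balanced perfectly, so individual OSDs can reach the thresholds earlier than the average suggests.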

There are also iSCSI issues: your targets may be on the node you lost, and they can get stuck instead of moving to another node. The best option is to make sure all three nodes are serving iSCSI targets and to set up MPIO on your servers so they handle node loss gracefully.

#3
