sizing for failover
khopkins
96 Posts
June 4, 2021, 9:08 pm
Hello,
We have a 3 node cluster
18 OSDs of 4 TB each, 6 OSDs per node
2048K active pgs
256k RAM in each node
Storage shows 34.89TB / 65.48TB (53.29%)
If we lost a node, would the system continue to function? We just had an issue where we lost a 4 TB drive, and when we turned the node off to replace it, the system crashed. We got it back up, thanks to the PetaSAN folks, but we need to understand more about failover sizing. This could also affect rebooting nodes for upgrades.
Thanks,
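As a rough sizing check, here is a minimal sketch that estimates how full the surviving nodes would get if a lost node's data were rebuilt onto them. It assumes equal-sized nodes and that the 34.89 TB / 65.48 TB figures are raw usage (all replicas counted). This only applies when the replica count is lower than the node count (for example 2x on 3 nodes); with 3x on 3 nodes and a host failure domain there is nowhere to rebuild to, so the data simply stays degraded. The function names are hypothetical, and the 0.85 / 0.95 thresholds are Ceph's default nearfull and full ratios, assumed unchanged here.

```python
# Sketch only: estimates raw utilization of the surviving nodes after one
# node fails and its replicas are rebuilt elsewhere. Assumes equal-sized
# nodes and that the dashboard figures are raw (all replicas) usage.

NEARFULL_RATIO = 0.85  # Ceph default nearfull ratio, assumed unchanged
FULL_RATIO = 0.95      # Ceph default full ratio, assumed unchanged


def utilization_after_node_loss(used_tb: float, raw_tb: float,
                                nodes: int, lost: int = 1) -> float:
    """Hypothetical helper: fraction of the surviving raw capacity in use
    once every lost replica has been re-created on the remaining nodes."""
    if not 0 < lost < nodes:
        raise ValueError("lost must be between 1 and nodes - 1")
    surviving_raw_tb = raw_tb * (nodes - lost) / nodes
    return used_tb / surviving_raw_tb


def status(ratio: float) -> str:
    """Classify a utilization ratio against the assumed Ceph thresholds."""
    if ratio >= FULL_RATIO:
        return "full (writes blocked)"
    if ratio >= NEARFULL_RATIO:
        return "nearfull (warning)"
    return "ok"


if __name__ == "__main__":
    # Figures from the post: 34.89 TB used of 65.48 TB, 3 nodes.
    ratio = utilization_after_node_loss(34.89, 65.48, nodes=3)
    print(f"After losing 1 of 3 nodes: {ratio:.1%} used -> {status(ratio)}")
```

With the posted numbers this comes out just under 80%, below the default nearfull warning, so raw capacity alone would not be the limiting factor. Treat it as an average only: CRUSH placement is not perfectly even, so individual OSDs can run fuller than the cluster-wide figure.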
admin
2,930 Posts
June 6, 2021, 10:10 pm
1 node going down will not cause your cluster to go down, unless something is not set up correctly.
Shiori
86 Posts
December 17, 2021, 2:14 pm
Normally, losing one node in a 3-node cluster will not cause issues unless you are using 2x replication: the cluster will then start recovering onto the remaining nodes, which can make it appear unresponsive. By adding two more nodes and switching to 3x replication, you can withstand losing a node, and the load on the cluster will be much smaller than on a three-node cluster. This improves further with scale: as you add more nodes, the data is spread out more, reducing the amount that has to be replicated at one time.
There are also iSCSI issues: your targets may be on the node you lost, and they can get stuck instead of moving to another node. The best option is to make sure all three nodes serve iSCSI targets and to set up MPIO on your servers so they handle node loss gracefully.
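To put rough numbers on the point that more nodes reduce the recovery burden, the sketch below estimates how much raw data each surviving node must absorb after one node fails. It assumes equal-sized nodes, evenly spread data, and a CRUSH rule with a host failure domain. It also assumes, purely for illustration, that the 34.89 TB raw figure from the original post is 2x-replicated data; the thread does not say which replica count is in use. The function name is hypothetical.

```python
# Sketch only: raw TB each surviving node must take on after one node fails.
# Assumes equal-sized nodes, even data distribution, and a host failure domain.


def rebuild_per_survivor_tb(logical_tb: float, replicas: int, nodes: int) -> float:
    """Hypothetical helper. If fewer than `replicas` nodes survive, the
    missing copies cannot be placed on distinct hosts, so nothing is
    rebuilt and the placement groups stay degraded (returns 0.0)."""
    survivors = nodes - 1
    if survivors < replicas:
        return 0.0
    raw_used_tb = logical_tb * replicas
    lost_on_node_tb = raw_used_tb / nodes  # replicas that lived on the failed node
    return lost_on_node_tb / survivors     # spread evenly across the survivors


if __name__ == "__main__":
    logical = 34.89 / 2  # hypothetical: treats the posted raw usage as 2x data
    for replicas, nodes in ((2, 3), (3, 3), (3, 5)):
        tb = rebuild_per_survivor_tb(logical, replicas, nodes)
        print(f"{replicas}x on {nodes} nodes: {tb:.2f} TB rebuilt onto each survivor")
```

Under these assumptions, 2x on 3 nodes pushes roughly 5.8 TB onto each survivor, while 3x on 5 nodes needs only about 2.6 TB per survivor even though the total raw usage is higher. 3x on 3 nodes rebuilds nothing until the failed node returns. This models data volume only, not recovery speed or client load.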