Halting an OSD VM leads to iSCSI disconnection
jmlefevre
3 Posts
December 21, 2018, 4:42 pm
Hi,
We are testing PetaSAN as an iSCSI provider for a Microsoft failover cluster.
We enabled multipath on the Windows servers.
We configured PetaSAN as follows:
3 OSD nodes on VMware ESX
Management subnet and iSCSI subnet are the same (10.2.8.0/24)
Backend subnets are on another VLAN (10.2.19.0/24 and 10.2.20.0/24)
iSCSI auto IP assignment is from 10.2.8.100 to 10.2.8.110
3 paths are configured.
We tested several failure cases, in particular:
* Reboot of a node from the SSH console => This works well, but regularly on reboot the node starts and then, after a few seconds, shuts down again. (This is my issue #1)
* Halt of a node from the VMware console => The iSCSI disk on Windows vanishes after a 30 s freeze. (This is my issue #2)
Could you kindly help us with these 2 issues?
Best regards,
admin
2,930 Posts
December 21, 2018, 5:37 pm
- This will happen if a node that had iSCSI resources goes down and comes back up too quickly: it is being killed due to fencing. You can turn off fencing from the Maintenance page (not recommended) and the node will not get killed. Fencing is done by the other nodes: since they have no idea whether the node is now good or is dying, they kill it so they can safely take over its paths. If you wait a minute or so, so that all iSCSI paths that were on the node have already been distributed to the other nodes, it will not be fenced.
- If you mean you shut down the node, then it should work without issue; we test this all the time, and it should be very similar to your case 1 of a reboot. Apart from checking your setup, the only thing I can think of is that you have low resources such as RAM: when a node goes down the other nodes will do some recovery, so it will stress them more, and if you have very low RAM you may bring them to a near halt.
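If you want to sanity-check the resource angle while a node is down, a rough check (assuming shell access on the surviving PetaSAN nodes; these are plain Linux/Ceph commands, nothing PetaSAN-specific) is:
    ceph -s      # cluster status; shows whether recovery/backfill is still in progress
    free -h      # memory and swap usage; heavy swapping during recovery suggests too little RAM
    vmstat 5     # CPU, memory and IO pressure, sampled every 5 seconds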
Last edited on December 21, 2018, 5:38 pm by admin · #2
jmlefevre
3 Posts
December 24, 2018, 9:38 am
Hi,
Thanks a lot for such a quick reply.
For issue #1, thanks for the explanation. Indeed, it is not an issue but an interesting feature.
For issue #2, we will run some tests today, but we are not confident it comes from resources like RAM, as we set 8 GB of RAM for a 3-node cluster with 3 x 150 GB storage.
We will get back to you today.
jmlefevre
3 Posts
December 24, 2018, 12:34 pm
Hi again,
It seems the disconnection in issue #2 is due to a failover cluster misconfiguration (not yet identified).
Nevertheless, we can see that when a node of the PetaSAN cluster is rebooted or halted, there is a freeze of 10 to 30 seconds. Is there any way to reduce that freeze time?
Best regards,
admin
2,930 Posts
December 24, 2018, 4:56 pm
The path switching by PetaSAN itself takes less than 1 second; you can test this by doing a manual path assignment (from the Path Assignment page), and your IO will not feel any freeze. The 10-30 seconds you see are due to the default iSCSI timeout configuration of VMware/Microsoft and, for some types of errors, the configuration in Ceph; we could get faster failover if we changed these values from their defaults.
The config values are:
Microsoft iSCSI failover timeouts: LinkDownTime, TcpDataRetransmissions, TcpInitialRtt
VMware ESXi failover timeouts: RecoveryTimeout, NoopTimeout, NoopInterval
Ceph OSD failure detection: osd_heartbeat_interval, osd_heartbeat_grace
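For illustration only (and keeping the warning below in mind), the Ceph pair would normally be set in ceph.conf or injected at runtime; the values shown are, to our knowledge, the stock Ceph defaults and not a recommendation, and the Windows registry and ESXi equivalents are documented by Microsoft and VMware:
    # ceph.conf on each node -- example only, these are the stock defaults
    [osd]
    osd heartbeat interval = 6    # seconds between OSD-to-OSD heartbeats
    osd heartbeat grace = 20      # seconds without heartbeats before an OSD is reported down

    # or injected at runtime without restarting the OSDs:
    ceph tell osd.* injectargs '--osd_heartbeat_grace 20'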
Having said this, we strongly recommend against changing the default values; they were chosen for a reason. Lowering them may start triggering false failovers when the system is overloaded and will cause many more problems; a 10-30 second IO freeze is common for SAN failures.
Re the other issue: apart from checking your setup, 8 GB is very low; please check our hardware guide. Another thing is that during failover testing, make sure that Ceph has had time to recover from any previous node failure simulation before triggering a new one; try to make sure the dashboard shows an OK status before shutting down a node.
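As a quick command-line equivalent of checking that the dashboard shows an OK status (run from any cluster node's shell):
    ceph health     # should report HEALTH_OK before the next failure test
    ceph -s         # full status; wait until no PGs are degraded, recovering or backfilling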
Last edited on December 24, 2018, 5:00 pm by admin · #5