Automatic shutdown of nodes
Vector
8 Posts
May 4, 2020, 9:08 am
I'm testing a 3-node cluster: two physical nodes and one virtual node. I observed a spontaneous shutdown of a physical node. What could be the cause, and where should I start digging? One node (a Dell R530) shows an ACPI error while booting; the other boots normally. Both, however, shut down periodically.
admin
2,930 Posts
May 4, 2020, 12:04 pm
This is probably fencing in action: a node is shut down by the other nodes if it does not respond to cluster heartbeats in time. This is done so they can take over its resources (IPs, disk I/O) while making sure the original node is no longer doing any I/O itself.
You can turn off fencing from the maintenance tab to stop this, but it is not recommended.
You should find the root cause, i.e. why the node is not responding to heartbeats. It could be connection errors on the backend interface, hardware errors, or too much load relative to your hardware.
Last edited on May 4, 2020, 12:05 pm by admin · #2
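To make the fencing behaviour described above concrete, here is a toy Python model of heartbeat-based failure detection. It is a sketch, not the cluster software's actual implementation; the `HeartbeatMonitor` class, the 10-second timeout and the node names are hypothetical. Each node's last heartbeat time is recorded, and any node silent for longer than the timeout becomes a fencing candidate.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Toy heartbeat-based failure detector (hypothetical, for illustration)."""
    timeout: float  # seconds a node may stay silent before it is presumed dead
    last_seen: dict[str, float] = field(default_factory=dict)

    def record_heartbeat(self, node: str, now: float) -> None:
        """Remember the time of the most recent heartbeat from `node`."""
        self.last_seen[node] = now

    def nodes_to_fence(self, now: float) -> list[str]:
        """Nodes silent for longer than `timeout`; peers would fence these
        before taking over their IPs and disk I/O."""
        return sorted(node for node, t in self.last_seen.items()
                      if now - t > self.timeout)


if __name__ == "__main__":
    mon = HeartbeatMonitor(timeout=10.0)
    for node in ("node1", "node2", "node3"):
        mon.record_heartbeat(node, now=0.0)
    # node1 and node2 keep reporting; node3's backend link hangs.
    mon.record_heartbeat("node1", now=12.0)
    mon.record_heartbeat("node2", now=12.0)
    print(mon.nodes_to_fence(now=15.0))  # ['node3']
```

The decision is based only on silence: the monitor cannot tell a crashed node from one whose backend NIC is hung or overloaded, so both get fenced. A real cluster also needs quorum so that an isolated node cannot fence the healthy majority; this sketch leaves that out.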
Ste
125 Posts
May 4, 2020, 12:47 pm
Hi, I had a similar issue in the past, and it was caused by the backend network interface hanging from time to time, due to an IRQ conflict with the SATA controller.
Last edited on May 4, 2020, 12:47 pm by Ste · #3
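If you suspect an IRQ conflict like the one described above, a first check on Linux is whether the backend NIC and the SATA controller appear on the same line of /proc/interrupts. The sketch below is a heuristic parser, assuming the row layout of recent kernels (per-CPU counters, then a chip column and an hwirq/trigger column, then comma-separated device names). The name patterns `^(eth|en)` and `ahci` are assumptions; adjust them to your interface and driver names. Devices using MSI/MSI-X normally get their own vectors, so sharing mostly shows up with legacy pin-based interrupts.

```python
import os
import re

NIC_PATTERN = r"^(eth|en)"   # assumed backend NIC name prefix
STORAGE_PATTERN = r"ahci"    # assumed SATA controller driver name


def parse_interrupts(text: str) -> dict[str, list[str]]:
    """Map each numeric IRQ in /proc/interrupts-style text to its device names."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    result: dict[str, list[str]] = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        label = label.strip()
        if not label.isdigit():
            continue  # skip summary rows such as NMI, LOC, ERR
        # After the per-CPU counters come the chip name, the hwirq/trigger
        # column, and then the device names (assumed recent-kernel layout).
        tail = rest.split()[ncpu:]
        if len(tail) < 3:
            continue  # no device registered on this line
        # Handlers sharing a line are joined with ", "; the first name is the
        # last word of the first chunk (after the chip and trigger columns).
        chunks = " ".join(tail).split(", ")
        result[label] = [chunks[0].split()[-1]] + [c.strip() for c in chunks[1:]]
    return result


def shared_irqs(irqs: dict[str, list[str]], a_pattern: str, b_pattern: str) -> list[str]:
    """Return IRQ numbers whose device list matches both regex patterns."""
    a, b = re.compile(a_pattern), re.compile(b_pattern)
    return sorted(
        (irq for irq, devs in irqs.items()
         if any(a.search(d) for d in devs) and any(b.search(d) for d in devs)),
        key=int,
    )


if __name__ == "__main__":
    path = "/proc/interrupts"
    if os.path.exists(path):
        with open(path) as f:
            irqs = parse_interrupts(f.read())
        for irq in shared_irqs(irqs, NIC_PATTERN, STORAGE_PATTERN):
            print(f"IRQ {irq} is shared: {', '.join(irqs[irq])}")
    else:
        print(f"{path} not found; this check only works on Linux")
```

A shared line is not proof of a fault (in this thread the culprit eventually turned out to be a faulty motherboard), but it tells you whether moving the card to another slot is worth trying.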
Vector
8 Posts
May 6, 2020, 1:30 pm
Quote from Ste on May 4, 2020, 12:47 pm
Hi, I had a similar issue in the past, and it was caused by the backend network interface hanging from time to time, due to an IRQ conflict with the SATA controller.
Hi. It seems to be the same for me. How did you resolve the conflict? Did you move your network card to another slot? I've looked through the BIOS but haven't found any IRQ settings yet.
Ste
125 Posts
June 2, 2020, 10:42 am
Hi, sorry for the late reply. It turned out to be a faulty motherboard, so we simply replaced it.
Bye.