strange shutdowns
therm
121 Posts
June 29, 2017, 1:04 pm
Hi,
This morning we had strange shutdowns of 2 of our 3 PetaSAN servers. At the time the hosts turned off, our network admin was plugging cables and changing some network settings (MTU, Spanning Tree, Flow Control). In the journal log it seems there was first a short outage of the backend links, and afterwards the system was shut down.
My question: Is there something in the cluster that could shut down the servers if there is a connection problem?
Regards,
Dennis
Last edited on June 29, 2017, 1:04 pm · #1
admin
2,930 Posts
June 29, 2017, 1:30 pm
Yes, we do simple software-based fencing. When a node is not able to connect to the cluster, it will clean up any resources it currently serves (IPs / iSCSI paths) so they can be served by other nodes, but the other nodes will also try to kill it before failing over these resources. This happens when the failed node exceeds its timeout for reporting its health-check heartbeat to the cluster (via Consul).
In the future we will allow more advanced hardware-based fencing, such as STONITH/IPMI.
Last edited on June 29, 2017, 1:32 pm · #2
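To make the fencing logic above concrete, here is a minimal sketch of heartbeat-timeout fencing. This is not PetaSAN's actual code or its real Consul integration; the class name, the 15-second timeout, and the node/resource names are all hypothetical, and the real implementation reports heartbeats through Consul health checks rather than an in-process dict.

```python
import time

# Hypothetical timeout value for illustration only, not PetaSAN's real setting.
HEARTBEAT_TIMEOUT = 15.0

class FencingMonitor:
    """Tracks node heartbeats and decides which nodes must be fenced
    before their resources (e.g. iSCSI path IPs) are reassigned."""

    def __init__(self, timeout=HEARTBEAT_TIMEOUT):
        self.timeout = timeout
        self.last_seen = {}   # node -> timestamp of last heartbeat
        self.resources = {}   # node -> list of resources it currently owns

    def heartbeat(self, node, now=None):
        """Record a health-check heartbeat from a node."""
        self.last_seen[node] = time.time() if now is None else now

    def assign(self, node, resource):
        """Record that a node currently serves a resource."""
        self.resources.setdefault(node, []).append(resource)

    def check(self, now=None):
        """Return (nodes_to_fence, resources_to_fail_over).

        Fencing happens first, then failover: the surviving nodes must
        agree the suspect node is down before taking over its paths.
        """
        now = time.time() if now is None else now
        dead = [n for n, t in self.last_seen.items() if now - t > self.timeout]
        failover = [r for n in dead for r in self.resources.pop(n, [])]
        for n in dead:
            del self.last_seen[n]
        return dead, failover
```

Example: if node2 stops heartbeating while node1 keeps reporting, only node2 is fenced and its iSCSI path is queued for failover.

```python
m = FencingMonitor(timeout=15.0)
m.heartbeat("node1", now=0.0)
m.heartbeat("node2", now=0.0)
m.assign("node2", "iscsi-path-1")
m.heartbeat("node1", now=10.0)      # node1 stays healthy; node2 goes silent
dead, failover = m.check(now=20.0)  # node2 exceeded its 15 s timeout
```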
therm
121 Posts
June 30, 2017, 5:39 am
After rebooting a node, it often happens that the other nodes shut this node down again. How can we prevent this?
Regards,
Dennis
admin
2,930 Posts
June 30, 2017, 9:39 am
Just wait a couple of minutes before starting the machine again. This is inherent to fencing: the other nodes cannot be 100% sure whether the suspected node is now OK or is still dying, so fencing will continue until all the other nodes agree on distributing the failed resources among them.