Meaning of disabling Fencing
therm
121 Posts
March 4, 2023, 10:20 am
Hi @all,
could you please clarify what exactly happens when fencing is disabled? I am asking because nearly all the problems we have had with PetaSAN were due to fencing shutting down nodes.
Does disabling fencing:
- only disable the reboot of another iSCSI node?
- or does it rather prevent a node from taking over the IPs of other nodes when there is a communication error with Consul?
What I would like to have is a simple setup in which we deploy 4 IPs evenly in two iSCSI networks, and if one iSCSI node fails, so what? More than enough IPs would still be there to handle IO. But that would require the nodes neither to reboot other servers nor to take over the IPs of other iSCSI servers.
Is that possible?
Thanks in advance,
Dennis
Last edited on March 4, 2023, 11:25 am by therm · #1
admin
2,930 Posts
March 6, 2023, 7:21 pm
Fencing means killing a non-responsive node that still holds resources, so that the rest of the cluster can take over those resources. See
https://en.wikipedia.org/wiki/STONITH
If a node does not respond to cluster heartbeats in time, it is considered out of the cluster. The rest of the cluster can treat this node as failed/down/dead. But what if that node is "half dead"? There is a small chance that it still has access to the target storage and could be writing old or corrupt data. The idea of fencing is to kill this node and then take over whatever IP paths/resources it has. Having a half-dead, stray, out-of-cluster node that talks to your storage could be dangerous. Another open-source project that implements fencing is Pacemaker.
Not having fencing is not a high risk by itself. However, I would investigate why the node does not respond to cluster heartbeats in time. In this case we use the Consul framework, which has heartbeats every 15 sec. It could be due to issues with load or network saturation, or it could be hardware.
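To make the "half dead" risk concrete, here is a minimal generic sketch in Python. It is not PetaSAN or Consul code: the names `Node`, `is_unresponsive` and `take_over`, the example IPs, and the use of 15 seconds as a timeout are all assumptions made for illustration. In particular, it assumes that the IP takeover still happens when fencing is off, which is the general pattern and not a confirmed statement about PetaSAN.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    last_heartbeat: float                 # time the last heartbeat was seen
    ips: list = field(default_factory=list)
    powered_on: bool = True


def is_unresponsive(node, now, timeout_sec):
    """A node that missed its heartbeat window is treated as out of the cluster."""
    return now - node.last_heartbeat > timeout_sec


def take_over(failed, survivor, fencing_enabled):
    """Move the failed node's IPs to the survivor.

    Returns True if the failed node may still be alive and writing to
    storage after the takeover (the "half dead" case).
    """
    if fencing_enabled:
        # Hypothetical stand-in for a real power-off (STONITH).
        failed.powered_on = False
    survivor.ips.extend(failed.ips)
    if not failed.powered_on:
        failed.ips = []                   # a dead node no longer serves its IPs
    return failed.powered_on


# Usage: node-a stopped sending heartbeats, node-b is healthy.
a = Node("node-a", last_heartbeat=0.0, ips=["10.0.1.11"])
b = Node("node-b", last_heartbeat=100.0, ips=["10.0.1.12"])

if is_unresponsive(a, now=100.0, timeout_sec=15.0):
    stale_writer_possible = take_over(a, b, fencing_enabled=False)
```

With `fencing_enabled=False` in this sketch, `node-b` serves both IPs while `node-a` is never powered off, so both nodes could be talking to the same storage. With `fencing_enabled=True` the failed node is killed first and that overlap cannot occur.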