Question regarding Failover 3 Node System Backend Network
exitsys
43 Posts
September 25, 2020, 2:02 pm
Hello everyone, we are testing a 3-node cluster, and the question has come up of how best to implement failover for the backend network.
As I said: 3 nodes, 2 switches.
With 3 nodes, 2 nodes must be connected to one switch and the third to the other.
But if the switch where two of the three nodes are connected fails, everything goes down immediately.
I have 3 dual-port 10G network cards in each node.
Can't I configure 2 backend IP addresses, one on a port of each of 2 dual cards? Or what is the solution here?
And another question: if I remove the cable from one node (backend interface) and plug it into the other switch, the node shuts down completely. Is this normal?
Thanks for your support.
Andreas
Last edited on September 25, 2020, 2:33 pm by exitsys · #1
admin
2,930 Posts
September 25, 2020, 4:59 pm
You should create a bond on your backend network.
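For illustration only, here is a minimal sketch of what such a bond looks like on a stock Linux box with iproute2, using an active-backup bond across one port from each of two dual cards (the interface names and IP follow the hardware listing later in this thread; the /24 mask is an assumption, and in PetaSAN you would normally let the deployment wizard build the bond rather than do this by hand):
# create an active-backup bond that fails over between two ports
ip link add bond0 type bond mode active-backup miimon 100
# slave ports must be down before they can be enslaved
ip link set eth4 down
ip link set eth4 master bond0
ip link set eth7 down
ip link set eth7 master bond0
ip link set bond0 up
ip addr add 192.168.180.201/24 dev bond0
With eth4 cabled to one switch and eth7 to the other, either switch can fail and the backend IP stays reachable.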
exitsys
43 Posts
September 26, 2020, 10:10 am
Thanks, and can you answer my second question, please? (If I remove the cable from one node (backend interface) and plug it into the other switch, the node shuts down completely. Is this normal?)
admin
2,930 Posts
September 26, 2020, 1:06 pm
This is due to fencing. If a node does not respond to cluster heartbeats, it is killed by the other nodes before they take over its resources, such as IP addresses and access to storage. You can switch fencing off in the maintenance tab, but it is recommended to leave it on.
If the interface were bonded, the connection would not be lost and this would not happen.
Last edited on September 26, 2020, 1:09 pm by admin · #4
exitsys
43 Posts
September 27, 2020, 3:21 pm
If I install the first node and combine 2 10G ports from 2 different cards into one bond, the node is no longer accessible via the management interface in step 7. I tried to create a balance-alb bond; the same happens with active-backup.
What could be the reason?
HP DL360 Gen9:
Interface | Role | IP | NIC
eth0 | Management | 172.16.2.201 | NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
eth1 | xx:xx:xx:xx:xx:xx | -- | NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
eth2 | xx:xx:xx:xx:xx:xx | -- | NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
eth3 | xx:xx:xx:xx:xx:xx | -- | NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
eth4 | Backend Bond Primary | 192.168.180.201 | First NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)
eth5 | iscsi-01 | -- | First NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)
eth6 | iscsi02 | -- | Second NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)
eth7 | Backend Bond Secondary | -- | Second NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)
eth8 | xx:xx:xx:xx:xx:xx | -- | Third NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)
eth9 | xx:xx:xx:xx:xx:xx | -- | Third NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)
I think it's the same issue as here: https://www.petasan.org/forums/?view=thread&id=676
petasan.log:
File "/usr/lib/python2.7/dist-packages/PetaSAN/backend/cluster/deploy.py", line 68, in get_node_status
File "/usr/lib/python2.7/dist-packages/PetaSAN/core/cluster/configuration.py", line 99, in get_node_info
and
File "/usr/lib/python2.7/dist-packages/PetaSAN/backend/cluster/deploy.py", line 1222, in apply_node_network
raise Exeption("Error could not start backend network.")
Last edited on September 27, 2020, 3:47 pm by exitsys · #5
admin
2,930 Posts
September 27, 2020, 5:56 pm
Can you clarify whether you are talking about the same issue or a new one? You are now referring to the build step, correct?
exitsys
43 Posts
September 27, 2020, 6:18 pm
The recommendation was that I could implement the failover via a bond. So I started to reinstall the cluster. But when I configure a bond, I run into the problem mentioned above. This problem is exactly the same as in the link I posted.
admin
2,930 Posts
September 27, 2020, 7:38 pm
You should set up the backend network bond using LACP.
You could also bond the management network; in that case it may be easier to make it an active/backup bond, as that works out of the box without any switch settings. You could also use LACP on the management bond: many switch models support having LACP set on the switch but not yet on the host while you deploy, and LACP should then start once both ends negotiate it. Try to set the switch port to passive rather than active. If your switch cannot support this, you can add the bond manually after building the cluster by adding it to the /opt/petasan/config/cluster_info.json config file.
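As a sketch of the host side, an LACP (802.3ad) bond built with iproute2 looks like this (the interface names are assumptions taken from the listing above; the two switch ports must be members of a matching LACP group, e.g. "channel-group N mode passive" on IOS-style switches, before the links will aggregate):
# LACP bond; both ends must negotiate 802.3ad for the links to carry traffic
ip link add bond0 type bond mode 802.3ad miimon 100 lacp_rate fast
ip link set eth4 down
ip link set eth4 master bond0
ip link set eth7 down
ip link set eth7 master bond0
ip link set bond0 up
Unlike active-backup, this mode needs switch cooperation, which is why active/backup is the safer choice while deploying.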
Last edited on September 27, 2020, 7:41 pm by admin · #8
exitsys
43 Posts
September 28, 2020, 6:04 am
But losing the connection to the management IP should not happen during configuration. The complete machine is no longer accessible in step 7 of the wizard. Something can't be right, can it?
I also want to ensure that the connection on the backend network is not lost if a switch fails. To my knowledge, LACP only works with a single switch.
Last edited on September 28, 2020, 6:08 am by exitsys · #9
admin
2,930 Posts
September 28, 2020, 7:08 am
Can you double-check that you configured your switch ports correctly for LACP?
Does the host become inaccessible after step 7 just for a short time during deployment, or permanently? For example, can you ping it or SSH to it?
If you have issues, I suggest you use an active/backup bond. You can change it to LACP later by editing the config file.
LACP can be done across 2 different switches to support HA among the switches, but that is a feature (often called MLAG or MC-LAG) that not all switches support, so you need to check your model.
To ensure the backend connection is not lost, you should test unplugging a cable from the bond in a running cluster, as you were doing earlier.
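For reference, a quick way to watch the bond during that unplug test with standard Linux bonding (assuming the bond is named bond0, and 192.168.180.202 stands in for another node's backend IP):
# show bonding mode, MII status, and the currently active slave
cat /proc/net/bonding/bond0
# watch the failover live while pulling a cable
watch -n1 'grep -E "Currently Active Slave|MII Status|Slave Interface" /proc/net/bonding/bond0'
# a continuous ping to another node confirms the backend stays up
ping 192.168.180.202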