
Question regarding Failover 3 Node System Backend Network


Hello everyone, we are testing a 3-node cluster, and the question has come up of how to implement failover for the backend network.

As I said: 3 nodes and 2 switches. With 3 nodes, two must be connected to one switch and the third to the other. But if the switch where two of the three nodes are connected fails, everything is down immediately.

I have 3 dual-port 10G network cards in each node.
Can't I configure 2 backend IP addresses, one on a port of each of two dual cards? Or what is the solution here?

And another question: if I remove the cable from one node (backend interface) and plug it into the other switch, the node shuts down completely. Is this normal?

Thanks for your support.

Andreas

You should create a bond on your backend network
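
For reference, at the Linux level an active/backup bond over two 10G ports looks roughly like the sketch below. This is a minimal illustration only; the PetaSAN deployment wizard normally creates the bond for you, and the interface names and IP address are placeholders, not values from your setup.

    modprobe bonding                                            # make sure the bonding driver is loaded
    ip link add bond0 type bond mode active-backup miimon 100   # one active port, one standby, link checked every 100 ms
    ip link set eth4 down                                       # ports must be down before they can be enslaved
    ip link set eth4 master bond0
    ip link set eth7 down
    ip link set eth7 master bond0
    ip link set bond0 up
    ip addr add 192.168.180.201/24 dev bond0                    # the single backend IP lives on the bond

With one port cabled to each switch, the bond fails over to the standby port when the active switch dies, so you do not need a second backend IP per node.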

Thanks, and can you please also answer my second question? (If I remove the cable from one node (backend interface) and plug it into the other switch, the node shuts down completely. Is this normal?)

This is due to fencing. If a node does not respond to cluster heartbeats, it is killed by the other nodes before they take over resources such as IP addresses and access to storage. You can switch fencing off in the maintenance tab, but it is recommended to leave it on.

If the interface is bonded, the connection will not be lost and this will not happen.
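
As a side note: to the best of my knowledge PetaSAN runs its cluster heartbeats over Consul on the backend network, so while you pull cables you can watch node health from another node's console. This assumes the consul binary is reachable on the node (adjust the path for your install):

    consul members    # lists cluster nodes with their alive/failed status

A node whose backend link drops should show up as failed here shortly before fencing kicks in.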

If I install the first node and combine two 10G ports from two different cards into one bond, the node is no longer accessible via the management interface at step 7. I tried creating a balance-alb bond; the same thing happens with active-backup.

What could be the reason?
HP DL360 Gen9

eth0 - Management, 172.16.2.201 - NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
eth1 - xx:xx:xx:xx:xx:xx - NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
eth2 - xx:xx:xx:xx:xx:xx - NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
eth3 - xx:xx:xx:xx:xx:xx - NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
eth4 - Backend bond primary, 192.168.180.201 - first NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)
eth5 - iscsi-01 - first NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)
eth6 - iscsi-02 - second NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)
eth7 - Backend bond secondary - second NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)
eth8 - xx:xx:xx:xx:xx:xx - third NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)
eth9 - xx:xx:xx:xx:xx:xx - third NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)
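
In case it helps with debugging: when the wizard hangs at step 7, you can check from the node's local console whether the bond actually came up (assuming the wizard named it bond0):

    cat /proc/net/bonding/bond0    # shows the bonding mode, MII status and the currently active slave

If the MII status is down for both slaves, or the file does not exist at all, the bond was never brought up, which would match losing the management connection at that step.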

I think it's the same issue as here: https://www.petasan.org/forums/?view=thread&id=676

From petasan.log:

  File "/usr/lib/python2.7/dist-packages/PetaSAN/backend/cluster/deploy.py", line 68, in get_node_status
  File "/usr/lib/python2.7/dist-packages/PetaSAN/core/cluster/configuration.py", line 99, in get_node_info

and

  File "/usr/lib/python2.7/dist-packages/PetaSAN/backend/cluster/deploy.py", line 1222, in apply_node_network
    raise Exception("Error could not start backend network.")

Can you clarify whether you are talking about the same issue or a new one? You are now referring to the build step, correct?

The recommendation was that I could implement the failover via a bond, so I started to reinstall the cluster. But when I configure a bond, I run into the problem described above. It is exactly the same as in the link I posted.

You should set up the backend network bond using LACP.
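
At the Linux level an LACP bond corresponds to bonding mode 802.3ad; a minimal sketch follows (interface names are placeholders, and the switch ports must be configured as a matching LACP port-channel before the bond will come up):

    ip link add bond0 type bond mode 802.3ad miimon 100 lacp_rate fast   # LACP, negotiate quickly with the switch
    ip link set eth4 down && ip link set eth4 master bond0
    ip link set eth7 down && ip link set eth7 master bond0
    ip link set bond0 up

Unlike active/backup, both links carry traffic, but both ends have to agree on the configuration.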

You could also bond the management network; in that case it may be easier to bond it as active/backup, since that works out of the box without any switch settings. You can also use LACP on the management bond: many switch models support having LACP configured on the switch but not yet on the host while you deploy, and LACP should then start once both ends negotiate it. Try to set the switch port to passive rather than active. If your switch does not support this, you can add the bond manually after building the cluster by adding it to the /opt/petasan/config/cluster_info.json config file, as sketched below.
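
For the manual edit, the bond ends up as an entry in /opt/petasan/config/cluster_info.json. The exact keys differ between PetaSAN versions, so treat the fragment below purely as an illustration (the field names are my assumption, not a documented schema) and compare it against a cluster_info.json the wizard generated on a node where the bond worked:

    "bonds": [
        { "name": "bond0", "mode": "active-backup", "interfaces": "eth4,eth7" }
    ]

Back up the file before editing and keep the JSON valid, since the node services read it at startup.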

But it should not happen during configuration that I lose the connection to the management IP. The whole machine is no longer reachable at step 7 of the wizard. Something can't be right, can it?
I also want to ensure that the backend network connection is not lost if a switch fails. To my knowledge, LACP only works within a single switch.

Can you double-check that you configured your switch ports correctly for LACP?

Does the host become inaccessible after step 7 just for a short time during deployment, or permanently? That is, can you ping it or ssh to it?

If you have issues, I suggest you use an active/backup bond. You can change it to LACP later by editing the config file.

LACP can be run across 2 different switches (commonly called MLAG or MC-LAG) to provide HA across switches, but it is a feature that not all switches support, so you need to check your model.

To ensure the backend connection is not lost, you should test unplugging a cable from the bond in a running cluster, as you were doing earlier. A simple way to run that test is sketched below.
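
On an active/backup bond, watch the bond's active slave on the node whose cable you pull while pinging another node's backend IP (the IP below is a placeholder; run the two commands in separate terminals):

    watch -n1 'grep "Currently Active Slave" /proc/net/bonding/bond0'   # shows which port the bond is using right now
    ping 192.168.180.202                                                # another node's backend IP

The active slave should flip to the other port and the pings should keep flowing with at most a packet or two lost; if the node gets fenced instead, the failover did not work.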
