Backend bond failover issues

Hello admin,

We are suffering the same issue as described in the following thread: https://www.petasan.org/forums/?view=thread&id=772

Our situation is the following:

Our PetaSAN cluster (version 3.1.0) consists of 6 hosts.

Every host has 3 NICs:

  • Card 1: 2x1Gb for management
  • Card 2: 2x10Gb (1 for backend and 1 for iSCSI)
  • Card 3: 2x10Gb (1 for backend and 1 for iSCSI)

Two bonds are created on the cluster:

  • 1 management bond with 2x1Gb (balance-alb)
  • 1 backend bond with 2x10Gb (balance-alb); for redundancy we took 1x10Gb from Card 2 and 1x10Gb from Card 3

The iSCSI uplinks go to 2 separate switches dedicated to iSCSI (without VLAN config, etc.).

The backend uplinks go to 2 separate switches connected by a 2x40Gb interlink.

If we disable a link on the backend switch (the link between host A and the backend switch), the OSDs on host A go down. In addition, iSCSI interfaces also go down, even though no iSCSI interfaces are active on host A.

During this event, we are still able to ping the backend IP and management IP of host A.

On the PetaSAN dashboard, we see the following warnings:

 

SLOW OSD heartbeats on back (longest 3000ms)

SLOW OSD heartbeats on front (longest 3000ms)

33 slow ops, oldest one blocked for 147 sec, mon.HOST_C has slow ops

If we now reboot host A (without re-enabling the link), the cluster returns to the HEALTH_OK state after a few minutes.

Can you advise us on how to solve this issue? It currently prevents us from doing maintenance on our backend switches.

The thread mentioned above talks about changing the bond mode from balance-alb to active-backup or balance-tlb. Is it possible to change this configuration on a running PetaSAN cluster?

Thanks a lot,

Robin

 

You can change the bond to active-backup or 802.3ad (LACP); they are more widely used and we test them a lot. LACP does load balancing.

You can change the bond configuration on the node via
/opt/petasan/config/cluster_info.json

Change balance-alb to active-backup or 802.3ad.
For LACP you would also need to configure the switch ports.
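
For anyone who prefers to script the edit rather than change the file by hand, here is a minimal Python sketch. It does not assume any particular layout of cluster_info.json: it simply walks the parsed JSON and swaps every string value equal to balance-alb for the new mode. The function names replace_bond_mode and rewrite_config are made up for this example and are not part of PetaSAN.

```python
import json


def replace_bond_mode(node, old_mode="balance-alb", new_mode="active-backup"):
    """Return a copy of parsed JSON with every string value equal to
    old_mode replaced by new_mode. Keys are left untouched."""
    if isinstance(node, dict):
        return {k: replace_bond_mode(v, old_mode, new_mode) for k, v in node.items()}
    if isinstance(node, list):
        return [replace_bond_mode(v, old_mode, new_mode) for v in node]
    if isinstance(node, str) and node == old_mode:
        return new_mode
    return node


def rewrite_config(path, old_mode="balance-alb", new_mode="active-backup"):
    """Load a JSON file, swap the bond mode, and write it back in place.
    Hypothetical helper: make a backup copy of the file before calling it."""
    with open(path) as f:
        config = json.load(f)
    updated = replace_bond_mode(config, old_mode, new_mode)
    with open(path, "w") as f:
        json.dump(updated, f, indent=4)
    return updated
```

Calling rewrite_config("/opt/petasan/config/cluster_info.json") would apply the change on the node where it is run. Since the swap touches every matching value, it changes all balance-alb bonds in the file (management and backend alike); edit by hand if only one bond should change.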

You should reboot, or you can try without a reboot by running
/opt/petasan/scripts/node_start_ips.py
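
To check that the new mode actually took effect, the Linux bonding driver exposes its state under /proc/net/bonding/. Below is a small Python sketch that parses that text and reports the bond mode and the MII status of each slave. The function name is hypothetical, and the bond name you pass in (for example bond0) is an assumption; use whatever name your cluster assigned.

```python
def parse_bonding_status(text):
    """Parse the contents of /proc/net/bonding/<bond> into a dict with
    the bonding mode and the MII status of each slave interface."""
    status = {"mode": None, "slaves": {}}
    current_slave = None
    for line in text.splitlines():
        # Split on the first colon only, so MAC addresses stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            status["mode"] = value
        elif key == "Slave Interface":
            current_slave = value
            status["slaves"][current_slave] = None
        elif key == "MII Status" and current_slave is not None:
            # The bond-level MII Status appears before any slave and is skipped.
            status["slaves"][current_slave] = value
    return status
```

For example, read the file with open("/proc/net/bonding/bond0") and pass its contents to parse_bonding_status. An active-backup bond reports its mode as "fault-tolerance (active-backup)", and a slave whose switch link has been disabled should show an MII status of "down".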

You can also add any custom configuration in
/opt/petasan/scripts/custom/post_start_network.sh