Backend bond failover issues
robindewolf
9 Posts
March 1, 2023, 2:40 pm
Hello admin,
We are suffering the same issue as described in the following thread: https://www.petasan.org/forums/?view=thread&id=772
Our situation is the following:
Our PetaSAN cluster (version 3.1.0) consists of 6 hosts.
Every host has 3 NICs:
- Card 1: 2x1Gb for management
- Card 2: 2x10Gb (1 for backend and 1 for iSCSI)
- Card 3: 2x10Gb (1 for backend and 1 for iSCSI)
Two bonds are created on the cluster:
- 1 management bond with 2x1Gb (balance-alb)
- 1 backend bond with 2x10Gb (balance-alb); for redundancy we took 1x10Gb from Card 2 and 1x10Gb from Card 3
The iSCSI uplinks go to 2 separate switches dedicated to iSCSI (without VLAN config, etc.).
The backend uplinks go to 2 separate switches with a 2x40Gb interlink.
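For diagnosing failover behavior like this, the kernel exposes each bond's mode and per-slave link state under /proc/net/bonding. The interface names below are assumptions (they vary per setup), so check yours with `ip link` first:

```shell
# List the bond interfaces present on this host, then dump the mode
# and per-slave MII (link) status for each one.
# Bond names differ per setup (e.g. bond0, backend); adjust accordingly.
ls /proc/net/bonding/
grep -E 'Bonding Mode|Slave Interface|MII Status' /proc/net/bonding/*
```

During a failover test, watching the per-slave "MII Status" lines shows whether the bond actually detected the link loss and moved traffic to the surviving slave.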
If we disable a link on the backend switch (the link between host A and the backend switch), the OSDs on host A go down. In addition, iSCSI interfaces go down as well, even though no iSCSI interfaces are active on host A.
During this event, we can still ping the backend IP and management IP of host A.
On the PetaSAN dashboard, we see the following warnings:
SLOW OSD heartbeats on back (longest 3000ms)
SLOW OSD heartbeats on front (longest 3000ms)
33 slow ops, oldest one blocked for 147 sec, mon.HOST_C has slow ops
If we now reboot host A (without re-enabling the link), the cluster returns to the HEALTH_OK state after a few minutes.
Can you advise us on how to solve this issue? Currently it prevents us from performing maintenance on our backend switches.
The thread mentioned above suggests changing the bond mode from balance-alb to active-passive or balance-tlb. Is it possible to change this configuration on a running PetaSAN cluster?
Thanks a lot,
Robin
admin
2,930 Posts
March 1, 2023, 11:02 pm
You can change the bond mode to active-backup or 802.3ad (LACP); these modes are more widely used and we test them a lot. LACP also does load balancing.
You can change the bond configuration on each node via
/opt/petasan/config/cluster_info.json
by changing balance-alb to active-backup or 802.3ad.
For LACP you would also need to configure the switch ports.
You should reboot, or you can try without a reboot by running
/opt/petasan/scripts/node_start_ips.py
You can also add any custom configuration in
/opt/petasan/scripts/custom/post_start_network.sh
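As a hedged sketch of that file edit (the exact structure of cluster_info.json may vary between PetaSAN versions, so verify where the mode string appears before changing anything):

```shell
# Back up the config, confirm where the mode string appears, then swap it.
# Assumes the bond mode is stored as the literal string "balance-alb";
# verify with the grep below before running sed. Note: the sed replaces
# EVERY occurrence, which in this setup would also change the management
# bond -- edit the file by hand if only the backend bond should change.
cp /opt/petasan/config/cluster_info.json /opt/petasan/config/cluster_info.json.bak
grep -n 'balance-alb' /opt/petasan/config/cluster_info.json
sed -i 's/balance-alb/active-backup/g' /opt/petasan/config/cluster_info.json
# Apply the new settings without a full reboot (or reboot the node instead):
/opt/petasan/scripts/node_start_ips.py
```

Repeat the change on each node, one node at a time, waiting for the cluster to return to HEALTH_OK before moving on.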