Forums - PetaSAN

2 Nodes shutting down after Network impact

eazyadm
25 Posts

April 20, 2022, 8:11 am
Hi,

we are testing PetaSAN with iSCSI and S3 in unusual scenarios to make sure we can use it in production.

At the moment our environment has 6 storage nodes and 3 monitoring nodes.

In the last test, we blocked all network traffic on both switches, so no communication between any of the nodes was possible. We left it in this state for about 6 hours.

The result was that two storage nodes (2 and 3) were shut down automatically.

We rebooted the switches, so network traffic was possible again. We then powered on both offline nodes; after they were up, nodes 1 and 3 shut down.

How can we debug this situation?

Ceph health shows the following warnings:

Reduced data availability: 416 pgs inactive
Degraded data redundancy: 18641675/55508526 objects degraded (33.583%), 3254 pgs degraded, 2411 pgs undersized

Thanks

Kind Regards

Last edited on April 21, 2022, 1:51 pm by eazyadm · #1

admin
2,969 Posts

April 20, 2022, 11:34 am
The shutdown was probably due to fencing. You can disable fencing from the maintenance page, but it is better to leave it enabled.

In most cases, with a 2-switch setup and assuming you have bonded interfaces for HA, you would want to test shutting down 1 switch at a time to verify that the network setup is highly available and the cluster keeps functioning.

Shutting down both switches is essentially a cluster shutdown: all nodes lose connection to one another and you have no cluster. For the nodes to re-peer, the easiest thing is to restart all nodes.

Last edited on April 20, 2022, 11:34 am by admin · #2
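
For the one-switch-at-a-time test suggested in the reply above, it can help to record the cluster health before and after each switch is taken down. The following is a rough sketch, not a PetaSAN tool. It assumes the standard Ceph CLI is available on the node, that `ceph status --format json` returns a `health` object with `status` and `checks` fields (as recent Ceph releases do), and that no extra arguments such as a cluster name are needed. The function names and the sample JSON are hypothetical; the sample is hand-written from the messages in the first post, not captured from a real cluster.

```python
import json
import subprocess


def summarize_health(status_json):
    """Return (overall_status, {check_name: message}) from `ceph status` JSON.

    Assumption: the JSON has a "health" object with "status" and "checks",
    where each check carries a "summary" with a "message".
    """
    health = json.loads(status_json).get("health", {})
    checks = health.get("checks", {})
    messages = {
        name: check.get("summary", {}).get("message", "")
        for name, check in checks.items()
    }
    return health.get("status", "UNKNOWN"), messages


def read_cluster_status(timeout_seconds=30):
    """Run the Ceph CLI; this needs a reachable monitor quorum to answer."""
    result = subprocess.run(
        ["ceph", "status", "--format", "json"],
        check=True,
        capture_output=True,
        text=True,
        timeout=timeout_seconds,
    )
    return result.stdout


# Hand-written sample (NOT real cluster output), built from the two
# warnings quoted in the first post, so the parser can be tried offline.
SAMPLE = json.dumps({
    "health": {
        "status": "HEALTH_WARN",
        "checks": {
            "PG_AVAILABILITY": {
                "severity": "HEALTH_WARN",
                "summary": {
                    "message": "Reduced data availability: 416 pgs inactive"
                },
            },
            "PG_DEGRADED": {
                "severity": "HEALTH_WARN",
                "summary": {
                    "message": "Degraded data redundancy: 18641675/55508526 "
                               "objects degraded (33.583%), 3254 pgs degraded, "
                               "2411 pgs undersized"
                },
            },
        },
    }
})

status, messages = summarize_health(SAMPLE)
print(status)
for name, message in sorted(messages.items()):
    print(f"  {name}: {message}")
```

On a live node, `summarize_health(read_cluster_status())` would replace the sample. With both switches blocked the command cannot reach the monitors, so the timeout matters; with only one switch down and working bonds, the status should stay readable throughout the test.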

eazyadm
25 Posts

April 21, 2022, 6:51 am
Thanks for your answer.

This shouldn't be a normal situation, but we test scenarios like this to see what can happen.

After several restarts of the offline nodes, all of them came back.

We ran the test again with fencing disabled, and this time no node shut down.

Thanks a lot, and have a nice day.

#3
