
CEPH cluster down

Hello,
We had to physically move our Ceph cluster. I followed all the procedures (I thought) and shut all the nodes down cleanly at the same time. Upon starting them back up, I am now seeing:

HEALTH_WARN 1 filesystem is degraded; 1 MDSs report slow metadata IOs; 23 osds down; Reduced data availability: 1082 pgs inactive, 341 pgs down; Degraded data redundancy: 5202276/19234662 objects degraded (27.046%), 1223 pgs degraded, 2262 pgs undersized; 6 slow ops, oldest one blocked for 426 sec, mon.nc-san3 has slow ops

What should I do to recover from this? We shut down the 5x SANs all at once, moved them to the new facility, and started them back up. It does not seem to be getting better on its own.
I tried increasing the recovery speed options from the web interface.
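For reference, the state of the cluster can be inspected, and the same recovery-speed settings adjusted, from the command line on any node. The values below are illustrative examples, not recommendations, and the `ceph config set` syntax assumes a release with the centralized config store (Nautilus or later):

```shell
# See exactly which OSDs are down and which daemons have slow ops
ceph health detail

# Show only the down OSDs in the CRUSH tree, grouped by host
ceph osd tree down

# List the PGs that are stuck inactive
ceph pg dump_stuck inactive

# Example: temporarily raise recovery/backfill throughput
# (example values -- revert once the cluster is back to HEALTH_OK)
ceph config set osd osd_max_backfills 4
ceph config set osd osd_recovery_max_active 8
```

Raising backfill/recovery limits only helps once the down OSDs are actually back up; it will not fix PGs that are `down` because their OSDs are offline.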

I am still running it on Ubuntu 18.04 with PetaSAN 2.8.1.

It keeps showing varying numbers of OSDs down, PGs down, etc., but it does not seem to recover on its own.

It gets as low as 4 OSDs down and then the count starts climbing again. The PGs down/degraded/etc. keep changing but never trend in a consistently positive direction. Any tips?

I was able to resolve this. It seems a race condition had been created, and I got the cluster to recover simply by restarting the ceph-osd daemons.
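For anyone landing here with the same symptoms, the restart described above can be sketched as follows on each affected node (assuming systemd-managed OSDs, as on a PetaSAN/Ubuntu install; the OSD ID below is just an example):

```shell
# Restart a single flapping OSD daemon (replace 12 with the actual OSD ID)
sudo systemctl restart ceph-osd@12

# Or restart every OSD daemon on this node in one go
sudo systemctl restart ceph-osd.target

# Then watch the PG states drain back toward active+clean
ceph -s
```

Restarting one node's OSDs at a time keeps the impact bounded; wait for the down/degraded counts to settle before moving to the next node.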

This is a good example of the beauty of PetaSAN, and CEPH. We have been using CEPH since 2017 and have had server crashes, power failures, disk failures, and switch failures, and CEPH has never let me down. In all situations, CEPH started complaining and eventually fixed itself. Sure, I had to reboot a node sometimes, but that's it.