eMail notification OSDs down

Hi,

About the notification: for the second time now, we received an email during the night that said:

Dear PetaSAN user;

Cluster has one or more osd failures, please check the following osd(s):

- osd.33/HBPS02
- osd.55/HBPS02

When I look at the status in the morning:

-> Ceph Health OK
-> Physical Disk - OSD 33 - SMART Test Passed

I also checked the SMART status on the controller, and it is OK as well.

Do you have an explanation for this? Could the disk be starting to fail?
But if there were a problem with the disk, why did the OSD come back UP instead of staying DOWN?

Thanks for your help!


It may be that the OSD failed and then restarted. Can you check the log files in /var/log/ceph on HBPS02 for osd 33 and 55? Also look at the OSD chart in the dashboard to see whether it shows a failure. A failure does not necessarily mean a problem with the disk media; the logs should tell us.
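To speed up the log check, here is a minimal sketch that counts lines hinting at a down/restart event in an OSD log. The patterns are assumptions based on messages commonly seen in Ceph Luminous OSD logs ("wrongly marked me down", "Caught signal", "FAILED assert", "heartbeat_check: no reply from", and the "ceph version ... pid" line written at daemon startup). The file name ceph-osd.33.log assumes the default cluster name "ceph", which may differ on a PetaSAN node.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed patterns; adjust them to what your own OSD logs actually contain.
PATTERNS = {
    "marked_down": re.compile(r"wrongly marked me down"),        # mons marked a live OSD down
    "crash": re.compile(r"Caught signal|FAILED assert"),         # daemon crashed
    "heartbeat": re.compile(r"heartbeat_check: no reply from"),  # missed peer heartbeats
    "startup": re.compile(r"ceph version \S+ .*pid \d+"),        # daemon (re)started
}


def scan_osd_log(lines: Iterable[str]) -> Counter:
    """Count how many lines match each failure/restart pattern."""
    counts: Counter = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python scan_osd_log.py /var/log/ceph/ceph-osd.33.log
    for path in sys.argv[1:]:
        with open(path, errors="replace") as f:
            print(path, dict(scan_osd_log(f)))
```

A non-zero "startup" count around the time of the email, together with "crash" hits, would suggest the OSD daemon crashed and restarted. "marked_down" or "heartbeat" hits without a crash would point more toward network or load issues than toward the disk itself.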

We have also seen the same issue. Note the following Ceph bug tracker entry.

We are currently running PetaSAN 2.2.0 with Ceph 12.2.7.

Per the URL below, this issue is fixed in Ceph 12.2.8.

https://tracker.ceph.com/issues/23431
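To check whether a node already has the release reported to contain the fix, a minimal sketch can parse the output of `ceph --version` (for example "ceph version 12.2.7 (...) luminous (stable)") and compare it against 12.2.8. This only tells you whether the fix is included; it does not confirm that this tracker issue is what caused the alerts.

```python
import re
import subprocess
from typing import Tuple

# Release in which the tracker issue is reported fixed, per the post above.
FIXED_IN = (12, 2, 8)


def parse_ceph_version(output: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from `ceph --version` style output."""
    match = re.search(r"ceph version (\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"unrecognised version string: {output!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def needs_upgrade(output: str) -> bool:
    """True if the reported version is older than FIXED_IN (tuple comparison)."""
    return parse_ceph_version(output) < FIXED_IN


if __name__ == "__main__":
    # Runs the ceph CLI on the local node; requires it to be installed.
    result = subprocess.run(["ceph", "--version"], capture_output=True, text=True, check=True)
    print(result.stdout.strip(), "-> upgrade needed:", needs_upgrade(result.stdout))
```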