eMail notification OSDs down
trexman
60 Posts
March 27, 2019, 11:34 am
Hi,
about the email notifications: for the second time now, we have received an email during the night saying:
Dear PetaSAN user;
Cluster has one or more osd failures, please check the following osd(s):
- osd.33/HBPS02
- osd.55/HBPS02
If I look at the status in the morning:
-> Ceph Health OK
-> Physical Disk - OSD 33 - SMART Test Passed
I also checked the SMART status on the controller - also OK.
Do you have an explanation for this? Could the disk be starting to die?
But if there were a problem with the disk, why did it come back UP instead of staying DOWN?
Thanks for your help!
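For reference, the same checks can also be done from the command line instead of the UI (standard Ceph and smartmontools commands; /dev/sdX is a placeholder for the OSD's actual data disk):

# Overall cluster health, as shown on the dashboard
ceph -s

# Up/down state of every OSD; osd.33 and osd.55 show "up" again by morning
ceph osd tree

# SMART health of the underlying disk (replace /dev/sdX with the OSD's device)
smartctl -H /dev/sdX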
admin
2,930 Posts
March 27, 2019, 1:53 pm
It may be that the OSDs failed and then restarted. Can you look at the log files in /var/log/ceph on HBPS02 for osd 33 and 55? Also look at the OSD chart in the dashboard to see if it shows a failure. A failure does not necessarily mean a problem with the disk media; the logs should tell us.
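If it helps, the following is a minimal sketch of what to search for, assuming the default log file naming ceph-osd.<id>.log (adjust the ids and patterns to your setup):

# On HBPS02: scan the OSD logs for crash/restart markers around the failure time
grep -iE 'error|fail|abort|heartbeat_map|suicide' /var/log/ceph/ceph-osd.33.log
grep -iE 'error|fail|abort|heartbeat_map|suicide' /var/log/ceph/ceph-osd.55.log

# Logs rotated overnight may be compressed; zgrep searches those too
zgrep -iE 'error|fail|abort' /var/log/ceph/ceph-osd.33.log.*.gz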
ghbiz
76 Posts
April 2, 2019, 2:49 am
We have also seen the same issue. One thing to note is the following Ceph bug tracker URL.
We are currently running PetaSAN 2.2.0 with Ceph 12.2.7.
Per the URL below, this issue is fixed in Ceph 12.2.8.
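To confirm which version your daemons are actually running (the ceph versions command is available from Luminous onward):

# Report the Ceph version of every running daemon, grouped by type
ceph versions

# Or ask a single OSD directly
ceph tell osd.33 version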