Forums

Cluster has one or more osd failures please check the following

I did a clean install of the new Nautilus release and it has been running for about 2 days. This morning I decided to set up the SMTP relay, and immediately afterwards three OSDs went down on the primary node. I don't see how the two are correlated, but it happened right after. A reboot of the server resolved the issue, and then a time issue came up, which I resolved via the command line. This happened again on the primary after the time was updated correctly, and also on node number 2. I have logs from the nodes if you want them.

 

 

If I understand you correctly, the issue seems to be related to the time setting and showed up after you set up SMTP. Ceph is very sensitive to time synchronization (0.3 s), which is why in PetaSAN we set up NTP between the nodes. You can set up an external NTP server in PetaSAN; the setup takes care of adjusting the time very slowly between the servers. If you do not set up an external NTP server, we sync to the first node's time, then the 2nd, etc.

Maybe the SMTP setup does its own sync with a hardcoded external time server. If so, this could have thrown the time too far into the past or the future for the OSDs to function correctly.
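To check whether a node has drifted like this, one option is to look at the offset that ntpd reports for its selected peer. The sketch below is only an illustration, not something PetaSAN ships: `check_ntp_offset` is a made-up helper name, it assumes the usual `ntpq -pn` column layout (offset in milliseconds in the ninth column, selected peer marked with `*`), and the default limit of 300 ms simply mirrors the 0.3 s figure mentioned above.

```shell
#!/bin/sh
# Hypothetical helper (not part of PetaSAN or Ceph).
# Reads `ntpq -pn` output on stdin and checks the offset of the selected
# peer, i.e. the line starting with `*`.
# Assumption: standard ntpq peer table, offset in milliseconds in column 9.
# Exit status: 0 = within limit, 1 = offset too large, 2 = no selected peer.
check_ntp_offset() {
    limit_ms="${1:-300}"   # 300 ms mirrors the 0.3 s tolerance quoted above
    awk -v limit="$limit_ms" '
        /^\*/ {
            found = 1
            off = $9 + 0
            if (off < 0) off = -off          # compare the absolute offset
            if (off > limit + 0) {
                print "SKEW " $9 " ms"
                bad = 1
            } else {
                print "OK " $9 " ms"
            }
        }
        END {
            if (!found) { print "NO_PEER"; exit 2 }
            exit bad + 0
        }
    '
}

# Usage on a node (requires ntpd to be running):
#   ntpq -pn | check_ntp_offset 300
```

Running this on each node gives a quick per-node view; a `NO_PEER` result means ntpd has not yet selected a time source, which is worth checking before looking at the OSDs.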

Looks like that was it. I ran the following commands on each node:

 

service ntp stop

ntpdate ntp.ubuntu.com

service ntp start

 

I still have 0.north-america.pool.ntp.org listed in the general settings. Do you recommend using it?

 

No particular recommendation on which external NTP server to use. In fact, as far as we are concerned, we can run without one at all; as long as the nodes are synced among themselves, we are OK.