Forums - PetaSAN

Cluster has one or more osd failures please check the following

R3LZX
50 Posts

August 14, 2019, 1:37 pm
I did a clean install of the new Nautilus release, and it has been running for about two days. This morning I decided to set up the SMTP relay, and immediately afterwards three OSDs went down on the primary node. I don't see how the two are correlated, but it happened right after. A reboot of the server resolved the issue, and a time issue then came up, which I resolved via the command line. This happened again on the primary after the time was corrected, and also on node number 2. I have logs from the nodes list if you want them.

Last edited on August 14, 2019, 1:38 pm by R3LZX · #1

admin
2,930 Posts

August 14, 2019, 2:46 pm
If I understand you correctly, it seems the issue is related to the time setting and showed up after you set up SMTP. Ceph is very sensitive to time synchronization (0.3 s), which is why in PetaSAN we set up NTP between the nodes. You can set up an external NTP server in PetaSAN; the setup takes care of adjusting the time very slowly between the servers. If you do not set up an external NTP server, we sync to the first node's time, then the 2nd, etc.

Maybe the SMTP setup does its own sync with an external time server that is hardcoded. If so, this could have thrown the time too far into the past or the future for the OSDs to function correctly.
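
As a rough illustration of the tolerance mentioned above, here is a minimal sketch of a check that flags a clock offset larger than 0.3 s. The function name `skew_ok` and the sample offsets are hypothetical and only for illustration; on a live node the offsets would have to come from a real source such as the offset column of `ntpq -p` (which is reported in milliseconds, so it would need converting to seconds first).

```shell
#!/bin/sh
# Hypothetical helper: succeeds when a clock offset is within tolerance.
# $1 = offset in seconds (may be negative)
# $2 = maximum allowed skew in seconds (defaults to the 0.3 s figure above)
skew_ok() {
    awk -v off="$1" -v max="${2:-0.3}" 'BEGIN {
        if (off < 0) off = -off      # compare the absolute value
        exit !(off <= max)           # exit status 0 means within tolerance
    }'
}

# Illustrative offsets only, not measured values.
for off in 0.012 -0.250 1.700; do
    if skew_ok "$off"; then
        echo "$off s: within tolerance"
    else
        echo "$off s: exceeds tolerance"
    fi
done
```

The threshold is passed as a parameter so the same check can be reused if the cluster is configured with a different allowed drift.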

Last edited on August 14, 2019, 3:30 pm by admin · #2

R3LZX
50 Posts

August 14, 2019, 3:03 pm
Looks like that was it. I ran the following commands on each node:

    service ntp stop
    ntpdate ntp.ubuntu.com
    service ntp start

I still have 0.north-america.pool.ntp.org listed in the general settings. Do you recommend using it?

#3

admin
2,930 Posts

August 14, 2019, 3:26 pm
No particular recommendation on which external NTP server to use. In fact, as far as we are concerned, we can run without one at all; as long as we sync the nodes among themselves, we are OK.

#4
