Cluster has one or more osd failures please check the following
R3LZX
50 Posts
August 14, 2019, 1:37 pm
I did a clean install of the new Nautilus release and it has been running for about 2 days. This morning I decided to set up the SMTP relay, and immediately afterwards three OSDs went down on the primary node. I don't see how the two are correlated, but it happened right after. A reboot of the server resolved the issue, and a time issue then came up, which I resolved via the command line. This happened again on the primary after the time was updated correctly, and on node number 2 as well. I have logs from the nodes list if you want them.
Last edited on August 14, 2019, 1:38 pm by R3LZX · #1
admin
2,918 Posts
August 14, 2019, 2:46 pm
If I understand you correctly, the issue seems to be related to the time setting and showed up after you set up SMTP. Ceph is very sensitive to time synchronization (0.3 s), which is why PetaSAN sets up NTP between the nodes. You can configure an external NTP server in PetaSAN; the setup takes care of adjusting the time very slowly between the servers. If you do not configure an external NTP server, we sync to the first node's time, then the 2nd, etc.
Maybe the SMTP setup does its own sync with a hardcoded external time server. If so, this could have thrown the time too far into the past or the future for the OSDs to function correctly.
Last edited on August 14, 2019, 3:30 pm by admin · #2
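For anyone checking whether a node's clock is outside the tolerance mentioned above, here is a minimal sketch. It assumes ntpd is the time daemon and that `ntpq` is installed. The 0.3 s default is the figure quoted in this thread; the limit your cluster actually enforces depends on its monitor configuration. The function names `skew_status` and `local_offset_seconds` are made up for this example.

```shell
#!/bin/sh
# Tolerance in seconds. 0.3 is the figure quoted in this thread; the limit
# your cluster enforces depends on its configuration (assumption).
MAX_SKEW="${MAX_SKEW:-0.3}"

# skew_status OFFSET_SECONDS [THRESHOLD_SECONDS]
# Prints "ok" if |offset| <= threshold, otherwise "skewed".
skew_status() {
    awk -v off="$1" -v max="${2:-$MAX_SKEW}" 'BEGIN {
        if (off < 0) off = -off
        if (off > max) print "skewed"; else print "ok"
    }'
}

# local_offset_seconds
# Prints the offset (in seconds) to the peer ntpd currently has selected.
# In `ntpq -pn` output the selected peer is marked with a leading "*" and
# the offset is the 9th column, in milliseconds.
# Prints nothing if ntpq is missing or no peer is selected yet.
local_offset_seconds() {
    command -v ntpq >/dev/null 2>&1 || return 1
    ntpq -pn 2>/dev/null |
        awk '$1 ~ /^\*/ { printf "%.6f\n", $9 / 1000; exit }'
}

# Example use on a node (not run automatically):
#   off=$(local_offset_seconds) && [ -n "$off" ] && skew_status "$off"
```

On the Ceph side, `ceph health detail` reports monitors with clock skew and `ceph time-sync-status` shows what the monitors currently see.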
R3LZX
50 Posts
August 14, 2019, 3:03 pm
Looks like that was it. I ran the following commands on each node:
service ntp stop
ntpdate ntp.ubuntu.com
service ntp start
I still have 0.north-america.pool.ntp.org listed in the general settings. Do you recommend using it?
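The three commands above can be wrapped in a small helper so the same sequence runs the same way on every node. This is only a sketch: the function name `resync_clock` and the `RUN` variable are made up for this example, and it is a dry run by default (it prints the commands instead of executing them).

```shell
#!/bin/sh
# resync_clock NTP_SERVER
# Stops ntpd, steps the clock once with ntpdate, then starts ntpd again.
# RUN=echo (the default) only prints the commands; set RUN= (empty) to
# execute them, which needs root on the node.
resync_clock() {
    server="$1"
    run="${RUN-echo}"

    # ntpdate cannot bind the NTP port while ntpd is running.
    $run service ntp stop || return 1

    $run ntpdate "$server"
    rc=$?

    # Start ntpd again even if ntpdate failed, so the node is not left
    # without a running time daemon.
    $run service ntp start

    return "$rc"
}

# Dry run:   resync_clock ntp.ubuntu.com
# Real run:  RUN= resync_clock ntp.ubuntu.com
```

The return code is that of the `ntpdate` step, so a caller looping over nodes can tell which ones failed to step their clock.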
admin
2,918 Posts
August 14, 2019, 3:26 pm
No particular recommendation on which external NTP server to use. In fact, as far as we are concerned, we can run without one at all; as long as the nodes are synced among themselves, we are OK.