Clock Skew on 1st and 2nd mon node
RST
17 Posts
July 9, 2021, 7:45 am
Hi,
Since last night, we have had a clock skew warning on the 1st and 2nd monitor nodes.
"ceph health detail" shows a difference of 0.49s between them and the 3rd node.
But these two nodes are syncing against our ntp server. If we check with ntpdate -q 10.10.18.254 (our ntp server), there is no difference.
The 3rd node is getting its ntp time from one of the other two, and there the difference is 0.49s.
Any idea of how we can fix that?
BR,
Reto
admin
2,930 Posts
July 9, 2021, 12:01 pm
Generally this resolves itself.
You can force a sync on the 3rd node:
systemctl stop ntp
ntpdate <ip>
systemctl start ntp
Shiori
86 Posts
December 17, 2021, 6:46 pm
This is an old problem that seems to crop up once in a while.
The servers set their time at bootup; then, when they join the monitor cluster, they are supposed to sync to each other (node 1, to be exact). However, there is an inherent delay in this process that can cause the sync target to float out of scope, and the difference grows until it is too much for PetaSAN.
To resolve this, we have a script on each node that triggers an ntp resync using ntpdate and runs as soon as the network adapter is up and linked. This is exactly what PetaSAN does, but this way it's done twice and against a single reference before the servers try to coordinate. It's a bit of a hack, but at least the time difference is no longer a problem.
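For anyone who wants to try the same thing, here is a rough sketch of what such a resync script could look like. It is not the script we actually run: the function names, the NTP_SERVER and DRY_RUN variables, and the dry-run default are all assumptions for illustration. It reuses the stop / ntpdate / start sequence from the admin's reply and the server address from the first post. How you trigger it once the link is up (for example a systemd unit ordered after network-online.target) is left out.

```shell
#!/bin/sh
# Hypothetical boot-time resync helper (illustrative sketch only).
# NTP_SERVER and DRY_RUN are assumed names, not PetaSAN settings.
NTP_SERVER="${NTP_SERVER:-10.10.18.254}"  # reference server from the first post
DRY_RUN="${DRY_RUN:-1}"                   # 1 = only print what would be run

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Stop ntp, step the clock once against a single reference, start ntp again.
# ntp is restarted even if ntpdate fails; ntpdate's exit status is returned.
resync_clock() {
    run systemctl stop ntp
    run ntpdate "$1"
    resync_status=$?
    run systemctl start ntp
    return "$resync_status"
}

resync_clock "$NTP_SERVER"
```

With the defaults it only prints the three commands. Setting DRY_RUN=0 (as root) makes it actually step the clock, so check the printed commands against your own service names first.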