Clock Skew on 1st and 2nd mon node

Hi,

Since last night, we have had a clock skew warning for the 1st and 2nd monitor nodes.

According to "ceph health detail", there is a difference of 0.49s between them and the 3rd node.
However, these two nodes sync against our NTP server, and querying it with ntpdate -q 10.10.18.254 (our NTP server) shows no difference.
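To check a node's offset against the threshold Ceph warns on, a small filter over the ntpdate -q output could help. This is a sketch under assumptions: it expects the classic ntpdate -q line format (server <ip>, stratum N, offset X, delay Y), uses 0.05 s (Ceph's default mon_clock_drift_allowed) as the threshold, and check_offset is a hypothetical helper name.

```sh
#!/bin/sh
# Hypothetical helper: read "ntpdate -q <server>" output on stdin and check the
# measured offset against a threshold in seconds (default 0.05, which is Ceph's
# default mon_clock_drift_allowed; adjust if your cluster overrides it).
# Exit status: 0 = within threshold, 1 = too large, 2 = no offset found.
check_offset() {
    awk -v max="${1:-0.05}" '
        # Classic ntpdate -q lines look like:
        #   server 10.10.18.254, stratum 2, offset -0.000123, delay 0.02567
        /^server / && !found {
            for (i = 1; i < NF; i++) {
                if ($i == "offset") {
                    off = $(i + 1)
                    sub(/,$/, "", off)   # drop the trailing comma
                    off += 0             # convert to a number
                    found = 1
                }
            }
        }
        END {
            if (!found) { print "no offset found"; exit 2 }
            if (off < 0) off = -off
            printf "offset %.6f s\n", off
            if (off > max + 0) exit 1
            exit 0
        }
    '
}
```

For example, ntpdate -q 10.10.18.254 | check_offset 0.05 prints the absolute offset of the first server line and exits non-zero if it is larger than 0.05 s.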

The 3rd node gets its NTP time from one of the other two, and that is where the 0.49s difference shows up.
Any idea how we can fix that?

BR,
Reto

Generally this resolves itself.

You can force a sync on the 3rd node:

systemctl stop ntp
ntpdate <ip>
systemctl start ntp
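The three commands above could be wrapped in a small function that restarts ntp even if ntpdate fails. This is a sketch: force_resync is a hypothetical name, the service name ntp comes from the commands above, and SYSTEMCTL/NTPDATE are overridable only so the sequence can be tested with stubs. Stopping ntp first matters because ntpd holds UDP port 123, and ntpdate refuses to run while that socket is in use.

```sh
#!/bin/sh
# Sketch of the forced resync above, restarting ntp even if ntpdate fails.
# SYSTEMCTL and NTPDATE default to the real commands; they are overridable
# only so the sequence can be tested with stub functions.
SYSTEMCTL=${SYSTEMCTL:-systemctl}
NTPDATE=${NTPDATE:-ntpdate}

force_resync() {
    server="${1:-}"
    if [ -z "$server" ]; then
        echo "usage: force_resync <ntp-server-ip>" >&2
        return 2
    fi
    # ntpd holds UDP port 123, so ntpdate refuses to run while it is active.
    "$SYSTEMCTL" stop ntp || return 1
    # Step the clock once against the given server; remember the result.
    if "$NTPDATE" "$server"; then rc=0; else rc=$?; fi
    # Always bring ntp back so the node keeps disciplining its clock.
    "$SYSTEMCTL" start ntp || return 1
    return "$rc"
}
```

Run as root on the 3rd node, force_resync <ip> performs the same stop/sync/start sequence and returns ntpdate's exit status.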

This is an old problem that seems to crop up once in a while.

The servers set their time at boot, and when they join the monitor cluster they are supposed to sync to each other (to node 1, to be exact). However, there is an inherent delay in this process, so the sync target can drift out of range, and the difference keeps growing until it exceeds what PetaSAN tolerates.

To resolve this, we have a script on each node that triggers an NTP resync with ntpdate as soon as the network adapter is up and has link. PetaSAN does the same thing, but this way it is done twice, and against a single reference, before the servers try to coordinate. It's a bit of a hack, but at least the time difference isn't a problem anymore.
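The post does not show the script or how it is triggered, so the following is only a guess at one way to build it. It polls /sys/class/net/<iface>/operstate until the kernel reports "up", then steps the clock once with ntpdate -b against a single server. boot_resync, wait_for_link, the interface name and the 60 s timeout are hypothetical; SYSFS_NET and NTPDATE are overridable only for testing. It would have to run before ntp starts (for example from a oneshot systemd unit ordered before ntp.service), since ntpdate cannot use port 123 while ntpd holds it. Some virtual interfaces report "unknown" instead of "up", so check what your NIC shows.

```sh
#!/bin/sh
# Hypothetical boot-time resync: wait until a NIC reports link "up" in sysfs,
# then step the clock once against a single NTP server, before ntpd starts.
# SYSFS_NET and NTPDATE are overridable only so the logic can be tested.
SYSFS_NET=${SYSFS_NET:-/sys/class/net}
NTPDATE=${NTPDATE:-ntpdate}

# Poll <sysfs>/<iface>/operstate once per second until it reads "up".
wait_for_link() {
    iface="$1"
    timeout="${2:-60}"
    n=0
    while [ "$n" -lt "$timeout" ]; do
        if [ "$(cat "$SYSFS_NET/$iface/operstate" 2>/dev/null)" = "up" ]; then
            return 0
        fi
        sleep 1
        n=$((n + 1))
    done
    return 1
}

boot_resync() {
    iface="$1"
    server="$2"
    timeout="${3:-60}"
    if ! wait_for_link "$iface" "$timeout"; then
        echo "link $iface not up after ${timeout}s" >&2
        return 1
    fi
    # -b makes ntpdate step the clock instead of slewing it gradually.
    "$NTPDATE" -b "$server"
}
```

For example, boot_resync eth0 10.10.18.254 waits up to 60 s for eth0 to get link and then steps the clock against that single reference.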