From the admin web application, add & remove OSD's?

It does seem that /etc/hosts got corrupted by a Consul connection failure while the node was joining, which is why the physical disk list does not open. To fix the hosts file manually:

# stop auto sync service
systemctl stop petasan-file-sync
# manually fix the hosts file
nano /etc/hosts
# sync the hosts file to all nodes:
/opt/petasan/scripts/util/sync_file.py /etc/hosts
# restart the sync service on current node
systemctl start petasan-file-sync
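
Before running the sync step above, it may be worth sanity-checking the edited file so that a remaining bad line is not pushed to every node. The following is only a sketch: `check_hosts_file` is a hypothetical helper, not part of PetaSAN, and it assumes that "corrupt" means lines that do not look like `<IP address> <hostname>...`. It does not know which entries your cluster actually needs.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a PetaSAN tool): report lines in a hosts file
# that do not start with an IP address followed by at least one name.
# Returns non-zero if any such line is found.
check_hosts_file() {
    local file="${1:-/etc/hosts}"
    awk '
        /^[ \t]*#/ { next }     # skip comment lines
        NF == 0    { next }     # skip blank lines
        {
            is_v4 = ($1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/)
            is_v6 = ($1 ~ /^[0-9A-Fa-f:]*:[0-9A-Fa-f:.]*$/)
            if (!((is_v4 || is_v6) && NF >= 2)) {
                printf "line %d: %s\n", NR, $0
                bad = 1
            }
        }
        END { exit bad + 0 }
    ' "$file"
}

# Usage:
#   check_hosts_file /etc/hosts && echo "hosts file looks well-formed"
```

The check is purely syntactic: a well-formed line with the wrong address still passes, so compare the entries against your node list as well.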

The root cause still needs to be fixed. I would keep an eye on the log file: if the Consul connection errors keep appearing, the system is not stable.
The most likely cause is a flaky network, or the system could be underpowered, so that under load (client I/O, recovery, scrub) it slows to the point where it cannot connect to the cluster. Observe your disk % busy, CPU and RAM usage during and after the initial failure and see whether those resources were maxed out. Do you have enough RAM?
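
One rough way to keep an eye on the log is to count the matching lines from time to time and see whether the number grows. This is a minimal sketch: `count_consul_errors` is a hypothetical helper, and the default pattern is only an assumption about how the errors are worded, so adjust it after looking at a real error line in your own log.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count log lines that mention both "consul" and
# "connection" (case-insensitive). The default pattern is an assumption
# about the wording of the errors; pass your own as the second argument.
count_consul_errors() {
    local logfile="$1"
    local pattern="${2:-consul.*connection|connection.*consul}"
    # grep -c prints the number of matching lines. It exits with status 1
    # when that number is 0, so "|| true" keeps this usable under "set -e".
    grep -Eci -- "$pattern" "$logfile" || true
}

# Usage (pass the path to your node's log file):
#   count_consul_errors <logfile>
#   count_consul_errors <logfile> 'some other pattern'
```

If the count keeps increasing after the hosts file has been repaired, the underlying connection problem is still there.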
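
To see whether resources are maxed out, something like the following could be left running during recovery or scrub. It is a sketch, not a monitoring solution: `mem_used_percent` and `sample_once` are hypothetical names, and it only covers RAM and load average, read from `/proc`. For disk % busy, the `%util` column of `iostat -x` (from the sysstat package, if it is installed) is the usual place to look.

```shell
#!/usr/bin/env bash
# Percentage of RAM in use, computed from MemTotal and MemAvailable.
# The file argument defaults to /proc/meminfo; it is a parameter only so
# the function can also be tried against a saved copy of that file.
mem_used_percent() {
    local meminfo="${1:-/proc/meminfo}"
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%d\n", (total - avail) * 100 / total }' "$meminfo"
}

# Print one timestamped line with RAM usage and the 1-minute load average.
sample_once() {
    local load
    read -r load _ < /proc/loadavg
    printf '%s mem_used=%s%% load1=%s\n' "$(date '+%F %T')" "$(mem_used_percent)" "$load"
}

# Usage (stop with Ctrl-C):
#   while true; do sample_once; sleep 5; done
```

A load average well above the number of CPU cores, or memory usage close to 100%, at the time the connection errors appear would point to an underpowered node rather than a network problem.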

Ok, it worked, thanks.

I agree that the system could be underpowered: two nodes have 4MB RAM and one node has only 100 Mbit Ethernet ports. In fact, quite often during recovery I get emails from the cluster telling me that some OSD or a node is down, while they are actually still working. Anyway, in this test cluster this is not a concern.

Bye, S.
