
PetaSAN not coming online after reboot


Hi,

After I rebooted my 3 nodes, the web UI and the ceph status command are both timing out.

On all 3 nodes, /opt/petasan/log/PetaSAN.log shows:
Consul is not responding, will check cluster backend ips:
19/10/2022 14:57:27 INFO Checking backend latencies :
19/10/2022 14:57:27 INFO Network latency for backend 10.31.0.11 =
19/10/2022 14:57:27 INFO Network latency for backend 10.31.0.12 = 9.9 us
19/10/2022 14:57:27 INFO Network latency for backend 10.31.0.13 =

The only latency it reports is for the node's own IP.
All the nodes can ping each other on the backend and management IPs.
They can also SSH to each other on the management IP.
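For anyone wanting to pull this out of the log quickly, here is a minimal sketch that lists the backend IPs whose latency field is empty. It assumes the line format shown in the excerpt above; the function name `missing_latency_backends` is made up for illustration and is not part of PetaSAN.

```shell
# Print backend IPs whose latency value is empty in PetaSAN.log-style lines.
# Reads log text on stdin; the line format is assumed from the excerpt above.
missing_latency_backends() {
  awk '/Network latency for backend/ {
         # Split at "=": left side ends with the IP, right side is the latency.
         n = split($0, parts, "=")
         value = (n > 1) ? parts[2] : ""
         gsub(/[ \t\r]/, "", value)
         if (value == "") {
           m = split(parts[1], words, " ")
           print words[m]   # last word before "=" is the backend IP
         }
       }'
}
```

Usage would be something like `missing_latency_backends < /opt/petasan/log/PetaSAN.log | sort -u`, run on each node to see which peers that node cannot measure.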

After the last reboot, I created a new user and added an SSH key to that user - I can't see what that has to do with it.

Make sure your subnets do not overlap.
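As a rough way to check for overlap, the sketch below compares two IPv4 CIDR blocks in plain shell arithmetic. The helper names `ip_to_int` and `cidr_overlap` are hypothetical, and the subnets in the usage line are placeholders, since the prefix lengths of the management and backend networks are not given in this thread.

```shell
# Convert a dotted IPv4 address to an integer.
ip_to_int() {
  old_ifs=$IFS; IFS=.
  set -- $1            # split the address into its four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Return 0 (true) if two CIDR blocks overlap, non-zero otherwise.
cidr_overlap() {
  a_ip=${1%/*}; a_len=${1#*/}
  b_ip=${2%/*}; b_len=${2#*/}
  # Two CIDR blocks overlap exactly when they agree on the shorter prefix.
  len=$(( a_len < b_len ? a_len : b_len ))
  mask=$(( len == 0 ? 0 : (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  a=$(ip_to_int "$a_ip"); b=$(ip_to_int "$b_ip")
  [ $(( a & mask )) -eq $(( b & mask )) ]
}
```

Example with placeholder values: `cidr_overlap 10.31.0.0/24 10.31.0.128/25 && echo "overlap"`. Substitute the real management and backend subnets from `ip addr`.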

Can you SSH to each node, or log in using a shell, and show the output of:

ip addr

/opt/petasan/scripts/detect-interfaces.sh

Node01:
https://prnt.sc/haqq2hErUYxb

Node02:
https://prnt.sc/u-FwnX1A_1hG

Node03:
https://prnt.sc/AR2_pRoLaesZ

If the nodes can ping each other on backend and management, try manually restarting the mon service on the 3 nodes and see if ceph status starts to work. If it does, that indicates that when Ceph (and Consul) started during boot, the nodes could not see each other for some reason.

All nodes can ping each other on the backend and management.
I can't use any ceph commands; they just time out on all nodes.

I have tried restarting all 3 nodes, but that did not help.

Can you post your /opt/petasan/config/cluster_info.json?

Can you check the status of the monitors on all 3 nodes: systemctl status ceph-mon@HOSTNAME

Can you try to restart the monitors on all 3 nodes: systemctl restart ceph-mon@HOSTNAME

Do you see any errors in the monitor log file in /var/log/ceph?

What is the Consul status, using the command: consul members
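The checks above can be bundled into one function to run on each node in turn. This is only a sketch: it assumes the monitor ID equals the output of `hostname` (as the HOSTNAME placeholder suggests), the 5 and 15 second waits are arbitrary, and the names `mon_unit` and `check_mon` are made up for illustration.

```shell
# Build the systemd unit name for a monitor; defaults to this node's hostname.
mon_unit() { printf 'ceph-mon@%s' "${1:-$(hostname)}"; }

# Show, restart, and re-check the local monitor, then probe Ceph and Consul.
check_mon() {
  unit=$(mon_unit "$1")
  systemctl --no-pager status "$unit"
  systemctl restart "$unit"
  sleep 5                                  # arbitrary pause before re-checking
  systemctl is-active "$unit"
  # Bound the wait so a hung cluster does not block the shell indefinitely.
  timeout 15 ceph status || echo "ceph status still timing out"
  consul members
}
```

Run `check_mon` as root on each of the 3 nodes, then look in /var/log/ceph for the monitor log.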

/opt/petasan/config/cluster_info.json:
https://prnt.sc/UWCUFAimdcl0

Monitor status after the server restarted:
https://prnt.sc/Fg-4uuOiGyjY

Monitor status after restarting the monitors:
https://prnt.sc/oIuzGT0ayYBJ

Monitor log files:
https://prnt.sc/gvFab1DL2DXv

Consul status:
https://prnt.sc/v6lAnF0aSeU1

Can you double check that the switch ports are set up correctly for the bonds and VLANs?

What is the output of:

cat /sys/class/net/bond0/bonding/slaves
cat /proc/net/bonding/bond0
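To summarise the second file, a small sketch like the following prints each bond slave with its link state. It relies on the usual "Slave Interface:" and "MII Status:" lines of the Linux bonding driver's /proc output; the function name `bond_slave_status` is hypothetical.

```shell
# Print "<slave> <MII status>" for each slave listed in bonding /proc output.
# Reads the file contents on stdin.
bond_slave_status() {
  awk -F': ' '
    /^Slave Interface:/ { slave = $2 }
    # The bond-level MII Status line comes before any slave, so it is skipped.
    /^MII Status:/ && slave != "" { print slave, $2; slave = "" }
  '
}
```

Usage: `bond_slave_status < /proc/net/bonding/bond0`. A slave reported as down on any node would point at cabling or the switch port rather than at Ceph or Consul.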

If this is an attempt to install a new cluster, I recommend a fresh re-install, possibly testing first without bonds/VLANs and then again with them; that could help pinpoint where the issue is.

The switch configuration is correct; I can see the MAC addresses on all the different interfaces.

Unfortunately it is not a new install; it is a running storage system that we use for backup.

Here is the config on the server; it is the same on the other ones.

https://prnt.sc/T5Fy_tCAUmxi

Here is the switch config, if that will help:
https://prnt.sc/6LEiuBBPLpa7

Wait 30 minutes, then try to restart the monitor using the previous command.

If it does not start, post the entire mon log file.
