PetaSAN not coming online after reboot
Pages: 1 2
martinkp
10 Posts
October 19, 2022, 7:05 pm
Hi,
After I rebooted my 3 nodes, the web UI and the ceph status command are timing out.
On all 3 nodes, /opt/petasan/log/PetaSAN.log shows:
Consul is not responding, will check cluster backend ips:
19/10/2022 14:57:27 INFO Checking backend latencies :
19/10/2022 14:57:27 INFO Network latency for backend 10.31.0.11 =
19/10/2022 14:57:27 INFO Network latency for backend 10.31.0.12 = 9.9 us
19/10/2022 14:57:27 INFO Network latency for backend 10.31.0.13 =
The only latency it finds is for the node's own IP.
All the nodes can ping each other on the backend and management IPs.
They can also SSH to each other on the management IP.
After the last reboot, I created a new user and added an SSH key to that user; I can't see what that would have to do with it.
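The unreachable peers can be pulled out of the log mechanically. A rough sketch, assuming the log format shown in the excerpt above (the function name is made up for illustration):

```shell
#!/usr/bin/env bash
# Sketch: scan a PetaSAN.log for "Network latency for backend" lines whose
# value after "=" is empty, i.e. peers this node could not reach.
check_backend_latencies() {
  awk '/Network latency for backend/ {
    # the backend IP is the field right after the word "backend"
    for (i = 1; i <= NF; i++)
      if ($i == "backend") ip = $(i + 1)
    # the latency value, if any, follows the "=" sign
    n = split($0, parts, "=")
    val = (n > 1) ? parts[2] : ""
    gsub(/[ \t]/, "", val)
    if (val == "") print "unreachable: " ip
  }' "${1:-/dev/stdin}"
}
```

Run against /opt/petasan/log/PetaSAN.log on each node; with the log excerpt above it would flag 10.31.0.11 and 10.31.0.13 as unreachable while staying quiet about the node's own 10.31.0.12.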
admin
2,930 Posts
October 19, 2022, 7:37 pm
Make sure your subnets do not overlap.
Can you SSH to each node, or log in via the shell, and show the output of:
ip addr
/opt/petasan/scripts/detect-interfaces.sh
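The overlap check can also be done mechanically from the CIDR subnets shown in ip addr. A minimal sketch (the helper names are made up; bash arithmetic, IPv4 only): two subnets overlap exactly when their network bits match under the shorter prefix.

```shell
#!/usr/bin/env bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Return success (0) if the two CIDR subnets overlap.
subnets_overlap() {
  local net1=${1%/*} len1=${1#*/} net2=${2%/*} len2=${2#*/}
  local i1 i2 len mask
  i1=$(ip_to_int "$net1"); i2=$(ip_to_int "$net2")
  # Compare under the less specific prefix: if the network bits agree
  # there, one range contains (part of) the other.
  len=$(( len1 < len2 ? len1 : len2 ))
  mask=$(( len == 0 ? 0 : (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  [ $(( i1 & mask )) -eq $(( i2 & mask )) ]
}

subnets_overlap 10.31.0.0/24 10.31.0.0/16 && echo overlap || echo distinct  # prints "overlap"
subnets_overlap 10.31.0.0/24 10.32.0.0/24 && echo overlap || echo distinct  # prints "distinct"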
martinkp
10 Posts
October 19, 2022, 7:42 pm
Node01:
https://prnt.sc/haqq2hErUYxb
Node02:
https://prnt.sc/u-FwnX1A_1hG
Node03:
https://prnt.sc/AR2_pRoLaesZ
admin
2,930 Posts
October 19, 2022, 9:17 pm
If the nodes can ping each other on the backend and management networks, try manually restarting the mon service on the 3 nodes and see if ceph status starts working. If it does, that indicates that when Ceph (and Consul) started during boot, the nodes could not see each other for some reason.
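The restart step above can be scripted across the nodes. A hedged sketch: the node names are placeholders, root SSH between nodes is assumed, and the ceph-mon@HOSTNAME unit naming follows the systemd convention; the dry-run flag previews the commands without touching anything.

```shell
#!/usr/bin/env bash
# Restart the Ceph monitor on each named node, or just print the plan.
restart_mons() {
  local dry_run=$1; shift
  local host cmd
  for host in "$@"; do
    cmd="systemctl restart ceph-mon@${host}"
    if [ "$dry_run" = "yes" ]; then
      echo "would run on ${host}: ${cmd}"
    else
      ssh "$host" "$cmd"  # assumes passwordless root SSH between nodes
    fi
  done
}

restart_mons yes node01 node02 node03  # prints one "would run ..." line per node
```

After running it for real (first argument "no"), check whether `ceph status` responds again.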
martinkp
10 Posts
October 20, 2022, 4:42 pm
All nodes can ping each other on the backend and management networks.
I can't use any ceph commands; they just time out on all nodes.
I have tried restarting all 3 nodes, but that did not help.
Last edited on October 20, 2022, 4:45 pm by martinkp · #5
admin
2,930 Posts
October 21, 2022, 11:31 am
Can you post your /opt/petasan/config/cluster_info.json?
Can you check the status of the monitors on all 3 nodes: systemctl status ceph-mon@HOSTNAME
Can you try to restart the monitors on all 3 nodes: systemctl restart ceph-mon@HOSTNAME
Do you see any errors in the monitor log files in /var/log/ceph?
What is the Consul status, using the command: consul members
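When posting the consul members output, a quick summary helps too. A small sketch (the function name is made up; the column layout, with Status in the third column, is assumed from Consul's usual table format):

```shell
#!/usr/bin/env bash
# Condense `consul members` table output into an alive/down count.
summarize_members() {
  awk 'NR > 1 {                       # skip the header row
    if ($3 == "alive") alive++
    else down++
  } END {
    printf "alive=%d down=%d\n", alive + 0, down + 0
  }'
}

# usage: consul members | summarize_members
```

For a healthy 3-node cluster this should report alive=3 down=0; anything else points at the Consul layer rather than Ceph.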
martinkp
10 Posts
October 21, 2022, 2:13 pm
/opt/petasan/config/cluster_info.json:
https://prnt.sc/UWCUFAimdcl0
Monitor status after the server restarted:
https://prnt.sc/Fg-4uuOiGyjY
Monitor status after restarting the monitors:
https://prnt.sc/oIuzGT0ayYBJ
Monitor log files:
https://prnt.sc/gvFab1DL2DXv
Consul status:
https://prnt.sc/v6lAnF0aSeU1
admin
2,930 Posts
October 21, 2022, 3:42 pm
Can you double-check that the switch ports are set up correctly for the bonds and VLANs?
What is the output of:
cat /sys/class/net/bond0/bonding/slaves
cat /proc/net/bonding/bond0
If this is an attempt to install a new cluster, I recommend re-installing fresh, possibly testing first without bonds/VLANs and then again with them; that could help pinpoint where the issue is.
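The bond output can be checked mechanically as well. A sketch that flags bond slaves whose MII Status is not "up" (the function name is made up; the layout is the standard Linux bonding driver format of /proc/net/bonding/bond0):

```shell
#!/usr/bin/env bash
# Print any bond slave interface whose MII Status is not "up".
check_bond_slaves() {
  awk -F': ' '
    /^Slave Interface/ { slave = $2 }
    /^MII Status/ {
      # the first MII Status line is the bond itself, so only report
      # when we are inside a slave section
      if (slave != "" && $2 != "up") print "down: " slave
      slave = ""
    }
  ' "$1"
}

# usage: check_bond_slaves /proc/net/bonding/bond0
```

Empty output means every slave reports link up; a "down:" line would point at a cabling or switch-port problem on that interface.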
martinkp
10 Posts
October 21, 2022, 3:52 pm
The switch configuration is correct; I can see the MAC addresses on all the different interfaces.
It is unfortunately not a new install; it is a running storage cluster that we use for backup.
Here is the config on this server; it's the same on the other ones:
https://prnt.sc/T5Fy_tCAUmxi
Here is the switch config, if that will help:
https://prnt.sc/6LEiuBBPLpa7
admin
2,930 Posts
October 21, 2022, 4:31 pm
Wait 30 minutes, then try to restart the monitor with the previous command.
If it does not start, post the entire mon log file.
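Before posting the whole log, it can help to pull out the error-looking lines first. A thin sketch (the function name is made up; the path pattern follows the default Ceph log layout, with HOSTNAME as a placeholder):

```shell
#!/usr/bin/env bash
# Show the last 20 error-looking lines of a monitor log.
mon_log_errors() {
  grep -iE 'error|fail|fault' "$1" | tail -n 20
}

# usage: mon_log_errors /var/log/ceph/ceph-mon.HOSTNAME.log
```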