Node freezes when adding to cluster
shuelin@discovernet.ca
6 Posts
April 14, 2021, 9:48 pm
We have a node that freezes when an attempt is made to add it to a cluster. No kernel panic, just a straight-up freeze. The config is two 45Drives Storinators and one other Intel box.
Things we have tried:
- creating the cluster with each of the different machines first and adding the others in different orders
- using a virtual or a physical machine as the 3rd node (same result either way)
- installing PetaSAN on different physical disks
- creating an all-virtual 3-node cluster (works, no problem)
- adding the nodes together with only the management role and nothing else
- adjusting the IP ranges in case I missed an IP conflict
- any combination of the above
The only thing we can get to work is 3 virtual nodes (the fourth item above).
There are no errors, it just stops.
Has anyone else seen this or have any advice? I have 20 - 18 TB just waiting to get used and filled.
Thanks
Last edited on April 14, 2021, 9:54 pm by shuelin@discovernet.ca · #1
admin
2,930 Posts
April 16, 2021, 10:51 am
Can you try to build the cluster without specifying any OSDs or any default pools? This eliminates a lot of startup functions. Once the cluster builds, you can add the OSDs and pools manually via the management UI; it will then be easier to detect any freezing and which function triggers it.
I know you checked your network configuration, IPs and subnets (make sure the subnets do not overlap in their ranges), but these are typically the most common issues.
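For anyone who wants to double-check the subnet ranges mechanically, here is a minimal sketch using Python's standard `ipaddress` module. The network names and CIDR values are hypothetical examples, not taken from this thread; substitute the subnets you entered in the deployment wizard.

```python
import ipaddress
from itertools import combinations

def find_overlaps(subnets):
    """Return pairs of subnet names whose address ranges overlap."""
    # strict=False accepts a host address with a prefix, e.g. 10.0.1.5/24
    nets = {name: ipaddress.ip_network(cidr, strict=False)
            for name, cidr in subnets.items()}
    return [(a, b) for a, b in combinations(sorted(nets), 2)
            if nets[a].overlaps(nets[b])]

# Hypothetical values for illustration only.
example = {
    "management": "10.0.1.0/24",
    "backend": "10.0.2.0/24",
    "iscsi1": "10.0.2.128/25",  # sits inside the backend range
}
print(find_overlaps(example))  # [('backend', 'iscsi1')]
```

An empty list means no two ranges overlap; any pair that is printed should get its own non-overlapping range.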
shuelin@discovernet.ca
6 Posts
April 16, 2021, 5:20 pm
Thanks... I double-checked the IP ranges and all is good. I even verified that the MTU settings on the switch all match.
Also, I had tried building the cluster with only the management role on all 3 nodes and the issue still persists. Currently 2 of the nodes are in the cluster in the management role only, waiting for the 3rd management node, but sadly it still locks up the 3rd node.
Any other thoughts on what we could try?
admin
2,930 Posts
April 16, 2021, 8:27 pm
Can you email the file /opt/petasan/log/PetaSAN.log from all 3 nodes to contact-us @ petasan.org?
Please send the logs of all 3 nodes after an unsuccessful attempt at building the cluster in the final step on node 3; wait 15 min, then grab the logs.
Also please include the file
/opt/petasan/config/cluster_info.json
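If it helps, the two files can be packed into one archive per node before emailing. This is only a rough sketch with Python's standard library; the function name `bundle_files` and the archive naming are my own choices, not part of PetaSAN, and the two paths are the ones named above.

```python
import socket
import tarfile
import time
from pathlib import Path

# Paths named in this thread.
DEFAULT_FILES = [
    "/opt/petasan/log/PetaSAN.log",
    "/opt/petasan/config/cluster_info.json",
]

def bundle_files(paths, out_dir=".", label=None):
    """Pack the files that exist into a .tar.gz; return (archive_path, missing)."""
    label = label or socket.gethostname()  # label the archive by node name
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = Path(out_dir) / f"{label}-logs-{stamp}.tar.gz"
    missing = []
    with tarfile.open(archive, "w:gz") as tar:
        for p in map(Path, paths):
            if p.is_file():
                tar.add(p, arcname=p.name)  # store without the directory prefix
            else:
                missing.append(str(p))      # report instead of failing
    return archive, missing
```

Running `bundle_files(DEFAULT_FILES)` on each node writes an archive named after the host into the current directory and returns the list of files it could not find.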
shuelin@discovernet.ca
6 Posts
April 18, 2021, 5:51 pm
Quote from admin on April 16, 2021, 8:27 pm
can you email the file /opt/petasan/log/PetaSAN.log on all 3 nodes to contact-us @ petasan.org
please send the logs of all 3 nodes after an unsuccessful attempt building the cluster in node 3 final step, wait 15 min then grab the logs.
also please include the file
/opt/petasan/config/cluster_info.json
sent, thanks!
admin
2,930 Posts
April 19, 2021, 3:53 pm
From the logs it looks like there were previous attempts to install. Can you try once more, installing from the beginning on all 3 nodes, using the installer to install the OS and then deploy?
Then, if it gets stuck in the build of node 3, wait 15 min and send me the same files, plus the following for node 3 only:
dmesg
/var/log/syslog
Last edited on April 19, 2021, 3:54 pm by admin · #6
shuelin@discovernet.ca
6 Posts
April 21, 2021, 1:29 pm
Just sent the new logs and the syslog.
Did a clean install of all 3 nodes, management only; #3 still froze. Waited 20 mins and sent the logs.
Thanks
Shawn
admin
2,930 Posts
April 21, 2021, 8:11 pm
It is most probably a hardware issue with the 3rd node. I would recommend you try a different host if you can. Or, to verify this, switch the node order so that it becomes the second node; most probably the hang will occur there.
From the syslog, it appears to freeze during the sync of the RTC (real-time clock):
hwclock --systohc --utc
The syslog also shows a reboot after this freeze, but I am not sure whether you did that or the system did it by itself.
Last edited on April 21, 2021, 8:21 pm by admin · #8
shuelin@discovernet.ca
6 Posts
April 21, 2021, 8:32 pm
Hello, I did the reboot to get the logs. I was not able to access the node to grab the logs after the attempt to add it to the cluster.
I have tried 3 different boxes as the 3rd node (2 physical and 1 virtual) and got the same thing: the 3rd node locks up. When we used 3 all-virtual nodes we could create the cluster, which made me think the issue could be with the 2 other nodes.
admin
2,930 Posts
April 21, 2021, 10:12 pm
Can you run the hwclock command above via SSH and see if it is causing the lock?
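One possible way to run that test so a hang is easier to tell apart from an ordinary error is to wrap the command in a timeout. The sketch below uses Python's `subprocess` module; the helper name `run_with_timeout` and the 30-second limit are assumptions for illustration. Note that if the whole machine locks up, the script freezes with it, so a timeout only helps when the command alone stalls.

```python
import subprocess

def run_with_timeout(cmd, timeout_s=30):
    """Run cmd; return (status, detail) with status 'ok', 'failed' or 'timeout'."""
    try:
        done = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # The child is killed by subprocess.run before this is raised.
        return "timeout", f"no result after {timeout_s}s"
    except OSError as exc:
        # For example, the binary is not installed or not executable.
        return "failed", str(exc)
    if done.returncode != 0:
        return "failed", done.stderr.strip()
    return "ok", done.stdout.strip()
```

On the affected node, as root, the call would be `run_with_timeout(["hwclock", "--systohc", "--utc"])`. Printing a line with `flush=True` just before the call means the SSH session shows how far things got if the node stops responding.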