Fresh install of 2.4, cluster build seems to hang before creating any OSD
deweyhylton
14 Posts
January 13, 2020, 11:13 pm
3 nodes, each with 24 cores and 256 GB memory. Fresh install of 2.4 on all three.
Start with node 1: provide the initial cluster information and continue until it tells me it needs another node.
Continue with node 2: supply the information to join node 1 in the cluster, and continue until it tells me it needs another node.
Continue with node 3: join node 1 in the cluster, and continue until it tells me not to close the browser window while the cluster is being deployed.
... and then nothing, for 3 hours. Looking at /opt/petasan/log/PetaSAN.log on all 3 nodes, I see this:
node 1:
14/01/2020 01:46:56 INFO str_start_command: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> consul agent -raft-protocol 2 -config-dir /opt/petasan/config/etc/consul.d/server -bind 10.149.110.4 -retry-join 10.149.110.5 -retry-join 10.149.110.6
node 2:
14/01/2020 01:46:57 INFO str_start_command: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> consul agent -raft-protocol 2 -config-dir /opt/petasan/config/etc/consul.d/server -bind 10.149.110.5 -retry-join 10.149.110.4 -retry-join 10.149.110.6
node 3:
14/01/2020 01:47:09 INFO Starting to deploy remote monitors
lsblk shows that the disks have not yet been partitioned. What should I be looking for here? We have verified connectivity on the management network and both backend networks via ping.
admin
2,930 Posts
January 14, 2020, 11:23 am
Hard to say. It seems the system froze while creating the Ceph monitors.
I recommend you re-check your connections and re-install.
If you still have issues, you can email us the following from all 3 nodes:
/opt/petasan/log/PetaSAN.log
Also, if they are not too large:
/var/log/syslog
dmesg
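One way to gather those three items on each node is sketched below. It assumes a root shell and standard tar/gzip on the node; the function name collect_logs and the archive name are made up for illustration and are not part of PetaSAN.

```shell
# Bundle the requested diagnostics from one node into a single archive.
# Usage: collect_logs [output-dir]   (run as root on each of the 3 nodes)
# collect_logs and the archive name are hypothetical, not PetaSAN tooling.
collect_logs() {
  out=${1:-/tmp}
  work=$(mktemp -d) || return 1
  # Copy each log if it is readable; a missing file is skipped, not fatal
  for f in /opt/petasan/log/PetaSAN.log /var/log/syslog; do
    [ -r "$f" ] && cp "$f" "$work/"
  done
  # Save the kernel ring buffer; tolerate failure when not running as root
  dmesg > "$work/dmesg.txt" 2>/dev/null || true
  archive="$out/petasan-diag-$(uname -n).tar.gz"
  # Print the archive path on success so the caller knows what to send
  tar -czf "$archive" -C "$work" . && echo "$archive"
  rm -rf "$work"
}
```

Running `collect_logs /root` on each node should leave one archive per host name to attach to the email.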
deweyhylton
14 Posts
January 14, 2020, 7:33 pm
The issue appeared to be related to jumbo frame support. This was a redeployment in conjunction with a datacenter move and IP changes. I re-learned the hard way that I need to triple-check behind our network team. Them saying things are done does not necessarily equate to the truth ... sorry for the noise.
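For anyone who lands here with the same symptom: a plain ping sends small packets, so it can succeed even when jumbo frames are not passing end to end. A rough sketch of a stricter check follows. It assumes Linux iputils ping, and an MTU of 9000 on the backend interfaces (the thread does not state the actual value). The helper names icmp_payload and check_mtu are made up for illustration.

```shell
# Largest ICMP payload that fits in one unfragmented IPv4 packet:
# MTU minus the 20-byte IPv4 header and the 8-byte ICMP header.
icmp_payload() { echo $(( $1 - 28 )); }

# Ping each peer with the "don't fragment" bit set (-M do) at full frame size.
# Usage: check_mtu <mtu> <peer-ip>...
check_mtu() {
  mtu=$1; shift
  for ip in "$@"; do
    if ping -M do -c 3 -W 2 -s "$(icmp_payload "$mtu")" "$ip" >/dev/null 2>&1; then
      echo "$ip: OK at MTU $mtu"
    else
      echo "$ip: FAILED at MTU $mtu"
    fi
  done
}

# Example from node 1, assuming MTU 9000 (payload 9000 - 28 = 8972):
# check_mtu 9000 10.149.110.5 10.149.110.6
```

If the check fails at 9000 but passes at 1500, some NIC or switch port in the path is probably not configured for jumbo frames. Repeat it from every node, on each backend network.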