setup freeze on step 6
itw
3 Posts
May 1, 2017, 4:50 pm
Hello,
I have a problem setting up the 3rd node: it freezes on step 6. I waited 24 hours and the cluster would not create.
My configuration:
4x HP DL180 G6 - 32 GB RAM, 8 TB RAID 5, system disk 128 GB RAID 1 SSD, 2x 1 Gb Ethernet on board, 4x 1 Gb Ethernet PCIe card
eth0 - onboard card - Management Subnet - 10.0.10.x/23
eth1 - onboard card - empty (waiting for v1.3 LACP)
eth2 - PCIe card - iSCSI1 Subnet 10.0.20.x/24 - Backend 1 Subnet 10.0.22.x/24
eth3 - PCIe card - empty (waiting for v1.3 LACP)
eth4 - PCIe card - iSCSI2 Subnet 10.0.21.x/24 - Backend 2 Subnet 10.0.23.x/24
eth5 - PCIe card - empty (waiting for v1.3 LACP)
All nodes can ping each other.
Thanks
admin
2,918 Posts
May 2, 2017, 6:38 am
Generally, the time taken to build is mostly the time to partition and format the available disks.
1) To help identify the issue, please gather the following files and directory from the first 3 nodes (using WinSCP, for example):
/opt/petasan/log/PetaSAN.log
/opt/petasan/config/cluster_info.json
/opt/petasan/jobs (directory)
As well as the output of the following command
ceph-disk list
( To run the command, you can ssh to the node and redirect its output, for example:
ceph-disk list > result.txt )
Please zip all the files and email them to me at admin @ petasan.org (example collection commands below, after point 3).
2) Also, when you do your ping test, make sure the nodes can ping each other on all 3 static subnets: Management, Backend 1 and Backend 2 (a ping check is included in the sketch below).
3) Although this is not related to the issue, in Ceph it is better not to use RAID 5; use individual disks as JBOD or RAID 0 instead.
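For reference, a rough sketch of how the collection and the ping check could be run on each node over ssh. The file paths are the ones listed above; the archive name and the IP addresses are just placeholders, substitute the addresses of another node on your own Management / Backend 1 / Backend 2 subnets:
# gather the requested files plus the ceph-disk output into one archive
ceph-disk list > /root/result.txt
tar czf /root/petasan-debug.tar.gz /opt/petasan/log/PetaSAN.log /opt/petasan/config/cluster_info.json /opt/petasan/jobs /root/result.txt
# quick connectivity test against another node on each static subnet (example IPs)
for ip in 10.0.10.12 10.0.22.12 10.0.23.12; do ping -c 3 "$ip"; done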
itw
3 Posts
May 3, 2017, 9:59 pm
Hello,
Collected and sent to the posted email.
Thanks
admin
2,918 Posts
May 4, 2017, 9:51 am
Approximately 1 minute after the start of Step 6 (building the cluster) on sk-itw-ps-003, the Ceph monitors and the Consul leaders were up, but then something caused a reboot while the OSD was being prepared with the ceph-disk prepare command (a Ceph CLI command). This reboot broke the build stage.
01/05/2017 21:48:20 INFO Consul leaders are ready
01/05/2017 21:48:58 INFO Ceph monitors are ready.
01/05/2017 21:48:58 INFO Start deploying ceph OSDs.
* Problem here, a reboot occurred while executing ceph-disk prepare
01/05/2017 21:54:55 INFO Start settings IPs
We are not sure whether this was related to the command itself or to something external.
What I suggest is to re-install from the ISO and, for all 3 nodes, un-check the "Local Storage Service" in Step 5; this skips building the OSDs at the deployment stage. The cluster should then build in about 5 minutes. If all goes well and you have a running cluster, go to the Node List and re-add the "Local Storage" role to one of the nodes, then go to the Physical Disk List for that node and add the disk from there. If there is still a problem, we can help you trace the ceph-disk prepare command in more detail. If all is well, then maybe the reboot came from something else?
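If the reboot happens again, one way to confirm when the node rebooted and whether anything was logged just before it (assuming the standard Ubuntu tooling shipped on the node) would be something like:
# show the most recent reboot/shutdown records
last -x reboot shutdown | head
# kernel messages from the previous boot, if a persistent journal is enabled
journalctl -k -b -1 | tail -n 100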
itw
3 Posts
May 8, 2017, 9:25 pm
Hello,
I reinstalled all nodes and set them up without the "Local Storage Service". The cluster built in about 3 minutes and I can now log in to the web management. 🙂
I did all the other things you wrote. Everything is working and I can now start testing. So far it looks very nice.
Thanks
admin
2,918 Posts
May 9, 2017, 10:29 am
Glad you like it 🙂
2 things:
-You need to have at least 3 OSDs (in total) on separate nodes to create iSCSI disks and do I/O operations (see the quick check below).
-The earlier reboot was most likely unrelated to the PetaSAN deployment, but if you do get a chance to repeat the earlier setup, I'd be interested to know this for sure.
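Once the OSDs are added, a quick sanity check with the standard Ceph commands (nothing PetaSAN-specific) is to confirm they are up and spread across different hosts:
# overall cluster health and OSD count
ceph status
# OSD placement per host
ceph osd tree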
clsaad
8 Posts
May 17, 2017, 1:03 pm
I have the same issue, but only from the 3rd node onwards (4th, 5th, etc.) and only with version 1.2.2.
If I reinstall all nodes with 1.2.1 it works fine.
admin
2,918 Posts
May 17, 2017, 2:14 pm
Hello,
Can you retry the re-install using 1.2.2?
If it still fails, can you please send the files listed above.
milton
8 Posts
May 18, 2017, 8:30 pm
Hello, I got stuck on the 3rd node; it gives a blank error. I'm sending the files by email.
admin
2,918 Posts
May 19, 2017, 7:43 am
Hi,
On your first node petasan1/192.168.16.22, do you have any disks other than the system disk?
Can you please run this command on this node:
ceph-disk list > result
and send the result output file
If you do not have any disks then you need to uncheck the "Local Storage Service" when you first deploy the node.
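For reference, a quick way to list what disks the node has besides the system disk (standard Linux command, not PetaSAN-specific):
# block devices with size, type and mount point; the system disk is the one carrying /
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT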