
setup freeze on step 6


Hello,

I have a problem while setting up the 3rd node: it freezes on step 6. I waited 24 hours and the cluster still would not create.

My configuration:

4x HP DL180 G6 - 32 GB RAM, 8 TB RAID 5, system disk 128 GB RAID 1 SSD, 2x 1 Gb Ethernet onboard, 4x 1 Gb Ethernet PCIe card

eth0 - onboard card - Management Subnet - 10.0.10.x/23

eth1 - onboard card - empty (waiting for v1.3 LACP)

eth2 - PCIe card - iSCSI1 Subnet 10.0.20.x/24 - Backend 1 Subnet 10.0.22.x/24

eth3 - PCIe card - empty (waiting for v1.3 LACP)

eth4 - PCIe card - iSCSI2 Subnet 10.0.21.x/24 - Backend 2 Subnet 10.0.23.x/24

eth5 - PCIe card - empty (waiting for v1.3 LACP)

All nodes can ping each other.

Thanks

Generally the time taken to build is mostly the time to partition and format the available disks.

1) To help identify the issue: on the first 3 nodes, please gather the following files and directory (using WinSCP, for example):

/opt/petasan/log/PetaSAN.log
/opt/petasan/config/cluster_info.json
/opt/petasan/jobs (directory)

As well as the output of the following command:
ceph-disk list
(To run the command you can ssh to the node and redirect its output, such as:
ceph-disk list > result.txt )
Please zip all the files and send me an email at admin @ petasan.org
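
If you prefer the command line to WinSCP, a minimal sketch from a workstation would look something like the following, where 10.0.10.11 is a placeholder for a node's management IP and root login is assumed:

# copy the log, config and jobs directory off the node (placeholder IP)
scp root@10.0.10.11:/opt/petasan/log/PetaSAN.log node1-PetaSAN.log
scp root@10.0.10.11:/opt/petasan/config/cluster_info.json node1-cluster_info.json
scp -r root@10.0.10.11:/opt/petasan/jobs node1-jobs
# run ceph-disk on the node and capture its output locally
ssh root@10.0.10.11 "ceph-disk list" > node1-result.txt

Repeat per node, then zip the collected files.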

2) Also, when you do your ping test, make sure the nodes can ping each other on all 3 static subnets: Management, Backend 1, Backend 2 (a sketch follows below).
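
For example, from each node you could loop over a peer's address on each of the 3 subnets; the IPs below are placeholders based on your subnet layout:

# run on each node; substitute the peer's actual Management / Backend 1 / Backend 2 IPs
for ip in 10.0.10.12 10.0.22.12 10.0.23.12; do
    ping -c 3 $ip
done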

3) Although this is not related to the issue: in Ceph it is better not to use RAID 5; rather, expose the disks individually as JBOD or single-disk RAID 0.

Hello,

Collected and sent to the posted email.

Thanks

Approximately 1 minute after the start of Step 6 (building the cluster on sk-itw-ps-003), the Ceph monitors and the Consul leaders were up, but then something caused a reboot while the OSD was being prepared with the ceph-disk prepare command (a Ceph CLI command). This reboot broke the build stage.

01/05/2017 21:48:20 INFO     Consul leaders are ready

01/05/2017 21:48:58 INFO     Ceph monitors are ready.

01/05/2017 21:48:58 INFO     Start deploying ceph OSDs.

* Problem here: a reboot occurred while executing ceph-disk prepare

01/05/2017 21:54:55 INFO     Start settings IPs

We are not sure if this was related to the command itself or to an external cause; the sketch below may help narrow it down.
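
To check whether the reboot was a clean shutdown or a crash/power event, standard Linux tools on the affected node should help (log paths may vary; /var/log/kern.log assumes an Ubuntu-based build):

# recent reboots/shutdowns recorded in wtmp
last -x shutdown reboot | head
# kernel messages hinting at a crash around that time
grep -i -E "panic|oops|watchdog" /var/log/kern.log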

What I suggest is to re-install from the ISO and, for all 3 nodes, un-check the "Local Storage Service" in Step 5; this will omit building the OSDs at the deployment stage. The cluster should then build in about 5 minutes. If all goes well and you have a running cluster, go to the Node List and re-add the "Local Storage" role to one of the nodes, then go to the Physical Disk List for that node and add the disk from there. If there is still a problem, we can trace the ceph-disk prepare command in more detail with you (a sketch follows below). If all is well, then maybe the reboot came from something else?
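
Should the disk add fail again, one way to trace it is to run the prepare step manually on the node with verbose output; /dev/sdb below is a placeholder for your data disk, and note this will partition and format it:

# WARNING: wipes the target disk; double-check the device name first
ceph-disk -v prepare /dev/sdb

Then watch /opt/petasan/log/PetaSAN.log and the console output for where it fails.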

Hello,

I reinstalled all nodes and configured the setup without "Local Storage Service". The cluster built in about 3 minutes, and now I can log in to the web management. 🙂

I did all the other things you wrote. Everything is working, and I can now start testing. For now it's looking very nice.

Thanks

Glad you like it 🙂

2 things:

- You need to have at least 3 OSDs (in total) on separate nodes to create iSCSI disks and do I/O operations (see the sketch after this list).

- The earlier reboot was most likely unrelated to the PetaSAN deployment, but if you do get a chance to repeat the earlier setup, I'd be interested to know for sure.
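
Once the OSDs are added, a quick way to confirm their count and placement from the shell of any node:

# OSD tree grouped by host; you want 3 OSDs spread over 3 hosts
ceph osd tree
# overall health summary, including the OSD count
ceph -s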

I have the same issue, but only with the 3rd node or later (4th, 5th, etc.), and only with version 1.2.2.

If I reinstall all nodes with 1.2.1, it works fine.

Hello,

Can you retry the re-install using 1.2.2?

If it still fails, can you please send the files listed above.

Hello, I got stuck on the 3rd node; it gives a blank error. I'm sending the files in the email.

Hi,

On your first node, petasan1/192.168.16.22, do you have any disks other than the system disk?

Can you please run this command on this node:

ceph-disk list > result

and send the result output file

If you do not have any other disks, then you need to uncheck the "Local Storage Service" when you first deploy the node.
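
As a cross-check, lsblk will also list the physical disks on the node (standard Linux, not PetaSAN-specific):

# physical disks only, no partitions
lsblk -d -o NAME,SIZE,TYPE,MODEL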
