
setup freeze on step 6


Hello,

I have a problem while setting up the 3rd node: it freezes on step 6. I waited 24 hours and the cluster still would not create.

My configuration:

4x HP DL180 G6 - 32 GB RAM, 8 TB RAID 5, system disk 128 GB RAID 1 SSD, 2x 1 Gb Ethernet onboard, 4x 1 Gb Ethernet PCIe card

eth0 - onboard card - Management Subnet - 10.0.10.x/23

eth1 - onboard card - empty (waiting for v1.3 LACP)

eth2 - PCIe card - iSCSI1 Subnet 10.0.20.x/24 - Backend 1 Subnet 10.0.22.x/24

eth3 - PCIe card - empty (waiting for v1.3 LACP)

eth4 - PCIe card - iSCSI2 Subnet 10.0.21.x/24 - Backend 2 Subnet 10.0.23.x/24

eth5 - PCIe card - empty (waiting for v1.3 LACP)

All nodes can ping each other.

Thanks

Generally the time taken to build is mostly the time to partition and format the available disks.

1) To help identify the issue: on the first 3 nodes, please gather the following files and directory (using WinSCP, for example):

/opt/petasan/log/PetaSAN.log
/opt/petasan/config/cluster_info.json
/opt/petasan/jobs (directory)

As well as the output of the following command:
ceph-disk list
(To run the command you can ssh to the node and redirect its output, such as:
ceph-disk list > result.txt )
Please zip all the files and send me an email at admin @ petasan.org
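
If you prefer the command line to WinSCP, a minimal sketch from a workstation would look something like the following, where 10.0.10.11 is a placeholder for a node's management IP and root login is assumed:

# copy the log, config and jobs directory off the node (placeholder IP)
scp root@10.0.10.11:/opt/petasan/log/PetaSAN.log node1-PetaSAN.log
scp root@10.0.10.11:/opt/petasan/config/cluster_info.json node1-cluster_info.json
scp -r root@10.0.10.11:/opt/petasan/jobs node1-jobs
# run ceph-disk on the node and capture its output locally
ssh root@10.0.10.11 "ceph-disk list" > node1-result.txt

Repeat per node, then zip the collected files.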

2) Also, when you do your ping test, make sure the nodes can ping each other on all 3 static subnets: Management, Backend 1, Backend 2 (a sketch follows below).
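
For example, from each node you could loop over a peer's address on each of the 3 subnets; the IPs below are placeholders based on your subnet layout:

# run on each node; substitute the peer's actual Management / Backend 1 / Backend 2 IPs
for ip in 10.0.10.12 10.0.22.12 10.0.23.12; do
    ping -c 3 $ip
done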

3) Although this is not related to the issue: in Ceph it is better not to use RAID 5; rather, expose the disks individually as JBOD or single-disk RAID 0.

Hello,

Collected and sent to the posted email.

Thanks

Approximately 1 minute after the start of Step 6 (building the cluster on sk-itw-ps-003), the Ceph monitors and the Consul leaders were up, but then something caused a reboot while the OSD was being prepared with the ceph-disk prepare command (a Ceph CLI command). This reboot broke the build stage.

01/05/2017 21:48:20 INFO     Consul leaders are ready

01/05/2017 21:48:58 INFO     Ceph monitors are ready.

01/05/2017 21:48:58 INFO     Start deploying ceph OSDs.

* Problem here: a reboot occurred while executing ceph-disk prepare

01/05/2017 21:54:55 INFO     Start settings IPs

We are not sure if this was related to the command itself or to an external cause; the sketch below may help narrow it down.
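
To check whether the reboot was a clean shutdown or a crash/power event, standard Linux tools on the affected node should help (log paths may vary; /var/log/kern.log assumes an Ubuntu-based build):

# recent reboots/shutdowns recorded in wtmp
last -x shutdown reboot | head
# kernel messages hinting at a crash around that time
grep -i -E "panic|oops|watchdog" /var/log/kern.log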

What I suggest is to re-install from the ISO and, for all 3 nodes, un-check the "Local Storage Service" in Step 5; this will omit building the OSDs at the deployment stage. The cluster should then build in about 5 minutes. If all goes well and you have a running cluster, go to the Node List and re-add the "Local Storage" role to one of the nodes, then go to the Physical Disk List for that node and add the disk from there. If there is still a problem, we can trace the ceph-disk prepare command in more detail with you (a sketch follows below). If all is well, then maybe the reboot came from something else?
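
Should the disk add fail again, one way to trace it is to run the prepare step manually on the node with verbose output; /dev/sdb below is a placeholder for your data disk, and note this will partition and format it:

# WARNING: wipes the target disk; double-check the device name first
ceph-disk -v prepare /dev/sdb

Then watch /opt/petasan/log/PetaSAN.log and the console output for where it fails.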

Hello,

I reinstalled all nodes and configured the setup without "Local Storage Service". The cluster built in about 3 minutes, and now I can log in to the web management. 🙂

I did all the other things you wrote. Everything is working, and I can now start testing. For now it's looking very nice.

Thanks

Glad you like it 🙂

2 things:

- You need to have at least 3 OSDs (in total) on separate nodes to create iSCSI disks and do I/O operations (see the sketch after this list).

- The earlier reboot was most likely unrelated to the PetaSAN deployment, but if you do get a chance to repeat the earlier setup, I'd be interested to know for sure.
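
Once the OSDs are added, a quick way to confirm their count and placement from the shell of any node:

# OSD tree grouped by host; you want 3 OSDs spread over 3 hosts
ceph osd tree
# overall health summary, including the OSD count
ceph -s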

I have the same issue, but only with the 3rd node or later (4th, 5th, etc.), and only with version 1.2.2.

If I reinstall all nodes with 1.2.1, it works fine.

Hello,

Can you retry the re-install using 1.2.2?

If it still fails, can you please send the files listed above.

Hello, I got stuck on the 3rd node; it gives a blank error. I'm sending the files in the email.

Hi,

On your first node, petasan1/192.168.16.22, do you have any disks other than the system disk?

Can you please run this command on this node:

ceph-disk list > result

and send the result output file

If you do not have any other disks, then you need to uncheck the "Local Storage Service" when you first deploy the node.
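
As a cross-check, lsblk will also list the physical disks on the node (standard Linux, not PetaSAN-specific):

# physical disks only, no partitions
lsblk -d -o NAME,SIZE,TYPE,MODEL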
