
Adding more OSDs than configured


Hi all, I set up a 3-node PetaSAN 2.0.0 cluster with a total of 13 OSDs. At initial configuration time I chose "up to 15 OSDs". Now I'm trying to add a fourth host with 5 OSDs, but it doesn't work; I suppose it is because of the OSD limit. Here is the status:

root@petatest01:~# ceph osd tree  --cluster=petasan
ID CLASS WEIGHT  TYPE NAME           STATUS REWEIGHT PRI-AFF
-1       4.74617 root default
-5       1.36345     host petatest01
3   hdd 0.27269         osd.3           up  1.00000 1.00000
4   hdd 0.27269         osd.4           up  1.00000 1.00000
5   hdd 0.27269         osd.5           up  1.00000 1.00000
6   hdd 0.27269         osd.6           up  1.00000 1.00000
7   hdd 0.27269         osd.7           up  1.00000 1.00000
-7       1.36345     host petatest02
8   hdd 0.27269         osd.8           up  1.00000 1.00000
9   hdd 0.27269         osd.9           up  1.00000 1.00000
10   hdd 0.27269         osd.10          up  1.00000 1.00000
11   hdd 0.27269         osd.11          up  1.00000 1.00000
12   hdd 0.27269         osd.12          up  1.00000 1.00000
-3       0.20009     host petatest03
0   hdd 0.06670         osd.0           up  1.00000 1.00000
1   hdd 0.06670         osd.1           up  1.00000 1.00000
2   hdd 0.06670         osd.2           up  1.00000 1.00000
-9       1.81918     host petatest04
13   hdd 0.90959         osd.13        down        0 1.00000
14   hdd 0.90959         osd.14        down        0 1.00000

Is there a way to fix this manually, or do I have to scrap everything and re-install the cluster?

Thanks and bye, S.

 

 

The 15 OSDs value is not a hard limit; it is only used to tune the ideal PG count. Things should work fine if you add more OSDs/nodes.
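For background, the usual Ceph rule of thumb is pg_num ≈ (number of OSDs × 100) / replica count, rounded up to a power of two. If you ever want to check or raise it per pool you can do it from the CLI (the pool name and the value 512 below are only placeholders for illustration, and raising pg_num triggers data movement):

# check the current pg_num of a pool
ceph osd pool get <pool> pg_num --cluster petasan
# raise pg_num and pgp_num if needed, e.g. to the next power of two
ceph osd pool set <pool> pg_num 512 --cluster petasan
ceph osd pool set <pool> pgp_num 512 --cluster petasan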

I understand that when you added OSDs 13/14 they came up as down, is this correct? Does a reboot of the node not fix it?

If you try to start the OSDs manually via the command line / SSH, what error do you get?

/usr/lib/ceph/ceph-osd-prestart.sh --cluster petasan --id 13
/usr/bin/ceph-osd -f --cluster petasan --id 13 --setuser ceph --setgroup ceph
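If the node is running the standard systemd units for the OSDs (an assumption about your setup), you can also check and start a down OSD that way:

# check why the OSD unit is down, then try starting it
systemctl status ceph-osd@13
systemctl start ceph-osd@13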

Can you see any errors in:

/opt/petasan/log/ceph-disk.log
/var/log/ceph/ceph-osd.13.log
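For example, something like this to pull out the recent errors (illustrative commands only):

tail -n 100 /opt/petasan/log/ceph-disk.log
grep -iE 'error|fail' /var/log/ceph/ceph-osd.13.log | tail -n 50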

Yesterday I re-installed the PetaSAN software on host #4 (petatest04), so today I removed OSDs 13 and 14 and host petatest04 from the cluster and started the "join existing cluster" procedure again.

This time 3 OSDs (out of the 5 available on the node) were successfully added and came up:

root@petatest01:~# ceph osd tree  --cluster=petasan
ID CLASS WEIGHT  TYPE NAME           STATUS REWEIGHT PRI-AFF
-1       4.97336 root default
-5       1.36299     host petatest01
3   hdd 0.27299         osd.3           up  1.00000 1.00000
4   hdd 0.27299         osd.4           up  1.00000 1.00000
5   hdd 0.27299         osd.5           up  1.00000 1.00000
6   hdd 0.27299         osd.6           up  1.00000 1.00000
7   hdd 0.27299         osd.7           up  1.00000 1.00000
-7       1.36299     host petatest02
8   hdd 0.27299         osd.8           up  1.00000 1.00000
9   hdd 0.27299         osd.9           up  1.00000 1.00000
10   hdd 0.27299         osd.10          up  1.00000 1.00000
11   hdd 0.27299         osd.11          up  1.00000 1.00000
12   hdd 0.27299         osd.12          up  1.00000 1.00000
-3       0.20000     host petatest03
0   hdd 0.06699         osd.0           up  1.00000 1.00000
1   hdd 0.06699         osd.1           up  1.00000 1.00000
2   hdd 0.06699         osd.2           up  1.00000 1.00000
-9       2.04738     host petatest04
13   hdd 0.90959         osd.13          up  1.00000 1.00000
14   hdd 0.90959         osd.14          up  1.00000 1.00000
15   hdd 0.22820         osd.15          up  1.00000 1.00000

but after the 3rd OSD there was a problem: the 3 new OSDs went down and a recovery started:

13   hdd 0.90959         osd.13        down        0 1.00000
14   hdd 0.90959         osd.14        down        0 1.00000
15   hdd 0.22820         osd.15        down        0 1.00000

 

[screenshot: pg_status]

After this, the new host was no longer reachable on either the management or the backend IPs. After one hour the "Final deployment stage" was still running, but it had obviously hung. I then forced a power cycle, and now the 4th node is online and the 3 newly added OSDs are up, but there are still 2 more OSDs to add. If I browse to http://<IP>:5001 the wizard appears; is it safe to run it again to add the other 2 OSDs?

Thanks, S.

 

 

Can you check that you can ping on all subnets from node 4 to the cluster and vice versa?
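For example, from node 4 (the IPs below are placeholders for node 1's management and backend addresses):

ping -c 3 <node1-management-ip>
ping -c 3 <node1-backend1-ip>
ping -c 3 <node1-backend2-ip>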

Do you have enough RAM?

Do you see any errors in /opt/petasan/log/PetaSAN.log?

If you try to start a down OSD manually as per my previous post, what errors do you get on the console?

You can re-run the wizard. If, while it is running, you also run the atop command, do you see any resource issues with RAM/CPU/disks?
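For example (the refresh interval of 2 seconds is arbitrary, just an illustration):

# watch CPU/memory/disk pressure while the wizard runs, refreshing every 2 seconds
atop 2
# quick one-shot check of free memory
free -m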

 

 

Quote from admin on May 4, 2018, 11:08 am

Can you check that you can ping on all subnets from node 4 to the cluster and vice versa?

After the power cycle, all network connections are OK. All 4 nodes and 16 OSDs are up, the recovery has ended and all PGs are active+clean.

 

Do you have enough RAM?

Nodes #1, #2 and #4 have 4 GB, node #3 has 16 GB of RAM. Is this enough? The management nodes are #1, #2 and #3.

 

Do you see any errors in /opt/petasan/log/PetaSAN.log?

Right now only node #1 shows this recurring error (no errors on the other nodes):

04/05/2018 14:47:03 ERROR    Error during process.
04/05/2018 14:47:03 ERROR    [Errno 12] Cannot allocate memory
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/PetaSAN/backend/iscsi_service.py", line 98, in start     self.__process()
File "/usr/lib/python2.7/dist-packages/PetaSAN/backend/iscsi_service.py", line 132, in __process    while self.__do_process() != True:
File "/usr/lib/python2.7/dist-packages/PetaSAN/backend/iscsi_service.py", line 193, in __do_process    self.__clean_unused_rbd_images()
File "/usr/lib/python2.7/dist-packages/PetaSAN/backend/iscsi_service.py", line 374, in __clean_unused_rbd_images    rbd_images = ceph_api.get_mapped_images(pool)
File "/usr/lib/python2.7/dist-packages/PetaSAN/core/ceph/api.py", line 481, in get_mapped_images    out ,err = cmd.exec_command("rbd --cluster {}  showmapped".format(cluster_name))
File "/usr/lib/python2.7/dist-packages/PetaSAN/core/common/cmd.py", line 39, in exec_command    p = subprocess.Popen(cmd,shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE, stdin=subprocess.PIPE)
File "/usr/lib/python2.7/subprocess.py", line 711, in __init__    errread, errwrite)
File "/usr/lib/python2.7/subprocess.py", line 1235, in _execute_child    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
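For reference, [Errno 12] from os.fork() points to the node simply running out of memory at that moment; a quick way to check memory pressure is (generic Linux commands, just a sketch):

# how much RAM and swap is free right now
free -m
# has the kernel OOM killer fired recently?
dmesg | grep -iE 'out of memory|oom'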

 

If you try to start a down OSD manually as per my previous post, what errors do you get on the console?

They started automatically at node reboot.

 

You can re-run the wizard. If, while it is running, you also run the atop command, do you see any resource issues with RAM/CPU/disks?

As soon as I fill in the "Management node IP to join" field and click "Next", I get an error: "Error joining node to cluster." This seems quite reasonable, as the host is already in the cluster, even though not all of its OSDs are configured.

root@petatest02:~# ceph osd tree  --cluster=petasan
ID CLASS WEIGHT  TYPE NAME           STATUS REWEIGHT PRI-AFF
-1       4.97336 root default
-5       1.36299     host petatest01
3   hdd 0.27299         osd.3           up  1.00000 1.00000
4   hdd 0.27299         osd.4           up  1.00000 1.00000
5   hdd 0.27299         osd.5           up  1.00000 1.00000
6   hdd 0.27299         osd.6           up  1.00000 1.00000
7   hdd 0.27299         osd.7           up  1.00000 1.00000
-7       1.36299     host petatest02
8   hdd 0.27299         osd.8           up  1.00000 1.00000
9   hdd 0.27299         osd.9           up  1.00000 1.00000
10   hdd 0.27299         osd.10          up  1.00000 1.00000
11   hdd 0.27299         osd.11          up  1.00000 1.00000
12   hdd 0.27299         osd.12          up  1.00000 1.00000
-3       0.20000     host petatest03
0   hdd 0.06699         osd.0           up  1.00000 1.00000
1   hdd 0.06699         osd.1           up  1.00000 1.00000
2   hdd 0.06699         osd.2           up  1.00000 1.00000
-9       2.04738     host petatest04
13   hdd 0.90959         osd.13          up  1.00000 1.00000
14   hdd 0.90959         osd.14          up  1.00000 1.00000
15   hdd 0.22820         osd.15          up  1.00000 1.00000

I think I'll try the manual procedure to add the remaining 2 disks.
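For reference, a bare ceph-disk based add would look roughly like this (just a sketch assuming the Luminous ceph-disk tool; /dev/sdf is a placeholder device name, to be verified against the PetaSAN docs before running anything):

# prepare the raw disk as an OSD for the petasan cluster, then activate its data partition
ceph-disk prepare --cluster petasan /dev/sdf
ceph-disk activate /dev/sdf1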

Bye, S.

4 GB is not enough for 5 OSDs, please see the hardware recommendation guide.

It probably worked initially when you had no data, but when you tried to add node 4 with 5 OSDs this triggered a rebalance and the node could not handle it. Even if the cluster is working now, it may be stressed again if a node dies and recovery needs to happen.
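As a rough way to see what the OSD daemons are actually consuming on a node (resident set size in kB, illustrative command):

# resident memory of each ceph-osd process
ps -C ceph-osd -o pid,rss,cmd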

OK, thank you for the hint, I'll definitely take it into account when planning the production cluster.

Anyway, I think I now have another issue: when the fourth OSD is added, networking stops. I suspect an IRQ-sharing problem between the SATA controller and the PCI slot where the 2nd network card is installed.
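For anyone hitting the same thing, a minimal way to check interrupt sharing is (generic Linux commands, nothing PetaSAN-specific):

# see which devices sit on the same interrupt line
cat /proc/interrupts
# list controllers/NICs together with the IRQs they were assigned
lspci -v | grep -iE '^[0-9a-f]|irq'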

For your second issue, "networking is stopped": I am not sure what this means in detail, but as long as your IPs and network are set up correctly and you do not have severe resource issues (which you seem to have), things should be OK.

Quote from admin on May 4, 2018, 3:02 pm

For your second issue, "networking is stopped": I am not sure what this means in detail ...

I figured out where the issue was. The node #4 motherboard has 6 SATA ports, but only 4 of them are independent; ports #5 and #6 share their IRQ with other devices, like the network card. This caused networking to stop whenever a disk on port #5 was accessed. So I reduced the number of disks from 6 to 4 (1 for the OS + 3 OSDs) and node #4 has now been successfully added to the cluster. The join procedure kept failing during disk preparation, though, so I manually deleted and re-added the 3 OSDs and the status is finally healthy.

ID CLASS WEIGHT  TYPE NAME           STATUS REWEIGHT PRI-AFF
-1       5.65475 root default
-5       1.36299     host petatest01
3   hdd 0.27299         osd.3           up  1.00000 1.00000
4   hdd 0.27299         osd.4           up  1.00000 1.00000
5   hdd 0.27299         osd.5           up  1.00000 1.00000
6   hdd 0.27299         osd.6           up  1.00000 1.00000
7   hdd 0.27299         osd.7           up  1.00000 1.00000
-7       1.36299     host petatest02
8   hdd 0.27299         osd.8           up  1.00000 1.00000
9   hdd 0.27299         osd.9           up  1.00000 1.00000
10   hdd 0.27299         osd.10          up  1.00000 1.00000
11   hdd 0.27299         osd.11          up  1.00000 1.00000
12   hdd 0.27299         osd.12          up  1.00000 1.00000
-3       0.20000     host petatest03
0   hdd 0.06699         osd.0           up  1.00000 1.00000
1   hdd 0.06699         osd.1           up  1.00000 1.00000
2   hdd 0.06699         osd.2           up  1.00000 1.00000
-9       2.72878     host petatest04
13   hdd 0.90959         osd.13          up  1.00000 1.00000
14   hdd 0.90959         osd.14          up  1.00000 1.00000
15   hdd 0.90959         osd.15          up  1.00000 1.00000

Thanks, Ste.

 

