
Could not add all OSDs

Hi,

After building our first cluster with 4 nodes, each with 20x 1.8 TB SAS drives and 4x 400 GB SSDs, we can only see half of the OSDs in use by Ceph.

On 2 of the nodes only 1 OSD each is in use. If we try to add an OSD manually, the system shows "adding"; after a couple of seconds the "adding" state disappears, but the OSD is not added to the cluster.

We also deleted the partition tables of the OSDs we are trying to add, because they were part of a ZFS cluster before, but it is still not possible to add them.

In petasan.log we did not find any errors:

01/02/2018 06:55:41 INFO     Start add osd job for disk sdb.
01/02/2018 06:55:42 INFO     Start cleaning disks
01/02/2018 06:55:43 INFO     Starting ceph-disk zap /dev/sdb
01/02/2018 06:55:45 INFO     Auto select journal for disk sdb.
01/02/2018 06:55:48 INFO     User selected auto journal and the selected journal is /dev/sdu disk for disk sdb.

The OSD tree shows only 1 OSD each on node 1 and node 2:

root@ps-cl01-node01:~# ceph osd tree --cluster ps-cl01
ID WEIGHT   TYPE NAME               UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 68.73706 root default
-2 32.73193     host ps-cl01-node03
0  1.63660         osd.0                up  1.00000          1.00000
1  1.63660         osd.1                up  1.00000          1.00000
2  1.63660         osd.2                up  1.00000          1.00000
3  1.63660         osd.3                up  1.00000          1.00000
4  1.63660         osd.4                up  1.00000          1.00000
5  1.63660         osd.5                up  1.00000          1.00000
6  1.63660         osd.6                up  1.00000          1.00000
7  1.63660         osd.7                up  1.00000          1.00000
8  1.63660         osd.8                up  1.00000          1.00000
9  1.63660         osd.9                up  1.00000          1.00000
10  1.63660         osd.10               up  1.00000          1.00000
11  1.63660         osd.11               up  1.00000          1.00000
12  1.63660         osd.12               up  1.00000          1.00000
13  1.63660         osd.13               up  1.00000          1.00000
14  1.63660         osd.14               up  1.00000          1.00000
15  1.63660         osd.15               up  1.00000          1.00000
16  1.63660         osd.16               up  1.00000          1.00000
17  1.63660         osd.17               up  1.00000          1.00000
18  1.63660         osd.18               up  1.00000          1.00000
19  1.63660         osd.19               up  1.00000          1.00000
-3  1.63660     host ps-cl01-node01
20  1.63660         osd.20               up  1.00000          1.00000
-4  1.63660     host ps-cl01-node02
21  1.63660         osd.21               up  1.00000          1.00000
-5 32.73193     host ps-cl01-node04
22  1.63660         osd.22               up  1.00000          1.00000
23  1.63660         osd.23               up  1.00000          1.00000
24  1.63660         osd.24               up  1.00000          1.00000
25  1.63660         osd.25               up  1.00000          1.00000
26  1.63660         osd.26               up  1.00000          1.00000
27  1.63660         osd.27               up  1.00000          1.00000
28  1.63660         osd.28               up  1.00000          1.00000
29  1.63660         osd.29               up  1.00000          1.00000
30  1.63660         osd.30               up  1.00000          1.00000
31  1.63660         osd.31               up  1.00000          1.00000
32  1.63660         osd.32               up  1.00000          1.00000
33  1.63660         osd.33               up  1.00000          1.00000
34  1.63660         osd.34               up  1.00000          1.00000
35  1.63660         osd.35               up  1.00000          1.00000
36  1.63660         osd.36               up  1.00000          1.00000
37  1.63660         osd.37               up  1.00000          1.00000
38  1.63660         osd.38               up  1.00000          1.00000
39  1.63660         osd.39               up  1.00000          1.00000
40  1.63660         osd.40               up  1.00000          1.00000
41  1.63660         osd.41               up  1.00000          1.00000


Output of detect-disks.sh on node 1:

root@ps-cl01-node01:~# /opt/petasan/scripts/detect-disks.sh
device=sda,size=3516328368,bus=SCSI,fixed=Yes,ssd=No,vendor=HGST,model=HUC101818CS4200,serial=5000cca02c5bb69c
device=sdb,size=3516328368,bus=SCSI,fixed=Yes,ssd=No,vendor=HGST,model=HUC101818CS4200,serial=5000cca02c5bad70
device=sdc,size=3516328368,bus=SCSI,fixed=Yes,ssd=No,vendor=HGST,model=HUC101818CS4200,serial=5000cca02c5b9e94
device=sdd,size=3516328368,bus=SCSI,fixed=Yes,ssd=No,vendor=HGST,model=HUC101818CS4200,serial=5000cca02c5b9d58
device=sde,size=3516328368,bus=SCSI,fixed=Yes,ssd=No,vendor=HGST,model=HUC101818CS4200,serial=5000cca02c5bb158
device=sdf,size=3516328368,bus=SCSI,fixed=Yes,ssd=No,vendor=HGST,model=HUC101818CS4200,serial=5000cca02c5bb7b0
device=sdg,size=3516328368,bus=SCSI,fixed=Yes,ssd=No,vendor=HGST,model=HUC101818CS4200,serial=5000cca02c5bb078
device=sdh,size=3516328368,bus=SCSI,fixed=Yes,ssd=No,vendor=HGST,model=HUC101818CS4200,serial=5000cca02c5bae8c
device=sdi,size=3516328368,bus=SCSI,fixed=Yes,ssd=No,vendor=HGST,model=HUC101818CS4200,serial=5000cca02c5bacf4
device=sdj,size=3516328368,bus=SCSI,fixed=Yes,ssd=No,vendor=HGST,model=HUC101818CS4200,serial=5000cca02c5ba140
device=sdk,size=3516328368,bus=SCSI,fixed=Yes,ssd=No,vendor=HGST,model=HUC101818CS4200,serial=5000cca02c5bb284
device=sdl,size=3516328368,bus=SCSI,fixed=Yes,ssd=No,vendor=HGST,model=HUC101818CS4200,serial=5000cca02c5baafc
device=sdm,size=3516328368,bus=SCSI,fixed=Yes,ssd=No,vendor=HGST,model=HUC101818CS4200,serial=5000cca02c5bb034
device=sdn,size=3516328368,bus=SCSI,fixed=Yes,ssd=No,vendor=HGST,model=HUC101818CS4200,serial=5000cca02c5b9d38
device=sdo,size=3516328368,bus=SCSI,fixed=Yes,ssd=No,vendor=HGST,model=HUC101818CS4200,serial=5000cca02c5bb190
device=sdp,size=3516328368,bus=SCSI,fixed=Yes,ssd=No,vendor=HGST,model=HUC101818CS4200,serial=5000cca02c5b88ac
device=sdq,size=293046768,bus=SATA,fixed=Yes,ssd=Yes,vendor=,model=INTEL_SSDSC2BB150G7,serial=BTDV73260A58150MGN
device=sdr,size=293046768,bus=SATA,fixed=Yes,ssd=Yes,vendor=,model=INTEL_SSDSC2BB150G7,serial=BTDV73350A70150MGN
device=sds,size=781422768,bus=SATA,fixed=Yes,ssd=Yes,vendor=,model=INTEL_SSDSC2BA400G4,serial=BTHV73340A7N400NGN
device=sdt,size=3516328368,bus=SCSI,fixed=Yes,ssd=No,vendor=HGST,model=HUC101818CS4200,serial=5000cca02c5bb290
device=sdu,size=781422768,bus=SATA,fixed=Yes,ssd=Yes,vendor=,model=INTEL_SSDSC2BA400G4,serial=BTHV7334077H400NGN
device=sdv,size=3516328368,bus=SCSI,fixed=Yes,ssd=No,vendor=HGST,model=HUC101818CS4200,serial=5000cca02c5b9cbc
device=sdw,size=781422768,bus=SATA,fixed=Yes,ssd=Yes,vendor=,model=INTEL_SSDSC2BA400G4,serial=BTHV73340GJM400NGN
device=sdx,size=3516328368,bus=SCSI,fixed=Yes,ssd=No,vendor=HGST,model=HUC101818CS4200,serial=5000cca02c5bb684
device=sdy,size=781422768,bus=SATA,fixed=Yes,ssd=Yes,vendor=,model=INTEL_SSDSC2BA400G4,serial=BTHV73340GHZ400NGN
device=sdz,size=3516328368,bus=SCSI,fixed=Yes,ssd=No,vendor=HGST,model=HUC101818CS4200,serial=5000cca02c5bb15c


root@ps-cl01-node01:~# ceph-disk list
/dev/sda :
/dev/sda1 ceph data, active, cluster ps-cl01, osd.20, journal /dev/sds1
/dev/sdb :
/dev/sdb1 other
/dev/sdc other, unknown
/dev/sdd :
/dev/sdd1 other
/dev/sde other, unknown
/dev/sdf other, unknown
/dev/sdg other, unknown
/dev/sdh other, unknown
/dev/sdi other, unknown
/dev/sdj other, unknown
/dev/sdk other, unknown
/dev/sdl other, unknown
/dev/sdm other, unknown
/dev/sdn other, unknown
/dev/sdo other, unknown
/dev/sdp other, unknown
/dev/sdq :
/dev/sdq2 other, ext4, mounted on /
/dev/sdq1 other, ext4, mounted on /boot
/dev/sdq4 other, ext4, mounted on /opt/petasan/config
/dev/sdq3 other, ext4, mounted on /var/lib/ceph
/dev/sdr other, unknown
/dev/sds :
/dev/sds1 ceph journal, for /dev/sda1
/dev/sdt other, unknown
/dev/sdu :
/dev/sdu1 ceph journal
/dev/sdv other, unknown
/dev/sdw :
/dev/sdw1 ceph journal
/dev/sdx other, unknown
/dev/sdy :
/dev/sdy1 ceph journal
/dev/sdz other, unknown

It is most likely related to old ZFS metadata that is not cleaned up by deleting the partition table, as per

http://tracker.ceph.com/issues/19248

This may be the fix:

http://www.petasan.org/forums/?view=thread&id=152&part=2#postid-805
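If the cause really is leftover ZFS labels, a more thorough wipe of the disk before re-adding it may help. This is only a rough sketch, assuming /dev/sdb is one of the affected disks (substitute the correct device, and double-check it first, as these commands are destructive). ZFS keeps labels at both the start and the end of a device, so zapping the partition table alone does not remove them:

wipefs -a /dev/sdb                          # remove known filesystem/RAID/ZFS signatures
dd if=/dev/zero of=/dev/sdb bs=1M count=100 # zero the first 100 MB
dd if=/dev/zero of=/dev/sdb bs=1M count=100 seek=$(( $(blockdev --getsz /dev/sdb) / 2048 - 100 ))   # zero the last 100 MB

After that, the disk should be addable again from the PetaSAN UI.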

Perfect, thanks, now it looks good.

But unfortunately I added one wrong OSD. How can I remove this OSD from the cluster? (It is a small SSD and should be used in future for boot system redundancy.)


Happy it looks good. It is one of those tough issues to troubleshoot, since ceph-disk does not return any errors.

For the other issue: if I understand correctly, you added an extra OSD that you want removed, and the OSD was added successfully and is up. If so, in PetaSAN we allow deletion of OSDs from the UI only if they are down, but we do not allow stopping them from the UI, so you need to stop it manually:

systemctl stop ceph-osd@X

where X is the OSD number.

Soon after, the UI will show it as down and allow you to delete it.
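For example, assuming for illustration that the unwanted OSD is osd.41 (substitute the real ID from your OSD tree):

systemctl stop ceph-osd@41
ceph osd tree --cluster ps-cl01 | grep osd.41

Once the tree (and the PetaSAN UI) report it as down, the delete option becomes available in the UI.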

Thanks for your fast response... yes, it was strange because there was no error message.

Now everything looks good, thanks again for your help.