
Lingering OSD

I was setting up a new node and added some new OSDs, and was testing disk throughput via the console (I thought I was on a different machine and didn't mean to do both on the same server). When I realized I was on the same server I deleted the OSDs via the command line, and am now stuck with one OSD that is lingering.

In the web interface it lists:

Name: <blank>  Size: <blank>  SSD: No  Usage: OSD5  Class: auto  Serial: <blank>  Status: 0  Linked Devices: <blank>  OSD Usage: <blank>

root@gs-ss1-ps5:~# ceph pg dump
version 8625
stamp 2024-08-01T20:57:24.666073-0400
last_osdmap_epoch 0
last_pg_scan 0
PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG LOG_DUPS DISK_LOG STATE STATE_STAMP VERSION REPORTED UP UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP LAST_DEEP_SCRUB DEEP_SCRUB_STAMP SNAPTRIMQ_LEN LAST_SCRUB_DURATION SCRUB_SCHEDULING OBJECTS_SCRUBBED OBJECTS_TRIMMED
1.0 2 0 0 0 0 590368 0 0 289 0 289 remapped+peering 2024-08-01T20:57:20.627869-0400 102'289 107:452 [0] 0 [1,0] 1 0'0 2024-08-01T17:41:59.396136-0400 0'0 2024-08-01T17:41:59.396136-0400 0 0 periodic scrub scheduled @ 2024-08-02T23:18:45.035057+0000 0 0

1 2 0 0 0 0 590368 0 0 289 289

sum 2 0 0 0 0 590368 0 0 289 289
OSD_STAT USED AVAIL USED_RAW TOTAL HB_PEERS PG_SUM PRIMARY_PG_SUM
4 291 MiB 5.8 TiB 291 MiB 5.8 TiB [] 0 0
5 0 B 0 B 0 B 0 B [] 0 0
3 292 MiB 5.8 TiB 292 MiB 5.8 TiB [0,1,2] 0 0
2 292 MiB 5.8 TiB 292 MiB 5.8 TiB [0,1,3] 0 0
1 292 MiB 5.8 TiB 292 MiB 5.8 TiB [0,2,3] 1 0
0 292 MiB 5.8 TiB 292 MiB 5.8 TiB [1,2,3] 1 1
sum 1.4 GiB 29 TiB 1.4 GiB 29 TiB

* NOTE: Omap statistics are gathered during deep scrub and may be inaccurate soon afterwards depending on utilization. See http://docs.ceph.com/en/latest/dev/placement-group/#omap-statistics for further details.
dumped all
root@gs-ss1-ps5:~#

Any idea how to remove it? I tried "ceph osd destroy 5 --yes-i-really-mean-it", which reported "destroyed osd.5", however it still remains. This is PetaSAN version 3.3.0.
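(For context, and as far as I understand the Ceph docs rather than anything PetaSAN-specific: "ceph osd destroy" marks the OSD as destroyed but deliberately keeps the OSD ID and its CRUSH position so the slot can be reused by a replacement disk, which would explain why it still shows up. A quick, read-only way to see what the cluster still knows about it:

ceph osd tree                # the entry should show status "destroyed"
ceph osd dump | grep osd.5   # shows the state flags recorded in the OSD map

Neither command changes anything; they only report the current state of osd.5.)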

On node with issue, what is output of

ceph-volume lvm list

ceph osd tree

OSD 5 is the one that seems stuck and cannot be removed. I have also tried rebooting the entire system (all nodes), but I still have the stuck OSD 5.

root@gs-ss1-ps5:~# ceph-volume lvm list

====== osd.4 =======

[block] /dev/ceph-c047897b-a3c7-45b6-932e-e35ce45f91ca/osd-block-5e4f6434-e4ed-43b3-9469-45acdf049533

block device /dev/ceph-c047897b-a3c7-45b6-932e-e35ce45f91ca/osd-block-5e4f6434-e4ed-43b3-9469-45acdf049533
block uuid CNtWcb-I0vt-7uSk-b22C-u0mh-yhdH-nHqY85
cephx lockbox secret
cluster fsid b1042b3d-e000-4a7c-9b6e-c03a8a8ea373
cluster name ceph
crush device class
encrypted 0
osd fsid 5e4f6434-e4ed-43b3-9469-45acdf049533
osd id 4
osdspec affinity
type block
vdo 0
devices /dev/nvme0n1p1
root@gs-ss1-ps5:~# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 34.93140 root default
-3 23.28760 host gs-ss1-ps4
0 ssd 5.82190 osd.0 up 1.00000 1.00000
1 ssd 5.82190 osd.1 up 1.00000 1.00000
2 ssd 5.82190 osd.2 up 1.00000 1.00000
3 ssd 5.82190 osd.3 up 1.00000 1.00000
-5 11.64380 host gs-ss1-ps5
4 ssd 5.82190 osd.4 up 1.00000 1.00000
5 ssd 5.82190 osd.5 destroyed 0 1.00000
root@gs-ss1-ps5:~#

ceph osd rm osd.5
ceph osd crush remove osd.5
ceph auth del osd.5
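(To double-check afterwards, once those three commands have run, the following should come back clean; these are standard Ceph CLI checks, nothing PetaSAN-specific:

ceph osd tree                # osd.5 should no longer be listed under the host
ceph auth ls | grep osd.5    # should print nothing once the key is gone
ceph osd dump | grep osd.5   # should print nothing once it is out of the OSD map

If I am not mistaken, on recent Ceph releases "ceph osd purge 5 --yes-i-really-mean-it" combines the same three steps into one command.)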

That did it.  Thank you very much.