OSD won't come UP after journal temporarily missing
wid
47 Posts
September 27, 2022, 8:16 pm
For sharing knowledge and for others with the same case:
Timeline:
Running cluster -> prepared node01 for shutdown -> maintenance mode -> server off -> removed one of the two NVMe cards holding the journal for 6 drives -> unfortunately misplaced it -> server on.
Maintenance mode was then turned off.
Later the server was correctly turned off again and the card was put back in place. After powering on, all disks (NVMe + HDD) are visible.
The OSDs which had their journal on the unlucky NVMe do not come up: 6 disks DOWN. A reboot did not help.
I removed 5 of the 6 disks from the panel and added them back correctly; they are UP now.
One I left alone for testing and learning.
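(For reference, a rough command-line equivalent of that remove/re-add cycle, for anyone doing it without the panel; the OSD id, HDD and NVMe partition names below are placeholders, not the actual devices on this node. This wipes the old BlueStore data on the HDD and lets the cluster backfill it, so only do it while enough replicas exist elsewhere.)
# ceph osd out <id>
# systemctl stop ceph-osd@<id>
# ceph osd purge <id> --yes-i-really-mean-it
# ceph-volume lvm zap --destroy /dev/sdX
# ceph-volume lvm create --data /dev/sdX --block.db /dev/nvmeXn1pY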
# ceph osd df tree (part)
-5 58.92200 - 44 TiB 27 TiB 26 TiB 74 MiB 48 GiB 17 TiB 60.93 0.90 - host ceph01
7 hdd 7.33600 0.90002 7.3 TiB 6.0 TiB 5.9 TiB 5.5 MiB 10 GiB 1.3 TiB 81.71 1.21 253 up osd.7
9 hdd 5.51659 1.00000 5.5 TiB 4.0 TiB 3.9 TiB 727 KiB 7.0 GiB 1.5 TiB 72.43 1.07 172 up osd.9
10 hdd 5.51659 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 0 down osd.10
11 hdd 3.69730 1.00000 3.7 TiB 676 GiB 615 GiB 1.5 MiB 1.2 GiB 3.0 TiB 17.86 0.26 33 up osd.11
12 hdd 3.69730 1.00000 3.7 TiB 657 GiB 596 GiB 1.0 MiB 1.2 GiB 3.1 TiB 17.36 0.26 26 up osd.12
13 hdd 3.69730 1.00000 3.7 TiB 2.7 TiB 2.7 TiB 43 MiB 5.3 GiB 985 GiB 73.98 1.10 115 up osd.13
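(Side note: a quicker way to list only the down OSDs instead of scanning the whole df tree should be:)
# ceph osd tree down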
# ceph-volume lvm list (part)
====== osd.10 ======
[block] /dev/ceph-8c3e6404-da2d-4ec4-a0ec-a8f8a97ec04f/osd-block-26cb0014-07d1-4632-b8cb-52c7a150f23e
block device /dev/ceph-8c3e6404-da2d-4ec4-a0ec-a8f8a97ec04f/osd-block-26cb0014-07d1-4632-b8cb-52c7a150f23e
block uuid Piilr5-FTGQ-Yobo-ukeP-LQ9A-Lv7R-iMviwc
cephx lockbox secret
cluster fsid db9f5673-006e-45cb-b108-3646c60a3f31
cluster name ceph
crush device class None
db device /dev/nvme0n1p1
db uuid 39177a90-bc28-4383-8151-c22acb336295
encrypted 0
osd fsid 26cb0014-07d1-4632-b8cb-52c7a150f23e
osd id 10
osdspec affinity
type block
vdo 0
devices /dev/sdb1
[db] /dev/nvme0n1p1
PARTUUID 39177a90-bc28-4383-8151-c22acb336295
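(Side note: to check whether the DB partition still carries a valid BlueStore label with an osd_uuid matching the osd fsid above, something like this should work; output not shown here:)
# ceph-bluestore-tool show-label --dev /dev/nvme0n1p1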
# ceph-volume lvm activate --all (activation reports success but does not fix it)
--> OSD ID 28 FSID 7bb1f4d4-4d9b-416f-9e99-9467f75cf48a process is active. Skipping activation
--> OSD ID 18 FSID c1d32b73-f538-4443-b002-d7806d22a313 process is active. Skipping activation
--> OSD ID 11 FSID 380f97e1-080e-4358-8d0f-7fd343a666b6 process is active. Skipping activation
--> Activating OSD ID 10 FSID 26cb0014-07d1-4632-b8cb-52c7a150f23e
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-10
Running command: /usr/bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-8c3e6404-da2d-4ec4-a0ec-a8f8a97ec04f/osd-block-26cb0014-07d1-4632-b8cb-52c7a150f23e --path /var/lib/ceph/osd/ceph-10 --no-mon-config
Running command: /usr/bin/ln -snf /dev/ceph-8c3e6404-da2d-4ec4-a0ec-a8f8a97ec04f/osd-block-26cb0014-07d1-4632-b8cb-52c7a150f23e /var/lib/ceph/osd/ceph-10/block
Running command: /usr/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-10/block
Running command: /usr/bin/chown -R ceph:ceph /dev/dm-4
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-10
Running command: /usr/bin/ln -snf /dev/nvme0n1p1 /var/lib/ceph/osd/ceph-10/block.db
Running command: /usr/bin/chown -R ceph:ceph /dev/nvme0n1p1
Running command: /usr/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-10/block.db
Running command: /usr/bin/chown -R ceph:ceph /dev/nvme0n1p1
Running command: /usr/bin/systemctl enable ceph-volume@lvm-10-26cb0014-07d1-4632-b8cb-52c7a150f23e
Running command: /usr/bin/systemctl enable --runtime ceph-osd@10
Running command: /usr/bin/systemctl start ceph-osd@10
--> ceph-volume lvm activate successful for osd ID: 10
--> OSD ID 16 FSID 9360b500-1ae5-44a9-adc5-4daefb61ec1b process is active. Skipping activation
--> OSD ID 7 FSID 5f14e646-dcdd-4114-88fd-94305d2283a7 process is active. Skipping activation
--> OSD ID 9 FSID 06c5f03e-197b-4c5d-8b79-72b7b792f70e process is active. Skipping activation
# /usr/bin/systemctl status ceph-osd@10 (down)
# /usr/bin/systemctl start ceph-osd@10
2022-09-27T22:07:51.852+0200 7fdd72d0f700 -1 osd.10 25534 failed to load OSD map for epoch 25212, got 0 bytes
2022-09-27T22:07:51.852+0200 7fdd72d0f700 -1 osd.10 25534 failed to load OSD map for epoch 25213, got 0 bytes
2022-09-27T22:07:51.852+0200 7fdd72d0f700 -1 osd.10 25534 failed to load OSD map for epoch 25214, got 0 bytes
2022-09-27T22:07:51.852+0200 7fdd72d0f700 -1 osd.10 25534 failed to load OSD map for epoch 25215, got 0 bytes
2022-09-27T22:07:51.852+0200 7fdd72d0f700 -1 osd.10 25534 failed to load OSD map for epoch 25216, got 0 bytes
2022-09-27T22:07:51.852+0200 7fdd72d0f700 -1 osd.10 25534 failed to load OSD map for epoch 25217, got 0 bytes
2022-09-27T22:07:51.852+0200 7fdd72d0f700 -1 osd.10 25534 failed to load OSD map for epoch 25218, got 0 bytes
2022-09-27T22:07:51.852+0200 7fdd72d0f700 -1 osd.10 25534 failed to load OSD map for epoch 25219, got 0 bytes
2022-09-27T22:07:51.852+0200 7fdd72d0f700 -1 osd.10 25534 failed to load OSD map for epoch 25220, got 0 bytes
2022-09-27T22:07:51.852+0200 7fdd72d0f700 -1 osd.10 25534 failed to load OSD map for epoch 25221, got 0 bytes
2022-09-27T22:07:51.852+0200 7fdd72d0f700 -1 osd.10 25534 failed to load OSD map for epoch 25222, got 0 bytes
2022-09-27T22:07:51.852+0200 7fdd72d0f700 -1 osd.10 25534 failed to load OSD map for epoch 25223, got 0 bytes
If I understand this correctly:
1) the journal/DB database on the NVMe ended up stale or corrupted for all 6 drives, which is why the OSDs cannot load the OSD maps they need;
2) losing the NVMe journal device therefore means losing all 6 OSDs that had their DB on it.
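(If anyone wants to inspect the remaining test OSD before destroying it, an offline consistency check along these lines might confirm whether the DB itself is damaged; the OSD daemon has to be stopped first:)
# systemctl stop ceph-osd@10
# ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-10
If nothing can rebuild the missing osdmap epochs, osd.10 will presumably need the same purge / zap / re-create cycle as the other five.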