unable to add OSD when available journal partition exists
ghbiz
76 Posts
June 4, 2024, 1:57 pm
Quote from ghbiz on June 4, 2024, 1:57 pm
Running PetaSAN 3.2.1.
It seems the add-OSD job finds an available journal partition to reuse, but the logs show it removes the OSD directly afterwards, and then it creates a new journal partition for good measure?
04/06/2024 09:52:20 INFO Running script : /opt/petasan/scripts/admin/node_manage_disks.py add-osd -disk_name sdf -journal nvme0n1
04/06/2024 09:52:20 INFO Start add osd job for disk sdf.
04/06/2024 09:52:24 INFO Start cleaning : sdf
04/06/2024 09:52:28 INFO Executing : wipefs --all /dev/sdf1
04/06/2024 09:52:28 INFO Executing : dd if=/dev/zero of=/dev/sdf1 bs=1M count=20 oflag=direct,dsync >/dev/null 2>&1
04/06/2024 09:52:29 INFO Executing : wipefs --all /dev/sdf
04/06/2024 09:52:29 INFO Executing : dd if=/dev/zero of=/dev/sdf bs=1M count=20 oflag=direct,dsync >/dev/null 2>&1
04/06/2024 09:52:29 INFO Executing : parted -s /dev/sdf mklabel gpt
04/06/2024 09:52:29 INFO Executing : partprobe /dev/sdf
04/06/2024 09:52:36 INFO User selected journal nvme0n1 disk for disk sdf.
04/06/2024 09:52:36 INFO User didn't select a cache for disk sdf.
04/06/2024 09:52:36 INFO Start prepare bluestore OSD : sdf
04/06/2024 09:52:36 INFO Creating data partition num 1 size 9537536MB on /dev/sdf
04/06/2024 09:52:37 INFO Calling partprobe on sdf device
04/06/2024 09:52:37 INFO Executing partprobe /dev/sdf
04/06/2024 09:52:37 INFO Calling udevadm on sdf device
04/06/2024 09:52:37 INFO Executing udevadm settle --timeout 30
04/06/2024 09:52:41 INFO available journal partition found and will be reused.
04/06/2024 09:52:43 INFO Start remove osd.29 from crush map
04/06/2024 09:52:47 INFO osd.29 is removed from crush map
04/06/2024 09:52:47 INFO creating new journal partition.
04/06/2024 09:52:48 INFO Creating journal-db-29-b16f8750 partition num 22 size 61440MB on /dev/nvme0n1
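When tracing a sequence like this, it can help to pull just the journal- and crush-related lines out of the PetaSAN log so the ordering is obvious. A small sketch; the log path /opt/petasan/log/PetaSAN.log is an assumption, adjust it for your install:

```shell
# Show the journal/crush-related steps of the recent add-osd job, in order.
# The log path below is an assumption; adjust it for your installation.
grep -E 'journal|crush map' /opt/petasan/log/PetaSAN.log | tail -n 20
```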
ghbiz
76 Posts
June 4, 2024, 2:00 pm
Quote from ghbiz on June 4, 2024, 2:00 pm
Some notes:
This came from an original cluster that had journal drives with 40G partitions ... I think back in PetaSAN 2.8.
Looks like the solution may be to manually delete the unused 40G NVMe partitions ...
nvme0n1 259:0 0 745.2G 0 disk
├─nvme0n1p1 259:1 0 40G 0 part
├─nvme0n1p2 259:2 0 40G 0 part
├─nvme0n1p3 259:3 0 40G 0 part
├─nvme0n1p4 259:4 0 40G 0 part
├─nvme0n1p5 259:5 0 40G 0 part
├─nvme0n1p6 259:6 0 40G 0 part
├─nvme0n1p13 259:7 0 40G 0 part
├─nvme0n1p14 259:8 0 40G 0 part
├─nvme0n1p15 259:9 0 40G 0 part
├─nvme0n1p16 259:10 0 40G 0 part
├─nvme0n1p17 259:11 0 60G 0 part
├─nvme0n1p18 259:12 0 60G 0 part
├─nvme0n1p19 259:13 0 60G 0 part
├─nvme0n1p20 259:14 0 60G 0 part
└─nvme0n1p21 259:15 0 60G 0 part
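Before deleting anything, it is worth confirming which partitions are actually unreferenced. A rough sketch, assuming the usual layout where each OSD's block.db is a symlink under /var/lib/ceph/osd/ (verify the paths on your own node); the awk filter just pulls the 40G entries out of `lsblk` output like the listing above:

```shell
# Show which partitions are still referenced as block.db by OSDs on this node.
# The /var/lib/ceph/osd/ layout is an assumption; verify it on your system.
for link in /var/lib/ceph/osd/ceph-*/block.db; do
    [ -e "$link" ] || continue
    printf '%s -> %s\n' "$link" "$(readlink -f "$link")"
done

# Filter the 40G partition names out of lsblk output (strips the tree glyphs).
list_40g_parts() {
    awk '$4 == "40G" && $6 == "part" { gsub(/[^a-zA-Z0-9]/, "", $1); print $1 }'
}
lsblk | list_40g_parts
```

Any 40G partition that appears in the second list but is not the target of any block.db symlink in the first is a candidate for removal; double-check each name before touching it.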
ghbiz
76 Posts
June 4, 2024, 2:07 pm
Quote from ghbiz on June 4, 2024, 2:07 pm
Went ahead and manually deleted the 40G partitions at the front of the disk that are no longer used, leaving only the six partitions that are still in use.
Now the error is "Journal disk has no space for new OSD".
I feel like in the past I used to be able to delete partitions and PetaSAN would create new partitions at the start of the drive...
nvme0n1 259:0 0 745.2G 0 disk
├─nvme0n1p16 259:10 0 40G 0 part
├─nvme0n1p17 259:11 0 60G 0 part
├─nvme0n1p18 259:12 0 60G 0 part
├─nvme0n1p19 259:13 0 60G 0 part
├─nvme0n1p20 259:14 0 60G 0 part
└─nvme0n1p21 259:15 0 60G 0 part
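Rough arithmetic on the listing above suggests the error is not about total capacity: only one 40G plus five 60G partitions are allocated on the 745.2G drive, so roughly 400 GiB should be free. A sketch in integer MiB:

```shell
# Sum the allocated partition sizes (1x 40G + 5x 60G) and compare to the disk.
used_mib=$(( (40 + 5 * 60) * 1024 ))
disk_mib=$(( 7452 * 1024 / 10 ))     # 745.2 GiB in MiB (approximate)
echo "free: $(( (disk_mib - used_mib) / 1024 )) GiB"
```

So the "no space" error points at where the free space sits (the gaps at the front), not at how much of it there is.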
ghbiz
76 Posts
June 4, 2024, 2:26 pm
Quote from ghbiz on June 4, 2024, 2:26 pm
Adding more notes...
I have since solved the issue internally by manually creating the 60G journal partition and manually setting the partition type label so PetaSAN sees it as available.
At this point the potential bug is that, when creating a new journal partition, PetaSAN does NOT look at the beginning or middle of the drive for free space.
It also expects all partitions at the beginning of the drive to be 60G in size...
Command (m for help): p
Disk /dev/nvme0n1: 745.22 GiB, 800166076416 bytes, 1562824368 sectors
Disk model: INTEL SSDPE2MD800G4
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 2DCFF5B2-52AE-4079-BC04-08EBCD037EC0
Device Start End Sectors Size Type
/dev/nvme0n1p1 2048 125831167 125829120 60G Linux filesystem
/dev/nvme0n1p2 125831168 251660287 125829120 60G Linux filesystem
/dev/nvme0n1p3 251660288 377489407 125829120 60G Linux filesystem
/dev/nvme0n1p4 1006635008 1132464127 125829120 60G Linux filesystem
/dev/nvme0n1p16 1258293248 1342179327 83886080 40G unknown
/dev/nvme0n1p17 1342179328 1468008447 125829120 60G unknown
/dev/nvme0n1p18 503318528 629147647 125829120 60G unknown
/dev/nvme0n1p19 629147648 754976767 125829120 60G unknown
/dev/nvme0n1p20 754976768 880805887 125829120 60G unknown
/dev/nvme0n1p21 880805888 1006635007 125829120 60G unknown
Partition table entries are not in disk order.
Command (m for help): w
The partition table has been altered.
Syncing disks.
root@ceph-node5:~# sgdisk -t 1:103af3d7-a019-4e56-bfe0-4d664b989f40 /dev/nvme0n1
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot or after you
run partprobe(8) or kpartx(8)
The operation has completed successfully.
root@ceph-node5:~# sgdisk -t 2:103af3d7-a019-4e56-bfe0-4d664b989f40 /dev/nvme0n1
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot or after you
run partprobe(8) or kpartx(8)
The operation has completed successfully.
root@ceph-node5:~# sgdisk -t 3:103af3d7-a019-4e56-bfe0-4d664b989f40 /dev/nvme0n1
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot or after you
run partprobe(8) or kpartx(8)
The operation has completed successfully.
root@ceph-node5:~# sgdisk -t 4:103af3d7-a019-4e56-bfe0-4d664b989f40 /dev/nvme0n1
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot or after you
run partprobe(8) or kpartx(8)
The operation has completed successfully.
root@ceph-node5:~#
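The manual steps above can be condensed into one sketch. This assumes 103af3d7-a019-4e56-bfe0-4d664b989f40 is the partition type GUID PetaSAN uses to mark a journal partition (taken from the sgdisk commands above) and that partition number 22 is a free slot; both are assumptions to verify on your own node. The `run` wrapper only echoes the commands, so nothing is executed until you swap it for the real thing:

```shell
# Dry-run sketch of manually creating and tagging a 60 GiB journal partition.
DEV=/dev/nvme0n1
NUM=22                                        # assumed free partition slot
GUID=103af3d7-a019-4e56-bfe0-4d664b989f40     # assumed PetaSAN journal type GUID

run() { echo "+ $*"; }                        # dry run; change to '"$@"' to execute

run sgdisk -n "${NUM}:0:+60G" "$DEV"          # create a 60 GiB partition in the first gap
run sgdisk -t "${NUM}:${GUID}" "$DEV"         # set the journal partition type
run partprobe "$DEV"                          # have the kernel reread the table
```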
admin
2,929 Posts
June 4, 2024, 9:55 pm
Quote from admin on June 4, 2024, 9:55 pm
You need to add a free disk as a journal from the UI. Once added, PetaSAN assumes it can use the entire disk and will first clean the drive. The system does not support using part of a disk as a journal while leaving other partitions for something else, and it will not allow you to add a journal if any of its partitions are already mounted for something else.
If you plug in an existing journal created on a different host or by a previous installation, the system will detect that it is a journal (no need to add it), will not clean the drive, and will leave the existing journal partitions in place; it can only create/use new partitions from the space left on the drive. Old journal partitions are left because the system cannot assume their associated OSDs are no longer needed.
New partitions are sized based on bluestore_block_db_size; you can change this value from the Configuration -> Ceph Configuration menu. It is possible for the journal drive to end up with different-sized partitions if you change the value.
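As an illustration of that sizing: bluestore_block_db_size takes a byte value, so a 60 GiB journal partition corresponds to 64424509440 bytes, which matches the "size 61440MB" figure in the add-osd log earlier in this thread. A quick conversion sketch (changing the value itself should be done from the menu described above):

```shell
# Convert a desired journal partition size in GiB to the byte value that
# bluestore_block_db_size expects (60 GiB here).
gib=60
bytes=$((gib * 1024 * 1024 * 1024))
echo "bluestore_block_db_size = $bytes"   # 64424509440 bytes = 61440 MB
```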
Last edited on June 4, 2024, 9:55 pm by admin · #5