
PetaSAN 3.0.1 OSD cannot be added with "Cache" selected

I am trying to complete an initial install on production equipment.
When I manually add the HDDs as OSDs with both Journal and Cache, the OSD is not added and the SSD designated as the journal device is removed, i.e. the journal disk disappears from the node. I can add it back, but the behavior is the same whether I specify a journal disk explicitly or leave it as Auto.

If I select only "Journal", it works. But if I select the Cache option, with or without Journal, the OSD is not added.

I am not sure where to look to find out what is going wrong so that I can fix it.

  1. Can you describe your hardware: RAM, CPU, disk types (SSD/HDD) and configuration (journals/cache)?
  2. After rebooting the node on which you want to add the OSD, try to add the OSD from the UI; on failure, please post the last section of /opt/petasan/log/PetaSAN.log from the node with the OSD.
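To capture that log section, a quick tail on the OSD node is enough. A minimal sketch, assuming only the log path quoted above (the existence check is just so it degrades gracefully on a machine without PetaSAN):

```shell
# Print the last 100 lines of the PetaSAN log on the OSD node.
# The path is the one referenced above; adjust -n as needed.
LOG="${LOG:-/opt/petasan/log/PetaSAN.log}"
if [ -f "$LOG" ]; then
    tail -n 100 "$LOG"
else
    echo "log not found: $LOG"
fi
```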


  1. (per node) 156 GB memory, Intel Xeon Silver 4110, 3x Intel SSDPE2KX010T8 (NVMe SSD), 6x Seagate ST8000NM03A (8 TB HDD); nvme1 - cache, nvme2 - OSD (for metadata), nvme3 - journals. Still trying to set up an OSD with both Cache and Journal, but it seems unlikely to work.
  2. Please see below:

23/12/2021 20:58:09 INFO Searching for any unlinked journals to be reused.
23/12/2021 20:58:09 INFO OSD IDs list for this node are : [1]
23/12/2021 20:58:09 INFO Found unlinked journal partition : nvme2n1p1 with label name : ceph-journal
23/12/2021 20:58:09 INFO Mark journal partition nvme2n1p1 as available.
23/12/2021 20:58:09 INFO Start setting partition type for nvme2n1p1
23/12/2021 20:58:09 INFO Starting sgdisk -t 1:103af3d7-a019-4e56-bfe0-4d664b989f40 /dev/nvme2n1
23/12/2021 20:58:10 INFO Calling partprobe on nvme2n1 device
23/12/2021 20:58:10 INFO Executing partprobe /dev/nvme2n1
23/12/2021 20:58:10 INFO Calling udevadm on nvme2n1 device
23/12/2021 20:58:10 INFO Executing udevadm settle --timeout 30
23/12/2021 20:58:17 INFO Searching for any unlinked caches to be reused.
23/12/2021 20:58:17 INFO OSD IDs list for this node are : [1]
23/12/2021 20:58:21 INFO Running script : /opt/petasan/scripts/admin/node_manage_disks.py add-osd -disk_name sda -journal auto -cache auto -cache_type writecache
23/12/2021 20:58:21 INFO Start add osd job for disk sda.
23/12/2021 20:58:24 INFO Start cleaning : sda
23/12/2021 20:58:25 INFO Executing : wipefs --all /dev/sda
23/12/2021 20:58:25 INFO Executing : dd if=/dev/zero of=/dev/sda bs=1M count=20 oflag=direct,dsync >/dev/null 2>&1
23/12/2021 20:58:25 INFO Executing : parted -s /dev/sda mklabel gpt
23/12/2021 20:58:25 INFO Executing : partprobe /dev/sda
23/12/2021 20:58:28 INFO Auto select journal for disk sda.
23/12/2021 20:58:31 INFO S3 init_action : s3_setting is not complete
23/12/2021 20:58:31 INFO Start cleaning : nvme2n1
23/12/2021 20:58:35 INFO Executing : wipefs --all /dev/nvme2n1p1
23/12/2021 20:58:35 INFO Executing : dd if=/dev/zero of=/dev/nvme2n1p1 bs=1M count=20 oflag=direct,dsync >/dev/null 2>&1
23/12/2021 20:58:35 INFO Executing : wipefs --all /dev/nvme2n1
23/12/2021 20:58:35 INFO Executing : dd if=/dev/zero of=/dev/nvme2n1 bs=1M count=20 oflag=direct,dsync >/dev/null 2>&1
23/12/2021 20:58:35 INFO Executing : parted -s /dev/nvme2n1 mklabel gpt
23/12/2021 20:58:35 INFO Executing : partprobe /dev/nvme2n1
23/12/2021 20:58:38 INFO User selected Auto journal, selected device is nvme2n1 disk for disk sda.
23/12/2021 20:58:38 INFO Auto select cache for disk sda.
23/12/2021 20:58:40 INFO User selected Auto cache, selected device is nvme0n1 disk for disk sda.
23/12/2021 20:58:40 INFO ==================================================
23/12/2021 20:58:40 INFO ===== Building DM-Writecache =====
23/12/2021 20:58:40 INFO ==================================================
23/12/2021 20:58:40 INFO Step 1 : Preparing Slow Disk :
23/12/2021 20:58:40 INFO ------------------------------
23/12/2021 20:58:40 INFO Creating data partition num 1 size 7630885MB on /dev/sda
23/12/2021 20:58:42 INFO Calling partprobe on sda device
23/12/2021 20:58:42 INFO Executing partprobe /dev/sda
23/12/2021 20:58:42 INFO Calling udevadm on sda device
23/12/2021 20:58:42 INFO Executing udevadm settle --timeout 30
23/12/2021 20:58:45 INFO origin_part_name = /dev/sda1
23/12/2021 20:58:45 INFO done
23/12/2021 20:58:45 INFO Step 2 : Preparing Fast Disk :
23/12/2021 20:58:45 INFO ------------------------------
23/12/2021 20:58:45 INFO Start cleaning : nvme0n1p1
23/12/2021 20:58:45 INFO Executing : umount /dev/nvme0n1p1
23/12/2021 20:58:45 ERROR Error executing : umount /dev/nvme0n1p1
23/12/2021 20:58:45 INFO Executing : wipefs --all /dev/nvme0n1p1
23/12/2021 20:58:45 INFO Executing : dd if=/dev/zero of=/dev/nvme0n1p1 bs=1M count=20 oflag=direct,dsync >/dev/null 2>&1
23/12/2021 20:58:45 INFO Executing : dd bs=1M seek=276680704 if=/dev/zero of=/dev/nvme0n1p1 count=20 oflag=seek_bytes,direct,dsync >/dev/null 2>&1
23/12/2021 20:58:45 INFO cache_part_path = /dev/nvme0n1p1
23/12/2021 20:58:45 INFO done
23/12/2021 20:58:45 INFO Step 3 : Creating Physical Volumes :
23/12/2021 20:58:45 INFO ------------------------------------
23/12/2021 20:58:45 INFO done
23/12/2021 20:58:45 INFO Step 4 : Creating Volume Group :
23/12/2021 20:58:45 INFO --------------------------------
23/12/2021 20:58:46 INFO Start remove osd.9 from crush map
23/12/2021 20:58:48 INFO osd.9 is removed from crush map
23/12/2021 20:58:48 INFO new next_osd_id = 9
23/12/2021 20:58:48 INFO new vg_name = ps-e83e35b5-6e8f-4014-94a2-6b0831817077-wc-osd.9
23/12/2021 20:58:48 ERROR lvm_lib : Error creating volume group --> Devices have inconsistent logical block sizes (4096 and 512).
23/12/2021 20:58:48 INFO Creating Volume Group has been failed.
23/12/2021 20:58:48 INFO Start cleaning : sda
23/12/2021 20:58:48 ERROR lvm_lib : Error in deactivate_vg, vg does not exist
23/12/2021 20:58:52 INFO Executing : wipefs --all /dev/sda1
23/12/2021 20:58:52 INFO Executing : dd if=/dev/zero of=/dev/sda1 bs=1M count=20 oflag=direct,dsync >/dev/null 2>&1
23/12/2021 20:58:52 INFO Executing : wipefs --all /dev/sda
23/12/2021 20:58:52 INFO Executing : dd if=/dev/zero of=/dev/sda bs=1M count=20 oflag=direct,dsync >/dev/null 2>&1
23/12/2021 20:58:53 INFO Executing : parted -s /dev/sda mklabel gpt
23/12/2021 20:58:53 INFO Executing : partprobe /dev/sda
23/12/2021 20:58:56 INFO Start cleaning : nvme0n1p1
23/12/2021 20:58:56 ERROR lvm_lib : Error in deactivate_vg, vg does not exist
23/12/2021 20:58:56 INFO Executing : umount /dev/nvme0n1p1
23/12/2021 20:58:56 ERROR Error executing : umount /dev/nvme0n1p1
23/12/2021 20:58:56 INFO Executing : wipefs --all /dev/nvme0n1p1
23/12/2021 20:58:56 INFO Executing : dd if=/dev/zero of=/dev/nvme0n1p1 bs=1M count=20 oflag=direct,dsync >/dev/null 2>&1
23/12/2021 20:58:56 INFO Start setting partition type for nvme0n1p1
23/12/2021 20:58:56 INFO Starting sgdisk -t 1:5b3f01d6-70d6-421a-a101-4b3131a8d600 /dev/nvme0n1
23/12/2021 20:58:57 INFO Calling partprobe on nvme0n1 device
23/12/2021 20:58:57 INFO Executing partprobe /dev/nvme0n1
23/12/2021 20:58:57 INFO Calling udevadm on nvme0n1 device
23/12/2021 20:58:57 INFO Executing udevadm settle --timeout 30


It seems the error occurs when cache is selected, and there appear to be old logical volumes on the disks, either pre-existing LVM volumes or leftovers from earlier failed attempts to add an OSD. I suggest wiping the drives that are not currently in use, then trying to add the OSD again: first without a journal, then with journal + cache.
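A rough sketch of that cleanup, with /dev/sdX as a placeholder for the unused disk. The dry-run wrapper is my addition so the commands only print until RUN=1 is set; these commands are destructive once actually run:

```shell
# Placeholder device -- replace /dev/sdX with the disk to be wiped.
DEV="${DEV:-/dev/sdX}"

# Dry-run guard (illustrative): commands only print unless RUN=1.
run() { if [ "$RUN" = "1" ]; then "$@"; else echo "would run: $*"; fi; }

# First check sector sizes -- the LVM error in the log ("inconsistent
# logical block sizes (4096 and 512)") means the cache and data devices
# report different logical sector sizes.
run lsblk -o NAME,LOG-SEC,PHY-SEC "$DEV"

# Remove filesystem/LVM signatures, zap GPT structures, zero the first
# 20 MB (as the PetaSAN cleanup in the log does), and reread the table.
run wipefs --all "$DEV"
run sgdisk --zap-all "$DEV"
run dd if=/dev/zero of="$DEV" bs=1M count=20 oflag=direct,dsync
run partprobe "$DEV"
```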

Another possibility:

https://access.redhat.com/solutions/5822381

Set allow_mixed_block_sizes to 1 (a devices-section option in /etc/lvm/lvm.conf).
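A minimal sketch of that change, demonstrated on a sample snippet since the exact commented default line varies by LVM version; on the node, the same sed would target /etc/lvm/lvm.conf itself:

```shell
# Sample snippet standing in for the devices section of lvm.conf.
cat > /tmp/lvm.conf.sample <<'EOF'
devices {
    # allow_mixed_block_sizes = 0
}
EOF

# Uncomment the option and enable it.
sed -i 's|# allow_mixed_block_sizes = 0|allow_mixed_block_sizes = 1|' /tmp/lvm.conf.sample

grep allow_mixed_block_sizes /tmp/lvm.conf.sample
```

LVM tools reread lvm.conf on each invocation, so no service restart should be needed before retrying the OSD add.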


Worked with no issues.

Thanks for the feedback.