
Adding osd fails out: wipefs error

Hi There.

 

I am attempting to set up a new OSD in one of our clusters, and it keeps failing.

The error message that I can see in the logs is as follows.

27/07/2020 23:50:35 ERROR Error executing : wipefs --all /dev/sdf

27/07/2020 23:50:35 INFO Executing : wipefs --all /dev/sdf

27/07/2020 23:50:35 INFO Start cleaning disk : sdf

27/07/2020 23:50:32 INFO Start add osd job for disk sdf.

27/07/2020 23:50:32 INFO -disk_name sdf

27/07/2020 23:50:32 INFO params

27/07/2020 23:50:32 INFO /opt/petasan/scripts/admin/node_manage_disks.py add-osd

27/07/2020 23:50:32 INFO script

When running that command over SSH, I get the following error.

wipefs: error: /dev/sdf: probing initialization failed: Device or resource busy

 

I originally thought it might be because the disks are from a customer's array that was returned,

and as such went and removed any RAID entries using sudo mdadm --stop /dev/md125.
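(Note in case it helps others: mdadm --stop only stops the array; it does not erase the md superblock on the member disks. A fuller cleanup, assuming sdf was a member, would be along these lines:)

# check for assembled or inactive md arrays
cat /proc/mdstat

# stop the stale array
mdadm --stop /dev/md125

# erase the md superblock from each former member disk, otherwise
# the kernel may re-assemble the array on the next rescan
mdadm --zero-superblock /dev/sdf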

Trying again, however, gave the same results.

I am unsure where or what to look for next.

 

I have 12 disks failing on this one node, all in the same manner.

Two, however, have managed to add correctly.

Forcing the wipefs does, however, work correctly.

 

Regards

Alex

What PetaSAN version are you using?

Could be related to ZFS metadata:

http://www.petasan.org/forums/?view=thread&id=152&part=2#postid-805
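If the drives did come out of a ZFS pool, note that ZFS writes labels at both the start and the end of the device, so plain header wipes can miss them. Assuming the zfs userland tools are installed, this clears them explicitly:

# remove all ZFS labels from the device (destroys pool membership)
zpool labelclear -f /dev/sdf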

Could also be an activated LVM VG.

do a

pvs -o pv_name,vg_name

if it contains some LVM VG, deactivate it with

vgchange -a n vg_name

then try

wipefs -a /dev/xx

 

Hi There.

 

We are running version 2.5.3

I have tried wiping the first and last hundred sectors of the drive with dd.

root@PS-Node04:~# dd if=/dev/zero of=/dev/sdf bs=512 count=100
100+0 records in
100+0 records out
51200 bytes (51 kB, 50 KiB) copied, 0.000200246 s, 256 MB/s
root@PS-Node04:~# dd if=/dev/zero of=/dev/sdf bs=512 count=100 seek=234441548
100+0 records in
100+0 records out
51200 bytes (51 kB, 50 KiB) copied, 0.00191303 s, 26.8 MB/s
root@PS-Node04:~# fdisk -l /dev/sdf
Disk /dev/sdf: 111.8 GiB, 120034123776 bytes, 234441648 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

However, attempting to add the disk again still fails.

 

There are only two volume groups, and they are on the disks that managed to add successfully.
Regards

Alex

 

 

It is not clear if you also did the commands I listed earlier. Did manually running the wipefs -a command work? If yes, what error do you get in the log now when adding an OSD? Are you adding the OSD with a journal or cache?

Apologies,

 

Yes, I have also tried those commands.

root@PS-Node04:~# pvs -o pv_name,vg_name
PV         VG
/dev/sdd1  ceph-c871e963-5f4a-40a1-8601-4fe439bf104f
/dev/sdn1  ceph-e9b5f72e-b035-4aa3-b17c-bc615e492e38
root@PS-Node04:~# vgchange -a n ceph-c871e963-5f4a-40al-8601-4fe439bf104f
Volume group "ceph-c871e963-5f4a-40al-8601-4fe439bf104f" not found
Cannot process volume group ceph-c871e963-5f4a-40al-8601-4fe439bf104f
root@PS-Node04:~# vgchange -a n ceph-e9b5f72e-b035-4aa3-b17c-bc615e492e38

Logical volume ceph-e9b5f72e-b035-4aa3-b17c-bc615e492e38/osd-block-8673b2a5-c9bb-4af1-a453-88efaddbb276 in use.
Can't deactivate volume group "ceph-e9b5f72e-b035-4aa3-b17c-bc615e492e38" with 1 open logical volume(s)
root@PS-Node04:~#
root@PS-Node04:~# vgchange -a n ceph-c871e963-5f4a-40a1-8601-4fe439bf104f
Logical volume ceph-c871e963-5f4a-40a1-8601-4fe439bf104f/osd-block-32d77506-0c9f-470b-b0bd-0ba0065298d4 in use.
Can't deactivate volume group "ceph-c871e963-5f4a-40a1-8601-4fe439bf104f" with 1 open logical volume(s)

root@PS-Node04:~# wipefs -a /dev/sdf
wipefs: error: /dev/sdf: probing initialization failed: Device or resource busy
root@PS-Node04:~#

 

That also fails.

 

Manually running wipefs -a -f works, but not wipefs -a.

wipefs: error: /dev/sdf: probing initialization failed: Device or resource busy

I am adding an OSD without journal or cache, as I can't add a cache or journal drive either.

 

Regards

Alex

pvs -o pv_name,vg_name

did not list /dev/sdf, so it does not look like an active LVM VG.

We want to understand why the system thinks the device is in use.

Can you check if it is used as a mount point

mount | grep "sdf"

If it is, try to find what process is using it

lsof | grep "mount point"

If you can, unmount it via

umount "mount point"

and try the wipefs command.

If this fails, try the dd command writing 100M rather than 100 sectors. If nothing works and you can reboot, try rebooting and re-doing the above commands. You can also try to wipe the drive on some other box, even on Windows.
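One more place worth checking, as a suggestion: the kernel lists whatever is stacked on top of a block device under sysfs, so a leftover device-mapper target holding the disk open would show up like this:

# anything holding sdf open (dm devices, md arrays) appears here
ls /sys/block/sdf/holders/

# map any dm-* entries back to device-mapper names
dmsetup ls

# a stale mapping can then be removed to free the disk
# (replace <name> with the mapping found above)
dmsetup remove <name>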

Another thing is to check that there are no errors on the drive:

dmesg | grep sdf

Hi There.

root@PS-Node04:~# mount | grep "sdf"

No listings for that drive.
root@PS-Node04:~# dmesg | grep sdf
[ 3.027806] sd 2:0:0:0: [sdf] 234441648 512-byte logical blocks: (120 GB/112 GiB)
[ 3.027827] sd 2:0:0:0: [sdf] Write Protect is off
[ 3.027830] sd 2:0:0:0: [sdf] Mode Sense: 00 3a 00 00
[ 3.027862] sd 2:0:0:0: [sdf] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 3.028723] sd 2:0:0:0: [sdf] Attached SCSI disk

So I can't see anything wrong with the drive, unless the cache is doing something.

root@PS-Node04:~# dd if=/dev/zero of=/dev/sdf bs=100M count=1 seek=1
1+0 records in
1+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.065108 s, 1.6 GB/s

root@PS-Node04:~# dd if=/dev/zero of=/dev/sdf bs=1k count=102400 seek=$(($(awk '$4 == "sdf" {print $3}' </proc/partitions) - 102400 ))
102400+0 records in
102400+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 1.04626 s, 100 MB/s
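(For reference on the seek math above: /proc/partitions reports the size in 1 KiB blocks in its third column, so seeking to that size minus 102400 blocks with bs=1k overwrites the last 100 MiB of the device.)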

root@PS-Node04:~# wipefs -a /dev/sdf
wipefs: error: /dev/sdf: probing initialization failed: Device or resource busy

Still failing.

Also for reference, the full mount listing:

root@PS-Node04:~# mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
udev on /dev type devtmpfs (rw,nosuid,relatime,size=24666292k,nr_inodes=6166573,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,noexec,relatime,size=4938092k,mode=755)
/dev/sde3 on / type ext4 (rw,relatime,data=ordered)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=27,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=13028)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
mqueue on /dev/mqueue type mqueue (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
/dev/sde4 on /var/lib/ceph type ext4 (rw,relatime,data=ordered)
/dev/sde2 on /boot/efi type vfat (rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro)
/dev/sde5 on /opt/petasan/config type ext4 (rw,relatime,data=ordered)
fusectl on /sys/fs/fuse/connections type fusectl (rw,relatime)
configfs on /sys/kernel/config type configfs (rw,relatime)
tmpfs on /var/lib/ceph/osd/ceph-26 type tmpfs (rw,relatime)
tmpfs on /var/lib/ceph/osd/ceph-27 type tmpfs (rw,relatime)
10.0.201.11:gfs-vol on /opt/petasan/config/shared type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

 

Regards

Alex

You could try to reboot then wipe the disks. If it is still an issue, try to wipe the disks on a separate box.
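A minimal post-reboot sequence, as a sketch (sdf is the device from this thread; blkdiscard only applies to SSDs that support TRIM and destroys all data on the disk):

# right after the reboot, before anything re-claims the disk
wipefs -a /dev/sdf

# on an SSD, discarding the whole device also clears any residual
# metadata (destroys all data)
blkdiscard /dev/sdf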