Adding OSD fails out: wipefs error
wolfesupport
4 Posts
July 28, 2020, 8:23 am
Hi there,
I am attempting to set up a new OSD in one of our clusters, and it keeps failing.
The error messages that I can see in the logs are as follows:
27/07/2020 23:50:35 ERROR Error executing : wipefs --all /dev/sdf
27/07/2020 23:50:35 INFO Executing : wipefs --all /dev/sdf
27/07/2020 23:50:35 INFO Start cleaning disk : sdf
27/07/2020 23:50:32 INFO Start add osd job for disk sdf.
27/07/2020 23:50:32 INFO -disk_name sdf
27/07/2020 23:50:32 INFO params
27/07/2020 23:50:32 INFO /opt/petasan/scripts/admin/node_manage_disks.py add-osd
27/07/2020 23:50:32 INFO script
When running that command over SSH, I get the following error:
wipefs: error: /dev/sdf: probing initialization failed: Device or resource busy
I originally thought it might be because the disks came from a customer's array that was returned,
and as such I removed any RAID entries using sudo mdadm --stop /dev/md125.
Trying again has, however, given the same results.
I am unsure where or what to look for next.
I have 12 disks failing on this one node, all in the same manner;
two, however, have managed to add correctly.
Forcing the wipefs does, however, work correctly.
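For reference, the forced form referred to here (confirmed later in this thread as the command that succeeds) would be:
wipefs --all --force /dev/sdf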
Regards
Alex
Last edited on August 3, 2020, 2:01 am by wolfesupport · #1
admin
2,930 Posts
July 28, 2020, 9:16 am
What PetaSAN version are you using?
It could be related to ZFS metadata:
http://www.petasan.org/forums/?view=thread&id=152&part=2#postid-805
It could also be an activated LVM VG. Do a
pvs -o pv_name,vg_name
If it lists some LVM VG, deactivate it with
vgchange -a n vg_name
then try
wipefs -a /dev/xx
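For reference, a compact sketch of that sequence, assuming the target disk is /dev/sdf and <vg_name> stands for whatever volume group the pvs output shows on it:
pvs -o pv_name,vg_name | grep sdf      # does /dev/sdf (or one of its partitions) belong to a VG?
vgchange -a n <vg_name>                # deactivate that VG (placeholder name)
wipefs -a /dev/sdf                     # then retry the wipe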
Last edited on July 28, 2020, 9:22 am by admin · #2
wolfesupport
4 Posts
August 3, 2020, 2:09 am
Hi there,
We are running version 2.5.3.
I have tried wiping the first and last hundred sectors of the drive with dd:
root@PS-Node04:~# dd if=/dev/zero of=/dev/sdf bs=512 count=100
100+0 records in
100+0 records out
51200 bytes (51 kB, 50 KiB) copied, 0.000200246 s, 256 MB/s
root@PS-Node04:~# dd if=/dev/zero of=/dev/sdf bs=512 count=100 seek=234441548
100+0 records in
100+0 records out
51200 bytes (51 kB, 50 KiB) copied, 0.00191303 s, 26.8 MB/s
root@PS-Node04:~# fdisk -l /dev/sdf
Disk /dev/sdf: 111.8 GiB, 120034123776 bytes, 234441648 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
However, attempting to add the disk again still fails.
There are only two volume groups, and they are on the disks that managed to add successfully.
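For reference, the seek value used for the tail wipe above can be derived from the drive's total sector count instead of being typed by hand; a minimal sketch, assuming /dev/sdf:
SECTORS=$(blockdev --getsz /dev/sdf)                                # 234441648 on this drive
dd if=/dev/zero of=/dev/sdf bs=512 count=100 seek=$((SECTORS - 100))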
Regards
Alex
admin
2,930 Posts
August 3, 2020, 4:31 am
It is not clear whether you also ran the commands I listed earlier. Did manually running the wipefs -a command work? If yes, what error do you now see in the log when adding an OSD? Are you adding the OSD with a journal or cache?
Last edited on August 3, 2020, 4:32 am by admin · #4
wolfesupport
4 Posts
August 3, 2020, 8:22 am
Apologies,
Yes, I have also tried those commands:
root@PS-Node04:~# pvs -o pv_name,vg_name
PV VG
/dev/sdd1 ceph-c871e963-5f4a-40a1-8601-4fe439bf104f
/dev/sdn1 ceph-e9b5f72e-b035-4aa3-b17c-bc615e492e38
root@PS-Node04:~# vgchange -a n ceph-c871e963-5f4a-40al-8601-4fe439bf104f
Volume group "ceph-c871e963-5f4a-40al-8601-4fe439bf104f" not found
Cannot process volume group ceph-c871e963-5f4a-40al-8601-4fe439bf104f
root@PS-Node04:~# vgchange -a n ceph-e9b5f72e-b035-4aa3-b17c-bc615e492e38
Logical volume ceph-e9b5f72e-b035-4aa3-b17c-bc615e492e38/osd-block-8673b2a5-c9bb-4af1-a453-88efaddbb276 in use.
Can't deactivate volume group "ceph-e9b5f72e-b035-4aa3-b17c-bc615e492e38" with 1 open logical volume(s)
root@PS-Node04:~#
root@PS-Node04:~# vgchange -a n ceph-c871e963-5f4a-40a1-8601-4fe439bf104f
Logical volume ceph-c871e963-5f4a-40a1-8601-4fe439bf104f/osd-block-32d77506-0c9f-470b-b0bd-0ba0065298d4 in use.
Can't deactivate volume group "ceph-c871e963-5f4a-40a1-8601-4fe439bf104f" with 1 open logical volume(s)
root@PS-Node04:~# wipefs -a /dev/sdf
wipefs: error: /dev/sdf: probing initialization failed: Device or resource busy
root@PS-Node04:~#
That also fails.
Manually running wipefs -a -f works, but not wipefs -a:
wipefs: error: /dev/sdf: probing initialization failed: Device or resource busy
I am adding an OSD without journal or cache, as I can't add a cache or journal drive either.
Regards
Alex
Last edited on August 3, 2020, 8:50 am by wolfesupport · #5
admin
2,930 Posts
August 3, 2020, 11:13 am
pvs -o pv_name,vg_name
did not list /dev/sdf, so it does not look like an active LVM VG.
We want to understand why the system thinks the device is in use.
Can you check if it is used as a mount point:
mount | grep "sdf"
If it is, try to find what process is using it:
lsof | grep "mount point"
If you can, unmount it via
umount "mount point"
and try the wipefs command.
If this fails, try the dd command, writing 100M rather than 100 sectors. If nothing works and you can reboot, try rebooting and redoing the above commands. You can also try to wipe the drive on some other box, even on Windows.
Another thing is to check that there are no errors on the drive:
dmesg | grep sdf
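For reference, a compact sketch of these checks strung together, assuming the device is /dev/sdf; the lsof-on-device, holders and mdstat lines are generic extras not mentioned above, and the umount step only applies if the mount check returns something:
mount | grep sdf                              # is anything on the disk mounted?
umount <mount point>                          # only if the line above showed a mount (placeholder path)
lsof /dev/sdf                                 # any process holding the raw device open?
ls /sys/block/sdf/holders/                    # any dm/md device stacked on top of it?
cat /proc/mdstat                              # any md array still assembled after mdadm --stop?
dd if=/dev/zero of=/dev/sdf bs=1M count=100   # zero the first 100 MB
wipefs -a /dev/sdf                            # retry the wipe
dmesg | grep sdf                              # kernel errors for the drive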
wolfesupport
4 Posts
August 5, 2020, 12:32 am
Hi there,
root@PS-Node04:~# mount | grep "sdf"
No listings for that drive.
root@PS-Node04:~# dmesg | grep sdf
[ 3.027806] sd 2:0:0:0: [sdf] 234441648 512-byte logical blocks: (120 GB/112 GiB)
[ 3.027827] sd 2:0:0:0: [sdf] Write Protect is off
[ 3.027830] sd 2:0:0:0: [sdf] Mode Sense: 00 3a 00 00
[ 3.027862] sd 2:0:0:0: [sdf] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 3.028723] sd 2:0:0:0: [sdf] Attached SCSI disk
So I can't see anything wrong with the drive,
unless the cache is doing something.
root@PS-Node04:~# dd if=/dev/zero of=/dev/sdf bs=100M count=1 seek=1
1+0 records in
1+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.065108 s, 1.6 GB/s
root@PS-Node04:~# dd if=/dev/zero of=/dev/sdf bs=1k count=102400 seek=$(($(awk '$4 == "sdf" {print $3}' </proc/partitions) - 102400 ))
102400+0 records in
102400+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 1.04626 s, 100 MB/s
root@PS-Node04:~# wipefs -a /dev/sdf
wipefs: error: /dev/sdf: probing initialization failed: Device or resource busy
Still failing.
Also for reference, the full mount listing:
root@PS-Node04:~# mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
udev on /dev type devtmpfs (rw,nosuid,relatime,size=24666292k,nr_inodes=6166573,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,noexec,relatime,size=4938092k,mode=755)
/dev/sde3 on / type ext4 (rw,relatime,data=ordered)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=27,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=13028)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
mqueue on /dev/mqueue type mqueue (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
/dev/sde4 on /var/lib/ceph type ext4 (rw,relatime,data=ordered)
/dev/sde2 on /boot/efi type vfat (rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro)
/dev/sde5 on /opt/petasan/config type ext4 (rw,relatime,data=ordered)
fusectl on /sys/fs/fuse/connections type fusectl (rw,relatime)
configfs on /sys/kernel/config type configfs (rw,relatime)
tmpfs on /var/lib/ceph/osd/ceph-26 type tmpfs (rw,relatime)
tmpfs on /var/lib/ceph/osd/ceph-27 type tmpfs (rw,relatime)
10.0.201.11:gfs-vol on /opt/petasan/config/shared type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
Regards
Alex
Last edited on August 5, 2020, 12:33 am by wolfesupport · #7
admin
2,930 Posts
August 5, 2020, 12:36 pm
You could try to reboot and then wipe the disks. If it is still an issue, try to wipe the disks on a separate box.
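For reference, a generic sketch of a more thorough wipe once nothing is holding the device, assuming /dev/sdf; the blkdiscard line only applies if the drive is an SSD that supports discard, and all three commands destroy everything on the disk:
wipefs -a /dev/sdf                                    # clear all filesystem/RAID/LVM signatures
dd if=/dev/zero of=/dev/sdf bs=1M status=progress     # or zero the whole drive (slow)
blkdiscard /dev/sdf                                   # or, for SSDs, discard every block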