
Replace failed drive


The disk will be in on Friday, so we will pull the failed drive and replace it, then see if the UI is working.

Thanks,

Got the bad drive replaced and spun up, and the GUI shows the status of OSD 9 as down. Do I need to delete the disk and add it back? If I do delete the disk, will the GUI change to let me add it back?

Deleted the disk and it was removed from the GUI. The replacement disk does not show up to add; how do we add the disk?

The new drive is not showing up in the GUI.

Looks like some steps failed while prepping the disk.


23/09/2020 10:19:53 ERROR sync_replication_node called on non-backup node
23/09/2020 10:19:52 INFO Update roles.
23/09/2020 10:19:52 ERROR 400 Bad Request: The browser (or proxy) sent a request that this server could not understand.
23/09/2020 10:19:52 ERROR 400 Bad Request: The browser (or proxy) sent a request that this server could not understand.
23/09/2020 10:19:52 ERROR 400 Bad Request: The browser (or proxy) sent a request that this server could not understand.
23/09/2020 09:35:21 ERROR ceph-volume.zap failed
23/09/2020 09:35:18 INFO Starting ceph-volume --log-path /opt/petasan/log lvm zap /dev/not_set --destroy
23/09/2020 09:35:18 INFO Start cleaning disk not_set
23/09/2020 09:35:18 INFO osd.9 is deleted from crush map
23/09/2020 09:35:17 INFO Start delete osd.9 from crush map
23/09/2020 09:35:17 WARNING osd.9 is removed from crush map
23/09/2020 09:35:17 WARNING The osd still up you need to stop osd service of osd.9
23/09/2020 09:35:16 ERROR Error executing ceph auth del osd.9
23/09/2020 09:35:15 ERROR Error executing ceph osd crush remove osd.9
23/09/2020 09:35:13 ERROR Error executing ceph osd out osd.9
23/09/2020 09:35:12 INFO Start remove osd.9 from crush map
23/09/2020 09:35:08 INFO Start delete osd job 1349670
23/09/2020 09:35:08 INFO Start delete job for osd.9
23/09/2020 09:35:08 INFO -id 9 -disk_name not_set
23/09/2020 09:35:08 INFO params
23/09/2020 09:35:08 INFO /opt/petasan/scripts/admin/node_manage_disks.py delete-osd

You can delete the bad OSD either before the drive is physically removed or after it is removed.
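Judging by the log above, the delete job ran with -disk_name not_set and the individual ceph commands errored out. If the UI job keeps failing, the same cleanup can be attempted by hand; a minimal sketch of the steps the script goes through (run on the node that owned osd.9; the final ceph osd rm is my addition, not shown in the log):

# Mark the OSD out so data rebalances away from it
ceph osd out osd.9
# Stop the OSD daemon (the log warned osd.9 was still up)
systemctl stop ceph-osd@9
# Remove it from the CRUSH map and delete its auth key
ceph osd crush remove osd.9
ceph auth del osd.9
# Finally remove the OSD id itself
ceph osd rm osd.9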

If you do not see the new drive listed in the UI, there could be a hardware issue. Check that it is listed in /sys/block and as /dev/sdX; if not, look at the kernel logs via dmesg, as there could be a problem with the drive.
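For example, something along these lines (assuming the replacement should come up as /dev/sdc; adjust to your node):

# List the block devices the kernel currently knows about
ls /sys/block
# Check whether the device node exists
ls -l /dev/sdc
# Scan recent kernel messages, with human-readable timestamps, for detection errors
dmesg -T | tail -n 50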

Replaced the drive with another new one; same results, no disk showing up in the GUI to add.

root@PS-Node1:/sys/block# ls
dm-0 dm-2 dm-4 rbd0 sda sdd sdf sr0
dm-1 dm-3 dm-5 rbd1 sdb sde sdg

No sdc

/dev/sdc did not show up.

/var/lib/ceph/osd# ls
ceph-10 ceph-11 ceph-6 ceph-7 ceph-8 ceph-9

Loads of these in dmesg:

[31534765.299824] Buffer I/O error on dev dm-5, logical block 976486384, async page read
[31534766.619981] Buffer I/O error on dev dm-5, logical block 976486384, async page read
[31534784.503301] Buffer I/O error on dev dm-5, logical block 976486384, async page read
[31534784.643601] Buffer I/O error on dev dm-5, logical block 976486384, async page read
[31534784.799517] Buffer I/O error on dev dm-5, logical block 976486384, async page read
[31534784.965121] Buffer I/O error on dev dm-5, logical block 976486384, async page read
[31534785.127645] Buffer I/O error on dev dm-5, logical block 976486384, async page read
[31534787.274066] Buffer I/O error on dev dm-5, logical block 976486384, async page read

Just a thought: do these drives need to be initialized? These are disks straight out of the package.

If this one node were updated to 2.6, would it have any issues, given the other nodes are on 2.3.1? Just until node 1 is good, then we'll update the others.

It depends on the time of the errors; run dmesg -T to show human-readable timestamps and make sure they are recent.
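Note also that those errors are on dm-5, a device-mapper volume, not directly on a raw disk. A generic way to see which physical drive backs it (device names here are just whatever your node reports):

# Show the block device tree, including which disk each dm-* device sits on
lsblk
# Or resolve the dependencies of dm-5 directly
dmsetup deps -o devname /dev/dm-5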

You do not need to do anything special for new drives to show up, but this one is not being detected in /sys/block, which could be a hardware issue.
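One generic thing worth trying before swapping more hardware is forcing the SCSI hosts to rescan the bus; this is standard Linux, not PetaSAN-specific, and is harmless if the controller genuinely cannot see the drive:

# Ask each SCSI host adapter to rescan for new devices
for host in /sys/class/scsi_host/host*; do
    echo "- - -" > "$host/scan"
done
# Then check whether the new drive appeared
ls /sys/block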

You can upgrade if the health status of the cluster is OK.
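A quick way to verify before upgrading:

# Overall cluster state; it should report HEALTH_OK before you upgrade a node
ceph status
# If it is not OK, list the specific warnings and errors
ceph health detail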

Really appreciate the responses; we'll look more into it being a backplane issue. Thanks.
