Replace failed drive
khopkins
96 Posts
September 2, 2020, 7:23 pm
The disk will be in Friday, so we will remove the failed drive and replace it, then see if the UI is working.
Thanks,
khopkins
96 Posts
September 23, 2020, 2:10 pm
Got the bad drive in and spun up, and the GUI shows the status of OSD 9 as down. Do I need to delete the disk and add it back? If I do delete the disk, will the GUI change to let me add it back?
Deleted the disk and it was removed from the GUI. The replacement disk does not show up to add; how do we add the disk?
The new drive is not showing up in the GUI.
khopkins
96 Posts
September 23, 2020, 4:39 pm
Looks like some things failed when prepping the disk.
23/09/2020 10:19:53 ERROR sync_replication_node called on non-backup node
23/09/2020 10:19:52 INFO Update roles.
23/09/2020 10:19:52 ERROR 400 Bad Request: The browser (or proxy) sent a request that this server could not understand.
23/09/2020 10:19:52 ERROR 400 Bad Request: The browser (or proxy) sent a request that this server could not understand.
23/09/2020 10:19:52 ERROR 400 Bad Request: The browser (or proxy) sent a request that this server could not understand.
23/09/2020 09:35:21 ERROR ceph-volume.zap failed
23/09/2020 09:35:18 INFO Starting ceph-volume --log-path /opt/petasan/log lvm zap /dev/not_set --destroy
23/09/2020 09:35:18 INFO Start cleaning disk not_set
23/09/2020 09:35:18 INFO osd.9 is deleted from crush map
23/09/2020 09:35:17 INFO Start delete osd.9 from crush map
23/09/2020 09:35:17 WARNING osd.9 is removed from crush map
23/09/2020 09:35:17 WARNING The osd still up you need to stop osd service of osd.9
23/09/2020 09:35:16 ERROR Error executing ceph auth del osd.9
23/09/2020 09:35:15 ERROR Error executing ceph osd crush remove osd.9
23/09/2020 09:35:13 ERROR Error executing ceph osd out osd.9
23/09/2020 09:35:12 INFO Start remove osd.9 from crush map
23/09/2020 09:35:08 INFO Start delete osd job 1349670
23/09/2020 09:35:08 INFO Start delete job for osd.9
23/09/2020 09:35:08 INFO -id 9 -disk_name not_set
23/09/2020 09:35:08 INFO params
23/09/2020 09:35:08 INFO /opt/petasan/scripts/admin/node_manage_disks.py delete-osd
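From the log, the delete job ran while osd.9 was still up and with the disk name recorded as not_set, so the ceph osd out, ceph osd crush remove, and ceph auth del steps errored out. A rough manual cleanup sketch, using standard Ceph commands and assuming the usual ceph-osd@<id> systemd service name (this is generic Ceph CLI usage, not PetaSAN's own tooling):

# stop the OSD daemon first, since the log warns that osd.9 is still up
systemctl stop ceph-osd@9
# mark the OSD out, remove it from the CRUSH map, delete its auth key, then remove the id
ceph osd out osd.9
ceph osd crush remove osd.9
ceph auth del osd.9
ceph osd rm osd.9
# confirm osd.9 is gone
ceph osd tree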
admin
2,930 Posts
September 23, 2020, 9:15 pm
You can delete the bad OSD either before it is physically removed or after.
If you do not see the new drive listed in the UI, there could be an issue with the hardware. Check that it is listed in /sys/block and as /dev/sdX; if not, look at the kernel logs via dmesg, as there could be a problem with the drive.
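As a sketch of that check (the device name sdc is only an assumption based on the earlier posts; adjust it to your system):

# list the block devices the kernel currently sees
ls /sys/block
lsblk
# check whether the expected device node exists
ls -l /dev/sdc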
khopkins
96 Posts
September 24, 2020, 3:14 pm
Replaced the drive with a new one, same results: no disk showing up in the GUI to add.
root@PS-Node1:/sys/block# ls
dm-0 dm-2 dm-4 rbd0 sda sdd sdf sr0
dm-1 dm-3 dm-5 rbd1 sdb sde sdg
No sdc
/dev/sdc did not show up
/var/lib/ceph/osd# ls
ceph-10 ceph-11 ceph-6 ceph-7 ceph-8 ceph-9
Loads of these in dmesg
[31534765.299824] Buffer I/O error on dev dm-5, logical block 976486384, async page read
[31534766.619981] Buffer I/O error on dev dm-5, logical block 976486384, async page read
[31534784.503301] Buffer I/O error on dev dm-5, logical block 976486384, async page read
[31534784.643601] Buffer I/O error on dev dm-5, logical block 976486384, async page read
[31534784.799517] Buffer I/O error on dev dm-5, logical block 976486384, async page read
[31534784.965121] Buffer I/O error on dev dm-5, logical block 976486384, async page read
[31534785.127645] Buffer I/O error on dev dm-5, logical block 976486384, async page read
[31534787.274066] Buffer I/O error on dev dm-5, logical block 976486384, async page read
Just a thought: do these drives need to be initialized? These are disks straight out of the package.
If this one node were updated to 2.6, would it have any issues, given that the other nodes are on 2.3.1? Just until node 1 is good, then we'll update the others.
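If a hot-swapped disk is not picked up automatically, forcing a SCSI bus rescan sometimes makes it appear without a reboot. A minimal sketch, assuming the controller exposes the usual /sys/class/scsi_host entries (host numbers vary per machine); this is a general Linux technique, not something the thread confirms for this particular hardware:

# ask every SCSI host adapter to rescan its bus for newly inserted devices
for scan in /sys/class/scsi_host/host*/scan; do
    echo "- - -" > "$scan"
done
# then check whether the new disk showed up
ls /sys/block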
admin
2,930 Posts
September 24, 2020, 6:26 pm
It depends on the time of the error; run dmesg -T to make sure.
You do not need to do anything for the new drives to show up, but they are not detected in /sys/block, which could be a hardware issue.
You can upgrade if the health status of the cluster is OK.
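A hedged example of that timestamp check; the grep pattern is only an illustration:

# dmesg -T prints wall-clock timestamps, so you can tell whether the
# Buffer I/O errors are recent or left over from the old failed disk
dmesg -T | grep -i 'i/o error' | tail -n 20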
khopkins
96 Posts
September 24, 2020, 6:56 pm
Really appreciate the responses; we'll look more into it being a backplane issue. Thanks.