Testing PetaSAN - Drive Failure Procedure
RobertH
27 Posts
July 31, 2020, 2:10 pm
We are doing a lab build of PetaSAN before putting it into production, trying different things in order to build internal documentation / how-tos.
The lab setup is:
6 node cluster
Each Node:
= 4x 10 GbE NICs (2x dual-port cards, all bonded, with interfaces running on VLANs)
= 1x PCIe NVMe (512 GB, journal)
= 1x OS drive (80 GB SATA)
= 8x OSD drives (300 GB 10K SAS) connected to an HBA
We came up with a scenario that I'm not sure how to deal with: the OSD drives in each node are set up to use the NVMe as a journal disk.
What we are trying to work out is the replacement procedure for an OSD should one fail. We tried it in our first pass, before adding the NVMe journal, and it was just a matter of stopping the service, swapping the drive, and re-adding the disk in PetaSAN; with the journal it seems to be more complicated.
Procedure tried so far (a console sketch of these steps follows the list):
- Drive is reported as failed by the controller (SMART, offline, etc.)
- Locate the OSD name in the PetaSAN GUI to get the underlying service (i.e. ceph-osd@##)
- Log into the console of the affected node over SSH
- Stop the Ceph OSD service (systemctl stop ceph-osd@##)
- Wait for the PetaSAN GUI to show the disk as stopped
- Wait for the PetaSAN GUI to show that the data has been rebalanced
- Click the X button to remove the disk
- Once the disk is removed and marked as unused in the GUI, remove the physical disk from the host
- Insert the new physical disk into the host
- Wait for the PetaSAN GUI to show the new drive in the host
- Click the Add button
- Select to journal the drive
- Failure: the journal does not have additional space
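For our internal how-to, this is roughly the console side of the steps above; the OSD id (12) is just a placeholder, and the safe-to-destroy check is an extra Ceph command we added as a sanity check, not something taken from the PetaSAN manual:

# Placeholder OSD id for illustration -- use the id shown in the PetaSAN GUI
OSD_ID=12

# Stop the OSD service on the node that holds the failed drive
systemctl stop ceph-osd@${OSD_ID}

# Watch recovery/rebalance progress; wait until health is back to OK
# (or at least until there are no degraded/undersized PGs left)
ceph -s

# Optional sanity check before pulling the drive: Ceph reports whether the
# OSD can be destroyed without risking data loss
ceph osd safe-to-destroy osd.${OSD_ID}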
From our testing, the only way we can find (short of running console commands that I'm not aware of) to get the replacement drive back onto the NVMe journal is to remove all of the drives in the host from PetaSAN, so that PetaSAN clears the journal drive's partitions, and then add them back one at a time.
So is there documentation (I looked through the admin manual) on the proper procedure to swap out a failing / failed drive that uses a journal on another disk?
Thanks
admin
2,930 Posts
July 31, 2020, 4:22 pm
If the OSD is down, the UI shows the delete button. If it is a working OSD, you can first stop it via systemctl so it can be deleted.
If an OSD is defective or physically removed, it will show up in the Disk List in a separate row by itself, since it no longer has a physical device associated with it; it will still have a delete button so it can be deleted from the cluster and the CRUSH tree.
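For reference, one quick way to confirm from the console which OSDs Ceph currently sees as down before deleting them in the UI (the grep filter is just illustrative):

# Show the CRUSH tree with each OSD's up/down status
ceph osd tree

# Only the OSDs currently marked down
ceph osd tree | grep -w down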
Things are more complex when a journal is involved. First, Ceph expects you to manage the journal partitions yourself. Earlier we used to require a new journal partition each time a new OSD was added; later we added code that tags each journal partition as available or used. When we delete a disk, we tag its journal partition as free so it can be reused when adding a new disk. This does not work in all cases: if the OSD device is not readable, we cannot read its uuid and identify which journal partition needs to be freed. We also cannot assume that any currently unconnected partitions are unused; the OSD could be temporarily down or removed, so it is risky for us to free its partition automatically. In version 2.6 we added scripts, used by our support, that let you tag partitions as free manually once you are sure:
/opt/petasan/scripts/util/make_journal_partition_free.py
/opt/petasan/scripts/util/journal_active_partitions.py
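A rough sketch of how these might be used; the partition argument (/dev/nvme0n1p5) and the need for an explicit interpreter are assumptions for illustration, so read the scripts (or ask support) before running them on a production node:

# List the journal partitions PetaSAN currently considers in use
# (prefix with python3 if the scripts are not directly executable)
/opt/petasan/scripts/util/journal_active_partitions.py

# Inspect the NVMe journal device and its partitions
lsblk /dev/nvme0n1

# Once you are certain a journal partition is orphaned (its OSD is gone for
# good), tag it as free so a new OSD can reuse it. The exact argument the
# script expects is an assumption here.
/opt/petasan/scripts/util/make_journal_partition_free.py /dev/nvme0n1p5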