
Disk swap procedure

Hello,

 

Wondering: if we've reached the max number of disks for our enclosures and we want to increase the size of each disk, what is the process for removing online OSDs/disks and replacing them with new/larger disks?

 

Thanks!

1- On a storage node, set the CRUSH weight of all its OSDs to 0:

ceph osd crush reweight osd.OSD_ID 0 --cluster CLUSTER_NAME

2- Observe the PG status chart from the dashboard; once the rebalance is complete, stop all OSDs on the node (see the CLI sketch after these steps):

systemctl stop ceph-osd@OSD_ID

3- From the admin web application, delete all OSDs on the node.

4- Once all OSDs on a node are deleted, replace all the disks, bring the node back online, then from the admin web application add the new disks as OSDs.

5- The new additions will cause Ceph to rebalance again; observe the PG status chart from the dashboard, and once the rebalance is complete go back to step 1 on a different node.
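
If you prefer the command line to the dashboard, the standard Ceph CLI can be used for steps 2 and 5 as well. A minimal sketch, assuming placeholder OSD ids 15 and 16 and that your install ships the usual Ceph systemd units (replace CLUSTER_NAME as in step 1):

# step 2: stop each OSD on the node individually...
systemctl stop ceph-osd@15
systemctl stop ceph-osd@16

# ...or, if the standard ceph-osd systemd target is available,
# stop every OSD daemon on this node in one go
systemctl stop ceph-osd.target

# steps 2 and 5: check rebalance progress without the dashboard;
# the rebalance is done when all PGs report active+clean
ceph -s --cluster CLUSTER_NAME
ceph pg stat --cluster CLUSTER_NAME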

Some notes:

  • Do consider whether you really cannot add more nodes and have to do disk replacement instead.
  • Do not do disk replacement on more than 1 node at once.
  • Do not do disk replacement on any node if your cluster status is not active/clean.
  • If your new disks are much larger than before, you will put a higher load on the disks and nodes while you are upgrading, since they will be serving more I/O. In addition, data rebalancing will also add further load. Make sure your current resources (CPU/disks/RAM/network) are not near max.

Does anything change if we aren't swapping all disks in the node, or do we just need to bring down the OSDs individually? I.e., there are 8 or so drives of smaller capacity in each node that we would like to swap for larger ones.

 

Thanks!

You can perform the same steps but apply them only to the disks to be replaced rather than all disks. If, however, the remaining disks on the node are near full, there is a chance they will fill up during the step 1 rebalance (technically this should not happen, but it could due to the probabilistic nature of CRUSH). In such a case you may omit step 1 altogether and just stop and delete the small disks without redistributing their replicas; the downside is that until step 5 is complete you will have some PGs with 1 missing replica, i.e. 2 instead of 3 (if you have a 2-replica cluster, do not think of doing this).
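
One way to judge "near full" before deciding is to check per-OSD utilization from the CLI. A quick sketch, assuming the standard Ceph tooling (replace CLUSTER_NAME with your cluster name):

# per-OSD size, used space and %USE, grouped by host
ceph osd df tree --cluster CLUSTER_NAME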

Again, if your remaining disks are not near full, go with the earlier steps and apply them to the disks to be replaced.

Note: mixing disk sizes is a bad idea. If you have a 4TB disk next to 1TB disks, it will serve 4 times more I/O and will become a performance bottleneck.

Just want to understand the commands you listed...

ceph osd crush reweight osd.OSD_ID 0 --cluster CLUSTER_NAME

This sets ALL OSDs on (only) the current node to weight 0? Not the whole cluster?

systemctl stop ceph-osd@OSD_ID

This stops all OSDs on the current node? No OSD id needs to be specified?

 

Thanks!

Apologies for not being clear: you will need to replace OSD_ID with the actual OSD number and CLUSTER_NAME with your actual cluster name.

So if you have a node with OSDs 15 and 16 and cluster name my_cluster, you would type:

ceph osd crush reweight osd.15 0 --cluster my_cluster

ceph osd crush reweight osd.16 0 --cluster my_cluster

For step 1, this will set their weights to 0.
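
If you want to confirm the reweight took effect, a small sketch using the standard CLI (same placeholder cluster name):

# the CRUSH WEIGHT column for osd.15 and osd.16 should now show 0
ceph osd tree --cluster my_cluster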

That's what I figured, just wanted to make sure.

Is there any issue with setting the weight on multiple OSDs without waiting in between, or should I wait for the rebalance after setting each one?

You can do all OSDs within a single node at once.
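
For example, a small loop over the OSD ids on that node would do; a sketch reusing the placeholder ids and cluster name from above:

for id in 15 16; do
    ceph osd crush reweight osd.$id 0 --cluster my_cluster
done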