Disk swap procedure
protocol6v
85 Posts
September 20, 2018, 4:10 pm
Hello,
If we've reached the maximum number of disks our enclosures can hold and we want to increase the size of each disk, what is the process for removing the online OSDs/disks and replacing them with new, larger disks?
Thanks!
admin
2,930 Posts
September 20, 2018, 5:59 pm
1- On a storage node, set the weight of all its OSDs to 0:
ceph osd crush reweight osd.OSD_ID 0 --cluster CLUSTER_NAME
2- Observe the PG status chart from the dashboard; once the rebalance is complete, stop all OSDs on the node:
systemctl stop ceph-osd@OSD_ID
3- From the admin web application, delete all OSDs on the node.
4- Once all OSDs on the node are deleted, replace all disks, bring the node back online, then from the admin web application add the new disks as OSDs.
5- The new additions will cause Ceph to rebalance again. Observe the PG status chart from the dashboard; once the rebalance is complete, go back to step 1 on a different node. (A scripted sketch of steps 1-2 follows the notes below.)
Some notes:
- First make sure you really cannot add more nodes and have to do disk replacement instead.
- Do not do disk replacement on more than 1 node at once.
- Do not do disk replacement on any node if your cluster status is not active/clean.
- If your new disks are much larger than before, they will serve more io and put extra load on the disks and nodes while you are upgrading; data rebalancing will add further load on top of that. Make sure your current resources (cpu/disks/ram/net) are not near their maximum.
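Strung together, the command-line part of steps 1 and 2 might look roughly like the sketch below. The cluster name and OSD IDs are placeholders for your own values; steps 3-5 are done from the web application.
# Placeholder values -- substitute your cluster name and the OSD IDs on the node
CLUSTER=my_cluster
OSDS="15 16 17"

# Step 1: drain the node by setting the CRUSH weight of each of its OSDs to 0
for id in $OSDS; do
    ceph osd crush reweight osd.$id 0 --cluster $CLUSTER
done

# Step 2: wait until the rebalance finishes (watch the PG status chart, or poll
# "ceph status" until all PGs are active+clean), then stop the OSD daemons
for id in $OSDS; do
    systemctl stop ceph-osd@$id
done

# Steps 3-5: delete the OSDs from the admin web application, swap the physical
# disks, bring the node back online, then add the new disks as OSDs.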
protocol6v
85 Posts
September 21, 2018, 11:23 am
Does anything change if we aren't swapping all disks in the node, or do we just need to bring down the OSDs individually? I.e., there are 8 or so smaller-capacity drives in each node that we would like to swap for larger ones.
Thanks!
admin
2,930 Posts
September 21, 2018, 7:01 pm
You can perform the same steps, but apply them only to the disks being replaced rather than to all disks. If, however, the remaining disks on the node are near full, there is a chance they will fill up during the step 1 rebalance (technically this should not happen, but it could due to the probabilistic nature of CRUSH). In that case you may omit step 1 altogether and just stop and delete the small disks without redistributing their replicas; the downside is that until step 5 is complete you will have some PGs with 1 missing replica, i.e. 2 instead of 3 (if you have a 2-replica cluster, do not think of doing this).
Again, if your remaining disks are not near full, follow the earlier steps and apply them to the disks being replaced.
Note: mixing disk sizes is a bad idea. If you have a 4TB disk next to 1TB disks, it will serve 4 times more io and will become a performance bottleneck.
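As a rough illustration (osd.3 and osd.7 below are just example IDs for the small disks being swapped, and my_cluster is an example cluster name), you could first check how full the node's remaining OSDs are and then reweight only the disks being replaced:
# Per-OSD utilization (%USE / AVAIL columns) -- check this before deciding
# whether to run step 1 or skip it as described above
ceph osd df --cluster my_cluster

# Drain only the disks being swapped, leaving the node's other OSDs alone
ceph osd crush reweight osd.3 0 --cluster my_cluster
ceph osd crush reweight osd.7 0 --cluster my_cluster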
protocol6v
85 Posts
October 2, 2018, 12:56 pm
Just want to understand the commands you listed...
ceph osd crush reweight osd.OSD_ID 0 --cluster CLUSTER_NAME
This sets ALL OSDs on (only) the current node to weight 0? Not the whole cluster?
systemctl stop ceph-osd@OSD_ID
This stops all OSDs on the current node? No OSD id needs to be specified?
Thanks!
admin
2,930 Posts
October 2, 2018, 1:10 pm
Apologies for not being clear: you will need to replace OSD_ID with the actual OSD number and CLUSTER_NAME with your actual cluster name.
So if you have a node with osd 15 and 16 and cluster name my_cluster, you would type:
ceph osd crush reweight osd.15 0 --cluster my_cluster
ceph osd crush reweight osd.16 0 --cluster my_cluster
for step 1; this will set their weights to 0.
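If you are not sure which OSD numbers live on a given node, listing the OSD tree shows them grouped by host (the cluster name below is again just an example):
# OSDs are shown nested under their host; the osd.N entries under a host are
# that node's OSD IDs
ceph osd tree --cluster my_cluster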
protocol6v
85 Posts
October 2, 2018, 1:37 pm
That's what I figured, just wanted to make sure.
Is there any issue with setting the weight on multiple OSDs without waiting in between, or should I wait for the rebalance after setting each one?
admin
2,930 Posts
October 2, 2018, 1:40 pm
You can do all OSDs within a single node at once.
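For example (the OSD IDs and cluster name below are placeholders), reweighting them back to back results in a single combined rebalance:
# Reweight every OSD on the node without waiting in between
for id in 15 16 17 18; do
    ceph osd crush reweight osd.$id 0 --cluster my_cluster
done

# Then watch until all PGs are active+clean before stopping the OSD daemons
watch 'ceph status --cluster my_cluster'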