
Ceph balancing ineffective

Hello; I'm running a 3-node PetaSAN cluster, v2.8, with 12 OSDs and approximately 50 TB of raw storage.

Although I turned on the Ceph balancer in upmap mode right after installing the cluster, I can now see that some OSDs are only 13% full, while the "fullest" one is at 25%. This means some disks hold almost twice as much data as others.

Is there anything else I should do? Is Ceph supposed to work this way, or should I see equal utilisation on all the OSDs?

Thank you in advance,

How many PGs are in your pool(s)?

The rbd, cephfs_data and nfsdb pools all have 32 PGs each.

Furthermore, shouldn't the autoscaler also adjust that number automatically? I have osd_pool_default_pg_autoscale_mode = on

Thanks,

I have been looking at mine for the past few days, but even with the above setting in "global", each pool still shows "off". I'm not sure how quickly it might take effect, or whether it only affects new pools and old pools have to be converted by hand. (Sorry, the documentation does say it applies to future pools, so I guess ceph osd pool set <pool-name> pg_autoscale_mode <mode> will be needed; a loop covering the existing pools is sketched after my output below.)

 

root@peta7:~# ceph osd pool autoscale-status
POOL                     SIZE  TARGET SIZE  RATE  RAW CAPACITY   RATIO  TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE
cephfs_data            48916k                2.0        61267G  0.0000                                  1.0     512          32  off
cephfs_metadata         8181k                2.0        61267G  0.0000                                  4.0      64          16  off
iscsi                  15141G                2.0        61267G  0.4943                                  1.0     512              off
device_health_metrics   2213k                3.0        61267G  0.0000                                  1.0       1              off
.rgw.root               1289                 3.0        61267G  0.0000                                  1.0      32              off
default.rgw.log         3520                 3.0        61267G  0.0000                                  1.0      32              off
default.rgw.control        0                 3.0        61267G  0.0000                                  1.0      32              off
default.rgw.meta           0                 3.0        61267G  0.0000                                  4.0      32           8  off
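
If so, something like the loop below should flip the existing pools as well. A sketch only (untested here), using the standard ceph osd pool ls and per-pool pg_autoscale_mode commands:

# Enable the PG autoscaler on every existing pool;
# osd_pool_default_pg_autoscale_mode only covers pools created afterwards.
for pool in $(ceph osd pool ls); do
    ceph osd pool set "$pool" pg_autoscale_mode on
done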

 

But I'm kind of wondering if we are confusing things; rebalancing and the autoscaler are different, I think. Should we be more concerned with backfill?

Hello Davlaw; yes, the balancer and the autoscaler are indeed two different things. My remark was just in reply to admin's question about how many PGs are in my pools. I believe the answer shouldn't matter, given that the autoscaler is supposed to adjust that number automatically when new pools are added to the cluster.

In any case, on my system I have:

root@petasan1:~# ceph osd pool autoscale-status
POOL                          SIZE  TARGET SIZE  RATE  RAW CAPACITY   RATIO  TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE
device_health_metrics       232.9k                3.0        46339G  0.0000                                  1.0       1              on
cephfs_data                 92890M                3.0        46339G  0.0059                                  1.0      32              on
cephfs_metadata             227.0M                3.0        46339G  0.0000                                  4.0      16              on
.rgw.root                     3343                3.0        46339G  0.0000                                  1.0      16              on
default.rgw.control              0                3.0        46339G  0.0000                                  1.0      16              on
default.rgw.meta              1676                3.0        46339G  0.0000                                  1.0      16              on
default.rgw.log               3520                3.0        46339G  0.0000                                  1.0      16              on
default.rgw.buckets.index    51679                3.0        46339G  0.0000                                  1.0      16              on
default.rgw.buckets.data     2571M                3.0        46339G  0.0002                                  1.0      32              on
rbd                         635.5G                3.0        46339G  0.0411                                  1.0      32              on
nfsdb                           16                3.0        46339G  0.0000                                  1.0      32              on
default.rgw.buckets.non-ec       0                3.0        46339G  0.0000                                  1.0      32              on
okd1                         1217G                3.0        46339G  0.0788                                  1.0      32              on

And ceph balancer status shows:

{
"active": true,
"last_optimize_duration": "0:00:00.002819",
"last_optimize_started": "Wed Aug 25 18:53:51 2021",
"mode": "upmap",
"optimize_result": "Unable to find further optimization, or pool(s) pg_num is decreasing, or distribution is already perfect",
"plans": []
}
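
For completeness, the balancer can also report how even it considers the current distribution to be; a quick check using the standard balancer module commands (eval accepts an optional pool name):

# Score the current PG distribution; lower is better, 0 would be perfect:
ceph balancer eval

# The same for a single pool, e.g. rbd:
ceph balancer eval rbd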

And one further note: running ceph config-key dump | grep balancer shows that no specific settings exist for the balancer (such as max_iterations, max_deviation and so on). Could this be the root cause? Am I supposed to set these myself?
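If those keys do need tuning, my understanding is that the module simply falls back to built-in defaults when nothing is set. A sketch of setting the deviation explicitly; the option name and its default (5 PGs per OSD) are from the Octopus-era balancer and may differ on other releases:

# Allow at most 1 PG of deviation per OSD before the balancer stops optimizing
# (the default of 5 is coarse when pools only have 32 PGs):
ceph config set mgr mgr/balancer/upmap_max_deviation 1

# Confirm the value the module will use:
ceph config get mgr mgr/balancer/upmap_max_deviation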

I noticed "plans" is empty, assuming there is a default?

I'm unsure; I just stuck to the normal settings through the GUI.
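
In case it helps anyone who lands here: a plan can also be computed and inspected by hand before anything moves. A sketch using the documented balancer plan commands; "myplan" is just a placeholder name:

# Ask the balancer to compute an optimization plan explicitly:
ceph balancer optimize myplan

# Review the proposed upmap changes before anything is applied:
ceph balancer show myplan

# Apply the plan, or discard it with "ceph balancer rm myplan":
ceph balancer execute myplan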