
OSD balancer and pool size issue


Hello everyone,

I have two problems; I hope it's OK that I post them together, since they might be related.

First, an overview:
We're running version 3.0.1 and have 3 nodes with 11 SSD OSDs per node and 4 nodes with 2 HDD OSDs per node.
We have three pools:
- device_health_metrics - this pool was created automatically
- SSD-Pool, 900 PGs (we started with 9 OSDs)
- HDD-Pool, 1024 PGs

Output of ceph df detail
--- RAW STORAGE ---
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 131 TiB 71 TiB 61 TiB 61 TiB 46.05
ssd 115 TiB 34 TiB 81 TiB 81 TiB 70.22
TOTAL 247 TiB 105 TiB 141 TiB 141 TiB 57.34

--- POOLS ---
POOL ID PGS STORED (DATA) (OMAP) OBJECTS USED (DATA) (OMAP) %USED MAX AVAIL QUOTA OBJECTS QUOTA BYTES DIRTY USED COMPR UNDER COMPR
SSD-Pool 7 900 27 TiB 27 TiB 7.8 MiB 8.23M 81 TiB 81 TiB 23 MiB 89.29 3.2 TiB N/A N/A 8.23M 0 B 0 B
HDD-Pool 8 1024 20 TiB 20 TiB 2.8 MiB 5.27M 60 TiB 60 TiB 8.4 MiB 49.13 21 TiB N/A N/A 5.27M 0 B 0 B
device_health_metrics 9 1 24 MiB 0 B 24 MiB 46 72 MiB 0 B 72 MiB 0 6.9 TiB N/A N/A 46 0 B 0 B

In my opinion the SSD pool should have a size of 125 TiB (33 x 3.8 TiB), but after the last expansion, when we added 3 OSDs to the nodes, the pool did not grow automatically (until now it always did).
We also have the balancer active.
Output of ceph balancer status
{
    "active": true,
    "last_optimize_duration": "0:00:00.036000",
    "last_optimize_started": "Tue Jul 12 13:20:47 2022",
    "mode": "crush-compat",
    "optimize_result": "Some osds belong to multiple subtrees: {0: ['SSD-Pool', 'default'], 1: ['SSD-Pool', 'default'], 2: ['SSD-Pool', 'default'], 3: ['SSD-Pool', 'default'], 4: ['SSD-Pool', 'default'], 5: ['SSD-Pool', 'default'], 6: ['SSD-Pool', 'default'], 7: ['SSD-Pool', 'default'], 8: ['SSD-Pool', 'default'], 9: ['SSD-Pool', 'default'], 10: ['SSD-Pool', 'default'], 11: ['SSD-Pool', 'default'], 12: ['SSD-Pool', 'default'], 13: ['SSD-Pool', 'default'], 14: ['SSD-Pool', 'default'], 15: ['HDD-Pool', 'default'], 16: ['HDD-Pool', 'default'], 17: ['HDD-Pool', 'default'], 18: ['HDD-Pool', 'default'], 19: ['HDD-Pool', 'default'], 20: ['HDD-Pool', 'default'], 21: ['HDD-Pool', 'default'], 22: ['HDD-Pool', 'default'], 23: ['SSD-Pool', 'default'], 24: ['SSD-Pool', 'default'], 25: ['SSD-Pool', 'default'], 26: ['SSD-Pool', 'default'], 27: ['SSD-Pool', 'default'], 28: ['SSD-Pool', 'default'], 29: ['SSD-Pool', 'default'], 30: ['SSD-Pool', 'default'], 31: ['SSD-Pool', 'default'], 32: ['SSD-Pool', 'default'], 33: ['SSD-Pool', 'default'], 34: ['SSD-Pool', 'default'], 35: ['SSD-Pool', 'default'], 36: ['SSD-Pool', 'default'], 37: ['SSD-Pool', 'default'], 38: ['SSD-Pool', 'default'], 39: ['SSD-Pool', 'default'], 40: ['SSD-Pool', 'default']}",
    "plans": []
}

But the output of ceph osd df shows an imbalance and I don't know why:

ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
15 hdd 16.42969 1.00000 16 TiB 7.2 TiB 7.2 TiB 3.9 MiB 11 GiB 9.2 TiB 44.02 0.77 367 up
16 hdd 16.42969 1.00000 16 TiB 7.7 TiB 7.6 TiB 46 KiB 12 GiB 8.7 TiB 46.87 0.82 391 up
17 hdd 16.42969 1.00000 16 TiB 7.8 TiB 7.8 TiB 3.8 MiB 12 GiB 8.6 TiB 47.69 0.83 398 up
18 hdd 16.42969 1.00000 16 TiB 7.7 TiB 7.7 TiB 459 KiB 11 GiB 8.7 TiB 47.13 0.82 393 up
19 hdd 16.42969 1.00000 16 TiB 7.7 TiB 7.6 TiB 166 KiB 11 GiB 8.7 TiB 46.85 0.82 391 up
20 hdd 16.42969 1.00000 16 TiB 7.4 TiB 7.3 TiB 212 KiB 11 GiB 9.1 TiB 44.75 0.78 373 up
21 hdd 16.42969 1.00000 16 TiB 7.6 TiB 7.5 TiB 133 KiB 11 GiB 8.8 TiB 46.21 0.81 385 up
22 hdd 16.42969 1.00000 16 TiB 7.4 TiB 7.3 TiB 23 MiB 11 GiB 9.1 TiB 44.86 0.78 375 up

0 ssd 3.49300 1.00000 3.5 TiB 2.3 TiB 2.3 TiB 4.9 MiB 6.9 GiB 1.2 TiB 64.97 1.13 77 up
1 ssd 3.49300 1.00000 3.5 TiB 2.4 TiB 2.4 TiB 25 MiB 6.8 GiB 1.0 TiB 70.07 1.22 84 up
2 ssd 3.49300 1.00000 3.5 TiB 2.2 TiB 2.2 TiB 342 KiB 6.7 GiB 1.3 TiB 64.09 1.12 78 up
9 ssd 3.49300 1.00000 3.5 TiB 2.4 TiB 2.4 TiB 3.8 MiB 6.8 GiB 1.1 TiB 69.40 1.21 81 up
12 ssd 3.49300 1.00000 3.5 TiB 2.7 TiB 2.7 TiB 0 B 7.0 GiB 801 GiB 77.62 1.35 84 up
23 ssd 3.49309 1.00000 3.5 TiB 2.2 TiB 2.2 TiB 4.6 MiB 5.8 GiB 1.3 TiB 64.09 1.12 73 up
26 ssd 3.49309 1.00000 3.5 TiB 3.0 TiB 3.0 TiB 84 KiB 5.5 GiB 479 GiB 86.60 1.51 105 up
28 ssd 3.49199 1.00000 3.5 TiB 2.2 TiB 2.2 TiB 3.3 MiB 5.8 GiB 1.3 TiB 64.07 1.12 74 up
29 ssd 3.49199 1.00000 3.5 TiB 2.4 TiB 2.4 TiB 4.6 MiB 6.4 GiB 1.1 TiB 67.95 1.18 80 up
30 ssd 3.49199 1.00000 3.5 TiB 2.4 TiB 2.4 TiB 1.2 MiB 6.1 GiB 1.1 TiB 69.47 1.21 83 up
37 ssd 3.49199 1.00000 3.5 TiB 2.6 TiB 2.6 TiB 2.6 MiB 6.5 GiB 926 GiB 74.11 1.29 82 up
3 ssd 3.49300 1.00000 3.5 TiB 2.6 TiB 2.6 TiB 0 B 7.1 GiB 959 GiB 73.20 1.28 82 up
4 ssd 3.49300 1.00000 3.5 TiB 2.3 TiB 2.3 TiB 3.8 MiB 6.9 GiB 1.1 TiB 67.09 1.17 79 up
5 ssd 3.49300 1.00000 3.5 TiB 2.5 TiB 2.5 TiB 25 MiB 7.7 GiB 987 GiB 72.40 1.26 85 up
10 ssd 3.49300 1.00000 3.5 TiB 2.7 TiB 2.7 TiB 0 B 7.1 GiB 854 GiB 76.12 1.33 88 up
13 ssd 3.49300 1.00000 3.5 TiB 2.4 TiB 2.4 TiB 5.6 MiB 6.6 GiB 1.1 TiB 69.29 1.21 82 up
24 ssd 3.49309 1.00000 3.5 TiB 2.2 TiB 2.2 TiB 4.9 MiB 5.5 GiB 1.3 TiB 62.71 1.09 74 up
27 ssd 3.49309 1.00000 3.5 TiB 2.3 TiB 2.3 TiB 4.8 MiB 4.2 GiB 1.2 TiB 64.88 1.13 77 up
31 ssd 3.49199 1.00000 3.5 TiB 2.4 TiB 2.4 TiB 1.5 MiB 6.3 GiB 1.0 TiB 70.14 1.22 83 up
32 ssd 3.49199 1.00000 3.5 TiB 2.6 TiB 2.5 TiB 2.9 MiB 6.6 GiB 959 GiB 73.17 1.28 84 up
33 ssd 3.49199 1.00000 3.5 TiB 2.5 TiB 2.5 TiB 1.6 MiB 6.4 GiB 1.0 TiB 70.97 1.24 84 up
38 ssd 3.49199 1.00000 3.5 TiB 2.5 TiB 2.5 TiB 4.7 MiB 6.4 GiB 983 GiB 72.50 1.26 83 up
6 ssd 3.49300 1.00000 3.5 TiB 2.7 TiB 2.7 TiB 1.2 MiB 7.1 GiB 855 GiB 76.11 1.33 88 up
7 ssd 3.49300 1.00000 3.5 TiB 2.5 TiB 2.5 TiB 4.9 MiB 7.6 GiB 1.0 TiB 71.08 1.24 80 up
8 ssd 3.49300 1.00000 3.5 TiB 2.4 TiB 2.4 TiB 0 B 6.8 GiB 1.1 TiB 69.38 1.21 81 up
11 ssd 3.49300 1.00000 3.5 TiB 2.4 TiB 2.4 TiB 0 B 6.5 GiB 1.1 TiB 67.79 1.18 80 up
14 ssd 3.49300 1.00000 3.5 TiB 2.3 TiB 2.3 TiB 2.2 MiB 6.7 GiB 1.2 TiB 64.90 1.13 73 up
25 ssd 3.49309 1.00000 3.5 TiB 2.5 TiB 2.5 TiB 2.6 MiB 5.6 GiB 1.0 TiB 70.92 1.24 86 up
34 ssd 3.49199 1.00000 3.5 TiB 2.2 TiB 2.2 TiB 4.2 MiB 6.1 GiB 1.3 TiB 63.28 1.10 78 up
35 ssd 3.49199 1.00000 3.5 TiB 2.5 TiB 2.5 TiB 684 KiB 6.1 GiB 1016 GiB 71.59 1.25 89 up
36 ssd 3.49199 1.00000 3.5 TiB 2.3 TiB 2.3 TiB 3.2 MiB 6.1 GiB 1.1 TiB 67.16 1.17 78 up
39 ssd 3.49199 1.00000 3.5 TiB 2.5 TiB 2.5 TiB 0 B 6.6 GiB 1.0 TiB 70.96 1.24 78 up
40 ssd 3.49309 1.00000 3.5 TiB 2.8 TiB 2.8 TiB 2.1 MiB 5.0 GiB 742 GiB 79.25 1.38 89 up
TOTAL 247 TiB 141 TiB 141 TiB 158 MiB 302 GiB 105 TiB 57.34

What can I do so that the balancer works better, and why doesn't my 'SSD-Pool' resize automatically?

Can you try changing the balancer mode to upmap and see if it solves the balancing issue?
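If you want to try it, it would be roughly something like this (just a sketch; you can switch the mode back the same way if needed):

# switch the balancer to upmap mode, then check the result
ceph balancer mode upmap
ceph balancer status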

Yes the remaining capacity is affected by the balancing not working correctly.

I tried it over the web UI but got an error:

Error, min_compat_client "luminous" is required for pg-upmap.

Can I change this on the production system during operation? And how do I change it? I am never sure what I am allowed to change and what the consequences are.

Edit:

Output of ceph versions is

{
    "mon": {
        "ceph version 15.2.14 (cd3bb7e87a2f62c1b862ff3fd8b1eec13391a5be) octopus (stable)": 3
    },
    "mgr": {
        "ceph version 15.2.14 (cd3bb7e87a2f62c1b862ff3fd8b1eec13391a5be) octopus (stable)": 3
    },
    "osd": {
        "ceph version 15.2.14 (cd3bb7e87a2f62c1b862ff3fd8b1eec13391a5be) octopus (stable)": 41
    },
    "mds": {
        "ceph version 15.2.14 (cd3bb7e87a2f62c1b862ff3fd8b1eec13391a5be) octopus (stable)": 3
    },
    "overall": {
        "ceph version 15.2.14 (cd3bb7e87a2f62c1b862ff3fd8b1eec13391a5be) octopus (stable)": 50
    }
}

The balancer will not allow you to use pg-upmap if you have pre-luminous clients.

Run

ceph features

to see if you have clients which require pre-luminous features.
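If ceph features shows only luminous or newer clients, the setting can usually be raised live, something along these lines (just a sketch; it only restricts which clients may connect, it does not move any data by itself):

# require luminous-capable clients so pg-upmap can be used
ceph osd set-require-min-compat-client luminous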

Another way is to keep the balancer as is and delete the default replicated rule. Make sure your pools do not use this rule first; you can delete the device_health_metrics pool if it is the one using it and, if you wish, recreate it with an SSD rule.
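To check which rule each pool currently uses, something like this should do (pool names taken from your ceph df output above):

# show the crush rule assigned to each pool
ceph osd pool get SSD-Pool crush_rule
ceph osd pool get HDD-Pool crush_rule
ceph osd pool get device_health_metrics crush_rule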


ceph features shows

"mon":
    "features": "0x3f01cfb8ffedffff",
    "release": "luminous",
    "num": 3

"mds":
    "features": "0x3f01cfb8ffedffff",
    "release": "luminous",
    "num": 3

"osd":
    "features": "0x3f01cfb8ffedffff",
    "release": "luminous",
    "num": 41

"client":
    "features": "0x2f018fb86aa42ada",
    "release": "luminous",
    "num": 3

    "features": "0x3f01cfb8ffedffff",
    "release": "luminous",
    "num": 3

"mgr":
    "features": "0x3f01cfb8ffedffff",
    "release": "luminous",
    "num": 3


So is the highlighted client (the one with features 0x2f018fb86aa42ada) the problem?

What exactly is the device_health_metrics pool for? Do we need it, and what happens if I delete it? Only this pool uses the default replicated_rule.

Sorry for asking so many times, but I prefer to be on the safe side and don't want to do things I don't understand.

The client is OK, it is reporting luminous features, but I am not sure why you are getting the pre-luminous error when trying to set upmap.

You can change the device health pool to use the SSD rule from the UI, and you can then delete the default rule:

ceph osd crush rule rm replicated_rule
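If you prefer doing the pool change on the command line instead of the UI, it would be roughly this (a sketch; <your-ssd-rule-name> is a placeholder for whatever ceph osd crush rule ls shows for your SSD rule):

# list the existing crush rules and point the pool at the SSD rule before removing the default rule
ceph osd crush rule ls
ceph osd pool set device_health_metrics crush_rule <your-ssd-rule-name>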

Hmm... strange.

I've changed the rule on the device_health_metrics pool and deleted the default rule, but I still can't change the balancer mode.


You should not have to change the mode; deleting the default rule should make the balancer work with the existing mode. The balancer status should no longer show "Some osds belong to multiple subtrees".

Hey

ceph balancer status says

{
    "active": true,
    "last_optimize_duration": "0:00:01.502575",
    "last_optimize_started": "Thu Jul 14 08:12:52 2022",
    "mode": "crush-compat",
    "optimize_result": "Unable to find further optimization, change balancer mode and retry might help",
    "plans": []
}

and ceph osd df looks much better, thanks a lot

root@KXCHMUEST020:~# ceph osd df
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
15 hdd 16.42969 1.00000 16 TiB 7.6 TiB 7.5 TiB 978 KiB 11 GiB 8.8 TiB 46.29 0.80 376 up
16 hdd 16.42969 1.00000 16 TiB 7.8 TiB 7.8 TiB 2.3 MiB 12 GiB 8.6 TiB 47.75 0.82 388 up
17 hdd 16.42969 1.00000 16 TiB 7.9 TiB 7.8 TiB 3.9 MiB 12 GiB 8.6 TiB 47.86 0.83 389 up
18 hdd 16.42969 1.00000 16 TiB 7.8 TiB 7.8 TiB 426 KiB 12 GiB 8.6 TiB 47.66 0.82 387 up
19 hdd 16.42969 1.00000 16 TiB 7.8 TiB 7.7 TiB 3.0 MiB 12 GiB 8.6 TiB 47.49 0.82 386 up
20 hdd 16.42969 1.00000 16 TiB 7.7 TiB 7.7 TiB 3.8 MiB 12 GiB 8.7 TiB 47.06 0.81 382 up
21 hdd 16.42969 1.00000 16 TiB 7.7 TiB 7.7 TiB 1.2 MiB 12 GiB 8.7 TiB 47.07 0.81 382 up
22 hdd 16.42969 1.00000 16 TiB 7.7 TiB 7.7 TiB 1.9 MiB 12 GiB 8.7 TiB 47.10 0.81 382 up
0 ssd 3.49300 1.00000 3.5 TiB 2.4 TiB 2.4 TiB 4.9 MiB 7.0 GiB 1.1 TiB 68.02 1.17 82 up
1 ssd 3.49300 1.00000 3.5 TiB 2.4 TiB 2.4 TiB 26 MiB 6.9 GiB 1.1 TiB 68.58 1.18 82 up
2 ssd 3.49300 1.00000 3.5 TiB 2.3 TiB 2.3 TiB 342 KiB 6.9 GiB 1.1 TiB 67.09 1.16 82 up
9 ssd 3.49300 1.00000 3.5 TiB 2.4 TiB 2.4 TiB 980 KiB 6.9 GiB 1.1 TiB 69.38 1.20 82 up
12 ssd 3.49300 1.00000 3.5 TiB 2.6 TiB 2.6 TiB 0 B 7.1 GiB 881 GiB 75.38 1.30 82 up
23 ssd 3.49309 1.00000 3.5 TiB 2.5 TiB 2.5 TiB 4.6 MiB 6.0 GiB 988 GiB 72.38 1.25 82 up
26 ssd 3.49309 1.00000 3.5 TiB 2.3 TiB 2.3 TiB 5.4 MiB 5.4 GiB 1.2 TiB 67.07 1.16 82 up
28 ssd 3.49199 1.00000 3.5 TiB 2.5 TiB 2.5 TiB 37 KiB 6.4 GiB 987 GiB 72.41 1.25 82 up
29 ssd 3.49199 1.00000 3.5 TiB 2.4 TiB 2.4 TiB 4.6 MiB 6.2 GiB 1.1 TiB 69.45 1.20 82 up
30 ssd 3.49199 1.00000 3.5 TiB 2.4 TiB 2.4 TiB 1.2 MiB 6.6 GiB 1.1 TiB 69.46 1.20 82 up
37 ssd 3.49199 1.00000 3.5 TiB 2.6 TiB 2.6 TiB 4.1 MiB 6.4 GiB 952 GiB 73.38 1.26 81 up
3 ssd 3.49300 1.00000 3.5 TiB 2.6 TiB 2.6 TiB 0 B 7.1 GiB 957 GiB 73.24 1.26 81 up
4 ssd 3.49300 1.00000 3.5 TiB 2.4 TiB 2.4 TiB 3.8 MiB 7.0 GiB 1.1 TiB 69.34 1.20 82 up
5 ssd 3.49300 1.00000 3.5 TiB 2.4 TiB 2.4 TiB 25 MiB 6.9 GiB 1.0 TiB 70.12 1.21 82 up
10 ssd 3.49300 1.00000 3.5 TiB 2.5 TiB 2.5 TiB 0 B 6.9 GiB 1016 GiB 71.58 1.23 82 up
13 ssd 3.49300 1.00000 3.5 TiB 2.4 TiB 2.4 TiB 1.2 MiB 6.8 GiB 1.1 TiB 68.57 1.18 81 up
24 ssd 3.49309 1.00000 3.5 TiB 2.4 TiB 2.4 TiB 4.9 MiB 6.1 GiB 1.1 TiB 69.50 1.20 82 up
27 ssd 3.49309 1.00000 3.5 TiB 2.5 TiB 2.4 TiB 4.6 MiB 4.7 GiB 1.0 TiB 70.15 1.21 83 up
31 ssd 3.49199 1.00000 3.5 TiB 2.4 TiB 2.4 TiB 1.5 MiB 6.6 GiB 1.1 TiB 69.37 1.20 82 up
32 ssd 3.49199 1.00000 3.5 TiB 2.5 TiB 2.5 TiB 2.9 MiB 6.5 GiB 1.0 TiB 70.95 1.22 82 up
33 ssd 3.49199 1.00000 3.5 TiB 2.4 TiB 2.4 TiB 25 MiB 6.3 GiB 1.1 TiB 68.73 1.18 82 up
38 ssd 3.49199 1.00000 3.5 TiB 2.5 TiB 2.5 TiB 4.6 MiB 6.5 GiB 1.0 TiB 71.03 1.22 82 up
6 ssd 3.49300 1.00000 3.5 TiB 2.5 TiB 2.5 TiB 1.2 MiB 7.1 GiB 1.0 TiB 70.83 1.22 82 up
7 ssd 3.49300 1.00000 3.5 TiB 2.5 TiB 2.5 TiB 29 MiB 7.5 GiB 1.0 TiB 71.12 1.23 82 up
8 ssd 3.49300 1.00000 3.5 TiB 2.4 TiB 2.4 TiB 0 B 6.7 GiB 1.1 TiB 68.62 1.18 81 up
11 ssd 3.49300 1.00000 3.5 TiB 2.4 TiB 2.4 TiB 0 B 6.6 GiB 1.1 TiB 69.33 1.20 82 up
14 ssd 3.49300 1.00000 3.5 TiB 2.5 TiB 2.5 TiB 2.2 MiB 7.1 GiB 986 GiB 72.45 1.25 82 up
25 ssd 3.49309 1.00000 3.5 TiB 2.4 TiB 2.4 TiB 2.0 MiB 6.1 GiB 1.1 TiB 67.93 1.17 82 up
34 ssd 3.49199 1.00000 3.5 TiB 2.4 TiB 2.4 TiB 4.1 MiB 6.2 GiB 1.1 TiB 67.82 1.17 82 up
35 ssd 3.49199 1.00000 3.5 TiB 2.3 TiB 2.3 TiB 798 KiB 6.0 GiB 1.2 TiB 65.57 1.13 82 up
36 ssd 3.49199 1.00000 3.5 TiB 2.5 TiB 2.4 TiB 3.1 MiB 6.2 GiB 1.0 TiB 70.19 1.21 82 up
39 ssd 3.49199 1.00000 3.5 TiB 2.6 TiB 2.6 TiB 0 B 6.7 GiB 903 GiB 74.76 1.29 82 up
40 ssd 3.49309 1.00000 3.5 TiB 2.6 TiB 2.6 TiB 2.1 MiB 5.0 GiB 932 GiB 73.95 1.27 82 up
TOTAL 247 TiB 143 TiB 142 TiB 188 MiB 308 GiB 104 TiB 58.01
MIN/MAX VAR: 0.80/1.30 STDDEV: 12.12

So the problem was that two different rules were set on the OSDs, right?


Can I ask you another question?

We also have problems with PGs not being deep-scrubbed, and you said here (http://www.petasan.org/forums/?view=thread&id=702) that it is possible to set some options in the Ceph configuration (osd_scrub_begin_hour, osd_scrub_end_hour, osd_scrub_sleep, osd_scrub_load_threshold). If I change these settings, will there be interruptions in the production system?
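If I understood that thread correctly, the settings would be applied roughly like this (the values are only examples, not necessarily what I would use):

# example: limit scrubbing to night hours and throttle it (placeholder values)
ceph config set osd osd_scrub_begin_hour 20
ceph config set osd osd_scrub_end_hour 6
ceph config set osd osd_scrub_sleep 0.1
ceph config set osd osd_scrub_load_threshold 0.5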

