Cluster stopped, 1 osd is full


Hi,

I don't know why, but the cluster stopped because one OSD is full.

But why? The balancer is enabled, yet one OSD is 95% full while another is only 43% full.

It's a 3-node cluster, but the data is on 2 nodes (ceph01, ceph02); after we move the data we will add disks to ceph03.

How can I rebalance this?

You can check the balancer status:
ceph balancer status
but if the cluster is in an error state, the balancer will not function.
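
For reference, if the status shows the balancer is off or in the wrong mode, it can be enabled like this (assuming a recent Ceph release where upmap mode is available):
ceph balancer mode upmap
ceph balancer on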

It is best to add an extra OSD to the node if you can. In all cases, do the following:

From the Maintenance tab, OSD CRUSH Weights, reduce the weight of OSD 16 by 10% (example: 8 TB -> 7.2).
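
If you prefer the CLI, the equivalent would be something like the following (7.2 is just the 10% reduction from the 8 TB example; substitute your OSD's actual weight):
ceph osd crush reweight osd.16 7.2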

Bump the full-ratio values (careful not to raise them too much):
ceph osd dump | grep full
ceph osd set-full-ratio 0.96
ceph osd set-backfillfull-ratio 0.955
ceph osd dump | grep full

systemctl restart ceph-osd@16

If the rebalance is still stuck, on all nodes do:
systemctl restart ceph-osd.target
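
Once the cluster is healthy again, remember to put the ratios back to their defaults (0.95 full / 0.90 backfillfull) so you keep the safety margin:
ceph osd set-full-ratio 0.95
ceph osd set-backfillfull-ratio 0.90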

Thank You,

i did "ceph osd out 16" then cluster go to "warning" and data was moved,
then i set "ceph osd reweight-by-utilization" and data was moved to other OSD.

But after some time, cluster said "reweight" to 1.0 again.

Main question is: why this happend.
Why ceph send all data to one OSD (without any reweight, and active auto rebalance)

I am not sure why you would set OSD 16 to out; I would recommend changing its crush weight as per the previous post. Also, as indicated, the auto-rebalance will not function if the cluster is in an error state, and it can be monitored via the status command. Once the cluster is healthy, I would recommend monitoring the balancer to see if it is working.
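
For example, you can keep an eye on per-OSD utilization and the balancer state with:
ceph osd df tree
ceph balancer status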

Because it made the cluster work again immediately and started moving PGs off this OSD.

I ran ceph osd reweight-by-utilization several times

and the OSDs were almost equal, but overnight the cluster stopped again and the 16th OSD is full again.

Why? The cluster has 62% free space...

Have you lowered its crush weight as per the first post?

Yes,

ID CLASS WEIGHT   REWEIGHT SIZE   RAW USE DATA    OMAP    META    AVAIL    %USE  VAR  PGS STATUS
 0   hdd 5.45699  1.00000 5.5 TiB 2.8 TiB 2.8 TiB 144 KiB 5.5 GiB  2.6 TiB 51.50 0.84  32     up
 2   hdd 5.35699  0.93358 5.5 TiB 4.5 TiB 4.5 TiB  28 KiB 7.3 GiB 1021 GiB 81.73 1.33  27     up
 4   hdd 5.45699  1.00000 5.5 TiB 3.5 TiB 3.5 TiB  56 KiB 6.7 GiB  1.9 TiB 64.89 1.06  32     up
 6   hdd 5.45699  1.00000 5.5 TiB 2.5 TiB 2.5 TiB  80 KiB 5.1 GiB  3.0 TiB 45.09 0.73  33     up
10   hdd 3.63799  1.00000 3.6 TiB 1.9 TiB 1.9 TiB  36 KiB 4.1 GiB  1.7 TiB 52.52 0.85  22     up
12   hdd 3.63799  1.00000 3.6 TiB 2.1 TiB 2.1 TiB  56 KiB 4.4 GiB  1.5 TiB 58.32 0.95  23     up
14   hdd 3.63799  1.00000 3.6 TiB 2.1 TiB 2.1 TiB 120 KiB 4.4 GiB  1.5 TiB 58.74 0.96  23     up
16   hdd 3.03799  0.93358 3.6 TiB 3.2 TiB 3.2 TiB  52 KiB 5.6 GiB  414 GiB 88.89 1.45  16     up
 1   hdd 5.45699  1.00000 5.5 TiB 3.6 TiB 3.6 TiB 112 KiB 6.4 GiB  1.9 TiB 65.42 1.06  31     up
 3   hdd 5.45699  1.00000 5.5 TiB 3.5 TiB 3.5 TiB  48 KiB 6.4 GiB  1.9 TiB 64.47 1.05  31     up
 5   hdd 5.45699  1.00000 5.5 TiB 3.8 TiB 3.8 TiB 104 KiB 7.5 GiB  1.6 TiB 70.06 1.14  30     up
 7   hdd 5.45699  1.00000 5.5 TiB 2.8 TiB 2.8 TiB 148 KiB 5.9 GiB  2.6 TiB 51.60 0.84  32     up
 9   hdd 3.63799  1.00000 3.6 TiB 1.6 TiB 1.6 TiB  52 KiB 4.2 GiB  2.0 TiB 44.94 0.73  21     up
11   hdd 3.63799  1.00000 3.6 TiB 2.3 TiB 2.3 TiB  81 KiB 4.4 GiB  1.3 TiB 64.59 1.05  21     up
13   hdd 3.63799  1.00000 3.6 TiB 2.4 TiB 2.4 TiB 124 KiB 4.6 GiB  1.2 TiB 66.18 1.08  22     up
15   hdd 3.63799  1.00000 3.6 TiB 1.9 TiB 1.9 TiB  52 KiB 4.3 GiB  1.7 TiB 52.51 0.85  20     up
                    TOTAL  73 TiB  45 TiB  45 TiB 1.3 MiB  87 GiB   28 TiB 61.44

The %USE looks better, but OSD 16 still has appreciably higher usage (89%), yet the PG count assigned to it is 16, which is below average, so it seems some PGs are storing more data than others. This is partly bad luck, since CRUSH is not truly random, but also the total number of PGs is low, which makes things harder to balance.
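
If you want to see which PGs landed on that OSD and how much data each holds, something like this should list them:
ceph pg ls-by-osd 16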

You have 208 PGs in 3 pools. Can you specify how many PGs and how many replicas each pool has? Increasing the PGs will help rebalance things: assign more PGs to the pools that are used more and fewer PGs to the pools that are not used often.
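
The per-pool pg_num and replica size can be listed with:
ceph osd pool ls detail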


3 pools
1) 16 PGs
2) 128 PGs
3) 64 PGs

This is the result of the ceph autoscaler; it scaled the PGs down from almost 1024 per pool. That took almost 12 days.

Every pool has 2 replicas, but after we move the data off srv03 we will add it as ceph03 with 3 replicas.

I think the low number of PGs is an issue; I am not sure why the autoscaler reduced it. I would bump them back up to 1024 total. I am not sure how you are using all 3 pools, but try to give the pool with the most usage the larger share of PGs.
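
As a rough sketch (the pool name is a placeholder, and the split depends on which pool holds the most data), you would first stop the autoscaler from shrinking the pools again, then raise pg_num on each pool:
ceph osd pool set <pool> pg_autoscale_mode off
ceph osd pool set <pool> pg_num 512
On recent Ceph releases pgp_num follows pg_num automatically; on older releases you also need to set it:
ceph osd pool set <pool> pgp_num 512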
