
Pool has low available space

We have a PetaSAN cluster with 14 TB available, but our pools only show 50 GB of available space.

We have an NFS share and some iSCSI disks on our SAN.

This can happen if you have an imbalance among your OSDs: a pool's available size is limited by its most filled OSD. Look at the dashboard chart for the top OSD fill %; running the command

ceph osd df

will show you the variation among the OSDs. You should enable the Ceph balancer from the UI; the pg autoscaler will also help make sure you have enough PGs for balancing. For a short-term fix you can also change the OSD crush weights from the UI to get a better balance.
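If you prefer to check from the shell rather than the PetaSAN UI, these are the standard Ceph commands (a general sketch, not PetaSAN-specific):

ceph balancer status

ceph balancer on

The first shows whether the balancer module is active and which mode it is using; the second turns it on if it is off.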


The balancer is active, but I have one OSD at 95% and another at 80% on the same node; both are SSDs with the same capacity.

Could you post the output of

ceph osd df

ceph df

For a quick temporary fix, reduce the crush weight of the most filled OSD by 10% from the UI.

Is the pg autoscaler on? Is it enabled on the pools?
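You can also check this from the shell with the standard Ceph command (it should match what the UI shows):

ceph osd pool autoscale-status

That lists every pool with its current PG count, any new target PG count, and whether autoscaling is on, off or warn for that pool.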

root@sanc1:~# ceph osd df
ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS
5    ssd  6.98630   1.00000  7.0 TiB  5.9 TiB  5.8 TiB   23 GiB   56 GiB  1.1 TiB  83.98  0.96  177      up
6    ssd  6.98630   1.00000  7.0 TiB  5.6 TiB  5.5 TiB   24 GiB   56 GiB  1.4 TiB  79.95  0.92  177      up
7    ssd  6.98630   1.00000  7.0 TiB  6.4 TiB  6.3 TiB   23 GiB   54 GiB  590 GiB  91.75  1.05  178      up
8    ssd  6.98630   1.00000  7.0 TiB  6.4 TiB  6.3 TiB   23 GiB   57 GiB  610 GiB  91.48  1.05  180      up
9    ssd  6.98630   1.00000  7.0 TiB  6.3 TiB  6.2 TiB   25 GiB   55 GiB  747 GiB  89.56  1.03  185      up
10    ssd  6.98630   1.00000  7.0 TiB  5.7 TiB  5.6 TiB   24 GiB   58 GiB  1.3 TiB  81.83  0.94  177      up
11    ssd  6.98630   1.00000  7.0 TiB  5.8 TiB  5.7 TiB   23 GiB   58 GiB  1.2 TiB  82.41  0.94  178      up
12    ssd  6.98630   1.00000  7.0 TiB  6.3 TiB  6.2 TiB   24 GiB   59 GiB  735 GiB  89.73  1.03  189      up
13    ssd  6.98630   1.00000  7.0 TiB  6.6 TiB  6.5 TiB   22 GiB   54 GiB  431 GiB  93.98  1.08  175      up
14    ssd  6.98630   1.00000  7.0 TiB  6.2 TiB  6.1 TiB   24 GiB   54 GiB  800 GiB  88.81  1.02  178      up
0    ssd  6.98630   1.00000  7.0 TiB  6.4 TiB  6.3 TiB   23 GiB   58 GiB  616 GiB  91.39  1.05  185      up
1    ssd  6.98630   1.00000  7.0 TiB  6.6 TiB  6.5 TiB   24 GiB   57 GiB  388 GiB  94.58  1.08  182      up
2    ssd  6.98630   1.00000  7.0 TiB  5.2 TiB  5.1 TiB   25 GiB   53 GiB  1.8 TiB  74.56  0.85  175      up
3    ssd  6.98630   1.00000  7.0 TiB  6.3 TiB  6.2 TiB   23 GiB   58 GiB  729 GiB  89.81  1.03  181      up
4    ssd  6.98630   1.00000  7.0 TiB  6.0 TiB  6.0 TiB   22 GiB   57 GiB  973 GiB  86.40  0.99  174      up
TOTAL  105 TiB   92 TiB   90 TiB  352 GiB  843 GiB   13 TiB  87.35
MIN/MAX VAR: 0.85/1.08  STDDEV: 5.50
root@sanc1:~# ceph df
--- RAW STORAGE ---
CLASS     SIZE   AVAIL    USED  RAW USED  %RAW USED
ssd    105 TiB  13 TiB  92 TiB    92 TiB      87.35
TOTAL  105 TiB  13 TiB  92 TiB    92 TiB      87.35

--- POOLS ---
POOL          ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
.mgr           1    1   17 MiB        6   51 MiB   0.01    150 GiB
rbd            2  128   21 TiB    5.47M   62 TiB  99.30    150 GiB
nfs-pool       3  256  9.0 TiB  192.48M   28 TiB  98.48    150 GiB
nfs-metadata   4  512  118 GiB    5.53M  354 GiB  44.07    150 GiB

The number of PGs per OSD is good and their variation among the OSDs is not too bad, but even OSDs with the same PG count show a usage variance, probably due to random size variation among the PGs. There is no need to change the pg autoscaler.

The quickest fix now is to manually lower the crush weight of the most used OSDs by, say, 7% from the UI.
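If you prefer the shell over the UI, the crush weight can be set directly with ceph osd crush reweight. The values below are only an example based on your output: osd.1 is the fullest at 94.58% with a crush weight of 6.98630, and 7% lower is roughly 6.50.

ceph osd crush reweight osd.1 6.50

Data will then move off that OSD onto the less filled ones; once things have levelled out you can raise the weight back toward its original value.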

You can also experiment with the different balancer modes: crush-compat vs upmap.
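From the shell the mode can be switched and the result checked with the standard Ceph commands (note that upmap needs all clients to be Luminous or newer):

ceph balancer mode upmap

ceph balancer eval

ceph balancer eval prints a score for the current distribution; a lower score means a more even spread.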

It is possible the balancer is not efficient if some OSDs fall under both the default crush rule and an ssd class rule. In that case it may make sense to use only the ssd class rule, but this needs to be investigated first, as doing it directly will cause rebalance traffic; search for how to do this efficiently.
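Before changing anything you can inspect the rules and see which rule each pool uses with read-only commands, for example (pool names taken from your ceph df output):

ceph osd crush rule ls

ceph osd crush rule dump

ceph osd pool get rbd crush_rule

ceph osd pool get nfs-pool crush_rule

ceph osd pool get nfs-metadata crush_rule

If one rule has no device class restriction while another targets the ssd class, that mixed mapping is what can confuse the balancer.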

In general your usage is over 85%, so it could make sense to add more storage, especially if you are writing new data.
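As a reference point, you can see how close you are to the cluster's thresholds; the Ceph defaults are 85% nearfull, 90% backfillfull and 95% full, at which point writes are blocked:

ceph osd dump | grep -i ratio

With osd.1 already at 94.58%, you are very close to the default full ratio, which is another reason to rebalance or add capacity soon.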

Lastly, we show colored usage warnings on the dashboard page. It is important to check these regularly and not leave it until the disks get filled; you can also configure the system to send notification emails.