
Storage used size

Hi,

I'm a little bit confused about the used storage. Sorry for the stupid questions...

We have 3 nodes, each with 5x 2 TB (1.82 TB real) OSDs (15 OSDs in total).
PetaSAN provides one replicated pool with size 3 (compression is disabled).
The pool is used as VMware storage.

So my first simple calculation was that I can use about 7.7 TB without any problem (minus a bit of overhead).
Calculation: 85% (osd nearfull ratio) * 1.82 TB * 15 (OSDs) / 3 (replicas)
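Just to sanity-check my own math, here is the same estimate as a one-liner (assuming data is spread perfectly evenly, which it probably isn't):

awk 'BEGIN { printf "%.2f TB usable\n", 0.85 * 1.82 * 15 / 3 }'
# prints roughly 7.73 TB usable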

Now 5.44 TB of the VMware storage is used and PetaSAN is in the following warning state:


If I look into the Ceph usage:

RAW STORAGE:
CLASS SIZE AVAIL USED RAW USED %RAW USED
ssd 27 TiB 6.5 TiB 21 TiB 21 TiB 76.23
TOTAL 27 TiB 6.5 TiB 21 TiB 21 TiB 76.23

POOLS:
POOL ID STORED OBJECTS USED %USED MAX AVAIL
SSD 2 6.9 TiB 1.82M 21 TiB 91.49 659 GiB

ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS TYPE NAME
-1 27.28180 - 27 TiB 21 TiB 21 TiB 884 MiB 42 GiB 6.5 TiB 76.21 1.00 - root default
-5 9.09492 - 9.1 TiB 6.9 TiB 6.9 TiB 521 MiB 13 GiB 2.2 TiB 76.20 1.00 - host WOAZPS01
5 ssd 1.81898 1.00000 1.8 TiB 1.3 TiB 1.3 TiB 110 MiB 2.5 GiB 531 GiB 71.48 0.94 96 up osd.5
6 ssd 1.81898 1.00000 1.8 TiB 1.4 TiB 1.4 TiB 120 MiB 2.7 GiB 406 GiB 78.20 1.03 105 up osd.6
7 ssd 1.81898 1.00000 1.8 TiB 1.5 TiB 1.5 TiB 100 MiB 2.9 GiB 354 GiB 81.02 1.06 109 up osd.7
8 ssd 1.81898 1.00000 1.8 TiB 1.3 TiB 1.3 TiB 120 MiB 2.5 GiB 559 GiB 69.97 0.92 94 up osd.8
9 ssd 1.81898 1.00000 1.8 TiB 1.5 TiB 1.5 TiB 71 MiB 2.8 GiB 367 GiB 80.33 1.05 108 up osd.9
-7 9.09492 - 9.1 TiB 6.9 TiB 6.9 TiB 363 MiB 15 GiB 2.2 TiB 76.21 1.00 - host WOAZPS02
10 ssd 1.81898 1.00000 1.8 TiB 1.6 TiB 1.6 TiB 75 MiB 3.0 GiB 227 GiB 87.84 1.15 118 up osd.10
11 ssd 1.81898 1.00000 1.8 TiB 1.5 TiB 1.5 TiB 62 MiB 2.8 GiB 354 GiB 81.01 1.06 109 up osd.11
12 ssd 1.81898 1.00000 1.8 TiB 1.3 TiB 1.3 TiB 73 MiB 2.5 GiB 516 GiB 72.32 0.95 97 up osd.12
13 ssd 1.81898 1.00000 1.8 TiB 1.3 TiB 1.3 TiB 103 MiB 2.6 GiB 506 GiB 72.83 0.96 98 up osd.13
14 ssd 1.81898 1.00000 1.8 TiB 1.2 TiB 1.2 TiB 50 MiB 4.0 GiB 613 GiB 67.07 0.88 90 up osd.14
-3 9.09195 - 9.1 TiB 6.9 TiB 6.9 TiB 195 KiB 14 GiB 2.2 TiB 76.23 1.00 - host WOAZPS03
0 ssd 1.81839 1.00000 1.8 TiB 1.6 TiB 1.6 TiB 28 KiB 3.0 GiB 269 GiB 85.55 1.12 115 up osd.0
1 ssd 1.81839 1.00000 1.8 TiB 1.4 TiB 1.4 TiB 28 KiB 2.8 GiB 405 GiB 78.27 1.03 105 up osd.1
2 ssd 1.81839 1.00000 1.8 TiB 1.3 TiB 1.3 TiB 36 KiB 2.6 GiB 561 GiB 69.87 0.92 94 up osd.2
3 ssd 1.81839 1.00000 1.8 TiB 1.3 TiB 1.3 TiB 52 KiB 2.6 GiB 573 GiB 69.20 0.91 93 up osd.3
4 ssd 1.81839 1.00000 1.8 TiB 1.4 TiB 1.4 TiB 51 KiB 2.8 GiB 405 GiB 78.27 1.03 105 up osd.4

  1. I don't understand the 21 TiB USED under POOLS. As far as I know these are not RAW values.
  2. Why is there only 659 GiB MAX AVAIL? (707.6 GB)
  3. Could there be wasted space somewhere?
  4. Why is the %USED of the pool (91.49) different from the raw storage (76.23)?
  5. The cluster keeps working until one OSD reaches 95% usage, right?
  6. Should I do some rebalancing? (the OSDs are not evenly balanced)
  7. The PetaSAN chart of the Cluster Storage shows "Used" 0 B. Is this a bug?

(Cluster Storage chart)

So I know that the solution is to add more OSDs. But I just want to understand the storage/disk usage.

Thank you.

There could be several reasons for discrepancies, but in your case it is due to some imbalance in how data is distributed among the OSDs, which comes from the random nature of the CRUSH placement algorithm. In your case one OSD has 118 PGs and on the low end another has 90 PGs, so there is roughly a 30% spread between max and min; this of course results in one OSD being over 85% full while another is around 70%.

You have 27 TB of raw capacity (before accounting for replicas), 21 TB raw used and 6.5 TB available.
Your pool also shows 21 TB USED (this is raw), which divided by the 3 replicas is the 6.9 TB shown under STORED; this is the net amount your clients/VMware see.
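As a quick arithmetic check (nothing more than the numbers from your ceph df output above, ignoring rounding):

awk 'BEGIN { printf "%.1f TiB net\n", 21 / 3 }'
# about 7.0 TiB, which lines up with the ~6.9 TiB STORED once the TiB rounding in ceph df is taken into account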

Now to the questions of why MAX AVAIL is only 659 GiB and why the %USED differs between 91.49% (pools) and 76.23% (raw storage):
Because one OSD is already over 85% full, the pool MAX AVAIL "predicts" that after writing another 659 GiB one of the OSDs in the pool will be full and you cannot write any more. So even though the cluster as a whole is only 76% full and should have 6.5 TB free (and since you have only one pool, that pool is also on average 76% full), in reality 659 GiB is what you can write before your fullest OSD hits the limit.
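As a very rough illustration (this is back-of-the-envelope only, not Ceph's exact MAX AVAIL calculation; the numbers come from your osd df output, and 1536 is simply the sum of the PGS column there):

# osd.10 is 87.84% used and stops accepting writes at the 95% full ratio;
# it holds 118 of the 1536 PG copies, so roughly that share of new raw writes lands on it
awk 'BEGIN {
  headroom = (0.95 - 0.8784) * 1.82 * 1024   # GiB of raw space left on osd.10
  share    = 118 / 1536                      # fraction of new raw writes hitting osd.10
  printf "%.0f GiB raw headroom, ~%.0f GiB net writable\n", headroom, headroom / share / 3
}'
# roughly 133 GiB of headroom and roughly 580 GiB of net writable space, the same order of
# magnitude as the 659 GiB MAX AVAIL projection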

Even though you have a warning, as you stated the cluster is still functioning. The relevant config values are:

mon_osd_full_ratio = 0.950000
mon_osd_backfillfull_ratio = 0.900000
mon_osd_nearfull_ratio = 0.850000

You get the warning once an OSD passes 85% (nearfull). An OSD will keep accepting writes until it is 95% full, but at 90% it will no longer take part in backfill recovery, so if you have another OSD failure the cluster may not be able to recover and will stay degraded.
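If you want to double check the thresholds on your own cluster, they are stored in the osd map, so something like this should show them (exact output format differs a bit between Ceph releases):

ceph osd dump | grep ratio
# full_ratio 0.95
# backfillfull_ratio 0.9
# nearfull_ratio 0.85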

To fix:
- Add OSDs; this is highly recommended and will also rebalance things.
- Or use the balancer module to rebalance (this is a temporary solution):

ceph balancer mode crush-compat
ceph balancer on
ceph balancer status
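If you want to see how uneven the balancer thinks the current distribution is (before and after it runs), the eval subcommand gives a score; a lower score means a more even distribution:

ceph balancer eval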

Yes, the PetaSAN chart showing 0 bytes used is a recent bug. It is due to a recent change in the ceph df --format json-pretty output, which now returns bytes_used instead of the earlier raw_bytes_used. We have it logged as a bug.

Thanks for your detailed answer. This cleared up a lot of confusion.
Also, after the rebalance it looks completely different and more logical, and the %USED values are almost the same 🙂

The (notional) 1.7 TiB MAX AVAIL sounds better than the 659 GiB from before.

RAW STORAGE:
CLASS SIZE AVAIL USED RAW USED %RAW USED
ssd 27 TiB 6.5 TiB 21 TiB 21 TiB 76.22
TOTAL 27 TiB 6.5 TiB 21 TiB 21 TiB 76.22

POOLS:
POOL ID STORED OBJECTS USED %USED MAX AVAIL
SSD 2 6.9 TiB 1.82M 21 TiB 80.68 1.7 TiB

 

I'll add more OSDs.
Do I have to turn off the balancer with the command "ceph balancer off" after it is finished?

Just out of curiosity: shouldn't the rebalance also improve performance a little bit? So a regular rebalance would help there too... especially for bigger clusters?

Great, happy it helped 🙂

Whether you want to keep the balancer on or just use it when you feel you need it is up to you; either way is fine. Some users enable and disable it from a scheduled script so it runs at specific times instead of always.
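As an example of the scheduled approach, something like this cron entry would let the balancer work only in a nightly window (the path and times are just placeholders, adjust them to your setup):

# /etc/cron.d/ceph-balancer -- let the balancer run between 01:00 and 05:00 only
0 1 * * * root /usr/bin/ceph balancer on
0 5 * * * root /usr/bin/ceph balancer off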

Having an imbalance in storage capacity will most likely cause some OSDs to be busier than others, so theoretically you may see a small performance boost, but probably nothing significant.