
PetaSAN in our 2 datacenters for production


Great! "esxcli storage core device vaai status get" did return 'Supported' for the 'Delete Status'.

And good to know that VMFS will re-use the disk space that is not reclaimed. While reusing this disk space, does compression take place (so, for the non-reclaimed space)? Or should I first reclaim the space before compression can kick in? After removing 100 GB of data (which was put on PetaSAN before compression worked) and putting new VMs (1 TB of data) on the storage, I now see:

RAW STORAGE:
    CLASS     SIZE      AVAIL     USED      RAW USED    %RAW USED
    hdd       42 TiB    16 TiB    26 TiB    26 TiB      61.26
    TOTAL     42 TiB    16 TiB    26 TiB    26 TiB      61.26

POOLS:
    POOL    ID    STORED    OBJECTS    USED      %USED    MAX AVAIL    QUOTA OBJECTS    QUOTA BYTES    DIRTY    USED COMPR    UNDER COMPR
    rbd     1     13 TiB    3.37M      26 TiB    74.31    4.4 TiB      N/A              N/A            3.37M    117 GiB       235 GiB

 

And:

osd.8
"bluestore_compressed": 1696971702,
"bluestore_compressed_allocated": 5907742720,
"bluestore_compressed_original": 11883800064,
osd.9
"bluestore_compressed": 1772540552,
"bluestore_compressed_allocated": 6132400128,
"bluestore_compressed_original": 12336011264,
osd.10
"bluestore_compressed": 1450032079,
"bluestore_compressed_allocated": 5003083776,
"bluestore_compressed_original": 10065756672,

Is only 117 GB now compressed (as shown under 'USED COMPR')?

And can we conclude that on osd.10, for example, only about 5 GB of data is saved (due to compression)?
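For reference, my rough back-of-the-envelope calculation from the osd.10 counters above, assuming the saving is simply bluestore_compressed_original minus bluestore_compressed_allocated:

ceph daemon osd.10 perf dump | grep bluestore_compressed
echo $(( (10065756672 - 5003083776) / 1000000000 )) GB     # saving in GB, roughly 5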


You do not need to reclaim space; new data written after you enable compression will be evaluated for compression.

Yes, only 117 GB got compressed.

Can you show the output of:

ceph daemon osd.8 perf dump | grep compress_success_count
ceph daemon osd.8 perf dump | grep compress_rejected_count

Are you using SSD or HDD OSDs?

There is a minimum write size below which compression will not happen; the defaults are:

"bluestore_compression_min_blob_size_hdd": "131072",
"bluestore_compression_min_blob_size_ssd": "8192",

If you write random VM traffic, writes below the above sizes will not be compressed.
If you do file transfers (file copy, backups), the block sizes will be big enough for compression.
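If you want to see the values an OSD is actually running with, or lower the SSD threshold cluster-wide, something like the following should work (4096 is just an illustrative value, not a recommendation):

ceph daemon osd.8 config show | grep bluestore_compression_min_blob
ceph config set osd bluestore_compression_min_blob_size_ssd 4096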

The results:

root@NODE01_AM5-2:~# ceph daemon osd.8 perf dump | grep compress_success_count
"compress_success_count": 365627,
root@NODE01_AM5-2:~# ceph daemon osd.8 perf dump | grep compress_rejected_count
"compress_rejected_count": 71263,

We are using SSDs.

Is the minimum write size independent of the compression type used (e.g. zstd vs. snappy)?

Looking at the 'RAW STORAGE' availability (16 TB) and comparing this value to the pool's 'MAX AVAIL' (4.3 TB), I don't understand why the pool has less available than the RAW STORAGE shows. We have 2 replicas, so I would expect 16 TB / 2 = 8 TB available on the pool (but it seems to be 4.3 TB):

RAW STORAGE:
    CLASS     SIZE      AVAIL     USED      RAW USED    %RAW USED
    hdd       42 TiB    16 TiB    26 TiB    26 TiB      61.69
    TOTAL     42 TiB    16 TiB    26 TiB    26 TiB      61.69

POOLS:
    POOL    ID    STORED    OBJECTS    USED      %USED    MAX AVAIL    QUOTA OBJECTS    QUOTA BYTES    DIRTY    USED COMPR    UNDER COMPR
    rbd     1     13 TiB    3.39M      26 TiB    74.90    4.3 TiB      N/A              N/A            3.39M    139 GiB       279 GiB

What causes the pool to have less space left than expected?

Thank you.

 

Yes, the minimum size is independent of the compression type.

For pool avail vs raw storage, see here.
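In short, and summarizing the standard Ceph behaviour that page describes: MAX AVAIL is estimated from the fullest OSD, taking the replication factor and full ratio into account, so when OSDs are not evenly balanced the pool shows less available space than raw avail divided by the number of replicas. Per-OSD utilization can be checked with:

ceph osd df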

Thanks. Using the steps you gave:

ceph balancer mode crush-compat
ceph balancer on

there is more balance between the RAW and pool space shown.
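For anyone reading along: the balancer state can be verified afterwards with:

ceph balancer status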

 

One of our two environments now gives this warning (for a few days now):

root@NODE01_AM5-1:~# ceph -s
  cluster:
    id:     65838b73-85a9-4917-9644-cd7c1c292773
    health: HEALTH_WARN
            15 slow ops, oldest one blocked for 257004 sec, mon.NODE03_AM5-1 has slow ops

  services:
    mon: 3 daemons, quorum NODE03_AM5-1,NODE01_AM5-1,NODE02_AM5-1 (age 2d)
    mgr: NODE01_AM5-1(active, since 3d), standbys: NODE02_AM5-1, NODE03_AM5-1
    osd: 9 osds: 9 up (since 2d), 9 in (since 3w)

  data:
    pools:   1 pools, 256 pgs
    objects: 108.08k objects, 420 GiB
    usage:   835 GiB used, 15 TiB / 16 TiB avail
    pgs:     256 active+clean

  io:
    client: 2.4 MiB/s rd, 19 KiB/s wr, 108 op/s rd, 3 op/s wr

And some extra info:

systemctl status ceph-mon@PetaSAN_AM5-1
● ceph-mon@PetaSAN_AM5-1.service - Ceph cluster monitor daemon
Loaded: loaded (/lib/systemd/system/ceph-mon@.service; indirect; vendor preset: enabled)
Active: inactive (dead)

The warning appeared after we disconnected one storage switch for an update and then, once the first switch was active again, the second one. The 3 nodes are connected via a bond to both switches (for redundancy). Our storage hasn't been down, but it seems this resulted in the warning.

How could we resolve this? Thanks in advance!

Can you try starting the service? If it cannot start, look at the Ceph logs.

Hi,

I have a question about the original topic of this thread. We also plan to use a 3-node PetaSAN (as the main storage system for Proxmox) spread across two server rooms (the distance is two floors, so latency should not be an issue).

In general, if I place 2 nodes in server room 1 and 1 node in server room 2, and server room 1 loses electricity, the whole PetaSAN goes down because the majority of servers are down. How do we handle this? With a 5-node PetaSAN the problem would be the same. What is the best approach to have an "uninterruptable" PetaSAN between two nearby server rooms?

Thank you very much.

Vladislav

Any system that requires a quorum to work needs n/2 + 1 nodes to be up (n is the number of nodes); this avoids split-brain conditions. For example, a 3-node cluster needs 2 nodes up and a 5-node cluster needs 3, so whichever room holds the minority cannot keep the cluster running on its own. Such a system will always have a problem being split across 2 locations.

For your case, one idea is to set up 5 nodes but make 1 of them a VM running under Proxmox in an HA setup; if one location fails, the VM switches active to the other location to maintain quorum. This VM should be a monitor only (no iSCSI/storage) and should have its virtual disk stored outside of PetaSAN.
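To verify which monitors currently form the quorum (for example while testing such a failover), you can use the standard Ceph commands:

ceph quorum_status --format json-pretty
ceph mon stat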

 

Hi,

Actually, none of the services seem to be running. Also, if I look at our second test environment (Ceph health OK), I see on the 3 nodes:

root@NODE01_AM5-2:~# systemctl status ceph-mon@NODE01__AM5-2
● ceph-mon@NODE01__AM5-2.service - Ceph cluster monitor daemon
Loaded: loaded (/lib/systemd/system/ceph-mon@.service; indirect; vendor preset: enabled)
Active: inactive (dead)
root@NODE01_AM5-2:~# systemctl status ceph-mon@NODE02__AM5-2
● ceph-mon@NODE02__AM5-2.service - Ceph cluster monitor daemon
Loaded: loaded (/lib/systemd/system/ceph-mon@.service; indirect; vendor preset: enabled)
Active: inactive (dead)
root@NODE01_AM5-2:~# systemctl status ceph-mon@NODE03__AM5-2
● ceph-mon@NODE03__AM5-2.service - Ceph cluster monitor daemon
Loaded: loaded (/lib/systemd/system/ceph-mon@.service; indirect; vendor preset: enabled)
Active: inactive (dead)

Is this necessary at all, or is this the default?

Your comment: "Can you try starting the service? If it cannot start, look at the Ceph logs."

(Which 'service' did you mean I had to start?)

Thanks.

On NODE03:

Save the output of the following commands:

ceph daemon mon.NODE03_AM5-1 ops
ceph daemon mon.NODE03_AM5-1 dump_historic_ops

then restart the service:

systemctl restart ceph-mon@NODE03_AM5-1
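If the warning does not clear after the restart, check the monitor log (by default under /var/log/ceph/, for example ceph-mon.NODE03_AM5-1.log) and re-check the cluster status with:

ceph health detail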
