PetaSAN in our 2 datacenters for production
Syscon
23 Posts
January 14, 2020, 8:15 am
Great! "esxcli storage core device vaai status get" did return 'Supported' for the 'Delete Status'.
And good to know that VMFS will re-use the disk space that is not reclaimed. While reusing this disk space, does compression take place (i.e. for the non-reclaimed space)? Or should I first reclaim the space before compression can kick in? After removing 100 GB of data (which was put on PetaSAN before compression worked) and putting new VMs (1 TB of data) on the storage, I now see:
RAW STORAGE:
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 42 TiB 16 TiB 26 TiB 26 TiB 61.26
TOTAL 42 TiB 16 TiB 26 TiB 26 TiB 61.26
POOLS:
POOL ID STORED OBJECTS USED %USED MAX AVAIL QUOTA OBJECTS QUOTA BYTES DIRTY USED COMPR UNDER COMPR
rbd 1 13 TiB 3.37M 26 TiB 74.31 4.4 TiB N/A N/A 3.37M 117 GiB 235 GiB
And:
osd.8
"bluestore_compressed": 1696971702,
"bluestore_compressed_allocated": 5907742720,
"bluestore_compressed_original": 11883800064,
osd.9
"bluestore_compressed": 1772540552,
"bluestore_compressed_allocated": 6132400128,
"bluestore_compressed_original": 12336011264,
osd.10
"bluestore_compressed": 1450032079,
"bluestore_compressed_allocated": 5003083776,
"bluestore_compressed_original": 10065756672,
Is only 117 GB compressed now (as shown under 'USED COMPR')?
And can we conclude that on osd.10, for example, only about 5 GB of data is saved due to compression?
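(For reference, a rough check on the osd.10 counters above, assuming they are byte counts as in BlueStore's perf dump:
bluestore_compressed_original − bluestore_compressed_allocated = 10,065,756,672 − 5,003,083,776 ≈ 5.06 GB (about 4.7 GiB) saved on that OSD, which matches the ~5 GB estimate.)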
Great! "esxcli storage core device vaai status get" did return 'Supported' for the 'Delete Status'.
And good to know that vmfs will re-use the disk space that is not reclaimed. While reusing this disk space, does compression take place (so, for the non-reclaimed space)? Or should i then first reclaim the space before compression can kick in? After removing 100GB's of data (which was put on PetaSAN before compression worked) en putting new VM's (1TB of data) on the storage, I now see:
RAW STORAGE:
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 42 TiB 16 TiB 26 TiB 26 TiB 61.26
TOTAL 42 TiB 16 TiB 26 TiB 26 TiB 61.26
POOLS:
POOL ID STORED OBJECTS USED %USED MAX AVAIL QUOTA OBJECTS QUOTA BYTES DIRTY USED COMPR UNDER COMPR
rbd 1 13 TiB 3.37M 26 TiB 74.31 4.4 TiB N/A N/A 3.37M 117 GiB 235 GiB
And:
osd.8
"bluestore_compressed": 1696971702,
"bluestore_compressed_allocated": 5907742720,
"bluestore_compressed_original": 11883800064,
osd.9
"bluestore_compressed": 1772540552,
"bluestore_compressed_allocated": 6132400128,
"bluestore_compressed_original": 12336011264,
osd.10
"bluestore_compressed": 1450032079,
"bluestore_compressed_allocated": 5003083776,
"bluestore_compressed_original": 10065756672,
Is only 117GB now compressed? (as shown under 'USED COMPR')
And can we conclude that on OSD-10 (f.e.) only 5GB data is saved (due to compression)?
admin
2,930 Posts
January 14, 2020, 1:57 pm
You do not need to reclaim space; new data written after you enable compression will be evaluated for compression.
Yes, only 117 GB got compressed.
Can you show
ceph daemon osd.8 perf dump | grep compress_success_count
ceph daemon osd.8 perf dump | grep compress_rejected_count
Are you using SSD or HDD OSDs?
There is a minimum write size below which compression will not happen. The defaults are:
"bluestore_compression_min_blob_size_hdd": "131072",
"bluestore_compression_min_blob_size_ssd": "8192",
If you write random VM traffic, writes below the above sizes will not be compressed.
If you do file transfers (file copy, backups), the block sizes will be big enough for compression.
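If you want to double-check what is actually in effect, something like the following should work (pool and OSD names taken from the output above; adjust to your cluster):
ceph osd pool get rbd compression_mode
ceph osd pool get rbd compression_algorithm
ceph daemon osd.8 config show | grep bluestore_compression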
Last edited on January 14, 2020, 1:58 pm by admin · #32
Syscon
23 Posts
January 14, 2020, 2:16 pm
The results:
root@NODE01_AM5-2:~# ceph daemon osd.8 perf dump | grep compress_success_count
"compress_success_count": 365627,
root@NODE01_AM5-2:~# ceph daemon osd.8 perf dump | grep compress_rejected_count
"compress_rejected_count": 71263,
We are using SSDs.
Is the minimum write size independent of the compression type used (e.g. zstd vs. snappy)?
Looking at the 'RAW STORAGE' availability (16 TB) and comparing it to the pool's 'MAX AVAIL' (4.3 TB), I don't understand why the pool has less available than the RAW STORAGE shows. We have 2 replicas, so I would expect 16 TB / 2 = 8 TB available on the pool (but it seems to be 4.3 TB):
RAW STORAGE:
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 42 TiB 16 TiB 26 TiB 26 TiB 61.69
TOTAL 42 TiB 16 TiB 26 TiB 26 TiB 61.69
POOLS:
POOL ID STORED OBJECTS USED %USED MAX AVAIL QUOTA OBJECTS QUOTA BYTES DIRTY USED COMPR UNDER COMPR
rbd 1 13 TiB 3.39M 26 TiB 74.90 4.3 TiB N/A N/A 3.39M 139 GiB 279 GiB
What causes the pool to have less space left than expected?
Thank you.
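(One thing that may be worth checking: Ceph derives a pool's MAX AVAIL from the fullest OSD the pool can use, not from the average, so uneven OSD utilization lowers it. Per-OSD utilization can be listed with:
ceph osd df)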
Last edited on January 14, 2020, 2:31 pm by Syscon · #33
admin
2,930 Posts
Syscon
23 Posts
January 17, 2020, 12:48 pm
Thanks, using the steps you gave:
ceph balancer mode crush-compat
ceph balancer on
there is now more balance between the RAW and pool space shown.
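(For reference, the balancer state and mode can be verified afterwards with:
ceph balancer status)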
One of our two environments now gives this warning (since a few days ago):
root@NODE01_AM5-1:~# ceph -s
cluster:
id: 65838b73-85a9-4917-9644-cd7c1c292773
health: HEALTH_WARN
15 slow ops, oldest one blocked for 257004 sec, mon.NODE03_AM5-1 has slow ops
services:
mon: 3 daemons, quorum NODE03_AM5-1,NODE01_AM5-1,NODE02_AM5-1 (age 2d)
mgr: NODE01_AM5-1(active, since 3d), standbys: NODE02_AM5-1, NODE03_AM5-1
osd: 9 osds: 9 up (since 2d), 9 in (since 3w)
data:
pools: 1 pools, 256 pgs
objects: 108.08k objects, 420 GiB
usage: 835 GiB used, 15 TiB / 16 TiB avail
pgs: 256 active+clean
io:
client: 2.4 MiB/s rd, 19 KiB/s wr, 108 op/s rd, 3 op/s wr
And some extra info:
systemctl status ceph-mon@PetaSAN_AM5-1
● ceph-mon@PetaSAN_AM5-1.service - Ceph cluster monitor daemon
Loaded: loaded (/lib/systemd/system/ceph-mon@.service; indirect; vendor preset: enabled)
Active: inactive (dead)
The warning came after we disconnected one storage switch for an update and then, once the first switch was active again, the second one. The 3 nodes are connected via a bond to both switches (for redundancy). Our storage hasn't been down, but it seems this resulted in the warning.
How could we resolve this? Thanks in advance!
Last edited on January 17, 2020, 12:50 pm by Syscon · #35
admin
2,930 Posts
January 17, 2020, 3:22 pm
Can you try starting the service? If it cannot start, look at the Ceph logs.
vantolik
5 Posts
January 17, 2020, 9:46 pm
Hi,
I have a question about the original topic of this thread. We also plan to use a 3-node PetaSAN (as the main storage system for Proxmox) spanning two server rooms (the distance is two floors, so latency should not be an issue).
In general, if I place 2 nodes in server room 1 and 1 node in server room 2, and server room 1 loses electricity, the whole PetaSAN goes down because a majority of the servers are down. How do we handle this? With a 5-node PetaSAN the problem would be the same. What is the best approach to have an "uninterruptible" PetaSAN between two nearby server rooms?
Thank you very much.
Vladislav
admin
2,930 Posts
January 18, 2020, 12:02 am
Any system that requires a quorum to work needs n/2 + 1 nodes to be up (n is the number of nodes); this avoids split-brain conditions. Such a system will have a problem being split across 2 locations.
For your case, one idea is to set up 5 nodes but make 1 of them a VM running under Proxmox in an HA setup; if one location fails, the VM switches over to the other location to maintain quorum. This VM should be a monitor only (no iSCSI/storage) and should have its virtual disk stored outside of PetaSAN.
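To make the quorum arithmetic concrete: with n = 3 you need floor(3/2) + 1 = 2 nodes up, so losing the room that holds 2 nodes stops the cluster; with n = 5 you need 3 up, so a room with 2 physical nodes plus the failed-over monitor VM can still reach quorum.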
Last edited on January 18, 2020, 12:04 am by admin · #38
Syscon
23 Posts
January 18, 2020, 7:30 am
Hi,
Actually, none of the services seem to be running. Also, if I look at our second test environment (Ceph health OK), I see on the 3 nodes:
root@NODE01_AM5-2:~# systemctl status ceph-mon@NODE01__AM5-2
● ceph-mon@NODE01__AM5-2.service - Ceph cluster monitor daemon
Loaded: loaded (/lib/systemd/system/ceph-mon@.service; indirect; vendor preset: enabled)
Active: inactive (dead)
root@NODE01_AM5-2:~# systemctl status ceph-mon@NODE02__AM5-2
● ceph-mon@NODE02__AM5-2.service - Ceph cluster monitor daemon
Loaded: loaded (/lib/systemd/system/ceph-mon@.service; indirect; vendor preset: enabled)
Active: inactive (dead)
root@NODE01_AM5-2:~# systemctl status ceph-mon@NODE03__AM5-2
● ceph-mon@NODE03__AM5-2.service - Ceph cluster monitor daemon
Loaded: loaded (/lib/systemd/system/ceph-mon@.service; indirect; vendor preset: enabled)
Active: inactive (dead)
Is this necessary at all, or is this the default?
Your comment was: "Can you try starting the service? If it cannot start, look at the Ceph logs."
(Which 'service' did you mean I had to start?)
Thanks.
admin
2,930 Posts
January 18, 2020, 9:26 am
On NODE03:
Save the output of the following:
ceph daemon mon.NODE03_AM5-1 ops
ceph daemon mon.NODE03_AM5-1 dump_historic_ops
Then restart the service:
systemctl restart ceph-mon@NODE03_AM5-1
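If the restart does not help, the monitor log on that node should show why; assuming default log paths and cluster name, for example:
journalctl -u ceph-mon@NODE03_AM5-1
tail -n 100 /var/log/ceph/ceph-mon.NODE03_AM5-1.log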