PetaSAN in our 2 datacenters for production
Syscon
23 Posts
January 14, 2020, 8:15 am
Great! "esxcli storage core device vaai status get" did return 'Supported' for the 'Delete Status'.
And good to know that VMFS will re-use the disk space that is not reclaimed. While reusing this disk space, does compression take place (i.e. for the non-reclaimed space)? Or should I first reclaim the space before compression can kick in? After removing 100 GB of data (which was put on PetaSAN before compression worked) and putting new VMs (1 TB of data) on the storage, I now see:
RAW STORAGE:
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 42 TiB 16 TiB 26 TiB 26 TiB 61.26
TOTAL 42 TiB 16 TiB 26 TiB 26 TiB 61.26
POOLS:
POOL ID STORED OBJECTS USED %USED MAX AVAIL QUOTA OBJECTS QUOTA BYTES DIRTY USED COMPR UNDER COMPR
rbd 1 13 TiB 3.37M 26 TiB 74.31 4.4 TiB N/A N/A 3.37M 117 GiB 235 GiB
And:
osd.8
"bluestore_compressed": 1696971702,
"bluestore_compressed_allocated": 5907742720,
"bluestore_compressed_original": 11883800064,
osd.9
"bluestore_compressed": 1772540552,
"bluestore_compressed_allocated": 6132400128,
"bluestore_compressed_original": 12336011264,
osd.10
"bluestore_compressed": 1450032079,
"bluestore_compressed_allocated": 5003083776,
"bluestore_compressed_original": 10065756672,
Is only 117 GB compressed now (as shown under 'USED COMPR')?
And can we conclude that on osd.10, for example, only about 5 GB of data is saved due to compression?
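(For reference, a rough check on the osd.10 counters above, assuming they are byte counts as in BlueStore's perf dump:
bluestore_compressed_original − bluestore_compressed_allocated = 10,065,756,672 − 5,003,083,776 ≈ 5.06 GB (about 4.7 GiB) saved on that OSD, which matches the ~5 GB estimate.)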
Great! "esxcli storage core device vaai status get" did return 'Supported' for the 'Delete Status'.
And good to know that vmfs will re-use the disk space that is not reclaimed. While reusing this disk space, does compression take place (so, for the non-reclaimed space)? Or should i then first reclaim the space before compression can kick in? After removing 100GB's of data (which was put on PetaSAN before compression worked) en putting new VM's (1TB of data) on the storage, I now see:
RAW STORAGE:
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 42 TiB 16 TiB 26 TiB 26 TiB 61.26
TOTAL 42 TiB 16 TiB 26 TiB 26 TiB 61.26
POOLS:
POOL ID STORED OBJECTS USED %USED MAX AVAIL QUOTA OBJECTS QUOTA BYTES DIRTY USED COMPR UNDER COMPR
rbd 1 13 TiB 3.37M 26 TiB 74.31 4.4 TiB N/A N/A 3.37M 117 GiB 235 GiB
And:
osd.8
"bluestore_compressed": 1696971702,
"bluestore_compressed_allocated": 5907742720,
"bluestore_compressed_original": 11883800064,
osd.9
"bluestore_compressed": 1772540552,
"bluestore_compressed_allocated": 6132400128,
"bluestore_compressed_original": 12336011264,
osd.10
"bluestore_compressed": 1450032079,
"bluestore_compressed_allocated": 5003083776,
"bluestore_compressed_original": 10065756672,
Is only 117GB now compressed? (as shown under 'USED COMPR')
And can we conclude that on OSD-10 (f.e.) only 5GB data is saved (due to compression)?
admin
2,930 Posts
January 14, 2020, 1:57 pm
You do not need to reclaim space; new data written after you enable compression will be evaluated for compression.
Yes, only 117 GB got compressed.
Can you show
ceph daemon osd.8 perf dump | grep compress_success_count
ceph daemon osd.8 perf dump | grep compress_rejected_count
Are you using SSD or HDD OSDs?
There is a minimum write size below which compression will not happen. The defaults are:
"bluestore_compression_min_blob_size_hdd": "131072",
"bluestore_compression_min_blob_size_ssd": "8192",
If you write random VM traffic, writes below the above sizes will not be compressed.
If you do file transfers (file copy, backups), the block sizes will be big enough for compression.
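If you want to double-check what is actually in effect, something like the following should work (pool and OSD names taken from the output above; adjust to your cluster):
ceph osd pool get rbd compression_mode
ceph osd pool get rbd compression_algorithm
ceph daemon osd.8 config show | grep bluestore_compression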
Last edited on January 14, 2020, 1:58 pm by admin · #32
Syscon
23 Posts
January 14, 2020, 2:16 pm
The results:
root@NODE01_AM5-2:~# ceph daemon osd.8 perf dump | grep compress_success_count
"compress_success_count": 365627,
root@NODE01_AM5-2:~# ceph daemon osd.8 perf dump | grep compress_rejected_count
"compress_rejected_count": 71263,
We are using SSDs.
Is the minimum write size independent of the compression type used (e.g. zstd vs. snappy)?
Looking at the 'RAW STORAGE' availability (16 TB) and comparing it to the pool's 'MAX AVAIL' (4.3 TB), I don't understand why the pool has less available than the RAW STORAGE shows. We have 2 replicas, so I would expect 16 TB / 2 = 8 TB available on the pool (but it seems to be 4.3 TB):
RAW STORAGE:
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 42 TiB 16 TiB 26 TiB 26 TiB 61.69
TOTAL 42 TiB 16 TiB 26 TiB 26 TiB 61.69
POOLS:
POOL ID STORED OBJECTS USED %USED MAX AVAIL QUOTA OBJECTS QUOTA BYTES DIRTY USED COMPR UNDER COMPR
rbd 1 13 TiB 3.39M 26 TiB 74.90 4.3 TiB N/A N/A 3.39M 139 GiB 279 GiB
What causes the pool to have less space left than expected?
Thank you.
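(One thing that may be worth checking: Ceph derives a pool's MAX AVAIL from the fullest OSD the pool can use, not from the average, so uneven OSD utilization lowers it. Per-OSD utilization can be listed with:
ceph osd df)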
Last edited on January 14, 2020, 2:31 pm by Syscon · #33
admin
2,930 Posts
Syscon
23 Posts
January 17, 2020, 12:48 pm
Thanks, using the steps you gave:
ceph balancer mode crush-compat
ceph balancer on
there is now more balance between the RAW and pool space shown.
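(For reference, the balancer state and mode can be verified afterwards with:
ceph balancer status)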
One of our two environments now gives this warning (since a few days ago):
root@NODE01_AM5-1:~# ceph -s
cluster:
id: 65838b73-85a9-4917-9644-cd7c1c292773
health: HEALTH_WARN
15 slow ops, oldest one blocked for 257004 sec, mon.NODE03_AM5-1 has slow ops
services:
mon: 3 daemons, quorum NODE03_AM5-1,NODE01_AM5-1,NODE02_AM5-1 (age 2d)
mgr: NODE01_AM5-1(active, since 3d), standbys: NODE02_AM5-1, NODE03_AM5-1
osd: 9 osds: 9 up (since 2d), 9 in (since 3w)
data:
pools: 1 pools, 256 pgs
objects: 108.08k objects, 420 GiB
usage: 835 GiB used, 15 TiB / 16 TiB avail
pgs: 256 active+clean
io:
client: 2.4 MiB/s rd, 19 KiB/s wr, 108 op/s rd, 3 op/s wr
And some extra info:
systemctl status ceph-mon@PetaSAN_AM5-1
● ceph-mon@PetaSAN_AM5-1.service - Ceph cluster monitor daemon
Loaded: loaded (/lib/systemd/system/ceph-mon@.service; indirect; vendor preset: enabled)
Active: inactive (dead)
The warning came after we disconnected one storage switch for an update and then, once the first switch was active again, the second one. The 3 nodes are connected via a bond to both switches (for redundancy). Our storage hasn't been down, but it seems this resulted in the warning.
How could we resolve this? Thanks in advance!
Last edited on January 17, 2020, 12:50 pm by Syscon · #35
admin
2,930 Posts
January 17, 2020, 3:22 pm
Can you try starting the service? If it cannot start, look at the Ceph logs.
vantolik
5 Posts
January 17, 2020, 9:46 pm
Hi,
I have a question about the original topic of this thread. We also plan to use a 3-node PetaSAN (as the main storage system for Proxmox) spanning two server rooms (the distance is two floors, so latency should not be an issue).
In general, if I place 2 nodes in server room 1 and 1 node in server room 2, and server room 1 loses electricity, the whole PetaSAN goes down because a majority of the servers are down. How do we handle this? With a 5-node PetaSAN the problem would be the same. What is the best approach to have an "uninterruptible" PetaSAN between two nearby server rooms?
Thank you very much.
Vladislav
admin
2,930 Posts
January 18, 2020, 12:02 am
Any system that requires a quorum to work needs n/2 + 1 nodes to be up (n is the number of nodes); this avoids split-brain conditions. Such a system will have a problem being split across 2 locations.
For your case, one idea is to set up 5 nodes but make 1 of them a VM running under Proxmox in an HA setup; if one location fails, the VM switches over to the other location to maintain quorum. This VM should be a monitor only (no iSCSI/storage) and should have its virtual disk stored outside of PetaSAN.
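To make the quorum arithmetic concrete: with n = 3 you need floor(3/2) + 1 = 2 nodes up, so losing the room that holds 2 nodes stops the cluster; with n = 5 you need 3 up, so a room with 2 physical nodes plus the failed-over monitor VM can still reach quorum.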
Last edited on January 18, 2020, 12:04 am by admin · #38
Syscon
23 Posts
January 18, 2020, 7:30 am
Hi,
Actually, none of the services seem to be running. Also, if I look at our second test environment (Ceph health OK), I see on the 3 nodes:
root@NODE01_AM5-2:~# systemctl status ceph-mon@NODE01__AM5-2
● ceph-mon@NODE01__AM5-2.service - Ceph cluster monitor daemon
Loaded: loaded (/lib/systemd/system/ceph-mon@.service; indirect; vendor preset: enabled)
Active: inactive (dead)
root@NODE01_AM5-2:~# systemctl status ceph-mon@NODE02__AM5-2
● ceph-mon@NODE02__AM5-2.service - Ceph cluster monitor daemon
Loaded: loaded (/lib/systemd/system/ceph-mon@.service; indirect; vendor preset: enabled)
Active: inactive (dead)
root@NODE01_AM5-2:~# systemctl status ceph-mon@NODE03__AM5-2
● ceph-mon@NODE03__AM5-2.service - Ceph cluster monitor daemon
Loaded: loaded (/lib/systemd/system/ceph-mon@.service; indirect; vendor preset: enabled)
Active: inactive (dead)
Is this necessary at all, or is this the default?
Your comment was: "Can you try starting the service? If it cannot start, look at the Ceph logs."
(Which 'service' did you mean I had to start?)
Thanks.
admin
2,930 Posts
January 18, 2020, 9:26 am
On NODE03:
Save the output of the following:
ceph daemon mon.NODE03_AM5-1 ops
ceph daemon mon.NODE03_AM5-1 dump_historic_ops
Then restart the service:
systemctl restart ceph-mon@NODE03_AM5-1
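If the restart does not help, the monitor log on that node should show why; assuming default log paths and cluster name, for example:
journalctl -u ceph-mon@NODE03_AM5-1
tail -n 100 /var/log/ceph/ceph-mon.NODE03_AM5-1.log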