PetaSAN in our 2 datacenters for production
Syscon
23 Posts
December 16, 2019, 10:22 am
Hi Admin,
We have added a second cluster to act as a backup cluster (the same hardware as given earlier in this post for the main cluster). The replication process went well but slowly (average speed around 10 MB/s). Replication was done without any VMs running. I have compression enabled for the replication job.
Is there a way to speed this up?
FYI, a 4k IOPS benchmark (1 thread/1 client) delivers 558 write and 2750 read on the replication cluster (while replication was running).
Furthermore, in case our main cluster becomes unusable and we have to use the replication cluster, what are the steps to get the replication cluster working as fast as possible in our VMware environment? (So that we can start up the VMs again from this storage.)
Thanks for your reply!
admin
2,930 Posts
December 16, 2019, 11:23 am
Apart from the network, the compression type can slow things down; can you try either without compression or with zstd?
You can perform replication while your VMs are running, as it uses snapshots.
To get the replication cluster working fast, we have a start-all-disks script:
/opt/petasan/scripts/util/start_all_disks.py
You should also stop the disks on the main cluster:
/opt/petasan/scripts/util/stop_all_disks.py
You should also stop the replication jobs from the UI, or deactivate the replication user at the destination.
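As a rough sketch of that failover sequence (the hostnames are only examples; the scripts are run on a node of each cluster):
# 1. Stop the replication jobs from the UI, or deactivate the replication user
#    at the destination, so no sync runs against the images you are about to start.
# 2. On the main cluster (if it is still reachable), stop all iSCSI disks:
ssh root@main-node1 /opt/petasan/scripts/util/stop_all_disks.py
# 3. On the backup cluster, start all iSCSI disks:
ssh root@backup-node1 /opt/petasan/scripts/util/start_all_disks.py
# 4. In VMware, add the backup cluster's iSCSI targets, rescan the storage adapters,
#    then re-register and power on the VMs from the datastore.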
Syscon
23 Posts
December 18, 2019, 8:44 am
Hi Admin,
I have tested the replication task without compression (still have to test with zstd) and the speed increased 5 times, thanks!
Adding the replication cluster to ESXi also worked.
I noticed something else during the test, though: when I disconnect one of the iSCSI/Backend (10Gb) connections on one of the nodes (this applies to all nodes), the node becomes unavailable in the cluster.
The network setup (per node) looks like this:
1x Management (1Gb)
2x 10Gb for iSCSI and Backend -> the first 10Gb adapter carries iSCSI 1 + Backend 1, and the second 10Gb adapter carries iSCSI 2 + Backend 2.
I would expect that if I disconnect the cable from the first 10Gb adapter, the second will continue to function (and the node will stay active).
Could you please explain this? Thank you!
admin
2,930 Posts
December 18, 2019, 10:30 am
You were probably using gzip compression, which does give high compression but is not real-time like zstd and lz4. Note you can also try using smaller disk images and see if the rate increases.
For the network issue: Backend 1 and Backend 2 are different networks serving different purposes; they are not a redundant setup. If you need redundancy, you need to configure interface bonding.
Syscon
23 Posts
December 19, 2019, 8:02 am
Great, with bonding the failover works now.
Regarding the bonding modes, I have chosen 'balance-tlb'. Is this the best type/mode, or do you advise another one for this setup?
The goal is redundancy (we will use two switches for the two 10Gb network adapters) and performance.
Thanks!
Last edited on December 19, 2019, 10:42 am by Syscon · #15
admin
2,930 Posts
December 19, 2019, 5:16 pm
This depends on your network and switches; the most common modes we see are LACP and active/backup. You should have your bonds go to two different switches to protect against switch failures, but for LACP you need to check that your switches support splitting a bond across both of them.
Last edited on December 19, 2019, 5:16 pm by admin · #16
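To verify which mode a bond actually came up in and that both slave links are up, the standard Linux bonding status file can be checked on each node (the interface name bond0 is just an example):
# Shows the bond mode (e.g. 802.3ad for LACP, active-backup, balance-tlb)
# and the link state of each slave interface
cat /proc/net/bonding/bond0
# Bond details via iproute2, as an alternative
ip -d link show bond0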
Syscon
23 Posts
January 9, 2020, 2:04 pm
Regarding compression, I chose 'zstd', and I have since copied 3 TB of virtual machines (VMware) onto the cluster (42 TB raw with 2 replicas, so 21 TB usable). Comparing the storage counters (in the UI and via SSH) with the used space in vCenter/VMware, I can't see that any compression took place. How can I check the compression ratio?
Thanks for your support!
admin
2,930 Posts
January 9, 2020, 4:52 pm
The compression used in replication is on-the-fly network traffic compression applied during transmission only; the data is decompressed on the fly at the destination. The destination image is an identical copy of the source image and can be started as a normal iSCSI disk at the destination, so it is not compressed.
While a replication job is active, you can get its status from the UI, which shows the measured compression ratio. You can probably measure the same thing with a network traffic monitoring tool.
In the near future we will also be supporting backup/restore operations, including to/from remote sites (and cloud storage). Backups will indeed support compression at the storage level, but replication between clusters is different, as you need to store a valid Ceph image.
Last edited on January 9, 2020, 4:55 pm by admin · #18
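As a quick way to eyeball the on-the-wire rate (to compare against the logical rate the replication job reports), the interface byte counters can be sampled; the interface name below is only an example:
# Snapshot the TX/RX byte counters on the interface carrying replication traffic,
# wait a known interval, snapshot again, and divide the TX-bytes delta by the interval.
ip -s link show eth2; sleep 10; ip -s link show eth2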
Syscon
23 Posts
January 10, 2020, 8:03 am
OK, regarding replication that's clear, thanks!
My remaining question (I was not clear here) is about the compression ratio on the primary storage. With zstd chosen, what will my 'local' compression ratio be, and how can I check it? I hope to see that, using zstd compression, more VMs/data can be stored on the primary storage. With our former storage we managed a ratio of 1.75 with LZ4, for example.
admin
2,930 Posts
January 10, 2020, 9:28 am
Ah, so you mean the pool is compressed.
You can view compression data in various ways.
ceph df detail
will show, at the pool level under POOLS:
UNDER COMPR: how many bytes of data qualified for compression
USED COMPR: the size of this data after compression, in bytes
Also, under RAW STORAGE, you will find that the USED value grows by less than your writes x replica count.
At the OSD level you can run
for osd in `seq 0 10`; do echo osd.$osd; sudo ceph daemon osd.$osd perf dump | grep 'bluestore_compressed'; done
and look at
bluestore_compressed_allocated -> bytes after compression
bluestore_compressed_original -> bytes before compression