PetaSAN v 1.3.0 released!
admin
2,930 Posts
May 18, 2017, 3:39 pm
Happy to announce the release of PetaSAN 1.3.0 with the following new features:
- Interface Bonding / LACP.
- Support for Jumbo frames.
- Upgrade via installer.
- Email notifications / alarms.
- Selection of 2 or 3 data replicas.
Hope you like it!
mmenzo
8 Posts
May 22, 2017, 7:01 am
Is there any documentation available on how we upgrade PetaSAN from 1.2.2 to 1.3.0 without data loss (and preferably without downtime as well)?
admin
2,930 Posts
May 22, 2017, 9:48 am
The installer will auto-detect your existing installation and offer a choice of Upgrade (the default) or New install. The upgrade will preserve all your data and should take about 5 minutes. The node being upgraded will be down, but the cluster (the other nodes) and client io will keep running. Client io will see roughly a 25 sec pause before the path switches to another node (this timeout is configured on the client; 25 sec is the default in the Windows iSCSI initiator and the VMware iSCSI adapter, but it can be changed).
In the near future we will also be adding online updates, which will cover small application changes, whereas installer upgrades will typically be used for major changes like kernel/ceph/ubuntu system files.
mmenzo
8 Posts
May 22, 2017, 3:12 pm
Thank you for the information! Our cluster has successfully been updated without any downtime / data loss.
admin
2,930 Posts
May 23, 2017, 2:00 pm
Nice to hear things went smoothly 🙂
Are you happy with the performance? Is your cluster environment Linux/Windows/VMware?
mmenzo
8 Posts
May 24, 2017, 12:25 pm
Quote from admin on May 23, 2017, 2:00 pm
Nice to hear things went smoothly 🙂
Are you happy with the performance? Is your cluster environment Linux/Windows/VMware?
We use VMware. Currently we are not entirely happy with the cluster performance (the Ceph benchmark runs at 80 MB/s write and 300 MB/s read, and VMware reports about half of those values), but that may be because we took a bunch of used disks and placed them in the cluster. I have the feeling some of those disks are quite slow and are bringing the cluster performance down, but I'm still investigating it.
We are happy with the level of redundancy though! We performed some tests (like physically pulling the power plugs on a storage node) and nothing broke.
gomaelettronica
10 Posts
May 24, 2017, 2:42 pm
Hi,
I've just updated my test cluster to version 1.3.0. Very easy and very fast, with no downtime. 🙂 I'm curious to test performance with the "Ceph Benchmark", but I don't know how. Could you please give me some hints?
Thanks in advance.
Luca
mmenzo
8 Posts
May 24, 2017, 2:56 pm
Quote from gomaelettronica on May 24, 2017, 2:42 pm
Hi,
I've just updated my test cluster to version 1.3.0. Very easy and very fast, with no downtime. 🙂 I'm curious to test performance with the "Ceph Benchmark", but I don't know how. Could you please give me some hints?
Thanks in advance.
Luca
I used the following tutorial to test the performance in Ceph: http://tracker.ceph.com/projects/ceph/wiki/Benchmark_Ceph_Cluster_Performance
To test the actual writing speed for a VM I used the Linux command "dd" - for example "dd if=/dev/zero of=/tmp/tmp.img bs=1G count=1 oflag=direct"
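The basic write/read test from that tutorial looks roughly like this (the "rbd" pool name here is just an example, substitute whichever pool your cluster uses):
rados bench -p rbd 10 write --no-cleanup    # 10 second write test, keep the objects for the read test
rados bench -p rbd 10 seq                   # sequential read test against the objects just written
rados -p rbd cleanup                        # remove the benchmark objects when done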
Last edited on May 24, 2017, 2:57 pm · #8
gomaelettronica
10 Posts
May 24, 2017, 5:04 pm
Thank you very much!
In the next few days I will benchmark my test cluster.
Best regards.
Luca
admin
2,930 Posts
May 24, 2017, 5:19 pm
We are happy with the level of redundancy though!
Well, it is good to hear, we have put in a lot of work to make it this way 🙂 We are also planning to add performance benchmarking and tuning features in the near future.
In your case, the quickest way to identify the bottleneck is to measure how busy your disks and cpus are (your network is 10G, so it is not the culprit) while running the Ceph benchmark test. We have several command line tools included: atop, collectl and sysstat. You can choose any one of them. We are looking for busy % rather than bandwidth/iops.
show all stats in 1 page:
atop
sysstat is my preference. Total cpu:
sar 3 5
individual cores:
sar -P ALL 3 5
disks:
sar -d -p 3 5
Hopefully the cpu utilizations are low and the disks are the bottleneck. If all your disks show a high busy % and the cpu % is low, your system can accommodate more disks per host; the more you add, the faster the cluster becomes, and you will start using more of the idle cpu / network resources. If only a couple of disks show a high %, they are bad apples slowing down the entire cluster and should be removed (remove one at a time and allow Ceph self-healing to complete, which can be monitored via the PG Status on the dashboard). If your cpus show a high %, then you are using under-powered machines or we have a problem.
It goes without saying that if you can use faster disks (SSDs!) things will really fly, but of course the purpose of tuning is to make the best use of existing resources.
Regarding the difference between what you saw in the Ceph benchmark and in VMware, this is probably due to different block sizes and io depth / threads. The rados bench command defaults to a 4M block size and 16 threads; your VMware workload is using much smaller block sizes than this (you can estimate the average block size by dividing the observed bandwidth by the iops), and you can re-run rados bench with the -b option to match it.
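For example, a run closer to a small-block VM workload could look like this (the pool name and figures below are illustrative only, not taken from your cluster):
rados bench -p rbd 60 write -b 4096 -t 16    # 60 second write test with 4K blocks instead of the 4M default
For the average block size estimate: 100 MB/s at 25,600 iops gives 100 x 1024 / 25,600 ≈ 4 KB per io.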
For the raw disk test you did with the dd command, you need to add the dsync flag, since the journal uses it, and use a smaller block size closer to the VMware pattern:
dd if=/dev/zero of=out_file bs=4K count=100000 oflag=direct,dsync
Also note this tests sequential write speed; in a real VMware case you will have many concurrent threads doing small io, and your disk seek latency will be the main factor.
One more thing: Ceph uses a journal to achieve write integrity, so that a failure halfway through a write does not leave inconsistent data. Each client write io is therefore done twice on disk, requiring 3 seeks, and if your replica count is 2 (the default, changeable from Cluster Settings) this is doubled again. For larger block sizes, such as the 4M used in the Ceph benchmark, you can expect roughly a 1/4 ratio between write and read speeds; for small block sizes the gap can grow to around 6x. This correlates with the 80/300 you see.
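As a rough back-of-envelope check (assuming replica count 2 and the 4M benchmark block size):
write amplification ≈ 2 (journal + data) x 2 (replicas) = 4
expected write speed ≈ 300 MB/s read / 4 ≈ 75 MB/s, which is in line with the 80 MB/s you measured.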
Last edited on May 25, 2017, 7:27 am · #10