
PetaSAN v1.3.0 released!

Happy to announce the release of PetaSAN 1.3.0 with the following new features:

  • Interface Bonding / LACP.
  • Support for jumbo frames.
  • Upgrade via installer.
  • Email notifications / alarms.
  • Selection of 2 or 3 data replicas.

Hope you like it!

Is there any documentation available on how we upgrade PetaSAN from 1.2.2 to 1.3.0 without data loss (and preferably without downtime as well)?

The installer will auto-detect your existing installation and will offer the choice of Upgrade (the default) or New install. The upgrade preserves all your data and should take about 5 minutes. The node being upgraded will be down, but the rest of the cluster (the other nodes) and client I/O will keep running. Clients will see roughly a 25-second pause before the path is switched to another node (this timeout is client-configured; it is the default setting in the Windows iSCSI initiator and the VMware iSCSI adapter, but can be changed).

In the near future we will also be adding online updates, which will cover small application changes, whereas installer upgrades will typically be used for major changes such as kernel/Ceph/Ubuntu system files.

 

Thank you for the information! Our cluster has successfully been updated without any downtime or data loss.

Nice to hear things went smooth 🙂

Are you happy with the performance? Is your cluster environment Linux/Windows/VMware?

Quote from admin on May 23, 2017, 2:00 pm

Nice to hear things went smooth 🙂

Are you happy with the performance? Is your cluster environment Linux/Windows/VMware?

We use VMware. Currently we are not entirely happy with the cluster performance (ceph benchmark runs at 80 MB/s write, 300 MB/s read and VMware reports about half of those values), but that may be because we took a bunch of used disks and placed them in the cluster. I have the feeling some of those disks are quite slow and they are bringing the cluster performance down, but I'm still investigating it.

We are happy with the level of redundancy though! We performed some tests (like physically pulling the power plugs on a storage node) and nothing broke.

Hi,

I've just updated my test Cluster to version 1.3.0. Very easy and very fast with no downtime. 🙂 I'm curious to test performance with the "Ceph Benchmark" but I don't know how. Could you please give me some hints?

Thanks in advance.

Luca

Quote from gomaelettronica on May 24, 2017, 2:42 pm

Hi,

I've just updated my test Cluster to version 1.3.0. Very easy and very fast with no downtime. 🙂 I'm curious to test performance with the "Ceph Benchmark" but I don't know how. Could you please give me some hints?

Thanks in advance.

Luca

I used the following tutorial to test the performance in Ceph: http://tracker.ceph.com/projects/ceph/wiki/Benchmark_Ceph_Cluster_Performance
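For reference, the core commands from that tutorial are along these lines (the pool name rbd here is just a placeholder, substitute your own pool):

rados bench -p rbd 10 write --no-cleanup

rados bench -p rbd 10 seq

rados -p rbd cleanup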

To test the actual writing speed for a VM I used the Linux command "dd" - for example "dd if=/dev/zero of=/tmp/tmp.img bs=1G count=1 oflag=direct"

Thank you very much!

In the next days I will benchmark my testing cluster.

Best regards.

Luca

We are happy with the level of redundancy though!

Well, it is good to hear; we have put in a lot of work to make it this way 🙂 We are also planning to add performance benchmarking and tuning features in the near future.

In your case, the quickest way to identify the bottleneck is to measure how busy your disks and CPUs are while running the Ceph benchmark (your network is 10G, so it is not the culprit). We include several command line tools: atop, collectl and sysstat; you can use any of them. Look for busy % rather than bandwidth/IOPS.

Show all stats on one page:

atop

sysstat is my preference. Total CPU:

sar 3 5

Individual cores:

sar -P ALL 3 5

Disks:

sar -d -p 3 5
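If you prefer collectl, something along these lines should give a combined CPU and per-disk view (flag letters from memory, collectl --help has the full list):

collectl -scD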

Hopefully the CPU utilization is low and the disks are the bottleneck. If all your disks show a high busy % and CPU % is low, your system can accommodate more disks on the same host; the more you add, the faster the cluster becomes, and you will start using more of the idle CPU/network resources. If only a couple of disks show a high busy %, they are bad apples slowing the entire cluster and should be removed (remove one at a time and allow Ceph self-healing to complete; this can be monitored via the PG Status on the dashboard). If your CPUs show a high %, then you are using under-powered machines or we have a problem.
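A quick Ceph-side check for slow disks (a standard Ceph command, not mentioned above) is per-OSD latency; OSDs with noticeably higher commit/apply latency than their peers are usually the bad apples:

ceph osd perf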

It goes without saying that if you can use faster disks (SSDs!) things will really fly, but of course the purpose of tuning is to make the best use of existing resources.

Regarding the difference between what you saw in the Ceph benchmark and VMware: this is probably due to the different block sizes and I/O depth/threads. The rados bench command defaults to a 4M block size and 16 threads; your VMware workload is using much smaller block sizes than this (you can get an average block size by dividing the observed bandwidth by the IOPS), and you can re-run rados bench with the -b option to match.
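For example, assuming your VMware traffic averages around 4K, something like the following should give numbers much closer to what VMware reports (again, the pool name rbd is just a placeholder):

rados bench -p rbd 30 write -b 4096 -t 16 --no-cleanup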

For the raw disk test you did with the dd command, you need to add the dsync flag, since the journal uses it, and use a smaller block size, which is closer to the VMware pattern:

dd if=/dev/zero of=out_file bs=4K count=100000 oflag=direct,dsync

Also note that this tests sequential write speed; in a real VMware workload you will have many concurrent threads doing small I/O, and disk seek latency will be the main factor.
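If you want to approximate that pattern more closely and fio happens to be installed on the test VM (it is not part of PetaSAN, so treat this as a suggestion only), a small random-write run would look something like:

fio --name=randwrite --filename=/tmp/fio.test --size=1G --bs=4k --rw=randwrite --direct=1 --ioengine=libaio --iodepth=16 --numjobs=4 --runtime=30 --group_reporting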

One more thing: Ceph uses a journal to achieve write integrity, so that a failure halfway through a write does not leave inconsistent data. Each client write I/O is therefore done twice on disk, requiring 3 seeks, and if your replica count is 2 (the default, which can be changed from Cluster Settings) this is done twice again. For large block sizes, such as the 4M used in the Ceph benchmark, expect the write speed to be roughly 1/4 of the read speed; for small block sizes this factor can be up to 6. This correlates with the 80/300 you see.
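As a rough back-of-the-envelope check: 2 replicas × 2 disk writes each (journal + data) gives about 4× write amplification, so 300 MB/s read ÷ 4 ≈ 75 MB/s expected write, which lines up with the ~80 MB/s you measured.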