Drive Configurations - hardware RAID vs JBOD
craig51
10 Posts
March 23, 2017, 7:18 pm
Admin,
I am considering testing this product, as it looks promising.
We have many storage units. Most are Supermicro motherboards with a single Xeon, 16 GB of RAM, and LSI 9565 RAID controllers, and most have 8x 2 TB 7200 RPM SAS drives. We currently run Open-E on some and other SAN/NAS operating systems on the rest. I would love to get node redundancy and, if possible, a little better performance.
My question is: if I want to deploy a 3-node set using 3 of these boxes, should I use the hardware RAID controllers or swap them out for SAS HBAs, since these cards will not do JBOD from what I can tell?
I cannot tell whether the PetaSAN node configuration uses all disks directly or deploys a software RAID. We currently use RAID 10 on these units, which gives us 24 TB of space but no node redundancy.
I would just like to get some insight on best-practice drive configuration with this product.
Thanks for your response,
Craig51
craig51
10 Posts
March 23, 2017, 8:14 pm
Following up on my own post: the controllers are 9265i, and it looks as if they do have a JBOD option.
admin
2,930 Posts
March 23, 2017, 9:27 pm
Generally, Ceph favors JBOD over RAID. Ceph is designed with its own data redundancy, so if you do use RAID, RAID 0 is most commonly used. If your RAID controller has a battery-backed write-back cache, you can use it, which may improve latency in some cases.
The best thing is to try your hardware in both JBOD mode and RAID 0 and test the performance of each.
There is (or was) a nice pair of articles comparing JBOD and RAID 0 for a variety of controllers. They are slightly old and some of the charts are no longer present, but they still have good information (if you need the charts, I have them and can send them):
http://ceph.com/community/ceph-performance-part-1-disk-controller-write-throughput/
http://ceph.com/community/ceph-performance-part-2-write-throughput-without-ssd-journals/
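If it helps, here is a rough sketch of how a per-disk sequential-write test could look using fio. The device names and parameters below are only illustrative assumptions; run it only against disks whose data you can afford to lose, once with the controller in JBOD mode and once with each disk exposed as a single-drive RAID 0 volume, then compare the reported throughput.

# Rough per-disk sequential-write test via fio (illustrative only; the
# device list below is a placeholder and the target disks will be wiped).
import subprocess

def seq_write_test(device, runtime_s=60):
    subprocess.run([
        "fio",
        "--name=seqwrite",
        "--filename=" + device,
        "--rw=write",
        "--bs=4M",
        "--direct=1",            # bypass the page cache, as Ceph journal writes do
        "--ioengine=libaio",
        "--iodepth=16",
        "--runtime=" + str(runtime_s),
        "--time_based",
        "--group_reporting",
    ], check=True)

for dev in ["/dev/sdb", "/dev/sdc"]:    # hypothetical data disks
    seq_write_test(dev)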
PetaSAN does not use software RAID. When you deploy a node in PetaSAN and give it the Local Storage role, it will use all disks it finds during deployment for storage. After deployment, you can add more disks manually and create Ceph OSDs from them via the management interface. So you have the choice of assigning all disks as Ceph OSDs from the beginning, or adding them later one by one. You can also deploy a node without the Local Storage role and add the role later; in that case you can add the OSDs one by one via the UI for more control.
One thing to note: the current version of PetaSAN does not support adding an SSD journal disk in front of spinning data disks; we only support configurations that are all SSDs or all spinning disks. If you use spinning disks, you may feel a performance hit with a small number of disks. The more disks (and nodes) you add, the better the performance you will get, since a single iSCSI LUN will use all cluster disks concurrently. Ideally, you should keep adding disks per node until you start hitting high CPU usage; when that happens, you need to scale out by adding nodes.
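Not PetaSAN-specific, but since PetaSAN is built on Ceph, a quick sanity check after deployment (or after adding OSDs via the UI) is to look at the cluster from the command line. A minimal sketch, assuming the ceph client and admin keyring are present on the node:

# Show which disks became OSDs per node, raw vs. available capacity,
# and overall cluster health.
import subprocess

for cmd in (["ceph", "osd", "tree"],
            ["ceph", "df"],
            ["ceph", "-s"]):
    subprocess.run(cmd, check=True)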
Last edited on March 23, 2017, 9:55 pm · #3
craig51
10 Posts
March 24, 2017, 3:27 am
Thank you for the quick response.
In my scenario, if I deploy 3 nodes and add a single small SSD as the sda system disk, allowing the eight 2 TB disks in each node to be used as JBOD, I will have a total of 48 TB of raw disk. How much usable storage will I have available?
admin
2,930 Posts
March 24, 2017, 7:28 am
48 TB will be raw storage.
24 TB will be available storage with 2-replica redundancy (the default).
16 TB will be available storage with 3 replicas.
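In other words, usable capacity is roughly raw capacity divided by the replica count. A minimal sketch of the arithmetic (ignoring the headroom Ceph reserves before it marks the cluster full and other small overheads):

# Rough usable-capacity estimate for a replicated Ceph pool.
def usable_tb(nodes, disks_per_node, disk_tb, replicas):
    raw = nodes * disks_per_node * disk_tb
    return raw / replicas

print(usable_tb(3, 8, 2, replicas=2))   # 24.0 TB
print(usable_tb(3, 8, 2, replicas=3))   # 16.0 TB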
admin
2,930 Posts
March 24, 2017, 8:39 am
For your sda, consider using 2 SSDs in RAID 1 to avoid node downtime in case of an sda failure.
craig51
10 Posts
March 29, 2017, 2:55 am
I am not sure if I need to start another thread, but my main question here is about performance.
We are satisfied with the performance we are getting from our Open-E SANs, but I would like to achieve node redundancy.
What are your thoughts on performance comparisons, given that these SANs use hardware RAID 10 controllers with 7200 RPM SAS drives?
I would not want to implement a 3- or 4-node PetaSAN and end up with less performance than I have now.
I ask this because of a few posts I have seen on sites covering ScaleIO and comparing its performance against Ceph.
Your thoughts?
admin
2,930 Posts
March 29, 2017, 10:52 am
You are asking difficult questions that I do not know the answer to... but you did ask for my thoughts 🙂
A traditional 2-node scale-up SAN is, by design, going to give you the best performance for a 2-node setup, but it cannot grow. A Software-Defined Storage (SDS) scale-out system carries messaging and processing overhead that allows it to grow horizontally; the more you add, the faster it gets.
On a node-by-node basis, the traditional SAN will give you better performance but cannot scale; SDS gives less per-node performance but can scale. Of course, in many SAN setups such as yours, you end up with multiple separate 2-node SANs to accommodate growing storage, so management and node redundancy are inferior to an SDS setup. You have to decide which solution is better for your case.
I should also say that SDS seems to be the trend: Software-Defined Storage, Software-Defined Networking, Software-Defined Data Centers... it is all virtual now. Fewer people now argue that virtualizing a CPU in a VM is inferior to running on bare metal because of the performance hit, and this trend will continue. Even Microsoft, with Windows Server 2016, is moving to SDS with its (quite expensive) Storage Spaces Direct, and in many of its talks predicts the death of the traditional 2-node SAN.
ScaleIO: now this is the other side of the spectrum. An SDS solution coming from EMC has to be good, no doubt, and I am not qualified to compare it to Ceph, but again, my thoughts:
- It is free for non-commercial/non-production use; otherwise it is pricey.
- It uses a proprietary storage format, whereas Ceph is a ubiquitous open format that is the de facto SDS standard. When you think of moving to SDS, you should ask whether the format will still be supported in 10-20 years.
Regarding the performance analysis you mention:
- The EMC presentation at an OpenStack event comparing ScaleIO to Ceph was titled "Battle of the Titans"; the world's largest commercial storage vendor comparing itself with an open-source solution speaks for itself.
- By design, Ceph leans heavily toward ensuring data integrity, even at the expense of performance: writes are done in a two-phase commit approach using a journal (much like a database engine does for integrity reasons), and journal writes bypass any kernel caching. Does ScaleIO do this?
- I was at the last OpenStack event and got a chance to discuss the ScaleIO performance comparison with some Ceph people; some think it is marketing material. The truth is probably somewhere in the middle. What I was sure about is the extent of industry support and momentum behind Ceph, from embedded chips and storage devices to high-performance computing.
- Some high-performance all-flash Ceph solutions from Samsung/Intel/SanDisk approach 1M IOPS per individual node!
admin
2,930 Posts
March 29, 2017, 10:21 pm
I looked further at the ScaleIO comparison:
http://cloudscaling.com/blog/cloud-computing/killing-the-storage-unicorn-purpose-built-scaleio-spanks-multi-purpose-ceph-on-performance/
The tests were done using 4K block sizes on SSDs. They claimed:
- ScaleIO is 5 times faster than Ceph.
- Ceph is good with spinning disks but not fast enough with SSDs.
Many of the SSD vendors, such as SanDisk, Intel, and Samsung, looked at building Ceph-based solutions and tuning Ceph for SSDs. SanDisk and Intel discovered a performance issue/bug in the TCMalloc memory allocation library that affects small (4K) block sizes when using SSDs; correcting this issue alone improved performance by 4.2x to 4.7x (for 4K block sizes).
https://ceph.com/geen-categorie/the-ceph-and-tcmalloc-performance-story/
More detail:
https://www.msi.umn.edu/sites/default/files/MN_RH_BOFSC15.pdf
These SSD vendors now have Ceph-based products reaching 1M IOPS per node.
PetaSAN uses jemalloc, which gives the best performance.
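If you want to verify this on a given node, one rough way is to check which allocator the ceph-osd binary is linked against. A sketch, assuming the usual Linux binary path and that the allocator is linked in rather than preloaded via LD_PRELOAD:

# Report the memory allocator ceph-osd is linked against.
import subprocess

def linked_allocator(binary="/usr/bin/ceph-osd"):
    libs = subprocess.run(["ldd", binary], capture_output=True, text=True).stdout
    for name in ("jemalloc", "tcmalloc"):
        if name in libs:
            return name
    return "glibc malloc (or allocator not visible to ldd)"

print(linked_allocator())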
Last edited on March 29, 2017, 10:28 pm · #9
ek-media
3 Posts
April 19, 2017, 5:07 pm
Admin: "For your sda, consider using 2 SSDs in RAID 1 to avoid node downtime in case of sda failure."
Is it possible to use 2x SAS disks in RAID 1 for the OS and 6x 2 TB SSDs as JBOD in the same node?
Or will this cause performance problems?