New installation questions
merlinios
9 Posts
July 25, 2020, 3:54 pm
Hello all,
Congratulations on your effort with PetaSAN. We have read a lot of good reviews of your product, so as a company we want to give it a try.
So let's get to the point: right now, for our public cloud, we are using Storage Spaces Direct hyper-converged systems in two flavors.
The first one is on Dell hardware: 8x PowerEdge R740xd, each node with 2x Xeon Gold 2152 22-core CPUs (44 logical cores per node) and 768 GB RAM. Each node has 4x 1.92 TB enterprise SSDs for caching and 8x 8 TB SAS drives for capacity, all connected to an HBA controller. The ratio is 1:2, with three-way mirror volumes.
Networking is 2x 25G Mellanox NICs on each node, with RoCE for RDMA.
With the above setup we get about 700-800k IOPS with sub-ms latency, with 8 nodes and three-way mirror volumes (80/30 tests). If we go full read-only for testing, we can get 1.5 million IOPS with the CPUs at 80%.
The other setup is HP servers with the same CPUs, the same amount of memory, the same capacity of enterprise disks, the same ratio, and the same resiliency (three-way mirror volumes). The only difference is that in the HP clusters we use iWARP for RDMA instead of RoCE.
So, we are looking for a software-defined storage solution with the same features as Storage Spaces Direct in terms of expanding clusters, growing capacity, and all that magic. We just have a few questions about this solution:
- Can we use the same 1:2 ratio with PetaSAN, and three-way mirroring for resiliency?
- Can we use RDMA (iWARP or RoCE) for low-latency networking, which is critical for storage traffic?
- In terms of performance, is it possible to get roughly the same performance as with Storage Spaces Direct? Of course, the PetaSAN nodes will have less RAM and perhaps lesser CPUs; the configs I gave above are for hyper-converged clusters, so they also have virtual machines running on them.
- I have 3 HP nodes for testing. Can I test PetaSAN on these nodes, or do I need anything else?
Thanks a lot, and again, nice job!
admin
2,930 Posts
July 25, 2020, 5:07 pm
Thanks for the feedback 🙂
Yes, you can test PetaSAN with 3 hosts; testing PetaSAN yourself is actually the best way to get a feel for the system and what it can do within your hardware setup.
With good hardware and all flash, a PetaSAN storage node will give 20-40k random 4k IOPS. For throughput with large block sizes such as 1 MB, there is no issue saturating 2x 25G.
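If you want to verify numbers like these during your POC, a minimal sketch of a 4k random-read measurement run from a client would look something like the following (this assumes fio 3.x is installed and /dev/sdX is a placeholder for a scratch PetaSAN disk whose data you can destroy; tune iodepth/numjobs to your setup):

```python
#!/usr/bin/env python3
"""Minimal 4k random-read IOPS check against a scratch block device.

Assumptions (placeholders, not from this thread): fio >= 3.x is installed
and /dev/sdX is a disposable test disk mapped from the PetaSAN cluster.
"""
import json
import subprocess

DEVICE = "/dev/sdX"  # placeholder: point at a disposable test LUN only

cmd = [
    "fio",
    "--name=rand4k",
    f"--filename={DEVICE}",
    "--rw=randread",        # use randrw + --rwmixread=70 for mixed tests
    "--bs=4k",
    "--ioengine=libaio",
    "--direct=1",           # bypass the page cache so the cluster is measured
    "--iodepth=32",
    "--numjobs=4",
    "--group_reporting",    # aggregate all jobs into one result entry
    "--time_based",
    "--runtime=60",
    "--output-format=json",
]

out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
job = json.loads(out)["jobs"][0]          # JSON layout as in fio 3.x
print(f"read IOPS   : {job['read']['iops']:.0f}")
print(f"mean latency: {job['read']['lat_ns']['mean'] / 1e6:.2f} ms")
```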
Latency can be from 0.3-0.5 ms for reads and 0.8-1.5 ms for writes; RDMA does not help here. The latency numbers are not exceptional in themselves, but the system's strong point is that it can scale, handle a large amount of concurrency/users, and keep going.
If you use HDDs, latency goes up. It is recommended to use an SSD journal device with an SSD-to-HDD ratio of 1:4 (or NVMe with a ratio of 1:12). For write-intensive and latency-sensitive workloads we offer a write cache (Linux dm-writecache), which can take ratios of 1:1 to 1:8 (1:4 recommended).
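For reference, a rough sketch of what such an SSD-in-front-of-HDD pairing looks like when set up by hand with LVM's dm-writecache support is below. The names vg0, hdd_lv and /dev/nvme0n1 are placeholders and this is only an illustration of the concept; in PetaSAN the journal and cache assignment is handled from the deployment UI, not manually:

```python
#!/usr/bin/env python3
"""Sketch: pair a slow HDD-backed LV with a fast write cache (dm-writecache via LVM).

Assumptions (placeholders, not from this thread): volume group "vg0" already
contains an HDD-backed LV "hdd_lv", and the fast PV /dev/nvme0n1 is part of vg0.
"""
import subprocess

def run(args):
    print("+", " ".join(args))
    subprocess.run(args, check=True)

# Carve a cache LV out of the fast device; size it per HDD according to your ratio.
run(["lvcreate", "-n", "hdd_lv_cache", "-L", "100G", "vg0", "/dev/nvme0n1"])

# Attach it in front of the slow LV as a dm-writecache.
run(["lvconvert", "-y", "--type", "writecache",
     "--cachevol", "hdd_lv_cache", "vg0/hdd_lv"])
```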
I do not know much about MS Storage Spaces Direct, but benchmarking can be deceiving; most commercial solutions show great numbers under specific workloads. I find it hard to believe you can get 1M IOPS with some SSD caching and an HDD backend, unless you are only hitting the SSD cache (i.e. the cache is not filled and not flushing), or your workload is more sequential than random, in which case you should be measuring throughput bandwidth, as IOPS would be meaningless. We have put a lot of engineering effort into our write cache, and there is just no way to get sustained random IOPS to HDD backing devices at those numbers.
merlinios
9 Posts
July 25, 2020, 5:30 pm
Hello,
Thanks for the info. I will try to create a POC with my 3 servers.
20-30k with all flash?? I think that is a pretty small number. Are you sure about this?
Storage Spaces Direct is the same technology as VMware vSAN. In vSAN you get the same performance at double the price 😉
Of course, the numbers I gave you are SSD cache only. My tests with the 1.5 million IOPS are with 320 virtual machines running on 8 nodes, with a footprint of about 30 TB, which fits in my cache of 45 TB. My tests always use random IO, 4k (as Hyper-V uses 4k with VHDX), 8 outstanding IOs, and 1 thread per target. The magic with VMware vSAN and MS Storage Spaces Direct, of course, is that they also behave pretty well when the cache is filled, as they only cache the hot chunks of your data, and you can also add another node to increase the cache size.
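For anyone who wants to compare like for like against PetaSAN, roughly the same pattern expressed as an fio run would be something like the sketch below (the device path and read/write mix are placeholders, not my exact test parameters):

```python
#!/usr/bin/env python3
"""Rough fio equivalent of the test pattern above: 4k blocks, random IO,
8 outstanding IOs and 1 worker per target. All values are placeholders."""
import subprocess

fio_args = [
    "fio", "--name=s2d_like",
    "--filename=/dev/sdX",            # placeholder: a scratch PetaSAN LUN
    "--rw=randrw", "--rwmixread=90",  # placeholder read/write mix
    "--bs=4k", "--iodepth=8", "--numjobs=1",
    "--ioengine=libaio", "--direct=1",
    "--time_based", "--runtime=120",
]
subprocess.run(fio_args, check=True)
```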
Here is a screenshot of one of my production clusters with 850 virtual machines. This is during non-production hours: 300k IOPS with 20 GBps bandwidth and sub-ms latency. Our workloads are 90/10.
https://pasteboard.co/JjjAxnk.png
Thanks for the info
Last edited on July 25, 2020, 5:31 pm by merlinios · #3
admin
2,930 Posts
July 25, 2020, 6:35 pm
I mentioned 20-40k per node; give or take, this is a ballpark figure. Most of the saturation is in CPU % utilization, not disk.
Quote from merlinios: "The magic with VMware vSAN and MS Storage Spaces Direct, of course, is that they also behave pretty well when the cache is filled, as they only cache the hot chunks of your data..."
The Linux dm-cache is also cleverly done, but I doubt there can be any magic in this: if your slow HDD devices cannot sustain these random IOPS, the filled cache cannot accept them.
Last edited on July 25, 2020, 6:53 pm by admin · #4
merlinios
9 Posts
July 25, 2020, 8:56 pm
Thanks for the info, I will proceed with my tests. With the CPUs I have in my lab I don't think I will have a problem: 2x 22-core Xeon Gold in each node.