
Hardware advice

Hi all,

I'm planning a 3-node PetaSAN implementation for a DR site, and I have a few questions I'd like some advice on.
At the moment the storage need is 10-15 TB, and it will probably never grow beyond 50 TB in the next 3-5 years.
It's going to host a mix of Hyper-V and VMware.

My suggested setup for a minimum start:

3x Supermicro 2U 12-bay, 2x 10GBASE-T NIC, 7x PCIe slots
2x Xeon E5-2600 v2 series (4C, 3-4 GHz)
32-64 GB DDR3 1833 MHz
2x 10GBASE-T onboard NIC
2x dual-port 40 Gb SFP+ server NIC (Broadcom, Intel, etc.)
3x PCIe NVMe M.2 controller (no cache)
Disk controller: not yet decided (HBA/RAID)
SSD: 2x 128 GB, OS volume, onboard RAID1
SSD: 3x 1 TB SATA (1500/1500 read/write) (data vol)
NVMe M.2: 3x 1 TB (3000/3000 read/write) (data vol)

I will have space for expansion at a later time. This will give me 6 TB of raw capacity on each node.

Now, I see the admin recommends using disk controllers with lots of cache, and does not recommend any RAID levels. HBAs mostly
don't have any cache, and most RAID controllers don't allow pass-through. Is the best solution a single-disk RAID0 for each SSD? What about 2x RAID10 containers, for instance? Wouldn't that increase performance?
Could anyone recommend a good controller for 8-12 SSDs? (6 Gbps vs 12 Gbps?)

Will my M.2 sticks be slowed down by the much slower SSDs? Or should I find a way to put the M.2s into the drive bays?
(There are a lot of M.2-to-SATA adapters out there.)
If my M.2s would be slowed down, would I be better off adding more nodes and sticking to PCIe NVMe M.2 only? Has anyone had a similar
system running stable?

Also, is it correct that the OS disks do not need any significant performance?
As for CPUs, does the system want many cores at a low clock, or fewer cores at a higher clock? Pros/cons?
Do any 40 Gb network cards work better than others? Is InfiniBand OK to use?

Thanks for your time!
//Ray

Since you are using SSDs, you do not need a controller with cache. Cache is only recommended when you have HDDs, to reduce latency.

Use RAID1 for the OS. Do not use RAID for your OSDs; Ceph works better the more OSD disks it uses.

You may want to experiment with:

  • NVMe as journals, SATA SSDs as OSDs
  • not using the NVMes, replacing them with more SATA SSDs and no external journals (the OSDs will keep their journals on the same disk)
  • or, if you can afford it, all NVMe for your OSDs. Just do not use a mix as pure OSDs.

You can then use the PetaSAN benchmark page to compare the performance of each setup.

For the network: I would consider using the 2x 10G NICs for iSCSI MPIO, and the 40G NICs bonded for your 2 backend networks.

For CPU: to increase total IOPS, it is more economical to use more cores. If you care about latency, or IOPS per single client operation, then more cores do not help and you need higher GHz. If 1-3 ms latency is good enough for you, then there is no need for higher GHz.
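
To make that trade-off concrete, here is a rough sketch based on Little's law (IOPS ≈ outstanding I/Os / latency); the queue depths and latencies below are hypothetical examples, not measurements from this thread:

# Rough Little's-law sketch: iops ~= outstanding_ios / latency_in_seconds.
# All numbers here are hypothetical examples, not measurements from this thread.

def iops(queue_depth: int, latency_ms: float) -> float:
    """Approximate IOPS sustainable at a given queue depth and per-op latency."""
    return queue_depth / (latency_ms / 1000.0)

# A single client at queue depth 1 is capped by latency alone,
# so only faster per-operation latency (higher GHz) helps it.
print(iops(queue_depth=1, latency_ms=1.0))   # ~1,000 IOPS at 1 ms
print(iops(queue_depth=1, latency_ms=0.5))   # ~2,000 IOPS at 0.5 ms

# Many clients in parallel: more cores let the cluster keep more I/Os
# in flight at the same latency, which is what raises total IOPS.
print(iops(queue_depth=64, latency_ms=1.0))  # ~64,000 IOPS aggregate

So fewer, faster cores favour the single-stream latency case, while more (slower) cores favour total throughput across many clients.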

I cannot recommend InfiniBand, simply because we do not test it here.

Good luck.

 

I have tested PetaSAN with the following config:

3x Dell R620
Each server: 2x E5-2697 v2 CPU
96 GB DDR3 1833 MHz
2x Intel NVMe 4 TB (2700/1500 read/write)
4x Samsung 963 SSD 1 TB
40 GbE HP Ethernet

Dell S4048-ON switch, 6x 40 GbE

Client:
VMware ESXi
Dell R620
10 GbE Intel Ethernet

I have made 2 pools: one with 12 SSD OSDs, and another 3x NVMe pool.
I think it is very slow, because if I write directly to a normal SSD I get faster results.
The NVMe disks are also very slow over PetaSAN.

------------------------------------------------------------------------------
CrystalDiskMark 7.0.0 x64 (C) 2007-2019 hiyohiyo
Crystal Dew World: https://crystalmark.info/
------------------------------------------------------------------------------
* MB/s = 1,000,000 bytes/s [SATA/600 = 600,000,000 bytes/s]
* KB = 1000 bytes, KiB = 1024 bytes

[Read]
Sequential 1MiB (Q= 8, T= 1): 800.459 MB/s [ 763.4 IOPS] < 10455.28 us>
Sequential 1MiB (Q= 1, T= 1): 465.431 MB/s [ 443.9 IOPS] < 2249.27 us>
Random 4KiB (Q= 32, T=16): 164.226 MB/s [ 40094.2 IOPS] < 12658.45 us>
Random 4KiB (Q= 1, T= 1): 12.860 MB/s [ 3139.6 IOPS] < 317.34 us>

[Write]
Sequential 1MiB (Q= 8, T= 1): 788.747 MB/s [ 752.2 IOPS] < 10550.82 us>
Sequential 1MiB (Q= 1, T= 1): 343.556 MB/s [ 327.6 IOPS] < 3047.61 us>
Random 4KiB (Q= 32, T=16): 75.952 MB/s [ 18543.0 IOPS] < 27427.73 us>
Random 4KiB (Q= 1, T= 1): 5.590 MB/s [ 1364.7 IOPS] < 730.88 us>

Profile: Default
Test: 1 GiB (x5) [Interval: 5 sec] <DefaultAffinity=DISABLED>
Date: 2020/10/31 15:20:12
OS: Windows 10 Professional [10.0 Build 19042] (x64)
Comment: Petasan NVME 40GBE switch ESXI 10 GBE connection

------------------------------------------------------------------------------
CrystalDiskMark 7.0.0 x64 (C) 2007-2019 hiyohiyo
Crystal Dew World: https://crystalmark.info/
------------------------------------------------------------------------------
* MB/s = 1,000,000 bytes/s [SATA/600 = 600,000,000 bytes/s]
* KB = 1000 bytes, KiB = 1024 bytes

[Read]
Sequential 1MiB (Q= 8, T= 1): 867.943 MB/s [ 827.7 IOPS] < 9639.11 us>
Sequential 1MiB (Q= 1, T= 1): 467.204 MB/s [ 445.6 IOPS] < 2241.11 us>
Random 4KiB (Q= 32, T=16): 162.724 MB/s [ 39727.5 IOPS] < 12833.93 us>
Random 4KiB (Q= 1, T= 1): 12.702 MB/s [ 3101.1 IOPS] < 321.29 us>

[Write]
Sequential 1MiB (Q= 8, T= 1): 464.519 MB/s [ 443.0 IOPS] < 17926.04 us>
Sequential 1MiB (Q= 1, T= 1): 230.482 MB/s [ 219.8 IOPS] < 4540.22 us>
Random 4KiB (Q= 32, T=16): 100.190 MB/s [ 24460.4 IOPS] < 20793.64 us>
Random 4KiB (Q= 1, T= 1): 5.113 MB/s [ 1248.3 IOPS] < 799.10 us>

Profile: Default
Test: 1 GiB (x5) [Interval: 5 sec] <DefaultAffinity=DISABLED>
Date: 2020/10/31 15:20:17
OS: Windows 10 Professional [10.0 Build 19042] (x64)
Comment: Petasan 3xH 12xSSD SATA osd 40GBE switch ESXI 10 GBE

------------------------------------------------------------------------------
CrystalDiskMark 7.0.0 x64 (C) 2007-2019 hiyohiyo
Crystal Dew World: https://crystalmark.info/
------------------------------------------------------------------------------
* MB/s = 1,000,000 bytes/s [SATA/600 = 600,000,000 bytes/s]
* KB = 1000 bytes, KiB = 1024 bytes

[Read]
Sequential 1MiB (Q= 8, T= 1): 608.947 MB/s [ 580.7 IOPS] < 13749.70 us>
Sequential 1MiB (Q= 1, T= 1): 489.249 MB/s [ 466.6 IOPS] < 2131.32 us>
Random 4KiB (Q= 32, T=16): 187.023 MB/s [ 45659.9 IOPS] < 10456.53 us>
Random 4KiB (Q= 1, T= 1): 21.813 MB/s [ 5325.4 IOPS] < 186.55 us>

[Write]
Sequential 1MiB (Q= 8, T= 1): 515.820 MB/s [ 491.9 IOPS] < 16166.33 us>
Sequential 1MiB (Q= 1, T= 1): 520.391 MB/s [ 496.3 IOPS] < 2011.96 us>
Random 4KiB (Q= 32, T=16): 119.435 MB/s [ 29158.9 IOPS] < 17468.26 us>
Random 4KiB (Q= 1, T= 1): 29.121 MB/s [ 7109.6 IOPS] < 139.47 us>

Profile: Default
Test: 1 GiB (x5) [Interval: 5 sec] <DefaultAffinity=DISABLED>
Date: 2020/10/31 15:20:21
OS: Windows 10 Professional [10.0 Build 19042] (x64)
Comment: Direct SSD Intel SATA (basic ssd)

 

First, you will not get the performance of all disks simply added up: a typical SAN has TCP network latency overhead, and SDS systems like Ceph have additional CPU overhead on top of that.

Some points:

Your latency of 0.3 ms read and 0.7 ms write from the VM is good.

Can you measure the TCP latency on your 10G network?
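
If it helps, here is a minimal sketch of one way to measure raw TCP round-trip latency between two hosts using plain Python sockets (the port and default address are placeholders chosen for the example; the post above does not name a specific tool):

# Minimal TCP round-trip latency probe (illustrative sketch, not a PetaSAN tool).
# Start it with --server on one host, then run it from the other host with
# --host set to the server's address to get median and p99 round-trip times.
import argparse, socket, time

PORT = 5201          # placeholder port, any free port will do
PAYLOAD = b"x"       # 1-byte ping/pong so we measure latency, not bandwidth

def server(bind_addr: str) -> None:
    with socket.create_server((bind_addr, PORT)) as srv:
        conn, _ = srv.accept()
        with conn:
            while data := conn.recv(1):
                conn.sendall(data)                     # echo back immediately

def client(server_addr: str, count: int = 1000) -> None:
    with socket.create_connection((server_addr, PORT)) as sock:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        samples = []
        for _ in range(count):
            t0 = time.perf_counter()
            sock.sendall(PAYLOAD)
            sock.recv(1)
            samples.append((time.perf_counter() - t0) * 1000.0)
        samples.sort()
        print(f"median RTT: {samples[len(samples) // 2]:.3f} ms, "
              f"p99: {samples[int(len(samples) * 0.99)]:.3f} ms")

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--server", action="store_true", help="run as echo server")
    parser.add_argument("--host", default="127.0.0.1", help="bind or target address")
    args = parser.parse_args()
    server(args.host) if args.server else client(args.host)

On a healthy switched 10G link the round trip is usually a small fraction of a millisecond; a figure far above that would point at the network path rather than the disks.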

For IOPS: can you look at the charts for %CPU and %disk utilization and see which saturates first?

Try to run the test on multiple ESXi hosts with multiple VMs; VMware puts queue depth limits at different layers (the default is 32). Then add up the IOPS from the different ESXi clients.
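
As a rough illustration of why that helps (the service latency below is an assumed value, not taken from the results above): each ESXi client contributes its own capped queue, so the per-client IOPS ceiling multiplies with the number of clients.

# Hypothetical illustration: per-client queue depth cap vs. aggregate IOPS.
QD_LIMIT = 32              # VMware's default per-device queue depth, as noted above
service_latency_ms = 1.5   # assumed per-I/O service latency, not a measured value

ceiling_per_client = QD_LIMIT / (service_latency_ms / 1000.0)
for clients in (1, 2, 4):
    print(f"{clients} ESXi client(s): ~{clients * ceiling_per_client:,.0f} IOPS aggregate")

This is why a single CrystalDiskMark run from one VM understates what the cluster can deliver in aggregate.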

For throughput, it may help to increase the bandwidth on the 10G side; you could also use active bonding such as LACP.