
I need a high-reliability storage solution with limited server hardware.

Hello,

Our company, Ture Network, operates the NUCserver project. (www.NUCserver.com)

Our project goal is simple: put 192 mini embedded PCs into a standard 42U server rack, and it works fine.

Our next project is a NUC farm, which uses NUCserver to provide DaaS (Desktop as a Service) based on RDI (Real Desktop Infrastructure).

https://www.facebook.com/dogmatic.dream/posts/1818220584885592

So we need to separate the NUC itself from its storage and boot remotely from an iSCSI target.

All OS images live on the storage server, and a management middleware assigns a specific image to an unassigned NUC and then provides DaaS to the customer.

Our NUC farm is sized around sets of 8 NUCs as the basic unit.

According to our research, remote booting one set of 8 NUCs from the iSCSI target needs 2 Gbps of throughput (250 Mbps per NUC).

So, if I want to boot all of them at the same time, 192 NUCs need 48 Gbps of throughput from the iSCSI target storage.
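
For reference, a quick sanity check of those numbers (a sketch only, using the figures quoted in this post):

```python
# Sanity check of the iSCSI boot bandwidth figures (numbers from this post).
MBPS_PER_NUC = 250           # per-NUC throughput needed during remote boot
NUCS_PER_SET = 8
NUCS_PER_RACK = 192

set_gbps = MBPS_PER_NUC * NUCS_PER_SET / 1000     # 2.0 Gbps per set of 8 NUCs
rack_gbps = MBPS_PER_NUC * NUCS_PER_RACK / 1000   # 48.0 Gbps for a full rack

print(f"per set: {set_gbps} Gbps, per rack: {rack_gbps} Gbps")
```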

The problem is that our special NUC rack system is only 50 cm deep, so we use custom-made storage server cases with a depth of only 48 cm.


We custom-ordered three kinds of storage cases for the project: 2U 12-bay hot-swap, 3U 16-bay hot-swap, and 4U 24-bay hot-swap (pictures 2-8).

As you can see from the pictures, there is only space for a mini-ITX mainboard, and most mini-ITX mainboards have only one PCIe 3.0 x16 slot.

So I found some Xeon D or Atom-based server boards from Supermicro and Gigabyte that have two SFP+ ports for 10G.

http://b2b.gigabyte.com/Server-Motherboard/MA10-ST0-rev-11#ov

MA10-ST0(1.1)

http://b2b.gigabyte.com/Server-Motherboard/MB10-DS3-rev-13#ov

MB10-DS3 (rev. 1.3)

https://www.supermicro.com/products/motherboard/Xeon/D/X10SDV-8C-TLN4F_.cfm


Each NUC has two Gigabit Ethernet ports: one for the service network, which the customer connects through, and the other for the storage network, which handles the iSCSI boot and read/write to the disk image on the storage server.

We developed this second Ethernet card ourselves, too.

[Picture: the NUC5i5MYHE second Ethernet card]

Anyway,

I originally planned to use a Mellanox 36-port 40/56G InfiniBand gateway switch with a gateway license, which can split one 40G port into four separate 10G cables. (picture below)

https://store.mellanox.com/products/mellanox-msb7780-es2f-switch-ib-based-edr-infiniband-1u-router-36-qsfp28-ports-2-power-supplies-ac-x86-dual-core-standard-depth-p2c-airflow-rail-kit-rohs6.html

[Picture: Mellanox SB7700 switch]

 

From my calculation, one rack of 192 NUCs needs eight 10G connections to the iSCSI target, which means two 40G ports.

The 36-port Mellanox switch can therefore handle 1,920 NUCs using 20 of its 40G ports, and the remaining 16 ports, running as 56G InfiniBand, can connect 8 sets of storage servers, each with a 2-port 56G InfiniBand card.
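
To make the port budget explicit, here is the same arithmetic written out (a sketch based only on the numbers in this post):

```python
# Port budget for the 36-port Mellanox switch, based on the numbers above.
TOTAL_PORTS = 36
NUCS_PER_RACK = 192
TEN_G_LINKS_PER_RACK = 8                              # 8 x 10G breakout links per rack
FORTY_G_PORTS_PER_RACK = TEN_G_LINKS_PER_RACK // 4    # one 40G port splits into 4 x 10G

racks = 10
nucs_total = racks * NUCS_PER_RACK                    # 1920 NUCs
ports_for_racks = racks * FORTY_G_PORTS_PER_RACK      # 20 x 40G ports
ports_left = TOTAL_PORTS - ports_for_racks            # 16 ports left as 56G InfiniBand
storage_servers = ports_left // 2                     # 8 servers with dual-port IB cards

print(nucs_total, ports_for_racks, ports_left, storage_servers)   # 1920 20 16 8
```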

Unfortunately, mini-ITX boards have at most 6-12 SATA3 ports, and I need to connect up to 24 spinning HDDs, so I need some kind of HBA expansion card to get more SATA ports on the storage server.

And then there is no slot left for an InfiniBand card.

 

 

So here are my questions.

  1. If I use PetaSAN with the 12-, 16-, or 24-bay cases full of spinning HDDs, will 20 Gbps be enough throughput?
  2. Or, if I give up on the 16- and 24-bay cases and use only the 12-bay case with a board that has 12 onboard SATA ports, such as http://techreport.com/news/25703/asrock-combines-avoton-soc-12-sata-ports-on-mini-itx-mobo (picture below), plus a 2-port InfiniBand card, is that better than using a mainboard with two SFP+ ports?

Please help me make the right choice.

Hi there. Interesting project.

First, we do not support InfiniBand. I believe some users did get it working, but it is not directly supported in PetaSAN.

For the board with two SFP+ ports: the 20G total will give a net iSCSI client throughput of roughly 5-6G for writes and 10G for reads. A write IO goes from the client to the iSCSI Target service within PetaSAN on the iSCSI subnets; the iSCSI Target sends the IO to Ceph on the Backend 1 subnet, and Ceph replicates the copies over the Backend 2 subnet to different nodes/storage servers. You may also consider bonding the two ports and mapping all of the PetaSAN subnets onto this bond.
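
As a rough back-of-the-envelope sketch (my own arithmetic, not anything computed by PetaSAN), those figures can be reproduced by counting how many times each client byte has to cross a node's NICs when all subnets share the same 2 x 10G. The traversal counts below are assumptions chosen to match the quoted 5-6G / 10G estimates, with 2 replicas:

```python
# Back-of-the-envelope model (not PetaSAN code) of the 2 x 10G estimate above.
# Assumptions: 2 replicas, and the iSCSI, Backend 1 and Backend 2 subnets all
# share the same 20G of NIC bandwidth on a node.

NIC_GBPS = 2 * 10   # two SFP+ ports, possibly bonded

def net_client_gbps(nic_gbps: float, traversals: float) -> float:
    """Each client byte crosses the node's NICs `traversals` times, so the
    usable client-facing throughput is the raw bandwidth divided by that
    amplification factor."""
    return nic_gbps / traversals

# Write: client -> iSCSI target, target -> primary OSD (Backend 1),
# primary -> replica OSD (Backend 2), plus acknowledgement/overhead traffic.
writes = net_client_gbps(NIC_GBPS, traversals=4)   # ~5 Gbps (the quoted 5-6G)
# Read: OSD -> iSCSI target, target -> client; no replication traffic.
reads = net_client_gbps(NIC_GBPS, traversals=2)    # ~10 Gbps

print(f"approx. client writes: {writes:.0f} Gbps, reads: {reads:.0f} Gbps")
```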

If using spinning disks, adding an SSD journal (you need 1 SSD per 4 spinning disks) will double the disk write throughput, so you may consider having 3-4 SSDs. This feature is supported in v1.5, due in December, but I can send you a beta if you wish (the journal feature is already tested and stable).
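
Applying that 1-SSD-per-4-spinning rule to the three chassis mentioned earlier gives roughly the following split (my own arithmetic, not a PetaSAN sizing tool):

```python
# The 1-SSD-per-4-spinning journal rule applied to the three custom chassis.
import math

def journal_split(bays: int, hdds_per_ssd: int = 4):
    # Reserve enough bays for journal SSDs so every remaining HDD is covered.
    ssds = math.ceil(bays / (hdds_per_ssd + 1))
    return bays - ssds, ssds

for bays in (12, 16, 24):
    hdds, ssds = journal_split(bays)
    print(f"{bays}-bay case: {hdds} HDDs + {ssds} journal SSDs")
# 12-bay: 9 + 3, 16-bay: 12 + 4, 24-bay: 19 + 5
```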

With this kind of project you will need to test your hardware for storage. We have a benchmarking page (added in v1.4) which will help you measure your throughput and understand whether your bottleneck is the number of disks, the network, or the CPU, so I highly recommend you run it on the Supermicro and Gigabyte boards with different combinations of spinning disks and journals. The resource load varies greatly with the IO block size; the benchmarking tests allow both 4M and 4K tests. In some cases you may know the client IO pattern, but in many cases you do not and should optimize your setup to give a good balance between small and large block sizes.

We do offer support and consultancy; if you wish to use this, please send us an email at contact-us @ petasan.org


Hi! So you mean that once I have an SSD disk, I can easily add it to the existing setup, without things like extracting the CRUSH map and modifying it by hand, right? What about 2.0 with BlueStore: should we expect an easy one-click SSD 'cache' add, like adding an L2ARC SSD cache in ZFS, easy and simple? I'm asking because in the Ceph docs I can't find exact statements about SSDs as cache, except the general sentence that 'BlueStore always tries to choose the faster device', but there is a description of how to put the DB on an SSD, and I'm not sure what size should be used for this DB. So, should the SSD disk be divided into that many partitions for the BlueStore DB (what size? 1 GB?), and should the rest of the space be added as an OSD or not? Could you please clarify SSD usage in the PetaSAN solution?

There is a difference between cache and journal.

A journal is used to speed up writes but does not cache data. Currently, both the FileStore journal and the BlueStore WAL (write-ahead log) see better write performance when an external SSD disk is used for this.

Caching caches data, so it speeds up both reads and writes. There are two methods. The first is cache tiering in Ceph, where you set up two pools, a fast one used for caching and a slow one that contains all the storage; you would need to define a CRUSH map specifying which disks belong to the fast pool and which to the slow pool. However, cache tiering is now NOT recommended. The other method is caching at the block device level outside of Ceph, using something like bcache or dm-cache as a cache in front of the OSD block device. Since you mentioned ZFS, I do believe it can do caching itself. Caching at the block level is the way to go. PetaSAN will support block-level caching once it has been well tested by the Ceph team so we have more data/recommendations; it is on their to-do list.

Last, BlueStore does have a RocksDB for storing metadata; as you mentioned, it makes sense to store this on a faster SSD, and for simplicity it can be co-located on the same fast partition used by the WAL.

So to summarize: in PetaSAN v1.5, SSD/NVMe will be used as a journal for slower HDDs/SSDs; in 2.0 it will be used as journal (WAL) + DB for slower HDDs/SSDs; beyond that, we will support block-level caching using bcache.


Should we expect some simple point-and-click method to achieve this (with recommendations for those partition sizes), that is, all automatic, or will more knowledge be needed? Don't take me wrong, I'm all for science and knowledge, but I'm also lazy, and after years spent on the CLI some things should be more convenient to administer. So I expect something like "Add SSD cache: [x] WAL [x] RocksDB [x] Data Cache" and a field for the device path of an SSD disk, or so 😉 What do you think? 🙂

Yes, we do try to make things simple. We have a default 20G partition size in both 1.5 and the upcoming 2.0; in 2.0 it is used for both the WAL and RocksDB (collocated). The size can be changed in the config file if desired, but the default is good.

From the UI, the user selects the devices to be used as journal/WAL; then, when adding OSDs, he can select a specific journal or select 'Auto', which lets the system choose the least loaded journal.

Hello admin.

Thanks for contacting me and also for offering consulting and paid support.

We intend to ask for paid support before our commercial project.

So please forget about the NUC server and the custom storage cases for now.

First of all, please consider me a dummy when it comes to SAN, iSCSI, and storage.

Here are the situation and my questions.

I want to build a kind of commercial VPS service. Most VPS offerings have a traffic limit and pre-made VMs, so I want to make a cloud-style VPS service without a traffic limit, only a bandwidth limit.

The scenario is simple, as below.

  1. The customer visits and registers at the service site, http://www.nolimitvps.com (it does not work yet).
  2. The customer selects cores, RAM, storage, OS, IP range, and bandwidth, and the site automatically calculates the total monthly cost.
  3. The customer selects a payment method and a discount plan for advance payment of 6, 12, 24, or 36 months.
  4. The customer makes the payment.
  5. The VPS is generated automatically and the IP, etc. are configured.
  6. The system emails the customer the access information for the VPS and dashboard, plus VNC for disaster recovery.
  7. When the payment period ends, the VPS is automatically deleted and removed from the customer's dashboard.

We have already built the automatic VPS service, and it works fine.

I just want to use PetaSAN as our storage system for the VPS service.

In this case, I think I only need PetaSAN as a single iSCSI target accessed by all the Hyper-V machines, which have many cores and lots of RAM.

And your instructions confuse me a bit.

You said that if I install an SSD as journal/WAL, it will help increase the write speed to disk.

  1. How can I install SSDs in the storage server?

My 5 storage servers are DL380p Gen8 machines; each has twelve 3.5" slots on the RAID card plus two internal 2.5" slots on the internal SATA ports.

Each storage server has two E5-2630L CPUs, 64 GB of RAM, and four 10/100/1000 Ethernet ports, but I will not use those Ethernet ports; I plan to add two dual-port 10G Ethernet cards for PetaSAN.

I originally planned to install 12 4 TB SAS disks for storage plus 2 100 GB SSDs (software RAID 1) for the PetaSAN OS.

But from your recommendation, if I install one SSD per 3-4 HDDs, it will be good for increasing write speed.

So if I install 9 HDDs + 3 100 GB server SSDs in the 12 slots, how can I set up the SSDs as journals?
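
As a capacity check only (this is not the setup procedure; per the earlier reply, journal devices are selected from the PetaSAN UI when adding OSDs), the 20G default journal partition size mentioned earlier in the thread fits comfortably on 100 GB SSDs in a 9 HDD + 3 SSD layout:

```python
# Capacity check for journal partitions on the 9 HDD + 3 SSD layout.
SSD_GB = 100
JOURNAL_PARTITION_GB = 20        # PetaSAN default, per the reply earlier in the thread
HDDS, SSDS = 9, 3

osds_per_ssd = HDDS // SSDS                            # 3 OSDs share each SSD
space_needed = osds_per_ssd * JOURNAL_PARTITION_GB     # 60 GB used per SSD

print(f"{osds_per_ssd} x {JOURNAL_PARTITION_GB} GB journal partitions = "
      f"{space_needed} GB of each {SSD_GB} GB SSD")
```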

2. What is the exact total size of the PetaSAN cluster?

If I install 9 4 TB disks + 3 100 GB SSDs in each node, its total size is 36.3 TB, and with 5 nodes the total is 181.5 TB.

But PetaSAN's purpose is HA, so I do not expect to get this total size; I need to know how many TB I can actually use as VM disks with HA.
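
A rough way to estimate this, assuming a standard replicated Ceph pool where only the 4 TB HDDs hold data and the SSDs act as journals/OS disks (the replica counts and headroom factor below are illustrative assumptions, not PetaSAN recommendations):

```python
# Rough usable-capacity estimate for the 5-node plan (assumptions noted above).
NODES = 5
HDDS_PER_NODE = 9
HDD_TB = 4.0

raw_tb = NODES * HDDS_PER_NODE * HDD_TB            # 180 TB of raw HDD capacity

for replicas in (2, 3):
    logical = raw_tb / replicas                    # space visible to clients
    comfortable = logical * 0.85                   # keep ~15% free for recovery/rebalancing
    print(f"{replicas} replicas: ~{logical:.0f} TB logical, "
          f"~{comfortable:.0f} TB comfortably usable")
```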

3. Do you think dual E5-2630 CPUs and 64 GB of RAM are enough for PetaSAN, or is that spec too high or too low? Any recommendation to upgrade or downgrade any part?


If you want, you can send me an email about paid consulting or support. My contact number is 82-10-4605-8041 and my Skype is hl1ill@hotmail.com.

Sent you an email. Thanks!