
Storage and Networking issues

Hi,

Good day to you.

We have three R730 units, all with the same specs:

dual E5 v4 processors
128GB DDR4 RAM

We are currently running around 300 virtual machines, and this may grow to 1,000 VMs or more. We need your suggestions on the storage and networking.

We plan to use the R730s for SAN/clustered, scalable storage to eliminate our direct-attached storage. Our main concerns are high IOPS and high-capacity storage.

However, I am still in the midst of deciding on the networking and the storage type.

Questions on networking:
1) We plan to use 10Gb SFP+ (quad-port / 4 ports) on each of our R730s. Is that possible?

2) How many switches are needed, on a best-effort basis, to meet our requirement of 1,000 VMs and more?

Questions on storage:
1) Is it possible to use NVMe (M.2 or U.2) drives?
2) Can we get very high capacity and high IOPS by using PetaSAN?

** We are still in the midst of deciding on the storage and the networking.

Please advise.

Thanks

In short, yes, but not with three nodes; maybe with 6 or 12 nodes, though.

You require one CPU core for the OS, one core per service, and one core per OSD (storage drive). So your dual 16-core CPUs allow for up to 28 OSDs while still providing for iSCSI and the OS. Your 128GB of RAM is just enough for this, but it would be wise to increase it to 160GB or more to ensure you have room for drive rebalancing. Plan your memory requirements as follows: 4GB per OSD, 4GB for the OS, and 2GB per network port, and be sure to include an extra 10GB for OSD rebalancing.
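
To make that arithmetic concrete, here is a rough sizing sketch in Python. The 28-OSD and 4-port figures are assumptions carried over from this thread, not PetaSAN requirements, so adjust them to your actual build.

```python
# Rough per-node sizing sketch based on the rules of thumb above.
# The 4 network ports and 28 OSDs are assumptions from this thread.

def cores_needed(osds, services=3):
    """1 core for the OS, 1 per service (e.g. iSCSI, monitor), 1 per OSD."""
    return 1 + services + osds

def ram_needed_gb(osds, net_ports):
    """4GB per OSD, 4GB for the OS, 2GB per network port, +10GB for rebalancing."""
    return 4 * osds + 4 + 2 * net_ports + 10

osds = 28    # max OSDs suggested above for dual 16-core CPUs
ports = 4    # quad-port 10GbE SFP+ card (assumption from the question)

print(f"cores: {cores_needed(osds)} of 32")       # 32 cores on dual 16-core E5 v4
print(f"RAM:   {ram_needed_gb(osds, ports)} GB")  # 134GB, above 128GB -> plan for 160GB+
```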

Direct-attached storage is actually not a bad thing, since you will not have room for all of the possible drives in each chassis. Just make sure you use a good-quality SAS HBA (host adapter) with no RAID, and dual-connect your host to your DAS box.

IOPS are determined by the storage drives' capabilities, the number of drives, the amount of replication, and the number of nodes in the cluster. Building a multi-node cluster allows for better IOPS distribution, so each node does not need to be fully capable of the speeds required on its own. In short: add more nodes to your cluster to gain better total efficiency, and add more drives to each node to gain additional IOPS per node. Adding SSD/NVMe drives as journals or cache drives increases your IOPS by reducing transaction time. Building the entire cluster from NVMe drives will increase total IOPS due to their higher individual IOPS figures; NVMe drives show up as SSDs and are treated about the same. Make sure you do not use SMR drives if you add spinning disks; they are very slow at writes and will drag down your cluster speed and IOPS.
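
As a very rough back-of-the-envelope, you can estimate cluster IOPS like this (Python sketch). The per-drive IOPS numbers are generic assumptions, and it ignores journals, caching, queue depth, CPU and network limits, so treat it as a ballpark only.

```python
# Very rough cluster IOPS estimate; not a benchmark, just the scaling relationship.

def cluster_iops(nodes, drives_per_node, per_drive_iops, replicas):
    raw = nodes * drives_per_node * per_drive_iops
    reads = raw                # reads are served from a single copy
    writes = raw // replicas   # each client write lands on every replica
    return reads, writes

# Example: 6 nodes of 12 spinning drives (~150 IOPS each), 3x replication
print(cluster_iops(6, 12, 150, 3))      # roughly 10,800 read / 3,600 write IOPS
# Same layout on NVMe (assume ~50,000 IOPS per drive)
print(cluster_iops(6, 12, 50_000, 3))   # orders of magnitude higher
```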

Network throughput will be determined by the number of concurrent sessions and the transactional block size of each session up to the limit of the network hardware used.
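
For a feel of the numbers: per-session throughput is roughly IOPS times block size, capped by the line rate of the port. A tiny sketch, with made-up session counts and block sizes:

```python
# Throughput is demand (sessions x IOPS x block size) capped by the link rate.

def throughput_mb_s(sessions, iops_per_session, block_size_kb, link_limit_mb_s):
    demand = sessions * iops_per_session * block_size_kb / 1024  # MB/s
    return min(demand, link_limit_mb_s)

link_10gbe = 10_000 / 8   # ~1,250 MB/s per 10GbE port, before protocol overhead

print(throughput_mb_s(50, 200, 4,    link_10gbe))  # many small 4K IOs: ~39 MB/s
print(throughput_mb_s(50, 200, 1024, link_10gbe))  # large 1MB IOs: hits the 10GbE cap
```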

The number of switches is determined by your network design, the number of hosts in the network, and how much non-blocking bandwidth each switch has. If you have 20 VM hosts with 50 VMs on each, then you would need two host switches (the 50% rule, unless you have the budget for much more powerful and expensive switches).

For the storage switches, you will need at least one, though building twin iSCSI networks will increase your failure resiliency. You can use the same switch, but then you are adding latency to your network, and since you want the best build possible it is wise to build the storage network on two separate switches. Plan on your storage switches needing 2x 10GbE ports per node per switch. Run your management and one iSCSI network on one bonded pair to one switch, and your backend and the other iSCSI network on the other bonded pair to the other switch.
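
If it helps, here is that port math as a small sketch; the node counts and the hosts-per-switch cap are illustrative assumptions, not hard limits.

```python
# Port-count sketch for the layout above: two storage switches, 2x 10GbE ports
# per node per switch (management + iSCSI-1 to one switch, backend + iSCSI-2 to
# the other). Node and VM-host counts are assumptions for illustration.

def storage_switch_ports(nodes, ports_per_node_per_switch=2):
    """Ports needed on EACH of the two storage switches."""
    return nodes * ports_per_node_per_switch

def host_switches(vm_hosts, max_hosts_per_switch):
    """Crude 50%-style rule: cap how many VM hosts hang off one switch."""
    return -(-vm_hosts // max_hosts_per_switch)   # ceiling division

print(storage_switch_ports(6))    # 12 ports per storage switch for a 6-node cluster
print(storage_switch_ports(12))   # 24 ports per storage switch for 12 nodes
print(host_switches(20, 10))      # 20 VM hosts, ~10 per switch -> 2 host switches
```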

Network card support basically comes down to whether Debian/Ubuntu has support for it. Put the card in a box, load Ubuntu 20 onto it, and see if it shows up and is usable. Or go the route we did and manually add the drivers before joining each node to the cluster.

Yes, you can bond NICs together as long as your switch allows active/active LACP.

 

Hope this helps, even if it is late