ceph: 5 nodes with 12 drives vs 10 nodes with 6 drives

Hi,

I'm designing a Ceph cluster for our VFX studio. We have about 32 artist seats, and I need high sequential read and write speeds rather than high IOPS. I'm willing to put the best possible hardware in each node, but I have to decide now whether to go with many nodes with fewer disks each or vice versa, for the same total number of drives.

Any advice will be greatly appreciated.

What network bandwidth is available per server? Any flash for journals/DB? What's your price penalty for going with 10 servers vs 5?

With more servers you get more aggregate bandwidth when there are many concurrent users or I/O threads. A single user may see no gain if their own network interface is the bottleneck.

Also consider the impact of losing a server when deciding how many you need in total. For example, with 5 servers, if 1 goes down it hurts, because it adds 25% more load on each of the remaining servers.
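To put a number on the node-loss penalty for other cluster sizes, here is a minimal sketch. It assumes the load was spread evenly across nodes and is redistributed evenly over the survivors, which is an idealization; the function name is just for illustration.

```python
def extra_load_fraction(total_nodes: int, failed_nodes: int = 1) -> float:
    """Fractional load increase on each surviving node.

    Assumes the workload was evenly spread before the failure and is
    evenly redistributed afterwards (an idealized model).
    """
    surviving = total_nodes - failed_nodes
    if surviving <= 0:
        raise ValueError("at least one node must survive")
    # Each survivor picks up an equal share of the failed nodes' load.
    return failed_nodes / surviving


if __name__ == "__main__":
    for nodes in (5, 7, 10):
        extra = extra_load_fraction(nodes)
        print(f"{nodes} nodes, 1 down: +{extra:.0%} load per surviving node")
```

Under that assumption, 5 nodes gives the 25% figure, while 10 nodes gives roughly 11%.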

With flash for journal/DB you get more performance with many concurrent users. A single user may not see a performance increase, but the overall feel of the storage will be much improved for everyone, and it speeds up recovery by at least 2x.

If users are on gigabit, you can save money and use fewer servers. Otherwise, to fill 10 Gbit, go for as many servers as make financial sense, and maybe trade off some servers to be able to add flash for journals.
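A back-of-the-envelope sketch of that network reasoning follows. It is a network-only upper bound that ignores disk throughput, replication traffic, and protocol overhead, and it assumes all clients read at once and share the servers' links fairly. The function name and the example link speeds are illustrative assumptions, not measurements.

```python
def per_client_gbit(client_link_gbit: float,
                    server_link_gbit: float,
                    n_servers: int,
                    n_clients: int) -> float:
    """Idealized per-client throughput ceiling in Gbit/s.

    Network-only bound: a client gets the smaller of its own link speed
    and a fair share of the servers' combined link bandwidth.
    """
    aggregate = server_link_gbit * n_servers
    fair_share = aggregate / n_clients
    return min(client_link_gbit, fair_share)


if __name__ == "__main__":
    seats = 32  # number of artist seats mentioned in the thread
    for client_link in (1, 10):
        for servers in (5, 10):
            rate = per_client_gbit(client_link, 10, servers, seats)
            print(f"client {client_link} Gbit, {servers} servers @ 10 Gbit: "
                  f"<= {rate:.2f} Gbit/s per client")
```

With gigabit clients the client link is the limit either way, so extra servers buy nothing on the network side. With 10 Gbit clients, doubling the servers doubles the ceiling each busy client can reach.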

Your choice of replication or EC (erasure coding) for the pool will also affect performance. I find EC gives better single-user performance, but it hurts when you have too many concurrent users relative to the number of disks, and you want a minimum of 6 servers for EC to make sense.

If you want to minimize hardware cost (slower CPU, less memory, gigabit only), you should stick to a replicated pool; don't go EC.
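For a feel of the capacity side of the replication vs EC trade-off, here is a small sketch. The specific settings are assumptions for illustration only (3-way replication and an EC profile of k=4 data chunks plus m=2 coding chunks); the thread does not say which profile is meant. It also assumes the failure domain is the host, so each EC chunk lands on a different server and a k+m profile needs at least k+m servers.

```python
def replicated_usable(raw_units: float, size: int = 3) -> float:
    """Usable capacity of a replicated pool keeping `size` copies."""
    return raw_units / size


def ec_usable(raw_units: float, k: int, m: int) -> float:
    """Usable capacity of an erasure-coded pool with k data + m coding chunks."""
    return raw_units * k / (k + m)


def ec_min_hosts(k: int, m: int) -> int:
    """Hosts needed when every chunk must sit on a different host."""
    return k + m


if __name__ == "__main__":
    raw_drives = 5 * 12  # total drives from the thread title, any drive size
    k, m = 4, 2          # assumed EC profile, not stated in the thread
    print(f"replicated x3: {replicated_usable(raw_drives):.0f} drives' worth usable")
    print(f"EC {k}+{m}: {ec_usable(raw_drives, k, m):.0f} drives' worth usable, "
          f"needs >= {ec_min_hosts(k, m)} hosts")
```

Under these assumptions EC doubles the usable space from the same 60 drives, but a 4+2 profile cannot place all its chunks on separate hosts in a 5-node cluster.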

Lastly, you can't cheat physics: if you want more performance, add more disks and flash for journal/DB.

Thank you for your nice reply. I thought I would receive some notification if my post was answered. That's why I haven't replied until now.

The financial department has taken care of the question 🙁 It looks like I'll have 5 nodes with 12 HDD slots each (2U cases), though we'll be starting with 8 HDDs each for now. Since you say I need at least 6 nodes for EC, and there should be an odd number of nodes (or so I've been told), maybe I can push for 7 nodes, but I won't hold my breath.

I can go 10G or 25G for the networks, but I'll ask about that in a different post.

Thanks again.

If it helps, I've been using Axiom EP450 SAS SSDs that come bundled with the proper Dell trays, and they are performing well at 6 HDDs per SSD for journals. I'm also using them for a dedicated all-flash pool for S3 indexes and meta buckets, and they are performing well for that too.

I've also been running bonded dual 10 Gbit per node, and it's well balanced with our 16 to 24 HDDs per server; the network has not been a bottleneck yet.

 

Every bit of information helps. Thank you so much!