Setup advice
aldoir
2 Posts
October 30, 2018, 12:20 am
My current storage (2x IBM V7000 mirrored) is reaching EOL and I'm looking for a scalable replacement solution without vendor lock-in. I follow Ceph development but never had the guts to implement it, mainly because of the lack of qualified professionals and/or the capacity to train and keep them.
But this project caught my attention because of the friendly way you wrapped the solution (for a storage administrator). So, here is my plan.
My main usage is virtualization (Hyper-V) with a peak of 1500 write IOPS (low for modern setups).
I have two datacenters interconnected with abundant fiber pairs available, so latency-wise it's a local network.
My setup, based on available hardware would be:
4x Lenovo SR630 (two on each datacenter)
- Redundant power supply
- 8x 2.5" 1.8 TB SAS for OSD
- 2x 2.5" 240 GB SSD for journal
- 32 GB RAM
- 1x octa-core Xeon Silver processor
- 128 GB M.2 card (operating system)
- 1x PCIe LOM card, 4x 10GBASE-T
4x Cisco SG350XG-2F10 12-port 10GBase-T (two on each datacenter for multipath)
- Link aggregation on SFP ports (20 Gb combined)
I would use replica=3 spread across both datacenters.
Will this setup reach something around 2500 write IOPS? How about maintenance in the long term, i.e. support for adding new nodes and software upgrades?
Thanks for any advice
admin
2,930 Posts
October 30, 2018, 11:20 am
2500 write IOPS is low; most installations are much higher than this. You would need to test your hardware yourself; it takes a few minutes to install a 3-node cluster and run the included benchmark. I would recommend you run all nodes locally first, to take cross-datacenter latency out of the picture.
One recommendation to help boost performance with HDDs: you can probably add more than 8 per node and the performance will scale. Also, if you have a controller with write-back cache (battery backed), it can further increase write performance by 3-5 times.
PetaSAN is an SDS solution, you can add nodes as you wish, both performance and capacity will scale as you add nodes.
Our installer does automatic upgrades from previous version(s).
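As a quick baseline alongside the included benchmark, Ceph's own rados bench tool can measure raw write IOPS against a pool. This is a hedged sketch: the pool name rbd is an assumption, and the commands must be run on a node with an admin keyring.

```shell
# Hypothetical pool name "rbd"; adjust to an existing pool in your cluster.
# 30-second small-object write test with 16 concurrent operations,
# keeping the objects so they can be read back afterwards:
rados bench -p rbd 30 write -b 4096 -t 16 --no-cleanup

# Random-read test against the objects written above:
rados bench -p rbd 30 rand -t 16

# Remove the benchmark objects when done:
rados cleanup -p rbd
```

Comparing a single-datacenter run against a cross-datacenter run of the same test will show how much of your latency budget the inter-site link consumes.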
aldoir
2 Posts
October 30, 2018, 11:43 am
Thanks for your quick response.
Since I don't have this hardware in hand, maybe a better topic title would be "buying advice", to avoid buying the wrong gear.
In a scenario where one datacenter might go completely down and I need the remaining site to serve requests, would you recommend two bigger servers with more slots instead of four small ones?
I have a special concern about split-brain. I believe I'll need to place a manager on a third site/cloud acting only as a quorum tiebreaker. Sooner or later the cluster will grow, but I would like the project to start out as successfully as possible.
For a bigger server with hot-swappable drives, could I replace drives without downtime on that node?
Last edited on October 30, 2018, 11:56 am by aldoir · #3
admin
2,930 Posts
October 30, 2018, 5:30 pm
The recommendation in our hardware guide is an all-SSD setup; most new Ceph clusters will be built this way in the near future, and the BlueStore storage engine works better with it. In such a setup, have 5-10 SSDs per host.
The next best option, if you need to use HDDs, is a larger number of them, such as 12-24 per host, with SSD journals at a 4:1 ratio, plus a controller with write-back cache (battery backed), either in JBOD mode or as single-disk RAID-0 volumes.
I would recommend more smaller servers rather than fewer larger ones.
Hot-swap is great, but even if you have to shut down a node to replace disks, the cluster keeps running; there is no client downtime.
If you want your cluster to span datacenters but have only 2 physical locations, you will have issues with quorum, plus the question of where to store your third replica. Some ideas:
- Use 4 replicas and store 2 replicas per datacenter, with a pool min_size of 2. Have 1 of the management nodes act as a pure monitor (do not assign it storage or iSCSI roles) and install it as a VM in a system not using PetaSAN (for example, VMware HA) so it runs in 1 of the datacenters and fails over to the other; its role is to provide monitor quorum.
- Have 2 separate installations, divide the load between them so each has active and standby images, and set up async/geo replication between them.
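The first idea above (4 replicas, min_size 2, spread 2 per datacenter) can be sketched with the plain Ceph CLI. This is an illustration under assumptions: the pool name mypool and the PG count are hypothetical, and CRUSH must already know about a datacenter level in its hierarchy. Note that forcing exactly 2 copies per datacenter requires a custom CRUSH rule compiled with crushtool; the built-in command shown here only sets the failure domain.

```shell
# Create a replicated CRUSH rule with "datacenter" as the failure domain
# (assumes datacenter buckets exist under the "default" root):
ceph osd crush rule create-replicated replicated_dc default datacenter

# Create a pool using that rule (pool name and PG count are placeholders):
ceph osd pool create mypool 128 128 replicated replicated_dc

# 4 copies total; keep serving I/O as long as 2 copies remain,
# i.e. when one datacenter is completely down:
ceph osd pool set mypool size 4
ceph osd pool set mypool min_size 2
```

With min_size 2, the surviving site keeps serving reads and writes after a full site failure, while the pure-monitor VM described above preserves monitor quorum.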
Last edited on October 30, 2018, 5:32 pm by admin · #4