
Very poor performance, what did I do wrong?

Greetings,

Please help.

My initial test setup with PetaSAN was three Dell PowerEdge R610 towers, each with 6x 2TB SATA disks. The PERC 6i doesn't have an IT mode, so I set up each disk as a single-disk RAID0. Even with only 32GB each, testing appeared to go very well and I was able to get satisfactory iSCSI write speeds (50+ Mbps).

For my production setup I have three Dell PE 720 rackmount servers, each capable of holding 24x 2.5" SAS or SATA drives. Each has 128GB of RAM and an LSI SAS 9211-8i flashed to IT mode, with 8x 1TB 2.5" SATA drives, 4x 300GB 15K RPM SAS drives, and one 300GB SAS drive for the OS. Initially I set up the SAS drives as journal disks for the SATAs but was getting terrible write speeds. I rebuilt the systems to use only the SATA drives, with the same results. I even removed the network bond for the management and backend networks so that each runs off a single NIC. I have a GbE switch, and the iSCSI and backend networks are set up with jumbo frames. I realize that 10GbE is ideal, but I'm not looking for ludicrous speeds here; this is for backup data, so I'm just trying to get something comparable to the Synology storage we have been using, which, even with 4TB-8TB SATA drives, gets 400+ write IOPS. I was getting about 100 IOPS with the SAS journals and now I am getting about 17 IOPS without them. I'm testing this by vMotioning a virtual machine with a large HDD between a Synology and PetaSAN.

Any suggestions or help would be very much appreciated.

Thank you in advance for your advice.

~Chris

First, the recommended hardware is all flash and a 10G network; if you have HDDs, then an SSD journal should be the minimum if you want decent performance. As you have seen, there was a big gap between your RAID0, SAS journal, and pure SATA setups, so the hardware setup makes a big difference.

Synology probably uses RAID or talks to local disks, unlike an SDS solution, so hard drive latency may not have such a visible impact. Note that with RAID, the drives you add improve performance for a single IO operation. In Ceph, the performance of a single operation does not improve with more disks and is totally dependent on the speed of a single disk: the more disks you add, the more total IOPS you get across all your operations, but the speed of a single operation remains the same. RAID gives better performance for a single operation, but Ceph is designed to shine when you have a lot of concurrent load.
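To put rough numbers on that, here is a toy comparison; the per-disk IOPS, disk count, and replication size below are assumptions for illustration, not measurements of your hardware.

```python
# Toy comparison with assumed figures: what adding disks buys you in
# RAID vs Ceph for a single stream versus many concurrent streams.
PER_DISK_IOPS = 100    # assumed raw IOPS of one SATA HDD
DISKS = 8              # assumed number of data disks
REPLICAS = 3           # assumed Ceph pool replication size

# RAID: added spindles help even a single IO stream.
raid_single_stream = PER_DISK_IOPS * DISKS          # grows with each disk added

# Ceph: a single outstanding operation still waits on one disk,
# but aggregate IOPS across many concurrent clients scales out.
ceph_single_stream = PER_DISK_IOPS                  # flat: one disk's speed
ceph_aggregate = PER_DISK_IOPS * DISKS / REPLICAS   # grows with each disk added

print(raid_single_stream, ceph_single_stream, ceph_aggregate)
```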

To understand what a write operation does: it will look up the object metadata to know where the object is stored, seek to that location to write the data, then update the metadata (new lengths, CRC, location on disk), and only after this is completed will it send the write command to the 2 replica hosts and wait for them before returning to the client. As you can see, there are 2 metadata IOs (read, write) beside the data write, plus the same work on the replicas. 17 IOPS is very low, but consider that for each client write IOP you are doing 3 IOPS on one disk and then waiting the same amount of time on the replicas. If your HDD does 100 raw IOPS, you are not far off. Using an SSD journal will put your metadata reads/writes on a much faster disk, so you should really do that.
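The arithmetic behind that, as a back-of-envelope sketch; the 100 raw IOPS figure is an assumption for a typical SATA HDD, not a measurement.

```python
# Back-of-envelope model of one replicated write on pure HDD OSDs:
# ~3 disk IOs on the primary (metadata read, data write, metadata update),
# then the client is only acked after the replicas finish the same work.
RAW_HDD_IOPS = 100        # assumed raw IOPS of one HDD
IOS_PER_WRITE = 3         # metadata read + data write + metadata update

primary_time = IOS_PER_WRITE / RAW_HDD_IOPS    # ~30 ms on the primary OSD
replica_time = IOS_PER_WRITE / RAW_HDD_IOPS    # ~30 ms waiting on the replicas

client_iops = 1 / (primary_time + replica_time)
print(round(client_iops, 1))   # ~16.7, right around the observed 17 IOPS
```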

Another thing to mention is that, for data integrity, Ceph uses sync writes (FUA: force unit access / flush) on every write to guarantee integrity, but this bypasses any volatile caching on the disk.
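As a minimal illustration of why that matters, here is plain application-level Linux code (not Ceph internals): a buffered write returns once the data is in the page cache, while a sync write is not acknowledged until the device reports it stable, so its latency is bound by the disk rather than any volatile cache.

```python
# Illustration only (not Ceph code): buffered write vs. sync write on Linux.
import os

data = b"x" * 4096

# Buffered: returns as soon as the kernel page cache has the data.
with open("buffered.bin", "wb") as f:
    f.write(data)

# Sync: O_DSYNC makes each write wait until the data is reported durable,
# similar in spirit to the FUA/flush behaviour described above.
fd = os.open("synced.bin", os.O_WRONLY | os.O_CREAT | os.O_DSYNC, 0o644)
os.write(fd, data)
os.close(fd)
```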

Thank you. So it seems that for my specific need, PetaSAN is not a good choice. I was planning to make this a backup destination infrastructure that would be highly scalable as my storage needs increase. The inbound traffic won't involve a lot of concurrent load. I wonder if I should change the controllers to RAID 5 or 6 and allow them to do more write caching, or would I just be continuing to shoot myself in the foot?


Thanks again

Maybe it is not the correct choice; it depends. First, if you are looking at backup then IOPS is not the concern, it is more about throughput, and HDDs are not bad for this since their throughput is decent. If you are backing up from a Windows host, you can use the petasan-copy tool: it splits a single file copy into 16 parallel streams to get higher concurrency, and it uses a 4M block size when doing the copies; it is recommended to change some registry settings in Windows, as the tool will show. There could be similar commercial tools that do such things. Again, if you can place 1 SSD journal per 4-5 HDD OSDs it will make a big difference.
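For anyone who wants to see the idea rather than the tool itself, here is a rough Python sketch of splitting one file copy into 16 parallel 4 MiB streams. This is only an illustration of the concept, not the actual petasan-copy implementation, and the paths in the usage line are placeholders.

```python
# Illustrative sketch (not petasan-copy): copy one file with 16 parallel
# workers, each doing its own 4 MiB reads/writes at fixed offsets, so the
# storage backend sees concurrent large-block I/O instead of one stream.
import os
from concurrent.futures import ThreadPoolExecutor

BLOCK = 4 * 1024 * 1024   # 4 MiB per I/O
WORKERS = 16              # parallel streams

def copy_parallel(src_path, dst_path):
    size = os.path.getsize(src_path)
    src = os.open(src_path, os.O_RDONLY)
    dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT, 0o644)
    os.ftruncate(dst, size)                   # pre-size the destination

    def copy_range(offset):
        data = os.pread(src, BLOCK, offset)   # read one 4 MiB block
        os.pwrite(dst, data, offset)          # write it at the same offset

    try:
        with ThreadPoolExecutor(max_workers=WORKERS) as pool:
            # list() consumes the results so worker exceptions surface here
            list(pool.map(copy_range, range(0, size, BLOCK)))
    finally:
        os.close(src)
        os.close(dst)

# Example (placeholder paths):
# copy_parallel("/mnt/source/backup.vbk", "/mnt/petasan/backup.vbk")
```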

Good luck.


Unfortunately, this will be connected via iSCSI to VMware. Trying to vMotion to it is really slow, and with the running VM, which is a Linux Veeam repository, writing to it is also slow. So I think I'm screwed in all directions. I can try the SSD journal though. Is there a rule of thumb regarding what size the SSD should be relative to the size of the HDDs it's journaling?

The SSD requires 64 GB per OSD it serves; it is recommended to serve 4 OSDs per SSD, but some users go higher.
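As a quick sizing example using that rule of thumb (the 4-OSD ratio is the recommendation above; the helper name is just illustrative):

```python
# Quick sizing check based on the rule of thumb above.
JOURNAL_GB_PER_OSD = 64     # journal space needed per OSD it serves

def min_journal_ssd_gb(osds_per_ssd=4):
    """Minimum SSD capacity to journal the given number of HDD OSDs."""
    return osds_per_ssd * JOURNAL_GB_PER_OSD

print(min_journal_ssd_gb())    # 256 GB SSD to serve the recommended 4 OSDs
```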

For VMware, make sure you increase the MaxIOSize from 128k to 512k as per our guide; this makes a big difference with HDDs.

Try doing a few backup operations at the same time if possible, as the system is good at this, or try to get a backup or file copy tool that will split the stream into parallel streams; petasan-copy can be run if you are using a Windows VM.

Another idea is to create a RAID0/5 volume from several PetaSAN disks on your client host, so when it copies, the stream will be split and you will get parallel operations. However, for this to work well you should use large block sizes for the copy operation, since it will be split among the drives, so use a 1-4M block size if you can.

I'm looking into your recommendations. When considering 10+GbE, is that needed for the backend interfaces AND the iSCSI interfaces? Would there be any benefit to just making the backend 10G? If I also make the iSCSI 10G then I need to add 10G NICs to my VM hosts too, so that could get expensive very quickly.

The backend subnet bandwidth should be at least equal to or larger than your iSCSI bandwidth. However, in PetaSAN you can map more than one subnet onto the same network interface (or bond), so you can have a single 10G interface and map your iSCSI 1/2 and backend subnets onto it if you wish.