
Performance issues; new to Ceph and PetaSAN


Hello,

I am very new to Ceph and PetaSAN, so please excuse my lack of knowledge on this.

I have tried the following:

Nodes 1, 2 and 3 (all with the same hardware specs):

Intel Core i5 processor, 24 GB RAM, 3 x 1 Gbps NICs, 1 x 160 GB SATA for the OS and 1 x 1 TB HDD for the OSD

Cluster network configuration:

eth0 - management and backend1

eth1 - iscsi1

eth2 - backend2 and iscsi2

 

All systems are configured the same way by following the quick start guide and are connected to the same Ethernet switch.

My cluster has come online (using the default cluster configuration).

I was able to mount the iSCSI volume on my Windows 10 Pro system (it seems MPIO is not supported on desktop OSes by Microsoft).

When I start the copy from the Windows 10 system (SSD) to the iSCSI disk, the initial file transfer goes up to 100-110 MB/s, but after a couple of minutes it drops to 20-30 MB/s. The only way I can get the speed back is to pause the file copy for some time and then resume it, in which case the speed is high for a couple of minutes after resuming and then drops back down.

Is there something I need to check or tweak, or something I am doing wrong?

 

Any pointers will help. Please feel free to ask for further information if I have missed anything in my original post.

Unfortunately your hardware is under-powered to get good results. The initial 110 MB/s surge is due to the Windows write cache; the steady 20-30 MB/s is your real write speed. Note that due to replication, the cluster is actually writing 2x or 3x this speed. The read speed will also be at least double the write speed, due to this.
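As a rough illustration: with the default 3 replicas, a 30 MB/s client write means roughly 90 MB/s of disk writes spread across the cluster, and that replica traffic has to cross your 1 Gbps backend links on top of the client traffic itself.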

If you have several concurrent IO operations (for example several file copies, IO on more than one iSCSI disk, or a database doing concurrent transactions), you may get a total bandwidth higher than this, but your single-stream speed will be the same.

To get a boost at minimum cost, you would need a couple more disks; this will scale your total bandwidth, but your single-stream bandwidth will be the same.

To get good results you need 10G NICs and at least some SSDs to act as journal devices for a couple of spinning disks. The recommendation is to use all SSDs. You also need CPU cores as per our hardware guide.
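As a side note on what a journal device means in practice: on a plain Ceph box (outside the PetaSAN UI, which normally handles this for you), an OSD with its journal or DB on a separate SSD partition is created roughly like this; the device paths below are placeholders:

# Filestore: journal on an SSD partition
ceph-volume lvm create --filestore --data /dev/sdb --journal /dev/sdc1

# Bluestore: DB/WAL on an SSD partition instead
ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/sdc1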

Remember this is a scale out system, the more you add the faster it gets.

Thank you very much for the clarity

I will try a Dell 520 server with 2 x 8-core processors and 64 GB RAM as a node tomorrow and will share whether I see any improvement.

Since this is only a lab setup, I am unable to add SSDs or 10G NICs at the moment. I will surely refer to the hardware guide.

Hey Admin,

CPU: 2 x E5-2630v4 10core
RAM: 128GB
2 x 2TB NVMe
Storage: 2 x 1TB SATA
Storage: 10 x 8TB SATA
Storage: 2 x 800GB SATA SSD.

I get really bad speed results. Can you please give me some CLI test commands so I can report the results?

I noticed a Python script that runs there.

Note: I used both the NVMe and the SSD as journals, but still got the same results.

What test did you use to measure performance?

Can you run the cluster benchmark for throughput and IOPS at 1 and 64 threads for 5 minutes?

If you have been trying other hardware, is the performance issue specific to this configuration?

Do you use a controller with write-back cache?

What models do you use for the NVMe and SSD devices?

Can you make all OSDs the same disk type and capacity?

Quote from admin on October 18, 2018, 3:44 pm

What test did you use to measure performance?

CrystalDiskMark (https://crystalmark.info/en/download/) on Windows.

The dd command on Linux.
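For reference, a sequential dd test of this kind looks roughly like the following (the target path, block size and count are placeholders rather than my exact command):

# sequential write, bypassing the page cache
dd if=/dev/zero of=/mnt/test/ddfile bs=4M count=1024 oflag=direct

# sequential read of the same file
dd if=/mnt/test/ddfile of=/dev/null bs=4M iflag=direct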

 

Can you run the cluster benchmark for throughput and IOPS at 1 and 64 threads for 5 minutes?

Results
Cluster Throughput
Write     Read
97 MB/s    122 MB/s

srocceph4 Write Resource Load Details
Memory
kbmemfree    kbmemused    kbbuffers    kbcached    kbcommit    %commit    kbactive    kbinact    kbdirty    %util
114291936    17626800    433828    588872    24893736    18    16652124    313364    1152    13

Network Interfaces
Interface    rxpck/s    txpck/s    rxkB/s    txkB/s    rxcmp/s    txcmp/s    rxmcst/s    %util
eth0    4471    1018    34195    500    0    0    0    2
eth1    5488    5449    36009    33932    0    0    0    2
eth2    8    5    0    0    0    0    0    0

Disks
Disk    tps    rd_sec/s    wr_sec/s    avgrq-sz    avgqu-sz    await    svctm    %util
sdb    24    0    12451    512    0    15    1    3
sdc    27    0    13871    512    0    14    1    3
sde    29    0    14854    512    0    15    1    4
sdd    33    0    16930    512    0    15    1    4
sdf    30    0    15619    512    0    15    1    4
sdg    20    0    10485    512    0    16    1    3
sdh    25    0    13216    512    0    15    1    3
sdi    26    0    13544    512    0    15    1    3
sdj    25    0    12888    512    0    15    1    3
sdk    26    0    13325    512    0    15    1    3

 

srocceph4 Read Resource Load Details
Memory
kbmemfree    kbmemused    kbbuffers    kbcached    kbcommit    %commit    kbactive    kbinact    kbdirty    %util
114350752    17567984    433940    589868    25511160    19    16601088    313660    660    13

Network Interfaces
Interface    rxpck/s    txpck/s    rxkB/s    txkB/s    rxcmp/s    txcmp/s    rxmcst/s    %util
eth0    1142    5316    705    41070    0    0    0    3
eth1    424    529    314    321    0    0    0    0
eth2    8    5    0    0    0    0    0    0

Disks
Disk    tps    rd_sec/s    wr_sec/s    avgrq-sz    avgqu-sz    await    svctm    %util
sdb    5    2839    0    512    0    34    2    1
sdc    8    4259    0    512    0    34    2    2
sde    10    5461    0    512    0    35    2    3
sdd    9    4696    0    512    0    33    2    2
sdf    10    5529    0    512    0    32    2    3
sdg    7    3932    0    512    0    33    2    2
sdh    12    6225    0    512    0    32    2    3
sdi    10    5133    0    512    0    31    2    2
sdj    10    5352    0    512    0    33    2    2
sdk    10    5461    0    512    0    31    2    2

The second host has roughly the same results.

 

If you have been trying other hardware, is the performance issue specific to this configuration?

The second cluster has the same results.

SSG-6028R-E1CR12L

X10DRH-IT, CSE-826BTS-R920LPBP-1

P4X-DPE52620V4-SR2R6

BDW-EP 8C/16T E5-2620V4 2.1G 20M 8GT 85W R3 2011 R0

MEM-DR416L-CL07-ER26

16GB DDR4-2666 2RX8 ECC RDIMM

HDS-IAN0-SSDPED1K375GAX

Intel 3D XPoint DC P4800X 375G PCIe 3.0 HHHLAIC 30DWPD FW420

HDS-I2T2-SSDSC2KB240G7

Intel S4500 240GB, SATA 6Gb/s, 3D, TLC 2.5" 1DWPD FW121

HDD-T8000-ST8000NM0055

Seagate 3.5", 8TB, SATA 3.0, 7.2K RPM, 256M, 512E

Do you use a controller with write-back cache?

No controller - it's all JBOD.

What models do you use for the NVMe and SSD devices?

Supermicro SuperStorage Server 6028R-E1CR12T - 12x SATA/SAS - LSI 3108 12G SAS onboard RAID - Dual 10-Gigabit Ethernet - 920W Redundant

AVD-SSDPEDMX020T7

Intel DC P3520 2.0TB, NVMe PCIe 3.0 x4, 3D MLC, HHHL AIC, 0.6 DW

2TD-SSDSC2BB800G7

Intel S3520 800GB, SATA 6Gb/s, 3D MLC 2.5" 7.0mm

Can you make all OSDs the same disk type and capacity?

T8000-ST8000NM0055

Seagate 3.5", 8TB, SATA3 6Gb/s, 7.2K RPM, 256M, 4kN - Storage

 

The 97 MB/s and 122 MB/s figures are the throughput when using 1 thread (correct?). Can you run the test with 64 threads to simulate concurrent streams? It should scale up the more streams you have. These tests are done with 4 MB block sizes (standard for throughput tests); a Windows file copy would use 256 KB sizes, so it will be a bit lower, but it will also scale the more streams you have.

The above is for MB/s throughput. If you (also) want high IOPS, you will either need an all-SSD solution or a controller with cache if using spinning disks.
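If you want to cross-check these numbers from the command line, outside the iSCSI layer, something like rados bench against a scratch pool works (the pool name and durations below are only examples, not necessarily what the built-in benchmark runs):

# 4 MB writes (the rados bench default), single stream, 5 minutes
rados bench -p testpool 300 write -t 1 --no-cleanup

# the same test with 64 concurrent operations
rados bench -p testpool 300 write -t 64 --no-cleanup

# sequential reads of the objects written above
rados bench -p testpool 300 seq -t 64

# remove the benchmark objects when done
rados -p testpool cleanup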

Quote from admin on October 19, 2018, 8:59 am

The 97 MB/s and 122 MB/s figures are the throughput when using 1 thread (correct?). Can you run the test with 64 threads to simulate concurrent streams? It should scale up the more streams you have. These tests are done with 4 MB block sizes (standard for throughput tests); a Windows file copy would use 256 KB sizes, so it will be a bit lower, but it will also scale the more streams you have.

The above is for MB/s throughput. If you (also) want high IOPS, you will either need an all-SSD solution or a controller with cache if using spinning disks.

We get around 70 MB/s writes in ESXi over iSCSI, which is slow.

These are the results with 64 threads.

 

Results
Cluster Throughput
Write     Read
1695 MB/s    1646 MB/s

Memory
kbmemfree    kbmemused    kbbuffers    kbcached    kbcommit    %commit    kbactive    kbinact    kbdirty    %util
117317948    14600788    435740    601480    25612584    19    13600708    316648    336    11

Network Interfaces
Interface    rxpck/s    txpck/s    rxkB/s    txkB/s    rxcmp/s    txcmp/s    rxmcst/s    %util
eth0    67628    15678    589376    1481    0    0    0    48
eth1    81860    82341    594371    594335    0    0    0    48
eth2    12    10    0    0    0    0    0    0
Disks
Disk    tps    rd_sec/s    wr_sec/s    avgrq-sz    avgqu-sz    await    svctm    %util
sdb    444    0    227519    512    6    14    1    46
sdc    460    0    235847    512    6    15    1    47
sde    463    0    237486    512    7    15    1    48
sdd    489    0    250702    512    7    15    1    50
sdf    477    0    244387    512    7    15    1    49
sdg    438    0    224713    512    6    14    1    45
sdh    442    0    226740    512    7    16    1    46
sdi    444    0    227409    512    6    14    1    44
sdj    442    0    226699    512    6    13    1    44
sdk    477    0    244340    512    6    14    1    48

There are several performance metrics; which one to optimize for depends on your expected workload and budget.

Total cluster MB/s: the 97/122 MB/s went up to 1695/1646 MB/s under concurrent load. The 1695/1646 MB/s is probably network saturation on the client, and your cluster should be able to give higher values if you have more clients.

Single stream MB/s: the 97/122 MB/s is for a single stream. The system will give you less than what a single drive would give you; unlike RAID, the per-stream speed is lower but it scales with many streams. The 70 MB/s in ESXi is due to the smaller block sizes plus iSCSI overhead; if you add more VMs, the total speed will increase. To get higher single-stream speeds you would need all flash or a controller with cache for your spinning disks. In some cases you can increase single-stream performance by using rbd striping or creating a RAID-0 out of the PetaSAN iSCSI disks (see the sketch after this list), but this could have a negative impact if you end up needing many simultaneous concurrent streams.

Total cluster IOPS: for IOPS-sensitive workloads you would need all flash or a controller with cache for spinning disks, plus the more CPU cores you have the better.

Single stream IOPS: this is the most expensive to achieve in a distributed SDS solution; in addition to all flash, you would need the fastest-GHz CPUs plus 40/100 Gbps NICs.
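As a rough sketch of the rbd striping option mentioned above (the pool, image name, size and stripe settings are only examples; in PetaSAN the iSCSI disks are normally created from the UI, and whether fancy striping is usable depends on the rbd client in use):

# a 1 TB image (--size is in MB) striped in 64 KB units across 16 objects,
# so a single sequential stream is spread over more OSDs at once
rbd create rbd/iscsi-striped --size 1048576 --stripe-unit 65536 --stripe-count 16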

Hey guys, sorry to necro this thread, but I have been playing with PetaSAN lately as a hopeful replacement for a traditional Ceph cluster deployed via Ceph-Ansible on CentOS 7, running the latest version of Nautilus.

 

The main reason I have been looking into PetaSAN is the terrible performance of ceph-iscsi. My understanding is that PetaSAN does not use ceph-iscsi and instead uses its own iSCSI implementation written specifically for PetaSAN.

Performance of ceph-iscsi direct to Windows is quite bad, in the neighborhood of 150 MB/s writes and about double that for reads, and with VMware it is even worse.

But from looking at this thread, it appears I may have been incorrect in my assessment that PetaSAN has better performance than ceph-iscsi? I was sure I read something in the neighborhood of 75% of native RADOS performance over iSCSI.

So let me just start by saying I have access to a lot of very powerful hardware, but for now I have just been testing in a PoC cluster with 5 OSDs per server and no SSDs at all, which is the same setup I used for my baseline numbers on the traditional Ceph cluster as well.

The only problem is, I am seeing much worse performance through PetaSAN right now. I am sure something is set up incorrectly, and I am hoping someone can point me in the right direction.
Also, just to ask a question about the higher parallel performance when using more VMs: this doesn't make sense to me with the way I understand Ceph. If you only use one RBD as a datastore for VMware, it is my understanding that Ceph sees that RBD as a single RADOS client, so I am not quite sure how adding more VMs to a single RBD gets you the high throughput and parallel performance that Ceph offers.

My understanding is that in order to really use Ceph correctly and get the full throughput, you need multiple RBDs, even something like a new RBD cut for each VM, the way that Proxmox does it.
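For example, to test that theory I could cut several smaller images instead of one big datastore, something along these lines (pool, names and sizes are just placeholders; in PetaSAN these would presumably be separate iSCSI disks created from the UI, the raw rbd commands are only to illustrate the idea):

# four smaller images instead of one large datastore, so ESXi traffic
# is spread across several RBD-backed iSCSI disks / RADOS clients
for i in 1 2 3 4; do
    rbd create rbd/esx-datastore-$i --size 524288   # 512 GB each (--size is in MB)
done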

Please correct me if I am wrong, as I would definitely love to know if I am completely off base here.

Thanks in advance!
