
Which performance values are realistic?

I now have a cluster with three nodes.

3 x HPE DL360 Gen9, each node with:
128 GB RAM
2 x Intel Xeon E5-2630L v3 (8 cores @ 1.80 GHz)
1 x Intel DC S3520 120 GB SSD for the PetaSAN system
4 x Samsung PM1643a 960 GB as OSDs

1 x 1 Gbit management
2 x 10 Gbit Intel X520-DA2 (one port each for the backend, and one port each for iscsi1 and iscsi2)
The backend is configured as a bond (balance-alb), and everything is cross-connected to two switches.
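For reference, the kernel exposes the bond state under /proc/net/bonding, which is a quick way to confirm the mode and that both slaves are up:

# shows bonding mode, MII status and the slave interfaces
cat /proc/net/bonding/Backend-4-7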
When I move Hyper-V VMs from a host backed by PetaSAN to another host with a direct RAID 10 of 6 x 1 TB SSDs, I get a throughput of about 600 MB/s according to the PetaSAN dashboard.
According to the cluster benchmark, IOPS are about 45k read and 23k write (4k random) with 2 nodes.
What values are common for such a system? Isn't 600 MB/s a bit low? Or am I expecting too much?

Would enabling jumbo frames on the iSCSI interfaces and switches bring more performance? And can I change this afterwards in the JSON file, or how is it done?

I made the changes in cluster_info.json on one node, thinking it would automatically replicate to the other nodes. But after two restarts of all nodes I noticed that jumbo frames still did not work correctly, and the files on the other two nodes had not changed. So I simply copied the JSON to the other two nodes. After restarting all nodes I reached the management interface for a short time, and then nothing worked anymore; it seems the other two nodes shut down again. Could something be wrong with my jumbo frames configuration?
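A quick end-to-end check for jumbo frames is an unfragmentable ping across the backend (a sketch, assuming the backend IPs below; 8972 bytes = 9000 MTU minus 28 bytes of IP/ICMP headers):

# check the MTU actually applied to the bond and one of its slaves
ip link show Backend-4-7
ip link show eth4

# from psn01 to psn02's backend IP; if this fails while a normal ping works,
# jumbo frames are broken somewhere on the path (NIC, bond or switch port)
ping -M do -s 8972 192.168.180.202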

 

Here is my config:

{
    "backend_1_base_ip": "192.168.180.0",
    "backend_1_eth_name": "Backend-4-7",
    "backend_1_mask": "255.255.255.0",
    "backend_1_vlan_id": "",
    "backend_2_base_ip": "",
    "backend_2_eth_name": "",
    "backend_2_mask": "",
    "backend_2_vlan_id": "",
    "bonds": [
        {
            "interfaces": "eth4,eth7",
            "is_jumbo_frames": true,
            "mode": "balance-alb",
            "name": "Backend-4-7",
            "primary_interface": "eth4"
        }
    ],
    "default_pool": "both",
    "default_pool_pgs": "256",
    "default_pool_replicas": "3",
    "eth_count": 10,
    "jf_mtu_size": "9000",
    "jumbo_frames": [
        "eth4",
        "eth5",
        "eth6",
        "eth7"
    ],
    "management_eth_name": "eth0",
    "management_nodes": [
        {
            "backend_1_ip": "192.168.180.201",
            "backend_2_ip": "",
            "is_backup": false,
            "is_cifs": false,
            "is_iscsi": true,
            "is_management": true,
            "is_nfs": false,
            "is_storage": true,
            "management_ip": "172.16.1.201",
            "name": "psn01"
        },
        {
            "backend_1_ip": "192.168.180.202",
            "backend_2_ip": "",
            "is_backup": false,
            "is_cifs": false,
            "is_iscsi": true,
            "is_management": true,
            "is_nfs": false,
            "is_storage": true,
            "management_ip": "172.16.1.202",
            "name": "psn02"
        },
        {
            "backend_1_ip": "192.168.180.203",
            "backend_2_ip": "",
            "is_backup": false,
            "is_cifs": false,
            "is_iscsi": true,
            "is_management": true,
            "is_nfs": false,
            "is_storage": true,
            "management_ip": "172.16.1.203",
            "name": "psn03"
        }
    ],
    "name": "PetaSAN-Cluster-01",
    "storage_engine": "bluestore"
}
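Before restarting, it is worth confirming the file really is identical on all three nodes (a quick sketch, assuming the default install path /opt/petasan/config/cluster_info.json; adjust if your install differs):

# all three checksums must match before rebooting the nodes
for n in 172.16.1.201 172.16.1.202 172.16.1.203; do
    ssh root@$n md5sum /opt/petasan/config/cluster_info.json
done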

OK, it seems that when you change these settings you really have to turn off fencing for the duration. Now everything works, and the MTU is correct too. But I don't understand why the nodes lose contact with each other just because of a change like that on the backend.

Only my first question, about performance, remains open. Maybe someone can say something about that.

There is a large variation in performance numbers; it depends a lot on the hardware components.

For IOPS: run a 5-minute benchmark with 256 threads and 2 clients, then look at the CPU and disk % utilisation charts on the dashboard. If disk utilisation is higher, you could get more IOPS by adding more disks, for example 6 or 8 per node. If on the other hand your CPU is near 100%, then it is the bottleneck, and you cannot get more IOPS unless you add more nodes or use better CPUs with a higher core count and frequency.
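If you want to cross-check the built-in benchmark from the command line, Ceph's rados bench gives comparable numbers (a sketch, assuming a test pool named rbd; adjust the pool name to your setup):

# 5 minutes of 4k writes with 256 concurrent ops, keeping the objects
rados bench -p rbd 300 write -t 256 -b 4096 --no-cleanup
# 5 minutes of random reads over the objects written above
rados bench -p rbd 300 rand -t 256
# remove the benchmark objects afterwards
rados -p rbd cleanup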

For throughput: for the 600 MB/s, how many copy operations were running in parallel? The more parallel operations, the more you should get, as the system scales quite well.
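One way to see this scaling is to run several sequential streams in parallel against the iSCSI disk, e.g. with fio (a sketch; /dev/sdX is a placeholder for your iSCSI device, and reads are non-destructive):

# compare total bandwidth at numjobs=1 vs numjobs=4 to see the scaling
fio --name=seqread --filename=/dev/sdX --rw=read --bs=1M \
    --ioengine=libaio --direct=1 --iodepth=32 --numjobs=4 \
    --runtime=60 --time_based --group_reporting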

Cluster IOPS:
Write: 24127
Read: 45235

 

Write Resource Load:
Memory Util%: 22
CPU Util%: Avg 57 - Max 82
Network Util%: Avg 4 - Max 5
Disks Util%: Journals 0; OSDs Avg 42 - Max 47

Read Resource Load:
Memory Util%: 22
CPU Util%: Avg 15 - Max 33
Network Util%: Avg 2 - Max 4
Disks Util%: Journals 0; OSDs Avg 22 - Max 35

 

Can you check the charts on all 3 nodes as well? The benchmark excludes the 2 nodes simulating client load, and they will probably show higher CPU load. If so, then it is mainly a CPU issue; however, your cluster could deliver a bit more IOPS if the clients were external, as in a real setup. To get useful charts, run the test for 5 minutes.