Which performance values are realistic?
exitsys
43 Posts
October 1, 2020, 9:14 pm
I now have a cluster with three nodes.
3 x HPE DL360 Gen9
In each node:
128 GB RAM
2 x Intel Xeon E5-2630L v3 (8 cores @ 1.80 GHz)
1 x Intel DC S3520 120 GB SSD for the PetaSAN system
4 x Samsung PM1643a 960 GB SSD as OSDs
1 x 1 Gbit management
2 x 10 GbE Intel X520-DA2 (one port each for the backend, and one port each for iSCSI 1 and iSCSI 2)
The backend is configured as a bond (balance-alb), and everything is cross-connected across two switches.
If I now move Hyper-V VMs from a host on PetaSAN to another host with a local RAID 10 of 6 x 1 TB SSDs, I get a throughput of about 600 MB/s according to the PetaSAN dashboard.
According to the cluster benchmark, IOPS are about 45k read and 23k write (4k random) with 2 nodes.
What values are typical for such a system? Isn't 600 MB/s a bit low, or am I expecting too much?
Would enabling jumbo frames on the iSCSI interfaces and switches improve performance? Can I change that in the JSON file afterwards?
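As a sanity check of the raw backend path, I could rule out the network itself with something like the following between two nodes. This is only a sketch; it assumes iperf3 is available (or can be installed) on the nodes and uses my backend IPs:

# on psn02: start a listener
iperf3 -s

# on psn01: 30-second test with 4 parallel streams to psn02's backend address
iperf3 -c 192.168.180.202 -P 4 -t 30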
Last edited on October 1, 2020, 9:15 pm by exitsys · #1
exitsys
43 Posts
October 2, 2020, 12:25 am
I made the changes in cluster_info.json on one node and assumed it would replicate automatically to the other nodes. But after two restarts of all nodes I noticed that jumbo frames still did not work correctly and the files on the other two nodes had not changed, so I simply copied the JSON to the other two nodes. After restarting all nodes I could reach the management interface for a short time, and then nothing worked anymore. It seems the other two nodes shut down again. Could something be wrong with my jumbo frames configuration?
Here is my config:
{
    "backend_1_base_ip": "192.168.180.0",
    "backend_1_eth_name": "Backend-4-7",
    "backend_1_mask": "255.255.255.0",
    "backend_1_vlan_id": "",
    "backend_2_base_ip": "",
    "backend_2_eth_name": "",
    "backend_2_mask": "",
    "backend_2_vlan_id": "",
    "bonds": [
        {
            "interfaces": "eth4,eth7",
            "is_jumbo_frames": true,
            "mode": "balance-alb",
            "name": "Backend-4-7",
            "primary_interface": "eth4"
        }
    ],
    "default_pool": "both",
    "default_pool_pgs": "256",
    "default_pool_replicas": "3",
    "eth_count": 10,
    "jf_mtu_size": "9000",
    "jumbo_frames": [
        "eth4",
        "eth5",
        "eth6",
        "eth7"
    ],
    "management_eth_name": "eth0",
    "management_nodes": [
        {
            "backend_1_ip": "192.168.180.201",
            "backend_2_ip": "",
            "is_backup": false,
            "is_cifs": false,
            "is_iscsi": true,
            "is_management": true,
            "is_nfs": false,
            "is_storage": true,
            "management_ip": "172.16.1.201",
            "name": "psn01"
        },
        {
            "backend_1_ip": "192.168.180.202",
            "backend_2_ip": "",
            "is_backup": false,
            "is_cifs": false,
            "is_iscsi": true,
            "is_management": true,
            "is_nfs": false,
            "is_storage": true,
            "management_ip": "172.16.1.202",
            "name": "psn02"
        },
        {
            "backend_1_ip": "192.168.180.203",
            "backend_2_ip": "",
            "is_backup": false,
            "is_cifs": false,
            "is_iscsi": true,
            "is_management": true,
            "is_nfs": false,
            "is_storage": true,
            "management_ip": "172.16.1.203",
            "name": "psn03"
        }
    ],
    "name": "PetaSAN-Cluster-01",
    "storage_engine": "bluestore"
}
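To check whether the new MTU actually took effect and whether jumbo frames pass end to end, I would use something like this (a sketch only; run it on the nodes, and note that the ping destination must be another node's backend IP):

# each backend interface (and the bond) should now report "mtu 9000"
ip link show eth4
ip link show eth7

# from psn01: send unfragmentable 9000-byte frames to psn02's backend IP
# (8972 = 9000 - 20 bytes IP header - 8 bytes ICMP header)
ping -M do -s 8972 -c 3 192.168.180.202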
Last edited on October 2, 2020, 12:27 am by exitsys · #2
exitsys
43 Posts
October 2, 2020, 12:41 am
OK, it seems that when you change these settings you really do have to turn off fencing for the duration. Now everything works, and the MTU is correct too. But I don't understand why the nodes lose contact with each other just because of a change like that on the backend.
Only my first question about performance remains open; maybe someone can comment on it.
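As far as I understand it, fencing kicks in when a node's heartbeats over the backend network are lost, which is exactly what happens for a moment while the interfaces come back up with the new MTU. A quick way to confirm that all nodes see each other again afterwards (assuming the Consul CLI that PetaSAN uses internally is on the PATH, which I have not verified):

# list cluster members and their state as seen from this node
consul members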
Last edited on October 2, 2020, 12:42 am by exitsys · #3
admin
2,930 Posts
October 2, 2020, 2:06 am
There is a large variation in performance numbers; it depends a lot on the hardware components.
For IOPS: run a 5-minute benchmark with 256 threads and 2 clients, then look at the dashboard charts for CPU and disk % utilisation. If disk utilisation is higher, you could get more IOPS by adding more disks, for example 6 or 8 per node. If, on the other hand, your CPU is near 100%, then it is the bottleneck, and you cannot get more IOPS unless you add more nodes or use better CPUs with more cores and higher frequency.
For throughput: for the 600 MB/s, how many copy operations were running in parallel? The more parallel operations, the more you should get, as the system scales quite well.
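For comparison with real client load, a roughly equivalent 4k random test could also be run from an external machine with fio against an iSCSI-attached PetaSAN disk. This is only a sketch: /dev/sdX stands for whatever device the LUN maps to, and the write test is destructive, so use a disk with no data on it.

# 4k random read, 5 minutes, 4 jobs x queue depth 64 = 256 outstanding I/Os
fio --name=randread --filename=/dev/sdX --ioengine=libaio --direct=1 \
    --rw=randread --bs=4k --iodepth=64 --numjobs=4 --time_based --runtime=300 --group_reporting

# same pattern for random writes (destroys data on /dev/sdX)
fio --name=randwrite --filename=/dev/sdX --ioengine=libaio --direct=1 \
    --rw=randwrite --bs=4k --iodepth=64 --numjobs=4 --time_based --runtime=300 --group_reporting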
exitsys
43 Posts
October 2, 2020, 1:14 pm
Cluster IOPS: write 24127, read 45235

Write resource load:
Memory Util% 22
CPU Util% avg 57 - max 82
Network Util% avg 4 - max 5
Disks Util% journals 0, OSDs avg 42 - max 47

Read resource load:
Memory Util% 22
CPU Util% avg 15 - max 33
Network Util% avg 2 - max 4
Disks Util% journals 0, OSDs avg 22 - max 35
admin
2,930 Posts
October 2, 2020, 1:51 pm
Can you also check the charts on all 3 nodes? The benchmark excludes the 2 nodes simulating client load, and they will probably show higher CPU load. If so, then it is mainly a CPU issue; however, your cluster could deliver a bit more IOPS if the clients were external, as in a real deployment. To get a useful chart, run the test for 5 minutes.
Last edited on October 2, 2020, 1:52 pm by admin · #6