Reconfiguring a node and general questions
Pages: 1 2
sgorla
6 Posts
August 8, 2019, 11:11 pm
Hi, I'm testing your product. My first impression is that it is a very interesting idea. I have been working with servers for more than 10 years, and today in our computer center I have set up a lab with a pool of servers and switches.
My dedicated systems for these tests are the following:
4 Dell PowerEdge 2900 servers, each with 6 x 500 GB SATA drives
24 GB RAM per server.
2 servers with 3 x 1 GbE NICs and 2 with 4 x 1 GbE NICs.
The goal is to use eth0 for administration, eth1 and eth2 bonded for the backend, and eth3 (available only on two nodes) for iSCSI.
My first question is the following:
How can I reconfigure a node without reinstalling PetaSAN?
The second question:
I configured the main node correctly, with eth0 for administration and eth1 and eth2 bonded for the backend.
eth3 is present on two nodes for iSCSI.
I access via SSH and can ping all interfaces; they all respond correctly. But when I want to add a node, at the end I get the following errors:
Error List
Error connecting to first node on backend 1 interface
Error connecting to first node on backend 2 interface
Another error: if I remove the LAGs and use the raw interfaces instead, when I try to add the third node I get this error:
Alert!
Node interfaces do not match cluster interface settings.
That's all for now. Thank you!
Last edited on August 8, 2019, 11:32 pm by sgorla · #1
admin
2,930 Posts
August 9, 2019, 9:59 am
The configuration is in:
/opt/petasan/config/cluster_info.py (applies to all nodes in the cluster)
/opt/petasan/config/node_info.py (applies to the existing node)
You should be able to change most settings: subnets, bonds, jumbo frames, IPs, etc. The files are self-explanatory (a minimal edit sketch follows at the end of this post).
Some points:
- This is a clustered system rather than a single node, so do not expect to change the configuration one node at a time while the cluster is running; you will probably need to shut down the entire cluster, make the changes, then bring all nodes back online together.
- While many settings can be changed, changing the backend 1 IP addresses is not easy at all: these are the IPs used by the Ceph monitors, Consul servers and Gluster servers. Some of these systems are difficult to change in this way once up and running, even outside of PetaSAN, but you could search their online docs.
- It is much better to design the production network correctly ahead of time than to put in the effort to change it later in production, since it is not a simple system.
- Currently in PetaSAN, all nodes need to have the same number of interfaces.
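A minimal sketch of such an edit workflow, assuming the config files contain JSON as in the dumps posted later in this thread and that the whole cluster has been shut down first (file names and any service restart steps are PetaSAN-specific and should be verified for your version):

# Run on each node while the whole cluster is shut down.
CONF=/opt/petasan/config/cluster_info.py

# Keep a backup before touching anything
cp "$CONF" "$CONF.bak.$(date +%F)"

# Edit the settings (subnets, bonds, jumbo frames, IPs, ...)
vi "$CONF"

# Sanity-check that the edited file is still valid JSON before bringing nodes back up
python3 -m json.tool "$CONF" > /dev/null && echo "cluster_info OK"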
Last edited on August 9, 2019, 10:03 am by admin · #2
sgorla
6 Posts
August 9, 2019, 11:37 am
Good morning. In the previous installation that worked, with 2 adapters for the backend (Ceph and iSCSI) and 1 adapter for administration, I noticed performance problems, but only on the iSCSI client side.
I did internal cluster tests and the results were good considering the connection speed. But in tests with Hyper-V and ESXi 6.5, configured over iSCSI, performance drops too much.
Internal writes and reads (with the built-in tests and by writing directly into Ceph using "dd if=/dev/zero of=x") are around 100 MB/s. In my tests over iSCSI, performance drops to 17 MB/s. I followed the best practices for both cases according to the online documentation.
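For reference, a sketch of the kind of dd write test meant here, with direct I/O and a final flush so the page cache does not inflate the number (the target path and sizes are placeholders):

# Sequential 4 MB writes with direct I/O and a final fsync.
# /mnt/test/ddfile is a placeholder path on the storage under test.
dd if=/dev/zero of=/mnt/test/ddfile bs=4M count=256 oflag=direct conv=fsync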
I have also installed Ceph Nautilus on Debian in a previous setup and the problems were similar.
Using NFS instead of iSCSI the performance is better, reaching 90 MB/s.
What do you think?
On the other hand, do you have any documentation for the scripts located in /opt/petasan/script?
Thank you!!
admin
2,930 Posts
August 9, 2019, 12:18 pm
With all-flash deployments we can reach 250 MB/s for a single IO stream and 1.3 GB/s in total within a VM running in ESXi, and 400 MB/s single stream for vMotion and Veeam.
One config value that you should set in ESXi is the iSCSI max IO size, as per our VMware guide; this will help a lot with HDDs (see the sketch at the end of this post).
With HDDs, we recommend you use an SSD journal plus a controller with write-back cache. Together they can reduce the write latency from 15-20 ms down to 3-5 ms. An all-flash cluster will have a write latency of 1-2 ms.
From the PetaSAN benchmark: what is the 4k IOPS at 1, 8 and 64 threads?
Typically iSCSI will give better latency than NFS. One possible reason why you see good performance there is caching. To support high availability and avoid any data loss, we cannot cache at the gateway. A single iSCSI/NFS server that allows caching will give much better results, especially with slow HDDs that have high latency, but it is prone to data loss on power failure and cannot support high availability.
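As a sketch of how the iSCSI max IO size mentioned above could be raised on an ESXi host (the option name and the 512 KB value are assumptions based on common practice; check the PetaSAN VMware guide for the exact setting for your ESXi version):

# Show the current software iSCSI max IO size (KB)
esxcli system settings advanced list -o /ISCSI/MaxIoSizeKB

# Raise it (512 KB used here as an example value), then reboot the host
esxcli system settings advanced set -o /ISCSI/MaxIoSizeKB -i 512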
sgorla
6 Posts
August 9, 2019, 2:47 pm
I still have the backend connection problem.
The error is as follows:
Error List
Error connecting to first node on backend 1 interface
Error connecting to first node on backend 2 interface
Node 01:
eth0: 10.1.114.151
Backend:
eth1: 192.168.100.1
eth2: 192.168.200.1
Config:
{
"backend_1_base_ip": "192.168.100.0",
"backend_1_eth_name": "eth2",
"backend_1_mask": "255.255.255.0",
"backend_1_vlan_id": "",
"backend_2_base_ip": "192.168.200.0",
"backend_2_eth_name": "eth2",
"backend_2_mask": "255.255.255.0",
"backend_2_vlan_id": "",
"bonds": [],
"eth_count": 3,
"iscsi_1_eth_name": "eth1",
"iscsi_2_eth_name": "eth1",
"jumbo_frames": [
"eth1",
"eth2"
],
"management_eth_name": "eth0",
"management_nodes": [
{
"backend_1_ip": "192.168.100.1",
"backend_2_ip": "192.168.200.1",
"is_backup": false,
"is_iscsi": true,
"is_management": true,
"is_storage": true,
"management_ip": "10.1.114.151",
"name": "PDSTCL01"
}
],
"name": "PDCL01",
"storage_engine": "bluestore"
}
Node 02:
eth0: 10.1.114.152
Backend:
eth1: 192.168.100.2
eth2: 192.168.200.2
{
"backend_1_base_ip": "192.168.100.0",
"backend_1_eth_name": "eth2",
"backend_1_mask": "255.255.255.0",
"backend_1_vlan_id": "",
"backend_2_base_ip": "192.168.200.0",
"backend_2_eth_name": "eth2",
"backend_2_mask": "255.255.255.0",
"backend_2_vlan_id": "",
"bonds": [],
"eth_count": 3,
"iscsi_1_eth_name": "eth1",
"iscsi_2_eth_name": "eth1",
"jumbo_frames": [
"eth1",
"eth2"
],
"management_eth_name": "eth0",
"management_nodes": [
{
"backend_1_ip": "192.168.100.1",
"backend_2_ip": "192.168.200.1",
"is_backup": false,
"is_iscsi": true,
"is_management": true,
"is_storage": true,
"management_ip": "10.1.114.151",
"name": "PDSTCL01"
}
],
"name": "PDCL01",
"storage_engine": "bluestore"
}
From node 01 - Ping tests:
root@PDSTCL01:/opt/petasan/config# ping -M do -s 8972 192.168.100.2
PING 192.168.100.2 (192.168.100.2) 8972(9000) bytes of data.
8980 bytes from 192.168.100.2: icmp_seq=1 ttl=64 time=0.648 ms
8980 bytes from 192.168.100.2: icmp_seq=2 ttl=64 time=0.594 ms
8980 bytes from 192.168.100.2: icmp_seq=3 ttl=64 time=0.714 ms
--- 192.168.100.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2039ms
rtt min/avg/max/mdev = 0.594/0.652/0.714/0.049 ms
root@PDSTCL01:/opt/petasan/config# ping -M do -s 8972 192.168.200.2
PING 192.168.200.2 (192.168.200.2) 8972(9000) bytes of data.
8980 bytes from 192.168.200.2: icmp_seq=1 ttl=64 time=0.507 ms
8980 bytes from 192.168.200.2: icmp_seq=2 ttl=64 time=0.644 ms
8980 bytes from 192.168.200.2: icmp_seq=3 ttl=64 time=0.528 ms
From node 02:
root@PDSTCL02:~# ping -M do -s 8972 192.168.100.1
PING 192.168.100.1 (192.168.100.1) 8972(9000) bytes of data.
8980 bytes from 192.168.100.1: icmp_seq=1 ttl=64 time=0.630 ms
8980 bytes from 192.168.100.1: icmp_seq=2 ttl=64 time=0.709 ms
8980 bytes from 192.168.100.1: icmp_seq=3 ttl=64 time=0.564 ms
--- 192.168.100.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2050ms
rtt min/avg/max/mdev = 0.564/0.634/0.709/0.062 ms
root@PDSTCL02:~# ping -M do -s 8972 192.168.200.1
PING 192.168.200.1 (192.168.200.1) 8972(9000) bytes of data.
8980 bytes from 192.168.200.1: icmp_seq=1 ttl=64 time=0.773 ms
8980 bytes from 192.168.200.1: icmp_seq=2 ttl=64 time=0.646 ms
8980 bytes from 192.168.200.1: icmp_seq=3 ttl=64 time=0.753 ms
What do you think? Thank you!
Last edited on August 9, 2019, 2:48 pm by sgorla · #5
admin
2,930 Posts
August 9, 2019, 5:53 pm
I am not sure if you were trying to edit the config of an existing cluster as per your original post, or if this is a new cluster failing during deployment, as the error seems to indicate.
If this is a config you edited for an existing cluster, I suggest that instead of changing the config file by hand, you install a dummy test VM with PetaSAN and pretend to create a cluster that matches your real cluster, then copy the resulting config file.
If this is a new cluster, the message seems to indicate a connection issue just before the last build step. In that case I would recommend you re-install and proceed as normal, but just before the last step on node 3, before you build the cluster, open an SSH session and make sure all nodes can ping each other on the management, backend 1 and backend 2 networks (a sketch is shown below). It could be that your cluster requires some time after a dynamic IP assignment before it starts working; if so, this could be an issue later in case of dynamic IP failover.
I also notice your ping time of 0.6-0.7 ms is quite high; this does have an effect on latency and IO performance.
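As a quick sketch of that check, something like the following could be run from each node just before the final build step (the addresses are the ones posted above; adjust to your own):

# Standard ping to each node's management address
for ip in 10.1.114.151 10.1.114.152; do
    ping -c 3 "$ip" > /dev/null && echo "mgmt $ip OK" || echo "mgmt $ip FAILED"
done

# Jumbo-frame ping (no fragmentation) to each backend address
for ip in 192.168.100.1 192.168.100.2 192.168.200.1 192.168.200.2; do
    ping -M do -s 8972 -c 3 "$ip" > /dev/null && echo "backend $ip OK" || echo "backend $ip FAILED"
done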
sgorla
6 Posts
August 9, 2019, 8:49 pm
Hello again! I set up the cluster and its state is healthy.
I reconfigured an iSCSI LUN but I still have a big performance problem.
I have followed the best practices from your installation manual.
Internally, with the benchmarks, I get 90 MB/s write and about 300 MB/s read.
What tests do you think I should run?
I read that for block storage like Ceph you have to apply the following configuration in VMware:
Disabling VAAI when using block storage
To disable VAAI in ESXi/ESX, you must modify these advanced configuration settings:
HardwareAcceleratedMove
HardwareAcceleratedInit
HardwareAcceleratedLocking
I configured this too, but without results. I'm at your service. Regards!
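For reference, these are the corresponding advanced options that can be inspected on an ESXi host (a sketch; as the reply below notes, disabling VAAI is not recommended here):

# Inspect the current VAAI-related advanced settings on the ESXi host
esxcli system settings advanced list -o /DataMover/HardwareAcceleratedMove
esxcli system settings advanced list -o /DataMover/HardwareAcceleratedInit
esxcli system settings advanced list -o /VMFS3/HardwareAcceleratedLocking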
P.S.: this is the new latency:
PING 192.168.100.4 (192.168.100.4) 8972(9000) bytes of data.
8980 bytes from 192.168.100.4: icmp_seq=1 ttl=64 time=0.482 ms
admin
2,930 Posts
August 9, 2019, 9:17 pm
From the PetaSAN benchmark:
what is the 4k IOPS at 1, 8 and 64 threads?
what is the 4M throughput at 1 thread?
You should not disable VAAI on ESXi. Just increase the max IO size and reboot.
Last edited on August 9, 2019, 9:19 pm by admin · #8
sgorla
6 Posts
August 9, 2019, 10:15 pm
Benchmark 4k IOPS at 1 thread
Results: Cluster IOPS - Write 93, Read 1583
Write Resource Load:
Node | Mem% | CPU% Avg/Max | Net% Avg/Max | Journals% Avg/Max | OSDs% Avg/Max
PDSTCL02 | 51 | 16/18 | 0/1 | 20/30 | 2/7
PDSTCL03 | 22 | 17/22 | 0/1 | 21/21 | 2/5
PDSTCL04 | 34 | 6/7 | 0/0 | 10/10 | 1/2
Read Resource Load:
Node | Mem% | CPU% Avg/Max | Net% Avg/Max | Journals% Avg/Max | OSDs% Avg/Max
PDSTCL02 | 52 | 13/15 | 1/2 | 8/8 | 4/9
PDSTCL03 | 22 | 15/17 | 1/2 | 14/14 | 2/5
PDSTCL04 | 34 | 8/9 | 1/2 | 6/6 | 2/4
Benchmark 4k IOPS at 8 threads
Results: Cluster IOPS - Write 373, Read 11830
Write Resource Load:
Node | Mem% | CPU% Avg/Max | Net% Avg/Max | Journals% Avg/Max | OSDs% Avg/Max
PDSTCL02 | 52 | 34/41 | 0/1 | 53/77 | 3/14
PDSTCL03 | 22 | 41/50 | 0/1 | 70/77 | 1/2
PDSTCL04 | 34 | 31/33 | 0/1 | 63/63 | 2/5
Read Resource Load:
Node | Mem% | CPU% Avg/Max | Net% Avg/Max | Journals% Avg/Max | OSDs% Avg/Max
PDSTCL02 | 52 | 22/23 | 3/10 | 0/0 | 5/8
PDSTCL03 | 22 | 22/23 | 4/12 | 0/0 | 5/11
PDSTCL04 | 34 | 17/18 | 3/10 | 0/0 | 3/6
Benchmark 4k IOPS at 64 threads
Results: Cluster IOPS - Write 1505, Read 12490
Write Resource Load:
Node | Mem% | CPU% Avg/Max | Net% Avg/Max | Journals% Avg/Max | OSDs% Avg/Max
PDSTCL02 | 53 | 68/75 | 2/4 | 93/99 | 9/30
PDSTCL03 | 23 | 67/73 | 2/4 | 98/99 | 5/8
PDSTCL04 | 35 | 76/78 | 1/4 | 99/99 | 9/23
Read Resource Load:
Node | Mem% | CPU% Avg/Max | Net% Avg/Max | Journals% Avg/Max | OSDs% Avg/Max
PDSTCL02 | 54 | 62/63 | 2/7 | 0/0 | 45/67
PDSTCL03 | 23 | 81/86 | 2/8 | 1/1 | 48/91
PDSTCL04 | 35 | 61/67 | 2/7 | 0/0 | 42/75
Benchmark 4M throughput at 1 thread
Results: Cluster Throughput - Write 23 MB/s, Read 65 MB/s
Write Resource Load:
Node | Mem% | CPU% Avg/Max | Net% Avg/Max | Journals% Avg/Max | OSDs% Avg/Max
PDSTCL02 | 55 | 8/9 | 6/12 | 1/1 | 4/9
PDSTCL03 | 24 | 7/7 | 4/9 | 1/1 | 2/4
PDSTCL04 | 35 | 3/3 | 4/9 | 0/0 | 3/4
Read Resource Load:
Node | Mem% | CPU% Avg/Max | Net% Avg/Max | Journals% Avg/Max | OSDs% Avg/Max
PDSTCL02 | 55 | 3/5 | 4/13 | 0/0 | 3/6
PDSTCL03 | 24 | 3/4 | 4/12 | 0/0 | 2/4
PDSTCL04 | 36 | 1/3 | 3/11 | 0/0 | 2/3
Last edited on August 9, 2019, 10:16 pm by sgorla · #9
admin
2,930 Posts
August 9, 2019, 10:58 pm
The 4M write throughput of 23 MB/s for a single thread is low; this is the low-level Ceph RADOS performance.
The 4k write IOPS correspond to latencies of about 11, 23 and 43 ms depending on load.
From what I see, you need better hardware to get better performance. Of course this is open-ended, so some options you have, in increasing cost:
- enable caching on the client/application side
- reduce the replication factor from 3 to 2 (not really recommended)
- add SSD journals: this could reduce latency by a factor of 2. I did see journal data in the benchmark, but I am not sure you use SSDs. You should also use enterprise SSDs rather than consumer grade and test their sync/fsync write IOPS (this can vary by a factor of 100); you can do so from the blue node console (a sketch of such a test follows at the end of this post).
- do not use anything less than 10G interfaces
- adding a controller with write-back cache can further reduce the write latency of HDDs by a factor of 3
- our recommendation is all-flash, which gives you 1-2 ms write latency.
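As a sketch of the kind of sync-write test meant here, using fio against a journal candidate SSD (the device name is a placeholder, and the test destroys any data on it):

# Single-threaded 4k sync write test of a journal candidate SSD.
# WARNING: writes directly to the device and destroys its contents.
fio --name=journal-sync-test --filename=/dev/sdX \
    --rw=write --bs=4k --direct=1 --sync=1 \
    --numjobs=1 --iodepth=1 --runtime=30 --time_based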
Last edited on August 9, 2019, 11:29 pm by admin · #10
Pages: 1 2