1.4 benchmark
philip.shannon
September 29, 2017, 1:17 pm
Hi, I got the cluster up and running using version 1.4: 3 Dell R730 servers, each running one PetaSAN 1.4 VM with 10 RDM SSD drives. When I run the benchmark from the GUI, nothing happens except the spinning wheel; I waited about 30 minutes for results. So I checked the log files, saw some commands, and tried them from one of the VMs:
root@VENGPS01:~# python /opt/petasan/scripts/jobs/benchmark/client_stress.py -d 30 -t 2 -b 4096 -m w
40029
root@VENGPS01:~# python /opt/petasan/scripts/jobs/benchmark/client_stress.py -d 30 -t 2 -m r
40078
root@VENGPS01:~# python /opt/petasan/scripts/jobs/benchmark/storage_load.py -d 15
40410
root@VENGPS01:~# python /opt/petasan/scripts/jobs/benchmark/client_stress.py -d 30 -t 2 -b 4194304 -m w
42418
root@VENGPS01:~# python /opt/petasan/scripts/jobs/benchmark/client_stress.py -d 30 -t 16 -m r
42448
root@VENGPS01:~# python /opt/petasan/scripts/jobs/benchmark/client_stress.py -d 30 -t 16 -b 4096 -m w
43040
Update: the graphs on the main page show zero IOPS and zero throughput. Also, my admin logon keeps timing out. Is there a way to turn that off, or change it to a 2- or 4-hour timeout before having to re-login? Thanks.
admin
September 29, 2017, 3:23 pm
Hi there,
It could be that the web session is expiring before the test completes; the default session timeout is 5 minutes. Under VMware virtualization the stress tests may take too long to complete. Another possibility, if you use Firefox, is clock skew: make sure the time is set accurately on both your browser machine and your VMs. Chrome is more forgiving.
To look into this in more detail, I recommend you:
- Set the correct time on your browser machine and VMs if using Firefox, or use Chrome.
- If this is a test cluster, reboot all VMs (it gives us a clean state when looking at the logs).
- Run a 1-minute 4K test with 1 worker thread; choose any 1 client out of the 3 to simulate client I/O.
- During the test, while the spinning wheel is running, open another tab in your browser (do not close the benchmark tab, but open a new tab in the same browser) and access the PetaSAN management URL: it should display the home dashboard page without asking for login. If it does ask for login, the session is expiring too quickly.
- Collect PetaSAN.log from all 3 machines and email them to admin @ petasan.org.
The cluster IOPS and throughput on the first page are not affected by the benchmark test, as the test creates a separate test pool that does not count toward the cluster stats. However, the node stats charts will show stress on node resources such as CPU/network/disks if the test duration exceeds 1 minute, the sample interval of the node resource charts.
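For the log-collection step, something along these lines could work. This is only a sketch: it assumes the log lives at /opt/petasan/log/PetaSAN.log and node1..node3 are placeholders for your own hostnames. It prints the copy commands as a dry run; drop the echo to actually copy.

```shell
# Dry-run sketch: gather PetaSAN.log from each node for support.
# node1..node3 are placeholder hostnames; the log path is an assumption.
for node in node1 node2 node3; do
    echo scp "root@${node}:/opt/petasan/log/PetaSAN.log" "./PetaSAN-${node}.log"
done
```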
philip.shannon
September 29, 2017, 7:11 pm
After rebooting and switching from IE to Chrome, I got results!
Results: Cluster IOPS
Write: 3848
Read: 19748
philip.shannon
September 29, 2017, 7:12 pm
Image of results is here: https://photos.app.goo.gl/Z8oHjSkUtINqauUe2
philip.shannon
September 29, 2017, 7:12 pm
See above link for results.
admin
September 29, 2017, 8:04 pm
Good, it worked 🙂 It could have been IE; we only test with Chrome and Firefox.
The test you show (IOPS, ? threads) indicates the CPU is the bottleneck; your network and disks are not busy (under-utilized). More CPUs/cores will give better IOPS. Note that write speed is typically 4 to 6 times slower than read, since each write is written twice (journal then data partitions) times the number of replicas. Note also that an accurate test runs clients on non-storage nodes, as per the blue info banner at the top; in your case the node simulating the client test is also a storage server, so it is very likely its CPU will hit 100% due to the extra client-simulation load.
Can you test 4M throughput using 1, 16, and 64 threads, do the same with 4K IOPS, and see:
- If the numbers scale up as you simulate more threads
- What the bottlenecks are for IOPS and throughput
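The requested sweep could be scripted along these lines, reusing the client_stress.py flags from the commands earlier in the thread (-d duration in seconds, -t threads, -b block size in bytes, -m r/w). It is written as a dry run so the commands can be reviewed first; drop the echo to actually execute on a node.

```shell
# Dry-run sketch of the suggested scaling sweep: 4M block size for
# throughput and 4K for IOPS, each at 1, 16 and 64 worker threads.
for threads in 1 16 64; do
    for block in 4194304 4096; do
        echo python /opt/petasan/scripts/jobs/benchmark/client_stress.py \
            -d 60 -t "$threads" -b "$block" -m w
    done
done
```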
admin
September 29, 2017, 9:02 pm
One more thing that is specific to running virtualized under VMware: high CPU% usage may mean the VM is waiting for something from the ESX host, if %iowait is high.
https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1017926
So please click Detail and see whether the CPU% is mostly user% and system%, corresponding to something internal to the VM (i.e. PetaSAN-related), or whether it is caused by iowait%, meaning the VM is waiting on the host.
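As a quick cross-check from inside the VM (a generic Linux technique, not a PetaSAN feature), the same user/system/iowait split can be read from /proc/stat, whose first line holds cumulative CPU jiffies in the order user, nice, system, idle, iowait, followed by further counters:

```shell
# Read the aggregate CPU counters from the first line of /proc/stat.
# A large and growing iowait relative to user+system suggests the guest
# is stalled waiting on the host's storage rather than doing real work.
read -r cpu user nice system idle iowait _ < /proc/stat
echo "user=${user} system=${system} iowait=${iowait}"
```

Sampling it twice a few seconds apart and diffing the counters gives the same percentages the GUI's detail view would show.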
philip.shannon
October 4, 2017, 5:52 pm
Will try these later this week. Thank you.
philip.shannon
October 5, 2017, 4:42 pm
We have experience with %iowait, and I don't think it's in play here; we see problems when host CPUs are oversubscribed, and in this case each host is running only a single VM.
I did increase the vCPUs from 4 to 8 and ran all of the benchmarks. It's tough to post the results here, though.