1.4 benchmark
philip.shannon
September 29, 2017, 1:17 pm
Hi, I got the cluster up and running using version 1.4: 3 Dell R730 servers, each running one PetaSAN 1.4 VM with 10 RDM SSD drives. When I run the benchmark from the GUI, nothing happens except the spinning wheel; I waited about 30 minutes for results. So I checked the log files, saw some commands, and tried them from one of the VMs:
root@VENGPS01:~# python /opt/petasan/scripts/jobs/benchmark/client_stress.py -d 30 -t 2 -b 4096 -m w
40029
root@VENGPS01:~# python /opt/petasan/scripts/jobs/benchmark/client_stress.py -d 30 -t 2 -m r
40078
root@VENGPS01:~# python /opt/petasan/scripts/jobs/benchmark/storage_load.py -d 15
40410
root@VENGPS01:~# python /opt/petasan/scripts/jobs/benchmark/client_stress.py -d 30 -t 2 -b 4194304 -m w
42418
root@VENGPS01:~# python /opt/petasan/scripts/jobs/benchmark/client_stress.py -d 30 -t 16 -m r
42448
root@VENGPS01:~# python /opt/petasan/scripts/jobs/benchmark/client_stress.py -d 30 -t 16 -b 4096 -m w
43040
Update: the graphs on the main page show zero IOPS and zero throughput. Also, my admin logon keeps timing out. Is there a way to turn that off, or change it to a 2- or 4-hour timeout before having to re-login? Thanks.
admin
September 29, 2017, 3:23 pm
Hi there,
It could be that the web session is expiring before the test completes; the default session timeout is 5 minutes. Under VMware virtualization the stress tests may take too long to complete. Another possibility, if you use Firefox, is clock skew: make sure the time is set accurately on both your browser machine and your VMs. Chrome is more forgiving.
To look into this in more detail, I recommend you:
- Set the correct time on your browser machine and VMs if using Firefox, or use Chrome.
- If this is a test cluster, reboot all VMs (it gives us a clean state when looking at the logs).
- Run a 1-minute 4K test with 1 worker thread; choose any 1 client out of the 3 to simulate client I/O.
- During the test, while the spinning wheel is running, open another tab in your browser (do not close the benchmark tab, but open a new tab in the same browser) and access the PetaSAN management URL: it should display the home dashboard page without asking for login. If it does ask for login, the session is expiring too quickly.
- Collect PetaSAN.log from all 3 machines and email them to admin @ petasan.org.
The cluster IOPS and throughput on the first page are not affected by the benchmark test, as the test creates a separate test pool that does not count toward the cluster stats. However, the node stats charts will show stress on node resources such as CPU/network/disks if the test duration exceeds 1 minute, the sample interval of the node resource charts.
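For the log-collection step, something along these lines could work. This is only a sketch: it assumes the log lives at /opt/petasan/log/PetaSAN.log and node1..node3 are placeholders for your own hostnames. It prints the copy commands as a dry run; drop the echo to actually copy.

```shell
# Dry-run sketch: gather PetaSAN.log from each node for support.
# node1..node3 are placeholder hostnames; the log path is an assumption.
for node in node1 node2 node3; do
    echo scp "root@${node}:/opt/petasan/log/PetaSAN.log" "./PetaSAN-${node}.log"
done
```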
philip.shannon
September 29, 2017, 7:11 pm
After rebooting and switching from IE to Chrome, I got results!
Results: Cluster IOPS
Write: 3848
Read: 19748
philip.shannon
September 29, 2017, 7:12 pm
Image of results is here: https://photos.app.goo.gl/Z8oHjSkUtINqauUe2
philip.shannon
September 29, 2017, 7:12 pm
See above link for results.
admin
September 29, 2017, 8:04 pm
Good, it worked 🙂 It could have been IE; we only test with Chrome and Firefox.
The test you show (IOPS, ? threads) indicates the CPU is the bottleneck; your network and disks are not busy (under-utilized). More CPUs/cores will give better IOPS. Note that write speed is typically 4 to 6 times slower than read, since each write is written twice (journal then data partitions) times the number of replicas. Note also that an accurate test runs clients on non-storage nodes, as per the blue info banner at the top; in your case the node simulating the client test is also a storage server, so it is very likely its CPU will hit 100% due to the extra client-simulation load.
Can you test 4M throughput using 1, 16, and 64 threads, do the same with 4K IOPS, and see:
- If the numbers scale up as you simulate more threads
- What the bottlenecks are for IOPS and throughput
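The requested sweep could be scripted along these lines, reusing the client_stress.py flags from the commands earlier in the thread (-d duration in seconds, -t threads, -b block size in bytes, -m r/w). It is written as a dry run so the commands can be reviewed first; drop the echo to actually execute on a node.

```shell
# Dry-run sketch of the suggested scaling sweep: 4M block size for
# throughput and 4K for IOPS, each at 1, 16 and 64 worker threads.
for threads in 1 16 64; do
    for block in 4194304 4096; do
        echo python /opt/petasan/scripts/jobs/benchmark/client_stress.py \
            -d 60 -t "$threads" -b "$block" -m w
    done
done
```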
admin
September 29, 2017, 9:02 pm
One more thing that is specific to running virtualized under VMware: high CPU% usage may mean the VM is waiting for something from the ESX host, if %iowait is high.
https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1017926
So please click Detail and see whether the CPU% is mostly user% and system%, corresponding to something internal to the VM (i.e. PetaSAN-related), or whether it is caused by iowait%, meaning the VM is waiting on the host.
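As a quick cross-check from inside the VM (a generic Linux technique, not a PetaSAN feature), the same user/system/iowait split can be read from /proc/stat, whose first line holds cumulative CPU jiffies in the order user, nice, system, idle, iowait, followed by further counters:

```shell
# Read the aggregate CPU counters from the first line of /proc/stat.
# A large and growing iowait relative to user+system suggests the guest
# is stalled waiting on the host's storage rather than doing real work.
read -r cpu user nice system idle iowait _ < /proc/stat
echo "user=${user} system=${system} iowait=${iowait}"
```

Sampling it twice a few seconds apart and diffing the counters gives the same percentages the GUI's detail view would show.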
philip.shannon
October 4, 2017, 5:52 pm
Will try these later this week. Thank you.
philip.shannon
October 5, 2017, 4:42 pm
We have experience with %iowait, and I don't think it's in play here; we see problems when host CPUs are oversubscribed, and in this case each host is running only a single VM.
I did increase the vCPUs from 4 to 8 and ran all of the benchmarks. It's tough to post the results here, though.