Sanity Check on NVMe system throughput
garfield659
10 Posts
February 6, 2024, 11:02 pm
I set up PetaSAN with 7 nodes: 3 for management only and 4 with 4x U.2 NVMe OSD drives each (16 NVMe OSDs in total), on a 10 Gbit network. Running the PetaSAN benchmark on the 3:2 replicated RBD pool (512 PGs), I am getting the following:
Test: 4M Throughput rados benchmark with 16 threads per client and 1 min duration using 1 client
Write: 622MB/s Read: 1398MB/s
Test: 4k IOPS rados benchmark
Write: 18812 IOPS Read: 36909 IOPS
Does this seem reasonable? I was expecting these numbers to be a lot higher. CPU utilization is under 20% max, memory is under 10% max, and network is well under 10% max. Disk Util% is 99-100 on each OSD when running the IOPS test, which leads me to believe everything is fine and my expectations were just too high. Disk Util% averages 21-41 with peaks of 25-90 when running the throughput test. Oh, and the NVMe drives are Samsung MZQLW960HMJP-00003 960GB U.2 drives.
Thank you.
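For reference, the UI tests above correspond roughly to plain rados bench runs like the ones below (the pool name rbd, the 60-second duration, and the cleanup step are my assumptions; substitute your own pool):

# 4M throughput test: 16 threads, 60 s; keep the objects so the read pass has data to work on
rados bench -p rbd 60 write -b 4194304 -t 16 --no-cleanup
rados bench -p rbd 60 seq -t 16

# 4k IOPS test: 16 threads, 60 s, followed by a random-read pass
rados bench -p rbd 60 write -b 4096 -t 16 --no-cleanup
rados bench -p rbd 60 rand -t 16

# remove the benchmark objects afterwards
rados -p rbd cleanup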
garfield659
10 Posts
February 7, 2024, 12:04 am
Update: I went into "Show Details" and found 2 OSDs with high aqu-sz and await. I replaced them and my speeds went up to 1051MB/s write and 1392MB/s read. It turns out I had some Dell 960GB drives in the mix, and they seem to be having problems.
In any case, if someone can let me know whether I am in the ballpark of the expected speed, I would appreciate it. The read speed still seems rather low to me.
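For anyone else chasing a similar problem, the same queue-size (aqu-sz) and latency (await) figures the UI shows can be watched directly on an OSD node with iostat from the sysstat package, something like:

# extended per-device stats every 5 seconds; a drive whose aqu-sz and await
# stand well above its peers under load is the one to suspect
iostat -x 5 | grep -E 'Device|nvme'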
garfield659
10 Posts
February 7, 2024, 7:59 pm
After closer examination I found that 1392MB/s translates roughly to 11Gbps (I have bonded 2x 10Gbps links). The problem now is that the PetaSAN benchmark only works every once in a while; most of the time I get "Alert: Error loading benchmark test report". I tried with multiple clients and it still happens. I watched the switch and it is performing the test: I see the input rate go up for a bit, then the output rate from the client I am testing with, but then the error is returned.
Anyone know how to fix the issue with "Alert: Error loading benchmark test report"?
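(As a quick check on that conversion: 1392 MB/s × 8 bits/byte ≈ 11,136 Mbit/s ≈ 11.1 Gbit/s, which is more than a single 10 Gbit/s link can carry, so the read test is already being spread across both members of the bond.)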
admin
2,930 Posts
February 12, 2024, 10:00 pm
The latency is approximately 1 ms for writes (about 1000 IOPS for 1 thread) and 0.3 ms for reads (about 3000 IOPS for 1 thread). If you increase the number of client threads, you will get higher results.
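As a rough worked check of that rule of thumb, per-client IOPS ≈ threads × (1 / latency): 16 threads × ~1,000 write IOPS per thread ≈ 16,000, in the same ballpark as the measured 18,812; 16 × ~3,000 read IOPS per thread ≈ 48,000 as an upper bound, with the measured 36,909 sitting under it. Adding threads or clients raises that ceiling until the disks or the network become the limit.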