Journal NVME reducing Cluster Performance?
eazyadm
25 Posts
April 21, 2022, 2:14 pm
Hi,
we're facing high load on our NVMe journal disks. They all show ~100% load and, I can't believe it, the PetaSAN cluster is slow as hell.
I know there was an issue with wrong kernel stats on NVMe devices, but that shouldn't be the problem since we're running version 3.
We also found some threads with nearly the same issue, but either without a solution or not exactly the same problem.
6 storage nodes, each with 12x 7.2k HDDs, 1x 2 TB NVMe for journal, 128 GB memory, 4x 10 Gb network
If I use the HD_Speed tool on a Windows server connected to PetaSAN via iSCSI, I get about 10 MB/s.
If we use the PetaSAN benchmarking, the values are the following and you can see the NVMe load:
The NVMe models are:
SAMSUNG MZ1LB1T9HALS-00007
Micron_7300_MTFDHBG1T9TDF
Are there any ideas?
Thanks
Last edited on April 21, 2022, 3:07 pm by eazyadm · #1
admin
2,930 Posts
April 21, 2022, 5:33 pm
The high load on the NVMe is, as you mentioned, incorrectly measured by the kernel on some NVMe models; other NVMe models do report correctly, and in your case I think the reading is incorrect. We are working on updating to a 5.14 kernel, which should fix this.
As for IOPS:
First, if you care about IOPS you should be using an all-flash cluster; HDDs do not give high IOPS, and this is especially true for a distributed system.
From rados bench: the read IOPS are high, partially because of OSD caching. For writes you get 850 IOPS with 16 threads, so each thread itself is seeing about 53 write IOPS; note that this is replicated I/O, so the speed is not bad for an HDD cluster. You have 72 disks, so you should be able to increase your total write IOPS by throwing more threads at it, say 64 or 128 with 2 clients; this way your 72 disks will be fully used. Still, the single-thread rate of 53 IOPS cannot be increased, as this is inherent latency (see the sketch below for the arithmetic).
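A minimal back-of-the-envelope sketch of that arithmetic in Python. The 850 IOPS / 16 threads figure is from the benchmark discussed above; the projection assumes total IOPS scales roughly linearly with thread count while per-thread latency stays constant, which is only an approximation, not a guarantee:

```python
# Rough projection of rados bench write IOPS, based on the numbers above.
# Assumption: total IOPS grows roughly linearly with thread count while
# per-thread latency (and thus per-thread IOPS) stays constant.

measured_iops = 850        # write IOPS reported by rados bench
measured_threads = 16      # thread count used in that run

per_thread_iops = measured_iops / measured_threads
print(f"per-thread write IOPS: ~{per_thread_iops:.0f}")          # ~53

# Projected totals with more parallel threads, e.g. 64 or 128 threads
# spread over 2 benchmark clients:
for threads in (64, 128):
    print(f"{threads} threads -> ~{per_thread_iops * threads:.0f} write IOPS")
```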
I have not used the HD_Speed tool, but from the screenshot you are using a 4k block size, so you are really measuring IOPS; at 10 MB/s with 4 KB blocks you are getting roughly 2,500 IOPS. I'm not sure how many threads, but this is not bad. If you need higher MB/s you need to use larger block sizes.
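The same conversion as a small sketch; the 4 KB block size and 10 MB/s come from the post above, while the larger block sizes are purely illustrative values:

```python
# Convert throughput at a given block size to IOPS, and show why larger
# blocks give higher MB/s at the same IOPS.

def iops(mb_per_s: float, block_kb: float) -> float:
    return mb_per_s * 1024 / block_kb

print(f"10 MB/s at 4 KB blocks ~ {iops(10, 4):.0f} IOPS")        # ~2500

# At a fixed ~2500 IOPS, larger blocks raise the throughput:
for block_kb in (4, 64, 256, 1024):    # illustrative block sizes
    print(f"{block_kb:>5} KB blocks -> ~{2500 * block_kb / 1024:.0f} MB/s")
```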
eazyadm
25 Posts
April 22, 2022, 10:28 am
Hi,
OK, thanks for your answer. I thought the measurement issue was already fixed, good to know.
I think we will not have a problem with the IOPS, because the cluster will only be used as an S3 and iSCSI backup destination.
The threads will increase with the growing number of customers.
But, just for my peace of mind and maybe for recovery scenarios: can we increase the performance by adding an SSD cache? Should we use a RAID 1 hardware controller for the SSD cache, or does PetaSAN handle the redundancy?
Thanks
Last edited on April 22, 2022, 10:28 am by eazyadm · #4
yangsm
31 Posts
April 24, 2022, 2:17 am
"We are working on updating to a 5.14 kernel which should fix this"
When do you plan to release the 5.14 kernel?
admin
2,930 Posts
April 24, 2022, 1:41 pm
It is planned for the 3.2 release; we do not have a date for it yet, but roughly July/August. The current 3.1 is in its testing phase and should be out mid-May; it will mainly include the ability to manually/automatically re-assign IPs for better load distribution.
For 5.14, just curious: are you looking to it to solve the NVMe % utilization statistics, or are you interested in something else?
yangsm
31 Posts
April 24, 2022, 2:52 pm
I'm interested in its other features. Can I test this kernel alone?
admin
2,930 Posts
April 24, 2022, 3:21 pm
Write another post in a couple of weeks; we can make it available for your testing.