Journal NVME reducing Cluster Performance?
eazyadm
25 Posts
April 21, 2022, 2:14 pm
Hi,
we're facing high load on our NVMe journal disks. They all show ~100% load and, I can't believe it, the PetaSAN cluster is slow as hell.
I know there was an issue with wrong kernel stats on NVMe devices, but that shouldn't be the problem since we're running version 3.
We also found some threads with nearly the same issue, but either without a solution or not exactly the same problem.
6 storage nodes, each with 12x 7.2k HDDs, 1x 2 TB NVMe for journal, 128 GB memory, 4x 10 Gb network
If I use the HD_Speed tool on a Windows server connected to PetaSAN via iSCSI, I get about 10 MB/s.
If we use the PetaSAN benchmarking, the values are the following and you can see the NVMe load:
The NVMe models are:
SAMSUNG MZ1LB1T9HALS-00007
Micron_7300_MTFDHBG1T9TDF
Are there any ideas?
Thanks
Last edited on April 21, 2022, 3:07 pm by eazyadm · #1
admin
2,930 Posts
April 21, 2022, 5:33 pm
The high load on the NVMe is, as you mentioned, incorrectly measured by the kernel on some NVMe models; other NVMe models do report correctly, and in your case I think the reading is incorrect. We are working on updating to a 5.14 kernel, which should fix this.
As for IOPS:
First, if you care about IOPS you should be using an all-flash cluster; HDDs do not give high IOPS, and this is especially true for a distributed system.
From rados bench: the read IOPS are high, partially because of OSD caching. For writes you get 850 IOPS with 16 threads, so each thread itself is seeing about 53 write IOPS; note that this is replicated I/O, so the speed is not bad for an HDD cluster. You have 72 disks, so you should be able to increase your total write IOPS by throwing more threads at it, say 64 or 128 with 2 clients; this way your 72 disks will be fully used. Still, the single-thread rate of 53 IOPS cannot be increased, as this is inherent latency (see the sketch below for the arithmetic).
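A minimal back-of-the-envelope sketch of that arithmetic in Python. The 850 IOPS / 16 threads figure is from the benchmark discussed above; the projection assumes total IOPS scales roughly linearly with thread count while per-thread latency stays constant, which is only an approximation, not a guarantee:

```python
# Rough projection of rados bench write IOPS, based on the numbers above.
# Assumption: total IOPS grows roughly linearly with thread count while
# per-thread latency (and thus per-thread IOPS) stays constant.

measured_iops = 850        # write IOPS reported by rados bench
measured_threads = 16      # thread count used in that run

per_thread_iops = measured_iops / measured_threads
print(f"per-thread write IOPS: ~{per_thread_iops:.0f}")          # ~53

# Projected totals with more parallel threads, e.g. 64 or 128 threads
# spread over 2 benchmark clients:
for threads in (64, 128):
    print(f"{threads} threads -> ~{per_thread_iops * threads:.0f} write IOPS")
```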
I have not used the HD_Speed tool, but from the screenshot you are using a 4k block size, so you are really measuring IOPS; at 10 MB/s with 4 KB blocks you are getting roughly 2,500 IOPS. I'm not sure how many threads, but this is not bad. If you need higher MB/s you need to use larger block sizes.
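The same conversion as a small sketch; the 4 KB block size and 10 MB/s come from the post above, while the larger block sizes are purely illustrative values:

```python
# Convert throughput at a given block size to IOPS, and show why larger
# blocks give higher MB/s at the same IOPS.

def iops(mb_per_s: float, block_kb: float) -> float:
    return mb_per_s * 1024 / block_kb

print(f"10 MB/s at 4 KB blocks ~ {iops(10, 4):.0f} IOPS")        # ~2500

# At a fixed ~2500 IOPS, larger blocks raise the throughput:
for block_kb in (4, 64, 256, 1024):    # illustrative block sizes
    print(f"{block_kb:>5} KB blocks -> ~{2500 * block_kb / 1024:.0f} MB/s")
```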
eazyadm
25 Posts
April 22, 2022, 10:28 am
Hi,
OK, thanks for your answer. I thought the measurement issue was already fixed, good to know.
I think we will not have a problem with the IOPS, because the cluster will only be used as an S3 and iSCSI backup destination.
The threads will increase with the growing number of customers.
But, just for my peace of mind and maybe for recovery scenarios: can we increase the performance by adding an SSD cache? Should we use a RAID 1 hardware controller for the SSD cache, or does PetaSAN handle the redundancy?
Thanks
Last edited on April 22, 2022, 10:28 am by eazyadm · #4
yangsm
31 Posts
April 24, 2022, 2:17 am
"We are working on updating to a 5.14 kernel which should fix this"
When do you plan to release the 5.14 kernel?
admin
2,930 Posts
April 24, 2022, 1:41 pm
It is planned for the 3.2 release; we do not have a date for it yet, but roughly July/August. The current 3.1 is in its testing phase and should be out mid-May; it will mainly include the ability to manually/automatically re-assign IPs for better load distribution.
For 5.14, just curious: are you looking to it to solve the NVMe % utilization statistics, or are you interested in something else?
yangsm
31 Posts
April 24, 2022, 2:52 pm
I'm interested in its other features. Can I test this kernel alone?
admin
2,930 Posts
April 24, 2022, 3:21 pm
Write another post in a couple of weeks; we can make it available for your testing.