EC RBD over iscsi performance
shadowlin
67 Posts
March 12, 2019, 8:04 am
Background:
We have an existing production cluster which we can't entirely switch to PetaSAN, so we are using a PetaSAN node as an iSCSI gateway with only the targetcli tool. I know this is not a typical use case of PetaSAN, but it works fine for us.
The problem:
Everything works as expected when we use a replicated pool for RBD: we get 80%+ of native RBD (kernel RBD) performance with RBD over iSCSI. We have been looking forward to trying EC RBD over iSCSI since Luminous. We tried EC RBD over iSCSI with a PetaSAN 2.0 node as the iSCSI gateway (with targetcli); it worked, but the performance was very low (20% of kernel RBD performance), so we gave up and assumed this was because PetaSAN 2.0 doesn't support EC. Then PetaSAN 2.2 was released and we tried it again, but the performance is still the same.
I am wondering whether there is any change in how we should create the target or use the RBD?
Thanks
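For reference, a minimal sketch of this kind of setup (pool, image and IQN names below are placeholders, not our real ones, and portal/ACL steps are omitted):
# the EC data pool must allow overwrites before RBD can use it (Luminous+)
ceph osd pool set ec_data allow_ec_overwrites true
# image metadata lives in a replicated pool, data in the EC pool
rbd create rbd_meta/ecimage --size 1T --data-pool ec_data
# map the image on the gateway node, e.g. -> /dev/rbd0
rbd map rbd_meta/ecimage
# export the mapped device through LIO with targetcli
targetcli /backstores/block create name=ecimage dev=/dev/rbd0
targetcli /iscsi create iqn.2019-03.com.example:ecimage
targetcli /iscsi/iqn.2019-03.com.example:ecimage/tpg1/luns create /backstores/block/ecimage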
admin
2,930 Posts
March 12, 2019, 8:48 am
Generally we are able to achieve near-native RBD performance at the iSCSI layer; with good hardware it is almost 100%, even with high-iops 4k random writes. There is nothing we do differently with EC on the PetaSAN iSCSI side.
The other general point is that EC will always be slower than replicated: a single read io has to be read from k+m different hosts.
We did however see that with small-block 4k random writes on EC, a freshly created/un-provisioned disk starts off "very" slow and then speeds up over time as the disk gets populated; there is a hit when creating the objects, but once created they get faster. Try thick provisioning the disk first (e.g. with a dd command to fill it with zeroes). Large writes, or even small-block sequential writes, do not show this as badly.
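For example, something along these lines on the node where the image is mapped (assuming the device is /dev/rbd0; this overwrites the whole disk, so only do it on a fresh, empty image):
# write zeroes over the full device so all RADOS objects get created up front
dd if=/dev/zero of=/dev/rbd0 bs=4M oflag=direct status=progress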
Last edited on March 12, 2019, 8:52 am by admin · #2
shadowlin
67 Posts
March 12, 2019, 9:22 am
Thank you for the quick reply.
I am using targetcli to manually create the iSCSI target, so I was worried I might have missed something that causes the low performance.
The test was done with 4M block sequential write/read, comparing EC RBD over iSCSI against native EC RBD.
What kind of changes were made to let PetaSAN 2.2 support EC? I noticed the kernel is 4.12 now.
shadowlin
67 Posts
March 14, 2019, 7:49 am
Quote from admin on March 12, 2019, 8:48 am
Generally we are able to achieve near-native RBD performance at the iSCSI layer; with good hardware it is almost 100%, even with high-iops 4k random writes. There is nothing we do differently with EC on the PetaSAN iSCSI side.
The other general point is that EC will always be slower than replicated: a single read io has to be read from k+m different hosts.
We did however see that with small-block 4k random writes on EC, a freshly created/un-provisioned disk starts off "very" slow and then speeds up over time as the disk gets populated; there is a hit when creating the objects, but once created they get faster. Try thick provisioning the disk first (e.g. with a dd command to fill it with zeroes). Large writes, or even small-block sequential writes, do not show this as badly.
We did a further test using fio.
The fio parameters are:
[seq-write]
description="seq-write"
direct=1
ioengine=libaio
; /dev/rbd0 for the native rbd test, /dev/sda for the iscsi-attached disk
filename=/dev/sda
numjobs=8
iodepth=16
group_reporting
rw=write
bs=4M
The throughput with native EC RBD is around 1000 MB/s, but the throughput with EC RBD over iSCSI is only around 150 MB/s.
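For completeness, the two devices under test were prepared in the usual way, roughly like this (target IQN and portal IP below are placeholders):
# native test: map the EC-backed image with the kernel client -> /dev/rbd0
rbd map rbd_meta/ecimage
# iSCSI test: log in from the client; the attached disk shows up as e.g. /dev/sda
iscsiadm -m discovery -t sendtargets -p 192.168.1.10
iscsiadm -m node -T iqn.2019-03.com.example:ecimage -p 192.168.1.10 --login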
admin
2,930 Posts
March 14, 2019, 10:36 am
Most likely this is because with fio the io goes out in 4M chunks, while over iSCSI it is chopped into a smaller block size. To increase this, see our tuning post for the LIO tuning; you also need to set up your iSCSI initiator with similar values, since the end block size is negotiated.
I am not sure why this affects EC pools more than regular replicated pools; probably they are more latency sensitive. In all cases increasing the block size will help.
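On the target side, for a manually created target, the negotiated transfer sizes are the iSCSI parameters on the TPG and can be raised with targetcli, for example (the IQN is a placeholder and the values are only illustrative; see the tuning post for the recommended numbers):
targetcli /iscsi/iqn.2019-03.com.example:ecimage/tpg1 set parameter MaxBurstLength=4194304
targetcli /iscsi/iqn.2019-03.com.example:ecimage/tpg1 set parameter FirstBurstLength=262144
targetcli /iscsi/iqn.2019-03.com.example:ecimage/tpg1 set parameter MaxRecvDataSegmentLength=262144
targetcli /iscsi/iqn.2019-03.com.example:ecimage/tpg1 set parameter MaxXmitDataSegmentLength=262144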
Last edited on March 14, 2019, 10:58 am by admin · #5
shadowlin
67 Posts
March 15, 2019, 1:02 am
Quote from admin on March 14, 2019, 10:36 am
Most likely this is because with fio the io goes out in 4M chunks, while over iSCSI it is chopped into a smaller block size. To increase this, see our tuning post for the LIO tuning; you also need to set up your iSCSI initiator with similar values, since the end block size is negotiated.
I am not sure why this affects EC pools more than regular replicated pools; probably they are more latency sensitive. In all cases increasing the block size will help.
I tested on a replicated pool yesterday too, and it turned out it affects the replicated pool as well (the throughput of the replicated pool is the same as the EC pool).
I looked through my test logs and found we had only tested with PetaSAN 1.5 (Ceph Jewel) when we got the good result for RBD over iSCSI compared to native RBD.
I will do the same test with PetaSAN 1.5 again.
Could the change from Ceph Jewel to Luminous and from PetaSAN 1.5 to PetaSAN 2.2 affect the result?
Where can I find the LIO tuning post, besides "New performance tuning recommendations"?
Last edited on March 15, 2019, 4:35 am by shadowlin · #6
admin
2,930 Posts
March 15, 2019, 6:41 am
Hard to say for your environment. What I can say is that we do not have a 150 MB/s limit with 2.2; with good hardware we can get 1.3 GB/s in a single-client throughput test. It will be interesting to know what you see if you switch back, or maybe something has changed in your setup; also make sure your client is set up correctly. Yes, that tuning recommendations post is the correct one.
Last edited on March 15, 2019, 6:41 am by admin · #7
shadowlin
67 Posts
March 15, 2019, 9:01 am
Quote from admin on March 15, 2019, 6:41 am
Hard to say for your environment. What I can say is that we do not have a 150 MB/s limit with 2.2; with good hardware we can get 1.3 GB/s in a single-client throughput test. It will be interesting to know what you see if you switch back, or maybe something has changed in your setup; also make sure your client is set up correctly. Yes, that tuning recommendations post is the correct one.
I have tried the new tuning recommendations without luck. But I am not sure how to verify that the new settings are in effect. Is there a way to check whether they are working?
You mentioned changing the block size. How should I change it on both the initiator side and the target side?
Last edited on March 15, 2019, 9:01 am by shadowlin · #8
admin
2,930 Posts
March 15, 2019, 12:31 pm
You should also do it on the client side. We did include settings for ESXi clients; for Windows the defaults are not bad; if you are using Linux clients you need to increase it there as well. A reboot may be needed. The most direct way to know the negotiated block size is to monitor the tcp traffic across the wire. Another way is to increase the logging level at either end; you may be able to see it getting logged, but I am not sure. We put in our own trace messages while debugging but remove them, so maybe you can see something from standard LIO logging.
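For a Linux open-iscsi client, the corresponding settings live in /etc/iscsi/iscsid.conf; something like the following (values are only illustrative and should match what the target allows), followed by logging the session out and back in:
node.session.iscsi.FirstBurstLength = 262144
node.session.iscsi.MaxBurstLength = 4194304
node.conn[0].iscsi.MaxRecvDataSegmentLength = 262144
On a Linux initiator, iscsiadm -m session -P 3 also prints the negotiated parameters for the active session, which is an easier way to confirm what was actually agreed than sniffing the traffic.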
shadowlin
67 Posts
March 18, 2019, 1:15 am
What is the recommended block size for a Linux initiator?
BTW: the tuning guide says the backstore block size is 512. What is the unit? 512 KB?
BTW2: which LIO setting sets the block size? Is it the block_size in the backstore settings?
Last edited on March 18, 2019, 3:25 am by shadowlin · #10