Very slow write speed on a low-end 3-node cluster
admin
2,930 Posts
April 16, 2018, 10:40 am
Are you using the SSDs as OSDs or as journals? Can you use straight SSDs as OSDs (even if you have only 1 per box) and see what you get?
What version of PetaSAN are you using?
sds80
14 Posts
April 16, 2018, 10:46 am
The SSDs are used as journals.
PetaSAN 1.5.
Yes, I can set this up (HDD for the system + SSD as OSD).
admin
2,930 Posts
April 17, 2018, 6:39 am
Can you check on the client side how the iSCSI initiator is set up? Maybe test using a different client system.
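For reference, on a Linux test client the initiator setup can be inspected with something like the following (a sketch only; it assumes open-iscsi and multipath-tools are installed, and 10.10.2.100 is a placeholder for one of the PetaSAN iSCSI portal IPs):
# discover the targets exposed on one of the iSCSI subnets (placeholder IP)
iscsiadm -m discovery -t sendtargets -p 10.10.2.100
# show active sessions and the negotiated parameters
iscsiadm -m session -P 3
# confirm that both paths are grouped into one multipath device
multipath -ll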
sds80
14 Posts
April 18, 2018, 8:05 am
Quote from admin on April 17, 2018, 6:39 am
Can you check on the client side how the iSCSI initiator is set up? Maybe test using a different client system.
Checked on a physical and a virtual Windows 7 Pro client machine - the result is the same.
PetaSAN 1.5 - 3 nodes
1 HDD (system) + 1 SSD (OSD + journal)
OK, testing is over. Now the results.
An 11 GB file copy took ~45 min; the speed kept dropping the whole time and the copy ended at 5.38 MB/s.
Very similar results with a VMware direct copy to an attached PetaSAN disk.
So yes, this config (SSD only) is 10x faster in write speed than my initial one with HDDs, but it is still too slow for any production purpose. So to achieve a write speed of about 50 MB/s I need 27 more nodes with a similar config, right?
So the conclusion is - PetaSAN in production on 3 nodes is a dream?
Last edited on April 18, 2018, 9:43 am by sds80 · #14
shadowlin
67 Posts
April 18, 2018, 8:22 am
I am having a performance issue on Windows 10 with the default iSCSI initiator too.
Copy speed is only about 5-8 MB/s.
But in a Linux environment I can copy a 3.6 GB file in a few seconds.
sds80
14 Posts
April 18, 2018, 10:40 am
Quote from shadowlin on April 18, 2018, 8:22 am
I am having a performance issue on Windows 10 with the default iSCSI initiator too.
Copy speed is only about 5-8 MB/s.
But in a Linux environment I can copy a 3.6 GB file in a few seconds.
How many nodes are in your cluster, and what is the node configuration (CPU, RAM, RAID controller, HDD)?
admin
2,930 Posts
April 18, 2018, 11:16 am
Quote from sds80 on April 18, 2018, 8:05 am
So to achieve a write speed of about 50 MB/s I need 27 more nodes with a similar config, right?
The write results you see - 360 KB/s for HDDs with an SSD journal and 5 MB/s for all SSDs - are way off. Even with pure HDDs (no SSD journal) you should get around 30 MB/s write speed for a single copy operation using a Windows Server 2016 client with MPIO (2 paths), even with 3 nodes with 1 HDD OSD each. If you have more nodes/OSDs you will still get 30 MB/s per single copy operation, but it should scale with multiple file copy operations running at the same time, so if you have 30 OSDs you can run 10 copy operations with a total of 300 MB/s write speed. If you have all SSDs you should get 80 MB/s per file operation. These numbers assume (as I understand you do) copying a large file of several GB, not many tiny files; the latter case will give much lower numbers, even when copying locally.
So something is just not right with the numbers. If there is a way to start clean and install on new hardware that would be ideal, else I would check the configuration: make sure you do not have duplicate IPs or overlapping subnets. If you have good read speed, my initial suspect would be the backend 2 network, since it is used for write replication: check the NICs/switches. Also, do you see any errors in dmesg or in /opt/petasan/log/PetaSAN.log?
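A minimal sketch of those checks, run on each node (the log path is the one mentioned above; the grep patterns are only examples):
# kernel log: look for NIC or disk errors
dmesg | grep -iE 'error|fail|reset'
# PetaSAN log: recent errors
grep -i error /opt/petasan/log/PetaSAN.log | tail -n 50
# list addresses per interface to spot duplicate IPs or overlapping subnets
ip -4 addr show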
Quote from shadowlin on April 18, 2018, 8:22 am
I am having a performance issue on Windows 10 with the default iSCSI initiator too. Copy speed is only about 5-8 MB/s.
Are you using all pure HDDs or SSDs? Are you copying 1 large file or many tiny files? Are you using 1 path, not MPIO? If you run several concurrent copies at the same time, does it scale up?
Last edited on April 18, 2018, 11:25 am by admin · #17
shadowlin
67 Posts
April 18, 2018, 2:44 pm
I am using pure HDDs and was copying 1 large file.
I can't set up MPIO on my Windows 10 machine (I can add multiple sessions but can't combine them; the MPIO setting was greyed out when I entered the device window), so it is only a single path.
I will try to run several concurrent copies.
But why is the speed so much faster in Linux, even with a single path?
sds80
14 Posts
April 19, 2018, 9:50 am
Quote from admin on April 18, 2018, 11:16 am
So something is just not right with the numbers. If there is a way to start clean and install on new hardware that would be ideal, else I would check the configuration: make sure you do not have duplicate IPs or overlapping subnets. If you have good read speed, my initial suspect would be the backend 2 network, since it is used for write replication: check the NICs/switches. Also, do you see any errors in dmesg or in /opt/petasan/log/PetaSAN.log?
Checked the logs - nothing suspicious.
Checked the NICs/switches - replaced the backend2/iSCSI2 subnet switch with a more powerful model.
Tested a direct copy of a large file from node to node on all subnets (Mgmt, Backend1, Backend2) - wire speed all the way!
Tested a direct copy of a large file from the VM host on the Mgmt subnet to the nodes - wire speed!
So nothing wrong with the networking from that point of view!
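For the record, raw throughput between two nodes can also be checked with iperf3, assuming it is installed (the address below is a placeholder for the receiver's backend 2 IP):
# on the receiving node
iperf3 -s
# on the sending node, run a 30 second test towards the receiver's backend 2 address
iperf3 -c 10.10.2.12 -t 30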
Now to run some synthetic tests.
root@peta1:/var/lib/ceph/osd/clusterx-1# rbd bench-write -c /opt/petasan/config/etc/ceph/clusterx.conf image-00001 --io-total 5368709200 --io-pattern rand
bench-write io_size 4096 io_threads 16 bytes 5368709200 pattern random
SEC OPS OPS/SEC BYTES/SEC
1 10054 10063.90 41221745.56
2 12800 5214.60 21358987.88
3 13911 4642.05 19013844.08
4 14697 3678.10 15065484.69
5 15056 2918.52 11954269.25
6 16939 1376.64 5638697.93
7 18668 1282.96 5254985.62
8 22871 1785.62 7313880.12
9 23607 1781.28 7296129.03
10 24295 1910.32 7824676.54
11 24739 1303.53 5339275.42
12 24824 1237.58 5069108.16
13 25458 518.36 2123211.00
14 26470 568.64 2329148.00
15 27645 662.74 2714603.35
16 32955 2045.40 8377955.03
17 33473 1730.45 7087906.08
18 34004 1711.35 7009673.02
// note the significant decrease at the beginning, from 41 MB/s down to 7 MB/s
.....
929 1297374 1813.83 7429444.47
930 1297893 1851.24 7582696.45
931 1298348 1788.40 7325267.38
932 1299503 1742.41 7136915.36
933 1300364 759.14 3109440.33
934 1300434 623.29 2552992.97
935 1301329 685.61 2808276.70
936 1304486 1228.03 5030011.97
937 1308600 1822.84 7466371.77
938 1309240 1776.22 7275377.62
939 1309878 1902.42 7792326.15
940 1310559 1812.15 7422583.07
elapsed: 945 ops: 1310721 ops/sec: 1385.94 bytes/sec: 5676792.48
So random write with io-size=4K (the default, per the header above), threads=16, io-total=5 GB averages 5.6 MB/s - a similar result to my copy testing from the VM host.
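For comparison, a genuinely large-block run would need --io-size set explicitly; a sketch along the lines of the command above (the 4M value is just an example):
rbd bench-write -c /opt/petasan/config/etc/ceph/clusterx.conf image-00001 --io-total 5368709200 --io-threads 16 --io-size 4M --io-pattern rand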
root@peta1# rbd bench-write -c /opt/petasan/config/etc/ceph/clusterx.conf image-00001 --io-total 5368709200 --io-pattern seq
2018-04-19 16:34:58.941473 7fcdb38c8700 0 -- :/3213431852 >> 10.10.1.3:6789/0 pipe(0x560f1de53580 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x560f1de58300).fault
bench-write io_size 4096 io_threads 16 bytes 5368709200 pattern sequential
SEC OPS OPS/SEC BYTES/SEC
1 20117 20088.89 82284107.89
2 37591 18803.98 77021113.67
3 55307 18441.31 75535615.22
4 72517 18133.52 74274878.08
5 86459 17135.81 70188278.83
6 97131 15409.78 63118468.67
7 110565 14594.11 59777459.47
8 122795 13419.19 54965021.59
9 136180 12731.24 52147169.91
10 151096 13034.36 53388736.02
11 162353 12984.86 53185982.32
....
119 1270047 11517.53 47175821.92
120 1275582 9244.68 37866191.18
121 1276901 7624.59 31230337.29
122 1279382 5810.92 23801514.63
123 1281852 4466.09 18293084.54
124 1293042 4602.64 18852426.73
125 1295917 4711.17 19296957.04
126 1300391 4870.08 19947859.82
127 1303970 5647.70 23132967.29
128 1308778 5433.10 22253988.20
elapsed: 128 ops: 1310721 ops/sec: 10183.17 bytes/sec: 41710248.13
Sequential write is much better - 41 MB/s on average.
So what else can I do to find the problem?
As I see from these tests:
- nothing wrong with the NICs, switches or subnets
- nothing wrong with the hard disks (direct or sequential copy is normal)
- nothing wrong with the iSCSI client connection (the random write synthetic test shows similarly poor performance)
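One additional way to narrow this down (a general Ceph check, not something suggested in this thread) is to benchmark the pool directly with rados bench, which bypasses the RBD and iSCSI layers; the pool name rbd below is an assumption, adjust it to the pool PetaSAN created:
# 30 second write test straight against the pool
rados bench -c /opt/petasan/config/etc/ceph/clusterx.conf -p rbd 30 write --no-cleanup
# sequential read of the objects written above
rados bench -c /opt/petasan/config/etc/ceph/clusterx.conf -p rbd 30 seq
# remove the benchmark objects afterwards
rados -c /opt/petasan/config/etc/ceph/clusterx.conf -p rbd cleanup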
admin
2,930 Posts
April 19, 2018, 11:39 am
I do not know why you are seeing this. The charts show the system is almost idle; your disks are doing 10 IOPS with almost no utilization.
I see the second test printed this error; is it frequent?
2018-04-19 16:34:58.941473 7fcdb38c8700 0 -- :/3213431852 >> 10.10.1.3:6789/0 pipe(0x560f1de53580 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x560f1de58300).fault
Can you run the following benchmarks:
# 4k rand 1 thread
rbd bench-write -c /opt/petasan/config/etc/ceph/clusterx.conf image-00001 --io-total 5368709200 --io-threads 1 --rbd_cache=false --io-pattern rand --io-size 4K
# 4k seq 1 thread
rbd bench-write -c /opt/petasan/config/etc/ceph/clusterx.conf image-00001 --io-total 5368709200 --io-threads 1 --rbd_cache=false --io-pattern seq --io-size 4K
# 64k rand 1 thread
rbd bench-write -c /opt/petasan/config/etc/ceph/clusterx.conf image-00001 --io-total 5368709200 --io-threads 1 --rbd_cache=false --io-pattern rand --io-size 64K
# 64k seq 1 thread
rbd bench-write -c /opt/petasan/config/etc/ceph/clusterx.conf image-00001 --io-total 5368709200 --io-threads 1 --rbd_cache=false --io-pattern seq --io-size 64K
# 512k rand 1 thread
rbd bench-write -c /opt/petasan/config/etc/ceph/clusterx.conf image-00001 --io-total 5368709200 --io-threads 1 --rbd_cache=false --io-pattern rand --io-size 512K
# 512k seq 1 thread
rbd bench-write -c /opt/petasan/config/etc/ceph/clusterx.conf image-00001 --io-total 5368709200 --io-threads 1 --rbd_cache=false --io-pattern seq --io-size 512K
If you run the atop command on one of the nodes during the test, do you see any "red" values show up?
Do you see any errors in the OSD logs:
cat /var/log/ceph/CLUSTER_NAME-osd.OSD_ID.log
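A narrower check than reading the whole file is to grep it for slow or blocked requests (CLUSTER_NAME and OSD_ID are placeholders as above):
grep -iE 'slow request|blocked|error' /var/log/ceph/CLUSTER_NAME-osd.OSD_ID.log | tail -n 20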
Do you have a dedicated NIC for backend subnet 2? If so, can you test a new setup where subnet 2 is combined onto a different NIC?