Very slow write speed on low end 3 node cluster
sds80
14 Posts
April 20, 2018, 4:34 am
# rbd bench-write -c /opt/petasan/config/etc/ceph/clusterx.conf image-00001 --io-total 5368709200 --io-threads=1 --rbd_cache=false --io-pattern rand --io-size 4K
....
4520 1285868 307.96 1261420.36
4521 1286171 303.41 1242755.63
4522 1286503 303.17 1241793.70
4523 1286808 303.82 1244466.64
4524 1287109 305.32 1250585.57
4525 1287387 303.77 1244244.47
4526 1287680 301.72 1235848.78
4527 1288008 301.12 1233399.67
4528 1288324 303.12 1241559.11
4529 1288616 301.42 1234597.00
4530 1288906 303.90 1244783.78
...
(too slow; stopped with 'ctrl+x' about an hour after starting)
# rbd bench-write -c /opt/petasan/config/etc/ceph/clusterx.conf image-00001 --io-total 5368709200 --io-threads=1 --rbd_cache=false --io-pattern seq --io-size 4K
...
1398 470456 351.29 1438871.86
1399 470786 349.65 1432150.17
1400 471135 347.30 1422541.21
1401 471464 342.41 1402492.20
1402 471809 340.32 1393961.61
1403 472150 338.80 1387704.97
1404 472487 340.25 1393680.13
1405 472839 340.89 1396298.77
1406 473181 343.52 1407044.38
1407 473524 342.99 1404866.64
1408 473863 342.56 1403129.62
1409 474202 343.02 1404992.20
1410 474565 345.25 1414125.15
1411 474918 347.32 1422616.19
1412 475256 346.54 1419420.95
1413 475589 345.13 1413656.18
1414 475935 346.47 1419122.50
...
(too slow; stopped with 'ctrl+x' about an hour after starting)
# rbd bench-write -c /opt/petasan/config/etc/ceph/clusterx.conf image-00001 --io-total 5368709200 --io-threads=1 --rbd_cache=false --io-pattern rand --io-size 64K
bench-write io_size 65536 io_threads 1 bytes 5368709200 pattern random
SEC OPS OPS/SEC BYTES/SEC
1 190 190.41 12478826.51
2 385 192.80 12635412.03
3 580 193.46 12678369.63
4 762 190.62 12492270.08
5 941 188.31 12340770.86
6 1125 187.10 12261533.58
...
569 80889 182.49 11959342.54
570 81079 182.58 11965266.75
571 81264 184.26 12075933.85
572 81447 184.25 12074809.93
573 81636 187.37 12279299.79
574 81815 185.08 12129455.96
elapsed: 574 ops: 81921 ops/sec: 142.57 bytes/sec: 9343659.72
# rbd bench-write -c /opt/petasan/config/etc/ceph/clusterx.conf image-00001 --io-total 5368709200 --io-threads=1 --rbd_cache=false --io-pattern seq --io-size 64K
bench-write io_size 65536 io_threads 1 bytes 5368709200 pattern sequential
SEC OPS OPS/SEC BYTES/SEC
1 199 199.61 13081783.40
2 395 197.46 12941065.04
3 585 195.09 12785251.34
4 783 195.76 12829283.90
5 861 171.97 11270527.74
6 963 152.79 10013065.48
7 1163 152.92 10022086.82
8 1191 121.17 7940702.65
9 1328 109.05 7146767.56
10 1433 114.29 7490151.42
...
626 80461 123.44 8089621.24
627 80488 106.77 6997159.36
628 80661 137.45 9007922.50
629 80734 125.77 8242297.07
630 80889 120.55 7900469.05
631 80931 93.58 6132557.71
632 81062 116.48 7633869.34
633 81244 116.55 7638540.06
634 81311 115.88 7594070.18
635 81471 116.42 7629852.97
636 81673 149.45 9794086.93
637 81873 162.36 10640553.79
elapsed: 637 ops: 81921 ops/sec: 128.54 bytes/sec: 8423920.00
# rbd bench-write -c /opt/petasan/config/etc/ceph/clusterx.conf image-00001 --io-total 5368709200 --io-threads=1 --rbd_cache=false --io-pattern rand --io-size 512K
bench-write io_size 524288 io_threads 1 bytes 5368709200 pattern random
SEC OPS OPS/SEC BYTES/SEC
1 60 60.06 31486532.42
2 119 59.75 31327356.34
3 177 59.05 30957139.43
4 197 49.31 25851276.69
5 217 43.40 22755135.68
6 251 38.27 20064902.55
7 284 32.77 17182914.70
8 305 25.65 13446663.42
9 332 26.92 14112742.29
10 365 29.65 15544157.12
...
289 9812 27.41 14370546.52
290 9838 28.26 14815407.48
291 9864 29.51 15473098.06
292 9895 29.36 15391555.73
293 9953 35.08 18394421.06
294 9980 33.62 17628999.32
295 10006 33.58 17607770.33
296 10032 33.57 17599085.46
297 10065 34.23 17945535.78
298 10093 28.00 14681237.64
299 10140 32.04 16796624.25
300 10186 36.04 18893424.91
301 10218 37.02 19408360.06
elapsed: 301 ops: 10241 ops/sec: 33.95 bytes/sec: 17801057.56
# rbd bench-write -c /opt/petasan/config/etc/ceph/clusterx.conf image-00001 --io-total 5368709200 --io-threads=1 --rbd_cache=false --io-pattern seq --io-size 512K
bench-write io_size 524288 io_threads 1 bytes 5368709200 pattern sequential
SEC OPS OPS/SEC BYTES/SEC
1 56 55.75 29231409.66
2 95 47.99 25159581.88
3 129 42.43 22246030.42
4 155 38.97 20433745.72
5 216 43.18 22637565.29
6 238 36.53 19153921.02
7 266 34.07 17864802.97
8 293 33.16 17387880.64
9 339 36.76 19270848.08
10 368 30.35 15911849.79
11 411 34.39 18031141.23
...
295 9867 36.54 19159582.09
296 9890 36.16 18960391.24
297 9913 33.45 17537821.73
298 9942 27.38 14354665.65
299 9966 23.71 12433087.98
300 10002 27.09 14205292.60
301 10055 32.86 17228744.48
302 10083 33.97 17809404.41
303 10121 35.82 18778194.03
304 10160 38.80 20344113.92
305 10190 37.61 19716982.41
306 10229 34.96 18328771.19
elapsed: 306 ops: 10241 ops/sec: 33.44 bytes/sec: 17533909.55
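For easier reading, the `elapsed:` summary lines above can be converted from bytes/sec to MiB/s. This is only a minimal sketch and not part of the rbd tool: the function name `summary_to_mibs` is made up for illustration, and it assumes the summary line layout shown in the output above.

```shell
# Convert "elapsed: ... bytes/sec: N" summary lines, as printed by
# rbd bench-write, into MiB/s (1 MiB = 1048576 bytes).
# summary_to_mibs is a hypothetical helper name, not an rbd feature.
summary_to_mibs() {
    awk '$1 == "elapsed:" && $7 == "bytes/sec:" {
        printf "%.2f MiB/s\n", $8 / 1048576
    }'
}

# Feed it two of the summary lines from the runs above.
summary_to_mibs <<'EOF'
elapsed: 574 ops: 81921 ops/sec: 142.57 bytes/sec: 9343659.72
elapsed: 301 ops: 10241 ops/sec: 33.95 bytes/sec: 17801057.56
EOF
```

For these two lines it prints 8.91 MiB/s (64K random) and 16.98 MiB/s (512K random). The 4K runs were interrupted, so they have no summary line to convert.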
I see the second test printed this error; is this frequent?
2018-04-19 16:34:58.941473 7fcdb38c8700 0 -- :/3213431852 >> 10.10.1.3:6789/0 pipe(0x560f1de53580 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x560f1de58300).fault
I think this was due to a Monitor service malfunction on node3. I rebooted that node and the error disappeared.
If you run the atop command on one of the nodes during the test, do you see "red" values show up?
Now I have a permanent problem on all 3 nodes, regardless of load: MEM USAGE 100%.
Do you see any errors in the OSD logs:
cat /var/log/ceph/CLUSTER_NAME-osd.OSD_ID.log
The only OSD log is:
root@peta2:~# cat /var/log/ceph/ceph-osd.admin.log.1
2018-04-18 13:35:55.369713 7f5e744c68c0 0 ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe), process ceph-osd, pid 1983
2018-04-18 13:35:55.370296 7f5e744c68c0 -1 bluestore(/dev/sdb2) _read_bdev_label unable to decode label at offset 66: buffer::malformed_input: void bluestore_bdev_label_t::decode(ceph::buffer::list::iterator&) decode past end of struct encoding
2018-04-18 13:35:55.377642 7f5e744c68c0 1 journal _open /dev/sdb2 fd 4: 21474836480 bytes, block size 4096 bytes, directio = 0, aio = 0
2018-04-18 13:35:55.378004 7f5e744c68c0 1 journal close /dev/sdb2
2018-04-18 13:35:55.378079 7f5e744c68c0 0 probe_block_device_fsid /dev/sdb2 is filestore, ad263400-36fc-46ca-b426-33ce525881c3
2018-04-18 13:35:55.867535 7ff891e838c0 0 ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe), process ceph-osd, pid 2006
2018-04-18 13:35:55.868130 7ff891e838c0 -1 bluestore(/dev/sdb2) _read_bdev_label unable to decode label at offset 66: buffer::malformed_input: void bluestore_bdev_label_t::decode(ceph::buffer::list::iterator&) decode past end of struct encoding
2018-04-18 13:35:55.875925 7ff891e838c0 1 journal _open /dev/sdb2 fd 4: 21474836480 bytes, block size 4096 bytes, directio = 0, aio = 0
2018-04-18 13:35:55.876342 7ff891e838c0 1 journal close /dev/sdb2
2018-04-18 13:35:55.876436 7ff891e838c0 0 probe_block_device_fsid /dev/sdb2 is filestore, ad263400-36fc-46ca-b426-33ce525881c3
2018-04-18 13:35:59.236656 7f25181808c0 0 ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe), process ceph-osd, pid 2082
2018-04-18 13:35:59.237413 7f25181808c0 -1 bluestore(/dev/sdb2) _read_bdev_label unable to decode label at offset 66: buffer::malformed_input: void bluestore_bdev_label_t::decode(ceph::buffer::list::iterator&) decode past end of struct encoding
2018-04-18 13:35:59.247248 7f25181808c0 1 journal _open /dev/sdb2 fd 4: 21474836480 bytes, block size 4096 bytes, directio = 0, aio = 0
2018-04-18 13:35:59.248919 7f25181808c0 1 journal close /dev/sdb2
2018-04-18 13:35:59.249002 7f25181808c0 0 probe_block_device_fsid /dev/sdb2 is filestore, ad263400-36fc-46ca-b426-33ce525881c3
2018-04-18 13:36:00.392934 7fb6fa1198c0 0 ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe), process ceph-osd, pid 2181
2018-04-18 13:36:00.393542 7fb6fa1198c0 -1 bluestore(/dev/sdb2) _read_bdev_label unable to decode label at offset 66: buffer::malformed_input: void bluestore_bdev_label_t::decode(ceph::buffer::list::iterator&) decode past end of struct encoding
2018-04-18 13:36:00.400939 7fb6fa1198c0 1 journal _open /dev/sdb2 fd 4: 21474836480 bytes, block size 4096 bytes, directio = 0, aio = 0
2018-04-18 13:36:00.401368 7fb6fa1198c0 1 journal close /dev/sdb2
2018-04-18 13:36:00.401464 7fb6fa1198c0 0 probe_block_device_fsid /dev/sdb2 is filestore, ad263400-36fc-46ca-b426-33ce525881c3
2018-04-18 13:36:06.069200 7f8995c2d8c0 0 ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe), process ceph-osd, pid 2455
2018-04-18 13:36:06.069979 7f8995c2d8c0 -1 bluestore(/dev/sdb2) _read_bdev_label unable to decode label at offset 66: buffer::malformed_input: void bluestore_bdev_label_t::decode(ceph::buffer::list::iterator&) decode past end of struct encoding
2018-04-18 13:36:06.078987 7f8995c2d8c0 1 journal _open /dev/sdb2 fd 4: 21474836480 bytes, block size 4096 bytes, directio = 0, aio = 0
2018-04-18 13:36:06.080628 7f8995c2d8c0 1 journal close /dev/sdb2
2018-04-18 13:36:06.080680 7f8995c2d8c0 0 probe_block_device_fsid /dev/sdb2 is filestore, f5f7efc5-c8af-4ca9-8cb0-37b00df350af
2018-04-19 09:25:30.146064 7f757a13b8c0 0 ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe), process ceph-osd, pid 546
2018-04-19 09:25:30.161079 7f757a13b8c0 -1 bluestore(/dev/sdb2) _read_bdev_label unable to decode label at offset 66: buffer::malformed_input: void bluestore_bdev_label_t::decode(ceph::buffer::list::iterator&) decode past end of struct encoding
2018-04-19 09:25:30.168995 7f757a13b8c0 1 journal _open /dev/sdb2 fd 4: 21474836480 bytes, block size 4096 bytes, directio = 0, aio = 0
2018-04-19 09:25:30.169405 7f757a13b8c0 1 journal close /dev/sdb2
2018-04-19 09:25:30.169491 7f757a13b8c0 0 probe_block_device_fsid /dev/sdb2 is filestore, f5f7efc5-c8af-4ca9-8cb0-37b00df350af
Do you have a dedicated NIC for subnet 2? If so, can you test a new setup where subnet 2 is mapped or combined with a different NIC?
What do you mean by 'subnet 2'?
I have 3 identical servers with 2 NICs each (Intel 6111ESB/6321ESB, as shown in the PetaSAN installation GUI),
so my config is:
- eth0 (Mgmt, Backend1, iSCSI1)
- eth1 (Backend2, iSCSI2)
I don't have any other NICs to replace the current ones.
So maybe the amount of RAM is the bottleneck?
Next week I plan to install a 3-node Ceph-only cluster (via ceph-deploy) and run rados tests.
Last edited on April 20, 2018, 9:08 am by sds80 · #21
admin
2,930 Posts
April 20, 2018, 11:36 am
One thing is to make sure the cluster is active+clean before you run the tests. It may also be better to disable scrub and deep-scrub from the maintenance mode.
For RAM, edit /etc/sysctl.conf on all 3 nodes and comment out (with #) the following 2 lines:
vm.swappiness=10
vm.vfs_cache_pressure=1
and reboot.
For subnet 2, I meant to say backend 2.
If you can install Ceph yourself and run the rados/rbd tests, that could also be helpful. Let us know if you find any issues.
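The preparation steps above could be scripted roughly as follows. This is a sketch under assumptions, not the PetaSAN procedure: it uses the plain Ceph `noscrub` and `nodeep-scrub` OSD flags instead of the PetaSAN maintenance mode page, the function names are invented for illustration, `sed -i` assumes GNU sed, and the sed patterns only match the two lines exactly as quoted above. On a PetaSAN node the ceph command may also need the `-c /opt/petasan/config/etc/ceph/clusterx.conf` option used in the benchmarks.

```shell
# Assumption: CEPH holds the ceph CLI invocation. Set CEPH=echo for a dry
# run that only prints the sub-commands instead of executing them.
CEPH="${CEPH:-ceph}"

# Stop scheduled scrubs and deep-scrubs for the duration of a benchmark.
pause_scrubbing() {
    $CEPH osd set noscrub
    $CEPH osd set nodeep-scrub
}

# Re-enable scrubbing once the benchmark has finished.
resume_scrubbing() {
    $CEPH osd unset noscrub
    $CEPH osd unset nodeep-scrub
}

# Comment out the two vm.* lines in a sysctl file.
# The file defaults to /etc/sysctl.conf; pass a copy to try it safely.
comment_vm_tuning() {
    sed -i -e 's/^vm\.swappiness=10$/#&/' \
           -e 's/^vm\.vfs_cache_pressure=1$/#&/' "${1:-/etc/sysctl.conf}"
}
```

Running `ceph status` before the benchmark shows whether all PGs are active+clean. `pause_scrubbing` would go before the test and `resume_scrubbing` after it. The sysctl change only takes effect after the reboot mentioned above.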
Last edited on April 20, 2018, 11:38 am by admin · #22
sds80
14 Posts
April 26, 2018, 9:56 am
Installed a 3-node Ceph cluster on VERSION="14.04.5 LTS, Trusty Tahr"
root@peta1:~# ceph -v
ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)
root@peta1:~# cat /etc/ceph/ceph.conf
[global]
fsid = 205a5603-0d41-4b2e-89e1-d79f10a26ec1
mon_initial_members = peta1, peta2, peta3
mon_host = 192.168.120.230,192.168.120.231,192.168.120.232
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
public_network = 192.168.120.0/24
cluster_network = 10.10.2.0/24
Then I installed tgt:
# apt install tgt
# rbd create -p rbd rbd1 --size 4096 --name client.admin --image-feature layering
# echo '<target virtual-ceph:iscsi>
driver iscsi
bs-type rbd
backing-store rbd/rbd1 # Format: <pool_name>/<rbd_image_name>
initiator-address ALL
</target>' > /etc/tgt/conf.d/ceph.conf
# systemctl restart tgt
Then I connected the Ceph disk via an iSCSI client (Windows 7 in a VM) and copied a large 13 GB file and 5000 small files (2 GB in total).
The average speed was 37Mb/s and 24Mb/s respectively.
So the speed is significantly higher on a clean Ceph installation than on PetaSAN.
I think there is some network configuration problem on PetaSAN.
Maybe you can help me find the reason for the bad write performance on the PetaSAN installation.
P.S. I have 64 PGs in the default ceph-deploy installation, but 256 PGs in the default PetaSAN install. Maybe this affects write performance? How can I change the PG count to 256 and test it?
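On the PG question, a pool's placement-group count is normally raised with `ceph osd pool set`. Below is a minimal sketch, assuming the pool is the default `rbd` pool used above. The function name `set_pool_pgs` is invented for illustration. In this Ceph release pg_num can be raised but not lowered again, and the cluster rebalances data afterwards, so this is best tried on a test cluster.

```shell
# Assumption: CEPH holds the ceph CLI invocation. Set CEPH=echo for a dry
# run that only prints the sub-commands instead of executing them.
CEPH="${CEPH:-ceph}"

# Raise the placement-group count of a pool.
# pgp_num has to follow pg_num before data is actually rebalanced.
# Usage: set_pool_pgs <pool> <pg_count>
set_pool_pgs() {
    $CEPH osd pool set "$1" pg_num "$2"
    $CEPH osd pool set "$1" pgp_num "$2"
}

# Show the current values for a pool.
# Usage: show_pool_pgs <pool>
show_pool_pgs() {
    $CEPH osd pool get "$1" pg_num
    $CEPH osd pool get "$1" pgp_num
}
```

For example, `set_pool_pgs rbd 256` followed by `show_pool_pgs rbd`. The pgp_num step may be refused while the new PGs are still being created and then has to be repeated a little later. Benchmarks are only comparable once the cluster has settled.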
Last edited on April 26, 2018, 10:02 am by sds80 · #23
admin
2,930 Posts
April 26, 2018, 10:48 am
I suspect the speed gain you see is probably due to write-back caching in your setup. This allows the iSCSI target to cache many IOs and submit them in large batches. Although this gives much better write performance, it does not allow iSCSI high availability: if a node dies while caching data, the failover path does not have the cached data, which will probably lead to data corruption. A cache can be used in a single non-HA setup to speed up writes, but there is always potential for data loss (rather than corruption). Most HA iSCSI SANs do not cache data; the client can still cache at its own end (via the client OS setting) if desired.
To see if this is the case, can you rerun the same benchmark test I posted earlier so we can compare the 2 setups:
Can you run the following benchmarks:
# 4k rand 1 thread
rbd bench-write -c /opt/petasan/config/etc/ceph/clusterx.conf image-00001 --io-total 5368709200 --io-threads=1 --rbd_cache=false --io-pattern rand --io-size 4K
# 4k seq 1 thread
rbd bench-write -c /opt/petasan/config/etc/ceph/clusterx.conf image-00001 --io-total 5368709200 --io-threads=1 --rbd_cache=false --io-pattern seq --io-size 4K
# 64k rand 1 thread
rbd bench-write -c /opt/petasan/config/etc/ceph/clusterx.conf image-00001 --io-total 5368709200 --io-threads=1 --rbd_cache=false --io-pattern rand --io-size 64K
# 64k seq 1 thread
rbd bench-write -c /opt/petasan/config/etc/ceph/clusterx.conf image-00001 --io-total 5368709200 --io-threads=1 --rbd_cache=false --io-pattern seq --io-size 64K
# 512k rand 1 thread
rbd bench-write -c /opt/petasan/config/etc/ceph/clusterx.conf image-00001 --io-total 5368709200 --io-threads=1 --rbd_cache=false --io-pattern rand --io-size 512K
# 512k seq 1 thread
rbd bench-write -c /opt/petasan/config/etc/ceph/clusterx.conf image-00001 --io-total 5368709200 --io-threads=1 --rbd_cache=false --io-pattern seq --io-size 512K
Last edited on April 26, 2018, 10:49 am by admin · #24
sds80
14 Posts
April 26, 2018, 10:58 am
root@peta1:~# rbd bench-write rbd1 --io-total 5368709200 --io-threads=1 --rbd_cache=false --io-pattern rand --io-size 4K
bench-write io_size 4096 io_threads 1 bytes 5368709200 pattern random
SEC OPS OPS/SEC BYTES/SEC
1 185 185.60 760199.57
2 452 226.26 926767.51
3 642 214.09 876912.92
4 675 168.07 688402.59
5 737 147.39 603705.05
6 787 117.71 482141.94
7 959 101.40 415327.21
8 1237 119.08 487755.94
9 1483 162.25 664572.69
10 1531 156.84 642414.05
11 1553 156.58 641336.05
12 1607 121.24 496616.35
13 1682 88.95 364325.39
...
10471 1309143 137.28 562293.52
10472 1309146 123.31 505060.85
10473 1309204 103.52 424038.37
10474 1309355 121.79 498843.06
10475 1309644 120.02 491586.80
10476 1309936 169.80 695496.63
10477 1310082 198.39 812608.66
10478 1310125 195.71 801646.84
10479 1310144 155.86 638384.17
10480 1310198 110.81 453877.72
10481 1310240 59.37 243180.55
10482 1310495 82.57 338220.69
elapsed: 10482 ops: 1310721 ops/sec: 125.04 bytes/sec: 512145.26
root@peta1:~# rbd bench-write rbd1 --io-total 5368709200 --io-threads=1 --rbd_cache=false --io-pattern seq --io-size 4K
bench-write io_size 4096 io_threads 1 bytes 5368709200 pattern sequential
SEC OPS OPS/SEC BYTES/SEC
1 296 296.66 1215102.36
2 513 250.26 1025072.81
3 537 177.79 728208.40
4 643 160.95 659251.50
5 936 187.32 767255.63
6 1233 187.42 767684.74
7 1481 195.64 801356.90
8 1706 234.99 962499.43
9 2003 272.02 1114200.73
10 2301 273.11 1118663.10
...
4315 916228 246.88 1011231.24
4316 916416 243.69 998161.25
4317 916663 229.85 941453.40
4318 916843 228.54 936079.70
4319 917148 228.53 936073.19
4320 917399 234.09 958819.37
4321 917694 255.47 1046409.07
4322 917779 233.29 955552.95
4323 917804 191.94 786189.52
4324 917993 169.04 692379.18
4325 918267 173.62 711157.08
4326 918426 146.44 599813.67
4327 918647 173.53 710777.63
4328 918802 194.18 795341.26
4329 919041 209.55 858328.68
4330 919230 192.59 788838.80
4331 919393 193.35 791941.31
4332 919696 210.50 862210.08
4333 919879 217.30 890053.22
4334 919902 171.88 704039.11
...
root@peta1:~# rbd bench-write rbd1 --io-total 5368709200 --io-threads=1 --rbd_cache=false --io-pattern rand --io-size 64K
bench-write io_size 65536 io_threads 1 bytes 5368709200 pattern random
SEC OPS OPS/SEC BYTES/SEC
1 166 166.83 10933627.78
2 288 143.38 9396747.12
3 344 114.66 7514448.23
4 416 103.88 6807838.98
5 477 88.14 5776136.33
6 478 62.35 4086490.05
7 628 68.19 4469135.09
8 771 85.44 5599120.90
9 858 88.63 5808722.21
10 949 103.10 6756435.30
...
1163 81140 62.12 4070906.03
1164 81281 88.65 5809854.40
1165 81445 116.34 7624684.04
1166 81522 95.26 6242669.24
1167 81539 85.99 5635260.96
1168 81584 89.18 5844568.72
1169 81648 73.44 4812686.26
1170 81708 48.63 3186805.93
1171 81736 43.44 2847144.95
1172 81802 52.82 3461659.70
1173 81838 50.76 3326898.27
elapsed: 1173 ops: 81921 ops/sec: 69.78 bytes/sec: 4573115.29
root@peta1:~# rbd bench-write rbd1 --io-total 5368709200 --io-threads=1 --rbd_cache=false --io-pattern seq --io-size 64K
bench-write io_size 65536 io_threads 1 bytes 5368709200 pattern sequential
SEC OPS OPS/SEC BYTES/SEC
1 173 173.67 11381469.40
2 355 177.66 11642828.14
3 526 175.59 11507649.76
4 651 160.48 10517126.37
5 677 134.71 8828572.31
6 772 119.71 7845543.57
7 949 118.81 7786373.45
8 1118 118.35 7756218.99
9 1272 125.59 8230497.64
10 1300 124.63 8167493.45
...
810 80908 65.83 4314111.22
811 81082 90.00 5898082.32
812 81215 110.61 7249218.68
813 81244 112.97 7403300.63
814 81291 110.71 7255437.00
815 81341 86.51 5669497.22
816 81416 66.74 4373813.41
817 81477 52.59 3446677.39
818 81645 80.61 5283157.27
819 81813 104.43 6843952.03
elapsed: 819 ops: 81921 ops/sec: 99.95 bytes/sec: 6550433.96
root@peta1:~# rbd bench-write rbd1 --io-total 5368709200 --io-threads=1 --rbd_cache=false --io-pattern rand --io-size 512K
bench-write io_size 524288 io_threads 1 bytes 5368709200 pattern random
SEC OPS OPS/SEC BYTES/SEC
1 45 45.81 24016994.96
2 91 45.70 23957793.54
3 130 43.58 22847305.26
4 147 36.47 19119609.19
5 167 33.43 17528969.81
6 179 26.57 13931251.31
7 190 19.20 10063983.33
8 194 12.43 6518416.87
9 216 13.53 7092166.23
10 224 11.22 5880899.73
...
574 10075 9.40 4930013.54
575 10079 8.33 4366731.21
576 10101 10.75 5635610.00
577 10115 11.18 5860530.36
578 10153 17.25 9044775.40
579 10196 25.31 13269452.33
580 10221 26.95 14127067.94
581 10230 25.69 13470566.54
582 10240 25.13 13174435.25
elapsed: 582 ops: 10241 ops/sec: 17.59 bytes/sec: 9222399.90
root@peta1:~# rbd bench-write rbd1 --io-total 5368709200 --io-threads=1 --rbd_cache=false --io-pattern seq --io-size 512K
bench-write io_size 524288 io_threads 1 bytes 5368709200 pattern sequential
SEC OPS OPS/SEC BYTES/SEC
1 51 51.70 27103074.15
2 103 51.53 27017638.08
3 135 45.32 23760413.10
4 155 38.85 20366603.85
5 183 36.58 19177765.89
6 226 35.01 18354715.55
7 272 33.86 17751662.58
8 316 36.12 18938618.74
9 338 36.65 19212577.38
10 361 35.18 18444806.03
...
359 10045 13.69 7176961.19
360 10051 12.80 6711485.82
361 10062 11.02 5778292.40
362 10087 13.91 7293660.81
363 10109 15.91 8339015.02
364 10124 16.31 8551990.29
365 10145 19.19 10060962.03
366 10169 21.41 11226375.33
367 10186 19.89 10429246.53
368 10223 22.80 11953390.80
elapsed: 368 ops: 10241 ops/sec: 27.80 bytes/sec: 14573487.85
root@peta1:~# rbd bench-write rbd1 --io-total 5368709200 --io-threads=1 --rbd_cache=false --io-pattern rand --io-size 4M
bench-write io_size 4194304 io_threads 1 bytes 5368709200 pattern random
SEC OPS OPS/SEC BYTES/SEC
1 7 6.97 29240480.92
2 12 6.47 27147232.46
3 17 5.70 23915718.98
4 22 5.69 23860058.47
5 27 5.29 22181822.83
6 30 4.61 19328294.43
7 34 4.34 18203876.61
8 39 4.13 17339788.66
9 41 3.80 15928710.13
10 48 4.33 18149676.52
...
312 1243 4.85 20356648.71
313 1247 4.96 20787766.09
314 1252 4.67 19605427.32
315 1255 4.63 19435600.75
316 1258 4.26 17860833.76
317 1264 4.12 17286140.28
318 1267 4.02 16874536.76
319 1273 4.33 18152169.76
320 1277 4.54 19024705.15
elapsed: 321 ops: 1281 ops/sec: 3.99 bytes/sec: 16735217.94
Results are similar, even though the previous installation was SSD-only and the current one is 2 HDD (OSD) + 1 SSD (journal).
So the conclusion is: use more expensive hardware?
Is there no more room to improve performance?
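To make the runs above easier to compare, here is a minimal sketch that converts the `elapsed:` summary lines into MB/s. It only reuses the summary lines already printed above; the helper name `to_mbps` is made up for this example, and MB is taken as 10^6 bytes.

```shell
#!/bin/sh
# Summary lines copied from the runs above:
# 64K rand, 64K seq, 512K rand, 512K seq, 4M rand.
summaries='elapsed: 1173 ops: 81921 ops/sec: 69.78 bytes/sec: 4573115.29
elapsed: 819 ops: 81921 ops/sec: 99.95 bytes/sec: 6550433.96
elapsed: 582 ops: 10241 ops/sec: 17.59 bytes/sec: 9222399.90
elapsed: 368 ops: 10241 ops/sec: 27.80 bytes/sec: 14573487.85
elapsed: 321 ops: 1281 ops/sec: 3.99 bytes/sec: 16735217.94'

# to_mbps (hypothetical helper): reads rbd bench-write summary lines on stdin
# and prints throughput in MB/s (1 MB = 1000000 bytes) next to the ops/sec.
# Field 6 is the ops/sec value, field 8 is the bytes/sec value.
to_mbps() {
  awk '/^elapsed:/ { printf "%.2f MB/s at %.2f ops/sec\n", $8 / 1000000, $6 }'
}

printf '%s\n' "$summaries" | to_mbps
```

Piping the saved output of any other run through `to_mbps` gives the same one-line-per-run view, which makes it easier to see that throughput grows with I/O size while ops/sec drops.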
Last edited on April 27, 2018, 9:29 am by sds80 · #25
admin
2,930 Posts
April 27, 2018, 4:35 pm
PetaSAN works best with good hardware. It gives the best performance on all-flash storage. Even with HDDs, you will get good performance if you use a controller with write-back cache and SSD journals.
In your case, the numbers are lower than normal even for pure spinning disks and a 1G network. As suggested, I would recommend trying different hardware and a different network, even if it is low end. Of course, if you do have higher-end hardware, that will be better.
sds80
14 Posts
April 20, 2018, 4:34 am
# rbd bench-write -c /opt/petasan/config/etc/ceph/clusterx.conf image-00001 --io-total 5368709200 --io-threads=1 --rbd_cache=false --io-pattern rand --io-size 4K
....
4520 1285868 307.96 1261420.36
4521 1286171 303.41 1242755.63
4522 1286503 303.17 1241793.70
4523 1286808 303.82 1244466.64
4524 1287109 305.32 1250585.57
4525 1287387 303.77 1244244.47
4526 1287680 301.72 1235848.78
4527 1288008 301.12 1233399.67
4528 1288324 303.12 1241559.11
4529 1288616 301.42 1234597.00
4530 1288906 303.90 1244783.78
...
(too slow, 'ctrl+x' about an hour after beginning)
# rbd bench-write -c /opt/petasan/config/etc/ceph/clusterx.conf image-00001 --io-total 5368709200 --io-threads=1 --rbd_cache=false --io-pattern seq --io-size 4K
...
1398 470456 351.29 1438871.86
1399 470786 349.65 1432150.17
1400 471135 347.30 1422541.21
1401 471464 342.41 1402492.20
1402 471809 340.32 1393961.61
1403 472150 338.80 1387704.97
1404 472487 340.25 1393680.13
1405 472839 340.89 1396298.77
1406 473181 343.52 1407044.38
1407 473524 342.99 1404866.64
1408 473863 342.56 1403129.62
1409 474202 343.02 1404992.20
1410 474565 345.25 1414125.15
1411 474918 347.32 1422616.19
1412 475256 346.54 1419420.95
1413 475589 345.13 1413656.18
1414 475935 346.47 1419122.50
...
(too slow, 'ctrl+x' about an hour after beginning)
rbd bench-write -c /opt/petasan/config/etc/ceph/clusterx.conf image-00001 --io-total 5368709200 --io-threads=1 --rbd_cache=false --io-pattern rand --io-size 64K
bench-write io_size 65536 io_threads 1 bytes 5368709200 pattern random
SEC OPS OPS/SEC BYTES/SEC
1 190 190.41 12478826.51
2 385 192.80 12635412.03
3 580 193.46 12678369.63
4 762 190.62 12492270.08
5 941 188.31 12340770.86
6 1125 187.10 12261533.58
...
569 80889 182.49 11959342.54
570 81079 182.58 11965266.75
571 81264 184.26 12075933.85
572 81447 184.25 12074809.93
573 81636 187.37 12279299.79
574 81815 185.08 12129455.96
elapsed: 574 ops: 81921 ops/sec: 142.57 bytes/sec: 9343659.72
# rbd bench-write -c /opt/petasan/config/etc/ceph/clusterx.conf image-00001 --io-total 5368709200 --io-threads=1 --rbd_cache=false --io-pattern seq --io-size 64K
bench-write io_size 65536 io_threads 1 bytes 5368709200 pattern sequential
SEC OPS OPS/SEC BYTES/SEC
1 199 199.61 13081783.40
2 395 197.46 12941065.04
3 585 195.09 12785251.34
4 783 195.76 12829283.90
5 861 171.97 11270527.74
6 963 152.79 10013065.48
7 1163 152.92 10022086.82
8 1191 121.17 7940702.65
9 1328 109.05 7146767.56
10 1433 114.29 7490151.42
...
626 80461 123.44 8089621.24
627 80488 106.77 6997159.36
628 80661 137.45 9007922.50
629 80734 125.77 8242297.07
630 80889 120.55 7900469.05
631 80931 93.58 6132557.71
632 81062 116.48 7633869.34
633 81244 116.55 7638540.06
634 81311 115.88 7594070.18
635 81471 116.42 7629852.97
636 81673 149.45 9794086.93
637 81873 162.36 10640553.79
elapsed: 637 ops: 81921 ops/sec: 128.54 bytes/sec: 8423920.00
# rbd bench-write -c /opt/petasan/config/etc/ceph/clusterx.conf image-00001 --io-total 5368709200 --io-threads=1 --rbd_cache=false --io-pattern rand --io-size 512K
bench-write io_size 524288 io_threads 1 bytes 5368709200 pattern random
SEC OPS OPS/SEC BYTES/SEC
1 60 60.06 31486532.42
2 119 59.75 31327356.34
3 177 59.05 30957139.43
4 197 49.31 25851276.69
5 217 43.40 22755135.68
6 251 38.27 20064902.55
7 284 32.77 17182914.70
8 305 25.65 13446663.42
9 332 26.92 14112742.29
10 365 29.65 15544157.12
...
289 9812 27.41 14370546.52
290 9838 28.26 14815407.48
291 9864 29.51 15473098.06
292 9895 29.36 15391555.73
293 9953 35.08 18394421.06
294 9980 33.62 17628999.32
295 10006 33.58 17607770.33
296 10032 33.57 17599085.46
297 10065 34.23 17945535.78
298 10093 28.00 14681237.64
299 10140 32.04 16796624.25
300 10186 36.04 18893424.91
301 10218 37.02 19408360.06
elapsed: 301 ops: 10241 ops/sec: 33.95 bytes/sec: 17801057.56
# rbd bench-write -c /opt/petasan/config/etc/ceph/clusterx.conf image-00001 --io-total 5368709200 --io-threads=1 --rbd_cache=false --io-pattern seq --io-size 512K
bench-write io_size 524288 io_threads 1 bytes 5368709200 pattern sequential
SEC OPS OPS/SEC BYTES/SEC
1 56 55.75 29231409.66
2 95 47.99 25159581.88
3 129 42.43 22246030.42
4 155 38.97 20433745.72
5 216 43.18 22637565.29
6 238 36.53 19153921.02
7 266 34.07 17864802.97
8 293 33.16 17387880.64
9 339 36.76 19270848.08
10 368 30.35 15911849.79
11 411 34.39 18031141.23
...
295 9867 36.54 19159582.09
296 9890 36.16 18960391.24
297 9913 33.45 17537821.73
298 9942 27.38 14354665.65
299 9966 23.71 12433087.98
300 10002 27.09 14205292.60
301 10055 32.86 17228744.48
302 10083 33.97 17809404.41
303 10121 35.82 18778194.03
304 10160 38.80 20344113.92
305 10190 37.61 19716982.41
306 10229 34.96 18328771.19
elapsed: 306 ops: 10241 ops/sec: 33.44 bytes/sec: 17533909.55
I see the second test printed this error, is this frequent ?
2018-04-19 16:34:58.941473 7fcdb38c8700 0 -- :/3213431852 >> 10.10.1.3:6789/0 pipe(0x560f1de53580 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x560f1de58300).fault
I think this was due to a Monitor service malfunction on node3. I rebooted that node and the error disappeared.
If you run the atop command on one of the nodes during the test, do you see "red" values show up?
Now I have a permanent problem on all 3 nodes, regardless of load: MEM USAGE 100%
Do you see any errors in the OSD logs:
cat /var/log/ceph/CLUSTER_NAME-osd.OSD_ID.log
The only OSD log is:
root@peta2:~# cat /var/log/ceph/ceph-osd.admin.log.1
2018-04-18 13:35:55.369713 7f5e744c68c0 0 ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe), process ceph-osd, pid 1983
2018-04-18 13:35:55.370296 7f5e744c68c0 -1 bluestore(/dev/sdb2) _read_bdev_label unable to decode label at offset 66: buffer::malformed_input: void bluestore_bdev_label_t::decode(ceph::buffer::list::iterator&) decode past end of struct encoding
2018-04-18 13:35:55.377642 7f5e744c68c0 1 journal _open /dev/sdb2 fd 4: 21474836480 bytes, block size 4096 bytes, directio = 0, aio = 0
2018-04-18 13:35:55.378004 7f5e744c68c0 1 journal close /dev/sdb2
2018-04-18 13:35:55.378079 7f5e744c68c0 0 probe_block_device_fsid /dev/sdb2 is filestore, ad263400-36fc-46ca-b426-33ce525881c3
2018-04-18 13:35:55.867535 7ff891e838c0 0 ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe), process ceph-osd, pid 2006
2018-04-18 13:35:55.868130 7ff891e838c0 -1 bluestore(/dev/sdb2) _read_bdev_label unable to decode label at offset 66: buffer::malformed_input: void bluestore_bdev_label_t::decode(ceph::buffer::list::iterator&) decode past end of struct encoding
2018-04-18 13:35:55.875925 7ff891e838c0 1 journal _open /dev/sdb2 fd 4: 21474836480 bytes, block size 4096 bytes, directio = 0, aio = 0
2018-04-18 13:35:55.876342 7ff891e838c0 1 journal close /dev/sdb2
2018-04-18 13:35:55.876436 7ff891e838c0 0 probe_block_device_fsid /dev/sdb2 is filestore, ad263400-36fc-46ca-b426-33ce525881c3
2018-04-18 13:35:59.236656 7f25181808c0 0 ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe), process ceph-osd, pid 2082
2018-04-18 13:35:59.237413 7f25181808c0 -1 bluestore(/dev/sdb2) _read_bdev_label unable to decode label at offset 66: buffer::malformed_input: void bluestore_bdev_label_t::decode(ceph::buffer::list::iterator&) decode past end of struct encoding
2018-04-18 13:35:59.247248 7f25181808c0 1 journal _open /dev/sdb2 fd 4: 21474836480 bytes, block size 4096 bytes, directio = 0, aio = 0
2018-04-18 13:35:59.248919 7f25181808c0 1 journal close /dev/sdb2
2018-04-18 13:35:59.249002 7f25181808c0 0 probe_block_device_fsid /dev/sdb2 is filestore, ad263400-36fc-46ca-b426-33ce525881c3
2018-04-18 13:36:00.392934 7fb6fa1198c0 0 ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe), process ceph-osd, pid 2181
2018-04-18 13:36:00.393542 7fb6fa1198c0 -1 bluestore(/dev/sdb2) _read_bdev_label unable to decode label at offset 66: buffer::malformed_input: void bluestore_bdev_label_t::decode(ceph::buffer::list::iterator&) decode past end of struct encoding
2018-04-18 13:36:00.400939 7fb6fa1198c0 1 journal _open /dev/sdb2 fd 4: 21474836480 bytes, block size 4096 bytes, directio = 0, aio = 0
2018-04-18 13:36:00.401368 7fb6fa1198c0 1 journal close /dev/sdb2
2018-04-18 13:36:00.401464 7fb6fa1198c0 0 probe_block_device_fsid /dev/sdb2 is filestore, ad263400-36fc-46ca-b426-33ce525881c3
2018-04-18 13:36:06.069200 7f8995c2d8c0 0 ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe), process ceph-osd, pid 2455
2018-04-18 13:36:06.069979 7f8995c2d8c0 -1 bluestore(/dev/sdb2) _read_bdev_label unable to decode label at offset 66: buffer::malformed_input: void bluestore_bdev_label_t::decode(ceph::buffer::list::iterator&) decode past end of struct encoding
2018-04-18 13:36:06.078987 7f8995c2d8c0 1 journal _open /dev/sdb2 fd 4: 21474836480 bytes, block size 4096 bytes, directio = 0, aio = 0
2018-04-18 13:36:06.080628 7f8995c2d8c0 1 journal close /dev/sdb2
2018-04-18 13:36:06.080680 7f8995c2d8c0 0 probe_block_device_fsid /dev/sdb2 is filestore, f5f7efc5-c8af-4ca9-8cb0-37b00df350af
2018-04-19 09:25:30.146064 7f757a13b8c0 0 ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe), process ceph-osd, pid 546
2018-04-19 09:25:30.161079 7f757a13b8c0 -1 bluestore(/dev/sdb2) _read_bdev_label unable to decode label at offset 66: buffer::malformed_input: void bluestore_bdev_label_t::decode(ceph::buffer::list::iterator&) decode past end of struct encoding
2018-04-19 09:25:30.168995 7f757a13b8c0 1 journal _open /dev/sdb2 fd 4: 21474836480 bytes, block size 4096 bytes, directio = 0, aio = 0
2018-04-19 09:25:30.169405 7f757a13b8c0 1 journal close /dev/sdb2
2018-04-19 09:25:30.169491 7f757a13b8c0 0 probe_block_device_fsid /dev/sdb2 is filestore, f5f7efc5-c8af-4ca9-8cb0-37b00df350af
Do you have a dedicated NIC for subnet 2? If so, can you test a new setup where subnet 2 is mapped to a different NIC?
What do you mean by 'subnet 2'?
I have 3 identical servers with 2 NICs each (Intel 6111ESB/6321ESB, as seen in the PetaSAN installation GUI),
so my config is:
- eth0 (Mgmt, Backend1, iSCSI1)
- eth1 ( Backend2, iSCSI2)
I don't have any other NICs to replace the current ones.
So maybe the amount of RAM is the bottleneck?
Next week I plan to install a 3-node Ceph-only cluster (via ceph-deploy) and run rados tests.
admin
2,930 Posts
April 20, 2018, 11:36 am
One thing is to make sure the cluster is active+clean before you run the tests. It may also be better to disable scrub and deep-scrub from the maintenance mode.
For RAM, edit /etc/sysctl.conf on all 3 nodes and comment out (via #) the following 2 lines:
vm.swappiness=10
vm.vfs_cache_pressure=1
and reboot.
For subnet 2, I meant to say backend 2.
If you can install Ceph yourself and run the rados/rbd tests, that could also be helpful. Let us know if you find any issues.
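The sysctl edit described above can be sketched as a small shell helper. This is only an illustration under assumptions: the function name `comment_out_vm_tuning` is made up, and the demo runs against a throwaway sample file rather than the real /etc/sysctl.conf. The two `ceph osd set` lines in the comments are the plain Ceph CLI flags for pausing scrubbing, which may or may not be what the PetaSAN maintenance mode does internally.

```shell
#!/bin/sh
# comment_out_vm_tuning (hypothetical helper): prefix the two vm.* settings
# named in the post with '#' in the given file, keeping a .bak backup.
# Lines that are already commented out are left untouched.
comment_out_vm_tuning() {
  sed -i.bak -E \
    's/^[[:space:]]*(vm\.swappiness|vm\.vfs_cache_pressure)[[:space:]]*=/#&/' \
    "$1"
}

# Demo on a sample file so nothing on the system is modified.
sample=$(mktemp)
printf 'vm.swappiness=10\nvm.vfs_cache_pressure=1\n' > "$sample"
comment_out_vm_tuning "$sample"
cat "$sample"
rm -f "$sample" "$sample.bak"

# Assumption: on a plain Ceph cluster, scrubbing can be paused during a test with
#   ceph osd set noscrub
#   ceph osd set nodeep-scrub
# and re-enabled afterwards with "ceph osd unset noscrub" / "ceph osd unset nodeep-scrub".
```

On the real nodes this would be run as root against /etc/sysctl.conf on each of the 3 nodes, followed by the reboot mentioned above.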
sds80
14 Posts
April 26, 2018, 9:56 am
Installed a 3-node Ceph cluster on Ubuntu (VERSION="14.04.5 LTS, Trusty Tahr"):
root@peta1:~# ceph -v
ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)
root@peta1:~# cat /etc/ceph/ceph.conf
[global]
fsid = 205a5603-0d41-4b2e-89e1-d79f10a26ec1
mon_initial_members = peta1, peta2, peta3
mon_host = 192.168.120.230,192.168.120.231,192.168.120.232
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
public_network = 192.168.120.0/24
cluster_network = 10.10.2.0/24
Then install tgt:
# apt install tgt
# rbd create -p rbd rbd1 --size 4096 --name client.admin --image-feature layering
# echo '<target virtual-ceph:iscsi>
driver iscsi
bs-type rbd
backing-store rbd/rbd1 # Format: <pool_name>/<rbd_image_name>
initiator-address ALL
</target>' > /etc/tgt/conf.d/ceph.conf
# systemctl restart tgt
Then I connected the Ceph disk via an iSCSI client (Windows 7 in a VM) and copied a large 13Gb file and 5000 small files (2Gb in total).
The average speed was 37Mb/s and 24Mb/s respectively.
So the speed is significantly higher on a clean Ceph installation than on PetaSAN.
I think there is some network configuration problem on PetaSAN.
Maybe you can help me find the reason for the bad write performance on the PetaSAN installation.
PS: I have 64 PGs in the default ceph-deploy installation, but 256 PGs in the default PetaSAN install. Could this affect write performance? How can I change the PG count to 256 and test it?
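On the PG question: a common Ceph rule of thumb is (number of OSDs x 100) / replica count, rounded up to a power of two. A minimal sketch of that arithmetic follows; the function name is hypothetical, and the 6 OSDs and 3 replicas are assumptions about this cluster, not values confirmed in the thread.

```python
def suggested_pg_count(num_osds, replicas, target_per_osd=100):
    """Rule of thumb: (OSDs * target_per_osd) / replicas, rounded up to a power of two."""
    raw = num_osds * target_per_osd / replicas
    pg = 1
    while pg < raw:
        pg *= 2
    return pg

# Assumed layout, not confirmed in the thread: 3 nodes x 2 OSDs, 3 replicas.
print(suggested_pg_count(num_osds=6, replicas=3))
```

With those assumed inputs the result is 256. On an existing pool the count is raised with `ceph osd pool set rbd pg_num 256` followed by `ceph osd pool set rbd pgp_num 256`; on this Ceph release the value can be increased but not reduced afterwards.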
admin
2,930 Posts
Quote from admin on April 26, 2018, 10:48 am
I suspect the speed gain you see is probably due to write-back caching in your setup. This allows the iSCSI target to cache many I/Os and submit them in large batches. Although this gives much better write performance, it does not allow iSCSI high availability: if a node dies while caching data, the failover path does not have this cached data, which will probably lead to data corruption. A cache can be used in a single-node, non-HA setup to speed up writes, but there is always potential for data loss (rather than corruption). Most HA iSCSI SANs do not cache data; the client can still cache at its own end (via the client OS setting) if desired.
To see if this is the case, can you rerun the same benchmark tests I posted earlier so we can compare the 2 setups:
Can you run the following benchmarks:
# 4k rand 1 thread
rbd bench-write -c /opt/petasan/config/etc/ceph/clusterx.conf image-00001 --io-total 5368709200 --io-threads=1 --rbd_cache=false --io-pattern rand --io-size 4K
# 4k seq 1 thread
rbd bench-write -c /opt/petasan/config/etc/ceph/clusterx.conf image-00001 --io-total 5368709200 --io-threads=1 --rbd_cache=false --io-pattern seq --io-size 4K
# 64k rand 1 thread
rbd bench-write -c /opt/petasan/config/etc/ceph/clusterx.conf image-00001 --io-total 5368709200 --io-threads=1 --rbd_cache=false --io-pattern rand --io-size 64K
# 64k seq 1 thread
rbd bench-write -c /opt/petasan/config/etc/ceph/clusterx.conf image-00001 --io-total 5368709200 --io-threads=1 --rbd_cache=false --io-pattern seq --io-size 64K
# 512k rand 1 thread
rbd bench-write -c /opt/petasan/config/etc/ceph/clusterx.conf image-00001 --io-total 5368709200 --io-threads=1 --rbd_cache=false --io-pattern rand --io-size 512K
# 512k seq 1 thread
rbd bench-write -c /opt/petasan/config/etc/ceph/clusterx.conf image-00001 --io-total 5368709200 --io-threads=1 --rbd_cache=false --io-pattern seq --io-size 512K
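The write-back caching concern described in this post can be illustrated with a toy model. It is a sketch only: `Backend` and `WriteBackTarget` are hypothetical classes, not tgt or PetaSAN code, and the flush threshold of 4 is arbitrary. The point is that writes acknowledged from a node's RAM cache are not visible to the failover path until they are flushed.

```python
class Backend:
    """Stands in for the shared storage that both iSCSI paths can see."""
    def __init__(self):
        self.blocks = {}

class WriteBackTarget:
    """Toy target that acknowledges writes from RAM and flushes them later in a batch."""
    def __init__(self, backend, flush_every=4):
        self.backend = backend
        self.cache = {}
        self.flush_every = flush_every

    def write(self, lba, data):
        self.cache[lba] = data          # the client gets its acknowledgement here
        if len(self.cache) >= self.flush_every:
            self.flush()
        return "ack"

    def flush(self):
        self.backend.blocks.update(self.cache)
        self.cache.clear()

backend = Backend()
node_a = WriteBackTarget(backend)
acked = []
for lba in range(6):
    node_a.write(lba, "v1")
    acked.append(lba)

# node_a dies here and its cache is gone; the failover path only sees the backend.
lost = [lba for lba in acked if lba not in backend.blocks]
print("acknowledged but missing after failover:", lost)
```

The first four writes reach the backend in one batch, which is where the speed gain comes from; the last two were acknowledged but never flushed.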
sds80
14 Posts
Quote from sds80 on April 26, 2018, 10:58 am
root@peta1:~# rbd bench-write rbd1 --io-total 5368709200 --io-threads=1 --rbd_cache=false --io-pattern rand --io-size 4K
bench-write io_size 4096 io_threads 1 bytes 5368709200 pattern random
SEC OPS OPS/SEC BYTES/SEC
1 185 185.60 760199.57
2 452 226.26 926767.51
3 642 214.09 876912.92
4 675 168.07 688402.59
5 737 147.39 603705.05
6 787 117.71 482141.94
7 959 101.40 415327.21
8 1237 119.08 487755.94
9 1483 162.25 664572.69
10 1531 156.84 642414.05
11 1553 156.58 641336.05
12 1607 121.24 496616.35
13 1682 88.95 364325.39
...
10471 1309143 137.28 562293.52
10472 1309146 123.31 505060.85
10473 1309204 103.52 424038.37
10474 1309355 121.79 498843.06
10475 1309644 120.02 491586.80
10476 1309936 169.80 695496.63
10477 1310082 198.39 812608.66
10478 1310125 195.71 801646.84
10479 1310144 155.86 638384.17
10480 1310198 110.81 453877.72
10481 1310240 59.37 243180.55
10482 1310495 82.57 338220.69
elapsed: 10482 ops: 1310721 ops/sec: 125.04 bytes/sec: 512145.26
root@peta1:~# rbd bench-write rbd1 --io-total 5368709200 --io-threads=1 --rbd_cache=false --io-pattern seq --io-size 4K
bench-write io_size 4096 io_threads 1 bytes 5368709200 pattern sequential
SEC OPS OPS/SEC BYTES/SEC
1 296 296.66 1215102.36
2 513 250.26 1025072.81
3 537 177.79 728208.40
4 643 160.95 659251.50
5 936 187.32 767255.63
6 1233 187.42 767684.74
7 1481 195.64 801356.90
8 1706 234.99 962499.43
9 2003 272.02 1114200.73
10 2301 273.11 1118663.10
...
4315 916228 246.88 1011231.24
4316 916416 243.69 998161.25
4317 916663 229.85 941453.40
4318 916843 228.54 936079.70
4319 917148 228.53 936073.19
4320 917399 234.09 958819.37
4321 917694 255.47 1046409.07
4322 917779 233.29 955552.95
4323 917804 191.94 786189.52
4324 917993 169.04 692379.18
4325 918267 173.62 711157.08
4326 918426 146.44 599813.67
4327 918647 173.53 710777.63
4328 918802 194.18 795341.26
4329 919041 209.55 858328.68
4330 919230 192.59 788838.80
4331 919393 193.35 791941.31
4332 919696 210.50 862210.08
4333 919879 217.30 890053.22
4334 919902 171.88 704039.11
...
root@peta1:~# rbd bench-write rbd1 --io-total 5368709200 --io-threads=1 --rbd_cache=false --io-pattern rand --io-size 64K
bench-write io_size 65536 io_threads 1 bytes 5368709200 pattern random
SEC OPS OPS/SEC BYTES/SEC
1 166 166.83 10933627.78
2 288 143.38 9396747.12
3 344 114.66 7514448.23
4 416 103.88 6807838.98
5 477 88.14 5776136.33
6 478 62.35 4086490.05
7 628 68.19 4469135.09
8 771 85.44 5599120.90
9 858 88.63 5808722.21
10 949 103.10 6756435.30
...
1163 81140 62.12 4070906.03
1164 81281 88.65 5809854.40
1165 81445 116.34 7624684.04
1166 81522 95.26 6242669.24
1167 81539 85.99 5635260.96
1168 81584 89.18 5844568.72
1169 81648 73.44 4812686.26
1170 81708 48.63 3186805.93
1171 81736 43.44 2847144.95
1172 81802 52.82 3461659.70
1173 81838 50.76 3326898.27
elapsed: 1173 ops: 81921 ops/sec: 69.78 bytes/sec: 4573115.29
root@peta1:~# rbd bench-write rbd1 --io-total 5368709200 --io-threads=1 --rbd_cache=false --io-pattern seq --io-size 64K
bench-write io_size 65536 io_threads 1 bytes 5368709200 pattern sequential
SEC OPS OPS/SEC BYTES/SEC
1 173 173.67 11381469.40
2 355 177.66 11642828.14
3 526 175.59 11507649.76
4 651 160.48 10517126.37
5 677 134.71 8828572.31
6 772 119.71 7845543.57
7 949 118.81 7786373.45
8 1118 118.35 7756218.99
9 1272 125.59 8230497.64
10 1300 124.63 8167493.45
...
810 80908 65.83 4314111.22
811 81082 90.00 5898082.32
812 81215 110.61 7249218.68
813 81244 112.97 7403300.63
814 81291 110.71 7255437.00
815 81341 86.51 5669497.22
816 81416 66.74 4373813.41
817 81477 52.59 3446677.39
818 81645 80.61 5283157.27
819 81813 104.43 6843952.03
elapsed: 819 ops: 81921 ops/sec: 99.95 bytes/sec: 6550433.96
root@peta1:~# rbd bench-write rbd1 --io-total 5368709200 --io-threads=1 --rbd_cache=false --io-pattern rand --io-size 512K
bench-write io_size 524288 io_threads 1 bytes 5368709200 pattern random
SEC OPS OPS/SEC BYTES/SEC
1 45 45.81 24016994.96
2 91 45.70 23957793.54
3 130 43.58 22847305.26
4 147 36.47 19119609.19
5 167 33.43 17528969.81
6 179 26.57 13931251.31
7 190 19.20 10063983.33
8 194 12.43 6518416.87
9 216 13.53 7092166.23
10 224 11.22 5880899.73
...
574 10075 9.40 4930013.54
575 10079 8.33 4366731.21
576 10101 10.75 5635610.00
577 10115 11.18 5860530.36
578 10153 17.25 9044775.40
579 10196 25.31 13269452.33
580 10221 26.95 14127067.94
581 10230 25.69 13470566.54
582 10240 25.13 13174435.25
elapsed: 582 ops: 10241 ops/sec: 17.59 bytes/sec: 9222399.90
root@peta1:~# rbd bench-write rbd1 --io-total 5368709200 --io-threads=1 --rbd_cache=false --io-pattern seq --io-size 512K
bench-write io_size 524288 io_threads 1 bytes 5368709200 pattern sequential
SEC OPS OPS/SEC BYTES/SEC
1 51 51.70 27103074.15
2 103 51.53 27017638.08
3 135 45.32 23760413.10
4 155 38.85 20366603.85
5 183 36.58 19177765.89
6 226 35.01 18354715.55
7 272 33.86 17751662.58
8 316 36.12 18938618.74
9 338 36.65 19212577.38
10 361 35.18 18444806.03
...
359 10045 13.69 7176961.19
360 10051 12.80 6711485.82
361 10062 11.02 5778292.40
362 10087 13.91 7293660.81
363 10109 15.91 8339015.02
364 10124 16.31 8551990.29
365 10145 19.19 10060962.03
366 10169 21.41 11226375.33
367 10186 19.89 10429246.53
368 10223 22.80 11953390.80
elapsed: 368 ops: 10241 ops/sec: 27.80 bytes/sec: 14573487.85
root@peta1:~# rbd bench-write rbd1 --io-total 5368709200 --io-threads=1 --rbd_cache=false --io-pattern rand --io-size 4M
bench-write io_size 4194304 io_threads 1 bytes 5368709200 pattern random
SEC OPS OPS/SEC BYTES/SEC
1 7 6.97 29240480.92
2 12 6.47 27147232.46
3 17 5.70 23915718.98
4 22 5.69 23860058.47
5 27 5.29 22181822.83
6 30 4.61 19328294.43
7 34 4.34 18203876.61
8 39 4.13 17339788.66
9 41 3.80 15928710.13
10 48 4.33 18149676.52
...
312 1243 4.85 20356648.71
313 1247 4.96 20787766.09
314 1252 4.67 19605427.32
315 1255 4.63 19435600.75
316 1258 4.26 17860833.76
317 1264 4.12 17286140.28
318 1267 4.02 16874536.76
319 1273 4.33 18152169.76
320 1277 4.54 19024705.15
elapsed: 321 ops: 1281 ops/sec: 3.99 bytes/sec: 16735217.94
The results are similar (although the previous installation was SSD-only and the current one is 2 HDDs (OSD) + 1 SSD (journal)).
So the conclusion is: use more expensive hardware?
Is there no more room to improve performance?
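To make the runs above easier to compare, the summary lines can be reduced to MB/s and an approximate time per write. This is a rough sketch: `summarize` is a hypothetical helper, and the ms/op figure assumes that `--io-threads=1` means a single write in flight, so that time per operation is about 1 / (ops/sec).

```python
import re

# Summary lines copied from the runs above (the 4K seq run was posted without one).
SUMMARIES = {
    "4K rand":   "elapsed: 10482 ops: 1310721 ops/sec: 125.04 bytes/sec: 512145.26",
    "64K rand":  "elapsed: 1173 ops: 81921 ops/sec: 69.78 bytes/sec: 4573115.29",
    "64K seq":   "elapsed: 819 ops: 81921 ops/sec: 99.95 bytes/sec: 6550433.96",
    "512K rand": "elapsed: 582 ops: 10241 ops/sec: 17.59 bytes/sec: 9222399.90",
    "512K seq":  "elapsed: 368 ops: 10241 ops/sec: 27.80 bytes/sec: 14573487.85",
    "4M rand":   "elapsed: 321 ops: 1281 ops/sec: 3.99 bytes/sec: 16735217.94",
}

LINE = re.compile(
    r"elapsed:\s*(\d+)\s+ops:\s*(\d+)\s+ops/sec:\s*([\d.]+)\s+bytes/sec:\s*([\d.]+)"
)

def summarize(line):
    """Turn one 'elapsed: ...' summary line into ops/s, MB/s (decimal) and ms per op."""
    m = LINE.search(line)
    if m is None:
        raise ValueError("not a bench-write summary line: %r" % line)
    ops_per_sec = float(m.group(3))
    bytes_per_sec = float(m.group(4))
    return {
        "ops_per_sec": ops_per_sec,
        "mb_per_sec": bytes_per_sec / 1e6,
        "ms_per_op": 1000.0 / ops_per_sec,   # valid only with one write in flight
    }

for name, line in SUMMARIES.items():
    s = summarize(line)
    print("%-9s %8.2f ops/s %7.2f MB/s %8.2f ms/op"
          % (name, s["ops_per_sec"], s["mb_per_sec"], s["ms_per_op"]))
```

Read this way, the 4K random run works out to roughly 8 ms per write, which points at per-operation latency rather than bandwidth as the limit for small writes.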
admin
2,930 Posts
Quote from admin on April 27, 2018, 4:35 pm
PetaSAN works best with good hardware. It will give you the best performance with all flash. Even with HDDs, if you use a controller with write-back cache and SSD journals you will get good performance.
In your case, the numbers are lower than normal even for pure spinning disks and a 1G network. As suggested, I would recommend trying different hardware and a different network, even if it is low end. Of course, higher-end hardware, if you have it, will be better.
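As a rough sanity check on the "1G network" remark, the best throughput figures posted earlier in the thread can be set against the raw line rate of a 1 Gbit/s link. This is a sketch under simplifying assumptions: the function name is hypothetical, protocol overhead is ignored, and replication traffic on the backend network is not counted.

```python
# 1 Gbit/s expressed in bytes per second, before any protocol overhead.
GIGABIT_LINE_RATE_BYTES = 1_000_000_000 / 8

def fraction_of_line_rate(bytes_per_sec, line_rate=GIGABIT_LINE_RATE_BYTES):
    """Share of the raw link rate that a measured throughput represents."""
    return bytes_per_sec / line_rate

# bytes/sec from the 4M random and 512K sequential summary lines posted earlier.
for name, bps in (("4M rand", 16735217.94), ("512K seq", 14573487.85)):
    print("%-8s %5.1f%% of 1 Gbit/s" % (name, 100 * fraction_of_line_rate(bps)))
```

Both runs come out well under a fifth of the raw link rate, so on these numbers the client-facing link does not look saturated.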