mix different sizes of SSD
exitsys
43 Posts
October 10, 2020, 6:55 am
The CPU and memory utilization are at about 0%.
Network utilization is at 2%, with throughput at 20 MB/s.
Everything is idle. Nevertheless, the overnight rebuild is still showing degraded data redundancy: 2267002/11139717 objects degraded (20.351%), 323 pgs degraded, 323 pgs undersized.
Is there anything wrong with this? I had 6 x 960 GB drives in it and exchanged them for 4 x 1920 GB.
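For readers following along: the degraded counts quoted above come straight from Ceph's PG map, and the same numbers can be polled outside the PetaSAN UI. Below is a minimal sketch that parses `ceph status --format json`; it assumes the `ceph` CLI and a readable keyring are available on the node, and the pgmap field names may vary slightly between Ceph releases.

```python
#!/usr/bin/env python3
"""Poll Ceph recovery progress by parsing `ceph status --format json`.

Assumptions: the `ceph` CLI is installed and the local keyring allows
read access; the pgmap field names below can differ between releases.
"""
import json
import subprocess


def recovery_progress():
    raw = subprocess.check_output(["ceph", "status", "--format", "json"])
    pgmap = json.loads(raw).get("pgmap", {})

    degraded = pgmap.get("degraded_objects", 0)
    total = pgmap.get("degraded_total", 0) or 1
    ratio = pgmap.get("degraded_ratio", degraded / total)

    # Recovery throughput, if Ceph is reporting it at the moment of the call.
    rate = pgmap.get("recovering_bytes_per_sec", 0)

    print(f"degraded: {degraded}/{total} objects ({ratio:.3%})")
    print(f"recovery rate: {rate / 1e6:.1f} MB/s")


if __name__ == "__main__":
    recovery_progress()
```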
admin
2,930 Posts
October 10, 2020, 8:52 am
It could well be that my load guess was wrong; it depends entirely on the environment. You have two issues:
Your current recovery/backfill speed is slow and your cluster is not loaded: set the backfill speed to "average" in the UI, observe the load charts as you did earlier, and include the disk % utilization/busy on all nodes. If all is OK after 1-2 hours, you can switch it to "fast" and then maybe "very fast" later. You can look at the PG Status chart for an estimate of when things will complete. (A sketch of the kind of Ceph options such a preset maps to follows below.)
Your earlier issue of iSCSI dropping: it could be something other than load, but that is hard for me to guess. I would look at the same load charts at the time of the iSCSI issue: was the load also low? Look at the PG Status charts: were any PGs inactive or down?
Last edited on October 10, 2020, 8:53 am by admin · #12
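The "backfill speed" presets in the PetaSAN UI are cluster-wide throttles on Ceph's recovery traffic. The exact values each preset applies are not shown in this thread, so the mapping below is an assumption for illustration only; what it does show is the kind of standard Ceph OSD options such a preset typically adjusts (`osd_max_backfills`, `osd_recovery_max_active`, `osd_recovery_sleep`), driven through the normal `ceph config set` interface.

```python
#!/usr/bin/env python3
"""Sketch: adjust Ceph recovery/backfill throttles.

The preset table is hypothetical; the real "slow/average/fast/very fast"
values are managed by the PetaSAN UI and may be overridden by it. This only
illustrates where the knobs live.
"""
import subprocess

# Hypothetical presets: higher concurrency and shorter sleep = faster
# recovery, at the cost of more load on the OSDs and the backend network.
PRESETS = {
    "slow":    {"osd_max_backfills": 1, "osd_recovery_max_active": 1, "osd_recovery_sleep": 0.5},
    "average": {"osd_max_backfills": 2, "osd_recovery_max_active": 3, "osd_recovery_sleep": 0.1},
    "fast":    {"osd_max_backfills": 4, "osd_recovery_max_active": 6, "osd_recovery_sleep": 0.0},
}


def apply_preset(name):
    for option, value in PRESETS[name].items():
        subprocess.check_call(["ceph", "config", "set", "osd", option, str(value)])
        print(f"set {option} = {value}")


if __name__ == "__main__":
    apply_preset("average")  # start conservatively, as advised above
```

If the UI already manages these settings, changing them directly could be reverted on the next config push, so treat this purely as an illustration of the underlying options.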
exitsys
43 Posts
October 10, 2020, 1:32 pm
Hi, I have now set the backfill speed to "very fast".
As I said, I swapped all SSDs in the third node yesterday.
Node 1
CPU approx. 3%
Memory approx. 19%
Disk utilization about 9% on some disks
Network utilization approx. 7%
Node 2
CPU approx. 12%
Memory approx. 23%
Disk utilization between 7% and 36% on some disks
Network utilization approx. 97%
Node 3
CPU approx. 13%
Memory approx. 20%
Disk utilization about 66% on some disks
Network utilization approx. 92%
In the time it has taken me to write this, Ceph health has gone from 15% degraded to 1.2%, so I would say it is moving pretty damn fast now.
The bottleneck is now the backend network. I have it on active-backup. Maybe I should set it to balance-alb?
Last edited on October 10, 2020, 1:37 pm by exitsys · #13
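Before switching bond modes, it can help to confirm what the kernel is actually running: with active-backup, only one slave link carries traffic at a time, whereas balance-alb or LACP can spread Ceph's many backfill connections across both links. A minimal sketch that reads the bonding driver's status file follows; the interface name `bond0` is an assumption and should be replaced with the backend bond name used on the PetaSAN nodes.

```python
#!/usr/bin/env python3
"""Sketch: report the Linux bonding mode and active slave.

Reads /proc/net/bonding/<iface>, which the bonding driver exposes on any
node that has a bond configured. "bond0" is an assumed interface name.
"""
from pathlib import Path


def bond_summary(iface="bond0"):
    text = Path(f"/proc/net/bonding/{iface}").read_text()
    for line in text.splitlines():
        # These lines show the mode, which slave is active, and link health.
        if line.startswith(("Bonding Mode:", "Currently Active Slave:",
                            "Slave Interface:", "MII Status:")):
            print(line.strip())


if __name__ == "__main__":
    bond_summary()
```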
admin
2,930 Posts
October 10, 2020, 3:04 pm
Yes, it makes a big difference. I would advise that you do not set the speed above "average" except temporarily, in cases like this, and only while you monitor the load. I would recommend LACP if your switches support MLAG. For the iSCSI issue, I recommend you look at the second point posted earlier. Good luck.
Last edited on October 10, 2020, 3:08 pm by admin · #14