Cluster stopped, 1 osd is full
wid
47 Posts
March 7, 2021, 1:09 pm
Hi,
I don't know why, but the cluster stopped because one OSD is full.
But why? The balancer is enabled. One OSD is 95% full while another is only 43% full.
It's a 3-node cluster, but the data is only on 2 nodes (ceph01, ceph02); after moving the data we will add disks to ceph03.
How can I rebalance this?
admin
2,930 Posts
March 7, 2021, 2:58 pm
You can check the balancer status:
ceph balancer status
but if the cluster has an error, the balancer will not function.
It is best to add an extra OSD to the node if you can. In all cases, do the following:
From the Maintenance tab, OSD CRUSH Weights, reduce the weight of OSD 16 by 10% (for example 8 TB -> 7.2).
Bump the max full ratios (careful, not too much):
ceph osd dump | grep full
ceph osd set-full-ratio 0.96
ceph osd set-backfillfull-ratio 0.955
ceph osd dump | grep full
systemctl restart ceph-osd@16
If the rebalance is still stuck, on all nodes do:
systemctl restart ceph-osd.target
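For reference, doing the same crush weight change from the CLI instead of the Maintenance tab would look roughly like the sketch below; osd.16 and the value 3.3 are only examples (about 10% below a ~3.6 weight), pick the value for your own disk:
# show current crush weights and per-OSD utilization
ceph osd df tree
# lower osd.16's crush weight by roughly 10% (example value only)
ceph osd crush reweight osd.16 3.3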
wid
47 Posts
March 7, 2021, 6:05 pm
Thank you,
I did "ceph osd out 16", then the cluster went to "warning" and the data was moved;
then I ran "ceph osd reweight-by-utilization" and data was moved to other OSDs.
But after some time, the cluster set the reweight back to 1.0 again.
The main question is: why did this happen?
Why does Ceph send all the data to one OSD (without any reweight, and with auto rebalance active)?
admin
2,930 Posts
March 7, 2021, 7:34 pm
I am not sure why you would set OSD 16 to out; I would recommend changing its crush weight as per the previous post. Also, as indicated, the auto-rebalance will not function if the cluster is in error, and it can be monitored via the status command. I would recommend, once the cluster is healthy, monitoring the balancer to see if it is working.
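Once the cluster is back to HEALTH_OK, checking and (if needed) enabling the balancer could look like the sketch below; the upmap mode is only an assumption, use whatever mode your Ceph release supports:
ceph balancer status
# if it reports "active": false, enabling it looks like:
ceph balancer mode upmap
ceph balancer on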
wid
47 Posts
March 8, 2021, 10:48 am
Because it makes the cluster work again immediately and starts moving PGs off this OSD.
I ran ceph osd reweight-by-utilization several times
and the OSDs were almost equal, but overnight the cluster stopped again and the 16th OSD is full again.
Why? The cluster has 62% free space...
admin
2,930 Posts
March 8, 2021, 1:36 pm
Have you lowered its crush weight as per the first post?
wid
47 Posts
March 8, 2021, 1:46 pm
Yes,
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
0 hdd 5.45699 1.00000 5.5 TiB 2.8 TiB 2.8 TiB 144 KiB 5.5 GiB 2.6 TiB 51.50 0.84 32 up
2 hdd 5.35699 0.93358 5.5 TiB 4.5 TiB 4.5 TiB 28 KiB 7.3 GiB 1021 GiB 81.73 1.33 27 up
4 hdd 5.45699 1.00000 5.5 TiB 3.5 TiB 3.5 TiB 56 KiB 6.7 GiB 1.9 TiB 64.89 1.06 32 up
6 hdd 5.45699 1.00000 5.5 TiB 2.5 TiB 2.5 TiB 80 KiB 5.1 GiB 3.0 TiB 45.09 0.73 33 up
10 hdd 3.63799 1.00000 3.6 TiB 1.9 TiB 1.9 TiB 36 KiB 4.1 GiB 1.7 TiB 52.52 0.85 22 up
12 hdd 3.63799 1.00000 3.6 TiB 2.1 TiB 2.1 TiB 56 KiB 4.4 GiB 1.5 TiB 58.32 0.95 23 up
14 hdd 3.63799 1.00000 3.6 TiB 2.1 TiB 2.1 TiB 120 KiB 4.4 GiB 1.5 TiB 58.74 0.96 23 up
16 hdd 3.03799 0.93358 3.6 TiB 3.2 TiB 3.2 TiB 52 KiB 5.6 GiB 414 GiB 88.89 1.45 16 up
1 hdd 5.45699 1.00000 5.5 TiB 3.6 TiB 3.6 TiB 112 KiB 6.4 GiB 1.9 TiB 65.42 1.06 31 up
3 hdd 5.45699 1.00000 5.5 TiB 3.5 TiB 3.5 TiB 48 KiB 6.4 GiB 1.9 TiB 64.47 1.05 31 up
5 hdd 5.45699 1.00000 5.5 TiB 3.8 TiB 3.8 TiB 104 KiB 7.5 GiB 1.6 TiB 70.06 1.14 30 up
7 hdd 5.45699 1.00000 5.5 TiB 2.8 TiB 2.8 TiB 148 KiB 5.9 GiB 2.6 TiB 51.60 0.84 32 up
9 hdd 3.63799 1.00000 3.6 TiB 1.6 TiB 1.6 TiB 52 KiB 4.2 GiB 2.0 TiB 44.94 0.73 21 up
11 hdd 3.63799 1.00000 3.6 TiB 2.3 TiB 2.3 TiB 81 KiB 4.4 GiB 1.3 TiB 64.59 1.05 21 up
13 hdd 3.63799 1.00000 3.6 TiB 2.4 TiB 2.4 TiB 124 KiB 4.6 GiB 1.2 TiB 66.18 1.08 22 up
15 hdd 3.63799 1.00000 3.6 TiB 1.9 TiB 1.9 TiB 52 KiB 4.3 GiB 1.7 TiB 52.51 0.85 20 up
TOTAL 73 TiB 45 TiB 45 TiB 1.3 MiB 87 GiB 28 TiB 61.44
admin
2,930 Posts
March 8, 2021, 3:42 pm
The %use looks better, but OSD 16 still has appreciably higher usage (89%) even though the PG count assigned to it is 16, which is below the average, so it seems some PGs are storing more data than others. This is partly bad luck, since CRUSH is not truly random, but also the total number of PGs is low, which makes things harder to balance.
You have 208 PGs in 3 pools. Can you specify how many PGs and how many replicas each pool has? Increasing the PGs will help rebalance things: assign more PGs to the pools used more heavily and fewer PGs to the pools not used often.
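A quick way to answer the per-pool question above; the pool names are not shown in this thread, so <pool-name> below is a placeholder:
# pg_num and replica size for every pool in one listing
ceph osd pool ls detail
# or per pool:
ceph osd pool get <pool-name> pg_num
ceph osd pool get <pool-name> size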
wid
47 Posts
March 8, 2021, 3:47 pm
3 pools:
1) 16 PGs
2) 128 PGs
3) 64 PGs
This is the result of the Ceph autoscaler; it brought the PGs down from almost 1024 for each pool. It took almost 12 days.
Every pool has 2 replicas, but after moving the data off srv03 we will add it back as ceph03 with 3 replicas.
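To see why the autoscaler shrank the pools and what PG counts it currently targets, a read-only check (assuming a Nautilus or later release, since the autoscaler is in use):
ceph osd pool autoscale-status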
admin
2,930 Posts
March 8, 2021, 8:18 pm
I think the low number of PGs is an issue; I am not sure why the autoscaler reduced it. I would bump them again to 1024 total. I am not sure how you are using all 3 pools, but try to give the pool with the most usage the larger share of the PGs.
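Bumping the PGs back up as suggested could look like the sketch below. The pool names and the 512/256/256 split are placeholders, not a recommendation from this thread, and switching the autoscaler to warn is an assumption so it does not shrink the pools again (on releases before Nautilus you would also set pgp_num to match):
ceph osd pool set <pool-1> pg_autoscale_mode warn
ceph osd pool set <pool-2> pg_autoscale_mode warn
ceph osd pool set <pool-3> pg_autoscale_mode warn
ceph osd pool set <pool-1> pg_num 512
ceph osd pool set <pool-2> pg_num 256
ceph osd pool set <pool-3> pg_num 256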