OSD balance issue
petasanrd911
19 Posts
July 15, 2021, 9:12 pm
I am seeing an imbalance issue. I have the Ceph balancer enabled and set to upmap. Recently a warning showed up that osd.24 is almost full and that 5 pools are nearly full. The OSD and node lists are below. Trying to understand the storage, it looks like my main pool cephfs_data isn't showing all the available storage?
What do I need to do to get the OSDs balanced? I do know that Node5 is one 3.64 TB drive short and I will be adding one soon to keep the nodes even, but I wouldn't think that causes the OSDs to be so unbalanced.
ceph health detail
HEALTH_WARN 1 nearfull osd(s); 5 pool(s) nearfull
[WRN] OSD_NEARFULL: 1 nearfull osd(s)
osd.24 is near full
[WRN] POOL_NEARFULL: 5 pool(s) nearfull
pool 'rbd' is nearfull
pool 'cephfs_data' is nearfull
pool 'cephfs_metadata' is nearfull
pool 'plexdata' is nearfull
pool 'device_health_metrics' is nearfull
Storage shows 93.36 TB / 160.1 TB (58.31%)
Pools
Name Type Usage PGs Size Min Size Rule Name Used Space Available Space Active OSDs
cephfs_data replicated cephfs 128 3 2 replicated_rule 93.21 TB 4.3 TB 34
cephfs_metadata replicated cephfs 64 3 2 replicated_rule 1022.74 MB 4.3 TB 34
device_health_metrics replicated mgr_devicehealth 1 3 2 replicated_rule 7.95 MB 4.3 TB 3
plexdata EC cephfs 128 3 2 ec-by-host-hdd 0 Bytes 8.59 TB 34
rbd replicated rbd 128 3 2 replicated_rule 3.94 KB 4.3 TB 34
Node1
disk size OSD Usage
sdc 3.64 TB OSD33 40%
sdd 3.64 TB OSD23 40%
sde 3.64 TB OSD20 74%
sdf 5.46 TB OSD12 76%
sdg 5.46 TB OSD13 80%
sdh 5.46 TB OSD2 44%
sdi 5.46 TB OSD3 62%
Node2
disk size OSD Usage
sdc 3.64 TB OSD32 73%
sdd 3.64 TB OSD24 87%
sde 5.46 TB OSD14 58%
sdf 5.46 TB OSD19 67%
sdg 3.64 TB OSD28 60%
sdh 5.46 TB OSD4 67%
sdi 5.46 TB OSD5 58%
Node3
disk size OSD Usage
sdc 3.64 TB OSD31 47%
sdd 3.64 TB OSD26 60%
sde 3.64 TB OSD25 47%
sdf 5.46 TB OSD15 71%
sdg 5.46 TB OSD16 49%
sdh 5.46 TB OSD0 40%
sdi 5.46 TB OSD1 58%
Node4
disk size OSD Usage
sdc 3.64 TB OSD30 60%
sdd 3.64 TB OSD22 80%
sde 3.64 TB OSD27 33%
sdf 5.46 TB OSD17 58%
sdg 5.46 TB OSD18 49%
sdh 5.46 TB OSD6 53%
sdi 5.46 TB OSD7 45%
Node5
disk size OSD Usage
sdb 3.64 TB OSD29 47%
sdc 3.64 TB OSD21 40%
sde 5.46 TB OSD8 54%
sdf 5.46 TB OSD9 80%
sdg 5.46 TB OSD10 71%
sdh 5.46 TB OSD11 45%
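For reference, the same per-OSD fill levels, together with each OSD's CRUSH weight, reweight, PG count and variance from the mean, can be listed directly with the standard Ceph CLI:
ceph osd df tree   # per-OSD SIZE, %USE, VAR and PGS, grouped by host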
admin
2,930 Posts
July 15, 2021, 9:40 pm
It does not seem the balancer is working. You can check with:
ceph balancer status
For now you can lower the CRUSH weight of osd.24 to 3.0 (TB) from the Maintenance tab.
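If you would rather do it from the CLI than the Maintenance tab, the equivalent command should be something like the following (osd.24 and the 3.0 weight are simply the values suggested above):
ceph osd crush reweight osd.24 3.0   # lower osd.24's CRUSH weight from ~3.64 to 3.0 so it receives fewer PGs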
petasanrd911
19 Posts
July 15, 2021, 10:53 pm
ceph balancer status
{
"active": true,
"last_optimize_duration": "0:00:00.002294",
"last_optimize_started": "Thu Jul 15 15:52:05 2021",
"mode": "upmap",
"optimize_result": "Unable to find further optimization, or pool(s) pg_num is decreasing, or distribution is already perfect",
"plans": []
}
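A side note on why the balancer may report no plans (this is general upmap behaviour, not something confirmed for this cluster): the upmap balancer only moves PGs when an OSD's PG count deviates from the mean by more than the mgr/balancer/upmap_max_deviation setting, so with relatively few PGs per OSD it can declare the distribution "already perfect" even though %USE varies widely. The threshold can be inspected and tightened with:
ceph config get mgr mgr/balancer/upmap_max_deviation
ceph config set mgr mgr/balancer/upmap_max_deviation 1   # allow at most 1 PG of deviation per OSD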
petasanrd911
19 Posts
July 17, 2021, 12:21 am
Is there anything I can do to get the OSDs to properly balance? I adjusted the weight on osd.24 as suggested and the warning cleared, but other OSDs are still out of balance. I do see that there are no "plans" listed when I check the balancer status.
it3
4 Posts
August 9, 2021, 2:24 pm
If you increase the number of PGs, it improves balancing. Ideally you want about 100 to 150 PGs per OSD.
(30 OSDs x 100) / 3 = 1000 PGs across your pools.
You have 128 + 128 + 128 + 64 + 1 = 449 PGs.
I've not used the autobalancer.
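Plugging the numbers from this cluster into that rule of thumb (34 OSDs rather than 30, with almost all of the data in the size-3 cephfs_data pool) gives roughly (34 x 100) / 3 ≈ 1133 PGs in total, so most of the increase would go to the main data pool. A sketch of the change, assuming the data stays mostly in cephfs_data:
ceph osd pool set cephfs_data pg_num 512   # or 1024, depending on how the other pools are sized; from Nautilus on, pgp_num follows automatically
Raising pg_num causes PGs to split and data to move, so it is usually increased in steps and while the cluster has free capacity.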