OSD balance issue

I am seeing an imbalance issue. I have the Ceph balancer enabled and set to upmap. Recently a warning showed up that osd.24 is almost full and that 5 pools are almost full. The OSD and node lists are below. Trying to understand the storage, it looks like my main pool, cephfs_data, isn't showing all the available storage?

What do I need to do to get the OSDs balanced? I know that Node5 is one 3.64 TB drive short and I will be adding one soon to keep the nodes even, but I wouldn't think that causes the OSDs to be so unbalanced.
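For reference, I enabled the balancer with the standard commands, roughly like this (upmap mode also requires all clients to be Luminous or newer):

ceph osd set-require-min-compat-client luminous
ceph balancer on
ceph balancer mode upmap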

ceph health detail
HEALTH_WARN 1 nearfull osd(s); 5 pool(s) nearfull
[WRN] OSD_NEARFULL: 1 nearfull osd(s)
osd.24 is near full
[WRN] POOL_NEARFULL: 5 pool(s) nearfull
pool 'rbd' is nearfull
pool 'cephfs_data' is nearfull
pool 'cephfs_metadata' is nearfull
pool 'plexdata' is nearfull
pool 'device_health_metrics' is nearfull

Storage shows 93.36 TB / 160.1 TB (58.31%)

Pools
Name                   Type        Usage             PGs  Size  Min Size  Rule Name        Used Space  Available Space  Active OSDs
cephfs_data            replicated  cephfs            128  3     2         replicated_rule  93.21 TB    4.3 TB           34
cephfs_metadata        replicated  cephfs             64  3     2         replicated_rule  1022.74 MB  4.3 TB           34
device_health_metrics  replicated  mgr_devicehealth    1  3     2         replicated_rule  7.95 MB     4.3 TB            3
plexdata               EC          cephfs            128  3     2         ec-by-host-hdd   0 Bytes     8.59 TB          34
rbd                    replicated  rbd               128  3     2         replicated_rule  3.94 KB     4.3 TB           34

Node1
disk    size    OSD    Usage
sdc    3.64 TB    OSD33    40%
sdd    3.64 TB    OSD23    40%
sde    3.64 TB    OSD20    74%
sdf    5.46 TB    OSD12    76%
sdg    5.46 TB    OSD13    80%
sdh    5.46 TB    OSD2    44%
sdi    5.46 TB    OSD3    62%

Node2
disk    size    OSD    Usage
sdc    3.64 TB    OSD32    73%
sdd    3.64 TB    OSD24    87%
sde    5.46 TB    OSD14    58%
sdf    5.46 TB    OSD19    67%
sdg    3.64 TB    OSD28    60%
sdh    5.46 TB    OSD4    67%
sdi    5.46 TB    OSD5    58%

Node3
disk    size    OSD    Usage
sdc    3.64 TB    OSD31    47%
sdd    3.64 TB    OSD26    60%
sde    3.64 TB    OSD25    47%
sdf    5.46 TB    OSD15    71%
sdg    5.46 TB    OSD16    49%
sdh    5.46 TB    OSD0    40%
sdi    5.46 TB    OSD1    58%

Node4
disk    size    OSD    Usage
sdc    3.64 TB    OSD30    60%
sdd    3.64 TB    OSD22    80%
sde    3.64 TB    OSD27    33%
sdf    5.46 TB    OSD17    58%
sdg    5.46 TB    OSD18    49%
sdh    5.46 TB    OSD6    53%
sdi    5.46 TB    OSD7    45%

Node5
disk    size    OSD    Usage
sdb    3.64 TB    OSD29    47%
sdc    3.64 TB    OSD21    40%
sde    5.46 TB    OSD8    54%
sdf    5.46 TB    OSD9    80%
sdg    5.46 TB    OSD10    71%
sdh    5.46 TB    OSD11    45%

It does not seem the balancer is working; you can check:

ceph balancer status

For now, you can lower the CRUSH weight of osd.24 to 3.0 (TB) from the Maintenance tab.
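If you prefer the CLI to the Maintenance tab, the equivalent CRUSH reweight would be along these lines (osd.24, weight in TB):

ceph osd crush reweight osd.24 3.0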

ceph balancer status
{
    "active": true,
    "last_optimize_duration": "0:00:00.002294",
    "last_optimize_started": "Thu Jul 15 15:52:05 2021",
    "mode": "upmap",
    "optimize_result": "Unable to find further optimization, or pool(s) pg_num is decreasing, or distribution is already perfect",
    "plans": []
}

Is there anything I can do to get the OSDs to properly balance? I adjusted the weight on osd.24 as suggested and the warning cleared, but other OSDs are still out of balance. I do see that there are no "plans" listed when I check the balancer status.

If you increase the number of PGs, it improves balancing, since more (and smaller) PGs give the balancer finer granularity to distribute data. Ideally you want about 100 to 150 PGs per OSD.

(34 OSDs x 100) / 3 ≈ 1133 PGs across your pools (divide by 3 because each PG is replicated to 3 OSDs).

You currently have 128 + 128 + 128 + 64 + 1 = 449 PGs.
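If you go that route, a minimal sketch for growing the main data pool (512 here is illustrative; pick a power of two sized to your own target and raise it gradually):

ceph osd pool set cephfs_data pg_num 512
# recent Ceph releases ramp pg_num/pgp_num up gradually on their own;
# on older releases, also run: ceph osd pool set cephfs_data pgp_num 512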

I've not used the autobalancer.