OSD balance issue

I am seeing an imbalance issue. I have the Ceph balancer enabled and set to upmap. Recently a warning showed up that osd.24 is almost full and that 5 pools are almost full. The OSD and node lists are below. Trying to understand the storage, it looks like my main pool, cephfs_data, isn't showing all the available storage?

What do I need to do to get the OSDs balanced? I know that Node5 is one 3.64 TB drive short and I will be adding one soon to keep the nodes even, but I wouldn't think that causes the OSDs to be so unbalanced.
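For reference, I enabled the balancer with the standard commands, roughly like this (upmap mode also requires all clients to be Luminous or newer):

ceph osd set-require-min-compat-client luminous
ceph balancer on
ceph balancer mode upmap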

ceph health detail
HEALTH_WARN 1 nearfull osd(s); 5 pool(s) nearfull
[WRN] OSD_NEARFULL: 1 nearfull osd(s)
osd.24 is near full
[WRN] POOL_NEARFULL: 5 pool(s) nearfull
pool 'rbd' is nearfull
pool 'cephfs_data' is nearfull
pool 'cephfs_metadata' is nearfull
pool 'plexdata' is nearfull
pool 'device_health_metrics' is nearfull

Storage shows 93.36 TB / 160.1 TB (58.31%)

Pools
Name                   Type        Usage             PGs  Size  Min Size  Rule Name        Used Space  Available Space  Active OSDs
cephfs_data            replicated  cephfs            128  3     2         replicated_rule  93.21 TB    4.3 TB           34
cephfs_metadata        replicated  cephfs             64  3     2         replicated_rule  1022.74 MB  4.3 TB           34
device_health_metrics  replicated  mgr_devicehealth    1  3     2         replicated_rule  7.95 MB     4.3 TB            3
plexdata               EC          cephfs            128  3     2         ec-by-host-hdd   0 Bytes     8.59 TB          34
rbd                    replicated  rbd               128  3     2         replicated_rule  3.94 KB     4.3 TB           34

Node1
disk    size    OSD    Usage
sdc    3.64 TB    OSD33    40%
sdd    3.64 TB    OSD23    40%
sde    3.64 TB    OSD20    74%
sdf    5.46 TB    OSD12    76%
sdg    5.46 TB    OSD13    80%
sdh    5.46 TB    OSD2    44%
sdi    5.46 TB    OSD3    62%

Node2
disk    size    OSD    Usage
sdc    3.64 TB    OSD32    73%
sdd    3.64 TB    OSD24    87%
sde    5.46 TB    OSD14    58%
sdf    5.46 TB    OSD19    67%
sdg    3.64 TB    OSD28    60%
sdh    5.46 TB    OSD4    67%
sdi    5.46 TB    OSD5    58%

Node3
disk    size    OSD    Usage
sdc    3.64 TB    OSD31    47%
sdd    3.64 TB    OSD26    60%
sde    3.64 TB    OSD25    47%
sdf    5.46 TB    OSD15    71%
sdg    5.46 TB    OSD16    49%
sdh    5.46 TB    OSD0    40%
sdi    5.46 TB    OSD1    58%

Node4
disk    size    OSD    Usage
sdc    3.64 TB    OSD30    60%
sdd    3.64 TB    OSD22    80%
sde    3.64 TB    OSD27    33%
sdf    5.46 TB    OSD17    58%
sdg    5.46 TB    OSD18    49%
sdh    5.46 TB    OSD6    53%
sdi    5.46 TB    OSD7    45%

Node5
disk    size    OSD    Usage
sdb    3.64 TB    OSD29    47%
sdc    3.64 TB    OSD21    40%
sde    5.46 TB    OSD8    54%
sdf    5.46 TB    OSD9    80%
sdg    5.46 TB    OSD10    71%
sdh    5.46 TB    OSD11    45%

It does not seem the balancer is working; you can check:

ceph balancer status

For now, you can lower the CRUSH weight of osd.24 to 3.0 (TB) from the Maintenance tab.
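If you prefer the CLI to the Maintenance tab, the equivalent CRUSH reweight would be along these lines (osd.24, weight in TB):

ceph osd crush reweight osd.24 3.0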

ceph balancer status
{
    "active": true,
    "last_optimize_duration": "0:00:00.002294",
    "last_optimize_started": "Thu Jul 15 15:52:05 2021",
    "mode": "upmap",
    "optimize_result": "Unable to find further optimization, or pool(s) pg_num is decreasing, or distribution is already perfect",
    "plans": []
}

Is there anything I can do to get the OSDs to properly balance? I adjusted the weight on osd.24 as suggested and the warning cleared, but other OSDs are still out of balance. I do see that there are no "plans" listed when I check the balancer status.

If you increase the number of PGs, it improves balancing, since more (and smaller) PGs give the balancer finer granularity to distribute data. Ideally you want about 100 to 150 PGs per OSD.

(34 OSDs x 100) / 3 ≈ 1133 PGs across your pools (divide by 3 because each PG is replicated to 3 OSDs).

You currently have 128 + 128 + 128 + 64 + 1 = 449 PGs.
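If you go that route, a minimal sketch for growing the main data pool (512 here is illustrative; pick a power of two sized to your own target and raise it gradually):

ceph osd pool set cephfs_data pg_num 512
# recent Ceph releases ramp pg_num/pgp_num up gradually on their own;
# on older releases, also run: ceph osd pool set cephfs_data pgp_num 512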

I've not used the autobalancer.