
Ceph Health: Too many PGs per OSD


Hi there,

We've installed a 5-node cluster (each node with 5 disks: 5 OSDs and 1 journal), and we selected the cluster size of 50-100 disks (perhaps this was the issue).
The installation completed successfully, but the following message appeared on the dashboard:

Reduced data availability: 1237 pgs inactive
too many PGs per OSD (777 > max 300)
Degraded data redundancy: 1237 pgs unclean
2 slow requests are blocked > 32 sec

Afterwards we added one more OSD disk to every node (6 in total now), and the issue with the unclean PGs was resolved.
The output now looks good, apart from one message:


ceph -w --cluster peta-001-bit1

  cluster:
    id:     e22c41d1-937a-4597-ba82-db706c0d9f53
    health: HEALTH_WARN
            too many PGs per OSD (491 > max 300)

  services:
    mon: 3 daemons, quorum cep-001-bit1,cep-002-bit1,cep-003-bit1
    mgr: cep-002-bit1(active), standbys: cep-003-bit1, cep-001-bit1
    osd: 25 osds: 25 up, 25 in

  data:
    pools:   1 pools, 4096 pgs
    objects: 0 objects, 0 bytes
    usage:   525 GB used, 6955 GB / 7481 GB avail
    pgs:     4096 active+clean


 

Does this have something to do with the cluster-size config we selected initially?

Many thanks in advance.

Best regards
Reto

 

Yes, it is related. The initial selection of 50->200 disks results in 4096 PGs; it would have been better to choose 15->50, which results in 1024 PGs.

Now you have 25 OSDs: each OSD has 4096 x 3 (replicas) / 25 = 491 PGs.

The warning appears because the upper limit is 300 PGs per OSD. Your cluster will work, but it puts too much stress on the OSDs, as each one needs to synchronize all these PGs with its peer OSDs.

The 15->50 disk selection would have resulted in 122 PGs per OSD, which would be an ideal count.
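
If you want to double-check the figures behind this calculation on the cluster itself, they can be read back with the commands below; a minimal sketch, assuming the pool is named rbd (substitute your actual pool name):

ceph osd pool get rbd pg_num --cluster peta-001-bit1     # pg_num of the pool (4096 in this case)
ceph osd pool get rbd size --cluster peta-001-bit1       # replica count of the pool (3 in this case)
ceph osd df --cluster peta-001-bit1                      # the PGS column shows how many PGs each OSD carries (~491 here)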

It is not possible to decrease the PG count. It is possible to increase it (if expanding the cluster), but that will generate a lot of rebalancing of stored data, so it is really better to get it correct from the beginning. The Ceph developers will try to make this parameter more flexible in the future, but currently you need to know it beforehand.
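
For reference, an increase after expanding the cluster would be done per pool roughly like this (POOL and XX are placeholders for your pool name and the new PG count); this is a sketch only, since it triggers the rebalance mentioned above:

ceph osd pool set POOL pg_num XX     # XX = the new, larger PG count (a power of two is the usual recommendation)
ceph osd pool set POOL pgp_num XX    # raise pgp_num to the same value so data actually remaps
ceph -w                              # watch the resulting backfill/rebalance until the cluster is active+clean again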

If this is a test cluster, I would recommend re-installing, or maybe increasing the disks to 42 OSDs so as to be just below the 300 PG warning.
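
The 42-OSD figure follows from the same formula; as a quick shell arithmetic check:

echo $(( 4096 * 3 / 42 ))    # prints 292, i.e. ~292 PGs per OSD, just under the 300 limit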

Hi admin,

Many thanks for the great, detailed explanation!
Alright, we are going to reinstall PetaSAN, as we do not plan to increase the number of disks in the near future.

Best regards and thanks again
Reto

Hey,

Got the same situation, but I have been running this for over a year. Just got the warning "too many PGs per OSD (357 > max 300)". I have 17 OSDs with 1024 PGs, now what? Is this going to hurt it?

Not too alarming. Some options:

1 - ignore the warning

2 - add approximately 20% more OSDs

3 - from the Ceph Configuration menu in the UI, increase mon_max_pg_per_osd under the mgr section from 300 to 360 (a CLI sketch follows after this list)

4 - decrease the PG count in your pools by 20%, which will cause a data rebalance (see also the note after this list):
ceph osd pool set POOL pg_num XX
ceph osd pool set POOL pgp_num XX
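
For option 3, the same setting can also be applied from the CLI instead of the UI; a rough sketch, assuming a Ceph release that has the ceph config command (Mimic or later):

ceph config set mgr mon_max_pg_per_osd 360     # set it under the mgr section, as in the UI
ceph config get mgr mon_max_pg_per_osd         # read it back to confirm the new value

For option 4, keep in mind that decreasing pg_num on an existing pool is only supported on Ceph Nautilus and later; on older releases the count can only be increased.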

Well, I changed the number to 360 yesterday, and this morning another line was there with 300 as the max, so now there are two lines for the same thing. One of them I'm unable to change; it just keeps coming back to 300.

mon_max_pg_per_osd = 300
mon_max_pg_per_osd = 360

I deleted the one with 300, which left the one with 360 listed. After refreshing, the two came back, so I am unable to make the change.

Can you delete both keys, then re-add the key under the mgr section? There was probably another global key left over from the upgrade.

It won't stay deleted 🙂

I removed both statements and created one under mgr, but it came back in global as 300. I guess we'll have to try #4 from your options above?

 

From the CLI, run:

ceph config rm global mon_max_pg_per_osd
ceph config-key rm config/global/mon_max_pg_per_osd

Then, from the UI, delete the key in both places, wait a few minutes, and check that it does not come back 🙂 then add the key from the UI under the mgr section.

I suspect this could be due to the upgrade of the conf file; it could be a bug in the ceph config assimilate-conf command we use.
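
To confirm the stale key is really gone before re-adding it, both stores can also be checked from the CLI; a small sketch using the same key name:

ceph config dump | grep mon_max_pg_per_osd        # centralized config database, should return nothing
ceph config-key ls | grep mon_max_pg_per_osd      # raw config-key store, should return nothing either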

Just a question on this: would this result in any outages? If the setting cannot be applied, will the system continue to run with it?
