BlueFS spillover after 3.3.0 update

Hi,

After updating from 3.2.1 to 3.3.0 we suddenly have a bunch of warnings about OSD(s) experiencing BlueFS spillover. They appeared as we updated and rebooted each node, e.g. there were 7 on the first, 6 on the second, and so on, and we've ended up with 27 showing.

What's odd is that when we run "ceph health detail", the 'db' devices on most of the OSDs aren't anywhere near full:

root@gl-san-02a:~# ceph health detail
HEALTH_WARN 26 OSD(s) experiencing BlueFS spillover
[WRN] BLUEFS_SPILLOVER: 26 OSD(s) experiencing BlueFS spillover
osd.0 spilled over 72 MiB metadata from 'db' device (28 GiB used of 60 GiB) to slow device
osd.3 spilled over 5.5 GiB metadata from 'db' device (2.7 GiB used of 60 GiB) to slow device
osd.5 spilled over 22 GiB metadata from 'db' device (11 GiB used of 60 GiB) to slow device
osd.9 spilled over 24 GiB metadata from 'db' device (17 GiB used of 60 GiB) to slow device
osd.10 spilled over 29 GiB metadata from 'db' device (12 GiB used of 60 GiB) to slow device
osd.12 spilled over 25 GiB metadata from 'db' device (22 GiB used of 60 GiB) to slow device
osd.14 spilled over 332 MiB metadata from 'db' device (56 GiB used of 60 GiB) to slow device
osd.16 spilled over 23 GiB metadata from 'db' device (25 GiB used of 60 GiB) to slow device
osd.18 spilled over 34 GiB metadata from 'db' device (13 GiB used of 60 GiB) to slow device
osd.20 spilled over 28 GiB metadata from 'db' device (16 GiB used of 60 GiB) to slow device
osd.21 spilled over 26 GiB metadata from 'db' device (19 GiB used of 60 GiB) to slow device
osd.26 spilled over 23 GiB metadata from 'db' device (17 GiB used of 60 GiB) to slow device
osd.33 spilled over 27 GiB metadata from 'db' device (14 GiB used of 60 GiB) to slow device
osd.35 spilled over 35 GiB metadata from 'db' device (22 GiB used of 60 GiB) to slow device
osd.36 spilled over 25 GiB metadata from 'db' device (16 GiB used of 60 GiB) to slow device
osd.37 spilled over 33 GiB metadata from 'db' device (16 GiB used of 60 GiB) to slow device
osd.38 spilled over 23 GiB metadata from 'db' device (5.0 GiB used of 60 GiB) to slow device
osd.39 spilled over 23 GiB metadata from 'db' device (16 GiB used of 60 GiB) to slow device
osd.40 spilled over 21 GiB metadata from 'db' device (15 GiB used of 60 GiB) to slow device
osd.41 spilled over 23 GiB metadata from 'db' device (13 GiB used of 60 GiB) to slow device
osd.43 spilled over 36 GiB metadata from 'db' device (10 GiB used of 60 GiB) to slow device
osd.45 spilled over 312 MiB metadata from 'db' device (45 GiB used of 60 GiB) to slow device
osd.47 spilled over 52 GiB metadata from 'db' device (10 GiB used of 60 GiB) to slow device
osd.48 spilled over 31 GiB metadata from 'db' device (18 GiB used of 60 GiB) to slow device
osd.49 spilled over 60 GiB metadata from 'db' device (9.7 GiB used of 60 GiB) to slow device
osd.52 spilled over 5.4 GiB metadata from 'db' device (58 GiB used of 60 GiB) to slow device

Any suggestions on how we can remedy this (without just suppressing the warning)?

Thanks!

It should not be related to the upgrade. The upgrade only made a minor change of Ceph version, 17.2.5 -> 17.2.7.

You would need to migrate the db to a larger device. You can use the script

/opt/petasan/script/util/migrate_db.sh

or use ceph-bluestore-tool or ceph-volume directly to perform the migration.
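
To decide which OSDs to look at first, it may help to sort the "ceph health detail" lines by how much has spilled over. The following is only a rough sketch in Python, not a PetaSAN or Ceph tool: the function names and the "would_fit" flag are made up for illustration, and the regular expression assumes the exact line format shown in the output above. "would_fit" is plain arithmetic (used + spilled <= db size) and does not reflect how BlueFS actually decides where to place data.

```python
import re

# Binary unit multipliers as printed by "ceph health detail".
UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}

# Assumed line format, copied from the output in this thread:
# osd.N spilled over X UNIT metadata from 'db' device (Y UNIT used of Z UNIT) to slow device
LINE_RE = re.compile(
    r"osd\.(\d+) spilled over ([\d.]+) (\w+) metadata from 'db' device "
    r"\(([\d.]+) (\w+) used of ([\d.]+) (\w+)\)"
)


def to_bytes(value, unit):
    """Convert a printed size such as ('5.5', 'GiB') to a byte count."""
    return float(value) * UNITS[unit]


def parse_spillover(text):
    """Return one dict per spillover line, largest spillover first."""
    rows = []
    for m in LINE_RE.finditer(text):
        osd, sv, su, uv, uu, tv, tu = m.groups()
        spilled = to_bytes(sv, su)
        used = to_bytes(uv, uu)
        total = to_bytes(tv, tu)
        rows.append({
            "osd": int(osd),
            "spilled": spilled,
            "used": used,
            "total": total,
            # Hypothetical helper flag: simple arithmetic only.
            "would_fit": used + spilled <= total,
        })
    return sorted(rows, key=lambda r: r["spilled"], reverse=True)


# A few lines taken from the output posted above.
SAMPLE = """\
osd.0 spilled over 72 MiB metadata from 'db' device (28 GiB used of 60 GiB) to slow device
osd.3 spilled over 5.5 GiB metadata from 'db' device (2.7 GiB used of 60 GiB) to slow device
osd.49 spilled over 60 GiB metadata from 'db' device (9.7 GiB used of 60 GiB) to slow device
osd.52 spilled over 5.4 GiB metadata from 'db' device (58 GiB used of 60 GiB) to slow device
"""

if __name__ == "__main__":
    for r in parse_spillover(SAMPLE):
        gib = 1024**3
        print(f"osd.{r['osd']}: spilled {r['spilled'] / gib:.2f} GiB, "
              f"db {r['used'] / gib:.1f}/{r['total'] / gib:.0f} GiB, "
              f"would_fit={r['would_fit']}")
```

In practice you would feed it the real output, for example by saving "ceph health detail" to a file and passing the file contents to parse_spillover(). OSDs where would_fit is False are the ones where a larger db device is most clearly needed.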