
Ceph HEALTH_ERR on node reboot

I just got a freeze of the iSCSI LUN while rebooting one node (three-node setup). How can I prevent this?

I set nodown and noout before rebooting.
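
The flags were set the usual way, something like:

ceph osd set nodown
ceph osd set noout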

root@ceph-node-2:~# ceph -s
cluster c352a562-4dcc-48c4-8f19-9659ca33f475
health HEALTH_ERR
215 pgs are stuck inactive for more than 300 seconds
339 pgs peering
220 pgs stale
215 pgs stuck inactive
6 requests are blocked > 32 sec
nodown,noout,sortbitwise,require_jewel_osds flag(s) set
1 mons down, quorum 1,2 ceph-node-2,ceph-node-3
monmap e3: 3 mons at {ceph-node-1=192.168.1.194:6789/0,ceph-node-2=192.168.1.195:6789/0,ceph-node-3=192.168.1.196:6789/0}
election epoch 152, quorum 1,2 ceph-node-2,ceph-node-3
osdmap e4057: 72 osds: 72 up, 72 in
flags nodown,noout,sortbitwise,require_jewel_osds
pgmap v102858: 4096 pgs, 1 pools, 324 GB data, 83299 objects
981 GB used, 260 TB / 261 TB avail
3426 active+clean
339 peering
220 stale+active+clean
111 activating

IO came back after the node was present again (the node was then shut down by the cluster, so I needed to start it another time).
Regards,

Dennis

This is most likely a resource limitation issue. We have a hardware recommendation guide; please do look at it and try to be as close to it as possible. When you reboot a node, Ceph does background recovery to bring all nodes back in sync, which puts extra load on your servers on top of client I/O. Also, why did you set the nodown/noout options on the OSDs? If this was because they were constantly going up and down (flapping), that is also probably due to resource issues.

So in terms of what to do:

  • Try to make sure your hardware is close to the recommendations in the guide.
  • If you are below the recommendations, please run the "atop" command on the 3 nodes after you reboot and take note of % free RAM, % disk busy, % CPU, and % network utilization for the first 15 minutes.
  • Unset the nodown/noout flags.
  • If you are limited on several resources, try to increase RAM first, as it is the easiest to do and can help relieve other bottlenecks.
  • Please add the following to /etc/ceph/CLUSTER_NAME.conf on all nodes, then reboot; these should help further limit the recovery load compared to the default values:

osd_max_backfills = 1
osd_recovery_max_active = 1
osd_recovery_priority = 1
osd_recovery_op_priority = 1
osd_recovery_threads = 1
osd_client_op_priority = 63
osd_recovery_max_start = 1
osd_max_scrubs = 1
osd_scrub_during_recovery = false
osd_scrub_priority = 1
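
If restarting right away is not convenient, the same values can usually also be injected into the running OSDs (the conf file change is still what makes them persistent across restarts), something along the lines of:

ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1 --osd_recovery_op_priority 1'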

If you do still have issues, let me know the hardware configurations and the atop results and we can take it from there.

This is not load related. It seems to me that the nodown option caused this.

Using the noout option (only) during reboot/maintenance to prevent any recovery seems to work fine for the moment.
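
In other words, the maintenance cycle is simply something like:

ceph osd set noout
# reboot or maintain the node, then once it is back up:
ceph osd unset noout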

Regards,

Dennis

Excellent to hear.

We have seen many freezes like this due to very low resources (we also run on very low resources ourselves for testing). One piece of feedback we have on PetaSAN is that since everything works so easily, some users may underestimate the value of proper sizing. The whole 1.4 release is targeted at this point.

It is indeed strange that noout caused the freeze in your case. We will test this; maybe it is a Ceph bug.

Yes, I've seen the low-end setups in this forum as well, but our setup is bigger than your recommendation.

Just to correct: it is nodown that causes the problem.