Ceph HEALTH_ERR on node reboot
therm
121 Posts
July 6, 2017, 10:22 am
I just got a freeze of the iSCSI LUN while rebooting one node (three-node setup). How can I prevent this?
I set nodown and noout before rebooting.
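For reference, these flags are set with the standard Ceph CLI, roughly like this (the exact commands used are not shown in this post):
ceph osd set nodown
ceph osd set noout
# ... reboot the node ...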
root@ceph-node-2:~# ceph -s
cluster c352a562-4dcc-48c4-8f19-9659ca33f475
health HEALTH_ERR
215 pgs are stuck inactive for more than 300 seconds
339 pgs peering
220 pgs stale
215 pgs stuck inactive
6 requests are blocked > 32 sec
nodown,noout,sortbitwise,require_jewel_osds flag(s) set
1 mons down, quorum 1,2 ceph-node-2,ceph-node-3
monmap e3: 3 mons at {ceph-node-1=192.168.1.194:6789/0,ceph-node-2=192.168.1.195:6789/0,ceph-node-3=192.168.1.196:6789/0}
election epoch 152, quorum 1,2 ceph-node-2,ceph-node-3
osdmap e4057: 72 osds: 72 up, 72 in
flags nodown,noout,sortbitwise,require_jewel_osds
pgmap v102858: 4096 pgs, 1 pools, 324 GB data, 83299 objects
981 GB used, 260 TB / 261 TB avail
3426 active+clean
339 peering
220 stale+active+clean
111 activating
IO came back once the node was present again (the node had been shut down by the cluster, so I needed to start it another time).
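Stuck PGs such as the ones above can be listed with the usual commands (illustrative, not part of the original output):
ceph health detail
ceph pg dump_stuck inactive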
Regards,
Dennis
Last edited on July 6, 2017, 11:25 am · #1
admin
2,930 Posts
July 6, 2017, 12:06 pm
This is most likely a resource limitation issue. We have a hardware recommendation guide; please do look at it and try to stay as close to it as possible. When you reboot a node, Ceph does background recovery to bring all nodes back in sync, which puts extra load on your servers on top of client I/O. Also, why did you set the nodown/noout options on the OSDs? If it was because they were constantly going up and down (flapping), that is probably a resource issue as well.
So in terms of what to do:
- Try to make sure your hardware is close to the recommended guide.
- If you are below the recommendation, please run "atop" on the three nodes after you reboot and take note of % free RAM, % disk busy, % CPU, and % net utilization for the first 15 minutes.
- Reset the nodown/noout flags.
- If you are short on several resources, try to increase RAM first, as it is the easiest to add and can relieve other bottlenecks.
- Please add the following to /etc/ceph/CLUSTER_NAME.conf on all nodes, then reboot; they should further limit the recovery load compared to the default values (a sketch of applying these follows the settings below).
osd_max_backfills = 1
osd_recovery_max_active = 1
osd_recovery_priority = 1
osd_recovery_op_priority = 1
osd_recovery_threads = 1
osd_client_op_priority = 63
osd_recovery_max_start = 1
osd_max_scrubs = 1
osd_scrub_during_recovery = false
osd_scrub_priority = 1
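A minimal sketch of how the steps above could be carried out; the atop logging options and the injectargs command are standard tooling, and the sample file name, interval, and values are only illustrative:
# record resource usage for ~15 minutes (90 samples at 10 s intervals), review later with atop -r
atop -w /tmp/atop-reboot.raw 10 90
# reset the flags once the node is back up
ceph osd unset nodown
ceph osd unset noout
# the settings above go under the [osd] or [global] section of /etc/ceph/CLUSTER_NAME.conf;
# they can also be applied at runtime without a reboot, for example:
ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'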
If you do still have issues, let me know the hardware configurations and the atop results and we can take it from there.
therm
121 Posts
July 7, 2017, 5:52 am
This is not load related. It seems to me that the nodown option caused this.
Using only the noout option during reboot/maintenance to prevent any recovery seems to work fine for the moment.
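Roughly, this noout-only maintenance flow looks like the following with the standard Ceph CLI (a sketch, not the exact commands used here):
ceph osd set noout         # keep OSDs from being marked out, so no rebalancing starts
# reboot / maintain the node, then wait until ceph -s shows its OSDs up again
ceph osd unset noout       # restore normal behaviour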
Regards,
Dennis
admin
2,930 Posts
July 7, 2017, 8:06 am
Excellent to hear.
We have seen many freezes like this caused by very low resources (we also run such low-spec setups ourselves for testing). One piece of feedback we have on PetaSAN is that, because everything works so easily, some users may underestimate the value of proper sizing; the whole 1.4 release is targeted at this point.
It is indeed strange that noout caused the freeze in your case. We will test this; maybe it is a Ceph bug.
Last edited on July 7, 2017, 8:07 am · #4
therm
121 Posts
July 7, 2017, 8:08 am
Yes, I've seen the low-spec setups in this forum as well, but our setup is bigger than your recommendation.
Just to correct: it is nodown that causes the problem.