Crash after blackout
gomaelettronica
10 Posts
June 6, 2017, 4:06 pm
Hello everyone,
last night we had a blackout that outlasted the UPS batteries, so my PetaSAN cluster went down hard. Now I'm having a lot of trouble bringing it back up. The behaviour seems somewhat random to me:
- Random node reboots
- OSDs randomly disappearing
- Charts randomly showing "No Datapoints"
- PG status never reaching 1000/1000
Even after many node reboots I can't bring the cluster back to a normal state. Is there something I can do to fix this? Thanks in advance...
Best regards.
Luca
admin
2,930 Posts
June 6, 2017, 6:23 pm
Hi,
what is the output of
ceph status --cluster CLUSTER_NAME
Do you get random node reboots, or random shutdowns?
Are some of the OSDs always down, or do they go up and down (flap)?
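If it helps narrow down the flapping, the following standard Ceph commands (nothing PetaSAN-specific assumed beyond the same --cluster flag) also show the per-node OSD state:
# show the OSD tree with up/down and in/out state grouped by host
ceph osd tree --cluster CLUSTER_NAME
# quick summary of how many OSDs are up and in right now
ceph osd stat --cluster CLUSTER_NAME
# watch cluster events live, looking for OSDs being marked down and up repeatedly
ceph -w --cluster CLUSTER_NAME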
Last edited on June 6, 2017, 6:24 pm · #2
gomaelettronica
10 Posts
June 7, 2017, 7:07 am
Hi,
- the output of the ceph status command is:
cluster 92f9db61-fc7b-4327-ba71-1a5fb85ee1ca
health HEALTH_ERR
820 pgs are stuck inactive for more than 300 seconds
180 pgs degraded
820 pgs down
820 pgs peering
180 pgs stale
820 pgs stuck inactive
180 pgs stuck unclean
149 pgs undersized
3 requests are blocked > 32 sec
recovery 4110/38066 objects degraded (10.797%)
recovery 839/38066 objects misplaced (2.204%)
too many PGs per OSD (606 > max 500)
monmap e3: 3 mons at {ps-node-01=10.0.1.1:6789/0,ps-node-02=10.0.1.2:6789/0,ps-node-03=10.0.1.3:6789/0}
election epoch 426, quorum 0,1,2 ps-node-01,ps-node-02,ps-node-03
osdmap e1231: 6 osds: 3 up, 3 in; 180 remapped pgs
flags sortbitwise,require_jewel_osds
pgmap v137921: 1000 pgs, 1 pools, 75983 MB data, 19033 objects
86908 MB used, 347 GB / 431 GB avail
4110/38066 objects degraded (10.797%)
839/38066 objects misplaced (2.204%)
820 down+peering
145 stale+active+undersized+degraded
31 stale+active+degraded
4 stale+active+undersized+degraded+remapped
- I get random reboots and also random shutdowns.
- The OSDs are not always down; they go up and down, and which ones are affected varies from node to node.
Thanks again for your support.
Best regards.
Luca
admin
2,930 Posts
June 7, 2017, 12:23 pm
In terms of priority we should:
- Fix random reboot issues
- Try to bring up all OSDs
- Fix PG stuck states
For the reboots, there is nothing in PetaSAN/Ceph itself that will perform a reboot, so this is strange; maybe there is a hardware issue. However, in many cases after an unclean crash the system will be busy recovering and checking data consistency, and the default Ceph recovery settings may put too much stress on your hardware. To reduce this load you can add the following configuration to /etc/ceph/CLUSTER_NAME.conf under the [global] section on all 3 nodes:
osd_max_backfills = 1
osd_recovery_max_active = 1
osd_recovery_threads = 1
osd_recovery_op_priority = 1
osd_client_op_priority = 63
osd_max_scrubs = 1
osd_scrub_during_recovery = false
osd_scrub_priority = 1
Then reboot all 3 nodes. Again, this is not a direct fix for the reboots, but it should put less strain on a recovering system. It would be helpful if you could watch the resource usage (CPU, network and disk busy %) by running the atop command. Also check for any kernel messages with
dmesg | grep -i "error\|warn\|fail"
Also, can you tell me your current resources: RAM, NICs and their speed?
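One more check that may be worth doing (a sketch using standard Ceph and Linux tools; the admin socket path below follows the default Ceph convention and is an assumption on my part): confirm a running OSD actually picked up the new recovery values, and look for out-of-memory kills that could explain sudden daemon deaths or node resets:
# query a live OSD daemon (osd.0 here as an example) through its admin socket
ceph --admin-daemon /var/run/ceph/CLUSTER_NAME-osd.0.asok config get osd_max_backfills
# look for the kernel OOM killer, which can kill daemons under memory pressure
dmesg | grep -i "out of memory"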
If the reboots and the system load get fixed, then we can look at bringing up the OSDs. You need at least 5 of the 6 up to recover the data. The slightly good sign is that the OSDs are not permanently down but are flapping; this is more common under severe system load, where the OSD heartbeats do not get through. The other symptoms you mentioned, such as the charts not showing, also point to a resource issue. It is likely that once the reboots and the system load are resolved the OSDs will come up by themselves; otherwise we will need to start them manually and look at their logs, as sketched below.
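If they do need to be started by hand, it would look roughly like this (assuming the stock systemd units and log paths used by Ceph Jewel; the OSD id 4 is just an example):
# start a down OSD by its numeric id and check that it stays up
systemctl start ceph-osd@4
systemctl status ceph-osd@4
# then follow its log to see why it was failing
tail -f /var/log/ceph/CLUSTER_NAME-osd.4.log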
After the OSDs are fixed, the stuck PGs should improve and in some cases clear completely. However, some manual intervention may still be required to fix consistency issues; the commands below help inspect the stuck PGs when you get to that point.
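These are standard Ceph commands (the PG id below is only an example; take a real one from the dump output):
# list PGs stuck inactive or unclean
ceph pg dump_stuck inactive --cluster CLUSTER_NAME
ceph pg dump_stuck unclean --cluster CLUSTER_NAME
# query a single problem PG in detail
ceph pg 1.3f query --cluster CLUSTER_NAME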
Let me know how you progress and I will try to help as much as possible. Good luck.
gomaelettronica
10 Posts
June 7, 2017, 1:26 pm
Hello,
I've made the configuration changes you suggested and then rebooted all the nodes. Now I see 2 of the 3 nodes up, and running the atop command on each node shows very high disk utilization (101% busy on the system disk sda on two nodes!) and very little free RAM. My test hardware is very modest: 3 nodes, each with only a 1 GbE NIC and 4 GB of RAM.
Just before sending this message I noticed that node 2 has only 2 (out of 3) OSDs up. Thanks again for the support...
Best regards!
Luca
admin
2,930 Posts
June 7, 2017, 2:08 pm
It does look like there is a resource issue; if you can, increase the RAM to at least 8 GB and try again. The busy system disk is most likely a symptom of the RAM shortage.
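To confirm the memory pressure, a quick check with standard Linux tools while the cluster is recovering:
# free memory and swap usage in MB
free -m
# memory, swap-in/swap-out (si/so) and I/O, sampled every 2 seconds
vmstat 2 5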