
Crash after blackout

Hello everyone,

Tonight we had a bad blackout that outlasted the UPS battery, so my PetaSAN cluster went down hard. Now I have a lot of trouble bringing it back up, and the behaviour seems a bit random to me:

  • Random node reboots
  • Random OSDs disappearing
  • Charts randomly showing "No Datapoints"
  • PG Status never reaching 1000/1000

Even after many node reboots I can't bring the cluster back to a normal status. Is there something I can do to fix the problem? Thanks in advance...

Best regards.

Luca

Hi,

What is the output of:

ceph status --cluster CLUSTER_NAME

Do you get random node reboots, or do you get random shutdowns?

Are some of the OSDs always down, or do they go up and down (flap)?
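
To check this quickly, you can look at which OSDs are currently marked up or down and watch the cluster log for OSDs dropping out and coming back; for example (using the same CLUSTER_NAME placeholder as above):

# show the OSD tree with each OSD's up/down status
ceph osd tree --cluster CLUSTER_NAME

# watch cluster events live; flapping OSDs will repeatedly appear going down and coming back
ceph -w --cluster CLUSTER_NAME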

Hi,

  • The output of the ceph status command is:

cluster 92f9db61-fc7b-4327-ba71-1a5fb85ee1ca
     health HEALTH_ERR
            820 pgs are stuck inactive for more than 300 seconds
            180 pgs degraded
            820 pgs down
            820 pgs peering
            180 pgs stale
            820 pgs stuck inactive
            180 pgs stuck unclean
            149 pgs undersized
            3 requests are blocked > 32 sec
            recovery 4110/38066 objects degraded (10.797%)
            recovery 839/38066 objects misplaced (2.204%)
            too many PGs per OSD (606 > max 500)
     monmap e3: 3 mons at {ps-node-01=10.0.1.1:6789/0,ps-node-02=10.0.1.2:6789/0,ps-node-03=10.0.1.3:6789/0}
            election epoch 426, quorum 0,1,2 ps-node-01,ps-node-02,ps-node-03
     osdmap e1231: 6 osds: 3 up, 3 in; 180 remapped pgs
            flags sortbitwise,require_jewel_osds
      pgmap v137921: 1000 pgs, 1 pools, 75983 MB data, 19033 objects
            86908 MB used, 347 GB / 431 GB avail
            4110/38066 objects degraded (10.797%)
            839/38066 objects misplaced (2.204%)
                 820 down+peering
                 145 stale+active+undersized+degraded
                  31 stale+active+degraded
                   4 stale+active+undersized+degraded+remapped

  • I get random reboots and also random shutdowns.
  • The OSDs are not always down; they go up and down, and this varies from node to node.

Thanks again for your support.

Best regards.

Luca

In terms of priority we should:

  1. Fix random reboot issues
  2. Try to bring up all OSDs
  3. Fix PG stuck states

For the reboots, there is nothing in PetaSAN/Ceph itself that will perform a reboot, so this is strange; maybe there is a hardware issue. However, in many cases after an unclean crash the system will be busy trying to recover and check data consistency, and possibly the default Ceph recovery values put too much stress on your hardware. To reduce this load you can add the following configuration to /etc/ceph/CLUSTER_NAME.conf under the [global] section on all 3 nodes:

osd_max_backfills = 1
osd_recovery_max_active = 1
osd_recovery_threads = 1
osd_recovery_op_priority = 1
osd_client_op_priority = 63
osd_max_scrubs = 1
osd_scrub_during_recovery = false
osd_scrub_priority = 1

 

Then reboot all 3 nodes. Again, this is not a direct fix for the reboots, but it should put less strain on a recovering system. It would be helpful if you could watch the resource usage (CPU / network / disk busy %) by running the atop command. Also check for any kernel messages with:

dmesg | grep -i "error\|warn\|fail"

Also, can you tell me your current resources: RAM, NICs and their speeds?
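
As a side note, if you want the recovery settings above to take effect before the reboot, most of these osd_* values can also be injected into the OSDs that are currently running (the config file change is still needed so they survive the reboot). A minimal sketch, using two of the values from the list above:

# only reaches OSD daemons that are currently up
ceph --cluster CLUSTER_NAME tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'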

 

If the reboots and the possible system load issue get fixed, then we can look at bringing up the OSDs. You need to have at least 5 of the 6 OSDs up to recover data. The slightly good sign is that the OSDs are not always down but are flapping; this is more common under severe system load, where the OSD heartbeats are not getting through. Other indications you mentioned, such as the charts not showing data, also point to a resource issue. It is likely that after solving the reboots and system load the OSDs will come up by themselves; otherwise we will need to start them manually and look at their logs.
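
If it does come to starting them manually, a minimal sketch for one OSD (assuming systemd-managed OSD services and using osd.2 purely as an example id; PetaSAN's own service handling may differ slightly):

# try to start a single OSD daemon
systemctl start ceph-osd@2

# then check its log for the reason it is not staying up
tail -n 100 /var/log/ceph/CLUSTER_NAME-osd.2.log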

After fixing the OSDs, the stuck PGs should improve and in some cases be totally fixed. However, manual intervention may be required to fix consistency issues.
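
When we reach that stage, the stuck PGs can be listed and examined individually; a minimal sketch (the PG id 1.2f is only a placeholder, use the ids reported by the first command):

# list PGs that are stuck inactive
ceph pg dump_stuck inactive --cluster CLUSTER_NAME

# show the detailed state of a single PG
ceph pg 1.2f query --cluster CLUSTER_NAME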

Let me know how you progress and I will try to help as much as possible. Good luck.

Hello,
I've made the cluster config changes you suggested and then rebooted all the nodes. Now I see 2 of the 3 nodes up, and using the atop command on every node I see that disk utilization is very high (101% on the system disk sda on two nodes!) and free RAM is very low. My test specs are very poor: I'm running 3 nodes with only a 1 GbE NIC and 4 GB of RAM each.
Before sending this message I noticed that node 2 has only 2 (out of 3) OSDs up. Thanks again for the support...

Best regards!

Luca

It does look like there is a resource issue; if you can, increase your RAM to at least 8 GB and try again. The busy system disk is most likely caused by the RAM shortage (the system swapping to disk).
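
A quick way to confirm that on each node, using standard tools:

# memory and swap usage
free -h

# sample memory activity for 10 seconds; non-zero si/so columns mean the node is swapping to the system disk
vmstat 1 10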