OSD not marked as down

We are in our final tests and now we have a new problem.

OSDs are not getting marked down when a node crashes.

After some research I found this:

 

https://bugzilla.redhat.com/show_bug.cgi?id=1425115

 

I'm not sure if it's really a Ceph bug or something that can be fixed with configuration.

Do you have any idea what it might be? If it's a Ceph bug, do you know how to update?

------------------------------------------------------------------------------

2017-07-13 20:19:31.879197 mon.0 [INF] osd.5 marked itself down
2017-07-13 20:19:31.887533 mon.0 [INF] osd.0 marked itself down
2017-07-13 20:19:31.907472 mon.0 [INF] osd.2 marked itself down
2017-07-13 20:19:40.994921 mon.0 [INF] mon.san01 calling new monitor election
2017-07-13 20:19:40.996203 mon.2 [INF] mon.san03 calling new monitor election
2017-07-13 20:19:45.997331 mon.0 [INF] mon.san01@0 won leader election with quorum 0,2
2017-07-13 20:19:45.998626 mon.0 [INF] HEALTH_WARN; 1 mons down, quorum 0,2 san01,san03
2017-07-13 20:19:46.128726 mon.0 [INF] monmap e3: 3 mons at {san01=10.0.10.1:6789/0,san02=10.0.10.2:6789/0,san03=10.0.10.3:6789/0}
2017-07-13 20:19:46.129106 mon.0 [INF] osdmap e26342: 6 osds: 6 up, 6 in
2017-07-13 20:20:14.012643 mon.0 [INF] pgmap v1648401: 1000 pgs: 1000 active+clean; 395 GB data, 791 GB used, 11460 GB / 12252 GB avail

root@san01:~# ceph osd tree --cluster xxxxxxxx
ID WEIGHT   TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 11.96452 root default
-2        0     host san03
-3  5.98228     host san01
 6  1.99409         osd.6       up  1.00000          1.00000
 7  1.99409         osd.7       up  1.00000          1.00000
 8  1.99409         osd.8       up  1.00000          1.00000
-4  5.98224     host san02
 2  1.99408         osd.2       up  1.00000          1.00000
 5  1.99408         osd.5       up  1.00000          1.00000
 0  1.99408         osd.0       up  1.00000          1.00000

----------------------------------------------------------------------------------------------

Can you make sure the noout and nodown flags are not set:

ceph osd unset noout --cluster xx

ceph osd unset nodown --cluster xx
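
For reference, any flags that are currently set also show up in the OSD map, so a quick check like the one below (using the same xx cluster-name placeholder) should list them; if noout or nodown appear in that line, the unset commands above will clear them:

ceph osd dump --cluster xx | grep flags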

Does this happen every time you shut down a node, or only sometimes?

I do not have either of these flags set in my Ceph cluster.

It happens every time, regardless of which node crashes.

If it's really a Ceph bug, I'm thinking of separating the monitor nodes from the OSD nodes.

----------------------------------------------------------------------------------------

root@san01:~# ceph -w --cluster xxxxxxx
cluster 9f99e76f-1f50-4aa3-b876-dbf194a3cadf
health HEALTH_OK
monmap e3: 3 mons at {san01=10.0.10.1:6789/0,san02=10.0.10.2:6789/0,san03=10.0.10.3:6789/0}
election epoch 500, quorum 0,1,2 san01,san02,san03
osdmap e29393: 6 osds: 6 up, 6 in
flags sortbitwise,require_jewel_osds
pgmap v1682675: 1000 pgs, 1 pools, 120 GB data, 30981 objects
241 GB used, 12010 GB / 12252 GB avail
1000 active+clean
client io 673 B/s rd, 34329 B/s wr, 0 op/s rd, 3 op/s wr

We will look into it. It may be a Ceph issue, but can you please let me know if this happens all the time or only in some cases? We test failing nodes all the time and have not hit this. Also, in your earlier tests with Ceph recovery load, I would presume the OSD downs were detected correctly? If you can give me some more info it will greatly help.

PetaSAN 1.4 will include Ceph 10.2.7, which according to the link you sent has a fix, but I would much prefer to reproduce the problem here, make sure it does solve the issue you see, and maybe send a patch.

It happens all the time. At the moment I have 4 PetaSAN nodes: 2 nodes with OSDs, san01 and san02, and 2 nodes without OSDs, san03 and san04.
No matter in what sequence I turn off the nodes, or how many times I run the test, the OSDs on san01 and san02 are never marked down.

In my previous tests, where I ran into problems until I adjusted the environment, the OSDs were very different:
2 disks of 146GB, 1 disk of 230GB, 1 virtual disk of 130GB, etc...
The OSD environment was very messy, because it was a test environment.

Now the environment is more organized: 6 OSDs spread across six 2TB disks on the 2 nodes, 3 disks in each. It will soon receive another six 2TB disks.
So that's right: before, the OSDs were detected as down even with the mismatched sizes; now, with the more organized setup, they are not detected as down.

Is there a release date for version 1.4?

-------------------------------------------------------------------------------------------------------------------------------

root@san01:~# ceph osd tree --cluster xxxxxx
ID WEIGHT   TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 11.96452 root default
-2        0     host san03
-3  5.98228     host san01
 6  1.99409         osd.6       up  1.00000          1.00000
 7  1.99409         osd.7       up  1.00000          1.00000
 8  1.99409         osd.8       up  1.00000          1.00000
-4  5.98224     host san02
 2  1.99408         osd.2       up  1.00000          1.00000
 5  1.99408         osd.5       up  1.00000          1.00000
 0  1.99408         osd.0       up  1.00000          1.00000

Looking into this, it is not a bug. It happens because you only have 2 storage/OSD nodes, and by default Ceph requires reports from 2 different storage nodes before an OSD on a third node is marked down. For a 2-storage-node setup, please add the following to /etc/ceph/cluster_name.conf on all nodes, under the [mon] section, and reboot:

mon_osd_min_down_reporters = 1

This should fix the issue. Please note that you should only do this when you have just 2 storage nodes. This setting can lead to issues if you have more than 2 storage nodes, since it gives a single node the power to fail the others.
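
As an illustration only (the cluster_name in the path, and any entries already present in your [mon] section, are specific to your setup), the relevant part of /etc/ceph/cluster_name.conf would end up looking roughly like this:

[mon]
# with only 2 storage nodes, accept a down report from a single host
mon_osd_min_down_reporters = 1

After the file is updated on all nodes and they are rebooted, a report from the single surviving storage node should be enough for the monitors to mark the crashed node's OSDs down.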

Version 1.4 is due mid-August; it will have many performance and tuning features. We are also releasing 1.3.1 in a couple of days, mainly to fix the issue with the ESXi ARP cache; it is being tested now.

 

Great, it worked with that option.

When I increase the number of OSDs per server, should I remove this option?