OSD not marked as down
maxthetor
24 Posts
July 14, 2017, 1:25 am
We are in our final tests and now we have a new problem.
OSDs are not being marked down when a node crashes.
After some research I found this:
https://bugzilla.redhat.com/show_bug.cgi?id=1425115
I'm not sure if it's really a Ceph bug or something that can be fixed with configuration.
Do you have any idea what that might be? If it's a Ceph bug, is there a way to update?
------------------------------------------------------------------------------
2017-07-13 20:19:31.879197 mon.0 [INF] osd.5 marked itself down
2017-07-13 20:19:31.887533 mon.0 [INF] osd.0 marked itself down
2017-07-13 20:19:31.907472 mon.0 [INF] osd.2 marked itself down
2017-07-13 20:19:40.994921 mon.0 [INF] mon.san01 calling new monitor election
2017-07-13 20:19:40.996203 mon.2 [INF] mon.san03 calling new monitor election
2017-07-13 20:19:45.997331 mon.0 [INF] mon.san01@0 won leader election with quorum 0,2
2017-07-13 20:19:45.998626 mon.0 [INF] HEALTH_WARN; 1 mons down, quorum 0,2 san01,san03
2017-07-13 20:19:46.128726 mon.0 [INF] monmap e3: 3 mons at {san01=10.0.10.1:6789/0,san02=10.0.10.2:6789/0,san03=10.0.10.3:6789/0}
2017-07-13 20:19:46.129106 mon.0 [INF] osdmap e26342: 6 osds: 6 up, 6 in
2017-07-13 20:20:14.012643 mon.0 [INF] pgmap v1648401: 1000 pgs: 1000 active+clean; 395 GB data, 791 GB used, 11460 GB / 12252 GB avail
root@san01:~# ceph osd tree --cluster xxxxxxxx
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 11.96452 root default
-2 0 host san03
-3 5.98228 host san01
6 1.99409 osd.6 up 1.00000 1.00000
7 1.99409 osd.7 up 1.00000 1.00000
8 1.99409 osd.8 up 1.00000 1.00000
-4 5.98224 host san02
2 1.99408 osd.2 up 1.00000 1.00000
5 1.99408 osd.5 up 1.00000 1.00000
0 1.99408 osd.0 up 1.00000 1.00000
----------------------------------------------------------------------------------------------
admin
2,930 Posts
July 14, 2017, 4:45 am
Can you make sure the noout and nodown flags are not set:
ceph osd unset noout --cluster xx
ceph osd unset nodown --cluster xx
Does this happen every time you shut down a node, or only sometimes?
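For reference, a quick way to confirm whether either flag is currently set (reusing the same --cluster placeholder) is to check the flags line of the OSD map:
ceph osd dump --cluster xx | grep flags
If noout or nodown show up in that line, the unset commands above will clear them.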
maxthetor
24 Posts
July 14, 2017, 2:33 pm
I do not have any of these values set in my Ceph cluster, and never have, regardless of the node crash.
If it's really a Ceph bug, I'm thinking of separating the monitor nodes from the OSD nodes.
----------------------------------------------------------------------------------------
root@san01:~# ceph -w --cluster xxxxxxx
cluster 9f99e76f-1f50-4aa3-b876-dbf194a3cadf
health HEALTH_OK
monmap e3: 3 mons at {san01=10.0.10.1:6789/0,san02=10.0.10.2:6789/0,san03=10.0.10.3:6789/0}
election epoch 500, quorum 0,1,2 san01,san02,san03
osdmap e29393: 6 osds: 6 up, 6 in
flags sortbitwise,require_jewel_osds
pgmap v1682675: 1000 pgs, 1 pools, 120 GB data, 30981 objects
241 GB used, 12010 GB / 12252 GB avail
1000 active+clean
client io 673 B/s rd, 34329 B/s wr, 0 op/s rd, 3 op/s wr
admin
2,930 Posts
July 14, 2017, 7:59 pm
We will look into it. It may be a Ceph issue, but can you please let me know if this happens all the time or only in some cases? We test failing nodes all the time and we have not hit this. Also, I believe in your earlier tests with Ceph recovery load the OSD downs were detected correctly? If you can give me some more info it will greatly help.
PetaSAN 1.4 will include Ceph 10.2.7, which according to the link you sent has a fix, but I would much prefer that we reproduce the problem here, make sure it does solve the issue you see, and maybe send a patch.
maxthetor
24 Posts
July 14, 2017, 11:30 pm
It happens all the time. At the moment I have 4 PetaSAN nodes: 2 nodes with OSDs, san01 and san02, and 2 nodes without OSDs, san03 and san04.
No matter in what sequence I turn off the nodes, or how many times I run the test, the OSDs on san01 and san02 are not marked as down.
In my previous tests, where I ran into problems until I adjusted the environment, the OSDs were very different:
2 disks of 146 GB, 1 disk of 230 GB, 1 virtual disk of 130 GB, etc.
The OSD environment was very messy, because it was a test environment.
Now it is a more organized environment: 6 OSDs distributed across six 2 TB disks on 2 nodes, 3 disks in each. Soon it will receive another six 2 TB disks.
So that's right: before, the OSDs were detected as down even with different sizes; now, with the more organized setup, the OSDs are not detected as down.
Is there a release date for version 1.4?
-------------------------------------------------------------------------------------------------------------------------------
root@san01:~# ceph osd tree --cluster xxxxxx
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 11.96452 root default
-2 0 host san03
-3 5.98228 host san01
6 1.99409 osd.6 up 1.00000 1.00000
7 1.99409 osd.7 up 1.00000 1.00000
8 1.99409 osd.8 up 1.00000 1.00000
-4 5.98224 host san02
2 1.99408 osd.2 up 1.00000 1.00000
5 1.99408 osd.5 up 1.00000 1.00000
0 1.99408 osd.0 up 1.00000 1.00000
admin
2,930 Posts
July 14, 2017, 11:49 pm
Looking into this, it is not a bug. It happens because you only have 2 storage/OSD nodes, and the Ceph default requires reports from 2 different storage nodes before an OSD on a third node is marked down. For a 2 storage node setup, please add the following in /etc/ceph/cluster_name.conf on all nodes, under the [mon] section, and reboot:
mon_osd_min_down_reporters = 1
This should fix the issue. Please note that you should only do this when you have just 2 storage nodes. This setting can lead to issues if you have more than 2 storage nodes, since it gives a single node the power to fail the others.
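As a rough sketch, assuming the config file is /etc/ceph/cluster_name.conf as mentioned above, the relevant section would look like:
[mon]
mon_osd_min_down_reporters = 1
After the reboot you can verify the running value on a monitor host via its admin socket, e.g.:
ceph daemon mon.san01 config show | grep mon_osd_min_down_reporters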
Version 1.4 is due mid August; it will have many performance and tuning features. We are also releasing 1.3.1 in a couple of days, mainly to fix the issue with the ESXi ARP cache; it is being tested now.
maxthetor
24 Posts
July 25, 2017, 12:30 am
Great, it worked with that option.
When I increase the number of OSDs per server, should I remove this option?