OSD not marked as down
maxthetor
24 Posts
July 14, 2017, 1:25 am
We are in our final tests and now we have a new problem.
OSDs are not being marked down when a node crashes.
After some research I found this:
https://bugzilla.redhat.com/show_bug.cgi?id=1425115
I'm not sure if it's really a Ceph bug or something that can be fixed with configuration.
Do you have any idea what that might be? If it's a Ceph bug, is there a way to update?
------------------------------------------------------------------------------
2017-07-13 20:19:31.879197 mon.0 [INF] osd.5 marked itself down
2017-07-13 20:19:31.887533 mon.0 [INF] osd.0 marked itself down
2017-07-13 20:19:31.907472 mon.0 [INF] osd.2 marked itself down
2017-07-13 20:19:40.994921 mon.0 [INF] mon.san01 calling new monitor election
2017-07-13 20:19:40.996203 mon.2 [INF] mon.san03 calling new monitor election
2017-07-13 20:19:45.997331 mon.0 [INF] mon.san01@0 won leader election with quorum 0,2
2017-07-13 20:19:45.998626 mon.0 [INF] HEALTH_WARN; 1 mons down, quorum 0,2 san01,san03
2017-07-13 20:19:46.128726 mon.0 [INF] monmap e3: 3 mons at {san01=10.0.10.1:6789/0,san02=10.0.10.2:6789/0,san03=10.0.10.3:6789/0}
2017-07-13 20:19:46.129106 mon.0 [INF] osdmap e26342: 6 osds: 6 up, 6 in
2017-07-13 20:20:14.012643 mon.0 [INF] pgmap v1648401: 1000 pgs: 1000 active+clean; 395 GB data, 791 GB used, 11460 GB / 12252 GB avail
root@san01:~# ceph osd tree --cluster xxxxxxxx
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 11.96452 root default
-2 0 host san03
-3 5.98228 host san01
6 1.99409 osd.6 up 1.00000 1.00000
7 1.99409 osd.7 up 1.00000 1.00000
8 1.99409 osd.8 up 1.00000 1.00000
-4 5.98224 host san02
2 1.99408 osd.2 up 1.00000 1.00000
5 1.99408 osd.5 up 1.00000 1.00000
0 1.99408 osd.0 up 1.00000 1.00000
----------------------------------------------------------------------------------------------
admin
2,930 Posts
July 14, 2017, 4:45 am
Can you make sure the noout and nodown flags are not set:
ceph osd unset noout --cluster xx
ceph osd unset nodown --cluster xx
Does this happen every time you shut down a node, or only sometimes?
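For reference, a quick way to confirm whether either flag is currently set (reusing the same --cluster placeholder) is to check the flags line of the OSD map:
ceph osd dump --cluster xx | grep flags
If noout or nodown show up in that line, the unset commands above will clear them.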
maxthetor
24 Posts
July 14, 2017, 2:33 pm
I do not have any of these values set in my Ceph cluster, and never have, regardless of the node crash.
If it's really a Ceph bug, I'm thinking of separating the monitor nodes from the OSD nodes.
----------------------------------------------------------------------------------------
root@san01:~# ceph -w --cluster xxxxxxx
cluster 9f99e76f-1f50-4aa3-b876-dbf194a3cadf
health HEALTH_OK
monmap e3: 3 mons at {san01=10.0.10.1:6789/0,san02=10.0.10.2:6789/0,san03=10.0.10.3:6789/0}
election epoch 500, quorum 0,1,2 san01,san02,san03
osdmap e29393: 6 osds: 6 up, 6 in
flags sortbitwise,require_jewel_osds
pgmap v1682675: 1000 pgs, 1 pools, 120 GB data, 30981 objects
241 GB used, 12010 GB / 12252 GB avail
1000 active+clean
client io 673 B/s rd, 34329 B/s wr, 0 op/s rd, 3 op/s wr
admin
2,930 Posts
July 14, 2017, 7:59 pm
We will look into it. It may be a Ceph issue, but can you please let me know if this happens all the time or only in some cases? We test failing nodes all the time and we have not hit this. Also, I believe in your earlier tests with Ceph recovery load the OSD downs were detected correctly? If you can give me some more info it will greatly help.
PetaSAN 1.4 will include Ceph 10.2.7, which according to the link you sent has a fix, but I would much prefer that we reproduce the problem here, make sure it does solve the issue you see, and maybe send a patch.
maxthetor
24 Posts
July 14, 2017, 11:30 pm
It happens all the time. At the moment I have 4 PetaSAN nodes: 2 nodes with OSDs, san01 and san02, and 2 nodes without OSDs, san03 and san04.
No matter in what sequence I turn off the nodes, or how many times I run the test, the OSDs on san01 and san02 are not marked as down.
In my previous tests, where I ran into problems until I adjusted the environment, the OSDs were very different:
2 disks of 146 GB, 1 disk of 230 GB, 1 virtual disk of 130 GB, etc.
The OSD environment was very messy, because it was a test environment.
Now it is a more organized environment: 6 OSDs distributed across six 2 TB disks on 2 nodes, 3 disks in each. Soon it will receive another six 2 TB disks.
So that's right: before, the OSDs were detected as down even with different sizes; now, with the more organized setup, the OSDs are not detected as down.
Is there a release date for version 1.4?
-------------------------------------------------------------------------------------------------------------------------------
root@san01:~# ceph osd tree --cluster xxxxxx
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 11.96452 root default
-2 0 host san03
-3 5.98228 host san01
6 1.99409 osd.6 up 1.00000 1.00000
7 1.99409 osd.7 up 1.00000 1.00000
8 1.99409 osd.8 up 1.00000 1.00000
-4 5.98224 host san02
2 1.99408 osd.2 up 1.00000 1.00000
5 1.99408 osd.5 up 1.00000 1.00000
0 1.99408 osd.0 up 1.00000 1.00000
admin
2,930 Posts
July 14, 2017, 11:49 pm
Looking into this, it is not a bug. It happens because you only have 2 storage/OSD nodes, and the Ceph default requires reports from 2 different storage nodes before an OSD on a third node is marked down. For a 2 storage node setup, please add the following in /etc/ceph/cluster_name.conf on all nodes, under the [mon] section, and reboot:
mon_osd_min_down_reporters = 1
This should fix the issue. Please note that you should only do this when you have just 2 storage nodes. This setting can lead to issues if you have more than 2 storage nodes, since it gives a single node the power to fail the others.
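As a rough sketch, assuming the config file is /etc/ceph/cluster_name.conf as mentioned above, the relevant section would look like:
[mon]
mon_osd_min_down_reporters = 1
After the reboot you can verify the running value on a monitor host via its admin socket, e.g.:
ceph daemon mon.san01 config show | grep mon_osd_min_down_reporters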
Version 1.4 is due mid August; it will have many performance and tuning features. We are also releasing 1.3.1 in a couple of days, mainly to fix the issue with the ESXi ARP cache; it is being tested now.
maxthetor
24 Posts
July 25, 2017, 12:30 am
Great, it worked with that option.
When I increase the number of OSDs per server, should I remove this option?