No Active Mgr
bill gottlieb
26 Posts
April 23, 2020, 5:49 pm
I have been having issues with one of my 3 managers (petasan-2) losing power; the other two (petasan-0, petasan-1) are fine. petasan-2 was the active manager.
The problem is, once petasan-2 goes down, shouldn't one of the other two step in and take its place as manager?
This is not happening.
Version is 2.3.1
Thank you
root@petasan-1:~# ceph status
cluster:
id: 45e5f9d4-ba49-41b1-9b6a-09e3da4e2cd4
health: HEALTH_ERR
no active mgr
1/3 mons down, quorum petasan-0,petasan-1
services:
mon: 3 daemons, quorum petasan-0,petasan-1 (age 17h), out of quorum: petasan-2
mgr: no daemons active (since 17h)
osd: 31 osds: 29 up (since 17h), 29 in (since 17h)
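For reference, when standby managers are registered, the mgr line in ceph status normally lists them next to the active one; an illustrative line (not from this cluster) would look like:
mgr: petasan-2(active), standbys: petasan-0, petasan-1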
admin
2,930 Posts
April 23, 2020, 6:06 pm
On the petasan-0 node, do a
systemctl restart ceph-mgr@petasan-0
If it cannot find the service, do a
/opt/petasan/scripts/create_mgr.py
If this fixes it, repeat on petasan-1.
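Before restarting anything, a quick way to see whether a mgr was ever provisioned on a node (a minimal sketch; paths are the Ceph defaults):
systemctl list-units 'ceph-mgr@*' --all      # is any mgr unit known to systemd?
ls -l /var/lib/ceph/mgr/                     # does a ceph-<hostname> data directory exist?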
bill gottlieb
26 Posts
April 23, 2020, 6:17 pm
Didn't seem to affect it at all...
Still no active mgr
root@petasan-0:~# systemctl restart ceph-mgr@petasan-0
root@petasan-0:~#
root@petasan-0:~# ceph status
cluster:
id: 45e5f9d4-ba49-41b1-9b6a-09e3da4e2cd4
health: HEALTH_ERR
no active mgr
1/3 mons down, quorum petasan-0,petasan-1
services:
mon: 3 daemons, quorum petasan-0,petasan-1 (age 20h), out of quorum: petasan-2
mgr: no daemons active (since 20h)
osd: 31 osds: 29 up (since 20h), 29 in (since 20h)
data:
pools: 1 pools, 784 pgs
objects: 389.40k objects, 1.5 TiB
usage: 4.5 TiB used, 22 TiB / 26 TiB avail
pgs: 784 active+clean
io:
client: 57 KiB/s rd, 835 KiB/s wr, 34 op/s rd, 44 op/s wr
root@petasan-0:~# /opt/petasan/scripts/create_mgr.py
root@petasan-0:~# ceph status
cluster:
id: 45e5f9d4-ba49-41b1-9b6a-09e3da4e2cd4
health: HEALTH_ERR
no active mgr
1/3 mons down, quorum petasan-0,petasan-1
services:
mon: 3 daemons, quorum petasan-0,petasan-1 (age 20h), out of quorum: petasan-2
mgr: no daemons active (since 20h)
osd: 31 osds: 29 up (since 20h), 29 in (since 20h)
data:
pools: 1 pools, 784 pgs
objects: 389.40k objects, 1.5 TiB
usage: 4.5 TiB used, 22 TiB / 26 TiB avail
pgs: 784 active+clean
io:
client: 57 KiB/s rd, 835 KiB/s wr, 34 op/s rd, 44 op/s wr
bill gottlieb
26 Posts
April 23, 2020, 6:22 pm
More...
root@petasan-0:~# systemctl stop ceph-mgr@petasan-0
root@petasan-0:~# systemctl start ceph-mgr@petasan-0
Job for ceph-mgr@petasan-0.service failed because the control process exited with error code.
See "systemctl status ceph-mgr@petasan-0.service" and "journalctl -xe" for details.
root@petasan-0:~# systemctl status ceph-mgr@petasan-0.service
● ceph-mgr@petasan-0.service - Ceph cluster manager daemon
Loaded: loaded (/lib/systemd/system/ceph-mgr@.service; indirect; vendor preset: enabled)
Active: failed (Result: exit-code) since Thu 2020-04-23 14:20:16 EDT; 5min ago
Process: 2920019 ExecStart=/usr/bin/ceph-mgr -f --cluster ${CLUSTER} --id petasan-0 --setuser ceph --setgroup ceph (code=exited, status=1/FAILURE)
Main PID: 2920019 (code=exited, status=1/FAILURE)
Apr 23 14:20:16 petasan-0 systemd[1]: ceph-mgr@petasan-0.service: Service hold-off time over, scheduling restart.
Apr 23 14:20:16 petasan-0 systemd[1]: ceph-mgr@petasan-0.service: Scheduled restart job, restart counter is at 3.
Apr 23 14:20:16 petasan-0 systemd[1]: Stopped Ceph cluster manager daemon.
Apr 23 14:20:16 petasan-0 systemd[1]: ceph-mgr@petasan-0.service: Start request repeated too quickly.
Apr 23 14:20:16 petasan-0 systemd[1]: ceph-mgr@petasan-0.service: Failed with result 'exit-code'.
Apr 23 14:20:16 petasan-0 systemd[1]: Failed to start Ceph cluster manager daemon.
Apr 23 14:25:39 petasan-0 systemd[1]: ceph-mgr@petasan-0.service: Start request repeated too quickly.
Apr 23 14:25:39 petasan-0 systemd[1]: ceph-mgr@petasan-0.service: Failed with result 'exit-code'.
Apr 23 14:25:39 petasan-0 systemd[1]: Failed to start Ceph cluster manager daemon.
root@petasan-0:~#
admin
2,930 Posts
April 23, 2020, 6:37 pm
On petasan-0, try this on the command line:
ceph-mgr -f --cluster --id petasan-0 --setuser ceph --setgroup ceph
and see if it prints an error. If it runs fine, break out of it; the error to fix is then the one systemd reported, which complains there is a limit on how many times you can restart the service.
The current setting is 3 times per 30 min. You can increase this:
nano /lib/systemd/system/ceph-mgr@.service
change StartLimitBurst=3 to StartLimitBurst=10
systemctl daemon-reload
systemctl restart ceph-mgr@petasan-0
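The same limit can also be raised with a systemd drop-in instead of editing the packaged unit, so the change survives package upgrades (a sketch; StartLimitBurst is assumed to sit in the [Service] section, matching the shipped unit):
systemctl edit ceph-mgr@.service       # add the two lines below in the editor that opens
    [Service]
    StartLimitBurst=10
systemctl daemon-reload
systemctl reset-failed ceph-mgr@petasan-0    # clears the "start request repeated too quickly" counter
systemctl restart ceph-mgr@petasan-0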
bill gottlieb
26 Posts
April 23, 2020, 6:49 pm
root@petasan-0:~# ceph-mgr -f --cluster --id petasan-0 --setuser ceph --setgroup ceph
did not load config file, using default settings.
unable to get monitor info from DNS SRV with service name: ceph-mon
2020-04-23 14:54:53.545 7fa0135f9d40 -1 failed for service _ceph-mon._tcp
2020-04-23 14:54:53.545 7fa0135f9d40 -1 monclient: get_monmap_and_config cannot identify monitors to contact
failed to fetch mon config (--no-mon-config to skip)
root@petasan-0:~#
admin
2,930 Posts
April 23, 2020, 7:12 pm
Try adding ceph after --cluster:
ceph-mgr -f --cluster ceph --id petasan-0 --setuser ceph --setgroup ceph
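For reference, the unit file shown in the status output above starts the daemon with --cluster ${CLUSTER}, and that variable normally resolves to ceph; a quick way to confirm where the value comes from (a sketch, file locations may vary by release):
grep CLUSTER /lib/systemd/system/ceph-mgr@.service
cat /etc/default/ceph        # on Debian/Ubuntu-based installs this file, if present, can override CLUSTER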
bill gottlieb
26 Posts
April 24, 2020, 4:13 pm
root@petasan-0:~# ceph-mgr -f --cluster ceph --id petasan-0 --setuser ceph --setgroup ceph
2020-04-24 12:17:27.476 7f6c58a52d40 -1 auth: unable to find a keyring on /var/lib/ceph/mgr/ceph-petasan-0/keyring: (2) No such file or directory
2020-04-24 12:17:27.476 7f6c58a52d40 -1 AuthRegistry(0x558e50a46a40) no keyring found at /var/lib/ceph/mgr/ceph-petasan-0/keyring, disabling cephx
2020-04-24 12:17:27.476 7f6c58a52d40 -1 auth: unable to find a keyring on /var/lib/ceph/mgr/ceph-petasan-0/keyring: (2) No such file or directory
2020-04-24 12:17:27.476 7f6c58a52d40 -1 AuthRegistry(0x7ffc8de01798) no keyring found at /var/lib/ceph/mgr/ceph-petasan-0/keyring, disabling cephx
failed to fetch mon config (--no-mon-config to skip)
root@petasan-0:~#
BTW - there are currently no subdirectories under /var/lib/ceph/mgr/ on either the petasan-0 or petasan-1 server.
- Bill
admin
2,930 Posts
April 24, 2020, 5:07 pm
If using 2.3.1:
rm /opt/petasan/config/mgr_installed_flag
/opt/petasan/scripts/create_mgr.py
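For context, a rough sketch of what re-creating a mgr by hand looks like on a plain Ceph node; create_mgr.py presumably wraps something similar, so the script above remains the supported route on PetaSAN:
mkdir -p /var/lib/ceph/mgr/ceph-petasan-0
ceph auth get-or-create mgr.petasan-0 mon 'allow profile mgr' osd 'allow *' mds 'allow *' -o /var/lib/ceph/mgr/ceph-petasan-0/keyring
chown -R ceph:ceph /var/lib/ceph/mgr/ceph-petasan-0
systemctl restart ceph-mgr@petasan-0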
bill gottlieb
26 Posts
April 24, 2020, 5:47 pm
root@petasan-0:~# rm /opt/petasan/config/mgr_installed_flag
root@petasan-0:~# /opt/petasan/scripts/create_mgr.py
updated caps for client.admin
root@petasan-0:~# ceph status
cluster:
id: 45e5f9d4-ba49-41b1-9b6a-09e3da4e2cd4
health: HEALTH_WARN
1/3 mons down, quorum petasan-0,petasan-1
services:
mon: 3 daemons, quorum petasan-0,petasan-1 (age 44h), out of quorum: petasan-2
mgr: petasan-0(active, since 17s)
osd: 31 osds: 29 up (since 44h), 29 in (since 43h)
data:
pools: 1 pools, 784 pgs
objects: 389.40k objects, 1.5 TiB
usage: 4.5 TiB used, 22 TiB / 26 TiB avail
pgs: 784 active+clean
io:
client: 57 KiB/s rd, 835 KiB/s wr, 34 op/s rd, 44 op/s wr
root@petasan-0:~#
Should I do the same on petasan-2?