
No Active Mgr


I have been having issues with one of my 3 manager nodes (petasan-2) losing power; the other two (petasan-0, petasan-1) are fine. petasan-2 was the active manager.

The problem is, once petasan-2 goes down, shouldn't one of the other two step in and take its place as manager?

This is not happening.

Version is 2.3.1

Thank you

 

root@petasan-1:~# ceph status
cluster:
id: 45e5f9d4-ba49-41b1-9b6a-09e3da4e2cd4
health: HEALTH_ERR
no active mgr
1/3 mons down, quorum petasan-0,petasan-1

services:
mon: 3 daemons, quorum petasan-0,petasan-1 (age 17h), out of quorum: petasan-2
mgr: no daemons active (since 17h)
osd: 31 osds: 29 up (since 17h), 29 in (since 17h)
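(Side note: a standby mgr can only take over if one is registered in the mgr map. Two generic Ceph commands, assuming the surviving mons are reachable, show whether any standbys exist at all:

ceph mgr stat
ceph mgr dump

The status output above lists no active mgr and no standbys either, which suggests there was nothing left to fail over to.)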

On the petasan-0 node, run:

systemctl restart ceph-mgr@petasan-0

If it cannot find the service, run:

/opt/petasan/scripts/create_mgr.py

If this fixes it, repeat on petasan-1.
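(If you want to check whether the mgr unit even exists on a node before restarting it, plain systemd will tell you; nothing PetaSAN-specific here:

systemctl list-units --all 'ceph-mgr@*'
systemctl status ceph-mgr@petasan-0)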

Didn't seem to affect it at all.

Still no active mgr

root@petasan-0:~# systemctl restart ceph-mgr@petasan-0
root@petasan-0:~#
root@petasan-0:~# ceph status
cluster:
id: 45e5f9d4-ba49-41b1-9b6a-09e3da4e2cd4
health: HEALTH_ERR
no active mgr
1/3 mons down, quorum petasan-0,petasan-1

services:
mon: 3 daemons, quorum petasan-0,petasan-1 (age 20h), out of quorum: petasan-2
mgr: no daemons active (since 20h)
osd: 31 osds: 29 up (since 20h), 29 in (since 20h)

data:
pools: 1 pools, 784 pgs
objects: 389.40k objects, 1.5 TiB
usage: 4.5 TiB used, 22 TiB / 26 TiB avail
pgs: 784 active+clean

io:
client: 57 KiB/s rd, 835 KiB/s wr, 34 op/s rd, 44 op/s wr

root@petasan-0:~# /opt/petasan/scripts/create_mgr.py
root@petasan-0:~# ceph status
cluster:
id: 45e5f9d4-ba49-41b1-9b6a-09e3da4e2cd4
health: HEALTH_ERR
no active mgr
1/3 mons down, quorum petasan-0,petasan-1

services:
mon: 3 daemons, quorum petasan-0,petasan-1 (age 20h), out of quorum: petasan-2
mgr: no daemons active (since 20h)
osd: 31 osds: 29 up (since 20h), 29 in (since 20h)

data:
pools: 1 pools, 784 pgs
objects: 389.40k objects, 1.5 TiB
usage: 4.5 TiB used, 22 TiB / 26 TiB avail
pgs: 784 active+clean

io:
client: 57 KiB/s rd, 835 KiB/s wr, 34 op/s rd, 44 op/s wr

More...

root@petasan-0:~# systemctl stop ceph-mgr@petasan-0
root@petasan-0:~# systemctl start ceph-mgr@petasan-0
Job for ceph-mgr@petasan-0.service failed because the control process exited with error code.
See "systemctl status ceph-mgr@petasan-0.service" and "journalctl -xe" for details.
root@petasan-0:~# systemctl status ceph-mgr@petasan-0.service
ceph-mgr@petasan-0.service - Ceph cluster manager daemon
Loaded: loaded (/lib/systemd/system/ceph-mgr@.service; indirect; vendor preset: enabled)
Active: failed (Result: exit-code) since Thu 2020-04-23 14:20:16 EDT; 5min ago
Process: 2920019 ExecStart=/usr/bin/ceph-mgr -f --cluster ${CLUSTER} --id petasan-0 --setuser ceph --setgroup ceph (code=exited, status=1/FAILURE)
Main PID: 2920019 (code=exited, status=1/FAILURE)

Apr 23 14:20:16 petasan-0 systemd[1]: ceph-mgr@petasan-0.service: Service hold-off time over, scheduling restart.
Apr 23 14:20:16 petasan-0 systemd[1]: ceph-mgr@petasan-0.service: Scheduled restart job, restart counter is at 3.
Apr 23 14:20:16 petasan-0 systemd[1]: Stopped Ceph cluster manager daemon.
Apr 23 14:20:16 petasan-0 systemd[1]: ceph-mgr@petasan-0.service: Start request repeated too quickly.
Apr 23 14:20:16 petasan-0 systemd[1]: ceph-mgr@petasan-0.service: Failed with result 'exit-code'.
Apr 23 14:20:16 petasan-0 systemd[1]: Failed to start Ceph cluster manager daemon.
Apr 23 14:25:39 petasan-0 systemd[1]: ceph-mgr@petasan-0.service: Start request repeated too quickly.
Apr 23 14:25:39 petasan-0 systemd[1]: ceph-mgr@petasan-0.service: Failed with result 'exit-code'.
Apr 23 14:25:39 petasan-0 systemd[1]: Failed to start Ceph cluster manager daemon.
root@petasan-0:~#

On petasan-0, try this on the command line:

ceph-mgr -f --cluster --id petasan-0 --setuser ceph --setgroup ceph

and see if it prints an error. If it runs without errors, break out of it (Ctrl-C) and restart the service.

The systemd error you pasted is complaining that there is a limit on how many times the service can be restarted; the current setting is 3 restarts per 30 min. You can increase this:

nano /lib/systemd/system/ceph-mgr@.service
StartLimitBurst=3 -> 10

systemctl daemon-reload
systemctl restart ceph-mgr@petasan-0
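(Two side notes, assuming a stock systemd setup. systemctl reset-failed clears the restart counter right away so you do not have to wait out the 30 min window, and a drop-in override keeps the change from being overwritten when the ceph packages are upgraded; put the directive in the same section it sits in in the shipped unit file, usually [Service] in the Ceph units:

systemctl reset-failed ceph-mgr@petasan-0

mkdir -p /etc/systemd/system/ceph-mgr@.service.d
cat > /etc/systemd/system/ceph-mgr@.service.d/override.conf <<'EOF'
[Service]
StartLimitBurst=10
EOF
systemctl daemon-reload)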

root@petasan-0:~# ceph-mgr -f --cluster --id petasan-0 --setuser ceph --setgroup ceph
did not load config file, using default settings.
unable to get monitor info from DNS SRV with service name: ceph-mon
2020-04-23 14:54:53.545 7fa0135f9d40 -1 failed for service _ceph-mon._tcp
2020-04-23 14:54:53.545 7fa0135f9d40 -1 monclient: get_monmap_and_config cannot identify monitors to contact
failed to fetch mon config (--no-mon-config to skip)
root@petasan-0:~#

Try adding ceph after --cluster; without a value, the --cluster option most likely swallows the next argument, so the daemon cannot find its config file or the monitors:

ceph-mgr -f --cluster ceph --id petasan-0 --setuser ceph --setgroup ceph

 

root@petasan-0:~# ceph-mgr -f --cluster ceph --id petasan-0 --setuser ceph --setgroup ceph
2020-04-24 12:17:27.476 7f6c58a52d40 -1 auth: unable to find a keyring on /var/lib/ceph/mgr/ceph-petasan-0/keyring: (2) No such file or directory
2020-04-24 12:17:27.476 7f6c58a52d40 -1 AuthRegistry(0x558e50a46a40) no keyring found at /var/lib/ceph/mgr/ceph-petasan-0/keyring, disabling cephx
2020-04-24 12:17:27.476 7f6c58a52d40 -1 auth: unable to find a keyring on /var/lib/ceph/mgr/ceph-petasan-0/keyring: (2) No such file or directory
2020-04-24 12:17:27.476 7f6c58a52d40 -1 AuthRegistry(0x7ffc8de01798) no keyring found at /var/lib/ceph/mgr/ceph-petasan-0/keyring, disabling cephx
failed to fetch mon config (--no-mon-config to skip)
root@petasan-0:~#

BTW, there are currently no subdirectories under /var/lib/ceph/mgr/ on either the petasan-0 or petasan-1 server.
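(For reference only: on a plain upstream Ceph cluster, a missing mgr data directory and keyring would normally be recreated by hand roughly like this. On PetaSAN the create_mgr.py script is the supported way to do it, as the next reply shows, so treat this as a sketch rather than the PetaSAN procedure:

mkdir -p /var/lib/ceph/mgr/ceph-petasan-0
ceph auth get-or-create mgr.petasan-0 mon 'allow profile mgr' osd 'allow *' mds 'allow *' -o /var/lib/ceph/mgr/ceph-petasan-0/keyring
chown -R ceph:ceph /var/lib/ceph/mgr/ceph-petasan-0
systemctl restart ceph-mgr@petasan-0)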

 

Bill,

If using 2.3.1, run:

rm /opt/petasan/config/mgr_installed_flag
/opt/petasan/scripts/create_mgr.py
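(Assuming the script completes cleanly and names the mgr after the host, as the ceph-mgr@petasan-0 unit suggests, you can confirm the key and data directory were recreated before checking cluster status:

ls -l /var/lib/ceph/mgr/
ceph auth get mgr.petasan-0
ceph mgr stat)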

root@petasan-0:~# rm /opt/petasan/config/mgr_installed_flag
root@petasan-0:~# /opt/petasan/scripts/create_mgr.py
updated caps for client.admin
root@petasan-0:~# ceph status
cluster:
id: 45e5f9d4-ba49-41b1-9b6a-09e3da4e2cd4
health: HEALTH_WARN
1/3 mons down, quorum petasan-0,petasan-1

services:
mon: 3 daemons, quorum petasan-0,petasan-1 (age 44h), out of quorum: petasan-2
mgr: petasan-0(active, since 17s)
osd: 31 osds: 29 up (since 44h), 29 in (since 43h)

data:
pools: 1 pools, 784 pgs
objects: 389.40k objects, 1.5 TiB
usage: 4.5 TiB used, 22 TiB / 26 TiB avail
pgs: 784 active+clean

io:
client: 57 KiB/s rd, 835 KiB/s wr, 34 op/s rd, 44 op/s wr

root@petasan-0:~#

 

Should I do the same on petasan-2?

 
