Ceph cluster manager daemon failure (after 2.3.1 upgrade)
trexman
60 Posts
November 1, 2019, 6:23 pm
Hi,
we did the PetaSAN upgrade from 2.3.0 to 2.3.1.
The upgrade itself went fine, except that the manager daemon won't start on one node.
After I restarted the manager (step 3.3 of the upgrade guide) I got the following message:
Job for ceph-mgr@HBPS03.service failed because the control process exited with error code.
See "systemctl status ceph-mgr@HBPS03.service" and "journalctl -xe" for details.
The status of the service says:
● ceph-mgr@HBPS03.service - Ceph cluster manager daemon
Loaded: loaded (/lib/systemd/system/ceph-mgr@.service; indirect; vendor preset: enabled)
Active: failed (Result: exit-code) since Fri 2019-11-01 18:34:50 CET; 4min 0s ago
Process: 13575 ExecStart=/usr/bin/ceph-mgr -f --cluster ${CLUSTER} --id HBPS03 --setuser ceph --setgroup ceph (code=exited, status=1/FAILURE)
Main PID: 13575 (code=exited, status=1/FAILURE)
Nov 01 18:34:50 HBPS03 systemd[1]: Stopped Ceph cluster manager daemon.
Nov 01 18:34:50 HBPS03 systemd[1]: ceph-mgr@HBPS03.service: Start request repeated too quickly.
Nov 01 18:34:50 HBPS03 systemd[1]: ceph-mgr@HBPS03.service: Failed with result 'exit-code'.
Nov 01 18:34:50 HBPS03 systemd[1]: Failed to start Ceph cluster manager daemon.
Nov 01 18:38:14 HBPS03 systemd[1]: ceph-mgr@HBPS03.service: Start request repeated too quickly.
Nov 01 18:38:14 HBPS03 systemd[1]: ceph-mgr@HBPS03.service: Failed with result 'exit-code'.
Nov 01 18:38:14 HBPS03 systemd[1]: Failed to start Ceph cluster manager daemon.
I also did a complete reboot of this node without success.
● ceph-mgr@HBPS03.service - Ceph cluster manager daemon
Loaded: loaded (/lib/systemd/system/ceph-mgr@.service; indirect; vendor preset: enabled)
Active: failed (Result: exit-code) since Fri 2019-11-01 18:45:54 CET; 1min 33s ago
Process: 3954 ExecStart=/usr/bin/ceph-mgr -f --cluster ${CLUSTER} --id HBPS03 --setuser ceph --setgroup ceph (code=exited, status=1/FAILURE)
Main PID: 3954 (code=exited, status=1/FAILURE)
Nov 01 18:45:54 HBPS03 systemd[1]: ceph-mgr@HBPS03.service: Service hold-off time over, scheduling restart.
Nov 01 18:45:54 HBPS03 systemd[1]: ceph-mgr@HBPS03.service: Scheduled restart job, restart counter is at 3.
Nov 01 18:45:54 HBPS03 systemd[1]: Stopped Ceph cluster manager daemon.
Nov 01 18:45:54 HBPS03 systemd[1]: ceph-mgr@HBPS03.service: Start request repeated too quickly.
Nov 01 18:45:54 HBPS03 systemd[1]: ceph-mgr@HBPS03.service: Failed with result 'exit-code'.
Nov 01 18:45:54 HBPS03 systemd[1]: Failed to start Ceph cluster manager daemon.
What could be causing this problem?
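To dig further, I guess the daemon's own log will say more than systemd does. Something like this should show the actual error (assuming the default Ceph log location and the HBPS03 daemon id used above):
# last startup attempts of this specific unit
journalctl -u ceph-mgr@HBPS03 --no-pager -n 50
# the manager also writes its own log under /var/log/ceph
tail -n 50 /var/log/ceph/ceph-mgr.HBPS03.log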
trexman
60 Posts
November 1, 2019, 6:35 pm
I got a bit more information. This is the output of the log ceph-mgr.HBPS03.log:
2019-11-01 18:21:56.858395 7f053c25f800 0 set uid:gid to 64045:64045 (ceph:ceph)
2019-11-01 18:21:56.858413 7f053c25f800 0 ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable), process ceph-mgr, pid 2434
2019-11-01 18:21:56.860471 7f053c25f800 0 pidfile_write: ignore empty --pid-file
2019-11-01 18:21:56.865317 7f053c25f800 -1 auth: unable to find a keyring on /var/lib/ceph/mgr/ceph-HBPS03/keyring: (2) No such file or directory
2019-11-01 18:21:56.865332 7f053c25f800 -1 monclient: ERROR: missing keyring, cannot use cephx for authentication
And yes, this is the problem: the directory /var/lib/ceph/mgr/ on node HBPS03 is empty. There is no subdirectory ceph-HBPS03 and no keyring like on the other nodes.
How could this happen, and how can it be fixed?
Can I just create the directory ceph-HBPS03 and copy the keyring from another node?
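Or, since each mgr seems to have its own cephx key (mgr.<hostname>), would it be better to recreate the key instead of copying one? I would expect that to look roughly like this, though I have not tried it yet (just a sketch based on what the other nodes have):
mkdir -p /var/lib/ceph/mgr/ceph-HBPS03
ceph auth get-or-create mgr.HBPS03 mon 'allow profile mgr' osd 'allow *' mds 'allow *' -o /var/lib/ceph/mgr/ceph-HBPS03/keyring
chown -R ceph:ceph /var/lib/ceph/mgr/ceph-HBPS03
systemctl start ceph-mgr@HBPS03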
Thanks for your help.
admin
2,930 Posts
November 2, 2019, 6:40 am
I understand that you ran the Nautilus upgrade script and followed the guide, and all went well except for one manager. If so, can you run the following on the node in question and show the output:
dpkg -s ceph-mgr | grep Version
ceph status | grep mgr
/opt/petasan/scripts/create_mgr.py
tail /opt/petasan/log/PetaSAN.log
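On a healthy cluster the mgr line from ceph status would presumably list HBPS03 as well, roughly like this (illustrative, not from your cluster):
mgr: HBPS01(active), standbys: HBPS02, HBPS03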
trexman
60 Posts
November 2, 2019, 6:53 am
Hi,
we also have another problem. I see that this node is now rebooting every 2-3 hours... 🙁
root@HBPS03:~# dpkg -s ceph-mgr | grep Version
Version: 14.2.2-1bionic
root@HBPS03:~# ceph status | grep mgr
mgr: HBPS01(active, since 13h), standbys: HBPS02
root@HBPS03:~# /opt/petasan/scripts/create_mgr.py
updated caps for client.admin
root@HBPS03:~# tail /opt/petasan/log/PetaSAN.log
02/11/2019 07:44:25 INFO Service is starting.
02/11/2019 07:44:25 INFO Cleaning unused configurations.
02/11/2019 07:44:25 INFO Cleaning all mapped disks
02/11/2019 07:44:25 INFO Cleaning unused rbd images.
02/11/2019 07:44:25 INFO Cleaning unused ips.
02/11/2019 07:50:20 INFO create_mgr() fresh install
02/11/2019 07:50:20 INFO create_mgr() started
02/11/2019 07:50:20 INFO create_mgr() cmd: mkdir -p /var/lib/ceph/mgr/ceph-HBPS03
02/11/2019 07:50:20 INFO create_mgr() cmd: ceph --cluster ceph auth get-or-create mgr.HBPS03 mon 'allow profile mgr' osd 'allow *' mds 'allow *' -o /var/lib/ceph/mgr/ceph-HBPS03/keyring
02/11/2019 07:50:21 INFO create_mgr() ended successfully
OK, after this the keyring file now exists and the ceph-mgr service is running.
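For completeness, this is roughly how I am checking that it stays healthy (the exact output will of course differ):
# the unit should stay active now
systemctl status ceph-mgr@HBPS03 --no-pager | grep Active
# HBPS03 should show up as a standby manager
ceph status | grep mgr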
Hopefully the rebooting will stop now.
Thanks for your quick help!
trexman
60 Posts
November 2, 2019, 8:45 am
The problem with the rebooting node still persists. I can see that these reboots are somehow time-triggered, but I have no idea where to look for the reason. And it is an actual reboot of the node, judging by the uptime.
This problem first occurred after the upgrade.
Can you please help with this?
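In case it helps narrow it down, this is what I am looking at so far; I am not sure yet which of these will actually show the trigger (the fencing grep is just a guess on my part):
# list recent reboots and whether the previous boots shut down cleanly
last -x reboot shutdown | head
journalctl --list-boots
# last messages of the previous boot (an orderly shutdown vs. an abrupt reset should be visible here)
journalctl -b -1 -n 100 --no-pager
# check the PetaSAN log for fencing or maintenance actions around the reboot times
grep -iE 'fence|reboot' /opt/petasan/log/PetaSAN.log | tail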
Last edited on November 2, 2019, 8:46 am by trexman · #5
admin
2,930 Posts
November 2, 2019, 3:08 pm
As it turns out you are a customer, so please log this in our support portal.