Update 3.2 issues
AlbertHakvoort
21 Posts
July 12, 2023, 2:43 pm
Hello,
After the update to 3.2 the OSDs are not updating.
When I (re)try to execute the update command
ceph osd require-osd-release quincy
Ceph stops working / hangs for a long time:
mon.Ceph02 (mon.2) 8959 : cluster [INF] disallowing boot of quincy+ OSD osd.21 v2:10.0.1.13:6826/153756 because require_osd_release < octopus
root@Ceph01:~# ceph -s
cluster:
id: xxxxxxxxxxxxxxxxxxxxxxxxxx
health: HEALTH_WARN
1 filesystem is degraded
1/3 mons down, quorum Ceph01,Ceph02
noout flag(s) set
3 osds down
all OSDs are running octopus or later but require_osd_release < octopus
Reduced data availability: 4385 pgs inactive, 347 pgs down, 2514 pgs peering
Degraded data redundancy: 32475412/259722717 objects degraded (12.504%), 576 pgs degraded, 582 pgs undersized
6381 slow ops, oldest one blocked for 4570 sec, mon.Ceph02 has slow ops
services:
mon: 3 daemons, quorum Ceph01,Ceph02 (age 0.366662s), out of quorum: Ceph03
mgr: Ceph03(active, since 5h), standbys: Ceph02, Ceph01
mds: 1/1 daemons up, 2 standby
osd: 66 osds: 19 up (since 6h), 22 in (since 6h); 1794 remapped pgs
flags noout
data:
volumes: 0/1 healthy, 1 recovering
pools: 4 pools, 4385 pgs
objects: 86.57M objects, 193 TiB
usage: 278 TiB used, 84 TiB / 361 TiB avail
pgs: 21.482% pgs unknown
78.518% pgs not active
32475412/259722717 objects degraded (12.504%)
2514 peering
942 unknown
576 undersized+degraded+peered
347 down
6 undersized+peered
Tried restarting the nodes/OSDs but no luck so far. Any ideas?
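A quick way to confirm the mismatch the monitor log complains about is to compare the release level the cluster map currently enforces with the release the daemons actually run. A minimal diagnostic sketch, using standard Ceph commands on any monitor node:
# release level the cluster map currently enforces; per the warning above this
# will still show something older than octopus
ceph osd dump | grep require_osd_release
# release each daemon type reports (mon/mgr/osd/mds)
ceph versions
# full health detail, including the require_osd_release warning
ceph health detail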
Last edited on July 12, 2023, 2:52 pm by AlbertHakvoort · #1
hansmeiser2207
1 Post
July 13, 2023, 12:32 pm
Hello,
I have exactly the same error pattern.
We initially set up PetaSAN with version 2.1.0 and have since installed every available update.
All OSDs on all OSD nodes are listed as "down, in".
All OSD nodes run Ubuntu 20.04.6 LTS (Focal Fossa) and all packages are up to date.
The Ceph version on the OSD nodes is "ceph/petasan-v3,now 17.2.5-1petasan amd64 [installed]".
The last update command "ceph osd require-osd-release quincy" is stuck. None of the OSDs are being recognized by the cluster, although all OSD daemons have started successfully and report no errors.
It seems to me that the monitor nodes can no longer communicate with the OSD nodes. I can rule out network problems.
I need urgent help please!
Thanks very much.
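To double-check that an OSD daemon really is running while the cluster map still marks it down, something like the following can be used on an OSD node (osd.0 is only an example id; substitute your own):
# is the OSD systemd unit actually active on this node?
systemctl status ceph-osd@0
# what the cluster map says about the same OSD (up/down, in/out)
ceph osd tree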
AlbertHakvoort
21 Posts
July 13, 2023, 1:46 pm
Support is looking into my issue.
Can you post the output of:
ceph status
ceph osd dump | grep release
ceph versions
AlbertHakvoort
21 Posts
July 13, 2023, 4:00 pm
Our PetaSAN is running again.
Support compiled a modified ceph-mon binary that is able to ignore the wrong version reported by the OSDs.
admin
2,930 Posts
July 13, 2023, 5:39 pm
If it is the same issue, it is related to
https://www.spinics.net/lists/ceph-users/msg74089.html
It happens if the cluster was originally installed from an old release and upgraded over time. Note that we will provide a fix in our upgrade script to handle this automatically, so the following is only for the case where you have already upgraded and have the issue.
Download the modified monitor binary:
wget https://www.petasan.org/fixes/320/ceph-mon.gz
gunzip ceph-mon.gz
On the first 3 nodes:
mv /usr/bin/ceph-mon /usr/bin/ceph-mon-orig
chmod +x ceph-mon
cp ceph-mon /usr/bin
ln -s /usr/lib/x86_64-linux-gnu/ceph/libceph-common.so.2 /usr/lib/x86_64-linux-gnu/libceph-common.so.2
systemctl restart ceph-mon.target
Make sure the 3 new monitors have started and are in quorum using:
ceph status
If yes, then:
ceph osd require-osd-release octopus
If all goes well, the following command will show octopus:
ceph osd dump | grep release
If all is OK, reboot all nodes in the cluster.
Only when all OSDs are up, all PGs are active/clean, and the only remaining issue is
all OSDs are running quincy or later but require_osd_release < quincy
then at this point run:
ceph osd require-osd-release quincy
Again, we will fix the upgrade scripts to handle this automatically, without the need for this binary.
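Put together, the quorum check and the octopus step above could be scripted roughly as follows. This is only a sketch: it assumes the modified ceph-mon binary has already been installed on the first 3 nodes as described, and that jq is available on the node running it.
#!/bin/bash
# wait until all 3 monitors are back in quorum
until [ "$(ceph --connect-timeout 10 quorum_status -f json 2>/dev/null | jq '.quorum | length' 2>/dev/null)" = "3" ]; do
    echo "waiting for monitor quorum..."
    sleep 10
done
# raise the required release and verify it took effect
ceph osd require-osd-release octopus
# should now print: require_osd_release octopus
ceph osd dump | grep require_osd_release
The final quincy step should still be run manually, and only once all OSDs are up and all PGs are active/clean as noted above.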
Last edited on July 14, 2023, 6:56 pm by admin · #5
admin
2,930 Posts
July 13, 2023, 8:51 pm
The upgrade script has been modified to deal with this case. The script is dynamically fetched online, so no local changes are required.