Update 3.2 issues
AlbertHakvoort
21 Posts
July 12, 2023, 2:43 pm
Hello,
After the update to 3.2 the OSDs are not updating.
When I (re)try to execute the update command
ceph osd require-osd-release quincy
Ceph stops working / hangs for a long time:
mon.Ceph02 (mon.2) 8959 : cluster [INF] disallowing boot of quincy+ OSD osd.21 v2:10.0.1.13:6826/153756 because require_osd_release < octopus
root@Ceph01:~# ceph -s
cluster:
id: xxxxxxxxxxxxxxxxxxxxxxxxxx
health: HEALTH_WARN
1 filesystem is degraded
1/3 mons down, quorum Ceph01,Ceph02
noout flag(s) set
3 osds down
all OSDs are running octopus or later but require_osd_release < octopus
Reduced data availability: 4385 pgs inactive, 347 pgs down, 2514 pgs peering
Degraded data redundancy: 32475412/259722717 objects degraded (12.504%), 576 pgs degraded, 582 pgs undersized
6381 slow ops, oldest one blocked for 4570 sec, mon.Ceph02 has slow ops
services:
mon: 3 daemons, quorum Ceph01,Ceph02 (age 0.366662s), out of quorum: Ceph03
mgr: Ceph03(active, since 5h), standbys: Ceph02, Ceph01
mds: 1/1 daemons up, 2 standby
osd: 66 osds: 19 up (since 6h), 22 in (since 6h); 1794 remapped pgs
flags noout
data:
volumes: 0/1 healthy, 1 recovering
pools: 4 pools, 4385 pgs
objects: 86.57M objects, 193 TiB
usage: 278 TiB used, 84 TiB / 361 TiB avail
pgs: 21.482% pgs unknown
78.518% pgs not active
32475412/259722717 objects degraded (12.504%)
2514 peering
942 unknown
576 undersized+degraded+peered
347 down
6 undersized+peered
Tried restarting the nodes/OSDs but no luck so far. Any ideas?
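A quick way to confirm the mismatch the monitor log complains about is to compare the release level the cluster map currently enforces with the release the daemons actually run. A minimal diagnostic sketch, using standard Ceph commands on any monitor node:
# release level the cluster map currently enforces; per the warning above this
# will still show something older than octopus
ceph osd dump | grep require_osd_release
# release each daemon type reports (mon/mgr/osd/mds)
ceph versions
# full health detail, including the require_osd_release warning
ceph health detail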
Last edited on July 12, 2023, 2:52 pm by AlbertHakvoort · #1
hansmeiser2207
1 Post
July 13, 2023, 12:32 pm
Hello,
I have exactly the same error pattern.
We initially set up PetaSAN with version 2.1.0 and have since installed every available update.
All OSDs on all OSD nodes are listed as "down, in".
All OSD nodes run Ubuntu 20.04.6 LTS (Focal Fossa) and all packages are up to date.
The Ceph version on the OSD nodes is "ceph/petasan-v3,now 17.2.5-1petasan amd64 [installed]".
The last update command "ceph osd require-osd-release quincy" is stuck. None of the OSDs are being recognized by the cluster, although all OSD daemons have started successfully and report no errors.
It seems to me that the monitor nodes can no longer communicate with the OSD nodes. I can rule out network problems.
I need urgent help please!
Thanks very much.
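To double-check that an OSD daemon really is running while the cluster map still marks it down, something like the following can be used on an OSD node (osd.0 is only an example id; substitute your own):
# is the OSD systemd unit actually active on this node?
systemctl status ceph-osd@0
# what the cluster map says about the same OSD (up/down, in/out)
ceph osd tree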
AlbertHakvoort
21 Posts
July 13, 2023, 1:46 pm
Support is looking into my issue.
Can you post the output of:
ceph status
ceph osd dump | grep release
ceph versions
AlbertHakvoort
21 Posts
July 13, 2023, 4:00 pm
Our PetaSAN is running again.
Support compiled a modified ceph-mon binary that is able to ignore the wrong version reported by the OSDs.
admin
2,930 Posts
July 13, 2023, 5:39 pm
If it is the same issue, it is related to
https://www.spinics.net/lists/ceph-users/msg74089.html
It happens if the cluster was originally installed from an old release and upgraded over time. Note that we will provide a fix in our upgrade script to handle this automatically, so the following is only for the case where you have already upgraded and have the issue.
Download the modified monitor binary:
wget https://www.petasan.org/fixes/320/ceph-mon.gz
gunzip ceph-mon.gz
On the first 3 nodes:
mv /usr/bin/ceph-mon /usr/bin/ceph-mon-orig
chmod +x ceph-mon
cp ceph-mon /usr/bin
ln -s /usr/lib/x86_64-linux-gnu/ceph/libceph-common.so.2 /usr/lib/x86_64-linux-gnu/libceph-common.so.2
systemctl restart ceph-mon.target
Make sure the 3 new monitors have started and are in quorum using:
ceph status
If yes, then:
ceph osd require-osd-release octopus
If all goes well, the following command will show octopus:
ceph osd dump | grep release
If all is OK, reboot all nodes in the cluster.
Only when all OSDs are up, all PGs are active/clean, and the only remaining issue is
all OSDs are running quincy or later but require_osd_release < quincy
then at this point run:
ceph osd require-osd-release quincy
Again, we will fix the upgrade scripts to handle this automatically, without the need for this binary.
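Put together, the quorum check and the octopus step above could be scripted roughly as follows. This is only a sketch: it assumes the modified ceph-mon binary has already been installed on the first 3 nodes as described, and that jq is available on the node running it.
#!/bin/bash
# wait until all 3 monitors are back in quorum
until [ "$(ceph --connect-timeout 10 quorum_status -f json 2>/dev/null | jq '.quorum | length' 2>/dev/null)" = "3" ]; do
    echo "waiting for monitor quorum..."
    sleep 10
done
# raise the required release and verify it took effect
ceph osd require-osd-release octopus
# should now print: require_osd_release octopus
ceph osd dump | grep require_osd_release
The final quincy step should still be run manually, and only once all OSDs are up and all PGs are active/clean as noted above.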
Last edited on July 14, 2023, 6:56 pm by admin · #5
admin
2,930 Posts
July 13, 2023, 8:51 pm
The upgrade script has been modified to deal with this case. The script is dynamically fetched online, so no local changes are required.