Upgrade 2.7.3 -> 2.8 breaks OSDs
dbutti
28 Posts
July 3, 2021, 1:55 pm
Hello! Let me say thanks once again for the great PetaSAN software and your much-appreciated hard work.
I have been using PetaSAN in production since release 2.0.0 and went through all the updates without major issues, until this last one.
My cluster is a rather small one: 3x Supermicro servers, 24GB RAM, 3x SSDs and 3x HDDs each, 2x 10Gbps NICs each, approximately 40TB of raw storage space.
When I started the upgrade on the first node, following the usual instructions, I immediately ran into a major issue: after updating Ceph from Nautilus to Octopus, ALL the OSDs failed to rejoin the cluster. Some of them would just start and stay disconnected, others kept starting and crashing, complaining about a corrupted BlueStore DB.
Digging a bit deeper, I determined that two issues had appeared at the same time:
- On my (rather old) cluster, an equally old setting was still active: ceph osd require-osd-release luminous. The 2.8.0 update did not change this setting, and for this reason even the healthy OSDs were not able to rejoin the cluster. I solved it by running ceph osd require-osd-release nautilus, after which the healthy OSDs immediately rejoined the cluster and started syncing (see the commands right after this list).
- A majority of the OSDs on the node (2 out of 3) kept crashing anyway, and I found out this is due to a nasty bug in Octopus: see https://tracker.ceph.com/issues/50017. What happens here is that the new OSD version attempts a format migration on the BlueStore DBs, but this frequently fails and leaves the OSD in an unusable condition. The only solution here is to drop the broken OSDs, recreate them, and have Ceph refill them in due course (a rough command sketch follows below).
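For reference, the flag can be checked and raised from any monitor node with the standard Ceph CLI (the exact output may vary slightly between releases):

# show the minimum OSD release currently required by the OSD map
ceph osd dump | grep require_osd_release
# raise it to nautilus so that the upgraded OSDs are allowed to rejoin
ceph osd require-osd-release nautilus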
Issue 2 is obviously a very serious one, putting data redundancy and availability at risk and making the cluster upgrade long and complicated.
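For anyone who hits the same crashes: the generic Ceph way to drop and recreate a broken OSD is roughly the following (just a sketch - PetaSAN's own UI and scripts may handle this differently; N is the id of the crashed OSD):

# stop the crashing daemon and mark the OSD out
systemctl stop ceph-osd@N
ceph osd out N
# remove it completely from the CRUSH map, auth keys and OSD map
ceph osd purge N --yes-i-really-mean-it
# then recreate an OSD on the same disk (e.g. with ceph-volume or from the PetaSAN UI)
# and let recovery/backfill repopulate it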
It seems that setting bluestore_fsck_quick_fix_on_mount to false could prevent Ceph from running into the bug, but I haven't had the time to test this, and I will leave it up to you to determine how to avoid the issue in a future upgrade.
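If anybody wants to try it, the option can be set cluster-wide before restarting the upgraded OSDs; I have not verified this myself, so take it as a sketch only:

# untested: skip the quick-fix/format conversion when the OSDs mount
ceph config set osd bluestore_fsck_quick_fix_on_mount false

or, per node, in /etc/ceph/ceph.conf under the [osd] section:

bluestore_fsck_quick_fix_on_mount = false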
I hope this will help other users and the community avoid downtime or data loss for the same reason.
Thanks, keep up the good work!
admin
2,930 Posts
July 3, 2021, 6:20 pm
For 1) we do have the ceph osd require-osd-release nautilus command in our 2.3.1 (Nautilus) upgrade guide; it is the last step.
For 2) we do test online upgrades as part of our testing and did not hit this bug, but I am not sure how far back we test from; we will look into upgrades from older versions. We will also look more closely into the Ceph bug tracker report.
Last edited on July 3, 2021, 6:23 pm by admin · #2
dbutti
28 Posts
July 3, 2021, 8:21 pm
Thanks for your reply!
N.1 is my fault then; I may have missed it in a previous update. Maybe it would be a good idea to enforce that change in the upgrade script anyway, so there's no chance of misbehaviour - but that was easy to fix.
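Something along these lines in the upgrade script would probably be enough (just a sketch, I have not tested it against the real script, and the exact ceph osd dump output line may differ):

# sketch: make sure the OSD map already requires nautilus before moving on to Octopus
if ! ceph osd dump | grep -q "require_osd_release nautilus"; then
    ceph osd require-osd-release nautilus
fi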
N.2 is much scarier, and it probably has to do with the number of changes that need to be applied to the BlueStore data (according to the issue tracker, anyway). So it will probably not cause any problem in a test setup, where the past transaction history is quite short, but it can (and does) on a loaded system with a lot of activity.