Updating ceph version
wluke
66 Posts
November 15, 2023, 12:07 pm
Hi,
We're seeing the metadata write throughput increase almost constantly, and it never drops back down when we use CephFS snapshots (after we stop taking snapshots it just stays at whatever level it has reached, even when no CephFS files are being read or written). I believe the issue (mds: switch submit_mutex to fair mutex for MDLog) and the associated fixes in 17.2.7 might sort this. Restarting the MDS does clear it back down, but that's not ideal and knocks services offline for a few moments.
The first time it happened it climbed to 100MB/s of writes on the metadata pool before we noticed. I tried reducing the snapshot frequency, but that only slowed the increase - we now have a constant 10MB/s of writes on the metadata pool.
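For reference, this is roughly how we're watching the metadata pool and clearing it back down in the meantime (the pool name and MDS rank below are placeholders for ours, and failing the rank assumes a standby MDS is available to take over):

    # per-pool client IO rates, including the cephfs metadata pool
    ceph osd pool stats cephfs_metadata
    # overall MDS and pool view
    ceph fs status
    # fail rank 0 so a standby takes over (what we mean by "restarting the MDS")
    ceph mds fail 0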
I was wondering if it's possible to update the Ceph version outside of an official PetaSAN update so we can run 17.2.7?
Thanks!
admin
2,930 Posts
November 15, 2023, 2:54 pm
Generally we always lag behind the latest version, as most supported products do. Ceph 17.2.7 and 17.2.6 have their share of issues. If it is a significant bug we can backport the fix to our released version. We plan to release 3.3 in approximately one month, but so far the plan is still to use 17.2.5.
Looking at the bug you mention, it seems it can happen under heavy load:
https://tracker.ceph.com/issues/58000
If you can, reduce the load, reduce the snapshot frequency, and make sure you use SSDs for the metadata pool.
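For the last point, something like the following can be used to check which CRUSH rule the metadata pool uses and pin it to SSD OSDs (the pool and rule names here are just examples, adjust to your cluster; moving the pool to a new rule will trigger data movement):

    # see which rule the metadata pool currently uses
    ceph osd pool get cephfs_metadata crush_rule
    # create a replicated rule restricted to the ssd device class
    ceph osd crush rule create-replicated replicated-ssd default host ssd
    # move the metadata pool onto it
    ceph osd pool set cephfs_metadata crush_rule replicated-ssd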
wluke
66 Posts
November 15, 2023, 3:28 pm
Even dropping the snapshots down to only once an hour (we're using the https://www.45drives.com/blog/ceph/ceph-geo-replication/ tool to keep offsite backups, since there are hundreds of thousands of directories and a few million additional files per month, and plain rsync takes too long), we still see the writes on the metadata pool keep climbing and never drop. Stopping the snapshots entirely seems to stop the increase, but it never drops back down.
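For what it's worth, any snapshots the tool leaves behind are easy to inspect and prune straight from the kernel mount via the .snap directory (the path and snapshot name below are just examples for wherever the tool is snapshotting):

    # list existing cephfs snapshots for that directory
    ls /mnt/cephfs/data/.snap
    # remove one by name
    rmdir /mnt/cephfs/data/.snap/snapshot-20231115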
The metadata pool is replicated on SSDs, with NVMe cache and WAL.
There are a few other related MDS fixes that could also apply, but I think the issue I highlighted is the most likely culprit. We're not seeing the issue on another cluster we have (Proxmox/Ceph 17.2.7).
Reducing the load isn't really an option, but the CephFS load certainly isn't massive. During production hours there's a steady stream of new files being created (call recordings), around 5-10MB/s, plus around 5-10MB/s of other data being written. Outside production hours there's a steady stream of services writing log files to the Samba shares, perhaps 500kB/s overnight (50-100 IOPS).
The mClock scheduler improvements in 17.2.7 also seem to really help with recovery IO saturating everything, which we observed recently when adding some new OSDs, but that's a secondary concern - so if 17.2.7 isn't an option, backporting this one MDS fix would be super helpful.
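For anyone else hitting the recovery IO side of this on 17.2.x, I believe the mClock profile can be switched at runtime to favour client IO - we haven't tried this on PetaSAN, so treat it as a sketch:

    # prioritise client IO over recovery/backfill in the mclock scheduler
    ceph config set osd osd_mclock_profile high_client_ops
    # verify the running value
    ceph config get osd osd_mclock_profile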
Are there any plans to include cephfs-mirror (either in the UI or just available) in the 3.3 release? It would perhaps be another way for us to do offsite backups, although it also relies on snapshots, so I think it would be similarly affected by the issues we're seeing.
admin
2,930 Posts
November 15, 2023, 4:47 pm
If you stop the 45drives geo-replication tool, do you still see the issue?
wluke
66 Posts
November 15, 2023, 5:12 pm
No - it's what's taking the snapshots, so stopping it from running periodically stops the metadata writes increasing (though they still stay at whatever level they were at). Looking at the code (https://github.com/45Drives/cephgeorep/tree/master/src/impl), it just takes the CephFS snapshot and then traverses all the folders and files with standard C++ filesystem access functions, so with CephFS kernel mounts the only thing it does differently (from a Ceph point of view) from "find" or "rsync" is the snapshot part.
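In other words, from Ceph's point of view it's roughly equivalent to something like this (paths and names made up for illustration; the real tool finds and transfers changed files itself rather than doing a full rsync):

    # take a cephfs snapshot of the replicated directory
    mkdir /mnt/cephfs/data/.snap/sync-20231115
    # sync from the consistent snapshot rather than the live tree
    rsync -a /mnt/cephfs/data/.snap/sync-20231115/ backup-host:/backups/cephfs/
    # drop the snapshot afterwards
    rmdir /mnt/cephfs/data/.snap/sync-20231115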