Logrotate settings? (High disk usage on / partition)


Hi,

I frequently need to manually clear out log files on all the nodes because the / partition, which is only 15G, keeps running out of space.

It doesn't seem to be any one set of logs in particular that is growing uncontrolled; they are all generally increasing, and there's not a lot of space available in that 15G. Is there any way to configure the log rotation settings, or reduce the verbosity of any of the logs?

Just in case something else is using more space than it should, here are some details (this is after I've cleaned up most of the larger .gz, .0 and .1 log file archives):

root@gl-san-02c:/# du -h --max-depth=1

0 ./proc
52K ./root
0 ./dev
478M ./opt
4.0K ./media
9.1G ./var
5.4M ./etc
4.0K ./debs
2.2G ./usr
192K ./tmp
12K ./home
0 ./sys
74M ./boot
4.0K ./srv
7.5M ./run

root@gl-san-02c:/# du -h --max-depth=1 /var
4.0K /var/opt
20K /var/spool
4.3G /var/lib
4.0K /var/local
85M /var/cache
1.7M /var/backups
4.9G /var/log
140K /var/tmp
4.0K /var/mail
28K /var/www
9.2G /var

Hope you can offer some guidance, thanks!

 

Yes you can control logrotate settings in

/etc/logrotate.d

But I would first recommend you dig deeper. In most cases you will find one or two culprits rather than uniform usage; it could be a log filling up because of errors or warnings that may need addressing.
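
As a rough sketch of how to dig deeper than the per-directory du output, something like the following lists the largest individual files under a directory. The function name largest_files is just an illustrative helper (not part of any package), and it assumes GNU find, as shipped on Ubuntu-based nodes. Note that it reports apparent file size in bytes, which can differ from du for sparse files.

```shell
# largest_files DIR [N]
# Print the N (default 20) largest regular files under DIR, largest first,
# one per line as "<size in bytes><TAB><path>".
# -xdev keeps find on the same filesystem as DIR.
largest_files() {
    find "$1" -xdev -type f -printf '%s\t%p\n' | sort -rn | head -n "${2:-20}"
}
```

For example, "largest_files /var/log 20" and "largest_files /var/lib 20" would cover the two biggest directories in the listing above.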

 

Thanks!

It seems the ctdb logrotate settings point at a file that doesn't exist: /var/log/ctdb/log.ctdb

Whereas on my systems I'm seeing a large (multi-gigabyte) /var/log/samba/log.ctdb, and I can't see that this is configured to be rotated anywhere.

Is there something odd about my system, or should this be the file the ctdb logrotate settings point at?

You could change the logrotate settings to point to the correct path.
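
A minimal sketch of what such an entry might look like is below. The file name /etc/logrotate.d/ctdb and the schedule (daily, keep 7) are assumptions for illustration, not the settings PetaSAN ships. copytruncate is used on the assumption that ctdbd keeps the log file open and would otherwise keep writing to the rotated file.

```
# /etc/logrotate.d/ctdb (assumed file name; adjust to the existing entry)
/var/log/samba/log.ctdb {
    daily
    rotate 7
    compress
    missingok
    notifempty
    copytruncate
}
```

Running "logrotate -d /etc/logrotate.d/ctdb" does a dry run that shows what would be rotated without changing anything. It is worth checking that no other file in /etc/logrotate.d already matches the same path, since logrotate complains about duplicate entries.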

I would also look at why the file is multi-gigabyte: what are the messages? It could be that you have assigned the CIFS role but have not yet assigned any IPs in the CIFS Settings, in which case ctdb would be complaining that it needs an IP range to start. If you are not using CIFS, remove the role from the nodes.
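
One quick way to see which messages dominate a large log is to count them by their first two words. This is only a sketch: top_messages is a made-up helper name, and it assumes each line starts with a date, a time and a "process[pid]:" tag before the message text.

```shell
# top_messages FILE [N]
# Strip the leading "date time process[pid]: " prefix, keep the first two
# words of each message, and print the N (default 10) most frequent ones
# with their counts.
top_messages() {
    sed -E 's/^[^ ]+ [^ ]+ [^ ]+: //' "$1" |
        cut -d' ' -f1-2 | sort | uniq -c | sort -rn | head -n "${2:-10}"
}
```

For example: "top_messages /var/log/samba/log.ctdb". On a multi-gigabyte file this can take a while, so it may be quicker to run it on the output of tail first.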

There were periods when the file received a high volume of log messages about "high hopcount". However, I'm unsure whether this was because some of the nodes were at 100% disk usage, so CTDB was failing to allocate more space for its "volatile" locks folder (which was around 4G).

I was deleting around 2.1 million files from a CIFS share (from within Windows) at the time. I wouldn't have thought this in itself should cause errors, though, should it?

 

So are the errors gone now, or do you still get them in the logs? Is the cluster health OK?

The errors have stopped now, but when the root partition ran out of space CTDB got stuck in recovery mode, which made all our CIFS shares inaccessible. I left it for around 15 minutes but it did not recover, so I had to "systemctl stop petasan-cifs" on all the nodes, and restart that, ctdb and smbd.

Now CTDB shows healthy, and the cluster health is okay (well, it's WARN, but that's just because there's a load of PGs that have not been scrubbed in time - I've dropped the frequency of the deep scrubs, which seems to have applied everywhere apart from the warnings).

I'm going to retry deleting the 2.1 million files again and hopefully it works this time!

So shortly after retrying the deletion operation I see the following spamming endlessly into /var/log/samba/log.ctdb on two of the five nodes:

Node A:

2023/03/14 18:41:48.737701 ctdbd[747550]: High hopcount 29198399 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.740330 ctdbd[747550]: High hopcount 29198497 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.740444 ctdbd[747550]: High hopcount 29198499 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.743179 ctdbd[747550]: High hopcount 29198597 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.743275 ctdbd[747550]: High hopcount 29198599 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.745887 ctdbd[747550]: High hopcount 29198697 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.745985 ctdbd[747550]: High hopcount 29198699 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.748526 ctdbd[747550]: High hopcount 29198797 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.748617 ctdbd[747550]: High hopcount 29198799 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.751118 ctdbd[747550]: High hopcount 29198897 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.751210 ctdbd[747550]: High hopcount 29198899 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.753739 ctdbd[747550]: High hopcount 29198997 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.753833 ctdbd[747550]: High hopcount 29198999 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.756334 ctdbd[747550]: High hopcount 29199097 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.756429 ctdbd[747550]: High hopcount 29199099 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.758994 ctdbd[747550]: High hopcount 29199197 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.759087 ctdbd[747550]: High hopcount 29199199 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.761640 ctdbd[747550]: High hopcount 29199297 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.761743 ctdbd[747550]: High hopcount 29199299 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.764529 ctdbd[747550]: High hopcount 29199397 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.764629 ctdbd[747550]: High hopcount 29199399 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.767413 ctdbd[747550]: High hopcount 29199497 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.767514 ctdbd[747550]: High hopcount 29199499 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.770082 ctdbd[747550]: High hopcount 29199597 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.770183 ctdbd[747550]: High hopcount 29199599 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.773194 ctdbd[747550]: High hopcount 29199697 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.773318 ctdbd[747550]: High hopcount 29199699 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.776280 ctdbd[747550]: High hopcount 29199797 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.776435 ctdbd[747550]: High hopcount 29199799 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.779268 ctdbd[747550]: High hopcount 29199897 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.779374 ctdbd[747550]: High hopcount 29199899 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.782313 ctdbd[747550]: High hopcount 29199997 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.782424 ctdbd[747550]: High hopcount 29199999 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.785271 ctdbd[747550]: High hopcount 29200097 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.785371 ctdbd[747550]: High hopcount 29200099 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.787839 ctdbd[747550]: High hopcount 29200197 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.787933 ctdbd[747550]: High hopcount 29200199 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.790680 ctdbd[747550]: High hopcount 29200297 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.790792 ctdbd[747550]: High hopcount 29200299 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.793852 ctdbd[747550]: High hopcount 29200397 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.793969 ctdbd[747550]: High hopcount 29200399 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.796908 ctdbd[747550]: High hopcount 29200497 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.797047 ctdbd[747550]: High hopcount 29200499 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.800080 ctdbd[747550]: High hopcount 29200597 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.800200 ctdbd[747550]: High hopcount 29200599 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.803219 ctdbd[747550]: High hopcount 29200697 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.803343 ctdbd[747550]: High hopcount 29200699 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.805983 ctdbd[747550]: High hopcount 29200797 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.806079 ctdbd[747550]: High hopcount 29200799 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.808734 ctdbd[747550]: High hopcount 29200897 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.808836 ctdbd[747550]: High hopcount 29200899 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.811360 ctdbd[747550]: High hopcount 29200997 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.811461 ctdbd[747550]: High hopcount 29200999 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.813947 ctdbd[747550]: High hopcount 29201097 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.814039 ctdbd[747550]: High hopcount 29201099 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.816537 ctdbd[747550]: High hopcount 29201197 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.816628 ctdbd[747550]: High hopcount 29201199 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.819152 ctdbd[747550]: High hopcount 29201297 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.819246 ctdbd[747550]: High hopcount 29201299 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.821799 ctdbd[747550]: High hopcount 29201397 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4
2023/03/14 18:41:48.821895 ctdbd[747550]: High hopcount 29201399 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:1 src:0 lmaster:1 header->dmaster:4 dst:4

 

Node E:

2023/03/14 18:42:31.576854 ctdbd[242428]: High hopcount 30728498 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.579421 ctdbd[242428]: High hopcount 30728596 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.579514 ctdbd[242428]: High hopcount 30728598 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.582080 ctdbd[242428]: High hopcount 30728696 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.582174 ctdbd[242428]: High hopcount 30728698 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.584692 ctdbd[242428]: High hopcount 30728796 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.584785 ctdbd[242428]: High hopcount 30728798 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.587393 ctdbd[242428]: High hopcount 30728896 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.587488 ctdbd[242428]: High hopcount 30728898 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.590107 ctdbd[242428]: High hopcount 30728996 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.590203 ctdbd[242428]: High hopcount 30728998 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.592753 ctdbd[242428]: High hopcount 30729096 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.592846 ctdbd[242428]: High hopcount 30729098 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.595343 ctdbd[242428]: High hopcount 30729196 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.595458 ctdbd[242428]: High hopcount 30729198 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.598022 ctdbd[242428]: High hopcount 30729296 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.598123 ctdbd[242428]: High hopcount 30729298 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.600671 ctdbd[242428]: High hopcount 30729396 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.600765 ctdbd[242428]: High hopcount 30729398 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.603248 ctdbd[242428]: High hopcount 30729496 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.603362 ctdbd[242428]: High hopcount 30729498 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.608368 ctdbd[242428]: High hopcount 30729596 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.608494 ctdbd[242428]: High hopcount 30729598 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.611099 ctdbd[242428]: High hopcount 30729696 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.611195 ctdbd[242428]: High hopcount 30729698 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.613676 ctdbd[242428]: High hopcount 30729796 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.613767 ctdbd[242428]: High hopcount 30729798 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.616243 ctdbd[242428]: High hopcount 30729896 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.616335 ctdbd[242428]: High hopcount 30729898 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.618867 ctdbd[242428]: High hopcount 30729996 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.618971 ctdbd[242428]: High hopcount 30729998 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.621517 ctdbd[242428]: High hopcount 30730096 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.621615 ctdbd[242428]: High hopcount 30730098 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.624127 ctdbd[242428]: High hopcount 30730196 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.624220 ctdbd[242428]: High hopcount 30730198 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.626732 ctdbd[242428]: High hopcount 30730296 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.626830 ctdbd[242428]: High hopcount 30730298 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.629481 ctdbd[242428]: High hopcount 30730396 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.629587 ctdbd[242428]: High hopcount 30730398 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.632050 ctdbd[242428]: High hopcount 30730496 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.632141 ctdbd[242428]: High hopcount 30730498 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.634985 ctdbd[242428]: High hopcount 30730596 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.635104 ctdbd[242428]: High hopcount 30730598 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.637611 ctdbd[242428]: High hopcount 30730696 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.637711 ctdbd[242428]: High hopcount 30730698 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.640303 ctdbd[242428]: High hopcount 30730796 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.640401 ctdbd[242428]: High hopcount 30730798 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.643049 ctdbd[242428]: High hopcount 30730896 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.643147 ctdbd[242428]: High hopcount 30730898 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.645732 ctdbd[242428]: High hopcount 30730996 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.645831 ctdbd[242428]: High hopcount 30730998 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.648292 ctdbd[242428]: High hopcount 30731096 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.648383 ctdbd[242428]: High hopcount 30731098 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.650890 ctdbd[242428]: High hopcount 30731196 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.650984 ctdbd[242428]: High hopcount 30731198 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.653580 ctdbd[242428]: High hopcount 30731296 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.653679 ctdbd[242428]: High hopcount 30731298 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.656214 ctdbd[242428]: High hopcount 30731396 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.656305 ctdbd[242428]: High hopcount 30731398 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.658877 ctdbd[242428]: High hopcount 30731496 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1
2023/03/14 18:42:31.658973 ctdbd[242428]: High hopcount 30731498 dbid:locking.tdb key:0x40318ef6 reqid=5f256cea pnn:4 src:0 lmaster:1 header->dmaster:1 dst:1

Have you any ideas what might be causing this? The deletion operation seemed to have stalled, so I've stopped it, but the logs are still being filled with this at a rate of hundreds or thousands of entries a second.

Hard to say without going deep into this.

The ctdb wiki shows this as a performance contention issue:

https://wiki.samba.org/index.php/CTDB_Performance

https://wiki.samba.org/index.php/CTDB_database_design

I would recommend you first investigate if the CephFS layer is fine by running similar tests at the CephFS level rather than CIFS.

I would also investigate how you are deleting these files: from the Windows desktop or from some application? Is it a single client application connected to one ctdb server, or multiple processes/threads deleting simultaneously? Is the deleting client also the client that created and accessed the files, or did other clients access the files being deleted?

Do the system resources get busy during this (RAM/CPU/disks)? Do you have CephFS on an SSD pool, or at least the metadata on SSD? Is the status of ceph and ctdb (using the command: ctdb status) healthy during this deletion?
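
To keep an eye on these while the deletion runs, a small snapshot function along the following lines could be called in a loop on one of the nodes. It is a sketch only: health_snapshot is a hypothetical name, and the ctdb and ceph commands are skipped if they are not on the PATH.

```shell
# health_snapshot
# Print a timestamp, root partition usage, load average, and (where the
# tools are installed) the ctdb and ceph health status.
health_snapshot() {
    date '+%Y-%m-%d %H:%M:%S'
    df -h /
    if [ -r /proc/loadavg ]; then cat /proc/loadavg; fi
    if command -v ctdb >/dev/null 2>&1; then ctdb status; fi
    if command -v ceph >/dev/null 2>&1; then ceph health detail; fi
}
```

For example, "while sleep 30; do health_snapshot; done" prints a snapshot every 30 seconds until interrupted with Ctrl-C.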

Can you control the deletion speed from your client, maybe spreading the deletion across a wider timeframe? If you are deleting millions of files, does it help to stop a couple of the CIFS services and leave only a few running?
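
If the client cannot throttle itself, one possible way to spread the deletion out is to run it in batches from a Linux host that has the CephFS filesystem mounted, pausing between batches. The sketch below is an illustration under that assumption, not a tested PetaSAN procedure: throttled_delete is a made-up helper, the batch size and pause are arbitrary, and it assumes GNU find and xargs. It does not go through Samba at all, so it should only be pointed at data that no CIFS client is using.

```shell
# throttled_delete DIR [BATCH] [PAUSE_SECONDS]
# Delete all regular files under DIR in batches of BATCH (default 1000),
# sleeping PAUSE_SECONDS (default 1) after each batch, then remove the
# empty subdirectories that are left behind. DIR itself is kept.
throttled_delete() {
    dir=$1; batch=${2:-1000}; pause=${3:-1}
    # In the sh -c snippet, $0 is the pause and "$@" is one batch of files.
    find "$dir" -type f -print0 |
        xargs -0 -n "$batch" sh -c 'rm -f -- "$@"; sleep "$0"' "$pause"
    find "$dir" -mindepth 1 -depth -type d -empty -delete
}
```

For example, "throttled_delete /mnt/cephfs/share/old-data 500 2" (the path is a placeholder) would remove 500 files, wait 2 seconds, and repeat.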

These are some suggestions, good luck.

Thanks for the feedback.

The deletion is via Windows File Explorer, which I believe is a single-threaded deletion, and there's no way to control the speed. The files and folders were all first written around 6 months ago and have not been touched or accessed since.

I read a few threads online around CTDB and some seemed to indicate that running out of disk space may have corrupted the ctdb database, hence the lock bouncing back and forth as seen in the logs. They suggested stopping CTDB on all nodes and deleting the database (in /var/lib/ctdb/volatile) so that it gets re-created fresh. Apparently there have been some fixes around this to make things recover more gracefully, which I'm guessing will likely be available in the next PetaSAN release.

I've done this and started the deletion again, and will monitor to see how it goes. I'm keen to know whether larger operations such as this via Windows Explorer via CIFS -> CephFS are inherently troublesome, so I want to retry this way before I try doing the deletion directly at the CephFS level.

Should file shares such as this with tens of millions of files work reliably via CIFS/CephFS, or would I be better off sharing them via a Windows server attached to an RBD over iSCSI?
