serve-stale degraded performance, resolver client leak, and logspam
Summary
When serve-stale
is enabled, named
can be forced into a bad state in which:
- the performance of the server drops dramatically
- it leaks resolver client tasks
- it uses large amounts of CPU even when not serving real clients
- it writes a huge volume of 'stale answer' log messages
In my testing this bad state starts after about an hour.
This was originally reported in #185 (closed) but this problem seems to be separate from the crash so I am opening a new issue.
Steps to reproduce
I run three copies of the following command-line
$ while sleep $(( RANDOM / 100 ))
do time adns-masterfile -s $SERVER -m 100 -qqqq <named_dump_reverse.db
done
The dump file is available from https://jackdaw.cam.ac.uk/ipreg/named_dump_reverse.db - it contains about 350000 records from . to as - mainly in-add.arpa, which has a lot of broken authoritative servers and therefore is good for exercising serve-stale
.
adns-masterfile
comes from https://git.uis.cam.ac.uk/x/uis/ipreg/adns-masterfile.git
Relevant configuration files
I use the following options to enable serve-stale
max-stale-ttl 1h;
stale-answer-enable yes;
Relevant logs and/or screenshots
Here is an indication of how fast it is logging when in the bad state:
-rw-rw-r-- 1 named named 11M Apr 11 17:01 named.log.1
-rw-rw-r-- 1 named named 11M Apr 11 17:01 named.log.2
-rw-rw-r-- 1 named named 11M Apr 11 17:01 named.log.3
-rw-rw-r-- 1 named named 11M Apr 11 17:01 named.log.4
-rw-rw-r-- 1 named named 11M Apr 11 17:00 named.log.5
-rw-rw-r-- 1 named named 11M Apr 11 17:00 named.log.6
-rw-rw-r-- 1 named named 11M Apr 11 17:00 named.log.7
-rw-rw-r-- 1 named named 11M Apr 11 17:00 named.log.8
-rw-rw-r-- 1 named named 11M Apr 11 17:00 named.log.9