Better handling of unrealistic values given to max-cache-size
As reported to ISC:
When max-cache-size was set to 2M and QPS was over 10k, the memory used by the cache could not be limited as expected. When it was set to 50M, everything looked fine.
This is unsurprising. Reaching max-cache-size signals named to deploy more 'assertive' cache-cleaning methods. Even so, named may not be able to free memory as quickly as new client queries consume it, simply through the recursive activity needed to obtain answers for those clients.
Client queries do not fail because max-cache-size is exceeded; they fail only if named requests more memory and the host system cannot provide it.
Meanwhile, named will be continually trying to free cache memory, and may fail to do so because few or no cache entries are idle: entries that are actively being used cannot be freed yet.
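To make the difference between a soft limit and a hard cap concrete, here is a toy Python model. `ToyCache` is a hypothetical class, not named's cache code or its actual cleaning algorithm; it only assumes the two behaviours described above: exceeding `max_size` triggers cleaning but never refuses an insertion, and entries still pinned by an in-progress response cannot be evicted.

```python
from collections import OrderedDict


class ToyCache:
    """Toy LRU cache with a *soft* byte limit (hypothetical model, not named)."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.entries = OrderedDict()  # name -> [size, pin_count], oldest first
        self.used = 0

    def insert(self, name, size, pinned=True):
        # Insertion always succeeds: the limit is a cleaning trigger,
        # not an admission check.
        if name in self.entries:
            self.used -= self.entries[name][0]
        self.entries[name] = [size, 1 if pinned else 0]
        self.entries.move_to_end(name)
        self.used += size
        if self.used > self.max_size:
            self.clean()

    def release(self, name):
        # The client response is done; the entry becomes evictable.
        if self.entries[name][1] > 0:
            self.entries[name][1] -= 1

    def clean(self):
        # Evict unpinned entries, least recently inserted first,
        # until usage is back under the limit (or nothing is evictable).
        for name in list(self.entries):
            if self.used <= self.max_size:
                break
            size, pins = self.entries[name]
            if pins == 0:
                del self.entries[name]
                self.used -= size


cache = ToyCache(max_size=2000)
for i in range(50):
    cache.insert(f"q{i}", 100)  # every entry is "in use"
print(cache.used)  # 5000: cleaning ran, but nothing could be freed

for i in range(50):
    cache.release(f"q{i}")
cache.clean()
print(cache.used)  # 2000: idle entries can now be evicted
```

In the model, as in the report, the configured limit is exceeded not because cleaning is skipped but because everything it could remove is still in use.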
When named operates as a recursive or forwarding server, RRs are obtained from other DNS servers and placed into cache; the client-facing server code then uses those cache entries (newly or previously obtained) to construct the responses to the client(s).
Therefore named needs at least some cache memory to serve recursive clients at all, and the higher the QPS, the more "work in progress" cache space it needs, even if newly cached entries are discarded immediately after use.
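The dependence on QPS can be estimated with Little's law, L = λW: the average number of queries in progress equals the arrival rate multiplied by the average time each one takes to resolve. The sketch below multiplies that by an assumed per-query cache footprint. The function name and all inputs are hypothetical and illustrative, not measurements of named.

```python
def in_flight_cache_bytes(qps, mean_resolution_s, bytes_per_answer):
    """Estimate cache memory pinned by queries still being answered.

    Little's law: mean queries in progress = arrival rate * mean time in system.
    bytes_per_answer is an assumed average footprint per in-progress answer.
    """
    in_flight = qps * mean_resolution_s
    return in_flight * bytes_per_answer


# Made-up inputs: 10k QPS, 100 ms mean resolution time, 2 KiB per answer.
print(in_flight_cache_bytes(10_000, 0.1, 2048))
```

With these made-up inputs the in-flight working set alone is about 2,048,000 bytes, roughly the size of the 2M limit in the report, before anything is kept for reuse. Doubling the QPS or the resolution latency doubles it.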
That's the bare-bones way of looking at it.
A caching server that caches very little has to query other DNS servers almost all the time to get answers for clients. This is generally suboptimal (putting it mildly!) for performance and throughput, although there may be some obscure corner cases where operating like this is desirable.
Perhaps we should:
- Set a minimum for max-cache-size
- Document realistic operational values (and why we recommend those)
- Document how named manages max-cache-size during operation
- Have named fail to start if max-cache-size is set too small (as determined by the operational/configured environment)
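The first and last proposals could take the form of a configuration check along these lines. This is a hypothetical Python sketch: `check_max_cache_size`, `MIN_CACHE_FLOOR` and `BYTES_PER_RECURSIVE_CLIENT` are invented for illustration, and their values are placeholders, not recommended settings. It scales the minimum with the `recursive-clients` setting as one possible reading of "the operational/configured environment", and assumes 1024-based k/m/g size suffixes; other forms named accepts, such as `unlimited`, are not handled.

```python
# Hypothetical placeholder values; named defines no such constants.
MIN_CACHE_FLOOR = 2 * 1024 * 1024      # absolute floor: 2 MiB
BYTES_PER_RECURSIVE_CLIENT = 4 * 1024  # assumed working set per client

_SUFFIXES = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(text):
    """Parse a size such as '2M' or '512k' into bytes (1024-based suffixes)."""
    text = text.strip().lower()
    if text and text[-1] in _SUFFIXES:
        return int(text[:-1]) * _SUFFIXES[text[-1]]
    return int(text)


def check_max_cache_size(value, recursive_clients):
    """Return the size in bytes, or raise ValueError if it is too small."""
    size = parse_size(value)
    # The minimum grows with the number of concurrent recursive clients.
    minimum = max(MIN_CACHE_FLOOR,
                  recursive_clients * BYTES_PER_RECURSIVE_CLIENT)
    if size < minimum:
        raise ValueError(
            f"max-cache-size {value} ({size} bytes) is below the minimum "
            f"of {minimum} bytes for recursive-clients {recursive_clients}"
        )
    return size


print(check_max_cache_size("50M", 1000))  # accepted
try:
    check_max_cache_size("2M", 1000)      # rejected: would refuse to start
except ValueError as err:
    print(err)
```

Raising an error at configuration-load time, rather than silently clamping the value upwards, matches the proposal that named refuse to start and tells the operator exactly why.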