Override fetch-limits when stale-answer-enable is 'yes' for queries that are attempting to refresh positive stale RRsets
See also #2066 (closed), #2273 (closed), #2247 (closed) and #2248 (closed)
The final piece of the puzzle is having Serve Stale play nicely with fetch-limits.
See also https://bugs.isc.org/Ticket/Display.html?id=41259
Q. When fetch-limits are triggered, how do you give precedence to known 'good' client queries?
A. You base this on the existence of stale content already in cache.
Q. How do you override fetch-limits in a safe way, so that you don't override them for every 'good' client query and still overwhelm the authoritative servers?
A. You have to have serving of stale answers enabled. That way you already have a built-in mechanism to rate-limit the refresh retries.
The current (and soon to be improved) implementation of serve-stale still only triggers serving of stale content if there is a stale answer already in cache that could be used, AND an attempt to refresh the stale content has timed out. If we don't attempt to refresh it, then it won't be served stale and the stale-refresh-time countdown won't be started for this query.
The same applies if we served it stale once but stale-refresh-time has since expired, so we need to contact the authoritative servers again: if that attempt is blocked by fetch-limits (fetches-per-zone or fetches-per-server), we still won't use the stale data.
My request is that, if these three conditions are met, we bypass fetch-limits and try to resolve the query anyway. The conditions are:
- The server is configured with stale-answer-enable yes;
- The name/type being resolved can be answered from cache, should it be necessary to serve a stale answer
- The cached answer is not a negative response (NXDOMAIN or NXRRSET)
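To make the proposed decision concrete, here is a minimal sketch of the check as a single predicate. It is only a model of the three conditions above, not BIND code: `StaleState`, `Query`, `may_start_fetch` and their fields are hypothetical names chosen for illustration, and the real resolver would evaluate this inside its fetch path rather than in a standalone function.

```python
from dataclasses import dataclass
from enum import Enum, auto


class StaleState(Enum):
    """Hypothetical: what stale data, if any, is cached for the name/type."""
    NONE = auto()      # nothing usable in cache
    POSITIVE = auto()  # stale positive RRset
    NXDOMAIN = auto()  # stale negative answer: name does not exist
    NXRRSET = auto()   # stale negative answer: type does not exist


@dataclass(frozen=True)
class Query:
    # All field names are hypothetical; they model this request, not BIND internals.
    stale_answer_enable: bool  # "stale-answer-enable yes;" is configured
    cached_stale: StaleState   # stale data available for this name/type
    fetch_limit_hit: bool      # fetches-per-zone or fetches-per-server triggered


def may_start_fetch(q: Query) -> bool:
    """Return True if a fetch to the authoritative servers should be attempted."""
    if not q.fetch_limit_hit:
        return True  # normal path: fetch-limits were not triggered
    # Proposed bypass: serve-stale must be enabled (condition 1) and a
    # positive stale answer must be in cache (conditions 2 and 3).
    return q.stale_answer_enable and q.cached_stale is StaleState.POSITIVE


print(may_start_fetch(Query(True, StaleState.POSITIVE, True)))   # True: bypass
print(may_start_fetch(Query(True, StaleState.NONE, True)))       # False: brand-new name
print(may_start_fetch(Query(True, StaleState.NXDOMAIN, True)))   # False: negative answer
print(may_start_fetch(Query(False, StaleState.POSITIVE, True)))  # False: serve-stale disabled
```

Note that a brand-new name (nothing in cache) is still subject to fetch-limits, which is the point of the request: only names this resolver has already resolved successfully get to jump the queue.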
The risk of bypassing fetch-limits in this case is low, because:
a) we know that this query has been resolved successfully before, so it may be possible to resolve it again. That makes it a better candidate to bypass fetch-limits than a brand-new name this resolver hasn't encountered before, particularly if this resolver is also being used as part of a PRSD-style (pseudo-random subdomain) attack
b) even if the query times out, the timeout triggers serving of the stale answer to all other clients that send the same query during stale-refresh-time, so the authoritative servers are not retried for that name/type until the window expires
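Point b) relies on stale-refresh-time acting as a built-in rate limit on refresh retries. The sketch below models that window under simplifying assumptions: `StaleRefreshTracker` and its methods are hypothetical names, the cache key is just a (name, type) tuple, and concurrent queries joining a single in-flight fetch are ignored. The 30-second window is used purely for illustration, and an injectable clock keeps the example deterministic.

```python
import time
from typing import Callable, Dict, Tuple

Key = Tuple[str, str]  # hypothetical cache key: (name, type)


class StaleRefreshTracker:
    """Hypothetical model of the stale-refresh-time window.

    After a refresh attempt for a name/type times out, later queries for the
    same name/type are answered from stale cache without contacting the
    authoritative servers until the window has elapsed.
    """

    def __init__(self, stale_refresh_time: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.window = stale_refresh_time
        self.clock = clock
        self.failed_at: Dict[Key, float] = {}

    def should_refresh(self, key: Key) -> bool:
        """True if an upstream refresh should be attempted for key."""
        t = self.failed_at.get(key)
        if t is None:
            return True  # no recent failure: refresh normally
        # Inside the window: serve stale. Once it expires: try again.
        return self.clock() - t >= self.window

    def record_failure(self, key: Key) -> None:
        """Start (or restart) the window after a refresh timed out."""
        self.failed_at[key] = self.clock()

    def record_success(self, key: Key) -> None:
        """Fresh data arrived; stop serving stale for this key."""
        self.failed_at.pop(key, None)


now = [0.0]
tracker = StaleRefreshTracker(30.0, clock=lambda: now[0])
key = ("www.example.com", "A")
print(tracker.should_refresh(key))  # True: first refresh attempt
tracker.record_failure(key)         # refresh timed out, answer served stale
now[0] = 10.0
print(tracker.should_refresh(key))  # False: within window, serve stale
now[0] = 30.0
print(tracker.should_refresh(key))  # True: window expired, retry upstream
```

Combined with the predicate for the three conditions, at most one fetch-limit bypass per name/type is attempted per stale-refresh-time window, which is what keeps the extra load on the authoritative servers bounded.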
It would be really good to get this into the December releases along with the other improvements to Serve Stale.