BIND resolver sending query response to client with TTL of 53 years+ (system time as epoch seconds)
As reported in Support ticket #21962
We have an incomplete selection of packets from a packet trace wherein the sequence is:
Epoch Time: 1680345891.204918472 seconds Non-recursive query from the resolver to (presumably) the server that it has 'learned' is auth for problem-name (I think by following the delegation from the parent zone, for which the resolver is a secondary)
Epoch Time: 1680345891.279429478 seconds Recursive query received from client for problem-name
Epoch Time: 1680345891.279457069 seconds Query response to the fetch. This query response has a TTL of only 1 second, AND I think it has come from some sort of proxy or load-balancing device, because other evidence is that the names from this zone have an auth TTL of 30s. Awkwardly, it also has RA and RD set and doesn't have AA in the header, so the device isn't actually auth for the zone and has fudged RD rather than copying it from the (non-recursive) query.
Epoch Time: 1680345891.429645634 seconds Query response to client with RRsets in the Answer section with TTL: 1680345891 (19448 days, 10 hours, 44 minutes, 51 seconds)
This is way, way bigger than the default max-cache-ttl, and it matches the timestamp (in whole seconds) on the packets that have been captured, which seems too close to be a coincidence. I assume therefore that it has come from cache, since when we cache a learned RRset we don't store the TTL as received; we compute the time at which it should expire and record that instead.
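For anyone wanting to double-check the arithmetic, here is a minimal Python sketch that decodes the observed TTL both as a duration and as an epoch timestamp. The constant names are just for illustration, and the one-week value used for the default max-cache-ttl is an assumption taken from the documentation rather than from this customer's configuration.

```python
from datetime import datetime, timezone

OBSERVED_TTL = 1680345891             # TTL on the RRsets in the answer sent to the client
CAPTURE_EPOCH = 1680345891.429645634  # capture time of that same packet
DEFAULT_MAX_CACHE_TTL = 604800        # assumed default max-cache-ttl (one week)


def split_duration(seconds):
    """Break a number of seconds into (days, hours, minutes, seconds)."""
    days, rem = divmod(seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    return days, hours, minutes, secs


# The TTL read as a duration, as a packet decoder would display it.
ttl_parts = split_duration(OBSERVED_TTL)

# The same number read as seconds since the Unix epoch.
ttl_as_time = datetime.fromtimestamp(OBSERVED_TTL, tz=timezone.utc)

# Does the TTL equal the capture time truncated to whole seconds?
matches_capture_second = OBSERVED_TTL == int(CAPTURE_EPOCH)

# Is it larger than the (assumed) default cap on cached TTLs?
exceeds_default_cap = OBSERVED_TTL > DEFAULT_MAX_CACHE_TTL

print("as duration:", ttl_parts)
print("as UTC time:", ttl_as_time.isoformat())
print("matches capture second:", matches_capture_second)
print("exceeds default max-cache-ttl:", exceeds_default_cap)
```

This only confirms that the TTL value and the wall-clock second of the capture are the same number; it says nothing about which code path produced it.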
It looks to me, from the timestamps, as though either the fetch was started on behalf of an earlier client query, or (more likely?) it's a prefetch on behalf of an earlier client query.
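The ordering argument can be checked with a short Python sketch that turns the four capture timestamps into offsets from the first packet. The event labels are my own shorthand for the packets listed above, and Decimal is used only to keep the nanosecond digits exact.

```python
from decimal import Decimal

# Capture timestamps from the trace; labels are shorthand, not decoder output.
events = [
    ("fetch query sent", Decimal("1680345891.204918472")),
    ("client query received", Decimal("1680345891.279429478")),
    ("fetch response received", Decimal("1680345891.279457069")),
    ("response sent to client", Decimal("1680345891.429645634")),
]

start = events[0][1]

# Offset of each packet from the first one, in milliseconds.
offsets_ms = [(label, (ts - start) * 1000) for label, ts in events]

# True if the fetch was already in flight when this client query arrived.
fetch_started_before_client = events[0][1] < events[1][1]

for label, offset in offsets_ms:
    print(f"{offset:>12.6f} ms  {label}")
print("fetch started before client query:", fetch_started_before_client)
```

The fetch goes out about 74.5 ms before this client's query is received, so this query cannot have been what triggered it, which is consistent with the fetch belonging to an earlier query or to a prefetch.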
There is no serve-stale configured, but they do have customised prefetch:
prefetch 9 15;
stale-answer-enable no;
stale-cache-enable no;
(I wonder if turning off prefetch will make the problem go away?)
Also from the configuration:
There is dns64, but not involving any of these addresses
There is ECS, but not involving anything in/under this domain being queried
There is global forwarding, but this is overridden later by a zone statement that disables forwarding for any delegations below the parent zone for .
I have made this confidential for now, as I don't know whether or not it's specific to the BIND -S edition.
This problem started happening when they upgraded from 9.16.27-S1 to 9.16.39-S1
This may be related or similar to:
#2982 (closed) (I was reminded of this by the pattern of behaviour documented here: https://support.isc.org/Ticket/Display.html?id=21175#txn-823448 )
But it has also been suggested by engineering that this might be a variant on #3613 (closed)