BIND resolver incorrectly handles NODATA/NOERROR (NXRRSET) query response when CNAME is queried during prefetch
Summary
An "A" fetch to an auth server returns a "CNAME". But (it appears), with prefetch enabled (the default), when the "CNAME" itself is fetched, the authoritative server sends back noanswer/noerror (= NXRRSET). Clearly this is broken behaviour on the part of the auth servers (or they have just changed their zone from providing a CNAME to providing an answer), but I still don't see why it breaks a BIND resolver, which at this point should just understand that the CNAME no longer exists and (as needed, as a result of client queries) query again from the beginning with the RTYPE the client needs to have resolved.
Instead, the resolver is returning the empty answer to querying clients (who are not querying for CNAME; they are querying the resolver for A).
BIND version used
9.16.35-S1
Steps to reproduce
We don't have a reproducer at this time, but see the Support ticket for more details on what's happening. You need an authoritative server that responds with a CNAME (with a valid target) when queried for A (or other) rtypes for a name, but when queried explicitly for CNAME, sends back noerror/noanswer (essentially NXRRSET). Then enable prefetch and keep querying the server for record type A until the CNAME is close to expiry and is therefore prefetched explicitly...
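In the absence of a real reproducer, the authoritative behaviour described above can be modelled as a small sketch. This is only a toy model of the description in this report: `ZONE`, `auth_answer`, the example names and the TTL are all hypothetical and are not taken from the affected servers or from BIND.

```python
# Toy model of the misbehaving authoritative server described above.
# ZONE, auth_answer and the example names are hypothetical illustrations.

ZONE = {
    # owner name -> (CNAME target, TTL in seconds)
    "www.example.com.": ("target.example.net.", 30),
}


def auth_answer(qname, qtype):
    """Return (rcode, answer_section) the way the broken server would."""
    entry = ZONE.get(qname)
    if entry is None:
        return ("NXDOMAIN", [])
    target, ttl = entry
    if qtype == "CNAME":
        # The broken part: an explicit CNAME query gets NOERROR with an
        # empty answer section (NODATA), although the CNAME is still
        # served for every other query type.
        return ("NOERROR", [])
    return ("NOERROR", [(qname, ttl, "CNAME", target)])
```

A query for type A against this model yields the CNAME, while the explicit CNAME query that prefetch would send yields an empty NOERROR response.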
What is the current bug behavior?
When we are handling a client query, we are making queries to cache or to authoritative servers (cache miss), but all of those queries are for the RTYPE that we want to resolve. We don't query for type CNAME. If we hit a CNAME along the way, then that will cause us to start a new query (from cache, or by initiating a fetch if we need to) using the target of the CNAME as the new name to be queried.
So far so good. This implies that the code that looks in cache, and the code that handles an answer from a fetch, treat CNAME as a special case, and that we likely look for (or cache) CNAMEs EXPLICITLY while we're looking for the RTYPE that we actually want to resolve.
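The lookup flow described above can be sketched roughly as follows. This is a simplified illustration of my understanding, not BIND's actual code path: `RECORDS`, `resolve` and `max_hops` are hypothetical, and a single dictionary stands in for both the cache and the authoritative servers.

```python
# Toy resolver loop: lookups always use the client's qtype, and a CNAME
# found along the way restarts the lookup at the CNAME target.
# RECORDS, resolve and max_hops are hypothetical, not BIND internals.

RECORDS = {
    ("www.example.com.", "CNAME"): "target.example.net.",
    ("target.example.net.", "A"): "192.0.2.1",
}


def resolve(qname, qtype, max_hops=8):
    """Follow CNAMEs until an rrset of qtype is found; return (chain, rdata)."""
    chain = []
    for _ in range(max_hops):
        rdata = RECORDS.get((qname, qtype))
        if rdata is not None:
            return chain, rdata
        target = RECORDS.get((qname, "CNAME"))
        if target is None:
            # Nothing of qtype and no CNAME: a negative answer.
            return chain, None
        chain.append((qname, target))
        qname = target  # restart with the same qtype at the CNAME target
    raise RuntimeError("CNAME chain too long")
```

Note that the only place type CNAME appears is as a side lookup while resolving another type; nothing in this flow ever asks for CNAME on its own, which is what the prefetch appears to do.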
I suspect that we would not expect to find in cache an NXRRSET of type CNAME. Essentially this is meaningless to us - if the CNAME doesn't exist then any other record type might exist, we just don't know, so the entry might as well not be there.
If we get back 'NXRRSET' from a fetch for type CNAME, do we even add it to cache, or does this result in us deleting the original CNAME RR?
Whatever we do with it, it appears to 'break' the cache so that clients get back NOANSWER (empty answer) instead of named doing another fetch based on the RTYPE of the client query made after this CNAME has been refreshed.
What is the expected correct behavior?
Getting an 'NXRRSET' query response from an auth server that has explicitly been queried for a CNAME RR (to refresh what was in cache before - as instigated by prefetch) should not stop the resolver from resolving queries for that name for other RTYPEs.
Subsequent client queries, made after receipt of the auth answer that says that the CNAME no longer exists, should cause new fetches to the auth server with the RTYPE of the client query in them.
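As a minimal sketch of the expected behaviour, assuming a cache keyed by (name, type) in which a negative result is stored as a marker, the lookup could treat a negative CNAME entry as irrelevant to other types. `NEGATIVE` and `cache_lookup` are hypothetical names; this does not describe how the BIND cache is actually structured.

```python
# Toy cache lookup illustrating the expected behaviour: a negative (NODATA)
# entry for type CNAME must not answer queries for other types.
# NEGATIVE and cache_lookup are hypothetical names, not BIND internals.

NEGATIVE = object()  # marker for a cached "no such rrset" result


def cache_lookup(cache, qname, qtype):
    """Return one of ("answer", rdata), ("cname", target),
    ("nodata", None) or ("miss", None)."""
    entry = cache.get((qname, qtype))
    if entry is NEGATIVE:
        return ("nodata", None)
    if entry is not None:
        return ("answer", entry)
    cname = cache.get((qname, "CNAME"))
    if cname is not None and cname is not NEGATIVE:
        return ("cname", cname)
    # A negative CNAME entry says nothing about qtype: report a miss so
    # the caller fetches from the auth server with the client's qtype.
    return ("miss", None)
```

With only a negative CNAME entry in the cache, a lookup for A reports a miss (triggering a fresh fetch for A), while a lookup for CNAME itself still reports the cached negative answer.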
Is it remotely possible, however, that finding a CNAME in cache (since we already know that we do something special if we find it) but then finding that it's not a pointer to 'go look up this name instead of the one you had' but instead is NXRRSET (whoa, that wasn't what we expected to find!) could cause something aberrant to happen ... ? Or maybe this is a subtle race condition to do with replacing the CNAME with NXRRSET for the CNAME (or deleting it entirely) because of the query response from the auth server, with this happening as a result of the prefetch but now racing with the next client query that is looking in cache?
Relevant configuration files
No configs - nothing special needed, just prefetch enabled so that when the CNAME in cache is close to expiry, a client query will trigger a prefetch.
Relevant logs and/or screenshots
N/A - please see support ticket for more details
Possible fixes
N/A
(P.S. With the info available and with what we know it was very hard to complete this report per the template).