BIND 9.12 and newer doesn't accept a non-authoritative answer containing parent glue when resolving delegated nameserver addresses
We have encountered an authoritative server that properly delegates a zone (the parent holds the NS and glue records) but, when queried for the address of one of the delegated nameservers, responds not with a referral but with a non-authoritative answer.
This is what we get (which we believe to be broken behaviour) from that server (names changed for anonymity):
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 33685
;; flags: qr; QUERY: 1, ANSWER: 1, AUTHORITY: 3, ADDITIONAL: 4
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;dns1.datacentres.broken.com. IN A
;; ANSWER SECTION:
dns1.datacentres.broken.com. 86400 IN A 192.0.2.210
;; AUTHORITY SECTION:
datacentres.broken.com. 3600 IN NS dns1.datacentres.broken.com.
datacentres.broken.com. 3600 IN NS dns2.datacentres.broken.com.
datacentres.broken.com. 3600 IN NS dns3.datacentres.broken.com.
;; ADDITIONAL SECTION:
dns1.datacentres.broken.com. 86400 IN A 192.0.2.210
dns2.datacentres.broken.com. 86400 IN A 192.0.2.140
dns3.datacentres.broken.com. 86400 IN A 192.0.2.16
This is what we'd expect:
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 38219
;; flags: qr; QUERY: 1, ANSWER: 0, AUTHORITY: 3, ADDITIONAL: 4
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;dns1.datacentres.behave.com. IN A
;; AUTHORITY SECTION:
datacentres.behave.com. 3600 IN NS dns1.datacentres.behave.com.
datacentres.behave.com. 3600 IN NS dns2.datacentres.behave.com.
datacentres.behave.com. 3600 IN NS dns3.datacentres.behave.com.
;; ADDITIONAL SECTION:
dns1.datacentres.behave.com. 86400 IN A 192.0.2.210
dns2.datacentres.behave.com. 86400 IN A 192.0.2.140
dns3.datacentres.behave.com. 86400 IN A 192.0.2.16
The broken/unexpected behaviour is that, when the resolver follows the delegation path, the delegating parent returns the glue as a non-authoritative answer (ANSWER section populated, no aa flag) instead of as a non-authoritative referral (empty ANSWER section, NS records in AUTHORITY, glue in ADDITIONAL).
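The difference between the two responses above comes down to the header flags and the section counts. As a minimal sketch, assuming only the `;; flags:` line of the dig output is available, the two cases could be told apart as follows. The helper name `classify_response` is hypothetical and is not part of BIND or dig. It treats any non-authoritative response with an empty ANSWER section and a non-empty AUTHORITY section as a referral, which is a simplification: it does not check that the AUTHORITY records are actually NS records.

```python
import re

# Matches the dig header line, e.g.
# ";; flags: qr; QUERY: 1, ANSWER: 1, AUTHORITY: 3, ADDITIONAL: 4"
_FLAGS_RE = re.compile(
    r";; flags: ([a-z ]*); QUERY: (\d+), ANSWER: (\d+), "
    r"AUTHORITY: (\d+), ADDITIONAL: (\d+)"
)


def classify_response(dig_output: str) -> str:
    """Classify a dig response from its flags line (hypothetical helper)."""
    match = _FLAGS_RE.search(dig_output)
    if match is None:
        raise ValueError("no dig flags line found")

    flags = set(match.group(1).split())
    answer_count = int(match.group(3))
    authority_count = int(match.group(4))

    if "aa" in flags:
        # The server claims authority for the queried name.
        return "authoritative response"
    if answer_count > 0:
        # Data in ANSWER without the aa flag: the case reported here.
        return "non-authoritative answer"
    if authority_count > 0:
        # Assumption: the AUTHORITY records are the delegation's NS set.
        return "referral"
    return "other"


broken = ";; flags: qr; QUERY: 1, ANSWER: 1, AUTHORITY: 3, ADDITIONAL: 4"
expected = ";; flags: qr; QUERY: 1, ANSWER: 0, AUTHORITY: 3, ADDITIONAL: 4"
print(classify_response(broken))    # non-authoritative answer
print(classify_response(expected))  # referral
```

The two sample strings are the flags lines from the dig outputs shown above.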
The bug, however, is that BIND 9.12 and newer, depending on the state of its cache (it fails consistently with a clean cache), fails to resolve queries for dns1.datacentres.broken.com (and dns2/dns3), and also for names in any other domains hosted on these nameservers. The resolver returns SERVFAIL to the requesting client.
Occasionally the problem is temporarily 'fixed'; we think this is because of the way the cache is populated while other client queries are being resolved.
We have some debug information in support ticket https://support.isc.org/Ticket/Display.html?id=13359
This issue did not occur with BIND 9.11.4 and older.
I haven't been able to identify which specific change(s) introduced the less-forgiving behaviour of named.