stub zone foiled by minimal-responses
Summary
It appears that BIND stub zones whose nameservers need glue records (because the NS targets are inside the zone) do not function correctly if the authoritative nameservers are configured to return minimal-responses.
- authoritative nameserver (adns) with zone "example.com" type master, listing itself as ns1.example.com
- recursive nameserver (rdns) with zone "example.com" type stub
BIND version used
# named -V
BIND 9.11.17 (Extended Support Version) <id:65c9496>
running on Linux x86_64 3.10.0-862.2.3.el7.x86_64 #1 SMP Wed May 9 18:05:47 UTC 2018
built by make with '--build=x86_64-redhat-linux-gnu' '--host=x86_64-redhat-linux-gnu' '--program-prefix=' '--disable-dependency-tracking' '--prefix=/opt/isc/isc-bind/root/usr' '--exec-prefix=/opt/isc/isc-bind/root/usr' '--bindir=/opt/isc/isc-bind/root/usr/bin' '--sbindir=/opt/isc/isc-bind/root/usr/sbin' '--sysconfdir=/etc/opt/isc/isc-bind' '--datadir=/opt/isc/isc-bind/root/usr/share' '--includedir=/opt/isc/isc-bind/root/usr/include' '--libdir=/opt/isc/isc-bind/root/usr/lib64' '--libexecdir=/opt/isc/isc-bind/root/usr/libexec' '--localstatedir=/var/opt/isc/isc-bind' '--sharedstatedir=/var/opt/isc/isc-bind/lib' '--mandir=/opt/isc/isc-bind/root/usr/share/man' '--infodir=/opt/isc/isc-bind/root/usr/share/info' '--disable-static' '--enable-threads' '--enable-ipv6' '--enable-dnstap' '--with-pic' '--with-gssapi' '--with-libjson' '--with-libtool' '--with-libxml2' '--without-lmdb' '--with-docbook-xsl=/usr/share/sgml/docbook/xsl-stylesheets' '--with-python' 'build_alias=x86_64-redhat-linux-gnu' 'host_alias=x86_64-redhat-linux-gnu' 'CFLAGS=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic' 'LDFLAGS= -L/opt/isc/isc-bind/root/usr/lib64' 'PKG_CONFIG_PATH=:/opt/isc/isc-bind/root/usr/lib64/pkgconfig:/opt/isc/isc-bind/root/usr/share/pkgconfig'
compiled by GCC 4.8.5 20150623 (Red Hat 4.8.5-39)
compiled with OpenSSL version: OpenSSL 1.0.2k 26 Jan 2017
linked to OpenSSL version: OpenSSL 1.0.2k-fips 26 Jan 2017
compiled with libxml2 version: 2.9.1
linked to libxml2 version: 20901
compiled with libjson-c version: 0.11
linked to libjson-c version: 0.11
compiled with zlib version: 1.2.7
linked to zlib version: 1.2.7
compiled with protobuf-c version: 1.3.2
linked to protobuf-c version: 1.3.2
threads support is enabled
default paths:
named configuration: /etc/opt/isc/isc-bind/named.conf
rndc configuration: /etc/opt/isc/isc-bind/rndc.conf
DNSSEC root key: /etc/opt/isc/isc-bind/bind.keys
nsupdate session key: /var/opt/isc/isc-bind/run/named/session.key
named PID file: /var/opt/isc/isc-bind/run/named/named.pid
named lock file: /var/opt/isc/isc-bind/run/named/named.lock
Steps to reproduce
- Configure two CentOS 7 instances (adns and rdns) running BIND 9.11.17-1.1.el7 from https://copr.fedorainfracloud.org/coprs/isc/bind-esv/
  (For convenience, use low TTLs and disable DNSSEC validation. Don't configure minimal-responses yet. See full configs below.)
- On rdns, verify that the stub zone works properly:
[root@dmrz-test-rdns ~]# dig @localhost +noall +ans example.com txt
example.com. 60 IN TXT "hello world"
[root@dmrz-test-rdns ~]# dig @localhost +noall +ans ns1.example.com a
ns1.example.com. 60 IN A 10.224.255.51
[root@dmrz-test-rdns ~]# cat /var/opt/isc/isc-bind/named/data/example.com
$ORIGIN .
$TTL 60 ; 1 minute
example.com IN SOA ns1.example.com. dns-admin.example.com. (
                1 ; serial
                3600 ; refresh (1 hour)
                900 ; retry (15 minutes)
                1209600 ; expire (2 weeks)
                60 ; minimum (1 minute)
                )
            NS ns1.example.com.
$ORIGIN example.com.
ns1 A 10.224.255.51
As expected, our local file contains the SOA, NS, and glue records for this zone.
- On adns, configure minimal-responses yes; and also edit the master zone file to add a fictitious ns2.example.com (NS record and A record) and increment the SOA serial to 2, in order to illustrate how these changes are received. Finally, run rndc reload.
- Observe at this point that rdns can still answer queries (because the local file already contains one working glue record), but rdns does not update its local glue records to reflect recent changes in the authoritative data.
[root@dmrz-test-rdns ~]# rndc refresh example.com
zone refresh queued
[root@dmrz-test-rdns ~]# dig @localhost +noall +ans ns2.example.com a
ns2.example.com. 60 IN A 192.168.1.1
[root@dmrz-test-rdns ~]# cat /var/opt/isc/isc-bind/named/data/example.com
$ORIGIN .
$TTL 60 ; 1 minute
example.com IN SOA ns1.example.com. dns-admin.example.com. (
                2 ; serial
                3600 ; refresh (1 hour)
                900 ; retry (15 minutes)
                1209600 ; expire (2 weeks)
                60 ; minimum (1 minute)
                )
            NS ns1.example.com.
            NS ns2.example.com.
$ORIGIN example.com.
ns1 A 10.224.255.51
Note that our local file now contains the updated serial and the new NS record, but not the new A record.
- Remove the existing local file on rdns and try to bootstrap it again from scratch.
[root@dmrz-test-rdns ~]# systemctl stop isc-bind-named
[root@dmrz-test-rdns ~]# rm /var/opt/isc/isc-bind/named/data/example.com
rm: remove regular file ‘/var/opt/isc/isc-bind/named/data/example.com’? y
[root@dmrz-test-rdns ~]# systemctl start isc-bind-named
Observe that rdns now has no glue records for the stub zone at all:
[root@dmrz-test-rdns ~]# cat /var/opt/isc/isc-bind/named/data/example.com
$ORIGIN .
$TTL 60 ; 1 minute
example.com IN SOA ns1.example.com. dns-admin.example.com. (
                2 ; serial
                3600 ; refresh (1 hour)
                900 ; retry (15 minutes)
                1209600 ; expire (2 weeks)
                60 ; minimum (1 minute)
                )
            NS ns1.example.com.
            NS ns2.example.com.
and is consequently unable to answer queries for the zone:
[root@dmrz-test-rdns ~]# dig @localhost example.com txt
; <<>> DiG 9.11.17 <<>> @localhost example.com txt
; (2 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 46974
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: 098de2585799f5c405e3568b5e87f44315250340a61a221b (good)
;; QUESTION SECTION:
;example.com. IN TXT
;; Query time: 4000 msec
;; SERVER: ::1#53(::1)
;; WHEN: Sat Apr 04 02:43:15 UTC 2020
;; MSG SIZE rcvd: 68
- Optionally rectify the problem by removing minimal-responses yes; and incrementing the SOA serial again (on adns), then running rndc refresh example.com (on rdns).
What is the current bug behavior?
As shown above, a newly bootstrapped rdns returns SERVFAIL for queries in the stub zone, while an established rdns which already knows some glue records will not immediately break but will gradually drift (and possibly break later on).
The latter behavior is particularly insidious; it turns out that my production resolvers have been in this state ever since I changed my authoritative nameservers over a year ago, but since they haven't stopped working, I never noticed anything was amiss until I tried to stand up a brand new resolver.
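Since the drift is silent, one way to spot an affected resolver is to scan the stub zone's local file for in-zone NS targets that have no address record. The following is a minimal sketch, not part of BIND: the function name missing_glue is hypothetical, and the parser assumes the file looks like the dumps shown above (indented continuation lines, only SOA/NS/A/AAAA records, no quoted semicolons).

```python
def missing_glue(zone_text, zone_name):
    """Return in-zone NS targets of zone_name that have no A/AAAA record."""
    zone = zone_name.rstrip(".").lower() + "."
    origin, owner, in_parens = ".", None, False
    ns_targets, have_addr = set(), set()

    def absolute(name):
        # Expand a possibly relative name against the current $ORIGIN.
        if name == "@":
            return origin.lower()
        if name.endswith("."):
            return name.lower()
        suffix = "" if origin == "." else origin
        return (name + "." + suffix).lower()

    for raw in zone_text.splitlines():
        line = raw.split(";", 1)[0]  # drop comments (naive: no quoted ';')
        tokens = line.split()
        if not tokens:
            continue
        if in_parens:
            # Skip continuation lines of a multi-line record (the SOA here).
            if ")" in line:
                in_parens = False
            continue
        if tokens[0].upper() == "$ORIGIN":
            origin = tokens[1]
            continue
        if tokens[0].startswith("$"):
            continue
        if not raw[0].isspace():
            owner = absolute(tokens.pop(0))  # otherwise the owner is inherited
        if "(" in line and ")" not in line:
            in_parens = True
        # Skip the optional TTL and class fields.
        while tokens and (tokens[0].isdigit() or tokens[0].upper() == "IN"):
            tokens.pop(0)
        if len(tokens) < 2 or owner is None:
            continue
        rtype, rdata = tokens[0].upper(), tokens[1]
        if rtype == "NS" and owner == zone:
            ns_targets.add(absolute(rdata))
        elif rtype in ("A", "AAAA"):
            have_addr.add(owner)

    return sorted(t for t in ns_targets
                  if (t == zone or t.endswith("." + zone)) and t not in have_addr)


# The stub file as it looked after the refresh in the steps above.
SAMPLE = """\
$ORIGIN .
$TTL 60 ; 1 minute
example.com IN SOA ns1.example.com. dns-admin.example.com. (
                2 ; serial
                3600 ; refresh (1 hour)
                900 ; retry (15 minutes)
                1209600 ; expire (2 weeks)
                60 ; minimum (1 minute)
                )
            NS ns1.example.com.
            NS ns2.example.com.
$ORIGIN example.com.
ns1 A 10.224.255.51
"""

print(missing_glue(SAMPLE, "example.com"))
```

On a real resolver the text would be read from the stub zone's file under the named data directory; a non-empty result suggests the resolver is in the drifting state described above.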
What is the expected correct behavior?
I think the stub zone on rdns should notice whenever the NS records it retrieves from the master need glue to be usable (that is, whenever the NS targets are inside the zone), and, if no glue records were included in the additional section, should send follow-up A/AAAA queries to the master to retrieve them.
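To make the proposal concrete, here is a rough sketch of the decision only, in Python rather than in named's own code. The function name and the (name, type) tuple representation of the additional section are assumptions made for illustration; this is not how BIND represents responses internally.

```python
def follow_up_glue_queries(zone, ns_targets, additional):
    """Return (name, qtype) queries needed to obtain missing glue.

    zone        -- absolute zone name, e.g. "example.com."
    ns_targets  -- NS target names taken from the master's answer
    additional  -- (name, rtype) pairs seen in the additional section
    """
    zone = zone.lower()
    with_addr = {name.lower() for name, rtype in additional
                 if rtype in ("A", "AAAA")}
    queries = []
    for target in ns_targets:
        t = target.lower()
        in_zone = t == zone or t.endswith("." + zone)
        # Only in-zone targets need glue; out-of-zone names can be
        # resolved through normal recursion.
        if in_zone and t not in with_addr:
            queries += [(t, "A"), (t, "AAAA")]
    return queries


ns_set = ["ns1.example.com.", "ns2.example.com."]

# Master with minimal-responses: the additional section is empty.
print(follow_up_glue_queries("example.com.", ns_set, []))

# Master without minimal-responses: glue arrives with the NS answer.
print(follow_up_glue_queries(
    "example.com.", ns_set,
    [("ns1.example.com.", "A"), ("ns2.example.com.", "A")]))
```

In this sketch a target counts as covered as soon as any address record for it is present, which matches the condition "if no glue records were included" above; a stricter variant could ask for A and AAAA separately.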
Failing that (i.e. if ISC determines that it would complicate the implementation of stub zones too much), the description of stub zones in https://ftp.isc.org/isc/bind9/9.11.17/doc/arm/Bv9ARM.ch06.html#zone_statement should clearly warn that this use case is not supported. FWIW if the decision is not to support it, it may be less fragile to also stop supporting it even when the master does provide an additional section.
Further wrinkle: will the same master which knows the NS records always know the A records (in cases where glue is required)? I would think so in the real world, but I can imagine a theoretical corner-case scenario where asking that master for the A record might give a referral response instead:
- com.
  example.com. NS parent-ns1.childzone.example.com.
  parent-ns1.childzone.example.com. A 10.224.255.51
- example.com. (served by 10.224.255.51)
  example.com. NS parent-ns1.childzone.example.com.
  childzone.example.com. NS child-ns1.childzone.example.com.
  child-ns1.childzone.example.com. A 10.224.255.53
- childzone.example.com. (served by 10.224.255.53)
  childzone.example.com. NS child-ns1.childzone.example.com.
  child-ns1.childzone.example.com. A 10.224.255.53
  parent-ns1.childzone.example.com. A 10.224.255.51
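The corner case above can be modelled in a few lines. This is only an illustrative sketch of how an authoritative server classifies a query name relative to its zone cuts; the function names and return labels are hypothetical, and details such as DS queries at the cut are ignored.

```python
def is_at_or_below(name, parent):
    """True if name equals parent or is a subdomain of it (absolute names)."""
    return name == parent or name.endswith("." + parent)


def answer_kind(qname, apex, delegations):
    """Classify how the server for `apex` would respond to `qname`.

    delegations -- names below the apex that carry NS records (zone cuts)
    """
    qname = qname.lower()
    if not is_at_or_below(qname, apex):
        return ("out-of-zone", None)
    cuts = [d for d in delegations if is_at_or_below(qname, d)]
    if cuts:
        # The topmost cut wins: the server stops being authoritative there.
        return ("referral", min(cuts, key=len))
    return ("authoritative", None)


apex = "example.com."
delegations = {"childzone.example.com."}

# The NS target from the scenario lives below the childzone cut.
print(answer_kind("parent-ns1.childzone.example.com.", apex, delegations))

# The NS target from the reproduction steps is ordinary in-zone data.
print(answer_kind("ns1.example.com.", apex, delegations))
```

Under these assumptions, a follow-up A query for parent-ns1.childzone.example.com sent to 10.224.255.51 would come back as a referral to childzone.example.com, so a stub zone implementation would have to either follow that referral or give up on that name server.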
Relevant configuration files
adns:/etc/opt/isc/isc-bind/named.conf
options {
    directory "/var/opt/isc/isc-bind/named/data";
    dnssec-validation auto;
    allow-query { any; };
    recursion no;
    #minimal-responses yes;
};
logging {
    channel default_debug {
        file "named.run";
        print-time yes;
        severity dynamic;
    };
};
zone "example.com" IN {
    file "example.com";
    type master;
};
adns:/var/opt/isc/isc-bind/named/data/example.com
$ORIGIN .
$TTL 60
example.com IN SOA ns1.example.com. dns-admin.example.com. (
                1 ; serial
                3600 ; refresh (1 hour)
                900 ; retry (15 minutes)
                1209600 ; expire (2 weeks)
                60 ; minimum (1 minute)
                )
            NS ns1.example.com.
            TXT "hello world"
$ORIGIN example.com.
ns1 A 10.224.255.51
rdns:/etc/opt/isc/isc-bind/named.conf
options {
    directory "/var/opt/isc/isc-bind/named/data";
    allow-query { localhost; };
    recursion yes;
    dnssec-validation no;
};
logging {
    channel default_debug {
        file "named.run";
        print-time yes;
        severity dynamic;
    };
};
zone "example.com" IN {
    file "example.com";
    type stub;
    masters { 10.224.255.51; };
};