stale-answer-client-timeout 0 does not serve fresh answer from prior NXDOMAIN
Summary
BIND version affected
BIND 9.16.23-RH (Extended Support Version) <id:fde3b1f>
running on Linux x86_64 4.18.0-513.24.1.el8_9.x86_64 #1 SMP Mon Apr 8 11:23:13 EDT 2024
built by make with '--build=x86_64-redhat-linux-gnu' '--host=x86_64-redhat-linux-gnu' '--program-prefix=' '--disable-dependency-tracking' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--sysconfdir=/etc' '--datadir=/usr/share' '--includedir=/usr/include' '--libdir=/usr/lib64' '--libexecdir=/usr/libexec' '--sharedstatedir=/var/lib' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--with-python=/usr/libexec/platform-python' '--with-libtool' '--localstatedir=/var' '--with-pic' '--disable-static' '--includedir=/usr/include/bind9' '--with-tuning=large' '--with-libidn2' '--with-maxminddb' '--with-dlopen=yes' '--with-gssapi=yes' '--with-lmdb=yes' '--without-libjson' '--with-json-c' '--enable-dnstap' '--enable-fixed-rrset' '--enable-full-report' 'build_alias=x86_64-redhat-linux-gnu' 'host_alias=x86_64-redhat-linux-gnu' 'CFLAGS= -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection' 'LDFLAGS=-Wl,-z,relro -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld' 'PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig'
compiled by GCC 8.5.0 20210514 (Red Hat 8.5.0-20)
compiled with OpenSSL version: OpenSSL 1.1.1k FIPS 25 Mar 2021
linked to OpenSSL version: OpenSSL 1.1.1k FIPS 25 Mar 2021
compiled with libuv version: 1.41.1
linked to libuv version: 1.41.1
compiled with libxml2 version: 2.9.7
linked to libxml2 version: 20907
compiled with json-c version: 0.13.1
linked to json-c version: 0.13.1
compiled with zlib version: 1.2.11
linked to zlib version: 1.2.11
linked to maxminddb version: 1.2.0
compiled with protobuf-c version: 1.3.0
linked to protobuf-c version: 1.3.0
threads support is enabled
Steps to reproduce
- An internal domain is forwarded to an authoritative system whose records are added and removed by API calls.
zone "dev-example.com" IN {
type forward;
forwarders { 10.55.51.130; 10.55.50.130; };
};
- A non-existent record is queried, and the NXDOMAIN result is cached for the configured negative TTL (max-ncache-ttl, 60s).
# dig @localhost example.dev-example.com
; <<>> DiG 9.16.23-RH <<>> @localhost example.dev-example.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 31241
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: a9497143aead2c9b01000000662b0ff2613a8c532df0d0b5 (good)
;; QUESTION SECTION:
;example.dev-example.com. IN A
;; AUTHORITY SECTION:
dev-example.com. 27 IN SOA xxxx. xxx. 2024045182 900 600 86400 1200
;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Fri Apr 26 02:22:42 UTC 2024
;; MSG SIZE rcvd: 160
- The record is added to the internal zone on the authoritative servers that the BIND resolver forwards to.
dig @10.55.50.130 example.dev-example.com
; <<>> DiG 9.16.23-RH <<>> @10.55.50.130 example.dev-example.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 27878
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 1
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1472
;; QUESTION SECTION:
;example.dev-example.com. IN A
;; ANSWER SECTION:
example.dev-example.com. 66 IN A 10.1.1.1
;; Query time: 1 msec
;; SERVER: 10.55.50.130#53(10.55.50.130)
;; WHEN: Fri Apr 26 02:28:56 UTC 2024
;; MSG SIZE rcvd: 128
- A query to the local resolver before the negative TTL expires returns the cached NXDOMAIN with its TTL counting down. This is expected behavior.
- A query to the local resolver after the 60s negative TTL has expired returns the stale NXDOMAIN record with the stale TTL.
dig @localhost example.dev-example.com
; <<>> DiG 9.16.23-RH <<>> @localhost example.dev-example.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 45371
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 56ceac3f3abcb52e01000000662b11725396ccbb2b69b49a (good)
;; QUESTION SECTION:
;example.dev-example.com. IN A
;; AUTHORITY SECTION:
dev-example.com. 30 IN SOA ins.dev-example.com. hostmaster.dev-example.com. 2024045182 900 600 86400 1200
;; Query time: 1 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Fri Apr 26 02:29:06 UTC 2024
;; MSG SIZE rcvd: 160
- The updated record from the authoritative forwarder is never returned (or at least not before max-stale-ttl expires, as far as I can tell at this point).
- Cache dumps taken right after the negative TTL expires still show a negative cache entry. Subsequent dumps show that the updated record has been retrieved and cached, but it is never served to the requesting client; clients still receive a stale NXDOMAIN response.
# grep example.dev-example.com /var/named/data/named.dump
example.dev-example.com. 43089 \-ANY ;-$NXDOMAIN
# grep example.dev-example.com /var/named/data/named.dump
example.dev-example.com. 79 A 10.1.1.1
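The steps above can be scripted roughly as follows. This is only a sketch: the function names are made up for illustration, the defaults are the name, forwarder address, and negative TTL from this report, and it assumes dig is installed and the resolver listens on localhost. Nothing runs until repro is called.

```shell
#!/bin/sh
# Sketch of the reproduction steps; helper names are illustrative only.

# Extract the response status (e.g. NXDOMAIN, NOERROR) from dig output on stdin.
dig_status() {
    sed -n 's/.*status: \([A-Z0-9]*\),.*/\1/p'
}

# Query a server for an A record and print only the response status.
# $1 = server, $2 = name
query_status() {
    dig "@$1" "$2" A +noall +comments | dig_status
}

# Walk through the scenario: negative answer cached, record added upstream,
# then a query after the negative TTL has expired.
# $1 = name, $2 = authoritative forwarder, $3 = negative TTL in seconds
repro() {
    name=${1:-example.dev-example.com}
    auth=${2:-10.55.50.130}
    ncache_ttl=${3:-60}

    echo "resolver, before the record exists: $(query_status localhost "$name")"
    echo "Add the record on the authoritative side, then press Enter."
    read -r reply
    echo "authoritative: $(query_status "$auth" "$name")"

    # Wait until the cached NXDOMAIN is past its negative TTL.
    sleep "$((ncache_ttl + 5))"
    echo "resolver, after negative TTL expiry: $(query_status localhost "$name")"
}
```

With the behavior described in this report, the last line would be expected to print NXDOMAIN even though the authoritative query prints NOERROR.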
What is the current bug behavior?
A stale NXDOMAIN response is returned to a resolver client despite a valid answer being available from the authoritative source, if the name was queried before it existed on the authoritative system, serve-stale is enabled, and stale-answer-client-timeout 0 is set.
What is the expected correct behavior?
When a valid answer exists and has been cached, it should be served to the client instead of an invalid, stale NXDOMAIN response.
Relevant configuration files
Relevant config segments:
max-cache-ttl 21600;
max-ncache-ttl 60;
min-cache-ttl 90;
stale-cache-enable yes;
stale-answer-enable yes;
stale-refresh-time 120;
stale-answer-client-timeout 0;
max-stale-ttl 43200;
prefetch 10;
...
response-policy {
zone "rpz01";
zone "rpz02" log no;
zone "rpz03" log no;
zone "rpz04" policy passthru;
};
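For anyone comparing their own setup against the segments above, the cache and serve-stale options can be pulled out of the parsed configuration. A small sketch, assuming named-checkconf is available and can read the running configuration; the function name is illustrative only.

```shell
#!/bin/sh
# Keep only the cache TTL, serve-stale, and prefetch options from
# configuration text read on stdin.
serve_stale_settings() {
    grep -E '^[[:space:]]*(stale-|max-stale-ttl|(max|min)-n?cache-ttl|prefetch)'
}

# Example usage (not run here): print the parsed configuration with
# secrets obscured, then filter it.
#   named-checkconf -px | serve_stale_settings
```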
Relevant logs
# grep example.dev-example.com /var/log/messages
Apr 26 02:23:10 dns01 named[3702]: serve-stale: info: example.dev-example.com stale answer used, an attempt to refresh the RRset will still be made
Apr 26 02:24:10 dns01 named[3702]: serve-stale: info: example.dev-example.com stale answer used, an attempt to refresh the RRset will still be made
Apr 26 02:29:06 dns01 named[3702]: serve-stale: info: example.dev-example.com stale answer used, an attempt to refresh the RRset will still be made
Apr 26 02:29:40 dns01 named[3702]: serve-stale: info: example.dev-example.com stale answer used, an attempt to refresh the RRset will still be made
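To line these entries up against the configured timers (max-ncache-ttl 60, stale-refresh-time 120), the gaps between the stale-answer messages can be computed. A small sketch, assuming syslog-style timestamps in the third field and all entries falling on the same day; the function name is illustrative only.

```shell
#!/bin/sh
# Print each "stale answer used" timestamp together with its offset in
# seconds from the first such entry. Reads log lines on stdin.
stale_offsets() {
    awk '/stale answer used/ {
        split($3, t, ":")                    # HH:MM:SS
        s = t[1] * 3600 + t[2] * 60 + t[3]   # seconds since midnight
        if (first == "") first = s           # remember the first entry
        print $3, s - first
    }'
}

# Example usage (not run here):
#   grep example.dev-example.com /var/log/messages | stale_offsets
```

For the four entries above this gives offsets of 0, 60, 356, and 390 seconds. The last two entries fall after 02:28:56, when the direct query to 10.55.50.130 already returned the A record.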