Unexpected NODATA answers instead of successful response or SERVFAIL with serve-stale answers enabled and serve-stale-client-timeout set to positive value
Summary
Unexpected NODATA answer with serve-stale answers enabled and serve-stale-client-timeout
set more that 0.
BIND version used
BIND 9.16.38-Ubuntu (Extended Support Version) <id:af0056a>
running on Linux x86_64 5.15.0-1028-aws #32~20.04.1-Ubuntu SMP Mon Jan 9 18:02:08 UTC 2023
built by make with '--build=x86_64-linux-gnu' '--prefix=/usr' '--includedir=/usr/include' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--disable-silent-rules' '--libdir=/usr/lib/x86_64-linux-gnu' '--libexecdir=/usr/lib/x86_64-linux-gnu' '--disable-maintainer-mode' '--disable-dependency-tracking' '--libdir=/usr/lib/x86_64-linux-gnu' '--sysconfdir=/etc/bind' '--with-python=python3' '--localstatedir=/' '--enable-threads' '--enable-largefile' '--with-libtool' '--enable-shared' '--enable-static' '--with-gost=no' '--with-openssl=/usr' '--with-gssapi=/usr' '--with-libidn2' '--with-json-c' '--with-lmdb=/usr' '--with-gnu-ld' '--with-maxminddb' '--with-atf=no' '--enable-ipv6' '--enable-rrl' '--enable-filter-aaaa' '--disable-native-pkcs11' '--enable-dnstap' 'build_alias=x86_64-linux-gnu' 'CFLAGS=-g -O2 -fdebug-prefix-map=/build/bind9-5oQ5lS/bind9-9.16.38=. -fstack-protector-strong -Wformat -Werror=format-security -fno-strict-aliasing -fno-delete-null-pointer-checks -DNO_VERSION_DATE -DDIG_SIGCHASE' 'LDFLAGS=-Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now' 'CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2'
compiled by GCC 9.4.0
compiled with OpenSSL version: OpenSSL 1.1.1f 31 Mar 2020
linked to OpenSSL version: OpenSSL 1.1.1f 31 Mar 2020
compiled with libuv version: 1.44.2
linked to libuv version: 1.44.2
compiled with libxml2 version: 2.9.10
linked to libxml2 version: 20910
compiled with json-c version: 0.13.1
linked to json-c version: 0.13.1
compiled with zlib version: 1.2.11
linked to zlib version: 1.2.11
linked to maxminddb version: 1.4.2
compiled with protobuf-c version: 1.3.3
linked to protobuf-c version: 1.3.3
threads support is enabled
DNSSEC algorithms: RSASHA1 NSEC3RSASHA1 RSASHA256 RSASHA512 ECDSAP256SHA256 ECDSAP384SHA384 ED25519 ED448
DS algorithms: SHA-1 SHA-256 SHA-384
HMAC algorithms: HMAC-MD5 HMAC-SHA1 HMAC-SHA224 HMAC-SHA256 HMAC-SHA384 HMAC-SHA512
TKEY mode 2 support (Diffie-Hellman): yes
TKEY mode 3 support (GSS-API): yes
default paths:
named configuration: /etc/bind/named.conf
rndc configuration: /etc/bind/rndc.conf
DNSSEC root key: /etc/bind/bind.keys
nsupdate session key: //run/named/session.key
named PID file: //run/named/named.pid
named lock file: //run/named/named.lock
geoip-directory: /usr/share/GeoIP
Steps to reproduce
There are two cases affected with unexpected NODATA answers.
- First test case: recursion enabled, upstreams available, no data in cache.
Expected response: NOERROR with data if recursion succeeded in timeout value (10s)
- OS: Ubuntu 20.04, kernel 5.15.0-1028-aws
- Freshly installed Bind9 from https://launchpad.net/~isc/+archive/ubuntu/bind-esv
- In config file set that options:
dnssec-validation no;
stale-cache-enable yes;
stale-answer-enable yes;
stale-answer-client-timeout 150;
qname-minimization disabled;
listen-on-v6 { none; };
- Restart named (or do
rndc flush
to clean up caches) - run command:
dig +timeout=10 @127.0.0.1 ta1.myctl.com A
If time to answer takes more than 150ms and there is no data in cache, I get NOERROR NODATA answer exact after 150ms:
root@ip-172-31-40-214:/etc/bind# dig +timeout=10 @127.0.0.1 ta1.myctl.com A
; <<>> DiG 9.16.38-Ubuntu <<>> +timeout @127.0.0.1 ta1.myctl.com A
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 1739
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 3, ADDITIONAL: 6
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: d202b375fc315c390100000064134c5002732d2be0d5cfc3 (good)
;; QUESTION SECTION:
;ta1.myctl.com. IN A
;; AUTHORITY SECTION:
myctl.com. 172800 IN NS ns-canada.topdns.com.
myctl.com. 172800 IN NS ns-usa.topdns.com.
myctl.com. 172800 IN NS ns-uk.topdns.com.
;; ADDITIONAL SECTION:
ns-uk.topdns.com. 172800 IN A 77.247.183.137
ns-uk.topdns.com. 172800 IN A 108.61.150.91
ns-usa.topdns.com. 172800 IN A 46.166.189.99
ns-usa.topdns.com. 172800 IN A 108.61.12.163
ns-canada.topdns.com. 172800 IN A 109.201.142.225
;; Query time: 151 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Thu Mar 16 17:05:20 UTC 2023
;; MSG SIZE rcvd: 225
I expect successful answer if upstreams answer within dig timeout option value.
I couldn't reproduce such behaviour with qname-minimization relaxed;
As soon as data appears in cache, I see correct answers.
- Second test case: recursion enabled, upstreams unavailable, no data in cache.
Expected response: SERVFAIL instead of NODATA
The same initial OS and Bind version like in the first test case:
- asking query to fill prime cache: dig +timeout=10 @127.0.0.1 ta1.myctl.com A
- block all external DNS answers with iptables:
sudo iptables -A INPUT -s 127.0.0.0/24 -p udp --sport 53 -j ACCEPT && \
sudo iptables -A INPUT -s 127.0.0.0/24 -p tcp --sport 53 -j ACCEPT && \
sudo iptables -A INPUT -s 0.0.0.0/0 -p udp --sport 53 -j DROP && \
sudo iptables -A INPUT -s 0.0.0.0/0 -p tcp --sport 53 -j DROP
- flushing name
ta1.myctl.com
with command:rndc flushname ta1.myctl.com
- running query:
dig +timeout=10 @127.0.0.1 ta1.myctl.com A
Getting NODATA response after 150ms:
; <<>> DiG 9.16.38-Ubuntu <<>> +timeout @127.0.0.1 ta1.myctl.com A
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 54375
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 3, ADDITIONAL: 6
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 6373254ddb0b95f10100000064134f9530da2cc66aff4cd7 (good)
;; QUESTION SECTION:
;ta1.myctl.com. IN A
;; AUTHORITY SECTION:
myctl.com. 3556 IN NS ns-canada.topdns.com.
myctl.com. 3556 IN NS ns-usa.topdns.com.
myctl.com. 3556 IN NS ns-uk.topdns.com.
;; ADDITIONAL SECTION:
ns-uk.topdns.com. 172756 IN A 77.247.183.137
ns-uk.topdns.com. 172756 IN A 108.61.150.91
ns-usa.topdns.com. 172756 IN A 46.166.189.99
ns-usa.topdns.com. 172756 IN A 108.61.12.163
ns-canada.topdns.com. 172756 IN A 109.201.142.225
;; Query time: 151 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Thu Mar 16 17:19:17 UTC 2023
;; MSG SIZE rcvd: 225
- running the same query again:
Getting SERVFAIL (after
resolver-query-timeout
)
; <<>> DiG 9.16.38-Ubuntu <<>> +timeout @127.0.0.1 ta1.myctl.com A
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 25802
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 23fed5e0a2a249db01000000641350293043165904e30895 (good)
;; QUESTION SECTION:
;ta1.myctl.com. IN A
;; Query time: 9827 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Thu Mar 16 17:21:45 UTC 2023
;; MSG SIZE rcvd: 70
What is the current bug behavior?
When serve-stale answers is enabled, and stale-answer-client-timeout set to positive value, I see unexpected NODATA answers (exact after stale-answer-client-timeout passed) instead of success or SERVFAIL.
What is the expected correct behavior?
First test case: When upstreams available and can answer in timeout option (in all examples it is 10 seconds), and data is not in cache - return answer. Second test case: When upstreams are not available and no data in cache - return SERVFAIL instead of NODATA
Relevant configuration files
logging {
channel "standard_var_log" {
file "/var/log/named/named.log" versions 3 size 104857600;
severity info;
print-time yes;
print-severity yes;
print-category yes;
};
category "default" {
"standard_var_log";
};
category "lame-servers" {
"null";
};
category "rpz" {
"null";
};
category "serve-stale" {
"standard_var_log";
};
};
options {
directory "/var/cache/bind";
listen-on-v6 {
"none";
};
dnssec-validation no;
qname-minimization disabled;
stale-answer-enable yes;
stale-answer-client-timeout 150;
stale-cache-enable yes;
};
zone "." {
type hint;
file "/usr/share/dns/root.hints";
};
zone "localhost" {
type master;
file "/etc/bind/db.local";
};
zone "127.in-addr.arpa" {
type master;
file "/etc/bind/db.127";
};
zone "0.in-addr.arpa" {
type master;
file "/etc/bind/db.0";
};
zone "255.in-addr.arpa" {
type master;
file "/etc/bind/db.255";
};
Relevant logs and/or screenshots
16-Mar-2023 17:18:37.566 general: info: received control channel command 'flushname ta1.myctl.com'
16-Mar-2023 17:18:37.566 general: info: flushing name 'ta1.myctl.com' in all cache views succeeded
16-Mar-2023 17:18:37.586 general: info: received control channel command 'dumpdb'
16-Mar-2023 17:18:37.586 general: info: dumpdb started
16-Mar-2023 17:18:37.590 general: info: dumpdb complete
16-Mar-2023 17:18:38.846 general: info: received control channel command 'flushname ta1.myctl.com'
16-Mar-2023 17:18:38.846 general: info: flushing name 'ta1.myctl.com' in all cache views succeeded
16-Mar-2023 17:18:38.866 general: info: received control channel command 'dumpdb'
16-Mar-2023 17:18:38.866 general: info: dumpdb started
16-Mar-2023 17:18:38.870 general: info: dumpdb complete
16-Mar-2023 17:18:40.506 general: info: received control channel command 'flushname ta1.myctl.com'
16-Mar-2023 17:18:40.506 general: info: flushing name 'ta1.myctl.com' in all cache views succeeded
16-Mar-2023 17:18:40.526 general: info: received control channel command 'dumpdb'
16-Mar-2023 17:18:40.526 general: info: dumpdb started
16-Mar-2023 17:18:40.530 general: info: dumpdb complete
16-Mar-2023 17:19:16.978 general: info: received control channel command 'flushname ta1.myctl.com'
16-Mar-2023 17:19:16.982 general: info: flushing name 'ta1.myctl.com' in all cache views succeeded
16-Mar-2023 17:19:16.998 general: info: received control channel command 'dumpdb'
16-Mar-2023 17:19:16.998 general: info: dumpdb started
16-Mar-2023 17:19:17.022 general: info: dumpdb complete
16-Mar-2023 17:19:19.394 general: info: received control channel command 'flushname ta1.myctl.com'
16-Mar-2023 17:19:19.394 general: info: flushing name 'ta1.myctl.com' in all cache views succeeded
16-Mar-2023 17:19:19.414 general: info: received control channel command 'dumpdb'
16-Mar-2023 17:19:19.414 general: info: dumpdb started
16-Mar-2023 17:19:19.422 general: info: dumpdb complete
16-Mar-2023 17:19:27.022 client: info: client @0x7f8fc85d8880 127.0.0.1#49750 (ta1.myctl.com): recursion loop detected
16-Mar-2023 17:19:27.022 client: info: client @0x7f8fc85e97a0 127.0.0.1#54607 (ta1.myctl.com): recursion loop detected
16-Mar-2023 17:21:35.554 general: info: received control channel command 'flushname ta1.myctl.com'
16-Mar-2023 17:21:35.554 general: info: flushing name 'ta1.myctl.com' in all cache views succeeded
16-Mar-2023 17:21:35.570 general: info: received control channel command 'dumpdb'
16-Mar-2023 17:21:35.570 general: info: dumpdb started
16-Mar-2023 17:21:35.578 general: info: dumpdb complete
16-Mar-2023 17:21:45.594 client: info: client @0x7f8fc85e97a0 127.0.0.1#38227 (ta1.myctl.com): recursion loop detected
16-Mar-2023 17:21:45.594 client: info: client @0x7f8fc85d8880 127.0.0.1#41798 (ta1.myctl.com): recursion loop detected
Reference: Support ticket #21952