An RPZ with rpz-nsip triggers combined with an unusually (but not entirely) broken delegation yields SERVFAIL and "rpz NSIP rewrite X via Y NS address rewrite rrset failed: failure"
Summary
When a configured response policy zone contains rpz-nsip triggers and resolution of a name server's address does not complete successfully (but fails in an apparently unusual way), SERVFAIL is returned for queries that would otherwise succeed.
BIND version used
BIND 9.18.8 (Stable Release) <id:35f5d35>
running on Darwin arm64 21.6.0 Darwin Kernel Version 21.6.0: Mon Aug 22 20:19:52 PDT 2022; root:xnu-8020.140.49~2/RELEASE_ARM64_T6000
built by make with '--prefix=/opt/homebrew/Cellar/bind/9.18.8' '--sysconfdir=/opt/homebrew/etc/bind' '--localstatedir=/opt/homebrew/var' '--with-json-c' '--with-libidn2=/opt/homebrew/opt/libidn2' '--with-openssl=/opt/homebrew/opt/openssl@3' '--without-lmdb' 'CC=clang' 'PKG_CONFIG_PATH=/opt/homebrew/opt/json-c/lib/pkgconfig:/opt/homebrew/opt/libidn2/lib/pkgconfig:/opt/homebrew/opt/libnghttp2/lib/pkgconfig:/opt/homebrew/opt/libuv/lib/pkgconfig:/opt/homebrew/opt/openssl@3/lib/pkgconfig' 'PKG_CONFIG_LIBDIR=/usr/lib/pkgconfig:/opt/homebrew/Library/Homebrew/os/mac/pkgconfig/12'
compiled by CLANG Apple LLVM 14.0.0 (clang-1400.0.29.202)
compiled with OpenSSL version: OpenSSL 3.0.5 5 Jul 2022
linked to OpenSSL version: OpenSSL 3.0.5 5 Jul 2022
compiled with libuv version: 1.44.2
linked to libuv version: 1.44.2
compiled with libnghttp2 version: 1.50.0
linked to libnghttp2 version: 1.50.0
compiled with libxml2 version: 2.9.4
linked to libxml2 version: 20904
compiled with json-c version: 0.16
linked to json-c version: 0.16
compiled with zlib version: 1.2.11
linked to zlib version: 1.2.11
threads support is enabled
DNSSEC algorithms: RSASHA1 NSEC3RSASHA1 RSASHA256 RSASHA512 ECDSAP256SHA256 ECDSAP384SHA384 ED25519 ED448
DS algorithms: SHA-1 SHA-256 SHA-384
HMAC algorithms: HMAC-MD5 HMAC-SHA1 HMAC-SHA224 HMAC-SHA256 HMAC-SHA384 HMAC-SHA512
TKEY mode 2 support (Diffie-Hellman): yes
TKEY mode 3 support (GSS-API): yes
default paths:
named configuration: /opt/homebrew/etc/bind/named.conf
rndc configuration: /opt/homebrew/etc/bind/rndc.conf
DNSSEC root key: /opt/homebrew/etc/bind/bind.keys
nsupdate session key: /opt/homebrew/var/run/named/session.key
named PID file: /opt/homebrew/var/run/named/named.pid
named lock file: /opt/homebrew/var/run/named/named.lock
Steps to reproduce
Configure a minimal BIND 9 recursive resolver with a response policy zone that includes an rpz-nsip trigger, then attempt to resolve www.britishairways.com (which appears to have an unusually broken, partially lame delegation).
What is the current bug behavior?
DNS resolution of "www.britishairways.com" fails with SERVFAIL:
$ delv www.britishairways.com @::1
;; resolution failed: SERVFAIL
named (at -d 1) logs:
25-Oct-2022 22:53:35.007 client @0x1439ad560 ::1#53833 (www.britishairways.com): rpz NSIP rewrite www.britishairways.com via dnssec1-win.server.ntli.net NS address rewrite rrset failed: failure
25-Oct-2022 22:53:35.007 client @0x1439ad560 ::1#53833 (www.britishairways.com): query failed (SERVFAIL) for www.britishairways.com/IN/A at query.c:7232
What is the expected correct behavior?
DNS resolution of "www.britishairways.com" should succeed:
$ delv www.britishairways.com @::1
; unsigned answer
www.britishairways.com. 60 IN CNAME www.ba.com.edgekey.net.
www.ba.com.edgekey.net. 21600 IN CNAME e8308.b.akamaiedge.net.
e8308.b.akamaiedge.net. 20 IN A 104.117.169.173
Relevant configuration files
named.conf:
options {
response-policy {
zone "test.example.net" policy given;
};
};
zone "test.example.net" {
type primary;
file "test.example.net";
};
test.example.net zone file:
@ SOA ns1 hostmaster. (
2003080800 ; serial number
12h ; refresh
15m ; update retry
3w ; expiry
2h ; minimum
)
@ NS ns1
ns1 A 127.0.0.1
foo.com CNAME .
32.99.99.168.192.rpz-nsip CNAME .
Note that simply commenting out the final line in the zone file causes the problem to go away.
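For readers unfamiliar with RPZ trigger encoding: the owner name on that final line holds the prefix length followed by the IPv4 octets in reverse order, so it targets 192.168.99.99/32, an address unrelated to any name server in the failing delegation. The sketch below decodes such a name. It is illustrative only: the function name `decode_rpz_ipv4_trigger` is hypothetical, and it handles only the IPv4 form relative to the policy zone origin.

```python
import ipaddress

# Trigger labels that use the reversed-address encoding.
IP_TRIGGER_LABELS = ("rpz-ip", "rpz-nsip", "rpz-client-ip")


def decode_rpz_ipv4_trigger(owner: str) -> ipaddress.IPv4Network:
    """Decode an IPv4 RPZ trigger such as '32.99.99.168.192.rpz-nsip'.

    Hypothetical helper: expects the owner name relative to the policy
    zone, i.e. <prefixlen>.<o4>.<o3>.<o2>.<o1>.<trigger-label>.
    """
    labels = owner.rstrip(".").split(".")
    if len(labels) != 6 or labels[5] not in IP_TRIGGER_LABELS:
        raise ValueError(f"not an IPv4 RPZ IP trigger: {owner!r}")
    prefix_len = int(labels[0])
    # Octets are stored least-significant first, so reverse labels 1..4.
    address = ".".join(labels[4:0:-1])
    return ipaddress.IPv4Network(f"{address}/{prefix_len}")


if __name__ == "__main__":
    print(decode_rpz_ipv4_trigger("32.99.99.168.192.rpz-nsip"))
```

Run against the trigger in the zone file above, this gives 192.168.99.99/32, which suggests that the mere presence of an rpz-nsip rule, rather than a match against it, is enough to provoke the failure.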
Relevant logs and/or screenshots
named -d 2 -g output:
[...]
25-Oct-2022 22:57:14.370 fetch: www.britishairways.com/A
25-Oct-2022 22:57:14.371 fetch: _.com/A
25-Oct-2022 22:57:14.393 fetch: _.britishairways.com/A
25-Oct-2022 22:57:14.419 fetch: ns1.britishairways.com/AAAA
25-Oct-2022 22:57:14.419 fetch: ns2.britishairways.com/AAAA
25-Oct-2022 22:57:14.419 fetch: dnssec1-win.server.ntli.net/A
25-Oct-2022 22:57:14.419 fetch: dnssec1-win.server.ntli.net/AAAA
25-Oct-2022 22:57:14.419 fetch: dnssec2-win.server.ntli.net/A
25-Oct-2022 22:57:14.419 fetch: dnssec2-win.server.ntli.net/AAAA
25-Oct-2022 22:57:14.439 delete_node(): 0x600002c8edf0 www.britishairways.com (bucket 15)
25-Oct-2022 22:57:14.440 fetch: britishairways.com/DS
25-Oct-2022 22:57:14.456 fetch: com/DNSKEY
25-Oct-2022 22:57:14.477 fetch: dnssec1-win.server.ntli.net/A
25-Oct-2022 22:57:14.494 lame server resolving 'dnssec2-win.server.ntli.net' (in 'server.ntli.net'?): 194.168.4.237#53
25-Oct-2022 22:57:14.495 lame server resolving 'dnssec1-win.server.ntli.net' (in 'server.ntli.net'?): 194.168.4.237#53
25-Oct-2022 22:57:14.495 lame server resolving 'dnssec1-win.server.ntli.net' (in 'server.ntli.net'?): 194.168.4.237#53
25-Oct-2022 22:57:14.497 lame server resolving 'dnssec2-win.server.ntli.net' (in 'server.ntli.net'?): 194.168.4.237#53
25-Oct-2022 22:57:14.497 lame server resolving 'dnssec1-win.server.ntli.net' (in 'server.ntli.net'?): 194.168.4.237#53
25-Oct-2022 22:57:14.522 lame server resolving 'dnssec2-win.server.ntli.net' (in 'server.ntli.net'?): 62.253.162.237#53
25-Oct-2022 22:57:14.522 fetch: dns1.ntli.net/AAAA
25-Oct-2022 22:57:14.522 fetch: dns2.ntli.net/AAAA
25-Oct-2022 22:57:14.522 lame server resolving 'dnssec1-win.server.ntli.net' (in 'server.ntli.net'?): 62.253.162.237#53
25-Oct-2022 22:57:14.522 lame server resolving 'dnssec1-win.server.ntli.net' (in 'server.ntli.net'?): 62.253.162.237#53
25-Oct-2022 22:57:14.522 lame server resolving 'dnssec2-win.server.ntli.net' (in 'server.ntli.net'?): 62.253.162.237#53
25-Oct-2022 22:57:14.523 lame server resolving 'dnssec1-win.server.ntli.net' (in 'server.ntli.net'?): 62.253.162.237#53
25-Oct-2022 22:57:14.523 client @0x11a870560 ::1#52973 (www.britishairways.com): rpz NSIP rewrite www.britishairways.com via dnssec1-win.server.ntli.net NS address rewrite rrset failed: failure
25-Oct-2022 22:57:14.523 client @0x11a870560 ::1#52973 (www.britishairways.com): query failed (SERVFAIL) for www.britishairways.com/IN/A at query.c:7232
25-Oct-2022 22:57:14.523 fetch completed at resolver.c:4140 for dnssec1-win.server.ntli.net/A in 0.045906: failure/success [domain:server.ntli.net,referral:0,restart:2,qrysent:2,timeout:0,lame:2,quota:0,neterr:0,badresp:0,adberr:0,findfail:0,valfail:0]
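The bracketed counters on the final "fetch completed" line summarise why the address lookup for dnssec1-win.server.ntli.net failed. When comparing several runs it can help to extract them mechanically. The following is a minimal sketch that assumes the layout seen in the line above (comma-separated `key:value` pairs in trailing square brackets); that layout is inferred from this one log line, not from a documented format, and `parse_fetch_counters` is a hypothetical name.

```python
import re

# Matches the trailing "[key:value,key:value,...]" block of a log line.
COUNTERS_RE = re.compile(r"\[([^\]]*)\]\s*$")


def parse_fetch_counters(line: str) -> dict:
    """Return the bracketed counters of a 'fetch completed' line as a dict.

    Numeric values become int; anything else (e.g. the domain) stays str.
    """
    match = COUNTERS_RE.search(line)
    if match is None:
        raise ValueError("no trailing [counters] block found")
    counters = {}
    for item in match.group(1).split(","):
        key, _, value = item.partition(":")
        counters[key] = int(value) if value.isdigit() else value
    return counters


if __name__ == "__main__":
    sample = (
        "fetch completed at resolver.c:4140 for dnssec1-win.server.ntli.net/A "
        "in 0.045906: failure/success [domain:server.ntli.net,referral:0,"
        "restart:2,qrysent:2,timeout:0,lame:2,quota:0,neterr:0,badresp:0,"
        "adberr:0,findfail:0,valfail:0]"
    )
    print(parse_fetch_counters(sample))
```

For the line in this report, the counters show two queries sent and two lame responses with no timeouts or network errors, consistent with the "lame server" messages preceding the SERVFAIL.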
Possible fixes
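Unclear, but this appears to be a fault in rpz-nsip processing when the address of an NS name is unknown in an unusual way, i.e. when the fetch for that address ends in failure rather than in an answer or a clean negative response.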