Regression in BIND 9.16.10: DNSSEC fails due to improper NSEC3 creation within named
Summary
I could not renew LetsEncrypt certs due to what it reported as a SERVFAIL on CAA. Turns out that's a red herring; the actual error was a failure with DNSSEC, in particular the NSEC3 proving that no CAA exists was in error, causing Unbound to report:
- validate(nodata): sec_status_bogus
Debugging with wireshark showed nothing unusual. On a whim, I reverted from BIND 9.16.11 to 9.16.9, which is the version I was running the last time that LetsEncrypt renewal worked. Success! A check with unboundtest.com showed no errors, and LetsEncrypt worked fine.
Next, I installed BIND 9.16.10, and that failed. So the bug is a regression of some sort introduced sometime between 9.16.9 and 9.16.10. I could git bisect, but as this is my production system, I'm hoping somebody here at ISC might have a better idea of what to test, or how to test off-line.
There is more written about it here:
BIND version used
BIND 9.16.11 (Stable Release) <id:9ff601b>
running on Linux x86_64 5.10.16 #1 SMP Sun Feb 14 18:52:33 EST 2021
built by make with '--build=x86_64-slackware-linux-gnu' '--docdir=/usr/doc/bind-9.16.11' '--sysconfdir=/etc/bind' '--infodir=/usr/info' '--libdir=/usr/lib64' '--mandir=/usr/man' '--prefix=/usr' '--localstatedir=/var' '--enable-largefile' '--with-libtool' '--enable-shared' '--with-gssapi=/usr' '--with-libidn2' '--with-dlopen' '--with-dlz-filesystem' '--with-dlz-stub' '--enable-auto-validation' '--enable-dnsrps' '--enable-full-report' '--enable-fixed-rrset' '--enable-querytrace' '--with-python=/usr/bin/python3' '--with-openssl' 'build_alias=x86_64-slackware-linux-gnu' 'CFLAGS=-g -O2 -fPIC -march=opteron' 'PKG_CONFIG_PATH=/usr/local/lib64/pkgconfig:/usr/local/share/pkgconfig:/usr/lib64/pkgconfig:/usr/share/pkgconfig'
compiled by GCC 10.2.0
compiled with OpenSSL version: OpenSSL 1.1.1i 8 Dec 2020
linked to OpenSSL version: OpenSSL 1.1.1j 16 Feb 2021
compiled with libuv version: 1.40.0
linked to libuv version: 1.41.0
compiled with libxml2 version: 2.9.10
linked to libxml2 version: 20910
compiled with json-c version: 0.15
linked to json-c version: 0.15
compiled with zlib version: 1.2.11
linked to zlib version: 1.2.11
threads support is enabled
default paths:
named configuration: /etc/bind/named.conf
rndc configuration: /etc/bind/rndc.conf
DNSSEC root key: /etc/bind/bind.keys
nsupdate session key: /var/run/named/session.key
named PID file: /var/run/named/named.pid
named lock file: /var/run/named/named.lock
Steps to reproduce
- Create a simple example.com zone with some name servers, a content server, CNAME aliases for www.example.com, ftp.example.com and so forth pointing to content-server.example.com
- Set up BIND with dnssec-validation auto; and give the zone a generic/default dnssec-policy.
dnssec-policy "example" {
    keys {
        ksk key-directory lifetime unlimited algorithm ecdsa256;
        zsk key-directory lifetime P90D algorithm ecdsa256;
    };
};
- Add a ZSK and KSK to your named.conf for BIND to use in signing the zone. Publish the DS to the parent zone/registrar to allow the zone to be validated by public tools.
- To the example.com.zone file, add a line to instruct BIND to create and maintain NSEC3 records.
example.com. IN NSEC3PARAM 1 0 15 fedcba9876543210
- Start bind, and allow it to create the DNSSEC RRSIGs and other records.
- Analyze the result using an external tool, such as DNSViz: https://dnsviz.net/d/example.com/dnssec/
- Or, you can analyze it locally:
- dig example.com axfr | grep -v TSIG > example.com.signed.zone
- dnssec-verify -o example.com example.com.signed.zone
Loading zone 'example.com' from file 'example.com.signed.zone'
Verifying the zone using the following algorithms: ECDSAP256SHA256.
No correct ECDSAP256SHA256 signature for T6FO800JTQPO6ARFM1A2U9RNPG2PBQFR.example.com NSEC3
The zone is not fully signed for the following algorithms: ECDSAP256SHA256.
DNSSEC completeness test failed.
- rndc zonestatus example.com IN external
name: example.com
type: master
files: external/example.com.zone
serial: 2021021708
nodes: 14
last loaded: Wed, 17 Feb 2021 08:35:10 GMT
secure: yes
inline signing: no
key maintenance: automatic
next key event: Sat, 20 Feb 2021 20:59:19 GMT
next resign node: U5OQPCDUCDD204VCT9H7GM8K3G8U8205.example.com/NSEC3
next resign time: Thu, 18 Feb 2021 20:17:17 GMT
dynamic: yes
frozen: no
reconfigurable via modzone: no
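To narrow down the dnssec-verify failure offline, it may help to know whether the affected NSEC3 records have no RRSIG at all or have one that fails to verify. The following is a minimal Python sketch, not part of BIND, that lists NSEC3 owner names with no RRSIG covering NSEC3. It assumes the one-record-per-line text that dig prints for an AXFR; the function name unsigned_nsec3_owners and the sample records (TTLs, type bitmaps, key tag, signature) are made up for illustration.

```python
def unsigned_nsec3_owners(lines):
    """Return sorted NSEC3 owner names that have no RRSIG covering NSEC3."""
    has_nsec3 = set()
    has_rrsig = set()
    for line in lines:
        fields = line.split()
        # Expected layout: owner TTL class type rdata...
        # Skip blank lines, dig comment lines, and anything too short.
        if len(fields) < 5 or fields[0].startswith(";"):
            continue
        owner, rtype = fields[0].lower(), fields[3].upper()
        if rtype == "NSEC3":
            has_nsec3.add(owner)
        elif rtype == "RRSIG" and fields[4].upper() == "NSEC3":
            # The first rdata field of an RRSIG is the type it covers.
            has_rrsig.add(owner)
    return sorted(has_nsec3 - has_rrsig)


# Invented sample: the owner hashes are taken from this report, but the
# TTLs, type bitmaps, key tag and signature are placeholders.
sample = [
    "; made-up excerpt of an AXFR dump",
    "t6fo800jtqpo6arfm1a2u9rnpg2pbqfr.example.com. 3600 IN NSEC3 1 0 15 "
    "FEDCBA9876543210 U5OQPCDUCDD204VCT9H7GM8K3G8U8205 A RRSIG",
    "u5oqpcducdd204vct9h7gm8k3g8u8205.example.com. 3600 IN NSEC3 1 0 15 "
    "FEDCBA9876543210 T6FO800JTQPO6ARFM1A2U9RNPG2PBQFR CNAME RRSIG",
    "u5oqpcducdd204vct9h7gm8k3g8u8205.example.com. 3600 IN RRSIG NSEC3 13 3 "
    "3600 20210318000000 20210217000000 12345 example.com. c2FtcGxl",
]
print(unsigned_nsec3_owners(sample))
```

Against a real dump, the same function can be fed the file directly, for example unsigned_nsec3_owners(open("example.com.signed.zone")). An empty result would suggest the signatures are present but do not verify, which only dnssec-verify or a validator can confirm.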
What is the current bug behavior?
According to dnsviz.net, the NSEC3 records are not properly populated to prove a zone node does not exist. Some of the RRSIG/DNSKEY records also appear to be improper, as LetsEncrypt (which uses Unbound) cannot validate some records that do exist, such as the TXT of the _acme-challenge it uses.
[Update] It appears that even in BIND 9.16.9, there is still a bug in that NSEC3 records are only partially populated. According to dnsviz.net, checking for example.com succeeds, but checking for doesnotexist.example.com may or may not show DNSSEC validation errors. But of course, in BIND 9.16.10 and 9.16.11, even the check of the existing example.com results in red portions of the graph it presents.
What is the expected correct behavior?
BIND should properly maintain its NSEC3 and related DNSSEC records so that validation tools show no errors.
Relevant configuration files
I think the snippets I included in the Steps to Reproduce section, above, should be sufficient. I am not doing anything unusual here, to the best of my knowledge; I'm using a dnssec-policy that's essentially the built-in default, except for elliptic-curve signing algorithms. Keys are all Algorithm 13.
Relevant logs and/or screenshots
When running BIND 9.16.10, I see the following warnings, which are not present when running 9.16.9 and earlier:
dnssec: warning: client @0x7fffb001a5a8 66.133.109.36#47330 (nS3.eXamPle.com): view external: expected a exact match NSEC3, got a covering record
dnssec: warning: client @0x7fffc000cec8 66.133.109.36#11834 (FTp.EXaMplE.COM): view external: expected a exact match NSEC3, got a covering record
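The warnings distinguish an NSEC3 that matches a name exactly from one that only covers it. As a rough aid for checking this offline, here is a Python sketch of the RFC 5155 hash (algorithm 1, SHA-1) using the salt and iteration count from the NSEC3PARAM line in the steps to reproduce, plus a helper that says whether a hash is matched or covered by a chain. The names nsec3_hash and classify are my own, the chain below is a toy built from three names rather than the real zone, and escaped dots in labels are not handled.

```python
import base64
import hashlib

# NSEC3 uses base32hex (RFC 4648); translate from the standard alphabet.
_B32_TO_B32HEX = bytes.maketrans(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ234567",
    b"0123456789ABCDEFGHIJKLMNOPQRSTUV",
)


def nsec3_hash(name, salt_hex, iterations):
    """Hash a domain name per RFC 5155 and return it as base32hex text."""
    labels = [label for label in name.lower().rstrip(".").split(".") if label]
    # Canonical wire format: length-prefixed lowercase labels, then the root.
    wire = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in labels
    ) + b"\x00"
    salt = bytes.fromhex(salt_hex)
    digest = hashlib.sha1(wire + salt).digest()
    for _ in range(iterations):  # "iterations" counts the additional rounds
        digest = hashlib.sha1(digest + salt).digest()
    return base64.b32encode(digest).translate(_B32_TO_B32HEX).decode("ascii")


def classify(target, chain):
    """chain maps owner hash -> next hash; return 'match', 'cover' or 'none'."""
    if target in chain:
        return "match"
    for owner, nxt in chain.items():
        if owner < nxt:
            if owner < target < nxt:
                return "cover"
        elif target > owner or target < nxt:
            # The last record in the chain wraps around to the first.
            return "cover"
    return "none"


SALT, ITERATIONS = "fedcba9876543210", 15

# Hashes for the query names seen in the warnings (0x20 case mixing is
# irrelevant, because names are lowercased before hashing).
for qname in ("nS3.eXamPle.com", "FTp.EXaMplE.COM"):
    print(qname, nsec3_hash(qname, SALT, ITERATIONS))

# Toy chain built from three names only; the real zone has more nodes.
hashes = sorted(
    nsec3_hash(n, SALT, ITERATIONS)
    for n in ("example.com", "www.example.com", "ftp.example.com")
)
toy_chain = {h: hashes[(i + 1) % len(hashes)] for i, h in enumerate(hashes)}
print(classify(nsec3_hash("ftp.example.com", SALT, ITERATIONS), toy_chain))
print(classify(nsec3_hash("doesnotexist.example.com", SALT, ITERATIONS), toy_chain))
```

If the hash printed for an existing name such as ftp.example.com does not appear as an NSEC3 owner in the AXFR dump, that would be consistent with the "got a covering record" warning, though I have not confirmed that this is the actual cause.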
Possible fixes
I have not yet done a git bisect, and I hope to avoid it: my only means of testing (running LetsEncrypt to generate certificates) requires using my live, production site. But bisecting is possible if the devs cannot guess what might be the issue.