zone with dnssec-policy breaks when you add update-policy to it
Summary
I have a zone configured with a dnssec-policy, which has been working fine for a long time. Yesterday I added an update-policy stanza to the zone, so that certbot could use the DNS-01 challenge to obtain a Let's Encrypt certificate. After doing this I discovered that replication from the primary to the secondary servers for the zone had stopped working.
On investigation I discovered that the zone SOA serial number on the primary server had gone backwards, and this was the reason why the zone had stopped replicating.
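Why a lower serial stops replication can be sketched as follows. This is a minimal illustration of RFC 1982 serial number arithmetic, which secondaries use to decide whether the primary has a newer copy of the zone. The function name `serial_is_newer` is made up for this example, and the serial values are the illustrative ones used elsewhere in this report, not real data.

```python
def serial_is_newer(candidate: int, current: int) -> bool:
    """Return True if `candidate` is newer than `current` under
    RFC 1982 serial number arithmetic for 32-bit SOA serials."""
    diff = (candidate - current) % 2**32
    # diff == 0 means equal; diff == 2**31 is undefined in RFC 1982,
    # so this sketch treats both cases as "not newer".
    return 0 < diff < 2**31


# Illustrative (made-up) serials from this report.
secondary_has = 2022010180  # serial previously served from the .signed file
primary_now = 2022010120    # serial served after adding update-policy

# The primary's serial is not newer, so a secondary sees no reason to transfer.
print(serial_is_newer(primary_now, secondary_has))  # False
print(serial_is_newer(secondary_has, primary_now))  # True
```

With these values the secondary keeps its existing copy until the primary's serial once again exceeds 2022010180.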
Before adding the update-policy stanza, the files in /var/lib/bind were as follows. (NB: I don't remember the exact SOA serial numbers, so I've made up the SOA serial numbers and zone name for illustrative purposes.)
- db.example.com (hard-link to my original zone file with SOA serial=2022010101)
- db.example.com.jbk
- db.example.com.jnl
- db.example.com.signed (signed version of my zone file, with SOA serial=2022010180 - i.e. much greater than original)
- db.example.com.signed.jnl
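For anyone reproducing this, it may help to record the state of these files before and after the configuration change, since it is easy to miss which ones were rewritten. The sketch below is one hypothetical way to do that. The helper names `snapshot_zone_files` and `changed_files` are invented for this example and are not part of BIND.

```python
from pathlib import Path


def snapshot_zone_files(directory: str, zone_file: str) -> dict:
    """Record inode, hard-link count, size and mtime for every regular
    file in `directory` whose name starts with `zone_file`."""
    result = {}
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path.name.startswith(zone_file):
            st = path.stat()
            result[path.name] = {
                "inode": st.st_ino,
                "nlink": st.st_nlink,  # > 1 means the file is still hard-linked
                "size": st.st_size,
                "mtime": st.st_mtime,
            }
    return result


def changed_files(before: dict, after: dict) -> list:
    """Return the names of files that were added, removed or modified."""
    names = sorted(set(before) | set(after))
    return [name for name in names if before.get(name) != after.get(name)]
```

Calling `snapshot_zone_files("/var/lib/bind", "db.example.com")` before and after the change and passing both results to `changed_files` would show which files were touched, and an `nlink` of 1 for db.example.com would indicate that the hard link has been replaced.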
After adding the update-policy stanza, the db.example.com file had changed:
- db.example.com (no longer a hard-link, but now a signed version of my original zone file with SOA serial=2022010120 - i.e. greater than my original zone file but less than the SOA serial number in db.example.com.signed)
The other files were still there, although I didn't notice which of them had changed. While trying to get everything working, I stopped named.service, deleted all of the files above, recreated the hard-link to the original file, deleted the contents of /var/cache/bind, and started named.service again. After that I discovered that the .signed files hadn't been recreated, so I was left with:
- db.example.com (signed version of my original zone file with SOA serial greater than my original zone file but less than what the SOA serial number in db.example.com.signed had been)
- db.example.com.jnl
I managed to fix the problem by removing the update-policy from the zone and repeating the steps above (i.e. stopping named.service, deleting all of the files above, recreating the hard-link to the original file, deleting the contents of /var/cache/bind, then starting named.service again).
BIND version used
BIND 9.18.1-1ubuntu1.1-Ubuntu (Stable Release) <id:>
running on Linux x86_64 5.15.0-33-generic #34-Ubuntu SMP Wed May 18 13:34:26 UTC 2022
built by make with '--build=x86_64-linux-gnu' '--prefix=/usr' '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--disable-option-checking' '--disable-silent-rules' '--libdir=${prefix}/lib/x86_64-linux-gnu' '--runstatedir=/run' '--disable-maintainer-mode' '--disable-dependency-tracking' '--libdir=/usr/lib/x86_64-linux-gnu' '--sysconfdir=/etc/bind' '--with-python=python3' '--localstatedir=/' '--enable-threads' '--enable-largefile' '--with-libtool' '--enable-shared' '--disable-static' '--with-gost=no' '--with-openssl=/usr' '--with-gssapi=yes' '--with-libidn2' '--with-json-c' '--with-lmdb=/usr' '--with-gnu-ld' '--with-maxminddb' '--with-atf=no' '--enable-ipv6' '--enable-rrl' '--enable-filter-aaaa' '--disable-native-pkcs11' 'build_alias=x86_64-linux-gnu' 'CFLAGS=-g -O2 -ffile-prefix-map=/build/bind9-IeZYTB/bind9-9.18.1=. -flto=auto -ffat-lto-objects -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fno-strict-aliasing -fno-delete-null-pointer-checks -DNO_VERSION_DATE -DDIG_SIGCHASE' 'LDFLAGS=-Wl,-Bsymbolic-functions -flto=auto -ffat-lto-objects -flto=auto -Wl,-z,relro -Wl,-z,now' 'CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2'
compiled by GCC 11.2.0
compiled with OpenSSL version: OpenSSL 3.0.2 15 Mar 2022
linked to OpenSSL version: OpenSSL 3.0.2 15 Mar 2022
compiled with libuv version: 1.43.0
linked to libuv version: 1.43.0
compiled with libnghttp2 version: 1.43.0
linked to libnghttp2 version: 1.43.0
compiled with libxml2 version: 2.9.13
linked to libxml2 version: 20913
compiled with json-c version: 0.15
linked to json-c version: 0.15
compiled with zlib version: 1.2.11
linked to zlib version: 1.2.11
linked to maxminddb version: 1.5.2
threads support is enabled
default paths:
named configuration: /etc/bind/named.conf
rndc configuration: /etc/bind/rndc.conf
DNSSEC root key: /etc/bind/bind.keys
nsupdate session key: //run/named/session.key
named PID file: //run/named/named.pid
named lock file: //run/named/named.lock
geoip-directory: /usr/share/GeoIP
Steps to reproduce
1. Create a zone with dnssec-policy, and make sure it is working fine (i.e. without update-policy).
2. If possible, re-sign the zone a few times so that the SOA serial number in the signed version of the zone is significantly higher than in the original zone file.
3. Use dig to query the SOA for the zone.
4. Add update-policy to the zone. For example:
   update-policy {
       grant keyname name _acme-challenge.example.com. TXT;
   };
5. Apply the changes by running rndc reload.
6. Use dig to query the SOA for the zone. You should find that the SOA serial has gone backwards compared with step 3.
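The two dig queries in the steps above could be scripted roughly as follows. This is a sketch only: it assumes dig is installed and on the PATH, and the function names and the server address in the usage note are placeholders rather than anything from my setup.

```python
import subprocess


def parse_soa_serial(dig_short_output: str) -> int:
    """Extract the serial from `dig +short <zone> SOA` output, whose
    fields are: mname rname serial refresh retry expire minimum."""
    for line in dig_short_output.splitlines():
        fields = line.split()
        if len(fields) == 7 and fields[2].isdigit():
            return int(fields[2])
    raise ValueError("no SOA record found in dig output")


def query_soa_serial(zone: str, server: str) -> int:
    """Ask `server` for the SOA of `zone` and return its serial."""
    result = subprocess.run(
        ["dig", "+short", f"@{server}", zone, "SOA"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_soa_serial(result.stdout)
```

For example, `query_soa_serial("example.com", "127.0.0.1")` could be run once at step 3 and again at step 6, and the two values compared.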
What is the current bug behavior?
The zone SOA serial number has gone backwards. The original zone file has been replaced by a file containing RRSIG records, etc. The .signed files are no longer used.
What is the expected correct behavior?
I'm not sure what the correct behaviour should be. Some options might be:
- Leave the original zone file intact, and apply dynamic updates to the .signed file?
- Treat this condition (i.e. using dnssec-policy and update-policy together) as an error?
Relevant configuration files
dnssec-policy kskrsa-zskrsa {
    keys {
        ksk lifetime unlimited algorithm rsasha256 2048;
        zsk lifetime unlimited algorithm rsasha256 2048;
    };
    nsec3param iterations 0 optout no salt-length 0;
};
zone "example.com" {
    type primary;
    file "/var/lib/bind/db.example.com";
    dnssec-policy kskrsa-zskrsa;
    notify explicit;
    also-notify { ...; };
    allow-transfer { ...; };
    allow-query { any; };
    #update-policy {
    #    grant certbot name _acme-challenge.example.com. TXT;
    #};
};
Relevant logs and/or screenshots
Sorry I don't know exactly what happened when, so it is difficult to find relevant logs.
Possible fixes
It is likely that what I was trying to do was a silly idea, and in fact I'm going to find another way to achieve this, such as creating a CNAME that references an unsigned dynamic zone.
However, I thought I should report this as a bug because, even if someone does something silly like this, I don't think it should have the impact that it did.
As a final note I feel that this is (at least in part) related to issue #1709 (closed) which discusses inline signing and dynamic zones, although TBH I didn't understand all the discussion.