inline-signing breaking on 9.11.25 (FreeBSD 11.4)
Summary
The journal becomes corrupted, leading to inline-signing failure.
BIND version used
BIND 9.11.25 (Extended Support Version) <id:4a7e9aa>
running on FreeBSD amd64 11.4-RELEASE-p3 FreeBSD 11.4-RELEASE-p3 #0: Tue Sep 1 08:22:33 UTC 2020 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC
built by make with '--localstatedir=/var' '--disable-linux-caps' '--with-randomdev=/dev/random' '--with-libxml2=/usr/local' '--with-readline=-L/usr/local/lib -ledit' '--with-dlopen=yes' '--with-gost=no' '--without-python' '--sysconfdir=/usr/local/etc/namedb' '--with-dlz-filesystem=yes' '--enable-dnstap' '--enable-filter-aaaa' '--disable-fixed-rrset' '--without-geoip2' '--without-gssapi' '--with-libidn2=/usr/local' '--enable-ipv6' '--with-libjson=/usr/local' '--disable-largefile' '--with-lmdb=/usr/local' '--disable-native-pkcs11' '--disable-querytrace' '--enable-rpz-nsdname' '--enable-rpz-nsip' 'STD_CDEFINES=-DDIG_SIGCHASE=1' '--with-openssl=/usr' '--enable-threads' '--with-tuning=default' '--disable-symtable' '--prefix=/usr/local' '--mandir=/usr/local/man' '--infodir=/usr/local/share/info/' '--build=amd64-portbld-freebsd11.4' 'build_alias=amd64-portbld-freebsd11.4' 'CC=cc' 'CFLAGS=-O2 -pipe -DLIBICONV_PLUG -fstack-protector-strong -isystem /usr/local/include -fno-strict-aliasing ' 'LDFLAGS= -fstack-protector-strong ' 'LIBS=-L/usr/local/lib' 'CPPFLAGS=-DLIBICONV_PLUG -isystem /usr/local/include' 'CPP=cpp' 'PKG_CONFIG=pkgconf'
compiled by CLANG FreeBSD Clang 10.0.0 (git@github.com:llvm/llvm-project.git llvmorg-10.0.0-0-gd32170dbd5b)
compiled with OpenSSL version: OpenSSL 1.0.2u-freebsd 20 Dec 2019
linked to OpenSSL version: OpenSSL 1.0.2u-freebsd 20 Dec 2019
compiled with libxml2 version: 2.9.10
linked to libxml2 version: 20910
compiled with libjson-c version: 0.15
linked to libjson-c version: 0.15
compiled with zlib version: 1.2.11
linked to zlib version: 1.2.11
compiled with protobuf-c version: 1.3.2
linked to protobuf-c version: 1.3.2
threads support is enabled
default paths:
named configuration: /usr/local/etc/namedb/named.conf
rndc configuration: /usr/local/etc/namedb/rndc.conf
DNSSEC root key: /usr/local/etc/namedb/bind.keys
nsupdate session key: /var/run/named/session.key
named PID file: /var/run/named/pid
named lock file: /var/run/named/named.lock
FreeBSD 11.4, amd64, running as a VMware VM.
Steps to reproduce
I noticed the issue on my personal machine, where named started returning SERVFAIL for a master zone with inline-signing enabled. None of the slaves could receive the zone either.
Upon hitting this issue, I deleted zone.signed.jnl and zone.signed and ran rndc reload, which did not help. The zone.jnl appeared to be corrupted.
I've backed up the unsigned zone files and timestamps, but will ask staff whether they want me to upload them elsewhere or attach them here. (I'd prefer this ticket be private if that's the case)
This happened on its own and I can find nothing in the logs.
What is the current bug behavior?
The journal gets out of sync with the main zone. named then refuses to update the signed copy and starts returning SERVFAIL, and the slaves refuse to serve the zone.
What is the expected correct behavior?
The journal stays consistent with the zone.
Relevant configuration files
zone "gushi.org" {
type master;
file "/etc/namedb/m/gushi.org./gushi.org.hosts";
key-directory "/etc/namedb/m/gushi.org./keys";
inline-signing yes;
auto-dnssec maintain;
allow-transfer {
key pri-sec.gushi.org.;
redacted
};
notify yes;
also-notify {
redacted
};
};
Contents of zone directory:
drwxrwxrwx 4 root wheel 512 Jan 16 09:00 .
drwxrwxrwx 48 root wheel 36352 Jan 23 18:48 ..
-rw-r--r-- 1 root wheel 14915 Jan 12 20:25 gushi.org.hosts
-rw-r--r-- 1 bind wheel 512 Mar 16 2020 gushi.org.hosts.jbk
-rw-r--r-- 1 bind wheel 7498 Jan 12 20:23 gushi.org.hosts.jnl
-rw-r--r-- 1 bind wheel 52111 Jan 16 09:00 gushi.org.hosts.signed
-rw-r--r-- 1 bind wheel 1118287 Jan 16 08:49 gushi.org.hosts.signed.jnl
drwxr-xr-x 2 bind wheel 512 Jun 6 2019 keys
drwxr-xr-x 3 root wheel 512 Mar 15 2020 old
Relevant logs and/or screenshots
/var/log/messages goes back several days, with no mention of the problem:
Jan 19 21:00:00 prime newsyslog[98735]: logfile turned over due to size>100K
Zone header before and after updating the serial with zsu, followed by the reload attempt:
Before:
$TTL 3600
gushi.org. IN SOA ns.gushi.org. root.gushi.org. (
2021011113 ; serial number
7200 ; refresh
7200 ; retry
604800 ; expire
3600 ; minimum TTL
)
root@prime:/var/named/etc/namedb/m/gushi.org. # zsu -v -v -f gushi.org.hosts
Zone header:
$TTL 3600
gushi.org. IN SOA ns.gushi.org. root.gushi.org. (
2021012400 ; serial number
7200 ; refresh
7200 ; retry
604800 ; expire
3600 ; minimum TTL
)
root@prime:/var/named/etc/namedb/m/gushi.org. # rndc reload gushi.org
rndc: 'reload' failed: out of range
root@prime:/var/named/etc/namedb/m/gushi.org. # grep named /var/log/messages
Jan 24 00:34:04 <daemon.err> prime named[9378]: zone gushi.org/IN (unsigned): journal rollforward failed: journal out of sync with zone
Jan 24 00:34:04 <daemon.err> prime named[9378]: zone gushi.org/IN (unsigned): not loaded due to errors.
Jan 24 00:34:44 <daemon.err> prime named[9378]: zone gushi.org/IN (unsigned): journal rollforward failed: journal out of sync with zone
Jan 24 00:34:44 <daemon.err> prime named[9378]: zone gushi.org/IN (unsigned): not loaded due to errors.
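One possible reading of "journal out of sync with zone" (an assumption, not something confirmed in this report) is that the serial in the zone file is not one the journal can roll forward from. As a rough aid for checking that, the sketch below pulls the serial out of a zone header laid out like the ones above. `zone_serial` is a hypothetical helper name, and it assumes the serial line carries a `; serial` comment, as these headers do.

```shell
#!/bin/sh
# Hypothetical helper: print the SOA serial from a zone file header.
# Assumes the serial sits on its own line followed by a "; serial" comment,
# as in the gushi.org.hosts header shown in this report.
zone_serial() {
    # Print the first field of the first line carrying a "; serial" comment.
    awk '/;[ \t]*serial/ { print $1; exit }' "$1"
}

# Example (not run here):
#   zone_serial gushi.org.hosts
```

The printed value can then be compared by eye against the serials in the output of `named-journalprint gushi.org.hosts.jnl`, which dumps the journal in readable form.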
## Finally, removing the .jnl file fixed things.
root@prime:/var/named/etc/namedb/m/gushi.org. # service named stop
Stopping named.
Waiting for PIDS: 9378.
root@prime:/var/named/etc/namedb/m/gushi.org. # rm gushi.org.hosts.jnl
root@prime:/var/named/etc/namedb/m/gushi.org. # service named start
Starting named.
root@prime:/var/named/etc/namedb/m/gushi.org. # rndc reload gushi.org
zone reload up-to-date
root@prime:/var/named/etc/namedb/m/gushi.org. # dig @127.0.0.1 gushi.org SOA
; <<>> DiG 9.16.9 <<>> @127.0.0.1 gushi.org SOA
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 5029
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 5
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 85d501bb14ab255c6d056848600d31689fb49aefd8933d7e (good)
;; QUESTION SECTION:
;gushi.org. IN SOA
;; ANSWER SECTION:
gushi.org. 3600 IN SOA ns.gushi.org. root.gushi.org. 2021012437 7200 7200 604800 3600
;; AUTHORITY SECTION:
gushi.org. 360 IN NS ns.gushi.org.
gushi.org. 360 IN NS ns2.gushi.org.
;; ADDITIONAL SECTION:
ns.gushi.org. 360 IN A 199.164.166.132
ns2.gushi.org. 360 IN A 149.20.3.253
ns.gushi.org. 360 IN AAAA 2620:137:6000:10::132
ns2.gushi.org. 360 IN AAAA 2001:4f8:1:2000::253
;; Query time: 9 msec
Possible fixes
Deleting the journal is the only thing that seems to help. I kept a copy of the .jnl/.jbk files from this failure mode, but have not uploaded them. I can do so out of band, or mark this ticket private.
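For anyone hitting the same failure, a minimal sketch of the workaround that keeps the journal for later analysis instead of deleting it outright. `set_aside_journal` and the backup directory are hypothetical names chosen for illustration; the `service named stop` / `service named start` steps are the ones used in the transcript above.

```shell
#!/bin/sh
# Hypothetical helper: move a journal aside rather than deleting it.
# Usage: set_aside_journal /path/to/zone.jnl /path/to/backup-dir
set_aside_journal() {
    jnl="$1"
    dest="$2"
    # Refuse to continue if there is no journal to preserve.
    [ -f "$jnl" ] || { echo "no journal at $jnl" >&2; return 1; }
    mkdir -p "$dest" || return 1
    # -p keeps the original timestamps, which may matter for diagnosis.
    cp -p "$jnl" "$dest/$(basename "$jnl").$(date +%Y%m%d%H%M%S)" || return 1
    rm "$jnl"
}

# Intended sequence (not run here), with named stopped while the file moves:
#   service named stop
#   set_aside_journal gushi.org.hosts.jnl ./jnl-backup
#   service named start
```

Copying before removing means a failed copy leaves the original journal in place.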