Using "rndc delzone" during zone transfer may crash named
The following crash occurred during the inline system test:
17-Nov-2020 04:26:19.900 queue_xfrin: zone test-l/IN (unsigned): enter
17-Nov-2020 04:26:19.900 zone test-l/IN (unsigned): Transfer started.
17-Nov-2020 04:26:19.900 zone test-l/IN (unsigned): no database exists yet, requesting AXFR of initial version from 10.53.0.2#12100
17-Nov-2020 04:26:19.904 received control channel command 'delzone test-l'
17-Nov-2020 04:26:19.904 zone test-l scheduled for removal via delzone
17-Nov-2020 04:26:19.904 transfer of 'test-l/IN (unsigned)' from 10.53.0.2#12100: connected using 10.53.0.2#12100
17-Nov-2020 04:26:19.904 deleting zone test-l in view _default via delzone
17-Nov-2020 04:26:19.904 transfer of 'test-l/IN (unsigned)' from 10.53.0.2#12100: sent request data
17-Nov-2020 04:26:19.904 transfer of 'test-l/IN (unsigned)' from 10.53.0.2#12100: received 148 bytes
17-Nov-2020 04:26:19.904 received message from 10.53.0.2#12100
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 19176
;; flags: qr aa; QUESTION: 1, ANSWER: 5, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;test-l. IN AXFR
;; ANSWER SECTION:
test-l. 300 IN SOA ns2.test-l. . 2000042407 20 20 1814400 3600
test-l. 300 IN NS ns3.test-l.
ns2.test-l. 300 IN A 10.53.0.2
ns3.test-l. 300 IN A 10.53.0.3
test-l. 300 IN SOA ns2.test-l. . 2000042407 20 20 1814400 3600
17-Nov-2020 04:26:19.904 transfer of 'test-l/IN (unsigned)' from 10.53.0.2#12100: got nonincremental response
17-Nov-2020 04:26:19.904 zone_shutdown: zone test-l/IN (signed): shutting down
17-Nov-2020 04:26:19.904 zone_shutdown: zone test-l/IN (unsigned): shutting down
17-Nov-2020 04:26:19.904 transfer of 'test-l/IN (unsigned)' from 10.53.0.2#12100: shut down: operation canceled
17-Nov-2020 04:26:19.904 dns_zone_verifydb: zone test-l/IN (unsigned): enter
17-Nov-2020 04:26:19.904 zone test-l/IN (unsigned): zone transfer finished: operation canceled
17-Nov-2020 04:26:19.904 removing journal file
17-Nov-2020 04:26:19.904 zone test-l/IN (unsigned): replacing zone database
17-Nov-2020 04:26:19.904 zone test-l/IN (unsigned): zone transfer finished: success
17-Nov-2020 04:26:19.904 zone test-l/IN (unsigned): transferred serial 2000042407
17-Nov-2020 04:26:19.904 zone_needdump: zone test-l/IN (unsigned): enter
17-Nov-2020 04:26:19.904 zone_settimer: zone test-l/IN (unsigned): enter
17-Nov-2020 04:26:19.904 zone_settimer: zone test-l/IN (unsigned): enter
17-Nov-2020 04:26:19.904 transfer of 'test-l/IN (unsigned)' from 10.53.0.2#12100: Transfer status: success
17-Nov-2020 04:26:19.904 transfer of 'test-l/IN (unsigned)' from 10.53.0.2#12100: Transfer completed: 1 messages, 5 records, 148 bytes, 0.001 secs (148000 bytes/sec) (serial 2000042407)
17-Nov-2020 04:26:19.904 transfer of 'test-l/IN (unsigned)' from 10.53.0.2#12100: freeing transfer context
17-Nov-2020 04:26:19.904 zone.c:16915: INSIST(((__extension__ ({ __auto_type __atomic_load_ptr = ((&(zone)->flags)); __typeof__ (*__atomic_load_ptr) __atomic_load_tmp; __atomic_load (__atomic_load_ptr, &__atomic_load_tmp, (memory_order_relaxed)); __atomic_load_tmp; }) & (DNS_ZONEFLG_REFRESH)) != 0)) failed, back trace
17-Nov-2020 04:26:19.904 /builds/isc-projects/bind9/bin/named/.libs/lt-named() [0x428fcc]
17-Nov-2020 04:26:19.904 /builds/isc-projects/bind9/lib/isc/.libs/libisc.so.1705(isc_assertion_failed+0xa) [0x7ff0d8adfc7a]
17-Nov-2020 04:26:19.904 /builds/isc-projects/bind9/lib/dns/.libs/libdns.so.1706(+0x185385) [0x7ff0d8812385]
17-Nov-2020 04:26:19.904 /builds/isc-projects/bind9/lib/dns/.libs/libdns.so.1706(+0x16f8e1) [0x7ff0d87fc8e1]
17-Nov-2020 04:26:19.904 /builds/isc-projects/bind9/lib/dns/.libs/libdns.so.1706(dns_xfrin_shutdown+0x31) [0x7ff0d87fca61]
17-Nov-2020 04:26:19.904 /builds/isc-projects/bind9/lib/dns/.libs/libdns.so.1706(+0x19111e) [0x7ff0d881e11e]
17-Nov-2020 04:26:19.904 /builds/isc-projects/bind9/lib/isc/.libs/libisc.so.1705(+0x5a879) [0x7ff0d8b00879]
17-Nov-2020 04:26:19.904 /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7ff0d71576ba]
17-Nov-2020 04:26:19.904 /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7ff0d66ad4dd]
17-Nov-2020 04:26:19.904 exiting (due to assertion failure)
It looks like the test-l zone was deleted using rndc delzone while its transfer was in progress.
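One possible reading of the log above (not a confirmed root cause) is that completion of the same transfer was reported twice: first as "operation canceled" when the zone shut down, then as "success" when the AXFR response was processed. The toy model below sketches why a second completion would trip a check like the failed INSIST. It is a minimal sketch under stated assumptions: `ToyZone`, `REFRESH`, `transfer_done`, and `replay` are hypothetical names, and the idea that the flag is cleared on the first completion is an assumption, not something taken from zone.c.

```python
# Toy model only; not BIND code. All names here are hypothetical.
REFRESH = 0x1  # stand-in for DNS_ZONEFLG_REFRESH


class ToyZone:
    def __init__(self):
        self.flags = 0

    def start_transfer(self):
        # Mark the zone as having a transfer in progress.
        self.flags |= REFRESH

    def transfer_done(self, result):
        # Mirrors the shape of the failed INSIST: the flag must still be set
        # when a transfer completion is reported.
        assert self.flags & REFRESH, "REFRESH flag not set"
        # Assumption: the flag is cleared once completion has been handled.
        self.flags &= ~REFRESH
        return result


def replay(results):
    """Start one transfer, then report each completion result in order."""
    zone = ToyZone()
    zone.start_transfer()
    for result in results:
        zone.transfer_done(result)
    return zone
```

In this model, `replay(["success"])` completes normally, while `replay(["operation canceled", "success"])` raises `AssertionError` on the second completion, which matches the order of the two "zone transfer finished" lines in the log.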
While I do not have any proof that this is related to migrating the zone
transfer code to netmgr, this particular INSIST has been in place for
the past 20 years, so it would be quite a coincidence to only start
hitting it now. If it did turn out to be a coincidence, that is, if the
crash is unrelated to netmgr, branches other than ~"v9.17" might be
affected, too, but I am sticking with the netmgr theory for now.