BIND issues (https://gitlab.isc.org/isc-projects/bind9/-/issues, 2023-10-20)

---
https://gitlab.isc.org/isc-projects/bind9/-/issues/4171
**catz crashed in dns_catz_dbupdate_callback** (Ondřej Surý, 2023-10-20)

A [recent job](https://gitlab.isc.org/isc-projects/bind9/-/jobs/3488427) failed with:
```
27-Jun-2023 06:41:51.371 ht.c:364: REQUIRE(((ht) != ((void *)0) && ((const isc__magic_t *)(ht))->magic == ((('H') << 24 | ('T') << 16 | ('a') << 8 | ('b'))))) failed
27-Jun-2023 06:41:51.371 /builds/isc-projects/bind9/lib/isc/.libs/libisc-9.19.15-dev.so(isc_backtrace_log+0x39) [0x7fb70ba2e0d8]
27-Jun-2023 06:41:51.371 /builds/isc-projects/bind9/bin/named/.libs/lt-named() [0x422644]
27-Jun-2023 06:41:51.371 /builds/isc-projects/bind9/lib/isc/.libs/libisc-9.19.15-dev.so(isc_assertion_failed+0xa) [0x7fb70ba2dc9d]
27-Jun-2023 06:41:51.371 /builds/isc-projects/bind9/lib/isc/.libs/libisc-9.19.15-dev.so(isc_ht_find+0x9d) [0x7fb70ba3633f]
27-Jun-2023 06:41:51.371 /builds/isc-projects/bind9/lib/dns/.libs/libdns-9.19.15-dev.so(dns_catz_dbupdate_callback+0x84) [0x7fb70b43fe1c]
27-Jun-2023 06:41:51.371 /builds/isc-projects/bind9/lib/dns/.libs/libdns-9.19.15-dev.so(dns_db_endload+0x46) [0x7fb70b44de68]
27-Jun-2023 06:41:51.371 /builds/isc-projects/bind9/lib/dns/.libs/libdns-9.19.15-dev.so(+0x1aa38b) [0x7fb70b5aa38b]
27-Jun-2023 06:41:51.371 /builds/isc-projects/bind9/lib/dns/.libs/libdns-9.19.15-dev.so(+0x8a3c6) [0x7fb70b48a3c6]
27-Jun-2023 06:41:51.371 /builds/isc-projects/bind9/lib/isc/.libs/libisc-9.19.15-dev.so(+0x56230) [0x7fb70ba56230]
27-Jun-2023 06:41:51.371 /lib64/libuv.so.1(uv__work_done+0xad) [0x7fb709425afd]
27-Jun-2023 06:41:51.371 /lib64/libuv.so.1(+0x132f1) [0x7fb70942a2f1]
27-Jun-2023 06:41:51.371 /lib64/libuv.so.1(uv__io_poll+0x4c5) [0x7fb70943bd15]
27-Jun-2023 06:41:51.371 /lib64/libuv.so.1(uv_run+0x114) [0x7fb70942aa74]
27-Jun-2023 06:41:51.371 /builds/isc-projects/bind9/lib/isc/.libs/libisc-9.19.15-dev.so(+0x4072c) [0x7fb70ba4072c]
27-Jun-2023 06:41:51.371 /builds/isc-projects/bind9/lib/isc/.libs/libisc-9.19.15-dev.so(+0x4fdbb) [0x7fb70ba4fdbb]
27-Jun-2023 06:41:51.371 /builds/isc-projects/bind9/lib/isc/.libs/libisc-9.19.15-dev.so(+0x4fde4) [0x7fb70ba4fde4]
27-Jun-2023 06:41:51.371 /lib64/libpthread.so.0(+0x81da) [0x7fb7082f81da]
27-Jun-2023 06:41:51.371 /lib64/libc.so.6(clone+0x43) [0x7fb7075cbe73]
27-Jun-2023 06:41:51.371 exiting (due to assertion failure)
```
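The REQUIRE that fired is ISC's magic-number validity check on the hash table passed to `isc_ht_find()`. For reference, here is a minimal self-contained sketch of that pattern, with simplified stand-ins rather than the real `ht.c` code:

```c
#include <assert.h>

/* Hypothetical, simplified sketch of the ISC magic-number pattern
 * behind the REQUIRE above (not the real isc_ht code): every live
 * object carries a magic word, and entry points assert it.  A crash
 * on this check usually means the object was never initialized, was
 * already destroyed, or the pointer is garbage. */
#define ISC_MAGIC(a, b, c, d) ((a) << 24 | (b) << 16 | (c) << 8 | (d))
#define HT_MAGIC	      ISC_MAGIC('H', 'T', 'a', 'b')
#define VALID_HT(ht)	      ((ht) != NULL && (ht)->magic == HT_MAGIC)

typedef struct ht {
	unsigned int magic;
	/* ... table contents ... */
} ht_t;

static void
ht_find(const ht_t *ht) {
	assert(VALID_HT(ht)); /* stands in for REQUIRE() in the log */
	/* ... lookup would happen here ... */
}

int
main(void) {
	ht_t table = { .magic = HT_MAGIC };

	ht_find(&table); /* passes the magic check */

	table.magic = 0; /* simulate an already-destroyed table */
	ht_find(&table); /* fails the check, like the crash above */
	return 0;
}
```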
Milestone: July 2023 (9.18.17, 9.18.17-S1, 9.19.15); assignee: Aram Sargsyan

---
https://gitlab.isc.org/isc-projects/bind9/-/issues/4106
**ThreadSanitizer: lock-order-inversion (potential deadlock) in __interceptor_pthread_mutex_lock in resolver.c (9.18)** (Aram Sargsyan, 2023-06-06)

The added code coverage in !7977 has uncovered a preexisting [issue](https://gitlab.isc.org/isc-projects/bind9/-/jobs/3432650) reported by TSAN. Here's the report (line numbers correspond to commit 8a9ce5a7):
```
==================
WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) (pid=107066)
Cycle in lock order graph: M0 (0x7b9400028a28) => M1 (0x7b4800015490) => M0
Mutex M1 acquired here while holding mutex M0 in thread T1:
#0 pthread_mutex_lock <null> (named+0x67a7b) (BuildId: 0841b8f64e42f843435425f538d0d69cc808e37a)
#1 fctx_sendevents /builds/isc-projects/bind9/lib/dns/resolver.c:1801:3 (libdns-9.18.16-dev.so+0x1867fe) (BuildId: 184ff5c2c676a86ff8d2823dffbae7684b323bb6)
#2 fctx__done_detach /builds/isc-projects/bind9/lib/dns/resolver.c:1893:2 (libdns-9.18.16-dev.so+0x183e7b) (BuildId: 184ff5c2c676a86ff8d2823dffbae7684b323bb6)
#3 rctx_done /builds/isc-projects/bind9/lib/dns/resolver.c (libdns-9.18.16-dev.so+0x1900cd) (BuildId: 184ff5c2c676a86ff8d2823dffbae7684b323bb6)
#4 resquery_response /builds/isc-projects/bind9/lib/dns/resolver.c (libdns-9.18.16-dev.so+0x18e1fb) (BuildId: 184ff5c2c676a86ff8d2823dffbae7684b323bb6)
#5 udp_recv /builds/isc-projects/bind9/lib/dns/dispatch.c:638:3 (libdns-9.18.16-dev.so+0x67bf7) (BuildId: 184ff5c2c676a86ff8d2823dffbae7684b323bb6)
#6 isc__nm_async_readcb /builds/isc-projects/bind9/lib/isc/netmgr/netmgr.c:2885:2 (libisc-9.18.16-dev.so+0x2c913) (BuildId: 4692f4fd1f4b6179023dbe9c8b9ecd8e2ed763f4)
#7 isc__nm_readcb /builds/isc-projects/bind9/lib/isc/netmgr/netmgr.c:2858:3 (libisc-9.18.16-dev.so+0x2a532) (BuildId: 4692f4fd1f4b6179023dbe9c8b9ecd8e2ed763f4)
#8 udp_recv_cb /builds/isc-projects/bind9/lib/isc/netmgr/udp.c:650:2 (libisc-9.18.16-dev.so+0x4265e) (BuildId: 4692f4fd1f4b6179023dbe9c8b9ecd8e2ed763f4)
#9 isc__nm_udp_read_cb /builds/isc-projects/bind9/lib/isc/netmgr/udp.c:1057:2 (libisc-9.18.16-dev.so+0x43fbb) (BuildId: 4692f4fd1f4b6179023dbe9c8b9ecd8e2ed763f4)
#10 uv__udp_recvmsg /usr/src/libuv-v1.44.1/src/unix/udp.c:303:7 (libuv.so.1+0x264ef) (BuildId: 120c450d14885aa5308bc95c4ea77de2c2b1cc36)
#11 isc__trampoline_run /builds/isc-projects/bind9/lib/isc/trampoline.c:189:11 (libisc-9.18.16-dev.so+0x81127) (BuildId: 4692f4fd1f4b6179023dbe9c8b9ecd8e2ed763f4)
Mutex M0 previously acquired by the same thread here:
#0 pthread_mutex_lock <null> (named+0x67a7b) (BuildId: 0841b8f64e42f843435425f538d0d69cc808e37a)
#1 fctx__done_detach /builds/isc-projects/bind9/lib/dns/resolver.c:1891:2 (libdns-9.18.16-dev.so+0x183e4d) (BuildId: 184ff5c2c676a86ff8d2823dffbae7684b323bb6)
#2 rctx_done /builds/isc-projects/bind9/lib/dns/resolver.c (libdns-9.18.16-dev.so+0x1900cd) (BuildId: 184ff5c2c676a86ff8d2823dffbae7684b323bb6)
#3 resquery_response /builds/isc-projects/bind9/lib/dns/resolver.c (libdns-9.18.16-dev.so+0x18e1fb) (BuildId: 184ff5c2c676a86ff8d2823dffbae7684b323bb6)
#4 udp_recv /builds/isc-projects/bind9/lib/dns/dispatch.c:638:3 (libdns-9.18.16-dev.so+0x67bf7) (BuildId: 184ff5c2c676a86ff8d2823dffbae7684b323bb6)
#5 isc__nm_async_readcb /builds/isc-projects/bind9/lib/isc/netmgr/netmgr.c:2885:2 (libisc-9.18.16-dev.so+0x2c913) (BuildId: 4692f4fd1f4b6179023dbe9c8b9ecd8e2ed763f4)
#6 isc__nm_readcb /builds/isc-projects/bind9/lib/isc/netmgr/netmgr.c:2858:3 (libisc-9.18.16-dev.so+0x2a532) (BuildId: 4692f4fd1f4b6179023dbe9c8b9ecd8e2ed763f4)
#7 udp_recv_cb /builds/isc-projects/bind9/lib/isc/netmgr/udp.c:650:2 (libisc-9.18.16-dev.so+0x4265e) (BuildId: 4692f4fd1f4b6179023dbe9c8b9ecd8e2ed763f4)
#8 isc__nm_udp_read_cb /builds/isc-projects/bind9/lib/isc/netmgr/udp.c:1057:2 (libisc-9.18.16-dev.so+0x43fbb) (BuildId: 4692f4fd1f4b6179023dbe9c8b9ecd8e2ed763f4)
#9 uv__udp_recvmsg /usr/src/libuv-v1.44.1/src/unix/udp.c:303:7 (libuv.so.1+0x264ef) (BuildId: 120c450d14885aa5308bc95c4ea77de2c2b1cc36)
#10 isc__trampoline_run /builds/isc-projects/bind9/lib/isc/trampoline.c:189:11 (libisc-9.18.16-dev.so+0x81127) (BuildId: 4692f4fd1f4b6179023dbe9c8b9ecd8e2ed763f4)
Mutex M0 acquired here while holding mutex M1 in thread T1:
#0 pthread_mutex_lock <null> (named+0x67a7b) (BuildId: 0841b8f64e42f843435425f538d0d69cc808e37a)
#1 dns_resolver_shutdown /builds/isc-projects/bind9/lib/dns/resolver.c:10661:4 (libdns-9.18.16-dev.so+0x17d743) (BuildId: 184ff5c2c676a86ff8d2823dffbae7684b323bb6)
#2 view_flushanddetach /builds/isc-projects/bind9/lib/dns/view.c:650:4 (libdns-9.18.16-dev.so+0x1d7b5d) (BuildId: 184ff5c2c676a86ff8d2823dffbae7684b323bb6)
#3 dns_view_flushanddetach /builds/isc-projects/bind9/lib/dns/view.c:711:2 (libdns-9.18.16-dev.so+0x1d79de) (BuildId: 184ff5c2c676a86ff8d2823dffbae7684b323bb6)
#4 shutdown_server /builds/isc-projects/bind9/bin/named/server.c:10153:4 (named+0xfe7c4) (BuildId: 0841b8f64e42f843435425f538d0d69cc808e37a)
#5 task_run /builds/isc-projects/bind9/lib/isc/task.c:815:5 (libisc-9.18.16-dev.so+0x75935) (BuildId: 4692f4fd1f4b6179023dbe9c8b9ecd8e2ed763f4)
#6 isc_task_run /builds/isc-projects/bind9/lib/isc/task.c:896:10 (libisc-9.18.16-dev.so+0x75935)
#7 isc__nm_async_task /builds/isc-projects/bind9/lib/isc/netmgr/netmgr.c:848:11 (libisc-9.18.16-dev.so+0x2ebe7) (BuildId: 4692f4fd1f4b6179023dbe9c8b9ecd8e2ed763f4)
#8 process_netievent /builds/isc-projects/bind9/lib/isc/netmgr/netmgr.c (libisc-9.18.16-dev.so+0x26adc) (BuildId: 4692f4fd1f4b6179023dbe9c8b9ecd8e2ed763f4)
#9 process_queue /builds/isc-projects/bind9/lib/isc/netmgr/netmgr.c:1013:8 (libisc-9.18.16-dev.so+0x2e9f7) (BuildId: 4692f4fd1f4b6179023dbe9c8b9ecd8e2ed763f4)
#10 process_all_queues /builds/isc-projects/bind9/lib/isc/netmgr/netmgr.c:767:25 (libisc-9.18.16-dev.so+0x20306) (BuildId: 4692f4fd1f4b6179023dbe9c8b9ecd8e2ed763f4)
#11 async_cb /builds/isc-projects/bind9/lib/isc/netmgr/netmgr.c:796:6 (libisc-9.18.16-dev.so+0x20306)
#12 uv__async_io /usr/src/libuv-v1.44.1/src/unix/async.c:163:5 (libuv.so.1+0x10e92) (BuildId: 120c450d14885aa5308bc95c4ea77de2c2b1cc36)
#13 isc__trampoline_run /builds/isc-projects/bind9/lib/isc/trampoline.c:189:11 (libisc-9.18.16-dev.so+0x81127) (BuildId: 4692f4fd1f4b6179023dbe9c8b9ecd8e2ed763f4)
Mutex M1 previously acquired by the same thread here:
#0 pthread_mutex_lock <null> (named+0x67a7b) (BuildId: 0841b8f64e42f843435425f538d0d69cc808e37a)
#1 dns_resolver_shutdown /builds/isc-projects/bind9/lib/dns/resolver.c:10656:2 (libdns-9.18.16-dev.so+0x17d676) (BuildId: 184ff5c2c676a86ff8d2823dffbae7684b323bb6)
#2 view_flushanddetach /builds/isc-projects/bind9/lib/dns/view.c:650:4 (libdns-9.18.16-dev.so+0x1d7b5d) (BuildId: 184ff5c2c676a86ff8d2823dffbae7684b323bb6)
#3 dns_view_flushanddetach /builds/isc-projects/bind9/lib/dns/view.c:711:2 (libdns-9.18.16-dev.so+0x1d79de) (BuildId: 184ff5c2c676a86ff8d2823dffbae7684b323bb6)
#4 shutdown_server /builds/isc-projects/bind9/bin/named/server.c:10153:4 (named+0xfe7c4) (BuildId: 0841b8f64e42f843435425f538d0d69cc808e37a)
#5 task_run /builds/isc-projects/bind9/lib/isc/task.c:815:5 (libisc-9.18.16-dev.so+0x75935) (BuildId: 4692f4fd1f4b6179023dbe9c8b9ecd8e2ed763f4)
#6 isc_task_run /builds/isc-projects/bind9/lib/isc/task.c:896:10 (libisc-9.18.16-dev.so+0x75935)
#7 isc__nm_async_task /builds/isc-projects/bind9/lib/isc/netmgr/netmgr.c:848:11 (libisc-9.18.16-dev.so+0x2ebe7) (BuildId: 4692f4fd1f4b6179023dbe9c8b9ecd8e2ed763f4)
#8 process_netievent /builds/isc-projects/bind9/lib/isc/netmgr/netmgr.c (libisc-9.18.16-dev.so+0x26adc) (BuildId: 4692f4fd1f4b6179023dbe9c8b9ecd8e2ed763f4)
#9 process_queue /builds/isc-projects/bind9/lib/isc/netmgr/netmgr.c:1013:8 (libisc-9.18.16-dev.so+0x2e9f7) (BuildId: 4692f4fd1f4b6179023dbe9c8b9ecd8e2ed763f4)
#10 process_all_queues /builds/isc-projects/bind9/lib/isc/netmgr/netmgr.c:767:25 (libisc-9.18.16-dev.so+0x20306) (BuildId: 4692f4fd1f4b6179023dbe9c8b9ecd8e2ed763f4)
#11 async_cb /builds/isc-projects/bind9/lib/isc/netmgr/netmgr.c:796:6 (libisc-9.18.16-dev.so+0x20306)
#12 uv__async_io /usr/src/libuv-v1.44.1/src/unix/async.c:163:5 (libuv.so.1+0x10e92) (BuildId: 120c450d14885aa5308bc95c4ea77de2c2b1cc36)
#13 isc__trampoline_run /builds/isc-projects/bind9/lib/isc/trampoline.c:189:11 (libisc-9.18.16-dev.so+0x81127) (BuildId: 4692f4fd1f4b6179023dbe9c8b9ecd8e2ed763f4)
Thread T1 'isc-net-0000' (tid=107081, running) created by main thread at:
#0 pthread_create <null> (named+0x65deb) (BuildId: 0841b8f64e42f843435425f538d0d69cc808e37a)
#1 isc_thread_create /builds/isc-projects/bind9/lib/isc/thread.c:73:8 (libisc-9.18.16-dev.so+0x7834a) (BuildId: 4692f4fd1f4b6179023dbe9c8b9ecd8e2ed763f4)
#2 isc__netmgr_create /builds/isc-projects/bind9/lib/isc/netmgr/netmgr.c:311:3 (libisc-9.18.16-dev.so+0x1fff0) (BuildId: 4692f4fd1f4b6179023dbe9c8b9ecd8e2ed763f4)
#3 isc_managers_create /builds/isc-projects/bind9/lib/isc/managers.c:31:2 (libisc-9.18.16-dev.so+0x5d40c) (BuildId: 4692f4fd1f4b6179023dbe9c8b9ecd8e2ed763f4)
#4 create_managers /builds/isc-projects/bind9/bin/named/main.c:1038:11 (named+0xf9fb8) (BuildId: 0841b8f64e42f843435425f538d0d69cc808e37a)
#5 setup /builds/isc-projects/bind9/bin/named/main.c:1309:11 (named+0xf9fb8)
#6 main /builds/isc-projects/bind9/bin/named/main.c:1587:2 (named+0xf9fb8)
SUMMARY: ThreadSanitizer: lock-order-inversion (potential deadlock) (/builds/isc-projects/bind9/bin/named/.libs/named+0x67a7b) (BuildId: 0841b8f64e42f843435425f538d0d69cc808e37a) in __interceptor_pthread_mutex_lock
==================
ThreadSanitizer: reported 1 warnings
```
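For reference, the pattern TSAN flags here can be reproduced with a minimal self-contained program (hypothetical names, not BIND code): one code path acquires M0 then M1, while the other acquires M1 then M0. Built with `cc -fsanitize=thread -pthread`, it produces the same "Cycle in lock order graph" warning:

```c
#include <pthread.h>

/* Two mutexes acquired in opposite orders on two code paths; TSAN
 * builds a lock-order graph from the observed acquisitions and
 * reports the M0 => M1 => M0 cycle even if no deadlock happens on
 * this particular run. */
static pthread_mutex_t m0 = PTHREAD_MUTEX_INITIALIZER; /* "bucket" lock */
static pthread_mutex_t m1 = PTHREAD_MUTEX_INITIALIZER; /* "fctx" lock */

static void *
path_a(void *arg) {
	(void)arg;
	pthread_mutex_lock(&m0); /* like fctx__done_detach() above */
	pthread_mutex_lock(&m1); /* like fctx_sendevents() above */
	pthread_mutex_unlock(&m1);
	pthread_mutex_unlock(&m0);
	return NULL;
}

static void *
path_b(void *arg) {
	(void)arg;
	pthread_mutex_lock(&m1); /* like dns_resolver_shutdown() above */
	pthread_mutex_lock(&m0); /* ...taken in the opposite order */
	pthread_mutex_unlock(&m0);
	pthread_mutex_unlock(&m1);
	return NULL;
}

int
main(void) {
	pthread_t t1, t2;

	pthread_create(&t1, NULL, path_a, NULL);
	pthread_create(&t2, NULL, path_b, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return 0;
}
```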
If possible, this needs to be addressed in the June releases, so that the backport MR !7977 can pass the CI and get merged.

Milestone: June 2023 (9.16.42, 9.16.42-S1, 9.18.16, 9.18.16-S1, 9.19.14); assignee: Aram Sargsyan

---
https://gitlab.isc.org/isc-projects/bind9/-/issues/3874
**The !7530 broke mkeys test on Windows** (Ondřej Surý, 2023-02-27)

(It would also be OK to disable this test on Windows.)
```
S:mkeys:2023-02-15T03:17:19-0800
T:mkeys:1:A
A:mkeys:System test mkeys
I:mkeys:PORTRANGE:10000 - 10099
I:mkeys:starting servers
I:mkeys:check for signed record (1)
I:mkeys:check positive validation with valid trust anchor (2)
I:mkeys:check positive validation using delv (3)
I:mkeys:check for failed validation due to wrong key in managed-keys (4)
I:mkeys:check new trust anchor can be added (5)
I:mkeys:ns2 refreshing managed keys for '_default'
I:mkeys:check new trust anchor can't be added with bad initial key (6)
I:mkeys:ns3 refreshing managed keys for '_default'
I:mkeys:remove untrusted standby key, check timer restarts (7)
I:mkeys:ns2 refreshing managed keys for '_default'
I:mkeys:restore untrusted standby key, revoke original key (8)
I:mkeys:ns2 refreshing managed keys for '_default'
I:mkeys:refresh managed-keys, ensure same result (9)
I:mkeys:ns2 refreshing managed keys for '_default'
I:mkeys:restore revoked key, ensure same result (10)
I:mkeys:ns2 refreshing managed keys for '_default'
I:mkeys:reinitialize trust anchors, add second key to bind.keys
I:mkeys:check that no key from bind.keys is marked as an initializing key (11)
I:mkeys:reinitialize trust anchors, revert to one key in bind.keys
I:mkeys:check that standby key is now trusted (12)
I:mkeys:revoke original key, add new standby (13)
I:mkeys:ns2 refreshing managed keys for '_default'
I:mkeys:revoke standby before it is trusted (14)
I:mkeys:ns2 refreshing managed keys for '_default'
I:mkeys:ns2 refreshing managed keys for '_default'
I:mkeys:wait 20 seconds for key add/remove holddowns to expire (15)
I:mkeys:ns2 refreshing managed keys for '_default'
I:mkeys:revoke all keys, confirm roll to insecure (16)
I:mkeys:ns2 refreshing managed keys for '_default'
I:mkeys:check for insecure response (17)
I:mkeys:ns2 refreshing managed keys for '_default'
I:mkeys:reset the root server (18)
I:mkeys:ns1 server reload successful
I:mkeys:reinitialize trust anchors
I:mkeys:check positive validation (19)
I:mkeys:revoke key with bad signature, check revocation is ignored (20)
I:mkeys:ns1 server reload successful
I:mkeys:ns2 refreshing managed keys for '_default'
I:mkeys:check validation fails with bad DNSKEY rrset (21)
I:mkeys:restore DNSKEY rrset, check validation succeeds again (22)
I:mkeys:ns1 server reload successful
I:mkeys:reset the root server with no keys, check for minimal update (23)
I:mkeys:ns2 refreshing managed keys for '_default'
I:mkeys:ns2 refreshing managed keys for '_default'
I:mkeys:reset the root server with no signatures, check for minimal update (24)
I:mkeys:ns2 refreshing managed keys for '_default'
I:mkeys:ns2 refreshing managed keys for '_default'
I:mkeys:restore root server, check validation succeeds again (25)
I:mkeys:ns1 server reload successful
I:mkeys:ns2 refreshing managed keys for '_default'
I:mkeys:check that trust-anchor-telemetry queries are logged (26)
I:mkeys:check that trust-anchor-telemetry queries are received (27)
I:mkeys:check 'rndc-managed-keys destroy' (28)
I:mkeys:ns2 destroying managed-keys database for '_default'
I:mkeys:check that trust-anchor-telemetry queries contain the correct key (29)
I:mkeys:check initialization fails if managed-keys can't be created (30)
I:mkeys:check failure to contact root servers does not prevent key refreshes after restart (31)
I:mkeys:exceeded time limit waiting for 'Returned from key fetch in keyfetch_done() for 'sub.foo':' in ns5/named.run
I:mkeys:failed
I:mkeys:check 'rndc managed-keys' and islands of trust root unreachable (32)
I:mkeys:failed
I:mkeys:check key refreshes are resumed after root servers become available (33)
I:mkeys:exceeded time limit waiting for 'Returned from key fetch in keyfetch_done() for 'sub.foo': success' in ns5/named.run
I:mkeys:exceeded time limit waiting for 'Returned from key fetch in keyfetch_done() for 'sub.foo': success' in ns5/named.run
I:mkeys:failed
I:mkeys:reinitialize trust anchors, add unsupported algorithm (34)
I:mkeys:ignoring unsupported algorithm in managed-keys (35)
I:mkeys:introduce unsupported algorithm rollover in authoritative zone (36)
I:mkeys:ignoring unsupported algorithm in rollover (37)
I:mkeys:ns1 server reload successful
I:mkeys:ns6 refreshing managed keys for '_default'
I:mkeys:check 'rndc managed-keys' and views (38)
I:mkeys:check 'rndc managed-keys' and islands of trust now that root is reachable (39)
I:mkeys:failed
I:mkeys:exit status: 4
I:mkeys:stopping servers
R:mkeys:FAIL
E:mkeys:2023-02-15T03:20:09-0800
```
See:
* https://gitlab.isc.org/isc-projects/bind9/-/jobs/3159567
* https://gitlab.isc.org/isc-projects/bind9/-/jobs/3159566

Milestone: March 2023 (9.16.39, 9.16.39-S1, 9.18.13, 9.18.13-S1, 9.19.11); assignee: Mark Andrews

---
https://gitlab.isc.org/isc-projects/bind9/-/issues/3801
**dns_view is detached from deleted zone too late** (Ondřej Surý, 2023-10-30)

When a zone is removed from the configuration, it keeps a reference to `.view` (and possibly `.prev_view`) until the zone is deleted. The removal can take a long time, and if there's a constant stream of `rndc reconfig` commands adding and removing zones, the otherwise dead views can be kept in memory for a very long time, leading to memory bloat.
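As a hedged illustration of the lifetime problem (simplified hypothetical types, not BIND's actual `dns_zone`/`dns_view` code):

```c
#include <stdio.h>
#include <stdlib.h>

/* The zone only drops its view reference when the zone object itself
 * is destroyed, so a view removed by "rndc reconfig" stays allocated
 * for as long as any deleted zone still lingers. */
typedef struct view {
	unsigned int references;
} view_t;

typedef struct zone {
	view_t *view; /* held until the zone is finally destroyed */
} zone_t;

static void
view_detach(view_t **viewp) {
	view_t *view = *viewp;

	*viewp = NULL;
	if (--view->references == 0) {
		printf("view freed\n");
		free(view);
	}
}

int
main(void) {
	view_t *view = calloc(1, sizeof(*view));
	zone_t zone = { .view = view };

	view->references = 1;

	/* The zone is removed from the configuration here, but nothing
	 * detaches zone.view yet: the dead view survives until the slow
	 * zone teardown completes.  Detaching at removal time, as the
	 * issue title suggests, frees it immediately: */
	view_detach(&zone.view);
	return 0;
}
```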
This symptom might also be encountered on a server with a Catalog Zone with a high rate of change.

Milestone: February 2023 (9.16.38, 9.16.38-S1, 9.18.12, 9.18.12-S1, 9.19.10); assignee: Ondřej Surý

---
https://gitlab.isc.org/isc-projects/bind9/-/issues/3771
**Segmentation fault in the dupsigs system test** (Ondřej Surý, 2023-01-05)

From https://gitlab.isc.org/isc-projects/bind9/-/jobs/3040116:
```
D:dupsigs:Core was generated by `/builds/isc-projects/bind9/bin/named/.libs/named -D dupsigs-ns1 -X named.lock -'.
D:dupsigs:Program terminated with signal SIGABRT, Aborted.
D:dupsigs:#0 futex_wait_cancelable (private=0, expected=0, futex_word=0x7f8ba4c0a4c0) at ../sysdeps/nptl/futex-internal.h:186
D:dupsigs:[Current thread is 1 (Thread 0x7f8bc717e140 (LWP 1506))]
D:dupsigs:#0 futex_wait_cancelable (private=0, expected=0, futex_word=0x7f8ba4c0a4c0) at ../sysdeps/nptl/futex-internal.h:186
D:dupsigs:#1 __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7f8ba4c0a450, cond=0x7f8ba4c0a498) at pthread_cond_wait.c:508
D:dupsigs:#2 __pthread_cond_wait (cond=0x7f8ba4c0a498, mutex=0x7f8ba4c0a450) at pthread_cond_wait.c:638
D:dupsigs:#3 0x00007f8bca7d073b in rwlock_lock (rwl=0x7f8ba4c0a448, type=isc_rwlocktype_read) at rwlock.c:222
D:dupsigs:#4 0x00007f8bca7d116a in isc__rwlock_lock (rwl=0x7f8ba4c0a448, type=isc_rwlocktype_read) at rwlock.c:331
D:dupsigs:#5 0x00007f8bca40560a in findnodeintree (rbtdb=0x7f8ba4c0a300, tree=0x7f8ba4c1f190, name=0x7fffe325a5f0, create=false, nodep=0x7fffe325a8a8) at rbtdb.c:2861
D:dupsigs:#6 0x00007f8bca405a81 in findnode (db=0x7f8ba4c0a300, name=0x7fffe325a5f0, create=false, nodep=0x7fffe325a8a8) at rbtdb.c:2925
D:dupsigs:#7 0x00007f8bca348f65 in dns_db_findnode (db=0x7f8ba4c0a300, name=0x7fffe325a5f0, create=false, nodep=0x7fffe325a8a8) at db.c:434
D:dupsigs:#8 0x00007f8bca382b5a in dns_db_createsoatuple (db=0x7f8ba4c0a300, ver=0x7f8ba4c26180, mctx=0x7f8bc671c800, op=DNS_DIFFOP_EXISTS, tp=0x7fffe325b648) at journal.c:141
D:dupsigs:#9 0x00007f8bca268309 in ns_xfr_start (client=0x7f8bc619d400, reqtype=252) at xfrout.c:968
D:dupsigs:#10 0x00007f8bca255186 in ns_query_start (client=0x7f8bc619d400, handle=0x7f8bc6c9bfc0) at query.c:11979
D:dupsigs:#11 0x00007f8bca220e82 in ns__client_request (handle=0x7f8bc6c9bfc0, eresult=ISC_R_SUCCESS, region=0x7fffe325c320, arg=0x7f8bc6c94380) at client.c:2239
D:dupsigs:#12 0x00007f8bca76e703 in streamdns_on_complete_dnsmessage (dnsasm=0x7f8bc6c3e300, region=0x7fffe325c320, sock=0x7f8bc6deec00, transphandle=0x7f8bc6c9bc40) at netmgr/streamdns.c:144
D:dupsigs:#13 0x00007f8bca76e8c8 in streamdns_on_dnsmessage_data_cb (dnsasm=0x7f8bc6c3e300, result=ISC_R_SUCCESS, region=0x7fffe325c320, cbarg=0x7f8bc6deec00, userarg=0x7f8bc6c9bc40) at netmgr/streamdns.c:203
D:dupsigs:#14 0x00007f8bca76e1a3 in isc__dnsstream_assembler_handle_message (dnsasm=0x7f8bc6c3e300, userarg=0x7f8bc6c9bc40) at ./include/isc/dnsstream.h:338
D:dupsigs:#15 0x00007f8bca76e38b in isc_dnsstream_assembler_incoming (dnsasm=0x7f8bc6c3e300, userarg=0x7f8bc6c9bc40, buf=0x7f8bc6580800, buf_size=55) at ./include/isc/dnsstream.h:367
D:dupsigs:#16 0x00007f8bca76e9b3 in streamdns_handle_incoming_data (sock=0x7f8bc6deec00, transphandle=0x7f8bc6c9bc40, data=0x7f8bc6580800, len=55) at netmgr/streamdns.c:239
D:dupsigs:#17 0x00007f8bca76ff05 in streamdns_readcb (handle=0x7f8bc6c9bc40, result=ISC_R_SUCCESS, region=0x7fffe325c440, cbarg=0x7f8bc6deec00) at netmgr/streamdns.c:522
D:dupsigs:#18 0x00007f8bca769eb4 in isc__nm_async_readcb (worker=0x0, ev0=0x7fffe325c4c0) at netmgr/netmgr.c:2077
D:dupsigs:#19 0x00007f8bca769c95 in isc__nm_readcb (sock=0x7f8bc6dee200, uvreq=0x7f8bc67cf000, eresult=ISC_R_SUCCESS, async=false) at netmgr/netmgr.c:2050
D:dupsigs:#20 0x00007f8bca7763cc in isc__nm_tcp_read_cb (stream=0x7f8bc6dee7e0, nread=55, buf=0x7fffe325c730) at netmgr/tcp.c:823
D:dupsigs:#21 0x00007f8bc9e0b5e9 in uv__read (stream=0x7f8bc6dee7e0) at /usr/src/libuv-v1.44.1/src/unix/stream.c:1247
D:dupsigs:#22 0x00007f8bc9e0b8a8 in uv__stream_io (loop=0x7f8bc6ca2da0, w=0x7f8bc6dee868, events=1) at /usr/src/libuv-v1.44.1/src/unix/stream.c:1315
D:dupsigs:#23 0x00007f8bc9e15bc3 in uv__io_poll (loop=0x7f8bc6ca2da0, timeout=30000) at /usr/src/libuv-v1.44.1/src/unix/epoll.c:374
D:dupsigs:#24 0x00007f8bc9dfa8ae in uv_run (loop=0x7f8bc6ca2da0, mode=UV_RUN_DEFAULT) at /usr/src/libuv-v1.44.1/src/unix/core.c:391
D:dupsigs:#25 0x00007f8bca7a3308 in loop_run (loop=0x7f8bc6ca2d80) at loop.c:270
D:dupsigs:#26 0x00007f8bca7a3555 in loop_thread (arg=0x7f8bc6ca2d80) at loop.c:297
D:dupsigs:#27 0x00007f8bca7a4956 in isc_loopmgr_run (loopmgr=0x7f8bc6c233c0) at loop.c:481
D:dupsigs:#28 0x000055d925bbfe4f in main (argc=18, argv=0x7fffe325fb18) at main.c:1518
```

---
https://gitlab.isc.org/isc-projects/bind9/-/issues/3752
**ADB quota might not be decremented, part 2** (Aram Sargsyan, 2023-01-03)

The !6893 merge request, targeting the original issue #3598, has fixed some cases of ADB quota not being decremented back, but the affected customer continues experiencing the same issue.
I believe I have found another code path which can lead to the described behavior.
Normally, when a `resquery_t` object is created in `fctx_query()`, we [call](https://gitlab.isc.org/isc-projects/bind9/-/blob/df0584b5/lib/dns/resolver.c#L2325) `dns_adb_beginudpfetch()` (which increases the ADB quota) only **if it's a UDP query**.
Then, in `fctx_cancelquery()`, we [call](https://gitlab.isc.org/isc-projects/bind9/-/blob/df0584b5/lib/dns/resolver.c#L1435) `dns_adb_endudpfetch()` to decrease the ADB quota back, again only **if it's a UDP query**.
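To make the pairing concrete, here is a minimal self-contained sketch (simplified stand-ins, not the actual resolver code) of how the begin/end calls can get out of sync:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the quota pairing described above. */
struct adb {
	int udp_quota_used;
};
struct query {
	bool tcp;
};

static void
begin_fetch(struct adb *adb, struct query *q) {
	if (!q->tcp) {
		adb->udp_quota_used++; /* like dns_adb_beginudpfetch() */
	}
}

static void
cancel_query(struct adb *adb, struct query *q) {
	if (!q->tcp) {
		adb->udp_quota_used--; /* like dns_adb_endudpfetch() */
	}
}

int
main(void) {
	struct adb adb = { 0 };
	struct query q = { .tcp = false };

	begin_fetch(&adb, &q);	/* UDP query: quota goes to 1 */
	q.tcp = true;		/* the query is retried over TCP */
	cancel_query(&adb, &q); /* flag test now fails: no decrement */

	printf("udp_quota_used = %d (leaked)\n", adb.udp_quota_used);
	return 0;
}
```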
The problem, I think, is that a UDP query can [become](https://gitlab.isc.org/isc-projects/bind9/-/blob/df0584b5/lib/dns/resolver.c#L2663) a TCP query, preventing the quota from being adjusted back in `fctx_cancelquery()` later.

Milestone: January 2023 (9.16.37, 9.16.37-S1, 9.18.11, 9.18.11-S1, 9.19.9); assignee: Aram Sargsyan

---
https://gitlab.isc.org/isc-projects/bind9/-/issues/3744
**small updates to large catalog zones cause CPU consumption spikes** (Petr Špaček <pspacek@isc.org>, 2023-03-01)

### Summary
A single RR modification to the CATZ causes CPU spikes.
Here is a latency comparison of a 1-minute test run at 100 k QPS (about 1/4 of capacity), removing and then adding 1 zone with 20 seconds in between. The green line shows the same configuration without the addition/deletion:
![latency.svg](/uploads/8e8bef18b96e3f39f27c49ae6905db1c/latency.svg)
Timeout = 10 seconds, axes are log-log, Y in usec.
Test environment:
- A beefy AWS VM type - I've used c5n.4xlarge, 16 CPU threads.
- One CATZ with 100 k small zones.
Result: One update -> one CPU core is blocked for 10 seconds.
### BIND version used
- ~"Affects v9.18" : v9_18_10
### Steps to reproduce
- Generate a large catz and configure primary. Here is a script to do that, [genconf.py](/uploads/dbf27a4ea7b768db5a7f08882ffa0732/genconf.py) + [empty.db](/uploads/6c3650f13e246ef9f879cb37911608d3/empty.db)
- Configure secondary to pull the catalog and zones:
```
options {
catalog-zones {
zone "catalog.invalid." min-update-interval 1 default-primaries { 2600::1; };
};
};
zone "catalog.invalid." {
primaries { 2600::1; };
type secondary;
};
```
- Do one RR modification at a time to the CATZ, adding/deleting single zone at a time. Script to do that:
[gencatupd.py](/uploads/76cb7f2e78b4794ed7ba91678e9e4113/gencatupd.py) . Usage:
```
while true; do python gencatupd.py | nsupdate; done
```
### What is the current *bug* behavior?
```
2022-12-14T11:59:02.929Z general: zone catalog.invalid/IN: notify from 2600:1f18:634c:d17e::da5d#42256: serial 2670950588
2022-12-14T11:59:02.929Z xfer-in: zone catalog.invalid/IN: Transfer started.
2022-12-14T11:59:02.929Z xfer-in: transfer of 'catalog.invalid/IN' from 2600:1f18:634c:d17e::da5d#53: connected using 2600:1f18:634c:d17e::da5d#53
2022-12-14T11:59:02.929Z xfer-in: zone catalog.invalid/IN: transferred serial 2670950588
2022-12-14T11:59:02.929Z general: catz: updating catalog zone 'catalog.invalid' with serial -1624016708
2022-12-14T11:59:02.929Z xfer-in: transfer of 'catalog.invalid/IN' from 2600:1f18:634c:d17e::da5d#53: Transfer status: success
2022-12-14T11:59:02.929Z xfer-in: transfer of 'catalog.invalid/IN' from 2600:1f18:634c:d17e::da5d#53: Transfer completed: 1 messages, 5 records, 223 bytes, 0.001 secs (223000 bytes/sec) (serial 2670950588)
```
<- CPU spins here for 10 sec
```
2022-12-14T11:59:12.449Z general: catz: adding zone 'z100000.test' from catalog 'catalog.invalid' - success
2022-12-14T11:59:12.449Z xfer-in: zone z100000.test/IN: Transfer started.
```
### Relevant logs and/or screenshots
Here is a CPU profile taken while the CPU is spinning madly:
![catz.svg](/uploads/a41ed1e9d7a437fee59875b654f61a31/catz.svg)
### Possible fixes
I suppose one improvement could be to reuse IXFR change sets somehow, when they are available.

Milestone: March 2023 (9.16.39, 9.16.39-S1, 9.18.13, 9.18.13-S1, 9.19.11); assignee: Petr Špaček <pspacek@isc.org>

---
https://gitlab.isc.org/isc-projects/bind9/-/issues/3727
**dns_keyfileio_t objects are never released** (Ondřej Surý, 2022-12-08)

There's an off-by-one error in the reference counting of the `dns_keyfileio_t` objects, so they are never released.

Milestone: December 2022 (9.16.36, 9.16.36-S1, 9.18.10, 9.19.8); assignee: Ondřej Surý

---
https://gitlab.isc.org/isc-projects/bind9/-/issues/3709
**Resolver hashtables don't shrink** (Ondřej Surý, 2024-03-05)

After we refactored the `dns_resolver` unit from "buckets" to a common hashtable, the stored fetch contexts and counters are never deleted and the hashtables never shrink, so the resolver's memory use can only grow.

Milestone: December 2022 (9.16.36, 9.16.36-S1, 9.18.10, 9.19.8); assignee: Ondřej Surý

---
https://gitlab.isc.org/isc-projects/bind9/-/issues/3620
**crash on startup when building with --disable-doh** (Evan Hunt, 2022-12-12)

The netinet handler for `isc_nmsocket_set_tlsctx()` is ifdef'd out when we build with `--disable-doh`, but it's still needed if DoT is configured. This is causing CI failures, for example [here](https://gitlab.isc.org/isc-projects/bind9/-/jobs/2855956).

Milestone: November 2022 (9.16.35, 9.16.35-S1, 9.18.9, 9.19.7)

---
https://gitlab.isc.org/isc-projects/bind9/-/issues/3456
**Possible race in dns_dispatch_connect()** (Evan Hunt, 2022-08-25)

A [test failure](https://gitlab.isc.org/isc-projects/bind9/-/jobs/2640755) turned up in !6562 that indicates a probable race in `dns_dispatch_connect()`: thread 1 sets `disp->tcpstate` from NONE to CONNECTING; thread 2 finds `disp->tcpstate` is CONNECTING, locks the dispatch, appends something to `disp->pending`, and unlocks; thread 1 locks the dispatch, finds that `disp->pending` is not empty, and asserts.
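The interleaving can be reproduced deterministically in a small self-contained sketch (hypothetical stand-ins, not the real dispatch code), with a plain `assert()` standing in for the `INSIST`:

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* "Thread 1" (main) publishes CONNECTING before taking the lock; if
 * thread 2 runs in that window and queues onto "pending", thread 1's
 * later assertion that the list is empty fails.  The join makes the
 * losing interleaving deterministic here. */
enum { NONE, CONNECTING };

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int tcpstate = NONE;
static int pending = 0; /* stand-in for the disp->pending list length */

static void *
thread2(void *arg) {
	(void)arg;
	pthread_mutex_lock(&lock);
	if (tcpstate == CONNECTING) {
		pending++; /* append to disp->pending */
	}
	pthread_mutex_unlock(&lock);
	return NULL;
}

int
main(void) {
	pthread_t t2;

	tcpstate = CONNECTING; /* thread 1: NONE -> CONNECTING */
	pthread_create(&t2, NULL, thread2, NULL);
	pthread_join(t2, NULL);

	pthread_mutex_lock(&lock);
	assert(pending == 0); /* the INSIST: fires if thread 2 got in */
	pthread_mutex_unlock(&lock);
	printf("no pending entries\n");
	return 0;
}
```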
I think the correct fix is just to remove the `INSIST` that the list has to be empty.

Milestone: August 2022 (9.16.32, 9.16.32-S1, 9.18.6, 9.19.4); assignee: Evan Hunt

---
https://gitlab.isc.org/isc-projects/bind9/-/issues/3441
**Recursive "stress" tests hang intermittently for BIND 9.16(-S)** (Michał Kępień, 2022-07-12)

For the past few days, recursive "stress" tests for `v9_16` and
`v9_16_sub` have been intermittently failing in weird ways:
- hangs during statistics collection at the end of the test:
- Fedora, amd64:
- https://gitlab.isc.org/isc-projects/bind9/-/jobs/2624245
- https://gitlab.isc.org/isc-private/bind9/-/jobs/2626723
- Fedora, arm64:
- https://gitlab.isc.org/isc-private/bind9/-/jobs/2624543
- https://gitlab.isc.org/isc-private/bind9/-/jobs/2626726
- https://gitlab.isc.org/isc-private/bind9/-/jobs/2629480
- core dumps:
- Fedora, amd64:
- https://gitlab.isc.org/isc-projects/bind9/-/jobs/2629186
Not all recursive "stress" test jobs are failing, though, and no such
problems have been occurring for `v9_18` or `main`. Given that the
problem started within the past week, !6419 (9.16-specific) is the most
likely culprit, but a quick glance at the changes it contains does not
raise any immediate red flags.
I managed to collect a core dump from a hung `named` instance and I
found out that all working threads are blocked on the same mutex (one of
the fetch context bucket locks). This suggests a missing mutex unlock
somewhere and explains why the server is unresponsive to queries, `rndc`
commands, etc. The question is how it got into that state, given that
it was seemingly working fine before !6419 and that the latter does not
look immediately suspicious.
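For illustration, a minimal sketch (hypothetical code, not the 9.16 resolver) of the suspected failure mode, an error path that returns with the bucket lock still held:

```c
#include <errno.h>
#include <pthread.h>
#include <stdio.h>

/* An error path returns while still holding a bucket lock, so every
 * thread that later needs the same bucket blocks forever.  trylock is
 * used below so the demo terminates instead of hanging the way the
 * real named instance did. */
static pthread_mutex_t bucket_lock = PTHREAD_MUTEX_INITIALIZER;

static int
process_fetch(int fail) {
	pthread_mutex_lock(&bucket_lock);
	if (fail) {
		return -1; /* BUG: early return skips the unlock below */
	}
	pthread_mutex_unlock(&bucket_lock);
	return 0;
}

int
main(void) {
	(void)process_fetch(1); /* leaves bucket_lock held */

	if (pthread_mutex_trylock(&bucket_lock) == EBUSY) {
		printf("bucket lock still held: all workers would block\n");
	}
	return 0;
}
```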
**This issue is a blocker for releasing BIND 9.16.31 and 9.16.31-S1.**

Milestone: July 2022 (9.16.31, 9.16.31-S1, 9.18.5, 9.19.3); assignee: Michał Kępień

---
https://gitlab.isc.org/isc-projects/bind9/-/issues/3398
**Possible race between resolver query timeout and validation in BIND 9.16** (Aram Sargsyan, 2022-07-04)

A customer's BIND 9.16.27 `named` instance has crashed after days of operation.
Here is the backtrace:
```
Thread 1 (Thread 0x7faa1ecba700 (LWP 2298)):
#0 0x00007faa22f2b5f7 in raise () from /lib64/libc.so.6
#1 0x00007faa22f2cce8 in abort () from /lib64/libc.so.6
#2 0x000000000042ac65 in assertion_failed (file=<optimized out>, line=<optimized out>, type=<optimized out>,
cond=<optimized out>) at ./main.c:272
#3 0x00007faa2477afea in isc_assertion_failed (file=file@entry=0x7faa25b66c3f "resolver.c", line=line@entry=4561,
type=type@entry=isc_assertiontype_require,
cond=cond@entry=0x7faa25b67660 "(__builtin_expect(!!((fctx) != ((void *)0)), 1) && __builtin_expect(!!(((const isc__magic_t *)(fctx))->magic == ((('F') << 24 | ('!') << 16 | ('!') << 8 | ('!')))), 1))") at assertions.c:48
#4 0x00007faa25ab2aee in fctx_unlink (fctx=fctx@entry=0x7fa9b9b27ab0) at resolver.c:4561
#5 0x00007faa25ab4532 in maybe_destroy (fctx=fctx@entry=0x7fa9b9b27ab0, locked=true) at resolver.c:5722
#6 0x00007faa25ac1728 in validated (task=<optimized out>, event=0x7fa854ad0f60) at resolver.c:5815
#7 0x00007faa247aa0fe in task_run (task=0x7fa7db68a690) at task.c:851
#8 isc_task_run (task=0x7fa7db68a690) at task.c:944
#9 0x00007faa2478ebfd in isc__nm_async_task (ev0=ev0@entry=0x7fa977674df0, worker=0x1669e40) at netmgr.c:873
#10 0x00007faa24794598 in process_netievent (worker=worker@entry=0x1669e40, ievent=0x7fa977674df0) at netmgr.c:952
#11 0x00007faa24794779 in process_queue (worker=worker@entry=0x1669e40, type=type@entry=NETIEVENT_TASK) at netmgr.c:1021
#12 0x00007faa24794ec3 in process_all_queues (worker=0x1669e40) at netmgr.c:792
#13 async_cb (handle=0x166a1a0) at netmgr.c:821
#14 0x00007faa23dbb4b3 in uv__async_io () from /lib64/libuv.so.1
#15 0x00007faa23dce983 in uv__io_poll () from /lib64/libuv.so.1
#16 0x00007faa23dbbc88 in uv_run () from /lib64/libuv.so.1
#17 0x00007faa2479482f in nm_thread (worker0=0x1669e40) at netmgr.c:727
#18 0x00007faa247ac845 in isc__trampoline_run (arg=0x190d2d0) at trampoline.c:198
#19 0x00007faa232bedc5 in start_thread () from /lib64/libpthread.so.0
#20 0x00007faa22fec1cd in clone () from /lib64/libc.so.6
```
Here's only the most important part isolated:
```
#3 0x00007faa2477afea in isc_assertion_failed (file=file@entry=0x7faa25b66c3f "resolver.c", line=line@entry=4561,
type=type@entry=isc_assertiontype_require,
cond=cond@entry=0x7faa25b67660 "(__builtin_expect(!!((fctx) != ((void *)0)), 1) && __builtin_expect(!!(((const isc__magic_t *)(fctx))->magic == ((('F') << 24 | ('!') << 16 | ('!') << 8 | ('!')))), 1))") at assertions.c:48
#4 0x00007faa25ab2aee in fctx_unlink (fctx=fctx@entry=0x7fa9b9b27ab0) at resolver.c:4561
#5 0x00007faa25ab4532 in maybe_destroy (fctx=fctx@entry=0x7fa9b9b27ab0, locked=true) at resolver.c:5722
#6 0x00007faa25ac1728 in validated (task=<optimized out>, event=0x7fa854ad0f60) at resolver.c:5815
#7 0x00007faa247aa0fe in task_run (task=0x7fa7db68a690) at task.c:851
```
Both in the beginning of `validated()` and in the beginning of `fctx_unlink()` there is a `REQUIRE(VALID_FCTX(fctx));` check. The first one passed successfully, while the second one caused a crash.
So somewhere between `validated()` and `fctx_unlink()`, the fetch context (`fctx`) was unexpectedly destroyed.
# Destroyed in the same thread? (not likely)
There can be two scenarios: either this happened during the execution of the same callback (which should be easy to find), or another thread interfered and destroyed the context while `validated()` was doing its job. So let's start with the first one. For a moment it seemed that there was such a code path leading to a double `fctx_destroy()`:
```
validated() -> maybe_destroy() -> dns_validator_cancel() -> dns_resolver_destroyfetch() -> fctx_decreference() -> fctx_destroy()
-> fctx_destroy()
```
But the problem is that before calling `dns_validator_cancel()`, the `maybe_destroy()` makes sure that the bucket is locked:
```c
bucketnum = fctx->bucketnum;
if (!locked) {
LOCK(&res->buckets[bucketnum].lock);
}
```
... and `dns_resolver_destroyfetch()` also locks that same bucket:
```c
FTRACE("destroyfetch");
bucketnum = fctx->bucketnum;
LOCK(&res->buckets[bucketnum].lock);
```
But we actually do not experience any deadlocks, which most likely means that this exact code path never(?) executes.
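A toy demonstration of that reasoning (hypothetical code, not BIND's): relocking a non-recursive pthread mutex from the same thread deadlocks, so if this path ever executed it would hang visibly rather than crash:

```c
#include <pthread.h>
#include <stdio.h>

/* An error-checking mutex is used so the demo reports the relock
 * instead of hanging; a default mutex (like the bucket lock) would
 * simply deadlock, which is not what the customer observed. */
int
main(void) {
	pthread_mutex_t bucket;
	pthread_mutexattr_t attr;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&bucket, &attr);

	pthread_mutex_lock(&bucket); /* like maybe_destroy() */
	if (pthread_mutex_lock(&bucket) != 0) { /* like destroyfetch() */
		printf("relock failed: a default mutex would deadlock\n");
	}
	return 0;
}
```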
# A multi-threading error?
Can two different tasks (threads) work with (and try to destroy) the same fetch context? All the fetch callbacks are running on the same task (derived from the bucket number), apart from the final callback which runs on a task chosen by the creator of the fetch.
So I can only imagine the following scenario for how this could have happened:
1. [T1] The fetch times out, the timeout callback `fctx_timeout()` gets called, which initiates sending the final timeout event of the fetch to the creator's callback, through `fctx_done()` -> `fctx_sendevents()`.
2. [T1] Simultaneously, a validation completes, so the task starts executing the `validated()` function.
3. [T1] `validated()` locks the bucket lock, unlinks the current validator from the `fctx->validators` list (which happens to clear the list), destroys the validator, and for a short moment unlocks the bucket lock.
4. [T2] At that same brief moment, the creator's final callback sees that there are no more active fetches on `fctx`, and the `fctx->validators` list is empty, so it destroys the fetch context.
5. [T1] `validated()` locks the bucket lock again and calls `maybe_destroy()`, which tries to destroy the fetch context again, and so it crashes.

Milestone: July 2022 (9.16.31, 9.16.31-S1, 9.18.5, 9.19.3); assignee: Aram Sargsyan

---
https://gitlab.isc.org/isc-projects/bind9/-/issues/3105
**Assertion failure on shutdown in req_senddone()** (Michal Nowak, 2022-03-08)

This seems to be the same issue as isc-projects/bind9#2951, but it has the fix isc-projects/bind9!5715 in, and the problem still occurs in the CI.
Job [#2240130](https://gitlab.isc.org/isc-projects/bind9/-/jobs/2240130) failed for 710f62bf39a925c9a580192ec5ce8dee1ea66a06:
```
D:inline:Core was generated by `/builds/isc-projects/bind9/bin/named/.libs/named -D inline-ns2 -X named.lock -m'.
D:inline:Program terminated with signal SIGABRT, Aborted.
D:inline:#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
D:inline:[Current thread is 1 (Thread 0x7f9c82efb700 (LWP 28048))]
D:inline:#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
D:inline:#1 0x00007f9c885cc537 in __GI_abort () at abort.c:79
D:inline:#2 0x0000000000420cbc in assertion_failed (file=0x7f9c89033a11 "request.c", line=<optimized out>, type=<optimized out>, cond=<optimized out>) at main.c:238
D:inline:#3 0x00007f9c890b726a in isc_assertion_failed (file=0x2 <error: Cannot access memory at address 0x2>, line=-2098243744, line@entry=1030, type=type@entry=isc_assertiontype_require, cond=0x7f9c885e2ce1 <__GI_raise+321> "H\213\204$\b\001") at assertions.c:49
D:inline:#4 0x00007f9c88f82172 in req_senddone (eresult=<optimized out>, region=<optimized out>, arg=<optimized out>) at request.c:1030
D:inline:#5 0x00007f9c88eb8cb5 in send_done (handle=0x7f9c75389780, result=ISC_R_CANCELED, cbarg=0x0) at dispatch.c:1864
D:inline:#6 0x00007f9c890a7cbf in isc__nm_async_sendcb (worker=<optimized out>, ev0=ev0@entry=0x7f9c7538a080) at netmgr/netmgr.c:2838
D:inline:#7 0x00007f9c890a4362 in process_netievent (worker=worker@entry=0x7f9c85c49c80, ievent=ievent@entry=0x7f9c7538a080) at netmgr/netmgr.c:972
D:inline:#8 0x00007f9c8909fa81 in process_queue (worker=0x7f9c85c49c80, type=NETIEVENT_NORMAL) at netmgr/netmgr.c:1010
D:inline:#9 process_all_queues (worker=0x7f9c85c49c80) at netmgr/netmgr.c:756
D:inline:#10 async_cb (handle=0x7f9c85c49fe0) at netmgr/netmgr.c:785
D:inline:#11 0x00007f9c889bce83 in uv__async_io (loop=0x7f9c85c49c90, w=0x7f9c85c49e58, events=1) at /usr/src/libuv-v1.43.0/src/unix/async.c:163
D:inline:#12 0x00007f9c889d8b87 in uv__io_poll (loop=0x7f9c85c49c90, timeout=0) at /usr/src/libuv-v1.43.0/src/unix/epoll.c:374
D:inline:#13 0x00007f9c889bd86c in uv_run (loop=0x7f9c85c49c90, mode=UV_RUN_DEFAULT) at /usr/src/libuv-v1.43.0/src/unix/core.c:389
D:inline:#14 0x00007f9c8909fbf1 in nm_thread (worker0=0x7f9c85c49c80) at netmgr/netmgr.c:691
D:inline:#15 0x00007f9c890dede5 in isc__trampoline_run (arg=0x2372760) at trampoline.c:187
D:inline:#16 0x00007f9c88774ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
D:inline:#17 0x00007f9c886a4def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
```
[core.28030-backtrace.txt](/uploads/454199c1540460b792772cad022aea8f/core.28030-backtrace.txt)

Milestone: April 2022 (9.16.28, 9.16.28-S1, 9.18.2, 9.19.0)

---
https://gitlab.isc.org/isc-projects/bind9/-/issues/3051
**Missing destroy(s) for rwlocks, mutexes and conditions** (Ondřej Surý, 2022-01-11)

It was reported to us that there's a runaway memory leak on FreeBSD, which was identified as missing dtors for pthread primitives:
1. `sock->cond` in netmgr - the one causing runaway memory use
2. `worker->lock` - one per worker
3. rwlocks in db.c and dlz.c - initialized once per `named` (fixing would require dtor at library unload, not worth fixing)
4. rwlock in `lib/*/result.c` - initialized once per `named` (fixing would require dtor at library unload, not worth fixing)

Milestone: January 2022 (9.16.25, 9.16.25-S1, 9.17.22); assignee: Ondřej Surý

---
https://gitlab.isc.org/isc-projects/bind9/-/issues/3040
**Removing the fetch context expiry timer uncovered latent bugs** (Michał Kępień, 2022-01-26)

After the dispatch code was reworked to use netmgr, the lifetime expiry
timer for the fetch context was replaced with in-band netmgr timeouts.
Unfortunately, it turned out that the lifetime expiry timer served as a
last-resort measure for breaking various deadlocks and hangs which could
be triggered in pathological resolution scenarios (#3033 and #3037 are
examples of these).
Now that the fetches are never cleaned up in some scenarios,
`named` may not respond at all to certain queries and hang on shutdown,
which is not acceptable for a stable release.
Since the bugs uncovered are neither new nor easy to fix, it was decided
that the lifetime expiry timer should be revived until we figure out all
known preexisting glitches in resolver code. The current plan is for
the lifetime expiry timer to be present in BIND 9.18, with the hope of
removing it before the release of BIND 9.20.

Milestone: December 2021 (9.16.24, 9.16.24-S1, 9.17.21)

---
https://gitlab.isc.org/isc-projects/bind9/-/issues/3018
**fctx_cancelquery() attempts to process a query which has already been freed** (Michał Kępień, 2022-01-17)

https://gitlab.isc.org/isc-projects/bind9/-/jobs/2108955
<details>
<summary>Click to expand/collapse AddressSanitizer report</summary>
<pre>==4603==ERROR: AddressSanitizer: heap-use-after-free on address 0x6170000ad810 at pc 0x7fef0381a873 bp 0x7feef9e24e80 sp 0x7feef9e24e78
READ of size 8 at 0x6170000ad810 thread T7
#0 0x7fef0381a872 in fctx_cancelquery /builds/isc-projects/bind9/lib/dns/resolver.c:1256
#1 0x7fef03843a95 in rctx_done /builds/isc-projects/bind9/lib/dns/resolver.c:9438
#2 0x7fef0384a49c in resquery_response /builds/isc-projects/bind9/lib/dns/resolver.c:7343
#3 0x7fef0345c22c in udp_recv /builds/isc-projects/bind9/lib/dns/dispatch.c:583
#4 0x7fef0478ddc7 in isc__nm_async_readcb netmgr/netmgr.c:2778
#5 0x7fef0478e426 in isc__nm_readcb netmgr/netmgr.c:2751
#6 0x7fef047caed5 in udp_recv_cb netmgr/udp.c:637
#7 0x7fef047cef11 in isc__nm_udp_read_cb netmgr/udp.c:1021
#8 0x7fef020d8a41 in uv__udp_recvmsg /usr/src/libuv-v1.42.0/src/unix/udp.c:302
#9 0x7fef020d8365 in uv__udp_io /usr/src/libuv-v1.42.0/src/unix/udp.c:178
#10 0x7fef020df3ad in uv__io_poll /usr/src/libuv-v1.42.0/src/unix/epoll.c:374
#11 0x7fef020c3b5b in uv_run /usr/src/libuv-v1.42.0/src/unix/core.c:389
#12 0x7fef04791c34 in nm_thread netmgr/netmgr.c:688
#13 0x7fef0488cde4 in isc__trampoline_run /builds/isc-projects/bind9/lib/isc/trampoline.c:185
#14 0x7fef01ea6298 in start_thread (/lib64/libpthread.so.0+0x9298)
#15 0x7fef0145c352 in clone (/lib64/libc.so.6+0x100352)
0x6170000ad810 is located 16 bytes inside of 720-byte region [0x6170000ad800,0x6170000adad0)
freed by thread T7 here:
#0 0x7fef05112647 in free (/lib64/libasan.so.6+0xae647)
#1 0x7fef0482ff97 in sdallocx /builds/isc-projects/bind9/lib/isc/jemalloc_shim.h:38
#2 0x7fef0482ff97 in mem_put /builds/isc-projects/bind9/lib/isc/mem.c:361
#3 0x7fef0482ff97 in isc__mem_put /builds/isc-projects/bind9/lib/isc/mem.c:770
#4 0x7fef03819f4b in resquery_destroy /builds/isc-projects/bind9/lib/dns/resolver.c:1194
#5 0x7fef03819f4b in resquery_detach /builds/isc-projects/bind9/lib/dns/resolver.c:1219
#6 0x7fef0381bd10 in fctx_cancelquery /builds/isc-projects/bind9/lib/dns/resolver.c:1433
#7 0x7fef0381d2e7 in fctx_cancelqueries /builds/isc-projects/bind9/lib/dns/resolver.c:1485
#8 0x7fef03820412 in fctx_done /builds/isc-projects/bind9/lib/dns/resolver.c:1746
#9 0x7fef038439f0 in rctx_next /builds/isc-projects/bind9/lib/dns/resolver.c:9348
#10 0x7fef038439f0 in rctx_done /builds/isc-projects/bind9/lib/dns/resolver.c:9431
#11 0x7fef0384a49c in resquery_response /builds/isc-projects/bind9/lib/dns/resolver.c:7343
#12 0x7fef0345c22c in udp_recv /builds/isc-projects/bind9/lib/dns/dispatch.c:583
#13 0x7fef0478ddc7 in isc__nm_async_readcb netmgr/netmgr.c:2778
#14 0x7fef0478e426 in isc__nm_readcb netmgr/netmgr.c:2751
#15 0x7fef047caed5 in udp_recv_cb netmgr/udp.c:637
#16 0x7fef047cef11 in isc__nm_udp_read_cb netmgr/udp.c:1021
#17 0x7fef020d8a41 in uv__udp_recvmsg /usr/src/libuv-v1.42.0/src/unix/udp.c:302
#18 0x7fef020d8365 in uv__udp_io /usr/src/libuv-v1.42.0/src/unix/udp.c:178
#19 0x7fef020df3ad in uv__io_poll /usr/src/libuv-v1.42.0/src/unix/epoll.c:374
#20 0x7fef020c3b5b in uv_run /usr/src/libuv-v1.42.0/src/unix/core.c:389
#21 0x7fef04791c34 in nm_thread netmgr/netmgr.c:688
#22 0x7fef0488cde4 in isc__trampoline_run /builds/isc-projects/bind9/lib/isc/trampoline.c:185
#23 0x7fef01ea6298 in start_thread (/lib64/libpthread.so.0+0x9298)
previously allocated by thread T7 here:
#0 0x7fef0511293f in __interceptor_malloc (/lib64/libasan.so.6+0xae93f)
#1 0x7fef0482f3de in mallocx /builds/isc-projects/bind9/lib/isc/jemalloc_shim.h:30
#2 0x7fef0482f3de in mem_get /builds/isc-projects/bind9/lib/isc/mem.c:340
#3 0x7fef0482f3de in isc__mem_get /builds/isc-projects/bind9/lib/isc/mem.c:753
#4 0x7fef0382a923 in fctx_query /builds/isc-projects/bind9/lib/dns/resolver.c:2003
#5 0x7fef03844334 in rctx_resend /builds/isc-projects/bind9/lib/dns/resolver.c:9322
#6 0x7fef03844334 in rctx_done /builds/isc-projects/bind9/lib/dns/resolver.c:9452
#7 0x7fef0384749b in rctx_timedout /builds/isc-projects/bind9/lib/dns/resolver.c:7740
#8 0x7fef0384749b in resquery_response /builds/isc-projects/bind9/lib/dns/resolver.c:7223
#9 0x7fef0345c22c in udp_recv /builds/isc-projects/bind9/lib/dns/dispatch.c:583
#10 0x7fef0478ddc7 in isc__nm_async_readcb netmgr/netmgr.c:2778
#11 0x7fef0478e426 in isc__nm_readcb netmgr/netmgr.c:2751
#12 0x7fef0478e9fc in isc__nmsocket_readtimeout_cb netmgr/netmgr.c:2066
#13 0x7fef020beb19 in uv__run_timers /usr/src/libuv-v1.42.0/src/timer.c:178
#14 0x7fef020c3afb in uv_run /usr/src/libuv-v1.42.0/src/unix/core.c:380
#15 0x7fef04791c34 in nm_thread netmgr/netmgr.c:688
#16 0x7fef0488cde4 in isc__trampoline_run /builds/isc-projects/bind9/lib/isc/trampoline.c:185
#17 0x7fef01ea6298 in start_thread (/lib64/libpthread.so.0+0x9298)
Thread T7 created by T0 here:
#0 0x7fef050ba8d6 in pthread_create (/lib64/libasan.so.6+0x568d6)
#1 0x7fef0487b4b9 in isc_thread_create /builds/isc-projects/bind9/lib/isc/thread.c:79
#2 0x7fef04776c4a in isc__netmgr_create netmgr/netmgr.c:328
#3 0x7fef0482cd42 in isc_managers_create /builds/isc-projects/bind9/lib/isc/managers.c:36
#4 0x43d99a in create_managers /builds/isc-projects/bind9/bin/named/main.c:920
#5 0x43d99a in setup /builds/isc-projects/bind9/bin/named/main.c:1184
#6 0x43d99a in main /builds/isc-projects/bind9/bin/named/main.c:1452
#7 0x7fef01383b74 in __libc_start_main (/lib64/libc.so.6+0x27b74)
SUMMARY: AddressSanitizer: heap-use-after-free /builds/isc-projects/bind9/lib/dns/resolver.c:1256 in fctx_cancelquery
Shadow bytes around the buggy address:
0x0c2e8000dab0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c2e8000dac0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c2e8000dad0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c2e8000dae0: fd fd fd fd fd fd fd fd fd fd fa fa fa fa fa fa
0x0c2e8000daf0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0c2e8000db00: fd fd[fd]fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c2e8000db10: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c2e8000db20: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c2e8000db30: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c2e8000db40: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c2e8000db50: fd fd fd fd fd fd fd fd fd fd fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Shadow gap: cc
==4603==ABORTING</pre>
</details>
Looks like a reference counting issue? `resquery_destroy()` is only
called when the reference count for the query reaches 0, but `rctx`
still wants to access it. AFAICT, `rctx_respinit()` does not
`resquery_attach()` to the query structure and instead [uses a plain
assignment][1], but I am not sure how easily this could be changed.
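As a hedged sketch of the suspected mechanism (simplified stand-ins, not resolver.c), a plain pointer copy in place of a counted reference produces exactly this kind of ASAN report:

```c
#include <stdio.h>
#include <stdlib.h>

/* rctx takes a plain pointer copy instead of a counted reference, so
 * when the last real reference is dropped the object is freed while
 * rctx still points at it. */
typedef struct query {
	unsigned int references;
	int id;
} query_t;

static void
query_detach(query_t **qp) {
	query_t *q = *qp;

	*qp = NULL;
	if (--q->references == 0) {
		free(q); /* like resquery_destroy() */
	}
}

int
main(void) {
	query_t *query = calloc(1, sizeof(*query));

	query->references = 1;
	query->id = 42;

	query_t *rctx_query = query; /* plain assignment, no attach */

	query_detach(&query); /* refcount hits 0: freed */

	/* Use-after-free: rctx_query is now dangling; under ASan this
	 * read is reported much like the trace above. */
	printf("%d\n", rctx_query->id);
	return 0;
}
```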
As a side note, could this maybe be related to #3013? It may be a long
shot, but the code location mentioned in the ASAN backtrace above
matches a branch handling a question section mismatch, so it made me
think of that one.
```c
7338 default:
7339 result = same_question(fctx, query->rmessage);
7340 if (result != ISC_R_SUCCESS) {
7341 FCTXTRACE3("response did not match question", result);
7342 rctx.nextitem = true;
7343 rctx_done(&rctx, result);
7344 return;
7345 }
7346 break;
7347 }
```
[1]: https://gitlab.isc.org/isc-projects/bind9/-/blob/a814f722615d2433cc57fd1f2fca1693febb3977/lib/dns/resolver.c#L7594

Milestone: December 2021 (9.16.24, 9.16.24-S1, 9.17.21)

---
https://gitlab.isc.org/isc-projects/bind9/-/issues/3002
**BIND 9.17.20 leaks memory when used as a recursive resolver** (Michał Kępień, 2021-11-23)

[BIND 9.17.20 release testing][1] revealed that memory use grows
significantly over time when `named` is used as a resolver.
This was already suspected in #2953, but the problem was not as apparent
back then.
The tricky part about finding the root cause of this problem is that it
does not manifest itself all the time. For example, consider the
following 3 jobs, all of which were run over the weekend 2 weeks ago for
the exact same code revision (the results quoted come from FreeBSD
jobs):
- https://gitlab.isc.org/isc-projects/bind9/-/jobs/2065864
<details>
<summary><b>OK</b> (click to expand)</summary>
<pre>2021-10-23:10:31:17 INFO: analysis for directory /var/tmp/gitlab_runner/builds/e-TSUMFs/0/isc-projects/bind9/output/ns3
Total of 121 measurements
Mid point of run:
Minimum: 303,776 KiB
Maximum: 333,344 KiB
Average: 314,357 KiB
End of run:
Minimum: 312,740 KiB
Maximum: 341,716 KiB
Average: 325,141 KiB</pre>
</details>
- https://gitlab.isc.org/isc-projects/bind9/-/jobs/2066591
<details>
<summary><b>OK</b> (click to expand)</summary>
<pre>2021-10-24:10:31:39 INFO: analysis for directory /var/tmp/gitlab_runner/builds/e-TSUMFs/0/isc-projects/bind9/output/ns3
Total of 121 measurements
Mid point of run:
Minimum: 291,820 KiB
Maximum: 319,860 KiB
Average: 305,058 KiB
End of run:
Minimum: 298,416 KiB
Maximum: 336,140 KiB
Average: 315,823 KiB</pre>
</details>
- https://gitlab.isc.org/isc-projects/bind9/-/jobs/2067629
<details>
<summary><b>ERROR</b> (click to expand)</summary>
<pre>2021-10-25:10:31:52 INFO: analysis for directory /var/tmp/gitlab_runner/builds/e-TSUMFs/0/isc-projects/bind9/output/ns3
Total of 121 measurements
Mid point of run:
Minimum: 376,972 KiB
Maximum: 547,596 KiB
Average: 452,970 KiB
End of run:
Minimum: 595,488 KiB
Maximum: 747,184 KiB
Average: 676,506 KiB</pre>
</details>
We need to get to the bottom of this before BIND 9.18.0 is released.
[1]: https://wiki.isc.org/bin/view/QA/BindQaResults_9_16_23

Milestone: December 2021 (9.16.24, 9.16.24-S1, 9.17.21)

---
https://gitlab.isc.org/isc-projects/bind9/-/issues/2962
**Assertion failed in `dns_resolver_logfetch()`** (Ondřej Surý, 2022-07-15)

https://gitlab.isc.org/isc-projects/bind9/-/jobs/2057408
```
D:resolver:Core was generated by `/builds/isc-projects/bind9/bin/named/.libs/named -D resolver-ns5 -X named.lock'.
D:resolver:Program terminated with signal SIGABRT, Aborted.
D:resolver:#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
D:resolver:[Current thread is 1 (Thread 0x7f4643d7f700 (LWP 5704))]
D:resolver:#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
D:resolver:#1 0x00007f4646f67535 in __GI_abort () at abort.c:79
D:resolver:#2 0x0000560317906076 in assertion_failed (file=0x7f4647dab44b "resolver.c", line=10543, type=isc_assertiontype_require, cond=0x7f4647dac878 "((fctx) != ((void *)0) && ((const isc__magic_t *)(fctx))->magic == ((('F') << 24 | ('!') << 16 | ('!') << 8 | ('!'))))") at main.c:236
D:resolver:#3 0x00007f4647f39a9d in isc_assertion_failed (file=0x7f4647dab44b "resolver.c", line=10543, type=isc_assertiontype_require, cond=0x7f4647dac878 "((fctx) != ((void *)0) && ((const isc__magic_t *)(fctx))->magic == ((('F') << 24 | ('!') << 16 | ('!') << 8 | ('!'))))") at assertions.c:47
D:resolver:#4 0x00007f4647cacff2 in dns_resolver_logfetch (fetch=0x7f460b4068e0, lctx=0x7f464440b800, category=0x7f4647a54230 <ns_categories+80>, module=0x7f4647a54290 <ns_modules+16>, level=4, duplicateok=false) at resolver.c:10543
D:resolver:#5 0x00007f4647a142ef in fetch_callback (task=0x7f4639ad70c0, event=0x7f460b88b300) at query.c:6242
D:resolver:#6 0x00007f4647f71680 in task_run (task=0x7f4639ad70c0) at task.c:827
D:resolver:#7 0x00007f4647f71941 in isc_task_run (task=0x7f4639ad70c0) at task.c:907
D:resolver:#8 0x00007f4647f13006 in isc__nm_async_task (worker=0x7f4644448240, ev0=0x7f4638397b80) at netmgr/netmgr.c:834
D:resolver:#9 0x00007f4647f1338a in process_netievent (worker=0x7f4644448240, ievent=0x7f4638397b80) at netmgr/netmgr.c:913
D:resolver:#10 0x00007f4647f14465 in process_queue (worker=0x7f4644448240, type=NETIEVENT_TASK) at netmgr/netmgr.c:1007
D:resolver:#11 0x00007f4647f12dab in process_all_queues (worker=0x7f4644448240) at netmgr/netmgr.c:753
D:resolver:#12 0x00007f4647f12e58 in async_cb (handle=0x7f46444485a0) at netmgr/netmgr.c:782
D:resolver:#13 0x00007f46475606d8 in ?? () from /usr/lib/x86_64-linux-gnu/libuv.so.1
D:resolver:#14 0x00007f464756f530 in uv__io_poll () from /usr/lib/x86_64-linux-gnu/libuv.so.1
D:resolver:#15 0x00007f4647560ff5 in uv_run () from /usr/lib/x86_64-linux-gnu/libuv.so.1
D:resolver:#16 0x00007f4647f128da in nm_thread (worker0=0x7f4644448240) at netmgr/netmgr.c:688
D:resolver:#17 0x00007f4647f7bfe2 in isc__trampoline_run (arg=0x560318659d50) at trampoline.c:185
D:resolver:#18 0x00007f464710dfa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
D:resolver:#19 0x00007f464703e4cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
```

Milestone: August 2022 (9.16.32, 9.16.32-S1, 9.18.6, 9.19.4)
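For context on the REQUIRE text quoted in issue #2962 above (frames #2-#4): this is the standard ISC magic-number validity check. Each fetch context carries a 32-bit magic value ("F!!!") in its first field, set on creation and cleared on destruction. A minimal sketch of the pattern, modelled on `<isc/magic.h>` but simplified, not the actual resolver.c code:

```c
#include <stdlib.h>

/*
 * Sketch of the ISC magic-number pattern behind the REQUIRE above,
 * modelled on <isc/magic.h>; simplified, not the actual BIND code.
 */
typedef struct {
	unsigned int magic;
} isc__magic_t;

#define ISC_MAGIC(a, b, c, d) ((a) << 24 | (b) << 16 | (c) << 8 | (d))
#define ISC_MAGIC_VALID(p, m) \
	((p) != NULL && ((const isc__magic_t *)(p))->magic == (m))

#define FCTX_MAGIC    ISC_MAGIC('F', '!', '!', '!') /* the "F!!!" above */
#define VALID_FCTX(f) ISC_MAGIC_VALID(f, FCTX_MAGIC)

typedef struct fetchctx {
	unsigned int magic; /* must be the first member */
	/* ... the rest of the fetch context ... */
} fetchctx_t;

fetchctx_t *
fctx_create(void) {
	fetchctx_t *fctx = calloc(1, sizeof(*fctx));
	if (fctx != NULL) {
		fctx->magic = FCTX_MAGIC;
	}
	return (fctx);
}

void
fctx_destroy(fetchctx_t *fctx) {
	fctx->magic = 0; /* poison, so later VALID_FCTX() checks fail */
	free(fctx);
}
```

A REQUIRE(VALID_FCTX(...)) failure in `dns_resolver_logfetch()` therefore usually means the fetch context had already been destroyed (or the pointer is stale) by the time `fetch_callback()` tried to log it, which is consistent with a lifetime/ordering problem rather than a bad assertion.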

Issue: https://gitlab.isc.org/isc-projects/bind9/-/issues/2947
Title: unexpected deletion of configured catalog zone
Date: 2021-10-27T13:05:17Z
Author: Mark Andrews

Job [#2034208](https://gitlab.isc.org/isc-projects/bind9/-/jobs/2034208) failed for 8a962622f211d50a9da66aa4759685f1598b87eb:
dom1.example is deleted despite having just been added.
```
11-Oct-2021 23:27:45.427 catz: dns_catz_add_zone catalog1.example
11-Oct-2021 23:27:45.427 catz: dns_catz_add_zone catalog2.example
11-Oct-2021 23:27:45.427 catz: dns_catz_add_zone catalog3.example
11-Oct-2021 23:27:45.427 /builds/isc-projects/bind9/bin/tests/system/catz/ns2/named.conf:39: catz: zone-directory 'nonexistent' not found; zone files will not be saved
11-Oct-2021 23:27:45.639 catz: updating catalog zone 'catalog1.example' with serial 1
11-Oct-2021 23:27:45.639 catz: update_from_db: iteration finished
11-Oct-2021 23:27:45.655 catz: update_from_db: new zone merged
11-Oct-2021 23:27:45.743 catz: updating catalog zone 'catalog3.example' with serial 1
11-Oct-2021 23:27:45.743 catz: update_from_db: iteration finished
11-Oct-2021 23:27:45.759 catz: update_from_db: new zone merged
11-Oct-2021 23:27:50.835 catz: updating catalog zone 'catalog1.example' with serial 2
11-Oct-2021 23:27:50.835 catz: update_from_db: iteration finished
11-Oct-2021 23:27:50.843 catz: iterating over 'dom1.example' from catalog 'catalog1.example'
11-Oct-2021 23:27:50.843 catz: adding zone 'dom1.example' from catalog 'catalog1.example' - success
11-Oct-2021 23:27:50.855 catz: update_from_db: new zone merged
11-Oct-2021 23:27:50.855 catz: new zone version came too soon, deferring update
11-Oct-2021 23:27:55.859 catz: updating catalog zone 'catalog1.example' with serial 2
11-Oct-2021 23:27:55.859 catz: update_from_db: iteration finished
11-Oct-2021 23:27:55.867 catz: iterating over 'dom1.example' from catalog 'catalog1.example'
11-Oct-2021 23:27:55.867 catz: deleting zone 'dom1.example' from catalog 'catalog1.example' - success
11-Oct-2021 23:27:55.879 catz: update_from_db: new zone merged
11-Oct-2021 23:27:55.879 catz: catz_delzone_taskaction: zone 'dom1.example' deleted
11-Oct-2021 23:28:00.747 catz: updating catalog zone 'catalog2.example' with serial 1
11-Oct-2021 23:28:00.747 catz: update_from_db: iteration finished
11-Oct-2021 23:28:00.767 catz: update_from_db: new zone merged
11-Oct-2021 23:28:02.863 catz: update already queued
11-Oct-2021 23:28:02.863 catz: updating catalog zone 'catalog1.example' with serial 3
11-Oct-2021 23:28:02.863 catz: update_from_db: iteration finished
11-Oct-2021 23:28:02.867 catz: deleting zone 'dom1.example' from catalog 'catalog1.example' - success
11-Oct-2021 23:28:02.879 catz: update_from_db: new zone merged
11-Oct-2021 23:28:02.879 catz: catz_delzone_taskaction: zone 'dom1.example' not found
11-Oct-2021 23:28:03.351 catz: new zone version came too soon, deferring update
11-Oct-2021 23:28:06.355 catz: updating catalog zone 'catalog2.example' with serial 2
11-Oct-2021 23:28:06.355 catz: update_from_db: iteration finished
11-Oct-2021 23:28:06.359 catz: iterating over 'dom4.example' from catalog 'catalog2.example'
11-Oct-2021 23:28:06.359 catz: adding zone 'dom4.example' from catalog 'catalog2.example' - success
11-Oct-2021 23:28:06.367 catz: update_from_db: new zone merged
11-Oct-2021 23:28:07.867 catz: new zone version came too soon, deferring update
11-Oct-2021 23:28:07.867 catz: update already queued
11-Oct-2021 23:28:08.871 catz: updating catalog zone 'catalog1.example' with serial 4
11-Oct-2021 23:28:08.871 catz: unknown record in catalog zone - trash2.foo.catalog1.example IN A(failure) - ignoring
11-Oct-2021 23:28:08.871 catz: unknown record in catalog zone - trash.catalog1.example IN A(failure) - ignoring
11-Oct-2021 23:28:08.871 catz: unknown record in catalog zone - version.catalog1.example IN A(failure) - ignoring
11-Oct-2021 23:28:08.871 catz: unknown record in catalog zone - blahblah.636722929740e507aaf27c502812fc395d30fb17.zones.catalog1.example IN TXT(failure) - ignoring
11-Oct-2021 23:28:08.871 catz: unknown record in catalog zone - foobarbaz.b901f492f3ebf6c1e5b597e51766f02f0479eb03.zones.catalog1.example IN APL(failure) - ignoring
11-Oct-2021 23:28:08.871 catz: unknown record in catalog zone - e721433b6160b450260d4f54b3ec8bab30cb3b83.zones.catalog1.example IN NS(failure) - ignoring
11-Oct-2021 23:28:08.871 catz: unknown record in catalog zone - trash3.zones.catalog1.example IN NS(failure) - ignoring
11-Oct-2021 23:28:08.871 catz: update_from_db: iteration finished
```
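The repeated "new zone version came too soon, deferring update" lines above are the catalog-zone rate limiter: a new database version arriving within `min-update-interval` (5 seconds by default) of the previous update is postponed and later replayed from the zone database. A rough sketch of that gate, with hypothetical names, not the actual lib/dns/catz.c code:

```c
#include <stdbool.h>
#include <time.h>

/*
 * Hypothetical sketch of the deferral gate suggested by the log.
 * The hazard: the deferred pass later iterates whatever the zone
 * database contains *then*, not the version that triggered it.
 */
struct catz_zone {
	time_t last_update;		  /* when the last pass ran */
	unsigned int min_update_interval; /* seconds; 5 by default */
	bool update_pending;		  /* a pass is already queued */
};

bool
catz_should_defer(struct catz_zone *cz, time_t now) {
	if (cz->update_pending) {
		/* "catz: update already queued" */
		return (true);
	}
	if (now - cz->last_update < (time_t)cz->min_update_interval) {
		/* "catz: new zone version came too soon, deferring update" */
		cz->update_pending = true;
		return (true);
	}
	cz->last_update = now;
	return (false);
}
```

That matches the timeline above: the serial-2 update at 23:27:50 adds dom1.example, the next version arrives too soon and is deferred, and when the deferred pass re-runs at 23:27:55 it ends up deleting the zone it had just added.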
A closer look shows the zone disappearing while the TXT queries for the new records are being made:
```
11-Oct-2021 23:27:54.819 query client=0x7b7c00070178 thread=0x7feb7696e700(dom1.example/TXT): ns__query_start
11-Oct-2021 23:27:54.819 client @0x7b7c00070178 10.53.0.1#39041 (dom1.example): query 'dom1.example/TXT/IN' approved
11-Oct-2021 23:27:54.819 query client=0x7b7c00070178 thread=0x7feb7696e700(dom1.example/TXT): query_lookup
11-Oct-2021 23:27:54.819 query client=0x7b7c00070178 thread=0x7feb7696e700(dom1.example/TXT): query_gotanswer
11-Oct-2021 23:27:54.819 client @0x7b7c00070178 10.53.0.1#39041 (dom1.example): rrl=(nil), HAVECOOKIE=0, result=DNS_R_NXRRSET, fname=0x7b5400030480(1), is_zone=1, RECURSIONOK=0, query.rpz_st=(nil)(0), RRL_CHECKED=0
11-Oct-2021 23:27:54.819 query client=0x7b7c00070178 thread=0x7feb7696e700(dom1.example/TXT): query_checkrpz
11-Oct-2021 23:27:54.819 query client=0x7b7c00070178 thread=0x7feb7696e700(dom1.example/TXT): rpz_rewrite
11-Oct-2021 23:27:54.819 query client=0x7b7c00070178 thread=0x7feb7696e700(dom1.example/TXT): query_nodata
11-Oct-2021 23:27:54.819 query client=0x7b7c00070178 thread=0x7feb7696e700(dom1.example/TXT): query_sign_nodata
11-Oct-2021 23:27:54.819 query client=0x7b7c00070178 thread=0x7feb7696e700(dom1.example/TXT): query_addsoa
11-Oct-2021 23:27:54.819 query client=0x7b7c00070178 thread=0x7feb7696e700(dom1.example/TXT): query_addrrset
11-Oct-2021 23:27:54.819 query client=0x7b7c00070178 thread=0x7feb7696e700(dom1.example/TXT): query_setorder
11-Oct-2021 23:27:54.819 query client=0x7b7c00070178 thread=0x7feb7696e700(dom1.example/TXT): query_additional
11-Oct-2021 23:27:54.819 query client=0x7b7c00070178 thread=0x7feb7696e700(dom1.example/TXT): query_additional: done
11-Oct-2021 23:27:54.819 query client=0x7b7c00070178 thread=0x7feb7696e700(dom1.example/TXT): query_addrrset: done
11-Oct-2021 23:27:54.819 query client=0x7b7c00070178 thread=0x7feb7696e700(dom1.example/TXT): ns_query_done
11-Oct-2021 23:27:54.819 client @0x7b7c00070178 10.53.0.1#39041 (dom1.example): reset client
11-Oct-2021 23:27:54.819 query client=0x7b7c00070178 thread=0x7feb7696e700(dom1.example/TXT): query_reset
11-Oct-2021 23:27:55.435 client @0x7b7c0002f578 10.53.0.1#36757: received notify for zone 'dom1.example'
11-Oct-2021 23:27:55.435 zone dom1.example/IN: notify from 10.53.0.1#36757: serial 2
11-Oct-2021 23:27:55.435 queue_soa_query: zone dom1.example/IN: enter
11-Oct-2021 23:27:55.435 soa_query: zone dom1.example/IN: enter
11-Oct-2021 23:27:55.439 refresh_callback: zone dom1.example/IN: enter
11-Oct-2021 23:27:55.439 refresh_callback: zone dom1.example/IN: serial: new 2, old 1
11-Oct-2021 23:27:55.439 queue_xfrin: zone dom1.example/IN: enter
11-Oct-2021 23:27:55.439 zone dom1.example/IN: Transfer started.
11-Oct-2021 23:27:55.439 zone dom1.example/IN: requesting IXFR from 10.53.0.1#6200
11-Oct-2021 23:27:55.439 transfer of 'dom1.example/IN' from 10.53.0.1#6200: connected using 10.53.0.2#44285
11-Oct-2021 23:27:55.439 transfer of 'dom1.example/IN' from 10.53.0.1#6200: requesting IXFR for serial 1
11-Oct-2021 23:27:55.439 transfer of 'dom1.example/IN' from 10.53.0.1#6200: sent request data
11-Oct-2021 23:27:55.443 transfer of 'dom1.example/IN' from 10.53.0.1#6200: received 191 bytes
;dom1.example. IN IXFR
dom1.example. 3600 IN SOA . . 2 3600 3600 3600 3600
dom1.example. 3600 IN SOA . . 1 3600 3600 3600 3600
dom1.example. 3600 IN SOA . . 2 3600 3600 3600 3600
dom1.example. 0 IN TXT "added" "record"
dom1.example. 3600 IN SOA . . 2 3600 3600 3600 3600
11-Oct-2021 23:27:55.443 transfer of 'dom1.example/IN' from 10.53.0.1#6200: got incremental response
11-Oct-2021 23:27:55.443 journal file zonedir/__catz___default_catalog1.example_dom1.example.db.jnl does not exist, creating it
11-Oct-2021 23:27:55.443 del dom1.example. 3600 IN SOA . . 1 3600 3600 3600 3600
11-Oct-2021 23:27:55.447 add dom1.example. 3600 IN SOA . . 2 3600 3600 3600 3600
11-Oct-2021 23:27:55.447 add dom1.example. 0 IN TXT "added" "record"
11-Oct-2021 23:27:55.447 dns_zone_verifydb: zone dom1.example/IN: enter
11-Oct-2021 23:27:55.447 zone_needdump: zone dom1.example/IN: enter
11-Oct-2021 23:27:55.447 zone_settimer: zone dom1.example/IN: enter
11-Oct-2021 23:27:55.447 zone dom1.example/IN: zone transfer finished: success
11-Oct-2021 23:27:55.447 zone dom1.example/IN: transferred serial 2
11-Oct-2021 23:27:55.447 zone_needdump: zone dom1.example/IN: enter
11-Oct-2021 23:27:55.447 zone_settimer: zone dom1.example/IN: enter
11-Oct-2021 23:27:55.447 zone_settimer: zone dom1.example/IN: enter
11-Oct-2021 23:27:55.447 transfer of 'dom1.example/IN' from 10.53.0.1#6200: Transfer status: success
11-Oct-2021 23:27:55.447 transfer of 'dom1.example/IN' from 10.53.0.1#6200: Transfer completed: 1 messages, 5 records, 191 bytes, 0.007 secs (27285 bytes/sec) (serial 2)
11-Oct-2021 23:27:55.447 transfer of 'dom1.example/IN' from 10.53.0.1#6200: freeing transfer context
11-Oct-2021 23:27:55.859 catz: updating catalog zone 'catalog1.example' with serial 2
11-Oct-2021 23:27:55.859 catz: update_from_db: iteration finished
11-Oct-2021 23:27:55.867 catz: iterating over 'dom1.example' from catalog 'catalog1.example'
11-Oct-2021 23:27:55.867 catz: deleting zone 'dom1.example' from catalog 'catalog1.example' - success
11-Oct-2021 23:27:55.879 catz: update_from_db: new zone merged
11-Oct-2021 23:27:55.879 calling free_rbtdb(dom1.example)
11-Oct-2021 23:27:55.879 done free_rbtdb(dom1.example)
11-Oct-2021 23:27:55.879 catz: catz_delzone_taskaction: zone 'dom1.example' deleted
11-Oct-2021 23:27:55.883 zone_shutdown: zone dom1.example/IN: shutting down
11-Oct-2021 23:27:56.027 query client=0x7b7c0001f978 thread=0x7feb78537700(dom1.example/TXT): qctx_init
11-Oct-2021 23:27:56.027 query client=0x7b7c0001f978 thread=0x7feb78537700(dom1.example/TXT): client attr:0x22300, query attr:0x700, restarts:0, origqname:dom1.example, timer:0, authdb:0, referral:0
11-Oct-2021 23:27:56.027 query client=0x7b7c0001f978 thread=0x7feb78537700(dom1.example/TXT): ns__query_start
11-Oct-2021 23:27:56.027 query client=0x7b7c0001f978 thread=0x7feb78537700(dom1.example/TXT): ns_query_done
11-Oct-2021 23:27:56.027 client @0x7b7c0001f978 10.53.0.1#34865 (dom1.example): query failed (REFUSED) for dom1.example/IN/TXT at query.c:5484
```
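As an aside on the transfer above: the five-record IXFR response follows RFC 1995. It begins and ends with the SOA of the new version (serial 2), and in between each SOA introduces alternating delete/add groups, the introducing SOAs themselves being part of the delta (hence the del/add journal lines). A toy walk over such a response, with types invented for the sketch; BIND's real transfer-in code is in lib/dns/xfrin.c:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Invented record type for the sketch; not a BIND structure. */
struct rr {
	bool is_soa;
	uint32_t soa_serial; /* meaningful only when is_soa */
	const char *text;    /* printable form, for the demo */
};

/*
 * Walk an RFC 1995 IXFR response: rrs[0] and rrs[n-1] are the SOA of
 * the new version; each interior SOA flips between the delete and add
 * groups (and is itself the SOA delete/add of the diff).
 */
static void
ixfr_walk(const struct rr *rrs, size_t n) {
	bool deleting = false;

	/* Skip the leading and trailing SOA (the new version's serial). */
	for (size_t i = 1; i + 1 < n; i++) {
		if (rrs[i].is_soa) {
			deleting = !deleting; /* group boundary */
		}
		printf("%s %s\n", deleting ? "del" : "add", rrs[i].text);
	}
}

int
main(void) {
	/* The response from the log, elided to its structure. */
	const struct rr resp[] = {
		{ true, 2, "dom1.example. 3600 IN SOA . . 2 ..." },
		{ true, 1, "dom1.example. 3600 IN SOA . . 1 ..." },
		{ true, 2, "dom1.example. 3600 IN SOA . . 2 ..." },
		{ false, 0, "dom1.example. 0 IN TXT \"added\" \"record\"" },
		{ true, 2, "dom1.example. 3600 IN SOA . . 2 ..." },
	};
	ixfr_walk(resp, sizeof(resp) / sizeof(resp[0]));
	return (0);
}
```

Applied to the response quoted above, this yields exactly the three del/add journal lines that follow it in the log.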
```
S:catz:2021-10-11T23:27:42+0000
T:catz:1:A
A:catz:System test catz
I:catz:PORTRANGE:6200 - 6299
I:catz:starting servers
I:catz:Testing adding/removing of domain in catalog zone
I:catz:checking that dom1.example. is not served by primary (1)
I:catz:Adding a domain dom1.example. to primary via RNDC (2)
I:catz:checking that dom1.example. is now served by primary (3)
I:catz:Adding domain dom1.example. to catalog1 zone (4)
I:catz:waiting for secondary to sync up (5)
I:catz:checking that dom1.example. is served by secondary (6)
I:catz:checking that zone-directory is populated (7)
I:catz:update dom1.example. (8)
I:catz:wait for secondary to be updated (9)
I:catz:failed
I:catz:check that journal was created for cleanup test (10)
I:catz:failed
```
- [x] Need to check v9.17

Milestone: November 2021 (9.16.23, 9.16.23-S1, 9.17.20)