# BIND issues
https://gitlab.isc.org/isc-projects/bind9/-/issues (updated 2023-09-04)

## Issue #4118: Data race lib/dns/adb.c:1537 in clean_finds_at_name
*Michal Nowak, 2023-09-04*

Job [respdiff-long:tsan](https://gitlab.isc.org/isc-private/bind9/-/jobs/3440993) failed for [d2fbe443b833d093f68bf4f5a1736242fc8d18a1](https://gitlab.isc.org/isc-private/bind9/-/commit/d2fbe443b833d093f68bf4f5a1736242fc8d18a1) (~"v9.18-S").
```
WARNING: ThreadSanitizer: data race
Write of size 4 at 0x000000000001 by thread T1 (mutexes: write M1, write M2):
#0 clean_finds_at_name lib/dns/adb.c:1537
#1 fetch_callback lib/dns/adb.c:4009
#2 task_run lib/isc/task.c:815
#3 isc_task_run lib/isc/task.c:896
#4 isc__nm_async_task netmgr/netmgr.c:848
#5 process_netievent netmgr/netmgr.c:920
#6 process_queue netmgr/netmgr.c:1013
#7 process_all_queues netmgr/netmgr.c:767
#8 async_cb netmgr/netmgr.c:796
#9 uv__async_io /usr/src/libuv-v1.44.1/src/unix/async.c:163
#10 isc__trampoline_run lib/isc/trampoline.c:189
Previous read of size 4 at 0x000000000001 by thread T2:
#0 findname lib/dns/resolver.c:3749
#1 fctx_getaddresses lib/dns/resolver.c:3993
#2 fctx_try lib/dns/resolver.c:4390
#3 rctx_nextserver lib/dns/resolver.c:10356
#4 rctx_done lib/dns/resolver.c:10503
#5 resquery_response lib/dns/resolver.c:8511
#6 udp_recv lib/dns/dispatch.c:638
#7 isc__nm_async_readcb netmgr/netmgr.c:2885
#8 isc__nm_readcb netmgr/netmgr.c:2858
#9 udp_recv_cb netmgr/udp.c:650
#10 isc__nm_udp_read_cb netmgr/udp.c:1057
#11 uv__udp_recvmsg /usr/src/libuv-v1.44.1/src/unix/udp.c:303
#12 isc__trampoline_run lib/isc/trampoline.c:189
Location is heap block of size 256 at 0x000000000025 allocated by thread T2:
#0 malloc ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:651
#1 mallocx lib/isc/jemalloc_shim.h:35
#2 mem_get lib/isc/mem.c:343
#3 isc__mem_get lib/isc/mem.c:761
#4 new_adbfind lib/dns/adb.c:1901
#5 dns_adb_createfind lib/dns/adb.c:2934
#6 findname lib/dns/resolver.c:3656
#7 fctx_getaddresses lib/dns/resolver.c:3993
#8 fctx_try lib/dns/resolver.c:4390
#9 rctx_nextserver lib/dns/resolver.c:10356
#10 rctx_done lib/dns/resolver.c:10503
#11 resquery_response lib/dns/resolver.c:8511
#12 udp_recv lib/dns/dispatch.c:638
#13 isc__nm_async_readcb netmgr/netmgr.c:2885
#14 isc__nm_readcb netmgr/netmgr.c:2858
#15 udp_recv_cb netmgr/udp.c:650
#16 isc__nm_udp_read_cb netmgr/udp.c:1057
#17 uv__udp_recvmsg /usr/src/libuv-v1.44.1/src/unix/udp.c:303
#18 isc__trampoline_run lib/isc/trampoline.c:189
Mutex M1 is already destroyed.
Mutex M2 is already destroyed.
Thread T1 (running) created by main thread at:
#0 pthread_create ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:962
#1 isc_thread_create lib/isc/thread.c:73
#2 isc__netmgr_create netmgr/netmgr.c:311
#3 isc_managers_create lib/isc/managers.c:31
#4 create_managers bin/named/main.c:1042
#5 setup bin/named/main.c:1313
#6 main bin/named/main.c:1594
Thread T2 (running) created by main thread at:
#0 pthread_create ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:962
#1 isc_thread_create lib/isc/thread.c:73
#2 isc__netmgr_create netmgr/netmgr.c:311
#3 isc_managers_create lib/isc/managers.c:31
#4 create_managers bin/named/main.c:1042
#5 setup bin/named/main.c:1313
#6 main bin/named/main.c:1594
SUMMARY: ThreadSanitizer: data race lib/dns/adb.c:1537 in clean_finds_at_name
```

Status: Not planned

## Issue #4112: "serve-stale:check prefetch processing of a stale CNAME target" fails on FreeBSD 13
*Michal Nowak, 2023-07-07*

Job [#3435305](https://gitlab.isc.org/isc-projects/bind9/-/jobs/3435305) failed for ff3d25a47f9f969669b2e4f5cde10c50f9cdd171 (~"v9.18").
On FreeBSD 13.2, the `check prefetch processing of a stale CNAME target` check [failed](https://gitlab.isc.org/isc-projects/bind9/-/jobs/3435305) [twice](https://gitlab.isc.org/isc-private/bind9/-/jobs/3431983) in recent days:
```
2023-06-02 01:09:52 INFO:serve-stale I:serve-stale_tmp_q8yamlle:check prefetch processing of a stale CNAME target (214)
2023-06-02 01:09:55 INFO:serve-stale I:serve-stale_tmp_q8yamlle:failed
```
This was expected:
```
target.example. 2 IN A 10.53.0.2
```
But this was the answer:
```
target.example. 30 IN A 10.53.0.2
```
We got a stale answer after the client timeout (`; EDE: 3 (Stale Answer): (client timeout)`); the query time was 1840 msec. Locally, I get 2 msec and a non-stale answer.
I was unable to reproduce the problem locally.

## Issue #4104: ZoneQuota stats counter is not counting everything
*Ondřej Surý, 2024-02-24*

The `ZoneQuota` counter should log all the hits to `fcount_incr()` returning `ISC_R_QUOTA`, but it does so in only a single place. The counting should be moved to `fctx_incr()`.

Milestone: May 2024 (9.18.27, 9.18.27-S1, 9.19.24); assignee: Ondřej Surý

## Issue #4102: Use liburcu QSBR flavor
*Ondřej Surý, 2023-07-26*

The QSBR flavor is faster, but it also requires rcu_quiescent_state() to be called periodically from every RCU thread.

Status: Not planned; assignee: Ondřej Surý

## Issue #4100: REQUIRE(((multi) != ((void *)0) && ((const isc__magic_t *)(multi))->magic == ((('q') << 24 | ('p') << 16 | ('m') << 8 | ('v'))))) in qp.c
*Michal Nowak, 2023-05-30*

Job [#3424074](https://gitlab.isc.org/isc-projects/bind9/-/jobs/3424074) failed for 2e8ceeea14e336980c9da80449b84ecd16afc7e5.
The `qpmulti_test` unit test failed.
```
[==========] Running 1 test(s).
[ RUN ] qpmulti
qp.c:634: REQUIRE(((multi) != ((void *)0) && ((const isc__magic_t *)(multi))->magic == ((('q') << 24 | ('p') << 16 | ('m') << 8 | ('v'))))) failed, back trace
/builds/isc-projects/bind9/lib/isc/.libs/libisc-9.19.14-dev.so(+0x2ddb2)[0x7f231ea2ddb2]
/builds/isc-projects/bind9/lib/isc/.libs/libisc-9.19.14-dev.so(isc_assertion_failed+0xa)[0x7f231ea2dd2d]
/builds/isc-projects/bind9/lib/dns/.libs/libdns-9.19.14-dev.so(+0xb9fe3)[0x7f231e0b9fe3]
/lib64/liburcu.so.6(+0x37a9)[0x7f231d64e7a9]
/lib64/libpthread.so.0(+0x81da)[0x7f231dbdd1da]
/lib64/libc.so.6(clone+0x43)[0x7f231ceafe73]
../../tests/unit-test-driver.sh: line 36: 13597 Aborted (core dumped) "${TEST_PROGRAM}"
FAIL qpmulti_test (exit status: 134)
```
There's no core file or full backtrace in the logs.

## Issue #4099: [PATCH] +shortans
*Fredrick Brennan, 2023-05-30*

# Patch
```diff
From 6041dcb60313b5fd81076bd53713b8a53fb95f87 Mon Sep 17 00:00:00 2001
From: Fredrick Brennan <copypaste@kittens.ph>
Date: Sat, 27 May 2023 08:23:45 -0400
Subject: [PATCH] [dig] +shortans
---
bin/dig/dig.c | 48 ++++++++++++++++++++++++++++++++++++------------
bin/dig/dig.rst | 4 ++++
doc/man/dig.1in | 5 +++++
3 files changed, 45 insertions(+), 12 deletions(-)
diff --git a/bin/dig/dig.c b/bin/dig/dig.c
index 694924c0f2..dd9bfcd4a7 100644
--- a/bin/dig/dig.c
+++ b/bin/dig/dig.c
@@ -286,6 +286,8 @@ help(void) {
"short\n"
" form of answers - global "
"option)\n"
+ " +[no]shortans (equivalent to `+noall"
+ "+authority +answer`)\n"
" +[no]showbadcookie (Show BADCOOKIE message)\n"
" +[no]showsearch (Search with intermediate "
"results)\n"
@@ -1901,18 +1903,40 @@ plus_option(char *option, bool is_batchfile, bool *need_clone,
goto invalid_option;
}
switch (cmd[3]) {
- case 'r': /* short */
- FULLCHECK("short");
- short_form = state;
- if (state) {
- printcmd = false;
- lookup->section_additional = false;
- lookup->section_answer = true;
- lookup->section_authority = false;
- lookup->section_question = false;
- lookup->comments = false;
- lookup->stats = false;
- lookup->rrcomments = -1;
+ case 'r': /* shor… */
+ switch(cmd[4]) {
+ case 't': /* short… */
+ switch(cmd[5]) { /* short */
+ case '\0':
+ FULLCHECK("short");
+ short_form = state;
+ if (state) {
+ printcmd = false;
+ lookup->section_additional = false;
+ lookup->section_answer = true;
+ lookup->section_authority = false;
+ lookup->section_question = false;
+ lookup->comments = false;
+ lookup->stats = false;
+ lookup->rrcomments = -1;
+ }
+ break;
+ case 'a': /* shortans */
+ FULLCHECK("shortans");
+ lookup->section_question = !state;
+ lookup->section_authority = state;
+ lookup->section_answer = state;
+ lookup->section_additional = !state;
+ lookup->comments = !state;
+ lookup->stats = !state;
+ printcmd = !state;
+ break;
+ default:
+ goto invalid_option;
+ }
+ break;
+ default:
+ goto invalid_option;
}
break;
case 'w': /* showsearch */
diff --git a/bin/dig/dig.rst b/bin/dig/dig.rst
index a5bfb86556..75237f0ae0 100644
--- a/bin/dig/dig.rst
+++ b/bin/dig/dig.rst
@@ -571,6 +571,10 @@ abbreviation is unambiguous; for example, :option:`+cd` is equivalent to
form. This option always has a global effect; it cannot be set globally and
then overridden on a per-lookup basis.
+.. option:: +shortans, +noshortans
+
+ This option expands to :option:`+noall` :option:`+authority` :option:`+answer`.
+
.. option:: +showbadcookie, +noshowbadcookie
This option toggles whether to show the message containing the
diff --git a/doc/man/dig.1in b/doc/man/dig.1in
index d5f42ed852..1607d7f2ca 100644
--- a/doc/man/dig.1in
+++ b/doc/man/dig.1in
@@ -663,6 +663,11 @@ then overridden on a per\-lookup basis.
.UNINDENT
.INDENT 0.0
.TP
+.B +shortans, +noshortans
+This option expands to \fI\%+noall\fP \fI\%+authority\fP \fI\%+answer\fP\&.
+.UNINDENT
+.INDENT 0.0
+.TP
.B +showbadcookie, +noshowbadcookie
This option toggles whether to show the message containing the
BADCOOKIE rcode before retrying the request or not. The default
--
2.40.1
```
# Detached signature
```gpg
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQS1rLeeEfG/f0nzK7hYUwVpYvFOWAUCZHH3EAAKCRBYUwVpYvFO
WOiHAP9uTERa4rrztKKeqk1TSLkqP5RgDnBbgxcbTkHAt5q7/wEAvffIjE5SUX8P
RpxZ9yS2geRmVXwyLDiS4FjxN3u7vgE=
=i92K
-----END PGP SIGNATURE-----
```

## Issue #4092: timer.c:223:timerevent_destroy(): fatal error: RUNTIME_CHECK(isc_mutex_unlock((&timer->lock)) == ISC_R_SUCCESS) failed
*Michal Nowak, 2023-05-25*

Job [#3411550](https://gitlab.isc.org/isc-projects/bind9/-/jobs/3411550) failed for 66254cf56d7072833db6d8744e6bcef2109b72e2.
BIND 9.18 `task` unit test failed on `unit:gcc:oraclelinux8:amd64`.
```
[==========] Running 11 test(s).
[ RUN ] manytasks
[ OK ] manytasks
[ RUN ] all_events
[ OK ] all_events
[ RUN ] basic
timer.c:223:timerevent_destroy(): fatal error: RUNTIME_CHECK(isc_mutex_unlock((&timer->lock)) == ISC_R_SUCCESS) failed
../../tests/unit-test-driver.sh: line 36: 8595 Aborted (core dumped) "${TEST_PROGRAM}"
I:task_test:Core dump found: ./core.8595
D:task_test:backtrace from ./core.8595 start
[New LWP 8636]
[New LWP 8595]
[New LWP 8637]
[New LWP 8638]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/builds/isc-projects/bind9/tests/isc/.libs/lt-task_test'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f8c5b302aff in raise () from /lib64/libc.so.6
[Current thread is 1 (Thread 0x7f8c3bfff700 (LWP 8636))]
Thread 4 (Thread 0x7f8c412fa700 (LWP 8638)):
#0 0x00007f8c5b68846c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
No symbol table info available.
#1 0x00007f8c5c4af725 in run (uap=0x7f8c591e1000) at timer.c:632
manager = 0x7f8c591e1000
now = {seconds = 1684976709, nanoseconds = 609640403}
result = <optimized out>
__func__ = "run"
#2 0x00007f8c5c4b4b20 in isc__trampoline_run (arg=0x1973730) at trampoline.c:189
trampoline = 0x1973730
result = <optimized out>
#3 0x00007f8c5b6821da in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#4 0x00007f8c5b2ede73 in clone () from /lib64/libc.so.6
No symbol table info available.
Thread 3 (Thread 0x7f8c40af9700 (LWP 8637)):
#0 0x00007f8c5b3e4017 in epoll_wait () from /lib64/libc.so.6
No symbol table info available.
#1 0x00007f8c5c2460f9 in uv.io_poll () from /lib64/libuv.so.1
No symbol table info available.
#2 0x00007f8c5c234a74 in uv_run () from /lib64/libuv.so.1
No symbol table info available.
#3 0x00007f8c5c47aa6c in nm_thread (worker0=0x7f8c591f75b8) at netmgr/netmgr.c:698
r = <optimized out>
worker = 0x7f8c591f75b8
mgr = 0x7f8c59036000
__func__ = "nm_thread"
#4 0x00007f8c5c4b4b20 in isc__trampoline_run (arg=0x1974330) at trampoline.c:189
trampoline = 0x1974330
result = <optimized out>
#5 0x00007f8c5b6821da in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#6 0x00007f8c5b2ede73 in clone () from /lib64/libc.so.6
No symbol table info available.
Thread 2 (Thread 0x7f8c5ce04140 (LWP 8595)):
#0 0x00007f8c5b3ae9a8 in nanosleep () from /lib64/libc.so.6
No symbol table info available.
#1 0x00007f8c5b3dbf48 in usleep () from /lib64/libc.so.6
No symbol table info available.
#2 0x00007f8c5c4ac692 in isc__taskmgr_destroy (managerp=managerp@entry=0x607348 <taskmgr>) at task.c:1041
No locals.
#3 0x00007f8c5c49b4b0 in isc_managers_destroy (netmgrp=netmgrp@entry=0x607338 <netmgr>, taskmgrp=taskmgrp@entry=0x607348 <taskmgr>, timermgrp=timermgrp@entry=0x607340 <timermgr>) at managers.c:99
No locals.
#4 0x00000000004052ee in teardown_managers (state=<optimized out>) at isc.c:84
No locals.
#5 0x0000000000404f64 in _teardown (state=<optimized out>) at task_test.c:91
No locals.
#6 0x00007f8c5be1702e in cmocka_run_one_test_or_fixture () from /lib64/libcmocka.so.0
No symbol table info available.
#7 0x00007f8c5be179e0 in _cmocka_run_group_tests () from /lib64/libcmocka.so.0
No symbol table info available.
#8 0x000000000040516b in main () at task_test.c:1408
r = <optimized out>
Thread 1 (Thread 0x7f8c3bfff700 (LWP 8636)):
#0 0x00007f8c5b302aff in raise () from /lib64/libc.so.6
No symbol table info available.
#1 0x00007f8c5b2d5ea5 in abort () from /lib64/libc.so.6
No symbol table info available.
#2 0x00007f8c5c48f5c2 in isc_error_fatal (file=file@entry=0x7f8c5c4c45a6 "timer.c", line=line@entry=223, func=func@entry=0x7f8c5c4d07a0 <__func__.7544> "timerevent_destroy", format=format@entry=0x7f8c5c4c0814 "RUNTIME_CHECK(%s) failed") at error.c:72
args = {{gp_offset = 40, fp_offset = 48, overflow_arg_area = 0x7f8c3bff9d00, reg_save_area = 0x7f8c3bff9c40}}
#3 0x00007f8c5c4af15f in timerevent_destroy (event0=0x7f8c51800b00) at timer.c:225
timer = 0x7f8c591e10a0
event = 0x7f8c51800b00
__func__ = "timerevent_destroy"
#4 0x00007f8c5c48f7e9 in isc_event_free (eventp=eventp@entry=0x7f8c3bff9d48) at event.c:93
event = <optimized out>
#5 0x0000000000403449 in basic_tick (task=<optimized out>, event=<optimized out>) at task_test.c:444
No locals.
#6 0x00007f8c5c4abf17 in task_run (task=0x7f8c591e73c0) at task.c:815
dispatch_count = 0
finished = false
quantum = <optimized out>
event = 0x7f8c51800b00
result = ISC_R_SUCCESS
dispatch_count = <optimized out>
finished = <optimized out>
event = <optimized out>
result = <optimized out>
quantum = <optimized out>
__func__ = "task_run"
__atomic_load_ptr = <optimized out>
__atomic_load_tmp = <optimized out>
__atomic_load_ptr = <optimized out>
__atomic_load_tmp = <optimized out>
__atomic_load_ptr = <optimized out>
__atomic_load_tmp = <optimized out>
__atomic_load_ptr = <optimized out>
__atomic_load_tmp = <optimized out>
__v = <optimized out>
#7 isc_task_run (task=0x7f8c591e73c0) at task.c:896
No locals.
#8 0x00007f8c5c472579 in isc__nm_async_task (worker=worker@entry=0x7f8c591f7000, ev0=ev0@entry=0x7f8c51805f80) at netmgr/netmgr.c:848
ievent = 0x7f8c51805f80
result = <optimized out>
#9 0x00007f8c5c479d78 in process_netievent (worker=worker@entry=0x7f8c591f7000, ievent=ievent@entry=0x7f8c51805f80) at netmgr/netmgr.c:920
No locals.
#10 0x00007f8c5c47a78e in process_queue (worker=worker@entry=0x7f8c591f7000, type=type@entry=NETIEVENT_TASK) at netmgr/netmgr.c:1013
next = 0x0
ievent = 0x7f8c51805f80
list = {head = 0x0, tail = 0x0}
__func__ = "process_queue"
#11 0x00007f8c5c47b23b in process_all_queues (worker=0x7f8c591f7000) at netmgr/netmgr.c:767
result = <optimized out>
type = 2
reschedule = false
reschedule = <optimized out>
type = <optimized out>
result = <optimized out>
#12 async_cb (handle=0x7f8c591f7360) at netmgr/netmgr.c:796
worker = 0x7f8c591f7000
#13 0x00007f8c5c2342f1 in uv.async_io.part () from /lib64/libuv.so.1
No symbol table info available.
#14 0x00007f8c5c245d15 in uv.io_poll () from /lib64/libuv.so.1
No symbol table info available.
#15 0x00007f8c5c234a74 in uv_run () from /lib64/libuv.so.1
No symbol table info available.
#16 0x00007f8c5c47aa6c in nm_thread (worker0=0x7f8c591f7000) at netmgr/netmgr.c:698
r = <optimized out>
worker = 0x7f8c591f7000
mgr = 0x7f8c59036000
__func__ = "nm_thread"
#17 0x00007f8c5c4b4b20 in isc__trampoline_run (arg=0x1976840) at trampoline.c:189
trampoline = 0x1976840
result = <optimized out>
#18 0x00007f8c5b6821da in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#19 0x00007f8c5b2ede73 in clone () from /lib64/libc.so.6
No symbol table info available.
D:task_test:backtrace from ./core.8595 end
FAIL task_test (exit status: 134)
```

## Issue #4087: Follow-up from "fix handling of TCP timeouts"
*Evan Hunt, 2023-11-02*

The following discussion from !7937 should be addressed:
- [ ] @aram started a [discussion](https://gitlab.isc.org/isc-projects/bind9/-/merge_requests/7937#note_375087): (+2 comments)
> While you are addressing Ondřej's comments, would you please also look at something not strictly related to this MR, which caught my eye (cc @ondrej):
>
> ```c
> void
> dns_dispatch_resume(dns_dispentry_t *resp, uint16_t timeout);
> /*%<
> * Reset the read timeout in the socket associated with 'resp' and
> * continue reading.
> *
> * Requires:
> *\li 'resp' is valid.
> */
> ```
>
> The function is supposed to reset the read timeout, but if I am reading the code correctly, both `udp_dispatch_getnext()` and `tcp_dispatch_getnext()` (called by `dns_dispatch_resume()`) potentially can ignore the timeout value if the read operation is already ongoing. Is that by design?
>
> I think it should at least update the `resp->timeout` value with the new one, and probably call `isc_nmhandle_settimeout()` even when already reading, in case the new timeout is smaller than the remaining time of the current one.

Status: Not planned; assignee: Evan Hunt

## Issue #4084: auto-tune transfers-in, transfers-out, transfers-per-ns and friends
*Cathy Almond, 2023-05-30*

### Description
The problem, as noted in [Support ticket #21991](https://support.isc.org/Ticket/Display.html?id=21991), is that without testing in a production environment and under specific circumstances (speed of network, configuration of primaries per zone, reachability, rate of zone update propagation and so on), it's hard to know what sane values to give to the options for tuning zone transfers. The types of things that need to be optimised are:
- Speed of synchronisation/completion of refreshes following a secondary server restart
- Effective use of CPU resources so that servers are not idling while there is work that could be done
- No interruption to client services during zone refreshes (for servers that are currently client-facing)
- Effective onward zone update propagation/refreshes (for servers that are intermediaries in the zone update propagation path)
- Speed of propagation of zone updates during normal operation (i.e. not when restarting something...)
### Request
I'd like transfers-in, transfers-out, transfers-per-ns and friends to be able to auto-tune themselves based on knowledge of how the server is performing.
See other work currently ongoing:
#3883
#3914
### Links / references

Status: Not planned

## Issue #4065: Could query-source be made best-effort, not preventing startup in case of failure?
*Petr Menšík, 2023-06-16*

### Description
Could the specification of outgoing addresses be made non-fatal? Some users try to configure the outgoing address used by named, but are then surprised that it creates a problem during startup, because those addresses might not yet be available.
### Request
Would it be possible to specify ``query-source 10.1.2.3 optional;``, which would behave in a way similar to FREEBIND for listening sockets? If the socket could not be bound, just use whatever default address the system provides, but try to use that address when it works. It would also allow starting with addresses that are not yet present and only appear later.
An alternative would be delaying root priming queries until the listen-on machinery detects that the source address is available. That seems a lot more complicated.
### Links / references
- https://bugzilla.redhat.com/show_bug.cgi?id=2195976

## Issue #4058: BIND resolver incorrectly handles NODATA/NOERROR (NXRRSET) query response when CNAME is queried during prefetch
*Cathy Almond, 2023-12-19*

### Summary
An "A" fetch to an auth server returns "CNAME". But (it appears), with prefetch enabled (the default), when the "CNAME" is fetched, the authoritative server sends back noanswer/noerror (= NXRRSET). Clearly this is broken behaviour on the part of the auth servers (or they just changed their zone from providing a CNAME to providing an answer), but I still don't see why it breaks a BIND resolver, which should at this point just understand that the CNAME no longer exists and (as needed, as a result of client queries) query instead from the beginning with the RTYPE the client needs to have resolved.
Instead, the resolver is returning the empty answer to querying clients (who are not querying for CNAME; they are querying the resolver for A).
See [Support ticket 22027](https://support.isc.org/Ticket/Display.html?id=22027)
### BIND version used
9.16.35-S1
### Steps to reproduce
We don't have a reproducer at this time, but see the Support ticket for more details on what's happening. You need an authoritative server that responds with a CNAME (with a valid target) when queried for A (or other) rtypes for a name, but when queried explicitly for CNAME, sends back noerror/noanswer (essentially NXRRSET). Then enable prefetch and keep querying the server for record type A until the CNAME is close to expiry and is therefore prefetched explicitly...
### What is the current *bug* behavior?
When we are handling a client query, we are making queries to cache or to authoritative servers (cache miss) but all of those queries are for the RTYPE that we want to resolve. We don't query for type CNAME. IF we hit a CNAME along the way, then that will cause us to start a new query (from cache or initiate a fetch if we need to) using the target of the CNAME as the new name to be queried.
So far so good. This implies that the code that looks in cache and gets an answer from a fetch handles CNAME as a special case and that we likely look for or cache EXPLICITLY for CNAMEs while we're looking for the RTYPE that we actually want to resolve.
I suspect that we would not expect to find in cache an NXRRSET of type CNAME. Essentially this is meaningless to us: if a CNAME doesn't exist, then any other record type might exist; we just don't know, so it might as well just not be there.
If we get back 'NXRRSET' from a fetch for type CNAME, do we even add it to cache, or does this result in us deleting the original CNAME RR?
Whatever we do with it, it appears to 'break' the cache so that clients get back NOANSWER (empty answer) instead of named doing another fetch based on the RTYPE of the client query made after this CNAME has been refreshed.
### What is the expected *correct* behavior?
Getting a 'NXRRSET' query response from an auth server that has explicitly been queried for a CNAME RR (to refresh what was in cache before - as instigated by prefetch) should not cause the cache to no longer be able to resolve queries for that name for other RTYPEs.
Subsequent client queries after receipt of the auth answer that says that the CNAME no longer exists, should cause new fetches to the auth server with the RTYPE of the client query in them.
Is it remotely possible, however, that finding a CNAME in cache (since we already know that we do something special if we find it) but then finding that it's not a pointer to 'go look up this name instead of the one you had' but instead is NXRRSET (whoa, that wasn't what we expected to find!) could cause something aberrant to happen? Or maybe this is a subtle race condition to do with replacing the CNAME with NXRRSET (or deleting it entirely) because of the query response from the auth server, this happening as a result of the prefetch but now racing with the next client query that is looking in cache?
### Relevant configuration files
No configs - nothing special needed, just prefetch enabled so that when the CNAME in cache is close to expiry, a client query will trigger a prefetch.
### Relevant logs and/or screenshots
N/A - please see support ticket for more details
### Possible fixes
N/A
(P.S. With the info available and with what we know, it was very hard to complete this report per the template.)

Milestone: BIND 9.21.x; assignee: Matthijs Mekking (matthijs@isc.org)

## Issue #4057: dig XDG basedir support
*Paul Tötterman, 2023-05-12*

### Description
Check ${XDG_CONFIG_HOME}/dig/digrc in addition to ~/.digrc
### Links / references
https://specifications.freedesktop.org/basedir-spec/basedir-spec-latest.html

Status: Not planned

## Issue #4051: udp_recv_send unit test hangs intermittently, causing udp_test to fail
*Michał Kępień, 2023-05-10*

https://gitlab.isc.org/isc-private/bind9/-/jobs/3366257
```
[==========] Running 18 test(s).
[ RUN ] mock_listenudp_uv_udp_open
[ OK ] mock_listenudp_uv_udp_open
[ RUN ] mock_listenudp_uv_udp_bind
[ OK ] mock_listenudp_uv_udp_bind
[ RUN ] mock_listenudp_uv_udp_recv_start
[ OK ] mock_listenudp_uv_udp_recv_start
[ RUN ] mock_udpconnect_uv_udp_open
[ OK ] mock_udpconnect_uv_udp_open
[ RUN ] mock_udpconnect_uv_udp_bind
[ OK ] mock_udpconnect_uv_udp_bind
[ RUN ] mock_udpconnect_uv_udp_connect
[ OK ] mock_udpconnect_uv_udp_connect
[ RUN ] mock_udpconnect_uv_recv_buffer_size
[ OK ] mock_udpconnect_uv_recv_buffer_size
[ RUN ] mock_udpconnect_uv_send_buffer_size
[ OK ] mock_udpconnect_uv_send_buffer_size
[ RUN ] udp_noop
[ OK ] udp_noop
[ RUN ] udp_noresponse
[ OK ] udp_noresponse
[ RUN ] udp_shutdown_connect
[ OK ] udp_shutdown_connect
[ RUN ] udp_shutdown_read
[ OK ] udp_shutdown_read
[ RUN ] udp_cancel_read
[ OK ] udp_cancel_read
[ RUN ] udp_timeout_recovery
[ OK ] udp_timeout_recovery
[ RUN ] udp_double_read
[ OK ] udp_double_read
[ RUN ] udp_recv_one
[ OK ] udp_recv_one
[ RUN ] udp_recv_two
[ OK ] udp_recv_two
[ RUN ] udp_recv_send
PID 7802 exceeded run time limit, sending SIGABRT
```
Not sure what happened here. Threads seem to be idle. Artifacts were
preserved, including a core dump of the hung test process.
So far I have only seen this on `main`.

Status: Not planned

## Issue #4036: Run fuzz/ tests as part of unit CI jobs on bind-9.16
*Michal Nowak, 2023-04-26*

On `bind-9.16`, the `fuzz/` tests are [not being run as part of the unit CI jobs](https://gitlab.isc.org/isc-projects/bind9/-/jobs/3348523); they [are](https://gitlab.isc.org/isc-projects/bind9/-/jobs/3348327) run on `main` and `bind-9.18` as part of unit CI jobs.
On `bind-9.16`, `fuzz/` tests are only run when the `check` target is run from the root of the project.
Hence isc-projects/bind9#4035 was missed.

## Issue #4026: REQUIRE(handle->sock->tid == isc_tid()) shortly after zone file changed in xferquota system test
*Michal Nowak, 2023-11-02*

Job [#3337779](https://gitlab.isc.org/isc-projects/bind9/-/jobs/3337779) failed for 6892e463bf7daf93c332d33919c20f978a39b539 on OpenBSD.
```
S:xferquota:2023-04-20T08:37:42+0000
T:xferquota:1:A
A:xferquota:System test xferquota
I:xferquota:PORTS:14569,14570,14571,14572,14573,14574,14575,14576,14577,14578,14579,14580,14581
I:xferquota:starting servers
I:xferquota:Have 70 zones up in 1 seconds
I:xferquota:Changing test zone...
I:xferquota:Have 70 zones up in 2 seconds
...
I:xferquota:Have 70 zones up in 359 seconds
I:xferquota:Took too long to load zones
I:xferquota:stopping servers
I:xferquota:ns1 died before a SIGTERM was sent
```
The core dump timestamp is 8:37:49 UTC, 7 seconds after the test started, when `netmgr/netmgr.c:1016: REQUIRE(handle->sock->tid == isc_tid()) failed` was hit.
```
D:/builds/isc-projects/bind9/bin/tests/system/xferquota:Core was generated by `named'.
D:/builds/isc-projects/bind9/bin/tests/system/xferquota:Program terminated with signal SIGABRT, Aborted.
D:/builds/isc-projects/bind9/bin/tests/system/xferquota:#0 thrkill () at /tmp/-:3
D:/builds/isc-projects/bind9/bin/tests/system/xferquota:[Current thread is 1 (process 317378)]
D:/builds/isc-projects/bind9/bin/tests/system/xferquota:#0 thrkill () at /tmp/-:3
D:/builds/isc-projects/bind9/bin/tests/system/xferquota:#1 0x00000b231f341b8e in _libc_abort () at /usr/src/lib/libc/stdlib/abort.c:51
D:/builds/isc-projects/bind9/bin/tests/system/xferquota:#2 0x00000b2070b225f2 in assertion_failed (file=0xb22c97fc318 "netmgr/netmgr.c", line=1016, type=isc_assertiontype_require, cond=<optimized out>) at main.c:222
D:/builds/isc-projects/bind9/bin/tests/system/xferquota:#3 0x00000b22c98236a0 in isc_assertion_failed (file=0x0, line=6, type=isc_assertiontype_require, cond=0xb231f370d6a <thrkill+10> "r\001\303d\211\004% ") at assertions.c:48
D:/builds/isc-projects/bind9/bin/tests/system/xferquota:#4 0x00000b22c9811a1e in isc__nmhandle_detach (handlep=<optimized out>) at netmgr/netmgr.c:1016
D:/builds/isc-projects/bind9/bin/tests/system/xferquota:#5 0x00000b2294782f20 in dispentry_destroy (resp=0xb234a87c820) at dispatch.c:469
D:/builds/isc-projects/bind9/bin/tests/system/xferquota:#6 dns_dispentry_unref (ptr=0xb234a87c820) at dispatch.c:488
D:/builds/isc-projects/bind9/bin/tests/system/xferquota:#7 0x00000b2294849ae2 in request_cancel (request=0xb234a867820) at request.c:805
D:/builds/isc-projects/bind9/bin/tests/system/xferquota:#8 dns_request_cancel (request=0xb234a867820) at request.c:818
D:/builds/isc-projects/bind9/bin/tests/system/xferquota:#9 0x00000b2294849948 in dns_requestmgr_shutdown (requestmgr=<optimized out>) at request.c:192
D:/builds/isc-projects/bind9/bin/tests/system/xferquota:#10 0x00000b229488dcfc in dns_view_detach (viewp=<optimized out>) at view.c:479
D:/builds/isc-projects/bind9/bin/tests/system/xferquota:#11 0x00000b2070b38a0f in load_configuration (filename=<optimized out>, server=<optimized out>, first_time=<optimized out>) at server.c:9730
D:/builds/isc-projects/bind9/bin/tests/system/xferquota:#12 0x00000b2070b26d0c in loadconfig (server=0xb230654ca20) at server.c:10306
D:/builds/isc-projects/bind9/bin/tests/system/xferquota:#13 reload (server=0xb230654ca20) at server.c:10332
D:/builds/isc-projects/bind9/bin/tests/system/xferquota:#14 0x00000b22c9823a50 in isc__async_cb (handle=<optimized out>) at async.c:84
D:/builds/isc-projects/bind9/bin/tests/system/xferquota:#15 0x00000b22df1f474d in uv.async_io () from /usr/local/lib/libuv.so.4.1
D:/builds/isc-projects/bind9/bin/tests/system/xferquota:#16 0x00000b22df206812 in uv.io_poll () from /usr/local/lib/libuv.so.4.1
D:/builds/isc-projects/bind9/bin/tests/system/xferquota:#17 0x00000b22df1f4e08 in uv_run () from /usr/local/lib/libuv.so.4.1
D:/builds/isc-projects/bind9/bin/tests/system/xferquota:#18 0x00000b22c9838e82 in loop_run (loop=0xb22951bc020) at loop.c:272
D:/builds/isc-projects/bind9/bin/tests/system/xferquota:#19 loop_thread (arg=0xb22951bc020) at loop.c:299
D:/builds/isc-projects/bind9/bin/tests/system/xferquota:#20 0x00000b22c9838d4f in isc_loopmgr_run (loopmgr=0xb230654ff20) at loop.c:473
D:/builds/isc-projects/bind9/bin/tests/system/xferquota:#21 0x00000b2070b2209e in main (argc=<optimized out>, argv=<optimized out>) at main.c:1513
```
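The backtrace shows `isc__nmhandle_detach()` running on the loop thread that handled the `rndc` reload, while the `REQUIRE(handle->sock->tid == isc_tid())` assertion demands that a handle be detached on the loop thread that owns its socket. The ownership pattern can be sketched as follows; this is a toy Python model with illustrative names, not the BIND 9 netmgr API: the handle records its owner's thread id, a direct detach asserts thread affinity, and cross-thread callers instead post the detach to the owner's event queue.

```python
import queue
import threading

class Handle:
    """Toy model of a handle pinned to one loop thread
    (names are illustrative, not the BIND 9 API)."""

    def __init__(self, owner_tid, owner_queue):
        self.tid = owner_tid      # thread that owns the socket
        self.queue = owner_queue  # that thread's event queue
        self.attached = True

    def detach(self):
        # Equivalent of REQUIRE(handle->sock->tid == isc_tid()):
        # detaching on a foreign thread is a hard error.
        assert threading.get_ident() == self.tid, "detach on wrong thread"
        self.attached = False

    def detach_async(self):
        # Safe pattern for cross-thread callers: post the detach
        # to the owning loop instead of running it in place.
        self.queue.put(self.detach)

def loop(q, ready, done):
    ready.set()
    while True:
        job = q.get()
        if job is None:
            break
        job()
        done.set()

q = queue.Queue()
ready, done = threading.Event(), threading.Event()
t = threading.Thread(target=loop, args=(q, ready, done))
t.start()
ready.wait()

h = Handle(owner_tid=t.ident, owner_queue=q)
h.detach_async()   # OK: the detach runs on the owner thread
done.wait()
q.put(None)
t.join()
print(h.attached)  # -> False
```

Calling `h.detach()` directly from the main thread would trip the assertion, which is the shape of the failure in the backtrace above.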
Artifacts were saved in GitLab CI.

Status: Not planned
Assignee: Ondřej Surý

---

https://gitlab.isc.org/isc-projects/bind9/-/issues/4024
named took too long to terminate in respdiff-long-third-party job
2023-05-24T11:27:31Z, Michal Nowak

Job [#3333558](https://gitlab.isc.org/isc-projects/bind9/-/jobs/3333558) failed for 6892e463bf7daf93c332d33919c20f978a39b539.

60 seconds were not enough for `named` in `respdiff-long-third-party` to terminate, so it was aborted instead.
```
[Current thread is 1 (Thread 0x7f80690d6140 (LWP 17957))]
#0 0x00007f806b984d56 in epoll_wait (epfd=4, events=0x7ffe696dc2c0, maxevents=1024, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1 0x00007f806bd6d83a in uv__io_poll (loop=0x7f8068ca2220, timeout=-1) at /usr/src/libuv-v1.44.1/src/unix/epoll.c:236
#2 0x00007f806bd528ae in uv_run (loop=0x7f8068ca2220, mode=UV_RUN_DEFAULT) at /usr/src/libuv-v1.44.1/src/unix/core.c:391
#3 0x00007f806c3e1b73 in loop_run (loop=loop@entry=0x7f8068ca2200) at loop.c:272
#4 0x00007f806c3e1c16 in loop_thread (arg=0x7f8068ca2200) at loop.c:299
#5 0x00007f806c3e29af in isc_loopmgr_run (loopmgr=0x7f8068c1c1c0) at loop.c:473
#6 0x000055ea5742e0e0 in main (argc=<optimized out>, argv=<optimized out>) at main.c:1513
```
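The stack shows the loop parked in `epoll_wait()` with `timeout=-1`, i.e. it only returns when some descriptor becomes readable. Event loops make cross-thread shutdown possible through a wakeup descriptor (libuv's `uv_async_send` is built on this mechanism): shutdown writes a byte, and the blocked poll returns. A minimal Python model of that wakeup, using `select` in place of epoll:

```python
import os
import select
import threading

# A loop blocked forever in a poll call, like epoll_wait(timeout=-1),
# can only be unblocked by making one of its descriptors readable.
rfd, wfd = os.pipe()
stopped = threading.Event()

def loop():
    while True:
        readable, _, _ = select.select([rfd], [], [])  # blocks like epoll_wait
        if rfd in readable:
            os.read(rfd, 1)
            stopped.set()
            return

t = threading.Thread(target=loop)
t.start()
os.write(wfd, b"x")  # the "async wakeup": unblocks the poll
t.join(timeout=5)
print(stopped.is_set())  # -> True
```

If the shutdown path never performs this wakeup (or the loop never processes it), the process hangs exactly as captured in the backtrace.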
Full backtrace and logs were saved as part of CI artifacts.

---

https://gitlab.isc.org/isc-projects/bind9/-/issues/4017
delv +ns does not respect +nomultiline
2023-07-24T09:40:40Z, Petr Špaček <pspacek@isc.org>

### Summary
Even with `+nomultiline` (which is presumably still the default), `delv +ns` produces records split across lines.
### BIND version used
* ~"Affects v9.19": 453aaac
### Steps to reproduce
```console
$ delv +ns +nomultiline
```
### What is the current *bug* behavior?
See RRSIG at the bottom:
```
;; ANSWER SECTION:
;. 518400 IN NS g.root-servers.net.
;. 518400 IN NS j.root-servers.net.
;. 518400 IN NS e.root-servers.net.
;. 518400 IN NS l.root-servers.net.
;. 518400 IN NS d.root-servers.net.
;. 518400 IN NS a.root-servers.net.
;. 518400 IN NS b.root-servers.net.
;. 518400 IN NS i.root-servers.net.
;. 518400 IN NS m.root-servers.net.
;. 518400 IN NS h.root-servers.net.
;. 518400 IN NS c.root-servers.net.
;. 518400 IN NS k.root-servers.net.
;. 518400 IN NS f.root-servers.net.
;. 518400 IN RRSIG NS 8 0 518400 (
; 20230430050000 20230417040000 60955 .
; ixbH/37glxgsTPCpCAuQPTDMH98e
; 70cquz9G9NRI+ex75JQzxeAUMcsw
; TtiY19vVTEPfrbRorDAxLRC720BV
; pJ9ZOQBBl8A9ss2R022TCSoBR44d
; BqY2e7M5nyUBaIkFkvF9+wyxa24+
; MHBli9qC91C+4uuTpqVhZjtnOjKQ
; 8UMRVZoZ5qTrn6EV9x5qq5akItf7
; hi1BEjJKylYdcplg5x3JDqkKGnso
; OS45mvo3fOt0owujArlEnsPy8+I3
; LwL1W68VdjG1CnTEp2HFpqbnoxQ1
; KhpKWErf/HEYxOnDgsXljUDuWOEX
; wj+UOYSnRzRekFGfSu211D447iHl
; 8XHISQ== )
```
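The parenthesized continuation lines above can be mechanically collapsed into the single-line form that `+nomultiline` is expected to produce. A Python sketch of that normalization (illustrative only, not delv's implementation):

```python
import re

def to_single_line(rr_text: str) -> str:
    """Collapse a multi-line (parenthesized) RR presentation into
    the single-line form expected with +nomultiline.
    Illustrative sketch, not delv's code."""
    # Strip the leading ';' comment markers, join the lines, and
    # drop the '( ... )' continuation grouping.
    joined = " ".join(line.lstrip("; \t") for line in rr_text.splitlines())
    joined = joined.replace("(", " ").replace(")", " ")
    return re.sub(r"\s+", " ", joined).strip()

# Abbreviated version of the RRSIG output shown above:
multiline = (";.\t\t\t518400\tIN\tRRSIG\tNS 8 0 518400 (\n"
             ";\t\t\t\t20230430050000 20230417040000 60955 .\n"
             ";\t\t\t\tixbH/37g... )")
print(to_single_line(multiline))
# -> . 518400 IN RRSIG NS 8 0 518400 20230430050000 20230417040000 60955 . ixbH/37g...
```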
### What is the expected *correct* behavior?
RRSIG (or any other RR) is not split across lines.

---

https://gitlab.isc.org/isc-projects/bind9/-/issues/4014
Implement tests for maximum global and idle time for incoming XFR
2023-05-04T14:23:14Z, Ondřej Surý

Spin-off from !7810 to not forget to write pytests for maximum global and idle time for incoming XFR.

Status: Not planned
Assignee: Tom Krizek

---

https://gitlab.isc.org/isc-projects/bind9/-/issues/4013
Add more tests for #4001 and #4002
2023-07-03T15:47:55Z, Ondřej Surý

This is a follow-up from !7805 to add more tests for the source-port configuration.
To quote @pspacek:
> Well, this is a pretty large change and needs tests. If nothing else, I would like to see what happens if:
>
> * attempt to open TCP connection ends up in packet black-hole
> * connection is established and the remote side does not respond (established connection hangs)
> * the remote side responds with something which does not parse as DNS
> * the remote side sends a mismatching NOTIFY answer (say, a different zone name)

---

https://gitlab.isc.org/isc-projects/bind9/-/issues/4010
Allow for scripts / hooks for key rollovers
2023-04-11T12:43:32Z, Karol Babioch

### Description
It seems like there is currently no good way to automate a KSK rollover, since the corresponding DS record has to be published in the parent zone. While there is [RFC7344](https://datatracker.ietf.org/doc/html/rfc7344), in reality it is not widely adopted; personally, I don't know of any registrar who supports this yet. In any case, this would require TSIG to be secure.
One of my registrars offers an HTTPS-based API to manage DNSSEC records, so it's possible to write scripts that will automate the key rollover process.
### Request
There should be a way to trigger a script (with some inputs such as the key id, the DS record, etc.) whenever BIND is about to rotate a key. This way it should be possible to use `dnssec-policy` and fully automate the key rollover process, including the `KSK` key (rather than only the `ZSK` key).
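Such a hook could be a small script that receives the key id and DS record and pushes them to the registrar's API. A sketch of what that hook might look like; the URL, token variable, and payload shape are all hypothetical, since neither the hook interface nor any concrete registrar API exists in BIND today:

```python
import json
import os
import urllib.request

def push_ds_to_registrar(zone: str, key_tag: int, ds_record: str,
                         api_url: str, token: str) -> urllib.request.Request:
    """Build the HTTPS request a rollover hook might send to a
    registrar API. All names and the payload shape are hypothetical."""
    payload = json.dumps({
        "zone": zone,
        "key_tag": key_tag,
        "ds": ds_record,
    }).encode()
    return urllib.request.Request(
        api_url,
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="PUT",
    )

# Example invocation with made-up values; a real hook would then call
# urllib.request.urlopen() and report success back to the key manager.
req = push_ds_to_registrar(
    zone="example.com",
    key_tag=60955,
    ds_record="example.com. IN DS 60955 8 2 ABCD...",
    api_url="https://registrar.example/api/dnssec",
    token=os.environ.get("REGISTRAR_TOKEN", "dummy"),
)
print(req.get_method(), req.full_url)
```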
### Links / references

Status: Not planned