BIND issueshttps://gitlab.isc.org/isc-projects/bind9/-/issues2021-12-21T05:23:52Zhttps://gitlab.isc.org/isc-projects/bind9/-/issues/2433investigate and improve lock contention around mctx2021-12-21T05:23:52ZPetr Špačekpspacek@isc.orginvestigate and improve lock contention around mctx### Summary
It _seems_ that locks around shared memory contexts (mctx) are contended for in various scenarios. This leads to worse performance.
### BIND version used
a23c5d2921e43aaea06d12b1b0f8de2fa6d07759
### Steps to reproduce
- ...### Summary
It _seems_ that locks around shared memory contexts (mctx) are contended for in various scenarios. This leads to worse performance.
### BIND version used
a23c5d2921e43aaea06d12b1b0f8de2fa6d07759
### Steps to reproduce
- Compile named using these options:
`OPTIMIZE="-Og" CFLAGS="g3 -ggdb -Wno-deprecated-declarations -fno-omit-frame-pointer -fno-optimize-sibling-calls -fPIC -rdynamic" LDFLAGS="-fPIE"`
- Run [mutrace](https://github.com/bconry/mutrace) and named with more threads:
`mutrace --hash-size=594327 -d named -n 32 -g -c /dev/null`
Note: Systems with binutils 2.34+ require mutrace patch https://github.com/bconry/mutrace/pull/4
- Send some cache hit traffic to named. E.g. run:
`yes '. NS' | dnsperf -c 100 -l 10`
- SIGINT named when dnsperf finishes.
- While reading output of mutrace ignore misleading line `mutrace.c:750 unlock_hash()` in stack tracebacks if it is present ([depends on mutrace version](https://github.com/bconry/mutrace/issues/5)).
### What is the current *bug* behavior?
Lock contention, leading to bad performance.
### Relevant logs and/or screenshots
[mutrace.log](/uploads/71b0d927b5950a422392d1556c496cee/mutrace.log)
Most contended mutex is:
```
Mutex #60844 (0x0x7f442da070d0) first referenced by:
mutrace.c:750 unlock_hash()
mutex.c:288 isc__mutex_init()
netmgr.c:249 isc_nm_start()
main.c:934 create_managers()
main.c:1248 setup()
main.c:1555 main()
??:0 __libc_start_main()
```
In source code netmgr.c:249 it is lock created on line:
```
249 isc_mempool_create(mgr->mctx, sizeof(isc__netievent_storage_t),
250 &mgr->evpool);
```
I.e. it is locking around `mgr->mctx`.
This is simplest way to show lock contention with tools. During high-QPS benchmarking using `kxdpgun` the lock contention around `mgr->mctx` was indeed creating measurable performance problem. On 16 core system this shared mctx leads to performance drop 3x when compared with situation where each thread has its own mctx.
### Possible fixes
Generally having a separate `mctx` per thread might be a good idea, but it needs careful design so objects which get passed between threads can be de/reallocated correctly.June 2021 (9.11.33, 9.11.33-S1, 9.16.17/9.16.18, 9.16.17-S1/9.16.18-S1, 9.17.14/9.17.15)Ondřej SurýOndřej Surýhttps://gitlab.isc.org/isc-projects/bind9/-/issues/2515Performance drop of 10 % in LTT/main/root zone/dnsgen Perflab scenario2021-05-24T15:18:45ZMichal NowakPerformance drop of 10 % in LTT/main/root zone/dnsgen Perflab scenarioSince isc-projects/bind9!4659 was merged the `LTT/main/root zone/dnsgen` [Perflab scenario](https://perflab.isc.org/#/config/run/5bf195f683ba91a870b29770), and only that one, shows persistent 10 % performance drop from 460,000 to 420,000...Since isc-projects/bind9!4659 was merged the `LTT/main/root zone/dnsgen` [Perflab scenario](https://perflab.isc.org/#/config/run/5bf195f683ba91a870b29770), and only that one, shows persistent 10 % performance drop from 460,000 to 420,000 qps.June 2021 (9.11.33, 9.11.33-S1, 9.16.17/9.16.18, 9.16.17-S1/9.16.18-S1, 9.17.14/9.17.15)Ondřej SurýOndřej Surýhttps://gitlab.isc.org/isc-projects/bind9/-/issues/2597Make calling generic rdata methods consistent2021-03-29T14:02:33ZMark AndrewsMake calling generic rdata methods consistentApril 2021 (9.11.30/9.11.31, 9.11.30-S1/9.11.31-S1, 9.16.14/9.16.15, 9.16.14-S1/9.16.15-S1, 9.17.12)https://gitlab.isc.org/isc-projects/bind9/-/issues/2636timing race in setnsec3param task2021-04-19T09:49:09ZOndřej Surýtiming race in setnsec3param taskIn `zone_postload()` the queued `zone->setnsec3param_queue` get scheduled as regular tasks:
> /*
> * Process any queued NSEC3PARAM change requests. Only for dynamic
> * zones, an inline-signing zone will perfor...In `zone_postload()` the queued `zone->setnsec3param_queue` get scheduled as regular tasks:
> /*
> * Process any queued NSEC3PARAM change requests. Only for dynamic
> * zones, an inline-signing zone will perform this action when
> * receiving the secure db (receive_secure_db).
> */
However, the original reason why the NSEC3PARAM change requests were originally queued depends on following code:
> /*
> * setnsec3param() will silently return early if the zone does not yet
> * have a database. Prevent that by queueing the event up if zone->db
> * is NULL. All events queued here are subsequently processed by
> * receive_secure_db() if it ever gets called or simply freed by
> * zone_free() otherwise.
> */
So, when all the queued events gets scheduled as tasks in the `zone_postload()` and the tasks are fired up, the `zone->db` still might be `NULL`. Due to a difference in scheduling this doesn't really happen with the old (separate) task manager, but it's very easy to trigger in the taskmgr@netmgr.May 2021 (9.11.32, 9.11.32-S1, 9.16.16, 9.16.16-S1, 9.17.13)Ondřej SurýOndřej Surýhttps://gitlab.isc.org/isc-projects/bind9/-/issues/2638Run internal tasks on top of network manager worker loops2021-09-02T09:40:28ZOndřej SurýRun internal tasks on top of network manager worker loopsAfter the networking manager was introduced, the existing taskmgr kept its own set of worker threads competing with the netmgr threads. This issue is about moving the tasks to run on top of netmgr loops while keeping the existing interfa...After the networking manager was introduced, the existing taskmgr kept its own set of worker threads competing with the netmgr threads. This issue is about moving the tasks to run on top of netmgr loops while keeping the existing interface.
---
The primary merge requests implementing this change are:
- !4918
- !4983
The above changes required some follow-up tweaks, which were implemented in:
- !4980
- !4981
- !4982
- !5006
- !5008
- !5009
- !5010May 2021 (9.11.32, 9.11.32-S1, 9.16.16, 9.16.16-S1, 9.17.13)Ondřej SurýOndřej Surýhttps://gitlab.isc.org/isc-projects/bind9/-/issues/2654Create isc_managers API2021-05-11T08:58:27ZOndřej SurýCreate isc_managers APIAll the BIND 9 binaries have the same sequence of taskmgr/netmgr/timermgr/... ctors and dtors. Move this into a separate API, so we don't duplicate the code at various places.All the BIND 9 binaries have the same sequence of taskmgr/netmgr/timermgr/... ctors and dtors. Move this into a separate API, so we don't duplicate the code at various places.May 2021 (9.11.32, 9.11.32-S1, 9.16.16, 9.16.16-S1, 9.17.13)Ondřej SurýOndřej Surýhttps://gitlab.isc.org/isc-projects/bind9/-/issues/2690Remove Windows support for BIND 9.17/9.18+2022-01-21T13:43:04ZOndřej SurýRemove Windows support for BIND 9.17/9.18+This is a tracking issue to remove the Windows port from BIND 9 source code for the next stable release.This is a tracking issue to remove the Windows port from BIND 9 source code for the next stable release.July 2021 (9.11.34, 9.11.34-S1, 9.16.19, 9.16.19-S1, 9.17.16)Ondřej SurýOndřej Surýhttps://gitlab.isc.org/isc-projects/bind9/-/issues/2691Remove native PKCS#11 support from BIND 9.17/9.18+2022-01-19T11:20:49ZOndřej SurýRemove native PKCS#11 support from BIND 9.17/9.18+This a tracking issue to remove the native PKCS#11 implementation from BIND 9 source code (including the documentation).This a tracking issue to remove the native PKCS#11 implementation from BIND 9 source code (including the documentation).October 2021 (9.11.36, 9.11.36-S1, 9.16.22, 9.16.22-S1, 9.17.19)Ondřej SurýOndřej Surýhttps://gitlab.isc.org/isc-projects/bind9/-/issues/2732Zone dumping is blocking the networking IO2021-06-07T12:43:40ZOndřej SurýZone dumping is blocking the networking IOJune 2021 (9.11.33, 9.11.33-S1, 9.16.17/9.16.18, 9.16.17-S1/9.16.18-S1, 9.17.14/9.17.15)Ondřej SurýOndřej Surýhttps://gitlab.isc.org/isc-projects/bind9/-/issues/2788Use tolower()/toupper()/isupper() from ctype.h2021-07-08T08:37:51ZOndřej SurýUse tolower()/toupper()/isupper() from ctype.hThis was suggested by Rick and my initial answer was but it is locale aware, so a) it will mess the names when in non-POSIX locale and b) it must be slower.
Turns out that a) can be easily fixed (any program that doesn't call `setlocale...This was suggested by Rick and my initial answer was but it is locale aware, so a) it will mess the names when in non-POSIX locale and b) it must be slower.
Turns out that a) can be easily fixed (any program that doesn't call `setlocale()` is running with **POSIX** locale) and b) isn't true. In fact, using the `tolower()` was fastest out of all the implementations we used or considered.July 2021 (9.11.34, 9.11.34-S1, 9.16.19, 9.16.19-S1, 9.17.16)Ondřej SurýOndřej Surýhttps://gitlab.isc.org/isc-projects/bind9/-/issues/2843EC_KEY has been deprecated on OpenSSL 3.0.02022-01-19T11:20:50ZMark AndrewsEC_KEY has been deprecated on OpenSSL 3.0.0Need to workout what the replacement code needs to be as builds fail on strict systems.Need to workout what the replacement code needs to be as builds fail on strict systems.November 2021 (9.16.23, 9.16.23-S1, 9.17.20)Arаm SаrgsyаnArаm Sаrgsyаnhttps://gitlab.isc.org/isc-projects/bind9/-/issues/2851A rare crash in the DoH code caused by an assert2021-08-17T09:01:28ZArtem BoldarievA rare crash in the DoH code caused by an assertA rare issue was found while working on !5309. The unit test suite (`doh_test`) revealed a situation when `session->handle` got detached too early in the `http_send_outgoing()`, the function which takes data from nghttp2 and sends it via...A rare issue was found while working on !5309. The unit test suite (`doh_test`) revealed a situation when `session->handle` got detached too early in the `http_send_outgoing()`, the function which takes data from nghttp2 and sends it via the underlying connection.
As a result, when we have reached the call to `isc_nm_send()`
```
session->sending++;
isc_nm_send(session->handle, &send->data, http_writecb, send);
return (true);
}
```
the `session->handle` was `NULL` triggering an assert:
```
void
isc_nm_send(isc_nmhandle_t *handle, isc_region_t *region, isc_nm_cb_t cb,
void *cbarg) {
REQUIRE(VALID_NMHANDLE(handle));
```September 2021 (9.16.21, 9.16.21-S1, 9.17.18)Artem BoldarievArtem Boldarievhttps://gitlab.isc.org/isc-projects/bind9/-/issues/2860The DoH endpoint URL in dig might get prepared in wrong format2021-08-31T13:41:21ZArtem BoldarievThe DoH endpoint URL in dig might get prepared in wrong formatOne of the causes why the problem #2858 appeared is that the URL in the DoH code might get generated in wrong format at least when IPv6 is used. In particular, in the issue `dig` was trying to connect to `https://::1:444/dns-query` inste...One of the causes why the problem #2858 appeared is that the URL in the DoH code might get generated in wrong format at least when IPv6 is used. In particular, in the issue `dig` was trying to connect to `https://::1:444/dns-query` instead of `https://[::1]:444/dns-query`. Obviously, this issue prevents `dig` from querying servers via its IPv6 addresses (querying hostnames available via IPv6 works fine).
Also, it will always generate a URL starting with `https://` regardless of the fact if we use HTTP with encryption or not.
So, while the fix for the problem reported is already prepared (!5319), this issue needs to be taken care of as well. The code which prepared the URL needs to be revisited.September 2021 (9.16.21, 9.16.21-S1, 9.17.18)Artem BoldarievArtem Boldarievhttps://gitlab.isc.org/isc-projects/bind9/-/issues/2926use netmgr for route sockets and remove isc_socket2022-01-21T13:44:36ZEvan Huntuse netmgr for route sockets and remove isc_socketThe last remaining use of `isc_socket` and `isc_socketmgr` in BIND is for the netlink/route sockets that are used to scan for interface changes.
The libuv documentation indicates that any socket that honors the datagram contract can be ...The last remaining use of `isc_socket` and `isc_socketmgr` in BIND is for the netlink/route sockets that are used to scan for interface changes.
The libuv documentation indicates that any socket that honors the datagram contract can be passed to `uv_udp_open()`, so we should be able to make the netmgr do this instead.November 2021 (9.16.23, 9.16.23-S1, 9.17.20)Evan HuntEvan Hunthttps://gitlab.isc.org/isc-projects/bind9/-/issues/2951assertion failure in req_senddone2022-01-21T07:32:54ZMark Andrewsassertion failure in req_senddoneJob [#2037401](https://gitlab.isc.org/isc-projects/bind9/-/jobs/2037401) failed for c177c33c2773d7efa4b5af5460fd354450529d05:
```
D:checkds:Core was generated by `/builds/isc-projects/bind9/bin/named/.libs/named -D checkds-ns9 -X named.l...Job [#2037401](https://gitlab.isc.org/isc-projects/bind9/-/jobs/2037401) failed for c177c33c2773d7efa4b5af5460fd354450529d05:
```
D:checkds:Core was generated by `/builds/isc-projects/bind9/bin/named/.libs/named -D checkds-ns9 -X named.lock -'.
D:checkds:Program terminated with signal SIGABRT, Aborted.
D:checkds:#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
D:checkds:[Current thread is 1 (Thread 0x7f81a97fa700 (LWP 1941))]
D:checkds:#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
D:checkds:#1 0x00007f81b069e535 in __GI_abort () at abort.c:79
D:checkds:#2 0x0000000000420cac in assertion_failed (file=0x7f81b12ee461 "request.c", line=<optimized out>, type=<optimized out>, cond=<optimized out>) at main.c:237
D:checkds:#3 0x00007f81b1372b7a in isc_assertion_failed (file=0x2 <error: Cannot access memory at address 0x2>, line=-1451276416, line@entry=1029, type=type@entry=isc_assertiontype_require, cond=0x7f81b06b37bb <__GI_raise+267> "H\213\214$\b\001") at assertions.c:47
D:checkds:#4 0x00007f81b123b4e2 in req_senddone (eresult=<optimized out>, region=<optimized out>, arg=<optimized out>) at request.c:1029
D:checkds:#5 0x00007f81b11727f6 in send_done (handle=0x7f819db70d80, result=ISC_R_CANCELED, cbarg=<optimized out>) at dispatch.c:1770
D:checkds:#6 0x00007f81b1363b7f in isc__nm_async_sendcb (worker=<optimized out>, ev0=ev0@entry=0x7f81a5843580) at netmgr/netmgr.c:2788
D:checkds:#7 0x00007f81b13603c6 in process_netievent (worker=worker@entry=0x7f81adc4a030, ievent=ievent@entry=0x7f81a5843580) at netmgr/netmgr.c:960
D:checkds:#8 0x00007f81b135baf1 in process_queue (worker=0x7f81adc4a030, type=NETIEVENT_NORMAL) at netmgr/netmgr.c:998
D:checkds:#9 process_all_queues (worker=0x7f81adc4a030) at netmgr/netmgr.c:746
D:checkds:#10 async_cb (handle=0x7f81adc4a390) at netmgr/netmgr.c:775
D:checkds:#11 0x00007f81b0c976d8 in ?? () from /usr/lib/x86_64-linux-gnu/libuv.so.1
D:checkds:#12 0x00007f81b0ca6530 in uv.io_poll () from /usr/lib/x86_64-linux-gnu/libuv.so.1
D:checkds:#13 0x00007f81b0c97ff5 in uv_run () from /usr/lib/x86_64-linux-gnu/libuv.so.1
D:checkds:#14 0x00007f81b135bc39 in nm_thread (worker0=0x7f81adc4a030) at netmgr/netmgr.c:681
D:checkds:#15 0x00007f81b13a1aea in isc__trampoline_run (arg=0x1948b00) at trampoline.c:185
D:checkds:#16 0x00007f81b0844fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
D:checkds:#17 0x00007f81b07754cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
```
```
Thread 1 (Thread 0x7f81a97fa700 (LWP 1941)):
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
set = {__val = {16387, 0, 0, 0, 3905522674892552757, 140195000463360, 140195000513668, 140195000691568, 140195006326720, 140194998247424, 0, 140194998247424, 140194871224000, 140194942605624, 140194942605616, 5}}
pid = <optimized out>
tid = <optimized out>
ret = <optimized out>
#1 0x00007f81b069e535 in __GI_abort () at abort.c:79
save_stage = 1
act = {__sigaction_handler = {sa_handler = 0x7f81b0c5e900, sa_sigaction = 0x7f81b0c5e900}, sa_mask = {__val = {140194987098112, 140194987177249, 140194988119184, 140195000149089, 140194871199616, 31, 1483, 140194175012544, 13, 140194993269024, 17674366533170704384, 206158430256, 4575977, 140195000149089, 14, 206158430248}}, sa_flags = -1451275824, sa_restorer = 0xe}
sigs = {__val = {32, 0 <repeats 15 times>}}
#2 0x0000000000420cac in assertion_failed (file=0x7f81b12ee461 "request.c", line=<optimized out>, type=<optimized out>, cond=<optimized out>) at main.c:237
tracebuf = {0x420cc4 <assertion_failed+148>, 0x7f81b1372b7a <isc_assertion_failed+10>, 0x7f81b123b4e2 <req_senddone+450>, 0x7f81b11727f6 <send_done+54>, 0x7f81b1363b7f <isc__nm_async_sendcb+143>, 0x7f81b13603c6 <process_netievent+1222>, 0x7f81b135baf1 <async_cb+145>, 0x7f81b0c976d8, 0x7f81b0ca6530 <uv.io_poll+800>, 0x7f81b0c97ff5 <uv_run+277>, 0x7f81b135bc39 <nm_thread+121>, 0x7f81b13a1aea <isc__trampoline_run+74>, 0x7f81b0844fa3 <start_thread+243>, 0x7f81b07754cf <clone+63>, 0x7f81b07754cf <clone+63>, 0x0, 0x0, 0x405, 0x7f81b12ee461, 0x7f81a576f990, 0x7f81b0679668, 0x7f819c2b4e00, 0x7f81adc4a030, 0x7f81b18e2b00 <_dl_fixup+208>, 0x5, 0xffff00001f80, 0x7f816e473000, 0xa0, 0x7f819ca05748, 0x7f81a97f4e10, 0x0, 0x7f81a97f9080, 0x7f819ca05708, 0x7f81b043efaf, 0x7f81ade5f740, 0x7f81adea5040, 0x7f81adedc6c0, 0x7f81adedc6c0, 0x7f81adedc6c0, 0x7f81adedd340, 0x7f81adedd340, 0x7f81adedd440, 0xdededededededede, 0xdededededededede, 0xff00ffff00ff00, 0xff00ff00ff00ff, 0xff000000, 0x0, 0x3036346536313866, 0x756873203a303034, 0x0, 0x10, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x7325732500732520, 0x7325732573257325, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x4200000042, 0x4200000042, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x7f81adc09000, 0x7f816e460400, 0x7f81b138437b <isc__mem_put+171>, 0x0, 0x7f816e460400, 0x7f819da00000, 0x7f816e460410, 0x0, 0x1, 0x7f816e460560, 0x7f816e460410, 0x2, 0x800000000000000e, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x7f81a97f4e70, 0x7f81b1171054 <tcp_recv+1828>, 0x7f81a56e2000, 0x700350a1f640002, 0x0 <repeats 15 times>, 0x10, 0xffffffffffffffff, 0xffffffffffffffff, 0x700350a1f640002, 0x0, 0x0, 0x0, 0x0, 0xf547f6b817b13000, 0x0, 0x7f81adc09000, 0x7f81a97f4e10, 0x7f81b13ac7a8, 0x7b0, 0x7f81adc09008}
nframes = <optimized out>
#3 0x00007f81b1372b7a in isc_assertion_failed (file=0x2 <error: Cannot access memory at address 0x2>, line=-1451276416, line@entry=1029, type=type@entry=isc_assertiontype_require, cond=0x7f81b06b37bb <__GI_raise+267> "H\213\214$\b\001") at assertions.c:47
No locals.
#4 0x00007f81b123b4e2 in req_senddone (eresult=<optimized out>, region=<optimized out>, arg=<optimized out>) at request.c:1029
request = <optimized out>
#5 0x00007f81b11727f6 in send_done (handle=0x7f819db70d80, result=ISC_R_CANCELED, cbarg=<optimized out>) at dispatch.c:1770
resp = 0x0
#6 0x00007f81b1363b7f in isc__nm_async_sendcb (worker=<optimized out>, ev0=ev0@entry=0x7f81a5843580) at netmgr/netmgr.c:2788
ievent = <optimized out>
sock = 0x7f816e473000
uvreq = 0x0
eresult = ISC_R_CANCELED
#7 0x00007f81b13603c6 in process_netievent (worker=worker@entry=0x7f81adc4a030, ievent=ievent@entry=0x7f81a5843580) at netmgr/netmgr.c:960
No locals.
#8 0x00007f81b135baf1 in process_queue (worker=0x7f81adc4a030, type=NETIEVENT_NORMAL) at netmgr/netmgr.c:998
waiting = 4
ievent = 0x7f81a5843580
stop = <optimized out>
#9 process_all_queues (worker=0x7f81adc4a030) at netmgr/netmgr.c:746
result = <optimized out>
type = 3
reschedule = true
type = <optimized out>
result = <optimized out>
#10 async_cb (handle=0x7f81adc4a390) at netmgr/netmgr.c:775
worker = 0x7f81adc4a030
#11 0x00007f81b0c976d8 in ?? () from /usr/lib/x86_64-linux-gnu/libuv.so.1
No symbol table info available.
#12 0x00007f81b0ca6530 in uv.io_poll () from /usr/lib/x86_64-linux-gnu/libuv.so.1
No symbol table info available.
#13 0x00007f81b0c97ff5 in uv_run () from /usr/lib/x86_64-linux-gnu/libuv.so.1
No symbol table info available.
#14 0x00007f81b135bc39 in nm_thread (worker0=0x7f81adc4a030) at netmgr/netmgr.c:681
r = <optimized out>
worker = 0x7f81adc4a030
mgr = 0x7f81adc45000
#15 0x00007f81b13a1aea in isc__trampoline_run (arg=0x1948b00) at trampoline.c:185
trampoline = 0x1948b00
result = <optimized out>
#16 0x00007f81b0844fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
ret = <optimized out>
pd = <optimized out>
now = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140194871224064, -130959809494454272, 140721831580654, 140721831580655, 140194871224064, 26512128, 84985927855435776, 84967749757497344}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = <optimized out>
#17 0x00007f81b07754cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.
```January 2022 (9.18.0)https://gitlab.isc.org/isc-projects/bind9/-/issues/2969Assertion failure in dns_resolver_destroyfetch()2022-05-16T10:19:55ZEvan HuntAssertion failure in dns_resolver_destroyfetch()This often seems to occur in the `forward` system test, while chasing DS records. It looks as if `fctx_cancelqueries()` is being run at the same time by two threads.
```
#2 0x000055d09e875e47 in assertion_failed (file=0x7fc3ca45280b "r...This often seems to occur in the `forward` system test, while chasing DS records. It looks as if `fctx_cancelqueries()` is being run at the same time by two threads.
```
#2 0x000055d09e875e47 in assertion_failed (file=0x7fc3ca45280b "resolver.c",
line=10603, type=isc_assertiontype_require,
cond=0x7fc3ca453c78 "((fctx) != ((void *)0) && ((const isc__magic_t *)(fctx))->magic == ((('F') << 24 | ('!') << 16 | ('!') << 8 | ('!'))))") at main.c:236
#3 0x00007fc3ca4e3c8d in isc_assertion_failed (
file=0x7fc3ca45280b "resolver.c", line=10603,
type=isc_assertiontype_require,
cond=0x7fc3ca453c78 "((fctx) != ((void *)0) && ((const isc__magic_t *)(fctx))->magic == ((('F') << 24 | ('!') << 16 | ('!') << 8 | ('!'))))")
at assertions.c:47
#4 0x00007fc3ca377303 in dns_resolver_destroyfetch (fetchp=0x7fc3c2bf4e08)
at resolver.c:10603
#5 0x00007fc3ca3ae320 in fetch_callback_ds (task=0x7fc3b6f55b80, event=0x0)
at validator.c:620
#6 0x00007fc3ca5126f9 in task_run (task=0x7fc3b6f55b80) at task.c:827
#7 0x00007fc3ca51290e in isc_task_run (task=0x7fc3b6f55b80) at task.c:907
#8 0x00007fc3ca4c2018 in isc__nm_async_task (worker=0x7fc3c7049df0,
ev0=0x7fc3b5a13780) at netmgr/netmgr.c:834
#9 0x00007fc3ca4c229c in process_netievent (worker=0x7fc3c7049df0,
ievent=0x7fc3b5a13780) at netmgr/netmgr.c:913
#10 0x00007fc3ca4c2dd5 in process_queue (worker=0x7fc3c7049df0,
type=NETIEVENT_TASK) at netmgr/netmgr.c:1007
#11 0x00007fc3ca4c1e59 in process_all_queues (worker=0x7fc3c7049df0)
at netmgr/netmgr.c:753
#12 0x00007fc3ca4c1edc in async_cb (handle=0x7fc3c704a150)
at netmgr/netmgr.c:782
#13 0x00007fc3c9df1a51 in uv__async_io (loop=0x7fc3c7049e00,
w=<optimized out>, events=<optimized out>) at src/unix/async.c:163
#14 0x00007fc3c9e035e5 in uv__io_poll (loop=loop@entry=0x7fc3c7049e00,
timeout=<optimized out>) at src/unix/linux-core.c:462
#15 0x00007fc3c9df21ec in uv_run (loop=0x7fc3c7049e00, mode=UV_RUN_DEFAULT)
at src/unix/core.c:389
```May 2022 (9.16.29, 9.16.29-S1, 9.18.3, 9.19.1)Evan HuntEvan Hunthttps://gitlab.isc.org/isc-projects/bind9/-/issues/3113Follow-up from "allow dns_clientinfo to store client ECS data"2023-02-27T14:15:08ZEvan HuntFollow-up from "allow dns_clientinfo to store client ECS data"The following discussion from !5555 should be addressed:
- [ ] @ondrej started a [discussion](https://gitlab.isc.org/isc-projects/bind9/-/merge_requests/5555#note_257882): (+1 comment)
> Honestly, I would not change `dns_clientinf...The following discussion from !5555 should be addressed:
- [ ] @ondrej started a [discussion](https://gitlab.isc.org/isc-projects/bind9/-/merge_requests/5555#note_257882): (+1 comment)
> Honestly, I would not change `dns_clientinfo_init()`, but would rather add extra function to set the ecs in the structure:
>
> ```suggestion:-2+0
> void
> dns_clientinfo_init(dns_clientinfo_t *ci, void *data, void *versionp);
>
> void
> dns_clientinfo_set_ecs(dns_clientinfo_t *ci, dns_ecs_t *ecs);
> ```
>
> That way the amount of changes to the other code would be minimized as if I count correctly, it's only three places, where we set the `ecs` data in the `dns_clientinfo` structure.March 2023 (9.16.39, 9.16.39-S1, 9.18.13, 9.18.13-S1, 9.19.11)https://gitlab.isc.org/isc-projects/bind9/-/issues/3122DoT stops working after "rndc reconfigure" when running named as non-root2024-01-11T14:13:01ZsthaugDoT stops working after "rndc reconfigure" when running named as non-root<!--
If the bug you are reporting is potentially security-related - for example,
if it involves an assertion failure or other crash in `named` that can be
triggered repeatedly - then please do *NOT* report it here, but send an
email to [...<!--
If the bug you are reporting is potentially security-related - for example,
if it involves an assertion failure or other crash in `named` that can be
triggered repeatedly - then please do *NOT* report it here, but send an
email to [security-officer@isc.org](security-officer@isc.org).
-->
### Summary
DoT stops working after "rndc reconfigure" when running named as non-root
### BIND version used
```
BIND 9.18.0 (Stable Release) <id:8db45af>
running on FreeBSD amd64 12.3-STABLE FreeBSD 12.3-STABLE r371270 DNS_VIMAGE
built by make with '--disable-linux-caps' '--localstatedir=/var' '--sysconfdir=/usr/local/etc/namedb' '--with-dlopen=yes' '--with-libxml2' '--with-openssl=/usr' '--enable-dnsrps' '--with-readline=libedit' '--enable-dnstap' '--disable-fixed-rrset' '--disable-geoip' '--without-maxminddb' '--without-gssapi' '--with-libidn2=/usr/local' '--with-json-c' '--disable-largefile' '--with-lmdb=/usr/local' '--disable-querytrace' '--enable-tcp-fastopen' '--prefix=/usr/local' '--mandir=/usr/local/man' '--disable-silent-rules' '--infodir=/usr/local/share/info/' '--build=amd64-portbld-freebsd12.2' 'build_alias=amd64-portbld-freebsd12.2' 'CC=cc' 'CFLAGS=-O2 -pipe -DLIBICONV_PLUG -fstack-protector-strong -isystem /usr/local/include -fno-strict-aliasing ' 'LDFLAGS= -L/usr/local/lib -ljson-c -fstack-protector-strong ' 'LIBS=-L/usr/local/lib' 'CPPFLAGS=-DLIBICONV_PLUG -isystem /usr/local/include' 'CPP=cpp' 'PKG_CONFIG=pkgconf' 'READLINE_CFLAGS=-L/usr/local/lib'
compiled by CLANG FreeBSD Clang 10.0.1 (git@github.com:llvm/llvm-project.git llvmorg-10.0.1-0-gef32c611aa2)
compiled with OpenSSL version: OpenSSL 1.1.1m-freebsd 14 Dec 2021
linked to OpenSSL version: OpenSSL 1.1.1m-freebsd 14 Dec 2021
compiled with libuv version: 1.41.0
linked to libuv version: 1.41.0
compiled with libnghttp2 version: 1.44.0
linked to libnghttp2 version: 1.44.0
compiled with libxml2 version: 2.9.12
linked to libxml2 version: 20912
compiled with json-c version: 0.15
linked to json-c version: 0.15
compiled with zlib version: 1.2.11
linked to zlib version: 1.2.11
compiled with protobuf-c version: 1.4.0
linked to protobuf-c version: 1.4.0
threads support is enabled
default paths:
named configuration: /usr/local/etc/namedb/named.conf
rndc configuration: /usr/local/etc/namedb/rndc.conf
DNSSEC root key: /usr/local/etc/namedb/bind.keys
nsupdate session key: /var/run/named/session.key
named PID file: /var/run/named/named.pid
named lock file: /var/run/named/named.lock
```
Also tried 9.17.18 (Installed from FreeBSD package) and 9.17.22 (compiled from source).
### Steps to reproduce
Start named with the following command line:
/usr/local/sbin/named -t /var/named -u bind -c /usr/local/etc/namedb/named.conf
and with the following named.conf file:
```
options {
directory "/usr/local/etc/namedb/working";
pid-file "/var/run/named/pid";
dump-file "/var/dump/named_dump.db";
statistics-file "/var/stats/named.stats";
listen-on { 193.75.110.2; 127.0.0.1; };
listen-on port 853 tls dotas2116 { 193.75.110.2; 127.0.0.1; };
interface-interval 0;
recursion yes;
max-cache-size 1500M;
minimal-any yes;
minimal-responses yes;
querylog yes;
allow-query { 194.19.2.0/24; 193.75.110.0/24; 127.0.0.1; };
};
tls dotas2116 {
cert-file "/usr/local/etc/namedb/fullchain.pem";
key-file "/usr/local/etc/namedb/privkey.pem";
protocols { TLSv1.2; TLSv1.3; };
};
```
### What is the current *bug* behavior?
After doing "rndc reconfigure" named no longer listens to TCP port 853. This is visible in the log:
```
Jan 31 16:29:08 nslum named[42849]: no longer listening on 127.0.0.1#853
Jan 31 16:29:08 nslum named[42849]: listening on IPv4 interface lo0, 127.0.0.1#853
Jan 31 16:29:08 nslum named[42849]: creating TLS socket: permission denied
Jan 31 16:29:08 nslum named[42849]: creating IPv4 interface lo0 failed; interface ignored
Jan 31 16:29:08 nslum named[42849]: no longer listening on 193.75.110.2#853
Jan 31 16:29:08 nslum named[42849]: listening on IPv4 interface ixl1.15, 193.75.110.2#853
Jan 31 16:29:08 nslum named[42849]: creating TLS socket: permission denied
Jan 31 16:29:08 nslum named[42849]: creating IPv4 interface ixl1.15 failed; interface ignored
```
Using "dig +tls" to test results in:
;; Connection to 193.75.110.2#853(193.75.110.2) for vg.no failed: connection refused.
and using the FreeBSD "sockstat" command shows named is not listening to TCP port 853.
### What is the expected *correct* behavior?
The expected behavior is that "dig +tls" works, and resolves names normally. This works right after startup - and using the "sockstat command I can see that named is listening to TCP port 853:
```
Jan 31 16:27:10 nslum named[42849]: listening on IPv4 interface lo0, 127.0.0.1#853
Jan 31 16:27:10 nslum named[42849]: listening on IPv4 interface ixl1.15, 193.75.110.2#853
```
### Relevant configuration files
See named.conf above.
### Relevant logs and/or screenshots
(Paste any relevant logs - please use code blocks (```) to format console
output, logs, and code, as it's very hard to read otherwise.)
### Possible fixes
(If you can, link to the line of code that might be responsible for the
problem.)May 2022 (9.16.29, 9.16.29-S1, 9.18.3, 9.19.1)Artem BoldarievArtem Boldarievhttps://gitlab.isc.org/isc-projects/bind9/-/issues/3168Refactor recursion tracking in lib/ns/query.c2022-10-10T07:40:54ZMichał KępieńRefactor recursion tracking in lib/ns/query.cWhile #3147 was fixed with a change that should be safe to backport
(!5870 only touches statistics-related code), a number of problems in
`lib/ns/query.c` were identified while that problem was being
investigated:
1. [x] (!5882) Code d...While #3147 was fixed with a change that should be safe to backport
(!5870 only touches statistics-related code), a number of problems in
`lib/ns/query.c` were identified while that problem was being
investigated:
1. [x] (!5882) Code duplication.
2. [x] (!5883) Logically unrelated bits of code (e.g. prefetch and RPZ)
share common code paths (e.g. `prefetch_done()`) even
though they do not need to.
3. [x] (!5884) Various features able to initiate recursion use common
memory locations (e.g. `client->fetchhandle`,
`client->query.fetch`), which forces them all to
"cooperate" with each other even though they are all
logically separate.
4. [x] (!5885) One of the overloaded variables is
`client->recursionquota`, which causes recursion quota
to be handled inconsistently.
5. [x] (!5886) The `check_recursionquota()` function needs to be
cleaned up.
Unlike #3147, IMHO addressing the above issues is 9.19+ material due to
the amount of changes involved.July 2022 (9.16.31, 9.16.31-S1, 9.18.5, 9.19.3)Michał KępieńMichał Kępieńhttps://gitlab.isc.org/isc-projects/bind9/-/issues/3178ThreadSanitizer: data race lib/dns/dispatch.c in dns_dispatch_send2023-01-24T11:13:22ZArаm SаrgsyаnThreadSanitizer: data race lib/dns/dispatch.c in dns_dispatch_sendA ThreadSanitizer report in a CI [job](https://gitlab.isc.org/isc-private/bind9/-/jobs/2327824) in the private repository.
```
==================
WARNING: ThreadSanitizer: data race (pid=29526)
Read of size 8 at 0x7b5000020a10 by thre...A ThreadSanitizer report in a CI [job](https://gitlab.isc.org/isc-private/bind9/-/jobs/2327824) in the private repository.
```
==================
WARNING: ThreadSanitizer: data race (pid=29526)
Read of size 8 at 0x7b5000020a10 by thread T9 (mutexes: write M2079, write M542256371492526000):
#0 dns_dispatch_send /builds/isc-private/bind9/lib/dns/dispatch.c (libdns-9.18.0.so+0x66d0c)
#1 req_send /builds/isc-private/bind9/lib/dns/request.c:352:2 (libdns-9.18.0.so+0x18a044)
#2 req_connected /builds/isc-private/bind9/lib/dns/request.c:1016:3 (libdns-9.18.0.so+0x18a044)
#3 dns_dispatch_connect /builds/isc-private/bind9/lib/dns/dispatch.c:1835:5 (libdns-9.18.0.so+0x6619d)
#4 dns_request_createvia /builds/isc-private/bind9/lib/dns/request.c:772:12 (libdns-9.18.0.so+0x18b7f8)
#5 checkds_send_toaddr /builds/isc-private/bind9/lib/dns/zone.c:21358:11 (libdns-9.18.0.so+0x23db16)
#6 task_run /builds/isc-private/bind9/lib/isc/task.c:821:5 (libisc-9.18.0.so+0x75856)
#7 isc_task_run /builds/isc-private/bind9/lib/isc/task.c:901:10 (libisc-9.18.0.so+0x75856)
#8 isc__nm_async_task /builds/isc-private/bind9/lib/isc/netmgr/netmgr.c:846:11 (libisc-9.18.0.so+0x2dfa7)
#9 process_netievent /builds/isc-private/bind9/lib/isc/netmgr/netmgr.c (libisc-9.18.0.so+0x2712f)
#10 process_queue /builds/isc-private/bind9/lib/isc/netmgr/netmgr.c:1019:16 (libisc-9.18.0.so+0x20f58)
#11 process_all_queues /builds/isc-private/bind9/lib/isc/netmgr/netmgr.c:765:25 (libisc-9.18.0.so+0x20f58)
#12 async_cb /builds/isc-private/bind9/lib/isc/netmgr/netmgr.c:794:6 (libisc-9.18.0.so+0x20f58)
#13 uv__async_io /usr/src/libuv-v1.43.0/src/unix/async.c:163:5 (libuv.so.1+0x10e82)
#14 isc__trampoline_run /builds/isc-private/bind9/lib/isc/trampoline.c:187:11 (libisc-9.18.0.so+0x7f0d9)
Previous write of size 8 at 0x7b5000020a10 by thread T2 (mutexes: write M686371576748313440):
#0 isc__nmhandle_attach /builds/isc-private/bind9/lib/isc/netmgr/netmgr.c:1711:11 (libisc-9.18.0.so+0x24380)
#1 startrecv /builds/isc-private/bind9/lib/dns/dispatch.c:1705:3 (libdns-9.18.0.so+0x67f91)
#2 tcp_connected /builds/isc-private/bind9/lib/dns/dispatch.c:1730:3 (libdns-9.18.0.so+0x6645b)
#3 isc__nm_async_connectcb /builds/isc-private/bind9/lib/isc/netmgr/netmgr.c:2735:2 (libisc-9.18.0.so+0x2c625)
#4 isc__nm_connectcb /builds/isc-private/bind9/lib/isc/netmgr/netmgr.c:2710:3 (libisc-9.18.0.so+0x29c19)
#5 tcpdns_connect_cb /builds/isc-private/bind9/lib/isc/netmgr/tcpdns.c:249:2 (libisc-9.18.0.so+0x37af9)
#6 uv__stream_connect /usr/src/libuv-v1.43.0/src/unix/stream.c:1390:5 (libuv.so.1+0x22a9f)
#7 isc__trampoline_run /builds/isc-private/bind9/lib/isc/trampoline.c:187:11 (libisc-9.18.0.so+0x7f0d9)
Location is heap block of size 480 at 0x7b5000020a00 allocated by thread T2:
#0 malloc <null> (named+0x45f71d)
#1 mallocx /builds/isc-private/bind9/lib/isc/./jemalloc_shim.h:35:10 (libisc-9.18.0.so+0x5fd97)
#2 mem_get /builds/isc-private/bind9/lib/isc/mem.c:345:8 (libisc-9.18.0.so+0x5fd97)
#3 isc__mem_get /builds/isc-private/bind9/lib/isc/mem.c:760:8 (libisc-9.18.0.so+0x5fd97)
#4 dispatch_allocate /builds/isc-private/bind9/lib/dns/dispatch.c:1090:9 (libdns-9.18.0.so+0x61d5b)
#5 dns_dispatch_createtcp /builds/isc-private/bind9/lib/dns/dispatch.c:1144:2 (libdns-9.18.0.so+0x61ae6)
#6 tcp_dispatch /builds/isc-private/bind9/lib/dns/request.c:417:11 (libdns-9.18.0.so+0x189acd)
#7 get_dispatch /builds/isc-private/bind9/lib/dns/request.c:458:12 (libdns-9.18.0.so+0x189acd)
#8 dns_request_createvia /builds/isc-private/bind9/lib/dns/request.c:719:11 (libdns-9.18.0.so+0x18b16f)
#9 checkds_send_toaddr /builds/isc-private/bind9/lib/dns/zone.c:21358:11 (libdns-9.18.0.so+0x23db16)
#10 task_run /builds/isc-private/bind9/lib/isc/task.c:821:5 (libisc-9.18.0.so+0x75856)
#11 isc_task_run /builds/isc-private/bind9/lib/isc/task.c:901:10 (libisc-9.18.0.so+0x75856)
#12 isc__nm_async_task /builds/isc-private/bind9/lib/isc/netmgr/netmgr.c:846:11 (libisc-9.18.0.so+0x2dfa7)
#13 process_netievent /builds/isc-private/bind9/lib/isc/netmgr/netmgr.c (libisc-9.18.0.so+0x2712f)
#14 process_queue /builds/isc-private/bind9/lib/isc/netmgr/netmgr.c:1019:16 (libisc-9.18.0.so+0x20f58)
#15 process_all_queues /builds/isc-private/bind9/lib/isc/netmgr/netmgr.c:765:25 (libisc-9.18.0.so+0x20f58)
#16 async_cb /builds/isc-private/bind9/lib/isc/netmgr/netmgr.c:794:6 (libisc-9.18.0.so+0x20f58)
#17 uv__async_io /usr/src/libuv-v1.43.0/src/unix/async.c:163:5 (libuv.so.1+0x10e82)
#18 isc__trampoline_run /builds/isc-private/bind9/lib/isc/trampoline.c:187:11 (libisc-9.18.0.so+0x7f0d9)
Mutex M2079 (0x7b7c00015008) created at:
#0 pthread_mutex_init <null> (named+0x46251f)
#1 isc__mutex_init /builds/isc-private/bind9/lib/isc/mutex.c:52:8 (libisc-9.18.0.so+0x64b97)
#2 dns_zone_create /builds/isc-private/bind9/lib/dns/zone.c:1148:2 (libdns-9.18.0.so+0x1f4cca)
#3 dns_zonemgr_createzone /builds/isc-private/bind9/lib/dns/zone.c:18937:11 (libdns-9.18.0.so+0x20afca)
#4 configure_zone /builds/isc-private/bind9/bin/named/server.c:6746:3 (named+0x507551)
#5 configure_view /builds/isc-private/bind9/bin/named/server.c:4149:3 (named+0x4fe365)
#6 load_configuration /builds/isc-private/bind9/bin/named/server.c:9277:3 (named+0x4f8661)
#7 run_server /builds/isc-private/bind9/bin/named/server.c:9990:2 (named+0x4e5f75)
#8 task_run /builds/isc-private/bind9/lib/isc/task.c:821:5 (libisc-9.18.0.so+0x75856)
#9 isc_task_run /builds/isc-private/bind9/lib/isc/task.c:901:10 (libisc-9.18.0.so+0x75856)
#10 isc__nm_async_task /builds/isc-private/bind9/lib/isc/netmgr/netmgr.c:846:11 (libisc-9.18.0.so+0x2dfa7)
#11 process_netievent /builds/isc-private/bind9/lib/isc/netmgr/netmgr.c (libisc-9.18.0.so+0x2712f)
#12 process_queue /builds/isc-private/bind9/lib/isc/netmgr/netmgr.c:1019:16 (libisc-9.18.0.so+0x20f58)
#13 process_all_queues /builds/isc-private/bind9/lib/isc/netmgr/netmgr.c:765:25 (libisc-9.18.0.so+0x20f58)
#14 async_cb /builds/isc-private/bind9/lib/isc/netmgr/netmgr.c:794:6 (libisc-9.18.0.so+0x20f58)
#15 uv__async_io /usr/src/libuv-v1.43.0/src/unix/async.c:163:5 (libuv.so.1+0x10e82)
#16 isc__trampoline_run /builds/isc-private/bind9/lib/isc/trampoline.c:187:11 (libisc-9.18.0.so+0x7f0d9)
Mutex M542256371492526000 is already destroyed.
Mutex M686371576748313440 is already destroyed.
Thread T9 'isc-net-0008' (tid=29574, running) created by main thread at:
#0 pthread_create <null> (named+0x460dad)
#1 isc_thread_create /builds/isc-private/bind9/lib/isc/thread.c:81:8 (libisc-9.18.0.so+0x7804a)
#2 isc__netmgr_create /builds/isc-private/bind9/lib/isc/netmgr/netmgr.c:331:3 (libisc-9.18.0.so+0x20cb1)
#3 isc_managers_create /builds/isc-private/bind9/lib/isc/managers.c:41:2 (libisc-9.18.0.so+0x5e117)
#4 create_managers /builds/isc-private/bind9/bin/named/main.c:922:11 (named+0x4e1a9c)
#5 setup /builds/isc-private/bind9/bin/named/main.c:1186:11 (named+0x4e1a9c)
#6 main /builds/isc-private/bind9/bin/named/main.c:1454:2 (named+0x4e1a9c)
Thread T2 'isc-net-0001' (tid=29549, running) created by main thread at:
#0 pthread_create <null> (named+0x460dad)
#1 isc_thread_create /builds/isc-private/bind9/lib/isc/thread.c:81:8 (libisc-9.18.0.so+0x7804a)
#2 isc__netmgr_create /builds/isc-private/bind9/lib/isc/netmgr/netmgr.c:331:3 (libisc-9.18.0.so+0x20cb1)
#3 isc_managers_create /builds/isc-private/bind9/lib/isc/managers.c:41:2 (libisc-9.18.0.so+0x5e117)
#4 create_managers /builds/isc-private/bind9/bin/named/main.c:922:11 (named+0x4e1a9c)
#5 setup /builds/isc-private/bind9/bin/named/main.c:1186:11 (named+0x4e1a9c)
#6 main /builds/isc-private/bind9/bin/named/main.c:1454:2 (named+0x4e1a9c)
SUMMARY: ThreadSanitizer: data race /builds/isc-private/bind9/lib/dns/dispatch.c in dns_dispatch_send
==================
ThreadSanitizer: reported 1 warnings
```January 2023 (9.16.37, 9.16.37-S1, 9.18.11, 9.18.11-S1, 9.19.9)