ISC Open Source Projects issues
https://gitlab.isc.org/groups/isc-projects/-/issues
Updated: 2020-10-30T10:40:21Z

https://gitlab.isc.org/isc-projects/bind9/-/issues/1764
build 9.16.2/release, @ `make depend`, "fatal error: lib/dns/dnstap.pb-c.h: No such file or directory"
Updated: 2020-10-30T10:40:21Z
Author: pgnd

building 9.16.2/release on linux/64
config with
```
./configure \
...
--without-pkcs11
--enable-dnstap \
--with-protobuf-c \
--with-libfstrm \
...
```
reports as expected
```
...
===============================================================================
Configuration summary:
...
-------------------------------------------------------------------------------
Optional features enabled:
...
!!! Allow 'dnstap' packet logging (--enable-dnstap)
...
-------------------------------------------------------------------------------
Features disabled or unavailable on this platform:
...
!!! Using PKCS#11 for Public-Key Cryptography (--with-native-pkcs11)
...
-------------------------------------------------------------------------------
...
```
on exec of
```
make depend
```
an error's reported,
```
...
make[3]: Leaving directory '/usr/local/src/bind-9.16.2/lib/dns/include'
/bin/sh /usr/local/src/bind-9.16.2/make/mkdep -include /usr/local/src/bind-9.16.2/config.h -I/usr/local/src/bind-9.16.2 -I../.. -I. -I../../lib/dns -Iinclude -I/usr/local/src/bind-9.16.2/lib/dns/include -I../../lib/dns/include -I/usr/local/src/bind-9.16.2/lib/isc/include -I../../lib/isc -I../../lib/isc/include -I../../lib/isc/unix/include -I../../lib/isc/pthreads/include -I/usr/local/openssl11/include -I/usr/include/json-c -I/usr/include/libxml2 -I/var/lib/GeoIP2/include -include /usr/local/src/bind-9.16.2/config.h -I/usr/local/src/bind-9.16.2 -I../.. -I. -I../../lib/dns -Iinclude -I/usr/local/src/bind-9.16.2/lib/dns/include -I../../lib/dns/include -I/usr/local/src/bind-9.16.2/lib/isc/include -I../../lib/isc -I../../lib/isc/include -I../../lib/isc/unix/include -I../../lib/isc/pthreads/include -I/usr/local/openssl11/include -I/usr/include/json-c -I/usr/include/libxml2 -I/var/lib/GeoIP2/include -O3 -Wall -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fmessage-length=0 -grecord-gcc-switches -march=native -mtune=native -fPIC -DPIC -D_GNU_SOURCE -fno-strict-aliasing -Wall -pthread -I/usr/local/lmdb/include -Iyes/include -Iyes/include -fPIC -W -Wall -Wmissing-prototypes -Wcast-qual -Wwrite-strings -Wformat -Wpointer-arith -Wno-missing-field-initializers -fno-strict-aliasing @PKCS11LINKSRCS@ dst_api.c dst_parse.c dst_result.c gssapi_link.c gssapictx.c hmac_link.c openssl_link.c openssldh_link.c opensslecdsa_link.c openssleddsa_link.c opensslrsa_link.c pkcs11rsa_link.c pkcs11ecdsa_link.c pkcs11eddsa_link.c pkcs11.c key.c acl.c adb.c badcache. 
byaddr.c cache.c callbacks.c clientinfo.c compress.c db.c dbiterator.c dbtable.c diff.c dispatch.c dlz.c dns64.c dnsrps.c dnssec.c ds.c dyndb.c ecs.c fixedname.c forward.c ipkeylist.c iptable.c journal.c kasp.c keydata.c keymgr.c keytable.c lib.c log.c lookup.c master.c masterdump.c message.c name.c ncache.c nsec.c nsec3.c nta.c order.c peer.c portlist.c rbt.c rbtdb.c rcode.c rdata.c rdatalist.c rdataset.c rdatasetiter.c rdataslab.c request.c resolver.c result.c rootns.c rpz.c rrl.c rriterator.c sdb.c sdlz.c soa.c ssu.c ssu_external.c stats.c tcpmsg.c time.c timer.c tkey.c tsec.c tsig.c ttl.c update.c validator.c version.c view.c xfrin.c zone.c zoneverify.c
zonekey.c zt.c client.c ecdb.c dnstap.c dnstap.pb-c.c
!!! gcc-10: error: PKCS11LINKSRCS@: No such file or directory
!!! gcc-10: error: badcache.: No such file or directory
!!! gcc-10: error: dnstap.pb-c.c: No such file or directory
make[2]: Leaving directory '/usr/local/src/bind-9.16.2/lib/dns'
...
making depend in /usr/local/src/bind-9.16.2/bin/tools
make[2]: Entering directory '/usr/local/src/bind-9.16.2/bin/tools'
/bin/sh /usr/local/src/bind-9.16.2/make/mkdep -include /usr/local/src/bind-9.16.2/config.h -I/usr/local/src/bind-9.16.2 -I../.. -I/usr/local/src/bind-9.16.2/lib/dns/include -I../../lib/dns/include -I/usr/local/src/bind-9.16.2/lib/isc/include -I../../lib/isc -I../../lib/isc/include -I../../lib/isc/unix/include -I../../lib/isc/pthreads/include -I/usr/local/src/bind-9.16.2/lib/isccfg/include -I../../lib/isccfg/include -I/usr/local/src/bind-9.16.2/lib/bind9/include -I../../lib/bind9/include -I/usr/local/openssl11/include -I/var/lib/GeoIP2/include -DVERSION="9.16.2" -include /usr/local/src/bind-9.16.2/config.h -I/usr/local/src/bind-9.16.2 -I../.. -I/usr/local/src/bind-9.16.2/lib/dns/include -I../../lib/dns/include -I/usr/local/src/bind-9.16.2/lib/isc/include -I../../lib/isc -I../../lib/isc/include -I../../lib/isc/unix/include -I../../lib/isc/pthreads/include -I/usr/local/src/bind-9.16.2/lib/isccfg/include -I../../lib/isccfg/include -I/usr/local/src/bind-9.16.2/lib/bind9/include -I../../lib/bind9/include -I/usr/local/openssl11/include -I/var/lib/GeoIP2/include -DVERSION="9.16.2" -O3 -Wall -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fmessage-length=0 -grecord-gcc-switches -march=native -mtune=native -fPIC -DPIC -D_GNU_SOURCE -fno-strict-aliasing -Wall -pthread -I/usr/local/lmdb/include -Iyes/include -Iyes/include -fPIC -W -Wall -Wmissing-prototypes -Wcast-qual -Wwrite-strings -Wformat -Wpointer-arith -Wno-missing-field-initializers -fno-strict-aliasing arpaname.c named-journalprint.c named-rrchecker.c nsec3hash.c mdig.c dnstap-read.c named-nzd2nzf.c
!!! dnstap-read.c:51:10: fatal error: lib/dns/dnstap.pb-c.h: No such file or directory
51 | #include "lib/dns/dnstap.pb-c.h"
| ^~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
...
```

Milestone: May 2020 (9.11.19, 9.11.19-S1, 9.14.12, 9.16.3)
Assignee: Mark Andrews

https://gitlab.isc.org/isc-projects/bind9/-/issues/1765
create empty release notes for 9.17.2, 9.16.3, 9.11.19
Updated: 2020-04-17T06:30:38Z
Author: Mark Andrews

Milestone: May 2020 (9.11.19, 9.11.19-S1, 9.14.12, 9.16.3)

https://gitlab.isc.org/isc-projects/bind9/-/issues/1766
Core dump in legacy system test.
Updated: 2020-07-02T09:30:55Z
Author: Mark Andrews

Job [#827575](https://gitlab.isc.org/isc-projects/bind9/-/jobs/827575) failed for 8d04b6b93a48dace450761683c0265ed997a94bd:

Milestone: July 2020 (9.11.21, 9.11.21-S1, 9.16.5, 9.17.3)
Assignee: Witold Krecicki

https://gitlab.isc.org/isc-projects/bind9/-/issues/1767
bind keys download remove lookaside
Updated: 2020-08-12T23:37:31Z
Author: reedjc

Please see http://ftp.isc.org/isc/bind9/keys/9.11/ (which is linked from https://www.isc.org/bind-keys/)
Please remove the lookaside key data to catch up to Oct. 2017:
https://gitlab.isc.org/isc-projects/bind9/-/blob/f29359299aaab519f39b090cd83de85cd2fc3820/bind.keys

https://gitlab.isc.org/isc-projects/bind9/-/issues/1768
move dns_peer_t into peer.c
Updated: 2020-04-29T11:54:43Z
Author: Mark Andrews

Milestone: May 2020 (9.11.19, 9.11.19-S1, 9.14.12, 9.16.3)

https://gitlab.isc.org/isc-projects/bind9/-/issues/1769
Ensure all necessary files are included in the tarball produced by "make dist"
Updated: 2020-06-05T11:48:56Z
Author: Michał Kępień

`make dist` should produce a self-contained tarball which can be used
for building and installing BIND. Since e.g. multiple headers are
currently not included in `*_SOURCES` Automake variables, this is
currently not the case and it needs to be fixed so that we can use `make
dist` instead of `git archive` (+ `.gitattributes`) for creating our
source tarballs.

Milestone: June 2020 (9.11.20, 9.11.20-S1, 9.16.4, 9.17.2)
Assignee: Michal Nowak

https://gitlab.isc.org/isc-projects/bind9/-/issues/1770
Review how we use sys/un.h
Updated: 2020-12-10T08:11:45Z
Author: Ondřej Surý

There's:
* `#ifdef ISC_PLATFORM_HAVESYSUNH`
* `#ifdef ISC_PLAFORM_HAVESYSUNH`
* `#ifdef AF_UNIX`
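
Why the misspelled variant matters can be shown with a small sketch. It assumes, purely for illustration, that `ISC_PLATFORM_HAVESYSUNH` is defined by hand; in a real build it would come from the build system. The function names are hypothetical. A guard on a misspelled macro name produces no diagnostic: the preprocessor treats the unknown name as undefined and silently drops the guarded branch.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: stand-in for a macro the build system would normally define. */
#define ISC_PLATFORM_HAVESYSUNH 1

/* Correct spelling: the guarded branch is compiled in. */
bool guarded_correctly(void) {
#ifdef ISC_PLATFORM_HAVESYSUNH
    return true;
#else
    return false;
#endif
}

/*
 * Misspelled name (missing 'T'): the macro is never defined anywhere,
 * so the #else branch is compiled and the compiler stays silent.
 */
bool guarded_with_typo(void) {
#ifdef ISC_PLAFORM_HAVESYSUNH
    return true;
#else
    return false;
#endif
}

/* Hypothetical helper: aborts if the two guards do not behave as described. */
void check_guards(void) {
    assert(guarded_correctly());
    assert(!guarded_with_typo());
}
```

Calling `check_guards()` from any test program exercises both guards.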
Let's fix this properly in stable releases...

Milestone: December 2020 (9.11.26, 9.11.26-S1, 9.16.10, 9.16.10-S1, 9.17.8)
Assignee: Michal Nowak

https://gitlab.isc.org/isc-projects/bind9/-/issues/1772
Properly test GSSAPI TSIG against Windows client / server
Updated: 2021-10-05T12:03:57Z
Author: Ondřej Surý

Since we now have Windows as part of the CI, we can (most probably) properly test GSSAPI TSIG against Windows. This would involve discovering how to configure both a Windows client and a Windows server, and walking through the available authentication mechanisms to test them all.

Milestone: BIND 9.17 Backburner

https://gitlab.isc.org/isc-projects/bind9/-/issues/1774
Get Windows builds working again
Updated: 2020-05-29T12:31:25Z
Author: Michał Kępień

The following discussion from !985 should be addressed:
- [ ] @michal started a [discussion](https://gitlab.isc.org/isc-projects/bind9/-/merge_requests/985#note_125128): (+1 comment)
> Fair enough, but there is a ton of `#if _MSC_VER ...` conditional blocks
> still out there. Something to address in a separate issue? (Or when we
> try to fix Windows builds?)

Milestone: June 2020 (9.11.20, 9.11.20-S1, 9.16.4, 9.17.2)

https://gitlab.isc.org/isc-projects/bind9/-/issues/1775
Resizing (growing) of cache hash tables causes delays in processing of client queries
Updated: 2020-11-13T18:31:42Z
Author: Cathy Almond

From [Support ticket #16212](https://support.isc.org/Ticket/Display.html?id=16212)
During investigations of intermittent 'brownouts' - periods in which named seemingly stops actioning client queries for a short period, and then resumes processing a second or two later (yes, delays of seconds not ms from this) we 'caught' one culprit red-handed in a pstack run that was automatically triggered by an 'alarm' in monitoring inbound and outbound server traffic rates.
The thread in question was holding the cache tree lock, while growing the hash table:
```
Thread 21 (Thread 0x7f54d8b2f700 (LWP 19115)):
#0 0x000000000052bc7b in rehash (rbt=0x7f54b8c04058, newcount=<optimized out>) at rbt.c:2376
#1 0x000000000052da99 in hash_node (name=0x7f53d9562bb0, node=0x7f541cf79538, rbt=0x7f54b8c04058) at rbt.c:2389
#2 dns_rbt_addnode (rbt=0x7f54b8c04058, name=0x7f53d9562bb0, nodep=0x7f54d8b2dd28) at rbt.c:1451
#3 0x00000000005367ef in rbt_addnode_withdata (rbtdb=0x7f54b8c03010, rbt=0x7f54b8c04058, name=<optimized out>, nodep=0x7f54d8b2dd28) at rbtdb.c:2016
#4 0x000000000053ba42 in findnodeintree (rbtdb=0x7f54b8c03010, tree=0x7f54b8c04058, name=0x7f53d9562bb0, create=true, nodep=0x7f54d8b2ed30) at rbtdb.c:3339
#5 0x00000000005babb5 in cache_name (now=1587326409, zerottl=false, name=0x7f53d9562bb0, section=1, query=0x7f54600100d0, fctx=0x7f5449e172d0) at resolver.c:5876
#6 cache_message (now=1587326409, zerottl=false, query=0x7f54600100d0, fctx=0x7f5449e172d0) at resolver.c:6336
#7 resquery_response (task=0x7f5387cbb628, event=<optimized out>) at resolver.c:9166
#8 0x000000000068a8b1 in dispatch (manager=0x7f54dedc7010) at task.c:1157
#9 run (uap=0x7f54dedc7010) at task.c:1331
#10 0x00007f54dd90cdd5 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f54dd635ead in clone () from /lib64/libc.so.6
```
The other cause of similar problems is when growing the ADB tables - that one however is logged, whereas it doesn't look like 'rehash' or anything that calls it owns up (via logging) to what it is doing.
Our immediate quick-fix wish is for a solution to the delays caused by growing hash tables that is along the lines of being able to specify the starting size as named is launched. This needs to be either run-time or configurable in named.conf. (It is *not* helpful to make it build-time only because in many environments there will be a single build that is distributed to many servers whose needs/sizing can vary.)
It would also be really helpful if any hash table growing could be logged - to include what the size is expanding to (this will help admins to tune their servers accordingly).
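
The two wishes above (a starting size chosen at launch, and a log line whenever the table grows) can be sketched as follows. This is a minimal illustration and not BIND's `rbt.c` code: the type and function names are hypothetical, the grow threshold of three entries per bucket is an arbitrary assumption, and locking is omitted entirely. In the real server the rehash runs while the tree lock is held, which is what makes a large resize visible as a stall.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct entry {
    unsigned int key;
    struct entry *next;
} entry_t;

typedef struct {
    entry_t **buckets;
    size_t nbuckets;
    size_t count;
    unsigned int resizes; /* how many times the table has grown */
} table_t;

/* Create a table whose starting size is chosen by the caller at run time. */
int table_init(table_t *t, size_t initial_buckets) {
    assert(initial_buckets > 0);
    t->buckets = calloc(initial_buckets, sizeof(*t->buckets));
    if (t->buckets == NULL) {
        return -1;
    }
    t->nbuckets = initial_buckets;
    t->count = 0;
    t->resizes = 0;
    return 0;
}

/* Double the bucket array, relink every entry, and log the new size. */
static int table_grow(table_t *t) {
    size_t newsize = t->nbuckets * 2;
    entry_t **nb = calloc(newsize, sizeof(*nb));
    if (nb == NULL) {
        return -1;
    }
    fprintf(stderr, "hash table growing from %zu to %zu buckets\n",
            t->nbuckets, newsize);
    for (size_t i = 0; i < t->nbuckets; i++) {
        entry_t *e = t->buckets[i];
        while (e != NULL) {
            entry_t *next = e->next;
            size_t b = e->key % newsize;
            e->next = nb[b];
            nb[b] = e;
            e = next;
        }
    }
    free(t->buckets);
    t->buckets = nb;
    t->nbuckets = newsize;
    t->resizes++;
    return 0;
}

/* Insert a key, growing first if the (assumed) load threshold is reached. */
int table_add(table_t *t, unsigned int key) {
    if (t->count >= t->nbuckets * 3 && table_grow(t) != 0) {
        return -1;
    }
    entry_t *e = malloc(sizeof(*e));
    if (e == NULL) {
        return -1;
    }
    e->key = key;
    size_t b = key % t->nbuckets;
    e->next = t->buckets[b];
    t->buckets[b] = e;
    t->count++;
    return 0;
}

/* Free every entry and the bucket array. */
void table_destroy(table_t *t) {
    for (size_t i = 0; i < t->nbuckets; i++) {
        entry_t *e = t->buckets[i];
        while (e != NULL) {
            entry_t *next = e->next;
            free(e);
            e = next;
        }
    }
    free(t->buckets);
    t->buckets = NULL;
    t->nbuckets = 0;
    t->count = 0;
}
```

A table created with a generous `initial_buckets` value never reaches `table_grow()` under the same load that forces a small table through several logged resizes, which is the tuning knob the request asks for.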
====
Longer term, I understand that the wish is to replace the current and now fairly ancient hashing solution with something more modern, faster, and in particular, that doesn't need to block access when resizing - I'll leave engineering to open a new and independent ticket for that. For the here and now, we need a quicker fix, not a new development feature that can't be back-ported or easily applied.

Milestone: August 2020 (9.11.22, 9.11.22-S1, 9.16.6, 9.17.4)
Assignee: Ondřej Surý

https://gitlab.isc.org/isc-projects/bind9/-/issues/1776
BIND 9.16 and cache node locks for name cleaning vs. 'the thundering herd'
Updated: 2021-10-05T12:07:29Z
Author: Cathy Almond

From [Support ticket #16212](https://support.isc.org/Ticket/Display.html?id=16212)
During investigations of intermittent 'brownouts' - periods in which named seemingly stops actioning client queries for a short period, and then resumes processing a second or two later (yes, delays of seconds not ms from this) we 'caught' one interesting scenario on BIND 9.16 in which it appeared that the vast majority of the active threads (netmgr and taskmgr both - so both client queries being answered from cache, AND client queries for which recursion had just taken place) were competing for the same cache node lock.
The pstack output demonstrating the problem was automatically triggered by monitoring for anomalies in inbound versus outbound network traffic.
The symptoms when this issue occurs are that:
* Outbound client-facing traffic rates plummet (well below the proportion that you would expect to see if it was only cache-misses not being serviced)
* Recursive query rates plummet too
* CPU use increases - but in user space not in system space
* Recursive clients backlog increases (and may hit the limit)
* Fetchlimits may be triggered (we suspect this, and its predecessor, are symptom not cause however, although triggering fetchlimits will exacerbate the situation, both from the client perspective, and as increased traffic rates as clients retry/re-send)
What we saw in the pstacks was that the majority of netmgr threads (these answer directly from cache) were attempting to get a write lock on the node - for example:
```
Thread 74 (Thread 0x7f3ff366e700 (LWP 11713)):
#0 isc_rwlock_lock (rwl=rwl@entry=0x7f3f59523980, type=type@entry=isc_rwlocktype_write) at rwlock.c:57
#1 0x000000000051d826 in decrement_reference (rbtdb=rbtdb@entry=0x7f3fc6457010, node=node@entry=0x7f3eace34510, least_serial=least_serial@entry=0, nlock=nlock@entry=isc_rwlocktype_read, tlock=tlock@entry=isc_rwlocktype_none, pruning=pruning@entry=false) at rbtdb.c:2040
#2 0x00000000005215bf in detachnode (db=0x7f3fc6457010, targetp=targetp@entry=0x7f3ff366da88) at rbtdb.c:5352
#3 0x00000000005217be in rdataset_disassociate (rdataset=<optimized out>) at rbtdb.c:8691
#4 0x00000000005657e8 in dns_rdataset_disassociate (rdataset=rdataset@entry=0x7f3fad30cf28) at rdataset.c:111
#5 0x00000000004ebb21 in msgresetnames (first_section=0, msg=0x7f3fad2e1a50, msg@entry=0x7f3fad30b5f0) at message.c:438
#6 msgreset (msg=msg@entry=0x7f3fad2e1a50, everything=everything@entry=false) at message.c:524
#7 0x00000000004ec95a in dns_message_reset (msg=0x7f3fad2e1a50, intent=intent@entry=1) at message.c:760
#8 0x00000000004797ba in ns_client_endrequest (client=0x7f3fae5b8550) at client.c:229
#9 ns__client_reset_cb (client0=0x7f3fae5b8550) at client.c:1586
#10 0x0000000000632989 in isc_nmhandle_unref (handle=handle@entry=0x7f3fae5b83e0) at netmgr.c:1158
#11 0x0000000000632c30 in isc__nm_uvreq_put (req0=req0@entry=0x7f3ff366dbb8, sock=<optimized out>) at netmgr.c:1291
#12 0x00000000006357c4 in udp_send_cb (req=<optimized out>, status=<optimized out>) at udp.c:465
#13 0x00007f3ff5375153 in uv__udp_run_completed () from /lib64/libuv.so.1
#14 0x00007f3ff53754d3 in uv__udp_io () from /lib64/libuv.so.1
#15 0x00007f3ff5367c43 in uv_run () from /lib64/libuv.so.1
#16 0x0000000000632fda in nm_thread (worker0=0x138e3e0) at netmgr.c:481
#17 0x00007f3ff4f39e65 in start_thread () from /lib64/libpthread.so.0
#18 0x00007f3ff484488d in clone () from /lib64/libc.so.6
```
A handful of threads are attempting to get a read lock on the same node - for example:
```
Thread 59 (Thread 0x7f3feab0e700 (LWP 11734)):
#0 0x00007f3ff4f3d144 in pthread_rwlock_rdlock () from /lib64/libpthread.so.0
#1 0x000000000063cc6e in isc_rwlock_lock (rwl=0x7f3f59523980, type=type@entry=isc_rwlocktype_read) at rwlock.c:48
#2 0x00000000005129c6 in rdataset_getownercase (rdataset=<optimized out>, name=0x7f3feaaffde0) at rbtdb.c:9770
#3 0x000000000056620a in towiresorted (rdataset=rdataset@entry=0x7f3ec42dee70, owner_name=owner_name@entry=0x7f3ec42dd0a0, cctx=<optimized out>, target=<optimized out>, order=<optimized out>, order_arg=order_arg@entry=0x7f3ec42b8718, partial=true, options=1, countp=0x7f3feab005dc, state=<optimized out>) at rdataset.c:444
#4 0x0000000000566e3f in dns_rdataset_towirepartial (rdataset=rdataset@entry=0x7f3ec42dee70, owner_name=owner_name@entry=0x7f3ec42dd0a0, cctx=<optimized out>, target=<optimized out>, order=<optimized out>, order_arg=order_arg@entry=0x7f3ec42b8718, options=<optimized out>, options@entry=1, countp=<optimized out>, countp@entry=0x7f3feab005dc, state=<optimized out>, state@entry=0x0) at rdataset.c:565
#5 0x00000000004ecc71 in dns_message_rendersection (msg=0x7f3ec42b8550, sectionid=sectionid@entry=1, options=options@entry=6) at message.c:2086
#6 0x00000000004780f3 in ns_client_send (client=client@entry=0x7f3ec5d4b510) at client.c:555
#7 0x0000000000485b7c in query_send (client=0x7f3ec5d4b510) at query.c:552
#8 0x000000000048de23 in ns_query_done (qctx=qctx@entry=0x7f3feab09a70) at query.c:10921
#9 0x000000000048f76d in query_respond (qctx=0x7f3feab09a70) at query.c:7414
#10 query_prepresponse (qctx=qctx@entry=0x7f3feab09a70) at query.c:9913
#11 0x000000000049181c in query_gotanswer (qctx=qctx@entry=0x7f3feab09a70, res=res@entry=0) at query.c:6836
#12 0x0000000000493a22 in query_lookup (qctx=qctx@entry=0x7f3feab09a70) at query.c:5617
#13 0x00000000004950f6 in query_zone_delegation (qctx=0x7f3feab09a70) at query.c:8003
#14 query_delegation (qctx=qctx@entry=0x7f3feab09a70) at query.c:8031
#15 0x0000000000491a1a in query_gotanswer (qctx=qctx@entry=0x7f3feab09a70, res=res@entry=65565) at query.c:6842
#16 0x0000000000493a22 in query_lookup (qctx=qctx@entry=0x7f3feab09a70) at query.c:5617
#17 0x0000000000494036 in ns__query_start (qctx=qctx@entry=0x7f3feab09a70) at query.c:5493
#18 0x000000000048de05 in ns_query_done (qctx=qctx@entry=0x7f3feab09a70) at query.c:10853
#19 0x0000000000492420 in query_dname (qctx=<optimized out>) at query.c:9806
#20 query_gotanswer (qctx=qctx@entry=0x7f3feab09a70, res=res@entry=65568) at query.c:6872
#21 0x0000000000493a22 in query_lookup (qctx=qctx@entry=0x7f3feab09a70) at query.c:5617
#22 0x00000000004950f6 in query_zone_delegation (qctx=0x7f3feab09a70) at query.c:8003
#23 query_delegation (qctx=qctx@entry=0x7f3feab09a70) at query.c:8031
#24 0x0000000000491a1a in query_gotanswer (qctx=qctx@entry=0x7f3feab09a70, res=res@entry=65565) at query.c:6842
#25 0x0000000000493a22 in query_lookup (qctx=qctx@entry=0x7f3feab09a70) at query.c:5617
#26 0x0000000000494036 in ns__query_start (qctx=qctx@entry=0x7f3feab09a70) at query.c:5493
#27 0x000000000048de05 in ns_query_done (qctx=qctx@entry=0x7f3feab09a70) at query.c:10853
#28 0x0000000000492420 in query_dname (qctx=<optimized out>) at query.c:9806
#29 query_gotanswer (qctx=qctx@entry=0x7f3feab09a70, res=res@entry=65568) at query.c:6872
#30 0x0000000000493a22 in query_lookup (qctx=qctx@entry=0x7f3feab09a70) at query.c:5617
#31 0x00000000004950f6 in query_zone_delegation (qctx=0x7f3feab09a70) at query.c:8003
#32 query_delegation (qctx=qctx@entry=0x7f3feab09a70) at query.c:8031
#33 0x0000000000491a1a in query_gotanswer (qctx=qctx@entry=0x7f3feab09a70, res=res@entry=65565) at query.c:6842
#34 0x0000000000493a22 in query_lookup (qctx=qctx@entry=0x7f3feab09a70) at query.c:5617
#35 0x0000000000494036 in ns__query_start (qctx=qctx@entry=0x7f3feab09a70) at query.c:5493
#36 0x0000000000494b26 in query_setup (client=client@entry=0x7f3ec5d4b510, qtype=<optimized out>) at query.c:5217
#37 0x0000000000497056 in ns_query_start (client=client@entry=0x7f3ec5d4b510) at query.c:11318
#38 0x000000000047b101 in ns__client_request (handle=<optimized out>, region=<optimized out>, arg=<optimized out>) at client.c:2209
#39 0x0000000000635462 in udp_recv_cb (handle=<optimized out>, nrecv=48, buf=0x7f3feab0ab00, addr=<optimized out>, flags=<optimized out>) at udp.c:329
#40 0x00007f3ff53755db in uv__udp_io () from /lib64/libuv.so.1
#41 0x00007f3ff53779c8 in uv__io_poll () from /lib64/libuv.so.1
#42 0x00007f3ff5367c70 in uv_run () from /lib64/libuv.so.1
#43 0x0000000000632fda in nm_thread (worker0=0x13926e8) at netmgr.c:481
#44 0x00007f3ff4f39e65 in start_thread () from /lib64/libpthread.so.0
#45 0x00007f3ff484488d in clone () from /lib64/libc.so.6
```
Meanwhile, the threads run by taskmgr (this bunch would have recursed) were attempting to get write locks (unsurprisingly, although depending on the node and the client query, I guess it's also possible that one might want to get a read lock):
Here's a writer:
```
Thread 50 (Thread 0x7f3fe587b700 (LWP 11746)):
#0 isc_rwlock_lock (rwl=rwl@entry=0x7f3f59523980, type=type@entry=isc_rwlocktype_write) at rwlock.c:57
#1 0x000000000051d826 in decrement_reference (rbtdb=rbtdb@entry=0x7f3fc6457010, node=node@entry=0x7f3eace34510, least_serial=least_serial@entry=0, nlock=nlock@entry=isc_rwlocktype_read, tlock=tlock@entry=isc_rwlocktype_none, pruning=pruning@entry=false) at rbtdb.c:2040
#2 0x00000000005215bf in detachnode (db=0x7f3fc6457010, targetp=0x7f3fe587acc0) at rbtdb.c:5352
#3 0x00000000004bdd83 in dns_db_detachnode (db=<optimized out>, nodep=nodep@entry=0x7f3fe587acc0) at db.c:588
#4 0x00000000004804cb in qctx_clean (qctx=qctx@entry=0x7f3fe587a830) at query.c:5097
#5 0x000000000048db5a in ns_query_done (qctx=qctx@entry=0x7f3fe587a830) at query.c:10834
#6 0x000000000048f76d in query_respond (qctx=0x7f3fe587a830) at query.c:7414
#7 query_prepresponse (qctx=qctx@entry=0x7f3fe587a830) at query.c:9913
#8 0x000000000049181c in query_gotanswer (qctx=qctx@entry=0x7f3fe587a830, res=res@entry=0) at query.c:6836
#9 0x0000000000496870 in query_resume (qctx=0x7f3fe587a830) at query.c:6134
#10 fetch_callback (task=<optimized out>, event=0x7f3ead5c9c18) at query.c:5716
#11 0x000000000064007a in dispatch (threadid=<optimized out>, manager=<optimized out>) at task.c:1152
#12 run (queuep=<optimized out>) at task.c:1344
#13 0x00007f3ff4f39e65 in start_thread () from /lib64/libpthread.so.0
#14 0x00007f3ff484488d in clone () from /lib64/libc.so.6
```
In this particular instance, every single one of the legacy i/o-handler threads was twiddling its thumbs (sitting on epoll_wait() ) - which is probably not too surprising, if no taskmgr workers are sending out queries to auth servers?
Doing stats on this particular capture (74 threads - 24x netmgr, 24x taskmgr, 24x legacy i/o plus 1 each main and the timer thread), we have:
33 instances of isc_rwlock_lock (rwl=rwl@entry=0x7f3f59523980
31 instances of rbtdb=rbtdb@entry=0x7f3fc6457010
30 instances of node=node@entry=0x7f3eace34510
It might be that it's possible to prove from the pstack output that this is a series of different names all attached to the same node, versus a single name that is expiring that all of the threads are attempting to clean-up simultaneously.
Either way, the locking is not working well in this situation - there's a lot of spinning in user space it would appear.
Hypotheses being tendered currently include:
* This scenario has always potentially existed, but using pthread-rwlocks amplifies it considerably
* Could this be a case where prefetching (enabled with default settings in this example) hits a surprise edge case?
* Is it possible we're seeing the after-effects of another delay which has resulted in late client query-response processing for something that has a very short TTL in cache?
* Is this a scenario where a client comes along and queries near-simultaneously (and probably quite innocently) for a lot of similar names under the same domain/apex very close to the time where they would all be naturally expiring from cache?
* Could it be that TTL=0 handling has broken in 9.16 with the introduction of netmgr (noting that TTL=0 responses from auth servers would be expected to be available solely to the clients that recursed and waited for the fetch completion - not to anyone who came along after the fetch had populated cache for the waiting client request to be fulfilled - this should all be in taskmgr and none of it in netmgr)?
* Do we perhaps have too many threads running (detected CPUs = 24)?

Milestone: BIND 9.19.x
Assignee: Ondřej Surý

https://gitlab.isc.org/isc-projects/bind9/-/issues/1777
Update the build instructions for automake
Updated: 2020-05-01T07:07:24Z
Author: Ondřej Surý

Go through various README and other documentation files and update the instructions on how to build BIND 9 with automake in place.

Milestone: May 2020 (9.11.19, 9.11.19-S1, 9.14.12, 9.16.3)
Assignee: Ondřej Surý

https://gitlab.isc.org/isc-projects/bind9/-/issues/1778
Cleanup the final remnants of platform.h
Updated: 2021-10-05T12:07:45Z
Author: Ondřej Surý

There are still a few remaining bits in the `platform.h` header that we need to remove before we can finally get rid of the header.

Milestone: BIND 9.17 Backburner
Assignee: Ondřej Surý

https://gitlab.isc.org/isc-projects/bind9/-/issues/1780
Fix system tests failing with Automake
Updated: 2020-04-27T14:27:27Z
Author: Michał Kępień

 1. `.gitlab-ci.yml` script for running tests is currently broken as it
hides test failures[^1].
2. A number of system tests (e.g. `rrsetorder`) are consistently
failing.
We should first make CI jobs fail when tests fail and then fix the
failures one by one.
[^1]: `( cd bin/tests/system && make -j${TEST_PARALLEL_JOBS:-1} -k check V=1 ) || cat bin/tests/system/test-suite.log`

Milestone: May 2020 (9.11.19, 9.11.19-S1, 9.14.12, 9.16.3)

https://gitlab.isc.org/isc-projects/bind9/-/issues/1782
9.16.x: listen-on-v6 { any; }; no longer works as documented on FreeBSD
Updated: 2020-06-08T12:28:11Z
Author: msinatra

<!--
If the bug you are reporting is potentially security-related - for example,
if it involves an assertion failure or other crash in `named` that can be
triggered repeatedly - then please do *NOT* report it here, but send an
email to [security-officer@isc.org](security-officer@isc.org).
-->
### Summary
In 9.14.x running on FreeBSD, `listen-on-v6 { any; };` functions as documented in the ARM:
> When { any; } is specified as the address_match_list for the listen-on-v6 option, the server does not bind a separate socket to each IPv6 interface address as it does for IPv4 if the operating system has enough API support for IPv6 (specifically if it conforms to RFC 3493 and RFC 3542). Instead, it listens on the IPv6 wildcard address. If the system only has incomplete API support for IPv6, however, the behavior is the same as that for IPv4.
In 9.16.x, it does not function as documented.
9.14.x:
```
root@devns1:~ # fgrep listen-on-v6 /etc/namedb/named.conf
listen-on-v6 { any; };
root@devns1:~ # sockstat | grep named
bind named 44277 3 dgram -> /var/run/logpriv
bind named 44277 21 tcp6 *:53 *:*
bind named 44277 23 tcp4 127.0.0.1:53 *:*
bind named 44277 24 tcp4 127.0.0.1:953 *:*
bind named 44277 25 tcp6 ::1:953 *:*
bind named 44277 512 udp6 *:53 *:*
bind named 44277 514 udp4 127.0.0.1:53 *:*
```
9.16.1 (also verified on 9.16.2):
```
root@devns1:~ # fgrep listen-on-v6 /etc/namedb/named.conf
listen-on-v6 { any; };
root@devns1:~ # sockstat | grep named
bind named 617 27 udp6 ::1:53 *:*
bind named 617 28 tcp6 ::1:53 *:*
bind named 617 29 tcp6 ::1:53 *:*
bind named 617 30 udp6 fe80::1%lo0:53 *:*
bind named 617 31 tcp6 fe80::1%lo0:53 *:*
bind named 617 32 tcp6 fe80::1%lo0:53 *:*
bind named 617 33 udp4 127.0.0.1:53 *:*
bind named 617 34 tcp4 127.0.0.1:53 *:*
bind named 617 35 tcp4 127.0.0.1:53 *:*
bind named 617 36 tcp4 127.0.0.1:953 *:*
bind named 617 37 tcp6 ::1:953 *:*
```
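
The wildcard behaviour described in the ARM text quoted above can be sketched with plain POSIX sockets. This is an illustration and not named's code: the function names are hypothetical, and setting `IPV6_V6ONLY` is an assumption made here so that the single IPv6 socket does not also capture IPv4 traffic. A socket bound this way shows up as `*:53` in `sockstat`, and it accepts traffic for IPv6 addresses that are configured or become usable after the bind, which is why the 9.14.x listing has one `udp6 *:53` line instead of one line per address.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Open a UDP socket bound to the IPv6 wildcard address (::) on 'port'.
 * Returns the descriptor, or -1 on any failure (including no IPv6 support).
 */
int open_v6_wildcard_udp(unsigned short port) {
    int fd = socket(AF_INET6, SOCK_DGRAM, 0);
    if (fd < 0) {
        return -1;
    }

    /* Assumption of this sketch: keep the socket IPv6-only. */
    int on = 1;
    if (setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &on, sizeof(on)) != 0) {
        close(fd);
        return -1;
    }

    struct sockaddr_in6 sin6;
    memset(&sin6, 0, sizeof(sin6));
    sin6.sin6_family = AF_INET6;
    sin6.sin6_addr = in6addr_any; /* the wildcard, not a per-interface address */
    sin6.sin6_port = htons(port);

    if (bind(fd, (struct sockaddr *)&sin6, sizeof(sin6)) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Return 1 if 'fd' is bound to the IPv6 wildcard address, 0 otherwise. */
int is_bound_to_wildcard(int fd) {
    struct sockaddr_in6 sin6;
    socklen_t len = sizeof(sin6);

    assert(fd >= 0);
    if (getsockname(fd, (struct sockaddr *)&sin6, &len) != 0) {
        return 0;
    }
    return IN6_IS_ADDR_UNSPECIFIED(&sin6.sin6_addr) ? 1 : 0;
}
```

Passing port 0 lets the kernel pick a free port, which is convenient for trying the sketch without privileges; port 53 would normally require them.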
### BIND version used
9.16.2 exhibits the bug:
```
BIND 9.16.2 (Stable Release) <id:b310dc7>
running on FreeBSD amd64 11.3-RELEASE-p7 FreeBSD 11.3-RELEASE-p7 #0: Tue Mar 17 08:32:23 UTC 2020 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC
built by make with '--disable-linux-caps' '--localstatedir=/var' '--sysconfdir=/usr/local/etc/namedb' '--with-dlopen=yes' '--with-libxml2' '--with-openssl=/usr/local' '--with-readline=-L/usr/local/lib -ledit' '--with-dlz-filesystem=yes' '--disable-dnstap' '--disable-fixed-rrset' '--disable-geoip' '--without-maxminddb' '--without-gssapi' '--with-libidn2=/usr/local' '--with-json-c' '--disable-largefile' '--with-lmdb=/usr/local' '--disable-native-pkcs11' '--without-python' '--disable-querytrace' 'STD_CDEFINES=-DDIG_SIGCHASE=1' '--enable-tcp-fastopen' '--with-tuning=default' '--disable-symtable' '--prefix=/usr/local' '--mandir=/usr/local/man' '--infodir=/usr/local/share/info/' '--build=amd64-portbld-freebsd11.3' 'build_alias=amd64-portbld-freebsd11.3' 'CC=cc' 'CFLAGS=-O2 -pipe -DLIBICONV_PLUG -fstack-protector-strong -isystem /usr/local/include -fno-strict-aliasing ' 'LDFLAGS= -L/usr/local/lib -ljson-c -Wl,-rpath,/usr/local/lib -fstack-protector-strong ' 'LIBS=-L/usr/local/lib' 'CPPFLAGS=-DLIBICONV_PLUG -isystem /usr/local/include' 'CPP=cpp' 'PKG_CONFIG=pkgconf'
compiled by CLANG 4.2.1 Compatible FreeBSD Clang 8.0.0 (tags/RELEASE_800/final 356365)
compiled with OpenSSL version: OpenSSL 1.1.1f 31 Mar 2020
linked to OpenSSL version: OpenSSL 1.1.1f 31 Mar 2020
compiled with libxml2 version: 2.9.10
linked to libxml2 version: 20910
compiled with json-c version: 0.13.1
linked to json-c version: 0.13.1
compiled with zlib version: 1.2.11
linked to zlib version: 1.2.11
threads support is enabled
default paths:
named configuration: /usr/local/etc/namedb/named.conf
rndc configuration: /usr/local/etc/namedb/rndc.conf
DNSSEC root key: /usr/local/etc/namedb/bind.keys
nsupdate session key: /var/run/named/session.key
named PID file: /var/run/named/pid
named lock file: /var/run/named/named.lock
```
### Steps to reproduce
1. System running FreeBSD 11.3-RELEASE or 12.1-RELEASE.
2. Install BIND916 from ports (with default options).
3. Create an lo1 interface with (an) IPv6 address(es). We use lo1 for the service addresses of our anycast instances.
4. `ifconfig lo1 down`
5. Start named with a basic recursive or authoritative config, with `listen-on-v6 { any; };` configured.
6. `sockstat | grep named`. named will not be listening on the wildcard -nor- on the IPv6 addresses configured on lo1. This is because FreeBSD supports Enhanced DAD on all interfaces and marks all v6 addresses as 'tentative' until the interface comes up.
7. `ifconfig lo1 up`
8. `sockstat | grep named`. named is still not listening on lo1's IPv6 addresses.
9. Attempt to query the server on the IPv6 address on lo1. It will time out.
10. `rndc scan`
11. Repeat steps 8 and 9. named is still not listening on the lo1 addresses and not responding.
12. RESTART named. It is now listening on the new lo1 addresses.
With 9.14.x, queries to the new address do not time out because named is properly listening on the wildcard.
WORKAROUND:
`ifconfig lo1 no_dad`. This disables DAD processing on the loopback (not clear why you need it there anyway) and clears the `tentative` flag even if lo1 is down. named will listen explicitly on the IPv6 addresses whether lo1 is marked "UP" or "DOWN." Note that this does not work reliably on FreeBSD 11.3-RELEASE, but does work on 12.1-RELEASE.

June 2020 (9.11.20, 9.11.20-S1, 9.16.4, 9.17.2)
Witold Krecicki

https://gitlab.isc.org/isc-projects/bind9/-/issues/1783
AX_CHECK_COMPILE_FLAG -fno-delete-null-pointer-checks does not fail for clang
2020-04-29T16:26:35Z
Mark Andrews
May 2020 (9.11.19, 9.11.19-S1, 9.14.12, 9.16.3)
Mark Andrews

https://gitlab.isc.org/isc-projects/bind9/-/issues/1784
RFC9103: DNS Zone Transfer over TLS (XoT)
2022-08-01T11:52:34Z
Peter Davies
### Description
[RFC9103](https://datatracker.ietf.org/doc/html/rfc9103) describes the use of TLS to encrypt zone transfers in order to provide confidentiality, known as XFR-over-TLS (XoT). The standard has been adopted by the DPRIVE WG.
### Feature Request
The feature request is for BIND to support XFR-over-TLS as described in the above RFC. This will obviously be dependent on DoT (RFC7858) being implemented in BIND. The specific aspects of the XoT implementation that are desired are:
- [x] * Support for both AXFR and IXFR
- [x] * XoT requires `dot` ALPN token to be negotiated (See: #2794)
- [x] * XoT requires TLSv1.3 or higher (See: #2795, and related #2796)
- [x] * Support for XFR-over-TLS both when BIND is acting as a primary and a secondary
- [X] * XFR-over-TLS (XoT): Primaries need to be able to restrict XFR to just TLS (#2776)
- [ ] * Related: Replace `tcp-only` with a more generic option (#2992)
- [X] * Support for authentication of TLS connections via X.509 certificates (Strict TLS and Mutual TLS)
- Related MR: !5600
- [X] * A TLS contexts cache needs to be implemented for contexts reuse and fast retrieval of the data associated with contexts (like CA intermediates chain): #3067, !5672
- [x] * Add remote TLS certificate verification support, implement Strict and Mutual TLS authentication (#3163)
- [ ] * Optimisation of TCP/TLS connections such that persistent connections can be re-used for multiple IXFRs for the same zone, and also IXFRs for different zones.
- [X] Client TLS session resumption support: !6274
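
Two of the requirements in the list above (TLSv1.3 or higher, and the `dot` ALPN token) can be illustrated on the client side. This is a minimal sketch using Python's standard `ssl` module, not BIND's implementation; the function names are hypothetical. It also shows the two-octet length prefix that DNS messages carry over TCP and TLS.

```python
import ssl
import struct


def make_xot_client_context(cafile=None):
    """Build a client TLS context suitable for an XoT-style connection.

    Assumption: certificate verification against the default (or given)
    CA bundle stands in for the Strict TLS profile.
    """
    ctx = ssl.create_default_context(cafile=cafile)
    # XoT requires TLSv1.3 or higher.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    # XoT requires the "dot" ALPN token to be offered.
    ctx.set_alpn_protocols(["dot"])
    return ctx


def frame_dns_message(wire):
    """Prefix a DNS wire-format message with its two-octet length."""
    if len(wire) > 0xFFFF:
        raise ValueError("DNS message too large for a two-octet length prefix")
    return struct.pack("!H", len(wire)) + wire


if __name__ == "__main__":
    context = make_xot_client_context()
    print(context.minimum_version)
    print(frame_dns_message(b"\x12\x34\x00").hex())
```

After the handshake, a client would still need to confirm that `selected_alpn_protocol()` on the wrapped socket returns `dot`; offering the token does not guarantee the server selected it.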
### Related issues/bugs
- [x] * #2450 - Follow-up from "Draft: Resolve "XoT xfrin""
- See !5602 which addresses the most important points from the issue
- [x] * #2884 - Sometimes dig aborts on an AXFR query over TLS
- [X] * #2986 - TLS not working on the client-side (dig/named)
- [X] * #3004 - dig and named crash when receiving XFR over TLS
See RT [#16298](https://support.isc.org/Ticket/Display.html?id=16298)

BIND 9.19.x
Artem Boldariev

https://gitlab.isc.org/isc-projects/bind9/-/issues/1785
Suspected lack of updating LRU on records used for DNSSEC validation
2020-04-23T19:39:49Z
Brian Conry
If this is true, then it can lead to situations where a cache goes overmem and starts discarding records with an LRU of "never", but which happen to be necessary for validating other records and thus need to be refetched, causing an increase in upstream recursion and also in validation (as all of those records needed for validation need to be validated before they can be used).
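
The suspected effect can be illustrated with a toy model. This is a sketch under stated assumptions, not BIND's cache: `ToyCache`, `simulate`, and the record names are hypothetical, and the cache is a plain LRU with a fixed entry count. It compares a validation lookup that refreshes the LRU position with one that does not.

```python
from collections import OrderedDict


class ToyCache:
    """Fixed-size LRU cache; the least recently used entry is evicted first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # oldest entry first

    def put(self, name, value):
        self.entries[name] = value
        self.entries.move_to_end(name)
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the oldest entry

    def get(self, name, touch=True):
        value = self.entries.get(name)
        if value is not None and touch:
            self.entries.move_to_end(name)  # refresh LRU position
        return value


def simulate(touch_on_validation, lookups=8):
    """Count how often the validation key must be refetched."""
    cache = ToyCache(capacity=3)
    key = "example. DNSKEY"
    cache.put(key, "key material")
    refetches = 0
    for i in range(lookups):
        # Validation reads the key; the flag models whether that read
        # updates the LRU position.
        if cache.get(key, touch=touch_on_validation) is None:
            refetches += 1
            cache.put(key, "key material")
        cache.put(f"host{i}.example. A", "address")
    return refetches


if __name__ == "__main__":
    print("touch on validation:", simulate(True))
    print("no touch on validation:", simulate(False))
```

In this model the key is never evicted when validation refreshes its position, while without the refresh it is periodically discarded and refetched even though it is used on every lookup.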
See https://support.isc.org/Ticket/Display.html?id=16212 and supporting documents for the data that led to this tentative conclusion.

https://gitlab.isc.org/isc-projects/bind9/-/issues/1787
BIND (master) does not work with krb5 1.18 (NegoEx)
2020-06-23T11:47:32Z
Michał Kępień
Current `master` does not work with krb5 1.18 (released in February
2020) - `nsupdate` and `tsiggss` system tests are consistently failing.
`git bisect` claims that an [upstream commit][1] implementing
[NegoEx][2] is the culprit.
This is only an issue for `master` as we do not use krb5's SPNEGO
mechanism in any other branch. Older branches either use an internal
SPNEGO implementation or no SPNEGO mechanism at all when
`--disable-isc-spnego` is used.
Out of all our Docker images, only the Tumbleweed one has krb5 1.18,
though - as luck would have it - the `krb5-devel` package there installs
`krb5-config` into a custom prefix (`/usr/lib/mit`), which prevents
BIND's `./configure` from autodetecting it and thus BIND builds on
Tumbleweed lack GSSAPI support altogether. I will push a branch shortly
which fixes this so that the breakage can be demonstrated in CI.
I cannot say I understand GSSAPI, so this needs attention from someone
who does.
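
The detection gap described above (a `krb5-config` installed under a prefix that is not on `PATH`) can be sketched as follows. This is an illustration only, not what BIND's `./configure` does: the function name is hypothetical, and the script is assumed to live in a `bin/` directory under each prefix. Only `/usr/lib/mit` comes from the report above.

```python
import os
import shutil

# Only /usr/lib/mit is taken from the report; the list is otherwise hypothetical.
EXTRA_PREFIXES = ["/usr/lib/mit"]


def find_krb5_config(extra_prefixes=EXTRA_PREFIXES, env_path=None):
    """Locate krb5-config on PATH first, then under extra prefixes."""
    if env_path is None:
        env_path = os.environ.get("PATH", os.defpath)
    found = shutil.which("krb5-config", path=env_path)
    if found:
        return found
    for prefix in extra_prefixes:
        # Assumption: the script is installed as <prefix>/bin/krb5-config.
        candidate = os.path.join(prefix, "bin", "krb5-config")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    print(find_krb5_config())
```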
[1]: https://github.com/krb5/krb5/commit/c2ca2f26eaf817a6a7ed42257c380437ab802bd9
[2]: https://tools.ietf.org/html/draft-zhu-negoex-04

BIND 9.17 Backburner
Ondřej Surý

https://gitlab.isc.org/isc-projects/bind9/-/issues/1791
Explore if we need to make changes for OpenSSL 3.0.0
2021-10-05T12:48:05Z
Mark Andrews
OpenSSL 3.0.0 alpha1 has just been released. Explore if there are changes needed to support working with OpenSSL 3.0.0.