# BIND issues

Source: https://gitlab.isc.org/isc-projects/bind9/-/issues (exported 2024-02-24T08:19:32Z)

## [#4427](https://gitlab.isc.org/isc-projects/bind9/-/issues/4427) Various improvements to hashing and hash table management

*Michał Kępień · 2024-02-24T08:19:32Z*

This is a meta issue to keep track of various improvements to hashing
and hash table management that were implemented since ~"v9.18".
Sparked by a [Mattermost discussion][1].

---

- [x] #4306/!8288 Implement incremental hashing

---
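For background, incremental hashing lets a caller feed data to the hash function in pieces rather than requiring the whole input up front. BIND has its own hashing API, but the general technique can be sketched with Python's `hashlib`:

```python
import hashlib

# Feed the input in chunks; the digest must match hashing it in one shot.
incremental = hashlib.sha256()
for chunk in (b"www.", b"example.", b"com."):
    incremental.update(chunk)

one_shot = hashlib.sha256(b"www.example.com.")
assert incremental.hexdigest() == one_shot.hexdigest()
print(incremental.hexdigest())
```

The point of the incremental form is that callers (e.g. code hashing a DNS name label by label) never need to assemble a contiguous buffer first.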
[1]: https://mattermost.isc.org/isc/pl/rsyemrkwhtfcbxhtyxddhkn58y

Milestone: May 2024 (9.18.27, 9.18.27-S1, 9.19.24)

## [#4423](https://gitlab.isc.org/isc-projects/bind9/-/issues/4423) named starts up slow when many zones reference the same dnssec-policy

*Matthijs Mekking (matthijs@isc.org) · 2024-02-24T07:54:22Z*

While rolling out KASP to many zones, it is more efficient to use multiple DNSSEC policies in order to improve
reload/reconfig times.

When all zones are referenced by the same `dnssec-policy`, it takes quite some time to process all zones after a reload/reconfig: CPU usage of the `named` process stays at 100%, and it takes quite a few minutes before `named` starts responding to queries again.
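One way to spread that load is to define several `dnssec-policy` blocks with identical contents and distribute zones across them. A sketch (policy, zone, and file names are illustrative; the key settings mirror the policy shown later in this report):

```
dnssec-policy "standard-1" {
	keys {
		ksk lifetime unlimited algorithm rsasha256 2048;
		zsk lifetime unlimited algorithm rsasha256 2048;
	};
};
dnssec-policy "standard-2" {
	keys {
		ksk lifetime unlimited algorithm rsasha256 2048;
		zsk lifetime unlimited algorithm rsasha256 2048;
	};
};

// Assign zones to the identical policies round-robin:
zone "zone0001.example" {
	type primary;
	file "zone0001.example.db";
	dnssec-policy "standard-1";
};
zone "zone0002.example" {
	type primary;
	file "zone0002.example.db";
	dnssec-policy "standard-2";
};
```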
When spreading my zones over 10 identical policies, CPU usage goes well above 100% (using more threads, I assume) and this speeds things up nicely.

Milestone: May 2024 (9.18.27, 9.18.27-S1, 9.19.24) · Assignee: Matthijs Mekking

## [#4422](https://gitlab.isc.org/isc-projects/bind9/-/issues/4422) No supported algorithms on platform

*Mark Andrews · 2024-02-29T15:26:01Z*

Job [#3783240](https://gitlab.isc.org/isc-projects/bind9/-/jobs/3783240) failed for 5d20a7ce254dabe1d4a99f7bd0fd1cfa6309124b:
```
$ PYTHON="$(source bin/tests/system/conf.sh; echo $PYTHON)"
Traceback (most recent call last):
File "/builds/isc-projects/bind9/bin/tests/system/get_algorithms.py", line 241, in <module>
main()
File "/builds/isc-projects/bind9/bin/tests/system/get_algorithms.py", line 227, in main
algs = filter_supported(algs)
^^^^^^^^^^^^^^^^^^^^^^
File "/builds/isc-projects/bind9/bin/tests/system/get_algorithms.py", line 138, in filter_supported
raise RuntimeError(
RuntimeError: no DEFAULT algorithm from "stable" set supported on this platform
$
```

Milestone: May 2024 (9.18.27, 9.18.27-S1, 9.19.24) · Assignee: Tom Krizek

## [#4400](https://gitlab.isc.org/isc-projects/bind9/-/issues/4400) CID 467118: Control flow issue in lib/dns/message.c

*Michal Nowak · 2024-02-24T07:55:29Z*

Coverity Scan claims a control-flow issue in `lib/dns/message.c` (suspect: !8400).
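Reduced to a minimal model (Python here for brevity; the original is C), the flagged pattern is cleanup code that no remaining path can reach, because every path before the cleanup label returns:

```python
# Toy model of the DEADCODE finding: every path returns before the
# cleanup section, so the guarded cleanup call can never execute.
def getquestions_model(recoverable: bool) -> str:
    rdataset = None
    if recoverable:
        return "DNS_R_RECOVERABLE"
    return "ISC_R_SUCCESS"
    # cleanup: -- everything below this point is unreachable
    if rdataset is not None:
        print("dns_message_puttemprdataset(...)")  # never runs
    return "cleanup"

print(getquestions_model(False))
```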
```
*** CID 467118: Control flow issues (DEADCODE)
/lib/dns/message.c: 1076 in getquestions()
1070 return (DNS_R_RECOVERABLE);
1071 }
1072 return (ISC_R_SUCCESS);
1073
1074 cleanup:
1075 if (rdataset != NULL) {
>>> CID 467118: Control flow issues (DEADCODE)
>>> Execution cannot reach this statement: "dns_message_puttemprdataset...".
1076 dns_message_puttemprdataset(msg, &rdataset);
1077 }
1078 if (free_name) {
1079 dns_message_puttempname(msg, &name);
1080 }
1081
```

Milestone: May 2024 (9.18.27, 9.18.27-S1, 9.19.24) · Assignee: Ondřej Surý

## [#4352](https://gitlab.isc.org/isc-projects/bind9/-/issues/4352) inline-signing breaks nsdiff

*Björn Persson · 2024-02-24T07:55:39Z*

### Summary
When both inline-signing and update-policy are in use, I can't detect race conditions with the method described in RFC 2136 section 5.7, which nsdiff uses.
In a zone that uses dnssec-policy and relies on the default value of inline-signing, the method in RFC 2136 section 5.7 will stop working on upgrade to BIND 9.20, as inline-signing will then be switched on by default, if I understand correctly.
### BIND version used
```
$ named -V
BIND 9.18.19-1~deb12u1-Debian (Extended Support Version) <id:>
running on Linux x86_64 5.10.0-25-amd64 #1 SMP Debian 5.10.191-1 (2023-08-16)
built by make with '--build=x86_64-linux-gnu' '--prefix=/usr' '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--disable-option-checking' '--disable-silent-rules' '--libdir=${prefix}/lib/x86_64-linux-gnu' '--runstatedir=/run' '--disable-maintainer-mode' '--disable-dependency-tracking' '--libdir=/usr/lib/x86_64-linux-gnu' '--sysconfdir=/etc/bind' '--with-python=python3' '--localstatedir=/' '--enable-threads' '--enable-largefile' '--with-libtool' '--enable-shared' '--disable-static' '--with-gost=no' '--with-openssl=/usr' '--with-gssapi=yes' '--with-libidn2' '--with-json-c' '--with-lmdb=/usr' '--with-gnu-ld' '--with-maxminddb' '--with-atf=no' '--enable-ipv6' '--enable-rrl' '--enable-filter-aaaa' '--disable-native-pkcs11' '--enable-dnstap' 'build_alias=x86_64-linux-gnu' 'CFLAGS=-g -O2 -ffile-prefix-map=/build/reproducible-path/bind9-9.18.19=. -fstack-protector-strong -Wformat -Werror=format-security -fno-strict-aliasing -fno-delete-null-pointer-checks -DNO_VERSION_DATE -DDIG_SIGCHASE' 'LDFLAGS=-Wl,-z,relro -Wl,-z,now' 'CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2'
compiled by GCC 12.2.0
compiled with OpenSSL version: OpenSSL 3.0.10 1 Aug 2023
linked to OpenSSL version: OpenSSL 3.0.9 30 May 2023
compiled with libuv version: 1.44.2
linked to libuv version: 1.44.2
compiled with libnghttp2 version: 1.52.0
linked to libnghttp2 version: 1.52.0
compiled with libxml2 version: 2.9.14
linked to libxml2 version: 20914
compiled with json-c version: 0.16
linked to json-c version: 0.16
compiled with zlib version: 1.2.13
linked to zlib version: 1.2.13
linked to maxminddb version: 1.7.1
compiled with protobuf-c version: 1.4.1
linked to protobuf-c version: 1.4.1
threads support is enabled
DNSSEC algorithms: RSASHA1 NSEC3RSASHA1 RSASHA256 RSASHA512 ECDSAP256SHA256 ECDSAP384SHA384 ED25519 ED448
DS algorithms: SHA-1 SHA-256 SHA-384
HMAC algorithms: HMAC-MD5 HMAC-SHA1 HMAC-SHA224 HMAC-SHA256 HMAC-SHA384 HMAC-SHA512
TKEY mode 2 support (Diffie-Hellman): yes
TKEY mode 3 support (GSS-API): yes
default paths:
named configuration: /etc/bind/named.conf
rndc configuration: /etc/bind/rndc.conf
DNSSEC root key: /etc/bind/bind.keys
nsupdate session key: //run/named/session.key
named PID file: //run/named/named.pid
named lock file: //run/named/named.lock
geoip-directory: /usr/share/GeoIP
```
### Steps to reproduce
Here I start from a working state with serial number 2023091800. The prerequisite matches the reply to the SOA query, and the update is answered with NOERROR. This is correct as far as I understand:
```
$ (echo 'prereq yxrrset xn--rombobjrn-67a.se. IN SOA ns1.xn--rombobjrn-67a.se. hostmaster.xn--rombobjrn-67a.se. 2023091800 14400 3600 3024000 86400' ; echo send) | nsupdate -k internal -d
Creating key...
Creating key...
namefromtext
keycreate
Reply from SOA query:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 23595
;; flags: qr aa rd ra; QUESTION: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;xn--rombobjrn-67a.se. IN SOA
;; ANSWER SECTION:
xn--rombobjrn-67a.se. 86400 IN SOA ns1.xn--rombobjrn-67a.se. hostmaster.xn--rombobjrn-67a.se. 2023091800 14400 3600 3024000 86400
Found zone name: xn--rombobjrn-67a.se
The primary is: ns1.xn--rombobjrn-67a.se
Sending update to 192.168.72.1#53
Outgoing update query:
;; ->>HEADER<<- opcode: UPDATE, status: NOERROR, id: 44980
;; flags:; ZONE: 1, PREREQ: 1, UPDATE: 0, ADDITIONAL: 1
;; PREREQUISITE SECTION:
xn--rombobjrn-67a.se. 0 IN SOA ns1.xn--rombobjrn-67a.se. hostmaster.xn--rombobjrn-67a.se. 2023091800 14400 3600 3024000 86400
;; TSIG PSEUDOSECTION:
internal.beorn.tag.xn--rombobjrn-67a.se. 0 ANY TSIG hmac-sha512. 1695171088 300 64 [...] 44980 NOERROR 0
Reply from update query:
;; ->>HEADER<<- opcode: UPDATE, status: NOERROR, id: 44980
;; flags: qr; ZONE: 1, PREREQ: 0, UPDATE: 0, ADDITIONAL: 1
;; ZONE SECTION:
;xn--rombobjrn-67a.se. IN SOA
;; TSIG PSEUDOSECTION:
internal.beorn.tag.xn--rombobjrn-67a.se. 0 ANY TSIG hmac-sha512. 1695171088 300 64 [...] 44980 NOERROR 0
```
Later, the server has automatically increased the serial number to 2023091802. I use the new serial number in the prerequisite, so it looks identical to the new SOA value, but this time the update is rejected with NXRRSET:
```
$ (echo 'prereq yxrrset xn--rombobjrn-67a.se. IN SOA ns1.xn--rombobjrn-67a.se. hostmaster.xn--rombobjrn-67a.se. 2023091802 14400 3600 3024000 86400' ; echo send) | nsupdate -k internal -d
Creating key...
Creating key...
namefromtext
keycreate
Reply from SOA query:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 65052
;; flags: qr aa rd ra; QUESTION: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;xn--rombobjrn-67a.se. IN SOA
;; ANSWER SECTION:
xn--rombobjrn-67a.se. 86400 IN SOA ns1.xn--rombobjrn-67a.se. hostmaster.xn--rombobjrn-67a.se. 2023091802 14400 3600 3024000 86400
Found zone name: xn--rombobjrn-67a.se
The primary is: ns1.xn--rombobjrn-67a.se
Sending update to 192.168.72.1#53
Outgoing update query:
;; ->>HEADER<<- opcode: UPDATE, status: NOERROR, id: 52912
;; flags:; ZONE: 1, PREREQ: 1, UPDATE: 0, ADDITIONAL: 1
;; PREREQUISITE SECTION:
xn--rombobjrn-67a.se. 0 IN SOA ns1.xn--rombobjrn-67a.se. hostmaster.xn--rombobjrn-67a.se. 2023091802 14400 3600 3024000 86400
;; TSIG PSEUDOSECTION:
internal.beorn.tag.xn--rombobjrn-67a.se. 0 ANY TSIG hmac-sha512. 1695259647 300 64 [...] 52912 NOERROR 0
Reply from update query:
;; ->>HEADER<<- opcode: UPDATE, status: NXRRSET, id: 52912
;; flags: qr; ZONE: 1, PREREQ: 0, UPDATE: 0, ADDITIONAL: 1
;; ZONE SECTION:
;xn--rombobjrn-67a.se. IN SOA
;; TSIG PSEUDOSECTION:
internal.beorn.tag.xn--rombobjrn-67a.se. 0 ANY TSIG hmac-sha512. 1695259647 300 64 [...] 52912 NOERROR 0
```
Now I change the prerequisite back to 2023091800. The SOA hasn't changed again. The serial number is still 2023091802. This update should be rejected as the prerequisite contains an outdated serial number, but is in fact answered with NOERROR:
```
$ (echo 'prereq yxrrset xn--rombobjrn-67a.se. IN SOA ns1.xn--rombobjrn-67a.se. hostmaster.xn--rombobjrn-67a.se. 2023091800 14400 3600 3024000 86400' ; echo send) | nsupdate -k internal -d
Creating key...
Creating key...
namefromtext
keycreate
Reply from SOA query:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 6226
;; flags: qr aa rd ra; QUESTION: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;xn--rombobjrn-67a.se. IN SOA
;; ANSWER SECTION:
xn--rombobjrn-67a.se. 86400 IN SOA ns1.xn--rombobjrn-67a.se. hostmaster.xn--rombobjrn-67a.se. 2023091802 14400 3600 3024000 86400
Found zone name: xn--rombobjrn-67a.se
The primary is: ns1.xn--rombobjrn-67a.se
Sending update to 192.168.72.1#53
Outgoing update query:
;; ->>HEADER<<- opcode: UPDATE, status: NOERROR, id: 49961
;; flags:; ZONE: 1, PREREQ: 1, UPDATE: 0, ADDITIONAL: 1
;; PREREQUISITE SECTION:
xn--rombobjrn-67a.se. 0 IN SOA ns1.xn--rombobjrn-67a.se. hostmaster.xn--rombobjrn-67a.se. 2023091800 14400 3600 3024000 86400
;; TSIG PSEUDOSECTION:
internal.beorn.tag.xn--rombobjrn-67a.se. 0 ANY TSIG hmac-sha512. 1695263626 300 64 [...] 49961 NOERROR 0
Reply from update query:
;; ->>HEADER<<- opcode: UPDATE, status: NOERROR, id: 49961
;; flags: qr; ZONE: 1, PREREQ: 0, UPDATE: 0, ADDITIONAL: 1
;; ZONE SECTION:
;xn--rombobjrn-67a.se. IN SOA
;; TSIG PSEUDOSECTION:
internal.beorn.tag.xn--rombobjrn-67a.se. 0 ANY TSIG hmac-sha512. 1695263626 300 64 [...] 49961 NOERROR 0
```
### What is the current *bug* behavior?
It seems that a serial number specified in a prerequisite of an update is compared to the unsigned version of the zone, whereas the serial number retrieved with a SOA or AXFR query comes from the signed version. As far as I know a client can't look up the unsigned serial number, and thus can't specify it in the prerequisite, so the update fails whenever the two serial numbers differ.
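A minimal model of the reported behavior (serial numbers taken from the transcript above; the comparison logic is an assumption about the observed behavior, not BIND's actual code):

```python
# The raw (unsigned) zone and the signed zone drift apart because
# inline signing bumps the serial of the signed copy only.
UNSIGNED_SERIAL = 2023091800   # what the prerequisite appears to be checked against
SIGNED_SERIAL = 2023091802     # what a SOA or AXFR query returns

def soa_prereq_ok(prereq_serial: int) -> str:
    """Model of the yxrrset check: compare against the unsigned zone."""
    return "NOERROR" if prereq_serial == UNSIGNED_SERIAL else "NXRRSET"

# The client can only see the signed serial, so its prerequisite fails...
print(soa_prereq_ok(SIGNED_SERIAL))    # NXRRSET
# ...while the stale serial, which ought to fail, is accepted:
print(soa_prereq_ok(UNSIGNED_SERIAL))  # NOERROR
```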
### What is the expected *correct* behavior?
It seems to me that a prerequisite that specifies a SOA record should be checked against the same record that a client gets in response to a SOA or AXFR query. I don't know what other use cases that might break, though.
### Relevant excerpts from the configuration file
```
dnssec-policy "some_name" {
keys {
ksk lifetime unlimited algorithm rsasha256 2048;
zsk lifetime unlimited algorithm rsasha256 2048;
};
dnskey-ttl P1D;
purge-keys 0;
};
view "internal" {
allow-transfer { key internal.beorn.tag.xn--rombobjrn-67a.se.; };
zone "xn--rombobjrn-67a.se" {
type master;
file "/var/lib/bind/db.xn--rombobjrn-67a.se.internal";
dnssec-policy some_name;
parental-agents { ::1; };
inline-signing yes;
update-policy {
grant internal.beorn.tag.xn--rombobjrn-67a.se. zonesub ANY;
};
};
};
```

Milestone: May 2024 (9.18.27, 9.18.27-S1, 9.19.24) · Assignee: Matthijs Mekking

## [#4345](https://gitlab.isc.org/isc-projects/bind9/-/issues/4345) Debug messages logging network traffic only include the address of one peer

*Michał Kępień · 2024-02-24T08:19:42Z*

Even with `-d 99` used on the command line, `named` only logs lines
like:

    28-Sep-2023 14:31:23.212 sending packet to 2001:503:ba3e::2:30#53

or:

    28-Sep-2023 14:31:23.232 received packet from 2001:503:ba3e::2:30#53
However, network traffic is always sent **from** one socket **to**
another. The currently available debug messages do not include the
sender's address (first example) or the receiver's address (second
example). As a result, just bumping up the log level is often not
enough to diagnose certain issues and a network traffic sniffer has to
be used in order to learn the details of the packets being exchanged.
This lack of detail sometimes also makes debugging system test issues
harder than it has to be. With multiple tests being run in parallel,
knowing the exact addresses and ports that were used by each running
`named` instance is crucial for determining whether a test failure was
caused by an unexpected interaction between tests or not. (Such issues
happened more than once in the past, particularly when network code
and/or the system test framework were being worked on.)
Debug messages logging network traffic should be extended to include
information about both sides of each communication channel.
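For illustration, both endpoints of a connected socket can be recovered with `getsockname()`/`getpeername()`; a sketch of what such an extended log line could look like (the exact wording is an assumption, not the proposed format):

```python
import socket

def channel_line(sock: socket.socket, verb: str) -> str:
    """Format a log line that names both sides of the channel."""
    local = "%s#%d" % sock.getsockname()[:2]
    remote = "%s#%d" % sock.getpeername()[:2]
    return f"{verb} packet from {local} to {remote}"

# Loopback demo: a UDP "server" socket plus a connected "client" socket.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))            # kernel assigns an ephemeral port
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.connect(server.getsockname())     # connect() also fixes the local address
line = channel_line(client, "sending")
print(line)
client.close()
server.close()
```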
While this issue is technically only tangential to #4344, having
detailed network-level information available would greatly improve the
benefits of the feature proposed here.

Milestone: May 2024 (9.18.27, 9.18.27-S1, 9.19.24) · Assignee: Michał Kępień

## [#4344](https://gitlab.isc.org/isc-projects/bind9/-/issues/4344) Enable extraction of exact local socket addresses

*Michał Kępień · 2024-02-24T08:19:47Z*

The Network Manager API is currently unable to expose the exact
address/port that a local wildcard/TCP socket is bound to. This limits
the level of detail available to all sorts of traffic-logging code
(debug messages, dnstap, etc.)
This has been previously discussed (in dnstap context) in #3143. Back
then, it quickly [emerged][1] that extracting the exact address that a
local wildcard/TCP socket is bound to requires issuing a system call.
Unfortunately, the function that would be responsible for doing this is
[called on a hot path][2]. After running some performance tests, it
[became obvious][3] that doing that unconditionally is a non-starter
performance-wise. The proposal was scrapped and replaced with a [note
in documentation](!6472).
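One possible shape for such a preference, sketched in Python (the knob name is hypothetical, not a proposed option): perform the `getsockname()` syscall only when the user has opted into detailed logging, so the hot path pays nothing by default:

```python
import socket

detailed_logging = True   # hypothetical knob trading raw performance for detail

def local_endpoint(sock: socket.socket) -> str:
    """Resolve the bound address only when the user asked for the detail."""
    if not detailed_logging:
        return "<unresolved>"               # hot-path default: no extra syscall
    host, port = sock.getsockname()[:2]     # the syscall the hot path avoids
    return f"{host}#{port}"

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("127.0.0.1", 0))
endpoint = local_endpoint(sock)
print(endpoint)
sock.close()
```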
However, the problem persists and limits the capabilities of not just
dnstap, but also logging code. In some cases, more detailed logging is
preferred over raw performance and there should be some way for the user
to express their preference in that regard.
[1]: https://gitlab.isc.org/isc-projects/bind9/-/merge_requests/5816#note_272336
[2]: https://gitlab.isc.org/isc-projects/bind9/-/merge_requests/5816#note_272404
[3]: https://gitlab.isc.org/isc-projects/bind9/-/merge_requests/5816#note_272407

Milestone: May 2024 (9.18.27, 9.18.27-S1, 9.19.24) · Assignee: Michał Kępień

## [#4333](https://gitlab.isc.org/isc-projects/bind9/-/issues/4333) catz: assertion failure in dns_view_attach() during shutdown

*Aram Sargsyan · 2024-02-24T07:55:32Z*

See https://gitlab.isc.org/isc-projects/bind9/-/jobs/3673976
This has happened on a branch based on `main`, so for now only ~"Affects v9.19" is set, but the other branches are probably affected too. The labels will be updated when there is more information.
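The failed `INSIST(__v > 0 && __v < (4294967295U))` in `dns_view_attach()` (visible in Thread 1 below) is a reference-count sanity check. A toy model of the suspected sequence, purely to illustrate the class of bug, not the actual BIND code:

```python
class View:
    """Toy reference-counted object mirroring the INSIST in dns_view_attach()."""
    def __init__(self) -> None:
        self.references = 1

    def attach(self) -> None:
        # INSIST(__v > 0): attaching to an already-released view is a bug.
        assert self.references > 0, "INSIST(__v > 0) failed"
        self.references += 1

    def detach(self) -> None:
        assert self.references > 0
        self.references -= 1

view = View()
view.detach()          # shutdown releases the last reference...
try:
    view.attach()      # ...then a late catz update callback attaches again
except AssertionError as exc:
    print(exc)
```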
```
[New LWP 55771]
[New LWP 55766]
[New LWP 55752]
[New LWP 55767]
[New LWP 55772]
[New LWP 55770]
[New LWP 55773]
[New LWP 55769]
[New LWP 55768]
[New LWP 55774]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/builds/isc-projects/bind9/bin/named/.libs/lt-named -D catz_tmp_b13wq61e-ns4 -X'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f0c573e0b8f in raise () from /lib64/libc.so.6
[Current thread is 1 (Thread 0x7f0c3c87e700 (LWP 55771))]
Thread 10 (Thread 0x7f0c3b07b700 (LWP 55774)):
#0 0x00007f0c573cb9bd in syscall () from /lib64/libc.so.6
No symbol table info available.
#1 0x00007f0c57b68c76 in synchronize_rcu_memb () from /lib64/liburcu.so.6
No symbol table info available.
#2 0x00007f0c57b6975d in call_rcu_thread () from /lib64/liburcu.so.6
No symbol table info available.
#3 0x00007f0c580f81da in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#4 0x00007f0c573cbe73 in clone () from /lib64/libc.so.6
No symbol table info available.
Thread 9 (Thread 0x7f0c5247d700 (LWP 55768)):
#0 0x00007f0c580ff60e in pthread_barrier_wait () from /lib64/libpthread.so.0
No symbol table info available.
#1 0x00007f0c5b8416f9 in resume_loop (loop=0x7f0c53e93180) at loop.c:114
loopmgr = <optimized out>
loopmgr = <optimized out>
#2 pauseresume_cb (handle=<optimized out>) at loop.c:114
loop = 0x7f0c53e93180
#3 0x00007f0c5922a2f1 in uv.async_io.part () from /lib64/libuv.so.1
No symbol table info available.
#4 0x00007f0c5923bd15 in uv.io_poll () from /lib64/libuv.so.1
No symbol table info available.
#5 0x00007f0c5922aa74 in uv_run () from /lib64/libuv.so.1
No symbol table info available.
#6 0x00007f0c5b841792 in loop_thread (arg=arg@entry=0x7f0c53e93180) at loop.c:282
loop = 0x7f0c53e93180
r = <optimized out>
__func__ = "loop_thread"
ret = <optimized out>
#7 0x00007f0c5b850469 in thread_body (wrap=wrap@entry=0xf19630) at thread.c:85
func = 0x7f0c5b841707 <loop_thread>
arg = 0x7f0c53e93180
ret = 0x0
jemalloc_enforce_init = 0x7f0c48000b60
#8 0x00007f0c5b850492 in thread_run (wrap=0xf19630) at thread.c:100
ret = <optimized out>
#9 0x00007f0c580f81da in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#10 0x00007f0c573cbe73 in clone () from /lib64/libc.so.6
No symbol table info available.
Thread 8 (Thread 0x7f0c51c7c700 (LWP 55769)):
#0 0x00007f0c573cb9bd in syscall () from /lib64/libc.so.6
No symbol table info available.
#1 0x00007f0c5795e96d in futex_wait.part () from /lib64/liburcu-cds.so.6
No symbol table info available.
#2 0x00007f0c5795ee10 in workqueue_thread () from /lib64/liburcu-cds.so.6
No symbol table info available.
#3 0x00007f0c580f81da in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#4 0x00007f0c573cbe73 in clone () from /lib64/libc.so.6
No symbol table info available.
Thread 7 (Thread 0x7f0c3b87c700 (LWP 55773)):
#0 0x00007f0c5810187d in __lll_lock_wait () from /lib64/libpthread.so.0
No symbol table info available.
#1 0x00007f0c580fab29 in pthread_mutex_lock () from /lib64/libpthread.so.0
No symbol table info available.
#2 0x00007f0c57b681c9 in mutex_lock () from /lib64/liburcu.so.6
No symbol table info available.
#3 0x00007f0c57b69498 in rcu_register_thread_memb () from /lib64/liburcu.so.6
No symbol table info available.
#4 0x00007f0c5b856969 in isc__work_cb (req=<optimized out>) at work.c:28
work = 0x7f0c406203c0
#5 0x00007f0c592254ee in worker () from /lib64/libuv.so.1
No symbol table info available.
#6 0x00007f0c580f81da in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#7 0x00007f0c573cbe73 in clone () from /lib64/libc.so.6
No symbol table info available.
Thread 6 (Thread 0x7f0c3d07f700 (LWP 55770)):
#0 0x00007f0c580fe4ac in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
No symbol table info available.
#1 0x00007f0c5923821d in uv_cond_wait () from /lib64/libuv.so.1
No symbol table info available.
#2 0x00007f0c5922558d in worker () from /lib64/libuv.so.1
No symbol table info available.
#3 0x00007f0c580f81da in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#4 0x00007f0c573cbe73 in clone () from /lib64/libc.so.6
No symbol table info available.
Thread 5 (Thread 0x7f0c3c07d700 (LWP 55772)):
#0 0x00007f0c580fe4ac in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
No symbol table info available.
#1 0x00007f0c5923821d in uv_cond_wait () from /lib64/libuv.so.1
No symbol table info available.
#2 0x00007f0c5922558d in worker () from /lib64/libuv.so.1
No symbol table info available.
#3 0x00007f0c580f81da in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#4 0x00007f0c573cbe73 in clone () from /lib64/libc.so.6
No symbol table info available.
Thread 4 (Thread 0x7f0c52c7e700 (LWP 55767)):
#0 0x00007f0c580ff60e in pthread_barrier_wait () from /lib64/libpthread.so.0
No symbol table info available.
#1 0x00007f0c5b8416f9 in resume_loop (loop=0x7f0c53e92900) at loop.c:114
loopmgr = <optimized out>
loopmgr = <optimized out>
#2 pauseresume_cb (handle=<optimized out>) at loop.c:114
loop = 0x7f0c53e92900
#3 0x00007f0c5922a2f1 in uv.async_io.part () from /lib64/libuv.so.1
No symbol table info available.
#4 0x00007f0c5923bd15 in uv.io_poll () from /lib64/libuv.so.1
No symbol table info available.
#5 0x00007f0c5922aa74 in uv_run () from /lib64/libuv.so.1
No symbol table info available.
#6 0x00007f0c5b841792 in loop_thread (arg=arg@entry=0x7f0c53e92900) at loop.c:282
loop = 0x7f0c53e92900
r = <optimized out>
__func__ = "loop_thread"
ret = <optimized out>
#7 0x00007f0c5b850469 in thread_body (wrap=wrap@entry=0xf19660) at thread.c:85
func = 0x7f0c5b841707 <loop_thread>
arg = 0x7f0c53e92900
ret = 0x0
jemalloc_enforce_init = 0x7f0c44000b60
#8 0x00007f0c5b850492 in thread_run (wrap=0xf19660) at thread.c:100
ret = <optimized out>
#9 0x00007f0c580f81da in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#10 0x00007f0c573cbe73 in clone () from /lib64/libc.so.6
No symbol table info available.
Thread 3 (Thread 0x7f0c5bd5e480 (LWP 55752)):
#0 0x00007f0c5810187d in __lll_lock_wait () from /lib64/libpthread.so.0
No symbol table info available.
#1 0x00007f0c580fab29 in pthread_mutex_lock () from /lib64/libpthread.so.0
No symbol table info available.
#2 0x00007f0c57b681c9 in mutex_lock () from /lib64/liburcu.so.6
No symbol table info available.
#3 0x00007f0c57b68c44 in synchronize_rcu_memb () from /lib64/liburcu.so.6
No symbol table info available.
#4 0x00007f0c5b36f3ca in dns_view_detach (viewp=viewp@entry=0x7ffddd6d9348) at view.c:519
mkzone = 0x0
rdzone = 0x0
resolver = 0x0
adb = 0x7f0c40625600
zonetable = 0x7f0c5395d0a0
requestmgr = 0x7f0c4061b1e0
dispatchmgr = 0x7f0c53e7a9a0
view = 0x7f0c538cfc00
__func__ = "dns_view_detach"
#5 0x00000000004457e1 in shutdown_server (arg=0x7f0c53e881c0) at server.c:10025
server = 0x7f0c53e881c0
view = 0x0
view_next = 0x7f0c538cf600
kasp = 0x0
kasp_next = <optimized out>
flush = false
nsc = 0x0
#6 0x00007f0c5b82e23a in isc__async_cb (handle=<optimized out>) at async.c:111
job = 0x7f0c53e673c0
loop = 0x7f0c53e91800
jobs = {head = {node = {next = 0x7f0c53e63cd0}}, tail = {p = 0x7f0c53e673d0}}
ret = <optimized out>
node = 0x7f0c53e673d0
next = 0x0
#7 0x00007f0c5922a2f1 in uv.async_io.part () from /lib64/libuv.so.1
No symbol table info available.
#8 0x00007f0c5923bd15 in uv.io_poll () from /lib64/libuv.so.1
No symbol table info available.
#9 0x00007f0c5922aa74 in uv_run () from /lib64/libuv.so.1
No symbol table info available.
#10 0x00007f0c5b841792 in loop_thread (arg=arg@entry=0x7f0c53e91800) at loop.c:282
loop = 0x7f0c53e91800
r = <optimized out>
__func__ = "loop_thread"
ret = <optimized out>
#11 0x00007f0c5b850469 in thread_body (wrap=0xf1a3e0) at thread.c:85
func = 0x7f0c5b841707 <loop_thread>
arg = 0x7f0c53e91800
ret = 0x0
jemalloc_enforce_init = 0xf1a410
#12 0x00007f0c5b8504e3 in isc_thread_main (func=func@entry=0x7f0c5b841707 <loop_thread>, arg=0x7f0c53e91800) at thread.c:116
No locals.
#13 0x00007f0c5b84271a in isc_loopmgr_run (loopmgr=0x7f0c53e11540) at loop.c:454
__func__ = "isc_loopmgr_run"
#14 0x0000000000425995 in main (argc=<optimized out>, argv=<optimized out>) at main.c:1580
result = <optimized out>
Thread 2 (Thread 0x7f0c5347f700 (LWP 55766)):
#0 0x00007f0c580ff60e in pthread_barrier_wait () from /lib64/libpthread.so.0
No symbol table info available.
#1 0x00007f0c5b8416f9 in resume_loop (loop=0x7f0c53e92080) at loop.c:114
loopmgr = <optimized out>
loopmgr = <optimized out>
#2 pauseresume_cb (handle=<optimized out>) at loop.c:114
loop = 0x7f0c53e92080
#3 0x00007f0c5922a2f1 in uv.async_io.part () from /lib64/libuv.so.1
No symbol table info available.
#4 0x00007f0c5923bd15 in uv.io_poll () from /lib64/libuv.so.1
No symbol table info available.
#5 0x00007f0c5922aa74 in uv_run () from /lib64/libuv.so.1
No symbol table info available.
#6 0x00007f0c5b841792 in loop_thread (arg=arg@entry=0x7f0c53e92080) at loop.c:282
loop = 0x7f0c53e92080
r = <optimized out>
__func__ = "loop_thread"
ret = <optimized out>
#7 0x00007f0c5b850469 in thread_body (wrap=wrap@entry=0xf19710) at thread.c:85
func = 0x7f0c5b841707 <loop_thread>
arg = 0x7f0c53e92080
ret = 0x0
jemalloc_enforce_init = 0x7f0c4c000b60
#8 0x00007f0c5b850492 in thread_run (wrap=0xf19710) at thread.c:100
ret = <optimized out>
#9 0x00007f0c580f81da in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#10 0x00007f0c573cbe73 in clone () from /lib64/libc.so.6
No symbol table info available.
Thread 1 (Thread 0x7f0c3c87e700 (LWP 55771)):
#0 0x00007f0c573e0b8f in raise () from /lib64/libc.so.6
No symbol table info available.
#1 0x00007f0c573b3ea5 in abort () from /lib64/libc.so.6
No symbol table info available.
#2 0x0000000000422efd in assertion_failed (file=0x7f0c5b3eb932 "view.c", line=429, type=isc_assertiontype_insist, cond=0x7f0c5b3c1f70 "__v > 0 && __v < (4294967295U)") at main.c:234
No locals.
#3 0x00007f0c5b82df3b in isc_assertion_failed (file=file@entry=0x7f0c5b3eb932 "view.c", line=line@entry=429, type=type@entry=isc_assertiontype_insist, cond=cond@entry=0x7f0c5b3c1f70 "__v > 0 && __v < (4294967295U)") at assertions.c:48
No locals.
#4 0x00007f0c5b36c3f1 in dns_view_attach (source=source@entry=0x7f0c538cfc00, targetp=targetp@entry=0x7f0c3a20e4c8) at view.c:431
__v = <optimized out>
#5 0x000000000042ba15 in catz_run (entry=0x7f0c39e25000, origin=origin@entry=0x7f0c53e88540, view=0x7f0c538cfc00, udata=0x680670 <ns_catz_cbdata>, type=type@entry=CATZ_DELZONE) at server.c:2956
cz = 0x7f0c3a20e4b0
action = 0x42baf4 <catz_delzone_cb>
#6 0x000000000042ba6e in catz_delzone (entry=<optimized out>, origin=origin@entry=0x7f0c53e88540, view=<optimized out>, udata=<optimized out>) at server.c:2972
No locals.
#7 0x00007f0c5b244c2b in dns__catz_zones_merge (catz=0x7f0c53e88540, newcatz=0x7f0c3a211000) at catz.c:696
entry = 0x7f0c39e25000
result = ISC_R_SUCCESS
iter1 = 0x0
iter2 = 0x7f0c3a21e060
iteradd = 0x7f0c3a21e040
itermod = 0x7f0c3a21e020
toadd = 0x7f0c53e6aa30
tomod = 0x7f0c53e6a9e0
delcur = <optimized out>
czname = "catalog-tls.example\000\f\177\000\000\240\340tW\f\177\000\000\000\000\000\000\000\000\000\000\212\067HW\f\177\000\000@\000\000\000\000\000\000\000\000\006\"s\223>Z\177 \260\207<\f\177\000\000\034\024\205[\f\177\000\000\067\000\000\000\037\000\000\000\016\000\000\000\026\000\000\000\b\000\000\000{\000\000\000\001\000\000\000\000\000\000\000P\262\207<\f\177\000\000\365\016\204[\f\177\000\000{e\206[\f\177\000\000\000&uW\f\177\000\000\240\343tW\f\177\000\000Pl\205[\f\177\000\000{e\206[\f\177\000\000\000\000\000\000\000\000\000\000P\262\207<\f\177\000\000\300\b\204[\f\177\000\000H\361\350S\f\177\000\000"...
zname = "tls1.example\000\177\000\000\377", '\000' <repeats 23 times>, '\377' <repeats 16 times>, '\000' <repeats 24 times>, "`\330tW\f\177\000\000\000\006\"s\223>Z\177\000&uW\f\177\000\000\207(\255\373\000\000\000\000\360\200\346S\f\177\000\000\060\264\207<\f\177\000\000!fuB\000\000\000\000\240\253\207<\f\177\000\000\377", '\000' <repeats 23 times>, '\377' <repeats 16 times>, "\000\006\"s\223>Z\177\340\253\207<\f\177\000\000\000"...
addzone = 0x42ba81 <catz_addzone>
modzone = 0x42ba70 <catz_modzone>
delzone = 0x42ba5f <catz_delzone>
__func__ = "dns__catz_zones_merge"
#8 0x00007f0c5b248106 in dns__catz_update_cb (data=<optimized out>) at catz.c:2467
catz = <optimized out>
updb = <optimized out>
catzs = <optimized out>
oldcatz = 0x7f0c53e88540
newcatz = 0x7f0c3a211000
result = ISC_R_SHUTTINGDOWN
r = <optimized out>
node = 0x0
vers_node = 0x0
updbit = 0x0
fixname = {name = {magic = 1145983854, ndata = 0x0, length = 0, labels = 0, attributes = {absolute = false, readonly = false, dynamic = false, dynoffsets = false, nocompress = false, cache = false, answer = false, ncache = false, chaining = false, chase = false, wildcard = false, prerequisite = false, update = false, hasupdaterec = false}, offsets = 0x7f0c3c87bf40 "", buffer = 0x7f0c3c87bfc0, link = {prev = 0xffffffffffffffff, next = 0xffffffffffffffff}, list = {head = 0x0, tail = 0x0}}, offsets = "\000\b\024\034", '\000' <repeats 123 times>, buffer = {magic = 1114990113, base = 0x7f0c3c87c000, length = 255, used = 0, current = 0, active = 0, extra = 0, dynamic = false, link = {prev = 0xffffffffffffffff, next = 0xffffffffffffffff}, mctx = 0x0}, data = "\aversion\vcatalog-tls\aexample", '\000' <repeats 36 times>, "!fuB\000\000\000\000@y\":\f\177\000\000\000\000\002\000\016", '\000' <repeats 19 times>, '\377' <repeats 16 times>, "\000\000\000\000\000\000\000\000!fuB", '\000' <repeats 20 times>, ")\253\017X\f\177\000\000\000\000\000\000\000\000\000\000\n\000\000\000\000\000\000\000\200\322\326W\f\177\000\000\000\000\000\000\000\000\000\000\066\303\017X\f\177\000\000\000"...}
name = 0x7f0c3c87bef0
rdsiter = 0x0
rdataset = {magic = 1145983826, methods = 0x0, link = {prev = 0xffffffffffffffff, next = 0xffffffffffffffff}, rdclass = 0, type = 0, ttl = 0, trust = 0, covers = 0, attributes = 0, count = 4294967295, resign = 0, {keytable = {node = 0x0, iter = 0x0}, ncache = {raw = 0x0, iter_pos = 0x0, iter_count = 0}, slab = {db = 0x0, node = 0x0, raw = 0x0, iter_pos = 0x0, iter_count = 0, noqname = 0x0, closest = 0x0}, rdlist = {list = 0x0, iter = 0x0, noqname = 0x0, closest = 0x0, node = 0x0}, rps = {db = 0x0, iter_pos = 0x0, iter_count = 0}}}
bname = "catalog-tls.example", '\000' <repeats 413 times>...
cname = "\000\274\207<\f\177\000\000\001\000\000\000\001\000\000\000@y\":\f\177", '\000' <repeats 19 times>, "\204$:\f\177\000\000\000\340$:\f\177", '\000' <repeats 11 times>, "~`@\f\177", '\000' <repeats 18 times>, "\340\300\207<\f\177\000\000\200\250\340S\f\177\000\000\000\000\000\000\000\000\000\000 \200d@\f\177\000\000\020", '\000' <repeats 311 times>...
is_vers_processed = <optimized out>
is_active = true
vers = 2
catz_vers = <optimized out>
__func__ = "dns__catz_update_cb"
#9 0x00007f0c5b856976 in isc__work_cb (req=<optimized out>) at work.c:30
work = 0x7f0c53e6dbe0
#10 0x00007f0c592254ee in worker () from /lib64/libuv.so.1
No symbol table info available.
#11 0x00007f0c580f81da in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#12 0x00007f0c573cbe73 in clone () from /lib64/libc.so.6
No symbol table info available.
```May 2024 (9.18.27, 9.18.27-S1, 9.19.24)Ondřej SurýOndřej Surýhttps://gitlab.isc.org/isc-projects/bind9/-/issues/4298ns1 shutdown hang in "tcp:checking that BIND 9 doesn't crash on long TCP mess...2024-02-24T07:51:48ZMichal Nowakns1 shutdown hang in "tcp:checking that BIND 9 doesn't crash on long TCP messages"Job [#3639436](https://gitlab.isc.org/isc-projects/bind9/-/jobs/3639436) failed for 028154d416c2a29bea41c4d9658066845539b82a.
Job [#3639436](https://gitlab.isc.org/isc-projects/bind9/-/jobs/3639436) failed for 028154d416c2a29bea41c4d9658066845539b82a.
jemalloc arenas were merged to `main` and 9.18, but they did not entirely fix the "tcp:checking that BIND 9 doesn't crash on long TCP messages" check (isc-projects/bind9#4038) (nor the `isc_mem_benchmark` check of the `mem_test` unit test, see isc-projects/bind9#4286). Notably, I've never before seen the `tcp` check fail four times in a single day:
- https://gitlab.isc.org/isc-projects/bind9/-/jobs/3639436
- https://gitlab.isc.org/isc-projects/bind9/-/jobs/3639704
- https://gitlab.isc.org/isc-projects/bind9/-/jobs/3639696
- https://gitlab.isc.org/isc-projects/bind9/-/jobs/3639687
However, the nature of the failure is different: `ns1` is not OOM-killed; it simply did not terminate within the 5-minute timeout, so the process was aborted.
```
2023-09-06 00:20:22 INFO:tcp I:tcp_tmp_n9au2yez:checking that BIND 9 doesn't crash on long TCP messages (10)
2023-09-06 00:20:22 INFO:tcp I:tcp_tmp_n9au2yez:sending 300000 time(s): 00010000000100000000000003697363036f72670000fc0001
2023-09-06 00:20:22 INFO:tcp I:tcp_tmp_n9au2yez:............................................................................................................................................................................................................................................................................................................
2023-09-06 00:20:22 INFO:tcp I:tcp_tmp_n9au2yez:sent 4023683 bytes to 10.53.0.1:20597
2023-09-06 00:20:22 INFO:tcp I:tcp_tmp_n9au2yez:exit status: 0
---------------------------- Captured log teardown -----------------------------
2023-09-06 00:25:23 INFO:tcp I:tcp_tmp_n9au2yez:ns1 didn't die when sent a SIGTERM
2023-09-06 00:25:23 ERROR:tcp Failed to stop servers
2023-09-06 00:25:24 INFO:tcp I:tcp_tmp_n9au2yez:Core dump(s) found: /builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez/ns1/core.387947
2023-09-06 00:25:24 INFO:tcp D:tcp_tmp_n9au2yez:backtrace from /builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez/ns1/core.387947:
2023-09-06 00:25:24 INFO:tcp D:tcp_tmp_n9au2yez:--------------------------------------------------------------------------------
2023-09-06 00:27:07 INFO:tcp D:/builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez:Core was generated by `/builds/isc-projects/bind9/bin/named/.libs/named -D tcp_tmp_n9au2yez-ns1 -X nam'.
2023-09-06 00:27:07 INFO:tcp D:/builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez:Program terminated with signal SIGABRT, Aborted.
2023-09-06 00:27:07 INFO:tcp D:/builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez:#0 0x00007f8a0a4c3129 in pthread_barrier_wait@GLIBC_2.2.5 () from /lib64/libc.so.6
2023-09-06 00:27:07 INFO:tcp D:/builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez:[Current thread is 1 (Thread 0x7f8a09eaa580 (LWP 387947))]
2023-09-06 00:27:07 INFO:tcp D:/builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez:#0 0x00007f8a0a4c3129 in pthread_barrier_wait@GLIBC_2.2.5 () from /lib64/libc.so.6
2023-09-06 00:27:07 INFO:tcp D:/builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez:#1 0x00007f8a0b2432a2 in stop_tcp_child_job (arg=0x7f8a09ab6800) at netmgr/tcp.c:589
2023-09-06 00:27:07 INFO:tcp D:/builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez:#2 0x00007f8a0b243372 in stop_tcp_child (sock=<optimized out>) at netmgr/tcp.c:597
2023-09-06 00:27:07 INFO:tcp D:/builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez:#3 0x00007f8a0b243b21 in isc__nm_tcp_stoplistening (sock=0x7f8a09a77800) at netmgr/tcp.c:622
2023-09-06 00:27:07 INFO:tcp D:/builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez:#4 0x00007f8a0b23b359 in isc_nm_stoplistening (sock=<optimized out>) at netmgr/netmgr.c:1699
2023-09-06 00:27:07 INFO:tcp D:/builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez:#5 0x00007f8a0b23dc62 in isc__nmsocket_stop (listener=0x7f8a09a76e00) at netmgr/netmgr.c:1730
2023-09-06 00:27:07 INFO:tcp D:/builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez:#6 0x00007f8a0b24183e in isc__nm_streamdns_stoplistening (sock=<optimized out>) at netmgr/streamdns.c:962
2023-09-06 00:27:07 INFO:tcp D:/builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez:#7 0x00007f8a0b23b360 in isc_nm_stoplistening (sock=<optimized out>) at netmgr/netmgr.c:1702
2023-09-06 00:27:07 INFO:tcp D:/builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez:#8 0x00007f8a0afcab27 in ns_interface_shutdown (ifp=ifp@entry=0x7f8a09ad7980) at interfacemgr.c:729
2023-09-06 00:27:07 INFO:tcp D:/builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez:#9 0x00007f8a0afcaf9a in purge_old_interfaces (mgr=mgr@entry=0x7f8a09a70500) at interfacemgr.c:815
2023-09-06 00:27:07 INFO:tcp D:/builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez:#10 0x00007f8a0afcb13e in ns_interfacemgr_shutdown (mgr=0x7f8a09a70500) at interfacemgr.c:435
2023-09-06 00:27:07 INFO:tcp D:/builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez:#11 0x0000000000445bf1 in shutdown_server (arg=0x7f8a09a9f700) at server.c:9983
2023-09-06 00:27:07 INFO:tcp D:/builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez:#12 0x00007f8a0b24b383 in isc__async_cb (handle=<optimized out>) at async.c:111
2023-09-06 00:27:07 INFO:tcp D:/builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez:#13 0x00007f8a0a977bd3 in ?? () from /lib64/libuv.so.1
2023-09-06 00:27:07 INFO:tcp D:/builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez:#14 0x00007f8a0a99457b in ?? () from /lib64/libuv.so.1
2023-09-06 00:27:07 INFO:tcp D:/builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez:#15 0x00007f8a0a97d097 in uv_run () from /lib64/libuv.so.1
2023-09-06 00:27:07 INFO:tcp D:/builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez:#16 0x00007f8a0b25d1ba in loop_thread (arg=arg@entry=0x7f8a09aac800) at loop.c:282
2023-09-06 00:27:07 INFO:tcp D:/builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez:#17 0x00007f8a0b26ca20 in thread_body (wrap=0xb788b0) at thread.c:85
2023-09-06 00:27:07 INFO:tcp D:/builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez:#18 0x00007f8a0b26ca99 in isc_thread_main (func=func@entry=0x7f8a0b25d12f <loop_thread>, arg=0x7f8a09aac800) at thread.c:116
2023-09-06 00:27:07 INFO:tcp D:/builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez:#19 0x00007f8a0b25e109 in isc_loopmgr_run (loopmgr=0x7f8a09a6f6c0) at loop.c:454
2023-09-06 00:27:07 INFO:tcp D:/builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez:#20 0x0000000000426faa in main (argc=16, argv=0x7fff1308dae8) at main.c:1592
2023-09-06 00:27:07 INFO:tcp D:tcp_tmp_n9au2yez:--------------------------------------------------------------------------------
```
```
06-Sep-2023 00:20:58.284 client @0x7f895a786800 10.53.0.1#53464 (isc.org): bad zone transfer request: 'isc.org/IN': non-authoritative zone (NOTAUTH)
06-Sep-2023 00:20:58.284 client @0x7f895a787400 10.53.0.1#53464 (isc.org): bad zone transfer request: 'isc.org/IN': non-authoritative zone (NOTAUTH)
06-Sep-2023 00:20:58.284 client @0x7f895a788000 10.53.0.1#53464 (isc.org): bad zone transfer request: 'isc.org/IN': non-authoritative zone (NOTAUTH)
06-Sep-2023 00:20:58.284 client @0x7f895a788c00 10.53.0.1#53464 (isc.org): bad zone transfer request: 'isc.org/IN': non-authoritative zone (NOTAUTH)
06-Sep-2023 00:20:58.284 client @0x7f895a789800 10.53.0.1#53464 (isc.org): bad zone transfer request: 'isc.org/IN': non-authoritative zone (NOTAUTH)
06-Sep-2023 00:20:58.284 client @0x7f895a78a400 10.53.0.1#53464 (isc.org): bad zone transfer request: 'isc.org/IN': non-authoritative zone (NOTAUTH)
06-Sep-2023 00:20:58.284 client @0x7f895a7a5000 10.53.0.1#53464 (isc.org): bad zone transfer request: 'isc.org/IN': non-authoritative zone (NOTAUTH)
06-Sep-2023 00:20:58.284 client @0x7f895a7a5c00 10.53.0.1#53464 (isc.org): bad zone transfer request: 'isc.org/IN': non-authoritative zone (NOTAUTH)
06-Sep-2023 00:20:58.284 netmgr 0x7f8a09a6f900: Shutting down network manager worker on loop 0x7f8a09aae180(3)
06-Sep-2023 00:20:58.284 netmgr 0x7f8a09a6f900: Shutting down network manager worker on loop 0x7f8a09aad900(2)
```
[core.387947-backtrace.txt](/uploads/dab4601f64759ee9475d504cac179df2/core.387947-backtrace.txt)
[named.run](/uploads/d5b292cffcbd202101133e28d131551b/named.run)
Locally, I can't reproduce it; `ns1` terminates at worst in 210 seconds.May 2024 (9.18.27, 9.18.27-S1, 9.19.24)https://gitlab.isc.org/isc-projects/bind9/-/issues/4202Running the "mkeys" system test around the top of the hour may cause it to fail2024-02-29T15:26:10ZMichał KępieńRunning the "mkeys" system test around the top of the hour may cause it to failhttps://gitlab.isc.org/isc-private/bind9/-/jobs/3509369
https://gitlab.isc.org/isc-private/bind9/-/jobs/3509369
The `mkeys` system test failed silently on the following step:
2023-07-06 15:00:59 INFO:mkeys I:mkeys_tmp_5qq267_u:revoke key with bad signature, check revocation is ignored (19)
This means that it was more than likely `set -e` that triggered the
failure. Unfortunately, the `mkeys` system test is written in a way
that does not make debugging easy when `set -e` is in effect. There are
a *lot* of steps in the relevant check and each of them could trigger
the failure:
<details>
<summary>Click to expand/collapse</summary>
```sh
n=$((n+1))
echo_i "revoke key with bad signature, check revocation is ignored ($n)"
ret=0
revoked=$($REVOKE -K ns1 "$original")
rkeyid=$(keyfile_to_key_id "$revoked")
rm -f ns1/root.db.signed.jnl
# We need to activate at least one valid DNSKEY to prevent dnssec-signzone from
# failing. Alternatively, we could use -P to disable post-sign verification,
# but we actually do want post-sign verification to happen to ensure the zone
# is correct before we break it on purpose.
$SETTIME -R none -D none -K ns1 "$standby1" > /dev/null
$SIGNER -Sg -K ns1 -N unixtime -O full -o . -f signer.out.$n ns1/root.db > /dev/null 2>/dev/null
cp -f ns1/root.db.signed ns1/root.db.tmp
BADSIG="SVn2tLDzpNX2rxR4xRceiCsiTqcWNKh7NQ0EQfCrVzp9WEmLw60sQ5kP xGk4FS/xSKfh89hO2O/H20Bzp0lMdtr2tKy8IMdU/mBZxQf2PXhUWRkg V2buVBKugTiOPTJSnaqYCN3rSfV1o7NtC1VNHKKK/D5g6bpDehdn5Gaq kpBhN+MSCCh9OZP2IT20luS1ARXxLlvuSVXJ3JYuuhTsQXUbX/SQpNoB Lo6ahCE55szJnmAxZEbb2KOVnSlZRA6ZBHDhdtO0S4OkvcmTutvcVV+7 w53CbKdaXhirvHIh0mZXmYk2PbPLDY7PU9wSH40UiWPOB9f00wwn6hUe uEQ1Qg=="
# Less than a second may have passed since ns1 was started. If we call
# dnssec-signzone immediately, ns1/root.db.signed will not be reloaded by the
# subsequent "rndc reload ." call on platforms which do not set the
# "nanoseconds" field of isc_time_t, due to zone load time being seemingly
# equal to master file modification time.
sleep 1
sed -e "/ $rkeyid \./s, \. .*$, . $BADSIG," signer.out.$n > ns1/root.db.signed
mkeys_reload_on 1 || ret=1
mkeys_refresh_on 2 || ret=1
mkeys_status_on 2 > rndc.out.$n 2>&1 || ret=1
# one key listed
count=$(grep -c "keyid: " rndc.out.$n) || true
[ "$count" -eq 1 ] || { echo_i "'keyid:' count ($count) != 1"; ret=1; }
# it's the original key id
count=$(grep -c "keyid: $originalid" rndc.out.$n) || true
[ "$count" -eq 1 ] || { echo_i "'keyid: $originalid' count ($count) != 1"; ret=1; }
# not revoked
count=$(grep -c "REVOKE" rndc.out.$n) || true
[ "$count" -eq 0 ] || { echo_i "'REVOKE' count ($count) != 0"; ret=1; }
# trust is still current
count=$(grep -c "trust" rndc.out.$n) || true
[ "$count" -eq 1 ] || { echo_i "'trust' count != 1"; ret=1; }
count=$(grep -c "trusted since" rndc.out.$n) || true
[ "$count" -eq 1 ] || { echo_i "'trusted since' count != 1"; ret=1; }
if [ $ret != 0 ]; then echo_i "failed"; fi
status=$((status+ret))
```
</details>
However, it is possible to look at the presence/absence of certain files
among the test artifacts and also to look at file timestamps, so that
some scenarios can be ruled out. In this case, `ns1/root.db.tmp` did
not exist, so execution did not reach the `cp -f` line. This meant that
only `dnssec-revoke`, `dnssec-settime`, and `dnssec-signzone` could have
failed. However, since there were three key files in `ns1` newer than
the `$original` one (meaning that `dnssec-revoke` and `dnssec-settime`
did their job), `dnssec-signzone` was the primary suspect. I ran its
invocation from the test manually on the artifacts and...
```
$ dnssec-signzone -Sg -K ns1 -N unixtime -O full -o . -f signer.out.19 ns1/root.db
Fetching ./ECDSAP384SHA384/25503 (KSK) from key repository.
Fetching ./ECDSAP256SHA256/37163 (KSK) from key repository.
Fetching ./ECDSAP384SHA384/24825 (ZSK) from key repository.
dnssec-signzone: warning: Serial number would not advance, using increment method instead
Verifying the zone using the following algorithms:
- ECDSAP256SHA256
Missing ZSK for algorithm ECDSAP256SHA256
Missing self-signed KSK for algorithm ECDSAP384SHA384
No correct ECDSAP256SHA256 signature for . NSEC
No correct ECDSAP256SHA256 signature for . SOA
No correct ECDSAP256SHA256 signature for . NS
No correct ECDSAP256SHA256 signature for example NSEC
No correct ECDSAP256SHA256 signature for example TXT
No correct ECDSAP256SHA256 signature for a.root-servers.nil NSEC
No correct ECDSAP256SHA256 signature for a.root-servers.nil A
No correct ECDSAP256SHA256 signature for tld NSEC
The zone is not fully signed for the following algorithms:
ECDSAP256SHA256
ECDSAP384SHA384
.
DNSSEC completeness test failed.
Zone verification failed (failure)
```
But wait, how is it even possible that this zone is signed using
multiple algorithms?
```
$ git grep -F _ALGORITHM bin/tests/system/mkeys/ | wc -l
13
$ git grep -F _ALGORITHM bin/tests/system/mkeys/ | grep -vF DEFAULT_ALGORITHM
$
```
The `*_ALGORITHM` environment variables are set by
`bin/tests/system/get_algorithms.py`. The script is written in a way
that allows the algorithms used to be chosen randomly from a specific
set. The `mkeys` test takes advantage of that feature and sets
`ALGORITHM_SET` to `ecc_default`. The script tries to ensure a stable
set of algorithms is used for each system test run by [seeding its RNG
with a value derived from the current time][1]. This works most of the
time, but if we get really unlucky, `setup.sh` can be run during one
"time slot" while `tests.sh` is run during another. This is exactly
what happened in this case:
```
------------------------------ Captured log setup ------------------------------
2023-07-06 14:59:58 INFO:mkeys switching to tmpdir: /builds/isc-private/bind9/bin/tests/system/mkeys_tmp_5qq267_u
2023-07-06 14:59:58 INFO:mkeys test started: mkeys/tests_sh_mkeys.py
2023-07-06 14:59:58 INFO:mkeys using port range: <20583, 20602>
------------------------------ Captured log call -------------------------------
2023-07-06 15:00:14 INFO:mkeys I:mkeys_tmp_5qq267_u:check for signed record (1)
2023-07-06 15:00:14 INFO:mkeys I:mkeys_tmp_5qq267_u:check positive validation with valid trust anchor (2)
2023-07-06 15:00:14 INFO:mkeys I:mkeys_tmp_5qq267_u:check for failed validation due to wrong key in managed-keys (3)
...
```
(Note that when the `test started: ...` line is logged, the script
actually runs `setup.sh` first. `tests.sh` is run afterwards.)
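The seeding race can be illustrated with a minimal sketch of such time-slot seeding (the slot length and algorithm set here are hypothetical illustrations, not the actual values used by `get_algorithms.py`):

```python
import random

def pick_algorithm(now: int, slot_seconds: int = 3600) -> str:
    # Seed the RNG from the current "time slot" so that repeated runs
    # within the same slot deterministically pick the same algorithm.
    rng = random.Random(now // slot_seconds)
    return rng.choice(["ECDSAP256SHA256", "ECDSAP384SHA384"])

# setup.sh at 14:59:58 and tests.sh at 15:00:14 fall into different
# hour-aligned slots, so the two invocations are free to pick
# different algorithms.
```

Any invocation pair that straddles a slot boundary, however rare, can end up with mismatched algorithm sets between `setup.sh` and `tests.sh`.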
This problem happens very rarely, so I am not sure if we need to do
anything about it, but it felt right to open an issue so that others are
aware that this is a thing. `mkeys` is the only system test that
currently sets the `ALGORITHM_SET` variable, so exposure is minimal. If
we migrate more tests to variable algorithms, this might become a more
pressing issue to address.
[1]: https://gitlab.isc.org/isc-projects/bind9/-/blob/bf8acd455693edef03881fd2180c5561bc0db66d/bin/tests/system/get_algorithms.py#L171-175May 2024 (9.18.27, 9.18.27-S1, 9.19.24)Tom KrizekTom Krizekhttps://gitlab.isc.org/isc-projects/bind9/-/issues/4104ZoneQuota stats counter is not counting everything2024-02-24T07:55:05ZOndřej SurýZoneQuota stats counter is not counting everythingThe `ZoneQuota` should log all the hits to `fcount_incr()` returning `ISC_R_QUOTA`, but it does only in a single place. The counting should be moved to `fctx_incr()`.The `ZoneQuota` should log all the hits to `fcount_incr()` returning `ISC_R_QUOTA`, but it does only in a single place. The counting should be moved to `fctx_incr()`.May 2024 (9.18.27, 9.18.27-S1, 9.19.24)Ondřej SurýOndřej Surýhttps://gitlab.isc.org/isc-projects/bind9/-/issues/3987Change DNSKEY TTL of inline-signed zone2024-02-24T07:55:08ZGerald VogtChange DNSKEY TTL of inline-signed zone### Description
### Description
I have a few zones using inline-signing which I have set up originally with 2d TTL. Due to this the existing DNSKEY RRs also have 2d TTL. Now I have been trying to reduce the TTL to 1d but it seems there is no supported way or tool to do so.
I have set dnskey-ttl to 1d and replaced the keys; still, all DNSKEY RRs have a 2d TTL. Setting it on the key with dnssec-settime doesn't help either, and the man pages specifically mention:
```
This option sets the default TTL to use for this key when it is converted into a DNSKEY RR.
This is the TTL used when the key is imported into a zone, unless there was already a DNSKEY
RRset in place, in which case the existing TTL takes precedence.
```
Running AlmaLinux 9 bind-9.16.23-5.el9_1.x86_64.
### Request
Add some way to change the TTL of DNSKEY RRs in inline-signed zones.
### Links / references
I have found a thread from 2016 about the same problem: https://www.mail-archive.com/bind-users@lists.isc.org/msg23186.htmlMay 2024 (9.18.27, 9.18.27-S1, 9.19.24)Matthijs Mekkingmatthijs@isc.orgMatthijs Mekkingmatthijs@isc.orghttps://gitlab.isc.org/isc-projects/bind9/-/issues/3792incoming AXFR sometimes does not close TCP connection2024-02-24T07:53:11ZPetr Špačekpspacek@isc.orgincoming AXFR sometimes does not close TCP connection### Summary
### Summary
I've noticed in PCAPs that sometimes BIND does not close TCP connection after successful incoming AXFR. This might cause source port depletion on a busy server.
### BIND version used
* ~"Affects v9.19": 9.19.9 56d7e01
* Not reproducible on ~"v9.18" (9.18.11 equivalent, b04ab06) - although closing the connection can take more than one second, it happens from the secondary side as expected
* ~"Affects v9.16": (9.16.37, b4a65aaea19762a3712932aa2270e8a833fbde22) - reproducible
Don't ask me how that is possible ...
### Steps to reproduce
1. Configure the primary with 100k zones + a catalog - this can be BIND or Knot DNS (Knot is recommended, to take BIND out of the equation on one side)
2. Configure BIND as secondary for the catalog
3. Start secondary with clean state
### What is the current *bug* behavior?
PCAPs show that sometimes the primary closes the hanging connection after a primary-side timeout.
### What is the expected *correct* behavior?
Connections are closed as soon as possible.
### Relevant configuration files
#### Primary
* [named.conf](/uploads/863bf85788384d2e4893ea94cc606c89/named.conf)
* [catalog.db](/uploads/c515216922d648acf6065f7a50b36233/catalog.db)
* [empty.db](/uploads/5686c122ffb6fd4eb035bc1b88931e0f/empty.db)
Knot DNS version: [knotd.conf](/uploads/e59561f0b1f2047d348a51303d5a2119/knotd.conf)
#### Secondary
* [named.conf](/uploads/984a16e8322400cc6465b14ca45710ef/named.conf)
### Relevant logs and/or screenshots
* Primary: [primary.log.zst](/uploads/e50efe9e008cd762b3a671245e207b7d/primary.log.zst)
* Secondary: [secondary-for-knotd-conf3000.log.zst](/uploads/e8732553815f19ebf3b483629afd6279/secondary-for-knotd-conf3000.log.zst)
* search for `z19823.test` and look at timestamps
* PCAP: [bindconf3000.pcap.zst](/uploads/86b89c9ddc6d4e1c6066cfd1a997c25b/bindconf3000.pcap.zst)
* search for `tcp.stream eq 37322` in Wireshark to get `z19823.test` transfer
Suspicious conversation from the PCAP, times relative to the previous packet:
|No. | Time | Source | Source Port | Destination | Reply code | Info|
|--- | --- | --- | --- | --- | --- | ---|
|484345 | 0 | 192.0.2.2 | 40571 | 192.0.2.1 | | 40571 → 53 [SYN] Seq=0 Win=64660 Len=0 MSS=1220 SACK_PERM TSval=3661096036 TSecr=0 WS=128|
|484346 | 0,000027 | 192.0.2.1 | 53 | 192.0.2.2 | | 53 → 40571 [SYN, ACK] Seq=0 Ack=1 Win=65232 Len=0 MSS=1220 SACK_PERM TSval=1123290483 TSecr=3661096036 WS=128|
|484347 | 0,000008 | 192.0.2.2 | 40571 | 192.0.2.1 | | 40571 → 53 [ACK] Seq=1 Ack=1 Win=64768 Len=0 TSval=3661096036 TSecr=1123290483|
|511718 | 1,98078 | 192.0.2.2 | 40571 | 192.0.2.1 | | Standard query 0x47aa AXFR z19823.test|
|511719 | 0,000019 | 192.0.2.1 | 53 | 192.0.2.2 | | 53 → 40571 [ACK] Seq=1 Ack=32 Win=65280 Len=0 TSval=1123292464 TSecr=3661098017|
|511724 | 0,000107 | 192.0.2.1 | 53 | 192.0.2.2 | No error | Standard query response 0x47aa AXFR z19823.test SOA <Root> NS invalid SOA <Root>|
|511726 | 0,000009 | 192.0.2.2 | 40571 | 192.0.2.1 | | 40571 → 53 [ACK] Seq=32 Ack=121 Win=64768 Len=0 TSval=3661098017 TSecr=1123292464|
|601979 | 9,49634 | 192.0.2.1 | 53 | 192.0.2.2 | | 53 → 40571 [FIN, ACK] Seq=121 Ack=32 Win=65280 Len=0 TSval=1123301960 TSecr=3661098017|
|602469 | 0,040942 | 192.0.2.2 | 40571 | 192.0.2.1 | | 40571 → 53 [ACK] Seq=32 Ack=122 Win=64768 Len=0 TSval=3661107554 TSecr=1123301960|
|621475 | 1,959518 | 192.0.2.2 | 40571 | 192.0.2.1 | | 40571 → 53 [FIN, ACK] Seq=32 Ack=122 Win=64768 Len=0 TSval=3661109514 TSecr=1123301960|
|621476 | 0,000019 | 192.0.2.1 | 53 | 192.0.2.2 | | 53 → 40571 [ACK] Seq=122 Ack=33 Win=65280 Len=0 TSval=1123303961 TSecr=3661109514|
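The suspicious pattern in the table - the primary (port 53) sending the first FIN after a ~9.5 s idle gap, rather than the secondary closing promptly - can be checked mechanically across many streams; a minimal sketch over a pre-parsed packet list (the tuple format here is an assumption for illustration, not real PCAP-parsing code):

```python
def fin_initiator(packets):
    """packets: iterable of (rel_time, src_port, flags) tuples for one
    TCP stream. Return which side sent the first FIN, or None if the
    stream never closed."""
    for _t, src_port, flags in packets:
        if "FIN" in flags:
            return "primary" if src_port == 53 else "secondary"
    return None

# Condensed version of the stream above: the first FIN comes from
# port 53, i.e. the primary gives up waiting on the secondary.
stream = [
    (0.0, 40571, "SYN"),
    (1.98, 40571, "PSH,ACK"),   # AXFR query
    (0.0001, 53, "PSH,ACK"),    # AXFR response
    (9.496, 53, "FIN,ACK"),     # primary-side timeout
    (1.96, 40571, "FIN,ACK"),
]
assert fin_initiator(stream) == "primary"
```

Applied over all AXFR streams in the PCAP, this would show how often the secondary fails to initiate the close.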
### Possible fixesMay 2024 (9.18.27, 9.18.27-S1, 9.19.24)https://gitlab.isc.org/isc-projects/bind9/-/issues/3472IPv4-only mode not respected for zone transfers2024-02-24T07:55:11ZThomas AmgartenIPv4-only mode not respected for zone transfers### Summary
### Summary
When BIND runs in IPv4-only mode (`named -4`) and serves a mirror zone (local root), it tries to AXFR the root zone over IPv6 and reports failures about the unreachability:
```
27-Jul-2022 08:57:43.309 general: info: zone ./IN: refresh: failure trying primary 2001:500:2::c#53 (source ::#0): operation canceled
27-Jul-2022 08:57:43.309 general: info: zone ./IN: refresh: failure trying primary 2001:500:2f::f#53 (source ::#0): operation canceled
27-Jul-2022 08:57:43.809 general: info: zone ./IN: refresh: failure trying primary 2001:500:12::d0d#53 (source ::#0): operation canceled
27-Jul-2022 08:57:43.809 general: info: zone ./IN: refresh: failure trying primary 2001:7fd::1#53 (source ::#0): operation canceled
27-Jul-2022 08:57:44.309 general: info: zone ./IN: refresh: failure trying primary 2620:0:2830:202::132#53 (source ::#0): operation canceled
27-Jul-2022 08:57:44.309 general: info: zone ./IN: refresh: failure trying primary 2620:0:2d0:202::132#53 (source ::#0): operation canceled
```
### BIND version used
```
$ named -V
BIND 9.18.5 (Stable Release) <id:6593103>
running on Linux x86_64 4.18.0-305.10.2.el8_4.x86_64 #1 SMP Tue Jul 20 20:34:55 UTC 2021
built by make with '--prefix=/usr/local/bind-9.18.5' '--sysconfdir=/opt/chroot/bind/etc/named/' '--mandir=/usr/local/share/man' '--localstatedir=/opt/chroot/bind/var' '--enable-largefile' '--enable-full-report' '--without-gssapi' '--with-json-c' '--enable-singletrace' 'PKG_CONFIG_PATH=:/usr/local/libuv/lib/pkgconfig/'
compiled by GCC 8.4.1 20200928 (Red Hat 8.4.1-1)
compiled with OpenSSL version: OpenSSL 1.1.1g FIPS 21 Apr 2020
linked to OpenSSL version: OpenSSL 1.1.1g FIPS 21 Apr 2020
compiled with libuv version: 1.41.1
linked to libuv version: 1.41.1
compiled with libnghttp2 version: 1.33.0
linked to libnghttp2 version: 1.33.0
compiled with json-c version: 0.13.1
linked to json-c version: 0.13.1
compiled with zlib version: 1.2.11
linked to zlib version: 1.2.11
threads support is enabled
default paths:
named configuration: /opt/chroot/bind/etc/named/named.conf
rndc configuration: /opt/chroot/bind/etc/named/rndc.conf
DNSSEC root key: /opt/chroot/bind/etc/named/bind.keys
nsupdate session key: /opt/chroot/bind/var/run/named/session.key
named PID file: /opt/chroot/bind/var/run/named/named.pid
named lock file: /opt/chroot/bind/var/run/named/named.lock
```
### Steps to reproduce
Create a mirror zone:
```
zone "." {
type mirror;
notify no;
};
```
Run BIND with IPv4-only:
```
$ /usr/local/bind/sbin/named -4 -t /opt/chroot/bind -u named -c /etc/named/named.conf
```
And now check the log for the IPv6 failure:
```
27-Jul-2022 09:18:59.148 general: info: zone ./IN: refresh: failure trying primary 2001:500:200::b#53 (source ::#0): operation canceled
27-Jul-2022 09:18:59.651 general: info: zone ./IN: refresh: failure trying primary 2001:500:2::c#53 (source ::#0): operation canceled
27-Jul-2022 09:18:59.651 general: info: zone ./IN: refresh: failure trying primary 2001:500:2f::f#53 (source ::#0): operation canceled
27-Jul-2022 09:19:00.151 general: info: zone ./IN: refresh: failure trying primary 2001:500:12::d0d#53 (source ::#0): operation canceled
27-Jul-2022 09:19:00.151 general: info: zone ./IN: refresh: failure trying primary 2001:7fd::1#53 (source ::#0): operation canceled
27-Jul-2022 09:19:00.651 general: info: zone ./IN: refresh: failure trying primary 2620:0:2830:202::132#53 (source ::#0): operation canceled
27-Jul-2022 09:19:00.651 general: info: zone ./IN: refresh: failure trying primary 2620:0:2d0:202::132#53 (source ::#0): operation canceled
```
### What is the current *bug* behavior?
BIND tries to AXFR the root zone over IPv6, although `named` is configured to run in IPv4-only mode.
### What is the expected *correct* behavior?
Not trying to AXFR the mirror zone over IPv6.
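The expected behavior can be expressed as a small sketch (not BIND's actual implementation - just an illustration of filtering primaries by address family when `-4` is in effect):

```python
import ipaddress

def usable_primaries(primaries, ipv4_only=False):
    # With `named -4`, IPv6 primary addresses should be skipped
    # entirely instead of being tried and logged as failures.
    kept = []
    for addr in primaries:
        if ipv4_only and ipaddress.ip_address(addr).version == 6:
            continue
        kept.append(addr)
    return kept

# One v6 and one v4 root-server address as example inputs:
print(usable_primaries(["2001:500:2::c", "199.7.91.13"], ipv4_only=True))
# prints ['199.7.91.13']
```

With such a filter applied to the transfer code path, the refresh logic would never attempt (and never log) the IPv6 primaries shown in the log above.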
### Relevant configuration files
### Relevant logs and/or screenshots
Failure in the log:
```
27-Jul-2022 09:11:18.990 general: info: zone ./IN: refresh: failure trying primary 2001:500:2::c#53 (source ::#0): operation canceled
27-Jul-2022 09:11:18.990 general: info: zone ./IN: refresh: failure trying primary 2001:500:2f::f#53 (source ::#0): operation canceled
27-Jul-2022 09:11:19.490 general: info: zone ./IN: refresh: failure trying primary 2001:500:12::d0d#53 (source ::#0): operation canceled
27-Jul-2022 09:11:19.490 general: info: zone ./IN: refresh: failure trying primary 2001:7fd::1#53 (source ::#0): operation canceled
27-Jul-2022 09:11:19.990 general: info: zone ./IN: refresh: failure trying primary 2620:0:2830:202::132#53 (source ::#0): operation canceled
27-Jul-2022 09:11:19.990 general: info: zone ./IN: refresh: failure trying primary 2620:0:2d0:202::132#53 (source ::#0): operation canceled
```
### Possible fixesMay 2024 (9.18.27, 9.18.27-S1, 9.19.24)Mark AndrewsMark Andrewshttps://gitlab.isc.org/isc-projects/bind9/-/issues/2744warning: checkhints: unable to get root NS rrset from cache: not found2024-03-27T00:34:35ZCathy Almondwarning: checkhints: unable to get root NS rrset from cache: not foundPeriodically we see reports of resolvers that are failing to respond to clients successfully, perhaps with a build-up of recursive clients, inbound UDP packet drops, late and missing query responses and so on. Rebooting the server entir...Periodically we see reports of resolvers that are failing to respond to clients successfully, perhaps with a build-up of recursive clients, inbound UDP packet drops, late and missing query responses and so on. Rebooting the server entirely usually fixes the problem - for a time. Flushing cache may also buy some relief, but generally this does not last as long as if the server is rebooted entirely.
Plus one symptom in the logs - repeated spates of messages like this:
```
31-May-2021 16:08:38.110 general: warning: checkhints: unable to get root NS rrset from cache: not found
31-May-2021 16:08:41.110 general: warning: checkhints: unable to get root NS rrset from cache: not found
31-May-2021 16:08:42.151 general: warning: checkhints: unable to get root NS rrset from cache: not found
```
This error message occurs when the root nameservers have just been primed, but when checkhints goes to look at them, they're no longer available in cache (they have been expired, possibly also removed), all in a very short period of time.
Reports of this have been seen intermittently for many years and from many versions of BIND 9. Typically (in the older reports) this was a rare occurrence seen on a resolver that had been running for a long time; months, possibly years. Therefore after rebooting, the error and the problem were never seen again (or at least not within the shelf-life of the admin who reported it to us originally).
We suspect that the cache structure and content have become unmaintainable over a long period of content being added, expired, and removed, and that it has become impossible to add new RRsets to cache without expiring existing content, because of max-cache-size. The cache tree structure itself also occupies memory, and we've seen a few instances where a long-lived cache has become 'straggly' but also sparsely populated.
What we haven't been able to catch (yet) is the exact path taken that causes this error to be logged, although we have been hoping that improved stats, along with a 'catch it earlier' assertion (the server needs to be restarted anyway once it has reached this state), might help. See #2082 .
We have also seen that in one or two instances of this warning being logged, in addition there was a problem reaching some of the root nameserver addresses listed in the root hints and used for priming. Either the root hints were out of date and an older IP address was unreachable, or there were local routing issues (typically IPv6-related) reaching some root server addresses. **This shouldn't be a problem**, per the way that root hints priming is designed, _but 'fixing' the root hints appears to have made the problem go away in some instances, as has fixing the routing and unreachability of some root hint addresses._
----
For anyone experiencing this problem for the first time, the likelihood is that one or more things have changed in your operating environment, and that these are causing cache content to be more substantial than before, or potentially distributed differently. For example:
- Installing a version of BIND that has `stale-cache-enable yes` by default
- An increase in client queries overall
- Client query patterns changing - perhaps causing a higher rate than usual of cached negative responses
- An increase in dual-stack clients querying for AAAA records
- An increase in client querying for HTTPS records
- A new client application that uses DNS-based probing
- Clients using a tunnelling-over-DNS service
- Using a client filtering service that works by first resolving the original client query with another private zone name appended to it, and checking the response status before allowing the original query to pass - thus adding the filtering RRsets to cache as well as the actual client query responses.
Currently, clues may be found in the BIND statistics and also in a dump of cache.
Firstly, these counters (available either from the output of `rndc stats` or via the XML or JSON statistics interface) can be a good indicator that too much cache cleaning is taking place due to memory pressure, rather than RRset TTL expiration:
- `DeleteLRU` - "cache records deleted due to memory exhaustion"
- `DeleteTTL` - "cache records deleted due to TTL expiration"
These are counters, therefore although seeing DeleteLRU far exceeding DeleteTTL in a single snapshot of the stats is a good indicator that all is not well with cache, ideally you want to monitor the trend over time.
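To make the trend concrete, here is a minimal sketch of comparing two snapshots of those counters. It assumes you have already pulled the `DeleteLRU`/`DeleteTTL` values out of `rndc stats` output or the statistics channel; the sample numbers are invented.

```python
# Sketch: compare two snapshots of the cache-cleaning counters and work
# out what fraction of deletions in the interval were LRU evictions
# (memory pressure) rather than TTL expirations. The snapshot dicts are
# illustrative; how you collect them (rndc stats / XML / JSON) is up to you.

def lru_delete_fraction(prev: dict, curr: dict) -> float:
    lru = curr["DeleteLRU"] - prev["DeleteLRU"]
    ttl = curr["DeleteTTL"] - prev["DeleteTTL"]
    total = lru + ttl
    return lru / total if total else 0.0

prev = {"DeleteLRU": 1_000, "DeleteTTL": 5_000}
curr = {"DeleteLRU": 10_000, "DeleteTTL": 6_000}

# 9000 LRU deletions vs 1000 TTL deletions in the interval: 90% of cache
# cleaning was driven by max-cache-size pressure, not natural expiry.
print(f"{lru_delete_fraction(prev, curr):.0%}")  # -> 90%
```

A persistently high fraction across several intervals is a much stronger signal than a single snapshot.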
Also these:
- `HeapMemInUse` - "cache heap memory in use"
- `TreeMemInUse` - "cache tree memory in use"
- `HeapMemMax` - "cache heap highest memory in use"
- `TreeMemMax` - "cache tree highest memory in use"
All of the above are gauges - they tell you 'this is where we are now', so a snapshot can be useful, as well as monitoring the pattern over time. The 'Max' values are high-water marks.
Aside: don't be tempted to look at either of these - they are not useful operationally and aren't counting what you might think they are from their names:
- `HeapMemTotal` - "cache heap memory total"
- `TreeMemTotal` - "cache tree memory total"
And finally, there are counters of what's in cache currently, by RRtype. These are prefixed with `!` for counters of NXRRSET (a pseudo-RR indicating that the queried name existed but the type did not), `#` for stale content, and `~` for content that has expired and is awaiting housekeeping/deletion.
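When post-processing the stats, those prefixes can be split off mechanically; a small illustrative sketch (the counter names below are examples):

```python
# Sketch: split the per-RRtype cache counter names into (category, rrtype)
# using the prefixes described above: "!" = NXRRSET, "#" = stale,
# "~" = expired/awaiting deletion, no prefix = ordinary positive content.

PREFIXES = {"!": "NXRRSET", "#": "stale", "~": "expired"}

def classify(counter_name: str) -> tuple[str, str]:
    category = PREFIXES.get(counter_name[:1])
    if category is not None:
        return category, counter_name[1:]
    return "positive", counter_name

for name in ("AAAA", "!DS", "#A", "~NS"):
    print(name, "->", classify(name))
```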
If there is any kind of unexpected skew, it might be worth dumping cache to see what's in there.
And then decide whether it is just that max-cache-size is now insufficient, or whether something else needs to be done to reduce cache content.

Milestone: May 2024 (9.18.27, 9.18.27-S1, 9.19.24); assignee: Mark Andrews

----

**[#4601](https://gitlab.isc.org/isc-projects/bind9/-/issues/4601): wrong filename looked when reading key files** (Michael Tokarev, 2024-02-23)

### Summary
When BIND 9 tools read a zone file with DNSKEY records for which no .key file is provided but a .private file exists, a misleading error message is generated. For example:
```
$ dnssec-signzone 168.192.in-addr.arpa
dnssec-signzone: warning: dns_dnssec_keylistfromrdataset: error reading ./K168.192.in-addr.arpa.+007+13293.private: file not found
$ ls -l ./K168.192.in-addr.arpa.+007+13293.*
-rw------- 1 root root 1707 Oct 28 2011 ./K168.192.in-addr.arpa.+007+13293.private
```
So, it reports an existing file as "not found", while actually (according to strace) it looked for a .key file (which indeed does not exist, since it is inlined in the zone itself).
The end result is that this key is not processed at all, even though the tool has all the information it needs: the .key file contents are already in the zone (that is where dnssec-signzone found the `+007+13293` part, so it already has the DNSKEY record and does not actually need the .key file), and the .private file that it reported as missing (without even trying to open it) actually exists.
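For reference, the file names involved follow the usual `K<zone>.+<algorithm>+<keyid>` pattern; a small sketch reconstructing the pair from the example above (the helper is illustrative, not BIND code):

```python
# Sketch of the DNSSEC key-file naming convention, illustrating the pair
# of files involved in the report: the tool complains about the .private
# file while it is actually the .key file that is absent.

def key_file_names(zone: str, algorithm: int, key_id: int) -> tuple[str, str]:
    # Algorithm is zero-padded to 3 digits, key ID to 5 digits.
    base = f"K{zone}.+{algorithm:03d}+{key_id:05d}"
    return f"{base}.key", f"{base}.private"

pub, priv = key_file_names("168.192.in-addr.arpa", 7, 13293)
print(pub)   # K168.192.in-addr.arpa.+007+13293.key      (does not exist)
print(priv)  # K168.192.in-addr.arpa.+007+13293.private  (exists on disk)
```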
### BIND version affected
```
BIND 9.18.24-1-Debian (Extended Support Version) <id:>
running on Linux x86_64 6.1.0-18-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.76-1 (2024-02-01)
built by make with '--build=x86_64-linux-gnu' '--prefix=/usr' '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--disable-option-checking' '--disable-silent-rules' '--libdir=${prefix}/lib/x86_64-linux-gnu' '--runstatedir=/run' '--disable-maintainer-mode' '--disable-dependency-tracking' '--libdir=/usr/lib/x86_64-linux-gnu' '--sysconfdir=/etc/bind' '--with-python=python3' '--localstatedir=/' '--enable-threads' '--enable-largefile' '--with-libtool' '--enable-shared' '--disable-static' '--with-gost=no' '--with-openssl=/usr' '--with-gssapi=yes' '--with-libidn2' '--with-json-c' '--with-lmdb=/usr' '--with-gnu-ld' '--with-maxminddb' '--with-atf=no' '--enable-ipv6' '--enable-rrl' '--enable-filter-aaaa' '--disable-native-pkcs11' '--enable-dnstap' 'build_alias=x86_64-linux-gnu' 'CFLAGS=-g -O2 -ffile-prefix-map=/build/reproducible-path/bind9-9.18.24=. -fstack-protector-strong -Wformat -Werror=format-security -fno-strict-aliasing -fno-delete-null-pointer-checks -DNO_VERSION_DATE -DDIG_SIGCHASE' 'LDFLAGS=-Wl,-z,relro -Wl,-z,now' 'CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2'
compiled by GCC 12.2.0
compiled with OpenSSL version: OpenSSL 3.0.11 19 Sep 2023
linked to OpenSSL version: OpenSSL 3.0.11 19 Sep 2023
compiled with libuv version: 1.44.2
linked to libuv version: 1.44.2
compiled with libnghttp2 version: 1.52.0
linked to libnghttp2 version: 1.52.0
compiled with libxml2 version: 2.9.14
linked to libxml2 version: 20914
compiled with json-c version: 0.16
linked to json-c version: 0.16
compiled with zlib version: 1.2.13
linked to zlib version: 1.2.13
linked to maxminddb version: 1.7.1
compiled with protobuf-c version: 1.4.1
linked to protobuf-c version: 1.4.1
threads support is enabled
DNSSEC algorithms: RSASHA1 NSEC3RSASHA1 RSASHA256 RSASHA512 ECDSAP256SHA256 ECDSAP384SHA384 ED25519 ED448
DS algorithms: SHA-1 SHA-256 SHA-384
HMAC algorithms: HMAC-MD5 HMAC-SHA1 HMAC-SHA224 HMAC-SHA256 HMAC-SHA384 HMAC-SHA512
TKEY mode 2 support (Diffie-Hellman): yes
TKEY mode 3 support (GSS-API): yes
default paths:
named configuration: /etc/bind/named.conf
rndc configuration: /etc/bind/rndc.conf
DNSSEC root key: /etc/bind/bind.keys
nsupdate session key: //run/named/session.key
named PID file: //run/named/named.pid
named lock file: //run/named/named.lock
geoip-directory: /usr/share/GeoIP
```

Milestone: Not planned

----

**[#4585](https://gitlab.isc.org/isc-projects/bind9/-/issues/4585): Add an option to named-compilezone to retain comments** (Marco Davids, 2024-02-18)

### Description
`named-compilezone` strips comments from zone files.
### Request
There might be use cases, where `named-compilezone` is used as a cleanup tool, while any comments that are present need to be retained.
It would be great if this could be achieved by some command-line option.
Obviously there are some caveats, but it seems that these can be addressed by defining certain conditions that must be met and by properly documenting the right way of working. For example, just as a suggestion, only comments on the same line as a record would be guaranteed to be retained:
Undefined (may fail):
```
; comment 1
www AAAA 2001:db8::1
; comment 2
```
Well defined (will work):
`www AAAA 2001:db8::1 ; comment 1`
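To illustrate the suggested convention, here is a hypothetical sketch of the requested behaviour - not anything `named-compilezone` currently does:

```python
# Hypothetical sketch of the suggested convention: standalone comment
# lines are dropped (their placement after rewriting is undefined),
# while a comment trailing a record on the same line stays with it.
# Deliberately naive: a real implementation must not treat ";" inside
# quoted strings (e.g. TXT data) as a comment marker.

def filter_comments(zone_text: str) -> str:
    kept = []
    for line in zone_text.splitlines():
        if line.lstrip().startswith(";"):
            continue  # standalone comment line: undefined, dropped
        kept.append(line)  # record line keeps any trailing "; comment"
    return "\n".join(kept)

zone = "; comment 1\nwww AAAA 2001:db8::1 ; comment 2\n; comment 3"
print(filter_comments(zone))  # -> www AAAA 2001:db8::1 ; comment 2
```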
### Links / references
n/a

Milestone: Not planned

----

**[#4582](https://gitlab.isc.org/isc-projects/bind9/-/issues/4582): Add support for QUIC and DNS over QUIC/DoQ (RFC 9250)** (Artem Boldariev, 2024-03-05)

_This section is very likely to be changed/updated/expanded in the future._
## Overview
One of the relatively recent additions of transport protocols for DNS is QUIC (DoQ, covered by [RFC 9250](https://www.rfc-editor.org/rfc/rfc9250.html)) and HTTP/3 (DoH3), which also works on top of QUIC.
We need a generic implementation of the QUIC protocol in BIND's codebase to proceed with these new transports.
QUIC is a sophisticated transport that works on top of UDP and uses encryption on top of TLSv1.3. It is covered by multiple RFCs and is being actively worked on. The list of RFCs includes the following:
- https://www.rfc-editor.org/rfc/rfc9250.html
- https://www.rfc-editor.org/rfc/rfc8999.html
- https://www.rfc-editor.org/rfc/rfc9000.html
- https://www.rfc-editor.org/rfc/rfc9001.html
- https://www.rfc-editor.org/rfc/rfc9002.html
- https://www.rfc-editor.org/rfc/rfc9369.html
The protocol includes a lot of functionality resembling that of HTTP/2. Most notably, it provides multiple uni- or bi-directional streams per connection. This aspect might have been influenced by the need to carry a new version of the HTTP protocol (HTTP/3), which uses the multi-stream nature of QUIC instead of the protocol-specific multiplexing found in HTTP/2.
From the point of view of the higher-level code, each bi-directional stream (the kind we are most interested in, as these are used for DoQ) acts similarly to a TCP connection, though in the case of DNS no more than one request/query per stream is allowed. That means that DNS pipelining is achieved by relying on the multistream nature of QUIC (again, similarly to HTTP/2). The catch is that each stream (being, effectively, a separate "connection") shares TLS parameters with the others.
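The per-stream framing itself is simple: per RFC 9250 (section 4.2), each DNS message on a DoQ stream is preceded by a two-octet length field, much as in DNS over TCP. A minimal sketch:

```python
import struct

# Sketch of DoQ message framing (RFC 9250, section 4.2): every DNS
# message sent on a QUIC stream is prefixed with a two-octet,
# big-endian length field; a query stream carries exactly one query.

def frame(dns_message: bytes) -> bytes:
    return struct.pack("!H", len(dns_message)) + dns_message

def deframe(stream_data: bytes) -> bytes:
    (length,) = struct.unpack("!H", stream_data[:2])
    return stream_data[2 : 2 + length]

msg = bytes.fromhex("abcd0100") + b"\x00" * 8  # toy wire-format message
assert deframe(frame(msg)) == msg
assert frame(msg)[:2] == b"\x00\x0c"  # 12-byte message
```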
Another important aspect of QUIC is client connection end-point migration. That is, a client may change its location (= IP address and UDP port) while keeping the connection active. That functionality requires complete virtualisation of networking connections, which are now being identified not by IP address and port combination but by abstract connection identifiers (connection IDs). That brings a notion of a "connection path" into the picture as well as a procedure of "path migration" for a client.
Though that is my personal opinion, QUIC seems to be at least partially a result of large companies' experience of running user-space TCP/IP stacks on scale - when TCP/IP stack is implemented as a user-space library in an application which "directly" uses a dedicated network card. Indeed, in the case of QUIC, many things that we expect in-kernel TCP/IP implementation to do are brought under the control of a user-space application, including, but not limited to, such intricacies as congestion control. In this case, UDP can be seen as a portable kernel interface to the network card with the additional advantage of making it possible to run multiple applications simultaneously using a network card (which is not the case when using user-space TCP/IP stacks).
QUIC is often described as a replacement for the TCP protocol. One of the authors of "Computer Networks: A Systems Approach" [argues](https://www.theregister.com/2022/10/07/quic_tcp_replacement/) that it is, in fact, an addition to the internet protocols suite, which is meant to implement a missing paradigm - a basis for Remote Procedure Calls (RPC) protocols. That is, it might be considered a replacement of the TCP only in the cases when TCP was used due to a lack of a better alternative.
It should be noted that while QUIC is meant to be used as a universal transport for DNS, it, unlike HTTP/2 (DoH), can be used for zone transfers as well. It is [not always guaranteed](https://www.theregister.com/2021/08/04/dissecting_performance_of_production_quic/) that it will provide a significant performance boost. In fact, it might require more traffic in some cases, as even the initial QUIC message must be no less than 1200 bytes, which is a lot by common DNS standards. However, it might compensate for that with lower latency due to 0-RTT support. Also, it seems to be more of a client-oriented protocol due to the ability to migrate client connections to new addresses (which is great for portable mobile devices). That being said, it is not clear how beneficial it would be to use it for server-to-server communications such as zone transfers: servers do not change addresses often, and the protocol itself is more verbose than, e.g., DoT, so I fail to see the immediate benefits for this case (although it is standardised).
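The 1200-byte floor mentioned above comes from RFC 9000 (section 14.1): UDP datagrams carrying a client's Initial packets must be padded to at least 1200 bytes. A trivial sketch of the overhead this implies for a small DNS query:

```python
# Sketch: datagrams carrying QUIC Initial packets must be at least 1200
# bytes (RFC 9000, section 14.1), so even a tiny DNS query costs a
# full-size datagram during connection setup.

MIN_INITIAL_DATAGRAM = 1200

def initial_datagram_size(payload_len: int) -> int:
    return max(payload_len, MIN_INITIAL_DATAGRAM)

print(initial_datagram_size(60))    # -> 1200 (a ~60-byte query is padded)
print(initial_datagram_size(1350))  # -> 1350
```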
## The State of Open Source QUIC Implementations and the Great OpenSSL Schism
As was stated before, QUIC includes TLSv1.3. Thus, most implementations decided to delegate the crypto-related, TLS-specific bits to OpenSSL and its derivatives, which makes sense as these libraries are widely deployed and used. However, initially these libraries lacked the QUIC-specific parts in their TLS implementations.
As the early QUIC adopters were also QUIC implementors, they forked OpenSSL and added the missing bits. The related changes to the API ended up in multiple OpenSSL forks, namely [QuicTLS](https://quictls.github.io/) (maintained by Akamai and Microsoft), [LibreSSL](https://www.libressl.org/) (maintained by OpenBSD), and [BoringSSL](https://boringssl.googlesource.com/boringssl) (maintained by Google). These libraries only implement the low-level TLS 1.3 bits related to QUIC but do not have any internal QUIC implementations, as their early adopters developed their own.
For quite some time, the original OpenSSL had [a merge request](https://github.com/openssl/openssl/pull/8797) opened to implement the same API and remain mostly compatible with its forks, as was the case for a long time. However, OpenSSL authors decided to provide a high-level implementation of QUIC of their own making and eventually closed the MR. That caused a lot of drama, about which you can read [here](https://daniel.haxx.se/blog/2021/10/25/the-quic-api-openssl-will-not-provide/) or [here](https://github.com/haproxy/haproxy/issues/680#issuecomment-1433118828) as well as in other places. Regarding OpenSSL, it is worth keeping multiple things in mind.
Firstly, [OpenSSL's implementation](https://www.openssl.org/docs/manmaster/man7/openssl-quic.html) is not ready yet, as it includes only a minimal client-side implementation starting from OpenSSL v3.2. The server-side support was planned for 3.3 (April 2024) according to [the project's roadmap](https://www.openssl.org/roadmap.html), but eventually it was moved to 3.4 (October 2024), as there are [many incomplete tasks](https://github.com/orgs/openssl/projects/2/views/31?pane=issue&itemId=31713456), some of them marked as "Epic". Even after that, we will most likely have only the most basic (_Minimal Working Product_) implementation, one that is not as battle-tested as some others and has missing features.
Secondly, OpenSSL does not allow the use of third-party implementations of QUIC: with OpenSSL, the only option is to use the internal QUIC implementation. That was [noted by Tatsuhiro Tsujikawa](https://github.com/ngtcp2/ngtcp2/issues/898#issuecomment-1692538880), the principal author of nghttp2/ngtcp2/nghttp3.
As a result of these decisions, most QUIC implementations chose to depend on QuicTLS and other forks that provide similar API. That list includes [MS-QUIC](https://github.com/microsoft/msquic), [Quiche](https://blog.cloudflare.com/enjoy-a-slice-of-quic-and-rust/), Chromium QUIC, as well as internal (=not exposed as a redistributable library as it is very specific) implementation in HAProxy. Probably, most other libraries are likely doing the same.
There are notable exceptions to this, though.
Firstly, NGINX does not depend on fork-related functionality to implement QUIC support, nor does it depend on the OpenSSL implementation of QUIC. One could get this impression after reading [the announcement of this functionality in NGINX](https://www.nginx.com/blog/quic-http3-support-openssl-nginx/), which discusses how they implement _only_ an OpenSSL compatibility layer in order to remain compatible with both OpenSSL and its now numerous forks. It should be noted, though, that in this particular case, whoever wrote the announcement was being modest, as NGINX in fact includes [its own in-house QUIC implementation](https://github.com/nginx/nginx/tree/master/src/event/quic). And yes, it seems that they managed to do it without relying on any QUIC-related functionality in OpenSSL or its forks (like QuicTLS, LibreSSL, and BoringSSL). Their code seems to work with basically any OpenSSL-like library with TLSv1.3 support.
Secondly, there is [ngtcp2](https://github.com/ngtcp2/ngtcp2), which itself does not depend on any cryptographic library per se. The library may use one of the provided backends implemented on top of QuicTLS, GnuTLS, PicoTLS or BoringSSL (shipped as separate libraries), but it does not have to, as an application can provide a custom implementation of the [ngtcp2 crypto API](https://nghttp2.org/ngtcp2/crypto_apiref.html) as described in [the programmer's guide](https://nghttp2.org/ngtcp2/programmers-guide.html). The library itself is [very complete](https://nghttp2.org/ngtcp2/apiref.html) and seems to implement all the intricacies of the QUIC protocol, unlike OpenSSL's implementation at the time of writing (and likely for quite some time after that).
At this point, it is clear that as far as QUIC support goes, the OpenSSL and its numerous forks **will remain incompatible**. _That is not a technical decision and, thus, hard to resolve._
## Implementing QUIC in BIND
For the reasons given above, I think that we should choose the ngtcp2 library as the basis for our QUIC transport. Additionally, it is much more mature than the implementation that OpenSSL will initially provide and has been considered stable for a while (1.0.0 was released on Oct 15, 2023). Moreover, before implementing the final IETF QUIC, it implemented a number of drafts, so it is safe to say that the authors have been tracking the development of the protocol very closely. Considering that the most recent RFCs have been implemented as well (like [QUIC version 2](https://www.rfc-editor.org/rfc/rfc9369.html), which is, in fact, a minor update of the protocol), this appears to still be the case. We can pair it with our own crypto API implementation inspired by the code from NGINX and the currently provided crypto libraries. That will ensure that BIND remains compatible with other OpenSSL forks, in particular on the platforms that use them by default (like OpenBSD). That should give us the flexibility of NGINX without the burden of maintaining our own QUIC implementation, which I cannot justify any more than I can justify depending on any of the numerous OpenSSL forks.
I cannot justify waiting for OpenSSL to have their implementation completed either, as it is not clear how good the initial implementation will be, and I am not sure if the higher-level API they intend to provide is going to be the best fit for us. Ngtcp2 already looks more promising in that regard, not to mention that we have, in my opinion, a very good experience of using [nghttp2](https://nghttp2.org/) from the same author. Also, it is worth noting that at this stage, we have an internal subsystem for managing TLS contexts used by DoT and DoH, so it is very desirable to use it for QUIC as well and having our own ngtcp2 crypto API implementation should make it possible. We can use the code [from NGINX](https://github.com/nginx/nginx/blob/master/src/event/quic/ngx_event_quic_openssl_compat.c) and existing [ngtcp2 crypto libraries](https://github.com/ngtcp2/ngtcp2/tree/main/crypto) as examples.
Apart from choosing the library which will serve as the foundation of our QUIC implementation, there are many other problems to solve, which are no less challenging and will require some thinking and trial and error, so it is better for us not to wait for OpenSSL so that we will have more time to iron out our QUIC-related code before the new release. That is, the code related to connections and stream management and, ideally, connection migration is the most challenging, in my opinion, though the choice of the library will affect its structure for sure.
I think that it would be best to use and scale the experience of structuring the transport code in a way similar to Stream DNS and PROXYv2. It is very desirable as it allows testing without direct dependence on the networking code. By such a design, I mean implementing the most important parts of the QUIC code as a black box to which we pass data, and it calls the necessary callbacks when required.
On the highest level, bi-directional QUIC streams (in which we are interested the most for now) should map well to Stream DNS or a very similar transport (a QUIC Stream based on `isc_dnsstream_assembler_t`).
QUIC has some newer characteristics not present in other transports, like the ability of both end-points to create new streams at any moment, as well as the concept of a generic multiplexed transport itself. These things are likely to be resolved in a manner similar to HTTP/2 - using a virtual connection per stream. Another thing to mention is connection migration. Our code is not ready for that (and likely never will be), so we should set a QUIC stream's end-point information at the moment of creation and then never update it (from the point of view of the higher-level code, for compatibility reasons - in fact, we can still do the actual connection migration with all the open streams).
So, in short, it seems that we should attempt to use ngtcp2 with a custom crypto API implementation first. The initial plan was to take the client-side support for QUIC from OpenSSL and combine it with the "OpenSSL Compatibility Layer" (as they named it) from NGINX (which turned out to be a complete, internal-only, in-house QUIC implementation), but due to the reasons given above, that will not work, as it is not possible to combine the QUIC-related functionality in OpenSSL with any third-party QUIC implementation at all (and likely never will be). Ngtcp2 is the only crypto-library-agnostic implementation of QUIC at the moment.
Depending on OpenSSL for our QUIC implementation is something I would keep only as a backup plan.
## Bridging the Gap: How We Could Make an Ngtcp2 Crypto API Implementation
Let's see how we can proceed with providing our own ngtcp2 crypto API implementation.
### Ngtcp2 Crypto API Implementations Structure
As it was noted above, in order to work, ngtcp2 requires a crypto API implementation. The project provides multiple of them for the crypto libraries that have explicit support for QUIC. The list of these libraries includes (at the time of writing) QuicTLS, BoringSSL, GnuTLS, WolfSSL, and PicoTLS.
Each crypto API implementation library consists of two parts.
Firstly, it is a high-level, [shared part](https://github.com/ngtcp2/ngtcp2/blob/main/crypto/shared.c). Among other things, it includes default implementations of all callbacks used by ngtcp2, and some important API calls, most notably `ngtcp2_crypto_derive_and_install_rx_key()` and `ngtcp2_crypto_derive_and_install_tx_key()`, which are mentioned in [the ngtcp2 programmers guide](https://nghttp2.org/ngtcp2/programmers-guide.html).
Secondly, a low-level part that provides a foundation for the functionality of the shared part. There is an implementation for each of the above mentioned supported crypto libraries.
The shared part and a low-level part linked together is an ngtcp2 crypto API implementation.
Of these implementations, the most interesting for us is the one for [QuicTLS](https://github.com/ngtcp2/ngtcp2/blob/main/crypto/quictls/quictls.c) and, to a somewhat lesser extent, [BoringSSL](https://github.com/ngtcp2/ngtcp2/blob/main/crypto/boringssl/boringssl.c), as both are OpenSSL forks. In fact, we can use most of the code from there without much adaptation, as QuicTLS is essentially OpenSSL + QUIC-related API (it is regularly rebased on top of the "mainline" OpenSSL).
The QuicTLS crypto API implementation does not use many QUIC-related API calls. I have identified only the following:
- `SSL_CTX_set_quic_method()` - we would be better off using the very similar `SSL_set_quic_method()`;
- `SSL_provide_quic_data()`;
- `SSL_set_quic_transport_params()`/`SSL_get_peer_quic_transport_params()`;
- `SSL_process_quic_post_handshake()` - this one appears to be optional but omitting it will leave us without 0-RTT support (at least, for now).
There is one problem with how the crypto API libraries are structured: there is no way to use only the shared part without depending on one of the above-mentioned crypto libraries (which would not work for us anyway).
So, in our ngtcp2 crypto API implementation, we would need to provide a replacement for the shared part - at least for the callbacks and `ngtcp2_crypto_derive_and_install_rx_key()`/ `ngtcp2_crypto_derive_and_install_tx_key()`. That is doable but very unfortunate, as it includes a lot of crypto-related things, not to mention that it is extra code to maintain.
Regarding the missing QUIC API in OpenSSL, there is a solution from NGINX which might work for us as well.
### NGINX OpenSSL Compatibility Layer
As noted above, NGINX includes an OpenSSL compatibility layer as a part of its QUIC implementation. It includes implementations of the following functions:
- `SSL_set_quic_method()`
- `SSL_provide_quic_data()`
- `SSL_set_quic_transport_params()`/`SSL_get_peer_quic_transport_params()`
NGINX's [OpenSSL compatibility layer](https://github.com/nginx/nginx/blob/master/src/event/quic/ngx_event_quic_openssl_compat.c) clearly provides an internal implementation of the missing parts of the BoringSSL QUIC API, which is not that different from the QuicTLS API.
One missing thing is `SSL_process_quic_post_handshake()`, which is needed for 0-RTT support (to be more precise - for TLS early data processing). In fact, NGINX explicitly turns off TLS early data support when using QUIC. It is not clear if we can easily provide a replacement for this function, but even if not - we can live without 0-RTT support for now. At the time of writing, it is still optional in many crypto libraries.
It is worth noting that NGINX is a server application, so the provided QUIC API implementation might need some adjustments to work for client-side code.
At this point, it seems that despite some limitations, it is possible to provide a portable ngtcp2 crypto API implementation.

Milestone: BIND 9.21.x; assignee: Artem Boldariev

----

**[#4574](https://gitlab.isc.org/isc-projects/bind9/-/issues/4574): add an +opt display flag for dig** (Matthew Pounsett, 2024-02-14)

### Description
As part of providing support or education to non-experts in the DNS, I find it useful to be able to narrow the output of `dig` to just the relevant bits, while providing a simple, reproducible, easily understood command to the other person. For example, it's routine for me to do something like `dig +noall +answer ...` or `dig +noall +authority...`.
I recently ran into a case where it would have been useful to focus on the OPT pseudosection, but found no easy antidote to `+noall` for it. `+comments` is close, but not precise. Likewise there are a number of things I can do with `grep`, or combining `+yaml` with `/bin/yq` to limit the output, but at the expense of clarity in the command I share.
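Until such a flag exists, a rough workaround sketch that extracts just the OPT pseudosection from default dig output (the sample output below is illustrative; real dig output varies by version):

```python
# Sketch: pull only the OPT pseudosection out of dig's default output.
# The section starts at ";; OPT PSEUDOSECTION:" and ends at the next
# ";; ..." header line.

def opt_section(dig_output: str) -> str:
    out, collecting = [], False
    for line in dig_output.splitlines():
        if line.startswith(";; OPT PSEUDOSECTION:"):
            collecting = True
        elif collecting and line.startswith(";;"):
            break  # reached the next section header
        if collecting:
            out.append(line)
    return "\n".join(out)

sample = (
    ";; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 1234\n"
    ";; OPT PSEUDOSECTION:\n"
    "; EDNS: version: 0, flags:; udp: 1232\n"
    ";; QUESTION SECTION:\n"
    ";example.com.\t\tIN\tA\n"
)
print(opt_section(sample))
```

This keeps the shared command simple, but a built-in display flag would of course be clearer still.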
### Request
It would be helpful to have an `+opt` (or equivalent) display flag in dig, which shows only the OPT pseudo-section without any other output, similar to the `+answer`, `+authority`, and `+additional` flags.
### Links / references

Milestone: Not planned

----

**[#4550](https://gitlab.isc.org/isc-projects/bind9/-/issues/4550): Resolve license aggregation for "reuse lint"** (Michal Nowak, 2024-02-07)

`reuse lint` in the [`reuse`](https://gitlab.isc.org/isc-projects/bind9/-/jobs/3976938) CI job has a lot of deprecation warnings about license aggregation in our repo:
```
/opt/venv/lib/python3.11/site-packages/reuse/project.py:286: PendingDeprecationWarning: Copyright and licensing
information for 'COPYRIGHT' has been found in both 'COPYRIGHT' and in the DEP5 file located at '.reuse/dep5'.
The information for these two sources has been aggregated. In the future this behaviour will change, and you will
need to explicitly enable aggregation. See <https://github.com/fsfe/reuse-tool/issues/779>. You need do nothing
yet. Run with `--suppress-deprecation` to hide this warning.
...
```

Milestone: Not planned; assignee: Ondřej Surý