BIND issues
https://gitlab.isc.org/isc-projects/bind9/-/issues
2024-01-23T15:27:16Z

https://gitlab.isc.org/isc-projects/bind9/-/issues/4544
"primaries" block documentation issues
2024-01-23T15:27:16Z
Ray Bellis

I'm finding the documentation of the "primaries" block confusing.
The ARM claims a `primaries` zone setting is only permissible within mirror, redirect, secondary and stub zones. However I've been using them at least a couple of years within the `also-notify` section of primary zones.
There's no direct mention of `primaries` in the grammar of an `also-notify` block. I _suspect_ that it's covered by `<remote-servers>` but the only link between `primaries` and `remote-servers` is this text in the glossary:
> remote-servers: A named list of one or more ip_addresses with optional tls_id, server_key, and/or port. A remote-servers list may include other remote-servers lists. See primaries block.
If in fact a `<remote-servers>` reference _is_ a (named) `primaries` list, then that ought to be spelled out more explicitly, and the documentation updated to reflect that this can be used in *any* `also-notify` block in any applicable zone type.
I'd also suggest that the top level grammar ought to actually be called `xfer-servers` instead of `masters` and then that term used in place of `remote-servers` in the ARM. In the NOTIFY case the listed servers are secondaries, not primaries, and it makes no sense to call them primaries.
[`remote-servers` also causes confusion with `server <prefix> { }` used to specify per-server EDNS overrides, etc.]

Long-term
Matthijs Mekking <matthijs@isc.org>

https://gitlab.isc.org/isc-projects/bind9/-/issues/4542
XoT: Primaries should be able to have different allow-transfer acls per transport or ACLs should be extended with port and transport options
2024-01-22T13:10:56Z
Dave Knight

### Description
We can restrict a primary to ONLY allow-transfer on a specific transport, e.g.

    allow-transfer port 853 transport tls { acl_for_xot_clients; };

Unless I'm missing something, there's no way to have different rules per transport.
I want to require XoT for transfers over the Internet, but allow insecure AXFR to localnets.
It's not possible to have multiple allow-transfer definitions, i.e. this

    allow-transfer port 53 transport tcp { acl_for_nonxot_clients; };
    allow-transfer port 853 transport tls { acl_for_xot_clients; };

results in

    'allow-transfer' redefined near 'allow-transfer'

And my understanding is that we can't refer to ports or transport in an acl.
### Request
Either allow multiple allow-transfer clauses, treating "allow transfer transport tcp" and "allow transfer transport tls" as different things, which can have their own acl specification, or add port and transport to the acl so that this can be controlled there.
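The semantics being requested can be sketched as a lookup keyed on transport and port, each key carrying its own ACL. This is only an illustration of the desired behavior, not of how `named` evaluates `allow-transfer`; the `POLICY` table, the `transfer_allowed` helper, and the example networks are all hypothetical.

```python
import ipaddress

# Hypothetical policy table: (transport, port) -> networks allowed to transfer.
# The networks stand in for the ACLs named in the request above.
POLICY = {
    ("tcp", 53): [ipaddress.ip_network("192.0.2.0/24")],  # acl_for_nonxot_clients (local nets)
    ("tls", 853): [ipaddress.ip_network("0.0.0.0/0")],    # acl_for_xot_clients (anyone, over XoT)
}


def transfer_allowed(policy, client, transport, port):
    """Return True if `client` may transfer over the given transport and port."""
    # A transport/port pair with no entry allows nobody.
    networks = policy.get((transport, port), [])
    address = ipaddress.ip_address(client)
    return any(address in network for network in networks)
```

With this table, a client on the local network may use plain AXFR over TCP, while an Internet client is refused on TCP port 53 but accepted over TLS on port 853.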
### Links / references

Long-term
Artem Boldariev

https://gitlab.isc.org/isc-projects/bind9/-/issues/4538
duplicate TLS session tickets from BIND
2024-01-17T18:01:29Z
Petr Špaček <pspacek@isc.org>

### Summary
BIND sends **two** TLS session tickets in a row, in the same TCP frame. This looks like a bug. Probably no real-world impact except consuming a bit of extra bandwidth.
### BIND version affected
* ~"Affects v9.19" : e39b544704b98ddd8a19e317373b84ac74597f76 - noticed while testing !8646
* ~"Affects v9.18" : 071de1b5b54c27b1291bd97e3a95a93b1996eddc - isc-private/bind9!585
### Steps to reproduce
1. SSLKEYLOGFILE=/tmp/tlskeys /tmp/4527-improve-tls-framing-for-dot/sbin/named -g -c /tmp/named.conf
2. sudo tcpdump -i lo -w /tmp/tls.pcap 'port 853'
3. dig @127.0.0.1 +tls
- [tls.pcap](/uploads/e5836a9693d76f117c9e5c80f15cf2b1/tls.pcap)
- [tlskeys](/uploads/76d398d1c33b7eb90f4c7a14ff27a644/tlskeys)
### What is the current *bug* behavior?
For some reason BIND sends **two** TLS session tickets in a row, in the same TCP frame.
<details>
```
Frame 10: 608 bytes on wire (4864 bits), 608 bytes captured (4864 bits)
Ethernet II, Src: 00:00:00:00:00:00, Dst: 00:00:00:00:00:00
Internet Protocol Version 4, Src: 127.0.0.1, Dst: 127.0.0.1
Transmission Control Protocol, Src Port: 853, Dst Port: 46779, Seq: 766, Ack: 476, Len: 542
Transport Layer Security
TLSv1.3 Record Layer: Handshake Protocol: New Session Ticket
Opaque Type: Application Data (23)
Version: TLS 1.2 (0x0303)
Length: 266
[Content Type: Handshake (22)]
Handshake Protocol: New Session Ticket
Handshake Type: New Session Ticket (4)
Length: 245
TLS Session Ticket
Session Ticket Lifetime Hint: 7200 seconds (2 hours)
Session Ticket Age Add: 1399829672
Session Ticket Nonce Length: 8
Session Ticket Nonce: 0000000000000000
Session Ticket Length: 224
Session Ticket [truncated]: 5f2c5c7290f6b002e39631b54f85b14de2620e615663e5e3a2a5c5194a3e5c47d5da9fc257200fe4318de304b2471b4a1f35607e53e0a3eb04e00421e2539bcdbf486e60ec9900448831dc70c1dcb081c0890d04c337dbe4aef4806dd5004019a0a7edfabbf17de7590
Extensions Length: 0
TLSv1.3 Record Layer: Handshake Protocol: New Session Ticket
Opaque Type: Application Data (23)
Version: TLS 1.2 (0x0303)
Length: 266
[Content Type: Handshake (22)]
Handshake Protocol: New Session Ticket
Handshake Type: New Session Ticket (4)
Length: 245
TLS Session Ticket
Session Ticket Lifetime Hint: 7200 seconds (2 hours)
Session Ticket Age Add: 310059667
Session Ticket Nonce Length: 8
Session Ticket Nonce: 0000000000000001
Session Ticket Length: 224
Session Ticket [truncated]: 5f2c5c7290f6b002e39631b54f85b14dc423d6b1f00ccd25e30d7cf9290c0dc32d8ed4b9c72a8e3555d9ccdba4b3b6299e5306c5bf9ca48f72325e23927d1e9ae572d8937faedeb7b5846b4f8817bef5e537a5ff8e516c20f520ebb535ab37fa64996854d10dcee1291
Extensions Length: 0
```
</details>
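One possible explanation, offered as an assumption rather than a diagnosis of BIND: OpenSSL-based TLS 1.3 servers issue two session tickets per handshake by default, and the count is configurable (`SSL_CTX_set_num_tickets` in the C API). The sketch below uses Python's `ssl` module, which exposes the same setting as `SSLContext.num_tickets`, purely to illustrate the knob; the helper name `make_server_context` is hypothetical and nothing here reflects how `named` configures its TLS contexts.

```python
import ssl


def make_server_context(tickets=1):
    """Build a server-side TLS context that issues `tickets` TLS 1.3 session tickets."""
    # num_tickets is only valid on PROTOCOL_TLS_SERVER contexts (Python 3.8+).
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.num_tickets = tickets
    return context


context = make_server_context(tickets=1)
print("session tickets per TLS 1.3 handshake:", context.num_tickets)
```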
### What is the expected *correct* behavior?
I would expect just one ticket.

Artem Boldariev

https://gitlab.isc.org/isc-projects/bind9/-/issues/2117
BIND sometimes fixates on one server address for a zone
2024-01-17T14:25:21Z
Brian Conry

A customer has reported:
> I noticed I focused on the wrong nameservers before (I sent the nameservers for akamaiedge.net, instead of g.akamaiedge.net), but the issue is the same. The authoritative nameservers to consider are:
> ```
> n0g.akamaiedge.net. 152 IN A 88.221.81.192
> n0g.akamaiedge.net. 152 IN AAAA 2600:1480:e800::c0
> n1g.akamaiedge.net. 152 IN A 2.16.65.53
> n2g.akamaiedge.net. 152 IN A 2.16.65.86
> n3g.akamaiedge.net. 152 IN A 2.16.65.44
> n4g.akamaiedge.net. 152 IN A 2.16.65.68
> n5g.akamaiedge.net. 162 IN A 2.16.65.77
> n6g.akamaiedge.net. 162 IN A 2.21.25.118
> n7g.akamaiedge.net. 181 IN A 2.17.41.132
> ```
> response times are:
> ```
> 88.221.81.192: 147 msec
> 2600:1480:e800::c0: 146 msec
> 2.16.65.53: 1 msec
> 2.16.65.86: 1 msec
> 2.16.65.44: 1 msec
> 2.16.65.68: 1 msec
> 2.16.65.77: 1 msec
> 2.21.25.118: 15 msec
> 2.17.41.132: 13 msec
> ```
They have provided data from `rndc dumpdb -all`.
selected cache data:
```
; glue
g.akamaiedge.net. 865 NS n0g.akamaiedge.net.
865 NS n7g.akamaiedge.net.
865 NS n5g.akamaiedge.net.
865 NS n4g.akamaiedge.net.
865 NS n3g.akamaiedge.net.
865 NS n1g.akamaiedge.net.
865 NS n2g.akamaiedge.net.
865 NS n6g.akamaiedge.net.
; answer
e11550.g.akamaiedge.net. 433 \-TYPE65 ;-$NXRRSET
; g.akamaiedge.net. SOA n0g.akamaiedge.net. hostmaster.akamai.com. 1599033648 1000 1000 1000 1800
; authanswer
n0g.akamaiedge.net. 2246 A 88.221.81.192
; authanswer
2246 AAAA 2600:1480:e800::c0
; authanswer
n1g.akamaiedge.net. 2246 A 2.16.65.53
; authanswer
n2g.akamaiedge.net. 2246 A 2.16.65.86
; authanswer
n3g.akamaiedge.net. 2246 A 2.16.65.44
; authanswer
n4g.akamaiedge.net. 2246 A 2.16.65.68
; authanswer
n5g.akamaiedge.net. 2256 A 2.16.65.77
; authanswer
n6g.akamaiedge.net. 2256 A 2.21.25.118
; authanswer
n7g.akamaiedge.net. 2275 A 2.17.41.132
```
selected ADB entries:
```
; selected ADB data
; n0g.akamaiedge.net [v4 TTL 46] [v6 TTL 46] [v4 success] [v6 success]
; 88.221.81.192 [srtt 121879] [flags 00004000] [edns 63/0/0/0/0] [plain 0/0] [udpsize 512] [ttl -991]
; 2600:1480:e800::c0 [srtt 146019] [flags 00004000] [edns 135/0/0/0/0] [plain 0/0] [udpsize 512] [ttl -991]
; n1g.akamaiedge.net [v4 TTL 46] [v4 success] [v6 unexpected]
; 2.16.65.53 [srtt 6] [flags 00000000] [edns 0/0/0/0/0] [plain 0/0] [ttl 810]
; n2g.akamaiedge.net [v4 TTL 46] [v4 success] [v6 unexpected]
; 2.16.65.86 [srtt 21] [flags 00000000] [edns 0/0/0/0/0] [plain 0/0] [ttl 810]
; n3g.akamaiedge.net [v4 TTL 46] [v4 success] [v6 unexpected]
; 2.16.65.44 [srtt 20] [flags 00000000] [edns 0/0/0/0/0] [plain 0/0] [ttl 810]
; n4g.akamaiedge.net [v4 TTL 46] [v4 success] [v6 unexpected]
; 2.16.65.68 [srtt 29] [flags 00000000] [edns 0/0/0/0/0] [plain 0/0] [ttl 810]
; n5g.akamaiedge.net [v4 TTL 56] [v4 success] [v6 unexpected]
; 2.16.65.77 [srtt 30] [flags 00000000] [edns 0/0/0/0/0] [plain 0/0] [ttl 810]
; n6g.akamaiedge.net [v4 TTL 56] [v4 success] [v6 unexpected]
; 2.21.25.118 [srtt 27] [flags 00000000] [edns 0/0/0/0/0] [plain 0/0] [ttl 810]
; n7g.akamaiedge.net [v4 TTL 75] [v4 success] [v6 unexpected]
; 2.17.41.132 [srtt 9] [flags 00000000] [edns 0/0/0/0/0] [plain 0/0] [ttl 810]
```
One thing to note about the ADB entries is that those for `n1g` through `n7g` appear to have been added, but never used, prior to the `dumpdb` (new entries are initialized with an SRTT between 1 and 32 microseconds).
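The observation above can be checked mechanically. The following is a minimal sketch that pulls the address and SRTT out of ADB dump lines shaped like the ones shown and flags entries whose SRTT still lies in the 1 to 32 microsecond initialization range. The regular expression is written against the sample output only, and the names `AdbEntry` and `parse_adb_entries` are hypothetical.

```python
import re
from typing import List, NamedTuple

# Matches address lines such as:
# ; 2.16.65.53 [srtt 6] [flags 00000000] [edns 0/0/0/0/0] [plain 0/0] [ttl 810]
SRTT_LINE = re.compile(r"^;\s*(?P<address>\S+)\s+\[srtt (?P<srtt>\d+)\]")


class AdbEntry(NamedTuple):
    address: str
    srtt_us: int

    @property
    def looks_unused(self) -> bool:
        # New entries start with an SRTT between 1 and 32 microseconds.
        return 1 <= self.srtt_us <= 32


def parse_adb_entries(dump: str) -> List[AdbEntry]:
    """Extract (address, srtt) pairs; name lines without an srtt field are skipped."""
    entries = []
    for line in dump.splitlines():
        match = SRTT_LINE.match(line.strip())
        if match:
            entries.append(AdbEntry(match["address"], int(match["srtt"])))
    return entries


SAMPLE = """\
; n0g.akamaiedge.net [v4 TTL 46] [v6 TTL 46] [v4 success] [v6 success]
;   88.221.81.192 [srtt 121879] [flags 00004000] [edns 63/0/0/0/0] [plain 0/0] [udpsize 512] [ttl -991]
; n1g.akamaiedge.net [v4 TTL 46] [v4 success] [v6 unexpected]
;   2.16.65.53 [srtt 6] [flags 00000000] [edns 0/0/0/0/0] [plain 0/0] [ttl 810]
"""

for entry in parse_adb_entries(SAMPLE):
    print(entry.address, entry.srtt_us, "unused?" if entry.looks_unused else "in use")
```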
The core of the cycle appears to be:
1. As long as at least one address is found in the ADB for at least one of the names in the NS rrset, no new data is fetched or moved into the ADB
2. As long as `named` is waiting for a response from an address, that ADB entry is preserved
3. `named` sets how long to wait for a response based on the current SRTT
4. An ADB entry can be used even if it is expired
While theoretically any address could be the one fixated on, by virtue of point 3 above the ones with the higher SRTT are more likely to be selected than the ones with the lower SRTT.
This is also more likely to happen for a frequently-queried zone with many records with low TTLs, such as the zone of a CDN.
This is not the first time I've seen behavior that I believe is linked to this, but it is the first time a customer has noticed it, and it is also the clearest documentation of it yet.
I expect that there are multiple possible solutions to this, with the hard part being choosing the one that we believe will be the easiest to implement and have the lowest chances of unintended consequences.

Not planned

https://gitlab.isc.org/isc-projects/bind9/-/issues/4532
An option to not have bind9/dnssec-settime (possibly other tools) reset permissions on a .private file.
2024-01-16T20:30:36Z
Dan Mahoney

### Description
The `named` process and `dnssec-settime` (perhaps other tools) will take it upon themselves to change the permissions of a private key on certain changes.
However, we track our key-directory (and other configs) using git, with a group-shared repository.
Typical permissions on .private files are bind:bind with mode 660, but because a normal user (in the bind group) diffs/commits/pushes the repository, these keys can also be user:bind mode 660.
(Note as well that our tooling is no more comfortable running git tasks as root and complains of other permissions issues. Also, the less we can do as root, the better.)
With bind's usual permissions model, one cannot do a git diff/git log if the file is owned by bind. If the file is owned by user:bind, bind loses access to it on the permissions change.
Changing the umask under which the process runs doesn't seem to fix this; we tried.
Running a periodic cron job to fix this is a possible workaround, but feels like it shouldn't be necessary.
### Request
For command line tools, an option to not do this.
For `named`, an `options` statement that lets us turn this off.
Both retaining the current behavior by default.
### Links / references

Not planned

https://gitlab.isc.org/isc-projects/bind9/-/issues/2964
Templates in the configuration
2024-01-15T07:00:03Z
Ondřej Surý

The zone should contain reusable chunks (something like yaml: `<< *foo`).

Not planned

https://gitlab.isc.org/isc-projects/bind9/-/issues/751
[ISC-support #13775] Requested Feature: Add a way to define restrictions on which zones may be added via catalog zone
2024-01-11T13:43:23Z
Michael McNally

### Description
Catalog zones provide a powerful mechanism to allow dynamic control over which zones are provisioned on an authoritative server but their initial implementation assumes that the party controlling the zone data for the catalog zone is fully trusted. Numerous use cases exist where the operator of a server would like to delegate to a party permission to add and remove some zones but with restrictions on which such zones can be added (or removed.)
### Request
One of our BIND Support customers has asked that we consider adding mechanisms to allow an operator using catalog zones to restrict via configuration statements which zones will be allowed to be provisioned via the catalog zone.
### Links / references
The original customer request can be found in ISC Support ticket [#13775](https://support.isc.org/Ticket/Display.html?id=13775)

Not planned

https://gitlab.isc.org/isc-projects/bind9/-/issues/4525
bind acl doesn't respect interface identifier (in ipv6 link local address)
2024-01-09T06:15:55Z
elmaimbo

### Summary
Although BIND allows you to configure an IPv6 address with an interface identifier (e.g. fe80::1%ne0) in an "acl" statement, when it tests if an address satisfies the acl, it seems to only look at the address and ignores the interface identifier when performing the check.
### BIND version affected
```
# named -V
BIND 9.18.18-0ubuntu2-Ubuntu (Extended Support Version) <id:>
running on Linux x86_64 6.5.0-14-generic #14-Ubuntu SMP PREEMPT_DYNAMIC Tue Nov 14 14:59:49 UTC 2023
built by make with '--build=x86_64-linux-gnu' '--prefix=/usr' '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--disable-option-checking' '--disable-silent-rules' '--libdir=${prefix}/lib/x86_64-linux-gnu' '--runstatedir=/run' '--disable-maintainer-mode' '--disable-dependency-tracking' '--libdir=/usr/lib/x86_64-linux-gnu' '--sysconfdir=/etc/bind' '--with-python=python3' '--localstatedir=/' '--enable-threads' '--enable-largefile' '--with-libtool' '--enable-shared' '--disable-static' '--with-gost=no' '--with-openssl=/usr' '--with-gssapi=yes' '--with-libidn2' '--with-json-c' '--with-lmdb=/usr' '--with-gnu-ld' '--with-maxminddb' '--with-atf=no' '--enable-ipv6' '--enable-rrl' '--enable-filter-aaaa' '--disable-native-pkcs11' 'build_alias=x86_64-linux-gnu' 'CFLAGS=-g -O2 -ffile-prefix-map=/build/bind9-UHPUkp/bind9-9.18.18=. -flto=auto -ffat-lto-objects -fstack-protector-strong -fstack-clash-protection -Wformat -Werror=format-security -fcf-protection -fdebug-prefix-map=/build/bind9-UHPUkp/bind9-9.18.18=/usr/src/bind9-1:9.18.18-0ubuntu2 -fno-strict-aliasing -fno-delete-null-pointer-checks -DNO_VERSION_DATE -DDIG_SIGCHASE' 'LDFLAGS=-Wl,-Bsymbolic-functions -flto=auto -ffat-lto-objects -Wl,-z,relro -Wl,-z,now' 'CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2'
compiled by GCC 13.2.0
compiled with OpenSSL version: OpenSSL 3.0.10 1 Aug 2023
linked to OpenSSL version: OpenSSL 3.0.10 1 Aug 2023
compiled with libuv version: 1.44.2
linked to libuv version: 1.44.2
compiled with libnghttp2 version: 1.55.1
linked to libnghttp2 version: 1.55.1
compiled with libxml2 version: 2.9.14
linked to libxml2 version: 20914
compiled with json-c version: 0.17
linked to json-c version: 0.17
compiled with zlib version: 1.2.13
linked to zlib version: 1.2.13
linked to maxminddb version: 1.7.1
threads support is enabled
DNSSEC algorithms: RSASHA1 NSEC3RSASHA1 RSASHA256 RSASHA512 ECDSAP256SHA256 ECDSAP384SHA384 ED25519 ED448
DS algorithms: SHA-1 SHA-256 SHA-384
HMAC algorithms: HMAC-MD5 HMAC-SHA1 HMAC-SHA224 HMAC-SHA256 HMAC-SHA384 HMAC-SHA512
TKEY mode 2 support (Diffie-Hellman): yes
TKEY mode 3 support (GSS-API): yes
default paths:
named configuration: /etc/bind/named.conf
rndc configuration: /etc/bind/rndc.conf
DNSSEC root key: /etc/bind/bind.keys
nsupdate session key: //run/named/session.key
named PID file: //run/named/named.pid
named lock file: //run/named/named.lock
geoip-directory: /usr/share/GeoIP
```
### Steps to reproduce
These steps require access to two machines, each having an IPv6 link-local address on a shared network segment. One of the machines needs to have an operational installation of BIND. The other machine needs the dig utility, or a similar tool that allows a DNS query to be sent to a specific IPv6 address.
1. On the BIND machine (server A), run `ip -6 address` and verify that the server has a loopback interface (lo) with IPv6 address `::1`, and at least one other network interface that has a link-local address `fe80::xxxx:xxxx:xxxx:xxxx/64`. Make a note of the interface name and the link-local address.
2. On the other machine (server B), run `ip -6 address` and make a note of the interface name that is on the same network as the BIND machine.
3. Verify connectivity between the servers by pinging from server B to the link-local address of server A -- including server B's interface identifier. E.g.: `ping fe80::1e69:7aff:fe6c:2ab0%eno1`
4. On server A, edit the BIND configuration and add `acl testing { fe80::/64; };`, and also include "testing;" at the start of both `allow-query` and `allow-recursion` options. Run `rndc reload` to apply the configuration changes.
5. On server B, verify that you can use dig (or similar) to successfully query a DNS name using the same link-local address used in the ping test above. E.g.: `dig google.com @fe80::1e69:7aff:fe6c:2ab0%eno1`
6. Now change the "acl testing" block to `acl testing { !fe80::%lo/64; fe80::/64; };`. The idea here is that we are disallowing queries coming from link-local addresses on the loopback interface. In theory this should make no difference to our test, since our query isn't coming in on the loopback interface. Run `rndc reload` to apply the configuration changes.
7. Repeat the "dig" test, and you will find that the BIND server will now refuse the request. This shows that BIND considers that the request satisfies "!fe80::%lo/64;" when in fact it shouldn't because it doesn't originate from the loopback interface.
### What is the current *bug* behavior?
BIND seems to ignore the interface identifier for IPv6 addresses when applying acls.
### What is the expected *correct* behavior?
BIND should observe the interface identifier as described in the [documentation](https://bind9.readthedocs.io/en/latest/reference.html#term-ipv6_address). Please note that interface identifiers may also contain VLAN IDs - e.g. "eno1.20".
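The expected matching rule can be sketched as follows: an ACL element that carries an interface identifier matches only when both the address and the interface agree, while an element without one matches on the address alone. This is a hypothetical model of the desired behavior, not BIND's ACL code; `AclElement`, `acl_allows`, and the convention of writing the receiving interface after `%` in the client string are assumptions made for the illustration.

```python
import ipaddress
from typing import List, Optional, Tuple


def split_scope(text: str) -> Tuple[str, Optional[str]]:
    """Split 'addr%scope' into (addr, scope); scope is None when absent."""
    address, separator, scope = text.partition("%")
    return address, (scope if separator else None)


class AclElement:
    """One ACL entry such as '!fe80::%lo/64' or 'fe80::/64'."""

    def __init__(self, spec: str):
        self.negated = spec.startswith("!")
        body = spec.lstrip("!")
        address_and_scope, _, prefix = body.partition("/")
        address, self.scope = split_scope(address_and_scope)
        self.network = ipaddress.ip_network(f"{address}/{prefix}" if prefix else address)

    def matches(self, client: str) -> bool:
        address, scope = split_scope(client)
        # An element with an interface identifier only matches that interface.
        if self.scope is not None and self.scope != scope:
            return False
        return ipaddress.ip_address(address) in self.network


def acl_allows(acl: List[str], client: str) -> bool:
    """First matching element wins; a negated match rejects the client."""
    for spec in acl:
        element = AclElement(spec)
        if element.matches(client):
            return not element.negated
    return False


ACL = ["!fe80::%lo/64", "fe80::/64"]
print(acl_allows(ACL, "fe80::2%eno1"))  # query arriving on eno1
print(acl_allows(ACL, "fe80::2%lo"))    # query arriving on loopback
```

Under this model the query arriving on `eno1` skips the negated `%lo` element and is accepted by `fe80::/64`, which is the outcome step 7 of the reproduction expects.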
### Relevant configuration files
FYI I am trying to use ACLs similar to the following, to differentiate between requests originating on link-local addresses from different interfaces, so that the queries are handled by different views:
```
acl trusted-networks {
127.0.0.0/8;
::1;
fe80::%eno1.20/64;
fe80::%eno1.160/64;
fe80::%tun1/64;
};
acl dmz-networks {
fe80::%eno1.192/64;
};
```
### Relevant logs
All my logs show is that the wrong view is being used, due to this bug.

Not planned

https://gitlab.isc.org/isc-projects/bind9/-/issues/4318
Check the size of the structure passed to dns_rdata_*struct methods
2024-01-03T13:35:03Z
Mark Andrews

#4314 made me think we should check the size of the structure being passed to dns_rdata_tostruct, dns_rdata_fromstruct, and dns_rdata_freestruct as we don't have the compiler doing type checks for us. If there is a mismatch, badness could happen.

Not planned

https://gitlab.isc.org/isc-projects/bind9/-/issues/4230
checkds test may fail due to a timing issue
2024-01-02T13:40:29Z
Mark Andrews

Job [#3552282](https://gitlab.isc.org/isc-projects/bind9/-/jobs/3552282) failed for 423c9d6716c4b96f0cf939653da47abca267bd23:
```
___________________________ test_checkds_dspublished ___________________________
[gw0] linux -- Python 3.11.4 /usr/bin/python3.11
/builds/isc-projects/bind9/bin/tests/system/checkds/tests_checkds.py:640: in test_checkds_dspublished
checkds_dspublished(named_port, "explicit", "10.53.0.8")
/builds/isc-projects/bind9/bin/tests/system/checkds/tests_checkds.py:308: in checkds_dspublished
keystate_check(parent, zone, "DSPublish")
/builds/isc-projects/bind9/bin/tests/system/checkds/tests_checkds.py:228: in keystate_check
assert val != 0
E assert 0 != 0
```

Not planned
Tom Krizek

https://gitlab.isc.org/isc-projects/bind9/-/issues/23
DDoS mitigation
2023-12-22T10:28:30Z
Ondřej Surý

This is a placeholder bug for general DDoS mitigation techniques that need to be introduced into BIND to cope with the current DNS landscape.

Not planned

https://gitlab.isc.org/isc-projects/bind9/-/issues/4503
Possible pytest RNDC interface improvements
2023-12-21T18:03:30Z
Štěpán Balážik

Review of !8357, which provides a first cut of the pytest RNDC interface, has shown multiple suggestions for possible improvements.
I am now dumping them here in the form of a checklist so they don't get buried in the now resolved MR comments:
- [ ] Find a way to do the `*.in` file templating in pure Python. This is needed for the elimination of the `setup.sh` scripts. This will probably require depending on `jinja` explicitly.
- [ ] Add an "rndc null" before every reconfiguration to show which file is used (NamedInstance.add_mark_to_log() as it may be generically useful?)
- [ ] Extend `NamedInstance` with some kind of `query` method. This is needed as a replacement for the calls to `dig` which are common in system tests.
- [ ] There are now two objects representing the ports used in tests: dictionary returned by the `ports` fixture and the new `NamedPorts` class. Unify them. Discussed [here](https://gitlab.isc.org/isc-projects/bind9/-/merge_requests/8357#note_411674).
- [ ] Consider switching from `NamedTuple` to `dataclass` (Python 3.7 feature, requires an external dependency on some distros we run) as discussed [here](https://gitlab.isc.org/isc-projects/bind9/-/merge_requests/8357#note_411004).
- [ ] `NamedInstance.rndc(…)` method probably ought to be `async`. Discussed [here](https://gitlab.isc.org/isc-projects/bind9/-/merge_requests/8357#note_411007)
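Two of the items above (unifying the two port representations and moving from `NamedTuple` to `dataclass`) could be combined roughly as sketched below. Only the class name `NamedPorts` and the existence of a `ports` fixture dictionary come from the checklist; the field names, defaults, dictionary keys, and the `from_fixture` constructor are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class NamedPorts:
    """Ports used by one named instance (field names are hypothetical)."""

    dns: int = 53
    rndc: int = 953

    @classmethod
    def from_fixture(cls, ports: Dict[str, int]) -> "NamedPorts":
        # "PORT" and "CONTROLPORT" are assumed key names for the fixture dict.
        return cls(dns=ports["PORT"], rndc=ports["CONTROLPORT"])


ports = NamedPorts.from_fixture({"PORT": 5300, "CONTROLPORT": 9953})
print(ports)
```

A frozen dataclass keeps the immutability that `NamedTuple` provided while allowing defaults and alternative constructors, so the fixture could hand tests a single object type.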
Feel free to add others!

Long-term
Štěpán Balážik

https://gitlab.isc.org/isc-projects/bind9/-/issues/3444
Issues with using C++-based PKCS#11 providers with BIND 9 when jemalloc support is enabled
2023-12-21T11:10:14Z
Michał Kępień

While [moving][1] SoftHSM-based jobs around between operating systems,
we noticed that `dnssec-keyfromlabel` segfaults on Debian 11 "bullseye"
when BIND 9 is built with jemalloc support. A full backtrace with debug
symbols installed follows:
<details>
<summary>Click to expand/collapse backtrace</summary>
```
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007f779ea8e537 in __GI_abort () at abort.c:79
#2 0x00007f779e4335a0 in rtree_child_leaf_tryread (elm=0x7f779e530268 <je_extents_rtree+704104>, dependent=true) at src/rtree.c:205
#3 0x00007f779e433812 in je_rtree_leaf_elm_lookup_hard (tsdn=0x7f779bf82700, rtree=0x7f779e484400 <je_extents_rtree>, rtree_ctx=0x7f779bf82730, key=94481296101344, dependent=true, init_missing=false) at src/rtree.c:292
#4 0x00007f779e3b6235 in rtree_leaf_elm_lookup (tsdn=0x7f779bf82700, rtree=0x7f779e484400 <je_extents_rtree>, rtree_ctx=0x7f779bf82730, key=94481296101344, dependent=true, init_missing=false) at include/jemalloc/internal/rtree.h:381
#5 0x00007f779e3b627a in rtree_read (tsdn=0x7f779bf82700, rtree=0x7f779e484400 <je_extents_rtree>, rtree_ctx=0x7f779bf82730, key=94481296101344, dependent=true) at include/jemalloc/internal/rtree.h:406
#6 0x00007f779e3b6394 in rtree_szind_read (tsdn=0x7f779bf82700, rtree=0x7f779e484400 <je_extents_rtree>, rtree_ctx=0x7f779bf82730, key=94481296101344, dependent=true) at include/jemalloc/internal/rtree.h:429
#7 0x00007f779e3b8dbb in arena_salloc (tsdn=0x7f779bf82700, ptr=0x55ee24178fe0) at include/jemalloc/internal/arena_inlines_b.h:191
#8 0x00007f779e3b9f35 in isalloc (tsdn=0x7f779bf82700, ptr=0x55ee24178fe0) at include/jemalloc/internal/jemalloc_internal_inlines_c.h:38
#9 0x00007f779e3c696f in je_sdallocx_default (ptr=0x55ee24178fe0, size=21, flags=0) at src/jemalloc.c:3555
#10 0x00007f779e3c6e3c in je_je_sdallocx_noflags (ptr=0x55ee24178fe0, size=21) at src/jemalloc.c:3611
#11 0x00007f779e44a855 in operator delete (ptr=0x55ee24178fe0, size=21) at src/jemalloc_cpp.cpp:131
#12 0x00007f779befa14f in __gnu_cxx::new_allocator<char>::deallocate (__t=<optimized out>, __p=<optimized out>, this=0x7ffd603a8b30) at /usr/include/c++/10/ext/new_allocator.h:133
#13 std::allocator_traits<std::allocator<char> >::deallocate (__n=<optimized out>, __p=<optimized out>, __a=...) at /usr/include/c++/10/bits/alloc_traits.h:492
#14 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_destroy (__size=<optimized out>, this=0x7ffd603a8b30) at /usr/include/c++/10/bits/basic_string.h:237
#15 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_dispose (this=0x7ffd603a8b30) at /usr/include/c++/10/bits/basic_string.h:232
#16 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string (this=0x7ffd603a8b30, __in_chrg=<optimized out>) at /usr/include/c++/10/bits/basic_string.h:658
#17 SimpleConfigLoader::loadConfiguration (this=0x55ee2417aaf0) at SimpleConfigLoader.cpp:150
#18 0x00007f779bef73cb in Configuration::reload (this=0x55ee2417aa40) at Configuration.cpp:169
#19 0x00007f779bee38bf in SoftHSM::C_Initialize (this=0x55ee2417c910, pInitArgs=<optimized out>) at SoftHSM.cpp:564
#20 0x00007f779beb3e34 in C_Initialize (pInitArgs=0x7ffd603a90b0) at main.cpp:133
#21 0x00007f779bf73249 in pkcs11_CTX_load (ctx=ctx@entry=0x55ee24179230, name=<optimized out>) at p11_load.c:86
#22 0x00007f779bf76ac8 in PKCS11_CTX_load (ctx=ctx@entry=0x55ee24179230, ident=<optimized out>) at p11_front.c:46
#23 0x00007f779bf6e07a in ctx_enumerate_slots_unlocked (ctx=ctx@entry=0x55ee2415aff0, pkcs11_ctx=pkcs11_ctx@entry=0x55ee24179230) at eng_back.c:258
#24 0x00007f779bf6f0bd in ctx_init_libp11_unlocked (ctx=0x55ee2415aff0) at eng_back.c:307
#25 ctx_load_object (ctx=ctx@entry=0x55ee2415aff0, object_typestr=object_typestr@entry=0x7f779bf784be "public key", match_func=match_func@entry=0x7f779bf6f2c0 <match_public_key>, object_uri=0x7f779ba30000 "pkcs11:token=softhsm2-keyfromlabel;object=keyfromlabel-zsk-rsasha256.example;pin-source=/bind9/bin/tests/system/keyfromlabel/pin", ui_method=0x0, callback_data=0x0) at eng_back.c:578
#26 0x00007f779bf6f590 in ctx_load_pubkey (ctx=0x55ee2415aff0, s_key_id=<optimized out>, ui_method=<optimized out>, callback_data=<optimized out>) at eng_back.c:745
#27 0x00007f779e821140 in ENGINE_load_public_key () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
#28 0x00007f779ecdba7c in opensslrsa_fromlabel (key=0x7f779ba35000, engine=0x7ffd603aada9 "pkcs11", label=0x7f779ba30000 "pkcs11:token=softhsm2-keyfromlabel;object=keyfromlabel-zsk-rsasha256.example;pin-source=/bind9/bin/tests/system/keyfromlabel/pin", pin=<optimized out>) at opensslrsa_link.c:1457
#29 0x00007f779ec902af in dst_key_fromlabel (name=name@entry=0x7ffd603a9540, alg=8, flags=flags@entry=256, protocol=protocol@entry=3, rdclass=<optimized out>, engine=engine@entry=0x7ffd603aada9 "pkcs11", label=0x7f779ba30000 "pkcs11:token=softhsm2-keyfromlabel;object=keyfromlabel-zsk-rsasha256.example;pin-source=/bind9/bin/tests/system/keyfromlabel/pin", pin=0x0, mctx=0x7f779ba09000, keyp=0x7ffd603a93b8) at dst_api.c:960
#30 0x000055ee2376afae in main (argc=<optimized out>, argv=<optimized out>) at dnssec-keyfromlabel.c:609
```
</details>
As can be seen in the backtrace, the segmentation fault happens
during SoftHSM initialization.
When jemalloc is built with debugging enabled, the following assertion
is logged:

    <jemalloc>: src/rtree.c:205: Failed assertion: "!dependent || leaf != NULL"

Nothing like this happens on Fedora. I have not checked other operating
systems.
Further investigation revealed that this assertion failure [means][2]
that jemalloc was asked to free a pointer that it did not allocate.
Things only get more fuzzy from here...
I believe the root cause of this issue lies somewhere in how various
distros link and load executables (because that influences when jemalloc
is initialized). Specifically, it looks like jemalloc gets initialized
earlier on Fedora than on Debian, which allows it to properly handle
allocations requested by C++ shared objects dynamically loaded from C
executables. Why this happens is over my head. However, one fact that
supports this theory is that `LD_PRELOAD`ing jemalloc on Debian seems to
work around the problem.
Since there are most likely other PKCS#11 providers out there that are
C++-based, I decided to document the hacks that worked around the
problem in my test environment (i.e. allowed `dnssec-keyfromlabel` to
work):
- `LD_PRELOAD` jemalloc.
- Build BIND 9 using `--without-jemalloc`.
- Link BIND 9 against a jemalloc build with C++ integration disabled
(`--disable-cxx`). This prevents jemalloc from handling C++'s `new`
and `delete` keywords.
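The `LD_PRELOAD` workaround can be scripted when a tool is launched from a wrapper. This is only a minimal sketch: the helper name `preload_env` and the jemalloc path are assumptions made for illustration (the library location differs between distributions), and nothing here is part of BIND 9.

```python
import os


def preload_env(library, base_env=None):
    """Return a copy of the environment with `library` first in LD_PRELOAD."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD", "")
    # The dynamic loader accepts a colon- or space-separated list of libraries.
    env["LD_PRELOAD"] = f"{library}:{existing}" if existing else library
    return env


# Hypothetical path: where libjemalloc lives depends on the distribution.
JEMALLOC = "/usr/lib/x86_64-linux-gnu/libjemalloc.so.2"
env = preload_env(JEMALLOC, base_env={"PATH": "/usr/bin"})
print(env["LD_PRELOAD"])
```

The resulting dictionary can be passed as `env=` to `subprocess.run()` when invoking `dnssec-keyfromlabel` with whatever arguments are normally used.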
Another possible (and untested) workaround would be to link BIND 9
against a jemalloc build that uses a custom function name prefix, but
BIND 9 [does not currently support such a scenario][3].
I do not see a clear way to fix this on the BIND 9 side of things, so
this issue is mostly meant to serve as a permanently open source
of information.
[1]: !6322
[2]: https://gitter.im/jemalloc/jemalloc?at=5e275495364db33faa0bf972
[3]: #3116
Not planned
https://gitlab.isc.org/isc-projects/bind9/-/issues/660
rndc showzone does not work for all zones
2023-12-20T15:37:21Z
Petr Špaček <pspacek@isc.org>
### Summary
It seems that the `rndc showzone` command does not work for certain zones, at the very least for built-in zones. This is super inconvenient for scripts that iterate over all zones in order to read and modify the configuration of a running process.
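To make the impact concrete, here is a rough sketch of the kind of script affected. The function name `show_zones` and the injectable `run` parameter are illustrative assumptions; the sketch also assumes that `rndc` reports the failure through a non-zero exit status and writes its message to standard error.

```python
import subprocess


def show_zones(zones, run=subprocess.run):
    """Call `rndc showzone` per zone; return (configs, failures) dicts."""
    configs, failures = {}, {}
    for zone in zones:
        result = run(["rndc", "showzone", zone], capture_output=True, text=True)
        if result.returncode == 0:
            configs[zone] = result.stdout.strip()
        else:
            # Assumed to carry a message such as "rndc: 'showzone' failed: failure".
            failures[zone] = result.stderr.strip()
    return configs, failures
```

With the behavior described in this report, every built-in zone ends up in `failures` with a message that does not say why, so the script cannot tell a built-in zone from a real error.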
### BIND version used
BIND 9.13.0-dev <id:883a9485e9>
commit 883a9485e95916a686e56d81fff5130ac3102953
Date: Thu Feb 15 11:56:13 2018 -0800
### Steps to reproduce
$ rndc showzone 10.in-addr.arpa.
### What is the current *bug* behavior?
```
rndc: 'showzone' failed: failure
```
### What is the expected *correct* behavior?
Ideally, an equivalent zone configuration that can be copied back into the config file and does not break anything. (Alternatively, a more useful error message.)
https://gitlab.isc.org/isc-projects/bind9/-/issues/3858
Deprecate (or improve/replace) the fetches-per-zone option
2023-12-19T09:21:45Z
Ondřej Surý
The `fetches-per-zone` option is a measure to prevent abuse of the nameservers.
### How do we pick a bucket?
When a fetch (`fctx`) is created, `fctx->domain` is initialized with a domain name that could be:
#### Argument passed by the caller
The `domain` passed by the caller: from `dns_adb`/`fetch_name` when `start_at_name` is set, and from `ns_query`/`ns_query_recurse()`.
No example here; we can (sort of) ignore this case.
#### In forward-only mode
The root (`.`) when we are in **forward-only** mode: there is only a single counter!
The same with QNAME minimization on and off:
```
increasing counter for '.' in the '0x7fed97e3e000/www.google.com/A' to 1 (allowed 1 spilled 0)
increasing counter for '.' in the '0x7fed97a26800/com/DS' to 2 (allowed 2 spilled 0)
increasing counter for '.' in the '0x7fed97a25400/google.com/DS' to 3 (allowed 3 spilled 0)
decreasing counter for '.' in the '0x7fed97a26800/com/DS' to 2 (allowed 3 spilled 0)
increasing counter for '.' in the '0x7fed97226800/com/DNSKEY' to 3 (allowed 4 spilled 0)
decreasing counter for '.' in the '0x7fed97226800/com/DNSKEY' to 2 (allowed 4 spilled 0)
decreasing counter for '.' in the '0x7fed97a25400/google.com/DS' to 1 (allowed 4 spilled 0)
dropping counter for '.' in the '0x7fed97e3e000/www.google.com/A' to 0 (allowed 4 spilled 0)
```
#### Everything else
Whatever `dns_view_findzonecut()` returns. This includes **forward-first** configurations.
Example with QNAME minimization:
```
increasing counter for '.' in the '0x7f4b9983e000/www.google.com/A' to 1 (allowed 1 spilled 0)
increasing counter for '.' in the '0x7f4b9b81a000/_.com/A' to 2 (allowed 2 spilled 0)
decreasing counter for '.' in the '0x7f4b9b81a000/_.com/A' to 1 (allowed 2 spilled 0)
increasing counter for 'com' in the '0x7f4b9b81a000/_.com/A' to 1 (allowed 1 spilled 0)
dropping counter for 'com' in the '0x7f4b9b81a000/_.com/A' to 0 (allowed 1 spilled 0)
dropping counter for '.' in the '0x7f4b9983e000/www.google.com/A' to 0 (allowed 2 spilled 0)
increasing counter for 'com' in the '0x7f4b9983e000/www.google.com/A' to 1 (allowed 1 spilled 0)
increasing counter for 'com' in the '0x7f4b9b81a000/_.google.com/A' to 2 (allowed 2 spilled 0)
decreasing counter for 'com' in the '0x7f4b9b81a000/_.google.com/A' to 1 (allowed 2 spilled 0)
increasing counter for 'google.com' in the '0x7f4b9b81a000/_.google.com/A' to 1 (allowed 1 spilled 0)
dropping counter for 'google.com' in the '0x7f4b9b81a000/_.google.com/A' to 0 (allowed 1 spilled 0)
dropping counter for 'com' in the '0x7f4b9983e000/www.google.com/A' to 0 (allowed 2 spilled 0)
increasing counter for 'google.com' in the '0x7f4b9983e000/www.google.com/A' to 1 (allowed 1 spilled 0)
increasing counter for 'com' in the '0x7f4b9b81c800/google.com/DS' to 1 (allowed 1 spilled 0)
increasing counter for 'com' in the '0x7f4b99027800/com/DNSKEY' to 2 (allowed 2 spilled 0)
decreasing counter for 'com' in the '0x7f4b99027800/com/DNSKEY' to 1 (allowed 2 spilled 0)
dropping counter for 'com' in the '0x7f4b9b81c800/google.com/DS' to 0 (allowed 2 spilled 0)
dropping counter for 'google.com' in the '0x7f4b9983e000/www.google.com/A' to 0 (allowed 1 spilled 0)
```
Example without QNAME minimization:
```
increasing counter for '.' in the '0x7fc30803e000/www.google.com/A' to 1 (allowed 1 spilled 0)
dropping counter for '.' in the '0x7fc30803e000/www.google.com/A' to 0 (allowed 1 spilled 0)
increasing counter for 'com' in the '0x7fc30803e000/www.google.com/A' to 1 (allowed 1 spilled 0)
dropping counter for 'com' in the '0x7fc30803e000/www.google.com/A' to 0 (allowed 1 spilled 0)
increasing counter for 'com' in the '0x7fc30803e000/www.google.com/A' to 1 (allowed 1 spilled 0)
dropping counter for 'com' in the '0x7fc30803e000/www.google.com/A' to 0 (allowed 1 spilled 0)
increasing counter for 'google.com' in the '0x7fc30803e000/www.google.com/A' to 1 (allowed 1 spilled 0)
dropping counter for 'google.com' in the '0x7fc30803e000/www.google.com/A' to 0 (allowed 1 spilled 0)
increasing counter for 'google.com' in the '0x7fc30803e000/www.google.com/A' to 1 (allowed 1 spilled 0)
increasing counter for 'com' in the '0x7fc307c28c00/google.com/DS' to 1 (allowed 1 spilled 0)
increasing counter for 'com' in the '0x7fc307c27800/com/DNSKEY' to 2 (allowed 2 spilled 0)
decreasing counter for 'com' in the '0x7fc307c27800/com/DNSKEY' to 1 (allowed 2 spilled 0)
dropping counter for 'com' in the '0x7fc307c28c00/google.com/DS' to 0 (allowed 2 spilled 0)
dropping counter for 'google.com' in the '0x7fc30803e000/www.google.com/A' to 0 (allowed 1 spilled 0)
```
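The bucket selection shown in the logs above can be summarized with a toy model. This is a deliberately simplified sketch, not BIND's data structure: the class `ZoneCounters`, its method names, and the `limit` value are assumptions for illustration, and the zone cut is passed in directly instead of being derived by `dns_view_findzonecut()`.

```python
from collections import defaultdict


class ZoneCounters:
    """Toy model of per-bucket fetch counters (not BIND's data structure)."""

    def __init__(self, limit):
        self.limit = limit               # stand-in for fetches-per-zone
        self.active = defaultdict(int)   # bucket name -> outstanding fetches
        self.spilled = defaultdict(int)  # bucket name -> refused fetches

    @staticmethod
    def bucket(zonecut, forward_only):
        # Forward-only mode charges every fetch to the root bucket.
        return "." if forward_only else zonecut

    def start(self, zonecut, forward_only=False):
        name = self.bucket(zonecut, forward_only)
        if self.limit and self.active[name] >= self.limit:
            self.spilled[name] += 1
            return False
        self.active[name] += 1
        return True

    def finish(self, zonecut, forward_only=False):
        self.active[self.bucket(zonecut, forward_only)] -= 1


counters = ZoneCounters(limit=2)
# Three unrelated zones, forward-only: all compete for the single '.' bucket.
results = [counters.start(z, forward_only=True) for z in ("com", "org", "net")]
print(results, dict(counters.spilled))  # [True, True, False] {'.': 1}
```

In the model, the same three fetches without `forward_only` land in three separate buckets and none is refused, which is the asymmetry the forward-only log illustrates.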
NOTE: `fetches-per-server` has a similar effect here, but `fetches-per-server` is more fine-grained.
BIND 9.21.x
https://gitlab.isc.org/isc-projects/bind9/-/issues/4219
Exempt from fetch-limits, fetches generated as a result of prefetch of something already in cache (opened as feature request, but could also be considered to be a design defect)
2023-12-19T09:19:57Z
Cathy Almond
### Description
Exempt fetches generated as a result of prefetch of something already in cache, from fetch limits
### Request
`fetches-per-zone` and `fetches-per-server` were originally designed to limit the number of pending fetches from a resolver to auth servers when the auth servers were failing to respond (thus causing a backlog of fetches). There is another use case for fetch limits that we have observed (see [Support ticket #18991](https://support.isc.org/Ticket/Display.html?id=18991)), in which fetch limits are used instead to limit concurrent queries to servers that are responding normally, even under DDoS query loads. In this situation the intention of applying the fetch limits is to limit the impact of the DDoS on the auth servers. But this has the unfortunate effect of also rate-limiting 'good' queries.
'Good' queries for popular names are only going to be sent to the authoritative servers on a cache miss, or when previously cached content is close to expiry. Therefore one additional potential mitigation would be to try to ensure that as much 'good' content as possible remains in cache, so that only the 'new' or 'bad' content queries cause fetches. If prefetches (of previously known 'good' content, not negative NXDOMAIN or NXRRSET) were allowed a free pass through the fetch-limits fence, then this would help in maintaining a good cache from which good query responses could be given, whilst at the same time rate-limiting the 'bad' queries being sent to the auth servers.
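The requested exemption amounts to one extra condition in front of the limit check. The sketch below is hypothetical: `admit_fetch` and its parameters are invented for illustration and do not correspond to functions in the resolver, and treating a limit of 0 as "no limit" is an assumption of the sketch.

```python
def admit_fetch(active, limit, is_prefetch=False, cached_positive=False):
    """Decide whether a new fetch may start under a per-bucket limit.

    active          -- fetches currently outstanding in the bucket
    limit           -- configured maximum; 0 is treated as "no limit"
    is_prefetch     -- the fetch refreshes an entry that is close to expiry
    cached_positive -- the cached entry is a positive answer, not NXDOMAIN/NXRRSET
    """
    if is_prefetch and cached_positive:
        return True  # proposed free pass: refresh of known-good cached data
    return limit == 0 or active < limit
```

Under this rule a saturated bucket still refuses new and negative-answer lookups, while refreshes of known-good cached content continue to go out.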
### Links / references
BIND 9.21.x
https://gitlab.isc.org/isc-projects/bind9/-/issues/4058
BIND resolver incorrectly handles NODATA/NOERROR (NXRRSET) query response when CNAME is queried during prefetch
2023-12-19T09:18:49Z
Cathy Almond
### Summary
An "A" fetch to an auth server returns "CNAME". But (it appears) with prefetch enabled (the default), when the "CNAME" is fetched, the authoritative server sends back noanswer/noerror (= NXRRSET). Clearly this is broken behaviour on the part of the auth servers (or they just changed their zone from providing a CNAME to providing an answer), but I still don't see why it breaks a BIND resolver, which should at this point just understand that the CNAME no longer exists and (as needed, as a result of client queries) query again from the beginning with the RTYPE the client needs to have resolved.
Instead, the resolver is returning the empty answer to querying clients (who are not querying for CNAME; they are querying the resolver for A).
See [Support ticket 22027](https://support.isc.org/Ticket/Display.html?id=22027)
### BIND version used
9.16.35-S1
### Steps to reproduce
We don't have a reproducer at this time, but see the Support ticket for more details on what's happening. You need an authoritative server that responds with a CNAME (with a valid target) when queried for A (or other) rtypes for a name, but when queried explicitly for CNAME, sends back noerror/noanswer (essentially NXRRSET). Then enable prefetch and keep querying the server for record type A until the CNAME is close to expiry and is therefore prefetched explicitly...
### What is the current *bug* behavior?
When we are handling a client query, we are making queries to cache or to authoritative servers (cache miss), but all of those queries are for the RTYPE that we want to resolve. We don't query for type CNAME. If we hit a CNAME along the way, then that will cause us to start a new query (from cache, or by initiating a fetch if we need to) using the target of the CNAME as the new name to be queried.
So far so good. This implies that the code that looks in cache and gets an answer from a fetch handles CNAME as a special case and that we likely look for or cache EXPLICITLY for CNAMEs while we're looking for the RTYPE that we actually want to resolve.
I suspect that we would not expect to find in cache an NXRRSET of type CNAME. Essentially this is meaningless to us: if the CNAME doesn't exist then any other record type might exist, we just don't know, so it might as well just not be there.
If we get back 'NXRRSET' from a fetch for type CNAME, do we even add it to cache, or does this result in us deleting the original CNAME RR?
Whatever we do with it, it appears to 'break' the cache so that clients get back NOANSWER (empty answer) instead of named doing another fetch based on the RTYPE of the client query made after this CNAME has been refreshed.
### What is the expected *correct* behavior?
Getting a 'NXRRSET' query response from an auth server that has explicitly been queried for a CNAME RR (to refresh what was in cache before - as instigated by prefetch) should not cause the cache to no longer be able to resolve queries for that name for other RTYPEs.
Subsequent client queries after receipt of the auth answer that says that the CNAME no longer exists, should cause new fetches to the auth server with the RTYPE of the client query in them.
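The expected behavior can be written down as a toy lookup routine. This is a heavily simplified model, not the resolver's cache code: `resolve`, the `NXRRSET` marker object, the dictionary-based cache, and the `fetch` callback are all assumptions made for illustration.

```python
NXRRSET = object()  # marker for a cached "name exists, type does not" answer


def resolve(cache, name, qtype, fetch, depth=0):
    """Toy lookup showing the expected handling of a negative CNAME entry."""
    if depth > 8:
        raise RuntimeError("CNAME chain too long")
    cname = cache.get((name, "CNAME"))
    if cname is not None and cname is not NXRRSET and qtype != "CNAME":
        # A positive CNAME redirects the lookup to its target.
        return resolve(cache, cname, qtype, fetch, depth + 1)
    # A negative CNAME entry says nothing about other types: fall through.
    answer = cache.get((name, qtype))
    if answer is None:
        answer = fetch(name, qtype)  # cache miss: ask with the client's qtype
        cache[(name, qtype)] = answer
    return [] if answer is NXRRSET else answer
```

The point of the sketch is the fall-through: a cached NXRRSET for CNAME leads to a fetch for the client's own RTYPE and is never returned as the answer to an A query.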
Is it remotely possible, however, that finding a CNAME in cache (since we already know that we do something special if we find it) but then finding that it's not a pointer to 'go look up this name instead of the one you had' but is instead NXRRSET (whoa, that wasn't what we expected to find!) could cause something aberrant to happen ... ? Or maybe this is a subtle race condition to do with replacing the CNAME with NXRRSET for the CNAME (or deleting it entirely) because of the query response from the auth server, and this happening as a result of the prefetch, but now racing with the next client query that is looking in cache?
### Relevant configuration files
No configs - nothing special needed, just prefetch enabled so that when the CNAME in cache is close to expiry, a client query will trigger a prefetch.
### Relevant logs and/or screenshots
N/A - please see support ticket for more details
### Possible fixes
N/A
(P.S. With the info available and with what we know it was very hard to complete this report per the template).
BIND 9.21.x
Matthijs Mekking <matthijs@isc.org>
https://gitlab.isc.org/isc-projects/bind9/-/issues/4444
TCP fallback does not happen on bind 9.18.11
2023-12-19T09:16:56Z
Shota Hino
### Summary
After upgrading bind9 from v9.18.10 to v9.18.11, we found that TCP fallback no longer happens. The previous behavior on bind v9.18.10 was that after UDP queries time out, named falls back to TCP. We identified that this change in behavior was introduced by this change https://gitlab.isc.org/isc-projects/bind9/-/merge_requests/7212.
### BIND version used
v9.18.11
### Steps to reproduce
1. Block UDP queries.
2. Observe that named does not fall back to TCP.
### What is the current *bug* behavior?
TCP fallback does not happen after UDP queries time out.
### What is the expected *correct* behavior?
After UDP timeout, named falls back to use TCP.
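The expected behavior corresponds to a retry loop of roughly the following shape. This is only an illustrative sketch and not the logic in `lib/dns/resolver.c`: `query_with_fallback`, the `send_udp`/`send_tcp` callables, and the default of three UDP attempts are assumptions.

```python
def query_with_fallback(send_udp, send_tcp, udp_attempts=3):
    """Try UDP a few times, then fall back to TCP if every attempt times out."""
    for _ in range(udp_attempts):
        try:
            return send_udp()
        except TimeoutError:
            continue  # UDP query timed out; retry
    return send_tcp()
```

If the attempt counter that drives the loop is never advanced, the TCP branch is never reached, which matches the symptom reported here.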
### Relevant configuration files
```
options {
    listen-on { 192.168.4.1; 127.0.0.1; }; # see warning above before changing
    version "not currently available";
    forwarders {
        75.75.75.75;
        75.75.76.76;
        8.8.8.8;
        8.8.4.4;
        208.67.222.222;
        208.67.220.220;
    };
    querylog yes;
    # Cache and forward
    recursion yes;
    forward only;
    # Enable dnssec
    dnssec-validation yes;
    auth-nxdomain no; # conform to RFC1035
    max-cache-size 2m;
    max-cache-ttl 3600;
    # Default path is at the root directory, which is not writable by bind.
    dump-file "/run/named/named_dump.db";
};
```
### Relevant logs and/or screenshots
This is the packet capture from the working version - v9.18.10. You can see that TCP fallback happens.
![Screenshot_from_2023-11-21_12-19-39](/uploads/9b85a9cac9e0b2900ac60c3baa619668/Screenshot_from_2023-11-21_12-19-39.png)
### Possible fixes
I have not fully traced the code, but TCP fallback might have been happening via the code here? https://gitlab.isc.org/isc-projects/bind9/-/blob/bind-9.18/lib/dns/resolver.c?ref_type=heads#L2600-2625.
With the said commit https://gitlab.isc.org/isc-projects/bind9/-/merge_requests/7249/diffs?commit_id=095f634f48d621da1e26e5ada5026b5427e0a0bb#23711d1cafba45a6f9e0b84f76c786435421f0e8_2631_2631, this may not happen if the retry counter never gets incremented. I could be wrong on this analysis because I did not carefully trace the code, so please take this with a grain of salt.
BIND 9.21.x
https://gitlab.isc.org/isc-projects/bind9/-/issues/4153
Run system tests in network namespaces
2023-12-14T15:27:50Z
Tom Krizek
Executing system tests under pytest should support isolation using network namespaces on platforms where it's possible. It would simplify running the tests (no root setup required), prevent any network interference, remove weird quirks with port assignment and make it easier to capture relevant traffic into PCAP.
Not planned
Tom Krizek
https://gitlab.isc.org/isc-projects/bind9/-/issues/4485
Update httppicoparser
2023-12-13T16:43:09Z
Ondřej Surý
This sounds like something we should eventually sync: https://github.com/h2o/picohttpparser/pull/78
BIND 9.19.x
Ondřej Surý