BIND issueshttps://gitlab.isc.org/isc-projects/bind9/-/issues2024-03-06T08:38:23Zhttps://gitlab.isc.org/isc-projects/bind9/-/issues/4592Improve the isc_heap resize algorithm2024-03-06T08:38:23ZOndřej SurýImprove the isc_heap resize algorithmThe current isc_heap resizing algorithm grows the array for holding the heap elements by 1024 (there's an argument to `isc_heap_create()`, but either default (1024) or explicit 1024 is used everywhere).The current isc_heap resizing algorithm grows the array for holding the heap elements by 1024 (there's an argument to `isc_heap_create()`, but either default (1024) or explicit 1024 is used everywhere).May 2024 (9.18.27, 9.18.27-S1, 9.19.24)https://gitlab.isc.org/isc-projects/bind9/-/issues/4591TTL-based cache cleaning in rbtdb ineffective (M/N problem)2024-03-20T13:50:45ZOndřej SurýTTL-based cache cleaning in rbtdb ineffective (M/N problem)Copying stuff from MatterMost for permanent record:
How it started?
> @ondrej: But I don't know actually - if there's something blocking the top of the heap to be cleaned up (dunno why yet), it could block cleaning everything else behin...Copying stuff from MatterMost for permanent record:
How it started?
> @ondrej: But I don't know actually - if there's something blocking the top of the heap to be cleaned up (dunno why yet), it could block cleaning everything else behind it
So, here's a summary of our finding
The TTL based cleaning is opportunistic. When we are inserting a new node, we look at the top of the heap ("list sorted by TTL-value") and if TTL of the top of the heap is expired, we try to eradicate the data it stores from the cache. The code that decides whether we will do the TTL-based cleaning looks like this:
```c
header = isc_heap_element(rbtdb->heaps[rbtnode->locknum], 1);
if (header != NULL) {
dns_ttl_t rdh_ttl = header->rdh_ttl;
/* Only account for stale TTL if cache is not overmem */
if (!cache_is_overmem) {
rdh_ttl += STALE_TTL(header, rbtdb);
}
if (rdh_ttl < now - RBTDB_VIRTUAL) {
expire_header(rbtdb, header, tree_locked,
expire_ttl);
}
}
```
Now, the RBTDB_VIRTUAL is this:
```
/*
* Allow clients with a virtual time of up to 5 minutes in the past to see
* records that would have otherwise have expired.
*/
#define RBTDB_VIRTUAL 300
```
Which means that even with 0 TTL records, the TTL-based cleaning will not kick-in until 5 minutes has passed.
Now what happens if you insert N records (where N could be large) within the first 5 minutes (cold bootstrap)?
After 5 minutes the TTL based cleaning should kick-in, but that's only if everything in the cache has expired, but even with that.
If we insert M records in the next minute, only M records will gets cleaned at maximum.
But anything that has longer TTL will just add to N, so the backlog of the records that are not subject to the TTL-based cleaning will build up over time until we are overmem and LRU based cleaning kicks in.
Additionally, @ondrej found out that there are records with `rdh_ttl` set to `0` if you just run `dnsperf` on a cold cache resolver, which made us puzzled - shouldn't we be cleaning records only after 5 minutes?
@michal did some excellent work and found out following:
so, i looked at those "zero TTL" frees that happen very shortly into the start of regular resolver operation
turns out that these are "superseded" headers
now, i don't understand why RBTDB does that. but it's this branch in add32() that expires these headers: https://gitlab.isc.org/isc-projects/bind9/-/blob/8fb49c5a8acdb37d4f4be0c5f5b19645d0103f48/lib/dns/rbtdb.c#L6677-6714
basically, AFAICT, what happens here is:
resolver: hey, RBTDB, add this shiny new SOA rdataset header to the cache
RBTDB: oh okay, let me see if i have a SOA header on that node already...
RBTDB: yup, got one. i'll just expire the one i had in there before (topheader) and insert the new one you gave me (newheader).
what i don't understand is why it just doesn't leave that old (identical, AFAICT) rdataset intact. but i'll move on with the reasoning for now.
so, the "previously-topmost" SOA header has its TTL set to zero, so it lands on the top of one of the TTL heaps
once we add a new rdataset, it should be cleaned up. but, as we already established yesterday, that only happens if the reference count of its node is zero.
meanwhile, in my very basic test (literally seconds into firing up dnsperf with the query set we use for respdiff tests, at some 1 kQPS), those rdatasets at the top of the heap are for nodes like net. or com. whose reference counts are around 60, or maybe 80 references. and until that count goes to zero, the first rdataset on the TTL heap will not disappear from there, blocking any rdataset headers on the same TTL heap from being cleaned up as well (backlog)
so, i thought "how bad is it and can we measure it?" and oh, boy, he could:
```
20-Feb-2024 10:46:29.561 connection refused resolving 'geo.fortinet.net/NS/IN': 104.198.248.186#53
20-Feb-2024 10:46:29.564 DNS format error from 203.205.195.75#53 resolving ns3.qq.com/AAAA for <unknown>: Name qq.com (SOA) not subdomain of zone ns3.qq.com -- invalid response
20-Feb-2024 10:46:29.564 FORMERR resolving 'ns3.qq.com/AAAA/IN': 203.205.195.75#53
20-Feb-2024 10:46:29.564 DNS format error from 108.136.87.44#53 resolving api.cloud.huawei.com/AAAA for 127.0.0.1#34840: Name huawei.com (SOA) not subdomain of zone cloud.huawei.com -- invalid response
20-Feb-2024 10:46:29.564 FORMERR resolving 'api.cloud.huawei.com/AAAA/IN': 108.136.87.44#53
-- 0 msecs since expiry
-- 0 msecs since expiry
-- 106 msecs since expiry
-- 106 msecs since expiry
-- 29 msecs since expiry
-- 29 msecs since expiry
20-Feb-2024 10:46:29.574 success resolving 'a.karma.sc2.fsapi.com/A' after disabling qname minimization due to 'ncache nxdomain'
-- 243 msecs since expiry
-- 246 msecs since expiry
-- 249 msecs since expiry
-- 433 msecs since expiry
-- 806 msecs since expiry
-- 813 msecs since expiry
-- 243 msecs since expiry
-- 246 msecs since expiry
-- 249 msecs since expiry
-- 433 msecs since expiry
-- 806 msecs since expiry
-- 0 msecs since expiry
-- 816 msecs since expiry
-- 816 msecs since expiry
-- 123 msecs since expiry
-- 323 msecs since expiry
-- 123 msecs since expiry
-- 323 msecs since expiry
20-Feb-2024 10:46:29.601 success resolving 'F00DCFRIBM41.zf.if.atcsg.net/A' after disabling qname minimization due to 'ncache nxdomain'
20-Feb-2024 10:46:29.624 REFUSED unexpected RCODE resolving 'cloud.Jensen.com.cn/NS/IN': 47.118.199.194#53
-- 0 msecs since expiry
-- 0 msecs since expiry
20-Feb-2024 10:46:29.677 DNS format error from 3.97.163.50#53 resolving api.cloud.huawei.com/AAAA for 127.0.0.1#34840: Name huawei.com (SOA) not subdomain of zone cloud.huawei.com -- invalid response
20-Feb-2024 10:46:29.677 FORMERR resolving 'api.cloud.huawei.com/AAAA/IN': 3.97.163.50#53
-- 6 msecs since expiry
-- 3 msecs since expiry
-- 9 msecs since expiry
-- 0 msecs since expiry
20-Feb-2024 10:46:29.801 REFUSED unexpected RCODE resolving 'cloud.Jensen.com.cn/NS/IN': 39.96.153.33#53
-- 3 msecs since expiry
-- 3 msecs since expiry
-- 46 msecs since expiry
-- 19 msecs since expiry
-- 0 msecs since expiry
-- 39 msecs since expiry
-- 0 msecs since expiry
-- 0 msecs since expiry
-- 3 msecs since expiry
-- 3 msecs since expiry
-- 126 msecs since expiry
-- 0 msecs since expiry
-- 0 msecs since expiry
-- 0 msecs since expiry
20-Feb-2024 10:46:30.104 REFUSED unexpected RCODE resolving 'cloud.Jensen.com.cn/NS/IN': 139.224.142.104#53
-- 286 msecs since expiry
-- 0 msecs since expiry
-- 69 msecs since expiry
-- 13 msecs since expiry
20-Feb-2024 10:46:30.307 success resolving 'hcdnd.csfw.c.cdnhwc6.com/AAAA' after disabling qname minimization due to 'ncache nxdomain'
-- 0 msecs since expiry
-- 199 msecs since expiry
-- 269 msecs since expiry
-- 656 msecs since expiry
-- 689 msecs since expiry
-- 746 msecs since expiry
-- 789 msecs since expiry
-- 816 msecs since expiry
-- 819 msecs since expiry
-- 819 msecs since expiry
-- 839 msecs since expiry
-- 853 msecs since expiry
-- 873 msecs since expiry
-- 876 msecs since expiry
-- 889 msecs since expiry
-- 896 msecs since expiry
-- 896 msecs since expiry
-- 916 msecs since expiry
-- 936 msecs since expiry
-- 956 msecs since expiry
-- 969 msecs since expiry
-- 979 msecs since expiry
-- 983 msecs since expiry
-- 999 msecs since expiry
-- 999 msecs since expiry
-- 1006 msecs since expiry
-- 1006 msecs since expiry
-- 1053 msecs since expiry
-- 1053 msecs since expiry
-- 1063 msecs since expiry
-- 1066 msecs since expiry
-- 1073 msecs since expiry
-- 1079 msecs since expiry
-- 1083 msecs since expiry
-- 1089 msecs since expiry
-- 1096 msecs since expiry
-- 1133 msecs since expiry
-- 1139 msecs since expiry
-- 1156 msecs since expiry
-- 1163 msecs since expiry
-- 1166 msecs since expiry
-- 1179 msecs since expiry
-- 1183 msecs since expiry
-- 1189 msecs since expiry
-- 1193 msecs since expiry
-- 1196 msecs since expiry
-- 1199 msecs since expiry
-- 1223 msecs since expiry
-- 1223 msecs since expiry
-- 1249 msecs since expiry
-- 1259 msecs since expiry
-- 1276 msecs since expiry
-- 1293 msecs since expiry
-- 1336 msecs since expiry
-- 1363 msecs since expiry
-- 1369 msecs since expiry
-- 1379 msecs since expiry
-- 1393 msecs since expiry
-- 1396 msecs since expiry
-- 1399 msecs since expiry
-- 1403 msecs since expiry
-- 1423 msecs since expiry
-- 1429 msecs since expiry
-- 1436 msecs since expiry
-- 1436 msecs since expiry
-- 1439 msecs since expiry
-- 1446 msecs since expiry
-- 1446 msecs since expiry
-- 1466 msecs since expiry
-- 1466 msecs since expiry
-- 1469 msecs since expiry
-- 1473 msecs since expiry
-- 1486 msecs since expiry
-- 1503 msecs since expiry
-- 1506 msecs since expiry
-- 1516 msecs since expiry
-- 1539 msecs since expiry
-- 1539 msecs since expiry
-- 1556 msecs since expiry
-- 1566 msecs since expiry
-- 1569 msecs since expiry
-- 1589 msecs since expiry
-- 1623 msecs since expiry
-- 1623 msecs since expiry
-- 1629 msecs since expiry
-- 1636 msecs since expiry
-- 1636 msecs since expiry
-- 1649 msecs since expiry
-- 1679 msecs since expiry
-- 1686 msecs since expiry
-- 1703 msecs since expiry
-- 1703 msecs since expiry
-- 1716 msecs since expiry
-- 1726 msecs since expiry
-- 1736 msecs since expiry
-- 1773 msecs since expiry
-- 1796 msecs since expiry
-- 1796 msecs since expiry
-- 1799 msecs since expiry
-- 1803 msecs since expiry
-- 1829 msecs since expiry
-- 1839 msecs since expiry
-- 1876 msecs since expiry
-- 1879 msecs since expiry
-- 1883 msecs since expiry
-- 1889 msecs since expiry
-- 1889 msecs since expiry
-- 1889 msecs since expiry
-- 1899 msecs since expiry
-- 1946 msecs since expiry
-- 1956 msecs since expiry
-- 1983 msecs since expiry
-- 2013 msecs since expiry
-- 2013 msecs since expiry
-- 2023 msecs since expiry
-- 2049 msecs since expiry
-- 2059 msecs since expiry
-- 2063 msecs since expiry
-- 2079 msecs since expiry
-- 2106 msecs since expiry
-- 2109 msecs since expiry
-- 2119 msecs since expiry
-- 2136 msecs since expiry
-- 2156 msecs since expiry
-- 2183 msecs since expiry
-- 2199 msecs since expiry
-- 2203 msecs since expiry
-- 2219 msecs since expiry
-- 2223 msecs since expiry
-- 2229 msecs since expiry
-- 2243 msecs since expiry
-- 2253 msecs since expiry
-- 2263 msecs since expiry
-- 2286 msecs since expiry
-- 2303 msecs since expiry
-- 2343 msecs since expiry
-- 2356 msecs since expiry
-- 2376 msecs since expiry
-- 2376 msecs since expiry
-- 2379 msecs since expiry
-- 2386 msecs since expiry
-- 2416 msecs since expiry
-- 2426 msecs since expiry
-- 2433 msecs since expiry
-- 2449 msecs since expiry
-- 2466 msecs since expiry
-- 2479 msecs since expiry
-- 2509 msecs since expiry
-- 2516 msecs since expiry
-- 2523 msecs since expiry
-- 2529 msecs since expiry
-- 2536 msecs since expiry
-- 2539 msecs since expiry
-- 2539 msecs since expiry
-- 2543 msecs since expiry
-- 2553 msecs since expiry
-- 2559 msecs since expiry
-- 2569 msecs since expiry
-- 2593 msecs since expiry
-- 2609 msecs since expiry
-- 2616 msecs since expiry
-- 2623 msecs since expiry
-- 2629 msecs since expiry
-- 2656 msecs since expiry
-- 2659 msecs since expiry
-- 2673 msecs since expiry
-- 2716 msecs since expiry
-- 2766 msecs since expiry
-- 2773 msecs since expiry
-- 2773 msecs since expiry
-- 2776 msecs since expiry
-- 2783 msecs since expiry
-- 2789 msecs since expiry
-- 2823 msecs since expiry
-- 2843 msecs since expiry
-- 2879 msecs since expiry
-- 2879 msecs since expiry
-- 2886 msecs since expiry
-- 2899 msecs since expiry
-- 2919 msecs since expiry
-- 2926 msecs since expiry
-- 2949 msecs since expiry
-- 2956 msecs since expiry
-- 2956 msecs since expiry
-- 2956 msecs since expiry
-- 2979 msecs since expiry
-- 2983 msecs since expiry
-- 2983 msecs since expiry
-- 2986 msecs since expiry
-- 3033 msecs since expiry
-- 3039 msecs since expiry
-- 3049 msecs since expiry
-- 3066 msecs since expiry
-- 3086 msecs since expiry
-- 3126 msecs since expiry
-- 3129 msecs since expiry
-- 3156 msecs since expiry
-- 3169 msecs since expiry
-- 3173 msecs since expiry
-- 3176 msecs since expiry
-- 3226 msecs since expiry
-- 3253 msecs since expiry
-- 3266 msecs since expiry
-- 3269 msecs since expiry
-- 3279 msecs since expiry
-- 3279 msecs since expiry
-- 3299 msecs since expiry
-- 3323 msecs since expiry
-- 3323 msecs since expiry
-- 3336 msecs since expiry
-- 3336 msecs since expiry
-- 3363 msecs since expiry
-- 3366 msecs since expiry
-- 3396 msecs since expiry
-- 3399 msecs since expiry
-- 3409 msecs since expiry
-- 3449 msecs since expiry
-- 3463 msecs since expiry
-- 3466 msecs since expiry
-- 3499 msecs since expiry
-- 0 msecs since expiry
-- 199 msecs since expiry
-- 269 msecs since expiry
-- 656 msecs since expiry
-- 689 msecs since expiry
-- 746 msecs since expiry
-- 789 msecs since expiry
-- 816 msecs since expiry
-- 819 msecs since expiry
-- 819 msecs since expiry
-- 839 msecs since expiry
-- 853 msecs since expiry
-- 873 msecs since expiry
-- 876 msecs since expiry
-- 889 msecs since expiry
-- 896 msecs since expiry
-- 896 msecs since expiry
-- 916 msecs since expiry
-- 936 msecs since expiry
-- 956 msecs since expiry
-- 969 msecs since expiry
-- 979 msecs since expiry
-- 983 msecs since expiry
-- 999 msecs since expiry
-- 999 msecs since expiry
-- 1006 msecs since expiry
-- 1006 msecs since expiry
-- 1053 msecs since expiry
-- 1053 msecs since expiry
-- 1063 msecs since expiry
-- 1066 msecs since expiry
-- 1073 msecs since expiry
-- 1079 msecs since expiry
-- 1083 msecs since expiry
-- 1089 msecs since expiry
-- 1096 msecs since expiry
-- 1133 msecs since expiry
-- 1139 msecs since expiry
-- 1156 msecs since expiry
-- 1163 msecs since expiry
-- 1166 msecs since expiry
-- 1179 msecs since expiry
-- 1183 msecs since expiry
-- 1189 msecs since expiry
-- 1196 msecs since expiry
-- 1199 msecs since expiry
-- 1203 msecs since expiry
-- 1226 msecs since expiry
-- 1226 msecs since expiry
-- 1253 msecs since expiry
-- 1263 msecs since expiry
-- 1279 msecs since expiry
-- 1296 msecs since expiry
-- 1339 msecs since expiry
-- 1366 msecs since expiry
-- 1373 msecs since expiry
-- 1383 msecs since expiry
-- 1396 msecs since expiry
-- 1399 msecs since expiry
-- 1403 msecs since expiry
-- 1406 msecs since expiry
-- 1426 msecs since expiry
-- 1433 msecs since expiry
-- 1439 msecs since expiry
-- 1439 msecs since expiry
-- 1443 msecs since expiry
-- 1449 msecs since expiry
-- 1449 msecs since expiry
-- 1469 msecs since expiry
-- 1469 msecs since expiry
-- 1473 msecs since expiry
-- 1476 msecs since expiry
-- 1489 msecs since expiry
-- 1506 msecs since expiry
-- 1509 msecs since expiry
-- 1519 msecs since expiry
-- 1543 msecs since expiry
-- 1543 msecs since expiry
-- 1559 msecs since expiry
-- 1569 msecs since expiry
-- 1573 msecs since expiry
-- 1593 msecs since expiry
-- 1626 msecs since expiry
-- 1626 msecs since expiry
-- 1633 msecs since expiry
-- 1639 msecs since expiry
-- 1639 msecs since expiry
-- 1653 msecs since expiry
-- 1683 msecs since expiry
-- 1689 msecs since expiry
-- 1706 msecs since expiry
-- 1706 msecs since expiry
-- 1719 msecs since expiry
-- 1729 msecs since expiry
-- 1739 msecs since expiry
-- 1776 msecs since expiry
-- 1799 msecs since expiry
-- 1799 msecs since expiry
-- 1803 msecs since expiry
-- 1806 msecs since expiry
-- 1833 msecs since expiry
-- 1843 msecs since expiry
-- 1879 msecs since expiry
-- 1883 msecs since expiry
-- 1886 msecs since expiry
-- 1893 msecs since expiry
-- 1893 msecs since expiry
-- 1893 msecs since expiry
-- 1903 msecs since expiry
-- 1949 msecs since expiry
-- 1959 msecs since expiry
-- 1986 msecs since expiry
-- 2016 msecs since expiry
-- 2016 msecs since expiry
-- 2026 msecs since expiry
-- 2053 msecs since expiry
-- 2063 msecs since expiry
-- 2066 msecs since expiry
-- 2083 msecs since expiry
-- 2109 msecs since expiry
-- 2113 msecs since expiry
-- 2123 msecs since expiry
-- 2139 msecs since expiry
-- 2159 msecs since expiry
-- 2186 msecs since expiry
-- 2203 msecs since expiry
-- 2206 msecs since expiry
-- 2223 msecs since expiry
-- 2226 msecs since expiry
-- 2233 msecs since expiry
-- 2246 msecs since expiry
-- 2256 msecs since expiry
-- 2266 msecs since expiry
-- 2289 msecs since expiry
-- 2306 msecs since expiry
-- 2346 msecs since expiry
-- 2359 msecs since expiry
-- 2379 msecs since expiry
-- 2379 msecs since expiry
-- 2383 msecs since expiry
-- 2389 msecs since expiry
-- 2419 msecs since expiry
-- 2429 msecs since expiry
-- 2436 msecs since expiry
-- 2453 msecs since expiry
-- 2469 msecs since expiry
-- 2483 msecs since expiry
-- 2513 msecs since expiry
-- 2519 msecs since expiry
-- 2526 msecs since expiry
-- 2533 msecs since expiry
-- 2539 msecs since expiry
-- 2543 msecs since expiry
-- 2543 msecs since expiry
-- 2546 msecs since expiry
-- 2556 msecs since expiry
-- 2563 msecs since expiry
-- 2573 msecs since expiry
-- 2596 msecs since expiry
-- 2613 msecs since expiry
-- 2619 msecs since expiry
-- 2626 msecs since expiry
-- 2633 msecs since expiry
-- 2659 msecs since expiry
-- 2663 msecs since expiry
-- 2676 msecs since expiry
-- 2719 msecs since expiry
-- 2769 msecs since expiry
-- 2776 msecs since expiry
-- 2776 msecs since expiry
-- 2779 msecs since expiry
-- 2786 msecs since expiry
-- 2793 msecs since expiry
-- 2826 msecs since expiry
-- 2846 msecs since expiry
-- 2883 msecs since expiry
-- 2883 msecs since expiry
-- 2889 msecs since expiry
-- 2903 msecs since expiry
-- 2923 msecs since expiry
-- 2929 msecs since expiry
-- 2956 msecs since expiry
-- 2959 msecs since expiry
-- 2959 msecs since expiry
-- 2959 msecs since expiry
-- 2983 msecs since expiry
-- 2986 msecs since expiry
-- 2986 msecs since expiry
-- 2989 msecs since expiry
-- 3036 msecs since expiry
-- 3043 msecs since expiry
-- 3053 msecs since expiry
-- 3069 msecs since expiry
-- 3089 msecs since expiry
-- 3129 msecs since expiry
-- 3133 msecs since expiry
-- 3159 msecs since expiry
-- 3173 msecs since expiry
-- 3176 msecs since expiry
-- 3179 msecs since expiry
-- 3229 msecs since expiry
-- 3256 msecs since expiry
-- 3269 msecs since expiry
-- 3273 msecs since expiry
-- 3283 msecs since expiry
-- 3283 msecs since expiry
-- 3303 msecs since expiry
-- 3326 msecs since expiry
-- 3326 msecs since expiry
-- 3339 msecs since expiry
-- 3339 msecs since expiry
-- 3366 msecs since expiry
-- 3369 msecs since expiry
-- 3399 msecs since expiry
-- 3403 msecs since expiry
-- 3413 msecs since expiry
-- 3453 msecs since expiry
-- 3466 msecs since expiry
-- 3469 msecs since expiry
-- 3 msecs since expiry
20-Feb-2024 10:46:30.324 REFUSED unexpected RCODE resolving 'cloud.Jensen.com.cn/NS/IN': 39.96.153.53#53
-- 0 msecs since expiry
-- 0 msecs since expiry
20-Feb-2024 10:46:30.524 REFUSED unexpected RCODE resolving 'cloud.Jensen.com.cn/NS/IN': 39.96.153.32#53
-- 0 msecs since expiry
```
So, there are multiple issues circling around the TTL-based cleaning.March 2024 (9.16.49, 9.16.49-S1, 9.18.25, 9.18.25-S1, 9.19.22)https://gitlab.isc.org/isc-projects/bind9/-/issues/4589Adjust SyncPublish Interval for dnssec-policy to minimize delay2024-02-20T05:29:16ZAzizbek KadirovAdjust SyncPublish Interval for dnssec-policy to minimize delayWhen enabling the dnssec-policy for a zone in BIND9, I've noticed that the default SyncPublish interval is set to 1 day + 5 minutes.
Steps to reproduce:
1. Create a zone:
```plaintext
zone "example.com" {
type master;
file "/var/cac...When enabling the dnssec-policy for a zone in BIND9, I've noticed that the default SyncPublish interval is set to 1 day + 5 minutes.
Steps to reproduce:
1. Create a zone:
```plaintext
zone "example.com" {
type master;
file "/var/cache/bind/zones/example.com.zone";
dnssec-policy "default";
inline-signing yes;
key-directory "/var/cache/bind/keys/example.com";
};
```
2. reload bind with `rndc reload`
3. 3 files generated: .key, .private, .state.
4. Inside file, in metadata we can see following:
```plaintext
; This is a key-signing key, keyid 9061, for example.com.
; Created: 20240219101033 (Mon Feb 19 15:10:33 2024)
; Publish: 20240219101033 (Mon Feb 19 15:10:33 2024)
; Activate: 20240219101033 (Mon Feb 19 15:10:33 2024)
; SyncPublish: 20240220111533 (Tue Feb 20 16:15:33 2024)
```
5. It appears more efficient to reduce this interval to just +5 minutes. Currently, the delay incurred by the default interval might lead to potential synchronization issues or delays in propagating changes. By minimizing the interval to +5 minutes, we can ensure timely synchronization of DNSSEC-related updates without unnecessary delay. I couldn't find how to reduce SyncPublish time using custom dnssec-policyhttps://gitlab.isc.org/isc-projects/bind9/-/issues/4588CID 486508: Control flow issues in lib/ns/query.c2024-02-21T10:51:33ZMichal NowakCID 486508: Control flow issues in lib/ns/query.cCoverity Scan claims a control flow issue in `lib/ns/query.c`.
```cpp
/lib/ns/query.c: 6307 in fetch_callback()
6301 * Return an error to the client, or just drop.
6302 */
6303 if (fetch_canceled) {
6304 CTRACE...Coverity Scan claims a control flow issue in `lib/ns/query.c`.
```cpp
/lib/ns/query.c: 6307 in fetch_callback()
6301 * Return an error to the client, or just drop.
6302 */
6303 if (fetch_canceled) {
6304 CTRACE(ISC_LOG_ERROR, "fetch cancelled");
6305 query_error(client, DNS_R_SERVFAIL, __LINE__);
6306 } else {
>>> CID 486508: Control flow issues (DEADCODE)
>>> Execution cannot reach this statement: "query_next(client, ISC_R_CA...".
6307 query_next(client, ISC_R_CANCELED);
6308 }
6309
6310 /*
6311 * Free any persistent plugin data that was allocated to
6312 * service the client, then detach the client object.
```March 2024 (9.16.49, 9.16.49-S1, 9.18.25, 9.18.25-S1, 9.19.22)Arаm SаrgsyаnArаm Sаrgsyаnhttps://gitlab.isc.org/isc-projects/bind9/-/issues/4586Don't count expired / future RRSIGs in verification failure quota2024-02-24T08:17:15ZMark AndrewsDon't count expired / future RRSIGs in verification failure quotaExpired / future RRSIGs don't trigger a public key verification.Expired / future RRSIGs don't trigger a public key verification.May 2024 (9.18.27, 9.18.27-S1, 9.19.24)https://gitlab.isc.org/isc-projects/bind9/-/issues/4585Add an option to named-compilezone to retain comments2024-02-18T03:27:07ZMarco DavidsAdd an option to named-compilezone to retain comments### Description
`named-compilezone` strips comments from zone files.
### Request
There might be use cases, where `named-compilezone` is used as a cleanup tool, while any comments that are present need to be retained.
It would be gre...### Description
`named-compilezone` strips comments from zone files.
### Request
There might be use cases, where `named-compilezone` is used as a cleanup tool, while any comments that are present need to be retained.
It would be great if this could be achieved by some command-line option.
Obviously there are some caveats, but it seems that these can be addressed by defining certain conditions that must be met and by properly documenting the right way of working. For example, just as a suggestion: to have any comments on the same line in the zonefile like this:
Undefined (may fail):
```
; comment 1
www AAAA 2001:db8::1
; comment 2
```
Well defined (will work):
`www AAAA 2001:db8::1 ; comment 1`
### Links / references
n/aNot plannedhttps://gitlab.isc.org/isc-projects/bind9/-/issues/4584Feature request: streamline the behavior of 'hostname' and 'server-id'2024-02-15T15:24:10ZMarco DavidsFeature request: streamline the behavior of 'hostname' and 'server-id'### Description
`hostname` default is the hostname of the machine hosting the name server, as found by the gethostname() function.
`server-id` default is `none`.
This may lead to confusion. Especially since other vendors (Unbound, Kno...### Description
`hostname` default is the hostname of the machine hosting the name server, as found by the gethostname() function.
`server-id` default is `none`.
This may lead to confusion. Especially since other vendors (Unbound, Knot) do it differently by combining the two (via the `identity` option).
### Request
To put both option-defaults more inline with each other. And perhaps even to consider to combine them, instead of treating them as separate entities. RFC4892 seams to suggest they are alternative ways of pointing to the same thing and this is also how Knot and Unbound interpret it.
Perhaps, while at it, also consider the usefulness of SOA and NS records in the reply (in the case of `CH ANY version.bind` and `CH ANY authors.bind`), because they are still shown, even if `server-id` is undefined, while their existence is not clear.
### Links / references
RFC4892 and RFC5001 may be relevant.https://gitlab.isc.org/isc-projects/bind9/-/issues/4583BIND Upgrade2024-02-15T12:44:12ZGitLab Support BotBIND UpgradeHello,
Our bind version seems below. How can we upgrade bind version?
And if we upgrade bind version, is there any problem?
[root@ns2 ~]# named -v
BIND 9.11.36-RedHat-9.11.36-11.el8_9 (Extended Support Version) <id:68dbd5b>
Thanks
SemraHello,
Our bind version seems below. How can we upgrade bind version?
And if we upgrade bind version, is there any problem?
[root@ns2 ~]# named -v
BIND 9.11.36-RedHat-9.11.36-11.el8_9 (Extended Support Version) <id:68dbd5b>
Thanks
Semrahttps://gitlab.isc.org/isc-projects/bind9/-/issues/4582Add support for QUIC and DNS over QUIC/DoQ (RFC 9250)2024-03-05T20:04:55ZArtem BoldarievAdd support for QUIC and DNS over QUIC/DoQ (RFC 9250)_This section is very likely to be changed/updated/expanded in the future._
## Overview
One of the relatively recent additions of transport protocols for DNS is QUIC (DoQ, covered by [RFC 9250](https://www.rfc-editor.org/rfc/rfc9250.ht..._This section is very likely to be changed/updated/expanded in the future._
## Overview
One of the relatively recent additions of transport protocols for DNS is QUIC (DoQ, covered by [RFC 9250](https://www.rfc-editor.org/rfc/rfc9250.html)) and HTTP/3 (DoH3), which also works on top of QUIC.
We need a generic implementation of the QUIC protocol in BIND's codebase to proceed with these new transports.
QUIC is a sophisticated transport that works on top of UDP and uses encryption on top of TLSv1.3. It is covered by multiple RFCs and is being actively worked on. The list of RFCs includes the following:
- https://www.rfc-editor.org/rfc/rfc9250.html
- https://www.rfc-editor.org/rfc/rfc8999.html
- https://www.rfc-editor.org/rfc/rfc9000.html
- https://www.rfc-editor.org/rfc/rfc9001.html
- https://www.rfc-editor.org/rfc/rfc9002.html
- https://www.rfc-editor.org/rfc/rfc9369.html
The protocol includes a lot of functionality, resembling the one from HTTP/2. Most notably, it is multiple uni- or bi-directional streams per connection. This aspect might have been influenced by the need to carry a new version of HTTP protocol (HTTP/3), which uses the multi-stream nature of QUIC instead of protocol-specific multiplexing found in HTTP/2.
Each bi-directional stream (in which we are the most interested, as these are used for DoQ) from the point of view of the higher-level code acts similarly to a TCP connection, though in the case of DNS, no more than one request/query per stream is allowed. That means that DNS pipelining is achieved by relying on the multistream nature of QUIC (again, similarly to HTTP/2). The thing is that each stream (being, effectively, a separate "connection") shares TLS parameters with others.
Another important aspect of QUIC is client connection end-point migration. That is, a client may change its location (= IP address and UDP port) while keeping the connection active. That functionality requires complete virtualisation of networking connections, which are now being identified not by IP address and port combination but by abstract connection identifiers (connection IDs). That brings a notion of a "connection path" into the picture as well as a procedure of "path migration" for a client.
Though that is my personal opinion, QUIC seems to be at least partially a result of large companies' experience of running user-space TCP/IP stacks on scale - when TCP/IP stack is implemented as a user-space library in an application which "directly" uses a dedicated network card. Indeed, in the case of QUIC, many things that we expect in-kernel TCP/IP implementation to do are brought under the control of a user-space application, including, but not limited to, such intricacies as congestion control. In this case, UDP can be seen as a portable kernel interface to the network card with the additional advantage of making it possible to run multiple applications simultaneously using a network card (which is not the case when using user-space TCP/IP stacks).
QUIC is often described as a replacement for the TCP protocol. One of the authors of "Computer Networks: A Systems Approach" [argues](https://www.theregister.com/2022/10/07/quic_tcp_replacement/) that it is, in fact, an addition to the internet protocols suite, which is meant to implement a missing paradigm - a basis for Remote Procedure Calls (RPC) protocols. That is, it might be considered a replacement of the TCP only in the cases when TCP was used due to a lack of a better alternative.
It should be noted that while QUIC is meant to be used as a universal transport for DNS, it, unlike HTTP/2 (DoH), can be used for zone transfers as well. It is [not always guaranteed](https://www.theregister.com/2021/08/04/dissecting_performance_of_production_quic/) that it will provide a significant performance boost. In fact, it might require more traffic in some cases, as even the initial QUIC message should be no less than 1200 bytes, which is a lot by common DNS measurements. However, it might compensate for that by lower latency due to 0-RTT support. Also, it seems to be more like a client-oriented protocol due to the ability to migrate client connections to new addresses (which is great for portable mobile devices). That being said, it is not clear how beneficial it would be to use it for server-to-server communications for things like zone transfers: servers do not change addresses often, and the protocol itself is more verbose than, e.g. DoT, so I fail to see the immediate benefits for this case (although it is standardised).
## The State of Open Source QUIC Implementations and the Great OpenSSL Schism
As it was stated before, QUIC includes TLSv1.3. Thus, most implementations decided to dedicate TLS-specific functionality to base the crypto-related bits to OpenSSL and its derivatives, which makes sense as these libraries are widely deployed and used. However, initially, these libraries lacked QUIC-specific parts in their TLS implementations.
As the early QUIC adopters were also QUIC implementors, they forked OpenSSL and added the missing bits. The related changes to the API ended up in multiple OpenSSL forks, namely [QuicTLS](https://quictls.github.io/) (maintained by Akamai and Microsoft), [LibreSSL](https://www.libressl.org/) (maintained by OpenBSD), and [BoringSSL](https://boringssl.googlesource.com/boringssl) (maintained by Google). These libraries only implement the low-level TLS 1.3 bits related to QUIC but do not have any internal QUIC implementations, as their early adopters developed their own.
For quite some time, the original OpenSSL had [a merge request](https://github.com/openssl/openssl/pull/8797) opened to implement the same API and remain mostly compatible with its forks, as was the case for a long time. However, OpenSSL authors decided to provide a high-level implementation of QUIC of their own making and eventually closed the MR. That caused a lot of drama, about which you can read [here](https://daniel.haxx.se/blog/2021/10/25/the-quic-api-openssl-will-not-provide/) or [here](https://github.com/haproxy/haproxy/issues/680#issuecomment-1433118828) as well as in other places. Regarding OpenSSL, it is worth keeping multiple things in mind.
Firstly, [OpenSSL's implementation](https://www.openssl.org/docs/manmaster/man7/openssl-quic.html) is not ready yet, as it includes only minimal client-side implementation starting from the version of OpenSSL v3.2. The server-side support was planned for 3.3 (April 2024) according to [the project's roadmap](https://www.openssl.org/roadmap.html), but eventually, it was moved to 3.4 (October 2024), as there are [many not completed tasks](https://github.com/orgs/openssl/projects/2/views/31?pane=issue&itemId=31713456), some of them are marked as "Epic". Even after that, we most likely will have only the most basic (_Minimal Working Product_) implementation that is not as battle-tested as some others and with missing features.
Secondly, OpenSSL does not allow the use of third-party implementations of QUIC: with OpenSSL, the only option is to use the internal QUIC implementation. That was [noted by Tatsuhiro Tsujikawa](https://github.com/ngtcp2/ngtcp2/issues/898#issuecomment-1692538880), the principal author of nghttp2/ngtcp2/nghttp3.
As a result of these decisions, most QUIC implementations chose to depend on QuicTLS and other forks that provide similar API. That list includes [MS-QUIC](https://github.com/microsoft/msquic), [Quiche](https://blog.cloudflare.com/enjoy-a-slice-of-quic-and-rust/), Chromium QUIC, as well as internal (=not exposed as a redistributable library as it is very specific) implementation in HAProxy. Probably, most other libraries are likely doing the same.
There are notable exceptions to this, though.
Firstly, NGINX does not depend on a fork-related functionality to implement QUIC support, nor does it depend on the OpenSSL implementation of QUIC. One could get this impression after reading [the announcement of this functionality in NGINX](https://www.nginx.com/blog/quic-http3-support-openssl-nginx/), which discusses that they implement _only_ OpenSSL compatibility layer in order to remain compatible with both OpenSSL and its now numerous forks. It should be noted, though, that in this particular case, whoever wrote the announcement was modest, as, in fact, NGINX includes [their own in-house QUIC implementation](https://github.com/nginx/nginx/tree/master/src/event/quic). And yes, it seems that they managed to do it without relying on any QUIC-related functionality in OpenSSL or its forks (like QuicTLS, LibreSSL, and BoringTLS). Their code seems to work with basically any OpenSSL-like library with TLSv1.3 support.
Secondly, there is [ngtcp2](https://github.com/ngtcp2/ngtcp2), which itself does not depend on any cryptographic library per se. The library itself may use one of the provided backends implemented on top of QuicTLS, GnuTLS, PicoTLS or BoringSSL and implemented as separate libraries, but it does not have to, as an application can provide a custom implementation of the [ngtcp2 crypto API](https://nghttp2.org/ngtcp2/crypto_apiref.html) as described in [the programmer's guide](https://nghttp2.org/ngtcp2/programmers-guide.html). The library itself is [very complete](https://nghttp2.org/ngtcp2/apiref.html) and seems to implement all the intricacies of the QUIC protocol, unlike the OpenSSL's implementation at the time of writing (and likely, for quiet some time after that).
At this point, it is clear that as far as QUIC support goes, the OpenSSL and its numerous forks **will remain incompatible**. _That is not a technical decision and, thus, hard to resolve._
## Implementing QUIC in BIND
For the reasons given above, I think that we should choose the ngtcp2 library as a basis for our QUIC transport. Additionally, it is much more mature than the implementation that OpenSSL will initially provide as and has been considered stable for a while (1.0.0 was released Oct 15, 2023). Moreover, before implementing the final IETF QUIC, it implemented a number of drafts, so it is safe to say that the authors have been tracking the development of the protocol very closely. Considering that the most recent RFCs have been implemented as well (like [QUIC version 2](https://www.rfc-editor.org/rfc/rfc9369.html), which, in fact, is a minor update of the protocol), it appears to still be the case. We can pair it with our own crypto API implementation inspired by the code from NGINX and currently provided crypto libraries. That will ensure that BIND still remains compatible with other OpenSSL forks, in particular on the platforms that use them by default (like OpenBSD). That should give us the flexibility of NGINX without the burden of maintaining our own QUIC implementation, as I cannot justify doing that as much as depending on any of the numerous OpenSSL forks.
I cannot justify waiting for OpenSSL to have their implementation completed either, as it is not clear how good the initial implementation will be, and I am not sure if the higher-level API they intend to provide is going to be the best fit for us. Ngtcp2 already looks more promising in that regard, not to mention that we have, in my opinion, a very good experience of using [nghttp2](https://nghttp2.org/) from the same author. Also, it is worth noting that at this stage, we have an internal subsystem for managing TLS contexts used by DoT and DoH, so it is very desirable to use it for QUIC as well and having our own ngtcp2 crypto API implementation should make it possible. We can use the code [from NGINX](https://github.com/nginx/nginx/blob/master/src/event/quic/ngx_event_quic_openssl_compat.c) and existing [ngtcp2 crypto libraries](https://github.com/ngtcp2/ngtcp2/tree/main/crypto) as examples.
Apart from choosing the library which will serve as the foundation of our QUIC implementation, there are many other problems to solve, which are no less challenging and will require some thinking and trial and error, so it is better for us not to wait for OpenSSL so that we will have more time to iron out our QUIC-related code before the new release. That is, the code related to connections and stream management and, ideally, connection migration is the most challenging, in my opinion, though the choice of the library will affect its structure for sure.
I think that it would be best to use and scale the experience of structuring the transport code in a way similar to Stream DNS and PROXYv2. It is very desirable as it allows testing without direct dependence on the networking code. By such a design, I mean implementing the most important parts of the QUIC code as a black box to which we pass data, and it calls the necessary callbacks when required.
On the highest level, bi-directional QUIC streams (in which we are interested the most for now) should map well to Stream DNS or a very similar transport (a QUIC Stream based on `isc_dnsstream_assembler_t`).
QUIC has some newer characteristics not present in other transports, like an ability by both end-points to create new streams at any moment as much as a concept of a generic multiplexed transport itself. These things are likely to be resolved in a manner similar to HTTP/2 - using a virtual connection per stream. Another thing to mention is connection migration. Our code is not ready for that (and, likely, never will), so we should set a QUIC Stream end-point information at the moment of creating and then never update that (from the point of view of the higher level code for compatibility reasons - in fact, we still can do the actual connection migration with all the open streams).
So, in short, it seems that we should attempt to use ngtcp2 with custom crypto API implementation first. The initial plan was to combine the client-side support for QUIC from OpenSSL and combine it with "OpenSSL Compatibility Layer" (as they named it) from NGINX (which turned out to be a complete internal only in-house QUIC implementation), but due to the reasons given above, that will not work as it is not possible to combine QUIC-related functionality in OpenSSL with any third-party QUIC implementation at all (and likely never will). Ngtcp2 is the only crypto library agnostic implementation of QUIC at the moment.
I would rather choose to depend on OpenSSL for our QUIC implementation as a backup plan.
## Bridging the Gap: How We Could Make an Ngtcp2 Crypto API Implementation
Let's see how we can proceed with providing our own ngtcp2 crypto API implementation.
### Ngtcp2 Crypto API Implementations Structure
As it was noted above, in order to work, ngtcp2 requires a crypto API implementation. The project provides multiple of them for the crypto libraries that have explicit support for QUIC. The list of these libraries includes (at the time of writing) QuicTLS, BoringSSL, GnuTLS, WolfSSL, and PicoTLS.
Each crypto API implementation library consists of two parts.
Firstly, it is a high-level, [shared part](https://github.com/ngtcp2/ngtcp2/blob/main/crypto/shared.c). Among other things, it includes default implementations of all callbacks used by ngtcp2, and some important API calls, most notably `ngtcp2_crypto_derive_and_install_rx_key()` and `ngtcp2_crypto_derive_and_install_tx_key()`, which are mentioned in [the ngtcp2 programmers guide](https://nghttp2.org/ngtcp2/programmers-guide.html).
Secondly, a low-level part that provides a foundation for the functionality of the shared part. There is an implementation for each of the above mentioned supported crypto libraries.
The shared part and a low-level part linked together is an ngtcp2 crypto API implementation.
Of these implementations, the most interesting for us is the one for [QuicTLS](https://github.com/ngtcp2/ngtcp2/blob/main/crypto/quictls/quictls.c) and, to a somewhat lesser extent, [BoringSSL](https://github.com/ngtcp2/ngtcp2/blob/main/crypto/boringssl/boringssl.c), as both are OpenSSL forks. In fact, we can use most of the code from there without much adaptation, as QuicTLS is essentially OpenSSL + QUIC-related API (it is regularly rebased on top of the "mainline" OpenSSL).
The QuicTLS crypto API implementation does not use many QUIC-related API calls. I have identified only the following:
- `SSL_CTX_set_quic_method()` - we would be better using a very similar SSL_set_quic_method();
- `SSL_provide_quic_data()`;
- `SSL_set_quic_transport_params()`/`SSL_get_peer_quic_transport_params()`;
- `SSL_process_quic_post_handshake()` - this one appears to be optional but omitting it will leave us without 0-RTT support (at least, for now).
There is one problem with how the crypto API libraries are structured - as there is no way to use only the shared part without depending on one of the above-mentioned crypto libraries (which would not work for us anyway).
So, in our ngtcp2 crypto API implementation, we would need to provide a replacement for the shared part - at least for the callbacks and `ngtcp2_crypto_derive_and_install_rx_key()`/ `ngtcp2_crypto_derive_and_install_tx_key()`. That is doable but very unfortunate, as it includes a lot of crypto-related things, not to mention that it is extra code to maintain.
Regarding the missing QUIC API in OpenSSL, there is a solution from NGINX which might work for us as well.
### NGINX OpenSSL Compatibility Layer
As noted above, NGINX includes a compatibility layer with NGINX as a part of its QUIC implementation. It includes the implementations of the following functions:
- `SSL_set_quic_method()`
- `SSL_provide_quic_data()`
- `SSL_set_quic_transport_params()`/`SSL_get_peer_quic_transport_params()`
NGINX's [OpenSSL compatibility layer](https://github.com/nginx/nginx/blob/master/src/event/quic/ngx_event_quic_openssl_compat.c) clearly provides an internal implementation of missing parts of BoringSSL QUIC API, which is not that different from QuicTLS API.
One missing thing is `SSL_process_quic_post_handshake()`, which is needed for 0-RTT support (to be more precise - for TLS early data processing). In fact, NGINX explicitly turns off TLS early data support when using QUIC. It is not clear if we can easily provide a replacement for this function, but even if not - we can live without 0-RTT support for now. At the time of writing, it is still optional in many crypto libraries.
It is worth noting that NGINX is a server application, so the provided QUIC API implementation might need some adjustments to work for client-side code.
At this point, it seems that despite some limitations, it is possible to provide a portable ngtcp2 crypto API implementation.BIND 9.21.xArtem BoldarievArtem Boldarievhttps://gitlab.isc.org/isc-projects/bind9/-/issues/4581CID 486476: Memory - corruptions (OVERRUN) in lib/dns/resconf.c2024-02-24T07:55:23ZMichal NowakCID 486476: Memory - corruptions (OVERRUN) in lib/dns/resconf.cAfter 371defc35753d04fa8b769b8c859630c3a76e9ed, Coverity Scan claims memory corruption in `lib/dns/resconf.c`:
```cpp
/lib/dns/resconf.c: 246 in add_server()
240
241 /* XXX: special case: treat all-0 IPv4 address as loopback *...After 371defc35753d04fa8b769b8c859630c3a76e9ed, Coverity Scan claims memory corruption in `lib/dns/resconf.c`:
```cpp
/lib/dns/resconf.c: 246 in add_server()
240
241 /* XXX: special case: treat all-0 IPv4 address as loopback */
242 v4 = &((struct sockaddr_in *)res->ai_addr)->sin_addr;
243 if (memcmp(v4, zeroaddress, 4) == 0) {
244 memmove(v4, loopaddress, 4);
245 }
>>> CID 486476: Memory - corruptions (OVERRUN)
>>> Overrunning struct type sockaddr_in of 16 bytes by passing it to a function which accesses it at byte offset 27 using argument "res->ai_addrlen" (which evaluates to 28). [Note: The source code implementation of the function has been overridden by a builtin model.]
246 memmove(&address->type.sin, res->ai_addr, res->ai_addrlen);
247 } else if (res->ai_family == AF_INET6) {
248 memmove(&address->type.sin6, res->ai_addr, res->ai_addrlen);
249 } else {
250 isc_mem_put(mctx, address, sizeof(*address));
251 UNEXPECTED_ERROR("ai_family (%d) not INET nor INET6",
```May 2024 (9.18.27, 9.18.27-S1, 9.19.24)Mark AndrewsMark Andrewshttps://gitlab.isc.org/isc-projects/bind9/-/issues/4580Add resolver.arpa to the built in empty zones2024-03-21T00:23:52ZMark AndrewsAdd resolver.arpa to the built in empty zonesRFC 9462 adds resolver.arpa to the lists of empty zones.
```
6.4. Handling Non-DDR Queries for resolver.arpa
DNS resolvers that support DDR by responding to queries for _dns.resolver.arpa. MUST treat resolver.arpa as a
locally served z...RFC 9462 adds resolver.arpa to the lists of empty zones.
```
6.4. Handling Non-DDR Queries for resolver.arpa
DNS resolvers that support DDR by responding to queries for _dns.resolver.arpa. MUST treat resolver.arpa as a
locally served zone per [RFC6303]. In practice, this means that resolvers SHOULD respond to queries of any
type other than SVCB for _dns.resolver.arpa. with NODATA and queries of any type for any domain name under
resolver.arpa with NODATA.
```May 2024 (9.18.27, 9.18.27-S1, 9.19.24)https://gitlab.isc.org/isc-projects/bind9/-/issues/4579Restore the ability to select individual unit tests and turn on debugging2024-02-24T07:53:44ZMark AndrewsRestore the ability to select individual unit tests and turn on debugging63fe9312ff8f removed the ability to select individual tests from the command line and turn on debugging. This is useful when you want to check only parts of a unit test when developing. This restores / adds this ability.63fe9312ff8f removed the ability to select individual tests from the command line and turn on debugging. This is useful when you want to check only parts of a unit test when developing. This restores / adds this ability.May 2024 (9.18.27, 9.18.27-S1, 9.19.24)https://gitlab.isc.org/isc-projects/bind9/-/issues/4577bind 9.18.24 on Ubuntu 20.04 doesn't start due to "type=notify" in named.service2024-02-14T09:58:47ZPascal Ernsterbind 9.18.24 on Ubuntu 20.04 doesn't start due to "type=notify" in named.service### Summary
When installing `bind9 1:9.18.24-1+ubuntu20.04.1+deb.sury.org+1` from ISC's PPA on Ubuntu 20.04, the `named.service` unit hangs during the post-install phase of the package installation, and more generally, always when the `...### Summary
When installing `bind9 1:9.18.24-1+ubuntu20.04.1+deb.sury.org+1` from ISC's PPA on Ubuntu 20.04, the `named.service` unit hangs during the post-install phase of the package installation, and more generally, always when the `named.service` unit is started. This is caused by the `type=notify` line that is present in the `named.service` unit although the actual support for systemd notify is only present in bind's 9.19.x version branch. After a 90 seconds, the unit runs into a timeout, and the `bind9` package is left in an unconfigured/semi-configured state in apt/aptitude.
Note: Ubuntu's `bind` package was installed and running on my machine, and I encountered this bug today when I tried to migrate to the `bind9` package from ISC's PPA. To be sure that this is not caused by any remains from the previous installation, I have completely purged all bind-related packages, manually deleted the `bind` user and the `/etc/bind`, `/var/cache/bind` and `/var/lib/bind` directories. Even after this cleanup, I still encountered this bug when trying to install `bind9 1:9.18.24-1+ubuntu20.04.1+deb.sury.org+1`. I've used the following workaround to solve to issue for me until the bug is fixed in the PPA package:
#### Temporary workaround for other users encountering the same bug
Prior to installing the package, run `sudo systemctl edit named.service`, then enter the following two lines and save the file:
```
[Service]
Type=simple
```
You should now be able to install/reconfigure the `bind9` package and to start `named.service` without issues.
### BIND version affected
```
named -V
BIND 9.18.24-1+ubuntu20.04.1+deb.sury.org+1-Ubuntu (Extended Support Version) <id:>
running on Linux x86_64 5.4.0-173-generic #191-Ubuntu SMP Fri Feb 2 13:55:07 UTC 2024
built by make with '--build=x86_64-linux-gnu' '--prefix=/usr' '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--disable-silent-rules' '--libdir=${prefix}/lib/x86_64-linux-gnu' '--libexecdir=${prefix}/lib/x86_64-linux-gnu' '--disable-maintainer-mode' '--disable-dependency-tracking' '--libdir=/usr/lib/x86_64-linux-gnu' '--sysconfdir=/etc/bind' '--with-python=python3' '--localstatedir=/' '--enable-threads' '--enable-largefile' '--with-libtool' '--enable-shared' '--disable-static' '--with-gost=no' '--with-openssl=/usr' '--with-gssapi=yes' '--with-libidn2' '--with-json-c' '--with-lmdb=/usr' '--with-gnu-ld' '--with-maxminddb' '--with-atf=no' '--enable-ipv6' '--enable-rrl' '--enable-filter-aaaa' '--disable-native-pkcs11' '--enable-dnstap' 'build_alias=x86_64-linux-gnu' 'CFLAGS=-g -O2 -fdebug-prefix-map=/build/bind9-gw9jNu/bind9-9.18.24=. -fstack-protector-strong -Wformat -Werror=format-security -fno-strict-aliasing -fno-delete-null-pointer-checks -DNO_VERSION_DATE -DDIG_SIGCHASE' 'LDFLAGS=-Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now' 'CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2'
compiled by GCC 9.4.0
compiled with OpenSSL version: OpenSSL 1.1.1f 31 Mar 2020
linked to OpenSSL version: OpenSSL 1.1.1f 31 Mar 2020
compiled with libuv version: 1.44.2
linked to libuv version: 1.44.2
compiled with libnghttp2 version: 1.40.0
linked to libnghttp2 version: 1.40.0
compiled with libxml2 version: 2.9.10
linked to libxml2 version: 20910
compiled with json-c version: 0.13.1
linked to json-c version: 0.13.1
compiled with zlib version: 1.2.11
linked to zlib version: 1.2.11
linked to maxminddb version: 1.4.2
compiled with protobuf-c version: 1.3.3
linked to protobuf-c version: 1.3.3
threads support is enabled
DNSSEC algorithms: RSASHA1 NSEC3RSASHA1 RSASHA256 RSASHA512 ECDSAP256SHA256 ECDSAP384SHA384 ED25519 ED448
DS algorithms: SHA-1 SHA-256 SHA-384
HMAC algorithms: HMAC-MD5 HMAC-SHA1 HMAC-SHA224 HMAC-SHA256 HMAC-SHA384 HMAC-SHA512
TKEY mode 2 support (Diffie-Hellman): yes
TKEY mode 3 support (GSS-API): yes
default paths:
named configuration: /etc/bind/named.conf
rndc configuration: /etc/bind/rndc.conf
DNSSEC root key: /etc/bind/bind.keys
nsupdate session key: //run/named/session.key
named PID file: //run/named/named.pid
named lock file: //run/named/named.lock
geoip-directory: /usr/share/GeoIP
```
### Steps to reproduce
1. Try to install the `bind9` 9.18.24 package from ISC's Ubuntu 20.04 PPA, for example using `apt install bind9`
2. Notice that the package installation fails/aborts during the post-install phase after about 90 seconds.
3. Check the ouput of `journalctl -ru named | grep --extended-regexp '(timeout|timed out)'`
### What is the current *bug* behavior?
The systemd unit `named.service` times out after 90 seconds.
### What is the expected *correct* behavior?
The systemd unit `named.service` shouldn't time out.
### Relevant configuration files
The bug occurs even with the unchanged config that ships with the package.
### Relevant logs
Excerpt from `journalctl -u named.service`:
```
Feb 13 19:06:53 ns systemd[1]: Starting BIND Domain Name Server...
Feb 13 19:06:53 ns named[6454]: starting BIND 9.18.24-1+ubuntu20.04.1+deb.sury.org+1-Ubuntu (Extended Support Version) <id:>
Feb 13 19:06:53 ns named[6454]: running on Linux x86_64 5.4.0-173-generic #191-Ubuntu SMP Fri Feb 2 13:55:07 UTC 2024
Feb 13 19:06:53 ns named[6454]: built with '--build=x86_64-linux-gnu' '--prefix=/usr' '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--disable>
Feb 13 19:06:53 ns named[6454]: running as: named -f -u bind
Feb 13 19:06:53 ns named[6454]: compiled by GCC 9.4.0
Feb 13 19:06:53 ns named[6454]: compiled with OpenSSL version: OpenSSL 1.1.1f 31 Mar 2020
Feb 13 19:06:53 ns named[6454]: linked to OpenSSL version: OpenSSL 1.1.1f 31 Mar 2020
Feb 13 19:06:53 ns named[6454]: compiled with libuv version: 1.44.2
Feb 13 19:06:53 ns named[6454]: linked to libuv version: 1.44.2
Feb 13 19:06:53 ns named[6454]: compiled with libxml2 version: 2.9.10
Feb 13 19:06:53 ns named[6454]: linked to libxml2 version: 20910
Feb 13 19:06:53 ns named[6454]: compiled with json-c version: 0.13.1
Feb 13 19:06:53 ns named[6454]: linked to json-c version: 0.13.1
Feb 13 19:06:53 ns named[6454]: compiled with zlib version: 1.2.11
Feb 13 19:06:53 ns named[6454]: linked to zlib version: 1.2.11
Feb 13 19:06:53 ns named[6454]: ----------------------------------------------------
Feb 13 19:06:53 ns named[6454]: BIND 9 is maintained by Internet Systems Consortium,
Feb 13 19:06:53 ns named[6454]: Inc. (ISC), a non-profit 501(c)(3) public-benefit
Feb 13 19:06:53 ns named[6454]: corporation. Support and training for BIND 9 are
Feb 13 19:06:53 ns named[6454]: available at https://www.isc.org/support
Feb 13 19:06:53 ns named[6454]: ----------------------------------------------------
Feb 13 19:06:53 ns named[6454]: adjusted limit on open files from 524288 to 1048576
Feb 13 19:06:53 ns named[6454]: found 1 CPU, using 1 worker thread
Feb 13 19:06:53 ns named[6454]: using 1 UDP listener per interface
Feb 13 19:06:53 ns named[6454]: DNSSEC algorithms: RSASHA1 NSEC3RSASHA1 RSASHA256 RSASHA512 ECDSAP256SHA256 ECDSAP384SHA384 ED25519 ED448
Feb 13 19:06:53 ns named[6454]: DS algorithms: SHA-1 SHA-256 SHA-384
Feb 13 19:06:53 ns named[6454]: HMAC algorithms: HMAC-MD5 HMAC-SHA1 HMAC-SHA224 HMAC-SHA256 HMAC-SHA384 HMAC-SHA512
Feb 13 19:06:53 ns named[6454]: TKEY mode 2 support (Diffie-Hellman): yes
Feb 13 19:06:53 ns named[6454]: TKEY mode 3 support (GSS-API): yes
Feb 13 19:06:53 ns named[6454]: loading configuration from '/etc/bind/named.conf'
[…]
Feb 13 19:08:23 ns systemd[1]: named.service: start operation timed out. Terminating.
Feb 13 19:08:23 ns named[6454]: no longer listening on 127.0.0.1#53
Feb 13 19:08:23 ns named[6454]: no longer listening on […]#53
Feb 13 19:08:23 ns named[6454]: no longer listening on ::1#53
Feb 13 19:08:23 ns named[6454]: no longer listening on […]#53
Feb 13 19:08:23 ns named[6454]: shutting down
Feb 13 19:08:23 ns named[6454]: stopping command channel on 127.0.0.1#953
Feb 13 19:08:23 ns named[6454]: stopping command channel on ::1#953
Feb 13 19:08:23 ns named[6454]: exiting
Feb 13 19:08:23 ns systemd[1]: named.service: Failed with result 'timeout'.
Feb 13 19:08:23 ns systemd[1]: Failed to start BIND Domain Name Server.
Feb 13 19:08:24 ns systemd[1]: named.service: Scheduled restart job, restart counter is at 1.
Feb 13 19:08:24 ns systemd[1]: Stopped BIND Domain Name Server.
```https://gitlab.isc.org/isc-projects/bind9/-/issues/4576#error "isc/stdatomic.h does not support your compiler"2024-02-13T21:49:44ZMarcus Kool#error "isc/stdatomic.h does not support your compiler"<!--
If the bug you are reporting is potentially security-related - for example,
if it involves an assertion failure or other crash in `named` that can be
triggered repeatedly - then please make sure that you make the new issue
confident...<!--
If the bug you are reporting is potentially security-related - for example,
if it involves an assertion failure or other crash in `named` that can be
triggered repeatedly - then please make sure that you make the new issue
confidential by clicking the checkbox at the bottom!
-->
### Summary
When compiling a plugin outside the bind source tree a fatal error is thrown by
#error "isc/stdatomic.h does not support your compiler"
### BIND version affected
<!--
Make sure you are testing with the **latest** supported version of BIND
for a given branch. Many bugs have been fixed over time!
See https://kb.isc.org/docs/supported-platforms for the current list.
The latest source is available from https://www.isc.org/download/#BIND
Paste the output of `named -V` here.
-->
all 9.18.x versions
### Steps to reproduce
<!--
This is extremely important! Be precise and use itemized lists, please.
Even if a default configuration is affected, please include the full configuration
files _you were testing with_.
Example:
1. Use _attached_ configuration file
2. Start BIND server with command: `named -g -c named.conf ...`
3. Simulate legitimate clients using command `dnsperf -S1 -d legit-queries ...`
4. Simulate attack traffic using command `dnsperf -S1 -d attack-queries ...`
-->
1. copy a plugin to outside the bind tree
2. make sure the plugin includes isc/log.h which triggers the inclusion of isc/stdatomic.h
3. use gcc 5.0 or higher
### What is the current *bug* behavior?
<!-- What actually happens. -->
The copiler issues a fatal error:
```
In file included from /local/src/bind/urlfilterdb-isc-bind-918/lib/isc/include/isc/atomic.h:19,
from /local/src/bind/urlfilterdb-isc-bind-918/lib/isc/include/isc/types.h:16,
from /local/src/bind/urlfilterdb-isc-bind-918/lib/isc/include/isc/log.h:25,
from ufdb-bind-log.h:12,
from ufdb-bind-log.c:19:
/local/src/bind/urlfilterdb-isc-bind-918/lib/isc/include/isc/stdatomic.h:29:2: error: #error "isc/stdatomic.h does not support your compiler"
29 | #error "isc/stdatomic.h does not support your compiler"
| ^~~~~
```
### What is the expected *correct* behavior?
<!-- What you should see instead. -->
compilation is successful
### Relevant configuration files
<!-- Paste any relevant configuration files here - please use code blocks (```)
to format console output. If submitting the contents of your
configuration file in a non-confidential issue, it is advisable to
obscure key secrets; this can be done automatically by using
`named-checkconf -px`. -->
### Relevant logs
<!-- Paste any relevant logs here - please use code blocks (```) to format console
output, logs, and code, as it's very hard to read otherwise. -->https://gitlab.isc.org/isc-projects/bind9/-/issues/4574add an +opt display flag for dig2024-02-14T18:09:52ZMatthew Pounsettadd an +opt display flag for dig### Description
As part of providing support or education to non-experts in the DNS, I find it useful to be able to narrow the output of `dig` to just the relevant bits, while providing a simple, reproducible, easily understood command ...### Description
As part of providing support or education to non-experts in the DNS, I find it useful to be able to narrow the output of `dig` to just the relevant bits, while providing a simple, reproducible, easily understood command to the other person. For example, it's routine for me to do something like `dig +noall +answer ...` or `dig +noall +authority...`.
I recently ran into a case where it would have been useful to focus on the OPT pseudosection, but found no easy antidote to `+noall` for it. `+comments` is close, but not precise. Likewise there are a number of things I can do with `grep`, or combining `+yaml` with `/bin/yq` to limit the output, but at the expense of clarity in the command I share.
### Request
It would be helpful to have an `+opt` (or equivalent) display flag in dig, which shows only the OPT pseudo-section without any other output, similarly to flags `+answer`, `+authority`, and `+additional`.
### Links / referencesNot plannedhttps://gitlab.isc.org/isc-projects/bind9/-/issues/4572TLS forwarder lookup fails in resolver.c when TLS CA certificate not available2024-02-21T20:41:29ZTobias WolterTLS forwarder lookup fails in resolver.c when TLS CA certificate not available### Summary
When configuring a forwarder with a TLS configuration that specifies a CA file to verify the remote certificate, BIND dies at the RUNTIME_CHECK in resolver.c when it cannot read the CA file.
### BIND version affected
BIND ...### Summary
When configuring a forwarder with a TLS configuration that specifies a CA file to verify the remote certificate, BIND dies at the RUNTIME_CHECK in resolver.c when it cannot read the CA file.
### BIND version affected
BIND 9.19.19-1+0~20231220.107+debian12~1.gbpfc5ec0-Debian (Development Release) <id:>
### Steps to reproduce
1. Use the provided configuration snippet in any working configuration.
2. Do *not* have a `/certificates` directory.
3. Look up something from test.example.com.
### What is the current *bug* behavior?
BIND crashes in the resolver library due to the TLS context not being set up correctly.
### What is the expected *correct* behavior?
BIND complains about not being able to read about the CA certificate on startup.
### Relevant configuration files
* named.conf (excerpt):
```
tls auth1 {
ca-file "/certificates/ca.crt";
remote-hostname "auth1";
};
tls auth2 {
ca-file "/certificates/ca.crt";
remote-hostname "auth2";
};
zone test.example.com {
type forward;
forwarders port 853 { 172.23.23.11 tls auth1; 172.23.23.12 tls auth2; };
forward only;
};
```
### Relevant logs
```
recursive-2 | 12-Feb-2024 08:57:44.370 client @0x713184c1c000 172.23.23.23#45455 (foo.test.example.com): query: foo.test.example.com IN A +E(0)K (172.23.23.14)
recursive-2 | fetch: foo.test.example.com/A
recursive-2 | QNAME minimization - minimized, qmintype 2 qminname corp
recursive-2 | resolver.c:2175:fctx_query(): fatal error:
recursive-2 | RUNTIME_CHECK(result == ISC_R_SUCCESS) failed
recursive-2 | exiting (due to fatal error in library)
recursive-2 exited with code 139
```March 2024 (9.16.49, 9.16.49-S1, 9.18.25, 9.18.25-S1, 9.19.22)Artem BoldarievArtem Boldarievhttps://gitlab.isc.org/isc-projects/bind9/-/issues/4571findnsec3proofs failed to disassociate all rdatasets returned by dns_ncache_c...2024-03-08T08:24:37ZMark Andrewsfindnsec3proofs failed to disassociate all rdatasets returned by dns_ncache_currentThese are not currently reference counted by if they where it would introduce a memory/reference leak.These are not currently reference counted by if they where it would introduce a memory/reference leak.March 2024 (9.16.49, 9.16.49-S1, 9.18.25, 9.18.25-S1, 9.19.22)https://gitlab.isc.org/isc-projects/bind9/-/issues/4570CID 486327: Control flow issues (UNREACHABLE)2024-03-08T08:20:30ZMark AndrewsCID 486327: Control flow issues (UNREACHABLE)Just need to be removed.
```
** CID 486327: Control flow issues (UNREACHABLE)
/lib/isc/netmgr/netmgr.c: 750 in isc___nmsocket_init()
____________________________________________________________________________________________________...Just need to be removed.
```
** CID 486327: Control flow issues (UNREACHABLE)
/lib/isc/netmgr/netmgr.c: 750 in isc___nmsocket_init()
________________________________________________________________________________________________________
*** CID 486327: Control flow issues (UNREACHABLE)
/lib/isc/netmgr/netmgr.c: 750 in isc___nmsocket_init()
744 sock->statsindex = tcp6statsindex;
745 break;
746 default:
747 UNREACHABLE();
748 }
749 break;
CID 486327: Control flow issues (UNREACHABLE)
This code cannot be reached: "switch (family) {
case 2:...".
750 switch (family) {
751 case AF_INET:
752 sock->statsindex = tcp4statsindex;
753 break;
754 case AF_INET6:
755 sock->statsindex = tcp6statsindex;
```March 2024 (9.16.49, 9.16.49-S1, 9.18.25, 9.18.25-S1, 9.19.22)https://gitlab.isc.org/isc-projects/bind9/-/issues/4569** CID 486326: Memory - corruptions (OVERRUN)2024-03-08T08:23:03ZMark Andrews** CID 486326: Memory - corruptions (OVERRUN)This is cosmetic but should be addressed. sizeof(type.sa) is smaller than sizeof(type) and sizeof(type.sin6) so
while this overruns type.sa it doesn't actually overrun the union.
```
** CID 486326: Memory - corruptions (OVERRUN)
/lib/...This is cosmetic but should be addressed. sizeof(type.sa) is smaller than sizeof(type) and sizeof(type.sin6) so
while this overruns type.sa it doesn't actually overrun the union.
```
** CID 486326: Memory - corruptions (OVERRUN)
/lib/dns/resconf.c: 248 in add_server()
________________________________________________________________________________________________________
*** CID 486326: Memory - corruptions (OVERRUN)
/lib/dns/resconf.c: 248 in add_server()
242 if (res->ai_addrlen > sizeof(address->type)) {
243 isc_mem_put(mctx, address, sizeof(*address));
244 result = ISC_R_RANGE;
245 goto cleanup;
246 }
247 address->length = (unsigned int)res->ai_addrlen;
CID 486326: Memory - corruptions (OVERRUN)
Overrunning struct type sockaddr of 16 bytes by passing it to a function which accesses it at byte offset 27 using argument "res->ai_addrlen" (which evaluates to 28). [Note: The source code implementation of the function has been overridden by a builtin model.]
248 memmove(&address->type.sa, res->ai_addr, res->ai_addrlen);
249 ISC_LINK_INIT(address, link);
250 ISC_LIST_APPEND(*nameservers, address, link);
251
252 cleanup:
253 freeaddrinfo(res);
```March 2024 (9.16.49, 9.16.49-S1, 9.18.25, 9.18.25-S1, 9.19.22)https://gitlab.isc.org/isc-projects/bind9/-/issues/4568isc_ht case insensitive matching is broken2024-03-07T21:26:35ZOndřej Surýisc_ht case insensitive matching is brokenThe `isc__ht_node_match()` ignores the case_sensitivity setting and always uses `memcmp()` that is case sensitive.The `isc__ht_node_match()` ignores the case_sensitivity setting and always uses `memcmp()` that is case sensitive.February 2024 (9.16.47/9.16.48, 9.16.47/9.16.48-S1, 9.18.23/9.18.24, 9.18.23/9.18.24-S1, 9.19.21)