ISC Open Source Projects issues
https://gitlab.isc.org/groups/isc-projects/-/issues

https://gitlab.isc.org/isc-projects/bind9/-/issues/2438
[ISC-support #17264] add configuration option to set ADB size independent of cache size
2023-11-02T17:00:03Z, Brian Conry

Currently the maximum ADB size is set as a fraction (1/8) of the configured/effective `max-cache-size`.
It has been observed with at least one customer that this size is sometimes insufficient.
It has been observed with at least one customer that this size is sometimes insufficient.
This may be related to the rise of CDNs with long CNAME chains spread across multiple domains with each domain typically hosted by a large group of geographically-diverse authoritative servers.
While there are other features necessary before an operator can tune this based on data rather than intuition, it may also be useful in our own explorations of ADB behavior in adverse conditions.

Not planned

https://gitlab.isc.org/isc-projects/bind9/-/issues/2436
ADB stats don't provide any information useful to assess the utility that it is able to provide
2023-11-02T17:00:03Z, Brian Conry

As mentioned on #2405, the stats related to the ADB are inadequate for an operator to identify when there are issues.
The current ADB stats are limited to:
* `dns_adbstats_entriescnt` - "Addresses in hash table"
* `dns_adbstats_namescnt` - "Names in hash table"
* `dns_adbstats_nentries` - "Address hash table size"
* `dns_adbstats_nnames` - "Name hash table size"
It is also possible, using the stats channel, to see the stats for the memory contexts used by the ADBs, though there is not an explicit strong association between those memory contexts and the ADB (as identified by view).
It was perhaps expected that these would be sufficient to characterize what is going on in the hash table, but I have recently inspected a customer core file in which the ADB seems to have been in a state of near-constant overmem cleaning with about 73% of the address entries being marked as `ENTRY_IS_DEAD`.
Whether or not those "dead" entries are counted in those stats, the reported values necessarily provide an incomplete picture of the state of the ADB.
At minimum there should be stats for the number of "live" entries separate from the number of "dead" entries. It may also be informative for an operator to see how many entries are associated with other flags, such as `NOEDNS`.
I have not yet done a thorough analysis to identify any other "missing" stats that may be useful to operators in understanding how their system is performing and utilizing resources.

Not planned

https://gitlab.isc.org/isc-projects/bind9/-/issues/2435
No logging, at any level, related to ADB overmem cleaning
2023-11-02T17:00:03Z, Brian Conry

As introduced in #2405, there is no logging associated with any of the decisions or actions relating to ADB overmem cleaning.
This makes it impossible to be certain about any of the decisions made, or even *if* decisions were made, without using a debugger.
When ADB overmem cleaning operates on a subset of the entries for servers authoritative for a zone, it takes actions that undermine all of the work done maintaining SRTT values and can cause the server to behave in non-intuitive ways, among other possible effects that are not yet properly characterized.

Not planned

https://gitlab.isc.org/isc-projects/bind9/-/issues/2426
CID 316504: Untrusted loop bound (TAINTED_SCALAR)
2022-03-01T09:57:30Z, Michal Nowak

```
*** CID 316504: (TAINTED_SCALAR)
/lib/dns/rdata/generic/rrsig_46.c: 233 in totext_rrsig()
227
228 /*
229 * Time signed.
230 */
231 when = uint32_fromregion(&sr);
232 isc_region_consume(&sr, 4);
>>> CID 316504: (TAINTED_SCALAR)
>>> Passing tainted expression "when" to "dns_time32_totext", which uses it as a loop boundary.
233 RETERR(dns_time32_totext(when, target));
234 RETERR(str_totext(" ", target));
235
236 /*
237 * Footprint.
238 */
/lib/dns/rdata/generic/rrsig_46.c: 225 in totext_rrsig()
219
220 /*
221 * Sig exp.
222 */
223 exp = uint32_fromregion(&sr);
224 isc_region_consume(&sr, 4);
>>> CID 316504: (TAINTED_SCALAR)
>>> Passing tainted expression "exp" to "dns_time32_totext", which uses it as a loop boundary.
225 RETERR(dns_time32_totext(exp, target));
226 RETERR(str_totext(" ", target));
227
228 /*
229 * Time signed.
230 */
```

Not planned
Assignee: Mark Andrews

https://gitlab.isc.org/isc-projects/bind9/-/issues/2425
CID 316505: Insecure data handling (TAINTED_SCALAR)
2022-03-01T09:57:33Z, Michal Nowak

```
*** CID 316505: Insecure data handling (TAINTED_SCALAR)
/lib/dns/journal.c: 972 in journal_find()
966 return (ISC_R_SUCCESS);
967 }
968
969 current_pos = j->header.begin;
970 	index_find(j, serial, &current_pos);
971
>>> CID 316505: Insecure data handling (TAINTED_SCALAR)
>>> Using tainted variable "current_pos.serial" as a loop boundary.
972 while (current_pos.serial != serial) {
973 if (DNS_SERIAL_GT(current_pos.serial, serial)) {
974 return (ISC_R_NOTFOUND);
975 }
976 		result = journal_next(j, &current_pos);
977 if (result != ISC_R_SUCCESS) {
```

Not planned
Assignee: Mark Andrews

https://gitlab.isc.org/isc-projects/bind9/-/issues/2424
CID 316506: Insecure data handling (TAINTED_SCALAR)
2022-03-01T09:57:36Z, Michal Nowak

```
*** CID 316506: Insecure data handling (TAINTED_SCALAR)
/lib/dns/journal.c: 1855 in read_one_rr()
1849 */
1850 if (isc_buffer_remaininglength(&j->it.source) != rdlen) {
1851 FAIL(DNS_R_FORMERR);
1852 }
1853 isc_buffer_setactive(&j->it.source, rdlen);
1854 dns_rdata_reset(&j->it.rdata);
>>> CID 316506: Insecure data handling (TAINTED_SCALAR)
>>> Passing tainted expression "j->it.source.active" to "dns_rdata_fromwire", which uses it as a loop boundary.
1855 CHECK(dns_rdata_fromwire(&j->it.rdata, rdclass, rdtype, &j->it.source,
1856 &j->it.dctx, 0, &j->it.target));
1857 j->it.ttl = ttl;
1858
1859 j->it.xpos += sizeof(journal_rawrrhdr_t) + rrhdr.size;
1860 if (rdtype == dns_rdatatype_soa) {
```

Not planned
Assignee: Mark Andrews

https://gitlab.isc.org/isc-projects/bind9/-/issues/2422
CID 316508: Insecure data handling (TAINTED_SCALAR)
2022-03-01T09:57:39Z, Michal Nowak

```
*** CID 316508: Insecure data handling (TAINTED_SCALAR)
/lib/dns/journal.c: 1714 in dns_journal_iter_init()
1708
1709 result = journal_next(j, &pos);
1710 if (result == ISC_R_NOMORE) {
1711 result = ISC_R_SUCCESS;
1712 }
1713 CHECK(result);
>>> CID 316508: Insecure data handling (TAINTED_SCALAR)
>>> Using tainted variable "pos.serial" as a loop boundary.
1714 } while (pos.serial != end_serial);
1715
1716 /*
1717 * For each RR, subtract the length of the RR header,
1718 * as this would not be present in IXFR messages.
1719 * (We don't need to worry about the transaction header
```

Not planned
Assignee: Mark Andrews

https://gitlab.isc.org/isc-projects/bind9/-/issues/2419
CID 316511: Insecure data handling (TAINTED_SCALAR)
2022-03-01T09:57:42Z, Michal Nowak

```
*** CID 316511: Insecure data handling (TAINTED_SCALAR)
/lib/dns/rdata/generic/hip_55.c: 496 in casecompare_hip()
490 key_len = uint16_fromregion(&r1);
491 isc_region_consume(&r1, 2); /* key length */
492 isc_region_consume(&r2, 4);
493
494 INSIST(r1.length >= (unsigned)(hit_len + key_len));
495 INSIST(r2.length >= (unsigned)(hit_len + key_len));
>>> CID 316511: Insecure data handling (TAINTED_SCALAR)
>>> Passing tainted expression "hit_len + key_len" to "memcmp", which uses it as an offset.
496 order = memcmp(r1.base, r2.base, hit_len + key_len);
497 if (order != 0) {
498 return (order);
499 }
500 isc_region_consume(&r1, hit_len + key_len);
501 isc_region_consume(&r2, hit_len + key_len);
```

Not planned
Assignee: Mark Andrews

https://gitlab.isc.org/isc-projects/bind9/-/issues/2418
CID 316512: Untrusted loop bound (TAINTED_SCALAR)
2022-03-01T09:57:44Z, Michal Nowak

```
*** CID 316512: (TAINTED_SCALAR)
/lib/dns/rdata/generic/sig_24.c: 199 in totext_sig()
193
194 /*
195 * Time signed.
196 */
197 when = uint32_fromregion(&sr);
198 isc_region_consume(&sr, 4);
>>> CID 316512: (TAINTED_SCALAR)
>>> Passing tainted expression "when" to "dns_time32_totext", which uses it as a loop boundary.
199 RETERR(dns_time32_totext(when, target));
200 RETERR(str_totext(" ", target));
201
202 /*
203 * Footprint.
204 */
/lib/dns/rdata/generic/sig_24.c: 187 in totext_sig()
181
182 /*
183 * Sig exp.
184 */
185 exp = uint32_fromregion(&sr);
186 isc_region_consume(&sr, 4);
>>> CID 316512: (TAINTED_SCALAR)
>>> Passing tainted expression "exp" to "dns_time32_totext", which uses it as a loop boundary.
187 RETERR(dns_time32_totext(exp, target));
188
189 if ((tctx->flags & DNS_STYLEFLAG_MULTILINE) != 0) {
190 RETERR(str_totext(" (", target));
191 }
192 RETERR(str_totext(tctx->linebreak, target));
```

Not planned
Assignee: Mark Andrews

https://gitlab.isc.org/isc-projects/bind9/-/issues/2417
CID 316513: Insecure data handling (TAINTED_SCALAR)
2022-03-01T09:57:47Z, Michal Nowak

```
*** CID 316513: Insecure data handling (TAINTED_SCALAR)
/lib/dns/master.c: 2618 in load_raw()
2612 * the target available region be the same if
2613 * decompression is disabled (see dctx above) and we
2614 * are not downcasing names (options == 0).
2615 */
2616 isc_buffer_init(&buf, isc_buffer_current(&target),
2617 (unsigned int)rdlen);
>>> CID 316513: Insecure data handling (TAINTED_SCALAR)
>>> Passing tainted expression "target.active" to "dns_rdata_fromwire", which uses it as a loop boundary.
2618 result = dns_rdata_fromwire(
2619 &rdata[i], rdatalist.rdclass, rdatalist.type,
2620 &target, &dctx, 0, &buf);
2621 if (result != ISC_R_SUCCESS) {
2622 goto cleanup;
2623 }
```

Not planned
Assignee: Mark Andrews

https://gitlab.isc.org/isc-projects/bind9/-/issues/2414
inline-signing breaking on 9.11.25 (FreeBSD 11.4)
2022-03-01T09:47:38Z, Dan Mahoney
### Summary
Journal becomes corrupted leading to inline-signing failure.
### BIND version used
```
BIND 9.11.25 (Extended Support Version) <id:4a7e9aa>
running on FreeBSD amd64 11.4-RELEASE-p3 FreeBSD 11.4-RELEASE-p3 #0: Tue Sep 1 08:22:33 UTC 2020 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC
built by make with '--localstatedir=/var' '--disable-linux-caps' '--with-randomdev=/dev/random' '--with-libxml2=/usr/local' '--with-readline=-L/usr/local/lib -ledit' '--with-dlopen=yes' '--with-gost=no' '--without-python' '--sysconfdir=/usr/local/etc/namedb' '--with-dlz-filesystem=yes' '--enable-dnstap' '--enable-filter-aaaa' '--disable-fixed-rrset' '--without-geoip2' '--without-gssapi' '--with-libidn2=/usr/local' '--enable-ipv6' '--with-libjson=/usr/local' '--disable-largefile' '--with-lmdb=/usr/local' '--disable-native-pkcs11' '--disable-querytrace' '--enable-rpz-nsdname' '--enable-rpz-nsip' 'STD_CDEFINES=-DDIG_SIGCHASE=1' '--with-openssl=/usr' '--enable-threads' '--with-tuning=default' '--disable-symtable' '--prefix=/usr/local' '--mandir=/usr/local/man' '--infodir=/usr/local/share/info/' '--build=amd64-portbld-freebsd11.4' 'build_alias=amd64-portbld-freebsd11.4' 'CC=cc' 'CFLAGS=-O2 -pipe -DLIBICONV_PLUG -fstack-protector-strong -isystem /usr/local/include -fno-strict-aliasing ' 'LDFLAGS= -fstack-protector-strong ' 'LIBS=-L/usr/local/lib' 'CPPFLAGS=-DLIBICONV_PLUG -isystem /usr/local/include' 'CPP=cpp' 'PKG_CONFIG=pkgconf'
compiled by CLANG FreeBSD Clang 10.0.0 (git@github.com:llvm/llvm-project.git llvmorg-10.0.0-0-gd32170dbd5b)
compiled with OpenSSL version: OpenSSL 1.0.2u-freebsd 20 Dec 2019
linked to OpenSSL version: OpenSSL 1.0.2u-freebsd 20 Dec 2019
compiled with libxml2 version: 2.9.10
linked to libxml2 version: 20910
compiled with libjson-c version: 0.15
linked to libjson-c version: 0.15
compiled with zlib version: 1.2.11
linked to zlib version: 1.2.11
compiled with protobuf-c version: 1.3.2
linked to protobuf-c version: 1.3.2
threads support is enabled
default paths:
named configuration: /usr/local/etc/namedb/named.conf
rndc configuration: /usr/local/etc/namedb/rndc.conf
DNSSEC root key: /usr/local/etc/namedb/bind.keys
nsupdate session key: /var/run/named/session.key
named PID file: /var/run/named/pid
named lock file: /var/run/named/named.lock
```
FreeBSD 11.4, AMD64, running as a Vmware vm.
### Steps to reproduce
Issue noticed on my personal machine where named started servfailing a master zone with inline-signing enabled. All slaves failed to receive it as well.
Upon hitting this issue, I deleted the zone.signed.jnl and zone.signed and did an rndc reload which did not help. The zone.jnl appeared to be corrupted.
I've backed up the unsigned zone files and timestamps, but will ask staff whether they want me to upload them elsewhere or attach them here. (I'd prefer this ticket be private if that's the case)
This happened on its own and I can find nothing in the logs.
### What is the current *bug* behavior?
Journal gets out of sync with the main zone, refuses to update the signed copy, starts servfailing, slaves refuse to serve it.
### What is the expected *correct* behavior?
Journal staying consistent.
### Relevant configuration files
```
zone "gushi.org" {
type master;
file "/etc/namedb/m/gushi.org./gushi.org.hosts";
key-directory "/etc/namedb/m/gushi.org./keys";
inline-signing yes;
auto-dnssec maintain;
allow-transfer {
key pri-sec.gushi.org.;
redacted
};
notify yes;
also-notify {
redacted
};
};
```
#### Contents of zone directory:
```
drwxrwxrwx 4 root wheel 512 Jan 16 09:00 .
drwxrwxrwx 48 root wheel 36352 Jan 23 18:48 ..
-rw-r--r-- 1 root wheel 14915 Jan 12 20:25 gushi.org.hosts
-rw-r--r-- 1 bind wheel 512 Mar 16 2020 gushi.org.hosts.jbk
-rw-r--r-- 1 bind wheel 7498 Jan 12 20:23 gushi.org.hosts.jnl
-rw-r--r-- 1 bind wheel 52111 Jan 16 09:00 gushi.org.hosts.signed
-rw-r--r-- 1 bind wheel 1118287 Jan 16 08:49 gushi.org.hosts.signed.jnl
drwxr-xr-x 2 bind wheel 512 Jun 6 2019 keys
drwxr-xr-x 3 root wheel 512 Mar 15 2020 old
```
### Relevant logs and/or screenshots
`/var/log/messages` goes back several days, no mentions in logs:
```
Jan 19 21:00:00 prime newsyslog[98735]: logfile turned over due to size>100K
```
Attempting to update the serial of the zone with zsu and then reloading yielded the following output, before and after:
Before:
```
$TTL 3600
gushi.org. IN SOA ns.gushi.org. root.gushi.org. (
2021011113 ; serial number
7200 ; refresh
7200 ; retry
604800 ; expire
3600 ; minimum TTL
)
root@prime:/var/named/etc/namedb/m/gushi.org. # zsu -v -v -f gushi.org.hosts
```
Zone header:
```
$TTL 3600
gushi.org. IN SOA ns.gushi.org. root.gushi.org. (
2021012400 ; serial number
7200 ; refresh
7200 ; retry
604800 ; expire
3600 ; minimum TTL
)
root@prime:/var/named/etc/namedb/m/gushi.org. # rndc reload gushi.org
rndc: 'reload' failed: out of range
root@prime:/var/named/etc/namedb/m/gushi.org. # grep named /var/log/messages
Jan 24 00:34:04 <daemon.err> prime named[9378]: zone gushi.org/IN (unsigned): journal rollforward failed: journal out of sync with zone
Jan 24 00:34:04 <daemon.err> prime named[9378]: zone gushi.org/IN (unsigned): not loaded due to errors.
Jan 24 00:34:44 <daemon.err> prime named[9378]: zone gushi.org/IN (unsigned): journal rollforward failed: journal out of sync with zone
Jan 24 00:34:44 <daemon.err> prime named[9378]: zone gushi.org/IN (unsigned): not loaded due to errors.
## Finally, removing the .jnl file fixed things.
root@prime:/var/named/etc/namedb/m/gushi.org. # service named stop
Stopping named.
Waiting for PIDS: 9378.
root@prime:/var/named/etc/namedb/m/gushi.org. # rm gushi.org.hosts.jnl
root@prime:/var/named/etc/namedb/m/gushi.org. # service named start
Starting named.
root@prime:/var/named/etc/namedb/m/gushi.org. # rndc reload gushi.org
zone reload up-to-date
root@prime:/var/named/etc/namedb/m/gushi.org. # dig @127.0.0.1 gushi.org SOA
root@prime:/var/named/etc/namedb/m/gushi.org. # dig @127.0.0.1 gushi.org SOA
; <<>> DiG 9.16.9 <<>> @127.0.0.1 gushi.org SOA
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 5029
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 5
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 85d501bb14ab255c6d056848600d31689fb49aefd8933d7e (good)
;; QUESTION SECTION:
;gushi.org. IN SOA
;; ANSWER SECTION:
gushi.org. 3600 IN SOA ns.gushi.org. root.gushi.org. 2021012437 7200 7200 604800 3600
;; AUTHORITY SECTION:
gushi.org. 360 IN NS ns.gushi.org.
gushi.org. 360 IN NS ns2.gushi.org.
;; ADDITIONAL SECTION:
ns.gushi.org. 360 IN A 199.164.166.132
ns2.gushi.org. 360 IN A 149.20.3.253
ns.gushi.org. 360 IN AAAA 2620:137:6000:10::132
ns2.gushi.org. 360 IN AAAA 2001:4f8:1:2000::253
;; Query time: 9 msec
```
### Possible fixes
Deleting the journal is the only thing that seems to help. I kept a copy of the .jnl/.jbk files from this failure mode, but have not uploaded them. I can do so out of band, or mark this ticket private.

Not planned

https://gitlab.isc.org/isc-projects/bind9/-/issues/2411
Gracefully shutdown the TLS connections in TLSDNS using SSL_shutdown
2021-09-02T12:42:57Z, Ondřej Surý

The SSL_shutdown call needs a bit of back and forth on the networking channel, so right now we are doing an ungraceful shutdown by tearing down the underlying TCP connection. This should be fixed to behave like a good netizen.

Not planned
Assignee: Ondřej Surý

https://gitlab.isc.org/isc-projects/bind9/-/issues/2405
[ISC-support #17264] ADB overmem condition and cleaning - very difficult to detect and causes erratic behavior
2024-03-01T10:04:57Z, Brian Conry

The ADB's mctx size is set to 1/8 of the max-cache-size, if set. This is the only means to control the ADB memory limit. There is also a hard-coded maximum ADB size applied to ADBs for views that share a cache.
When it goes overmem, the ADB starts removing names and entries. The strategy for removing entries doesn't seem to be tied strongly to utility.
This can lead to erratic behavior as BIND is constantly forgetting information about server SRTTs, EDNS capabilities, and other useful data.
In some cases, if not all of the entries for servers associated with a zone are affected by the overmem purge, this can cause the resolver to fixate on a small subset of the servers authoritative for the zone - and not necessarily the subset with the best SRTT.
There is no logging at any level related to ADB overmem activities, nor are there any stats directly related to ADB memory usage.
There are stats for counts of names and entries, along with the number of buckets for each type, but there's no reliable way to map those to memory usage.
The stats channel does contain detail for the ADB memory contexts, but there's no reliable way to map those memory contexts to a particular view.
It seems likely that most of the time the symptoms of an overmem ADB will be minor and nearly impossible to directly measure - small delays and increases in CPU usage associated with the repeated creation and destruction of ADB entries and/or fixation on suboptimal upstream servers - but will definitely degrade the quality of service that the resolver is providing.
This behavior was noticed by a customer when their monitoring zone happened to, by chance, be negatively affected.
Most of the symptoms described here are theoretical, based on my understanding of the code and on various customer-described, but unreproducible and otherwise unexplained, behaviors.
This issue is a feature request covering:
* specific testing by ISC to better understand the impact and range of behaviors when the ADB is overmem #2441
* additional logging related to ADB overmem activities #2435
* additional stats/metrics relating to ADB overmem #2436
* improvements to ADB overmem behavior (ideally based on some utility metric) #2437
* ability to directly control ADB size independent of cache size #2438
* revisit the hard-coded shared-cache maximum ADB size (e.g. remove in favor of configuration) #2439
* system tests related to any/all items above

Not planned

https://gitlab.isc.org/isc-projects/bind9/-/issues/2397
Debian packages for BIND -S edition
2022-06-14T20:58:52Z, Vicky Risk (vicky@isc.org)

We have a request from a new support customer to begin producing Debian packages for the -S edition.
When we first started creating S edition packages, most of our users didn't think they would use an ISC package. Most of them preferred to build their own images, using their own specifications ... and interest in an ISC package was lukewarm. That sentiment may be changing.

Not planned
Assignee: Michał Kępień

https://gitlab.isc.org/isc-projects/bind9/-/issues/2394
dig +short for MX when the record is broken gives confusing answer
2022-04-26T13:36:40Z, Paul Hoffman

A confused user said that dig +short for an MX record did not report the preference level. The example he gave was:
```
# dig +short cyclonit.com MX
HDRedirect-LB7-5a03e1c2772e1c9c.elb.us-east-1.amazonaws.com.
```
When given without +short, the reason becomes clear:
```
# dig cyclonit.com MX
; <<>> DiG 9.16.10 <<>> cyclonit.com MX
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 65526
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: b0e8599a68ce3729dc85d51e6003a03d038d9864e7c2e63c (good)
;; QUESTION SECTION:
;cyclonit.com. IN MX
;; ANSWER SECTION:
cyclonit.com. 10544 IN CNAME HDRedirect-LB7-5a03e1c2772e1c9c.elb.us-east-1.amazonaws.com.
;; AUTHORITY SECTION:
elb.us-east-1.amazonaws.com. 53 IN SOA ns-1826.awsdns-36.co.uk. awsdns-hostmaster.amazon.com. 1 7200 900 1209600 60
```
Yep, it's the dreaded "CNAME and MX at the same level" issue. However, +short hides that in a confusing way.
Proposal: dig +short for broken names such as this should instead reply "CNAME target", or possibly "Bad CNAME target".

Not planned

https://gitlab.isc.org/isc-projects/bind9/-/issues/2385
Glue records can be returned when the name server's name is same as the zone origin
2022-03-01T09:42:29Z, Siva Kesava R Kakarla

### Summary
Similar to issue #2384, when a name server's name in an `NS` record is the same as the zone origin and the zone origin has IP records, they are not returned ([RFC 8499 Page 25](https://www.rfc-editor.org/rfc/rfc8499.html)). Returning these kinds of glue records is optional, and the specification does not mandate it. As in the other issue, all the other well-known implementations return them, so I am raising it as an issue, but feel free to take it down.
### BIND version used
BIND 9.17.8 (Development Release) <id:8c6db04>
### Steps to reproduce
Consider the following zone file:
| Owner | TTL and type | RDATA |
|--------------|-----------|----------------------------------------------------------|
| foo.com. | 500 SOA | ns1.outside.edu. root.campus.edu. 3 86400 7200 604800 300 |
| foo.com. | 500 NS | ns1.outside.edu. |
| foo.com. | 500 AAAA | 2400:cb00:2049:1::a29f:1804 |
| foo.com. | 500 A | 1.1.1.1 |
| bar.foo.com. | 500 NS | foo.com. |
For the query `<bar.foo.com., SOA>` the answer from the BIND server is:
```
";QUESTION",
"bar.foo.com. IN SOA",
";ANSWER",
";AUTHORITY",
"bar.foo.com. 500 IN NS foo.com.",
";ADDITIONAL"
```
Other implementations return the additional section as follows:
```
";ADDITIONAL",
"foo.com. 500 IN A 1.1.1.1",
"foo.com. 500 IN AAAA 2400:cb00:2049:1::a29f:1804"
```
### What is the expected *correct* behavior?
The IP records can also be returned.

Not planned

https://gitlab.isc.org/isc-projects/bind9/-/issues/2384
Sibling (In-bailiwick rule of RFC 8499) domain IP records not returned
2022-03-01T09:42:33Z, Siva Kesava R Kakarla

### Summary
In the case of delegation, when the name server's name is subordinate to the zone origin but not to the owner name of the NS records, the IP records are not returned. There is no rule mentioning that they have to be returned, but other implementations like PowerDNS, Knot, and NSD return them, so I am curious why BIND made this choice. (I am sorry if it was already mentioned in other issues; I did not find info when I searched.)
### BIND version used
BIND 9.17.8 (Development Release) <id:8c6db04>
### Steps to reproduce
Consider the following zone file:
| Owner | TTL and type | RDATA |
|--------------------|-----------|----------------------------------------------------------|
| campus.edu. | 500 SOA | ns1.outside.edu. root.campus.edu. 3 86400 7200 604800 300 |
| campus.edu. | 500 NS | ns1.outside.edu. |
| foo.campus.edu. | 500 NS | bar.campus.edu. |
| bar.campus.edu. | 500 A | 1.1.1.1 |
For the query `<a.foo.campus.edu., A>` the answer from the BIND server is:
```
"opcode QUERY",
"rcode NOERROR",
"flags QR",
";QUESTION",
"a.foo.campus.edu. IN A",
";ANSWER",
";AUTHORITY",
"foo.campus.edu. 500 IN NS bar.campus.edu. ",
";ADDITIONAL"
```
### What is the current *bug* behavior?
As mentioned earlier, this is not buggy behavior but a deviation from other implementations, which return the `A` record in the additional section.
### What is the expected *correct* behavior?
The glue records for sibling domains can also be returned.

Not planned

https://gitlab.isc.org/isc-projects/bind9/-/issues/2368
Log more detail when IXFR fails with 'failed while receiving responses: not exact'
2023-11-02T16:26:05Z, Cathy Almond

Per [Support ticket #16866](https://support.isc.org/Ticket/Display.html?id=16866), there are actually two things that it would be helpful to log:
1. When the zone update fails, what the exact scenario was that caused the failure (including the RRs involved).
In this specific instance, it was a duplicate add, generated erroneously by a registry application, but not knowing why the zone update failed made the troubleshooting much more difficult.
2. Leading on from 1, it was discovered that while named will fail to accept an IXFR'd ADD that is a duplicate when it arrives in a new incremental update, it will ignore duplicate ADDs that are part of the same incremental update. This was somewhat of a surprise, which also contributed to the difficulty troubleshooting (because 'we know there are occasional duplicates, but they don't usually cause a problem').
Therefore, if named is going to silently discard some duplicate adds, it should at least log that it did so!

Not planned

https://gitlab.isc.org/isc-projects/bind9/-/issues/2358
Update CI to have poisoned header files
2022-03-01T09:42:36Z, Mark Andrews

Update the CI to have a system with poisoned header files installed to detect when include order has been broken. The header files in the build / source tree should be found before these poisoned header files.
The contents of the poisoned header files should be something like `#error fix include order`.
/usr/include and /usr/local/include would be ideal locations to add poisoned header files.
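A poisoned header along the lines described above could be as small as the following fragment (the path in the comment is illustrative; this file intentionally fails any compilation that reaches it):

```c
/* /usr/include/isc/types.h -- poisoned copy, installed only in CI.
 * If the compiler reaches this file instead of the in-tree header,
 * the include search order is broken and the build fails loudly. */
#error fix include order
```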
#2357 is what happens when we don't detect this at development time. I used poisoned <isc/types.h> and <dns/types.h> when testing the fixes for #2357, but really we should have a poisoned version of every header file.

Not planned

https://gitlab.isc.org/isc-projects/bind9/-/issues/2345
dig fails with "isc_nm_tcpdnsconnect: address in use" on OpenBSD
2023-02-23T15:18:06Z, Michal Nowak

`kasp` and `dnssec` (sometimes `autosign` and `zero`) system tests started failing reliably on OpenBSD on `main` after ff2bc7891e99442df51acea1110ad599ddc6756a got merged. [#1295042](https://gitlab.isc.org/isc-projects/bind9/-/jobs/1295042) is the first scheduled CI pipeline which fails with:
```
dig: isc_nm_tcpdnsconnect: address in use
```
There are more failing jobs I was able to identify; the latest is https://gitlab.isc.org/isc-projects/bind9/-/jobs/1358830.
No specific part of the `kasp` or `dnssec` system tests seems to trigger this issue. Also, the issue is present only when multiple tests run at the same time; tests don't fail when they run on their own.

Not planned