BIND issueshttps://gitlab.isc.org/isc-projects/bind9/-/issues2023-11-20T10:39:24Zhttps://gitlab.isc.org/isc-projects/bind9/-/issues/4426Feature request - client.bind chaos class queries2023-11-20T10:39:24ZRay BellisFeature request - client.bind chaos class queriesNot plannedhttps://gitlab.isc.org/isc-projects/bind9/-/issues/4371All the things that need to be fixed before 9.202024-03-27T14:02:04ZMatthijs Mekkingmatthijs@isc.orgAll the things that need to be fixed before 9.20This is an overarching issue for keeping track of all the things that need to be completed before the 9.20.0 release.
### Features
- [ ] #1128 Offline KSK (:gear: @matthijs)
- [x] #1129 HSM support via pkcs11-provider
- [x] #4363 Enforce stricter NSEC3 parameter limits
- [x] #4388 Accepting PROXYv2
- [x] #4241 Expose data about 'first time' zone maintenance in-progress
- [ ] #2099 Implement ZoneMD signature generation and verification. (:gear: !5217 @marka, @each)
### Config incompatibilities
- [x] #4364 named-compilezone defaults
- [x] #4373 safer "dnssec-validation yes"
- [x] #4447 "stale-answer-client-timeout" must be zero (:gear: !8699 @aram)
### Refactoring
- [x] #4411 QPDB lite (:gear: !8726 @matthijs, @each)
- [x] #4251 system test runner
### Bugs
- [x] #4340 "max-cache-size" is a no-op since BIND 9.19.16
- [x] #4213 BIND shutdown hang in checkds/ns9/ in cross-version-config-tests job
- [x] #4060 named doesn't shut down after receiving rndc stop command
- [x] #4211 AssertionError: named crashed, shutdown crash
- [ ] #4403 Resolve spike in memory at start of named (:gear: @ondrej)
- [ ] #4481 TCP issue (:gear: isc-private/bind9!639 @ondrej)
- [ ] #4475 Data races in isc_buffer_peekuint8, rdataset_settrust, and memmove (:gear: !8645 @marka)
- [x] #4625 DNSSEC validation incompatibility
- [ ] #4652 Server crash caused by external UDP queriesBIND 9.19.x2024-05-02https://gitlab.isc.org/isc-projects/bind9/-/issues/4199dig (and other tools) may send queries with QID=0, which confuses Net::DNS2023-11-02T16:30:30ZMichał Kępieńdig (and other tools) may send queries with QID=0, which confuses Net::DNSUnless specified manually using `+qid=<value>`, `dig` uses a random
query ID for the DNS messages it sends out:
https://gitlab.isc.org/isc-projects/bind9/-/blob/bf8acd455693edef03881fd2180c5561bc0db66d/bin/dig/dighost.c#L2334
In particular, the value chosen can be 0. While QID=0 is perfectly
legal protocol-wise, it seems that some code bases, e.g. Net::DNS, are
unable to properly handle queries with QID=0. Here is an example:
https://gitlab.isc.org/isc-private/bind9/-/jobs/3509123
```
2023-07-06 14:14:45 INFO:serve-stale I:serve-stale_tmp_iwl06k82:disable responses from authoritative server (89)
2023-07-06 14:14:57 INFO:serve-stale I:serve-stale_tmp_iwl06k82:failed
```
`bin/tests/system/serve-stale_tmp_iwl06k82/dig.out.test89`:
```
;; Warning: ID mismatch: expected ID 0, got 46879
;; communications error to 10.53.0.2#19223: timed out
; <<>> DiG 9.19.15 <<>> +time +tries -p 19223 @10.53.0.2 txt disable
; (1 server found)
;; global options: +cmd
;; no servers could be reached
```
This looked weird to me, so I started `ans2/ans2.pl` manually and sent a
query to it using `dig @10.53.0.2 -p 5300 disable. TXT +qid=0 +tries=1`.
Guess what:
```
;; Warning: ID mismatch: expected ID 0, got 27885
;; communications error to 10.53.0.2#5300: timed out
; <<>> DiG 9.19.15 <<>> @10.53.0.2 -p 5300 disable. TXT +qid=0 +tries=1
; (1 server found)
;; global options: +cmd
;; no servers could be reached
```
Looking at [Net::DNS sources][1], the documentation says:
```
=head2 id
print "query id = ", $packet->header->id, "\n";
$packet->header->id(1234);
Gets or sets the query identification number.
A random value is assigned if the argument value is undefined.
```
However, the above seems to be imprecise: apparently if the ID is
*defined*, but *set to 0*, Net::DNS treats it as an undefined value.
This causes the `$packet->header->id` call to return a random value
instead of 0 for queries with QID=0, breaking responses to such queries.
I don't see any reasonable way to work around this problem in our Perl
code (apart from converting it to Python). Adding `+qid` to every `dig`
invocation in the system test suite also seems over the top for working
around something this silly. However, until we do something about this,
we might be seeing a whole class of surprising failures in the system
test suite caused by this behavior.
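The failure mode described above (an ID that is *defined* but equal to 0 being treated as undefined) is the classic truthiness pitfall. The following is a minimal Python sketch of that pattern, not the actual Net::DNS code; the function names are hypothetical and exist only to contrast the two checks.

```python
import random


def reply_id_buggy(query_id):
    # Pitfall: 0 is falsy, so a legitimate QID of 0 falls through to the
    # "assign a random value" branch, just like an undefined ID would.
    return query_id or random.randint(1, 0xFFFF)


def reply_id_fixed(query_id):
    # Only a genuinely undefined ID (None) triggers randomization.
    if query_id is None:
        return random.randint(1, 0xFFFF)
    return query_id
```

With the first variant, a responder that echoes the ID back produces exactly the "ID mismatch: expected ID 0" symptom seen in the `dig` output, while any non-zero ID is echoed correctly.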
[1]: https://www.net-dns.org/svn/net-dns/trunk/lib/Net/DNS/Header.pmNot plannedhttps://gitlab.isc.org/isc-projects/bind9/-/issues/4010Allow for scripts / hooks for key rollovers2023-04-11T12:43:32ZKarol BabiochAllow for scripts / hooks for key rollovers### Description
It seems like currently there is no good way to automate a KSK rollover, since the corresponding DS record has to be published in the parent zone. While there is [RFC7344](https://datatracker.ietf.org/doc/html/rfc7344), in reality it is not widely adopted. Personally I don't know any registrar who supports this yet. Anyway, this would require TSIG to be secure.
One of my registrars offers an HTTPS-based API to manage DNSSEC records. Hence, it's possible to write scripts that will automate the key rollover process.
### Request
There should be a way to trigger a script (with some inputs such as the key id, the DS record, etc.) whenever BIND is about to rotate a key. This way it should be possible to use `dnssec-policy` and fully automate the key rollover process, including the `KSK` key (rather than only the `ZSK` key).
### Links / referencesNot plannedhttps://gitlab.isc.org/isc-projects/bind9/-/issues/3992The XFR unreachable cache redesign2024-03-01T09:52:29ZOndřej SurýThe XFR unreachable cache redesignThe unreachable cache for **dead primaries** was added to BIND 9 in 2006 via 1372e172d0e0b08996376b782a9041d1e3542489. It features a 10-slot LRU array with 600 seconds (10 minutes) fixed delay. During this time, any primary with a hiccup would be blocked for the whole block duration (unless overwritten by a different dead primary).
One can argue:
- 10 minutes is too long for a fixed, non-configurable delay
- 10 slots are not enough - servers could be running 1M and more zones with different primaries; and especially in situations like these, there's a high chance that more primaries would be having problems
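The behavior described above (a fixed number of slots, LRU replacement, a fixed hold time) can be modelled roughly as follows. This is a minimal Python sketch, not BIND's C implementation; the class and method names are hypothetical, and the slot count and hold time are turned into parameters only to show what making them configurable would look like.

```python
class UnreachCache:
    """Toy model of a small LRU cache of unreachable primaries."""

    def __init__(self, slots=10, hold_time=600):
        self.slots = slots          # maximum number of tracked primaries
        self.hold_time = hold_time  # seconds a primary stays blocked
        self._entries = {}          # address -> (expire, last_used)

    def add(self, primary, now):
        if primary not in self._entries and len(self._entries) >= self.slots:
            # Cache is full: overwrite the least recently used slot.
            victim = min(self._entries, key=lambda p: self._entries[p][1])
            del self._entries[victim]
        self._entries[primary] = (now + self.hold_time, now)

    def is_unreachable(self, primary, now):
        entry = self._entries.get(primary)
        if entry is None:
            return False
        expire, _last_used = entry
        if now >= expire:
            # Hold time is over: the primary may be tried again.
            del self._entries[primary]
            return False
        self._entries[primary] = (expire, now)
        return True
```

With `slots=10` and `hold_time=600` an eleventh dead primary silently unblocks the least recently used one, which is the "unless overwritten by a different dead primary" effect; lowering `hold_time` shortens recovery without changing the structure.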
I think this needs a redesign, but meanwhile - I think that we can drop the `UNREACH_HOLD_TIME` to something like 10 seconds (or 60?) - this should still prevent a thundering herd over the unresponsive server, but the recovery is going to be much faster.Not plannedhttps://gitlab.isc.org/isc-projects/bind9/-/issues/3779catalog zone grammar does not enforce default-primaries key / should we suppo...2023-03-16T08:59:16ZPetr Špačekpspacek@isc.orgcatalog zone grammar does not enforce default-primaries key / should we support primary zones in catalog?### Summary
Catalog zones grammar in named.conf does not enforce/require the `default-primaries` key. This can be either a bug, or an opportunity to extend the feature in a meaningful way.
### BIND version used
* ~"Affects v9.16": d14a22b3d9fa8e8bb21dfe3bb0bca216a5b93910
* ~"Affects v9.18": f5e7192691568d4c089fbdd4ed4e93c7af785bae
* ~"Affects v9.19": 0e489b9ed4ba7821c50038dade014bf2b706bd12
### Steps to reproduce
1. Define catalog zone **without** `default-primaries` key. E.g.
```
catalog-zones {
zone "catalog.invalid"
//default-masters { 127.0.0.2; }
in-memory no
zone-directory "catzones"
min-update-interval 1;
};
```
2a. Start **with** matching files on disk
2b. Start **without** matching files on disk
### What is the current *bug* behavior?
The config is accepted by parser but causes surprising behavior later on.
Variant 2A:
The zone is on disk under correct name, and it loads just fine when the file is available in `catzones` directory. `rndc zonestatus` then reports:
```
name: .
type: secondary
files: catzones/__catz___default_catalog.invalid_..db
serial: 2023010600
nodes: 8438
last loaded: Fri, 06 Jan 2023 16:03:08 GMT
next refresh: Fri, 06 Jan 2023 16:12:19 GMT
expires: Fri, 13 Jan 2023 16:03:08 GMT
secure: yes
inline signing: no
key maintenance: none
dynamic: no
reconfigurable via modzone: yes
```
The next time the refresh timer fires, it errors out with
```
zone ./IN: cannot refresh: no primaries
```
but continues serving the zone until it expires. Kind of works, but not so much because it can never refresh and is bound to expire eventually.
Variant 2B:
File is not on disk. It fails to load as expected, and logs
```
zone ./IN: cannot refresh: no primaries
```
immediately.
### Possible fixes
I can see two options:
a) Require the `default-primaries` and error out if it is not present. That would be the same as for regular secondary zones, I believe.
b) Make this behavior "supported", probably by switching zone type to "primary" in case there is no `default-primaries` defined for the respective catalog. (In that case `in-memory` must be configured as `no`.)
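The two options can be contrasted with a small decision function. This is only a Python sketch of the proposed semantics, not `named` configuration-checking code; the function name, the dictionary layout, and the requirement that `in-memory` be explicitly set to `no` are assumptions made for illustration.

```python
def member_zone_type(options, strict=True):
    """Return the zone type that catalog member zones would get."""
    if options.get("default-primaries"):
        return "secondary"
    if strict:
        # Option a): reject the configuration outright.
        raise ValueError("catalog zone requires 'default-primaries'")
    # Option b): fall back to primary zones, which need files on disk.
    if options.get("in-memory") is not False:
        raise ValueError("'in-memory no' is required without primaries")
    return "primary"
```

Under option a) the configuration from the reproduction steps is refused at parse time; under option b) the same configuration yields primary zones instead of secondaries that can never refresh.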
Personally I think it makes sense to do b) because it eliminates need to have two different per-zone config management procedures for primaries.
I mean - with "strict" variant adding a new primary zone always requires `rndc addzone` + catalog zone modification on the primary side.
With the less strict variant, `rndc addzone` is not necessary and the whole state is in the catalog zone, which has to be maintained for secondaries anyway.Not plannedhttps://gitlab.isc.org/isc-projects/bind9/-/issues/3695Improvement: Including query time in dnstap CLIENT_RESPONSE messages2023-01-11T12:15:33ZBorja Marcos EA2EKHImprovement: Including query time in dnstap CLIENT_RESPONSE messages### Description
While the dnstap specification recommends including the query time for AUTH_RESPONSE, RESOLVER_RESPONSE and
CLIENT_RESPONSE dnstap messages, the latter is excluded.
Having the query time in CLIENT_RESPONSE dnstap messages would be very useful when using dnstap to keep track
of response times.
### Request
In lib/dns/dnstap.c (both for 9.16 and 9.18) the dns_dt_send function accepts the qtime and rtime parameters.
However, when building the dnstap message, CLIENT_RESPONSE messages are prevented from using the qtime parameter.
```
        dm.m.has_response_time_sec = 1;
        dm.m.response_time_nsec = isc_time_nanoseconds(t);
        dm.m.has_response_time_nsec = 1;
        /*
         * Types CR, RR, and FR can fall through and get the query
         * time set as well. Any other response type, break.
         */
        if (msgtype != DNS_DTTYPE_RR && msgtype != DNS_DTTYPE_FR
            && msgtype != DNS_DTTYPE_CR) { // << I HAVE ADDED THIS!
                break;
        }
        FALLTHROUGH;
case DNS_DTTYPE_AQ:
case DNS_DTTYPE_CQ:
case DNS_DTTYPE_FQ:
case DNS_DTTYPE_RQ:
case DNS_DTTYPE_SQ:
case DNS_DTTYPE_TQ:
case DNS_DTTYPE_UQ:
        if (qtime != NULL) {
                t = qtime;
        }
        dm.m.query_time_sec = isc_time_seconds(t);
        dm.m.has_query_time_sec = 1;
        dm.m.query_time_nsec = isc_time_nanoseconds(t);
        dm.m.has_query_time_nsec = 1;
        break;
```
I have tried making the simple change shown above (so that qtime is considered for
CLIENT_RESPONSE messages as well) and it works both for 9.16.35 and 9.18.9.
The change looks safe enough (it won't crash because if qtime is NULL, t will contain a
timestamp obtained when dns_dt_send() is invoked) and at worst it would contain a false
qtime.
A more correct alternative would be to include it for CLIENT_RESPONSE messages only if qtime != NULL. But
I don't know whether that can happen or whether all the calls to dns_dt_send() will contain qtime.
Also, is it possible for qtime to be missing for a CLIENT_RESPONSE but not for a RESOLVER_RESPONSE? Because for a RESOLVER_RESPONSE it would mean that the query time in the dnstap message would contain the timestamp obtained in dns_dt_send() and, being probably
greater than the response time itself, that would botch a time difference calculation.
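The "more correct alternative" can be written down as a small model of the decision logic. This is a simplified Python sketch, not the code in `lib/dns/dnstap.c`: message types are plain strings, `now` stands in for the timestamp taken inside `dns_dt_send()`, and the fallback to `now` when a time is missing is an assumption based on the description above.

```python
RESPONSES_WITH_QTIME = {"RR", "FR", "CR"}  # "CR" is the proposed addition


def message_times(msgtype, qtime, rtime, now):
    """Return the timestamps that would be placed in the dnstap message."""
    times = {}
    if msgtype.endswith("R"):
        # Response types always carry a response time.
        times["response_time"] = rtime if rtime is not None else now
        if msgtype not in RESPONSES_WITH_QTIME:
            return times
        if msgtype == "CR" and qtime is None:
            # Stricter variant: never invent a query time for CLIENT_RESPONSE.
            return times
    # Query types, and the responses that fall through, get a query time.
    times["query_time"] = qtime if qtime is not None else now
    return times
```

The `"RR"` case with `qtime=None` shows the concern raised in the last paragraph: the query time falls back to `now`, which can be later than the response time and yields a negative difference.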
### Links / referencesNot plannedhttps://gitlab.isc.org/isc-projects/bind9/-/issues/3687Add the ability to specify TLS configuration at the zone level for catalog zones2023-03-29T07:57:03ZMark AndrewsAdd the ability to specify TLS configuration at the zone level for catalog zonesCurrently the only way to specify which TLS configuration to use with catalog member zones is to inherit it from the default-primaries settings.
One possible mechanism would be to support multiple fields in the TXT record that currently specifies the TSIG key with "" indicating that field is empty.
e.g.
- TXT keyname TLS-configuration
- TXT keyname
- TXT "" TLS-configuration
- TXT "" ""
- TXT keyname ""
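The record forms listed above could be interpreted along these lines. This is a Python sketch of the proposed format only; the function name is hypothetical, the input is assumed to be the list of character-strings already extracted from the TXT record, and nothing here reflects existing BIND parsing code.

```python
def parse_member_txt(fields):
    """Map TXT character-strings to (tsig_key_name, tls_config_name)."""
    if not 1 <= len(fields) <= 2:
        raise ValueError("expected one or two TXT fields")
    # An empty string ("") marks a field that is intentionally unset.
    key = fields[0] or None
    tls = (fields[1] or None) if len(fields) == 2 else None
    return key, tls
```

A single-field record keeps its current meaning (TSIG key only), which is why older servers would need upgrading only before two-field records are published.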
Deploying such a change would require the servers involved to be upgraded prior to the use of the new record format.Not plannedhttps://gitlab.isc.org/isc-projects/bind9/-/issues/3311Consider parent-centric delegations2024-03-01T10:04:57ZOndřej SurýConsider parent-centric delegationsThis is an umbrella issue to discuss the parent vs child-centric delegations.
## Child-centric NS
The child-centric NS way lets the child NS records override the delegation NS, but the parent NS has to be used at least once. This works fine as long as the parent and child NS records are in sync. When they are not in sync (both inter and intra), the used delegation NS can vary between runs based on what's in the cache.
## Parent-centric NS
The parent-centric NS way always uses the parent NS records for delegations, but requires a separate "delegation" database that's distinct from the resource-record cache. The parent-centric NS doesn't suffer from the problems that could happen when the child-NS and parent-NS are out of sync - there's only one "authority" for the delegation NS (parent).
This approach is not without problems - because of the way DNS is (under-)specified, the child-centric NS has been used for a long time, and changing BIND 9 to use the parent NS will break some users' expectations. Fortunately for us, this path has been already paved by (at least) Nominum Vantio and Google Public DNS (and apparently the world didn't collapse).
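The difference between the two models comes down to where the NS set used for delegation is stored. The following Python sketch is a deliberately simplified illustration with hypothetical class names; it ignores TTLs, trust ranking, and everything else a real resolver has to handle.

```python
class ChildCentricResolver:
    """One cache; NS data from the child overrides the parent's."""

    def __init__(self):
        self.cache = {}  # zone -> (ns_set, source)

    def learn(self, zone, ns, source):
        current = self.cache.get(zone)
        if current is None or source == "child" or current[1] == "parent":
            self.cache[zone] = (frozenset(ns), source)

    def delegation(self, zone):
        return self.cache[zone][0]


class ParentCentricResolver:
    """Separate delegation store fed only by the parent."""

    def __init__(self):
        self.delegations = {}  # zone -> ns_set from the parent
        self.cache = {}        # zone -> ns_set from the child (answers only)

    def learn(self, zone, ns, source):
        target = self.delegations if source == "parent" else self.cache
        target[zone] = frozenset(ns)

    def delegation(self, zone):
        return self.delegations[zone]
```

When parent and child disagree, the first model returns whichever set happens to be cached, while the second always returns the parent's set.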
## To be considered
- [ ] DS vs apex-CNAME
- [ ] parent vs child NSEC RRsets
- [ ] glue records from the parent pointing into the child zone
- [ ] Debug/query options
(add more as stuff comes up in the discussion)Not plannedhttps://gitlab.isc.org/isc-projects/bind9/-/issues/3305Consider removing the built-in "_bind" view from the default configuration2022-05-02T08:07:17ZMichał KępieńConsider removing the built-in "_bind" view from the default configurationTime to stir up a hornet's nest!
The built-in `_bind` view has been part of BIND 9 since version 9.3.0.
Its purpose is to service CHAOS class queries for the following zones:
- `version.bind`
- `hostname.bind`
- `id.server`
- `authors.bind`
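For readers unfamiliar with CHAOS class queries, this is roughly what such a query looks like on the wire. The Python sketch below builds the message by hand using only the standard library; the function name is hypothetical, and the type and class codes (TXT = 16, CH = 3) come from the DNS specification rather than from this issue.

```python
import struct

QTYPE_TXT = 16  # TXT record type
QCLASS_CH = 3   # CHAOS class


def build_chaos_txt_query(name, qid):
    """Build a DNS query message asking for `name` CH TXT."""
    # Header: ID, flags (all zero), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", qid, 0, 1, 0, 0, 0)
    # QNAME: length-prefixed labels terminated by the root label.
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + qname + struct.pack("!HH", QTYPE_TXT, QCLASS_CH)
```

Sending the bytes returned by `build_chaos_txt_query("version.bind", 0x1234)` over UDP is the hand-rolled equivalent of `dig @server version.bind CH TXT`.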
I have some thoughts on these. YMMV.
- `version.bind`: commonly set to `none` or some nonsense string in
production environments because it is believed to be a security hole
:shrug: [citation needed]
- `hostname.bind`: superseded by [NSID][1], I think?
- `id.server`: same.
That leaves us with `authors.bind`, which is a bit of a delicate topic.
I would not want to hurt anyone's feelings, so please just hear me out;
this issue is meant to be a place for discussion.
The primary problem I have with the `_bind` view is that it is a
liability on memory-constrained platforms because its presence in the
default configuration causes a useless `dns_resolver_t` object to be
[unconditionally created][2] upon `named` startup. That is no small
object: it comes with tasks, dispatches, etc. - the ironic part being
that this view does not need recursion at all (`recursion no;` does not
help). To the best of my knowledge, there is no way to disable creating
that view in the configuration file; it can only be *replaced* with a
different view, which does not prevent the memory use problem.
Other hiccups which this view has caused in the past (that I can
recall...) include making the default configuration vulnerable to a
security issue related to RRL, which is enabled for the `_bind` view by
default (see [CVE-2021-25218][3]), or having to extend its configuration
to prevent it from uselessly allocating even more memory on startup (see
86698ded32515710b5b8734b4ed8ac4d2be62b60).
I have been running a home resolver with the `_bind` view removed from
the source code for about a year and a half now and I have not noticed
any adverse effects caused by that modification.
I think we should consider removing the `_bind` view from the default
configuration. It can always be re-enabled via explicit configuration,
if somebody wants that. In other words, I think it should be "opt-in"
rather than "opt-out" (noting that there is no way to *actually* opt-out
right now). I am *not* proposing to remove the code responsible for
preparing the contents of the `authors.bind` zone or any other built-in
zone served by the `_bind` view. It's just that IMHO the long-term
costs of maintaining this view in the default configuration are not
worth the benefits.
Let the tomatoes fly :tomato: :tomato: :tomato:
[1]: https://datatracker.ietf.org/doc/html/rfc5001
[2]: https://gitlab.isc.org/isc-projects/bind9/-/blob/fcab10a26ece6419c2f53a2ad82499b4b5ba75c5/bin/named/server.c#L4740-4743
[3]: https://gitlab.isc.org/isc-projects/bind9/-/issues/2856#note_229301Not plannedhttps://gitlab.isc.org/isc-projects/bind9/-/issues/3050Post load checking of missing delegations2023-11-02T16:26:08ZMark AndrewsPost load checking of missing delegationsIs it worthwhile to perform a post-load DS lookup for each primary / slave zone against the other loaded zones, looking for an NXDOMAIN response which would indicate a missing delegation? This would catch cases like bhutan.gov.bt where both it and the parent zone are served by the same servers but there isn't a delegation for bhutan.gov.bt in the gov.bt zone.Not plannedhttps://gitlab.isc.org/isc-projects/bind9/-/issues/2879Add --disable-doh to a CI build?2023-11-02T16:26:08ZMark AndrewsAdd --disable-doh to a CI build?The following discussion from !5353 should be addressed:
- [ ] @marka started a [discussion](https://gitlab.isc.org/isc-projects/bind9/-/merge_requests/5353#note_231504): (+1 comment)
> One remaining question is "do we add yet another system with --disable-doh to CI?"
- [ ] Also should we have a CI build that does not have libnghttp2 installed.Not plannedhttps://gitlab.isc.org/isc-projects/bind9/-/issues/2485DNS protocol cleanup: require correct AA bit2023-08-16T16:51:42ZPetr Špačekpspacek@isc.orgDNS protocol cleanup: require correct AA bit### Description
Allegedly, different resolvers treat the AA bit in responses differently, and this is causing different operational problems for each implementation. PowerDNS and Knot Resolver have had issues with that.
The proposal by Peter van Dijk is to be strict on the AA bit and punish non-compliance. The main motivation seems to be code simplification when it comes to various combinations of NXDOMAIN/NOERROR without SOA RR and/or "extra" NS records in authority which are sometimes added as "good measure" but do not actually mean a referral.
Anecdotes from the field:
a) Ralf Weber from Akamai has some reservations:
> Given that a lot of people use resolvers in front of their authoritative servers who don't send AA I fail to envision what resolvers should do. If we drop non AA answers I expect huge portion of the Internet to go dark, though I don't have hard numbers on that.
b) Recent versions of PowerDNS switched to a stricter mode and insist on the AA bit being correct. A person from Deutsche Telekom claims this:
> To give a sense of possible impact, we have tens of millions of subscribers and only 5-10 cases per year estimated. So I guess nothing would "go dark" :slightly_smiling_face:
### Links / references
Thread https://chat.dns-oarc.net/community/pl/57pcpenfkf86tr8onmhn1q5a4a
Personally I argue this is
a) not significant enough
b) not widespread enough
to warrant a full-fledged flag day, but we can start being stricter on the AA bit if we decide to do so. PowerDNS already went in that direction so the first-mover disadvantage is already paid :-)Not plannedhttps://gitlab.isc.org/isc-projects/bind9/-/issues/2405[ISC-support #17264] ADB overmem condition and cleaning - very difficult to d...2024-03-01T10:04:57ZBrian Conry[ISC-support #17264] ADB overmem condition and cleaning - very difficult to detect and causes erratic behaviorThe ADB's mctx size is set to 1/8 of the max-cache-size, if set. This is the only means to control the ADB memory limit. There is also a hard-coded maximum ADB size applied to ADBs for views that share a cache.
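As a worked example of the 1/8 rule stated above, the following Python sketch computes the implied ADB limit. The function name is hypothetical, integer division is an assumption about rounding, and the hard-coded cap for views sharing a cache is not modelled because its value is not given here.

```python
def adb_memory_limit(max_cache_size):
    """Return the ADB limit in bytes implied by max-cache-size."""
    if max_cache_size is None:
        # max-cache-size not set: no limit is derived in this model.
        return None
    return max_cache_size // 8


GIB = 1024 ** 3
MIB = 1024 ** 2
limit = adb_memory_limit(2 * GIB)  # 2 GiB cache implies a 256 MiB ADB limit
```

Because the ratio is fixed, the only way to give the ADB more room in this model is to raise `max-cache-size` itself.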
When it goes overmem, the ADB starts removing names and entries. The strategy for removing entries doesn't seem to be tied strongly to utility.
This can lead to erratic behavior as BIND is constantly forgetting information about server SRTTs, EDNS capabilities, and other useful data.
In some cases, if not all of the entries for servers associated with a zone are affected by the overmem purge, this can cause the resolver to fixate on a small subset of the servers authoritative for the zone - and not necessarily the subset with the best SRTT.
There is no logging at any level related to ADB overmem activities, nor are there any stats directly related to ADB memory usage.
There are stats for counts of names and entries, along with the number of buckets for each type, but there's no reliable way to map those to memory usage.
The stats channel does contain detail for the ADB memory contexts, but there's no reliable way to map those memory contexts to a particular view.
It seems likely that most of the time the symptoms of an overmem ADB will be minor and nearly impossible to directly measure - small delays and increases in CPU usage associated with the repeated creation and destruction of ADB entries and/or fixation on suboptimal upstream servers - but will definitely degrade the quality of service that the resolver is providing.
This behavior was noticed by a customer when their monitoring zone happened to, by chance, be negatively affected.
Most of the symptoms described here are theoretical, based on my understanding of the code and various customer-described, but unreproducible and otherwise unexplained, behaviors.
This issue is a feature request covering:
* specific testing by ISC to better understand the impact and range of behaviors when the ADB is overmem #2441
* additional logging related to ADB overmem activities #2435
* additional stats/metrics relating to ADB overmem #2436
* improvements to ADB overmem behavior (ideally based on some utility metric) #2437
* ability to directly control ADB size independent of cache size #2438
* revisit the hard-coded shared-cache maximum ADB size (e.g. remove in favor of configuration) #2439
* system tests related to any/all items aboveNot plannedhttps://gitlab.isc.org/isc-projects/bind9/-/issues/2358Update CI to have poisoned header files2022-03-01T09:42:36ZMark AndrewsUpdate CI to have poisoned header filesUpdate the CI to have a system with poisoned header files installed to detect when include order has been broken. The header files in the build / source tree should be found before these poisoned header files.
The contents of the poisoned header files should be something like `#error fix include order`.
/usr/include and /usr/local/include would be ideal locations to add poisoned header files.
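Generating the poisoned copies could be scripted along these lines. This is a minimal Python sketch with a hypothetical function name; it mirrors every `.h` file found under a source include directory into a target directory, which in CI would be something like `/usr/local/include` but should be a scratch directory when trying this out.

```python
import os

POISON = "#error fix include order\n"


def write_poisoned_headers(source_include_dir, target_dir):
    """Create a poisoned twin of every header under source_include_dir."""
    written = []
    for root, _dirs, files in os.walk(source_include_dir):
        for name in files:
            if not name.endswith(".h"):
                continue
            rel = os.path.relpath(os.path.join(root, name), source_include_dir)
            dest = os.path.join(target_dir, rel)
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            with open(dest, "w") as fh:
                fh.write(POISON)
            written.append(rel)
    return sorted(written)
```

Any compile that picks up a header from the target directory instead of the build or source tree then fails immediately with the `#error` message.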
#2357 is what happens when we don't detect this at development time. I used poisoned <isc/types.h> and <dns/types.h> when testing the fixes for #2357 but really should have every header file with poisoned versions.Not plannedhttps://gitlab.isc.org/isc-projects/bind9/-/issues/2340Enable logging of rpz re-writes to dnstap.2024-03-27T13:54:38ZPeter DaviesEnable logging of rpz re-writes to dnstap.### Description
Enable logging of rpz re-writes to dnstap.
The ability to send rpz rewrite information that is generated by category rpz to the dnstap output stream.
[RT #17273](https://support.isc.org/Ticket/Display.html?id=17273)Not plannedEvan HuntEvan Hunthttps://gitlab.isc.org/isc-projects/bind9/-/issues/2032Review BIND Performance suggestions KB2023-11-02T17:00:02ZVicky Riskvicky@isc.orgReview BIND Performance suggestions KBDraft is in document360.
Preview is at https://kb.isc.org/preview/v1/dbe412aa-9e0c-4071-ab12-90bfd02b877f/1
What we need is not so much Editing as Improvement:
- this is (sadly) not going to be much help to the more sophisticated users, because most of the advice boils down to, you have to test on your own platform, with your own traffic, so
- given this advice has to be tailored more for people with less background in performance tuning, we should provide some sample cli or log messages to look for diagnosing whether the condition is present (e.g. low memory, buffer overflow, problems with fragmented packets...)
if we can provide any better advice on how to best measure performance on a production system (in this case on a resolver), imho that will be useful to a lot of ppl. I am sort of assuming most people are using something like Prometheus/Grafana today and looking at those charts.Not plannedhttps://gitlab.isc.org/isc-projects/bind9/-/issues/4616Resolver cache redesign2024-03-01T12:29:31ZPetr Špačekpspacek@isc.orgResolver cache redesignThis is a meta issue to collect current problems & ideas what to do about it.
Current known problems:
- LRU cleaning can get state into a weird state: #2744
- Cache cleaning can block things, and is generally a mess: #3261, #4383
- Negative answers from e.g. a random subdomain attack can push out useful things: #2495, #1831
- ADB vs. cache size is hardcoded and nobody knows if this is optimal or not: #2483, #2405
- Sizing is hard to get right: #614
- Cache is child-centric: #3311
- RRSIGs are not tightly bound to the respective RR: #3396
- Data structures referenced by RBTDB are a mess: #4356, #3403, #3405Štěpán BalážikŠtěpán Balážikhttps://gitlab.isc.org/isc-projects/bind9/-/issues/3958Adjust default tcp-clients value upward2023-09-12T09:09:13ZVicky Riskvicky@isc.orgAdjust default tcp-clients value upwardThe default for tcp-clients is set at 150. As more users are now supporting encrypted DNS, whose sessions use TCP, it is likely that the % of overall DNS sessions using TCP will increase, and the current default quota will be too low for many users.
Although it is impossible to determine the ideal setting for all users, it seems likely that users who need to limit TCP sessions can support at least an order of magnitude more sessions, like maybe 2,000.
If we are very worried about impacting small-system users of BIND, perhaps we could just change the setting for BIND -S, which is not available to hobbyists?https://gitlab.isc.org/isc-projects/bind9/-/issues/3864Investigate the hot-keys problem2023-02-14T11:03:50ZOndřej SurýInvestigate the hot-keys problem### Hot Keys
Sharded data structures are somewhat vulnerable to hot keys, i.e. a single key which is frequently operated on. In this case there will be high contention on the shard to which this key belongs. This can be mitigated by introducing a non-deterministic sharding function which places hot keys in more than 1 shard. This solution does complicate things though, and introduces a probabilistic component to the data structure (e.g. look ups may result in the key not being found, when in fact it is actually in another shard. This trade off is generally acceptable in cache scenarios, where failed look ups just result in the key being re-populated).
Source: http://quinnftw.com/sharding-to-reduce-mutex-contention/
(This seems like something that might be happening in the tree structure like DNS itself for the portions of the hierarchy close to the "trunk".)
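The mitigation quoted above can be sketched in a few lines. This Python example is an illustration of the idea only, not code from BIND or from the linked article; the class name, the explicit `hot_keys` set, and the choice of neighbouring shards as replicas are all assumptions.

```python
import random
import threading
import zlib


class ShardedCache:
    """Sharded map in which hot keys may live in several shards."""

    def __init__(self, num_shards=8, hot_keys=(), replicas=2):
        self.num_shards = num_shards
        self.hot_keys = set(hot_keys)
        self.replicas = replicas
        self._shards = [({}, threading.Lock()) for _ in range(num_shards)]

    def candidate_shards(self, key):
        base = zlib.crc32(key.encode()) % self.num_shards
        count = self.replicas if key in self.hot_keys else 1
        return [(base + i) % self.num_shards for i in range(count)]

    def _pick(self, key):
        # Non-deterministic for hot keys: spreads lock contention.
        return random.choice(self.candidate_shards(key))

    def put(self, key, value):
        data, lock = self._shards[self._pick(key)]
        with lock:
            data[key] = value

    def get(self, key):
        # May miss a hot key that currently lives in another candidate shard.
        data, lock = self._shards[self._pick(key)]
        with lock:
            return data.get(key)
```

Ordinary keys always map to one shard and behave deterministically; a hot key such as a name close to the root can miss even though it is cached, in which case the caller simply re-populates it.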