# BIND issues
https://gitlab.isc.org/isc-projects/bind9/-/issues

# #4371 All the things that need to be fixed before 9.20
2024-03-27 · Matthijs Mekking (matthijs@isc.org)

This is an overarching issue for keeping track of all the things that need to be completed before the 9.20.0 release.
### Features
- [ ] #1128 Offline KSK (:gear: @matthijs)
- [x] #1129 HSM support via pkcs11-provider
- [x] #4363 Enforce stricter NSEC3 parameter limits
- [x] #4388 Accepting PROXYv2
- [x] #4241 Expose data about 'first time' zone maintenance in-progress
- [ ] #2099 Implement ZoneMD signature generation and verification. (:gear: !5217 @marka, @each)
### Config incompatibilities
- [x] #4364 named-compilezone defaults
- [x] #4373 safer "dnssec-validation yes"
- [x] #4447 "stale-answer-client-timeout" must be zero (:gear: !8699 @aram)
### Refactoring
- [x] #4411 QPDB lite (:gear: !8726 @matthijs, @each)
- [x] #4251 system test runner
### Bugs
- [x] #4340 "max-cache-size" is a no-op since BIND 9.19.16
- [x] #4213 BIND shutdown hang in checkds/ns9/ in cross-version-config-tests job
- [x] #4060 named doesn't shut down after receiving rndc stop command
- [x] #4211 AssertionError: named crashed, shutdown crash
- [ ] #4403 Resolve spike in memory at start of named (:gear: @ondrej)
- [ ] #4481 TCP issue (:gear: isc-private/bind9!639 @ondrej)
- [ ] #4475 Data races in isc_buffer_peekuint8, rdataset_settrust, and memmove (:gear: !8645 @marka)
- [x] #4625 DNSSEC validation incompatibility
- [ ] #4652 Server crash caused by external UDP queries

Milestone: BIND 9.19.x · Due: 2024-05-02

# #4616 Resolver cache redesign
2024-03-01 · Petr Špaček (pspacek@isc.org)

This is a meta issue to collect the current known problems and ideas about what to do with them.
Current known problems:
- LRU cleaning can get into a weird state: #2744
- Cache cleaning can block things, and is generally a mess: #3261, #4383
- Negative answers from e.g. a random subdomain attack can push out useful things: #2495, #1831
- ADB vs. cache size is hardcoded and nobody knows if this is optimal or not: #2483, #2405
- Sizing is hard to get right: #614
- Cache is child-centric: #3311
- RRSIGs are not tightly bound to their respective RRs: #3396
- Data structures referenced by RBTDB are a mess: #4356, #3403, #3405

Assignee: Štěpán Balážik

# #4426 Feature request - client.bind chaos class queries
2023-11-20 · Ray Bellis · Status: Not planned

# #4199 dig (and other tools) may send queries with QID=0, which confuses Net::DNS
2023-11-02 · Michał Kępień

Unless specified manually using `+qid=<value>`, `dig` uses a random
query ID for the DNS messages it sends out:
https://gitlab.isc.org/isc-projects/bind9/-/blob/bf8acd455693edef03881fd2180c5561bc0db66d/bin/dig/dighost.c#L2334
In particular, the value chosen can be 0. While QID=0 is perfectly
legal protocol-wise, it seems that some code bases, e.g. Net::DNS, are
unable to properly handle queries with QID=0. Here is an example:
https://gitlab.isc.org/isc-private/bind9/-/jobs/3509123
```
2023-07-06 14:14:45 INFO:serve-stale I:serve-stale_tmp_iwl06k82:disable responses from authoritative server (89)
2023-07-06 14:14:57 INFO:serve-stale I:serve-stale_tmp_iwl06k82:failed
```
`bin/tests/system/serve-stale_tmp_iwl06k82/dig.out.test89`:
```
;; Warning: ID mismatch: expected ID 0, got 46879
;; communications error to 10.53.0.2#19223: timed out
; <<>> DiG 9.19.15 <<>> +time +tries -p 19223 @10.53.0.2 txt disable
; (1 server found)
;; global options: +cmd
;; no servers could be reached
```
This looked weird to me, so I started `ans2/ans2.pl` manually and sent a
query to it using `dig @10.53.0.2 -p 5300 disable. TXT +qid=0 +tries=1`.
Guess what:
```
;; Warning: ID mismatch: expected ID 0, got 27885
;; communications error to 10.53.0.2#5300: timed out
; <<>> DiG 9.19.15 <<>> @10.53.0.2 -p 5300 disable. TXT +qid=0 +tries=1
; (1 server found)
;; global options: +cmd
;; no servers could be reached
```
Looking at [Net::DNS sources][1], the documentation says:
```
=head2 id
print "query id = ", $packet->header->id, "\n";
$packet->header->id(1234);
Gets or sets the query identification number.
A random value is assigned if the argument value is undefined.
```
However, the above seems to be imprecise: apparently if the ID is
*defined*, but *set to 0*, Net::DNS treats it as an undefined value.
This causes the `$packet->header->id` call to return a random value
instead of 0 for queries with QID=0, breaking responses to such queries.
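The described behavior is a classic truthiness pitfall. A minimal Python analogue of the suspected bug (hypothetical and illustrative only - the real Net::DNS code is Perl):

```python
import random

def assign_qid_buggy(requested=None):
    """Mimics the suspected Net::DNS behavior: a requested ID of 0 is
    indistinguishable from "no ID requested" in a truthiness check."""
    # Bug: `not requested` is True for both None and 0.
    if not requested:
        return random.randrange(1, 65536)
    return requested

def assign_qid_fixed(requested=None):
    """Only substitute a random ID when no ID was given at all."""
    if requested is None:
        return random.randrange(0, 65536)
    return requested

# QID=0 is perfectly legal on the wire, but the buggy check replaces it,
# so the responder echoes a random ID and dig reports "ID mismatch":
assert assign_qid_buggy(0) != 0
assert assign_qid_fixed(0) == 0
```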
I don't see any reasonable way to work around this problem in our Perl
code (apart from converting it to Python). Adding `+qid` to every `dig`
invocation in the system test suite also seems over the top for working
around something this silly. However, until we do something about this,
we might be seeing a whole class of surprising failures in the system
test suite caused by this behavior.
[1]: https://www.net-dns.org/svn/net-dns/trunk/lib/Net/DNS/Header.pm

Status: Not planned

# #4010 Allow for scripts / hooks for key rollovers
2023-04-11 · Karol Babioch

### Description
It seems like there is currently no good way to automate a KSK rollover, since the corresponding DS record has to be published in the parent zone. While there is [RFC7344](https://datatracker.ietf.org/doc/html/rfc7344), in reality it is not widely adopted; personally I don't know of any registrar who supports it yet. Besides, this approach would require TSIG to be secure anyway.
One of my registrars offers an HTTPS-based API to manage DNSSEC records. Hence, it's possible to write scripts that automate the key rollover process.
### Request
There should be a way to trigger a script (with some inputs such as the key id, the DS record, etc.) whenever BIND is about to rotate a key. This way it should be possible to use `dnssec-policy` and fully automate the key rollover process, including the `KSK` key (rather than only the `ZSK` key).
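As a sketch of what such a hook interface could look like (everything here is hypothetical - the hook path and its arguments are invented for illustration, not an existing BIND feature):

```python
import subprocess

def run_rollover_hook(hook_path, zone, key_tag, ds_rdata):
    """Run an operator-supplied script when a KSK is about to roll over,
    passing it the zone name, key tag, and DS record text.
    Hypothetical interface - BIND currently has no such hook."""
    result = subprocess.run(
        [hook_path, zone, str(key_tag), ds_rdata],
        capture_output=True,
        text=True,
        timeout=60,
    )
    # A non-zero exit status could tell named to postpone the DS
    # transition until the next retry, e.g. if the registrar API failed.
    return result.returncode == 0
```

The hook itself would then talk to the registrar's HTTPS API, receiving arguments such as `example.com 12345 "12345 13 2 ..."`.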
### Links / references

Status: Not planned

# #3992 The XFR unreachable cache redesign
2024-03-01 · Ondřej Surý

The unreachable cache for **dead primaries** was added to BIND 9 in 2006 via commit 1372e172d0e0b08996376b782a9041d1e3542489. It features a 10-slot LRU array with a fixed hold time of 600 seconds (10 minutes). During this time, any primary with a hiccup is blocked for the whole hold duration (unless its slot is overwritten by a different dead primary).
One can argue:
- 10 minutes is too long for a fixed, non-configurable delay
- 10 slots are not enough - servers could be running 1M or more zones with different primaries; and especially in situations like these, there's a high chance that more primaries would be having problems
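The mechanism criticized above can be modeled in a few lines (a toy Python sketch of a 10-slot, fixed-hold-time LRU; not the actual BIND 9 C code):

```python
import time

UNREACH_HOLD_TIME = 600  # seconds; the fixed delay criticized above
SLOTS = 10

class UnreachableCache:
    """Toy model of the 10-slot LRU unreachable-primary cache
    described above (illustrative, not BIND 9's implementation)."""
    def __init__(self, now=time.monotonic):
        self.now = now
        self.slots = []  # (address, expire_time) pairs, oldest first

    def add(self, addr):
        expire = self.now() + UNREACH_HOLD_TIME
        self.slots = [(a, e) for a, e in self.slots if a != addr]
        if len(self.slots) >= SLOTS:
            self.slots.pop(0)  # overwrite the oldest entry
        self.slots.append((addr, expire))

    def is_unreachable(self, addr):
        t = self.now()
        return any(a == addr and e > t for a, e in self.slots)
```

With only ten slots, an eleventh misbehaving primary silently evicts the oldest entry, while a single hiccup blocks a primary for the full ten minutes.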
I think this needs a redesign, but meanwhile I think we can drop `UNREACH_HOLD_TIME` to something like 10 seconds (or 60?). This should still prevent a thundering herd against the unresponsive server, while making recovery much faster.

Status: Not planned

# #3958 Adjust default tcp-clients value upward
2023-09-12 · Vicky Risk (vicky@isc.org)

The default for tcp-clients is set at 150. As more users now support encrypted DNS, whose sessions use TCP, the percentage of overall DNS sessions using TCP is likely to increase, and the current default quota will be too low for many users.
Although it is impossible to determine the ideal setting for all users, it seems likely that even users who need to limit TCP sessions can support at least an order of magnitude more of them, perhaps 2,000.
If we are very worried about impacting small-system users of BIND, perhaps we could just change the setting for BIND -S, which is not available to hobbyists?

# #3864 Investigate the hot-keys problem
2023-02-14 · Ondřej Surý

### Hot Keys
Sharded data structures are somewhat vulnerable to hot keys, i.e. a single key that is frequently operated on. In this case there will be high contention on the shard to which this key belongs. This can be mitigated by introducing a non-deterministic sharding function which places hot keys in more than one shard. This solution does complicate things, though, and introduces a probabilistic component to the data structure (e.g. lookups may result in the key not being found when in fact it is in another shard; this trade-off is generally acceptable in cache scenarios, where failed lookups just result in the key being re-populated).
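The mitigation described above can be sketched as follows (an illustrative Python toy, not BIND 9 code; shard counts and replica counts are arbitrary):

```python
import random

class ShardedCache:
    """Sketch of non-deterministic sharding: hot keys are written to and
    read from a randomly chosen replica shard, trading lock contention
    for probabilistic cache misses."""
    def __init__(self, nshards=8, hot_replicas=4):
        self.shards = [dict() for _ in range(nshards)]
        self.hot_replicas = hot_replicas
        self.hot = set()  # keys identified as hot (detection not shown)

    def _shard_index(self, key):
        base = hash(key) % len(self.shards)
        if key in self.hot:
            # Non-deterministic: spread the key over several shards.
            offset = random.randrange(self.hot_replicas)
            return (base + offset) % len(self.shards)
        return base

    def put(self, key, value):
        self.shards[self._shard_index(key)][key] = value

    def get(self, key, default=None):
        # For hot keys this may miss even though another replica shard
        # holds the key; in a cache that just triggers re-population.
        return self.shards[self._shard_index(key)].get(key, default)
```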
Source: http://quinnftw.com/sharding-to-reduce-mutex-contention/
(This seems like something that might be happening in a tree structure like the DNS itself, for the portions of the hierarchy close to the "trunk".)

# #3779 catalog zone grammar does not enforce default-primaries key / should we support primary zones in catalog?
2023-03-16 · Petr Špaček (pspacek@isc.org)

### Summary
The catalog zones grammar in named.conf does not enforce/require the `default-primaries` key. This can be either a bug or an opportunity to extend the feature in a meaningful way.
### BIND version used
* ~"Affects v9.16": d14a22b3d9fa8e8bb21dfe3bb0bca216a5b93910
* ~"Affects v9.18": f5e7192691568d4c089fbdd4ed4e93c7af785bae
* ~"Affects v9.19": 0e489b9ed4ba7821c50038dade014bf2b706bd12
### Steps to reproduce
1. Define catalog zone **without** `default-primaries` key. E.g.
```
catalog-zones {
zone "catalog.invalid"
//default-masters { 127.0.0.2; }
in-memory no
zone-directory "catzones"
min-update-interval 1;
};
```
2a. Start **with** matching files on disk
2b. Start **without** matching files on disk
### What is the current *bug* behavior?
The config is accepted by the parser but causes surprising behavior later on.
Variant 2A:
The zone is on disk under the correct name, and it loads just fine when the file is available in the `catzones` directory. `rndc zonestatus` then reports:
```
name: .
type: secondary
files: catzones/__catz___default_catalog.invalid_..db
serial: 2023010600
nodes: 8438
last loaded: Fri, 06 Jan 2023 16:03:08 GMT
next refresh: Fri, 06 Jan 2023 16:12:19 GMT
expires: Fri, 13 Jan 2023 16:03:08 GMT
secure: yes
inline signing: no
key maintenance: none
dynamic: no
reconfigurable via modzone: yes
```
Next time refresh timer hits it errors out with
```
zone ./IN: cannot refresh: no primaries
```
but continues serving the zone until it expires. Kind of works, but not so much because it can never refresh and is bound to expire eventually.
Variant 2B:
File is not on disk. It fails to load as expected, and logs
```
zone ./IN: cannot refresh: no primaries
```
immediately.
### Possible fixes
I can see two options:
a) Require the `default-primaries` and error out if it is not present. That would be the same as for regular secondary zones, I believe.
b) Make this behavior "supported", probably by switching zone type to "primary" in case there is no `default-primaries` defined for the respective catalog. (In that case `in-memory` must be configured as `no`.)
Personally I think it makes sense to do b) because it eliminates the need to have two different per-zone config management procedures for primaries.
I mean - with the "strict" variant, adding a new primary zone always requires `rndc addzone` plus a catalog zone modification on the primary side.
With the less strict variant, `rndc addzone` is not necessary and the whole state is in the catalog zone, which has to be maintained for secondaries anyway.

Status: Not planned

# #3695 Improvement: Including query time in dnstap CLIENT_RESPONSE messages
2023-01-11 · Borja Marcos EA2EKH

### Description
While the dnstap specification recommends including the query time for AUTH_RESPONSE, RESOLVER_RESPONSE and
CLIENT_RESPONSE dnstap messages, the latter is excluded.
Having the query time in CLIENT_RESPONSE dnstap messages would be very useful when using dnstap to keep track
of response times.
### Request
In lib/dns/dnstap.c (both for 9.16 and 9.18) the dns_dt_send function accepts the qtime and rtime parameters.
However, when building the dnstap message, CLIENT_RESPONSE messages are prevented from using the qtime parameter.
```c
		dm.m.has_response_time_sec = 1;
		dm.m.response_time_nsec = isc_time_nanoseconds(t);
		dm.m.has_response_time_nsec = 1;
		/*
		 * Types CR, RR, and FR can fall through and get the query
		 * time set as well. Any other response type, break.
		 */
		if (msgtype != DNS_DTTYPE_RR && msgtype != DNS_DTTYPE_FR
		    && msgtype != DNS_DTTYPE_CR) { /* << I HAVE ADDED THIS! */
			break;
		}
		FALLTHROUGH;
	case DNS_DTTYPE_AQ:
	case DNS_DTTYPE_CQ:
	case DNS_DTTYPE_FQ:
	case DNS_DTTYPE_RQ:
	case DNS_DTTYPE_SQ:
	case DNS_DTTYPE_TQ:
	case DNS_DTTYPE_UQ:
		if (qtime != NULL) {
			t = qtime;
		}
		dm.m.query_time_sec = isc_time_seconds(t);
		dm.m.has_query_time_sec = 1;
		dm.m.query_time_nsec = isc_time_nanoseconds(t);
		dm.m.has_query_time_nsec = 1;
		break;
```
I have tried making the simple change shown above (so that qtime is considered for
CLIENT_RESPONSE messages as well) and it works both for 9.16.35 and 9.18.9.
The change looks safe enough (it won't crash, because if qtime is NULL, t will contain a
timestamp obtained when dns_dt_send() is invoked) and at worst the message would contain a false
qtime.
A more correct alternative would be to include it for CLIENT_RESPONSE messages only if qtime != NULL, but
I don't know whether qtime can ever be NULL there, or whether all calls to dns_dt_send() supply it.
Also, is it possible for qtime to be missing for a CLIENT_RESPONSE but not for a RESOLVER_RESPONSE? For a RESOLVER_RESPONSE that would mean the query time in the dnstap message contains the timestamp obtained in dns_dt_send(), which, being probably
greater than the response time itself, would botch a time-difference calculation.
### Links / references

Status: Not planned

# #3687 Add the ability to specify TLS configuration at the zone level for catalog zones
2023-03-29 · Mark Andrews

Currently the only way to specify which TLS configuration to use with catalog member zones is to inherit it from the default-primaries settings.
One possible mechanism would be to support multiple fields in the TXT record that currently specifies the TSIG key, with "" indicating that a field is empty.
e.g.
- TXT keyname TLS-configuration
- TXT keyname
- TXT "" TLS-configuration
- TXT "" ""
- TXT keyname ""
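A sketch of how the proposed two-field record might be interpreted (my reading of the proposal above, implemented nowhere; field names are illustrative):

```python
def parse_member_txt(fields):
    """Parse the proposed two-field catalog-zone TXT record:
    [keyname, tls-configuration], where "" marks an empty field."""
    keyname = fields[0] if len(fields) > 0 and fields[0] != "" else None
    tls = fields[1] if len(fields) > 1 and fields[1] != "" else None
    return keyname, tls

# The five variants listed above:
assert parse_member_txt(["keyname", "tls1"]) == ("keyname", "tls1")
assert parse_member_txt(["keyname"]) == ("keyname", None)
assert parse_member_txt(["", "tls1"]) == (None, "tls1")
assert parse_member_txt(["", ""]) == (None, None)
assert parse_member_txt(["keyname", ""]) == ("keyname", None)
```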
Deploying such a change would require the servers involved to be upgraded prior to the use of the new record format.

Status: Not planned

# #3679 Fix stub zone operation when talking to minimal authoritative servers
2023-04-17 · Greg Choules

https://gitlab.isc.org/isc-projects/bind9/-/issues/1736 was created to fix the issue that when the authoritative server to which a stub zone is pointing is configured for "minimal-responses yes;" and the NS are in-zone, operation of the stub zone would fail because it did not receive address records for the zone's NS in the Additional section of the NS response.
I believe that the fix was applied at the wrong end of the conversation, by making the authoritative server's response to NS queries return Additional data even if "minimal-responses" is set to "yes".
A BIND recursive server (with stub zones configured) may be talking to a non-BIND authoritative server with a more strict minimal response configuration, in which case the problem would still exist.
I think the correct fix would be to modify the stub zone code so that, if it needs address records for the given NS, it queries for them. This should work without any extra user configuration because stub zones **must** be defined with at least one primary - a bit like root hints - which is used to send the initial SOA and NS queries. The same address(es) can be used for the address queries for the NS records, given that they are in the zone itself.

# #3311 Consider parent-centric delegations
2024-03-01 · Ondřej Surý

This is an umbrella issue to discuss parent- vs child-centric delegations.
## Child-centric NS
The child-centric NS way lets the child NS records override the delegation NS, but the parent NS has to be used at least once. This works fine as long as the parent and child NS records are in sync. When they are not in sync (both inter and intra), the used delegation NS can vary between runs based on what's in the cache.
## Parent-centric NS
The parent-centric NS way always uses the parent NS records for delegations, but requires a separate "delegation" database that's distinct from the resource-record cache. The parent-centric NS doesn't suffer from the problems that could happen when the child-NS and parent-NS are out of sync - there's only one "authority" for the delegation NS (parent).
This approach is not without problems - because of the way DNS is (under-)specified, child-centric NS has been used for a long time, and changing BIND 9 to use the parent NS will break some users' expectations. Fortunately for us, this path has already been paved by (at least) Nominum Vantio and Google Public DNS (and apparently the world didn't collapse).
## To be considered
- [ ] DS vs apex-CNAME
- [ ] parent vs child NSEC RRsets
- [ ] glue records from the parent pointing into the child zone
- [ ] Debug/query options
(add more as stuff comes up in the discussion)

Status: Not planned

# #3305 Consider removing the built-in "_bind" view from the default configuration
2022-05-02 · Michał Kępień

Time to stir up a hornet's nest!
The built-in `_bind` view has been part of BIND 9 since version 9.3.0.
Its purpose is to service CHAOS class queries for the following zones:
- `version.bind`
- `hostname.bind`
- `id.server`
- `authors.bind`
I have some thoughts on these. YMMV.
- `version.bind`: commonly set to `none` or some nonsense string in
production environments because it is believed to be a security hole
:shrug: [citation needed]
- `hostname.bind`: superseded by [NSID][1], I think?
- `id.server`: same.
That leaves us with `authors.bind`, which is a bit of a delicate topic.
I would not want to hurt anyone's feelings, so please just hear me out;
this issue is meant to be a place for discussion.
The primary problem I have with the `_bind` view is that it is a
liability on memory-constrained platforms because its presence in the
default configuration causes a useless `dns_resolver_t` object to be
[unconditionally created][2] upon `named` startup. That is no small
object: it comes with tasks, dispatches, etc. - the ironic part being
that this view does not need recursion at all (`recursion no;` does not
help). To the best of my knowledge, there is no way to disable creating
that view in the configuration file; it can only be *replaced* with a
different view, which does not prevent the memory use problem.
Other hiccups which this view has caused in the past (that I can
recall...) include making the default configuration vulnerable to a
security issue related to RRL, which is enabled for the `_bind` view by
default (see [CVE-2021-25218][3]), or having to extend its configuration
to prevent it from uselessly allocating even more memory on startup (see
86698ded32515710b5b8734b4ed8ac4d2be62b60).
I have been running a home resolver with the `_bind` view removed from
the source code for about a year and a half now and I have not noticed
any adverse effects caused by that modification.
I think we should consider removing the `_bind` view from the default
configuration. It can always be re-enabled via explicit configuration,
if somebody wants that. In other words, I think it should be "opt-in"
rather than "opt-out" (noting that there is no way to *actually* opt-out
right now). I am *not* proposing to remove the code responsible for
preparing the contents of the `authors.bind` zone or any other built-in
zone served by the `_bind` view. It's just that IMHO the long-term
costs of maintaining this view in the default configuration are not
worth the benefits.
Let the tomatoes fly :tomato: :tomato: :tomato:
[1]: https://datatracker.ietf.org/doc/html/rfc5001
[2]: https://gitlab.isc.org/isc-projects/bind9/-/blob/fcab10a26ece6419c2f53a2ad82499b4b5ba75c5/bin/named/server.c#L4740-4743
[3]: https://gitlab.isc.org/isc-projects/bind9/-/issues/2856#note_229301

Status: Not planned

# #3081 When is BIND ready?
2024-03-27 · Greg Choules

Related to Support ticket [19717](https://support.isc.org/Ticket/Display.html?id=19717)
The purpose of this issue is to make BIND more verbose and precise about reporting various stages of readiness when starting up, leading to a definitive "I'm ready now" log message.
The question an operator will want an answer to is, when can I send queries to this server again?
Different features will all have their own completeness check. For example: RPZ, local zones, remote zones, mirror zones, CATZ. The request is for new log messages to allow operators to track progress of each of these features and a new (or redefined) final log message when all tasks are complete.
What is a task? When is it complete and when is BIND ready to do that thing?
We and the customer have, in parallel, come up with similar thinking on what needs to be done. The principle is: at startup time, create a one-time todo list from the zone configuration statements. As each list item is completed, generate a signal and remove it from the list. When all items are completed, generate a final completion signal and set the state of an indicator that can be queried via RNDC, so that users can periodically test the current complete/not-complete state.
Taking some different types of zones as examples, we would expect behaviour like this:
Primary zones:
- Read zone data from local storage. Once this has been read into memory the zone is 'ready', a signal is generated and no further readiness checks need to be made: this task is complete.
Secondary zones:
- If a zone has been configured with a file, read zone data from local storage. Once this has been read into memory the zone is 'ready', a signal is generated and no further readiness checks need to be made: this task is complete. NOTE: checking whether the zone is up to date (SOA queries and possible subsequent zone transfer) is specifically excluded from this task.
- If a zone has **not** been configured with a file, make SOA queries and attempt zone transfers as necessary in order to load the zone. If zone transfer succeeds and zone data is loaded into memory the zone is 'ready', a signal is generated and no further readiness checks need to be made: this task is complete. If zone transfer fails there needs to be a limit - number of tries without success - to how long this task remains on the todo list. In this case generate a 'not ready' signal and remove the task from the list.
Catalog zones:
- These can be treated similarly to Primary or Secondary zones for the catalog itself. Once the catalog is loaded generate a ready signal and remove it from the todo list.
- However, during processing of each catalog a further list of (member) zones will be generated, each of which need to be added to the todo list and treated as a Secondary zone with no previous local data storage - i.e. needing to be transferred from a primary server.
Response Policy Zones:
- These can be treated similarly to Primary or Secondary zones for the zone data itself, but with the (possible?) additional step of needing to build the policy once it has been loaded. An RPZ should be considered ready only when the policy is active and responses would be re-written.
Mirror zones:
- These are similar to secondary zones.
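The todo-list principle running through the zone types above could be sketched as follows (hypothetical names and API - nothing like this exists in BIND today):

```python
class ReadinessTracker:
    """Toy model of the startup todo list described above: each zone
    task signals ready or not-ready once, and a final signal fires
    when the list empties."""
    def __init__(self, tasks, log=print):
        self.pending = set(tasks)
        self.log = log
        self.all_done = False

    def complete(self, task, ready=True):
        if task not in self.pending:
            return
        self.pending.discard(task)
        self.log("zone %s: %s" % (task, "ready" if ready
                                   else "not ready, giving up"))
        if not self.pending and not self.all_done:
            self.all_done = True
            self.log("all startup tasks complete: server is ready")

    def status(self):
        # What an rndc-style readiness query would return.
        if self.all_done:
            return "ready"
        return "waiting on %d task(s)" % len(self.pending)
```

Note that a task that gives up (e.g. a secondary zone whose transfers keep failing) still leaves the list, so the final "ready" signal is about the todo list being drained, not about every zone being usable.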
Anything else?

# #3050 Post load checking of missing delegations
2023-11-02 · Mark Andrews

Is it worthwhile to perform a post-load DS lookup for each primary/secondary zone against the other loaded zones, looking for an NXDOMAIN response which would indicate a missing delegation? This would catch cases like bhutan.gov.bt, where both it and the parent zone are served by the same servers but there isn't a delegation for bhutan.gov.bt in the gov.bt zone.

Status: Not planned

# #2879 Add --disable-doh to a CI build?
2023-11-02 · Mark Andrews

The following discussion from !5353 should be addressed:
- [ ] @marka started a [discussion](https://gitlab.isc.org/isc-projects/bind9/-/merge_requests/5353#note_231504): (+1 comment)
> One remaining question is "do we add yet ano...The following discussion from !5353 should be addressed:
- [ ] @marka started a [discussion](https://gitlab.isc.org/isc-projects/bind9/-/merge_requests/5353#note_231504): (+1 comment)
> One remaining question is "do we add yet another system with --disable-doh to CI?"
- [ ] Also, should we have a CI build that does not have libnghttp2 installed?

Status: Not planned

# #2641 BIND Version checker/reporter
2021-04-28 · Vicky Risk (vicky@isc.org)

**Background**
ISC needs some general idea of how many instances of BIND are running which software versions. We frequently have to guesstimate this in order to assess the relative impact of a bug or vulnerability on our user base. We do not have any useful information from downloads, because our open source is published on so many sites, including many we do not control. Having this information would enable us to make better decisions about how many installations might be impacted and about what sort of fix is appropriate.
At the same time, we also need some mechanism to alert users who are running software that is EOL or subject to a known, published CVE. We should be able to provide unobtrusive but visible feedback to the operator when that is detected.
Users are wary of any feature that reports software information that could be used to identify and target them, or that carries any potential for side-channel data leakage. We must make it very transparent exactly what information is sent and what information we retain. Some users will wish to disable whatever version checker we implement, and we should make it easy to do so. However, if we do not enable this feature by default, it is unlikely that we will get enough data to be useful.
**Requirements**
1. We need a utility that will periodically (define) contact some system at ISC and report what BIND version it is running and check to see if that is a supported version clear of published CVEs.
1. The message or lookup from the version checker should consume a *very modest* amount of bandwidth. UDP should be adequate, retries and failure messages are not indicated if the lookup fails or is blocked. We want to be careful not to create a DDOS on the server.
1. The version checker does not need to check frequently - a daily period might be OK and might be easiest to implement; weekly is adequate, however.
1. The version checker should be enabled by default.
1. We might have one check that happens on startup and an identifiably different check that happens, say, weekly after that. This is to address the issue of test systems and ephemeral Docker containers biasing the statistics.
1. **TBD.** It might be useful to initiate the 'ongoing' checker only when some level of regular query traffic is reached, to eliminate systems that are unused or 'toy' systems, again to reduce those systems biasing the statistics. The problem is that some of these systems we might regard as 'toy' systems could still be important to their users and we would be denying them the benefit of the alerts that their system is compromised. Also, some systems with a lot of query traffic could simply be, e.g. unmanaged open resolvers that are pounded with abuse traffic.
1. It must be relatively easy to disable the version checker. It should be possible to disable it without rebuilding the image and without restarting the daemon.
1. Some operating system packagers are going to want to disable the version checker, or possibly to 'redirect' it to check some facility of their own. This should not be unnecessarily difficult for them to do.
1. This feature should be added to the development branch of BIND.
1. **It is TBD** whether we should backport it to the most recent stable branch.
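None of this is implemented, so the fragment below is purely a hypothetical illustration of how the enable-by-default, easy-disable, and packager-redirect requirements above might surface in named.conf; every option name here is invented:

```
options {
    // All option names below are hypothetical -- nothing here exists in BIND.
    version-check yes;              // enabled by default; "no" disables it
                                    // without rebuilding or restarting
    version-check-interval 1w;      // weekly ongoing check
    // An FQDN makes it easy for OS packagers to substitute their own responder:
    version-check-server "version-check.isc.org";
};
```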
**Local logging**
The version checker should insert a log message in the configured default BIND logging facility if the running version is EOL or subject to a known published CVE.
1. The log message should state the CVE status (clear, vulnerable, EOL, unrecognized version)
1. **TBD.** If there is adequate room, this message could provide any of the following:
- the date of the planned EOL for this version
- a link to get more information
- a link to download a new version
- a link to the vulnerability report
- a link to the CVE matrix
depending on what is feasible and reasonable given the space available.
1. **TBD** whether the log message should be at the Warning level or the Informational level
1. **TBD** whether the version checker should log that it ran, checked the version, and got an 'ok' response. Some users might want this, but it is not very useful and adds more chaff to the log; if it is logged at all, it should be at a low level (info or debug).
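As a rough illustration of the logging behaviour sketched above (the status names, message format, and function name are assumptions, not a design):

```python
# Hypothetical sketch: map a version-check result to a log level and a
# one-line message, per the "Local logging" notes above. All names assumed.
def version_check_log(status, version, eol_date=None, url=None):
    # 'clear' results, if logged at all, go to a low level; everything
    # else (vulnerable, eol, unrecognized) warrants a warning.
    level = "info" if status == "clear" else "warning"
    msg = f"version check: BIND {version} status={status}"
    if eol_date is not None:
        msg += f" eol={eol_date}"      # planned EOL date, if space allows
    if url is not None:
        msg += f" see {url}"           # link to CVE matrix / download page
    return level, msg
```

For example, `version_check_log("eol", "9.16.50", eol_date="2024-04-30")` would yield a warning-level message carrying the EOL date.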
**Version Status Responder**
1. ISC should stand up a responder system that the version checker can query
1. The responder should be identified by a FQDN. We will have to maintain this facility for a long time, and using a FQDN will make it easier for packagers to substitute their own responder if they choose.
1. The responder should have information on known BIND versions with support status and CVE status.
1. It should be possible for anyone who can publish a BIND CVE or post a BIND release to add or edit the known versions, CVE status and support status so we can easily keep this up to date, given we are releasing multiple new versions monthly.
1. The responder should log the time and the version number that the checker looks up, but it should not log the IP address the request comes from or... anything else.
1. **TBD.** How can we effectively identify package versions where the BIND version does not map to the ISC releases, such as when a packager backports CVE fixes?
1. **TBD** Should we make the census information public?
https://gitlab.isc.org/isc-projects/bind9/-/issues/2495
Mitigation for cache bloat due to negative RRsets (NXDOMAIN and NXRRSET) when preserving expired RRsets using max-stale-ttl
2024-03-01 Cathy Almond
I'm tagging this as 'Customer' because it is affecting/has affected several customer caches (unanticipated cache bloat) with 'stale-cache-enable yes;'
----
By design, negative cache content is not used in the same way as positive cache content. 'stale-answer-client-timeout' does not apply to negative content, so clients will time out and likely retry the same query several times before giving up, long before the resolver-query-timeout. Only when named has failed to get a response from the authoritative server(s) does the 'stale-refresh-time' window commence.
There is a good reason for this - when querying the authoritative servers for a name that is already NXDOMAIN or NXRRSET in cache, we don't know if this is going to be replaced with positive content or not. Therefore, to ensure that positive content is preferred, we have to wait to be sure that it's sensible to serve the stale negative content as a last resort.
UNFORTUNATELY:
1. Most negative content is single-use - so adding max-stale-ttl to its retention period can significantly increase cache bloat - for no useful reason whatsoever.
2. Most negative content has a much shorter TTL in cache than positive content (by design of sane zone administrators, or as capped by max-ncache-ttl).
So when we're not preserving stale content, the negative content is fairly quickly removed from cache, in comparison with the positive RRsets. But adding max-stale-ttl to the retention period can quite significantly tilt the balance in favour of all of this one-use-only cached negative content.
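A back-of-envelope example of that tilt (the TTL values below are illustrative assumptions; 86400 seconds is BIND's default max-stale-ttl):

```python
# Assumed, illustrative TTLs: a short negative TTL vs. a typical positive one.
negative_ttl = 60        # seconds a negative answer would normally be cached
positive_ttl = 3600      # seconds a typical positive answer is cached
max_stale_ttl = 86400    # BIND's default max-stale-ttl: one day

# Once stale content is preserved, cache retention grows by
# (ttl + max_stale_ttl) / ttl relative to the plain TTL.
negative_growth = (negative_ttl + max_stale_ttl) / negative_ttl   # 1441x
positive_growth = (positive_ttl + max_stale_ttl) / positive_ttl   # 25x
```

So a single-use negative entry lingers roughly 1400 times longer than its TTL, while a typical positive entry grows only about 25x: the bloat is dominated by negative content.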
We don't have any statistics that measure cache hits for different RRtypes, so we don't know for sure what percentage of negative content is used again (cache hits) versus just sitting there for no good reason (cache misses) - perhaps we should? (Perhaps we should also have stats on stale hits and misses by RRtype?)
But in any case, having seen too many caches overloaded with stale negative content, I should like to propose an option that either shortens the max-stale-ttl for negative content or, when set to zero, disables its retention entirely.
https://gitlab.isc.org/isc-projects/bind9/-/issues/2485
DNS protocol cleanup: require correct AA bit
2023-08-16 Petr Špaček pspacek@isc.org
### Description
Allegedly different resolvers treat AA bit in responses differently, and this is causing different operational problems for each implementation. PowerDNS and Knot Resolver have had issues with that.
Proposal by Peter van Dijk is to be strict on the AA bit and punish non-compliance. The main motivation seems to be code simplification when it comes to various combinations of NXDOMAIN/NOERROR without a SOA RR and/or "extra" NS records in the authority section, which are sometimes added as "good measure" but do not actually mean a referral.
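The strict rule under discussion can be sketched as a toy classifier (the function and predicate names are invented; this is not BIND's actual response-handling code):

```python
# Hypothetical strict-AA classification of a response received from an
# authoritative server. Inputs are invented booleans, not real BIND structures.
def classify(rcode, aa, soa_in_authority, ns_referral):
    # A referral: NOERROR without AA, carrying delegation NS records and no SOA.
    if rcode == "NOERROR" and not aa and ns_referral and not soa_in_authority:
        return "referral"
    # Strict mode: any final answer, including NXDOMAIN/NODATA, must have AA
    # set; otherwise the response is rejected outright.
    if not aa:
        return "reject"   # lenient resolvers instead try to guess the intent
    return "nxdomain" if rcode == "NXDOMAIN" else "answer"
```

The simplification claimed by proponents is visible here: with AA required, the ambiguous NOERROR/NXDOMAIN-without-SOA combinations collapse into a single "reject" branch instead of heuristics.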
Anecdotes from the field:
a) Ralf Weber from Akamai has some reservations:
> Given that a lot of people use resolvers in front of their authoritative servers who don't send AA I fail to envision what resolvers should do. If we drop non AA answers I expect huge portion of the Internet to go dark, though I don't have hard numbers on that.
b) Recent versions of PowerDNS switched to a stricter mode and insist on the AA bit being correct. A person from Deutsche Telekom claims this:
> To give a sense of possible impact, we have tens of millions of subscribers and only 5-10 cases per year estimated. So I guess nothing would "go dark" :slightly_smiling_face:
### Links / references
Thread https://chat.dns-oarc.net/community/pl/57pcpenfkf86tr8onmhn1q5a4a
Personally I argue this is
a) not significant enough
b) not widespread enough
to warrant a full-fledged flag day, but we can start being stricter on the AA bit if we decide to do so. PowerDNS already went in that direction, so the first-mover disadvantage is already paid :-)