ISC Open Source Projects issues
https://gitlab.isc.org/groups/isc-projects/-/issues
2021-07-08T09:17:25Z

https://gitlab.isc.org/isc-projects/bind9/-/issues/2279
Document in the ARM, how max-cache-size is used (from 9.16.6 and 9.17.4 and newer) to avoid server delays due to hash table resizing
2021-07-08T09:17:25Z
Cathy Almond

Related to #1775, !3935, !3936 and [Support ticket #16212](https://support.isc.org/Ticket/Display.html?id=16212)
We haven't done a very good job of documenting in the ARM how this new hash table sizing thing works.
There's just this snippet in the 9.16 ARM:
8.7.3 Feature Changes
• BIND’s cache database implementation has been updated to use a faster hash function with better distribution. In
addition, the effective max-cache-size (configured explicitly, defaulting to a value based on system memory or
set to unlimited) now pre-allocates fixed-size hash tables. This prevents interruption to query resolution when
the hash table sizes need to be increased. [GL #1775]
Meanwhile, max-cache-size is still described thus:
max-cache-size
This sets the maximum amount of memory to use for the server’s cache, in bytes or percentage
of total physical memory. When the amount of data in the cache reaches this limit, the server causes records to
expire prematurely based on an LRU-based strategy so that the limit is not exceeded. The keyword unlimited,
or the value 0, places no limit on the cache size; records are purged from the cache only when their TTLs expire.
Any positive values less than 2MB are ignored and reset to 2MB. In a server with multiple views, the limit applies
separately to the cache of each view. The default is 90%. On systems where detection of the amount of physical
memory is not supported, values represented as a percentage fall back to unlimited. Note that the detection of
physical memory is done only once at startup, so named does not adjust the cache size if the amount of physical
memory is changed during runtime
Neither of these really explains how max-cache-size affects cache hash table sizing, which is:
a) Set the initial hash table size to be 4 bits (see !3935, !3936 and #2075)
b) If we have set max-cache-size, or have not specified it at all (defaults to 90% of available system memory) then compute the largest expected hash table size we should need
c) On the first time we need to increase the hash table for any RBT, and assuming we have not set 'unlimited' (max-cache-size 0), then do one single increase, from the minimum, to the biggest size we expect to need (per action b)
Please find a way to explain this clearly in the ARM.
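
Steps (a) to (c) above can be sketched in C as follows. This is a minimal illustration, not the code in `lib/dns/rbt.c`: the constants `MAX_HASH_BITS` and `BYTES_PER_BUCKET` and both function names are hypothetical, and the fallback to one-bit growth when no maximum is known is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants for illustration; not taken from the BIND sources. */
#define MIN_HASH_BITS 4U        /* step (a): the initial table has 2^4 buckets */
#define MAX_HASH_BITS 32U       /* assumed upper bound on the table size */
#define BYTES_PER_BUCKET 4096U  /* assumed cache bytes served by one bucket */

/*
 * Step (b): derive the largest expected table size, in bits, from
 * max-cache-size (in bytes).  0 means "unlimited" and yields 0,
 * that is, no precomputed maximum.
 */
unsigned int
max_hash_bits(uint64_t max_cache_size) {
	if (max_cache_size == 0) {
		return 0;
	}
	uint64_t buckets = max_cache_size / BYTES_PER_BUCKET;
	unsigned int bits = MIN_HASH_BITS;
	while (bits < MAX_HASH_BITS && (UINT64_C(1) << bits) < buckets) {
		bits++;
	}
	return bits;
}

/*
 * Step (c): choose the new size when a table has to grow.  With a known
 * maximum, jump straight to it in a single resize; otherwise (assumed
 * here) grow by one bit, which doubles the table.
 */
unsigned int
next_hash_bits(unsigned int curbits, unsigned int maxbits) {
	assert(curbits >= MIN_HASH_BITS);
	if (maxbits > curbits) {
		return maxbits;
	}
	return (curbits < MAX_HASH_BITS) ? curbits + 1 : curbits;
}
```

Under these assumed constants, a 1 GiB cache gives `max_hash_bits(1073741824) == 18`, so the first resize would go from 4 bits directly to 18 bits, while `max-cache-size 0` leaves the table growing one bit at a time.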
====
We should, I think, also recommend that operators with multiple views in their configuration set max-cache-size per view. To prevent surprises, for minimal-use views where they don't know how big a cache they will need, but where it's probably not too big, they should use "max-cache-size 0;".
July 2021 (9.11.34, 9.11.34-S1, 9.16.19, 9.16.19-S1, 9.17.16)

https://gitlab.isc.org/isc-projects/bind9/-/issues/2277
Log when we grow the cache hash tables
2021-10-08T07:39:41Z
Cathy Almond

Related to #1775 (and its successors) and to [Support ticket #16212](https://support.isc.org/Ticket/Display.html?id=16212)
Although we're aware that resizing (growing) the cache hash tables can cause delays in processing client queries by a resolver, particularly when the hash table and cache have reached a significant size, we don't log anything when we do this operation.
This is a bit bizarre - we log when we resize the similar hash tables in ADB.
It was also requested that we add some logging in #1775, but this appears to have been overlooked:
> It would also be really helpful if any hash table growing could be logged - to include what the size is expanding to (this will help admins to tune their servers accordingly).
The customer involved in Support ticket #16212 also suggests that it's probably not necessary to log all cache hash table resizing, just the significant/big events. However, I'd be inclined to:
a) Log the initial size (when computed based on max-cache-size and whether or not the server allows recursion)
b) Log **all** resizing/rehash actions
That way, we have something to search for in the logs, if there is a suspicion that hash table resizing - particularly on recursive resolvers of significant size - might have occurred unexpectedly (despite #1775) and caused a short (likely a couple of seconds) blackout on the resolver.
Here's the proposed code submitted by the customer:
```
*** ./lib/dns/rbt.c-orig 2020-08-17 10:29:01.631928214 +0200
--- ./lib/dns/rbt.c 2020-08-17 10:29:33.567261898 +0200
***************
*** 2360,2365 ****
--- 2360,2373 ----
rbt->hashbits = newbits;
newsize = HASHSIZE(rbt->hashbits);
+
+ #ifdef WANT_LOG_REHASH
+ if (oldbits >= 14)
+ isc_log_write(dns_lctx, DNS_LOGCATEGORY_DATABASE, DNS_LOGMODULE_CACHE, ISC_LOG_WARNING,
+ "rehash %p: grow table from %d to %d starting", rbt, (int)oldsize, (int)newsize);
+ #endif
+
+
rbt->hashtable = isc_mem_get(rbt->mctx,
newsize * sizeof(dns_rbtnode_t *));
memset(rbt->hashtable, 0, newsize * sizeof(dns_rbtnode_t *));
***************
*** 2376,2381 ****
--- 2384,2393 ----
}
isc_mem_put(rbt->mctx, oldtable, oldsize * sizeof(dns_rbtnode_t *));
+ #ifdef WANT_LOG_REHASH
+ if (oldbits >= 14)
+ isc_log_write(dns_lctx, DNS_LOGCATEGORY_DATABASE, DNS_LOGMODULE_CACHE, ISC_LOG_WARNING, "rehash %p: grow finished", rbt);
+ #endif
}
static void
```
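
For comparison, option (b) above (log every resize, with no `oldbits >= 14` threshold) could look roughly like this. It is a standalone sketch: `format_rehash_log` and its message text are hypothetical, and it formats into a buffer with `snprintf()` instead of calling `isc_log_write()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format a log line for a hash table resize.  No size threshold is
 * applied, so every resize yields a message.  'name' identifies the
 * table (hypothetical; the patch above prints the rbt pointer instead).
 * Returns what snprintf() returns: the length the full message needs.
 */
int
format_rehash_log(char *buf, size_t buflen, const char *name,
		  unsigned int oldbits, unsigned int newbits) {
	assert(oldbits < newbits && newbits < 64);
	unsigned long long oldsize = 1ULL << oldbits;
	unsigned long long newsize = 1ULL << newbits;
	return snprintf(buf, buflen,
			"rehash %s: grow hash table from %llu to %llu buckets "
			"(%u to %u bits)",
			name, oldsize, newsize, oldbits, newbits);
}
```

Logging both the bucket counts and the bit widths gives operators a single string to search for when they suspect that a resize coincided with a short outage.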
Note - I have tagged this as a bug, since I consider it a failure of named not to log something that can affect its responsiveness when it occurs. If possible, I'd like to see this included in the December releases (and also back-ported to 9.11).
October 2021 (9.11.36, 9.11.36-S1, 9.16.22, 9.16.22-S1, 9.17.19)

https://gitlab.isc.org/isc-projects/bind9/-/issues/2276
Update to CentOS 7.9
2020-11-27T09:55:52Z
Michal Nowak

CentOS 7.9 is [out](https://wiki.centos.org/action/show/Manuals/ReleaseNotes/CentOS7.2009), we should update.
December 2020 (9.11.26, 9.11.26-S1, 9.16.10, 9.16.10-S1, 9.17.8)
Michal Nowak

https://gitlab.isc.org/isc-projects/bind9/-/issues/2275
Tighten DNS COOKIE response handling
2020-12-01T08:45:30Z
Mark Andrews

Fall back to TCP when we have already seen a DNS COOKIE response from the given address and don't have one in this UDP response. This could be a server that has turned off DNS COOKIE support, a misconfigured anycast server with partial DNS COOKIE support, or a spoofed response. Falling back to TCP is the correct behaviour in all 3 cases.
Future work, once the percentage of DNS COOKIE aware servers increases enough, will be to fall back to TCP on all UDP responses w/o DNS COOKIE options.
December 2020 (9.11.26, 9.11.26-S1, 9.16.10, 9.16.10-S1, 9.17.8)

https://gitlab.isc.org/isc-projects/bind9/-/issues/2274
Drop CentOS 6 support after November 30, 2020
2023-02-16T11:41:59Z
Michal Nowak

CentOS 6 [EOL date](https://wiki.centos.org/FAQ/General#What_is_the_support_.27.27end_of_life.27.27_for_each_CentOS_release.3F) is November 30, 2020. ~~We should decide on its support for `9_11` and `9_16` (it was dropped on `main` since the 9.17 epoch started).~~ In GitLab CI, CentOS 6 jobs will only be run for BIND 9.11 and BIND 9.11-S.
Given that `v9_16` is still under very active development (DoH) with dependencies being added (libnghttp2), I suggest CentOS 6 is dropped after November 30, 2020, so that we spare ourselves the headache of building dependencies (libev) for new BIND dependencies (libnghttp2).
Outstanding tasks:
- [x] Prepare and send an announcement to *bind-users*
- [x] (!4392) Drop CentOS 6 jobs from GitLab CI for `v9_16`
- [x] (isc-projects/images!90) Drop CentOS 6-specific parts from all Packer image recipes
- [x] (isc-projects/images@68115ddec1f7fd760464ab4a50be46fca4cebfd9) Drop CentOS 6 support from the `packager:rpm` Docker image
- [x] Drop CentOS 6-specific parts from `*.spec` files for BIND's build prerequisites
- [x] (isc-private/rpms/bind!14) Drop CentOS 6-specific parts from BIND RPM build scripts
- [x] (isc-private/rpms/bind!14) Drop CentOS 6-specific parts from BIND RPM test scripts
- [x] Remove CentOS 6 chroots from Copr before publishing the December releases
- [x] Update installation instructions on Copr so that they do not include CentOS 6-specific bits
- [x] Remove all CentOS 6 packages from Cloudsmith (after a few months); update "Due date" according to the next task
- [x] Remove all CentOS 6 artifacts ([Docker](https://gitlab.isc.org/isc-projects/images/-/tree/main/docker/bind9/centos-template), [Packer](https://gitlab.isc.org/isc-projects/images/-/tree/main/packer/centos), ...) after BIND 9.11 EOL
Michał Kępień
2023-12-31

https://gitlab.isc.org/isc-projects/bind9/-/issues/2273
Back-port #2066 to 9.11-S and 9.16 (Fix serve-stale so that it is usable when needed)
2021-06-07T13:55:45Z
Cathy Almond

This is a reminder/request to back-port the improved Serve-stale implementation to BIND 9.11-S for the March 2021 maintenance releases.
Backport to 9.11-S:
- [x] #2066 (`stale-refresh-time`) (9.11.33-S1)
- [x] #2247 (`stale-answer-client-timeout`) (9.11.33-S1)
- [x] #2248 (Update defaults) (9.11.33-S1)
- [x] #2281 (Coverity `CHECKED_RETURN` issue) (9.11.33-S1)
- [x] #2289 (Nonsensical TTLs in cache dump) (9.11.33-S1)
- [x] #2442 (TSAN error) (don't backport, fixing the TSAN error here is just silencing the warning)
- [x] #2434 (serve-stale \w fetch-limits) (9.11.33-S1)
- [x] #2443 (Coverity `OVERRUN` issue) (9.11.33-S1)
- [x] #2503 (`stale-answer-client-timeout` crash) (9.11.33-S1)
- [x] #2565 (`serve-stale fetch-limits` crash) (9.11.33-S1)
- [x] #2594 (serve-stale recursion race condition crash) (9.11.33-S1)
- [x] #2608 (`stale-answer-client-timeout` default off) (9.11.33-S1)
- [x] #2731 (serve-stale \w dns64) (9.11.33-S1)
- [x] #2733 (serve-stale \w prefetch) (9.11.33-S1)
- [x] !199 (several serve-stale improvements kchen) (9.11.33-S1)
- [x] add code to prohibit `stale-answer-client-timeout` > 0
Backport to 9.16:
- [x] #2066 (`stale-refresh-time`) (9.16.9)
- [x] #2247 (`stale-answer-client-timeout`) (9.16.12)
- [x] #2248 (Update defaults) (9.16.12)
- [x] #2281 (Coverity `CHECKED_RETURN` issue) (9.16.13)
- [x] #2289 (Nonsensical TTLs in cache dump) (9.11.15)
- [x] #2434 (serve-stale \w fetch-limits) (9.16.13)
- [x] #2442 (TSAN error) (9.16.12)
- [x] #2443 (Coverity `OVERRUN` issue) (9.16.13)
- [x] #2503 (`stale-answer-client-timeout` crash) (9.16.13)
- [x] #2565 (`serve-stale fetch-limits` crash) (9.16.13)
- [x] #2594 (serve-stale recursion race condition crash) (9.16.15)
- [x] #2608 (`stale-answer-client-timeout` default off) (9.16.15)
- [x] #2731 (serve-stale \w dns64) (9.16.17)
- [x] #2733 (serve-stale \w prefetch) (9.16.17)
- [x] !199 (several serve-stale improvements kchen) (9.16.17)
June 2021 (9.11.33, 9.11.33-S1, 9.16.17/9.16.18, 9.16.17-S1/9.16.18-S1, 9.17.14/9.17.15)
Matthijs Mekking matthijs@isc.org

https://gitlab.isc.org/isc-projects/bind9/-/issues/2272
Backport DoT/DoH-related merge requests
2021-06-14T19:24:02Z
Michał Kępień

This issue contains a list of DoT/DoH-related merge requests which
should be eventually backported to `v9_16`, but may need to wait in a
queue for a while before that happens.
**Merge requests that *must* be backported:**
- [ ] !3532 Add TLS support to named and dig
- [ ] !4373 Add support to link with libssl
- [x] !4584 refactor TLSDNS module to work with libuv/ssl directly
- [ ] !4571 Add support for incoming tranfers via XoT
- [ ] !4644 Resolve "Encrypted DNS - RFC 8484, DNS over HTTPS, DOH (also DoT comments)"
- [ ] !4653 Resolve "too easy to configure unencrypted DoH"
- [ ] !4689 report libnghttp2 version in 'named -V'
- [ ] !4766 Fix comparison between signed and unsigned integer expressions
- [ ] !4672 Resolve "RFC8484, DoH support in DIG (and any other relevant utilities)"
- [ ] !4794 Resolve "warning: array subscript is of type 'char' on NetBSD 9"
- [ ] !4792 Load full certificate chain from a certificate chain file
- [ ] !4803 Fix a XoT crash
- [ ] !4806 Resolve "Does not compile without deprecated OpenSSL APIs"
- [ ] !4820 Fix dangling uvreq when data is sent from tlsdns_cycle()
- [ ] !4809 Fix memory accounting bug in TLSDNS
- [ ] !4824 Call isc__nm_tlsdns_failed_read on tls_error to cleanup the socket
- [ ] !4851 TLS transport code refactoring and unit tests
- [ ] !4863 Fix "doth" system test failure with SSL_ERROR_SYSCALL (5)
- [ ] !4893 Merge the tls_test.c into netmgr_test.c and extend the tests suite
- [ ] !4906 Resolve "tlsstream.c: warning: comparison of integer expressions of different signedness"
- [ ] !5005 Fix flawed DoH unit tests logic and some corner cases in the DoH code. Fix doh_test failure on FreeBSD 13.0
- [ ] !5019 DoH flamethrower fixes
- [ ] !5024 Add DoH quota tests
- [ ] !5121 HTTP/2 write buffering
July 2021 (9.11.34, 9.11.34-S1, 9.16.19, 9.16.19-S1, 9.17.16)

https://gitlab.isc.org/isc-projects/bind9/-/issues/2271
Wrap the libuv into cmocka mock object for the unit test
2020-12-03T10:19:53Z
Ondřej Surý

On *BSDs, libuv is less reliable than on Linux and can return intermittent errors (not enough file descriptors, port binding failures, etc.).
As we need a more reliable way to test that, we need to wrap the libuv library calls into mock objects and simulate the failures programmatically.
December 2020 (9.11.26, 9.11.26-S1, 9.16.10, 9.16.10-S1, 9.17.8)
Ondřej Surý

https://gitlab.isc.org/isc-projects/bind9/-/issues/2259
zone_namerd_tostr called w/o lock being held
2020-11-13T10:52:05Z
Mark Andrews

Job [#1286391](https://gitlab.isc.org/isc-projects/bind9/-/jobs/1286391) failed for c19a35c945ebc21272143253d408e145b949a966:
```
WARNING: ThreadSanitizer: data race
Read of size 8 at 0x000000000001 by thread T1:
#0 inline_raw lib/dns/zone.c:1375
#1 zone_namerd_tostr lib/dns/zone.c:15316
#2 dns_zone_name lib/dns/zone.c:15391
#3 xfrin_log lib/dns/xfrin.c:1605
#4 xfrin_destroy lib/dns/xfrin.c:1477
#5 dns_xfrin_detach lib/dns/xfrin.c:739
#6 xfrin_connect_done lib/dns/xfrin.c:970
#7 tcpdnsconnect_cb netmgr/tcpdns.c:786
#8 tcp_connect_cb netmgr/tcp.c:292
#9 <null> <null>
#10 <null> <null>
Previous write of size 8 at 0x000000000001 by thread T2 (mutexes: write M1):
#0 zone_shutdown lib/dns/zone.c:14462
#1 dispatch lib/isc/task.c:1152
#2 run lib/isc/task.c:1344
#3 <null> <null>
Location is heap block of size 2769 at 0x000000000013 allocated by thread T3:
#0 malloc <null>
#1 default_memalloc lib/isc/mem.c:713
#2 mem_get lib/isc/mem.c:622
#3 mem_allocateunlocked lib/isc/mem.c:1268
#4 isc___mem_allocate lib/isc/mem.c:1288
#5 isc__mem_allocate lib/isc/mem.c:2453
#6 isc___mem_get lib/isc/mem.c:1037
#7 isc__mem_get lib/isc/mem.c:2432
#8 dns_zone_create lib/dns/zone.c:984
#9 configure_zone bin/named/server.c:6502
#10 do_addzone bin/named/server.c:13391
#11 named_server_changezone bin/named/server.c:13788
#12 named_control_docommand bin/named/control.c:207
#13 control_command bin/named/controlconf.c:392
#14 dispatch lib/isc/task.c:1152
#15 run lib/isc/task.c:1344
#16 <null> <null>
```
November 2020 (9.11.25, 9.11.25-S1, 9.16.9, 9.16.9-S1, 9.17.7)
Mark Andrews

https://gitlab.isc.org/isc-projects/bind9/-/issues/2252
ns_client_sendraw() is missing DNSTAP support.
2020-11-13T11:04:35Z
Mark Andrews

Found in the course of discussing [Support RT#17273][1] (though not
directly related to the issue reported in that ticket).
[1]: https://support.isc.org/Ticket/Display.html?id=17273
November 2020 (9.11.25, 9.11.25-S1, 9.16.9, 9.16.9-S1, 9.17.7)

https://gitlab.isc.org/isc-projects/bind9/-/issues/2250
DNS Flag Day 2020 - EDNS buffer size configuring does not work anymore
2020-12-02T22:34:28Z
Arsen Stasic

<!--
If the bug you are reporting is potentially security-related - for example,
if it involves an assertion failure or other crash in `named` that can be
triggered repeatedly - then please do *NOT* report it here, but send an
email to [security-officer@isc.org](security-officer@isc.org).
-->
### Summary
I think !4179 introduced a bug: the max-udp-size and edns-udp-size config options no longer have any effect.
### BIND version used
9.16.8
9.11.24
Older versions (9.16.7, 9.11.23) don't show this behavior.
### Steps to reproduce
Install the new BIND version with the following config:
```
edns-udp-size 2000;
max-udp-size 2000;
```
But you will still get the TC bit for answers larger than 1232 bytes.
### What is the current *bug* behavior?
You get the TC bit even if the answer is shorter than 2000 bytes.
### What is the expected *correct* behavior?
Not getting the TC-bit.
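
The expected behavior can be stated as a small C sketch. It is not named's actual code: `udp_payload_limit` and `must_set_tc` are hypothetical helpers, and the only rules assumed are that the configured `max-udp-size` caps the size advertised by the client and that EDNS values below 512 are treated as 512 (RFC 6891).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PLAIN_DNS_UDP_MAX 512U /* limit when the client sends no EDNS OPT */

/* Largest UDP response the server may send to this client, in bytes. */
size_t
udp_payload_limit(bool client_has_edns, size_t client_advertised,
		  size_t max_udp_size) {
	/* Precondition: the configured value is at least 512. */
	assert(max_udp_size >= PLAIN_DNS_UDP_MAX);
	if (!client_has_edns) {
		return PLAIN_DNS_UDP_MAX;
	}
	size_t limit = (client_advertised < max_udp_size) ? client_advertised
							  : max_udp_size;
	/* EDNS sizes below 512 are treated as 512. */
	return (limit < PLAIN_DNS_UDP_MAX) ? PLAIN_DNS_UDP_MAX : limit;
}

/* True if a response of response_len bytes must be truncated (TC=1). */
bool
must_set_tc(size_t response_len, bool client_has_edns,
	    size_t client_advertised, size_t max_udp_size) {
	return response_len >
	       udp_payload_limit(client_has_edns, client_advertised,
				 max_udp_size);
}
```

With `max-udp-size 2000;` and a client advertising 4096 bytes, a 1500-byte answer gives `must_set_tc(1500, true, 4096, 2000) == false`. The reported behavior corresponds to the limit being stuck at 1232 regardless of the configured value.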
### Relevant configuration files
```
edns-udp-size 2000;
max-udp-size 2000;
```
### Relevant logs and/or screenshots
With the new version installed on 28th October 2020 the TCP queries for DNSKEY quadrupled:
![Screenshot_2020-11-06_DSC-Grafana](/uploads/c633160feb899346b14ced8c47403ff0/Screenshot_2020-11-06_DSC-Grafana.png)
### Possible fixes
I think !4179 introduced this bug.
December 2020 (9.11.26, 9.11.26-S1, 9.16.10, 9.16.10-S1, 9.17.8)
Ondřej Surý

https://gitlab.isc.org/isc-projects/stork/-/issues/443
test improvement: listing system tests should not require packages
2022-06-21T12:33:57Z
Tomek Mrugalski

I've tried to get a list of system tests, but was told I need to build packages first.
```
$ ./venv/bin/pytest --collect-only tests.py
Cannot find deb or rpm Stork packages.
To prepare them run `rake build_pkgs_in_docker`.
```
Annoying, but I can live with it now.
outstanding

https://gitlab.isc.org/isc-projects/bind9/-/issues/2248
Update serve-stale configuration defaults
2021-02-03T08:25:09Z
Matthijs Mekking matthijs@isc.org

Update the defaults to the RFC 8767 recommended values (`stale-answer-ttl 30`, `max-stale-ttl 1d`, `stale-refresh-time 30s` or higher).
February 2021 (9.11.28, 9.11.28-S1, 9.16.12, 9.16.12-S1, 9.17.10)
Matthijs Mekking matthijs@isc.org

https://gitlab.isc.org/isc-projects/bind9/-/issues/2247
Add serve-stale option to set client timeout
2021-01-29T09:51:55Z
Matthijs Mekking matthijs@isc.org

Implement `stale-answer-client-timeout`, which is the maximum amount of time a recursive resolver should allow between the receipt of a resolution request and sending its response (only to be used if `stale-answer-enable` is set).
February 2021 (9.11.28, 9.11.28-S1, 9.16.12, 9.16.12-S1, 9.17.10)
Diego dos Santos Fronza

https://gitlab.isc.org/isc-projects/bind9/-/issues/2246
Backport netmgr-related merge requests
2021-08-12T09:36:56Z
Michał Kępień

This issue contains a list of netmgr-related merge requests which should
be eventually backported to `v9_16`, but may need to wait in a queue for
a while before that happens.
**Merge requests that *must* be backported:**
- [x] ~~!3781 Fix socket closing races.~~
- [x] !4318 (included in !4455) Resolve "Add netmgr functions to support outgoing DNS queries"
- [x] !4341 (included in !4455) Fix improper closed connection handling in tcpdns.
- [x] !4386 (included in !4455) Turn all the callback to be always asynchronous
- [x] **!4444 Refactor netmgr and add more unit tests**
- [x] !4452 (included in !4455) Avoid netievent allocations when the callbacks can be called directly
- [x] !4458 (included in !4455) Make netmgr initialize and cleanup Winsock itself
- [x] !4459 (included in !4455) Distribute queries among threads even on platforms without SO_REUSEPORT_LB
- [x] !4465 (included in !4455) Don't use stack allocated buffer for uv_write()
- [x] !4468 (included in !4455) Fix datarace when UDP/TCP connect fails and we are in nmthread
- [x] !4469 (included in !4455) Use sock->nchildren instead of mgr->nworkers when initializing NM
- [x] !4472 (included in !4455) Fix s/HAVE_REUSEPORT_LB/HAVE_SO_REUSEPORT_LB/ typo in #define
**Merge requests that *may* be backported:**
- [ ] !4115 Resolve "convert dig and friends to use the netmgr"
- [ ] !4246 use netmgr for xfrin
- [ ] !4374 address some possible shutdown races in xfrin
- [ ] !4397 Resolve ""dig" crashes when interrupted while waiting for a TCP connection"
- [ ] !4466 Configure the system-wide TCP connection timeout on OpenBSD
- [ ] !4633 Resolve "Incorrect size passed to isc_mem_put"
- [x] !4628 Improve reliability of the netmgr unit tests
- [x] !4845 netmgr: Make it possible to recover from ISC_R_TIMEDOUT (backported without the relevant changes to `dig`, `rndc`, or xfrin)
- [ ] !4898 Prevent the double xfrin_fail() call
- [x] !4930 ensure read timeouts are recoverable
- [ ] !4796 Add workaround for "nslookup segfaults for SERVFAIL"
- [x] !4918 Refactor taskmgr to run on top of netmgr
- [x] !4983 Destroy netmgr before destroying taskmgr
- [x] !4981 Add nanosleep and usleep Windows shims
- [ ] !4982 Add support for generating backtraces on Windows
- [x] !5009 Bump the netmgr quantum to 1024
- [x] !5013 initalise sock->cond
- [x] !5021 Fix the outgoing UDP socket selection on Windows
[^1]: likely made redundant by !4444
September 2021 (9.16.21, 9.16.21-S1, 9.17.18)

https://gitlab.isc.org/isc-projects/bind9/-/issues/2245
bind 9.16.8 does not honor CPU affinity
2021-01-21T08:12:54Z
Ole Bjørn Hessen

### Summary
bind 9.16.8 does not honor CPU affinity mask on linux.
We are running a DPDK application that works as a firewall/DDoS
protection layer in front of named. To run DPDK we need to reserve
a couple of dedicated CPUs for DPDK threads. These threads must
run on the same CPU socket as the PCI bus/NIC, so they must run on
the first CPU socket. No other threads may use the CPUs that
are dedicated to this DPDK application.
On Linux, the taskset command is used to set the CPU affinity mask
for a process. Alternatively, the kernel boot parameter isolcpus can
be used to set the system CPU affinity.
The problem is that named ignores the existing affinity mask and blindly
binds the isc-{net,worker,socket}-{nr} threads to processor number {nr}.
### BIND version used
BIND 9.16.8
### Steps to reproduce
``` sh
# taskset fff0 ./bin/named/named -f -c /etc/named/named.conf -u named -U 6 -n 10
# ps -T -o pid,psr,time,comm -e | egrep 'isc-net-0000'
32374 0 00:00:00 isc-net-0000
```
The taskset mask tells named that it may use any CPU except CPUs 0, 1, 2 and 3.
### What is the current *bug* behavior?
### What is the expected *correct* behavior?
Select next available CPU relative to the existing affinity mask.
For the process above I would have expected the first thread to
bind to CPU 4.
``` sh
# ps -T -o pid,psr,time,comm -e | egrep 'isc-net-0000'
32374 4 00:00:00 isc-net-0000
```
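
The selection rule described here ("the n-th thread goes to the n-th CPU allowed by the mask") can be sketched in portable C. This is an illustration only: `nth_allowed_cpu` is a hypothetical helper that takes the affinity as a plain 64-bit bitmap, whereas real code on Linux would read a `cpu_set_t` via `pthread_getaffinity_np()`.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the index of the n-th CPU (0-based) whose bit is set in 'mask',
 * or -1 if the mask allows fewer than n+1 CPUs.
 */
int
nth_allowed_cpu(uint64_t mask, int n) {
	assert(n >= 0);
	for (int cpu = 0; cpu < 64; cpu++) {
		if ((mask >> cpu) & 1U) {
			if (n == 0) {
				return cpu;
			}
			n--;
		}
	}
	return -1; /* not enough CPUs in the mask */
}
```

For the `taskset fff0` example above, `nth_allowed_cpu(0xfff0, 0)` returns 4, which matches the expected placement of `isc-net-0000`. Returning -1 when the mask is exhausted lets the caller decide whether to wrap around or skip pinning.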
### Possible fixes
The following code fetches the existing affinity mask and uses it to
select the next available CPU.
``` c
# cat ../telenor-patches/telenor-honor-affinity.patch
diff -r -c ../bind-9.16.8-orig/lib/isc/pthreads/thread.c lib/isc/pthreads/thread.c
*** ../bind-9.16.8-orig/lib/isc/pthreads/thread.c 2020-10-13 10:41:40.000000000 +0200
--- lib/isc/pthreads/thread.c 2020-10-30 12:24:26.627360658 +0100
***************
*** 155,162 ****
cpuset_destroy(cset);
#else /* linux? */
cpu_set_t set;
CPU_ZERO(&set);
! CPU_SET(cpu, &set);
if (pthread_setaffinity_np(pthread_self(), sizeof(cpu_set_t), &set) !=
0) {
return (ISC_R_FAILURE);
--- 155,174 ----
cpuset_destroy(cset);
#else /* linux? */
cpu_set_t set;
+ if (pthread_getaffinity_np(pthread_self(), sizeof(cpu_set_t), &set) !=
+ 0) {
+ return (ISC_R_FAILURE);
+ }
+ int cpu_id = -1, cpu_aff_ok_counter = -1;
+ while (cpu_aff_ok_counter < cpu) {
+ cpu_id++;
+ if (CPU_ISSET(cpu_id, &set)) /* true if process affinity allows using cpu */
+ cpu_aff_ok_counter++;
+ if (cpu_id > 10000)
+ return (ISC_R_FAILURE);
+ }
CPU_ZERO(&set);
! CPU_SET(cpu_id, &set);
if (pthread_setaffinity_np(pthread_self(), sizeof(cpu_set_t), &set) !=
0) {
return (ISC_R_FAILURE);
```
January 2021 (9.11.27, 9.11.27-S1, 9.16.11, 9.16.11-S1, 9.17.9)

https://gitlab.isc.org/isc-projects/bind9/-/issues/2244
NTA-related crash after reconfiguring views
2020-12-03T15:36:27Z
Fritz Elfert

<!--
If the bug you are reporting is potentially security-related - for example,
if it involves an assertion failure or other crash in `named` that can be
triggered repeatedly - then please do *NOT* report it here, but send an
email to [security-officer@isc.org](security-officer@isc.org).
-->
### Summary
After modifying /etc/named.conf (moving some zones into a separate view) and restarting named,
it crashed every few (approx. 5) minutes. Before the config change, I had experimented with
`rndc nta ZONE`, where ZONE was one of the zones that were later moved into a separate view.
### BIND version used
```
BIND 9.11.23-RedHat-9.11.23-1.fc32 (Extended Support Version) <id:4f70056>
running on Linux x86_64 5.8.13-200.fc32.x86_64 #1 SMP Thu Oct 1 21:49:42 UTC 2020
built by make with '--build=x86_64-redhat-linux-gnu' '--host=x86_64-redhat-linux-gnu' '--program-prefix=' '--disable-dependency-tracking' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--sysconfdir=/etc' '--datadir=/usr/share' '--includedir=/usr/include' '--libdir=/usr/lib64' '--libexecdir=/usr/libexec' '--sharedstatedir=/var/lib' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--with-python=/usr/bin/python3' '--with-libtool' '--localstatedir=/var' '--enable-threads' '--enable-ipv6' '--enable-filter-aaaa' '--with-pic' '--disable-static' '--includedir=/usr/include/bind9' '--with-tuning=large' '--with-libidn2' '--enable-openssl-hash' '--with-geoip2' '--enable-native-pkcs11' '--with-pkcs11=/usr/lib64/pkcs11/libsofthsm2.so' '--with-dlopen=yes' '--with-dlz-ldap=yes' '--with-dlz-postgres=yes' '--with-dlz-mysql=yes' '--with-dlz-filesystem=yes' '--with-gssapi=yes' '--disable-isc-spnego' '--with-lmdb=yes' '--with-libjson' '--enable-dnstap' '--with-cmocka' '--enable-fixed-rrset' '--with-docbook-xsl=/usr/share/sgml/docbook/xsl-stylesheets' '--enable-full-report' 'build_alias=x86_64-redhat-linux-gnu' 'host_alias=x86_64-redhat-linux-gnu' 'CFLAGS= -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection' 'LDFLAGS=-Wl,-z,relro -Wl,--as-needed -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld' 'CPPFLAGS= -DDIG_SIGCHASE' 'LT_SYS_LIBRARY_PATH=/usr/lib64:' 'PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig'
compiled by GCC 10.2.1 20200723 (Red Hat 10.2.1-1)
compiled with OpenSSL version: OpenSSL 1.1.1g FIPS 21 Apr 2020
linked to OpenSSL version: OpenSSL 1.1.1g FIPS 21 Apr 2020
compiled with libxml2 version: 2.9.10
linked to libxml2 version: 20910
compiled with libjson-c version: 0.13.1
linked to libjson-c version: 0.13.1
compiled with zlib version: 1.2.11
linked to zlib version: 1.2.11
linked to maxminddb version: 1.4.2
compiled with protobuf-c version: 1.3.2
linked to protobuf-c version: 1.3.2
threads support is enabled
default paths:
named configuration: /etc/named.conf
rndc configuration: /etc/rndc.conf
DNSSEC root key: /etc/bind.keys
nsupdate session key: /var/run/named/session.key
named PID file: /var/run/named/named.pid
named lock file: /var/run/named/named.lock
geoip-directory: /usr/share/GeoIP
```
### Steps to reproduce
See https://bugzilla.redhat.com/show_bug.cgi?id=1893761
(config files and gdb output are attached there)
November 2020 (9.11.25, 9.11.25-S1, 9.16.9, 9.16.9-S1, 9.17.7)

https://gitlab.isc.org/isc-projects/stork/-/issues/437
failed unit-tests due to missing dependency
2022-05-11T09:37:48Z
Tomek Mrugalski

Tried running `rake unittest_backend` on my fresh Ubuntu 20.04. They failed with this:
```
rm -f backend/server/agentcomm/api_mock.go
for db in $(psql -t -h localhost -p 5432 -U storktest -c "select datname from pg_database where datname ~ 'storktest.*'"); do
dropdb -h localhost -p 5432 -U storktest $db
done
sh: 2: psql: not found
createdb -h localhost -p 5432 -U storktest -O storktest storktest
rake aborted!
Command failed with status (127): [createdb -h localhost -p 5432 -U storktest...]
/home/thomson/devel/stork/Rakefile:405:in `block in <top (required)>'
/usr/share/rubygems-integration/all/gems/rake-13.0.1/exe/rake:27:in `<top (required)>'
Tasks: TOP => unittest_backend
(See full trace by running task with --trace)
```
We should check whether psql is available (maybe make an array of tools we require in the system)?
1.3

https://gitlab.isc.org/isc-projects/bind9/-/issues/2241
Add TCPDNS unit test
2020-12-03T10:46:18Z
Ondřej Surý

December 2020 (9.11.26, 9.11.26-S1, 9.16.10, 9.16.10-S1, 9.17.8)
Ondřej Surý

https://gitlab.isc.org/isc-projects/bind9/-/issues/2239
fctx:id is uninitialized and effectively unused
2020-11-12T13:23:15Z
Michael McNally

On [Support #17250](https://support.isc.org/Ticket/Display.html?id=17250) Jinmei has another suggestion for us based on his review of the code:
> I've just noticed the 'id' field of struct fctx (defined in lib/dns/resolver.c) is not initialized, and the uninitialized value is used only for logging (so effectively unused). This field seems to be introduced in 2018 at commit f2af336 with initialization, but has been effectively removed at some point (I've not figured out exactly when). I suspect it's really unnecessary today and it should have been cleaned up. The attached patch is one trivial way to clean it up (the fact that it compiles should also prove it's unused otherwise).
>
> Pretty minor though, except that a bogus 'id' value could be included in a debug log message.
He suggests the following patch:
```
diff --git a/lib/dns/resolver.c b/lib/dns/resolver.c
index 1dd82a0300..020399db80 100644
--- a/lib/dns/resolver.c
+++ b/lib/dns/resolver.c
@@ -400,7 +400,6 @@ struct fetchctx {
unsigned int valfail;
bool timeout;
dns_adbaddrinfo_t *addrinfo;
- dns_messageid_t id;
unsigned int depth;
char clientstr[ISC_SOCKADDR_FORMATSIZE];
};
@@ -3446,8 +3445,8 @@ findname(fetchctx_t *fctx, const dns_name_t *name, in_port_t port,
isc_log_write(dns_lctx, DNS_LOGCATEGORY_RESOLVER,
DNS_LOGMODULE_RESOLVER, ISC_LOG_DEBUG(3),
- "fctx %p(%s): createfind for %s/%d - %s", fctx,
- fctx->info, fctx->clientstr, fctx->id,
+ "fctx %p(%s): createfind for %s - %s", fctx,
+ fctx->info, fctx->clientstr,
isc_result_totext(result));
if (result != ISC_R_SUCCESS) {
```
November 2020 (9.11.25, 9.11.25-S1, 9.16.9, 9.16.9-S1, 9.17.7)
Matthijs Mekking matthijs@isc.org