ISC Open Source Projects issues
https://gitlab.isc.org/groups/isc-projects/-/issues
2021-10-04T12:47:58Z

https://gitlab.isc.org/isc-projects/bind9/-/issues/303
Improve const correctness
2021-10-04T12:47:58Z
Ondřej Surý
@bind-team, this is a 3-year-old review of BIND code by Errata Security: https://blog.erratasec.com/2015/07/a-quick-review-of-bind9-code.html, and he is right.
We need to start using `const` more extensively _and_ internalise it when writing new code and doing reviews.
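As a minimal illustration of what this means in practice (the type and function names below are hypothetical, not BIND APIs), `const` on a pointer parameter documents and enforces that a function only reads through that pointer, while a function that modifies its argument keeps the non-const pointer only where it actually writes:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a library object; not a BIND type. */
typedef struct example_buf {
	unsigned char data[64];
	size_t used;
} example_buf_t;

/* Read-only query: the const qualifier lets callers pass const objects
 * and makes the compiler reject accidental writes through `buf`. */
static size_t
example_buf_used(const example_buf_t *buf) {
	return buf->used;
}

/* Mutating operation: only the destination is non-const; the source
 * bytes are const because they are only read. */
static void
example_buf_append(example_buf_t *buf, const void *src, size_t len) {
	size_t room = sizeof(buf->data) - buf->used;

	if (len > room) {
		len = room; /* truncate rather than overflow in this sketch */
	}
	memcpy(buf->data + buf->used, src, len);
	buf->used += len;
}
```

During review, a pointer parameter that is never written through but lacks `const` is exactly the kind of thing this issue asks us to catch.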
It is a long-term goal, so we will just keep this open; I'll list the major components that should be reviewed:
* [ ] libisc
* [ ] libdns
* [ ] libns
* [ ] libisccfg
* [ ] libisccc
* [ ] bin/named

Long-term

https://gitlab.isc.org/isc-projects/bind9/-/issues/318
Add a warnings about changed options in 9.16 ESV to 9.11 ESV
2021-10-04T12:49:00Z
Ondřej Surý
This is just a placeholder for any configuration options where we either change the defaults or remove the configuration option in the next ESV. For those options, we need to add a warning to the current ESV.
Here's the incomplete list:
* [ ] ECS Authoritative

https://gitlab.isc.org/isc-projects/bind9/-/issues/294
Follow cname/dname when resolving NSes
2021-10-04T12:53:17Z
Witold Krecicki
Although RFC 1034 states that an NS should not be a CNAME, there's no practical reason why we shouldn't follow NSes that are CNAME/DNAME records.
Witold Krecicki

https://gitlab.isc.org/isc-projects/bind9/-/issues/368
Certain named builds improperly check writability of some directories when -u is used
2021-10-04T12:54:47Z
Michał Kępień
BIND 9.12 [introduced](16d6fab2e59f1fdf63eb71fc59e138031f5c5005) mandatory checks for writability of certain directories. Certain build types perform some of these checks in the wrong order when `-u` is used.
Here is what happens for non-threaded Linux builds with support for capabilities:
1. `named` drops capabilities to the initial set,
2. `directory` and potentially `managed-keys-directory` are checked for writability,
3. `setuid()` is called.
This means the checks in step 2 are made with the root user's privileges, albeit severely diminished by the limited capabilities which are in effect after step 1 (namely, `CAP_DAC_OVERRIDE` is no longer set). So if, e.g., `directory` is set to a path which is writable by the user specified with `-u` but not by the root user (stripped of its superuser privileges), `named` will (wrongly) refuse to start.
The same checks are broken in a different way for non-threaded Linux builds without support for capabilities *and* for **all** builds on other platforms, because step 1 above is never executed. This means that the writability checks from step 2 always succeed, because they are performed with full superuser privileges rather than the `-u` user's privileges.
The only builds where the problem does not exist are threaded Linux builds, because in their case `setuid()` is called between steps 1 and 2 above.
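To illustrate why the ordering matters: POSIX `access(2)` evaluates permissions using the process's *real* user and group IDs, so a check like the one below only reflects the `-u` user's permissions once `setuid()` has been called. The helper name is hypothetical and this is a minimal sketch, not BIND's actual directory check:

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns true if `path` is a directory that the
 * process's real UID/GID may write to and enter. Because access(2)
 * uses the real IDs, calling this before setuid() answers the question
 * for root (possibly with reduced capabilities), not for the -u user. */
static bool
dir_is_writable(const char *path) {
	struct stat sb;

	if (stat(path, &sb) != 0 || !S_ISDIR(sb.st_mode)) {
		return false;
	}
	return access(path, W_OK | X_OK) == 0;
}
```

In `named`, a check of this kind would only give the intended answer if it runs after the last call to `named_os_changeuser()`.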
Fixing the problem likely requires moving the directory checks out of the view configuration routines to a point after the last possible call to `named_os_changeuser()`, similarly to how the current working directory is checked.
Not planned

https://gitlab.isc.org/isc-projects/bind9/-/issues/374
oss-fuzz integration
2021-10-04T12:55:20Z
Bhargava Shastry
### Description
Continual fuzzing may help catch potential security vulnerabilities in BIND at an early stage. To this end, it might be useful to enrol BIND in oss-fuzz, a free continual-fuzzing initiative offered by Google. I have a test case (see below) that can be used as a starting point for this integration. The short-term plan would be to augment this test case or write new ones using the libFuzzer API.
```
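#include <stddef.h>
#include <stdint.h>

#include <isc/buffer.h>

#include <dns/fixedname.h>
#include <dns/name.h>

/* libFuzzer entry point: parse the input bytes as a textual domain name. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
	isc_buffer_t buf;
	isc_result_t result;
	dns_fixedname_t origin;

	/* Ignore very short inputs, as in the original harness. */
	if (size < 5) {
		return 0;
	}

	dns_fixedname_init(&origin);

	/* Wrap the fuzzer's input in an ISC buffer and mark all of it as
	 * used, so that dns_name_fromtext() consumes the whole input. */
	isc_buffer_init(&buf, (void *)data, size);
	isc_buffer_add(&buf, size);

	/* Parse relative to the root name. The result is deliberately
	 * ignored: the harness only looks for crashes and sanitizer reports. */
	result = dns_name_fromtext(dns_fixedname_name(&origin), &buf,
				   dns_rootname, 0, NULL);
	(void)result;

	return 0;
}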
```
### Request
Ideally, oss-fuzz integration would create a sub-folder like `tests/oss-fuzz` that would house oss-fuzz-specific test harnesses such as the one above. Once this test harness is approved and merged into the BIND repo, I can send a pull request to oss-fuzz to fetch BIND sources, and build and run BIND fuzzers on a continual basis.
For this to happen, the following is required:
- The oss-fuzz test harness (such as the one shown above) is merged into the BIND source tree
- One person from the BIND development team serves as the primary contact
- One Google-linked account (e.g., belonging to a BIND developer) is provided for viewing bug reports
- [optional] One or more additional people may be CCed
### Links / references
- https://github.com/google/oss-fuzz
- http://libfuzzer.info/
- https://opensource.google.com/projects/oss-fuzz

https://gitlab.isc.org/isc-projects/bind9/-/issues/380
Check the minimal key sizes in tests that might break some HSMs
2021-10-04T12:56:42Z
Ondřej Surý

https://gitlab.isc.org/isc-projects/bind9/-/issues/407
round robin fails with two record RRSET in 9.12.1-P2 on rDNS
2021-10-04T13:00:31Z
Ghost User
### Summary
An RRSET with two A records fails to properly round-robin for recursive servers running 9.12.1-P2. It works fine on the authoritative servers, and it works on the rDNS server if there are three or more records.
### Steps to reproduce
dig roundrobin.ucsc.edu @adns2.ucsc.edu
In theory, your rDNS server running 9.12.1-P2 will not show proper round-robin behavior on that query. Our rDNS servers are also authoritative for ucsc.edu, but the same behavior manifests when I query something for which our rDNS is not authoritative:
dig www.yale.edu.cdn.cloudflare.net
so I believe this issue should be easily seen by anyone running 9.12.1-P2 for rDNS.
### What is the current *bug* behavior?
The pair of records is always returned in the same order.
### What is the expected *correct* behavior?
The records are returned in round-robin order.
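For reference, `order cyclic` is expected to rotate the starting record on each response, so with a two-record RRset the order should simply alternate. A minimal sketch of that expectation (the function is hypothetical and is not BIND's implementation):

```c
#include <stddef.h>

/*
 * Hypothetical model of `rrset-order { order cyclic; }`: for the k-th
 * response (k = 0, 1, 2, ...), start with record k mod n and wrap around.
 * Writes the n record indices, in answer order, into out.
 */
static void
cyclic_order(size_t n, unsigned long k, size_t *out) {
	if (n == 0) {
		return;
	}
	size_t start = (size_t)(k % n);
	for (size_t i = 0; i < n; i++) {
		out[i] = (start + i) % n;
	}
}
```

With two records, successive responses should therefore come back as (A1, A2), (A2, A1), (A1, A2), and so on; the behaviour reported here is that a 9.12.1-P2 resolver keeps returning the same order.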
### Relevant configuration files
I am going to include only a few config lines, on the theory that the general config is probably not relevant. I'm happy to be flogged if I got that wrong, and will provide full details if needed.
<pre>rrset-order { order cyclic; };
recursion yes;
dnssec-enable yes;
dnssec-validation yes;
response-policy { .... }</pre>

https://gitlab.isc.org/isc-projects/bind9/-/issues/425
zonechecks test fail intermittently with PKCS#11 enabled
2021-10-04T13:01:50Z
Ondřej Surý
The cause: `setup.sh: line 29: 8750 Segmentation fault (core dumped) $SIGNER -SD -o master.example master.db > /dev/null 2> signer.err`
And the logs are here:
https://gitlab.isc.org/isc-projects/bind9/-/jobs/26217

https://gitlab.isc.org/isc-projects/bind9/-/issues/429
Follow-up from "Replace isc_safe routines with their OpenSSL counterparts"
2021-10-04T13:02:24Z
Ondřej Surý
The following discussion from !546 should be addressed:
- [ ] @ondrej started a [discussion](https://gitlab.isc.org/isc-projects/bind9/merge_requests/546#note_14964):
> There are more places where `isc_safe_memequal()` is used and it doesn't have to be. Should I clean this up as part of this MR or open a separate issue for that?
Ondřej Surý

https://gitlab.isc.org/isc-projects/bind9/-/issues/437
Generated manual pages are not installed in oot build
2021-10-04T13:03:32Z
Petr Menšík
<!--
If the bug you are reporting is potentially security-related - for example,
if it involves an assertion failure or other crash in `named` that can be
triggered repeatedly - then please do *NOT* report it here, but send an
email to [security-officer@isc.org](security-officer@isc.org).
-->
### Summary
`make install` always installs manual pages from the source directory. If the build directory is different, manual pages are generated from DocBook into the build directory, but those generated manual pages are not installed by `make install`.
### Steps to reproduce
On BIND 9.11.4 (and, it seems, also on the master branch):
- install xsltproc package
- mkdir oot && cd oot && ../configure
- sed -e "s|<date>\(2014-02-19\)</date>|<date>$(date -I)</date>|" -i bin/dig/dig.docbook
- make
- mkdir /tmp/dig && make install DESTDIR=/tmp/dig
### What is the current *bug* behavior?
The installed page has the original date in the footer; it is not the modified version.
### What is the expected *correct* behavior?
The installed page has the current date, i.e. it is the modified version.
### Possible fixes
When installing manual pages, the make automatic variable `$^` should be used instead of `${srcdir}/man`.
A possible fix is included in merge request !553.

https://gitlab.isc.org/isc-projects/bind9/-/issues/446
repeated `rndc reload` or `rndc reconfig` on bind 9.11.3 and 9.11.4 causes named memory usage to grow.
2021-10-04T13:04:07Z
Alex Maestas
### Summary
With BIND 9.11.3 and 9.11.4, in our configuration, each run of `rndc reload` or `rndc reconfig` causes `named` memory usage to grow.
### Steps to reproduce
Run `rndc reload` or `rndc reconfig` repeatedly, without changing the configuration.
### What is the current *bug* behavior?
Memory usage increases with each run of `rndc reload`.
### What is the expected *correct* behavior?
Memory usage should remain relatively constant.
### Relevant configuration files
```
$ sudo named-checkconf -t /var/named/chroot -px
acl "real_localhost" {
127.0.0.1/32;
};
acl "lan_hosts" {
10.0.0.0/8;
172.16.0.0/12;
192.168.0.0/16;
};
acl "dns_resolvers" {
"lan_hosts";
};
controls {
inet 127.0.0.1 port 953 allow {
"localhost";
} keys {
"rndckey";
};
};
logging {
channel "general" {
file "/var/log/named.log";
severity notice;
print-time yes;
print-severity yes;
print-category yes;
};
channel "verbose" {
file "/var/log/verbose.log";
severity debug 1;
print-time yes;
print-severity yes;
print-category yes;
};
channel "query" {
file "/var/log/query.log";
severity info;
print-time yes;
print-severity no;
print-category no;
};
category "default" {
"general";
"verbose";
};
category "queries" {
"query";
};
};
options {
directory "/var/named";
dump-file "data/cache_dump.db";
listen-on {
"any";
};
listen-on-v6 {
"any";
};
memstatistics-file "data/named_mem_stats.txt";
pid-file "/var/run/named/named.pid";
querylog no;
statistics-file "data/named_stats.txt";
use-v4-udp-ports {
range 57345 61000;
};
auth-nxdomain no;
max-cache-size 15728640;
no-case-compress {
"localhost";
"lan_hosts";
};
recursion yes;
rrset-order {
order random;
};
allow-query {
"localhost";
};
allow-transfer {
"none";
};
forward only;
forwarders {
10.86.100.108;
10.86.110.129;
10.86.144.123;
10.86.95.123;
10.86.96.101;
10.86.97.126;
};
notify no;
};
key "rndckey" {
algorithm "hmac-md5";
secret "????????????????????????????????????????????????????????????";
};
zone "twitter.com.smf1.twitter.com" {
type master;
file "db.empty";
};
zone "twttr.net.smf1.twitter.com" {
type master;
file "db.empty";
};
zone "twitter.com.atla.twitter.com" {
type master;
file "db.empty";
};
zone "twttr.net.atla.twitter.com" {
type master;
file "db.empty";
};
zone "twitter.com.atla.twttr.net" {
type master;
file "db.empty";
};
zone "twttr.net.atla.twttr.net" {
type master;
file "db.empty";
};
zone "twitter.com.atlb.twitter.com" {
type master;
file "db.empty";
};
zone "twttr.net.atlb.twitter.com" {
type master;
file "db.empty";
};
zone "twitter.com.atlb.twttr.net" {
type master;
file "db.empty";
};
zone "twttr.net.atlb.twttr.net" {
type master;
file "db.empty";
};
zone "twitter.com.smfc.twitter.com" {
type master;
file "db.empty";
};
zone "twttr.net.smfc.twitter.com" {
type master;
file "db.empty";
};
zone "twitter.com.atlc.twitter.com" {
type master;
file "db.empty";
};
zone "twttr.net.atlc.twitter.com" {
type master;
file "db.empty";
};
zone "twitter.com.atlc.twttr.net" {
type master;
file "db.empty";
};
zone "twttr.net.atlc.twttr.net" {
type master;
file "db.empty";
};
zone "twitter.com.prod.twitter.com" {
type master;
file "db.empty";
};
zone "twitter.com.prod.twttr.net" {
type master;
file "db.empty";
};
zone "twttr.net.prod.twitter.com" {
type master;
file "db.empty";
};
zone "twttr.net.prod.twttr.net" {
type master;
file "db.empty";
};
zone "twitter.com.corpdc.twitter.com" {
type master;
file "db.empty";
};
zone "twitter.com.corpdc.twttr.net" {
type master;
file "db.empty";
};
zone "twttr.net.corpdc.twitter.com" {
type master;
file "db.empty";
};
zone "twttr.net.corpdc.twttr.net" {
type master;
file "db.empty";
};
zone "twtter.com" {
type master;
file "db.empty";
};
zone "twitter.com.twttr.net" {
type master;
file "db.empty";
};
zone "twttr.net.twttr.net" {
type master;
file "db.empty";
};
zone "." {
type hint;
file "root.hint";
};
zone "localhost" {
type master;
file "db.localhost";
};
zone "0.0.127.in-addr.arpa" {
type master;
file "db.127.0.0";
};
```
### Relevant logs and/or screenshots
We created core dumps on several machines with varying memory usage by sending signal 11 to `named`.
A first-pass, naïve analysis shows that the strings 'KSATtstA' and 'udpdispatch' loosely correlate with the memory usage of the process. These hosts run BIND 9.11.3.
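The odd-looking strings are consistent with BIND's four-character magic numbers as they appear in memory on a little-endian host. The sketch below mirrors the `ISC_MAGIC()` macro (reproduced here, with casts added, as we understand its definition in `isc/magic.h`, so the example is self-contained) and dumps the bytes of a magic value in host memory order, which is what `strings` sees in a core file:

```c
#include <stdint.h>
#include <string.h>

/* Mirrors BIND's ISC_MAGIC(): the first character goes into the most
 * significant byte of a 32-bit value. */
#define ISC_MAGIC(a, b, c, d) \
	((uint32_t)(a) << 24 | (uint32_t)(b) << 16 | (uint32_t)(c) << 8 | (uint32_t)(d))

/* Copies the four bytes of `magic`, in host memory order, into out and
 * NUL-terminates it. */
static void
magic_as_stored(uint32_t magic, char out[5]) {
	memcpy(out, &magic, sizeof(magic));
	out[4] = '\0';
}
```

On a little-endian (e.g. x86-64) host, `ISC_MAGIC('T','A','S','K')` is stored as `KSAT`, `ISC_MAGIC('A','t','s','t')` as `tstA`, and `ISC_MAGIC('D','e','S','t')` as `tSeD`, so two task magic words stored next to each other would read as `KSATtstA`.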
```
$ for i in smf* ; do echo = $i = ; strings -a $i | sort | uniq -c | sort -nr | head -n10 ; done
= smf1-azg-31-sr1 =
65978 KSATtstA
64625 udpdispatch
6475 tSeD
2624 pMEMlpmA
2253 nSND
1710 !fuB
1492 twitter
1257 CmeMxcmA
1197 L'jh
1197 disp_sepool
= smf1-dha-15-sr1 =
8297 KSATtstA
7168 udpdispatch
2212 tSeD
1088 CmeMxcmA
584 twitter
564 nSND
406 kLWR
375 !fuB
326 pMEMlpmA`
279 ONBR
= smf1-duy-24-sr1 =
24776 KSATtstA
23596 udpdispatch
3224 tSeD
1251 nSND
1136 CmeMxcmA
982 pMEMlpmA
910 !fuB
838 twitter
437 psiD
437 disp_sepool
= smf1-duz-23-sr1 =
24777 KSATtstA 7
23579 udpdispatch
4300 tSeD
1278 nSND
1136 CmeMxcmA
982 pMEMlpmA
928 !fuB
856 twitter
437 psiD
437 disp_sepool
```
We also tested out 9.11.4:
```
$ ps auxww | grep \[n]amed
named 205598 5.9 0.0 1282692 229648 ? Ssl 21:30 0:02 /usr/sbin/named -u named -c /etc/named.conf -t /var/named/chroot -c /etc/named.conf
$ sudo rndc reload
server reload successful
$ ps auxww | grep \[n]amed
named 205598 9.8 0.1 1282172 291564 ? Ssl 21:30 0:04 /usr/sbin/named -u named -c /etc/named.conf -t /var/named/chroot -c /etc/named.conf
$ sudo rndc reload
server reload successful
$ ps auxww | grep \[n]amed
named 205598 14.3 0.1 1282172 346904 ? Ssl 21:30 0:07 /usr/sbin/named -u named -c /etc/named.conf -t /var/named/chroot -c /etc/named.conf
$ sudo rndc reload
server reload successful
$ ps auxww | grep \[n]amed
named 205598 18.9 0.1 1282172 291992 ? Ssl 21:30 0:09 /usr/sbin/named -u named -c /etc/named.conf -t /var/named/chroot -c /etc/named.conf
```
After six more `rndc reload` commands:
```
$ ps auxww | grep \[n]amed
named 205598 31.6 0.1 1348028 409972 ? Ssl 21:30 0:38 /usr/sbin/named -u named -c /etc/named.conf -t /var/named/chroot -c /etc/named.conf
```
We forced a core and found similar results:
```
$ strings -a core.205598 | sort | uniq -c | sort -nr | head -n10
65972 KSATtstA
64699 udpdispatch
5480 tSeD
2612 pMEMlpmA`
1667 nSND
1522 !fuB
1251 CmeMxcmA
1197 psiD
1197 disp_sepool
1008 disp_portpool
```
### Possible fixes
These strings correspond to various magic values, suggesting that some code path taken by `rndc reload` and `rndc reconfig` is leaking `udpdispatch` structures tagged with ISCAPI_TASK_MAGIC and ONDESTROY_MAGIC. We have a valgrind report, which was inconclusive.
```
lib/isc/ondestroy.c:23:#define ONDESTROY_MAGIC ISC_MAGIC('D', 'e', 'S', 't')
lib/isc/task.c:89:#define TASK_MAGIC ISC_MAGIC('T', 'A', 'S', 'K')
lib/isc/include/isc/task.h:166:#define ISCAPI_TASK_MAGIC ISC_MAGIC('A','t','s','t')
```

https://gitlab.isc.org/isc-projects/bind9/-/issues/450
Man page macros should be by section.
2021-10-04T13:04:48Z
Mark Andrews
The MANPAGE macros in Makefiles should be by section.
This should reduce the probability of a man page being mis-installed.
Mark Andrews

https://gitlab.isc.org/isc-projects/bind9/-/issues/475
nsupdate system test fails intermittently
2021-10-04T13:07:12Z
Michał Kępień
A new failure mode of the "nsupdate" system test has been observed (https://gitlab.isc.org/isc-projects/bind9/-/jobs/36547):
```
I:nsupdate:check that unixtime serial number is correctly generated (7)
I:nsupdate:out-of-range serial=1534238386 > now=1534238385
I:nsupdate:failed
```
I do not know whether it is in any way related to #424 or the fix for it. Artifacts kept.

https://gitlab.isc.org/isc-projects/bind9/-/issues/483
race condition in radix.c
2021-10-04T13:08:20Z
Ondřej Surý
In `_ref_prefix()`, atomics cannot be used: the critical section (the part between `isc_refcount_current()` and `isc_refcount_increment()`) must be locked instead, otherwise memory might leak.
BIND 9.17 Backburner
Ondřej Surý

https://gitlab.isc.org/isc-projects/bind9/-/issues/494
DNSSEC tools effectively ignore the "-r" command line switch when linked with specific OpenSSL versions
2021-10-04T13:08:50Z
Michał Kępień
In order for the `-r` switch to actually cause entropy to be gathered from the path supplied to it, the dst library must register itself in OpenSSL as the default source of entropy. However, this will not happen in certain circumstances:
* for BIND 9.11, [it will not happen if OpenSSL already has a default source of entropy set](https://gitlab.isc.org/isc-projects/bind9/blob/30a24678c35a2098ef8f15daf44cb951d6c6a026/lib/dns/openssl_link.c#L267-278),
* for BIND 9.12, [it will not happen if BIND is compiled with `--enable-crypto-rand` *or* if OpenSSL already has a default source of entropy set](https://gitlab.isc.org/isc-projects/bind9/blob/070b67910431550b60eeaf65d1f984e0514436b1/lib/dns/openssl_link.c#L273-286).
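The conditions in the list above can be summarised as a small decision function. This is a hypothetical model for discussion only (the function and parameter names are invented; the real logic lives in the linked `lib/dns/openssl_link.c`), with a second variant modelling the view expressed in this issue that an explicit `-r` should always take precedence:

```c
#include <stdbool.h>

/* Hypothetical model of the current behaviour; not BIND code.
 * Returns true if the dst library would register itself as
 * OpenSSL's default source of entropy. */
static bool
registers_dst_entropy(int bind_minor, bool crypto_rand,
		      bool openssl_has_default) {
	if (openssl_has_default) {
		return false; /* both 9.11 and 9.12 skip registration */
	}
	if (bind_minor == 12 && crypto_rand) {
		return false; /* 9.12 built with --enable-crypto-rand */
	}
	return true;
}

/* Proposed behaviour: an explicit -r always wins. */
static bool
registers_dst_entropy_proposed(int bind_minor, bool crypto_rand,
			       bool openssl_has_default, bool r_given) {
	return r_given ||
	       registers_dst_entropy(bind_minor, crypto_rand,
				     openssl_has_default);
}
```

On RHEL/CentOS 6 (OpenSSL 1.0.1e with the RDRAND engine as default), `openssl_has_default` is true, so today `-r` has no effect there.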
Note that OpenSSL versions between 1.0.1 and 1.0.1e (inclusive) use the ["Intel RDRAND engine"](https://github.com/openssl/openssl/blob/OpenSSL_1_0_1e/crypto/engine/eng_rdrand.c) as the default source of entropy. One prominent user of OpenSSL 1.0.1e is RHEL/CentOS 6.
*master* is unaffected because the `-r` switch was dropped from it altogether.
While I agree that the dst library should prefer OpenSSL's default source of entropy over libisc-provided entropy, IMHO the `-r` switch is an explicit request to override that preference.
Ondřej Surý

https://gitlab.isc.org/isc-projects/bind9/-/issues/518
Allow the EDNS version and EDNS flags in requests to be set via named.conf for compliance testing from a recursive server
2021-10-04T13:11:12Z
Mark Andrews
Mark Andrews

https://gitlab.isc.org/isc-projects/bind9/-/issues/520
host: hang in epoll_wait due to interface status changes
2021-10-04T13:11:59Z
Ghost User
### Summary
Certain interface state changes seem to be able to trigger a permanent hang in host.
This came up in Ubuntu bug [1752411](https://bugs.launchpad.net/ubuntu/+source/openconnect/+bug/1752411), and I'm merely summarizing here what our great community found along the way while implementing a mitigation for now.
### Steps to reproduce
We have had reports of two ways to trigger this (neither is a perfect "do this to trigger it", I know):
1. setting interfaces online/offline while running `host`, though that seems to depend on the interface type
2. some VPN solutions creating their virtual interfaces and then running `host` while those are still initializing
See the Ubuntu bug for the different reproduction approaches people have taken.
### What is the current *bug* behavior?
`host` hangs indefinitely.
Although it was not part of our initial setup, reports from people trying alternatives indicate that this happens even when `-W <timeout>` is set.
### What is the expected *correct* behavior?
`host` should give up with an error after some time, or at least time out when `-W` is set.
### Relevant configuration files
None.
### Relevant logs and/or screenshots
The process appears to be sleeping:
```
root 14606 0.0 0.0 187532 8384 ? Sl 13:05 0:00 host -t soa local.
```
At the same time, the kernel wchan for the process is `sigsuspend`.
GDB backtrace showing the `epoll_wait` call with a timeout of -1:
```
(gdb) t a a bt full
Thread 4 (Thread 0x7ffff0fe1700 (LWP 9916)):
#0 0x00007ffff6be9bb7 in epoll_wait (epfd=5, events=0x7ffff7f81010, maxevents=64, timeout=timeout@entry=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
resultvar = 18446744073709551612
sc_cancel_oldtype = 0
sc_ret = <optimized out>
#1 0x00007ffff712a49b in watcher (uap=0x7ffff7f80010) at ../../../../lib/isc/unix/socket.c:4292
manager = 0x7ffff7f80010
done = isc_boolean_false
cc = <optimized out>
fnname = 0x7ffff714389a "epoll_wait()"
strbuf = '\000' <repeats 127 times>
#2 0x00007ffff6ec06db in start_thread (arg=0x7ffff0fe1700) at pthread_create.c:463
pd = 0x7ffff0fe1700
now = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140737236571904, 6142566376078845154, 140737236569984, 0, 140737353613328, 140737488347808, -6142586158121474846, -6142581914974542622}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0,
cleanup = 0x0, canceltype = 0}}}
not_first_call = <optimized out>
#3 0x00007ffff6be988f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.
```
`strace` just shows it in `epoll_wait`:
```
[pid 6197] 0.000076 epoll_wait(5, <unfinished ...>
```
These snippets are trimmed for readability; the full logs, including the other threads of the GDB backtrace, can be found on the Ubuntu bug.
### Possible fixes
Per the `epoll_wait` documentation, a timeout can be set; it is currently -1, meaning wait indefinitely:
```
cc = epoll_wait(manager->epoll_fd, manager->events,
manager->nevents, -1);
```
That should be this [code](https://gitlab.isc.org/isc-projects/bind9/blob/master/lib/isc/unix/socket.c#L4157) in current master.
Maybe set this to a value other than -1 and loop on the epoll timeout, so that events such as the timeout or other failures can be caught?
On the other hand, this is a library function and I don't know where else it is used.
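A minimal sketch of that suggestion, assuming a caller-supplied deadline and stop flag (both hypothetical; this is not BIND's `watcher()` code). `epoll_wait(2)` is called with a bounded timeout in a loop, so the thread regains control periodically to check whether it should give up; whether this alone would fix the hang reported here is not established:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdbool.h>
#include <sys/epoll.h>
#include <time.h>

/* Current CLOCK_MONOTONIC time in milliseconds. */
static long long
now_ms(void) {
	struct timespec ts;
	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Waits on epfd in slices of at most slice_ms milliseconds instead of
 * blocking forever with a timeout of -1. Between slices it re-checks
 * *stop and the absolute deadline (in now_ms() units).
 * Returns the number of ready events, 0 on deadline/stop, -1 on error.
 */
static int
wait_with_deadline(int epfd, struct epoll_event *events, int maxevents,
		   int slice_ms, long long deadline_ms, volatile bool *stop) {
	for (;;) {
		if (*stop) {
			return 0;
		}
		long long remaining = deadline_ms - now_ms();
		if (remaining <= 0) {
			return 0;
		}
		int timeout = remaining < slice_ms ? (int)remaining : slice_ms;
		int cc = epoll_wait(epfd, events, maxevents, timeout);
		if (cc > 0) {
			return cc;
		}
		if (cc < 0 && errno != EINTR) {
			return -1;
		}
		/* cc == 0 (slice elapsed) or EINTR: loop and re-check. */
	}
}
```

With `slice_ms` set to, say, 1000, the thread wakes at least once a second even if no descriptor ever becomes ready, at the cost of a few extra wakeups.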
### Mitigations
For now, since `-W <timeout>` didn't work, we have wrapped `host` in an external `timeout` call, which mitigates the issue.
In the long term, it would be nice if this bug could be fixed to help other potential users as well, as we are considering moving from `host` to another resolver tool if one is better suited and works. In the actual case (an avahi script), we might even drop most of the code these days, but that is not relevant to the issue in `host` that we are discussing here.
BIND 9.17 Backburner

https://gitlab.isc.org/isc-projects/bind9/-/issues/563
Look to see if we can detect referral loops without impacting on valid referral chains.
2021-10-04T13:17:39Z
Mark Andrews
Witold Krecicki

https://gitlab.isc.org/isc-projects/bind9/-/issues/573
9.11.5rc1 statschannel test sometimes failing on ubuntu 18.04
2021-10-04T13:18:24Z
Curtis Blackburn
In 9.11.5rc1 on Ubuntu 18.04, the statschannel test sometimes fails.
Testing 9.11.4-P2 on the same machine, I was unable to replicate the failure at all.
The failure seems to be rooted in the first check.
traffic.expect.1 (file included in tarball):
```
tcp request-size 16-31: 1
tcp response-size 64-79: 1
```
traffic.out.x1 (file generated by the test):
```
tcp request-size 16-31: 1
tcp response-size 16-31: 1
```
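Reading the two files together: assuming the histogram uses uniform 16-byte buckets (as the `16-31` and `64-79` labels suggest), the expected file counts one TCP response of 64 to 79 bytes, while the test recorded one of only 16 to 31 bytes. A minimal sketch of that bucketing (hypothetical helper; the real statistics code may treat the first and last buckets differently):

```c
#include <stddef.h>

/* Maps a message size to its [lo, hi] bucket, assuming uniform
 * 16-byte-wide buckets starting at 0. */
static void
size_bucket(size_t size, size_t *lo, size_t *hi) {
	*lo = size / 16 * 16;
	*hi = *lo + 15;
}
```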
Sample failure output:
```
S:statschannel:Wed Oct 3 12:10:33 PDT 2018
T:statschannel:1:A
A:statschannel:System test statschannel
I:statschannel:PORTRANGE:12100 - 12199
I:statschannel:JSON tests require JSON library; skipping
I:statschannel:fetching traffic size data (1)
I:statschannel:... using xml
traffic.out.x1 traffic.expect.1 differ: byte 45, line 2
I:statschannel:failed
I:statschannel:fetching traffic size data after small UDP query (2)
I:statschannel:... using xml
traffic.out.x2 traffic.expect.2 differ: byte 45, line 2
I:statschannel:failed
I:statschannel:fetching traffic size data after large UDP query (4)
I:statschannel:... using xml
traffic.out.x4 traffic.expect.4 differ: byte 45, line 2
I:statschannel:failed
I:statschannel:fetching traffic size data after small TCP query (5)
I:statschannel:... using xml
traffic.out.x5 traffic.expect.5 differ: byte 100, line 4
I:statschannel:failed
I:statschannel:fetching traffic size data after large TCP query (6)
I:statschannel:... using xml
traffic.out.x6 traffic.expect.6 differ: byte 100, line 4
I:statschannel:failed
I:statschannel:checking consistency between named.stats and xml/json (7)
I:statschannel:checking consistency between regular and compressed output (8)
I:statschannel:checking if compressed output is really compressed (9)
I:statschannel:exit status: 5
R:statschannel:FAIL
E:statschannel:Wed Oct 3 12:10:36 PDT 2018
```

https://gitlab.isc.org/isc-projects/bind9/-/issues/591
Add per keyid managed keys validation success counters.
2021-10-04T13:19:03Z
Mark Andrews
This will allow administrators to see which key/signature pairs are successful.
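One possible shape for such counters is a small table keyed by the DNSKEY key tag. This is a hypothetical sketch, not an existing BIND structure; since key tags are not guaranteed to be unique, a real implementation might also need to key on the algorithm:

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_TRACKED_KEYS 16 /* arbitrary capacity for this sketch */

/* Hypothetical per-key-tag success counters; not a BIND structure. */
typedef struct keyid_counters {
	struct {
		uint16_t keyid;   /* DNSKEY key tag */
		uint64_t success; /* validations that succeeded with this key */
	} entry[MAX_TRACKED_KEYS];
	size_t count;
} keyid_counters_t;

/* Records one successful validation for keyid, adding a new entry if
 * needed. Returns 0 on success or -1 if the table is full. */
static int
keyid_success(keyid_counters_t *t, uint16_t keyid) {
	for (size_t i = 0; i < t->count; i++) {
		if (t->entry[i].keyid == keyid) {
			t->entry[i].success++;
			return 0;
		}
	}
	if (t->count == MAX_TRACKED_KEYS) {
		return -1;
	}
	t->entry[t->count].keyid = keyid;
	t->entry[t->count].success = 1;
	t->count++;
	return 0;
}

/* Returns the success count for keyid, or 0 if it has never been seen. */
static uint64_t
keyid_get(const keyid_counters_t *t, uint16_t keyid) {
	for (size_t i = 0; i < t->count; i++) {
		if (t->entry[i].keyid == keyid) {
			return t->entry[i].success;
		}
	}
	return 0;
}
```

During a key rollover, an administrator could compare the counters for the old and new key tags to see which key is actually being used to validate.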