"resolver" test fails intermittently
The recently-added (only on main) check for a SERVFAIL response to a
TCP query with an empty question section fails pretty frequently. The
check was added in commit 2f3ded76 (part of !5616). Despite a code
comment suggesting otherwise, the test fails due to dig hitting a
timeout instead of receiving a SERVFAIL response:
$ cat dig.ns5.out.70
; <<>> DiG 9.17.21 <<>> -p 29349 @10.53.0.5 -b 10.53.0.5 +tcp tcpalso.no-questions. a +tries=3 +time=4
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached
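
When triaging failed jobs, it can help to tell the two outcomes apart mechanically. The following is only a rough sketch: `classify_dig_output` is a hypothetical helper, not something that exists in the BIND 9 system tests, and it assumes the saved dig output contains either the usual `status: SERVFAIL` header line or the timeout message shown above.

```shell
# Hypothetical helper (not part of the BIND 9 test suite): classify a
# saved dig output file by the outcome it records.
classify_dig_output() {
    if grep -q "status: SERVFAIL" "$1"; then
        # dig received a response whose header reports SERVFAIL
        echo servfail
    elif grep -q "connection timed out" "$1"; then
        # dig gave up after exhausting its retries without a response
        echo timeout
    else
        echo other
    fi
}
```

Run against the file above (`classify_dig_output dig.ns5.out.70`), this would print `timeout`, which is the failure mode described here rather than the SERVFAIL the check expects.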
Example occurrences:
- https://gitlab.isc.org/isc-projects/bind9/-/jobs/2197750
- https://gitlab.isc.org/isc-projects/bind9/-/jobs/2197749
- https://gitlab.isc.org/isc-projects/bind9/-/jobs/2197279
- https://gitlab.isc.org/isc-projects/bind9/-/jobs/2196581
- https://gitlab.isc.org/isc-projects/bind9/-/jobs/2185707
- https://gitlab.isc.org/isc-projects/bind9/-/jobs/2185704
- https://gitlab.isc.org/isc-projects/bind9/-/jobs/2185288
Most of these happened on FreeBSD, but two happened on Linux (under TSAN), so the common denominator seems to be "machine under load" rather than "only happens on FreeBSD".