dig (and other tools) may send queries with QID=0, which confuses Net::DNS
Unless specified manually using +qid=<value>, dig uses a random query ID for the DNS messages it sends out.
In particular, the value chosen can be 0. While QID=0 is perfectly legal protocol-wise, it seems that some code bases, e.g. Net::DNS, are unable to properly handle queries with QID=0. Here is an example:
https://gitlab.isc.org/isc-private/bind9/-/jobs/3509123
2023-07-06 14:14:45 INFO:serve-stale I:serve-stale_tmp_iwl06k82:disable responses from authoritative server (89)
2023-07-06 14:14:57 INFO:serve-stale I:serve-stale_tmp_iwl06k82:failed
bin/tests/system/serve-stale_tmp_iwl06k82/dig.out.test89:
;; Warning: ID mismatch: expected ID 0, got 46879
;; communications error to 10.53.0.2#19223: timed out
; <<>> DiG 9.19.15 <<>> +time +tries -p 19223 @10.53.0.2 txt disable
; (1 server found)
;; global options: +cmd
;; no servers could be reached
This looked weird to me, so I started ans2/ans2.pl manually and sent a
query to it using dig @10.53.0.2 -p 5300 disable. TXT +qid=0 +tries=1
Guess what:
;; Warning: ID mismatch: expected ID 0, got 27885
;; communications error to 10.53.0.2#5300: timed out
; <<>> DiG 9.19.15 <<>> @10.53.0.2 -p 5300 disable. TXT +qid=0 +tries=1
; (1 server found)
;; global options: +cmd
;; no servers could be reached
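To make the mismatch above concrete, here is a minimal Python sketch of the check a client such as dig performs: the first two bytes of a DNS message carry the ID, and a response is only accepted if it echoes the ID of the query. The helper names (build_query, message_id, id_matches) are made up for illustration and are not taken from dig or BIND; the "bad reply" is simulated locally by patching the ID to the value seen in the output above.

```python
import struct


def build_query(qid: int, name: str, qtype: int = 16) -> bytes:
    """Build a minimal DNS query: one question, class IN, RD flag set.

    qtype 16 is TXT. This is an illustrative helper, not dig's code.
    """
    # Header: ID, flags (0x0100 = RD), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    labels = [label for label in name.rstrip(".").split(".") if label]
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in labels
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def message_id(message: bytes) -> int:
    """Return the 16-bit ID stored in the first two bytes of a DNS message."""
    return struct.unpack("!H", message[:2])[0]


def id_matches(query: bytes, response: bytes) -> bool:
    """A response is only usable if it echoes the ID of the query."""
    return message_id(query) == message_id(response)


query = build_query(0, "disable.")

# Simulate a reply whose ID was replaced with the value from the output above.
bad_reply = struct.pack("!H", 27885) + query[2:]

print(message_id(query), id_matches(query, query), id_matches(query, bad_reply))
```

An ID of 0 packs and unpacks like any other 16-bit value, so nothing on the wire makes it special; the reply is rejected only because the ID it carries differs from the one that was sent.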
Looking at Net::DNS sources, the documentation says:
=head2 id
print "query id = ", $packet->header->id, "\n";
$packet->header->id(1234);
Gets or sets the query identification number.
A random value is assigned if the argument value is undefined.
However, the above seems to be imprecise: apparently if the ID is
defined, but set to 0, Net::DNS treats it as an undefined value.
This causes the $packet->header->id call to return a random value
instead of 0 for queries with QID=0, breaking responses to such queries.
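The behavior described above looks like the classic "zero is falsy" pitfall. The following Python sketch shows that class of bug under stated assumptions: it is not the actual Net::DNS code, and the function names header_id_buggy and header_id_fixed are hypothetical. The buggy variant draws its replacement from 1..0xFFFF purely so that the difference is easy to observe.

```python
import random


def header_id_buggy(stored_id):
    """Hypothetical accessor that tests the stored ID for truthiness.

    0 is falsy, so a legitimately stored ID of 0 is treated as "unset"
    and replaced with a random value.
    """
    if stored_id:
        return stored_id
    # Illustration only: the range excludes 0 so the effect is visible.
    return random.randint(1, 0xFFFF)


def header_id_fixed(stored_id):
    """Hypothetical accessor that only treats None as "unset"."""
    if stored_id is not None:
        return stored_id
    return random.randint(0, 0xFFFF)


print(header_id_buggy(1234), header_id_fixed(1234))  # both keep 1234
print(header_id_buggy(0) == 0, header_id_fixed(0) == 0)  # False True
```

If the real accessor behaves like the first variant, any query with QID=0 gets a response carrying a different, random ID, which matches the "ID mismatch" warnings dig prints.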
I don't see any reasonable way to work around this problem in our Perl
code (apart from converting it to Python). Adding +qid to every dig
invocation in the system test suite also seems over the top for working
around something this silly. However, until we do something about this,
we might be seeing a whole class of surprising failures in the system
test suite caused by this behavior.