Concurrently running dns_request-based UDP queries may inadvertently cancel each other
lib/dns/request.c:find_udp()
does not set the
DNS_DISPATCHATTR_EXCLUSIVE
attribute when requesting a
non-port-specific UDP dispatch to use for sending a request created by
dns_request_createvia()
or dns_request_createraw()
. This may result
in multiple different dns_request
-based queries using the same UDP
dispatch (i.e. UDP socket). Given that lib/dns/request.c:req_cancel()
is used for cleaning up both successful and failed requests, destroying
a properly answered request can cause an unrelated, queued send
operation (for a different request using the same UDP socket) to be
canceled. The problem manifests itself in the form of the following log
message:
21-Feb-2021 19:28:45.734 zone verify-ixfr/IN: refresh: failure trying master 10.53.0.2#9500 (source 10.53.0.3#0): operation canceled
(source: job 1514369, bin/tests/system/mirror/ns3/named.run
)
Note that the above log message is the same as the one mentioned in https://kb.isc.org/docs/aa-01213, but the root cause is different (the message above was generated on Windows and thus could not be caused by Netfilter interference).
Both outgoing NOTIFY messages (on primaries) and SOA refresh queries (on
secondaries) are sent using the dns_request
API.
An example high-level sequence of events leading to this problem is:
- request A is created using
dns_request_createvia()
, - request A gets assigned a non-exclusive UDP dispatch,
- request A's query is sent out,
- response to request A's query arrives and its processing starts,
- request B is created using
dns_request_createvia()
, - request B gets assigned the same UDP dispatch as request A,
- request B's query is queued for sending,
- request A is cleaned up,
req_cancel()
callsisc_socket_cancel()
, - request B's query is canceled before it gets put on the wire,
- request B's callback is called with
ISC_R_CANCELED
.
The simplest way to prevent this is to do what named
does for its
outgoing resolver queries, which is to use exclusive UDP dispatches
unless notify-source
and/or transfer-source
includes a specific port
number (e.g. transfer-source 10.53.0.1 port 1234;
). Here is a patch
which achieves that:
diff --git a/lib/dns/request.c b/lib/dns/request.c
index f2c3d3d94d..47429c9850 100644
--- a/lib/dns/request.c
+++ b/lib/dns/request.c
@@ -628,6 +630,11 @@ find_udp_dispatch(dns_requestmgr_t *requestmgr, const isc_sockaddr_t *srcaddr,
default:
return (ISC_R_NOTIMPLEMENTED);
}
+
+ if (isc_sockaddr_getport(srcaddr) == 0) {
+ attrs |= DNS_DISPATCHATTR_EXCLUSIVE;
+ }
+
attrmask = 0;
attrmask |= DNS_DISPATCHATTR_UDP;
attrmask |= DNS_DISPATCHATTR_TCP;
Without this patch, I could reproduce this problem in the mirror
system test within minutes. With this patch applied, I could not
reproduce it in the same environment for hours, which I think is solid
enough proof to confirm that the root cause analysis above is accurate.
However, the proposed patch still does not solve the problem for query sources using an explicit port number.
Furthermore, fixing this might not be worth the effort as the dispatch code is in the process of being migrated to netmgr (!4601 (merged)), which means the offending part of the code will be dropped altogether soon enough. As for BIND 9.11, AFAICT, the problem does not have any severe real-world consequences:
-
SOA queries which fail to be sent out get requeued,
-
NOTIFY messages which fail to be sent out are not requeued, but the secondaries should still eventually manage to refresh the zone (according to the SOA refresh timer).
My guess is that we have been unknowingly hitting this issue in GitLab
CI, but in order for this bug to trigger a test failure, certain
conditions must be met. In the case of the failed mirror
system test
mentioned above, canceling the first SOA query caused a fallback to the
alternative zone transfer source address (alt-transfer-source
), which
was not allowed to transfer the verify-ixfr
zone from ns2
. This
triggered an AXFR retry down the road, which was unexpected and rightly
caused the test to fail.