checkds code may fail to release all resources on shutdown
The "checkds" system test has been failing intermittently on FreeBSD:
- https://gitlab.isc.org/isc-projects/bind9/-/jobs/1845792
- https://gitlab.isc.org/isc-projects/bind9/-/jobs/1847239
- https://gitlab.isc.org/isc-projects/bind9/-/jobs/1851457
These failures are caused not by the test itself failing (the actual
Python tests are skipped), but rather by named
assertion failures
triggered by outstanding memory allocations at shutdown.
I assumed these are happening because named
is shut down very shortly
after startup. By looking at the list of outstanding allocations, I was
able to determine that the leaked allocations are instances of the
dns_message_t
structure along with its various members. All of these
dns_message_t
objects had from_to_wire
set to
DNS_MESSAGE_INTENTRENDER
, which made me look at
checkds_send_toaddr()
, where these objects are allocated.
I believe there is a bug in there that prevents the dns_message_t
object (referenced by the message
stack variable) from being released
when the dns_request_createvia()
call fails (e.g. because
requestmgr->exiting
is true
, which is what happens at shutdown):
diff --git a/lib/dns/zone.c b/lib/dns/zone.c
index bbd2da00fda..cb6f47870f4 100644
--- a/lib/dns/zone.c
+++ b/lib/dns/zone.c
@@ -21231,7 +21231,7 @@ checkds_send_toaddr(isc_task_t *task, isc_event_t *event) {
checkds->zone, ISC_LOG_DEBUG(3),
"checkds: dns_request_createvia() to %s failed: %s",
addrbuf, dns_result_totext(result));
- goto cleanup;
+ goto cleanup_key;
}
cleanup_key:
(Note that the goto
statement can also be removed altogether, but
perhaps it is more future-proof to leave it there, in case more code
gets added at a later time.)
To reproduce the problem, apply the following patch:
diff --git a/lib/dns/zone.c b/lib/dns/zone.c
index bbd2da00fda..1790f7d3ada 100644
--- a/lib/dns/zone.c
+++ b/lib/dns/zone.c
@@ -14,6 +14,7 @@
#include <errno.h>
#include <inttypes.h>
#include <stdbool.h>
+#include <unistd.h>
#include <isc/atomic.h>
#include <isc/file.h>
@@ -21222,6 +21223,7 @@ checkds_send_toaddr(isc_task_t *task, isc_event_t *event) {
timeout = 15;
options |= DNS_REQUESTOPT_TCP;
+ sleep(1);
result = dns_request_createvia(
checkds->zone->view->requestmgr, message, &src, &checkds->dst,
dscp, options, key, timeout * 3, timeout, 0,
and run the checkds
system test on a platform where the Python tests
for checkds are skipped.
I do not think this is significant enough to fix in July releases - it
only happens if named
is shut down around the time a DS check is
queued and only triggers an assertion failure at shutdown.