ADB uses recursion when auth zone is not ready
As reported in Support ticket #20672.
The bug was originally found in 9.11, but still exists in 9.16.
Quoting the reporter (almost) verbatim:
We found a bug: when a locally configured authoritative zone was not ready, ADB would use the resolver to look up NS addresses whose names fall under that zone, and would cache the results (e.g., NXDOMAIN) in the ADB cache. It should not do so; instead, it should fail such lookups until the local zone database is ready.
ADB's behavior here differs from that of the query path. The bug was readily reproducible by the customer, who was using a custom dns_db database implementation.
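To illustrate the intended behavior described above, here is a minimal, self-contained C sketch of the decision. It is not BIND code: lookup_result_t, FIND_NOFETCH, and may_start_fetch() are hypothetical names that only mirror DNS_R_NOTLOADED and DNS_ADBFIND_NOFETCH, and the three-way result is an assumption made to keep the example small.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result codes; they only mirror the names used in BIND. */
typedef enum {
	LOOKUP_SUCCESS,  /* address found in a local zone */
	LOOKUP_NOTFOUND, /* name is not under any local zone */
	LOOKUP_NOTLOADED /* name is under a local zone that is not loaded yet */
} lookup_result_t;

/* Hypothetical option bit, standing in for DNS_ADBFIND_NOFETCH. */
#define FIND_NOFETCH 0x01u

/*
 * Decide whether a resolver fetch may be started after the local lookup.
 * When the zone is not loaded, FIND_NOFETCH is set in *options so that
 * the lookup fails instead of falling back to recursion.
 */
static bool
may_start_fetch(lookup_result_t result, unsigned int *options) {
	assert(options != NULL);

	if (result == LOOKUP_NOTLOADED) {
		*options |= FIND_NOFETCH;
	}
	if (result == LOOKUP_SUCCESS) {
		return false; /* the local zone answered; nothing to fetch */
	}
	return (*options & FIND_NOFETCH) == 0;
}
```

With this sketch, a name outside all local zones may still be fetched, while a name under a not-yet-loaded zone is refused, so no answer from recursion can be cached for it.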
This caused resolution failures for up to the NCACHE TTL (e.g., min(SOA.minimum TTL, SOA TTL)), which was 15 minutes in our customer's case. We fixed it in "our distribution (redacted)" with a unit test (similar to an ISC BIND system test) to reproduce it, but I am unable to share the test case code right now. The test case relies on the rest of the "our distribution (redacted)" test framework anyway, so I don't know how useful it would be to ISC. However, the test case used database type "rbt", as in ISC BIND.
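The NCACHE TTL bound mentioned above can be sketched in a few lines of C. This is only an illustration of min(SOA.minimum TTL, SOA TTL) and of how long a cached negative answer stays usable; ncache_ttl() and ncache_expired() are hypothetical helpers, not BIND functions, and the clock is assumed to be a simple count of seconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Negative-cache TTL as described in the report: min(SOA.minimum, SOA TTL). */
static uint32_t
ncache_ttl(uint32_t soa_minimum, uint32_t soa_ttl) {
	return (soa_minimum < soa_ttl) ? soa_minimum : soa_ttl;
}

/*
 * True once a negative answer cached at 'cached_at' is no longer usable
 * at 'now'. Both timestamps are in seconds.
 */
static bool
ncache_expired(uint32_t cached_at, uint32_t now, uint32_t ttl) {
	assert(now >= cached_at); /* the sketch assumes a monotonic clock */
	return (now - cached_at) >= ttl;
}
```

For example, with an assumed SOA minimum of 900 seconds and an assumed SOA TTL of 3600 seconds, ncache_ttl() yields 900, which matches the 15-minute failure window reported above.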
An extract of the patch we used to fix it, which may help in understanding the problem:
diff --git a/lib/dns/adb.c b/lib/dns/adb.c
index ea93fef..bdd514c 100644
--- a/lib/dns/adb.c
+++ b/lib/dns/adb.c
@@ -3177,6 +3177,25 @@ dns_adb_createfind(dns_adb_t *adb, isc_task_t *task, isc_taskaction_t action,
 		if (!NAME_FETCH_V4(adbname)) {
 			wanted_fetches |= DNS_ADBFIND_INET;
 		}
+
+		/*
+		 * If a dbfind_name() resulted in DNS_R_NOTLOADED, it
+		 * would have happened because a zone database was not
+		 * yet loaded (e.g., during named startup). In this
+		 * case, don't attempt any fetches to avoid caching any
+		 * results from recursion that may return undesired data
+		 * vs. what is in the local zones.
+		 *
+		 * This may cause short-lived failures, but it is better
+		 * than long-term failures, e.g., due to NXDOMAIN
+		 * answers from upstream forwarders when looking up the
+		 * addresses of nameservers because they don't exist
+		 * outside local zones, that are cached for multiple
+		 * minutes and cause SERVFAIL to downstream clients
+		 * until their NCACHE TTL expire.
+		 */
+		if (ISC_UNLIKELY(result == DNS_R_NOTLOADED))
+			find->options |= DNS_ADBFIND_NOFETCH;
 	}
 v6:
@@ -3213,6 +3232,12 @@ v6:
 		if (!NAME_FETCH_V6(adbname)) {
 			wanted_fetches |= DNS_ADBFIND_INET6;
 		}
+
+		/*
+		 * See similar comment in IPv4 case above.
+		 */
+		if (ISC_UNLIKELY(result == DNS_R_NOTLOADED))
+			find->options |= DNS_ADBFIND_NOFETCH;
 	}
 fetch:
diff --git a/lib/dns/view.c b/lib/dns/view.c
index 4a8f9af..c0e7f80 100644
--- a/lib/dns/view.c
+++ b/lib/dns/view.c
@@ -1091,6 +1091,9 @@ dns_view_find(dns_view_t *view, const dns_name_t *name, dns_rdatatype_t type,
 	}
 	if (result == ISC_R_SUCCESS || result == DNS_R_PARTIALMATCH) {
 		result = dns_zone_getdb(zone, &db);
+		if (result == DNS_R_NOTLOADED) {
+			goto cleanup;
+		}
 		if (result != ISC_R_SUCCESS && view->cachedb != NULL) {
 			dns_db_attach(view->cachedb, &db);
 		} else if (result != ISC_R_SUCCESS) {