Various BIND 9 bugs in cleanup-after-fail scenarios
From Support ticket https://support.isc.org/Ticket/Display.html?id=13049
The submitter says:
I've found several bugs of BIND 9.11.3-S1 (not checked against S2 but I believe S2 has the same problems) through the following test:
- start named with some simple config
- suspend the process by sending SIGSTOP
- spawn 10 (arbitrary choice) rndc processes in background, each invoking the 'reconfig' command
- resume named by sending SIGCONT
- shutdown named by sending SIGTERM
In many cases named exits cleanly, but if I repeat this test many times, it sometimes crashes at the end, reporting a memory leak. I've figured out it's because dns_resolver_create() failed for its call to isc_task_create() and then failed to free res->badcache. (isc_task_create() fails due to ISC_R_SHUTTINGDOWN as it tries to process 'rndc reconfig' after receiving SIGTERM).
On code inspection I've found the same kind of problem in dns_view_create() (it could leak 'failcache'), although this wouldn't happen in the above test since it doesn't involve isc_task_create().
Furthermore, both the resolver and view modules just ignore the return value from dns_badcache_init() and still assume 'badcache' and 'failcache' are valid once dns_resolver/view_create() succeeds. So this could potentially cause a crash.
And, when I tried to fix the above issue and confirm the fix, I've noticed another bug in dns_view_create(): if this function fails and performs cleanup, isc_refcount_destroy() triggers an assertion failure, since the refcount is initialized to 1 (destroy() assumes it's 0).
Finally, I've seen another variant of the first problem in dns_ntatable_create(): if it fails to call isc_task_create() it caused SEGFAULT here: isc_mem_put(ntatable->view->mctx, ntatable, sizeof(*ntatable)); since ntatable->view->mctx is not initialized at this point. (I couldn't reproduce this failure mode in vanilla 9.11.3-S1, so some subtle timing may be necessary. But the bug is obvious from the code).
Below I've pasted a patch that would fix all of these. You'd fix some of the problems in a different way (for example, it would be much cleaner if you explicitly catch the result of dns_badcache_init() and handle errors appropriately), but the patch should help understand what's wrong.
Each of these problems is easy to fix, but I'd like to suggest a couple of high-level points so it's less likely for us to see this type of issues in future:
- you may want to add a test case I showed at the beginning of this report in your auto-test framework. IIRC we've seen and reported a similar type of bugs before in this scenario. If you regularly run tests for this scenario it would help notice similar future issues much sooner, ideally before public releases.
- the bug about isc_refcount_destroy() suggests that code for cleanup-on-failure can easily have bugs that are not noticed in normal tests. I don't have a fool-proof way to prevent this kind of bugs, but my usual practice in this case is to at least manually exercise the cleanup path by tweaking the code so some call fails, running a test, and confirming no crash happens. you may want to do something similar as part of daily practice, at least for complex cleanup code like the one in dns_view_create() (if you can come up with a more sophisticated way, that would be better of course).
diff --git a/lib/dns/nta.c b/lib/dns/nta.c
index e46cb6f..7f60353 100644
--- a/lib/dns/nta.c
+++ b/lib/dns/nta.c
@@ -147,7 +147,7 @@ dns_ntatable_create(dns_view_t *view,
isc_task_detach(&ntatable->task);
cleanup_ntatable:
- isc_mem_put(ntatable->view->mctx, ntatable, sizeof(*ntatable));
+ isc_mem_put(view->mctx, ntatable, sizeof(*ntatable));
return (result);
}
diff --git a/lib/dns/resolver.c b/lib/dns/resolver.c
index 2d21d37..6b3b940 100644
--- a/lib/dns/resolver.c
+++ b/lib/dns/resolver.c
@@ -9349,6 +9349,10 @@ dns_resolver_create(dns_view_t *view,
res->badcache = NULL;
dns_badcache_init(res->mctx, DNS_RESOLVER_BADCACHESIZE,
&res->badcache);
+ if (res->badcache == NULL) {
+ result = ISC_R_NOMEMORY;
+ goto cleanup_res;
+ }
res->mustbesecure = NULL;
res->spillatmin = res->spillat = 10;
res->spillatmax = 100;
@@ -9541,6 +9545,8 @@ dns_resolver_create(dns_view_t *view,
res->nbuckets * sizeof(fctxbucket_t));
cleanup_res:
+ if (res->badcache != NULL)
+ dns_badcache_destroy(&res->badcache);
isc_mem_put(view->mctx, res, sizeof(*res));
return (result);
diff --git a/lib/dns/view.c b/lib/dns/view.c
index c0c4e7b..cda0803 100644
--- a/lib/dns/view.c
+++ b/lib/dns/view.c
@@ -251,7 +251,11 @@ dns_view_create(isc_mem_t *mctx, dns_rdataclass_t rdclass,
view->fail_ttl = 0;
view->failcache = NULL;
(void)dns_badcache_init(view->mctx, DNS_VIEW_FAILCACHESIZE,
- &view->failcache);
+ &view->failcache);
+ if (view->failcache == NULL) {
+ result = ISC_R_NOMEMORY;
+ goto cleanup_dynkeys;
+ }
view->v6bias = 0;
view->dtenv = NULL;
view->dttypes = 0;
@@ -325,10 +329,13 @@ dns_view_create(isc_mem_t *mctx, dns_rdataclass_t rdclass,
DESTROYLOCK(&view->new_zone_lock);
cleanup_dynkeys:
+ if (view->failcache != NULL)
+ dns_badcache_destroy(&view->failcache);
if (view->dynamickeys != NULL)
dns_tsigkeyring_detach(&view->dynamickeys);
cleanup_references:
+ isc_refcount_decrement(&view->references, NULL);
isc_refcount_destroy(&view->references);
cleanup_fwdtable: