Problems after failure of loading rpz [ISC-support #14002]
Various issues following on from a failure to load an RPZ zone as reported to Support
BIND version used
The submitter confirms this can happen on BIND 9.11.3-S2 and public git version as of commit 3485fe4b. (It doesn't happen for BIND 9.10.3-P4)
This is related to the fact that zone.c:zone_startload() calls dns_zone_rpz_enable_db() at the beginning of the function. This will register a "listener" for the 'db' (to be loaded) with corresponding dns_rpz_zone_t and mark dns_rpz_zone_t as such, but doesn't care whether the load eventually succeeds. It leads to at least the two issues described below.
The first issue is that named could crash on shutdown after certain kinds of failure of loading rpz. This happens when the load fails without calling dns_db_endload() (which seems to be unlikely to happen in practice, but could still happen in theory if, for example, dns_db_beginload() fails), the dns_rpz_zone_t object is still kept marked as having registered 'db'. So when this object is finally destroyed in rpz_detach(), this code will incorrectly call dns_db_updatenotify_unregister()
if (rpz->db_registered) dns_db_updatenotify_unregister(rpz->db, dns_rpz_dbupdate_callback, rpz);
and trigger this assertion failure:
REQUIRE(db != NULL);
Another variant of the issue happens when the zone loading fails but dns_db_endload() is called. This is relatively more likely to happen than the previous case; e.g. when reloading a zone but the zone is broken or nonexistent. In this case the registered callback function dns_rpz_dbupdate_callback() is called with an incomplete new version of 'db'; in particular, if the new version of the zone file doesn't exist, it removes all "summary database" content of the previous version of the rpz, so none of the previous RPZ rules will apply anymore (note that the zone itself still keeps the previous version of db on failure of reloading). I'm not sure if it's intentional, but this behavior looks counter intuitive to me, and it's at least incompatible with BIND 9.10, which keeps using the previous version of RPZ if reloading a new version fails.
I can think of several ways to fix these issues (assuming the second behavior is also something to fix), but I guess one fundamental point is to avoid calling dns_zone_rpz_enable_db() prematurely. If there's not a particular reason for calling it at the top of zone_startload(), I'd defer it until the load is successfully completed.
I believe both of the above two issues are prevented if we do that, but there are a couple of additional remarks:
rpz_detach() is awkward in that it calls dns_db_updatenotify_unregister() without checking rpz->db is non-NULL (which, if violated, triggers a crash), while it performs the check immediately after that: if (rpz->db_registered) dns_db_updatenotify_unregister(rpz->db, dns_rpz_dbupdate_callback, rpz); [...] if (rpz->db) dns_db_detach(&rpz->db); (it also violates ISC's coding guideline by not making the NULL pointer comparison explicit). If you are sure rpz->db is non-NULL (although which is wrong), you should be able to omit the second 'if'; if you need to handle that case, the same check should have been done before calling dns_db_updatenotify_unregister(). The crash problem could be prevented that way, although the second issue would still happen.
On a related point, 'db_registered' looks almost redundant. If you need to check the validity of rpz->db before dns_db_updatenotify_unregister(), the only other place where it's used is the REQUIRE assertion in dns_rpz_dbupdate_callback(). Also, 'db_registered' is set to true outside of the rpz module (rpz.c), which makes the code more fragile and less understandable (the rpz implementation generally has this kind of problem; it exposes too many internal details outside of the main module, and this is one of such cases). You may want to retire this thing at this opportunity.