Signed version of an inline-signed zone may be dumped without unsigned serial number information
When the signed version of an inline-signed zone is dumped to disk, the
serial number of the unsigned version of the zone is written in the
raw-format header so that the contents of the signed zone can be
resynchronized after named
restart if the unsigned zone file is
modified while named
is not running (see RT #26676).
In order for the serial number of the unsigned zone to be determined
during the dump, zone->raw
must be set to a non-NULL
value. This
should always be the case as long as the signed version of the zone is
used for anything by named
.
However, a scenario exists in which the signed version of the zone has
zone->raw
set to NULL
while it is being dumped:
-
Zone dump is requested;
zone_dump()
is invoked. -
Another zone dump is already in progress, so the dump gets deferred until I/O is available (see
zonemgr_getio()
). -
The last external reference to the zone is released.
zone_shutdown()
gets queued to the zone's task. -
I/O becomes available for zone dumping.
zone_gotwritehandle()
gets queued to the zone's task. -
The zone's task runs
zone_shutdown()
.zone->raw
gets set toNULL
. -
The zone's task runs
zone_gotwritehandle()
.zone->raw
is determined to beNULL
, causing the serial number of the unsigned version of the zone to be omitted from the raw-format dump of the signed zone file.
I believe this issue became easier to trigger in BIND 9.12.0. That was the first BIND 9 release containing change 4613 (see RT #38324), specifically this hunk:
@@ -9773,7 +9822,7 @@ dns_zone_flush(dns_zone_t *zone) {
dumping = ISC_TRUE;
UNLOCK_ZONE(zone);
if (!dumping)
- result = zone_dump(zone, ISC_FALSE); /* Unknown task. */
+ result = zone_dump(zone, ISC_TRUE); /* Unknown task. */
return (result);
}
zone_dump()
can either perform the zone->raw
check itself or defer
it until zone dump I/O becomes available. Before the above change,
deferring the check was only possible if zone_dump()
was called from
zone_maintenance()
(which itself is timer-based). The above change
enables the zone->raw
check to also be deferred when zone_dump()
is
called from dns_zone_flush()
, i.e. essentially from anywhere,
particularly from zone table cleanup callbacks which are run when the
zone's reference count is likely to drop to zero, triggering
zone_shutdown()
and ultimately causing the bug above.
The above change was originally introduced in commit
980611a3fe3ececeb0049b9e7c2e380b577f5e68 without any detailed
explanation. I am not entirely sure why, but the change seems to be
necessary in order for some tests related to max-journal-size
to pass.
I ran out of time to determine why that is. Note, however, that
zone_dump()
warns against setting compact
to true
for
non-task-locked call sites (see also the code comments next to
zone_dump()
invocations).
At any rate, I believe that the bug could be triggered even without the
above change - when the zone's reference count drops to zero while
zone_maintenance()
is running. I have not confirmed that it is
practically possible and I can certainly be missing some implicit
protection against such a triggering scenario happening. It does not
matter much anyway with BIND 9.11 reaching EoL soon and this not being a
critical problem.
The only quick way to fix this issue that I see is to defer detaching
from zone->raw
in zone_shutdown()
if the zone is in the process of
being dumped to disk.
The problem is easily reproducible, though I need to find a clean way of turning it into a system test.
This problem was discovered in the process of attempting to fix an unrelated issue.