BIND issue #3071
Issue created Dec 30, 2021 by Michał Kępień (@michal, Owner)

Signed version of an inline-signed zone may be dumped without unsigned serial number information

When the signed version of an inline-signed zone is dumped to disk, the serial number of the unsigned version of the zone is written in the raw-format header. This allows the contents of the signed zone to be resynchronized after named restarts if the unsigned zone file was modified while named was not running (see RT #26676).

In order for the serial number of the unsigned zone to be determined during the dump, zone->raw must be set to a non-NULL value. This should always be the case as long as the signed version of the zone is used for anything by named.

However, a scenario exists in which the signed version of the zone has zone->raw set to NULL while it is being dumped:

  1. Zone dump is requested; zone_dump() is invoked.

  2. Another zone dump is already in progress, so the dump gets deferred until I/O is available (see zonemgr_getio()).

  3. The last external reference to the zone is released. zone_shutdown() gets queued to the zone's task.

  4. I/O becomes available for zone dumping. zone_gotwritehandle() gets queued to the zone's task.

  5. The zone's task runs zone_shutdown(). zone->raw gets set to NULL.

  6. The zone's task runs zone_gotwritehandle(). zone->raw is determined to be NULL, causing the serial number of the unsigned version of the zone to be omitted from the raw-format dump of the signed zone file.
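The sequence above can be modelled with a small toy sketch. Everything prefixed with `toy_` is hypothetical and only mimics the ordering of events on a single task; it is not BIND's actual zone, task, or raw-header code. It assumes that the task runs queued events strictly in FIFO order and that the deferred dump only inspects the link to the unsigned zone when its event runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins; none of these are BIND's real types or functions. */
typedef struct toy_zone {
    uint32_t serial;      /* serial of this version of the zone */
    struct toy_zone *raw; /* unsigned version; NULL once detached */
} toy_zone_t;

typedef struct {
    bool sourceserial_set; /* was the unsigned serial recorded? */
    uint32_t sourceserial;
} toy_header_t;

typedef enum { TOY_EV_SHUTDOWN, TOY_EV_GOTWRITEHANDLE } toy_event_t;

/* Step 5: shutdown drops the link to the unsigned zone. */
static void
toy_shutdown(toy_zone_t *zone) {
    zone->raw = NULL;
}

/* Step 6: the deferred dump only now looks at zone->raw. */
static void
toy_gotwritehandle(const toy_zone_t *zone, toy_header_t *hdr) {
    hdr->sourceserial_set = (zone->raw != NULL);
    hdr->sourceserial = (zone->raw != NULL) ? zone->raw->serial : 0;
}

/* Run the queued events in FIFO order, as a single task would. */
static toy_header_t
toy_run_task(toy_zone_t *zone, const toy_event_t *events, size_t n) {
    toy_header_t hdr = { false, 0 };

    assert(zone != NULL && events != NULL);
    for (size_t i = 0; i < n; i++) {
        if (events[i] == TOY_EV_SHUTDOWN) {
            toy_shutdown(zone);
        } else {
            toy_gotwritehandle(zone, &hdr);
        }
    }
    return hdr;
}
```

Calling `toy_run_task()` with the events ordered as `TOY_EV_SHUTDOWN, TOY_EV_GOTWRITEHANDLE` (steps 3 to 6 above) yields a header with `sourceserial_set == false`; with the two events swapped, the unsigned serial is recorded.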

I believe this issue became easier to trigger in BIND 9.12.0. That was the first BIND 9 release containing change 4613 (see RT #38324), specifically this hunk:

@@ -9773,7 +9822,7 @@ dns_zone_flush(dns_zone_t *zone) {
                dumping = ISC_TRUE;
        UNLOCK_ZONE(zone);
        if (!dumping)
-               result = zone_dump(zone, ISC_FALSE);    /* Unknown task. */
+               result = zone_dump(zone, ISC_TRUE);     /* Unknown task. */
        return (result);
 }

zone_dump() can either perform the zone->raw check itself or defer it until zone dump I/O becomes available. Before the above change, deferring the check was only possible if zone_dump() was called from zone_maintenance() (which itself is timer-based). The above change enables the zone->raw check to also be deferred when zone_dump() is called from dns_zone_flush(), i.e. essentially from anywhere, particularly from zone table cleanup callbacks which are run when the zone's reference count is likely to drop to zero, triggering zone_shutdown() and ultimately causing the bug above.
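The difference between the two paths can be illustrated with a toy sketch. All `toy_` names are hypothetical, and the sketch assumes, as a simplification of the description above, that the `compact` argument directly decides whether the check of the link to the unsigned zone happens immediately or only once I/O becomes available.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins; none of these are BIND's real types or functions. */
typedef struct toy_zone {
    uint32_t serial;
    struct toy_zone *raw; /* unsigned version; NULL once detached */
} toy_zone_t;

typedef struct {
    bool sourceserial_set;
    uint32_t sourceserial;
    bool deferred; /* the raw check has not been performed yet */
} toy_dump_t;

/* The check itself: record the unsigned serial if raw is still set. */
static void
toy_check_raw(const toy_zone_t *zone, toy_dump_t *dump) {
    if (zone->raw != NULL) {
        dump->sourceserial_set = true;
        dump->sourceserial = zone->raw->serial;
    }
}

/* compact == false: check now; compact == true: check later. */
static toy_dump_t
toy_zone_dump(const toy_zone_t *zone, bool compact) {
    toy_dump_t dump = { false, 0, compact };

    assert(zone != NULL);
    if (!compact) {
        toy_check_raw(zone, &dump);
    }
    return dump;
}

/* Called when dump I/O becomes available; runs any deferred check. */
static void
toy_io_available(const toy_zone_t *zone, toy_dump_t *dump) {
    if (dump->deferred) {
        toy_check_raw(zone, dump);
        dump->deferred = false;
    }
}
```

In this model, a shutdown that clears `raw` between `toy_zone_dump()` and `toy_io_available()` is harmless when `compact` is false, because the serial was already captured, but loses the serial when `compact` is true.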

The above change was originally introduced in commit 980611a3fe3ececeb0049b9e7c2e380b577f5e68 without any detailed explanation. I am not entirely sure why, but the change seems to be necessary in order for some tests related to max-journal-size to pass. I ran out of time to determine why that is. Note, however, that zone_dump() warns against setting compact to true for non-task-locked call sites (see also the code comments next to zone_dump() invocations).

At any rate, I believe that the bug could be triggered even without the above change, namely when the zone's reference count drops to zero while zone_maintenance() is running. I have not confirmed that this is practically possible, and I may well be missing some implicit protection against such a scenario. It does not matter much anyway, with BIND 9.11 reaching EoL soon and this not being a critical problem.

The only quick way to fix this issue that I see is to defer detaching from zone->raw in zone_shutdown() if the zone is in the process of being dumped to disk.
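One possible reading of that idea is sketched below. This is a toy model under stated assumptions, not the actual fix: the `toy_` names and the `dumping` and `detach_pending` flags are hypothetical, locking is ignored, and both functions are assumed to run on the same task.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins; none of these are BIND's real types or functions. */
typedef struct toy_zone {
    uint32_t serial;
    struct toy_zone *raw; /* unsigned version; NULL once detached */
    bool dumping;         /* a dump is queued or in progress */
    bool detach_pending;  /* shutdown ran while a dump was pending */
} toy_zone_t;

/* Shutdown keeps raw attached while a dump still needs it. */
static void
toy_shutdown(toy_zone_t *zone) {
    assert(zone != NULL);
    if (zone->dumping) {
        zone->detach_pending = true;
        return;
    }
    zone->raw = NULL;
}

/*
 * Finish a dump: read the unsigned serial if available, then perform
 * the detach that shutdown postponed. Returns true if the serial was
 * stored in *sourceserial.
 */
static bool
toy_dump_finish(toy_zone_t *zone, uint32_t *sourceserial) {
    bool have_serial;

    assert(zone != NULL && sourceserial != NULL);
    have_serial = (zone->raw != NULL);
    if (have_serial) {
        *sourceserial = zone->raw->serial;
    }
    zone->dumping = false;
    if (zone->detach_pending) {
        zone->detach_pending = false;
        zone->raw = NULL;
    }
    return have_serial;
}
```

With `dumping` set before `toy_shutdown()` runs, `toy_dump_finish()` still sees the unsigned zone and reports its serial, and the zone ends up detached from `raw` afterwards.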

The problem is easily reproducible, though I need to find a clean way of turning it into a system test.

This problem was discovered in the process of attempting to fix an unrelated issue.
