Skip to content

GitLab

  • Menu
Projects Groups Snippets
  • Help
    • Help
    • Support
    • Community forum
    • Submit feedback
    • Contribute to GitLab
  • Sign in / Register
  • BIND BIND
  • Project information
    • Project information
    • Activity
    • Labels
    • Members
  • Repository
    • Repository
    • Files
    • Commits
    • Branches
    • Tags
    • Contributors
    • Graph
    • Compare
  • Issues 530
    • Issues 530
    • List
    • Boards
    • Service Desk
    • Milestones
  • Merge requests 102
    • Merge requests 102
  • CI/CD
    • CI/CD
    • Pipelines
    • Jobs
    • Schedules
  • Deployments
    • Deployments
    • Environments
    • Releases
  • Packages & Registries
    • Packages & Registries
    • Container Registry
  • Monitor
    • Monitor
    • Incidents
  • Analytics
    • Analytics
    • Value stream
    • CI/CD
    • Repository
  • Wiki
    • Wiki
  • Snippets
    • Snippets
  • Activity
  • Graph
  • Create a new issue
  • Jobs
  • Commits
  • Issue Boards
Collapse sidebar
  • ISC Open Source Projects
  • BINDBIND
  • Issues
  • #1550
Closed
Open
Created Jan 09, 2020 by Cathy Almond@cathyaDeveloper

[ISC-support #15905] rndc stop issued after server (or single zone) rndc reload and during ixfr-from-differences processing leaves the .jnl file corrupted

From Support ticket #15905

9.11.x (but I don't anticipate that the BIND version makes any difference)

This problem is readily reproducible (and, I suspect, occurs because "rndc stop" doesn't recognise that the zone is effectively 'dynamic' because it has been reloaded with 'ixfr-from-differences yes;').

Here's what is being done, per the server logs. First the reload (the outcome is also the same with 'reload zone'):

09-Jan-2020 14:50:29.157 general: info: received control channel command 'reload' ============>>> RELOAD COMMAND
09-Jan-2020 14:50:29.179 general: info: loading configuration from '/etc/named.conf'
... etcetera
09-Jan-2020 14:50:29.202 general: info: reloading configuration succeeded
09-Jan-2020 14:50:29.202 general: info: reloading zones succeeded
09-Jan-2020 14:50:29.202 general: notice: all zones loaded
09-Jan-2020 14:50:29.202 general: notice: running

Note that the reload has completed, as far as the logging is concerned, but, it would appear that the regeneration of the .jnl files via 'ixfr-from-differences yes;' has not (high CPU use by named - suggests that it is busy doing this).

Then the 'rndc stop' is issued - and it completes almost immediately (no evidence that it is waiting for the processing to complete), in fact, it seems to log that it has aborted a zone reload, even though the previous logging said that the reload had completed:

09-Jan-2020 14:50:33.211 general: info: received control channel command 'stop -p'
09-Jan-2020 14:50:33.212 general: info: shutting down: flushing changes

.. etcetera (just the logs of the various sockets being closed here)

09-Jan-2020 14:50:33.216 general: error: zone test.com/IN: loading from master file dynamic/test.com.zone failed: operation canceled
09-Jan-2020 14:50:33.216 general: error: zone test.com/IN: not loaded due to errors.
09-Jan-2020 14:50:34.265 general: notice: exiting

And then after this, restarting named - the zone can no longer be loaded - the journal file does not tally with the zone itself:

...
09-Jan-2020 14:51:28.141 general: error: zone test.com/IN: journal rollforward failed: journal out of sync with zone
09-Jan-2020 14:51:28.141 general: error: zone test.com/IN: not loaded due to errors.
09-Jan-2020 14:51:28.142 general: notice: all zones loaded
09-Jan-2020 14:51:28.142 general: notice: running

======

Something has gone badly wrong during the 'rndc stop' - which is supposed to be a graceful shutdown of named. I'm assuming that the problem is that the .jnl file itself is corrupt, rather than that something has happened to the zone file on disk - but will ask for more data to confirm this.

stop [-p]
Stop the server, making sure any recent changes made through dynamic update
or IXFR are first saved to the master files of the updated zones. If -p is
specified named’s process id is returned. This allows an external process
to determine when named had completed stopping.

Now, since ixfr-from-differences processing could take an age to complete, I don't think it's reasonable to wait forever on the rndc stop. Possibly one solution would be have a (configurable?) timeout, after which any pending ixfr-from-differences .jnl file generation is terminated and the incomplete .jnl file discarded/removed. After all, the administrator in this scenario has just presented named with a new zone file that it asked named to load - so we know that the full copy of the zone on disk should be good and valid - we just loaded from it.

====

The workaround if this happens, is presumably to manually discard the corrupted .jnl file and restart named.

Edited Mar 16, 2021 by Support RT
Assignee
Assign to
Time tracking