Skip to content
GitLab
Projects Groups Snippets
  • /
  • Help
    • Help
    • Support
    • Community forum
    • Submit feedback
    • Contribute to GitLab
  • Sign in / Register
  • BIND BIND
  • Project information
    • Project information
    • Activity
    • Labels
    • Members
  • Repository
    • Repository
    • Files
    • Commits
    • Branches
    • Tags
    • Contributors
    • Graph
    • Compare
  • Issues 568
    • Issues 568
    • List
    • Boards
    • Service Desk
    • Milestones
  • Merge requests 111
    • Merge requests 111
  • CI/CD
    • CI/CD
    • Pipelines
    • Jobs
    • Schedules
  • Deployments
    • Deployments
    • Environments
    • Releases
  • Packages and registries
    • Packages and registries
    • Container Registry
  • Monitor
    • Monitor
    • Incidents
  • Analytics
    • Analytics
    • Value stream
    • CI/CD
    • Repository
  • Wiki
    • Wiki
  • Snippets
    • Snippets
  • Activity
  • Graph
  • Create a new issue
  • Jobs
  • Commits
  • Issue Boards
Collapse sidebar
  • ISC Open Source ProjectsISC Open Source Projects
  • BINDBIND
  • Issues
  • #572
Closed
Open
Issue created Oct 03, 2018 by Michał Kępień@michalOwner

Improve accuracy of query error logging

There are two issues related to query error logging that bug me:

  1. Until BIND 9.11 (inclusive), when something goes wrong while recursing for an answer to a query, the default: branch of a switch statement in query_gotanswer() generates a SERVFAIL response. When serve-stale was implemented in BIND 9.12, it was done in a way which causes that default: branch to only set a flag (want_stale) in the query context for such queries, effectively moving the QUERY_ERROR() call for such queries to query_done() (and making query_done() call itself recursively, which hinders readability). This may be a source of confusion1 as the "query failed" log messages now point at a line in query_done() which appears to have something to do with serve-stale (if (qctx->want_stale)) even if the latter is not enabled.

  2. Apparently we are setting qctx->result to DNS_R_SERVFAIL for a lot of different failure modes:

    $ sed -n 's|.*\(QUERY_ERROR([^)]*);\)$|\1|p;' lib/ns/query.c | sort | uniq -c | sort 
          2 QUERY_ERROR(qctx, DNS_R_DROP);
          2 QUERY_ERROR(qctx, DNS_R_REFUSED);
          4 QUERY_ERROR(qctx, result);
         19 QUERY_ERROR(qctx, DNS_R_SERVFAIL);

    In some situations, this really is the only sensible thing to do (e.g. when the root key sentinel mechanism overrides the RCODE or when the SERVFAIL cache is hit) but in most others, doing this feels like replacing potentially useful diagnostic information with a rather less useful "oops, something went wrong" type of message:

    client @0x7fa62c0403b0 ::1#52563 (servfail.nl): query failed (SERVFAIL) for servfail.nl/IN/A at query.c:10681

    While in some cases the root cause of failed queries is indicated by the preceding log messages, troubleshooting others requires recompiling BIND with --enable-querytrace, which is not always a viable option for the user.

    I think we can improve named's behavior in this regard. Note that the response message's RCODE is derived from qctx->result using dns_result_torcode(), which handles a number of possible isc_result_t values and returns SERVFAIL for anything not explicitly listed. If we used QUERY_ERROR(qctx, result); instead of QUERY_ERROR(qctx, DNS_R_SERVFAIL); where possible, the specific error could be logged by query_error() and the response's RCODE would still be SERVFAIL. The win here would be that some light could be shed on the less obvious query errors just by bumping the debug level using rndc trace (or enabling query logging) rather than recompiling with --enable-querytrace.

  1. see e.g. https://lists.isc.org/pipermail/bind-users/2018-September/100868.html, though the issue caught me by surprise a couple of times as well while debugging ↩

Assignee
Assign to
Time tracking