BIND stuck after error "unable to obtain neither an IPv4 nor an IPv6 dispatch"
Summary
The server appeared to be stuck in a strange state after an "rndc reconfig".
BIND version used
BIND 9.16.5 (Stable Release) <id:c00b458>
running on Linux x86_64 3.10.0-1127.13.1.el7.x86_64 #1 SMP Tue Jun 23 15:46:38 UTC 2020
built by make with '--build=x86_64-redhat-linux-gnu' '--host=x86_64-redhat-linux-gnu' '--program-prefix=' '--disable-dependency-tracking' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--sysconfdir=/etc' '--datadir=/usr/share' '--includedir=/usr/include' '--libdir=/usr/lib64' '--libexecdir=/usr/libexec' '--localstatedir=/var' '--sharedstatedir=/var/lib' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--sysconfdir=/etc/named' '--disable-static' '--with-pic' '--without-python' '--with-libtool' '--without-lmdb' 'build_alias=x86_64-redhat-linux-gnu' 'host_alias=x86_64-redhat-linux-gnu' 'CFLAGS=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic' 'LDFLAGS=-Wl,-z,relro ' 'PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig'
compiled by GCC 4.8.5 20150623 (Red Hat 4.8.5-39)
compiled with OpenSSL version: OpenSSL 1.0.2k-fips 26 Jan 2017
linked to OpenSSL version: OpenSSL 1.0.2k-fips 26 Jan 2017
compiled with libxml2 version: 2.9.1
linked to libxml2 version: 20901
compiled with json-c version: 0.11
linked to json-c version: 0.11
compiled with zlib version: 1.2.7
linked to zlib version: 1.2.7
threads support is enabled
default paths:
named configuration: /etc/named/named.conf
rndc configuration: /etc/named/rndc.conf
DNSSEC root key: /etc/named/bind.keys
nsupdate session key: /var/run/named/session.key
named PID file: /var/run/named/named.pid
named lock file: /var/run/named/named.lock
Steps to reproduce
I haven't been able to reproduce this. We run "rndc reconfig" frequently on many of our BIND servers, and this is the first time I've seen such behaviour. On this specific server, logrotate ran "rndc reconfig" after rotating the log, and BIND logged the error shown below under "Relevant logs and/or screenshots".
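For context, log rotation on this server is handled by a standard logrotate job. I'm not reproducing the actual logrotate stanza here, but roughly it does the shell equivalent of the following (the rotated filename is just an assumption for illustration):

    mv /var/log/named/named.log /var/log/named/named.log.1   # rename the live log file
    rndc reconfig                                             # postrotate: ask named to reload and reopen its log channels
    gzip /var/log/named/named.log.1                           # compress the rotated file afterwards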
What is the current bug behavior?
After logging that error, BIND appeared to be stuck in a strange state. It was still answering queries over UDP (I did not check TCP), but it was not refreshing any of its secondary zones. I can't say exactly what was going on internally, because logrotate had compressed the rotated log file and deleted the original; the named process still held the deleted file open, but it could no longer write any logs. "rndc zonestatus" for various zones showed them as loaded but stuck on older serials.
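As an illustration of that check (the zone name and primary address below are placeholders, not our real ones), the serial reported locally by named no longer matched the serial on the primary:

    # serial as named sees it locally
    rndc zonestatus example.com | grep serial
    # serial on the primary for comparison (third field of the SOA record)
    dig +short @192.0.2.1 example.com SOA | awk '{print $3}'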
What is the expected correct behavior?
BIND should have reloaded its configuration and started writing to a new log file at /var/log/named/named.log.
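This is roughly how we confirmed that named was still holding the deleted log file instead of a fresh one (the PID file path is taken from the default paths listed above):

    # no new /var/log/named/named.log was created after the rotation
    ls -l /var/log/named/named.log
    # a "(deleted)" entry here means named still has the old, removed log file open
    lsof -p "$(cat /var/run/named/named.pid)" | grep named.log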
Relevant configuration files
I'm not including the entire config file here, but here are the relevant snippets:
logging {
    channel "default" {
        file "/var/log/named/named.log";
        severity info;
        print-time yes;
        print-category yes;
    };
    channel "ratelimit" {
        file "/var/log/named/ratelimit.ringlog" versions 10 size 10485760;
        print-time yes;
    };
    category "default" {
        "default";
    };
    category "rate-limit" {
        "ratelimit";
    };
    category "update" {
        "null";
    };
    category "update-security" {
        "null";
    };
};
options {
    answer-cookie no;
    directory "/var/named";
    keep-response-order {
        "any";
    };
    listen-on {
        127.0.0.1/32;
        IPv4 address/32;
    };
    listen-on-v6 {
        ::1/128;
        IPv6 address/128;
    };
    server-id hostname;
    tcp-clients 1000;
    transfers-in 100;
    transfers-out 100;
    version "9.16";
    dnssec-validation no;
    ixfr-from-differences yes;
    minimal-responses yes;
    recursion no;
    allow-transfer {
        "internal";
    };
    max-journal-size 10485760;
    notify explicit;
    zero-no-soa-ttl no;
    zone-statistics none;
};
Relevant logs and/or screenshots
This was the last thing logged in the rotated file:
10-Aug-2020 09:21:29.548 general: received control channel command 'reconfig'
10-Aug-2020 09:21:29.548 general: loading configuration from '/etc/named/named.conf'
10-Aug-2020 09:21:29.848 general: unable to open '/etc/named/bind.keys'; using built-in keys instead
10-Aug-2020 09:21:29.851 general: using default UDP/IPv4 port range: [32768, 60999]
10-Aug-2020 09:21:29.851 general: using default UDP/IPv6 port range: [32768, 60999]
10-Aug-2020 09:21:29.853 general: sizing zone task pool based on 4615 zones
10-Aug-2020 09:21:30.253 config: none:98: 'max-cache-size 90%' - setting to 57795MB (out of 64216MB)
10-Aug-2020 09:21:30.253 general: ./server.c:4530: unexpected error:
10-Aug-2020 09:21:30.253 general: unable to obtain neither an IPv4 nor an IPv6 dispatch
10-Aug-2020 09:21:30.276 general: reloading configuration failed: unexpected error
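We did not capture any socket-level state while the server was in this condition, so I can only guess at what the dispatch failure means. If it happens again, checks along these lines might show whether named was having trouble binding new UDP sockets (the commands below are only an illustration, not something we ran at the time):

    # UDP sockets currently held by named
    ss -u -anp | grep -c named
    # kernel ephemeral port range (compare with the UDP port ranges named logged above)
    cat /proc/sys/net/ipv4/ip_local_port_range
    # open-file limit for the running named process
    grep 'open files' /proc/"$(cat /var/run/named/named.pid)"/limits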
We were still debugging the issue the next day, on Aug 11. When we couldn't figure out the cause, we tried to restart BIND. It runs under systemd on this server, and this is what appeared in the systemd journal:
Aug 11 09:08:00 hostname systemd[1]: Stopping BIND...
Aug 11 09:08:01 hostname named[3686]: named: src/unix/udp.c:119: uv__udp_finish_close: Assertion `handle->send_queue_size == 0' failed.
Aug 11 09:08:01 hostname named[3686]: named: src/unix/udp.c:119: uv__udp_finish_close: Assertion `handle->send_queue_size == 0' failed.
Aug 11 09:08:01 hostname named[3686]: named: src/unix/udp.c:119: uv__udp_finish_close: Assertion `handle->send_queue_size == 0' failed.
Aug 11 09:08:01 hostname named[3686]: named: src/unix/udp.c:119: uv__udp_finish_close: Assertion `handle->send_queue_size == 0' failed.
Aug 11 09:08:01 hostname named[3686]: named: src/unix/udp.c:119: uv__udp_finish_close: Assertion `handle->send_queue_size == 0' failed.
Aug 11 09:08:01 hostname named[3686]: named: src/unix/udp.c:119: uv__udp_finish_close: Assertion `handle->send_queue_size == 0' failed.
Aug 11 09:08:01 hostname named[3686]: named: src/unix/udp.c:119: uv__udp_finish_close: Assertion `handle->send_queue_size == 0' failed.
Aug 11 09:08:01 hostname named[3686]: named: src/unix/udp.c:119: uv__udp_finish_close: Assertion `handle->send_queue_size == 0' failed.
Aug 11 09:08:01 hostname named[3686]: named: src/unix/udp.c:119: uv__udp_finish_close: Assertion `handle->send_queue_size == 0' failed.
Aug 11 09:08:01 hostname named[3686]: named: src/unix/udp.c:119: uv__udp_finish_close: Assertion `handle->send_queue_size == 0' failed.
Aug 11 09:08:01 hostname named[3686]: named: src/unix/udp.c:119: uv__udp_finish_close: Assertion `handle->send_queue_size == 0' failed.
Aug 11 09:09:30 hostname systemd[1]: named.service stop-sigterm timed out. Killing.
named logged those assertion failures but failed to exit, so systemd sent it SIGKILL after the 90-second stop timeout.
Possible fixes
I don't have any suggestion for a fix.