assertion failure in 9.18.11
Summary
BIND 9.18.11 crashed with an assertion failure on a forward-only nameserver.
BIND version used
running on Linux i686 5.15.91 #1 SMP Wed Feb 1 05:31:49 EST 2023
built by make with '--prefix=/usr' '--localstatedir=/var' '--sysconfdir=/var/named' '--with-libjson=no' '--without-python' 'PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/lib/pkgconfig:/usr/share/pkgconfig:/usr/openssl/lib/pkgconfig:/usr/local/lib/pkgconfig:/usr/local/apache2/lib/pkgconfig:/usr/local/python/lib/pkgconfig:/usr/lib/x86_64-linux-gnu/pkgconfig:/usr/pgsql/lib/pkgconfig'
compiled by GCC 12.2.0
compiled with OpenSSL version: OpenSSL 3.0.7 1 Nov 2022
linked to OpenSSL version: OpenSSL 3.0.7 1 Nov 2022
compiled with libuv version: 1.35.0
linked to libuv version: 1.35.0
compiled with libnghttp2 version: 1.51.0
linked to libnghttp2 version: 1.51.0
compiled with libxml2 version: 2.10.3
linked to libxml2 version: 21003
compiled with json-c version: 0.13.1
linked to json-c version: 0.13.1
compiled with zlib version: 1.2.13
linked to zlib version: 1.2.13
linked to maxminddb version: 1.6.0
threads support is enabled
DNSSEC algorithms: RSASHA1 NSEC3RSASHA1 RSASHA256 RSASHA512 ECDSAP256SHA256 ECDSAP384SHA384 ED25519 ED448
DS algorithms: SHA-1 SHA-256 SHA-384
HMAC algorithms: HMAC-MD5 HMAC-SHA1 HMAC-SHA224 HMAC-SHA256 HMAC-SHA384 HMAC-SHA512
TKEY mode 2 support (Diffie-Hellman): yes
TKEY mode 3 support (GSS-API): no
default paths:
named configuration: /var/named/named.conf
rndc configuration: /var/named/rndc.conf
DNSSEC root key: /var/named/bind.keys
nsupdate session key: /var/run/named/session.key
named PID file: /var/run/named/named.pid
named lock file: /var/run/named/named.lock
geoip-directory: /usr/share/GeoIP
Steps to reproduce
The machine had been running 9.18.11 fine for a few days. I rebooted it this morning (new kernel; 5.15.90 -> 5.15.91) and about four hours later named crashed.
What is the current bug behavior?
netmgr/netmgr.c:2203: INSIST(!worker->recvbuf_inuse) failed, back trace
/usr/sbin/named() [0x8065211]
/usr/lib/libisc-9.18.11.so(isc_assertion_failed+0x25) [0x77bfd4c5]
/usr/lib/libisc-9.18.11.so(isc__nm_alloc_cb+0x109) [0x77be2e59]
/usr/lib/libuv.so.1(+0x1d798) [0x7736d798]
/usr/lib/libuv.so.1(+0x1df6f) [0x7736df6f]
/usr/lib/libuv.so.1(uv__io_poll+0x3c8) [0x77370278]
/usr/lib/libuv.so.1(uv_run+0x106) [0x7735f6e6]
/usr/lib/libisc-9.18.11.so(+0x21c88) [0x77bebc88]
/usr/lib/libisc-9.18.11.so(isc__trampoline_run+0x22) [0x77c25c12]
/lib/libc.so.6(+0x85cfd) [0x76fc8cfd]
/lib/libc.so.6(+0x11cfc8) [0x7705ffc8]
exiting (due to assertion failure)
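Several frames in the back trace above are bare module offsets (for example in libuv and libc) with no symbol name. As a minimal sketch, assuming binutils' addr2line is available and the libraries carry debug information (neither is confirmed for this machine), the following turns those frames into addr2line commands. The helper name addr2line_commands is hypothetical and only a few frames are included as sample input.

```python
import re

# Matches frames that carry only a module offset, for example:
#   /usr/lib/libuv.so.1(+0x1d798) [0x7736d798]
# Frames that already name a symbol, such as (isc__nm_alloc_cb+0x109), are skipped.
FRAME_RE = re.compile(
    r"^(?P<path>\S+)\(\+(?P<offset>0x[0-9a-fA-F]+)\)\s+\[0x[0-9a-fA-F]+\]$"
)


def addr2line_commands(backtrace):
    """Return one addr2line command per offset-only frame in the back trace."""
    commands = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            commands.append(
                "addr2line -f -C -e {path} {offset}".format(**match.groupdict())
            )
    return commands


# Sample input: a subset of the frames from the report above.
BACKTRACE = """\
/usr/lib/libisc-9.18.11.so(isc__nm_alloc_cb+0x109) [0x77be2e59]
/usr/lib/libuv.so.1(+0x1d798) [0x7736d798]
/usr/lib/libuv.so.1(+0x1df6f) [0x7736df6f]
/usr/lib/libisc-9.18.11.so(+0x21c88) [0x77bebc88]
"""

for command in addr2line_commands(BACKTRACE):
    print(command)
```

The printed commands would need to be run on the affected machine (or against identical binaries); whether they resolve to source lines depends on how the libraries were built.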
What is the expected correct behavior?
BIND doesn't crash.
This crash occurred very quickly on 9.18.10. I reported it to security (per the docs, since it is an assertion failure), and was advised to try 9.18.11, which was being released the next day. 9.18.11 has been running fine on all our machines since then; across about 50 machines, this is the first crash I've seen.
Relevant configuration files
This is a forward-only cache server:
options {
    listen-on { 127.0.0.1; };
    forward only;
    forwarders { 69.67.192.12; 69.67.192.11; };
    auth-nxdomain no;
};
All of our machines, including this one, run glibc 2.36. Most run 32-bit Linux, but about ten are 64-bit. This one is 32-bit.
This is not running jemalloc, which I understand is recommended. But I still wouldn't expect a crash like this if it's malloc-related.