BUG: rndc reconfig with auto-dnssec zones and a high thread count leaks resources and crashes named
Summary
Repeatedly adding zones with auto-dnssec maintain; inline-signing yes; on a server with many cores leaks memory and at least one other, unidentified resource. After some time, named stops responding. The log shows:
Jul 22 20:18:38 localhost named[51093]: general: error: could not get query source dispatcher (127.0.1.6#0)
Jul 22 20:18:38 localhost named[51093]: general: error: reloading configuration failed: out of memory
The daemon must then be restarted to work again. The crash is not only a memory issue.
BIND version used
# /usr/local/sbin/named -V
BIND 9.16.5 (Stable Release) <id:c00b458>
running on FreeBSD amd64 12.1-RELEASE-p1 FreeBSD 12.1-RELEASE-p1 GENERIC
built by make with '--disable-linux-caps' '--localstatedir=/var' '--sysconfdir=/usr/local/etc/namedb' '--with-dlopen=yes' '--with-libxml2' '--with-openssl=/usr' '--with-readline=-L/usr/local/lib -ledit' '--with-dlz-filesystem=yes' '--disable-dnstap' '--disable-fixed-rrset' '--disable-geoip' '--without-maxminddb' '--without-gssapi' '--with-libidn2=/usr/local' '--with-json-c' '--disable-largefile' '--with-lmdb=/usr/local' '--disable-native-pkcs11' '--without-python' '--disable-querytrace' 'STD_CDEFINES=-DDIG_SIGCHASE=1' '--enable-tcp-fastopen' '--with-tuning=default' '--disable-symtable' '--prefix=/usr/local' '--mandir=/usr/local/man' '--infodir=/usr/local/share/info/' '--build=amd64-portbld-freebsd12.1' 'build_alias=amd64-portbld-freebsd12.1' 'CC=cc' 'CFLAGS=-O2 -pipe -DLIBICONV_PLUG -fstack-protector-strong -isystem /usr/local/include -fno-strict-aliasing ' 'LDFLAGS= -L/usr/local/lib -ljson-c -fstack-protector-strong ' 'LIBS=-L/usr/local/lib' 'CPPFLAGS=-DLIBICONV_PLUG -isystem /usr/local/include' 'CPP=cpp' 'PKG_CONFIG=pkgconf'
compiled by CLANG 4.2.1 Compatible FreeBSD Clang 8.0.1 (tags/RELEASE_801/final 366581)
compiled with OpenSSL version: OpenSSL 1.1.1d-freebsd 10 Sep 2019
linked to OpenSSL version: OpenSSL 1.1.1d-freebsd 10 Sep 2019
compiled with libxml2 version: 2.9.10
linked to libxml2 version: 20910
compiled with json-c version: 0.14
linked to json-c version: 0.14
compiled with zlib version: 1.2.11
linked to zlib version: 1.2.11
threads support is enabled
default paths:
named configuration: /usr/local/etc/namedb/named.conf
rndc configuration: /usr/local/etc/namedb/rndc.conf
DNSSEC root key: /usr/local/etc/namedb/bind.keys
nsupdate session key: /var/run/named/session.key
named PID file: /var/run/named/pid
named lock file: /var/run/named/named.lock
Steps to reproduce
I wrote a small Perl script to reproduce the bug. It generates very small zones, each with a KSK and a ZSK, adds them to the configuration file, and reloads with "rndc reconfig". Each reconfig adds the new zone and leaks memory. The memory leak seems proportional to the number of worker threads (by default, the number of cores). Another, unidentified type of resource is also leaked, and that leak is what causes the crash. It seems proportional to the number of UDP listeners per interface (by default, the number of cores).
The paths at the beginning of the script are for the FreeBSD pkg layout and should be adapted on other platforms. The more cores the test server has, the sooner the crash occurs. My test server has 40 cores.
I could not reproduce the bug with unsigned zones. It requires auto-dnssec maintain zones.
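To illustrate the kind of zone block each iteration adds, here is a minimal Python sketch. It is not the Perl script mentioned above. The zone names, the `/tmp/zones` and `/tmp/keys` paths, and the helper names `zone_stanza` and `zone_names` are all hypothetical. The sketch only builds the text to append to the included zone file; generating the KSK and ZSK (for example with dnssec-keygen), writing the zone files, and running "rndc reconfig" are left out.

```python
def zone_stanza(name, zone_dir, key_dir):
    """Return a named.conf zone block for an inline-signed zone.

    zone_dir and key_dir are hypothetical directories chosen by the caller.
    """
    return (
        f'zone "{name}" {{\n'
        f'    type master;\n'
        f'    file "{zone_dir}/{name}.db";\n'
        f'    key-directory "{key_dir}";\n'
        f'    auto-dnssec maintain;\n'
        f'    inline-signing yes;\n'
        f'}};\n'
    )


def zone_names(count, suffix="leaktest.example"):
    """Return `count` distinct, hypothetical zone names: z0001.<suffix>, ..."""
    return [f"z{i:04d}.{suffix}" for i in range(1, count + 1)]


if __name__ == "__main__":
    # Print the stanzas for two test zones; a reproduction loop would append
    # one stanza per iteration and then trigger a reconfig.
    for name in zone_names(2):
        print(zone_stanza(name, "/tmp/zones", "/tmp/keys"), end="")
```

Appending one such block per iteration to named.conf.custom.inc, followed by a reconfig, matches the sequence described above.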
What is the current bug behavior?
On a server with 40 cores, it crashes after about 204 zones. Reducing the number of UDP listeners per interface (-U n) from 40 to 20 doubles the number of iterations needed to crash the server. Reducing it to 10 doubles that number again. This has no effect on the memory leak visible in top.
Reducing the number of worker threads (-n xxx) reduces the memory leak but not the number of iterations needed to crash the server. Increasing the number of worker threads increases the memory leak proportionally. I tried -n 100 and the memory used by the daemon grew to more than 70G for fewer than 500 zones.
This is why I think two different types of resources are leaked: memory, proportional to the number of worker threads, and another type, proportional to the number of UDP listeners per interface, which causes the crash.
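The doubling pattern is what one would expect if each reconfig consumed a fixed amount of the unidentified resource per UDP listener, out of a fixed budget. The following Python sketch encodes that assumption only. The inverse-proportional model and the function name are hypothetical, and the single calibration point (40 listeners, about 204 zones) is the measurement reported above.

```python
# Calibration point taken from the report: 40 UDP listeners, ~204 zones.
OBSERVED_LISTENERS = 40
OBSERVED_ZONES_TO_CRASH = 204


def estimated_zones_to_crash(udp_listeners):
    """Estimate iterations before the crash for a given -U value.

    Assumption: every reconfig leaks one unit of the unknown resource per
    UDP listener, and named fails once a fixed budget is exhausted.
    """
    budget = OBSERVED_LISTENERS * OBSERVED_ZONES_TO_CRASH
    return budget // udp_listeners


if __name__ == "__main__":
    for listeners in (40, 20, 10):
        print(listeners, estimated_zones_to_crash(listeners))
```

Under this assumption the estimate doubles each time -U is halved, which is consistent with the behavior described, though it does not identify the leaked resource.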
What is the expected correct behavior?
Using rndc reconfig to add zones with "auto-dnssec maintain" should not leak memory or other resources, and should not crash the server.
Relevant configuration files
named.conf:
logging {
    channel stdlog {
        syslog local1;
        print-category yes;
        print-severity yes;
        print-time no;
    };
    category default { stdlog; };
    category queries { "null"; };
    category query-errors { "null"; };
    category update { "null"; };
    category update-security { "null"; };
    category security { "null"; };
};
options {
    directory "/usr/local/etc/namedb/working";
    pid-file "/var/run/named/pid";
    dump-file "/var/dump/named_dump.db";
    statistics-file "/var/stats/named.stats";
    listen-on { 127.0.1.6; };
    listen-on-v6 { none; };
    disable-empty-zone "255.255.255.255.IN-ADDR.ARPA";
    disable-empty-zone "0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.IP6.ARPA";
    disable-empty-zone "1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.IP6.ARPA";
    query-source address 127.0.1.6 port *;
    allow-transfer {
        127.0.1.6;
    };
    startup-notify-rate 100;
    notify-source 127.0.1.6 ;
    recursion no;
    notify no;
    check-integrity no;
    minimal-responses yes;
    max-transfer-idle-out 5;
    max-transfer-time-out 10;
    tcp-clients 1000;
    tcp-listen-queue 100;
    transfers-out 1000;
    // dnssec-enable yes;
    sig-validity-interval 60 30;
    masterfile-format text;
    request-ixfr no;
    provide-ixfr no;
};
// The traditional root hints mechanism. Use this, OR the slave zones below.
zone "." { type hint; file "/usr/local/etc/namedb/named.root"; };
key "rndc-key" {
    algorithm hmac-sha256;
    secret "rbdxs1PCxJY6kH9C3J/vosWkeRz9DXZkN2muT6o1N2c=";
};
controls {
    inet 127.0.1.6
        port 953
        allow { any; } keys { "rndc-key"; };
};
// The zones
include "/usr/local/etc/namedb/named.conf.custom.inc";
rndc.conf:
key "rndc-key" {
    algorithm hmac-sha256;
    secret "rbdxs1PCxJY6kH9C3J/vosWkeRz9DXZkN2muT6o1N2c=";
};
options {
    default-key "rndc-key";
    default-server 127.0.1.6;
    default-port 953;
};
Relevant log:
# rndc status
version: BIND 9.16.5 (Stable Release) <id:c00b458>
running on localhost.bookmyname.com: FreeBSD amd64 12.1-RELEASE-p1 FreeBSD 12.1-RELEASE-p1 GENERIC
boot time: Wed, 22 Jul 2020 18:37:50 GMT
last configured: Wed, 22 Jul 2020 18:37:50 GMT
configuration file: /usr/local/etc/namedb/named-custom.conf
CPUs found: 40
worker threads: 40
UDP listeners per interface: 40
number of zones: 411 (0 automatic)
debug level: 0
xfers running: 0
xfers deferred: 0
soa queries in progress: 0
query logging is ON
recursive clients: 0/900/1000
tcp clients: 0/1000
TCP high-water: 0
server is up and running