Bad performance in BIND 9.16.6
Summary
Hello everybody,
I am creating this issue because I am having problems with my VMs running BIND 9.16.6 on openSUSE Leap 15.3 under ESXi 7.0: I am getting poor performance when querying them over UDP.
After several days of research I have not found a solution yet, so I am coming here hoping that someone has an idea that could help me move forward.
For information, everything was working fine when I was using the 9.11 branch, so the issue seems to be related in some way to #2143 (closed).
BIND version used
BIND 9.16.6 (Stable Release) <id:25846cf>
running on Linux x86_64 5.3.18-150300.59.87-default #1 SMP Thu Jul 21 14:31:28 UTC 2022 (cc90276)
built by make with '--host=x86_64-suse-linux-gnu' '--build=x86_64-suse-linux-gnu' '--program-prefix=' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--sysconfdir=/etc' '--datadir=/usr/share' '--includedir=/usr/include' '--libdir=/usr/lib64' '--libexecdir=/usr/lib' '--localstatedir=/var' '--sharedstatedir=/var/lib' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--disable-dependency-tracking' '--with-python=/usr/bin/python3' '--includedir=/usr/include/bind' '--disable-static' '--with-openssl' '--enable-threads' '--with-libtool' '--with-libxml2' '--with-libjson' '--with-libidn2' '--with-dlz-mysql' '--with-dlz-ldap' '--with-randomdev=/dev/urandom' '--enable-ipv6' '--with-pic' '--disable-openssl-version-check' '--with-tuning=large' '--with-geoip' '--with-dlopen' '--with-gssapi=yes' '--disable-isc-spnego' '--enable-fixed-rrset' '--enable-filter-aaaa' '--with-systemd' '--enable-full-report' 'build_alias=x86_64-suse-linux-gnu' 'host_alias=x86_64-suse-linux-gnu' 'CFLAGS=-fmessage-length=0 -grecord-gcc-switches -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -g -fPIE -DNO_VERSION_DATE' 'LDFLAGS=-pie' 'PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig'
compiled by GCC 7.5.0
compiled with OpenSSL version: OpenSSL 1.1.1d 10 Sep 2019
linked to OpenSSL version: OpenSSL 1.1.1d 10 Sep 2019
compiled with libuv version: 1.18.0
linked to libuv version: 1.18.0
compiled with libxml2 version: 2.9.7
linked to libxml2 version: 20907
compiled with json-c version: 0.13
linked to json-c version: 0.13
compiled with zlib version: 1.2.11
linked to zlib version: 1.2.11
threads support is enabled
default paths:
named configuration: /etc/named.conf
rndc configuration: /etc/rndc.conf
DNSSEC root key: /etc/bind.keys
nsupdate session key: /var/run/named/session.key
named PID file: /var/run/named/named.pid
named lock file: /var/run/named/named.lock
Steps to reproduce
The problem occurs constantly from the moment BIND is started.
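The traffic itself is just ordinary UDP query traffic; if a synthetic load is needed to reproduce, something like dnsperf should generate a comparable pattern (queries.txt is a hypothetical file of "name type" lines, and the client/rate values are placeholders, not my real traffic):
# hypothetical repro load: 10 clients, capped at 5000 q/s, for 60 seconds
dnsperf -s 172.31.18.10 -p 53 -d queries.txt -c 10 -Q 5000 -l 60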
What is the current bug behavior?
I have observed that there is always one socket with a high Recv-Q value. Another odd thing: the VM has 2 vCPUs and 1 NIC, with 2 UDP listeners per interface, yet when I check with the ss command I can see 3 UDP sockets bound to the NIC's IP address.
What is the expected correct behavior?
The Recv-Q should drain instead of staying stuck at a high value. I would also expect to see only 2 UDP sockets for my NIC (matching the number of UDP listeners per interface).
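One way to confirm that the stuck Recv-Q actually results in kernel-side UDP drops would be to check the standard Linux counters (nothing BIND-specific here):
# UdpRcvbufErrors increases when datagrams are dropped because the socket
# receive buffer is already full, which is what a permanently high Recv-Q suggests
nstat -az UdpInErrors UdpRcvbufErrors
# the same information appears in "netstat -su" under "receive buffer errors"
netstat -su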
Relevant logs and/or screenshots
Output of the ss -u -a -n '( sport = :53 )' command:
State Recv-Q Send-Q Local Address:Port Peer Address:Port
UNCONN 213760 0 172.31.18.10:53 0.0.0.0:*
UNCONN 0 0 172.31.18.10:53 0.0.0.0:*
UNCONN 0 0 172.31.18.10:53 0.0.0.0:*
UNCONN 0 0 127.0.0.1:53 0.0.0.0:*
UNCONN 0 0 127.0.0.1:53 0.0.0.0:*
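If it helps, the same ss query can be extended to show which named threads own these sockets and the receive buffer limit that Recv-Q is stuck against (root is needed for the process information):
# -p shows the owning process/threads, -m shows socket memory (skmem),
# including the receive buffer limit (rb)
ss -u -a -n -p -m '( sport = :53 )'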
Output of the rndc status command:
version: BIND 9.16.6 (Stable Release) <id:25846cf> (None of your business)
running on vmxdns: Linux x86_64 5.3.18-150300.59.87-default #1 SMP Thu Jul 21 14:31:28 UTC 2022 (cc90276)
boot time: Thu, 25 Aug 2022 09:40:54 GMT
last configured: Thu, 25 Aug 2022 09:40:54 GMT
configuration file: /etc/named.conf (/var/lib/named/etc/named.conf)
CPUs found: 2
worker threads: 2
UDP listeners per interface: 2
number of zones: 116 (99 automatic)
debug level: 0
xfers running: 0
xfers deferred: 0
soa queries in progress: 0
query logging is OFF
recursive clients: 0/900/1000
tcp clients: 0/150
TCP high-water: 8
server is up and running
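In case it helps diagnosis, a quick way to check whether one of the named listener threads is idle or stuck while the queue builds up would be to look at per-thread CPU usage (pidstat comes from the sysstat package; top -H gives roughly the same view):
# one sample per second of per-thread CPU usage for the named process
pidstat -t -p "$(pidof named)" 1
# alternative: interactive per-thread view
top -H -p "$(pidof named)"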
I have opened a VMware case to check with them whether something at the VM level could explain this behavior, but for the moment they think the problem comes from BIND not being able to read data from the network buffer fast enough.
Do you have any ideas, please?
Thank you very much for your help.
Regards.