BIND-9.16.19: Resolver fetch process hang
Summary
Sending outgoing resquery process could be hang for seconds, which is abnormal.
BIND version used
BIND 9.16.19-RedHat-9.16.19-20210816135007.alios7 (Stable Release) id:df0e751 running on Linux x86_64 3.10.0-327.ali2012.alios7.x86_64 #1 SMP Mon Oct 9 14:09:14 CST 2017 built by make with '--build=x86_64-redhat-linux-gnu' '--host=x86_64-redhat-linux-gnu' '--program-prefix=' '--disable-dependency-tracking' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--sysconfdir=/etc' '--datadir=/usr/share' '--includedir=/usr/include' '--libdir=/usr/lib64' '--libexecdir=/usr/libexec' '--sharedstatedir=/var/lib' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--with-libtool' '--localstatedir=/var' '--enable-epoll' '--with-pic' '--with-python=/home/tops/bin/python2.7' '--with-python-install-dir=/home/tops' '--disable-geoip' '--with-tuning=large' '--enable-querytrace' '--enable-auto-validation=no' 'build_alias=x86_64-redhat-linux-gnu' 'host_alias=x86_64-redhat-linux-gnu' 'CFLAGS= -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic' 'LDFLAGS=-Wl,-z,relro ' 'CPPFLAGS= -DDIG_SIGCHASE' 'PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig:/usr/lib64/pkgconfig/:/home/admin/246_20210813142442272_220712980_code/rpm_workspace/rpm/.dep_create/lib/pkgconfig' compiled by GCC 4.8.5 20150623 (Red Hat 4.8.5-4) compiled with OpenSSL version: OpenSSL 1.0.1e-fips 11 Feb 2013 linked to OpenSSL version: OpenSSL 1.0.1e-fips 11 Feb 2013 compiled with libuv version: 1.35.0 linked to libuv version: 1.35.0 compiled with libxml2 version: 2.9.1 linked to libxml2 version: 20901 compiled with zlib version: 1.2.7 linked to zlib version: 1.2.7 threads support is enabled
default paths: named configuration: /etc/named.conf rndc configuration: /etc/rndc.conf DNSSEC root key: /etc/bind.keys nsupdate session key: /var/run/named/session.key named PID file: /var/run/named/named.pid named lock file: /var/run/named/named.lock
Steps to reproduce
Not sure, supposing some specific flow would cause the issue.
What is the current bug behavior?
Sending outgoing resquery process could be hang for seconds, which can be detected by querytrace(enabled by compiling):
17-Aug-2021 00:41:27.982 fetch: game-redis-1159.knight.game.12406.270199.-100/CNAME
17-Aug-2021 00:41:27.982 fctx 0x7fa60be39de0(game-redis-1159.knight.game.12406.270199.-100/CNAME): create
17-Aug-2021 00:41:27.982 fctx 0x7fa60be39de0(game-redis-1159.knight.game.12406.270199.-100/CNAME): join
17-Aug-2021 00:41:27.982 fetch 0x7fa699adedf0 (fctx 0x7fa60be39de0(game-redis-1159.knight.game.12406.270199.-100/CNAME)): created
17-Aug-2021 00:41:27.982 fctx 0x7fa60be39de0(game-redis-1159.knight.game.12406.270199.-100/CNAME): start
17-Aug-2021 00:41:27.982 fctx 0x7fa60be39de0(game-redis-1159.knight.game.12406.270199.-100/CNAME): try fctx->qc=0
17-Aug-2021 00:41:27.982 fctx 0x7fa60be39de0(game-redis-1159.knight.game.12406.270199.-100/CNAME): cancelqueries
17-Aug-2021 00:41:27.982 fctx 0x7fa60be39de0(game-redis-1159.knight.game.12406.270199.-100/CNAME): getaddresses fctx->depth=0
17-Aug-2021 00:41:27.982 fctx 0x7fa60be39de0(game-redis-1159.knight.game.12406.270199.-100/CNAME): query
17-Aug-2021 00:41:27.982 resquery 0x7fa604d05360 (fctx 0x7fa60be39de0(game-redis-1159.knight.game.12406.270199.-100/CNAME)): send
17-Aug-2021 00:41:29.122 resquery 0x7fa604d05360 (fctx 0x7fa60be39de0(game-redis-1159.knight.game.12406.270199.-100/CNAME)): sent
17-Aug-2021 00:41:29.123 resquery 0x7fa604d05360 (fctx 0x7fa60be39de0(game-redis-1159.knight.game.12406.270199.-100/CNAME)): udpconnected
17-Aug-2021 00:41:29.123 resquery 0x7fa604d05360 (fctx 0x7fa60be39de0(game-redis-1159.knight.game.12406.270199.-100/CNAME)): senddone
17-Aug-2021 00:41:29.152 resquery 0x7fa604d05360 (fctx 0x7fa60be39de0(game-redis-1159.knight.game.12406.270199.-100/CNAME)): response
17-Aug-2021 00:41:29.152 fctx 0x7fa60be39de0(game-redis-1159.knight.game.12406.270199.-100/CNAME): rctx_answer
17-Aug-2021 00:41:29.152 fctx 0x7fa60be39de0(game-redis-1159.knight.game.12406.270199.-100/CNAME): cache_message
17-Aug-2021 00:41:29.152 fctx 0x7fa60be39de0(game-redis-1159.knight.game.12406.270199.-100/CNAME): cache_name
17-Aug-2021 00:41:29.152 fctx 0x7fa60be39de0(game-redis-1159.knight.game.12406.270199.-100/CNAME): findnoqname
17-Aug-2021 00:41:29.153 fctx 0x7fa60be39de0(game-redis-1159.knight.game.12406.270199.-100/CNAME): clone_results
17-Aug-2021 00:41:29.153 fctx 0x7fa60be39de0(game-redis-1159.knight.game.12406.270199.-100/CNAME): [result: success] query canceled in response(); responding
17-Aug-2021 00:41:29.153 fctx 0x7fa60be39de0(game-redis-1159.knight.game.12406.270199.-100/CNAME): cancelquery
17-Aug-2021 00:41:29.153 fctx 0x7fa60be39de0(game-redis-1159.knight.game.12406.270199.-100/CNAME): done
17-Aug-2021 00:41:29.153 fctx 0x7fa60be39de0(game-redis-1159.knight.game.12406.270199.-100/CNAME): stopqueries
17-Aug-2021 00:41:29.153 fctx 0x7fa60be39de0(game-redis-1159.knight.game.12406.270199.-100/CNAME): cancelqueries
17-Aug-2021 00:41:29.153 fctx 0x7fa60be39de0(game-redis-1159.knight.game.12406.270199.-100/CNAME): sendevents
17-Aug-2021 00:41:29.158 fetch 0x7fa699adedf0 (fctx 0x7fa60be39de0(game-redis-1159.knight.game.12406.270199.-100/CNAME)): destroyfetch
17-Aug-2021 00:41:29.158 fctx 0x7fa60be39de0(game-redis-1159.knight.game.12406.270199.-100/CNAME): shutdown
17-Aug-2021 00:41:29.187 fctx 0x7fa60be39de0(game-redis-1159.knight.game.12406.270199.-100/CNAME): doshutdown
17-Aug-2021 00:41:29.187 fctx 0x7fa60be39de0(game-redis-1159.knight.game.12406.270199.-100/CNAME): stopqueries
17-Aug-2021 00:41:29.187 fctx 0x7fa60be39de0(game-redis-1159.knight.game.12406.270199.-100/CNAME): cancelqueries
17-Aug-2021 00:41:29.187 fctx 0x7fa60be39de0(game-redis-1159.knight.game.12406.270199.-100/CNAME): unlink
17-Aug-2021 00:41:29.187 fctx 0x7fa60be39de0(game-redis-1159.knight.game.12406.270199.-100/CNAME): destroy
It take more than 1 second between resquery "send" and be "sent". Meanwhile, all working threads are in D(uninterruptible sleep) state:
What is the expected correct behavior?
Not hang.
(What you should see instead.)
Relevant configuration files
(Paste any relevant configuration files - please use code blocks (```)
to format console output. If submitting the contents of your
configuration file in a non-confidential Issue, it is advisable to
obscure key secrets: this can be done automatically by using
named-checkconf -px
.)
Relevant logs and/or screenshots
(Paste any relevant logs - please use code blocks (```) to format console output, logs, and code, as it's very hard to read otherwise.)
Possible fixes
(If you can, link to the line of code that might be responsible for the problem.)