Possible data race in rbtdb, happens occasionally on ppc64le
Summary
Assertion failures was reported on Fedora bug 1736762. It was reported on ppc64le, where it happens occasionally.
BIND version used
(Paste the output of named -V
.)
BIND 9.11.11-RedHat-9.11.11-1.fc30 (Extended Support Version) <id:4ae9ff1>
running on Linux ppc64le 5.3.12-200.fc30.ppc64le #1 SMP Thu Nov 21 22:15:03 UTC 2019
built by make with '--build=ppc64le-redhat-linux-gnu' '--host=ppc64le-redhat-linux-gnu' '--program-prefix=' '--disable-dependency-tracking' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--sysconfdir=/etc' '--datadir=/usr/share' '--includedir=/usr/include' '--libdir=/usr/lib64' '--libexecdir=/usr/libexec' '--sharedstatedir=/var/lib' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--with-python=/usr/bin/python3' '--with-libtool' '--localstatedir=/var' '--enable-threads' '--enable-ipv6' '--enable-filter-aaaa' '--with-pic' '--disable-static' '--includedir=/usr/include/bind9' '--with-tuning=large' '--with-geoip' '--with-libidn2' '--enable-openssl-hash' '--enable-native-pkcs11' '--with-pkcs11=/usr/lib64/pkcs11/libsofthsm2.so' '--with-dlopen=yes' '--with-dlz-ldap=yes' '--with-dlz-postgres=yes' '--with-dlz-mysql=yes' '--with-dlz-filesystem=yes' '--with-dlz-bdb=yes' '--with-gssapi=yes' '--disable-isc-spnego' '--with-lmdb=yes' '--with-libjson' '--enable-dnstap' '--with-cmocka' '--enable-fixed-rrset' '--with-docbook-xsl=/usr/share/sgml/docbook/xsl-stylesheets' '--enable-full-report' 'build_alias=ppc64le-redhat-linux-gnu' 'host_alias=ppc64le-redhat-linux-gnu' 'CFLAGS= -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mcpu=power8 -mtune=power8 -fasynchronous-unwind-tables -fstack-clash-protection' 'LDFLAGS=-Wl,-z,relro -Wl,--as-needed -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld' 'CPPFLAGS= -DDIG_SIGCHASE' 'PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig'
compiled by GCC 9.2.1 20190827 (Red Hat 9.2.1-1)
compiled with OpenSSL version: OpenSSL 1.1.1d FIPS 10 Sep 2019
linked to OpenSSL version: OpenSSL 1.1.1d FIPS 10 Sep 2019
compiled with libxml2 version: 2.9.9
linked to libxml2 version: 20909
compiled with libjson-c version: 0.13.1
linked to libjson-c version: 0.13.1
compiled with zlib version: 1.2.11
linked to zlib version: 1.2.11
threads support is enabled
Steps to reproduce
now known, happens randomly.
What is the current bug behavior?
Jul 22 00:51:58 Blackbird named[1860]: ../../../lib/dns/rbtdb.c:10337: INSIST((rbtdb->rdatasets[header->node->locknum]).head == (header)) failed, back trace
Jul 22 00:51:58 Blackbird audit[1860]: ANOM_ABEND auid=4294967295 uid=25 gid=25 ses=4294967295 pid=1860 comm="isc-worker0006" exe="/usr/sbin/named" sig=6 res=1
Jul 22 00:51:58 Blackbird named[1860]: #0 0x12f650508 in ??
Jul 22 00:51:58 Blackbird named[1860]: #1 0x7fffa8f5d978 in ??
Jul 22 00:51:58 Blackbird named[1860]: #2 0x7fffa91b2714 in ??
Jul 22 00:51:58 Blackbird named[1860]: #3 0x7fffa91b9738 in ??
Jul 22 00:51:58 Blackbird named[1860]: #4 0x7fffa91c3834 in ??
Jul 22 00:51:58 Blackbird named[1860]: #5 0x7fffa913bb90 in ??
Jul 22 00:51:58 Blackbird named[1860]: #6 0x12f6630e0 in ??
Jul 22 00:51:58 Blackbird named[1860]: #7 0x12f66c258 in ??
Jul 22 00:51:58 Blackbird named[1860]: #8 0x7fffa8f91c28 in ??
Jul 22 00:51:58 Blackbird named[1860]: #9 0x7fffa8ab8c10 in ??
Jul 22 00:51:58 Blackbird named[1860]: #10 0x7fffa86ed8a8 in ??
Jul 22 00:51:58 Blackbird named[1860]: exiting (due to assertion failure)
Jul 22 00:51:59 Blackbird systemd[1]: named.service: Main process exited, code=killed, status=6/ABRT
Jul 22 00:51:59 Blackbird systemd[1]: named.service: Failed with result 'signal'.
(gdb) bt
#0 0x00007fffa51b2668 in __libc_signal_restore_set (set=0x7fffa381ae78) at ../sysdeps/unix/sysv/linux/internal-signals.h:84
#1 __GI_raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:48
#2 0x00007fffa519468c in __GI_abort () at abort.c:79
#3 0x0000000105810cf4 in assertion_failed (file=<optimized out>, line=<optimized out>, type=<optimized out>,
cond=<optimized out>) at ../../../bin/named/main.c:234
#4 0x00007fffa5b0db68 in isc_assertion_failed (file=<optimized out>, line=<optimized out>, type=<optimized out>,
cond=<optimized out>) at ../../../lib/isc/assertions.c:51
#5 0x00007fffa5d64134 in update_header (header=header@entry=0x7fff9969f010, now=<optimized out>, rbtdb=<optimized out>,
rbtdb=<optimized out>) at ../../../lib/dns/rbtdb.c:10321
#6 0x00007fffa5d6b178 in find_deepest_zonecut (search=0x7fffa381bbd8, node=0x7fff9969a010, nodep=0x7fffa381c940,
foundname=0x7fff99c38150, rdataset=0x7fff99c3c5c0, sigrdataset=0x0) at ../../../lib/dns/rbtdb.c:4878
#7 0x00007fffa5d752a4 in cache_find (db=<optimized out>, name=<optimized out>, version=<optimized out>,
type=<optimized out>, options=<optimized out>, now=<optimized out>, nodep=0x7fffa381c940, foundname=0x7fff99c38150,
rdataset=<optimized out>, sigrdataset=<optimized out>) at ../../../lib/dns/rbtdb.c:5087
#8 0x00007fffa5cebb40 in dns_db_findext (db=0x7fff99694010, name=0x7fff99c381f0, version=0x0, type=<optimized out>,
options=<optimized out>, now=<optimized out>, nodep=0x7fffa381c940, foundname=<optimized out>, methods=0x7fffa381ca58,
clientinfo=0x7fffa381ca88, rdataset=0x7fff99c3c5c0, sigrdataset=0x0) at ../../../lib/dns/db.c:594
#9 0x00000001058238b0 in query_find (client=<optimized out>, event=<optimized out>, event@entry=0x0, qtype=qtype@entry=1)
at ../../../bin/named/query.c:7274
#10 0x000000010582dfb4 in ns_query_start (client=0x7fff9425fc70) at ../../../bin/named/query.c:9672
#11 0x00000001058022c0 in client_request (task=<optimized out>, event=<optimized out>) at ../../../bin/named/client.c:3093
#12 0x00007fffa5b42068 in dispatch (manager=0x7fffa3825010) at ../../../lib/isc/task.c:1143
#13 run (uap=0x7fffa3825010) at ../../../lib/isc/task.c:1315
#14 0x00007fffa56688c8 in start_thread (arg=0x7fffa381f070) at pthread_create.c:479
#15 0x00007fffa529f498 in clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:82
What is the expected correct behavior?
No assertion failure.
Relevant configuration files
(Paste any relevant configuration files - please use code blocks (```)
to format console output. If submitting the contents of your
configuration file in a non-confidential Issue, it is advisable to
obscure key secrets: this can be done automatically by using
named-checkconf -px
.)
Relevant logs and/or screenshots
Fedora bug contains some backtraces.
Possible fixes
(If you can, link to the line of code that might be responsible for the problem.) I think write lock should be used from the start. Or it should be upgraded before search for actual header pointers are used.
I lack expertise related to thread collisions in bind database structures. I would like to ask some help, if this might be related to already found issues in rbtdb.