BIND 9.16 unit tests failing reliably on x86_64 NUMA machines
Summary
Some libisc unit tests fail reliably on a 9.16.10 build. On machines with many CPUs they abort on the assertion tid_v < isc__hp_max_threads in lib/isc/hp.c.
BIND version used
BIND 9.16.10-RedHat-9.16.10-2.fc34 (Stable Release) <id:fac8def>
running on Linux x86_64 5.11.0-0.rc2.20210108gitf5e6c330254a.119.fc34.x86_64 #1 SMP Fri Jan 8 16:28:08 UTC 2021
built by make with '--build=x86_64-redhat-linux-gnu' '--host=x86_64-redhat-linux-gnu' '--program-prefix=' '--disable-dependency-tracking' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--sysconfdir=/etc' '--datadir=/usr/share' '--includedir=/usr/include' '--libdir=/usr/lib64' '--libexecdir=/usr/libexec' '--sharedstatedir=/var/lib' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--with-python=/usr/bin/python3' '--with-libtool' '--localstatedir=/var' '--with-pic' '--disable-static' '--includedir=/usr/include/bind9' '--with-tuning=large' '--with-libidn2' '--with-maxminddb' '--enable-native-pkcs11' '--with-pkcs11=/usr/lib64/pkcs11/libsofthsm2.so' '--with-dlopen=yes' '--with-gssapi=yes' '--disable-isc-spnego' '--with-lmdb=yes' '--without-libjson' '--with-json-c' '--enable-dnstap' '--with-cmocka' '--enable-fixed-rrset' '--enable-full-report' 'build_alias=x86_64-redhat-linux-gnu' 'host_alias=x86_64-redhat-linux-gnu' 'CC=gcc' 'CFLAGS= -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection' 'LDFLAGS=-Wl,-z,relro -Wl,--as-needed -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld ' 'LT_SYS_LIBRARY_PATH=/usr/lib64:' 'PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig'
compiled by GCC 11.0.0 20210113 (Red Hat 11.0.0-0)
compiled with OpenSSL version: OpenSSL 1.1.1i FIPS 8 Dec 2020
linked to OpenSSL version: OpenSSL 1.1.1i FIPS 8 Dec 2020
compiled with libuv version: 1.40.0
linked to libuv version: 1.40.0
compiled with libxml2 version: 2.9.10
linked to libxml2 version: 20910
compiled with json-c version: 0.14
linked to json-c version: 0.14
compiled with zlib version: 1.2.11
linked to zlib version: 1.2.11
linked to maxminddb version: 1.4.3
compiled with protobuf-c version: 1.3.3
linked to protobuf-c version: 1.3.3
threads support is enabled
default paths:
named configuration: /etc/named.conf
rndc configuration: /etc/rndc.conf
DNSSEC root key: /etc/bind.keys
nsupdate session key: /var/run/named/session.key
named PID file: /var/run/named/named.pid
named lock file: /var/run/named/named.lock
geoip-directory: /usr/share/GeoIP
Steps to reproduce
Run the lib/isc unit tests of a 9.16.10 build on a NUMA machine with 40 or more logical processors.
What is the current bug behavior?
Several lib/isc tests abort with signal 6 (SIGABRT) after a failed REQUIRE assertion in lib/isc/hp.c. Backtrace from buffer_test:
gdb .libs/lt-buffer_test
(gdb) run
(gdb) bt
#0 0x00007ffff7da0272 in raise () from /lib64/libc.so.6
#1 0x00007ffff7d898a4 in abort () from /lib64/libc.so.6
#2 0x00007ffff7f5f1a5 in isc_assertion_failed (file=file@entry=0x7ffff7fa60ea "../../../lib/isc/hp.c", line=line@entry=88,
type=type@entry=isc_assertiontype_require, cond=cond@entry=0x7ffff7fa60ce "tid_v < isc__hp_max_threads")
at ../../../lib/isc/assertions.c:47
#3 0x00007ffff7f6363d in tid () at ../../../lib/isc/hp.c:85
#4 tid () at ../../../lib/isc/hp.c:85
#5 isc_hp_protect (hp=0x55555db5adb8, ihp=<optimized out>, atom=0x555558f9ef80) at ../../../lib/isc/hp.c:167
#6 0x00007ffff7f8036c in isc_queue_dequeue (queue=<optimized out>) at ../../../lib/isc/queue.c:181
#7 isc_queue_dequeue (queue=queue@entry=0x555558f9ef80) at ../../../lib/isc/queue.c:173
#8 0x00007ffff7f84e90 in process_queue (worker=worker@entry=0x5555555744c0, queue=0x555558f9ef80)
at netmgr/../../../../lib/isc/netmgr/netmgr.c:606
#9 0x00007ffff7f8567b in process_priority_queue (worker=0x5555555744c0) at netmgr/../../../../lib/isc/netmgr/netmgr.c:585
#10 process_queues (worker=0x5555555744c0) at netmgr/../../../../lib/isc/netmgr/netmgr.c:595
#11 async_cb (handle=<optimized out>) at netmgr/../../../../lib/isc/netmgr/netmgr.c:556
#12 0x00007ffff7a3bffd in uv__async_io (loop=0x5555555744d0, w=<optimized out>, events=<optimized out>) at src/unix/async.c:163
#13 0x00007ffff7a54ac3 in uv__io_poll (loop=0x5555555744d0, timeout=<optimized out>) at src/unix/linux-core.c:462
#14 0x00007ffff7a44794 in uv_run (loop=loop@entry=0x5555555744d0, mode=mode@entry=UV_RUN_DEFAULT) at src/unix/core.c:385
#15 0x00007ffff7f852f9 in nm_thread (worker0=0x5555555744c0) at netmgr/../../../../lib/isc/netmgr/netmgr.c:496
#16 0x00007ffff786a269 in start_thread () from /lib64/libpthread.so.0
#17 0x00007ffff7e64143 in clone () from /lib64/libc.so.6
(gdb) frame 2
#2 0x00007ffff7f5f1a5 in isc_assertion_failed (file=file@entry=0x7ffff7fa60ea "../../../lib/isc/hp.c", line=line@entry=88,
type=type@entry=isc_assertiontype_require, cond=cond@entry=0x7ffff7fa60ce "tid_v < isc__hp_max_threads")
at ../../../lib/isc/assertions.c:47
47 abort();
(gdb) p tid_v
$1 = 145
(gdb) p isc__hp_max_threads
$2 = 128
What is the expected correct behavior?
The tests should pass, just as they do on older …
Relevant logs and/or screenshots
The tests fail often when BIND is built on Fedora infrastructure:
Build log: https://kojipkgs.fedoraproject.org//work/tasks/3832/59983832/build.log
Koji task: https://koji.fedoraproject.org/koji/taskinfo?taskID=59983579
22/28 passed (6 failed)
===> Broken tests
buffer_test:main -> broken: Received signal 6 [3.586s]
mem_test:main -> broken: Received signal 6 [6.796s]
pool_test:main -> broken: Received signal 6 [3.524s]
socket_test:main -> broken: Received signal 6 [3.409s]
task_test:main -> broken: Received signal 6 [15.954s]
taskpool_test:main -> broken: Received signal 6 [3.504s]
===> Summary
Results read from /root/.kyua/store/results.root_bind_bind-9.16.10_build_lib_isc_tests.20210119-125008-906997.db
Test cases: 28 total, 0 skipped, 0 expected failures, 6 broken, 0 failed
Total time: 62.697s
R:FAIL:status:1
E:unit:Tue Jan 19 07:51:12 AM EST 2021
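CPU details of the build host (the last entry is processor 55; with 2 sockets, 14 cores each and 2 threads per core, this is 56 logical CPUs):
# tail -30 /proc/cpuinfo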
power management:
processor : 55
vendor_id : GenuineIntel
cpu family : 6
model : 63
model name : Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz
stepping : 2
microcode : 0x2d
cpu MHz : 2400.000
cache size : 35840 KB
physical id : 1
siblings : 28
core id : 14
cpu cores : 14
apicid : 61
initial apicid : 61
fpu : yes
fpu_exception : yes
cpuid level : 15
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti intel_ppin tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm ida arat pln pts
vmx flags : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid ple shadow_vmcs
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips : 5187.78
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
Possible fixes