unit test netmgr_test of 9.17.15 fails reliably on s390x
Summary
netmgr_test always fails in 9.17.15 builds on s390x.
BIND version used
BIND 9.17.15 (Development Release) <id:a3a1875>
running on Linux s390x 5.13.0-0.rc7.20210624git7426cedc7dad.54.fc35.s390x #1 SMP Thu Jun 24 15:11:21 UTC 2021
built by make with '--build=s390x-ibm-linux-gnu' '--host=s390x-ibm-linux-gnu' '--program-prefix=' '--disable-dependency-tracking' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--sysconfdir=/etc' '--datadir=/usr/share' '--includedir=/usr/include' '--libdir=/usr/lib64' '--libexecdir=/usr/libexec' '--sharedstatedir=/var/lib' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--localstatedir=/var' '--with-pic' '--disable-static' '--includedir=/usr/include/bind9' '--with-tuning=large' '--with-libidn2' '--with-maxminddb' '--with-gssapi=yes' '--with-lmdb=yes' '--with-json-c' '--enable-dnstap' '--with-cmocka' '--enable-fixed-rrset' '--enable-full-report' 'build_alias=s390x-ibm-linux-gnu' 'host_alias=s390x-ibm-linux-gnu' 'CC=gcc' 'CFLAGS= -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -march=zEC12 -mtune=z13 -fasynchronous-unwind-tables -fstack-clash-protection' 'LDFLAGS=-Wl,-z,relro -Wl,--as-needed -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld ' 'LT_SYS_LIBRARY_PATH=/usr/lib64:' 'PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig'
compiled by GCC 11.1.1 20210623 (Red Hat 11.1.1-6)
compiled with OpenSSL version: OpenSSL 1.1.1k FIPS 25 Mar 2021
linked to OpenSSL version: OpenSSL 1.1.1k FIPS 25 Mar 2021
compiled with libuv version: 1.41.0
linked to libuv version: 1.41.0
compiled with libnghttp2 version: 1.43.0
linked to libnghttp2 version: 1.43.0
compiled with libxml2 version: 2.9.12
linked to libxml2 version: 20912
compiled with json-c version: 0.14
linked to json-c version: 0.14
compiled with zlib version: 1.2.11
linked to zlib version: 1.2.11
linked to maxminddb version: 1.5.2
compiled with protobuf-c version: 1.3.3
linked to protobuf-c version: 1.3.3
threads support is enabled
default paths:
named configuration: /etc/named.conf
rndc configuration: /etc/rndc.conf
DNSSEC root key: /etc/bind.keys
nsupdate session key: /var/run/named/session.key
named PID file: /var/run/named/named.pid
named lock file: /var/run/named/named.lock
geoip-directory: /usr/share/GeoIP
Steps to reproduce
- Build 9.17.15 on Fedora Rawhide for the s390x architecture (it fails on Fedora 34 too).
- Run make unit.
What I used:
- git clone https://src.fedoraproject.org/forks/pemensik/rpms/bind.git
- cd bind
- git checkout v9_17
- fedpkg builddep bind.spec
- fedpkg --release rawhide local
- or fedpkg --release rawhide scratch-build --arch s390x --srpm
What is the current bug behavior?
make[5]: Entering directory '/builddir/build/BUILD/bind-9.17.15/build/lib/isc/tests'
PASS: aes_test
PASS: buffer_test
PASS: counter_test
PASS: crc64_test
PASS: doh_test
PASS: errno_test
PASS: file_test
PASS: hash_test
PASS: heap_test
PASS: hmac_test
PASS: ht_test
PASS: lex_test
PASS: md_test
PASS: mem_test
PASS: netaddr_test
FAIL: netmgr_test
PASS: parse_test
PASS: pool_test
PASS: quota_test
PASS: radix_test
PASS: random_test
PASS: regex_test
PASS: result_test
PASS: safe_test
PASS: siphash_test
PASS: sockaddr_test
PASS: socket_test
PASS: symtab_test
PASS: task_test
PASS: taskpool_test
PASS: time_test
PASS: timer_test
============================================================================
Testsuite summary for BIND 9.17.15
============================================================================
# TOTAL: 32
# PASS: 31
# SKIP: 0
# XFAIL: 0
# FAIL: 1
# XPASS: 0
# ERROR: 0
============================================================================
See lib/isc/tests/test-suite.log
Please report to info@isc.org
============================================================================
What is the expected correct behavior?
All tests pass (FAIL: 0).
Relevant configuration files
The bind.spec from the repository above was used.
Relevant logs and/or screenshots
$ ./netmgr_test
[==========] Running 81 test(s).
[ RUN ] mock_listenudp_uv_udp_open
[ OK ] mock_listenudp_uv_udp_open
[ RUN ] mock_listenudp_uv_udp_bind
[ OK ] mock_listenudp_uv_udp_bind
[ RUN ] mock_listenudp_uv_udp_recv_start
[ OK ] mock_listenudp_uv_udp_recv_start
[ RUN ] mock_udpconnect_uv_udp_open
[ OK ] mock_udpconnect_uv_udp_open
[ RUN ] mock_udpconnect_uv_udp_bind
[ OK ] mock_udpconnect_uv_udp_bind
[ RUN ] mock_udpconnect_uv_udp_connect
[ OK ] mock_udpconnect_uv_udp_connect
[ RUN ] mock_udpconnect_uv_recv_buffer_size
[ OK ] mock_udpconnect_uv_recv_buffer_size
[ RUN ] mock_udpconnect_uv_send_buffer_size
[ OK ] mock_udpconnect_uv_send_buffer_size
[ RUN ] udp_noop
[ OK ] udp_noop
[ RUN ] udp_noresponse
[ OK ] udp_noresponse
[ RUN ] udp_timeout_recovery
[ OK ] udp_timeout_recovery
[ RUN ] udp_recv_one
$ echo $?
255
(gdb) bt
#0 __GI_exit (status=status@entry=-1) at exit.c:143
#1 0x000003fffde8296e in exit_test (quit_application=1) at /usr/src/debug/cmocka-1.1.5-9.fc35.s390x/src/cmocka.c:408
#2 0x000003fffde82a7a in _fail (file=file@entry=0x2aa00018c8c "../../../../lib/isc/tests/netmgr_test.c",
line=<optimized out>) at /usr/src/debug/cmocka-1.1.5-9.fc35.s390x/src/cmocka.c:2196
#3 0x000003fffde82b3c in _assert_true (result=0, line=<optimized out>,
file=0x2aa00018c8c "../../../../lib/isc/tests/netmgr_test.c",
expression=0x2aa00018f3a "region->length >= sizeof(magic)")
at /usr/src/debug/cmocka-1.1.5-9.fc35.s390x/src/cmocka.c:1730
#4 0x000002aa000078c2 in listen_read_cb (handle=<optimized out>, eresult=<optimized out>,
region=region@entry=0x3fffd0395c8, cbarg=0x0) at ../../../../lib/isc/tests/netmgr_test.c:537
#5 0x000003fffdda9b16 in isc__nm_async_readcb (worker=worker@entry=0x0, ev0=ev0@entry=0x3fffd039680)
at ../../../lib/isc/netmgr/netmgr.c:2739
#6 0x000003fffdda9c98 in isc__nm_readcb (sock=sock@entry=0x2aa00e5fd30, uvreq=<optimized out>, eresult=eresult@entry=0)
at ../../../lib/isc/netmgr/netmgr.c:2714
#7 0x000002aa00005cba in udp_recv_cb (handle=<optimized out>, nrecv=0, buf=0x3fffd0398b8, addr=0x3fffd039748,
flags=<optimized out>) at ../../../../lib/isc/tests/../netmgr/udp.c:420
#8 0x000003fffdd22cc2 in uv__udp_recvmsg (handle=0x2aa00e603f0) at src/unix/udp.c:304
#9 uv__udp_io (loop=<optimized out>, w=0x2aa00e60470, revents=<optimized out>) at src/unix/udp.c:180
#10 0x000003fffdd26b78 in uv__io_poll (loop=0x2aa004d68b0, timeout=<optimized out>) at src/unix/linux-core.c:462
#11 0x000003fffdd163e0 in uv_run (loop=loop@entry=0x2aa004d68b0, mode=mode@entry=UV_RUN_DEFAULT) at src/unix/core.c:385
#12 0x000003fffddad4f0 in nm_thread (worker0=0x2aa004d68a0) at ../../../lib/isc/netmgr/netmgr.c:746
#13 0x000003fffddf2696 in isc__trampoline_run (arg=0x2aa0057c850) at ../../../lib/isc/trampoline.c:184
#14 0x000003fffdb9a8a2 in start_thread (arg=<optimized out>) at pthread_create.c:429
#15 0x000003fffdc12d8e in thread_start () at ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:67
(gdb) frame 4
#4 0x000002aa000078c2 in listen_read_cb (handle=<optimized out>, eresult=<optimized out>,
region=region@entry=0x3fffd0395c8, cbarg=0x0) at ../../../../lib/isc/tests/netmgr_test.c:537
537 assert_true(region->length >= sizeof(magic));
(gdb) p region->length
$1 = 0
(gdb) p sizeof(magic)
$2 = 8
(gdb) p region
$3 = (isc_region_t *) 0x3fffd0395c8
(gdb) p *region
$4 = {base = 0x2aa0057d270 "", length = 0}
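The gdb session shows the read callback receiving an empty region (length 0) while frame #6 reports eresult=0 (ISC_R_SUCCESS), and frame #7 shows udp_recv_cb() called with nrecv=0 and a non-NULL addr (this is an -O2 build, so some values gdb prints may be imprecise). The libuv documentation for uv_udp_recv_cb says that nread == 0 with a non-NULL addr means an empty datagram was received, while 0 with addr == NULL only means there is nothing more to read. Below is a minimal sketch, assuming Linux with a working loopback interface, showing that a zero-length UDP datagram is an ordinary 0-byte read, which the length check at netmgr_test.c:537 would reject. empty_datagram_len(), region_can_hold_magic() and MAGIC_SIZE are hypothetical names, not BIND code, and this log does not establish where an empty datagram would come from during the test.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical: gdb reports sizeof(magic) == 8 in listen_read_cb(). */
#define MAGIC_SIZE 8

/*
 * Sends a zero-length UDP datagram to a socket bound on 127.0.0.1 and
 * returns what recvfrom() reports for it, or -1 if a socket call fails.
 */
static ssize_t
empty_datagram_len(void) {
	struct sockaddr_in addr;
	socklen_t addrlen = sizeof(addr);
	struct timeval timeout = { .tv_sec = 1, .tv_usec = 0 };
	char buf[64];
	ssize_t n = -1;

	int fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0) {
		return -1;
	}

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0; /* let the kernel choose a free port */

	/* Never block for more than a second waiting for the datagram. */
	if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &timeout,
		       sizeof(timeout)) == 0 &&
	    bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
	    getsockname(fd, (struct sockaddr *)&addr, &addrlen) == 0 &&
	    sendto(fd, "", 0, 0, (struct sockaddr *)&addr, sizeof(addr)) == 0)
	{
		/* The datagram has no payload, so a successful read is 0. */
		n = recvfrom(fd, buf, sizeof(buf), 0, NULL, NULL);
	}

	close(fd);
	return n;
}

/* The condition asserted at netmgr_test.c:537, as a plain predicate. */
static bool
region_can_hold_magic(size_t length) {
	return length >= MAGIC_SIZE;
}
```

Here empty_datagram_len() is expected to return 0, and region_can_hold_magic(0) is false, which is the same situation as the failed assert_true().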
# cat /proc/cpuinfo
vendor_id : IBM/S390
# processors : 2
bogomips per cpu: 3033.00
max thread id : 0
features : esan3 zarch stfle msa ldisp eimm dfp edat etf3eh highgprs te vx sie
facilities : 0 1 2 3 4 6 7 8 9 10 12 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 30 31 32 33 34 35 36 37 40 41 42 43 44 45 46 47 48 49 50 51 52 53 55 57 73 74 75 76 77 80 81 82 128 129 131
cache0 : level=1 type=Data scope=Private size=128K line_size=256 associativity=8
cache1 : level=1 type=Instruction scope=Private size=96K line_size=256 associativity=6
cache2 : level=2 type=Data scope=Private size=2048K line_size=256 associativity=8
cache3 : level=2 type=Instruction scope=Private size=2048K line_size=256 associativity=8
cache4 : level=3 type=Unified scope=Shared size=65536K line_size=256 associativity=16
cache5 : level=4 type=Unified scope=Shared size=491520K line_size=256 associativity=30
processor 0: version = FF, identification = 3233E8, machine = 2964
processor 1: version = FF, identification = 3233E8, machine = 2964
cpu number : 0
physical id : 0
core id : 0
book id : 0
drawer id : 0
dedicated : 0
address : 0
siblings : 1
cpu cores : 1
version : FF
identification : 3233E8
machine : 2964
cpu MHz dynamic : 5000
cpu MHz static : 5000
cpu number : 1
physical id : 1
core id : 1
book id : 1
drawer id : 1
dedicated : 0
address : 1
siblings : 1
cpu cores : 1
version : FF
identification : 3233E8
machine : 2964
cpu MHz dynamic : 5000
cpu MHz static : 5000
Strangely, the failure is not printed to stdout or stderr; the test just exits with status 255.
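Frame #0 of the backtrace shows cmocka's exit_test() (called with quit_application=1) ending the process with exit(-1), and frames #12 to #15 show that this happens on a netmgr worker thread rather than on the main test thread. A waiting parent only sees the low 8 bits of the exit status, so the shell reports -1 as 255. A minimal POSIX sketch of that truncation; observed_exit_status() is a hypothetical helper, not part of BIND or cmocka:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Runs a child that terminates with `code` and returns the exit status
 * the parent (e.g. a shell's $?) observes, or -1 on fork/wait failure.
 */
static int
observed_exit_status(int code) {
	pid_t pid = fork();
	if (pid < 0) {
		return -1;
	}
	if (pid == 0) {
		/*
		 * _exit() skips stdio flushing in the child; the status
		 * truncation is the same as for exit().
		 */
		_exit(code);
	}

	int wstatus;
	if (waitpid(pid, &wstatus, 0) != pid || !WIFEXITED(wstatus)) {
		return -1;
	}
	/* Only the low 8 bits of the status survive: -1 becomes 255. */
	return WEXITSTATUS(wstatus);
}
```

observed_exit_status(-1) returns 255, matching the echo $? output of the failing run.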
Possible fixes
It fails reliably on Fedora 34 too. I have no idea why it always fails on s390x. I have also seen this failure on an x86_64 builder, but I could not reproduce it on my own machines. Could it be related to the CPU count? The s390x builder has only 2 CPUs.
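To check the CPU-count theory, one option is to record how many CPUs each builder exposes; the s390x machine above reports 2 (# processors : 2 in /proc/cpuinfo). Below is a minimal sketch using sysconf(3), assuming glibc on Linux, where _SC_NPROCESSORS_ONLN and _SC_NPROCESSORS_CONF are available as extensions to POSIX. The helper names are hypothetical, and whether netmgr_test sizes its workers from either value is not established by this report.

```c
#include <unistd.h>

/* CPUs currently online, per sysconf(3); -1 if it cannot be determined. */
static long
online_cpus(void) {
	return sysconf(_SC_NPROCESSORS_ONLN);
}

/* CPUs configured in the system, which may exceed the online count. */
static long
configured_cpus(void) {
	return sysconf(_SC_NPROCESSORS_CONF);
}
```

Comparing these numbers on the s390x builder, the x86_64 builder that failed, and a machine where the test passes could show whether the failures line up with low CPU counts.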
Edited by Petr Menšík