pthread_mutex_lock() failed during shutdown on FreeBSD 12.3
https://gitlab.isc.org/isc-private/bind9/-/jobs/2752200
Click to expand/collapse backtrace
D:inline:#0 0x000000080151769a in thr_kill () from /lib/libc.so.7
D:inline:#1 0x0000000801515af4 in raise () from /lib/libc.so.7
D:inline:#2 0x000000080148b719 in abort () from /lib/libc.so.7
D:inline:#3 0x000000000023f75d in library_fatal_error (file=0x8008b1067 "request.c", line=984, format=0x8008a58ff "%s() failed in %s(): %s", args=0x7fffdfbfbb70) at main.c:277
D:inline:#4 0x000000080031bdd5 in isc_error_fatal (file=0x18809 <error: Cannot access memory at address 0x18809>, line=6, format=0x0) at error.c:68
D:inline:#5 0x00000008009cde97 in req_senddone (eresult=<optimized out>, region=<optimized out>, arg=<optimized out>) at request.c:998
D:inline:#6 0x00000008009134d5 in send_done (handle=0x80533d380, result=ISC_R_SHUTTINGDOWN, cbarg=0x0) at dispatch.c:1862
D:inline:#7 0x00000008003169f8 in isc__nm_udp_send (handle=0x18809, region=0x7fffdfbfbd00, cb=0x8009134a0 <send_done>, cbarg=0x0) at netmgr/udp.c:704
D:inline:#8 0x000000080030919c in isc_nm_send (handle=0x18809, region=0x6, cb=0x0, cbarg=0x8015176ba <thr_self+10>) at netmgr/netmgr.c:1985
D:inline:#9 0x0000000800913470 in dns_dispatch_send (resp=0x8051b6a00, r=0x7fffdfbfbd00, dscp=<optimized out>) at dispatch.c:1913
D:inline:#10 0x00000008009ce20d in req_send (request=0x8051bbc40) at request.c:293
D:inline:#11 0x00000008009cdc52 in req_connected (eresult=ISC_R_SUCCESS, region=<optimized out>, arg=0x8051bbc40) at request.c:962
D:inline:#12 0x000000080091337e in udp_connected (handle=0x80533d380, eresult=ISC_R_SUCCESS, arg=0x8051b6a00) at dispatch.c:1783
D:inline:#13 0x0000000800309545 in isc__nm_async_connectcb (worker=<optimized out>, ev0=<optimized out>) at netmgr/netmgr.c:2172
D:inline:#14 0x0000000800306aab in process_netievent (arg=0x80533d500) at netmgr/netmgr.c:497
D:inline:#15 0x0000000800322d47 in isc__job_cb (idle=0x8051b3048) at job.c:75
D:inline:#16 0x00000008011215cd in ?? () from /usr/local/lib/libuv.so.1
D:inline:#17 0x000000080111b026 in uv_run () from /usr/local/lib/libuv.so.1
D:inline:#18 0x0000000800329d7e in loop_run (loop=0x801952f90) at loop.c:266
D:inline:#19 0x0000000800328f6c in loop_thread (arg=0x801952f90) at loop.c:293
D:inline:#20 0x000000080033f8d6 in isc__trampoline_run (arg=0x8019ef360) at trampoline.c:189
D:inline:#21 0x000000080134108c in ?? () from /lib/libthr.so.3
D:inline:#22 0x0000000000000000 in ?? ()
bin/tests/system/inline/ns2/named.run
:
08-Sep-2022 12:18:02.696 request.c:984: fatal error:
08-Sep-2022 12:18:02.696 pthread_mutex_lock() failed in req_senddone(): No error: 0
08-Sep-2022 12:18:02.696 exiting (due to fatal error in library)
Relevant C code:
973 static void
974 req_senddone(isc_result_t eresult, isc_region_t *region, void *arg) {
975 dns_request_t *request = (dns_request_t *)arg;
976
977 REQUIRE(VALID_REQUEST(request));
978 REQUIRE(DNS_REQUEST_SENDING(request));
979
980 UNUSED(region);
981
982 req_log(ISC_LOG_DEBUG(3), "req_senddone: request %p", request);
983
984 >>> LOCK(&request->requestmgr->locks[request->hash]);
985 request->flags &= ~DNS_REQUEST_F_SENDING;
986
There are two issues to address here:
-
Commit 3608abc8 (part of !6570 (merged)) inadvertently copied a mistake in logging
pthread_cond_init()
errors toERRNO_CHECK()
: instead of passing the value returned by a givenpthread_*()
function tostrerror_r()
,ERRNO_CHECK()
passes theerrno
variable tostrerror_r()
. This causes bogus error reports like the one above (X failed in Y: No error: 0
). -
The
pthread_mutex_lock()
call above really must have failed, otherwise the process would not have crashed. Unfortunately, issue number 1 described above preventednamed
from logging the actual error value returned by libthr, so further investigation is severely hampered.
I will fix issue number 1 first as we need to merge it and then wait for another instance of this issue to manifest itself in order to make any progress with investigating issue number 2.
I checked for obvious issues in the core dump, but nothing drew my attention - everything looks "normal" for FreeBSD:
(gdb) print request->references
$1 = 3
(gdb) print request->hash
$2 = 3
(gdb) print request->requestmgr->references
$3 = 7
(gdb) print request->requestmgr->exiting
$4 = false
(gdb) print request->requestmgr->locks
$5 = {0x802fbf380, 0x802fbf400, 0x802fbf480, 0x802fbf500, 0x802fbf580, 0x802fbf600, 0x802fbf680}
(gdb) print request->requestmgr->locks[request->hash]
$6 = (isc_mutex_t) 0x802fbf500
(gdb) x/1xg request->requestmgr->locks[request->hash]
0x802fbf500: 0x0000000000018809
Unfortunately libthr debug symbols are not available out-of-the-box on FreeBSD and there is no easy way to install them without rebuilding the whole system, so I do not have any further introspection into the actual mutex structure.
For completeness, the DNS message that named
tried to send when the
crash happened was:
0x8052a1680: 0x24 0x22 0x24 0x00 0x00 0x01 0x00 0x01
0x8052a1688: 0x00 0x00 0x00 0x00 0x06 0x74 0x65 0x73
0x8052a1690: 0x74 0x2d 0x68 0x00 0x00 0x06 0x00 0x01
0x8052a1698: 0xc0 0x0c 0x00 0x06 0x00 0x01 0x00 0x00
0x8052a16a0: 0x00 0x00 0x00 0x1b 0x03 0x6e 0x73 0x32
0x8052a16a8: 0xc0 0x0c 0x00 0x77 0x36 0x39 0xa7 0x00
0x8052a16b0: 0x00 0x00 0x14 0x00 0x00 0x00 0x14 0x00
0x8052a16b8: 0x1b 0xaf 0x80 0x00 0x00 0x0e 0x10
which translates to:
id 9250
opcode NOTIFY
rcode NOERROR
flags AA
;QUESTION
test-h. IN SOA
;ANSWER
test-h. 0 IN SOA ns2.test-h. . 2000042407 20 20 1814400 3600
;AUTHORITY
;ADDITIONAL