Skip to content
GitLab
Projects Groups Snippets
  • /
  • Help
    • Help
    • Support
    • Community forum
    • Submit feedback
    • Contribute to GitLab
  • Sign in / Register
  • BIND BIND
  • Project information
    • Project information
    • Activity
    • Labels
    • Members
  • Repository
    • Repository
    • Files
    • Commits
    • Branches
    • Tags
    • Contributors
    • Graph
    • Compare
  • Issues 589
    • Issues 589
    • List
    • Boards
    • Service Desk
    • Milestones
  • Merge requests 95
    • Merge requests 95
  • CI/CD
    • CI/CD
    • Pipelines
    • Jobs
    • Schedules
  • Deployments
    • Deployments
    • Environments
    • Releases
  • Packages and registries
    • Packages and registries
    • Container Registry
  • Monitor
    • Monitor
    • Incidents
  • Analytics
    • Analytics
    • Value stream
    • CI/CD
    • Repository
  • Wiki
    • Wiki
  • Snippets
    • Snippets
  • Activity
  • Graph
  • Create a new issue
  • Jobs
  • Commits
  • Issue Boards
Collapse sidebar
  • ISC Open Source ProjectsISC Open Source Projects
  • BINDBIND
  • Issues
  • #3534
Closed
Open
Issue created Sep 09, 2022 by Michał Kępień@michalOwner

pthread_mutex_lock() failed during shutdown on FreeBSD 12.3

https://gitlab.isc.org/isc-private/bind9/-/jobs/2752200

Click to expand/collapse backtrace
D:inline:#0  0x000000080151769a in thr_kill () from /lib/libc.so.7
D:inline:#1  0x0000000801515af4 in raise () from /lib/libc.so.7
D:inline:#2  0x000000080148b719 in abort () from /lib/libc.so.7
D:inline:#3  0x000000000023f75d in library_fatal_error (file=0x8008b1067 "request.c", line=984, format=0x8008a58ff "%s() failed in %s(): %s", args=0x7fffdfbfbb70) at main.c:277
D:inline:#4  0x000000080031bdd5 in isc_error_fatal (file=0x18809 <error: Cannot access memory at address 0x18809>, line=6, format=0x0) at error.c:68
D:inline:#5  0x00000008009cde97 in req_senddone (eresult=<optimized out>, region=<optimized out>, arg=<optimized out>) at request.c:998
D:inline:#6  0x00000008009134d5 in send_done (handle=0x80533d380, result=ISC_R_SHUTTINGDOWN, cbarg=0x0) at dispatch.c:1862
D:inline:#7  0x00000008003169f8 in isc__nm_udp_send (handle=0x18809, region=0x7fffdfbfbd00, cb=0x8009134a0 <send_done>, cbarg=0x0) at netmgr/udp.c:704
D:inline:#8  0x000000080030919c in isc_nm_send (handle=0x18809, region=0x6, cb=0x0, cbarg=0x8015176ba <thr_self+10>) at netmgr/netmgr.c:1985
D:inline:#9  0x0000000800913470 in dns_dispatch_send (resp=0x8051b6a00, r=0x7fffdfbfbd00, dscp=<optimized out>) at dispatch.c:1913
D:inline:#10 0x00000008009ce20d in req_send (request=0x8051bbc40) at request.c:293
D:inline:#11 0x00000008009cdc52 in req_connected (eresult=ISC_R_SUCCESS, region=<optimized out>, arg=0x8051bbc40) at request.c:962
D:inline:#12 0x000000080091337e in udp_connected (handle=0x80533d380, eresult=ISC_R_SUCCESS, arg=0x8051b6a00) at dispatch.c:1783
D:inline:#13 0x0000000800309545 in isc__nm_async_connectcb (worker=<optimized out>, ev0=<optimized out>) at netmgr/netmgr.c:2172
D:inline:#14 0x0000000800306aab in process_netievent (arg=0x80533d500) at netmgr/netmgr.c:497
D:inline:#15 0x0000000800322d47 in isc__job_cb (idle=0x8051b3048) at job.c:75
D:inline:#16 0x00000008011215cd in ?? () from /usr/local/lib/libuv.so.1
D:inline:#17 0x000000080111b026 in uv_run () from /usr/local/lib/libuv.so.1
D:inline:#18 0x0000000800329d7e in loop_run (loop=0x801952f90) at loop.c:266
D:inline:#19 0x0000000800328f6c in loop_thread (arg=0x801952f90) at loop.c:293
D:inline:#20 0x000000080033f8d6 in isc__trampoline_run (arg=0x8019ef360) at trampoline.c:189
D:inline:#21 0x000000080134108c in ?? () from /lib/libthr.so.3
D:inline:#22 0x0000000000000000 in ?? ()

bin/tests/system/inline/ns2/named.run:

08-Sep-2022 12:18:02.696 request.c:984: fatal error:
08-Sep-2022 12:18:02.696 pthread_mutex_lock() failed in req_senddone(): No error: 0
08-Sep-2022 12:18:02.696 exiting (due to fatal error in library)

Relevant C code:

 973 static void                                                                     
 974 req_senddone(isc_result_t eresult, isc_region_t *region, void *arg) {           
 975         dns_request_t *request = (dns_request_t *)arg;                          
 976                                                                                 
 977         REQUIRE(VALID_REQUEST(request));                                        
 978         REQUIRE(DNS_REQUEST_SENDING(request));                                  
 979                                                                                 
 980         UNUSED(region);                                                         
 981                                                                                 
 982         req_log(ISC_LOG_DEBUG(3), "req_senddone: request %p", request);         
 983                                                                                 
 984   >>>   LOCK(&request->requestmgr->locks[request->hash]);                       
 985         request->flags &= ~DNS_REQUEST_F_SENDING;                               
 986                                                                                 

There are two issues to address here:

  1. Commit 3608abc8 (part of !6570 (merged)) inadvertently copied a mistake in logging pthread_cond_init() errors to ERRNO_CHECK(): instead of passing the value returned by a given pthread_*() function to strerror_r(), ERRNO_CHECK() passes the errno variable to strerror_r(). This causes bogus error reports like the one above (X failed in Y: No error: 0).

  2. The pthread_mutex_lock() call above really must have failed, otherwise the process would not have crashed. Unfortunately, issue number 1 described above prevented named from logging the actual error value returned by libthr, so further investigation is severely hampered.

I will fix issue number 1 first as we need to merge it and then wait for another instance of this issue to manifest itself in order to make any progress with investigating issue number 2.

I checked for obvious issues in the core dump, but nothing drew my attention - everything looks "normal" for FreeBSD:

(gdb) print request->references 
$1 = 3
(gdb) print request->hash
$2 = 3
(gdb) print request->requestmgr->references 
$3 = 7
(gdb) print request->requestmgr->exiting
$4 = false
(gdb) print request->requestmgr->locks
$5 = {0x802fbf380, 0x802fbf400, 0x802fbf480, 0x802fbf500, 0x802fbf580, 0x802fbf600, 0x802fbf680}
(gdb) print request->requestmgr->locks[request->hash]
$6 = (isc_mutex_t) 0x802fbf500
(gdb) x/1xg request->requestmgr->locks[request->hash]
0x802fbf500:	0x0000000000018809

Unfortunately libthr debug symbols are not available out-of-the-box on FreeBSD and there is no easy way to install them without rebuilding the whole system, so I do not have any further introspection into the actual mutex structure.

For completeness, the DNS message that named tried to send when the crash happened was:

0x8052a1680:	0x24	0x22	0x24	0x00	0x00	0x01	0x00	0x01
0x8052a1688:	0x00	0x00	0x00	0x00	0x06	0x74	0x65	0x73
0x8052a1690:	0x74	0x2d	0x68	0x00	0x00	0x06	0x00	0x01
0x8052a1698:	0xc0	0x0c	0x00	0x06	0x00	0x01	0x00	0x00
0x8052a16a0:	0x00	0x00	0x00	0x1b	0x03	0x6e	0x73	0x32
0x8052a16a8:	0xc0	0x0c	0x00	0x77	0x36	0x39	0xa7	0x00
0x8052a16b0:	0x00	0x00	0x14	0x00	0x00	0x00	0x14	0x00
0x8052a16b8:	0x1b	0xaf	0x80	0x00	0x00	0x0e	0x10

which translates to:

id 9250
opcode NOTIFY
rcode NOERROR
flags AA
;QUESTION
test-h. IN SOA
;ANSWER
test-h. 0 IN SOA ns2.test-h. . 2000042407 20 20 1814400 3600
;AUTHORITY
;ADDITIONAL
Assignee
Assign to
Time tracking