Crashes during recursive-mode "stress" test
All three recursive-mode "stress" tests failed during last night's
scheduled pipeline for main:
- https://gitlab.isc.org/isc-projects/bind9/-/jobs/3144301
- https://gitlab.isc.org/isc-projects/bind9/-/jobs/3144304
- https://gitlab.isc.org/isc-projects/bind9/-/jobs/3144307
The crashes apparently happen due to an assertion failure in the
release_fctx() function in lib/dns/resolver.c:
7202 }
7203
7204 LOCK(&res->fctxs_lock);
7205 result = isc_hashmap_delete(res->fctxs, &hashval, fctx->key.key,
7206 fctx->key.size);
7207 >>> INSIST(result == ISC_R_SUCCESS);
7208 fctx->hashed = false;
7209 UNLOCK(&res->fctxs_lock);
7210 }
7211
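For readers unfamiliar with the pattern, the INSIST above requires that the delete succeeds, i.e. that the entry is still present in the table under that key when release_fctx() runs. The following is a minimal, generic sketch of one way such a check can fail (the same key being deleted twice). It uses a hypothetical toy table and made-up names (toy_table_t, toy_add, toy_delete, TOY_*), not isc_hashmap or any resolver code, and it is not a claim about the actual cause of these crashes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes standing in for ISC_R_SUCCESS and friends. */
typedef enum { TOY_SUCCESS, TOY_NOTFOUND, TOY_NOSPACE } toy_result_t;

#define TOY_SLOTS 8

/* Toy table: a fixed array of string keys, not the real isc_hashmap. */
typedef struct {
	const char *keys[TOY_SLOTS];
} toy_table_t;

/* Store the key in the first free slot. */
static toy_result_t
toy_add(toy_table_t *t, const char *key) {
	for (size_t i = 0; i < TOY_SLOTS; i++) {
		if (t->keys[i] == NULL) {
			t->keys[i] = key;
			return TOY_SUCCESS;
		}
	}
	return TOY_NOSPACE;
}

/* Remove the key if present; report TOY_NOTFOUND otherwise. */
static toy_result_t
toy_delete(toy_table_t *t, const char *key) {
	for (size_t i = 0; i < TOY_SLOTS; i++) {
		if (t->keys[i] != NULL && strcmp(t->keys[i], key) == 0) {
			t->keys[i] = NULL;
			return TOY_SUCCESS;
		}
	}
	return TOY_NOTFOUND;
}

/* Delete the same key twice and return the result of the second delete. */
static toy_result_t
toy_second_delete(void) {
	toy_table_t table = { { NULL } };
	toy_result_t result;

	result = toy_add(&table, "example.com/A");
	assert(result == TOY_SUCCESS);

	result = toy_delete(&table, "example.com/A");
	assert(result == TOY_SUCCESS);

	/* The entry is already gone: an INSIST-style check here would abort. */
	return toy_delete(&table, "example.com/A");
}
```

In this sketch, toy_second_delete() returns TOY_NOTFOUND. A delete performed with a key or hash value different from the one used on insertion would fail the same way.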
Backtrace:
(gdb) bt
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1 0x00007f3949880ec3 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
#2 0x00007f3949830a76 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3 0x00007f394981a7fc in __GI_abort () at abort.c:79
#4 0x0000000000417bca in assertion_failed (file=<optimized out>, line=<optimized out>, type=isc_assertiontype_insist, cond=0x7f394a41c7b7 "result == ISC_R_SUCCESS") at main.c:236
#5 0x00007f394a4b65ba in isc_assertion_failed (file=file@entry=0x7f394a433196 "resolver.c", line=line@entry=7207, type=type@entry=isc_assertiontype_insist, cond=cond@entry=0x7f394a41c7b7 "result == ISC_R_SUCCESS") at assertions.c:49
#6 0x00007f394a37cebd in release_fctx (fctx=fctx@entry=0x7f3916f8f800) at resolver.c:7207
#7 0x00007f394a38885a in fctx__done (fctx=0x7f3916f8f800, result=ISC_R_SUCCESS, line=5825, func=0x7f394a4368f0 <__func__.19> "validated", file=0x7f394a433196 "resolver.c") at resolver.c:1860
#8 0x00007f394a38cc8a in validated (task=<optimized out>, event=<optimized out>) at resolver.c:5825
#9 0x00007f394a4d820b in task_run (task=<optimized out>, task@entry=0x7f39486fdc00) at task.c:469
#10 0x00007f394a4d8549 in task__run (arg=0x7f39486fdc00) at task.c:287
#11 0x00007f394a4c14d6 in isc__job_cb (idle=0x7f391d00e448) at job.c:75
#12 0x00007f3949c9d5bf in uv__run_idle (loop=0x7f3948cca940) at /usr/src/libuv-v1.44.1/src/unix/loop-watcher.c:68
#13 0x00007f3949c93be8 in uv_run (loop=0x7f3948cca940, mode=UV_RUN_DEFAULT) at /usr/src/libuv-v1.44.1/src/unix/core.c:384
#14 0x00007f394a4c7608 in loop_run (loop=loop@entry=0x7f3948cca920) at loop.c:270
#15 0x00007f394a4c7707 in loop_thread (arg=0x7f3948cca920) at loop.c:297
#16 0x00007f394a4ded3a in isc__trampoline_run (arg=0xb1d8d0) at trampoline.c:202
#17 0x00007f394987f12d in start_thread (arg=<optimized out>) at pthread_create.c:442
#18 0x00007f39498ffd74 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100
These jobs passed fine in the previous scheduled pipeline:
https://gitlab.isc.org/isc-projects/bind9/-/pipelines/128138
Here is a list of branches merged into main between the two relevant
code revisions:
$ git log --merges --oneline --no-decorate d838b9f5..origin/main
daf78318eda Merge branch 'each-remove-bind9-refvar' into 'main'
1db5dc456a4 Merge branch '3840-avoid-libuv-with-broken-recvmmsg' into 'main'
e239e97a0d9 Merge branch 'fanf-another-bitstring-remnant' into 'main'
d39f666c7e7 Merge branch 'fanf-smaller-rdatasetheader' into 'main'
ab4f4b4df0a Merge branch '3857-notify-source-port-test-is-not-reliable' into 'main'
I am not making this issue confidential because it does not appear to affect stable branches: