Underflow of the RecursClients counter crashes named during respdiff tests
On the first run after !2453 (merged) got merged, the respdiff tests crashed due to a statistics counter underflow:
- https://gitlab.isc.org/isc-projects/bind9/-/jobs/2286800
- https://gitlab.isc.org/isc-projects/bind9/-/jobs/2286801
Click to expand/collapse log excerpt
11-Feb-2022 00:28:28.644 SERVFAIL unexpected RCODE resolving '_.233.124.88.in-addr.arpa/A/IN': 193.0.9.6#53
11-Feb-2022 00:28:29.772 timed out resolving 'ars-tv.ru/MX/IN': 83.242.151.118#53
11-Feb-2022 00:28:29.904 REFUSED unexpected RCODE resolving 'ars-tv.ru/MX/IN': 176.192.105.132#53
11-Feb-2022 00:28:29.904 query client=0x7f13d4405170 thread=0x7f13d8bf4700(ars-tv.ru/MX): query_gotanswer: unexpected error: failure
11-Feb-2022 00:28:30.084 stats.c:118: REQUIRE(__atomic_fetch_sub (((&stats->counters[counter])), ((1)), (memory_order_release)) > 0) failed, back trace
11-Feb-2022 00:28:30.084 /builds/isc-projects/bind9/bin/named/.libs/named(+0x1fc3f) [0x556bfdedbc3f]
11-Feb-2022 00:28:30.084 /builds/isc-projects/bind9/lib/isc/.libs/libisc-9.17.22.so(isc_assertion_failed+0xa) [0x7f13e2ea5362]
11-Feb-2022 00:28:30.084 /builds/isc-projects/bind9/lib/isc/.libs/libisc-9.17.22.so(isc_stats_dump+0) [0x7f13e2ec2f0c]
11-Feb-2022 00:28:30.084 /builds/isc-projects/bind9/lib/ns/.libs/libns-9.17.22.so(ns_stats_decrement+0x1a) [0x7f13e2c2fcb0]
11-Feb-2022 00:28:30.084 /builds/isc-projects/bind9/lib/ns/.libs/libns-9.17.22.so(+0x34ace) [0x7f13e2c2dace]
11-Feb-2022 00:28:30.084 /builds/isc-projects/bind9/lib/isc/.libs/libisc-9.17.22.so(+0x557f3) [0x7f13e2ec67f3]
11-Feb-2022 00:28:30.084 /builds/isc-projects/bind9/lib/isc/.libs/libisc-9.17.22.so(isc_task_run+0x9) [0x7f13e2ec690e]
11-Feb-2022 00:28:30.084 /builds/isc-projects/bind9/lib/isc/.libs/libisc-9.17.22.so(+0x1d88c) [0x7f13e2e8e88c]
11-Feb-2022 00:28:30.084 /builds/isc-projects/bind9/lib/isc/.libs/libisc-9.17.22.so(+0x23b5c) [0x7f13e2e94b5c]
11-Feb-2022 00:28:30.084 /builds/isc-projects/bind9/lib/isc/.libs/libisc-9.17.22.so(+0x24340) [0x7f13e2e95340]
11-Feb-2022 00:28:30.084 /builds/isc-projects/bind9/lib/isc/.libs/libisc-9.17.22.so(+0x248ff) [0x7f13e2e958ff]
11-Feb-2022 00:28:30.084 /builds/isc-projects/bind9/lib/isc/.libs/libisc-9.17.22.so(+0x24956) [0x7f13e2e95956]
11-Feb-2022 00:28:30.084 /usr/lib/x86_64-linux-gnu/libuv.so.1(+0x10e83) [0x7f13e282fe83]
11-Feb-2022 00:28:30.084 /usr/lib/x86_64-linux-gnu/libuv.so.1(+0x2cb87) [0x7f13e284bb87]
11-Feb-2022 00:28:30.084 /usr/lib/x86_64-linux-gnu/libuv.so.1(uv_run+0xb1) [0x7f13e283086c]
11-Feb-2022 00:28:30.084 /builds/isc-projects/bind9/lib/isc/.libs/libisc-9.17.22.so(+0x24412) [0x7f13e2e95412]
11-Feb-2022 00:28:30.084 /builds/isc-projects/bind9/lib/isc/.libs/libisc-9.17.22.so(isc__trampoline_run+0x16) [0x7f13e2ecd5ba]
11-Feb-2022 00:28:30.084 /lib/x86_64-linux-gnu/libpthread.so.0(+0x8ea7) [0x7f13e25f8ea7]
11-Feb-2022 00:28:30.084 /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f13e2528def]
11-Feb-2022 00:28:30.084 exiting (due to assertion failure)
Click to expand/collapse GDB backtrace
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007f13e2450537 in __GI_abort () at abort.c:79
#2 0x0000556bfdedbd96 in assertion_failed (file=<optimized out>, line=<optimized out>, type=isc_assertiontype_require,
cond=0x7f13e2ee7110 "__atomic_fetch_sub (((&stats->counters[counter])), ((1)), (memory_order_release)) > 0") at main.c:238
#3 0x00007f13e2ea5362 in isc_assertion_failed (file=file@entry=0x7f13e2ee7215 "stats.c", line=line@entry=118, type=type@entry=isc_assertiontype_require,
cond=cond@entry=0x7f13e2ee7110 "__atomic_fetch_sub (((&stats->counters[counter])), ((1)), (memory_order_release)) > 0") at assertions.c:49
#4 0x00007f13e2ec2f0c in isc_stats_decrement (stats=<optimized out>, counter=counter@entry=36) at stats.c:118
#5 0x00007f13e2c2fcb0 in ns_stats_decrement (stats=<optimized out>, counter=counter@entry=36) at stats.c:105
#6 0x00007f13e2c2dace in fetch_callback (task=<optimized out>, event=<optimized out>) at query.c:6186
#7 0x00007f13e2ec67f3 in task_run (task=0x7f13d76c56c0) at task.c:821
#8 0x00007f13e2ec690e in isc_task_run (task=<optimized out>) at task.c:901
#9 0x00007f13e2e8e88c in isc__nm_async_task (worker=worker@entry=0x7f13dfa36240, ev0=ev0@entry=0x7f1395dc6200) at netmgr/netmgr.c:837
#10 0x00007f13e2e94b5c in process_netievent (worker=worker@entry=0x7f13dfa36240, ievent=0x7f1395dc6200) at netmgr/netmgr.c:916
#11 0x00007f13e2e95340 in process_queue (worker=worker@entry=0x7f13dfa36240, type=type@entry=NETIEVENT_TASK) at netmgr/netmgr.c:1010
#12 0x00007f13e2e958ff in process_all_queues (worker=0x7f13dfa36240) at netmgr/netmgr.c:756
#13 0x00007f13e2e95956 in async_cb (handle=0x7f13dfa365a0) at netmgr/netmgr.c:785
#14 0x00007f13e282fe83 in uv__async_io (loop=0x7f13dfa36250, w=0x7f13dfa36418, events=1) at /usr/src/libuv-v1.43.0/src/unix/async.c:163
#15 0x00007f13e284bb87 in uv__io_poll (loop=0x7f13dfa36250, timeout=-1) at /usr/src/libuv-v1.43.0/src/unix/epoll.c:374
#16 0x00007f13e283086c in uv_run (loop=0x7f13dfa36250, mode=UV_RUN_DEFAULT) at /usr/src/libuv-v1.43.0/src/unix/core.c:389
#17 0x00007f13e2e95412 in nm_thread (worker0=0x7f13dfa36240) at netmgr/netmgr.c:691
#18 0x00007f13e2ecd5ba in isc__trampoline_run (arg=0x556bfe629b70) at trampoline.c:187
#19 0x00007f13e25f8ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#20 0x00007f13e2528def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
The backtrace shows that the crashing call is in fetch_callback()
:
6180 /*
6181 * We're done recursing, detach from quota and unlink from
6182 * the manager's recursing-clients list.
6183 */
6184
6185 if (client->recursionquota != NULL) {
6186 isc_quota_detach(&client->recursionquota);
6187 >>> ns_stats_decrement(client->sctx->nsstats,
6188 >>> ns_statscounter_recursclients);
(The backtrace is identical for both jobs linked to above.)