New stale-answer-client-timeout crashes BIND 9.16 and 9.17
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007f3c7bda4859 in __GI_abort () at abort.c:79
#2 0x000055cc6a31dd0a in assertion_failed (file=<optimized out>, line=<optimized out>, type=<optimized out>, cond=<optimized out>) at ./main.c:261
#3 0x000055cc6a52ed20 in isc_assertion_failed (file=file@entry=0x55cc6a5a3863 "query.c", line=line@entry=6282, type=type@entry=isc_assertiontype_require,
cond=cond@entry=0x55cc6a5a42c0 "client->query.fetch == ((void *)0)") at assertions.c:46
#4 0x000055cc6a37ba6a in ns_query_recurse (client=0x7f3c540104c8, qtype=<optimized out>, qname=qname@entry=0x7f3c45331380, qdomain=0x7f3c453312e0, nameservers=0x7f3c45335bc8,
resuming=<optimized out>) at query.c:6258
#5 0x000055cc6a385e71 in query_delegation_recurse (qctx=0x7f3c721fa920) at query.c:8513
#6 query_delegation (qctx=qctx@entry=0x7f3c721fa920) at query.c:8459
#7 0x000055cc6a381f0e in query_gotanswer (qctx=qctx@entry=0x7f3c721fa920, res=res@entry=65565) at query.c:7220
#8 0x000055cc6a383e14 in query_lookup (qctx=qctx@entry=0x7f3c721fa920) at query.c:5925
#9 0x000055cc6a387e90 in query_lookup_staleonly (client=0x7f3c540104c8) at query.c:5956
#10 fetch_callback (task=<optimized out>, event=<optimized out>) at query.c:5995
#11 0x000055cc6a564131 in dispatch (threadid=<optimized out>, manager=<optimized out>) at task.c:1152
#12 run (queuep=<optimized out>) at task.c:1344
#13 0x00007f3c7bf8b609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#14 0x00007f3c7bea1293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
The problem case is as follows: while we are handling a client request and stale-answer-client-timeout
is triggered, a separate stale-only lookup is made, while the already started resolver fetch continues to wait for a response.
This is fine if there is a stale answer in cache, but if nothing is found, the stale-only lookup will also start recursion.
Because the original fetch is still in progress, client->query.fetch is still set, and we hit the
client->query.fetch == ((void *)0) assertion in ns_query_recurse().
Fix by disabling recursion on staleonly lookups.
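
The sequence above can be illustrated with a small toy model. This is only a sketch of the control flow, not BIND's code: every type and function name here (toy_client_t, toy_recurse, toy_stale_lookup, and so on) is hypothetical, and the precondition that BIND enforces with an assertion is modelled as a return value so that the example does not abort.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: all names below are hypothetical, not BIND's API. */
typedef struct toy_fetch {
    int id;
} toy_fetch_t;

typedef struct toy_client {
    toy_fetch_t *fetch;  /* stands in for client->query.fetch */
    bool stale_in_cache; /* whether a stale answer is available */
} toy_client_t;

typedef enum {
    TOY_ANSWERED_STALE, /* stale answer served from cache */
    TOY_NO_ANSWER,      /* nothing in cache, recursion not attempted */
    TOY_RECURSING,      /* a new fetch was started */
    TOY_WOULD_ASSERT    /* a fetch was already running */
} toy_result_t;

/* Stands in for the recursion entry point, which requires that no
 * fetch is already attached to the client. */
static toy_result_t
toy_recurse(toy_client_t *c, toy_fetch_t *f) {
    assert(c != NULL && f != NULL);
    if (c->fetch != NULL) {
        return TOY_WOULD_ASSERT; /* the real server aborts at this point */
    }
    c->fetch = f;
    return TOY_RECURSING;
}

/* Stands in for the stale-only lookup made when the client timeout
 * fires. With allow_recursion set it falls through to recursion on a
 * cache miss; with it cleared it stops after the cache lookup. */
static toy_result_t
toy_stale_lookup(toy_client_t *c, toy_fetch_t *f, bool allow_recursion) {
    assert(c != NULL);
    if (c->stale_in_cache) {
        return TOY_ANSWERED_STALE;
    }
    if (!allow_recursion) {
        return TOY_NO_ANSWER;
    }
    return toy_recurse(c, f);
}
```

In this model, starting a fetch with toy_recurse() and then calling toy_stale_lookup() with allow_recursion set and an empty cache yields TOY_WOULD_ASSERT, which corresponds to the crash in the backtrace. With allow_recursion cleared, the same call returns TOY_NO_ANSWER and leaves the original fetch untouched.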