TCP dispatch can still hang with non-matching response
While testing a fix for #3040 (closed), @michal observed another shutdown hang, which turned out to be a case missed in #3026 (closed). A TCP dispatch resumes listening after receiving a response by calling dispatch_getnext(). The response callback then sees that the response didn't match the question and calls dns_dispatch_getnext(), which needlessly calls dispatch_getnext() again, bumping the dispatch reference count a second time. That extra reference is never released, so shutdown hangs.
This can currently be observed with the query:
dig @localhost -x 74.213.100.99
The delegation to 74.in-addr.arpa is lame: all servers respond with REFUSED. This leads to the following being logged:
02-Dec-2021 10:45:37.815 received packet from 24.138.252.19#53
;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 19192
;; flags: qr; QUESTION: 0, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
02-Dec-2021 10:45:37.815 DNS format error from 24.138.252.19#53 resolving 99.100.213.74.in-addr.arpa/PTR for 127.0.0.1#41988: empty question section
02-Dec-2021 10:45:37.815 fctx 0x7f30ed826000(99.100.213.74.in-addr.arpa/PTR): [result: FORMERR] response did not match question
02-Dec-2021 10:45:37.815 fctx 0x7f30ed826000(99.100.213.74.in-addr.arpa/PTR): [result: FORMERR] query canceled in rctx_done(); responding
It looks to me like the best fix is to make dispatch_getnext() idempotent.
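To illustrate what idempotent would mean here, below is a toy model of the sequence described above. It is not the BIND code: toy_dispatch_t, the reading flag, and all the toy_* functions are hypothetical names, and the assumption that each outstanding read holds exactly one reference is a simplification made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a TCP dispatch; not BIND's dns_dispatch_t. */
typedef struct {
	int references; /* reference count held on the dispatch */
	bool reading;   /* true while a read is outstanding (assumed flag) */
} toy_dispatch_t;

/* Naive version: every call takes a new reference for a "new" read. */
static void
toy_getnext_naive(toy_dispatch_t *disp) {
	disp->references++;
	disp->reading = true;
}

/* Idempotent version: a no-op if a read is already outstanding. */
static void
toy_getnext_idempotent(toy_dispatch_t *disp) {
	if (disp->reading) {
		return;
	}
	disp->references++;
	disp->reading = true;
}

/* Read completion: drops the one reference the outstanding read holds. */
static void
toy_read_done(toy_dispatch_t *disp) {
	assert(disp->reading);
	disp->reading = false;
	disp->references--;
}

/*
 * Replay the sequence from this report with the given getnext
 * implementation and return how many references are left over
 * beyond the owner's own reference.
 */
static int
toy_leaked_references(void (*getnext)(toy_dispatch_t *)) {
	toy_dispatch_t disp = { .references = 1, .reading = false };

	getnext(&disp);       /* initial read is started */
	toy_read_done(&disp); /* response arrives */
	getnext(&disp);       /* dispatch resumes listening */
	getnext(&disp);       /* callback: response mismatched, asks again */
	toy_read_done(&disp); /* the single outstanding read completes */

	return disp.references - 1;
}
```

In this model, toy_leaked_references(toy_getnext_naive) returns 1 and toy_leaked_references(toy_getnext_idempotent) returns 0: the second call from the response callback only leaks a reference when getnext does not check whether a read is already pending.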