suspicious message IDs and lost answers in a long DoT stream
Summary
dnsperf in DoT mode sometimes reports unexpected message IDs and timeouts when talking to BIND, even in test runs without retransmits, timeouts, and message ID reuse.
Versions used
- Affects v9.18: 5a1d465 (fix for #4242 (closed))
- Not affected/not reproducible on: v9.19
- dnsperf: 2.13.0 - commit 00f5a188aaabab7cc3b2fffa016944c20402b5e6
Steps to reproduce
- Enable DoT
- Run BIND with
SSLKEYLOGFILE=/tmp/tlskeys
env variable to capture TLS keys so we can open PCAPs later - Capture PCAP:
sudo tcpdump -i lo -w /tmp/pcap 'port 853'
- Run
dnsperf
couple times:dnsperf -d /tmp/qlist -l0.1 -q 1000 -m dot -s 127.0.0.1 -p 853 -c1
- Watch for "response with an unexpected (maybe timed out) id" and "[Timeout]" messages
Above are minimal parameters for -q
and -l
which allow me to reproduce the problem on my machine. Bump -q
and -l
as needed.
What is the current bug behavior?
$ dnsperf -d /tmp/qlist -l0.1 -q 1000 -m dot -s 127.0.0.1 -p 853 -c1
DNS Performance Testing Tool
Version 2.13.0
[Status] Command line: dnsperf -d /tmp/qlist -l0.1 -q 1000 -m dot -s 127.0.0.1 -p 853 -c1
[Status] Sending queries (to 127.0.0.1:853)
[Status] Started at: Thu Aug 10 14:59:52 2023
[Status] Stopping after 0.100000 seconds
Warning: received a response with an unexpected (maybe timed out) id: 2014
[Timeout] Query timed out: msg id 2015
[Status] Testing complete (time limit)
Statistics:
Queries sent: 4160
Queries completed: 4159 (99.98%)
Queries lost: 1 (0.02%)
Response codes: NOERROR 4159 (100.00%)
Average packet size: request 35, response 51
Run time (s): 0.106694
Queries per second: 38980.636212
Average Latency (s): 0.016029 (min 0.000095, max 0.027541)
Latency StdDev (s): 0.009930
Connection Statistics:
Reconnections: 0 (0.00% of 1 connections)
Average Latency (s): 0.002322 (min 0.002322, max 0.002322)
What is the expected correct behavior?
No reports of unexpected or lost messages.
Relevant configuration files
Relevant logs and/or screenshots
Look for message IDs 2014 and 2015. They should be there just once in each direction, but 2014 is duplicated on response from BIND and 2015 has missing response.
Helpful wireshark display filters:
dns.id == 2014
dns.id == 2015
dns.retransmission == 1
Edited by Petr Špaček