suspicious message IDs and lost answers in a long DoT stream
dnsperf in DoT mode sometimes reports unexpected message IDs and timeouts when talking to BIND, even in test runs without retransmits, timeouts, and message ID reuse.
- Affects v9.18: 5a1d465 (fix for #4242 (closed))
- Not affected/not reproducible on: v9.19
- dnsperf: 2.13.0 - commit 00f5a188aaabab7cc3b2fffa016944c20402b5e6
Steps to reproduce
- Enable DoT
- Run BIND with
SSLKEYLOGFILE=/tmp/tlskeysenv variable to capture TLS keys so we can open PCAPs later
- Capture PCAP:
sudo tcpdump -i lo -w /tmp/pcap 'port 853'
dnsperf -d /tmp/qlist -l0.1 -q 1000 -m dot -s 127.0.0.1 -p 853 -c1
- Watch for "response with an unexpected (maybe timed out) id" and "[Timeout]" messages
Above are minimal parameters for
-l which allow me to reproduce the problem on my machine. Bump
-l as needed.
What is the current bug behavior?
$ dnsperf -d /tmp/qlist -l0.1 -q 1000 -m dot -s 127.0.0.1 -p 853 -c1 DNS Performance Testing Tool Version 2.13.0 [Status] Command line: dnsperf -d /tmp/qlist -l0.1 -q 1000 -m dot -s 127.0.0.1 -p 853 -c1 [Status] Sending queries (to 127.0.0.1:853) [Status] Started at: Thu Aug 10 14:59:52 2023 [Status] Stopping after 0.100000 seconds
Warning: received a response with an unexpected (maybe timed out) id: 2014 [Timeout] Query timed out: msg id 2015
[Status] Testing complete (time limit) Statistics: Queries sent: 4160 Queries completed: 4159 (99.98%) Queries lost: 1 (0.02%) Response codes: NOERROR 4159 (100.00%) Average packet size: request 35, response 51 Run time (s): 0.106694 Queries per second: 38980.636212 Average Latency (s): 0.016029 (min 0.000095, max 0.027541) Latency StdDev (s): 0.009930 Connection Statistics: Reconnections: 0 (0.00% of 1 connections) Average Latency (s): 0.002322 (min 0.002322, max 0.002322)
What is the expected correct behavior?
No reports of unexpected or lost messages.
Relevant configuration files
Relevant logs and/or screenshots
Look for message IDs 2014 and 2015. They should be there just once in each direction, but 2014 is duplicated on response from BIND and 2015 has missing response.
Helpful wireshark display filters:
dns.id == 2014
dns.id == 2015
dns.retransmission == 1