Flamethrower instance #3 stops querying BIND in RPZ mode
Job #3554059 failed for 2086be9b.
The stress:rpz:fedora:38:amd64 "stress" test often fails on BIND 9.16 (and -S) because Flamethrower no. 3 stops sending queries over TCP. The other TCP Flamethrower (no. 4) and the two UDP Flamethrowers keep working as they should. The issue is limited to BIND 9.16 and the amd64 RPZ job. The Fedora 38 image has Flamethrower 0.11.0 from the Fedora repos.
This issue is similar to #2395 (closed), but not identical. I wonder whether the fix from #2395 (comment 187991) or bumping to Flamethrower master would work.
The first occurrence of this issue seems to be https://gitlab.isc.org/isc-projects/bind9/-/jobs/3435489 from 2 June 2023, between the 9.16.41 and 9.16.42 releases. I could not reproduce the problem when manually triggering the job in the CI.
2023-07-30:13:27:29 INFO: Server 'ns4' (rpz) received 5,386,242 TCP queries
2023-07-30:13:27:29 INFO: About 10,800,000 TCP queries were expected
2023-07-30:13:27:29 INFO: Minimum number of TCP queries required to pass is 9,720,000
2023-07-30:13:27:29 ERROR: BIND did not process enough TCP queries
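For reference, the pass threshold in the log above works out to 90% of the expected query count (9,720,000 of 10,800,000). The following is only a sketch of that arithmetic, not the actual CI check: the function name `tcp_query_verdict` and the `tolerance_pct` parameter are hypothetical, and the 10% tolerance is inferred from the logged figures rather than taken from the test script.

```python
def tcp_query_verdict(received: int, expected: int, tolerance_pct: int = 10) -> tuple[int, bool]:
    """Return (minimum required queries, whether the run passes).

    tolerance_pct is an assumption inferred from the log:
    9,720,000 / 10,800,000 = 90%.
    """
    # Integer arithmetic avoids floating-point rounding in the threshold.
    minimum = expected * (100 - tolerance_pct) // 100
    return minimum, received >= minimum


# Figures taken from the ns4 (rpz) log lines above.
minimum, passed = tcp_query_verdict(received=5_386_242, expected=10_800_000)
print(f"minimum={minimum:,} passed={passed}")
```

With the logged figures, ns4 received roughly half of the expected TCP queries, which matches one of the two TCP generators going silent early in the run.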
IPv4 TCP Flamethrower: generator.log
45.4845s: send: 1760, avg send: 1406, recv: 1772, avg recv: 1386, min/avg/max resp: 120.112/462.365/2886.94ms, in flight: 587, timeouts: 11
46.4852s: send: 100, avg send: 1378, recv: 633, avg recv: 1369, min/avg/max resp: 52.8356/730.897/1950.99ms, in flight: 50, timeouts: 6
47.4848s: send: 0, avg send: 1378, recv: 7, avg recv: 1340, min/avg/max resp: 1175.22/2074.78/2833.18ms, in flight: 33, timeouts: 6
48.4851s: send: 0, avg send: 1378, recv: 2, avg recv: 1312, min/avg/max resp: 2511.72/2554.51/2597.29ms, in flight: 12, timeouts: 10
49.486s: send: 0, avg send: 1378, recv: 0, avg recv: 1312, min/avg/max resp: 0/-nan/0ms, in flight: 12, timeouts: 0
...
3599.61s: send: 0, avg send: 1378, recv: 0, avg recv: 1312, min/avg/max resp: 0/-nan/0ms, in flight: 12, timeouts: 0
3600.61s: send: 0, avg send: 1378, recv: 0, avg recv: 1312, min/avg/max resp: 0/-nan/0ms, in flight: 12, timeouts: 0
3600.87s: send: 0, avg send: 1378, recv: 0, avg recv: 1312, min/avg/max resp: 0/-nan/0ms, in flight: 12, timeouts: 0
...
runtime : 3600.87 s
total sent : 65289
total rcvd : 64877
About 5 million TCP queries should have been sent.
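To spot this pattern in other failed jobs, a small script along these lines could scan generator.log for the point where `send` drops to 0 and never recovers. This is a rough sketch: the regular expression is modelled only on the per-second lines quoted above, `find_stall` and `projected_total` are hypothetical helper names, and treating "avg send × runtime" as the basis of the "about 5 million" estimate is my assumption.

```python
import re

# Matches the start of a per-second Flamethrower status line as quoted above.
LINE_RE = re.compile(
    r"^(?P<t>\d+(?:\.\d+)?)s: send: (?P<send>\d+), avg send: (?P<avg_send>\d+), "
    r"recv: (?P<recv>\d+)"
)


def find_stall(lines):
    """Return the timestamp where send drops to 0 for the rest of the log, or None."""
    stall_start = None
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip "..." and summary lines
        if int(m["send"]) == 0:
            if stall_start is None:
                stall_start = float(m["t"])
        else:
            stall_start = None  # sending resumed, so this was not a permanent stall
    return stall_start


def projected_total(avg_send, runtime_s):
    """Queries that would have been sent had the average rate been sustained."""
    return round(avg_send * runtime_s)


SAMPLE = [
    "46.4852s: send: 100, avg send: 1378, recv: 633, avg recv: 1369, min/avg/max resp: 52.8356/730.897/1950.99ms, in flight: 50, timeouts: 6",
    "47.4848s: send: 0, avg send: 1378, recv: 7, avg recv: 1340, min/avg/max resp: 1175.22/2074.78/2833.18ms, in flight: 33, timeouts: 6",
    "...",
    "3600.87s: send: 0, avg send: 1378, recv: 0, avg recv: 1312, min/avg/max resp: 0/-nan/0ms, in flight: 12, timeouts: 0",
]

print("stalled at:", find_stall(SAMPLE))
print("projected total:", projected_total(1378, 3600.87))
```

On the excerpt above, the stall starts at the 47.4848s line, and the sustained-rate projection lands just under 5 million queries, against the 65,289 actually sent.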