dnstap system test fails on dnstap output file
System test dnstap started to fail very frequently on v9.16 in the system:gcc:oraclelinux7:amd64 job (specifically on oraclelinux7):
I:dnstap:checking unix socket message counts
I:dnstap:dnstap output file smaller than expected
I:dnstap:failed
I:dnstap:checking UDP message counts
I:dnstap:ns4 0 expected 4
I:dnstap:failed
I:dnstap:checking TCP message counts
I:dnstap:checking AUTH_QUERY message counts
I:dnstap:checking AUTH_RESPONSE message counts
I:dnstap:checking CLIENT_QUERY message counts
I:dnstap:ns4 0 expected 1
I:dnstap:failed
I:dnstap:checking CLIENT_RESPONSE message counts
I:dnstap:ns4 0 expected 1
I:dnstap:failed
I:dnstap:checking RESOLVER_QUERY message counts
I:dnstap:checking RESOLVER_RESPONSE message counts
I:dnstap:checking UPDATE_QUERY message counts
I:dnstap:ns4 0 expected 1
I:dnstap:failed
I:dnstap:checking UPDATE_RESPONSE message counts
I:dnstap:ns4 0 expected 1
I:dnstap:failed
I:dnstap:checking reopened unix socket message counts
I:dnstap:failed
I:dnstap:checking UDP message counts
I:dnstap:checking TCP message counts
I:dnstap:checking AUTH_QUERY message counts
I:dnstap:checking AUTH_RESPONSE message counts
I:dnstap:checking CLIENT_QUERY message counts
I:dnstap:checking CLIENT_RESPONSE message counts
I:dnstap:checking RESOLVER_QUERY message counts
I:dnstap:checking RESOLVER_RESPONSE message counts
I:dnstap:checking UPDATE_QUERY message counts
I:dnstap:checking UPDATE_RESPONSE message counts
I:dnstap:checking large packet printing
I:dnstap:checking 'rndc -roll <value>' (no versions)
I:dnstap:checking 'rndc -roll <value>' (versions)
I:dnstap:exit status: 7
The good news is that this isn't caused by any code change: the test used to pass on the very same commit, and has been failing since 2023-07-28.
However, it is quite puzzling why the test started to fail so often: the underlying oraclelinux-7-amd64 image was last built on 2023-07-20, and the failure wasn't occurring at first, so it can't be caused by an image rebuild or changed dependencies either.
I wasn't able to reproduce the issue locally on v9.16 using our oraclelinux-7-amd container image in over a hundred runs, nor in my local environment on v9.19 in over a hundred runs.
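For anyone else trying to reproduce this, repeated runs can be scripted roughly as follows. This is only a sketch: `run_until_failure` is a hypothetical helper, not something from the BIND tree, and the command it wraps is whatever runs the dnstap system test in your checkout.

```shell
# Hypothetical helper (not part of BIND): run a command up to N times
# and stop at the first run that exits with a non-zero status.
run_until_failure() {
    runs=$1
    shift
    i=1
    while [ "$i" -le "$runs" ]; do
        if ! "$@"; then
            echo "failed on run $i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "no failure in $runs runs"
    return 0
}
```

Usage would be something like `run_until_failure 100 <command that runs the dnstap system test>`, keeping the test's output directory from the failing run for inspection.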
Unfortunately, increasing the timeout for the dnstap file to be ready from 5s to 10s doesn't seem to help.
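For reference, the kind of wait being discussed here is a bounded poll on the output file's size. The sketch below is an illustration only and is not copied from the test's `tests.sh`; the function name, its parameters, and the one-second poll interval are all assumptions.

```shell
# Hypothetical helper (not the actual test code): poll once per second,
# for at most TIMEOUT seconds, until FILE exists and is at least
# MIN_BYTES long. Returns 0 on success, 1 if the timeout expires.
wait_for_file_size() {
    file=$1
    min_bytes=$2
    timeout=$3
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        if [ -f "$file" ]; then
            # Arithmetic expansion strips any padding wc may emit.
            size=$(($(wc -c < "$file")))
            [ "$size" -ge "$min_bytes" ] && return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}
```

If the file never grows at all (as the "ns4 0 expected 4" lines suggest), a longer timeout in a loop like this would not change the outcome, which is consistent with the 10s attempt not helping.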