"padding" system test issues
Some of the checks in the "padding" system test appear to be very fragile:
- https://gitlab.isc.org/isc-projects/bind9/-/jobs/183716
- https://gitlab.isc.org/isc-projects/bind9/-/jobs/184468
Both of these job failures were triggered by the following check:
echo_i "checking that a padding server config should enforce TCP ($n)"
ret=0
n=`expr $n + 1`
$RNDCCMD 10.53.0.2 stats
opad=`grep "EDNS padding option received" ns2/named.stats | \
tail -1 | awk '{ print $1}'`
$DIG $DIGOPTS foo.example @10.53.0.4 > dig.out.test$n
$RNDCCMD 10.53.0.2 stats
npad=`grep "EDNS padding option received" ns2/named.stats | \
tail -1 | awk '{ print $1}'`
if [ "$opad" -ne "$npad" ]; then ret=1; fi
if [ $ret != 0 ]; then echo_i "failed"; fi
status=`expr $status + $ret`
What happens in the above two failed jobs is that (I believe due to QNAME minimization on ns3, which is queried shortly before the check quoted above), ns3 may send a TCP query for ns2.example/AAAA to ns2 after the first rndc stats invocation but before ns4 is queried for foo.example. This causes the "EDNS padding option received" counter to be different between the two rndc stats invocations in the check quoted above, triggering a false positive. This seems to be a very fragile method of checking whether ns4 sent a query with an EDNS padding option present.
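As a side note on the counter extraction itself: if the "EDNS padding option received" line were ever absent from ns2/named.stats, both variables would be empty, the `-ne` comparison would error out, and the check would pass silently. The sketch below shows one way to make that step more defensive. The helper name `last_padding_count` is hypothetical and not part of the BIND 9 test suite, and this does nothing about the race described above; it only tightens how the counter is read.

```shell
# Hypothetical helper (an assumption, not existing test-suite code):
# print the value of the most recent "EDNS padding option received"
# counter in a named.stats file, or 0 if the counter never appears.
# "rndc stats" appends to the file, so the last matching line wins.
last_padding_count() {
    awk '/EDNS padding option received/ { count = $1 }
         END { print count + 0 }' "$1"
}

# Possible usage inside the check (sketch only):
#   opad=$(last_padding_count ns2/named.stats)
#   ... send the query ...
#   npad=$(last_padding_count ns2/named.stats)
#   if [ "$opad" -ne "$npad" ]; then ret=1; fi
```

With this helper both variables always hold an integer, so the comparison cannot fail for syntactic reasons.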
The test also fails to preserve diagnostic information for later inspection.
Finally, I find the test descriptions quite confusing: EDNS padding does not enforce TCP; it requires TCP.