serve-stale system test check for recursive-clients quota with stale-answer-client-timeout 1.8 can fail
The serve-stale system test may intermittently fail with a false positive:
I:serve-stale_tmp_60yifh_0:check that named survives reaching recursive-clients quota (stale-answer-client-timeout 1.8) (153)
I:serve-stale_tmp_60yifh_0:failed
In this test, we overwhelm the recursive-clients quota by sending queries for latencyN.data.example TXT and then check that the server didn't crash. The check sends a query for data.example TXT and verifies that we get a NOERROR response. In some slower environments (e.g. job#3440468), the reply may be SERVFAIL instead, because that query hits the SERVFAIL cache:
05-Jun-2023 01:11:02.257 client @0x7fdcefa1a160 10.53.0.1#43053: UDP request
05-Jun-2023 01:11:02.257 client @0x7fdcefa1a160 10.53.0.1#43053: using view '_default'
05-Jun-2023 01:11:02.257 client @0x7fdcefa1a160 10.53.0.1#43053: request is not signed
05-Jun-2023 01:11:02.257 client @0x7fdcefa1a160 10.53.0.1#43053: recursion available
05-Jun-2023 01:11:02.257 query client=0x7fdcefa1a160 thread=0x7fdcf0b7bb38(<unknown-query>): ns_query_start
05-Jun-2023 01:11:02.257 query client=0x7fdcefa1a160 thread=0x7fdcf0b7bb38(data.example/TXT): qctx_init
05-Jun-2023 01:11:02.257 query client=0x7fdcefa1a160 thread=0x7fdcf0b7bb38(data.example/TXT): client attr:0x2302, query attr:0x703, restarts:0, origqname:data.example, timer:0, authdb:0, referral:0
05-Jun-2023 01:11:02.257 client @0x7fdcefa1a160 10.53.0.1#43053 (data.example): servfail cache hit data.example/TXT (CD=0)
05-Jun-2023 01:11:02.257 query client=0x7fdcefa1a160 thread=0x7fdcf0b7bb38(data.example/TXT): ns_query_done
05-Jun-2023 01:11:02.257 client @0x7fdcefa1a160 10.53.0.1#43053 (data.example): query failed (SERVFAIL) for data.example/IN/TXT at query.c:7086
05-Jun-2023 01:11:02.257 client @0x7fdcefa1a160 10.53.0.1#43053 (data.example): reset client
05-Jun-2023 01:11:02.257 query client=0x7fdcefa1a160 thread=0x7fdcf0b7bb38(data.example/TXT): query_reset
Although the check fails, this does not mean the server crashed, so the failure is a false positive.
A possible solution would be to try a different query (e.g. version.bind txt ch) to verify that the server is still alive and responding.
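A minimal sketch of such a liveness check follows. It is written in Python with only the standard library. The names build_query, parse_reply_rcode and server_alive are hypothetical and are not part of the BIND system-test framework.

The check sends a version.bind TXT query in the CHAOS class with recursion not requested. named answers this query from its built-in data rather than by recursing, so it should not be subject to the recursive-clients quota or the SERVFAIL cache. The sketch treats any well-formed reply as proof that the server is alive, whatever its RCODE.

```python
import socket
import struct

QTYPE_TXT = 16  # RFC 1035 TYPE value for TXT
QCLASS_CH = 3   # RFC 1035 CLASS value for CHAOS (CH)


def build_query(qid: int, qname: str = "version.bind",
                qtype: int = QTYPE_TXT, qclass: int = QCLASS_CH) -> bytes:
    """Build a minimal DNS query in wire format with the RD bit clear."""
    # Header: ID, flags=0 (standard query, no recursion desired),
    # QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", qid, 0x0000, 1, 0, 0, 0)
    # QNAME: length-prefixed labels terminated by the zero-length root label.
    labels = qname.rstrip(".").split(".")
    qname_wire = b"".join(bytes([len(label)]) + label.encode("ascii")
                          for label in labels)
    return header + qname_wire + b"\x00" + struct.pack("!HH", qtype, qclass)


def parse_reply_rcode(data: bytes, expected_id: int):
    """Return the RCODE if data is a reply to expected_id, otherwise None."""
    if len(data) < 12:  # shorter than a DNS header
        return None
    qid, flags = struct.unpack("!HH", data[:4])
    if qid != expected_id or not (flags & 0x8000):  # 0x8000 is the QR bit
        return None
    return flags & 0x000F  # 0=NOERROR, 2=SERVFAIL, 5=REFUSED, ...


def server_alive(host: str, port: int, timeout: float = 2.0,
                 qid: int = 0x1234) -> bool:
    """True if the server sends back any well-formed reply to our query.

    The RCODE is deliberately ignored: even a SERVFAIL proves the server
    is running. Only silence (timeout), a socket error or a malformed
    reply counts as "not alive".
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(qid), (host, port))
            data, _ = sock.recvfrom(65535)
        except OSError:  # socket.timeout is a subclass of OSError
            return False
    return parse_reply_rcode(data, qid) is not None
```

After flooding the quota, the test would then assert something like server_alive(ADDR, PORT), where ADDR and PORT are placeholders for the address and port of the server under test. The check ignores the RCODE on purpose. That is what separates "the server answered, even with an error" from "the server is gone", which is the distinction this test is trying to make.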
Edited by Nicki Křížek