"shutdown" system test needs to be tweaked to account for recent netmgr changes
The shutdown system test on main fails intermittently, and I believe that this may be caused by the recent changes in named's behavior during shutdown (caused by various tweaks to the netmgr code).
Here is an example failure:
https://gitlab.isc.org/isc-private/bind9/-/jobs/1299652
What happened here is that a call to resolver.query() raised a NoNameservers exception because ns3 did not respond to one of the random queries that are sent just before ns3 gets killed by a SIGTERM signal. This is certainly possible, as ns3 needs to recurse to resolve those random queries; in fact, it seems that ns3 did even receive the recursive response:
16-Nov-2020 11:51:06.817 received packet from 10.53.0.2#32288
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 38572
;; flags: qr aa ra cd; QUESTION: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1232
; COOKIE: 2fc13823006109bb010000005fb267aaa20d2412c8c3b6d3
;; QUESTION SECTION:
;gyggbf.test. IN A
;; AUTHORITY SECTION:
;test. 60 IN SOA ns1.test. root.test. (
; 2020040101 ; serial
; 14400 ; refresh (4 hours)
; 3600 ; retry (1 hour)
; 604800 ; expire (1 week)
; 60 ; minimum (1 minute)
; )
but the response was never sent, as the server had been SIGTERM'd beforehand:
16-Nov-2020 11:51:06.675 shutting down
which resulted in:
16-Nov-2020 11:51:06.820 client @0x8073e5160 10.53.0.3#37091 (gyggbf.test): send failed: operation canceled
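To make the ordering of these events explicit, here is a small illustrative Python sketch (not part of the test suite) that parses the three ns3 log timestamps quoted above and reports each event's offset from the "shutting down" message. It assumes every log line starts with a timestamp in the `%d-%b-%Y %H:%M:%S.%f` format, as the lines in this excerpt do.

```python
from datetime import datetime

# Log lines copied verbatim from the ns3 excerpts quoted above.
LOG_LINES = [
    "16-Nov-2020 11:51:06.675 shutting down",
    "16-Nov-2020 11:51:06.817 received packet from 10.53.0.2#32288",
    "16-Nov-2020 11:51:06.820 client @0x8073e5160 10.53.0.3#37091 "
    "(gyggbf.test): send failed: operation canceled",
]


def parse_ts(line):
    # The first two whitespace-separated fields hold the date and the time.
    date, time = line.split()[:2]
    return datetime.strptime(f"{date} {time}", "%d-%b-%Y %H:%M:%S.%f")


def offsets_ms(lines):
    """Return (milliseconds since the first line, event text) for each line."""
    start = parse_ts(lines[0])
    result = []
    for line in lines:
        delta = parse_ts(line) - start
        # Everything after the date and time fields is the event text.
        event = line.split(maxsplit=2)[2]
        result.append((round(delta.total_seconds() * 1000), event))
    return result


for ms, event in offsets_ms(LOG_LINES):
    print(f"+{ms:4d} ms  {event}")
```

Relative to the start of shutdown, the recursive response arrives at +142 ms and the send to the client fails at +145 ms, so the answer was ready only after named had already begun tearing down its client connections.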
All in all, it seems to me that nothing extraordinary happened here and this looks like a false positive. It may be enough to catch NoNameservers exceptions, but a closer look at what the shutdown system test does in light of the recent netmgr changes would not hurt.
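If catching NoNameservers turns out to be enough, the change could look roughly like the minimal Python sketch below. It is dependency-free so it can be run standalone: `NoNameservers` here is a hypothetical stand-in for dnspython's `dns.resolver.NoNameservers`, and `query_tolerating_shutdown()` and `random_qname()` are hypothetical helpers, not functions from the actual test. In the real test, `query_fn` would be the resolver's query method and `tolerated` would contain the dnspython exception class.

```python
import random
import string


class NoNameservers(Exception):
    """Hypothetical stand-in for dns.resolver.NoNameservers (dnspython)."""


def random_qname(zone="test.", length=6):
    # Hypothetical helper: a random single label under the zone,
    # similar in shape to the gyggbf.test query seen in the log.
    label = "".join(random.choice(string.ascii_lowercase) for _ in range(length))
    return f"{label}.{zone}"


def query_tolerating_shutdown(query_fn, qname, tolerated=(NoNameservers,)):
    """Run query_fn(qname), returning None instead of raising if one of the
    tolerated exceptions occurs (e.g. the server was SIGTERM'd mid-query).
    Any other exception still propagates and fails the test."""
    try:
        return query_fn(qname)
    except tolerated:
        return None


def query_during_sigterm(qname):
    # Simulates ns3 going away before it could send its answer.
    raise NoNameservers(qname)


print(query_tolerating_shutdown(query_during_sigterm, random_qname()))
```

Keeping the tolerated set narrow (only the exception a shutdown race can legitimately produce) means that genuine failures, such as a crash or an unexpected exception type, would still make the test fail.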