This should solve the intermittent failures in the test_radius_* tests.
What happens is that, for unrelated reasons, the RADIUS tests restart the Kea servers at some point. When Kea is stopped via keactrl stop, forge doesn't wait for Kea to actually stop and just issues another keactrl start. This works most of the time, but when libdhcp_radius.so is loaded, Kea takes around 25 milliseconds to deregister radius as a host backend. Here is a log that shows that:
2022-09-08 15:23:05.279 DEBUG [kea-dhcp4.commands/3081227.140598081251776] COMMAND_DEREGISTERED Command version-get deregistered
2022-09-08 15:23:05.304 DEBUG [kea-dhcp4.hosts/3081227.140598081251776] HOSTS_BACKEND_DEREGISTER deregistered host backend type: radius
25 ms is a lot: when radius is not configured, everything happens in under a millisecond, so you're lucky if you see the timestamp change between keactrl stop and the last log line from Kea. It's enough for keactrl start to not take effect, because it sees that the pidfile is still there since Kea has not stopped yet.
So the suggested fix is to make keactrl stop wait for Kea to stop.
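
To illustrate the idea of waiting for the daemon to stop, here is a minimal Python sketch of the same polling logic. It is not the actual keactrl change (keactrl is a shell script); the function names, the timeout, and the polling interval are all hypothetical, and it assumes a POSIX system where sending signal 0 checks whether a PID exists.

```python
import os
import time


def pid_is_running(pid: int) -> bool:
    """Return True if a process with this PID currently exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 delivers nothing; it only checks existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but is owned by another user
    return True


def wait_for_stop(pid_file: str, timeout: float = 5.0, interval: float = 0.01) -> bool:
    """Poll until the PID file is gone or the process it names has exited.

    Returns True if the daemon stopped within `timeout` seconds, else False.
    The default timeout and interval are illustrative assumptions.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with open(pid_file) as f:
                pid = int(f.read().strip())
        except (FileNotFoundError, ValueError):
            # Simplification: a missing or unparsable PID file counts as stopped.
            return True
        if not pid_is_running(pid):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A caller would run the stop command, then call wait_for_stop() with the daemon's PID file path (which depends on the installation) before issuing the start command, so that a shutdown taking 25 ms or more no longer races with the restart.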
Here is a job running with this change. All radius tests pass: https://jenkins.aws.isc.org/view/Kea-manual/job/kea-manual/job/tarball-system-tests/158/