This should solve the intermittent failures on the
What happens is that, for reasons not relevant here, RADIUS tests restart Kea servers at some point. When Kea is stopped via
keactrl stop, forge doesn't wait for Kea to actually stop and just issues another
keactrl start. This works most of the time, but when
libdhcp_radius.so is loaded, Kea takes around 25 milliseconds to deregister radius as a host backend. Here is a log that shows that:
2022-09-08 15:23:05.279 DEBUG [kea-dhcp4.commands/3081227.140598081251776] COMMAND_DEREGISTERED Command version-get deregistered
2022-09-08 15:23:05.304 DEBUG [kea-dhcp4.hosts/3081227.140598081251776] HOSTS_BACKEND_DEREGISTER deregistered host backend type: radius
25 ms is a lot: when radius is not configured, everything happens in under a millisecond, so you're lucky if you see the timestamp change between
keactrl stop and the last log of Kea. It's enough for
keactrl start to not take effect, because it sees that the pidfile is still there, since Kea has not stopped yet.
So the suggested fix is to make
keactrl stop wait for Kea to stop.
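
As a rough illustration of what "wait for Kea to stop" could look like in a shell script, here is a minimal sketch. It is not the actual keactrl change: the function name wait_for_exit, its arguments, and the 0.1 s polling interval are all hypothetical, and it assumes the PID has already been read from the pidfile.

```shell
#!/bin/sh
# Hypothetical helper, NOT the actual keactrl code.
# Poll until the process with the given PID is gone, or give up after
# roughly timeout_s seconds (default 10).
# Returns 0 if the process has exited, 1 if it is still alive at timeout.
wait_for_exit() {
    pid=$1
    timeout_s=${2:-10}
    # Poll ten times per second. Fractional sleep is an assumption:
    # it works with GNU coreutils and BusyBox, but is not strict POSIX.
    tries=$((timeout_s * 10))
    # kill -0 sends no signal; it only checks that the PID can be signalled.
    while kill -0 "${pid}" 2>/dev/null; do
        if [ "${tries}" -le 0 ]; then
            return 1
        fi
        tries=$((tries - 1))
        sleep 0.1
    done
    return 0
}
```

A stop routine could send the termination signal, then call wait_for_exit with the PID and only return once it reports success, so that a following keactrl start no longer sees a stale pidfile.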
Here is a job run with the fix. All radius tests pass: https://jenkins.aws.isc.org/view/Kea-manual/job/kea-manual/job/tarball-system-tests/158/