Possible problem with clock skew in the HA Hook library
A curious discrepancy was noticed during an HA failover event. Contact with the primary server was lost, and there seems to have been severe clock skew on the primary. When the primary returned to service and the clock skew was detected, both HA partners entered the TERMINATED state. The discrepancy is that the secondary's log line about the clock skew seems to report a partner time taken from the last successful heartbeat before the outage began, rather than a current time from the primary.
Last successful heartbeat logged on the primary prior to the outage:
2023-01-11 21:14:19.143 INFO [kea-dhcp4.commands/1727327.140540724561664] COMMAND_RECEIVED Received command 'ha-heartbeat'
Secondary's log of the clock skew (presumably post outage):
2023-01-11 21:17:47.784 ERROR [kea-dhcp4.ha-hooks/1724040.139887997884160] HA_HIGH_CLOCK_SKEW_CAUSES_TERMINATION my time: 2023-01-11 21:17:47, partner's time: 2023-01-11 21:14:19, partner's clock is 208s behind, causing HA service to terminate
Primary's log of the clock skew (presumably written at the same moment as the secondary's log about entering the TERMINATED state):
2023-01-11 21:24:05.763 ERROR [kea-dhcp4.ha-hooks/1727327.140540774917888] HA_TERMINATED HA service terminated due to an unrecoverable condition. Check previous error message(s), address the problem and restart!
Assuming that both servers logged the clock skew at the same real time, there is a discrepancy here. The secondary claims that the primary's clock is 208s behind. However, judging from the logged timestamps, the primary's clock APPEARS to be roughly 6 minutes ahead (21:24:05 on the primary versus 21:17:47 on the secondary).
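To make the arithmetic explicit, here is a minimal Python sketch that recomputes both offsets from the three timestamps quoted above. It is not Kea code. The variable names are made up for illustration, and the second figure relies on the same assumption as the rest of this report, namely that the two ERROR lines were written at the same real moment.

```python
from datetime import datetime

# Timestamp layout used at the start of the Kea log lines quoted above.
FMT = "%Y-%m-%d %H:%M:%S.%f"

def parse(ts: str) -> datetime:
    """Parse a timestamp copied from the start of a log line."""
    return datetime.strptime(ts, FMT)

# Hypothetical names; the values are copied from the log excerpts.
last_heartbeat_on_primary = parse("2023-01-11 21:14:19.143")
secondary_skew_log = parse("2023-01-11 21:17:47.784")
primary_terminated_log = parse("2023-01-11 21:24:05.763")

# Offset between the secondary's clock and the last heartbeat logged on
# the primary. The whole-second part matches the "208s behind" message.
reported_skew = (secondary_skew_log - last_heartbeat_on_primary).total_seconds()

# Offset between the two ERROR lines. ASSUMPTION: both were written at
# the same real time, so the difference reflects the clock offset.
apparent_skew = (primary_terminated_log - secondary_skew_log).total_seconds()

print(f"reported: primary {int(reported_skew)}s behind")
print(f"apparent: primary {round(apparent_skew)}s ahead")
```

If that assumption holds, the two figures differ in both sign and magnitude: 208s behind versus about 378s ahead. That would be consistent with the secondary comparing its own clock against the time of the last heartbeat before the outage.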