HA Failover event display - reasons
Request from a Kea user
When viewing HA status, if the status is 'failed over', can we pull out the information about WHY it has failed over? What if it fails and doesn’t ‘failover’?
I interpret this to mean the following:
- Keep a history (at least one event deep) of 'failover events' per HA pair and allow it to be inspected.
- Capture the failover reason and store it with the failover event (timestamp, which server, any other information we have).
- Capture the uptime/date stamp from when the failover event was resolved and the pair resumed 'regular' operation. I am not sure how this could or should be done: in an active/passive setup it might be a popular choice to just let the new active server continue and restore the failed server as the new passive partner. It should be more straightforward with a load-balancing (LB) setup, because the failover is resolved when both partners are operating again.
- Display 'failover status' alongside other metrics that demonstrate that the server is currently operational, such as 'New leases in last 15 minutes', so that it is possible to detect a situation in which the server hasn't officially failed over but appears not to be working. I realize this is like adding a feature that will only ever matter if there is a bug, but it does increase user confidence.
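To make the points above more concrete, here is a minimal sketch of what a stored failover event and a bounded per-pair history might look like. Everything in it is an assumption for illustration: the names `FailoverEvent`, `FailoverHistory`, and `looks_stalled`, the field names, and the "zero new leases" rule are hypothetical and do not come from Kea or any existing tool.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Deque, Optional


@dataclass
class FailoverEvent:
    """One failover event for an HA pair (all field names are hypothetical)."""
    started_at: datetime                    # when the failover was detected
    server: str                             # which server the event refers to
    reason: str                             # failover reason, as far as it is known
    resolved_at: Optional[datetime] = None  # set once 'regular' operation resumes

    def duration(self) -> Optional[timedelta]:
        """Time spent failed over, or None while the event is unresolved."""
        if self.resolved_at is None:
            return None
        return self.resolved_at - self.started_at


class FailoverHistory:
    """Bounded history of failover events for a single HA pair."""

    def __init__(self, depth: int = 1) -> None:
        # deque(maxlen=...) silently drops the oldest event when full.
        self._events: Deque[FailoverEvent] = deque(maxlen=depth)

    def record(self, event: FailoverEvent) -> None:
        self._events.append(event)

    def resolve_latest(self, when: datetime) -> None:
        """Mark the most recent event as resolved, if it is still open."""
        if self._events and self._events[-1].resolved_at is None:
            self._events[-1].resolved_at = when

    def latest(self) -> Optional[FailoverEvent]:
        return self._events[-1] if self._events else None


def looks_stalled(failed_over: bool, new_leases_last_15_min: int) -> bool:
    """Assumed heuristic: not failed over, yet no leases handed out recently."""
    return not failed_over and new_leases_last_15_min == 0


history = FailoverHistory(depth=1)
t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
history.record(FailoverEvent(t0, "server1", "partner unreachable"))
history.resolve_latest(t0 + timedelta(minutes=5))
```

When an event counts as 'resolved' is left to the caller here, which sidesteps the open active/passive question above. The zero-lease check is only a hint, since a quiet network can legitimately issue no leases in 15 minutes.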