Commit aa83b956 authored by Marcin Siodelski

[5603] Documented clock skew in HA and "terminated" state.

parent 505ce96e
......@@ -95,6 +95,41 @@
<title>Clocks on Active Servers</title>
<para>Synchronized clocks are essential for the HA setup to operate
reliably. The servers share lease information via lease updates and
during synchronization of the databases. The lease information includes
the time when the lease was allocated and when it expires. Some
clock skew between the servers participating in the HA setup will usually
exist. This is acceptable as long as the clock skew is low
compared to the lease lifetimes. However, if the clock skew becomes too
high, the servers' differing notions of when a lease expires
may cause the HA system to malfunction. For example, one server
may consider a valid lease to be expired. As a consequence, the lease reclamation
process may remove a name associated with this lease from the DNS, even though
the lease may later be renewed by a client.</para>
<para>Each active server monitors the clock skew by comparing its current
time with the time returned by its partner in response to the heartbeat
command. This gives a good approximation of the clock skew, although it
doesn't take into account the time that elapses between the partner
sending the response and the server which sent the heartbeat command
receiving it. If the clock skew exceeds 30 seconds, a warning log
message is issued. The administrator may correct this problem by
synchronizing the clocks (e.g. using NTP). The servers should notice
the clock skew correction and stop issuing the warning.</para>
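<para>The threshold logic described in this section can be illustrated
with a minimal Python sketch. It is not the code used by the High
Availability hooks library; the function name and the returned strings
are hypothetical and chosen only for illustration. The 30 and 60 second
limits are the ones stated in this section, and "exceeds" is read as a
strict comparison.</para>

```python
from datetime import datetime, timedelta, timezone

# Limits quoted in the documentation text.
WARN_SKEW = timedelta(seconds=30)
TERMINATE_SKEW = timedelta(seconds=60)


def classify_clock_skew(local_time, partner_time):
    """Hypothetical helper: return "ok", "warn" or "terminate".

    local_time is this server's clock reading; partner_time is the time
    reported by the partner in its heartbeat response. Network delay is
    ignored, so the result is only an approximation of the real skew.
    """
    skew = abs(local_time - partner_time)
    if skew > TERMINATE_SKEW:
        return "terminate"
    if skew > WARN_SKEW:
        return "warn"
    return "ok"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # A partner clock 45 seconds ahead falls between the two limits.
    print(classify_clock_skew(now, now + timedelta(seconds=45)))
```

<para>The absolute difference is used because it does not matter which
of the two clocks is ahead.</para>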
<para>If the clock skew is not corrected and it exceeds 60 seconds, the
HA service on each of the servers is terminated, i.e. the state
machine enters the <command>terminated</command> state. The servers
will continue to respond to the DHCP clients (as in the load-balancing
or hot-standby mode), but will exchange neither lease updates nor
heartbeats, and their lease databases will diverge. In this case, the
administrator should synchronize the clocks and restart the servers.
<title>Server States</title>
<para>The DHCP server operating within an HA setup runs a state machine
......@@ -167,6 +202,26 @@
answer from the partner and is not doing anything else while the
leases synchronization takes place.</para></listitem>
<listitem><para><command>terminated</command> - an active server
transitions to this state when the High Availability hooks library
is unable to further provide reliable service and manual
intervention by the administrator is required to correct the problem.
It is envisaged that various issues with the HA setup may cause the
server to transition to this state in the future. As of the Kea 1.4.0
release, the only issue causing the HA service to terminate is
unacceptably high clock skew between the active servers, i.e. if the
clocks on the respective servers are more than 60 seconds apart.
While in this state, the server will continue responding to the
DHCP clients based on the HA mode selected (load balancing or
hot standby), but lease updates won't be exchanged and
heartbeats won't be sent. A server which has entered the
"terminated" state will remain in this state until it is
restarted. The administrator must eliminate the issue which caused
this situation (i.e. synchronize the clocks) prior to restarting the server.
Otherwise, the server will return to the "terminated" state as
soon as it finds that the clock skew is still too high.
<listitem><para><command>waiting</command> - each started server
instance enters this state. The backup server will transition
directly from this state to the <command>backup</command> state.
......@@ -245,6 +300,12 @@
<entry>active server</entry>
<entry>same as in the load-balancing or hot-standby state</entry>
<entry>any server</entry>