Load-balancing and Failover in Kea - Requirements
This section provides a list of formal requirements for High Availability (HA) in Kea. We plan to address some of these in Kea 1.4.0. This is a work-in-progress document and is subject to change.
For the purpose of this design, we consider only a pair of failover peers, which is equivalent to the functionality of DHCPv4 failover. This is a baseline requirement for many ISC DHCP users to migrate to Kea.
Many large operators (especially those running datacenters) require at least three equivalent server instances for redundancy and worldwide coverage. A solution providing n+1 redundancy, however, might be a different solution, with different features and limitations. (Such a solution is likely to require a separate lease database backend.)
The failure scenarios addressed include the failure of one of the Kea servers, a network partition in which the two servers cannot connect to each other, and the failure of, or loss of connectivity to, the database backend used by Kea.
G.1. Kea DHCP servers MUST support redundancy to increase DHCP service uptime in case of failure ("High Availability of DHCP service", or briefly "HA").
G.2. Kea MUST support at least two server instances of the same kind, working as "HA peers" to provide redundancy.
G.3. HA MUST be supported by both the DHCPv4 and DHCPv6 servers.
G.4. Load balancing with a split of 50/50 MUST be supported in HA configuration.
G.5. Hot standby involving two servers MUST be supported in the HA configuration.
G.6. Backup servers MAY be supported. These servers receive lease updates and may be manually activated to perform failover.
G.7. HA MUST be supported for dynamic lease allocations from pools.
G.8. HA MUST support host reservations.
G.9. HA MUST be supported in Kea configurations using any supported lease database backend, i.e. MySQL, PostgreSQL and Memfile.
G.10. HA MUST support the case when leases are replicated via external database replication, e.g. MySQL database replication.
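To illustrate the 50/50 load balancing of G.4, the sketch below shows one way two peers could decide independently which of them answers a given client. It is only an illustration: the function name `owning_peer`, the peer labels, and the use of SHA-256 are assumptions made here, not the algorithm selected for Kea.

```python
import hashlib


def owning_peer(client_id: bytes, peers=("server1", "server2")) -> str:
    """Pick the peer that should answer a client (hypothetical helper)."""
    # Hash the client identifier so that both peers reach the same
    # decision independently, without exchanging per-client state.
    digest = hashlib.sha256(client_id).digest()
    # The parity of the first digest byte gives a roughly even two-way split.
    return peers[digest[0] % 2]


# The same client identifier always maps to the same peer.
print(owning_peer(bytes.fromhex("001122334455")))
```

Because the decision depends only on the client identifier, a surviving peer can take over the partner's share simply by answering all clients, with no change to the hashing itself.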
C.1. HA configuration MUST support splitting pools between HA peers.
C.2. HA configuration MUST support splitting subnets between HA peers.
C.3. HA configuration MUST support splitting shared networks between HA peers.
C.4. HA configuration MUST provide a parameter indicating if the given peer should perform failover automatically.
C.5. HA configuration MUST provide a parameter indicating that a server that is starting up should remain (be paused) in the specified HA state.
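As an illustration of the pool splitting in C.1, the following sketch divides an inclusive address range into two contiguous halves, one per HA peer. The helper name `split_pool` and the choice of contiguous halves are assumptions; the requirements do not prescribe how a pool is divided.

```python
import ipaddress


def split_pool(first: str, last: str):
    """Split the inclusive range first..last into two contiguous halves."""
    lo = ipaddress.ip_address(first)
    hi = ipaddress.ip_address(last)
    if lo.version != hi.version or lo >= hi:
        raise ValueError("pool needs at least two addresses of one family")
    # Integer midpoint; the lower half receives the extra address
    # when the pool holds an odd number of addresses.
    mid = lo + (int(hi) - int(lo)) // 2
    return (str(lo), str(mid)), (str(mid + 1), str(hi))


primary_half, secondary_half = split_pool("192.0.2.10", "192.0.2.19")
print(primary_half, secondary_half)
```

The same function works for IPv6 ranges, since `ipaddress.ip_address` returns the matching address type for either family.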
== Failure Detection Requirements ==
F.1. HA peer MUST be able to detect partner failure by periodically sending a heartbeat command.
F.2. HA peer MUST be able to detect partner failure by examining the secs field (DHCPv4) and elapsed time (DHCPv6) of queries sent to the partner.
F.3. HA peer MUST be able to automatically start processing DHCP traffic directed to a partner when the partner is down.
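The two detection mechanisms in F.1 and F.2 could be combined as in the sketch below. Everything in it is hypothetical: the class name `PartnerMonitor`, the thresholds, and the policy of requiring both a silent heartbeat and several slow clients before declaring the partner down are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class PartnerMonitor:
    heartbeat_timeout: float = 10.0  # seconds without a heartbeat reply (assumed)
    max_secs: int = 5                # "secs"/elapsed time treated as too long (assumed)
    max_unanswered: int = 3          # distinct slow clients required (assumed)
    last_heartbeat: float = 0.0      # time of the last successful heartbeat

    def __post_init__(self):
        self.slow_clients = set()

    def heartbeat_ok(self, now: float) -> None:
        """Record a successful heartbeat and forget earlier suspicions."""
        self.last_heartbeat = now
        self.slow_clients.clear()

    def observe_query(self, client_id: bytes, secs: int) -> None:
        """Note a query directed to the partner that has waited too long."""
        if secs > self.max_secs:
            self.slow_clients.add(client_id)

    def partner_down(self, now: float) -> bool:
        """True when heartbeats are overdue and enough clients go unanswered."""
        silent = now - self.last_heartbeat > self.heartbeat_timeout
        return silent and len(self.slow_clients) >= self.max_unanswered
```

Requiring both signals reduces the chance of both peers declaring each other down during a network partition in which clients are still being served; a deployment could equally choose to act on either signal alone.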
== Requirements Specific to Database-backend Deployment ==
D.1. Kea MUST detect the failure of its own database connection (if using a database backend) and MUST attempt to reconnect.
D.2. It MUST be possible to configure Kea with more than one database IP address, so that it can try the others if the primary is unresponsive.
D.3. Kea MUST support alternate address selection algorithms, so that two servers sharing a single database or cluster backend can minimize collisions by employing different algorithms.
D.4. Kea SHOULD implement an improved connection to the database backend to increase communication performance, using either socket support or multiple simultaneous IP connections.
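The reconnection behavior described in D.1 and D.2 can be sketched as a loop over a list of candidate database hosts. The function name `connect_with_failover`, its parameters, and the retry policy are assumptions; `connect` stands for whatever callable opens a connection in a given backend.

```python
import time


def connect_with_failover(hosts, connect, attempts=3, delay=1.0, sleep=time.sleep):
    """Try each database host in order; repeat the list up to `attempts` times."""
    last_error = None
    for round_no in range(attempts):
        for host in hosts:
            try:
                return connect(host)
            except OSError as exc:  # covers refused connections and timeouts
                last_error = exc
        # Pause between rounds, but not after the final one.
        if round_no < attempts - 1:
            sleep(delay)
    raise ConnectionError(f"no database host reachable: {hosts}") from last_error
```

The primary is simply the first entry in `hosts`, so a call such as `connect_with_failover(["192.0.2.1", "192.0.2.2"], open_db)` (with a hypothetical `open_db`) prefers it whenever it responds.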
== Synchronization Requirements ==
S.1. HA peers MUST be able to send/receive synchronous lease updates, i.e. the response is not sent to a DHCP client until the peers have confirmed that the lease update was successful.
S.2. HA peers MUST be able to send lease updates to multiple hosts, e.g. other HA peers, backup services etc.
S.3. HA peer MUST be able to query for all leases held in the partner's database using the RESTful API and synchronize its own lease database, resolving any conflicts.
S.4. HA peers MUST be able to automatically resume the load balanced service after one or more servers are put back online.
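The ordering required by S.1 and S.2 can be sketched as follows: the lease update goes to every configured recipient, and the client is answered only if all of them confirm it. The function name `respond_after_sync` and the callables `send_update` and `reply` are hypothetical placeholders; what happens when a peer does not confirm is a policy decision these requirements leave open.

```python
def respond_after_sync(lease, peers, send_update, reply):
    """Reply to the client only after every peer has confirmed the update."""
    # Send to all recipients (HA peer, backup servers, ...) and collect
    # one boolean acknowledgment per recipient.
    acks = [send_update(peer, lease) for peer in peers]
    if all(acks):
        reply(lease)
        return True
    # At least one recipient did not confirm: withhold the response.
    return False
```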
== Commands Requirements ==
X.1. Kea DHCP servers MUST support a command to cease DHCP service, e.g. when synchronizing the lease database.
X.2. The command described in X.1 MUST provide an optional timeout value, which causes the server to resume DHCP service after a specified period of time.
X.3. Kea DHCP server MUST support a command to resume DHCP service.
X.4. Kea DHCP server MUST support a command to retrieve all leases from the lease database.
X.5. Kea DHCP server MUST support a command instructing the server to take ownership of pools belonging to its HA peers, in case the peers are down.
X.6. Kea DHCP server MUST support a command instructing the server to stop serving leases from pools belonging to other peers, in case the peers are back online after a failure.
X.7. Kea DHCP server MUST support a heartbeat command used by the HA peers to verify if the server is online.
X.8. Kea DHCP server MUST support a command to synchronize its lease database with a specified server.
X.9. Kea RESTful API MUST support long lived HTTP connections, i.e. connections over which multiple commands can be sent.
X.10. Kea DHCP server MUST support a command which will allow the server to transition to the next HA state after pausing the state machine at the given state as a result of the configuration (see C.5).
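For illustration, the commands in X.1 to X.3 could be expressed as JSON documents sent over the RESTful API. The sketch below assumes the `command` / `arguments` / `service` envelope of Kea's control channel. The command names `dhcp-disable` and `dhcp-enable` and the `max-period` argument are assumptions used here as examples; these requirements do not fix any names.

```python
import json


def build_command(name, arguments=None, service=None):
    """Serialize a control command as JSON (hypothetical helper)."""
    cmd = {"command": name}
    if arguments:
        cmd["arguments"] = arguments
    if service:
        cmd["service"] = service
    return json.dumps(cmd)


# X.1 with the optional timeout from X.2: stop serving DHCPv4 clients and
# resume automatically after 20 seconds (names and value are assumptions).
pause = build_command("dhcp-disable", {"max-period": 20}, ["dhcp4"])
# X.3: resume the service explicitly.
resume = build_command("dhcp-enable", service=["dhcp4"])
print(pause)
print(resume)
```

The timeout acts as a safety net: if the peer that paused the server fails before resuming it, the server does not stay out of service indefinitely.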
== Logging and Diagnostics ==
P.1. HA peer MUST be able to use the server identifier of the partner when responding to a query directed to a partner that is down.