# Load-balancing and Failover in Kea - Requirements

This section provides a list of formal requirements for High Availability (HA) in Kea. We plan to address some of these in Kea 1.4.0.

For the purpose of this design, we are only considering a pair of failover peers, which is equivalent in functionality to DHCPv4 failover. This is a baseline requirement for many ISC DHCP users to migrate to Kea.

Many large operators (especially those running datacenters) require at least 3 equivalent server instances for redundancy and world-wide coverage, but n+1 redundancy may call for a different solution, with different features and limitations. (Such a solution is likely to require a separate lease database backend.)

The failure scenarios addressed include the failure of one of the Kea servers, a network partition in which the two servers cannot connect to each other, and the failure of, or loss of connectivity to, Kea's database backend.
## General Requirements

G.1. Kea DHCP servers MUST support redundancy to increase DHCP service uptime in case of failure ("High Availability of DHCP service", or briefly "HA").

G.2. Kea MUST support at least two server instances of the same kind, working as "HA peers" to provide redundancy.

G.3. HA MUST be supported by both DHCPv4 and DHCPv6 servers.

G.4. Load balancing with a 50/50 split MUST be supported in the HA configuration.

G.5. Hot standby involving two servers MUST be supported in the HA configuration.

G.6. Backup servers MAY be supported. These servers receive lease updates and may be manually activated to perform failover.

G.7. HA MUST be supported for dynamic lease allocations from pools.

G.8. HA MUST support host reservations.

G.9. HA MUST be supported in a Kea configuration using any supported lease database, i.e. MySQL, PostgreSQL and Memfile.

G.10. HA MUST support the case when leases are replicated via external database replication, e.g. MySQL database replication.
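As an illustration of the 50/50 load balancing in G.4, the sketch below assigns each client to one of two peers using a stable hash of its identifier. The requirements do not prescribe a hashing algorithm; the use of SHA-256, the function name `owning_server` and the peer names are assumptions made only for this example.

```python
import hashlib
from typing import Sequence


def owning_server(client_id: bytes,
                  servers: Sequence[str] = ("server1", "server2")) -> str:
    """Pick the peer responsible for a client (hypothetical helper).

    The choice depends only on the client identifier, so both peers
    reach the same decision without talking to each other.
    """
    digest = hashlib.sha256(client_id).digest()
    # The low bit of the first digest byte splits clients into two
    # roughly equal groups; a given client always maps to the same peer.
    return servers[digest[0] & 1]


# Example: decide which peer answers a client with this hardware address.
peer = owning_server(bytes.fromhex("0a1b2c3d4e5f"))
```

Because the mapping is deterministic, a surviving peer can also tell which clients belonged to its partner and start serving them after a failure.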
## Configuration Requirements

C.1. HA configuration MUST support splitting pools between HA peers.

C.2. HA configuration MUST support splitting subnets between HA peers.

C.3. HA configuration MUST support splitting shared networks between HA peers.

C.4. HA configuration MUST provide a parameter indicating whether the given peer should perform failover automatically.

C.5. HA configuration MUST provide a parameter indicating that a server that is starting up should pause in the specified HA state.
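The configuration requirements could be modelled roughly as follows. This is a minimal sketch, not Kea's configuration syntax: the `PeerConfig` fields, the `split_pool` helper and the state name `"waiting"` are all hypothetical and chosen only to show a pool being divided between two peers together with the automatic-failover and pause parameters.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Pool = Tuple[str, str]  # (first address, last address), inclusive


@dataclass
class PeerConfig:
    """Hypothetical per-peer HA settings; field names are illustrative."""
    name: str
    pools: List[Pool] = field(default_factory=list)
    auto_failover: bool = True            # whether to fail over automatically
    pause_in_state: Optional[str] = None  # HA state to pause in at startup


def split_pool(first: str, last: str) -> Tuple[Pool, Pool]:
    """Split an inclusive address range into two contiguous halves."""
    start = ipaddress.ip_address(first)
    end = ipaddress.ip_address(last)
    if start.version != end.version:
        raise ValueError("pool boundaries must use the same IP version")
    lo, hi = int(start), int(end)
    if hi <= lo:
        raise ValueError("pool must contain at least two addresses")
    mid = lo + (hi - lo) // 2
    make = type(start)  # IPv4Address or IPv6Address
    return ((str(make(lo)), str(make(mid))),
            (str(make(mid + 1)), str(make(hi))))


low, high = split_pool("192.0.2.10", "192.0.2.19")
primary = PeerConfig("server1", pools=[low])
secondary = PeerConfig("server2", pools=[high], pause_in_state="waiting")
```

The same idea extends to subnets and shared networks by assigning whole subnets, rather than address ranges, to each peer.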
## Failure Detection Requirements

F.1. An HA peer MUST be able to detect partner failure by periodically sending a heartbeat command.

F.2. An HA peer MUST be able to detect partner failure by examining the secs field (DHCPv4) and the Elapsed Time option (DHCPv6) of queries sent to the partner.

F.3. An HA peer MUST be able to automatically start processing DHCP traffic directed to its partner when the partner is down.
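One way the two detection signals could be combined is sketched below. The requirements list heartbeats and the secs/elapsed-time check as separate capabilities; requiring both before declaring the partner down is a design assumption of this example, as are the class name `PartnerMonitor` and the parameters `heartbeat_timeout`, `max_secs` and `max_unacked_clients`.

```python
class PartnerMonitor:
    """Hypothetical failure detector for one HA partner."""

    def __init__(self, started_at: float, heartbeat_timeout: float = 60.0,
                 max_secs: int = 10, max_unacked_clients: int = 3) -> None:
        self.heartbeat_timeout = heartbeat_timeout
        self.max_secs = max_secs
        self.max_unacked_clients = max_unacked_clients
        self._last_contact = started_at
        self._unacked = set()

    def heartbeat_answered(self, now: float) -> None:
        """Record a successful heartbeat and forget earlier suspicions."""
        self._last_contact = now
        self._unacked.clear()

    def observe_partner_query(self, client_id: bytes, secs: int) -> None:
        """Inspect a query addressed to the partner.

        A large secs (DHCPv4) or elapsed time (DHCPv6) value suggests the
        client has been retrying without getting an answer.
        """
        if secs > self.max_secs:
            self._unacked.add(client_id)

    def communication_lost(self, now: float) -> bool:
        """True when no heartbeat has been answered within the timeout."""
        return now - self._last_contact > self.heartbeat_timeout

    def partner_down(self, now: float) -> bool:
        """True when heartbeats fail and enough clients go unanswered."""
        return (self.communication_lost(now)
                and len(self._unacked) >= self.max_unacked_clients)
```

When `partner_down()` returns true, the surviving peer would begin answering the traffic it previously left to its partner. Checking client retries in addition to heartbeats helps distinguish a dead partner from a broken link between the two servers.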
## Requirements Specific to Database-backend Deployment

D.1. Kea MUST detect the failure of its own database connection (if using a database backend) and MUST attempt to reconnect.

D.2. It MUST be possible to configure more than one database IP address in Kea, to be tried in case the primary is unresponsive.

D.3. Kea MUST support alternate algorithms for address selection, so that two servers sharing a single database or cluster backend can minimize collisions by employing different algorithms.

D.4. Kea SHOULD implement an improved connection to the database backend to improve communication performance, using either socket support or multiple simultaneous IP connections.
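The database fallback and address-selection requirements above can be sketched as follows. Both helpers are hypothetical: `connect_with_fallback` takes the actual connect function as a parameter rather than calling any real database driver, and walking the pool in opposite directions is just one possible pair of "different algorithms" that two servers sharing a database might use.

```python
import ipaddress
from typing import Callable, Iterator, Sequence, TypeVar

C = TypeVar("C")


def connect_with_fallback(hosts: Sequence[str],
                          connect: Callable[[str], C]) -> C:
    """Try each configured database address in order; return the first
    connection that succeeds."""
    last_error = None
    for host in hosts:
        try:
            return connect(host)
        except ConnectionError as exc:
            last_error = exc  # remember the failure and try the next one
    raise ConnectionError("no database address reachable") from last_error


def candidate_addresses(first: str, last: str,
                        descending: bool = False) -> Iterator[str]:
    """Yield the addresses of an inclusive pool in ascending or
    descending order."""
    start = ipaddress.ip_address(first)
    lo, hi = int(start), int(ipaddress.ip_address(last))
    make = type(start)  # IPv4Address or IPv6Address
    order = range(hi, lo - 1, -1) if descending else range(lo, hi + 1)
    for value in order:
        yield str(make(value))
```

If one server iterates with `descending=False` and the other with `descending=True`, they only compete for the same address once the pool is nearly exhausted.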
## Synchronization Requirements

S.1. HA peers MUST be able to send/receive synchronous lease updates, i.e. the response is not sent to the DHCP client until the peers have confirmed that the lease update was successful.

S.2. HA peers MUST be able to send lease updates to multiple hosts, e.g. other HA peers, backup services, etc.

S.3. An HA peer MUST be able to query for all leases held in its partner's database using the RESTful API and synchronize its own lease database, resolving any conflicts.

S.4. HA peers MUST be able to automatically resume the load-balanced service after one or more servers are put back online.
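The sketch below illustrates two of the synchronization ideas: holding back the client response until every recipient has confirmed a lease update, and merging a partner's leases into the local database. The requirements do not say how conflicts are resolved; preferring the lease with the later client last transaction time (`cltt`) is an assumption of this example, as are the function names and the dictionary layout.

```python
from typing import Any, Callable, Dict, Iterable

Lease = Dict[str, Any]


def commit_lease(lease: Lease, recipients: Iterable[str],
                 send_update: Callable[[str, Lease], bool]) -> bool:
    """Send a lease update to every recipient and report whether the
    client may be answered.

    Returns True only if all recipients confirmed the update; all()
    stops at the first recipient that fails to confirm.
    """
    return all(send_update(recipient, lease) for recipient in recipients)


def merge_leases(local: Dict[str, Lease],
                 remote: Dict[str, Lease]) -> Dict[str, Lease]:
    """Combine two lease sets keyed by address.

    On conflict the lease with the later 'cltt' wins (assumed policy).
    """
    merged = dict(local)
    for address, lease in remote.items():
        mine = merged.get(address)
        if mine is None or lease["cltt"] > mine["cltt"]:
            merged[address] = lease
    return merged
```

A server would call `commit_lease` before replying to the client, and `merge_leases` after fetching the partner's leases during recovery.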
## Command Requirements

X.1. Kea DHCP servers MUST support a command to cease DHCP service, e.g. when synchronizing the lease database.

X.2. The command described in X.1 MUST accept an optional timeout value that causes the server to resume DHCP service after the specified period of time.

X.3. Kea DHCP server MUST support a command to resume DHCP service.

X.4. Kea DHCP server MUST support a command to retrieve all leases from the lease database.

X.5. Kea DHCP server MUST support a command instructing the server to take ownership of pools belonging to its HA peers, in case the peers are down.

X.6. Kea DHCP server MUST support a command instructing the server to stop serving leases from pools belonging to other peers, in case the peers are back online after the failure.

X.7. Kea DHCP server MUST support a heartbeat command used by the HA peers to verify if the server is online.

X.8. Kea DHCP server MUST support a command to synchronize its lease database with a specified server.

X.9. Kea RESTful API MUST support long-lived HTTP connections, i.e. connections over which multiple commands can be sent.

X.10. Kea DHCP server MUST support a command that allows the server to transition to the next HA state after the state machine has been paused in a given state as a result of the configuration (see C.5).
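To make the command requirements more concrete, the sketch below builds JSON messages of the kind that could be posted to the RESTful API. This document does not name the commands or their arguments, so `dhcp-disable`, `dhcp-enable`, `ha-heartbeat` and `max-period`, as well as the `command`/`service`/`arguments` envelope, should be read as assumptions for illustration.

```python
import json
from typing import Any, Dict, List, Optional


def build_command(name: str, services: List[str],
                  arguments: Optional[Dict[str, Any]] = None) -> str:
    """Serialize one control command as a JSON string (assumed layout)."""
    message: Dict[str, Any] = {"command": name, "service": services}
    if arguments:
        message["arguments"] = arguments
    return json.dumps(message, sort_keys=True)


# Cease DHCP service, resuming automatically after 20 seconds (X.1, X.2).
disable = build_command("dhcp-disable", ["dhcp4"], {"max-period": 20})
# Resume DHCP service explicitly (X.3).
enable = build_command("dhcp-enable", ["dhcp4"])
# Check whether the partner is online (X.7).
heartbeat = build_command("ha-heartbeat", ["dhcp4"])
```

With long-lived HTTP connections (X.9), several such messages could be sent over one connection, which matters for heartbeats and lease updates that are exchanged frequently.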
## Logging and Diagnostics
## Protocol Requirements

P.1. An HA peer MUST be able to use the server identifier of its partner when responding to a query directed to the partner while the partner is down.
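A minimal sketch of how the server identifier for a response might be chosen is shown below. The function name and the decision to stay silent for queries addressed to a live partner or an unknown server are assumptions of this example, not part of the requirement.

```python
from typing import Optional


def response_server_id(query_server_id: Optional[str], own_id: str,
                       partner_id: str, partner_down: bool) -> Optional[str]:
    """Return the server identifier to use in the response, or None if
    this server should not answer the query (hypothetical helper)."""
    if query_server_id is None or query_server_id == own_id:
        # Query is not tied to a particular server, or is addressed to us.
        return own_id
    if query_server_id == partner_id and partner_down:
        # Answer on behalf of the failed partner using its identifier,
        # so the client accepts the response.
        return partner_id
    return None
```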