Andrei Pavel · e3c1f9fd
--- a/Designs/anycast-ha-considerations.md
+++ b/Designs/anycast-ha-considerations.md
+# DHCP with Anycast and High Availability Considerations
+
+## Table of Contents
+
+[[_TOC_]]
+
+## DHCP with Anycast
+Some operators have expressed interest in using anycast addresses for DHCP servers in their networks. In this case, several DHCP servers in the network listen for DHCP messages on the same IP address. The BGP routing protocol is used to determine the "nearest" server and forward DHCP messages to it. Using anycast improves DHCP service reliability and manageability: if a DHCP server terminates unexpectedly or an operator shuts it down for maintenance, the traffic is automatically re-routed to other servers.
+
+A DHCPv4 client interested in obtaining a new lease broadcasts a DHCPDISCOVER and may receive responses (DHCPOFFER) from one or more servers. The client selects one of the servers and broadcasts a DHCPREQUEST containing the selected server's identifier to obtain a lease. Multiple servers may receive this message, but only the server with the matching server identifier responds.
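+
+The server-side check can be sketched as follows. This is a minimal sketch under simplifying assumptions: the `DhcpRequest` class and the `should_respond` function are hypothetical and do not correspond to Kea code. The second half shows why the check stops distinguishing servers once they all share one anycast address.
+
+```python
+from dataclasses import dataclass
+from ipaddress import IPv4Address
+from typing import Optional
+
+
+@dataclass
+class DhcpRequest:
+    # Server Identifier option (option 54); None when the client omits it.
+    server_id: Optional[IPv4Address]
+
+
+def should_respond(server_address: IPv4Address, request: DhcpRequest) -> bool:
+    """Decide whether this server answers a broadcast DHCPREQUEST.
+
+    A request naming another server's identifier is not answered; a request
+    without a server identifier (e.g., while rebinding) may be answered by any server.
+    """
+    if request.server_id is None:
+        return True
+    return request.server_id == server_address
+
+
+# Distinct unicast addresses: only the selected server answers.
+s1, s2 = IPv4Address("192.0.2.1"), IPv4Address("192.0.2.2")
+selecting = DhcpRequest(server_id=s1)
+print(should_respond(s1, selecting), should_respond(s2, selecting))  # True False
+
+# Anycast: both servers own the same address, so both consider themselves selected.
+anycast = IPv4Address("198.51.100.53")
+selecting = DhcpRequest(server_id=anycast)
+print(should_respond(anycast, selecting), should_respond(anycast, selecting))  # True True
+```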
+
+To renew the lease, the client unicasts a DHCPREQUEST to the server that allocated the lease.
+
+Suppose the client already has a lease, but the server that allocated this lease is unavailable. In that case, the client's renewal attempts fail, and the client eventually enters the rebinding state: it broadcasts a DHCPREQUEST without a server identifier and accepts a response from any server.
+
+To summarize, a DHCPv4 client either broadcasts its messages to all servers without indicating which server should respond, broadcasts them while indicating a particular server by its server identifier, or unicasts them to a specific server. The server's unicast address is also its server identifier. The client uses the server identifier to select one of the servers in broadcast messages.
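+
+The field combinations that RFC 2131 (section 4.3.2) prescribes for each DHCPREQUEST variant can be summarized as a small classifier. The `classify_request` function and the `ClientState` enum are hypothetical names used for illustration. The `broadcast` flag assumes that the receiver knows whether the message was broadcast, which a server behind a relay agent cannot always observe.
+
+```python
+from enum import Enum
+from ipaddress import IPv4Address
+from typing import Optional
+
+UNSPECIFIED = IPv4Address("0.0.0.0")
+
+
+class ClientState(Enum):
+    SELECTING = 1
+    INIT_REBOOT = 2
+    RENEWING = 3
+    REBINDING = 4
+
+
+def classify_request(server_id: Optional[IPv4Address],
+                     requested_ip: Optional[IPv4Address],
+                     ciaddr: IPv4Address,
+                     broadcast: bool) -> ClientState:
+    """Infer the client state behind a DHCPREQUEST (RFC 2131, section 4.3.2)."""
+    if server_id is not None:
+        # Only SELECTING carries a server identifier (with the offered address).
+        return ClientState.SELECTING
+    if requested_ip is not None and ciaddr == UNSPECIFIED:
+        # INIT-REBOOT: the client verifies a previously allocated address.
+        return ClientState.INIT_REBOOT
+    if requested_ip is None and ciaddr != UNSPECIFIED:
+        # RENEWING is unicast to the leasing server; REBINDING is broadcast.
+        return ClientState.REBINDING if broadcast else ClientState.RENEWING
+    raise ValueError("field combination matches no RFC 2131 DHCPREQUEST state")
+
+
+leased = IPv4Address("192.0.2.100")
+print(classify_request(None, None, leased, broadcast=False).name)  # RENEWING
+print(classify_request(None, None, leased, broadcast=True).name)   # REBINDING
+```
+
+With anycast, a RENEWING request is unicast to the shared anycast address, so a server can still classify it but cannot tell whether it allocated the lease itself.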
+
+DHCP protocols were not designed with anycast in mind. With anycast, all servers use the same IP address as their server identifier. The client cannot distinguish the servers and cannot control which server receives its messages, including when it renews its lease. Because all servers use the same server identifier, each server must assume that every message it receives is directed to it. The usual way of distinguishing servers by their server identifiers, both for new lease allocations and for lease renewals, no longer works.
+
+This and other aspects of using anycast and HA are discussed in more detail in the next section.
+
+## Challenges
+In this section, we discuss the challenges of using anycast with HA. The proposed solutions are described in the Possible Solutions section.
+
+### Lease Allocation Conflicts
+Using anycast requires that all participating DHCP servers have the most current lease information because any of them may receive a DHCP request at any time. Many Kea operators use a shared SQL database as a central lease repository for multiple servers. This solution can work quite well with anycast because the connected DHCP servers share a single copy of the lease database. The servers may compete for the same lease, attempting to allocate it to different clients, but the database locking mechanisms prevent a duplicate allocation. One of the DHCP clients is eventually forced to retry and obtains another lease.
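+
+A minimal sketch of this behavior, assuming that a unique key on the address stands in for whatever conflict protection the real lease backend uses. Here `sqlite3` replaces the shared SQL database, and the `lease4` table layout and the `commit_lease` helper are simplified and hypothetical.
+
+```python
+import sqlite3
+
+# One lease database shared by all anycast servers (in-memory for this sketch).
+db = sqlite3.connect(":memory:")
+# The primary key on the address stands in for the backend's conflict protection.
+db.execute("CREATE TABLE lease4 (address TEXT PRIMARY KEY, client TEXT NOT NULL)")
+
+
+def commit_lease(address: str, client: str) -> bool:
+    """Store a lease; return False if another server already took the address."""
+    try:
+        with db:  # commits on success, rolls back on error
+            db.execute("INSERT INTO lease4 (address, client) VALUES (?, ?)",
+                       (address, client))
+        return True
+    except sqlite3.IntegrityError:
+        return False
+
+
+# server1 and server2 both picked 192.0.2.10, for two different clients.
+print(commit_lease("192.0.2.10", "client-a"))  # True: server1 wins the race
+print(commit_lease("192.0.2.10", "client-b"))  # False: server2 must not hand it out
+# client-b retries and obtains another address.
+print(commit_lease("192.0.2.11", "client-b"))  # True
+```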
+
+One drawback of this solution is that the database becomes a single point of failure. Therefore, some operators prefer using the Kea High Availability hooks library instead.
+
+DHCP servers participating in a High Availability configuration maintain their own copies of the lease database. They exchange lease information over the control channel using control commands, known as lease updates. Delays in transmitting lease updates and temporary communication problems between the servers increase the risk of conflicting lease allocations. This risk is especially high in the load-balancing mode, in which both servers respond to DHCP queries simultaneously. Therefore, the Kea ARM urges operators to split the address pools into distinct chunks (a 50/50 split), each used by one of the servers. The split guarantees that the servers never allocate conflicting addresses during normal operation.
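+
+A sketch of such a 50/50 split, assuming a single contiguous range of at least two addresses; `split_pool` is a hypothetical helper, not a Kea tool. Each resulting range would then be configured as a separate pool used by one server.
+
+```python
+from ipaddress import IPv4Address
+
+
+def split_pool(first: str, last: str):
+    """Split an inclusive address range into two halves, one per HA server."""
+    lo, hi = int(IPv4Address(first)), int(IPv4Address(last))
+    mid = lo + (hi - lo + 1) // 2  # first address of the second half
+    return ((IPv4Address(lo), IPv4Address(mid - 1)),
+            (IPv4Address(mid), IPv4Address(hi)))
+
+
+server1_pool, server2_pool = split_pool("192.0.2.10", "192.0.2.209")
+print(server1_pool)  # (IPv4Address('192.0.2.10'), IPv4Address('192.0.2.109'))
+print(server2_pool)  # (IPv4Address('192.0.2.110'), IPv4Address('192.0.2.209'))
+```
+
+With an odd number of addresses, the second pool receives the extra address.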
+
+In the case of anycast, both servers must assume responsibility for every DHCP query they receive. Thus, the servers must be able to allocate addresses from the entire address range; otherwise, a server could not renew a lease assigned by its partner. Consequently, anycast requires a different solution to avoid lease allocation conflicts.
+
+### Multipath Queries
+Most articles describe anycast as a mechanism for selecting the nearest host and sending the packet to that host. For DHCP, this implies that only one server processes a given DHCP query at a time. However, some users interested in anycast pointed out that multipath transmission is theoretically possible when the "distance" to both DHCP servers is equal. We should investigate whether this occurs in real deployments and how frequently.
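+
+A toy model of the selection, assuming that each server's route advertisement reduces to a single comparable metric (real BGP best-path selection compares several attributes, such as local preference and AS path length). `anycast_targets` is a hypothetical helper; a tie corresponds to the multipath case described above.
+
+```python
+def anycast_targets(metrics: dict) -> list:
+    """Return the servers that may receive a query sent to the anycast address.
+
+    `metrics` maps each server advertising the anycast prefix to its routing
+    distance as seen from the client's relay. Normally a single server is
+    nearest; more than one result means multipath delivery becomes possible.
+    """
+    best = min(metrics.values())
+    return sorted(name for name, metric in metrics.items() if metric == best)
+
+
+print(anycast_targets({"server1": 10, "server2": 20}))  # ['server1']
+print(anycast_targets({"server1": 10, "server2": 10}))  # ['server1', 'server2']
+```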
+
+If two servers independently receive and process the same query, they may make conflicting decisions about lease allocation. In particular, they may assign different leases to the same client, and a client renewing a lease may receive a DHCPACK from one server and a DHCPNAK from the other. Moreover, if the servers exchange lease updates, the partner's lease update overrides the lease information on the server that allocated the lease. The overriding update may only slightly alter the original information, e.g., by setting a different client last transmission time (cltt), but it may also lead to the deletion of the lease if the other server sent a DHCPNAK.
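+
+A toy model of this race, assuming that the most recently received lease update wins and that sending a DHCPNAK produces a lease deletion update. The `Server` class and the update format are hypothetical and are not the Kea HA lease update commands.
+
+```python
+from dataclasses import dataclass, field
+
+
+@dataclass
+class Server:
+    name: str
+    leases: dict = field(default_factory=dict)  # address -> client identifier
+
+    def ack(self, address: str, client: str) -> dict:
+        """Extend the lease locally and build a lease update for the partner."""
+        self.leases[address] = client
+        return {"op": "update", "address": address, "client": client}
+
+    def nak(self, address: str) -> dict:
+        """Refuse the lease locally and build a deletion update for the partner."""
+        self.leases.pop(address, None)
+        return {"op": "delete", "address": address}
+
+    def apply(self, update: dict) -> None:
+        """Apply a partner's lease update; the last update received wins."""
+        if update["op"] == "update":
+            self.leases[update["address"]] = update["client"]
+        else:
+            self.leases.pop(update["address"], None)
+
+
+server1, server2 = Server("server1"), Server("server2")
+# The same DHCPREQUEST for 192.0.2.50 reaches both servers over different paths.
+update_from_1 = server1.ack("192.0.2.50", "client-a")
+update_from_2 = server2.nak("192.0.2.50")
+# Each server then applies its partner's lease update.
+server1.apply(update_from_2)
+server2.apply(update_from_1)
+print(server1.leases)  # {}
+print(server2.leases)  # {'192.0.2.50': 'client-a'}
+```
+
+The server that acknowledged the lease ends up without it, and the two lease databases disagree.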
+
+If multipath transmissions occur frequently, they will significantly degrade the consistency of the lease information.
+
+### Impact on Failover Procedure
+The Kea HA solution includes a two-step failover procedure, which aims to reliably determine whether the partner has crashed or only the communication with it is temporarily interrupted. A premature transition of both servers to the partner-down state may have severe consequences in the load-balancing mode: both servers assume responsibility for the entire address range and may assign the same addresses to different clients.
+
+When communication between the servers is interrupted for a prolonged time, the servers start monitoring the DHCP queries directed to their partners. The secs field in DHCPv4 and the Elapsed Time option in DHCPv6 specify how long a given client has been trying to obtain a lease. If these values grow beyond a configured threshold, the partner is evidently not responding to DHCP queries, which confirms to the monitoring server that the partner has failed.
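+
+The two steps can be modeled as below. This is a simplified sketch, not Kea's implementation: the field names loosely mirror the Kea HA parameters `max-ack-delay` and `max-unacked-clients`, but here the delay is expressed in seconds to match the DHCPv4 secs field (the DHCPv6 Elapsed Time option counts hundredths of a second).
+
+```python
+from dataclasses import dataclass, field
+
+
+@dataclass
+class FailoverMonitor:
+    """Simplified two-step partner failure detection (not Kea's implementation)."""
+    max_ack_delay: int        # seconds a partner's client may wait before counting as unacked
+    max_unacked_clients: int  # unacked clients tolerated before declaring the partner down
+    heartbeat_failed: bool = False
+    unacked: set = field(default_factory=set)
+
+    def observe_partner_query(self, client_id: str, secs: int) -> None:
+        """Step 2: while heartbeats fail, watch queries meant for the partner."""
+        if self.heartbeat_failed and secs > self.max_ack_delay:
+            self.unacked.add(client_id)
+
+    def partner_down(self) -> bool:
+        # Failed heartbeats (step 1) alone are not enough; step 2 must confirm them.
+        return self.heartbeat_failed and len(self.unacked) > self.max_unacked_clients
+
+
+monitor = FailoverMonitor(max_ack_delay=10, max_unacked_clients=2)
+monitor.heartbeat_failed = True  # step 1: the partner stopped answering heartbeats
+for client, secs in [("c1", 12), ("c2", 3), ("c3", 15), ("c4", 20)]:
+    monitor.observe_partner_query(client, secs)
+print(monitor.partner_down())  # True: c1, c3, and c4 waited longer than 10 s
+```
+
+With anycast, the surviving server answers every query it receives, so the partner's clients never accumulate high `secs` values and `partner_down()` never returns `True`.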
+
+Unlike in load balancing, in the anycast case there is no split of responsibility between the servers: each server must assume responsibility for every query it receives. If one server crashes, the surviving server cannot tell whether it received a query because its partner is unavailable or because it would receive this query anyway according to the BGP routing scheme. Without additional information, the server must always respond to the query. As a result, there are no unanswered queries with high values in the secs field or the Elapsed Time option, and the second step of the currently implemented failover procedure does not apply to anycast.
+
+## Possible Solutions
+In this section, we propose solutions to the challenges described above.
+
+### Lease Allocation Conflicts
+Lease allocation conflicts are most likely in exchanges in which the client does not request any address. In DHCPv4, this is typically the DHCPDISCOVER/DHCPOFFER exchange. A DHCPREQUEST without a requested IP address is much rarer; still, Kea attempts to allocate a lease to a client sending such a DHCPREQUEST, even if the client has no lease. In the other cases, i.e., renew or rebind, the client includes its address in the DHCPREQUEST message (per RFC 2131, in the ciaddr field rather than in the Requested IP Address option).
+
+Consider an algorithm in which the servers are configured with split pools, as in the load-balancing case, but use the split only for specific DHCP exchanges. If a client sends a query without a requested IP address, the server must select an address for this client, and it must take the address from its own pool. This way, the servers never offer the same address to two clients sending a DHCPDISCOVER or a DHCPREQUEST without a requested IP address. When a client that received an offer subsequently sends a DHCPREQUEST, it includes the offered IP address in the request. The server may now use addresses from the entire address space, but the requested address belongs to this server's pool. This reasoning assumes that the clients follow the DHCP specifications and request the same address they were offered in the DHCPDISCOVER/DHCPOFFER exchange. It also assumes that both client messages of the 4-way exchange (DHCPDISCOVER and DHCPREQUEST) are routed to the same server.
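+
+The allocation rule can be sketched as follows, assuming a small hypothetical 10/10 pool split and a lease table that maps addresses to client identifiers. `POOLS` and `select_address` are illustrative names, not Kea configuration or code.
+
+```python
+from ipaddress import IPv4Address
+from typing import Optional
+
+# Hypothetical split of 192.0.2.10-192.0.2.29 between the two servers.
+POOLS = {
+    "server1": [IPv4Address("192.0.2.10") + i for i in range(10)],
+    "server2": [IPv4Address("192.0.2.20") + i for i in range(10)],
+}
+
+
+def select_address(server: str, client: str, requested: Optional[IPv4Address],
+                   leases: dict) -> Optional[IPv4Address]:
+    """Pick an address for `client` following the proposed anycast rule.
+
+    Without a requested address, only this server's pool is used, so two
+    servers never offer the same new address. With a requested address, any
+    pool is acceptable as long as the address is free or held by this client.
+    """
+    if requested is None:
+        return next((a for a in POOLS[server] if a not in leases), None)
+    in_some_pool = any(requested in pool for pool in POOLS.values())
+    if in_some_pool and leases.get(requested, client) == client:
+        return requested
+    return None
+
+
+leases = {}
+offer1 = select_address("server1", "client-a", None, leases)
+offer2 = select_address("server2", "client-b", None, leases)
+print(offer1, offer2)  # 192.0.2.10 192.0.2.20
+leases[offer1] = "client-a"
+# client-a later renews through server2, which accepts an address from server1's pool.
+print(select_address("server2", "client-a", offer1, leases))  # 192.0.2.10
+```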
+
+After the server assigns a lease to the client, the client periodically sends a DHCPREQUEST to renew it. The client unicasts the request; the nearest DHCP server receives it and responds because all servers share the same IP address and server identifier. The receiving server may be different from the one that allocated the lease. The same situation occurs when the client is in the rebinding state; the only difference is that the client then broadcasts the request instead of unicasting it. Since the client provides the address to be extended, the server may use the entire address range for this lease. The renewed address may belong to this server's pool or to its partner's; the server renews it regardless.
+
+Suppose there are two servers, "server1" and "server2", with correctly split address pools. The Kea HA hooks library must recognize the packets without a requested IP address and assign a server-specific class to them, e.g., "HA_server1" for server1. If a received request contains a requested IP address, the server should assign both servers' classes, i.e., "HA_server1" and "HA_server2". If the server is in the partner-down state, it must always assign both classes.
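+
+The class assignment for the requested-address rule can be sketched as below. `ha_classes` is a hypothetical function, and the sketch assumes that each server's pool is restricted to the client class named after that server. It covers only the normal-state rule, not the partner-down case.
+
+```python
+def ha_classes(this_server: str, partner: str, has_requested_address: bool) -> set:
+    """Client classes attached to a query under the proposed anycast scheme.
+
+    Assumes each server's pool is restricted to the class "HA_<server name>",
+    so the returned set determines which pools the allocation may use.
+    """
+    own = f"HA_{this_server}"
+    if has_requested_address:
+        # The address may come from either pool, e.g., a lease the partner allocated.
+        return {own, f"HA_{partner}"}
+    # New allocation: only this server's pool is eligible.
+    return {own}
+
+
+print(sorted(ha_classes("server1", "server2", has_requested_address=False)))  # ['HA_server1']
+print(sorted(ha_classes("server1", "server2", has_requested_address=True)))   # ['HA_server1', 'HA_server2']
+```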
+
+### Multipath Queries
+There is no mechanism that could be implemented in the DHCP server to recognize that a query was received by more than one server. Thus, we will have to document that an operator wishing to use anycast with HA should ensure that multipath DHCP queries are impossible in their network, or at least infrequent. If this cannot be guaranteed, we recommend against using anycast.
+
+### Impact on Failover Procedure
+The two-step failover procedure does not work in an anycast HA configuration. We need to limit the failover procedure to the heartbeat mechanism and assume that the partner has failed if the heartbeat has been unsuccessful for longer than a configured amount of time. If the partner is operational and only the control channel connection has failed, both servers may transition to the partner-down state. In this state, the servers do not exchange lease updates, so each server misses any new lease allocations made by its partner. The servers synchronize their lease databases when communication is re-established.
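+
+A sketch of the heartbeat-only variant, showing how a control channel failure alone drives both servers into the partner-down state. `HeartbeatFailover` and its `max_silence` threshold are hypothetical, the state name "normal" is a simplification, and the database synchronization is not modeled.
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass
+class HeartbeatFailover:
+    """Heartbeat-only partner failure detection (simplified sketch)."""
+    max_silence: float          # seconds without a successful heartbeat
+    last_success: float = 0.0   # time of the last successful heartbeat
+    state: str = "normal"
+
+    def heartbeat(self, now: float, ok: bool) -> str:
+        """Record one heartbeat attempt at time `now` and return the new state."""
+        if ok:
+            self.last_success = now
+            if self.state == "partner-down":
+                # Communication is back: the lease databases must be synchronized.
+                self.state = "syncing"
+        elif now - self.last_success > self.max_silence:
+            self.state = "partner-down"
+        return self.state
+
+
+# The control channel fails at t=0 while both servers keep serving clients.
+s1, s2 = HeartbeatFailover(max_silence=10), HeartbeatFailover(max_silence=10)
+for t in (5, 11):
+    states = (s1.heartbeat(t, ok=False), s2.heartbeat(t, ok=False))
+print(states)                     # ('partner-down', 'partner-down')
+print(s1.heartbeat(20, ok=True))  # syncing
+```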
+
+When both servers are in the partner-down state, there is no significant risk of conflicting lease renewals. A server receiving a renewal request extends the lease in its local database and responds. If the network topology has not changed, the server receiving the renewal request is typically the one that initially allocated the lease. If it is a different server, the renewal should also succeed.
+
+When a server runs in the load-balancing mode and is in the partner-down state, it selects an address pool for new allocations using a special algorithm. It computes a hash from the client's MAC address and/or client identifier and then examines the hash value to decide which server should process the packet. The algorithm is deterministic, yields the same result on both servers, and is stable as long as the client's MAC address and/or client identifier do not change. It therefore picks the pool from which the client would obtain a lease if both servers were operating normally. When the servers return to normal operation, each takes responsibility for the clients assigned to it by the load-balancing algorithm, without re-allocating the leases assigned during the partner-down period, because all clients already have leases from the appropriate pools.
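+
+The key property, a deterministic mapping that every server computes identically, can be sketched as follows. This is an illustration only: Kea's load balancing follows RFC 3074, which applies a Pearson hash to the client identifier or hardware address, whereas this sketch substitutes SHA-256 from the standard library. `responsible_server` is a hypothetical name.
+
+```python
+import hashlib
+
+
+def responsible_server(client_key: bytes, servers=("server1", "server2")) -> str:
+    """Map a client to a server with a deterministic, stable hash.
+
+    SHA-256 is a stand-in for the RFC 3074 hash used by Kea.
+    """
+    bucket = hashlib.sha256(client_key).digest()[0]  # one of 256 buckets
+    # The first half of the buckets goes to the first server, the rest to the second.
+    return servers[bucket * len(servers) // 256]
+
+
+mac = bytes.fromhex("0a1b2c3d4e5f")
+# Each server evaluates the function independently and reaches the same answer.
+assert responsible_server(mac) == responsible_server(mac)
+print(responsible_server(mac))
+```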
+
+In the anycast setup, no load-balancing algorithm is involved. A server in the partner-down state receiving a DHCPDISCOVER has no means to select a preferred address pool for the given client. Therefore, the server must offer an address from its own pool. Offering an address from the partner's pool poses a risk of collision if the partner is operational, because the partner could offer the same address to another client.
+
+Suppose the server allocates a lease from its own pool, and the client subsequently renews this lease with the partner server. In that case, the partner will accept the renewal because servers can allocate addresses from all pools when the queries contain requested IP addresses (see above).
+
+The solution described here looks sound, but it has a drawback that we must take into consideration. Typically, the address pools are split 50/50, and the load-balancing algorithm ensures a roughly even distribution of queries between the servers. In the anycast case, the distribution is more fluid and depends on the BGP routing configuration and on the DHCP servers' downtime. When one server fails and the other takes over the service, the latter may allocate new leases from its own address pool. This leads to an imbalance between the lease allocations in the pools belonging to the two servers. The Kea HA implementation currently has no mechanism to rebalance the pools, so one of the servers may run out of free addresses. On the other hand, if both servers are equally likely to fail and have similar average downtimes, we can assume that the pools' utilization will tend to rebalance between the servers over time.
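+
+The imbalance can be illustrated with simple arithmetic. The pool sizes and lease counts below are hypothetical numbers chosen for the example, and `pool_report` is an illustrative helper.
+
+```python
+def pool_report(pool_sizes: dict, allocated: dict) -> dict:
+    """Utilization and number of free addresses per server pool."""
+    return {
+        server: {"utilization": round(allocated[server] / size, 2),
+                 "free": size - allocated[server]}
+        for server, size in pool_sizes.items()
+    }
+
+
+pool_sizes = {"server1": 500, "server2": 500}  # 50/50 split
+allocated = {"server1": 300, "server2": 300}   # balanced before the outage
+# While server2 is down, server1 renews server2's leases in place but takes
+# every new allocation (150 here) from its own pool.
+allocated["server1"] += 150
+print(pool_report(pool_sizes, allocated))
+# {'server1': {'utilization': 0.9, 'free': 50}, 'server2': {'utilization': 0.6, 'free': 200}}
+```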
+
+If the tendency of the pools to go out of balance is a concern, we can consider adding a configuration parameter allowing the servers to use the entire address space (all pools) when they receive requests without a requested IP address. Enabling it will increase the potential for collisions, but we could mitigate this with random lease selection.
+
+### Random Lease Selection
+Kea uses an iterative allocation strategy to select free leases from the available address pools. This strategy offers good lease selection performance and keeps the server simple, but it has a major pitfall for a pair of servers sharing address pools, regardless of whether they use HA or a shared lease database for redundancy. In the extreme case, the servers begin with empty pools and offer the same addresses, in the same order, to different clients. The situation improves slightly over time as the pools become fragmented, but we could do better.
+
+The servers should randomize lease selection so that every free lease has an equal probability of being selected. The randomization concept using permutations was described in the "leases preallocation design". We should consider implementing it as an alternative to iterative allocation. Finally, it is worth mentioning that lease randomization performs best at low to moderate pool utilization. When pool utilization is high, the conflict rate increases dramatically regardless of randomization.
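+
+Both strategies can be compared with a small simulation of two servers sharing one pool. This sketch uses Python's `random.shuffle` to produce the permutation; it illustrates permutation-based selection in general and is not the approach from the "leases preallocation design". The fixed seeds only make the run reproducible.
+
+```python
+import random
+from ipaddress import IPv4Address
+
+POOL = [IPv4Address("192.0.2.0") + i for i in range(256)]
+
+
+def iterative_offers(count: int) -> list:
+    """Iterative strategy: walk the pool from its first address."""
+    return POOL[:count]
+
+
+def random_offers(count: int, rng: random.Random) -> list:
+    """Random strategy: walk a random permutation of the pool."""
+    order = POOL[:]     # copy, so the shared pool order stays untouched
+    rng.shuffle(order)  # Fisher-Yates shuffle into a random order
+    return order[:count]
+
+
+def collisions(offers_a: list, offers_b: list) -> int:
+    """Number of addresses that both servers offered to different clients."""
+    return len(set(offers_a) & set(offers_b))
+
+
+# Two servers start with empty pools and each serves 20 new clients.
+print(collisions(iterative_offers(20), iterative_offers(20)))  # 20
+# With random selection, the expected overlap is 20 * 20 / 256, about 1.6.
+print(collisions(random_offers(20, random.Random(1)),
+                 random_offers(20, random.Random(2))))
+```
+
+Raising the number of clients toward the pool size quickly drives the random overlap up as well, matching the caveat about high pool utilization.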
+
+## DHCPv6 Considerations
+This document focuses on using anycast with DHCPv4 servers because most questions from Kea users about anycast concerned this protocol. It is also simpler to give examples using one protocol's nomenclature. However, Kea HA also works for DHCPv6, and issues similar to those in DHCPv4 are present.
+
+Similar solutions apply to DHCPv6. If a client sends a Solicit message, the receiving server should select leases from its own pools. If the client sends a Renew or Rebind message, the server should extend the client's leases regardless of the address pools to which they belong.
+
+In the case of a Request message, the server should generally allocate the requested IAs if they contain addresses and/or delegated prefixes. If they contain neither addresses nor delegated prefixes, the server should use its own pools. In the unlikely situation that a Request message contains some IAs with and some without requested leases, the server should use its own pools and ignore the addresses and delegated prefixes in the IAs, to avoid allocating some leases from this server's pools and some from the partner's pools.
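+
+The DHCPv6 rules above can be condensed into one decision function. This is a sketch under simplifying assumptions: the `IA` class models an IA_NA or IA_PD only by its requested addresses or prefixes, and `v6_pool_policy` with its "own"/"any" results is a hypothetical name and encoding.
+
+```python
+from dataclasses import dataclass, field
+
+
+@dataclass
+class IA:
+    """An IA_NA or IA_PD; `hints` holds the addresses or prefixes it requests."""
+    iaid: int
+    hints: list = field(default_factory=list)
+
+
+def v6_pool_policy(msg_type: str, ias: list) -> str:
+    """Return which pools the server may use: "own" or "any"."""
+    if msg_type in ("Renew", "Rebind"):
+        return "any"  # extend the client's leases wherever they came from
+    if msg_type == "Solicit":
+        return "own"  # new allocations come from this server's pools
+    if msg_type == "Request":
+        if ias and all(ia.hints for ia in ias):
+            return "any"  # every IA names the leases the client wants
+        # No IA, or only some IAs, carry leases: ignore the hints, use own pools.
+        return "own"
+    raise ValueError(f"unsupported message type: {msg_type}")
+
+
+print(v6_pool_policy("Request", [IA(1, ["2001:db8::10"]), IA(2)]))  # own
+print(v6_pool_policy("Request", [IA(1, ["2001:db8::10"])]))         # any
+```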
+
+## Open Questions
+The following are the open questions that stem from the considerations in this document:
+  - Can we assume that multipath queries are non-existent or very rare when using anycast?
+  - Do we need a configuration parameter to allow using the entire address space when the server processes a query without the requested IP address?
+
+## Final Notes
+The issues and solutions described in this document focus on the use case in which HA is combined with anycast addresses. This document aims to evaluate the complexity of the solutions and the general suitability of using anycast with HA. Still, it must be remembered that HA by design supports only two active servers, which may be a roadblock to scaling a DHCP deployment. Every operator should keep this in mind when designing their network. A shared lease database may often be a better choice because it can scale beyond two servers; it is also currently better suited to anycast. Our HA solution can be extended to support anycast addressing, but we should first gather feedback on whether there is a strong demand for it among our users and whether the assumptions made here are acceptable.