[[_TOC_]]

# Free Lease Queue Design
This document replaces the [Backend Assisted Lease Selection Design](designs/Backend-Assisted-Lease-Selection-design) document. It describes optimizations in free lease selection for a DHCP client when the appropriate address pools are exhausted or nearly exhausted. The described approach is common to all lease database backends. It introduces free lease queues (FLQ) populated upon the server's startup or reconfiguration. The queues are designed to facilitate fast free lease lookup within an address range (pool) and fast insertion of a reclaimed lease into the appropriate queue, making the lease available for allocation.

A notable difference between the approach described in the obsolete [Backend Assisted Lease Selection Design](designs/Backend-Assisted-Lease-Selection-design) and this design is that the latter uses a common approach for all lease database backends (free lease queues are maintained in the server's memory, outside of the backends), while the former described backend-specific storage for the queues in each backend.
## Issues with Iterative Allocation
Lease selection is a complex process in which the allocation engine favors allocation of reservations and address hints. If neither a reserved address nor an address provided as a hint can be offered, the server must find a free lease within the address pools belonging to the given subnet, or to multiple subnets if the subnets belong to a shared network. In a new installation all addresses in a pool are initially free. The server iteratively picks addresses from the pool and offers them to the clients. This is the least complicated way to select free leases and it is also the fastest when the pool utilization is relatively low.

The iterative allocation performance degrades significantly when the pools are nearly exhausted. The server walks over the entire pool and, for each candidate address, checks whether a lease exists. In an extreme case, the server walks over the entire pool before finding out that there are no more addresses available and only then tries another pool. This has been reported to be an issue for some installations having many relatively small pools with high utilization.

The solution described in this design aims at minimizing the number of iterations needed to find a free lease within an address pool or to learn that the pool is exhausted. Ideally, the number of iterations for a single pool should be equal to 1, even when the pool is nearly exhausted. The use of in-pool host reservations (and the conflict resolution scheme) may increase the number of iterations when conflicts are detected. However, the iterative allocation is also prone to this issue.
## Configuration
### Configuration Levels

The new mechanism is mostly aimed at improving lease allocation performance for nearly exhausted pools. It may also improve performance for pools with high fragmentation, in which free leases are scattered. For large address pools, e.g. /8, pool exhaustion is less probable. Populating and maintaining free lease queues has a certain computational cost upon the server's startup and reconfiguration and increases memory usage. Therefore it should be the administrator's decision for which pools the free lease queues are used and for which they are not. The appropriate configuration knobs must be available at the global, shared network, subnet and pool levels.
### Shuffling Leases
The free lease queues (FLQ) designed herein will, besides fast lease lookup, also provide a new capability to shuffle offered leases. It has been pointed out by several Kea users and admitted by Kea team members that offering consecutive IP addresses (iterative allocation) makes it trivial to guess the next address to be offered/allocated and makes a network prone to scanning attacks. The introduction of FLQ is a good opportunity to address this issue. Leases in the queues can be shuffled and offered in no particular order. Shuffling leases requires maintenance of a (shuffling) state and introduces additional complexity. Hence, this design assumes that it should be possible to enable/disable shuffling from the global level down to the pool level.
### Configuration Knobs
FLQ and shuffling are complementary and should be configured separately. This design proposes the introduction of two boolean parameters, `free-lease-queue` and `free-lease-shuffle`, to enable (if true) or disable (if false) the use of these features.

The following configuration:
```json
{
    "Dhcp4": {
        "free-lease-queue": false,
        "free-lease-shuffle": false
    }
}
```
is the default configuration, which globally disables the use of FLQ and lease shuffling, effectively causing the server to use the iterative allocation strategy as in Kea 1.9.0 and earlier.

Both parameters can be specified at the global, shared network, subnet and pool levels and are subject to the regular inheritance rules.

For example:
```json
{
    "Dhcp4": {
        "free-lease-queue": true,
        "free-lease-shuffle": true,
        "subnet4": [
            {
                "subnet": "10.0.0.0/8",
                "free-lease-queue": false
            }
        ]
    }
}
```
In this example, the subnet specific setting of `free-lease-queue` overrides the global setting, effectively disabling the use of free lease queues in the 10.0.0.0/8 subnet. All subnets for which neither of these parameters is overridden will use the global settings.

The following table summarizes the pros and cons of using the new settings. The administrator must consider pool sizes, pool utilization, memory size, machine performance and possibly other things specific to the deployment to select appropriate settings at different configuration levels.

| free-lease-queue | free-lease-shuffle | Pros | Cons |
| ---------------- | ------------------ | ---- | ---- |
| false | false | Fast startup, low memory consumption. | Network scans, slow free lease lookup. |
| true | false | Fast free lease lookup, no memory used for shuffling. | Slower startup, network scans. |
| false | true | Network scanning harder, no overhead for maintaining the queues. | Slow free lease lookup. |
| true | true | Network scanning harder, fast free lease lookup. | Slower startup, the largest memory consumption. |
## Code Changes
### New Allocator
The Kea allocation engine is a part of the `dhcpsrv` library, responsible for finding free leases and allocating them to the clients. It is also responsible for periodically reclaiming expired leases. An `allocator` is a replaceable part of the allocation engine providing a mechanism to find the next candidate lease. As of today, Kea has only one allocator, the iterative allocator, which is always used. This allocator satisfies the first case out of those listed above, i.e. both `free-lease-queue` and `free-lease-shuffle` are set to `false` at all levels.

In order to facilitate the other cases (lease queues and/or shuffling), a new allocator must be developed. Currently, only one allocator instance can be used during the server's run time. This implies that if any of these new configuration settings is set to `true` at any level, the new allocator must be used instead of the default iterative allocator. It also means that the new allocator must support the iterative allocation strategy, because this strategy may still be used for selected pools.

This raises the question of whether we should remove the iterative allocator altogether, given that the new allocator will cover its functionality. Personally, as the author of this design, I feel that it shouldn't be removed initially, until we make sure that the new allocator performs as well as the iterative allocator in terms of performance and functionality. The new allocator is currently considered experimental and therefore there must be a way to fall back to the "traditional", well tested iterative allocation strategy.

Another interesting question is whether we should be able to configure the server to use different allocators for different pools, subnets etc. This is a possibility which must be further explored. At first glance, however, it looks like it may bring too much complexity to the allocation engine state for the respective pools.

I propose that we use the iterative allocator when `free-lease-queue` and `free-lease-shuffle` are set to `false` at all configuration levels, and use the new allocator when any of these values is set to `true` at any configuration level.

The proposed name for this new allocator is `FLQAllocator`.
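To illustrate the proposed rule, below is a minimal sketch of the allocator selection logic. The function and type names are hypothetical; the real code would query the parsed configuration with inheritance already resolved.

```c++
// Sketch only: choose the allocator type based on the new configuration knobs.
// The arguments indicate whether the given parameter is true at any
// configuration level (global, shared network, subnet or pool).
enum class AllocatorType { ITERATIVE, FLQ };

AllocatorType selectAllocatorType(bool flq_enabled_anywhere,
                                  bool shuffle_enabled_anywhere) {
    // Fall back to the well tested iterative allocator unless FLQ or
    // shuffling is enabled somewhere in the configuration.
    if (flq_enabled_anywhere || shuffle_enabled_anywhere) {
        return (AllocatorType::FLQ);
    }
    return (AllocatorType::ITERATIVE);
}
```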
### Shuffling (Permutations)
The server is configured to offer addresses from contiguous address ranges (pools). When the allocation engine follows the iterative strategy, it finds the next address in the pool by incrementing a pool specific counter by one until it reaches the end of the pool. Then, it starts from the beginning of the address range until it reaches the address from which it began. This way the allocation engine ensures that it walks over all addresses in the address pool.

When the addresses are to be offered in a random order, it is significantly harder to ensure that the allocation engine walks over all addresses in the pool before proceeding to the next pool or starting again from the beginning of that pool.

A random arrangement of all elements (IP addresses in this case) in a given range is called a "permutation". The [Fisher-Yates Shuffle](https://en.wikipedia.org/wiki/Fisher–Yates_shuffle) is a well known algorithm for generating permutations. Using this algorithm we're going to create a new class, `IPRangePermutation`, which will be used to shuffle the addresses in a given address range and return them in a random order.

This class will expose the following interface:
```c++
IPRangePermutation(const AddressRange& range);
IPRangePermutation(const PrefixRange& range);
bool exhausted() const;
asiolink::IOAddress next(bool& done);
```
The first two are the constructors creating a new permutation for an address pool or a delegated prefix pool, respectively. The `exhausted()` function checks if we have reached the end of the permutation (the end of the pool), i.e. we have already stored all addresses from the pool in the FLQ or, if `free-lease-queue` is `false`, we have offered all addresses from the pool to the clients and should move back to its beginning. The `next` method returns the next (random) address from the permutation. The `next` method never returns the same address twice: it keeps an internal state remembering which addresses have already been returned and avoids returning them again.
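The following is a minimal sketch of how such a lazily evaluated Fisher-Yates shuffle could work. It is not the actual `IPRangePermutation` implementation: for brevity it operates on plain 32-bit IPv4 addresses rather than `asiolink::IOAddress`, and it does not cover delegated prefixes.

```c++
#include <cstdint>
#include <random>
#include <unordered_map>

// Lazily evaluated Fisher-Yates shuffle over an IPv4 address range. Values
// swapped out by earlier iterations are kept in a sparse map, so the full
// range never has to be materialized in memory.
class SimpleRangePermutation {
public:
    SimpleRangePermutation(uint32_t first, uint32_t last)
        : first_(first), remaining_(static_cast<uint64_t>(last) - first + 1),
          gen_(std::random_device{}()) {
    }

    bool exhausted() const {
        return (remaining_ == 0);
    }

    // Returns a randomly selected, not yet returned address. Sets done to
    // true when this call exhausts the permutation.
    uint32_t next(bool& done) {
        if (exhausted()) {
            done = true;
            return (0);
        }
        // Pick a random position within the not yet returned part of the range.
        std::uniform_int_distribution<uint64_t> dist(0, remaining_ - 1);
        uint64_t i = dist(gen_);
        uint64_t last = remaining_ - 1;
        uint32_t picked = valueAt(i);
        // Move the value from the last position into the picked slot and
        // shrink the range, so the picked value can never be returned again.
        uint32_t last_value = valueAt(last);
        state_[i] = last_value;
        state_.erase(last);
        --remaining_;
        done = exhausted();
        return (picked);
    }

private:
    // Returns the current value at the given position: either a value swapped
    // in earlier or the identity value first_ + position.
    uint32_t valueAt(uint64_t position) const {
        auto it = state_.find(position);
        return (it != state_.end() ? it->second
                                   : static_cast<uint32_t>(first_ + position));
    }

    uint32_t first_;
    uint64_t remaining_;
    std::mt19937 gen_;
    std::unordered_map<uint64_t, uint32_t> state_;
};
```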
We should consider adding a setting to this class that will cause it to always return the default permutation, i.e. addresses in ascending order. This will facilitate the case of iterative allocation with the `FLQAllocator`. It will also simplify the logic of the allocator, which will use the `IPRangePermutation` regardless of whether the server is configured to shuffle the leases or not.
### Free Lease Queue
The FLQ is a structure holding information about available leases in various IP address/prefix ranges. This structure has the following properties:

- holds the information about free leases for all configured pools for which `free-lease-queue` is `true`,
- can be used as a double-ended queue for an IP range, i.e. free leases are appended to the end of the queue and picked from its front,
- is optimized for appending a reclaimed lease at the end of the address range to which this lease belongs, i.e. the appropriate queue can be quickly found for a reclaimed lease and the lease appended to this queue.

Free leases are populated to the appropriate queues during the server's startup or reconfiguration. The server iterates over the entire (or, optionally, a part of the) IP range and for each address or delegated prefix it checks if it is free (there is no valid lease for it). Each free lease is appended to the FLQ. The order in which the leases are appended to the FLQ is determined by the `IPRangePermutation` instance for the given pool.
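Below is a minimal sketch of populating the queue for a single pool, reusing the `SimpleRangePermutation` sketch shown earlier. The helper names and the plain `std::deque` of 32-bit addresses are illustrative assumptions; the real code would use the `FreeLeaseQueue` interface described below and an in-memory copy of the leases fetched for the whole range (see "New Lease API Calls").

```c++
#include <cstdint>
#include <deque>
#include <set>

// Sketch only: append every address from the range that has no valid lease
// to the queue, in the (possibly shuffled) order produced by the permutation.
void populateFreeLeases(SimpleRangePermutation& permutation,
                        const std::set<uint32_t>& existing_leases,
                        std::deque<uint32_t>& queue) {
    while (!permutation.exhausted()) {
        bool done = false;
        uint32_t candidate = permutation.next(done);
        // Only addresses without a valid lease become available in the FLQ.
        if (existing_leases.count(candidate) == 0) {
            queue.push_back(candidate);
        }
    }
}
```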
The process of populating free leases is expensive and may significantly delay the server's reconfiguration. Thus, the server should compare the pools before and after reconfiguration and populate free leases only for new or modified pools. If no pools have been added or modified, no leases are populated. The free leases will always be populated when the server is (re)started.

The `FreeLeaseQueue` class includes, but is not limited to, the following calls for managing and retrieving free leases:
```c++
// Used in lease reclamation to append a reclaimed lease to the end of the queue.
bool append(const asiolink::IOAddress& address);
bool append(const asiolink::IOAddress& prefix, const uint8_t delegated_length);

// Used when free leases are populated to append a free lease to the range.
void append(const AddressRange& range, const asiolink::IOAddress& address);
void append(const PrefixRange& range, const asiolink::IOAddress& prefix);

// Used during lease allocation to indicate that the given lease is being allocated
// and should be removed from the FLQ.
bool use(const AddressRange& range, const asiolink::IOAddress& address);
bool use(const PrefixRange& range, const asiolink::IOAddress& prefix);

// Returns the next candidate lease without removing it from the FLQ.
template<typename RangeType>
asiolink::IOAddress next(const RangeType& range);

// Returns the next candidate lease and removes it from the FLQ.
template<typename RangeType>
asiolink::IOAddress pop(const RangeType& range);
```
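A brief sketch of how the allocator and the lease reclamation routine could use these calls; the surrounding functions are hypothetical and only illustrate the intended call flow:

```c++
// Sketch only: during allocation, take the next free address for the pool's
// range out of the FLQ; it will not be offered again until it is reclaimed.
asiolink::IOAddress offerFromQueue(FreeLeaseQueue& flq, const AddressRange& range) {
    return (flq.pop(range));
}

// Sketch only: during reclamation, return the reclaimed address to the end of
// the queue of the range it belongs to, making it available for allocation.
void returnToQueue(FreeLeaseQueue& flq, const asiolink::IOAddress& reclaimed) {
    flq.append(reclaimed);
}
```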
### New Lease API Calls
When populating free leases for an IP range, the server must check if a lease exists for each address or delegated prefix in this range. This has significant performance implications for the SQL/CQL lease database backends. It is unacceptable to make a separate database query for each address or delegated prefix. Instead, the backends should provide API calls fetching all valid leases within a given IP range. The server can then iterate over all addresses or delegated prefixes within the IP range and use the in-memory copy of the leases to decide which address/delegated prefix is free and should be recorded within the FLQ. This reduces the number of database queries to one per address range.
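Such calls could take the following shape. These signatures are illustrative only; the final method names and types would follow the existing `LeaseMgr` interface conventions.

```c++
// Hypothetical additions to the lease manager interface: fetch all valid
// leases whose addresses (or delegated prefixes) fall within the given range,
// using a single backend query.
virtual Lease4Collection getLeases4(const AddressRange& range) const = 0;
virtual Lease6Collection getLeases6(const PrefixRange& range) const = 0;
```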
### Lease Reclamation
Lease reclamation is a recurring task performed by the server to detect expired leases and make them available for new allocations. In the case of the iterative allocation, a lease can also be reclaimed "on the fly" when the allocation engine checks if the lease returned by the allocator is available and finds that it is expired. Even if it hasn't been reclaimed by the reclamation routine, it can be allocated. We sometimes call such an allocation "reusing an expired lease". Appropriate callouts are executed as in the case of reclamation via the reclamation routine, and the DDNS cleanup is also performed.

In the case of the FLQ allocation strategy, lease reuse will almost never occur. Once a lease is allocated, it is removed from the FLQ and is not returned to the FLQ until it is reclaimed. The FLQ allocator only offers the leases found in the FLQ. This makes it essential that lease reclamation is enabled at all times when using the FLQ allocation strategy. If it is disabled, the server may run out of available addresses even though the lease database contains expired leases which could be used for new allocations.

When the server is started and the FLQ is in use, the server must make it clear to the administrator (by issuing a log message) that the periodic lease reclamation must be enabled. The frequency of the lease reclamation is also important. If it runs too rarely, the FLQ may get exhausted, causing the allocation to fail even though there may be some expired leases to be reused.
## Optimizations for Large Pools
The FLQ mechanism is aimed at dealing with performance degradation when pools are nearly exhausted. This is rarely the case for very large pools, and it is assumed that for these pools administrators will rather disable the FLQ. However, it is hard to make definitive statements about when a pool is too large for using FLQ and when using FLQ is beneficial with an acceptable server startup time. It depends on many factors and may require some system administration experience before well informed decisions can be made. Some administrators may want to leave the FLQ enabled for all pools initially. Some may make configuration errors and leave FLQ enabled for a pool as large as a /64. The server should handle these cases gracefully and shouldn't freeze on startup.

Assuming that X (TBD) is the maximum number of leases that can be loaded into the FLQ at once for a given pool, we should provide a capability to populate at most X free leases at startup and then add X more to the FLQ every time the number of leases in the FLQ drops below Y. The process of refilling the FLQ should be recurring, as in the case of the lease reclamation routine. Perhaps queue refilling and lease reclamation could even be combined?
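A minimal sketch of such a recurring refill check for one pool, reusing the earlier sketches; the `flq_low_water_mark` (Y) and `flq_chunk_size` (X) parameter names are hypothetical:

```c++
#include <cstddef>
#include <cstdint>
#include <deque>
#include <set>

// Sketch only: top the queue up with at most flq_chunk_size (X) more free
// leases once its size drops below flq_low_water_mark (Y).
void maybeRefillQueue(std::deque<uint32_t>& queue,
                      SimpleRangePermutation& permutation,
                      const std::set<uint32_t>& existing_leases,
                      size_t flq_low_water_mark,
                      size_t flq_chunk_size) {
    if (queue.size() >= flq_low_water_mark) {
        return;
    }
    size_t added = 0;
    // Stop when the permutation is exhausted: every address in the range is
    // now allocated, expired or already sitting in the FLQ.
    while (!permutation.exhausted() && (added < flq_chunk_size)) {
        bool done = false;
        uint32_t candidate = permutation.next(done);
        if (existing_leases.count(candidate) == 0) {
            queue.push_back(candidate);
            ++added;
        }
    }
}
```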
The server stops refilling a given pool with leases when the `IPRangePermutation` associated with this pool signals that the permutation is exhausted, i.e. all leases belonging to the IP range are either allocated, expired or within the FLQ.
## Compatibility with Different Lease Backends
The presented solution is compatible with all supported lease database backends. It can also be used with proprietary lease database backends, should they be created. The backends must implement the new calls fetching valid leases for an IP range, described earlier in this design.

Because the server maintains the FLQ in its memory and free leases are populated during startup, reconfiguration or while refilling the FLQ, there is a possibility that the information within the FLQ and the lease database diverge. For example, this is the case when two or more Kea servers share the lease database and independently allocate leases. One of the servers "thinks" that a lease is free, but the other server has allocated this lease to another client. This design does not deal with such conflicts in any special way. It relies on the existing allocation engine implementation, which already resolves such conflicts. If the engine finds that a lease exists for the candidate address or delegated prefix, it simply gets the next candidate address from an allocator. The process ends after as many attempts as the size of the pool. In the case of a shared database this certainly increases the average number of calls to the FLQ to get a free lease; on the other hand, the risk of such collisions should be partially mitigated by the use of permutations.
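For illustration, a minimal sketch of this conflict tolerant lookup, under the same simplifying assumptions as the earlier sketches (plain 32-bit addresses, a `std::deque` standing in for the FLQ, and a caller supplied `lease_exists` check against the shared lease database):

```c++
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>

// Sketch only: pop candidates from the queue until one is confirmed free in
// the lease database, giving up after as many attempts as the pool size.
// Returns 0 (standing in for "no address") when nothing can be offered.
uint32_t findFreeAddress(std::deque<uint32_t>& queue,
                         const std::function<bool(uint32_t)>& lease_exists,
                         size_t pool_size) {
    for (size_t attempt = 0; attempt < pool_size && !queue.empty(); ++attempt) {
        uint32_t candidate = queue.front();
        queue.pop_front();
        // Another server sharing the database may have allocated this address
        // after it was placed in the FLQ, so verify it before offering it.
        if (!lease_exists(candidate)) {
            return (candidate);
        }
    }
    return (0);
}
```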