The design issue for this feature is #1126.
|
|
|
|
|
The new lease lookup scheme is going to shift some logic from the allocation engine to the lease backend implementations. We're migrating from the lease allocation strategy in which the allocation engine "blindly" picks a candidate lease and then verifies whether it is available, to a strategy in which the lease backend finds and offers a candidate lease.
|
|
|
|
|
|
|
|
|
Even though we can't always guarantee that the lease offered by the backend can be allocated to the given client, shifting the lease selection to the backend should significantly reduce the number of costly interactions between the allocation engine and the lease backend. In particular, it guarantees that if there is a free lease in a given address range it will be offered to the allocation engine with a single query, even if it is the last available lease in the pool. We're going to call this strategy Backend Assisted Lease Selection (BALS). In the current code, the lease backend passively responds to queries about whether the lease picked by the allocation engine is available. In the extreme case, the allocation engine can make a number of queries equal to the size of the address pool to find the last available lease. If the pool is exhausted, it will make all these queries only to find out that there are no more available leases. With Backend Assisted Lease Selection the allocation engine will learn that the pool is exhausted with a single query. It can then try other pools or even subnets, if the subnets are grouped in shared networks.
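To make the difference in query counts concrete, here is a minimal sketch (hypothetical names and types, not Kea code) that simulates both strategies against a pool whose addresses are all taken: the probing strategy needs one backend query per candidate address, while a backend assisted lookup answers in a single call.

```cpp
#include <cstdint>
#include <optional>
#include <set>

// Hypothetical model: a pool of addresses [start, end] and the set of
// addresses that are already leased.
struct PoolModel {
    uint32_t start;
    uint32_t end;
    std::set<uint32_t> allocated;
};

// Current strategy: the engine blindly probes candidates one by one,
// asking the backend whether each is free. Returns the number of backend
// queries made before finding a free address or giving up.
uint64_t probeUntilExhausted(const PoolModel& pool) {
    uint64_t queries = 0;
    for (uint32_t addr = pool.start; addr <= pool.end; ++addr) {
        ++queries;                      // one backend query per candidate
        if (pool.allocated.count(addr) == 0) {
            return queries;             // found a free address
        }
    }
    return queries;                     // exhausted: one query per address
}

// BALS: the backend scans its own data and answers with a candidate
// (or "none") in a single query.
std::optional<uint32_t> backendFindAvailable(const PoolModel& pool) {
    for (uint32_t addr = pool.start; addr <= pool.end; ++addr) {
        if (pool.allocated.count(addr) == 0) {
            return addr;
        }
    }
    return std::nullopt;                // pool exhausted, still one query
}
```

For a fully allocated pool of N addresses, the probing strategy issues N queries to learn that nothing is available; the BALS call conveys the same answer at once.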
|
|
|
|
|
|
|
|
|
The lease offered by the backend is available for allocation from the backend's perspective. There may be some additional conditions to be tested on the allocation engine's side prior to offering the lease to the client, e.g. checking that the given lease is not reserved for another client. If any of these additional conditions is not met, the allocation engine sends additional queries to the lease backend asking for another lease. This makes it similar to the currently used allocation strategy in that it can take several attempts before the lease is allocated; however, the number of such attempts is brought to a minimum. Most of the time a single call should be sufficient to find and allocate a candidate lease. If the server is configured to assign in-pool host reservations, it is possible that the candidate address is reserved for another client, in which case at least two queries are needed. If the pool includes many host reservations, the number of attempts to find an available lease may grow significantly, but this should be no worse than the current situation. In fact, filling the address pool with in-pool reservations should be considered a misconfiguration. In that case it is recommended to use out-of-pool reservations, which should minimize the risk of collisions or eliminate them entirely.
|
|
|
|
|
|
# Backends Compatibility
|
|
|
|
|
|
|
|
|
The concepts described in this design will have to be implemented for different lease database backends. Due to the complexity of the allocation engine and the variety of lease database backends supported by Kea, it is impractical to implement BALS for all backends at once or even within a single Kea release. The likely approach is to start with the Memfile backend, then MySQL and PostgreSQL. Support for BALS in Cassandra may never be added, or will be added at the very end. Therefore, we must assume the coexistence of BALS and the current lease selection strategy using the `IterativeAllocator`.
|
|
|
|
|
|
|
|
|
In order to take advantage of the existing allocation engine code and keep the required changes to a minimum, this design proposes that a new allocator be created (in addition to the existing `IterativeAllocator`) which will support BALS. This allocator will be similar to the `IterativeAllocator`, but instead of walking over the addresses in the address pool it will make calls to the lease backend to get the next available address.
|
|
|
|
|
|
|
|
|
Depending on the backend's capabilities, one allocator or the other will be selected upon the server's startup or reload. The changes in the allocation engine to support the new allocator are expected to be minimal.
|
|
|
|
|
|
# New API Calls
|
|
|
|
|
|
```c++
// Returns a bitmask indicating the lease types for which the backend
// supports backend assisted lease selection.
uint8_t getLeaseSelectionSupport() const;

// Find an available lease of the given type within the address range
// [range_start, range_end], continuing past the last offered address.
Lease4Ptr findAvailableLease4(IOAddress range_start, IOAddress range_end, IOAddress last_offered);

Lease6Ptr findAvailableLeaseNA(IOAddress range_start, IOAddress range_end, IOAddress last_offered);

// Variants carrying the number of attempts already made for the given
// client/packet.
Lease4Ptr findAvailableLease4(IOAddress range_start, IOAddress range_end, IOAddress last_offered, uint64_t attempt_count);

Lease6Ptr findAvailableLeaseNA(IOAddress range_start, IOAddress range_end, IOAddress last_offered, uint64_t attempt_count);

// ... prefix delegation TBD ...
```
|
|
|
|
```c++
const uint8_t LEASE_SELECTION_PD = 0x04;

const uint8_t LEASE_SELECTION_ALL = 0x07;
```
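Only the `LEASE_SELECTION_PD` and `LEASE_SELECTION_ALL` constants survive in this excerpt; the `0x01`/`0x02` flags for the remaining lease types in the sketch below are assumptions inferred from `LEASE_SELECTION_ALL == 0x07`. This shows how the bitmask returned by `getLeaseSelectionSupport()` could drive allocator selection at startup:

```cpp
#include <cstdint>

// Flags from the design; the first two values are assumed, inferred
// from LEASE_SELECTION_ALL == 0x07 (0x01 | 0x02 | 0x04).
const uint8_t LEASE_SELECTION_V4  = 0x01;  // assumed
const uint8_t LEASE_SELECTION_NA  = 0x02;  // assumed
const uint8_t LEASE_SELECTION_PD  = 0x04;
const uint8_t LEASE_SELECTION_ALL = 0x07;

// Sketch: use the backend assisted allocator for a lease type only when
// the backend reports BALS support for it; otherwise fall back to the
// iterative allocator. The function name is illustrative.
bool useBackendAssistedAllocator(uint8_t backend_support, uint8_t lease_type_flag) {
    return (backend_support & lease_type_flag) != 0;
}
```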
|
|
|
|
|
|
|
|
|
The remaining functions are used by the allocation engine to get the next available lease of a given type for a given pool. The pool is specified as a pair of IP addresses denoting the beginning and the end of the pool. The third argument designates the last offered lease for the given pool. The allocation engine maintains the pointers to the last offered addresses for the individual pools. Providing the last offered address and properly updating it within the allocation engine's allocator guarantees that the backend will avoid returning the same address when the allocation engine makes subsequent attempts to get an available address for a client.
|
|
|
|
|
|
The fourth argument is a counter of attempts made to get an available address for a given client/packet. It should be set to 0 upon the first attempt and incremented when the offered address is rejected by the allocation engine and another attempt is made.
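The retry loop described above can be sketched as follows, with addresses modeled as `uint32_t` and a toy backend standing in for the real lease backend (all names here are illustrative, not Kea's API):

```cpp
#include <cstdint>
#include <optional>
#include <set>

// Toy backend: offers the next free address after last_offered within
// [range_start, range_end], or "none" when the range is exhausted.
struct FakeBackend {
    std::set<uint32_t> free_addresses;

    std::optional<uint32_t> findAvailable(uint32_t range_start, uint32_t range_end,
                                          uint32_t last_offered, uint64_t /*attempt_count*/) {
        auto it = free_addresses.upper_bound(last_offered);
        if (it != free_addresses.end() && *it >= range_start && *it <= range_end) {
            return *it;
        }
        return std::nullopt;
    }
};

// Engine side: keep asking for the next candidate, updating last_offered
// and the attempt counter, until a candidate passes the additional checks
// (e.g. it is not reserved for another client).
template <typename IsUsable>
std::optional<uint32_t> allocate(FakeBackend& backend, uint32_t range_start,
                                 uint32_t range_end, IsUsable is_usable) {
    uint32_t last_offered = range_start - 1;   // nothing offered yet
    for (uint64_t attempt = 0;; ++attempt) {
        auto candidate = backend.findAvailable(range_start, range_end,
                                               last_offered, attempt);
        if (!candidate) {
            return std::nullopt;               // range exhausted
        }
        if (is_usable(*candidate)) {
            return candidate;                  // usually the first attempt
        }
        last_offered = *candidate;             // never re-offered
    }
}
```

Because `last_offered` advances on every rejection, the backend never returns the same address twice for the same client, matching the guarantee described above.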
|
|
|
|
|
|
# Memfile
|
|
|
|
|
|
The Memfile is the first backend to support BALS.
|
|
|
|
|
|
|
|
|
Currently, the backend stores the information about allocated leases in a multi index container. It is straightforward and efficient to query the container for a range of allocated leases, a single allocated lease, etc. However, this container is not designed for efficient searches for available leases. Such a search would involve looking both into the defined pools and into the leases stored in the multi index container. By eliminating the allocated leases from the pool it would be possible to find a collection of unallocated/available leases, but this process would be quite inefficient, as proven by some earlier attempts we have made to implement it.
|
|
|
|
|
|
The idea to improve the process of finding an available lease is based on extending the backend to include another container holding the information about **available** leases. Finding available leases in such a container is as efficient as finding an allocated lease in the existing multi index container. However, the usage of the additional container brings some complexity both to the backend and to the server's configuration to ensure data integrity.

The new complexity of the backend and the server's configuration is described in detail in the following subsections.
|
|
|
|
|
|
|
|
|
## Containers
|
|
|
|
|
|
|
|
|
The design proposes to have two distinct containers: one for holding the allocated leases (A-container) and a new container for holding unallocated leases (U-container). These containers have different requirements for indexing, so it would be impractical to combine the information about the allocated and available leases in a single container. When a lease gets allocated, it becomes necessary to remove this lease from the unallocated leases and create a new lease entry in the container holding allocated leases. Similarly, lease deletion should result in removal of the allocated lease and creation of a new entry in the container holding unallocated leases. There are also some other interesting cases to consider: a lease that is expired or expired-reclaimed. In both cases the lease is technically available for allocation but the lease entry exists in the A-container. The following table summarizes which container holds a lease entry depending on the lease state.
|
|
|
|
|
|
| Lease state | A-container | U-container |
| ----------- | ----------- | ----------- |
| expired | yes | no |
| reclaimed | yes | yes |
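The allocate/release bookkeeping between the two containers could be sketched as follows, with addresses modeled as `uint32_t` (the struct and method names are illustrative, not Kea's API):

```cpp
#include <cstdint>
#include <set>
#include <stdexcept>

// Minimal sketch of the A-/U-container bookkeeping described above.
struct LeaseStore {
    std::set<uint32_t> a_container;  // allocated leases
    std::set<uint32_t> u_container;  // unallocated (available) leases

    // Allocating a lease moves its entry from the U- to the A-container.
    void allocate(uint32_t addr) {
        if (u_container.erase(addr) == 0) {
            throw std::runtime_error("address not available");
        }
        a_container.insert(addr);
    }

    // Deleting a lease moves its entry back from the A- to the U-container.
    void release(uint32_t addr) {
        if (a_container.erase(addr) == 0) {
            throw std::runtime_error("address not allocated");
        }
        u_container.insert(addr);
    }
};
```

Keeping both moves in one place makes it easier to preserve the invariant that an address never appears as available and allocated at the same time (the expired/reclaimed states in the table above are the deliberate exceptions handled separately).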
|
|
|
|
|
|
|
|
|
The `expired` lease in this case is the one for which the valid lifetime elapsed and the client failed to renew it, but the server hasn't yet reclaimed this lease. The lease is reclaimed via the periodic reclamation process triggered by the server according to the user settings in the server's configuration file. The time window when the lease is expired and not yet reclaimed strictly depends on the server's configuration. In extreme cases an expired lease may never be reclaimed. It should still be possible to allocate such a lease, though.
|
|
|
|
|
|
The most efficient way to find an available lease is by getting one from the U-container. In most cases, the first lease found within this container will be accepted by the allocation engine and there will be no need to reuse an expired lease from the A-container. If the lease reclamation routine is enabled, the expired leases will eventually be moved to the U-container by this routine. Looking for expired leases in the A-container brings additional overhead to packet processing, but can't be completely avoided. One case is when the lease reclamation routine is disabled. Another case is when the allocation engine refuses to use the leases returned by the backend, typically because of the conflicts with the host reservations. In those two situations the backend assisted allocator should be able to reach for the expired leases.
|
|
|
|
|
|
The backend assisted allocator should access server's configuration to determine whether the lease reclamation routine is enabled or not. If it is disabled, it should make an attempt to transfer expired leases from the A-container to the U-container upon each attempt to find an available lease. Without this, the allocator wouldn't be able to offer expired leases. In another case, when the lease reclamation is enabled, it is better to rely on it to transfer expired leases from the A-container to the U-container and not cause additional overhead for packet processing.
|
|
|
|
|
|
|
|
|
The `attempts_count` is used by the backend assisted allocator to determine when the transfer of expired leases should take place in cases when the lease reclamation is enabled. This design proposes that such transfer should take place when the `attempts_count` is greater than 0. The `attempts_count` is increased by the allocation engine while trying to allocate the lease. It is set to 0 upon the first attempt. If the allocation engine rejects the lease offered by the backend for the given client it will set it to 1 and retry. Such retry is a signal to the allocator that there may be a deficiency in available leases within the U-container and the transfer of expired leases should be attempted.
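The resulting decision logic is small; a sketch under the rules above (the function name is illustrative):

```cpp
#include <cstdint>

// Transfer expired leases from the A-container to the U-container on
// every lookup when the periodic reclamation routine is disabled, and
// otherwise only on a retry (attempts_count > 0), which signals a
// possible deficiency of available leases in the U-container.
bool shouldTransferExpired(bool reclamation_enabled, uint64_t attempts_count) {
    if (!reclamation_enabled) {
        return true;   // no routine will ever move them, so do it here
    }
    return attempts_count > 0;  // retry hints the U-container ran short
}
```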
|
|
|
|
|
|
Note that the `attempts_count` is maintained per packet (and thus per thread). It is reset to 0 when a new packet comes in.
|
|
|
|
|
|
|
|
|
The U-container must have an index ordered by IP address so that the backend can efficiently access available addresses within different IP ranges. In order to avoid iterative allocation (e.g. 192.0.2.1, 192.0.2.2...), there should be another index which holds unique hashes computed from the IP addresses. This makes it possible to order a result set containing multiple IP addresses by hash, thus offering leases in a hard-to-predict order of addresses. Different backends may apply different hashing functions. In fact, the mechanism may even be extensible, and the hashing algorithm could be configurable if we choose to do so.
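For illustration, here is a sketch of a hash-ordered index using Knuth's multiplicative hash (an arbitrary example choice; as noted above, backends may use different functions):

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Available addresses are keyed by a hash of the address, so iterating
// the index yields them in an order unrelated to the numeric order.
uint32_t addressHash(uint32_t addr) {
    return addr * 2654435761u;  // Knuth's multiplicative hash (example)
}

std::vector<uint32_t> offersInHashOrder(const std::vector<uint32_t>& available) {
    std::map<uint32_t, uint32_t> by_hash;  // hash -> address
    for (uint32_t addr : available) {
        by_hash[addressHash(addr)] = addr;
    }
    std::vector<uint32_t> ordered;
    for (const auto& entry : by_hash) {
        ordered.push_back(entry.second);
    }
    return ordered;
}
```

A real backend would keep such an index incrementally (e.g. as another index of a multi index container) rather than rebuilding it per query; the sketch only demonstrates the ordering effect.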
|
|
|
|
|
|
The contents of the U-container are not persisted. They are stored in memory only. If the server is restarted the A-container is restored from the lease file, the pools are read from the configuration storage and finally the U-container is recreated using combined information from the A-container and the pools' configurations.
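The rebuild step could be sketched as a set difference between the configured pool and the restored A-container (addresses modeled as `uint32_t`; names are illustrative):

```cpp
#include <cstdint>
#include <set>

// The U-container is not persisted: recreate it at startup as "pool
// addresses minus addresses having an entry in the A-container".
std::set<uint32_t> rebuildUContainer(uint32_t pool_start, uint32_t pool_end,
                                     const std::set<uint32_t>& a_container) {
    std::set<uint32_t> u_container;
    for (uint32_t addr = pool_start; addr <= pool_end; ++addr) {
        if (a_container.count(addr) == 0) {
            u_container.insert(addr);
        }
    }
    return u_container;
}
```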
|
|
|
|