- Backend Assisted Lease Selection Design
- Basic Concepts
- Backends Compatibility
- New API Calls
- MySQL and Postgres
- Implementation Tasks
Backend Assisted Lease Selection Design
This is a design document describing a proposed solution to optimize the process by which Kea finds and offers available leases to the requesting DHCP client. It improves the efficiency of the lease allocation stage, in which the allocator picks candidate leases iteratively and checks whether they are available for allocation. The current approach has been proven to be highly inefficient when the address pools become exhausted or nearly exhausted.
The features described in this design are sometimes called "leases pre-allocation", because the concept of storing information about unallocated leases in the lease database is one of the core ideas to improve lease allocation performance.
The design issue for this feature is #1126.
Basic Concepts
The new lease lookup scheme is going to shift some logic from the allocation engine to the lease backend implementations. We're migrating away from the lease allocation strategy in which the allocation engine "blindly" picks a candidate lease and then verifies if it is available, to the strategy in which the lease backend finds and offers a candidate lease.
Even though we can't always guarantee that the lease offered by the backend can be allocated to the given client, shifting the lease selection to the backend should significantly reduce the number of costly interactions between the allocation engine and the lease backend. In particular, it guarantees that if there is a free lease in a given address range it will be offered to the allocation engine with a single query, even if it is the last available lease in the pool. We're going to call this strategy Backend Assisted Lease Selection (BALS). In the current code, the lease backend passively responds to queries asking whether the lease picked by the allocation engine is available or not. In the extreme case, the allocation engine can make a number of queries equal to the size of the address pool to find the last available lease. If the pool is exhausted, it will make all these queries only to find out that there are no more available leases. With Backend Assisted Lease Selection the allocation engine will learn that the pool is exhausted with a single query. It can then try other pools or even subnets, if the subnets are grouped in shared networks.
The lease offered by the backend is available for allocation from the backend's perspective. There may be some additional conditions to be tested on the allocation engine's side prior to offering the lease to the client, e.g. checking that the given lease is not reserved for another client. If any of these additional conditions is not met, the allocation engine sends additional queries to the lease backend asking for another lease. This makes it similar to the currently used allocation strategy in that it can take several attempts before the lease is allocated; however, the number of such attempts is brought to a minimum. Most of the time a single call should be sufficient to find and allocate a candidate lease. If the server is configured to assign in-pool host reservations, it is possible that the candidate address is reserved for another client, in which case at least two queries are needed. If the pool includes many host reservations, the number of attempts to find an available lease may grow significantly, but this should be no worse than the current situation. In fact, filling the address pool with in-pool reservations should be considered a misconfiguration. In that case it is recommended to use out-of-pool reservations, which should minimize the risk of collisions or eliminate them entirely.
Backends Compatibility
The concepts described in this design will have to be implemented for different lease database backends. Due to the complexity of the allocation engine and the variety of lease database backends supported by Kea, it is impractical to implement BALS for all backends at once or even within a single Kea release. The likely approach is to start with the Memfile backend, then MySQL and PostgreSQL. Support for BALS in Cassandra may never be added or will be added at the very end. Therefore, we must assume the coexistence of BALS and the current lease selection strategy using the IterativeAllocator.
In order to take advantage of the existing allocation engine code and keep the required changes to a minimum, this design proposes that a new allocator be created (in addition to the existing IterativeAllocator) which will support BALS. This allocator will be similar to the IterativeAllocator, but instead of walking over the addresses in the address pool it will make calls to the lease backend to get the next available address. Depending on the backend's capabilities, one or the other allocator will be selected upon the server's startup or reload. The changes in the allocation engine to support the new allocator are expected to be minimal.
New API Calls
```cpp
uint8_t getLeaseSelectionSupport() const;

void preallocateLeases4(IOAddress range_start, IOAddress range_end);
void preallocateLeases6(IOAddress range_start, IOAddress range_end);

Lease4Ptr findAvailableLease4(IOAddress range_start, IOAddress range_end,
                              IOAddress last_offered, uint64_t attempt_count);
Lease6Ptr findAvailableLeaseNA(IOAddress range_start, IOAddress range_end,
                               IOAddress last_offered, uint64_t attempt_count);

// ... prefix delegation TBD ...
```
The getLeaseSelectionSupport() function will be implemented by all backends and will return a bitwise value indicating whether the given backend supports BALS and for which lease types. Based on that, the allocation engine will be able to determine which code path to take for lease allocation. The following flags will be pre-defined:
```cpp
const uint8_t LEASE_SELECTION_NONE  = 0x00;
const uint8_t LEASE_SELECTION_ADDR4 = 0x01;
const uint8_t LEASE_SELECTION_NA    = 0x02;
const uint8_t LEASE_SELECTION_PD    = 0x04;
const uint8_t LEASE_SELECTION_ALL   = 0x07;
```
The preallocateLeases4 and preallocateLeases6 functions are to be triggered upon the server's first configuration, reconfiguration, or configuration update, for the individual address pools for which preallocation should take place. As a result, the backend-specific implementations will create appropriate lease entries to mark the leases that are available.
The remaining functions are used by the allocation engine to get the next available lease of a given type for a given pool. The pool is specified as a pair of IP addresses denoting the beginning and the end of the pool. The third argument designates the last offered lease for the given pool. The allocation engine maintains the pointers to the last offered addresses for the individual pools. Providing the last offered address and properly updating it within the allocation engine's allocator guarantees that the backend will avoid returning the same address when the allocation engine makes subsequent attempts to get an available address for a client.
The fourth argument is a counter of attempts made to get an available address for a given client/packet. It should be set to 0 upon the first attempt and incremented when the offered address is rejected by the allocation engine and another attempt is made.
The Memfile is the first backend to support BALS.
Currently, the backend stores the information about allocated leases in a multi-index container. It is straightforward and efficient to query the container for a range of allocated leases, a single allocated lease, etc. However, this container is not designed for efficient searches for available leases. Such a search would involve both the defined pools and the leases stored in the multi-index container: by eliminating the allocated leases from the pool it would be possible to find a collection of unallocated/available leases. This process would be quite inefficient, as demonstrated by earlier attempts we have made to implement it.
The idea to improve the process of finding an available lease is based on extending the backend to include another container holding the information about available leases. Finding available leases in such a container is as efficient as finding an allocated lease in the existing multi-index container. However, the usage of the additional container brings some complexity both to the backend and to the server's configuration process, to ensure data integrity.
The design proposes to have two distinct containers: one for holding the allocated leases (A-container) and a new container for holding unallocated leases (U-container). These containers have different requirements for indexing, so it would be impractical to combine the information about the allocated and available leases in a single container. When a lease gets allocated it becomes necessary to remove this lease from the U-container and create a new lease entry in the A-container. Similarly, lease deletion should result in removal of the allocated lease and creation of a new entry in the U-container. There are also some other interesting cases to consider: a lease that is expired or expired-reclaimed. In both cases the lease is technically available for allocation, but the lease entry exists in the A-container. To summarize: allocated, expired, and expired-reclaimed leases have entries in the A-container, while unallocated leases have entries in the U-container.
An expired lease in this case is one for which the valid lifetime has elapsed and the client has failed to renew it, but the server hasn't yet reclaimed it. The lease is reclaimed via the periodic reclamation process triggered by the server according to the user settings in the server's configuration file. The time window during which the lease is expired but not yet reclaimed strictly depends on the server's configuration. In extreme cases an expired lease may never be reclaimed. It should still be possible to allocate such a lease, though.
The most efficient way to find an available lease is by getting one from the U-container. In most cases, the first lease found within this container will be accepted by the allocation engine and there will be no need to reuse an expired lease from the A-container. If the lease reclamation routine is enabled, the expired leases will eventually be moved to the U-container by this routine. Looking for expired leases in the A-container brings additional overhead to packet processing, but can't be completely avoided. One case is when the lease reclamation routine is disabled. Another case is when the allocation engine refuses to use the leases returned by the backend, typically because of the conflicts with the host reservations. In those two situations the backend assisted allocator should be able to reach for the expired leases.
The backend assisted allocator should access the server's configuration to determine whether the lease reclamation routine is enabled. If it is disabled, the allocator should attempt to transfer expired leases from the A-container to the U-container upon each attempt to find an available lease; without this, the allocator wouldn't be able to offer expired leases. In the other case, when lease reclamation is enabled, it is better to rely on it to transfer expired leases from the A-container to the U-container and not cause additional overhead for packet processing.
The attempt_count parameter is used by the backend assisted allocator to determine when the transfer of expired leases should take place in cases when lease reclamation is enabled. This design proposes that such a transfer take place when attempt_count is greater than 0. The attempt_count is increased by the allocation engine while trying to allocate a lease: it is set to 0 upon the first attempt, and if the allocation engine rejects the lease offered by the backend for the given client, it sets it to 1 and retries. Such a retry is a signal to the allocator that there may be a deficiency of available leases within the U-container and that a transfer of expired leases should be attempted.

Note that attempt_count is maintained per packet (and thus per thread); it is reset to 0 when a new packet comes in.
The U-container must have an index ordered by IP address so that the backend can efficiently access available addresses within different IP ranges. In order to avoid iterative allocation (e.g. 192.0.2.1, 192.0.2.2, ...), another index should be created which holds unique hashes computed from the IP addresses. This makes it possible to order a result set containing multiple IP addresses by hash, thus offering the leases in a hard-to-predict order of addresses. Different backends may apply different hashing functions. In fact, this may even be extensible, and the hashing algorithm could be made configurable if we choose to do so.
The contents of the U-container are not persisted. They are stored in memory only. If the server is restarted the A-container is restored from the lease file, the pools are read from the configuration storage and finally the U-container is recreated using combined information from the A-container and the pools' configurations.
There are several different ways to configure pools to be served by a Kea server:
- Kea configuration files,
- subnet_cmds hooks library,
- cb_cmds hooks library.
A change in the pools configuration should trigger pre-allocation of leases for the updated pools. Pre-allocation may be a time-consuming operation, depending on the size of the configured pools. The server should be able to verify whether the latest configuration change affected any pools. If no pools have been modified as a result of the reconfiguration, the pre-allocation step should be skipped. If only some pools have been modified, the pre-allocation should be performed only for the modified pools.
There is no standard way in which the pre-allocation could be triggered for all configuration methods listed above.
When the server is reloaded, it should be able to compare the old and new configurations and detect which pools have been added. The preallocateLeasesX function should be executed as a final configuration step for all new pools. We could potentially make such checks during the commit() call on the CfgMgr, but a PoC would be needed to confirm whether this is a good idea.
The subnet_cmds and configuration backend add/modify subnets directly in the current configuration (not the staging configuration), which doesn't involve a commit phase. Therefore, we'd need to trigger preallocateLeasesX whenever they update the current configuration directly. Again, the best place to trigger pre-allocation is to be determined.
MySQL and Postgres
The SQL backends store information about allocated leases in the lease4 and lease6 tables. Similarly to the Memfile backend, finding an available lease using only the information about allocated leases is inefficient, and thus storing the information about unallocated leases is required.
The challenging part is how to efficiently populate the unallocated leases into the lease4_avail and lease6_avail tables. Let's start with the IPv4 case.
Tables and Queries for IPv4
The available addresses are populated for individual address pools. For each such address pool we have to make a query which finds "gaps" in the lease4 table and creates the corresponding entries in the lease4_avail table. IPv4 addresses are represented as 4-byte integers in the database. They can be sorted in ascending order as a sequence of numbers, and the typical techniques for finding gaps in a sequence of numbers can be applied.
The following query in PostgreSQL:
```sql
SELECT avail
FROM generate_series(('192.0.2.10'::inet - '0.0.0.0'::inet),
                     ('192.0.2.100'::inet - '0.0.0.0'::inet), 1) AS avail
LEFT JOIN lease4 ON avail = lease4.address
WHERE lease4.address IS NULL;
```
selects all unallocated addresses (not found in the lease4 table) for the pool 192.0.2.10 - 192.0.2.100. Those addresses can be inserted into the lease4_avail table to mark that they are available for allocation. The ON CONFLICT DO NOTHING construct can be used with this insert to bypass duplicates if some of these entries already exist in the lease4_avail table.
A trigger should be added on the lease4_avail table to delete entries as a result of inserting leases into the lease4 table. Similarly, deleting a lease from the lease4 table should trigger an insertion into the lease4_avail table.
IPv6 to be described
Pseudo Random Lease Selection
Randomizing available lease selection from the pool can be achieved by adding a hashing column to the lease4_avail table. The following query extends the query provided in the previous section: besides inserting the available addresses, it also adds an MD5 digest to the table:
```sql
WITH avail_addresses AS (
    SELECT avail
    FROM generate_series(('192.0.2.10'::inet - '0.0.0.0'::inet),
                         ('192.0.2.100'::inet - '0.0.0.0'::inet), 1) AS avail
    LEFT JOIN lease4 ON avail = lease4.address
    WHERE lease4.address IS NULL
)
INSERT INTO lease4_avail (address, address_hash)
SELECT avail, md5(avail::text) FROM avail_addresses;
```
The lease4_avail table is created as follows:
```sql
CREATE TABLE lease4_avail (
    address bigint NOT NULL PRIMARY KEY,
    address_hash text NOT NULL
);
CREATE UNIQUE INDEX lease4_avail_address_hash_unique
    ON lease4_avail (address_hash);
```
Then a query similar to the following can be used to retrieve the next available address:
```sql
SELECT * FROM lease4_avail
WHERE address >= 3221225994 AND address <= 3221226084
      AND address_hash > md5(3221226082::text)
ORDER BY address_hash;
```
3221226082 is the integer representation of the last IP address returned for the given pool. The values 3221225994 and 3221226084 designate the pool range.
This is an example of the order in which the addresses from 192.0.2.10 to 192.0.2.100 are returned with such a hashing algorithm:
```
   address
-------------
 192.0.2.71
 192.0.2.91
 192.0.2.70
 192.0.2.90
 192.0.2.42
 192.0.2.47
 192.0.2.23
 192.0.2.72
 192.0.2.76
 192.0.2.51
 192.0.2.28
 192.0.2.38
 192.0.2.64
 192.0.2.14
 192.0.2.36
 192.0.2.53
 192.0.2.43
 192.0.2.21
 192.0.2.50
 192.0.2.60
 192.0.2.87
 192.0.2.99
 192.0.2.17
 192.0.2.57
 192.0.2.78
 192.0.2.37
 192.0.2.100
 ...
```
Considerations About Multiple Kea Instances
It appears to be a common use case to have multiple Kea servers connected to the same database. These servers share lease information in the lease4 and lease6 tables. The SQL trigger mechanism addresses the problem of consistency between the leaseX and the leaseX_avail tables during normal operation. The aspect which requires some consideration is the servers' startup and reconfiguration phases.
Each server connected to the database is configured with a set of address pools. The servers may be configured with different address pools, the same address pools, or overlapping address pools. The server connecting to the lease database is unaware whether any other server is connected to it and which pools that server is configured to serve addresses from. The server connecting to the database is interested in "pre-allocating" the leases for the pools it is configured to serve, but it must not delete any existing entries in the leaseX_avail tables, as they might have been inserted by another Kea instance according to its configuration. The server connecting to the database should traverse the addresses in its own pools and create the corresponding entries in the leaseX_avail tables. If any of the entries already exist (because the pools of the two servers overlap), the server skips to the next address, leaving the existing entry in place.
Note: skipping an existing entry upon insert can be achieved using the ON CONFLICT DO NOTHING construct in PostgreSQL.
When a server terminates it must not delete any entries from the leaseX_avail tables because another server may be using them. This may obviously lead to a dirty database, including extraneous data for leases which belong to address pools not configured on any server. If this is a concern, it is possible to introduce a configuration parameter which enables cleanup of the stale entries. The cleanup may only be enabled in deployments where a single server is connected to the database at a given time.
Implementation Tasks
- Configuration knob: enable BALS at the global, shared network, subnet and pool level (2d)
- Update MySQL CB with new config knobs (2d)
- Trigger pre-allocation during reload (3d)
- Trigger pre-allocation for subnet_cmds (3d)
- Trigger pre-allocation for CB (3d)
- BALS support in Memfile (5d)
- Create new allocator supporting BALS (5d)
- BALS support in MySQL (5d)
- BALS support in PostgreSQL (5d)
- Collision avoidance with host reservations (4d)
Total: 37 days. Without SQL: 27 days.
This does not include the required testing effort, both functional and performance.