# Kea issues
https://gitlab.isc.org/isc-projects/kea/-/issues · 2024-03-14T15:02:26Z

## Clarify application of the ha-scopes command in the actual deployments
https://gitlab.isc.org/isc-projects/kea/-/issues/3290 · Marcin Godzina · 2024-03-14T15:02:26Z

The `ha-scopes` command can modify a server's scopes without changing its role and other HA parameters.
It can be a powerful tool, but its use can put the server in a state that will be very confusing for the administrator.
I think this command requires more documentation and warnings about its usage.
For example:
We have a hot-standby pair and send the `ha-scopes` command to the `standby` server, enabling the scopes of both servers.
This results in both the `primary` and `standby` servers replying to DHCP traffic, but the second server still reports itself as being in the `standby` state.
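For concreteness, the command in the example above could be sent to the standby's control channel as JSON. This sketch only builds the payload; the server names `server1`/`server2` are placeholders, and actually delivering it to the control agent is left out:

```python
import json

def build_ha_scopes_command(scopes, service="dhcp4"):
    """Build the ha-scopes control command enabling the given HA scopes.

    Enabling both servers' scopes on the standby makes it answer DHCP
    traffic for both, while its reported HA state stays unchanged.
    """
    return {
        "command": "ha-scopes",
        "service": [service],
        "arguments": {"scopes": list(scopes)},
    }

# Enable both servers' scopes on the standby (names are examples):
cmd = build_ha_scopes_command(["server1", "server2"])
payload = json.dumps(cmd)
```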
This can lead to massive confusion for administrators.

Milestone: kea2.6.0

## status-get command must return an HA relationship identifier
https://gitlab.isc.org/isc-projects/kea/-/issues/3266 · Marcin Siodelski · 2024-02-26T17:09:03Z

We now support the hub-and-spoke setup with multiple relationships in one server. The HA state can be retrieved using the `status-get` command. The problem is, though, that the `status-get` result lacks an association between the returned local/remote entries and the configured relationships. This makes it nearly impossible to match the returned statuses with the relationships we maintain in the Stork database. The `status-get` response must return identifiers of the HA relationships to enable this matching.

Milestone: kea2.5.7 · Assignee: Marcin Siodelski

## Unattended terminated state and a reboot
https://gitlab.isc.org/isc-projects/kea/-/issues/3250 · Marcin Siodelski · 2024-02-29T14:40:00Z

Consider the following case. The clocks on two HA-enabled servers diverge and the clock skew eventually exceeds 60 seconds. As a result, both servers transition to the terminated state. In this state, the servers continue serving DHCP clients but do not exchange lease updates or heartbeats. An administrator neglects to correct the clocks and one of the servers reboots.
The server enters the `waiting` state and remains there, hoping that the other server is restarted so that they can resume lease database synchronization and return to normal operation. However, the server is unaware that its reboot was not triggered in the course of fixing the clocks, so it will wait for its partner endlessly (or until the administrator comes to work in the morning). Until then, the waiting server does not respond to DHCP traffic.
This situation should not occur in a setup where NTP is enabled. It also should not occur if there is proper monitoring that detects diverging clocks early enough. However, there is a chance this situation may happen when all of this is neglected.
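The 60-second skew threshold described in the scenario above can be sketched as a simple rule. The state names follow the issue text; the function is an illustration, not Kea's actual state machine:

```python
# Once the measured clock skew between HA partners exceeds 60 seconds,
# both servers are expected to enter the "terminated" state.
MAX_CLOCK_SKEW_SECONDS = 60

def next_state(current_state, clock_skew_seconds):
    """Return the HA state after evaluating the partner's clock skew."""
    if abs(clock_skew_seconds) > MAX_CLOCK_SKEW_SECONDS:
        return "terminated"
    return current_state
```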
The proposed solution is to apply a timeout (it could even be several to ten minutes long) for a server in the waiting state. If its partner's transition does not occur before this timeout elapses, the server in the waiting state transitions back to the terminated state and continues serving clients. The waiting server MUST NOT transition back to the terminated state immediately after it detects that its partner is in the terminated state, to allow the administrator enough time to reboot the servers sequentially after correcting the clocks.

Milestone: kea2.5.7

## DHCPRELEASE and lease expiration in active-standby HA setup
https://gitlab.isc.org/isc-projects/kea/-/issues/3246 · Peter Davies · 2024-02-12T09:40:26Z

DHCPRELEASE lease expiration in an active-standby HA setup.
Kea 2.5.5
When a client sends a DHCPRELEASE message to a Kea primary HA server, the expired
lease processing settings are honoured.
However, the primary updates the failover server with instructions to delete the
lease.
This leads to a divergence of lease data between the two servers.
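To illustrate the divergence, here is a toy model (not Kea code; the address is an example): the primary keeps the released lease for expired-lease processing, while the update it forwards tells the peer to delete the lease outright.

```python
# Two in-memory "lease databases", initially in sync.
primary = {"192.0.2.10": {"state": "active"}}
standby = {"192.0.2.10": {"state": "active"}}

def release_on_primary(leases, address):
    # Expired-lease processing keeps the lease around for reclamation.
    leases[address]["state"] = "released"

def ha_update_on_standby(leases, address):
    # The forwarded HA update instructs the peer to delete the lease.
    leases.pop(address, None)

release_on_primary(primary, "192.0.2.10")
ha_update_on_standby(standby, "192.0.2.10")
# The two databases now disagree about the lease.
```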
[SF00001636](https://isc.lightning.force.com/lightning/r/Case/500S6000004XPRy/view)

Milestone: next-stable-2.6

## HA lease updates do not create an accounting entry in v6
https://gitlab.isc.org/isc-projects/kea/-/issues/3226 · Andrei Pavel (andrei@isc.org) · 2024-01-25T15:00:10Z

In v6, HA lease updates are done with the `lease6-bulk-apply` command, which is not handled in the `command_processed` RADIUS callout.
This is unlike v4, which does create accounting entries for HA lease updates sent via `lease4-update`.

Milestone: next-stable-2.6

## subnet-get commands should fetch leases for selected subnets with pagination
https://gitlab.isc.org/isc-projects/kea/-/issues/3206 · Marcin Siodelski · 2024-03-04T09:20:34Z

In HA, we use lease commands to synchronize the database. The lease commands fetch all leases with pagination. However, in the hub-and-spoke model it would be useful to fetch the leases only for selected subnets, because the relationships are partitioned by subnet. Today, all leases have to be fetched by each relationship, and those that do not belong to the relationship are discarded. This is inefficient. One thing to consider is that subnet identifiers are listed explicitly in the commands.

Milestone: kea2.5.7

## HA ignored packets cause DROP statistics counter increment
https://gitlab.isc.org/isc-projects/kea/-/issues/3125 · Darren Ankney · 2024-01-22T14:00:37Z

HA_BUFFER6_RECEIVE_NOT_FOR_US increments drop counters.
- This happens at least with a load-balancing configuration.
- I think maybe not with hot-standby, since I don't think the service logs anything or cares about incoming client packets unless it loses contact with the HA peer?
- I cite BUFFER6 above, but I'm sure the same is true for DHCPv4.
Possible solutions:
- Introduce a new drop status that could be discounted later, or counted as part of a different drop statistic?
- Introduce a new status indicating that the packet is ignored or filtered, instead of dropped?
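The second option might look roughly like this toy accounting sketch. `pkt6-ignored` is a hypothetical statistic name invented here; `pkt6-receive-drop` is the existing drop statistic these packets land in today:

```python
from collections import Counter

stats = Counter()

def account_packet(disposition):
    """Record a packet outcome under a distinct statistic per disposition."""
    if disposition == "not-for-us":
        # Hypothetical new statistic: the packet belongs to the HA peer,
        # so it is ignored rather than dropped.
        stats["pkt6-ignored"] += 1
    else:
        stats["pkt6-receive-drop"] += 1

account_packet("not-for-us")
account_packet("not-for-us")
account_packet("malformed")
```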
[SF1374](https://isc.lightning.force.com/lightning/r/Case/5007V00002YkO0oQAF/view)

Milestone: next-stable-2.6

## Kea HA issue with terminating connection
https://gitlab.isc.org/isc-projects/kea/-/issues/2932 · Nick Hahn · 2023-11-10T09:50:24Z

We recently migrated our DHCP setup from dhcpd to Kea. It runs on two servers with hot standby and a memfile backend for the leases. Kea assigns IP addresses for around 7000 pools.
Over the past few months, the HA connection has terminated at random intervals. From looking at the logs on the passive node, I can see a lot of 'ResourceBusy: IP address ... could not be updated' warnings prior to the connection terminating. Since multithreading is enabled, I suspected this may be due to the threads encountering a resource lock on the memfile. I suppose that after the lease update fails a few times, the connection is terminated. Is the 'ResourceBusy' warning the cause of the terminating HA connection, and is there any way to fix the underlying issue? Any ideas on the issue are greatly appreciated.
Here are the logs from the primary server:
```
Jun 12 15:04:31 dhcp-1 kea-dhcp4[564812]: WARN [kea-dhcp4.alloc-engine.139625735366400] ALLOC_ENGINE_V4_ALLOC_FAIL_SUBNET [hwtype=1, cid=[], tid=0x0: failed to allocate an IPv4 lease in the subnet 123.123.123.123/30, subnet-id 30926, shared network (none)
Jun 12 15:04:31 dhcp-1 kea-dhcp4[564812]: WARN [kea-dhcp4.alloc-engine.139625735366400] ALLOC_ENGINE_V4_ALLOC_FAIL [hwtype=1], cid=[], tid=0x0: failed to allocate an IPv4 address after 1 attempt(s)
Jun 12 15:04:31 dhcp-1 kea-dhcp4[564812]: WARN [kea-dhcp4.alloc-engine.139625735366400] ALLOC_ENGINE_V4_ALLOC_FAIL_CLASSES [hwtype=1], cid=[], tid=0x0: Failed to allocate an IPv4 address for client with classes: ALL, HA_primary-dhcp, VENDOR_CLASS_MSFT 5.0, UNKNOWN
Jun 12 15:04:39 dhcp-1 kea-dhcp4[564812]: WARN [kea-dhcp4.alloc-engine.139625726973696] ALLOC_ENGINE_V4_ALLOC_FAIL_SUBNET [hwtype=1], cid=[], tid=0x0: failed to allocate an IPv4 lease in the subnet 123.123.123.123/30, subnet-id 30926, shared network (none)
Jun 12 15:04:39 dhcp-1 kea-dhcp4[564812]: WARN [kea-dhcp4.alloc-engine.139625726973696] ALLOC_ENGINE_V4_ALLOC_FAIL [hwtype=1], cid=[], tid=0x0: failed to allocate an IPv4 address after 1 attempt(s)
Jun 12 15:04:39 dhcp-1 kea-dhcp4[564812]: WARN [kea-dhcp4.alloc-engine.139625726973696] ALLOC_ENGINE_V4_ALLOC_FAIL_CLASSES [hwtype=1], cid=[], tid=0x0: Failed to allocate an IPv4 address for client with classes: ALL, HA_primary-dhcp, VENDOR_CLASS_MSFT 5.0, UNKNOWN
Jun 12 15:04:45 dhcp-1 kea-dhcp4[564812]: WARN [kea-dhcp4.ha-hooks.139625718580992] HA_LEASE_UPDATE_CONFLICT [hwtype=1], cid=[], tid=0x0: lease update to standby-dhcp (http://dhcp-2:8001/) returned conflict status code: ResourceBusy: IP address:123.123.123.123 could not be updated. (error code 4)
Jun 12 15:04:56 dhcp-1 kea-dhcp4[564812]: WARN [kea-dhcp4.alloc-engine.139625735366400] ALLOC_ENGINE_V4_ALLOC_FAIL_SUBNET [hwtype=1], cid=[], tid=0x0: failed to allocate an IPv4 lease in the subnet 123.123.123.123/30, subnet-id 30926, shared network (none)
Jun 12 15:04:56 dhcp-1 kea-dhcp4[564812]: WARN [kea-dhcp4.alloc-engine.139625735366400] ALLOC_ENGINE_V4_ALLOC_FAIL [hwtype=1], cid=[], tid=0x0: failed to allocate an IPv4 address after 1 attempt(s)
Jun 12 15:04:56 dhcp-1 kea-dhcp4[564812]: WARN [kea-dhcp4.alloc-engine.139625735366400] ALLOC_ENGINE_V4_ALLOC_FAIL_CLASSES [hwtype=1], cid=[], tid=0x0: Failed to allocate an IPv4 address for client with classes: ALL, HA_primary-dhcp, VENDOR_CLASS_MSFT 5.0, UNKNOWN
Jun 12 15:05:28 dhcp-1 kea-dhcp4[564812]: WARN [kea-dhcp4.alloc-engine.139625726973696] ALLOC_ENGINE_V4_ALLOC_FAIL_SUBNET [hwtype=1], cid=[], tid=0x0: failed to allocate an IPv4 lease in the subnet 123.123.123.123/30, subnet-id 30926, shared network (none)
Jun 12 15:05:28 dhcp-1 kea-dhcp4[564812]: WARN [kea-dhcp4.alloc-engine.139625726973696] ALLOC_ENGINE_V4_ALLOC_FAIL [hwtype=1], cid=[], tid=0x0: failed to allocate an IPv4 address after 1 attempt(s)
Jun 12 15:05:28 dhcp-1 kea-dhcp4[564812]: WARN [kea-dhcp4.alloc-engine.139625726973696] ALLOC_ENGINE_V4_ALLOC_FAIL_CLASSES [hwtype=1], cid=[], tid=0x0: Failed to allocate an IPv4 address for client with classes: ALL, HA_primary-dhcp, VENDOR_CLASS_MSFT 5.0, UNKNOWN
Jun 12 15:05:31 dhcp-1 kea-dhcp4[564812]: WARN [kea-dhcp4.alloc-engine.139625752151808] ALLOC_ENGINE_V4_ALLOC_FAIL_SUBNET [hwtype=1], cid=[], tid=0x0: failed to allocate an IPv4 lease in the subnet 123.123.123.123/30, subnet-id 30926, shared network (none)
Jun 12 15:05:31 dhcp-1 kea-dhcp4[564812]: WARN [kea-dhcp4.alloc-engine.139625752151808] ALLOC_ENGINE_V4_ALLOC_FAIL [hwtype=1], cid=[], tid=0x0: failed to allocate an IPv4 address after 1 attempt(s)
Jun 12 15:05:31 dhcp-1 kea-dhcp4[564812]: WARN [kea-dhcp4.alloc-engine.139625752151808] ALLOC_ENGINE_V4_ALLOC_FAIL_CLASSES [hwtype=1], cid=[], tid=0x0: Failed to allocate an IPv4 address for client with classes: ALL, HA_primary-dhcp, VENDOR_CLASS_MSFT 5.0, UNKNOWN
Jun 12 15:05:39 dhcp-1 kea-dhcp4[564812]: WARN [kea-dhcp4.ha-hooks.139625701795584] HA_LEASE_UPDATE_CONFLICT [hwtype=1], cid=[], tid=0x0: lease update to standby-dhcp (http://dhcp-2:8001/) returned conflict status code: ResourceBusy: IP address:123.123.123.123 could not be updated. (error code 4)
Jun 12 15:05:39 dhcp-1 kea-dhcp4[564812]: WARN [kea-dhcp4.ha-hooks.139625718580992] HA_LEASE_UPDATE_CONFLICT [hwtype=1], cid=[], tid=0x0: lease update to standby-dhcp (http://dhcp-2:8001/) returned conflict status code: ResourceBusy: IP address:123.123.123.123 could not be updated. (error code 4)
Jun 12 15:05:39 dhcp-1 kea-dhcp4[564812]: ERROR [kea-dhcp4.ha-hooks.139625718580992] HA_LEASE_UPDATE_REJECTS_CAUSED_TERMINATION too many rejected lease updates cause the HA service to terminate
Jun 12 15:05:39 dhcp-1 kea-dhcp4[564812]: ERROR [kea-dhcp4.ha-hooks.139625718580992] HA_TERMINATED HA service terminated due to an unrecoverable condition. Check previous error message(s), address the problem and restart!
```
Here are the logs from the standby server:
```
Mar 12 19:25:06 dhcp-2 kea-dhcp4[203037]: WARN [kea-dhcp4.lease-cmds-hooks.139670034884352] LEASE_CMDS_UPDATE4_CONFLICT lease4-update command failed due to conflict (parameters: { "client-id": "", "expire": 1678688706, "force-create": true, "fqdn-fwd": false, "fqdn-rev": false, "hostname": "", "hw-address": "", "ip-address": "", "state": 0, "subnet-id": 2907, "valid-lft": 43200 }, reason: ResourceBusy: IP address:123.123.123.123 could not be updated.)
Mar 12 19:25:06 dhcp-2 kea-dhcp4[203037]: WARN [kea-dhcp4.lease-cmds-hooks.139670009706240] LEASE_CMDS_UPDATE4_CONFLICT lease4-update command failed due to conflict (parameters: { "client-id": "", "expire": 1678688706, "force-create": true, "fqdn-fwd": false, "fqdn-rev": false, "hostname": "", "hw-address": "", "ip-address": "", "state": 0, "subnet-id": 2907, "valid-lft": 43200 }, reason: ResourceBusy: IP address:123.123.123.123 could not be updated.)
Mar 12 19:27:28 dhcp-2 kea-dhcp4[203037]: WARN [kea-dhcp4.lease-cmds-hooks.139670009706240] LEASE_CMDS_UPDATE4_CONFLICT lease4-update command failed due to conflict (parameters: { "client-id": "", "expire": 1678688848, "force-create": true, "fqdn-fwd": false, "fqdn-rev": false, "hostname": "", "hw-address": "", "ip-address": "", "state": 0, "subnet-id": 3812, "valid-lft": 43200 }, reason: ResourceBusy: IP address:123.123.123.123 could not be updated.)
Mar 12 19:32:05 dhcp-2 kea-dhcp4[203037]: WARN [kea-dhcp4.lease-cmds-hooks.139670018098944] LEASE_CMDS_UPDATE4_CONFLICT lease4-update command failed due to conflict (parameters: { "client-id": "", "expire": 1678689125, "force-create": true, "fqdn-fwd": false, "fqdn-rev": false, "hostname": "", "hw-address": "", "ip-address": "", "state": 0, "subnet-id": 274, "valid-lft": 43200 }, reason: ResourceBusy: IP address:123.123.123.123 could not be updated.)
Mar 12 19:32:34 dhcp-2 kea-dhcp4[203037]: WARN [kea-dhcp4.lease-cmds-hooks.139670009706240] LEASE_CMDS_UPDATE4_CONFLICT lease4-update command failed due to conflict (parameters: { "client-id": "", "expire": 1678689154, "force-create": true, "fqdn-fwd": false, "fqdn-rev": false, "hostname": "", "hw-address": "", "ip-address": "", "state": 0, "subnet-id": 113, "valid-lft": 43200 }, reason: ResourceBusy: IP address:123.123.123.123 could not be updated.)
Mar 12 19:32:36 dhcp-2 kea-dhcp4[203037]: ERROR [kea-dhcp4.ha-hooks.139670104323840] HA_TERMINATED HA service terminated due to an unrecoverable condition. Check previous error message(s), address the problem and restart!
Mar 12 22:11:09 dhcp-2 kea-dhcp4[203037]: ERROR [kea-dhcp4.packets.139670138794688] DHCP4_BUFFER_RECEIVE_FAIL error on attempt to receive packet: Truncated DHCPv4 packet (len=0) received, at least 236 is expected.
```
The relevant config is the following on both hosts, differing only in the "this-server-name" property.
```
"hooks-libraries": [{
"library": "/usr/lib/x86_64-linux-gnu/kea/hooks/libdhcp_lease_cmds.so",
"parameters": {}
},
{
"library": "/usr/lib/x86_64-linux-gnu/kea/hooks/libdhcp_stat_cmds.so",
"parameters": {}
},
{
"library": "/usr/lib/x86_64-linux-gnu/kea/hooks/libdhcp_ha.so",
"parameters": {
"high-availability": [{
"this-server-name": "standby-dhcp",
"mode": "hot-standby",
"heartbeat-delay": 10000,
"max-response-delay": 60000,
"max-ack-delay": 5000,
"max-unacked-clients": 5,
"peers": [{
"name": "primary-dhcp",
"url": "http://dhcp-1:8001/",
"role": "primary",
"auto-failover": true
}, {
"name": "standby-dhcp",
"url": "http://dhcp-2:8001/",
"role": "standby",
"auto-failover": true
}]
}]
}
}]
```

Milestone: next-stable-2.6

## Cross-check - server should check its HA partner config
https://gitlab.isc.org/isc-projects/kea/-/issues/2897 · Tomek Mrugalski · 2023-06-15T13:50:50Z

Here's an idea for a new HA capability. On startup (or when an explicit command is called), the server retrieves its partner's configuration with `config-get` and checks it for consistency: whether the subnets and pools are defined the same way, whether the subnet-ids match, etc.
Right now the doc says those should be the same, with the only difference being server-name, but we don't check it.
What to do with spotted differences is to be determined. We could print a warning, refuse the HA connection, shut down, or perhaps even have the primary attempt to fix its partner's config.
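A minimal sketch of such a consistency check, assuming simplified `Dhcp4` config fragments (a real `config-get` response nests the configuration under `arguments`, which is omitted here):

```python
def subnet_index(config):
    """Map subnet-id -> subnet prefix from a Dhcp4 config fragment."""
    return {s["id"]: s["subnet"] for s in config.get("subnet4", [])}

def cross_check(local_cfg, partner_cfg):
    """Return (id, local, partner) tuples for subnets that differ."""
    local, partner = subnet_index(local_cfg), subnet_index(partner_cfg)
    issues = []
    for sid in sorted(set(local) | set(partner)):
        if local.get(sid) != partner.get(sid):
            issues.append((sid, local.get(sid), partner.get(sid)))
    return issues

# Example: the partner has one extra subnet the local server lacks.
a = {"subnet4": [{"id": 1, "subnet": "192.0.2.0/24"}]}
b = {"subnet4": [{"id": 1, "subnet": "192.0.2.0/24"},
                 {"id": 2, "subnet": "198.51.100.0/24"}]}
diffs = cross_check(a, b)
```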
This is merely an idea. If we like it, the first step would be to turn it into a more coherent design. Hence the ~design label.

Milestone: backlog

## HA hook's URLs should support DNS resolution with configurable re-resolution
https://gitlab.isc.org/isc-projects/kea/-/issues/2775 · Tobias Florek · 2023-04-06T13:43:06Z

---
name: Feature request
about: Allow using DNS resolution in HA hook's URLs
---
**Some initial questions**
- Are you sure your feature is not already implemented in the latest Kea version? **yes**
- Are you sure what you would like to do is not possible using some other mechanisms? **not reasonable**
- Have you discussed your idea on kea-users or kea-dev mailing lists? **no**
**Is your feature request related to a problem? Please describe.**
I am deploying HA Kea on Kubernetes where (using SDNs) pod(/container) IPs are not constant. The hostname can be made persistent though.
Now I can create a Kubernetes service per pod, which assigns a so-called cluster IP that is stable and gets redirected to the pod.
This works all right for HA communicating with the control agent, but not when using the dedicated listener.
**Describe the solution you'd like**
Preferably allow using DNS (re-)resolution for HA hook's URLs. Or allow specifying the listener's bind-address.
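A sketch of the requested re-resolution behaviour, with the resolver and clock injected so the TTL logic stays testable; nothing here is existing Kea functionality, and all names are illustrative:

```python
import time

class ReResolvingEndpoint:
    """Cache a peer's resolved address and re-resolve after a TTL."""

    def __init__(self, hostname, resolver, ttl_seconds=30.0,
                 clock=time.monotonic):
        self.hostname = hostname
        self.resolver = resolver          # callable: hostname -> IP string
        self.ttl = ttl_seconds
        self.clock = clock
        self._cached = None
        self._resolved_at = float("-inf")

    def address(self):
        now = self.clock()
        if self._cached is None or now - self._resolved_at >= self.ttl:
            self._cached = self.resolver(self.hostname)
            self._resolved_at = now
        return self._cached

# Fake resolver and clock standing in for DNS and wall time:
answers = iter(["10.0.0.5", "10.0.0.9"])
fake_clock = iter([0.0, 10.0, 40.0])
ep = ReResolvingEndpoint("kea-standby.example", lambda h: next(answers),
                         ttl_seconds=30.0, clock=lambda: next(fake_clock))
first = ep.address()    # resolves
second = ep.address()   # within TTL, served from cache
third = ep.address()    # TTL expired, re-resolves to the new pod IP
```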
**Funding its development**
Kea is run by ISC, which is a small non-profit organization without any government funding or any permanent sponsorship organizations. Are you able and willing to participate financially in the development costs? **no**
**Participating in development**
Are you willing to participate in the feature development? The ISC team always tries to make a feature as generic as possible, so it can be used in a wide variety of situations. That means the proposed solution may be a bit different than you initially thought. Are you willing to take part in the design discussions? Are you willing to test unreleased engineering code? **yes**
**Contacting you**
Preferably via GitLab.

Milestone: backlog

## RFE: HA plugin ability to detect partner inability to receive client requests and transition it to 'partner-down'
https://gitlab.isc.org/isc-projects/kea/-/issues/2714 · Kevin Fleming · 2023-07-31T14:12:57Z

---
name: Feature request
about: HA plugin ability to detect partner inabilty to receive client requests and transition it to 'partner-down'
---
**Some initial questions**
- Are you sure your feature is not already implemented in the latest Kea version? Yes
- Are you sure what you would like to do is not possible using some other mechanisms? Yes
- Have you discussed your idea on kea-users or kea-dev mailing lists? Yes
**Is your feature request related to a problem? Please describe.**
(This issue was created as a result of an extensive thread on kea-users)
When the HA plugin is being used in either hot-standby or load-balancing mode, Kea peers are able to notice some forms of communications failures and force the other peers to the 'partner-down' state in order to provide service to clients supported by the other peer.
However, in a situation where client requests are not being delivered to a peer but it is otherwise fully operational, including the peer-to-peer communications link, clients supported by that peer will not be serviced, and the other peer(s) are unable to notice the issue and take action to correct it. This situation could arise when the Kea peers use separate network links for client traffic and HA traffic, or when the Kea peers receive client traffic via a DHCP relay and the relay configuration is incorrect.
**Describe the solution you'd like**
One (or more) opt-in mechanisms that the Kea admin can choose to enhance the ability to detect peer failures to service clients, even when the peer's Kea daemon is otherwise fully operational.
**Describe alternatives you've considered**
Some discussions about external monitoring solutions have occurred, and that is certainly an option which some admins could choose.
**Funding its development**
Kea is run by ISC, which is a small non-profit organization without any government funding or any permanent sponsorship organizations. Are you able and willing to participate financially in the development costs? Yes
**Participating in development**
Are you willing to participate in the feature development? The ISC team always tries to make a feature as generic as possible, so it can be used in a wide variety of situations. That means the proposed solution may be a bit different than you initially thought. Are you willing to take part in the design discussions? Are you willing to test unreleased engineering code? Yes

Milestone: next-stable-2.6

## HA pool rebalancing
https://gitlab.isc.org/isc-projects/kea/-/issues/2708 · Tomek Mrugalski · 2023-02-02T14:23:33Z

This idea is not new. It was recently brought up by @cathya in Porto (see [notes](https://pad.isc.org/p/porto2022-kea-features-for-stork#L58)). The overall concept is to design and implement a mechanism similar to the one in ISC DHCP. When there are two servers in load-balancing mode, it is possible that one of them will run out of addresses while the other one still has many.
A couple of random comments:
- Pool rebalancing would somehow make both partners negotiate the pools and rebalance them.
- Using a hysteresis approach with high/low thresholds would prevent the mechanism from thrashing when running out of addresses. We don't want it to go crazy when there are only one or two addresses left.
- The pool dynamism would add extra complexity, as the modified pool range would need to be stored somewhere that would survive crashes, reboots, etc.
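The hysteresis idea could be sketched with a low/high water mark pair (thresholds invented for the example): rebalancing starts only when free addresses fall below the low mark and stops only once they climb back above the high mark, so the decision does not flap in between.

```python
LOW_WATER, HIGH_WATER = 10, 50   # illustrative thresholds

def update_rebalance_flag(free_addresses, rebalancing):
    """Return whether rebalancing should be active after this sample."""
    if free_addresses < LOW_WATER:
        return True
    if free_addresses > HIGH_WATER:
        return False
    return rebalancing  # in between the marks: keep the previous decision

state = False
state = update_rebalance_flag(8, state)   # below low water -> start
mid = update_rebalance_flag(30, state)    # between marks -> unchanged
done = update_rebalance_flag(60, mid)     # above high water -> stop
```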
This requires a ~design. It's a complicated feature request with a high potential for endless tweaks, conflicting tuning requests, etc.
We will do it one day, but it would require a lot of design, testing, and tuning.

Milestone: outstanding

## HA Load-Balancing Network issue detection between Relay and Kea
https://gitlab.isc.org/isc-projects/kea/-/issues/2700 · Mathias Aichinger · 2023-01-26T15:22:15Z

Hi,
I have already tried to resolve this issue with the kea users community, but it seems not many are using HA Load Balancing.
I have the following problem.
Scenario:
Multiple DHCP-Relays at different sites with both KEA-Servers as DHCP-Servers. Both servers are available and the load balancing shifts the requests between the two servers.
Incident: Because of a network issue, Kea 1 is not reachable from the clients. The network connection between Kea 1 and Kea 2 still works, so there is no partner-down state.
Expected behaviour: Kea 2 sees the unacked clients of Kea 1, puts Kea 1 in the partner-down state, and handles all requests.
Experienced behaviour: Kea 2 still reports HA_BUFFER4_RECEIVE_NOT_FOR_US and does not handle the requests. Unacked clients are not counted.
Is there a misunderstanding or configuration mistake on my side?
```
{
"library": "/usr/local/lib/kea//hooks/libdhcp_ha.so",
"parameters": {
"high-availability": [
{
"this-server-name": "server2",
"mode": "load-balancing",
"heartbeat-delay": 10000,
"max-response-delay": 60000,
"max-ack-delay": 10000,
"max-unacked-clients": 1,
"delayed-updates-limit": 100,
"peers": [
{
"name": "server1",
"url": "http://192.168.248.1:8080/",
"role": "primary",
"auto-failover": true
},
{
"name": "server2",
"url": "http://192.168.248.2:8080/",
"role": "secondary",
"auto-failover": true
}
]
}
]
}
}
```
Thank you,
Mathias

Milestone: backlog

## Hook service must start before the first packet
https://gitlab.isc.org/isc-projects/kea/-/issues/2692 · Francis Dupont · 2024-03-18T09:23:21Z

We have a general issue with hooks that start services by posting to the server's IO service: because that IO service is only polled, the services start after the first packet has been processed.
I suggest adding a poll at the end of loadConfigFile in ctrl_dhcp[46]_srv.cc to catch late issues, e.g. a bad configuration detected when the service is launched.
Hooks using deferred start: HA, LQ, and soon RADIUS.

Milestone: kea2.5.7 · Assignee: Marcin Siodelski

## partner-down state transition when max-unacked-clients reached
https://gitlab.isc.org/isc-projects/kea/-/issues/2592 · Marcin Siodelski · 2022-11-24T14:45:22Z

Suppose the server has lost the connection with its partner. The server begins the failover procedure by checking whether or not the partner responds to DHCP queries. The `max-unacked-clients` setting controls how many different clients should retry getting a lease with an increased value of the `secs` field before the server considers the partner dead. One would expect the server to make the `partner-down` transition as soon as the number of unacked clients reaches the configured number. In fact, state transitions are generally performed when the server completes a heartbeat or a lease update. It is possible that under heavy traffic there will be a much larger number of unacked clients while the server still sits in the normal state (e.g. hot-standby), waiting for the heartbeat trigger. Assuming the heartbeat interval is reasonable, it should probably be fine. However, we may consider starting the transition as soon as the number of unacked clients reaches the configured maximum.

Milestone: backlog

## HA lease v6 updates use the default hwtype and hwaddr_source
https://gitlab.isc.org/isc-projects/kea/-/issues/2565 · Andrei Pavel (andrei@isc.org) · 2023-07-31T13:51:18Z

Notice the discrepancy in the last two columns:
* `server1`:
```
address,duid,valid_lifetime,expire,subnet_id,pref_lifetime,lease_type,iaid,prefix_len,fqdn_fwd,fqdn_rev,hostname,hwaddr,state,user_context,hwtype,hwaddr_source
2001:db8:50::11,00:03:00:01:01:03:0d:04:0b:01,4000,1663013972,1,3000,0,5946,128,0,0,,01:03:0d:04:0b:01,0,,1,0
2001:db8:50::12,00:03:00:01:01:04:0e:05:0c:02,4000,1663013972,1,3000,0,3512,128,0,0,,01:04:0e:05:0c:02,0,,1,0
2001:db8:50::d,00:03:00:01:01:05:0f:06:0d:03,4000,1663013972,1,3000,0,5918,128,0,0,,01:05:0f:06:0d:03,0,,1,2
2001:db8:50::e,00:03:00:01:01:06:10:07:0e:04,4000,1663013973,1,3000,0,4936,128,0,0,,01:06:10:07:0e:04,0,,1,2
```
* `server2`:
```
address,duid,valid_lifetime,expire,subnet_id,pref_lifetime,lease_type,iaid,prefix_len,fqdn_fwd,fqdn_rev,hostname,hwaddr,state,user_context,hwtype,hwaddr_source
2001:db8:50::11,00:03:00:01:01:03:0d:04:0b:01,4000,1663013972,1,3000,0,5946,128,0,0,,01:03:0d:04:0b:01,0,,1,2
2001:db8:50::12,00:03:00:01:01:04:0e:05:0c:02,4000,1663013972,1,3000,0,3512,128,0,0,,01:04:0e:05:0c:02,0,,1,2
2001:db8:50::d,00:03:00:01:01:05:0f:06:0d:03,4000,1663013972,1,3000,0,5918,128,0,0,,01:05:0f:06:0d:03,0,,1,0
2001:db8:50::e,00:03:00:01:01:06:10:07:0e:04,4000,1663013973,1,3000,0,4936,128,0,0,,01:06:10:07:0e:04,0,,1,0
```
The ones with `hwaddr_source = 0` are updated from the other peer. `hwtype = 1` is also likely a default that happens to match its source in the examples above.
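The mismatch can be verified mechanically from the CSV dumps above (trimmed here to two leases per server; the parsing helper is just for illustration):

```python
import csv
import io

HEADER = ("address,duid,valid_lifetime,expire,subnet_id,pref_lifetime,"
          "lease_type,iaid,prefix_len,fqdn_fwd,fqdn_rev,hostname,hwaddr,"
          "state,user_context,hwtype,hwaddr_source")

# Rows taken from the issue text above.
server1 = HEADER + """
2001:db8:50::11,00:03:00:01:01:03:0d:04:0b:01,4000,1663013972,1,3000,0,5946,128,0,0,,01:03:0d:04:0b:01,0,,1,0
2001:db8:50::d,00:03:00:01:01:05:0f:06:0d:03,4000,1663013972,1,3000,0,5918,128,0,0,,01:05:0f:06:0d:03,0,,1,2"""
server2 = HEADER + """
2001:db8:50::11,00:03:00:01:01:03:0d:04:0b:01,4000,1663013972,1,3000,0,5946,128,0,0,,01:03:0d:04:0b:01,0,,1,2
2001:db8:50::d,00:03:00:01:01:05:0f:06:0d:03,4000,1663013972,1,3000,0,5918,128,0,0,,01:05:0f:06:0d:03,0,,1,0"""

def by_address(text):
    """Index the CSV dump by lease address."""
    return {row["address"]: row for row in csv.DictReader(io.StringIO(text))}

s1, s2 = by_address(server1), by_address(server2)
mismatched = [a for a in s1
              if s1[a]["hwaddr_source"] != s2[a]["hwaddr_source"]]
```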
It looks like `Lease6::toElement()` and `Lease6Parser::parse()` need the `hwtype` and `hwaddr_source` capabilities.

Milestone: backlog

## Kea HA hot-standby mode - standby peer not catching up
https://gitlab.isc.org/isc-projects/kea/-/issues/2427 · favq · 2023-07-31T13:42:46Z

Hi,
I'm testing a Kea HA setup in hot-standby mode, with the following settings:
* Kea 2.0.1 DHCPv4 + control agent.
* Two Kea instances: one "primary" and the other "standby".
* memfile backend with file persistence enabled.
* Lease synchronization enabled in the HA setup.
* The only hooks libraries in use are ha and lease_cmds.
I ran perfdhcp simulating multiple clients against the primary. After a while of sending many requests to the primary, I see that both instances have stored leases, but the standby didn't completely catch up with the primary.
That is, when I inspect the leases on both instances using the lease4-get-all API command, I see that the number of leases did increase on both instances, but the standby has fewer leases than the primary.
If I manually call the ha-sync API command, or restart the standby, or reload its configuration, the standby performs a sync and catches up with the primary, and the lease counts become equal again. However, if I then run perfdhcp repeatedly, the standby eventually starts falling behind again.
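The comparison described here, counting leases on both peers via lease4-get-all, can be sketched as follows. The helper and the data are hypothetical, but the response shape (`result`/`arguments`/`leases` with `ip-address` entries) follows Kea's command response format:

```python
# Hypothetical helper: given the lease4-get-all responses from both peers,
# report the addresses present on the primary but missing on the standby.

def missing_on_standby(primary_resp, standby_resp):
    primary = {l["ip-address"] for l in primary_resp["arguments"]["leases"]}
    standby = {l["ip-address"] for l in standby_resp["arguments"]["leases"]}
    return sorted(primary - standby)

# Made-up example responses illustrating a standby that fell behind.
primary_resp = {"result": 0, "arguments": {"leases": [
    {"ip-address": "192.0.2.10"}, {"ip-address": "192.0.2.11"}]}}
standby_resp = {"result": 0, "arguments": {"leases": [
    {"ip-address": "192.0.2.10"}]}}

print(missing_on_standby(primary_resp, standby_resp))  # → ['192.0.2.11']
```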
Note that, when this happens, if I call the "ha-heartbeat" API command on both instances, they both report an "unsent-update-count" of 0.
A similar thing happens with DHCPv6.
Is this behavior expected? Is it normal for the standby to not catch up with the primary during HA operation, needing manual intervention ("ha-sync", restart or config reload) to catch up?
Thank you.

Milestone: outstanding

**Issue #2339: Memory leak in HA scenario with backup server down** (reported by Branimir Rajtar, 2023-09-07)

---
name: Memory leak in HA scenario with backup server down
about: Memory loss is created on running instances
---
**Describe the bug**
HA mode is configured with three servers (primary, secondary, backup) and is serving clients. When the backup server becomes unavailable, the primary and secondary experience a continuous memory leak, visible as a steady increase in RSS memory use for the isc-kea-dhcp4-server process. The size of the leak correlates directly with the number of active clients: the more clients, the faster the growth. Once the backup server is removed from the configuration or becomes reachable again, memory use stops increasing, but the already-leaked memory is not freed.
**To Reproduce**
Steps to reproduce the behavior:
1. Run Kea (DHCP4 only) in an HA scenario with two load-balancing servers (primary and secondary) and a single backup server
2. Start serving clients (40k in our scenario) and monitor RSS usage for the Kea server process
3. Disable backup server
4. Verify that RSS usage is increasing continuously
5. Enable backup server
6. Verify that RSS usage is stable
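The RSS monitoring in steps 2, 4, and 6 can be done by polling `/proc/<pid>/status` on Linux. A minimal parsing sketch (the sample text below is illustrative, not a real capture):

```python
# Parse the VmRSS line from /proc/<pid>/status; the kernel reports it in kB.

def vmrss_kb(status_text):
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])  # field after the label is the kB value
    raise ValueError("no VmRSS line found")

SAMPLE = "Name:\tkea-dhcp4\nVmRSS:\t  524288 kB\nThreads:\t4\n"
print(vmrss_kb(SAMPLE))  # → 524288
```

In practice one would read `open(f"/proc/{pid}/status").read()` in a loop and log the value over time to confirm the continuous growth.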
**Expected behavior**
The servers should not have any memory leaks.
**Environment:**
- Kea version: 1.8.2, 2.0.2
- OS: Ubuntu 18.04
- Memfile
- libdhcp_lease_cmds, libdhcp_stat_cmds, libdhcp_ha
**Additional Information**
```
{
"Dhcp4": {
"dhcp-queue-control": {
"enable-queue": true,
"queue-type": "kea-ring4",
"capacity": 256
},
"interfaces-config": {
"interfaces": [
"eth1"
],
"dhcp-socket-type": "udp"
},
"control-socket": {
"socket-type": "unix",
"socket-name": "/tmp/kea-dhcp4-ctrl.sock"
},
"lease-database": {
"type": "memfile",
"persist": true,
"name": "/var/lib/kea/dhcp4.leases",
"lfc-interval": 3600,
"port": 0
},
"expired-leases-processing": {
"reclaim-timer-wait-time": 10,
"flush-reclaimed-timer-wait-time": 25,
"hold-reclaimed-time": 3600,
"max-reclaim-leases": 100,
"max-reclaim-time": 250,
"unwarned-reclaim-cycles": 5
},
"renew-timer": 60,
"rebind-timer": 100,
"valid-lifetime": 120,
"option-data": [],
"hooks-libraries": [
{
"library": "/usr/lib/x86_64-linux-gnu/kea/hooks/libdhcp_lease_cmds.so",
"parameters": {}
},
{
"library": "/usr/lib/x86_64-linux-gnu/kea/hooks/libdhcp_stat_cmds.so"
},
{
"library": "/usr/lib/x86_64-linux-gnu/kea/hooks/libdhcp_ha.so",
"parameters": {
"high-availability": [
{
"this-server-name": "server3",
"mode": "load-balancing",
"heartbeat-delay": 3000,
"max-response-delay": 7000,
"max-ack-delay": 7000,
"max-unacked-clients": 20,
"peers": [
{
"name": "server2",
"url": "http://<XXX>:8080/",
"role": "secondary",
"auto-failover": true
},
{
"name": "server1",
"url": "http://<YYY>:8080/",
"role": "primary",
"auto-failover": true
},
{
"name": "server3",
"url": "http://<ZZZ>:8080/",
"role": "backup",
"auto-failover": true
}
]
}
]
}
}
],
"option-def": [
{
"name": "classless-static-route",
"code": 121,
"space": "dhcp4",
"type": "record",
"array": true,
"record-types": "uint8, uint8"
}
],
"client-classes": [
// anonymized
],
"subnet4": [
// anonymized
],
"reservations": [],
"loggers": [
{
"name": "kea-dhcp4",
"output_options": [
{
"output": "syslog"
}
],
"severity": "error",
"debuglevel": 0
}
]
}
}
```
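A few invariants of the HA configuration above can be checked mechanically. This is an illustrative sketch, not an official validator; the constraint that `max-response-delay` exceed `heartbeat-delay` follows the Kea documentation, and the config fragment below is reduced from the one above:

```python
# Sanity-check a reduced HA config: load-balancing mode needs exactly one
# primary and one secondary, and max-response-delay must exceed
# heartbeat-delay so several heartbeats can fail before the partner is
# suspected to be down.

def check_ha(ha):
    roles = [p["role"] for p in ha["peers"]]
    assert ha["mode"] == "load-balancing"
    assert roles.count("primary") == 1 and roles.count("secondary") == 1
    assert ha["max-response-delay"] > ha["heartbeat-delay"]
    return True

ha = {
    "this-server-name": "server3",
    "mode": "load-balancing",
    "heartbeat-delay": 3000,
    "max-response-delay": 7000,
    "peers": [
        {"name": "server2", "role": "secondary"},
        {"name": "server1", "role": "primary"},
        {"name": "server3", "role": "backup"},
    ],
}
print(check_ha(ha))  # → True
```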
**Contacting you**
Email/GitHub; telephone is available after contact.

Milestone: next-stable-2.6

**Issue #2223: HA error when partner received duplicated DHCP requests with the same Transaction ID** (reported by Spencer Lowe, 2023-07-31)

**Describe the bug**
Because of our redundant topology, the same Kea server will receive the same DHCP request message with the same transaction ID from different relays. We run our Kea servers in load-balancing HA mode. Server 1 sends a lease4-update to server 2; because server 1 received the same packet twice with the same transaction ID, it sends both updates. Server 2 then responds with "resource busy" because it is still processing the lease update from the first request when the second arrives. Because server 2 errors out, server 1 puts it into the unknown state. Server 1 will then resync, but this breaks again on every duplicated DHCP request. We are storing leases in Postgres.
**To Reproduce**
Steps to reproduce the behavior:
1. Run Kea dhcpv4 in load balancing mode. Send a duplicate DHCP request with the same transaction ID to a server.
2. Server 1 will process both DHCP requests and will send the lease4-update to server 2. Both of these requests happen at almost the same time.
3. Server 1 will then put server 2 into unknown state because the lease4-update command failed.
**Expected behavior**
I would expect the update not to fail, but I am not sure how DHCP servers are supposed to handle duplicate requests with the same transaction ID.
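One defensive approach to the duplicates described above is a short-lived cache keyed on the transaction ID and client hardware address: repeats seen within the window are dropped. This is an illustrative sketch, not Kea's actual behavior:

```python
import time

class DedupCache:
    """Remember recently seen (xid, chaddr) pairs and flag repeats
    arriving within a short time window."""

    def __init__(self, window=2.0):
        self.window = window  # seconds during which a repeat is a duplicate
        self.seen = {}        # (xid, chaddr) -> last-seen timestamp

    def is_duplicate(self, xid, chaddr, now=None):
        now = time.monotonic() if now is None else now
        key = (xid, chaddr)
        last = self.seen.get(key)
        self.seen[key] = now
        return last is not None and (now - last) < self.window

cache = DedupCache(window=2.0)
print(cache.is_duplicate(0x1234, "aa:bb:cc:dd:ee:ff", now=0.0))  # → False
print(cache.is_duplicate(0x1234, "aa:bb:cc:dd:ee:ff", now=1.0))  # → True
print(cache.is_duplicate(0x1234, "aa:bb:cc:dd:ee:ff", now=5.0))  # → False
```

A real deployment would also need periodic eviction of stale entries so the cache itself does not grow without bound.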
**Environment:**
- Kea dhcp4 v 2.0.0
- OS: [e.g. Debian 11 x64]
- Storing leases in Postgres
- `libdhcp_lease_cmds`, `libdhcp_ha`
**Additional Information**
- I have attached a PCAP that shows the DHCP requests and the lease4-update command
- Attached is the kea config that runs on both servers
[ss21-dhcp-debug.pcap](/uploads/f4c0beee834c75b7131e4e485af12d47/ss21-dhcp-debug.pcap)[kea-config.json](/uploads/a5a5a89cd10a736706b93ac7eabd4e9d/kea-config.json)
**Contacting you**
slowe@clairglobal.com

Milestone: next-stable-2.6

**Issue #1914: HAServiceTest.sendSuccessfulUpdatesAuthorizedMultiThreading sometimes fails** (reported by Andrei Pavel, 2023-02-27)

This time it happened on distcheck on CentOS 8.
https://jenkins.aws.isc.org/job/kea-dev/job/distcheck/415/execution/node/136/log/?consoleFull
```
16:04:40 [ RUN ] HAServiceTest.sendSuccessfulUpdatesAuthorizedMultiThreading
16:04:40 ../../../../../../../src/hooks/dhcp/high_availability/tests/ha_service_unittest.cc:1096: Failure
16:04:40 Expected equality of these values:
16:04:40 2
16:04:40 factory3_->getResponseCreator()->getReceivedRequests().size()
16:04:40 Which is: 1
16:04:40 ../../../../../../../src/hooks/dhcp/high_availability/tests/ha_service_unittest.cc:1102: Failure
16:04:40 Value of: update_request3
16:04:40 Actual: false
16:04:40 Expected: true
16:04:40 [ FAILED ] HAServiceTest.sendSuccessfulUpdatesAuthorizedMultiThreading (2 ms)
```

Milestone: backlog