Kea issues
https://gitlab.isc.org/isc-projects/kea/-/issues
2024-03-14T15:02:26Z

https://gitlab.isc.org/isc-projects/kea/-/issues/3290
Clarify application of the ha-scopes command in the actual deployments
2024-03-14T15:02:26Z
Marcin Godzina

The `ha-scopes` command can modify a server's scopes without changing its role or other HA parameters.
It can be a powerful tool, but its use can put the server in a state that will be very confusing for the Administrator.
I think this command requires more documentation and warnings about its usage.
For example: \
We have a hot standby pair and send the `ha-scopes` command to the `standby` server, enabling scopes of both servers.
This results in `primary` and `standby` servers replying to DHCP traffic. But the second server still reports as in a `standby` state.
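The scenario above can be sketched as follows. This is a minimal illustration, not a recommended procedure: the peer names `server1` and `server2`, the `HA_<server-name>` scope names, and the `dhcp4` service are assumptions that the report does not state, and the code only builds the command payload without sending it.

```python
import json


def build_ha_scopes(scopes, service="dhcp4"):
    """Build an ha-scopes command payload (illustrative only)."""
    return {
        "command": "ha-scopes",
        "service": [service],
        "arguments": {"scopes": list(scopes)},
    }


# Hypothetical peer names. Enabling both scopes on the standby makes it
# answer DHCP traffic for both servers while its reported HA state stays
# unchanged, which is the confusing situation described in the report.
payload = build_ha_scopes(["HA_server1", "HA_server2"])
print(json.dumps(payload, indent=2))
```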
This can lead to massive confusion for Administrators.
kea2.6.0

https://gitlab.isc.org/isc-projects/kea/-/issues/3276
Kea primary server in "passive backup" freeze/crash on receiving ha-sync
2024-03-28T10:34:28Z
Marcin Godzina

A Kea HA server set as `primary` freezes after receiving the `ha-sync` command with proper arguments.
The backup server does NOT crash.
Freeze occurs only in `passive-backup` mode.
The problem exists in both v4 and v6, and with the memfile as well as the MySQL/PostgreSQL lease databases.
**Kea versions tested:**
- 2.5.7-git 8c1f22e3fb65225a0279606a8a65962850a5f881
- 2.4.0 release tarball
**Tested systems:**
- Fedora 38 in VM on my local setup.
- Ubuntu 22.04, Alpine 3.16, Fedora 36 on Jenkins build farm.
**To Reproduce**
Steps to reproduce the behavior:
1. Run Kea HA servers in **Passive backup** configuration (tested configuration provided)
2. Wait for servers to connect.
3. Optionally add leases (crashes either way)
4. Send the `ha-sync` command with proper arguments to the primary server. (`"server-name": "server2"` for provided configuration) (Invalid arguments respond with error)
The primary server freezes after receiving a response to the `dhcp-disable` command, which it sends automatically to the backup server. It does not respond to kea-ctrl-agent, keyboard interrupts, or SIGHUP.
<details><summary>Commands tested to freeze provided config:</summary>
```
{
"command": "ha-sync",
"arguments": {
"server-name": "server2"
}
}
```
```
{
"command": "ha-sync",
"arguments": {
"server-name": "server1"
}
}
```
```
{
"command": "ha-sync",
"arguments": {
"server-name": "server2",
"max-period": 60
}
}
```
</details>
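A reproduction of step 4 could be scripted roughly as below. This is a hedged sketch, not part of the original report: it assumes the server is reachable over the UNIX control socket configured under `control-socket`, and the helper names `build_ha_sync` and `send_command` are made up for illustration. A client-side timeout is used so that a frozen server shows up as a missing reply instead of blocking the script.

```python
import json
import socket


def build_ha_sync(server_name, max_period=None):
    """Build the ha-sync command in the shape used in the report."""
    arguments = {"server-name": server_name}
    if max_period is not None:
        arguments["max-period"] = max_period
    return {"command": "ha-sync", "arguments": arguments}


def send_command(socket_path, command, timeout=10.0):
    """Send one JSON command over a UNIX stream socket.

    Returns the decoded reply, or None if no complete reply arrives
    within `timeout` seconds (how a frozen server would appear).
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        buffer = b""
        try:
            sock.connect(socket_path)
            sock.sendall(json.dumps(command).encode("utf-8"))
            while True:
                data = sock.recv(4096)
                if not data:
                    break
                buffer += data
                try:
                    # Stop as soon as the accumulated bytes parse as JSON.
                    return json.loads(buffer.decode("utf-8"))
                except ValueError:
                    continue  # reply not complete yet
        except socket.timeout:
            return None
    return json.loads(buffer.decode("utf-8")) if buffer else None
```

For the provided configuration, a call such as `send_command(path, build_ha_sync("server2"))`, with `path` set to the configured `socket-name`, would be expected to return `None` if the freeze occurs.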
**Configuration**
<details><summary>Primary</summary>
```
{
"Dhcp4": {
"option-data": [],
"hooks-libraries": [
{
"library": "/home/mgodzina/installed/keadev/lib/kea/hooks/libdhcp_ha.so",
"parameters": {
"high-availability": [
{
"peers": [
{
"auto-failover": true,
"name": "server1",
"role": "primary",
"url": "http://192.168.56.102:8003/"
},
{
"auto-failover": true,
"name": "server2",
"role": "backup",
"url": "http://192.168.56.103:8003/"
}
],
"state-machine": {
"states": []
},
"mode": "passive-backup",
"this-server-name": "server1",
"multi-threading": {
"enable-multi-threading": true,
"http-dedicated-listener": true,
"http-listener-threads": 0,
"http-client-threads": 0
}
}
]
}
},
{
"library": "/home/mgodzina/installed/keadev/lib/kea/hooks/libdhcp_lease_cmds.so"
}
],
"shared-networks": [],
"subnet4": [
{
"subnet": "192.168.50.0/24",
"pools": [
{
"pool": "192.168.50.1-192.168.50.200"
}
],
"interface": "enp0s9"
}
],
"interfaces-config": {
"interfaces": [
"enp0s9"
]
},
"control-socket": {
"socket-type": "unix",
"socket-name": "/home/mgodzina/installed/keadev/var/run/kea/control_socket"
},
"renew-timer": 1000,
"rebind-timer": 2000,
"valid-lifetime": 4000,
"loggers": [
{
"name": "kea-dhcp4",
"output-options": [
{
"output": "/home/mgodzina/installed/keadev/var/log/kea.log"
}
],
"severity": "DEBUG",
"debuglevel": 99
}
],
"lease-database": {
"type": "memfile"
}
}
}
```
</details>
<details><summary>Backup</summary>
```
{
"Dhcp4": {
"option-data": [],
"hooks-libraries": [
{
"library": "/home/mgodzina/installed/keadev/lib/kea/hooks/libdhcp_ha.so",
"parameters": {
"high-availability": [
{
"peers": [
{
"auto-failover": true,
"name": "server1",
"role": "primary",
"url": "http://192.168.56.102:8003/"
},
{
"auto-failover": true,
"name": "server2",
"role": "backup",
"url": "http://192.168.56.103:8003/"
}
],
"state-machine": {
"states": []
},
"mode": "passive-backup",
"this-server-name": "server2",
"multi-threading": {
"enable-multi-threading": true,
"http-dedicated-listener": true,
"http-listener-threads": 0,
"http-client-threads": 0
}
}
]
}
},
{
"library": "/home/mgodzina/installed/keadev/lib/kea/hooks/libdhcp_lease_cmds.so"
}
],
"shared-networks": [],
"subnet4": [
{
"subnet": "192.168.50.0/24",
"pools": [
{
"pool": "192.168.50.1-192.168.50.200"
}
],
"interface": "enp0s9"
}
],
"interfaces-config": {
"interfaces": [
"enp0s9"
]
},
"control-socket": {
"socket-type": "unix",
"socket-name": "/home/mgodzina/installed/keadev/var/run/kea/control_socket"
},
"renew-timer": 1000,
"rebind-timer": 2000,
"valid-lifetime": 4000,
"loggers": [
{
"name": "kea-dhcp4",
"output-options": [
{
"output": "/home/mgodzina/installed/keadev/var/log/kea.log"
}
],
"severity": "DEBUG",
"debuglevel": 99
}
],
"lease-database": {
"type": "memfile"
}
}
}
```
</details>
**Logs**
<details><summary>Primary server log tail</summary>
```
2024-02-28 16:20:13.417 DEBUG [kea-dhcp4.commands/2096.139741364354944] COMMAND_SOCKET_CONNECTION_OPENED Opened socket 38 for incoming command connection
2024-02-28 16:20:13.417 DEBUG [kea-dhcp4.commands/2096.139741364354944] COMMAND_SOCKET_READ Received 127 bytes over command socket 38
2024-02-28 16:20:13.417 INFO [kea-dhcp4.commands/2096.139741364354944] COMMAND_RECEIVED Received command 'ha-sync'
2024-02-28 16:20:13.417 DEBUG [kea-dhcp4.callouts/2096.139741364354944] HOOKS_CALLOUTS_BEGIN begin all callouts for hook $ha_sync
2024-02-28 16:20:13.417 DEBUG [kea-dhcp4.http/2096.139741364354944] HTTP_CLIENT_REQUEST_SEND sending HTTP request POST / HTTP/1.1 to http://192.168.56.103:8003/
2024-02-28 16:20:13.417 DEBUG [kea-dhcp4.http/2096.139741364354944] HTTP_CLIENT_REQUEST_SEND_DETAILS detailed information about request sent to http://192.168.56.103:8003/:
POST / HTTP/1.1
Host: 192.168.56.103
Content-Length: 86
Content-Type: application/json
{ "arguments": { "origin": 2000 }, "command": "dhcp-disable", "service": [ "dhcp4" ] }
2024-02-28 16:20:13.417 INFO [kea-dhcp4.ha-hooks/2096.139741364354944] HA_SYNC_START server1: starting lease database synchronization with server2
2024-02-28 16:20:13.417 DEBUG [kea-dhcp4.http/2096.139741364354944] HTTP_SERVER_RESPONSE_RECEIVED received HTTP response from http://192.168.56.103:8003/
2024-02-28 16:20:13.417 DEBUG [kea-dhcp4.http/2096.139741364354944] HTTP_SERVER_RESPONSE_RECEIVED_DETAILS detailed information about well-formed response received from http://192.168.56.103:8003/:
HTTP/1.1 200 OK
Content-Length: 54
Content-Type: application/json
Date: Wed, 28 Feb 2024 15:20:13 GMT
[ { "result": 0, "text": "DHCPv4 service disabled" } ]
```
</details>
<details><summary>Backup server log snippet with timeout:</summary>
```
2024-02-28 16:20:13.413 DEBUG [kea-dhcp4.http/20519.140151306917568] HTTP_REQUEST_RECEIVE_START start receiving request from 192.168.56.102 with timeout 10
2024-02-28 16:20:13.413 DEBUG [kea-dhcp4.http/20519.140151306917568] HTTP_DATA_RECEIVED received 179 bytes from 192.168.56.102
2024-02-28 16:20:13.413 DEBUG [kea-dhcp4.http/20519.140151306917568] HTTP_CLIENT_REQUEST_RECEIVED received HTTP request from 192.168.56.102
2024-02-28 16:20:13.413 DEBUG [kea-dhcp4.http/20519.140151306917568] HTTP_CLIENT_REQUEST_RECEIVED_DETAILS detailed information about well-formed request received from 192.168.56.102:
POST / HTTP/1.1
Host: 192.168.56.103
Content-Length: 86
Content-Type: application/json
{ "arguments": { "origin": 2000 }, "command": "dhcp-disable", "service": [ "dhcp4" ] }
2024-02-28 16:20:13.413 INFO [kea-dhcp4.commands/20519.140151306917568] COMMAND_RECEIVED Received command 'dhcp-disable'
2024-02-28 16:20:13.413 DEBUG [kea-dhcp4.callouts/20519.140151306917568] HOOKS_CALLOUTS_BEGIN begin all callouts for hook command_processed
2024-02-28 16:20:13.413 DEBUG [kea-dhcp4.callouts/20519.140151306917568] HOOKS_CALLOUT_CALLED hooks library with index 1 has called a callout on hook command_processed that has address 0x7f778767ffe0 (callout duration: 0.000 ms)
2024-02-28 16:20:13.413 DEBUG [kea-dhcp4.callouts/20519.140151306917568] HOOKS_CALLOUTS_COMPLETE completed callouts for hook command_processed (total callouts duration: 0.000 ms)
2024-02-28 16:20:13.413 DEBUG [kea-dhcp4.http/20519.140151306917568] HTTP_SERVER_RESPONSE_SEND sending HTTP response HTTP/1.1 200 OK to 192.168.56.102
2024-02-28 16:20:13.413 DEBUG [kea-dhcp4.http/20519.140151306917568] HTTP_SERVER_RESPONSE_SEND_DETAILS detailed information about response sent to 192.168.56.102:
HTTP/1.1 200 OK
Content-Length: 54
Content-Type: application/json
Date: Wed, 28 Feb 2024 15:20:13 GMT
[ { "result": 0, "text": "DHCPv4 service disabled" } ]
2024-02-28 16:20:17.831 DEBUG [kea-dhcp4.dhcpsrv/20519.140151383601024] DHCPSRV_TIMERMGR_RUN_TIMER_OPERATION running operation for timer: reclaim-expired-leases
2024-02-28 16:20:17.831 DEBUG [kea-dhcp4.alloc-engine/20519.140151383601024] ALLOC_ENGINE_V4_LEASES_RECLAMATION_START starting reclamation of expired leases (limit = 100 leases or 250 milliseconds)
2024-02-28 16:20:17.831 DEBUG [kea-dhcp4.dhcpsrv/20519.140151383601024] DHCPSRV_MEMFILE_GET_EXPIRED4 obtaining maximum 101 of expired IPv4 leases
2024-02-28 16:20:17.832 DEBUG [kea-dhcp4.alloc-engine/20519.140151383601024] ALLOC_ENGINE_V4_LEASES_RECLAMATION_COMPLETE reclaimed 0 leases in 0.033 ms
2024-02-28 16:20:17.832 DEBUG [kea-dhcp4.alloc-engine/20519.140151383601024] ALLOC_ENGINE_V4_NO_MORE_EXPIRED_LEASES all expired leases have been reclaimed
2024-02-28 16:20:17.832 DEBUG [kea-dhcp4.dhcpsrv/20519.140151383601024] DHCPSRV_TIMERMGR_START_TIMER starting timer: reclaim-expired-leases
2024-02-28 16:20:21.840 DEBUG [kea-dhcp4.dhcpsrv/20519.140151383601024] DHCPSRV_TIMERMGR_RUN_TIMER_OPERATION running operation for timer: flush-reclaimed-leases
2024-02-28 16:20:21.840 DEBUG [kea-dhcp4.alloc-engine/20519.140151383601024] ALLOC_ENGINE_V4_RECLAIMED_LEASES_DELETE begin deletion of reclaimed leases expired more than 3600 seconds ago
2024-02-28 16:20:21.840 DEBUG [kea-dhcp4.dhcpsrv/20519.140151383601024] DHCPSRV_MEMFILE_DELETE_EXPIRED_RECLAIMED4 deleting reclaimed IPv4 leases that expired more than 3600 seconds ago
2024-02-28 16:20:21.840 DEBUG [kea-dhcp4.alloc-engine/20519.140151383601024] ALLOC_ENGINE_V4_RECLAIMED_LEASES_DELETE_COMPLETE successfully deleted 0 expired-reclaimed leases
2024-02-28 16:20:21.840 DEBUG [kea-dhcp4.dhcpsrv/20519.140151383601024] DHCPSRV_TIMERMGR_START_TIMER starting timer: flush-reclaimed-leases
2024-02-28 16:20:27.852 DEBUG [kea-dhcp4.dhcpsrv/20519.140151383601024] DHCPSRV_TIMERMGR_RUN_TIMER_OPERATION running operation for timer: reclaim-expired-leases
2024-02-28 16:20:27.852 DEBUG [kea-dhcp4.alloc-engine/20519.140151383601024] ALLOC_ENGINE_V4_LEASES_RECLAMATION_START starting reclamation of expired leases (limit = 100 leases or 250 milliseconds)
2024-02-28 16:20:27.852 DEBUG [kea-dhcp4.dhcpsrv/20519.140151383601024] DHCPSRV_MEMFILE_GET_EXPIRED4 obtaining maximum 101 of expired IPv4 leases
2024-02-28 16:20:27.852 DEBUG [kea-dhcp4.alloc-engine/20519.140151383601024] ALLOC_ENGINE_V4_LEASES_RECLAMATION_COMPLETE reclaimed 0 leases in 0.032 ms
2024-02-28 16:20:27.852 DEBUG [kea-dhcp4.alloc-engine/20519.140151383601024] ALLOC_ENGINE_V4_NO_MORE_EXPIRED_LEASES all expired leases have been reclaimed
2024-02-28 16:20:27.852 DEBUG [kea-dhcp4.dhcpsrv/20519.140151383601024] DHCPSRV_TIMERMGR_START_TIMER starting timer: reclaim-expired-leases
2024-02-28 16:20:37.891 DEBUG [kea-dhcp4.dhcpsrv/20519.140151383601024] DHCPSRV_TIMERMGR_RUN_TIMER_OPERATION running operation for timer: reclaim-expired-leases
2024-02-28 16:20:37.892 DEBUG [kea-dhcp4.alloc-engine/20519.140151383601024] ALLOC_ENGINE_V4_LEASES_RECLAMATION_START starting reclamation of expired leases (limit = 100 leases or 250 milliseconds)
2024-02-28 16:20:37.892 DEBUG [kea-dhcp4.dhcpsrv/20519.140151383601024] DHCPSRV_MEMFILE_GET_EXPIRED4 obtaining maximum 101 of expired IPv4 leases
2024-02-28 16:20:37.892 DEBUG [kea-dhcp4.alloc-engine/20519.140151383601024] ALLOC_ENGINE_V4_LEASES_RECLAMATION_COMPLETE reclaimed 0 leases in 0.027 ms
2024-02-28 16:20:37.892 DEBUG [kea-dhcp4.alloc-engine/20519.140151383601024] ALLOC_ENGINE_V4_NO_MORE_EXPIRED_LEASES all expired leases have been reclaimed
2024-02-28 16:20:37.892 DEBUG [kea-dhcp4.dhcpsrv/20519.140151383601024] DHCPSRV_TIMERMGR_START_TIMER starting timer: reclaim-expired-leases
2024-02-28 16:20:43.433 DEBUG [kea-dhcp4.http/20519.140151315310272] HTTP_IDLE_CONNECTION_TIMEOUT_OCCURRED closing persistent connection with 192.168.56.102 as a result of a timeout
2024-02-28 16:20:43.433 DEBUG [kea-dhcp4.http/20519.140151315310272] HTTP_CONNECTION_STOP stopping HTTP connection from 192.168.56.102
```
</details>
[gdb.txt](/uploads/de79e56462885f7947eab90267f7a658/gdb.txt)
kea2.5.8
Marcin Siodelski

https://gitlab.isc.org/isc-projects/kea/-/issues/3252
HA multiple relationships and RADIUS reselect are incompatible
2024-03-27T15:26:47Z
Francis Dupont

Nothing trivial can be done to fix this other than to drop the first query (the RADIUS hook parks the query at the subnet select callout and knows the right subnet when the RADIUS response is received). For other queries, which use cached RADIUS information, correctness relies on the order of the HA and RADIUS hooks (RADIUS before HA).
kea2.6.0

https://gitlab.isc.org/isc-projects/kea/-/issues/3250
Unattended terminated state and a reboot
2024-03-27T21:53:51Z
Marcin Siodelski

Consider the following case. The clocks on two HA-enabled servers diverge and the clock skew eventually exceeds 60 seconds. As a result, both servers transition to the terminated state. In this state, the servers continue serving DHCP clients but exchange neither lease updates nor heartbeats. An administrator neglects to correct the clocks and one of the servers reboots. The server enters the `waiting` state and remains in this state, hoping that the other server is restarted so they can continue the lease database synchronization and start normal operation.
However, the server is unaware that its reboot was not triggered in the course of fixing the clocks, so it will wait for the partner endlessly (or until the administrator comes to work in the morning). The waiting server does not respond to DHCP traffic until then.
This situation should not occur in a setup where NTP has been enabled. It also should not occur if proper monitoring is in place to detect diverging clocks early enough. However, there is a chance it may happen when all of this is neglected.
The proposed solution is to apply a timeout (it could be anywhere from several minutes up to 10 minutes long) for a server in the waiting state. If its partner does not leave the terminated state before this timeout elapses, the server in the waiting state transitions back to the terminated state and continues serving the clients. The waiting server MUST NOT transition to the terminated state immediately after it detects that its partner is in the terminated state, so that the administrator has enough time to reboot the servers sequentially after correcting the clocks.
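The proposed timeout can be sketched as a small decision function. This is only an illustration of the idea under stated assumptions: the class name, the 600-second default, and the way elapsed time is passed in are hypothetical and do not reflect the HA hook's actual state machine.

```python
from dataclasses import dataclass

WAITING = "waiting"
TERMINATED = "terminated"


@dataclass
class WaitingTimeoutPolicy:
    # Assumed value; the proposal only says "several to 10 minutes".
    timeout_seconds: float = 600.0

    def next_state(self, own_state, partner_state, seconds_in_state):
        """Return the state a waiting server should be in.

        Only the case from the proposal is handled: this server is
        waiting and the partner is still terminated. Every other
        combination is left to the regular state machine, represented
        here by returning own_state unchanged.
        """
        if own_state != WAITING or partner_state != TERMINATED:
            return own_state
        if seconds_in_state < self.timeout_seconds:
            # Keep waiting so the administrator can fix the clocks and
            # restart the partner.
            return WAITING
        # Nobody intervened: fall back to serving clients.
        return TERMINATED


policy = WaitingTimeoutPolicy()
print(policy.next_state(WAITING, TERMINATED, 30.0))
print(policy.next_state(WAITING, TERMINATED, 900.0))
```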
[SF1598](https://isc.lightning.force.com/lightning/r/Case/500S6000003jBs3IAE/view)
kea2.5.8

https://gitlab.isc.org/isc-projects/kea/-/issues/3246
DHCPRELEASE and lease expiration in active-standby HA setup
2024-03-27T12:53:48Z
Peter Davies

DHCPRELEASE lease expiration in active-standby HA setup
Kea 2.5.5
When a client sends a DHCPRELEASE message to a Kea primary HA server, the expired
lease processing settings are honoured.
However, the primary updates the failover server with instructions to delete the
lease.
This leads to a divergence of lease data between the two servers.
[SF00001636](https://isc.lightning.force.com/lightning/r/Case/500S6000004XPRy/view)
kea2.5.8

https://gitlab.isc.org/isc-projects/kea/-/issues/3226
HA lease updates do not create an accounting entry in v6
2024-01-25T15:00:10Z
Andrei Pavel
andrei@isc.org

In v6, HA lease updates are done with the `lease6-bulk-apply` command, which is not handled in the `command_processed` RADIUS callout.
This is unlike v4, which does create accounting entries for HA lease updates sent via `lease4-update`.
next-stable-2.6

https://gitlab.isc.org/isc-projects/kea/-/issues/3206
subnet-get commands should fetch leases for selected subnets with pagination
2024-03-22T13:15:53Z
Marcin Siodelski

In HA, we use lease commands to synchronize the database. The lease commands fetch all leases with pagination. However, in the hub-and-spoke model it would be useful to fetch the leases only for selected subnets because the relationships are partitioned by subnet. Today, all leases have to be fetched by each relationship and those that do not belong to the relationship are discarded. This is inefficient. One thing to consider is that subnet identifiers are listed explicitly in the commands.
next-stable-3.0

https://gitlab.isc.org/isc-projects/kea/-/issues/3125
HA ignored packets cause DROP statistics counter increment
2024-03-27T12:58:00Z
Darren Ankney

HA_BUFFER6_RECEIVE_NOT_FOR_US increments drop counters.
- This happens at least with a load balancing configuration.
- I think maybe not with hot-standby since I don't think the service logs anything or cares about incoming client packets unless it loses contact with the HA peer?
- I cite BUFFER6 above but I'm sure the same is true for DHCPv4.
Possible solutions:
- introduce a new drop status that could be discounted later or part of a different drop statistic?
- Could introduce new status that it is ignored or filtered instead of dropped?
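The second suggestion could look roughly like the sketch below. It is a toy model only: the outcome names and the counter names (`received`, `receive-drop`, `receive-ignored-ha`) are hypothetical and are not existing Kea statistics.

```python
from collections import Counter
from enum import Enum


class Outcome(Enum):
    PROCESSED = "processed"
    DROPPED = "dropped"        # e.g. malformed packet or policy drop
    NOT_FOR_US = "not-for-us"  # HA decided the partner should answer


def account(stats, outcome):
    """Update illustrative counters for one received packet."""
    stats["received"] += 1
    if outcome is Outcome.DROPPED:
        stats["receive-drop"] += 1
    elif outcome is Outcome.NOT_FOR_US:
        # Counted separately so HA filtering does not inflate drops.
        stats["receive-ignored-ha"] += 1
    return stats


stats = Counter()
for outcome in (Outcome.PROCESSED, Outcome.NOT_FOR_US,
                Outcome.DROPPED, Outcome.NOT_FOR_US):
    account(stats, outcome)
print(dict(stats))
```

With a separate counter, a load-balancing pair would no longer report roughly half of its traffic as dropped.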
[SF1374](https://isc.lightning.force.com/lightning/r/Case/5007V00002YkO0oQAF/view)
kea2.5.8

https://gitlab.isc.org/isc-projects/kea/-/issues/2932
Kea HA issue with terminating connection
2023-11-10T09:50:24Z
Nick Hahn

We recently migrated our DHCP setup from dhcpd to Kea. It runs on
two servers with hot standby and a memfile backend for the leases. Kea
assigns IP addresses for around 7000 pools.
Over the past few months the HA connection terminated at random intervals.
From looking at the logs on the passive node I can see a lot of
'ResourceBusy: IP address ... could not be updated' warnings prior to
the connection terminating. Since multithreading is enabled I suspected
this may be due to the threads encountering a resource lock on the memfile.
I suppose after the lease update fails a few times, the connection is terminated.
Is the 'ResourceBusy' warning the cause for the terminating HA connection and
is there any way to fix the underlying issue? Any ideas on the issue are greatly
appreciated.
Here are the logs from the primary server:
```
Jun 12 15:04:31 dhcp-1 kea-dhcp4[564812]: WARN [kea-dhcp4.alloc-engine.139625735366400] ALLOC_ENGINE_V4_ALLOC_FAIL_SUBNET [hwtype=1, cid=[], tid=0x0: failed to allocate an IPv4 lease in the subnet 123.123.123.123/30, subnet-id 30926, shared network (none)
Jun 12 15:04:31 dhcp-1 kea-dhcp4[564812]: WARN [kea-dhcp4.alloc-engine.139625735366400] ALLOC_ENGINE_V4_ALLOC_FAIL [hwtype=1], cid=[], tid=0x0: failed to allocate an IPv4 address after 1 attempt(s)
Jun 12 15:04:31 dhcp-1 kea-dhcp4[564812]: WARN [kea-dhcp4.alloc-engine.139625735366400] ALLOC_ENGINE_V4_ALLOC_FAIL_CLASSES [hwtype=1], cid=[], tid=0x0: Failed to allocate an IPv4 address for client with classes: ALL, HA_primary-dhcp, VENDOR_CLASS_MSFT 5.0, UNKNOWN
Jun 12 15:04:39 dhcp-1 kea-dhcp4[564812]: WARN [kea-dhcp4.alloc-engine.139625726973696] ALLOC_ENGINE_V4_ALLOC_FAIL_SUBNET [hwtype=1], cid=[], tid=0x0: failed to allocate an IPv4 lease in the subnet 123.123.123.123/30, subnet-id 30926, shared network (none)
Jun 12 15:04:39 dhcp-1 kea-dhcp4[564812]: WARN [kea-dhcp4.alloc-engine.139625726973696] ALLOC_ENGINE_V4_ALLOC_FAIL [hwtype=1], cid=[], tid=0x0: failed to allocate an IPv4 address after 1 attempt(s)
Jun 12 15:04:39 dhcp-1 kea-dhcp4[564812]: WARN [kea-dhcp4.alloc-engine.139625726973696] ALLOC_ENGINE_V4_ALLOC_FAIL_CLASSES [hwtype=1], cid=[], tid=0x0: Failed to allocate an IPv4 address for client with classes: ALL, HA_primary-dhcp, VENDOR_CLASS_MSFT 5.0, UNKNOWN
Jun 12 15:04:45 dhcp-1 kea-dhcp4[564812]: WARN [kea-dhcp4.ha-hooks.139625718580992] HA_LEASE_UPDATE_CONFLICT [hwtype=1], cid=[], tid=0x0: lease update to standby-dhcp (http://dhcp-2:8001/) returned conflict status code: ResourceBusy: IP address:123.123.123.123 could not be updated. (error code 4)
Jun 12 15:04:56 dhcp-1 kea-dhcp4[564812]: WARN [kea-dhcp4.alloc-engine.139625735366400] ALLOC_ENGINE_V4_ALLOC_FAIL_SUBNET [hwtype=1], cid=[], tid=0x0: failed to allocate an IPv4 lease in the subnet 123.123.123.123/30, subnet-id 30926, shared network (none)
Jun 12 15:04:56 dhcp-1 kea-dhcp4[564812]: WARN [kea-dhcp4.alloc-engine.139625735366400] ALLOC_ENGINE_V4_ALLOC_FAIL [hwtype=1], cid=[], tid=0x0: failed to allocate an IPv4 address after 1 attempt(s)
Jun 12 15:04:56 dhcp-1 kea-dhcp4[564812]: WARN [kea-dhcp4.alloc-engine.139625735366400] ALLOC_ENGINE_V4_ALLOC_FAIL_CLASSES [hwtype=1], cid=[], tid=0x0: Failed to allocate an IPv4 address for client with classes: ALL, HA_primary-dhcp, VENDOR_CLASS_MSFT 5.0, UNKNOWN
Jun 12 15:05:28 dhcp-1 kea-dhcp4[564812]: WARN [kea-dhcp4.alloc-engine.139625726973696] ALLOC_ENGINE_V4_ALLOC_FAIL_SUBNET [hwtype=1], cid=[], tid=0x0: failed to allocate an IPv4 lease in the subnet 123.123.123.123/30, subnet-id 30926, shared network (none)
Jun 12 15:05:28 dhcp-1 kea-dhcp4[564812]: WARN [kea-dhcp4.alloc-engine.139625726973696] ALLOC_ENGINE_V4_ALLOC_FAIL [hwtype=1], cid=[], tid=0x0: failed to allocate an IPv4 address after 1 attempt(s)
Jun 12 15:05:28 dhcp-1 kea-dhcp4[564812]: WARN [kea-dhcp4.alloc-engine.139625726973696] ALLOC_ENGINE_V4_ALLOC_FAIL_CLASSES [hwtype=1], cid=[], tid=0x0: Failed to allocate an IPv4 address for client with classes: ALL, HA_primary-dhcp, VENDOR_CLASS_MSFT 5.0, UNKNOWN
Jun 12 15:05:31 dhcp-1 kea-dhcp4[564812]: WARN [kea-dhcp4.alloc-engine.139625752151808] ALLOC_ENGINE_V4_ALLOC_FAIL_SUBNET [hwtype=1], cid=[], tid=0x0: failed to allocate an IPv4 lease in the subnet 123.123.123.123/30, subnet-id 30926, shared network (none)
Jun 12 15:05:31 dhcp-1 kea-dhcp4[564812]: WARN [kea-dhcp4.alloc-engine.139625752151808] ALLOC_ENGINE_V4_ALLOC_FAIL [hwtype=1], cid=[], tid=0x0: failed to allocate an IPv4 address after 1 attempt(s)
Jun 12 15:05:31 dhcp-1 kea-dhcp4[564812]: WARN [kea-dhcp4.alloc-engine.139625752151808] ALLOC_ENGINE_V4_ALLOC_FAIL_CLASSES [hwtype=1], cid=[], tid=0x0: Failed to allocate an IPv4 address for client with classes: ALL, HA_primary-dhcp, VENDOR_CLASS_MSFT 5.0, UNKNOWN
Jun 12 15:05:39 dhcp-1 kea-dhcp4[564812]: WARN [kea-dhcp4.ha-hooks.139625701795584] HA_LEASE_UPDATE_CONFLICT [hwtype=1], cid=[], tid=0x0: lease update to standby-dhcp (http://dhcp-2:8001/) returned conflict status code: ResourceBusy: IP address:123.123.123.123 could not be updated. (error code 4)
Jun 12 15:05:39 dhcp-1 kea-dhcp4[564812]: WARN [kea-dhcp4.ha-hooks.139625718580992] HA_LEASE_UPDATE_CONFLICT [hwtype=1], cid=[], tid=0x0: lease update to standby-dhcp (http://dhcp-2:8001/) returned conflict status code: ResourceBusy: IP address:123.123.123.123 could not be updated. (error code 4)
Jun 12 15:05:39 dhcp-1 kea-dhcp4[564812]: ERROR [kea-dhcp4.ha-hooks.139625718580992] HA_LEASE_UPDATE_REJECTS_CAUSED_TERMINATION too many rejected lease updates cause the HA service to terminate
Jun 12 15:05:39 dhcp-1 kea-dhcp4[564812]: ERROR [kea-dhcp4.ha-hooks.139625718580992] HA_TERMINATED HA service terminated due to an unrecoverable condition. Check previous error message(s), address the problem and restart!
```
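One way to check how the conflict warnings relate to the termination is to count them in the log up to the termination message. The sketch below is an assumption-laden helper, not an ISC tool: it matches only on the message identifiers visible in the excerpt above, and the sample lines are abbreviated versions of that excerpt. The count is only meaningful when run over the complete log rather than a tail.

```python
def conflicts_before_termination(lines):
    """Count HA_LEASE_UPDATE_CONFLICT lines seen before termination.

    Returns (conflict_count, terminated), where terminated is True if
    HA_LEASE_UPDATE_REJECTS_CAUSED_TERMINATION was found.
    """
    conflicts = 0
    for line in lines:
        if "HA_LEASE_UPDATE_REJECTS_CAUSED_TERMINATION" in line:
            return conflicts, True
        if "HA_LEASE_UPDATE_CONFLICT" in line:
            conflicts += 1
    return conflicts, False


# Abbreviated lines taken from the primary server excerpt.
PRIMARY_LOG = [
    "15:04:39 WARN ALLOC_ENGINE_V4_ALLOC_FAIL_CLASSES classes: ALL, HA_primary-dhcp",
    "15:04:45 WARN HA_LEASE_UPDATE_CONFLICT ResourceBusy (error code 4)",
    "15:05:39 WARN HA_LEASE_UPDATE_CONFLICT ResourceBusy (error code 4)",
    "15:05:39 WARN HA_LEASE_UPDATE_CONFLICT ResourceBusy (error code 4)",
    "15:05:39 ERROR HA_LEASE_UPDATE_REJECTS_CAUSED_TERMINATION too many rejected lease updates",
    "15:05:39 ERROR HA_TERMINATED HA service terminated",
]

result = conflicts_before_termination(PRIMARY_LOG)
print(result)
```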
Here are the logs from the standby server:
```
Mar 12 19:25:06 dhcp-2 kea-dhcp4[203037]: WARN [kea-dhcp4.lease-cmds-hooks.139670034884352] LEASE_CMDS_UPDATE4_CONFLICT lease4-update command failed due to conflict (parameters: { "client-id": "", "expire": 1678688706, "force-create": true, "fqdn-fwd": false, "fqdn-rev": false, "hostname": "", "hw-address": "", "ip-address": "", "state": 0, "subnet-id": 2907, "valid-lft": 43200 }, reason: ResourceBusy: IP address:123.123.123.123 could not be updated.)
Mar 12 19:25:06 dhcp-2 kea-dhcp4[203037]: WARN [kea-dhcp4.lease-cmds-hooks.139670009706240] LEASE_CMDS_UPDATE4_CONFLICT lease4-update command failed due to conflict (parameters: { "client-id": "", "expire": 1678688706, "force-create": true, "fqdn-fwd": false, "fqdn-rev": false, "hostname": "", "hw-address": "", "ip-address": "", "state": 0, "subnet-id": 2907, "valid-lft": 43200 }, reason: ResourceBusy: IP address:123.123.123.123 could not be updated.)
Mar 12 19:27:28 dhcp-2 kea-dhcp4[203037]: WARN [kea-dhcp4.lease-cmds-hooks.139670009706240] LEASE_CMDS_UPDATE4_CONFLICT lease4-update command failed due to conflict (parameters: { "client-id": "", "expire": 1678688848, "force-create": true, "fqdn-fwd": false, "fqdn-rev": false, "hostname": "", "hw-address": "", "ip-address": "", "state": 0, "subnet-id": 3812, "valid-lft": 43200 }, reason: ResourceBusy: IP address:123.123.123.123 could not be updated.)
Mar 12 19:32:05 dhcp-2 kea-dhcp4[203037]: WARN [kea-dhcp4.lease-cmds-hooks.139670018098944] LEASE_CMDS_UPDATE4_CONFLICT lease4-update command failed due to conflict (parameters: { "client-id": "", "expire": 1678689125, "force-create": true, "fqdn-fwd": false, "fqdn-rev": false, "hostname": "", "hw-address": "", "ip-address": "", "state": 0, "subnet-id": 274, "valid-lft": 43200 }, reason: ResourceBusy: IP address:123.123.123.123 could not be updated.)
Mar 12 19:32:34 dhcp-2 kea-dhcp4[203037]: WARN [kea-dhcp4.lease-cmds-hooks.139670009706240] LEASE_CMDS_UPDATE4_CONFLICT lease4-update command failed due to conflict (parameters: { "client-id": "", "expire": 1678689154, "force-create": true, "fqdn-fwd": false, "fqdn-rev": false, "hostname": "", "hw-address": "", "ip-address": "", "state": 0, "subnet-id": 113, "valid-lft": 43200 }, reason: ResourceBusy: IP address:123.123.123.123 could not be updated.)
Mar 12 19:32:36 dhcp-2 kea-dhcp4[203037]: ERROR [kea-dhcp4.ha-hooks.139670104323840] HA_TERMINATED HA service terminated due to an unrecoverable condition. Check previous error message(s), address the problem and restart!
Mar 12 22:11:09 dhcp-2 kea-dhcp4[203037]: ERROR [kea-dhcp4.packets.139670138794688] DHCP4_BUFFER_RECEIVE_FAIL error on attempt to receive packet: Truncated DHCPv4 packet (len=0) received, at least 236 is expected.
```
The relevant config is the following on both hosts, differing only in the "this-server-name" property.
```
"hooks-libraries": [{
"library": "/usr/lib/x86_64-linux-gnu/kea/hooks/libdhcp_lease_cmds.so",
"parameters": {}
},
{
"library": "/usr/lib/x86_64-linux-gnu/kea/hooks/libdhcp_stat_cmds.so",
"parameters": {}
},
{
"library": "/usr/lib/x86_64-linux-gnu/kea/hooks/libdhcp_ha.so",
"parameters": {
"high-availability": [{
"this-server-name": "standby-dhcp",
"mode": "hot-standby",
"heartbeat-delay": 10000,
"max-response-delay": 60000,
"max-ack-delay": 5000,
"max-unacked-clients": 5,
"peers": [{
"name": "primary-dhcp",
"url": "http://dhcp-1:8001/",
"role": "primary",
"auto-failover": true
}, {
"name": "standby-dhcp",
"url": "http://dhcp-2:8001/",
"role": "standby",
"auto-failover": true
}]
}]
}
}]
```
next-stable-2.6

https://gitlab.isc.org/isc-projects/kea/-/issues/2897
Cross-check - server should check its HA partner config
2023-06-15T13:50:50Z
Tomek Mrugalski

Here's an idea for a new HA capability. On startup (or when an explicit command is called), the server retrieves its partner's configuration with `config-get` and checks it for consistency: whether the subnets and pools are defined the same way, whether the subnet-ids match, etc.
Right now the doc says those should be the same, with the only difference being server-name, but we don't check it.
What to do with spotted differences is to be determined. We could print a warning, refuse the HA connection, shut down, or perhaps even have the primary attempt to fix its partner's config.
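The consistency check could start as simply as the sketch below. It is a rough illustration of the idea, not a design: it assumes both inputs are configuration objects of the shape shown in the configs elsewhere in this feed (for example, the `arguments` part of a `config-get` reply), it compares only subnet prefixes, subnet ids, and pools, and the function names are made up.

```python
def subnet_summary(config, family="Dhcp4"):
    """Map each subnet prefix to its id and sorted pool list."""
    key = "subnet4" if family == "Dhcp4" else "subnet6"
    summary = {}
    for subnet in config.get(family, {}).get(key, []):
        pools = sorted(p["pool"] for p in subnet.get("pools", []))
        summary[subnet["subnet"]] = {"id": subnet.get("id"), "pools": pools}
    return summary


def cross_check(local, partner, family="Dhcp4"):
    """Return a list of human-readable differences (empty if none)."""
    mine = subnet_summary(local, family)
    theirs = subnet_summary(partner, family)
    problems = []
    for prefix in sorted(set(mine) | set(theirs)):
        if prefix not in theirs:
            problems.append(f"{prefix}: missing on partner")
        elif prefix not in mine:
            problems.append(f"{prefix}: missing locally")
        elif mine[prefix] != theirs[prefix]:
            problems.append(f"{prefix}: id or pools differ")
    return problems
```

What the server does with a non-empty result (warn, refuse the connection, shut down) is the open question raised above.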
This is merely an idea. If we like it, the first step would be to turn this into a more coherent design. Hence the ~design.
backlog

https://gitlab.isc.org/isc-projects/kea/-/issues/2775
HA hook's URLs should support DNS resolution with configurable re-resolution
2023-04-06T13:43:06Z
Tobias Florek

---
name: Feature request
about: Allow using DNS resolution in HA hook's URLs
---
**Some initial questions**
- Are you sure your feature is not already implemented in the latest Kea version? **yes**
- Are you sure what you would like to do is not possible using some other mechanisms? **not reasonable**
- Have you discussed your idea on kea-users or kea-dev mailing lists? **no**
**Is your feature request related to a problem? Please describe.**
I am deploying HA Kea on Kubernetes where (using SDNs) pod(/container) IPs are not constant. The hostname can be made persistent though.
Now I can create a Kubernetes service per Pod which will assign a so-called cluster IP which is stable and gets redirected to the pod.
This works alright for HA communicating to the control agent, but not using the dedicated listener.
**Describe the solution you'd like**
Preferably allow using DNS (re-)resolution for HA hook's URLs. Or allow specifying the listener's bind-address.
**Funding its development**
Kea is run by ISC, which is a small non-profit organization without any government funding or any permanent sponsorship organizations. Are you able and willing to participate financially in the development costs? **no**
**Participating in development**
Are you willing to participate in the feature development? ISC team always tries to make a feature as generic as possible, so it can be used in a wide variety of situations. That means the proposed solution may be a bit different than you initially thought. Are you willing to take part in the design discussions? Are you willing to test unreleased engineering code? **yes**
**Contacting you**
preferably via gitlab.
backlog
https://gitlab.isc.org/isc-projects/kea/-/issues/2714
RFE: HA plugin ability to detect partner inability to receive client requests and transition it to 'partner-down'
2023-07-31T14:12:57Z
Kevin Fleming
---
name: Feature request
about: HA plugin ability to detect partner inability to receive client requests and transition it to 'partner-down'
---
**Some initial questions**
- Are you sure your feature is not already implemented in the latest Kea version? Yes
- Are you sure what you would like to do is not possible using some other mechanisms? Yes
- Have you discussed your idea on kea-users or kea-dev mailing lists? Yes
**Is your feature request related to a problem? Please describe.**
(This issue was created as a result of an extensive thread on kea-users)
When the HA plugin is being used in either hot-standby or load-balancing mode, Kea peers are able to notice some forms of communications failures and force the other peers to the 'partner-down' state in order to provide service to clients supported by the other peer.
However, in a situation where client requests are not being delivered to a peer, but it is otherwise fully operational including the peer-to-peer communications link, clients supported by that peer will not be serviced, but the other peer(s) are unable to notice the issue and take action to correct it. This situation could arise when the Kea peers are using separate network links for client traffic and HA traffic, or when the Kea peers are receiving client traffic via a DHCP relay and the relay configuration is incorrect.
**Describe the solution you'd like**
One (or more) opt-in mechanisms that the Kea admin can choose to enhance the ability to detect peer failures to service clients, even when the peer's Kea daemon is otherwise fully operational.
**Describe alternatives you've considered**
Some discussions about external monitoring solutions have occurred, and that is certainly an option which some admins could choose.
**Funding its development**
Kea is run by ISC, which is a small non-profit organization without any government funding or any permanent sponsorship organizations. Are you able and willing to participate financially in the development costs? Yes
**Participating in development**
Are you willing to participate in the feature development? ISC team always tries to make a feature as generic as possible, so it can be used in a wide variety of situations. That means the proposed solution may be a bit different than you initially thought. Are you willing to take part in the design discussions? Are you willing to test unreleased engineering code? Yes
next-stable-2.6
https://gitlab.isc.org/isc-projects/kea/-/issues/2708
HA pool rebalancing
2023-02-02T14:23:33Z
Tomek Mrugalski
This idea is not new. It was recently brought up by @cathya in Porto (see [notes](https://pad.isc.org/p/porto2022-kea-features-for-stork#L58)). The overall concept is to design and implement a mechanism similar to the one in ISC DHCP. When there are two servers in load-balancing, it is possible that one of them will run out of addresses while the other one still has many.
A couple of random comments:
- The pool rebalancing would somehow make both partners negotiate the pools and rebalance them.
- Using a hysteresis approach with high/low thresholds would keep the mechanism from going crazy when running out of addresses. We don't want it to rebalance back and forth when there are only one or two addresses left.
- The pool dynamism would add extra complexity as the modified pool range would need to be stored somewhere that would survive crashes/reboots etc.
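To illustrate the hysteresis idea from the comments above, here is a minimal sketch. The class name `RebalanceTrigger` and the 10%/25% thresholds are made up for illustration; nothing here reflects an existing Kea or ISC DHCP mechanism.

```python
from dataclasses import dataclass


@dataclass
class RebalanceTrigger:
    """Decide whether a server should be asking its partner for addresses."""
    low: float = 0.10    # start rebalancing when the free share drops below this
    high: float = 0.25   # stop once the free share has recovered above this
    active: bool = False

    def update(self, free, total):
        """Feed the current pool usage; returns True while rebalancing is wanted."""
        share = free / total  # assumes total > 0
        if not self.active and share < self.low:
            self.active = True
        elif self.active and share > self.high:
            self.active = False
        return self.active


trigger = RebalanceTrigger()
for free in (30, 9, 15, 26, 15):
    print(free, trigger.update(free, 100))
```

Between the two thresholds the trigger keeps its previous state, so a pool hovering around a single cut-off value does not flip the mechanism on and off.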
This requires a ~design. It's a complicated feature request with a high potential for endless tweaks, conflicting tuning requests etc.
We will do it one day, but this would require a lot of design, testing and tuning.
outstanding
https://gitlab.isc.org/isc-projects/kea/-/issues/2700
HA Load-Balancing Network issue detection between Relay and Kea
2023-01-26T15:22:15Z
Mathias Aichinger
Hi,
I have already tried to resolve this issue with the kea users community, but it seems not many are using HA Load Balancing.
I have the following problem.
Scenario:
Multiple DHCP-Relays at different sites with both KEA-Servers as DHCP-Servers. Both servers are available and the load balancing shifts the requests between the two servers.
Incident: Because of a network issue, Kea 1 is not reachable from the clients. The network connection between Kea 1 and Kea 2 still works, so there is no partner-down state.
Expected behaviour: Kea 2 sees the unacked clients of Kea 1, puts Kea 1 in the partner-down state, and handles all requests.
Experienced behaviour: Kea 2 still reports HA_BUFFER4_RECEIVE_NOT_FOR_US and does not handle the requests. Unacked clients are not counted.
Is there a misunderstanding or configuration mistake on my side?
```
{
    "library": "/usr/local/lib/kea//hooks/libdhcp_ha.so",
    "parameters": {
        "high-availability": [
            {
                "this-server-name": "server2",
                "mode": "load-balancing",
                "heartbeat-delay": 10000,
                "max-response-delay": 60000,
                "max-ack-delay": 10000,
                "max-unacked-clients": 1,
                "delayed-updates-limit": 100,
                "peers": [
                    {
                        "name": "server1",
                        "url": "http://192.168.248.1:8080/",
                        "role": "primary",
                        "auto-failover": true
                    },
                    {
                        "name": "server2",
                        "url": "http://192.168.248.2:8080/",
                        "role": "secondary",
                        "auto-failover": true
                    }
                ]
            }
        ]
    }
}
```
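For readers following along, here is a simplified model of how `max-ack-delay` and `max-unacked-clients` from the configuration above interact. It is a sketch based on how the HA documentation describes these parameters, not Kea's implementation: the class `FailureDetector` is hypothetical, and both the strict "greater than" comparisons and the rule that counting only happens while HA communication with the partner is interrupted should be treated as assumptions.

```python
class FailureDetector:
    """Toy model of unacked-client counting for one HA partner."""

    def __init__(self, max_ack_delay_ms, max_unacked_clients):
        self.max_ack_delay_ms = max_ack_delay_ms
        self.max_unacked_clients = max_unacked_clients
        self.unacked = set()

    def on_query(self, client_id, secs, partner_link_up):
        """Process one query addressed to the partner; True means 'go partner-down'."""
        if partner_link_up:
            # Assumption: while heartbeats succeed, nothing is counted.
            self.unacked.clear()
            return False
        # The DHCPv4 "secs" field is in seconds; max-ack-delay is in milliseconds.
        if secs * 1000 > self.max_ack_delay_ms:
            self.unacked.add(client_id)
        return len(self.unacked) > self.max_unacked_clients


# Values taken from the configuration above.
detector = FailureDetector(max_ack_delay_ms=10000, max_unacked_clients=1)
print(detector.on_query("client-a", secs=30, partner_link_up=True))
print(detector.on_query("client-a", secs=11, partner_link_up=False))
print(detector.on_query("client-b", secs=12, partner_link_up=False))
```

Under these assumptions, a partner that still answers heartbeats never accumulates unacked clients, which would match the behaviour reported here.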
Thank you,
Mathias
backlog
https://gitlab.isc.org/isc-projects/kea/-/issues/2592
partner-down state transition when max-unacked-clients reached
2022-11-24T14:45:22Z
Marcin Siodelski
Suppose the server lost the connection with its partner. The server begins the failover procedure by checking whether or not the partner responds to the DHCP queries. The `max-unacked-clients` setting controls how many different clients should retry getting the lease with the increased value of the `secs` field before the server considers the partner dead. One would expect the server to make the `partner-down` transition as soon as the number of unacked clients reaches the configured number. In fact, the state transitions are generally performed when the server completes a heartbeat or a lease update. It is possible that under heavy traffic there will be a much larger number of unacked clients and the server still sits in the normal state (e.g. hot-standby), waiting for the heartbeat trigger. Assuming the heartbeat interval is reasonable, it should probably be fine. However, we may consider starting the transition as soon as the number of unacked clients reaches the configured maximum.
backlog
https://gitlab.isc.org/isc-projects/kea/-/issues/2565
HA lease v6 updates use the default hwtype and hwaddr_source
2023-07-31T13:51:18Z
Andrei Pavel
andrei@isc.org
Notice the discrepancy in the last two columns:
* `server1`:
```
address,duid,valid_lifetime,expire,subnet_id,pref_lifetime,lease_type,iaid,prefix_len,fqdn_fwd,fqdn_rev,hostname,hwaddr,state,user_context,hwtype,hwaddr_source
2001:db8:50::11,00:03:00:01:01:03:0d:04:0b:01,4000,1663013972,1,3000,0,5946,128,0,0,,01:03:0d:04:0b:01,0,,1,0
2001:db8:50::12,00:03:00:01:01:04:0e:05:0c:02,4000,1663013972,1,3000,0,3512,128,0,0,,01:04:0e:05:0c:02,0,,1,0
2001:db8:50::d,00:03:00:01:01:05:0f:06:0d:03,4000,1663013972,1,3000,0,5918,128,0,0,,01:05:0f:06:0d:03,0,,1,2
2001:db8:50::e,00:03:00:01:01:06:10:07:0e:04,4000,1663013973,1,3000,0,4936,128,0,0,,01:06:10:07:0e:04,0,,1,2
```
* `server2`:
```
address,duid,valid_lifetime,expire,subnet_id,pref_lifetime,lease_type,iaid,prefix_len,fqdn_fwd,fqdn_rev,hostname,hwaddr,state,user_context,hwtype,hwaddr_source
2001:db8:50::11,00:03:00:01:01:03:0d:04:0b:01,4000,1663013972,1,3000,0,5946,128,0,0,,01:03:0d:04:0b:01,0,,1,2
2001:db8:50::12,00:03:00:01:01:04:0e:05:0c:02,4000,1663013972,1,3000,0,3512,128,0,0,,01:04:0e:05:0c:02,0,,1,2
2001:db8:50::d,00:03:00:01:01:05:0f:06:0d:03,4000,1663013972,1,3000,0,5918,128,0,0,,01:05:0f:06:0d:03,0,,1,0
2001:db8:50::e,00:03:00:01:01:06:10:07:0e:04,4000,1663013973,1,3000,0,4936,128,0,0,,01:06:10:07:0e:04,0,,1,0
```
The ones with `hwaddr_source = 0` are updated from the other peer. `hwtype = 1` is also likely a default that happens to match its source in the examples above.
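The discrepancy can be spotted mechanically. The following is a small sketch using only the Python standard library; the helper names `hw_columns` and `diff_hw_columns` are made up, and the sample rows are copied from the lease files above.

```python
import csv
import io

HEADER = ("address,duid,valid_lifetime,expire,subnet_id,pref_lifetime,lease_type,"
          "iaid,prefix_len,fqdn_fwd,fqdn_rev,hostname,hwaddr,state,user_context,"
          "hwtype,hwaddr_source\n")

SERVER1 = HEADER + (
    "2001:db8:50::11,00:03:00:01:01:03:0d:04:0b:01,4000,1663013972,1,3000,0,5946,128,0,0,,01:03:0d:04:0b:01,0,,1,0\n"
    "2001:db8:50::d,00:03:00:01:01:05:0f:06:0d:03,4000,1663013972,1,3000,0,5918,128,0,0,,01:05:0f:06:0d:03,0,,1,2\n"
)
SERVER2 = HEADER + (
    "2001:db8:50::11,00:03:00:01:01:03:0d:04:0b:01,4000,1663013972,1,3000,0,5946,128,0,0,,01:03:0d:04:0b:01,0,,1,2\n"
    "2001:db8:50::d,00:03:00:01:01:05:0f:06:0d:03,4000,1663013972,1,3000,0,5918,128,0,0,,01:05:0f:06:0d:03,0,,1,0\n"
)


def hw_columns(lease_csv):
    """Map address -> (hwtype, hwaddr_source); a later row for an address wins."""
    reader = csv.DictReader(io.StringIO(lease_csv))
    return {row["address"]: (row["hwtype"], row["hwaddr_source"]) for row in reader}


def diff_hw_columns(a_csv, b_csv):
    """Return the addresses whose hwtype/hwaddr_source differ between two files."""
    a, b = hw_columns(a_csv), hw_columns(b_csv)
    return {addr: (a[addr], b[addr])
            for addr in sorted(a.keys() & b.keys()) if a[addr] != b[addr]}


for address, (first, second) in diff_hw_columns(SERVER1, SERVER2).items():
    print(address, "server1:", first, "server2:", second)
```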
It looks like `Lease6::toElement()` and `Lease6Parser::parse()` need the `hwtype` and `hwaddr_source` capabilities.
backlog
https://gitlab.isc.org/isc-projects/kea/-/issues/2427
Kea HA hot-standby mode - standby peer not catching up
2023-07-31T13:42:46Z
favq
Hi,
I'm testing a Kea HA setup in hot-standby mode, with the following settings:
* Kea 2.0.1 DHCPv4 + control agent.
* Two Kea instances: one "primary" and the other "standby".
* memfile backend with file persistence enabled.
* Lease synchronization enabled in the HA setup.
* The only hooks libraries in use are ha and lease_cmds.
I ran perfdhcp simulating multiple clients against the primary. After a while of sending many requests to the primary, I see that both instances have stored leases, but the standby didn't completely catch up with the primary.
That is, when I inspect the leases on both instances using the lease4-get-all API command, I see that the number of leases did increase on both instances, but the standby has fewer leases than the primary.
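A comparison like the one described above can be scripted. This is a sketch that assumes the two `lease4-get-all` responses have already been fetched and parsed from JSON; the function names are hypothetical and the sample responses are made up. It relies on each lease carrying an `ip-address` field and on responses relayed by the Control Agent being wrapped in a list.

```python
def lease_addresses(response):
    """Extract the set of IP addresses from a parsed lease4-get-all response."""
    # Responses relayed by the Control Agent arrive wrapped in a list.
    if isinstance(response, list):
        response = response[0]
    leases = response.get("arguments", {}).get("leases", [])
    return {lease["ip-address"] for lease in leases}


def missing_on_standby(primary_response, standby_response):
    """Addresses leased on the primary that the standby does not know about."""
    return sorted(lease_addresses(primary_response) - lease_addresses(standby_response))


# Made-up responses for illustration.
primary = [{"result": 0, "arguments": {"leases": [
    {"ip-address": "192.0.2.10"}, {"ip-address": "192.0.2.11"}]}}]
standby = [{"result": 0, "arguments": {"leases": [
    {"ip-address": "192.0.2.10"}]}}]

print(missing_on_standby(primary, standby))
```

Listing the missing addresses, rather than only comparing counts, shows which clients' updates never reached the standby.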
If I manually call the ha-sync API command, or if I restart the standby, or if I reload the configuration in the standby, the standby does a sync and catches up with the primary, and the number of leases becomes equal again. However, if I then run perfdhcp repeatedly, standby eventually starts falling behind again.
Note that, when this happens, if I call the "ha-heartbeat" API command on both instances, they both report an "unsent-update-count" of 0.
A similar thing happens with DHCPv6.
Is this behavior expected? Is it normal for the standby to not catch up with the primary during HA operation, needing manual intervention ("ha-sync", restart or config reload) to catch up?
Thank you.
outstanding
https://gitlab.isc.org/isc-projects/kea/-/issues/2339
Memory leak in HA scenario with backup server down
2023-09-07T14:02:26Z
Branimir Rajtar
---
name: Memory leak in HA scenario with backup server down
about: Memory loss is created on running instances
---
**Describe the bug**
HA mode is configured with three servers (primary, secondary, backup) and is serving clients. When the backup server becomes unavailable, the primary and secondary experience a continuous memory leak, which manifests as a continuous increase in RSS memory use for the isc-kea-dhcp4-server process. The size of the leak correlates directly with the number of active clients: the more clients, the greater the leak. Once the backup server is deleted from the configuration or becomes active again, memory use stops growing, but the memory already leaked is not freed.
**To Reproduce**
Steps to reproduce the behavior:
1. Run KEA (DHCP4 only) in HA scenario with two load-balancing servers (primary and secondary) and a single backup server
2. Start serving clients (40k in our scenario) and monitoring RSS usage for the KEA server process
3. Disable backup server
4. Verify that RSS usage is increasing continuously
5. Enable backup server
6. Verify that RSS usage is stable
**Expected behavior**
The servers should not have any memory leaks.
**Environment:**
- Kea version: 1.8.2, 2.0.2
- OS: Ubuntu 18.04
- Memfile
- libdhcp_lease_cmds, libdhcp_stat_cmds, libdhcp_ha
**Additional Information**
```
{
    "Dhcp4": {
        "dhcp-queue-control": {
            "enable-queue": true,
            "queue-type": "kea-ring4",
            "capacity": 256
        },
        "interfaces-config": {
            "interfaces": [
                "eth1"
            ],
            "dhcp-socket-type": "udp"
        },
        "control-socket": {
            "socket-type": "unix",
            "socket-name": "/tmp/kea-dhcp4-ctrl.sock"
        },
        "lease-database": {
            "type": "memfile",
            "persist": true,
            "name": "/var/lib/kea/dhcp4.leases",
            "lfc-interval": 3600,
            "port": 0
        },
        "expired-leases-processing": {
            "reclaim-timer-wait-time": 10,
            "flush-reclaimed-timer-wait-time": 25,
            "hold-reclaimed-time": 3600,
            "max-reclaim-leases": 100,
            "max-reclaim-time": 250,
            "unwarned-reclaim-cycles": 5
        },
        "renew-timer": 60,
        "rebind-timer": 100,
        "valid-lifetime": 120,
        "option-data": [],
        "hooks-libraries": [
            {
                "library": "/usr/lib/x86_64-linux-gnu/kea/hooks/libdhcp_lease_cmds.so",
                "parameters": {}
            },
            {
                "library": "/usr/lib/x86_64-linux-gnu/kea/hooks/libdhcp_stat_cmds.so"
            },
            {
                "library": "/usr/lib/x86_64-linux-gnu/kea/hooks/libdhcp_ha.so",
                "parameters": {
                    "high-availability": [
                        {
                            "this-server-name": "server3",
                            "mode": "load-balancing",
                            "heartbeat-delay": 3000,
                            "max-response-delay": 7000,
                            "max-ack-delay": 7000,
                            "max-unacked-clients": 20,
                            "peers": [
                                {
                                    "name": "server2",
                                    "url": "http://<XXX>:8080/",
                                    "role": "secondary",
                                    "auto-failover": true
                                },
                                {
                                    "name": "server1",
                                    "url": "http://<YYY>:8080/",
                                    "role": "primary",
                                    "auto-failover": true
                                },
                                {
                                    "name": "server3",
                                    "url": "http://<ZZZ>:8080/",
                                    "role": "backup",
                                    "auto-failover": true
                                }
                            ]
                        }
                    ]
                }
            }
        ],
        "option-def": [
            {
                "name": "classless-static-route",
                "code": 121,
                "space": "dhcp4",
                "type": "record",
                "array": true,
                "record-types": "uint8, uint8"
            }
        ],
        "client-classes": [
            // anonymized
        ],
        "subnet4": [
            // anonymized
        ],
        "reservations": [],
        "loggers": [
            {
                "name": "kea-dhcp4",
                "output_options": [
                    {
                        "output": "syslog"
                    }
                ],
                "severity": "error",
                "debuglevel": 0
            }
        ]
    }
}
```
**Contacting you**
Email/Github, telephone is available after contact
next-stable-2.6
https://gitlab.isc.org/isc-projects/kea/-/issues/2223
HA error when partner received duplicated DHCP requests with the same Transaction ID
2023-07-31T13:39:24Z
Spencer Lowe
**Describe the bug**
Because of our redundant topology, the same Kea server gets the same DHCP request message with the same transaction ID from different relays. We run our Kea servers in the load-balancing HA mode. Server 1 will send the lease4-update to server 2. Because server 1 received the same packet twice with the same transaction ID, it sends both updates to server 2. Server 2 then responds with resource busy because it is still updating the lease from the first request when the second request comes in. Because server 2 errors out, server 1 puts the server into the unknown state. It will then resync, but it breaks on every DHCP request. We are storing leases in Postgres.
**To Reproduce**
Steps to reproduce the behavior:
1. Run Kea dhcpv4 in load balancing mode. Send a duplicate DHCP request with the same transaction ID to a server.
2. Server 1 will process both DHCP requests and will send the lease4-update to server 2. Both of these requests happen at almost the same time.
3. Server 1 will then put server 2 into unknown state because the lease4-update command failed.
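For illustration only, one way duplicates like these could be recognized is by remembering recently seen (transaction ID, hardware address) pairs for a short time. This is a hypothetical sketch, not a description of how Kea handles duplicates; the class name `DuplicateFilter` and the one-second window are assumptions.

```python
import time


class DuplicateFilter:
    """Remember (xid, hwaddr) pairs for a short window and flag repeats."""

    def __init__(self, window=1.0, clock=time.monotonic):
        self.window = window  # seconds during which a repeat counts as a duplicate
        self.clock = clock    # injectable clock, which makes the filter easy to test
        self.seen = {}

    def is_duplicate(self, xid, hwaddr):
        now = self.clock()
        # Forget entries that are older than the window.
        self.seen = {k: t for k, t in self.seen.items() if now - t < self.window}
        key = (xid, hwaddr)
        if key in self.seen:
            return True
        self.seen[key] = now
        return False


# The same request arriving through two relays, then a retry much later.
fake_now = [0.0]
dup_filter = DuplicateFilter(window=1.0, clock=lambda: fake_now[0])
print(dup_filter.is_duplicate(0x1234ABCD, "01:03:0d:04:0b:01"))
print(dup_filter.is_duplicate(0x1234ABCD, "01:03:0d:04:0b:01"))
fake_now[0] = 2.0
print(dup_filter.is_duplicate(0x1234ABCD, "01:03:0d:04:0b:01"))
```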
**Expected behavior**
I would expect the update to not fail, but I am not sure how DHCP servers are supposed to handle duplicate requests with the same transaction IDs
**Environment:**
- Kea dhcp4 v 2.0.0
- OS: [e.g. Debian 11 x64]
- Storing leases in Postgres
- `libdhcp_lease_cmds`, `libdhcp_ha`
**Additional Information**
- I have attached a PCAP that shows the DHCP requests and the lease4-update command
- Attached is the kea config that runs on both servers
[ss21-dhcp-debug.pcap](/uploads/f4c0beee834c75b7131e4e485af12d47/ss21-dhcp-debug.pcap)
[kea-config.json](/uploads/a5a5a89cd10a736706b93ac7eabd4e9d/kea-config.json)
**Contacting you**
slowe@clairglobal.com
next-stable-2.6
https://gitlab.isc.org/isc-projects/kea/-/issues/1914
HAServiceTest.sendSuccessfulUpdatesAuthorizedMultiThreading sometimes fails
2023-02-27T13:41:09Z
Andrei Pavel
andrei@isc.org
This time it happened on distcheck on CentOS 8.
https://jenkins.aws.isc.org/job/kea-dev/job/distcheck/415/execution/node/136/log/?consoleFull
```
16:04:40 [ RUN ] HAServiceTest.sendSuccessfulUpdatesAuthorizedMultiThreading
16:04:40 ../../../../../../../src/hooks/dhcp/high_availability/tests/ha_service_unittest.cc:1096: Failure
16:04:40 Expected equality of these values:
16:04:40 2
16:04:40 factory3_->getResponseCreator()->getReceivedRequests().size()
16:04:40 Which is: 1
16:04:40 ../../../../../../../src/hooks/dhcp/high_availability/tests/ha_service_unittest.cc:1102: Failure
16:04:40 Value of: update_request3
16:04:40 Actual: false
16:04:40 Expected: true
16:04:40 [ FAILED ] HAServiceTest.sendSuccessfulUpdatesAuthorizedMultiThreading (2 ms)
```
backlog