Kea primary server in "passive backup" freeze/crash on receiving ha-sync
Kea HA server set as primary
freezes after receiving ha-sync
command with proper arguments.
The backup server does NOT crash.
Freeze occurs only in passive-backup
mode.
The problem exists in both v4 and v6. Also, in Memfile and mysql/psql lease database.
Kea versions tested:
- 2.5.7-git 8c1f22e3
- 2.4.0 release tarball
Tested systems:
- Fedora 38 in VM on my local setup.
- Ubuntu 22.04, Alpine 3.16, Fedora 36 on Jenkins build farm.
To Reproduce
Steps to reproduce the behavior:
- Run Kea HA servers in Passive backup configuration (tested configuration provided)
- Wait for servers to connect.
- Optionally add leases (crashes either way)
- Send the
ha-sync
command with proper arguments to the primary server. ("server-name": "server2"
for provided configuration) (Invalid arguments respond with error) The primary server freezes after receiving a response to thedhcp-disable
command, sent automatically to the backup server. It does not respond to kea-ctrl agent, keyboard interrupts or SIGHUP
Commands tested to freeze provided config:
{
command": "ha-sync",
"arguments": {
"server-name": "server2"
}
}
{
command": "ha-sync",
"arguments": {
"server-name": "server1"
}
}
{
command": "ha-sync",
"arguments": {
"server-name": "server2",
"max-period": 60
}
}
Configuration
Primary
{
"Dhcp4": {
"option-data": [],
"hooks-libraries": [
{
"library": "/home/mgodzina/installed/keadev/lib/kea/hooks/libdhcp_ha.so",
"parameters": {
"high-availability": [
{
"peers": [
{
"auto-failover": true,
"name": "server1",
"role": "primary",
"url": "http://192.168.56.102:8003/"
},
{
"auto-failover": true,
"name": "server2",
"role": "backup",
"url": "http://192.168.56.103:8003/"
}
],
"state-machine": {
"states": []
},
"mode": "passive-backup",
"this-server-name": "server1",
"multi-threading": {
"enable-multi-threading": true,
"http-dedicated-listener": true,
"http-listener-threads": 0,
"http-client-threads": 0
}
}
]
}
},
{
"library": "/home/mgodzina/installed/keadev/lib/kea/hooks/libdhcp_lease_cmds.so"
}
],
"shared-networks": [],
"subnet4": [
{
"subnet": "192.168.50.0/24",
"pools": [
{
"pool": "192.168.50.1-192.168.50.200"
}
],
"interface": "enp0s9"
}
],
"interfaces-config": {
"interfaces": [
"enp0s9"
]
},
"control-socket": {
"socket-type": "unix",
"socket-name": "/home/mgodzina/installed/keadev/var/run/kea/control_socket"
},
"renew-timer": 1000,
"rebind-timer": 2000,
"valid-lifetime": 4000,
"loggers": [
{
"name": "kea-dhcp4",
"output-options": [
{
"output": "/home/mgodzina/installed/keadev/var/log/kea.log"
}
],
"severity": "DEBUG",
"debuglevel": 99
}
],
"lease-database": {
"type": "memfile"
}
}
}
Backup
{
"Dhcp4": {
"option-data": [],
"hooks-libraries": [
{
"library": "/home/mgodzina/installed/keadev/lib/kea/hooks/libdhcp_ha.so",
"parameters": {
"high-availability": [
{
"peers": [
{
"auto-failover": true,
"name": "server1",
"role": "primary",
"url": "http://192.168.56.102:8003/"
},
{
"auto-failover": true,
"name": "server2",
"role": "backup",
"url": "http://192.168.56.103:8003/"
}
],
"state-machine": {
"states": []
},
"mode": "passive-backup",
"this-server-name": "server2",
"multi-threading": {
"enable-multi-threading": true,
"http-dedicated-listener": true,
"http-listener-threads": 0,
"http-client-threads": 0
}
}
]
}
},
{
"library": "/home/mgodzina/installed/keadev/lib/kea/hooks/libdhcp_lease_cmds.so"
}
],
"shared-networks": [],
"subnet4": [
{
"subnet": "192.168.50.0/24",
"pools": [
{
"pool": "192.168.50.1-192.168.50.200"
}
],
"interface": "enp0s9"
}
],
"interfaces-config": {
"interfaces": [
"enp0s9"
]
},
"control-socket": {
"socket-type": "unix",
"socket-name": "/home/mgodzina/installed/keadev/var/run/kea/control_socket"
},
"renew-timer": 1000,
"rebind-timer": 2000,
"valid-lifetime": 4000,
"loggers": [
{
"name": "kea-dhcp4",
"output-options": [
{
"output": "/home/mgodzina/installed/keadev/var/log/kea.log"
}
],
"severity": "DEBUG",
"debuglevel": 99
}
],
"lease-database": {
"type": "memfile"
}
}
}
Logs
Primary server log tail
2024-02-28 16:20:13.417 DEBUG [kea-dhcp4.commands/2096.139741364354944] COMMAND_SOCKET_CONNECTION_OPENED Opened socket 38 for incoming command connection
2024-02-28 16:20:13.417 DEBUG [kea-dhcp4.commands/2096.139741364354944] COMMAND_SOCKET_READ Received 127 bytes over command socket 38
2024-02-28 16:20:13.417 INFO [kea-dhcp4.commands/2096.139741364354944] COMMAND_RECEIVED Received command 'ha-sync'
2024-02-28 16:20:13.417 DEBUG [kea-dhcp4.callouts/2096.139741364354944] HOOKS_CALLOUTS_BEGIN begin all callouts for hook $ha_sync
2024-02-28 16:20:13.417 DEBUG [kea-dhcp4.http/2096.139741364354944] HTTP_CLIENT_REQUEST_SEND sending HTTP request POST / HTTP/1.1 to http://192.168.56.103:8003/
2024-02-28 16:20:13.417 DEBUG [kea-dhcp4.http/2096.139741364354944] HTTP_CLIENT_REQUEST_SEND_DETAILS detailed information about request sent to http://192.168.56.103:8003/:
POST / HTTP/1.1
Host: 192.168.56.103
Content-Length: 86
Content-Type: application/json
{ "arguments": { "origin": 2000 }, "command": "dhcp-disable", "service": [ "dhcp4" ] }
2024-02-28 16:20:13.417 INFO [kea-dhcp4.ha-hooks/2096.139741364354944] HA_SYNC_START server1: starting lease database synchronization with server2
2024-02-28 16:20:13.417 DEBUG [kea-dhcp4.http/2096.139741364354944] HTTP_SERVER_RESPONSE_RECEIVED received HTTP response from http://192.168.56.103:8003/
2024-02-28 16:20:13.417 DEBUG [kea-dhcp4.http/2096.139741364354944] HTTP_SERVER_RESPONSE_RECEIVED_DETAILS detailed information about well-formed response received from http://192.168.56.103:8003/:
HTTP/1.1 200 OK
Content-Length: 54
Content-Type: application/json
Date: Wed, 28 Feb 2024 15:20:13 GMT
[ { "result": 0, "text": "DHCPv4 service disabled" } ]
Backup server log snippet with timeout:
2024-02-28 16:20:13.413 DEBUG [kea-dhcp4.http/20519.140151306917568] HTTP_REQUEST_RECEIVE_START start receiving request from 192.168.56.102 with timeout 10
2024-02-28 16:20:13.413 DEBUG [kea-dhcp4.http/20519.140151306917568] HTTP_DATA_RECEIVED received 179 bytes from 192.168.56.102
2024-02-28 16:20:13.413 DEBUG [kea-dhcp4.http/20519.140151306917568] HTTP_CLIENT_REQUEST_RECEIVED received HTTP request from 192.168.56.102
2024-02-28 16:20:13.413 DEBUG [kea-dhcp4.http/20519.140151306917568] HTTP_CLIENT_REQUEST_RECEIVED_DETAILS detailed information about well-formed request received from 192.168.56.102:
POST / HTTP/1.1
Host: 192.168.56.103
Content-Length: 86
Content-Type: application/json
{ "arguments": { "origin": 2000 }, "command": "dhcp-disable", "service": [ "dhcp4" ] }
2024-02-28 16:20:13.413 INFO [kea-dhcp4.commands/20519.140151306917568] COMMAND_RECEIVED Received command 'dhcp-disable'
2024-02-28 16:20:13.413 DEBUG [kea-dhcp4.callouts/20519.140151306917568] HOOKS_CALLOUTS_BEGIN begin all callouts for hook command_processed
2024-02-28 16:20:13.413 DEBUG [kea-dhcp4.callouts/20519.140151306917568] HOOKS_CALLOUT_CALLED hooks library with index 1 has called a callout on hook command_processed that has address 0x7f778767ffe0 (callout duration: 0.000 ms)
2024-02-28 16:20:13.413 DEBUG [kea-dhcp4.callouts/20519.140151306917568] HOOKS_CALLOUTS_COMPLETE completed callouts for hook command_processed (total callouts duration: 0.000 ms)
2024-02-28 16:20:13.413 DEBUG [kea-dhcp4.http/20519.140151306917568] HTTP_SERVER_RESPONSE_SEND sending HTTP response HTTP/1.1 200 OK to 192.168.56.102
2024-02-28 16:20:13.413 DEBUG [kea-dhcp4.http/20519.140151306917568] HTTP_SERVER_RESPONSE_SEND_DETAILS detailed information about response sent to 192.168.56.102:
HTTP/1.1 200 OK
Content-Length: 54
Content-Type: application/json
Date: Wed, 28 Feb 2024 15:20:13 GMT
[ { "result": 0, "text": "DHCPv4 service disabled" } ]
2024-02-28 16:20:17.831 DEBUG [kea-dhcp4.dhcpsrv/20519.140151383601024] DHCPSRV_TIMERMGR_RUN_TIMER_OPERATION running operation for timer: reclaim-expired-leases
2024-02-28 16:20:17.831 DEBUG [kea-dhcp4.alloc-engine/20519.140151383601024] ALLOC_ENGINE_V4_LEASES_RECLAMATION_START starting reclamation of expired leases (limit = 100 leases or 250 milliseconds)
2024-02-28 16:20:17.831 DEBUG [kea-dhcp4.dhcpsrv/20519.140151383601024] DHCPSRV_MEMFILE_GET_EXPIRED4 obtaining maximum 101 of expired IPv4 leases
2024-02-28 16:20:17.832 DEBUG [kea-dhcp4.alloc-engine/20519.140151383601024] ALLOC_ENGINE_V4_LEASES_RECLAMATION_COMPLETE reclaimed 0 leases in 0.033 ms
2024-02-28 16:20:17.832 DEBUG [kea-dhcp4.alloc-engine/20519.140151383601024] ALLOC_ENGINE_V4_NO_MORE_EXPIRED_LEASES all expired leases have been reclaimed
2024-02-28 16:20:17.832 DEBUG [kea-dhcp4.dhcpsrv/20519.140151383601024] DHCPSRV_TIMERMGR_START_TIMER starting timer: reclaim-expired-leases
2024-02-28 16:20:21.840 DEBUG [kea-dhcp4.dhcpsrv/20519.140151383601024] DHCPSRV_TIMERMGR_RUN_TIMER_OPERATION running operation for timer: flush-reclaimed-leases
2024-02-28 16:20:21.840 DEBUG [kea-dhcp4.alloc-engine/20519.140151383601024] ALLOC_ENGINE_V4_RECLAIMED_LEASES_DELETE begin deletion of reclaimed leases expired more than 3600 seconds ago
2024-02-28 16:20:21.840 DEBUG [kea-dhcp4.dhcpsrv/20519.140151383601024] DHCPSRV_MEMFILE_DELETE_EXPIRED_RECLAIMED4 deleting reclaimed IPv4 leases that expired more than 3600 seconds ago
2024-02-28 16:20:21.840 DEBUG [kea-dhcp4.alloc-engine/20519.140151383601024] ALLOC_ENGINE_V4_RECLAIMED_LEASES_DELETE_COMPLETE successfully deleted 0 expired-reclaimed leases
2024-02-28 16:20:21.840 DEBUG [kea-dhcp4.dhcpsrv/20519.140151383601024] DHCPSRV_TIMERMGR_START_TIMER starting timer: flush-reclaimed-leases
2024-02-28 16:20:27.852 DEBUG [kea-dhcp4.dhcpsrv/20519.140151383601024] DHCPSRV_TIMERMGR_RUN_TIMER_OPERATION running operation for timer: reclaim-expired-leases
2024-02-28 16:20:27.852 DEBUG [kea-dhcp4.alloc-engine/20519.140151383601024] ALLOC_ENGINE_V4_LEASES_RECLAMATION_START starting reclamation of expired leases (limit = 100 leases or 250 milliseconds)
2024-02-28 16:20:27.852 DEBUG [kea-dhcp4.dhcpsrv/20519.140151383601024] DHCPSRV_MEMFILE_GET_EXPIRED4 obtaining maximum 101 of expired IPv4 leases
2024-02-28 16:20:27.852 DEBUG [kea-dhcp4.alloc-engine/20519.140151383601024] ALLOC_ENGINE_V4_LEASES_RECLAMATION_COMPLETE reclaimed 0 leases in 0.032 ms
2024-02-28 16:20:27.852 DEBUG [kea-dhcp4.alloc-engine/20519.140151383601024] ALLOC_ENGINE_V4_NO_MORE_EXPIRED_LEASES all expired leases have been reclaimed
2024-02-28 16:20:27.852 DEBUG [kea-dhcp4.dhcpsrv/20519.140151383601024] DHCPSRV_TIMERMGR_START_TIMER starting timer: reclaim-expired-leases
2024-02-28 16:20:37.891 DEBUG [kea-dhcp4.dhcpsrv/20519.140151383601024] DHCPSRV_TIMERMGR_RUN_TIMER_OPERATION running operation for timer: reclaim-expired-leases
2024-02-28 16:20:37.892 DEBUG [kea-dhcp4.alloc-engine/20519.140151383601024] ALLOC_ENGINE_V4_LEASES_RECLAMATION_START starting reclamation of expired leases (limit = 100 leases or 250 milliseconds)
2024-02-28 16:20:37.892 DEBUG [kea-dhcp4.dhcpsrv/20519.140151383601024] DHCPSRV_MEMFILE_GET_EXPIRED4 obtaining maximum 101 of expired IPv4 leases
2024-02-28 16:20:37.892 DEBUG [kea-dhcp4.alloc-engine/20519.140151383601024] ALLOC_ENGINE_V4_LEASES_RECLAMATION_COMPLETE reclaimed 0 leases in 0.027 ms
2024-02-28 16:20:37.892 DEBUG [kea-dhcp4.alloc-engine/20519.140151383601024] ALLOC_ENGINE_V4_NO_MORE_EXPIRED_LEASES all expired leases have been reclaimed
2024-02-28 16:20:37.892 DEBUG [kea-dhcp4.dhcpsrv/20519.140151383601024] DHCPSRV_TIMERMGR_START_TIMER starting timer: reclaim-expired-leases
2024-02-28 16:20:43.433 DEBUG [kea-dhcp4.http/20519.140151315310272] HTTP_IDLE_CONNECTION_TIMEOUT_OCCURRED closing persistent connection with 192.168.56.102 as a result of a timeout
2024-02-28 16:20:43.433 DEBUG [kea-dhcp4.http/20519.140151315310272] HTTP_CONNECTION_STOP stopping HTTP connection from 192.168.56.102
Edited by Marcin Godzina