HA error when the partner receives duplicate DHCP requests with the same transaction ID
Describe the bug
Because of our redundant topology, the same Kea server receives the same DHCP request, with the same transaction ID, from different relays. We run our Kea servers in the load-balancing HA mode and store leases in Postgres. Server 1 sends a lease4-update to server 2 for every request it processes, so when it receives the same packet twice it sends two updates for the same lease. Server 2 rejects the second update with a "resource busy" error because it is still updating the lease from the first request when the second one arrives. Because that update fails, server 1 puts server 2 into the unknown state. The servers then resynchronize, but the same failure recurs on every DHCP request.
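To make the suspected race concrete, here is a toy Python model of what the description says happens on server 2. It is a sketch under stated assumptions, not Kea's implementation: `PartnerLeaseStore`, `lease4_update()`, `ResourceBusy` and `send_duplicate_updates()` are hypothetical names; a per-address "in progress" set stands in for whatever locking the real server uses; and `time.sleep()` stands in for the Postgres write. Two updates for the same address that overlap in time produce one success and one "busy" rejection, while the same two updates sent one after the other both succeed.

```python
import threading
import time


class ResourceBusy(Exception):
    """Raised when an update for the same address is already in progress."""


class PartnerLeaseStore:
    """Toy model of the partner (server 2) applying lease4-update commands.

    Assumption: a per-address "in progress" set stands in for the real
    server's locking, and time.sleep() stands in for the Postgres write.
    """

    def __init__(self, write_delay=0.1):
        self._guard = threading.Lock()
        self._in_progress = set()
        self._write_delay = write_delay
        self.leases = {}

    def lease4_update(self, ip, hw):
        # Reject the update if another update for this address is running.
        with self._guard:
            if ip in self._in_progress:
                raise ResourceBusy(f"update for {ip} already in progress")
            self._in_progress.add(ip)
        try:
            time.sleep(self._write_delay)  # simulated database write
            self.leases[ip] = hw
        finally:
            with self._guard:
                self._in_progress.discard(ip)


def send_duplicate_updates(store, ip, hw):
    """Issue two updates for the same lease at almost the same moment,
    as server 1 does after processing the same request twice."""
    start = threading.Barrier(2)
    results = []
    results_lock = threading.Lock()

    def worker():
        start.wait()  # release both updates together
        try:
            store.lease4_update(ip, hw)
            outcome = "ok"
        except ResourceBusy:
            outcome = "busy"
        with results_lock:
            results.append(outcome)

    threads = [threading.Thread(target=worker) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)
```

Calling `send_duplicate_updates(PartnerLeaseStore(), "10.0.10.50", "aa:bb:cc:dd:ee:01")` with placeholder values should report one `"ok"` and one `"busy"`, which mirrors the failed lease4-update that makes server 1 move its partner to the unknown state.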
To Reproduce
Steps to reproduce the behavior:
- Run Kea DHCPv4 in the load-balancing HA mode.
- Send a duplicate DHCP request with the same transaction ID to one server (server 1).
- Server 1 processes both DHCP requests and sends a lease4-update for each of them to server 2. Both requests, and therefore both updates, arrive at almost the same time.
- Server 2 rejects the second lease4-update with a "resource busy" error.
- Server 1 then puts server 2 into the unknown state because the lease4-update command failed.
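One way to generate the duplicate request from the steps above without two physical relays is to hand-craft two relayed DHCPREQUEST packets that share the transaction ID (`xid`) and client MAC (`chaddr`) but carry different relay addresses (`giaddr`). The sketch below is a hypothetical helper, not a Kea tool. It assumes that server 1 accepts relayed packets unicast to UDP port 67, that the placeholder `giaddr` values match subnets or relays in the attached configuration, and that the chosen MAC belongs to server 1's share of clients (in normal load-balancing operation each server answers only its own scope). Kea sends replies to the relay address, so the script never sees them; the lease4-update to the partner is what matters here.

```python
import socket
import struct

MAGIC_COOKIE = bytes([99, 130, 83, 99])  # 0x63825363 marks the DHCP options field


def build_relayed_request(xid, chaddr, giaddr, requested_ip, server_id):
    """Build a minimal relayed DHCPREQUEST: BOOTP header, cookie, options."""
    mac = bytes.fromhex(chaddr.replace(":", ""))
    zero = socket.inet_aton("0.0.0.0")
    header = struct.pack(
        "!BBBBIHH4s4s4s4s16s64s128s",
        1,                         # op: BOOTREQUEST
        1,                         # htype: Ethernet
        6,                         # hlen: MAC address length
        1,                         # hops: forwarded by one relay
        xid,                       # transaction ID, identical in both copies
        0,                         # secs
        0,                         # flags
        zero, zero, zero,          # ciaddr, yiaddr, siaddr
        socket.inet_aton(giaddr),  # relay agent address, differs per relay
        mac,                       # chaddr (struct pads it to 16 bytes)
        b"",                       # sname (padded to 64 zero bytes)
        b"",                       # file (padded to 128 zero bytes)
    )
    options = (
        bytes([53, 1, 3])                                  # message type: DHCPREQUEST
        + bytes([50, 4]) + socket.inet_aton(requested_ip)  # requested IP address
        + bytes([54, 4]) + socket.inet_aton(server_id)     # server identifier
        + bytes([255])                                     # end option
    )
    return header + MAGIC_COOKIE + options


def send_to_server(server_ip, packets, port=67):
    """Unicast each packet to the server's DHCP port, back to back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for pkt in packets:
            sock.sendto(pkt, (server_ip, port))
```

For example, build two packets with the same `xid` and `chaddr` but `giaddr` values of 10.0.1.1 and 10.0.2.1 (placeholders), then pass both to `send_to_server()` with server 1's address. The only byte range that differs between the two copies is the 4-byte `giaddr` field, which matches what two redundant relays would forward.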
Expected behavior
I would expect the update not to fail, but I am not sure how DHCP servers are supposed to handle duplicate requests with the same transaction ID.
Environment:
- Kea DHCPv4 (kea-dhcp4) 2.0.0
- OS: [e.g. Debian 11 x64]
- Storing leases in Postgres
- Hooks loaded: libdhcp_lease_cmds, libdhcp_ha
Additional Information
- I have attached a PCAP that shows the duplicate DHCP requests and the resulting lease4-update commands.
- I have also attached the Kea configuration that runs on both servers.
Attachments: ss21-dhcp-debug.pcap, kea-config.json
Contacting you: slowe@clairglobal.com