dhcpd failing/crashing continuously
name: dhcpd failing/crashing in failover scenario
Describe the bug
I configured two router appliances with dhcpd failover. One of them always fails with the following error:
../../../../bind-9.11.2-P1/lib/isc/unix/socket.c:3357: INSIST(!sock->pending_send) failed, back trace
The other appliance continuously logs the following error message:
../../../../bind-9.11.2-P1/lib/isc/unix/socket.c:1013: epoll_ctl(DEL), 11: Bad file descriptor
To Reproduce
Steps to reproduce the behavior:
- Run dhcpd with the following config (dhcpd.conf) on one of the appliances (failover primary). Command:
dhcpd -d -lf /var/dhcpd/dhcpd.leases eth2
authoritative;
failover peer "dhcp-failover-eth2" {
primary;
address 10.24.108.10;
port 647;
peer address 10.24.108.11;
peer port 647;
split 128;
mclt 360;
max-response-delay 10;
max-unacked-updates 10;
load balance max seconds 5;
}
subnet 10.24.108.0 netmask 255.255.255.0 {
option routers 10.24.108.1;
option subnet-mask 255.255.255.0;
option broadcast-address 10.24.108.255;
pool {
failover peer "dhcp-failover-eth2";
#config for interface eth2
deny dynamic bootp clients;
range 10.24.108.70 10.24.108.80;
}
option domain-name-servers 8.8.8.8;
default-lease-time 7200;
max-lease-time 7200;
}
- Run dhcpd with the following config (dhcpd.conf) on the other appliance (failover secondary):
authoritative;
failover peer "dhcp-failover-eth2" {
secondary;
address 10.24.108.11;
port 647;
peer address 10.24.108.10;
peer port 647;
max-response-delay 10;
max-unacked-updates 10;
load balance max seconds 5;
}
subnet 10.24.108.0 netmask 255.255.255.0 {
option routers 10.24.108.1;
option subnet-mask 255.255.255.0;
option broadcast-address 10.24.108.255;
pool {
failover peer "dhcp-failover-eth2";
#config for interface eth2
deny dynamic bootp clients;
range 10.24.108.70 10.24.108.80;
}
option domain-name-servers 8.8.8.8;
default-lease-time 7200;
max-lease-time 7200;
}
- Then the primary dhcpd mostly hits the INSIST failure, while the secondary logs the epoll_ctl failure. The dhcpd logs follow, primary first, then secondary.
Primary log:
Internet Systems Consortium DHCP Server 4.4.1
Copyright 2004-2018 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/
Config file: /etc/dhcp/dhcpd.conf
Database file: /var/dhcpd/dhcpd.leases
PID file: /var/run/dhcpd.pid
Wrote 5 leases to leases file.
Listening on Socket/eth2/10.24.108.0/24
Sending on Socket/eth2/10.24.108.0/24
Sending on Socket/fallback/fallback-net
failover peer dhcp-failover-eth2: I move from communications-interrupted to startup
Server starting service.
failover peer dhcp-failover-eth2: peer moves from normal to recover
failover peer dhcp-failover-eth2: I move from startup to partner-down
../../../../bind-9.11.2-P1/lib/isc/unix/socket.c:3357: INSIST(!sock->pending_send) failed, back trace
Secondary log:
Internet Systems Consortium DHCP Server 4.4.1
Copyright 2004-2018 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/
Config file: /etc/dhcp/dhcpd.conf
Database file: /var/dhcpd/dhcpd.leases
PID file: /var/run/dhcpd.pid
Wrote 5 leases to leases file.
Listening on Socket/eth2/10.24.108.0/24
Sending on Socket/eth2/10.24.108.0/24
Sending on Socket/fallback/fallback-net
failover peer dhcp-failover-eth2: I move from recover to startup
Server starting service.
../../../../bind-9.11.2-P1/lib/isc/unix/socket.c:1013: epoll_ctl(DEL), 11: Bad file descriptor
../../../../bind-9.11.2-P1/lib/isc/unix/socket.c:1013: epoll_ctl(DEL), 11: Bad file descriptor
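To rule out a plain configuration mismatch before looking at the socket errors, the two failover declarations above can be compared mechanically. The following is a minimal sketch, not part of dhcpd: the helper names `parse_failover` and `check_mirror` are hypothetical, the parser assumes a single failover peer block with no comments inside it, and the `mclt` rule (primary only) reflects my reading of dhcpd.conf(5).

```python
import re

# Failover declarations copied from the two configs in this report.
PRIMARY_CONF = '''
failover peer "dhcp-failover-eth2" {
primary;
address 10.24.108.10;
port 647;
peer address 10.24.108.11;
peer port 647;
split 128;
mclt 360;
max-response-delay 10;
max-unacked-updates 10;
load balance max seconds 5;
}
'''

SECONDARY_CONF = '''
failover peer "dhcp-failover-eth2" {
secondary;
address 10.24.108.11;
port 647;
peer address 10.24.108.10;
peer port 647;
max-response-delay 10;
max-unacked-updates 10;
load balance max seconds 5;
}
'''


def parse_failover(conf_text):
    """Return the statements of the first failover peer block as a dict."""
    match = re.search(r'failover\s+peer\s+"([^"]+)"\s*\{(.*?)\}', conf_text, re.S)
    if match is None:
        raise ValueError("no failover peer declaration found")
    settings = {"name": match.group(1)}
    for statement in match.group(2).split(";"):
        words = statement.split()
        if not words:
            continue
        if len(words) == 1:
            # A bare keyword is the role: "primary" or "secondary".
            settings["role"] = words[0]
        else:
            # Everything before the last word is the key, e.g. "peer address".
            settings[" ".join(words[:-1])] = words[-1]
    return settings


def check_mirror(primary, secondary):
    """Return a list of mismatches between the two declarations."""
    problems = []
    if primary.get("role") != "primary":
        problems.append("first config is not declared primary")
    if secondary.get("role") != "secondary":
        problems.append("second config is not declared secondary")
    if primary["name"] != secondary["name"]:
        problems.append("failover peer names differ")
    # Each side's local address/port must be the other side's peer address/port.
    for mine, theirs in (("address", "peer address"), ("port", "peer port")):
        if primary.get(mine) != secondary.get(theirs):
            problems.append(f"primary '{mine}' != secondary '{theirs}'")
        if secondary.get(mine) != primary.get(theirs):
            problems.append(f"secondary '{mine}' != primary '{theirs}'")
    # Assumption from dhcpd.conf(5): mclt belongs on the primary only.
    if "mclt" not in primary:
        problems.append("mclt missing on primary")
    if "mclt" in secondary:
        problems.append("mclt set on secondary")
    return problems


if __name__ == "__main__":
    issues = check_mirror(parse_failover(PRIMARY_CONF), parse_failover(SECONDARY_CONF))
    print("\n".join(issues) if issues else "failover declarations mirror each other")
```

For the two declarations in this report the check finds no mismatches, which suggests the errors are not caused by an obviously inconsistent failover declaration.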
Expected behavior
Both appliances should start dhcpd cleanly without reporting errors.
Environment:
- ISC DHCP version: isc-dhcpd-4.4.1
- OS: Linux
- Which features were compiled in: dhcpd (server)
Additional Information
Some initial questions
- Are you sure your feature is not already implemented in the latest ISC DHCP version? NO
- Are you sure your requested feature is not already implemented in Kea? Perhaps it's a good time to consider migration? NO
- Are you sure what you would like to do is not possible using some other mechanisms? YES
- Have you discussed your idea on dhcp-users and/or dhcp-workers mailing lists? NO
Describe alternatives you've considered
- When can this failure occur? I would like to understand what can cause it.
- Is this failure caused by DDNS? The source path points to the bind-9.11.2-P1 directory; is this code part of ISC's BIND 9 product? If so, can I turn it off?
- If this is due to some add-on of dhcpd, I'm OK with turning it off.