MariaDB Connector Timeout Issue
In a setup with two DHCP servers connected to a MariaDB Galera cluster that
replicates data between its nodes, a database read call occasionally hangs. The
cause is not yet clear.
Each server connects to the database through a local ha_proxy, which can switch
the connection to another database instance when it detects a database failure.
The ha_proxy configuration on each node includes the clause

    on-marked-down shutdown-sessions

which should terminate the connection from the Kea server when the database
fails. In theory, the Kea server should then be able to reconnect to another
database instance (through the same proxy), because it is configured with
max-reconnect-tries=2 and reconnect-wait-time=1000 (milliseconds).
One of Kea's threads may hang while reading from the database after updating
lease information. This pauses the execution of periodic tasks and the
processing of DHCP requests. If, however, Kea is configured with HA and the
"dedicated-listener" is enabled, Kea still replies to heartbeat commands from
its partner. As a result, the partner believes the server is operational.
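The heartbeat problem can be illustrated with a small C sketch. It assumes POSIX threads and C11 atomics, and every name in it is hypothetical and unrelated to Kea's actual code. A worker thread blocks in poll() with an infinite timeout on a pipe that stands in for the database socket, while the main thread, playing the role of the dedicated listener, still answers a heartbeat with "OK".

```c
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

static int db_pipe[2];          /* read end stands in for the DB socket */
static atomic_bool worker_done; /* set once the worker's read returns */

/* Hypothetical worker stuck in a database read: it polls with an
 * infinite timeout, as the connector does when no read timeout is set. */
static void *worker(void *arg)
{
    struct pollfd p_fd = { .fd = db_pipe[0], .events = POLLIN };
    int rc;

    (void)arg;
    do {
        rc = poll(&p_fd, 1, -1);
    } while (rc == -1 && errno == EINTR);
    atomic_store(&worker_done, true);
    return NULL;
}

/* Hypothetical heartbeat handler: it replies without checking whether
 * the worker is making progress. */
static const char *handle_heartbeat(void)
{
    return "OK";
}

/* Returns true if the heartbeat was answered "OK" while the worker was
 * still blocked. Writing one byte to the pipe finally releases it. */
static bool heartbeat_ok_while_worker_hangs(void)
{
    pthread_t tid;
    char byte = 'x';

    atomic_store(&worker_done, false);
    if (pipe(db_pipe) != 0)
        return false;
    if (pthread_create(&tid, NULL, worker, NULL) != 0) {
        close(db_pipe[0]);
        close(db_pipe[1]);
        return false;
    }

    /* The main thread plays the dedicated listener here. */
    bool reply_ok = strcmp(handle_heartbeat(), "OK") == 0;
    /* No data has been written yet, so the worker cannot have finished. */
    bool still_blocked = !atomic_load(&worker_done);

    if (write(db_pipe[1], &byte, 1) != 1)
        pthread_cancel(tid);    /* poll() is a cancellation point */
    pthread_join(tid, NULL);
    close(db_pipe[0]);
    close(db_pipe[1]);
    return reply_ok && still_blocked;
}
```

The point of the sketch is that a liveness check answered on a separate thread says nothing about whether the thread doing the real work is alive.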
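The MariaDB connector waits for socket I/O with the following loop, where
timeout comes from the read timeout configured for the connection:

    /* An unset (zero) timeout becomes -1, i.e. "wait indefinitely". */
    if (!timeout)
      timeout= -1;
    /* Retry poll() when it is interrupted by a signal. */
    do {
      rc= poll(&p_fd, 1, timeout);
    } while (rc == -1 && errno == EINTR);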
The libc poll() function blocks indefinitely when the timeout is negative. In
other words, if the application using the MariaDB connector (Kea in this case)
does not set a read timeout for the connection, a read from the database may
never return, and the application hangs. The call may eventually return when the
TCP connection times out, but this can take a relatively long time.
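To make the effect of the timeout concrete, here is a minimal, self-contained C sketch. The hypothetical wait_readable() helper copies the shape of the connector's loop, and a pipe that never receives data stands in for an unresponsive database socket. With a finite timeout poll() gives up and returns 0; a timeout of 0 (mapped to -1) would block forever, so the demo refuses it.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Mirrors the connector's loop: a timeout of 0 ("not set") becomes -1,
 * which makes poll() wait with no upper bound. Returns poll()'s result:
 * >0 ready, 0 timed out, -1 error. */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd p_fd = { .fd = fd, .events = POLLIN };
    int rc;

    if (!timeout_ms)
        timeout_ms = -1;
    do {
        rc = poll(&p_fd, 1, timeout_ms);
    } while (rc == -1 && errno == EINTR);
    return rc;
}

/* Simulates a peer that never answers: nothing is written to the pipe.
 * With a finite timeout the wait ends and 0 is returned; with
 * timeout_ms == 0 this call would never return. */
static int silent_peer_demo(int timeout_ms)
{
    int fds[2];
    int rc;

    assert(timeout_ms > 0);     /* 0 would hang here, like the bug */
    if (pipe(fds) != 0)
        return -1;
    rc = wait_readable(fds[0], timeout_ms);
    close(fds[0]);
    close(fds[1]);
    return rc;
}
```

For example, silent_peer_demo(50) returns 0 after roughly 50 ms, which is the behavior a configured read timeout would give Kea instead of a hang.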
It has not been possible to provoke a hang on cutover to a different database;
the outcome is either a correct switchover or a shutdown. Shutdowns appear to
occur when the database switchover takes longer than 2 seconds, which roughly
matches the configured reconnect budget (2 tries of 1000 ms each). If the server
frequently shuts down instead of switching over, consider increasing
max-reconnect-tries, reconnect-wait-time, or both.
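The reconnect budget can be checked with simple arithmetic. The sketch below uses hypothetical helpers and assumes that each reconnect attempt fails immediately and is followed by one reconnect-wait-time delay, so the retry window is roughly max-reconnect-tries times reconnect-wait-time; the time spent in the connection attempts themselves is ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Approximate time (ms) spent retrying before giving up, under the
 * assumption that every attempt is preceded by one full wait. */
static long reconnect_window_ms(int max_reconnect_tries,
                                long reconnect_wait_time_ms)
{
    assert(max_reconnect_tries >= 0 && reconnect_wait_time_ms >= 0);
    return (long)max_reconnect_tries * reconnect_wait_time_ms;
}

/* True if a switchover of the given duration should finish before the
 * retries are exhausted (approximation only). */
static bool switchover_fits(int max_reconnect_tries,
                            long reconnect_wait_time_ms,
                            long switchover_ms)
{
    return switchover_ms <=
           reconnect_window_ms(max_reconnect_tries, reconnect_wait_time_ms);
}
```

With the reported settings, reconnect_window_ms(2, 1000) is 2000 ms, consistent with the observed 2-second threshold; raising max-reconnect-tries to 5 would stretch the window to about 5 seconds under the same assumptions.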
However, using iptables to block traffic from ha_proxy to the database while the
server is running and allocating leases can cause Kea to hang.
Kea needs an additional configuration knob (or knobs) to set the database read
timeouts. Currently, the timeout is always set to 0. A possible limitation is
that these timeouts are only supported by some MySQL versions (although the
versions lacking support are quite old), and only for TCP connections, not UNIX
domain socket connections. This means some additional logic may be needed to
check whether the setting applies.
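One possible shape for that validation logic is sketched below in C. The read_timeout_applies() helper is hypothetical and not part of Kea or the connector API. It assumes the MySQL client convention that, on Unix, a NULL host or "localhost" selects a UNIX domain socket, and it deliberately ignores the server version check and any explicitly forced protocol.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: decides whether a read timeout should be passed
 * to the connector. read_timeout_s == 0 means "knob not configured".
 * Simplification: a protocol forced to TCP for "localhost" is not
 * considered here. */
static bool read_timeout_applies(const char *host, unsigned int read_timeout_s)
{
    if (read_timeout_s == 0)
        return false;   /* not configured: keep the current behavior */
    if (host == NULL || strcmp(host, "localhost") == 0)
        return false;   /* UNIX domain socket: the timeout is not honored */
    return true;        /* TCP connection */
}
```

When the helper returns true, the value would be handed to the connector with mysql_options(mysql, MYSQL_OPT_READ_TIMEOUT, &read_timeout_s) before mysql_real_connect(); that option takes a pointer to an unsigned int number of seconds.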