Improve reliability of the netmgr unit tests (!4628) · Merge requests · ISC Open Source Projects / BIND

Ondřej Surý requested to merge 2416-improve-netmgr-unit-tests-reliability into main Jan 29, 2021

Improve reliability of the netmgr unit tests

The netmgr unit tests were designed to push the system to its limits by sending as many queries as possible in a busy loop from multiple threads. This mostly works with UDP, but with stateful protocols, where establishing a connection takes more time, the tests failed quite often in the CI. On FreeBSD this happened more often, because the socket() call would fail spuriously, making the problem even worse.

This commit does several things to improve reliability:

The return value of isc_nm_connect() is always checked, and the call is retried when scheduling the connection fails.
The busy while loop has been slowed down with usleep(1000); so that the netmgr threads can schedule the work and get executed.
The isc_thread_yield() call was replaced with usleep(1000); also to allow the other threads to do some work.
Instead of waiting on just one variable, we wait for multiple variables to reach their final values.
The netmgr operations (connects, reads, writes, accepts) are wrapped with reference counting, and the test waits for all the callbacks to be accounted for.
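
The retry and waiting changes listed above can be sketched roughly as follows. This is a minimal illustration and not the code from this merge request: try_connect_fn, connect_with_retry(), flaky_connect() and wait_for_all() are hypothetical names, and the callback type only stands in for a call such as isc_nm_connect(), whose real signature is not reproduced here.

```c
#define _DEFAULT_SOURCE /* exposes usleep() on glibc in strict modes */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for a connect call: returns 0 when the
 * connection was scheduled, non-zero when scheduling failed.
 */
typedef int (*try_connect_fn)(void *arg);

/*
 * Check the result of every attempt and retry on failure, sleeping
 * 1 ms between attempts so other threads get a chance to run.
 * Returns the number of attempts used, or -1 if all of them failed.
 */
static int
connect_with_retry(try_connect_fn try_connect, void *arg, int max_tries) {
	assert(try_connect != NULL && max_tries > 0);
	for (int attempt = 1; attempt <= max_tries; attempt++) {
		if (try_connect(arg) == 0) {
			return attempt;
		}
		usleep(1000);
	}
	return -1;
}

/* Demo connect: fails as many times as *arg says, then succeeds. */
static int
flaky_connect(void *arg) {
	int *failures_left = arg;
	if (*failures_left > 0) {
		(*failures_left)--;
		return -1;
	}
	return 0;
}

/*
 * Wait until every counter has reached its target, polling once per
 * millisecond. Returns false if max_ms polls pass without success.
 */
static bool
wait_for_all(atomic_int *counters[], const int targets[], size_t n,
	     int max_ms) {
	for (int waited = 0;; waited++) {
		bool done = true;
		for (size_t i = 0; i < n; i++) {
			if (atomic_load(counters[i]) < targets[i]) {
				done = false;
				break;
			}
		}
		if (done) {
			return true;
		}
		if (waited >= max_ms) {
			return false;
		}
		usleep(1000);
	}
}
```

A test would call connect_with_retry() wherever it previously fired a connect and ignored the result, and would pass all of its counters (for example connects, reads and sends) to wait_for_all() instead of polling a single one. The bounded wait is a choice made for this sketch so that a failure is reported rather than hanging.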

Wrapping the operations with reference counting has two effects:

a) the isc_nm_t is always free of active sockets and handles when it is destroyed, which prevents the spurious INSIST(references == 1) failure in isc_nm_destroy()

b) the unit test now ensures that all the callbacks are called when they should be called, so a stuck test always means that a callback call is missing, which is a real bug
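
The accounting behind both effects can be sketched like this. It is an assumption-laden illustration, not the test code itself: active_ops, op_begin(), op_done() and wait_for_idle() are hypothetical names, and in a real test op_done() would be called from the netmgr callbacks running on other threads.

```c
#define _DEFAULT_SOURCE /* exposes usleep() on glibc in strict modes */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <unistd.h>

/* Number of operations that were started but have not called back yet. */
static atomic_int active_ops;

/* Call right before scheduling a connect, read, write or accept. */
static void
op_begin(void) {
	atomic_fetch_add(&active_ops, 1);
}

/* Call exactly once from the matching callback. */
static void
op_done(void) {
	int previous = atomic_fetch_sub(&active_ops, 1);
	assert(previous > 0); /* a callback without a matching begin is a bug */
	(void)previous;
}

/*
 * Poll once per millisecond until every started operation has been
 * accounted for. Returns false if max_ms polls pass first, which in
 * this model means that at least one callback never arrived.
 */
static bool
wait_for_idle(int max_ms) {
	for (int waited = 0; atomic_load(&active_ops) != 0; waited++) {
		if (waited >= max_ms) {
			return false;
		}
		usleep(1000);
	}
	return true;
}
```

Calling wait_for_idle() before tearing down the manager is what keeps it free of active handles in this model. A false return points at a missing callback rather than at timing noise.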

These changes allow us to remove the workaround that would not run certain tests on systems without port load-balancing.

Closes #2416 (closed) #2455 (closed)

Edited Mar 19, 2021 by Ondřej Surý
