"logfileconfig" system test crashes occasionally
All recent failures of the logfileconfig
system test look similar and are caused by segmentation faults. Two examples:
- https://gitlab.isc.org/isc-projects/bind9/-/jobs/751371
- https://gitlab.isc.org/isc-private/bind9/-/jobs/753929
I believe !2791 (merged) is the culprit.
The immediate problem is that lcfg
is already destroyed when the segfault happens:
(gdb) print /x *lcfg
$26 = {magic = 0xdededede, lctx = 0xdededededededede, channels = {head = 0xdededededededede,
tail = 0xdededededededede}, channellists = 0xdededededededede, channellist_count = 0xdededede,
duplicate_interval = 0xdededede, highest_level = 0xdededede, tag = 0xdededededededede, dynamic = 0xde}
However, looking at the code, it could not have yet been destroyed only a few lines above the crash location (which is inside the while
loop in isc_log_doit()
in both cases, though on different lines) as e.g. lcfg->channellists
was dereferenced without a segfault. This indicates that another thread must have freed lcfg
while the crashing thread was running.
Note that in both crashes there is another thread running that is trying to lock the logging context mutex and in both cases that locking attempt is made from the same call site:
(gdb) frame 3
#3 0x00007f9490d6619c in isc_log_doit (lctx=0x1da1060, category=0x7f9490fadfd0, module=0x677730,
level=1, write_once=false, format=0x4649d8 "now using logging configuration from config file",
args=0x7f9489296480) at log.c:1537
This call site is inside load_configuration()
, right after a call to isc_logconfig_use()
, which contains a call to isc_logconfig_destroy()
.
I believe the crash happens upon startup and the order of events is as follows:
-
Thread A (the crashing thread) tries to log a message using the default logging configuration. It reaches
isc_log_doit()
, locks the logging context mutex and fetches the pointer to the default logging configuration fromlctx->logconfig
. -
Thread B is in the process of loading configuration. It calls
isc_logconfig_use()
to replace the default logging configuration with the configured one. This causesisc_logconfig_destroy()
to destroy the default logging configuration in the process. -
Thread A continues its attempt to log the requested message, but the default logging configuration is already destroyed and a segmentation fault occurs.
Note that before !2791 (merged), isc_logconfig_use()
was holding the logging context mutex while updating lctx->logconfig
, so the logging configuration could never be replaced while a message was being logged by another thread. !2791 (merged) replaced the LOCK()/UNLOCK()
call in that location with a call to atomic_exchange_acq_rel()
, allowing isc_logconfig_destroy()
to be called while it was still in use by another thread.
This is a race between threads, it is not exploitable.