Fix deadlock between rndc addzone/delzone/modzone
The steps that led to the deadlock were:
- `do_addzone` calls `isc_task_beginexclusive`.
- `isc_task_beginexclusive` waits until (N_WORKERS - 1) workers have halted, blocking the caller until then:

      isc_task_beginexclusive(isc_task_t *task0) {
              ...
              while (manager->halted + 1 < manager->workers) {
                      wake_all_queues(manager);
                      WAIT(&manager->halt_cond, &manager->halt_lock);
              }

- A worker in `dispatch()` (`task.c`) may be running a task event; if that event blocks, the worker cannot halt.
- `do_addzone` acquires `LOCK(&view->new_zone_lock);`.
- An `rmzone` event is called from some worker's `dispatch()`, and `rmzone` blocks waiting for the same lock.
- `do_addzone` calls `isc_task_beginexclusive`.
- Deadlock triggered, since:
  - `rmzone` is waiting for the lock.
  - `isc_task_beginexclusive` is waiting for (no. of workers - 1) workers to halt.
  - Since the `rmzone` event is blocked, it won't allow its worker to halt.

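The arithmetic behind the stall can be checked with a small model of the loop condition quoted above (`manager->halted + 1 < manager->workers`). This is only a sketch, not BIND code: the helper names `must_keep_waiting`, `max_halted`, and `exclusive_request_deadlocks` are hypothetical, and the model assumes that a worker stuck inside a blocked event never halts.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not BIND code. Mirrors the quoted loop condition:
 * the caller keeps waiting while halted + 1 < workers. */
static bool
must_keep_waiting(unsigned int halted, unsigned int workers) {
    return (halted + 1 < workers);
}

/* Upper bound on the number of workers that can halt: everyone except
 * the caller and any worker stuck inside a blocked event (assumption). */
static unsigned int
max_halted(unsigned int workers, unsigned int blocked_workers) {
    assert(blocked_workers < workers);
    return (workers - 1 - blocked_workers);
}

/* True if the exclusive request can never be satisfied, because even the
 * best-case halted count still leaves the wait condition true. */
static bool
exclusive_request_deadlocks(unsigned int workers, unsigned int blocked_workers) {
    return (must_keep_waiting(max_halted(workers, blocked_workers), workers));
}
```

With 4 workers and one of them blocked in `rmzone`, at most 2 workers can halt, so `2 + 1 < 4` stays true and the wait never ends. With no blocked worker, 3 can halt and the condition becomes false.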
To fix this, do_addzone now calls isc_task_beginexclusive before the lock is acquired. Both the locking and the isc_task_beginexclusive call are postponed to the nearest place where they are required.
The same could happen with rndc modzone, so that was addressed as well.
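The corrected ordering can be sketched with plain pthreads. This is a simplified stand-in, not the isc_task API: `worker`, `addzone_fixed_order`, `zone_ops`, `exclusive_done`, and the two-worker setup are all hypothetical, and releasing the lock before leaving exclusive mode is an assumption rather than something stated above. The point it illustrates is that the caller holds no lock while waiting for the other worker to halt, so that worker's event can take the lock, finish, and halt.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define WORKERS 2 /* the caller plus one other worker (illustrative) */

static pthread_mutex_t halt_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t halt_cond = PTHREAD_COND_INITIALIZER;
static pthread_cond_t resume_cond = PTHREAD_COND_INITIALIZER;
static pthread_mutex_t new_zone_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned int halted = 0;     /* protected by halt_lock */
static bool exclusive_done = false; /* protected by halt_lock */
static int zone_ops = 0;            /* protected by new_zone_lock */

/* Stand-in for a worker whose event (like rmzone) needs new_zone_lock. */
static void *
worker(void *arg) {
    (void)arg;
    pthread_mutex_lock(&new_zone_lock);
    zone_ops++; /* the "rmzone" work */
    pthread_mutex_unlock(&new_zone_lock);

    /* Event finished: halt until exclusive mode ends. */
    pthread_mutex_lock(&halt_lock);
    halted++;
    pthread_cond_signal(&halt_cond);
    while (!exclusive_done) {
        pthread_cond_wait(&resume_cond, &halt_lock);
    }
    halted--;
    pthread_mutex_unlock(&halt_lock);
    return (NULL);
}

/* Fixed ordering: enter exclusive mode first, take the lock second.
 * Returns the number of completed zone operations, or -1 on error. */
static int
addzone_fixed_order(void) {
    pthread_t tid;
    int result;

    if (pthread_create(&tid, NULL, worker, NULL) != 0) {
        return (-1);
    }

    /* Begin exclusive: wait with no lock held, so the worker can finish. */
    pthread_mutex_lock(&halt_lock);
    while (halted + 1 < WORKERS) {
        pthread_cond_wait(&halt_cond, &halt_lock);
    }
    pthread_mutex_unlock(&halt_lock);

    /* Take the lock only at the point where it is needed. */
    pthread_mutex_lock(&new_zone_lock);
    zone_ops++; /* the "addzone" work */
    result = zone_ops;
    pthread_mutex_unlock(&new_zone_lock);

    /* End exclusive: release the halted worker. */
    pthread_mutex_lock(&halt_lock);
    exclusive_done = true;
    pthread_cond_broadcast(&resume_cond);
    pthread_mutex_unlock(&halt_lock);

    pthread_join(tid, NULL);
    return (result);
}
```

If the caller instead took `new_zone_lock` before the wait loop, the worker could block on the lock, never increment `halted`, and the loop would not exit, which is the deadlock described above.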
Closes #2626