
Fix deadlock between rndc addzone/delzone/modzone

The following steps led to the deadlock:

  1. do_addzone calls isc_task_beginexclusive.

  2. isc_task_beginexclusive blocks until all other workers, (N_WORKERS - 1) of them, have halted:

isc_task_beginexclusive(isc_task_t *task0) {
	...
	/* Block until every other worker has halted. */
	while (manager->halted + 1 < manager->workers) {
		wake_all_queues(manager);
		WAIT(&manager->halt_cond, &manager->halt_lock);
	}
	...
}
  3. In task.c, dispatch() may have a worker running a task event. If that event blocks, the worker cannot halt.

  4. do_addzone acquires LOCK(&view->new_zone_lock).

  5. An rmzone event is called from some worker's dispatch(); rmzone blocks waiting for the same lock.

  6. do_addzone calls isc_task_beginexclusive.

  7. The deadlock is triggered, since:

    • rmzone is waiting for the lock.
    • isc_task_beginexclusive is waiting for (no. workers - 1) workers to be halted.
    • since the rmzone event is blocked, it won't allow its worker to halt.
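
The condition described in the steps above can be checked with a small, single-threaded model. This is only an illustrative sketch: `struct model`, `exclusive_must_wait` and `is_deadlocked` are hypothetical names that do not exist in BIND, and the model assumes that a worker blocked on a lock held by someone other than the requester eventually gets it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the situation; not BIND code. */
struct model {
	int workers;               /* total workers, including the requester */
	int halted;                /* workers that have already halted */
	int blocked_on_lock;       /* workers stuck in an event, waiting for the lock */
	bool requester_holds_lock; /* requester took the lock before going exclusive */
};

/* Same test as the quoted loop: wait while fewer than (workers - 1) have halted. */
bool exclusive_must_wait(const struct model *m) {
	return m->halted + 1 < m->workers;
}

/* True if the wait above can never finish. */
bool is_deadlocked(const struct model *m) {
	assert(m->workers > 0 && m->halted >= 0 && m->blocked_on_lock >= 0);

	/* Other workers that have not halted yet. */
	int running = m->workers - 1 - m->halted;

	/* Workers blocked on a lock that the requester holds never reach a
	 * halt point, because the requester is itself waiting for them. */
	int stuck = m->requester_holds_lock ? m->blocked_on_lock : 0;

	/* Best case: every worker that is not stuck eventually halts. */
	int can_halt = m->halted + (running - stuck);
	return can_halt + 1 < m->workers;
}
```

With four workers, one of them blocked in rmzone, the model reports a deadlock only when the requester already holds the lock at the time it asks for exclusive mode.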

To fix this, we updated do_addzone to call isc_task_beginexclusive before the lock is acquired. Both the locking and the isc_task_beginexclusive call are postponed to the nearest place where they are required.
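
The reordering can be illustrated with a miniature pthreads program. This is a rough sketch under stated assumptions, not the BIND implementation: `begin_exclusive`, `end_exclusive`, `worker_checkpoint` and `run_fixed_order` are hypothetical stand-ins, there are only two workers, and the "zone list" is a plain counter.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical miniature of the pattern; names do not match BIND internals. */
enum { WORKERS = 2 }; /* the "addzone" thread plus one other worker */

static pthread_mutex_t halt_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t halt_cond = PTHREAD_COND_INITIALIZER;   /* a worker halted */
static pthread_cond_t resume_cond = PTHREAD_COND_INITIALIZER; /* exclusive mode ended */
static pthread_mutex_t new_zone_lock = PTHREAD_MUTEX_INITIALIZER;
static int halted = 0;
static bool exclusive = false, stop = false;
static int zones = 1;

/* Halt point between events. Returns false once the worker should exit. */
static bool worker_checkpoint(void) {
	pthread_mutex_lock(&halt_lock);
	if (exclusive) {
		halted++;
		pthread_cond_signal(&halt_cond);
		while (exclusive)
			pthread_cond_wait(&resume_cond, &halt_lock);
		halted--;
	}
	bool keep_running = !stop;
	pthread_mutex_unlock(&halt_lock);
	return keep_running;
}

/* Wait until every other worker has halted. */
static void begin_exclusive(void) {
	pthread_mutex_lock(&halt_lock);
	exclusive = true;
	while (halted + 1 < WORKERS)
		pthread_cond_wait(&halt_cond, &halt_lock);
	pthread_mutex_unlock(&halt_lock);
}

static void end_exclusive(bool stop_workers) {
	pthread_mutex_lock(&halt_lock);
	exclusive = false;
	stop = stop_workers;
	pthread_cond_broadcast(&resume_cond);
	pthread_mutex_unlock(&halt_lock);
}

/* Other worker: runs one "rmzone" event, then keeps reaching halt points. */
static void *worker(void *arg) {
	(void)arg;
	pthread_mutex_lock(&new_zone_lock);
	zones--;
	pthread_mutex_unlock(&new_zone_lock);
	while (worker_checkpoint())
		;
	return NULL;
}

/* Fixed ordering: enter exclusive mode first, take the lock afterwards. */
int run_fixed_order(void) {
	pthread_t t;
	zones = 1; halted = 0; exclusive = false; stop = false;
	int rc = pthread_create(&t, NULL, worker, NULL);
	assert(rc == 0);
	begin_exclusive();                  /* the other worker halts here */
	pthread_mutex_lock(&new_zone_lock); /* no running event can hold it now */
	zones++;
	pthread_mutex_unlock(&new_zone_lock);
	end_exclusive(true);
	pthread_join(t, NULL);
	return zones;
}
```

Calling `run_fixed_order()` returns once both the removal and the addition have been applied. Moving the `pthread_mutex_lock(&new_zone_lock)` call above `begin_exclusive()` would recreate the problematic ordering: if the worker reached its rmzone event after that point, it would block on the lock and never arrive at a halt point.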

The same could happen with rndc modzone, so that was addressed as well.

Closes #2626 (closed)

Edited by Michał Kępień
