    Fix deadlock between rndc addzone/delzone/modzone · 942b83d3
    Diego Fronza authored
    What follows is a description of the steps that led to the deadlock:
    
    1. `do_addzone` calls `isc_task_beginexclusive`.
    
    2. `isc_task_beginexclusive` blocks until (N_WORKERS - 1) workers
       have halted:
    ```
    isc_task_beginexclusive(isc_task_t *task0) {
        ...
    	while (manager->halted + 1 < manager->workers) {
    		wake_all_queues(manager);
    		WAIT(&manager->halt_cond, &manager->halt_lock);
    	}
    ```
    
    3. In `task.c / dispatch()`, a worker may be running a task event;
       if that event blocks, the worker cannot halt.
    
    4. `do_addzone` acquires `LOCK(&view->new_zone_lock);`.
    
    5. The `rmzone` event is called from some worker's `dispatch()` and
       blocks waiting for the same lock.
    
    6. `do_addzone` calls `isc_task_beginexclusive`.
    
    7. The deadlock is triggered, since:
       - `rmzone` is waiting for the lock.
       - `isc_task_beginexclusive` is waiting for (no. workers - 1)
         workers to be halted.
       - Because the `rmzone` event is blocked, its worker cannot halt.
    
    To fix this, we updated the `do_addzone` code to call
    `isc_task_beginexclusive` before the lock is acquired. Both the locking
    and the `isc_task_beginexclusive` call are postponed to the nearest
    place where they are required.
    
    The same could happen with rndc modzone, so that was addressed as well.
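
The ordering described above can be sketched with plain pthreads. This is a simplified model under stated assumptions, not the BIND code: `worker`, `begin_exclusive`, `end_exclusive`, `add_zone_fixed`, `run_demo`, `zone_lock` and `WORKERS` are hypothetical names, with `zone_lock` playing the role of `view->new_zone_lock`. Exclusive mode is entered first and the lock is taken only afterwards, so no worker can be stuck on the lock while the caller waits for it to halt.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical model; none of these names are taken from BIND. */
#define WORKERS 3 /* total workers, counting the caller as one of them */

static pthread_mutex_t halt_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t halt_cond = PTHREAD_COND_INITIALIZER;   /* a worker halted */
static pthread_cond_t resume_cond = PTHREAD_COND_INITIALIZER; /* exclusive mode ended */
static pthread_mutex_t zone_lock = PTHREAD_MUTEX_INITIALIZER; /* stands in for new_zone_lock */
static bool exclusive_requested, shutting_down;
static int halted, zones;

static void *worker(void *arg) {
    (void)arg;
    for (;;) {
        /* One "event" (think rmzone): it briefly needs zone_lock. */
        pthread_mutex_lock(&zone_lock);
        pthread_mutex_unlock(&zone_lock);
        sched_yield();

        /* A worker can only halt between events, never inside one. */
        pthread_mutex_lock(&halt_lock);
        if (exclusive_requested) {
            halted++;
            pthread_cond_signal(&halt_cond);
            while (exclusive_requested)
                pthread_cond_wait(&resume_cond, &halt_lock);
            halted--;
        }
        bool done = shutting_down;
        pthread_mutex_unlock(&halt_lock);
        if (done)
            return NULL;
    }
}

/* Blocks until every other worker has halted. */
static void begin_exclusive(void) {
    pthread_mutex_lock(&halt_lock);
    exclusive_requested = true;
    while (halted + 1 < WORKERS)
        pthread_cond_wait(&halt_cond, &halt_lock);
    pthread_mutex_unlock(&halt_lock);
}

static void end_exclusive(void) {
    pthread_mutex_lock(&halt_lock);
    exclusive_requested = false;
    pthread_cond_broadcast(&resume_cond);
    pthread_mutex_unlock(&halt_lock);
}

/* Fixed order: exclusive mode first, lock second. Taking zone_lock
 * before begin_exclusive() could leave a worker blocked on the lock
 * inside an event, unable to halt, while the caller waits for it. */
static int add_zone_fixed(void) {
    begin_exclusive();
    pthread_mutex_lock(&zone_lock);
    int count = ++zones;
    pthread_mutex_unlock(&zone_lock);
    end_exclusive();
    return count;
}

/* Starts the other workers, adds one zone, then shuts them down. */
static int run_demo(void) {
    pthread_t tids[WORKERS - 1];
    shutting_down = false;
    zones = 0;
    for (int i = 0; i < WORKERS - 1; i++) {
        int rc = pthread_create(&tids[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    int result = add_zone_fixed();
    pthread_mutex_lock(&halt_lock);
    shutting_down = true;
    pthread_mutex_unlock(&halt_lock);
    for (int i = 0; i < WORKERS - 1; i++)
        pthread_join(tids[i], NULL);
    return result;
}
```

Calling `run_demo()` from a `main` function (built with pthread support) adds one zone and returns the zone count. In this model the lock is only ever held by the caller while all other workers are halted, which is the property the fix relies on.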