    Fix a race in taskmgr between worker and task pausing/unpausing. · e1c4a691
    Witold Kręcicki authored
    To reproduce the race, create a task and send two events to it; the first
    event must take some time to process. Then, from the outside, pause(),
    unpause() and detach() the task.
    While the task processes the long-running event, it is in the
    task_state_running state. Calling pause() changed the state to
    task_state_paused. On unpause() we saw that there were events in the task
    queue, changed the state to task_state_ready and enqueued the task on the
    worker's readyq, even though a worker was still running it. We then
    detached the task.
    When dispatch() finishes processing the first event, it processes the
    second event in the queue, then shuts down the task and frees it (as it is
    no longer referenced). The dispatcher then takes the already freed task
    from the queue where it was wrongly put, causing a use-after-free and,
    subsequently, either an assertion failure or a segmentation fault.
    The probability of this happening is very low, but it can happen under
    very high load, and it is more likely on a recursive resolver than on an
    authoritative server.
    The fix introduces a new 'task_state_pausing' state, to which tasks are
    moved if they are paused while still running. They are moved to the
    task_state_paused state when the dispatcher is done with them, and if we
    unpause a task that is still in the pausing state, it is moved back to
    task_state_running and not requeued.
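    The state transitions described above can be sketched as a small,
    single-threaded model. This is only an illustration, not the actual
    taskmgr code: the sketch_* names, the struct layout and the times_queued
    counter are hypothetical, and locking is left out entirely. The point it
    shows is that a task unpaused while still pausing goes back to running
    without a new ready-queue entry, whereas a fully paused task with pending
    events is requeued.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the task states; not the BIND source. */
typedef enum {
    task_state_idle,
    task_state_ready,
    task_state_running,
    task_state_pausing, /* pause requested while a worker still runs the task */
    task_state_paused
} task_state_t;

typedef struct {
    task_state_t state;
    int pending_events; /* events still waiting in the task's queue */
    int times_queued;   /* ready-queue entries that refer to this task */
} sketch_task_t;

/* pause(): a running task is only marked; the worker still owns it. */
static void sketch_pause(sketch_task_t *t) {
    assert(t->state == task_state_idle || t->state == task_state_ready ||
           t->state == task_state_running);
    if (t->state == task_state_running) {
        t->state = task_state_pausing;
        return;
    }
    if (t->state == task_state_ready) {
        t->times_queued--; /* assumption: a ready task is taken off the queue */
    }
    t->state = task_state_paused;
}

/* unpause(): requeue only if no worker is holding the task. */
static void sketch_unpause(sketch_task_t *t) {
    assert(t->state == task_state_pausing || t->state == task_state_paused);
    if (t->state == task_state_pausing) {
        t->state = task_state_running; /* worker never let go: do not requeue */
        return;
    }
    t->state = task_state_idle;
    if (t->pending_events > 0) {
        t->state = task_state_ready;
        t->times_queued++;
    }
}

/* Called by the worker when the current event handler returns. */
static void sketch_event_done(sketch_task_t *t) {
    if (t->state == task_state_pausing) {
        t->state = task_state_paused; /* the pause takes effect now */
    } else if (t->pending_events > 0) {
        t->pending_events--;          /* start the next event, stay running */
    } else {
        t->state = task_state_idle;
    }
}

/* Replays pause() + unpause() during a long-running first event and
 * returns how many ready-queue entries exist for the task afterwards. */
static int sketch_replay_race(void) {
    sketch_task_t t = { task_state_running, 1, 0 };
    sketch_pause(&t);
    sketch_unpause(&t);
    return t.times_queued;
}
```

    In this model sketch_replay_race() leaves no ready-queue entry behind, so
    the worker that is already running the task remains its only owner. Before
    the fix, the same sequence would have left the task both running and
    queued.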