Skip to content
  • Tony Finch's avatar
    Refactor qp-trie to use QSBR · 4b5ec07b
    Tony Finch authored
    The first working multi-threaded qp-trie was stuck with an unpleasant
    trade-off:
    
      * Use `isc_rwlock`, which has acceptable write performance, but
        terrible read scalability because the qp-trie made all accesses
        through a single lock.
    
      * Use `liburcu`, which has great read scalability, but terrible
        write performance, because I was relying on `rcu_synchronize()`
        which is rather slow. And `liburcu` is LGPL.
    
    To get the best of both worlds, we need our own scalable read side,
    which we now have with `isc_qsbr`. And we need to modify the write
    side so that it is not blocked by readers.
    
    Better write performance requires an async cleanup function like
    `call_rcu()`, instead of the blocking `rcu_synchronize()`. (There
    is no blocking cleanup in `isc_qsbr`, because I have concluded
    that it would be an attractive nuisance.)
    
    Until now, all my multithreading qp-trie designs have been based
    around two versions, read-only and mutable. This is too few to
    work with asynchronous cleanup. The bare minimum (as in epoch
    based reclamation) is three, but it makes more sense to support an
    arbitrary number. Doing multi-version support "properly" makes
    fewer assumptions about how safe memory reclamation works, and it
    makes snapshots and rollbacks simpler.
    
    To avoid making the memory management even more complicated, I
    have introduced a new kind of "packed reader node" to anchor the
    root of a version of the trie. This is simpler because it re-uses
    the existing chunk lifetime logic - see the discussion under
    "packed reader nodes" in `qp_p.h`.
    
    I have also made the chunk lifetime logic simpler. The idea of a
    "generation" is gone; instead, chunks are either mutable or
    immutable. And the QSBR phase number is used to indicate when a
    chunk can be reclaimed.
    
    Instead of the `shared_base` flag (which was basically a one-bit
    reference count, with a two version limit) the base array now has a
    refcount, which replaces the confusing ad-hoc lifetime logic with
    something more familiar and systematic.
    4b5ec07b