Skip to content
  • Ondřej Surý's avatar
    Refactor netmgr and add more unit tests · 634bdfb1
    Ondřej Surý authored
    This is a part of the works that intends to make the netmgr stable,
    testable, maintainable and tested.  It contains a numerous changes to
    the netmgr code and unfortunately, it was not possible to split this
    into smaller chunks as the work here needs to be committed as a complete
    works.
    
    NOTE: There's a quite a lot of duplicated code between udp.c, tcp.c and
    tcpdns.c and it should be a subject to refactoring in the future.
    
    The changes that are included in this commit are listed here
    (extensively, but not exclusively):
    
    * The netmgr_test unit test was split into individual tests (udp_test,
      tcp_test, tcpdns_test and newly added tcp_quota_test)
    
    * The udp_test and tcp_test has been extended to allow programatic
      failures from the libuv API.  Unfortunately, we can't use cmocka
      mock() and will_return(), so we emulate the behaviour with #define and
      including the netmgr/{udp,tcp}.c source file directly.
    
    * The netievents that we put on the nm queue have variable number of
      members, out of these the isc_nmsocket_t and isc_nmhandle_t always
      needs to be attached before enqueueing the netievent_<foo> and
      detached after we have called the isc_nm_async_<foo> to ensure that
      the socket (handle) doesn't disappear between scheduling the event and
      actually executing the event.
    
    * Cancelling the in-flight TCP connection using libuv requires to call
      uv_close() on the original uv_tcp_t handle which just breaks too many
      assumptions we have in the netmgr code.  Instead of using uv_timer for
      TCP connection timeouts, we use platform specific socket option.
    
    * Fix the synchronization between {nm,async}_{listentcp,tcpconnect}
    
      When isc_nm_listentcp() or isc_nm_tcpconnect() is called it was
      waiting for socket to either end up with error (that path was fine) or
      to be listening or connected using condition variable and mutex.
    
      Several things could happen:
    
        0. everything is ok
    
        1. the waiting thread would miss the SIGNAL() - because the enqueued
           event would be processed faster than we could start WAIT()ing.
           In case the operation would end up with error, it would be ok, as
           the error variable would be unchanged.
    
        2. the waiting thread miss the sock->{connected,listening} = `true`
           would be set to `false` in the tcp_{listen,connect}close_cb() as
           the connection would be so short lived that the socket would be
           closed before we could even start WAIT()ing
    
    * The tcpdns has been converted to using libuv directly.  Previously,
      the tcpdns protocol used tcp protocol from netmgr, this proved to be
      very complicated to understand, fix and make changes to.  The new
      tcpdns protocol is modeled in a similar way how tcp netmgr protocol.
      Closes: #2194, #2283, #2318, #2266, #2034, #1920
    
    * The tcp and tcpdns is now not using isc_uv_import/isc_uv_export to
      pass accepted TCP sockets between netthreads, but instead (similar to
      UDP) uses per netthread uv_loop listener.  This greatly reduces the
      complexity as the socket is always run in the associated nm and uv
      loops, and we are also not touching the libuv internals.
    
      There's an unfortunate side effect though, the new code requires
      support for load-balanced sockets from the operating system for both
      UDP and TCP (see #2137).  If the operating system doesn't support the
      load balanced sockets (either SO_REUSEPORT on Linux or SO_REUSEPORT_LB
      on FreeBSD 12+), the number of netthreads is limited to 1.
    
    * The netmgr has now two debugging #ifdefs:
    
      1. Already existing NETMGR_TRACE prints any dangling nmsockets and
         nmhandles before triggering assertion failure.  This options would
         reduce performance when enabled, but in theory, it could be enabled
         on low-performance systems.
    
      2. New NETMGR_TRACE_VERBOSE option has been added that enables
         extensive netmgr logging that allows the software engineer to
         precisely track any attach/detach operations on the nmsockets and
         nmhandles.  This is not suitable for any kind of production
         machine, only for debugging.
    
    * The tlsdns netmgr protocol has been split from the tcpdns and it still
      uses the old method of stacking the netmgr boxes on top of each other.
      We will have to refactor the tlsdns netmgr protocol to use the same
      approach - build the stack using only libuv and openssl.
    
    * Limit but not assert the tcp buffer size in tcp_alloc_cb
      Closes: #2061
    634bdfb1