1. 30 Jul, 2020 3 commits
    • Merge branch '2024-fix-idle-timeout-for-connected-tcp-sockets' into 'main' · 1ce582ca
      Michał Kępień authored
      Fix idle timeout for connected TCP sockets
      
      Closes #2024
      
      See merge request !3854
    • Add CHANGES for GL #2024 · 18efb245
      Michał Kępień authored
    • Fix idle timeout for connected TCP sockets · 953d704b
      Michał Kępień authored
      When named, acting as a resolver, connects to an authoritative server over
      TCP, it sets the idle timeout for that connection to 20 seconds.  This
      fixed timeout was picked back when the default processing timeout for
      each client query was hardcoded to 30 seconds.  Commit
      000a8970 made this processing timeout
      configurable through "resolver-query-timeout" and decreased its default
      value to 10 seconds, but the idle TCP timeout was not adjusted to
      reflect that change.  As a result, with the current defaults in effect,
      a single hung TCP connection will consistently cause the resolution
      process for a given query to time out.
      
      Set the idle timeout for connected TCP sockets to half of the client
      query processing timeout configured for a resolver.  This allows named
      to handle hung TCP connections more robustly and prevents the timeout
      mismatch issue from resurfacing in the future if the default is ever
      changed again.
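
A minimal sketch of the rule described above, not the code from the commit itself. The helper name tcp_idle_timeout_ms() and the choice of milliseconds as the unit are assumptions made for illustration only.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not taken from the BIND sources): derive the idle
 * timeout for a connected TCP socket from the resolver's client query
 * processing timeout.  Both values are assumed to be in milliseconds.
 */
static inline uint32_t
tcp_idle_timeout_ms(uint32_t query_timeout_ms) {
	/* Half of the query timeout, as the commit message describes. */
	return query_timeout_ms / 2;
}
```

With the 10-second default mentioned above, this yields a 5-second idle timeout, so a single hung TCP connection no longer uses up the whole time budget for the query.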
  2. 29 Jul, 2020 2 commits
  3. 28 Jul, 2020 2 commits
  4. 27 Jul, 2020 3 commits
  5. 24 Jul, 2020 12 commits
  6. 23 Jul, 2020 2 commits
  7. 21 Jul, 2020 9 commits
    • Merge branch '1727-drop-use-of-featuretest-have-dlopen' into 'main' · 064e314d
      Michal Nowak authored
      Drop feature test for dlopen()
      
      Closes #1727
      
      See merge request !3625
    • Drop feature test for dlopen() · 2064e01c
      Michal Nowak authored
      With libtool being mandatory from 9.17 on, so is dlopen() (via libltdl).
    • Merge branch... · 451ed397
      Ondřej Surý authored
      Merge branch '1775-resizing-growing-of-cache-hash-tables-causes-delays-in-processing-of-client-queries' into 'main'
      
      Fix the rbt hashtable and grow it when setting max-cache-size
      
      Closes #1775
      
      See merge request !3865
    • Add CHANGES and release note for #1775 · 2b4f0f03
      Ondřej Surý authored
    • Change the dns_name hashing to use 32-bit values · a9182c89
      Ondřej Surý authored
      Change the dns_name_hash() and dns_name_fullhash() functions to use
      isc_hash32(), as the hashtable size in rbt is limited to the range
      0..UINT32_MAX.
    • Add isc_hash32() and rename isc_hash_function() to isc_hash64() · f59fd49f
      Ondřej Surý authored
      As the names suggest, the original function (now isc_hash64()) returns
      64-bit hash values and isc_hash32() returns 32-bit values.
    • Add HalfSipHash 2-4 reference implementation · 344d66aa
      Ondřej Surý authored
      The HalfSipHash implementation has 32-bit keys and returns a 32-bit
      value.
    • Remove OpenSSL based SipHash 2-4 implementation · 21d751df
      Ondřej Surý authored
      Creation of EVP_MD_CTX and EVP_PKEY is quite expensive, so until we fix the code
      to reuse the OpenSSL contexts and keys we'll use our own implementation of
      siphash instead of trying to integrate with OpenSSL.
    • Fix the rbt hashtable and grow it when setting max-cache-size · e24bc324
      Ondřej Surý authored
      There were several problems with the rbt hashtable implementation:
      
      1. Our internal hashing function returns a uint64_t value, but it was
         silently truncated to unsigned int in the dns_name_hash() and
         dns_name_fullhash() functions.  As the higher bits of SipHash 2-4
         are more random, we need to use the upper half of the return value.
      
      2. The hashtable implementation in rbt.c was using modulo to pick the
         slot number for the hash table.  This has several problems because
         modulo is: a) slow, b) oblivious to patterns in the input data.  This
         could lead to a very uneven distribution of the hashed data in the
         hashtable.  Combined with the singly-linked lists we use, it could
         really bog down the lookup and removal of the nodes from the rbt
         tree[a].  Fibonacci Hashing is a much better fit for the hashtable
         function here.  For a longer description, read "Fibonacci Hashing: The
         Optimization that the World Forgot"[b] or just look at the Linux
         kernel.  Also this will make Diego very happy :).
      
      3. The hashtable would rehash every time the number of nodes in the rbt
         tree exceeded 3 * (hashtable size).  The overcommit makes the
         uneven distribution in the hashtable even worse, but the main problem
         lies in the rehashing - every time the database grows beyond the
         limit, each subsequent rehashing will be much slower.  The mitigation
         here is letting the rbt know how big the cache can grow and
         pre-allocating the hashtable to be big enough to never actually need
         to rehash.  This will consume more memory at the start, but since the
         size of the hashtable is capped to `1 << 32` (i.e. about 4 billion
         entries), it will only consume a maximum of 32GB of memory for the
         hashtable in the worst case (and max-cache-size would need to be set
         to more than 4TB).  Calling dns_db_adjusthashsize() will also cap
         the maximum size of the hashtable to the pre-computed number of bits,
         so it won't try to consume more gigabytes of memory than available
         for the database.
      
         FIXME: What is the average size of the rbt node that gets hashed?  I
         chose the pagesize (4k) as initial value to precompute the size of
         the hashtable, but the value is based on feeling and not any real
         data.
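
The pre-computation described in item 3 might look roughly like the sketch below. It is an illustration under stated assumptions, not the code in rbt.c: the function name hash_bits_for_cache() and both macro names are hypothetical, and the 4096-byte per-node estimate is the guess from the FIXME above.

```c
#include <stdint.h>

/* Assumed average size of a hashed node, in bytes (the "pagesize" guess). */
#define NODE_SIZE_ESTIMATE 4096U
/* The hashtable is capped to 1 << 32 slots. */
#define MAX_HASH_BITS 32U

/*
 * Hypothetical helper: return the number of hashtable bits such that
 * (1 << bits) slots cover the estimated number of nodes that fit into
 * max_cache_size bytes, capped at MAX_HASH_BITS.
 */
static uint32_t
hash_bits_for_cache(uint64_t max_cache_size) {
	uint64_t nodes = max_cache_size / NODE_SIZE_ESTIMATE;
	uint32_t bits = 0;

	/* Grow the table until it covers the estimate or hits the cap. */
	while (bits < MAX_HASH_BITS && ((uint64_t)1 << bits) < nodes) {
		bits++;
	}
	return bits;
}
```

Under these assumptions, a 1 GiB cache (2^30 bytes) maps to 2^18 estimated nodes and therefore 18 bits. A different per-node estimate would shift the result accordingly.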
      
      For future work, there are more places where we use result of the hash
      value modulo some small number and that would benefit from Fibonacci
      Hashing to get better distribution.
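
For illustration, the upper-half truncation from item 1 and the Fibonacci Hashing slot selection from item 2 can be sketched as follows. The names upper_half() and fib_hash() and the macro name are hypothetical, and the multiplier 0x9E3779B9 (2^32 divided by the golden ratio, rounded down) is the commonly used 32-bit constant rather than a value taken from this commit.

```c
#include <stdint.h>

/* floor(2^32 / golden ratio); odd, so multiplication permutes all 32 bits. */
#define FIB_MULTIPLIER_32 0x9E3779B9U

/* Keep the upper 32 bits of a 64-bit hash instead of truncating it. */
static inline uint32_t
upper_half(uint64_t hash64) {
	return (uint32_t)(hash64 >> 32);
}

/*
 * Map a 32-bit hash value to a slot in a table of (1 << bits) entries.
 * The multiplication wraps modulo 2^32 and the top `bits` bits of the
 * product select the slot.  `bits` must be in the range 1..32.
 */
static inline uint32_t
fib_hash(uint32_t hashval, uint32_t bits) {
	return (uint32_t)(hashval * FIB_MULTIPLIER_32) >> (32 - bits);
}
```

Unlike `hashval % size`, this needs only a multiplication and a shift, and every input bit influences the top bits of the product, so regular patterns in the input are spread across the slots.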
      
      Notes:
      a. A doubly linked list should be used here to speedup the removal of
         the entries from the hashtable.
      b. https://probablydance.com/2018/06/16/fibonacci-hashing-the-optimization-that-the-world-forgot-or-a-better-alternative-to-integer-modulo/
  8. 20 Jul, 2020 1 commit
  9. 17 Jul, 2020 3 commits
    • Check tests for core files regardless of test status · 1b13123c
      Michal Nowak authored
      Failed tests should also be checked for core files and similar
      artifacts, and have a backtrace generated.
    • Rationalize backtrace logging · 05c13e50
      Michal Nowak authored
      The GDB backtrace generated via "thread apply all bt full" is too long
      for standard output; let's save it to a .txt file among the other log
      files.
    • Ensure various test issues are treated as failures · b232e858
      Michal Nowak authored
      Make sure bin/tests/system/run.sh returns a non-zero exit code if any of
      the following happens:
      
        - the test being run produces a core dump,
        - assertion failures are found in the test's logs,
        - ThreadSanitizer reports are found after the test completes,
        - the servers started by the test fail to shut down cleanly.
      
      This change is necessary to always fail a test in such cases (before the
      migration to Automake, test failures were determined based on the
      presence of "R:<test-name>:FAIL" lines in the test suite output and thus
      it was not necessary for bin/tests/system/run.sh to return a non-zero
      exit code).
  10. 16 Jul, 2020 3 commits