1. 11 Oct, 2021 1 commit
    • Disable lame-ttl cache · 8fe18c05
      Ondřej Surý authored and Michał Kępień committed
      The lame-ttl cache is implemented in the ADB as a per-server, locked
      linked list "indexed" by <qname,qtype>.  This list has to be walked
      every time there is a new query or a new record is added to the lame
      cache, so a determined attacker can use it to degrade the performance
      of the resolver.
      
      Resolver testing has shown that disabling the lame cache has little
      impact on resolver performance, and it is a minimal viable defense
      against this kind of attack.
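      A minimal sketch of why this structure is attackable, assuming a
      simplified model: the `lame_entry`/`server` types and the `walked`
      counter below are hypothetical, not the ADB's real data structures,
      and the per-server lock is omitted.  Each lookup or insertion is linear
      in the number of cached <qname,qtype> pairs, so an attacker who can get
      many distinct entries cached makes every later operation slower.

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <stdlib.h>
      #include <string.h>

      /* Hypothetical per-server lame cache: a singly linked list searched
       * by <qname,qtype>.  Not the ADB's real types; locking omitted. */
      struct lame_entry {
          char qname[256];
          unsigned int qtype;
          struct lame_entry *next;
      };

      struct server {
          struct lame_entry *lame; /* head of this server's lame list */
          size_t walked;           /* entries visited, to expose the cost */
      };

      /* Every lookup walks the list: O(n) in the number of entries. */
      static int lame_lookup(struct server *s, const char *qname,
                             unsigned int qtype) {
          assert(s != NULL && qname != NULL);
          for (struct lame_entry *e = s->lame; e != NULL; e = e->next) {
              s->walked++;
              if (e->qtype == qtype && strcmp(e->qname, qname) == 0) {
                  return 1;
              }
          }
          return 0;
      }

      /* Adding walks the list too (to avoid duplicates), so each new entry
       * makes all later lookups and insertions slower. */
      static int lame_add(struct server *s, const char *qname,
                          unsigned int qtype) {
          if (lame_lookup(s, qname, qtype)) {
              return 0; /* already cached */
          }
          struct lame_entry *e = calloc(1, sizeof(*e));
          if (e == NULL) {
              return -1;
          }
          strncpy(e->qname, qname, sizeof(e->qname) - 1);
          e->qtype = qtype;
          e->next = s->lame;
          s->lame = e;
          return 1;
      }

      static void lame_free(struct server *s) {
          struct lame_entry *e = s->lame;
          while (e != NULL) {
              struct lame_entry *next = e->next;
              free(e);
              e = next;
          }
          s->lame = NULL;
      }
      ```

      In this model, with 100 cached entries a single cache miss already
      visits all 100 of them; disabling the cache removes the walk entirely.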
  2. 30 Sep, 2021 1 commit
    • Handle a missing zone when reloading a catalog zone · 311074f5
      Aram Sargsyan authored
      Previously, a missing or deleted zone that was referenced by a catalog
      zone caused a crash during a reload.
      
      This commit makes `named` ignore the fact that the zone is missing,
      and makes sure to restore it later on.
      
      (cherry picked from commit 94a57128)
  3. 29 Sep, 2021 1 commit
    • Fix "check-names master" and "check-names slave" · 7aa30aae
      Mark Andrews authored
      Check for type "master"/"slave" at the same time as checking for
      "primary"/"secondary" as we step through the maps.
      
      Checking for "primary" and then "master" (or "master" and then
      "primary") does not work, because finding the synonym in an earlier
      map does not stop the search.  The same applies to "secondary" and
      "slave".
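      The lookup rule can be sketched as follows, assuming each map is a
      small list of check-names entries ordered from most to least specific;
      the `entry`/`map` types and `lookup_check_names()` are hypothetical,
      not the isccfg API.  Both names are tested within the same map before
      moving on to the next one.

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <string.h>

      /* Hypothetical check-names entry, e.g. { "master", "ignore" }. */
      struct entry {
          const char *type;
          const char *value;
      };

      /* Hypothetical map (e.g. view options, then global options). */
      struct map {
          const struct entry *entries;
          size_t count;
      };

      /* Return the value for `type` or `synonym` from the first (most
       * specific) map containing either of them, or NULL if none does. */
      static const char *lookup_check_names(const struct map *maps,
                                            size_t nmaps, const char *type,
                                            const char *synonym) {
          assert(maps != NULL || nmaps == 0);
          for (size_t m = 0; m < nmaps; m++) {
              for (size_t i = 0; i < maps[m].count; i++) {
                  const char *t = maps[m].entries[i].type;
                  /* Check both names in this map before moving on. */
                  if (strcmp(t, type) == 0 || strcmp(t, synonym) == 0) {
                      return maps[m].entries[i].value;
                  }
              }
          }
          return NULL;
      }
      ```

      With a view that sets `master` to ignore and global options that set
      `primary` to fail, this returns ignore; searching every map for
      "primary" first would have returned fail.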
      
      (cherry picked from commit a3c6516a)
  4. 14 Sep, 2021 1 commit
    • Remove the code to adjust listening interfaces for *-source-v6 · 0807d8b0
      Ondřej Surý authored and Ondřej Surý committed
      Previously, named could run with a configuration in which the address
      and port of the *-source-v6 options (notify-source-v6,
      transfer-source-v6 and query-source-v6) were simultaneously used for
      listening.  This is no longer true for BIND 9.16+, and the code that
      performed the interface adjustments would unexpectedly disable
      listening on TCP for such interfaces.
      
      This commit removes the code that adjusted listening interfaces for
      addresses/ports configured in the *-source-v6 options.
      
      (cherry picked from commit 8ac1d4e0)
  5. 30 Aug, 2021 1 commit
  6. 25 Aug, 2021 1 commit
  7. 12 Jul, 2021 1 commit
  8. 01 Jul, 2021 2 commits
  9. 22 Jun, 2021 1 commit
    • Use minimal-sized caches for non-recursive views · 126436cc
      Michał Kępień authored
      Currently the implicit default for the "max-cache-size" option is "90%".
      As this option is inherited by all configured views, using multiple
      views can lead to memory exhaustion over time due to overcommitment.
      The "max-cache-size 90%;" default also causes cache RBT hash tables to
      be preallocated for every configured view, which does not really make
      sense for views which do not allow recursion.
      
      To limit this problem's potential for causing operational issues, use a
      minimal-sized cache for views which do not allow recursion and do not
      have "max-cache-size" explicitly set (either in global configuration or
      in view configuration).
      
      For configurations which include multiple views allowing recursion,
      adjusting "max-cache-size" appropriately is still left to the operator.
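      The resulting sizing rule can be sketched as below.  The structure, the
      function name and the `MIN_CACHE_SIZE_ASSUMED` value are hypothetical
      (the actual minimal size is not stated here), and `default_size` stands
      for the implicit "90%" default already converted to bytes.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdint.h>

      /* Illustrative value only; not the size named actually uses. */
      #define MIN_CACHE_SIZE_ASSUMED (2U * 1024 * 1024)

      struct view_cfg {
          bool recursion;          /* does the view allow recursion? */
          bool max_cache_size_set; /* set in view or global options? */
          uint64_t max_cache_size; /* bytes, when set */
      };

      /* Hypothetical helper: the cache size a view would end up using. */
      static uint64_t view_cache_size(const struct view_cfg *v,
                                      uint64_t default_size) {
          assert(v != NULL);
          if (v->max_cache_size_set) {
              return v->max_cache_size;    /* explicit setting always wins */
          }
          if (!v->recursion) {
              return MIN_CACHE_SIZE_ASSUMED; /* non-recursive: minimal */
          }
          return default_size;             /* recursive: implicit default */
      }
      ```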
      
      (cherry picked from commit 86541b39)
  10. 31 May, 2021 1 commit
    • Refactor zone dumping code to use netmgr async threadpools · c8eddf4f
      Ondřej Surý authored and Ondřej Surý committed
      Previously, dumping zones to files was quantized so that it would not
      slow down network I/O processing.  With the introduction of network
      manager asynchronous threadpools, we can move the I/O-intensive work to
      that API, and we no longer have to quantize it, as the file I/O won't
      block anything except other zone-dumping processes.
      
      (cherry picked from commit 8a5c62de)
  11. 20 May, 2021 1 commit
  12. 14 May, 2021 1 commit
    • backport of netmgr/taskmgr to 9.16 · ef1d909f
      Evan Hunt authored and Ondřej Surý committed
      This rolls up numerous changes that have been applied to the
      main branch, including moving isc_task operations into the
      netmgr event loops, and other general stabilization.
  13. 05 May, 2021 1 commit
  14. 30 Apr, 2021 1 commit
    • Add built-in dnssec-policy "insecure" · 375112a6
      Matthijs Mekking authored
      Add a new built-in policy "insecure", to be used to gracefully unsign
      a zone. Previously you could just remove the 'dnssec-policy'
      configuration from your zone statement, or set it to "none".
      
      The built-in policy "none" (or not configured) now actually means
      no DNSSEC maintenance for the corresponding zone. So if you
      immediately reconfigure your zone from whatever policy to "none",
      your zone will temporarily be seen as bogus by validating resolvers.
      
      This means we can remove the functions 'dns_zone_use_kasp()' and
      'dns_zone_secure_to_insecure()' again. We also no longer have to
      check for the existence of key state files to figure out if a zone
      is transitioning to insecure.
      
      (cherry picked from commit 2710d9a1)
  15. 29 Apr, 2021 1 commit
  16. 26 Apr, 2021 1 commit
    • Fix deadlock between rndc addzone/delzone/modzone · 942b83d3
      Diego Fronza authored
      The following steps led to the deadlock:
      
      1. `do_addzone` calls `isc_task_beginexclusive`.
      
      2. `isc_task_beginexclusive` waits for (N_WORKERS - 1) workers to
         halt, blocking until they do:
      ```
      ...
      isc_task_beginexclusive(isc_task_t *task0) {
          ...
      	while (manager->halted + 1 < manager->workers) {
      		wake_all_queues(manager);
      		WAIT(&manager->halt_cond, &manager->halt_lock);
      	}
      	...
      }
      ```
      
      3. It is possible that, in `task.c` / `dispatch()`, a worker is running
         a task event; if that event blocks, it will not allow this worker to
         halt.
      
      4. `do_addzone` acquires `LOCK(&view->new_zone_lock);`.
      
      5. `rmzone` event is called from some worker's `dispatch()`, `rmzone`
         blocks waiting for the same lock.
      
      6. `do_addzone` calls `isc_task_beginexclusive`.
      
      7. Deadlock is triggered, since:
      	- `rmzone` is waiting for the lock.
      	- `isc_task_beginexclusive` is waiting for (no. workers - 1) workers
      	   to halt.
      	- since the `rmzone` event is blocked, it won't allow its worker to halt.
      
      To fix this, we updated the do_addzone code to call
      isc_task_beginexclusive before the lock is acquired, and we postpone
      both the locking and isc_task_beginexclusive to the nearest place where
      they are required.
      
      The same could happen with rndc modzone, so that was addressed as well.
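      A deliberately simplified, single-threaded model of the ordering fix,
      under the assumption that entering exclusive mode just means every
      other worker must first finish its current event.  `struct model` and
      the functions below are hypothetical stand-ins for `do_addzone`,
      `rmzone` and `isc_task_beginexclusive`; instead of hanging, the model
      reports the deadlock by returning false.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      struct model {
          bool new_zone_lock_held; /* held by do_addzone */
          bool rmzone_running;     /* a worker's rmzone event needs the lock */
      };

      /* Exclusive mode needs every other worker to halt.  If the lock is
       * free, rmzone takes it, finishes, and its worker halts; if we hold
       * the lock, that worker can never halt (deadlock -> false). */
      static bool begin_exclusive(struct model *m) {
          assert(m != NULL);
          if (m->rmzone_running && !m->new_zone_lock_held) {
              m->rmzone_running = false; /* rmzone completes */
          }
          return !m->rmzone_running;
      }

      /* Old order: take new_zone_lock, then request exclusive mode. */
      static bool addzone_lock_then_exclusive(struct model *m) {
          m->new_zone_lock_held = true;
          bool ok = begin_exclusive(m);
          m->new_zone_lock_held = false;
          return ok;
      }

      /* Fixed order: request exclusive mode first, then take the lock. */
      static bool addzone_exclusive_then_lock(struct model *m) {
          bool ok = begin_exclusive(m);
          if (ok) {
              m->new_zone_lock_held = true;
              /* ... add the zone ... */
              m->new_zone_lock_held = false;
          }
          return ok;
      }
      ```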
  17. 22 Mar, 2021 1 commit
    • Rekey immediately after rndc checkds/rollover · d12b40f6
      Matthijs Mekking authored
      Call 'dns_zone_rekey' after a 'rndc dnssec -checkds' or 'rndc dnssec
      -rollover' command is received, because such a command may influence
      the next key event. Updating the keys immediately avoids unnecessary
      rollover delays.
      
      The kasp system test no longer needs to call 'rndc loadkeys' after
      a 'rndc dnssec -checkds' or 'rndc dnssec -rollover' command.
      
      (cherry picked from commit 82f72ae2)
  18. 18 Mar, 2021 1 commit
    • Change the isc_nm_(get|set)timeouts() to work with milliseconds · db49ffca
      Ondřej Surý authored
      RFC 7828 specifies the keepalive interval as a 16-bit value in units
      of 100 milliseconds, and the tcp-*-timeouts configuration options
      follow suit.  Units of 100 milliseconds are very unintuitive, and
      while we can't change the configuration and presentation format, we
      should not follow this odd unit in the API.
      
      This commit changes the isc_nm_(get|set)timeouts() functions to work
      with milliseconds; the values are converted to milliseconds before
      they are passed to the functions, not just internally.
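      The unit conversion at the API boundary might look like this.  The
      helper names are hypothetical, and saturating at the 16-bit maximum is
      an assumption about handling out-of-range values, not something the
      commit specifies.

      ```c
      #include <assert.h>
      #include <stdint.h>

      /* 100 ms units (RFC 7828, tcp-*-timeouts) -> milliseconds.
       * 65535 * 100 fits comfortably in 32 bits. */
      static uint32_t units100ms_to_ms(uint16_t units) {
          return (uint32_t)units * 100U;
      }

      /* Milliseconds -> 100 ms units, truncating to whole units and
       * saturating at the 16-bit maximum. */
      static uint16_t ms_to_units100ms(uint32_t ms) {
          uint32_t units = ms / 100U;
          return units > UINT16_MAX ? (uint16_t)UINT16_MAX : (uint16_t)units;
      }
      ```

      For example, a configured value of 300 becomes 30000 ms in the API.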
  19. 16 Feb, 2021 2 commits
  20. 15 Feb, 2021 1 commit
    • Fix dangling references to outdated views after reconfig · d89a8bf6
      Diego Fronza authored
      This commit fixes a leak that occurred every time an inline-signed
      zone was added to the configuration, followed by an rndc reconfig.
      
      During the reconfig process, the secure version of every inline-signed
      zone was "moved" to a new view and "took the raw version along", but
      the raw version's prev_view was only detached once the secure version
      was freed (at shutdown), which is also when the old view was finally
      released.
      
      This caused dangling references to be kept for the previous view, thus
      keeping all resources used by that view in memory.
  21. 29 Jan, 2021 2 commits
    • Added option for disabling stale-answer-client-timeout · 0aebad96
      Diego Fronza authored and Matthijs Mekking committed
      This commit allows specifying "disabled" or "off" in the
      stale-answer-client-timeout statement. The logic to support this
      behavior will be added in subsequent commits.
      
      This commit also enforces an upper bound on stale-answer-client-timeout
      equal to one second less than 'resolver-query-timeout'.
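      One way to picture the option handling, assuming both timeouts are
      already expressed in milliseconds; `stale_client_timeout()` and the
      `STALE_TIMEOUT_DISABLED` sentinel are hypothetical.  The commit only
      says an upper bound is enforced; this sketch clamps larger values
      rather than rejecting them, which may differ from named's behavior.

      ```c
      #include <assert.h>
      #include <stdint.h>
      #include <stdlib.h>
      #include <string.h>

      /* Hypothetical sentinel meaning "stale-answer-client-timeout off". */
      #define STALE_TIMEOUT_DISABLED UINT32_MAX

      static uint32_t stale_client_timeout(const char *value,
                                           uint32_t resolver_query_timeout_ms) {
          assert(value != NULL);
          if (strcmp(value, "disabled") == 0 || strcmp(value, "off") == 0) {
              return STALE_TIMEOUT_DISABLED;
          }
          uint32_t ms = (uint32_t)strtoul(value, NULL, 10);
          /* Upper bound: one second less than resolver-query-timeout. */
          uint32_t max = resolver_query_timeout_ms > 1000
                             ? resolver_query_timeout_ms - 1000
                             : 0;
          return ms > max ? max : ms;
      }
      ```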
      
      (cherry picked from commit 0ad6f594)
    • Add stale-answer-client-timeout option · 3478794a
      Diego Fronza authored and Matthijs Mekking committed
      The general logic behind the addition of this new feature works as
      follows:
      
      When a client query arrives, the basic path (query.c / ns_query_recurse)
      was to create a fetch, waiting for completion in fetch_callback.
      
      With the introduction of stale-answer-client-timeout, a new event of
      type DNS_EVENT_TRYSTALE may invoke fetch_callback, whenever stale
      answers are enabled and the fetch took longer than
      stale-answer-client-timeout to complete.
      
      When an event of type DNS_EVENT_TRYSTALE triggers fetch_callback, we
      must ensure that the following happens:
      
      1. Set up a new query context whose sole purpose is to look up stale
         RRset data only; for this, a new flag, 'DNS_DBFIND_STALEONLY', was
         added for use in database lookups.
      
          . If a stale RRset is found, mark the original client query as
            answered (with a new query attribute named NS_QUERYATTR_ANSWERED),
            so when the fetch completion event is received later, we avoid
            answering the client twice.
      
          . If a stale RRset is not found, clean up and wait for the normal
            fetch completion event.
      
      2. In ns_query_done, we must change this part:
      	/*
      	 * If we're recursing then just return; the query will
      	 * resume when recursion ends.
      	 */
      	if (RECURSING(qctx->client)) {
      		return (qctx->result);
      	}
      
         To this:
      
      	if (RECURSING(qctx->client) && !QUERY_STALEONLY(qctx->client)) {
      		return (qctx->result);
      	}
      
         Otherwise we would not proceed to answer the client if a stale
         answer happened to be found when looking up stale-only data.
      
      When an event of type DNS_EVENT_FETCHDONE triggers fetch_callback, we
      proceed as before (resuming the query, updating stats, etc.), but a few
      exceptions had to be added, the two most important of which are:
      
      1. Before answering the client (ns_client_send), check whether the
         query has already been answered.
      
      2. Before detaching a client, e.g.
         isc_nmhandle_detach(&client->reqhandle), ensure that this is the
         fetch completion event, and not the one triggered due to
         stale-answer-client-timeout, so a correct call would be:
         if (!QUERY_STALEONLY(client)) {
              isc_nmhandle_detach(&client->reqhandle);
         }
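      The double-answer guard described above can be modelled as a small
      state machine.  The `client` struct, the event enum and this
      `fetch_callback()` are hypothetical stand-ins for the real query.c
      code, with `answered` playing the role of NS_QUERYATTR_ANSWERED and
      `handle_attached` that of client->reqhandle.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      enum event_type { EVENT_TRYSTALE, EVENT_FETCHDONE };

      struct client {
          bool answered;        /* stands in for NS_QUERYATTR_ANSWERED */
          bool handle_attached; /* stands in for client->reqhandle */
          int responses_sent;
      };

      static void fetch_callback(struct client *c, enum event_type ev,
                                 bool stale_found) {
          assert(c != NULL);
          if (ev == EVENT_TRYSTALE) {
              /* Stale-only lookup: answer early if a stale RRset exists. */
              if (stale_found && !c->answered) {
                  c->responses_sent++;
                  c->answered = true;
              }
              /* Never detach here: the fetch is still outstanding. */
              return;
          }
          /* EVENT_FETCHDONE: answer only if not answered already. */
          if (!c->answered) {
              c->responses_sent++;
              c->answered = true;
          }
          c->handle_attached = false; /* detach only on fetch completion */
      }
      ```

      Whichever event arrives first, the client gets exactly one response,
      and the handle is released only when the fetch completes.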
      
      Other than these notes, comments were added to the code in an attempt
      to make these updates easier to follow.
      
      (cherry picked from commit 171a5b75)
  22. 28 Jan, 2021 1 commit
  23. 27 Jan, 2021 1 commit
  24. 12 Jan, 2021 1 commit
  25. 04 Jan, 2021 1 commit
  26. 23 Dec, 2020 1 commit
    • Treat dnssec-policy "none" as a builtin zone · cf0439cd
      Matthijs Mekking authored
      Configure "none" as a builtin policy. Change the 'cfg_kasp_fromconfig'
      API so that the 'name' determines which policy needs to be
      configured.
      
      When transitioning a zone from secure to insecure, there will be
      cases when a zone with no DNSSEC policy (dnssec-policy none) should
      be using KASP. When there are key state files available, this is an
      indication that the zone once was DNSSEC signed but is reconfigured
      to become insecure.
      
      If we did not run the keymgr, named would abruptly remove the
      DNSSEC records from the zone, making the zone bogus. Therefore,
      change the code such that a zone uses kasp if there is a valid
      dnssec-policy configured, or if there are state files available.
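      The rule in the previous paragraph reduces to a small predicate.  This
      is a hypothetical sketch, not the real code, and it assumes that a
      "valid dnssec-policy" means any policy other than "none"; note that the
      later "insecure" policy commit removes this kind of state-file check.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <string.h>

      /* Hypothetical: should this zone be maintained by kasp/keymgr? */
      static bool zone_uses_kasp(const char *policy_name,
                                 bool has_key_state_files) {
          bool valid_policy = policy_name != NULL &&
                              strcmp(policy_name, "none") != 0;
          /* State files mean the zone was once signed: keep running the
           * keymgr so DNSSEC records are withdrawn gracefully. */
          return valid_policy || has_key_state_files;
      }
      ```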
      
      (cherry picked from commit cf420b2a)
  27. 09 Dec, 2020 1 commit
    • Refactor netmgr and add more unit tests · 7b9c8b97
      Ondřej Surý authored
      This is part of the work intended to make the netmgr stable,
      testable, maintainable and tested.  It contains numerous changes to
      the netmgr code and, unfortunately, it was not possible to split them
      into smaller chunks, as the work here needs to be committed as a
      complete whole.
      
      NOTE: There's quite a lot of duplicated code between udp.c, tcp.c and
      tcpdns.c, and it should be subject to refactoring in the future.
      
      The changes that are included in this commit are listed here
      (extensively, but not exclusively):
      
      * The netmgr_test unit test was split into individual tests (udp_test,
        tcp_test, tcpdns_test and newly added tcp_quota_test)
      
      * The udp_test and tcp_test have been extended to allow programmatic
        failures from the libuv API.  Unfortunately, we can't use cmocka
        mock() and will_return(), so we emulate the behaviour with #define and
        by including the netmgr/{udp,tcp}.c source file directly.
      
      * The netievents that we put on the nm queue have a variable number of
        members; of these, the isc_nmsocket_t and isc_nmhandle_t always
        need to be attached before enqueueing the netievent_<foo> and
        detached after we have called isc_nm_async_<foo>, to ensure that
        the socket (handle) doesn't disappear between scheduling the event
        and actually executing it.
      
      * Cancelling an in-flight TCP connection using libuv requires calling
        uv_close() on the original uv_tcp_t handle, which breaks too many
        assumptions we have in the netmgr code.  Instead of using uv_timer for
        TCP connection timeouts, we use a platform-specific socket option.
      
      * Fix the synchronization between {nm,async}_{listentcp,tcpconnect}
      
        When isc_nm_listentcp() or isc_nm_tcpconnect() was called, it waited
        for the socket either to end up with an error (that path was fine) or
        to be listening or connected, using a condition variable and mutex.
      
        Several things could happen:
      
          0. everything is ok
      
          1. the waiting thread would miss the SIGNAL(), because the enqueued
             event was processed faster than we could start WAIT()ing.
             If the operation ended with an error, this was fine, as the
             error variable would be unchanged.
      
          2. the waiting thread would miss sock->{connected,listening} being
             set to `true`, because it would be reset to `false` in
             tcp_{listen,connect}close_cb() when the connection was so
             short-lived that the socket was closed before we could even
             start WAIT()ing.
      
      * The tcpdns protocol has been converted to use libuv directly.
        Previously, the tcpdns protocol used the tcp protocol from netmgr;
        this proved to be very complicated to understand, fix and change.
        The new tcpdns protocol is modeled in a similar way to the tcp
        netmgr protocol.
        Closes: #2194, #2283, #2318, #2266, #2034, #1920
      
      * The tcp and tcpdns protocols no longer use isc_uv_import/isc_uv_export
        to pass accepted TCP sockets between netthreads; instead (similar to
        UDP), they use a per-netthread uv_loop listener.  This greatly reduces
        the complexity, as the socket is always run in the associated nm and
        uv loops, and we are also not touching the libuv internals.
      
        There's an unfortunate side effect, though: the new code requires
        support for load-balanced sockets from the operating system for both
        UDP and TCP (see #2137).  If the operating system doesn't support
        load-balanced sockets (either SO_REUSEPORT on Linux or SO_REUSEPORT_LB
        on FreeBSD 12+), the number of netthreads is limited to 1.
      
      * The netmgr now has two debugging #ifdefs:
      
        1. The already existing NETMGR_TRACE prints any dangling nmsockets
           and nmhandles before triggering an assertion failure.  This option
           reduces performance when enabled, but in theory, it could be
           enabled on low-performance systems.
      
        2. New NETMGR_TRACE_VERBOSE option has been added that enables
           extensive netmgr logging that allows the software engineer to
           precisely track any attach/detach operations on the nmsockets and
           nmhandles.  This is not suitable for any kind of production
           machine, only for debugging.
      
      * The tlsdns netmgr protocol has been split from tcpdns and still
        uses the old method of stacking the netmgr boxes on top of each other.
        We will have to refactor the tlsdns netmgr protocol to use the same
        approach: build the stack using only libuv and OpenSSL.
      
      * Limit, but do not assert, the TCP buffer size in tcp_alloc_cb
        Closes: #2061
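      The two missed-wakeup cases in the synchronization item above are
      commonly avoided by waiting on a latched predicate under the mutex,
      rather than on a flag that a close callback can reset.  The sketch
      below shows that general pattern with POSIX threads; `struct
      completion` and its functions are hypothetical, and the commit does not
      say that netmgr uses exactly this mechanism.

      ```c
      #include <assert.h>
      #include <pthread.h>
      #include <stdbool.h>

      struct completion {
          pthread_mutex_t lock;
          pthread_cond_t cond;
          bool done;  /* latched: never reset by later close callbacks */
          int result; /* 0 = listening/connected, nonzero = error */
      };

      static void completion_init(struct completion *c) {
          pthread_mutex_init(&c->lock, NULL);
          pthread_cond_init(&c->cond, NULL);
          c->done = false;
          c->result = 0;
      }

      /* Called from the event loop once listen/connect succeeds or fails. */
      static void completion_signal(struct completion *c, int result) {
          pthread_mutex_lock(&c->lock);
          c->result = result;
          c->done = true;
          pthread_cond_signal(&c->cond);
          pthread_mutex_unlock(&c->lock);
      }

      /* Called by the thread that enqueued the event.  `done` is checked
       * under the lock before every wait, so an early signal is not lost. */
      static int completion_wait(struct completion *c) {
          pthread_mutex_lock(&c->lock);
          while (!c->done) {
              pthread_cond_wait(&c->cond, &c->lock);
          }
          int result = c->result;
          pthread_mutex_unlock(&c->lock);
          return result;
      }

      static void completion_destroy(struct completion *c) {
          pthread_cond_destroy(&c->cond);
          pthread_mutex_destroy(&c->lock);
      }
      ```

      Because the predicate is checked before waiting, a signal sent before
      the waiter starts WAIT()ing is not lost (case 1), and because nothing
      resets `done`, a short-lived connection cannot hide the outcome
      (case 2).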
      
      (cherry picked from commit 634bdfb1)
  28. 08 Dec, 2020 1 commit
  29. 26 Nov, 2020 7 commits
    • Detect NSEC3 salt collisions · 6db87916
      Matthijs Mekking authored
      When generating a new salt, compare it with the previous NSEC3
      parameters to ensure the new parameters are different from the
      previous ones.
      
      This moves the salt generation call from 'bin/named/*.c' to
      'lib/dns/zone.c'. When setting new NSEC3 parameters, you can set a new
      function parameter 'resalt' to enforce a new salt to be generated. A
      new salt will also be generated if 'salt' is set to NULL.
      
      Logging salt with zone context can now be done with 'dnssec_log',
      removing the need for 'dns_nsec3_log_salt'.
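      The collision check can be sketched as a retry loop.  `new_salt()`,
      `salt_gen_fn` and `toy_gen()` are hypothetical; the sketch compares
      only the salt, whereas the commit compares the complete NSEC3
      parameters, and `toy_gen()` exists only to make the example
      deterministic.

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <string.h>

      typedef void (*salt_gen_fn)(unsigned char *buf, size_t len, void *arg);

      /* Draw candidate salts until one differs from the previous salt and
       * return the number of attempts.  An empty salt cannot be made
       * different by regenerating it, so len == 0 stops after one try. */
      static unsigned int new_salt(unsigned char *salt, size_t len,
                                   const unsigned char *prev, size_t prevlen,
                                   salt_gen_fn gen, void *arg) {
          unsigned int attempts = 0;
          assert(salt != NULL || len == 0);
          do {
              gen(salt, len, arg);
              attempts++;
          } while (len > 0 && prev != NULL && prevlen == len &&
                   memcmp(salt, prev, len) == 0);
          return attempts;
      }

      /* Toy generator for illustration only: fills the buffer with a byte
       * that increments on each call.  Real code needs a secure RNG. */
      static void toy_gen(unsigned char *buf, size_t len, void *arg) {
          unsigned char *next = arg;
          memset(buf, *next, len);
          (*next)++;
      }
      ```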
      
      (cherry picked from commit 6b5d7357)
    • Add zone context to "generated salt" logs · 734865e1
      Matthijs Mekking authored
      (cherry picked from commit 3b4c764b)
    • Move logging of salt in separate function · 93f9d3b8
      Matthijs Mekking authored
      There may be a desire to log the salt without losing the context
      of the log module, level, and category.
      
      (cherry picked from commit 7878f300)
    • Don't use 'rndc signing' with kasp · b6cf8833
      Matthijs Mekking authored
      The 'rndc signing' command allows you to manipulate the private
      records that are used to store signing state. Don't use these with
      'dnssec-policy' as such manipulations may violate the policy (if you
      want to change the NSEC3 parameters, change the policy and reconfig).
      
      (cherry picked from commit eae9a6d2)
    • Fix a reconfig bug wrt inline-signing · d13786d5
      Matthijs Mekking authored
      When doing 'rndc reconfig', named may complain about a zone not being
      reusable because it has a raw version of the zone, and the new
      configuration has not set 'inline-signing'. However, 'inline-signing'
      may be implicitly true if a 'dnssec-policy' is used for the zone, and
      the zone is not dynamic.
      
      Improve the check in 'named_zone_reusable'.  Create a new function for
      checking 'inline-signing' configuration that matches existing code in
      'bin/named/server.c'.
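      The effective 'inline-signing' value that such a check relies on can be
      sketched as follows.  The tri-state enum and the function are
      hypothetical, and the assumption that an explicit 'inline-signing'
      setting always takes precedence is illustrative rather than taken from
      the commit.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      /* Hypothetical tri-state for the 'inline-signing' option. */
      enum tristate { UNSET, SET_FALSE, SET_TRUE };

      static bool effective_inline_signing(enum tristate inline_signing,
                                           bool has_dnssec_policy,
                                           bool is_dynamic) {
          if (inline_signing != UNSET) {
              return inline_signing == SET_TRUE; /* explicit setting wins */
          }
          /* Implicitly true for non-dynamic zones with a dnssec-policy. */
          return has_dnssec_policy && !is_dynamic;
      }
      ```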
      
      (cherry picked from commit ba8128ea)
    • Support for NSEC3 in dnssec-policy · 008e84e9
      Matthijs Mekking authored
      Implement support for NSEC3 in dnssec-policy.  Store the configuration
      in kasp objects. When configuring a zone, call 'dns_zone_setnsec3param'
      to queue an nsec3param event. This will ensure that any previous
      chains will be removed and a chain according to the dnssec-policy is
      created.
      
      Add tests for dnssec-policy zones that use the new 'nsec3param'
      option, as well as changing to new values, changing to NSEC, and
      changing from NSEC.
      
      (cherry picked from commit 114af58e)
    • Move generate_salt function to lib/dns/nsec3 · 9b9ac92f
      Matthijs Mekking authored
      We will be using this function also on reconfig, so it should have
      a wider availability than just bin/named/server.
      
      (cherry picked from commit 84a42730)
  30. 11 Nov, 2020 2 commits