1. 29 Apr, 2021 2 commits
  2. 28 Apr, 2021 6 commits
    • Matthijs Mekking's avatar
      Merge branch 'matthijs-nit-serve-stale-fixesv9_16' into 'v9_16' · 1050d186
      Matthijs Mekking authored
      Serve-stale nit fixes (9.16)
      
      See merge request !4950
      1050d186
    • Matthijs Mekking's avatar
      Serve-stale nit fixes · 4615cbb5
      Matthijs Mekking authored
      While working on the serve-stale backports, I noticed the following
      oddities:
      
      1. In the serve-stale system test, in one case we keep track of the
         time how long it took for dig to complete. In commit
         aaed7f9d, the code removed the
         exception to check for result == ISC_R_SUCCESS on stale found
         answers, and adjusted the test accordingly. This failed to update
         the time tracking accordingly. Move the t1/t2 time track variables
         back around the two dig commands to ensure the lookups resolved
         faster than the resolver-query-timeout.
      
      2. We can remove the setting of NS_QUERYATTR_STALEOK and
         DNS_RDATASETATTR_STALE_ADDED on the "else if (stale_timeout)"
         code path, because they are added later when we know we have
         actually found a stale answer on a stale timeout lookup.
      
      3. We should clear the NS_QUERYATTR_STALEOK flag from the client
         query attributes instead of DNS_RDATASETATTR_STALE_ADDED (that
         flag is set on the rdataset attributes).
      
      4. In 'bin/named/config.c' we should set the configuration options
         in alpabetical order.
      
      5. In the ARM, in the backports we have added "(stale)" between
         "cached" and "RRset" to make more clear a stale RRset may be
         returned in this scenario.
      
      (cherry picked from commit 104b6762)
      4615cbb5
    • Michał Kępień's avatar
      Merge branch 'michal/limit-logging-for-verbose-system-tests-v9_16' into 'v9_16' · 99157e22
      Michał Kępień authored
      [v9_16] Limit logging for verbose system tests
      
      See merge request !4948
      99157e22
    • Michał Kępień's avatar
      Warn when log files grow too big in system tests · c0b15db6
      Michał Kępień authored
      Exerting excessive I/O load on the host running system tests should be
      avoided in order to limit the number of false positives reported by the
      system test suite.  In some cases, running named with "-d 99" (which is
      the default for system tests) results in a massive amount of logs being
      generated, most of which are useless.  Implement a log file size check
      to draw developers' attention to overly verbose named instances used in
      system tests.  The warning threshold of 200,000 lines was chosen
      arbitrarily.
      
      (cherry picked from commit 241e85ef)
      c0b15db6
    • Michał Kępień's avatar
      Prevent useless logging in the "tcp" system test · 13e97eb3
      Michał Kępień authored
      The regression test for CVE-2020-8620 causes a lot of useless messages
      to be logged.  However, globally decreasing the log level for the
      affected named instance would be a step too far as debugging information
      may be useful for troubleshooting other checks in the "tcp" system test.
      Starting a separate named instance for a single check should be avoided
      when possible and thus is also not a good solution.  As a compromise,
      run "rndc trace 1" for the affected named instance before starting the
      regression test for CVE-2020-8620.
      
      (cherry picked from commit 17e5c2a5)
      13e97eb3
    • Michał Kępień's avatar
      Limit logging for verbose system tests · 966ab29e
      Michał Kępień authored
      The system test framework starts all named instances with the "-d 99"
      command line option (unless it is overridden by a named.args file in a
      given instance's working directory).  This causes a lot of log messages
      to be written to named.run files - currently over 5 million lines for a
      single test suite run.  While debugging information preserved in the log
      files is essential for troubleshooting intermittent test failures, some
      system tests involve sending hundreds or even thousands of queries,
      which causes the relevant log files to explode in size.  When multiple
      tests (or even multiple test suites) are run in parallel, excessive
      logging contributes considerably to the I/O load on the test host,
      increasing the odds of intermittent test failures getting triggered.
      
      Decrease the debug level for the seven most verbose named instances:
      
        - use "-d 3" for ns2 in the "cacheclean" system test (it is the lowest
          logging level at which the test still passes without the need to
          apply any changes to tests.sh),
      
        - use "-d 1" for the other six named instances.
      
      This roughly halves the number of lines logged by each test suite run
      while still leaving enough information in the logs to allow at least
      basic troubleshooting in case of test failures.
      
      This approach was chosen as it results in a greater decrease in the
      number of lines logged than running all named instances with "-d 3",
      without causing any test failures.
      
      (cherry picked from commit 4a8d4048)
      966ab29e
  3. 26 Apr, 2021 14 commits
    • Diego dos Santos Fronza's avatar
      Merge branch... · daf5f2be
      Diego dos Santos Fronza authored
      Merge branch '2626-deadlock-with-concurrent-rndc-addzone-rndc-delzone-commands-bp-v9_16' into 'v9_16'
      
      Fix deadlock between rndc addzone/delzone/modzone
      
      See merge request !4944
      daf5f2be
    • Diego Fronza's avatar
      Add CHANGES note for GL #2626 · 8bd7b2e3
      Diego Fronza authored
      8bd7b2e3
    • Diego Fronza's avatar
      Add system test for the deadlock fix · 6b37d3ac
      Diego Fronza authored
      The test spawns 4 parallel workers that keep adding, modifying and
      deleting zones, the main thread repeatedly checks wheter rndc
      status responds within a reasonable period.
      
      While environment and timing issues may affect the test, in most
      test cases the deadlock that was taking place before the fix used to
      trigger in less than 7 seconds in a machine with at least 2 cores.
      6b37d3ac
    • Diego Fronza's avatar
      Fix deadlock between rndc addzone/delzone/modzone · 942b83d3
      Diego Fronza authored
      It follows a description of the steps that were leading to the deadlock:
      
      1. `do_addzone` calls `isc_task_beginexclusive`.
      
      2. `isc_task_beginexclusive` waits for (N_WORKERS - 1) halted tasks,
         this blocks waiting for those (no. workers -1) workers to halt.
      ...
      isc_task_beginexclusive(isc_task_t *task0) {
          ...
      	while (manager->halted + 1 < manager->workers) {
      		wake_all_queues(manager);
      		WAIT(&manager->halt_cond, &manager->halt_lock);
      	}
      ```
      
      3. It is possible that in `task.c / dispatch()` a worker is running a
         task event, if that event blocks it will not allow this worker to
         halt.
      
      4. `do_addzone` acquires `LOCK(&view->new_zone_lock);`,
      
      5. `rmzone` event is called from some worker's `dispatch()`, `rmzone`
         blocks waiting for the same lock.
      
      6. `do_addzone` calls `isc_task_beginexclusive`.
      
      7. Deadlock triggered, since:
      	- `rmzone` is wating for the lock.
      	- `isc_task_beginexclusive` is waiting for (no. workers - 1) to
      	   be halted
      	- since `rmzone` event is blocked it won't allow the worker to halt.
      
      To fix this, we updated do_addzone code to call isc_task_beginexclusive
      before the lock is acquired, we postpone locking to the nearest required
      place, same for isc_task_beginexclusive.
      
      The same could happen with rndc modzone, so that was addressed as well.
      942b83d3
    • Michał Kępień's avatar
      Merge branch '2650-handle-soa-rrsigs-not-at-zone-apex-v9_16' into 'v9_16' · a6f6e018
      Michał Kępień authored
      [v9_16] Handle RRSIG(SOA) RRsets not at zone apex
      
      See merge request !4943
      a6f6e018
    • Michał Kępień's avatar
      Add CHANGES entry · ae5a84c8
      Michał Kępień authored
      (cherry picked from commit 47a7b042)
      ae5a84c8
    • Michał Kępień's avatar
      Test handling of non-apex RRSIG(SOA) RRsets · 82a5b915
      Michał Kępień authored
      Add a check to the "dnssec" system test which ensures that RRSIG(SOA)
      RRsets present anywhere else than at the zone apex are automatically
      removed after a zone containing such RRsets is loaded.
      
      (cherry picked from commit 24bf4b94)
      82a5b915
    • Mark Andrews's avatar
      Be more precise with the stopping conditions in zone_resigninc · b25a1943
      Mark Andrews authored
      If there happens to be a RRSIG(SOA) that is not at the zone apex
      for any reason it should not be considered as a stopping condition
      for incremental zone signing.
      
      (cherry picked from commit b7cdc358)
      b25a1943
    • Matthijs Mekking's avatar
      Merge branch '2628-kasp-create-multiple-key-keyid-conflict-v9_16' into 'v9_16' · b28c90ad
      Matthijs Mekking authored
      Check for keyid conflicts between new keys
      
      See merge request !4942
      b28c90ad
    • Matthijs Mekking's avatar
      Changes and release notes for [#2628] · c599fb85
      Matthijs Mekking authored
      (cherry picked from commit b99ec657)
      c599fb85
    • Matthijs Mekking's avatar
      Check for keyid conflicts between new keys · f82d4f04
      Matthijs Mekking authored
      When the keymgr needs to create new keys, it is possible it needs to
      create multiple keys. The keymgr checks for keyid conflicts with
      already existing keys, but it should also check against that it just
      created.
      
      (cherry picked from commit 668301f1)
      f82d4f04
    • Michał Kępień's avatar
      Merge branch '2634-test-tkey-gssapi-credential-conditionally-v9_16' into 'v9_16' · fd24d057
      Michał Kępień authored
      [v9_16] Test "tkey-gssapi-credential" conditionally
      
      See merge request !4939
      fd24d057
    • Michał Kępień's avatar
      Test "--without-gssapi" in GitLab CI · fb90a34f
      Michał Kępień authored
      GitLab CI pipelines do not currently include a Linux job that would have
      GSSAPI support disabled.  Add the "--without-gssapi" option to the
      ./configure invocation on Debian 9 to address that deficiency and also
      to continuously test that build-time switch.
      
      (cherry picked from commit a3957af8)
      fb90a34f
    • Michał Kępień's avatar
      Test "tkey-gssapi-credential" conditionally · cce29c39
      Michał Kępień authored
      If "tkey-gssapi-credential" is set in the configuration and GSSAPI
      support is not available, named will refuse to start.  As the test
      system framework does not support starting named instances
      conditionally, ensure that "tkey-gssapi-credential" is only present in
      named.conf if GSSAPI support is available.
      
      (cherry picked from commit 6feac68b)
      cce29c39
  4. 25 Apr, 2021 2 commits
  5. 23 Apr, 2021 4 commits
    • Petr Špaček's avatar
      Merge branch '2634-test-tkey-gssapi-credential-v9_16' into 'v9_16' · da0f4365
      Petr Špaček authored
      Add tests for the "tkey-gssapi-credential" option (v9.16)
      
      See merge request !4933
      da0f4365
    • Petr Špaček's avatar
      Add tests for the "tkey-gssapi-credential" option · 1d4fbe25
      Petr Špaček authored
      Four named instances in the "nsupdate" system test have GSS-TSIG support
      enabled.  All of them currently use "tkey-gssapi-keytab".  Configure two
      of them with "tkey-gssapi-credential" to test that option.
      
      As "tkey-gssapi-keytab" and "tkey-gssapi-credential" both provide the
      same functionality, no test modifications are required.  The difference
      between the two options is that the value of "tkey-gssapi-keytab" is an
      explicit path to the keytab file to acquire credentials from, while the
      value of "tkey-gssapi-credential" is the name of the principal whose
      credentials should be used; those credentials are looked up in the
      keytab file expected by the Kerberos library, i.e. /etc/krb5.keytab by
      default.  The path to the default keytab file can be overridden using by
      setting the KRB5_KTNAME environment variable.  Utilize that variable to
      use existing keytab files with the "tkey-gssapi-credential" option.
      
      The KRB5_KTNAME environment variable should not interfere with the
      "tkey-gssapi-keytab" option.  Nevertheless, rename one of the keytab
      files used with "tkey-gssapi-keytab" to something else than the contents
      of the KRB5_KTNAME environment variable in order to make sure that both
      "tkey-gssapi-keytab" and "tkey-gssapi-credential" are actually tested.
      
      (cherry picked from commit 1746d2e8)
      1d4fbe25
    • Mark Andrews's avatar
      Merge branch '2625-the-shutdown-system-test-is-not-capturing-enough-v9_16' into 'v9_16' · d4513539
      Mark Andrews authored
      Abort named if 'rndc stop' or 'kill TERM' has failed to shutdown
      
      See merge request !4932
      d4513539
    • Mark Andrews's avatar
      Abort named if 'rndc stop' or 'kill TERM' has failed to shutdown · 4e36debc
      Mark Andrews authored
      (cherry picked from commit c3c7f584)
      4e36debc
  6. 19 Apr, 2021 12 commits
    • Ondřej Surý's avatar
      Merge branch... · d4dad8fe
      Ondřej Surý authored
      Merge branch '2637-threadsanitizer-lock-order-inversion-potential-deadlock-in-zone_refreshkeys-v9_16' into 'v9_16'
      
      Fix lock-order-inversion (potential deadlock) in dns_resolver_createfetch
      
      See merge request !4920
      d4dad8fe
    • Ondřej Surý's avatar
      Fix lock-order-inversion (potential deadlock) in dns_resolver_createfetch · 4bae6d8d
      Ondřej Surý authored
      There's a lock-order-inversion when running `zone_maintenance()` from
      the timer while shutting down the server `shutdown_server()`.  This only
      happens when the taskmgr scheduling is more relaxed and paralellized,
      but the issue is real nevertheless.
      
      The associated ThreadSanitizer warning:
      
          WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock)
            Cycle in lock order graph: M1 (0x000000000001) => M2 (0x000000000000) => M1
      
            Mutex M2 acquired here while holding mutex M1 in thread T1:
      	#0 pthread_mutex_lock <null>
      	#1 dns_view_findzonecut lib/dns/view.c:1326:2
      	#2 fctx_create lib/dns/resolver.c:5144:13
      	#3 dns_resolver_createfetch lib/dns/resolver.c:10977:12
      	#4 zone_refreshkeys lib/dns/zone.c:10830:13
      	#5 zone_maintenance lib/dns/zone.c:11065:5
      	#6 zone_timer lib/dns/zone.c:14652:2
      	#7 task_run lib/isc/task.c:857:5
      	#8 isc_task_run lib/isc/task.c:944:10
      	#9 isc__nm_async_task lib/isc/netmgr/netmgr.c:730:24
      	#10 process_netievent lib/isc/netmgr/netmgr.c
      	#11 process_queue lib/isc/netmgr/netmgr.c:885:8
      	#12 process_tasks_queue lib/isc/netmgr/netmgr.c:756:10
      	#13 process_queues lib/isc/netmgr/netmgr.c:772:7
      	#14 async_cb lib/isc/netmgr/netmgr.c:671:2
      	#15 uv__async_io /home/ondrej/Projects/tsan/libuv/src/unix/async.c:163:5
      	#16 uv__io_poll /home/ondrej/Projects/tsan/libuv/src/unix/linux-core.c:462:11
      	#17 uv_run /home/ondrej/Projects/tsan/libuv/src/unix/core.c:392:5
      	#18 nm_thread lib/isc/netmgr/netmgr.c:597:11
      	#19 isc__trampoline_run lib/isc/trampoline.c:184:11
      
            Mutex M1 previously acquired by the same thread here:
      	#0 pthread_mutex_lock <null>
      	#1 zone_refreshkeys lib/dns/zone.c:10717:2
      	#2 zone_maintenance lib/dns/zone.c:11065:5
      	#3 zone_timer lib/dns/zone.c:14652:2
      	#4 task_run lib/isc/task.c:857:5
      	#5 isc_task_run lib/isc/task.c:944:10
      	#6 isc__nm_async_task lib/isc/netmgr/netmgr.c:730:24
      	#7 process_netievent lib/isc/netmgr/netmgr.c
      	#8 process_queue lib/isc/netmgr/netmgr.c:885:8
      	#9 process_tasks_queue lib/isc/netmgr/netmgr.c:756:10
      	#10 process_queues lib/isc/netmgr/netmgr.c:772:7
      	#11 async_cb lib/isc/netmgr/netmgr.c:671:2
      	#12 uv__async_io /home/ondrej/Projects/tsan/libuv/src/unix/async.c:163:5
      	#13 uv__io_poll /home/ondrej/Projects/tsan/libuv/src/unix/linux-core.c:462:11
      	#14 uv_run /home/ondrej/Projects/tsan/libuv/src/unix/core.c:392:5
      	#15 nm_thread lib/isc/netmgr/netmgr.c:597:11
      	#16 isc__trampoline_run lib/isc/trampoline.c:184:11
      
            Mutex M1 acquired here while holding mutex M2 in thread T2:
      	#0 pthread_mutex_lock <null>
      	#1 dns_zone_flush lib/dns/zone.c:11443:2
      	#2 view_flushanddetach lib/dns/view.c:657:5
      	#3 dns_view_flushanddetach lib/dns/view.c:690:2
      	#4 shutdown_server bin/named/server.c:10056:4
      	#5 task_run lib/isc/task.c:857:5
      	#6 isc_task_run lib/isc/task.c:944:10
      	#7 isc__nm_async_task lib/isc/netmgr/netmgr.c:730:24
      	#8 process_netievent lib/isc/netmgr/netmgr.c
      	#9 process_queue lib/isc/netmgr/netmgr.c:885:8
      	#10 process_tasks_queue lib/isc/netmgr/netmgr.c:756:10
      	#11 process_queues lib/isc/netmgr/netmgr.c:772:7
      	#12 async_cb lib/isc/netmgr/netmgr.c:671:2
      	#13 uv__async_io /home/ondrej/Projects/tsan/libuv/src/unix/async.c:163:5
      	#14 uv__io_poll /home/ondrej/Projects/tsan/libuv/src/unix/linux-core.c:462:11
      	#15 uv_run /home/ondrej/Projects/tsan/libuv/src/unix/core.c:392:5
      	#16 nm_thread lib/isc/netmgr/netmgr.c:597:11
      	#17 isc__trampoline_run lib/isc/trampoline.c:184:11
      
            Mutex M2 previously acquired by the same thread here:
      	#0 pthread_mutex_lock <null>
      	#1 view_flushanddetach lib/dns/view.c:645:3
      	#2 dns_view_flushanddetach lib/dns/view.c:690:2
      	#3 shutdown_server bin/named/server.c:10056:4
      	#4 task_run lib/isc/task.c:857:5
      	#5 isc_task_run lib/isc/task.c:944:10
      	#6 isc__nm_async_task lib/isc/netmgr/netmgr.c:730:24
      	#7 process_netievent lib/isc/netmgr/netmgr.c
      	#8 process_queue lib/isc/netmgr/netmgr.c:885:8
      	#9 process_tasks_queue lib/isc/netmgr/netmgr.c:756:10
      	#10 process_queues lib/isc/netmgr/netmgr.c:772:7
      	#11 async_cb lib/isc/netmgr/netmgr.c:671:2
      	#12 uv__async_io /home/ondrej/Projects/tsan/libuv/src/unix/async.c:163:5
      	#13 uv__io_poll /home/ondrej/Projects/tsan/libuv/src/unix/linux-core.c:462:11
      	#14 uv_run /home/ondrej/Projects/tsan/libuv/src/unix/core.c:392:5
      	#15 nm_thread lib/isc/netmgr/netmgr.c:597:11
      	#16 isc__trampoline_run lib/isc/trampoline.c:184:11
      
            Thread T2 (running) created by main thread at:
      	#0 pthread_create <null>
      	#1 isc_thread_create lib/isc/pthreads/thread.c:79:8
      	#2 isc_nm_start lib/isc/netmgr/netmgr.c:303:3
      	#3 create_managers bin/named/main.c:957:15
      	#4 setup bin/named/main.c:1267:11
      	#5 main bin/named/main.c:1558:2
      
            Thread T2 (running) created by main thread at:
      	#0 pthread_create <null>
      	#1 isc_thread_create lib/isc/pthreads/thread.c:79:8
      	#2 isc_nm_start lib/isc/netmgr/netmgr.c:303:3
      	#3 create_managers bin/named/main.c:957:15
      	#4 setup bin/named/main.c:1267:11
      	#5 main bin/named/main.c:1558:2
      
          SUMMARY: ThreadSanitizer: lock-order-inversion (potential deadlock) in __interceptor_pthread_mutex_lock
      
      (cherry picked from commit 25d27851)
      4bae6d8d
    • Ondřej Surý's avatar
      Merge branch 'ondrej/cleanup-double-createinctx-constructors-v9_16' into 'v9_16' · bb926e36
      Ondřej Surý authored
      Cleanup the isc_<*>mgr_createinc() constructors (v9.16)
      
      See merge request !4912
      bb926e36
    • Ondřej Surý's avatar
      Cleanup the isc_<*>mgr_createinc() constructors · 97a5559a
      Ondřej Surý authored
      Previously, the taskmgr, timermgr and socketmgr had a constructor
      variant, that would create the mgr on top of existing appctx.  This was
      no longer true and isc_<*>mgr was just calling isc_<*>mgr_create()
      directly without any extra code.
      
      This commit just cleans up the extra function.
      
      (cherry picked from commit 3388ef36)
      97a5559a
    • Ondřej Surý's avatar
      Merge branch 'ondrej/cleanup-ISCAPI-remnants-v9_16' into 'v9_16' · 3cdf0260
      Ondřej Surý authored
      Cleanup the public vs private ISCAPI remnants (v9.16)
      
      See merge request !4917
      3cdf0260
    • Ondřej Surý's avatar
      Cleanup the public vs private ISCAPI remnants · 08055b74
      Ondřej Surý authored
      Since all the libraries are internal now, just cleanup the ISCAPI remnants
      in isc_socket, isc_task and isc_timer APIs.  This means, there's one less
      layer as following changes have been done:
      
       * struct isc_socket and struct isc_socketmgr have been removed
       * struct isc__socket and struct isc__socketmgr have been renamed
         to struct isc_socket and struct isc_socketmgr
       * struct isc_task and struct isc_taskmgr have been removed
       * struct isc__task and struct isc__taskmgr have been renamed
         to struct isc_task and struct isc_taskmgr
       * struct isc_timer and struct isc_timermgr have been removed
       * struct isc__timer and struct isc__timermgr have been renamed
         to struct isc_timer and struct isc_timermgr
       * All the associated code that dealt with typing isc_<foo>
         to isc__<foo> and back has been removed.
      
      (cherry picked from commit 16fe0d1f)
      08055b74
    • Ondřej Surý's avatar
      Merge branch 'each-cleanup-dns_client-v9_16' into 'v9_16' · f4cf2561
      Ondřej Surý authored
      remove dns_client_update() and related code (v9.16)
      
      See merge request !4914
      f4cf2561
    • Evan Hunt's avatar
      Add CHANGES note for [GL !4835] · 49d9c1ab
      Evan Hunt authored
      (cherry picked from commit 07e349de)
      49d9c1ab
    • Mark Andrews's avatar
      properly initialise resarg->lock · 9d85c567
      Mark Andrews authored
      9d85c567
    • Ondřej Surý's avatar
      Fixup win32 paths for moved bin/tests/system/resolve · 579b8bec
      Ondřej Surý authored
      When resolve.c was moved from lib/samples to bin/tests/system, the
      resolve.vcxproj.in would still contain old paths to the directory
      root. This commit adds one more ..\ to match the directory depth.
      
      Additionally, fixup the path in BINDInstall.vcxproj.in to be
      bin/tests/system and not bin/tests/samples.
      
      (cherry picked from commit f14e6786)
      579b8bec
    • Evan Hunt's avatar
      move samples/resolve.c to bin/tests/system · 28511bfc
      Evan Hunt authored
      "resolve" is used by the resolver system tests, and I'm not
      certain whether delv exercises the same code, so rather than
      remove it, I moved it to bin/tests/system.
      
      (cherry picked from commit d0ec7d1f)
      28511bfc
    • Ondřej Surý's avatar
      Merge branch '2636-timing-race-in-setnsec3param-task-v9_16' into 'v9_16' · dac09503
      Ondřej Surý authored
      Fix task timing race in setnsec3param() (v9.16)
      
      See merge request !4915
      dac09503