BIND issues
https://gitlab.isc.org/isc-projects/bind9/-/issues
2023-11-02T17:02:20Z

https://gitlab.isc.org/isc-projects/bind9/-/issues/2959
Remove support for signed 32-bit time_t
2023-11-02T17:02:20Z
Ondřej Surý

There are now a couple of requirements:
* All user space must be compiled with a 64-bit time_t, which is supported in the musl-1.2 and glibc-2.32 releases, along with installed kernel headers from linux-5.6 or higher.
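
A 64-bit time_t requirement of this kind can be enforced at build time. The following is a minimal sketch, assuming a C11 compiler and a platform where time_t is an integer type (as on POSIX systems); the helper name `time_t_holds_post_2038` is hypothetical and not part of BIND:

```c
#include <assert.h>  /* static_assert (C11) */
#include <limits.h>  /* CHAR_BIT */
#include <stdbool.h>
#include <time.h>

/* Refuse to build when time_t is narrower than 64 bits. */
static_assert(sizeof(time_t) * CHAR_BIT >= 64,
	      "time_t must be at least 64 bits wide");

/*
 * Hypothetical runtime check: true if a timestamp just past the
 * signed 32-bit limit (2^31 seconds after the epoch, in January 2038)
 * fits in time_t without wrapping.
 */
static bool
time_t_holds_post_2038(void) {
	const long long after_rollover = 2147483648LL; /* 2^31 */
	time_t t = (time_t)after_rollover;
	return (long long)t == after_rollover;
}
```

With the compile-time assertion in place, a build against a 32-bit time_t fails immediately instead of producing binaries that misbehave after the rollover.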
For details on these requirements, see https://lkml.org/lkml/2020/1/29/355?anz=web

Not planned

https://gitlab.isc.org/isc-projects/bind9/-/issues/3063
dnssec-verify detect and support multiple cores
2023-11-02T17:02:21Z
Daniel Stirnimann

### Description
We use `dnssec-verify` (from BIND 9.16) to validate large DNSSEC-signed zones. I noticed that on a multi-core processor (e.g. 16 cores) only one CPU is ever used. I guess validation time could be sped up a lot if all available cores were used.
### Request
Make `dnssec-verify` use all available cores automatically for operations where this is possible, e.g. signature verification.
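
As a rough illustration of the request, the sketch below splits a batch of independent checks across one POSIX thread per detected CPU. It is not `dnssec-verify` code: `verify_one()` is a stand-in for a real signature check, and all names (`slice_t`, `detect_ncpus`, `verify_parallel`) are hypothetical. It assumes `sysconf(_SC_NPROCESSORS_ONLN)` is available, which is a common extension on Linux and the BSDs rather than strict POSIX.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical work description: each thread checks one slice. */
typedef struct {
	const int *items;  /* stand-in for the signatures to verify */
	size_t begin, end; /* half-open slice [begin, end) */
	size_t ok;         /* items in this slice that passed */
	bool started;      /* true if a thread was created for it */
} slice_t;

/* Stand-in for a real signature check: accepts non-negative values. */
static bool
verify_one(int item) {
	return item >= 0;
}

static void *
verify_slice(void *arg) {
	slice_t *s = arg;
	for (size_t i = s->begin; i < s->end; i++) {
		if (verify_one(s->items[i])) {
			s->ok++;
		}
	}
	return NULL;
}

/* Number of online CPUs, falling back to 1 when detection fails. */
static size_t
detect_ncpus(void) {
	long n = sysconf(_SC_NPROCESSORS_ONLN);
	return n > 0 ? (size_t)n : 1;
}

/*
 * Verify all items using nthreads workers (0 means one per detected
 * CPU). Returns the number of items that passed.
 */
static size_t
verify_parallel(const int *items, size_t count, size_t nthreads) {
	if (nthreads == 0) {
		nthreads = detect_ncpus();
	}
	if (nthreads > count) {
		nthreads = count > 0 ? count : 1;
	}

	pthread_t *tids = calloc(nthreads, sizeof(*tids));
	slice_t *slices = calloc(nthreads, sizeof(*slices));
	if (tids == NULL || slices == NULL) {
		/* Out of memory: verify serially in the calling thread. */
		slice_t all = { .items = items, .begin = 0, .end = count };
		verify_slice(&all);
		free(tids);
		free(slices);
		return all.ok;
	}

	for (size_t t = 0; t < nthreads; t++) {
		slices[t].items = items;
		slices[t].begin = count * t / nthreads;
		slices[t].end = count * (t + 1) / nthreads;
		slices[t].started = pthread_create(&tids[t], NULL, verify_slice,
						   &slices[t]) == 0;
		if (!slices[t].started) {
			verify_slice(&slices[t]); /* run it in the caller */
		}
	}

	size_t ok = 0;
	for (size_t t = 0; t < nthreads; t++) {
		if (slices[t].started) {
			pthread_join(tids[t], NULL);
		}
		ok += slices[t].ok;
	}
	free(tids);
	free(slices);
	return ok;
}
```

Each thread writes only to its own slice, so no locking is needed until the results are summed after the joins.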
`dnssec-signzone` already detects and uses all available cores automatically and even has a command-line option to specify a particular number (`man dnssec-signzone`). I think something like this would be very useful:
```
-n ncpus
This option specifies the number of threads to use. By default, one thread is started for each detected CPU.
```

Not planned

https://gitlab.isc.org/isc-projects/bind9/-/issues/3262
Offloaded RPZ processing needs 'shuttingdown' signal
2023-11-02T17:05:04Z
Ondřej Surý

When the database is shut down during the threadpool processing of the RPZ, we would crash:
```
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1 0x00007f99990958f3 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
#2 0x00007f99990486a6 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3 0x00007f99990327d3 in __GI_abort () at abort.c:79
#4 0x0000000000415baa in assertion_failed (file=<optimized out>, line=<optimized out>, type=<optimized out>, cond=<optimized out>) at main.c:237
#5 0x00007f9999af07fa in isc_assertion_failed (file=file@entry=0x7f9999a42890 "db.c", line=line@entry=581, type=type@entry=isc_assertiontype_require,
cond=cond@entry=0x7f9999a49528 "nodep != ((void *)0) && *nodep != ((void *)0)") at assertions.c:49
#6 0x00007f99998f2bef in dns_db_detachnode (db=<optimized out>, nodep=nodep@entry=0x7f99739f98f0) at db.c:581
#7 0x00007f99999ca3b2 in update_nodes (rpz=rpz@entry=0x7f99928d1400, newnodes=<optimized out>) at rpz.c:1762
#8 0x00007f99999cace8 in update_rpz_cb (data=0x7f99928d1400) at rpz.c:1942
#9 0x00007f99993fce94 in uv__queue_work (w=0x7f9907839600) at /usr/src/libuv-v1.43.0/src/threadpool.c:326
#10 0x00007f99993fc61c in worker (arg=0x0) at /usr/src/libuv-v1.43.0/src/threadpool.c:122
#11 0x00007f9999093b1a in start_thread (arg=<optimized out>) at pthread_create.c:443
#12 0x00007f99991178e4 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100
```
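
The kind of 'shuttingdown' signal named in the title could look roughly like the sketch below: the offloaded worker re-checks a flag under a lock before touching each node, and the shutdown path sets the flag under the same lock. This is an illustration only, not the BIND implementation; `rpz_sketch_t` and the three functions are hypothetical names, and the per-node work is reduced to a counter.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the RPZ state shared with the worker. */
typedef struct {
	pthread_mutex_t lock;
	bool shuttingdown; /* set once the database is going away */
	size_t processed;  /* nodes handled so far */
} rpz_sketch_t;

static void
rpz_sketch_init(rpz_sketch_t *rpz) {
	pthread_mutex_init(&rpz->lock, NULL);
	rpz->shuttingdown = false;
	rpz->processed = 0;
}

/*
 * Called from the shutdown path. Taking the lock means this waits
 * for a node that is currently being processed, so the database can
 * be released safely once the call returns.
 */
static void
rpz_sketch_shutdown(rpz_sketch_t *rpz) {
	pthread_mutex_lock(&rpz->lock);
	rpz->shuttingdown = true;
	pthread_mutex_unlock(&rpz->lock);
}

/*
 * Offloaded update loop. Returns false if it stopped early because
 * shutdown was signalled, true if all nodes were processed.
 */
static bool
rpz_sketch_update(rpz_sketch_t *rpz, size_t nnodes) {
	for (size_t i = 0; i < nnodes; i++) {
		pthread_mutex_lock(&rpz->lock);
		if (rpz->shuttingdown) {
			pthread_mutex_unlock(&rpz->lock);
			return false; /* do not touch nodes any more */
		}
		rpz->processed++; /* stand-in for the per-node work */
		pthread_mutex_unlock(&rpz->lock);
	}
	return true;
}
```

The check and the per-node work share one critical section, so the flag cannot flip between the test and the use of the node.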
Related MR with reverts:
* https://gitlab.isc.org/isc-projects/bind9/-/merge_requests/6091
* https://gitlab.isc.org/isc-projects/bind9/-/merge_requests/6092
Original issue:
* https://gitlab.isc.org/isc-projects/bind9/-/issues/3190
Original MRs with offloaded RPZ:
* https://gitlab.isc.org/isc-projects/bind9/-/merge_requests/5938
* https://gitlab.isc.org/isc-projects/bind9/-/merge_requests/6072
* https://gitlab.isc.org/isc-projects/bind9/-/merge_requests/6074

Not planned

https://gitlab.isc.org/isc-projects/bind9/-/issues/4559
Convert DNS_GETDB_ into struct with 1-bit long booleans
2024-02-02T13:23:23Z
Ondřej Surý

See https://gitlab.isc.org/isc-projects/bind9/-/merge_requests/8683#note_433377 for details:
> Going step further, I think it can very well be a struct with booleans. It should cost nothing because compiler is not stupid nowadays and it will make decoding values in coredumps easier.
>
> (To be clear - I mean something like this: !6902 (merged))

https://gitlab.isc.org/isc-projects/bind9/-/issues/3810
Replace system test runner with pytest
2024-02-29T15:26:01Z
Tom Krizek

The legacy solution for running system tests has evolved over the years and is currently a mix of shell & perl scripts intermingled with the build system, while some of the system tests utilize pytest. Implementing a more consistent solution using just pytest as a runner could bring the following benefits:
- better test run isolation (i.e. artifacts from previous run don't interfere with current test run)
- more precise control over test selection (running just a single test case)
- getting rid of perl+shell glue scripts
- a simpler and more standard way to run and parallelize test runs
- solid foundation for future extensions (e.g. wrapping test execution inside a network/pid namespace)
For a transitional period, the legacy test framework should be supported, since it'd be difficult to replace everything at once. The pytest runner should be available in 9.18+, and it'd be prudent to keep the legacy runner support until 9.16 reaches EOL. By that time, we should have enough insight to determine whether pytest proves to be a suitable replacement, and we can remove the legacy runner from supported branches at that point.
Migration plan for moving to pytest runner and dropping the legacy runner support:
- Phase I - pytest runner development, legacy runner supported
  - [x] initial implementation of the pytest runner (#3978, !6809)
  - [x] support out-of-tree tests (#4246)
  - [x] resolve support on CI systems with old pytest (OpenBSD, CentOS 7) (!8193)
  - [x] implement any missing (and desired) features from legacy runner (#4252)
  - [x] configure `make check` to invoke pytest (#4262)
- Phase II - deprecating legacy runner - 9.19-only
  - [ ] remove legacy runner control script(s) - legacy.run.sh, get_ports.sh ...
  - [ ] remove no longer needed scripts from system tests (e.g. clean.sh)
  - [ ] remove conf.sh(.common) and declare variables in pytest only
  - [ ] remove the Makefile entanglement
  - [ ] declare python and pytest-xdist as required dependencies for tests + document
  - [ ] address any `FUTURE` comments in the pytest runner code
- Phase III - cleanup after legacy runner
  - [ ] rewrite start.pl/stop.pl to python (related https://gitlab.isc.org/isc-projects/bind9/-/issues/3198)
  - [ ] rewrite remaining setup/teardown perl&shell scripts to python
  - [ ] rewrite setup.sh/prereq.sh system tests scripts to pytest fixtures
  - [ ] ensure system test documentation is up to date

BIND 9.19.x
Tom Krizek

https://gitlab.isc.org/isc-projects/bind9/-/issues/4592
Improve the isc_heap resize algorithm
2024-03-06T08:38:23Z
Ondřej Surý

The current isc_heap resizing algorithm grows the array for holding the heap elements by 1024 (there's an argument to `isc_heap_create()`, but either default (1024) or explicit 1024 is used everywhere).

May 2024 (9.18.27, 9.18.27-S1, 9.19.24)

https://gitlab.isc.org/isc-projects/bind9/-/issues/1550
[ISC-support #15905] rndc stop issued after server (or single zone) rndc reload and during ixfr-from-differences processing leaves the .jnl file corrupted
2024-03-13T21:08:58Z
Cathy Almond

From Support ticket [#15905](https://support.isc.org/Ticket/Display.html?id=15905)

9.11.x (but I don't anticipate that the BIND version makes any difference)

This problem is readily reproducible (and, I suspect, occurs because "rndc stop" doesn't recognise that the zone is effectively 'dynamic' because it has been reloaded with 'ixfr-from-differences yes;').
Here's what is being done, per the server logs. First the reload (the outcome is also the same with 'reload zone'):
```
09-Jan-2020 14:50:29.157 general: info: received control channel command 'reload' ============>>> RELOAD COMMAND
09-Jan-2020 14:50:29.179 general: info: loading configuration from '/etc/named.conf'
... etcetera
09-Jan-2020 14:50:29.202 general: info: reloading configuration succeeded
09-Jan-2020 14:50:29.202 general: info: reloading zones succeeded
09-Jan-2020 14:50:29.202 general: notice: all zones loaded
09-Jan-2020 14:50:29.202 general: notice: running
```
Note that the reload has completed as far as the logging is concerned, but it appears that the regeneration of the .jnl files via 'ixfr-from-differences yes;' has not (high CPU use by named suggests that it is still busy doing this).
Then 'rndc stop' is issued, and it completes almost immediately (there is no evidence that it waits for the processing to complete). In fact, it seems to log that it has aborted a zone reload, even though the previous logging said that the reload *had* completed:
```
09-Jan-2020 14:50:33.211 general: info: received control channel command 'stop -p'
09-Jan-2020 14:50:33.212 general: info: shutting down: flushing changes
.. etcetera (just the logs of the various sockets being closed here)
09-Jan-2020 14:50:33.216 general: error: zone test.com/IN: loading from master file dynamic/test.com.zone failed: operation canceled
09-Jan-2020 14:50:33.216 general: error: zone test.com/IN: not loaded due to errors.
09-Jan-2020 14:50:34.265 general: notice: exiting
```
After this, when named is restarted, the zone can no longer be loaded because the journal file does not tally with the zone itself:
```
...
09-Jan-2020 14:51:28.141 general: error: zone test.com/IN: journal rollforward failed: journal out of sync with zone
09-Jan-2020 14:51:28.141 general: error: zone test.com/IN: not loaded due to errors.
09-Jan-2020 14:51:28.142 general: notice: all zones loaded
09-Jan-2020 14:51:28.142 general: notice: running
```
======
Something has gone badly wrong during the 'rndc stop' - which is supposed to be a graceful shutdown of named. I'm assuming that the problem is that the .jnl file itself is corrupt, rather than that something has happened to the zone file on disk - but will ask for more data to confirm this.
```
stop [-p]
Stop the server, making sure any recent changes made through dynamic update
or IXFR are first saved to the master files of the updated zones. If -p is
specified named’s process id is returned. This allows an external process
to determine when named had completed stopping.
```
Now, since ixfr-from-differences processing could take an age to complete, I don't think it's reasonable for rndc stop to wait forever. Possibly one solution would be to have a (configurable?) timeout, after which any pending ixfr-from-differences .jnl file generation is terminated and the incomplete .jnl file discarded/removed. After all, the administrator in this scenario has just presented named with a new zone file and asked named to load it, so we know that the full copy of the zone on disk should be good and valid: we just loaded from it.
====
The workaround, if this happens, is presumably to manually discard the corrupted .jnl file and restart named.

Not planned