**ISC Open Source Projects issues**
https://gitlab.isc.org/groups/isc-projects/-/issues (feed updated 2023-12-20)

---

**bind9#4441: Make LRU memory-based cache cleaning compatible with ECS cache** (Greg Choules, updated 2023-12-20)
https://gitlab.isc.org/isc-projects/bind9/-/issues/4441

The structure of ECS cache means that non-ECS RRsets will tend to be preferred over ECS RRsets when LRU (Least Recently Used) cache cleaning is initiated as the cache size reaches the configured max-cache-size.
Essentially, LRU maintains a list of nodes as potential early-expire candidates based on when they were last accessed. When a node is accessed by a client query it drops to the bottom of the list. When named needs to expire some content early, in order to make room for more, it picks the node from the top of the list.
Enter ECS cache. Here we have, sitting attached to a single node, another dimension of variable size, depending on the range of clients querying for the same name where ECS has been enabled and also depending on the effective prefix size for the scoped RRsets sitting there. Every time we access that node to pull out (or add) an RRset, we move the node to the end of the LRU list. If there are a lot of different RRsets being maintained, even though some of them individually would have been candidates for LRU deletion, newer access/additions to the node will move it back to the bottom of the LRU list.
The outcome is that under cache memory pressure, we could end up expiring most of the other (usable!) cache content but never removing the older ECS-scoped content that has created the memory pressure in the first place. This could lead to:
1. Cache thrash and poor performance due to repeated adds and deletions of some important cache content.
2. Worst case - we seldom manage to complete the series of fetches needed to populate cache with the RRsets needed to answer a client query because the content is vanishing as fast as we put it in there.
3. Very worst case - we don't manage to prime the roots because as fast as we do, they're LRU-expiring again (or some other problem with getting them into cache to be used).
The mitigation (such as it is, for sites whose caches are over-constrained by max-cache-size) is to make sure that max-cache-size is large enough that the limit is reached only exceptionally.
ECS cache, however, is much more vulnerable to becoming unusable under cache memory pressure because of the way that the ECS-scoped content drives up cache memory use but then fails to be managed effectively by LRU-cleaning, so we end up with 'islands' of untouched ECS cache and nothing much else.
We do need to fix this, but (perhaps) also make recommendations to users of ECS that they must increase max-cache-size when enabling ECS and then monitor, since reaching max-cache-size could be bad for their resolvers because cache cleaning in that situation is not as effective as for non-ECS caches.
Similarly we need to cater for multiple RDATA types for the same name being cached at the same node, since those are subject to the same problem, although mostly to a much lesser degree.
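The LRU mechanics described above can be sketched as a doubly linked list in which any access moves a node to the tail (most recently used) and eviction takes the head. This is a simplified illustration of the problem, not BIND's actual cache code: because LRU position is tracked per node, touching any one RRset at a node keeps every other RRset at that node alive too.

```c
#include <assert.h>
#include <stddef.h>

/* One cache node; in BIND a node can hold many RRsets (and, with ECS,
 * many scoped variants), but LRU position is tracked per node. */
typedef struct node {
    const char *name;
    struct node *prev, *next;
} node_t;

typedef struct {
    node_t *head; /* least recently used */
    node_t *tail; /* most recently used */
} lru_t;

static void
lru_unlink(lru_t *lru, node_t *n) {
    if (n->prev != NULL) { n->prev->next = n->next; } else { lru->head = n->next; }
    if (n->next != NULL) { n->next->prev = n->prev; } else { lru->tail = n->prev; }
    n->prev = n->next = NULL;
}

/* Any access to any RRset at the node drags the whole node to the tail. */
void
lru_touch(lru_t *lru, node_t *n) {
    if (lru->tail == n) {
        return;
    }
    if (n->prev != NULL || n->next != NULL || lru->head == n) {
        lru_unlink(lru, n);
    }
    n->prev = lru->tail;
    if (lru->tail != NULL) { lru->tail->next = n; } else { lru->head = n; }
    lru->tail = n;
}

/* Under memory pressure, the head node is expired first. */
node_t *
lru_evict(lru_t *lru) {
    node_t *n = lru->head;
    if (n != NULL) {
        lru_unlink(lru, n);
    }
    return n;
}
```

A node accumulating many ECS-scoped RRsets is touched often, so it keeps sinking to the tail and is effectively never chosen by `lru_evict()`, even when its older scoped RRsets would individually have been good eviction candidates.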
Relates to (now closed) Support case 00001412 [Single View broken](https://isc.lightning.force.com/lightning/r/Case/5007V00002ZSjxdQAD/view)

Milestone: December 2023 (9.18.21, 9.18.21-S1, 9.19.19)

---

**kea#3156: Changes for Kea 2.4.1 release** (Andrei Pavel, updated 2023-11-23)
https://gitlab.isc.org/isc-projects/kea/-/issues/3156
- [x] added release entry to ChangeLogs
- [x] regenerated BNF grammar
- [x] regenerated message headers
- [x] regenerated parsers
- [x] reordered messages in alphabetical order
- [x] updated copyright years
Milestone: kea2.4.1 · Assignee: Andrei Pavel

---

**bind9#4439: segfault in resolver when serving UDP clients** (Tom Krizek, updated 2023-12-19)
https://gitlab.isc.org/isc-projects/bind9/-/issues/4439

When running BIND9 as a resolver, it crashes with SIGSEGV within a couple of seconds when I run DNS Shotgun simulating UDP clients using a realistic query set.
I managed to bisect the cause of this issue to commit f36e118b9a5750fef886a4ca179740b99e97821e.
BIND9 is executed with a single thread (seems to be the most reliable way to reproduce the issue): `./bin/named/named -n 1 -c named.conf`
**named.conf**
```
options {
listen-on { 10.53.0.1; };
recursion yes;
};
```
**coredump**
- [core.named.1000.da411ff6f65d4f8aa8904bcc81e63f02.1798547.1700141110000000.zst](/uploads/61244267a69e03a21fb25873d97889ea/core.named.1000.da411ff6f65d4f8aa8904bcc81e63f02.1798547.1700141110000000.zst)
- [named](/uploads/a17d2a1275b163188fbafb36618fb655/named)
```
Program terminated with signal SIGSEGV, Segmentation fault.
warning: Section `.reg-xstate/1798547' in core file too small.
#0 0x00007fe4acaca388 in async_restart (arg=0x7fe4a3e8ba00) at query.c:5843
Thread 1 (Thread 0x7fe4ab144580 (LWP 1798547)):
#0 0x00007fe4acaca388 in async_restart (arg=0x7fe4a3e8ba00) at query.c:5843
#1 0x00007fe4acb18268 in isc__async_cb (handle=<optimized out>) at async.c:111
#2 0x00007fe4ac5879fb in ?? () from /usr/lib/libuv.so.1
#3 0x00007fe4ac5a4cdb in ?? () from /usr/lib/libuv.so.1
#4 0x00007fe4ac58cf9f in uv_run () from /usr/lib/libuv.so.1
#5 0x00007fe4acb2b38c in loop_thread (arg=0x7fe4a8a90000) at loop.c:282
#6 0x000055f0f2f03928 in main (argc=8, argv=0x7ffd47f22688) at main.c:1574
#0 0x00007fa3a23bf388 in async_restart (arg=0x7fa39b178700) at query.c:5843
5843 isc_mem_put(client->manager->mctx, qctx, sizeof(*qctx));
(gdb) p client
$1 = (ns_client_t *) 0x7fa399e96800
(gdb) p client->manager
$2 = (ns_clientmgr_t *) 0xdededededededede
```
`client->manager` has been deallocated while still in use.
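The `0xde` bytes are the tell-tale here: the allocator fills blocks with a recognizable pattern on free, so a stale pointer field read after deallocation shows up as `0xdededededededede` instead of plausible-looking data. A simplified sketch of that poisoning idiom (an illustration, not the actual mem.c code):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FILL_BYTE 0xde /* marker written over memory being freed */

/* Overwrite a block with the fill pattern before releasing it, so any
 * use-after-free read yields 0xde bytes rather than stale contents. */
void
mem_poison(void *ptr, size_t size) {
    memset(ptr, FILL_BYTE, size);
    /* the actual free() of ptr would follow here */
}

/* Read a 64-bit value the way a stale pointer field would be read. */
uint64_t
read_stale_u64(const void *ptr) {
    uint64_t v;
    memcpy(&v, ptr, sizeof(v));
    return v;
}
```

Seeing the fill pattern in `client->manager` in the core is what confirms the structure holding it was already returned to the allocator before this read.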
I also have an `rr` recording, but I don't have the knowledge to make much use of it. The `client->manager` was deallocated here:
```
Old value = (ns_clientmgr_t *) 0x7f19d8e6c1de
New value = (ns_clientmgr_t *) 0x7f19d8e6c1e0
0x00007f19dbd10c4a in ?? () from /usr/lib/libc.so.6
(rr) bt
#0 0x00007f19dbd10c4a in ?? () from /usr/lib/libc.so.6
#1 0x00007f19dced81c0 in memset (__len=<optimized out>, __ch=222, __dest=0x7f19d5ec7c00)
at /usr/include/bits/string_fortified.h:59
#2 mem_put (flags=0, size=<optimized out>, mem=0x7f19d5ec7c00, ctx=0x7f19d8e22760)
at mem.c:326
#3 isc__mem_put (ctx=0x7f19d8e22760, ptr=0x7f19d5ec7c00, size=<optimized out>, flags=0,
file=<optimized out>, line=<optimized out>) at mem.c:761
#4 0x00007f19dce557db in ns__client_put_cb (client0=0x7f19d5ec7c00) at client.c:1627
#5 0x00007f19dceb2503 in nmhandle_free (handle=0x7f19d6543280, sock=0x7f19d8e92800)
at netmgr/netmgr.c:886
#6 nmhandle__destroy (handle=0x7f19d6543280) at netmgr/netmgr.c:906
#7 0x00007f19dce6a814 in ns_query_done (qctx=qctx@entry=0x7f19d51b1600) at query.c:11663
#8 0x00007f19dce73ba7 in query_delegation_recurse (qctx=qctx@entry=0x7f19d51b1600)
at query.c:9054
#9 0x00007f19dce73dae in query_delegation (qctx=0x7f19d51b1600) at query.c:8976
#10 0x00007f19dce712b6 in query_lookup (qctx=0x7f19d51b1600) at query.c:6179
#11 0x00007f19dce72be6 in ns__query_start (qctx=0x7f19d51b1600) at query.c:5820
#12 0x00007f19dce73355 in async_restart (arg=0x7f19d51b1600) at query.c:5838
#13 0x00007f19dcec1268 in isc__async_cb (handle=<optimized out>) at async.c:111
#14 0x00007f19dc9359fb in ?? () from /usr/lib/libuv.so.1
#15 0x00007f19dc952cdb in ?? () from /usr/lib/libuv.so.1
#16 0x00007f19dc93af9f in uv_run () from /usr/lib/libuv.so.1
#17 0x00007f19dced438c in loop_thread (arg=0x7f19d8e90000) at loop.c:282
#18 0x0000562924fc5928 in main (argc=6, argv=0x7ffdf1f45078) at main.c:1574
```

Milestone: January 2024 (9.16.46, 9.16.46-S1, 9.18.22, 9.18.22-S1, 9.19.20) (❗RECALLED❗) · Assignee: Mark Andrews

---

**kea#3154: Kea fails to link with log4cplus if the UNICODE macro is defined** (Andrei Pavel, updated 2024-01-16)
https://gitlab.isc.org/isc-projects/kea/-/issues/3154

This has been observed in the fuzzing experiments on an Ubuntu 20.04 OSS-Fuzz container, and also prior to that on my local FreeBSD 13 VM.
When attempting to build Kea, it complains at `./configure` about `configure: error: Needs log4cplus library`.
`config.log` shows:
```
conftest.cpp:(.text+0x21): undefined reference to `log4cplus::Logger::getInstance(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)'
```
Doing `CXXFLAGS='-UUNICODE' LDFLAGS='-UUNICODE' ./configure` fixes it, although it is likely that only one of the set of flags is needed.
What happens is that the log4cplus headers contain UNICODE-conditional macros that make its code use wide-character strings instead of regular strings, hence the undefined symbol. See `/usr/include/log4cplus/clogger.h`, `/usr/include/log4cplus/tchar.h`, `/usr/include/log4cplus/tstring.h`, and others.
log4cplus itself is probably not the one setting the macro; more likely it is set by another dependency or by some system component such as the compiler itself. The latter is the most likely, which is why this appears on fringe systems.
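The mechanism is the classic TCHAR-style split: with UNICODE defined, the same identifier resolves to a wide-character type, which changes the mangled symbol the linker looks for. A minimal sketch of the idiom, using hypothetical names (`tchar`, `TEXT_LIT`, `tstrlen`), not log4cplus's actual headers:

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* TCHAR-style switch: the same source compiles to narrow- or
 * wide-character code depending on whether UNICODE is defined. */
#ifdef UNICODE
typedef wchar_t tchar;
#define TEXT_LIT(s) L##s
#else
typedef char tchar;
#define TEXT_LIT(s) s
#endif

/* Callers see one name, but the parameter type (and thus the mangled
 * symbol in C++) differs between the two configurations, so objects
 * built with and without UNICODE do not link against each other. */
size_t
tstrlen(const tchar *s) {
    size_t n = 0;
    while (s[n] != 0) {
        n++;
    }
    return n;
}
```

This is why the `conftest.cpp` probe fails: the probe is compiled without UNICODE and asks for the narrow-string `getInstance`, while the installed headers or flags selected the wide variant.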
This error could be prevented without manually specifying flags at `./configure` time. Either:
1. Follow log4cplus's lead and split the use of its functions according to whether UNICODE is defined, providing a `getInstance()` that uses wide-character strings when it is defined.
2. Detect in `configure.ac` whether Kea can link with log4cplus and add `-UUNICODE` to the set of flags if the error above occurs. This is rather intrusive and also inaccurate.

Milestone: kea2.5.5

---

**bind9#4436: nsupdate segfaults in tsiggss on FreeBSD 14** (Michal Nowak, updated 2024-03-22)
https://gitlab.isc.org/isc-projects/bind9/-/issues/4436

`nsupdate` segfaults in the `tsiggss` system test on FreeBSD 14.0 on ~"v9.18" and ~"v9.16".
Here's a first crash in the system test. There are several more crashes afterward.
```
2023-11-15 12:20:53,799 INFO:tsiggss I:tsiggss_tmp_dk09tbmf:testing updates to testdc1 as administrator (1)
2023-11-15 12:20:53,800 INFO:tsiggss I:tsiggss_tmp_dk09tbmf:testing update for testdc1.example.nil. A 86400 A 10.53.0.10
2023-11-15 12:20:53,840 INFO:tsiggss Segmentation fault (core dumped)
2023-11-15 12:20:53,841 INFO:tsiggss I:tsiggss_tmp_dk09tbmf:update failed for testdc1.example.nil. A 86400 A 10.53.0.10
2023-11-15 12:20:53,841 INFO:tsiggss I:Reply from SOA query:
2023-11-15 12:20:53,841 INFO:tsiggss I:;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 47069
2023-11-15 12:20:53,842 INFO:tsiggss I:;; flags: qr aa; QUESTION: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
2023-11-15 12:20:53,842 INFO:tsiggss I:;; QUESTION SECTION:
2023-11-15 12:20:53,842 INFO:tsiggss I:;testdc1.example.nil. IN SOA
2023-11-15 12:20:53,842 INFO:tsiggss I:
2023-11-15 12:20:53,842 INFO:tsiggss I:;; AUTHORITY SECTION:
2023-11-15 12:20:53,842 INFO:tsiggss I:example.nil. 0 IN SOA blu.example.nil. hostmaster.example.nil. 2010113027 172800 14400 3628800 604800
2023-11-15 12:20:53,842 INFO:tsiggss I:
2023-11-15 12:20:53,842 INFO:tsiggss I:Found zone name: example.nil
2023-11-15 12:20:53,842 INFO:tsiggss I:The primary is: blu.example.nil
2023-11-15 12:20:53,843 INFO:tsiggss I:start_gssrequest
2023-11-15 12:20:53,843 INFO:tsiggss I:Found realm from ticket: EXAMPLE.NIL
2023-11-15 12:20:53,843 INFO:tsiggss I:tsiggss_tmp_dk09tbmf:failed
```
Sample `nsupdate` backtrace:
```
Core was generated by `/root/bind9/bin/nsupdate/.libs/nsupdate -g -d ns1/update.txt'.
Program terminated with signal SIGSEGV, Segmentation fault.
Address not mapped to object.
#0 0x00000008316a1a0f in EVP_Cipher () from /lib/libcrypto.so.30
[Current thread is 1 (LWP 188477)]
#0 0x00000008316a1a0f in EVP_Cipher () from /lib/libcrypto.so.30
#1 0x000000082e96f4b6 in ?? () from /usr/lib/libkrb5.so.11
#2 0x000000082e973ac8 in krb5_encrypt_ivec () from /usr/lib/libkrb5.so.11
#3 0x000000082e973de5 in krb5_encrypt () from /usr/lib/libkrb5.so.11
#4 0x000000082e9675bf in _krb5_build_authenticator () from /usr/lib/libkrb5.so.11
#5 0x000000082dcff3f6 in ?? () from /usr/lib/libgssapi_krb5.so.10
#6 0x000000082dcfed0b in _gsskrb5_init_sec_context () from /usr/lib/libgssapi_krb5.so.10
#7 0x000000082d95bd4f in gss_init_sec_context () from /usr/lib/libgssapi.so.10
#8 0x000000083ed613b6 in ?? () from /usr/lib/libgssapi_spnego.so.10
#9 0x000000083ed5f5c0 in _gss_spnego_indicate_mechtypelist () from /usr/lib/libgssapi_spnego.so.10
#10 0x000000083ed607ee in _gss_spnego_init_sec_context () from /usr/lib/libgssapi_spnego.so.10
#11 0x000000082d95bd4f in gss_init_sec_context () from /usr/lib/libgssapi.so.10
#12 0x0000000822a308e5 in dst_gssapi_initctx (name=<optimized out>, intoken=intoken@entry=0x0, outtoken=outtoken@entry=0x83d56d700, gssctx=0x83d56e218, mctx=0x1aef866b3000, err_message=0x83d56e200) at gssapictx.c
#13 0x0000000822b0c9af in dns_tkey_buildgssquery (msg=0x1aef87203a80, name=0x2130e0 <fkname>, gname=0x1aef87234300, gname@entry=0x83d56d7a0, intoken=0x1aef872700f0, intoken@entry=0x0, lifetime=lifetime@entry=0, context=0xcf, context@entry=0x83d56e218, win2k=<optimized out>, mctx=0x1aef866b3000, err_message=0x83d56e200) at tkey.c
#14 0x000000000020e790 in start_gssrequest (primary=primary@entry=0x83d56e730) at nsupdate.c
#15 0x000000000020e33c in recvsoa (task=<optimized out>, event=0x0) at nsupdate.c
#16 0x0000000821c68370 in task_run (task=0x1aef8665c140) at task.c
#17 isc_task_run (task=0x1aef8665c140) at task.c
#18 0x0000000821c38689 in isc__nm_async_task (worker=worker@entry=0x1aef866d0000, ev0=0x1aef872700f0, ev0@entry=0x1aef8721c480) at netmgr/netmgr.c
#19 0x0000000821c32ec6 in process_netievent (worker=worker@entry=0x1aef866d0000, ievent=ievent@entry=0x1aef8721c480) at netmgr/netmgr.c
#20 0x0000000821c384f2 in process_queue (worker=worker@entry=0x1aef866d0000, type=type@entry=NETIEVENT_TASK) at netmgr/netmgr.c
#21 0x0000000821c2e6bd in process_all_queues (worker=0x1aef866d0000) at netmgr/netmgr.c
#22 async_cb (handle=0x1aef866d02d8) at netmgr/netmgr.c
#23 0x0000000829b3c871 in ?? () from /usr/local/lib/libuv.so.1
#24 0x0000000829b4e0fd in ?? () from /usr/local/lib/libuv.so.1
#25 0x0000000829b3ce60 in uv_run () from /usr/local/lib/libuv.so.1
#26 0x0000000821c2e7ab in nm_thread (worker0=0x1aef866d0000) at netmgr/netmgr.c
#27 0x0000000821c70e46 in isc__trampoline_run (arg=0x1aef8662bb90) at trampoline.c
#28 0x00000008376e0a75 in ?? () from /lib/libthr.so.3
#29 0x0000000000000000 in ?? ()
```
```
BIND 9.18.21-dev (Extended Support Version) <id:ed78bc4>
running on FreeBSD amd64 14.0-RC2 FreeBSD 14.0-RC2 #0 releng/14.0-n265317-1d2ff5639925: Fri Oct 20 06:17:03 UTC 2023 root@releng1.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC
built by make with '--disable-maintainer-mode' '--enable-developer' '--enable-option-checking=fatal' '--enable-dnstap' '--with-cmocka' '--with-libxml2' '--with-json-c' '--with-readline=libedit'
compiled by CLANG FreeBSD Clang 16.0.6 (https://github.com/llvm/llvm-project.git llvmorg-16.0.6-0-g7cbf1a259152)
compiled with OpenSSL version: OpenSSL 3.0.11 19 Sep 2023
linked to OpenSSL version: OpenSSL 3.0.11 19 Sep 2023
compiled with libuv version: 1.46.0
linked to libuv version: 1.46.0
compiled with libnghttp2 version: 1.57.0
linked to libnghttp2 version: 1.57.0
compiled with libxml2 version: 2.10.4
linked to libxml2 version: 21004
compiled with json-c version: 0.17
linked to json-c version: 0.17
compiled with zlib version: 1.3
linked to zlib version: 1.3
linked to maxminddb version: 1.7.1
compiled with protobuf-c version: 1.4.1
linked to protobuf-c version: 1.4.1
threads support is enabled
DNSSEC algorithms: RSASHA1 NSEC3RSASHA1 RSASHA256 RSASHA512 ECDSAP256SHA256 ECDSAP384SHA384 ED25519 ED448
DS algorithms: SHA-1 SHA-256 SHA-384
HMAC algorithms: HMAC-MD5 HMAC-SHA1 HMAC-SHA224 HMAC-SHA256 HMAC-SHA384 HMAC-SHA512
TKEY mode 2 support (Diffie-Hellman): yes
TKEY mode 3 support (GSS-API): yes
default paths:
named configuration: /usr/local/etc/named.conf
rndc configuration: /usr/local/etc/rndc.conf
DNSSEC root key: /usr/local/etc/bind.keys
nsupdate session key: /usr/local/var/run/named/session.key
named PID file: /usr/local/var/run/named/named.pid
named lock file: /usr/local/var/run/named/named.lock
geoip-directory: /usr/local/share/GeoIP
```
```
checking for krb5-config... /usr/bin/krb5-config
checking for gssapi libraries... -I/usr/include -L/usr/lib -lgssapi -lgssapi_krb5 -lheimntlm -lkrb5 -lhx509 -lcom_err -lcrypto -lasn1 -lwind -lheimbase -lroken -lcrypt -pthread
checking for gssapi/gssapi.h... yes
checking for gssapi/gssapi_krb5.h... yes
checking for gssapi_krb5.h... no
checking for gss_acquire_cred... yes
checking for krb5 libraries... -I/usr/include -L/usr/lib -lkrb5 -lhx509 -lcom_err -lcrypto -lasn1 -lwind -lheimbase -lroken -lcrypt -pthread
checking for krb5/krb5.h... no
checking for krb5.h... yes
checking for krb5_init_context... yes
```
[pytest.log.txt](/uploads/ca1a092b91023024d1c3215295837dd2/pytest.log.txt)
[core.43134-backtrace.txt](/uploads/dab5cd198e0e09345030257576a91602/core.43134-backtrace.txt)
[core.43041-backtrace.txt](/uploads/f40e49e3120623d207cd0ed5b8e93d0b/core.43041-backtrace.txt)
[core.44009-backtrace.txt](/uploads/c7c66927be73250c875443cdb6c70802/core.44009-backtrace.txt)
[core.43922-backtrace.txt](/uploads/e82cfe048701645b83c74af46773e7e1/core.43922-backtrace.txt)
[core.43252-backtrace.txt](/uploads/fdfe3109fd5b13b2f9b06f22ce0da585/core.43252-backtrace.txt)
[core.43094-backtrace.txt](/uploads/9c4a6c1e2e03753c9e71cbb5b7465ff8/core.43094-backtrace.txt)
[core.42986-backtrace.txt](/uploads/c964412aa69ad3fede8fb6cf8c9c1b5d/core.42986-backtrace.txt)
[core.42931-backtrace.txt](/uploads/6402826262a2210df3fc0bb39857813e/core.42931-backtrace.txt)
[nsupdate.out6](/uploads/bdda0082ecfb34190cec4e83a9c3f1d1/nsupdate.out6)
[nsupdate.out5](/uploads/df1a9bbff55030ce6093fea3750f08f0/nsupdate.out5)
[nsupdate.out8](/uploads/58f94412f56d2136472a3305f1b0f573/nsupdate.out8)
[nsupdate.out7](/uploads/28aa35d4d517296581d05630cae16b7d/nsupdate.out7)
[nsupdate.out4](/uploads/9ec2300228db0bf43e29609b88bfef9a/nsupdate.out4)
[nsupdate.out3](/uploads/4fcbf78e8cb9e5a356618123d0e97941/nsupdate.out3)
[nsupdate.out2](/uploads/46b5c9ce602f9bb99cb1673e72ab879d/nsupdate.out2)
[nsupdate.out11](/uploads/cb41ad3685b8d0f31315a5da05308044/nsupdate.out11)
[nsupdate.out10](/uploads/c1f8cbe2f93700b395ee9547658c66cd/nsupdate.out10)
[nsupdate.out1](/uploads/f77767f9b3e2d067f15bc1b121bd56cf/nsupdate.out1)

Milestone: December 2023 (9.18.21, 9.18.21-S1, 9.19.19)

---

**kea#3153: Bump up library versions for 2.4.1** (Andrei Pavel, updated 2023-11-23)
https://gitlab.isc.org/isc-projects/kea/-/issues/3153

Bump up library versions for %"kea2.4.1".

Milestone: kea2.4.1 · Assignee: Razvan Becheriu

---

**kea#3152: Extraneous second lookup in host cache in the radius access callout when subnet is not reselected** (Andrei Pavel, updated 2024-03-25)
https://gitlab.isc.org/isc-projects/kea/-/issues/3152

In `subnetX_select` callouts for `libdhcp_radius.so`, there are two `getXAny` lookups in the host cache. The second one has the purpose of fetching the host again with the new subnet ID. But this makes sense only if the subnet was reselected since the first retrieval, as part of the subnet reselection process that is specific to RADIUS. However, it is called regardless of whether the subnet was reselected or not. This could be optimized.

Milestone: next-stable-2.6 · Assignee: Andrei Pavel

---

**bind9#4433: Supplied Buffer Too Large in wire_test.c** (Eric Sesterhenn, updated 2023-12-06)
https://gitlab.isc.org/isc-projects/bind9/-/issues/4433

The code in wire_test.c provides a 64*1024 buffer to dns_message_renderbegin(). This might trigger an error in a corner case, where the code expects to receive buffers no larger than 65536 bytes.
~~~
if (result != ISC_R_SUCCESS) {
INSIST(st.used < 65536);
dns_compress_rollback(
msg->cctx, (uint16_t)st.used);
*(msg->buffer) = st; /* rollback */
msg->buffer->length += msg->reserved;
msg->counts[sectionid] += total;
maybe_clear_ad(msg, sectionid);
return (result);
}
~~~

Milestone: December 2023 (9.18.21, 9.18.21-S1, 9.19.19) · Assignee: Mark Andrews

---

**bind9#4432: Pointers Dereferenced before Being Checked** (Eric Sesterhenn, updated 2023-12-06)
https://gitlab.isc.org/isc-projects/bind9/-/issues/4432

In several places, pointers are dereferenced before being checked against NULL. In the listing, the pointer mgr is dereferenced to assign worker and only then checked for validity, which includes a NULL pointer check. In case mgr is NULL, invalid memory is read, which likely leads to a crash instead of a more controlled abort.
~~~
void
isc_nm_streamdnsconnect(isc_nm_t *mgr, isc_sockaddr_t *local,
isc_sockaddr_t *peer, isc_nm_cb_t cb, void *cbarg,
unsigned int timeout, isc_tlsctx_t *ctx,
isc_tlsctx_client_session_cache_t *client_sess_cache) {
isc_nmsocket_t *nsock = NULL;
isc__networker_t *worker = &mgr->workers[isc_tid()];
REQUIRE(VALID_NM(mgr));
~~~
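A sketch of the safer ordering, using a stand-in struct rather than the actual netmgr types: validate the pointer first, dereference it second.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the manager type; not the actual isc_nm_t structure. */
typedef struct mgr {
    unsigned int magic;
    int workers[4];
} mgr_t;

#define MGR_MAGIC 0x4d475221u
#define VALID_MGR(m) ((m) != NULL && (m)->magic == MGR_MAGIC)

/* Fixed ordering: the validity check (including the NULL check) runs
 * before the first dereference. The reported bug performs the array
 * indexing to assign a local first, then checks. */
int
first_worker(const mgr_t *mgr) {
    assert(VALID_MGR(mgr)); /* stands in for REQUIRE(VALID_NM(mgr)) */
    return mgr->workers[0];
}
```

The point is purely about statement order: moving the local-variable initialization below the precondition turns an uncontrolled read through a NULL pointer into a controlled assertion failure.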
Similar code exists in isc_nm_listenstreamdns(), isc_nm_tcpconnect(), isc_nm_listentls(), isc_nm_tlsconnect(), and isc_nm_udpconnect(). sock is used in a similar pattern in isc__nm_udp_send(). The stats pointer in dns_dnssecsignstats_increment()
and dns_dnssecsignstats_clear() is accessed in the same way.

Milestone: December 2023 (9.18.21, 9.18.21-S1, 9.19.19) · Assignee: Mark Andrews

---

**stork#1222: Make it obvious that an entire machine has stopped responding** (Darren Ankney, updated 2024-02-22)
https://gitlab.isc.org/isc-projects/stork/-/issues/1222

Currently, if an entire host running stork-agent + kea-* goes offline, there is not much indication in the Stork UI. Eventually, a message will appear in the Events pane on the right-hand side. This was reported to me by a customer during a call where I was helping them install Stork and do some testing. The customer was sharing their screen, and I could see that nothing indicated the host was gone. Under "Machines", all of the services still had a green checkmark next to them, even a few minutes later.
I propose that there should be an area of the GUI dedicated to warnings about various critical problems (such as an entire machine going away). The Events area is fine, but it is filled with all kinds of messages, and users quickly learn to ignore it. Maybe, in addition to an area that shows warnings, there should be some kind of monitoring page where the current health of each component (even the entire host) could be shown visually. I think this type of screen is what the customer was ultimately after.
[SF1448](https://isc.lightning.force.com/lightning/r/Case/500S6000000sLFmIAM/view) and [SF1430](https://isc.lightning.force.com/lightning/r/Case/5007V00002ZyNuWQAV/view)

Milestone: 1.16 · Assignee: Marcin Siodelski

---

**kea#3150: dhcpv4 and dhcpv6 services go down for some reason** (Sandeep Gagalapally, updated 2023-11-16)
https://gitlab.isc.org/isc-projects/kea/-/issues/3150

Hello Kea Support,
We are facing some intermittent issues with dhcpv4 and dhcpv6 services going down for some reason.
Our setup includes two Kea instances (2.4.0) running on Ubuntu 18.04 that connect to an AWS RDS MySQL instance.
Can you please review the attached dhcp4 configs?
The network connection between the Kea servers and the MySQL DB seems to be working fine.
[kea-dhcp-1.conf](/uploads/cb2fc2093bd1450cf23694d64c62d3b2/kea-dhcp-1.conf)
[kea-dhcp-2.conf](/uploads/7a3f41c2f2be740e986dbd5232a3f0dc/kea-dhcp-2.conf)

---

**kea#3149: Bulk Leasequery (BLQ) needs to be able to match PDs associated with a link, even if the subnet of the PD is outside of the subnet of the link** (Cathy Almond, updated 2024-01-18)
https://gitlab.isc.org/isc-projects/kea/-/issues/3149

I'm reporting this as a bug, since it is missing functionality, but functionality that is clearly important operationally; without it (or a viable workaround), the use of BLQ by a rebooting router is useless for finding and quickly reprovisioning PDs as well as IAs.
**Describe the bug**
A router is using BLQ to learn of the existing leases associated with its link following a reboot. Particularly it's interested in the PDs since it's going to need to set up its routing appropriately to the client PDs.
BLQ using `query-by-link-address` is retrieving only the IAs associated with the link. This is sort of unsurprising since the PDs, although associated with the subnet that matches the link, aren't members of that subnet (this is also unsurprising).
However, in the leases memfile, both the IA and the PD have the same subnet ID (the ID of the subnet that matches the link), and the IA, unsurprisingly, is an address within that subnet.
- There is no PD pool associated with the subnet in the dhcp6 configuration
- All the PDs are assigned using HRs - but they *are* associated with the same subnet as the client IA
- I can see in the leases file that the subnet ID is correct, if BLQ happened internally to be using the subnet ID to 'find' leases associated with a link address.
I can't see any other way to get this information currently. The natural way to use BLQ here is to specify the link address (which also does match in the leases database, as well as the subnet ID associated with this link address's subnet). But the underlying lease search code doesn't seem to be using either of these.
How else is a router going to get the list of PDs it should be provisioning? Clearly after just rebooting, it's not going to **know** what ranges of prefixes have been delegated since PD subnets are independent of the subnet via which they're being delegated anyway.
See details in the customer case linked below.
I've also been discussing the semantics of the documentation and other information around BLQ with Kea Engineering, who have requested that I open this issue.
**Environment:**
- Kea version: 2.4.0
- OS: N/A
- Using memfile for leases, postgresql for HRs and mysql for config:
```
...
"Dhcp6": {
"allocator": "iterative",
"calculate-tee-times": true,
"config-control": {
"config-databases": [
{
"host": "127.0.0.1",
"max-reconnect-tries": 30,
"name": "kea",
"password": "<obscured>",
"port": 3306,
"reconnect-wait-time": 2000,
"type": "mysql",
"user": "kea"
}
],
"config-fetch-wait-time": 60
},
...
"lease-database": {
"lfc-interval": 3600,
"type": "memfile"
},
...
"reservations-global": false,
"reservations-in-subnet": true,
"reservations-lookup-first": false,
"reservations-out-of-pool": false,
...
"library": "\/usr\/lib\/x86_64-linux-gnu\/kea\/hooks\/libdhcp_lease_query.so",
"parameters": {
"advanced": {
"active-query-enabled": false,
"bulk-query-enabled": true,
"extended-info-tables-enabled": true,
"lease-query-ip": "<obscured>",
"lease-query-tcp-port": 547,
"max-bulk-query-threads": 0,
"max-concurrent-queries": 0,
"max-leases-per-fetch": 100,
"max-requester-connections": 10,
"max-requester-idle-time": 300
},
"requesters": [
"<obscured>"
]
}
}
],
"host-reservation-identifiers": [
"duid",
"flex-id"
],
"hostname-char-replacement": "",
"hostname-char-set": "[^A-Za-z0-9.-]",
"hosts-databases": [
{
"host": "127.0.0.1",
"max-reconnect-tries": 30,
"name": "kea",
"password": "<obscured>",
"port": 3306,
"reconnect-wait-time": 2000,
"type": "mysql",
"user": "kea"
}
],
```
- Using hooks:
> libdhcp_ha.so
> libdhcp_mysql_cb.so
> libdhcp_host_cmds.so
> libdhcp_lease_cmds.so
> libdhcp_subnet_cmds.so
> libdhcp_cb_cmds.so
> libdhcp_flex_id.so
> libdhcp_legal_log.so
> libdhcp_lease_query.so
**Additional Information**
Here's the operational use case supporting this:
> The DHCP BLQ needs to return the PDs including from any reservations. For our use case, PDs are the one thing we are actually looking to grab when using BLQ. Our routers will send a BLQ after they reload so that they can re-populate the PD routes and subscribers v6 networks can begin working again.
**Contacting you**
See [SF#1426](https://isc.lightning.force.com/lightning/r/5007V00002Zkn9vQAB/view?0.source=alohaHeader)

Milestone: kea2.5.5 · Assignee: Francis Dupont

---

**kea#3148: backport hammer changes that fix errors in jenkins** (Andrei Pavel, updated 2023-11-23)
https://gitlab.isc.org/isc-projects/kea/-/issues/3148

* https://jenkins.aws.isc.org/view/Kea-2.4/job/kea-2.4/job/ut-extended/10/execution/node/253/log/
```plaintext
22:45:57 [HAMMER] 2023-11-13 20:45:56,978 Job for mariadb.service failed because the control process exited with error code.
22:45:57 [HAMMER] 2023-11-13 20:45:56,978 See "systemctl status mariadb.service" and "journalctl -xe" for details.
```
Should be fixed by 82b3ee8457dc39226f196250d5c62f7f1ad49493.
* https://jenkins.aws.isc.org/view/Kea-2.4/job/kea-2.4/job/pkg/19/execution/node/99/log/
```plaintext
23:13:20 [HAMMER] 2023-11-13 21:13:20,375 >>>>> Executing mv kea-pkg/* kea-pkg in /home/alpine/workspace/kea-2.4/pkg
23:13:20 [HAMMER] 2023-11-13 21:13:20,376 mv: 'kea-pkg/isc-kea-2.4.0-r20231113204504.apk' and 'kea-pkg/isc-kea-2.4.0-r20231113204504.apk' are the same file
23:13:20 [HAMMER] 2023-11-13 21:13:20,376 mv: 'kea-pkg/isc-kea-admin-2.4.0-r20231113204504.apk' and 'kea-pkg/isc-kea-admin-2.4.0-r20231113204504.apk' are the same file
23:13:20 [HAMMER] 2023-11-13 21:13:20,376 mv: 'kea-pkg/isc-kea-common-2.4.0-r20231113204504.apk' and 'kea-pkg/isc-kea-common-2.4.0-r20231113204504.apk' are the same file
```
Should be fixed by 60e92acc095e56f0f4db5281e850d6c3879ca71d.

Milestone: kea2.4.1 — Assignee: Andrei Pavel (andrei@isc.org)

https://gitlab.isc.org/isc-projects/kea/-/issues/3147
**Packaging on Alpine 3.16 tries to move files to the same location as the source** (2023-11-23, Andrei Pavel andrei@isc.org)

```plaintext
09:26:57 [HAMMER] 2023-11-13 07:26:56,616 >>>>> Executing mv kea-pkg/* kea-pkg in /home/alpine/workspace/kea-dev/pkg
09:26:57 [HAMMER] 2023-11-13 07:26:56,617 mv: 'kea-pkg/isc-kea-2.5.4-r20231113065823.apk' and 'kea-pkg/isc-kea-2.5.4-r20231113065823.apk' are the same file
```
https://jenkins.aws.isc.org/job/kea-dev/job/pkg/1345/execution/node/94/log/?consoleFull
Does not happen on other Alpine versions, or on any other distribution.

Milestone: kea2.5.4 — Assignee: Andrei Pavel (andrei@isc.org)

https://gitlab.isc.org/isc-projects/kea/-/issues/3145
**Backport #3017: fix for interface redetection regression** (2023-11-23, Andrei Pavel andrei@isc.org)

This ticket is about backporting the fix for the interface redetection regression that was developed in #3017 and released in %kea2.5.3.

Milestone: kea2.4.1 — Assignee: Razvan Becheriu

https://gitlab.isc.org/isc-projects/bind9/-/issues/4425
**Current level of tcp-clients missing from statistics channel** (2024-03-07, Darren Ankney)

The current level of `tcp-clients` from `rndc status`, as shown in this screenshot:
![my-test-server-rndc-status](/uploads/da8dfd7a94ae169a17003a006014e2b6/my-test-server-rndc-status.png)
Is not included in the JSON or XML output from the statistics channel. Only `TCPConnHighWater` is included, which reports the maximum number of simultaneous TCP connections seen at any point since BIND was started. The current level is not exposed anywhere in the stats channel that I could find; it is, however, exposed in `rndc status` as shown above. It seems reasonable that this statistic should be exposed for monitoring in the stats channel.
[SF1419](https://isc.lightning.force.com/lightning/r/Case/5007V00002Zh7IyQAJ/view)

Milestone: March 2024 (9.16.49, 9.16.49-S1, 9.18.25, 9.18.25-S1, 9.19.22) — Assignee: Aydın Mercan

https://gitlab.isc.org/isc-projects/kea/-/issues/3143
**Backport #3111: FLQ fix** (2023-11-23, Tomek Mrugalski)

This ticket is about backporting the FLQ race condition fix that was developed in #3111 and released in %kea2.5.3.

Milestone: kea2.4.1 — Assignee: Razvan Becheriu

https://gitlab.isc.org/isc-projects/kea/-/issues/3142
**deadlock caused by race between start stop and wait** (2023-11-21, Razvan Becheriu)

The deadlock on master is caused by the fact that `start` calls `enable`, which sets `working_` to the thread count:
```
void enable(uint32_t thread_count) {
std::lock_guard<std::mutex> lock(mutex_);
enabled_ = true;
working_ = thread_count;
}
```
while `stop` calls `disable`, which only sets `enabled_` to false:
```
void disable() {
{
std::lock_guard<std::mutex> lock(mutex_);
enabled_ = false;
}
// Notify pop so that it can exit.
cv_.notify_all();
}
```
As a result, some threads exit without ever calling `pop`:
```
/// @brief run function of each thread
void run() {
while (queue_.enabled()) { // <<< --- exit here without calling pop
WorkItemPtr item = queue_.pop();
if (item) {
try {
(*item)();
} catch (...) {
// catch all exceptions
}
}
}
}
```
and so never reach the code that decrements `working_`:
```
Item pop() {
std::unique_lock<std::mutex> lock(mutex_);
--working_; // <<< --- this code is never reached
// Wait for push or disable functions.
if (working_ == 0 && queue_.empty()) {
wait_cv_.notify_all();
}
cv_.wait(lock, [&]() {return (!enabled_ || !queue_.empty());});
if (!enabled_) {
return (Item());
}
```
Consequently, any call to `wait` deadlocks, because `working_` never reaches 0:
```
void wait() {
std::unique_lock<std::mutex> lock(mutex_);
// Wait for any item or for working threads to finish.
wait_cv_.wait(lock, [&]() {return (working_ == 0 && queue_.empty());}); // <<< --- deadlock here
}
```
To replicate, apply this patch on master (unit-test changes only) and the tests will fail:
```
diff --git a/src/lib/util/tests/thread_pool_unittest.cc b/src/lib/util/tests/thread_pool_unittest.cc
index 9c636c9e85..1c2e3a3efe 100644
--- a/src/lib/util/tests/thread_pool_unittest.cc
+++ b/src/lib/util/tests/thread_pool_unittest.cc
@@ -533,7 +533,7 @@ TEST_F(ThreadPoolTest, wait) {
ASSERT_EQ(thread_pool.size(), 0);
items_count = 64;
- thread_count = 16;
+ thread_count = 256;
// prepare setup
reset(thread_count);
@@ -556,15 +556,13 @@ TEST_F(ThreadPoolTest, wait) {
// calling start should create the threads and should keep the queued items
EXPECT_NO_THROW(thread_pool.start(thread_count));
- // the thread count should match
- ASSERT_EQ(thread_pool.size(), thread_count);
+ thread_pool.stop();
// wait for all items to be processed
- thread_pool.wait();
+ ASSERT_TRUE(thread_pool.wait(1));
// the item count should be 0
ASSERT_EQ(thread_pool.count(), 0);
- // the thread count should match
- ASSERT_EQ(thread_pool.size(), thread_count);
+
// all items should have been processed
ASSERT_EQ(count(), items_count);
```
Without a timeout on `wait`, they cause a deadlock.

Milestone: kea2.5.4 — Assignee: Razvan Becheriu

https://gitlab.isc.org/isc-projects/kea-quick-config/-/issues/48
**advance version number and add version number section to relnotes** (2023-11-07, Darren Ankney)

I always forget to do this. So I made an issue about it.

Milestone: 0.3 — Assignee: Darren Ankney

https://gitlab.isc.org/isc-projects/kea/-/issues/3141
**Update DNR docs to RFC 9463** (2024-02-23, Piotrek Zadroga)

As of November 2023, draft-ietf-add-dnr/16 was published as RFC 9463.
There are some comments in the code referring to the draft; they could be updated with the RFC number for clarity.
Sadly, the on-wire format changed between the draft and the RFC, so the Kea code needs to be updated as well.

Milestone: kea2.5.6 — Assignee: Piotrek Zadroga