[CVE-2022-3924] named configured to answer from stale cache may terminate unexpectedly at recursive-clients soft quota
Quick Links | |
---|---|
Incident Manager: | @matthijs |
Deputy Incident Manager: | @peterd |
Public Disclosure Date: | 2023-01-25 |
CVSS Score: | 7.5 |
Security Advisory: | isc-private/printing-press!37 |
Mattermost Channel: | CVE-2022-3924: Serve-stale crash on recursive-clients soft quota |
Support Ticket: | https://support.isc.org/Ticket/Display.html?id=21397 |
Release Checklist: | #3755 (closed) |
Post-mortem Etherpad: | postmortem-2023-01 |
Earlier Than T-5
- (IM) Pick a Deputy Incident Manager
- (IM) Respond to the bug reporter
- (IM) Create an Etherpad for post-mortem
- (SwEng) Ensure there are no public merge requests which inadvertently disclose the issue
- (IM) Assign a CVE identifier
- (SwEng) Update this issue with the assigned CVE identifier and the CVSS score
- (SwEng) Determine the range of product versions affected (including the Subscription Edition)
- (SwEng) Determine whether workarounds for the problem exist
- (SwEng) If necessary, coordinate with other parties
- (Support) Prepare and send out "earliest" notifications
- (Support) Create a merge request for the Security Advisory and include all readily available information in it
- (SwEng) Prepare a private merge request containing a system test reproducing the problem
- (SwEng) Notify Support when a reproducer is ready
- (SwEng) Prepare a detailed explanation of the code flow triggering the problem
- (SwEng) Prepare a private merge request with the fix
- (SwEng) Ensure the merge request with the fix is reviewed and has no outstanding discussions
- (Support) Review the documentation changes introduced by the merge request with the fix
- (SwEng) Prepare backports of the merge request addressing the problem for all affected (and still maintained) branches of a given product
- (Support) Finish preparing the Security Advisory
- (QA) Create (or update) the private issue containing links to fixes & reproducers for all CVEs fixed in a given release cycle
- (QA) (BIND 9 only) Reserve a block of CHANGES placeholders once the complete set of vulnerabilities fixed in a given release cycle is determined
- (QA) Merge the CVE fixes in CVE identifier order
- (QA) Prepare a standalone patch for the last stable release of each affected (and still maintained) product branch
- (QA) Prepare ASN releases (as outlined in the Release Checklist)
At T-5
- (Support) Send ASN to eligible customers
- (Support) (BIND 9 only) Send a pre-announcement email to the bind-announce mailing list to alert users that the upcoming release will include security fixes
At T-4
At T-1
- (Support) Verify that any new or reinstated customers have received the notification email
- (First IM) Send notifications to OS packagers
On the Day of Public Disclosure
- (IM) Grant Support clearance to proceed with public release
- (Support) Publish the releases (as outlined in the release checklist)
- (Support) (BIND 9 only) Update vulnerability matrix in the Knowledge Base
- (Support) Bump Document Version for the Security Advisory and publish it in the Knowledge Base
- (First IM) Send notification emails to third parties
- (First IM) Advise MITRE about the disclosed CVEs
- (First IM) Merge the Security Advisory merge request
- (IM) Inform original reporter (if external) that the security disclosure process is complete
- (Support) Inform customers a fix has been released
After Public Disclosure
- (First IM) Organize post-mortem meeting and make sure it happens
- (Support) Close support tickets
- (QA) Merge a regression test reproducing the bug into all affected (and still maintained) branches
Report: Resolver crash when stale cache entry matches RPZ entry
Note: Although the report mentions RPZ, the crash is actually triggered by hitting the recursive-clients soft quota. It only occurs when stale-answer-client-timeout is enabled and set to a non-zero value.
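The conditions in the note can be checked against a server's effective configuration. The following is a minimal sketch, assuming POSIX sh and awk; stale_timeout_exposed is a hypothetical helper (not part of BIND) that reads the output of named-checkconf -p. Following the title and the report, it treats stale-answer-enable yes; as a precondition, which is an assumption. Because named-checkconf -p prints only explicitly configured options, the version-specific default of stale-answer-client-timeout is not evaluated and has to be checked in the ARM for the installed version.

```sh
# Hypothetical helper (not part of BIND): reads `named-checkconf -p` output
# on stdin and prints "exposed" when serve-stale is enabled and
# stale-answer-client-timeout is explicitly set to a non-zero number.
# Options left at their defaults are not printed by named-checkconf -p,
# so they are not evaluated here.
stale_timeout_exposed() {
    awk '
        # Matches e.g. "stale-answer-enable yes;"
        $1 == "stale-answer-enable" && $2 == "yes;" { enabled = 1 }
        # Matches e.g. "stale-answer-client-timeout 1800;"
        $1 == "stale-answer-client-timeout" {
            value = $2
            sub(/;$/, "", value)
            # Non-numeric values (for example "off") do not count.
            if (value ~ /^[0-9]+$/ && value + 0 > 0) nonzero = 1
        }
        END {
            if (enabled && nonzero) print "exposed"
            else print "not-exposed"
        }
    '
}
```

For the reproducer configuration below (stale-answer-enable yes; and stale-answer-client-timeout 1800;), named-checkconf -p | stale_timeout_exposed prints exposed.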
A customer has reported a repeatable crash on 9.16.34 (on Amazon Linux 2012) and 9.18.8 (on Ubuntu 22.04.1 LTS) when 'stale-answer-enable yes;' is configured and a query matches both a stale cache entry (authoritative name servers blocked) and an RPZ entry.
Reproducer:
```
# named -V
BIND 9.18.1-1ubuntu1.2-Ubuntu (Stable Release) <id:>
running on Linux x86_64 5.15.0-1021-aws #25-Ubuntu SMP Fri Sep 23 12:20:42 UTC 2022
built by make with '--build=x86_64-linux-gnu' '--prefix=/usr' '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--disable-option-checking' '--disable-silent-rules' '--libdir=${prefix}/lib/x86_64-linux-gnu' '--runstatedir=/run' '--disable-maintainer-mode' '--disable-dependency-tracking' '--libdir=/usr/lib/x86_64-linux-gnu' '--sysconfdir=/etc/bind' '--with-python=python3' '--localstatedir=/' '--enable-threads' '--enable-largefile' '--with-libtool' '--enable-shared' '--disable-static' '--with-gost=no' '--with-openssl=/usr' '--with-gssapi=yes' '--with-libidn2' '--with-json-c' '--with-lmdb=/usr' '--with-gnu-ld' '--with-maxminddb' '--with-atf=no' '--enable-ipv6' '--enable-rrl' '--enable-filter-aaaa' '--disable-native-pkcs11' 'build_alias=x86_64-linux-gnu' 'CFLAGS=-g -O2 -ffile-prefix-map=/build/bind9-2lYtkE/bind9-9.18.1=. -flto=auto -ffat-lto-objects -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fno-strict-aliasing -fno-delete-null-pointer-checks -DNO_VERSION_DATE -DDIG_SIGCHASE' 'LDFLAGS=-Wl,-Bsymbolic-functions -flto=auto -ffat-lto-objects -flto=auto -Wl,-z,relro -Wl,-z,now' 'CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2'
compiled by GCC 11.2.0
compiled with OpenSSL version: OpenSSL 3.0.2 15 Mar 2022
linked to OpenSSL version: OpenSSL 3.0.2 15 Mar 2022
compiled with libuv version: 1.43.0
linked to libuv version: 1.43.0
compiled with libnghttp2 version: 1.43.0
linked to libnghttp2 version: 1.43.0
compiled with libxml2 version: 2.9.13
linked to libxml2 version: 20913
compiled with json-c version: 0.15
linked to json-c version: 0.15
compiled with zlib version: 1.2.11
linked to zlib version: 1.2.11
linked to maxminddb version: 1.5.2
threads support is enabled
default paths:
named configuration: /etc/bind/named.conf
rndc configuration: /etc/bind/rndc.conf
DNSSEC root key: /etc/bind/bind.keys
nsupdate session key: //run/named/session.key
named PID file: //run/named/named.pid
named lock file: //run/named/named.lock
geoip-directory: /usr/share/GeoIP
```
```
# named-checkconf -p
options {
	directory "/var/cache/bind";
	listen-on-v6 {
		"any";
	};
	dnssec-validation no;
	response-policy {
		zone "test.rpz.local" max-policy-ttl 86400;
	} break-dnssec yes qname-wait-recurse no;
	stale-answer-enable yes;
	stale-answer-client-timeout 1800;
	stale-cache-enable yes;
};
zone "." {
	type hint;
	file "/usr/share/dns/root.hints";
};
zone "localhost" {
	type master;
	file "/etc/bind/db.local";
};
zone "127.in-addr.arpa" {
	type master;
	file "/etc/bind/db.127";
};
zone "0.in-addr.arpa" {
	type master;
	file "/etc/bind/db.0";
};
zone "255.in-addr.arpa" {
	type master;
	file "/etc/bind/db.255";
};
zone "test.rpz.local" in {
	type master;
	file "/etc/bind/db.rpz.local";
	allow-query {
		"localhost";
	};
	allow-transfer {
		"localhost";
	};
	forwarders {
	};
};
zone "myctl.com" in {
	type master;
	file "/etc/bind/myctl.com.local";
	allow-query {
		"localhost";
	};
	allow-transfer {
		"localhost";
	};
	forwarders {
	};
};
```
```
# cat db.rpz.local
$TTL 900
@       IN      SOA     localhost. root.localhost. (
                        1         ; Serial
                        604800    ; Refresh
                        86400     ; Retry
                        2419200   ; Expire
                        86400 )   ; Negative Cache TTL
        IN      NS      localhost.
$TTL 293
test.myctl.com  CNAME   test-cname-a.myctl.com.
```
```
# cat myctl.com.local
$ORIGIN .
$TTL 86400
myctl.com       IN SOA  localhost. root.localhost. (
                        1         ; Serial
                        604800    ; Refresh
                        86400     ; Retry
                        2419200   ; Expire
                        86400 )   ; Negative Cache TTL
                IN NS   ns-canada.topdns.com.
                IN NS   ns-usa.topdns.com.
                IN NS   ns-uk.topdns.com.
$ORIGIN myctl.com
test-cname-a    NS      ns-canada.topdns.com.
                NS      ns-usa.topdns.com.
                NS      ns-uk.topdns.com.
test            NS      ns-canada.topdns.com.
                NS      ns-usa.topdns.com.
                NS      ns-uk.topdns.com.
$ORIGIN test.myctl.com
$TTL 300
*               CNAME   test.myctl.com.
```
Regular queries work fine, with the RPZ taking effect:
```
# dig @127.0.0.1 999.test.myctl.com
; <<>> DiG 9.18.1-1ubuntu1.2-Ubuntu <<>> @127.0.0.1 999.test.myctl.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 50674
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 2

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 66f39e5345dbbe20010000006352b399a385cb42e1947df1 (good)
;; QUESTION SECTION:
;999.test.myctl.com. IN A

;; ANSWER SECTION:
999.test.myctl.com. 300 IN CNAME test.myctl.com.
test.myctl.com. 293 IN CNAME test-cname-a.myctl.com.
test-cname-a.myctl.com. 60 IN A 127.0.0.1

;; ADDITIONAL SECTION:
test.rpz.local. 1 IN SOA localhost. root.localhost. 1 604800 86400 2419200 86400

;; Query time: 143 msec
;; SERVER: 127.0.0.1#53(127.0.0.1) (UDP)
;; WHEN: Fri Oct 21 14:58:33 UTC 2022
;; MSG SIZE rcvd: 205
```
The test is run with:
```
queryperf -d ./test.pattern.shuffled -s 127.0.0.1 -l 900 -T 10000 -v
```
The input file was created with:
```
for m in $(seq 1 200); do for i in $(seq 1 1000); do echo $i.test.myctl.com A >> test.pattern; done; done
shuf test.pattern > test.pattern.shuffled
```
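The nested loop above appends to test.pattern 200,000 times, reopening the file for every line. A sketch that produces the same content (each of the 1,000 names 1.test.myctl.com to 1000.test.myctl.com, 200 times) with a single redirection, assuming GNU coreutils seq, sed and shuf as used in the original commands:

```sh
# Same lines as the loop above: 200 rounds of "1.test.myctl.com A" .. "1000.test.myctl.com A".
# The whole loop writes through one redirection instead of one ">>" per line.
# Note: ">" truncates an existing test.pattern, whereas ">>" would append to it.
for m in $(seq 1 200); do
    seq 1 1000 | sed 's/$/.test.myctl.com A/'
done > test.pattern

# Randomize the query order, as in the original reproducer.
shuf test.pattern > test.pattern.shuffled
```

Truncating instead of appending also keeps a re-run from silently doubling the input file.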
After one minute, we block the upstream name servers with iptables:
```
iptables -I INPUT -s 77.247.183.137/32,108.61.150.91/32,109.201.142.225/32,46.166.189.99/32,108.61.12.163/32 -j DROP
```
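When the reproducer is run more than once, the block has to be lifted again between runs. A minimal sketch, assuming root privileges; block_upstream and unblock_upstream are hypothetical helper names, and the address list is copied from the command above. iptables expands a comma-separated -s list into one rule per address, and -D with the same rule specification deletes those rules.

```sh
# Upstream name servers blocked in the reproducer (copied from the command above).
UPSTREAM_NS="77.247.183.137/32,108.61.150.91/32,109.201.142.225/32,46.166.189.99/32,108.61.12.163/32"

# Hypothetical helpers; both must be run as root.
block_upstream() {
    # -I inserts a DROP rule for each listed source address at the top of INPUT.
    iptables -I INPUT -s "$UPSTREAM_NS" -j DROP
}

unblock_upstream() {
    # -D with the identical rule specification removes the rules added above.
    iptables -D INPUT -s "$UPSTREAM_NS" -j DROP
}
```

Keeping the address list in one variable ensures the delete matches the insert exactly.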
As soon as the TTLs of all CNAMEs have expired (checked with rndc dumpdb), the following error messages appear in the log file:
9.16:
```
21-Oct-2022 18:16:07.996 client @0x7f261d905230 127.0.0.1#47018 (61.test.myctl.com): rpz QNAME Local-Data rewrite test.myctl.com/A/IN via test.myctl.com.test.rpz.local
21-Oct-2022 18:16:07.996 test-cname-a.myctl.com query within stale refresh time, stale answer used
21-Oct-2022 18:16:07.996 client @0x7f261d783450 127.0.0.1#47018 (42.test.myctl.com): rpz QNAME Local-Data rewrite test.myctl.com/A/IN via test.myctl.com.test.rpz.local
21-Oct-2022 18:16:07.996 test-cname-a.myctl.com query within stale refresh time, stale answer used
21-Oct-2022 18:16:08.000 client @0x7f261d783450 127.0.0.1#47018 (14.test.myctl.com): recursive-clients soft limit exceeded (901/900/1000), aborting oldest query
21-Oct-2022 18:16:08.000 client @0x7f261d891dc0 127.0.0.1#47018 (429.test.myctl.com): rpz QNAME Local-Data rewrite test.myctl.com/A/IN via test.myctl.com.test.rpz.local
21-Oct-2022 18:16:08.000 client @0x7f261ce109b0 127.0.0.1#47018 (907.test.myctl.com): rpz QNAME Local-Data rewrite test.myctl.com/A/IN via test.myctl.com.test.rpz.local
21-Oct-2022 18:16:08.000 resolver.c:11282: fatal error:
21-Oct-2022 18:16:08.000 RUNTIME_CHECK(event->fetch != fetch) failed
21-Oct-2022 18:16:08.000 exiting (due to fatal error in library)
```
9.18:
```
Oct 21 18:25:03 ip-172-31-41-183 named[13856]: client @0x7fd9082306a8 127.0.0.1#55429 (875.test.myctl.com): rpz QNAME Local-Data rewrite test.myctl.com/A/IN via test.myctl.com.test.rpz.local
Oct 21 18:25:03 ip-172-31-41-183 named[13856]: client @0x7fd9082306a8 127.0.0.1#55429 (459.test.myctl.com): rpz QNAME Local-Data rewrite test.myctl.com/A/IN via test.myctl.com.test.rpz.local
Oct 21 18:25:03 ip-172-31-41-183 named[13856]: client @0x7fd9082306a8 127.0.0.1#55429 (82.test.myctl.com): rpz QNAME Local-Data rewrite test.myctl.com/A/IN via test.myctl.com.test.rpz.local
Oct 21 18:26:10 ip-172-31-41-183 named[13856]: client @0x7fd909aa06f8 127.0.0.1#55429 (256.test.myctl.com): recursive-clients soft limit exceeded (901/900/1000), aborting oldest query
Oct 21 18:26:10 ip-172-31-41-183 rsyslogd: imjournal: 4812 messages lost due to rate-limiting
Oct 21 18:26:11 ip-172-31-41-183 named[13856]: client @0x7fd908dcdf58 127.0.0.1#55429 (75.test.myctl.com): recursive-clients soft limit exceeded (901/900/1000), aborting oldest query
Oct 21 18:26:12 ip-172-31-41-183 named[13856]: client @0x7fd9088ef9e8 127.0.0.1#55429 (510.test.myctl.com): recursive-clients soft limit exceeded (901/900/1000), aborting oldest query
Oct 21 18:26:12 ip-172-31-41-183 named[13856]: resolver.c:11000: fatal error:
Oct 21 18:26:12 ip-172-31-41-183 named[13856]: RUNTIME_CHECK(event->fetch != fetch) failed
Oct 21 18:26:12 ip-172-31-41-183 named[13856]: exiting (due to fatal error in library)
```
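Both versions log the same pair of messages before exiting: a recursive-clients soft limit message followed by the RUNTIME_CHECK(event->fetch != fetch) failure. A triage sketch for support logs, assuming POSIX sh and awk; crash_signature is a hypothetical helper, not an ISC tool. It matches on message text rather than on the resolver.c line number, because that number differs between 9.16 (11282) and 9.18 (11000). A match is only a heuristic: the soft-limit message on its own is normal under load.

```sh
# Hypothetical triage helper (not an ISC tool): reads a named log on stdin and
# prints "match" when it contains both the recursive-clients soft-limit message
# and the RUNTIME_CHECK failure seen in the logs above, otherwise "no-match".
crash_signature() {
    awk '
        /recursive-clients soft limit exceeded/ { quota = 1 }
        # index() avoids regex escaping of the parentheses and "->".
        index($0, "RUNTIME_CHECK(event->fetch != fetch) failed") { check = 1 }
        END {
            if (quota && check) print "match"
            else print "no-match"
        }
    '
}
```

Usage: crash_signature < named.log, where named.log stands for whichever file or journal export holds the named messages.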