[CVE-2024-1975] SIG(0) can be used to exhaust CPU resources

Quick Links	🔗
Incident Manager:	@pspacek
Deputy Incident Manager:	@peterd
Public Disclosure Date:	2024-07-23
CVSS Score:	7.5
Security Advisory:	isc-private/printing-press!102
Mattermost Channel:	CVE-2024-1975
Support Ticket:	N/A
Release Checklist:	#4735 (closed)

💡 Click here (internal resource) for general information about the security incident handling process.

Earlier Than T-5

At T-5

🔗 (Marketing) Update the text on the T-5 (from the Printing Press project) and "earliest" ASN documents in the SF portal
🔗 (Marketing) (BIND 9 only) Update the BIND -S information document in SF with download links to the new versions
🔗 (Marketing) Bulk email eligible customers to check the SF portal
🔗 (Marketing) (BIND 9 only) Send a pre-announcement email to the bind-announce mailing list to alert users that the upcoming release will include security fixes

At T-1

🔗 (First IM) Send notifications to OS packagers

On the Day of Public Disclosure

🔗 (IM) Grant QA & Marketing clearance to proceed with public release
🔗 (QA/Marketing) Publish the releases (as outlined in the release checklist)
🔗 (Support) (BIND 9 only) Add the new CVEs to the vulnerability matrix in the Knowledge Base
🔗 (Support) Bump Document Version for the Security Advisory and publish it in the Knowledge Base
🔗 (First IM) Send notification emails to third parties
🔗 (First IM) Advise MITRE about the disclosed CVEs
🔗 (First IM) Merge the Security Advisory merge request
🚫 🔗 (IM) Inform original reporter (if external) that the security disclosure process is complete
🔗 (Marketing) Update the SF portal to clear the ASN
🔗 (Marketing) Email ASN recipients that the embargo is lifted

After Public Disclosure

🚫 🔗 (QA) Merge a regression test reproducing the bug into all affected (and still maintained) branches

Summary

Authoritative servers with a KEY RR in a zone, and validating resolvers, are vulnerable to trivial CPU exhaustion attacks using the SIG(0) protocol.

BIND versions affected

All versions with SIG(0) support. Tested configuration:

BIND 9.19.19-dev (Development Release) <id:de2009e>
compiled by GCC 13.2.1 20230801
compiled with OpenSSL version: OpenSSL 3.1.4 24 Oct 2023
linked to OpenSSL version: OpenSSL 3.1.4 24 Oct 2023
DNSSEC algorithms: RSASHA1 NSEC3RSASHA1 RSASHA256 RSASHA512 ECDSAP256SHA256 ECDSAP384SHA384 ED25519 ED448
DS algorithms: SHA-1 SHA-256 SHA-384
HMAC algorithms: HMAC-MD5 HMAC-SHA1 HMAC-SHA224 HMAC-SHA256 HMAC-SHA384 HMAC-SHA512
TKEY mode 2 support (Diffie-Hellman): no
TKEY mode 3 support (GSS-API): yes

Affects v9.19: c8c0f4bb
Affects v9.18: 44e4b5cb
Affects v9.16: 161d69ab
Affects v9.11 (EoL): v9.11.37-S1

Preconditions and assumptions

The server must be configured with at least one KEY RR in a zone, OR a KEY RR must be in the cache with trust level secure (a DNSSEC-validated record).

Example configuration: https://jpmens.net/2010/12/01/securing-dynamic-dns-updates-ddns-with-sig0/
Essentially any zone with a KEY RR in it should suffice. An attacker who is able to add a KEY RR can set up the conditions for the attack themselves.

The attacker is able to find the DNS name of the KEY RR.

Attacker's abilities

The attacker simply needs to send a signed message which refers to an existing KEY RR in one of the zones OR in the cache. The signature can be invalid (any syntactically valid message is sufficient).

Impact

Inability to respond to legitimate queries,
CPU resource exhaustion on the server.

The impact grows with the number of KEY RRs at the same name: it seems we try every KEY RR until we find a match or exhaust all options.
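The "try every key" behaviour suspected above can be illustrated with a minimal sketch. This is not BIND code: the function name `verify_with_any_key` is hypothetical, and HMAC-SHA256 is used only as a cheap stand-in for the public-key verification that SIG(0) actually performs. The point is that a bogus signature forces one verification attempt per key at the name.

```python
import hashlib
import hmac


def verify_with_any_key(message: bytes, signature: bytes, keys: list[bytes]) -> tuple[bool, int]:
    """Try each candidate key in turn; return (verified, number_of_attempts).

    HMAC is a stand-in for an expensive public-key check (assumption for
    illustration only; real SIG(0) uses asymmetric algorithms).
    """
    attempts = 0
    for key in keys:
        attempts += 1
        expected = hmac.new(key, message, hashlib.sha256).digest()
        if hmac.compare_digest(expected, signature):
            return True, attempts  # stop at the first matching key
    return False, attempts  # invalid signature: every key was tried


keys = [b"key-1", b"key-2", b"key-3"]
msg = b"example request"
good_sig = hmac.new(keys[0], msg, hashlib.sha256).digest()
bad_sig = b"\x00" * 32

ok_result = verify_with_any_key(msg, good_sig, keys)
bad_result = verify_with_any_key(msg, bad_sig, keys)
```

In this sketch a valid signature made with the first key costs one attempt, while an invalid one costs as many attempts as there are keys, which is why an attacker has no reason to send a valid signature.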

Values were measured on a build from the bind-9.18 branch and are listed in thousands of QPS. named was running on a single CPU core to make measurement easier.

State	QPS [k]
legitimate cache hits - before the attack	70
attack	1
legitimate traffic when the attack is ongoing	10

Steps to reproduce

Beware: Test over UDP. TCP also suffers from #4481, which throws the results off.
Configure BIND to be authoritative for a zone containing a KEY RR.

Start BIND server with command: named -g -c named.conf -n1 (single thread to simplify measurement)
Simulate legitimate clients using: yes '. A' | dnsperf -S1 -O suppress=timeout -c 256
Simulate attack traffic using attached script & data file: python udploop.py 127.0.0.1 53 sig0-bad-query.tcpdns --report-interval 1

Please note that the binary file containing the queries does NOT carry a valid signature. Each query is a syntactically valid message which pretends to be signed with the KEY RR from the zone file, but the signature itself does not verify.

What is the current bug behavior?

Legitimate QPS drops significantly. Each attack query reduces legitimate throughput roughly 60x more than an ordinary query without SIG(0) would.
Side effect: log spam, with one log line per attacker message.
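The 60x figure can be reproduced from the numbers in the Impact table with a short back-of-the-envelope calculation. The sketch below assumes that the single core is fully saturated in both measurements and that CPU time is split only between legitimate and attack queries; the variable names are illustrative.

```python
# Measured values from the Impact table, in thousands of QPS.
baseline_kqps = 70.0            # legitimate cache hits before the attack
attack_kqps = 1.0               # attack traffic
legit_under_attack_kqps = 10.0  # legitimate traffic during the attack

# CPU seconds consumed per legitimate query on one saturated core (assumption).
cost_legit = 1.0 / (baseline_kqps * 1000)

# Share of each CPU second still spent on legitimate traffic during the attack.
cpu_legit = legit_under_attack_kqps * 1000 * cost_legit
cpu_attack = 1.0 - cpu_legit

# CPU seconds consumed per attack query, and its cost relative to a normal query.
cost_attack = cpu_attack / (attack_kqps * 1000)
ratio = cost_attack / cost_legit

print(f"attack query costs about {ratio:.0f}x a legitimate query")
```

Under these assumptions, 1k attack QPS displaces 60k legitimate QPS, which matches the ratio quoted above.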

What is the expected correct behavior?

Probably a limit on the resources we are willing to spend on SIG(0), or on expensive crypto in general.

Possibly also a smaller TCP buffer size (or smaller buffers elsewhere; I don't know exactly where the messages are being buffered).
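One possible shape for such a limit is a token bucket that caps how many expensive signature verifications are started per second. The sketch below is only an illustration of the idea, not the fix that was or will be implemented in BIND; the class name `VerificationBudget` and its parameters are hypothetical.

```python
import time


class VerificationBudget:
    """Token bucket capping expensive signature checks (hypothetical sketch)."""

    def __init__(self, rate_per_sec: float, burst: int, clock=time.monotonic):
        self.rate = rate_per_sec      # tokens refilled per second
        self.capacity = float(burst)  # maximum tokens that can accumulate
        self.tokens = float(burst)    # start with a full bucket
        self.clock = clock            # injectable clock, useful for testing
        self.last = clock()

    def try_acquire(self) -> bool:
        """Return True if a verification may run now, False if over budget."""
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Usage with a fake clock so the behaviour is deterministic.
fake_time = [0.0]
budget = VerificationBudget(rate_per_sec=2.0, burst=2, clock=lambda: fake_time[0])
first_two = [budget.try_acquire(), budget.try_acquire()]
third = budget.try_acquire()      # bucket is empty at this point
fake_time[0] = 0.5                # half a second later, one token is refilled
after_refill = budget.try_acquire()
```

A request that fails `try_acquire()` would be refused or dropped without any crypto being attempted, so the cost of an over-budget message stays close to that of an ordinary query.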

Relevant logs

This particular variant of attack logs one message per query:

request has invalid signature: RRSIG failed to verify (BADSIG)
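Because each attack message produces one such line, counting them gives a rough indicator of attack volume. The sketch below matches only on the message text shown above; the helper name `count_badsig` is hypothetical, and the surrounding log line format (timestamps, client address, and so on) is deliberately not assumed.

```python
BADSIG_MARKER = "request has invalid signature"


def count_badsig(lines) -> int:
    """Count log lines that contain the invalid-signature message."""
    return sum(1 for line in lines if BADSIG_MARKER in line)


sample = [
    "request has invalid signature: RRSIG failed to verify (BADSIG)",
    "some unrelated log line",
    "request has invalid signature: RRSIG failed to verify (BADSIG)",
]
badsig_total = count_badsig(sample)
```

The same function can be fed an open log file object, since it only iterates over lines.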

Edited Jul 30, 2024 by Nicki Křížek
