[CVE-2023-4236] named may terminate unexpectedly under high DNS-over-TLS query load
Quick Links | |
---|---|
Incident Manager: | @pspacek |
Deputy Incident Manager: | @greg |
Public Disclosure Date: | 2023-09-20 |
CVSS Score: | 7.5 |
Security Advisory: | isc-private/printing-press!66 |
Mattermost Channel: | CVE-2023-4236 |
Support Ticket: | Salesforce case #1086 |
Release Checklist: | #4289 (closed) |
Post-mortem Etherpad: | postmortem-2023-09 |
Earlier Than T-5
- 🔗 (IM) Pick a Deputy Incident Manager
- 🔗 (IM) Respond to the bug reporter
- 🔗 (IM) Create an Etherpad for post-mortem
- 🔗 (SwEng) Ensure there are no public merge requests which inadvertently disclose the issue
- 🔗 (IM) Assign a CVE identifier
- 🔗 (SwEng) Update this issue with the assigned CVE identifier and the CVSS score: MM
- 🔗 (SwEng) Determine the range of product versions affected (including the Subscription Edition)
- 🔗 (SwEng) Determine whether workarounds for the problem exist
- 🚫 (SwEng) If necessary, coordinate with other parties
- 🔗 (Support) Prepare and send out "earliest" notifications
- 🔗 (Support) Create a merge request for the Security Advisory and include all readily available information in it
- 🚫 (SwEng) Prepare a private merge request containing a system test reproducing the problem
- 🚫 (SwEng) Notify Support when a reproducer is ready
- 🔗 (SwEng) Prepare a detailed explanation of the code flow triggering the problem: see here
- 🔗 (SwEng) Prepare a private merge request with the fix
- 🔗 (SwEng) Ensure the merge request with the fix is reviewed and has no outstanding discussions
- 🔗 (Support) Review the documentation changes introduced by the merge request with the fix
- 🚫 (SwEng) Prepare backports of the merge request addressing the problem for all affected (and still maintained) branches of a given product
- 🔗 (Support) Finish preparing the Security Advisory
- 🔗 (QA) Create (or update) the private issue containing links to fixes & reproducers for all CVEs fixed in a given release cycle
- 🔗 (QA) (BIND 9 only) Reserve a block of CHANGES placeholders once the complete set of vulnerabilities fixed in a given release cycle is determined
- 🔗 (QA) Merge the CVE fixes in CVE identifier order
- 🔗 (QA) Prepare a standalone patch for the last stable release of each affected (and still maintained) product branch
- 🔗 (QA) Prepare ASN releases (as outlined in the Release Checklist)
At T-5
- 🔗 (Support) Send ASN to eligible customers
- 🔗 (Support) (BIND 9 only) Send a pre-announcement email to the bind-announce mailing list to alert users that the upcoming release will include security fixes
At T-4
At T-1
- 🔗 (Support) Verify that any new or reinstated customers have received the notification email
- 🔗 (First IM) Send notifications to OS packagers
On the Day of Public Disclosure
- 🔗 (IM) Grant Support clearance to proceed with public release
- 🔗 (Support) Publish the releases (as outlined in the Release Checklist)
- 🔗 (Support) (BIND 9 only) Add the new CVEs to the vulnerability matrix in the Knowledge Base
- 🔗 (Support) Bump Document Version for the Security Advisory and publish it in the Knowledge Base
- 🔗 (First IM) Send notification emails to third parties
- 🔗 (First IM) Advise MITRE about the disclosed CVEs
- 🔗 (First IM) Merge the Security Advisory merge request
- 🔗 (IM) Inform original reporter (if external) that the security disclosure process is complete
- 🔗 (Support) Inform customers a fix has been released
After Public Disclosure
- 🚫 🔗 (First IM) Organize post-mortem meeting and make sure it happens
- 🔗 (Support) Close support tickets
- 🚫 🔗 (QA) Merge a regression test reproducing the bug into all affected (and still maintained) branches
Summary
From the customer:
I was running dnsperf in a test environment to find the parameters that give the highest QPS rate. I originally tested our custom build on CentOS 7. After it started crashing, I switched to the stock named on a Fedora 37 test VM to reproduce the problem.
dnsperf example:
dnsperf -d /usr/share/dnsperf/queryfile-example-current -S 1 -s 192.168.122.239 -q 15000 -c 48 -T 32 -m dot
This or a similar command would often run without issues. While testing, I would interrupt dnsperf whenever the QPS was lower than in a previous run, so that I could tweak the parameters. Repeating this would eventually cause named to crash.
BIND version used
Steps to reproduce
See above
What is the current bug behavior?
named dies horribly with an assertion failure. The assertion as logged is:
../../../lib/isc/netmgr/netmgr.c:2921: REQUIRE(((uvreq) != ((void *)0) && ((const isc__magic_t *)(uvreq))->magic == ((('N') << 24 | ('M') << 16 | ('U') << 8 | ('R'))))) failed, back trace
What is the expected correct behavior?
named lives forever (or at least until an operator decides to stop and restart it)
Relevant configuration files
see internal only note
Relevant logs and/or screenshots
This is the backtrace from the resulting core dump:
Stack trace of thread 35031:
#0 0x00007f40cc2afe5c __pthread_kill_implementation (libc.so.6 + 0x8ce5c)
#1 0x00007f40cc25fa76 raise (libc.so.6 + 0x3ca76)
#2 0x00007f40cc2497fc abort (libc.so.6 + 0x267fc)
#3 0x0000561a0d6fb575 assertion_failed.cold (named + 0x1c575)
#4 0x00007f40cce39a50 isc_assertion_failed (libisc-9.18.16.so + 0x39a50)
#5 0x00007f40cce296d9 isc___nm_uvreq_put (libisc-9.18.16.so + 0x296d9)
#6 0x00007f40cce29d23 isc__nm_async_sendcb (libisc-9.18.16.so + 0x29d23)
#7 0x00007f40cce2da04 process_netievent (libisc-9.18.16.so + 0x2da04)
#8 0x00007f40cce2e057 process_queue (libisc-9.18.16.so + 0x2e057)
#9 0x00007f40cce2e277 async_cb (libisc-9.18.16.so + 0x2e277)
#10 0x00007f40ccd08e23 uv__async_io.part.0 (libuv.so.1 + 0xae23)
#11 0x00007f40ccd25dcb uv__io_poll (libuv.so.1 + 0x27dcb)
#12 0x00007f40ccd0e62f uv_run (libuv.so.1 + 0x1062f)
#13 0x00007f40cce2e704 nm_thread (libisc-9.18.16.so + 0x2e704)
#14 0x00007f40cce646e9 isc__trampoline_run (libisc-9.18.16.so + 0x646e9)
#15 0x00007f40cc2ae12d start_thread (libc.so.6 + 0x8b12d)
#16 0x00007f40cc32fbc0 __clone3 (libc.so.6 + 0x10cbc0)
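The failing REQUIRE above is the macro-expanded form of a magic-number check: the first field of the request object must equal the characters 'N', 'M', 'U', 'R' packed into one 32-bit integer, and frame #5 shows the check firing in isc___nm_uvreq_put(), i.e. while a send request is being released. The C sketch below re-creates that pattern with hypothetical names (SKETCH_MAGIC, sketch_uvreq_t, sketch_uvreq_put()); it is not libisc code. It assumes, for illustration, that releasing a request clears its magic. Under that assumption, a request that is released twice, or reused after release, fails the check. That is one way an internal object that is incorrectly reused under load could trip this assertion; it is not a statement of the confirmed root cause.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified re-creation of the magic packing seen in the expanded
 * assertion: four characters packed into one 32-bit value.
 * Hypothetical names; not the libisc definitions. */
#define SKETCH_MAGIC(a, b, c, d)                                      \
	((uint32_t)(a) << 24 | (uint32_t)(b) << 16 | (uint32_t)(c) << 8 | \
	 (uint32_t)(d))
#define SKETCH_UVREQ_MAGIC SKETCH_MAGIC('N', 'M', 'U', 'R')

/* Hypothetical request object: the magic comes first so a generic
 * "magic header" cast (as in the expanded REQUIRE) can read it. */
typedef struct {
	uint32_t magic;
	int len; /* stand-in for the rest of the request state */
} sketch_uvreq_t;

/* Equivalent of the check in the log: non-NULL and magic matches. */
static bool
sketch_uvreq_valid(const sketch_uvreq_t *req) {
	return req != NULL && req->magic == SKETCH_UVREQ_MAGIC;
}

/* Prepare a request for use by stamping the magic. */
static void
sketch_uvreq_init(sketch_uvreq_t *req) {
	req->magic = SKETCH_UVREQ_MAGIC;
	req->len = 0;
}

/* Release a request. Returns false where the real code would abort
 * via REQUIRE(); on success it clears the magic (an assumption of
 * this sketch) so that any later release or reuse is detected. */
static bool
sketch_uvreq_put(sketch_uvreq_t *req) {
	if (!sketch_uvreq_valid(req)) {
		return false;
	}
	req->magic = 0;
	return true;
}
```

With this packing, SKETCH_UVREQ_MAGIC evaluates to 0x4E4D5552. After sketch_uvreq_init(), the first sketch_uvreq_put() succeeds and a second one on the same request returns false, which is the point at which the real REQUIRE() would call abort() as seen in frames #2 to #5.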