[CVE-2021-25219] Lame cache can be abused to severely degrade resolver performance
CVE-specific actions
- Assign a CVE identifier
- Determine CVSS score
- Determine the range of BIND versions affected (including the Subscription Edition)
- Determine whether workarounds for the problem exist
- Create a draft of the security advisory and put the information above in there
- Prepare a detailed description of the problem which should include the following by default:
- Prepare a private merge request containing the following items in separate commits:
  - a test for the issue (may be moved to a separate merge request for deferred merging)
  - a fix for the issue
  - documentation updates (CHANGES, release notes, anything else applicable)
- Ensure the merge request from the previous step is reviewed by SWENG staff and has no outstanding discussions
- Ensure the documentation changes introduced by the merge request addressing the problem are reviewed by Support and Marketing staff
- Prepare backports of the merge request addressing the problem for all affected (and still maintained) BIND branches (backporting might affect the issue's scope and/or description)
- Prepare a standalone patch for the last stable release of each affected (and still maintained) BIND branch
Release-specific actions
- Create/update the private issue containing links to fixes & reproducers for all CVEs fixed in a given release cycle
- Reserve a block of CHANGES placeholders once the complete set of vulnerabilities fixed in a given release cycle is determined
- Ensure the merge requests containing CVE fixes are merged into security-* branches in CVE identifier order
Post-disclosure actions
The ADB lame cache lists can grow long under a random subdomain attack involving a lame server, resulting in degraded server performance.
https://support.isc.org/Ticket/Display.html?id=19171
We received a customer support issue in which an attacker is exploiting the handling of lame responses in BIND's resolver to degrade its performance. The attacker performs a random subdomain attack on a domain for which the supposedly authoritative name server sends lame responses.
Attack Pattern: In the normal case, if an upstream authoritative name server sends a lame response for a (qname, qtype), the DNS resolver marks that server as lame for the (qname, qtype) by adding an entry to the dns_adblameinfo_t list (a linked list protected by a mutex) under that server in the address database. The entry is kept in the address database until lame_ttl expires (default = 600s). This ensures that if any other client sends a query with the same (qname, qtype), a SERVFAIL is sent to that client without sending any queries to the upstream name server.
Suppose an attacker identifies a domain (say example.com) for which the upstream name server sends lame responses. The attacker can then keep sending queries for random, unique subdomains to the DNS resolver. Since all the upstream responses will be lame, the dns_adblameinfo_t list for that server can grow very long over time. If example.com is hosted on multiple name servers and all of them respond with lame answers for subdomains of example.com, a similar list is maintained for each server.
In this case, any further queries for random subdomains of this domain (example.com) can severely slow down address lookups in the ADB, as the resolver has to walk through the dns_adblameinfo_t list to see whether the incoming (qname, qtype) matches any entry in it. In addition, when processing a lame response from an upstream name server, the dns_adblameinfo_t list is traversed completely once again to ensure there is no entry with the same (qname, qtype) before a new entry is added.
Because the dns_adblameinfo_t linked list is so long and is protected by a mutex, we see a huge performance impact on the DNS resolver. Some threads are busy traversing the list when an incoming query is received, some are busy marking a server as lame for a (qname, qtype), and some are blocked waiting on the mutex. We have observed UDP socket buffers filling up and incoming queries being dropped, which is impacting query responses.
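To make the cost described above concrete, here is a minimal, self-contained sketch of a "search the whole list, then prepend" cache. It is not BIND code: `struct lame_node`, `struct lame_list`, `lame_list_mark()` and `lame_list_flood()` are hypothetical names used only for illustration, and name comparison is simplified to `strcmp()`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one lame-cache entry (not BIND's dns_adblameinfo_t). */
struct lame_node {
    char qname[64];
    uint16_t qtype;
    struct lame_node *next;
};

/* One list per upstream server; `visited` counts nodes touched by searches. */
struct lame_list {
    struct lame_node *head;
    uint64_t visited;
};

/*
 * Search for (qname, qtype) and prepend a new node if it is absent.
 * Returns 1 if a node was added, 0 if it already existed, -1 on
 * allocation failure. strcmp() is a simplification: real DNS name
 * comparison is case-insensitive.
 */
int lame_list_mark(struct lame_list *list, const char *qname, uint16_t qtype)
{
    struct lame_node *n;

    assert(list != NULL && qname != NULL);
    assert(strlen(qname) < sizeof(n->qname));

    for (n = list->head; n != NULL; n = n->next) {
        list->visited++;
        if (n->qtype == qtype && strcmp(n->qname, qname) == 0)
            return 0;
    }

    n = malloc(sizeof(*n));
    if (n == NULL)
        return -1;
    strcpy(n->qname, qname);
    n->qtype = qtype;
    n->next = list->head;
    list->head = n;
    return 1;
}

/*
 * Mark `count` unique names (qtype 1, i.e. A) and return the running
 * total of nodes visited by all searches on this list so far.
 */
uint64_t lame_list_flood(struct lame_list *list, unsigned count)
{
    char qname[64];

    assert(list != NULL);
    for (unsigned i = 0; i < count; i++) {
        snprintf(qname, sizeof(qname), "lame%08x.example.com", i);
        if (lame_list_mark(list, qname, 1) != 1)
            break;
    }
    return list->visited;
}

/* Release every node and reset the list to empty. */
void lame_list_free(struct lame_list *list)
{
    assert(list != NULL);
    while (list->head != NULL) {
        struct lame_node *next = list->head->next;
        free(list->head);
        list->head = next;
    }
    list->visited = 0;
}
```

Starting from an empty list, `lame_list_flood(&list, 1000)` visits 499,500 nodes in total, i.e. n(n-1)/2 for n unique names, so the work grows quadratically with the number of unique subdomains. In the real resolver each of those walks also happens while holding the bucket lock.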
Lab reproduction steps follow. If you are unable to reproduce this and absolutely need a system test, please ask us.
- Create 4 upstream name servers that give lame responses for all subdomains of example.com and valid responses for any other domain.
- Create a fake root name server that gives the aforementioned name servers as delegations for any domain.
- Set up a DNS resolver with recursion enabled and lame_ttl set to 600s.
- Set up a client to send a "good" load of 10K QPS.
- Set up another client to send a 0.5K QPS "bad" load of random, unique subdomains of example.com, which results in lame responses from the upstream servers.
- Once the dns_adblameinfo_t list grows huge, both good and bad queries get dropped as the socket buffers fill up.
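As a rough back-of-the-envelope check on the numbers in these steps, the sketch below estimates how large one server's lame list could get. It assumes, as a worst case, that every "bad" query adds one new entry to that server's list and that entries only disappear when lame_ttl expires. The function names are hypothetical and not part of BIND.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Worst-case number of entries in one server's lame list at steady state:
 * entries arrive at `bad_qps` per second and each lives `lame_ttl_seconds`.
 */
uint64_t lame_entries_upper_bound(uint64_t bad_qps, uint64_t lame_ttl_seconds)
{
    /* Guard against overflow of the product. */
    assert(lame_ttl_seconds == 0 || bad_qps <= UINT64_MAX / lame_ttl_seconds);
    return bad_qps * lame_ttl_seconds;
}

/*
 * Worst-case list nodes visited per second by the "check before insert"
 * walk alone, assuming every new name misses and so scans the full list.
 */
uint64_t marklame_visits_per_second(uint64_t bad_qps, uint64_t lame_ttl_seconds)
{
    uint64_t entries = lame_entries_upper_bound(bad_qps, lame_ttl_seconds);

    assert(entries == 0 || bad_qps <= UINT64_MAX / entries);
    return bad_qps * entries;
}
```

With the values above (500 QPS of bad load, lame_ttl of 600s), the bound is 300,000 entries per server, and the insert path alone could visit up to 150,000,000 list nodes per second under those assumptions, before counting the lookups done for incoming queries.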
Sample ADB dump snippet below:
; ns0.lamezone.com [v4 TTL 9] [v4 success] [v6 unexpected]
; x.x.x.x [srtt 14886] [flags 00000008] [edns 0/0/0/0/0] [plain 243/0] [udpsize 512] [ttl 1381]
; lame7b666f80772a3ca0.lamezone.com A [lame TTL 600]
; lame2a66477284602834.lamezone.com A [lame TTL 600]
; lame13d22bfabedd01ec.lamezone.com A [lame TTL 600]
; lameb80ed1378a08573c.lamezone.com A [lame TTL 600]
; lame00e62ac39a71892d.lamezone.com A [lame TTL 600]
; lame89b523316ebed0c5.lamezone.com A [lame TTL 600]
; lame8cd0d0dbef312217.lamezone.com A [lame TTL 600]
; lame9706e9b7d1efbc99.lamezone.com A [lame TTL 600]
; lame9032e2c6e17d6641.lamezone.com A [lame TTL 599]
; lame7ad64e7c0ab85626.lamezone.com A [lame TTL 599]
; lamed06b237517ce8645.lamezone.com A [lame TTL 599]
; lamee2060685a923fda7.lamezone.com A [lame TTL 599]
; lame343ee8665dcea995.lamezone.com A [lame TTL 599]
; lameb977875001a8b8fb.lamezone.com A [lame TTL 599]
; lame2310dcd40d94a0f2.lamezone.com A [lame TTL 599]
; lame483fe23ba5a988cb.lamezone.com A [lame TTL 599]
; lameda3f75bdb0c831ed.lamezone.com A [lame TTL 599]
; lameced50ee825424244.lamezone.com A [lame TTL 599]
; lame3bb661710d73d83b.lamezone.com A [lame TTL 599]
; lame6c55f727a1a6ddeb.lamezone.com A [lame TTL 599]
; lame0caa15b502515d55.lamezone.com A [lame TTL 599]
; lame5fc20a3d0a323793.lamezone.com A [lame TTL 599]
; lame5fa76a80eb12d174.lamezone.com A [lame TTL 599]
; lame54056775b52b26c6.lamezone.com A [lame TTL 599]
; lamea44305ab4775bc9d.lamezone.com A [lame TTL 599]
; lame5378076402e48972.lamezone.com A [lame TTL 599]
; lamedcee9bf7bc559c51.lamezone.com A [lame TTL 599]
; lame393f530f9243b217.lamezone.com A [lame TTL 599]
; lamea6eed84f4c259e21.lamezone.com A [lame TTL 599]
; lamea60557f44db138dc.lamezone.com A [lame TTL 599]
; lame021df6990bfc90b5.lamezone.com A [lame TTL 599]
; lame7bc389882ce41950.lamezone.com A [lame TTL 599]
; lame3648acfec5264795.lamezone.com A [lame TTL 599]
; lamee2874e4794550f45.lamezone.com A [lame TTL 599]
; lame57c91be78ddec9e8.lamezone.com A [lame TTL 599]
; lame9abfe14a3d21195d.lamezone.com A [lame TTL 599]
; lame5cc66b7c9fc9a50b.lamezone.com A [lame TTL 599]
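When watching a reproduction, it can help to count how many lame entries appear in a dump like the one above. The following is a minimal sketch that counts occurrences of the `[lame TTL` marker in a dump already loaded into memory. `count_lame_entries()` is a hypothetical helper, and it assumes the dump format matches the snippet shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Count occurrences of the "[lame TTL" marker in an ADB dump held in a
 * NUL-terminated string. In the snippet above each lame entry line
 * contains the marker exactly once.
 */
size_t count_lame_entries(const char *dump)
{
    static const char marker[] = "[lame TTL";
    size_t count = 0;

    assert(dump != NULL);
    for (const char *p = strstr(dump, marker); p != NULL;
         p = strstr(p + sizeof(marker) - 1, marker))
        count++;
    return count;
}
```

Calling it on successive dumps taken during the "bad" load gives a quick view of how fast the per-server lists are growing.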
Sample lame response: AA bit is unset, NO ERROR, NO DATA, AUTHORITY RRS > 0
Domain Name System (response)
Transaction ID: 0x9754
Flags: 0x8000 Standard query response, No error
1... .... .... .... = Response: Message is a response
.000 0... .... .... = Opcode: Standard query (0)
.... .0.. .... .... = Authoritative: Server is not an authority for domain
.... ..0. .... .... = Truncated: Message is not truncated
.... ...0 .... .... = Recursion desired: Don't do query recursively
.... .... 0... .... = Recursion available: Server can't do recursive queries
.... .... .0.. .... = Z: reserved (0)
.... .... ..0. .... = Answer authenticated: Answer/authority portion was not authenticated by the server
.... .... ...0 .... = Non-authenticated data: Unacceptable
.... .... .... 0000 = Reply code: No error (0)
Questions: 1
Answer RRs: 0
Authority RRs: 4
Additional RRs: 4
Queries
Authoritative nameservers
lamezone.com: type NS, class IN, ns ns0.lamezone.com
lamezone.com: type NS, class IN, ns ns1.lamezone.com
lamezone.com: type NS, class IN, ns ns2.lamezone.com
lamezone.com: type NS, class IN, ns ns3.lamezone.com
Additional records
ns0.lamezone.com: type A, class IN, addr 1.0.70.93
ns1.lamezone.com: type A, class IN, addr 1.0.157.163
ns2.lamezone.com: type A, class IN, addr 1.0.184.154
ns3.lamezone.com: type A, class IN, addr 1.0.66.115
[Request In: 23]
[Time: 0.000326587 seconds]
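The header-level pattern in the sample above (response, AA unset, RCODE 0, no answers, at least one authority record) can be checked directly against the 12-byte DNS header. The sketch below does only that. `looks_lame_header()` is a hypothetical helper, not BIND's lame-detection logic, which inspects more than the header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 16-bit big-endian (network order) value. */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Return true if the DNS header matches the pattern from the sample:
 * QR set, AA clear, RCODE == 0 (NOERROR), ANCOUNT == 0, NSCOUNT > 0.
 * Header layout (RFC 1035): ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT,
 * each 16 bits wide.
 */
bool looks_lame_header(const uint8_t *msg, size_t len)
{
    assert(msg != NULL);
    if (len < 12)
        return false; /* too short to contain a full header */

    uint16_t flags = read_be16(msg + 2);
    uint16_t ancount = read_be16(msg + 6);
    uint16_t nscount = read_be16(msg + 8);

    bool is_response = (flags & 0x8000) != 0; /* QR bit */
    bool authoritative = (flags & 0x0400) != 0; /* AA bit */
    unsigned rcode = flags & 0x000F;

    return is_response && !authoritative && rcode == 0 &&
           ancount == 0 && nscount > 0;
}
```

For the sample message (flags 0x8000, 0 answer RRs, 4 authority RRs) this returns true. The same header with the AA bit set (flags 0x8400) returns false.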
Below are a couple of the major functions that traverse the entire dns_adblameinfo_t list:
- entry_is_lame(), which is invoked when an incoming query is received and recursion is to be performed. This function traverses the complete list to check whether lame_ttl has expired for any (qname, qtype) entry and whether the incoming (qname, qtype) matches any entry in the list.
/*
 * Entry bucket MUST be locked!
 */
static isc_boolean_t
entry_is_lame(dns_adb_t *adb, dns_adbentry_t *entry, dns_name_t *qname,
              dns_rdatatype_t qtype, isc_stdtime_t now)
{
    dns_adblameinfo_t *li, *next_li;
    isc_boolean_t is_bad;
    is_bad = ISC_FALSE;
    li = ISC_LIST_HEAD(entry->lameinfo);
    if (li == NULL)
        return (ISC_FALSE);
    while (li != NULL) {
        next_li = ISC_LIST_NEXT(li, plink);
        /*
         * Has the entry expired?
         */
        if (li->lame_timer < now) {
            ISC_LIST_UNLINK(entry->lameinfo, li, plink);
            free_adblameinfo(adb, &li);
        }
        /*
         * Order tests from least to most expensive.
         *
         * We do not break out of the main loop here as
         * we use the loop for house keeping.
         */
        if (li != NULL && !is_bad && li->qtype == qtype &&
            dns_name_equal(qname, &li->qname))
            is_bad = ISC_TRUE;
        li = next_li;
    }
    return (is_bad);
}
- dns_adb_marklame(), which is invoked when an upstream response is found to be lame and a (qname, qtype) entry is added for the server. Before adding an entry, it traverses the entire list to check for an existing one.
isc_result_t
dns_adb_marklame(dns_adb_t *adb, dns_adbaddrinfo_t *addr, dns_name_t *qname,
                 dns_rdatatype_t qtype, isc_stdtime_t expire_time)
{
    dns_adblameinfo_t *li;
    int bucket;
    isc_result_t result = ISC_R_SUCCESS;
    REQUIRE(DNS_ADB_VALID(adb));
    REQUIRE(DNS_ADBADDRINFO_VALID(addr));
    REQUIRE(qname != NULL);
    bucket = addr->entry->lock_bucket;
    LOCK(&adb->entrylocks[bucket]);
    li = ISC_LIST_HEAD(addr->entry->lameinfo);
    while (li != NULL &&
           (li->qtype != qtype || !dns_name_equal(qname, &li->qname)))
        li = ISC_LIST_NEXT(li, plink);
    if (li != NULL) {
        if (expire_time > li->lame_timer)
            li->lame_timer = expire_time;
        goto unlock;
    }
    li = new_adblameinfo(adb, qname, qtype);
    if (li == NULL) {
        result = ISC_R_NOMEMORY;
        goto unlock;
    }
    li->lame_timer = expire_time;
    ISC_LIST_PREPEND(addr->entry->lameinfo, li, plink);
 unlock:
    UNLOCK(&adb->entrylocks[bucket]);
    return (result);
}
Pstack traces:
Thread 15 (Thread 0x7fccbbbc3710 (LWP 30236)):
#0 entry_is_lame (adb=0x7fccaf4d4010, entry=0x0, qname=0x7fc911524052, qtype=23682, now=1623806085) at adb.c:2183
#1 0x00000000004b7c90 in copy_namehook_lists (adb=0x7fccaf4d4010, find=0x7fccb2226a40, qname=<optimized out>, qtype=<optimized out>, name=<optimized out>, now=<optimized out>) at adb.c:2264
#2 0x00000000004c150a in dns_adb_createfind2 (adb=0x7fccaf4d4010, task=<optimized out>, action=<optimized out>, arg=<optimized out>, name=<optimized out>, qname=<optimized out>, qtype=23682, options=207, now=1623806085, target=0x0, port=53, depth=1, qc=0x7fcc80612050, findp=0x7fccbbbc1998) at adb.c:3291
#3 0x00000000005b05b7 in findname (fctx=0x7fcc09264280, name=0x7fccbbbc1cc0, port=<optimized out>, options=<optimized out>, flags=0, now=<optimized out>, overquota=0x7fccbbbc1c90, need_alternate=0x7fccbbbc1c9c, no_addresses=0x7fccbbbc1c98) at resolver.c:3703
#4 0x00000000005b8826 in fctx_getaddresses (fctx=0x7fcc09264280, badcache=<optimized out>) at resolver.c:4017
#5 0x00000000005bb4bc in fctx_try (fctx=0x7fcc09264280, retrying=<optimized out>, badcache=isc_boolean_false) at resolver.c:4401
#6 0x00000000005c27e8 in resquery_response (task=<optimized out>, event=<optimized out>) at resolver.c:10182
#7 0x0000000000750133 in dispatch (manager=<optimized out>) at task.c:1180
#8 run (uap=<optimized out>) at task.c:1352
#9 0x00007fcf28758fc9 in start_thread (arg=<optimized out>) at pthread_create.c:297
#10 0x00007fcf216c855d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#11 0x0000000000000000 in ?? ()
Thread 14 (Thread 0x7fccbb3c2710 (LWP 30237)):
#0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1 0x00007fcf2875a93e in _L_lock_949 () from /lib64/libpthread.so.0
#2 0x00007fcf2875a877 in __pthread_mutex_lock (mutex=0x7fc931917ff8) at pthread_mutex_lock.c:101
#3 0x00000000004b28e0 in dns_adb_beginudpfetch (adb=0x7fccaf4d4010, addr=0x80) at adb.c:5000
#4 0x00000000005b950d in fctx_query (fctx=0x7fcc48213d60, addrinfo=0x7fcc8a02f3f8, options=<optimized out>) at resolver.c:2160
#5 0x00000000005bba5f in fctx_try (fctx=0x7fcc48213d60, retrying=<optimized out>, badcache=isc_boolean_false) at resolver.c:4470
#6 0x00000000005c27e8 in resquery_response (task=<optimized out>, event=<optimized out>) at resolver.c:10182
#7 0x0000000000750133 in dispatch (manager=<optimized out>) at task.c:1180
#8 run (uap=<optimized out>) at task.c:1352
#9 0x00007fcf28758fc9 in start_thread (arg=<optimized out>) at pthread_create.c:297
#10 0x00007fcf216c855d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#11 0x0000000000000000 in ?? ()
Thread 13 (Thread 0x7fccbabc1710 (LWP 30238)):
#0 entry_is_lame (adb=0x7fccaf4d4010, entry=0x0, qname=0x7fc91152bd50, qtype=28336, now=1623806085) at adb.c:2183
#1 0x00000000004b7c90 in copy_namehook_lists (adb=0x7fccaf4d4010, find=0x7fcc8c69ee70, qname=<optimized out>, qtype=<optimized out>, name=<optimized out>, now=<optimized out>) at adb.c:2264
#2 0x00000000004c150a in dns_adb_createfind2 (adb=0x7fccaf4d4010, task=<optimized out>, action=<optimized out>, arg=<optimized out>, name=<optimized out>, qname=<optimized out>, qtype=28336, options=207, now=1623806085, target=0x0, port=53, depth=1, qc=0x7fcc8c756fc0, findp=0x7fccbabbf998) at adb.c:3291
#3 0x00000000005b05b7 in findname (fctx=0x7fcc33205be0, name=0x7fccbabbfcc0, port=<optimized out>, options=<optimized out>, flags=0, now=<optimized out>, overquota=0x7fccbabbfc90, need_alternate=0x7fccbabbfc9c, no_addresses=0x7fccbabbfc98) at resolver.c:3703
#4 0x00000000005b8826 in fctx_getaddresses (fctx=0x7fcc33205be0, badcache=<optimized out>) at resolver.c:4017
#5 0x00000000005bb4bc in fctx_try (fctx=0x7fcc33205be0, retrying=<optimized out>, badcache=isc_boolean_false) at resolver.c:4401
#6 0x00000000005c27e8 in resquery_response (task=<optimized out>, event=<optimized out>) at resolver.c:10182
#7 0x0000000000750133 in dispatch (manager=<optimized out>) at task.c:1180
#8 run (uap=<optimized out>) at task.c:1352
#9 0x00007fcf28758fc9 in start_thread (arg=<optimized out>) at pthread_create.c:297
#10 0x00007fcf216c855d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#11 0x0000000000000000 in ?? ()
Thread 12 (Thread 0x7fccba3c0710 (LWP 30239)):
#0 0x00000000004b542b in dns_adb_marklame (adb=0x7fccaf4d4010, addr=0x7fcc94d62030, qname=0x7fcc4cab6f50, qtype=1, expire_time=<optimized out>) at adb.c:4182
#1 0x00000000005c2d6c in resquery_response (task=<optimized out>, event=<optimized out>) at resolver.c:9768
#2 0x0000000000750133 in dispatch (manager=<optimized out>) at task.c:1180
#3 run (uap=<optimized out>) at task.c:1352
#4 0x00007fcf28758fc9 in start_thread (arg=<optimized out>) at pthread_create.c:297
#5 0x00007fcf216c855d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6 0x0000000000000000 in ?? ()
Thread 11 (Thread 0x7fccb9bbf710 (LWP 30240)):
#0 strchrnul () at ../sysdeps/x86_64/strchrnul.S:49
#1 0x00007fcf216477cc in __find_specmb (format=<optimized out>) at printf-parse.h:99
#2 _IO_vfprintf_internal (s=0x7fccb9bbca10, format=<optimized out>, ap=0x7fccb9bbcb90) at vfprintf.c:1619
#3 0x00007fcf2166c041 in _IO_vsnprintf (string=0x7fccb9bbcbb0 "Name voovlive.com (SOA", maxlen=2048, format=0x7ab020 "Name %s (%s) not subdomain of zone %s -- invalid response", args=0x7fccb9bbcb90) at vsnprintf.c:120
#4 0x00000000005b2583 in log_formerr (fctx=0x7fcca5501a70, format=0x220 <error: Cannot access memory at address 0x220>) at resolver.c:5483
#5 0x00000000005ba024 in noanswer_response (fctx=0x7fcca5501a70, oqname=<optimized out>, look_in_options=<optimized out>) at resolver.c:8051
#6 0x00000000005c083e in resquery_response (task=<optimized out>, event=<optimized out>) at resolver.c:9915
#7 0x0000000000750133 in dispatch (manager=<optimized out>) at task.c:1180
#8 run (uap=<optimized out>) at task.c:1352
#9 0x00007fcf28758fc9 in start_thread (arg=<optimized out>) at pthread_create.c:297
#10 0x00007fcf216c855d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#11 0x0000000000000000 in ?? ()
Thread 10 (Thread 0x7fccb93be710 (LWP 30241)):
#0 0x00000000004b542b in dns_adb_marklame (adb=0x7fccaf4d4010, addr=0x7fcc95313ac0, qname=0x7fcbeb7203f0, qtype=1, expire_time=<optimized out>) at adb.c:4182
#1 0x00000000005c2d6c in resquery_response (task=<optimized out>, event=<optimized out>) at resolver.c:9768
#2 0x0000000000750133 in dispatch (manager=<optimized out>) at task.c:1180
#3 run (uap=<optimized out>) at task.c:1352
#4 0x00007fcf28758fc9 in start_thread (arg=<optimized out>) at pthread_create.c:297
#5 0x00007fcf216c855d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6 0x0000000000000000 in ?? ()
Thread 9 (Thread 0x7fccb8bbd710 (LWP 30242)):
#0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1 0x00007fcf2875a93e in _L_lock_949 () from /lib64/libpthread.so.0
#2 0x00007fcf2875a877 in __pthread_mutex_lock (mutex=0x7fc931917ff8) at pthread_mutex_lock.c:101
#3 0x00000000004b28e0 in dns_adb_beginudpfetch (adb=0x7fccaf4d4010, addr=0x80) at adb.c:5000
#4 0x00000000005b950d in fctx_query (fctx=0x7fcaecad1470, addrinfo=0x7fccb22152a0, options=<optimized out>) at resolver.c:2160
#5 0x00000000005bba5f in fctx_try (fctx=0x7fcaecad1470, retrying=<optimized out>, badcache=isc_boolean_false) at resolver.c:4470
#6 0x00000000005c27e8 in resquery_response (task=<optimized out>, event=<optimized out>) at resolver.c:10182
#7 0x0000000000750133 in dispatch (manager=<optimized out>) at task.c:1180
#8 run (uap=<optimized out>) at task.c:1352
#9 0x00007fcf28758fc9 in start_thread (arg=<optimized out>) at pthread_create.c:297
#10 0x00007fcf216c855d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#11 0x0000000000000000 in ?? ()
Thread 8 (Thread 0x7fccb83bc710 (LWP 30243)):
#0 entry_is_lame (adb=0x7fccaf4d4010, entry=0x0, qname=0x7fc91152680d, qtype=16189, now=1623806085) at adb.c:2183
#1 0x00000000004b7c90 in copy_namehook_lists (adb=0x7fccaf4d4010, find=0x7fccb221d640, qname=<optimized out>, qtype=<optimized out>, name=<optimized out>, now=<optimized out>) at adb.c:2264
#2 0x00000000004c150a in dns_adb_createfind2 (adb=0x7fccaf4d4010, task=<optimized out>, action=<optimized out>, arg=<optimized out>, name=<optimized out>, qname=<optimized out>, qtype=16189, options=223, now=1623806085, target=0x0, port=53, depth=1, qc=0x7fcc9a960910, findp=0x7fccb83bb738) at adb.c:3291
#3 0x00000000005b05b7 in findname (fctx=0x7fcc41fa4010, name=0x7fccb83bba60, port=<optimized out>, options=<optimized out>, flags=0, now=<optimized out>, overquota=0x7fccb83bba30, need_alternate=0x7fccb83bba3c, no_addresses=0x7fccb83bba38) at resolver.c:3703
#4 0x00000000005b8826 in fctx_getaddresses (fctx=0x7fcc41fa4010, badcache=<optimized out>) at resolver.c:4017
#5 0x00000000005bb4bc in fctx_try (fctx=0x7fcc41fa4010, retrying=<optimized out>, badcache=isc_boolean_false) at resolver.c:4401
#6 0x00000000005bc0f0 in fctx_start (task=<optimized out>, event=<optimized out>) at resolver.c:4981
#7 0x0000000000750133 in dispatch (manager=<optimized out>) at task.c:1180
#8 run (uap=<optimized out>) at task.c:1352
#9 0x00007fcf28758fc9 in start_thread (arg=<optimized out>) at pthread_create.c:297
#10 0x00007fcf216c855d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#11 0x0000000000000000 in ?? ()
</details>