TTL-based cache cleaning in rbtdb ineffective (M/N problem)
Copying stuff from MatterMost for permanent record:
How it started?
@ondrej: But I don't know actually - if there's something blocking the top of the heap to be cleaned up (dunno why yet), it could block cleaning everything else behind it
So, here's a summary of our finding
The TTL based cleaning is opportunistic. When we are inserting a new node, we look at the top of the heap ("list sorted by TTL-value") and if TTL of the top of the heap is expired, we try to eradicate the data it stores from the cache. The code that decides whether we will do the TTL-based cleaning looks like this:
header = isc_heap_element(rbtdb->heaps[rbtnode->locknum], 1);
if (header != NULL) {
dns_ttl_t rdh_ttl = header->rdh_ttl;
/* Only account for stale TTL if cache is not overmem */
if (!cache_is_overmem) {
rdh_ttl += STALE_TTL(header, rbtdb);
}
if (rdh_ttl < now - RBTDB_VIRTUAL) {
expire_header(rbtdb, header, tree_locked,
expire_ttl);
}
}
Now, the RBTDB_VIRTUAL is this:
/*
* Allow clients with a virtual time of up to 5 minutes in the past to see
* records that would have otherwise have expired.
*/
#define RBTDB_VIRTUAL 300
Which means that even with 0 TTL records, the TTL-based cleaning will not kick-in until 5 minutes has passed.
Now what happens if you insert N records (where N could be large) within the first 5 minutes (cold bootstrap)?
After 5 minutes the TTL based cleaning should kick-in, but that's only if everything in the cache has expired, but even with that.
If we insert M records in the next minute, only M records will gets cleaned at maximum.
But anything that has longer TTL will just add to N, so the backlog of the records that are not subject to the TTL-based cleaning will build up over time until we are overmem and LRU based cleaning kicks in.
Additionally, @ondrej found out that there are records with rdh_ttl
set to 0
if you just run dnsperf
on a cold cache resolver, which made us puzzled - shouldn't we be cleaning records only after 5 minutes?
@michal did some excellent work and found out following:
so, i looked at those "zero TTL" frees that happen very shortly into the start of regular resolver operation
turns out that these are "superseded" headers
now, i don't understand why RBTDB does that. but it's this branch in add32() that expires these headers: https://gitlab.isc.org/isc-projects/bind9/-/blob/8fb49c5a8acdb37d4f4be0c5f5b19645d0103f48/lib/dns/rbtdb.c#L6677-6714
basically, AFAICT, what happens here is:
resolver: hey, RBTDB, add this shiny new SOA rdataset header to the cache RBTDB: oh okay, let me see if i have a SOA header on that node already... RBTDB: yup, got one. i'll just expire the one i had in there before (topheader) and insert the new one you gave me (newheader).
what i don't understand is why it just doesn't leave that old (identical, AFAICT) rdataset intact. but i'll move on with the reasoning for now.
so, the "previously-topmost" SOA header has its TTL set to zero, so it lands on the top of one of the TTL heaps
once we add a new rdataset, it should be cleaned up. but, as we already established yesterday, that only happens if the reference count of its node is zero.
meanwhile, in my very basic test (literally seconds into firing up dnsperf with the query set we use for respdiff tests, at some 1 kQPS), those rdatasets at the top of the heap are for nodes like net. or com. whose reference counts are around 60, or maybe 80 references. and until that count goes to zero, the first rdataset on the TTL heap will not disappear from there, blocking any rdataset headers on the same TTL heap from being cleaned up as well (backlog)
so, i thought "how bad is it and can we measure it?" and oh, boy, he could:
20-Feb-2024 10:46:29.561 connection refused resolving 'geo.fortinet.net/NS/IN': 104.198.248.186#53
20-Feb-2024 10:46:29.564 DNS format error from 203.205.195.75#53 resolving ns3.qq.com/AAAA for <unknown>: Name qq.com (SOA) not subdomain of zone ns3.qq.com -- invalid response
20-Feb-2024 10:46:29.564 FORMERR resolving 'ns3.qq.com/AAAA/IN': 203.205.195.75#53
20-Feb-2024 10:46:29.564 DNS format error from 108.136.87.44#53 resolving api.cloud.huawei.com/AAAA for 127.0.0.1#34840: Name huawei.com (SOA) not subdomain of zone cloud.huawei.com -- invalid response
20-Feb-2024 10:46:29.564 FORMERR resolving 'api.cloud.huawei.com/AAAA/IN': 108.136.87.44#53
-- 0 msecs since expiry
-- 0 msecs since expiry
-- 106 msecs since expiry
-- 106 msecs since expiry
-- 29 msecs since expiry
-- 29 msecs since expiry
20-Feb-2024 10:46:29.574 success resolving 'a.karma.sc2.fsapi.com/A' after disabling qname minimization due to 'ncache nxdomain'
-- 243 msecs since expiry
-- 246 msecs since expiry
-- 249 msecs since expiry
-- 433 msecs since expiry
-- 806 msecs since expiry
-- 813 msecs since expiry
-- 243 msecs since expiry
-- 246 msecs since expiry
-- 249 msecs since expiry
-- 433 msecs since expiry
-- 806 msecs since expiry
-- 0 msecs since expiry
-- 816 msecs since expiry
-- 816 msecs since expiry
-- 123 msecs since expiry
-- 323 msecs since expiry
-- 123 msecs since expiry
-- 323 msecs since expiry
20-Feb-2024 10:46:29.601 success resolving 'F00DCFRIBM41.zf.if.atcsg.net/A' after disabling qname minimization due to 'ncache nxdomain'
20-Feb-2024 10:46:29.624 REFUSED unexpected RCODE resolving 'cloud.Jensen.com.cn/NS/IN': 47.118.199.194#53
-- 0 msecs since expiry
-- 0 msecs since expiry
20-Feb-2024 10:46:29.677 DNS format error from 3.97.163.50#53 resolving api.cloud.huawei.com/AAAA for 127.0.0.1#34840: Name huawei.com (SOA) not subdomain of zone cloud.huawei.com -- invalid response
20-Feb-2024 10:46:29.677 FORMERR resolving 'api.cloud.huawei.com/AAAA/IN': 3.97.163.50#53
-- 6 msecs since expiry
-- 3 msecs since expiry
-- 9 msecs since expiry
-- 0 msecs since expiry
20-Feb-2024 10:46:29.801 REFUSED unexpected RCODE resolving 'cloud.Jensen.com.cn/NS/IN': 39.96.153.33#53
-- 3 msecs since expiry
-- 3 msecs since expiry
-- 46 msecs since expiry
-- 19 msecs since expiry
-- 0 msecs since expiry
-- 39 msecs since expiry
-- 0 msecs since expiry
-- 0 msecs since expiry
-- 3 msecs since expiry
-- 3 msecs since expiry
-- 126 msecs since expiry
-- 0 msecs since expiry
-- 0 msecs since expiry
-- 0 msecs since expiry
20-Feb-2024 10:46:30.104 REFUSED unexpected RCODE resolving 'cloud.Jensen.com.cn/NS/IN': 139.224.142.104#53
-- 286 msecs since expiry
-- 0 msecs since expiry
-- 69 msecs since expiry
-- 13 msecs since expiry
20-Feb-2024 10:46:30.307 success resolving 'hcdnd.csfw.c.cdnhwc6.com/AAAA' after disabling qname minimization due to 'ncache nxdomain'
-- 0 msecs since expiry
-- 199 msecs since expiry
-- 269 msecs since expiry
-- 656 msecs since expiry
-- 689 msecs since expiry
-- 746 msecs since expiry
-- 789 msecs since expiry
-- 816 msecs since expiry
-- 819 msecs since expiry
-- 819 msecs since expiry
-- 839 msecs since expiry
-- 853 msecs since expiry
-- 873 msecs since expiry
-- 876 msecs since expiry
-- 889 msecs since expiry
-- 896 msecs since expiry
-- 896 msecs since expiry
-- 916 msecs since expiry
-- 936 msecs since expiry
-- 956 msecs since expiry
-- 969 msecs since expiry
-- 979 msecs since expiry
-- 983 msecs since expiry
-- 999 msecs since expiry
-- 999 msecs since expiry
-- 1006 msecs since expiry
-- 1006 msecs since expiry
-- 1053 msecs since expiry
-- 1053 msecs since expiry
-- 1063 msecs since expiry
-- 1066 msecs since expiry
-- 1073 msecs since expiry
-- 1079 msecs since expiry
-- 1083 msecs since expiry
-- 1089 msecs since expiry
-- 1096 msecs since expiry
-- 1133 msecs since expiry
-- 1139 msecs since expiry
-- 1156 msecs since expiry
-- 1163 msecs since expiry
-- 1166 msecs since expiry
-- 1179 msecs since expiry
-- 1183 msecs since expiry
-- 1189 msecs since expiry
-- 1193 msecs since expiry
-- 1196 msecs since expiry
-- 1199 msecs since expiry
-- 1223 msecs since expiry
-- 1223 msecs since expiry
-- 1249 msecs since expiry
-- 1259 msecs since expiry
-- 1276 msecs since expiry
-- 1293 msecs since expiry
-- 1336 msecs since expiry
-- 1363 msecs since expiry
-- 1369 msecs since expiry
-- 1379 msecs since expiry
-- 1393 msecs since expiry
-- 1396 msecs since expiry
-- 1399 msecs since expiry
-- 1403 msecs since expiry
-- 1423 msecs since expiry
-- 1429 msecs since expiry
-- 1436 msecs since expiry
-- 1436 msecs since expiry
-- 1439 msecs since expiry
-- 1446 msecs since expiry
-- 1446 msecs since expiry
-- 1466 msecs since expiry
-- 1466 msecs since expiry
-- 1469 msecs since expiry
-- 1473 msecs since expiry
-- 1486 msecs since expiry
-- 1503 msecs since expiry
-- 1506 msecs since expiry
-- 1516 msecs since expiry
-- 1539 msecs since expiry
-- 1539 msecs since expiry
-- 1556 msecs since expiry
-- 1566 msecs since expiry
-- 1569 msecs since expiry
-- 1589 msecs since expiry
-- 1623 msecs since expiry
-- 1623 msecs since expiry
-- 1629 msecs since expiry
-- 1636 msecs since expiry
-- 1636 msecs since expiry
-- 1649 msecs since expiry
-- 1679 msecs since expiry
-- 1686 msecs since expiry
-- 1703 msecs since expiry
-- 1703 msecs since expiry
-- 1716 msecs since expiry
-- 1726 msecs since expiry
-- 1736 msecs since expiry
-- 1773 msecs since expiry
-- 1796 msecs since expiry
-- 1796 msecs since expiry
-- 1799 msecs since expiry
-- 1803 msecs since expiry
-- 1829 msecs since expiry
-- 1839 msecs since expiry
-- 1876 msecs since expiry
-- 1879 msecs since expiry
-- 1883 msecs since expiry
-- 1889 msecs since expiry
-- 1889 msecs since expiry
-- 1889 msecs since expiry
-- 1899 msecs since expiry
-- 1946 msecs since expiry
-- 1956 msecs since expiry
-- 1983 msecs since expiry
-- 2013 msecs since expiry
-- 2013 msecs since expiry
-- 2023 msecs since expiry
-- 2049 msecs since expiry
-- 2059 msecs since expiry
-- 2063 msecs since expiry
-- 2079 msecs since expiry
-- 2106 msecs since expiry
-- 2109 msecs since expiry
-- 2119 msecs since expiry
-- 2136 msecs since expiry
-- 2156 msecs since expiry
-- 2183 msecs since expiry
-- 2199 msecs since expiry
-- 2203 msecs since expiry
-- 2219 msecs since expiry
-- 2223 msecs since expiry
-- 2229 msecs since expiry
-- 2243 msecs since expiry
-- 2253 msecs since expiry
-- 2263 msecs since expiry
-- 2286 msecs since expiry
-- 2303 msecs since expiry
-- 2343 msecs since expiry
-- 2356 msecs since expiry
-- 2376 msecs since expiry
-- 2376 msecs since expiry
-- 2379 msecs since expiry
-- 2386 msecs since expiry
-- 2416 msecs since expiry
-- 2426 msecs since expiry
-- 2433 msecs since expiry
-- 2449 msecs since expiry
-- 2466 msecs since expiry
-- 2479 msecs since expiry
-- 2509 msecs since expiry
-- 2516 msecs since expiry
-- 2523 msecs since expiry
-- 2529 msecs since expiry
-- 2536 msecs since expiry
-- 2539 msecs since expiry
-- 2539 msecs since expiry
-- 2543 msecs since expiry
-- 2553 msecs since expiry
-- 2559 msecs since expiry
-- 2569 msecs since expiry
-- 2593 msecs since expiry
-- 2609 msecs since expiry
-- 2616 msecs since expiry
-- 2623 msecs since expiry
-- 2629 msecs since expiry
-- 2656 msecs since expiry
-- 2659 msecs since expiry
-- 2673 msecs since expiry
-- 2716 msecs since expiry
-- 2766 msecs since expiry
-- 2773 msecs since expiry
-- 2773 msecs since expiry
-- 2776 msecs since expiry
-- 2783 msecs since expiry
-- 2789 msecs since expiry
-- 2823 msecs since expiry
-- 2843 msecs since expiry
-- 2879 msecs since expiry
-- 2879 msecs since expiry
-- 2886 msecs since expiry
-- 2899 msecs since expiry
-- 2919 msecs since expiry
-- 2926 msecs since expiry
-- 2949 msecs since expiry
-- 2956 msecs since expiry
-- 2956 msecs since expiry
-- 2956 msecs since expiry
-- 2979 msecs since expiry
-- 2983 msecs since expiry
-- 2983 msecs since expiry
-- 2986 msecs since expiry
-- 3033 msecs since expiry
-- 3039 msecs since expiry
-- 3049 msecs since expiry
-- 3066 msecs since expiry
-- 3086 msecs since expiry
-- 3126 msecs since expiry
-- 3129 msecs since expiry
-- 3156 msecs since expiry
-- 3169 msecs since expiry
-- 3173 msecs since expiry
-- 3176 msecs since expiry
-- 3226 msecs since expiry
-- 3253 msecs since expiry
-- 3266 msecs since expiry
-- 3269 msecs since expiry
-- 3279 msecs since expiry
-- 3279 msecs since expiry
-- 3299 msecs since expiry
-- 3323 msecs since expiry
-- 3323 msecs since expiry
-- 3336 msecs since expiry
-- 3336 msecs since expiry
-- 3363 msecs since expiry
-- 3366 msecs since expiry
-- 3396 msecs since expiry
-- 3399 msecs since expiry
-- 3409 msecs since expiry
-- 3449 msecs since expiry
-- 3463 msecs since expiry
-- 3466 msecs since expiry
-- 3499 msecs since expiry
-- 0 msecs since expiry
-- 199 msecs since expiry
-- 269 msecs since expiry
-- 656 msecs since expiry
-- 689 msecs since expiry
-- 746 msecs since expiry
-- 789 msecs since expiry
-- 816 msecs since expiry
-- 819 msecs since expiry
-- 819 msecs since expiry
-- 839 msecs since expiry
-- 853 msecs since expiry
-- 873 msecs since expiry
-- 876 msecs since expiry
-- 889 msecs since expiry
-- 896 msecs since expiry
-- 896 msecs since expiry
-- 916 msecs since expiry
-- 936 msecs since expiry
-- 956 msecs since expiry
-- 969 msecs since expiry
-- 979 msecs since expiry
-- 983 msecs since expiry
-- 999 msecs since expiry
-- 999 msecs since expiry
-- 1006 msecs since expiry
-- 1006 msecs since expiry
-- 1053 msecs since expiry
-- 1053 msecs since expiry
-- 1063 msecs since expiry
-- 1066 msecs since expiry
-- 1073 msecs since expiry
-- 1079 msecs since expiry
-- 1083 msecs since expiry
-- 1089 msecs since expiry
-- 1096 msecs since expiry
-- 1133 msecs since expiry
-- 1139 msecs since expiry
-- 1156 msecs since expiry
-- 1163 msecs since expiry
-- 1166 msecs since expiry
-- 1179 msecs since expiry
-- 1183 msecs since expiry
-- 1189 msecs since expiry
-- 1196 msecs since expiry
-- 1199 msecs since expiry
-- 1203 msecs since expiry
-- 1226 msecs since expiry
-- 1226 msecs since expiry
-- 1253 msecs since expiry
-- 1263 msecs since expiry
-- 1279 msecs since expiry
-- 1296 msecs since expiry
-- 1339 msecs since expiry
-- 1366 msecs since expiry
-- 1373 msecs since expiry
-- 1383 msecs since expiry
-- 1396 msecs since expiry
-- 1399 msecs since expiry
-- 1403 msecs since expiry
-- 1406 msecs since expiry
-- 1426 msecs since expiry
-- 1433 msecs since expiry
-- 1439 msecs since expiry
-- 1439 msecs since expiry
-- 1443 msecs since expiry
-- 1449 msecs since expiry
-- 1449 msecs since expiry
-- 1469 msecs since expiry
-- 1469 msecs since expiry
-- 1473 msecs since expiry
-- 1476 msecs since expiry
-- 1489 msecs since expiry
-- 1506 msecs since expiry
-- 1509 msecs since expiry
-- 1519 msecs since expiry
-- 1543 msecs since expiry
-- 1543 msecs since expiry
-- 1559 msecs since expiry
-- 1569 msecs since expiry
-- 1573 msecs since expiry
-- 1593 msecs since expiry
-- 1626 msecs since expiry
-- 1626 msecs since expiry
-- 1633 msecs since expiry
-- 1639 msecs since expiry
-- 1639 msecs since expiry
-- 1653 msecs since expiry
-- 1683 msecs since expiry
-- 1689 msecs since expiry
-- 1706 msecs since expiry
-- 1706 msecs since expiry
-- 1719 msecs since expiry
-- 1729 msecs since expiry
-- 1739 msecs since expiry
-- 1776 msecs since expiry
-- 1799 msecs since expiry
-- 1799 msecs since expiry
-- 1803 msecs since expiry
-- 1806 msecs since expiry
-- 1833 msecs since expiry
-- 1843 msecs since expiry
-- 1879 msecs since expiry
-- 1883 msecs since expiry
-- 1886 msecs since expiry
-- 1893 msecs since expiry
-- 1893 msecs since expiry
-- 1893 msecs since expiry
-- 1903 msecs since expiry
-- 1949 msecs since expiry
-- 1959 msecs since expiry
-- 1986 msecs since expiry
-- 2016 msecs since expiry
-- 2016 msecs since expiry
-- 2026 msecs since expiry
-- 2053 msecs since expiry
-- 2063 msecs since expiry
-- 2066 msecs since expiry
-- 2083 msecs since expiry
-- 2109 msecs since expiry
-- 2113 msecs since expiry
-- 2123 msecs since expiry
-- 2139 msecs since expiry
-- 2159 msecs since expiry
-- 2186 msecs since expiry
-- 2203 msecs since expiry
-- 2206 msecs since expiry
-- 2223 msecs since expiry
-- 2226 msecs since expiry
-- 2233 msecs since expiry
-- 2246 msecs since expiry
-- 2256 msecs since expiry
-- 2266 msecs since expiry
-- 2289 msecs since expiry
-- 2306 msecs since expiry
-- 2346 msecs since expiry
-- 2359 msecs since expiry
-- 2379 msecs since expiry
-- 2379 msecs since expiry
-- 2383 msecs since expiry
-- 2389 msecs since expiry
-- 2419 msecs since expiry
-- 2429 msecs since expiry
-- 2436 msecs since expiry
-- 2453 msecs since expiry
-- 2469 msecs since expiry
-- 2483 msecs since expiry
-- 2513 msecs since expiry
-- 2519 msecs since expiry
-- 2526 msecs since expiry
-- 2533 msecs since expiry
-- 2539 msecs since expiry
-- 2543 msecs since expiry
-- 2543 msecs since expiry
-- 2546 msecs since expiry
-- 2556 msecs since expiry
-- 2563 msecs since expiry
-- 2573 msecs since expiry
-- 2596 msecs since expiry
-- 2613 msecs since expiry
-- 2619 msecs since expiry
-- 2626 msecs since expiry
-- 2633 msecs since expiry
-- 2659 msecs since expiry
-- 2663 msecs since expiry
-- 2676 msecs since expiry
-- 2719 msecs since expiry
-- 2769 msecs since expiry
-- 2776 msecs since expiry
-- 2776 msecs since expiry
-- 2779 msecs since expiry
-- 2786 msecs since expiry
-- 2793 msecs since expiry
-- 2826 msecs since expiry
-- 2846 msecs since expiry
-- 2883 msecs since expiry
-- 2883 msecs since expiry
-- 2889 msecs since expiry
-- 2903 msecs since expiry
-- 2923 msecs since expiry
-- 2929 msecs since expiry
-- 2956 msecs since expiry
-- 2959 msecs since expiry
-- 2959 msecs since expiry
-- 2959 msecs since expiry
-- 2983 msecs since expiry
-- 2986 msecs since expiry
-- 2986 msecs since expiry
-- 2989 msecs since expiry
-- 3036 msecs since expiry
-- 3043 msecs since expiry
-- 3053 msecs since expiry
-- 3069 msecs since expiry
-- 3089 msecs since expiry
-- 3129 msecs since expiry
-- 3133 msecs since expiry
-- 3159 msecs since expiry
-- 3173 msecs since expiry
-- 3176 msecs since expiry
-- 3179 msecs since expiry
-- 3229 msecs since expiry
-- 3256 msecs since expiry
-- 3269 msecs since expiry
-- 3273 msecs since expiry
-- 3283 msecs since expiry
-- 3283 msecs since expiry
-- 3303 msecs since expiry
-- 3326 msecs since expiry
-- 3326 msecs since expiry
-- 3339 msecs since expiry
-- 3339 msecs since expiry
-- 3366 msecs since expiry
-- 3369 msecs since expiry
-- 3399 msecs since expiry
-- 3403 msecs since expiry
-- 3413 msecs since expiry
-- 3453 msecs since expiry
-- 3466 msecs since expiry
-- 3469 msecs since expiry
-- 3 msecs since expiry
20-Feb-2024 10:46:30.324 REFUSED unexpected RCODE resolving 'cloud.Jensen.com.cn/NS/IN': 39.96.153.53#53
-- 0 msecs since expiry
-- 0 msecs since expiry
20-Feb-2024 10:46:30.524 REFUSED unexpected RCODE resolving 'cloud.Jensen.com.cn/NS/IN': 39.96.153.32#53
-- 0 msecs since expiry
So, there are multiple issues circling around the TTL-based cleaning.