Did we set the max-recursion-queries limit too low in our CVE-2020-8616 fix?
A Support customer has reported to us that the first time they query google.de on a server with a cold cache they get a SERVFAIL. Subsequent queries succeed. They asked us about this because the behavior differed from an earlier version of BIND which did not exhibit the issue.
After some troubleshooting, it turns out that, due to a combination of factors -- some specific to the domain but also some applying to the server, they are hitting the max-recursion-queries limit of 75 queries that was set as part of the remediation for CVE-2020-8616 (intended to prevent an exploit, demonstrated as a proof-of-concept by the researchers who reported it, that could send a server chasing a huge number of queries when processing a referral.)
The situation reported by the customer seems to demonstrate that a server with a not-very-unusual configuration can hit the limit while processing a common, fairly high-profile zone. Should we then adjust the limit, make changes to the log message visibility when the limit is hit, and/or make other changes?