[ISC-support #13383] Add RRset size-limiting option to protect masters from rogue dynamic updates and slaves from damaging IXFR/AXFRs
Based on https://support.isc.org/Ticket/Display.html?id=13383
The issue is that although the DNS protocol does not restrict how many RRs can exist in a single RRset, once the number of RRs becomes very large, both dynamic updates and inbound IXFRs become highly CPU-intensive. Encountering oversized RRsets in a production environment is almost certainly the result of a rogue client or a poorly-considered configuration/design, but the outcome for any DNS server having to maintain oversized RRsets can be significantly degraded performance.
(This scenario is similar to the one that led to the introduction of the BIND option max-records, which protects servers hosting zones provided by third parties from being overwhelmed by a zone that unexpectedly grows larger than the hosting server can handle.)
Some more background on the case leading to this request: it was observed that in a zone with an RRset of 4K RRs of type A for the same hostname, adding a new RR with a different TTL can take in excess of several seconds, and the task performing the update consumes a significant amount of CPU while doing so.
The reason why adding an additional RR causes so much internal zone maintenance activity is complex, but it is directly due to the TTL change and what this implies for maintenance of the .jnl file. The journal is used both to satisfy outbound IXFR requests and to roll forward/load the zone locally after an unexpected interruption of named (a normal rndc stop will write the current version of the zone to disk, but a SIGTERM or crash won't; see also the option flush-zones-on-shutdown, which defaults to "no" and can influence this).
When the TTL of one RR in the RRset changes, all of the other RRs in the set have to be updated to adopt the new TTL at the same time (a difference in TTL between members of the same RRset is not permitted). Although the TTL is stored on a per-RRset basis internally in BIND, this is, unfortunately, not the end of the story. When the RRset is updated, named has to generate the set of zone content changes for the .jnl file, and it is here that this becomes not a single RR addition but the deletion of 4K RRs and the addition of 4K+1 RRs. This is exactly what named does when it applies the update to its own copy of the zone.
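To make the journal cost concrete, here is a simplified model of the diff implied by a TTL change on a large RRset. This is illustrative only; the function name and structure are hypothetical, not BIND's actual journal code.

```python
# Simplified model of the .jnl diff produced when one RR with a new TTL
# is added to an existing RRset. Hypothetical sketch, not BIND internals.

def journal_diff_for_ttl_change(rrset_size):
    """Return (deletes, adds) recorded in the journal when an RR with a
    different TTL is added to an RRset already holding rrset_size RRs."""
    # Every existing RR must be deleted at the old TTL...
    deletes = rrset_size
    # ...and re-added at the new TTL, plus the one genuinely new RR.
    adds = rrset_size + 1
    return deletes, adds

# For the 4K-record RRset described in the ticket:
print(journal_diff_for_ttl_change(4096))  # (4096, 4097)
```

So a logically one-record update is recorded (and transferred via IXFR) as thousands of deletions and additions.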
However, this gets worse (and this next part explains why the cost of maintaining large RRsets grows quadratically rather than linearly). When you add an RR to a pre-existing RRset, named has to check each new candidate against all of the other RRs already in the set in order to prevent duplicates. So when adding the 200th RR, it is checked against 199 others, but for the 4,001st, it is checked against 4,000 others.
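A back-of-the-envelope sketch of that duplicate check shows the quadratic growth. Again this is a hypothetical model, not BIND's code: each of the n adds is compared against every RR already in the set, giving n*(n-1)/2 comparisons in total.

```python
# Model of the duplicate check when an RRset is (re)built one RR at a time.
# Hypothetical sketch; BIND's actual data structures differ.

def comparisons_to_build_rrset(n):
    """Total duplicate-check comparisons to add n RRs one by one:
    the i-th add is compared against the i-1 RRs already present."""
    return sum(range(n))  # 0 + 1 + ... + (n-1) = n*(n-1)/2

print(comparisons_to_build_rrset(200))   # 19900
print(comparisons_to_build_rrset(4097))  # 8390656
```

Rewriting a 4K+1-record RRset therefore costs over 400 times as many comparisons as rewriting a 200-record one, not ~20 times.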
Yes, we could potentially optimise this internally for the single case where we know we have already checked the one RR being added, on the basis that the only reason for doing 4K deletes and adds is to change the TTL on the whole set, when really we are just adding one new RR. But that is not going to help a slave server receiving an IXFR of the same update, and it is not going to help this server if it needs to reload/replay the zone update from the .jnl file after being interrupted.
Similarly, just accepting an IXFR that adds 4K+ new RRs to a single RRset is also going to impact named, irrespective of the TTLs on those RRs. (Basically, oversized RRsets are bad news for any server that properly checks the contents of the zones it is loading/updating.)
Therefore, going back to the request for a new option: it is clearly bordering on insane to have RRsets of this size. DNS certainly wasn't designed to handle them (think of the query responses and the clients!), and BIND certainly wasn't optimised internally for this extreme scenario. So please could we have a new option (with sane defaults on a per-RTYPE basis) to prevent accidental craziness that degrades performance.
Something like:
max-rrset [rrtype] integer;
(Defaults to be discussed... but they're potentially going to be different for PTR vs other RTYPEs).
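For example, the proposed option might look like this in named.conf (hypothetical syntax and entirely made-up limits, since the defaults are still to be discussed):

```
options {
    // Hypothetical: cap any single RRset at 100 RRs by default
    max-rrset 100;
    // Hypothetical: allow larger PTR RRsets, which legitimately grow bigger
    max-rrset PTR 500;
};
```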