The XFR unreachable cache redesign
The unreachable cache for dead primaries was added to BIND 9 in 2006 via 1372e172. It is a 10-slot LRU array with a fixed hold time of 600 seconds (10 minutes). During that time, any primary that had even a brief hiccup stays blocked for the entire hold time (unless its slot is overwritten by a different dead primary).
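A simplified model of the cache described above (10 slots, LRU replacement, fixed 600-second hold) may make the behavior easier to see. This is a Python sketch, not BIND's actual C implementation. The class and method names (`UnreachableCache`, `add`, `is_unreachable`) are hypothetical, and reusing an already-expired slot before evicting the least recently used entry is an assumption about the replacement policy.

```python
from dataclasses import dataclass

SLOTS = 10        # fixed number of slots
HOLD_TIME = 600   # fixed hold time in seconds (10 minutes)


@dataclass
class Entry:
    primary: str
    expire: float  # time at which the primary is no longer blocked
    last: float    # last time the entry was added or hit (used for LRU)


class UnreachableCache:
    """Hypothetical model of a fixed-size unreachable-primary cache."""

    def __init__(self, slots: int = SLOTS, hold_time: float = HOLD_TIME):
        self.slots = slots
        self.hold_time = hold_time
        self.entries: list[Entry] = []

    def is_unreachable(self, primary: str, now: float) -> bool:
        for e in self.entries:
            if e.primary == primary:
                if now < e.expire:
                    e.last = now  # a hit refreshes the LRU position
                    return True
                return False      # entry exists but the hold has expired
        return False

    def add(self, primary: str, now: float) -> None:
        # A primary that is already cached just gets a fresh hold period.
        for e in self.entries:
            if e.primary == primary:
                e.expire = now + self.hold_time
                e.last = now
                return
        new = Entry(primary, now + self.hold_time, now)
        if len(self.entries) < self.slots:
            self.entries.append(new)
            return
        # Assumed policy: prefer an expired slot (False sorts first),
        # otherwise evict the least recently used entry.
        victim = min(self.entries, key=lambda e: (now < e.expire, e.last))
        self.entries[self.entries.index(victim)] = new


if __name__ == "__main__":
    cache = UnreachableCache()
    cache.add("192.0.2.1", now=0)                      # one timed-out transfer
    print(cache.is_unreachable("192.0.2.1", now=599))  # True: still held
    print(cache.is_unreachable("192.0.2.1", now=600))  # False: hold expired
```

In this model a single failed transfer blocks the primary for the full 600 seconds, and the only way it leaves the cache early is by being evicted once ten other primaries have been added after it.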
One can argue:
- 10 minutes is too long for a fixed, non-configurable delay
- 10 slots are not enough: servers can run 1M or more zones with different primaries, and in exactly those setups it is likely that more primaries than there are slots will be having problems at the same time
I think this needs a redesign, but in the meantime we could lower UNREACH_HOLD_TIME to something like 10 seconds (or 60?). This should still prevent a thundering herd against the unresponsive server, while recovery after it comes back will be much faster.
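The trade-off can be illustrated with a rough simulation. It assumes, purely for illustration, that some zone wants to transfer from the same primary every second, that a failed attempt immediately puts the primary back into the cache for the hold time, and that the attempt's own timeout is negligible. The function name `simulate` and the 95-second outage are hypothetical.

```python
def simulate(hold_time: int, down_until: int, horizon: int = 3600):
    """Return (failed_attempts, first_success_time) for one dead primary.

    Hypothetical model: one transfer is wanted every second; the primary is
    unreachable for t < down_until; each failure blocks it for hold_time.
    """
    blocked_until = 0
    failed = 0
    for t in range(horizon):
        if t < blocked_until:
            continue                  # cache says "unreachable": no attempt
        if t < down_until:
            failed += 1               # attempt fails; re-add to the cache
            blocked_until = t + hold_time
        else:
            return failed, t          # first successful transfer
    return failed, None


if __name__ == "__main__":
    outage = 95
    for hold in (600, 60, 10, 0):     # 0 means no unreachable cache at all
        failed, ok = simulate(hold, outage)
        print(f"hold={hold:>3}s failed={failed:>2} recovery delay={ok - outage}s")
    # hold=600s failed= 1 recovery delay=505s
    # hold= 60s failed= 2 recovery delay=25s
    # hold= 10s failed=10 recovery delay=5s
    # hold=  0s failed=95 recovery delay=0s
```

Under these assumptions, a 10-second hold still cuts the load on the dead primary by an order of magnitude compared with no cache, while the worst-case recovery delay shrinks from the full 600 seconds to at most the hold time.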