Improve A/AAAA ADB and expiration synchronization for servers with addresses in both families
A customer had enough trouble with their upstream IPv6 routing that they followed the advice in one of our KB articles and added server ::/0 { bogus yes; };
to their configuration.
Unexpectedly, this led to an increase in their SERVFAIL rate, impacting their customers.
The customer investigated this in detail and found that in many cases the SERVFAIL is generated while the server is fetching fresh address records: the AAAA response returns before the A response, and the SERVFAIL is generated in the gap between the two responses.
Perhaps we should wait for responses to both queries before proceeding?
Thinking about this further, I believe the same thing could happen if the A response arrives before the AAAA response and the two responses are processed in different seconds. The AAAA records would then expire later than the A records even though both were received with the same TTL, leaving a short window in which only the AAAA addresses are still valid.
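
To make the skew concrete, here is a small model of the arithmetic. It assumes an expiration is computed as the whole-second processing time plus the TTL; the function name and the timestamps are hypothetical, and this is not taken from the actual ADB or cache code.

```python
def expiration(processed_at: int, ttl: int) -> int:
    """Absolute expiry (whole seconds) for a record set processed at `processed_at`."""
    return processed_at + ttl


# Hypothetical timeline: both responses carry the same TTL, but the AAAA
# response is processed one second after the A response.
TTL = 300
a_expires = expiration(processed_at=1000, ttl=TTL)
aaaa_expires = expiration(processed_at=1001, ttl=TTL)

# Length of the window in which the A records have expired but the AAAA
# records have not.
skew = aaaa_expires - a_expires
print(f"A expires at {a_expires}, AAAA at {aaaa_expires}, skew {skew}s")
```

With every IPv6 server marked bogus, any lookup that lands inside that window would presumably find no usable address for the name.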
Could (or should) we force all of the address records for a name (A and AAAA) to expire at the same time by clamping them to the soonest expiration among them?
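
A minimal sketch of the clamping idea, assuming each address record set carries an absolute expiry in whole seconds. The `AddressRRset` type and `clamp_to_soonest` helper are hypothetical names used only for illustration and do not correspond to existing structures in the code base.

```python
from dataclasses import dataclass


@dataclass
class AddressRRset:
    rrtype: str   # "A" or "AAAA"
    expires: int  # absolute expiry, whole seconds


def clamp_to_soonest(rrsets: list[AddressRRset]) -> int:
    """Set every rrset's expiry to the earliest one among them and return it."""
    if not rrsets:
        raise ValueError("no address rrsets to clamp")
    soonest = min(r.expires for r in rrsets)
    for r in rrsets:
        r.expires = soonest
    return soonest


# Hypothetical expiries for one name, one second apart.
records = [AddressRRset("A", 1300), AddressRRset("AAAA", 1301)]
common_expiry = clamp_to_soonest(records)
```

Clamping to the minimum only ever shortens a lifetime, so no record is kept past the TTL it was received with. The cost is that the longer-lived family may be refetched slightly earlier than strictly necessary.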