Random port re-use for zone refresh SOA queries - when it doesn't work, it really doesn't work
There's a well-known issue with SOA queries for zone refreshes, where the path from a secondary to an authoritative server has some random source ports blocked.
The issue is that, for these refresh queries alone, BIND reuses an existing dispatch to the same server IP address. It does so for performance reasons, so that it isn't constantly creating and tearing down sockets between itself and a frequently-used provider of zone updates. But there is a down-side to this strategy, one that has been encountered before and has not yet made it into a KB article.
IF a socket has been opened for a refresh query to another server, but the packets to that server are all being dropped (because of the source port that was chosen randomly by the OS, for example), then this refresh query has to wait to time out. On a busy authoritative server, particularly one participating in a multi-layered, tree-like zone propagation strategy, this makes it quite likely that another zone refresh will come along, see the existing open socket, and send its own refresh query that way too.
When the first query times out, the socket persists because there's now another zone refresh using it ... and so on. Essentially nothing more can be refreshed from this server, because there's always at least one outstanding query waiting to time out, keeping the port open and the dispatch active.
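To make the failure mode concrete, here is a minimal sketch of the reuse logic that produces the pileup. This is illustrative only - the names (dispatch_t, dispatch_find, dispatch_create) and structure are hypothetical stand-ins, not BIND's actual dispatch code:

    #include <sys/socket.h> /* struct sockaddr_storage */

    /* Hypothetical sketch - not actual BIND source. */
    typedef struct dispatch {
        struct sockaddr_storage dest; /* the authoritative server's address */
        unsigned int refs;            /* queries currently using this socket */
        /* ... one UDP socket, bound to one randomly-chosen source port ... */
    } dispatch_t;

    /* Assumed helpers (hypothetical): */
    dispatch_t *dispatch_find(const struct sockaddr_storage *dest);
    dispatch_t *dispatch_create(const struct sockaddr_storage *dest);

    dispatch_t *
    dispatch_get(const struct sockaddr_storage *dest) {
        dispatch_t *disp = dispatch_find(dest); /* open socket to this server? */
        if (disp != NULL) {
            disp->refs++; /* reuse it - no new socket, no new source port */
            return disp;
        }
        return dispatch_create(dest); /* new socket, new random source port */
    }

    /* If the chosen source port happens to be blocked along the path, every
     * query sent via this dispatch waits the full timeout. On a busy server,
     * a new refresh typically arrives before refs drops back to zero, so the
     * dispatch - and the unlucky port - is never torn down. */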
See: https://bugs.isc.org/Ticket/Display.html?id=26974
The obvious mitigation is to prevent named from using source ports that are known not to work. For example, some providers are blocking port 11211 in response to memcached DDoS activity (https://blog.cloudflare.com/memcrashed-major-amplification-attacks-from-port-11211/), so add this to named.conf:
avoid-v4-udp-ports { 11211; };
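And, if the server also sends refresh queries over IPv6, the analogous option covers that path too:

avoid-v6-udp-ports { 11211; };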
But this is 'whack-a-mole' - the problem is bigger than that: how do you know in advance which random ports are not going to work for zone refreshes?
Can BIND 'do it better' than it is doing now?
It would potentially be messy (it touches too many other areas of processing) to add an additional field to the structure handling the socket in order to count timeouts - but that might be the only way to do this?
We can't use something in ADB (the address database), because the timeouts to this server only occur for specific source ports - the server address itself is otherwise fine.
It also doesn't work to immediately retry the same server with a different random source port - we don't know whether the timeout is because the server is silently unreachable, or because only this source port is problematic. And we don't want to double (or worse) the time it takes to get back a good SOA RR and initiate the zone update.
And then finally, we don't know that our port is a 'bad' port until we've tried the server and succeeded with a 'good' port - so again, I'm not sure what the best process would be to address this in named's own processing logic.
So a final thought is to add a couple of counters to the dispatch, so that after it has been used 'x' times or has timed out 'y' times, it's no longer permissible to reuse it - a new socket with a new random port has to be opened ...
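Extending the hypothetical sketch above, that could look something like this (the thresholds and names are made up for illustration, not a proposal for actual values):

    #include <stdbool.h>
    #include <sys/socket.h>

    #define DISP_MAX_USES     100 /* 'x': reuses before forced retirement */
    #define DISP_MAX_TIMEOUTS   2 /* 'y': timeouts before forced retirement */

    typedef struct dispatch {
        struct sockaddr_storage dest;
        unsigned int refs;     /* queries currently using this socket */
        unsigned int uses;     /* total queries ever sent via this socket */
        unsigned int timeouts; /* queries that timed out on this socket */
        bool retired;          /* true: no longer eligible for reuse */
    } dispatch_t;

    dispatch_t *dispatch_find(const struct sockaddr_storage *dest);
    dispatch_t *dispatch_create(const struct sockaddr_storage *dest);

    dispatch_t *
    dispatch_get(const struct sockaddr_storage *dest) {
        dispatch_t *disp = dispatch_find(dest);
        if (disp != NULL && !disp->retired && disp->uses < DISP_MAX_USES) {
            disp->uses++;
            disp->refs++;
            return disp;
        }
        /* A retired dispatch drains naturally: it closes once refs hits 0. */
        return dispatch_create(dest); /* fresh socket, fresh random port */
    }

    void
    dispatch_note_timeout(dispatch_t *disp) {
        if (++disp->timeouts >= DISP_MAX_TIMEOUTS) {
            disp->retired = true; /* the next refresh gets a new port */
        }
    }

The 'y' cap is what breaks the pileup: even if refreshes keep arriving, a dispatch that has timed out repeatedly stops accepting new queries, so a bad port can only delay a bounded number of refreshes.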
====
Further musing:
Would the ephemeral/dynamic range (49152-65535) be any safer than using the full 1024-65535? For sure you're reducing randomness, and just because a port isn't in use by a registered service locally doesn't guarantee that a firewall along the way hasn't decided to block it. But it's a thought.
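If anyone wanted to experiment with that, the knobs already exist in named.conf - a sketch only, since whether the dynamic range actually gets blocked less often is exactly the open question above:

use-v4-udp-ports { range 49152 65535; };
use-v6-udp-ports { range 49152 65535; };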
====
Footnote: This is only a problem where the port blocking doesn't result in a send failure (an ICMP destination-unreachable, say) that can get back to the sending process. It's silent drops, and/or faulty network stacks that don't pass the failure up to the sending socket call, that are going to cause this issue.