"Address already in use" should optionally be fatal
I would like an option so that if the server encounters "address already in use", it is fatal, and the server exits.
Currently, it appears that on the default setting of "listen on all interfaces", an address already in use is silently ignored. If interfaces are explicitly listed in the options section then it is noted when one of them is in use:
Jul 24 13:14:13 a0 named[10713]: listening on IPv4 interface eth0:0, 85.119.82.237#53
Jul 24 13:14:13 a0 named[10713]: binding TCP socket: address in use
…but this is not fatal.
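To make the request concrete, here is a minimal Python sketch of the policy I have in mind. It is not how named handles its sockets. The function names `try_bind_tcp` and `bind_or_exit` and the `fatal` switch are made up for illustration, and the warning text only imitates the log line above.

```python
import errno
import socket
import sys


def try_bind_tcp(host: str, port: int) -> socket.socket:
    """Bind and listen on a TCP socket; re-raise OSError if that fails."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(5)
    except OSError:
        sock.close()
        raise
    return sock


def bind_or_exit(host: str, port: int, fatal: bool = True):
    """Hypothetical 'address in use is fatal' policy.

    Returns the listening socket on success. On EADDRINUSE it always logs,
    then exits with status 1 if fatal is True, or returns None otherwise.
    Any other OSError is re-raised unchanged.
    """
    try:
        return try_bind_tcp(host, port)
    except OSError as exc:
        if exc.errno != errno.EADDRINUSE:
            raise
        print(f"binding TCP socket {host}#{port}: address in use",
              file=sys.stderr)
        if fatal:
            sys.exit(1)
        return None
```

With `fatal=True` the process exits non-zero, which a service manager such as systemd would see as a failed start rather than a healthy service.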
I have just spent 2 days debugging a problem for a customer where their bind9 server would not allow an AXFR, saying it was NOTAUTH, despite the fact that every DNS diagnostic tool at my disposal said that it was in fact authoritative and the logs from named showed that it was loading the zone fine.
I eventually discovered that there was a lingering named process from 24 hours earlier, which the customer had somehow started outside of systemd. The new copy of named was started by systemd and silently ignored its inability to bind to TCP/53, so systemd believed that everything was okay. There was absolutely no logging of the bind failure, even at debug level.
All TCP/53 traffic was answered and responded to by the old rogue copy of named which at its time of starting was not configured to serve the zone, hence NOTAUTH.
Of course, messages regarding the AXFR failure were logged by the rogue copy of named, but it was hard to notice that they came from a different process, as the only clue was the differing process ID, which did not stand out. So yes, I could see that the AXFR was failing with NOTAUTH, but I could not comprehend why named considered itself NOTAUTH given that all the configuration seemed correct.
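In hindsight, a quick check for more than one named PID in the logs would have exposed the second process. A rough sketch, assuming syslog lines in the `named[PID]:` format shown above (the helper name `named_pids` is my own, not an existing tool):

```python
import re
from collections import Counter

# Matches the "named[10713]:" tag in a syslog-style line.
NAMED_TAG = re.compile(r"\bnamed\[(\d+)\]:")


def named_pids(lines):
    """Count log lines per named process ID."""
    counts = Counter()
    for line in lines:
        match = NAMED_TAG.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

If `len(named_pids(lines))` is greater than 1 for a period in which named was not restarted, more than one named process was writing to the log.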
It is a stroke of luck that the customer did not give up and reboot, because that would of course have "fixed" the situation by clearing out the rogue copy of named.
I feel that this situation was unnecessarily difficult to debug owing to the lack of logging. Ideally it would be possible to complain even in the "listen on all" default case. If that is not possible, then I accept that I have learned something new: my own practice of explicitly listing the interfaces to listen on is generally a good idea, and one that I should extend to every customer's config as well.
But even then, that only gets me as far as a warning in the logs, not a refusal to start. Though at least a warning would certainly give enough to work on to find the problem in future.
This behaviour was seen with 9.11 on Debian buster.