named can be started multiple times
Summary
named can be started more than once. Several years ago, in response to a feature request from Carsten Strotmann, code was added so that only one instance could run; a second attempt to start named would fail with exit status 1.
The new situation is more problematic than the one Carsten reported. In his case, if named was started and a second named process was then launched, the second process failed to listen on all of its ports and could not be reached with rndc. Now, the second named can be reached with rndc after the first process is stopped, and it may listen on ports and respond to queries. ("May" means I saw that behavior but have not documented it.)
BIND version used
Problem version: BIND 9.14.3 (demonstrated below); also seen in 9.14.2.
Problem does not appear in: BIND 9.12.3.
# named -V
BIND 9.14.3 (Stable Release) <id:896acdc>
running on Linux x86_64 4.11.12-100.fc24.x86_64 #1 SMP Fri Jul 21 17:35:20 UTC 2017
built by make with '--sysconfdir=/etc/namedb' '--enable-dnstap'
compiled by GCC 8.3.0
compiled with OpenSSL version: OpenSSL 1.1.1c 28 May 2019
linked to OpenSSL version: OpenSSL 1.1.1c 28 May 2019
threads support is enabled
default paths:
named configuration: /etc/namedb/named.conf
rndc configuration: /etc/namedb/rndc.conf
DNSSEC root key: /etc/namedb/bind.keys
nsupdate session key: /var/run/named/session.key
named PID file: /var/run/named/named.pid
named lock file: /var/run/named/named.lock
Steps to reproduce
named is initially not running
# ps -ef | grep named | grep -v grep
# named
# named is process 15662
# ps -ef | grep named | grep -v grep
root 15662 1 1 15:13 ? 00:00:00 named
Note the time named was last configured: 13:13:23.
# rndc status | grep last
last configured: Tue, 09 Jul 2019 13:13:23 GMT
ss reports that named (PID 15662) has 11 sockets open:
# ss -ltunp | grep named | sed 's/ *$//'
udp UNCONN 0 0 192.168.53.111:53 0.0.0.0:* users:(("named",pid=15662,fd=517))
udp UNCONN 0 0 192.168.53.111:53 0.0.0.0:* users:(("named",pid=15662,fd=516))
udp UNCONN 0 0 127.0.0.1:53 0.0.0.0:* users:(("named",pid=15662,fd=515))
udp UNCONN 0 0 127.0.0.1:53 0.0.0.0:* users:(("named",pid=15662,fd=514))
udp UNCONN 0 0 [::]:53 [::]:* users:(("named",pid=15662,fd=512))
udp UNCONN 0 0 [::]:53 [::]:* users:(("named",pid=15662,fd=513))
tcp LISTEN 0 10 192.168.53.111:53 0.0.0.0:* users:(("named",pid=15662,fd=23))
tcp LISTEN 0 10 127.0.0.1:53 0.0.0.0:* users:(("named",pid=15662,fd=22))
tcp LISTEN 0 128 127.0.0.1:953 0.0.0.0:* users:(("named",pid=15662,fd=24))
tcp LISTEN 0 10 [::]:53 [::]:* users:(("named",pid=15662,fd=21))
tcp LISTEN 0 128 [::1]:953 [::]:* users:(("named",pid=15662,fd=25))
Starting named a second time succeeds.
# named
# echo $?
0
# ps -ef | grep named | grep -v grep
root 15662 1 0 15:13 ? 00:00:00 named
root 15803 1 0 15:26 ? 00:00:00 named
Now the weird part. The second process succeeded in listening on the same ports as the first process.
# ss -ltunp | grep named | sed 's/ *$//'
udp UNCONN 0 0 192.168.53.111:53 0.0.0.0:* users:(("named",pid=15803,fd=517))
udp UNCONN 0 0 192.168.53.111:53 0.0.0.0:* users:(("named",pid=15803,fd=516))
udp UNCONN 0 0 127.0.0.1:53 0.0.0.0:* users:(("named",pid=15803,fd=515))
udp UNCONN 0 0 127.0.0.1:53 0.0.0.0:* users:(("named",pid=15803,fd=514))
udp UNCONN 0 0 192.168.53.111:53 0.0.0.0:* users:(("named",pid=15662,fd=517))
udp UNCONN 0 0 192.168.53.111:53 0.0.0.0:* users:(("named",pid=15662,fd=516))
udp UNCONN 0 0 127.0.0.1:53 0.0.0.0:* users:(("named",pid=15662,fd=515))
udp UNCONN 0 0 127.0.0.1:53 0.0.0.0:* users:(("named",pid=15662,fd=514))
udp UNCONN 0 0 [::]:53 [::]:* users:(("named",pid=15662,fd=512))
udp UNCONN 0 0 [::]:53 [::]:* users:(("named",pid=15662,fd=513))
udp UNCONN 0 0 [::]:53 [::]:* users:(("named",pid=15803,fd=512))
udp UNCONN 0 0 [::]:53 [::]:* users:(("named",pid=15803,fd=513))
tcp LISTEN 0 10 192.168.53.111:53 0.0.0.0:* users:(("named",pid=15803,fd=23))
tcp LISTEN 0 10 127.0.0.1:53 0.0.0.0:* users:(("named",pid=15803,fd=22))
tcp LISTEN 0 10 192.168.53.111:53 0.0.0.0:* users:(("named",pid=15662,fd=23))
tcp LISTEN 0 10 127.0.0.1:53 0.0.0.0:* users:(("named",pid=15662,fd=22))
tcp LISTEN 0 128 127.0.0.1:953 0.0.0.0:* users:(("named",pid=15803,fd=24))
tcp LISTEN 0 128 127.0.0.1:953 0.0.0.0:* users:(("named",pid=15662,fd=24))
tcp LISTEN 0 10 [::]:53 [::]:* users:(("named",pid=15662,fd=21))
tcp LISTEN 0 10 [::]:53 [::]:* users:(("named",pid=15803,fd=21))
tcp LISTEN 0 128 [::1]:953 [::]:* users:(("named",pid=15662,fd=25))
tcp LISTEN 0 128 [::1]:953 [::]:* users:(("named",pid=15803,fd=25))
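With two processes the listing gets long, and counting lines per PID makes the duplication easier to see. The following is a small helper sketch; the function name named_sockets_per_pid is hypothetical (not part of BIND or ss), and it assumes the users:(("named",pid=...,fd=...)) field format shown in the listings above.

```shell
# Hypothetical helper: count named sockets per PID in `ss -ltunp` output
# read from stdin. Prints one "<count> <pid>" line per process.
named_sockets_per_pid() {
    # Keep only lines owned by a process called "named", extract the
    # pid=NNN value, then count how many lines each PID has.
    grep '"named"' |
        sed -n 's/.*pid=\([0-9][0-9]*\),.*/\1/p' |
        sort | uniq -c
}

# Usage:
#   ss -ltunp | named_sockets_per_pid
```

Applied to the second listing above, this should report 11 sockets for each of PIDs 15662 and 15803, the same count the first process had when it was running alone.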
NOTE: I would have suspected a problem in the operating environment, except that in the same environment BIND 9.12.3 refuses to start a second time.
AND NOW THE PROBLEM. Run rndc status repeatedly: sometimes it connects to one process, sometimes to the other. This is visible from the differing "last configured" timestamps. Repeat the command until both processes have answered:
# rndc status | grep last
last configured: Tue, 09 Jul 2019 13:13:23 GMT
# rndc status | grep last
last configured: Tue, 09 Jul 2019 13:13:23 GMT
# rndc status | grep last
last configured: Tue, 09 Jul 2019 13:26:04 GMT
# rndc status | grep last
last configured: Tue, 09 Jul 2019 13:26:04 GMT
# rndc status | grep last
last configured: Tue, 09 Jul 2019 13:13:23 GMT
#
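Instead of repeating the command by hand, the check can be looped and tallied. This is a minimal sketch, assuming rndc is in PATH and configured as in the transcript above; tally_rndc_targets is a hypothetical helper name, and the default of 20 iterations is arbitrary.

```shell
# Hypothetical helper: run `rndc status` N times (default 20) and count
# how often each "last configured" line is reported.
tally_rndc_targets() {
    n=${1:-20}
    i=0
    while [ "$i" -lt "$n" ]; do
        rndc status | grep 'last configured'
        i=$((i + 1))
    done | sort | uniq -c
}

# Usage:
#   tally_rndc_targets 20
```

More than one output line means rndc reached more than one named process. The check relies on the processes having different "last configured" times, so two instances configured within the same second would not be told apart this way.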
Not shown, but this problem affects queries as well: some go to one process, some to the other. That was hard to debug.
What is the current bug behavior?
Because some queries and rndc commands go to one process and some to the other, the same server (meaning the machine, not a process) returns inconsistent results.
What is the expected correct behavior?
named should not start twice. As an additional request, I recommend that it also print an error to STDERR. A second invocation of named will almost always be run manually from the command line, so output to STDERR is very useful there, in contrast to the error and warning messages logged when named is started automatically. A possible message: "named: Another instance is already running. Not starting."
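To illustrate the requested behavior only (this is not how named manages its own lock file), here is a shell sketch of the common single-instance pattern: take a non-blocking exclusive lock and, if another process already holds it, refuse to start with a message on STDERR. It assumes the util-linux flock(1) utility; run_single_instance is a hypothetical name, and the command must stay in the foreground (for example named -f) so the lock is held for its whole lifetime.

```shell
# Hypothetical wrapper: run a command only if no other instance holds
# LOCKFILE; otherwise print an error to stderr and return 1.
run_single_instance() {
    lockfile=$1
    shift
    (
        # Open (creating if needed) the lock file on descriptor 9.
        exec 9>"$lockfile" || exit 1
        # Non-blocking exclusive lock; fails at once if already held.
        if ! flock -n 9; then
            echo "${1##*/}: Another instance is already running. Not starting." >&2
            exit 1
        fi
        # Descriptor 9 stays open, and the lock held, until the command exits.
        "$@"
    )
}

# Usage (hypothetical lock path):
#   run_single_instance /var/run/named/wrapper.lock named -f
```

While a first instance is running, a second call prints "named: Another instance is already running. Not starting." and returns 1, which is the kind of feedback a manual CLI invocation needs.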
Relevant configuration files
N/A