named can be started multiple times

Summary

named will start multiple times. Several years ago, in response to a feature request from Carsten Strotmann, code was added that allowed it to start only once; a second attempt to start it would fail with exit status 1.

The new situation is more problematic than the one Carsten reported. Back then, if named was already running and a second named process was started, the second would fail to listen on all ports and could not be reached with rndc. Now, the second named can be reached with rndc after the first process is stopped, and it may listen on ports and respond to queries. ("May" means I saw that behavior but have not documented it.)

BIND version used

Problem version: 9.14.3 (demoed below)
Also seen in: 9.14.2
Problem does not appear in: BIND 9.12.3

# named -V
BIND 9.14.3 (Stable Release) <id:896acdc>
running on Linux x86_64 4.11.12-100.fc24.x86_64 #1 SMP Fri Jul 21 17:35:20 UTC 2017
built by make with '--sysconfdir=/etc/namedb' '--enable-dnstap'
compiled by GCC 8.3.0
compiled with OpenSSL version: OpenSSL 1.1.1c  28 May 2019
linked to OpenSSL version: OpenSSL 1.1.1c  28 May 2019
threads support is enabled

default paths:
  named configuration:  /etc/namedb/named.conf
  rndc configuration:   /etc/namedb/rndc.conf
  DNSSEC root key:      /etc/namedb/bind.keys
  nsupdate session key: /var/run/named/session.key
  named PID file:       /var/run/named/named.pid
  named lock file:      /var/run/named/named.lock

Steps to reproduce

named is initially not running

# ps -ef | grep named | grep -v grep 

# named

named is process 15662

# ps -ef | grep named | grep -v grep
root     15662     1  1 15:13 ?        00:00:00 named

note the time named was last configured: 13:13:23

# rndc status | grep last
last configured: Tue, 09 Jul 2019 13:13:23 GMT

ss reports that named (proc 15662) has 11 sockets open

# ss -ltunp | grep named | sed 's/ *$//'   
udp     UNCONN   0        0          192.168.53.111:53            0.0.0.0:*      users:(("named",pid=15662,fd=517))
udp     UNCONN   0        0          192.168.53.111:53            0.0.0.0:*      users:(("named",pid=15662,fd=516))
udp     UNCONN   0        0               127.0.0.1:53            0.0.0.0:*      users:(("named",pid=15662,fd=515))
udp     UNCONN   0        0               127.0.0.1:53            0.0.0.0:*      users:(("named",pid=15662,fd=514))
udp     UNCONN   0        0                    [::]:53               [::]:*      users:(("named",pid=15662,fd=512))
udp     UNCONN   0        0                    [::]:53               [::]:*      users:(("named",pid=15662,fd=513))
tcp     LISTEN   0        10         192.168.53.111:53            0.0.0.0:*      users:(("named",pid=15662,fd=23))
tcp     LISTEN   0        10              127.0.0.1:53            0.0.0.0:*      users:(("named",pid=15662,fd=22))
tcp     LISTEN   0        128             127.0.0.1:953           0.0.0.0:*      users:(("named",pid=15662,fd=24))
tcp     LISTEN   0        10                   [::]:53               [::]:*      users:(("named",pid=15662,fd=21))
tcp     LISTEN   0        128                 [::1]:953              [::]:*      users:(("named",pid=15662,fd=25))

Starting named a second time succeeds.

# named
# echo $?           
0
# ps -ef | grep named | grep -v grep
root     15662     1  0 15:13 ?        00:00:00 named
root     15803     1  0 15:26 ?        00:00:00 named

Now the weird part. The second process succeeded in listening on the same ports as the first process.

# ss -ltunp | grep named | sed 's/ *$//'       
udp     UNCONN   0        0          192.168.53.111:53            0.0.0.0:*      users:(("named",pid=15803,fd=517))
udp     UNCONN   0        0          192.168.53.111:53            0.0.0.0:*      users:(("named",pid=15803,fd=516))
udp     UNCONN   0        0               127.0.0.1:53            0.0.0.0:*      users:(("named",pid=15803,fd=515))
udp     UNCONN   0        0               127.0.0.1:53            0.0.0.0:*      users:(("named",pid=15803,fd=514))
udp     UNCONN   0        0          192.168.53.111:53            0.0.0.0:*      users:(("named",pid=15662,fd=517))
udp     UNCONN   0        0          192.168.53.111:53            0.0.0.0:*      users:(("named",pid=15662,fd=516))
udp     UNCONN   0        0               127.0.0.1:53            0.0.0.0:*      users:(("named",pid=15662,fd=515))
udp     UNCONN   0        0               127.0.0.1:53            0.0.0.0:*      users:(("named",pid=15662,fd=514))
udp     UNCONN   0        0                    [::]:53               [::]:*      users:(("named",pid=15662,fd=512))
udp     UNCONN   0        0                    [::]:53               [::]:*      users:(("named",pid=15662,fd=513))
udp     UNCONN   0        0                    [::]:53               [::]:*      users:(("named",pid=15803,fd=512))
udp     UNCONN   0        0                    [::]:53               [::]:*      users:(("named",pid=15803,fd=513))
tcp     LISTEN   0        10         192.168.53.111:53            0.0.0.0:*      users:(("named",pid=15803,fd=23))
tcp     LISTEN   0        10              127.0.0.1:53            0.0.0.0:*      users:(("named",pid=15803,fd=22))
tcp     LISTEN   0        10         192.168.53.111:53            0.0.0.0:*      users:(("named",pid=15662,fd=23))
tcp     LISTEN   0        10              127.0.0.1:53            0.0.0.0:*      users:(("named",pid=15662,fd=22))
tcp     LISTEN   0        128             127.0.0.1:953           0.0.0.0:*      users:(("named",pid=15803,fd=24))
tcp     LISTEN   0        128             127.0.0.1:953           0.0.0.0:*      users:(("named",pid=15662,fd=24))
tcp     LISTEN   0        10                   [::]:53               [::]:*      users:(("named",pid=15662,fd=21))
tcp     LISTEN   0        10                   [::]:53               [::]:*      users:(("named",pid=15803,fd=21))
tcp     LISTEN   0        128                 [::1]:953              [::]:*      users:(("named",pid=15662,fd=25))
tcp     LISTEN   0        128                 [::1]:953              [::]:*      users:(("named",pid=15803,fd=25))
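
To compare the two listings without reading every line, the per-process socket counts can be tallied from the ss output. This is only a sketch: the function name count_named_sockets is made up for illustration, and it assumes the users:(("named",pid=...)) field format shown in the listings above.

```shell
# Summarize `ss -ltunp` output: print "<pid> <socket count>" for each named process.
# Reads the ss output on stdin, so it also works on saved output.
count_named_sockets() {
    grep -o '"named",pid=[0-9]*' |   # keep only the process name and pid field
        sed 's/.*pid=//' |           # reduce each match to the bare pid
        sort -n |
        uniq -c |                    # count lines per pid
        awk '{ print $2, $1 }'       # reorder to "<pid> <count>"
}

# Usage on a live system (root is needed to see process names):
#   ss -ltunp | count_named_sockets
```

Applied to the second listing above, it should report 11 sockets each for pid 15662 and pid 15803.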

NOTE: I would have thought this was an operating environment failure, except that with the same environment, BIND 9.12.3 does not start a second time.

AND NOW THE PROBLEM. Run rndc status repeatedly: sometimes it connects to one process, sometimes to the other. The differing "last configured" timestamps show which process answered. Keep repeating the command until answers from both processes have been seen:

# rndc status | grep last
last configured: Tue, 09 Jul 2019 13:13:23 GMT
# rndc status | grep last
last configured: Tue, 09 Jul 2019 13:13:23 GMT
# rndc status | grep last
last configured: Tue, 09 Jul 2019 13:26:04 GMT
# rndc status | grep last
last configured: Tue, 09 Jul 2019 13:26:04 GMT
# rndc status | grep last
last configured: Tue, 09 Jul 2019 13:13:23 GMT
#
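
The manual repetition above can be automated. The following is a sketch rather than a tested reproduction script: poll_status and distinct_last_configured are hypothetical helper names, and the only thing assumed about rndc is the "last configured:" line that rndc status prints in the transcript above.

```shell
# Print each distinct "last configured" line, prefixed by how often it was seen.
# Reads concatenated `rndc status` output on stdin.
distinct_last_configured() {
    grep '^last configured:' | sort | uniq -c
}

# Run a status command several times and summarize which instances answered.
# $1 = number of attempts; remaining arguments = the command to run.
poll_status() {
    attempts=$1
    shift
    i=0
    while [ "$i" -lt "$attempts" ]; do
        "$@"
        i=$((i + 1))
    done | distinct_last_configured
}

# Usage on the affected host:
#   poll_status 20 rndc status
```

More than one line of output means that more than one named process answered on the control channel.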

Not shown, but this problem will affect queries as well. Some go to one process, some to the other. That was hard to debug.

What is the current bug behavior?

Because some queries and some rndc commands go to one process and some to the other, the same server returns inconsistent results (where "server" means a machine, not a process).

What is the expected correct behavior?

named should not start twice. As an extra request, I recommend that it print an error to STDERR. A second invocation of named will almost always be manual, from the CLI, so output to STDERR is very useful. This contrasts with the error and warning messages produced when named is started automatically. Possible message: named: Another instance is already running. Not starting.
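
The requested behavior can be illustrated with a wrapper around an exclusive lock. This is not how named implements its own lock file. It is a sketch that assumes the flock utility from util-linux is installed. The start_once helper and the lock path in the usage line are hypothetical.

```shell
# Run a command only if no other instance currently holds the lock file.
# $1 = lock file path; remaining arguments = the command to run.
start_once() {
    lockfile=$1
    shift
    (
        # flock -n fails immediately instead of waiting when the lock is held.
        if ! flock -n 9; then
            echo "named: Another instance is already running. Not starting." >&2
            exit 1
        fi
        # The lock is held for as long as the command keeps running.
        "$@"
    ) 9>"$lockfile"
}

# Usage (hypothetical lock path; -f keeps named in the foreground):
#   start_once /var/run/named/wrapper.lock named -f
```

A second invocation while the first is still running prints the message to STDERR and returns exit status 1, which is the behavior asked for above.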

Relevant configuration files

N/A

Edited Oct 04, 2021 by Ondřej Surý