"unable to set effective uid to 0" is a warning, should be fatal (or handled better)
The story (it's a bit long):
Updated BIND from 9.11.2 to 9.14.1. Basic name resolution works. But some time after startup, nothing appears in log files, and slaves see various zone transfer errors. This appears to be due to failure to regain privileges.
The last error logged in syslog was:
view external: transfer of 'redacted.net/IN': send: socket is not connected
The last message logged in named's log was (4 hours after the syslog message):
xfer-out: info: client @0x9f38080 redacted#39072 (redacted.net): view external: transfer of 'redacted.net/IN': AXFR ended: 1 messages, 145 records, 15078 bytes, 0.002 secs (7539000 bytes/sec)
About 11 hours after that, the slaves started reporting:
error: transfer of 'redacted.net/IN/internal' from redacted#53: failed while receiving responses: connection reset
On this machine, named is started as root and run with -u named.
Looking in the log file, just after setting up the listening sockets we see:
unable to set effective uid to 0: Operation not permitted
This is repeated after generating the session key for dynamic DNS.
Yet named continued to initialize without further complaint.
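For illustration, here is a minimal sketch of the two behaviours at stake: warn and continue, versus treat the failure as fatal. This is not BIND's code. `try_set_euid`, `require_euid`, and the `tolerate_failure` flag are hypothetical names; the only real API assumed is POSIX `seteuid()`.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to switch the effective uid.
 * Returns 0 on success, or the errno value reported by seteuid().
 */
static int
try_set_euid(uid_t uid) {
	if (seteuid(uid) == -1) {
		return errno;
	}
	return 0;
}

/*
 * Hypothetical policy wrapper.
 * Returns  0 if the effective uid was changed,
 *          1 if the change failed but the caller chose to tolerate it,
 *         -1 if the change failed and the caller should abort startup.
 * The failure is always reported on stderr.
 */
static int
require_euid(uid_t uid, int tolerate_failure) {
	int err = try_set_euid(uid);

	if (err == 0) {
		return 0;
	}
	fprintf(stderr, "unable to set effective uid to %ld: %s\n",
		(long)uid, strerror(err));
	return tolerate_failure ? 1 : -1;
}
```

A caller that needs the privilege would exit when `require_euid(0, 0)` returns -1, instead of carrying on with operations that are bound to fail later.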
Going back to the build, configure found support for capabilities, so named was built with --enable-linux-caps. But for some reason (which I haven't tracked down), things go astray at runtime. There are no obvious clues, presumably because logging failed. (My guess is that during log rotation, the old log was closed, and a new one could not be opened due to lack of privilege. But that's just a guess. And it doesn't explain why the zone transfers were aborted.)
I noticed that the test in configure is merely that a program that calls cap_set_proc() (with no arguments) links. I guess that's the best you can do, considering that configure should be run without privs, and you have to deal with cross-builds...
One slave, on a different architecture and kernel, also reports the "unable to set effective uid" error, but continued to log (including the failed transfers).
My (external) secondary DNS service runs on NSD, and also reported zone transfer failures. So they are definitely caused by the BIND master.
Building 9.14.2 with --disable-linux-caps resolves the symptoms. Tracking down why might be a project for another time.
Recommendations:
- It seems to me that if named is built with --enable-linux-caps, failure to set effective (uid,gid) to the real or saved uid/gid should be fatal.
It isn't obvious that there are production cases where named should be able to survive this: if code is asking for privilege, it should always be because that privilege is required. If it's asking for privilege in cases where it might require privilege, the actual operation should fail. But it appears that being unable to log prevented this from being visible, and named soldiered on in silence, resolving ordinary requests (hence, triggering no alarms) and failing zone transfers, which eventually got my attention.
Cases where continuing might be viable include testing or debugging, where you specify a non-privileged -p port. I'm not sure what the right answer is; two possibilities:
  - Make the "unable to set effective uid" message more prominent
  - Add a command-line switch to make the error a warning (perhaps -P, "ignore failure to obtain privilege")
- Identifying and debugging this would be much simpler if named ensured that all errors are logged by changing logging such that a file isn't closed until its successor is open (assuming my guess is correct). This would guarantee that a failure to open gets logged in the previous logfile, which could continue to log. I have verified that the current logging code closes files before opening a successor... Do you want a separate issue for this observation?
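
The open-before-close ordering suggested in the last item could look roughly like the sketch below. It is not BIND's logging code: `struct log_channel` and `log_reopen` are hypothetical names, and it assumes a plain stdio stream per log channel.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical log channel; BIND's real logging structures differ. */
struct log_channel {
	FILE *fp; /* currently open log stream, or NULL */
};

/*
 * Switch the channel to new_path. The successor is opened first; the old
 * stream is closed only once that has succeeded. If the open fails, the
 * error is written to the old stream, which remains in use.
 * Returns 0 on success, -1 on failure.
 */
static int
log_reopen(struct log_channel *ch, const char *new_path) {
	FILE *next;
	int err;

	assert(ch != NULL && new_path != NULL);

	next = fopen(new_path, "a");
	if (next == NULL) {
		err = errno; /* save before any further library calls */
		if (ch->fp != NULL) {
			fprintf(ch->fp, "unable to open log file '%s': %s\n",
				new_path, strerror(err));
			fflush(ch->fp);
		}
		return -1;
	}
	if (ch->fp != NULL) {
		fclose(ch->fp);
	}
	ch->fp = next;
	return 0;
}
```

With this ordering, a permission failure during rotation leaves a trace in the previous log file, and logging continues there rather than stopping silently.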