ccmsg doesn't support reading multiple messages in a single TCP read
Summary
Sending multiple messages to BIND via rndc, without waiting for the previous responses to arrive, causes messages to get lost and BIND to lose sync on the stream.
BIND version used
Docker image ubuntu/bind9:9.18-22.04_beta
BIND 9.18.12-0ubuntu0.22.04.3-Ubuntu (Extended Support Version) <id:>
running on Linux x86_64 6.2.0-36-generic #37~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Mon Oct 9 15:34:04 UTC 2
built by make with '--build=x86_64-linux-gnu' '--prefix=/usr' '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--disable-option-checking' '--disable-silent-rules' '--libdir=${prefix}/lib/x86_64-linux-gnu' '--runstatedir=/run' '--disable-maintainer-mode' '--disable-dependency-tracking' '--libdir=/usr/lib/x86_64-linux-gnu' '--sysconfdir=/etc/bind' '--with-python=python3' '--localstatedir=/' '--enable-threads' '--enable-largefile' '--with-libtool' '--enable-shared' '--disable-static' '--with-gost=no' '--with-openssl=/usr' '--with-gssapi=yes' '--with-libidn2' '--with-json-c' '--with-lmdb=/usr' '--with-gnu-ld' '--with-maxminddb' '--with-atf=no' '--enable-ipv6' '--enable-rrl' '--enable-filter-aaaa' '--disable-native-pkcs11' 'build_alias=x86_64-linux-gnu' 'CFLAGS=-g -O2 -ffile-prefix-map=/build/bind9-B5s8Yi/bind9-9.18.12=. -flto=auto -ffat-lto-objects -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fno-strict-aliasing -fno-delete-null-pointer-checks -DNO_VERSION_DATE -DDIG_SIGCHASE' 'LDFLAGS=-Wl,-Bsymbolic-functions -flto=auto -ffat-lto-objects -flto=auto -Wl,-z,relro -Wl,-z,now' 'CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2'
compiled by GCC 11.4.0
compiled with OpenSSL version: OpenSSL 3.0.2 15 Mar 2022
linked to OpenSSL version: OpenSSL 3.0.2 15 Mar 2022
compiled with libuv version: 1.43.0
linked to libuv version: 1.43.0
compiled with libnghttp2 version: 1.43.0
linked to libnghttp2 version: 1.43.0
compiled with libxml2 version: 2.9.13
linked to libxml2 version: 20913
compiled with json-c version: 0.15
linked to json-c version: 0.15
compiled with zlib version: 1.2.11
linked to zlib version: 1.2.11
linked to maxminddb version: 1.5.2
threads support is enabled
DNSSEC algorithms: RSASHA1 NSEC3RSASHA1 RSASHA256 RSASHA512 ECDSAP256SHA256 ECDSAP384SHA384 ED25519 ED448
DS algorithms: SHA-1 SHA-256 SHA-384
HMAC algorithms: HMAC-MD5 HMAC-SHA1 HMAC-SHA224 HMAC-SHA256 HMAC-SHA384 HMAC-SHA512
TKEY mode 2 support (Diffie-Hellman): yes
TKEY mode 3 support (GSS-API): yes
default paths:
named configuration: /etc/bind/named.conf
rndc configuration: /etc/bind/rndc.conf
DNSSEC root key: /etc/bind/bind.keys
nsupdate session key: //run/named/session.key
named PID file: //run/named/named.pid
named lock file: //run/named/named.lock
geoip-directory: /usr/share/GeoIP
Steps to reproduce
Send multiple requests via rndc in rapid succession. The program the issue was initially observed with is closed source and tightly integrated; however, the following script using bind9-rndc-node exhibits the same behaviour.
var RNDC = require('bind9-rndc');

var key = 'BzDBJ1B/JbQg9iXJYAGZLQ==';
var session = RNDC.connect('172.17.0.2', 953, key, 'md5');
var test_count = 10;
var recv_count = 0;

session.on('ready', () => {
    // Fire off all requests at once, without waiting for responses.
    for (var i = 0; i < test_count; i++)
        session.send('status');
});

session.on('data', (obj) => {
    recv_count++;
    console.log("Got " + recv_count);
    console.log(obj);
});

session.on('error', console.log);
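For context, TCP is a byte stream and does not preserve write boundaries, so several rndc messages sent back to back can arrive at the server in a single read. The following self-contained snippet (plain sockets on loopback, no rndc involved) illustrates the coalescing:

var net = require('net');

// Two back-to-back writes can be delivered to the peer in a single
// 'data' event; a reader that assumes one read == one message breaks.
var server = net.createServer(function (sock) {
    sock.on('data', function (chunk) {
        console.log('one read delivered ' + chunk.length + ' bytes');
    });
    sock.on('end', function () { server.close(); });
});

server.listen(0, '127.0.0.1', function () {
    var client = net.connect(server.address().port, '127.0.0.1', function () {
        client.write('first message\n');
        client.write('second message\n'); // often coalesced with the first
        client.end();
    });
});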
What is the current bug behavior?
BIND correctly handles some of the requests but drops others. Depending on message size and timing, the indices of the recognized messages vary, and sometimes BIND loses track of the stream entirely until it drops the connection with a timeout.
What is the expected correct behavior?
The number of responses is equal to the number of requests (the order can vary; that's what _ser
is for) and the stream stays in sync, regardless of the timing of the sent messages. If BIND cannot keep up with the client, it should signal this via TCP backpressure (stop reading and let the socket buffers fill) instead of dropping data.
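To stay in sync, a stream reader has to buffer incoming bytes and drain every complete message before waiting for more data. A minimal sketch of that technique (not BIND's ccmsg code), assuming the rndc wire format of a 4-byte big-endian length prefix covering the rest of the message:

var pending = Buffer.alloc(0);

// Called for every TCP read; handleMessage receives one complete
// protocol message at a time, however the bytes were chunked.
function onTcpData(chunk, handleMessage) {
    pending = Buffer.concat([pending, chunk]);
    // A single read may contain several messages, or only part of one.
    while (pending.length >= 4) {
        var msgLen = pending.readUInt32BE(0); // assumed framing: length of the rest
        if (pending.length < 4 + msgLen)
            break; // incomplete message: wait for more data
        handleMessage(pending.subarray(4, 4 + msgLen));
        pending = pending.subarray(4 + msgLen); // keep any remainder
    }
}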
Relevant configuration files
named.conf
key "rndc-key" {
algorithm hmac-md5;
secret "BzDBJ1B/JbQg9iXJYAGZLQ==";
};
controls {
inet 0.0.0.0 allow { any; } keys { "rndc-key"; };
};
options {
directory "/var/cache/bind";
version none;
allow-query { any; };
allow-recursion { any; };
dnssec-validation auto;
auth-nxdomain no; # conform to RFC1035
listen-on-v6 { };
listen-on { any; };
};
Relevant logs and/or screenshots
Sample output of the script (note there's a rather long pause before the warning: unread data left over line):
(node:10142) [DEP0005] DeprecationWarning: Buffer() is deprecated due to security and usability issues. Please use the Buffer.alloc(), Buffer.allocUnsafe(), or Buffer.from() methods instead.
(Use `node --trace-deprecation ...` to show where the warning was created)
Got 1
Map(0) {
type: 'status',
result: '0',
text: 'version: BIND 9.18.12-0ubuntu0.22.04.3-Ubuntu (Extended Support Version) <id:> (version.bind/txt/ch disabled)\n' +
'running on 54d01852d58b: Linux x86_64 6.2.0-36-generic #37~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Mon Oct 9 15:34:04 UTC 2\n' +
'boot time: Sat, 04 Nov 2023 14:45:41 GMT\n' +
'last configured: Sat, 04 Nov 2023 14:45:41 GMT\n' +
'configuration file: /etc/bind/named.conf\n' +
'CPUs found: 16\n' +
'worker threads: 16\n' +
'UDP listeners per interface: 16\n' +
'number of zones: 101 (99 automatic)\n' +
'debug level: 0\n' +
'xfers running: 0\n' +
'xfers deferred: 0\n' +
'soa queries in progress: 0\n' +
'query logging is OFF\n' +
'recursive clients: 0/900/1000\n' +
'tcp clients: 0/150\n' +
'TCP high-water: 0\n' +
'server is up and running'
}
warning: unread data left over
Relevant log output for this transaction:
04-Nov-2023 15:12:17.519 invalid command from 172.17.0.1#40764: timed out
Possible fixes
I am not familiar with the BIND 9 source code, but I will take a look to see if I can find something (along with retesting on the master branch). A workaround for clients is to ensure that there is never more than one message in flight, as sketched below. This avoids the issue, but severely degrades throughput when the latency between client and server is high.
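A rough sketch of that client-side workaround, building on the reproduction script above (sendQueued and pump are hypothetical helper names; session is the bind9-rndc session from the script):

var queue = [];
var inFlight = false;

function sendQueued(cmd) {
    queue.push(cmd);
    pump();
}

function pump() {
    // Keep at most one rndc message in flight at a time.
    if (inFlight || queue.length === 0) return;
    inFlight = true;
    session.send(queue.shift());
}

session.on('data', function (obj) {
    inFlight = false; // response arrived: safe to send the next command
    pump();
});

With this approach every command costs a full round trip, which is the performance penalty mentioned above.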