BIND 9.14 option synth-from-dnssec causing high CPU consumption and degraded client experience
### Summary
As reported in Support ticket #15338
### BIND version used
BIND 9.14.6 (Stable Release) <id:efd3496>
running on Linux x86_64 3.10.0-1062.1.1.el7.x86_64 #1 SMP Fri Sep 13 22:55:44 UTC 2019
built by make with '--prefix=/opt/bind' '--sysconfdir=/etc' '--disable-linux-caps' '--enable-dnsrps'
compiled by GCC 4.8.5 20150623 (Red Hat 4.8.5-39)
compiled with OpenSSL version: OpenSSL 1.0.2k 26 Jan 2017
linked to OpenSSL version: OpenSSL 1.0.2k-fips 26 Jan 2017
compiled with zlib version: 1.2.7
linked to zlib version: 1.2.7
threads support is enabled
default paths:
named configuration: /etc/named.conf
rndc configuration: /etc/rndc.conf
DNSSEC root key: /etc/bind.keys
nsupdate session key: /opt/bind/var/run/named/session.key
named PID file: /opt/bind/var/run/named/named.pid
named lock file: /opt/bind/var/run/named/named.lock
### Steps to reproduce
This is an in-house resolver (one of a pair behind a load-balancer). DNSSEC and DNSSEC-validation are both disabled:
dnssec-enable no;
dnssec-validation no;
### What is the current bug behavior?
After upgrading one server of the pair from BIND 9.10 to 9.14, named on the upgraded server went from around 20% CPU consumption to 150%. It is also slower for clients: "it is slow after CPU usage reach 150%, even when I query the cached data on it, it takes more than 200ms to respond"
### What is the expected correct behavior?
Performance as good as (or better than) before the upgrade.
### Relevant configuration files
See above.
### Relevant logs and/or screenshots
There was nothing unusual or different in the logging, the QPS or any of the other usual suspects. Of note, in 9.14, 'dnssec-enable' is no longer a functioning option. As confirmed in the PCAPs, this server is setting the DO bit on queries to authoritative servers and receiving DNSSEC material with query responses (which is then cached).
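For anyone wanting to repeat the DO-bit check on their own captures, here is a minimal sketch in Python. It assumes the raw DNS message bytes (the UDP payload) have already been extracted from the PCAP by some other tool; the function names `do_bit_set` and `_skip_name` are made up for this illustration, and the parser does no bounds checking on malformed packets.

```python
import struct

OPT_TYPE = 41       # EDNS(0) OPT pseudo-RR type (RFC 6891)
DO_MASK = 0x8000    # DO is the top bit of the 16-bit flags held in the OPT TTL field


def _skip_name(msg, pos):
    """Return the offset just past the domain name that starts at pos."""
    while True:
        length = msg[pos]
        if length == 0:                # root label terminates the name
            return pos + 1
        if length & 0xC0 == 0xC0:      # a compression pointer is 2 bytes and ends the name
            return pos + 2
        pos += 1 + length              # ordinary label: length byte plus label bytes


def do_bit_set(msg):
    """Return True if the DNS message carries an OPT record with the DO bit set."""
    qdcount, ancount, nscount, arcount = struct.unpack("!4H", msg[4:12])
    pos = 12                           # fixed-size DNS header
    for _ in range(qdcount):
        pos = _skip_name(msg, pos) + 4  # skip QTYPE and QCLASS
    for _ in range(ancount + nscount + arcount):
        pos = _skip_name(msg, pos)
        rtype, _rclass, ttl, rdlength = struct.unpack("!HHIH", msg[pos:pos + 10])
        if rtype == OPT_TYPE:
            return bool(ttl & DO_MASK)
        pos += 10 + rdlength           # skip fixed RR fields and RDATA
    return False                       # no OPT record at all
```

Running this over the resolver's outbound queries should return True for each of them if the server is requesting DNSSEC material, as was seen here.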
We captured a series of stack snapshots of the running named process using pstack. These showed a surprising number of worker threads calling find_coveringnsec().
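The tallying of the snapshots can be sketched roughly as follows. This is not the tooling actually used on the ticket; it is a hypothetical helper (`threads_by_function` is an invented name) that assumes the usual gdb-style pstack layout of `Thread N (...)` header lines followed by `#N  0x... in function ()` frame lines.

```python
import re
from collections import Counter

# Assumed pstack/gdb layout:
#   Thread 3 (Thread 0x7f... (LWP 1234)):
#   #0  0x00007f... in find_coveringnsec () from ...
THREAD_RE = re.compile(r"^Thread \d+ ")
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+ in )?(\w+)")


def threads_by_function(snapshot):
    """Count, per function name, how many threads have it anywhere on their stack."""
    counts = Counter()
    seen = set()                       # functions seen on the current thread's stack
    for line in snapshot.splitlines():
        line = line.strip()
        if THREAD_RE.match(line):
            counts.update(seen)        # close out the previous thread
            seen = set()
            continue
        frame = FRAME_RE.match(line)
        if frame:
            seen.add(frame.group(1))
    counts.update(seen)                # close out the last thread
    return counts
```

Summing `threads_by_function(text)["find_coveringnsec"]` over a series of snapshots gives a rough picture of how much worker time is spent in that function.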
Notably, on this server and in this environment, a high proportion of negative responses to clients is expected: "there are a lot of invalid/NXDOMAIN dns queries".
Speculatively, we added:
synth-from-dnssec no;
With this new configuration option, performance returned to normal.
Per the ARM:
synth-from-dnssec
Synthesize answers from cached NSEC, NSEC3 and other RRsets that have been proved to be correct using DNSSEC. The default is yes.
Note:
• DNSSEC validation must be enabled for this option to be effective.
• This initial implementation only covers synthesis of answers from NSEC records. Synthesis from NSEC3 is planned for the future. This will also be controlled by synth-from-dnssec.
I would have expected that to mean that the option is disabled when DNSSEC validation is disabled, but it could also be read to mean that the option simply does nothing useful in that case (which makes sense, as you wouldn't want to use unvalidated NSEC RRsets for this). Either way, the high performance penalty was surprising.
The problem may have been more significant in this case due to the notably high proportion of negative cached RRsets/pseudo-RRsets.
### Possible fixes
N/A (but there's a clear workaround - disable this feature)