Segmentation fault while dumping the cache
Summary
Named segmentation faults while dumping the cache in response to the commands rndc dumpdb -cache or rndc dumpdb -all.
This issue does not occur right away after starting named; the server needs to have processed recursive DNS requests for a few hours before the segmentation fault can occur.
BIND version used
# named -V
BIND 9.16.29-RH (Extended Support Version) <id:7e23e43>
running on Linux x86_64 3.10.0-1160.59.1.el7.x86_64 #1 SMP Wed Feb 16 12:17:35 UTC 2022
built by make with '--build=x86_64-redhat-linux-gnu' '--host=x86_64-redhat-linux-gnu' '--program-prefix=' '--disable-dependency-tracking' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--sysconfdir=/etc' '--datadir=/usr/share' '--includedir=/usr/include' '--libdir=/usr/lib64' '--libexecdir=/usr/libexec' '--sharedstatedir=/var/lib' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--with-python=/usr/bin/python' '--with-libtool' '--localstatedir=/var' '--with-pic' '--disable-static' '--includedir=/usr/include/bind9' '--with-tuning=large' '--with-libidn2' '--with-maxminddb' '--enable-native-pkcs11' '--with-pkcs11=/usr/lib64/pkcs11/libsofthsm2.so' '--with-dlopen=yes' '--with-gssapi=yes' '--with-lmdb=no' '--without-libjson' '--with-json-c' '--enable-fixed-rrset' '--enable-full-report' 'build_alias=x86_64-redhat-linux-gnu' 'host_alias=x86_64-redhat-linux-gnu' 'CFLAGS= -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic' 'LDFLAGS=-Wl,-z,relro ' 'PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig'
compiled by GCC 4.8.5 20150623 (Red Hat 4.8.5-44)
compiled with OpenSSL version: OpenSSL 1.0.2k-fips 26 Jan 2017
linked to OpenSSL version: OpenSSL 1.0.2k-fips 26 Jan 2017
compiled with libuv version: 1.44.1
linked to libuv version: 1.44.1
compiled with libxml2 version: 2.9.1
linked to libxml2 version: 20901
compiled with json-c version: 0.11
linked to json-c version: 0.11
compiled with zlib version: 1.2.7
linked to zlib version: 1.2.7
linked to maxminddb version: 1.2.0
threads support is enabled
default paths:
named configuration: /etc/named.conf
rndc configuration: /etc/rndc.conf
DNSSEC root key: /etc/bind.keys
nsupdate session key: /var/run/named/session.key
named PID file: /var/run/named/named.pid
named lock file: /var/run/named/named.lock
geoip-directory: /usr/share/GeoIP
This issue, first observed on our main recursive name servers with version 9.16.27, is still present in versions 9.16.28 and 9.16.29.
Before being updated to 9.16.27, our servers had been running versions 9.16.23, 9.16.22, 9.16.15, and 9.16.13; I never observed this issue with those versions.
Steps to reproduce
Start named, let it process recursive queries for a few hours, then issue the command rndc dumpdb -cache.
What is the current bug behavior?
named crashes with a segmentation fault and dumps a core.
The cache dump file is only partially written.
What is the expected correct behavior?
Named should dump the cache contents to the file and then resume normal operation.
Relevant configuration files
[sanitized]
Relevant logs and/or screenshots
Here are the last messages logged before the crash:
20-May-2022 10:13:13.669 general: info: received control channel command 'stats'
20-May-2022 10:13:13.670 general: info: dumpstats complete
20-May-2022 10:13:14.843 resolver: info: bad cookie from 69.94.141.222#53
20-May-2022 10:13:14.847 resolver: info: bad cookie from 69.94.141.222#53
20-May-2022 10:13:58.378 general: info: received control channel command 'dumpdb -cache'
20-May-2022 10:13:58.383 general: info: dumpdb started: -cache
Here is the cause of the crash as reported by gdb:
Program terminated with signal 11, Segmentation fault.
#0 0x00007fd7bf27ca3b in isc_hp_protect (hp=0x5594a3a9e170, ihp=ihp@entry=0,
atom=atom@entry=0x5594a3aae980) at ../../../lib/isc/hp.c:158
158 atomic_store(&hp->hp[tid()][ihp], ret);
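For context, frame #0 is in the hazard-pointer code that backs the netmgr's lock-free queue. Below is a minimal sketch of the protect pattern around the faulting line, loosely modeled on lib/isc/hp.c; the array sizes and the tid() helper are illustrative assumptions, not BIND's actual definitions. It shows how the store can fault if the calling thread has no valid slot in the per-thread hazard-pointer array:

/* Sketch of the hazard-pointer "protect" pattern; the sizes and the
 * tid() helper are assumptions for illustration. */
#include <stdatomic.h>
#include <stdint.h>

#define HP_MAX_THREADS 128
#define HP_MAX_HPS 4

typedef struct isc_hp {
	atomic_uintptr_t hp[HP_MAX_THREADS][HP_MAX_HPS];
} isc_hp_t;

/* Per-thread slot index; only threads registered with the
 * hazard-pointer domain have a valid one. */
extern int tid(void);

uintptr_t
isc_hp_protect(isc_hp_t *hp, int ihp, atomic_uintptr_t *atom) {
	uintptr_t n = 0;
	uintptr_t ret;

	/* Publish the pointer we are about to dereference, then
	 * re-read the source until it is stable, so that concurrent
	 * reclaimers see the protection.  If tid() yields an index
	 * outside hp[] -- e.g. for a thread that was never registered
	 * -- the store writes out of bounds, which is consistent with
	 * the SIGSEGV at hp.c:158 above. */
	while ((ret = atomic_load(atom)) != n) {
		atomic_store(&hp->hp[tid()][ihp], ret);
		n = ret;
	}
	return (ret);
}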
And here is a backtrace, along with a dump of the *node and *rbtdb variables that led to the decision to trigger node pruning during the iteration over the cache.
#0 0x00007fd7bf27ca3b in isc_hp_protect (hp=0x5594a3a9e170, ihp=ihp@entry=0,
atom=atom@entry=0x5594a3aae980) at ../../../lib/isc/hp.c:158
n = <optimized out>
ret = 140564102797376
#1 0x00007fd7bf29c86c in isc_queue_enqueue (queue=0x5594a3aae900,
item=140564146162096) at ../../../lib/isc/queue.c:142
lt = 0x0
idx = <optimized out>
n = 0
#2 0x00007fd7bf28a7c5 in isc__nm_enqueue_ievent (worker=0x5594a3a9ced0,
event=<optimized out>) at ../../../../lib/isc/netmgr/netmgr.c:1167
No locals.
#3 0x00007fd7bf2a45bd in task_ready (task=0x7fd7b42a66a0)
at ../../../lib/isc/task.c:345
manager = 0x5594a3ac3e30
#4 0x00007fd7bf2a4ad2 in isc_task_sendto (task=<optimized out>,
eventp=eventp@entry=0x7fd7ba78d8a8, c=c@entry=-1)
at ../../../lib/isc/task.c:494
was_idle = <optimized out>
#5 0x00007fd7bf2a4b2a in isc_task_send (task=<optimized out>,
eventp=eventp@entry=0x7fd7ba78d8a8) at ../../../lib/isc/task.c:450
No locals.
#6 0x00007fd7c0562a41 in send_to_prune_tree (
rbtdb=rbtdb@entry=0x7fd7b42a16f0, node=node@entry=0x7fd79d5b0720,
locktype=isc_rwlocktype_write) at ../../../lib/dns/rbtdb.c:1948
ev = 0x0
db = 0x7fd7b42a16f0
#7 0x00007fd7c0563d01 in decrement_reference (
rbtdb=rbtdb@entry=0x7fd7b42a16f0, node=node@entry=0x7fd79d5b0720,
least_serial=<optimized out>, least_serial@entry=0,
nlock=nlock@entry=isc_rwlocktype_read, tlock=isc_rwlocktype_read,
pruning=pruning@entry=false) at ../../../lib/dns/rbtdb.c:2184
result = <optimized out>
write_locked = true
locked = true
nodelock = 0x7fd7b42a1ad8
bucket = 1
no_reference = true
#8 0x00007fd7c0564d4f in dereference_iter_node (rbtdbiter=0x7fd7a167eaa0)
at ../../../lib/dns/rbtdb.c:9295
rbtdb = 0x7fd7b42a16f0
node = 0x7fd79d5b0720
lock = 0x7fd7b42a1ad8
#9 0x00007fd7c0564e16 in dbiterator_next (iterator=0x7fd7a167eaa0)
at ../../../lib/dns/rbtdb.c:9667
result = 65560
rbtdbiter = 0x7fd7a167eaa0
name = 0x7fd7a167ead0
origin = 0x7fd7a167ece0
rbtdb = 0x7fd7b42a16f0
#10 0x00007fd7c052a3b4 in dumptostream (dctx=0x7fd7a9e87920)
at ../../../lib/dns/masterdump.c:1800
rdsiter = 0x0
node = 0x0
result = 0
buffer = {magic = 1114990113, base = 0x7fd7a4001740, length = 4800,
used = 126, current = 0, active = 0, link = {
prev = 0xffffffffffffffff, next = 0xffffffffffffffff}, mctx = 0x0,
autore = false}
bufmem = <optimized out>
name = 0x7fd7ba78da60
fixname = {name = {magic = 1145983854,
ndata = 0x7fd7ba78db70 "\001\071\002\066\062\003\061\071\067\003\061\062\060\ain-addr\004arpa", length = 27, labels = 7, attributes = 1,
offsets = 0x7fd7ba78dab0 "", buffer = 0x7fd7ba78db30, link = {
prev = 0xffffffffffffffff, next = 0xffffffffffffffff}, list = {
head = 0x0, tail = 0x0}},
offsets = "\000\002\005\t\r\025\032 ',\024\026\030\032\034\036 \"$&(*,.02468:<>@DMQ", '\000' <repeats 44 times>, "!fuB\000\000\000\000@\333x\272\327\177\000\000\377", '\000' <repeats 15 times>, '\377' <repeats 16 times>,
buffer = {magic = 1114990113, base = 0x7fd7ba78db70, length = 255,
used = 27, current = 0, active = 0, link = {
prev = 0xffffffffffffffff, next = 0xffffffffffffffff},
mctx = 0x0, autore = false},
data = "\001\071\002\066\062\003\061\071\067\003\061\062\060\ain-addr\004arpa\000\000\000rpa\000a\000\000\000\000r\004arpa\000ar\000\000mutive\003app\000abog0xxphqca\311+l\275\327\177\000\000\000\000\000\000\000\000\000\000\211\375\232\275\327\177\000\000\000\000\000\000\000\000\000\000o\217+\277\327\177\000\000\006\375\000\000\000\000\000\000ū\200\000\000\000\000\000\001\000\000\000\000\000\000\000\244\201\000\000\031\000\000\000\031", '\000' <repeats 15 times>, "\036w4\001\000\000\000\000\000\020\000\000\000\000\000\000\200\377\000\000\000\000\000\000\241\210\205b\000\000\000\000\200:l\276\327\177\000\000\000\000\000\000\000\000\000\000"...}
#11 0x00007fd7c052aa7d in master_dump_cb (data=0x7fd7a9e87920)
at ../../../lib/dns/masterdump.c:1502
result = 61
dctx = 0x7fd7a9e87920
#12 0x00007fd7be4a0434 in worker () from /lib64/libuv.so.1
No symbol table info available.
#13 0x00007fd7bd9a8ea5 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#14 0x00007fd7bd6d1b0d in clone () from /lib64/libc.so.6
No symbol table info available.
Detail for decrement_reference:
#7 0x00007fd7c0563d01 in decrement_reference (
rbtdb=rbtdb@entry=0x7fd7b42a16f0, node=node@entry=0x7fd79d5b0720,
least_serial=<optimized out>, least_serial@entry=0,
nlock=nlock@entry=isc_rwlocktype_read, tlock=isc_rwlocktype_read,
pruning=pruning@entry=false) at ../../../lib/dns/rbtdb.c:2184
2184 send_to_prune_tree(rbtdb, node, isc_rwlocktype_write);
(gdb) print *node
$1 = {magic = 1380077135, is_root = 1, color = 1, find_callback = 0,
attributes = 0, nsec = 0, namelen = 2, offsetlen = 1, oldnamelen = 2,
is_mmapped = 0, parent_is_relative = 0, left_is_relative = 0,
right_is_relative = 0, down_is_relative = 0, data_is_relative = 0,
fullnamelen = 0, rpz = 0, hashval = 2811100944, uppernode = 0x7fd7abe3dbe0,
hashnext = 0x7fd7a95a58f0, parent = 0x7fd7abe3dbe0, left = 0x0, right = 0x0,
down = 0x0, deadlink = {prev = 0xffffffffffffffff,
next = 0xffffffffffffffff}, data = 0x0, dirty = 0 '\000', wild = 0 '\000',
locknum = 1, references = 1}
(gdb) print *rbtdb
$2 = {common = {magic = 1145983812, impmagic = 1380074548,
methods = 0x7fd7c08af560 <cache_methods>, attributes = 1, rdclass = 1,
origin = {magic = 1145983854, ndata = 0x7fd7b42a6080 "", length = 1,
labels = 1, attributes = 15, offsets = 0x7fd7b42a6081 "", buffer = 0x0,
link = {prev = 0xffffffffffffffff, next = 0xffffffffffffffff}, list = {
head = 0x0, tail = 0x0}}, mctx = 0x7fd7b428fa30, update_listeners = {
head = 0x0, tail = 0x0}}, lock = {magic = 1381452907, lock = {__data = {
__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 3,
__spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}},
__size = '\000' <repeats 16 times>, "\003", '\000' <repeats 22 times>,
__align = 0}, spins = -1, write_requests = 0, write_completions = 0,
cnt_and_flag = 0, readable = {__data = {__lock = 0, __futex = 0,
__total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0,
__nwaiters = 0, __broadcast_seq = 0},
__size = '\000' <repeats 47 times>, __align = 0}, writeable = {__data = {
__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0,
__woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0},
__size = '\000' <repeats 47 times>, __align = 0}, readers_waiting = 0,
write_granted = 1, write_quota = 4}, tree_lock = {magic = 1381452907,
lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0,
__kind = 3, __spins = 0, __elision = 0, __list = {__prev = 0x0,
__next = 0x0}},
__size = '\000' <repeats 16 times>, "\003", '\000' <repeats 22 times>,
__align = 0}, spins = -1, write_requests = 0, write_completions = -1,
cnt_and_flag = 1, readable = {__data = {__lock = 0, __futex = 0,
__total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0,
__nwaiters = 0, __broadcast_seq = 0},
__size = '\000' <repeats 47 times>, __align = 0}, writeable = {__data = {
__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0,
__woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0},
__size = '\000' <repeats 47 times>, __align = 0}, readers_waiting = 0,
write_granted = 2809755, write_quota = 4}, node_lock_count = 17,
node_locks = 0x7fd7b42a1a00, origin_node = 0x0, nsec3_origin_node = 0x0,
rrsetstats = 0x7fd7b4290150, cachestats = 0x7fd7b4290110,
gluecachestats = 0x0, active = 17, references = 7, attributes = 0,
current_serial = 1, least_serial = 1, next_serial = 2,
current_version = 0x7fd7b42a6340, future_version = 0x0, open_versions = {
head = 0x7fd7b42a6340, tail = 0x7fd7b42a6340}, task = 0x7fd7b42a66a0,
soanode = 0x0, nsnode = 0x0, serve_stale_ttl = 86400,
serve_stale_refresh = 30, rdatasets = 0x7fd7b42a58a0,
deadnodes = 0x7fd7b42a5f60, hmctx = 0x7fd7b428fb80, heaps = 0x7fd7b42a59c0,
mmap_location = 0x0, mmap_size = 0, tree = 0x7fd7b42a60a0,
nsec = 0x7fd7b42a6180, nsec3 = 0x7fd7b42a6260, quantum = 0}
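To make the chain in frames #0 through #7 easier to follow, here is a hedged reconstruction of the prune path, simplified from what the backtrace shows of lib/dns/rbtdb.c; the event allocation and reference handling are assumptions based on the usual isc_event/isc_task API, not a verbatim copy of the source:

/* Frames #5-#7: when decrement_reference() drops the last reference
 * on a prunable node during cache iteration, it posts a prune event
 * to the rbtdb task instead of pruning inline. */
static void
send_to_prune_tree(dns_rbtdb_t *rbtdb, dns_rbtnode_t *node,
		   isc_rwlocktype_t locktype) {
	isc_event_t *ev;
	dns_db_t *db = NULL;

	ev = isc_event_allocate(rbtdb->common.mctx, NULL,
				DNS_EVENT_RBTPRUNE, prune_tree, node,
				sizeof(isc_event_t));
	new_reference(rbtdb, node); /* keep the node alive for the event */
	attach((dns_db_t *)rbtdb, &db);
	ev->ev_sender = db;

	/* isc_task_send() -> task_ready() -> isc__nm_enqueue_ievent()
	 * -> isc_queue_enqueue() -> isc_hp_protect().  The enqueue
	 * relies on per-thread hazard pointers, so it implicitly
	 * assumes it is called from a thread the netmgr knows about. */
	isc_task_send(rbtdb->task, &ev);
}

If that reading is right, the interesting part is not the prune logic itself but the thread it runs on: the offloaded cache dump (master_dump_cb in frame #11) is executing on a libuv threadpool worker (frame #12, worker() from libuv.so.1), which may explain why isc_hp_protect ends up indexing its per-thread array with an unexpected tid().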
In case it may be useful, here are the statistics that named generated before the cache dump:
[sanitized]