BIND 9.16.3 segfault in isc__nm_tcpdns_send
From support ticket #16727 (details of the core dump etc. can be found there).
Crash on BIND 9.16.3 (Stable Release) id:5ea41c1, built with the "no rehash" patch (the patch that makes it possible to start named with larger hash tables, so that the hash tables are not resized as the cache grows).
core /var/log/splunk/core/core.19901
Core was generated by `/local/sbin/named -f -c /etc/named/named.conf -u named -n 12'.
Program terminated with signal 11, Segmentation fault.
#0 0x0000000000637859 in isc__nm_tcpdns_send (handle=0x7f01eb0cdcf0, region=0x7f01ff3c2700, cb=0x478fc0 <client_senddone>, cbarg=0x7f01eb0cde60)
at tcpdns.c:483
483 tcpdns.c: No such file or directory.
Missing separate debuginfos, use: debuginfo-install glibc-2.17-260.el7_6.5.x86_64 json-c-0.11-4.el7_0.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-37.el7_6.x86_64 libattr-2.4.46-13.el7.x86_64 libcap-2.22-9.el7.x86_64 libcom_err-1.42.9-13.el7.x86_64 libselinux-2.5-14.1.el7.x86_64 libuv-1.37.0-1.el7.x86_64 openssl-libs-1.0.2k-16.el7_6.1.x86_64 pcre-8.32-17.el7.x86_64 sssd-client-1.16.2-13.el7_6.8.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) bt
#0 0x0000000000637859 in isc__nm_tcpdns_send (handle=0x7f01eb0cdcf0, region=0x7f01ff3c2700, cb=0x478fc0 <client_senddone>, cbarg=0x7f01eb0cde60)
at tcpdns.c:483
#1 0x000000000063231d in isc_nm_send (handle=<optimized out>, region=<optimized out>, cb=<optimized out>, cbarg=<optimized out>) at netmgr.c:1309
#2 0x00000000004770d7 in client_sendpkg (client=client@entry=0x7f01eb0cde60, buffer=0x7f01ff3c2780, buffer=0x7f01ff3c2780) at client.c:366
#3 0x0000000000478064 in ns_client_send (client=client@entry=0x7f01eb0cde60) at client.c:634
#4 0x0000000000485a6c in query_send (client=0x7f01eb0cde60) at query.c:552
#5 0x000000000048dd13 in ns_query_done (qctx=qctx@entry=0x7f01ff3c4830) at query.c:10921
#6 0x000000000048f65d in query_respond (qctx=0x7f01ff3c4830) at query.c:7414
#7 query_prepresponse (qctx=qctx@entry=0x7f01ff3c4830) at query.c:9913
#8 0x000000000049170c in query_gotanswer (qctx=qctx@entry=0x7f01ff3c4830, res=res@entry=0) at query.c:6836
#9 0x0000000000496760 in query_resume (qctx=0x7f01ff3c4830) at query.c:6134
#10 fetch_callback (task=<optimized out>, event=0x7f0157df1490) at query.c:5716
#11 0x000000000064168a in dispatch (threadid=<optimized out>, manager=<optimized out>) at task.c:1152
#12 run (queuep=<optimized out>) at task.c:1344
#13 0x00007f020a26bdd5 in start_thread () from /lib64/libpthread.so.0
#14 0x00007f0209b76ead in clone () from /lib64/libc.so.6
(gdb) info frame 0
(gdb) info locals
t = 0x7f01a3fc7f28
sock = 0x7f015c8c9e10
(gdb) print *t
$1 = {mctx = 0x7f0193bfdd78, handle = 0x7f01eff53560, region = {base = 0x0, length = 215}, orighandle = 0x7f00f447c4a0, cb = 0x478fc0 <client_senddone>,
cbarg = 0x7f00f447c610}
(gdb) print *t->mtx
There is no member named mtx.
(gdb) print *(t->mctx)
$2 = {impmagic = 1337724176, magic = 32513, methods = 0x7f01dffd1690}
(gdb) print *(t->handle)
$3 = {magic = 0, references = 0, sock = 0x0, ah_pos = 0, inflight = false, peer = {type = {sa = {sa_family = 0, sa_data = '\000' <repeats 13 times>}, sin = {
sin_family = 0, sin_port = 0, sin_addr = {s_addr = 0}, sin_zero = "\000\000\000\000\000\000\000"}, sin6 = {sin6_family = 0, sin6_port = 0,
sin6_flowinfo = 0, sin6_addr = {__in6_u = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0,
0}}}, sin6_scope_id = 0}, ss = {ss_family = 0, __ss_padding = '\000' <repeats 117 times>, __ss_align = 0}, sunix = {sun_family = 0,
sun_path = '\000' <repeats 107 times>}}, length = 0, link = {prev = 0x0, next = 0x0}}, local = {type = {sa = {sa_family = 0,
sa_data = '\000' <repeats 13 times>}, sin = {sin_family = 0, sin_port = 0, sin_addr = {s_addr = 0}, sin_zero = "\000\000\000\000\000\000\000"},
sin6 = {sin6_family = 0, sin6_port = 0, sin6_flowinfo = 0, sin6_addr = {__in6_u = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0,
0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, sin6_scope_id = 0}, ss = {ss_family = 0, __ss_padding = '\000' <repeats 117 times>, __ss_align = 0},
sunix = {sun_family = 0, sun_path = '\000' <repeats 107 times>}}, length = 0, link = {prev = 0x0, next = 0x0}}, doreset = 0x0, dofree = 0x0,
opaque = 0x0, extra = 0x7f01eff536d0 ""}
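The handle printed above is entirely zeroed: `magic = 0`, `references = 0`, `sock = 0x0`. BIND validates its objects with ISC_MAGIC-style magic numbers, and the following is a minimal, hypothetical sketch of that kind of validity test. The `SKETCH_*` macros, `sketch_handle_t` and `sketch_handle_valid()` are illustrative names rather than BIND's API, and the packed magic value is an assumption. It shows that a handle in this state fails every check, which suggests (but does not prove) that it had already been released, or was never initialised, when `isc__nm_tcpdns_send` used it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ISC-style magic number: four ASCII characters packed into
 * one 32-bit word (assumption, modelled on the ISC_MAGIC() convention). */
#define SKETCH_MAGIC(a, b, c, d)                                  \
	((uint32_t)(a) << 24 | (uint32_t)(b) << 16 |              \
	 (uint32_t)(c) << 8 | (uint32_t)(d))
#define SKETCH_HANDLE_MAGIC SKETCH_MAGIC('N', 'M', 'H', 'D')

/* Hypothetical, heavily reduced handle: only the fields inspected above. */
typedef struct sketch_handle {
	uint32_t magic;
	uint32_t references;
	void *sock;
} sketch_handle_t;

/* Returns true only for a live handle: non-NULL pointer, correct magic,
 * a positive reference count and an attached socket. A zeroed handle,
 * like the one in the core dump, fails every one of these tests. */
static bool
sketch_handle_valid(const sketch_handle_t *h) {
	return h != NULL && h->magic == SKETCH_HANDLE_MAGIC &&
	       h->references > 0 && h->sock != NULL;
}
```

In a debug build, an assertion on such a check at the top of the send path would turn a silent NULL dereference into an explicit assertion failure naming the bad handle.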
(gdb) print *(t->cbarg)
Attempt to dereference a generic pointer.
# ls -lst --full-time /var/log/splunk/core/core.19901
3243708 -rw------- 1 named named 3637059584 2020-06-13 21:42:52.861692911 +0200 /var/log/splunk/core/core.19901
Last syslog messages were:
2020-06-13T21:42:42.464+02:00 dispatch: dispatch 0x7f01b2a57550: shutting down due to TCP receive error: 172.105.106.137#53: connection reset
2020-06-13T21:42:42.558+02:00 dispatch: dispatch 0x7f01b3b48cc0: shutting down due to TCP receive error: 172.105.106.137#53: connection reset
And just before them (the log entries are slightly out of sequence):
2020-06-13T21:42:42.000+02:00 2020-06-13T21:42:42+02:00 ti0016o823.ti.telenor.net kernel: [30703183.144337] isc-worker0005[19921]: segfault at 118 ip 0000000000637859 sp 00007f01ff3c26c0 error 4 in named[400000+2f9000]
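The kernel line can be read with the x86 page-fault error-code bits: `ip 0000000000637859` matches frame #0 in `isc__nm_tcpdns_send`, and `error 4` means a user-mode read of a page that is not present. The faulting address, 0x118, lies well below the first page, which is the usual signature of reading a struct member through a NULL (or near-NULL) pointer, where the address equals the member's offset. The dumped handle has `sock = 0x0`, which would fit that reading, although the structure layout has not been checked here. The sketch below decodes the bits; the function and type names are hypothetical, and the 4 KiB threshold assumes the common x86-64 page size.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 page-fault error-code bits, as printed by the Linux kernel in
 * "segfault at ... error N" messages. */
enum {
	PF_PROT = 1u << 0,  /* 0: page not present, 1: protection violation */
	PF_WRITE = 1u << 1, /* 0: read access,      1: write access */
	PF_USER = 1u << 2,  /* 0: kernel mode,      1: user mode */
	PF_RSVD = 1u << 3,  /* reserved bit set in a page-table entry */
	PF_INSTR = 1u << 4, /* fault was an instruction fetch */
};

/* Hypothetical decoded view of the error code. */
typedef struct fault_info {
	bool not_present;
	bool write;
	bool user;
	bool instr_fetch;
} fault_info_t;

static fault_info_t
decode_pf_error(uint32_t code) {
	fault_info_t fi = {
		.not_present = (code & PF_PROT) == 0,
		.write = (code & PF_WRITE) != 0,
		.user = (code & PF_USER) != 0,
		.instr_fetch = (code & PF_INSTR) != 0,
	};
	return fi;
}

/* A fault address inside the first page (assumed 4096 bytes) is typically
 * a member access through a NULL pointer: address == member offset. */
static bool
looks_like_null_member_access(uintptr_t fault_addr) {
	return fault_addr < 4096;
}
```

For this crash, `decode_pf_error(4)` reports a not-present, user-mode read, and `looks_like_null_member_access(0x118)` returns true.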
A full gdb backtrace of all threads is attached to the support ticket. The binaries and libraries are on a separate support ticket, #16728.
====
Note that this server then had trouble restarting: it repeatedly hit a different crash on startup.
Edited by Cathy Almond