nsupdate leaks memory when using GSS-TSIG and receiving SIGTERM at a "right" time
Summary
I'm reporting https://bugs.isc.org/SelfService/Display.html?id=46044 here, because it seems that bugs.isc.org is not getting any attention.
While running tests for the sssd project, which runs nsupdate in the background, we discovered a bug in nsupdate itself. During the test, sssd runs nsupdate with GSSAPI as a child process. However, sssd then fails to perform some operation and sends SIGTERM to all of its child processes, including nsupdate.
The problem is that if nsupdate receives SIGTERM just after returning from the recvgss() function, but before receiving a response from the server, it terminates the event loop and eventually ends up in the cleanup() function. cleanup() does not free the "tmpzonename" and "restart_master" variables, which are allocated in recvsoa() and freed only in update_completed(). When the memory context is destroyed in cleanup(), nsupdate aborts on an assertion failure, because these two variables were never freed.
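The leak described above can be pictured with a small stand-alone model. This is only a sketch and not BIND code: `toy_mctx`, `toy_get`, `toy_put` and `toy_run` are hypothetical names, and the allocation counter merely imitates the kind of bookkeeping that lets a memory context notice unfreed memory when it is destroyed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a memory context that counts live allocations. */
typedef struct {
    size_t outstanding;
} toy_mctx;

void *toy_get(toy_mctx *ctx, size_t size) {
    void *p = malloc(size);
    if (p == NULL)
        abort();                /* keep the toy simple: no recovery path */
    ctx->outstanding++;
    return p;
}

void toy_put(toy_mctx *ctx, void **pp) {
    assert(*pp != NULL);        /* a second free of the same slot trips here */
    free(*pp);
    *pp = NULL;
    ctx->outstanding--;
}

/*
 * Walks through the sequence from the report and returns how many
 * allocations are still live at the point where the context would be
 * destroyed. A non-zero result corresponds to the failed INSIST.
 */
size_t toy_run(bool sigterm_before_reply) {
    toy_mctx ctx = { 0 };

    /* recvsoa() step: both variables are allocated here. */
    void *tmpzonename = toy_get(&ctx, 16);
    void *restart_master = toy_get(&ctx, 16);

    if (!sigterm_before_reply) {
        /* update_completed() step: the only place that frees them. */
        toy_put(&ctx, &tmpzonename);
        toy_put(&ctx, &restart_master);
    }

    /* cleanup() step: frees neither variable, then checks the context. */
    size_t leaked = ctx.outstanding;

    /* Release for real so the model itself does not leak. */
    free(tmpzonename);
    free(restart_master);
    return leaked;
}
```

In this model `toy_run(false)` returns 0 and `toy_run(true)` returns 2: when the signal arrives before the reply, the update_completed() step never runs and both allocations are still counted at destruction time.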
Steps to reproduce
I created a reproducer using gdb that sends SIGTERM to nsupdate right before it exits the recvgss() function. nsupdate then crashes in different places because of race conditions during application shutdown. Sometimes the crash happens because memory was not freed; sometimes it happens because some structures are freed a second time.
The reproducer requires GDB > 7.5, a DNS server that allows zone updates, and nsupdate using GSSAPI (with all the prerequisites in place, such as a Kerberos ticket).
[root@vm-058-217 ~]# cat nsupdate.cmd
update add bumblebee10.test.example.com. 3600 IN A 192.168.1.1
send
[root@vm-058-217 ~]# cat nsupdate_reproducer.gdb
# set breakpoint at the end of recvgss() function
break ddebug if $_streq(format, "Out of recvgss")
handle SIGTERM nostop
# run nsupdate
run -g nsupdate.cmd
# send a SIGTERM to nsupdate, causing it to end immediately
# we have to do it from shell, because gdb is too slow with sending the signal
shell kill -s SIGTERM $(pidof nsupdate) && sleep 1 && exit
continue
The reproducer does not succeed on every run, but it makes nsupdate crash with some error most of the time. I successfully reproduced the issue with bind 9.11.1 and, judging from the changes in git, it does not seem to be fixed yet. (This was relevant at the end of 2017.)
What is the current bug behavior?
nsupdate sometimes crashes while processing the SIGTERM signal.
What is the expected correct behavior?
nsupdate should exit cleanly, without an assertion failure.
Relevant configuration files
n/a
Relevant logs and/or screenshots
crash #1 (on Fedora 26 with bind 9.11.1)
[root@qeos-186 ~]# gdb --command=nsupdate_reproducer.gdb nsupdate
GNU gdb (GDB) Fedora 8.0-24.fc26
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from nsupdate...Reading symbols from /usr/lib/debug/usr/bin/nsupdate.debug...done.
done.
Breakpoint 1 at 0x4ee0: file ./nsupdate.c, line 366.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff4142700 (LWP 16820)]
[New Thread 0x7ffff3941700 (LWP 16821)]
[New Thread 0x7ffff3140700 (LWP 16822)]
[Switching to Thread 0x7ffff3140700 (LWP 16822)]
Thread 4 "nsupdate" hit Breakpoint 1, ddebug (format=0x555555560164 "Out of recvgss") at ./nsupdate.c:366
366 ddebug(const char *format, ...) {
Thread 1 "nsupdate" received signal SIGTERM, Terminated.
[Thread 0x7ffff3140700 (LWP 16822) exited]
[Thread 0x7ffff4142700 (LWP 16820) exited]
[Thread 0x7ffff3941700 (LWP 16821) exited]
Failing assertion due to probable leaked memory in context 0x555555764030 ("") (stats[18].gets == 1).
mem.c:1072: INSIST(ctx->stats[i].gets == 0U) failed, back trace
#0 0x7ffff70fdd27 in ??
#1 0x7ffff70fdc7a in ??
#2 0x7ffff7110f5c in ??
#3 0x7ffff71111e5 in ??
#4 0x7ffff7114e55 in ??
#5 0x555555558b70 in ??
#6 0x7ffff4c2a50a in ??
#7 0x555555558d4a in ??
Thread 1 "nsupdate" received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffff7fe0800 (LWP 16816)]
0x00007ffff4c4069b in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: dnf debuginfo-install GeoIP-1.6.11-1.fc26.x86_64 glibc-2.25-10.fc26.x86_64 gssproxy-0.7.0-9.fc26.x86_64 keyutils-libs-1.5.10-1.fc26.x86_64 krb5-libs-1.15.1-28.fc26.x86_64 libcap-2.25-5.fc26.x86_64 libcom_err-1.43.4-2.fc26.x86_64 libgcc-7.1.1-3.fc26.x86_64 libselinux-2.6-7.fc26.x86_64 libxml2-2.9.4-2.fc26.x86_64 openssl-libs-1.1.0f-7.fc26.x86_64 pcre-8.41-1.fc26.x86_64 xz-libs-5.2.3-2.fc26.x86_64 zlib-1.2.11-2.fc26.x86_64
(gdb) bt
#0 0x00007ffff4c4069b in raise () from /lib64/libc.so.6
#1 0x00007ffff4c424a0 in abort () from /lib64/libc.so.6
#2 0x00007ffff70fdc7f in isc_assertion_failed (file=file@entry=0x7ffff7149708 "mem.c", line=line@entry=1072, type=type@entry=isc_assertiontype_insist,
cond=cond@entry=0x7ffff71497ed "ctx->stats[i].gets == 0U") at assertions.c:50
#3 0x00007ffff7110f5c in destroy (ctx=ctx@entry=0x555555764030) at mem.c:1072
#4 0x00007ffff71111e5 in isc__mem_destroy (ctxp=0x555555763a68 <gmctx>) at mem.c:1225
#5 0x00007ffff7114e55 in isc_mem_destroy (mctxp=0x555555763a68 <gmctx>) at mem.c:2752
#6 0x0000555555558b70 in cleanup () at ./nsupdate.c:3200
#7 main (argc=3, argv=0x7fffffffe318) at ./nsupdate.c:3252
crash #2 (on RHEL-7 with bind 9.9.4)
[root@vm-058-217 ~]# gdb --command=nsupdate_reproducer.gdb nsupdate
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-103.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/nsupdate...Reading symbols from /usr/lib/debug/usr/bin/nsupdate.debug...done.
done.
Breakpoint 1 at 0x404f20: file ./nsupdate.c, line 344.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff3e08700 (LWP 23095)]
[New Thread 0x7ffff3607700 (LWP 23096)]
[New Thread 0x7ffff2e06700 (LWP 23097)]
[Switching to Thread 0x7ffff2e06700 (LWP 23097)]
Breakpoint 1, ddebug (format=format@entry=0x40bb12 "Out of recvgss") at ./nsupdate.c:344
344 ddebug(const char *format, ...) {
Program received signal SIGTERM, Terminated.
request.c:249: REQUIRE((((requestmgr) != ((void *)0)) && (((const isc__magic_t *)(requestmgr))->magic == ((('R') << 24 | ('q') << 16 | ('u') << 8 | ('M')))))) failed, back trace
#0 0x7ffff6f61287 in ??
#1 0x7ffff6f611da in ??
#2 0x7ffff78ec097 in ??
#3 0x408a1b in ??
#4 0x408e95 in ??
#5 0x7ffff6f840c6 in ??
#6 0x7ffff5d67dc5 in ??
#7 0x7ffff4de07ad in ??
Program received signal SIGABRT, Aborted.
0x00007ffff4d1e1d7 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-161.el7.x86_64 gssproxy-0.7.0-5.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-10.el7.x86_64 libattr-2.4.46-12.el7.x86_64 libcap-2.22-8.el7.x86_64 libcom_err-1.42.9-9.el7.x86_64 libgcc-4.8.5-11.el7.x86_64 libselinux-2.5-8.el7.x86_64 openssl-libs-1.0.2k-1.el7.x86_64 pcre-8.32-17.el7.x86_64 xz-libs-5.2.2-1.el7.x86_64
(gdb) bt
#0 0x00007ffff4d1e1d7 in raise () from /lib64/libc.so.6
#1 0x00007ffff4d1f8c8 in abort () from /lib64/libc.so.6
#2 0x00007ffff6f611df in isc_assertion_failed (file=file@entry=0x7ffff7988ac6 "request.c", line=line@entry=249, type=type@entry=isc_assertiontype_require,
cond=cond@entry=0x7ffff79890d0 "(((requestmgr) != ((void *)0)) && (((const isc__magic_t *)(requestmgr))->magic == ((('R') << 24 | ('q') << 16 | ('u') << 8 | ('M')))))")
at assertions.c:58
#3 0x00007ffff78ec097 in dns_requestmgr_shutdown (requestmgr=0x0) at request.c:249
#4 0x0000000000408a1b in maybeshutdown () at ./nsupdate.c:761
#5 0x0000000000408e95 in getinput (task=<optimized out>, event=<optimized out>) at ./nsupdate.c:2929
#6 0x00007ffff6f840c6 in dispatch (manager=0x7ffff7fb1010) at task.c:1116
#7 run (uap=0x7ffff7fb1010) at task.c:1286
#8 0x00007ffff5d67dc5 in start_thread () from /lib64/libpthread.so.0
#9 0x00007ffff4de07ad in clone () from /lib64/libc.so.6
crash #3 (on RHEL-7 with bind 9.9.4)
[root@vm-058-217 ~]# gdb --command=nsupdate_reproducer.gdb nsupdate
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-103.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/nsupdate...Reading symbols from /usr/lib/debug/usr/bin/nsupdate.debug...done.
done.
Breakpoint 1 at 0x404f20: file ./nsupdate.c, line 344.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff3e08700 (LWP 22986)]
[New Thread 0x7ffff3607700 (LWP 22987)]
[New Thread 0x7ffff2e06700 (LWP 22988)]
[Switching to Thread 0x7ffff2e06700 (LWP 22988)]
Breakpoint 1, ddebug (format=format@entry=0x40bb12 "Out of recvgss") at ./nsupdate.c:344
344 ddebug(const char *format, ...) {
Program received signal SIGTERM, Terminated.
tsig.c:2016: REQUIRE(*ringp != ((void *)0)) failed, back trace
#0 0x7ffff6f61287 in ??
#1 0x7ffff6f611da in ??
#2 0x7ffff79163a9 in ??
#3 0x404aac in ??
#4 0x7ffff4d0ab35 in ??
#5 0x404da3 in ??
Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffff7fe3840 (LWP 22982)]
0x00007ffff4d1e1d7 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-161.el7.x86_64 gssproxy-0.7.0-5.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-10.el7.x86_64 libattr-2.4.46-12.el7.x86_64 libcap-2.22-8.el7.x86_64 libcom_err-1.42.9-9.el7.x86_64 libgcc-4.8.5-11.el7.x86_64 libselinux-2.5-8.el7.x86_64 openssl-libs-1.0.2k-1.el7.x86_64 pcre-8.32-17.el7.x86_64 xz-libs-5.2.2-1.el7.x86_64
(gdb) bt
#0 0x00007ffff4d1e1d7 in raise () from /lib64/libc.so.6
#1 0x00007ffff4d1f8c8 in abort () from /lib64/libc.so.6
#2 0x00007ffff6f611df in isc_assertion_failed (file=file@entry=0x7ffff79906b0 "tsig.c", line=line@entry=2016, type=type@entry=isc_assertiontype_require,
cond=cond@entry=0x7ffff79908da "*ringp != ((void *)0)") at assertions.c:58
#3 0x00007ffff79163a9 in dns_tsigkeyring_detach (ringp=ringp@entry=0x60ed80 <gssring>) at tsig.c:2016
#4 0x0000000000404aac in cleanup () at ./nsupdate.c:2880
#5 main (argc=3, argv=0x7fffffffe3a8) at ./nsupdate.c:2971
Possible fixes
While the memory leak that caused the crash in this case can be fixed easily, I was able to reproduce other crashes caused by the same race condition at shutdown. A "complete" fix is therefore not trivial, given the complexity of the shutdown sequence in nsupdate.
A WIP fix is attached to Red Hat bug https://bugzilla.redhat.com/show_bug.cgi?id=1300636, but it probably won't apply cleanly to the git master HEAD.
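As a rough illustration of the "easy" part only, the sketch below shows one common pattern for a shutdown path that may or may not run after the normal completion path: a release helper that frees a pointer only if it is still set and then clears it. This is an assumption about what such a fix could look like, not the attached WIP patch and not BIND's API; `demo_mctx`, `demo_get`, `demo_release` and `demo_shutdown` are hypothetical names. It is single-threaded, so it says nothing about the race between the signal and the task threads.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a memory context that counts live allocations. */
typedef struct {
    size_t outstanding;
} demo_mctx;

void *demo_get(demo_mctx *ctx, size_t size) {
    void *p = malloc(size);
    if (p == NULL)
        abort();                /* keep the sketch simple: no recovery path */
    ctx->outstanding++;
    return p;
}

/* Free *pp if it is still set, then clear it. Calling it twice is harmless. */
void demo_release(demo_mctx *ctx, void **pp) {
    if (*pp == NULL)
        return;                 /* already released on another path */
    free(*pp);
    *pp = NULL;
    ctx->outstanding--;
}

/*
 * Returns the number of live allocations once the cleanup step has run,
 * whether or not the completion step ran before it.
 */
size_t demo_shutdown(int completed_first) {
    demo_mctx ctx = { 0 };
    void *tmpzonename = demo_get(&ctx, 16);
    void *restart_master = demo_get(&ctx, 16);

    if (completed_first) {
        /* Normal path, standing in for update_completed(). */
        demo_release(&ctx, &tmpzonename);
        demo_release(&ctx, &restart_master);
    }

    /* Shutdown path, standing in for cleanup(): release whatever is left. */
    demo_release(&ctx, &tmpzonename);
    demo_release(&ctx, &restart_master);

    return ctx.outstanding;
}
```

Both `demo_shutdown(0)` and `demo_shutdown(1)` return 0 here. The same "check, free, clear" shape avoids the leak when the completion step is skipped and avoids a second free when it is not, but the other crashes in this report come from ordering between threads, which a helper like this does not address.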