ns1 shutdown hang in "tcp:checking that BIND 9 doesn't crash on long TCP messages"
Job #3639436 failed for 028154d4.
jemalloc arenas were merged to main
and 9.18, but they did not help with the "tcp:checking that BIND 9 doesn't crash on long TCP messages" check (#4038 (closed)) entirely (nor with the isc_mem_benchmark
check of the mem_test
unit test, see #4286 (closed)). Arguably, I've never seen the tcp
check fail four times in a day:
- https://gitlab.isc.org/isc-projects/bind9/-/jobs/3639436
- https://gitlab.isc.org/isc-projects/bind9/-/jobs/3639704
- https://gitlab.isc.org/isc-projects/bind9/-/jobs/3639696
- https://gitlab.isc.org/isc-projects/bind9/-/jobs/3639687
However, the nature of the failure is different: ns1
is not OOM-killed but it didn't terminate in time, that is 5 minutes wasn't enough to terminate and the process was aborted.
2023-09-06 00:20:22 INFO:tcp I:tcp_tmp_n9au2yez:checking that BIND 9 doesn't crash on long TCP messages (10)
2023-09-06 00:20:22 INFO:tcp I:tcp_tmp_n9au2yez:sending 300000 time(s): 00010000000100000000000003697363036f72670000fc0001
2023-09-06 00:20:22 INFO:tcp I:tcp_tmp_n9au2yez:............................................................................................................................................................................................................................................................................................................
2023-09-06 00:20:22 INFO:tcp I:tcp_tmp_n9au2yez:sent 4023683 bytes to 10.53.0.1:20597
2023-09-06 00:20:22 INFO:tcp I:tcp_tmp_n9au2yez:exit status: 0
---------------------------- Captured log teardown -----------------------------
2023-09-06 00:25:23 INFO:tcp I:tcp_tmp_n9au2yez:ns1 didn't die when sent a SIGTERM
2023-09-06 00:25:23 ERROR:tcp Failed to stop servers
2023-09-06 00:25:24 INFO:tcp I:tcp_tmp_n9au2yez:Core dump(s) found: /builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez/ns1/core.387947
2023-09-06 00:25:24 INFO:tcp D:tcp_tmp_n9au2yez:backtrace from /builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez/ns1/core.387947:
2023-09-06 00:25:24 INFO:tcp D:tcp_tmp_n9au2yez:--------------------------------------------------------------------------------
2023-09-06 00:27:07 INFO:tcp D:/builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez:Core was generated by `/builds/isc-projects/bind9/bin/named/.libs/named -D tcp_tmp_n9au2yez-ns1 -X nam'.
2023-09-06 00:27:07 INFO:tcp D:/builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez:Program terminated with signal SIGABRT, Aborted.
2023-09-06 00:27:07 INFO:tcp D:/builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez:#0 0x00007f8a0a4c3129 in pthread_barrier_wait@GLIBC_2.2.5 () from /lib64/libc.so.6
2023-09-06 00:27:07 INFO:tcp D:/builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez:[Current thread is 1 (Thread 0x7f8a09eaa580 (LWP 387947))]
2023-09-06 00:27:07 INFO:tcp D:/builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez:#0 0x00007f8a0a4c3129 in pthread_barrier_wait@GLIBC_2.2.5 () from /lib64/libc.so.6
2023-09-06 00:27:07 INFO:tcp D:/builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez:#1 0x00007f8a0b2432a2 in stop_tcp_child_job (arg=0x7f8a09ab6800) at netmgr/tcp.c:589
2023-09-06 00:27:07 INFO:tcp D:/builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez:#2 0x00007f8a0b243372 in stop_tcp_child (sock=<optimized out>) at netmgr/tcp.c:597
2023-09-06 00:27:07 INFO:tcp D:/builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez:#3 0x00007f8a0b243b21 in isc__nm_tcp_stoplistening (sock=0x7f8a09a77800) at netmgr/tcp.c:622
2023-09-06 00:27:07 INFO:tcp D:/builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez:#4 0x00007f8a0b23b359 in isc_nm_stoplistening (sock=<optimized out>) at netmgr/netmgr.c:1699
2023-09-06 00:27:07 INFO:tcp D:/builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez:#5 0x00007f8a0b23dc62 in isc__nmsocket_stop (listener=0x7f8a09a76e00) at netmgr/netmgr.c:1730
2023-09-06 00:27:07 INFO:tcp D:/builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez:#6 0x00007f8a0b24183e in isc__nm_streamdns_stoplistening (sock=<optimized out>) at netmgr/streamdns.c:962
2023-09-06 00:27:07 INFO:tcp D:/builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez:#7 0x00007f8a0b23b360 in isc_nm_stoplistening (sock=<optimized out>) at netmgr/netmgr.c:1702
2023-09-06 00:27:07 INFO:tcp D:/builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez:#8 0x00007f8a0afcab27 in ns_interface_shutdown (ifp=ifp@entry=0x7f8a09ad7980) at interfacemgr.c:729
2023-09-06 00:27:07 INFO:tcp D:/builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez:#9 0x00007f8a0afcaf9a in purge_old_interfaces (mgr=mgr@entry=0x7f8a09a70500) at interfacemgr.c:815
2023-09-06 00:27:07 INFO:tcp D:/builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez:#10 0x00007f8a0afcb13e in ns_interfacemgr_shutdown (mgr=0x7f8a09a70500) at interfacemgr.c:435
2023-09-06 00:27:07 INFO:tcp D:/builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez:#11 0x0000000000445bf1 in shutdown_server (arg=0x7f8a09a9f700) at server.c:9983
2023-09-06 00:27:07 INFO:tcp D:/builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez:#12 0x00007f8a0b24b383 in isc__async_cb (handle=<optimized out>) at async.c:111
2023-09-06 00:27:07 INFO:tcp D:/builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez:#13 0x00007f8a0a977bd3 in ?? () from /lib64/libuv.so.1
2023-09-06 00:27:07 INFO:tcp D:/builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez:#14 0x00007f8a0a99457b in ?? () from /lib64/libuv.so.1
2023-09-06 00:27:07 INFO:tcp D:/builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez:#15 0x00007f8a0a97d097 in uv_run () from /lib64/libuv.so.1
2023-09-06 00:27:07 INFO:tcp D:/builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez:#16 0x00007f8a0b25d1ba in loop_thread (arg=arg@entry=0x7f8a09aac800) at loop.c:282
2023-09-06 00:27:07 INFO:tcp D:/builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez:#17 0x00007f8a0b26ca20 in thread_body (wrap=0xb788b0) at thread.c:85
2023-09-06 00:27:07 INFO:tcp D:/builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez:#18 0x00007f8a0b26ca99 in isc_thread_main (func=func@entry=0x7f8a0b25d12f <loop_thread>, arg=0x7f8a09aac800) at thread.c:116
2023-09-06 00:27:07 INFO:tcp D:/builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez:#19 0x00007f8a0b25e109 in isc_loopmgr_run (loopmgr=0x7f8a09a6f6c0) at loop.c:454
2023-09-06 00:27:07 INFO:tcp D:/builds/isc-projects/bind9/bin/tests/system/tcp_tmp_n9au2yez:#20 0x0000000000426faa in main (argc=16, argv=0x7fff1308dae8) at main.c:1592
2023-09-06 00:27:07 INFO:tcp D:tcp_tmp_n9au2yez:--------------------------------------------------------------------------------
06-Sep-2023 00:20:58.284 client @0x7f895a786800 10.53.0.1#53464 (isc.org): bad zone transfer request: 'isc.org/IN': non-authoritative zone (NOTAUTH)
06-Sep-2023 00:20:58.284 client @0x7f895a787400 10.53.0.1#53464 (isc.org): bad zone transfer request: 'isc.org/IN': non-authoritative zone (NOTAUTH)
06-Sep-2023 00:20:58.284 client @0x7f895a788000 10.53.0.1#53464 (isc.org): bad zone transfer request: 'isc.org/IN': non-authoritative zone (NOTAUTH)
06-Sep-2023 00:20:58.284 client @0x7f895a788c00 10.53.0.1#53464 (isc.org): bad zone transfer request: 'isc.org/IN': non-authoritative zone (NOTAUTH)
06-Sep-2023 00:20:58.284 client @0x7f895a789800 10.53.0.1#53464 (isc.org): bad zone transfer request: 'isc.org/IN': non-authoritative zone (NOTAUTH)
06-Sep-2023 00:20:58.284 client @0x7f895a78a400 10.53.0.1#53464 (isc.org): bad zone transfer request: 'isc.org/IN': non-authoritative zone (NOTAUTH)
06-Sep-2023 00:20:58.284 client @0x7f895a7a5000 10.53.0.1#53464 (isc.org): bad zone transfer request: 'isc.org/IN': non-authoritative zone (NOTAUTH)
06-Sep-2023 00:20:58.284 client @0x7f895a7a5c00 10.53.0.1#53464 (isc.org): bad zone transfer request: 'isc.org/IN': non-authoritative zone (NOTAUTH)
06-Sep-2023 00:20:58.284 netmgr 0x7f8a09a6f900: Shutting down network manager worker on loop 0x7f8a09aae180(3)
06-Sep-2023 00:20:58.284 netmgr 0x7f8a09a6f900: Shutting down network manager worker on loop 0x7f8a09aad900(2)
Locally, I can't reproduce it; ns1
terminates at worst in 210 seconds.