Intermittent named hangs triggered in the "checkds" system test
The checkds system test has been intermittently failing (with "increased intensity") for the main branch for about two weeks now.
Sample job links follow (in chronological order):
- https://gitlab.isc.org/isc-projects/bind9/-/jobs/3022378
- https://gitlab.isc.org/isc-projects/bind9/-/jobs/3022797
- https://gitlab.isc.org/isc-projects/bind9/-/jobs/3022804
- https://gitlab.isc.org/isc-projects/bind9/-/jobs/3022807
- https://gitlab.isc.org/isc-projects/bind9/-/jobs/3024434
- https://gitlab.isc.org/isc-projects/bind9/-/jobs/3025404
- https://gitlab.isc.org/isc-projects/bind9/-/jobs/3025408
- https://gitlab.isc.org/isc-projects/bind9/-/jobs/3025412
- https://gitlab.isc.org/isc-projects/bind9/-/jobs/3027628
- https://gitlab.isc.org/isc-projects/bind9/-/jobs/3027638
- https://gitlab.isc.org/isc-projects/bind9/-/jobs/3027647
- https://gitlab.isc.org/isc-projects/bind9/-/jobs/3030540
- https://gitlab.isc.org/isc-projects/bind9/-/jobs/3030548
- https://gitlab.isc.org/isc-projects/bind9/-/jobs/3031221
- https://gitlab.isc.org/isc-projects/bind9/-/jobs/3031231
I retained the artifacts for the last job above (3031231).
This seems to happen "across the board" on main, not on any particular platform.
As usually happens with named hangs, the CI job logs do not contain any specific hints as to what to look at in particular:
I:checkds:stopping servers
I:checkds:ns9 didn't die when sent a SIGTERM
I:checkds:stopping servers failed
I:checkds:Core dump(s) found: checkds/ns9/core.28623
D:checkds:backtrace from checkds/ns9/core.28623:
D:checkds:--------------------------------------------------------------------------------
D:checkds:Core was generated by `/builds/isc-projects/bind9/bin/named/.libs/named -D checkds-ns9 -X named.lock -'.
D:checkds:Program terminated with signal SIGABRT, Aborted.
D:checkds:#0 futex_wait (private=0, expected=0, futex_word=0x7f6a64c23454) at ../sysdeps/nptl/futex-internal.h:144
D:checkds:[Current thread is 1 (Thread 0x7f6a6515b140 (LWP 28623))]
D:checkds:#0 futex_wait (private=0, expected=0, futex_word=0x7f6a64c23454) at ../sysdeps/nptl/futex-internal.h:144
D:checkds:#1 futex_wait_simple (private=0, expected=0, futex_word=0x7f6a64c23454) at ../sysdeps/nptl/futex-internal.h:175
D:checkds:#2 __pthread_barrier_wait (barrier=0x7f6a64c23450) at pthread_barrier_wait.c:184
D:checkds:#3 0x00007f6a67dea2e5 in uv_barrier_wait (barrier=0x7f6a64c23450) at /usr/src/libuv-v1.44.1/src/unix/thread.c:148
D:checkds:#4 0x00007f6a68780620 in loop_run (loop=0x7f6a64ca2700) at loop.c:273
D:checkds:#5 0x00007f6a687807fb in loop_thread (arg=0x7f6a64ca2700) at loop.c:297
D:checkds:#6 0x00007f6a68781bfc in isc_loopmgr_run (loopmgr=0x7f6a64c233c0) at loop.c:477
D:checkds:#7 0x00005556f0a7ae27 in main (argc=16, argv=0x7ffc4c693068) at main.c:1518
D:checkds:--------------------------------------------------------------------------------
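For readers unfamiliar with barriers: the backtrace above shows the main thread parked in uv_barrier_wait() inside loop_run(). A barrier releases its waiters only once every expected participant has arrived, so a thread blocked there usually means at least one participant never reached the barrier. The sketch below is only an analogy in Python, not BIND code; the helper name `arrivals_released` and the timeout are made up for illustration, and it says nothing about which participant is missing or why.

```python
import threading


def arrivals_released(parties: int, arriving: int, timeout: float = 1.0) -> int:
    """Return how many of the arriving threads got past the barrier.

    Hypothetical helper for illustration only. `parties` is the number of
    participants the barrier expects; `arriving` is how many actually show up.
    """
    barrier = threading.Barrier(parties)
    released = []
    lock = threading.Lock()

    def worker() -> None:
        try:
            # The timeout exists only so that this demo terminates.
            barrier.wait(timeout=timeout)
        except threading.BrokenBarrierError:
            # Raised when the wait times out or another waiter broke the barrier.
            return
        with lock:
            released.append(1)

    threads = [threading.Thread(target=worker) for _ in range(arriving)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(released)


if __name__ == "__main__":
    print(arrivals_released(parties=3, arriving=3))  # everyone arrives
    print(arrivals_released(parties=3, arriving=2))  # one participant missing
```

Unlike the Python barrier used here, pthread_barrier_wait() and uv_barrier_wait() take no timeout, so in the second scenario the waiting threads would stay blocked indefinitely. That is consistent with the log above, where ns9 did not exit after SIGTERM.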
This may or may not be related to #3671 (closed), which is also v9.19-specific.