urcu cleanup may happen after jemalloc prepares its memory leak summary
Since !8139 was merged yesterday, the respdiff-long CI job can now fail due to alleged memory leaks being found, e.g.:
https://gitlab.isc.org/isc-projects/bind9/-/jobs/3556883
The call stack from which the alleged memory leak stems is slightly cryptic:
https://gitlab.isc.org/isc-projects/bind9/-/jobs/3556883/artifacts/file/jeprof.pdf
```
urcu_memb_unregister_thread()
urcu_memb_get_default_call_rcu_data()
urcu_memb_call_rcu()
bcentry_evict()
bcentry_alive()
dns_badcache_add()
ns_client_error()
ns_query_done()
query_gotanswer()
query_resume()
fetch_callback()
```
I skimmed the relevant code and came up with a reasonably stable (not 100% effective!) way to reproduce this:

- Build BIND 9 against a profiling-enabled (`--enable-prof`) jemalloc build.

- Enable jemalloc profiling:

  ```
  export MALLOC_CONF="prof_leak:true,lg_prof_sample:0,prof_final:true"
  ```

- Start `named` with the default configuration:

  ```
  named -g -c /dev/null
  ```

- Run:

  ```
  dig @localhost dnssec-failed.org.
  sleep 31
  dig @localhost dnssec-failed.org.
  ```

- Interrupt `named` using Ctrl+C.
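For convenience, the steps above can be collapsed into a single shell sketch. The `MALLOC_CONF` value is taken from the steps above; everything else (binaries on `PATH`, the one-second startup wait) is an assumption, not part of the original reproducer:

```shell
#!/bin/sh
# Reproducer sketch for the shutdown-time "leak". Assumes named and dig
# are on PATH and that named was built against --enable-prof jemalloc.
export MALLOC_CONF="prof_leak:true,lg_prof_sample:0,prof_final:true"

named -g -c /dev/null &   # default configuration, logging to stderr
NAMED_PID=$!
sleep 1                   # give named a moment to start listening (assumed delay)

dig @localhost dnssec-failed.org.
sleep 31
dig @localhost dnssec-failed.org.

kill -INT "$NAMED_PID"    # equivalent of pressing Ctrl+C
wait "$NAMED_PID"         # jemalloc prints its leak summary on exit
```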
Armed with the reproducer above, I subsequently prepared a custom
liburcu build with debug symbols included and optimizations disabled.
This caused the last function in the call stack to change from
`urcu_memb_unregister_thread()` to `call_rcu_data_init()`, which made
more sense. It looked like some RCU-managed structure was not freed
before shutdown.
While debugging this further, I added some `printf()` calls to see which
exact pointers are not freed until shutdown. At that point, I noticed
that all the debug snippets look like this:

```
01-Aug-2023 12:50:43.001 shutting down
01-Aug-2023 12:50:43.031 exiting
<jemalloc>: Leak approximation summary: ~87056 bytes, ~20 objects, >= 13 contexts
<jemalloc>: Run jeprof on dump output for leak detail
-- urcu: calling call_rcu_data_free() for 0x7f11b2dad000
```
Specifically, jemalloc may prepare its profiling summary before liburcu gets a chance to release all the data it allocated for its internal structures. I believe that the two relevant bits of code (the jemalloc profiling code and the urcu cleanup code) may be run concurrently because the reproducer presented above is not 100% effective (i.e. the heap profile sometimes does not include the offending liburcu pointer, even though in such cases liburcu seemingly frees it after jemalloc prints its summary).
I think the solution for this is to add `call_rcu_data_init()` to the
list of functions ignored by jeprof. This does not look like a
runaway memory leak, just a glitch at shutdown time. I checked in my
local development environment that actual pointer leaks in the BIND 9
code are still caught with this tweak in place.
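As an interim workaround, a similar effect can be approximated without patching jeprof at all, using its `--ignore` option (inherited from pprof), which discards samples whose call stacks pass through a matching routine. The binary path and heap dump name below are assumptions:

```shell
# Drop leak samples whose stacks go through liburcu's call_rcu setup;
# the named path and dump file name are placeholders, not from the report.
jeprof --show_bytes --ignore='call_rcu_data_init' \
    /usr/local/sbin/named jeprof.out.heap
```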