investigate and improve lock contention around mctx
It seems that locks around shared memory contexts (mctx) are contended in various scenarios, which degrades performance.
BIND version used
Steps to reproduce
Compile named using these options:
OPTIMIZE="-Og" CFLAGS="-g3 -ggdb -Wno-deprecated-declarations -fno-omit-frame-pointer -fno-optimize-sibling-calls -fPIC -rdynamic" LDFLAGS="-fPIE"
Run mutrace and named with more threads:
mutrace --hash-size=594327 -d named -n 32 -g -c /dev/null
Note: Systems with binutils 2.34+ require mutrace patch https://github.com/bconry/mutrace/pull/4
Send some cache-hit traffic to named. E.g. run:
yes '. NS' | dnsperf -c 100 -l 10
SIGINT named when dnsperf finishes.
While reading the output of mutrace, ignore the misleading line
mutrace.c:750 unlock_hash() in stack tracebacks if it is present (this depends on the mutrace version).
What is the current bug behavior?
Lock contention, leading to bad performance.
Relevant logs and/or screenshots
Most contended mutex is:
Mutex #60844 (0x0x7f442da070d0) first referenced by:
	mutrace.c:750 unlock_hash()
	mutex.c:288 isc__mutex_init()
	netmgr.c:249 isc_nm_start()
	main.c:934 create_managers()
	main.c:1248 setup()
	main.c:1555 main()
	??:0 __libc_start_main()
In the source code, netmgr.c:249 is the lock created by this call:
249    isc_mempool_create(mgr->mctx, sizeof(isc__netievent_storage_t),
250                       &mgr->evpool);
I.e. the contended lock is the one protecting mgr->mctx.

This is the simplest way to demonstrate the lock contention with tooling. During high-QPS benchmarking using kxdpgun, the lock contention around mgr->mctx was indeed causing a measurable performance problem. On a 16-core system, this shared mctx leads to a 3x performance drop when compared with a setup where each thread has its own mctx.
Generally, having a separate mctx per thread might be a good idea, but it needs careful design so that objects which get passed between threads can be deallocated/reallocated correctly.