investigate and improve lock contention around mctx
It seems that locks around shared memory contexts (mctx) are contended in various scenarios, which degrades performance.
BIND version used
Steps to reproduce
Compile named using these options:
OPTIMIZE="-Og" CFLAGS="-g3 -ggdb -Wno-deprecated-declarations -fno-omit-frame-pointer -fno-optimize-sibling-calls -fPIC -rdynamic" LDFLAGS="-fPIE"
Run mutrace and named with more threads:
mutrace --hash-size=594327 -d named -n 32 -g -c /dev/null
Note: Systems with binutils 2.34+ require mutrace patch https://github.com/bconry/mutrace/pull/4
Send some cache-hit traffic to named. E.g. run:
yes '. NS' | dnsperf -c 100 -l 10
SIGINT named when dnsperf finishes.
While reading the output of mutrace, ignore the misleading line
mutrace.c:750 unlock_hash() in stack tracebacks if it is present (this depends on the mutrace version).
What is the current bug behavior?
Lock contention, leading to bad performance.
Relevant logs and/or screenshots
Most contended mutex is:
Mutex #60844 (0x0x7f442da070d0) first referenced by:
	mutrace.c:750 unlock_hash()
	mutex.c:288 isc__mutex_init()
	netmgr.c:249 isc_nm_start()
	main.c:934 create_managers()
	main.c:1248 setup()
	main.c:1555 main()
	??:0 __libc_start_main()
In the source code, netmgr.c:249 is the lock created by this call:
249    isc_mempool_create(mgr->mctx, sizeof(isc__netievent_storage_t),
250                       &mgr->evpool);
I.e. the contended lock is the one protecting mgr->mctx.

This is the simplest way to demonstrate the lock contention with tooling. During high-QPS benchmarking using kxdpgun, the lock contention around mgr->mctx was indeed causing a measurable performance problem. On a 16-core system, this shared mctx leads to a 3x performance drop when compared with a setup where each thread has its own mctx.
Generally, having a separate mctx per thread might be a good idea, but it needs careful design so that objects which get passed between threads can be deallocated/reallocated correctly.