memory statistics might be too expensive
Summary
While serving a large zone transfer, profiling shows that mem_putstats() is frequently on CPU. This is suspicious.
The whole picture could be skewed by locking, which does not show up in the flame graph. In that case even functions that do not take much time can show up, which means that mem_putstats() might only be a proxy for something else that is off in the graph.
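To put a number on how often mem_putstats() is on CPU without reading it off the flame graph, its self overhead can be pulled from the perf.data file written by the perf record step below. This is a minimal sketch: the helper names are hypothetical, and it assumes the default perf.data file name and the `perf report --stdio` text layout. Like the flame graph, it only counts on-CPU samples, so time spent waiting on locks still does not appear. It quantifies the observation but cannot confirm or rule out the locking theory.

```shell
#!/bin/sh
# Hypothetical helpers, not part of BIND or perf.

# symbol_share_from_report SYMBOL
# Read "perf report --stdio --sort symbol" output on stdin and print the
# summed self overhead of SYMBOL from lines like "    12.34%  [.] mem_putstats".
# awk converts "12.34%" to 12.34 by taking its leading numeric part.
symbol_share_from_report() {
    awk -v sym="$1" '$NF == sym { sum += $1 } END { printf "%.2f%%\n", sum }'
}

# symbol_share SYMBOL [PERF_DATA]
# Self (not inclusive) overhead of SYMBOL in a perf.data file. Call chains
# are suppressed with "-g none" so each symbol appears on a single line.
symbol_share() {
    perf report -i "${2:-perf.data}" --stdio --no-children -g none \
        --sort symbol 2>/dev/null | symbol_share_from_report "$1"
}
```

Running `symbol_share mem_putstats` after profiling prints a single percentage; a symbol that was never sampled yields 0.00%.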
So, here is a flame graph (interactive if you download it) of a net zone transfer, covering the transfer from roughly 1 second after its start until 2 seconds before its end:
BIND version used
BIND 9.19.8-dev (Development Release) <id:e3ffe75>
running on Linux x86_64 6.0.10-arch2-1 #1 SMP PREEMPT_DYNAMIC Sat, 26 Nov 2022 16:51:18 +0000
built by make with '--cache-file=/home/pspacek/w/pkg/bind/config.cache' '--enable-full-report' '--prefix=/tmp/main' '--with-libjson=yes' '--with-atf=no' 'CC=ccache gcc' 'CFLAGS=-O2 -march=native -ggdb3' 'PKG_CONFIG_PATH=/usr/local/lib/pkgconfig'
compiled by GCC 12.2.0
compiled with OpenSSL version: OpenSSL 3.0.7 1 Nov 2022
linked to OpenSSL version: OpenSSL 3.0.7 1 Nov 2022
compiled with libuv version: 1.44.2
linked to libuv version: 1.44.2
Steps to reproduce
- Compile with "-O2 -ggdb3"
- Load the net zone
- Fire the zone transfer using the pre-built query qnetaxfr.blob (beware, dig is too slow):
cat data/qnetaxfr.blob - | socat - tcp-connect:[::1]:53 > /dev/null
- While zone transfer is running, profile:
timeout 45 perf record -F99 --call-graph dwarf,65528 --pid=445566
- Terminate profiling right before the transfer ends. (Adjust the timeout value above to match the wall-clock duration of the zone transfer.)
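The timeout can be derived from a previous run's "AXFR ended" log line. Leaving about 1 second unprofiled at the start and 2 seconds at the end of the 48.285 s transfer logged below gives floor(48.285 - 3) = 45, which matches the `timeout 45` used above. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the 1 s / 2 s margins are taken from the flame graph description, and `pidof named` is assumed to find exactly one named process.

```shell
#!/bin/sh
# Hypothetical helpers for the profiling steps above; not part of BIND.

# axfr_duration LOGLINE
# Extract the duration in seconds from an "AXFR ended" log line, e.g.
# "... 1109361322 bytes, 48.285 secs (22975278 bytes/sec) ..." -> 48.285
axfr_duration() {
    printf '%s\n' "$1" | sed -n 's/.* \([0-9][0-9.]*\) secs .*/\1/p'
}

# profile_timeout DURATION
# Leave ~1 s at the start and ~2 s at the end of the transfer unprofiled,
# i.e. timeout = floor(DURATION - 3), but never less than 1 second.
profile_timeout() {
    awk -v d="$1" 'BEGIN { t = int(d - 3); if (t < 1) t = 1; print t }'
}

# profile_named SECONDS
# Wait ~1 s for the transfer to get going, then sample the running named
# with the same perf options as above. Assumes a single named process.
profile_named() {
    pid=$(pidof named) || { echo "named is not running" >&2; return 1; }
    sleep 1
    timeout "$1" perf record -F99 --call-graph dwarf,65528 --pid="$pid"
}
```

The transfer command should stay in the foreground of its own terminal (its trailing `-` keeps stdin, and therefore the TCP connection, open); `profile_named "$(profile_timeout 48.285)"` is then started from a second terminal right after it.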
What is the current bug behavior?
This is a call for investigation. It is possible that there is no bug.
Relevant configuration files
Just configure the net zone. To make loading faster, I disable a bunch of checks:
options {
	max-cache-size 10M;
	allow-recursion { 127.0.0.0/8; ::1; };
	allow-transfer { any; };
	zone-statistics full;
	check-dup-records warn;
	check-integrity no;
	check-mx ignore;
	check-mx-cname ignore;
	check-names primary ignore;
	check-names secondary ignore;
	check-sibling no;
	check-spf ignore;
	check-srv-cname ignore;
	check-wildcard no;
};
zone net {
	type master;
	file "/tmp/net/net.txt";
	masterfile-format text;
	notify no;
};
Relevant logs and/or screenshots
transfer of 'net/IN': AXFR ended: 93186 messages, 34921946 records, 1109361322 bytes, 48.285 secs (22975278 bytes/sec) (serial 1667779223)
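For scale, dividing the figures in this log line gives the following averages (plain arithmetic on the logged numbers; the first value matches the rate named itself reports):

```latex
\[
\frac{1\,109\,361\,322\ \text{B}}{48.285\ \text{s}} \approx 22\,975\,278\ \text{B/s},
\qquad
\frac{34\,921\,946\ \text{records}}{48.285\ \text{s}} \approx 723\,246\ \text{records/s}
\]
\[
\frac{34\,921\,946\ \text{records}}{93\,186\ \text{messages}} \approx 374.8\ \text{records/message},
\qquad
\frac{1\,109\,361\,322\ \text{B}}{93\,186\ \text{messages}} \approx 11\,905\ \text{B/message},
\qquad
\frac{1\,109\,361\,322\ \text{B}}{34\,921\,946\ \text{records}} \approx 31.8\ \text{B/record}
\]
```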
Possible fixes
For now just investigate.