Querying statistics channel blocks the main loop
Summary
Querying the statistics channel can cause a QPS drop, latency spikes, or query loss, depending on the size of the statistics output and the system load.
The data served by the statistics channel is gathered on the hot path, so collecting it blocks normal DNS processing.
BIND version affected
- All current versions
Steps to reproduce
- Cause large output from statistics channel. E.g. configure 10 000 zones:
- Start BIND server with command:
named -g -c named.conf
- Simulate trivial clients using the command:
yes 'zr7hakuwn2.example A' | dnsperf -S1 -c 128
- Observe QPS for a while to establish the average baseline
- Start hammering the server on stats channel:
while true; do curl -o /dev/null 'http://127.0.0.1:8080/json/v1'; done
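The configuration used for the first step is not included in this report. As a rough sketch only, a named.conf with many zones and a statistics channel on 127.0.0.1:8080 could be generated along these lines. The zone names (`zoneN.example`), the shared zone file name (`shared.db`), and the helper function names are assumptions made for illustration, not the reporter's actual setup.

```shell
#!/bin/sh
# Hypothetical helpers: print a named.conf with COUNT primary zones and a
# statistics channel matching the curl URL used in the steps above.
gen_named_conf() {
    count="$1"
    cat <<'EOF'
statistics-channels {
    inet 127.0.0.1 port 8080 allow { 127.0.0.1; };
};
EOF
    i=0
    while [ "$i" -lt "$count" ]; do
        # All zones share one minimal zone file (assumed name: shared.db).
        printf 'zone "zone%d.example" { type primary; file "shared.db"; };\n' "$i"
        i=$((i + 1))
    done
}

# Print a minimal zone file using only relative names, so it can be
# loaded under any zone origin.
gen_zone_file() {
    cat <<'EOF'
$TTL 300
@ IN SOA ns.invalid. hostmaster.invalid. 1 3600 600 86400 300
@ IN NS ns.invalid.
EOF
}

# Example usage:
#   gen_named_conf 10000 > named.conf
#   gen_zone_file > shared.db
```

Older BIND versions may require `type master` instead of `type primary`.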
What is the current bug behavior?
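QPS rates as reported by dnsperf (timestamp: queries per second). The rate drops from roughly 145 000 to roughly 31 000 while the curl loop is running: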
1713370265.966231: 146502.106984
1713370266.967267: 143250.592386
1713370267.967821: 143286.619213
1713370268.968858: 145593.020038
1713370269.969570: 145559.361734
1713370270.970602: 145300.050348
1713370271.971653: 144388.247951
1713370272.972684: 147107.332340
1713370273.972899: 146250.556130
1713370274.973988: 103643.132629 <<-- curl loop started here
1713370275.975122: 30878.983233
1713370276.976162: 31635.099497
1713370277.976355: 31065.004454
1713370278.977493: 31122.582501
1713370279.978524: 103021.784540 <<-- curl loop stopped here
1713370280.979562: 146231.212002
1713370281.980602: 146337.808679
1713370282.981632: 144628.033126
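To compare the baseline against the degraded period, the per-interval lines above can be averaged. The following is a minimal sketch, assuming the lines have been saved in the `timestamp: qps` form shown above; the function name `mean_qps` and the file names in the usage comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "timestamp: qps" lines on stdin and print the
# mean QPS rounded to a whole number. Trailing annotations after the
# number (such as "<<-- curl loop started here") are ignored because awk
# uses only the leading numeric part of the field.
mean_qps() {
    awk -F': ' '{ sum += $2; n++ } END { if (n) printf "%.0f\n", sum / n }'
}

# Example usage (file names are placeholders):
#   mean_qps < baseline.txt
#   mean_qps < during-curl-loop.txt
```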
What is the expected correct behavior?
Minimal impact on query processing, as long as the system has spare capacity to process requests for statistics.
Edited by Petr Špaček