small updates to large RPZ cause CPU consumption spikes
Summary
A two-RR modification to a large RPZ causes CPU and latency spikes.
BIND version used
- Affects v9.18: v9_18_10
Steps to reproduce
- Configure a resolver with a large RPZ transferred as a secondary zone. Tested with 200k RRs, consisting of 100k pairs like:
domain.rpz. 300 CNAME .
*.domain.rpz. 300 CNAME .
- Update the RPZ by adding and then removing one pair, with a 20-second delay between the two changes.
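To make the first step easier to repeat, here is a minimal sketch that generates a zone body of the size described above. The trigger names (`d0.example`, `d1.example`, ...), the `rpz.` origin, and both function names are hypothetical, since the report does not say which names were used. The SOA and NS records that a real zone needs are not generated and would have to be added separately.

```python
from typing import Iterator


def generate_rpz_records(pairs: int, origin: str = "rpz.", ttl: int = 300) -> Iterator[str]:
    """Yield 2 * pairs RPZ records: one exact-match and one wildcard rule per name."""
    for i in range(pairs):
        # Hypothetical trigger name; the original test data is not known.
        name = f"d{i}.example.{origin}"
        yield f"{name} {ttl} CNAME ."    # policy for the name itself
        yield f"*.{name} {ttl} CNAME ."  # policy for all of its subdomains


def write_zone_body(path: str, pairs: int = 100_000) -> int:
    """Write the records to path and return the number of RRs written."""
    count = 0
    with open(path, "w", encoding="ascii") as fh:
        for record in generate_rpz_records(pairs):
            fh.write(record + "\n")
            count += 1
    return count
```

Calling `write_zone_body("rpz-body.txt")` with the default of 100k pairs writes 200k RRs, matching the size used in the test.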
What is the current bug behavior?
Here is a latency comparison from a 1-minute test run at 100k QPS (about 1/4 of capacity) while removing and then adding a pair of RPZ RRs, with 20 seconds in between. The green line shows the same configuration without the addition/deletion; the blue line is the same with the non-default "reuseport no;" setting.
Relevant configuration files
options {
    // reuseport no;
    response-policy {
        zone "phish." min-update-interval 10;
    };
};
zone "phish." {
    primaries { 2600::; };
    type secondary;
};
Relevant logs and/or screenshots
I suspect that the delays happen between "reload start" and "reload done".
14-Dec-2022 16:08:30.300 rpz: phish.edit.host.dtq: reload start
14-Dec-2022 16:08:30.300 transfer of 'phish.edit.host.dtq/IN' from 2600:1f18:634c:d17e::da5d#53: Transfer status: success
14-Dec-2022 16:08:30.300 transfer of 'phish.edit.host.dtq/IN' from 2600:1f18:634c:d17e::da5d#53: Transfer completed: 1 messages, 6 records, 280 bytes, 0.001 secs (280000 bytes/sec) (serial 1638362951)
14-Dec-2022 16:08:30.740 rpz: phish.edit.host.dtq: reload done
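To check that suspicion against a longer log, the time between the two RPZ messages can be measured with a small script. This is a sketch that assumes the log format is exactly as in the excerpt above (millisecond timestamps, English month abbreviations). The function name `reload_durations` is hypothetical.

```python
import re
from datetime import datetime

# The two RPZ lines from the log excerpt above.
LOG_LINES = [
    "14-Dec-2022 16:08:30.300 rpz: phish.edit.host.dtq: reload start",
    "14-Dec-2022 16:08:30.740 rpz: phish.edit.host.dtq: reload done",
]

RPZ_EVENT = re.compile(
    r"^(\d{2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3}) rpz: (\S+): reload (start|done)$"
)


def reload_durations(lines):
    """Return a list of (zone, seconds) for each start/done pair found in lines."""
    started = {}
    durations = []
    for line in lines:
        match = RPZ_EVENT.match(line.strip())
        if match is None:
            continue  # not an RPZ reload event
        stamp, zone, event = match.groups()
        when = datetime.strptime(stamp, "%d-%b-%Y %H:%M:%S.%f")
        if event == "start":
            started[zone] = when
        elif zone in started:
            elapsed = when - started.pop(zone)
            durations.append((zone, elapsed.total_seconds()))
    return durations
```

For the excerpt above, `reload_durations(LOG_LINES)` reports a single reload of `phish.edit.host.dtq` lasting about 0.44 seconds.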