BIND 9.20.1 memory use issue
As originally reported by a customer, it is possible to make an authoritative-only (recursion no;) BIND 9.20.1 server use a large amount of RAM merely by sending it queries at a rapid pace for zones for which it is not authoritative. The customer reported that the out-of-memory killer would kill BIND. I was not able to reproduce that outcome reliably (only once in several tries; logs below), but I was able to reliably cause BIND 9.20.1 to use up nearly all of the RAM on the system and to push some memory into swap.
Out of memory kill logs:

```
Sep 10 15:45:55 ubuntu-2204 kernel: [ 1926] 114 1926 5616532 3948447 40157184 1037284 0 named
Sep 10 15:45:55 ubuntu-2204 kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=named.service,mems_allowed=0,global_oom,task_memcg=/system.slice/named.service,task=named,pid=1926,uid=114
Sep 10 15:45:55 ubuntu-2204 kernel: Out of memory: Killed process 1926 (named) total-vm:22466128kB, anon-rss:15790944kB, file-rss:2892kB, shmem-rss:0kB, UID:114 pgtables:39216kB oom_score_adj:0
Sep 10 15:45:56 ubuntu-2204 systemd[1]: named.service: Main process exited, code=killed, status=9/KILL
░░ An ExecStart= process belonging to unit named.service has exited.
Sep 10 15:45:56 ubuntu-2204 systemd[1]: named.service: Failed with result 'signal'.
░░ The unit named.service has entered the 'failed' state with result 'signal'.
Sep 10 15:45:56 ubuntu-2204 systemd[1]: named.service: Consumed 1min 21.225s CPU time.
░░ The unit named.service completed and consumed the indicated resources.
```
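For scale, the kB figures in the kill line above can be converted to GiB; this is just arithmetic on the numbers already in the log, not part of the reproduction:

```shell
# Convert the oom-kill figures from the log above from kB to GiB.
# anon-rss is named's resident anonymous memory at kill time;
# total-vm is its total virtual address space.
awk 'BEGIN {
    printf "anon-rss: %.1f GiB\n", 15790944 / 1048576
    printf "total-vm: %.1f GiB\n", 22466128 / 1048576
}'
# prints:
# anon-rss: 15.1 GiB
# total-vm: 21.4 GiB
```

In other words, named was holding roughly 15 GiB resident when the kernel killed it.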
To reproduce:
- Client VM (192.168.20.16/24 and 10.238.0.16/16) with kxdpgun.
- Server VM (192.168.20.15/24 and 10.238.0.15/16) with BIND 9.20.1 on Ubuntu 22.04, installed from the ISC open source packages (https://launchpad.net/~isc/+archive/ubuntu/bind/). (I did not attempt other versions; I merely matched the customer's version.)
- Use a configuration like this: named.conf.
- Use a query file called txt73 containing a single query to which the BIND server will respond with REFUSED.
- Run BIND normally on the server VM.
- Run `sudo kxdpgun -t 240 -i txt73 -l 10.238.0.0/16 -Q 500000 192.168.20.15` on the client VM.
- Observe large RAM usage that persists for some time (seconds or up to hours) after the queries stop.
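The named.conf referenced in the steps above is not reproduced here; a minimal authoritative-only configuration along these lines should behave equivalently (the listen address is taken from the steps above, and the zone is a placeholder assumption; the key setting is `recursion no;`):

```
options {
    listen-on { 192.168.20.15; };   // server address from the steps above
    recursion no;                   // authoritative-only, as in the report
    allow-query { any; };
};

// The server should be authoritative for some zone, but not for the
// names being queried; the zone below is a placeholder assumption.
zone "example.com" {
    type primary;
    file "example.com.zone";
};
```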
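The txt73 file is likewise not reproduced here. kxdpgun's `-i` input is one query per line, qname and qtype separated by whitespace, so a stand-in can be generated as follows (the query name is an assumption; any name the server is not authoritative for should draw REFUSED):

```shell
# Create a one-query kxdpgun input file equivalent in shape to txt73.
# Format: one "<qname> <qtype>" pair per line.
# "notourzone.example." is a placeholder; substitute any name outside
# the target server's authoritative zones.
cat > txt73 <<'EOF'
notourzone.example. TXT
EOF
cat txt73
# prints: notourzone.example. TXT
```

While the flood runs, named's resident size on the server can be watched with, e.g., `ps -C named -o pid,rss,vsz`.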
The steps above might be overkill to reproduce this; I did not try simpler steps. The customer also supplied manually produced core files. RT23950