named generates core and ends with signal SIGFPE, Arithmetic exception.
Summary
When max-cache-size is 1823420M, named crashes with an arithmetic exception (SIGFPE) and dumps core.
BIND version affected
Version info
BIND 9.16.23-RH (Extended Support Version) <id:fde3b1f>
running on Linux x86_64 5.15.0-105.125.6.2.1.el9uek.x86_64 #2 SMP Thu Sep 14 21:51:15 PDT 2023
built by make with '--build=x86_64-redhat-linux-gnu' '--host=x86_64-redhat-linux-gnu' '--program-prefix=' '--disable-dependency-tracking' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--sysconfdir=/etc' '--datadir=/usr/share' '--includedir=/usr/include' '--libdir=/usr/lib64' '--libexecdir=/usr/libexec' '--sharedstatedir=/var/lib' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--with-python=/usr/bin/python3' '--with-libtool' '--localstatedir=/var' '--with-pic' '--disable-static' '--includedir=/usr/include/bind9' '--with-tuning=large' '--with-libidn2' '--with-maxminddb' '--with-dlopen=yes' '--with-gssapi=yes' '--with-lmdb=yes' '--without-libjson' '--with-json-c' '--enable-dnstap' '--enable-fixed-rrset' '--enable-full-report' 'build_alias=x86_64-redhat-linux-gnu' 'host_alias=x86_64-redhat-linux-gnu' 'CC=gcc' 'CFLAGS= -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -march=x86-64-v2 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection' 'LDFLAGS=-Wl,-z,relro -Wl,--as-needed -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 ' 'LT_SYS_LIBRARY_PATH=/usr/lib64:' 'PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig'
compiled by GCC 11.4.1 20230605 (Red Hat 11.4.1-2.1.0.1)
compiled with OpenSSL version: OpenSSL 3.0.7 1 Nov 2022
linked to OpenSSL version: OpenSSL 3.0.7 1 Nov 2022
compiled with libuv version: 1.42.0
linked to libuv version: 1.42.0
compiled with libxml2 version: 2.9.13
linked to libxml2 version: 20913
compiled with json-c version: 0.14
linked to json-c version: 0.14
compiled with zlib version: 1.2.11
linked to zlib version: 1.2.11
linked to maxminddb version: 1.5.2
compiled with protobuf-c version: 1.3.3
linked to protobuf-c version: 1.3.3
threads support is enabled
default paths:
named configuration: /etc/named.conf
rndc configuration: /etc/rndc.conf
DNSSEC root key: /etc/bind.keys
nsupdate session key: /var/run/named/session.key
named PID file: /var/run/named/named.pid
named lock file: /var/run/named/named.lock
geoip-directory: /usr/share/GeoIP
Relevant configuration files
To reproduce the issue, set max-cache-size to 1823420M in named.conf. The reason for this particular value is explained later in the report.
Total memory on our system:
cat ./proc/meminfo | grep -i MemTotal
MemTotal: 2124386752 kB
We have not set max-cache-size in our configuration file, so named defaults to 90% of total memory, as seen in the log:
none:89: 'max-cache-size 90%' - setting to 1867136MB (out of 2074596MB)
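The figures in that log line can be checked against MemTotal with plain integer arithmetic. The following is a minimal sketch, assuming named converts the kB value to bytes, takes a truncating integer percentage, and reports whole megabytes of 2^20 bytes; the helper names are hypothetical and are not taken from the BIND sources.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Convert a MemTotal value in kB (as printed by /proc/meminfo) to bytes. */
static uint64_t kib_to_bytes(uint64_t kib) {
    return kib * 1024u;
}

/* Convert a size given with the "M" suffix to bytes. */
static uint64_t mb_to_bytes(uint64_t mb) {
    return mb * 1024u * 1024u;
}

/* Whole megabytes (2^20 bytes), truncated. */
static uint64_t bytes_to_mb(uint64_t bytes) {
    return bytes / (1024u * 1024u);
}

/* Integer percentage of a byte count, truncating like plain C division. */
static uint64_t percent_of(uint64_t bytes, unsigned percent) {
    assert(percent <= 100u);
    return bytes * percent / 100u;
}

/* Print both figures in the same shape as the log line (illustrative only). */
static void print_cache_limit(uint64_t memtotal_kib, unsigned percent) {
    uint64_t total = kib_to_bytes(memtotal_kib);
    uint64_t limit = percent_of(total, percent);
    printf("'max-cache-size %u%%' - setting to %" PRIu64 "MB (out of %" PRIu64 "MB)\n",
           percent, bytes_to_mb(limit), bytes_to_mb(total));
}
```

Calling print_cache_limit(2124386752, 90) from a main function gives 1867136MB out of 2074596MB, which agrees with the log. Separately, mb_to_bytes(1823420) is 1911994449920, the same number that appears as the size argument of adjusthashsize in frame #7 of the backtrace.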
Relevant logs
As soon as named starts running, it crashes with:
Program terminated with signal SIGFPE, Arithmetic exception.
#0 0x00007f29bc9f5a8d in more_frags (new_size=0, ctx=0x7f29b032e6c0) at ../../../lib/isc/mem.c:457
457 frags = (int)(total_size / new_size);
[Current thread is 1 (Thread 0x7f29bb9e5640 (LWP 346082))]
(gdb) bt
#0 0x00007f29bc9f5a8d in more_frags (new_size=0, ctx=0x7f29b032e6c0) at ../../../lib/isc/mem.c:457
#1 mem_getunlocked (ctx=ctx@entry=0x7f29b032e6c0, size=size@entry=4294967296) at ../../../lib/isc/mem.c:522
#2 0x00007f29bca003ce in isc___mem_get (ctx0=0x7f29b032e6c0, size=4294967296, file=0x7f29bcc8e980 "../../../lib/dns/rbt.c", line=2387) at ../../../lib/isc/mem.c:1066
#3 0x00007f29bcb6c465 in rehash (newbits=<optimized out>, rbt=0x7f29b682d010) at ../../../lib/dns/rbt.c:2387
#4 maybe_rehash (rbt=0x7f29b682d010, newcount=<optimized out>) at ../../../lib/dns/rbt.c:2409
#5 0x00007f29bcb6d9d0 in dns_rbt_adjusthashsize (size=<optimized out>, rbt=<optimized out>) at ../../../lib/dns/rbt.c:1098
#6 dns_rbt_adjusthashsize (rbt=<optimized out>, size=<optimized out>) at ../../../lib/dns/rbt.c:1084
#7 0x00007f29bcb84dd9 in adjusthashsize (db=0x7f29b6829010, size=1911994449920) at ../../../lib/dns/rbtdb.c:8129
#8 0x000055742d5bd66b in configure_view (view=<optimized out>, viewlist=<optimized out>, config=0x7f29b6d60ee8, vconfig=0x0, cachelist=<optimized out>, kasplist=<optimized out>, bindkeys=0x0,
mctx=0x55742e088c40, actx=0x7f29bbb2f538, need_hints=true) at ../../../bin/named/server.c:4625
#9 0x000055742d5cba8b in load_configuration (filename=<optimized out>, server=server@entry=0x7f29b6d32010, first_time=first_time@entry=true) at ../../../bin/named/server.c:8997
#10 0x000055742d5cdc1e in run_server (task=<optimized out>, event=<optimized out>) at ../../../bin/named/server.c:9709
#11 0x00007f29bca221bd in task_run (task=0x7f29b6d3d010) at ../../../lib/isc/task.c:857
#12 isc_task_run (task=0x7f29b6d3d010) at ../../../lib/isc/task.c:950
#13 0x00007f29bca0d2a9 in isc__nm_async_task (worker=0x55742e09bfb0, ev0=0x7f29b6d478a8) at netmgr/../../../../lib/isc/netmgr/netmgr.c:873
#14 process_netievent (worker=worker@entry=0x55742e09bfb0, ievent=0x7f29b6d478a8) at netmgr/../../../../lib/isc/netmgr/netmgr.c:958
#15 0x00007f29bca0d425 in process_queue (worker=worker@entry=0x55742e09bfb0, type=type@entry=NETIEVENT_TASK) at netmgr/../../../../lib/isc/netmgr/netmgr.c:1027
#16 0x00007f29bca0dc17 in process_all_queues (worker=0x55742e09bfb0) at netmgr/../../../../lib/isc/netmgr/netmgr.c:798
#17 async_cb (handle=0x55742e09c310) at netmgr/../../../../lib/isc/netmgr/netmgr.c:827
#18 0x00007f29bc7a6b3d in uv__async_io (loop=0x55742e09bfc0, w=<optimized out>, events=<optimized out>) at src/unix/async.c:163
#19 0x00007f29bc7c285e in uv__io_poll (loop=0x55742e09bfc0, timeout=<optimized out>) at src/unix/epoll.c:374
#20 0x00007f29bc7ac5a8 in uv__io_poll (timeout=<optimized out>, loop=0x55742e09bfc0) at src/unix/udp.c:122
#21 uv_run (loop=loop@entry=0x55742e09bfc0, mode=mode@entry=UV_RUN_DEFAULT) at src/unix/core.c:389
#22 0x00007f29bca0d4b7 in nm_thread (worker0=0x55742e09bfb0) at netmgr/../../../../lib/isc/netmgr/netmgr.c:733
#23 0x00007f29bca1ff9a in isc__trampoline_run (arg=0x55742e09fe90) at ../../../lib/isc/trampoline.c:196
#24 0x00007f29bc200812 in start_thread (arg=<optimized out>) at pthread_create.c:443
#25 0x00007f29bc1a0450 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb)
I did some analysis: the quantize function called from mem_getunlocked returns zero for size=4294967296, and more_frags then divides by that zero (frame #0, new_size=0).
I tested a range of sizes, and quantize returns zero for the following inputs, ours among them:
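/* quantize() is the function from lib/isc/mem.c; print every input that maps to 0 */
for (size_t i = 1; i < 4294967399; i++) {
    size_t result = quantize(i);
    if (result == 0) {
        printf("%zu=%zu\n", i, result);
    }
}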
4294967289=0
4294967290=0
4294967291=0
4294967292=0
4294967293=0
4294967294=0
4294967295=0
4294967296=0
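The eight zero results sit exactly at the 2^32 boundary, which is what one would expect if the rounding sum were held in a 32-bit temporary. The following is a minimal sketch under that assumption. It is a model, not the code from lib/isc/mem.c: the names quantize_narrow, quantize_wide, and count_zero_results are hypothetical, and the alignment of 8 is assumed.

```c
#include <assert.h>
#include <stdint.h>

#define ALIGNMENT_SIZE 8U /* assumed alignment, not taken from the report */

/*
 * Hypothetical model of the failing behaviour: round size up to a multiple
 * of ALIGNMENT_SIZE, but keep the intermediate sum in 32 bits.
 */
static uint64_t quantize_narrow(uint64_t size) {
    if (size == 0) {
        return ALIGNMENT_SIZE;
    }
    /* The cast drops the upper 32 bits, so sums at or just past 2^32 wrap to 0..7. */
    uint32_t temp = (uint32_t)(size + (ALIGNMENT_SIZE - 1));
    return temp - temp % ALIGNMENT_SIZE;
}

/* Same rounding with a 64-bit intermediate, shown for contrast. */
static uint64_t quantize_wide(uint64_t size) {
    if (size == 0) {
        return ALIGNMENT_SIZE;
    }
    uint64_t temp = size + (ALIGNMENT_SIZE - 1);
    return temp - temp % ALIGNMENT_SIZE;
}

/* Count the inputs in [from, to) for which the narrow model returns 0. */
static unsigned count_zero_results(uint64_t from, uint64_t to) {
    assert(from <= to);
    unsigned zeros = 0;
    for (uint64_t i = from; i < to; i++) {
        if (quantize_narrow(i) == 0) {
            zeros++;
        }
    }
    return zeros;
}
```

In this model, count_zero_results(4294967280, 4294967399) is 8, covering 4294967289 through 4294967296, the same inputs listed above. quantize_wide(4294967296) returns 4294967296, so a non-zero value would reach the division in more_frags. Whether the real quantize truncates in this way should be confirmed against the 9.16.23 source.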