Auto-Vectorize Compiler Optimization Causing Exception / Crash
Summary
The named process crashes with a segmentation fault when compiled with the optimization flags
-march=native -ftree-slp-vectorize
and jemalloc disabled. During tests, X41 found that the compilers
Clang version 16 and GCC version 10.4 were affected.
On a ThinkPad T15p Gen3 system, the crash can be reproduced reliably when compiling with the
commands shown in listing 4.32:
export CFLAGS="-O1 -g -march=native -ftree-slp-vectorize"
export CXXFLAGS=$CFLAGS
make clean
./configure --without-jemalloc && make
When starting named with the example configuration via the command line ./named -f -d10 -M fill, a crash occurs:
Thread 1 "named" received signal SIGSEGV, Segmentation fault.
0x00007ffff7903136 in allocate_version (mctx=mctx@entry=0x5555556da340, serial=serial@entry=1, references=references@entry=1, writer=writer@entry=false) at rbtdb.c:1282
1282 ISC_LIST_INIT(version->resigned_list);
LEGEND: STACK | HEAP | CODE | DATA | RWX | RODATA
-----------------------[ REGISTERS / show-flags off / show-compact-regs off ]--------------------------
*RAX 0x5555556dff20 <- 0x0
*RBX 0x5555556dfb10 <- 0xbebebebe00000001
RCX 0x0
*RDX 0x28
*RDI 0x5555556dff20 <- 0x0
RSI 0x0
*R8 0x5555556dff20 <- 0x0
*R9 0x7ffff720cbe0 (main_arena+96) -> 0x5555556ff3f0 <- 0x211b0cca04000100
*R10 0x4000000
R11 0x0
*R12 0x1
*R13 0x5555556da340 <- 0x44d656d43
R14 0x0
*R15 0x1
*RBP 0x7fffffff89d0 <- 0x0
*RSP 0x7fffffff89a0 -> 0x5555556dad60 -> 0x5555556dfa90 <- 0x5242542b /* '+TBR' */
*RIP 0x7ffff7903136 (allocate_version+122) <- vmovdqa ymmword ptr [rbx + 0x20], ymm0
--------------------------[ DISASM / x86-64 / set emulate on ]-------------------------------
> 0x7ffff7903136 <allocate_version+122> vmovdqa ymmword ptr [rbx + 0x20], ymm0
0x7ffff790313b <allocate_version+127> mov qword ptr [rbx + 0x40], -1
0x7ffff7903143 <allocate_version+135> mov qword ptr [rbx + 0x48], -1
0x7ffff790314b <allocate_version+143> mov rax, rbx
0x7ffff790314e <allocate_version+146> add rsp, 8
0x7ffff7903152 <allocate_version+150> pop rbx
0x7ffff7903153 <allocate_version+151> pop r12
0x7ffff7903155 <allocate_version+153> pop r13
0x7ffff7903157 <allocate_version+155> pop r14
0x7ffff7903159 <allocate_version+157> pop r15
0x7ffff790315b <allocate_version+159> pop rbp
-------------------------------[ SOURCE (CODE) ]--------------------------------------
In file: /home/user/src/bind9-clean/lib/dns/rbtdb.c
1277 version->glue_table = isc_mem_getx(mctx, size, ISC_MEM_ZERO);
1278
1279 version->writer = writer;
1280 version->commit_ok = false;
1281 ISC_LIST_INIT(version->changed_list);
> 1282 ISC_LIST_INIT(version->resigned_list);
1283 ISC_LINK_INIT(version, link);
1284
1285 return (version);
1286 }
1287
The crash occurs in the function allocate_version(), although the code looks unsuspicious, as seen in the following listing:
static rbtdb_version_t *
allocate_version(isc_mem_t *mctx, rbtdb_serial_t serial,
		 unsigned int references, bool writer) {
	rbtdb_version_t *version;
	size_t size;

	version = isc_mem_get(mctx, sizeof(*version));
	version->serial = serial;

	isc_refcount_init(&version->references, references);
	isc_rwlock_init(&version->glue_rwlock);

	version->glue_table_bits = ISC_HASH_MIN_BITS;
	version->glue_table_nodecount = 0U;

	size = ISC_HASHSIZE(version->glue_table_bits) *
	       sizeof(version->glue_table[0]);
	version->glue_table = isc_mem_getx(mctx, size, ISC_MEM_ZERO);

	version->writer = writer;
	version->commit_ok = false;
	ISC_LIST_INIT(version->changed_list); // MARK incorrect optimization of these consecutive ISC_LIST_INIT statements
	ISC_LIST_INIT(version->resigned_list);
	ISC_LINK_INIT(version, link);

	return (version);
}
After extensive debugging it was found that an unaligned pointer is used in an x86_64 vector
instruction: vmovdqa ymmword ptr [rbx + 0x20], ymm0
Further investigation reveals that this seems to be caused by a miscompilation due to the automatic
vectorization enabled by the flags -march=native -ftree-slp-vectorize, which
cause the compiler to use the native instruction set of the detected architecture and to apply
auto-vectorization performance optimizations.
Making changes to the order and interleaving of statements in the C code that should not have
any effect on the list operation semantics makes the crash disappear, as shown in the following listing
of a patch for file lib/dns/rbtdb.c:
ISC_LIST_INIT(version->changed_list);
volatile int noop = 1; if (noop) { // MARK no effect / NOOP to break the vectorization
ISC_LIST_INIT(version->resigned_list);
}
This shows that the observed behavior could be caused by a miscompilation / compiler error. Additional crashes appear in other parts of the code, showing this is a general problem.
Further investigation revealed that the exception is caused by using a pointer that is not 32-byte aligned
in the vmovdqa vector instruction. When alignment is forced in the rbtdb_version struct via compiler
attributes, the crash disappears, as shown in the following code listing:
typedef struct rbtdb_version {
	/* Not locked */
	rbtdb_serial_t serial;
	dns_rbtdb_t *rbtdb;
	/*
	 * Protected in the refcount routines.
	 * XXXJT: should we change the lock policy based on the refcount
	 * performance?
	 */
	isc_refcount_t references;
	/* Locked by database lock. */
	bool writer;
	bool commit_ok;
	struct {} __attribute__ ((aligned (32))); // MARK force alignment
	rbtdb_changedlist_t changed_list;
	struct {} __attribute__ ((aligned (32)));
Since the compiler introduces the vectorized instructions on its own as part of its optimizations, X41 considers the root cause to be a compiler bug affecting both GCC and Clang.
While a startup crash without user-controlled data is most likely not security relevant, the pattern of two consecutive list operations being affected by a compiler optimization could lead to exploitable scenarios in the future, should attackers find a way to trigger the use of misaligned pointers.
BIND version used
BIND 9.19.13 (Development Release) id:66a3c6b
Possible fixes
X41 recommends investigating the observed behavior and potentially filing a bug report with the GCC and Clang maintainers. As a temporary workaround, the affected structures could be changed to enforce 32-byte alignment as shown in the listing above.