# glue-cache scales very poorly on multi-CPU systems
## Summary

The glue cache scales very poorly and suffers from lock contention. On a 16-thread system with a delegation-heavy workload it eventually improves max QPS by about 1/3, but reaching that max QPS takes over 300 seconds of query load.
## BIND version used

- Affects v9.18: v9.18.14
- Other versions were not tested, but are assumed to be affected
## Steps to reproduce

- Configure a delegation-heavy zone, e.g. SE from https://zonedata.iis.se/
- Issue queries which hit delegations, preferably unique: querydb.xz
## What is the current bug behavior?

Initially QPS is very low, and adding CPUs does not improve performance. Gradually BIND builds the glue cache and overall QPS improves.
## What is the expected correct behavior?

- Initial QPS should not be that low.
- Adding CPUs should improve performance, including initially.
## Workaround

```
options {
	glue-cache no;
};
```

This provides more predictable performance but incurs a ~1/3 performance hit (in terms of max QPS).
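For context, a minimal `named.conf` sketch with the workaround applied and a zone matching the reproduction steps; the zone file path is a placeholder, not from this report:

```
options {
	glue-cache no;  // workaround: disable the glue cache entirely
};

zone "se" {
	type primary;
	file "/var/named/se.zone";  // placeholder path for the downloaded SE zone
};
```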
## Benchmarks

- 16-thread machine in AWS, type c5n.4xlarge
- BIND v9.18.4 with glue-cache on / off
- SE zone serial 2021122008
- client kxdpgun: `kxdpgun -t 5 -Q $QPS -i querydb 10.10.126.46 -p 5300`
- 5-second tests; QPS in the table below is the average
- Individual lines in the table are successive tests
- Each step starts with the same query set (so successive tests repeat some of the queries)
- QPS step is +50k QPS
- QPS is incremented only if the response rate was >= 99 %
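The stepping procedure above can be sketched as a small driver loop. This is an illustrative sketch, not the tooling actually used: the hypothetical `run_benchmark(qps)` callback stands in for one 5-second kxdpgun run and returns its response rate.

```python
def ramp_up(run_benchmark, start_qps=50_000, step=50_000,
            threshold=0.99, max_attempts=20):
    """Drive the +50k QPS ramp described above.

    run_benchmark(qps) returns the response rate in [0.0, 1.0] for one
    5-second test at the given offered QPS.  A step is repeated until the
    rate reaches `threshold`; the ramp stops ("max reached") when a step
    fails to hit the threshold within `max_attempts` runs.
    Returns a list of (qps, rate) tuples, one per individual test.
    """
    results = []
    qps = start_qps
    while True:
        for _ in range(max_attempts):
            rate = run_benchmark(qps)
            results.append((qps, rate))
            if rate >= threshold:
                break  # step passed; move to the next QPS level
        else:
            return results  # max QPS reached at this step
        qps += step
```

For example, with a toy workload that sustains 99 % up to 100k QPS and collapses at 150k, the driver records two passing steps and then the failing attempts at 150k before stopping.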
You can see that the `glue-cache yes;` configuration requires significant warm-up time and eventually provides up to 1/3 higher max QPS than the configuration with `glue-cache no;`. The problem is the ridiculously long warm-up phase. The `glue-cache no` config hits max QPS right away, without any warm-up.
Raw data - each line is one 5-second benchmark:
| QPS (glue-cache yes) | Response rate | QPS (glue-cache no) | Response rate |
|---|---|---|---|
50000 | 77 % | 300000 | 99 % | |
50000 | 90 % | 350000 | 97 % | |
50000 | 79 % | 350000 | 96 % | |
50000 | 99 % | 350000 | 96 % | |
100000 | 69 % | 350000 | 97 % | |
100000 | 80 % | 350000 | max reached | |
100000 | 99 % | |||
150000 | 74 % | |||
150000 | 77 % | |||
150000 | 83 % | |||
150000 | 96 % | |||
150000 | 99 % | |||
200000 | 79 % | |||
200000 | 80 % | |||
200000 | 82 % | |||
200000 | 82 % | |||
200000 | 17 % | |||
200000 | 22 % | |||
200000 | 28 % | |||
200000 | 39 % | |||
200000 | 62 % | |||
200000 | 99 % | |||
250000 | 82 % | |||
250000 | 83 % | |||
250000 | 84 % | |||
250000 | 85 % | |||
250000 | 87 % | |||
250000 | 90 % | |||
250000 | 95 % | |||
250000 | 99 % | |||
300000 | 85 % | |||
300000 | 85 % | |||
300000 | 86 % | |||
300000 | 86 % | |||
300000 | 87 % | |||
300000 | 88 % | |||
300000 | 90 % | |||
300000 | 93 % | |||
300000 | 98 % | |||
300000 | 99 % | |||
350000 | 86 % | |||
350000 | 87 % | |||
350000 | 87 % | |||
350000 | 87 % | |||
350000 | 88 % | |||
350000 | 88 % | |||
350000 | 89 % | |||
350000 | 90 % | |||
350000 | 92 % | |||
350000 | 94 % | |||
350000 | 98 % | |||
350000 | 99 % | |||
400000 | 88 % | |||
400000 | 88 % | |||
400000 | 88 % | |||
400000 | 88 % | |||
400000 | 89 % | |||
400000 | 89 % | |||
400000 | 90 % | |||
400000 | 90 % | |||
400000 | 91 % | |||
400000 | 92 % | |||
400000 | 93 % | |||
400000 | 95 % | |||
400000 | 98 % | |||
400000 | 99 % | |||
450000 | 82 % | |||
450000 | 82 % | |||
450000 | 84 % | |||
450000 | 83 % | |||
450000 | 83 % | |||
450000 | 83 % | |||
450000 | 84 % | |||
450000 | 84 % | |||
450000 | 85 % | |||
450000 | 85 % | |||
450000 | 86 % | |||
450000 | 85 % | |||
450000 | 90 % | |||
450000 | 88 % | |||
450000 | 91 % | |||
450000 | 92 % | |||
450000 | max reached | | |
Flame chart with sleeper + waker threads generated by offwaketime.py:
(Sorry for missing stack frames, but you get the point.)
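Off-CPU flame data like this is typically captured as "folded stacks", one stack per line with a trailing blocked-time value (`frame1;frame2;... <value>`). As a sketch of how such a capture can be summarized without rendering a chart, the helper below picks out the heaviest stacks; the format assumption and sample frames are illustrative, not taken from the actual capture:

```python
from collections import defaultdict

def top_blocked_stacks(folded_lines, n=3):
    """Aggregate folded-stack lines ("frame;frame;... value") and return
    the n stacks with the largest total blocked time, descending."""
    totals = defaultdict(int)
    for line in folded_lines:
        line = line.strip()
        if not line:
            continue
        # the value is the last whitespace-separated token on the line
        stack, _, value = line.rpartition(" ")
        totals[stack] += int(value)
    return sorted(totals.items(), key=lambda kv: -kv[1])[:n]
```

On a capture like this one, a lock-contention problem would show up as a single mutex-wait stack dominating the totals.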