## The Basics

The basic BIND 9 memory management object is a memory context: the application can have as many as is practical. There are two reasons to use separate memory contexts: a) logical separation - this includes both separate accounting and different configuration; and b) contention and speed - access to a memory context pinned to a specific thread is not blocked by other threads.

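
To make the pattern concrete, here is a minimal sketch of how a memory context is used via the `isc_mem` API from libisc. The signatures are roughly those of BIND 9.18 and differ between versions, so treat this as an illustration rather than a verbatim excerpt from `named`:

```c
/*
 * Sketch only: isc_mem API approximately as in BIND 9.18;
 * exact signatures vary between BIND versions.
 */
#include <isc/mem.h>

static void
example(void) {
	isc_mem_t *mctx = NULL;

	/* A subsystem creates its own context: separate accounting. */
	isc_mem_create(&mctx);

	/* Allocations are tracked against this specific context... */
	char *buf = isc_mem_get(mctx, 512);

	/* ...and must be returned to the same context with the same size. */
	isc_mem_put(mctx, buf, 512);

	/* Destroying the context checks that nothing leaked from it. */
	isc_mem_destroy(&mctx);
}
```

Because every allocation and deallocation is tied to a specific context, per-subsystem accounting comes essentially for free, and a context owned by a single thread can be used without cross-thread locking.
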
## Limiting memory use

The configuration option `max-cache-size` only affects the memory contexts used by the cache and the ADB (address database). All other memory contexts are unconstrained. This means that setting `max-cache-size` to 100% would lead to the OOM killer finding your BIND 9 process and killing it.

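
For example, you can cap the cache explicitly and leave headroom for everything else. This is a sketch; the actual value (shown here as an assumed 2048M) depends on your workload and on the overheads discussed below:

```
options {
    // Limits only the cache and ADB memory contexts; everything
    // else named allocates is on top of this.
    max-cache-size 2048M;  // a percentage such as 50% also works
};
```
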
### BIND 9.16 uses more memory than BIND 9.11

There are two reasons for this:


1. The networking model has changed. In BIND 9.11 there was a single "listener" that distributed the incoming work between idle threads. This simpler model was slower, but because there was a single listener, it also consumed less memory than the current multiple-listener model.

2. BIND 9.16 uses a hybrid of the new and old networking code. It uses the new networking code to receive and process incoming DNS messages (from clients), but it still uses the older networking code for sending and processing outgoing DNS messages (to other servers). This means it needs to run twice as many threads - there's a thread pool of workers for each function.

### BIND 9.18 uses less memory than BIND 9.16

BIND 9.18 uses less memory than 9.16; its usage is similar to 9.11. The part that sends and processes outgoing DNS messages (to other servers) was refactored to use the new networking code, so BIND 9.18 runs half as many threads as BIND 9.16 did.

The other major change implemented in BIND 9.18 was the replacement of the internal memory allocator with the jemalloc memory allocator. The internal memory allocator kept pools of memory for later reuse and would never free the reserved memory. The jemalloc memory allocator is much better suited to the memory usage patterns that BIND 9 exhibits, and it manages to be both fast and memory-efficient.

Our general recommendation for all deployments is to use jemalloc if possible. You can use jemalloc even with BIND 9.16 by forcing the linkage via extra LDFLAGS (`./configure LDFLAGS="-ljemalloc"` should do the trick), as sketched below.

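
For example, on a typical Linux build (the install path depends on your `--prefix`; the jemalloc development package must be installed, and its name varies by distribution):

```
# Force the linker flag when building BIND 9.16:
./configure LDFLAGS="-ljemalloc"
make && make install

# Verify that the installed named actually picked it up:
ldd /usr/sbin/named | grep jemalloc
```
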
## Measuring Memory

Measuring real memory usage can be tricky, but fortunately, there are some tools that can help.

### Measuring Memory Internally

The statistics channel exposes counters for the memory contexts. The important values are 'InUse' and 'Malloced'. The 'InUse' counter shows the memory used "externally", while 'Malloced' also includes the management overhead (the more memory contexts there are, the more overhead).

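
For a quick look at the raw counters, you can query the statistics channel directly. This is a sketch: it assumes a statistics channel listening on 127.0.0.1 port 8080.

```
# named.conf needs the channel enabled first, e.g.:
#   statistics-channels { inet 127.0.0.1 port 8080 allow { localhost; }; };
curl -s http://127.0.0.1:8080/json/v1/mem | python3 -m json.tool | less
```
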
You can use the attached [memory-json.py](uploads/e9398e64964bbd68e7715c594dadbd3e/memory-json.py) script to parse the statistics channel output and get data like the following excerpt (this is from the `main` branch):

```
MALLOCED: 13.3MiB == 13.3MiB
```

### Measuring Memory Externally

The rule of thumb is "Don't use the `top` command" - there are better tools that are less misleading. Two tools that are easily available on modern Linux systems are **pmap** and **smem**.

#### pmap
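
For example, the extended report gives per-mapping RSS and dirty-page columns plus a summary line (exact columns vary between procps versions):

```
# Extended per-mapping report for the running named:
pmap -x $(pidof named)
```
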

#### smem

```
$ smem -P named -a
```

`smem` reports the USS (unique set size) and PSS (proportional set size) of the process; the USS is the value to compare against the internal statistics (see the Differences section below).

### Differences

There are a couple of reasons why the numbers reported by the BIND 9 statistics channel might differ from the memory usage reported by the operating system.

**External libraries**

BIND 9 uses several external libraries - OpenSSL, libuv, libxml2, json-c, and possibly others. All of these also need memory from the operating system to operate. The difference should not be large, but it's also not negligible. If the difference between the used memory reported by the internal statistics channel and the USS is large (on a busy server), then congratulations, you've found a leak in an external library. (NOTE: BIND 9.19 - the development version - provides its own memory context to OpenSSL, libuv, and libxml2 if the library versions are recent enough.)

**Allocator churn**

There's quite a lot of churn in the memory allocations and deallocations on a busy server.

## Memory Profiling

When compiled in (or even linked at runtime using `LD_PRELOAD`), `jemalloc` can produce **heap** snapshots based on triggers (time, size, ...). These can later be analysed using the `jeprof` tool to see where the memory went.

The basics are:

```
export MALLOC_CONF="abort_conf:true,prof:true,lg_prof_interval:19,lg_prof_sample:19,prof_prefix:jeprof"
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 # you don't need this if named is already linked with jemalloc
/usr/sbin/named # use the options and configuration that you normally use in production
```


You'll probably need to fine-tune the `lg_prof_interval` and `lg_prof_sample` numbers to get the desired file size; the values are **log base 2**, so e.g. 19 means 2^19 bytes = 512 KiB.

After running the benchmark or the regular workload, you should end up with a bunch of `jeprof.<pid>.<m>.i<n>.heap` files. Pick the latest and run:

```
jeprof \
    /usr/sbin/named **HEAP FILE** --pdf > "jeprof.pdf"
```


More options can be found in the [jeprof](https://manpages.ubuntu.com/manpages/impish/man1/jeprof.1.html) manual page. The results can't be taken as-is, but must be interpreted with knowledge of BIND 9 internals. That said, if you are reporting what you think is a memory issue, attaching the output of `jeprof` will certainly help.

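
If you just want a quick look without generating a PDF, a plain-text summary of the biggest allocation sites works too (same placeholder as above):

```
# Top in-use allocation sites, as plain text:
jeprof --text /usr/sbin/named **HEAP FILE** | head -n 20
```
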
## Graphs

### Resolver Benchmarks


Below are some basic graphs comparing memory usage in BIND 9.11, 9.16, 9.18, and 9.19 (aka `main`).


As you can see, 9.18 and 9.19 memory usage is in the same ballpark as 9.11, but the latency has improved greatly. The 9.16 memory usage is roughly double, as described above (because it runs twice the number of worker threads).

![all-latency-since_0-until_60-2](uploads/bcbd74704b6cce50fcdf29ac131d50d0/all-latency-since_0-until_60-2.png) ![all-latency-since_60-until_120](uploads/7767f7276982a4eda11806cb5130c69f/all-latency-since_60-until_120.png) ![all-resmon.memory.current-docker](uploads/0130242517994173bf12df6c7d9785f6/all-resmon.memory.current-docker.png)