|
|
|
|
|
## The Basics
|
|
|
|
|
|
BIND 9's basic memory management object is a memory context: the application can have as many as is practical. There are two reasons for a separate memory context: a) logical separation - this includes both separate accounting and different configurations; b) contention and speed - access to a memory context pinned to a specific thread will not be blocked by other threads.
|
|
|
|
|
|
|
|
|
## Limiting the memory use
|
|
|
|
|
|
The configuration option `max-cache-size` only affects the memory contexts used by the cache and the ADB (address database) - all other memory contexts are unrestrained. This means that setting `max-cache-size` to 100% would lead to the OOM Reaper finding your BIND 9 process and killing it.
|
|
|
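For illustration, capping the cache could look like this in `named.conf` (the 512M value is just an arbitrary example; as noted above, `max-cache-size` also accepts a percentage, and the limit does not cover the other memory contexts):

```
options {
    // Limit the cache and ADB memory contexts only;
    // all other memory contexts remain unrestrained.
    max-cache-size 512M;   // a percentage such as 50% also works
};
```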
|
|
|
|
|
|
### BIND 9.11 and BIND 9.16 memory differences
|
|
|
|
|
|
There are two reasons why BIND 9.16 uses more memory than BIND 9.11:
|
|
|
|
|
|
The networking model has changed - in BIND 9.11 there was a single "listener" that would distribute the incoming work between idle threads. This simpler model was slower, but because there was a single listener, it also consumed less memory.
|
|
|
BIND 9.16 has a hybrid networking model, using the new multi-listener networking code to receive and process incoming DNS messages (from clients), but still using the old networking code for sending and processing outgoing DNS messages (to other servers). This means it needs to run twice as many threads - there's a threadpool of workers for each function.
|
|
|
|
|
|
|
|
|
|
|
|
### BIND 9.16 and BIND 9.18 memory differences
|
|
|
|
|
|
Memory usage in BIND 9.18 is lower than in 9.16, and similar to the memory usage in 9.11, because the part that sends and processes outgoing DNS message was refactored to use the new networking code. Therefore BIND 9.18 uses half the number of threads that BIND 9.16 was using.
|
|
|
|
|
|
|
|
|
The other major change implemented in BIND 9.18 was the replacement of the internal memory allocator with the jemalloc memory allocator. The internal allocator kept pools of memory chunks for later reuse and never freed the reserved memory. jemalloc is much better suited to the memory usage patterns that BIND 9 exhibits and manages to be both fast and memory-efficient.
|
|
|
|
Measuring the real memory usage can be tricky, but fortunately, there are some tools that can help.
|
|
|
|
|
### Measuring Memory Internally
|
|
|
|
|
|
The statistics channel exposes counters for the memory contexts. The important values are 'InUse' and 'Malloced'. The difference between the two is that the InUse counter shows the memory used "externally" and 'Malloced' is the memory including the management overhead (the more memory contexts the more overhead there is).
|
|
|
|
|
|
|
|
|
You can use the attached [memory-json.py](uploads/e9398e64964bbd68e7715c594dadbd3e/memory-json.py) script to parse the statistics channel output and obtain the following data (this is from the `main` branch):
|
|
|
```
|
|
|
OpenSSL: 268.8KiB 277.0KiB
|
|
|
uv: 6.1KiB 14.3KiB
|
|
|
libxml2: 1.0KiB 9.2KiB
|
|
|
<unknown>: 9.2KiB 17.4KiB
|
|
|
main: 1.4MiB 1.5MiB
|
|
|
loop: 10.8MiB 10.8MiB
|
|
|
zonemgr-mctxpoo: 20.5KiB 86.1KiB
|
|
|
clientmgr: 768.0B 66.4KiB
|
|
|
cache: 31.9KiB 48.3KiB
|
|
|
cache_heap: 2.1KiB 18.5KiB
|
|
|
ADB: 525.7KiB 542.1KiB
|
|
|
SUMMARY
|
|
|
INUSE: 13.1MiB == 13.1MiB
|
|
|
MALLOCED: 13.3MiB == 13.3MiB
|
|
|
```
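For illustration, the per-context summing that such a script performs can be sketched in a few lines of Python. The `memory` → `contexts` layout with `inuse` and `malloced` fields is an assumption about the statistics channel JSON format; check it against your BIND 9 version.

```python
# Minimal sketch: sum per-context memory counters from BIND 9
# statistics channel JSON. The "memory" -> "contexts" layout with
# "name", "inuse" and "malloced" fields is an assumption; verify
# against the JSON your version of BIND 9 actually emits.

def summarize(stats: dict) -> dict:
    inuse = 0
    malloced = 0
    for ctx in stats["memory"]["contexts"]:
        inuse += int(ctx["inuse"])
        malloced += int(ctx["malloced"])
    return {"inuse": inuse, "malloced": malloced}

# Tiny hand-made example instead of a live server:
example = {
    "memory": {
        "contexts": [
            {"name": "main", "inuse": 1400, "malloced": 1500},
            {"name": "cache", "inuse": 320, "malloced": 480},
        ]
    }
}

totals = summarize(example)
print(totals)  # {'inuse': 1720, 'malloced': 1980}
```

As the output shows, 'Malloced' is always at least as large as 'InUse', because it includes the management overhead of the contexts themselves.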
|
|
|
|
|
|
### Measuring Memory Externally
|
|
|
|
The rule of thumb is "Don't use the top command" - there are better tools that are listed below.
|
|
|
|
|
#### pmap
|
|
|
|
|
|
`pmap` provides detailed statistics, but can be too chatty - the basic usage is `pmap -x -p <pid>`. It prints information about all pages used by the program, which includes shared libraries, the program itself, and the heap. The important number is the last one, "Dirty" - it shows the memory "used" by BIND 9.
|
|
|
|
|
|
|
|
|
Example `pmap` output could look like this:
|
|
|
```
|
|
|
$ pmap -x -p $(pidof named)
|
|
|
3301879: /usr/sbin/named -4 -g -c named.conf
|
|
|
Address Kbytes RSS Dirty Mode Mapping
|
|
|
000055872b587000 88 88 0 r---- /usr/sbin/named
|
|
|
[...too many lines...]
|
|
|
00007ffc52753000 132 40 40 rw--- [ stack ]
|
|
|
00007ffc527c1000 16 0 0 r---- [ anon ]
|
|
|
00007ffc527c5000 8 4 0 r-x-- [ anon ]
|
|
|
---------------- ------- ------- -------
|
|
|
total kB 760180 74324 60708
|
|
|
```
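The "Dirty" total can also be pulled out of `pmap -x` output programmatically; here is a small sketch, with the column positions assumed from the example output above:

```python
# Sketch: extract the total "Dirty" kilobytes from `pmap -x` output.
# Column layout assumed from the example above:
#   Address Kbytes RSS Dirty Mode Mapping
# with a "total kB  <Kbytes> <RSS> <Dirty>" summary line at the end.

def total_dirty_kb(pmap_output: str) -> int:
    for line in pmap_output.splitlines():
        if line.startswith("total kB"):
            # tokens: ["total", "kB", Kbytes, RSS, Dirty]
            return int(line.split()[4])
    raise ValueError("no 'total kB' line found")

sample = """3301879: /usr/sbin/named -4 -g -c named.conf
Address Kbytes RSS Dirty Mode Mapping
000055872b587000 88 88 0 r---- /usr/sbin/named
---------------- ------- ------- -------
total kB 760180 74324 60708"""

print(total_dirty_kb(sample))  # 60708
```

In practice you would feed it the live output, e.g. the result of running `pmap -x -p $(pidof named)`.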
|
|
|
|
|
|
#### smem
|
|
|
|
|
|
`smem` provides fewer details, so if you want only a single number, run `smem -P named` and look at the USS column - it shows the memory used by the program sans shared libraries. The PSS column adds shared libraries divided by the number of programs using those libraries, and RSS is the normal Resident Set Size.
|
|
|
|
|
|
|
|
|
```
|
|
|
$ smem -P named -a
|
|
|
PID User Command Swap USS PSS RSS
|
|
|
3301879 ondrej /home/ondrej/Projects/bind9/bin/named/.libs/named -4 -g -c named.conf 0 69664 70201 74324
|
|
|
```
|
|
|
|
|
|
### Differences
|
|
|
|
|
|
There are a couple of explanations for why the numbers reported by the BIND 9 statistics channel might differ from the memory usage reported by the operating system.
|
|
|
|
|
|
**External libraries** - BIND 9 uses several external libraries - OpenSSL, libuv, libxml2, json-c and possibly others. All of these also need memory from the operating system to operate. The difference should not be large, but it's also not negligible. If the difference between the used memory reported by the internal statistics channel and USS is large (on a busy server), then congratulations, you've found a leak in an external library. (NOTE: BIND 9.19 - the development version - provides its own memory context for OpenSSL, libuv and libxml2 if the library versions are recent enough.)
|
|
|
**Memory fragmentation** - there's quite a lot of churn in the memory allocations and deallocations on a busy server, and memory gets fragmented. The default Linux allocator isn't particularly good with the BIND 9 memory usage patterns. Using jemalloc is strongly recommended as it handles memory fragmentation much better and is also faster.
|
|
|
|
|
|
|
|
|
|
|
|
## Memory Profiling
|
|
|
|
|
|
When BIND 9 is compiled with `jemalloc` (or the library is even just linked at runtime using `LD_PRELOAD`), jemalloc can produce **heap** snapshots based on triggers (time, size, ...). These snapshots can later be analysed using the `jeprof` tool to see where the memory went.
|
|
|
|
|
|
A very basic profile:
|
|
|
|
|
|
```
|
|
|
export MALLOC_CONF="abort_conf:true,prof:true,lg_prof_interval:19,lg_prof_sample:19,prof_prefix:jeprof"
|
|
|
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 # you don't need that if compiled with jemalloc
|
|
|
/usr/sbin/named # use your normal options and configuration that you use in production
|
|
|
```
|
|
|
|
|
|
You'll probably need to fine-tune the `lg_prof_interval` and `lg_prof_sample` numbers (they are **log base 2** values) to manage the size and number of the dump files.
|
|
|
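Because these options are log base 2, the resulting byte values can be computed directly; a quick sanity check in plain Python (simple arithmetic, not a jemalloc API):

```python
# lg_prof_interval / lg_prof_sample take log-base-2 values:
# the setting 19 used above means 2**19 bytes (512 KiB).

def lg_to_bytes(lg: int) -> int:
    return 2 ** lg

print(lg_to_bytes(19))  # 524288 bytes == 512 KiB
print(lg_to_bytes(30))  # 1073741824 bytes == 1 GiB
```

So raising the value by one doubles the interval, which is why small adjustments change the file volume dramatically.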
|
|
|
|
|
|
After running a benchmark or the regular workload, you should end up with a bunch of `jeprof.<pid>.<m>.i<n>.heap` files. Pick the latest and run:
|
|
|
|
```
jeprof \
|
|
/usr/sbin/named **HEAP FILE** --pdf > "jeprof.pdf"
|
|
|
```
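Picking the latest heap dump can be scripted too; a small sketch, assuming the dumps sit in the current directory and follow the file-name pattern mentioned above:

```python
# Sketch: pick the newest jeprof heap dump in a directory
# (file-name pattern from the text: jeprof.<pid>.<m>.i<n>.heap).
import glob
import os

def latest_heap(directory: str = ".") -> str:
    files = glob.glob(os.path.join(directory, "jeprof.*.heap"))
    if not files:
        raise FileNotFoundError("no jeprof heap dumps found")
    # The most recently modified dump is the latest snapshot.
    return max(files, key=os.path.getmtime)
```

The returned path is what you would substitute for **HEAP FILE** in the `jeprof` invocation above.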
|
|
|
|
|
|
|
|
More options can be found in the [jeprof](https://manpages.ubuntu.com/manpages/impish/man1/jeprof.1.html) manual page. The output can't be taken as is - it must be interpreted with knowledge of BIND 9 internals. That said, if you are reporting what you think is a memory issue, attaching the output of `jeprof` will certainly help.
|
|
|
|
|
|
## Some measurements
|
|
|
|
|
|
To support what has been said in this article, here are some basic graphs comparing 9.11, 9.16, 9.18, and 9.19 (codenamed `main`).
|
|
|
|
|
|
As you can see, the 9.18 and 9.19 memory usage is in the same ballpark as 9.11, but the latency has improved greatly. The 9.16 memory usage is roughly double, as described above (it runs twice the number of worker threads).
|
|
|
|
|
|
![all-latency-since_0-until_60-2](uploads/bcbd74704b6cce50fcdf29ac131d50d0/all-latency-since_0-until_60-2.png)
|
|
|
![all-latency-since_60-until_120](uploads/7767f7276982a4eda11806cb5130c69f/all-latency-since_60-until_120.png)
|
|
|
![all-resmon.memory.current-docker](uploads/0130242517994173bf12df6c7d9785f6/all-resmon.memory.current-docker.png)
|
|