LMDB 0.9.26 will break "rndc reconfig" (+ other LMDB issues)
The way BIND uses LMDB is at odds with what the authors of that library
expect and intend. While I cannot find any mention of that
recommendation in the docs, the LMDB author himself says that
"you should only ever open an environment once in any particular
process".[1] BIND calls mdb_env_open() and mdb_env_close()
multiple times within the lifetime of a single process, but
AFAICT, doing that in an "open → close → open → close → ..." type of
sequence works just fine. Unfortunately, BIND does something worse, and
it is related to what happens during an rndc reconfig: new views are
configured first, and only after that happens do the old views get torn
down. For LMDB, this means that we call mdb_env_open() for an LMDB
environment that the same process has already opened with
mdb_env_open(), and only then close the old "instance" of the
environment. I am not sure we
ever cared about this a lot because it seemed to Just Work™. It did
bite us once (see 40a90fbf), but we
worked around the problem and moved on.
Still, the docs clearly state:
"Do not have open an LMDB database twice in the same process at the same time."
We are not getting away with it as easily this time around.
In December 2019, the FreeBSD port for LMDB started using robust
mutexes, which FreeBSD started supporting in the 11.0 release. This
broke LMDB-related BIND system tests on FreeBSD. I investigated it and
my conclusion was that the problem was likely caused by some low-level
FreeBSD issue that was over my head. I reported it and it was
ultimately determined to be an undefined-behavior-type issue with
what the FreeBSD threading library does when a mutex is unmapped from
memory without being destroyed first. This prompted LMDB maintainers to
merge a fix which causes LMDB to destroy locktable mutexes when
mdb_env_close() is called and the process calling that function is the
only remaining user of the LMDB environment. This change has
already been applied as a patch to the FreeBSD port of LMDB 0.9.24 and
I fully expect it to be a part of the next LMDB release, i.e. version
0.9.26, as it has also been merged into the LMDB 0.9 release
branch.
The problem is that the aforementioned fix breaks rndc reconfig on
all platforms we test on because it breaks the kludge we have been
effectively relying on so far. This is what we do (note that all calls
are for the same LMDB environment on disk, even though the pointers used
below, envA and envB, are different!):
1. mdb_env_open(envA)
2. (rndc reconfig is invoked)
3. mdb_env_open(envB)
4. mdb_env_close(envA)
Since all of this happens within a single process, the mdb_env_close()
call from step 4 destroys the locktable mutexes (because it correctly
observes that the current process is the only remaining user of the LMDB
environment at hand), which prevents any subsequent mdb_txn_begin()
calls from succeeding. Here is an example system test job which
triggers the problem:
https://gitlab.isc.org/isc-projects/bind9/-/jobs/975003
This problem affects all maintained BIND branches.
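To make the failure mode easier to follow, here is a toy model of the four-step sequence in plain C. It does not call LMDB at all: every name in it (toy_locktable, toy_env, toy_env_open(), toy_env_close(), toy_txn_begin(), toy_replay_reconfig()) is made up for illustration, and the close logic only mirrors the behavior described above, namely that the "only remaining user" check looks at processes and not at handles within one process.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in LMDB or BIND. */

/* Stands in for the lock table shared by every handle to one environment. */
struct toy_locktable {
    bool mutexes_valid;  /* false once the mutexes have been destroyed */
    int other_processes; /* processes other than ours using the environment */
};

/* Stands in for one environment handle (envA or envB). */
struct toy_env {
    struct toy_locktable *table;
    bool open;
};

void toy_env_open(struct toy_env *env, struct toy_locktable *table) {
    env->table = table;
    env->open = true;
}

/*
 * Models the post-fix close: the check is per process, so a second
 * handle held by the same process does not count as another user.
 */
void toy_env_close(struct toy_env *env) {
    if (env->table->other_processes == 0) {
        env->table->mutexes_valid = false;
    }
    env->open = false;
}

/* Returns 0 on success, -1 if the handle is closed or the mutexes are gone. */
int toy_txn_begin(const struct toy_env *env) {
    if (!env->open || !env->table->mutexes_valid) {
        return -1;
    }
    return 0;
}

/* Replays the four steps and reports what a transaction on envB gets. */
int toy_replay_reconfig(void) {
    struct toy_locktable table = { .mutexes_valid = true, .other_processes = 0 };
    struct toy_env envA, envB;

    toy_env_open(&envA, &table); /* step 1 */
                                 /* step 2: rndc reconfig is invoked */
    toy_env_open(&envB, &table); /* step 3 */
    toy_env_close(&envA);        /* step 4 */
    return toy_txn_begin(&envB);
}
```

In this model toy_replay_reconfig() returns -1 even though envB was never closed, which is the shape of the breakage seen in the system tests.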
The only way I can see to work around the problem yet again without
redesigning the whole thing from the ground up is to employ MDB_NOLOCK
and use a mutex for controlling concurrent access to the LMDB database
ourselves. What saves us here is that we already have a mutex
handy and we can just broaden its scope without bumping the API versions
for our libraries in 9.11. I will submit a merge request implementing
this workaround shortly.
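The general shape of that workaround could look roughly like the sketch below. It is not the actual merge request: toy_db_lock, toy_db_op, toy_with_db_lock() and toy_increment() are hypothetical names, and the callback stands in for whatever LMDB transaction work BIND performs on an environment opened with MDB_NOLOCK.

```c
#include <pthread.h>

/* Hypothetical names; BIND's real mutex and call sites look different. */
static pthread_mutex_t toy_db_lock = PTHREAD_MUTEX_INITIALIZER;

/* One unit of database work, e.g. a whole read or write transaction. */
typedef int (*toy_db_op)(void *arg);

/*
 * Runs one database operation while holding the process-wide mutex.
 * With MDB_NOLOCK the library does no locking of its own, so every
 * access to the environment has to go through a wrapper like this.
 * Returns the operation's result, or -1 if locking or unlocking failed.
 */
int toy_with_db_lock(toy_db_op op, void *arg) {
    int result;

    if (pthread_mutex_lock(&toy_db_lock) != 0) {
        return -1;
    }
    /* Real code would do mdb_txn_begin() ... mdb_txn_commit() in here. */
    result = op(arg);
    if (pthread_mutex_unlock(&toy_db_lock) != 0) {
        return -1;
    }
    return result;
}

/* Stand-in operation used only to exercise the wrapper. */
int toy_increment(void *arg) {
    int *counter = arg;

    *counter += 1;
    return 0;
}
```

Calling toy_with_db_lock(toy_increment, &counter) bumps the counter under the mutex. The cost is that readers and writers are fully serialized within the process, which should be acceptable given that our use case does not need fast concurrent access.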
Honestly, though, I am afraid that this will just be another bandaid.
Call me an Eastern European grumbler, but I am not happy with the way
LMDB support has been implemented in BIND. We seem to have chosen
LMDB because it was apparently performing slightly better than SQLite 3.
The thing is, I do not think our use case needs fast concurrent access
to the database; we need something that allows us to add, remove, and
query zone configuration faster than scanning a flat file sequentially
(which is what pretty much any sane database should be capable of).
LMDB lives up to its promises about speed, but it comes with a set of
caveats that we need to cater for, which complicates our code given how
BIND works. To make things worse, our LMDB support is implemented with
#ifdef guards, which means it shares some of its code with the
non-LMDB variant (NZF, a text file), but not all of it, which makes
the code harder to follow than it has to be.
Here are some ideas for what we can do in the future to improve the state of things:
- Rework the LMDB implementation in BIND so that it matches the intended use of that library. This could be achieved by keeping a global list of reference-counted LMDB environment objects, each of which would be associated with a specific view name (not view instance!). This approach should allow rndc reconfig to do without calling mdb_env_open() or mdb_env_close(). I think such a change would be too severe to go into 9.16, though.
- Use a different database that will likely be slower than LMDB, but might be simpler to use.
- Move LMDB support to a module (easier said than done).
- Drop LMDB support altogether :-)
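As a rough illustration of the first idea, a registry of reference-counted entries keyed by view name might look like the sketch below. Everything in it is hypothetical (toy_env_entry, toy_registry, toy_env_attach(), toy_env_detach()); it leaves out the actual LMDB calls and the locking a real implementation would need around the list.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch; none of these names are BIND or LMDB API. */
struct toy_env_entry {
    char *view_name;            /* key: the view name, not the view instance */
    int refs;                   /* number of view instances attached */
    void *env;                  /* would hold the environment handle */
    struct toy_env_entry *next;
};

static struct toy_env_entry *toy_registry = NULL;

/*
 * Returns the entry for view_name, creating it on first use.
 * Returns NULL if memory allocation fails.
 */
struct toy_env_entry *toy_env_attach(const char *view_name) {
    struct toy_env_entry *e;
    size_t len;

    for (e = toy_registry; e != NULL; e = e->next) {
        if (strcmp(e->view_name, view_name) == 0) {
            e->refs++;
            return e;
        }
    }
    e = calloc(1, sizeof(*e));
    if (e == NULL) {
        return NULL;
    }
    len = strlen(view_name) + 1;
    e->view_name = malloc(len);
    if (e->view_name == NULL) {
        free(e);
        return NULL;
    }
    memcpy(e->view_name, view_name, len);
    e->refs = 1;
    /* Real code would call mdb_env_open() here, once per view name. */
    e->next = toy_registry;
    toy_registry = e;
    return e;
}

/* Drops one reference and returns how many remain; frees the entry at 0. */
int toy_env_detach(struct toy_env_entry *entry) {
    struct toy_env_entry **p = &toy_registry;

    if (--entry->refs > 0) {
        return entry->refs;
    }
    while (*p != NULL && *p != entry) {
        p = &(*p)->next;
    }
    if (*p == entry) {
        *p = entry->next;
    }
    /* Real code would call mdb_env_close() here, on the last detach only. */
    free(entry->view_name);
    free(entry);
    return 0;
}
```

During a reconfig, the new view instance would attach to the existing entry before the old instance detaches, so the reference count never drops to zero and the environment is neither reopened nor closed.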
[1] The docs do in fact state the same thing, see below.