Commit 0fdc68bb authored by Michal 'vorner' Vaner's avatar Michal 'vorner' Vaner
Browse files

Merge #2776

Resolver research document about mixed resolver/authoritative mode
parents e93ef80d 3a98bed9
The general issue is how to insure that the resolver scales.
Currently resolvers are CPU bound, and it seems likely that both
instructions-per-cycle and CPU frequency will not increase radically,
scaling will need to be across multiple cores.
How can we best scale a recursive resolver across multiple cores?
Some possible solutions:
a. Multiple processes with independent caches
b. Multiple processes with shared cache
c. A mix of independent/shared cache
d. Thread variations of the above
All of these may be complicated by NUMA architectures (with
faster/slower access to specific RAM).
Mixed recursive & authoritative setup
Ideally we will run the authoritative server independently of the
recursive resolver.
We need a way to run both an authoritative and a recursive resolver on
the same machine and listening on the same IP/port. But we need a way to
run only one of them as well.
This is mostly the same problem as we have with DDNS packets and xfr-out
requests, but they aren't that performance sensitive as auth & resolver.
There are a number of possible approaches to this:
One fat module
With some build system or dynamic linker tricks, we create three modules:
* Stand-alone auth
* Stand-alone resolver
* Compound module containing both
The user then chooses either one stand-alone module, or the compound one,
depending on the requirements.
* It is easier to switch between processing and ask authoritative questions
from within the resolver processing.
* The code is not separated (one bugs takes down both, admin can't see which
one takes how much CPU).
* BIND 9 does this and its code is a jungle. Maybe it's not just a
* Limits flexibility -- for example, we can't then decide to make the resolver
threaded (or we would have to make sure the auth processing doesn't break
with threads, which will be hard).
There's also the idea of putting the auth into a loadable library and the
resolver could load and use it somehow. But the advantages and disadvantages
are probably the same.
Auth first
We do the same as with xfrout and ddns. When a query comes, it is examined and
if the `RD` bit is set, it is forwarded to the resolver.
* Separate auth and resolver modules
* Minimal changes to auth
* No slowdown on the auth side
* Counter-intuitive asymmetric design
* Possible slowdown on the resolver side
* Resolver needs to know both modes (for running stand-alone too)
There's also the possibility of the reverse -- resolver first. It may make
more sense for performance (the more usual scenario would probably be a
high-load resolver with just few low-volume authoritative zones). On the other
hand, auth already has some forwarding tricks.
Auth with cache
This is mostly the same as ``Auth first'', however, the cache is in the auth
server. If it is in the cache, it is answered right away. If not, it is then
forwarded to the resolver. The resolver then updates the cache too.
* Probably good performance
* Cache duplication (several auth modules, it doesn't feel like it would work
with shared memory without locking).
* Cache is probably very different from authoritative zones, it would
complicate auth processing.
* The resolver needs own copy of cache (to be able to get partial results),
probably a different one than the auth server.
One module does only the listening. It doesn't process the queries itself, it
only looks into them and forwards them to the processing modules.
* Clean design with separated modules
* Easy to run modules stand-alone
* Allows for solving the xfrout & ddns forwarding without auth running
* Allows for views (different auths with different configurations)
* Allows balancing/clustering across multiple machines
* Easy to create new modules for different kinds of DNS handling and share
port with them too
* Need to set up another module (not a problem if we have inter-module
dependencies in b10-init)
* Possible performance impact. However, experiments show this is not an issue,
and the receptionist can actually increase the throughput with some tuning
and the increase in RTT is not big.
Implementation ideas
* Let's have a new TCP transport, where we send not only the DNS messages,
but also the source and destination ports and addresses (two reasons --
ACLs in target module and not keeping state in the receptionist). It would
allow for transfer of a batch of messages at once, to save some calls to
kernel (like a length of block of messages, it is read at once, then they
are all parsed one by one, the whole block of answers is sent back).
* A module creates a listening socket (UNIX by default) on startup and
contacts all the receptionists. It sends what kind of packets to send
to the module and the address of the UNIX socket. All the receptionists
connect to the module. This allows for auto-configuring the receptionist.
* The queries are sent from the receptionist in batches, the answers are sent
back to the receptionist in batches too.
* It is possible to fine-tune and use OS-specific tricks (like epoll or
sending multiple UDP messages by single call to sendmmsg()).
Implement the receptionist in a way we can still work without it (not throwing
the current UDPServer and TCPServer in asiodns away).
The way we handle xfrout and DDNS needs some changes, since we can't forward
sockets for the query. We would implement the receptionist protocol on them,
which would allow the receptionist to forward messages to them. We would then
modify auth to be able to forward the queries over the receptionist protocol,
so ordinary users don't need to start the receptionist.
Cache performance may be important for the resolver. It might not be
critical. We need to research this.
One key question is: given a specific cache hit rate, how much of an
impact does cache performance have?
For example, if we have 90% cache hit rate, will we still be spending
most of our time in system calls or in looking things up in our cache?
There are several ways we can consider figuring this out, including
measuring this in existing resolvers (BIND 9, Unbound) or modeling
with specific values.
Once we know how critical the cache performance is, we can consider
which algorithm is best for that. If it is very critical, then a
custom algorithm designed for DNS caching makes sense. If it is not,
then we can consider using an STL-based data structure.
This directory contains research and design documents for the BIND 10
resolver reimplementation.
Each file contains a specific issue and discussion surrounding that
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment