Implement Stream DNS transport: refactor stream-based DNS transports to avoid code duplication and having two different TLS (and TCP) implementations

Currently, our networking code contains two OpenSSL-based implementations of TLS:

- TLS DNS (tlsdns.c);
- TLS stream (tlsstream.c).

The first one (the so-called TLS DNS) is a specialised implementation targeted at handling DNS messages sent in the same format one would expect to see over TCP (each message prefixed with its two-byte length). It is implemented using libuv and OpenSSL directly and cannot be used for anything other than handling DNS messages.

The second one (the so-called TLS stream) is a universal implementation that exposes the same interface as the generic TCP transport. Moreover, it shares unit testing code with TCP. It follows a layered design: it does not reference libuv directly, but offloads data exchange to the generic TCP networking code and concentrates on encrypting and decrypting data using OpenSSL. Because it offloads networking to the generic TCP code, it is universal and much more compact (TLS DNS is roughly twice as large as TLS stream in terms of lines of code).

To make things more complicated, and harder to maintain in the long run, there is also a separate transport for DNS over TCP (the so-called TCP DNS, tcpdns.c), which bears a strong resemblance to TLS DNS: it also uses libuv directly, but does not deal with encryption. The format of the DNS messages, however, is the same.

So, at this point, we have four stream transports: TCP (tcp.c), TCP DNS, TLS DNS and TLS stream (DNS over HTTP, http.c, is in a category of its own). Of these:

- three (TCP, TCP DNS, TLS DNS) have code so similar that a change in one quite often has to be repeated three times;
- two (TLS DNS, TLS stream) implement TLS, but in different ways;
- two (TCP DNS, TLS DNS) are specialised and deal with DNS messages only, in essentially the same format (sans the encryption).

As far as I understand, this situation is a result of the Network Manager's instability at the time the transports were developed. Because they were implemented in the middle of a refactoring, building new transports directly on top of libuv, essentially by forking the generic TCP code, turned out to be more productive.

In my opinion, this situation will cause a lot of headaches in the future, even more so as we add more DNS transports. To improve it, we could get rid of most of the similar transports, most obviously the duplicate TLS implementation.

To do so, we can reimplement the transport for DNS over TLS (TLS DNS) on top of the universal TLS code (TLS stream). This implies a layered design similar to our DNS over HTTP(S) implementation but, of course, much simpler, because it does not have to deal with HTTP intricacies or HTTP/2 stream multiplexing.

This design has several important advantages. Firstly, it maximises code reuse and avoids duplication: a simple bug fix in the TCP or TLS transport would immediately benefit the specialised transports implemented on top of them. Secondly, it should reduce code complexity. The resulting transport's code would be much simpler, because it would not need to deal with two protocols (DNS and TLS) at the same time: it offloads encryption to the underlying TLS transport and only has to assemble and frame DNS messages properly in the “stream format” (where each DNS message is prefixed with its length).
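To illustrate how little the “stream format” involves on the sending side, here is a minimal C sketch of framing one outgoing message. It assumes the standard layout from RFC 1035 (section 4.2.2) and RFC 7858: a two-octet length in network byte order, followed by the message itself. The function name `stream_dns_frame()` is hypothetical and not part of BIND's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: write `msg` into `out` in the DNS stream format,
 * i.e. a two-octet length in network byte order followed by the message.
 * Returns the number of bytes written, or 0 if the message is empty, too
 * large for the 16-bit length field, or does not fit into `out`.
 * The buffers must not overlap.
 */
size_t
stream_dns_frame(const uint8_t *msg, size_t msglen, uint8_t *out,
		 size_t outlen) {
	if (msglen == 0 || msglen > UINT16_MAX || outlen < msglen + 2) {
		return 0;
	}
	out[0] = (uint8_t)(msglen >> 8);   /* high-order byte first */
	out[1] = (uint8_t)(msglen & 0xff); /* then the low-order byte */
	memcpy(out + 2, msg, msglen);
	return msglen + 2;
}
```

For a three-byte payload the function writes `00 03` followed by the payload and returns 5; it returns 0 if the output buffer cannot hold the prefix and the message.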

In fact, we can go further than that without adding complexity. Let’s look at TLS DNS and TCP DNS from the networking perspective or, to be more precise, in terms of the OSI model. They exchange data in the same format apart from the encryption; that is, they differ only at the “Presentation” layer of the OSI model. Thus, from the application's perspective, they can be considered implementations of the same protocol, and the difference between them is the same as that between HTTP and HTTPS. Here we can again look at how the DNS over HTTP(S) transport is implemented: it can be used both with encryption (over TLS stream) and without it (over TCP), and its code is, in general, written in a transport-agnostic way. We can apply a similar design to TCP DNS and TLS DNS and implement a single transport, Stream DNS, which can use either the plain TCP or the encrypted TLS stream transport depending on the situation.
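A rough C sketch of what “transport agnostic” could mean here: Stream DNS talks to the underlying stream only through a small table of function pointers, so plain TCP and TLS stream would merely provide different implementations of that table. Everything below (`stream_ops_t`, `region_t`, `streamdns_send()`, the in-memory mock transport) is hypothetical and far simpler than the real netmgr interfaces; it only demonstrates the layering.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scatter/gather region. */
typedef struct region {
	const uint8_t *base;
	size_t length;
} region_t;

/* Hypothetical operations Stream DNS needs from the stream below it. */
typedef struct stream_ops {
	/* Send `count` regions back to back; 0 on success, -1 on failure. */
	int (*send)(void *handle, const region_t *regions, size_t count);
} stream_ops_t;

typedef struct streamdns {
	const stream_ops_t *ops; /* plain TCP, TLS stream, ... */
	void *handle;            /* the underlying transport's handle */
} streamdns_t;

/*
 * Send path: prepend the two-octet length and hand both regions to the
 * underlying stream. Nothing here knows whether the bytes get encrypted.
 */
int
streamdns_send(streamdns_t *sock, const uint8_t *msg, size_t msglen) {
	uint8_t prefix[2];
	if (msglen == 0 || msglen > UINT16_MAX) {
		return -1;
	}
	prefix[0] = (uint8_t)(msglen >> 8);
	prefix[1] = (uint8_t)(msglen & 0xff);
	const region_t regions[2] = {
		{ .base = prefix, .length = sizeof(prefix) },
		{ .base = msg, .length = msglen },
	};
	return sock->ops->send(sock->handle, regions, 2);
}

/* Mock "plain" transport for testing: appends all bytes to a buffer. */
typedef struct memstream {
	uint8_t data[1024];
	size_t used;
} memstream_t;

static int
memstream_send(void *handle, const region_t *regions, size_t count) {
	memstream_t *ms = handle;
	for (size_t i = 0; i < count; i++) {
		if (ms->used + regions[i].length > sizeof(ms->data)) {
			return -1; /* mock only: a partial write is not undone */
		}
		memcpy(ms->data + ms->used, regions[i].base, regions[i].length);
		ms->used += regions[i].length;
	}
	return 0;
}

const stream_ops_t memstream_ops = { .send = memstream_send };
```

A TLS stream backend would supply its own `send` that encrypts before writing, while `streamdns_send()` would stay unchanged. Passing the prefix and the message as two regions also avoids copying the message just to prepend two bytes.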

Thus, the number of stream DNS transports would be reduced from two to one (an extensible one), in addition to reducing the number of TLS implementations in the same way. This could pay off even more later if we decide to implement support for the PROXY (v2) protocol. To support TCP DNS queries over the PROXY protocol, we could potentially implement a simple PROXY stream transport and extend the existing Stream DNS with perhaps a dozen lines of new code. The existing DNS over HTTP(S) code could be extended with PROXY support in a similar way.

(I have to admit that I mention the PROXY protocol here purely for illustrative purposes, as I haven’t looked into it much, but I cannot see why it would not work. Of course, the Stream DNS approach would not help with implementing PROXY protocol support over UDP, which should be a separate simple transport anyway.)

In my honest opinion, the only transports that should reference libuv directly are the foundational ones, that is, TCP and UDP. Everything else should work, directly or indirectly, on top of these two to avoid code duplication.

When implementing Stream DNS, I would start with a tiny “library” (a couple of hundred lines of code) which would handle incoming data from sockets and assemble DNS messages. After a message is assembled, it would call a callback. This library can be implemented as a state machine and unit tested separately from the networking code (it does not have to depend on it). There is, obviously, no need to do a similar thing for outgoing data, as prepending the message size on writes is not complicated, given that our code has full control over the outgoing data.
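The message-assembling library could be as small as the following C sketch: a three-state machine (high length byte, low length byte, body) that accepts arbitrary chunks as they arrive from the socket and calls a callback once per complete message. All names are hypothetical. The sketch buffers at most one message (up to 65535 bytes, the limit of the 16-bit prefix) and treats a zero-length prefix as a protocol error; that policy is an assumption, not something decided in this proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Called once per assembled DNS message (length prefix stripped). */
typedef void (*dns_msg_cb_t)(const uint8_t *msg, size_t len, void *arg);

typedef enum {
	ASM_LEN_HI, /* waiting for the high-order byte of the length */
	ASM_LEN_LO, /* waiting for the low-order byte of the length */
	ASM_BODY    /* accumulating the message body */
} asm_state_t;

typedef struct dns_assembler {
	asm_state_t state;
	uint16_t expected;        /* length announced by the prefix */
	uint16_t have;            /* body bytes received so far */
	uint8_t body[UINT16_MAX]; /* largest message a 16-bit prefix allows */
	dns_msg_cb_t cb;
	void *cbarg;
} dns_assembler_t;

void
dns_assembler_init(dns_assembler_t *a, dns_msg_cb_t cb, void *cbarg) {
	assert(cb != NULL);
	a->state = ASM_LEN_HI;
	a->expected = 0;
	a->have = 0;
	a->cb = cb;
	a->cbarg = cbarg;
}

/*
 * Feed a chunk exactly as read from the socket. A chunk may split a
 * message (even its length prefix) anywhere or contain several messages.
 * Returns -1 on a zero-length prefix (the caller should then close the
 * connection), 0 otherwise.
 */
int
dns_assembler_feed(dns_assembler_t *a, const uint8_t *data, size_t len) {
	size_t pos = 0;
	while (pos < len) {
		switch (a->state) {
		case ASM_LEN_HI:
			a->expected = (uint16_t)(data[pos++] << 8);
			a->state = ASM_LEN_LO;
			break;
		case ASM_LEN_LO:
			a->expected |= data[pos++];
			if (a->expected == 0) {
				return -1;
			}
			a->have = 0;
			a->state = ASM_BODY;
			break;
		case ASM_BODY: {
			size_t want = (size_t)(a->expected - a->have);
			size_t avail = len - pos;
			size_t n = avail < want ? avail : want;
			memcpy(a->body + a->have, data + pos, n);
			a->have += (uint16_t)n;
			pos += n;
			if (a->have == a->expected) {
				a->cb(a->body, a->expected, a->cbarg);
				a->state = ASM_LEN_HI; /* next prefix */
			}
			break;
		}
		}
	}
	return 0;
}

/* Example callback: counts messages, remembers the last one's shape. */
typedef struct msg_stats {
	int count;
	size_t lastlen;
	uint8_t first;
} msg_stats_t;

void
count_messages(const uint8_t *msg, size_t len, void *arg) {
	msg_stats_t *st = arg;
	st->count++;
	st->lastlen = len;
	st->first = msg[0];
}
```

Because it depends on nothing but byte buffers, it can be unit tested with hand-crafted chunks (for example, a message split in the middle of its length prefix, or two messages arriving in one read) without any sockets involved.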

Once the library’s code is ready, Stream DNS can be implemented on top of it in a straightforward way. It will mostly consist of code that takes data from the underlying TCP or TLS stream socket (with the possibility of extending it to more streams, such as the PROXY protocol mentioned above) and passes it to the library. The code itself will be completely (or almost completely) transport-agnostic.

The existing unit tests for TCP DNS and TLS DNS could also be reused almost unchanged.

I expect this implementation to be much smaller than TLS DNS, TCP DNS and even the fairly compact TLS stream. A unified transport for stream DNS will put an end to the zoo of similar transports that we currently have in the codebase, each potentially with its own idiosyncrasies. It might also be a starting point for implementing PROXY protocol support in BIND, although this part needs more extensive research and, at this point, should be seen as a bonus, mentioned here mostly for illustrative purposes.

Edited Aug 01, 2022 by Suzanne Goldlust