Commit 50df71a8 authored by Ondřej Surý

Merge branch '2183-dns-flag-day-2020' into 'main'

Resolve "DNS Flag Day 2020"

Closes #2183

See merge request isc-projects/bind9!4179
parents d51f09a8 096d41b4
5516. [func] The default EDNS buffer size has been changed from 4096
to 1232, the EDNS buffer size probing has been removed
and ``named`` now sets the DON'T FRAGMENT flag on
outgoing UDP packets. [GL #2183]
5515. [func] Add 'rndc dnssec -rollover' command to trigger a
manual rollover for a specific key. [GL #1749]
......
......@@ -73,7 +73,7 @@
#define LOOKUP_LIMIT 64
#define DEFAULT_EDNS_VERSION 0
#define DEFAULT_EDNS_BUFSIZE 4096
#define DEFAULT_EDNS_BUFSIZE 1232
/*%
* Lookup_limit is just a limiter, keeping too many lookups from being
......
......@@ -59,7 +59,7 @@ options {\n\
# directory <none>\n\
dnssec-policy \"none\";\n\
dump-file \"named_dump.db\";\n\
edns-udp-size 4096;\n\
edns-udp-size 1232;\n\
# fake-iquery <obsolete>;\n"
#ifndef WIN32
" files unlimited;\n"
......@@ -83,11 +83,11 @@ options {\n\
match-mapped-addresses no;\n\
max-ixfr-ratio 100%;\n\
max-rsa-exponent-size 0; /* no limit */\n\
max-udp-size 4096;\n\
max-udp-size 1232;\n\
memstatistics-file \"named.memstats\";\n\
# multiple-cnames <obsolete>;\n\
# named-xfer <obsolete>;\n\
nocookie-udp-size 4096;\n\
nocookie-udp-size 1232;\n\
notify-rate 20;\n\
nta-lifetime 3600;\n\
nta-recheck 300;\n\
......
......@@ -958,8 +958,10 @@ if [ -x "$DIG" ] ; then
echo_i "check that dig +bufsize restores default bufsize ($n)"
ret=0
dig_with_opts @10.53.0.3 a.example +bufsize=0 +bufsize +qr > dig.out.test$n 2>&1 || ret=1
lines=`grep "EDNS:.* udp: 4096" dig.out.test$n | wc -l`
lines=`grep "EDNS:.* udp:" dig.out.test$n | wc -l`
lines1232=`grep "EDNS:.* udp: 1232" dig.out.test$n | wc -l`
test $lines -eq 2 || ret=1
test $lines1232 -eq 2 || ret=1
if [ $ret -ne 0 ]; then echo_i "failed"; fi
status=$((status+ret))
......
......@@ -199,9 +199,9 @@ if [ $ret != 0 ]; then echo_i "failed"; fi
status=`expr $status + $ret`
n=`expr $n + 1`
echo_i "checking recursive lookup to edns 512 server succeeds ($n)"
echo_i "checking recursive lookup to edns 512 server fails ($n)"
ret=0
resolution_succeeds edns512. || ret=1
resolution_fails edns512. || ret=1
if [ $ret != 0 ]; then echo_i "failed"; fi
status=`expr $status + $ret`
......
......@@ -638,7 +638,7 @@ sendquery(struct query *query, isc_task_t *task) {
unsigned char cookie[40];
if (query->udpsize == 0) {
query->udpsize = 4096;
query->udpsize = 1232;
}
if (query->edns < 0) {
query->edns = 0;
......
......@@ -1842,7 +1842,7 @@ Boolean Options
``nocookie-udp-size``
This sets the maximum size of UDP responses that are sent to queries
without a valid server COOKIE. A value below 128 is silently
raised to 128. The default value is 4096, but the ``max-udp-size``
raised to 128. The default value is 1232, but the ``max-udp-size``
option may further limit the response size.
``sit-secret``
......@@ -3399,7 +3399,7 @@ Tuning
the size of packets received from authoritative servers in response
to recursive queries. Valid values are 512 to 4096; values outside
this range are silently adjusted to the nearest value within it.
The default value is 4096.
The default value is 1232.
The usual reason for setting ``edns-udp-size`` to a non-default value
is to get UDP answers to pass through broken firewalls that block
......@@ -3407,26 +3407,22 @@ Tuning
512 bytes.
When ``named`` first queries a remote server, it advertises a UDP
buffer size of 512, as this has the greatest chance of success on the
first try.
If the initial query is successful with EDNS advertising a buffer
size of 512, then ``named`` switches to advertising a buffer size
of 4096 bytes (unless ``edns-udp-size`` is lower, in which case the
latter will be used).
Query timeouts observed for any given server affect the buffer
size advertised in queries sent to that server. Depending on
observed packet dropping patterns, the advertised buffer size is
lowered to 1432 bytes, 1232 bytes, 512 bytes, or the size of the
largest UDP response ever received from a given server, and then
clamped to the ``<512, edns-udp-size>`` range. Per-server EDNS
statistics are only retained in memory for the lifetime of a given
server's ADB entry.
(The values 1232 and 1432 are chosen to allow for an
IPv4-/IPv6-encapsulated UDP message to be sent without fragmentation at the
minimum MTU sizes for Ethernet and IPv6 networks.)
buffer size of 1232.
Query timeouts observed for any given server affect the buffer size
advertised in queries sent to that server. Depending on observed packet
dropping patterns, the query is retried over TCP. Per-server EDNS statistics
are only retained in memory for the lifetime of a given server's ADB entry.
``named`` now sets the DON'T FRAGMENT flag on outgoing UDP packets.
According to measurements done by multiple parties, this should not cause
any operational problems, as most of the Internet "core" is able to cope
with IP message sizes between 1400 and 1500 bytes. The 1232 size was picked
as a conservative minimal number that could be changed by the DNS operator to
an estimated path MTU minus the estimated header space. In practice, the
smallest MTU witnessed in the operational DNS community is 1500 octets, the
Ethernet maximum payload size, so a useful default for maximum DNS/UDP
payload size on **reliable** networks would be 1400.
Any server-specific ``edns-udp-size`` setting has precedence over all
the above rules.
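The 1232 and 1432 figures that appear in this hunk follow from simple header arithmetic: path MTU minus the estimated header space. A minimal sketch in C, assuming fixed-size headers with no IP options or extension headers; `max_dns_payload` and the enum names are hypothetical and are not part of BIND:

```c
#include <assert.h>

/* Header and MTU sizes in bytes. Names are illustrative only. */
enum {
	IPV4_HEADER = 20,    /* IPv4 header without options */
	IPV6_HEADER = 40,    /* fixed IPv6 header */
	UDP_HEADER = 8,
	IPV6_MIN_MTU = 1280, /* minimum link MTU required by IPv6 */
	ETHERNET_MTU = 1500  /* Ethernet maximum payload size */
};

/*
 * Hypothetical helper: the largest DNS/UDP payload that fits into a
 * single unfragmented packet for a given path MTU and header space.
 */
static unsigned int
max_dns_payload(unsigned int path_mtu, unsigned int header_space) {
	assert(path_mtu > header_space);
	return (path_mtu - header_space);
}
```

With these constants, `max_dns_payload(IPV6_MIN_MTU, IPV6_HEADER + UDP_HEADER)` gives 1232, and `max_dns_payload(ETHERNET_MTU, IPV4_HEADER + IPV6_HEADER + UDP_HEADER)` gives the 1432 used by the removed probing code.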
......@@ -3435,7 +3431,7 @@ Tuning
This sets the maximum EDNS UDP message size that ``named`` sends, in bytes.
Valid values are 512 to 4096; values outside this range are
silently adjusted to the nearest value within it. The default value
is 4096.
is 1232.
This value applies to responses sent by a server; to set the
advertised buffer size in queries, see ``edns-udp-size``.
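The documented rule that out-of-range values are silently adjusted to the nearest value within 512 to 4096 can be sketched as a simple clamp. This is an illustration only, not the code ``named`` uses; `clamp_udp_size` and the constant names are hypothetical:

```c
#include <assert.h>

/* Documented limits and new default, in bytes. Names are illustrative. */
enum {
	EDNS_UDP_MIN = 512,
	EDNS_UDP_MAX = 4096,
	EDNS_UDP_DEFAULT = 1232
};

/*
 * Hypothetical helper: adjust a configured value to the nearest value
 * inside the documented 512..4096 range.
 */
static unsigned int
clamp_udp_size(unsigned int configured) {
	unsigned int size = configured;

	if (size < EDNS_UDP_MIN) {
		size = EDNS_UDP_MIN;
	} else if (size > EDNS_UDP_MAX) {
		size = EDNS_UDP_MAX;
	}
	assert(size >= EDNS_UDP_MIN && size <= EDNS_UDP_MAX);
	return (size);
}
```

Under this sketch a configured value of 100 becomes 512, 65535 becomes 4096, and the new default of 1232 passes through unchanged.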
......
......@@ -24,8 +24,6 @@ Known Issues
New Features
~~~~~~~~~~~~
- None.
- Add a new ``rndc`` command, ``rndc dnssec -rollover``, which triggers
a manual rollover for a specific key. [GL #1749]
......@@ -42,7 +40,17 @@ Removed Features
Feature Changes
~~~~~~~~~~~~~~~
- None.
- [DNS Flag Day 2020]: The default EDNS buffer size has been changed from 4096
to 1232, the EDNS buffer size probing has been removed, and ``named`` now sets
the DON'T FRAGMENT flag on outgoing UDP packets. According to measurements
done by multiple parties, this should not cause any operational problems, as
most of the Internet "core" is able to cope with IP message sizes between 1400
and 1500 bytes. The 1232 size was picked as a conservative minimal number that
could be changed by the DNS operator to an estimated path MTU minus the
estimated header space. In practice, the smallest MTU witnessed in the
operational DNS community is 1500 octets, the Ethernet maximum payload size,
so a useful default for maximum DNS/UDP payload size on reliable networks
would be 1400. [GL #2183]
Bug Fixes
~~~~~~~~~
......
......@@ -249,20 +249,13 @@ struct dns_adbentry {
unsigned char plain;
unsigned char plainto;
unsigned char edns;
unsigned char to4096; /* Our max. */
unsigned char ednsto;
uint8_t mode;
atomic_uint_fast32_t quota;
atomic_uint_fast32_t active;
double atr;
/*
* Allow for encapsulated IPv4/IPv6 UDP packet over ethernet.
* Ethernet 1500 - IP(20) - IP6(40) - UDP(8) = 1432.
*/
unsigned char to1432; /* Ethernet */
unsigned char to1232; /* IPv6 nofrag */
unsigned char to512; /* plain DNS */
isc_sockaddr_t sockaddr;
unsigned char *cookie;
uint16_t cookielen;
......@@ -1893,14 +1886,11 @@ new_adbentry(dns_adb_t *adb) {
e->flags = 0;
e->udpsize = 0;
e->edns = 0;
e->ednsto = 0;
e->completed = 0;
e->timeouts = 0;
e->plain = 0;
e->plainto = 0;
e->to4096 = 0;
e->to1432 = 0;
e->to1232 = 0;
e->to512 = 0;
e->cookie = NULL;
e->cookielen = 0;
e->srtt = (isc_random_uniform(0x1f)) + 1;
......@@ -3529,8 +3519,7 @@ dump_adb(dns_adb_t *adb, FILE *f, bool debug, isc_stdtime_t now) {
dns_adbentry_t *entry;
fprintf(f, ";\n; Address database dump\n;\n");
fprintf(f, "; [edns success/4096 timeout/1432 timeout/1232 timeout/"
"512 timeout]\n");
fprintf(f, "; [edns success/timeout]\n");
fprintf(f, "; [plain success/timeout]\n;\n");
if (debug) {
LOCK(&adb->reflock);
......@@ -3656,11 +3645,10 @@ dump_entry(FILE *f, dns_adb_t *adb, dns_adbentry_t *entry, bool debug,
}
fprintf(f,
";\t%s [srtt %u] [flags %08x] [edns %u/%u/%u/%u/%u] "
";\t%s [srtt %u] [flags %08x] [edns %u/%u] "
"[plain %u/%u]",
addrbuf, entry->srtt, entry->flags, entry->edns, entry->to4096,
entry->to1432, entry->to1232, entry->to512, entry->plain,
entry->plainto);
addrbuf, entry->srtt, entry->flags, entry->edns, entry->ednsto,
entry->plain, entry->plainto);
if (entry->udpsize != 0U) {
fprintf(f, " [udpsize %u]", entry->udpsize);
}
......@@ -4437,41 +4425,6 @@ maybe_adjust_quota(dns_adb_t *adb, dns_adbaddrinfo_t *addr, bool timeout) {
}
#define EDNSTOS 3U
bool
dns_adb_noedns(dns_adb_t *adb, dns_adbaddrinfo_t *addr) {
int bucket;
bool noedns = false;
REQUIRE(DNS_ADB_VALID(adb));
REQUIRE(DNS_ADBADDRINFO_VALID(addr));
bucket = addr->entry->lock_bucket;
LOCK(&adb->entrylocks[bucket]);
if (addr->entry->edns == 0U &&
(addr->entry->plain > EDNSTOS || addr->entry->to4096 > EDNSTOS))
{
if (((addr->entry->plain + addr->entry->to4096) & 0x3f) != 0) {
noedns = true;
} else {
/*
* Increment plain so we don't get stuck.
*/
addr->entry->plain++;
if (addr->entry->plain == 0xff) {
addr->entry->edns >>= 1;
addr->entry->to4096 >>= 1;
addr->entry->to1432 >>= 1;
addr->entry->to1232 >>= 1;
addr->entry->to512 >>= 1;
addr->entry->plain >>= 1;
addr->entry->plainto >>= 1;
}
}
}
UNLOCK(&adb->entrylocks[bucket]);
return (noedns);
}
void
dns_adb_plainresponse(dns_adb_t *adb, dns_adbaddrinfo_t *addr) {
......@@ -4488,10 +4441,7 @@ dns_adb_plainresponse(dns_adb_t *adb, dns_adbaddrinfo_t *addr) {
addr->entry->plain++;
if (addr->entry->plain == 0xff) {
addr->entry->edns >>= 1;
addr->entry->to4096 >>= 1;
addr->entry->to1432 >>= 1;
addr->entry->to1232 >>= 1;
addr->entry->to512 >>= 1;
addr->entry->ednsto >>= 1;
addr->entry->plain >>= 1;
addr->entry->plainto >>= 1;
}
......@@ -4510,25 +4460,10 @@ dns_adb_timeout(dns_adb_t *adb, dns_adbaddrinfo_t *addr) {
maybe_adjust_quota(adb, addr, true);
/*
* If we have not had a successful query then clear all
* edns timeout information.
*/
if (addr->entry->edns == 0 && addr->entry->plain == 0) {
addr->entry->to512 = 0;
addr->entry->to1232 = 0;
addr->entry->to1432 = 0;
addr->entry->to4096 = 0;
} else {
addr->entry->to512 >>= 1;
addr->entry->to1232 >>= 1;
addr->entry->to1432 >>= 1;
addr->entry->to4096 >>= 1;
}
addr->entry->plainto++;
if (addr->entry->plainto == 0xff) {
addr->entry->edns >>= 1;
addr->entry->ednsto >>= 1;
addr->entry->plain >>= 1;
addr->entry->plainto >>= 1;
}
......@@ -4536,7 +4471,7 @@ dns_adb_timeout(dns_adb_t *adb, dns_adbaddrinfo_t *addr) {
}
void
dns_adb_ednsto(dns_adb_t *adb, dns_adbaddrinfo_t *addr, unsigned int size) {
dns_adb_ednsto(dns_adb_t *adb, dns_adbaddrinfo_t *addr) {
int bucket;
REQUIRE(DNS_ADB_VALID(adb));
......@@ -4547,36 +4482,10 @@ dns_adb_ednsto(dns_adb_t *adb, dns_adbaddrinfo_t *addr, unsigned int size) {
maybe_adjust_quota(adb, addr, true);
if (size <= 512U) {
if (addr->entry->to512 <= EDNSTOS) {
addr->entry->to512++;
addr->entry->to1232++;
addr->entry->to1432++;
addr->entry->to4096++;
}
} else if (size <= 1232U) {
if (addr->entry->to1232 <= EDNSTOS) {
addr->entry->to1232++;
addr->entry->to1432++;
addr->entry->to4096++;
}
} else if (size <= 1432U) {
if (addr->entry->to1432 <= EDNSTOS) {
addr->entry->to1432++;
addr->entry->to4096++;
}
} else {
if (addr->entry->to4096 <= EDNSTOS) {
addr->entry->to4096++;
}
}
if (addr->entry->to4096 == 0xff) {
addr->entry->ednsto++;
if (addr->entry->ednsto == 0xff) {
addr->entry->edns >>= 1;
addr->entry->to4096 >>= 1;
addr->entry->to1432 >>= 1;
addr->entry->to1232 >>= 1;
addr->entry->to512 >>= 1;
addr->entry->ednsto >>= 1;
addr->entry->plain >>= 1;
addr->entry->plainto >>= 1;
}
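After this change the per-size timeout counters collapse into a single `ednsto` counter, and the same aging step is repeated in each function: when the counter that was just incremented reaches 0xff, every counter is halved so the success/timeout ratios survive without overflowing an `unsigned char`. A standalone sketch of that pattern, with locking and quota handling omitted; `struct edns_counters`, `age_counters` and `record_edns_timeout` are hypothetical names, not BIND API:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the four counters kept in dns_adbentry. */
struct edns_counters {
	unsigned char edns;    /* EDNS successes */
	unsigned char ednsto;  /* EDNS timeouts */
	unsigned char plain;   /* plain DNS successes */
	unsigned char plainto; /* plain DNS timeouts */
};

/* Halve every counter, keeping their relative proportions. */
static void
age_counters(struct edns_counters *c) {
	c->edns >>= 1;
	c->ednsto >>= 1;
	c->plain >>= 1;
	c->plainto >>= 1;
}

/* Record one EDNS timeout; age all counters once 0xff is reached. */
static void
record_edns_timeout(struct edns_counters *c) {
	assert(c != NULL);
	c->ednsto++;
	if (c->ednsto == 0xff) {
		age_counters(c);
	}
}
```

Because the increment is checked immediately, no counter is ever incremented past 0xff in this sketch, so the `unsigned char` fields never wrap around.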
......@@ -4604,10 +4513,7 @@ dns_adb_setudpsize(dns_adb_t *adb, dns_adbaddrinfo_t *addr, unsigned int size) {
addr->entry->edns++;
if (addr->entry->edns == 0xff) {
addr->entry->edns >>= 1;
addr->entry->to4096 >>= 1;
addr->entry->to1432 >>= 1;
addr->entry->to1232 >>= 1;
addr->entry->to512 >>= 1;
addr->entry->ednsto >>= 1;
addr->entry->plain >>= 1;
addr->entry->plainto >>= 1;
}
......@@ -4630,38 +4536,6 @@ dns_adb_getudpsize(dns_adb_t *adb, dns_adbaddrinfo_t *addr) {
return (size);
}
unsigned int
dns_adb_probesize(dns_adb_t *adb, dns_adbaddrinfo_t *addr, int lookups) {
int bucket;
unsigned int size;
REQUIRE(DNS_ADB_VALID(adb));
REQUIRE(DNS_ADBADDRINFO_VALID(addr));
bucket = addr->entry->lock_bucket;
LOCK(&adb->entrylocks[bucket]);
if (addr->entry->to1232 > EDNSTOS || lookups >= 2) {
size = 512;
} else if (addr->entry->to1432 > EDNSTOS || lookups >= 1) {
size = 1232;
} else if (addr->entry->to4096 > EDNSTOS) {
size = 1432;
} else {
size = 4096;
}
/*
* Don't shrink probe size below what we have seen due to multiple
* lookups.
*/
if (lookups > 0 && size < addr->entry->udpsize &&
addr->entry->udpsize < 4096) {
size = addr->entry->udpsize;
}
UNLOCK(&adb->entrylocks[bucket]);
return (size);
}
void
dns_adb_setcookie(dns_adb_t *adb, dns_adbaddrinfo_t *addr,
const unsigned char *cookie, size_t len) {
......
......@@ -606,19 +606,6 @@ dns_adb_getudpsize(dns_adb_t *adb, dns_adbaddrinfo_t *addr);
*\li addr be valid.
*/
unsigned int
dns_adb_probesize(dns_adb_t *adb, dns_adbaddrinfo_t *addr, int lookups);
/*%
* Return suggested EDNS UDP size based on observed responses / failures.
* 'lookups' is the number of times the current lookup has been attempted.
*
* Requires:
*
*\li adb be valid.
*
*\li addr be valid.
*/
void
dns_adb_plainresponse(dns_adb_t *adb, dns_adbaddrinfo_t *addr);
/*%
......@@ -644,22 +631,9 @@ dns_adb_timeout(dns_adb_t *adb, dns_adbaddrinfo_t *addr);
*/
void
dns_adb_ednsto(dns_adb_t *adb, dns_adbaddrinfo_t *addr, unsigned int size);
/*%
* Record a failed EDNS UDP response and the advertised EDNS UDP buffer size
* used.
*
* Requires:
*
*\li adb be valid.
*
*\li addr be valid.
*/
bool
dns_adb_noedns(dns_adb_t *adb, dns_adbaddrinfo_t *addr);
dns_adb_ednsto(dns_adb_t *adb, dns_adbaddrinfo_t *addr);
/*%
* Return whether EDNS should be disabled for this server.
* Record that an EDNS UDP query failed.
*
* Requires:
*
......
......@@ -94,13 +94,11 @@ typedef enum { dns_quotatype_zone = 0, dns_quotatype_server } dns_quotatype_t;
#define DNS_FETCHOPT_NOEDNS0 0x00000008 /*%< Do not use EDNS. */
#define DNS_FETCHOPT_FORWARDONLY 0x00000010 /*%< Only use forwarders. */
#define DNS_FETCHOPT_NOVALIDATE 0x00000020 /*%< Disable validation. */
#define DNS_FETCHOPT_EDNS512 \
0x00000040 /*%< Advertise a 512 byte \
* UDP buffer. */
#define DNS_FETCHOPT_WANTNSID 0x00000080 /*%< Request NSID */
#define DNS_FETCHOPT_PREFETCH 0x00000100 /*%< Do prefetch */
#define DNS_FETCHOPT_NOCDFLAG 0x00000200 /*%< Don't set CD flag. */
#define DNS_FETCHOPT_NONTA 0x00000400 /*%< Ignore NTA table. */
#define DNS_FETCHOPT_OBSOLETE1 0x00000040 /*%< Obsolete */
#define DNS_FETCHOPT_WANTNSID 0x00000080 /*%< Request NSID */
#define DNS_FETCHOPT_PREFETCH 0x00000100 /*%< Do prefetch */
#define DNS_FETCHOPT_NOCDFLAG 0x00000200 /*%< Don't set CD flag. */
#define DNS_FETCHOPT_NONTA 0x00000400 /*%< Ignore NTA table. */
/* RESERVED ECS 0x00000000 */
/* RESERVED ECS 0x00001000 */
/* RESERVED ECS 0x00002000 */
......
......@@ -203,6 +203,11 @@
*/
#define RECV_BUFFER_SIZE 4096 /* XXXRTH Constant. */
/*%
* Default EDNS0 buffer size
*/
#define DEFAULT_EDNS_BUFSIZE 1232
/*%
* This defines the maximum number of timeouts we will permit before we
* disable EDNS0 on the query.
......@@ -316,7 +321,6 @@ struct fetchctx {
dns_fwdpolicy_t fwdpolicy;
isc_sockaddrlist_t bad;
ISC_LIST(struct tried) edns;
ISC_LIST(struct tried) edns512;
isc_sockaddrlist_t bad_edns;
dns_validator_t *validator;
ISC_LIST(dns_validator_t) validators;
......@@ -1215,7 +1219,7 @@ update_edns_stats(resquery_t *query) {
}
if ((query->options & DNS_FETCHOPT_NOEDNS0) == 0) {
dns_adb_ednsto(fctx->adb, query->addrinfo, query->udpsize);
dns_adb_ednsto(fctx->adb, query->addrinfo);
} else {
dns_adb_timeout(fctx->adb, query->addrinfo);
}
......@@ -2321,38 +2325,6 @@ add_triededns(fetchctx_t *fctx, isc_sockaddr_t *address) {
ISC_LIST_INITANDAPPEND(fctx->edns, tried, link);
}
static struct tried *
triededns512(fetchctx_t *fctx, isc_sockaddr_t *address) {
struct tried *tried;
for (tried = ISC_LIST_HEAD(fctx->edns512); tried != NULL;
tried = ISC_LIST_NEXT(tried, link))
{
if (isc_sockaddr_equal(&tried->addr, address)) {
return (tried);
}
}
return (NULL);
}
static void
add_triededns512(fetchctx_t *fctx, isc_sockaddr_t *address) {
struct tried *tried;
tried = triededns512(fctx, address);
if (tried != NULL) {
tried->count++;
return;
}
tried = isc_mem_get(fctx->mctx, sizeof(*tried));
tried->addr = *address;
tried->count = 1;
ISC_LIST_INITANDAPPEND(fctx->edns512, tried, link);
}
static inline size_t
addr2buf(void *buf, const size_t bufsize, const isc_sockaddr_t *sockaddr) {
isc_netaddr_t netaddr;
......@@ -2607,15 +2579,14 @@ resquery_send(resquery_t *query) {
* response size we have seen from this server so far.
*
* If this server has already timed out twice or more in this
* fetch context, force setting the advertised UDP buffer size
* to 512 bytes.
* fetch context, force TCP.
*/
if ((tried = triededns(fctx, sockaddr)) != NULL) {
if (tried->count == 1U) {
hint = dns_adb_getudpsize(fctx->adb,
query->addrinfo);
} else if (tried->count >= 2U) {
query->options |= DNS_FETCHOPT_EDNS512;
query->options |= DNS_FETCHOPT_TCP;
}
}
}
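The retry policy this hunk introduces can be summarized as: advertise the configured size on the first attempt, fall back to the largest response size seen from the server after one timeout, and switch to TCP after two or more. A simplified, standalone sketch follows; `plan_query` and its types are hypothetical, and the assumption that the hint is applied only when it is nonzero is inferred from the partially visible code, not confirmed by it:

```c
#include <assert.h>

enum transport { TRANSPORT_UDP, TRANSPORT_TCP };

struct query_plan {
	enum transport transport;
	unsigned int udpsize; /* advertised EDNS UDP buffer size */
};

/*
 * Hypothetical helper mirroring the decision in resquery_send():
 *   edns_timeouts   - EDNS tries to this server in this fetch context
 *   configured_size - global 'edns-udp-size' (1232 by default)
 *   largest_seen    - largest UDP response seen so far, 0 if none
 */
static struct query_plan
plan_query(unsigned int edns_timeouts, unsigned int configured_size,
	   unsigned int largest_seen) {
	struct query_plan plan = { TRANSPORT_UDP, configured_size };

	assert(configured_size >= 512);
	if (edns_timeouts == 1 && largest_seen != 0) {
		/* First timeout: assumed to use the recorded size as hint. */
		plan.udpsize = largest_seen;
	} else if (edns_timeouts >= 2) {
		/* Two or more timeouts: force TCP instead of shrinking. */
		plan.transport = TRANSPORT_TCP;
	}
	return (plan);
}
```

The sketch leaves out the server-specific ``edns-udp-size`` override, which the documentation says takes precedence over these rules.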
......@@ -2637,23 +2608,10 @@ resquery_send(resquery_t *query) {
uint16_t padding = 0;
/*
* If we ever received an EDNS response from this
* server, initialize 'udpsize' with a value between
* 512 and 4096, based on any potential EDNS timeouts
* observed for this particular server in the past and
* the total number of query timeouts observed for this
* fetch context so far. Clamp 'udpsize' to the global
* 'edns-udp-size' value (if unset, the latter defaults
* to 4096 bytes).
* Set the default UDP size to what was configured as
* 'edns-udp-size'
*/
if ((flags & FCTX_ADDRINFO_EDNSOK) != 0) {
udpsize = dns_adb_probesize(fctx->adb,
query->addrinfo,
fctx->timeouts);
if (udpsize > res->udpsize) {
udpsize = res->udpsize;
}
}
udpsize = res->udpsize;
/*
* This server timed out for the first time in this
......@@ -2667,17 +2625,6 @@ resquery_send(resquery_t *query) {
udpsize = hint;
}
/*
* If we have not received any responses from this
* server before or if this server has already timed
* out twice or more in this fetch context, use an EDNS
* UDP buffer size of 512 bytes.
*/
if (udpsize == 0U ||
(query->options & DNS_FETCHOPT_EDNS512) != 0) {
udpsize = 512;
}
/*
* If a fixed EDNS UDP buffer size is configured for
* this server, make sure we obey that.
......@@ -2805,13 +2752,7 @@ resquery_send(resquery_t *query) {
goto cleanup_message;
}
if (udpsize > 512U) {
add_triededns(fctx, &query->addrinfo->sockaddr);
}
if (udpsize == 512U) {
add_triededns512(fctx, &query->addrinfo->sockaddr);
}
add_triededns(fctx, &query->addrinfo->sockaddr);
/*
* Clear CD if EDNS is not in use.
......@@ -3101,17 +3042,10 @@ resquery_connected(isc_task_t *task, isc_event_t *event) {
isc_socket_detach(&query->tcpsocket);
/*
* Do not query this server again in this fetch context
* if we already tried reducing the advertised EDNS UDP
* payload size to 512 bytes and the server is
* unavailable over TCP. This prevents query loops
* lasting until the fetch context restart limit is
* reached when attempting to get answers whose size
* exceeds 512 bytes from broken servers.
* if the server is unavailable over TCP.
*/