QNAME minimization may trigger "premature" priming queries
I do not believe this is a serious issue, maybe it can even be closed
with -EWORKSASDESIGNED, but perhaps there is room for improvement here.
I mostly wanted to document what is happening because figuring it out
required me to run rr for 10 days straight :-)
With QNAME minimization enabled, a named resolver might send priming
queries before the last cached ./NS response expires. The "root cause"
here is that the glue TTL for the root-servers.net. zone differs between
the root servers (3600000) and the servers authoritative for net.
(172800). This can be demonstrated using the following dig invocations:
$ dig @a.root-servers.net root-servers.net. NS +norec
$ dig @a.gtld-servers.net root-servers.net. NS +norec +multi
Since the root-servers.net. zone is unsigned, the RFC 2181 trust level
of the RRsets found in the ADDITIONAL section of the referrals returned
by the net. authoritative servers is higher than that of the RRsets
returned in the ADDITIONAL section of unsigned authoritative responses
generated by the root servers (which are authoritative for
root-servers.net.). This results in the 172800 TTL for
[a-m].root-servers.net/A(AAA) overriding the higher one received from
the root servers.
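The override described above can be illustrated with a toy cache. This is only a sketch under stated assumptions, not named's actual cache code: the `Trust` enum, `CachedRRset`, and `store()` are hypothetical names, only the two trust levels relevant here are modeled, and TTL capping (e.g. by max-cache-ttl) is ignored.

```python
from dataclasses import dataclass
from enum import IntEnum


class Trust(IntEnum):
    # Hypothetical two-level ranking; a real resolver tracks more levels.
    ADDITIONAL = 1  # ADDITIONAL section of an authoritative response
    GLUE = 2        # ADDITIONAL section of a referral


@dataclass
class CachedRRset:
    ttl: int
    trust: Trust


def store(cache, key, ttl, trust):
    """Cache the RRset if nothing is cached yet or the new data is more trusted.

    Equal-trust handling is not modeled: the existing entry is simply kept.
    """
    current = cache.get(key)
    if current is None or trust > current.trust:
        cache[key] = CachedRRset(ttl, trust)
    return cache[key]


cache = {}
# Priming response from a root server (authoritative for root-servers.net.).
store(cache, "a.root-servers.net/A", 3600000, Trust.ADDITIONAL)
# Referral from a net. server: higher trust, so its lower TTL replaces the entry.
store(cache, "a.root-servers.net/A", 172800, Trust.GLUE)
print(cache["a.root-servers.net/A"].ttl)
```

In this model a later priming response cannot restore the longer TTL, because its data ranks below the cached glue.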
To reproduce the issue:

1. Start a named resolver with QNAME minimization enabled and preferably
   with max-stale-ttl 0; set, so that the DB dump from the next step is
   easier to interpret.

2. Do an rndc dumpdb -cache, then run:

       grep -F root-servers.net named_dump.db

   The result should contain lines like:

       a.root-servers.net. 518399 A 198.41.0.4
       b.root-servers.net. 518399 A 199.9.14.201
       c.root-servers.net. 518399 A 192.33.4.12
       d.root-servers.net. 518399 A 199.7.91.13
       e.root-servers.net. 518399 A 192.203.230.10
       f.root-servers.net. 518399 A 192.5.5.241
       g.root-servers.net. 518399 A 192.112.36.4
       h.root-servers.net. 518399 A 198.97.190.53
       i.root-servers.net. 518399 A 192.36.148.17
       j.root-servers.net. 518399 A 192.58.128.30
       k.root-servers.net. 518399 A 193.0.14.129
       l.root-servers.net. 518399 A 199.7.83.42
       m.root-servers.net. 518399 A 202.12.27.33

   These are cache entries created by the resolver priming query sent
   during startup. The TTL is about 1 week, which is consistent with
   max-cache-ttl.

3. Query the resolver for root-servers.net/NS. This will cause the
   resolver to first ask the root servers about _.net., which will allow
   it to learn about the servers authoritative for net. These will
   subsequently be queried for _.root-servers.net., triggering a
   referral with TTL=172800.

4. Repeat step 2. The relevant lines should now turn into something
   like:

       a.root-servers.net. 172799 A 198.41.0.4
       b.root-servers.net. 172799 A 199.9.14.201
       c.root-servers.net. 172799 A 192.33.4.12
       d.root-servers.net. 172799 A 199.7.91.13
       e.root-servers.net. 172799 A 192.203.230.10
       f.root-servers.net. 172799 A 192.5.5.241
       g.root-servers.net. 172799 A 192.112.36.4
       h.root-servers.net. 172799 A 198.97.190.53
       i.root-servers.net. 172799 A 192.36.148.17
       j.root-servers.net. 172799 A 192.58.128.30
       k.root-servers.net. 172799 A 193.0.14.129
       l.root-servers.net. 172799 A 199.7.83.42
       m.root-servers.net. 172799 A 202.12.27.33

   With the TTL lowered, this resolver will now send the next priming
   query in about 2 days instead of 6 days.
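Comparing the two dumps by eye gets tedious, so here is a small sketch that pulls the TTLs out of the grep output. It assumes every relevant line has exactly the four-field shape shown above (owner, TTL, type, address); `root_server_ttls()` is a hypothetical helper, not part of any BIND tooling.

```python
def root_server_ttls(dump_text):
    """Map each *.root-servers.net. owner name to the TTL of its A record."""
    ttls = {}
    for line in dump_text.splitlines():
        fields = line.split()
        # Expected shape (an assumption): <owner> <ttl> A <address>
        if (
            len(fields) == 4
            and fields[2] == "A"
            and fields[0].endswith(".root-servers.net.")
        ):
            ttls[fields[0]] = int(fields[1])
    return ttls


# Two of the lines from the second dump above.
sample = """\
a.root-servers.net. 172799 A 198.41.0.4
b.root-servers.net. 172799 A 199.9.14.201
"""
ttls = root_server_ttls(sample)
lowest = min(ttls.values())
print(f"lowest TTL: {lowest} s (~{lowest / 86400:.1f} days)")
```

To check a real dump, pass the contents of named_dump.db to `root_server_ttls()` instead of `sample`.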
This scenario does not happen without QNAME minimization, in which case
the resolver would start processing the root-servers.net/NS query by
immediately querying the root servers - and since the root servers are
authoritative for root-servers.net., the servers authoritative for net.
would not get a chance to send any TTL=172800 referrals.
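The difference between the two modes can be sketched as a list of (zone whose servers are asked, QNAME sent) pairs. This is a simplified illustration, not named's resolver logic: `query_plan()` is a hypothetical helper, it assumes resolution starts at the root with no cached delegations, and it assumes a zone cut at every label, which happens to match the names in the reproduction steps.

```python
def query_plan(qname, minimize):
    """Return (zone whose servers are asked, QNAME sent) pairs, starting at the root."""
    if not minimize:
        # The full name goes straight to the root servers.
        return [(".", qname)]
    labels = qname.rstrip(".").split(".")
    plan = []
    zone = "."
    # Reveal one more label per step, using the "_." prefix seen in the steps above.
    for i in range(len(labels) - 1, -1, -1):
        suffix = ".".join(labels[i:]) + "."
        plan.append((zone, "_." + suffix))
        zone = suffix  # assumption: each step yields a delegation to this zone
    return plan


print(query_plan("root-servers.net.", minimize=True))
print(query_plan("root-servers.net.", minimize=False))
```

Only the minimized plan contains a query sent to the net. servers, which is where the TTL=172800 referral comes from.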
I found this purely because I was curious why a named resolver whose
cache size limit never gets exceeded would sometimes send a priming
query earlier than 6 days after the last priming query. This "problem"
with the interaction of QNAME minimization with root servers is not
nearly as grave as the other one described in #1896.
I think triggering this requires a client query for something in the
root-servers.net. zone. The issue manifests itself two days later :-)