Is a BIND bug responsible for a sharp rise in DNSSEC traffic to the root servers?
[managed-keys.bind.init](/uploads/417b78c0b0f9f695c9181c91fa454054/managed-keys.bind.init) [managed-keys.bind.after](/uploads/dc537462929f7df3d03d97bc313b6b22/managed-keys.bind.after)

A mysterious, sharp increase in DNSSEC-related traffic to the root servers has prompted an urgent investigation. Ray knows the details of the problem. Reporter will probably announce this soon as a BIND bug.

-- Begin forwarded message:

> Thanks to the folks at reporter for sending me lab instructions, I've
> narrowed down their bind config to the following that I think
> reproduces the problem.
>
>     dnssec-enable no;
>     //dnssec-validation yes;
>
> This in combination with a managed-keys file that has the old key and
> can't be updated with the revoked key (due to the above) causes DNSKEY
> queries to be fired off for every incoming request, from what I can
> tell. Likely in part because DNSSEC validation in the above is
> probably set to auto by default? The bind manual states "Note
> dnssec-enable needs to be set to yes (default value is yes) in order
> for dnssec-validation to be effective."
>
> Could someone else please verify they can reproduce this? I can under
> bind 9.11.4-P2-RedHat-9.11.4-13.P2.fc29 on fedora on two different
> machines. I tried to get Robert to reproduce it under
> 9.11.4-P2-RedHat-9.11.4-3.P2.fc27 but he couldn't.
>
> Note also, I'm getting SERVFAILs for requests sent through it as well,
> though at some point.
>
> This also produces lines like this in my log:
>
>     validating ./DNSKEY: unable to find a DNSKEY which verifies the DNSKEY RRset and also matches a trusted key for '.'
>     no valid KEY resolving './DNSKEY/IN': 18.104.22.168#53
>     broken trust chain resolving 'icann.org/DS/IN': 22.214.171.124#53
>     broken trust chain resolving 'icann.org/DNSKEY/IN': 126.96.36.199#53
>     broken trust chain resolving 'meetings.icann.org/A/IN': 188.8.131.52#53
>     validating org/DS: bad cache hit (./DNSKEY)
>
> (which explains the SERVFAILs)
>
> My initial /var/named/dynamic/managed-keys.bind file that I believe is
> causing the issue is attached. Note that it does get replaced with the
> second file I'm attaching, showing the revoked key and no other source
> of trust.
>
> Also note that at least twice out of ~6, it randomly didn't have a
> problem. Stopping bind, resetting the file to the init state, and
> restarting caused it to start failing again.
>
> Steps concretely:
>
> 1. systemctl stop named
> 2. sudo cp /var/named/dynamic/managed-keys.bind.init /var/named/dynamic/managed-keys.bind
> 3. systemctl start named
> 4. sudo tcpdump -i foo -n port 53 | grep -i dnskey
> 5. dig @localhost meetings.icann.org
> 6. dig @localhost example.com
> 7. dig @localhost foo.com
>
> The problem is that it seems very race-conditiony. Sometimes I can hit
> it, and sometimes it doesn't. About a 25-50% chance of it getting into
> whatever error state it's in. And sometimes it stops happening after
> a while as well.
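
Since the failure is intermittent, it may help to script the steps from the forwarded message so that trials can be repeated and compared. The following is only a rough sketch, not a verified reproducer. The file path, the `foo` interface placeholder and the three query names come from the report. The function names, the 15-second capture window and the 2-second startup delay are assumptions made for illustration. It also assumes tcpdump prints its usual one-line DNS summary, in which a root DNSKEY query appears as `DNSKEY? . `.

```shell
#!/bin/sh
# Sketch only. Run as root. Nothing here runs until a function is called.

KEYS=/var/named/dynamic/managed-keys.bind   # path from the report
IFACE=${IFACE:-foo}                         # placeholder interface name, as in the report

# Count DNSKEY queries for the root zone in tcpdump text read from stdin.
# Matches the literal string "DNSKEY? . ", so "DNSKEY? org." is not counted.
count_root_dnskey() {
    grep -cF 'DNSKEY? . ' || true   # grep exits 1 on zero matches but still prints 0
}

# Steps 1-3: restore the initial managed-keys file and restart named.
reset_named() {
    systemctl stop named &&
    cp "$KEYS.init" "$KEYS" &&
    systemctl start named
}

# Steps 4-7: capture port 53 for a fixed window (15 s is an arbitrary choice)
# while sending the three test queries, then print the root DNSKEY query count.
run_trial() {
    capture=$(mktemp) || return 1
    timeout 15 tcpdump -i "$IFACE" -n -l port 53 >"$capture" 2>/dev/null &
    tcpdump_pid=$!
    sleep 2                         # give tcpdump time to start capturing
    for name in meetings.icann.org example.com foo.com; do
        dig @localhost "$name" >/dev/null
    done
    wait "$tcpdump_pid"
    count_root_dnskey <"$capture"
    rm -f "$capture"
}
```

After sourcing the file, one trial would be `reset_named && run_trial`. Given the reported 25-50% hit rate, several trials are probably needed before concluding that a given build is unaffected. A count well above the handful of DNSKEY lookups a resolver normally makes after a restart would suggest the error state was reached.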