Is a BIND bug responsible for a sharp rise in DNSSEC traffic to the root servers?
[managed-keys.bind.init](/uploads/417b78c0b0f9f695c9181c91fa454054/managed-keys.bind.init)
[managed-keys.bind.after](/uploads/dc537462929f7df3d03d97bc313b6b22/managed-keys.bind.after)

A mysterious sharp increase in DNSSEC-related traffic to the root servers has launched an urgent investigation. Ray knows the details of the problem. Reporter will probably announce this soon as a BIND bug.

--

Begin forwarded message:

> Thanks to the folks at reporter for sending me lab instructions, I've narrowed down their bind config to the following, which I think reproduces the problem.
>
>     dnssec-enable no;
>     //dnssec-validation yes;
>
> This, in combination with a managed-keys file that has the old key and can't be updated with the revoked key (due to the above), causes DNSKEY queries to be fired off for every incoming request, from what I can tell. Likely in part because DNSSEC validation in the above is probably set to auto by default? The bind manual states "Note dnssec-enable needs to be set to yes (default value is yes) in order for dnssec-validation to be effective."
>
> Could someone else please verify they can reproduce this? I can under bind 9.11.4-P2-RedHat-9.11.4-13.P2.fc29 on fedora on two different machines. I tried to get Robert to reproduce it under 9.11.4-P2-RedHat-9.11.4-3.P2.fc27 but he couldn't.
>
> Note also, I'm getting SERVFAILs for requests sent through it as well, though at some point.
>
> This also produces lines like this in my log:
>
>     validating ./DNSKEY: unable to find a DNSKEY which verifies the DNSKEY RRset and also matches a trusted key for '.'
>     no valid KEY resolving './DNSKEY/IN': 198.97.190.53#53
>     broken trust chain resolving 'icann.org/DS/IN': 199.19.56.1#53
>     broken trust chain resolving 'icann.org/DNSKEY/IN': 199.43.134.53#53
>     broken trust chain resolving 'meetings.icann.org/A/IN': 199.43.134.53#53
>     validating org/DS: bad cache hit (./DNSKEY)
>
> (which explains the SERVFAILs)
>
> My initial /var/named/dynamic/managed-keys.bind file that I believe is causing the issue is attached. Note that it does get replaced with the second file I'm attaching, showing the revoked key and no other source of trust.
>
> Also note that in at least two of ~6 attempts, it randomly didn't have a problem. Stopping bind, resetting the file to the init state, and restarting caused it to start failing again.
>
> Concrete steps:
>
> 1. `systemctl stop named`
> 2. `sudo cp /var/named/dynamic/managed-keys.bind.init /var/named/dynamic/managed-keys.bind`
> 3. `systemctl start named`
> 4. `sudo tcpdump -i foo -n port 53 | grep -i dnskey`
> 5. `dig @localhost meetings.icann.org`
> 6. `dig @localhost example.com`
> 7. `dig @localhost foo.com`
>
> The problem is that it seems very race-conditiony. Sometimes I can hit it, and sometimes I can't. There is about a 25-50% chance of it getting into whatever error state it's in. And sometimes it stops happening after a while as well.
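
Since the failure is intermittent, it may help to put a number on each attempt instead of eyeballing the `grep` output from step 4. The following is a minimal sketch, not part of the original report: it tallies outgoing DNSKEY queries per destination server from `tcpdump -n` text output. The function name `count_dnskey_queries`, the sample addresses, and the assumed line shape (IPv4 only, destination port 53) are illustrative assumptions and may need adjusting for the tcpdump version in use.

```python
import re
from collections import Counter

# Assumed shape of a `tcpdump -n` query line (an assumption, not from the report):
#   12:00:01.000001 IP 192.0.2.10.40000 > 198.97.190.53.53: 4242 [1au] DNSKEY? . (28)
# Only IPv4 queries sent to port 53 are matched; responses lack the "DNSKEY?" token.
QUERY_RE = re.compile(
    r"> (?P<dst>\d+\.\d+\.\d+\.\d+)\.53: .*\bDNSKEY\? (?P<name>\S+)"
)


def count_dnskey_queries(lines):
    """Count outgoing DNSKEY queries per (destination server, queried name)."""
    counts = Counter()
    for line in lines:
        match = QUERY_RE.search(line)
        if match:
            counts[(match.group("dst"), match.group("name"))] += 1
    return counts
```

Passing the captured tcpdump lines (for example `sys.stdin` when the capture is piped in) to `count_dnskey_queries` and comparing the totals between a failing and a non-failing restart would show whether DNSKEY queries really scale with the number of incoming `dig` requests.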