named may generate broken glueless referrals when reaching UDP packet size limit
named unconditionally ignores the ISC_R_NOSPACE result code when rendering the ADDITIONAL section of a response message. This is usually fine, but there is an edge case which breaks with this behavior: for delegations which only have in-bailiwick name servers defined for the child zone, a referral must include glue records; meanwhile, named may not include any glue records in a referral if its size is close to the UDP packet size limit.
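To make the failure mode concrete, here is a toy model of the behavior described above. It is a sketch only, not BIND code: the `Message` class, the `NoSpace` exception (standing in for ISC_R_NOSPACE), the `build_referral` helper, and all byte sizes are hypothetical. The `ignore_nospace=False` branch, which sets a truncation flag when no glue fits, is included purely for contrast and is my assumption about one possible alternative, not something taken from the named source.

```python
from dataclasses import dataclass, field


class NoSpace(Exception):
    """Stand-in for the ISC_R_NOSPACE result code (hypothetical name)."""


@dataclass
class Message:
    limit: int                  # maximum response size in bytes
    used: int = 0               # bytes rendered so far
    truncated: bool = False     # models the TC flag
    authority: list = field(default_factory=list)
    additional: list = field(default_factory=list)

    def render(self, section, record, size):
        """Append a record to a section, or raise NoSpace if it does not fit."""
        if self.used + size > self.limit:
            raise NoSpace(record)
        self.used += size
        section.append(record)


def build_referral(limit, authority, glue, ignore_nospace=True):
    """Render a referral: NS/DS records first, then glue.

    With ignore_nospace=True (the behavior described in this report),
    running out of space while rendering glue is silently ignored.
    With ignore_nospace=False (an assumed alternative, for contrast),
    the message is marked truncated when not a single glue record fits.
    Running out of space in the AUTHORITY section is not modeled here.
    """
    msg = Message(limit)
    for record, size in authority:
        msg.render(msg.authority, record, size)
    try:
        for record, size in glue:
            msg.render(msg.additional, record, size)
    except NoSpace:
        if not ignore_nospace and not msg.additional:
            msg.truncated = True
    return msg


# Made-up sizes: the AUTHORITY section nearly fills a 512-byte response.
authority = [("dk. NS a.nic.dk.", 480)]
glue = [("a.nic.dk. A", 40)]

small = build_referral(512, authority, glue)
print(small.additional, small.truncated)   # [] False
```

In this model the 512-byte response goes out with an empty ADDITIONAL section and no truncation flag, so the client has no indication that required glue was dropped. With a larger limit the same call includes the glue record.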
While I have not verified this in practice, I believe BIND has behaved this way since version 9.0.0. The issue affects all referrals, both signed and unsigned, including those served by root servers: I came across this problem while analyzing traffic patterns on my home resolver. I believe that in extreme cases, this issue may cause resolution failures for the delegated domain.
Given enough retries (due to load balancing), it should be possible to reproduce this problem with:
$ dig @k.root-servers.net. _.dk. A +norec +dnssec +bufsize=512 +ignore +nsid
; <<>> DiG 9.17.2 <<>> @k.root-servers.net. _.dk. A +norec +dnssec +bufsize=512 +ignore +nsid
; (2 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 10102
;; flags: qr; QUERY: 1, ANSWER: 0, AUTHORITY: 9, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 4096
; NSID: 6e 73 31 2e 64 65 2d 66 72 61 2e 6b 2e 72 69 70 65 2e 6e 65 74 ("ns1.de-fra.k.ripe.net")
;; QUESTION SECTION:
;_.dk. IN A
;; AUTHORITY SECTION:
dk. 172800 IN NS b.nic.dk.
dk. 172800 IN NS l.nic.dk.
dk. 172800 IN NS p.nic.dk.
dk. 172800 IN NS s.nic.dk.
dk. 172800 IN NS a.nic.dk.
dk. 172800 IN NS d.nic.dk.
dk. 172800 IN NS c.nic.dk.
dk. 86400 IN DS 9280 13 2 3A93091A1A8A850E72A86DDAD3C79B2E3D426CC275BC11980E85C0AC 2854612B
dk. 86400 IN RRSIG DS 8 1 86400 20200706050000 20200623040000 48903 . AKrl/m0EtfPo4IbY0oJLNkCh/H0ZzFeEEl7SrgQOoKQ7uxj1rYsb5vy7 w7J+kRjq0Ll4LTt92DtkJS6OqcXZguv0sQFnEOw5tkmTvjNXJZSsjk6u c/jPrfDlVaT9glKeGGjcFpSnDfSkUv7gsR5S91Ovk6RX1ZfKKzlydebC /ci+qAcaxnJtFDiSa7RR7jReoGkeR8FFo0onjsO/RRcUKxvBjO7aRWNh k/1dD8VSGBro/8LEaPuTYMxRnbTSvBxqO+IY1JhcbCoZYRMRCvFqPm1j 3zh025e+vDjQsA86YeahM/lr2mSgNXkNO87BEYPPVo73eA+9ANoPEcle vnhEtA==
;; Query time: 23 msec
;; SERVER: 2001:7fd::1#53(2001:7fd::1)
;; WHEN: Tue Jun 23 11:19:28 CEST 2020
;; MSG SIZE rcvd: 511
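The response above lists seven name servers under dk. and carries only the OPT record in the ADDITIONAL section. A minimal sketch of how such a response could be flagged when scanning captured referrals follows. The helper names (`labels`, `is_in_bailiwick`, `is_broken_glueless_referral`) are hypothetical, and the inputs are assumed to be already extracted from the message: the delegated zone, the NS target names, and the owner names of any A/AAAA records in the ADDITIONAL section.

```python
def labels(name):
    """Split a domain name into lowercase labels; the root name yields []."""
    stripped = name.lower().rstrip(".")
    return stripped.split(".") if stripped else []


def is_in_bailiwick(ns_name, zone):
    """Return True if ns_name is at or below the delegated zone."""
    n, z = labels(ns_name), labels(zone)
    return len(n) >= len(z) and n[len(n) - len(z):] == z


def is_broken_glueless_referral(zone, ns_names, glue_owner_names):
    """Flag a referral whose name servers are all in-bailiwick but which
    carries no address record for any of them."""
    if not ns_names:
        return False
    glue = {".".join(labels(g)) for g in glue_owner_names}
    all_in_bailiwick = all(is_in_bailiwick(ns, zone) for ns in ns_names)
    has_glue = any(".".join(labels(ns)) in glue for ns in ns_names)
    return all_in_bailiwick and not has_glue


# NS targets copied from the AUTHORITY section of the response above;
# the ADDITIONAL section held only the OPT record, so there is no glue.
ns_names = ["b.nic.dk.", "l.nic.dk.", "p.nic.dk.", "s.nic.dk.",
            "a.nic.dk.", "d.nic.dk.", "c.nic.dk."]
print(is_broken_glueless_referral("dk.", ns_names, []))   # True
```

A referral with at least one out-of-bailiwick name server is not flagged, since a resolver can look up that server's address independently of the delegation.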
This issue is similar in spirit to the F-Root incident which happened earlier this year.
If my findings are correct, I think the fix for this bug needs a release note.