named may generate broken glueless referrals when reaching the UDP packet size limit

named unconditionally ignores the ISC_R_NOSPACE result code when rendering the ADDITIONAL section of a response message. This is usually fine, but it breaks in one edge case: when a delegation only has in-bailiwick name servers defined for the child zone, the referral must include glue records, yet named may omit all glue records from a referral whose size is close to the UDP packet size limit.

Although I have not verified this in practice, I believe BIND has behaved this way since version 9.0.0. The issue affects all referrals, both signed and unsigned, including those served by root servers: I came across it while analyzing traffic patterns on my home resolver. I believe that in extreme cases, this issue may cause resolution failures for the delegated domain.

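Given enough retries (needed because of load balancing), it should be possible to reproduce this problem with the query below. In the affected response, the ADDITIONAL count of 1 is just the OPT pseudo-record, so no glue is returned for any of the in-bailiwick name servers: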

$ dig @k.root-servers.net. _.dk. A +norec +dnssec +bufsize=512 +ignore +nsid

; <<>> DiG 9.17.2 <<>> @k.root-servers.net. _.dk. A +norec +dnssec +bufsize=512 +ignore +nsid
; (2 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 10102
;; flags: qr; QUERY: 1, ANSWER: 0, AUTHORITY: 9, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 4096
; NSID: 6e 73 31 2e 64 65 2d 66 72 61 2e 6b 2e 72 69 70 65 2e 6e 65 74 ("ns1.de-fra.k.ripe.net")
;; QUESTION SECTION:
;_.dk.				IN	A

;; AUTHORITY SECTION:
dk.			172800	IN	NS	b.nic.dk.
dk.			172800	IN	NS	l.nic.dk.
dk.			172800	IN	NS	p.nic.dk.
dk.			172800	IN	NS	s.nic.dk.
dk.			172800	IN	NS	a.nic.dk.
dk.			172800	IN	NS	d.nic.dk.
dk.			172800	IN	NS	c.nic.dk.
dk.			86400	IN	DS	9280 13 2 3A93091A1A8A850E72A86DDAD3C79B2E3D426CC275BC11980E85C0AC 2854612B
dk.			86400	IN	RRSIG	DS 8 1 86400 20200706050000 20200623040000 48903 . AKrl/m0EtfPo4IbY0oJLNkCh/H0ZzFeEEl7SrgQOoKQ7uxj1rYsb5vy7 w7J+kRjq0Ll4LTt92DtkJS6OqcXZguv0sQFnEOw5tkmTvjNXJZSsjk6u c/jPrfDlVaT9glKeGGjcFpSnDfSkUv7gsR5S91Ovk6RX1ZfKKzlydebC /ci+qAcaxnJtFDiSa7RR7jReoGkeR8FFo0onjsO/RRcUKxvBjO7aRWNh k/1dD8VSGBro/8LEaPuTYMxRnbTSvBxqO+IY1JhcbCoZYRMRCvFqPm1j 3zh025e+vDjQsA86YeahM/lr2mSgNXkNO87BEYPPVo73eA+9ANoPEcle vnhEtA==

;; Query time: 23 msec
;; SERVER: 2001:7fd::1#53(2001:7fd::1)
;; WHEN: Tue Jun 23 11:19:28 CEST 2020
;; MSG SIZE  rcvd: 511
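
The condition shown in the response above can be checked mechanically. The following is a minimal sketch, not code from BIND: the function names `is_in_bailiwick` and `referral_lacks_required_glue` are hypothetical, and the inputs are assumed to have been extracted from a parsed response already (NS targets from the AUTHORITY section, owner names of A/AAAA records from the ADDITIONAL section).

```python
def is_in_bailiwick(ns_name: str, zone: str) -> bool:
    """Return True if ns_name is at or below zone (case-insensitive)."""
    ns_labels = ns_name.rstrip(".").lower().split(".")
    zone_labels = zone.rstrip(".").lower().split(".")
    if zone_labels == [""]:
        return True  # every name is below the root zone
    return (len(ns_labels) >= len(zone_labels)
            and ns_labels[-len(zone_labels):] == zone_labels)


def referral_lacks_required_glue(zone, ns_names, glue_owners):
    """Return True for a referral in which every NS target is in-bailiwick
    for the delegated zone, yet no address record for any of them is present.

    zone        -- delegated zone name, e.g. "dk."
    ns_names    -- NS targets taken from the AUTHORITY section
    glue_owners -- owner names of A/AAAA records in the ADDITIONAL section
    """
    glue = {name.rstrip(".").lower() for name in glue_owners}
    all_in_bailiwick = all(is_in_bailiwick(ns, zone) for ns in ns_names)
    has_glue = any(ns.rstrip(".").lower() in glue for ns in ns_names)
    return bool(ns_names) and all_in_bailiwick and not has_glue


# Data copied from the dig output above: seven in-bailiwick NS targets,
# and an ADDITIONAL section that holds only the OPT pseudo-record.
dk_ns = ["b.nic.dk.", "l.nic.dk.", "p.nic.dk.", "s.nic.dk.",
         "a.nic.dk.", "d.nic.dk.", "c.nic.dk."]
print(referral_lacks_required_glue("dk.", dk_ns, []))
```

The check is deliberately strict: a referral with at least one glue record, or with at least one out-of-bailiwick name server, is not flagged, since a resolver still has a way to make progress in those cases.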

This issue is similar in spirit to the F-Root incident which happened earlier this year.

If my findings are correct, I think the fix for this bug needs a release note.

Edited Sep 01, 2020 by Michał Kępień
