odd serve-stale / SERVFAIL performance problem
I have two identical, newly installed servers running stock BIND 9.12.3-P1 on Debian 9 "Stretch". They perform very differently when trying to resolve reverse DNS under 48.in-addr.arpa. This domain has a lame delegation:
    48.in-addr.arpa.          NS     psnyed01.prudential.com.
    psnyed01.prudential.com.  CNAME  ns1.prudential.com.
I have a sample of 29000 domains under 48.in-addr.arpa which I am using as my test set. Machine "R" can SERVFAIL them all in about 12 seconds with a hot cache, using adns-masterfile as the client. I get the same time with a concurrency limit of 1000 or 4000 queries.

Machine "S" takes much longer: about 25s with a concurrency of 1000 and 45s with a concurrency of 4000, and it develops a long backlog at the higher concurrency. Its ability to handle other queries suffers a lot while this is going on.
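For anyone wanting to try something similar, here is a rough sketch of how a test set of that shape could be generated. It is not how my sample was built: the random sampling, the function name and the seed are all assumptions for illustration. It only uses the Python standard library.

```python
import ipaddress
import random


def sample_reverse_names(network="48.0.0.0/8", count=29000, seed=0):
    """Return `count` distinct PTR owner names for addresses in `network`.

    Hypothetical helper: uniform random sampling is an assumption,
    not the method used to build the original test set.
    """
    net = ipaddress.ip_network(network)
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    base = int(net.network_address)
    # Sampling from a range avoids building a list of all 2**24 addresses.
    offsets = rng.sample(range(net.num_addresses), count)
    # reverse_pointer gives e.g. "3.2.1.48.in-addr.arpa"; add the root dot.
    return [
        ipaddress.ip_address(base + off).reverse_pointer + "."
        for off in offsets
    ]


if __name__ == "__main__":
    # Print one name per line, ready to feed to whatever client is in use.
    for name in sample_reverse_names(count=10):
        print(name)
```

The output is one fully qualified name per line; any input format a particular client expects would need to be adapted separately.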
I've dumped the cache on the two machines. There are some differences:

    ; answer ; stale
    ns1.prudential.com.       2055  A      184.108.40.206
    ; answer ; stale
    ns2.prudential.com.       2055  A      220.127.116.11
    ; pending-answer
    psnyed01.prudential.com.  3697  CNAME  ns1.prudential.com.

In R's address database dump there is:
; psnyed01.prudential.com alias ns1.prudential.com [target TTL 92] [v4 unexpected] [v6 unexpected]
Weirdly, R doesn't have any addresses in its adb dump (probably because it's been idle apart from my 48/8 testing).

    ; answer ; stale
    ns1.prudential.com.       3065  A      18.104.22.168
    ; answer ; stale
    ns2.prudential.com.       3065  A      22.214.171.124
    ; answer ; stale
    psnyed01.prudential.com.  3334  CNAME  ns1.prudential.com.

On S there is no entry in the adb for psnyed01.prudential.com, and there are lots of addresses.
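To make the cache comparison less error-prone than reading by eye, something like the following sketch could be used. It assumes each record in the dump is preceded by a ';' comment line carrying its annotation and that each record line has exactly owner, TTL, type and rdata; a full dump has more variety than that, so treat the parser as illustrative only. The function names and the FIRST/SECOND labels are made up, and the addresses are omitted from the sample data because only the annotations are compared.

```python
def parse_dump(text):
    """Map (owner, type) -> (annotation, rdata) for a cache-dump excerpt.

    Assumption: a ';' comment line gives the annotation for the record
    line that follows it, and records are 'owner ttl type rdata'.
    """
    records = {}
    annotation = ""
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(";"):
            # "; answer ; stale" -> "answer ; stale"
            annotation = line.lstrip("; ").strip()
            continue
        owner, _ttl, rtype, rdata = line.split(None, 3)
        records[(owner, rtype)] = (annotation, rdata)
        annotation = ""
    return records


def annotation_diff(first, second):
    """Return records present in both dumps whose annotations differ."""
    diff = {}
    for key in sorted(first.keys() & second.keys()):
        if first[key][0] != second[key][0]:
            diff[key] = (first[key][0], second[key][0])
    return diff


# Excerpts in the order they appear in this message (CNAME records only).
FIRST = """
; pending-answer
psnyed01.prudential.com. 3697 CNAME ns1.prudential.com.
"""
SECOND = """
; answer ; stale
psnyed01.prudential.com. 3334 CNAME ns1.prudential.com.
"""

if __name__ == "__main__":
    for key, (a, b) in annotation_diff(parse_dump(FIRST), parse_dump(SECOND)).items():
        print(key, a, "vs", b)
```

On the excerpts above this flags only the psnyed01.prudential.com CNAME, which is "pending-answer" in the first dump and "answer ; stale" in the second.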
After restarting a server, or after flushing the prudential.com forward and reverse DNS, I get NXDOMAIN responses for names under 48.in-addr.arpa instead of SERVFAIL. I'm not sure how to provoke it back into the SERVFAIL state.