BIND 9.16.15 as resolver: sudden increase in ServFail query results
After upgrading BIND to 9.16.15, we have now observed this problem twice. Suddenly, named starts returning ServFail for a large portion of queries. At the same time, named appears to stop responding to rndc commands.
This shows the query results tallied in 10-second intervals, categorized by whether they returned ServFail status or something else ("normal" results).
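For anyone who wants to produce a comparable tally from their own query data, here is a minimal sketch in Python. It is not the tooling used for the figures in this report. The input format (pairs of Unix timestamp and response code string), the function name `tally`, and the sample records are all assumptions made up for illustration.

```python
from collections import Counter

BUCKET_SECONDS = 10  # interval width, matching the 10-second tally described above


def tally(records, bucket_seconds=BUCKET_SECONDS):
    """Count ServFail vs. other responses per time bucket.

    records: iterable of (unix_timestamp, rcode_string) pairs.
    This input format is an assumption; adapt it to whatever your
    query log or packet capture tooling actually emits.
    """
    counts = {}
    for ts, rcode in records:
        # Round the timestamp down to the start of its bucket.
        bucket = int(ts // bucket_seconds) * bucket_seconds
        kind = "servfail" if rcode.upper() == "SERVFAIL" else "normal"
        counts.setdefault(bucket, Counter())[kind] += 1
    # Return the buckets in chronological order.
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the incident.
    sample = [
        (100.2, "NOERROR"),
        (103.9, "SERVFAIL"),
        (111.0, "NXDOMAIN"),
        (119.5, "SERVFAIL"),
    ]
    for bucket, c in tally(sample).items():
        print(bucket, "normal:", c["normal"], "servfail:", c["servfail"])
```

A sudden jump in the servfail count relative to the normal count for consecutive buckets would correspond to the kind of event described here.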
We have a dnscap running on this host which covers this latter event, but nothing obvious leaps out of the resulting packet traces, although it is a bit like searching for the proverbial needle in a haystack. There also do not seem to be any interesting log messages that can be correlated with this event.
So ... not too much concrete to go on here, but I have two questions:
- Have you received any similar error / incident reports?
- What, if anything, can I look for to collect more or better information to trace the actual root cause of this issue?
I run my BIND instances on NetBSD/amd64, currently 9.0 or slightly later.
Since this somewhat smells like a possibly security-related issue, I'll restrict its availability.
For now I have downgraded BIND on the offending instance to 9.16.12 (the previous version we ran); there are a few other things I need to do before retrying with 9.16.15.