failed query to a `forward only` forwarder increments `serverquota` counter (spilled due to server quota)
As observed in Support ticket #16297
I was inspecting the stats output and was very surprised to see this:
13779 spilled due to server quota
The server in question does not have `fetches-per-server` configured, so it defaults to zero (unlimited). And yet...
Looking at the code, I suspect there's a failure mode that reaches the 'out' block in fctx_getaddresses() without ever resetting all_spilled (which starts at 'true').
    static isc_result_t
    fctx_getaddresses(fetchctx_t *fctx, bool badcache) {
    	dns_rdata_t rdata = DNS_RDATA_INIT;
    	isc_result_t result;
    	dns_resolver_t *res;
    	isc_stdtime_t now;
    	unsigned int stdoptions = 0;
    	dns_forwarder_t *fwd;
    	dns_adbaddrinfo_t *ai;
    	bool all_bad;
    	dns_rdata_ns_t ns;
    	bool need_alternate = false;
    	bool all_spilled = true;
    ...
    	/*
    	 * If all of the addresses found were over the
    	 * fetches-per-server quota, return the configured
    	 * response.
    	 */
    	if (all_spilled) {
    		result = res->quotaresp[dns_quotatype_server];
    		inc_stats(res, dns_resstatscounter_serverquota);
    	}
This server uses global forwarding with `forward only`, so we skip the 'normal_nses' section, which is where 'all_spilled' is normally reset from true to false during processing:
    	if (fctx->fwdpolicy == dns_fwdpolicy_only)
    		goto out;
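To make the suspected path easier to follow, here is a minimal standalone model of the control flow described above. It is only a sketch: `getaddresses_model`, `fwdpolicy_t`, and `stats_t` are hypothetical stand-ins, not BIND's real types, and it assumes the quota check only runs when no usable address was found at all.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the resolver state; not BIND's real types. */
typedef enum { FWD_NONE, FWD_FIRST, FWD_ONLY } fwdpolicy_t;

typedef struct {
	unsigned int serverquota; /* models "spilled due to server quota" */
} stats_t;

/*
 * Simplified model of the suspected flow. Returns true if at least one
 * usable address was found. The counts are inputs here; in the real code
 * they come from forwarder and ADB lookups.
 */
bool
getaddresses_model(fwdpolicy_t policy, size_t usable_forwarders,
		   size_t usable_nses, stats_t *stats) {
	bool all_spilled = true; /* starts true, as in fctx_getaddresses() */
	size_t found = usable_forwarders;

	if (policy == FWD_ONLY) {
		goto out; /* skips the only place all_spilled is cleared */
	}

	/* Stand-in for 'normal_nses': a usable address clears the flag. */
	if (usable_nses > 0) {
		all_spilled = false;
		found += usable_nses;
	}

out:
	/* Assumption: the quota check applies only when nothing was found. */
	if (found == 0 && all_spilled) {
		stats->serverquota++;
	}
	return found > 0;
}
```

In this model, a `forward only` lookup with no usable forwarders increments `serverquota` even though no quota was ever applied, which matches the symptom in the stats output.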
So I'm guessing that what's been 'counted' and then reported here is failures to get responses back from any of the global forwarders (which tallies quite nicely with the problem I'm investigating, even though this wasn't a counter I was expecting to see in the stats!).
The assumption seems to be that if the failure is for any reason other than fetch limits, something will reset the 'all_spilled' flag. That assumption appears to be flawed for some configurations and situations. Could someone have a look at this please? It should be an easy one to fix.
I note that this has also been noticed before on bind-users:
https://lists.isc.org/pipermail/bind-users/2016-June/097011.html
I observed this in 9.11.15-S1, but the code path looks the same still on master.
Requested changes:
- fix the serverquota counter
- add a new counter specifically for the situation when all forwarders have failed
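
One possible shape for the two requested changes, shown as a standalone sketch rather than a patch: count `serverquota` only when addresses were actually skipped because of the quota, and count forwarder failures separately. The `forwardfailed` counter and the `count_no_addresses` helper are hypothetical names, and the sketch assumes the resolver can tell how many candidate addresses it saw and how many of those were over quota.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical counters; only serverquota corresponds to an existing stat. */
typedef struct {
	unsigned int serverquota;   /* every candidate address was over quota */
	unsigned int forwardfailed; /* hypothetical: all forwarders failed */
} stats_t;

/*
 * Called when no usable address is left. 'candidates' is the number of
 * addresses considered and 'spilled' the number skipped because of
 * fetches-per-server. Both are assumed to be tracked by the caller.
 */
void
count_no_addresses(bool forward_only, size_t candidates, size_t spilled,
		   stats_t *stats) {
	/* Positive evidence is required instead of a default of true. */
	bool all_spilled = (candidates > 0 && spilled == candidates);

	if (all_spilled) {
		stats->serverquota++;
	} else if (forward_only) {
		stats->forwardfailed++;
	}
}
```

The design choice here is that `all_spilled` is derived from what was observed instead of starting at true and relying on every code path to clear it.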