failed query to a `forward only` forwarder increments `serverquota` counter (spilled due to server quota)

I was inspecting the stats output and was very surprised to see this:

13779 spilled due to server quota

The server in question does not have fetches-per-server configured, so it defaults to zero (unlimited). And yet the counter is non-zero.

Looking at the code - I suspect there's a failure mode that drops through the 'out' block in fctx_getaddresses() without resetting all_spilled (which starts at 'true').

static isc_result_t
fctx_getaddresses(fetchctx_t *fctx, bool badcache) {
	dns_rdata_t rdata = DNS_RDATA_INIT;
	isc_result_t result;
	dns_resolver_t *res;
	isc_stdtime_t now;
	unsigned int stdoptions = 0;
	dns_forwarder_t *fwd;
	dns_adbaddrinfo_t *ai;
	bool all_bad;
	dns_rdata_ns_t ns;
	bool need_alternate = false;
	bool all_spilled = true;

...

			/*
			 * If all of the addresses found were over the
			 * fetches-per-server quota, return the configured
			 * response.
			 */
			if (all_spilled) {
				result = res->quotaresp[dns_quotatype_server];
				inc_stats(res, dns_resstatscounter_serverquota);
			}

This server uses global forwarding with `forward only`, so we jump straight past the 'normal_nses' section, which is where 'all_spilled' is normally reset from true to false during processing:

	if (fctx->fwdpolicy == dns_fwdpolicy_only)
		goto out;

So I'm guessing that what's being counted and reported here are failures to get responses back from any of the global forwarders. That tallies quite nicely with the problem I'm investigating, even though this wasn't a counter I was expecting to see in the stats.
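To make the suspected path concrete, here is a stripped-down model of it. This is only a sketch under assumptions: `fwdpolicy_t`, `stats_t` and `model_getaddresses()` are hypothetical stand-ins rather than BIND's types or functions, and the nameserver step is reduced to a single comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for this sketch; these are not BIND's types. */
typedef enum { FWD_NONE, FWD_FIRST, FWD_ONLY } fwdpolicy_t;

typedef struct {
	unsigned int serverquota; /* models "spilled due to server quota" */
} stats_t;

/*
 * Simplified model of the suspected flow. Returns true if at least one
 * usable address was found. ns_total is the number of nameserver
 * addresses examined, ns_over_quota how many of those were over quota.
 */
static bool
model_getaddresses(fwdpolicy_t policy, size_t usable_forwarders,
		   size_t ns_total, size_t ns_over_quota, stats_t *stats) {
	bool all_spilled = true; /* starts true, as in the excerpt above */
	bool found = (usable_forwarders > 0);

	assert(stats != NULL);
	assert(ns_over_quota <= ns_total);

	if (policy == FWD_ONLY) {
		goto out; /* the nameserver step below never runs */
	}

	/* Stand-in for the nameserver step: the only place the flag is cleared. */
	if (ns_total > ns_over_quota) {
		all_spilled = false;
		found = true;
	}

out:
	if (!found && all_spilled) {
		/* On the forward-only path the flag is still at its initial value. */
		stats->serverquota++;
	}
	return found;
}
```

In this model, calling `model_getaddresses(FWD_ONLY, 0, 0, 0, &s)` bumps `s.serverquota` even though no address was ever checked against a quota, which is the behaviour I think I'm seeing.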

The assumption seems to be that if the failure is for any reason other than fetch limits, something will reset the 'all_spilled' flag. That assumption appears to be flawed for some configurations and situations. Could someone have a look at this, please? It should be an easy one to fix.

I note that this has also been noticed before on bind-users:

https://lists.isc.org/pipermail/bind-users/2016-June/097011.html

I observed this in 9.11.15-S1, but the code path still looks the same on master.

Requested changes:

fix serverquota counter
add a new counter specifically for the situation when all forwarders have failed
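One possible shape for these two changes, as a rough sketch only: count the quota case only when addresses were actually examined and every one of them was over quota, and count forward-only failures separately. All names here (`counters_t`, `forwardfail`, `classify_failure()`) are hypothetical and are not existing BIND identifiers; the real fix may well look different.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical counters for this sketch; not BIND's statistics API. */
typedef struct {
	unsigned int serverquota; /* all examined addresses were over quota */
	unsigned int forwardfail; /* hypothetical new counter: forwarders failed */
} counters_t;

typedef enum { CAUSE_QUOTA, CAUSE_FORWARDERS, CAUSE_OTHER } cause_t;

/*
 * Called only when no usable address was found. 'examined' is the number
 * of addresses checked against the quota, 'over_quota' how many of them
 * were rejected because of it.
 */
static cause_t
classify_failure(bool forward_only, size_t examined, size_t over_quota,
		 counters_t *c) {
	/* Quota is the cause only if something was examined and all of it spilled. */
	if (examined > 0 && over_quota == examined) {
		c->serverquota++;
		return CAUSE_QUOTA;
	}
	/* With forward only, the remaining explanation is forwarder failure. */
	if (forward_only) {
		c->forwardfail++;
		return CAUSE_FORWARDERS;
	}
	return CAUSE_OTHER;
}
```

The point of the design is that the quota counter depends on an explicit count of spilled addresses rather than on a flag that defaults to true.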

Edited Jan 24, 2022 by Petr Špaček
