Exceeding transfers-per-ns quota aborts zone refresh

Summary

When a zone transfer is abandoned because the selected master host has reached its transfers-per-ns quota, the whole refresh activity for the zone is cancelled, even when alternative master hosts are configured and below quota.

This causes updates to be unnecessarily delayed; another refresh will not occur until the zone's refresh timer expires or another NOTIFY message is received.

The effect is that a single slow master host effectively torpedoes overall zone-transfer latency even when plenty of other master servers are available. A slow master holds its zone transfer sessions open longer than usual, so it reaches its quota, and that kills off the refresh activity for every zone for which it is named's currently preferred master.

Steps to reproduce

Use a tool such as tc on a slave host to slow TCP sessions to its first-listed master host until the number of concurrent TCP transfer sessions hits the transfers-per-ns limit. Observe that the slave host aborts refresh activities for zones because of the transfers-per-ns quota, without trying the other master hosts.

What is the current bug behavior?

Exceeding the transfers-per-ns quota for one master aborts the refresh activity for the zone, not just the transfer from the over-quota master.

What is the expected correct behavior?

If the zone has other masters that are below their own transfers-per-ns quota, and the global transfers-in quota is not exceeded, the refresh should continue using those alternative masters.
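
The difference between the current and the expected behavior can be sketched as below. This is a simplified Python model, not named's actual C implementation: Master, pick_master_current and pick_master_expected are hypothetical names, the transfer counters are supplied by hand, and the limits 2 and 10 in the usage lines are assumed to match the documented defaults for transfers-per-ns and transfers-in.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Master:
    """Hypothetical model of one configured master for a zone."""
    address: str
    active_transfers: int  # inbound transfers currently running from this master


def pick_master_current(masters: Sequence[Master],
                        transfers_per_ns: int) -> Optional[Master]:
    """Behavior described in this report: only the preferred master is tried."""
    preferred = masters[0]
    if preferred.active_transfers >= transfers_per_ns:
        return None  # the whole refresh for the zone is abandoned
    return preferred


def pick_master_expected(masters: Sequence[Master],
                         transfers_per_ns: int,
                         total_active: int,
                         transfers_in: int) -> Optional[Master]:
    """Requested behavior: fall through to the next master that is below quota."""
    if total_active >= transfers_in:
        return None  # global quota reached, no master can be used right now
    for master in masters:
        if master.active_transfers < transfers_per_ns:
            return master
    return None  # every master for this zone is at its per-server quota


# Assumed scenario: the first-listed master is slow and already at quota.
masters = [Master("192.0.2.1", 2), Master("192.0.2.2", 0)]
current = pick_master_current(masters, transfers_per_ns=2)
expected = pick_master_expected(masters, transfers_per_ns=2,
                                total_active=2, transfers_in=10)
print("current: ", current)
print("expected:", expected)
```

In this scenario the first function returns no master at all, which corresponds to the aborted refresh, while the second returns the idle master at 192.0.2.2.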

Edited Oct 04, 2021 by Ondřej Surý