tcp-clients mostly ineffective
Note that this is largely a copy/paste from an update on the customer ticket. I didn't feel like trying to explain it again only using slightly different words.
My theory has borne fruit: I have empirically confirmed how it is that 'tcp-clients' is mostly ineffective, and I also have a plausible explanation for why it hasn't been noticed before now.
I've previously described how 'tcp-clients' is a hard quota, but that named will go ahead and service a request that is over the quota. This is the first part of the problem.
After a client's work is completed it is then scrubbed of various bits of information, detached from the quota if it was attached, and then returned to the pool of available clients and the accept_list.
Note that the last part happens the same whether the client was counted in the quota or not.
Further, since clients that are handling over-quota requests are not counted anywhere, there is no way of knowing when they exist. So whenever a client that was counted is returned, the quota count is decreased.
Thus the situation is:
- the quota only affects the creation of replacement clients for the accept_list; it does not control the number of clients that can be on that list.
- the server starts off with the ability to go over quota by 1
- each time the quota count reaches the maximum value, the server's capacity to go over quota increases by 1, because there is a client replacement at that point
- each time a client counted in the quota finishes its work, the quota count is decreased. If the quota was at max and there are pending connections to accept(), this causes the server's over-quota capacity to immediately increase by 1
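The sequence above can be sketched as a toy model in Python. This is not BIND code. The function name, the single starting listener, and the "oldest connection finishes first" rule are all assumptions made to keep the example small. The only rules taken from the description are: a successful quota attach creates a replacement client for the accept list, a failed attach still services the request, and every finished client goes back on the accept list.

```python
from collections import deque


def simulate(max_quota, target, steps):
    """Toy model (hypothetical, not BIND code) of the accept/replace cycle."""
    used = 0            # hard quota counter
    listeners = 1       # clients waiting on the accept list (assumed start)
    active = deque()    # True = counted in the quota, False = over-quota
    peak_active = 0
    peak_used = 0
    for _ in range(steps):
        # The load generator tries to keep `target` connections outstanding.
        while len(active) < target and listeners > 0:
            listeners -= 1              # one listener accepts the connection
            if used < max_quota:
                used += 1               # attach succeeded: client is counted
                listeners += 1          # ...and a replacement client is created
                active.append(True)
            else:
                active.append(False)    # attach failed, request served anyway
        peak_active = max(peak_active, len(active))
        peak_used = max(peak_used, used)
        # The oldest connection completes (an assumption of this model).
        if active:
            counted = active.popleft()
            if counted:
                used -= 1               # only counted clients release quota
            listeners += 1              # every client returns to the accept list
    return peak_active, peak_used
```

With `simulate(max_quota=3, target=10, steps=50)` the model reaches 10 concurrent clients while the quota counter itself never goes above 3, which is the behaviour described in the list above.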
In order to verify this I created a patch (attached, note that it is a superset of the patches provided previously) that adds a second TCP quota, but this one is configured differently.
The original tcpquota is configured to be "hard". This means that it has a max that cannot be exceeded.
The new one is configured to be "soft": it has no absolute max, but it has a soft limit that is kept in sync with the regular quota's max. This soft limit doesn't prevent new clients from being counted, but I do use it in logging.
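The difference between the two quotas can be illustrated with a simplified Python model. The class, its method names, and the result strings are hypothetical and are not the isc_quota API. The model assumes that a value of 0 means "no limit set", that a hard quota refuses to count past its max, and that a soft quota keeps counting but reports when the soft limit has been passed.

```python
class Quota:
    """Simplified counter with an optional hard max and soft limit (0 = unset)."""

    def __init__(self, hard_max=0, soft=0):
        self.hard_max = hard_max
        self.soft = soft
        self.used = 0

    def attach(self):
        # Hard limit: refuse and do not count the client.
        if self.hard_max != 0 and self.used >= self.hard_max:
            return "quota reached"
        # Soft limit: still count the client, but report the overrun.
        over_soft = self.soft != 0 and self.used >= self.soft
        self.used += 1
        return "soft quota reached" if over_soft else "success"

    def detach(self):
        # Release one previously counted client.
        assert self.used > 0
        self.used -= 1


hard = Quota(hard_max=2)   # stands in for the original tcpquota
soft = Quota(soft=2)       # stands in for the added quota, same limit
results = [(hard.attach(), soft.attach()) for _ in range(3)]
```

After three attach attempts the hard counter stays at 2 while the soft counter reads 3, so the gap between the two is the number of clients the hard quota is not seeing.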
I added client logging in two places. The first is immediately after the isc_quota_attach in client_newconn and the other is immediately before the isc_quota_detach in exit_check (also in client.c).
The logging statement in client_newconn also includes the text translation of the result code from the isc_quota_attach called on the original (hard) quota.
The logging statement in exit_check also includes whether or not the client is attached (i.e. counted) in the "hard" quota.
Both statements include the "used" and "max" from the hard quota and the "used" and "soft" from the soft quota (the soft value should be the same as the hard quota's max).
I found that the best way to test and demonstrate this was with a script that maintains a set number of outstanding connections. This allowed me to throttle it so that I didn't exhaust resources, while keeping up enough pressure to make the server bounce against the limit imposed by 'tcp-clients' as often as possible. The script is attached as mkload.pl (all 37 lines of it).
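For readers without the attachment, the general idea of holding a fixed number of connections open can be sketched in Python. This is not mkload.pl and makes no claim about what that script does internally. The function names, the polling rounds, and the choice to open idle TCP connections without sending a query are all assumptions for illustration.

```python
import socket
import time


def open_conn(host, port, timeout=5.0):
    """Open one TCP connection to the server under test."""
    return socket.create_connection((host, port), timeout=timeout)


def is_closed(sock):
    """Return True if the peer has closed the connection."""
    try:
        sock.setblocking(False)
        return sock.recv(1, socket.MSG_PEEK) == b""
    except BlockingIOError:
        return False        # nothing to read yet, still open
    except OSError:
        return True         # reset or otherwise unusable


def hold_load(target, rounds, connect, closed=is_closed, pause=0.1):
    """Keep about `target` connections outstanding for `rounds` polling rounds."""
    conns = []
    opened = 0
    for _ in range(rounds):
        # Drop connections the server has finished with.
        kept = []
        for c in conns:
            if closed(c):
                c.close()
            else:
                kept.append(c)
        conns = kept
        # Top back up to the target; this is the throttle.
        while len(conns) < target:
            try:
                conns.append(connect())
                opened += 1
            except OSError:
                break       # refused or out of resources, retry next round
        time.sleep(pause)
    for c in conns:
        c.close()
    return opened
```

A run against a test server might look like `hold_load(150, 600, lambda: open_conn("127.0.0.1", 53))`, where the address, port, and counts are placeholders. Capping the number of outstanding connections is what keeps the client side from exhausting its own file descriptors while still pushing the server past its 'tcp-clients' setting.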
It seems that the rate of growth in uncounted clients is related to the 'tcp-clients' setting (a higher setting grows faster), and that the upper limit is likely to be the lower of what can get through the network and the number of available file descriptors.