MS DNS servers seem to silently invalidate "inactive" GSS contexts. As a result, if Kea doesn't complete a successful DDNS request-reply exchange with GSS-TSIG for a certain period, the MS server invalidates the corresponding GSS-TSIG key and starts refusing subsequent requests signed with that context. It answers those requests with "broken" responses as described above, with the RCODE set to REFUSED. We have heard that this "certain period" is 165s, and it certainly seems to be a few minutes, but we haven't explicitly confirmed the exact period (or even the existence of such an "invalidation timer").
From Kea's point of view, those responses just look like they carry a broken GSS-TSIG token, so Kea cannot tell whether the MS server has invalidated the context. The only possible workaround right now is therefore to rekey GSS-TSIG keys quite often, whether or not they have been actively used. This is where my other ticket (#20794) matters; see also #2404 (closed).
In our experiments, when we rekeyed GSS-TSIG keys about every 150s and set the key lifetime to 160s, the problem did not occur. But such frequent rekeying is wasteful if the key is being actively used. For example, when we kept triggering DDNS updates every 30s or so, the problem didn't seem to occur even with the default rekey-interval (2700s).
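To make the timing argument concrete, here is a small Python simulation. It is a sketch under stated assumptions, not a model of Kea internals: it assumes (unconfirmed, as noted above) that the server drops a context after 165s without a successful exchange, that each rekey establishes a fresh context at the moment it happens, and that a refused request does not refresh the server's timer. The function name `refused_updates` is hypothetical. It reports which update attempts would land on an already-invalidated context.

```python
from typing import Iterable, List

# ASSUMPTION: the MS DNS server invalidates a GSS context after this many
# seconds without a successful exchange (165s is hearsay, not confirmed).
SUSPECTED_SERVER_TIMEOUT = 165.0


def refused_updates(update_times: Iterable[float],
                    rekey_interval: float,
                    server_timeout: float = SUSPECTED_SERVER_TIMEOUT) -> List[float]:
    """Return the update times that would hit an invalidated context.

    Hypothetical model: a context is created at t=0, every rekey (at
    multiples of rekey_interval) creates a fresh context, and each
    accepted update refreshes the server's inactivity timer. A refused
    update does not refresh it. Key lifetime is not modelled.
    """
    refused: List[float] = []
    last_activity = 0.0                 # context established at t=0
    next_rekey = float(rekey_interval)
    for t in sorted(update_times):
        # Apply every rekey that happened up to (and including) this update.
        while next_rekey <= t:
            last_activity = next_rekey
            next_rekey += rekey_interval
        if t - last_activity > server_timeout:
            refused.append(t)           # server already dropped the context
        else:
            last_activity = t           # successful exchange keeps it alive
    return refused


if __name__ == "__main__":
    # Rekey every 150s: an isolated update at t=1000s is accepted.
    print(refused_updates([1000], rekey_interval=150))    # prints []
    # Default rekey-interval (2700s): the same isolated update is refused.
    print(refused_updates([1000], rekey_interval=2700))   # prints [1000]
    # Updates every 30s keep the context alive even with 2700s rekeying.
    print(refused_updates(range(30, 1001, 30), rekey_interval=2700))  # prints []
```

The three cases mirror the observations in the ticket: frequent rekeying avoids the problem regardless of traffic, and steady traffic avoids it regardless of the rekey interval.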
So I'd suggest introducing some kind of "max-inactivity-interval" parameter. Kea would track when each generated GSS-TSIG key was last used for a successful DDNS update, and if a key hasn't been used that way for the specified interval, Kea would trigger rekeying and expire the inactive key as soon as the new one becomes available.
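The proposed behaviour could look roughly like the Python sketch below. Everything here is hypothetical and illustrative: `GssTsigKey`, `InactivityRekeyer` and their methods are not Kea APIs, and the 150s value in the usage example is simply taken from the experiments above as something below the suspected 165s server timeout. Only successful updates reset the inactivity clock, since a refused request does not appear to keep the server-side context alive.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GssTsigKey:
    """Hypothetical record of one negotiated GSS-TSIG key."""
    name: str
    last_success: float     # time of creation or of last successful update
    expired: bool = False


class InactivityRekeyer:
    """Sketch of the proposed "max-inactivity-interval" behaviour (not Kea code)."""

    def __init__(self, max_inactivity_interval: float) -> None:
        self.max_inactivity_interval = max_inactivity_interval
        self.active: Optional[GssTsigKey] = None
        self.rekey_pending = False

    def record_success(self, now: float) -> None:
        """Call after a DDNS update signed with the active key succeeds."""
        if self.active is not None:
            self.active.last_success = now

    def needs_rekey(self, now: float) -> bool:
        """Return True once when the active key has been idle too long."""
        if self.active is None or self.rekey_pending:
            return False
        if now - self.active.last_success >= self.max_inactivity_interval:
            self.rekey_pending = True   # start one TKEY exchange, not many
            return True
        return False

    def install_new_key(self, name: str, now: float) -> Optional[GssTsigKey]:
        """Activate a freshly negotiated key and expire the previous one.

        Returns the key that was replaced (now marked expired), if any.
        """
        old = self.active
        self.active = GssTsigKey(name=name, last_success=now)
        self.rekey_pending = False
        if old is not None:
            old.expired = True
        return old


if __name__ == "__main__":
    r = InactivityRekeyer(max_inactivity_interval=150)
    r.install_new_key("key-1", now=0)
    r.record_success(now=100)
    print(r.needs_rekey(now=200))   # prints False: last success was 100s ago
    print(r.needs_rekey(now=250))   # prints True: idle for 150s
    old = r.install_new_key("key-2", now=251)
    print(old.name, old.expired)    # prints key-1 True
```

Unlike a fixed rekey-interval, this only generates TKEY traffic for keys that actually sit idle, so an actively used key keeps running on the normal rekey schedule.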
See RT #20795.