Prevent generating broken glueless referrals
I think this MR really needs a high-level description, so here I go. (As usually, I tried hard to be very thorough both in both commit log messages and code comments, but still.)
#1967 (closed) turned out to be surprisingly frustrating to fix across all supported versions of BIND as:
-
v9.11 (EoL) uses non-refactored query handling code (
bin/named/query.c
) and also has an optional additional cache, but no glue cache (which is 9.12+), -
v9.16+ has refactored query handling code (
lib/ns/query.c
) and does not support the additional cache feature, but does have an optional glue cache (different feature).
All in all, fixing this in all maintained branches would require preparing four different variants of a non-trivial fix:
-
bin/named/query.c
with additional cache enabled, -
bin/named/query.c
with additional cache disabled, -
lib/ns/query.c
with glue cache enabled, -
lib/ns/query.c
with glue cache disabled.
This looks like overkill to me for something that is not a critical issue.
This MR only contains the last two variants of the fix from the list above, i.e. it addresses the problem in v9.16 and v9.17 (EoL) both with and without the glue cache.
But honestly, I would like to go a step further. My suggestion is to:
-
only fix the "glue cache enabled" variant,
-
drop the
glue-cache
option in v9.17 (EoL) and leave it always on (in a separate MR, of course).
The rationale for introducing the glue-cache
option was:
The rationale for this option is that we don't have any feedback on how the glue cache is going to behave in the real world just yet. There should be an option to bypass it.
To the best of my knowledge, we have received zero problem reports which
mention this feature in the past 3 years (since glue cache was
introduced). I also have not seen the glue-cache no;
setting in
configuration files people use these days.
Thoughts?
Here is a brief overview of the branch (doc updates excluded):
-
8eb90b69:
glue
system test cleanup; nothing fancy and there are just two checks in that test right now, -
8344c604: add tests reproducing #1967 (closed) in various configurations,
-
0cd1dbe5: "glue cache on" fix, part 1,
-
dddbe70d: "glue cache on" fix, part 2,
-
7684320f: "glue cache off" fix.
The only reason I split the "glue cache on" fix into two parts is to
make the fix easier to follow as each of these parts touches a different
section of rdataset_addglue()
.
I would be more than happy to drop the "glue cache off" part of this MR, please do not treat the amount of comments I put in there as a factor :)
Finally, both "glue cache on" and "glue cache off" variants could
probably be replaced with a fix that affects a common code path,
similarly to e.g. query_glueanswer()
, but:
-
the code implementing such a fix would be more complicated than the changes suggested in this MR,
-
I expect such a fix to perform worse than the changes suggested in this MR (by putting the fix in glue cache code, each referral is only "analyzed" once - a "common path" fix would need to do that for every response sent).
Once the path forward with this MR is agreed upon, I would like to run it through Perflab before merging it.
Closes #1967 (closed)