RPZ rpz-nsip rules seem not to understand stub and static-stub zones and don't handle DNS_R_GLUE result well ...
After a new rpz-nsip policy was introduced into a pre-existing policy zone (the first rpz-nsip rule in use in this environment), named started logging messages of the type shown below for all queries for names in, and in delegations from, a zone that is configured as type stub.
Switching to static-stub didn't help.
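For reference, the two zone configurations being compared would look roughly like the following. This is only a sketch: the server address 198.51.100.53 is a placeholder, not the site's real configuration, and only one of the two definitions can be active at a time.

```
// Original setup: stub zone (placeholder address)
zone "example.com" {
    type stub;
    masters { 198.51.100.53; };
};

// Alternative that was tried and showed the same logging
// (replaces the definition above, it cannot coexist with it):
// zone "example.com" {
//     type static-stub;
//     server-addresses { 198.51.100.53; };
// };
```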
We think this relates to the way that named creates an in-memory zone database for stub and static-stub zones but then (unexpectedly, from RPZ's point of view) returns glue from it rather than authoritative RRsets.
The RPZ code that produces the logged message has no case to handle DNS_R_GLUE, so this result falls through to the 'default' error handling.
Here are some (anonymised) examples of the log messages being reported:
Mar 19 18:05:17 r7 named[100011]: client @0x7f688c00b4f8 192.0.2.0.134#45337 (query1.alpha.example.com): rpz NSIP/NSDNAME rewrite query1.alpha.example.com via example.com unrecognized NS rpz_rrset_find() failed: glue
Mar 19 18:05:17 r7 named[100011]: client @0x7f688c007bf8 192.0.2.0.134#45337 (query1.alpha.example.com): rpz NSIP/NSDNAME rewrite query1.alpha.example.com via example.com unrecognized NS rpz_rrset_find() failed: glue
Mar 19 18:05:17 r7 named[100011]: client @0x7f69080eaf98 192.0.2.134#36690 (1.2.3.10.in-addr.arpa): rpz NSIP/NSDNAME rewrite 1.2.3.10.in-addr.arpa via 10.in-addr.arpa unrecognized NS rpz_rrset_find() failed: glue
Mar 19 18:05:17 r7 named[100011]: client @0x7f68f001bf28 192.0.2.0.168#39939 (query2.beta.gamma.example.com): rpz NSIP/NSDNAME rewrite query2.beta.gamma.example.com via beta.gamma.example.com unrecognized NS rpz_rrset_find() failed: glue
...
Mar 19 18:05:18 r7 named[100011]: client @0x7f68b8069c08 192.0.2.0.86#38447 (query3.delta.epsilon.example.com): rpz NSIP/NSDNAME rewrite query3.delta.epsilon.example.com via example.com unrecognized NS rpz_rrset_find() failed: glue
Mar 19 18:05:18 r7 named[100011]: client @0x7f68b002f2d8 192.0.2.0.157#51189 (query4.alpha.example.com): rpz NSIP/NSDNAME rewrite query4.alpha.example.com via example.com unrecognized NS rpz_rrset_find() failed: glue
Mar 19 18:05:18 r7 named[100011]: client @0x7f6870087de8 192.0.2.0.56#54972 (query5.alpha.example.com): rpz NSIP/NSDNAME rewrite query5.alpha.example.com via example.com unrecognized NS rpz_rrset_find() failed: glue
Mar 19 18:05:18 r7 named[100011]: client @0x7f68b002f2d8 192.0.2.0.157#51189 (query6.alpha.example.com): rpz NSIP/NSDNAME rewrite query6.alpha.example.com via example.com unrecognized NS rpz_rrset_find() failed: glue
Mar 19 18:05:18 r7 named[100011]: client @0x7f687000c908 192.0.2.0.56#54972 (query7.alpha.example.com): rpz NSIP/NSDNAME rewrite query7.alpha.example.com via example.com unrecognized NS rpz_rrset_find() failed: glue
Mar 19 18:05:18 r7 named[100011]: client @0x7f6920098de8 192.0.2.0.201#47323 (query8.zeta.example.com): rpz NSIP/NSDNAME rewrite query8.zeta.example.com via example.com unrecognized NS rpz_rrset_find() failed: glue
Mar 19 18:05:18 r7 named[100011]: client @0x7f68d86e9108 192.0.2.0.201#42436 (query8.zeta.example.com): rpz NSIP/NSDNAME rewrite query8.zeta.example.com via example.com unrecognized NS rpz_rrset_find() failed: glue
Coincidentally, the site in question also experienced a temporary loss of Internet connectivity at around this time, but even after connectivity was restored, the logging of the message "unrecognized NS rpz_rrset_find() failed: glue" persisted until rpz-nsip processing was disabled by adding "nsip-enable no" to the response policy.
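A minimal sketch of that workaround, assuming a single policy zone with the hypothetical name rpz.example.net (the real policy zone name is not given here, and the zone itself must still be defined elsewhere in named.conf):

```
options {
    response-policy {
        zone "rpz.example.net";   // hypothetical policy zone name
    } nsip-enable no;             // disable rpz-nsip trigger processing
};
```

The option can also be set per policy zone, inside the braces after the zone name, if only one of several policy zones should skip rpz-nsip processing.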
Subsequently we discovered that a new rpz-nsip policy had been added to the policy zone at around the same time as the temporary loss of Internet connectivity.
Notably, in the log above (and we still need to look into this), in addition to the qnames below example.com (the zone that was configured as type stub), there is also a query for a qname below 10.in-addr.arpa producing the same error.
The server is DNSSEC-validating and running 9.16.26-S1, although there is a validate-except entry covering example.com.
The reporter wrote:
I have verified that there are no RPZ entries for any example.com name,
so I don't know why the RPZ was being triggered in the first place. All
of our RPZ entries look like this one:
<< sanitised name >>. 14400 IN CNAME .
so any match should result in a QNAME NXDOMAIN answer. But rewrites were
being triggered for non-matches and resulted in SERVFAIL. I wonder if
this could be an interaction between RPZ and serve-stale, as the problem
started at about the same time as we had a firewall issue that blocked
access from our recursive servers to numerous authorities for a while.
Disabling the RPZ stopped the errors (named was reconfigured, not restarted).