BIND cache-cleaning bug in cleanup_dead_nodes()

As reported in Support Ticket #20948 against BIND 9.16.23-S1

The ISC BIND subscription edition modifies the rbtdb cache data structure for EDNS CLIENT-SUBNET. It appears that the CLIENT-SUBNET patch was ported to the 9.16 subscription version from a previous branch.

There is a bug in cleanup_dead_nodes() which causes the cache to overflow "max-cache-size" and keep growing. This normally may not happen in the codepath via ISC BIND's overmem_purge(), or happen to a less degree, but NIOS makes some changes that causes a far greater number of nodes to enter the deadnodes lists. Nevertheless, the overflow occurs eventually due to a bug in cleanup_dead_nodes(), and it might occur in ISC BIND too.

The bug occurs due to the ECS cache changes, but it can occur even when ECS is disabled as the modified cache is always used.

diff --git a/lib/dns/rbtdb.c b/lib/dns/rbtdb.c
index f553684..96d1d41 100644
--- a/lib/dns/rbtdb.c
+++ b/lib/dns/rbtdb.c
@@ -2204,20 +2204,36 @@ cleanup_dead_nodes(dns_rbtdb_t *rbtdb, int bucketnum) {
 
                 if (is_leaf(node) && rbtdb->task != NULL) {
                         send_to_prune_tree(rbtdb, node, isc_rwlocktype_write);
-               } else if (node->down == NULL && node->data == NULL) {
+               } else if (node->down == NULL &&
+                          (node->data == NULL ||
+                           ((((rbtdb_data_t *) node->data)->nonecs_data == NULL) &&
+                            (((rbtdb_data_t *) node->data)->ecs_root == NULL))))
+               {
                         /*
                          * Not a interior node and not needing to be
                          * reactivated.
                          */
                         delete_node(rbtdb, node);
-               } else if (node->data == NULL) {
+               } else if (node->data == NULL ||
+                           ((((rbtdb_data_t *) node->data)->nonecs_data == NULL) &&
+                            (((rbtdb_data_t *) node->data)->ecs_root == NULL)))
+                {
                         /*
                          * A interior node without data. Leave linked to
                          * to be cleaned up when node->down becomes NULL.
                          */
                         ISC_LIST_APPEND(rbtdb->deadnodes[bucketnum], node,
                                         deadlink);
+               } else {
+                       /*
+                        * XXXMUKS: Here, node has been removed from the
+                        * deadnodes[bucketnum] list, and appears to
+                        * fall off the list. It should be commented so
+                        * here, as the implicit removal appears
+                        * surprising.
+                        */
                 }
+
                 node = ISC_LIST_HEAD(rbtdb->deadnodes[bucketnum]);
                 count--;
         }

There also seems to be a minor bug in KEEP_NODE(). However the cache cleans up properly nevertheless and this part of the sub-expression which checks node->down may be unnecessary:

diff --git a/lib/dns/rbtdb.c b/lib/dns/rbtdb.c
index f553684..5e408bc 100644
--- a/lib/dns/rbtdb.c
+++ b/lib/dns/rbtdb.c
@@ -2309,7 +2309,7 @@ decrement_reference(dns_rbtdb_t *rbtdb, dns_rbtnode_t *node,
          ((n)->data != NULL &&                                     \
           (((((rbtdb_data_t *)(n)->data)->nonecs_data) != NULL) || \
            ((((rbtdb_data_t *)(n)->data)->ecs_root) != NULL))) ||  \
-        ((l) && (n)->down != NULL) || (n) == (r)->origin_node ||  \
+        (!(l) || (n)->down != NULL) || (n) == (r)->origin_node ||  \
          (n) == (r)->nsec3_origin_node)
 
         /* Handle easy and typical case first. */

Edited Jul 18, 2022 by Petr Špaček

To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information