
Test serve-stale behavior in a shared cache setup

Add a test that attempts to trigger the specific sequence of events that caused a crash before commit bbd163ac:

  1. Two views, view A and view B, are attached to a shared cache. Serve-stale is enabled; "stale-answer-client-timeout" is set to a positive integer.

  2. The following DNS response chain is cached:

    cname.selective.    5     IN    CNAME    a.selective.
    a.selective.        10    IN    A        10.53.0.2

  3. cname.selective/CNAME expires from cache. a.selective/A remains active.

  4. Both view A and view B are queried for cname.selective/A.

  5. The resolvers for both views start recursion due to the cname.selective/CNAME record being expired.

  6. The resolver for view A successfully resolves the query. Due to packet loss, the resolver for view B fails to resolve the query and keeps querying the authoritative servers.

  7. "stale-answer-client-timeout" fires for cname.selective/A in view B. Since the resolver for view A managed to resolve cname.selective/CNAME in the meantime and the a.selective/A record has not expired from cache yet, the final answer for the client query received by view B is readily available and is therefore sent back to the client.

  8. The a.selective/A record expires from cache.

  9. The resolver for view B manages to resolve cname.selective/CNAME and resumes resolution for its target name, i.e. a.selective/A. Since the latter expired from cache (in step 8), recursive resolution is started for a.selective/A.

  10. Due to packet loss, the a.selective/A queries that the resolver for view B sends to authoritative servers remain unanswered.

  11. "stale-answer-client-timeout" fires for a.selective/A in view B. The resolver for view B finds a stale a.selective/A record and attempts to send it back to the client. However, the a.selective/A record was already added to the response (and sent back to the client) in step 7. named crashes due to an assertion failure.
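
The timing constraints implied by the steps above can be sketched as a small check. This is a simplified model, not the test itself: it assumes both records are cached at the same instant, expresses the client timeout in seconds for readability, and uses hypothetical names (`Timeline`, `hits_crash_window`, `cname_resolved_in_b`) that do not appear in the BIND source or in the test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timeline:
    """Cache state from step 2 (TTLs in seconds, taken from the records above)."""
    cname_ttl: float = 5.0
    a_ttl: float = 10.0
    cached_at: float = 0.0  # assumption: both records enter the cache together

    @property
    def cname_expiry(self) -> float:
        return self.cached_at + self.cname_ttl

    @property
    def a_expiry(self) -> float:
        return self.cached_at + self.a_ttl


def hits_crash_window(query_time: float,
                      client_timeout_s: float,
                      cname_resolved_in_b: float,
                      tl: Timeline = Timeline()) -> bool:
    """Return True if the ordering described in steps 3 to 11 can occur.

    query_time          -- when both views receive the client query (step 4)
    client_timeout_s    -- "stale-answer-client-timeout", here in seconds
    cname_resolved_in_b -- when view B finally resolves the CNAME (step 9)
    """
    first_timeout = query_time + client_timeout_s  # step 7
    return (
        client_timeout_s > 0                        # step 1: positive timeout
        and tl.cname_expiry < query_time            # steps 3-5: CNAME expired
        and first_timeout < tl.a_expiry             # step 7: A record still fresh
        and cname_resolved_in_b > tl.a_expiry       # steps 8-9: A expired by then
    )
```

Under this model, a query sent at t=6 with a 1.8-second timeout and a CNAME resolution in view B at t=11 satisfies the ordering, whereas a 5-second timeout does not, because the first timeout would fire only after a.selective/A has expired. This is why the test depends on timing.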

With the right timing, the new test causes affected named versions to crash with the following assertion failure:

query.c:8250: INSIST(qctx->rdataset == ((void *)0) || qctx->qtype == ((dns_rdatatype_t)dns_rdatatype_dname))
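
When checking whether a named instance hit this failure, a line in the format shown above can be split into its parts. The following is a minimal sketch that assumes the line starts with `file:line:` followed by an assertion macro name; the helper `parse_assertion` is hypothetical and not part of the test suite, and the set of macro names it accepts is an assumption.

```python
import re

# Assumed shape: "<file>:<line>: <MACRO>(<expression>)", optionally followed
# by trailing text. The macro names listed here are an assumption.
ASSERTION_RE = re.compile(
    r"^(?P<file>[\w./-]+):(?P<line>\d+): "
    r"(?P<kind>INSIST|REQUIRE|ENSURE)\((?P<expr>.*)\)"
)


def parse_assertion(line: str):
    """Return (file, line_number, kind, expression), or None if no match."""
    match = ASSERTION_RE.match(line.strip())
    if match is None:
        return None
    return (
        match.group("file"),
        int(match.group("line")),
        match.group("kind"),
        match.group("expr"),
    )
```

Matching on the file name and the expression rather than the line number alone is likely more robust, since the line number changes between versions.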

Closes #4287 (closed)
