Multiple RRSIGs on some records in signed zone even though only one key is ever active at a time
named sometimes creates new RRSIGs for some records even though those records already have current, valid RRSIGs from a now-inactive key that is still published.
Steps to reproduce
Reproducing this may require that signing of the zone never completed with one or more of the earlier keys.
- create a KSK and at least four ZSKs
- adjust the attached reset_keys.sh for the keys
- run reset_keys.sh immediately prior to starting named. This schedules three key rolls: now+300s, now+600s, and now+660s, including key deletion
- the result is observable before named is finished working through the scheduled keyevents
- pipe the printed zone journal into the attached check_journal.pl, and it will report on changes to key status (as reflected in the TYPE65534 records) and on when, if ever, extra signatures are generated (I've attached the output from one of my test runs)
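For readers without the attachment, the kind of schedule described in the steps above can be sketched roughly as follows. This is not the attached reset_keys.sh. It is a guess at its shape that only prints dnssec-settime command lines using the -P, -A, -I and -D timing options with offsets in seconds. The key file names and the delete_delay value are hypothetical, as is the assumption that each roll retires exactly one key and activates its successor.

```python
def settime_commands(key_names, roll_offsets, delete_delay):
    """Build dnssec-settime command lines for a chain of ZSK rolls.

    key_names:    key file base names in succession order (hypothetical names).
    roll_offsets: seconds from now at which each roll takes place.
    delete_delay: seconds after retirement at which the retired key is
                  deleted (an assumed value, not taken from the report).
    """
    if len(key_names) != len(roll_offsets) + 1:
        raise ValueError("need exactly one more key than rolls")

    timing = {name: [] for name in key_names}
    # Every key is published immediately; only the first is active at once.
    for name in key_names:
        timing[name] += ["-P", "+0"]
    timing[key_names[0]] += ["-A", "+0"]

    for i, offset in enumerate(roll_offsets):
        old, new = key_names[i], key_names[i + 1]
        # The successor becomes active at the moment the old key is retired.
        timing[new] += ["-A", f"+{offset}"]
        timing[old] += ["-I", f"+{offset}", "-D", f"+{offset + delete_delay}"]

    return [" ".join(["dnssec-settime", *timing[name], name]) for name in key_names]


if __name__ == "__main__":
    # Hypothetical key file names; the roll offsets are the ones from the report.
    keys = [f"Kexample.test.+008+0000{n}" for n in range(1, 5)]
    for command in settime_commands(keys, [300, 600, 660], delete_delay=300):
        print(command)
```

Printing the commands rather than running them keeps the sketch free of side effects; the real script sets the times directly.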
What is the current bug behaviour?
Starting with the zone update in which the TYPE65534 RR for the original key (40278 in my test run) is removed (serial number 1996073706 in my file), named signs RRsets with the currently active key (20856 in my test run) even though they already have current, valid signatures from the previous key (21894 in my test run), which is still published. This behaviour is not permanent, but it is not known what ends it.
If no keys are given deletion times, the bug does not manifest.
What is the expected correct behaviour?
If only one ZSK is ever active at a time, there should never be current, valid RRSIGs from multiple keys for any given RRset.
More specifically, given a sequence of successor keys A, B, C, D, E, the signing behaviour during the transition from key D to key E doesn't seem like it should depend on whether or not it happens while key A is being deleted from the zone, but that is what has been observed.
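The expectation above can be checked mechanically against a zone dump or query output. The following is a minimal sketch, not the attached check_journal.pl: it assumes RRSIG records in the usual presentation format with TTL and class present (owner, TTL, class, RRSIG, type covered, algorithm, labels, original TTL, expiration, inception, key tag, signer, signature), and it needs to be told which key tags are ZSKs so that the DNSKEY RRset, which is legitimately signed by the KSK as well, is not reported. The function names and the example.test. zone are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone


def parse_rrsig(line):
    """Parse one RRSIG record line; return None for anything else."""
    f = line.split()
    if len(f) < 13 or f[3] != "RRSIG":
        return None

    def ts(text):
        # RRSIG timestamps are written as YYYYMMDDHHmmSS in UTC.
        return datetime.strptime(text, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)

    return {
        "owner": f[0].lower(),
        "covered": f[4],
        "expires": ts(f[8]),
        "inception": ts(f[9]),
        "keytag": int(f[10]),
    }


def multiply_signed(lines, now, zsk_tags):
    """Return RRsets holding currently valid RRSIGs from more than one ZSK.

    The result maps (owner, type covered) to the sorted list of key tags.
    """
    tags = defaultdict(set)
    for line in lines:
        sig = parse_rrsig(line)
        if sig is None or sig["keytag"] not in zsk_tags:
            continue
        if sig["inception"] <= now < sig["expires"]:
            tags[(sig["owner"], sig["covered"])].add(sig["keytag"])
    return {rrset: sorted(found) for rrset, found in tags.items() if len(found) > 1}


if __name__ == "__main__":
    # Made-up records using the key tags from the test run described above.
    sample = [
        "example.test. 300 IN RRSIG A 8 2 300 20300101000000 20200101000000 21894 example.test. c2ln",
        "example.test. 300 IN RRSIG A 8 2 300 20300101000000 20200101000000 20856 example.test. c2ln",
        "example.test. 300 IN RRSIG MX 8 2 300 20300101000000 20200101000000 20856 example.test. c2ln",
    ]
    now = datetime(2024, 1, 1, tzinfo=timezone.utc)
    print(multiply_signed(sample, now, {40278, 21894, 20856}))
```

With only one active ZSK at a time and no bug, the returned mapping would be expected to stay empty throughout the key rolls.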
It is not known whether or not there are "normal" configurations and/or keyroll policies that can trigger this bug.
The original reporter encountered it while testing their keyroll scripting by moving the system clock ahead by 30 days relatively soon after a keyroll was performed. The test procedure here was created to reproduce the observed problem without having to modify the system clock.
Relevant configuration files
Relevant logs and/or screenshots
This has been observed in both 9.9.4 and a recent master (commit 49849155). I have rr recordings from both versions and can arrange to upload them if requested.