expired RRSIGs / incorrect re-sign jitter
I've been having some weird re-signing problems with some zones on my
toy/dev server (running 9.17.0). The problems show up as errors from my
CDS checking script, when it looks to see if the delegation for
dev.dns.cam.ac.uk needs updating. It gets the DNSKEY RRset via a resolver,
and sometimes discovers that the RRSIG expired while the RRset was in the
resolver cache. For example, this morning I saw:
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1
dev.dns.cam.ac.uk. 23781 IN RRSIG DNSKEY 13 5 86400 (
20200327105542 20200317095543 55007 dev.dns.cam.ac.uk.
d8oiuuOMYH+fwaGSEjsVDUAdcVPGex75j1Ym5ib+dbyt
V1vtWA4BSl1EboVjfEjZIt57bizzJzB6SKtV87Gscg== )
My zones are configured with sig-validity-interval 10 8; so zones should
be re-signed every 2 days and have at least 8 days between now and the RRSIG
expiration times. The zones have a max TTL of 24h and a SOA EXPIRE timer
of 7 days so expired RRSIGs should never occur.
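The arithmetic behind that claim can be sketched in a few lines of Python. This is only a minimal illustration: the variable names are made up for this example, and the worst-case model (a secondary serving stale data for the full SOA EXPIRE, followed by a resolver caching it for the max TTL) is an assumed reading of why these numbers should be safe.

```python
from datetime import timedelta

# Values taken from the configuration described above.
validity = timedelta(days=10)        # first argument of sig-validity-interval
resign_before = timedelta(days=8)    # second argument: re-sign this long before expiry
max_ttl = timedelta(hours=24)
soa_expire = timedelta(days=7)

# How often a given RRSIG should be replaced.
resign_period = validity - resign_before

# Assumed worst case: a secondary keeps serving the zone for SOA EXPIRE
# without a refresh, and a resolver then caches a record for the max TTL.
worst_case_staleness = soa_expire + max_ttl

print("re-sign period:", resign_period)
print("worst-case staleness:", worst_case_staleness)
print("margin is sufficient:", worst_case_staleness <= resign_before)
```

Under those assumptions the 8-day margin exactly covers the 7 days plus 24 hours, which is why an expired RRSIG turning up in a resolver cache points at the signer rather than at the configuration.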
I've just encountered something that suggests why my expired RRSIGs are
happening. By the time I started investigating the problem above, that
record had been re-signed and the dev.dns.cam.ac.uk zone was all OK, but
one of the other zones was not. The RRSIGs on the TXT record (and a couple
of NSEC records) were due to expire too soon:
cb4.eu. 3600 IN RRSIG TXT 13 2 3600 20200328094609 20200318093210 ...
I frobbed the TXT records with nsupdate (well, actually nsvi!) and instead of the problem being fixed it was made worse! The TXT and SOA were updated with expiration times that were way too soon! I have not noticed this error happen with an nsupdate before.
cb4.eu. 3600 IN RRSIG SOA 13 2 3600 20200331005712 20200327113058 ...
cb4.eu. 3600 IN RRSIG TXT 13 2 3600 20200331005712 20200327113058 ...
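To put a number on "way too soon", the timestamps above can be turned into a lifetime at signing time. The following is a rough Python sketch, not anything taken from BIND: the helper names are hypothetical, and it assumes the inception time is backdated by one hour relative to the moment of signing (the first RRSIG quoted above, with almost exactly 10 days plus 1 hour between inception and expiration, is consistent with that).

```python
from datetime import datetime, timedelta, timezone

MARGIN = timedelta(days=8)      # second argument of sig-validity-interval
BACKDATE = timedelta(hours=1)   # assumption: inception = signing time - 1 hour

def rrsig_time(stamp):
    """Parse an RRSIG YYYYMMDDHHMMSS timestamp (always UTC)."""
    return datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)

def lifetime_at_signing(inception, expiration):
    """Time between the (assumed) signing moment and the expiration."""
    signed_at = rrsig_time(inception) + BACKDATE
    return rrsig_time(expiration) - signed_at

# Timestamps copied from the RRSIGs produced by the nsupdate.
life = lifetime_at_signing("20200327113058", "20200331005712")
print("lifetime at signing:", life)
print("shorter than the 8-day margin:", life < MARGIN)
```

With these inputs the freshly generated signatures had only about three and a half days to live, well inside the 8 days that should have been left untouched.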
The re-sign annotations in the zone are also wrong for these records (well, I suppose they can't be in the past!)
$ named-compilezone -j -f raw -o /dev/stdout cb4.eu cb4.eu
cb4.eu. 3600 IN SOA onyx.dotat.at. dot.dotat.at. 7506 3600 3600 604800 3600
cb4.eu. 3600 IN RRSIG SOA 13 2 3600 20200331005712 20200327113058 ...
; resign=20200331005712
...
cb4.eu. 3600 IN TXT "v=spf1 include:spf.messagingengine.com ?all"
cb4.eu. 3600 IN RRSIG TXT 13 2 3600 20200331005712 20200327113058 ...
; resign=20200331005712
...
I have not yet investigated the code, but it looks to me like the expiration jitter is covering the whole 10-day validity period, whereas it should only cover the 2-day re-signing period - i.e. the expiry times must not fall within the 8-day safety margin.
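The difference between the two behaviours can be illustrated with a small simulation. This is a sketch of the hypothesis only, written without reference to the BIND source: the function names are invented, and the uniform distribution of the jitter is an assumption made for the sake of the example.

```python
import random
from datetime import timedelta

VALIDITY = timedelta(days=10)
RESIGN_BEFORE = timedelta(days=8)
RESIGN_PERIOD = VALIDITY - RESIGN_BEFORE  # 2 days

def lifetime_expected(rng):
    """Jitter limited to the re-signing period (what should happen)."""
    jitter = rng.uniform(0, RESIGN_PERIOD.total_seconds())
    return VALIDITY - timedelta(seconds=jitter)

def lifetime_suspected(rng):
    """Jitter spread over the whole validity period (the suspected bug)."""
    jitter = rng.uniform(0, VALIDITY.total_seconds())
    return VALIDITY - timedelta(seconds=jitter)

rng = random.Random(0)  # fixed seed so the run is repeatable
expected = [lifetime_expected(rng) for _ in range(10000)]
suspected = [lifetime_suspected(rng) for _ in range(10000)]

# With correct jitter no signature starts life inside the safety margin.
print("expected model always >= 8 days:", min(expected) >= RESIGN_BEFORE)

# With jitter over the full validity period most signatures do.
too_short = sum(life < RESIGN_BEFORE for life in suspected) / len(suspected)
print("suspected model, fraction below 8 days:", too_short)
```

In the suspected model roughly four signatures in five would begin with less than 8 days of validity, which would be consistent with the expiry times seen after the nsupdate.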