Fatal error at adb.c:1648:expire_entry() during "stress" test in RPZ mode
One of the RPZ-mode "stress" test jobs failed overnight with a crash that I think does not match any other open (or recently closed) GitLab issue:
https://gitlab.isc.org/isc-projects/bind9/-/jobs/3019884
23-Dec-2022 02:54:46.074 general: critical: adb.c:1648:expire_entry(): fatal error:
23-Dec-2022 02:54:46.074 general: critical: RUNTIME_CHECK(result == ISC_R_SUCCESS) failed
23-Dec-2022 02:54:46.074 general: critical: exiting (due to fatal error in library)
The relevant code location is:
1639     static void
1640     expire_entry(dns_adbentry_t *adbentry) {
1641         isc_result_t result;
1642         dns_adb_t *adb = adbentry->adb;
1643
1644         adbentry->flags |= ENTRY_IS_DEAD;
1645
1646         result = isc_hashmap_delete(adb->entries, NULL, &adbentry->sockaddr,
1647                                     sizeof(adbentry->sockaddr));
1648 >>>     RUNTIME_CHECK(result == ISC_R_SUCCESS);
1649         ISC_LIST_UNLINK(adb->entries_lru, adbentry, link);
1650
1651         dns_adbentry_detach(&adbentry);
1652     }
Artifacts for the job have been retained; these include a core dump in output/ns4/. The logs pasted above come from output/ns4/named.log.
While result has been optimized out and so its value is not readily available in GDB, looking at the source code of isc_hashmap_delete(), I cannot see how it could return any error other than ISC_R_NOTFOUND.
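To illustrate how a delete call can return a "not found" result, here is a minimal, self-contained sketch. It is not the real isc_hashmap code and not a diagnosis of this crash: every toy_* name, the fixed-size table, and the sample key are hypothetical stand-ins. It only shows that deleting a key that is no longer in the table (for example, because it was already removed once) yields NOTFOUND, which is the kind of result a RUNTIME_CHECK-style assertion on success would trip over.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; NOT the real isc_result_t or isc_hashmap API. */
typedef enum {
	TOY_R_SUCCESS = 0,
	TOY_R_NOTFOUND,
	TOY_R_EXISTS,
	TOY_R_NOSPACE
} toy_result_t;

#define TOY_SLOTS  8
#define TOY_KEYLEN 16

typedef struct {
	bool used;
	unsigned char key[TOY_KEYLEN];
	size_t keylen;
} toy_slot_t;

typedef struct {
	toy_slot_t slots[TOY_SLOTS];
} toy_map_t;

/* Linear scan for a slot holding exactly this key; NULL if absent. */
static toy_slot_t *
toy_find(toy_map_t *map, const void *key, size_t keylen) {
	for (size_t i = 0; i < TOY_SLOTS; i++) {
		toy_slot_t *s = &map->slots[i];
		if (s->used && s->keylen == keylen &&
		    memcmp(s->key, key, keylen) == 0) {
			return s;
		}
	}
	return NULL;
}

/* Insert a key; fails if it is too long, already present, or the table is full. */
toy_result_t
toy_add(toy_map_t *map, const void *key, size_t keylen) {
	if (keylen > TOY_KEYLEN) {
		return TOY_R_NOSPACE;
	}
	if (toy_find(map, key, keylen) != NULL) {
		return TOY_R_EXISTS;
	}
	for (size_t i = 0; i < TOY_SLOTS; i++) {
		toy_slot_t *s = &map->slots[i];
		if (!s->used) {
			memcpy(s->key, key, keylen);
			s->keylen = keylen;
			s->used = true;
			return TOY_R_SUCCESS;
		}
	}
	return TOY_R_NOSPACE;
}

/* Delete a key; the only failure mode in this toy is "not found". */
toy_result_t
toy_delete(toy_map_t *map, const void *key, size_t keylen) {
	toy_slot_t *s = toy_find(map, key, keylen);
	if (s == NULL) {
		return TOY_R_NOTFOUND;
	}
	s->used = false;
	return TOY_R_SUCCESS;
}

/* Add a key, delete it twice, and return the result of the second delete. */
toy_result_t
toy_double_delete_demo(void) {
	toy_map_t map = { 0 };
	const char key[] = "192.0.2.1#53"; /* made-up key for illustration */

	toy_result_t r = toy_add(&map, key, sizeof(key));
	assert(r == TOY_R_SUCCESS);
	r = toy_delete(&map, key, sizeof(key));
	assert(r == TOY_R_SUCCESS);

	return toy_delete(&map, key, sizeof(key));
}
```

In this sketch, toy_double_delete_demo() returns TOY_R_NOTFOUND. The same result would come from deleting a key that was never inserted, or one looked up with different key bytes than were used at insertion time; the sketch does not tell these cases apart.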
This code only exists in main, so I do not think this issue needs to be confidential.
AFAICT, any resolver could hit this issue, and it is not a shutdown issue.