A lock-order inversion could occur while shutting down BIND when three or more threads were involved.
To demonstrate the problem, let's consider the case detected in the TSAN tests. It involves three threads (T1, T2 and T3) and three mutexes (M1, M2 and M3).
M1 = adb->namelocks[some_bucket]
M2 = some_view->lock (View's lock)
M3 = some_zone->lock (Zone's lock)

T1 acquires M1
    dns_adb_createfind -> find_name_and_lock
T2 acquires M2
    view_flushanddetach
T3 acquires M3
    zone_shutdown

T1 attempts to acquire M2, but it is locked by T2:
    #1 dns_view_find lib/dns/view.c:1040
    #2 dbfind_name lib/dns/adb.c:3833
    #3 dns_adb_createfind lib/dns/adb.c:3198

T2 attempts to acquire M3, but it is locked by T3:
    #1 dns_zone_flush lib/dns/zone.c:11441
    #2 flush lib/dns/zt.c:215
    #3 dns_zt_apply lib/dns/zt.c:537
    #4 zt_destroy lib/dns/zt.c:221
    #5 zt_flushanddetach lib/dns/zt.c:243
    #6 dns_zt_flushanddetach lib/dns/zt.c:249
    #7 view_flushanddetach lib/dns/view.c:645

T3 attempts to acquire M1, but it is locked by T1:
    #1 violate_locking_hierarchy lib/dns/adb.c:1279
    #2 dns_adb_cancelfind lib/dns/adb.c:3457
    #3 notify_cancel lib/dns/zone.c:11796
    #4 zone_shutdown lib/dns/zone.c:14532
To fix the problem, we changed zone_shutdown: before acquiring the zone's lock (M3), it now acquires the lock of the view associated with that zone (M2). This ensures that either T2 or T3 will complete its job. If T3 runs first, it will release M1, so T1 can run to completion; if T2 runs first, it will release M3, allowing T3 to run to completion.
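
The lock ordering described above can be illustrated with a minimal, standalone sketch. It is not the actual BIND code: the types sketch_view_t and sketch_zone_t, the functions sketch_view_flush, sketch_zone_shutdown and sketch_run, and the counters are all hypothetical stand-ins, and plain pthread mutexes are assumed in place of BIND's own locking macros. It models only the T2/T3 part of the scenario: both paths take the view's lock (M2) before the zone's lock (M3), so neither can hold M3 while waiting for M2.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real view and zone structures. */
typedef struct {
	pthread_mutex_t lock; /* plays the role of M2 */
} sketch_view_t;

typedef struct {
	pthread_mutex_t lock; /* plays the role of M3 */
	sketch_view_t *view;  /* view associated with this zone */
	int flushes;          /* protected by lock */
	int shutdowns;        /* protected by lock */
} sketch_zone_t;

#define SKETCH_ROUNDS 10000

/* T2's path: the view's lock (M2) is held while the zone is flushed (M3). */
static void sketch_view_flush(sketch_zone_t *zone) {
	pthread_mutex_lock(&zone->view->lock);
	pthread_mutex_lock(&zone->lock);
	zone->flushes++;
	pthread_mutex_unlock(&zone->lock);
	pthread_mutex_unlock(&zone->view->lock);
}

/* T3's path after the fix: take M2 first, and only then M3. */
static void sketch_zone_shutdown(sketch_zone_t *zone) {
	pthread_mutex_lock(&zone->view->lock);
	pthread_mutex_lock(&zone->lock);
	zone->shutdowns++;
	pthread_mutex_unlock(&zone->lock);
	pthread_mutex_unlock(&zone->view->lock);
}

static void *flush_thread(void *arg) {
	for (int i = 0; i < SKETCH_ROUNDS; i++) {
		sketch_view_flush(arg);
	}
	return NULL;
}

static void *shutdown_thread(void *arg) {
	for (int i = 0; i < SKETCH_ROUNDS; i++) {
		sketch_zone_shutdown(arg);
	}
	return NULL;
}

/* Runs both paths concurrently and returns the number of completed calls. */
static int sketch_run(void) {
	sketch_view_t view;
	sketch_zone_t zone = { .view = &view, .flushes = 0, .shutdowns = 0 };
	pthread_t t2, t3;
	int rc;

	pthread_mutex_init(&view.lock, NULL);
	pthread_mutex_init(&zone.lock, NULL);

	rc = pthread_create(&t2, NULL, flush_thread, &zone);
	assert(rc == 0);
	rc = pthread_create(&t3, NULL, shutdown_thread, &zone);
	assert(rc == 0);
	(void)rc;

	pthread_join(t2, NULL);
	pthread_join(t3, NULL);

	pthread_mutex_destroy(&zone.lock);
	pthread_mutex_destroy(&view.lock);
	return zone.flushes + zone.shutdowns;
}
```

Calling sketch_run() from a test program should return 2 * SKETCH_ROUNDS once both threads have finished. If sketch_zone_shutdown took the zone's lock before the view's lock instead, the two threads could block each other indefinitely.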
Closes #2615