[ISC-support #21600] Hang after interrupted transfer via XoT
https://support.isc.org/Ticket/Display.html?id=21600 https://support.isc.org/Ticket/Display.html?id=21804 (seems related)
Two customers have reported that after an XoT transfer was interrupted, a BIND 9.18.11 secondary would fail to attempt another transfer. One mentioned an interruption caused by a primary restart; the other stated it occurs over a poor network path. The hang appears to be more easily reproduced with dig, using 'dig +timeout=5 +retry=0 +tls AXFR'. The hung dig process must be killed from another login, since Ctrl-C will not terminate it:
root@testauth-sec:/var/named# dig +tls @testauth-pri example.com AXFR | tail
host-1283008.example.com. 300 IN A 4.5.6.7
host-1283009.example.com. 300 IN A 4.5.6.7
host-128301.example.com. 300 IN A 4.5.6.7
host-1283010.example.com. 300 IN A 4.5.6.7
host-1283011.example.com. 300 IN A 4.5.6.7
host-1283012.example.com. 300 IN A 4.5.6.7
host-1283013.example.com. 300 IN A 4.5.6.7
host-1283014.example.com. 300 IN A 4.5.6.7
;; communications error to 192.168.101.116#853: connection reset
;; TLS peer certificate verification for 192.168.101.116#853 failed: self signed certificate
root@testauth-sec:/var/named# dig +tls @testauth-pri example.com AXFR | tail
^C
Because it is difficult to time the primary restart to coincide with the secondary's transfer requests, the hang was observed only once with 'named', and no usable core file could be obtained because the symbols were stripped. Hours of trying with an unstripped installation did not reproduce the hang. This issue will be updated if further testing can trigger the hang with 'named'. The problem appears to affect the receiving side of the transfer, since other 'dig' instances can complete transfers successfully while a prior one is hung.
Method of reproducing:
- configure BIND with TLS and a big zone that takes 20 seconds or more to load
- prepare a dig command against the primary, using "+tls +timeout=5 +retry=0"
- start BIND and wait until it is listening on the TLS port (takes only a few milliseconds)
- issue the prepared dig command
- dig will hang and never return to the command line
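The last two steps above could be automated with a small wrapper that runs dig under a hard deadline and reports whether it had to be killed. This is only a sketch: the function name `probe_hang` and the 60-second deadline are assumptions and not part of the original report, and it relies on the coreutils `timeout` utility being installed. SIGKILL is used because, as noted above, Ctrl-C does not terminate the hung dig.

```shell
#!/bin/sh
# Hypothetical helper (not from the ticket): run a command under a hard
# deadline and print "hung", "completed", or "failed(<status>)".
# Usage: probe_hang <deadline_seconds> <command> [args...]
probe_hang() {
    deadline=$1
    shift
    # SIGKILL cannot be caught or ignored, so a hung process is always reaped.
    timeout -s KILL "$deadline" "$@" >/dev/null 2>&1
    status=$?
    case $status in
        # 124: timeout's own "timed out" status; 137: 128 + SIGKILL (9).
        124|137) echo "hung" ;;
        0)       echo "completed" ;;
        *)       echo "failed($status)" ;;
    esac
}

# Example invocation, using the host and zone names from the transcript above
# and an assumed 60-second deadline:
# probe_hang 60 dig +tls +timeout=5 +retry=0 @testauth-pri example.com AXFR
```

A "hung" result only means the command was still running at the deadline, so the deadline should be comfortably longer than a normal transfer of the test zone.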
A .txz containing the dig binary and a core file (taken via gcore during the hang) is attached.
Marking as confidential, just in case.