BIND 9 doesn't retransmit unacknowledged TCP SYNs during recursion
### Summary
During recursive resolution, after falling back to TCP, `named` doesn't retransmit its SYN if the first one is not acknowledged. This breaks ECN fallback, so BIND fails against (broken) servers that drop SYN packets with the ECN flags set.

With ECN fallback, if the first SYN (sent with the ECN flags set) is not acknowledged, the retransmitted SYN has the ECN flags cleared, in case something along the path dropped the packet because of them. (This should not happen, but some broken TCP implementations do it.) Interestingly, running `dig` directly against such a broken DNS server works correctly: it retransmits without the ECN flags and the lookup succeeds. But for some reason, when `named` performs a recursive lookup and the first SYN isn't acknowledged, it just gives up.
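To make the failure mode concrete, here is a toy model of SYN retransmission with ECN fallback in the spirit of RFC 3168, section 6.1.1.1. It is only an illustration, not BIND or Linux kernel code: `Syn`, `broken_server_replies` and `connect` are hypothetical names, the peer is assumed to drop every ECN-setup SYN (ECE and CWR set), and timers are not modeled. Treating `named`'s behavior as "give up after a single SYN" is an assumption drawn from the symptom described above.

```python
"""Toy model of TCP SYN retransmission with ECN fallback (RFC 3168, 6.1.1.1).

Illustrative simulation only; not BIND or Linux kernel code.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Syn:
    ecn_setup: bool  # True if the SYN carries the ECE and CWR flags


def broken_server_replies(syn: Syn) -> bool:
    """Hypothetical broken peer: silently drops any ECN-setup SYN."""
    return not syn.ecn_setup


def connect(max_syns: int, ecn_fallback: bool) -> bool:
    """Return True if a SYN-ACK arrives within max_syns attempts.

    The first SYN is always ECN-setup. With ecn_fallback enabled, every
    retransmitted SYN is sent with the ECN flags cleared.
    """
    for attempt in range(max_syns):
        syn = Syn(ecn_setup=(attempt == 0 or not ecn_fallback))
        if broken_server_replies(syn):
            return True
    return False


# A client that keeps retransmitting succeeds on the second (non-ECN) SYN.
print(connect(max_syns=3, ecn_fallback=True))
# A client that gives up after the first SYN never reaches the fallback.
print(connect(max_syns=1, ecn_fallback=True))
```

The point of the model is that the fallback only ever helps on a *retransmission*, so any client that abandons the connection after the first unanswered SYN sees the broken server as unreachable, regardless of the kernel's fallback setting.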
### BIND version used

```
BIND 9.11.4-3ubuntu5-Ubuntu (Extended Support Version) <id:2fe4344>
running on Linux x86_64 4.18.0-13-generic #14-Ubuntu SMP Wed Dec 5 09:04:24 UTC 2018
built by make with '--build=x86_64-linux-gnu' '--prefix=/usr' '--includedir=/usr/include'
'--mandir=/usr/share/man' '--infodir=/usr/share/info' '--sysconfdir=/etc' '--localstatedir=/var'
'--disable-silent-rules' '--libdir=/usr/lib/x86_64-linux-gnu' '--libexecdir=/usr/lib/x86_64-linux-gnu'
'--disable-maintainer-mode' '--disable-dependency-tracking' '--libdir=/usr/lib/x86_64-linux-gnu'
'--sysconfdir=/etc/bind' '--with-python=python3' '--localstatedir=/' '--enable-threads' '--enable-largefile'
'--with-libtool' '--enable-shared' '--enable-static' '--with-gost=no' '--with-openssl=/usr'
'--with-gssapi=/usr' '--with-libidn2' '--with-libjson=/usr' '--without-lmdb' '--with-gnu-ld'
'--with-geoip=/usr' '--with-atf=no' '--enable-ipv6' '--enable-rrl' '--enable-filter-aaaa'
'--enable-native-pkcs11' '--with-pkcs11=/usr/lib/softhsm/libsofthsm2.so' '--with-randomdev=/dev/urandom'
'build_alias=x86_64-linux-gnu' 'CFLAGS=-g -O2 -fdebug-prefix-map=/build/bind9-JzrX_5/bind9-9.11.4+dfsg=.
-fstack-protector-strong -Wformat -Werror=format-security -fno-strict-aliasing
-fno-delete-null-pointer-checks -DNO_VERSION_DATE -DDIG_SIGCHASE' 'LDFLAGS=-Wl,-Bsymbolic-functions
-Wl,-z,relro -Wl,-z,now' 'CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2'
compiled by GCC 8.2.0
compiled with OpenSSL version: OpenSSL 1.1.1 11 Sep 2018
linked to OpenSSL version: OpenSSL 1.1.1 11 Sep 2018
compiled with libxml2 version: 2.9.4
linked to libxml2 version: 20904
compiled with libjson-c version: 0.12.1
linked to libjson-c version: 0.12.1
compiled with zlib version: 1.2.11
linked to zlib version: 1.2.11
threads support is enabled
```
### Steps to reproduce
1. Enable ECN on outgoing connections (as root): `echo 1 > /proc/sys/net/ipv4/tcp_ecn`
2. Flush the cache: `rndc flush`
3. Query the name: `dig www.sprint.net`
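Before reproducing, it may help to confirm what the relevant sysctls are actually set to. The sketch below reads `tcp_ecn` and `tcp_ecn_fallback` from `/proc/sys/net/ipv4` and describes `tcp_ecn` using the meanings from the kernel's ip-sysctl documentation (0: off; 1: request ECN on outgoing connections and accept it on incoming ones; 2, the default: accept ECN on incoming connections only). The helper names `describe_tcp_ecn` and `read_sysctl` are hypothetical, and the script assumes a Linux host; elsewhere it simply reports the values as unavailable.

```python
"""Report the Linux ECN sysctls relevant to reproducing this issue.

Value meanings follow the kernel's ip-sysctl documentation; Linux-only.
"""
from pathlib import Path
from typing import Optional

TCP_ECN_MEANINGS = {
    0: "disabled: neither request nor accept ECN",
    1: "request ECN on outgoing connections and accept it on incoming ones",
    2: "accept ECN on incoming connections only (kernel default)",
}


def describe_tcp_ecn(value: int) -> str:
    """Map a tcp_ecn value to a short human-readable description."""
    return TCP_ECN_MEANINGS.get(value, f"unknown value {value}")


def read_sysctl(name: str) -> Optional[int]:
    """Read an integer sysctl under /proc/sys/net/ipv4, or None if unavailable."""
    path = Path("/proc/sys/net/ipv4") / name
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    ecn = read_sysctl("tcp_ecn")
    fallback = read_sysctl("tcp_ecn_fallback")
    print("tcp_ecn:", ecn, "-", describe_tcp_ecn(ecn) if ecn is not None else "unavailable")
    # 1 means the kernel clears the ECN flags on retransmitted SYNs.
    print("tcp_ecn_fallback:", fallback)
```

With step 1 applied, `tcp_ecn` should read 1, which is what makes the outgoing SYNs ECN-setup SYNs in the first place.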
### What is the current *bug* behavior?
The query returns a `SERVFAIL` response.
### What is the expected *correct* behavior?
```
; <<>> DiG 9.11.4-3ubuntu5-Ubuntu <<>> www.sprint.net
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 22391
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;www.sprint.net. IN A
;; ANSWER SECTION:
www.sprint.net. 3600 IN A 208.24.22.50
;; Query time: 88 msec
;; SERVER: 192.168.1.254#53(192.168.1.254)
;; WHEN: Thu Dec 20 20:46:16 PST 2018
;; MSG SIZE rcvd: 59
```
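When repeating the reproduction (for example, with and without step 1), the only thing that needs checking is the `status:` field in `dig`'s header line: `NOERROR` as above versus `SERVFAIL` in the failing case. A minimal sketch for scripting that check, assuming `dig` is on `PATH`; `dig_status` and `run_dig` are hypothetical helper names, and the parsing relies only on the `->>HEADER<<-` line format visible in the output above.

```python
"""Extract the response status from dig's header line to script the repro."""
import re
import subprocess
from typing import Optional

# Matches e.g. ";; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 22391"
STATUS_RE = re.compile(r"->>HEADER<<-.*\bstatus: ([A-Z]+)")


def dig_status(output: str) -> Optional[str]:
    """Return the status (NOERROR, SERVFAIL, ...) from dig output, or None."""
    match = STATUS_RE.search(output)
    return match.group(1) if match else None


def run_dig(name: str, server: Optional[str] = None) -> Optional[str]:
    """Run dig for name (optionally @server) and return the response status."""
    cmd = ["dig", name] if server is None else ["dig", f"@{server}", name]
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    return dig_status(result.stdout)
```

Calling `run_dig("www.sprint.net")` after `rndc flush` would then be expected to return `"SERVFAIL"` while this bug is present and `"NOERROR"` once it is fixed.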