Some recursive DNS lookups fail with SERVFAIL
Summary
Name lookups for two Microsoft hostnames return SERVFAIL from the local BIND resolver. The same names resolve to A records when they are queried directly on the server that this resolver forwards to.
The two names that fail are:
proxyworkerrolein5-his-seas-1.connector.his.msappproxy.net
proxyworkerrolein2-his-eas-1.connector.his.msappproxy.net
Two very similar lookups that succeed:
proxyworkerrolein4-his-seas-1.connector.his.msappproxy.net
proxyworkerrolein0-his-eas-1.connector.his.msappproxy.net
BIND version used
BIND 9.9.6-P1 (Extended Support Version) id:3612d8fb built by make with '--prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--sysconfdir=/etc' '--localstatedir=/var' '--libdir=/usr/lib64' '--includedir=/usr/include/bind' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--with-openssl' '--enable-threads' '--with-libtool' '--enable-runidn' '--with-libxml2=/usr' '--with-gssapi' '--enable-rrl' 'CFLAGS=-fmessage-length=0 -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -g -fno-strict-aliasing' 'LDFLAGS=-L/usr/lib64' compiled by GCC 4.3.4 [gcc-4_3-branch revision 152973] using OpenSSL version: OpenSSL 0.9.8j 07 Jan 2009 using libxml2 version: 2.7.6
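Steps to reproduce
Query each name through the local resolver (127.0.0.1) with dig. For example, a lookup that succeeds: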
iritndhc01:~ # dig proxyworkerrolein4-his-seas-1.connector.his.msappproxy.net.
; <<>> DiG 9.9.6-P1 <<>> proxyworkerrolein4-his-seas-1.connector.his.msappproxy.net.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 58242
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;proxyworkerrolein4-his-seas-1.connector.his.msappproxy.net. IN A
;; ANSWER SECTION:
proxyworkerrolein4-his-seas-1.connector.his.msappproxy.net. 10 IN CNAME proxyworkerrole.4.his-seas-1.cloudapp.net.
proxyworkerrole.4.his-seas-1.cloudapp.net. 10 IN A 13.67.51.8
;; Query time: 86 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Tue May 05 17:14:38 NZST 2020
;; MSG SIZE rcvd: 155
iritndhc01:~ #
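To repeat the comparison for all four names on more than one server without reading dig output by hand, a minimal standard-library Python sketch could send one A query per name and print only the RCODE. This is a hypothetical helper, not part of the original report: the script name `check_names.py` and all function names are illustrative, it sends plain UDP queries without the EDNS OPT record that dig adds, and it decodes only the header, so it will not show differences in the answer section.

```python
import random
import socket
import struct
import sys

FLAG_RD = 0x0100  # "recursion desired" bit, which dig also sets by default
RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL", 3: "NXDOMAIN",
          4: "NOTIMP", 5: "REFUSED"}

# The two failing and the two working names from this report.
NAMES = [
    "proxyworkerrolein5-his-seas-1.connector.his.msappproxy.net.",
    "proxyworkerrolein2-his-eas-1.connector.his.msappproxy.net.",
    "proxyworkerrolein4-his-seas-1.connector.his.msappproxy.net.",
    "proxyworkerrolein0-his-eas-1.connector.his.msappproxy.net.",
]


def build_query(name: str, qid: int, qtype: int = 1) -> bytes:
    """Encode one class-IN question with RD set (qtype 1 = A, 255 = ANY)."""
    header = struct.pack("!HHHHHH", qid, FLAG_RD, 1, 0, 0, 0)
    question = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")  # letter case is sent exactly as given
        question += bytes([len(raw)]) + raw
    question += b"\x00" + struct.pack("!HH", qtype, 1)  # root label, QTYPE, QCLASS=IN
    return header + question


def rcode_of(response: bytes) -> str:
    """Decode the RCODE from the low four bits of the header flags."""
    (flags,) = struct.unpack("!H", response[2:4])
    code = flags & 0x000F
    return RCODES.get(code, str(code))


def ask(server: str, name: str, qtype: int = 1, timeout: float = 3.0) -> str:
    """Send one UDP query to server port 53 and return the reply's RCODE."""
    qid = random.randrange(0x10000)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, qid, qtype), (server, 53))
        reply, _ = sock.recvfrom(4096)
    if len(reply) < 12 or reply[:2] != struct.pack("!H", qid):
        raise ValueError("short reply or mismatched query ID")
    return rcode_of(reply)


if __name__ == "__main__":
    # Usage (hypothetical): python3 check_names.py 127.0.0.1 10.32.0.153
    for server in sys.argv[1:]:
        for name in NAMES:
            print(server, name, ask(server, name))
```

Passing `qtype=255` to `ask` sends an ANY query instead, which could be used to replay the ANY-then-A sequence shown later in this report.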
What is the current bug behavior?
Querying a failing name through the local resolver returns SERVFAIL with an empty answer section:
iritndhc01:~ # dig proxyworkerrolein2-his-eas-1.connector.his.msappproxy.net.
; <<>> DiG 9.9.6-P1 <<>> proxyworkerrolein2-his-eas-1.connector.his.msappproxy.net.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 64892
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;proxyworkerrolein2-his-eas-1.connector.his.msappproxy.net. IN A
;; Query time: 99 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Tue May 05 17:55:56 NZST 2020
;; MSG SIZE rcvd: 86
iritndhc01:~ #
What is the expected correct behavior?
A correct answer is returned only when the query uses the mixed-case spelling (ProxyWorkerRoleIN2-...) that the forwarder gives as the target of the first CNAME:
iritndhc01:~ # dig ProxyWorkerRoleIN2-his-eas-1.connector.his.msappproxy.net.
; <<>> DiG 9.9.6-P1 <<>> ProxyWorkerRoleIN2-his-eas-1.connector.his.msappproxy.net.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 28055
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;ProxyWorkerRoleIN2-his-eas-1.connector.his.msappproxy.net. IN A
;; ANSWER SECTION:
ProxyWorkerRoleIN2-his-eas-1.connector.his.msappproxy.net. 10 IN CNAME proxyworkerrole.2.his-eas-1.cloudapp.net.
proxyworkerrole.2.his-eas-1.cloudapp.net. 10 IN A 168.63.205.143
;; Query time: 154 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Tue May 05 17:56:10 NZST 2020
;; MSG SIZE rcvd: 153
iritndhc01:~ #
A correct answer is also returned when the lowercase name is queried directly on the forwarder (10.32.0.153):
dig @10.32.0.153 proxyworkerrolein2-his-eas-1.connector.his.msappproxy.net.
; <<>> DiG 9.9.6-P1 <<>> @10.32.0.153 proxyworkerrolein2-his-eas-1.connector.his.msappproxy.net.
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 47871
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;proxyworkerrolein2-his-eas-1.connector.his.msappproxy.net. IN A
;; ANSWER SECTION:
proxyworkerrolein2-his-eas-1.connector.his.msappproxy.net. 10 IN CNAME ProxyWorkerRoleIN2-his-eas-1.connector.his.msappproxy.net.
ProxyWorkerRoleIN2-his-eas-1.connector.his.msappproxy.net. 10 IN CNAME proxyworkerrole.2.his-eas-1.cloudapp.net.
proxyworkerrole.2.his-eas-1.cloudapp.net. 10 IN A 168.63.205.143
;; Query time: 42 msec
;; SERVER: 10.32.0.153#53(10.32.0.153)
;; WHEN: Tue May 05 17:55:41 NZST 2020
;; MSG SIZE rcvd: 185
iritndhc01:~ #
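In the forwarder's answer above, the owner and the target of the first CNAME differ only in letter case. DNS name comparison is case-insensitive for ASCII letters (RFC 4343), so under that rule the first record points the name at itself, and the same name then carries two CNAME records, whereas RFC 2181 section 10.1 allows only one CNAME at a name. Whether this is what makes BIND 9.9.6-P1 return SERVFAIL is not established here. The minimal Python sketch below only checks the three records exactly as dig printed them (TTL and class omitted); the function names are hypothetical.

```python
def dns_fold(name: str) -> str:
    """Fold ASCII A-Z to a-z (RFC 4343 comparison) and ensure a trailing dot."""
    folded = "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in name)
    return folded if folded.endswith(".") else folded + "."


def cname_targets_by_owner(records):
    """Group CNAME targets under their case-folded owner name."""
    groups = {}
    for owner, rtype, target in records:
        if rtype == "CNAME":
            groups.setdefault(dns_fold(owner), []).append(target)
    return groups


# Answer section of the forwarder's response to the lowercase query.
ANSWER = [
    ("proxyworkerrolein2-his-eas-1.connector.his.msappproxy.net.", "CNAME",
     "ProxyWorkerRoleIN2-his-eas-1.connector.his.msappproxy.net."),
    ("ProxyWorkerRoleIN2-his-eas-1.connector.his.msappproxy.net.", "CNAME",
     "proxyworkerrole.2.his-eas-1.cloudapp.net."),
    ("proxyworkerrole.2.his-eas-1.cloudapp.net.", "A", "168.63.205.143"),
]

if __name__ == "__main__":
    for owner, targets in cname_targets_by_owner(ANSWER).items():
        # A target that folds to its own owner name is a self-referencing CNAME.
        self_ref = any(dns_fold(t) == owner for t in targets)
        print(owner, "CNAME records:", len(targets), "self-referencing:", self_ref)
```

Run against these records, both CNAMEs fall under a single folded owner name, and one of them targets that same name.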
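A correct response can also be obtained by sending an ANY query for the name before the A query. In the sequence below (all three queries were made within the same second), the first A query returns the usual SERVFAIL, the ANY query returns the CNAME, and the repeated A query then succeeds: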
; <<>> DiG 9.9.6-P1 <<>> proxyworkerrolein5-his-seas-1.connector.his.msappproxy.net.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 48081
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;proxyworkerrolein5-his-seas-1.connector.his.msappproxy.net. IN A
;; Query time: 24 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Fri May 08 10:08:06 NZST 2020
;; MSG SIZE rcvd: 87
; <<>> DiG 9.9.6-P1 <<>> proxyworkerrolein5-his-seas-1.connector.his.msappproxy.net. any
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 23285
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 4, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;proxyworkerrolein5-his-seas-1.connector.his.msappproxy.net. IN ANY
;; ANSWER SECTION:
proxyworkerrolein5-his-seas-1.connector.his.msappproxy.net. 221 IN CNAME proxyworkerrole.5.his-seas-1.cloudapp.net.
;; AUTHORITY SECTION:
msappproxy.net. 37135 IN NS ns1prod.226.azuredns-prd.org.
msappproxy.net. 37135 IN NS ns2prod.226.azuredns-prd.info.
msappproxy.net. 37135 IN NS ns2prod.226.azuredns-prd.org.
msappproxy.net. 37135 IN NS ns1prod.226.azuredns-prd.info.
;; Query time: 13 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Fri May 08 10:08:06 NZST 2020
;; MSG SIZE rcvd: 268
; <<>> DiG 9.9.6-P1 <<>> proxyworkerrolein5-his-seas-1.connector.his.msappproxy.net.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 22645
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;proxyworkerrolein5-his-seas-1.connector.his.msappproxy.net. IN A
;; ANSWER SECTION:
proxyworkerrolein5-his-seas-1.connector.his.msappproxy.net. 221 IN CNAME proxyworkerrole.5.his-seas-1.cloudapp.net.
proxyworkerrole.5.his-seas-1.cloudapp.net. 10 IN A 13.67.79.189
;; Query time: 40 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Fri May 08 10:08:06 NZST 2020
;; MSG SIZE rcvd: 155
Relevant configuration files
Relevant logs and/or screenshots
We have confirmed that the forwarder returns the correct records to the local DNS server (iritndhc01). This suggests that iritndhc01 does not process the forwarded response correctly and therefore returns SERVFAIL.
Possible fixes