BUG REPORT: Bind9 as upstream server after a while stops resolving some domains
THIS IS A BUG REPORT, NOT A CONFIGURATION PROBLEM.
I already reported this a while ago, but it was dismissed as a misconfiguration, and later I stopped receiving replies. If it were a misconfiguration, it would not work for some time first, would it? Please read below.
The problem with Bind9 is that when it is set up as an upstream DNS server, it stops resolving some requests after a couple of hours or days. THIS IS NOT A NETWORK PROBLEM, SINCE IT HAPPENS ON THE LAN AND IPTABLES IS TURNED OFF! Sorry for the caps, but that was the argument of some "expert" here. Had that person read through the description, they most likely would not have argued that way. PLEASE have the courtesy this time to read it all the way through.
Summary
I have 2 servers at home. One is the main server; the other is just a backup in case the main one is down. The backup runs Pihole, and my local DNS (bind9) runs on the main one. I chained Pihole into my LAN setup (set it in the DHCP service as the only DNS server) and everything worked as expected for about 2 days. The next day only some requests got resolved (same results on PC and phone). What I noticed was that requests more local to me (.cz, .sk) got resolved, but requests for .com and .net did not (however, wikipedia.org worked). Strangely, when resolving directly against bind9, everything works every time. It only starts failing when going through Pihole with bind9 as the only upstream resolver. When I restart bind9 everything goes back to normal for some time, but then it happens again (it has already happened 3 times).
I have no clue what might be causing this strange behaviour. Any input is more than welcome. Below is a detailed description of the problem.
1) Working setup till now (user request -> DNS on main server -> if match respond otherwise forward to the upstream DNS[cloudflare]):
Not using Pihole. I had the local DNS on the main server with forwarders set to 1.1.1.1/1.0.0.1. This is so that users who are local (or connected via VPN) can access services directly on the LAN. I am also blocking some countries, and if a user connects from such a country, my domain is translated to the local IP, not the public IP.
2) Setup I want to achieve (user request -> DNS on Pihole -> Pihole Magic/Logic -> forwards to the bind9 on the main server -> if match respond otherwise forward to the upstream DNS[cloudflare]):
Just chain Pihole into this whole setup.
- changed DNS IP in router's DHCP settings from the main server to Pihole
- tested with default settings (cloudflare upstream DNS servers) - works (Internet part)
- configured the custom upstream DNS server to be the local IP of the main server; this is where things started to get weird:
  - some domains get resolved without a problem, but others, for example github.com, do not: ERR_NAME_RESOLUTION_FAILED
  - when connecting via the phone I get a captive portal redirect to log in to the wifi network, so I guess it fails to resolve something there as well
  - e.g. not getting notifications on whatsapp/messenger/hangouts
BIND version used
BIND 9.11.5-P4-5.1-Debian (Extended Support Version) <id:998753c>
running on Linux x86_64 4.19.0-8-amd64 #1 SMP Debian 4.19.98-1 (2020-01-26)
built by make with '--build=x86_64-linux-gnu' '--prefix=/usr' '--includedir=/usr/include' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--disable-silent-rules' '--libdir=/usr/lib/x86_64-linux-gnu' '--libexecdir=/usr/lib/x86_64-linux-gnu' '--disable-maintainer-mode' '--disable-dependency-tracking' '--libdir=/usr/lib/x86_64-linux-gnu' '--sysconfdir=/etc/bind' '--with-python=python3' '--localstatedir=/' '--enable-threads' '--enable-largefile' '--with-libtool' '--enable-shared' '--enable-static' '--with-gost=no' '--with-openssl=/usr' '--with-gssapi=/usr' '--with-libidn2' '--with-libjson=/usr' '--with-lmdb=/usr' '--with-gnu-ld' '--with-geoip=/usr' '--with-atf=no' '--enable-ipv6' '--enable-rrl' '--enable-filter-aaaa' '--enable-native-pkcs11' '--with-pkcs11=/usr/lib/softhsm/libsofthsm2.so' '--with-randomdev=/dev/urandom' '--enable-dnstap' 'build_alias=x86_64-linux-gnu' 'CFLAGS=-g -O2 -fdebug-prefix-map=/build/bind9-9ZuvGL/bind9-9.11.5.P4+dfsg=. -fstack-protector-strong -Wformat -Werror=format-security -fno-strict-aliasing -fno-delete-null-pointer-checks -DNO_VERSION_DATE -DDIG_SIGCHASE' 'LDFLAGS=-Wl,-z,relro -Wl,-z,now' 'CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2'
compiled by GCC 8.3.0
compiled with OpenSSL version: OpenSSL 1.1.1c 28 May 2019
linked to OpenSSL version: OpenSSL 1.1.1d 10 Sep 2019
compiled with libxml2 version: 2.9.4
linked to libxml2 version: 20904
compiled with libjson-c version: 0.12.1
linked to libjson-c version: 0.12.1
threads support is enabled
Steps to reproduce
- default pihole config
- set the upstream DNS server in Pihole to a custom local resolver such as bind9 (it can use the default config with upstream servers 1.1.1.1/1.0.0.1)
- change the DNS in the router's DHCP settings to Pihole
- wait a couple of hours or days until some domains stop resolving through Pihole
What is the current bug behavior?
User request -> DNS on Pihole -> Pihole Magic/Logic -> forwards to the bind9 on the main server -> >>> bind9 processes the forwarded request: if match respond, otherwise forward to the upstream DNS[cloudflare] <<< (the step between the markers is where it breaks)
What is the expected correct behavior?
User request -> DNS on Pihole -> Pihole Magic/Logic -> forwards to the bind9 on the main server -> if match respond otherwise forward to the upstream DNS[cloudflare]
- it works like this until it breaks, after which only some requests get resolved
Your Environment
- Hardware architecture: AMDx64
- Linux caradhras 4.19.0-8-amd64 #1 SMP Debian 4.19.98-1 (2020-01-26) x86_64 GNU/Linux
- Docker Host Operating System and OS Version: Linux edoras 4.19.0-4-amd64 #1 SMP Debian 4.19.28-2 (2019-03-15) x86_64 GNU/Linux
- Docker Version: Docker version 19.03.5, build 633a0ea838
Relevant configuration files
The bind9 configuration is included at the end of this report, under "config:".
Relevant logs and/or screenshots
IPTABLES: This is a LAN setup and only iptables is in the way, but it has a rule to accept everything coming in from the LAN prefix. It makes no difference when iptables is effectively turned off with an allow-all rule.
38M 21G ACCEPT all -- lo any anywhere anywhere /* loopback in */
118M 169G ACCEPT all -- br0 any 192.168.255.0/24 anywhere /* LAN */
0 0 ACCEPT all -- br0 any 10.8.0.0/24 anywhere /* VPN */
First, some console output from the command line:
- 192.168.255.0/24 - LAN prefix
- 192.168.255.9 - backup server running Pihole
- 192.168.255.11 - main server running bind9
root@caradhras:[~]: nslookup dsl.sk 192.168.255.9
Server: 192.168.255.9
Address: 192.168.255.9#53
Non-authoritative answer:
Name: dsl.sk
Address: 217.67.19.197
root@caradhras:[~]: nslookup czc.cz 192.168.255.9
Server: 192.168.255.9
Address: 192.168.255.9#53
Non-authoritative answer:
Name: czc.cz
Address: 82.99.173.171
Name: czc.cz
Address: 82.99.173.173
root@caradhras:[~]: nslookup google.sk 192.168.255.9
Server: 192.168.255.9
Address: 192.168.255.9#53
Non-authoritative answer:
Name: google.sk
Address: 172.217.23.195
Name: google.sk
Address: 2a00:1450:4014:80c::2003
root@caradhras:[~]: nslookup google.cz 192.168.255.9
Server: 192.168.255.9
Address: 192.168.255.9#53
Non-authoritative answer:
Name: google.cz
Address: 172.217.23.195
Name: google.cz
Address: 2a00:1450:4014:80c::2003
root@caradhras:[~]: nslookup google.com 192.168.255.9
Server: 192.168.255.9
Address: 192.168.255.9#53
Non-authoritative answer:
Name: google.com
Address: 216.58.201.110
Name: google.com
Address: 2a00:1450:4014:801::200e
root@caradhras:[~]: nslookup github.com 192.168.255.9
Server: 192.168.255.9
Address: 192.168.255.9#53
** server can't find github.com: SERVFAIL
root@caradhras:[~]: nslookup facebook.com 192.168.255.9
Server: 192.168.255.9
Address: 192.168.255.9#53
** server can't find facebook.com: SERVFAIL
root@caradhras:[~]: nslookup wikipedia.org 192.168.255.9
Server: 192.168.255.9
Address: 192.168.255.9#53
** server can't find wikipedia.org: SERVFAIL
root@caradhras:[~]: nslookup craigslist.org 192.168.255.9
Server: 192.168.255.9
Address: 192.168.255.9#53
** server can't find craigslist.org: SERVFAIL
root@caradhras:[~]: nslookup craigslist.org 192.168.255.11
Server: 192.168.255.11
Address: 192.168.255.11#53
Non-authoritative answer:
Name: craigslist.org
Address: 208.82.237.129
root@caradhras:[~]: nslookup wikipedia.org 192.168.255.11
Server: 192.168.255.11
Address: 192.168.255.11#53
Non-authoritative answer:
Name: wikipedia.org
Address: 91.198.174.192
Name: wikipedia.org
Address: 2620:0:862:ed1a::1
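The manual nslookup comparison above could be automated with something like the following sketch, so that the state can be captured the next time it breaks. It uses only the Python standard library and sends plain A queries over UDP, like nslookup does; it does not set EDNS or the DO bit, so it is not identical to what Pihole forwards. The function names (`build_query`, `parse_rcode`, `lookup`, `compare`) are made up for illustration, and the two IP addresses are the ones from this report.

```python
import socket
import struct

# Only the response codes that matter for this report are named.
RCODES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def build_query(name: str, txid: int = 0x1234, qtype: int = 1) -> bytes:
    """Build a minimal DNS query (RD flag set, class IN, no EDNS) for `name`."""
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def parse_rcode(response: bytes) -> str:
    """Return the response code name taken from the low 4 bits of the flags."""
    (flags,) = struct.unpack("!H", response[2:4])
    rcode = flags & 0x000F
    return RCODES.get(rcode, f"RCODE{rcode}")


def lookup(server: str, name: str, timeout: float = 3.0) -> str:
    """Send one A query over UDP and report the rcode, or TIMEOUT."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name), (server, 53))
            data, _ = sock.recvfrom(4096)
        except socket.timeout:
            return "TIMEOUT"
    return parse_rcode(data)


def compare(domains, pihole="192.168.255.9", bind="192.168.255.11"):
    """Return (domain, rcode via Pihole, rcode via bind9) for each domain."""
    return [(d, lookup(pihole, d), lookup(bind, d)) for d in domains]
```

Calling `compare(["dsl.sk", "github.com", "wikipedia.org"])` from a machine on the LAN returns one tuple per domain; the rows where the two rcodes differ are the interesting ones.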
query.log
Here github.com did not get resolved, but dsl.sk did.
06-Apr-2020 09:52:55.044 client @0x7fd14802df90 192.168.255.9#58817 (github.com): query: github.com IN A +E(0)D (192.168.255.11)
06-Apr-2020 09:52:55.049 client @0x7fd15c5ef860 192.168.255.9#25396 (sk): query: sk IN DS +E(0)D (192.168.255.11)
06-Apr-2020 09:52:55.056 client @0x7fd10c011db0 192.168.255.9#28941 (dsl.sk): query: dsl.sk IN DS +E(0)D (192.168.255.11)
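To make the query.log lines above easier to read, here is a small sketch that splits a line into its fields and decodes the flag string (for example `+E(0)D`). The flag meanings follow my understanding of the query log description in the BIND 9 ARM and should be checked against it; the names `parse_query_line` and `decode_flags` are made up for illustration.

```python
import re

# Matches the part of a query log line after the timestamp.
QUERY_RE = re.compile(
    r"client @0x[0-9a-f]+ (?P<client>[\d.]+)#(?P<port>\d+) \((?P<qname>[^)]+)\): "
    r"query: \S+ (?P<qclass>\S+) (?P<qtype>\S+) (?P<flags>\S+) \((?P<dest>[^)]+)\)"
)

# Single-letter flags as I understand them from the BIND 9 ARM (assumption).
FLAG_MEANINGS = {
    "S": "query was signed",
    "T": "TCP",
    "D": "DNSSEC OK (DO) set",
    "C": "checking disabled (CD) set",
    "V": "valid server cookie",
    "K": "cookie without valid server cookie",
}


def parse_query_line(line):
    """Return the fields of one query log line as a dict, or None if no match."""
    match = QUERY_RE.search(line)
    return match.groupdict() if match else None


def decode_flags(flags):
    """Turn a flag string such as '+E(0)D' into a list of descriptions."""
    out = ["recursion desired" if flags.startswith("+") else "recursion not desired"]
    edns = re.search(r"E\((\d+)\)", flags)
    if edns:
        out.append(f"EDNS version {edns.group(1)}")
    # Drop the leading +/- and the E(n) part, then decode the remaining letters.
    rest = re.sub(r"E\(\d+\)", "", flags[1:])
    for ch in rest:
        out.append(FLAG_MEANINGS.get(ch, f"unknown flag {ch!r}"))
    return out
```

Run over the three lines above, this shows that every query from 192.168.255.9 arrives with recursion desired, EDNS version 0 and the DO bit set, and that two of the three queries are of type DS.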
misc.log
06-Apr-2020 05:10:08.754 resolver: info: resolver priming query complete
06-Apr-2020 05:11:16.986 resolver: info: resolver priming query complete
06-Apr-2020 05:11:19.476 resolver: info: resolver priming query complete
06-Apr-2020 05:12:11.061 edns-disabled: info: success resolving 'icecast-u1.play.cz/A' (in 'play.cz'?) after disabling EDNS
06-Apr-2020 05:12:18.276 edns-disabled: info: success resolving 'zara.ns.cloudflare.com/AAAA' (in 'com'?) after disabling EDNS
06-Apr-2020 05:12:19.472 edns-disabled: info: success resolving 'carl.ns.cloudflare.com/A' (in 'com'?) after disabling EDNS
06-Apr-2020 09:53:59.017 general: info: received control channel command 'stop'
06-Apr-2020 09:53:59.020 general: info: shutting down: flushing changes
06-Apr-2020 09:53:59.020 general: notice: stopping command channel on 127.0.0.1#953
06-Apr-2020 09:53:59.020 general: notice: stopping command channel on ::1#953
06-Apr-2020 09:53:59.022 network: info: no longer listening on ::#53
06-Apr-2020 09:53:59.022 network: info: no longer listening on 127.0.0.1#53
06-Apr-2020 09:53:59.022 network: info: no longer listening on 192.168.255.11#53
06-Apr-2020 09:53:59.073 general: notice: exiting
06-Apr-2020 09:53:59.136 general: info: managed-keys-zone: loaded serial 5
06-Apr-2020 09:53:59.136 general: info: zone 0.in-addr.arpa/IN: loaded serial 1
06-Apr-2020 09:53:59.138 general: info: zone localhost/IN: loaded serial 2
06-Apr-2020 09:53:59.138 general: info: zone cloudmin.example.com/IN: loaded serial 1580163634
06-Apr-2020 09:53:59.139 general: info: zone 127.in-addr.arpa/IN: loaded serial 1
06-Apr-2020 09:53:59.139 general: info: zone 255.in-addr.arpa/IN: loaded serial 1
06-Apr-2020 09:53:59.139 general: info: zone 255.168.192.in-addr.arpa/IN: loaded serial 7
06-Apr-2020 09:53:59.139 general: info: zone example.com/IN: loaded serial 7
06-Apr-2020 09:53:59.139 general: notice: all zones loaded
06-Apr-2020 09:53:59.139 general: notice: running
06-Apr-2020 09:53:59.631 resolver: info: resolver priming query complete
06-Apr-2020 09:54:01.060 resolver: info: resolver priming query complete
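The edns-disabled entries in misc.log above can be tallied per zone to see whether the EDNS fallback happens more often for some zones than for others. This is only a sketch that assumes the message format shown in the log excerpt; the name `edns_fallbacks` is made up for illustration.

```python
import re
from collections import Counter

# Matches messages like:
#   edns-disabled: info: success resolving 'name/TYPE' (in 'zone'?) after disabling EDNS
EDNS_RE = re.compile(
    r"edns-disabled: info: success resolving '(?P<name>[^/']+)/(?P<rtype>[A-Z0-9]+)' "
    r"\(in '(?P<zone>[^']+)'\?\) after disabling EDNS"
)


def edns_fallbacks(lines):
    """Count, per zone, how often resolution only succeeded without EDNS."""
    counts = Counter()
    for line in lines:
        match = EDNS_RE.search(line)
        if match:
            counts[match.group("zone")] += 1
    return counts
```

It can be fed the log file directly, for example `edns_fallbacks(open("/var/log/named/misc.log"))`.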
config:
acl "trusted" {
    192.168.255.0/24;
    10.8.0.0/24;
    localhost;
    localnets;
};
//zone "0.8.10.in-addr.arpa" {
//    type master;
//    file "/etc/bind/zones/db.10.8.0"; # 10.8.0.0/24 subnet
//    //allow-transfer { 192.168.255.12; }; # private IP address - secondary
//};
options {
    directory "/var/cache/bind";
    recursion yes; # enables recursive queries
    allow-recursion { trusted; }; # allows recursive queries from "trusted" clients
    listen-on {
        127.0.0.1;
        192.168.255.11;
    };
    allow-transfer { none; }; # disable zone transfers by default
    forwarders {
        1.1.1.1;
        1.0.0.1;
    };
    //forward only;
    dnssec-enable yes;
    dnssec-validation yes;
    // If there is a firewall between you and nameservers you want
    // to talk to, you may need to fix the firewall to allow multiple
    // ports to talk. See http://www.kb.cert.org/vuls/id/800113
    // If your ISP provided one or more IP addresses for stable
    // nameservers, you probably want to use them as forwarders.
    // Uncomment the following block, and insert the addresses replacing
    // the all-0's placeholder.
    // forwarders {
    //     0.0.0.0;
    // };
    //========================================================================
    // If BIND logs error messages about the root key being expired,
    // you will need to update your keys. See https://www.isc.org/bind-keys
    //========================================================================
    //dnssec-validation auto;
    auth-nxdomain no; # conform to RFC1035
    listen-on-v6 {
        any;
    };
};
logging {
    channel "misc" {
        file "/var/log/named/misc.log" versions 10 size 10m;
        print-time YES;
        print-severity YES;
        print-category YES;
    };
    channel "query" {
        file "/var/log/named/query.log" versions 10 size 10m;
        print-time YES;
        print-severity NO;
        print-category NO;
    };
    channel "lame" {
        file "/var/log/named/lamers.log" versions 1 size 5m;
        print-time yes;
        print-severity yes;
        severity info;
    };
    category "default" { "misc"; };
    category "queries" { "query"; };
    category "lame-servers" { "lame"; };
};
//
// Do any local configuration here
//
// Consider adding the 1918 zones here, if they are not used in your
// organization
//include "/etc/bind/zones.rfc1918";
zone "example.com" {
    type master;
    file "/etc/bind/zones/db.example.com"; # zone file path
    //allow-transfer { 192.168.255.12; }; # private IP address - secondary
};
zone "255.168.192.in-addr.arpa" {
    type master;
    file "/etc/bind/zones/db.192.168.255"; # 192.168.255.0/24 subnet
    //allow-transfer { 192.168.255.12; }; # private IP address - secondary
};
zone "cloudmin.example.com" {
    type master;
    file "/var/lib/bind/cloudmin.example.com.hosts";
};
Thank you, Michal