Created May 25, 2020 by Michal Aron (@MacGyver27)

BUG REPORT: Bind9 as upstream server after a while stops resolving some domains

THIS IS A BUG REPORT, NOT A CONFIGURATION PROBLEM.

I already reported this a while ago, but it was dismissed as a misconfiguration, and later I stopped receiving replies. If it were a misconfiguration, it would not work for some time first, would it? Please read below.

The problem is that when Bind9 is set up as an upstream DNS server, it stops resolving some requests after a couple of hours or days. THIS IS NOT A NETWORK PROBLEM, SINCE IT HAPPENS ON THE LAN AND IPTABLES ARE TURNED OFF! Sorry for the caps, but that was the argument of some "expert" here; had that person read through the description, they most likely would not have argued that. PLEASE, this time have the courtesy to read all the way through.

Summary

I have two servers at home. One is the main server; the other is just a backup in case the main one is down. The backup runs Pihole, and my local DNS (bind9) runs on the main server. I chained Pihole into my LAN setup (set it in the DHCP service as the only DNS server) and everything worked as expected for about two days. Then, the next day, only some requests got resolved (same results on PC and phone). What I noticed was that requests more local to me (.cz, .sk) got resolved, but requests for .com and .net did not (wikipedia.org, however, worked). Strangely, when resolving directly against bind9, everything works every time. It only starts failing when going through Pihole with bind9 as its only upstream resolver. When I restart bind9, everything goes back to normal for some time, but then it happens again (it has already happened three times).

I have no clue what might be causing this strange behaviour; any input is more than welcome. Below is a detailed description of the problem.

1) Working setup till now (user request -> DNS on main server -> if match respond otherwise forward to the upstream DNS[cloudflare]):

No Pihole involved. The main server ran the local DNS with forwarders set to 1.1.1.1/1.0.0.1. The reason for the local DNS is that users connected locally (or via VPN) can then access services directly on the LAN. I am also blocking some countries, and if a user connects from such a country, my domain is translated to the local IP instead of the public IP.

2) Setup I want to achieve (user request -> DNS on Pihole -> Pihole Magic/Logic -> forwards to the bind9 on the main server -> if match respond otherwise forward to the upstream DNS[cloudflare]):

Just chain Pihole into this setup.

  • changed DNS IP in router's DHCP settings from the main server to Pihole
  • tested with default settings (cloudflare upstream DNS servers) - works (Internet part)
  • configured the custom upstream DNS server to be the local IP of the main server; this is where things started to get weird:
    • some domains get resolved without a problem, but github.com, for example, does not: ERR_NAME_RESOLUTION_FAILED
    • when connecting via the phone I get a captive portal redirect asking me to log in to the wifi network, so I guess it fails to resolve something there as well
    • e.g. no notifications arrive on whatsapp/messenger/hangouts

BIND version used

BIND 9.11.5-P4-5.1-Debian (Extended Support Version) <id:998753c>
running on Linux x86_64 4.19.0-8-amd64 #1 SMP Debian 4.19.98-1 (2020-01-26)
built by make with '--build=x86_64-linux-gnu' '--prefix=/usr' '--includedir=/usr/include' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--disable-silent-rules' '--libdir=/usr/lib/x86_64-linux-gnu' '--libexecdir=/usr/lib/x86_64-linux-gnu' '--disable-maintainer-mode' '--disable-dependency-tracking' '--libdir=/usr/lib/x86_64-linux-gnu' '--sysconfdir=/etc/bind' '--with-python=python3' '--localstatedir=/' '--enable-threads' '--enable-largefile' '--with-libtool' '--enable-shared' '--enable-static' '--with-gost=no' '--with-openssl=/usr' '--with-gssapi=/usr' '--with-libidn2' '--with-libjson=/usr' '--with-lmdb=/usr' '--with-gnu-ld' '--with-geoip=/usr' '--with-atf=no' '--enable-ipv6' '--enable-rrl' '--enable-filter-aaaa' '--enable-native-pkcs11' '--with-pkcs11=/usr/lib/softhsm/libsofthsm2.so' '--with-randomdev=/dev/urandom' '--enable-dnstap' 'build_alias=x86_64-linux-gnu' 'CFLAGS=-g -O2 -fdebug-prefix-map=/build/bind9-9ZuvGL/bind9-9.11.5.P4+dfsg=. -fstack-protector-strong -Wformat -Werror=format-security -fno-strict-aliasing -fno-delete-null-pointer-checks -DNO_VERSION_DATE -DDIG_SIGCHASE' 'LDFLAGS=-Wl,-z,relro -Wl,-z,now' 'CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2'
compiled by GCC 8.3.0
compiled with OpenSSL version: OpenSSL 1.1.1c  28 May 2019
linked to OpenSSL version: OpenSSL 1.1.1d  10 Sep 2019
compiled with libxml2 version: 2.9.4
linked to libxml2 version: 20904
compiled with libjson-c version: 0.12.1
linked to libjson-c version: 0.12.1
threads support is enabled

Steps to reproduce

  1. start with the default pihole config
  2. set the upstream DNS server to a custom local one such as bind9 (a default config with upstream servers 1.1.1.1/1.0.0.1 is enough)
  3. change the DNS server in the router's DHCP settings to pihole
  4. wait and observe the weird behavior described above

What is the current bug behavior?

User request -> DNS on Pihole -> Pihole Magic/Logic-> forwards to the bind9 on the main server -> >>> bind9 processing the forwarded request and if match respond otherwise forward to the upstream DNS[cloudflare] <<<

What is the expected correct behavior?

User request -> DNS on Pihole -> Pihole Magic/Logic -> forwards to the bind9 on the main server -> if match respond otherwise forward to the upstream DNS[cloudflare]

  • it works like this until it breaks, and then only some requests get resolved

Your Environment

  • Hardware architecture: AMD64 (x86_64)
  • Linux caradhras 4.19.0-8-amd64 #1 SMP Debian 4.19.98-1 (2020-01-26) x86_64 GNU/Linux
  • Docker Host Operating System and OS Version: Linux edoras 4.19.0-4-amd64 #1 SMP Debian 4.19.28-2 (2019-03-15) x86_64 GNU/Linux
  • Docker Version: Docker version 19.03.5, build 633a0ea838

Relevant configuration files

The named configuration is included under "config:" at the end of this report.

Relevant logs and/or screenshots


IPTABLES: This is a LAN setup and only iptables is in the way, but it has a rule to allow all inbound traffic from the LAN prefix. It makes no difference when the firewall is effectively turned off with an allow-all policy.
  38M   21G ACCEPT     all  --  lo     any     anywhere             anywhere             /* loopback in */
 118M  169G ACCEPT     all  --  br0    any     192.168.255.0/24     anywhere             /* LAN */
    0     0 ACCEPT     all  --  br0    any     10.8.0.0/24          anywhere             /* VPN */
First some console output from cmd
  • 192.168.255.0/24 - LAN prefix
  • 192.168.255.9 - backup server running pihole
  • 192.168.255.11 - main server running bind9
root@caradhras:[~]: nslookup dsl.sk 192.168.255.9
Server:         192.168.255.9
Address:        192.168.255.9#53

Non-authoritative answer:
Name:   dsl.sk
Address: 217.67.19.197

root@caradhras:[~]: nslookup czc.cz 192.168.255.9
Server:         192.168.255.9
Address:        192.168.255.9#53

Non-authoritative answer:
Name:   czc.cz
Address: 82.99.173.171
Name:   czc.cz
Address: 82.99.173.173

root@caradhras:[~]: nslookup google.sk 192.168.255.9
Server:         192.168.255.9
Address:        192.168.255.9#53

Non-authoritative answer:
Name:   google.sk
Address: 172.217.23.195
Name:   google.sk
Address: 2a00:1450:4014:80c::2003

root@caradhras:[~]: nslookup google.cz 192.168.255.9
Server:         192.168.255.9
Address:        192.168.255.9#53

Non-authoritative answer:
Name:   google.cz
Address: 172.217.23.195
Name:   google.cz
Address: 2a00:1450:4014:80c::2003

root@caradhras:[~]: nslookup google.com 192.168.255.9
Server:         192.168.255.9
Address:        192.168.255.9#53

Non-authoritative answer:
Name:   google.com
Address: 216.58.201.110
Name:   google.com
Address: 2a00:1450:4014:801::200e

root@caradhras:[~]: nslookup github.com 192.168.255.9
Server:         192.168.255.9
Address:        192.168.255.9#53

** server can't find github.com: SERVFAIL

root@caradhras:[~]: nslookup facebook.com 192.168.255.9
Server:         192.168.255.9
Address:        192.168.255.9#53

** server can't find facebook.com: SERVFAIL

root@caradhras:[~]: nslookup wikipedia.org 192.168.255.9
Server:         192.168.255.9
Address:        192.168.255.9#53

** server can't find wikipedia.org: SERVFAIL

root@caradhras:[~]: nslookup craigslist.org 192.168.255.9
Server:         192.168.255.9
Address:        192.168.255.9#53

** server can't find craigslist.org: SERVFAIL

root@caradhras:[~]: nslookup craigslist.org 192.168.255.11
Server:         192.168.255.11
Address:        192.168.255.11#53

Non-authoritative answer:
Name:   craigslist.org
Address: 208.82.237.129

root@caradhras:[~]: nslookup wikipedia.org 192.168.255.11
Server:         192.168.255.11
Address:        192.168.255.11#53

Non-authoritative answer:
Name:   wikipedia.org
Address: 91.198.174.192
Name:   wikipedia.org
Address: 2620:0:862:ed1a::1
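The manual nslookup comparison above could be repeated automatically, so that the moment the two servers start to disagree is captured. The following is only a minimal sketch using the Python standard library: the function names (`build_query`, `parse_rcode`, `ask`, `compare`) are made up for illustration, and the optional EDNS/DO handling is an assumption based on the `+E(0)D` flags in the query.log excerpt. Whether those flags have anything to do with the failures is not established by this report.

```python
import socket
import struct

# Subset of DNS response codes (RFC 1035); anything else is shown numerically.
RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def build_query(name, txid=0x1234, qtype=1, dnssec_ok=False):
    """Build a DNS query for `name` (class IN, RD set); qtype 1 is an A record.

    With dnssec_ok=True an EDNS0 OPT record with the DO bit is appended,
    mimicking the "+E(0)D" flags seen in the query.log excerpt (assumption:
    this is what the forwarding side sends).
    """
    arcount = 1 if dnssec_ok else 0
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, arcount)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    packet = header + qname + struct.pack("!HH", qtype, 1)
    if dnssec_ok:
        # OPT RR: root name, type 41, UDP size 1232, TTL field carries DO=0x8000.
        packet += b"\x00" + struct.pack("!HHIH", 41, 1232, 0x00008000, 0)
    return packet


def parse_rcode(response):
    """Return the response code name from the low 4 bits of the flags word."""
    if len(response) < 4:
        return "MALFORMED"
    flags = struct.unpack("!H", response[2:4])[0]
    code = flags & 0x000F
    return RCODES.get(code, "RCODE%d" % code)


def ask(server, name, timeout=3.0, dnssec_ok=False):
    """Send one UDP query to `server` and return the rcode name or an error tag."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name, dnssec_ok=dnssec_ok), (server, 53))
            data, _ = sock.recvfrom(4096)
        except socket.timeout:
            return "TIMEOUT"
        except OSError:
            return "ERROR"
    return parse_rcode(data)


def compare(names, servers, dnssec_ok=False):
    """Map each name to the rcode every server returned for it."""
    return {n: {s: ask(s, n, dnssec_ok=dnssec_ok) for s in servers} for n in names}
```

For example, `compare(["dsl.sk", "github.com", "wikipedia.org"], ["192.168.255.9", "192.168.255.11"], dnssec_ok=True)` run from cron every few minutes would show when the Pihole address starts returning SERVFAIL while the bind9 address still answers, and whether a DO-flagged query sent directly to bind9 behaves differently from a plain one.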
query.log

Here github.com did not get resolved, but dsl.sk did:

06-Apr-2020 09:52:55.044 client @0x7fd14802df90 192.168.255.9#58817 (github.com): query: github.com IN A +E(0)D (192.168.255.11)
06-Apr-2020 09:52:55.049 client @0x7fd15c5ef860 192.168.255.9#25396 (sk): query: sk IN DS +E(0)D (192.168.255.11)
06-Apr-2020 09:52:55.056 client @0x7fd10c011db0 192.168.255.9#28941 (dsl.sk): query: dsl.sk IN DS +E(0)D (192.168.255.11)
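The flag string at the end of each query.log line can be decoded mechanically. In BIND's query log format, `+` means recursion desired, `E(n)` means EDNS version n, `T` means TCP, `D` means the DO (DNSSEC OK) bit was set and `C` means CD (checking disabled) was set. The sketch below (the helper name `parse_query_line` is hypothetical) parses lines like the three above. The `DS` queries and the DO bit may suggest that the forwarding side does its own DNSSEC validation, but that is an inference from the log, not something the report confirms.

```python
import re
from collections import Counter

# The three lines quoted in the report, used as sample input.
SAMPLE = """\
06-Apr-2020 09:52:55.044 client @0x7fd14802df90 192.168.255.9#58817 (github.com): query: github.com IN A +E(0)D (192.168.255.11)
06-Apr-2020 09:52:55.049 client @0x7fd15c5ef860 192.168.255.9#25396 (sk): query: sk IN DS +E(0)D (192.168.255.11)
06-Apr-2020 09:52:55.056 client @0x7fd10c011db0 192.168.255.9#28941 (dsl.sk): query: dsl.sk IN DS +E(0)D (192.168.255.11)
"""

LINE_RE = re.compile(
    r"client (?:@0x[0-9a-fA-F]+ )?(?P<client>[0-9a-fA-F.:]+)#(?P<port>\d+) "
    r"\([^)]*\): query: (?P<qname>\S+) (?P<qclass>\S+) (?P<qtype>\S+) "
    r"(?P<flags>\S+) \((?P<server>[^)]+)\)"
)


def parse_query_line(line):
    """Parse one query.log line into a dict, or return None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    flags = m.group("flags")
    edns = re.search(r"E\((\d+)\)", flags)
    # Drop the leading +/- and the E(n) token so single-letter flags can be tested.
    letters = re.sub(r"E\(\d+\)", "", flags[1:])
    return {
        "timestamp": line.split(" client ", 1)[0],
        "client": m.group("client"),
        "qname": m.group("qname"),
        "qtype": m.group("qtype"),
        "server": m.group("server"),
        "recursion_desired": flags.startswith("+"),
        "edns_version": int(edns.group(1)) if edns else None,
        "tcp": "T" in letters,
        "dnssec_ok": "D" in letters,
        "checking_disabled": "C" in letters,
    }


def count_by_type(lines):
    """Count parsed queries per (qtype, dnssec_ok) pair."""
    parsed = (parse_query_line(line) for line in lines)
    return Counter((r["qtype"], r["dnssec_ok"]) for r in parsed if r)
```

Running `count_by_type(open("/var/log/named/query.log"))` over a longer window, before and after the failures start, would show whether the mix of query types or flags coming from 192.168.255.9 changes.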
misc.log
06-Apr-2020 05:10:08.754 resolver: info: resolver priming query complete
06-Apr-2020 05:11:16.986 resolver: info: resolver priming query complete
06-Apr-2020 05:11:19.476 resolver: info: resolver priming query complete
06-Apr-2020 05:12:11.061 edns-disabled: info: success resolving 'icecast-u1.play.cz/A' (in 'play.cz'?) after disabling EDNS
06-Apr-2020 05:12:18.276 edns-disabled: info: success resolving 'zara.ns.cloudflare.com/AAAA' (in 'com'?) after disabling EDNS
06-Apr-2020 05:12:19.472 edns-disabled: info: success resolving 'carl.ns.cloudflare.com/A' (in 'com'?) after disabling EDNS
06-Apr-2020 09:53:59.017 general: info: received control channel command 'stop'
06-Apr-2020 09:53:59.020 general: info: shutting down: flushing changes
06-Apr-2020 09:53:59.020 general: notice: stopping command channel on 127.0.0.1#953
06-Apr-2020 09:53:59.020 general: notice: stopping command channel on ::1#953
06-Apr-2020 09:53:59.022 network: info: no longer listening on ::#53
06-Apr-2020 09:53:59.022 network: info: no longer listening on 127.0.0.1#53
06-Apr-2020 09:53:59.022 network: info: no longer listening on 192.168.255.11#53
06-Apr-2020 09:53:59.073 general: notice: exiting
06-Apr-2020 09:53:59.136 general: info: managed-keys-zone: loaded serial 5
06-Apr-2020 09:53:59.136 general: info: zone 0.in-addr.arpa/IN: loaded serial 1
06-Apr-2020 09:53:59.138 general: info: zone localhost/IN: loaded serial 2
06-Apr-2020 09:53:59.138 general: info: zone cloudmin.example.com/IN: loaded serial 1580163634
06-Apr-2020 09:53:59.139 general: info: zone 127.in-addr.arpa/IN: loaded serial 1
06-Apr-2020 09:53:59.139 general: info: zone 255.in-addr.arpa/IN: loaded serial 1
06-Apr-2020 09:53:59.139 general: info: zone 255.168.192.in-addr.arpa/IN: loaded serial 7
06-Apr-2020 09:53:59.139 general: info: zone example.com/IN: loaded serial 7
06-Apr-2020 09:53:59.139 general: notice: all zones loaded
06-Apr-2020 09:53:59.139 general: notice: running
06-Apr-2020 09:53:59.631 resolver: info: resolver priming query complete
06-Apr-2020 09:54:01.060 resolver: info: resolver priming query complete
config:
acl "trusted" {
	192.168.255.0/24;
	10.8.0.0/24;
	localhost;
	localnets;
};

//zone "0.8.10.in-addr.arpa" {
//    type master;
//    file "/etc/bind/zones/db.10.8.0";		# 10.8.0.0/24 subnet
//    //allow-transfer { 192.168.255.12; };	# private IP address - secondary
//};

options {
	directory "/var/cache/bind";

	recursion yes;					# enables recursive queries
	allow-recursion { trusted; };	# allows recursive queries from "trusted" clients
	listen-on {
		127.0.0.1;
		192.168.255.11;
	};
	allow-transfer { none; };		# disable zone transfers by default

	forwarders {
			1.1.1.1;
			1.0.0.1;
	};
	//forward only;

	dnssec-enable yes;
	dnssec-validation yes;

	// If there is a firewall between you and nameservers you want
	// to talk to, you may need to fix the firewall to allow multiple
	// ports to talk.  See http://www.kb.cert.org/vuls/id/800113

	// If your ISP provided one or more IP addresses for stable 
	// nameservers, you probably want to use them as forwarders.  
	// Uncomment the following block, and insert the addresses replacing 
	// the all-0's placeholder.

	// forwarders {
	// 	0.0.0.0;
	// };

	//========================================================================
	// If BIND logs error messages about the root key being expired,
	// you will need to update your keys.  See https://www.isc.org/bind-keys
	//========================================================================
	//dnssec-validation auto;

	auth-nxdomain no;    # conform to RFC1035
	listen-on-v6 {
		any;
	};
};

logging {
          channel "misc" {
                    file "/var/log/named/misc.log" versions 10 size 10m;
                    print-time YES;
                    print-severity YES;
                    print-category YES;
          };
  
          channel "query" {
                    file "/var/log/named/query.log" versions 10 size 10m;
                    print-time YES;
                    print-severity NO;
                    print-category NO;
          };
  
          channel "lame" {
                    file "/var/log/named/lamers.log" versions 1 size 5m;
                    print-time yes;
                    print-severity yes;
                    severity info;
          };
  
          category "default" { "misc"; };
          category "queries" { "query"; };
          category "lame-servers" { "lame"; };
  
};
//
// Do any local configuration here
//

// Consider adding the 1918 zones here, if they are not used in your
// organization
//include "/etc/bind/zones.rfc1918";

zone "example.com" {
    type master;
    file "/etc/bind/zones/db.example.com";	# zone file path
    //allow-transfer { 192.168.255.12; };	# private IP address - secondary
};

zone "255.168.192.in-addr.arpa" {
    type master;
    file "/etc/bind/zones/db.192.168.255";      # 192.168.255.0/24 subnet
    //allow-transfer { 192.168.255.12; };       # private IP address - secondary
};

zone "cloudmin.example.com" {
	type master;
	file "/var/lib/bind/cloudmin.example.com.hosts";
	};

Thank you, Michal

Edited May 25, 2020 by Michal Aron