lame servers with IPv6 unreachable make dispatch@netmgr stuck on shutdown
So, I narrowed it down to a single query:
dig -p 5300 IN A mail.lab.comcor.ru. @10.10.10.20
which goes like this:
04-Oct-2021 09:25:59.507 resolver priming query complete
04-Oct-2021 09:26:16.119 network unreachable resolving '_.ru/A/IN': 2001:500:2f::f#53
04-Oct-2021 09:26:16.119 network unreachable resolving '_.ru/A/IN': 2001:500:12::d0d#53
04-Oct-2021 09:26:16.135 network unreachable resolving '_.comcor.ru/A/IN': 2001:678:15:0:193:232:142:17#53
04-Oct-2021 09:26:16.135 network unreachable resolving '_.comcor.ru/A/IN': 2001:678:18:0:194:190:124:17#53
04-Oct-2021 09:26:16.135 network unreachable resolving '_.comcor.ru/A/IN': 2001:678:17:0:193:232:128:6#53
04-Oct-2021 09:26:16.135 network unreachable resolving '_.comcor.ru/A/IN': 2001:678:16:0:194:85:252:62#53
04-Oct-2021 09:26:16.135 network unreachable resolving '_.comcor.ru/A/IN': 2001:678:14:0:193:232:156:17#53
04-Oct-2021 09:26:16.143 network unreachable resolving '_.lab.comcor.ru/A/IN': 2a02:290:0:2::5#53
04-Oct-2021 09:26:16.143 network unreachable resolving '_.lab.comcor.ru/A/IN': 2a02:290:0:1::4#53
04-Oct-2021 09:26:16.203 network unreachable resolving 'ns.lab.comcor.ru/AAAA/IN': 2001:678:17:0:193:232:128:6#53
04-Oct-2021 09:26:16.207 network unreachable resolving 'ns.lab.comcor.ru/AAAA/IN': 2001:678:18:0:194:190:124:17#53
04-Oct-2021 09:26:16.207 network unreachable resolving 'ns.lab.comcor.ru/AAAA/IN': 2001:678:16:0:194:85:252:62#53
04-Oct-2021 09:26:16.251 lame server resolving 'mail.lab.comcor.ru' (in 'lab.COMCOR.ru'?): 212.45.0.3#53
04-Oct-2021 09:26:16.335 network unreachable resolving 'ns.lab.comcor.ru/AAAA/IN': 2a02:290:0:2::5#53
04-Oct-2021 09:26:16.443 lame server resolving 'ns.lab.comcor.ru' (in 'lab.COMCOR.ru'?): 212.45.0.3#53
You also need to have broken IPv6 :-), it doesn't happen when I run named -4
.
Edited
This is caused by the new dispatch code:
- Start
named -p 5300 -g -c /dev/null
- Start
dnsperf -s 127.0.0.1 -p 5300 -D -d queryfile-example-10million-201202
- Press Ctrl-C
-
named
doesn't stop:
$ eu-stack -p $(pidof named)
PID 275694 - process
TID 275694:
#0 0x00007f4b7bac9c61 clock_nanosleep@@GLIBC_2.17
#1 0x00007f4b7bacf443 __nanosleep
#2 0x00007f4b7bafa125 usleep
#3 0x00007f4b7c476f19 isc__taskmgr_destroy
#4 0x00007f4b7c45b994 isc_managers_destroy
#5 0x0000564d877b86cf destroy_managers
#6 0x0000564d877b86da cleanup
#7 0x0000564d877ba041 main
#8 0x00007f4b7ba2ad0a __libc_start_main
#9 0x0000564d877ae9ca _start
TID 275708:
#0 0x00007f4b7bb02116 epoll_wait
#1 0x00007f4b7be18b3f uv__io_poll
#2 0x00007f4b7be07714 uv_run
#3 0x00007f4b7c43c33e nm_thread
#4 0x00007f4b7c47ce50 isc__trampoline_run
#5 0x00007f4b7bbd1ea7 start_thread
#6 0x00007f4b7bb01def __clone
TID 275709:
#0 0x00007f4b7bbd8ad8 pthread_cond_timedwait@@GLIBC_2.3.2
#1 0x00007f4b7c44f081 isc_condition_waituntil
#2 0x00007f4b7c479dab run
#3 0x00007f4b7c47ce50 isc__trampoline_run
#4 0x00007f4b7bbd1ea7 start_thread
#5 0x00007f4b7bb01def __clone
TID 275710:
#0 0x00007f4b7bb02116 epoll_wait
#1 0x00007f4b7c46f6f2 netthread
#2 0x00007f4b7c47ce50 isc__trampoline_run
#3 0x00007f4b7bbd1ea7 start_thread
#4 0x00007f4b7bb01def __clone
+1 we are obviously missing a system test for this kind of scenario.
Edited by Ondřej Surý