bind should use IP_RECVERR / IPV6_RECVERR to learn about ICMP destination network/host unreachable
Description
bind uses a connect()ed UDP socket when sending recursive queries. A connect()ed socket gets feedback on a variety of ICMP errors (e.g. port unreachable), which bind can then use to decide how to handle the failure (report it to the client, try again with a different nameserver, etc.).
However, Linux's implementation does not report conditions that it considers "transient", which it defines as Destination Host Unreachable, Destination Network Unreachable, Source Route Failed and Message Too Big.
stracing and tcpdumping named shows:
; Setting up the socket:
[pid 7339] 23:06:03.324158 socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP) = 6 <0.000027>
[pid 7339] 23:06:03.324240 fcntl(6, F_DUPFD, 512) = 523 <0.000032>
[pid 7339] 23:06:03.324333 close(6) = 0 <0.000024>
[pid 7339] 23:06:03.324405 fcntl(523, F_GETFL) = 0x2 (flags O_RDWR) <0.000018>
[pid 7339] 23:06:03.324464 fcntl(523, F_SETFL, O_RDWR|O_NONBLOCK) = 0 <0.000018>
[pid 7339] 23:06:03.324520 setsockopt(523, SOL_SOCKET, SO_TIMESTAMP, [1], 4) = 0 <0.000020>
[pid 7339] 23:06:03.324582 setsockopt(523, SOL_IP, IP_MTU_DISCOVER, [5], 4) = 0 <0.000020>
[pid 7339] 23:06:03.324641 getsockopt(523, SOL_SOCKET, SO_RCVBUF, [212992], [4]) = 0 <0.000019>
[pid 7339] 23:06:03.324702 setsockopt(523, SOL_IP, IP_RECVTOS, [1], 4) = 0 <0.000021>
[pid 7339] 23:06:03.324756 bind(523, {sa_family=AF_INET, sin_port=htons(54746), sin_addr=inet_addr("0.0.0.0")}, 16) = 0 <0.000013>
[pid 7339] 23:06:03.324804 recvmsg(523, {msg_namelen=128}, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.000010>
[pid 7339] 23:06:03.324950 connect(523, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.112.36.4")}, 16) = 0 <0.000023>
[pid 7341] 23:06:03.324972 epoll_ctl(8, EPOLL_CTL_ADD, 523, {EPOLLIN, {u32=523, u64=523}}) = 0 <0.000018>
; Now, send the packet:
[pid 7339] 23:06:03.325017 sendmsg(523, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\27\201\0\0\0\1\0\0\0\0\0\0\3www\7example\3com\0\0\1\0"..., iov_len=33}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 33 <0.000066>
; And here are the packets, as seen from tcpdump:
23:06:03.325049 IP 192.168.0.1.54746 > 192.112.36.4.53: 6017 A? www.example.com. (33)
23:06:03.325081 IP 192.168.0.254 > 192.168.0.1: ICMP net 192.112.36.4 unreachable, length 69
; See where the packet was sent from...
[pid 7339] 23:06:03.325155 getsockname(523, {sa_family=AF_INET, sin_port=htons(36615), sin_addr=inet_addr("192.168.0.1")}, [128->16]) = 0 <0.000028>
; Now we wait ...
; And then we time out and give up
[pid 7341] 23:06:03.854391 epoll_ctl(8, EPOLL_CTL_DEL, 523, 0x7ff0a8bcbdd8) = 0 <0.000024>
[pid 7341] 23:06:03.854460 epoll_ctl(8, EPOLL_CTL_DEL, 523, 0x7ff0a8bcbdd8) = -1 ENOENT (No such file or directory) <0.000022>
[pid 7341] 23:06:03.854534 close(523) = 0 <0.000023>
Request
bind should enable IP_RECVERR on the socket (and IPV6_RECVERR on IPv6 sockets):
setsockopt(fd, SOL_IP, IP_RECVERR, &one, sizeof(one));
It would then get EPOLLERR from epoll and an error return from recvmsg(), so it would notice immediately that the send has failed rather than waiting for a timeout.
This would let bind react much more quickly to network events.
Links / references
- libc bug and discussion: https://sourceware.org/bugzilla/show_bug.cgi?id=24047
- dns operations discussion: https://lists.dns-oarc.net/pipermail/dns-operations/2019-January/018271.html
- Linux kernel bug and discussion: https://bugzilla.kernel.org/show_bug.cgi?id=202355
- Linux manpage documentation bug: https://bugzilla.kernel.org/show_bug.cgi?id=202369