Commit graph

12763 commits

Author SHA1 Message Date
Witold Kręcicki
b4e27a075a Don't crash if isc_uv_export returns an error in accept_connection.
isc_uv_export can return an error - e.g. EMFILE (from dup), handle this
nicely.

(cherry picked from commit ae9a6befa8)
2020-10-22 14:59:00 -07:00
Ondřej Surý
81085bbeca Fix the way udp_send_direct() is used
There were two problems how udp_send_direct() was used:

1. The udp_send_direct() can return ISC_R_CANCELED (or translated error
   from uv_udp_send()), but the isc__nm_async_udpsend() wasn't checking
   the error code and not releasing the uvreq in case of an error.

2. In isc__nm_udp_send(), when the UDP send is already in the right
   netthread, it uses udp_send_direct() to send the UDP packet right
   away.  When that happened the uvreq was not freed, and the error code
   was returned to the caller.  We need to return ISC_R_SUCCESS and
   rather use the callback to report an error in such case.

(cherry picked from commit afca2e3b21)
2020-10-22 14:59:00 -07:00
Tinderbox User
44e91206a4 prep 9.16.8 2020-10-22 09:09:07 +02:00
Diego Fronza
d5355b8105 Always return address records in additional section for NS queries 2020-10-21 12:12:22 -03:00
Diego Fronza
7a3dbbc395 Fix transfer of glue records in stub zones if master has minimal-responses set
Stub zones don't make use of AXFR/IXFR for the transfering of zone
data, instead, a single query is issued to the master asking for
their nameserver records (NS).

That works fine unless master is configured with 'minimal-responses'
set to yes, in which case glue records are not provided by master
in the answer with nameservers authoritative for the zone, leaving
stub zones with incomplete databases.

This commit fix this problem in a simple way, when the answer with
the authoritative nameservers is received from master (stub_callback),
for each nameserver listed (save_nsrrset), a A and AAAA records for
the name is verified in the additional section, and if not present
a query is created to resolve the corresponsing missing glue.

A struct 'stub_cb_args' was added to keep relevant information for
performing a query, like TSIG key, udp size, dscp value, etc, this
information is borrowed from, and created within function 'ns_query',
where the resolving of nameserver from master starts.

A new field was added to the struct 'dns_stub', an atomic integer,
namely pending_requests, which is used to keep how many queries are
created when resolving nameserver addresses that were missing in
the glue.

When the value of pending_requests is zero we know we can release
resources, adjust zone timers, dump to zone file, etc.
2020-10-21 12:11:31 -03:00
Matthijs Mekking
5c0b5b64e5 Don't increment network error stats on UV_EOF
When networking statistics was added to the netmgr (in commit
5234a8e00a), two lines were added that
increment the 'STATID_RECVFAIL' statistic: One if 'uv_read_start'
fails and one at the end of the 'read_cb'.  The latter happens
if 'nread < 0'.

According to the libuv documentation, I/O read callbacks (such as for
files and sockets) are passed a parameter 'nread'. If 'nread' is less
than 0, there was an error and 'UV_EOF' is the end of file error, which
you may want to handle differently.

In other words, we should not treat EOF as a RECVFAIL error.

(cherry picked from commit 6c5ff94218)
2020-10-20 14:05:09 +00:00
Mark Andrews
da0a7a34ec Complete the isc_nmhandle_detach() in the worker thread.
isc_nmhandle_detach() needs to complete in the same thread
as shutdown_walk_cb() to avoid a race.  Clear the caller's
pointer then pass control to the worker if necessary.

    WARNING: ThreadSanitizer: data race
    Write of size 8 at 0x000000000001 by thread T1:
    #0 isc_nmhandle_detach lib/isc/netmgr/netmgr.c:1258:15
    #1 control_command bin/named/controlconf.c:388:3
    #2 dispatch lib/isc/task.c:1152:7
    #3 run lib/isc/task.c:1344:2

    Previous read of size 8 at 0x000000000001 by thread T2:
    #0 isc_nm_pauseread lib/isc/netmgr/netmgr.c:1449:33
    #1 recv_data lib/isccc/ccmsg.c:109:2
    #2 isc__nm_tcp_shutdown lib/isc/netmgr/tcp.c:1157:4
    #3 shutdown_walk_cb lib/isc/netmgr/netmgr.c:1515:3
    #4 uv_walk <null>
    #5 process_queue lib/isc/netmgr/netmgr.c:659:4
    #6 process_normal_queue lib/isc/netmgr/netmgr.c:582:10
    #7 process_queues lib/isc/netmgr/netmgr.c:590:8
    #8 async_cb lib/isc/netmgr/netmgr.c:548:2
    #9 <null> <null>

(cherry picked from commit f95ba8aa20)
2020-10-15 11:03:47 +11:00
Ondřej Surý
dbf2d0b15f Clean the last remnant of ISC_PLATFORM_HAVEIPV6 macro
In set_sndbuf() we were using ISC_PLATFORM_HAVEIPV6 macro that doesn't
exist anymore, because we assume that IPv6 support is always available.

(cherry picked from commit 96ac91a18a)
2020-10-08 09:03:08 +02:00
Ondřej Surý
53d6a11a0e Clone the csock in accept_connection(), not in callback
If we clone the csock (children socket) in TCP accept_connection()
instead of passing the ssock (server socket) to the call back and
cloning it there we unbreak the assumption that every socket is handled
inside it's own worker thread and therefore we can get rid of (at least)
callback locking.

(cherry picked from commit e8b56acb49)
2020-10-08 08:16:54 +02:00
Ondřej Surý
69e6c8467c Change the isc__nm_tcpdns_stoplistening() to be asynchronous event
The isc__nm_tcpdns_stoplistening() would call isc__nmsocket_clearcb()
that would clear the .accept_cb from non-netmgr thread.  Change the
tcpdns_stoplistening to enqueue ievent that would get processed in the
right netmgr thread to avoid locking.

(cherry picked from commit d86a74d8a4)
2020-10-08 08:16:53 +02:00
Mark Andrews
84922b2dc7 Restore the dns_message_reset() call before the dns_dispatch_getnext()
This was accidentally lost in the process of moving rmessage from fctx
to query.  Without this dns_message_setclass() will fail.

(cherry picked from commit 1f63bb15b3)
2020-10-08 16:27:10 +11:00
Mark Andrews
33d7b5b56f Silence Coverity REVERSE_INULL report
message does not need to be tested to NULL

(cherry picked from commit f0a66cb5aa)
2020-10-06 23:37:13 +11:00
Ondřej Surý
58a518adca Change the default ENDS buffer size to 1232 for DNS Flag Day 2020
The DNS Flag Day 2020 aims to remove the IP fragmentation problem from
the UDP DNS communication.  In this commit, we implement the minimal
required changes by changing the defaults for `edns-udp-size`,
`max-udp-size` and `nocookie-udp-size` to `1232` (the value picked by
DNS Flag Day 2020).

(cherry picked from commit bb990030d3)
2020-10-06 09:35:20 +02:00
Ondřej Surý
ccd2902a02 Split reusing the addr/port and load-balancing socket options
The SO_REUSEADDR, SO_REUSEPORT and SO_REUSEPORT_LB has different meaning
on different platform. In this commit, we split the function to set the
reuse of address/port and setting the load-balancing into separate
functions.

The libuv library already have multiplatform support for setting
SO_REUSEADDR and SO_REUSEPORT that allows binding to the same address
and port, but unfortunately, when used after the load-balancing socket
options have been already set, it overrides the previous setting, so we
need our own helper function to enable the SO_REUSEADDR/SO_REUSEPORT
first and then enable the load-balancing socket option.

(cherry picked from commit fd975a551d)
2020-10-05 16:19:23 +02:00
Ondřej Surý
5cff119533 Use uv_os_sock_t instead of uv_os_fd_t for sockets
On POSIX based systems both uv_os_sock_t and uv_os_fd_t are both typedef
to int.  That's not true on Windows, where uv_os_sock_t is SOCKET and
uv_os_fd_t is HANDLE and they differ in level of indirection.

(cherry picked from commit acb6ad9e3c)
2020-10-05 16:19:23 +02:00
Ondřej Surý
601fc37efe Refactor isc__nm_socket_freebind() to take fd and sa_family as args
The isc__nm_socket_freebind() has been refactored to match other
isc__nm_socket_...() helper functions and take uv_os_fd_t and
sa_family_t as function arguments.

(cherry picked from commit 9dc01a636b)
2020-10-05 16:19:23 +02:00
Ondřej Surý
c3d721b13b Add helper function to enable DF (don't fragment) flag on UDP sockets
This commits add isc__nm_socket_dontfrag() helper functions.

(cherry picked from commit d685bbc822)
2020-10-05 16:19:23 +02:00
Ondřej Surý
72b85a4100 Add SO_REUSEPORT and SO_INCOMING_CPU helper functions
The setting of SO_REUSE**** and SO_INCOMING_CPU have been moved into a
separate helper functions.

(cherry picked from commit 5daaca7146)
2020-10-05 16:19:23 +02:00
Matthijs Mekking
63652ca58f Use explicit result codes for 'rndc dnssec' cmd
It is better to add new result codes than to overload existing codes.

(cherry picked from commit 70d1ec432f)
2020-10-05 11:20:35 +02:00
Matthijs Mekking
6bbb2a8581 Various rndc dnssec -checkds fixes
While working on 'rndc dnssec -rollover' I noticed the following
(small) issues:

- The key files where updated with hints set to "-when" and that
  should always be "now.
- The kasp system test did not properly update the test number when
  calling 'rndc dnssec -checkds' (and ensuring that works).
- There was a missing ']' in the rndc.c help output.

(cherry picked from commit edc53fc416)
2020-10-05 11:20:35 +02:00
Matthijs Mekking
5bbecc5116 Test rndc rollover inactive key
When users (accidentally) try to roll an inactive key, throw an error.

(cherry picked from commit fcd34abb9e)
2020-10-05 11:20:35 +02:00
Matthijs Mekking
ad48f07c9a Add manual key rollover logic
Add to the keymgr a function that will schedule a rollover. This
basically means setting the time when the key needs to retire,
and updating the key lifetime, then update the state file. The next
time that named runs the keymgr the new lifetime will be taken into
account.

(cherry picked from commit df8276aef0)
2020-10-05 11:20:35 +02:00
Matthijs Mekking
79f9a5ddd5 Change condition for rndc dumpdb -expired
After backporting #1870 to 9.11-S I saw that the condition check there
is different than in the main branch. In 9.11-S "stale" can mean
stale and serve-stale, or not active (awaiting cleanup). In 9.16 and
later versions, "stale" is stale and serve-stale, and "ancient" means
not active (awaiting cleanup). An "ancient" RRset is one that is not
active (TTL expired) and is not eligble for serve-stale.

Update the condition for rndc dumpdb -expired to closer match what is
in 9.11-S.

(cherry picked from commit 5614454c3b)
2020-10-05 10:46:14 +02:00
Matthijs Mekking
456925d6ec Fix kasp min key size bug
The minimal size for RSASHA1, RSASHA256 is 512, but due to bad
assignment it was set to 1024.

(cherry picked from commit 7c555254fe)
2020-10-02 10:18:59 +02:00
Matthijs Mekking
a63dad13da Fix Ed25519 and Ed448 in dnssec-policy keymgr
The kasp code had bad implicit size values for the cryptographic
algorithms Ed25519 and Ed448. When creating keys they would never
match the dnssec-policy, leading to new attempts to create keys.

These algorithms were previously not yet added to the system tests,
due to lack of availability on some systems.

(cherry picked from commit 0e207392ec)
2020-10-02 10:18:25 +02:00
Michał Kępień
9e62c206c6 Allow "order none" in "rrset-order" rules
named-checkconf treats the following configuration as valid:

    options {
        rrset-order {
            order none;
        };
    };

Yet, the above configuration causes named to crash on startup with:

    order.c:74: REQUIRE(mode == 0x00000800 || mode == 0x00000400 || mode == 0x00800000) failed, back trace

Add DNS_RDATASETATTR_NONE to the list of RRset ordering modes accepted
by dns_order_add() to allow "order none" to be used in "rrset-order"
rules.  This both prevents the aforementioned crashes and addresses the
discrepancy between named-checkconf and named.

(cherry picked from commit dbcf683c1a)
2020-10-02 08:50:51 +02:00
Ondřej Surý
50db10b7ca Fix the clang 12 warnings with multi-line strings in string arrays
The clang 12 has a new warning that warns when using multi-line strings
in the string arrays, f.e.:

    { "aa",
      "b"
      "b",
      "cc" }

would generate warning like this:

    private_test.c:162:7: error: suspicious concatenation of string literals in an array initialization; did you mean to separate the elements with a comma? [-Werror,-Wstring-concatenation]
                                      "33333/RSASHA1" };
                                      ^
    private_test.c:161:7: note: place parentheses around the string literal to silence warning
                                      "Done removing signatures for key "
                                      ^
    private_test.c:197:7: error: suspicious concatenation of string literals in an array initialization; did you mean to separate the elements with a comma? [-Werror,-Wstring-concatenation]
                                      "NSEC chain",
                                      ^
    private_test.c:196:7: note: place parentheses around the string literal to silence warning
                                      "Removing NSEC3 chain 1 0 30 DEAF / creating "
                                      ^
    2 errors generated.

(cherry picked from commit 7b07f22969)
2020-10-01 18:42:11 +02:00
Ondřej Surý
7a90ad1fe2 Add separate prefetch nmhandle to ns_client_t
As the query_prefetch() or query_rpzfetch() could be called during
"regular" fetch, we need to introduce separate storage for attaching
the nmhandle during prefetching the records.  The query_prefetch()
and query_rpzfetch() are guarded for re-entrance by .query.prefetch
member of ns_client_t, so we can reuse the same .prefetchhandle for
both.

(cherry picked from commit d4976e0ebe)
2020-10-01 18:09:35 +02:00
Ondřej Surý
1126fe3b5b Refactor the pausing/unpausing and finishing the nm_thread
The isc_nm_pause(), isc_nm_resume() and finishing the nm_thread() from
nm_destroy() has been refactored, so all use the netievents instead of
directly touching the worker structure members.  This allows us to
remove most of the locking as the .paused and .finished members are
always accessed from the matching nm_thread.

When shutting down the nm_thread(), instead of issuing uv_stop(), we
just shutdown the .async handler, so all uv_loop_t events are properly
finished first and uv_run() ends gracefully with no outstanding active
handles in the loop.

(cherry picked from commit e5ab137ba3)
2020-10-01 18:09:35 +02:00
Witold Kręcicki
4a7dfd69ac tracing of active sockets and handles
If NETMGR_TRACE is defined, we now maintain a list of active sockets
in the netmgr object and a list of active handles in each socket
object; by walking the list and printing `backtrace` in a debugger
we can see where they were created, to assist in in debugging of
reference counting errors.

On shutdown, if netmgr finds there are still active sockets after
waiting, isc__nm_dump_active() will be called to log the list of
active sockets and their underlying handles, along with some details
about them.

(cherry picked from commit 00e04a86c8)
2020-10-01 18:09:35 +02:00
Evan Hunt
686b73ae25 limit the time we wait for netmgr to be destroyed
if more than 10 seconds pass while we wait for netmgr events to
finish running on shutdown, something is almost certainly wrong
and we should assert and crash.

(cherry picked from commit 2f2d60a989)
2020-10-01 18:09:35 +02:00
Ondřej Surý
5a92958fba properly lock the setting/unsetting of callbacks in isc_nmsocket_t
changes to socket callback functions were not thread safe.

(cherry picked from commit 89c534d3b9)
2020-10-01 18:09:35 +02:00
Evan Hunt
ba2e9dfb99 change from isc_nmhandle_ref/unref to isc_nmhandle attach/detach
Attaching and detaching handle pointers will make it easier to
determine where and why reference counting errors have occurred.

A handle needs to be referenced more than once when multiple
asynchronous operations are in flight, so callers must now maintain
multiple handle pointers for each pending operation. For example,
ns_client objects now contain:

        - reqhandle:    held while waiting for a request callback (query,
                        notify, update)
        - sendhandle:   held while waiting for a send callback
        - fetchhandle:  held while waiting for a recursive fetch to
                        complete
        - updatehandle: held while waiting for an update-forwarding
                        task to complete

(cherry picked from commit 57b4dde974)
2020-10-01 18:09:35 +02:00
Witold Kręcicki
0202b289c2 assorted small netmgr-related changes
- rename isc_nmsocket_t->tcphandle to statichandle
- cancelread functions now take handles instead of sockets
- add a 'client' flag in socket objects, currently unused, to
  indicate whether it is to be used as a client or server socket

(cherry picked from commit 7eb4564895)
2020-10-01 16:44:43 +02:00
Evan Hunt
7a4e97ef50 Use different allocators for UDP and TCP
Each worker has a receive buffer with space for 20 DNS messages of up
to 2^16 bytes each, and the allocator function passed to uv_read_start()
or uv_udp_recv_start() will reserve a portion of it for use by sockets.
UDP can use recvmmsg() and so it needs that entire space, but TCP reads
one message at a time.

This commit introduces separate allocator functions for TCP and UDP
setting different buffer size limits, so that libuv will provide the
correct buffer sizes to each of them.

(cherry picked from commit 38264b6a4d)
2020-10-01 16:44:43 +02:00
Witold Kręcicki
f0b089d922 netmgr: retry binding with IP_FREEBIND when EADDRNOTAVAIL is returned.
When a new IPv6 interface/address appears it's first in a tentative
state - in which we cannot bind to it, yet it's already being reported
by the route socket. Because of that BIND9 is unable to listen on any
newly detected IPv6 addresses. Fix it by setting IP_FREEBIND option (or
equivalent option on other OSes) and then retrying bind() call.

(cherry picked from commit a0f7d28967)
2020-10-01 16:44:43 +02:00
Evan Hunt
bc5ea9d65e use handles for isc_nm_pauseread() and isc_nm_resumeread()
by having these functions act on netmgr handles instead of socket
objects, they can be used in callback functions outside the netgmr.

(cherry picked from commit 55896df79d)
2020-10-01 16:44:43 +02:00
Evan Hunt
6b77bd309a Don't destroy a non-closed socket, wait for all the callbacks.
We erroneously tried to destroy a socket after issuing
isc__nm_tcp{,dns}_close. Under some (race) circumstances we could get
nm_socket_cleanup to be called twice for the same socket, causing an
access to a dead memory.

(cherry picked from commit 233f134a4f)
2020-10-01 16:44:43 +02:00
Witold Kręcicki
5a0fbc41ec Fix possible race in isc__nm_tcpconnect.
There's a possibility of race in isc__nm_tcpconnect if the asynchronous
connect operation finishes with all the callbacks before we exit the
isc__nm_tcpconnect itself we might access an already freed memory.
Fix it by creating an additional reference to the socket freed at the
end of isc__nm_tcpconnect.

(cherry picked from commit 896db0f419)
2020-10-01 16:44:43 +02:00
Witold Kręcicki
ebb4d506d6 Add missing libisc.def definitions, netmgr version of isc_sockettype_t.
(cherry picked from commit 25f84ffc68)
2020-10-01 16:44:43 +02:00
Evan Hunt
b14cb9e2f1 restore "blackhole" functionality
the blackhole ACL was accidentally disabled with respect to client
queries during the netmgr conversion.

in order to make this work for TCP, it was necessary to add a return
code to the accept callback functions passed to isc_nm_listentcp() and
isc_nm_listentcpdns().

(cherry picked from commit 23c7373d68)
2020-10-01 16:44:43 +02:00
Evan Hunt
80569bf977 Make netmgr tcpdns send calls asynchronous
isc__nm_tcpdns_send() was not asynchronous and accessed socket
internal fields in an unsafe manner, which could lead to a race
condition and subsequent crash. Fix it by moving tcpdns processing
to a proper netmgr thread.

(cherry picked from commit 591b79b597)
2020-10-01 16:44:43 +02:00
Witold Kręcicki
3942b226b8 Fix a shutdown race in netmgr udp
We need to mark the socket as inactive early (and synchronously)
in the stoplistening process; otherwise we might destroy the
callback argument before we actually stop listening, and call
the callback on bad memory.

(cherry picked from commit 1cf65cd882)
2020-10-01 16:44:43 +02:00
Evan Hunt
ca39572e5d clean up outerhandle when a tcpdns socket is disconnected
this prevents a crash when some non-netmgr thread, such as a
recursive lookup, times out after the TCP socket is already
disconnected.

(cherry picked from commit 3704c4fff2)
2020-10-01 16:44:43 +02:00
Evan Hunt
f64a881a30 change the signature of recv callbacks to include a result code
this will allow recv event handlers to distinguish between cases
in which the region is NULL because of error, shutdown, or cancelation.

(cherry picked from commit 75c985c07f)
2020-10-01 16:44:43 +02:00
Evan Hunt
d9d482e9e2 implement isc_nm_cancelread()
The isc_nm_cancelread() function cancels reading on a connected
socket and calls its read callback function with a 'result'
parameter of ISC_R_CANCELED.

(cherry picked from commit 5191ec8f86)
2020-10-01 16:44:43 +02:00
Evan Hunt
e1ebbaacea shorten the sleep in isc_nm_destroy()
when isc_nm_destroy() is called, there's a loop that waits for
other references to be detached, pausing and unpausing the netmgr
to ensure that all the workers' events are run, followed by a
1-second sleep. this caused a delay on shutdown which will be
noticeable when netmgr is used in tools other than named itself,
so the delay has now been reduced to a hundredth of a second.

(cherry picked from commit 870204fe47)
2020-10-01 16:44:43 +02:00
Evan Hunt
a9061ea123 implement isc_nm_tcpconnect()
the isc_nm_tcpconnect() function establishes a client connection via
TCP.  once the connection is esablished, a callback function will be
called with a newly created network manager handle.

(cherry picked from commit abbb79f9d1)
2020-10-01 16:44:43 +02:00
Witold Kręcicki
8db2ef9f8e allow tcpdns sockets to self-reference while connected
A TCPDNS socket creates a handle for each complete DNS message.

Previously, when all the handles were disconnected, the socket
would be closed, but the wrapped TCP socket might still have
more to read.

Now, when a connection is established, the TCPDNS socket creates
a reference to itself by attaching itself to sock->self. This
reference isn't cleared until the connection is closed via
EOF, timeout, or server shutdown. This allows the socket to remain
open even when there are no active handles for it.

(cherry picked from commit cd79b49538)
2020-10-01 16:44:43 +02:00
Evan Hunt
4209f051e9 modify reference counting within netmgr
- isc__nmhandle_get() now attaches to the sock in the nmhandle object.
  the caller is responsible for dereferencing the original socket
  pointer when necessary.
- tcpdns listener sockets attach sock->outer to the outer tcp listener
  socket. tcpdns connected sockets attach sock->outerhandle to the handle
  for the tcp connected socket.
- only listener sockets need to be attached/detached directly. connected
  sockets should only be accessed and reference-counted via their
  associated handles.

(cherry picked from commit 5ea26ee1f1)
2020-10-01 16:44:43 +02:00