Stub zones don't make use of AXFR/IXFR for the transfering of zone
data, instead, a single query is issued to the master asking for
their nameserver records (NS).
That works fine unless master is configured with 'minimal-responses'
set to yes, in which case glue records are not provided by master
in the answer with nameservers authoritative for the zone, leaving
stub zones with incomplete databases.
This commit fix this problem in a simple way, when the answer with
the authoritative nameservers is received from master (stub_callback),
for each nameserver listed (save_nsrrset), a A and AAAA records for
the name is verified in the additional section, and if not present
a query is created to resolve the corresponsing missing glue.
A struct 'stub_cb_args' was added to keep relevant information for
performing a query, like TSIG key, udp size, dscp value, etc, this
information is borrowed from, and created within function 'ns_query',
where the resolving of nameserver from master starts.
A new field was added to the struct 'dns_stub', an atomic integer,
namely pending_requests, which is used to keep how many queries are
created when resolving nameserver addresses that were missing in
the glue.
When the value of pending_requests is zero we know we can release
resources, adjust zone timers, dump to zone file, etc.
When networking statistics was added to the netmgr (in commit
5234a8e00a), two lines were added that
increment the 'STATID_RECVFAIL' statistic: One if 'uv_read_start'
fails and one at the end of the 'read_cb'. The latter happens
if 'nread < 0'.
According to the libuv documentation, I/O read callbacks (such as for
files and sockets) are passed a parameter 'nread'. If 'nread' is less
than 0, there was an error and 'UV_EOF' is the end of file error, which
you may want to handle differently.
In other words, we should not treat EOF as a RECVFAIL error.
isc_nmhandle_detach() needs to complete in the same thread
as shutdown_walk_cb() to avoid a race. Clear the caller's
pointer then pass control to the worker if necessary.
WARNING: ThreadSanitizer: data race
Write of size 8 at 0x000000000001 by thread T1:
#0 isc_nmhandle_detach lib/isc/netmgr/netmgr.c:1258:15
#1 control_command bin/named/controlconf.c:388:3
#2 dispatch lib/isc/task.c:1152:7
#3 run lib/isc/task.c:1344:2
Previous read of size 8 at 0x000000000001 by thread T2:
#0 isc_nm_pauseread lib/isc/netmgr/netmgr.c:1449:33
#1 recv_data lib/isccc/ccmsg.c:109:2
#2 isc__nm_tcp_shutdown lib/isc/netmgr/tcp.c:1157:4
#3 shutdown_walk_cb lib/isc/netmgr/netmgr.c:1515:3
#4 uv_walk <null>
#5 process_queue lib/isc/netmgr/netmgr.c:659:4
#6 process_normal_queue lib/isc/netmgr/netmgr.c:582:10
#7 process_queues lib/isc/netmgr/netmgr.c:590:8
#8 async_cb lib/isc/netmgr/netmgr.c:548:2
#9 <null> <null>
If we clone the csock (children socket) in TCP accept_connection()
instead of passing the ssock (server socket) to the call back and
cloning it there we unbreak the assumption that every socket is handled
inside it's own worker thread and therefore we can get rid of (at least)
callback locking.
The isc__nm_tcpdns_stoplistening() would call isc__nmsocket_clearcb()
that would clear the .accept_cb from non-netmgr thread. Change the
tcpdns_stoplistening to enqueue ievent that would get processed in the
right netmgr thread to avoid locking.
The DNS Flag Day 2020 aims to remove the IP fragmentation problem from
the UDP DNS communication. In this commit, we implement the required
changes and simplify the logic for picking the EDNS Buffer Size.
1. The defaults for `edns-udp-size`, `max-udp-size` and
`nocookie-udp-size` have been changed to `1232` (the value picked by
DNS Flag Day 2020).
2. The probing heuristics that would try 512->4096->1432->1232 buffer
sizes has been removed and the resolver will always use just the
`edns-udp-size` value.
3. Instead of just disabling the PMTUD mechanism on the UDP sockets, we
now set IP_DONTFRAG (IPV6_DONTFRAG) flag. That means that the UDP
packets won't get ever fragmented. If the ICMP packets are lost the
UDP will just timeout and eventually be retried over TCP.
The SO_REUSEADDR, SO_REUSEPORT and SO_REUSEPORT_LB has different meaning
on different platform. In this commit, we split the function to set the
reuse of address/port and setting the load-balancing into separate
functions.
The libuv library already have multiplatform support for setting
SO_REUSEADDR and SO_REUSEPORT that allows binding to the same address
and port, but unfortunately, when used after the load-balancing socket
options have been already set, it overrides the previous setting, so we
need our own helper function to enable the SO_REUSEADDR/SO_REUSEPORT
first and then enable the load-balancing socket option.
On POSIX based systems both uv_os_sock_t and uv_os_fd_t are both typedef
to int. That's not true on Windows, where uv_os_sock_t is SOCKET and
uv_os_fd_t is HANDLE and they differ in level of indirection.
The isc__nm_socket_freebind() has been refactored to match other
isc__nm_socket_...() helper functions and take uv_os_fd_t and
sa_family_t as function arguments.
While working on 'rndc dnssec -rollover' I noticed the following
(small) issues:
- The key files where updated with hints set to "-when" and that
should always be "now.
- The kasp system test did not properly update the test number when
calling 'rndc dnssec -checkds' (and ensuring that works).
- There was a missing ']' in the rndc.c help output.
Add to the keymgr a function that will schedule a rollover. This
basically means setting the time when the key needs to retire,
and updating the key lifetime, then update the state file. The next
time that named runs the keymgr the new lifetime will be taken into
account.
After backporting #1870 to 9.11-S I saw that the condition check there
is different than in the main branch. In 9.11-S "stale" can mean
stale and serve-stale, or not active (awaiting cleanup). In 9.16 and
later versions, "stale" is stale and serve-stale, and "ancient" means
not active (awaiting cleanup). An "ancient" RRset is one that is not
active (TTL expired) and is not eligble for serve-stale.
Update the condition for rndc dumpdb -expired to closer match what is
in 9.11-S.
The kasp code had bad implicit size values for the cryptographic
algorithms Ed25519 and Ed448. When creating keys they would never
match the dnssec-policy, leading to new attempts to create keys.
These algorithms were previously not yet added to the system tests,
due to lack of availability on some systems.
named-checkconf treats the following configuration as valid:
options {
rrset-order {
order none;
};
};
Yet, the above configuration causes named to crash on startup with:
order.c:74: REQUIRE(mode == 0x00000800 || mode == 0x00000400 || mode == 0x00800000) failed, back trace
Add DNS_RDATASETATTR_NONE to the list of RRset ordering modes accepted
by dns_order_add() to allow "order none" to be used in "rrset-order"
rules. This both prevents the aforementioned crashes and addresses the
discrepancy between named-checkconf and named.
The fuzzing harness operates on dns_message_t in non-standard ways
and if 'sig0name' is non-NULL when msgresetsigs() and
dns_message_renderreset() are called it should be cleaned up.
The dns_message_create() function cannot soft fail (as all memory
allocations either succeed or cause abort), so we change the function to
return void and cleanup the calls.
This commit fix the problems that arose when moving the dns_message_t
object from fetchctx_t to the query structure.
Since the lifetime of query objects are different than that of a fetchctx
and the dns_message_t object held by the query may be being used by some
external module, e.g. validator, even after the query may have been destroyed,
propery handling of the references to the message were added in this commit to
avoid accessing an already destroyed object.
Specifically, in rctx_done(), a reference to the message is attached at
the beginning of the function and detached at the end, since a possible call
to fctx_cancelquery() would release the dns_message_t object, and in the next
lines of code a call to rctx_nextserver() or rctx_chaseds() would require
a valid pointer to the same object.
In valcreate() a new reference is attached to the message object, this
ensures that if the corresponding query object is destroyed before the
validator attempts to access it, no invalid pointer access occurs.
In validated() we have to attach a new reference to the message, since
we destroy the validator object at the beginning of the function,
and we need access to the message in the next lines of the same function.
rctx_nextserver() and rctx_chaseds() functions were adapted to receive
a new parameter of dns_message_t* type, this was so they could receive a
valid reference to a dns_message_t since using the response context respctx_t
to access the message through rctx->query->rmessage could lead to an already
released reference due to the query being canceled.
The assertion failure REQUIRE(msg->state == DNS_SECTION_ANY),
caused by calling dns_message_setclass within function resquery_response()
in resolver.c, was happening due to wrong management of dns message_t
objects used to process responses to the queries issued by the resolver.
Before the fix, a resolver's fetch context (fetchctx_t) would hold
a pointer to the message, this same reference would then be used over all
the attempts to resolve the query, trying next server, etc... for this to work
the message object would have it's state reset between each iteration, marking
it as ready for a new processing.
The problem arose in a scenario with many different forwarders configured,
managing the state of the dns_message_t object was lacking better
synchronization, which have led it to a invalid dns_message_t state in
resquery_response().
Instead of adding unnecessarily complex code to synchronize the object,
the dns_message_t object was moved from fetchctx_t structure to the
query structure, where it better belongs to, since each query will produce
a response, this way whenever a new query is created an associated
dns_messate_t is also created.
This commit deals mainly with moving the dns_message_t object from fetchctx_t
to the query structure.
The isc_nm_pause(), isc_nm_resume() and finishing the nm_thread() from
nm_destroy() has been refactored, so all use the netievents instead of
directly touching the worker structure members. This allows us to
remove most of the locking as the .paused and .finished members are
always accessed from the matching nm_thread.
When shutting down the nm_thread(), instead of issuing uv_stop(), we
just shutdown the .async handler, so all uv_loop_t events are properly
finished first and uv_run() ends gracefully with no outstanding active
handles in the loop.
Since Mac OS X 10.1, Mach-O object files are by default built with a
so-called two-level namespace which prevents symbol lookups in BIND unit
tests that attempt to override the implementations of certain library
functions from working as intended. This feature can be disabled by
passing the "-flat_namespace" flag to the linker. Fix unit tests
affected by this issue on macOS by adding "-flat_namespace" to LDFLAGS
used for building all object files on that operating system (it is not
enough to only set that flag for the unit test executables).
As currently used in the BIND source tree, the --wrap linker option is
redundant because:
- static builds are no longer supported,
- there is no need to wrap around existing functions - what is
actually required (at least for now) is to replace them altogether
in unit tests,
- only functions exposed by shared libraries linked into unit test
binaries are currently being replaced.
Given the above, providing the alternative implementations of functions
to be overridden in lib/ns/tests/nstest.c is a much simpler alternative
to using the --wrap linker option. Drop the code detecting support for
the latter from configure.ac, simplify the relevant Makefile.am, and
remove lib/ns/tests/wrap.c, updating lib/ns/tests/nstest.c accordingly
(it is harmless for unit tests which are not calling the overridden
functions).
The typical sequence of events for AAAA queries which trigger recursion
for an A RRset at the same name is as follows:
1. Original query context is created.
2. An AAAA RRset is found in cache.
3. Client-specific data is allocated from the filter-aaaa memory pool.
4. Recursion is triggered for an A RRset.
5. Original query context is torn down.
6. Recursion for an A RRset completes.
7. A second query context is created.
8. Client-specific data is retrieved from the filter-aaaa memory pool.
9. The response to be sent is processed according to configuration.
10. The response is sent.
11. Client-specific data is returned to the filter-aaaa memory pool.
12. The second query context is torn down.
However, steps 6-12 are not executed if recursion for an A RRset is
canceled. Thus, if named is in the process of recursing for A RRsets
when a shutdown is requested, the filter-aaaa memory pool will have
outstanding allocations which will never get released. This in turn
leads to a crash since every memory pool must not have any outstanding
allocations by the time isc_mempool_destroy() is called.
Fix by creating a stub query context whenever fetch_callback() is called,
including cancellation events. When the qctx is destroyed, it will ensure
the client is detached and the plugin memory is freed.
By changing the check in 'rdatasetiter_first' and 'rdatasetiter_next'
from "now > header->rdh_ttl" to "now - RBDTB_VIRTUAL > header->rdh_ttl"
we include expired rdataset entries so that they can be used for
"rndc dumpdb -expired".
The message buffer passed to ns__client_request is only valid for
the life of the the ns__client_request call. Save a copy of it
when we recurse or process a update as ns__client_request will
return before those operations complete.