After refactoring of `validated()`, the `maybe_destroy()` function is
no longer expected to actually destroy the fetch context when it is
being called, so effectively it only ensures that the validators are
canceled when the context has no more queries and pending events, but
that is redundant, because `maybe_destroy()` `REQUIRE`s that the context
should be in the shutting down state, and the function which sets that
state is already canceling the validators in its own turn.
As a failsafe, to make sure that no validators will be created after
`fctx_doshutdown()` is called, add an early return from `valcreate()` if
the context is in the shutting down state.
The `resolver.c:validated()` function unlinks the current validator from
the fetch's validators list, which can leave it empty, then unlocks
the bucket lock. If, by a chance, the fetch was timed out just before
the `validated()` call, the final timeout callback running in parallel
with `validated()` can find the fetch context with no active fetches
and with an empty validators list and destroy it, which is unexpected
for the `validated()` function and can lead to a crash.
Increase the fetch context's reference count in the beginning of
`validated()` and decrease it when it finishes its work to avoid the
unexpected destruction of the fetch context.
The current logic for determining the address of the socket to which a
client sent its query is:
1. Get the address:port tuple from the netmgr handle using
isc_nmhandle_localaddr() or from the ns_interface_t structure.
2. Convert the address:port tuple from step 1 into an isc_netaddr_t
using isc_netaddr_fromsockaddr().
3. Convert the address from step 2 back into a socket address with the
port set to 0 using isc_sockaddr_fromnetaddr().
Note that the port number (readily available in the netmgr handle or in
the ns_interface_t structure) is needlessly lost in the process,
preventing it from being recorded in dnstap captures of client traffic
produced by named.
Fix by first storing the address:port tuple in client->destsockaddr and
then creating an isc_netaddr_t from that structure. This allows the
port number to be retained in client->destsockaddr, which is what
subsequently gets passed to dns_dt_send().
Remove an outdated code comment.
(cherry picked from commit 2f945703f2)
Under specific rare timing circumstances the uv_read_start() could
fail with UV_EINVAL when the connection is reset between the connect (or
accept) and the uv_read_start() call on the nmworker loop. Handle such
situation gracefully by propagating the errors from uv_read_start() into
upper layers, so the socket can be internally closed().
(cherry picked from commit b432d5d3bc)
When there are multiple record datasets in a database node of a catalog
zone, and BIND encounters a soft error during processing of a dataset,
it breaks from the loop and doesn't process the other datasets in the
node.
There are cases when this is not desired. For example, the catalog zones
draft version 5 states that there must be a TXT RRset named
`version.$CATZ` with exactly one RR, but it doesn't set a limitation
on possible non-TXT RRsets named `version.$CATZ` existing alongside
with the TXT one. In case when one exists, we will get a processing
error and will not continue the loop to process the TXT RRset coming
next.
Remove the "break" statement to continue processing all record datasets.
(cherry picked from commit 0b2d5490cd)
When processing a catalog zone update, skip processing records with
DNSSEC-related and ZONEMD types, because we are not interested in them
in the context of a catalog zone, and processing them will fail and
produce an unnecessary warning message.
(cherry picked from commit 73d6643137)
The current implementation of isc_queue uses Michael-Scott lock-free
queue that in turn uses hazard pointers. It was discovered that the way
we use the isc_queue, such complicated mechanism isn't really needed,
because most of the time, we either execute the work directly when on
nmthread (in case of UDP) or schedule the work from the matching
nmthreads.
Replace the current implementation of the isc_queue with a simple locked
ISC_LIST. There's a slight improvement - since copying the whole list
is very lightweight - we move the queue into a new list before we start
the processing and locking just for moving the queue and not for every
single item on the list.
NOTE: There's a room for future improvements - since we don't guarantee
the order in which the netievents are processed, we could have two lists
- one unlocked that would be used when scheduling the work from the
matching thread and one locked that would be used from non-matching
thread.
(cherry picked from commit 6bd025942c)
Typing from libuv structure to isc_region_t is not possible, because
their sizes differ on 64 bit architectures. Little endian machines seems
to be lucky and still result in test passed. But big endian machine such
as s390x fails the test reliably.
Fix by directly creating the buffer as isc_region_t and skipping the
type conversion. More readable and still more correct.
(cherry picked from commit 057438cb45)
Make sure that the key structure is valid when calling the following
functions:
- dst_key_setexternal
- dst_key_isexternal
- dst_key_setmodified
- dst_key_ismodified
This commit is adapted because 9.16 has a different approach
of deconsting the variable.
(cherry picked from commit 888ec4e0d4)
Setting the sock->write_timeout from the TCP, TCPDNS, and TLSDNS send
functions could lead to (harmless) data race when setting the value for
the first time when the isc_nm_send() function would be called from
thread not-matching the socket we are sending to. Move the setting the
sock->write_timeout to the matching async function which is always
called from the matching thread.
(cherry picked from commit 61117840c1)
Clang added support for the gcc-style fallthrough
attribute (i.e. __attribute__((fallthrough))) in version 10. However,
__has_attribute(fallthrough) will return 1 in C mode in older versions,
even though they only support the C++11 fallthrough attribute. At best,
the unsupported attribute is simply ignored; at worst, it causes errors.
The C2x fallthrough attribute has the advantages of being supported in
the broadest range of clang versions (added in version 9) and being easy
to check for support. Use C2x [[fallthrough]] attribute if possible, and
fall back to not using an attribute for clang versions that don't have
it.
Courtesy of Joshua Root
(cherry picked from commit 14c8d43863)
Add a new parameter to the dst_key structure, mark a key modified if
dst_key_(un)set[bool,num,state,time] is called. Only write out key
files during a keymgr run if the metadata has changed.
(cherry picked from commit 1da91b3ab4)
when built without libtool, the sample driver in the dyndb
system test runs library intializers that have already been
run, causing the value for isc__trampoline_min to be reset.
wrap the trampoline initialize and shutdown routines under
isc_once_do() to ensure that they are only run once.
The ns_statscounter_recursclients counter was previously only
incremented or decremented if client->recursionquota was non-NULL.
This was harmless, because that value should always be non-NULL if
recursion is enabled, but it made the code slightly confusing.
(cherry picked from commit 0201eab655)
Since commit bad5a523c2, when the fetches-per-server quota
was increased or decreased, instead of the value being set to
the newly calculated quota, it was set to the *minimum* of
the new quota or 1 - which effectively meant it was always set to 1.
it should instead have been the maximum, to prevent the value from
ever dropping to zero.
(cherry picked from commit 694bc50273)
When attaching to the trampoline, the isc__trampoline_max was access
unlocked. This would not manifest under normal circumstances because we
initialize 65 trampolines by default and that's enough for most
commodity hardware, but there are ARM machines with 128+ cores where
this would be reported by ThreadSanitizer.
Add locking around the code in isc__trampoline_attach(). This also
requires the lock to leak on exit (along with memory that we already)
because a new thread might be attaching to the trampoline while we are
running the library destructor at the same time.
(cherry picked from commit 933162ae14)
inet_ntop result should always protect against empty string accepted
without an error. Make additional check to satisfy coverity scans.
(cherry picked from commit 656a0f076f)
Coverity detected issues:
- var_decl: Declaring variable "diff" without initializer.
- uninit_use_in_call: Using uninitialized value "diff.tuples.head" when
calling "dns_diff_clear".
(cherry picked from commit 67e773c93c)
When we compile with libuv that has some capabilities via flags passed
to f.e. uv_udp_listen() or uv_udp_bind(), the call with such flags would
fail with invalid arguments when older libuv version is linked at the
runtime that doesn't understand the flag that was available at the
compile time.
Enforce minimal libuv version when flags have been available at the
compile time, but are not available at the runtime. This check is less
strict than enforcing the runtime libuv version to be same or higher
than compile time libuv version.
Since version 5.0.0, decay-based purging is the only available dirty
page cleanup mechanism in jemalloc. It relies on so-called tickers,
which are simple data structures used for ensuring that certain actions
are taken "once every N times". Ticker data (state) is stored in a
thread-specific data structure called tsd in jemalloc parlance. Ticks
are triggered when extents are allocated and deallocated. Once every
1000 ticks, jemalloc attempts to release some of the dirty pages hanging
around (if any). This allows memory use to be kept in check over time.
This dirty page cleanup mechanism has a quirk. If the first
allocator-related action for a given thread is a free(), a
minimally-initialized tsd is set up which does not include ticker data.
When that thread subsequently calls *alloc(), the tsd transitions to its
nominal state, but due to a certain flag being set during minimal tsd
initialization, ticker data remains unallocated. This prevents
decay-based dirty page purging from working, effectively enabling memory
exhaustion over time. [1]
The quirk described above has been addressed (by moving ticker state to
a different structure) in jemalloc's development branch [2], but not in
any numbered jemalloc version released to date (the latest one being
5.2.1 as of this writing).
Work around the problem by ensuring that every thread spawned by
isc_thread_create() starts with a malloc() call. Avoid immediately
calling free() for the dummy allocation to prevent an optimizing
compiler from stripping away the malloc() + free() pair altogether.
An alternative implementation of this workaround was considered that
used a pair of isc_mem_create() + isc_mem_destroy() calls instead of
malloc() + free(), enabling the change to be fully contained within
isc__trampoline_run() (i.e. to not touch struct isc__trampoline), as the
compiler is not allowed to strip away arbitrary function calls.
However, that solution was eventually dismissed as it triggered
ThreadSanitizer reports when tools like dig, nsupdate, or rndc exited
abruptly without waiting for all worker threads to finish their work.
[1] https://github.com/jemalloc/jemalloc/issues/2251
[2] c259323ab3
(cherry picked from commit 7aa7b6474b)
Update the function that synchronizes the CDS and CDNSKEY DELETE
records. It now allows for the possibility that the CDS DELETE record
is published and the CDNSKEY DELETE record is not, and vice versa.
Also update the code in zone.c how 'dns_dnssec_syncdelete()' is called.
With KASP, we still maintain the DELETE records our self. Otherwise,
we publish the CDS and CDNSKEY DELETE record only if they are added
to the zone. We do still check if these records can be signed by a KSK.
This change will allow users to add a CDS and/or CDNSKEY DELETE record
manually, without BIND removing them on the next zone sign.
Note that this commit removes the check whether the key is a KSK, this
check is redundant because this check is also made in
'dst_key_is_signing()' when the role is set to DST_BOOL_KSK.
(cherry picked from commit 3d05c99abb)
After some back and forth, it was decidede to match the configuration
option with unbound ("so-reuseport"), PowerDNS ("reuseport") and/or
nginx ("reuseport").
(cherry picked from commit 7e71c4d0cc)
When backporting the load balanced sockets to BIND 9.16, the Windows
specific paths were missed. Add the #if(n)def _WIN32 back into the
appropriate places.
Previously, HAVE_SO_REUSEPORT_LB has been defined only in the private
netmgr-int.h header file, making the configuration of load balanced
sockets inoperable.
Move the missing HAVE_SO_REUSEPORT_LB define the isc/netmgr.h and add
missing isc_nm_getloadbalancesockets() implementation.
(cherry picked from commit 142c63dda8)
Previously, the option to enable kernel load balancing of the sockets
was always enabled when supported by the operating system (SO_REUSEPORT
on Linux and SO_REUSEPORT_LB on FreeBSD).
It was reported that in scenarios where the networking threads are also
responsible for processing long-running tasks (like RPZ processing, CATZ
processing or large zone transfers), this could lead to intermitten
brownouts for some clients, because the thread assigned by the operating
system might be busy. In such scenarious, the overall performance would
be better served by threads competing over the sockets because the idle
threads can pick up the incoming traffic.
Add new configuration option (`load-balance-sockets`) to allow enabling
or disabling the load balancing of the sockets.
(cherry picked from commit 85c6e797aa)
Previously, the RPZ updates ran quantized on the main nm_worker loops.
As the quantum was set to 1024, this might lead to service
interruptions when large RPZ update was processed.
Change the RPZ update process to run as the offloaded work. The update
and cleanup loops were refactored to do as little locking of the
maintenance lock as possible for the shortest periods of time and the db
iterator is being paused for every iteration, so we don't hold the rbtdb
tree lock for prolonged periods of time.
(cherry picked from commit f106d0ed2b)
(cherry picked from commit e128b6a951)
Previously dns_rpz_add() were passed dns_rpz_zones_t and index to .zones
array. Because we actually attach to dns_rpz_zone_t, we should be using
the local pointer instead of passing the index and "finding" the
dns_rpz_zone_t again.
Additionally, dns_rpz_add() and dns_rpz_delete() were used only inside
rpz.c, so make them static.
(cherry picked from commit b6e885c97f)
(cherry picked from commit f4cba0784e)
Do a general cleanup of lib/dns/rpz.c style:
* Removed deprecated and unused functions
* Unified dns_rpz_zone_t naming to rpz
* Unified dns_rpz_zones_t naming to rpzs
* Add and use rpz_attach() and rpz_attach_rpzs() functions
* Shuffled variables to be more local (cppcheck cleanup)
(cherry picked from commit 840179a247)
(cherry picked from commit bfee462403)
the value of 'i' in generate could overflow when adding 'step' to
it in the 'for' loop. Use an unsigned int for 'i' which will give
an additional bit and prevent the overflow. The inputs are both
less than 2^31 and and the result will be less than 2^32-1.
(cherry picked from commit 5abdee9004)
Ensure the update zone name is mentioned in the NOTAUTH error message
in the server log, so that it is easier to track down problematic
update clients. There are two cases: either the update zone is
unrelated to any of the server's zones (previously no zone was
mentioned); or the update zone is a subdomain of one or more of the
server's zones (previously the name of the irrelevant parent zone was
misleadingly logged).
Closes#3209
(cherry picked from commit 84c4eb02e7)
In couple places, we have missed INSIST(0) or ISC_UNREACHABLE()
replacement on some branches with UNREACHABLE(). Replace all
ISC_UNREACHABLE() or INSIST(0) calls with UNREACHABLE().
The backport of using modern compiler features broken Windows debug
build because there's no __builtin_unreachable() in MSVC.
Define __builtin_unreachable() shim on MSVC using __assume(0).