The BUFSIZ value varies between platforms, it could be 8K on Linux and
512 bytes on mingw. Make sure the buffers are always big enough for the
output data to prevent truncation of the output by appropriately
enlarging or sizing the buffers.
(cherry picked from commit b19d932262e84608174cb89eeed32ae0212f8a87)
Contrary to what the documentation states, memory filling is only
enabled by --enable-developer (or by setting -DISC_MEM_DEFAULTFILL=1) if
the internal memory allocator is used. However, the internal memory
allocator is disabled by default, so just using the --enable-developer
build-time option does not enable memory filling (passing "-M fill" on
the named command line is necessary to actually enable it). As memory
filling is a useful tool for troubleshooting certain types of bugs, it
should also be enabled by --enable-developer when the system allocator
is used.
Furthermore, memory-related preprocessor macros are handled in two
distinct locations: lib/isc/include/isc/mem.h and bin/named/main.c.
This makes the logic hard to follow.
Move all code handling the ISC_MEM_DEFAULTFILL preprocessor macro to
lib/isc/include/isc/mem.h, ensuring memory filling is enabled by the
--enable-developer build-time switch, no matter which memory allocator
is used.
Commit c96b6eb5ec changed the way mempool
code handles freed allocations that cannot be retained for later use as
"free list" items: it no longer uses different logic depending on
whether the internal allocator is used or the system one. However, that
commit did not update a relevant piece of code in isc_mempool_destroy(),
causing memory context statistics to always be off upon shutdown when
BIND 9 is built with -DISC_MEM_USE_INTERNAL_MALLOC=1. This causes
assertion failures. Update isc_mempool_destroy() accordingly in order
to prevent this issue from being triggered.
free_namelist could be passed names with associated rdatasets
when handling errors. These need to be disassociated before
calling dns_message_puttemprdataset.
(cherry picked from commit 745d5edc3a8ca6f232b2d700ae076c2caee2bfc5)
messages indicating the reason for a fallback to AXFR (i.e, because
the requested serial number is not present in the journal, or because
the size of the IXFR response would exceeed "max-ixfr-ratio") are now
logged at level info instead of debug(4).
(cherry picked from commit df1d81cf96)
fctx_decreference() may call fctx_destroy(), which in turn may free the
fetch context by calling isc_mem_putanddetach(). This means that
whenever fctx_decreference() is called, the fetch context pointer should
be assumed to point to garbage after that call. Meanwhile, the
following pattern is used in several places in lib/dns/resolver.c:
LOCK(&res->buckets[fctx->bucketnum].lock);
bucket_empty = fctx_decreference(fctx);
UNLOCK(&res->buckets[fctx->bucketnum].lock);
Given that 'fctx' may be freed by the fctx_decreference() call, there is
no guarantee that the value of fctx->bucketnum will be the same before
and after the fctx_decreference() call. This can cause all kinds of
locking issues as LOCK() calls no longer match up with their UNLOCK()
counterparts.
Fix by always using a helper variable to hold the bucket number when the
pattern above is used.
Note that fctx_try() still uses 'fctx' after calling fctx_decreference()
(it calls fctx_done()). This is safe to do because the reference count
for 'fctx' is increased a few lines earlier and it also cannot be zero
right before that increase happens, so the fctx_decreference() call in
that particular location never invokes fctx_destroy(). Nevertheless,
use a helper variable for that call site as well, to retain consistency
and to prevent copy-pasted code from causing similar problems in the
future.
The original sscanf processing allowed for a number of syntax errors
to be accepted. This included missing the closing brace in
${modifiers}
Look for both comma and right brace as intermediate seperators as
well as consuming the final right brace in the sscanf processing
for ${modifiers}. Check when we got right brace to determine if
the sscanf consumed more input than expected and if so behave as
if it had stopped at the first right brace.
(cherry picked from commit 7be64c0e94)
$GENERATE uses 'int' for its computations and some constructions
can overflow values that can be represented by an 'int' resulting
in undefined behaviour. Detect these conditions and return a
range error.
(cherry picked from commit 5327b9708f)
After refactoring of `validated()`, the `maybe_destroy()` function is
no longer expected to actually destroy the fetch context when it is
being called, so effectively it only ensures that the validators are
canceled when the context has no more queries and pending events, but
that is redundant, because `maybe_destroy()` `REQUIRE`s that the context
should be in the shutting down state, and the function which sets that
state is already canceling the validators in its own turn.
As a failsafe, to make sure that no validators will be created after
`fctx_doshutdown()` is called, add an early return from `valcreate()` if
the context is in the shutting down state.
The `resolver.c:validated()` function unlinks the current validator from
the fetch's validators list, which can leave it empty, then unlocks
the bucket lock. If, by a chance, the fetch was timed out just before
the `validated()` call, the final timeout callback running in parallel
with `validated()` can find the fetch context with no active fetches
and with an empty validators list and destroy it, which is unexpected
for the `validated()` function and can lead to a crash.
Increase the fetch context's reference count in the beginning of
`validated()` and decrease it when it finishes its work to avoid the
unexpected destruction of the fetch context.
The current logic for determining the address of the socket to which a
client sent its query is:
1. Get the address:port tuple from the netmgr handle using
isc_nmhandle_localaddr() or from the ns_interface_t structure.
2. Convert the address:port tuple from step 1 into an isc_netaddr_t
using isc_netaddr_fromsockaddr().
3. Convert the address from step 2 back into a socket address with the
port set to 0 using isc_sockaddr_fromnetaddr().
Note that the port number (readily available in the netmgr handle or in
the ns_interface_t structure) is needlessly lost in the process,
preventing it from being recorded in dnstap captures of client traffic
produced by named.
Fix by first storing the address:port tuple in client->destsockaddr and
then creating an isc_netaddr_t from that structure. This allows the
port number to be retained in client->destsockaddr, which is what
subsequently gets passed to dns_dt_send().
Remove an outdated code comment.
(cherry picked from commit 2f945703f2)
Under specific rare timing circumstances the uv_read_start() could
fail with UV_EINVAL when the connection is reset between the connect (or
accept) and the uv_read_start() call on the nmworker loop. Handle such
situation gracefully by propagating the errors from uv_read_start() into
upper layers, so the socket can be internally closed().
(cherry picked from commit b432d5d3bc)
When there are multiple record datasets in a database node of a catalog
zone, and BIND encounters a soft error during processing of a dataset,
it breaks from the loop and doesn't process the other datasets in the
node.
There are cases when this is not desired. For example, the catalog zones
draft version 5 states that there must be a TXT RRset named
`version.$CATZ` with exactly one RR, but it doesn't set a limitation
on possible non-TXT RRsets named `version.$CATZ` existing alongside
with the TXT one. In case when one exists, we will get a processing
error and will not continue the loop to process the TXT RRset coming
next.
Remove the "break" statement to continue processing all record datasets.
(cherry picked from commit 0b2d5490cd)
When processing a catalog zone update, skip processing records with
DNSSEC-related and ZONEMD types, because we are not interested in them
in the context of a catalog zone, and processing them will fail and
produce an unnecessary warning message.
(cherry picked from commit 73d6643137)
The current implementation of isc_queue uses Michael-Scott lock-free
queue that in turn uses hazard pointers. It was discovered that the way
we use the isc_queue, such complicated mechanism isn't really needed,
because most of the time, we either execute the work directly when on
nmthread (in case of UDP) or schedule the work from the matching
nmthreads.
Replace the current implementation of the isc_queue with a simple locked
ISC_LIST. There's a slight improvement - since copying the whole list
is very lightweight - we move the queue into a new list before we start
the processing and locking just for moving the queue and not for every
single item on the list.
NOTE: There's a room for future improvements - since we don't guarantee
the order in which the netievents are processed, we could have two lists
- one unlocked that would be used when scheduling the work from the
matching thread and one locked that would be used from non-matching
thread.
(cherry picked from commit 6bd025942c)
Typing from libuv structure to isc_region_t is not possible, because
their sizes differ on 64 bit architectures. Little endian machines seems
to be lucky and still result in test passed. But big endian machine such
as s390x fails the test reliably.
Fix by directly creating the buffer as isc_region_t and skipping the
type conversion. More readable and still more correct.
(cherry picked from commit 057438cb45)
Make sure that the key structure is valid when calling the following
functions:
- dst_key_setexternal
- dst_key_isexternal
- dst_key_setmodified
- dst_key_ismodified
This commit is adapted because 9.16 has a different approach
of deconsting the variable.
(cherry picked from commit 888ec4e0d4)
Setting the sock->write_timeout from the TCP, TCPDNS, and TLSDNS send
functions could lead to (harmless) data race when setting the value for
the first time when the isc_nm_send() function would be called from
thread not-matching the socket we are sending to. Move the setting the
sock->write_timeout to the matching async function which is always
called from the matching thread.
(cherry picked from commit 61117840c1)
Clang added support for the gcc-style fallthrough
attribute (i.e. __attribute__((fallthrough))) in version 10. However,
__has_attribute(fallthrough) will return 1 in C mode in older versions,
even though they only support the C++11 fallthrough attribute. At best,
the unsupported attribute is simply ignored; at worst, it causes errors.
The C2x fallthrough attribute has the advantages of being supported in
the broadest range of clang versions (added in version 9) and being easy
to check for support. Use C2x [[fallthrough]] attribute if possible, and
fall back to not using an attribute for clang versions that don't have
it.
Courtesy of Joshua Root
(cherry picked from commit 14c8d43863)
Add a new parameter to the dst_key structure, mark a key modified if
dst_key_(un)set[bool,num,state,time] is called. Only write out key
files during a keymgr run if the metadata has changed.
(cherry picked from commit 1da91b3ab4)
when built without libtool, the sample driver in the dyndb
system test runs library intializers that have already been
run, causing the value for isc__trampoline_min to be reset.
wrap the trampoline initialize and shutdown routines under
isc_once_do() to ensure that they are only run once.
The ns_statscounter_recursclients counter was previously only
incremented or decremented if client->recursionquota was non-NULL.
This was harmless, because that value should always be non-NULL if
recursion is enabled, but it made the code slightly confusing.
(cherry picked from commit 0201eab655)
Since commit bad5a523c2, when the fetches-per-server quota
was increased or decreased, instead of the value being set to
the newly calculated quota, it was set to the *minimum* of
the new quota or 1 - which effectively meant it was always set to 1.
it should instead have been the maximum, to prevent the value from
ever dropping to zero.
(cherry picked from commit 694bc50273)
When attaching to the trampoline, the isc__trampoline_max was access
unlocked. This would not manifest under normal circumstances because we
initialize 65 trampolines by default and that's enough for most
commodity hardware, but there are ARM machines with 128+ cores where
this would be reported by ThreadSanitizer.
Add locking around the code in isc__trampoline_attach(). This also
requires the lock to leak on exit (along with memory that we already)
because a new thread might be attaching to the trampoline while we are
running the library destructor at the same time.
(cherry picked from commit 933162ae14)
inet_ntop result should always protect against empty string accepted
without an error. Make additional check to satisfy coverity scans.
(cherry picked from commit 656a0f076f)
Coverity detected issues:
- var_decl: Declaring variable "diff" without initializer.
- uninit_use_in_call: Using uninitialized value "diff.tuples.head" when
calling "dns_diff_clear".
(cherry picked from commit 67e773c93c)
When we compile with libuv that has some capabilities via flags passed
to f.e. uv_udp_listen() or uv_udp_bind(), the call with such flags would
fail with invalid arguments when older libuv version is linked at the
runtime that doesn't understand the flag that was available at the
compile time.
Enforce minimal libuv version when flags have been available at the
compile time, but are not available at the runtime. This check is less
strict than enforcing the runtime libuv version to be same or higher
than compile time libuv version.
Since version 5.0.0, decay-based purging is the only available dirty
page cleanup mechanism in jemalloc. It relies on so-called tickers,
which are simple data structures used for ensuring that certain actions
are taken "once every N times". Ticker data (state) is stored in a
thread-specific data structure called tsd in jemalloc parlance. Ticks
are triggered when extents are allocated and deallocated. Once every
1000 ticks, jemalloc attempts to release some of the dirty pages hanging
around (if any). This allows memory use to be kept in check over time.
This dirty page cleanup mechanism has a quirk. If the first
allocator-related action for a given thread is a free(), a
minimally-initialized tsd is set up which does not include ticker data.
When that thread subsequently calls *alloc(), the tsd transitions to its
nominal state, but due to a certain flag being set during minimal tsd
initialization, ticker data remains unallocated. This prevents
decay-based dirty page purging from working, effectively enabling memory
exhaustion over time. [1]
The quirk described above has been addressed (by moving ticker state to
a different structure) in jemalloc's development branch [2], but not in
any numbered jemalloc version released to date (the latest one being
5.2.1 as of this writing).
Work around the problem by ensuring that every thread spawned by
isc_thread_create() starts with a malloc() call. Avoid immediately
calling free() for the dummy allocation to prevent an optimizing
compiler from stripping away the malloc() + free() pair altogether.
An alternative implementation of this workaround was considered that
used a pair of isc_mem_create() + isc_mem_destroy() calls instead of
malloc() + free(), enabling the change to be fully contained within
isc__trampoline_run() (i.e. to not touch struct isc__trampoline), as the
compiler is not allowed to strip away arbitrary function calls.
However, that solution was eventually dismissed as it triggered
ThreadSanitizer reports when tools like dig, nsupdate, or rndc exited
abruptly without waiting for all worker threads to finish their work.
[1] https://github.com/jemalloc/jemalloc/issues/2251
[2] c259323ab3
(cherry picked from commit 7aa7b6474b)
Update the function that synchronizes the CDS and CDNSKEY DELETE
records. It now allows for the possibility that the CDS DELETE record
is published and the CDNSKEY DELETE record is not, and vice versa.
Also update the code in zone.c how 'dns_dnssec_syncdelete()' is called.
With KASP, we still maintain the DELETE records our self. Otherwise,
we publish the CDS and CDNSKEY DELETE record only if they are added
to the zone. We do still check if these records can be signed by a KSK.
This change will allow users to add a CDS and/or CDNSKEY DELETE record
manually, without BIND removing them on the next zone sign.
Note that this commit removes the check whether the key is a KSK, this
check is redundant because this check is also made in
'dst_key_is_signing()' when the role is set to DST_BOOL_KSK.
(cherry picked from commit 3d05c99abb)
After some back and forth, it was decidede to match the configuration
option with unbound ("so-reuseport"), PowerDNS ("reuseport") and/or
nginx ("reuseport").
(cherry picked from commit 7e71c4d0cc)
When backporting the load balanced sockets to BIND 9.16, the Windows
specific paths were missed. Add the #if(n)def _WIN32 back into the
appropriate places.