When canceling the last fetch, we also need to stop the fctx_expired
timer from possibly firing between the fctx_shutdown() call and the
fetch being actually destroyed along with the timer. As there are
multiple places where fctx_shutdown() is being called without stopping
the timer, move the fctx_stoptimer() to fctx_shutdown() and cleanup the
explicit usage.
Address Database (ADB) shares the memory for the short lived ADB
objects (finds, fetches, addrinfo) and the long lived ADB
objects (names, entries, namehooks). This could lead to a situation
where the resolver-heavy load would force evict ADB objects from the
database to point where ADB is completely empty, leading to even more
resolver-heavy load.
Make the short lived ADB objects use the other memory context that we
already created for the hashmaps. This makes the ADB overmem condition
to not be triggered by the ongoing resolver fetches.
(cherry picked from commit 05faff6d53)
When the recursive-clients value is too large, the linked lists holding
the fetch contexts can also grow large and since the algorithm to merge
outgoing queries is quadratic, named can get slow.
Replace the linked list with hashtable for faster lookups. This also
allows us to reduce the number of tasks (buckets) in the resolver.
Previously, this function always acquires a node write lock if it
might need node cleanup in case the reference decrements to 0. In
fact, the lock is unnecessary if the reference is larger than 1 and it
can be optimized as an "easy" case. This optimization could even be
"necessary". In some extreme cases, many worker threads could repeat
acquring and releasing the reference on the same node, resulting in
severe lock contention for nothing (as the ref wouldn't decrement to 0
in most cases). This change would prevent noticeable performance
drop like query timeout for such cases.
Co-authored-by: JINMEI Tatuya <jtatuya@infoblox.com>
Co-authored-by: Ondřej Surý <ondrej@isc.org>
(cherry picked from commit 7f4471594d)
Currently, the fetch context will continue running even when the last
fetch (response) has been removed from the context, so named can process
and cache the answer. This can lead to a situation where the number of
outgoing recursing clients exceeds the the configured number for
recursive-clients.
Be more stringent about the recursive-clients limit and shutdown the
fetch context immediately after the last fetch has been canceled from
that particular fetch context.
The last remaining tuning value was RESOLVER_NTASKS and instead of
having variable number of the tasks per-cpu and in named and in
dns_client, set the number of the resolver tasks to 523 (number taken
from dns_client unit) to accomodate most of the recursive-clients
values.
The uv_req union member of struct isc__nm_uvreq contained libuv request
types that we don't use. Turns out that uv_getnameinfo_t is 1000 bytes
big and unnecessarily enlarged the whole structure. Remove all the
unused members from the uv_req union.
After removing sockaddr_unix from isc_sockaddr, we can also remove
sockaddr_storage and reduce the isc_sockaddr size from 152 bytes to just
48 bytes needed to hold IPv6 addresses.
(cherry picked from commit 2367b6a2e1)
When isc_rwlock_trylock() fails to get a read lock because another
writer was faster, it should wake up other waiting writers in case
there are no other readers, but the current code forgets about
the currently active writer when evaluating 'cntflag'.
Unset the WRITER_ACTIVE bit in 'cntflag' before checking to see if
there are other readers, otherwise the waiting writers, if they exist,
might not wake up.
Only call eatline() to skip to the next line if we're not
already at the end of a line when parsing an unknown directive.
We were accidentally skipping the next line when there was only
a single unknown directive on the current line.
(cherry picked from commit eb78ad2080)
Commit af7db89513 as part of #4141 was
supposed to apply the 'max-recursion-queries' quota to validator
queries, but the counter was never actually passed on to
dns_resolver_createfetch(). This has been fixed, and the global query
counter ('max-query-count', per client request) is now also added.
(cherry picked from commit 5b1ae4a948)
Upstream code doesn't do regular releases, so we need to regularly
sync the code from the upstream repository. This is synchronization up
to the commit f8d0513 from Jan 29, 2024.
(cherry picked from commit d14a76e115)
While implementing the global limit 'max-query-count', initially I
thought adding the variable to the resolver structure. But the limit
is per client request so it was moved to the view structure (and
counter in ns_query structure). However, I forgot to remove the
variable from the resolver structure again. This commit fixes that.
(cherry picked from commit 397ca34e34)
This global limit is not reset on query restarts and is a hard limit
for any client request.
Note: This commit has been significantly modified because of many
merge conflicts due to the dns_resolver_createfetch api changes.
(cherry picked from commit 16b3bd1cc7)
Add another option to configure how many outgoing queries per
client request is allowed. The existing 'max-recursion-queries' is
per restart, this one is a global limit.
(cherry picked from commit bbc16cc8e6)
The root cause is the fix for CVE-2024-0760 (part 3), which resets
the TCP connection on a failed send. Specifically commit
4b7c6138 stops reading on the socket
because the TCP connection is throttling.
When the tcpdns_send_cb callback thinks about restarting reading
on the socket, this fails because the socket is a client socket.
And nsupdate is a client and is using the same netmgr code.
This commit removes the requirement that the socket must be a server
socket, allowing reading on the socket again after being throttled.
(manually picked from commit aa24b77d8b)
Previously, the update policy rules check was moved earlier in the
sequence, and the keep rule match pointers were kept to maintain the
ability to verify maximum records by type.
However, these pointers can become invalid if server reloading
or reconfiguration occurs before update completion. To prevent
this issue, extract the maximum records by type value immediately
during processing and only keep the copy of the values instead of the
full ssurule.
(cherry picked from commit 44a54a29d8)
DNS_LOGMODULE_RBTDB was simply inappropriate, and this
log message is actually dependent on db implementation
details, so DNS_LOGMODULE_DB would be the best choice.
(cherry picked from commit b0309ee631)
The new log message is emitted when adding or updating an RRset
fails due to exceeding the max-records-per-type limit. The log includes
the owner name and type, corresponding zone name, and the limit value.
It will be emitted on loading a zone file, inbound zone transfer
(both AXFR and IXFR), handling a DDNS update, or updating a cache DB.
It's especially helpful in the case of zone transfer, since the
secondary side doesn't have direct access to the offending zone data.
It could also be used for max-types-per-name, but this change
doesn't implement it yet as it's much less likely to happen
in practice.
(cherry picked from commit 4156995431)
The DLZ modules are poorly maintained as we only ensure they can still
be compiled, the DLZ interface is blocking, so anything that blocks the
query to the database blocks the whole server and they should not be
used except in testing. The DLZ interface itself should be scheduled
for removal.
(cherry picked from commit a6cce753e2)
In two places, after linking the client to the manager's
"recursing-clients" list using the check_recursionquota()
function, the query.c module fails to unlink it on error
paths. Fix the bugs by unlinking the client from the list.
Also make sure that unlinking happens before detaching the
client's handle, as it is the logically correct order, e.g.
in case if it's the last handle and ns__client_reset_cb()
can be called because of the detachment.
(cherry picked from commit 36c4808903)
The 'dns' variable in dohpath can be in various forms ({?dns},
{dns}, {&dns} etc.). To check for a valid dohpath it ends up
being simpler to just parse the URI template rather than looking
for all the various forms if substring.
(cherry picked from commit af54ef9f5d)
Re-split format strings that had been poorly split by multiple
clang-format runs using different versions of clang-format.
(cherry picked from commit a24d6e1654)
For dynamic zones that do not set inline-signing explicitly, add a
warning that the default value for inline-signing has changed. Dynamic
zones that want to be able to reuse the zone (and not trigger a full
resign) should explicitly configure "inline-signing no;".
maxlabels is the suffix length that corresponds to the latest
NXDOMAIN response. minlabels is the suffix length that corresponds
to longest found existing name.
(cherry picked from commit 67f31c5046)
Prior to running the keymgr, first make sure that existing keys
are present in the new keylist. If not, treat this as an operational
error where the keys are made offline (temporarily), possibly unwanted.
(cherry picked from commit 5fdad05a8a)
Currently, the outgoing UDP sockets have enabled
SO_REUSEADDR (SO_REUSEPORT on BSDs) which allows multiple UDP sockets to
bind to the same address+port. There's one caveat though - only a
single (the last one) socket is going to receive all the incoming
traffic. This in turn could lead to incoming DNS message matching to
invalid dns_dispatch and getting dropped.
Disable setting the SO_REUSEADDR on the outgoing UDP sockets. This
needs to be done explicitly because `uv_udp_open()` silently enables the
option on the socket.
(cherry picked from commit eec30c33c2)
As the relaxed memory ordering doesn't ensure any memory
synchronization, it is possible that the increment will succeed even
in the case when it should not - there is a race between
atomic_fetch_sub(..., acq_rel) and atomic_fetch_add(..., relaxed).
Only the result is consistent, but the previous value for both calls
could be same when both calls are executed at the same time.
(cherry picked from commit 88227ea665)
Static-stub address and addresses from other sources where being
mixed together resulting in static-stub queries going to addresses
not specified in the configuration or alternatively static-stub
addresses being used instead of the real addresses.
(cherry picked from commit b3a2c790f3)
We currently set SO_INCOMING_CPU incorrectly, and testing by Ondrej
shows that fixing the issue and setting affinities is worse than letting
the kernel schedule threads without constraints. So we should not set
SO_INCOMING_CPU anymore.
(cherry picked from commit 8b8149cdd2)
The 'all_spilled' local variable in resolver.c:fctx_getaddresses()
is 'true' by default, and only becomes false when there is at least
one successfully found NS address. However, when a 'forward only;'
configuration is used, the code jumps over the part where it looks
for NS addresses and doesn't reset the 'all_spilled' to false, which
results in incorretly increased 'serverquota' statistics variable,
and also in invalid return error code from the function. The result
code error didn't make any differences, because all codes other than
'ISC_R_SUCCESS' or 'DNS_R_WAIT' were treated in the same way, and
the result code was never logged anywhere.
Set the default value of 'all_spilled' to 'false', and only make it
'true' before actually starting to look up NS addresses.
(cherry picked from commit e430ce7039)
If the operating system UDP queue gets full and the outgoing UDP sending
starts to be delayed, BIND 9 could exhibit memory spikes as it tries to
enqueue all the outgoing UDP messages. As those are not going to be
delivered anyway (as we argued when we stopped enlarging the operating
system send and receive buffers), try to send the UDP messages directly
using `uv_udp_try_send()` and if that fails, drop the outgoing UDP
message.
(cherry picked from commit b576c4c977)
If neither libxml2 nor libjson_c are available have named-checkconf
fail if a statistics-channels block is specified.
(cherry picked from commit b9246418e8)
This change allows fallback from an IXFR failure to AXFR when the
reason is DNS_R_TOOMANYRECORDS. This is because this error condition
could be temporary only in an intermediate version of IXFR
transactions and it's possible that the latest version of the zone
doesn't have that condition. In such a case, the secondary would never
be able to update the zone (even if it could) without this fallback.
This fallback behavior is particularly useful with the recently
introduced max-records-per-type and max-types-per-name options:
the primary may not have these limitations and may temporarily
introduce "too many" records, breaking IXFR. If the primary side
subsequently deletes these records, this fallback will help recover
the zone transfer failure automatically; without it, the secondary
side would first need to increase the limit, which requires more
operational overhead and has its own adverse effect.
This change also fixes a minor glitch that DNS_R_TOOMANYRECORDS wasn't
logged in xfrin_fail.
(cherry picked from commit 7289090683)
Administrators may wish to constrain the set of cores that BIND 9 runs
on via the 'taskset', 'cpuset' or 'numactl' programs (or equivalent on
other O/S), for example to achieve higher (or more stable) performance
by more closely associating threads with individual NIC rx queues. If
the admin has used taskset, it follows that BIND ought to
automatically use the given number of CPUs rather than the system wide
count.
Co-Authored-By: Ray Bellis <ray@isc.org>
(cherry picked from commit 5a2df8caf5)
Return partial match from dns_db_find/dns_db_find when requested
to short circuit the closest encloser discover process. Most of the
time this will be the actual closest encloser but may not be when
there yet to be committed / cleaned up versions of the zone with
names below the actual closest encloser.
(cherry picked from commit d42ea08f16)