Commit graph

15531 commits

Author SHA1 Message Date
Aram Sargsyan
a018b4e36f Implement the ForwardOnlyFail statistics channel counter
The new ForwardOnlyFail statistics channel counter indicates the
number of queries failed due to bad forwarders for 'forward only'
zones.
2024-09-16 09:31:14 +00:00
Aram Sargsyan
e430ce7039 Fix a 'serverquota' counter calculation bug
The 'all_spilled' local variable in resolver.c:fctx_getaddresses()
is 'true' by default, and only becomes false when there is at least
one successfully found NS address. However, when a 'forward only;'
configuration is used, the code jumps over the part where it looks
for NS addresses and doesn't reset the 'all_spilled' to false, which
results in incorretly increased 'serverquota' statistics variable,
and also in invalid return error code from the function. The result
code error didn't make any differences, because all codes other than
'ISC_R_SUCCESS' or 'DNS_R_WAIT' were treated in the same way, and
the result code was never logged anywhere.

Set the default value of 'all_spilled' to 'false', and only make it
'true' before actually starting to look up NS addresses.
2024-09-16 08:23:12 +00:00
Ondřej Surý
8a96a3af6a Move offloaded DNSSEC operations to different helper threads
Currently, the isc_work API is overloaded.  It runs both the
CPU-intensive operations like DNSSEC validations and long-term tasks
like RPZ processing, CATZ processing, zone file loading/dumping and few
others.

Under specific circumstances, when many large zones are being loaded, or
RPZ zones processed, this stops the CPU-intensive tasks and the DNSSEC
validation is practically stopped until the long-running tasks are
finished.

As this is undesireable, this commit moves the CPU-intensive operations
from the isc_work API to the isc_helper API that only runs fast memory
cleanups now.
2024-09-12 12:09:45 +00:00
Ondřej Surý
6370e9b311 Add isc_helper API that adds 1:1 thread for each loop
Add an extra thread that can be used to offload operations that would
affect latency, but are not long-running tasks; those are handled by
isc_work API.

Each isc_loop now has matching isc_helper thread that also built on top
of uv_loop.  In fact, it matches most of the isc_loop functionality, but
only the `isc_helper_run()` asynchronous call is exposed.
2024-09-12 12:09:45 +00:00
Aram Sargsyan
35ef25e5ea Fix data race in offloaded dns_message_checksig()
When verifying a message in an offloaded thread there is a race with
the worker thread which writes to the same buffer. Clone the message
buffer before offloading.
2024-09-12 09:08:35 +00:00
alessio
da0e48b611 Remove "port" from source address options
Remove the use of "port" when configuring query-source(-v6),
transfer-source(-v6), notify-source(-v6), parental-source(-v6),
etc. Remove the use of source ports for parental-agents.

Also remove the deprecated options use-{v4,v6}-udp-ports and
avoid-{v4,v6}udp-ports.
2024-09-12 08:15:58 +02:00
Mark Andrews
b9246418e8 Fix named-checkconf and statistics-channels
If neither libxml2 nor libjson_c are available have named-checkconf
fail if a statistics-channels block is specified.
2024-09-12 09:21:44 +10:00
Michal Nowak
ff69d07fed
Update code formatting
clang 19 was updated in the base image.
2024-09-10 17:31:32 +02:00
JINMEI Tatuya
7289090683 allow IXFR-to-AXFR fallback on DNS_R_TOOMANYRECORDS
This change allows fallback from an IXFR failure to AXFR when the
reason is DNS_R_TOOMANYRECORDS. This is because this error condition
could be temporary only in an intermediate version of IXFR
transactions and it's possible that the latest version of the zone
doesn't have that condition. In such a case, the secondary would never
be able to update the zone (even if it could) without this fallback.

This fallback behavior is particularly useful with the recently
introduced max-records-per-type and max-types-per-name options:
the primary may not have these limitations and may temporarily
introduce "too many" records, breaking IXFR. If the primary side
subsequently deletes these records, this fallback will help recover
the zone transfer failure automatically; without it, the secondary
side would first need to increase the limit, which requires more
operational overhead and has its own adverse effect.

This change also fixes a minor glitch that DNS_R_TOOMANYRECORDS wasn't
logged in xfrin_fail.
2024-09-10 14:02:38 +02:00
Aram Sargsyan
0367c60759 Fix RCU API usage in acl.c
The rcu_xchg_pointer() function can be used outside of a critical
section, and usually must be followed by a synchronize_rcu() or
call_rcu() call to detach from the resource, unless if there are
some guarantees in place because of our own reference counting.
2024-09-10 09:54:20 +00:00
Mark Andrews
61faffd06f Add flag to named-checkconf to ignore "not configured" errors
named-checkconf now takes "-n" to ignore "not configured" errors.
This allows named-checkconf to check the syntax of configurations
from other builds which have support for more options.
2024-09-09 23:32:16 +00:00
Matthijs Mekking
911daeb306 Nit logging change
Fix wrong function name (dns_dnssec_keymgr -> dns_keymgr_run).

Add error log if dns_keymgr_offline() fails.
2024-09-03 12:01:21 +02:00
Matthijs Mekking
5af53a329f Fix bug in dns_keymgr_offline
If the ZSK has lifetime unlimited, the timing metadata "Inactive" and
"Delete" cannot be found and is treated as an error. Fix by allowing
these metadata to not exist.
2024-09-03 11:57:56 +02:00
Ondřej Surý
5a2df8caf5 Follow the number of CPU set by taskset/cpuset
Administrators may wish to constrain the set of cores that BIND 9 runs
on via the 'taskset', 'cpuset' or 'numactl' programs (or equivalent on
other O/S), for example to achieve higher (or more stable) performance
by more closely associating threads with individual NIC rx queues. If
the admin has used taskset, it follows that BIND ought to
automatically use the given number of CPUs rather than the system wide
count.

Co-Authored-By: Ray Bellis <ray@isc.org>
2024-08-29 14:43:18 +00:00
Mark Andrews
d42ea08f16 Return partial match when requested
Return partial match from dns_db_find/dns_db_find when requested
to short circuit the closest encloser discover process.  Most of the
time this will be the actual closest encloser but may not be when
there yet to be committed / cleaned up versions of the zone with
names below the actual closest encloser.
2024-08-29 12:48:20 +00:00
Mark Andrews
43f0b0e8eb Move lock earlier in the call sequence
fctx->state should be read with the lock held.

    1559        /*
    1560         * Caller must be holding the fctx lock.
    1561         */

    CID 468796: (#1 of 1): Data race condition (MISSING_LOCK)
    1. missing_lock: Accessing fctx->state without holding lock fetchctx.lock.
       Elsewhere, fetchctx.state is written to with fetchctx.lock held 2 out of 2 times.
    1562        REQUIRE(fctx->state == fetchstate_done);
    1563
    1564        FCTXTRACE("sendevents");
    1565
    1566        LOCK(&fctx->lock);
    1567
2024-08-29 04:33:56 +00:00
Mark Andrews
a45e39d114 Use atomics to access find->status 2024-08-28 22:42:16 +00:00
Mark Andrews
c900300f21 Use an accessor fuction to access find->status
find->status is marked as private and access is controlled
by find->lock.
2024-08-28 22:42:16 +00:00
Aram Sargsyan
c7e8b7cf63 Exempt prefetches from the fetches-per-server quota
Give prefetches a free pass through the quota so that the cache
entries for popular zones could be updated successfully even if the
quota for is already reached.
2024-08-26 15:50:21 +00:00
Aram Sargsyan
cada2de31f Exempt prefetches from the fetches-per-zone quota
Give prefetches a free pass through the quota so that the cache entry
for a popular zone could be updated successfully even if the quota for
it is already reached.
2024-08-26 15:50:21 +00:00
Ondřej Surý
d61712d14e Stop using malloc_usable_size and malloc_size
Although the nanual page of malloc_usable_size says:

    Although the excess bytes can be over‐written by the application
    without ill effects, this is not good programming practice: the
    number of excess bytes in an allocation depends on the underlying
    implementation.

it looks like the premise is broken with _FORTIFY_SOURCE=3 on newer
systems and it might return a value that causes program to stop with
"buffer overflow" detected from the _FORTIFY_SOURCE.  As we do have own
implementation that tracks the allocation size that we can use to track
the allocation size, we can stop relying on this introspection function.

Also the newer manual page for malloc_usable_size changed the NOTES to:

    The value returned by malloc_usable_size() may be greater than the
    requested size of the allocation because of various internal
    implementation details, none of which the programmer should rely on.
    This function is intended to only be used for diagnostics and
    statistics; writing to the excess memory without first calling
    realloc(3) to resize the allocation is not supported.  The returned
    value is only valid at the time of the call.

Remove usage of both malloc_usable_size() and malloc_size() to be on the
safe size and only use the internal size tracking mechanism when
jemalloc is not available.
2024-08-26 15:00:44 +00:00
Evan Hunt
642a1b985d remove the "dialup" and "heartbeat-interval" options
mark "dialup" and "heartbeat-interval" options as ancient and
remove the documentation and the code implementing them.
2024-08-22 11:11:10 -07:00
Aram Sargsyan
c05a823e8b Implement the 'request-ixfr-max-diffs' configuration option
This limits the maximum number of received incremental zone
transfer differences for a secondary server. Upon reaching the
confgiured limit, the secondary aborts IXFR and initiates a full
zone transfer (AXFR).
2024-08-22 13:42:27 +00:00
Mark Andrews
035289be71 Check key tag range when matching dnssec keys to kasp keys 2024-08-22 12:12:02 +00:00
Mark Andrews
c5bc0a1805 Add optional range directive to keys in dnssec-policy 2024-08-22 12:12:02 +00:00
Mark Andrews
25bf77fac6 Add the concept of allowed key tag ranges to kasp 2024-08-22 12:12:02 +00:00
Matthijs Mekking
f37eb33f29 Fix algorithm rollover bug wrt keytag conflicts
If there is an algorithm rollover and two keys of different algorithm
share the same keytags, then there is a possibility that if we check
that a key matches a specific state, we are checking against the wrong
key.

Fix this by not only checking for matching key id but also key
algorithm.
2024-08-22 11:29:43 +02:00
Ondřej Surý
7b756350f5
Use clang-format-19 to update formatting
This is purely result of running:

    git-clang-format-19 --binary clang-format-19 origin/main
2024-08-22 09:21:55 +02:00
Matthijs Mekking
2e3068ed60 Disable some behavior in offline-ksk mode
Some things we no longer want to do when we are in offline-ksk mode.

1. Don't check for inactive and private keys if the key is a KSK.
2. Don't update the TTL of DNSKEY, CDS and CDNSKEY RRset, these come
   from the SKR.
2024-08-22 08:21:52 +02:00
Matthijs Mekking
61cf599fbf Retrieve RRSIG from SKR
When it is time to generate a new signature (dns_dnssec_sign), rather
than create a new one, retrieve it from the SKR.
2024-08-22 08:21:52 +02:00
Matthijs Mekking
30d20b110e Don't read private key files for offline KSKs
When we are appending contents of a DNSKEY rdataset to a keylist,
don't attempt to read the private key file of a KSK when we are in
offline-ksk mode.
2024-08-22 08:21:52 +02:00
Matthijs Mekking
2190aa904f Update key states in offline-ksk mode
With offline-ksk enabled, we don't run the keymgr because the key
timings are determined by the SKR. We do update the key states but
we derive them from the timing metadata.

Then, we can skip a other tasks in offline-ksk mode, like DS checking
at the parent and CDS synchronization, because the CDS and CDNSKEY
RRsets also come from the SKR.
2024-08-22 08:21:52 +02:00
Matthijs Mekking
63e058c29e Apply SKR bundle on rekey
When a zone has a skr structure, lookup the currently active bundle
that contains the right key and signature material.
2024-08-22 08:21:52 +02:00
Matthijs Mekking
037382c4a5 Implement SKR import
When 'rndc skr import' is called, read the file contents and store the
data in the zone's skr structure.
2024-08-22 08:21:52 +02:00
Matthijs Mekking
445722d2bf Add code to store SKR
This added source code stores SKR data. It is loosely based on:
https://www.iana.org/dnssec/archive/files/draft-icann-dnssec-keymgmt-01.txt

A SKR contains a list of signed DNSKEY RRsets. Each change in data
should be stored in a separate bundle. So if the RRSIG is refreshed that
means it is stored in the next bundle. Likewise, if there is a new ZSK
pre-published, it is in the next bundle.

In addition (not mentioned in the draft), each bundle may contain
signed CDS and CDNSKEY RRsets.

Each bundle has an inception time. These will determine when we need
to re-sign or re-key the zone.
2024-08-22 08:21:52 +02:00
Matthijs Mekking
0598381236 Add offline-ksk option
Add a new configuration option to enable Offline KSK key management.

Offline KSK cannot work with CSK because it splits how keys with the
KSK and ZSK role operate. Therefore, one key cannot have both roles.
Add a configuration check to ensure this.
2024-08-22 08:21:52 +02:00
Nicki Křížek
779de4ec34
Merge tag 'v9.21.0' 2024-08-21 16:23:09 +02:00
Ondřej Surý
3bca3cb5cf
Destroy the dns_xfrin isc_timers on the correct loop
There are few places where we attach/detach from the dns_xfrin object
while running on a different thread than the zone's assigned thread -
xfrin_xmlrender() in the statschannel and dns_zone_stopxfr() to name the
two places where it happens now.  In the rare case, when the incoming
transfer completes (or shuts down) in the brief period between the other
thread attaches and detaches from the dns_xfrin, the isc_timer_destroy()
calls would be called by the last thread calling the xfrin_detach().
In the worst case, it would be this other thread causing assertion
failure.  Move the isc_timer_destroy() call to xfrin_end() function
which is always called on the right thread and to match this move
isc_timer_create() to xfrin_start() - although this other change makes
no difference.
2024-08-21 13:54:40 +02:00
Ondřej Surý
679e90a57d Add isc_log_createandusechannel() function to simplify usage
The new
isc_log_createandusechannel() function combines following calls:

    isc_log_createchannel()
    isc_log_usechannel()

calls into a single call that cannot fail and therefore can be used in
places where we know this cannot fail thus simplifying the error
handling.
2024-08-20 12:50:39 +00:00
Ondřej Surý
091d738c72 Convert all categories and modules into static lists
Remove the complicated mechanism that could be (in theory) used by
external libraries to register new categories and modules with
statically defined lists in <isc/log.h>.  This is similar to what we
have done for <isc/result.h> result codes.  All the libraries are now
internal to BIND 9, so we don't need to provide a mechanism to register
extra categories and modules.
2024-08-20 12:50:39 +00:00
Ondřej Surý
8506102216 Remove logging context (isc_log_t) from the public namespace
Now that the logging uses single global context, remove the isc_log_t
from the public namespace.
2024-08-20 12:50:39 +00:00
Ondřej Surý
043f11de3f Remove isc_log_write1() and isc_log_vwrite1() functions
The isc_log_write1() and isc_log_vwrite1() functions were meant to
de-duplicate the messages sent to the isc_log subsystem.  However, they
were never used in an entire code base and the whole mechanism around it
was complicated and very inefficient.  Just remove those, there are
better ways to deduplicate syslog messages inside syslog daemons now.
2024-08-20 12:50:39 +00:00
Ondřej Surý
b2dda86254 Replace isc_log_create/destroy with isc_logconfig_get()
Add isc_logconfig_get() function to get the current logconfig and use
the getter to replace most of the little dancing around setting up
logging in the tools. Thus:

    isc_log_create(mctx, &lctx, &logconfig);
    isc_log_setcontext(lctx);
    dns_log_setcontext(lctx);
    ...
    ...use lcfg...
    ...
    isc_log_destroy();

is now only:

    logconfig = isc_logconfig_get(lctx);
    ...use lcfg...

For thread-safety, isc_logconfig_get() should be surrounded by RCU read
lock, but since we never use isc_logconfig_get() in threaded context,
the only place where it is actually used (but not really needed) is
named_log_init().
2024-08-20 12:50:39 +00:00
Ondřej Surý
a8a689531f Use single logging context for everything
Instead of juggling different logging context, use one single logging
context that gets initialized in the libisc constructor and destroyed in
the libisc destructor.

The application is still responsible for creating the logging
configuration before using the isc_log API.

This patch is first in the series in a way that it is transparent for
the users of the isc_log API as the isc_log_create() and
isc_log_destroy() are now thin shims that emulate the previous
functionality, but it isc_log_create() will always return internal
isc__lctx pointer and isc_log_destroy() will actually not destroy the
internal isc__lctx context.

Signed-off-by: Ondřej Surý <ondrej@isc.org>
2024-08-20 12:50:39 +00:00
Aram Sargsyan
8bb9568467 Process also the ISC_R_CANCELED result code in rpz_rewrite()
Log  canceled queries (e.g. when shutting down a hung fetch)
in DEBUG3 level instead of DEBUG1 which is used for the
"unrecognized" result codes.
2024-08-19 10:15:01 +00:00
Ondřej Surý
59f4fdebc0 Check the result of dirfd() before calling unlinkat()
Instead of directly using the result of dirfd() in the unlinkat() call,
check whether the returned file descriptor is actually valid.  That
doesn't really change the logic as the unlinkat() would fail with
invalid descriptor anyway, but this is cleaner and will report the right
error returned directly by dirfd() instead of EBADF from unlinkat().
2024-08-19 09:57:28 +00:00
Ondřej Surý
2fbf9757b8 Remove code to read and parse /proc/net/if_inet6 on Linux
The getifaddr() works fine for years, so we don't have to
keep the callback to parse /proc/net/if_inet6 anymore.
2024-08-19 09:42:55 +00:00
Ondřej Surý
dda5ba53df Ignore errno returned from rewind() in the interface iterator
The clang-scan 19 has reported that we are ignoring errno after the call
to rewind().  As we don't really care about the result, just silence the
error, the whole code will be removed in the development version anyway
as it is not needed.
2024-08-19 09:42:55 +00:00
Ondřej Surý
122a142241 Use constexpr for NS_PER_SEC and friends constants
The contexpr introduced in C23 standard makes perfect sense to be used
instead of preprocessor macros - the symbols are kept, etc.  Define
ISC_CONSTEXPR to be `constexpr` for C23 and `static const` for the older
C standards.  Use the newly introduced macro for the NS_PER_SEC and
friends time constants.
2024-08-19 09:08:55 +00:00
Ondřej Surý
b03e90e0d4 Change the NS_PER_SEC (and friends) from enum to static const
New version of clang (19) has introduced a stricter checks when mixing
integer (and float types) with enums.  In this case, we used enum {}
as C17 doesn't have constexpr yet.  Change the time conversion constants
to be static const unsigned int instead of enum values.
2024-08-19 09:08:55 +00:00