When the mark_ancient() helper function was introduced, couple of places
with duplicate (or almost duplicate) code was missed. Move the
mark_ancient() function closer to the top of the file, and correctly use
it in places that mark the header as ANCIENT.
If we know that the header has ZEROTTL set, the server should never send
stale records for it and the TTL should never be anything else than 0.
The comment was already there, but the code was not matching the
comment.
The old name was misleading as it never meant time-to-live, e.g. number
of seconds from now when the header should expire. The true meaning was
an expiration time e.g. now + ttl. This was the original design bug
that caused the slip when we assigned header->ttl to rdataset->ttl.
Because the name was matching, nobody has questioned the correctness of
the code both during the MR review and during the numerous re-reviews
when we were searching for the cause of the 54 year TTL.
When the header has been marked as ANCIENT, but the ttl hasn't been
reset (this happens in couple of places), the rdataset TTL would be
set to the header timestamp instead to a reasonable TTL value.
Since this header has been already expired (ANCIENT is set), set the
rdataset TTL to 0 and don't reuse this field to print the expiration
time when dumping the cache. Instead of printing the time, we now
just print 'expired (awaiting cleanup'.
When there are parent and child zones on the same server, the DNSKEY
lookup was failing as the pending record we are validating is needed
to fetch the DNSKEY records. This change allows that to happen.
The caller is already setting STARTATZONE when the name being looked
up is a subdomain of the current domain.
The address lookups from ADB were not being validated, allowing
spoofed responses to be accepted and used for other lookups.
Validate the answers except when CD=1 is set in the triggering
request. Separate ADB names looked up with CD=1 from those without
CD=1, to prevent the use of unvalidated answers in the normal lookup
case (CD=0). Set the TTL on unvalidated (pending) responses to
ADB_CACHE_MINIMUM when adding them to the ADB.
the search for the deepest known zone cut in the cache could
improperly reject a node containing stale data, even if the
NS rdataset wasn't the data that was stale.
this change also improves the efficiency of the search by
stopping it when both NS and RRSIG(NS) have been found.
Change the names of the node reference counting functions
and add comments to make the mechanism easier to understand:
- newref() and decref() are now called qpcnode_acquire()/
qpznode_acquire() and qpcnode_release()/qpznode_release()
respectively; this reflects the fact that they modify both
the internal and external reference counters for a node.
- qpcnode_newref() and qpznode_newref() are now called
qpcnode_erefs_increment() and qpznode_erefs_increment(), and
qpcnode_decref() and qpznode_decref() are now called
qpcnode_erefs_decrement() and qpznode_erefs_decrement(),
to reflect that they only increase and decrease the node's
external reference counters, not internal.
This removes the db_nodelock_t structure and changes the node_locks
array to be composed only of isc_rwlock_t pointers. The .reference
member has been moved to qpdb->references in addition to
common.references that's external to dns_db API users. The .exiting
members has been completely removed as it has no use when the reference
counting is used correctly.
The origin_node in qpcache was always NULL, so we can remove the
getoriginode() function and origin_node pointer as the
dns_db_getoriginnode() correctly returns ISC_R_NOTFOUND when the
function is not implemented.
Cleanup the pattern in the decref() functions in both qpcache.c and
qpzone.c, so it follows the similar patter as we already have in
newref() function.
In order to avoid to loop to find the next position to store an EDE in
a dns_edectx_t, add a "nextede" state which holds the next available
position.
Also, in order ot avoid to loop to find if an EDE is already existing in
a dns_edectx_t, and avoid a duplicate, use a bitmap to immediately know
if the EDE is there or not.
Those both changes applies for adding or copying EDE.
Also make the direction of dns_ede_copy more explicit/avoid errors by
making "edectx_from" a const pointer.
Migrate tests cases in client_test code which were exclusively testing
code which is now all wrapped inside ede compilation unit. Those are
testing maximum number of EDE, duplicate EDE as well as truncation of
text of an EDE.
Also add coverage for the copy of EDE from an edectx to another one, as
well as checking the assertion of the maximum EDE info code which can be
used.
Instead of mixing the dns_resolver and dns_validator units directly with
the EDE code, split-out the dns_ede functionality into own separate
compilation unit and hide the implementation details behind abstraction.
Additionally, the EDE codes are directly copied into the ns_client
buffers by passing the EDE context to dns_resolver_createfetch().
This makes the dns_ede implementation simpler to use, although sligtly
more complicated on the inside.
Co-authored-by: Colin Vidal <colin@isc.org>
Co-authored-by: Ondřej Surý <ondrej@isc.org>
When an EDE code is added to a message, the code is converted early in a
big-endian order so it can be memcpy-ed directly in the EDE buffer that
will go on the wire.
This previous change forget to update debug logs which still assume the
EDE code was in host byte order. Add a separate variable to
differentiate both and avoid ambiguities
Extended DNS error 22 (No reachable authority) was previously detected
when `fctx_expired` fired. It turns out this function is used as a
"safety net" and the timeout detection should be caught earlier.
It was working though, because of another issue fixed by !9927. Since
this change, the recursive request timed out detection occurs before
`fctx_expired` so EDE 22 is not added to the response message anymore.
The fix of the problem is to add the EDE 22 code in two situations:
- When the dispatch code timed out (rctx_timedout) the resolver code
checks various properties to figure out if it needs to make another
fetch attempt. One of the paramters if the fetch expiration time. If
it expires, the whole recursion is canceled, so it now adds the EDE 22
code.
- If the fetch expiration time doesn't expires in the case above (and
other parameters allows it) a new fetch attempt is made (fctx_query).
But before the new request is actually made, the fetch expiration time
is re-checked. It might then has elapsed, and the whole recursion is
canceled. So it now also adds the EDE 22 code here as well.
Add support for EDE codes 1 (Unsupported DNSKEY Algorithm) and 2
(Unsupported DS Digest Type) which might occurs during DNSSEC
validation in case of unsupported DNSKEY algorithm or DS digest type.
Because DNSSEC internally kicks off various fetches, we need to copy
all encountered extended errors from fetch responses to the fetch
context. Upon an event, the errors from the fetch context are copied
to the client response.
ISCCC_R_SYNTAX, ISCCC_R_EXPIRED, and ISCCC_R_CLOCKSKEW have the
same usage and text formats as DNS_R_SYNTAX, DNS_R_EXPIRED and
DNS_R_CLOCKSCREW respectively. this was originally done because
result codes were defined in separate libraries, and some tool
might be linked with libisccc but not libdns. as the result codes
are now defined in only one place, there's no need to retain the
duplicates.
the isc_mem allocation functions can no longer fail; as a result,
ISC_R_NOMEMORY is now rarely used: only when an external library
such as libjson-c or libfstrm could return NULL. (even in
these cases, arguably we should assert rather than returning
ISC_R_NOMEMORY.)
code and comments that mentioned ISC_R_NOMEMORY have been
cleaned up, and the following functions have been changed to
type void, since (in most cases) the only value they could
return was ISC_R_SUCCESS:
- dns_dns64_create()
- dns_dyndb_create()
- dns_ipkeylist_resize()
- dns_kasp_create()
- dns_kasp_key_create()
- dns_keystore_create()
- dns_order_create()
- dns_order_add()
- dns_peerlist_new()
- dns_tkeyctx_create()
- dns_view_create()
- dns_zone_setorigin()
- dns_zone_setfile()
- dns_zone_setstream()
- dns_zone_getdbtype()
- dns_zone_setjournal()
- dns_zone_setkeydirectory()
- isc_lex_openstream()
- isc_portset_create()
- isc_symtab_create()
(the exception is dns_view_create(), which could have returned
other error codes in the event of a crypto library failure when
calling isc_file_sanitize(), but that should be a RUNTIME_CHECK
anyway.)
Track inside the dns_dnsseckey structure whether we have seen the
private key, or if this key only has a public key file.
If the key only has a public key file, or a DNSKEY reference in the
zone, mark the key 'pubkey'. In dnssec-signzone, if the key only
has a public key available, consider the key to be offline. Any
signatures that should be refreshed for which the key is not available,
retain the signature.
So in the code, 'expired' becomes 'refresh', and the new 'expired'
is only used to determine whether we need to keep the signature if
the corresponding key is not available (retaining the signature if
it is not expired).
In the 'keysthatsigned' function, we can remove:
- key->force_publish = false;
- key->force_sign = false;
because they are redundant ('dns_dnsseckey_create' already sets these
values to false).
Extended DNS error mechanism (EDE) enables to have several EDE raised
during a DNS resolution (typically, a DNSSEC query will do multiple
fetches which each of them can have an error). Add support to up to 3
EDE errors in an DNS response. If duplicates occur (two EDEs with the
same code, the extra text is not compared), only the first one will be
part of the DNS answer.
Because the maximum number of EDE is statically fixed, `ns_client_t`
object own a static vector of `DNS_DE_MAX_ERRORS` (instead of a linked
list, for instance). The array can be fully filled (all slots point to
an allocated `dns_ednsopt_t` object) or partially filled (or
empty). In such case, the first NULL slot means there is no more EDE
objects.
When TCP is used, 'fctx_query()' adds one second to the rtt
(round-trip time) value, but there's a bug when the decision
about using TCP is made already after the calculation. Move the
block of the code which looks up the peers list to decide
whether to use TCP into a place that's before the rtt calculation
is performed. This commit doesn't add or remove any code, it just
moves the code and adds a comment block.
When 'ISC_R_TIMEDOUT' is received in 'tcp_recv()', it times out the
oldest response in the active responses queue, and only after that it
checks whether other active responses have also timed out. So when
setting a timeout value for a read operation after a successful
connection, it makes sense to take the timeout value from the oldest
response in the active queue too, because, theoretically, the responses
can have different timeout values, e.g. when the TCP dispatch is shared.
Currently 'resp' is always NULL. Previously when connect and read
timeouts were not separated in dispatch this affected only logging, but
now since we are setting a new timeout after a successful connection,
we need to choose a suitable response from the active queue.
Previously, this function always acquires a node write lock if it
might need node cleanup in case the reference decrements to 0. In
fact, the lock is unnecessary if the reference is larger than 1 and it
can be optimized as an "easy" case. This optimization could even be
"necessary". In some extreme cases, many worker threads could repeat
acquring and releasing the reference on the same node, resulting in
severe lock contention for nothing (as the ref wouldn't decrement to 0
in most cases). This change would prevent noticeable performance
drop like query timeout for such cases.
Co-authored-by: JINMEI Tatuya <jtatuya@infoblox.com>
Co-authored-by: Ondřej Surý <ondrej@isc.org>
Currently, the fetch context will continue running even when the last
fetch (response) has been removed from the context, so named can process
and cache the answer. This can lead to a situation where the number of
outgoing recursing clients exceeds the the configured number for
recursive-clients.
Be more stringent about the recursive-clients limit and shutdown the
fetch context immediately after the last fetch has been canceled from
that particular fetch context.
Address Database (ADB) shares the memory for the short lived ADB
objects (finds, fetches, addrinfo) and the long lived ADB
objects (names, entries, namehooks). This could lead to a situation
where the resolver-heavy load would force evict ADB objects from the
database to point where ADB is completely empty, leading to even more
resolver-heavy load.
Make the short lived ADB objects use the other memory context that we
already created for the hashmaps. This makes the ADB overmem condition
to not be triggered by the ongoing resolver fetches.
In some places there was a limitation of the maximum timeout
value of INT16_MAX, which is only about 32 seconds. Refactor
the code to remove the limitation.
The network manager layer has two different timers with their
own timeout values for TCP connections: connect timeout and read
timeout. Separate the connect and the read TCP timeouts in the
dispatch module too.
struct fetchctx does have a list of pending validators as well as a
pointer to the HEAD validator. Remove the validator pointer to avoid
confusion, as there is no perticular reasons to have it directly
accessible outside of the list.
We started using isc_nm_bad_request() more actively throughout
codebase. In the case of HTTP/2 it can lead to a large count of
useless "Bad Request" messages in the BIND log, as often we attempt to
send such request over effectively finished HTTP/2 sessions.
This commit fixes that.
A call to isc_nm_read_stop() would always stop reading timer even in
manual timer control mode which was added with StreamDNS in mind. That
looks like an omission that happened due to how timers are controlled
in StreamDNS where we always stop the timer before pausing reading
anyway (see streamdns_on_complete_dnsmessage()). That would not work
well for HTTP, though, where we might want pause reading without
stopping the timer in the case we want to split incoming data into
multiple chunks to be processed independently.
I suppose that it happened due to NM refactoring in the middle of
StreamDNS development (at the time isc_nm_cancelread() and
isc_nm_pauseread() were removed), as the StreamDNS code seems to be
written as if timers are not stoping during a call to
isc_nm_read_stop().
This commit introduces manual read timer control as used by StreamDNS
and its underlying transports. Before that, DoH code would rely on the
timer control provided by TCP, which would reset the timer any time
some data arrived. Now, the timer is restarted only when a full DNS
message is processed in line with other DNS transports.
That change is required because we should not stop the timer when
reading from the network is paused due to throttling. We need a way to
drop timed-out clients, particularly those who refuse to read the data
we send.
This commit adds logic to make code better protected against clients
that send valid HTTP/2 data that is useless from a DNS server
perspective.
Firstly, it adds logic that protects against clients who send too
little useful (=DNS) data. We achieve that by adding a check that
eventually detects such clients with a nonfavorable useful to
processed data ratio after the initial grace period. The grace period
is limited to processing 128 KiB of data, which should be enough for
sending the largest possible DNS message in a GET request and then
some. This is the main safety belt that would detect even flooding
clients that initially behave well in order to fool the checks server.
Secondly, in addition to the above, we introduce additional checks to
detect outright misbehaving clients earlier:
The code will treat clients that open too many streams (50) without
sending any data for processing as flooding ones; The clients that
managed to send 1.5 KiB of data without opening a single stream or
submitting at least some DNS data will be treated as flooding ones.
Of course, the behaviour described above is nothing else but
heuristical checks, so they can never be perfect. At the same time,
they should be reasonable enough not to drop any valid clients,
realatively easy to implement, and have negligible computational
overhead.
Initially, our DNS-over-HTTP(S) implementation would try to process as
much incoming data from the network as possible. However, that might
be undesirable as we might create too many streams (each effectively
backed by a ns_client_t object). That is too forgiving as it might
overwhelm the server and trash its memory allocator, causing high CPU
and memory usage.
Instead of doing that, we resort to processing incoming data using a
chunk-by-chunk processing strategy. That is, we split data into small
chunks (currently 256 bytes) and process each of them
asynchronously. However, we can process more than one chunk at
once (up to 4 currently), given that the number of HTTP/2 streams has
not increased while processing a chunk.
That alone is not enough, though. In addition to the above, we should
limit the number of active streams: these streams for which we have
received a request and started processing it (the ones for which a
read callback was called), as it is perfectly fine to have more opened
streams than active ones. In the case we have reached or surpassed the
limit of active streams, we stop reading AND processing the data from
the remote peer. The number of active streams is effectively decreased
only when responses associated with the active streams are sent to the
remote peer.
Overall, this strategy is very similar to the one used for other
stream-based DNS transports like TCP and TLS.
Limit the number of records appended to ADDITIONAL section to the names
that have less than 14 records in the RDATA. This limits the number
of the lookups into the database(s) during single client query.
Also don't append any additional data to ANY queries. The answer to ANY
is already big enough.
All the database implementations share the same names for the methods
implementing the database. That has some advantages like knowing what
to expect, but it turns out that any time such method shows up in any
kind of tracing - be it perf record, backtrace or anything else that
uses symbol names, it is very hard to distinguish whether the find()
belongs to qpcache, qpzone, builtin or sdlz implementation.
Make at least the names for qpzone and qpcache unique.
there was a database bug in which dns_db_find() could get a partial
match for the query name, but still set foundname to match the full
query name. this triggered an assertion when query_addwildcardproof()
assumed that foundname would be shorter.
the database bug has been fixed, but in case it happens again, we
can just copy the name instead of splitting it. we will also log a
warning that the closest-encloser name was invalid.
when adding a new NSEC3 record, dns_nsec3_addnsec3() uses a
dbiterator to seek to the newly created node and then find its
predecessor. dbiterators in the qpzone use snapshots, so changes
to the database are not reflected in an already-existing iterator.
consequently, when we add a new node, we have to create a new iterator
before we can seek to it.
when a requested name is found in the QP trie during a lookup, but its
records have been marked as nonexistent by a previous deletion, then
it's treated as a partial match, but the foundname could be left
pointing to the original qname rather than the parent. this could
lead to an assertion failure in query_findclosestnsec3().
The code in zone_startload() disables RPZ and CATZ for a zone if
dns_master_loadfile() returns anything other than ISC_R_SUCCESS,
which makes sense, but it's an error because zone_startload() can
also return DNS_R_SEENINCLUDE upon success when the zone had an
$INCLUDE statement.
Ensure the log prefixes passed to the dns_message_logpacketfrom()
function by its callers do not include the word "from" as the latter is
now emitted by the logfmtpacket() helper function.
Ensure the log prefixes passed to the dns_message_logpacketfromto()
function by its callers do not include the words "from" or "to" as those
are now emitted by the logfmtpacket() helper function.
Move dns_dispentry_getlocaladdress() calls around so that they are not
only invoked when dnstap support is compiled in. This function calls
isc_nmhandle_localaddr(), which may issue a system call, but only if the
ISC_SOCKET_DETAILS preprocessor macro is set at compile time.
Pass the value extracted by dns_dispentry_getlocaladdress() to
dns_message_logpacketfromto() so that it gets logged, adding useful
information to the relevant debug messages.
Since dns_message_logpacket() only takes a single socket address as a
parameter (and it is always the sending socket's address), rename it to
dns_message_logpacketfrom() so that its name better conveys its purpose
and so that the difference in purpose between this function and
dns_message_logpacketfromto() becomes more apparent.
Since dns_message_logfmtpacket() needs to be provided with both "from"
and "to" socket addresses, rename it to dns_message_logpacketfromto() so
that its name better conveys its purpose. Clean up the code comments
for that function.
Change the function prototype for dns_message_logfmtpacket() so that it
takes two isc_sockaddr_t parameters: one for the sending side and
another one for the receiving side. This enables debug messages to be
more precise.
Also adjust the function prototype for logfmtpacket() accordingly.
Unlike dns_message_logfmtpacket(), this function must not require both
'from' and 'to' parameters to be non-NULL as it is still going to be
used by dns_message_logpacket(), which only provides a single socket
address. Adjust its log format to handle both of these cases properly.
Adjust both dns_message_logfmtpacket() call sites accordingly, without
actually providing the second socket address yet. (This causes the
revised REQUIRE() assertion in dns_message_logfmtpacket() to fail; the
issue will be addressed in a separate commit.)
Both existing callers of the dns_message_logfmtpacket() function set the
argument passed as 'style' to &dns_master_style_comment. To simplify
these call sites, drop the 'style' parameter from the prototype for
dns_message_logfmtpacket() and use a fixed value of
&dns_master_style_comment in the function's body instead.
All callers of the logfmtpacket() helper function require the argument
passed as 'address' to be non-NULL. Meanwhile, the 'newline' and
'space' local variables in logfmtpacket() are only set to values
different than their initial values if the 'address' parameter is NULL.
Replace the 'newline' and 'space' local variables in logfmtpacket() with
fixed strings to improve code readability.
Extracting the exact address that each wildcard/TCP socket is bound to
locally requires issuing the getsockname() system call, which libuv
exposes via its uv_*_getsockname() functions. This is only required for
detailed logging and comes at a noticeable performance cost, so it
should not happen by default. However, it is useful for debugging
certain problems (e.g. cryptic system test failures), so a convenient
way of enabling that behavior should exist.
Update isc_nmhandle_localaddr() so that it calls uv_*_getsockname() when
the ISC_SOCKET_DETAILS preprocessor macro is set at compile time.
Ensure proper handling of sockets that wrap other sockets.
Set the new ISC_SOCKET_DETAILS macro by default when --enable-developer
is passed to ./configure. This enables detailed logging in the system
tests run in GitLab CI without affecting performance in non-development
BIND 9 builds.
Note that setting the ISC_SOCKET_DETAILS preprocessor macro at compile
time enables all callers of isc_nmhandle_localaddr() to extract the
exact address of a given local socket, which results e.g. in dnstap
captures containing more accurate information.
Mention the new preprocessor macro in the section of the ARM that
discusses why exact socket addresses may not be logged by default.
The dns_dispatch_gettcp() function is used for finding an existing TCP
connection that can be reused for sending a query from a specified local
address to a specified remote address. The logic for matching the
provided <local address, remote address> tuple to one of the existing
TCP connections is implemented in the dispatch_match() function:
- if the examined TCP connection already has a libuv handle assigned,
it means the connection has already been established; therefore,
compare the provided <local address, remote address> tuple against
the corresponding address tuple for the libuv handle associated with
the connection,
- if the examined TCP connection does not yet have a libuv handle
assigned, it means the connection has not yet been established;
therefore, compare the provided <local address, remote address>
tuple against the corresponding address tuple that the TCP
connection was originally created for.
This logic limits TCP connection reuse potential as the libuv handle
assigned to an existing dispatch object may have a more specific local
<address, port> tuple associated with it than the local <address, port>
tuple that the dispatch object was originally created for. That's
because the local address for outgoing connections can be set to a
wildcard <address, port> tuple (indicating that the caller does not care
what source <address, port> tuple will be used for establishing the
connection, thereby delegating the task of picking it to the operating
system) and then get "upgraded" to a specific <address, port> tuple when
the socket is bound (and a libuv handle gets associated with it). When
another dns_dispatch_gettcp() caller then tries to look for an existing
TCP connection to the same peer and passes a wildcard address in the
local part of the tuple, the function will not match that request to a
previously-established TCP connection (unless isc_nmhandle_localaddr()
returns a wildcard address as well).
Simplify dispatch_match() so that the libuv handle associated with an
existing dispatch object is not examined for the purpose of matching it
to the provided <local address, remote address> tuple; instead, always
examine the <local address, remote address> tuple that the dispatch
object was originally created for. This enables reuse of TCP
connections created without providing a specific local socket address
while still preventing other connections (created for a specific local
socket address) from being inadvertently shared.
This commit ensures that BIND enables TLS SNI support for outgoing DoT
connections (when possible) in order to improve compatibility with
other DNS server software.
This commit adds support for setting SNI hostnames in outgoing
connections over TLS.
Most of the changes are related to either adapting the code to accept
and extra argument in *connect() functions and a couple of changes to
the TLS Stream to actually make use of the new SNI hostname
information.
Previously, we had an ISC_CONSTEXPR macro that was expanded to either
`constexpr` or `static const`, depending on compiler support. To make
the code cleaner, move `constexpr` support detection to Autoconf; if
`constexpr` support is missing from the compiler, define `constexpr` as
`static const` in config.h.
Since BIND 9 headers are not longer public, there's no reason to keep
the ISC_LANG_BEGINDECL and ISC_LANG_ENDDECL macros to support including
them from C++ projects.
This is a second attempt to rewrite the GLUE cache to not use per
database version hash table. Instead of keeping a hash table indexed by
the node, use a directly linked list of GLUE records for each
slabheader. This was attempted before, but there was a data race caused
by the fact that the thread cleaning the GLUE records could be slower
than accessing the slab headers again and reinitializing the wait-free
stack.
The improved design builds on the previous design, but adds a new
dns_gluelist structure that has a pointer to the database version.
If a dns_gluelist belonging to a different (old) version is detected, it
is just detached from the slabheader and left for the closeversion() to
clean it up later.
The 'remote-servers' named.conf reference conflicts with the standard
term from the glossary. Rename the standard term to server-list to
make the docs build.
Add back the top blocks 'parental-agents', 'primaries', and 'masters'
to the configuration. Do not document them as so many names for the
same clause is confusing.
This has a slight negative side effect that a top block 'primaries'
can be referred to with a zone statement 'parental-agents' for example,
but that shouldn't be a big issue.
Having zone statements that are also top blocks is confusing, and if
we want to add more in the future (which I suspect will be for
generalized notifications, multi-signer), we need to duplicate a lot
of code.
Remove top blocks 'parental-agents' and 'primaries' and just have one
top block 'remote-servers' that you can refer to with zone statements.
this commit removes the deprecated "sortlist" option. the option
is now marked as ancient; it is a fatal error to use it in
named.conf.
the sortlist system test has been removed, and other tests that
referenced the option have been modified.
the enabling functions, dns_message_setsortorder() and
dns_rdataset_towiresorted(), have also been removed.
named-checkzone will now, as part of the zone's integrity checks,
look to see if there are A or AAAA records being served and if so
check that the nameservers have A or AAAA records respectively.
These are a sometimes overlooked checks that, if not met, can mean
that a service that is supposed to reachable over IPv6 will not be
resolvable when the recursive resolver is IPv6 only. Similarly for
IPv4 servers when there are IPv4 only resolvers.
- remove obsolete DNS_LOGMODULE_RBT and DNS_LOGMODULE_RBTDB
- correct the misuse of the wrong log modules in dns/rpz.c and
dns/catz.c, and add DNS_LOGMODULE_RPZ and DNS_LOGMODULE_CATZ
to support them.
`shutdown_trigger_close_cb` is not called in the main loop since
queued events in the `loop->async_trigger`, including loop teardown
(shutdown_server) are processed first, before the `uv_close` callback
is executed..
In order to pass the information to the queued events, it is necessary
to set the flag earlier in the process and not wait for the `uv_close`
callback to trigger.
Search directive from resolv.conf had a maximum of 8 domains. Any
more were ignored. Do not ignore them anymore; iterate over any
number of domains.
Test resolv.conf support by checking the first and last domain in
the search list. Ignore the domains between; just ensure that the
last domain in the configuration is the last domain parsed.
Only call eatline() to skip to the next line if we're not
already at the end of a line when parsing an unknown directive.
We were accidentally skipping the next line when there was only
a single unknown directive on the current line.
The DNS_R_MUSTBESECURE lost its meaning with removal of
dnssec-must-be-secure option, so replace the few remaining (and a bit
confusing) use of this result code with DNS_R_NOVALIDSIG.
The dnssec-must-be-secure feature was added in the early days of BIND 9
and DNSSEC and it makes sense only as a debugging feature. There are no
reasons to keep this feature in the production code anymore.
Remove the feature to simplify the code.
Commit af7db89513 as part of #4141 was
supposed to apply the 'max-recursion-queries' quota to validator
queries, but the counter was never actually passed on to
dns_resolver_createfetch(). This has been fixed, and the global query
counter ('max-query-count', per client request) is now also added.
Upstream code doesn't do regular releases, so we need to regularly
sync the code from the upstream repository. This is synchronization up
to the commit f8d0513 from Jan 29, 2024.
While implementing the global limit 'max-query-count', initially I
thought adding the variable to the resolver structure. But the limit
is per client request so it was moved to the view structure (and
counter in ns_query structure). However, I forgot to remove the
variable from the resolver structure again. This commit fixes that.
The root cause is the fix for CVE-2024-0760 (part 3), which resets
the TCP connection on a failed send. Specifically commit
4b7c61381f stops reading on the socket
because the TCP connection is throttling.
When the tcpdns_send_cb callback thinks about restarting reading
on the socket, this fails because the socket is a client socket.
And nsupdate is a client and is using the same netmgr code.
This commit removes the requirement that the socket must be a server
socket, allowing reading on the socket again after being throttled.
Add another option to configure how many outgoing queries per
client request is allowed. The existing 'max-recursion-queries' is
per restart, this one is a global limit.
Previously, the update policy rules check was moved earlier in the
sequence, and the keep rule match pointers were kept to maintain the
ability to verify maximum records by type.
However, these pointers can become invalid if server reloading
or reconfiguration occurs before update completion. To prevent
this issue, extract the maximum records by type value immediately
during processing and only keep the copy of the values instead of the
full ssurule.
the generated grammar for named.conf clauses that may or may not be
enabled at compile time will now print the same comment regardless of
whether or not they are.
previously, the grammar didn't print a comment if an option was enabled,
but printed "not configured" if it was disabled. now, in both cases,
it will say "optional (only available if configured)".
as an incidental fix, clarified the documentation for "named-checkconf -n".
Add support for Extended DNS Errors (EDE) error 22: No reachable
authority. This occurs when after a timeout delay when the resolver is
trying to query an authority server.
The lame-ttl processing was overriden to be disabled in the config,
but the code related to the lame-ttl was still kept in the resolver
code. More importantly, the DNS_RESOLVER_BADCACHETTL() macro would
cause the entries in the resolver badcache to be always cached for at
least 30 seconds even if the lame-ttl would be set to 0.
Remove the dns_badcache code from the dns_resolver unit, so we save some
processing time and memory in the resolver code.
Instead of cleaning the dns_badcache opportunistically, add per-loop
LRU, so each thread-loop can clean the expired entries. This also
allows removal of the atomic operations as the badcache entries are now
immutable, instead of updating the badcache entry in place, the old
entry is now deleted from the hashtable and the LRU list, and the new
entry is inserted in the LRU.
The code that listens on individual interfaces is now stable and doesn't
require any changes. The code that would bind to IPv6 wildcard address
and then use IPv6 pktinfo structure to get the source address is not
going to be completed, so it's better to just remove the dead cruft.
In 2024, it is reasonable to assume that IPv4 and IPv6 is always
available on a socket() level. We still keep the option to enable or
disable each IP version individually, as the routing might be broken or
undesirable for one of the versions.
There was a data race dns_validator_cancel() was called when the
offloaded operations were in progress. Make dns_validator_cancel()
respect the data ownership and only set new .shuttingdown variable when
the offloaded operations are in progress. The cancel operation would
then finish when the offloaded work passes the ownership back to the
respective thread.
Previously a ISC_R_CANCELED result code switch-case has been added to
the zone.c:zone_xfrdone() function, which did two things:
1. Schedule a new zone transfer if there's a scheduled force reload of
the zone.
2. Reset the primaries list.
This proved to be not a well-thought change and causes problems,
because the ISC_R_CANCELED code is used not only when the whole transfer
is canceled, but also when, for example, a particular primary server is
unreachable, and named still needs to continue the transfer process by
trying the next server, which it now no longer does in some cases. To
solve this issue, three changes are made:
1. Make sure dns_zone_refresh() runs on the zone's loop, so that the
sequential calls of dns_zone_stopxfr() and dns_zone_forcexfr()
functions (like done in 'rndc retransfer -force') run in intended
order and don't race with each other.
2. Since starting the new transfer is now guaranteed to run after the
previous transfer is shut down (see the previous change), remove the
special handling of the ISC_R_CANCELED case, and let the default
handler to handle it like before. This will bring back the ability to
try the next primary if the current one was interrupted with a
ISC_R_CANCELED result code.
3. Change the xfrin.c:xfrin_shutdown() function to pass the
ISC_R_SHUTTINGDOWN result code instead of ISC_R_CANCELED, as it makes
more sense.
The DLZ modules are poorly maintained as we only ensure they can still
be compiled, the DLZ interface is blocking, so anything that blocks the
query to the database blocks the whole server and they should not be
used except in testing. The DLZ interface itself should be scheduled
for removal.
The libisc now includes sizeable chunks of cryptography, but the crypto
log module was missing. Add the new ISC_LOGMODULE_CRYPTO to libisc and
use it in the isc_tls error logging.
This change adds a "none" parameter to the query-source[-v6]
options in named.conf, which forbid the usage of IPv4 or IPv6
addresses when doing upstream queries.
DNS_LOGMODULE_RBTDB was simply inappropriate, and this
log message is actually dependent on db implementation
details, so DNS_LOGMODULE_DB would be the best choice.
The new log message is emitted when adding or updating an RRset
fails due to exceeding the max-records-per-type limit. The log includes
the owner name and type, corresponding zone name, and the limit value.
It will be emitted on loading a zone file, inbound zone transfer
(both AXFR and IXFR), handling a DDNS update, or updating a cache DB.
It's especially helpful in the case of zone transfer, since the
secondary side doesn't have direct access to the offending zone data.
It could also be used for max-types-per-name, but this change
doesn't implement it yet as it's much less likely to happen
in practice.
The 'dns' variable in dohpath can be in various forms ({?dns},
{dns}, {&dns} etc.). To check for a valid dohpath it ends up
being simpler to just parse the URI template rather than looking
for all the various forms if substring.
Add query counters for DoT, DoH, unencrypted DoH and their proxied
counterparts. The protocols don't increment TCP/UDP counters anymore
since they aren't the same as plain DNS-over-53.
The usage of port and tls arguments in *-source and *-source-v6 named
configuration options has been previously removed. Remove
configuration check deprecating usage of those arguments.
Reintroduce logic to apply diffs when the number of pending tuples is
above 128. The previous strategy of accumulating all the tuples and
pushing them at the end leads to excessive memory consumption during
transfer.
This effectively reverts half of e3892805d6
When canceling the ADB find, the lock on the find gets released for
a brief period of time to be locked again inside adbname lock. During
the brief period that the ADB find is unlocked, it can get canceled by
other means removing it from the adbname list which in turn causes
assertion failure due to a double removal from the adbname list.
Recheck if the find->adbname is still valid after acquiring the lock
again and if not just skip the double removal. Additionally, attach to
the adbname as in the worst case, the adbname might also cease to exist
if the scheduler would block this particular thread for a longer period
of time invalidating the lock we are going to acquire and release.
QPDB is now a default implementation for both cache and zone. Remove
the venerable RBTDB database implementation, so we can fast-track the
changes to the database without having to implement the design changes
to both QPDB and RBTDB and this allows us to be more aggressive when
refactoring the database design.
There is a data race between the statistics channel, which uses
`dns_zone_getxfr()` to get a reference to `zone->xfr`, and the creation
of `zone->xfr`, because the latter happens outside of a zone lock.
Split the `dns_xfrin_create()` function into two parts to separate the
zone tranfer startring part from the zone transfer object creation part.
This allows us to attach the new object to a local variable first, then
attach it to `zone->xfr` under a lock, and only then start the transfer.
Originally, the dns_dbversion_t was typedef'ed to void type. This
allowed some flexibility, but using (void *) just removes any
type-checking that C might have. Instead of using:
typedef void dns_dbversion_t;
use a trick to define the type to non-existing structure:
typedef struct dns_dbversion dns_dbversion_t;
This allows the C compilers to employ the type-checking while the
structure itself doesn't have to be ever defined because the actual
'storage' is never accessed using dns_dbversion_t type.
Originally, the dns_dbnode_t was typedef'ed to void type. This allowed
some flexibility, but using (void *) just removes any type-checking that
C might have. Instead of using:
typedef void dns_dbnode_t;
use a trick to define the type to non-existing structure:
typedef struct dns_dbnode dns_dbnode_t;
This allows the C compilers to employ the type-checking while the
structure itself doesn't have to be ever defined because the actual
'storage' is never accessed using dns_dbnode_t type.
The query-source option has the slight quirk of allowing the address to
be specified in two ways, either as every other source option, or as an
"address" key-value pair.
For this reason, it had a separate parsing function from other X-source
options, but it is possible to extend the parsing of other X-sources to
be generic and also handle query-source.
This commit just does that.
The isc/crypto.h now directly includes the OpenSSL headers (evp.h) and
any application that includes that header also needs to have
OPENSSL_CFLAGS in the Makefile.am. Adjust the required automake files
as needed.
Add an option to dnssec-ksr keygen, -o, to create KSKs instead of ZSKs.
This way, we can create a set of KSKS for a given period too.
For KSKs we also need to set timing metadata, including "SyncPublish"
and "SyncDelete". This functionality already exists in keymgr.c so
let's make the function accessible.
Replace dnssec-keygen calls with dnssec-ksr keygen for KSK in the
ksr system test and check keys for created KSKs as well. This requires
a slight modification of the check_keys function to take into account
KSK timings and metadata.
some EDNS option names, including DAU, DHU, N3U, and CHAIN,
were not printed in dns_message_pseudosectiontotext() or
_psuedosectiontoyaml(); they were displayed as unknown options.
this has been corrected.
that code was also refactored to use switch instead of if/else,
and to look up the option code names in a table to prevent
inconsistencies between the two formats. one such inconsistency
was corrected: the "TCP-KEEPALIVE" option is now always printed
with a hyphen, instead of being "TCP KEEPALIVE" when not using
YAML. the keepalive system test has been updated to expect this.
EDNS options that print DNS names (i.e., CHAIN and Report-Channel)
now enclose them in quotation marks to ensure YAML correctness.
the auth system test has been updated to expect this when grepping
for Report-Channel options.
Dispatch needs to know the transport that is being used over the
TCP connection to correctly allow for it to be reused. Add a
transport parameter to dns_dispatch_createtcp and dns_dispatch_gettcp
and use it when selecting a TCP socket for reuse.
RFC 9567 section 8.1 specifies that the agent domain cannot
be a subdomain of the domain it is reporting on. therefore,
in addition to making it illegal to configure that at the
zone level, we also need to disable send-report-channel for
any zone for which the global send-report-channel value is
a subdomain.
we also now warn if send-report-channel is configured
globally to a zone that we host, but that zone doesn't
have log-report-channel set.
the logging of error-report queries is no longer activated by
the view's "send-report-channel" option; that now only configures
the agent-domain value that is to be sent in authoritative
responses. the warning that was logged when "send-agent-domain"
was set to a value that is not a locally configured zone has
been removed.
error-report logging is now activated by the presence of an
authoritative zone with the "log-report-channel" option set to
"yes". this is not permitted in the root zone.
NOTE: a zone with "log-report-channel yes;" should contain a
"*._er" wildcard, but that requirement is not yet enforced.
add a boolean "log-report-channel" option for primary and
secondary zones, which sets the DNS_ZONEOPT_LOGREPORTS zone
flag. this option is not yet functional.
If send-report-channel is set at the zone level, it will
be stored in the zone object and used instead of the
view-level agent-domain when constructing the EDNS
Report-Channel option.
This commit adds support for the EDNS Report-Channel option,
which is returned in authoritative responses when EDNS is in use.
"send-report-channel" sets the Agent-Domain value that will be
included in EDNS Report-Channel options. This is configurable at
the options/view level; the value is a DNS name. Setting the
Agent-Domain to the root zone (".") disables the option.
When this value has been set, incoming queries matchng the form
_er.<qtype>.<qname>.<extended-error-code>._er.<agent-domain>/TXT
will be logged to the dns-reporting-agent channel at INFO level.
(Note: error reporting queries will only be accepted if sent via
TCP or with a good server cookie. If neither is present, named
returns BADCOOKIE to complete the DNS COOKIE handshake, or TC=1
to switch the client to TCP.)
These are logged to the update category at debug level 99 and
have the following form.
update-policy: using: signer=ddns-key.example.nil, name=updated.example.nil, addr=10.53.0.1, tcp=0, type=A, target=
update-policy: trying: grant zonesub-key.example.nil zonesub TXT
update-policy: next rule: signer does not match identity
update-policy: trying: grant ddns-key.example.nil zonesub ANY
update-policy: matched: grant ddns-key.example.nil zonesub ANY
or
update-policy: using: signer=restricted.example.nil, name=example.nil, addr=10.53.0.1, tcp=0, type=TXT, target=
update-policy: trying: grant zonesub-key.example.nil zonesub TXT
update-policy: next rule: signer does not match identity
update-policy: trying: grant ddns-key.example.nil zonesub ANY
update-policy: next rule: signer does not match identity
update-policy: trying: grant restricted.example.nil zonesub ANY
update-policy: next rule: name/subdomain mismatch
update-policy: no match found
where 'using:' is the calling parameters of dns_ssutable_checkrules,
'trying:' in the rule bing evaluated, "next rule:" is the reason
the rule does not match, "matched:" repeats the matched rule, and
no match found is reported when te set of rules is exhausted.
When DSCP was removed the parsing of hostnames was accidentally
broken resulting in an assertion failure. Call cfg_parse_tuple
rather than using custom code in parse_sockaddrnameport.
Unify libcrypto initialization and explicit digest fetching in a single
place and move relevant code to the isc__crypto namespace instead of
isc__tls.
It will remove the remaining implicit fetching and deduplicate explicit
fetching inside the codebase.
The <openssl/hmac.h> header was unused and including the
header might cause build failure when OpenSSL doesn't have
Engines support enabled.
See https://fedoraproject.org/wiki/Changes/OpensslDeprecateEngine
Removes unused hmac includes after Remove OpenSSL Engine support
(commit ef7aba7072) removed engine
support.
maxlabels is the suffix length that corresponds to the latest
NXDOMAIN response. minlabels is the suffix length that corresponds
to longest found existing name.
Prior to running the keymgr, first make sure that existing keys
are present in the new keylist. If not, treat this as an operational
error where the keys are made offline (temporarily), possibly unwanted.
Rename check_recursionquota() to acquire_recursionquota(), and
implement a new function called release_recursionquota() to
reverse the action. It helps with decreasing code duplication.
In two places, after linking the client to the manager's
"recursing-clients" list using the check_recursionquota()
function, the query.c module fails to unlink it on error
paths. Fix the bugs by unlinking the client from the list.
Also make sure that unlinking happens before detaching the
client's handle, as it is the logically correct order, e.g.
in case if it's the last handle and ns__client_reset_cb()
can be called because of the detachment.
The dns_zone_getxfrintime() function fails to lock the zone before
accessing its 'xfrintime' structure member, which can cause a data
race between soa_query() and the statistics channel. Add the missing
locking/unlocking pair, like it's done in numerous other similar
functions.
The 'nodetach' member is a leftover from the times when non-zero
'stale-answer-client-timeout' values were supported, and currently
is always 'false'. Clean up the member and its usage.
Currently, the outgoing UDP sockets have enabled
SO_REUSEADDR (SO_REUSEPORT on BSDs) which allows multiple UDP sockets to
bind to the same address+port. There's one caveat though - only a
single (the last one) socket is going to receive all the incoming
traffic. This in turn could lead to incoming DNS message matching to
invalid dns_dispatch and getting dropped.
Disable setting the SO_REUSEADDR on the outgoing UDP sockets. This
needs to be done explicitly because `uv_udp_open()` silently enables the
option on the socket.
When matching the TCP dispatch responses, we should skip the responses
that do not belong to our TCP connection. This can happen with faulty
upstream server that sends invalid QID back to us.
The dns_dispatch_add() function registers the 'resp' entry in
'disp->mgr->qids' hash table with 'resp->port' being 0, but in
tcp_recv_success(), when looking up an entry in the hash table
after a successfully received data the port is used, so if the
local port was set (i.e. it was not 0) it fails to find the
entry and results in an unexpected error.
Set the 'resp->port' to the given local port value extracted from
'disp->local'.
This commit adds support for timestamps in iso8601 format with timezone
when logging. This is exposed through the iso8601-tzinfo printtime
suboption.
It also makes the new logging format the default for -g output,
hopefully removing the need for custom timestamp parsing in scripts.
This commit nulls all type fields for the clausedef lists that are
declared ancient, and removes the corresponding cfg_type_t and parsing
functions when they are found to be unused after the change.
Static-stub address and addresses from other sources where being
mixed together resulting in static-stub queries going to addresses
not specified in the configuration or alternatively static-stub
addresses being used instead of the real addresses.
As the relaxed memory ordering doesn't ensure any memory
synchronization, it is possible that the increment will succeed even
in the case when it should not - there is a race between
atomic_fetch_sub(..., acq_rel) and atomic_fetch_add(..., relaxed).
Only the result is consistent, but the previous value for both calls
could be same when both calls are executed at the same time.
An exit path in the dns_dispatch_add() function fails to get out of
the RCU critical section when returning early. Add the missing
rcu_read_unlock() call.
This change uses uv_get_total_memory() to get the memory available to
BIND 9 with possible modification by uv_get_constrained_memory() if the
libuv version is recent enough to honour constraints created by
f.e. cgroups.
The OpenBSD doesn't have sysctlbyname(), but sysctl() can be used to
read the number of online/available CPUs by reading following MIB(s):
[CTL_HW, HW_NCPUONLINE] with fallback to [CTL_HW, HW_NCPU].
Cleanup various checks and cleanups that are available on the all
platforms like sysctlbyname() and various related <sys/*.h> headers
that are either defined in POSIX or available on Linux and all BSDs.
Instead of cooking up our own code for getting the number of available
CPUs for named to use, make use of uv_available_parallelism() from
libuv >= 1.44.0.
The dns_dispatch_add() call in the dns_xfrin unit had hardcoded 30
second limit. This meant that any incoming transfer would be stopped in
it didn't finish within 30 seconds limit. Additionally, dns_xfrin
callback was ignoring the return value from dns_dispatch_getnext() when
restarting the reading from the TCP stream; this could cause transfers
to get stuck waiting for a callback that would never come due to the
dns_dispatch having already been shut down.
Call the dns_dispatch_add() without a timeout and properly handle the
result code from the dns_dispatch_getnext().
The dns_dispatch_add() has timeout parameter that could not be 0 (for
not timeout). Modify the dns_dispatch implementation to accept a zero
timeout for cases where the timeouts are undesirable because they are
managed externally.
Query and response log shares the same flags. Move flags logging out of
log_query to share it with log_response. Use buffer instead of snprintf
to fill flags a bit faster.
Signed-off-by: Petr Menšík <pemensik@redhat.com>
Remove answer flag from log, log instead count of records for each
message section. Include EDNS version and few flags of response. Add
also status of result.
Still does not include body of responses rrset.
when the QP cache was adapted from the RBT database, some names
weren't changed. this could be confusing, so let's change them now.
also, we no longer need to include rbt.h.
DNSRPS was the API for a commercial implementation of Response-Policy
Zones that was supposedly better. However, it was never open-sourced
and has only ever been available from a single vendor. This goes against
the principle that the open-source edition of BIND 9 should contain only
features that are generally available and universal.
This commit removes the DNSRPS implementation from BIND 9. It may be
reinstated in the subscription edition if there's enough interest from
customers, but it would have to be rewritten as a plugin (hook) instead
of hard-wiring it again in so many places.
If the operating system UDP queue gets full and the outgoing UDP sending
starts to be delayed, BIND 9 could exhibit memory spikes as it tries to
enqueue all the outgoing UDP messages. As those are not going to be
delivered anyway (as we argued when we stopped enlarging the operating
system send and receive buffers), try to send the UDP messages directly
using `uv_udp_try_send()` and if that fails, drop the outgoing UDP
message.
We currently set SO_INCOMING_CPU incorrectly, and testing by Ondrej
shows that fixing the issue and setting affinities is worse than letting
the kernel schedule threads without constraints. So we should not set
SO_INCOMING_CPU anymore.
The 'all_spilled' local variable in resolver.c:fctx_getaddresses()
is 'true' by default, and only becomes false when there is at least
one successfully found NS address. However, when a 'forward only;'
configuration is used, the code jumps over the part where it looks
for NS addresses and doesn't reset the 'all_spilled' to false, which
results in incorretly increased 'serverquota' statistics variable,
and also in invalid return error code from the function. The result
code error didn't make any differences, because all codes other than
'ISC_R_SUCCESS' or 'DNS_R_WAIT' were treated in the same way, and
the result code was never logged anywhere.
Set the default value of 'all_spilled' to 'false', and only make it
'true' before actually starting to look up NS addresses.
Currently, the isc_work API is overloaded. It runs both the
CPU-intensive operations like DNSSEC validations and long-term tasks
like RPZ processing, CATZ processing, zone file loading/dumping and few
others.
Under specific circumstances, when many large zones are being loaded, or
RPZ zones processed, this stops the CPU-intensive tasks and the DNSSEC
validation is practically stopped until the long-running tasks are
finished.
As this is undesireable, this commit moves the CPU-intensive operations
from the isc_work API to the isc_helper API that only runs fast memory
cleanups now.
Add an extra thread that can be used to offload operations that would
affect latency, but are not long-running tasks; those are handled by
isc_work API.
Each isc_loop now has matching isc_helper thread that also built on top
of uv_loop. In fact, it matches most of the isc_loop functionality, but
only the `isc_helper_run()` asynchronous call is exposed.
When verifying a message in an offloaded thread there is a race with
the worker thread which writes to the same buffer. Clone the message
buffer before offloading.
Remove the use of "port" when configuring query-source(-v6),
transfer-source(-v6), notify-source(-v6), parental-source(-v6),
etc. Remove the use of source ports for parental-agents.
Also remove the deprecated options use-{v4,v6}-udp-ports and
avoid-{v4,v6}udp-ports.