Commit graph

1057 commits

Author SHA1 Message Date
alessio
2f27d66450 Refactor to use list-like macro for message sections
In the code base it is very common to iterate over all names in a message
section and all rdatasets for each name, but various idioms are used for
iteration.

This commit standardizes them as much as possible to a single idiom,
using the macro MSG_SECTION_FOREACH, similar to the existing
ISC_LIST_FOREACH.
2025-03-27 03:09:46 +01:00
Evan Hunt
36cf1c6a5b when forwarding, try with CD=0 first
when sending a query to a forwarder for a name within a secure domain,
the first query is now sent with CD=0. when the forwarder itself
is validating, this will give it a chance to detect bogus data and
replace it with valid data before answering. this reduces our chances
of being stuck with data that can't be validated.

if the forwarder returns SERVFAIL to the initial query, the query
will be repeated with CD=1, to allow for the possibility that the
forwarder's validator is faulty or that the bogus answer is covered
by an NTA.

note: previously, CD=1 was only sent when the query name was in a
secure domain. today, validating servers have a trust anchor at the
root by default, so virtually all queries are in a secure domain.
therefore, the code has been simplified.  as long as validation is
enabled, any forward query that receives a SERVFAIL response will be
retried with CD=1.
2025-03-24 17:33:11 -07:00
Mark Andrews
a4f5c1d5f3 Add option request-zoneversion
This can be set at the option, view and server levels and causes
named to add an EDNS ZONEVERSION option to requests.  Replies are
logged to the 'zoneversion' category.
2025-03-24 22:16:09 +00:00
Mark Andrews
bfbaacc9a0 Use reference counted counters for nfail and nvalidations
The fetch context that held these values could be freed while there
were still active pointers to the memory.  Using a reference counted
pointer avoids this.
2025-03-20 09:12:49 +11:00
Aram Sargsyan
830e548111 Fix the resolvers RTT-ranged responses statistics counters
When a response times out the fctx_cancelquery() function
incorrectly calculates it in the 'dns_resstatscounter_queryrtt5'
counter (i.e. >=1600 ms). To avoid this, the rctx_timedout()
function should make sure that 'rctx->finish' is NULL. And in order
to adjust the RTT values for the timed out server, 'rctx->no_response'
should be true. Update the rctx_timedout() function to make those
changes.
2025-03-18 16:20:59 +00:00
Aram Sargsyan
12e7dfa397 Fix resolver responses statistics counter
The resquery_response() function increases the response counter without
checking if the response was successful. Increase the counter only when
the result indicates success.
2025-03-18 16:20:59 +00:00
Ondřej Surý
0d9f58b745 Remove a kludge to process non-authoritative CNAME response
A BIND 8 server could return a non-authoritative answer when a CNAME is
followed.  This is no longer handled as a valid answer.
2025-03-17 23:23:24 +00:00
Ondřej Surý
05d6542e6d Remove the kludges for records in the bad sections
There were kludges to help process responses from authoritative servers
giving RRs in wrong sections (mentioning BIND 8).  These should just go
away and such responses should not be processed.
2025-03-17 23:23:24 +00:00
Evan Hunt
606d30796e use new dns_rdatatype classification functions
modify code to use dns_rdatatype_ismulti(), dns_rdatatype_issig(),
dns_rdatatype_isaddr(), and dns_rdatatype_isalias() where applicable.
2025-03-15 00:27:54 +00:00
Mark Andrews
de519cd1c9 Don't leak the original QTYPE to parent zone
When performing QNAME minimization, named now sends an NS
query for the original QNAME, to prevent the parent zone from
receiving the QTYPE.

For example, when looking up example.com/A, we now send NS queries
for both com and example.com before sending the A query to the
servers for example.com.  Previously, an A query for example.com
would have been sent to the servers for com.

Several system tests needed to be adjusted for the new query pattern:

- Some queries in the serve-stale test were sent to the wrong server.
- The synthfromdnssec test could fail due to timing issues; this
  has been addressed by adding a 1-second delay.
- The cookie test could fail due to the a change in the count of
  TSIG records received in the "check that missing COOKIE with a
  valid TSIG signed response does not trigger TCP fallback" test case.
- The GL #4652 regression test case in the chain system test depends
  on a particular query order, which no longer occurs when QNAME
  minimization is active. We now disable qname-minimization
  for that test.
2025-03-14 01:01:26 +00:00
Mark Andrews
496f7963cd Fix handling of ISC_R_TIMEOUT in resume_qmin()
If a timeout occurs when sending a QMIN query, QNAME
minimization should be disabled. This now causes a hard
failure in strict mode, or a fallback to non-minimized queries
in relaxed mode.
2025-03-14 01:01:26 +00:00
Mark Andrews
98fc14dc75 Exempt QNAME minimization queries from fetches-per-zone
The calling fetch has already called fcount_incr() for this zone;
calling it again for a QMIN query results in double counting.

When resuming after a QMIN query is answered, however, we do now
ensure before continuing that the fetches-per-zone limit has not
been exceeded.
2025-03-14 01:01:26 +00:00
Evan Hunt
9ebeb60174 fix the fetchresponse result for CNAME/DNAME
the fix in commit 1edbbc32b4 was incomplete; the wrong
event result could also be set in cache_name() and validated().
2025-02-27 19:00:27 +00:00
Evan Hunt
1edbbc32b4 set eresult based on the type in ncache_adderesult()
when the caching of a negative record failed because of the
presence of a positive one, ncache_adderesult() could override
this to ISC_R_SUCCESS. this could cause CNAME and DNAME responses
to be handled incorrectly.  ncache_adderesult() now sets the result
code correctly in such cases.
2025-02-25 21:29:19 -08:00
Evan Hunt
6c2af2ae3b remove 'target' from dns_adb
the target name parameter to dns_adb_createfind() was always passed as
NULL, so we can safely remove it.

relatedly, the 'target' field in the dns_adbname structure was never
referenced after being set.  the 'expire_target' field was used, but
only as a way to check whether an ADB name represents a CNAME or DNAME,
and that information can be stored as a single flag.
2025-02-26 00:43:21 +00:00
Mark Andrews
f98a8331aa Fix dual-stack-servers
Named was stopping nameserver address resolution attempts too soon
when dual stack servers are configured.  Dual stack servers are
used when there are *not* addresses for the server in a particular
address family so find->status == DNS_ADB_NOMOREADDRESSES is not a
sufficient stopping condition when dual stack servers are available.
Call fctx_try to see if the alternate servers can be used.
2025-02-25 23:47:46 +00:00
Evan Hunt
a6986f6837 remove 'target' parameter from dns_name_concatenate()
the target buffer passed to dns_name_concatenate() was never
used (except for one place in dig, where it wasn't actually
needed, and has already been removed in a prior commit).
we can safely remove the parameter.
2025-02-25 12:53:25 -08:00
Evan Hunt
2edefbad4a remove the 'name_coff' parameter in dns_name_towire()
this parameter was added as a (minor) optimization for
cases where dns_name_towire() is run repeatedly with the
same compression context, as when rendering all of the rdatas
in an rdataset. it is currently only used in one place.

we now simplify the interface by removing the extra parameter.
the compression offset value is now part of the compression
context, and can be activated when needed by calling
dns_compress_setmultiuse(). multiuse mode is automatically
deactivated by any subsequent call to dns_compress_permitted().
2025-02-25 12:53:25 -08:00
Ondřej Surý
1e4fb53c61
Destroy the hashmap iterator inside the rwlock
Previously, the hashmap iterator for fetches-per-zone was destroy
outside the rwlock.  This could lead to an assertion failure due to a
timing race with the internal rehashing of the hashmap table as the
rehashing process requires no iterators to be running when rehashing the
hashmap table.  This has been fixed by moving the destruction of the
iterator inside the read locked section.
2025-02-25 13:36:37 +01:00
Ondřej Surý
1c22ab2ef7
Simplify name initializers
We no longer need to pass labels to DNS_NAME_INITABSOLUTE
and DNS_NAME_INITNONABSOLUTE.
2025-02-25 12:17:34 +01:00
Ondřej Surý
04c2c2cbc8
Simplify dns_name_init()
Remove the now-unused offsets parameter from dns_name_init().
2025-02-25 12:17:34 +01:00
Ondřej Surý
08e966df82
Remove offsets from the dns_name and dns_fixedname structures
The offsets were meant to speed-up the repeated dns_name operations, but
it was experimentally proven that there's actually no real-world
benefit.  Remove the offsets and labels fields from the dns_name and the
static offsets fields to save 128 bytes from the fixedname in favor of
calculating labels and offsets only when needed.
2025-02-25 12:17:34 +01:00
Evan Hunt
d0fd9cbe3b Fix a logic error in cache_name()
A change in 6aba56ae8 (checking whether a rejected RRset was identical
to the data it would have replaced, so that we could still cache a
signature) inadvertently introduced cases where processing of a
response would continue when previously it would have been skipped.
2025-02-24 15:04:14 -08:00
Ondřej Surý
c6b0368b21
Dump the fetches from dns_resolver_dumpfetches()
Previously, the dns_resolver_dumpfetches() would go over the fetch
counters.  Alas, because of the earlier optimization, the fetch counters
would be increased only when fetches-per-zone was not 0, otherwise the
whole counting was skipped for performance reasons.

Instead of using the auxiliary fetch counters hash table, use the real
hash table that stores the fetch contexts to dump the ongoing fetches to
the recursing file.

Additionally print more information about the fetch context like start
and expiry times, number of fetch responses, number of queries and count
of allowed and dropped fetches.
2025-02-21 22:25:43 +01:00
Ondřej Surý
cf078fadeb
Fix the fetch context hash table lock ordering
The order of the fetch context hash table rwlock and the individual
fetch context was reversed when calling the release_fctx() function.
This was causing a problem when iterating the hash table, and thus the
ordering has been corrected in a way that the hash table rwlock is now
always locked on the outside and the fctx lock is the interior lock.
2025-02-21 22:05:43 +01:00
Ondřej Surý
77ec2a6c22 Cleanup the isc_counter unit
The isc_counter_create() doesn't need the return value (it was always
ISC_R_SUCCESS), use the macros to implement the reference counting,
little style cleanup, and expand the unit test.
2025-02-21 09:51:42 +00:00
Evan Hunt
6aba56ae89 Check whether a rejected rrset is different
Add a new dns_rdataset_equals() function to check whether two
rdatasets are equal in DNSSEC terms.

When an rdataset being cached is rejected because its trust
level is lower than the existing rdataset, we now check to see
whether the rejected data was identical to the existing data.
This allows us to cache a potentially useful RRSIG when handling
CD=1 queries, while still rejecting RRSIGs that would definitely
have resulted in a validation failure.
2025-02-19 17:25:20 -08:00
Mark Andrews
ea9d7080cd Validate address lookups from ADB
The address lookups from ADB were not being validated, allowing
spoofed responses to be accepted and used for other lookups.

Validate the answers except when CD=1 is set in the triggering
request.  Separate ADB names looked up with CD=1 from those without
CD=1, to prevent the use of unvalidated answers in the normal lookup
case (CD=0).  Set the TTL on unvalidated (pending) responses to
ADB_CACHE_MINIMUM when adding them to the ADB.
2025-02-03 00:24:34 +00:00
Ondřej Surý
2f8e0edf3b Split and simplify the use of EDE list implementation
Instead of mixing the dns_resolver and dns_validator units directly with
the EDE code, split-out the dns_ede functionality into own separate
compilation unit and hide the implementation details behind abstraction.

Additionally, the EDE codes are directly copied into the ns_client
buffers by passing the EDE context to dns_resolver_createfetch().

This makes the dns_ede implementation simpler to use, although sligtly
more complicated on the inside.

Co-authored-by: Colin Vidal <colin@isc.org>
Co-authored-by: Ondřej Surý <ondrej@isc.org>
2025-01-30 11:52:53 +01:00
Andoni Duarte Pintado
3a64b288c1 Merge tag 'v9.21.4' 2025-01-29 17:17:18 +01:00
Colin Vidal
78274ec2b1 fix EDE 22 time out detection
Extended DNS error 22 (No reachable authority) was previously detected
when `fctx_expired` fired. It turns out this function is used as a
"safety net" and the timeout detection should be caught earlier.

It was working though, because of another issue fixed by !9927. Since
this change, the recursive request timed out detection occurs before
`fctx_expired` so EDE 22 is not added to the response message anymore.

The fix of the problem is to add the EDE 22 code in two situations:

- When the dispatch code timed out (rctx_timedout) the resolver code
  checks various properties to figure out if it needs to make another
  fetch attempt. One of the paramters if the fetch expiration time. If
  it expires, the whole recursion is canceled, so it now adds the EDE 22
  code.

- If the fetch expiration time doesn't expires in the case above (and
  other parameters allows it) a new fetch attempt is made (fctx_query).
  But before the new request is actually made, the fetch expiration time
  is re-checked. It might then has elapsed, and the whole recursion is
  canceled. So it now also adds the EDE 22 code here as well.
2025-01-27 11:49:44 +01:00
Colin Vidal
46a58acdf5 add support for EDE code 1 and 2
Add support for EDE codes 1 (Unsupported DNSKEY Algorithm) and 2
(Unsupported DS Digest Type) which might occurs during DNSSEC
validation in case of unsupported DNSKEY algorithm or DS digest type.

Because DNSSEC internally kicks off various fetches, we need to copy
all encountered extended errors from fetch responses to the fetch
context. Upon an event, the errors from the fetch context are copied
to the client response.
2025-01-24 12:26:30 +00:00
Aram Sargsyan
a6d6c3cb45 Clean up fctx->next_timeout
Since the support for non-zero values of stale-answer-client-timeout
was removed in bd7463914f, 'next_timeout'
is unused. Clean it up.
2025-01-22 13:40:45 +00:00
Aram Sargsyan
87c453850c Fix rtt calculation bug for TCP in the resolver
When TCP is used, 'fctx_query()' adds one second to the rtt
(round-trip time) value, but there's a bug when the decision
about using TCP is made already after the calculation. Move the
block of the code which looks up the peers list to decide
whether to use TCP into a place that's before the rtt calculation
is performed. This commit doesn't add or remove any code, it just
moves the code and adds a comment block.
2025-01-22 13:40:45 +00:00
Ondřej Surý
9f945c8b67
Shutdown the fetch context after canceling the last fetch
Currently, the fetch context will continue running even when the last
fetch (response) has been removed from the context, so named can process
and cache the answer.  This can lead to a situation where the number of
outgoing recursing clients exceeds the the configured number for
recursive-clients.

Be more stringent about the recursive-clients limit and shutdown the
fetch context immediately after the last fetch has been canceled from
that particular fetch context.
2025-01-22 14:19:20 +01:00
Aram Sargsyan
64ffbe82c0 Separate the connect and the read timeouts in dispatch
The network manager layer has two different timers with their
own timeout values for TCP connections: connect timeout and read
timeout. Separate the connect and the read TCP timeouts in the
dispatch module too.
2025-01-22 11:57:52 +00:00
Colin Vidal
c9529c0acb remove ISC_LINK(link) property from fetchctx
Likely because of historical reasons, struct fetchctx does have a list
link property but is never used as a list. Remove this link property.
2025-01-22 09:56:09 +00:00
Colin Vidal
93e6e72eb6 remove validator link form fetchctx
struct fetchctx does have a list of pending validators as well as a
pointer to the HEAD validator. Remove the validator pointer to avoid
confusion, as there is no perticular reasons to have it directly
accessible outside of the list.
2025-01-22 09:56:09 +00:00
Ondřej Surý
a1982cf1bb Limit the additional processing for large RDATA sets
Limit the number of records appended to ADDITIONAL section to the names
that have less than 14 records in the RDATA.  This limits the number
of the lookups into the database(s) during single client query.

Also don't append any additional data to ANY queries.  The answer to ANY
is already big enough.
2025-01-14 09:57:54 +00:00
Michał Kępień
7bdf5152d6
Adjust dns_message_logpacketfrom() log prefixes
Ensure the log prefixes passed to the dns_message_logpacketfrom()
function by its callers do not include the word "from" as the latter is
now emitted by the logfmtpacket() helper function.
2024-12-31 05:40:48 +01:00
Michał Kępień
58d38352ee
Adjust dns_message_logpacketfromto() log prefixes
Ensure the log prefixes passed to the dns_message_logpacketfromto()
function by its callers do not include the words "from" or "to" as those
are now emitted by the logfmtpacket() helper function.
2024-12-31 05:40:48 +01:00
Michał Kępień
c5555a5ca2
Log both "from" and "to" socket in debug messages
Move dns_dispentry_getlocaladdress() calls around so that they are not
only invoked when dnstap support is compiled in.  This function calls
isc_nmhandle_localaddr(), which may issue a system call, but only if the
ISC_SOCKET_DETAILS preprocessor macro is set at compile time.

Pass the value extracted by dns_dispentry_getlocaladdress() to
dns_message_logpacketfromto() so that it gets logged, adding useful
information to the relevant debug messages.
2024-12-31 05:40:48 +01:00
Michał Kępień
4ab35f6839
Rename dns_message_logpacket()
Since dns_message_logpacket() only takes a single socket address as a
parameter (and it is always the sending socket's address), rename it to
dns_message_logpacketfrom() so that its name better conveys its purpose
and so that the difference in purpose between this function and
dns_message_logpacketfromto() becomes more apparent.
2024-12-31 05:40:48 +01:00
Michał Kępień
fa073a0a63
Rename dns_message_logfmtpacket()
Since dns_message_logfmtpacket() needs to be provided with both "from"
and "to" socket addresses, rename it to dns_message_logpacketfromto() so
that its name better conveys its purpose.  Clean up the code comments
for that function.
2024-12-31 05:40:48 +01:00
Michał Kępień
bafa5d3c2e
Enable logging both "from" and "to" socket
Change the function prototype for dns_message_logfmtpacket() so that it
takes two isc_sockaddr_t parameters: one for the sending side and
another one for the receiving side.  This enables debug messages to be
more precise.

Also adjust the function prototype for logfmtpacket() accordingly.
Unlike dns_message_logfmtpacket(), this function must not require both
'from' and 'to' parameters to be non-NULL as it is still going to be
used by dns_message_logpacket(), which only provides a single socket
address.  Adjust its log format to handle both of these cases properly.

Adjust both dns_message_logfmtpacket() call sites accordingly, without
actually providing the second socket address yet.  (This causes the
revised REQUIRE() assertion in dns_message_logfmtpacket() to fail; the
issue will be addressed in a separate commit.)
2024-12-31 05:40:48 +01:00
Michał Kępień
05d69bd7a4
dns_message_logfmtpacket(): drop 'style' parameter
Both existing callers of the dns_message_logfmtpacket() function set the
argument passed as 'style' to &dns_master_style_comment.  To simplify
these call sites, drop the 'style' parameter from the prototype for
dns_message_logfmtpacket() and use a fixed value of
&dns_master_style_comment in the function's body instead.
2024-12-31 05:40:48 +01:00
Ondřej Surý
dcd1f5b842
Remove dnssec-must-be-secure feature
The dnssec-must-be-secure feature was added in the early days of BIND 9
and DNSSEC and it makes sense only as a debugging feature.  There are no
reasons to keep this feature in the production code anymore.

Remove the feature to simplify the code.
2024-12-09 13:10:21 +01:00
Matthijs Mekking
397ca34e34 Remove unused maxquerycount
While implementing the global limit 'max-query-count', initially I
thought adding the variable to the resolver structure. But the limit
is per client request so it was moved to the view structure (and
counter in ns_query structure). However, I forgot to remove the
variable from the resolver structure again. This commit fixes that.
2024-12-06 11:19:18 +01:00
Matthijs Mekking
16b3bd1cc7 Implement global limit for outgoing queries
This global limit is not reset on query restarts and is a hard limit
for any client request.
2024-12-05 14:17:07 +01:00
Matthijs Mekking
bbc16cc8e6 Implement 'max-query-count'
Add another option to configure how many outgoing queries per
client request is allowed. The existing 'max-recursion-queries' is
per restart, this one is a global limit.
2024-12-05 14:01:57 +01:00