Commit graph

14359 commits

Author SHA1 Message Date
Mark Andrews
96754276a7 Report non-effective primaries
When named is started with -4 or -6 and the primaries for a zone
do not have an IPv4 or IPv6 address respectively, issue a log message.

(cherry picked from commit 2cd4303249)
2024-06-03 13:52:37 +00:00
Mark Andrews
7a9ac0491f Zone transfers should honour -4 and -6 options
Check if the address family has been disabled when transferring
zones.

(cherry picked from commit ecdde04e63)
2024-06-03 13:52:37 +00:00
Mark Andrews
e0af62deac Add helper function isc_sockaddr_disabled
(cherry picked from commit 9be1873ef3)
2024-06-03 13:52:37 +00:00
Matthijs Mekking
e1a49ee6d4 Call reset_shutdown if uv_tcp_close_reset failed
If uv_tcp_close_reset() returns an error code, this means the
reset_shutdown callback has not been issued, so do it now.

(cherry picked from commit c40e5c8653)
2024-06-03 08:16:32 +00:00
Matthijs Mekking
6f6d90fd51 Do not runtime check uv_tcp_close_reset
When we reset a TCP connection by sending an RST packet, do not bother
checking that the result is a success code.

(cherry picked from commit 5b94bb2129)
2024-06-03 08:16:32 +00:00
Aydın Mercan
dc9f55da5b
increase TCP4Clients/TCP6Clients after point of no failure
Failing to accept TCP/TLS connections in 9.18 detaches the quota in
isc__nm_failed_accept_cb, causing the TCP4Clients and TCP6Clients
statistics not to decrease during cleanup.

Fix by increasing the counter after the point of no failure, but before
statistics handling, as the client's socket is no longer valid
afterwards.
2024-05-30 13:39:23 +03:00
Ondřej Surý
7c275be420
Create the new database for AXFR from the dns_zone API
The `axfr_makedb()` didn't set the loop on the newly created database,
effectively killing delayed cleaning on such a database.  Move the
database creation into the dns_zone API, which knows all the gory
details of creating a new database suitable for the zone.

(cherry picked from commit 3310cac2b0)
2024-05-29 08:56:38 +02:00
Mark Andrews
26b6ce9a56 Clang-format header file changes 2024-05-17 16:21:35 -07:00
Aram Sargsyan
0a48252b53 Fix a data race in isc_task_purgeevent()
When isc_task_purgeevent() is called for an 'event', the event, in
the meanwhile, could in theory get processed, unlinked, and freed.
So when the function then operates on the 'event', it causes a
segmentation fault.

The only place where isc_task_purgeevent() is called is from
timer_purge().

In order to resolve the data race, call isc_task_purgeevent() inside
the 'timer->lock' locked block, so that timerevent_destroy() won't
be able to destroy the event if it was processed in the meanwhile,
before isc_task_purgeevent() had a chance to purge it.

In order to be able to do that, move the responsibility of calling
isc_event_free() (upon a successful purge) out from the
isc_task_purgeevent() function to its caller instead, so that it can
be called outside of the timer->lock locked block.
2024-05-17 12:08:27 +00:00
Aram Sargsyan
857f6adaec Test a race condition between isc_timer_purge() and isc_event_free()
Let basic_tick() of 'task1' and 'basic_quick' of 'task4' run in
different threads, and insert an artificial delay in timer_purge()
to cause an existing race condition to appear.
2024-05-17 10:49:57 +00:00
Aram Sargsyan
c7b15f1f5a Expose internal timer_purge() as isc_timer_purge()
This function is used in a unit test to check for data races.
2024-05-17 10:49:57 +00:00
Mark Andrews
f7fb020b6e add test cases for several FORMERR code paths:
- duplicated question
  - duplicated answer
  - qtype as an answer
  - two question types
  - question names
  - nsec3 bad owner name
  - short record
  - short question
  - mismatching question class
  - bad record owner name
  - mismatched class in record
  - mismatched KEY class
  - OPT wrong owner name
  - invalid RRSIG "covers" type
  - UPDATE malformed delete type
  - TSIG wrong class
  - TSIG not the last record

(cherry picked from commit 6e9ed4983e)
2024-05-17 15:34:07 +10:00
Mark Andrews
29292902c0 Properly build the NSEC/NSEC3 type bit map
DNSKEY was incorrectly being added to the NSEC/NSEC3 type bit map
when it was obscured by the delegation.  This led to zone verification
failures.

(cherry picked from commit ec3c624814)
2024-05-16 01:53:39 +00:00
Mark Andrews
32589b2be7 Properly update 'maxtype'
'maxtype' should be checked to see if it should be updated whenever
a type is added to the type map.

(cherry picked from commit e84615629f)
2024-05-16 01:53:39 +00:00
Michał Kępień
df346b0088
Prevent passing NULL to dns_dispatch_resume()
If a query sent using the dns_request API times out when the view it was
associated with gets torn down, the dns_dispatch_resume() call in
req_response() may be issued with the 'resp' argument set to NULL,
triggering an assertion failure.  Consider the following scenario ([A]
and [B] are thread identifiers):

 1. [A] Read timeout for a dispatch query fires.

 2. [A] udp_recv() is called.  It locks the dispatch, determines it
    timed out, prepares for calling the higher-level callback with
    ISC_R_TIMEDOUT, and unlocks the dispatch (lib/dns/dispatch.c:633).

 3. [B] The last reference to a view is released.
    dns_requestmgr_shutdown() is called, canceling all in-flight
    requests for that view.  (Note that udp_recv() in thread [A] already
    unlocked the dispatch, so its state can be modified.)  As a part of
    this process, request_cancel() calls dns_dispatch_done() on
    request->dispentry, setting it to NULL.

 4. [A] udp_recv() calls the higher-level callback (req_response()) with
    ISC_R_TIMEDOUT.

 5. [A] Since the request timed out, req_response() retries sending it.
    In the process, it calls dns_dispatch_resume(), passing
    request->dispentry as the 'resp' argument.

 6. [A] Since 'resp' is NULL, the REQUIRE(VALID_RESPONSE(resp));
    assertion in dns_dispatch_resume() fails.

Fix by checking whether the request has been canceled before calling
dns_dispatch_resume(), similarly to how it is done in req_connected()
and req_senddone().
2024-05-15 21:24:24 +02:00
Mark Andrews
35f1e43273 Use dns_view_findzone instead of dns_zt_find
With weak zone attachments being used for catzs, catzs->view->zonetable
may be NULL, so we need to account for this, which dns_view_findzone
does.  This is already done in main.
2024-05-14 08:46:00 +10:00
Mark Andrews
d12def13f6 catzs->view should maintain a view reference
Use dns_view_weakattach and dns_view_weakdetach to maintain a
reference to the view referenced through catzs->view.

(cherry picked from commit 307e3ed9a6)
2024-05-09 10:22:00 +10:00
Mark Andrews
d1cc8a271d Only check SVCB alias forms at higher levels
Allow the SVCB (HTTPS) alias form with parameters to be accepted from
the wire and when transferred.  This is for possible future extensions.

(cherry picked from commit 799046929c)
2024-05-07 02:08:27 +00:00
Mark Andrews
3f25f3349e Remove infinite loop on ISC_R_NOFILE
When parsing a zonefile, named-checkzone (and others) could loop
infinitely if a directory was $INCLUDED.  Record the error and treat
it as EOF when looking for multiple errors.

This was found by Eric Sesterhenn from X41.

(cherry picked from commit efd27bb82d)
2024-05-07 01:06:14 +00:00
Mark Andrews
58efb2f740 Address infinite loop when processing $GENERATE
In nibble mode, if the value to be converted was negative, the parser
would loop forever.  Process the value as an unsigned int instead of
as an int to prevent sign extension when shifting.

This was found by Eric Sesterhenn from X41.

(cherry picked from commit 371824f078)
2024-05-06 23:59:06 +00:00
Matthijs Mekking
4ef23ad0ff RPZ response's SOA record TTL is incorrectly set to 1
An RPZ response's SOA record TTL is set to 1 instead of the SOA TTL,
because a boolean value is passed on to query_addsoa, which expects a
TTL value.  It is unclear what value is appropriate to be used for
overriding, so we will pass UINT32_MAX.

(cherry picked from commit 5d7e613e81)
2024-05-06 12:18:08 +02:00
Mark Andrews
82ca80c2e9
Move on to the next RRSIG on DNS_R_SIGEXPIRED or DNS_R_SIGFUTURE 2024-04-30 17:47:49 +02:00
Michal Nowak
ea413a6fae Update sources to Clang 18 formatting
(cherry picked from commit f454fa6dea)
2024-04-23 12:48:56 +00:00
Matthijs Mekking
0134b91feb If kasp is not used, use legacy signature jitter
If the zone is signed in a way other than with 'dnssec-policy', use
the legacy way of jittering signatures, that is, calculate jitter by
taking the two values of 'sig-validity-interval' and subtracting the
second value from the first value.
2024-04-18 15:00:07 +00:00
Matthijs Mekking
f211c05990 Add checkconf check for signatures-jitter
Having a value higher than signatures-validity does not make sense
and should be treated as a configuration error.

(cherry picked from commit c3d8932f79)
2024-04-18 15:00:07 +00:00
Matthijs Mekking
2d8ed9d5d2 Implement signature jitter
When calculating the RRSIG validity, jitter is now derived from the
config option rather than from the refresh value.

(cherry picked from commit 67f403a423)
2024-04-18 15:00:07 +00:00
Matthijs Mekking
a1e61f179e Refactor code that calculates signature validity
There are three code blocks that are (almost) similar, refactor it
to one function.

(cherry picked from commit 0438d3655b)
2024-04-18 15:00:07 +00:00
Matthijs Mekking
104eabdc2e Add signatures-jitter option
Add an option to specify signature jitter.

(cherry picked from commit 2a4daaedca)
2024-04-18 15:00:07 +00:00
Michał Kępień
0107701681 Merge tag 'v9.18.25' into bind-9.18 2024-03-20 14:34:32 +01:00
Matthijs Mekking
a621e035d4 Detect invalid durations
Be stricter about which durations are accepted.  Basically we accept
ISO 8601 formats, but failed to detect garbage after the integers in
such strings.

For example, 'P7.5D' would be treated as 7 days.  Pass 'endptr' to
'strtoll' and check that the endptr points at the correct suffix.

(cherry picked from commit e39de45adc)
2024-03-14 11:40:43 +01:00
Mark Andrews
7498db6366 Don't use static stub when returning best NS
If we find a static stub zone in query_addbestns look for a parent
zone which isn't a static stub.

(cherry picked from commit 40816e4e35)
2024-03-14 15:33:25 +11:00
Evan Hunt
bc237c6f4a remove dead code in rbtdb.c
dns_db_addrdataset() enforces a requirement that version can only be
NULL for a cache database, so code that checks for zone semantics and
version == NULL can never be reached.

(cherry picked from commit b3c8b5cfb2)
2024-03-13 18:21:44 -07:00
Mark Andrews
3fadd9efec Only call memmove if the rdata length is non-zero
This avoids undefined behaviour on zero-length rdata, where the
data pointer is NULL.

(cherry picked from commit 228cc557fe)
2024-03-14 11:06:11 +11:00
Matthijs Mekking
1b2e6f494a Fix bug in keymgr Depends function
The Depends relation refers to types of rollovers in which a certain
record type is going to be swapped. Specifically, the Depends relation
says there should be no dependency on the predecessor key (the set
Dep(x, T) must be empty).

But if the key is phased out (all its states are in HIDDEN), there is
no longer a dependency. Since the relationship is still maintained
(Predecessor and Successor metadata), the keymgr_dep function still
returned true. In other words, the set Dep(x, T) is not considered
empty.

This slowed down key rollovers, only retiring keys when the successor
key had been fully propagated.

(cherry picked from commit 0aac81cf80)
2024-03-13 11:51:02 +01:00
Ondřej Surý
e7927a5b18
Move the task creation into cache_create_db()
The dns_cache_flush() drops the old database and creates a new one, but
it forgets to create the task(s) that run the node pruning and clean
the rbtdb the next time it is flushed.  This causes the cleaning to
skip the parent nodes (with .down == NULL), leading to increased
memory usage over time, until the database is unable to keep up and
just stays overmem all the time.

(cherry picked from commit 79040a669c)
2024-03-06 19:09:10 +01:00
Ondřej Surý
ce3b343b0a
Create a second pruning task for rbtdb with unlimited quantum
Previously, rbtdb->task had quantum of 1 because it was originally used
just for freeing RBTDB contents, which can happen on a "best effort"
basis (does not need to be prioritized).  However, when tree pruning was
implemented, it also started sending events to that task, enabling the
latter to become clogged up with a significant event backlog because it
only pruned a single RBTDB node per event.

To prioritize tree pruning (as it is necessary for enforcing the
configured memory use limit for the cache memory context), create a
second task with a virtually unlimited quantum (UINT_MAX) and send the
tree-pruning events to this new task, to ensure that all nodes scheduled
for pruning will be processed before further nodes are queued in a
similar fashion.

This change enables dropping the prunenodes list and restoring the
originally-used logic that allocates and sends a separate event for each
node to prune.

(cherry picked from commit 231b2375e5)
2024-03-06 19:09:10 +01:00
Ondřej Surý
c301081a50
Restore the parent cleaning logic in prune_tree()
Reconstruct the variant of the prune_tree() parent cleaning to consider
all eligible parents in a single loop, as we were doing before all the
changes that led to this commit.

Update code comments so that they more precisely describe what the
relevant bits of code actually do.

(cherry picked from commit 3a01c749f9)
2024-03-06 19:09:10 +01:00
Ondřej Surý
79040a669c
Move the task creation into cache_create_db()
The dns_cache_flush() drops the old database and creates a new one, but
it forgets to create the task(s) that run the node pruning and clean
the rbtdb the next time it is flushed.  This causes the cleaning to
skip the parent nodes (with .down == NULL), leading to increased
memory usage over time, until the database is unable to keep up and
just stays overmem all the time.
2024-03-06 17:11:14 +01:00
Ondřej Surý
231b2375e5
Create a second pruning task for rbtdb with unlimited quantum
Previously, rbtdb->task had quantum of 1 because it was originally used
just for freeing RBTDB contents, which can happen on a "best effort"
basis (does not need to be prioritized).  However, when tree pruning was
implemented, it also started sending events to that task, enabling the
latter to become clogged up with a significant event backlog because it
only pruned a single RBTDB node per event.

To prioritize tree pruning (as it is necessary for enforcing the
configured memory use limit for the cache memory context), create a
second task with a virtually unlimited quantum (UINT_MAX) and send the
tree-pruning events to this new task, to ensure that all nodes scheduled
for pruning will be processed before further nodes are queued in a
similar fashion.

This change enables dropping the prunenodes list and restoring the
originally-used logic that allocates and sends a separate event for each
node to prune.
2024-03-06 17:11:14 +01:00
Ondřej Surý
3a01c749f9
Restore the parent cleaning logic in prune_tree()
Reconstruct the variant of the prune_tree() parent cleaning to consider
all eligible parents in a single loop, as we were doing before all the
changes that led to this commit.

Update code comments so that they more precisely describe what the
relevant bits of code actually do.

(cherry picked from commit 454c75a33a)
2024-03-06 17:11:14 +01:00
Ondřej Surý
3129c80741
Make the TTL-based cleaning more aggressive
It was discovered that the TTL-based cleaning could build up
a significant backlog of rdataset headers during the periods where
the top of the TTL heap isn't expired yet.  Make the TTL-based cleaning
more aggressive by cleaning more headers from the heap when we are
adding a new header to the RBTDB.

(cherry picked from commit d8220ca4ca)
(cherry picked from commit b4d9f1cbab)
2024-02-29 16:13:18 +01:00
Ondřej Surý
1c3dd0d0ff
Remove expired rdataset headers from the heap
It was discovered that an expired header could sit on top of the heap
a little longer than desirable.  Remove expired headers (headers with
rdh_ttl set to 0) from the heap completely, so they don't block the next
TTL-based cleaning.

(cherry picked from commit a9383e4b95)
(cherry picked from commit 756555dbcf)
2024-02-29 16:13:18 +01:00
Ondřej Surý
b4d9f1cbab
Make the TTL-based cleaning more aggressive
It was discovered that the TTL-based cleaning could build up
a significant backlog of rdataset headers during the periods where
the top of the TTL heap isn't expired yet.  Make the TTL-based cleaning
more aggressive by cleaning more headers from the heap when we are
adding a new header to the RBTDB.

(cherry picked from commit d8220ca4ca)
2024-02-29 16:07:41 +01:00
Ondřej Surý
756555dbcf
Remove expired rdataset headers from the heap
It was discovered that an expired header could sit on top of the heap
a little longer than desirable.  Remove expired headers (headers with
rdh_ttl set to 0) from the heap completely, so they don't block the next
TTL-based cleaning.

(cherry picked from commit a9383e4b95)
2024-02-29 16:07:41 +01:00
Ondřej Surý
29e9d4193b
Simplify the parent cleaning in the prune_tree() mechanism
Instead of juggling with node locks in a cycle, clean up the node we
are currently pruning and send any parent that is also subject to
pruning to the prune tree the normal way (i.e. enqueue pruning on the
parent).

This simplifies the code and also spreads the pruning load across more
event loop ticks, which is better for lock contention, as fewer things
run in a tight loop.

(cherry picked from commit 0b32d323e0)
(cherry picked from commit a4c225cb6d)
2024-02-29 12:39:26 +01:00
Ondřej Surý
156a08e327
Reduce lock contention during RBTDB tree pruning
The log message for commit a9af1ac5ae
explained:

    In some older BIND 9 branches, the extra queuing overhead eliminated by
    this change could be remotely exploited to cause excessive memory use.
    Due to architectural shift, this branch is not vulnerable to that issue,
    but applying the fix to the latter is nevertheless deemed prudent for
    consistency and to make the code future-proof.

However, it turned out that having a single queue for the nodes to be
pruned increased lock contention to a level where cleaning up nodes from
the RBTDB took too long, causing the amount of memory used by the cache
to grow indefinitely over time.

This commit reverts the change to the pruning mechanism introduced by
commit a9af1ac5ae as BIND branches newer
than 9.16 were not affected by the excessive event queueing overhead
issue mentioned in the log message for the above commit.

(cherry picked from commit eed17611d8)
(cherry picked from commit 4b32456705)
2024-02-29 12:39:26 +01:00
Ondřej Surý
a4c225cb6d
Simplify the parent cleaning in the prune_tree() mechanism
Instead of juggling with node locks in a cycle, clean up the node we
are currently pruning and send any parent that is also subject to
pruning to the prune tree the normal way (i.e. enqueue pruning on the
parent).

This simplifies the code and also spreads the pruning load across more
event loop ticks, which is better for lock contention, as fewer things
run in a tight loop.

(cherry picked from commit 0b32d323e0)
2024-02-29 12:06:56 +01:00
Ondřej Surý
4b32456705
Reduce lock contention during RBTDB tree pruning
The log message for commit a9af1ac5ae
explained:

    In some older BIND 9 branches, the extra queuing overhead eliminated by
    this change could be remotely exploited to cause excessive memory use.
    Due to architectural shift, this branch is not vulnerable to that issue,
    but applying the fix to the latter is nevertheless deemed prudent for
    consistency and to make the code future-proof.

However, it turned out that having a single queue for the nodes to be
pruned increased lock contention to a level where cleaning up nodes from
the RBTDB took too long, causing the amount of memory used by the cache
to grow indefinitely over time.

This commit reverts the change to the pruning mechanism introduced by
commit a9af1ac5ae as BIND branches newer
than 9.16 were not affected by the excessive event queueing overhead
issue mentioned in the log message for the above commit.

(cherry picked from commit eed17611d8)
2024-02-29 12:06:56 +01:00
Aydın Mercan
abc47f5ce4
Expose the TCP client count in statistics channel
The statistics channel does not expose the current number of TCP clients
connected, only the highwater mark.  Therefore, users did not have an
easy means to collect statistics about TCP clients served over time.
This information could only be measured via a separate mechanism, using
rndc to look at how much of the TCP quota was filled.

In order to expose the exact current count of connected TCP clients
(tracked by the "tcp-clients" quota) as a statistics counter, an
extra, dedicated Network Manager callback would need to be
implemented for that purpose (a counterpart of ns__client_tcpconn()
that would be run when a TCP connection is torn down), which is
inefficient. Instead, track the number of currently-connected TCP
clients separately for IPv4 and IPv6, as Network Manager statistics.

(cherry picked from commit 2690dc48d3)
2024-02-27 11:04:28 +03:00
Mark Andrews
2e224d46d2 Add RESINFO record type
This is a TXT clone using code point 261.

(cherry picked from commit 0651063658)
2024-02-26 13:20:48 +11:00