Commit eba7fb5f9f modified the definition
of struct dns_rbtnode. Doing that changes the layout of map-format zone
files. Bump MAPAPI and update the offsets used in map-format zone file
checks in the "masterformat" system test, as these changes were
inadvertently omitted from the aforementioned change.
(cherry picked from commit 52fe0b6be7)
Commit 540a5b5a2c modified the definition
of struct dns_rbtnode. Doing that changes the layout of map-format zone
files. Bump MAPAPI and update the offsets used in map-format zone file
checks in the "masterformat" system test, as these changes were
inadvertently omitted from the aforementioned change.
dns_cache_flush() drops the old database and creates a new one, but
it forgets to create the task(s) that run node pruning and clean
the rbtdb the next time it is flushed. This causes the cleaning to skip
the parent nodes (with .down == NULL), leading to increased
memory usage over time until the database is unable to keep up and just
stays overmem all the time.
(cherry picked from commit d4bc4e5cc6)
Previously, rbtdb->task had a quantum of 1 because it was originally used
just for freeing RBTDB contents, which can happen on a "best effort"
basis (does not need to be prioritized). However, when tree pruning was
implemented, it also started sending events to that task, enabling the
latter to become clogged up with a significant event backlog because it
only pruned a single RBTDB node per event.
To prioritize tree pruning (as it is necessary for enforcing the
configured memory use limit for the cache memory context), create a
second task with a virtually unlimited quantum (UINT_MAX) and send the
tree-pruning events to this new task, to ensure that all nodes scheduled
for pruning will be processed before further nodes are queued in a
similar fashion.
This change enables dropping the prunenodes list and restoring the
originally-used logic that allocates and sends a separate event for each
node to prune.
(cherry picked from commit 540a5b5a2c)
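The effect of the quantum described above can be illustrated with a minimal
sketch. This is not the isc_task API: toy_task_t and toy_task_run() are
hypothetical names, and a task is reduced to a counter of pending events.
It only shows that a quantum of 1 handles one event per run, while a
quantum of UINT_MAX drains the whole backlog in a single run.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for a task: it only counts its pending events. */
typedef struct {
	unsigned int quantum; /* maximum number of events handled per run */
	size_t pending;       /* events currently waiting in the queue */
} toy_task_t;

/* Run the task once and return the number of events it handled. */
static size_t
toy_task_run(toy_task_t *task) {
	size_t handled = task->pending;

	if (handled > task->quantum) {
		handled = task->quantum;
	}
	task->pending -= handled;
	return handled;
}
```

With five pending events, a task with quantum 1 needs five runs to catch
up, whereas a task with quantum UINT_MAX is done after one.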
Reconstruct the variant of the prune_tree() parent cleaning to consider
all eligible parents in a single loop as we were doing before all the
changes that led to this commit.
Update code comments so that they more precisely describe what the
relevant bits of code actually do.
(cherry picked from commit 12c42a6c07)
dns_cache_flush() drops the old database and creates a new one, but
it forgets to create the task(s) that run node pruning and clean
the rbtdb the next time it is flushed. This causes the cleaning to skip
the parent nodes (with .down == NULL), leading to increased
memory usage over time until the database is unable to keep up and just
stays overmem all the time.
(cherry picked from commit 79040a669c)
Previously, rbtdb->task had a quantum of 1 because it was originally used
just for freeing RBTDB contents, which can happen on a "best effort"
basis (does not need to be prioritized). However, when tree pruning was
implemented, it also started sending events to that task, enabling the
latter to become clogged up with a significant event backlog because it
only pruned a single RBTDB node per event.
To prioritize tree pruning (as it is necessary for enforcing the
configured memory use limit for the cache memory context), create a
second task with a virtually unlimited quantum (UINT_MAX) and send the
tree-pruning events to this new task, to ensure that all nodes scheduled
for pruning will be processed before further nodes are queued in a
similar fashion.
This change enables dropping the prunenodes list and restoring the
originally-used logic that allocates and sends a separate event for each
node to prune.
(cherry picked from commit 231b2375e5)
Reconstruct the variant of the prune_tree() parent cleaning to consider
all eligible parents in a single loop as we were doing before all the
changes that led to this commit.
Update code comments so that they more precisely describe what the
relevant bits of code actually do.
(cherry picked from commit 454c75a33a)
Commit 37101c7c8a checks the prunelink
member of the node that was just pruned, not its parent node that was
intended to be examined. Fix by checking the prunelink member of the
parent node, so that adding the latter to its relevant prunenodes list
twice is properly guarded against.
(cherry picked from commit 7d9be24bb1)
Commit 4b6fc97af6 checks the prunelink
member of the node that was just pruned, not its parent node that was
intended to be examined. Fix by checking the prunelink member of the
parent node, so that adding the latter to its relevant prunenodes list
twice is properly guarded against.
If a node cleaned up by prune_tree() happens to belong to the same node
bucket as its parent, the latter is directly appended to the prunenodes
list currently processed by prune_tree(). However, the relevant code
branch does not account for the fact that the parent might already be on
the list it is trying to append it to. Fix by only calling
ISC_LIST_APPEND() for parent nodes not yet added to their relevant
prunenodes list.
(cherry picked from commit 4b6fc97af6)
Commit 801e888d03 made the prune_tree()
function use send_to_prune_tree() for triggering pruning of deleted leaf
nodes' parents. This enabled the following sequence of events to
happen:
1. Node A, which is a leaf node, is passed to send_to_prune_tree() and
its pruning is queued.
2. Node B is added to the RBTDB as a child of node A before the latter
gets pruned.
3. Node B, which is now a leaf node itself (and is likely to belong to
a different node bucket than node A), is passed to
send_to_prune_tree() and its pruning gets queued.
4. Node B gets pruned. Its parent, node A, now becomes a leaf again
and therefore the prune_tree() call that handled node B calls
send_to_prune_tree() for node A.
5. Since node A was already queued for pruning in step 1 (but not yet
pruned), the INSIST(!ISC_LINK_LINKED(node, prunelink)); assertion
fails for node A in send_to_prune_tree().
The above sequence of events is not a sign of pathological behavior.
Replace the assertion check with a conditional early return from
send_to_prune_tree().
(cherry picked from commit f6289ad931)
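The guarded queueing described above can be sketched as follows. This is a
simplified illustration, not the rbtdb code: toy_node_t, toy_prunelist_t and
toy_send_to_prune() are hypothetical names, and the 'queued' flag stands in
for the ISC_LINK_LINKED(node, prunelink) check. A node that is already
queued is skipped instead of triggering an assertion failure.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct toy_node {
	bool queued;           /* stands in for ISC_LINK_LINKED(node, prunelink) */
	struct toy_node *next; /* next node on the prune list */
} toy_node_t;

typedef struct {
	toy_node_t *head;
	size_t length;
} toy_prunelist_t;

/* Returns true if the node was newly queued, false if it already was. */
static bool
toy_send_to_prune(toy_prunelist_t *list, toy_node_t *node) {
	if (node->queued) {
		/* Already queued for pruning: a normal situation, just return. */
		return false;
	}
	node->queued = true;
	node->next = list->head;
	list->head = node;
	list->length++;
	return true;
}
```

Queueing node A twice, as in steps 1 and 4 of the sequence, leaves exactly
one entry for it on the list.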
Commit 2df147cb12 made the prune_tree()
function use send_to_prune_tree() for triggering pruning of deleted leaf
nodes' parents. This enabled the following sequence of events to
happen:
1. Node A, which is a leaf node, is passed to send_to_prune_tree() and
its pruning is queued.
2. Node B is added to the RBTDB as a child of node A before the latter
gets pruned.
3. Node B, which is now a leaf node itself (and is likely to belong to
a different node bucket than node A), is passed to
send_to_prune_tree() and its pruning gets queued.
4. Node B gets pruned. Its parent, node A, now becomes a leaf again
and therefore the prune_tree() call that handled node B calls
send_to_prune_tree() for node A.
5. Since node A was already queued for pruning in step 1 (but not yet
pruned), the INSIST(!ISC_LINK_LINKED(node, prunelink)); assertion
fails for node A in send_to_prune_tree().
The above sequence of events is not a sign of pathological behavior.
Replace the assertion check with a conditional early return from
send_to_prune_tree().
It was discovered that the TTL-based cleaning could build up
a significant backlog of the rdataset headers during the periods where
the top of the TTL heap isn't expired yet. Make the TTL-based cleaning
more aggressive by cleaning more headers from the heap when we are
adding a new header into the RBTDB.
(cherry picked from commit d8220ca4ca)
(cherry picked from commit 496fe6bc60)
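As a rough illustration of cleaning several expired entries on each
insertion, consider the sketch below. It is built on assumptions: an
ascending array stands in for the TTL heap, TOY_BATCH is a hypothetical
per-insert limit, and none of the names come from the RBTDB code.

```c
#include <stddef.h>

#define TOY_MAX   64
#define TOY_BATCH 4 /* hypothetical limit on entries cleaned per insert */

typedef struct {
	unsigned int expire[TOY_MAX]; /* expiry times, ascending */
	size_t count;
} toy_ttlqueue_t;

/* Drop up to TOY_BATCH leading entries whose expiry time is <= now. */
static size_t
toy_clean(toy_ttlqueue_t *q, unsigned int now) {
	size_t removed = 0;

	while (removed < TOY_BATCH && removed < q->count &&
	       q->expire[removed] <= now)
	{
		removed++;
	}
	for (size_t i = removed; i < q->count; i++) {
		q->expire[i - removed] = q->expire[i];
	}
	q->count -= removed;
	return removed;
}

/* Clean first, then insert in order. Returns the number of entries cleaned. */
static size_t
toy_add(toy_ttlqueue_t *q, unsigned int expire, unsigned int now) {
	size_t removed = toy_clean(q, now);
	size_t pos = q->count;

	if (q->count == TOY_MAX) {
		return removed; /* full: this toy simply drops the new entry */
	}
	while (pos > 0 && q->expire[pos - 1] > expire) {
		q->expire[pos] = q->expire[pos - 1];
		pos--;
	}
	q->expire[pos] = expire;
	q->count++;
	return removed;
}
```

Cleaning a small batch per insert, rather than a single entry, keeps the
backlog from growing when inserts arrive faster than entries expire.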
It was discovered that an expired header could sit on top of the heap
a little longer than desirable. Remove expired headers (headers with
rdh_ttl set to 0) from the heap completely, so they don't block the next
TTL-based cleaning.
(cherry picked from commit a9383e4b95)
(cherry picked from commit abe080d16e)
The log message for commit c3377cbfaa
explained:
Instead of issuing a separate isc_task_send() call for every RBTDB node
that triggers tree pruning, maintain a list of nodes from which tree
pruning can be started from and only issue an isc_task_send() call if
pruning has not yet been triggered by another RBTDB node.
The extra queuing overhead eliminated by this change could be remotely
exploited to cause excessive memory use.
However, it turned out that having a single queue for the nodes to be
pruned increased lock contention to a level where cleaning up nodes from
the RBTDB took too long, causing the amount of memory used by the cache
to grow indefinitely over time.
This commit makes the prunenodes list bucketed, adds a quantum of 10
items per prune_tree() run, and simplifies parent node cleaning in the
prune_tree() logic.
Instead of juggling node locks in a cycle, only clean up the node
currently being pruned and queue its parent (if it is also eligible) for
pruning in the same way (by sending an event).
This simplifies the code and also spreads the pruning load across more
task loop ticks, which is better for lock contention as fewer things run
in a tight loop.
(cherry picked from commit 2df147cb12)
dns__cacherbt_expireheader can unlink / free header_prev underneath
it. Use ISC_LIST_TAIL after calling dns__cacherbt_expireheader
instead to get the next pointer to be processed.
(cherry picked from commit 7ce2e86024)
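The hazard described above, and the fix of re-reading the tail after each
call, can be sketched with a toy doubly linked list. All names here are
hypothetical; toy_expire() merely imitates an expiry routine that may also
unlink the predecessor of the entry it is given.

```c
#include <stddef.h>

typedef struct toy_entry {
	struct toy_entry *prev, *next;
	int paired; /* nonzero: expiring this entry also unlinks its predecessor */
} toy_entry_t;

typedef struct {
	toy_entry_t *head, *tail;
} toy_list_t;

static void
toy_append(toy_list_t *l, toy_entry_t *e) {
	e->prev = l->tail;
	e->next = NULL;
	if (l->tail != NULL) {
		l->tail->next = e;
	} else {
		l->head = e;
	}
	l->tail = e;
}

static void
toy_unlink(toy_list_t *l, toy_entry_t *e) {
	if (e->prev != NULL) {
		e->prev->next = e->next;
	} else {
		l->head = e->next;
	}
	if (e->next != NULL) {
		e->next->prev = e->prev;
	} else {
		l->tail = e->prev;
	}
	e->prev = e->next = NULL;
}

/* Hypothetical expiry routine: may remove the predecessor as a side effect. */
static void
toy_expire(toy_list_t *l, toy_entry_t *e) {
	if (e->paired && e->prev != NULL) {
		toy_unlink(l, e->prev);
	}
	toy_unlink(l, e);
}

/* Expire entries from the tail; returns the number of toy_expire() calls. */
static size_t
toy_expire_from_tail(toy_list_t *l, size_t limit) {
	size_t calls = 0;
	toy_entry_t *e = l->tail;

	while (e != NULL && calls < limit) {
		toy_expire(l, e);
		calls++;
		e = l->tail; /* re-read the tail: a saved e->prev could be stale */
	}
	return calls;
}
```

Saving e->prev before the call would hand the loop a pointer to an entry
that is no longer on the list; re-reading the tail avoids that.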
The case insensitive matching in isc_ht was basically completely broken
as only the hashvalue computation was case insensitive, but the key
comparison was always case sensitive.
(cherry picked from commit 175655b771)
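For case-insensitive lookups to work, both the hash computation and the key
comparison have to fold case. The sketch below illustrates that pairing; it
is not the isc_ht implementation. FNV-1a is used here only as a convenient
stand-in hash, and toy_hash() and toy_match() are hypothetical names.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* FNV-1a over the key; folds case when case_sensitive is false. */
static uint32_t
toy_hash(const unsigned char *key, size_t len, bool case_sensitive) {
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < len; i++) {
		unsigned char c = key[i];
		if (!case_sensitive) {
			c = (unsigned char)tolower(c);
		}
		h ^= c;
		h *= 16777619u;
	}
	return h;
}

/* Key comparison that applies the same case folding as toy_hash(). */
static bool
toy_match(const unsigned char *a, size_t alen, const unsigned char *b,
	  size_t blen, bool case_sensitive) {
	if (alen != blen) {
		return false;
	}
	for (size_t i = 0; i < alen; i++) {
		unsigned char ca = a[i], cb = b[i];
		if (!case_sensitive) {
			ca = (unsigned char)tolower(ca);
			cb = (unsigned char)tolower(cb);
		}
		if (ca != cb) {
			return false;
		}
	}
	return true;
}
```

If only toy_hash() folded case, two keys differing in case would land in
the same bucket but never compare equal, which is the failure mode the
commit message describes.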
Stop the cname_and_other_data processing if we already know that the
result is true. Also, we know that CNAME will be placed in the priority
headers, so we can stop looking for CNAME if we haven't found CNAME and
we are past the priority headers.
(cherry picked from commit 3f774c2a8a)
Mark the infrastructure RRTypes as "priority" types and place them at
the beginning of the rdataslab header data graph. The non-priority
types go right after the priority types (if any).
(cherry picked from commit 3ac482be7f)
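The priority ordering and the two early exits from cname_and_other_data
processing can be illustrated with a simplified scan. This is a sketch under
assumptions: toy_header_t is a hypothetical type, the header list is a plain
array with priority entries first, and "other data" means any type other
than CNAME (the real check is more selective).

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_TYPE_CNAME 5

typedef struct {
	int type;      /* RR type code */
	bool priority; /* priority headers are stored before all others */
} toy_header_t;

/*
 * Returns true if the list holds both a CNAME and some other type.
 * '*visited' receives the number of headers that were fully examined.
 */
static bool
toy_cname_and_other(const toy_header_t *h, size_t n, size_t *visited) {
	bool cname = false, other = false;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!h[i].priority && !cname) {
			/* Past the priority headers without a CNAME: stop. */
			break;
		}
		if (h[i].type == TOY_TYPE_CNAME) {
			cname = true;
		} else {
			other = true;
		}
		if (cname && other) {
			i++; /* count this header, the result is already known */
			break;
		}
	}
	*visited = i;
	return cname && other;
}
```

In both exits the scan stops before reaching the end of the list, which is
where the saving comes from on nodes with many headers.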
The cachedb was missing a piece of code (already present in zonedb);
without it, lookups in the slabheaders would miss the RRSIGs for CNAME if
the order of CNAME and RRSIG(CNAME) was reversed in node->data.
(cherry picked from commit 5070c7f5c7)
Don't parse the crypto data before parsing and matching the id and the
algorithm for consecutive DNSKEYs. This allows us to parse the RData
only in case the other parameters match allowing us to skip keys that
are of no interest to us, but still would consume precious CPU time by
parsing possibly garbage with OpenSSL.
(cherry picked from commit f39cd17a26)
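The ordering of checks described above can be sketched as follows. The
names are hypothetical and toy_parse_key() is only a counter standing in
for the expensive crypto parsing; the point is that the cheap id and
algorithm comparison runs first, so non-matching keys are never parsed.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
	uint16_t id;       /* key tag */
	uint8_t algorithm; /* DNSSEC algorithm number */
} toy_dnskey_t;

static size_t toy_parse_calls = 0;

/* Hypothetical stand-in for the costly parse of the key material. */
static int
toy_parse_key(const toy_dnskey_t *key) {
	(void)key;
	toy_parse_calls++;
	return 0; /* 0 means the key parsed successfully */
}

/* Returns the index of the first matching key that parses, or -1. */
static int
toy_find_key(const toy_dnskey_t *keys, size_t n, uint16_t id, uint8_t alg) {
	for (size_t i = 0; i < n; i++) {
		if (keys[i].id != id || keys[i].algorithm != alg) {
			continue; /* cheap mismatch: skip without parsing */
		}
		if (toy_parse_key(&keys[i]) == 0) {
			return (int)i;
		}
	}
	return -1;
}
```

Searching a three-key set for the last key triggers a single parse instead
of three.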
Remember the position in the iterator when selecting the next signing
key. This should speed up processing for larger DNSKEY RRSets because
we don't have to iterate from the start over and over again.
(cherry picked from commit 21af5c9a97)
Change the taskmgr (and thus netmgr) in a way that it supports fast and
slow task queues. The fast queue is used for incoming DNS traffic and
it will pass the processing to the slow queue for sending outgoing DNS
messages and processing resolver messages.
In the future, more tasks might get moved to the slow queues, so the
cached and authoritative DNS traffic can be handled without being slowed
down by operations that take longer to process.
(cherry picked from commit 1b3b0cef22)
The fix for CVE-2023-4408 introduced a regression in the message
parser, which could cause a crash if an rdata type that can only
occur in the question was found in another section.
(cherry picked from commit 510f1de8a6)
The fix for CVE-2023-4408 introduced a regression in the message
parser, which could cause a crash if duplicate rdatasets were found
in the question section. This commit ensures that rdatasets are
correctly disassociated and freed when this occurs.
(cherry picked from commit 4c19d35614)
Instead of issuing a separate isc_task_send() call for every RBTDB node
that triggers tree pruning, maintain a list of nodes from which tree
pruning can be started from and only issue an isc_task_send() call if
pruning has not yet been triggered by another RBTDB node.
The extra queuing overhead eliminated by this change could be remotely
exploited to cause excessive memory use.
As this change modifies struct dns_rbtnode by adding a new 'prunelink'
member to it, bump MAPAPI to prevent any attempts of loading map-format
zone files created using older BIND 9 versions.
(cherry picked from commit 24381cc36d)
If we are in the process of looking for the A records as part of
dns64 processing and the server-stale timeout triggers, redo the
dns64 changes that had been made to the original qctx.
(cherry picked from commit 1fcc483df1)
The wrong result value was being saved for resumption with
nxdomain-redirect when performing the fetch. This led to an assertion
failure when checking that RFC 1918 reverse queries were not leaking to
the global internet.
(cherry picked from commit 9d0fa07c5e)
To prevent allocating a large hashtable in dns_message, we need to
backport the improvements to the isc_ht API from BIND 9.18+ that include
support for case-insensitive keys and incremental rehashing of the
hashtables.
When parsing messages use a hashtable instead of a linear search to
reduce the amount of work done in findname when there's more than one
name in the section.
There are two hashtables:
1) hashtable for owner names - that's constructed for each section when
we hit the second name in the section and destroyed right after parsing
that section;
2) per-name hashtable - for each name in the section, we construct a new
hashtable for that name if there is more than one rdataset for that
particular name.
(cherry picked from commit b8a9631754)
When kasp support was added, 'inception' was used as a proxy for
'now', which resulted in signatures not being generated or the wrong
signatures being generated. 'inception' is the time to be set
in the signatures being generated and is usually in the past to
allow for clock skew. 'now' determines what keys are to be used
for signing.
(cherry picked from commit 6066e41948)
The maximum DNS message size is 65535 octets. Check that the buffer
being passed to dns_message_renderbegin does not exceed this as the
compression code assumes that all offsets are no bigger than this.
(cherry picked from commit a069513234)
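The size check described above amounts to a single comparison. The helper
below is a hypothetical illustration of that check, not the code in
dns_message_renderbegin(); TOY_MAX_MESSAGE and toy_render_buffer_ok() are
made-up names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Maximum DNS message size in octets. */
#define TOY_MAX_MESSAGE 65535u

/* Returns true if a buffer of 'buflen' octets is acceptable for rendering. */
static bool
toy_render_buffer_ok(size_t buflen) {
	return buflen <= TOY_MAX_MESSAGE;
}
```

A caller would reject or clamp any buffer for which this returns false
before handing it to the renderer.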
The AES algorithm for DNS cookies was being kept for legacy reasons,
and it can be safely removed in the next major release. Mark it as
deprecated, so that `named-checkconf` prints a warning when it is in use.
(cherry picked from commit 67d14b0ee5)
When transferring in a non-inline-signing secondary for the first time,
we previously never set the value of zone->loadtime, so it remained
zero. This caused a test failure in the statschannel system test,
and that test case was temporarily disabled. The value is now set
correctly and the test case has been reinstated.
(cherry picked from commit 9643281453)