The current dispatch code could reuse the TCP connection when
dns_dispatch_gettcp() would be used first. This is problematic as the
dns_resolver doesn't use TCP connection sharing, but dns_request could
get the TCP stream that was created outside of the dns_request.
Add new DNS_DISPATCHOPT_UNSHARED option to dns_dispatch_createtcp() that
would prevent the TCP stream to be reused. Use that option in the
dns_resolver call to dns_dispatch_createtcp() to prevent dns_request
from reusing the TCP connections created by dns_resolver.
Additionally, the dns_xfrin unit added TCP connection sharing for
incoming transfers. While interleaving *xfr streams on a TCP connection
should work this should be a deliberate change and be property of the
server that can be controlled. Additionally some level of parallel TCP
streams is desirable. Revert to the old behaviour by removing the
dns_dispatch_gettcp() calls from dns_xfrin and use the new option to
prevent from sharing the transfer streams with dns_request.
In units that support detailed reference tracing via ISC_REFCOUNT
macros, we were doing:
/* Define to 1 for detailed reference tracing */
#undef <unit>_TRACE
This would prevent using -D<unit>_TRACE=1 in the CFLAGS.
Convert the above mentioned snippet with just a comment how to enable
the detailed reference tracing:
/* Add -D<unit>_TRACE=1 to CFLAGS for detailed reference tracing */
Keeping the information about lame server in the ADB was done in !322 to
fix following security issue:
[CVE-2021-25219] Disable "lame-ttl" cache
The handling of the lame servers needs to be redesigned and it is not
going to be enabled any time soon, and the current code is just dead
code that takes up space, code and stands in the way of making ADB work
faster.
Remove all the internals needed for handling the lame servers in the ADB
for now. It might get reintroduced later if and when we redesign ADB.
Move the block on the error path, where the link is checked, to a place
where it makes sense, to avoid accessing an unitialized link when
jumping to the 'cleanup_query' label from 4 different places. The link
is initialized only after those jumps happen.
In addition, initilize the link when creating the object, to avoid
similar errors.
Instead of creating new memory pools for each new dns_message, change
dns_message_create() method to optionally accept externally created
dns_fixedname_t and dns_rdataset_t memory pools. This allows us to
preallocate the memory pools in ns_client and dns_resolver units for the
lifetime of dns_resolver_t and ns_clientmgr_t.
Instead of high number of dispatches (4 * named_g_udpdisp)[1], make the
dispatches bound to threads and make dns_dispatchset_t create a dispatch
for each thread (event loop).
This required couple of other changes:
1. The dns_dispatch_createudp() must be called on loop, so the isc_tid()
is already initialized - changes to nsupdate and mdig were required.
2. The dns_requestmgr had only a single dispatch per v4 and v6. Instead
of using single dispatch, use dns_dispatchset_t for each protocol -
this is same as dns_resolver.
Refactor isc_hashmap to allow custom matching functions. This allows us
to have better tailored keys that don't require fixed uint8_t arrays,
but can be composed of more fields from the stored data structure.
Add a configuration option, resolver-use-dns64, which when true
will cause named to map IPv4 address to IPv6 addresses using the
view's DNS64 mapping rules when making iterative queries.
7a78a85b moved the destruction of the validator from near the start
validated to the end. This reversed the order of dns_validator_destroy
and fetchctx_detach. Restore the order so that val->name remains
valid for the lifetime of the validator.
instead of allowing a NULL nametree in dns_nametree_covered(),
require nametree to exist, and ensure that the nametrees defined
for view and resolver objects are always created.
name trees can now hold either boolean values or bit fields. the
type is selected when the name tree is created.
the behavior of dns_nametree_add() differs slightly beteween the types:
in a boolean tree adding an existing name will return ISC_R_EXISTS,
but in a bitfield tree it simply sets the specified bit in the bitfield
and returns ISC_R_SUCCESS.
the functions for disabling DNSSEC signing algorithms and DS digest
algorithms in resolver.c had a lot of duplicated code.
this commit adds functions to implement a "bitfield tree", which is
(currently) an RBT in which the node data contains arbitrary-sized
bitfields to indicate whether a value has been added at the given
node or not. (it can be changed to a QP trie later.)
it also replaces the functions dns_resolver_disable_algorithm(),
dns_resolver_algorithm_supported(), dns_resolver_disable_ds_digest()
and dns_resolver_ds_digest_supported() with simple wrappers that call
the new functions.
replace the use of RBTs for deny-answer-aliases, the exclude
lists for deny-answer-aliases and deny-answer-addresses, and
dnssec-must-be-secure, with name trees.
Use the new isc_mem_c*() calloc-like API for allocations that are
zeroed.
In turn, this also fixes couple of incorrect usage of the ISC_MEM_ZERO
for structures that need to be zeroed explicitly.
There are few places where isc_mem_cput() is used on structures with a
flexible member (or similar).
Instead of an RBT for the forwarders table, use a QP trie.
We now use reference counting for dns_forwarders_t. When a forwarders
object is retrieved by dns_fwdtable_find(), it must now be explicitly
detached by the caller afterward.
QP tries require stored objects to include their names, so the
the forwarders object now has that. This obviates the need to
pass back a separate 'foundname' value from dns_fwdtable_find().
The dns_badcache unit had (yet another) own locked hashtable
implementation. Replace the hashtable used by dns_badcache with
lock-free cds_lfht implementation from liburcu.
store a pointer to the running loop when creating a dispatch entry
with dns_dispatch_add(), and use isc_loop_now() to get the timestamp for
the current event loop tick when we initialize the dispentry start time
and check for timeouts.
in the past there was overlap between the fields used
as resolver fetch options and ADB addrinfo flags. this has
mostly been eliminated; now we can clean up the rest of
it and remove some confusing comments.
fctx counters could be accessed without locking when
"rndc fetchlimit" is called; while this is probably harmless
in production, it triggered TSAN reports in system tests.
make the code flow clearer by enumerating the result codes that
are treated as success conditions for an intermediate minimized
query (ISC_R_SUCCESS, DNS_R_DELEGATION, DNS_R_NXRRSET, etc), rather
than just folding them all into the 'default' branch of a switch
statement.
The dns_resolver creates a lot of smaller objects (fetch context, fetch
counter, query, response, ...) and those are all loop-bound.
Previously, those objects were allocated from the a single resolver
context, which in turn increases contention between threads - remember
"dead by thousand atomic paper cuts". Instead of using a single memory
context, use the per-loop memory contexts that are bound to a specific
loop and thus there's no contention between them when doing the memory
accounting.
view->adb may be referenced while the view is shutting down as the
zone uses a weak reference to the view and examines view->adb but
dns_view_detach call dns_adb_detach to clear view->adb.
The number of clients per query is calculated using the pending
fetch responses in the list. The dns_resolver_createfetch() function
includes every item in the list when deciding whether the limit is
reached (i.e. fctx->spilled is true). Then, when the limit is reached,
there is another calculation in fctx_sendevents(), when deciding
whether it is needed to increase the limit, but this time the TRYSTALE
responses are not included in the calculation (because of early break
from the loop), and because of that the limit is never increased.
A single client can have more than one associated response/event in the
list (currently max. two), and calculating them as separate "clients"
is unexpected. E.g. if 'stale-answer-enable' is enabled and
'stale-answer-client-timeout' is enabled and is larger than 0, then
each client will have two events, which will effectively halve the
clients-per-query limit.
Fix the dns_resolver_createfetch() function to calculate only the
regular FETCHDONE responses/events.
Change the fctx_sendevents() function to also calculate only FETCHDONE
responses/events. Currently, this second change doesn't have any impact,
because the TRYSTALE events were already skipped, but having the same
condition in both places will help prevent similar bugs in the future
if a new type of response/event is ever added.
If the resolver received a FORMERR response to a request with
an DNS COOKIE option present that echoes the option back, resend
the request without an DNS COOKIE option present.
It should be floor(DNS_NAME_MAXWIRE / 2) + 1 == 128
The mistake was introduced in c6bf51492d because:
* I was refactoring an existing `DNS_MAX_LABELS` defined as 127
* There was a longstanding bug in `dns_name_isvalid()` which
checked the number of labels against 127U instead of 128
* I mistakenly thought `dns_name_isvalid()` was correct and
`dns_name_countlabels()` was incorrect, but the reverse was true.
After this commit, occurrances of `DNS_NAME_MAXLABELS` with value
128 are consistent with the use of 127 or 128 before commit
c6bf51492d except for the mistake in `dns_name_isvalid()`.
This commit adds a test case that checks the MAXLABELS case
in `dns_name_fromtext()` and `dns_name_isvalid()`.
This change makes the zone table lock-free for reads. Previously, the
zone table used a red-black tree, which is not thread safe, so the hot
read path acquired both the per-view mutex and the per-zonetable
rwlock. (The double locking was to fix to cleanup races on shutdown.)
One visible difference is that zones are not necessarily shut down
promptly: it depends on when the qp-trie garbage collector cleans up
the zone table. The `catz` system test checks several times that zones
have been deleted; the test now checks for zones to be removed from
the server configuration, instead of being fully shut down. The catz
test does not churn through enough zones to trigger a gc, so the zones
are not fully detached until the server exits.
After this change, it is still possible to improve the way we handle
changes to the zone table, for instance, batching changes, or better
compaction heuristics.
The dns_adbentry_overquota() was violating the layers accessing the
adbentry struct members directly. Change it to dns_adb_overquota() to
match the dns_adb API.
The isc_time_now() and isc_time_now_hires() were used inconsistently
through the code - either with status check, or without status check,
or via TIME_NOW() macro with RUNTIME_CHECK() on failure.
Refactor the isc_time_now() and isc_time_now_hires() to always fail when
getting current time has failed, and return the isc_time_t value as
return value instead of passing the pointer to result in the argument.
This is a simple replacement using the semantic patch from the previous
commit and as added bonus, one removal of previously undetected unused
variable in named/server.c.
Change the isc_job_run() to not-make any allocations. The caller must
make sure that it allocates isc_job_t - usually as part of the argument
passed to the callback.
For simple jobs, using isc_async_run() is advised as it allocates its
own separate isc_job_t.
When resquery_response() was called with ISC_R_SHUTTINDOWN, the region
argument would be NULL, but rctx_respinit() would try to pass
region->base and region->len to the isc_buffer_init() leading to
a NULL pointer dereference. Properly handle non-ISC_R_SUCCESS by
ignoring the provided region.
Completely remove the TKEY Mode 2 (Diffie-Hellman Exchanged Keying) from
BIND 9 (from named, named.conf and all the tools). The TKEY usage is
fringe at best and in all known cases, GSSAPI is being used as it should.
The draft-eastlake-dnsop-rfc2930bis-tkey specifies that:
4.2 Diffie-Hellman Exchanged Keying (Deprecated)
The use of this mode (#2) is NOT RECOMMENDED for the following two
reasons but the specification is still included in Appendix A in case
an implementation is needed for compatibility with old TKEY
implementations. See Section 4.6 on ECDH Exchanged Keying.
The mixing function used does not meet current cryptographic
standards because it uses MD5 [RFC6151].
RSA keys must be excessively long to achieve levels of security
required by current standards.
We might optionally implement Elliptic Curve Diffie-Hellman (ECDH) key
exchange mode 6 if the draft ever reaches the RFC status. Meanwhile the
insecure DH mode needs to be removed.
This implements node reference tracing that passes all the internal
layers from dns_db API (and friends) to increment_reference() and
decrement_reference().
It can be enabled by #defining DNS_DB_NODETRACE in <dns/trace.h> header.
The output then looks like this:
incr:node:check_address_records:rootns.c:409:0x7f67f5a55a40->references = 1
decr:node:check_address_records:rootns.c:449:0x7f67f5a55a40->references = 0
incr:nodelock:check_address_records:rootns.c:409:0x7f67f5a55a40:0x7f68304d7040->references = 1
decr:nodelock:check_address_records:rootns.c:449:0x7f67f5a55a40:0x7f68304d7040->references = 0
There's associated python script to find the missing detach located at:
https://gitlab.isc.org/isc-projects/bind9/-/snippets/1038
Some qp-trie operations will need to know the maximum number of labels
in a name, so I wanted a standard macro definition with the right
value.
Replace DNS_MAX_LABELS from <dns/resolver.h with DNS_NAME_MAXLABELS in
<dns/name.h>, and add its counterpart DNS_NAME_LABELLEN.
Use these macros in `name.c` and `resolver.c`.
Fix an off-by-one error in an assertion in `dns_name_countlabels()`.
the 'dispatchmgr' member of the resolver object is used by both
the dns_resolver and dns_request modules, and may in the future
be used by others such as dns_xfrin. it doesn't make sense for it
to live in the resolver object; this commit moves it into dns_view.