there was another edge case in which an iterator could be positioned at
the wrong node after dns_qp_lookup(). when searching for a key, it's
possible to reach a dead-end branch that doesn't match, because the
branch offset point is *after* the point where the search key differs
from the branch's contents.
for example, if searching for the key "mop", we could reach a branch
containing "moon" and "moor". the branch offset point - i.e., the
point after which the branch's leaves differ from each other - is the
fourth character ("n" or "r"). however, both leaves differ from the
search key at position *three* ("o" or "p"). the old code failed to
detect this condition, and would have incorrectly left the iterator
pointing at some lower value and not at "moor".
this has been fixed and the unit test now includes this scenario.
in some cases it was possible for the iterator to be positioned in the
wrong place by dns_qp_lookup(). previously, when a leaf node was found
which matched the search key at its parent branch's offset point, but
did not match after that point, the code incorrectly assumed the leaf
it had found was a successor to the searched-for name, and stepped the
iterator back to find a predecessor. however, it was possible for the
non-matching leaf to be the predecessor, in which case stepping the
iterator back was wrong.
(for example: a branch contains "aba" and "abcd", and we are searching
for "abcde". we step down to the twig matching the letter "c" in
position 3. "abcd" is the predecessor of "abcde", so the iterator is
already correctly positioned, but because the twig was an exact match,
we would have moved it back one step to "aba".)
this previously went unnoticed due to a mistake in the qp_test unit
test, which had the wrong expected result for the test case that should
have detected the error. both the code and the test have been fixed.
the 'predecessor' argument to dns_qp_lookup() turns out not to
be sufficient for our needs: the predecessor node in a QP database
could have become "empty" (for the current version) because of an
update or because cache data expired, and in that case the caller
would have to iterate more than one step back to find the predecessor
node that it needs.
it may also be necessary for a caller to iterate forward, in
order to determine whether a node has any children.
for both of these reasons, we now replace the 'predecessor'
argument with an 'iter' argument. if set, this points to memory
with enough space for a dns_qpiter object.
when an exact match is found by the lookup, the iterator will be
pointing to the matching node. if not, it will be pointing to the
lexical predecessor of the nae that was searched for.
a dns_qpiter_current() method has been added for examining
the current value of the iterator without moving it in either
direction.
Previously, there were two methods of working with the overmem
condition:
1. hi/lo water callback - when the overmem condition was reached
for the first time, the water callback was called with HIWATER
mark and .is_overmem boolean was set internally. Similarly,
when the used memory went below the lo water mark, the water
callback would be called with LOWATER mark and .is_overmem
was reset. This check would be called **every** time memory
was allocated or freed.
2. isc_mem_isovermem() - a simple getter for the internal
.is_overmem flag
This commit refactors removes the first method and move the hi/lo water
checks to the isc_mem_isovermem() function, thus we now have only a
single method of checking overmem condition and the check for hi/lo
water is removed from the hot path for memory contexts that doesn't use
overmem checks.
The client connection timeout was set to just one second, which might
not be enough on busy systems (and the CI machines are oh-boy-busy).
Bump the server timeouts to 10 seconds and client timeouts to 5 seconds,
this will make the unit test run a little bit longer, but it should be
more reliable.
Refactor the dispatch unit test to use more local variables (previously
dispatchmgr, dispatch and dispentry were all global), and add two new
tests:
* dispatch_getcp - test whether the TCP connection will get reused
* dispatch_newtcp - test that the TCP connection will not get reused
when DNS_DISPATCHOPT_UNSHARED is in effect
The current dispatch code could reuse the TCP connection when
dns_dispatch_gettcp() would be used first. This is problematic as the
dns_resolver doesn't use TCP connection sharing, but dns_request could
get the TCP stream that was created outside of the dns_request.
Add new DNS_DISPATCHOPT_UNSHARED option to dns_dispatch_createtcp() that
would prevent the TCP stream to be reused. Use that option in the
dns_resolver call to dns_dispatch_createtcp() to prevent dns_request
from reusing the TCP connections created by dns_resolver.
Additionally, the dns_xfrin unit added TCP connection sharing for
incoming transfers. While interleaving *xfr streams on a TCP connection
should work this should be a deliberate change and be property of the
server that can be controlled. Additionally some level of parallel TCP
streams is desirable. Revert to the old behaviour by removing the
dns_dispatch_gettcp() calls from dns_xfrin and use the new option to
prevent from sharing the transfer streams with dns_request.
In order to check whether there are enough inserted values the
code uses the 'tests' variable (loop counter), which is unreliable,
because the loop sometimes removes an item instead of inserting
one (when the randomly generated item already exists).
Instead of the loop counter, use the existing variable 'inserted',
which should indicate the correct number of the inserted items.
depending on how the QP trie is traversed during a lookup, it is
possible for a search to terminate on a leaf which is a partial
match, without that leaf being added to the chain. to ensure the
chain is correct in this case, when a partial match condition is
detected via qpkey_compare(), we will call add_link() again, just
in case. (add_link() will check for a duplicated node, so it will
be harmless if it was already done.)
dns_qp_findname_ancestor() now takes an optional 'predecessor'
parameter, which if non-NULL is updated to contain the DNSSEC
predecessor of the name searched for. this is done by constructing
an iterator stack while carrying out the search, so it can be used
to step backward if needed.
since dns_qp_findname_ancestor() can now return a chain object, it is no
longer necessary to provide a _NOEXACT search option. if we want to look
up the closest ancestor of a name, we can just do a normal search, and
if successful, retrieve the second-to-last node from the QP chain.
this makes ancestor lookups slightly more complicated for the caller,
but allows us to simplify the code in dns_qp_findname_ancestor(), making
it easier to ensure correctness. this was a fairly rare use case:
outside of unit tests, DNS_QPFIND_NOEXACT was only used in the zone
table, which has now been updated to use the QP chain. the equivalent
RBT feature is only used by the resolver for cache lookups of 'atparent'
types (i.e, DS records).
- make iterators reversible: refactor dns_qpiter_next() and add a new
dns_qpiter_prev() function to support iterating both forwards and
backwards through a QP trie.
- added a 'name' parameter to dns_qpiter_next() (as well as _prev())
to make it easier to retrieve the nodename while iterating, without
having to construct it from pointer value data.
dns_qp_findname_ancestor() now takes an optional 'chain' parameter;
if set, the dns_qpchain object it points to will be updated with an
array of pointers to the populated nodes between the tree root and the
requested name. the number of nodes in the chain can then be accessed
using dns_qpchain_length() and the individual nodes using
dns_qpchain_node().
add a 'foundname' parameter to dns_qp_findname_ancestor(),
and use it to set the found name in dns_nametree.
this required adding a dns_qpkey_toname() function; that was
done by moving qp_test_keytoname() from the test library to qp.c.
added some more test cases and fixed bugs with the handling of
relative and empty names.
Instead of creating new memory pools for each new dns_message, change
dns_message_create() method to optionally accept externally created
dns_fixedname_t and dns_rdataset_t memory pools. This allows us to
preallocate the memory pools in ns_client and dns_resolver units for the
lifetime of dns_resolver_t and ns_clientmgr_t.
Reusing TCP connections with dns_dispatch_gettcp() used linear linked
list to lookup existing outgoing TCP connections that could be reused.
Replace the linked list with per-loop cds_lfht hashtable to speedup the
lookups. We use cds_lfht because it allows non-unique node insertion
that we need to check for dispatches in different connection states.
Instead of high number of dispatches (4 * named_g_udpdisp)[1], make the
dispatches bound to threads and make dns_dispatchset_t create a dispatch
for each thread (event loop).
This required couple of other changes:
1. The dns_dispatch_createudp() must be called on loop, so the isc_tid()
is already initialized - changes to nsupdate and mdig were required.
2. The dns_requestmgr had only a single dispatch per v4 and v6. Instead
of using single dispatch, use dns_dispatchset_t for each protocol -
this is same as dns_resolver.
instead of allowing a NULL nametree in dns_nametree_covered(),
require nametree to exist, and ensure that the nametrees defined
for view and resolver objects are always created.
name trees can now also hold trees of counters. each time a name
dns_nametree_add() is called with a given name, the counter for that
name is incremented; the name is not deleted until dns_nametree_delete()
is called the same number of times.
this is meant to be used for synth-from-dnssec, which is incremented for
each key defined at a name, and decremented when a key is removed, the
name must continue to exist until the number of keys has reached zero.
name trees can now hold either boolean values or bit fields. the
type is selected when the name tree is created.
the behavior of dns_nametree_add() differs slightly beteween the types:
in a boolean tree adding an existing name will return ISC_R_EXISTS,
but in a bitfield tree it simply sets the specified bit in the bitfield
and returns ISC_R_SUCCESS.
this is a QP trie of boolean values to indicate whether a name is
included in or excluded from some policy. this can be used for
synth-from-dnssec, deny-answer-aliases, etc.
This adds support for User Statically Defined Tracing (USDT). On
Linux, this uses the header from SystemTap and dtrace utility, but the
support is universal as long as dtrace is available.
Also add the required infrastructure to add probes to libisc, libdns and
libns libraries, where most of the probes will be.
The dns_dispatchmgr object was only set in the dns_view object making it
prone to use-after-free in the dns_xfrin unit when shutting down named.
Remove dns_view_setdispatchmgr() and optionally pass the dispatchmgr
directly to dns_view_create() when it is attached and not just assigned,
so the dns_dispatchmgr doesn't cease to exist too early.
The dns_view_getdnsdispatchmgr() is now protected by the RCU lock, the
dispatchmgr reference is incremented, so the caller needs to detach from
it, and the function can return NULL in case the dns_view has been
already shut down.
replace the red-black tree used by the negative trust anchor table
with a QP trie.
because of this change, dns_ntatable_init() can no longer fail, and
neither can dns_view_initntatable(). these functions have both been
changed to type void.
this function finds the closest matching ancestor, but the function
name could be read to imply that it returns the direct parent node;
this commit suggests a slightly less misleading name.
Make the `pval_r` and `ival_r` out arguments optional.
Add `pval_r` and `ival_r` out arguments to `dns_qp_deletekey()`
and `dns_qp_deletename()`, to return the deleted leaf.
The dns_badcache unit had (yet another) own locked hashtable
implementation. Replace the hashtable used by dns_badcache with
lock-free cds_lfht implementation from liburcu.
The isc_stats_create() can no longer return anything else than
ISC_R_SUCCESS. Refactor isc_stats_create() and its variants in libdns,
libns and named to just return void.
Add a unit test to check if the overmem purging in the RBTDB is
effective when mixed size RR data is inserted into the database.
Co-authored-by: Ondřej Surý <ondrej@isc.org>
Co-authored-by: Jinmei Tatuya <jtatuya@infoblox.com>
These two configuration options worked in conjunction with 'auto-dnssec'
to determine KSK usage, and thus are now obsoleted.
However, in the code we keep KSK processing so that when a zone is
reconfigured from using 'dnssec-policy' immediately to 'none' (without
going through 'insecure'), the zone is not immediately made bogus.
Add one more test case for going straight to none, now with a dynamic
zone (no inline-signing).
store a pointer to the running loop when creating a dispatch entry
with dns_dispatch_add(), and use isc_loop_now() to get the timestamp for
the current event loop tick when we initialize the dispentry start time
and check for timeouts.
ultimately we want the slab implementation of dns_rdataset to
be usable by more database implementaions than just rbtdb. this
commit moves rdataset_methods to rdataslab.c, renamed
dns_rdataslab_rdatasetmethods.
new database methods have been added: locknode, unlocknode,
addglue, expiredata, and deletedata, allowing external functions to
perform functions that previously required internal access to the
database implementation.
database and heap pointers are now stored in the dns_slabheader object
so that header is the only thing that needs to be passed to some
functions; this will simplify moving functions that process slabheaders
out of rbtdb.c so they can be used by other database implementations.
to reduce the amount of common code that will need to be shared
between the separated cache and zone database implementations,
clean up unused portions of dns_db.
the methods dns_db_dump(), dns_db_isdnssec(), dns_db_printnode(),
dns_db_resigned(), dns_db_expirenode() and dns_db_overmem() were
either never called or were only implemented as nonoperational stub
functions: they have now been removed.
dns_db_nodefullname() was only used in one place, which turned out
to be unnecessary, so it has also been removed.
dns_db_ispersistent() and dns_db_transfernode() are used, but only
the default implementation in db.c was ever actually called. since
they were never overridden by database methods, there's no need to
retain methods for them.
in rbtdb.c, beginload() and endload() methods are no longer defined for
the cache database, because that was never used (except in a few unit
tests which can easily be modified to use the zone implementation
instead). issecure() is also no longer defined for the cache database,
as the cache is always insecure and the default implementation of
dns_db_issecure() returns false.
for similar reasons, hashsize() is no longer defined for zone databases.
implementation functions that are shared between zone and cache are now
prepended with 'dns__rbtdb_' so they can become nonstatic.
serve_stale_ttl is now a common member of dns_db.
in preparation for splitting up rbtdb.c, rename some types so they
can be defined in dns/types.h instead of only locally. these include:
- struct noqname, which is used to hold no-qname and closest-encloser
proofs, and is now named dns_proof_t;
- rbtdb_rdatatype_t, which is used to hold a pair of rdatatypes and
is now called dns_typepair_t and defined in rdatatype.h;
- rbtdb_serial_t, which is now just a uint32_t;
- rdatasetheader_t and rdatasetheaderlist_t, now called
dns_slabheader_t and dns_slabheaderlist_t;
- rbtdb_version_t, now called dns_rbtdb_version_t.
the helper functions header_from_raw() and raw_from_header() are
renamed dns_slabheader_fromrdataset() and dns_slabheader_raw().
also made further style changes:
- fixing uninitialized pointer variables throughout rbtdb.c;
- switching some initializations to struct literals;
- renaming some functions and struct members more descriptively;
- replacing dns_db_secure_t with a simple bool since it no longer needs
to be tri-valued.
BIND's rdataset structure is a view of some DNS records. It is
polymorphic, so the details of how the records are stored can vary.
For instance, the records can be held in an rdatalist, or in an
rdataslab in the rbtdb.
The dns_rdataset structure previously had a number of fields called
`private1` up to `private7`, which were used by the various rdataset
implementations. It was not at all clear what these fields were for,
without reading the code and working it out from context.
This change makes the rdataset inheritance hierarchy more clear. The
polymorphic part of a `struct dns_rdataset` is now a union of structs,
each of which is named for the class of implementation using it. The
fields of these structs replace the old `privateN` fields. (Note: the
term "inheritance hierarchy" refers to the fact that the builtin and
SDLZ implementations are based on and inherit from the rdatalist
implementation, which in turn inherits from the generic rdataset.
Most of this change is mechanical, but there are a few extras.
In keynode.c there were a number of REQUIRE()ments that were not
necessary: they had already been checked by the rdataset method
dispatch code. On the other hand, In ncache.c there was a public
function which needed to REQUIRE() that an rdataset was valid.
I have removed lots of "reset iterator state" comments, because it
should now be clear from `target->iter = NULL` where before
`target->private5 = NULL` could have been doing anything.
Initialization is a bit neater in a few places, using C structure
literals where appropriate.
The pointer arithmetic for translating between an rdataslab header and
its raw contents is now fractionally safer.
since it is not necessary to find partial matches when looking
up names in a TSIG keyring, we can use a hash table instead of
an RBT to store them.
the tsigkey object now stores the key name as a dns_fixedname
rather than allocating memory for it.
the `name` parameter to dns_tsigkeyring_add() has been removed;
it was unneeded since the tsigkey object already contains a copy
of the name.
the opportunistic cleanup_ring() function has been removed;
it was only slowing down lookups.
this function was no longer needed, because the algorithm name is no
longer copied into the tsigkey object by dns_tsigkey_createfromkey();
it's always just a pointer to a statically defined name.
the prior practice of passing a dns_name containing the
expanded name of an algorithm to dns_tsigkey_create() and
dns_tsigkey_createfromkey() is unnecessarily cumbersome;
we can now pass the algorithm number instead.
use the ISC_REFCOUNT attach/detach implementation in dns/tsig.c
so that detailed tracing can be used during refactoring.
dns_tsig_keyring_t has been renamed dns_tsigkeyring_t so the type
and the attach/detach function names will match.