Commit graph

1645 commits

Author SHA1 Message Date
Evan Hunt
7fd78344e0 refactor isc_ratelimiter to use loop callbacks
the rate limter now uses loop callbacks rather than task events.
the API for isc_ratelimiter_enqueue() has been changed; we now pass
in a loop, a callback function and a callback argument, and
receive back a rate limiter event object (isc_rlevent_t). it
is no longer necessary for the caller to allocate the event.

the callback argument needs to include a pointer to the rlevent
object so that it can be freed using isc_rlevent_free(), or by
dequeueing.
2023-01-31 21:41:19 -08:00
Ondřej Surý
3d4e41d076 Remove the total memory counter
The total memory counter had again little or no meaning when we removed
the internal memory allocator.  It was just a monotonic counter that
would count add the allocation sizes but never subtracted anything, so
it would be just a "big number".
2023-01-24 17:57:16 +00:00
Ondřej Surý
91e349433f Remove maxinuse memory counter
The maxinuse memory counter indicated the highest amount of
memory allocated in the past. Checking and updating this high-
water mark value every time memory was allocated had an impact
on server performance, so it has been removed. Memory size can
be monitored more efficiently via an external tool logging RSS.
2023-01-24 17:57:16 +00:00
Ondřej Surý
971df0b4ed Remove malloced and maxmalloced memory counter
The malloced and maxmalloced memory counters were mostly useless since
we removed the internal allocator blocks - it would only differ from
inuse by the memory context size itself.
2023-01-24 17:57:16 +00:00
Evan Hunt
301f8b23e1 complete change of NETMGR_TRACE to ISC_NETMGR_TRACE
some references to the old ifdef were still in place.
2023-01-20 12:46:34 -08:00
Aram Sargsyan
41dc48bfd7 Refactor isc_nm_xfr_allowed()
Return 'isc_result_t' type value instead of 'bool' to indicate
the actual failure. Rename the function to something not suggesting
a boolean type result. Make changes in the places where the API
function is being used to check for the result code instead of
a boolean value.
2023-01-19 10:24:08 +00:00
Ondřej Surý
f3753d591f Use thread_local EVP_MD_CTX in isc_iterated_hash()
As this code is on hot path (NSEC3) this introduces an additional
optimization of the EVP_MD API - instead of calling EVP_MD_CTX_new() on
every call to isc_iterated_hash(), we create two thread_local objects
for each thread - a basectx and mdctx, initialize basectx once and then
use EVP_MD_CTX_copy_ex() to flip the initialized state into mdctx.  This
saves us couple more valuable microseconds from the isc_iterated_hash()
call.
2023-01-18 19:36:21 +01:00
Ondřej Surý
e6bfb8e456 Avoid implicit algorithm fetch for OpenSSL EVP_MD family
The implicit algorithm fetch causes a lock contention and significant
slowdown for small input buffers.  For more details, see:

https://github.com/openssl/openssl/issues/19612

Instead of using EVP_DigestInit_ex() initialize empty MD_CTX objects for
each algorithm and use EVP_MD_CTX_copy_ex() to initialize MD_CTX from a
static copy.  Additionally avoid implicit algorithm fetching by using
EVP_MD_fetch() for OpenSSL 3.0.
2023-01-18 18:32:57 +01:00
Tony Finch
290899661d Fix a typo in the NS_PER_ macros
Milliseconds and microseconds were swapped.
2023-01-16 20:33:57 +00:00
Ondřej Surý
d07c4a98da Prefer the pthread_barrier implementation over uv_barrier
Prefer the pthread_barrier implementation on platforms where it is
available over uv_barrier implementation.  This also solves the problem
with thread sanitizer builds on macOS that doesn't have pthread barrier.
2023-01-11 09:51:02 +01:00
Ondřej Surý
10f884a5b8
Remove unused isc_astack unit
The isc_astack unit is now unused, so just remove it.
2023-01-10 20:31:24 +01:00
Ondřej Surý
5bbba0d1a1
Simplify tracing the reference counting in isc_netmgr
Always track the per-worker sockets in the .active_sockets field in the
isc__networker_t struct and always track the per-socket handles in the
.active_handles field ian the isc_nmsocket_t struct.
2023-01-10 19:57:39 +01:00
Evan Hunt
916ea26ead remove nonfunctional DSCP implementation
DSCP has not been fully working since the network manager was
introduced in 9.16, and has been completely broken since 9.18.
This seems to have caused very few difficulties for anyone,
so we have now marked it as obsolete and removed the
implementation.

To ensure that old config files don't fail, the code to parse
dscp key-value pairs is still present, but a warning is logged
that the feature is obsolete and should not be used. Nothing is
done with configured values, and there is no longer any
range checking.
2023-01-09 12:15:21 -08:00
Ondřej Surý
6613f89c62 Enhance the isc_loop unit to allow reference count tracking
Use ISC_REFCOUNT_TRACE_{IMPL,DECL} to allow better isc_loop reference
tracking - use `#define ISC_LOOP_TRACE 1` in <isc/loop.h> to enable.
2023-01-05 12:33:15 +00:00
Mark Andrews
096b280b1c Do not pass NULL pointer to memmove - undefined behaviour
Check if 'old_base' is NULL and if so skip calling memmove.
2023-01-03 14:40:30 +11:00
Artem Boldariev
7962e7f575 tlsctx_client_session_cache_new() -> tlsctx_client_session_create()
Additionally to renaming, it changes the function definition so that
it accepts a pointer to pointer instead of returning a pointer to the
new object.

It is mostly done to make it in line with other functions in the
module.
2022-12-23 11:10:11 +02:00
Artem Boldariev
f102df96b8 Rename isc_tlsctx_cache_new() -> isc_tlsctx_cache_create()
Additionally to renaming, it changes the function definition so that
it accepts a pointer to pointer instead of returning a pointer to the
new object.

It is mostly done to make it in line with other functions in the
module.
2022-12-23 11:10:11 +02:00
Ondřej Surý
6cb6373b5a Convert Stream DNS to use isc_buffer API
Drop the whole isc_dnsbuffer API and use new improved isc_buffer API
that provides same functionality as the isc_dnsbuffer unit now.
2022-12-20 22:13:53 +02:00
Artem Boldariev
4277eeeb9c Remove TLS DNS transport (and parts common with TCP DNS)
This commit removes TLS DNS transport superseded by Stream DNS.
2022-12-20 22:13:53 +02:00
Artem Boldariev
e5649710d3 Remove TCP DNS transport
This commit removes TCP DNS transport superseded by Stream DNS.
2022-12-20 22:13:53 +02:00
Artem Boldariev
4524bf4083 Make isc_nm_tlssocket non-optional
This commit unties generic TLS code (isc_nm_tlssocket) from DoH, so
that it will be available regardless of the fact if BIND was built
with DNS over HTTP support or not.
2022-12-20 22:13:53 +02:00
Artem Boldariev
371b02f37a TCP: make it possible to set Nagle's algorithms state via handle
This commit adds ability to turn the Nagle's algorithm on or off via
connections handle. It adds the isc_nmhandle_set_tcp_nodelay()
function as the public interface for this functionality.
2022-12-20 22:13:53 +02:00
Artem Boldariev
f395cd4b3e Add isc_nm_streamdnssocket (aka Stream DNS)
This commit adds an initial implementation of isc_nm_streamdnssocket
transport: a unified transport for DNS over stream protocols messages,
which is capable of replacing both TCP DNS and TLS DNS
transports. Currently, the interface it provides is a unified set of
interfaces provided by both of the transports it attempts to replace.

The transport is built around "isc_dnsbuffer_t" and
"isc_dnsstream_assembler_t" objects and attempts to minimise both the
number of memory allocations during network transfers as well as
memory usage.
2022-12-20 22:13:51 +02:00
Artem Boldariev
338cf3e467 Add isc_dnsstream_assembler_t implementation
This commit adds the implementation for an "isc_dnsstream_assembler_t"
object. The object is built on top of "isc_dnsbuffer_t" and is
intended to encapsulate the state machine used for handling DNS
messages received in the format used for messages transmitted over
TCP.

The idea is that the object accepts the input data received from a
socket, tries to assemble DNS messages from the incoming data and
calls the callback which contains the status of the incoming data as
well as a pointer to the memory region referencing the data of the
assembled message. It is capable of assembling DNS messages no matter
how torn apart they are when sent over network.

The following statuses might be passed to the callback:

* ISC_R_SUCCESS - a message has been successfully assembled;
* ISC_R_NOMORE  - not enough data has been processed to assemble a
message;
* ISC_R_RANGE - there was an attempt to process a zero-sized DNS
message (someone attempts to send us junk data).

One could say that the object replaces the implementation of
"isc__nm_*_processbuffer()" functions used by the old TCP DNS and TLS
DNS transports with a better defined state machine completely
decoupled from the networking code itself.

Such a design makes it trivial to write unit tests for it, leading to
better verification of its correctness.

Another important difference is directly related to the fact that it
is built on top of "isc_dnsbuffer_t", which tries to manage memory in
a smart way. In particular:

* It tries to use a static buffer for smaller messages, reducing
pressure on the memory manager (hot path);
* When allocating dynamic memory for larger messages, it tries to
allocate memory conservatively (generic path).

These characteristics is a significant upgrade over the older logic
where a 64KB(+2 bytes) buffer was allocated from dynamic memory
regardless of the fact if we need a buffer this large or not. That is,
lesser memory usage is expected in a generic case for DNS transports
built on top of "isc_dnsstream_assembler_t."
2022-12-20 21:24:44 +02:00
Artem Boldariev
cbb758abd4 Add isc_dnsbuffer_t implementation
This commit adds "isc_dnsbuffer_t" object implementation, a thin
wrapper on top of "isc_buffer_t" which has the following
characteristics:

* provides interface specifically atuned for handling/generating DNS
messages, especially in the format used for DNS messages over TCP;
* avoids allocating dynamic memory when handling small DNS messages,
while transparently switching to using dynamic memory when handling
larger messages. This approach significantly reduces pressure on the
memory allocator, as most of the DNS messages are small.
2022-12-20 21:24:44 +02:00
Artem Boldariev
94e650ce89 Use 'restrict' and 'const' for 'isc_buffer_t'
The purpose of this commit is to aid compiler in generating better
code when working with `isc_buffer_t` objects by using restricted
pointers (and, to a lesser extent, 'const' modifier for read-only
arguments).

This way we, basically, instruct the compiler that the members of
structured passed by pointers into the functions can be treated as
local variables in the scope of a function. That should reduce the
number of load/store operations emitted by compilers when accessing
objects (e.g. 'isc_buffer_t') via pointers.
2022-12-20 21:01:27 +02:00
Ondřej Surý
460afcda18
Add isc_buffer_trycompact() function needed for StreamDNS
Add isc_buffer_trycompact() that's an optimization; it will compact the
buffer only when the remaining length is smaller than used length.
2022-12-20 19:13:48 +01:00
Ondřej Surý
e6062ee3ae
Add isc_buffer_setmctx() and isc_buffer_clearmctx() function
Add two extra functions needed by StreamDNS:

1. isc_buffer_setmctx() sets the buffer internal memory context, so we
   can use isc_buffer_reserve() on the buffer.  For this, we also need
   to track whether the .base was dynamically allocated or not.  This
   needs to be called after isc_buffer_init() and before first
   isc_buffer_reserve() call.

2. isc_buffer_clearmctx() clears the buffer internal memory context, and
   frees any dynamically allocated buffer.  This needs to be called
   after the last isc_buffer_reserve() call and before calling the
   isc_buffer_invalidate()
2022-12-20 19:13:48 +01:00
Ondřej Surý
8e3a86f6dd
Make the isc_buffer unit header-only
The isc_buffer is often used in the hot-path, so make it header-only
implementation.
2022-12-20 19:13:48 +01:00
Ondřej Surý
2ddea1e41c
Add a static pre-allocated buffer to isc_buffer_t
When the buffer is allocated via isc_buffer_allocate() and the size is
smaller or equal ISC_BUFFER_STATIC_SIZE (currently 512 bytes), the
buffer will be allocated as a flexible array member in the buffer
structure itself instead of allocating it on the heap.  This should help
when the buffer is used on the hot-path with small allocations.
2022-12-20 19:13:48 +01:00
Ondřej Surý
6bd2b34180
Enable auto-reallocation for all isc_buffer_allocate() buffers
When isc_buffer_t buffer is created with isc_buffer_allocate() assume
that we want it to always auto-reallocate instead of having an extra
call to enable auto-reallocation.
2022-12-20 19:13:48 +01:00
Ondřej Surý
135ec7a0f0
Remove single use isc_buffer_putdecint() function
The isc_buffer_putdecint() could be easily replaced with
isc_buffer_printf() with just a small overhead of calling vsnprintf()
twice instead once.  This is not on a hot-path (dns_catz unit), so we
can ignore the overhead and instead have less single-use code in favor
of using reusable more generic function.
2022-12-20 19:13:48 +01:00
Ondřej Surý
2a94123d5b
Refactor the isc_buffer_{get,put}uintN, add isc_buffer_peekuintN
The Stream DNS implementation needs a peek methods that read the value
from the buffer, but it doesn't advance the current position.  Add
isc_buffer_peekuintX methods, refactor the isc_buffer_{get,put}uintN
methods to modern integer types, and move the isc_buffer_getuintN to the
header as static inline functions.
2022-12-20 19:13:48 +01:00
Ondřej Surý
a1d45685e6
Move and extend the uint8_t low-endian to uint{32,64}t to endian.h
Move the U8TO{32,64}_LE and U{32,64}TO8_LE macros to endian.h and extend
the macros for 16-bit and Big-Endian variants.

Use the macros both in isc_siphash (LE) and isc_buffer (BE) units.
2022-12-20 19:13:48 +01:00
Ondřej Surý
aea251f3bc
Change the isc_buffer_reserve() to take just buffer pointer
The isc_buffer_reserve() would be passed a reference to the buffer
pointer, which was unnecessary as the pointer would never be changed
in the current implementation.  Remove the extra dereference.
2022-12-20 19:13:48 +01:00
Artem Boldariev
837fef78b1 Fix TLS session resumption via IDs when Mutual TLS is used
This commit fixes TLS session resumption via session IDs when
client certificates are used. To do so it makes sure that session ID
contexts are set within server TLS contexts. See OpenSSL documentation
for 'SSL_CTX_set_session_id_context()', the "Warnings" section.
2022-12-14 18:06:20 +02:00
Ondřej Surý
e2262c2112
Remove isc_resource API and set limits directly in named_os unit
The only function left in the isc_resource API was setting the file
limit.  Replace the whole unit with a simple getrlimit to check the
maximum value of RLIMIT_NOFILE and set the maximum back to rlimit_cur.

This is more compatible than trying to set RLIMIT_UNLIMITED on the
RLIMIT_NOFILE as it doesn't work on Linux (see man 5 proc on
/proc/sys/fs/nr_open), neither it does on Darwin kernel (see man 2
getrlimit).

The only place where the maximum value could be raised under privileged
user would be BSDs, but the `named_os_adjustnofile()` were not called
there before.  We would apply the increased limits only on Linux and Sun
platforms.
2022-12-07 19:40:00 +01:00
Ondřej Surý
50f357cb36
Refactor the dns_adb unit
The dns_adb unit has been refactored to be much simpler.  Following
changes have been made:

1. Simplify the ADB to always allow GLUE and hints

   There were only two places where dns_adb_createfind() was used - in
   the dns_resolver unit where hints and GLUE addresses were ok, and in
   the dns_zone where dns_adb_createfind() would be called without
   DNS_ADBFIND_HINTOK and DNS_ADBFIND_GLUEOK set.

   Simplify the logic by allowing hint and GLUE addresses when looking
   up the nameserver addresses to notify.  The difference is negligible
   and would cause a difference in the notified addresses only when
   there's mismatch between the parent and child addresses and we
   haven't cached the child addresses yet.

2. Drop the namebuckets and entrybuckets

   Formerly, the namebuckets and entrybuckets were used to reduced the
   lock contention when accessing the double-linked lists stored in each
   bucket.  In the previous refactoring, the custom hashtable for the
   buckets has been replaced with isc_ht/isc_hashmap, so only a single
   item (mostly, see below) would end up in each bucket.

   Removing the entrybuckets has been straightforward, the only matching
   was done on the isc_sockaddr_t member of the dns_adbentry.

   Removing the zonebuckets required GLUEOK and HINTOK bits to be
   removed because the find could match entries with-or-without the bits
   set, and creating a custom key that stores the
   DNS_ADBFIND_STARTATZONE in the first byte of the key, so we can do a
   straightforward lookup into the hashtable without traversing a list
   that contains items with different flags.

3. Remove unassociated entries from ADB database

   Previously, the adbentries could live in the ADB database even after
   unlinking them from dns_adbnames.  Such entries would show up as
   "Unassociated entries" in the ADB dump.  The benefit of keeping such
   entries is little - the chance that we link such entry to a adbname
   is small, and it's simpler to evict unlinked entries from the ADB
   cache (and the hashtable) than create second LRU cleaning mechanism.

   Unlinked ADB entries are now directly deleted from the hash
   table (hashmap) upon destruction.

4. Cleanup expired entries from the hash table

   When buckets were still in place, the code would keep the buckets
   always allocated and never shrink the hash table (hashmap).  With
   proper reference counting in place, we can delete the adbnames from
   the hash table and the LRU list.

5. Stop purging the names early when we hit the time limit

   Because the LRU list is now time ordered, we can stop purging the
   names when we find a first entry that doesn't fullfil our time-based
   eviction criteria because no further entry on the LRU list will meet
   the criteria.

Future work:

1. Lock contention

   In this commit, the focus was on correctness of the data structure,
   but in the future, the lock contention in the ADB database needs to
   be addressed.  Currently, we use simple mutex to lock the hash
   tables, because we almost always need to use a write lock for
   properly purging the hashtables.  The ADB database needs to be
   sharded (similar to the effect that buckets had in the past).  Each
   shard would contain own hashmap and own LRU list.

2. Time-based purging

   The ADB names and entries stay intact when there are no lookups.
   When we add separate shards, a timer needs to be added for time-based
   cleaning in case there's no traffic hashing to the inactive shard.

3. Revisit the 30 minutes limit

   The ADB cache is capped at 30 minutes.  This needs to be revisited,
   and at least the limit should be configurable (in both directions).
2022-11-30 10:03:24 +01:00
Ondřej Surý
118ae66976 Add extra set of ISC_REFCOUNT_TRACE_{IMPL,DECL} macros
The new ISC_REFCOUNT_TRACE_{IMPL,DECL} macros can be used to add a
reference tracing capability to any unit using the reference counting.
It requires a little bit of extra work in each header as you can't have
a define from inside a define (see rpz.h), but it's fairly easy to add
tracing to any struct using reference counting with these macros.
2022-11-29 23:57:40 -08:00
Tony Finch
00307fe318 Deduplicate time unit conversion factors
The various factors like NS_PER_MS are now defined in a single place
and the names are no longer inconsistent. I chose the _PER_SEC names
rather than _PER_S because it is slightly more clear in isolation;
but the smaller units are always NS, US, and MS.
2022-11-25 13:23:36 +00:00
Ondřej Surý
f46ce447a6
Add isc_hashmap API that implements Robin Hood hashing
Add new isc_hashmap API that differs from the current isc_ht API in
several aspects:

1. It implements Robin Hood Hashing which is open-addressing hash table
   algorithm (e.g. no linked-lists)

2. No memory allocations - the array to store the nodes is made of
   isc_hashmap_node_t structures instead of just pointers, so there's
   only allocation on resize.

3. The key is not copied into the hashmap node and must be also stored
   externally, either as part of the stored value or in any other
   location that's valid as long the value is stored in the hashmap.

This makes the isc_hashmap_t a little less universal because of the key
storage requirements, but the inserts and deletes are faster because
they don't require memory allocation on isc_hashmap_add() and memory
deallocation on isc_hashmap_delete().
2022-11-10 15:07:19 +01:00
Ondřej Surý
0492bbf590
Make the pthread_rwlock implementation header-only macros [2/2]
While using mutrace, the phtread-rwlock based isc_rwlock implementation
would be all tracked in the rwlock.c unit losing all useful information
as all rwlocks would be traced in a single place.  Rewrite the
pthread_rwlock based implementation to be header-only macros, so we can
use mutrace to properly track the rwlock contention without heavily
patching mutrace to understand the libisc synchronization primitives.
2022-11-02 10:34:10 +01:00
Ondřej Surý
6bd201ccec
Remove one level of indirection from isc_rwlock [1/2]
Instead of checking the PTHREAD_RUNTIME_CHECK from the header, move it
to the pthread_rwlock implementation functions.  The internal isc_rwlock
actually cannot fail, so the checks in the header was useless anyway.
2022-11-02 10:27:09 +01:00
Ondřej Surý
98b7a93772
Remove isc_rwlock_downgrade() from isc_rwlock
The isc_rwlock_downgrade() is not used anywhere, so we can remove it and
make the pthread_rwlock implementation simpler.
2022-11-02 09:05:37 +01:00
Evan Hunt
dc878e3098 isc_async_run() runs events in reverse order
when more than one event was scheduled in the isc_aysnc queue,
they were executed in reverse order. we need to pull events
off the back of queue instead the front, so that uv_loop will
run them in the right order.

note that isc_job_run() has the same behavior, because it calls
uv_idle_start() directly. in that case we just document it so
it'll be less surprising in the future.
2022-10-31 05:43:45 -07:00
Mark Andrews
3881afeb15 Add dns_rdata_checksvcb
dns_rdata_checksvcb performs data entry checks on SVCB records.
In particular that _dns SVBC record have an 'alpn' and if that 'alpn'
parameter indicates HTTP is in use that 'dophath' is present.
2022-10-29 00:22:54 +11:00
Ondřej Surý
6ba0a22627
Change the return type of isc_lex_create() to void
The isc_lex_create() cannot fail, so cleanup the return type from
isc_result_t to void.
2022-10-26 12:55:06 +02:00
Ondřej Surý
5e20c2ccfb
Replace (void *)-1 with ISC_LINK_TOMBSTONE
Instead of having "arbitrary" (void *)-1 to define non-linked, add a
ISC_LINK_TOMBSTONE(type) macro that replaces the "magic" value with a
define.
2022-10-18 11:36:15 +02:00
Ondřej Surý
cb3c36b8bf
Add ISC_{LIST,LINK}_INITIALIZER for designated initializers
Since we are using designated initializers, we were missing initializers
for ISC_LIST and ISC_LINK, add them, so you can do

    *foo = (foo_t){ .list = ISC_LIST_INITIALIZER };

Instead of:

    *foo = (foo_t){ 0 };
    ISC_LIST_INIT(foo->list);
2022-10-18 11:36:15 +02:00
Tony Finch
26ed03a61e Include the function name when reporting unexpected errors
I.e. print the name of the function in BIND that called the system
function that returned an error. Since it was useful for pthreads
code, it seems worthwhile doing so everywhere.
2022-10-17 13:43:59 +01:00