Commit graph

1843 commits

Author SHA1 Message Date
Ondřej Surý
3b10814569
Fix the streaming read callback shutdown logic
When shutting down TCP sockets, the read callback calling logic was
flawed: it would call either one callback too few or one too many.  Fix
the logic as follows:

1. When isc_nm_read() has been called but isc_nm_read_stop() hasn't on
   the handle, the read callback will be called with ISC_R_CANCELED to
   cancel active reading from the socket/handle.

2. When isc_nm_read() has been called and isc_nm_read_stop() has been
   called on the handle, the read callback will be called with
   ISC_R_SHUTTINGDOWN to signal that the dormant (not-reading) socket
   is being shut down.

3. The .reading and .recv_read flags are a little bit tricky.  The
   .reading flag indicates if the outer layer is reading the data (that
   would be uv_tcp_t for TCP and isc_nmsocket_t (TCP) for TLSStream),
   the .recv_read flag indicates whether somebody is interested in the
   data read from the socket.

   Usually, you would expect that the .reading should be false when
   .recv_read is false, but it gets even more tricky with TLSStream as
   the TLS protocol might need to read from the socket even when sending
   data.

   Fix the usage of the .recv_read and .reading flags in the TLSStream
   to their true meaning, which mostly consists of using .recv_read
   everywhere and then wrapping isc_nm_read() and isc_nm_read_stop()
   with the .reading flag.

4. The TLS failed read helper has been modified to resemble the TCP code
   as much as possible, clearing and re-setting the .recv_read flag in
   the TCP timeout code has been fixed and .recv_read is now cleared
   when isc_nm_read_stop() has been called on the streaming socket.

5. The use of Network Manager in the named_controlconf, isccc_ccmsg, and
   isc_httpd units has been greatly simplified due to the improved design.

6. More unit tests for TCP and TLS testing the shutdown conditions have
   been added.
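
The shutdown rules in items 1, 2 and 4 can be modelled as a tiny state machine. The sketch below is a simplified single-threaded model, not the netmgr code: `stream_sock_t`, `sock_read()`, `sock_read_stop()` and `sock_shutdown()` are hypothetical stand-ins for the socket, isc_nm_read(), isc_nm_read_stop() and the shutdown path, and the enum values only mirror ISC_R_CANCELED and ISC_R_SHUTTINGDOWN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes; the names only mirror ISC_R_CANCELED and
 * ISC_R_SHUTTINGDOWN, they are not BIND's result codes. */
typedef enum {
	NO_CALLBACK,	 /* isc_nm_read() was never called */
	CB_CANCELED,	 /* an active read is cancelled */
	CB_SHUTTINGDOWN, /* a dormant (not-reading) socket is shut down */
} shutdown_cb_t;

/* Hypothetical subset of the socket state. */
typedef struct {
	bool has_read_cb; /* a read callback has been registered */
	bool recv_read;	  /* somebody is interested in the data */
} stream_sock_t;

/* Models isc_nm_read(): register the callback and start reading. */
static void
sock_read(stream_sock_t *sock) {
	assert(sock != NULL);
	sock->has_read_cb = true;
	sock->recv_read = true;
}

/* Models isc_nm_read_stop(): per item 4, .recv_read is cleared. */
static void
sock_read_stop(stream_sock_t *sock) {
	assert(sock != NULL);
	sock->recv_read = false;
}

/* Decides which result the read callback receives on shutdown. */
static shutdown_cb_t
sock_shutdown(const stream_sock_t *sock) {
	assert(sock != NULL);
	if (!sock->has_read_cb) {
		return NO_CALLBACK;
	}
	return sock->recv_read ? CB_CANCELED : CB_SHUTTINGDOWN;
}
```

In this model every shutdown path asks the same single question, which is one way to avoid calling the read callback one time too few or too many.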

Co-authored-by: Ondřej Surý <ondrej@isc.org>
Co-authored-by: Artem Boldariev <artem@isc.org>
2023-04-20 12:58:32 +02:00
Ondřej Surý
1715cad685
Refactor the isc_quota code and fix the quota in TCP accept code
In e185412872, the TCP accept quota code
became broken in a subtle way - the quota would get initialized on the
first accept for the server socket and then deleted from the server
socket, so it would never get applied again.

Properly fixing this required a bigger refactoring of the isc_quota API
code to make it much simpler.  The new code decouples the ownership of
the quota and acquiring/releasing the quota limit.

During the refactoring it became clearer that we need to use the
callback from the child side of the accepted connection, and not the
server side.
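
A minimal sketch of that decoupling, assuming a plain hard limit (no soft quota and no waiter callbacks): the listening socket would own a `quota_t` for its whole lifetime, while each accepted child connection calls `quota_acquire()` and later `quota_release()`. All names here are hypothetical, not the refactored isc_quota API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical quota: it only tracks the limit and the units in use.
 * Ownership (who keeps the quota alive) is a separate concern. */
typedef struct {
	atomic_uint used;
	unsigned int max;
} quota_t;

static void
quota_init(quota_t *quota, unsigned int max) {
	atomic_init(&quota->used, 0);
	quota->max = max;
}

/* Takes one unit; returns false when the limit has been reached. */
static bool
quota_acquire(quota_t *quota) {
	unsigned int cur = atomic_load(&quota->used);
	while (cur < quota->max) {
		/* On failure the CAS reloads `cur`, so simply retry. */
		if (atomic_compare_exchange_weak(&quota->used, &cur, cur + 1)) {
			return true;
		}
	}
	return false;
}

/* Gives one unit back (e.g. when an accepted connection closes). */
static void
quota_release(quota_t *quota) {
	unsigned int prev = atomic_fetch_sub(&quota->used, 1);
	assert(prev > 0);
	(void)prev;
}
```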
2023-04-12 14:10:37 +02:00
Ondřej Surý
0a468e7c9e
Make isc_tid() a header-only function
The isc_tid() function is often called on the hot path and its only
purpose is to return a thread_local variable, so make isc_tid() a
header-only function to save several function calls during
query-response processing.
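
A sketch of what a header-only accessor looks like, assuming a plain `unsigned int` thread ID; `tid_v`, `tid()` and `tid_set()` are hypothetical names rather than the actual <isc/tid.h> contents.

```c
#include <assert.h>

/*
 * In a real header this would be `extern _Thread_local unsigned int
 * tid_v;` with the definition in one .c file; it is `static` here only to
 * keep the sketch in a single file.  C11 <threads.h> also offers the
 * `thread_local` spelling.
 */
static _Thread_local unsigned int tid_v = 0;

/* static inline: each call can compile down to a plain TLS load,
 * with no out-of-line function call. */
static inline unsigned int
tid(void) {
	return tid_v;
}

static inline void
tid_set(unsigned int value) {
	tid_v = value;
}
```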
2023-04-12 14:10:37 +02:00
Artem Boldariev
2b3a3c21dc Stream DNS: avoid memory copying/buffer resizing when reading data
This commit optimises isc_dnsstream_assembler_t in such a way that
memory copying and reallocation are avoided when receiving one or more
complete DNS messages at once. We try to handle the data from the
messages directly, without storing them in an intermediate memory
buffer.
2023-04-03 13:31:46 +00:00
Tony Finch
cd0e7f853a Simplify histogram quantiles
The `isc_histosummary_t` functions were written in the early days of
`hg64` and carried over when I brought `hg64` into BIND. They were
intended to be useful for graphing cumulative frequency distributions
and the like, but in practice whatever draws charts is better off with
a raw histogram export, especially given the poor performance of the
old functions.

The replacement `isc_histo_quantiles()` function is intended for
providing a few quantile values in BIND's stats channel, when the user
does not want the full histogram. Unlike the old functions, the caller
provides all the query fractions up-front, so that the values can be
found in a single scan instead of a scan per value. The scan is from
larger values to smaller, since larger quantiles are usually more
interesting, so the scan can bail out early.
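
A sketch of the single-scan idea, assuming a histogram with one bucket per integer value instead of hg64's bucket layout; `histo_quantiles()` is a hypothetical helper and does not reproduce the isc_histo_quantiles() signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * counts[v] is the number of samples with value v.  fractions[] must be
 * sorted in descending order (e.g. 0.99, 0.9, 0.5).  For each fraction q,
 * out[] receives the smallest value v such that at least q * total samples
 * are <= v.
 */
static void
histo_quantiles(const uint64_t *counts, size_t nbuckets,
		const double *fractions, size_t nfrac, size_t *out) {
	uint64_t total = 0;
	for (size_t v = 0; v < nbuckets; v++) {
		total += counts[v];
	}
	assert(total > 0);

	uint64_t at_or_below = total; /* samples with value <= v */
	size_t f = 0;
	/* Scan from the largest value down; stop once all are found. */
	for (size_t v = nbuckets; v-- > 0 && f < nfrac;) {
		uint64_t below = at_or_below - counts[v]; /* value < v */
		while (f < nfrac &&
		       (double)below < fractions[f] * (double)total)
		{
			out[f++] = v;
		}
		at_or_below = below;
	}
	/* Fractions of 0 (or less) are satisfied by the smallest value. */
	while (f < nfrac) {
		out[f++] = 0;
	}
}
```

Because the fractions arrive sorted in descending order, the loop stops as soon as the smallest requested quantile is found, without visiting the low end of the histogram.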
2023-04-03 12:08:05 +01:00
Tony Finch
bc2389b828 Add per-thread sharded histograms for heavy loads
Although an `isc_histo_t` is thread-safe, it can suffer
from cache contention under heavy load. To avoid this,
an `isc_histomulti_t` contains a histogram per thread,
so updates are local and low-contention.
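
The sharding idea reduced to one counter per thread for brevity; a real per-thread histogram would keep a whole bucket array per shard. `sharded_counter_t`, `sc_add()` and `sc_read()` are hypothetical, and the 64-byte cache line size is an assumption.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdint.h>

#define NSHARDS 4 /* hypothetical: one shard per thread */

/* Each shard sits in its own (assumed 64-byte) cache line, so threads
 * updating different shards do not contend for the same line. */
typedef struct {
	alignas(64) atomic_uint_fast64_t count;
} shard_t;

typedef struct {
	shard_t shards[NSHARDS];
} sharded_counter_t;

static void
sc_init(sharded_counter_t *sc) {
	for (unsigned int i = 0; i < NSHARDS; i++) {
		atomic_init(&sc->shards[i].count, 0);
	}
}

/* Writers only touch their own shard; relaxed ordering suffices for
 * statistics. */
static void
sc_add(sharded_counter_t *sc, unsigned int tid, uint64_t n) {
	assert(tid < NSHARDS);
	atomic_fetch_add_explicit(&sc->shards[tid].count, n,
				  memory_order_relaxed);
}

/* Readers sum all shards; under concurrent updates the total is
 * approximate rather than a consistent snapshot. */
static uint64_t
sc_read(sharded_counter_t *sc) {
	uint64_t sum = 0;
	for (unsigned int i = 0; i < NSHARDS; i++) {
		sum += atomic_load_explicit(&sc->shards[i].count,
					    memory_order_relaxed);
	}
	return sum;
}
```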
2023-04-03 12:08:05 +01:00
Tony Finch
82213a48cf Add isc_histo for histogram statistics
This is an adaptation of my `hg64` experiments for use in BIND.

As well as renaming everything according to ISC style, I have
written some more extensive tests that ensure the edge cases are
correct and the fenceposts are in the right places.

I have added utility functions for working with precision in terms of
decimal significant figures as well as this code's native binary.
2023-04-03 12:08:05 +01:00
Ondřej Surý
3a6a0fa867 Replace DE_CONST(k, v) with v = UNCONST(k) macro
Replace the complicated DE_CONST macro that required a union with a much
simpler reference-dereference trick in the UNCONST() macro.
2023-04-03 10:25:56 +00:00
Ondřej Surý
4ec9c4a1db Cleanup the last Windows / MSC ifdefs and comments
Clean up the remnants of MS Compiler bits from <isc/refcount.h> and from
the information printing in named/main.c, and clean up some comments
about Windows that no longer apply.

The bits in picohttpparser.{h,c} were left out, because it's not our
code.
2023-04-03 09:06:20 +00:00
Mark Andrews
e029803704 Handle fatal and FIPS provider interactions
When fatal is called we may be holding memory allocated by OpenSSL.
This may result in the reference count for the FIPS provider not
going to zero and the shared library not being unloaded during
OPENSSL_cleanup.  When the shared library is ultimately unloaded,
when all remaining dynamically loaded libraries are freed, we have
already destroyed the memory context we were using to track memory
leaks / late frees resulting in INSIST being called.

Disable triggering the INSIST when fatal has been called.
2023-04-03 12:44:27 +10:00
Mark Andrews
5a2e82557e Define isc_fips_mode() and isc_fips_set_mode()
isc_fips_mode() determines whether the process is running in FIPS mode.

isc_fips_set_mode() puts the process into FIPS mode.
2023-04-03 12:05:28 +10:00
Tony Finch
555690a3c9 Simplify thread spawning
The `isc_trampoline` module had a lot of machinery to support stable
thread IDs for use by hazard pointers. But the hazard pointer code
is gone, and the `isc_loop` module now has its own per-loop thread
IDs.

The trampoline machinery seems over-complicated for its remaining
tasks, so move the per-thread initialization into `isc/thread.c`,
and delete the rest.
2023-03-31 17:21:52 +01:00
Ondřej Surý
a5f5f68502
Refactor isc_time_now() to return time, and not result
The isc_time_now() and isc_time_now_hires() functions were used
inconsistently throughout the code: either with a status check, without
a status check, or via the TIME_NOW() macro with RUNTIME_CHECK() on
failure.

Refactor isc_time_now() and isc_time_now_hires() to always fail hard
when getting the current time has failed, and to return the isc_time_t
value as the return value instead of writing it through a pointer
argument.
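
A sketch of the by-value style on POSIX, assuming `clock_gettime()` as the clock source; `my_time_t` and `my_time_now()` are hypothetical stand-ins for isc_time_t and isc_time_now().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for isc_time_t. */
typedef struct {
	time_t seconds;
	long nanoseconds;
} my_time_t;

/* Returns the current time by value.  Failing to read the clock is
 * treated as fatal, so callers have no result code to check. */
static my_time_t
my_time_now(void) {
	struct timespec ts;
	if (clock_gettime(CLOCK_REALTIME, &ts) != 0) {
		perror("clock_gettime");
		abort();
	}
	return (my_time_t){ .seconds = ts.tv_sec, .nanoseconds = ts.tv_nsec };
}
```

Callers then write `my_time_t now = my_time_now();`, so there is no status for some call sites to check and others to ignore.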
2023-03-31 15:02:06 +02:00
Ondřej Surý
263d232c79 Replace isc_fsaccess API with more secure file creation
The isc_fsaccess API was created to hide the implementation details
between POSIX and Windows APIs.  As we are not supporting the Windows
APIs anymore, it's better to drop this API used in the DST part.

Moreover, isc_fsaccess set the permissions in an insecure manner: it
operated on the filename and not on the file descriptor, which can lead
to all kinds of attacks if an unprivileged user has read (or, even
worse, write) access to the key directory.

Replace the code that operates on the private keys with code that uses
mkstemp(), fchmod() and an atomic rename() at the end, so that at no
time do the private key files have insecure permissions.
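
A minimal POSIX sketch of that sequence, assuming the target path fits in a fixed-size buffer; `write_private_file()` is a hypothetical helper, not the DST code itself. Permission changes are applied to the open descriptor, and the file only appears under its final name through `rename()`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Writes `data` to `path` without the file ever existing under its final
 * name with loose permissions.  Returns 0 on success, -1 on failure.
 */
static int
write_private_file(const char *path, const char *data) {
	char tmp[4096];
	int n = snprintf(tmp, sizeof(tmp), "%s.XXXXXX", path);
	if (n < 0 || (size_t)n >= sizeof(tmp)) {
		errno = ENAMETOOLONG;
		return -1;
	}

	/* mkstemp() creates and opens a unique file exclusively. */
	int fd = mkstemp(tmp);
	if (fd < 0) {
		return -1;
	}

	/* Operate on the descriptor, not the name: no race on the path. */
	size_t len = strlen(data);
	int ok = (fchmod(fd, S_IRUSR | S_IWUSR) == 0);
	if (ok) {
		ssize_t w = write(fd, data, len);
		if (w >= 0 && (size_t)w != len) {
			errno = EIO; /* treat a short write as a failure */
		}
		ok = (w >= 0 && (size_t)w == len);
	}
	ok = ok && fsync(fd) == 0;
	if (close(fd) != 0) {
		ok = 0;
	}

	/* rename() atomically replaces any existing file at `path`. */
	if (!ok || rename(tmp, path) != 0) {
		int saved = errno;
		unlink(tmp);
		errno = saved;
		return -1;
	}
	return 0;
}
```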
2023-03-31 12:52:59 +00:00
Ondřej Surý
aca7dd3961 Add isc_os_umask() function to get current umask
As it's impossible to get the current umask without modifying it at the
same time, read the current umask at program start and keep the loaded
value internally.  Add the isc_os_umask() function to access the
start-time umask.
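
The set-and-restore trick looks like this; `os_umask_init()` and `os_umask()` are hypothetical names for illustration, not the isc_os API.

```c
#include <assert.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical module state: the umask observed at startup. */
static mode_t startup_umask;

/* umask() always installs a new mask while returning the old one, so
 * reading it means setting it and immediately restoring it.  Doing this
 * once at startup, before other threads exist, avoids a window where
 * files could be created with the temporary mask. */
static void
os_umask_init(void) {
	startup_umask = umask(0);
	(void)umask(startup_umask);
}

static mode_t
os_umask(void) {
	return startup_umask;
}
```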
2023-03-31 12:52:59 +00:00
Ondřej Surý
4bd6096d4b
Remove isc_stdtime_get() macro
Now that the isc_stdtime_get() macro is unused, remove it from the
header file.
2023-03-31 13:33:16 +02:00
Ondřej Surý
c11af0448a
Provide isc_stdtime_now(void) that returns value
As isc_stdtime_get() cannot fail, its API is needlessly complicated;
add a new isc_stdtime_now() that returns the Unix time as its return value.
2023-03-31 13:16:28 +02:00
Ondřej Surý
2c0a9575d7
Replace __attribute__((unused)) with ISC_ATTR_UNUSED attribute macro
Instead of marking unused entities with the UNUSED(x) macro in the
function body, use an `ISC_ATTR_UNUSED` attribute macro that expands to
C23 [[maybe_unused]], or to __attribute__((__unused__)) as a fallback.
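
A sketch of such an attribute macro, assuming that a `__STDC_VERSION__` above 201710L means the compiler accepts C23-style `[[...]]` attributes; `ATTR_UNUSED` is a hypothetical name, and <isc/attributes.h> may use a different test.

```c
#include <assert.h>

/* Hypothetical macro modelled on the description above.  C2x/C23
 * compiler modes report a __STDC_VERSION__ greater than 201710L. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ > 201710L
#define ATTR_UNUSED [[maybe_unused]]
#else
#define ATTR_UNUSED __attribute__((__unused__))
#endif

/* The attribute sits on the declaration, so no UNUSED(x) statement is
 * needed in the body to silence -Wunused-parameter. */
static int
increment(int value, ATTR_UNUSED int debug_level) {
	return value + 1;
}

/* Also works on a helper that is only used in some build configurations. */
ATTR_UNUSED static int
debug_only_helper(void) {
	return 42;
}
```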
2023-03-30 23:29:25 +02:00
Ondřej Surý
1176bf0552
Use C23 attributes if available, add ISC_ATTR_UNUSED
Use C23 attribute styles if available:

  * Add new ISC_ATTR_UNUSED attribute macro that either expands to C23's
    [[maybe_unused]] or __attribute__((__unused__));

  * Add default expansion of the `noreturn` to [[noreturn]] if available;

  * Move the FALLTHROUGH from <isc/util.h> to <isc/attributes.h>
2023-03-30 22:43:39 +02:00
Ondřej Surý
f5fc224af3
Add isc_async_current() macro to run job on current loop
Previously, isc_job_run() could have been used to run the job on the
current loop and the isc_job_run() would take care of allocating and
deallocating the job.  After the change in this MR, isc_job_run() is
more complicated to use, so we introduce the isc_async_current() macro
to supplement isc_async_run() when we need to run the job on the
current loop.
2023-03-30 16:07:41 +02:00
Ondřej Surý
1844590ad9
Refactor isc_job_run to not-make any allocations
Change isc_job_run() so that it makes no allocations.  The caller must
make sure that it allocates isc_job_t, usually as part of the argument
passed to the callback.

For simple jobs, using isc_async_run() is advised as it allocates its
own separate isc_job_t.
2023-03-30 16:00:52 +02:00
Tony Finch
295e7c80e8 Ad-hoc backtrace logging with isc_backtrace_log()
It's sometimes helpful to get a quick idea of the call stack when
debugging. This change factors out the backtrace logging from named's
fatal error handler so that it's easy to use in other places too.
2023-03-29 10:47:53 +00:00
Evan Hunt
fe7ed2ba24 update stream sockets with bound address/port
when isc_nm_listenstreamdns() is called with a local port of 0,
a random port is chosen. call uv_getsockname() to determine what
the port is as soon as the socket is bound, and add a function
isc_nmsocket_getaddr() to retrieve it, so that the caller can
connect to the listening socket. this will be used in cases
where the same process is acting as both client and server.
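
The same technique with plain POSIX sockets rather than libuv: bind to port 0, then call `getsockname()` to learn the port the kernel picked. `listen_ephemeral()` is a hypothetical helper, and the loopback address is an assumption for the example.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Binds a TCP listener to 127.0.0.1 with port 0 so the kernel picks a
 * free port, then asks which one it chose.  Returns the listening socket
 * and stores the port in *portp, or returns -1 on failure. */
static int
listen_ephemeral(unsigned short *portp) {
	int fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0) {
		return -1;
	}

	struct sockaddr_in sin;
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sin.sin_port = htons(0); /* let the kernel choose */

	socklen_t len = sizeof(sin);
	if (bind(fd, (struct sockaddr *)&sin, sizeof(sin)) != 0 ||
	    listen(fd, 16) != 0 ||
	    getsockname(fd, (struct sockaddr *)&sin, &len) != 0)
	{
		close(fd);
		return -1;
	}

	*portp = ntohs(sin.sin_port);
	return fd;
}
```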
2023-03-28 12:38:28 -07:00
Artem Boldariev
719343348e Delete old TLS DNS and TCP DNS dead code
This commit removes old, unused TLS DNS and TCP DNS definitions from
the code. They should have been deleted earlier, but that was missed.
2023-03-15 18:40:58 +02:00
Tony Finch
7e565a87a7
Apply adjusted clang-format
The headers were slightly reordered when liburcu was added.
2023-03-10 17:31:28 +01:00
Aram Sargsyan
fce68da460 Fix ISC_REFCOUNT_TRACE_IMPL usage
ISC_REFCOUNT_TRACE_IMPL uses isc_tid(), but the corresponding header
file is not included, which breaks, for example, compiling BIND with
DNS_CATZ_TRACE defined in lib/dns/include/dns/catz.h.

Add '#include <isc/tid.h>' in lib/isc/include/isc/refcount.h.
2023-03-09 21:38:04 +00:00
Tony Finch
c43668f031 Remove some lingering references to libbind9
Clean up the `.clang-format` #include priority list and
the `\file` declaration in `isc/getaddresses.h`.
2023-03-08 10:06:22 +00:00
Tony Finch
9b7aa536ba QSBR: safe memory reclamation for lock-free data structures
This "quiescent state based reclamation" module provides support for
the qp-trie module in dns/qp. It is a replacement for liburcu, written
without reference to the urcu source code, and in fact it works in a
significantly different way.

A few specifics of BIND make this variant of QSBR somewhat simpler:

  * We can require that wait-free access to a qp-trie only happens in
    an isc_loop callback. The loop provides a natural quiescent state,
    after the callbacks are done, when no qp-trie access occurs.

  * We can dispense with any API like rcu_synchronize(). In practice,
    it takes far too long to wait for a grace period to elapse for each
    write to a data structure.

  * We use the idea of "phases" (aka epochs or eras) from EBR to
    reduce the amount of bookkeeping needed to track memory that is no
    longer needed, knowing that the qp-trie does most of that work
    already.

I considered hazard pointers for safe memory reclamation. They have
more read-side overhead (updating the hazard pointers) and it wasn't
clear to me how to nicely schedule the cleanup work. Another
alternative, epoch-based reclamation, is designed for fine-grained
lock-free updates, so it needs some rethinking to work well with the
heavily read-biased design of the qp-trie. QSBR has the fastest read
side of the basic SMR algorithms (with no barriers), and fits well
into a libuv loop. More recent hybrid SMR algorithms do not appear to
have enough benefits to justify the extra complexity.
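
A minimal sketch of the core QSBR bookkeeping, using a textbook single global counter rather than the phase scheme described above; `qsbr_quiescent()`, `qsbr_retire()` and `qsbr_safe()` are hypothetical, the thread count is fixed, and counter wraparound is ignored. Readers execute no barriers at all: the only per-thread cost is one store at each quiescent point.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NTHREADS 3 /* hypothetical fixed number of loop threads */

/* Global phase counter and the phase each thread last observed while
 * quiescent. */
static atomic_uint global_phase = 1;
static atomic_uint thread_phase[NTHREADS];

/* Called by a thread when it holds no references to shared nodes, e.g.
 * after a loop iteration's callbacks have returned. */
static void
qsbr_quiescent(unsigned int tid) {
	assert(tid < NTHREADS);
	atomic_store(&thread_phase[tid], atomic_load(&global_phase));
}

/* Called after a node has been unlinked: returns the phase it was
 * retired in and starts a new phase. */
static unsigned int
qsbr_retire(void) {
	return atomic_fetch_add(&global_phase, 1);
}

/* A node retired in `phase` may be freed once every thread has been
 * quiescent in a later phase, i.e. none can still be reading it. */
static bool
qsbr_safe(unsigned int phase) {
	for (unsigned int i = 0; i < NTHREADS; i++) {
		if (atomic_load(&thread_phase[i]) <= phase) {
			return false;
		}
	}
	return true;
}
```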
2023-02-23 15:57:53 +00:00
Tony Finch
63cd73d43e Include thread ID in refcount trace output
2023-02-23 14:28:27 +00:00
Evan Hunt
dc27552c30 remove isc_glob
the isc_glob module was originally needed to support posix-style glob
processing on Windows, but is now just an unnecessary wrapper around
glob(3). this commit removes it.
2023-02-22 17:35:29 +00:00
Tony Finch
36e56923ce Simple lock-free stack in <isc/stack.h>
Add a singly-linked stack that supports lock-free prepend and drain (to
empty the list and clean up its elements).  Intended for use with QSBR
to collect objects that need safe memory reclamation, or any other user
that works with adding objects to the stack and then draining them in
one go like various work queues.

In <isc/atomic.h>, add an `atomic_ptr()` macro to make type
declarations a little less abominable, and clean up a duplicate
definition of `atomic_compare_exchange_strong_acq_rel()`
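
A sketch of the prepend-and-drain stack with C11 atomics; the names are hypothetical, not the <isc/stack.h> API. Because nodes are never popped one at a time, the ABA problem that complicates single-element pop does not arise here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct node {
	struct node *next;
	int value;
} node_t;

typedef struct {
	_Atomic(node_t *) top;
} lfstack_t;

static void
lfstack_init(lfstack_t *stack) {
	atomic_init(&stack->top, NULL);
}

/* Lock-free prepend: retry the CAS until `node` is installed on top. */
static void
lfstack_push(lfstack_t *stack, node_t *node) {
	node_t *top = atomic_load_explicit(&stack->top, memory_order_relaxed);
	do {
		node->next = top;
	} while (!atomic_compare_exchange_weak_explicit(
		&stack->top, &top, node, memory_order_release,
		memory_order_relaxed));
}

/* Drain: atomically detach the whole list, leaving the stack empty.
 * The caller then walks the returned list with no further contention. */
static node_t *
lfstack_drain(lfstack_t *stack) {
	return atomic_exchange_explicit(&stack->top, NULL,
					memory_order_acquire);
}
```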
2023-02-22 16:13:37 +00:00
Evan Hunt
b058f99cb8 remove references to obsolete isc_task/timer functions
removed references in code comments, doc/dev documentation, etc, to
isc_task, isc_timer_reset(), and isc_timertype_inactive. also removed a
coccinelle patch related to isc_timer_reset() that was no longer needed.
2023-02-22 08:13:30 +00:00
Tony Finch
3fef7c626a Move bind9_getaddresses() to isc_getaddresses()
No need to have a whole library for one function.
2023-02-21 13:12:26 +00:00
Evan Hunt
a52b17d39b
remove isc_task completely
as there is no further use of isc_task in BIND, this commit removes
it, along with isc_taskmgr, isc_event, and all other related types.

functions that accepted taskmgr as a parameter have been cleaned up.
as a result of this change, some functions can no longer fail, so
they've been changed to type void, and their callers have been
updated accordingly.

the tasks table has been removed from the statistics channel and
the stats version has been updated. dns_dyndbctx has been changed
to reference the loopmgr instead of taskmgr, and DNS_DYNDB_VERSION
has been updated as well.
2023-02-16 18:35:32 +01:00
Evan Hunt
f58e7c28cd
switch to using isc_loopmgr_pause() instead of task exclusive
change functions using isc_taskmgr_beginexclusive() to use
isc_loopmgr_pause() instead.

also, removed an unnecessary use of exclusive mode in
named_server_tcptimeouts().

most functions that were implemented as task events because they needed
to be running in a task to use exclusive mode have now been changed
into loop callbacks instead. (the exception is catz, which is being
changed in a separate commit because it's a particularly complex change.)
2023-02-16 17:51:55 +01:00
Tony Finch
f9c725d7d4 Remove do-nothing header <isc/stat.h>
Use <sys/stat.h> instead
2023-02-15 16:44:47 +00:00
Tony Finch
6927a30926 Remove do-nothing header <isc/print.h>
This one really truly did nothing. No lines added!
2023-02-15 16:44:47 +00:00
Tony Finch
c7615bc28d Remove do-nothing header <isc/offset.h>
And replace all uses of isc_offset_t with standard off_t
2023-02-15 16:44:47 +00:00
Tony Finch
bed09c1676 Remove do-nothing header <isc/netdb.h>
Not needed since we dropped Windows support
2023-02-15 16:44:47 +00:00
Tony Finch
b0893ae09a Explain <isc/strerr.h> a little more
The purpose of the `strerror_r()` wrapper was not obvious.
2023-02-15 16:44:09 +00:00
Tony Finch
75f7a85a39 Deprecate <isc/deprecated.h>
We refactor more freely these days.
2023-02-15 15:36:20 +00:00
Ondřej Surý
6ffda5920e
Add the reader-writer synchronization with modified C-RW-WP
This changes the internal isc_rwlock implementation to:

  Irina Calciu, Dave Dice, Yossi Lev, Victor Luchangco, Virendra
  J. Marathe, and Nir Shavit.  2013.  NUMA-aware reader-writer locks.
  SIGPLAN Not. 48, 8 (August 2013), 157–166.
  DOI:https://doi.org/10.1145/2517327.24425

(The full article is available from:
  http://mcg.cs.tau.ac.il/papers/ppopp2013-rwlocks.pdf)

The implementation is based on the Writer-Preference Lock (C-RW-WP)
variant (see section 3.4 of the paper for the rationale).

The implemented algorithm has been modified for simplicity and for usage
patterns in rbtdb.c.

The changes compared to the original algorithm:

  * We haven't implemented the cohort locks because that would require
    knowledge of NUMA nodes; instead, a simple atomic_bool is used as the
    synchronization point for the writer lock.

  * The per-thread reader counters are not used, as this would
    require the internal thread id (isc_tid_v) to be always initialized,
    even in the utilities; the change has a slight performance penalty,
    so we might revisit this change in the future.  However, this change
    also saves a lot of memory, because cache-line aligned counters were
    used, so on a 32-core machine the rwlock would be 4096+ bytes.

  * The readers use a writer_barrier that is raised after a while when
    the reader lock can't be acquired, to prevent reader starvation.

  * Separate ingress and egress reader counter queues to reduce both
    inter- and intra-thread contention.
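
A much-simplified sketch of the writer-preference idea with separate ingress and egress reader counters; it leaves out the writer_barrier and any NUMA awareness, uses sequentially consistent atomics throughout, and all names are hypothetical rather than the isc_rwlock implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
	atomic_bool writer;  /* held, or wanted, by a writer */
	atomic_uint ingress; /* readers that have arrived */
	atomic_uint egress;  /* readers that have left (or backed out) */
} rwlock_sketch_t;

static void
rw_init(rwlock_sketch_t *lock) {
	atomic_init(&lock->writer, false);
	atomic_init(&lock->ingress, 0);
	atomic_init(&lock->egress, 0);
}

/* Readers announce themselves, then back out if a writer is present. */
static bool
rw_tryrdlock(rwlock_sketch_t *lock) {
	atomic_fetch_add(&lock->ingress, 1);
	if (!atomic_load(&lock->writer)) {
		return true;
	}
	atomic_fetch_add(&lock->egress, 1);
	return false;
}

static void
rw_rdlock(rwlock_sketch_t *lock) {
	while (!rw_tryrdlock(lock)) {
		while (atomic_load(&lock->writer)) {
			sched_yield();
		}
	}
}

static void
rw_rdunlock(rwlock_sketch_t *lock) {
	atomic_fetch_add(&lock->egress, 1);
}

/* The writer takes the flag first (writer preference), then waits for
 * readers that were already inside to drain. */
static void
rw_wrlock(rwlock_sketch_t *lock) {
	bool expected = false;
	while (!atomic_compare_exchange_weak(&lock->writer, &expected, true)) {
		expected = false;
		sched_yield();
	}
	/* Load egress before ingress: egress never exceeds ingress, so the
	 * two can only compare equal when no reader is inside. */
	for (;;) {
		unsigned int out = atomic_load(&lock->egress);
		unsigned int in = atomic_load(&lock->ingress);
		if (in == out) {
			break;
		}
		sched_yield();
	}
}

static void
rw_wrunlock(rwlock_sketch_t *lock) {
	atomic_store(&lock->writer, false);
}
```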
2023-02-15 09:30:04 +01:00
Tony Finch
436b76bb17 Improve the spinloop pause / yield hint
Unfortunately, C still lacks a standard function for the pause (x86,
SPARC) or yield (ARM) instructions used in spin lock or CAS loops.
BIND has its own, based on vendor intrinsics or inline asm.

Previously, it was buried in the `isc_rwlock` implementation. This
commit renames `isc_rwlock_pause()` to `isc_pause()` and moves
it into <isc/pause.h>.

This commit also fixes the configure script so that it detects ARM
yield support on systems that identify as `aarch*` instead of `arm*`.

On 64-bit ARM systems we now use the ISB (instruction synchronization
barrier) instruction in preference to yield. The ISB instruction
pauses the CPU for longer, several nanoseconds, which is more like the
x86 pause instruction. There are more details in a Rust pull request,
which also refers to MySQL making the same change:
https://github.com/rust-lang/rust/pull/84725
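
A sketch of a per-architecture spin hint as described above; `cpu_pause()` and `spin_until_set()` are hypothetical names, and the 32-bit ARM branch assumes a core that supports the yield instruction (ARMv6K or later).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spin-wait hint; <isc/pause.h> may differ in detail. */
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define cpu_pause() _mm_pause()
#elif defined(__aarch64__)
/* ISB stalls longer than YIELD, closer to the x86 PAUSE latency. */
#define cpu_pause() __asm__ __volatile__("isb" ::: "memory")
#elif defined(__arm__)
#define cpu_pause() __asm__ __volatile__("yield" ::: "memory")
#else
#define cpu_pause() ((void)0)
#endif

/* Spins until *flag is set or max_spins iterations have passed, hinting
 * the CPU on every iteration; returns the number of iterations used. */
static unsigned int
spin_until_set(atomic_bool *flag, unsigned int max_spins) {
	unsigned int spins = 0;
	while (!atomic_load_explicit(flag, memory_order_acquire) &&
	       spins < max_spins)
	{
		cpu_pause();
		spins++;
	}
	return spins;
}
```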
2023-02-14 17:13:24 +00:00
Evan Hunt
3a1bb8dac8 remove some unused functions
removed some functions that are no longer used and unlikely to
be resurrected, and also some that were only used to support Windows
and can now be replaced with generic versions.
2023-02-13 11:50:59 -08:00
Evan Hunt
935879ed11 remove isc_bind9 variable
isc_bind9 was a global bool used to indicate whether the library
was being used internally by BIND or by an external caller. external
use is no longer supported, but the variable was retained for use
by dyndb, which needed it only when being built without libtool.
building without libtool is *also* no longer supported, so the variable
can go away.
2023-02-09 18:00:13 +00:00
Ondřej Surý
baced007af
Require C11 Atomic Operations via <stdatomic.h>
Make the C11 Atomic Operations mandatory and drop the GCC __atomic
builtin shims.
2023-02-08 21:33:23 +01:00
Ondřej Surý
1c456c0284
Require C11 thread_local keyword and <threads.h> header
Change the autoconf check to require C11 <threads.h> header and
thread_local keyword.
2023-02-08 21:33:23 +01:00
Tony Finch
ff63b53ff4 Add isc_time_monotonic()
This is to simplify measurements of how long things take.
2023-02-06 12:14:51 +00:00
Evan Hunt
7fd78344e0 refactor isc_ratelimiter to use loop callbacks
the rate limiter now uses loop callbacks rather than task events.
the API for isc_ratelimiter_enqueue() has been changed; we now pass
in a loop, a callback function and a callback argument, and
receive back a rate limiter event object (isc_rlevent_t). it
is no longer necessary for the caller to allocate the event.

the callback argument needs to include a pointer to the rlevent
object so that it can be freed using isc_rlevent_free(), or by
dequeueing.
2023-01-31 21:41:19 -08:00
Ondřej Surý
3d4e41d076 Remove the total memory counter
The total memory counter had little or no meaning once we removed the
internal memory allocator.  It was just a monotonic counter that added
up the allocation sizes but never subtracted anything, so it was just a
"big number".
2023-01-24 17:57:16 +00:00
Ondřej Surý
91e349433f Remove maxinuse memory counter
The maxinuse memory counter indicated the highest amount of
memory allocated in the past. Checking and updating this high-
water mark value every time memory was allocated had an impact
on server performance, so it has been removed. Memory size can
be monitored more efficiently via an external tool logging RSS.
2023-01-24 17:57:16 +00:00
Ondřej Surý
971df0b4ed Remove malloced and maxmalloced memory counter
The malloced and maxmalloced memory counters were mostly useless since
we removed the internal allocator blocks - it would only differ from
inuse by the memory context size itself.
2023-01-24 17:57:16 +00:00
Evan Hunt
301f8b23e1 complete change of NETMGR_TRACE to ISC_NETMGR_TRACE
some references to the old ifdef were still in place.
2023-01-20 12:46:34 -08:00
Aram Sargsyan
41dc48bfd7 Refactor isc_nm_xfr_allowed()
Return 'isc_result_t' type value instead of 'bool' to indicate
the actual failure. Rename the function to something not suggesting
a boolean type result. Make changes in the places where the API
function is being used to check for the result code instead of
a boolean value.
2023-01-19 10:24:08 +00:00
Ondřej Surý
f3753d591f Use thread_local EVP_MD_CTX in isc_iterated_hash()
As this code is on the hot path (NSEC3), this introduces an additional
optimization of the EVP_MD API - instead of calling EVP_MD_CTX_new() on
every call to isc_iterated_hash(), we create two thread_local objects
for each thread - a basectx and mdctx, initialize basectx once and then
use EVP_MD_CTX_copy_ex() to flip the initialized state into mdctx.  This
saves us a couple more valuable microseconds on each isc_iterated_hash()
call.
2023-01-18 19:36:21 +01:00
Ondřej Surý
e6bfb8e456 Avoid implicit algorithm fetch for OpenSSL EVP_MD family
The implicit algorithm fetch causes a lock contention and significant
slowdown for small input buffers.  For more details, see:

https://github.com/openssl/openssl/issues/19612

Instead of using EVP_DigestInit_ex(), initialize empty MD_CTX objects for
each algorithm and use EVP_MD_CTX_copy_ex() to initialize MD_CTX from a
static copy.  Additionally, avoid implicit algorithm fetching by using
EVP_MD_fetch() for OpenSSL 3.0.
2023-01-18 18:32:57 +01:00
Tony Finch
290899661d Fix a typo in the NS_PER_ macros
Milliseconds and microseconds were swapped.
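
For reference, the correct ratios; the macro names are assumed from the commit subject, and `ns_to_ms()` is a hypothetical helper.

```c
#include <assert.h>

/* Names assumed from the commit subject; values are the standard ratios. */
#define NS_PER_SEC 1000000000ULL /* nanoseconds per second */
#define NS_PER_MS  1000000ULL    /* nanoseconds per millisecond */
#define NS_PER_US  1000ULL       /* nanoseconds per microsecond */

/* Converts a duration in nanoseconds to whole milliseconds. */
static unsigned long long
ns_to_ms(unsigned long long ns) {
	return ns / NS_PER_MS;
}
```

With the millisecond and microsecond values swapped, `ns_to_ms()` would return a count of microseconds, a result 1000 times too large.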
2023-01-16 20:33:57 +00:00
Ondřej Surý
d07c4a98da Prefer the pthread_barrier implementation over uv_barrier
Prefer the pthread_barrier implementation on platforms where it is
available over the uv_barrier implementation.  This also solves the problem
with ThreadSanitizer builds on macOS, which doesn't have pthread barriers.
2023-01-11 09:51:02 +01:00
Ondřej Surý
10f884a5b8
Remove unused isc_astack unit
The isc_astack unit is now unused, so just remove it.
2023-01-10 20:31:24 +01:00
Ondřej Surý
5bbba0d1a1
Simplify tracing the reference counting in isc_netmgr
Always track the per-worker sockets in the .active_sockets field in the
isc__networker_t struct and always track the per-socket handles in the
.active_handles field in the isc_nmsocket_t struct.
2023-01-10 19:57:39 +01:00
Evan Hunt
916ea26ead remove nonfunctional DSCP implementation
DSCP has not been fully working since the network manager was
introduced in 9.16, and has been completely broken since 9.18.
This seems to have caused very few difficulties for anyone,
so we have now marked it as obsolete and removed the
implementation.

To ensure that old config files don't fail, the code to parse
dscp key-value pairs is still present, but a warning is logged
that the feature is obsolete and should not be used. Nothing is
done with configured values, and there is no longer any
range checking.
2023-01-09 12:15:21 -08:00
Ondřej Surý
6613f89c62 Enhance the isc_loop unit to allow reference count tracking
Use ISC_REFCOUNT_TRACE_{IMPL,DECL} to allow better isc_loop reference
tracking - use `#define ISC_LOOP_TRACE 1` in <isc/loop.h> to enable.
2023-01-05 12:33:15 +00:00
Mark Andrews
096b280b1c Do not pass NULL pointer to memmove - undefined behaviour
Check if 'old_base' is NULL and if so skip calling memmove.
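
A sketch of the guard; `grow()` is a hypothetical helper, not the code the commit changed. In C17 and C23, passing a null pointer to memmove() is undefined even when the length is zero, which is why the copy is skipped entirely.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical growth helper: copies the old contents only when there is
 * an old buffer.  On allocation failure it returns NULL and the caller
 * still owns old_base.
 */
static unsigned char *
grow(unsigned char *old_base, size_t old_len, size_t new_len) {
	assert(new_len > 0 && new_len >= old_len);
	unsigned char *new_base = malloc(new_len);
	if (new_base == NULL) {
		return NULL;
	}
	if (old_base != NULL) {
		memmove(new_base, old_base, old_len);
		free(old_base);
	}
	return new_base;
}
```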
2023-01-03 14:40:30 +11:00
Artem Boldariev
7962e7f575 tlsctx_client_session_cache_new() -> tlsctx_client_session_create()
In addition to renaming, it changes the function definition so that
it accepts a pointer to a pointer instead of returning a pointer to the
new object.

This is mostly done to bring it in line with the other functions in the
module.
2022-12-23 11:10:11 +02:00
Artem Boldariev
f102df96b8 Rename isc_tlsctx_cache_new() -> isc_tlsctx_cache_create()
In addition to renaming, it changes the function definition so that
it accepts a pointer to a pointer instead of returning a pointer to the
new object.

This is mostly done to bring it in line with the other functions in the
module.
2022-12-23 11:10:11 +02:00
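
A sketch of the pointer-to-pointer convention, assuming the common BIND-style rule that the output pointer must be NULL on entry; `cache_t`, `cache_create()` and `cache_detach()` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical object used to show the calling convention. */
typedef struct {
	unsigned int references;
} cache_t;

/* The caller passes a pointer to a NULL pointer, which receives the new
 * object.  Allocation failure aborts, so there is no result to check. */
static void
cache_create(cache_t **cachep) {
	assert(cachep != NULL && *cachep == NULL);
	cache_t *cache = malloc(sizeof(*cache));
	if (cache == NULL) {
		abort();
	}
	cache->references = 1;
	*cachep = cache;
}

/* The matching destructor clears the caller's pointer as well. */
static void
cache_detach(cache_t **cachep) {
	assert(cachep != NULL && *cachep != NULL);
	cache_t *cache = *cachep;
	*cachep = NULL;
	if (--cache->references == 0) {
		free(cache);
	}
}
```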
Ondřej Surý
6cb6373b5a Convert Stream DNS to use isc_buffer API
Drop the whole isc_dnsbuffer API and use the new, improved isc_buffer
API, which now provides the same functionality as the isc_dnsbuffer unit.
2022-12-20 22:13:53 +02:00
Artem Boldariev
4277eeeb9c Remove TLS DNS transport (and parts common with TCP DNS)
This commit removes TLS DNS transport superseded by Stream DNS.
2022-12-20 22:13:53 +02:00
Artem Boldariev
e5649710d3 Remove TCP DNS transport
This commit removes TCP DNS transport superseded by Stream DNS.
2022-12-20 22:13:53 +02:00
Artem Boldariev
4524bf4083 Make isc_nm_tlssocket non-optional
This commit unties generic TLS code (isc_nm_tlssocket) from DoH, so
that it will be available regardless of whether BIND was built
with DNS over HTTP support or not.
2022-12-20 22:13:53 +02:00
Artem Boldariev
371b02f37a TCP: make it possible to set Nagle's algorithms state via handle
This commit adds the ability to turn Nagle's algorithm on or off via
a connection's handle. It adds the isc_nmhandle_set_tcp_nodelay()
function as the public interface for this functionality.
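
At the socket level, Nagle's algorithm is controlled by the standard TCP_NODELAY option. A plain POSIX sketch, with `set_tcp_nodelay()` as a hypothetical helper; it makes no claim about how isc_nmhandle_set_tcp_nodelay() is implemented.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <unistd.h>

/* Enabling TCP_NODELAY turns Nagle's algorithm *off*, so small writes
 * (such as short DNS responses) are sent immediately instead of being
 * coalesced.  Returns 0 on success, -1 with errno set on failure. */
static int
set_tcp_nodelay(int fd, bool nodelay) {
	int value = nodelay ? 1 : 0;
	return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &value, sizeof(value));
}
```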
2022-12-20 22:13:53 +02:00
Artem Boldariev
f395cd4b3e Add isc_nm_streamdnssocket (aka Stream DNS)
This commit adds an initial implementation of isc_nm_streamdnssocket
transport: a unified transport for DNS messages over stream protocols,
which is capable of replacing both the TCP DNS and TLS DNS
transports. Currently, the interface it provides is a unified set of
the interfaces provided by both of the transports it attempts to replace.

The transport is built around "isc_dnsbuffer_t" and
"isc_dnsstream_assembler_t" objects and attempts to minimise both the
number of memory allocations during network transfers as well as
memory usage.
2022-12-20 22:13:51 +02:00
Artem Boldariev
338cf3e467 Add isc_dnsstream_assembler_t implementation
This commit adds the implementation for an "isc_dnsstream_assembler_t"
object. The object is built on top of "isc_dnsbuffer_t" and is
intended to encapsulate the state machine used for handling DNS
messages received in the format used for messages transmitted over
TCP.

The idea is that the object accepts the input data received from a
socket, tries to assemble DNS messages from the incoming data and
calls the callback which contains the status of the incoming data as
well as a pointer to the memory region referencing the data of the
assembled message. It is capable of assembling DNS messages no matter
how torn apart they are when sent over the network.

The following statuses might be passed to the callback:

* ISC_R_SUCCESS - a message has been successfully assembled;
* ISC_R_NOMORE  - not enough data has been processed to assemble a
  message;
* ISC_R_RANGE - there was an attempt to process a zero-sized DNS
  message (someone attempts to send us junk data).

One could say that the object replaces the implementation of
"isc__nm_*_processbuffer()" functions used by the old TCP DNS and TLS
DNS transports with a better defined state machine completely
decoupled from the networking code itself.

Such a design makes it trivial to write unit tests for it, leading to
better verification of its correctness.

Another important difference is directly related to the fact that it
is built on top of "isc_dnsbuffer_t", which tries to manage memory in
a smart way. In particular:

* It tries to use a static buffer for smaller messages, reducing
  pressure on the memory manager (hot path);
* When allocating dynamic memory for larger messages, it tries to
  allocate memory conservatively (generic path).

These characteristics are a significant upgrade over the older logic,
where a 64KB (+2 bytes) buffer was allocated from dynamic memory
regardless of whether we needed a buffer that large. That is,
lower memory usage is expected in the generic case for DNS transports
built on top of "isc_dnsstream_assembler_t".
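
A deliberately simple sketch of the state machine (the 2-byte big-endian length prefix used for DNS over TCP) with hypothetical names. It ignores the memory-management point above: to keep the framing logic visible, it uses exactly the kind of fixed 64KB+2 buffer the commit moves away from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical statuses mirroring ISC_R_SUCCESS, ISC_R_NOMORE and
 * ISC_R_RANGE. */
typedef enum { ASM_SUCCESS, ASM_NOMORE, ASM_RANGE } asm_status_t;

typedef void (*asm_cb_t)(asm_status_t status, const uint8_t *msg,
			 size_t len, void *arg);

/* Room for one maximum-size message plus its 2-byte length prefix. */
typedef struct {
	uint8_t buf[2 + 65535];
	size_t used;
} assembler_t;

static void
asm_init(assembler_t *as) {
	as->used = 0;
}

/* Feeds bytes as they arrive from the socket, reporting every complete
 * message, then ASM_NOMORE once more data is needed. */
static void
asm_feed(assembler_t *as, const uint8_t *data, size_t len, asm_cb_t cb,
	 void *arg) {
	while (len > 0) {
		size_t room = sizeof(as->buf) - as->used;
		size_t n = len < room ? len : room;
		memcpy(as->buf + as->used, data, n);
		as->used += n;
		data += n;
		len -= n;

		size_t off = 0;
		while (as->used - off >= 2) {
			size_t msglen = ((size_t)as->buf[off] << 8) |
					as->buf[off + 1];
			if (msglen == 0) {
				/* Zero-length message: junk, drop everything. */
				as->used = 0;
				cb(ASM_RANGE, NULL, 0, arg);
				return;
			}
			if (as->used - off < 2 + msglen) {
				break; /* message still incomplete */
			}
			cb(ASM_SUCCESS, as->buf + off + 2, msglen, arg);
			off += 2 + msglen;
		}
		/* Move the incomplete tail to the front of the buffer. */
		memmove(as->buf, as->buf + off, as->used - off);
		as->used -= off;
	}
	cb(ASM_NOMORE, NULL, 0, arg);
}

/* Example callback that just records what was reported. */
typedef struct {
	int messages, nomore, range;
	size_t last_len;
} asm_record_t;

static void
record_cb(asm_status_t status, const uint8_t *msg, size_t len, void *arg) {
	asm_record_t *rec = arg;
	(void)msg;
	switch (status) {
	case ASM_SUCCESS:
		rec->messages++;
		rec->last_len = len;
		break;
	case ASM_NOMORE:
		rec->nomore++;
		break;
	case ASM_RANGE:
		rec->range++;
		break;
	}
}
```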
2022-12-20 21:24:44 +02:00
Artem Boldariev
cbb758abd4 Add isc_dnsbuffer_t implementation
This commit adds "isc_dnsbuffer_t" object implementation, a thin
wrapper on top of "isc_buffer_t" which has the following
characteristics:

* provides an interface specifically attuned to handling/generating DNS
  messages, especially in the format used for DNS messages over TCP;
* avoids allocating dynamic memory when handling small DNS messages,
  while transparently switching to dynamic memory when handling
  larger messages. This approach significantly reduces pressure on the
  memory allocator, as most DNS messages are small.
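
A sketch of the small-buffer approach, assuming a 512-byte inline area (a hypothetical threshold) and doubling growth; `sbuf_t` and its functions are hypothetical, not isc_dnsbuffer_t.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define INLINE_SIZE 512 /* hypothetical threshold for "small" messages */

/* Note: base may point into the struct itself, so an sbuf_t must not be
 * copied by value. */
typedef struct {
	unsigned char inline_buf[INLINE_SIZE];
	unsigned char *base; /* inline_buf or heap memory */
	size_t size;
	size_t used;
} sbuf_t;

static void
sbuf_init(sbuf_t *b) {
	b->base = b->inline_buf;
	b->size = INLINE_SIZE;
	b->used = 0;
}

/* Appends data, switching to the heap only when the inline storage is
 * too small.  Returns 0 on success, -1 on allocation failure. */
static int
sbuf_append(sbuf_t *b, const void *data, size_t len) {
	if (len > b->size - b->used) {
		size_t newsize = b->size;
		while (newsize - b->used < len) {
			newsize *= 2;
		}
		unsigned char *p;
		if (b->base == b->inline_buf) {
			p = malloc(newsize);
			if (p != NULL) {
				memcpy(p, b->inline_buf, b->used);
			}
		} else {
			p = realloc(b->base, newsize);
		}
		if (p == NULL) {
			return -1;
		}
		b->base = p;
		b->size = newsize;
	}
	memcpy(b->base + b->used, data, len);
	b->used += len;
	return 0;
}

static void
sbuf_free(sbuf_t *b) {
	if (b->base != b->inline_buf) {
		free(b->base);
	}
	sbuf_init(b);
}
```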
2022-12-20 21:24:44 +02:00
Artem Boldariev
94e650ce89 Use 'restrict' and 'const' for 'isc_buffer_t'
The purpose of this commit is to aid the compiler in generating better
code when working with `isc_buffer_t` objects by using restricted
pointers (and, to a lesser extent, the 'const' modifier for read-only
arguments).

This way we basically tell the compiler that the members of
structures passed by pointer into the functions can be treated as
local variables in the scope of a function. That should reduce the
number of load/store operations emitted by compilers when accessing
objects (e.g. 'isc_buffer_t') via pointers.
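
A small illustration of the effect on a hypothetical, cut-down buffer type; whether a given compiler actually removes the reloads depends on its optimizer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down buffer structure. */
typedef struct {
	unsigned char *base;
	size_t length;
	size_t used;
} buf_t;

/*
 * `restrict` promises that the buf_t object is only reached through `b`
 * and that `src` does not overlap it, so the compiler may assume the
 * memcpy() cannot modify b->base or b->used and keep them in registers
 * instead of reloading them.
 */
static void
buf_putmem(buf_t *restrict b, const unsigned char *restrict src,
	   size_t len) {
	assert(b->length - b->used >= len);
	memcpy(b->base + b->used, src, len);
	b->used += len;
}
```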
2022-12-20 21:01:27 +02:00
Ondřej Surý
460afcda18
Add isc_buffer_trycompact() function needed for StreamDNS
Add isc_buffer_trycompact() as an optimization: it compacts the
buffer only when the remaining length is smaller than the used length.
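
A sketch of the heuristic on a hypothetical buffer type, reading "used length" as the part of the buffer already consumed (an interpretation of the text above).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical isc_buffer-like layout:
 * [0, current) consumed, [current, used) remaining, [used, length) free. */
typedef struct {
	unsigned char *base;
	size_t length;
	size_t used;
	size_t current;
} cbuf_t;

/* Compacts only when the data still to be read is smaller than the data
 * already consumed: the copy is then cheap relative to the space freed. */
static bool
cbuf_trycompact(cbuf_t *b) {
	size_t remaining = b->used - b->current;
	if (remaining >= b->current) {
		return false;
	}
	memmove(b->base, b->base + b->current, remaining);
	b->used = remaining;
	b->current = 0;
	return true;
}
```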
2022-12-20 19:13:48 +01:00
Ondřej Surý
e6062ee3ae
Add isc_buffer_setmctx() and isc_buffer_clearmctx() functions
Add two extra functions needed by StreamDNS:

1. isc_buffer_setmctx() sets the buffer internal memory context, so we
   can use isc_buffer_reserve() on the buffer.  For this, we also need
   to track whether the .base was dynamically allocated or not.  This
   needs to be called after isc_buffer_init() and before first
   isc_buffer_reserve() call.

2. isc_buffer_clearmctx() clears the buffer internal memory context, and
   frees any dynamically allocated buffer.  This needs to be called
   after the last isc_buffer_reserve() call and before calling
   isc_buffer_invalidate().
2022-12-20 19:13:48 +01:00
Ondřej Surý
8e3a86f6dd
Make the isc_buffer unit header-only
The isc_buffer is often used in the hot-path, so make it header-only
implementation.
2022-12-20 19:13:48 +01:00
Ondřej Surý
2ddea1e41c
Add a static pre-allocated buffer to isc_buffer_t
When the buffer is allocated via isc_buffer_allocate() and the size is
less than or equal to ISC_BUFFER_STATIC_SIZE (currently 512 bytes), the
buffer will be allocated as a flexible array member in the buffer
structure itself instead of allocating it on the heap.  This should help
when the buffer is used on the hot-path with small allocations.
2022-12-20 19:13:48 +01:00
Ondřej Surý
6bd2b34180
Enable auto-reallocation for all isc_buffer_allocate() buffers
When an isc_buffer_t buffer is created with isc_buffer_allocate(), assume
that we always want it to auto-reallocate instead of requiring an extra
call to enable auto-reallocation.
2022-12-20 19:13:48 +01:00
Ondřej Surý
135ec7a0f0
Remove single use isc_buffer_putdecint() function
The isc_buffer_putdecint() function could easily be replaced with
isc_buffer_printf() at the small cost of calling vsnprintf() twice
instead of once.  This is not on a hot path (dns_catz unit), so we
can ignore the overhead and have less single-use code in favor
of a more generic, reusable function.
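The "vsnprintf() twice" overhead comes from the usual measure-then-write pattern, sketched here with a hypothetical `sketch_aprintf()` that returns a heap string rather than writing into an `isc_buffer_t`:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Format into a freshly allocated string: the first vsnprintf() call
 * only measures the output, the second one writes it.
 */
static char *
sketch_aprintf(const char *fmt, ...) {
	va_list ap;

	va_start(ap, fmt);
	int n = vsnprintf(NULL, 0, fmt, ap); /* pass 1: measure */
	va_end(ap);
	if (n < 0) {
		return NULL;
	}

	char *out = malloc((size_t)n + 1);
	if (out == NULL) {
		return NULL;
	}

	va_start(ap, fmt); /* a va_list cannot be reused without restarting */
	vsnprintf(out, (size_t)n + 1, fmt, ap); /* pass 2: write */
	va_end(ap);
	return out;
}
```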
2022-12-20 19:13:48 +01:00
Ondřej Surý
2a94123d5b
Refactor the isc_buffer_{get,put}uintN, add isc_buffer_peekuintN
The Stream DNS implementation needs peek methods that read a value
from the buffer without advancing the current position.  Add
isc_buffer_peekuintN methods, refactor the isc_buffer_{get,put}uintN
methods to modern integer types, and move the isc_buffer_getuintN methods
to the header as static inline functions.
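Peeking matters for DNS over TCP, where each message is preceded by a two-byte big-endian length: the reader can check whether the whole message has arrived before consuming the prefix. A sketch on a hypothetical read-only buffer type (not the actual isc_buffer_peekuint16() signature):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
	const uint8_t *base;
	size_t current; /* read position */
	size_t used;    /* end of valid data */
} sketch_rbuf_t;

/* Read a big-endian (network order) uint16_t without moving the cursor. */
static uint16_t
sketch_peekuint16(const sketch_rbuf_t *b) {
	assert(b->used - b->current >= 2);
	return (uint16_t)(((uint16_t)b->base[b->current] << 8) |
			  b->base[b->current + 1]);
}

/* Same read, but consume the two bytes. */
static uint16_t
sketch_getuint16(sketch_rbuf_t *b) {
	uint16_t v = sketch_peekuint16(b);

	b->current += 2;
	return v;
}
```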
2022-12-20 19:13:48 +01:00
Ondřej Surý
a1d45685e6
Move and extend the uint8_t little-endian to uint{32,64}_t macros to endian.h
Move the U8TO{32,64}_LE and U{32,64}TO8_LE macros to endian.h and extend
the macros for 16-bit and Big-Endian variants.

Use the macros both in isc_siphash (LE) and isc_buffer (BE) units.
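Illustrative definitions of such byte-order macros, using hypothetical `SK_` names; the actual macros in endian.h may be written differently:

```c
#include <assert.h>
#include <stdint.h>

/* Little-endian load: p[0] is the least significant byte. */
#define SK_U8TO32_LE(p)                                 \
	(((uint32_t)(p)[0]) | ((uint32_t)(p)[1] << 8) | \
	 ((uint32_t)(p)[2] << 16) | ((uint32_t)(p)[3] << 24))

/* Big-endian load: p[0] is the most significant byte. */
#define SK_U8TO32_BE(p)                                        \
	(((uint32_t)(p)[0] << 24) | ((uint32_t)(p)[1] << 16) | \
	 ((uint32_t)(p)[2] << 8) | ((uint32_t)(p)[3]))

/* Big-endian store of a 32-bit value into four bytes. */
#define SK_U32TO8_BE(p, v)                     \
	do {                                   \
		(p)[0] = (uint8_t)((v) >> 24); \
		(p)[1] = (uint8_t)((v) >> 16); \
		(p)[2] = (uint8_t)((v) >> 8);  \
		(p)[3] = (uint8_t)(v);         \
	} while (0)
```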
2022-12-20 19:13:48 +01:00
Ondřej Surý
aea251f3bc
Change the isc_buffer_reserve() to take just buffer pointer
The isc_buffer_reserve() would be passed a reference to the buffer
pointer, which was unnecessary as the pointer would never be changed
in the current implementation.  Remove the extra dereference.
2022-12-20 19:13:48 +01:00
Artem Boldariev
837fef78b1 Fix TLS session resumption via IDs when Mutual TLS is used
This commit fixes TLS session resumption via session IDs when
client certificates are used. To do so it makes sure that session ID
contexts are set within server TLS contexts. See OpenSSL documentation
for 'SSL_CTX_set_session_id_context()', the "Warnings" section.
2022-12-14 18:06:20 +02:00
Ondřej Surý
e2262c2112
Remove isc_resource API and set limits directly in named_os unit
The only function left in the isc_resource API was setting the file
limit.  Replace the whole unit with a simple getrlimit() call that reads
the maximum (hard) value of RLIMIT_NOFILE, and raise the soft limit
(rlim_cur) to that maximum.

This is more compatible than trying to set RLIMIT_NOFILE to unlimited
(RLIM_INFINITY), as that doesn't work on Linux (see man 5 proc on
/proc/sys/fs/nr_open), nor does it work on the Darwin kernel (see man 2
getrlimit).

The only platforms where the maximum value could be raised by a privileged
user would be the BSDs, but `named_os_adjustnofile()` was not called
there before.  Previously, the increased limits were applied only on
Linux and Sun platforms.
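A minimal POSIX sketch of the approach, with a hypothetical `sketch_adjustnofile()` rather than the actual named_os code: read the limits and raise the soft limit to the hard limit instead of asking for RLIM_INFINITY.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/resource.h>

/*
 * Raise the soft RLIMIT_NOFILE limit to the hard limit.  Returns true
 * when the soft limit already equals the hard limit or was raised.
 */
static bool
sketch_adjustnofile(void) {
	struct rlimit rl;

	if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
		return false;
	}
	if (rl.rlim_cur == rl.rlim_max) {
		return true; /* nothing to do */
	}
	rl.rlim_cur = rl.rlim_max; /* unprivileged processes may do this */
	return setrlimit(RLIMIT_NOFILE, &rl) == 0;
}
```

If the hard limit is itself reported as unlimited, as can happen on Darwin, `setrlimit()` may reject the request and the sketch simply returns false.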
2022-12-07 19:40:00 +01:00
Ondřej Surý
50f357cb36
Refactor the dns_adb unit
The dns_adb unit has been refactored to be much simpler.  The following
changes have been made:

1. Simplify the ADB to always allow GLUE and hints

   There were only two places where dns_adb_createfind() was used - in
   the dns_resolver unit where hints and GLUE addresses were ok, and in
   the dns_zone where dns_adb_createfind() would be called without
   DNS_ADBFIND_HINTOK and DNS_ADBFIND_GLUEOK set.

   Simplify the logic by allowing hint and GLUE addresses when looking
   up the nameserver addresses to notify.  The difference is negligible
   and would cause a difference in the notified addresses only when
   there's mismatch between the parent and child addresses and we
   haven't cached the child addresses yet.

2. Drop the namebuckets and entrybuckets

   Formerly, the namebuckets and entrybuckets were used to reduce the
   lock contention when accessing the double-linked lists stored in each
   bucket.  In the previous refactoring, the custom hashtable for the
   buckets has been replaced with isc_ht/isc_hashmap, so only a single
   item (mostly, see below) would end up in each bucket.

   Removing the entrybuckets was straightforward, as the only matching
   was done on the isc_sockaddr_t member of the dns_adbentry.

   Removing the namebuckets required removing the GLUEOK and HINTOK
   bits, because a find could match entries with or without those bits
   set, and creating a custom key that stores
   DNS_ADBFIND_STARTATZONE in the first byte of the key, so we can do a
   straightforward lookup into the hashtable without traversing a list
   that contains items with different flags.

3. Remove unassociated entries from ADB database

   Previously, the adbentries could live in the ADB database even after
   unlinking them from dns_adbnames.  Such entries would show up as
   "Unassociated entries" in the ADB dump.  The benefit of keeping such
   entries is small, as the chance that we link such an entry to an
   adbname is low, and it's simpler to evict unlinked entries from the ADB
   cache (and the hashtable) than to create a second LRU cleaning mechanism.

   Unlinked ADB entries are now directly deleted from the hash
   table (hashmap) upon destruction.

4. Cleanup expired entries from the hash table

   When buckets were still in place, the code would keep the buckets
   always allocated and never shrink the hash table (hashmap).  With
   proper reference counting in place, we can delete the adbnames from
   the hash table and the LRU list.

5. Stop purging the names early when we hit the time limit

   Because the LRU list is now time ordered, we can stop purging the
   names when we find the first entry that doesn't fulfil our time-based
   eviction criteria, because no further entry on the LRU list will meet
   the criteria.
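The early stop in item 5 can be sketched as below. It assumes a hypothetical singly linked LRU list sorted by an `expire` timestamp, oldest first; the real ADB structures and eviction criteria are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical LRU node; the list is ordered from oldest to newest. */
typedef struct sketch_name {
	uint32_t expire;          /* absolute expiry time, seconds */
	struct sketch_name *next; /* next (newer or equal) entry */
} sketch_name_t;

/*
 * Unlink expired entries from the head of a time-ordered LRU list and
 * return how many were purged.  Because the list is sorted by expiry,
 * the first live entry means every later entry is live too, so the
 * scan stops there instead of walking the whole list.
 */
static size_t
sketch_purge_expired(sketch_name_t **headp, uint32_t now) {
	size_t purged = 0;

	assert(headp != NULL);
	while (*headp != NULL && (*headp)->expire <= now) {
		*headp = (*headp)->next; /* real code would also free the entry */
		purged++;
	}
	return purged;
}
```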

Future work:

1. Lock contention

   In this commit, the focus was on correctness of the data structure,
   but in the future, the lock contention in the ADB database needs to
   be addressed.  Currently, we use a simple mutex to lock the hash
   tables, because we almost always need to use a write lock for
   properly purging the hashtables.  The ADB database needs to be
   sharded (similar to the effect that buckets had in the past).  Each
   shard would contain its own hashmap and its own LRU list.

2. Time-based purging

   The ADB names and entries stay intact when there are no lookups.
   When we add separate shards, a timer needs to be added for time-based
   cleaning in case no traffic hashes to an inactive shard.

3. Revisit the 30 minutes limit

   The ADB cache is capped at 30 minutes.  This needs to be revisited,
   and at least the limit should be configurable (in both directions).
2022-11-30 10:03:24 +01:00
Ondřej Surý
118ae66976 Add extra set of ISC_REFCOUNT_TRACE_{IMPL,DECL} macros
The new ISC_REFCOUNT_TRACE_{IMPL,DECL} macros can be used to add a
reference tracing capability to any unit using the reference counting.
It requires a little bit of extra work in each header as you can't have
a define from inside a define (see rpz.h), but it's fairly easy to add
tracing to any struct using reference counting with these macros.
2022-11-29 23:57:40 -08:00
Tony Finch
00307fe318 Deduplicate time unit conversion factors
The various factors like NS_PER_MS are now defined in a single place
and the names are no longer inconsistent. I chose the _PER_SEC names
rather than _PER_S because they are slightly clearer in isolation;
but the smaller units are always NS, US, and MS.
2022-11-25 13:23:36 +00:00
Ondřej Surý
f46ce447a6
Add isc_hashmap API that implements Robin Hood hashing
Add a new isc_hashmap API that differs from the current isc_ht API in
several aspects:

1. It implements Robin Hood hashing, which is an open-addressing hash
   table algorithm (i.e. no linked lists).

2. No per-node memory allocations: the array that stores the nodes is
   made of isc_hashmap_node_t structures instead of just pointers, so
   memory is allocated only on resize.

3. The key is not copied into the hashmap node and must also be stored
   externally, either as part of the stored value or in any other
   location that's valid as long as the value is stored in the hashmap.

This makes the isc_hashmap_t a little less universal because of the key
storage requirements, but inserts and deletes are faster because
they don't require memory allocation on isc_hashmap_add() and memory
deallocation on isc_hashmap_delete().
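A compact sketch of Robin Hood insertion and lookup with nodes stored inline and keys stored externally (points 1 to 3 above). The fixed 16-slot table, the FNV-1a hash, and the `sk_` names are assumptions for illustration; resizing and deletion (usually done with backward shifting) are omitted, and this is not the isc_hashmap code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_CAP 16u /* fixed power-of-two capacity; no resizing */

/* Node stored inline in the array; the key itself lives elsewhere. */
typedef struct {
	const char *key; /* NULL marks an empty slot */
	uint32_t hash;
	uint32_t psl; /* probe sequence length: distance from home slot */
	void *value;
} sk_node_t;

typedef struct {
	sk_node_t table[SK_CAP];
	size_t count;
} sk_hashmap_t;

static uint32_t
sk_hash(const char *s) { /* FNV-1a, illustrative only */
	uint32_t h = 2166136261u;

	while (*s != '\0') {
		h ^= (uint8_t)*s++;
		h *= 16777619u;
	}
	return h;
}

static sk_node_t *
sk_find(sk_hashmap_t *map, const char *key) {
	uint32_t hash = sk_hash(key);
	size_t idx = hash & (SK_CAP - 1);

	for (uint32_t psl = 0; psl < SK_CAP; psl++) {
		sk_node_t *slot = &map->table[idx];
		/*
		 * Early exit: an empty slot, or a resident entry that is
		 * closer to its home slot than we are, means the key is absent.
		 */
		if (slot->key == NULL || slot->psl < psl) {
			return NULL;
		}
		if (slot->hash == hash && strcmp(slot->key, key) == 0) {
			return slot;
		}
		idx = (idx + 1) & (SK_CAP - 1);
	}
	return NULL;
}

static bool
sk_add(sk_hashmap_t *map, const char *key, void *value) {
	assert(key != NULL);
	if (map->count == SK_CAP || sk_find(map, key) != NULL) {
		return false; /* full (a real table would grow) or duplicate */
	}
	sk_node_t cur = { key, sk_hash(key), 0, value };
	size_t idx = cur.hash & (SK_CAP - 1);

	for (;;) {
		sk_node_t *slot = &map->table[idx];
		if (slot->key == NULL) {
			*slot = cur;
			map->count++;
			return true;
		}
		/* Robin Hood: the "poorer" entry (longer probe) takes the slot. */
		if (cur.psl > slot->psl) {
			sk_node_t tmp = *slot;
			*slot = cur;
			cur = tmp;
		}
		idx = (idx + 1) & (SK_CAP - 1);
		cur.psl++;
	}
}
```

The swap on insert keeps probe lengths even across entries, and it is what makes the early exit in `sk_find()` valid: once the probe distance exceeds the resident entry's distance, the key cannot be further along.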
2022-11-10 15:07:19 +01:00
Ondřej Surý
0492bbf590
Make the pthread_rwlock implementation header-only macros [2/2]
While using mutrace, the pthread_rwlock-based isc_rwlock implementation
would be tracked entirely in the rwlock.c unit, losing all useful
information, as all rwlocks would be traced in a single place.  Rewrite the
pthread_rwlock-based implementation to be header-only macros, so we can
use mutrace to properly track the rwlock contention without heavily
patching mutrace to understand the libisc synchronization primitives.
2022-11-02 10:34:10 +01:00
Ondřej Surý
6bd201ccec
Remove one level of indirection from isc_rwlock [1/2]
Instead of checking PTHREAD_RUNTIME_CHECK in the header, move it
to the pthread_rwlock implementation functions.  The internal isc_rwlock
actually cannot fail, so the checks in the header were useless anyway.
2022-11-02 10:27:09 +01:00
Ondřej Surý
98b7a93772
Remove isc_rwlock_downgrade() from isc_rwlock
The isc_rwlock_downgrade() is not used anywhere, so we can remove it and
make the pthread_rwlock implementation simpler.
2022-11-02 09:05:37 +01:00
Evan Hunt
dc878e3098 isc_async_run() runs events in reverse order
when more than one event was scheduled in the isc_async queue,
they were executed in reverse order. we need to pull events
off the back of the queue instead of the front, so that uv_loop will
run them in the right order.

note that isc_job_run() has the same behavior, because it calls
uv_idle_start() directly. in that case we just document it so
it'll be less surprising in the future.
2022-10-31 05:43:45 -07:00
Mark Andrews
3881afeb15 Add dns_rdata_checksvcb
dns_rdata_checksvcb performs data entry checks on SVCB records:
in particular, that _dns SVCB records have an 'alpn' parameter and,
if that 'alpn' parameter indicates HTTP is in use, that 'dohpath' is present.
2022-10-29 00:22:54 +11:00
Ondřej Surý
6ba0a22627
Change the return type of isc_lex_create() to void
The isc_lex_create() cannot fail, so cleanup the return type from
isc_result_t to void.
2022-10-26 12:55:06 +02:00
Ondřej Surý
5e20c2ccfb
Replace (void *)-1 with ISC_LINK_TOMBSTONE
Instead of using the "arbitrary" (void *)-1 to mark a non-linked element,
add an ISC_LINK_TOMBSTONE(type) macro that replaces the "magic" value with
a define.
2022-10-18 11:36:15 +02:00
Ondřej Surý
cb3c36b8bf
Add ISC_{LIST,LINK}_INITIALIZER for designated initializers
We use designated initializers, but there were no initializers for
ISC_LIST and ISC_LINK.  Add them, so you can do

    *foo = (foo_t){ .list = ISC_LIST_INITIALIZER };

Instead of:

    *foo = (foo_t){ 0 };
    ISC_LIST_INIT(foo->list);
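Roughly what such initializers can expand to, using hypothetical `SK_` stand-ins for ISC_LIST, ISC_LINK and the tombstone value; the real definitions in libisc may differ:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for ISC_LIST()/ISC_LINK() and their initializers. */
#define SK_LIST(type) struct { type *head, *tail; }
#define SK_LIST_INITIALIZER { .head = NULL, .tail = NULL }

/* An unlinked element points at a non-NULL "tombstone" value. */
#define SK_LINK_TOMBSTONE(type) ((type *)-1)
#define SK_LINK(type) struct { type *prev, *next; }
#define SK_LINK_INITIALIZER(type) \
	{ .prev = SK_LINK_TOMBSTONE(type), .next = SK_LINK_TOMBSTONE(type) }

typedef struct item item_t;
struct item {
	int value;
	SK_LINK(item_t) link;
};

typedef struct {
	int id;
	SK_LIST(item_t) items;
} foo_t;

/* Reset *foo in one statement; omitted members are zero-initialized. */
static void
foo_reset(foo_t *foo, int id) {
	assert(foo != NULL);
	*foo = (foo_t){ .id = id, .items = SK_LIST_INITIALIZER };
}
```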
2022-10-18 11:36:15 +02:00
Tony Finch
26ed03a61e Include the function name when reporting unexpected errors
I.e. print the name of the function in BIND that called the system
function that returned an error. Since it was useful for pthreads
code, it seems worthwhile doing so everywhere.
2022-10-17 13:43:59 +01:00
Tony Finch
a34a2784b1 De-duplicate some calls to strerror_r()
Specifically, when reporting an unexpected or fatal error.
2022-10-17 11:58:26 +01:00
Tony Finch
ec50c58f52 De-duplicate __FILE__, __LINE__
Mostly generated automatically with the following semantic patch,
except where coccinelle was confused by #ifdef in lib/isc/net.c

@@ expression list args; @@
- UNEXPECTED_ERROR(__FILE__, __LINE__, args)
+ UNEXPECTED_ERROR(args)
@@ expression list args; @@
- FATAL_ERROR(__FILE__, __LINE__, args)
+ FATAL_ERROR(args)
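The point of moving `__FILE__` and `__LINE__` into the macro is that the call site is captured automatically; passing `__func__` as well names the calling function. A sketch with hypothetical names (`SKETCH_UNEXPECTED_ERROR`, `sketch_error_unexpected()`) that formats into a static buffer so the result can be inspected:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

static char sketch_last_error[256]; /* demo sink; real code logs instead */

static void
sketch_error_unexpected(const char *file, int line, const char *func,
			const char *fmt, ...) {
	va_list ap;

	snprintf(sketch_last_error, sizeof(sketch_last_error),
		 "%s:%d:%s(): unexpected error: ", file, line, func);
	size_t used = strlen(sketch_last_error); /* correct even if truncated */

	va_start(ap, fmt);
	vsnprintf(sketch_last_error + used, sizeof(sketch_last_error) - used,
		  fmt, ap);
	va_end(ap);
	fprintf(stderr, "%s\n", sketch_last_error);
}

/* Call sites pass only the message; the location is filled in here. */
#define SKETCH_UNEXPECTED_ERROR(...) \
	sketch_error_unexpected(__FILE__, __LINE__, __func__, __VA_ARGS__)
```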
2022-10-17 11:58:26 +01:00
Ondřej Surý
cedfc97974 Improve reporting for pthread_once errors
Replace all uses of RUNTIME_CHECK() in lib/isc/include/isc/once.h with
PTHREADS_RUNTIME_CHECK(), in order to improve error reporting for any
once-related run-time failures (by augmenting error messages with
file/line/caller information and the error string corresponding to
the returned error code).
2022-10-14 16:39:21 +02:00
Ondřej Surý
beecde7120 Rewrite isc_httpd using picohttpparser and isc_url_parse
Rewrite the isc_httpd to be more robust.

1. Replace the hand-crafted HTTP request parser with picohttpparser for
   parsing the whole HTTP/1.0 and HTTP/1.1 requests.  Limit the number
   of allowed headers to 10 (arbitrary number).

2. Replace the hand-crafted URL parser with isc_url_parse for parsing
   the URL from the HTTP request.

3. Increase the receive buffer to match the isc_netmgr buffers, so we
   can at least receive two full isc_nm_read()s.  This makes the
   truncation processing much simpler.

4. Process the received buffer from a single isc_nm_read() in a single
   loop and schedule the sends to be independent of each other.

The first two changes make the code simpler and rely on existing
libraries that we either already had (isc_url, based on nodejs) or that
are used elsewhere (picohttpparser).

The last two changes remove the artificial "truncation" limit on
parsing multiple requests.  Now only a request that has too many
headers (currently 10) or is too big (so the receive buffer fills up
without reaching the end of the request) will end the connection.

We can be benevolent with the limits here, because the statistics
channel is by definition private and access must be allowed only to
administrators of the server.  There are no timers, no rate-limiting, no
upper limit on the number of requests that can be served, etc.
2022-10-14 11:26:54 +02:00
Ondřej Surý
dbf5672f32
Replace isc_mem_*_aligned(..., alignment) with isc_mem_*x(..., flags)
Previously, isc_mem_get_aligned() and friends took the alignment size as
one of the arguments.  Replace the specific functions with a more generic
extended variant that now accepts ISC_MEM_ALIGN(alignment) for aligned
allocations and ISC_MEM_ZERO for allocations that zero
the (re-)allocated memory before returning the pointer to the caller.
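A sketch of a single flags-based allocator. The flag layout (log2 of the alignment in the low six bits, a separate zeroing bit) is a jemalloc-style assumption, and `sk_mem_getx()` is hypothetical; it is not the isc_mem implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flag layout: low six bits hold log2(alignment). */
#define SK_MEM_LG_MASK 0x3f
#define SK_MEM_ZERO    0x40

/* Encode a power-of-two alignment as flag bits. */
static int
sk_mem_align(size_t alignment) {
	int lg = 0;

	assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
	while (((size_t)1 << lg) < alignment) {
		lg++;
	}
	return lg;
}

/* One extended allocator instead of separate *_aligned() variants. */
static void *
sk_mem_getx(size_t size, int flags) {
	size_t align = (size_t)1 << (flags & SK_MEM_LG_MASK);
	void *ptr;

	if (align < sizeof(void *)) {
		align = sizeof(void *);
	}
	/* C11 aligned_alloc() wants the size to be a multiple of align. */
	size_t rounded = (size + align - 1) & ~(align - 1);
	ptr = aligned_alloc(align, rounded);
	if (ptr != NULL && (flags & SK_MEM_ZERO) != 0) {
		memset(ptr, 0, size);
	}
	return ptr;
}
```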
2022-10-05 16:44:05 +02:00
Ondřej Surý
c14a4ac763
Add a case-insensitive option directly to siphash 2-4 implementation
Formerly, isc_hash32() had to lowercase the key in a local copy
to make the hash case-insensitive.  Change the isc_siphash24() and
isc_halfsiphash24() functions to lowercase the input directly when
reading it from memory and converting the uint8_t * array to
64-bit (respectively, 32-bit) numbers.
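A sketch of lowercasing during the load, with hypothetical names; it shows a plain little-endian 64-bit load and is not the isc_siphash24() code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ASCII-only lowercase; leaves non-letters (and bytes >= 0x80) alone. */
static uint8_t
sk_lower(uint8_t c) {
	return (c >= 'A' && c <= 'Z') ? (uint8_t)(c + ('a' - 'A')) : c;
}

/*
 * Load 8 bytes as a little-endian uint64_t, optionally lowercasing
 * each byte on the way in instead of copying the key first.
 */
static uint64_t
sk_u8to64_le(const uint8_t *p, bool case_insensitive) {
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--) { /* p[7] ends up most significant */
		uint8_t c = case_insensitive ? sk_lower(p[i]) : p[i];
		v = (v << 8) | c;
	}
	return v;
}
```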
2022-10-04 10:32:40 +02:00
Mark Andrews
5f07fe8cbb Use strnstr implementation from FreeBSD if not provided by OS
2022-10-04 14:21:41 +11:00
Ondřej Surý
477eb22c12
Refactor isc_ratelimiter API
Because dns_zonemgr_create() was run before the loopmgr was started,
the isc_ratelimiter API was more complicated than it had to be.  Move
dns_zonemgr_create() to the run_server() task, which runs on the main
loop, and simplify the isc_ratelimiter API implementation.

The isc_timer is now created in isc_ratelimiter_create(), and
starting the timer is now a separate async task, as is destroying the
timer when it's not launched from the loop it was created on.  The
ratelimiter tick no longer needs the logic to create and destroy the
timer; it just stops the timer when there's no more work to do.

This should also solve all the races that were causing the
isc_ratelimiter to be left dangling because the timer was stopped before
the last reference was detached.
2022-09-30 10:36:30 +02:00
Ondřej Surý
1e2ededb07
Add missing DbC check for name##_detach in ISC_REFCOUNT_IMPL macro
The detach function in the ISC_REFCOUNT_IMPL macro was missing DbC
checks; add them.
2022-09-30 09:50:17 +02:00
Ondřej Surý
e537fea861
Use custom isc_mem based allocator for libxml2
The libxml2 library provides a way to replace the default allocator with
a user-supplied allocator (malloc, realloc, strdup and free).

Create a memory context specifically for libxml2 to allow tracking the
memory usage that has originated from within libxml2.  This will provide
a separate memory context for libxml2 to track the allocations and when
shutting down the application it will check that all libxml2 allocations
were returned to the allocator.

Additionally, move the xmlInitParser() and xmlCleanupParser() calls from
bin/named/main.c to library constructor/destructor in libisc library.
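The idea of a dedicated, accounting memory context can be sketched without libxml2 itself. For libxml2, such hooks would be installed with xmlMemSetup(), which takes free, malloc, realloc and strdup functions; the counter-based `sk_` context below is a hypothetical stand-in for isc_mem.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-library accounting; not isc_mem. */
typedef struct {
	size_t allocations; /* live allocation count */
} sk_memctx_t;

static sk_memctx_t sk_libctx; /* one context dedicated to the library */

static void *
sk_malloc(size_t size) {
	void *p = malloc(size);

	if (p != NULL) {
		sk_libctx.allocations++;
	}
	return p;
}

/* Callers are assumed never to pass size 0 (implementation-defined). */
static void *
sk_realloc(void *ptr, size_t size) {
	void *p = realloc(ptr, size);

	if (ptr == NULL && p != NULL) {
		sk_libctx.allocations++; /* realloc(NULL, n) acts like malloc */
	}
	return p;
}

static void
sk_free(void *ptr) {
	if (ptr != NULL) {
		assert(sk_libctx.allocations > 0);
		sk_libctx.allocations--;
	}
	free(ptr);
}

static char *
sk_strdup(const char *s) {
	size_t len = strlen(s) + 1;
	char *copy = sk_malloc(len);

	if (copy != NULL) {
		memcpy(copy, s, len);
	}
	return copy;
}

/* At shutdown: every allocation made by the library must be returned. */
static void
sk_memctx_check(void) {
	assert(sk_libctx.allocations == 0);
}
```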
2022-09-27 17:10:42 +02:00
Ondřej Surý
236d4b7739
Use custom isc_mem based allocator for OpenSSL
The OpenSSL library provides a way to replace the default allocator with
a user-supplied allocator (malloc, realloc, and free).

Create a memory context specifically for OpenSSL to allow tracking the
memory usage that has originated from within OpenSSL.  This will provide
a separate memory context for OpenSSL to track the allocations and when
shutting down the application it will check that all OpenSSL allocations
were returned to the allocator.
2022-09-27 17:10:42 +02:00
Ondřej Surý
a32d06dd42
Use custom isc_mem based allocator for libuv
The libuv library provides a way to replace the default allocator with
a user-supplied allocator (malloc, realloc, calloc and free).

Create a memory context specifically for libuv to allow tracking the
memory usage that has originated from within libuv.  This requires
libuv >= 1.38.0 which provides uv_library_shutdown() function that
assures no more allocations will be made.
2022-09-27 17:10:42 +02:00
Ondřej Surý
0086ebf3fc
Bump the libuv requirement to libuv >= 1.34.0
By bumping the minimum libuv version to 1.34.0, we can remove all the
libuv shims we ever had, which makes the code much cleaner.  An
up-to-date libuv is available in all distributions supported by BIND
9.19+ either natively or as a backport.
2022-09-27 17:09:10 +02:00
Evan Hunt
1926ddc987 change ISC__BUFFER macros to inline functions
previously, when ISC_BUFFER_USEINLINE was defined, macros were
used to implement isc_buffer primitives (isc_buffer_init(),
isc_buffer_region(), etc). these macros were missing the DbC
assertions for those primitives, which made it possible for
coding errors to go undetected.

adding the assertions to the macros caused compiler warnings on
some platforms. therefore, this commit converts the ISC__BUFFER
macros to static inline functions instead, with assertions included,
and eliminates the non-inline implementation from buffer.c.

the --enable-buffer-useinline configure option has been removed.
2022-09-26 23:49:27 -07:00
Ondřej Surý
1baed21688
Switch the CSPRNG function from RAND_bytes() to uv_random()
The RAND_bytes() implementation differs between the OpenSSL versions and
uses the system entropy only for seeding its internal CSPRNG.  The
uv_random() on the other hand uses the system provided CSPRNG.

Switch from RAND_bytes() to uv_random() to use system provided CSPRNG.
2022-09-26 15:13:11 +02:00
Ondřej Surý
fffd444440
Cleanup the asynchronous code in the stream implementations
After the loopmgr work has been merged, we can now clean up the TCP and
TLS protocols a little bit, because there are stronger guarantees that
the sockets will be kept on their respective loops/threads.  We only need
asynchronous calls for listening sockets (start, stop) and for reading
from TCP (because isc_nm_read() might be called from the read callback
again).

This commit does the following changes (they are intertwined together):

1. Clean up most of the asynchronous events in the TCP code, and add
   comments for the events that need to be kept asynchronous.

2. Remove isc_nm_resumeread() from the netmgr API, and replace
   isc_nm_resumeread() calls with existing isc_nm_read() calls.

3. Remove isc_nm_pauseread() from the netmgr API, and replace
   isc_nm_pauseread() calls with a new isc_nm_read_stop() call.

4. Disable the isc_nm_cancelread() for the streaming protocols, only the
   datagram-like protocols can use isc_nm_cancelread().

5. Add isc_nmhandle_close() that can be used to shut down the socket
   earlier than after the last detach.  Formerly, the socket would be
   closed only after all reading and sending had finished and the
   last reference had been detached.  The new isc_nmhandle_close() can
   be used to close the underlying socket earlier, so all the other
   asynchronous calls would call their respective callbacks immediately.

Co-authored-by: Ondřej Surý <ondrej@isc.org>
Co-authored-by: Artem Boldariev <artem@isc.org>
2022-09-22 14:51:15 +02:00
Ondřej Surý
869c6d77a2 Convert isc_ratelimiter API to use on-loop timers
In preparation for the on-loop timers, the isc_ratelimiter API was
converted to use the timer on main loop and start and stop the timer
asynchronously on the main loop.
2022-09-21 14:25:33 -07:00
Ondřej Surý
27d1e498b8 Add isc_timer_async_destroy() helper function
As the object using isc_timer_t is sometimes destroyed by detaching
all of its references, with no guarantee that the last detach happens
on the matching thread, add a helper isc_timer_async_destroy() function
that stops the timer and runs the destroy function via isc_async_run()
on the matching thread.
2022-09-21 14:25:33 -07:00
Ondřej Surý
f6e4f620b3
Use the semantic patch to do the unsigned -> unsigned int change
Apply the semantic patch on the whole code base to get rid of 'unsigned'
usage in favor of explicit 'unsigned int'.
2022-09-19 15:56:02 +02:00
Tony Finch
21a383a8fd General-purpose unrolled ASCII tolower() loops
When converting a string to lower case, the compiler is able to
autovectorize well, so a simple implementation is also very
fast, comparable to memcpy().

Comparisons are more difficult for the compiler, so we convert eight
bytes at a time using "SIMD within a register" tricks. Experiments
indicate it's best to stick to simple loops for shorter strings and
the remainder of long strings.
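A sketch of the "SIMD within a register" trick for case-insensitive comparison: eight bytes are loaded into a uint64_t, ASCII uppercase letters are lowered with carry-free arithmetic, and a plain byte loop handles the tail. Names are hypothetical and this is not the libisc code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_ONES 0x0101010101010101ull

/* Lowercase the ASCII letters in 8 bytes at once. */
static uint64_t
sk_tolower8(uint64_t octets) {
	/* Clear bit 7 so the additions below cannot carry between bytes. */
	uint64_t heptets = octets & (0x7f * SK_ONES);
	uint64_t is_gt_Z = heptets + ((0x7f - 'Z') * SK_ONES); /* bit 7: > 'Z' */
	uint64_t is_ge_A = heptets + ((0x80 - 'A') * SK_ONES); /* bit 7: >= 'A' */
	uint64_t is_ascii = ~octets & (0x80 * SK_ONES);
	uint64_t is_upper = is_ascii & (is_ge_A ^ is_gt_Z);

	return octets | (is_upper >> 2); /* 0x80 >> 2 == 0x20, the case bit */
}

static bool
sk_caseequal(const char *a, const char *b, size_t len) {
	while (len >= 8) {
		uint64_t x, y;

		memcpy(&x, a, 8); /* unaligned-safe loads */
		memcpy(&y, b, 8);
		if (sk_tolower8(x) != sk_tolower8(y)) {
			return false;
		}
		a += 8;
		b += 8;
		len -= 8;
	}
	/* Simple byte loop for the remainder. */
	while (len-- > 0) {
		unsigned char ca = (unsigned char)*a++;
		unsigned char cb = (unsigned char)*b++;

		if (ca >= 'A' && ca <= 'Z') {
			ca += 0x20;
		}
		if (cb >= 'A' && cb <= 'Z') {
			cb += 0x20;
		}
		if (ca != cb) {
			return false;
		}
	}
	return true;
}
```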
2022-09-12 12:18:57 +01:00
Tony Finch
27a561273e Consolidate some ASCII tables in isc/ascii and isc/hex
There were a number of places that had copies of various ASCII
tables (case conversion, hex and decimal conversion) that are intended
to be faster than the ctype.h macros, or avoid locale pollution.

Move them into libisc, and wrap the lookup tables with macros that
avoid the ctype.h gotchas.
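The main ctype.h gotcha is that tolower() and friends take an int that must be representable as unsigned char (or be EOF), so passing a negative plain char is undefined behaviour; the functions also depend on the current locale. A table-based sketch with hypothetical names; filling the table at runtime is only for brevity:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 256-entry table; the index is always 0..255. */
static uint8_t sk_ascii_tolower_table[256];

static void
sk_ascii_init(void) {
	for (int c = 0; c < 256; c++) {
		sk_ascii_tolower_table[c] = (c >= 'A' && c <= 'Z')
						    ? (uint8_t)(c + 32)
						    : (uint8_t)c;
	}
}

/*
 * Casting through uint8_t keeps a negative plain char (e.g. 0xE9 on
 * signed-char platforms) from becoming a negative index, and the
 * table ignores the locale entirely.
 */
#define SK_ASCII_TOLOWER(c) (sk_ascii_tolower_table[(uint8_t)(c)])
```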
2022-09-12 12:18:57 +01:00
Michał Kępień
3b1c80fd0f Fix error reporting for POSIX Threads functions
Commit 3608abc8fa6a33046e1d34a0789cf7c9547f09ad inadvertently carried
over a mistake in logging pthread_cond_init() errors to the
ERRNO_CHECK() preprocessor macro: instead of passing the value returned
by a given pthread_*() function to strerror_r(), ERRNO_CHECK() passes
the errno variable to strerror_r().  This causes bogus error reports
because POSIX Threads API functions do not set the errno variable.

Fix by passing the value returned by a given pthread_*() function
instead of the errno variable to strerror_r().  Since this change makes
the name of the affected macro (ERRNO_CHECK()) confusing, rename the
latter to PTHREADS_RUNTIME_CHECK().  Also log the integer error value
returned by a given pthread_*() function verbatim to rule out any
further confusion in runtime error reporting.
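A sketch of the corrected check, with hypothetical names. The real macro aborts on failure; this one writes the message into a buffer so it can be inspected. The key point is that the decoded error is the function's return value, not errno, and the raw integer is logged as well.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static char sk_last_failure[256]; /* demo sink; the real check aborts */

#define SK_PTHREADS_RUNTIME_CHECK(func, ret)                               \
	do {                                                               \
		int sk_ret_ = (ret); /* the pthread_*() return value */    \
		if (sk_ret_ != 0) {                                        \
			snprintf(sk_last_failure, sizeof(sk_last_failure), \
				 "%s:%d: %s(): %s (%d)", __FILE__,         \
				 __LINE__, #func, strerror(sk_ret_),       \
				 sk_ret_);                                 \
			/* real code would log this and abort() here */    \
		}                                                          \
	} while (0)
```

A call site would look like `SK_PTHREADS_RUNTIME_CHECK(pthread_mutex_lock, pthread_mutex_lock(&m));`. Note that strerror() is not thread-safe, which is one reason real code prefers strerror_r().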
2022-09-09 20:25:47 +02:00
Ondřej Surý
4d07768a09
Remove the isc_app API
The isc_app API is no longer used and has been removed.
2022-08-26 09:09:25 +02:00
Ondřej Surý
b69e783164
Update netmgr, tasks, and applications to use isc_loopmgr
Previously:

* applications were using isc_app as the base unit for running the
  application and signal handling.

* networking was handled in the netmgr layer, which would start a
  number of threads, each with a uv_loop event loop.

* task/event handling was done in the isc_task unit, which used
  netmgr event loops to run the isc_event calls.

In this refactoring:

* the network manager now uses isc_loop instead of maintaining its
  own worker threads and event loops.

* the taskmgr that manages isc_task instances now also uses isc_loopmgr,
  and every isc_task runs on a specific isc_loop bound to the specific
  thread.

* applications have been updated as necessary to use the new API.

* new ISC_LOOP_TEST macros have been added to enable unit tests to
  run isc_loop event loops. unit tests have been updated to use this
  where needed.
2022-08-26 09:09:24 +02:00
Ondřej Surý
49b149f5fd
Update isc_timer to use isc_loopmgr
* isc_timer was rewritten using the uv_timer, and isc_timermgr_t was
  completely removed; isc_timer objects are now directly created on the
  isc_loop event loops.

* the isc_timer API has been simplified. the "inactive" timer type has
  been removed; timers are now stopped by calling isc_timer_stop()
  instead of resetting to inactive.

* isc_manager now creates a loop manager rather than a timer manager.

* modules and applications using isc_timer have been updated to use the
  new API.
2022-08-25 17:17:07 +02:00
Ondřej Surý
84c90e223f
New event loop handling API
This commit introduces new APIs for applications and signal handling,
intended to replace isc_app for applications built on top of libisc.

* isc_app will be replaced with isc_loopmgr, which handles the
  starting and stopping of applications. In isc_loopmgr, the main
  thread is not blocked, but is part of the working thread set.
  The loop manager will start a number of threads, each with a
  uv_loop event loop running. Setup and teardown functions can be
  assigned which will run when the loop starts and stops, and
  jobs can be scheduled to run in the meantime. When
  isc_loopmgr_shutdown() is run from any of the loops, all loops
  will shut down and the application can terminate.

* signal handling will now be handled with a separate isc_signal unit.
  isc_loopmgr only handles SIGTERM and SIGINT for application
  termination, but the application may install additional signal
  handlers, such as SIGHUP as a signal to reload configuration.

* new job running primitives, isc_job and isc_async, have been added.
  Both units schedule callbacks (specifying a callback function and
  argument) on an event loop. The difference is that isc_job unit is
  unlocked and not thread-safe, so it can be used to efficiently
  run jobs in the same thread, while isc_async is thread-safe and
  uses locking, so it can be used to pass jobs from one thread to
  another.

* isc_tid will be used to track the thread ID in isc_loop worker
  threads.

* unit tests have been added for the new APIs.
2022-08-25 12:24:29 +02:00
Ondřej Surý
a26862e653
Simplify the isc_event API
The ev_tag field was never used, and has now been removed.
2022-08-25 12:24:25 +02:00
Michał Kępień
b67ff4728f Improve reporting for barrier errors
uv_barrier_init() errors are currently ignored.  Use UV_RUNTIME_CHECK()
to catch them and to improve error reporting for any uv_barrier_init()
run-time failures (by augmenting error messages with file/line
information and the error string corresponding to the value returned).
2022-07-13 13:19:32 +02:00
Michał Kępień
7009f9d270 Improve reporting for read-write lock errors
Replace direct uses of implementation-specific rwlock functions in
lib/isc/include/isc/rwlock.h with preprocessor macros that use
ERRNO_CHECK(), in order to augment rwlock-related error messages with
file/line/caller information and the error string corresponding to
errno.  Adjust the implementation-specific functions for pthreads-based
rwlocks so that they return any errors encountered to the caller instead
of aborting execution immediately using RUNTIME_CHECK().

To keep code modifications simple, make the non-pthreads-based
implementation-specific rwlock functions always return 0; these
functions continue to handle errors using less verbose run-time
assertions as they do not set errno anyway.
2022-07-13 13:19:32 +02:00
Michał Kępień
badeeff0ac Improve reporting for condition variable errors
Replace all uses of RUNTIME_CHECK() in lib/isc/include/isc/condition.h
with ERRNO_CHECK(), in order to improve error reporting for any
condition-variable-related run-time failures (by augmenting error
messages with file/line/caller information and the error string
corresponding to errno).
2022-07-13 13:19:32 +02:00
Michał Kępień
f352a834a7 Improve reporting for mutex errors
Replace all uses of RUNTIME_CHECK() in lib/isc/include/isc/mutex.h with
ERRNO_CHECK(), in order to improve error reporting for any mutex-related
run-time failures (by augmenting error messages with file/line/caller
information and the error string corresponding to errno).
2022-07-13 13:19:32 +02:00
Michał Kępień
77aead5ab6 Enable tracking of pthreads barriers
Some POSIX threads implementations (e.g. FreeBSD's libthr) allocate
memory on the heap when pthread_barrier_init() is called.  Every call to
that function must be accompanied by a corresponding call to
pthread_barrier_destroy() or else the memory allocated for the barrier
will leak.

jemalloc can be used for detecting memory allocations which are not
released by a process when it exits.  Unfortunately, since jemalloc is
also the system allocator on FreeBSD and a special (profiling-enabled)
build of jemalloc is required for memory leak detection, this method
cannot be used for detecting leaked memory allocated by libthr on a
stock FreeBSD installation.

However, libthr's behavior can be emulated on any platform by
implementing alternative versions of libisc functions for creating and
destroying barriers that allocate memory using malloc() and release it
using free().  This enables using jemalloc for detecting missing
pthread_barrier_destroy() calls on any platform on which it works
reliably.

When the newly introduced ISC_TRACK_PTHREADS_OBJECTS preprocessor macro
is set, allocate isc_barrier_t structures on the heap in
isc_barrier_init() and free them in isc_barrier_destroy().  Reuse
existing barrier macros (after renaming them appropriately) for other
operations.
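The tracking pattern can be sketched with any pthreads object. This version uses pthread_mutex_t only because pthread barriers are missing on some platforms; `SK_TRACK_PTHREADS_OBJECTS` and the `sk_lock_*` names are hypothetical stand-ins for ISC_TRACK_PTHREADS_OBJECTS and the libisc wrappers.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define SK_TRACK_PTHREADS_OBJECTS 1 /* hypothetical build-time switch */

#if SK_TRACK_PTHREADS_OBJECTS
/* Tracked: the object lives on the heap, so a missing destroy call
 * leaks memory that a leak-checking allocator can report. */
typedef pthread_mutex_t *sk_lock_t;
#define SK_LOCK_PTR(lp) (*(lp))

static void
sk_lock_init(sk_lock_t *lp) {
	*lp = malloc(sizeof(**lp));
	if (*lp == NULL || pthread_mutex_init(*lp, NULL) != 0) {
		abort();
	}
}

static void
sk_lock_destroy(sk_lock_t *lp) {
	if (pthread_mutex_destroy(*lp) != 0) {
		abort();
	}
	free(*lp);
	*lp = NULL;
}
#else
/* Untracked: the object is embedded directly, as usual. */
typedef pthread_mutex_t sk_lock_t;
#define SK_LOCK_PTR(lp) (lp)

static void
sk_lock_init(sk_lock_t *lp) {
	if (pthread_mutex_init(lp, NULL) != 0) {
		abort();
	}
}

static void
sk_lock_destroy(sk_lock_t *lp) {
	if (pthread_mutex_destroy(lp) != 0) {
		abort();
	}
}
#endif

/* Other operations are shared; only the pointer access differs. */
static void
sk_lock(sk_lock_t *lp) {
	if (pthread_mutex_lock(SK_LOCK_PTR(lp)) != 0) {
		abort();
	}
}

static void
sk_unlock(sk_lock_t *lp) {
	if (pthread_mutex_unlock(SK_LOCK_PTR(lp)) != 0) {
		abort();
	}
}
```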
2022-07-13 13:19:32 +02:00
Ondřej Surý
e4606da2c6 Enable tracking of pthreads rwlocks
Some POSIX threads implementations (e.g. FreeBSD's libthr) allocate
memory on the heap when pthread_rwlock_init() is called.  Every call to
that function must be accompanied by a corresponding call to
pthread_rwlock_destroy() or else the memory allocated for the rwlock
will leak.

jemalloc can be used for detecting memory allocations which are not
released by a process when it exits.  Unfortunately, since jemalloc is
also the system allocator on FreeBSD and a special (profiling-enabled)
build of jemalloc is required for memory leak detection, this method
cannot be used for detecting leaked memory allocated by libthr on a
stock FreeBSD installation.

However, libthr's behavior can be emulated on any platform by
implementing alternative versions of libisc functions for creating and
destroying rwlocks that allocate memory using malloc() and release it
using free().  This enables using jemalloc for detecting missing
pthread_rwlock_destroy() calls on any platform on which it works
reliably.

When the newly introduced ISC_TRACK_PTHREADS_OBJECTS preprocessor macro
is set (and --enable-pthread-rwlock is used), allocate isc_rwlock_t
structures on the heap in isc_rwlock_init() and free them in
isc_rwlock_destroy().  Reuse existing functions defined in
lib/isc/rwlock.c for other operations, but rename them first, so that
they contain triple underscores (to indicate that these functions are
implementation-specific, unlike their mutex and condition variable
counterparts, which always use the pthreads implementation).  Define the
isc__rwlock_init() macro so that it is a logical counterpart of
isc__mutex_init() and isc__condition_init(); adjust isc___rwlock_init()
accordingly.  Remove a redundant function prototype for
isc__rwlock_lock() and rename that (static) function to rwlock_lock() in
order to avoid having to use quadruple underscores.
2022-07-13 13:19:32 +02:00
Ondřej Surý
8dfdb95a20 Enable tracking of pthreads condition variables
Some POSIX threads implementations (e.g. FreeBSD's libthr) allocate
memory on the heap when pthread_cond_init() is called.  Every call to
that function must be accompanied by a corresponding call to
pthread_cond_destroy() or else the memory allocated for the condition
variable will leak.

jemalloc can be used for detecting memory allocations which are not
released by a process when it exits.  Unfortunately, since jemalloc is
also the system allocator on FreeBSD and a special (profiling-enabled)
build of jemalloc is required for memory leak detection, this method
cannot be used for detecting leaked memory allocated by libthr on a
stock FreeBSD installation.

However, libthr's behavior can be emulated on any platform by
implementing alternative versions of libisc functions for creating and
destroying condition variables that allocate memory using malloc() and
release it using free().  This enables using jemalloc for detecting
missing pthread_cond_destroy() calls on any platform on which it works
reliably.

When the newly introduced ISC_TRACK_PTHREADS_OBJECTS preprocessor macro
is set, allocate isc_condition_t structures on the heap in
isc_condition_init() and free them in isc_condition_destroy().  Reuse
existing condition variable macros (after renaming them appropriately)
for other operations.
2022-07-13 13:19:32 +02:00
Ondřej Surý
ebcfb16576 Enable tracking of pthreads mutexes
Some POSIX threads implementations (e.g. FreeBSD's libthr) allocate
memory on the heap when pthread_mutex_init() is called.  Every call to
that function must be accompanied by a corresponding call to
pthread_mutex_destroy() or else the memory allocated for the mutex will
leak.

jemalloc can be used for detecting memory allocations which are not
released by a process when it exits.  Unfortunately, since jemalloc is
also the system allocator on FreeBSD and a special (profiling-enabled)
build of jemalloc is required for memory leak detection, this method
cannot be used for detecting leaked memory allocated by libthr on a
stock FreeBSD installation.

However, libthr's behavior can be emulated on any platform by
implementing alternative versions of libisc functions for creating and
destroying mutexes that allocate memory using malloc() and release it
using free().  This enables using jemalloc for detecting missing
pthread_mutex_destroy() calls on any platform on which it works
reliably.

Introduce a new ISC_TRACK_PTHREADS_OBJECTS preprocessor macro, which
causes isc_mutex_t structures to be allocated on the heap by
isc_mutex_init() and freed by isc_mutex_destroy().  Reuse existing mutex
macros (after renaming them appropriately) for other operations.
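The heap-allocation idea can be illustrated with a minimal sketch. The sketch_mutex_* names are hypothetical stand-ins, not libisc's actual macros, and the sketch simply defines ISC_TRACK_PTHREADS_OBJECTS to 1 when it is not already set. In tracking mode every init performs a malloc() that only the matching destroy releases, so a leak detector such as jemalloc reports any missing destroy call.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Assumption for this sketch: enable tracking unless told otherwise. */
#ifndef ISC_TRACK_PTHREADS_OBJECTS
#define ISC_TRACK_PTHREADS_OBJECTS 1
#endif

#if ISC_TRACK_PTHREADS_OBJECTS
/* Tracking build: the mutex lives on the heap, as it does with libthr. */
typedef pthread_mutex_t *sketch_mutex_t;

static void
sketch_mutex_init(sketch_mutex_t *mp) {
    *mp = malloc(sizeof(pthread_mutex_t));
    assert(*mp != NULL);
    int r = pthread_mutex_init(*mp, NULL);
    assert(r == 0);
    (void)r;
}

static void
sketch_mutex_destroy(sketch_mutex_t *mp) {
    int r = pthread_mutex_destroy(*mp);
    assert(r == 0);
    (void)r;
    free(*mp); /* a forgotten destroy now shows up as a leaked malloc() */
    *mp = NULL;
}

#define sketch_mutex_lock(mp)   pthread_mutex_lock(*(mp))
#define sketch_mutex_unlock(mp) pthread_mutex_unlock(*(mp))
#else
/* Regular build: the mutex is embedded in the caller's storage. */
typedef pthread_mutex_t sketch_mutex_t;
#define sketch_mutex_init(mp)    ((void)pthread_mutex_init((mp), NULL))
#define sketch_mutex_destroy(mp) ((void)pthread_mutex_destroy(mp))
#define sketch_mutex_lock(mp)    pthread_mutex_lock(mp)
#define sketch_mutex_unlock(mp)  pthread_mutex_unlock(mp)
#endif
```

The same shape would apply to the barrier, rwlock and condition variable variants described in the neighbouring commits.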
2022-07-13 13:19:32 +02:00
Ondřej Surý
deae974366 Directly cause assertion failure on pthreads primitives failure
Instead of returning error values from isc_rwlock_*(), isc_mutex_*(),
and isc_condition_*() macros/functions and subsequently carrying out
runtime assertion checks on the return values in the calling code,
trigger assertion failures directly in those macros/functions whenever
any pthread function returns an error, as there is no point in
continuing execution in such a case anyway.
2022-07-13 13:19:32 +02:00
Michał Kępień
365b47caee Add an ERRNO_CHECK() preprocessor macro
In a number of situations in pthreads-related code, a common sequence of
steps is taken: if the value returned by a library function is not 0,
pass errno to strerror_r(), log the string returned by the latter, and
immediately abort execution.  Add an ERRNO_CHECK() preprocessor macro
which takes those exact steps and use it wherever (conveniently)
possible.
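A sketch of the check-log-abort sequence described above, under stated assumptions: errno_fatal() and its 128-byte buffer are hypothetical, writing to stderr stands in for isc_error_fatal(), and _POSIX_C_SOURCE is set to get the XSI strerror_r() that returns int. Following the description literally, the macro reports errno; note that pthread_*() functions return the error number instead of setting errno.

```c
#define _POSIX_C_SOURCE 200809L /* request the XSI strerror_r(), returning int */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: log the strerror_r() text for err and abort. */
static void
errno_fatal(const char *file, int line, const char *func, int err) {
    char buf[128];

    if (strerror_r(err, buf, sizeof(buf)) != 0) {
        snprintf(buf, sizeof(buf), "unknown error %d", err);
    }
    fprintf(stderr, "%s:%d: %s() failed: %s\n", file, line, func, buf);
    abort();
}

/* If ret is non-zero, report errno together with the call site and abort. */
#define ERRNO_CHECK(func, ret)                                     \
    do {                                                           \
        if ((ret) != 0) {                                          \
            errno_fatal(__FILE__, __LINE__, #func, errno);         \
        }                                                          \
    } while (0)
```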

Notes:

 1. The "log the return value of strerror_r() and abort" pattern is used
    in a number of other places that this commit does not touch; only
    "!= 0" checks followed by isc_error_fatal() calls with
    non-customized error messages are replaced here.

 2. This change temporarily breaks file name & line number reporting for
    isc__mutex_init() errors, to prevent breaking the build.  This issue
    will be rectified in a subsequent change.
2022-07-13 13:19:32 +02:00
Evan Hunt
a499794984 REQUIRE should not have side effects
It's a style violation to have REQUIRE or INSIST contain code that
must run for the server to work. This was being done with some
atomic_compare_exchange calls; these have been cleaned up.  Uses
of atomic_compare_exchange in assertions have been replaced with
a new macro, atomic_compare_exchange_enforced, which uses RUNTIME_CHECK
to ensure that the exchange was successful.
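A sketch of why the side effect must stay out of the assertion. Assertions can be compiled out (modeled here with assert() and NDEBUG), taking the exchange with them, whereas a RUNTIME_CHECK-style macro always evaluates its argument. Both RUNTIME_CHECK and compare_exchange_enforced_bool() below are simplified, hypothetical stand-ins rather than BIND's definitions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Simplified stand-in: always evaluates cond, even when NDEBUG is defined. */
#define RUNTIME_CHECK(cond)                                         \
    do {                                                            \
        if (!(cond)) {                                              \
            fprintf(stderr, "%s:%d: RUNTIME_CHECK(%s) failed\n",    \
                    __FILE__, __LINE__, #cond);                     \
            abort();                                                \
        }                                                           \
    } while (0)

/*
 * Wrong: assert(atomic_compare_exchange_strong(&flag, &expected, true));
 * vanishes under NDEBUG, and with it the state change the server needs.
 *
 * Right: perform the exchange unconditionally and only check its result.
 */
static void
compare_exchange_enforced_bool(atomic_bool *obj, bool expected, bool desired) {
    RUNTIME_CHECK(atomic_compare_exchange_strong(obj, &expected, desired));
}
```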
2022-07-05 12:22:55 -07:00
Artem Boldariev
d2e13ddf22 Update the set of HTTP endpoints on reconfiguration
This commit ensures that on reconfiguration the set of HTTP
endpoints (=paths) is updated within HTTP listeners.
2022-06-28 15:42:38 +03:00
Artem Boldariev
e72962d5f1 Update max concurrent streams limit in HTTP listeners on reconfig
This commit ensures that the concurrent streams limit of HTTP listeners
is updated properly on reconfiguration.
2022-06-28 15:42:38 +03:00
Michal Nowak
1c45a9885a Update clang to version 14
2022-06-16 17:21:11 +02:00
Ondřej Surý
1fe391fd40 Make all tasks to be bound to a thread
Previously, tasks could be created either unbound or bound to a specific
thread (worker loop).  The unbound tasks would be assigned to a random
thread every time isc_task_send() was called.  Because there's no logic
that would assign the task to the least busy worker, this just creates
unpredictability.  Instead of random assignment, bind all the previously
unbound tasks to worker 0, which is guaranteed to exist.
2022-05-25 16:04:51 +02:00
Artem Boldariev
98f758ed4f CID 352848: split xfrin_start() and remove dead code
This commit moves the TLS context creation code out of xfrin_start(),
which had become too large and hard to follow, into a new
function (similar to how it is done in dighost.c).

The dead code has been removed from the cleanup section of the TLS
creation code:

* there is no way 'tlsctx' can equal 'found';
* there is no way 'sess_cache' can be non-NULL in the cleanup section.

Also, it fixes a bug in the older version of the code, where TLS
client session context fetched from the cache would not get passed to
isc_nm_tlsdnsconnect().
2022-05-25 12:38:38 +03:00
Artem Boldariev
86465c1dac DoT: implement TLS client session resumption
This commit extends DoT code with TLS client session resumption
support implemented on top of the TLS client session cache.
2022-05-20 20:17:48 +03:00
Artem Boldariev
90bc13a5d5 TLS stream/DoH: implement TLS client session resumption
This commit extends TLS stream code and DoH code with TLS client
session resumption support implemented on top of the TLS client
session cache.
2022-05-20 20:17:45 +03:00
Artem Boldariev
987892d113 Extend TLS context cache with TLS client session cache
This commit extends TLS context cache with TLS client session cache so
that an associated session cache can be stored alongside the TLS
context within the context cache.
2022-05-20 20:13:20 +03:00
Artem Boldariev
4ef40988f3 Add TLS client session cache implementation
This commit adds an implementation of a client TLS session cache. TLS
client session cache is an object which allows efficient storing and
retrieval of previously saved TLS sessions so that they can be
resumed. This object is supposed to be a foundation for implementing
TLS session resumption - a standard technique to reduce the cost of
re-establishing a connection to the remote server endpoint.

OpenSSL does server-side TLS session caching transparently by
default. However, on the client-side, a TLS session to resume must be
manually specified when establishing the TLS connection. The TLS
client session cache is precisely the foundation for that.
2022-05-20 20:13:20 +03:00
Ondřej Surý
14c8d43863 Use C2x [[fallthrough]] when supported by LLVM/clang
Clang added support for the gcc-style fallthrough
attribute (i.e. __attribute__((fallthrough))) in version 10.  However,
__has_attribute(fallthrough) will return 1 in C mode in older versions,
even though they only support the C++11 fallthrough attribute. At best,
the unsupported attribute is simply ignored; at worst, it causes errors.

The C2x fallthrough attribute has the advantages of being supported in
the broadest range of clang versions (added in version 9) and being easy
to check for support. Use C2x [[fallthrough]] attribute if possible, and
fall back to not using an attribute for clang versions that don't have
it.

Courtesy of Joshua Root
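The selection order described above might look roughly like this sketch. It is an assumption, not BIND's header: as a conservative choice, the C2x spelling is used only when compiling in C2x (or later) mode, the GNU attribute only on GCC (not clang, whose __has_attribute answer is unreliable in older versions), and otherwise nothing. remaining_steps() is a made-up example of use.

```c
#include <assert.h>

#if defined(__has_c_attribute) && defined(__STDC_VERSION__) && \
    __STDC_VERSION__ > 201710L
#if __has_c_attribute(fallthrough)
#define FALLTHROUGH [[fallthrough]]
#endif
#endif

#if !defined(FALLTHROUGH) && defined(__GNUC__) && !defined(__clang__) && \
    defined(__has_attribute)
#if __has_attribute(fallthrough)
/* GCC 7+: the GNU-style attribute is reliable outside of clang. */
#define FALLTHROUGH __attribute__((fallthrough))
#endif
#endif

#ifndef FALLTHROUGH
/* Older clang and other compilers: use no attribute at all. */
#define FALLTHROUGH ((void)0)
#endif

/* Example: count how many of the steps 1..3 remain from 'start'. */
static int
remaining_steps(int start) {
    int n = 0;

    switch (start) {
    case 1:
        n++;
        FALLTHROUGH;
    case 2:
        n++;
        FALLTHROUGH;
    case 3:
        n++;
        break;
    default:
        break;
    }
    return n;
}
```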
2022-05-19 21:40:24 +02:00
Evan Hunt
6936db2f59 Always use the number of CPUS for resolver->ntasks
Since the fctx hash table is now self-resizing, and resolver tasks are
selected to match the thread that created the fetch context, there
shouldn't be any significant advantage to having multiple tasks per CPU;
a single task per thread should be sufficient.

Additionally, the fetch context is always pinned to the calling netmgr
thread to minimize contention, limiting it to coalesced fetches: if two
threads start the same fetch, it will be pinned to the first one to get
the bucket.
2022-05-19 09:27:33 +02:00
Ondřej Surý
0582478c96 Remove isc_task_destroy() and isc_task_shutdown()
After removing the isc_task_onshutdown(), the isc_task_shutdown() and
isc_task_destroy() became obsolete.

Remove calls to isc_task_shutdown() and replace the calls to
isc_task_destroy() with isc_task_detach().

Simplify the internal logic to destroy the task when the last reference
is removed.
2022-05-12 14:55:49 +02:00
Ondřej Surý
2235edabcf Remove isc_task_onshutdown()
The isc_task_onshutdown() was used to post an event that should be run when
the task is being shut down.  This could happen explicitly in the
isc_test_shutdown() call or implicitly when we detach the last reference
to the task and there are no more events posted on the task.

This whole task onshutdown mechanism just makes things more complicated.
It's easier to post the "shutdown" events when we are shutting down
explicitly, and the existing code already always knows when it should
shut down the task that's being used to execute the onshutdown events.

Replace the isc_task_onshutdown() calls with explicit calls to execute
the shutdown tasks.
2022-05-12 13:45:34 +02:00
Ondřej Surý
b43812692d Move netmgr/uv-compat.h to <isc/uv.h>
As we are going to use libuv outside of the netmgr, we need the shims to
be readily available for the rest of the codebase.

Move the "netmgr/uv-compat.h" to <isc/uv.h> and netmgr/uv-compat.c to
uv.c, and as a rule of thumb, the users of libuv should include
<isc/uv.h> instead of <uv.h> directly.

Additionally, merge netmgr/uverr2result.c into uv.c and rename the
single function from isc__nm_uverr2result() to isc_uverr2result().
2022-05-03 10:02:19 +02:00
Tony Finch
d20ea4a703 Make isc_random_uniform() nearly divisionless
It used to require two 32-bit integer divisions to get a random number
less than some limit. Now we use Daniel Lemire's "nearly-divisionless"
algorithm for unbiased bounded random numbers, which requires one
64-bit integer multiply in the usual case, and one 32-bit integer
division in rare slow cases. Even the slow cases are faster than
before; there are also fewer branches.

I think this algorithm is exceptionally beautiful. It also has more
clever tricks than lines of code, so I have done my best to explain
how it works.
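A sketch of Lemire's nearly-divisionless method as described above. The rng callback and the xorshift32() generator are placeholders for illustration, not libisc's random source, and uniform() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder generator (xorshift32); the state must be non-zero. */
static uint32_t xorshift_state = 2463534242u;

static uint32_t
xorshift32(void) {
    uint32_t x = xorshift_state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return xorshift_state = x;
}

/*
 * Return an unbiased random number in [0, upper_bound).
 * Usual case: one 64-bit multiply.  Rare case: one 32-bit division to
 * compute the rejection threshold, then retry until a sample is accepted.
 */
static uint32_t
uniform(uint32_t (*rng)(void), uint32_t upper_bound) {
    assert(upper_bound > 0);

    uint64_t m = (uint64_t)rng() * upper_bound;
    uint32_t low = (uint32_t)m;

    if (low < upper_bound) {
        /* threshold = 2^32 mod upper_bound, computed in 32 bits */
        uint32_t threshold = (uint32_t)-upper_bound % upper_bound;
        while (low < threshold) {
            m = (uint64_t)rng() * upper_bound;
            low = (uint32_t)m;
        }
    }
    return (uint32_t)(m >> 32); /* high half is the result */
}
```

The division is reached only when the low half of the product falls below upper_bound, which has probability upper_bound / 2^32 and is therefore rare for small bounds.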
2022-04-22 16:40:37 +01:00
Ondřej Surý
d1d88a2895 Add detailed tracing when TASKMGR_TRACE is defined
When TASKMGR_TRACE=1 is defined, the task and event objects carry
detailed tracing information about the function, file, line, and
backtrace (to the extent tracked by gcc) where they were created.

At exit, when there are unfinished tasks, they will be printed along
with the detailed information.
2022-04-19 14:25:23 +02:00
Ondřej Surý
f0feaa3305 Remove isc_task_sendto(anddetach) functions
The only place where isc_task_sendto() was used was in dns_resolver
unit, where the "sendto" part was actually no-op, because dns_resolver
uses bound tasks.  Remove the isc_task_sendto() and
isc_task_sendtoanddetach() functions in favor of using bound tasks
created with isc_task_create_bound().

Additionally, cache the number of running netmgr threads (nworkers)
locally to reduce the number of function calls.
2022-04-19 14:24:36 +02:00
Ondřej Surý
1eeb4c1121 Remove isc_event_constallocate()
The isc_event_constallocate() function was not used anywhere, thus
remove the isc_event_constallocate() macro, declaration and definition.
2022-04-19 13:46:26 +02:00
Ondřej Surý
f55a4d3e55 Allow listening on less than nworkers threads
For some applications, it's useful not to listen on the full set of
threads.  Add a workers argument to all isc_nm_listen*() functions and
the convenience macros ISC_NM_LISTEN_ONE and ISC_NM_LISTEN_ALL.
2022-04-19 11:08:13 +02:00
Artem Boldariev
df317184eb Add isc_nmsocket_set_tlsctx()
This commit adds isc_nmsocket_set_tlsctx() - an asynchronous function
that replaces the TLS context within a given TLS-enabled listener
socket object. It is based on the newly added reference counting
functionality.

The intention of adding this function is to add functionality to
replace a TLS context without recreating the whole socket object,
including the underlying TCP listener socket, as a BIND process might
not have enough permissions to re-create it fully on reconfiguration.
2022-04-06 18:45:57 +03:00
Artem Boldariev
a7a482c1b1 Add isc_tlsctx_attach()
The implementation is done on top of the reference counting
functionality found in OpenSSL/LibreSSL, which allows for avoiding
wrapping the object.

Adding this function allows using reference counting for TLS contexts
in BIND 9's codebase.
2022-04-06 18:45:57 +03:00
Ondřej Surý
142c63dda8 Enable the load-balance-sockets configuration
Previously, HAVE_SO_REUSEPORT_LB was defined only in the private
netmgr-int.h header file, making the configuration of load balanced
sockets inoperable.

Move the missing HAVE_SO_REUSEPORT_LB define to isc/netmgr.h and add the
missing isc_nm_getloadbalancesockets() implementation.
2022-04-05 01:30:58 +02:00
Ondřej Surý
85c6e797aa Add option to configure load balance sockets
Previously, the option to enable kernel load balancing of the sockets
was always enabled when supported by the operating system (SO_REUSEPORT
on Linux and SO_REUSEPORT_LB on FreeBSD).

It was reported that in scenarios where the networking threads are also
responsible for processing long-running tasks (like RPZ processing, CATZ
processing or large zone transfers), this could lead to intermittent
brownouts for some clients, because the thread assigned by the operating
system might be busy.  In such scenarios, the overall performance would
be better served by threads competing over the sockets because the idle
threads can pick up the incoming traffic.

Add a new configuration option (`load-balance-sockets`) to allow enabling
or disabling the load balancing of the sockets.
2022-04-04 23:10:04 +02:00
Ondřej Surý
f106d0ed2b Run the RPZ update as offloaded work
Previously, the RPZ updates ran quantized on the main nm_worker loops.
As the quantum was set to 1024, this might lead to service
interruptions when a large RPZ update was processed.

Change the RPZ update process to run as offloaded work.  The update
and cleanup loops were refactored to do as little locking of the
maintenance lock as possible for the shortest periods of time and the db
iterator is being paused for every iteration, so we don't hold the rbtdb
tree lock for prolonged periods of time.
2022-04-04 21:20:05 +02:00
Ondřej Surý
ae01ec2823 Don't use reference counting in isc_timer unit
The reference counting and isc_timer_attach()/isc_timer_detach()
semantics are actually misleading because they cannot be used under
normal conditions.  Under the usual conditions, the object in which the
timer is used is passed as the argument to the "timer" itself.  This
means that when the caller uses `isc_timer_detach()`, it needs the
timer to stop, but isc_timer_detach() does that only if this is
the last reference.  Unfortunately, this also means that if the timer is
attached elsewhere and the timer fires, it will most likely cause a
use-after-free, because the object used in the timer no longer exists.

Remove the reference counting from the isc_timer unit, remove the
isc_timer_attach() function and rename isc_timer_detach() to
isc_timer_destroy() to better reflect how the API needs to be used.

The only caveat is that the already executed event must be destroyed
before isc_timer_destroy() is called, because the timer is no longer
attached to .ev_destroy_arg.
2022-04-02 01:23:15 +02:00
Ondřej Surý
30e0fd942b Remove task privileged mode
Previously, the task privileged mode was used only when named
was starting up and loading the zones from the disk as the "first" thing
to do.  The privileged task was set up with quantum == 2, which made the
taskmgr/netmgr spin around the privileged queue processing two events at
a time.

The same effect can be achieved by setting the quantum to UINT_MAX (i.e.
practically unlimited) for the loadzone task, hence the privileged task
mode was removed in favor of just processing all the events on the
loadzone task in a single task_run().
2022-04-01 23:55:26 +02:00
Ondřej Surý
62a72211aa Remove isc_pool API
Since the last user of the isc_pool API is gone, remove the whole
isc_pool API.
2022-04-01 23:50:34 +02:00
Ondřej Surý
2707d0eeb7 Set hard thread affinity for each zone
After switching to per-thread resources in the zonemgr, performance
decreased because the memory context, zonetask and loadtask were
picked from the pool at random.

Pin the zone to a single threadid (.tid) and align the memory context,
zonetask and loadtask to be the same; this sets the hard affinity of the
zone to the netmgr thread.
2022-04-01 23:50:34 +02:00
Ondřej Surý
a94678ff77 Create per-thread task and memory context for zonemgr
Previously, the zonemgr created 1 task per 100 zones and 1 memory
context per 1000 zones (with minimum 10 tasks and 2 memory contexts) to
reduce the contention between threads.

Instead of reducing the contention by having many resources, create a
per-nm_thread memory context, loadtask and zonetask and spread the zones
between just per-thread resources.

Note: this commit alone does decrease performance when loading zones
by a couple of seconds (in the case of 1M zones), and thus there's more
work in this whole MR fixing the performance.
2022-04-01 23:50:34 +02:00
Ondřej Surý
15ea6f002f Add isc_task_setquantum() and use it for post-init zone loading
Add isc_task_setquantum() function that modifies quantum for the future
isc_task_run() invocations.

NOTE: The current isc_task_run() caches the task->quantum into a local
variable and therefore the current event loop is not affected by any
quantum change.
2022-04-01 23:45:23 +02:00
Ondřej Surý
c17eee034b Remove isc_task_purge() and isc_task_purgerange()
The isc_task_purge() and isc_task_purgerange() were now unused, so sweep
the task.c file.  Additionally remove unused ISC_EVENTATTR_NOPURGE event
attribute.
2022-04-01 23:45:23 +02:00
Ondřej Surý
48b2a5df97 Keep the list of scheduled events on the timer
Instead of searching for the events to purge, keep the list of scheduled
events on the timer list and purge the events that we have scheduled.
2022-04-01 23:45:23 +02:00
Ondřej Surý
17aed2f895 Repair isc_task_purgeevent(), clean isc_task_unsend{,range}()
The isc_task_purgerange() was walking through all events on the task to
find a matching task.  Instead use the ISC_LINK_LINKED to find whether
the event is active.

Clean up the related isc_task_unsend() and isc_task_unsendrange()
functions that were not used anywhere.
2022-04-01 23:45:23 +02:00
Ondřej Surý
b84c9b2608 Turn isc_hash_bits32() into static inline function
Adding an extra val & 0xffff in the isc_hash_bits32() macros in the hot
path has significantly reduced the performance.  Turn the macro into a
static inline function matching the previous hash_32() function used to
compute a hashval matching the hashtable->bits.
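A sketch of a static inline Fibonacci-hashing helper of this kind. The multiplier 0x61C88647 is the 32-bit golden-ratio constant used by the Linux kernel's hash_32(); whether libisc uses the same constant is an assumption, and fib_hash_bits32() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Fibonacci hashing: multiply by a golden-ratio constant and keep the top
 * 'bits' bits, which serve as the bucket index for a table of 2^bits slots.
 */
static inline uint32_t
fib_hash_bits32(uint32_t val, unsigned int bits) {
    assert(bits >= 1 && bits <= 32); /* a shift by 32 would be undefined */
    return (uint32_t)(val * UINT32_C(0x61C88647)) >> (32 - bits);
}
```

Being a static inline function, it costs one multiply and one shift on the hot path, with no extra masking.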
2022-04-01 23:04:24 +02:00
Ondřej Surý
b05a991ad0 Make isc_ht optionally case insensitive
Previously, the isc_ht API would always take the key as a literal input
to the hashing function.  Change the isc_ht_init() function to take an
'options' argument, in which ISC_HT_CASE_SENSITIVE or _INSENSITIVE can
be specified, to determine whether to use case-sensitive hashing in
isc_hash32() when hashing the key.
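One way to make hashing case-insensitive is to fold each byte to lower case before it is mixed into the hash. In this sketch, 32-bit FNV-1a stands in for isc_hash32() and a bool stands in for the ISC_HT_CASE_SENSITIVE/_INSENSITIVE options; both substitutions are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a, optionally folding ASCII letters to lower case first. */
static uint32_t
hash_key(const unsigned char *key, size_t len, bool case_sensitive) {
    uint32_t h = UINT32_C(2166136261);

    for (size_t i = 0; i < len; i++) {
        unsigned char c = key[i];
        if (!case_sensitive && c >= 'A' && c <= 'Z') {
            c = (unsigned char)(c - 'A' + 'a');
        }
        h ^= c;
        h *= UINT32_C(16777619);
    }
    return h;
}
```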
2022-03-28 15:02:18 -07:00
Evan Hunt
e9ef3defa4 consolidate fibonacci hashing in one place
Fibonacci hashing was implemented in four separate places (rbt.c,
rbtdb.c, resolver.c, zone.c). This commit combines them into a single
implementation. The hash_32() function is now replaced with
isc_hash_bits32().
2022-03-28 14:44:21 -07:00
Artem Boldariev
783663db80 Add ISC_R_TLSBADPEERCERT error code to the TLS related code
This commit adds support for the ISC_R_TLSBADPEERCERT error code, which is
supposed to be used to signal TLS peer certificate verification failures
in dig and other code.

The support for this error code is added to our TLS and TLS DNS
implementations.

This commit also adds the isc_nm_verify_tls_peer_result_string() function,
which is supposed to be used to get a textual description of the
reason for getting an ISC_R_TLSBADPEERCERT error.
2022-03-28 15:32:30 +03:00
Artem Boldariev
71cf8fa5ac Extend TLS context cache with CA certificates store
This commit adds support for keeping CA certificate stores associated
with TLS contexts. The intention is to keep one reusable store per
set of related TLS contexts.
2022-03-28 15:31:22 +03:00
Artem Boldariev
c49a81e27d Add foundational functions to implement Strict/Mutual TLS
This commit adds a set of functions that can be used to implement
Strict and Mutual TLS:

* isc_tlsctx_load_client_ca_names();
* isc_tlsctx_load_certificate();
* isc_tls_verify_peer_result_string();
* isc_tlsctx_enable_peer_verification().
2022-03-28 15:31:22 +03:00
Artem Boldariev
32783d36c2 Add utility functions to manipulate X509 certificate stores
This commit adds a set of high-level utility functions to manipulate
the certificate stores. The stores are needed to implement TLS
certificate verification efficiently.
2022-03-28 15:31:22 +03:00
Ondřej Surý
9de10cd153 Remove extrahandle size from netmgr
Previously, it was possible to assign a bit of memory space in the
nmhandle to store the client data.  This was complicated and prevented
further refactoring of isc_nmhandle_t caching (future work).

Instead of caching the data in the nmhandle, allocate the hot-path
ns_client_t objects from a per-thread clientmgr memory context and just
assign them to the isc_nmhandle_t via isc_nmhandle_set().
2022-03-25 10:38:35 +01:00
Ondřej Surý
04d0b70ba2 Replace ISC_NORETURN with C11's noreturn
C11 has built-in support for the _Noreturn function specifier, with a
convenience noreturn macro defined in the <stdnoreturn.h> header.

Replace ISC_NORETURN macro by C11 noreturn with fallback to
__attribute__((noreturn)) if the C11 support is not complete.
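A sketch of the C11-first fallback. The version test is an assumption about what "complete C11 support" means, and fatal() and checked_div() are hypothetical helpers.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
#include <stdnoreturn.h> /* defines noreturn as _Noreturn */
#else
#define noreturn __attribute__((noreturn))
#endif

/* Hypothetical helper: print a message and terminate; never returns. */
static noreturn void
fatal(const char *msg) {
    fprintf(stderr, "fatal: %s\n", msg);
    abort();
}

/* The compiler knows fatal() does not return, so no return value is missing. */
static int
checked_div(int a, int b) {
    if (b == 0) {
        fatal("division by zero");
    }
    return a / b;
}
```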
2022-03-25 08:33:43 +01:00
Ondřej Surý
584f0d7a7e Simplify way we tag unreachable code with only ISC_UNREACHABLE()
Previously, the unreachable code paths would have to be tagged with:

    INSIST(0);
    ISC_UNREACHABLE();

There were also older parts of the code that used comment annotation:

    /* NOTREACHED */

Unify the handling of unreachable code paths to just use:

    UNREACHABLE();

The UNREACHABLE() macro now asserts when reached and also uses
__builtin_unreachable(); when such builtin is available in the compiler.
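A sketch of such a macro, assuming assert() as the assertion mechanism and __has_builtin for detecting __builtin_unreachable(); the real macro may differ. With NDEBUG defined the assertion disappears, so reaching the macro would be undefined behavior when the builtin is used.

```c
#include <assert.h>

#if defined(__has_builtin)
#if __has_builtin(__builtin_unreachable)
#define SKETCH_HAVE_BUILTIN_UNREACHABLE 1
#endif
#endif

#ifdef SKETCH_HAVE_BUILTIN_UNREACHABLE
/* Assert when reached, and tell the optimizer this path is impossible. */
#define UNREACHABLE()                                 \
    do {                                              \
        assert(!"unreachable code reached");          \
        __builtin_unreachable();                      \
    } while (0)
#else
#define UNREACHABLE() assert(!"unreachable code reached")
#endif

enum color { RED, GREEN, BLUE };

static const char *
color_name(enum color c) {
    switch (c) {
    case RED:
        return "red";
    case GREEN:
        return "green";
    case BLUE:
        return "blue";
    }
    UNREACHABLE(); /* every enumerator is handled above */
}
```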
2022-03-25 08:33:43 +01:00
Ondřej Surý
fe7ce629f4 Add FALLTHROUGH macro for __attribute__((fallthrough))
Gcc 7+ and Clang 10+ have implemented __attribute__((fallthrough)), which
is an explicit version of the /* FALLTHROUGH */ comment we are currently
using.

Add and apply FALLTHROUGH macro that uses the attribute if available,
but does nothing on older compilers.

In one case (lib/dns/zone.c), using the macro revealed that we were
using the /* FALLTHROUGH */ comment in the wrong place; remove that comment.
2022-03-25 08:33:43 +01:00
Ondřej Surý
d70daa29f7 Make netmgr the authority on number of threads running
Instead of passing the "workers" variable back and forth along with
the single isc_nm_t instance, add an isc_nm_getnworkers() function
that returns the number of running netmgr threads.

Change the ns_interfacemgr and ns_taskmgr to utilize the newly acquired
knowledge.
2022-03-18 21:53:28 +01:00
Ondřej Surý
e42cb1f198 Implement incremental hash table resizing in isc_ht
Previously, incremental hash table resizing was implemented for the
dns_rbt_t hash table implementation.  Using that as a base, also
implement incremental hash table resizing for the isc_ht API
hashtables:

 1. During the resize, allocate the new hash table, but keep the old
    table unchanged.
 2. In each lookup, delete, or iterator operation, check both tables.
 3. Perform insertion operations only in the new table.
 4. At each insertion also move <r> elements from the old table to
    the new table.
 5. When all elements are removed from the old table, deallocate it.

To ensure that the old table is completely copied over before the new
table itself needs to be enlarged, it is necessary to increase the
size of the table by a factor of at least (<r> + 1)/<r> during resizing.

In our implementation <r> is equal to 1.

The downside of this approach is that the old table and the new table
could stay in memory for longer when there are no new insertions into
the hash table for prolonged periods of time as the incremental
rehashing happens only during the insertions.
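A compact sketch of the five steps above with <r> = 1, so each new table is twice the size of the old one. It uses plain uint32_t keys, chaining, and key % size as the bucket function; every name is hypothetical and this is not the isc_ht code. Deletion and iteration are omitted; per step 2 they would consult both tables, as iht_lookup() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct node { uint32_t key; struct node *next; } node_t;
typedef struct { node_t **buckets; size_t size, count; } table_t; /* buckets == NULL: unused */

typedef struct {
    table_t cur;    /* new table: receives every insertion */
    table_t old;    /* old table: only drained while resizing */
    size_t migrate; /* next old bucket to move elements from */
} iht_t;

static void table_init(table_t *t, size_t size) {
    assert(size > 0);
    t->buckets = calloc(size, sizeof(node_t *));
    assert(t->buckets != NULL);
    t->size = size;
    t->count = 0;
}

static void table_link(table_t *t, node_t *n) {
    size_t b = n->key % t->size;
    n->next = t->buckets[b];
    t->buckets[b] = n;
    t->count++;
}

static bool table_has(const table_t *t, uint32_t key) {
    if (t->buckets == NULL) return false;
    for (node_t *n = t->buckets[key % t->size]; n != NULL; n = n->next)
        if (n->key == key) return true;
    return false;
}

static void iht_init(iht_t *ht, size_t size) {
    table_init(&ht->cur, size);
    ht->old = (table_t){ 0 };
    ht->migrate = 0;
}

/* Step 2: lookups check both tables. */
static bool iht_lookup(const iht_t *ht, uint32_t key) {
    return table_has(&ht->cur, key) || table_has(&ht->old, key);
}

/* Step 4 with r = 1: move one element; step 5: free the drained old table. */
static void migrate_one(iht_t *ht) {
    while (ht->old.count > 0) {
        node_t *n = ht->old.buckets[ht->migrate];
        if (n == NULL) { ht->migrate++; continue; }
        ht->old.buckets[ht->migrate] = n->next;
        ht->old.count--;
        table_link(&ht->cur, n);
        break;
    }
    if (ht->old.count == 0) {
        free(ht->old.buckets);
        ht->old = (table_t){ 0 };
    }
}

static bool iht_insert(iht_t *ht, uint32_t key) {
    if (iht_lookup(ht, key)) return false;
    if (ht->old.buckets == NULL && ht->cur.count >= ht->cur.size) {
        /* Step 1: keep the old table, allocate one (r + 1) / r = 2x larger. */
        ht->old = ht->cur;
        table_init(&ht->cur, ht->old.size * 2);
        ht->migrate = 0;
    }
    node_t *n = malloc(sizeof(*n));
    assert(n != NULL);
    n->key = key;
    table_link(&ht->cur, n); /* step 3: insert into the new table only */
    if (ht->old.buckets != NULL) migrate_one(ht);
    return true;
}

static void iht_destroy(iht_t *ht) {
    table_t *tables[2] = { &ht->cur, &ht->old };
    for (size_t t = 0; t < 2; t++) {
        for (size_t b = 0; tables[t]->buckets != NULL && b < tables[t]->size; b++) {
            node_t *n = tables[t]->buckets[b];
            while (n != NULL) { node_t *next = n->next; free(n); n = next; }
        }
        free(tables[t]->buckets);
        *tables[t] = (table_t){ 0 };
    }
}
```

Because each insertion during a resize moves one old element, an old table holding n elements is drained after n insertions, exactly when the doubled table reaches its own resize threshold.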
2022-03-17 08:16:24 +01:00
Ondřej Surý
79b5ccbf34 Implement isc_interval_t on top of isc_time_t
Change the isc_interval_t implementation from separate data type and
separate implementation to be shim implementation on top of isc_time_t.
The distinction between isc_interval_t and isc_time_t has been kept
because they are semantically different (isc_interval_t is relative and
isc_time_t is absolute), but this allows isc_time_t and isc_interval_t to
be freely interchangeable, which simplifies e.g. this:

    isc_time_t *t1;
    isc_interval_t *interval;
    isc_time_t *t2;

    isc_interval_set(interval, isc_time_seconds(t2), isc_time_nanoseconds(t2));
    isc_time_subtract(t1, interval, t2);
    isc_interval_set(interval, isc_time_seconds(t2), isc_time_nanoseconds(t2));

to just:

    isc_time_t *t1;
    isc_interval_t *interval;
    isc_time_t *t2;

    isc_time_subtract(t1, t2, interval);

without introducing a whole set of new functions.
2022-03-14 13:00:05 -07:00
Ondřej Surý
e6ca2a651f Refactor isc_timer_reset() use with semantic patch
Add and apply semantic patch to remove expires argument from the
isc_timer_reset() calls through the codebase.
2022-03-14 13:00:05 -07:00
Ondřej Surý
6437bcc488 Remove expires argument from isc_timer API
The isc_timer_reset() now works only with intervals for once timers.

This makes the API almost 1:1 compatible with the libuv timers, making
further refactoring possible.
2022-03-14 13:00:05 -07:00
Ondřej Surý
c259cecc90 Refactor isc_timer_create() to just create timer
The isc_timer_create() function was a bit conflated.  It could have been
used to create a timer and start it at the same time.  As there was a
single place where this was done before (see the previous commit for
nta.c), this was cleaned up and the isc_timer_create() function was
changed to only create new timer.
2022-03-14 13:00:05 -07:00
Ondřej Surý
8fbb42c49c Remove "a temporary hack, 'rndc timerpoke'"
In 2002, "a temporary hack, 'rndc timerpoke'" was added.  It's time
for it to go, so it was removed.
2022-03-14 13:00:05 -07:00
Ondřej Surý
f4751a91f7 Remove unused isc_timer_touch() function
The isc_timer_touch() was unused, just remove it.
2022-03-14 13:00:05 -07:00
Ondřej Surý
bbe1c06a8b Remove isc_timertype_limited from isc_timer API
The isc_timertype_limited timer type was never used (not even in tests).
Remove isc_timertype_limited timer type before planned refactoring.
2022-03-14 13:00:05 -07:00
Ondřej Surý
f251d69eba Remove usage of deprecated ATOMIC_VAR_INIT() macro
The C17 standard deprecated the ATOMIC_VAR_INIT() macro (see [1]).  Follow
suit and remove the ATOMIC_VAR_INIT() usage in favor of simple
assignment of the value, as this is what all supported stdatomic.h
implementations do anyway:

  * MacOSX.platform: #define ATOMIC_VAR_INIT(__v) {__v}
  * Gcc stdatomic.h: #define ATOMIC_VAR_INIT(VALUE)	(VALUE)

1. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1138r0.pdf
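In practice the change amounts to plain initialization of the atomic object; the variable names below are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Before: static atomic_uint_fast32_t refs = ATOMIC_VAR_INIT(1); */
static atomic_uint_fast32_t refs = 1;
static atomic_bool shutting_down = false;
```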
2022-03-08 23:55:10 +01:00
Ondřej Surý
8fa27365ec Make isc_ht_init() and isc_ht_iter_create() return void
Previously, the function(s) in the commit subject could fail for various
reasons - mostly allocation failures, or other functions returning
different return code than ISC_R_SUCCESS.  Now, the aforementioned
function(s) cannot ever fail and they would always return ISC_R_SUCCESS.

Change the function(s) to return void and remove the extra checks in
the code that uses them.
2022-03-08 14:51:55 +01:00
Ondřej Surý
bbb4cdb92d Make isc_heap_create() and isc_heap_insert() return void
Previously, the function(s) in the commit subject could fail for various
reasons - mostly allocation failures, or other functions returning
different return code than ISC_R_SUCCESS.  Now, the aforementioned
function(s) cannot ever fail and they would always return ISC_R_SUCCESS.

Change the function(s) to return void and remove the extra checks in
the code that uses them.
2022-03-08 11:19:34 +01:00
Ondřej Surý
6bd025942c Replace netievent lock-free queue with simple locked queue
The current implementation of isc_queue uses a Michael-Scott lock-free
queue that in turn uses hazard pointers.  It was discovered that, given
the way we use isc_queue, such a complicated mechanism isn't really
needed, because most of the time, we either execute the work directly
when on the nmthread (in case of UDP) or schedule the work from the
matching nmthreads.

Replace the current implementation of the isc_queue with a simple locked
ISC_LIST.  There's a slight improvement: since copying the whole list
is very lightweight, we move the queue into a new list before we start
the processing, locking only while moving the queue and not for every
single item on the list.

NOTE: There's room for future improvements.  Since we don't guarantee
the order in which the netievents are processed, we could have two
lists: an unlocked one used when scheduling the work from the matching
thread, and a locked one used from non-matching threads.
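A minimal sketch of a locked queue that detaches the whole list under the lock and then processes it unlocked. A plain singly linked list and a pthread mutex stand in for ISC_LIST and libisc's locking, and all names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef struct event {
    int value;
    struct event *next;
} event_t;

typedef struct {
    pthread_mutex_t lock;
    event_t *head, *tail;
} queue_t;

static void
queue_init(queue_t *q) {
    int r = pthread_mutex_init(&q->lock, NULL);
    assert(r == 0);
    (void)r;
    q->head = q->tail = NULL;
}

/* Producers hold the lock only for the O(1) append. */
static void
queue_push(queue_t *q, event_t *ev) {
    ev->next = NULL;
    pthread_mutex_lock(&q->lock);
    if (q->tail == NULL) {
        q->head = ev;
    } else {
        q->tail->next = ev;
    }
    q->tail = ev;
    pthread_mutex_unlock(&q->lock);
}

/*
 * The consumer takes the lock once to detach the whole list, then runs
 * every event without holding it.  Returns the number of events processed.
 */
static size_t
queue_drain(queue_t *q, void (*cb)(event_t *, void *), void *arg) {
    pthread_mutex_lock(&q->lock);
    event_t *list = q->head;
    q->head = q->tail = NULL;
    pthread_mutex_unlock(&q->lock);

    size_t n = 0;
    while (list != NULL) {
        event_t *next = list->next; /* cb may free or requeue the event */
        cb(list, arg);
        list = next;
        n++;
    }
    return n;
}

/* Example callback: add each event's value to an int accumulator. */
static void
sum_values(event_t *ev, void *arg) {
    *(int *)arg += ev->value;
}
```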
2022-03-04 13:49:51 +01:00
Ondřej Surý
d01562f22b Remove the keep-response-order ACL map
The keep-response-order option has been obsoleted.  In this commit,
remove the keep-response-order ACL map (rendering the option a no-op),
the call to isc_nm_sequential(), and the now unused isc_nm_sequential()
function itself.
2022-02-18 09:16:03 +01:00
Ondřej Surý
3c7b04d015 Add network manager based timer API
This commit adds an API that allows creating arbitrary timers
associated with network manager handles.
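A rough, self-contained sketch of what associating a timer with a handle can look like, assuming the timer holds a reference on the handle for as long as the timer exists, so the handle cannot be freed under a pending timer. Everything here (handle_t, nm_timer_t, nm_timer_create() and friends, count_fires(), and the simulated clock passed in as `now`) is hypothetical and is not the actual netmgr timer API; a real implementation would be driven by the event loop rather than by nm_timer_poll().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted network manager handle. */
typedef struct handle {
	unsigned int references;
} handle_t;

typedef void (*nm_timer_cb)(void *cbarg);

/* Hypothetical timer bound to a handle. */
typedef struct nm_timer {
	handle_t *handle; /* reference held for the timer's lifetime */
	nm_timer_cb cb;
	void *cbarg;
	bool running;
	uint64_t deadline; /* absolute time in milliseconds */
} nm_timer_t;

static void
handle_attach(handle_t *handle) {
	handle->references++;
}

static void
handle_detach(handle_t *handle) {
	assert(handle->references > 0);
	handle->references--;
}

static nm_timer_t *
nm_timer_create(handle_t *handle, nm_timer_cb cb, void *cbarg) {
	nm_timer_t *timer = calloc(1, sizeof(*timer));
	if (timer == NULL) {
		abort();
	}
	handle_attach(handle);
	timer->handle = handle;
	timer->cb = cb;
	timer->cbarg = cbarg;
	return timer;
}

static void
nm_timer_start(nm_timer_t *timer, uint64_t now, uint64_t timeout_ms) {
	timer->deadline = now + timeout_ms;
	timer->running = true;
}

static void
nm_timer_stop(nm_timer_t *timer) {
	timer->running = false;
}

/* Stand-in for the event loop: fires the callback once if the deadline
 * has passed, and reports whether it fired. */
static bool
nm_timer_poll(nm_timer_t *timer, uint64_t now) {
	if (!timer->running || now < timer->deadline) {
		return false;
	}
	timer->running = false;
	timer->cb(timer->cbarg);
	return true;
}

/* Frees the timer and drops its reference on the handle. */
static void
nm_timer_destroy(nm_timer_t **timerp) {
	handle_detach((*timerp)->handle);
	free(*timerp);
	*timerp = NULL;
}

/* Example callback: counts how many times the timer fired. */
static void
count_fires(void *cbarg) {
	(*(int *)cbarg)++;
}
```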
2022-02-17 21:38:17 +01:00
Ondřej Surý
a89d9e0fa6 Add isc_nmhandle_setwritetimeout() function
In some situations (unit tests and the forthcoming XFR timeouts MR), we
need to modify the write timeout independently of the read timeout.  Add
an isc_nmhandle_setwritetimeout() function that can be called before
isc_nm_send() to specify a custom write timeout interval.
2022-02-17 09:06:58 +01:00
Ondřej Surý
0500345513 Remove unused functions from isc_thread API
The isc_thread_setaffinity() call was removed in !5265 and we are not
going to restore it, because it was shown that performance is better
without it.  Additionally, remove the already disabled cpu system test.

The isc_thread_setconcurrency() function is unused, and calling
pthread_setconcurrency() on Linux has no effect.  It was originally
added for Solaris in 2001, and the call was removed when taskmgr was
refactored to run on top of netmgr in !4918.
2022-02-09 17:22:06 +01:00
Evan Hunt
d3fed6f400 update dlz_minimal.h
The addition of support for ECS client information in DLZ
modules omitted some changes necessary to build the modules
in contrib.
2022-01-27 15:48:50 -08:00
Petr Menšík
f00f521e9c Use detected cache line size
The IBM POWER architecture has an L1 cache line size of 128 bytes.
Take advantage of that on that architecture instead of forcing the more
common value of 64.  When a higher value can be detected, use it
instead.  Keep the default at 64.
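The selection rule described above (use a detected value only when it is larger, otherwise keep 64) can be sketched in C. The commit message does not say how detection is done; this sketch assumes run-time detection through glibc's sysconf(_SC_LEVEL1_DCACHE_LINESIZE), a non-POSIX extension that may return 0 or -1 when the value is unknown, and a build-time check would apply the same rule. DEFAULT_CACHELINE_SIZE, choose_cacheline_size() and detect_cacheline_size() are hypothetical names.

```c
#define _GNU_SOURCE /* expose glibc's _SC_LEVEL1_* sysconf names */
#include <assert.h>
#include <unistd.h>

/* Default used when nothing better can be detected. */
#define DEFAULT_CACHELINE_SIZE 64L

/* Keep the detected value only when it is larger than the default. */
static long
choose_cacheline_size(long detected) {
	return detected > DEFAULT_CACHELINE_SIZE ? detected
						 : DEFAULT_CACHELINE_SIZE;
}

static long
detect_cacheline_size(void) {
	long detected = -1;
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
	/* glibc extension; may report 0 or -1 when the value is unknown. */
	detected = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
#endif
	return choose_cacheline_size(detected);
}
```

On a POWER system that reports 128, detect_cacheline_size() would return 128; on systems reporting 64, or nothing at all, it falls back to the default of 64.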
2022-01-27 13:02:23 +01:00
Ondřej Surý
58bd26b6cf Update the copyright information in all files in the repository
This commit converts the license handling to adhere to the REUSE
specification.  It specifically:

1. Adds the used licenses to the LICENSES/ directory

2. Adds an "isc" template for adding the copyright boilerplate

3. Changes all source files to include a copyright and SPDX license
   header; this includes all the C sources, documentation, zone files,
   and configuration files.  There are notes in the doc/dev/copyrights
   file on how to add correct headers to new files.

4. Handles the rest, which can't be modified, via the .reuse/dep5 file.
   The binary (or otherwise unmodifiable) files could have their license
   placed next to them in a <foo>.license file, but this would lead to a
   cluttered repository, and most of the files handled in the .reuse/dep5
   file are system test files.
2022-01-11 09:05:02 +01:00