Commit graph

5036 commits

Author SHA1 Message Date
Ondřej Surý
3230c8e369
Add isc_sockaddr_hash_ex that can be used in incremental hashing
Add a sockaddr hashing function that can be used as part of incremental
hashing.
2023-09-19 19:56:33 +02:00
Ondřej Surý
9f40eee0a8
Remove isc_hash_function macro
The last two users of 64-bit isc_hash_function() macro were removed in
the previous commits, remove the macro as well.
2023-09-19 19:56:33 +02:00
Ondřej Surý
1653fa61c7
Use 32-bit hashing in isc memory debugging
Switch from 64-bit isc_hash_function() to 32-bit isc_hash32() as we were
using the 32-bit value only anyway.
2023-09-19 19:51:51 +02:00
Ondřej Surý
9d326aaba3
Use incremental hashing in the isc_sockaddr_hash() function
Instead of copying address back and forth when hashing addr+port, we can
use incremental hashing.  Additionally, switch from 64-bit
isc_hash_function to 32-bit isc_hash32() as the resulting value is
32-bit.
2023-09-19 19:51:51 +02:00
Ondřej Surý
26685ce5a8 Remove Raw and FDWatch type of socket statistics
The isc_sockstatscounter_raw* and isc_sockstatscounter_fdwatch was just
a dead code and those counters were not used anywhere.  Remove them.
2023-09-19 18:51:35 +02:00
Ondřej Surý
7aebbec653 Completely remove the Unix Domain Socket support from BIND 9
The Unix Domain Sockets support in BIND 9 has been completely disabled
since BIND 9.18 and it has been a fatal error since then.  Cleanup the
code and the documentation that suggest that Unix Domain Sockets are
supported.
2023-09-19 18:51:35 +02:00
Ondřej Surý
45fb84076d
Add assertion failure when adding to hashmap when iterating
When iterating the table, we can't add new nodes to the hashmap because
we can't assure that we are not adding the new node before the iterator.

This also applies to rehashing - which might be triggered by both
isc_hashmap_add() and isc_hashmap_delete(), but not
isc_hashmap_iter_delcurrent_next().
2023-09-19 11:18:04 +02:00
Mark Andrews
92a0d65a51
Fix hashmap iteration
When isc_hashmap_iter_delcurrent_next calls hashmap_delete_node
nodes from the front of the table could be added to the end of
the table resulting in them being returned twice.  Detect when
this is happening and prevent those nodes being returned twice
buy reducing the effective size of the table by one each time
it happens.
2023-09-19 11:18:03 +02:00
Ondřej Surý
6fd06c461b
Make dns_dispatch bound to threads
Instead of high number of dispatches (4 * named_g_udpdisp)[1], make the
dispatches bound to threads and make dns_dispatchset_t create a dispatch
for each thread (event loop).

This required couple of other changes:

1. The dns_dispatch_createudp() must be called on loop, so the isc_tid()
   is already initialized - changes to nsupdate and mdig were required.

2. The dns_requestmgr had only a single dispatch per v4 and v6.  Instead
   of using single dispatch, use dns_dispatchset_t for each protocol -
   this is same as dns_resolver.
2023-09-16 07:32:17 +02:00
Ondřej Surý
282c4709b8
Rewrite the QID lookup table to cds_lfht
Looking up unique message ID in the dns_dispatch has been using custom
hash tables.  Rewrite the custom hashtable to use cds_lfht API, removing
one extra lock in the cold-cache resolver hot path.
2023-09-16 07:32:17 +02:00
Ondřej Surý
e270266627
Refactor isc_hashmap to accept custom match function
Refactor isc_hashmap to allow custom matching functions.  This allows us
to have better tailored keys that don't require fixed uint8_t arrays,
but can be composed of more fields from the stored data structure.
2023-09-16 07:20:48 +02:00
Ondřej Surý
6ac286d4a3
Implement incremental version of isc_hash32 and isc_hash64
Add support for incremental hashing to the isc_hash unit, both 32-bit
and 64-bit incremental hashing is now supported.

This is commit second in series adding incremental hashing to libisc.
2023-09-12 16:17:06 +02:00
Ondřej Surý
4dd49ac528
Implement incremental version of SipHash 2-4 and HalfSipHash 2-4
When inserting items into hashtables (hashmaps), we might have a
fragmented key (as an example we might want to hash DNS name + class +
type).  We either need to construct continuous key in the memory and
then hash it en bloc, or incremental hashing is required.

This incremental version of SipHash 2-4 algorithm is the first building
block.

As SipHash 2-4 is often used in the hot paths, I've turned the
implementation into header-only version in the process.
2023-09-12 16:17:06 +02:00
Mark Andrews
820b0cceef
Limit isccc_cc_fromwire recursion depth
Named and rndc do not need a lot of recursion so the depth is
set to 10.
2023-09-07 19:46:19 +02:00
Mark Andrews
28adcf1831 Call ERR_clear_error on EVP_MD_fetch or EVP_##alg error 2023-09-06 00:28:56 +00:00
Ondřej Surý
d862f4bc64
Ignore jemalloc versions before 4.0.0
We now depend on explicitly creating memory arenas and disabling tcache
on those, and these features are not available with jemalloc < 4.
Instead of working around these issues, make the jemalloc >= 4.0.0 hard
requirement by looking for sdallocx() symbol that's only available from
that version.

The jemalloc < 4 was only used by RHEL 7 which is not supported since
BIND 9.19+.
2023-09-05 18:46:57 +02:00
Artem Boldariev
6e98b58d15
Add ability to set per jemalloc arena dirty and muzzy decay values
This commit adds couple of functions to change "dirty_decay_ms" and
"muzzy_decay_ms" settings on arenas associated with memory contexts.
2023-09-05 09:39:41 +02:00
Artem Boldariev
8550c52588
Make it possible to create memory contexts backed by jemalloc arenas
This commit extends the internal memory management middleware code in
BIND so that memory contexts backed by dedicated jemalloc arenas can
be created. A new function (isc_mem_create_arena()) is added for that.

Moreover, it extends the existing code so that specialised memory
contexts can be created easily, should we need that functionality for
other future purposes. We have achieved that by passing the flags to
the underlying jemalloc-related calls. See the above
isc_mem_create_arena(), which can serve as an example of this.

Having this opens up possibilities for creating memory contexts tuned
for specific needs.
2023-09-05 09:39:41 +02:00
Mark Andrews
9e2288208d Take ownership of pointer before freeing 2023-09-01 12:01:20 +10:00
Mark Andrews
4f790b6c58 Clear OpenSSL errors on TSL error paths 2023-09-01 12:01:20 +10:00
Mark Andrews
247422c69f Clear OpenSSL errors on SHA failures 2023-09-01 12:01:20 +10:00
Mark Andrews
4ea926934a Clear OpenSSL errors on EVP failures 2023-09-01 12:01:19 +10:00
Mark Andrews
6df53cdb87 Clear OpenSSL errors on EVP_PKEY_new failures 2023-09-01 12:01:19 +10:00
Ondřej Surý
d9048b3db1
Remove ISC_MEM_ZERO and isc_mem_*x() API
Use the new isc_mem_c*() calloc-like API for allocations that are
zeroed.

In turn, this also fixes couple of incorrect usage of the ISC_MEM_ZERO
for structures that need to be zeroed explicitly.

There are few places where isc_mem_cput() is used on structures with a
flexible member (or similar).
2023-08-31 22:08:35 +02:00
Ondřej Surý
8ac679a980
Remove ISC_MEM_ALIGN() memory flag
The ISC_MEM_ALIGN() was not used anywhere (except mem.c itself), so just
remove the unused flag.
2023-08-31 22:08:35 +02:00
Ondřej Surý
55c29b8d83
Do extra manual isc_mem_cget() conversions
Some of the cases weren't caught by the coccinelle and there were some
places where cget+memmove() could get converted to simple creget().
2023-08-31 22:08:35 +02:00
Ondřej Surý
89fcb6f897
Apply the isc_mem_cget semantic patch 2023-08-31 22:08:35 +02:00
Ondřej Surý
6272482113
Checked array allocation arithmetic with isc_mem_get and friends
Add new isc_mem_cget(), isc_mem_creget(), and isc_mem_cput() macros to
complement the isc_mem_callocate() (which works like calloc()).

The overflow checks are implemented as macros in the <isc/mem.h>, so
that the compiler can see that the element size is constant: it should
always be `sizeof(something)`.
2023-08-31 22:08:35 +02:00
Aram Sargsyan
a33dc921dc Fix a condition in isc_dnsstream_assembler_incoming()
Before calling isc_buffer_putmem(), there is a condition to check
that 'buf_size' is greater than 0. At this point 'buf_size' is
guaranteed to be greater than zero, so either the condition is
redundant, or 'unprocessed_size' should be checked instead, which
seems more logical, because calling isc_buffer_putmem() with
'unprocessed_size' being zero is not useful, although harmless.
2023-08-24 11:59:57 +00:00
Aram Sargsyan
9a271371d3 Handle cases when buf_size is zero
The isc_dnsstream_assembler_incoming() inline function expects that
when 'buf_size' is zero, then 'buf' must be NULL. The expectation is
not correct, because those values come from the libuv read callback,
and its documentation notes[1] that 'nread' ('buf_size' here) might
be 0, which does not indicate an error or EOF, but is equivalent to
EAGAIN or EWOULDBLOCK under read(2).

Change the isc_dnsstream_assembler_incoming() inline function to
remove the invalid expectation.

[1] https://docs.libuv.org/en/v1.x/stream.html#c.uv_read_cb
2023-08-24 11:59:57 +00:00
Tony Finch
52fcc9fc0f
Remove some unnecessary token pasting macrology
There used to be an extra layer of indirection in the memory functions
for certain dynamic linking scenarios. This involved variant spellings
like isc__mem and isc___mem. The isc___mem variants were removed in
commit 7de846977b so the token pasting is no longer needed and
only serves to obfuscate.
2023-08-23 14:49:15 +02:00
Ondřej Surý
2484a3702a
Add tracing probes to the isc_job unit
Add tracing probes to isc_job unit:

 * libisc:job_cb_before - before the job callback is called
 * libisc:job_cb_after - after the job callback is called
2023-08-21 18:39:53 +02:00
Ondřej Surý
dcd60215ac
Add tracing probes to the custom isc_rwlock implementation
Add tracing probes to ISC own isc_rwlock implementation to allow
fine-grained tracing.  The pthread rwlock already has probes inside
glibc, and it's difficult to add probes to headers included from the
other libraries.
2023-08-21 18:39:53 +02:00
Ondřej Surý
784d055809
Add support for User Statically Defined Tracing (USDT) probes
This adds support for User Statically Defined Tracing (USDT).  On
Linux, this uses the header from SystemTap and dtrace utility, but the
support is universal as long as dtrace is available.

Also add the required infrastructure to add probes to libisc, libdns and
libns libraries, where most of the probes will be.
2023-08-21 18:39:53 +02:00
Ondřej Surý
0c9cf8fabb
Limit the memory pool for the uvreqs
Set the number of maximum free items for the uvreq memory pool to 64.
2023-08-21 16:34:30 +02:00
Ondřej Surý
f36e118b9a
Limit the number of inactive handles kept for reuse
Instead of growing and never shrinking the list of the inactive
handles (to be reused mostly on the UDP connections), limit the number
of maximum number of inactive handles kept to 64.  Instead of caching
the inactive handles for all listening sockets, enable the caching on on
UDP listening sockets.  For TCP, the handles were cached for each
accepted socket thus reusing the handles only for long-standing TCP
connections, but not reusing the handles across different TCP streams.
2023-08-21 16:34:30 +02:00
Tony Finch
26e10e8fb5
Parse statschannel Content-Length: more carefully
A negative or excessively large Content-Length could cause a crash
by making `INSIST(httpd->consume != 0)` fail.
2023-08-21 14:14:18 +02:00
Tony Finch
c622b349e4
Apply the SET_IF_NOT_NULL() semantic patch
spatch --sp-file cocci/set_if_not_null.spatch --use-gitgrep --dir "." --include-headers --in-place
2023-08-15 12:21:41 +02:00
Tony Finch
0d6dcd217d
A SET_IF_NOT_NULL() macro for optional return values
The SET_IF_NOT_NULL() macro avoids a fair amount of tedious boilerplate,
checking pointer parameters to see if they're non-NULL and updating
them if they are.  The macro was already in the dns_zone unit, and this
commit moves it to the <isc/util.h> header.

I have included a Coccinelle semantic patch to use SET_IF_NOT_NULL()
where appropriate. The patch needs an #include in `openssl_shim.c`
in order to work.
2023-08-15 12:04:29 +02:00
Tony Finch
b22c87ca61
Fix a stack buffer overflow in the statistics channel
A long timestamp in an If-Modified-Since header could overflow a
fixed-size buffer.
2023-08-14 11:30:24 +02:00
Ondřej Surý
c1821ccf92
Call rcu_barrier() five times in the isc__mem_destroy()
Because rcu_barrier() needs to be called as many times as the number of
nested call_rcu() calls (call_rcu() calls made from call_rcu thread),
and currently there's no mechanism to detect whether there are more
call_rcu callbacks scheduled, we simply call the rcu_barrier() multiple
times.  The overhead is negligible and it prevents rare assertion
failures caused by the check for memory leaks in isc__mem_destroy().
2023-07-31 15:51:15 +02:00
Ondřej Surý
4ca64c1799
Pin dns_request to the associated loop
When dns_request was canceled via dns_requestmgr_shutdown() the cancel
event would be propagated on different loop (loop 0) than the loop where
request was created on.  In turn this would propagate down to isc_netmgr
where we require all the events to be called from the matching isc_loop.

Pin the dns_requests to the loops and ensure that all the events are
called on the associated loop.  This in turn allows us to remove the
hashed locks on the requests and change the single .requests list to be
a per-loop list for the request accounting.

Additionally, do some extra cleanup because some race condititions are
now not possible as all events on the dns_request are serialized.
2023-07-28 09:01:22 +02:00
Ondřej Surý
b6b0d81a36
Cleanup the __tsan_acquire/__tsan_release
With ThreadSanitizer support added to the Userspace RCU, we no longer
need to wrap the call_rcu and caa_container_of with
__tsan_{acquire,release} hints.  Remove the direct calls to
__tsan_{acquire,release} and the isc_urcu_{container,cleanup} macros.
2023-07-28 08:59:08 +02:00
Ondřej Surý
dc3e07572b
Workaround AddressSanitizer overzealous check
The cds_lfht_for_each_entry and cds_lfht_for_each_entry_duplicate macros
had a code that operated on the NULL pointer, at the end of the list it
was calling caa_container_of() on the NULL pointer in the init-clause
and iteration-expression, but the result wasn't actually used anywhere
because the cond-expression in the for loop has prevented executing
loop-statement.  This made AddressSanitizer notice the invalid operation
and rightfully complain.

This was reported to the upstream and fixed there.  Pull the upstream
fix into our <isc/urcu.h> header, so our CI checks pass.
2023-07-27 15:21:39 +02:00
Ondřej Surý
5321c474ea Refactor isc_stats_create() and its downstream users to return void
The isc_stats_create() can no longer return anything else than
ISC_R_SUCCESS.  Refactor isc_stats_create() and its variants in libdns,
libns and named to just return void.
2023-07-27 11:37:44 +02:00
Evan Hunt
e37d02905c
add isc_loop_now() to get consistent time
isc_loop_now() is a front-end to uv_now(), returning the start
time of the current event loop tick.
2023-07-19 15:32:21 +02:00
Evan Hunt
4db150437e
clean up unused dns_db methods
to reduce the amount of common code that will need to be shared
between the separated cache and zone database implementations,
clean up unused portions of dns_db.

the methods dns_db_dump(), dns_db_isdnssec(), dns_db_printnode(),
dns_db_resigned(), dns_db_expirenode() and dns_db_overmem() were
either never called or were only implemented as nonoperational stub
functions: they have now been removed.

dns_db_nodefullname() was only used in one place, which turned out
to be unnecessary, so it has also been removed.

dns_db_ispersistent() and dns_db_transfernode() are used, but only
the default implementation in db.c was ever actually called. since
they were never overridden by database methods, there's no need to
retain methods for them.

in rbtdb.c, beginload() and endload() methods are no longer defined for
the cache database, because that was never used (except in a few unit
tests which can easily be modified to use the zone implementation
instead).  issecure() is also no longer defined for the cache database,
as the cache is always insecure and the default implementation of
dns_db_issecure() returns false.

for similar reasons, hashsize() is no longer defined for zone databases.

implementation functions that are shared between zone and cache are now
prepended with 'dns__rbtdb_' so they can become nonstatic.

serve_stale_ttl is now a common member of dns_db.
2023-07-17 14:50:25 +02:00
Tony Finch
e2eaefbf7a
Check for overflow when resizing a heap
Ensure that the heap size calculations produce the correct answers,
and use `isc_mem_reget()` instead of calling `get` and `put`.

Closes #4122
2023-06-27 12:38:09 +02:00
Tony Finch
14f5b79c74
Check for overflow in jemalloc_shim
When compiled using a malloc that lacks an equivalent to sallocx(),
the jemalloc_shim adds a size prefix to each allocation. We must check
that this does not overflow.

Closes #4121
2023-06-27 12:38:09 +02:00
Tony Finch
92fcb7457c
Use isc_mem_callocate() in http_calloc()
Closes #4120
2023-06-27 12:38:09 +02:00
Tony Finch
81d73600c1
Add isc_mem_callocate() for safer array allocation
As well as clearing the fresh memory, `calloc()`-like functions must
ensure that the count and size do not overflow when multiplied.

Use `isc_mem_callocate()` in `isc__uv_calloc()`.
2023-06-27 12:38:09 +02:00
Tony Finch
7474cad4ad
Add <isc/overflow.h> for checked mul, add, and sub
The `ISC_OVERFLOW_XXX()` macros are usually wrappers around
`__builtin_xxx_overflow()`, with alternative implementations
for compilers that lack the builtins.

Replace the overflow checks in `isc/time.c` with the new macros.
2023-06-27 12:38:09 +02:00
Ondřej Surý
5bd9343c4e
Remove the explicit call_rcu thread creating and destruction
The free_all_cpu_call_rcu_data() call can consume hundreds of
milliseconds on shutdown.  Don't try to be smart and let the RCU library
handle this internally.
2023-06-27 07:59:00 +02:00
Tony Finch
e18ca83a3b Improve statschannel HTTP Connection: header protocol conformance
In HTTP/1.0 and HTTP/1.1, RFC 9112 section 9.6 says the last response
in a connection should include a `Connection: close` header, but the
statschannel server omitted it.

In an HTTP/1.0 response, the statschannel server can sometimes send a
`Connection: keep-alive` header when it is about to close the
connection. There are two ways:

If the first request on a connection is keep-alive and the second
request is not, then _both_ responses have `Connection: keep-alive`
but the connection is (correctly) closed after the second response.

If a single request contains

	Connection: close
	Connection: keep-alive

then RFC 9112 section 9.3 says the keep-alive header is ignored, but
the statschannel sends a spurious keep-alive in its response, though
it correctly closes the connection.

To fix these bugs, make it more clear that the `httpd->flags` are part
of the per-request-response state. The Connection: flags are now
described in terms of the effect they have instead of what causes them
to be set.
2023-06-15 17:03:09 +01:00
Ondřej Surý
a8e6c3b8f7
Make isc_result tables smaller
The isc_result_t enum was to sparse when each library code would skip to
next << 16 as a base.  Remove the huge holes in the isc_result_t enum to
make the isc_result tables more compact.

This change required a rewrite how we map dns_rcode_t to isc_result_t
and back, so we don't ever return neither isc_result_t value nor
dns_rcode_t out of defined range.
2023-06-15 15:32:04 +02:00
Midnight Veil
dd6acc1cac Translate POSIX errorcode EROFS to ISC_R_NOPERM
Report "permission denied" instead of "unexpected error"
when trying to update a zone file on a read-only file system.
2023-06-14 13:12:45 +01:00
Mark Andrews
e6e4ac05b8 Fix typo in synchronize_rcu macro (add h)
synchronize_rcu has not been used until now in BIND9 and there
was a typo in the define (a 'h' was missing).
2023-06-06 08:10:09 +10:00
Ondřej Surý
f760ee3f8c
Disable URCU inlining if inlined rcu_dereference() fails to compile
In some cases, the inlined version rcu_dereference() would not compile
when working on pointer to opaque struct (namely Ubuntu Jammy).  Detect
such condition in the autoconf and disable the inlining of the small
functions if it breaks the build.
2023-06-01 16:51:38 +02:00
Mark Andrews
ac2e0bc3ff Move isc_mem_put to after node is checked for equality
isc_mem_put NULL's the pointer to the memory being freed.  The
equality test 'parent->r == node' was accidentally being turned
into a test against NULL.
2023-05-29 01:40:57 +00:00
Evan Hunt
512e5e786b don't set SHUTTINGDOWN until after calling the request callbacks
if we set ISC_HTTPDMGR_SHUTTINGDOWN in the http manager before
calling the pending request callbacks, it can trigger an assertion.
2023-05-27 00:41:37 +00:00
Michal Nowak
1fe5c008d6
Ensure "wrap" variable is non-NULL
RUNTIME_CHECK on the "wrap" variable avoids possible NULL dereference:

    thread.c: In function 'thread_wrap':
    thread.c:60:15: error: dereference of possibly-NULL 'wrap' [CWE-690] [-Werror=analyzer-possible-null-dereference]
       60 |         *wrap = (struct thread_wrap){

The RUNTIME_CHECK was there before
7d1ceaf35d.
2023-05-19 11:02:59 +02:00
Michał Kępień
6029010dd2
Remove <isc/cmocka.h>
The last use of the cmocka_add_test_byname() helper macro was removed in
commit 63fe9312ff.  Remove the
<isc/cmocka.h> header that defines it.
2023-05-18 15:12:23 +02:00
Tony Finch
c319ccd4c9 Fixes for liburcu-qsbr
Move registration and deregistration of the main thread from
`isc_loopmgr_run()` into `isc__initialize()` / `isc__shutdown()`:
liburcu-qsbr fails an assertion if we try to use it from an
unregistered thread, and we need to be able to use it when the
event loops are not running.

Use `rcu_assign_pointer()` and `rcu_dereference()` in qp-trie
transactions so that they properly mark threads as online. The
RCU-protected pointer is no longer declared atomic because
liburcu does not (yet) use standard C atomics.

Fix the definition of `isc_qsbr_rcu_dereference()` to return
the referenced value, and to call the right function inside
liburcu.

Change the thread sanitizer suppressions to match any variant of
`rcu_*_barrier()`
2023-05-15 20:49:42 +00:00
Tony Finch
afae41aa40
Check the return value from uv_async_send()
An omission pointed out by the following report from Coverity:

    /lib/isc/loop.c: 483 in isc_loopmgr_pause()
    >>>     CID 455002:  Error handling issues  (CHECKED_RETURN)
    >>>     Calling "uv_async_send" without checking return value (as is done elsewhere 5 out of 6 times).
    483     		uv_async_send(&loop->pause_trigger);
2023-05-15 18:52:04 +01:00
Evan Hunt
b4ac7faee9 allow streamdns read to resume after timeout
when reading on a streamdns socket failed due to timeout, but
the dispatch was still waiting for other responses, it would
resume reading by calling isc_nm_read() again. this caused
an assertion because the socket was already reading.

we now check that either the socket is reading, or that it was
already reading on the same handle.
2023-05-13 23:31:45 -07:00
Tony Finch
fc770a8bd0
Remove the now-unused ISC_STACK
We are using the liburcu concurrent data structures instead.
2023-05-12 20:49:43 +01:00
Tony Finch
f11cc83142
Use per-CPU RCU helper threads
Create and free per-CPU helper threads from the main thread and tell
thread sanitizer to suppress leaking threads. (We are not leaking
threads ourselves and we can safely ignore the Userspace-RCU thread
leaks.)
2023-05-12 20:48:31 +01:00
Tony Finch
c377e0a9e3
Help thread sanitizer to cope with liburcu
All the places the qp-trie code was using `call_rcu()` needed
`__tsan_release()` and `__tsan_acquire()` annotations, so
add a couple of wrappers to encapsulate this pattern.

With these wrappers, the tests run almost clean under thread
sanitizer. The remaining problems are due to `rcu_barrier()`
which can be suppressed using `.tsan-suppress`. It does not
suppress the whole of `liburcu`, because we would like thread
sanitizer to detect problems in `call_rcu()` callbacks, which
are called from `liburcu`.

The CI jobs have been updated to use `.tsan-suppress` by
default, except for a special-case job that needs the
additional suppressions in `.tsan-suppress-extra`.

We might be able to get rid of some of this after liburcu gains
support for thread sanitizer.

Note: the `rcu_barrier()` suppression is not entirely effective:
tsan sometimes reports races that originate inside `rcu_barrier()`
but tsan has discarded the stack so it does not have the
information required to suppress the report. These "races" can
be made much easier to reproduce by adding `atexit_sleep_ms=1000`
to `TSAN_OPTIONS`. The problem with tsan's short memory can be
addressed by increasing `history_size`: when it is large enough
(6 or 7) the `rcu_barrier()` stack usually survives long enough
for suppression to work.
2023-05-12 20:48:31 +01:00
Tony Finch
05ca11e122
Remove isc_qsbr (we are using liburcu instead)
This commit breaks the qp-trie code.
2023-05-12 20:48:31 +01:00
Tony Finch
cd0795beea
Slightly more sanitary thread dispatch
Tell thread sanitizer that the thread wrapper is released before
passing it to a new thread.
2023-05-12 20:48:31 +01:00
Tony Finch
2e0c954806
Wait for RCU to finish before destroying a memory context
Memory reclamation by `call_rcu()` is asynchronous, so during shutdown
it can lose a race with the destruction of its memory context. When we
defer memory reclamation, we need to attach to the memory context to
indicate that it is still in use, but that is not enough to delay its
destruction. So, call `rcu_barrier()` in `isc_mem_destroy()` to wait
for pending RCU work to finish before proceeding to destroy the memory
context.
2023-05-12 20:48:31 +01:00
Tony Finch
4f97a679f0
A macro for the size of a struct with a flexible array member
It can be fairly long-winded to allocate space for a struct with a
flexible array member: in general we need the size of the struct, the
size of the member, and the number of elements. Wrap them all up in a
STRUCT_FLEX_SIZE() macro, and use the new macro for the flexible
arrays in isc_ht and dns_qp.
2023-05-12 20:48:31 +01:00
Ondřej Surý
fd3522c37b
Add Userspace-RCU to global CFLAGS and LIBS
The Userspace-RCU headers are now needed for more parts of the libisc
and libdns, thus we need to add it globally to prevent compilation
failures on systems with non-standard Userspace-RCU installation path.
2023-05-12 14:16:25 +02:00
Ondřej Surý
00f1823366
Change the isc_quota API to use cds_wfcqueue internally
The isc_quota API was using locked list of isc_job_t objects to keep the
waiting TCP accepts.  Change the isc_quota implementation to use
cds_wfcqueue internally - the enqueue is wait-free and only dequeue
needs to be locked.
2023-05-12 14:16:25 +02:00
Ondřej Surý
7b1d985de2
Change the isc_async API to use cds_wfcqueue internally
The isc_async API was using lock-free stack (where enqueue operation was
not wait-free).  Change the isc_async to use cds_wfcqueue internally -
enqueue and splice (move the queue members from one list to another) is
nonblocking and wait-free.
2023-05-12 14:16:25 +02:00
Ondřej Surý
7220851f67
Replace glue_cache hashtable with direct link in rdatasetheader
Instead of having a global hashtable with a global rwlock for the GLUE
cache, move the glue_list directly into rdatasetheader and use
Userspace-RCU to update the pointer when the glue_list is empty.

Additionally, the cached glue_lists needs to be stored in the RBTDB
version for early cleaning, otherwise the circular dependencies between
nodes and glue_lists will prevent nodes to be ever cleaned up.
2023-05-12 13:25:39 +02:00
Michal Nowak
31935a3537
Disable ASAN in nsupdate for fatal cases
Clang 16 LeakSanitizer reports a memory leak when dns_request_create()
returned a TLS error in the nsupdate system test. While technically a
memory leak on error handling, it's not a problem because the program is
immediately terminated; nsupdate is not expected to run for a prolonged
time.
2023-05-11 13:39:51 +02:00
Mark Andrews
9fcd42c672 Re-write remove_old_tsversions and greatest_version
Stop deliberately breaking const rules by copying file->name into
dirbuf and truncating it there.  Handle files located in the root
directory properly. Use unlinkat() from POSIX 200809.
2023-05-03 09:12:34 +02:00
Matthijs Mekking
70629d73da Fix purging old log files with absolute file path
Removing old timestamp or increment versions of log backup files did
not work when the file is an absolute path: only the entry name was
provided to the file remove function.

The dirname was also bogus, since the file separater was put back too
soon.

Fix these issues to make log file rotation work when the file is
configured to be an absolute path.
2023-05-03 09:12:11 +02:00
Tony Finch
7d1ceaf35d
Move per-thread RCU setup into isc_thread
All the per-loop `libuv` setup remains in `isc_loop`, but the per-thread
RCU setup is moved to `isc_thread` alongside the other per-thread setup.
This avoids repeating the per-thread setup for `call_rcu()` helpers,
and explains a little better why some parts of the per-thread setup
is missing for `call_rcu()` helpers.

This also removes the per-loop `call_rcu()` helpers as we refactored the
isc__random_initialize() in the previous commit.
2023-04-27 12:38:53 +02:00
Ondřej Surý
65021dbf52
Move the isc_random API initialization to the thread_local variable
Instead of writing complicated wrappers for every thread, move the
initialization back to isc_random unit and check whether the random seed
was initialized with a thread_local variable.

Ensure that isc_entropy_get() returns a non-zero seed.

This avoids problems with thread sanitizer tests getting stuck in an
infinite loop.
2023-04-27 12:38:53 +02:00
Tony Finch
e0248bf60f
Simplify isc_thread a little
Remove the `isc_threadarg_t` and `isc_threadresult_t`
typedefs which were unhelpful disguises for `void *`,
and free the dummy jemalloc allocation sooner.
2023-04-27 12:38:53 +02:00
Tony Finch
06f534fa69
Avoid spurious compilation failures in liburcu headers
When liburcu is not installed from a system package, its headers are
not treated as system headers by the compiler, so BIND's -Werror and
other warning options take effect. The liburcu headers have a lot
of inline functions, some of which do not use all their arguments,
which BIND's build treats as an error.
2023-04-27 12:38:53 +02:00
Ondřej Surý
c2c907d728
Improve the Userspace RCU integration
This commit allows BIND 9 to be compiled with different flavours of
Userspace RCU, and improves the integration between Userspace RCU and
our event loop:

- In the RCU QSBR, the thread is put offline when polling and online
  when rcu_dereference, rcu_assign_pointer (or friends) are called.

- In other RCU modes, we check that we are not reading when reaching the
  quiescent callback in the event loop.

- We register the thread before uv_work_run() callback is called and
  after it has finished.  The rcu_(un)register_thread() has a large
  overhead, but that's fine in this case.
2023-04-27 12:38:53 +02:00
Ondřej Surý
58663574b9
Use server socket to log TCP accept failures
The accept_connection() could detach from the child socket on a failure,
so we need to keep and use the server socket for logging the accept
failures.
2023-04-27 11:07:57 +02:00
Ondřej Surý
27ad3a65f9
Fix potential UAF when shutting down isc_httpd
Use the ISC_LIST_FOREACH_SAFE() macro to safely walk the running https
and shut them down in a manner safe from deletion.
2023-04-25 08:16:46 +02:00
Ondřej Surý
ae997d9e21
Add ISC_LIST_FOREACH(_SAFE) macros
There's a recurring pattern walking the ISC_LISTs that just repeats over
and over.  Add two macros:

 * ISC_LIST_FOREACH(list, elt, link) - walk the static list
 * ISC_LIST_FOREACH_SAFE(list, elt, link, next) - walk the list in
   a manner that's safe against list member deletions
2023-04-25 08:16:46 +02:00
Evan Hunt
0393b54afb
add a result code for ENOPROTOOPT, EPROTONOSUPPORT
there was no isc_result_t value for invalid protocol errors
that could be returned from libuv.
2023-04-21 12:42:10 +02:00
Ondřej Surý
b497e90179
Add isc_spinlock unit with shim pthread_spin implementation
The spinlock is small (atomic_uint_fast32_t at most), lightweight
synchronization primitive and should only be used for short-lived and
most of the time a isc_mutex should be used.

Add a isc_spinlock unit which is either (most of the time) a think
wrapper around pthread_spin API or an efficient shim implementation of
the simple spinlock.
2023-04-21 12:10:02 +02:00
Ondřej Surý
3b10814569
Fix the streaming read callback shutdown logic
When shutting down TCP sockets, the read callback calling logic was
flawed, it would call either one less callback or one extra.  Fix the
logic in the way:

1. When isc_nm_read() has been called but isc_nm_read_stop() hasn't on
   the handle, the read callback will be called with ISC_R_CANCELED to
   cancel active reading from the socket/handle.

2. When isc_nm_read() has been called and isc_nm_read_stop() has been
   called on the on the handle, the read callback will be called with
   ISC_R_SHUTTINGDOWN to signal that the dormant (not-reading) socket
   is being shut down.

3. The .reading and .recv_read flags are little bit tricky.  The
   .reading flag indicates if the outer layer is reading the data (that
   would be uv_tcp_t for TCP and isc_nmsocket_t (TCP) for TLSStream),
   the .recv_read flag indicates whether somebody is interested in the
   data read from the socket.

   Usually, you would expect that the .reading should be false when
   .recv_read is false, but it gets even more tricky with TLSStream as
   the TLS protocol might need to read from the socket even when sending
   data.

   Fix the usage of the .recv_read and .reading flags in the TLSStream
   to their true meaning - which mostly consist of using .recv_read
   everywhere and then wrapping isc_nm_read() and isc_nm_read_stop()
   with the .reading flag.

4. The TLS failed read helper has been modified to resemble the TCP code
   as much as possible, clearing and re-setting the .recv_read flag in
   the TCP timeout code has been fixed and .recv_read is now cleared
   when isc_nm_read_stop() has been called on the streaming socket.

5. The use of Network Manager in the named_controlconf, isccc_ccmsg, and
   isc_httpd units have been greatly simplified due to the improved design.

6. More unit tests for TCP and TLS testing the shutdown conditions have
   been added.

Co-authored-by: Ondřej Surý <ondrej@isc.org>
Co-authored-by: Artem Boldariev <artem@isc.org>
2023-04-20 12:58:32 +02:00
Ondřej Surý
f677cf6b73
Remove unused netmgr->worker->sendbuf
By inspecting the code, it was discovered that .sendbuf member of the
isc__nm_networker_t was unused and just consuming ~64k per worker.
Remove the member and the association allocation/deallocation.
2023-04-14 16:20:14 +02:00
Ondřej Surý
1715cad685
Refactor the isc_quota code and fix the quota in TCP accept code
In e185412872, the TCP accept quota code
became broken in a subtle way - the quota would get initialized on the
first accept for the server socket and then deleted from the server
socket, so it would never get applied again.

Properly fixing this required a bigger refactoring of the isc_quota API
code to make it much simpler.  The new code decouples the ownership of
the quota and acquiring/releasing the quota limit.

After (during) the refactoring it became more clear that we need to use
the callback from the child side of the accepted connection, and not the
server side.
2023-04-12 14:10:37 +02:00
Ondřej Surý
1768522045
Convert tls_send() callback to use isc_job_run()
The tls_send() was already using uvreq; convert this to use more direct
isc_job_run() - the on-loop no-allocation method.
2023-04-12 14:10:37 +02:00
Ondřej Surý
1302345c93
Convert isc__nm_http_send() from isc_async_run() to isc_job_run()
The isc__nm_http_send() was already using uvreq; convert this to use
more direct isc_job_run() - the on-loop no-allocation method.
2023-04-12 14:10:37 +02:00
Ondřej Surý
3adba8ce23
Use isc_job_run() for reading from StreamDNS socket
Change the reading in the StreamDNS code to use isc_job_run() instead of
using isc_async_run() for less allocations and more streamlined
execution.
2023-04-12 14:10:37 +02:00
Ondřej Surý
74cbf523b3
Run closehandle_cb on run queue instead of async queue
Instead of using isc_async_run() when closing StreamDNS handle, add
isc_job_t member to the isc_nmhandle_t structure and use isc_job_run()
to avoid allocation/deallocation on the StreamDNS hot-path.
2023-04-12 14:10:37 +02:00
Ondřej Surý
d27f6f2d68
Accept overquota TCP connection on local thread if possible
If the quota callback is called on a thread matching the socket, call
the TCP accept function directly instead of using isc_async_run() which
allocates-deallocates memory.
2023-04-12 14:10:37 +02:00
Ondřej Surý
0a468e7c9e
Make isc_tid() a header-only function
The isc_tid() function is often called on the hot-path and it's the only
function is to return thread_local variable, make the isc_tid() function
a header-only to save several function calls during query-response
processing.
2023-04-12 14:10:37 +02:00
Tony Finch
3405b43fe9
Fix a division by zero bug in isc_histo
This can occur when calculating the standard deviation of an empty
histogram.
2023-04-05 23:29:21 +02:00
Mark Andrews
bf58c10dce Silence NULL pointer dereferene false positive
Only attempt to digest 'in' if it is non NULL.  This will prevent
false positives about NULL pointer dereferences against 'in' and
should also speed up the processing.
2023-04-03 13:32:40 +00:00
Artem Boldariev
2b3a3c21dc Stream DNS: avoid memory copying/buffer resizing when reading data
This commit optimises isc_dnsstream_assembler_t in such a way that
memory copying and reallocation are avoided when receiving one or more
complete DNS messages at once. We try to handle the data from the
messages directly, without storing them in an intermediate memory
buffer.
2023-04-03 13:31:46 +00:00
Tony Finch
cd0e7f853a Simplify histogram quantiles
The `isc_histosummary_t` functions were written in the early days of
`hg64` and carried over when I brought `hg64` into BIND. They were
intended to be useful for graphing cumulative frequency distributions
and the like, but in practice whatever draws charts is better off with
a raw histogram export. Especially because of the poor performance of
the old functions.

The replacement `isc_histo_quantiles()` function is intended for
providing a few quantile values in BIND's stats channel, when the user
does not want the full histogram. Unlike the old functions, the caller
provides all the query fractions up-front, so that the values can be
found in a single scan instead of a scan per value. The scan is from
larger values to smaller, since larger quantiles are usually more
interesting, so the scan can bail out early.
2023-04-03 12:08:05 +01:00
Tony Finch
bc2389b828 Add per-thread sharded histograms for heavy loads
Although an `isc_histo_t` is thread-safe, it can suffer
from cache contention under heavy load. To avoid this,
an `isc_histomulti_t` contains a histogram per thread,
so updates are local and low-contention.
2023-04-03 12:08:05 +01:00
Tony Finch
82213a48cf Add isc_histo for histogram statistics
This is an adaptation of my `hg64` experiments for use in BIND.

As well as renaming everything according to ISC style, I have
written some more extensive tests that ensure the edge cases are
correct and the fenceposts are in the right places.

I have added utility functions for working with precision in terms of
decimal significant figures as well as this code's native binary.
2023-04-03 12:08:05 +01:00
Ondřej Surý
3a6a0fa867 Replace DE_CONST(k, v) with v = UNCONST(k) macro
Replace the complicated DE_CONST macro that required union with much
simple reference-dereference trick in the UNCONST() macro.
2023-04-03 10:25:56 +00:00
Ondřej Surý
4ec9c4a1db Cleanup the last Windows / MSC ifdefs and comments
Cleanup the remnants of MS Compiler bits from <isc/refcount.h>, printing
the information in named/main.c, and cleanup some comments about Windows
that no longer apply.

The bits in picohttpparser.{h,c} were left out, because it's not our
code.
2023-04-03 09:06:20 +00:00
Mark Andrews
2abd6c7ab4 Handle MD5 not being supported by lib crypto
When initialising the message digests in lib/isc/md.c no
longer assume that the initialisation cannot fail.
2023-04-03 12:44:27 +10:00
Mark Andrews
a3172c8f9c Don't check for OPENSSL_cleanup failures by default
OPENSSL_cleanup is supposed to free all remaining memory in use
provided the application has cleaned up properly.  This is not the
case on some operating systems.  Silently ignore memory that is
freed after OPENSSL_cleanup has been called.
2023-04-03 12:44:27 +10:00
Mark Andrews
e029803704 Handle fatal and FIPS provider interactions
When fatal is called we may be holding memory allocated by OpenSSL.
This may result in the reference count for the FIPS provider not
going to zero and the shared library not being unloaded during
OPENSSL_cleanup.  When the shared library is ultimately unloaded,
when all remaining dynamically loaded libraries are freed, we have
already destroyed the memory context we where using to track memory
leaks / late frees resulting in INSIST being called.

Disable triggering the INSIST when fatal has being called.
2023-04-03 12:44:27 +10:00
Mark Andrews
5a2e82557e Define isc_fips_mode() and isc_fips_set_mode()
isc_fips_mode() determines if the process is running in FIPS mode

isc_fips_set_mode() sets the process into FIPS mode
2023-04-03 12:05:28 +10:00
Tony Finch
555690a3c9 Simplify thread spawning
The `isc_trampoline` module had a lot of machinery to support stable
thread IDs for use by hazard pointers. But the hazard pointer code
is gone, and the `isc_loop` module now has its own per-loop thread
IDs.

The trampoline machinery seems over-complicated for its remaining
tasks, so move the per-thread initialization into `isc/thread.c`,
and delete the rest.
2023-03-31 17:21:52 +01:00
Ondřej Surý
a5f5f68502
Refactor isc_time_now() to return time, and not result
The isc_time_now() and isc_time_now_hires() were used inconsistently
through the code - either with status check, or without status check,
or via TIME_NOW() macro with RUNTIME_CHECK() on failure.

Refactor the isc_time_now() and isc_time_now_hires() to always fail when
getting current time has failed, and return the isc_time_t value as
return value instead of passing the pointer to result in the argument.
2023-03-31 15:02:06 +02:00
Ondřej Surý
263d232c79 Replace isc_fsaccess API with more secure file creation
The isc_fsaccess API was created to hide the implementation details
between POSIX and Windows APIs.  As we are not supporting the Windows
APIs anymore, it's better to drop this API used in the DST part.

Moreover, the isc_fsaccess was setting the permissions in an insecure
manner - it operated on the filename, and not on the file descriptor
which can lead to all kind of attacks if unpriviledged user has read (or
even worse write) access to key directory.

Replace the code that operates on the private keys with code that uses
mkstemp(), fchmod() and atomic rename() at the end, so at no time the
private key files have insecure permissions.
2023-03-31 12:52:59 +00:00
Ondřej Surý
aca7dd3961 Add isc_os_umask() function to get current umask
As it's impossible to get the current umask without modifying it at the
same time, initialize the current umask at the program start and keep
the loaded value internally.  Add isc_os_umask() function to access the
starttime umask.
2023-03-31 12:52:59 +00:00
Ondřej Surý
4bd6096d4b
Remove isc_stdtime_get() macro
Now that isc_stdtime_get() macro is unused, remove it from the header
file.
2023-03-31 13:33:16 +02:00
Ondřej Surý
46f06c1d6e
Apply the semantic patch to remove isc_stdtime_get()
This is a simple replacement using the semantic patch from the previous
commit and as added bonus, one removal of previously undetected unused
variable in named/server.c.
2023-03-31 13:32:56 +02:00
Ondřej Surý
c11af0448a
Provide isc_stdtime_now(void) that returns value
As isc_stdtime_get() cannot fail, the API seems to be too complicated,
add new isc_stdtime_now() that returns the unixtime as a return value.
2023-03-31 13:16:28 +02:00
Tony Finch
194621a74e Fix a crash when dig or host receive a signal
When the loopmanager is shutting down following a signal,
`dig` and `host` should stop cleanly. Before this commit
they were oblivious to ISC_R_SHUTTINGDOWN.

The `isc_signal` callbacks now report this kind of mistake
with a stack backtrace.
2023-03-31 09:52:54 +00:00
Ondřej Surý
2c0a9575d7
Replace __attribute__((unused)) with ISC_ATTR_UNUSED attribute macro
Instead of marking the unused entities with UNUSED(x) macro in the
function body, use a `ISC_ATTR_UNUSED` attribute macro that expans to
C23 [[maybe_unused]] or __attribute__((__unused__)) as fallback.
2023-03-30 23:29:25 +02:00
Ondřej Surý
1176bf0552
Use C23 attributes if available, add ISC_ATTR_UNUSED
Use C23 attribute styles if available:

  * Add new ISC_ATTR_UNUSED attribute macro that either expands to C23's
    [[maybe_unused]] or __attribute__((__unused__));

  * Add default expansion of the `noreturn` to [[noreturn]] if available;

  * Move the FALLTHROUGH from <isc/util.h> to <isc/attributes.h>
2023-03-30 22:43:39 +02:00
Artem Boldariev
43e21d653f TLS Stream: remove incorrect/obsolete INSIST()s from tls_do_bio()
With the changes to tls_try_handshake() made in
2846888c57 there are some incorrect
INSISTS() related to handshake handling which better to be removed.
2023-03-30 18:21:50 +03:00
Ondřej Surý
2846888c57
Attach the accept "client" socket to .listener member of the socket
When accepting a TCP connection in the higher layers (tlsstream,
streamdns, and http) attach to the socket the connection was accepted
on, and use this socket instead of the parent listening socket.

This has an advantage - accessing the sock->listener now doesn't break
the thread boundaries, so we can properly check whether the socket is
being closed without requiring .closing member to be atomic_bool.
2023-03-30 16:10:08 +02:00
Ondřej Surý
45365adb32
Convert sock->active to non-atomic variable, cleanup rchildren
The last atomic_bool variable sock->active was converted to non-atomic
bool by properly handling the listening socket case where we were
checking parent socket instead of children sockets.

This is no longer necessary as we properly set the .active to false on
the children sockets.

Additionally, cleanup the .rchildren - the atomic variable was used for
mutex+condition to block until all children were listening, but that's
now being handled by a barrier.

Finally, just remove dead .self and .active_child_connections members of
the netmgr socket.
2023-03-30 16:10:08 +02:00
Ondřej Surý
e1a4572fd6
Refactor the use of atomics in netmgr
Now that everything runs on their own loop and we don't cross the thread
boundaries (with few exceptions), most of the atomic_bool variables used
to track the socket state have been unatomicized because they are always
accessed from the matching thread.

The remaining few have been relaxed: a) the sock->active is now using
acquire/release memory ordering; b) the various global limits are now
using relaxed memory ordering - we don't really care about the
synchronization for those.
2023-03-30 16:10:08 +02:00
Ondřej Surý
f5fc224af3
Add isc_async_current() macro to run job on current loop
Previously, isc_job_run() could have been used to run the job on the
current loop and the isc_job_run() would take care of allocating and
deallocating the job.  After the change in this MR, the isc_job_run()
is more complicated to use, so we introduce the isc_async_current()
macro to suplement isc_async_run() when we need to run the job on the
current loop.
2023-03-30 16:07:41 +02:00
Ondřej Surý
1844590ad9
Refactor isc_job_run to not-make any allocations
Change the isc_job_run() to not-make any allocations.  The caller must
make sure that it allocates isc_job_t - usually as part of the argument
passed to the callback.

For simple jobs, using isc_async_run() is advised as it allocates its
own separate isc_job_t.
2023-03-30 16:00:52 +02:00
Ondřej Surý
639d5065a3
Refactor the isc__nm_uvreq_t to have idle callback
Change the isc__nm_uvreq_t to have the idle callback as a separate
member as we always need to use it to properly close the uvreq.

Slightly refactor uvreq_put and uvreq_get to remove the unneeded
arguments - in uvreq_get(), we always use sock->worker, and in
uvreq_put, we always use req->sock, so there's not reason to pass those
extra arguments.
2023-03-29 21:16:44 +02:00
Ondřej Surý
476198f26c
Use uv_idle API for calling asynchronous connect/read/send callback
Instead of using isc_job_run() that's quite heavy as it allocates memory
for every new job, add uv_idle_t to uvreq union, and use uv_idle API
directly to execute the connect/read/send callback without any
additional allocations.
2023-03-29 21:16:44 +02:00
Ondřej Surý
670df3da74
Re-add the comment to streamdns_readmore()
Put the comment back, so it's more obvious that we are only restarting
timer when there's a last handle attached to the socket; there has to be
always at least one.
2023-03-29 21:16:44 +02:00
Tony Finch
295e7c80e8 Ad-hoc backtrace logging with isc_backtrace_log()
It's sometimes helpful to get a quick idea of the call stack when
debugging. This change factors out the backtrace logging from named's
fatal error handler so that it's easy to use in other places too.
2023-03-29 10:47:53 +00:00
Ondřej Surý
665f8bb78d Fix isc_nm_httpconnect to check for shuttindown condition
The isc_nm_httpconnect() would succeed even if the netmgr would be
already shuttingdown.  This has been fixed and the unit test has been
updated to cope with fact that the handle would be NULL when
isc_nm_httpconnect() returns with an error.
2023-03-29 05:49:57 +00:00
Evan Hunt
fe7ed2ba24 update stream sockets with bound address/port
when isc_nm_listenstreamdns() is called with a local port of 0,
a random port is chosen. call uv_getsockname() to determine what
the port is as soon as the socket is bound, and add a function
isc_nmsocket_getaddr() to retrieve it, so that the caller can
connect to the listening socket. this will be used in cases
where the same process is acting as both client and server.
2023-03-28 12:38:28 -07:00
Evan Hunt
4ad95e0567 add ns_interface_create()
add a public function ns_interface_create() allowing the caller
to set up a listening interface directly without having to set
up listen-on and scan network interfaces.
2023-03-28 12:38:28 -07:00
Ondřej Surý
a2e4a6883f
Remove the netievent remnants
After removing all functional netievents, remove what has been left from
the netievents.  This also includes leftovers from previous refactorings.
2023-03-24 07:58:53 +01:00
Ondřej Surý
6b107c3fbc
Convert stopping generic socket children to to isc_async callback
Simplify the stopping of the generic socket children by using the
isc_async API from the loopmgr instead of using the asychronous
netievent mechanism in the netmgr.
2023-03-24 07:58:53 +01:00
Ondřej Surý
744e93b70d
Convert setting of the TLS contexts to to isc_async callback
Simplify the setting of the TLS contexts by using the isc_async API
from the loopmgr instead of using the asychronous netievent mechanism in
the netmgr.
2023-03-24 07:58:53 +01:00
Ondřej Surý
7ddc49d66a
Convert canceling StreamDNS socket to to isc_async callback
Simplify the canceling of the StreamDNS socket by using the isc_async API
from the loopmgr instead of using the asychronous netievent mechanism in
the netmgr.
2023-03-24 07:58:53 +01:00
Ondřej Surý
2185dc75f0
Convert reading from StreamDNS socket to to isc_async callback
Simplify the reading from the StreamDNS socket by using the isc_async API
from the loopmgr instead of using the asychronous netievent mechanism in
the netmgr.
2023-03-24 07:58:52 +01:00
Ondřej Surý
4a4bd68777
Convert setting of the DoH endpoints to to isc_async callback
Simplify the setting of the DoH endpoints by using the isc_async API
from the loopmgr instead of using the asychronous netievent mechanism in
the netmgr.
2023-03-24 07:58:52 +01:00
Ondřej Surý
115160de73
Convert sending on the DoH socket to to isc_async callback
Simplify the sending on the DoH socket by using the isc_async API
from the loopmgr instead of using the asychronous netievent mechanism in
the netmgr.
2023-03-24 07:58:52 +01:00
Ondřej Surý
a321d3f419
Convert closing the DoH socket to to isc_async callback
Simplify the closing the DoH socket by using the isc_async API
from the loopmgr instead of using the asychronous netievent mechanism in
the netmgr.
2023-03-24 07:58:52 +01:00
Ondřej Surý
8c48c51f71
Convert doing the TLS IO to to isc_async callback
Simplify the doing the TLS IO by using the isc_async API
from the loopmgr instead of using the asychronous netievent mechanism in
the netmgr.
2023-03-24 07:58:52 +01:00
Ondřej Surý
3d4d099ac8
Cleanup already defunct tlsconnect netievent
The netievent used for TLS connect was already defunct, just cleanup the
cruft.
2023-03-24 07:58:52 +01:00
Ondřej Surý
35b4ef0a08
Convert sending on the TLS socket to to isc_async callback
Simplify the sending on the TLS socket by using the isc_async API
from the loopmgr instead of using the asychronous netievent mechanism in
the netmgr.
2023-03-24 07:58:52 +01:00
Ondřej Surý
4f27b14cd1
Convert closing the TLS socket to to isc_async callback
Simplify the closing the TLS socket by using the isc_async API
from the loopmgr instead of using the asychronous netievent mechanism in
the netmgr.
2023-03-24 07:58:52 +01:00
Ondřej Surý
e185412872
Convert accepting new TCP connection to to isc_async callback
Simplify the acception the new TCP connection by using the isc_async API
from the loopmgr instead of using the asychronous netievent mechanism in
the netmgr.
2023-03-24 07:58:52 +01:00
Ondřej Surý
1baffb6ff5
Convert canceling UDP socket to to isc_async callback
Simplify the canceling of the UDP socket by using the isc_async API
from the loopmgr instead of using the asychronous netievent mechanism in
the netmgr.
2023-03-24 07:58:52 +01:00
Ondřej Surý
4419848efd
Convert stopping TCP children to to isc_async callback
Simplify the stopping of the TCP children by using the isc_async API
from the loopmgr instead of using the asychronous netievent mechanism in
the netmgr.
2023-03-24 07:58:52 +01:00
Ondřej Surý
e1524f2b4e
Convert starting TCP children to to isc_async callback
Simplify the starting of the TCP children by using the isc_async API
from the loopmgr instead of using the asychronous netievent mechanism in
the netmgr.
2023-03-24 07:58:52 +01:00
Ondřej Surý
8cb4cfd9db
Convert stopping UDP children to to isc_async callback
Simplify the stopping of the UDP children by using the isc_async API
from the loopmgr instead of using the asychronous netievent mechanism in
the netmgr.
2023-03-24 07:58:52 +01:00
Ondřej Surý
b25dd5eaf5
Convert starting UDP children to to isc_async callback
Simplify the starting of the UDP children by using the isc_async API
from the loopmgr instead of using the asychronous netievent mechanism in
the netmgr.
2023-03-24 07:58:52 +01:00
Ondřej Surý
5a43be0775
Simplify netmgr active handles accounting
The active handles accounting was both using atomic counter and ISC_LIST
to keep track of active handles.  Remove the atomic counter that was in
use before the ISC_LIST was added for better tracking of the handles
attached to the socket.
2023-03-24 07:58:52 +01:00
Ondřej Surý
96cff4fc51
Convert netmgr handle detach to synchronous callback
Instead of calling isc__nmhandle_detach calling
nmhandle_detach_cb() asynchronously when there's closehandle_cb
initialized, convert the closehandle_cb to use isc_job, and make the
isc__nmhandle_detach() to be fully synchronous.
2023-03-24 07:58:52 +01:00
Ondřej Surý
237f4af152 Convert netmgr connect, read and send callbacks to isc_job
The netmgr connect, read and send callbacks can now only be executed on
the same loop, convert it from asynchronous netievent queue event to
more direct isc_job.
2023-03-23 22:33:40 -07:00
Artem Boldariev
719343348e Delete old TLS DNS and TCP DNS dead code
This commit removes old, unused TLS DNS and TCP DNS definitions from
the code. They should have been deleted earlier, but that was missed.
2023-03-15 18:40:58 +02:00
Tony Finch
7e565a87a7
Apply adjusted clang-format
The headers were slightly reordered when liburcu was added.
2023-03-10 17:31:28 +01:00
Ondřej Surý
2532b558b4
Build with liburcu, Userspace RCU
BIND needs a collection of standard lock-free data structures,
which we can find in liburcu, along with its RCU safe memory
reclamation machinery. We will use liburcu's QSBR variant instead
of the home-grown isc_qsbr.
2023-03-10 17:31:28 +01:00
Aram Sargsyan
fce68da460 Fix ISC_REFCOUNT_TRACE_IMPL usage
ISC_REFCOUNT_TRACE_IMPL uses isc_tid(), but the corresponding header
file is not included, which breaks, for example, compiling BIND with
DNS_CATZ_TRACE defined in lib/dns/include/dns/catz.h.

Add '#include <isc/tid.h>' in lib/isc/include/isc/refcount.h.
2023-03-09 21:38:04 +00:00
Mark Andrews
0045b24500 Silence uninitialized value false positives
In base32_decode_char the GCC 12 static analyser fails to determine
that ctx->val[1], ctx->val[3], ctx->val[4] and ctx->val[6] are
assigned values by the previous call to base32_decode_char.  Initialise
ctx->val to zeros when initalising the rest of ctx to silence the
false positive.
2023-03-08 22:40:03 +00:00
Tony Finch
c43668f031 Remove some lingering references to libbind9
Clean up the `.clang-format` #include priority list and
the `\file` declaration in `isc/getaddresses.h`.
2023-03-08 10:06:22 +00:00
Mark Andrews
cf5f133679 Fix memory leak in isc_hmac_init
If EVP_DigestSignInit failed 'pkey' was not freed.
2023-02-26 22:56:07 +00:00
Tony Finch
9b7aa536ba QSBR: safe memory reclamation for lock-free data structures
This "quiescent state based reclamation" module provides support for
the qp-trie module in dns/qp. It is a replacement for liburcu, written
without reference to the urcu source code, and in fact it works in a
significantly different way.

A few specifics of BIND make this variant of QSBR somewhat simpler:

  * We can require that wait-free access to a qp-trie only happens in
    an isc_loop callback. The loop provides a natural quiescent state,
    after the callbacks are done, when no qp-trie access occurs.

  * We can dispense with any API like rcu_synchronize(). In practice,
    it takes far too long to wait for a grace period to elapse for each
    write to a data structure.

  * We use the idea of "phases" (aka epochs or eras) from EBR to
    reduce the amount of bookkeeping needed to track memory that is no
    longer needed, knowing that the qp-trie does most of that work
    already.

I considered hazard pointers for safe memory reclamation. They have
more read-side overhead (updating the hazard pointers) and it wasn't
clear to me how to nicely schedule the cleanup work. Another
alternative, epoch-based reclamation, is designed for fine-grained
lock-free updates, so it needs some rethinking to work well with the
heavily read-biased design of the qp-trie. QSBR has the fastest read
side of the basic SMR algorithms (with no barriers), and fits well
into a libuv loop. More recent hybrid SMR algorithms do not appear to
have enough benefits to justify the extra complexity.
2023-02-23 15:57:53 +00:00
Tony Finch
63cd73d43e Include thread ID in refcount trace output 2023-02-23 14:28:27 +00:00
Evan Hunt
dc27552c30 remove isc_glob
the isc_glob module was originally needed to support posix-style glob
processing on Windows, but is now just an unnecessary wrapper around
glob(3). this commit removes it.
2023-02-22 17:35:29 +00:00
Ondřej Surý
6eb1340d1b Use atomic stack for async job queue
Previously, the async job queue would use a locked-list (ISC_LIST).
With introduction of atomic stack (that has to be drained at once), we
could use it to remove some contention between the threads and simplify
the async queue.

Fortunately, the reverse order still works for us - instead of append
and tail/prev operation on the list, we are now using prepend and
head/next operation on the atomic stack.
2023-02-22 16:13:37 +00:00
Tony Finch
36e56923ce Simple lock-free stack in <isc/stack.h>
Add a singly-linked stack that supports lock-free prepend and drain (to
empty the list and clean up its elements).  Intended for use with QSBR
to collect objects that need safe memory reclamation, or any other user
that works with adding objects to the stack and then draining them in
one go like various work queues.

In <isc/atomic.h>, add an `atomic_ptr()` macro to make type
declarations a little less abominable, and clean up a duplicate
definition of `atomic_compare_exchange_strong_acq_rel()`
2023-02-22 16:13:37 +00:00
Evan Hunt
b058f99cb8 remove references to obsolete isc_task/timer functions
removed references in code comments, doc/dev documentation, etc, to
isc_task, isc_timer_reset(), and isc_timertype_inactive. also removed a
coccinelle patch related to isc_timer_reset() that was no longer needed.
2023-02-22 08:13:30 +00:00
Tony Finch
3fef7c626a Move bind9_getaddresses() to isc_getaddresses()
No need to have a whole library for one function.
2023-02-21 13:12:26 +00:00
Evan Hunt
a52b17d39b
remove isc_task completely
as there is no further use of isc_task in BIND, this commit removes
it, along with isc_taskmgr, isc_event, and all other related types.

functions that accepted taskmgr as a parameter have been cleaned up.
as a result of this change, some functions can no longer fail, so
they've been changed to type void, and their callers have been
updated accordingly.

the tasks table has been removed from the statistics channel and
the stats version has been updated. dns_dyndbctx has been changed
to reference the loopmgr instead of taskmgr, and DNS_DYNDB_VERSION
has been udpated as well.
2023-02-16 18:35:32 +01:00
Evan Hunt
f58e7c28cd
switch to using isc_loopmgr_pause() instead of task exclusive
change functions using isc_taskmgr_beginexclusive() to use
isc_loopmgr_pause() instead.

also, removed an unnecessary use of exclusive mode in
named_server_tcptimeouts().

most functions that were implemented as task events because they needed
to be running in a task to use exclusive mode have now been changed
into loop callbacks instead. (the exception is catz, which is being
changed in a separate commit because it's a particularly complex change.)
2023-02-16 17:51:55 +01:00
Tony Finch
f9c725d7d4 Remove do-nothing header <isc/stat.h>
Use <sys/stat.h> instead
2023-02-15 16:44:47 +00:00
Tony Finch
6927a30926 Remove do-nothing header <isc/print.h>
This one really truly did nothing. No lines added!
2023-02-15 16:44:47 +00:00
Tony Finch
c7615bc28d Remove do-nothing header <isc/offset.h>
And replace all uses of isc_offset_t with standard off_t
2023-02-15 16:44:47 +00:00
Tony Finch
bed09c1676 Remove do-nothing header <isc/netdb.h>
Not needed since we dropped Windows support
2023-02-15 16:44:47 +00:00
Tony Finch
b0893ae09a Explain <isc/strerr.h> a little more
The purpose of the `strerror_r()` wrapper was not obvious.
2023-02-15 16:44:09 +00:00
Tony Finch
75f7a85a39 Deprecate <isc/deprecated.h>
We refactor more freely these days.
2023-02-15 15:36:20 +00:00
Ondřej Surý
6ffda5920e
Add the reader-writer synchronization with modified C-RW-WP
This changes the internal isc_rwlock implementation to:

  Irina Calciu, Dave Dice, Yossi Lev, Victor Luchangco, Virendra
  J. Marathe, and Nir Shavit.  2013.  NUMA-aware reader-writer locks.
  SIGPLAN Not. 48, 8 (August 2013), 157–166.
  DOI:https://doi.org/10.1145/2517327.24425

(The full article available from:
  http://mcg.cs.tau.ac.il/papers/ppopp2013-rwlocks.pdf)

The implementation is based on the The Writer-Preference Lock (C-RW-WP)
variant (see the 3.4 section of the paper for the rationale).

The implemented algorithm has been modified for simplicity and for usage
patterns in rbtdb.c.

The changes compared to the original algorithm:

  * We haven't implemented the cohort locks because that would require a
    knowledge of NUMA nodes, instead a simple atomic_bool is used as
    synchronization point for writer lock.

  * The per-thread reader counters are not being used - this would
    require the internal thread id (isc_tid_v) to be always initialized,
    even in the utilities; the change has a slight performance penalty,
    so we might revisit this change in the future.  However, this change
    also saves a lot of memory, because cache-line aligned counters were
    used, so on 32-core machine, the rwlock would be 4096+ bytes big.

  * The readers use a writer_barrier that will raise after a while when
    readers lock can't be acquired to prevent readers starvation.

  * Separate ingress and egress readers counters queues to reduce both
    inter and intra-thread contention.
2023-02-15 09:30:04 +01:00
Ondřej Surý
28fe8104ee
Add isc_hashmap_find() DbC check for valuep
This adds DbC check, so we don't pass non-NULL memory for a valued to
the isc_hashmap_find() function.
2023-02-15 09:30:04 +01:00
Tony Finch
436b76bb17 Improve the spinloop pause / yield hint
Unfortunately, C still lacks a standard function for pause (x86,
sparc) or yeild (arm) instructions, for use in spin lock or CAS loops.
BIND has its own based on vendor intrinsics or inline asm.

Previously, it was buried in the `isc_rwlock` implementation. This
commit renames `isc_rwlock_pause()` to `isc_pause()` and moves
it into <isc/pause.h>.

This commit also fixes the configure script so that it detects ARM
yield support on systems that identify as `aarch*` instead of `arm*`.

On 64-bit ARM systems we now use the ISB (instruction synchronization
barrier) instruction in preference to yield. The ISB instruction
pauses the CPU for longer, several nanoseconds, which is more like the
x86 pause instruction. There are more details in a Rust pull request,
which also refers to MySQL making the same change:
https://github.com/rust-lang/rust/pull/84725
2023-02-14 17:13:24 +00:00
Evan Hunt
3a1bb8dac8 remove some unused functions
removed some functions that are no longer used and unlikely to
be resurrected, and also some that were only used to support Windows
and can now be replaced with generic versions.
2023-02-13 11:50:59 -08:00
Evan Hunt
935879ed11 remove isc_bind9 variable
isc_bind9 was a global bool used to indicate whether the library
was being used internally by BIND or by an external caller. external
use is no longer supported, but the variable was retained for use
by dyndb, which needed it only when being built without libtool.
building without libtool is *also* no longer supported, so the variable
can go away.
2023-02-09 18:00:13 +00:00
Ondřej Surý
d4d57f16c3 Sync compile-time & run-time libuv requirements
Bump the minimum libuv version required at runtime so that it matches
the compile-time requirements.
2023-02-09 15:04:52 +01:00
Ondřej Surý
735d09bffe Enforce version drift limits for libuv
libuv support for receiving multiple UDP messages in a single system
call (recvmmsg()) has been tweaked several times between libuv versions
1.35.0 and 1.40.0.  Mixing and matching libuv versions within that span
may lead to assertion failures and is therefore considered harmful, so
try to limit potential damage be preventing users from mixing libuv
versions with distinct sets of recvmmsg()-related flags.
2023-02-09 15:04:52 +01:00
Ondřej Surý
251f411fc3 Avoid libuv 1.35 and 1.36 that have broken recvmmsg implementation
The implementation of UDP recvmmsg in libuv 1.35 and 1.36 is
incomplete and could cause assertion failure under certain
circumstances.

Modify the configure and runtime checks to report a fatal error when
trying to compile or run with the affected versions.
2023-02-09 15:04:52 +01:00
Ondřej Surý
baced007af
Require C11 Atomic Operations via <stdatomic.h>
Make the C11 Atomic Operations mandatory and drop the Gcc __atomic
builtin shims.
2023-02-08 21:33:23 +01:00
Ondřej Surý
1c456c0284
Require C11 thread_local keyword and <threads.h> header
Change the autoconf check to require C11 <threads.h> header and
thread_local keyword.
2023-02-08 21:33:23 +01:00
Tony Finch
ff63b53ff4 Add isc_time_monotonic()
This is to simplify measurements of how long things take.
2023-02-06 12:14:51 +00:00
Tony Finch
b8e71f9580 Fix ISC_MEM_ZERO on allocators with malloc_usable_size()
ISC_MEM_ZERO requires great care to use when the space returned by
the allocator is larger than the requested space, and when memory is
reallocated. You must ensure that _every_ call to allocate or
reallocate a particular block of memory uses ISC_MEM_ZERO, to ensure
that the extra space is zeroed as expected. (When ISC_MEMFLAG_FILL
is set, the extra space will definitely be non-zero.)

When BIND is built without jemalloc, ISC_MEM_ZERO is implemented in
`jemalloc_shim.h`. This had a bug on systems that have malloc_size()
or malloc_usable_size(): memory was only zeroed up to the requested
size, not the allocated size. When an oversized allocation was
returned, and subsequently reallocated larger, memory between the
original requested size and the original allocated size could
contain unexpected nonzero junk. The realloc call does not know the
original requested size and only zeroes from the original allocated
size onwards.

After this change, `jemalloc_shim.h` always zeroes up to the
allocated size, not the requested size.
2023-02-06 11:21:12 +00:00
Evan Hunt
7fd78344e0 refactor isc_ratelimiter to use loop callbacks
the rate limter now uses loop callbacks rather than task events.
the API for isc_ratelimiter_enqueue() has been changed; we now pass
in a loop, a callback function and a callback argument, and
receive back a rate limiter event object (isc_rlevent_t). it
is no longer necessary for the caller to allocate the event.

the callback argument needs to include a pointer to the rlevent
object so that it can be freed using isc_rlevent_free(), or by
dequeueing.
2023-01-31 21:41:19 -08:00
Ondřej Surý
3d674ccc1d Restore Malloced memory counter as InUse alias + little cleanups
This restores the Malloced memory counter and it's now always equal to
InUse counter.  This is only for backwards compatibility reason and
there is no separate counter.

The commit also cleanups little things like structure with a single
item (summary.inuse), and shuts up a wrong cppcheck warning (the
notorious NULL check after assignment).
2023-01-24 17:57:16 +00:00
Ondřej Surý
474279e5f1 Remove ContextSize memory counter
Again, this was an internal allocator counter, now it's useless.
2023-01-24 17:57:16 +00:00
Ondřej Surý
863b2b8bf3 Make the all inuse memory counter atomic operations relaxed
Instead of enforcing stronger synchronization between threads, make all
the atomic operations relaxed.  We are not really interested in exact
numbers at all times - the single place where we need the exact number
is when the memory context is being destroyed.  Even when there's a
overmem counter, we don't care about exact ordering or exact number.
2023-01-24 17:57:16 +00:00
Ondřej Surý
a08e2d37ed Cleanup the ptr argument from mem_putstats()
The ptr argument was unneeded and unused.
2023-01-24 17:57:16 +00:00
Ondřej Surý
699736b7bb Remove the Lost memory counter
The Lost memory counter would count the memory "lost" by external
libraries.  There's really no such thing as `named` require the memory
contexts to be clean on destroy.
2023-01-24 17:57:16 +00:00
Ondřej Surý
7588cd5cb1 Remove stats buckets memory counters
The stats buckets were again more useful for internal allocator, because
we would see the individual "block" caches where the allocations would
fall into.  Remove the stats buckets, and if needed, we can pull more
detailed statistics out of the jemalloc.
2023-01-24 17:57:16 +00:00
Ondřej Surý
1ea8894626 Remove the 'totalgets' memory counter
The totalgets falls into the same category as other "total" and "max"
numbers - it's just a big number with no meaning to end user.
2023-01-24 17:57:16 +00:00
Ondřej Surý
3d4e41d076 Remove the total memory counter
The total memory counter had again little or no meaning when we removed
the internal memory allocator.  It was just a monotonic counter that
would count add the allocation sizes but never subtracted anything, so
it would be just a "big number".
2023-01-24 17:57:16 +00:00
Ondřej Surý
91e349433f Remove maxinuse memory counter
The maxinuse memory counter indicated the highest amount of
memory allocated in the past. Checking and updating this high-
water mark value every time memory was allocated had an impact
on server performance, so it has been removed. Memory size can
be monitored more efficiently via an external tool logging RSS.
2023-01-24 17:57:16 +00:00
Ondřej Surý
971df0b4ed Remove malloced and maxmalloced memory counter
The malloced and maxmalloced memory counters were mostly useless since
we removed the internal allocator blocks - it would only differ from
inuse by the memory context size itself.
2023-01-24 17:57:16 +00:00
Ondřej Surý
7d8aa63026 Make {increment,decrement}_malloced() return void
The return value was only used in a single place and only for
decrement_malloced() and we can easily replace that with atomic_load().
2023-01-24 17:57:16 +00:00
Evan Hunt
a2d773fb98 Refactor dnssec-signzone to use loop callbacks
Use isc_job_run() instead of isc_task_send() for dnssec-signzone
worker threads.

Also fix the issue where the additional assignwork() would be run only
from the main thread effectively serializing all the signing.
2023-01-21 23:39:09 -08:00
Evan Hunt
301f8b23e1 complete change of NETMGR_TRACE to ISC_NETMGR_TRACE
some references to the old ifdef were still in place.
2023-01-20 12:46:34 -08:00
Mark Andrews
b74dd2e8c2 Use INSIST rather then REQUIRE to meet DBC usage rules 2023-01-20 11:05:24 +11:00
Mark Andrews
08c39736a9 isc_nm_listentcp: treat socket failures gracefully
The old code didn't handle race conditions and errors on systems
with non load balancing sockets gracefully.  Look for an error on
any child socket and if found close all the child sockets and return
an error.
2023-01-20 11:05:24 +11:00
Mark Andrews
624f5a0dae isc_nm_listenudp: treat socket failures gracefully
The old code didn't handle race conditions and errors on systems
with non load balancing sockets gracefully.  Look for an error on
any child socket and if found close all the child sockets and return
an error.
2023-01-20 11:05:24 +11:00
Artem Boldariev
942569a1bb Fix building BIND on DragonFly BSD (on both older an newer versions)
This commit ensures that BIND and supplementary tools still can be
built on newer versions of DragonFly BSD. It used to be the case, but
somewhere between versions 6.2 and 6.4 the OS developers rearranged
headers and moved some function definitions around.

Before that the fact that it worked was more like a coincidence, this
time we, at least, looked at the related man pages included with the
OS.

No in depth testing has been done on this OS as we do not really
support this platform - so it is more like a goodwill act. We can,
however, use this platform for testing purposes, too. Also, we know
that the OS users do use BIND, as it is included in its ports
directory.

Building with './configure' and './configure --without-jemalloc' have
been fixed and are known to work at the time the commit is made.
2023-01-20 00:19:12 +02:00
Aram Sargsyan
41dc48bfd7 Refactor isc_nm_xfr_allowed()
Return 'isc_result_t' type value instead of 'bool' to indicate
the actual failure. Rename the function to something not suggesting
a boolean type result. Make changes in the places where the API
function is being used to check for the result code instead of
a boolean value.
2023-01-19 10:24:08 +00:00
Ondřej Surý
5abbcdadaf
Use thread_local EVP_MD in isc_iterated_hash()
Cherry-pick small fixup commit from 9.18/9.16 branches needed for
thread-safety.  This fixup commit is not needed for 9.19+ because of
reworked application setup, but it decouples isc_iterated_hash and
isc_md units and keeps all the branches in sync.
2023-01-18 23:33:43 +01:00
Ondřej Surý
f3753d591f Use thread_local EVP_MD_CTX in isc_iterated_hash()
As this code is on hot path (NSEC3) this introduces an additional
optimization of the EVP_MD API - instead of calling EVP_MD_CTX_new() on
every call to isc_iterated_hash(), we create two thread_local objects
for each thread - a basectx and mdctx, initialize basectx once and then
use EVP_MD_CTX_copy_ex() to flip the initialized state into mdctx.  This
saves us couple more valuable microseconds from the isc_iterated_hash()
call.
2023-01-18 19:36:21 +01:00
Ondřej Surý
25db8d0103 Use OpenSSL 1.x SHA_CTX API in isc_iterated_hash()
If the OpenSSL SHA1_{Init,Update,Final} API is still available, use it.
The API has been deprecated in OpenSSL 3.0, but it is significantly
faster than EVP_MD API, so make an exception here and keep using it
until we can't.
2023-01-18 19:36:17 +01:00
Ondřej Surý
36654df732 Use OpenSSL EVP_MD API directly in isc_iterated_hash()
Instead of going through another layer, use OpenSSL EVP_MD API directly
in the isc_iterated_hash() implementation.  This shaves off couple of
microseconds in the microbenchmark.
2023-01-18 18:32:57 +01:00
Ondřej Surý
e6bfb8e456 Avoid implicit algorithm fetch for OpenSSL EVP_MD family
The implicit algorithm fetch causes a lock contention and significant
slowdown for small input buffers.  For more details, see:

https://github.com/openssl/openssl/issues/19612

Instead of using EVP_DigestInit_ex() initialize empty MD_CTX objects for
each algorithm and use EVP_MD_CTX_copy_ex() to initialize MD_CTX from a
static copy.  Additionally avoid implicit algorithm fetching by using
EVP_MD_fetch() for OpenSSL 3.0.
2023-01-18 18:32:57 +01:00
Tony Finch
290899661d Fix a typo in the NS_PER_ macros
Milliseconds and microseconds were swapped.
2023-01-16 20:33:57 +00:00
Ondřej Surý
d07c4a98da Prefer the pthread_barrier implementation over uv_barrier
Prefer the pthread_barrier implementation on platforms where it is
available over uv_barrier implementation.  This also solves the problem
with thread sanitizer builds on macOS that doesn't have pthread barrier.
2023-01-11 09:51:02 +01:00
Ondřej Surý
d06602f036
Get rid of locking during UDP and TCP listen
We already have a synchronization mechanism when starting the UDP and
TCP listener children - barriers.  Change how we start the first-born
child (tid == 0), so we don't have to race for sock->parent->result and
sock->parent->fd.
2023-01-11 07:17:46 +01:00
Ondřej Surý
10f884a5b8
Remove unused isc_astack unit
The isc_astack unit is now unused, so just remove it.
2023-01-10 20:31:24 +01:00
Ondřej Surý
359faf2ff7
Convert isc_astack usage in netmgr to mempool and ISC_LIST
Change the per-socket inactive uvreq cache (implemented as isc_astack)
to per-worker memory pool.

Change the per-socket inactive nmhandle cache (implemented as
isc_astack) to unlocked per-socket ISC_LIST.
2023-01-10 20:31:24 +01:00
Ondřej Surý
5bbba0d1a1
Simplify tracing the reference counting in isc_netmgr
Always track the per-worker sockets in the .active_sockets field in the
isc__networker_t struct and always track the per-socket handles in the
.active_handles field ian the isc_nmsocket_t struct.
2023-01-10 19:57:39 +01:00
Mark Andrews
349c23dbb7 Accept 'in=NULL' with 'inlen=0' in isc_{half}siphash24
Arthimetic on NULL pointers is undefined.  Avoid arithmetic operations
when 'in' is NULL and require 'in' to be non-NULL if 'inlen' is not zero.
2023-01-10 17:52:56 +11:00
Evan Hunt
916ea26ead remove nonfunctional DSCP implementation
DSCP has not been fully working since the network manager was
introduced in 9.16, and has been completely broken since 9.18.
This seems to have caused very few difficulties for anyone,
so we have now marked it as obsolete and removed the
implementation.

To ensure that old config files don't fail, the code to parse
dscp key-value pairs is still present, but a warning is logged
that the feature is obsolete and should not be used. Nothing is
done with configured values, and there is no longer any
range checking.
2023-01-09 12:15:21 -08:00
Evan Hunt
9c577e10c3 use separate barriers for "stop" and "listen" operations
On some platforms, when a synchronizing barrier is cleared, one
thread can progress while other threads are still in the process
of releasing the barrier. If a barrier is reused by the progressing
thread during this window, it can cause a deadlock. This can occur if,
for example, we stop listening immediately after we start, because the
stop and listen functions both use socket->barrier.  This has been
addressed by using separate barrier objects for stop and listen.
2023-01-07 16:30:21 -08:00
Ondřej Surý
6613f89c62 Enhance the isc_loop unit to allow reference count tracking
Use ISC_REFCOUNT_TRACE_{IMPL,DECL} to allow better isc_loop reference
tracking - use `#define ISC_LOOP_TRACE 1` in <isc/loop.h> to enable.
2023-01-05 12:33:15 +00:00
Ondřej Surý
6553927d27
Enforce strong thread-affinity on StreamDNS sockets
Add a check that the isc__nm_streamdns_read(), isc__nm_streamdns_send(),
and isc__nm_streamdns_close() are being called from the matching thread.
2023-01-05 09:43:09 +01:00
Mark Andrews
096b280b1c Do not pass NULL pointer to memmove - undefined behaviour
Check if 'old_base' is NULL and if so skip calling memmove.
2023-01-03 14:40:30 +11:00
Artem Boldariev
fbf1546fb8 TLS: use isc_buffer_t for send requests
This commit replaces ad-hoc code for send requests buffer management
within TLS with the one based on isc_buffer_t.

Previous version of the code was trying to use pre-allocated small
buffers to avoid extra allocations. The code would allocate a larger
dynamic buffer when needed. There is no need to have ad-hoc code for
this, as isc_buffer_t now provides this functionality internally.

Additionally to the above, the old version of the code lacked any
logic to reuse the dynamically allocated buffers. Now, as we do not
manage memory buffers, but isc_buffer_t objects, we can implement this
strategy. It can be in particular helpful for longer lasting
connections, as in this case the buffer will adjust itself to the size
of the messages being transferred. That is, it is in particular useful
for XoT, as Stream DNS happen to order send requests in such a way
that the send request will get reused.
2022-12-30 19:56:25 +02:00
Artem Boldariev
7962e7f575 tlsctx_client_session_cache_new() -> tlsctx_client_session_create()
Additionally to renaming, it changes the function definition so that
it accepts a pointer to pointer instead of returning a pointer to the
new object.

It is mostly done to make it in line with other functions in the
module.
2022-12-23 11:10:11 +02:00
Artem Boldariev
f102df96b8 Rename isc_tlsctx_cache_new() -> isc_tlsctx_cache_create()
Additionally to renaming, it changes the function definition so that
it accepts a pointer to pointer instead of returning a pointer to the
new object.

It is mostly done to make it in line with other functions in the
module.
2022-12-23 11:10:11 +02:00
Ondřej Surý
6cb6373b5a Convert Stream DNS to use isc_buffer API
Drop the whole isc_dnsbuffer API and use new improved isc_buffer API
that provides same functionality as the isc_dnsbuffer unit now.
2022-12-20 22:13:53 +02:00
Artem Boldariev
0a7e83feea StreamDNS: Use isc__nm_senddns() to send DNS messages
This commit modifies the Stream DNS message so that it uses the
optimised code path (isc__nm_senddns()) for sending DNS messages over
the underlying transport. This way we avoid allocating any
intermediate memory buffers needed to render a DNS message with its
length pre-pended ahead of the contents (TCP DNS message format).
2022-12-20 22:13:53 +02:00
Artem Boldariev
cb6f3dc3c8 TLS: isc__nm_senddns() support
This commit adds support for isc_nm_senddns() to the generic TLS code.
2022-12-20 22:13:53 +02:00
Artem Boldariev
ad876a65af Add isc__nm_senddns()
The new internal function works in the same way as isc_nm_send()
except that it sends a DNS message size ahead of the DNS message
data (the format used in DNS over TCP).

The intention is to provide a fast path for sending DNS messages over
streams protocols - that is, without allocating any intermediate
memory buffers.
2022-12-20 22:13:53 +02:00
Artem Boldariev
56732ac2a0 TLS: try to avoid allocating send request objects
This commit optimises TLS send request object allocation to enable
send request object reuse, somewhat reducing pressure on the memory
manager. It is especially helpful in the case when Stream DNS uses the
TLS implementation as the transport.
2022-12-20 22:13:53 +02:00
Artem Boldariev
4277eeeb9c Remove TLS DNS transport (and parts common with TCP DNS)
This commit removes TLS DNS transport superseded by Stream DNS.
2022-12-20 22:13:53 +02:00
Artem Boldariev
e5649710d3 Remove TCP DNS transport
This commit removes TCP DNS transport superseded by Stream DNS.
2022-12-20 22:13:53 +02:00
Artem Boldariev
4524bf4083 Make isc_nm_tlssocket non-optional
This commit unties generic TLS code (isc_nm_tlssocket) from DoH, so
that it will be available regardless of the fact if BIND was built
with DNS over HTTP support or not.
2022-12-20 22:13:53 +02:00
Artem Boldariev
efe4267044 DoH: use isc_nmhandle_set_tcp_nodelay()
This commit replaces ad-hoc code for disabling Nagle's algorithm with
a call to isc_nmhandle_set_tcp_nodelay().
2022-12-20 22:13:53 +02:00
Artem Boldariev
e89575ddce StreamDNS: opportunistically disable Nagle's algorithm
This commit ensures that Stream DNS code attempts to disable Nagle's
algorithm regardless of underlying stream transport (TCP or TLS), as
we are not interested in trading latency for throughout when dealing
with DNS messages.
2022-12-20 22:13:53 +02:00
Artem Boldariev
05cfb27b80 Disable Nagle's algorithm for TLS connections by default
This commit ensures that Nagle's algorithm is disabled by default for
TLS connections on best effort basis, just like other networking
software (e.g. NGINX) does, as, in the case of TLS, we are not
interested in trading latency for throughput, rather vice versa.

We attempt to disable it as early as we can, right after TCP
connections establishment, as an attempt to speed up handshake
handling.
2022-12-20 22:13:53 +02:00
Artem Boldariev
371b02f37a TCP: make it possible to set Nagle's algorithms state via handle
This commit adds ability to turn the Nagle's algorithm on or off via
connections handle. It adds the isc_nmhandle_set_tcp_nodelay()
function as the public interface for this functionality.
2022-12-20 22:13:53 +02:00
Artem Boldariev
4606384345 Extend isc__nm_socket_tcp_nodelay() to accept value
This makes it possible to both enable and disable Nagle's algorithm
for a TCP socket descriptor, before the change it was possible only to
disable it.
2022-12-20 22:13:53 +02:00
Artem Boldariev
f395cd4b3e Add isc_nm_streamdnssocket (aka Stream DNS)
This commit adds an initial implementation of isc_nm_streamdnssocket
transport: a unified transport for DNS over stream protocols messages,
which is capable of replacing both TCP DNS and TLS DNS
transports. Currently, the interface it provides is a unified set of
interfaces provided by both of the transports it attempts to replace.

The transport is built around "isc_dnsbuffer_t" and
"isc_dnsstream_assembler_t" objects and attempts to minimise both the
number of memory allocations during network transfers as well as
memory usage.
2022-12-20 22:13:51 +02:00
Artem Boldariev
338cf3e467 Add isc_dnsstream_assembler_t implementation
This commit adds the implementation for an "isc_dnsstream_assembler_t"
object. The object is built on top of "isc_dnsbuffer_t" and is
intended to encapsulate the state machine used for handling DNS
messages received in the format used for messages transmitted over
TCP.

The idea is that the object accepts the input data received from a
socket, tries to assemble DNS messages from the incoming data and
calls the callback which contains the status of the incoming data as
well as a pointer to the memory region referencing the data of the
assembled message. It is capable of assembling DNS messages no matter
how torn apart they are when sent over network.

The following statuses might be passed to the callback:

* ISC_R_SUCCESS - a message has been successfully assembled;
* ISC_R_NOMORE  - not enough data has been processed to assemble a
message;
* ISC_R_RANGE - there was an attempt to process a zero-sized DNS
message (someone attempts to send us junk data).

One could say that the object replaces the implementation of
"isc__nm_*_processbuffer()" functions used by the old TCP DNS and TLS
DNS transports with a better defined state machine completely
decoupled from the networking code itself.

Such a design makes it trivial to write unit tests for it, leading to
better verification of its correctness.

Another important difference is directly related to the fact that it
is built on top of "isc_dnsbuffer_t", which tries to manage memory in
a smart way. In particular:

* It tries to use a static buffer for smaller messages, reducing
pressure on the memory manager (hot path);
* When allocating dynamic memory for larger messages, it tries to
allocate memory conservatively (generic path).

These characteristics is a significant upgrade over the older logic
where a 64KB(+2 bytes) buffer was allocated from dynamic memory
regardless of the fact if we need a buffer this large or not. That is,
lesser memory usage is expected in a generic case for DNS transports
built on top of "isc_dnsstream_assembler_t."
2022-12-20 21:24:44 +02:00
Artem Boldariev
cbb758abd4 Add isc_dnsbuffer_t implementation
This commit adds "isc_dnsbuffer_t" object implementation, a thin
wrapper on top of "isc_buffer_t" which has the following
characteristics:

* provides interface specifically atuned for handling/generating DNS
messages, especially in the format used for DNS messages over TCP;
* avoids allocating dynamic memory when handling small DNS messages,
while transparently switching to using dynamic memory when handling
larger messages. This approach significantly reduces pressure on the
memory allocator, as most of the DNS messages are small.
2022-12-20 21:24:44 +02:00
Artem Boldariev
c0c59b55ab TLS: add an internal function isc__nmhandle_get_selected_alpn()
The added function provides the interface for getting an ALPN tag
negotiated during TLS connection establishment.

The new function can be used by higher level transports.
2022-12-20 21:24:44 +02:00
Artem Boldariev
15e626f1ca TLS: add manual read timer control mode
This commit adds manual read timer control mode, similarly to TCP.
This way the read timer can be controlled manually using:

* isc__nmsocket_timer_start();
* isc__nmsocket_timer_stop();
* isc__nmsocket_timer_restart().

The change is required to make it possible to implement more
sophisticated read timer control policies in DNS transports, built on
top of TLS.
2022-12-20 21:24:44 +02:00
Artem Boldariev
9aabd55725 TCP: add manual read timer control mode
This commit adds a manual read timer control mode to the TCP
code (adding isc__nmhandle_set_manual_timer() as the interface to it).

Manual read timer control mode suppresses read timer restarting the
read timer when receiving any amount of data. This way the read timer
can be controlled manually using:

* isc__nmsocket_timer_start();
* isc__nmsocket_timer_stop();
* isc__nmsocket_timer_restart().

The change is required to make it possible to implement more
sophisticated read timer control policies in DNS transports, built on
top of TCP.
2022-12-20 21:24:44 +02:00
Artem Boldariev
f4760358f8 TLS: expose the ability to (re)start and stop underlying read timer
This commit adds implementation of isc__nmsocket_timer_restart() and
isc__nmsocket_timer_stop() for generic TLS code in order to make its
interface more compatible with that of TCP.
2022-12-20 21:24:44 +02:00
Artem Boldariev
f18a9b3743 TLS: add isc__nmsocket_timer_running() support
This commit adds isc__nmsocket_timer_running() support to the generic
TLS code in order to make it more compatible with TCP.
2022-12-20 21:24:44 +02:00
Artem Boldariev
c0808532e1 TLS: isc_nm_bad_request() and isc__nmsocket_reset() support
This commit adds implementations of isc_nm_bad_request() and
isc__nmsocket_reset() to the generic TLS stream code in order to make
it more compatible with TCP code.
2022-12-20 21:24:44 +02:00
Artem Boldariev
94e650ce89 Use 'restrict' and 'const' for 'isc_buffer_t'
The purpose of this commit is to aid compiler in generating better
code when working with `isc_buffer_t` objects by using restricted
pointers (and, to a lesser extent, 'const' modifier for read-only
arguments).

This way we, basically, instruct the compiler that the members of
structured passed by pointers into the functions can be treated as
local variables in the scope of a function. That should reduce the
number of load/store operations emitted by compilers when accessing
objects (e.g. 'isc_buffer_t') via pointers.
2022-12-20 21:01:27 +02:00