Commit graph

106 commits

Author SHA1 Message Date
Ondřej Surý
6370e9b311 Add isc_helper API that adds 1:1 thread for each loop
Add an extra thread that can be used to offload operations that would
affect latency, but are not long-running tasks; those are handled by
isc_work API.

Each isc_loop now has matching isc_helper thread that also built on top
of uv_loop.  In fact, it matches most of the isc_loop functionality, but
only the `isc_helper_run()` asynchronous call is exposed.
2024-09-12 12:09:45 +00:00
Aydın Mercan
5dbb560747 remove the crc64 implementation
CRC-64 has been added for map files. Now that the map file format has
been removed, there isn't a reason to keep the implementation.
2024-08-05 11:21:25 +00:00
Ondřej Surý
a9b4d42346 Add isc_queue implementation on top of cds_wfcq
Add an isc_queue implementation that hides the gory details of cds_wfcq
into more neat API.  The same caveats as with cds_wfcq.

TODO: Add documentation to the API.
2024-06-05 09:19:56 +02:00
Ondřej Surý
2463e5232d
Use proper padding instead of using alignas()
As it was pointed out, the alignas() can't be used on objects larger
than `max_align_t` otherwise the compiler might miscompile the code to
use auto-vectorization on unaligned memory.

As we were only using alignas() as a way to prevent false memory
sharing, we can use manual padding in the affected structures.
2024-02-08 10:54:35 +01:00
Artem Boldariev
4a88fc9d5b PROXYv2 over UDP transport
This commit adds a new transport that supports PROXYv2 over UDP. It is
built on top of PROXYv2 handling code (just like PROXY Stream). It
works by processing and stripping the PROXYv2 headers at the beginning
of a datagram (when accepting a datagram) or by placing a PROXYv2
header to the beginning of an outgoing datagram.

The transport is built in such a way that incoming datagrams are being
handled with minimal memory allocations and copying.
2023-12-06 15:15:25 +02:00
Artem Boldariev
d119d666b3 PROXY Stream transport
This commit adds a new stream-based transport with an interface
compatible with TCP. The transport is built on top of TCP transport
and the new PROXYv2 handling code. Despite being built on top of TCP,
it can be easily extended to work on top of any TCP-like stream-based
transport. The intention of having this transport is to add PROXYv2
support into all existing stream-based DNS transport (DNS over TCP,
DNS over TLS, DNS over HTTP) by making the work on top of this new
transport.

The idea behind the transport is simple after accepting the connection
or connecting to a remote server it enters PROXYv2 handling mode: that
is, it either attempts to read (when accepting the connection) or send
(when establishing a connection) a PROXYv2 header. After that it works
like a mere wrapper on top of the underlying stream-based
transport (TCP).
2023-12-06 15:15:24 +02:00
Artem Boldariev
2c76717881 Add PROXYv2 header utilities
This commit adds a set of utilities for dealing with PROXYv2 headers,
both parsing and generating them. The code has no dependencies from
the networking code and is (for the most part) a "separate library".

The part responsible for handling incoming PROXYv2 headers is
structured as a state machine which accepts data as input and calls a
callback to notify the upper-level code about the data processing
status.

Such a design, among other things, makes it easy to write a thorough
unit test suite for that, as there are fewer dependencies as well as
will not stand in the way of any changes in the networking code.
2023-12-06 15:15:24 +02:00
Ondřej Surý
17da9fed58
Remove AES algorithm for DNS cookies
The AES algorithm for DNS cookies was being kept for legacy reasons, and
it can be safely removed in the next major release.  Remove both the AES
usage for DNS cookies and the AES implementation itself.
2023-11-15 10:31:16 +01:00
Ondřej Surý
4dd49ac528
Implement incremental version of SipHash 2-4 and HalfSipHash 2-4
When inserting items into hashtables (hashmaps), we might have a
fragmented key (as an example we might want to hash DNS name + class +
type).  We either need to construct continuous key in the memory and
then hash it en bloc, or incremental hashing is required.

This incremental version of SipHash 2-4 algorithm is the first building
block.

As SipHash 2-4 is often used in the hot paths, I've turned the
implementation into header-only version in the process.
2023-09-12 16:17:06 +02:00
Ondřej Surý
784d055809
Add support for User Statically Defined Tracing (USDT) probes
This adds support for User Statically Defined Tracing (USDT).  On
Linux, this uses the header from SystemTap and dtrace utility, but the
support is universal as long as dtrace is available.

Also add the required infrastructure to add probes to libisc, libdns and
libns libraries, where most of the probes will be.
2023-08-21 18:39:53 +02:00
Tony Finch
7474cad4ad
Add <isc/overflow.h> for checked mul, add, and sub
The `ISC_OVERFLOW_XXX()` macros are usually wrappers around
`__builtin_xxx_overflow()`, with alternative implementations
for compilers that lack the builtins.

Replace the overflow checks in `isc/time.c` with the new macros.
2023-06-27 12:38:09 +02:00
Michał Kępień
6029010dd2
Remove <isc/cmocka.h>
The last use of the cmocka_add_test_byname() helper macro was removed in
commit 63fe9312ff.  Remove the
<isc/cmocka.h> header that defines it.
2023-05-18 15:12:23 +02:00
Tony Finch
fc770a8bd0
Remove the now-unused ISC_STACK
We are using the liburcu concurrent data structures instead.
2023-05-12 20:49:43 +01:00
Tony Finch
05ca11e122
Remove isc_qsbr (we are using liburcu instead)
This commit breaks the qp-trie code.
2023-05-12 20:48:31 +01:00
Ondřej Surý
fd3522c37b
Add Userspace-RCU to global CFLAGS and LIBS
The Userspace-RCU headers are now needed for more parts of the libisc
and libdns, thus we need to add it globally to prevent compilation
failures on systems with non-standard Userspace-RCU installation path.
2023-05-12 14:16:25 +02:00
Ondřej Surý
65021dbf52
Move the isc_random API initialization to the thread_local variable
Instead of writing complicated wrappers for every thread, move the
initialization back to isc_random unit and check whether the random seed
was initialized with a thread_local variable.

Ensure that isc_entropy_get() returns a non-zero seed.

This avoids problems with thread sanitizer tests getting stuck in an
infinite loop.
2023-04-27 12:38:53 +02:00
Ondřej Surý
c2c907d728
Improve the Userspace RCU integration
This commit allows BIND 9 to be compiled with different flavours of
Userspace RCU, and improves the integration between Userspace RCU and
our event loop:

- In the RCU QSBR, the thread is put offline when polling and online
  when rcu_dereference, rcu_assign_pointer (or friends) are called.

- In other RCU modes, we check that we are not reading when reaching the
  quiescent callback in the event loop.

- We register the thread before uv_work_run() callback is called and
  after it has finished.  The rcu_(un)register_thread() has a large
  overhead, but that's fine in this case.
2023-04-27 12:38:53 +02:00
Ondřej Surý
b497e90179
Add isc_spinlock unit with shim pthread_spin implementation
The spinlock is small (atomic_uint_fast32_t at most), lightweight
synchronization primitive and should only be used for short-lived and
most of the time a isc_mutex should be used.

Add a isc_spinlock unit which is either (most of the time) a think
wrapper around pthread_spin API or an efficient shim implementation of
the simple spinlock.
2023-04-21 12:10:02 +02:00
Tony Finch
82213a48cf Add isc_histo for histogram statistics
This is an adaptation of my `hg64` experiments for use in BIND.

As well as renaming everything according to ISC style, I have
written some more extensive tests that ensure the edge cases are
correct and the fenceposts are in the right places.

I have added utility functions for working with precision in terms of
decimal significant figures as well as this code's native binary.
2023-04-03 12:08:05 +01:00
Mark Andrews
5a2e82557e Define isc_fips_mode() and isc_fips_set_mode()
isc_fips_mode() determines if the process is running in FIPS mode

isc_fips_set_mode() sets the process into FIPS mode
2023-04-03 12:05:28 +10:00
Tony Finch
555690a3c9 Simplify thread spawning
The `isc_trampoline` module had a lot of machinery to support stable
thread IDs for use by hazard pointers. But the hazard pointer code
is gone, and the `isc_loop` module now has its own per-loop thread
IDs.

The trampoline machinery seems over-complicated for its remaining
tasks, so move the per-thread initialization into `isc/thread.c`,
and delete the rest.
2023-03-31 17:21:52 +01:00
Ondřej Surý
263d232c79 Replace isc_fsaccess API with more secure file creation
The isc_fsaccess API was created to hide the implementation details
between POSIX and Windows APIs.  As we are not supporting the Windows
APIs anymore, it's better to drop this API used in the DST part.

Moreover, the isc_fsaccess was setting the permissions in an insecure
manner - it operated on the filename, and not on the file descriptor
which can lead to all kind of attacks if unpriviledged user has read (or
even worse write) access to key directory.

Replace the code that operates on the private keys with code that uses
mkstemp(), fchmod() and atomic rename() at the end, so at no time the
private key files have insecure permissions.
2023-03-31 12:52:59 +00:00
Ondřej Surý
1844590ad9
Refactor isc_job_run to not-make any allocations
Change the isc_job_run() to not-make any allocations.  The caller must
make sure that it allocates isc_job_t - usually as part of the argument
passed to the callback.

For simple jobs, using isc_async_run() is advised as it allocates its
own separate isc_job_t.
2023-03-30 16:00:52 +02:00
Ondřej Surý
2532b558b4
Build with liburcu, Userspace RCU
BIND needs a collection of standard lock-free data structures,
which we can find in liburcu, along with its RCU safe memory
reclamation machinery. We will use liburcu's QSBR variant instead
of the home-grown isc_qsbr.
2023-03-10 17:31:28 +01:00
Tony Finch
9b7aa536ba QSBR: safe memory reclamation for lock-free data structures
This "quiescent state based reclamation" module provides support for
the qp-trie module in dns/qp. It is a replacement for liburcu, written
without reference to the urcu source code, and in fact it works in a
significantly different way.

A few specifics of BIND make this variant of QSBR somewhat simpler:

  * We can require that wait-free access to a qp-trie only happens in
    an isc_loop callback. The loop provides a natural quiescent state,
    after the callbacks are done, when no qp-trie access occurs.

  * We can dispense with any API like rcu_synchronize(). In practice,
    it takes far too long to wait for a grace period to elapse for each
    write to a data structure.

  * We use the idea of "phases" (aka epochs or eras) from EBR to
    reduce the amount of bookkeeping needed to track memory that is no
    longer needed, knowing that the qp-trie does most of that work
    already.

I considered hazard pointers for safe memory reclamation. They have
more read-side overhead (updating the hazard pointers) and it wasn't
clear to me how to nicely schedule the cleanup work. Another
alternative, epoch-based reclamation, is designed for fine-grained
lock-free updates, so it needs some rethinking to work well with the
heavily read-biased design of the qp-trie. QSBR has the fastest read
side of the basic SMR algorithms (with no barriers), and fits well
into a libuv loop. More recent hybrid SMR algorithms do not appear to
have enough benefits to justify the extra complexity.
2023-02-23 15:57:53 +00:00
Evan Hunt
dc27552c30 remove isc_glob
the isc_glob module was originally needed to support posix-style glob
processing on Windows, but is now just an unnecessary wrapper around
glob(3). this commit removes it.
2023-02-22 17:35:29 +00:00
Tony Finch
36e56923ce Simple lock-free stack in <isc/stack.h>
Add a singly-linked stack that supports lock-free prepend and drain (to
empty the list and clean up its elements).  Intended for use with QSBR
to collect objects that need safe memory reclamation, or any other user
that works with adding objects to the stack and then draining them in
one go like various work queues.

In <isc/atomic.h>, add an `atomic_ptr()` macro to make type
declarations a little less abominable, and clean up a duplicate
definition of `atomic_compare_exchange_strong_acq_rel()`
2023-02-22 16:13:37 +00:00
Tony Finch
3fef7c626a Move bind9_getaddresses() to isc_getaddresses()
No need to have a whole library for one function.
2023-02-21 13:12:26 +00:00
Evan Hunt
a52b17d39b
remove isc_task completely
as there is no further use of isc_task in BIND, this commit removes
it, along with isc_taskmgr, isc_event, and all other related types.

functions that accepted taskmgr as a parameter have been cleaned up.
as a result of this change, some functions can no longer fail, so
they've been changed to type void, and their callers have been
updated accordingly.

the tasks table has been removed from the statistics channel and
the stats version has been updated. dns_dyndbctx has been changed
to reference the loopmgr instead of taskmgr, and DNS_DYNDB_VERSION
has been udpated as well.
2023-02-16 18:35:32 +01:00
Tony Finch
f9c725d7d4 Remove do-nothing header <isc/stat.h>
Use <sys/stat.h> instead
2023-02-15 16:44:47 +00:00
Tony Finch
6927a30926 Remove do-nothing header <isc/print.h>
This one really truly did nothing. No lines added!
2023-02-15 16:44:47 +00:00
Tony Finch
c7615bc28d Remove do-nothing header <isc/offset.h>
And replace all uses of isc_offset_t with standard off_t
2023-02-15 16:44:47 +00:00
Tony Finch
bed09c1676 Remove do-nothing header <isc/netdb.h>
Not needed since we dropped Windows support
2023-02-15 16:44:47 +00:00
Tony Finch
75f7a85a39 Deprecate <isc/deprecated.h>
We refactor more freely these days.
2023-02-15 15:36:20 +00:00
Tony Finch
436b76bb17 Improve the spinloop pause / yield hint
Unfortunately, C still lacks a standard function for pause (x86,
sparc) or yeild (arm) instructions, for use in spin lock or CAS loops.
BIND has its own based on vendor intrinsics or inline asm.

Previously, it was buried in the `isc_rwlock` implementation. This
commit renames `isc_rwlock_pause()` to `isc_pause()` and moves
it into <isc/pause.h>.

This commit also fixes the configure script so that it detects ARM
yield support on systems that identify as `aarch*` instead of `arm*`.

On 64-bit ARM systems we now use the ISB (instruction synchronization
barrier) instruction in preference to yield. The ISB instruction
pauses the CPU for longer, several nanoseconds, which is more like the
x86 pause instruction. There are more details in a Rust pull request,
which also refers to MySQL making the same change:
https://github.com/rust-lang/rust/pull/84725
2023-02-14 17:13:24 +00:00
Evan Hunt
935879ed11 remove isc_bind9 variable
isc_bind9 was a global bool used to indicate whether the library
was being used internally by BIND or by an external caller. external
use is no longer supported, but the variable was retained for use
by dyndb, which needed it only when being built without libtool.
building without libtool is *also* no longer supported, so the variable
can go away.
2023-02-09 18:00:13 +00:00
Ondřej Surý
baced007af
Require C11 Atomic Operations via <stdatomic.h>
Make the C11 Atomic Operations mandatory and drop the Gcc __atomic
builtin shims.
2023-02-08 21:33:23 +01:00
Ondřej Surý
10f884a5b8
Remove unused isc_astack unit
The isc_astack unit is now unused, so just remove it.
2023-01-10 20:31:24 +01:00
Ondřej Surý
6cb6373b5a Convert Stream DNS to use isc_buffer API
Drop the whole isc_dnsbuffer API and use new improved isc_buffer API
that provides same functionality as the isc_dnsbuffer unit now.
2022-12-20 22:13:53 +02:00
Artem Boldariev
4277eeeb9c Remove TLS DNS transport (and parts common with TCP DNS)
This commit removes TLS DNS transport superseded by Stream DNS.
2022-12-20 22:13:53 +02:00
Artem Boldariev
e5649710d3 Remove TCP DNS transport
This commit removes TCP DNS transport superseded by Stream DNS.
2022-12-20 22:13:53 +02:00
Artem Boldariev
4524bf4083 Make isc_nm_tlssocket non-optional
This commit unties generic TLS code (isc_nm_tlssocket) from DoH, so
that it will be available regardless of the fact if BIND was built
with DNS over HTTP support or not.
2022-12-20 22:13:53 +02:00
Artem Boldariev
f395cd4b3e Add isc_nm_streamdnssocket (aka Stream DNS)
This commit adds an initial implementation of isc_nm_streamdnssocket
transport: a unified transport for DNS over stream protocols messages,
which is capable of replacing both TCP DNS and TLS DNS
transports. Currently, the interface it provides is a unified set of
interfaces provided by both of the transports it attempts to replace.

The transport is built around "isc_dnsbuffer_t" and
"isc_dnsstream_assembler_t" objects and attempts to minimise both the
number of memory allocations during network transfers as well as
memory usage.
2022-12-20 22:13:51 +02:00
Artem Boldariev
338cf3e467 Add isc_dnsstream_assembler_t implementation
This commit adds the implementation for an "isc_dnsstream_assembler_t"
object. The object is built on top of "isc_dnsbuffer_t" and is
intended to encapsulate the state machine used for handling DNS
messages received in the format used for messages transmitted over
TCP.

The idea is that the object accepts the input data received from a
socket, tries to assemble DNS messages from the incoming data and
calls the callback which contains the status of the incoming data as
well as a pointer to the memory region referencing the data of the
assembled message. It is capable of assembling DNS messages no matter
how torn apart they are when sent over network.

The following statuses might be passed to the callback:

* ISC_R_SUCCESS - a message has been successfully assembled;
* ISC_R_NOMORE  - not enough data has been processed to assemble a
message;
* ISC_R_RANGE - there was an attempt to process a zero-sized DNS
message (someone attempts to send us junk data).

One could say that the object replaces the implementation of
"isc__nm_*_processbuffer()" functions used by the old TCP DNS and TLS
DNS transports with a better defined state machine completely
decoupled from the networking code itself.

Such a design makes it trivial to write unit tests for it, leading to
better verification of its correctness.

Another important difference is directly related to the fact that it
is built on top of "isc_dnsbuffer_t", which tries to manage memory in
a smart way. In particular:

* It tries to use a static buffer for smaller messages, reducing
pressure on the memory manager (hot path);
* When allocating dynamic memory for larger messages, it tries to
allocate memory conservatively (generic path).

These characteristics is a significant upgrade over the older logic
where a 64KB(+2 bytes) buffer was allocated from dynamic memory
regardless of the fact if we need a buffer this large or not. That is,
lesser memory usage is expected in a generic case for DNS transports
built on top of "isc_dnsstream_assembler_t."
2022-12-20 21:24:44 +02:00
Artem Boldariev
cbb758abd4 Add isc_dnsbuffer_t implementation
This commit adds "isc_dnsbuffer_t" object implementation, a thin
wrapper on top of "isc_buffer_t" which has the following
characteristics:

* provides interface specifically atuned for handling/generating DNS
messages, especially in the format used for DNS messages over TCP;
* avoids allocating dynamic memory when handling small DNS messages,
while transparently switching to using dynamic memory when handling
larger messages. This approach significantly reduces pressure on the
memory allocator, as most of the DNS messages are small.
2022-12-20 21:24:44 +02:00
Ondřej Surý
8e3a86f6dd
Make the isc_buffer unit header-only
The isc_buffer is often used in the hot-path, so make it header-only
implementation.
2022-12-20 19:13:48 +01:00
Ondřej Surý
e2262c2112
Remove isc_resource API and set limits directly in named_os unit
The only function left in the isc_resource API was setting the file
limit.  Replace the whole unit with a simple getrlimit to check the
maximum value of RLIMIT_NOFILE and set the maximum back to rlimit_cur.

This is more compatible than trying to set RLIMIT_UNLIMITED on the
RLIMIT_NOFILE as it doesn't work on Linux (see man 5 proc on
/proc/sys/fs/nr_open), neither it does on Darwin kernel (see man 2
getrlimit).

The only place where the maximum value could be raised under privileged
user would be BSDs, but the `named_os_adjustnofile()` were not called
there before.  We would apply the increased limits only on Linux and Sun
platforms.
2022-12-07 19:40:00 +01:00
Ondřej Surý
f46ce447a6
Add isc_hashmap API that implements Robin Hood hashing
Add new isc_hashmap API that differs from the current isc_ht API in
several aspects:

1. It implements Robin Hood Hashing which is open-addressing hash table
   algorithm (e.g. no linked-lists)

2. No memory allocations - the array to store the nodes is made of
   isc_hashmap_node_t structures instead of just pointers, so there's
   only allocation on resize.

3. The key is not copied into the hashmap node and must be also stored
   externally, either as part of the stored value or in any other
   location that's valid as long the value is stored in the hashmap.

This makes the isc_hashmap_t a little less universal because of the key
storage requirements, but the inserts and deletes are faster because
they don't require memory allocation on isc_hashmap_add() and memory
deallocation on isc_hashmap_delete().
2022-11-10 15:07:19 +01:00
Ondřej Surý
0492bbf590
Make the pthread_rwlock implementation header-only macros [2/2]
While using mutrace, the phtread-rwlock based isc_rwlock implementation
would be all tracked in the rwlock.c unit losing all useful information
as all rwlocks would be traced in a single place.  Rewrite the
pthread_rwlock based implementation to be header-only macros, so we can
use mutrace to properly track the rwlock contention without heavily
patching mutrace to understand the libisc synchronization primitives.
2022-11-02 10:34:10 +01:00
Ondřej Surý
3a8884f024 Add picohttpparser.{c.h} from https://github.com/h2o/picohttpparser
PicoHTTPParser is a tiny, primitive, fast HTTP request/response parser.

Unlike most parsers, it is stateless and does not allocate memory by
itself. All it does is accept pointer to buffer and the output
structure, and setups the pointers in the latter to point at the
necessary portions of the buffer.
2022-10-14 11:26:54 +02:00