Commit graph

82 commits

Author SHA1 Message Date
Tony Finch
9b7aa536ba QSBR: safe memory reclamation for lock-free data structures
This "quiescent state based reclamation" module provides support for
the qp-trie module in dns/qp. It is a replacement for liburcu, written
without reference to the urcu source code, and in fact it works in a
significantly different way.

A few specifics of BIND make this variant of QSBR somewhat simpler:

  * We can require that wait-free access to a qp-trie only happens in
    an isc_loop callback. The loop provides a natural quiescent state,
    after the callbacks are done, when no qp-trie access occurs.

  * We can dispense with any API like rcu_synchronize(). In practice,
    it takes far too long to wait for a grace period to elapse for each
    write to a data structure.

  * We use the idea of "phases" (aka epochs or eras) from EBR to
    reduce the amount of bookkeeping needed to track memory that is no
    longer needed, knowing that the qp-trie does most of that work
    already.

I considered hazard pointers for safe memory reclamation. They have
more read-side overhead (updating the hazard pointers) and it wasn't
clear to me how to nicely schedule the cleanup work. Another
alternative, epoch-based reclamation, is designed for fine-grained
lock-free updates, so it needs some rethinking to work well with the
heavily read-biased design of the qp-trie. QSBR has the fastest read
side of the basic SMR algorithms (with no barriers), and fits well
into a libuv loop. More recent hybrid SMR algorithms do not appear to
have enough benefits to justify the extra complexity.
2023-02-23 15:57:53 +00:00
Evan Hunt
dc27552c30 remove isc_glob
the isc_glob module was originally needed to support posix-style glob
processing on Windows, but is now just an unnecessary wrapper around
glob(3). this commit removes it.
2023-02-22 17:35:29 +00:00
Tony Finch
36e56923ce Simple lock-free stack in <isc/stack.h>
Add a singly-linked stack that supports lock-free prepend and drain (to
empty the list and clean up its elements).  Intended for use with QSBR
to collect objects that need safe memory reclamation, or any other user
that works with adding objects to the stack and then draining them in
one go like various work queues.

In <isc/atomic.h>, add an `atomic_ptr()` macro to make type
declarations a little less abominable, and clean up a duplicate
definition of `atomic_compare_exchange_strong_acq_rel()`
2023-02-22 16:13:37 +00:00
Tony Finch
3fef7c626a Move bind9_getaddresses() to isc_getaddresses()
No need to have a whole library for one function.
2023-02-21 13:12:26 +00:00
Evan Hunt
a52b17d39b
remove isc_task completely
as there is no further use of isc_task in BIND, this commit removes
it, along with isc_taskmgr, isc_event, and all other related types.

functions that accepted taskmgr as a parameter have been cleaned up.
as a result of this change, some functions can no longer fail, so
they've been changed to type void, and their callers have been
updated accordingly.

the tasks table has been removed from the statistics channel and
the stats version has been updated. dns_dyndbctx has been changed
to reference the loopmgr instead of taskmgr, and DNS_DYNDB_VERSION
has been udpated as well.
2023-02-16 18:35:32 +01:00
Tony Finch
f9c725d7d4 Remove do-nothing header <isc/stat.h>
Use <sys/stat.h> instead
2023-02-15 16:44:47 +00:00
Tony Finch
6927a30926 Remove do-nothing header <isc/print.h>
This one really truly did nothing. No lines added!
2023-02-15 16:44:47 +00:00
Tony Finch
c7615bc28d Remove do-nothing header <isc/offset.h>
And replace all uses of isc_offset_t with standard off_t
2023-02-15 16:44:47 +00:00
Tony Finch
bed09c1676 Remove do-nothing header <isc/netdb.h>
Not needed since we dropped Windows support
2023-02-15 16:44:47 +00:00
Tony Finch
75f7a85a39 Deprecate <isc/deprecated.h>
We refactor more freely these days.
2023-02-15 15:36:20 +00:00
Tony Finch
436b76bb17 Improve the spinloop pause / yield hint
Unfortunately, C still lacks a standard function for pause (x86,
sparc) or yeild (arm) instructions, for use in spin lock or CAS loops.
BIND has its own based on vendor intrinsics or inline asm.

Previously, it was buried in the `isc_rwlock` implementation. This
commit renames `isc_rwlock_pause()` to `isc_pause()` and moves
it into <isc/pause.h>.

This commit also fixes the configure script so that it detects ARM
yield support on systems that identify as `aarch*` instead of `arm*`.

On 64-bit ARM systems we now use the ISB (instruction synchronization
barrier) instruction in preference to yield. The ISB instruction
pauses the CPU for longer, several nanoseconds, which is more like the
x86 pause instruction. There are more details in a Rust pull request,
which also refers to MySQL making the same change:
https://github.com/rust-lang/rust/pull/84725
2023-02-14 17:13:24 +00:00
Evan Hunt
935879ed11 remove isc_bind9 variable
isc_bind9 was a global bool used to indicate whether the library
was being used internally by BIND or by an external caller. external
use is no longer supported, but the variable was retained for use
by dyndb, which needed it only when being built without libtool.
building without libtool is *also* no longer supported, so the variable
can go away.
2023-02-09 18:00:13 +00:00
Ondřej Surý
baced007af
Require C11 Atomic Operations via <stdatomic.h>
Make the C11 Atomic Operations mandatory and drop the Gcc __atomic
builtin shims.
2023-02-08 21:33:23 +01:00
Ondřej Surý
10f884a5b8
Remove unused isc_astack unit
The isc_astack unit is now unused, so just remove it.
2023-01-10 20:31:24 +01:00
Ondřej Surý
6cb6373b5a Convert Stream DNS to use isc_buffer API
Drop the whole isc_dnsbuffer API and use new improved isc_buffer API
that provides same functionality as the isc_dnsbuffer unit now.
2022-12-20 22:13:53 +02:00
Artem Boldariev
4277eeeb9c Remove TLS DNS transport (and parts common with TCP DNS)
This commit removes TLS DNS transport superseded by Stream DNS.
2022-12-20 22:13:53 +02:00
Artem Boldariev
e5649710d3 Remove TCP DNS transport
This commit removes TCP DNS transport superseded by Stream DNS.
2022-12-20 22:13:53 +02:00
Artem Boldariev
4524bf4083 Make isc_nm_tlssocket non-optional
This commit unties generic TLS code (isc_nm_tlssocket) from DoH, so
that it will be available regardless of the fact if BIND was built
with DNS over HTTP support or not.
2022-12-20 22:13:53 +02:00
Artem Boldariev
f395cd4b3e Add isc_nm_streamdnssocket (aka Stream DNS)
This commit adds an initial implementation of isc_nm_streamdnssocket
transport: a unified transport for DNS over stream protocols messages,
which is capable of replacing both TCP DNS and TLS DNS
transports. Currently, the interface it provides is a unified set of
interfaces provided by both of the transports it attempts to replace.

The transport is built around "isc_dnsbuffer_t" and
"isc_dnsstream_assembler_t" objects and attempts to minimise both the
number of memory allocations during network transfers as well as
memory usage.
2022-12-20 22:13:51 +02:00
Artem Boldariev
338cf3e467 Add isc_dnsstream_assembler_t implementation
This commit adds the implementation for an "isc_dnsstream_assembler_t"
object. The object is built on top of "isc_dnsbuffer_t" and is
intended to encapsulate the state machine used for handling DNS
messages received in the format used for messages transmitted over
TCP.

The idea is that the object accepts the input data received from a
socket, tries to assemble DNS messages from the incoming data and
calls the callback which contains the status of the incoming data as
well as a pointer to the memory region referencing the data of the
assembled message. It is capable of assembling DNS messages no matter
how torn apart they are when sent over network.

The following statuses might be passed to the callback:

* ISC_R_SUCCESS - a message has been successfully assembled;
* ISC_R_NOMORE  - not enough data has been processed to assemble a
message;
* ISC_R_RANGE - there was an attempt to process a zero-sized DNS
message (someone attempts to send us junk data).

One could say that the object replaces the implementation of
"isc__nm_*_processbuffer()" functions used by the old TCP DNS and TLS
DNS transports with a better defined state machine completely
decoupled from the networking code itself.

Such a design makes it trivial to write unit tests for it, leading to
better verification of its correctness.

Another important difference is directly related to the fact that it
is built on top of "isc_dnsbuffer_t", which tries to manage memory in
a smart way. In particular:

* It tries to use a static buffer for smaller messages, reducing
pressure on the memory manager (hot path);
* When allocating dynamic memory for larger messages, it tries to
allocate memory conservatively (generic path).

These characteristics is a significant upgrade over the older logic
where a 64KB(+2 bytes) buffer was allocated from dynamic memory
regardless of the fact if we need a buffer this large or not. That is,
lesser memory usage is expected in a generic case for DNS transports
built on top of "isc_dnsstream_assembler_t."
2022-12-20 21:24:44 +02:00
Artem Boldariev
cbb758abd4 Add isc_dnsbuffer_t implementation
This commit adds "isc_dnsbuffer_t" object implementation, a thin
wrapper on top of "isc_buffer_t" which has the following
characteristics:

* provides interface specifically atuned for handling/generating DNS
messages, especially in the format used for DNS messages over TCP;
* avoids allocating dynamic memory when handling small DNS messages,
while transparently switching to using dynamic memory when handling
larger messages. This approach significantly reduces pressure on the
memory allocator, as most of the DNS messages are small.
2022-12-20 21:24:44 +02:00
Ondřej Surý
8e3a86f6dd
Make the isc_buffer unit header-only
The isc_buffer is often used in the hot-path, so make it header-only
implementation.
2022-12-20 19:13:48 +01:00
Ondřej Surý
e2262c2112
Remove isc_resource API and set limits directly in named_os unit
The only function left in the isc_resource API was setting the file
limit.  Replace the whole unit with a simple getrlimit to check the
maximum value of RLIMIT_NOFILE and set the maximum back to rlimit_cur.

This is more compatible than trying to set RLIMIT_UNLIMITED on the
RLIMIT_NOFILE as it doesn't work on Linux (see man 5 proc on
/proc/sys/fs/nr_open), neither it does on Darwin kernel (see man 2
getrlimit).

The only place where the maximum value could be raised under privileged
user would be BSDs, but the `named_os_adjustnofile()` were not called
there before.  We would apply the increased limits only on Linux and Sun
platforms.
2022-12-07 19:40:00 +01:00
Ondřej Surý
f46ce447a6
Add isc_hashmap API that implements Robin Hood hashing
Add new isc_hashmap API that differs from the current isc_ht API in
several aspects:

1. It implements Robin Hood Hashing which is open-addressing hash table
   algorithm (e.g. no linked-lists)

2. No memory allocations - the array to store the nodes is made of
   isc_hashmap_node_t structures instead of just pointers, so there's
   only allocation on resize.

3. The key is not copied into the hashmap node and must be also stored
   externally, either as part of the stored value or in any other
   location that's valid as long the value is stored in the hashmap.

This makes the isc_hashmap_t a little less universal because of the key
storage requirements, but the inserts and deletes are faster because
they don't require memory allocation on isc_hashmap_add() and memory
deallocation on isc_hashmap_delete().
2022-11-10 15:07:19 +01:00
Ondřej Surý
0492bbf590
Make the pthread_rwlock implementation header-only macros [2/2]
While using mutrace, the phtread-rwlock based isc_rwlock implementation
would be all tracked in the rwlock.c unit losing all useful information
as all rwlocks would be traced in a single place.  Rewrite the
pthread_rwlock based implementation to be header-only macros, so we can
use mutrace to properly track the rwlock contention without heavily
patching mutrace to understand the libisc synchronization primitives.
2022-11-02 10:34:10 +01:00
Ondřej Surý
3a8884f024 Add picohttpparser.{c.h} from https://github.com/h2o/picohttpparser
PicoHTTPParser is a tiny, primitive, fast HTTP request/response parser.

Unlike most parsers, it is stateless and does not allocate memory by
itself. All it does is accept pointer to buffer and the output
structure, and setups the pointers in the latter to point at the
necessary portions of the buffer.
2022-10-14 11:26:54 +02:00
Ondřej Surý
e537fea861
Use custom isc_mem based allocator for libxml2
The libxml2 library provides a way to replace the default allocator with
user supplied allocator (malloc, realloc, strdup and free).

Create a memory context specifically for libxml2 to allow tracking the
memory usage that has originated from within libxml2.  This will provide
a separate memory context for libxml2 to track the allocations and when
shutting down the application it will check that all libxml2 allocations
were returned to the allocator.

Additionally, move the xmlInitParser() and xmlCleanupParser() calls from
bin/named/main.c to library constructor/destructor in libisc library.
2022-09-27 17:10:42 +02:00
Ondřej Surý
236d4b7739
Use custom isc_mem based allocator for OpenSSL
The OpenSSL library provides a way to replace the default allocator with
user supplied allocator (malloc, realloc, and free).

Create a memory context specifically for OpenSSL to allow tracking the
memory usage that has originated from within OpenSSL.  This will provide
a separate memory context for OpenSSL to track the allocations and when
shutting down the application it will check that all OpenSSL allocations
were returned to the allocator.
2022-09-27 17:10:42 +02:00
Ondřej Surý
1baed21688
Switch the CSPRNG function from RAND_bytes() to uv_random()
The RAND_bytes() implementation differs between the OpenSSL versions and
uses the system entropy only for seeding its internal CSPRNG.  The
uv_random() on the other hand uses the system provided CSPRNG.

Switch from RAND_bytes() to uv_random() to use system provided CSPRNG.
2022-09-26 15:13:11 +02:00
Tony Finch
bd251de035
Move random number re-seeding out of the hot path
Instead of checking if we need to re-seed for every isc_random call,
seed the random number generator in the libisc global initializer
and the per-thread initializer.
2022-09-19 16:27:12 +02:00
Tony Finch
27a561273e Consolidate some ASCII tables in isc/ascii and isc/hex
There were a number of places that had copies of various ASCII
tables (case conversion, hex and decimal conversion) that are intended
to be faster than the ctype.h macros, or avoid locale pollution.

Move them into libisc, and wrap the lookup tables with macros that
avoid the ctype.h gotchas.
2022-09-12 12:18:57 +01:00
Ondřej Surý
4d07768a09
Remove the isc_app API
The isc_app API is no longer used and has been removed.
2022-08-26 09:09:25 +02:00
Ondřej Surý
49b149f5fd
Update isc_timer to use isc_loopmgr
* isc_timer was rewritten using the uv_timer, and isc_timermgr_t was
  completely removed; isc_timer objects are now directly created on the
  isc_loop event loops.

* the isc_timer API has been simplified. the "inactive" timer type has
  been removed; timers are now stopped by calling isc_timer_stop()
  instead of resetting to inactive.

* isc_manager now creates a loop manager rather than a timer manager.

* modules and applications using isc_timer have been updated to use the
  new API.
2022-08-25 17:17:07 +02:00
Ondřej Surý
84c90e223f
New event loop handling API
This commit introduces new APIs for applications and signal handling,
intended to replace isc_app for applications built on top of libisc.

* isc_app will be replaced with isc_loopmgr, which handles the
  starting and stopping of applications. In isc_loopmgr, the main
  thread is not blocked, but is part of the working thread set.
  The loop manager will start a number of threads, each with a
  uv_loop event loop running. Setup and teardown functions can be
  assigned which will run when the loop starts and stops, and
  jobs can be scheduled to run in the meantime. When
  isc_loopmgr_shutdown() is run from any the loops, all loops
  will shut down and the application can terminate.

* signal handling will now be handled with a separate isc_signal unit.
  isc_loopmgr only handles SIGTERM and SIGINT for application
  termination, but the application may install additional signal
  handlers, such as SIGHUP as a signal to reload configuration.

* new job running primitives, isc_job and isc_async, have been added.
  Both units schedule callbacks (specifying a callback function and
  argument) on an event loop. The difference is that isc_job unit is
  unlocked and not thread-safe, so it can be used to efficiently
  run jobs in the same thread, while isc_async is thread-safe and
  uses locking, so it can be used to pass jobs from one thread to
  another.

* isc_tid will be used to track the thread ID in isc_loop worker
  threads.

* unit tests have been added for the new APIs.
2022-08-25 12:24:29 +02:00
Ondřej Surý
8e5e0fa522 Use library constructor to create default mutex attr once
Instead of using isc_once_do() on every isc_mutex_init() call, use the
global library constructor to initialize the default mutex attr
object (optionally with PTHREAD_MUTEX_ADAPTIVE_NP if supported) just
once when the library is loaded.
2022-07-13 13:19:32 +02:00
Ondřej Surý
2c3b2dabe9 Move all the unit tests to /tests/<libname>/
The unit tests are now using a common base, which means that
lib/dns/tests/ code now has to include lib/isc/include/isc/test.h and
link with lib/isc/test.c and lib/ns/tests has to include both libisc and
libdns parts.

Instead of cross-linking code between the directories, move the
/lib/<foo>/test.c to /tests/<foo>.c and /lib/<foo>/include/<foo>test.h
to /tests/include/tests/<foo>.h and create a single libtest.la
convenience library in /tests/.

At the same time, move the /lib/<foo>/tests/ to /tests/<foo>/ (but keep
it symlinked to the old location) and adjust paths accordingly.  In few
places, we are now using absolute paths instead of relative paths,
because the directory level has changed.  By moving the directories
under the /tests/ directory, the test-related code is kept in a single
place and we can avoid referencing files between libns->libdns->libisc
which is unhealthy because they live in a separate Makefile-space.

In the future, the /bin/tests/ should be merged to /tests/ and symlink
kept, and the /fuzz/ directory moved to /tests/fuzz/.
2022-05-28 14:53:02 -07:00
Ondřej Surý
b43812692d Move netmgr/uv-compat.h to <isc/uv.h>
As we are going to use libuv outside of the netmgr, we need the shims to
be readily available for the rest of the codebase.

Move the "netmgr/uv-compat.h" to <isc/uv.h> and netmgr/uv-compat.c to
uv.c, and as a rule of thumb, the users of libuv should include
<isc/uv.h> instead of <uv.h> directly.

Additionally, merge netmgr/uverr2result.c into uv.c and rename the
single function from isc__nm_uverr2result() to isc_uverr2result().
2022-05-03 10:02:19 +02:00
Ondřej Surý
24c3879675 Move socket related functions to netmgr/socket.c
Move the netmgr socket related functions from netmgr/netmgr.c and
netmgr/uv-compat.c to netmgr/socket.c, so they are all present all in
the same place.  Adjust the names of couple interal functions
accordingly.
2022-05-03 09:52:49 +02:00
Tony Finch
b2950c96de Revert "Move random number re-seeding out of the hot path"
This reverts commit b1bb41603e.
2022-04-25 15:18:58 +01:00
Tony Finch
b1bb41603e Move random number re-seeding out of the hot path
Instead of checking if we need to re-seed for every isc_random call,
seed the random number generator in the libisc global initializer
and the per-thread initializer.
2022-04-22 16:40:37 +01:00
Ondřej Surý
62a72211aa Remove isc_pool API
Since the last user of the isc_pool API is gone, remove the whole
isc_pool API.
2022-04-01 23:50:34 +02:00
Ondřej Surý
a94678ff77 Create per-thread task and memory context for zonemgr
Previously, the zonemgr created 1 task per 100 zones and 1 memory
context per 1000 zones (with minimum 10 tasks and 2 memory contexts) to
reduce the contention between threads.

Instead of reducing the contention by having many resources, create a
per-nm_thread memory context, loadtask and zonetask and spread the zones
between just per-thread resources.

Note: this commit alone does decrease performance when loading the zone
by couple seconds (in case of 1M zone) and thus there's more work in
this whole MR fixing the performance.
2022-04-01 23:50:34 +02:00
Ondřej Surý
6bd025942c Replace netievent lock-free queue with simple locked queue
The current implementation of isc_queue uses Michael-Scott lock-free
queue that in turn uses hazard pointers.  It was discovered that the way
we use the isc_queue, such complicated mechanism isn't really needed,
because most of the time, we either execute the work directly when on
nmthread (in case of UDP) or schedule the work from the matching
nmthreads.

Replace the current implementation of the isc_queue with a simple locked
ISC_LIST.  There's a slight improvement - since copying the whole list
is very lightweight - we move the queue into a new list before we start
the processing and locking just for moving the queue and not for every
single item on the list.

NOTE: There's a room for future improvements - since we don't guarantee
the order in which the netievents are processed, we could have two lists
- one unlocked that would be used when scheduling the work from the
matching thread and one locked that would be used from non-matching
thread.
2022-03-04 13:49:51 +01:00
Ondřej Surý
3c7b04d015 Add network manager based timer API
This commits adds API that allows to create arbitrary timers associated
with the network manager handles.
2022-02-17 21:38:17 +01:00
Ondřej Surý
4f78f9d72a Add #define ISC_OS_CACHELINE_SIZE 64
Add library ctor and dtor for isc_os compilation unit which initializes
the numbers of the CPUs and also checks whether L1 cacheline size is
really 64 if the sysconf() call is available.
2022-01-05 17:07:35 +01:00
Evan Hunt
a55589f881 remove all references to isc_socket and related types
Removed socket.c, socket.h, and all references to isc_socket_t,
isc_socketmgr_t, isc_sockevent_t, etc.
2021-10-15 01:01:25 -07:00
Ondřej Surý
e603983ec9 Stop providing branch prediction information
The __builtin_expect() can be used to provide the compiler with branch
prediction information.  The Gcc manual says[1] on the subject:

    In general, you should prefer to use actual profile feedback for
    this (-fprofile-arcs), as programmers are notoriously bad at
    predicting how their programs actually perform.

Stop using __builtin_expect() and ISC_LIKELY() and ISC_UNLIKELY() macros
to provide the branch prediction information as the performance testing
shows that named performs better when the __builtin_expect() is not
being used.

1. https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005fexpect
2021-10-14 10:33:24 +02:00
Michał Kępień
5178ba4cf2 Properly handle JEMALLOC_* Autoconf variables
The AX_CHECK_JEMALLOC() m4 macro sets the JEMALLOC_CFLAGS variable, not
JEMALLOC_CPPFLAGS.  Furthermore, the JEMALLOC_CFLAGS and JEMALLOC_LIBS
variables should only be included in the build flags if jemalloc was
successfully configured.  Tweak lib/isc/Makefile.am accordingly.
2021-10-12 10:44:30 +02:00
Ondřej Surý
2e3a2eecfe Make isc_result a static enum
Remove the dynamic registration of result codes.  Convert isc_result_t
from unsigned + #defines into 32-bit enum type in grand unified
<isc/result.h> header.  Keep the existing values of the result codes
even at the expense of the description and identifier tables being
unnecessary large.

Additionally, add couple of:

    switch (result) {
    [...]
    default:
        break;
    }

statements where compiler now complains about missing enum values in the
switch statement.
2021-10-06 11:22:20 +02:00
Ondřej Surý
8cb2ba5dd3 Remove native PKCS#11 support
The native PKCS#11 support has been removed in favour of better
maintained, more performance and easier to use OpenSSL PKCS#11 engine
from the OpenSC project.
2021-09-09 15:35:39 +02:00