On some platforms, the __attribute__ constructor and destructor won't
take priorities and the compilation failed. On such platform would be
macOS. For this reason, the constructor/destructor in the libisc was
reworked to not use priorities, but have a single constructor and
destructor that calls the appropriate routines in correct order.
This commit removes the extra priority because it's now not needed and
it also breaks a compilation on macOS with GCC 10.
configuring with --enable-mutex-atomics flagged these incorrectly
initialised variables on systems where pthread_mutex_init doesn't
just zero out the structure.
The size of the array holding the pointers to clientmgr was created so
big it could hold the actual clientmgr objects, not just the pointer.
This commit fixes the size to be just the ncpus * sizeof(pointer).
The isc_nmiface_t type was holding just a single isc_sockaddr_t,
so we got rid of the datatype and use plain isc_sockaddr_t in place
where isc_nmiface_t was used before. This means less type-casting and
shorter path to access isc_sockaddr_t members.
At the same time, instead of keeping the reference to the isc_sockaddr_t
that was passed to us when we start listening, we will keep a local
copy. This prevents the data race on destruction of the ns_interface_t
objects where pending nmsockets could reference the sockaddr of already
destroyed ns_interface_t object.
* dns_journal_next() leaves the read point in the journal after the
transaction header so journal_seek() should be inside the loop.
* we need to recover from transaction header inconsistencies
Additionally when correcting for <size, serial0, serial1, 0> the
correct consistency check is isc_serial_gt() rather than
isc_serial_ge(). All instances updated.
According to the measurements (recorded on GL!5085), the fillcount of 2
for namepool and fillcount of 4 for rdspool can fit 99.99% of request
for tested scenarios.
This was discovered by perf recording the single second recursive test
using flamethrower where the initial malloc lit up like a flare.
Previously, as a way of reducing the contention between threads a
clientmgr object would be created for each interface/IP address.
We tasks being more strictly bound to netmgr workers, this is no longer
needed and we can just create clientmgr object per worker queue (ncpus).
Each clientmgr object than would have a single task and single memory
context.
Similarly, the resolver code would create hundreds of memory contexts
just on the resolver setup. The contention will be reduced directly in
the allocator, so for now just attach to the view memory instead of
creating separate memory context for each bucket.
Since a client object is bound to a netmgr handle, each client
will always be processed by the same netmgr worker, so we can
simplify the code by binding client->task to the same thread as
the client. Since ns__client_request() now runs in the same event
loop as client->task events, is no longer necessary to pause the
task manager before launching them.
Also removed some functions in isc_task that were not used.
The number of memory contexts created in the clientmgr was enormous. It
could easily create thousands of memory contexts because the formula was:
nprotocols * ncpus * ninterfaces * CLIENT_NMCTXS_PERCPU (8)
The original goal was to reduce the contention when allocating the
memory, but after a while nobody noticed that the amount of memory
context allocated would not reduce contention at all.
This commit removes the whole mctxpool and just uses the mctx from
clientmgr as the contention will be reduced directly in the allocator.
dns_name_copy() has been replaced nearly everywhere with
dns_name_copynf(). this commit changes the last two uses of
the original function. afterward, we can remove the old
dns_name_copy() implementation, and replace it with _copynf().
dns_message_gettempname() returns an initialized name with a dedicated
buffer, associated with a dns_fixedname object. Using dns_name_copynf()
to write a name into this object will actually copy the name data
from a source name. dns_name_clone() merely points target->ndata to
source->ndata, so it is faster, but it can lead to a use-after-free if
the source is freed before the target object is released via
dns_message_puttempname().
In a few places, clone was being used where copynf should have been;
this is now fixed.
As a side note, no memory was lost, because the ndata buffer used in
the dns_fixedname_t is internal to the structure, and is freed when
the dns_fixedname_t is freed regardless of the .ndata contents.
The last rdataset_getownercase() left it in a state where the code was
mix of microoptimizations (manual loop unrolling, complicated bitshifts)
with a code that would always rewrite the character even if it stayed
the same after transformation.
This commit makes sure that we modify only the characters that actually
need to change, removes the manual loop unrolling, and replaces the
weird bit arithmetics with a simple shift and bit-and.
dns_message_gettempname() now returns a pointer to an initialized
name associated with a dns_fixedname_t object. it is no longer
necessary to allocate a buffer for temporary names associated with
the message object.
This function has never been used since it was added to the source tree
by commit 686b27bfd3 back in 1999. As
the dns_zoneflg_t type is only defined in lib/dns/zone.c, no function
external to that file would be able to use dns_zone_setflag() properly
anyway - the DNS_ZONE_SETFLAG() and DNS_ZONE_CLRFLAG() macros should be
used instead. Zone options that can be set from outside zone.c are set
using dns_zone_setoption().
Don't allow the same zone with different dnssec-policies in separate
views have the same key-directory.
Track zones plus key-directory in a symtab and if there is a match,
check the offending zone's dnssec-policy name. If the name is "none"
(there is no kasp for the offending zone), or if the name is the same
(the zone shares keys), it is fine, otherwise it is an error (zones
in views using different policies cannot share the same key-directory).
if dns_updatemethod_date is used do that the returned method is only
set to dns_updatemethod_increment if the new serial does not encode
the current day (YYYYMMDDXX).
Instead of using fixed quantum, this commit adds atomic counter for
number of items on each queue and uses the number of netievents
scheduled to run as the limit of maximum number of netievents for a
single process_queue() run.
This prevents the endless loops when the netievent would schedule more
netievents onto the same loop, but we don't have to pick "magic" number
for the quantum.
This commit adds a new configuration option to set the receive and send
buffer sizes on the TCP and UDP netmgr sockets. The default is `0`
which doesn't set any value and just uses the value set by the operating
system.
There's no magic value here - set it too small and the performance will
drop, set it too large, the buffers can fill-up with queries that have
already timeouted on the client side and nobody is interested for the
answer and this would just make the server clog up even more by making
it produce useless work.
The `netstat -su` can be used on POSIX systems to monitor the receive
and send buffer errors.
Unit test run for out-of-tree builds used to fail to find
masterXX.data.in files:
/usr/bin/perl -w /builds/mnowak/bind9/lib/dns/tests/mkraw.pl < testdata/master/master12.data.in > testdata/master/master12.data
/bin/bash: testdata/master/master12.data.in: No such file or directory
make[4]: *** [Makefile:1910: testdata/master/master12.data] Error 1
The outgoing UDP socket selection would pick unintialized children
socket on Windows, because we have more netmgr workers than we have
listening sockets. This commit fixes the selection by keeping the
outgoing socket the same, so it's always run on existing socket.
The initial intent was to limit the number of concurrent streams by
the value of 100 but due to the error when reading the documentation
it was set to the maximum possible number of streams per session.
This could lead to security issues, e.g. a remote attacker could have
taken down the BIND instance by creating lots of sessions via low
number of transport connections. This commit fixes that.
We should not call nghttp2_session_terminate_session() in server-side
code after all of the active HTTP/2 streams are processed. The
underlying transport connection is expected to remain opened at least
for some time in this case for new HTTP/2 requests to arrive. That is
what flamethrower was expecting and it makes perfect sense from the
HTTP/2 perspective.
During the stress testing, it was discovered that the default netmgr
quantum of 128 is not enough and there was a performance drop for TCP on
FreeBSD. Bumping the default quantum to 1024 solves the performance
issue and is still enough to prevent the endless loops.
We were clearing the pointer to taskmgr as soon as isc_taskmgr_destroy()
would be called and before all tasks were finished. Unfortunately, some
tasks would use global named_g_taskmgr objects from inside the events
and this would cause either a data race or NULL pointer dereference.
This commit fixes the data race by moving the destruction of the
referenced pointer to the time after all tasks are finished.
Network manager events that require interlock (pause, resume, listen)
are now always executed in the same worker thread, mgr->workers[0],
to prevent races.
"stoplistening" events no longer require interlock.
- ensure isc_nm_pause() and isc_nm_resume() work the same whether
run from inside or outside of the netmgr.
- promote 'stop' events to the priority event level so they can
run while the netmgr is pausing or paused.
- when pausing, drain the priority queue before acquiring an
interlock; this prevents a deadlock when another thread is waiting
for us to complete a task.
- release interlock after pausing, reacquire it when resuming, so
that stop events can happen.
some incidental changes:
- use a function to enqueue pause and resume events (this was part of a
different change attempt that didn't work out; I kept it because I
thought was more readable).
- make mgr->nworkers a signed int to remove some annoying integer casts.
The netmgr listening, stoplistening, pausing and resuming functions
now use barriers for synchronization, which makes the code much simpler.
isc/barrier.h defines isc_barrier macros as a front-end for uv_barrier
on platforms where that works, and pthread_barrier where it doesn't
(including TSAN builds).
When isc__nm_http_stoplistening() is run from inside the netmgr, we need
to make sure it's run synchronously. This commit is just a band-aid
though, as the desired behvaior for isc_nm_stoplistening() is not always
the same:
1. When run from outside user of the interface, the call must be
synchronous, e.g. the calling code expects the call to really stop
listening on the interfaces.
2. But if there's a call from listen<proto> when listening fails,
that needs to be scheduled to run asynchronously, because
isc_nm_listen<proto> is being run in a paused (interlocked)
netmgr thread and we could get stuck.
The proper solution would be to make isc_nm_stoplistening()
behave like uv_close(), i.e., to have a proper callback.