The isc_mem_get(), isc_mem_allocate() and isc_mem_reallocate() can
return NULL ptr in case where the allocation size is NULL. Remove the
nonnull attribute from the functions' declarations.
This stems from the following definition in the C11 standard:
> If the size of the space requested is zero, the behavior is
> implementation-defined: either a null pointer is returned, or the
> behavior is as if the size were some nonzero value, except that the
> returned pointer shall not be used to access an object.
In this case, we return NULL as it's easier to detect errors when
accessing pointer from zero-sized allocation which should obviously
never happen.
- isc_mempool_get() can no longer fail; when there are no more objects
in the pool, more are always allocated. checking for NULL return is
no longer necessary.
- the isc_mempool_setmaxalloc() and isc_mempool_getmaxalloc() functions
are no longer used and have been removed.
Current mempools are kind of hybrid structures - they serve two
purposes:
1. mempool with a lock is basically static sized allocator with
pre-allocated free items
2. mempool without a lock is a doubly-linked list of preallocated items
The first kind of usage could be easily replaced with jemalloc small
sized arena objects and thread-local caches.
The second usage not-so-much and we need to keep this (in
libdns:message.c) for performance reasons.
Previously, we only had capability to trace the mempool gets and puts,
but for debugging, it's sometimes also important to keep track how many
and where do the memory pools get created and destroyed. This commit
adds such tracking capability.
The ISC_MEM_DEBUGSIZE and ISC_MEM_DEBUGCTX did sanity checks on matching
size and memory context on the memory returned to the allocator. Those
will no longer needed when most of the allocator will be replaced with
jemalloc.
Previously, we only had capability to trace the memory gets and puts,
but for debugging, it's sometimes also important to keep track how many
and where do the memory contexts get created and destroyed. This commit
adds such tracking capability.
This commit makes BIND return HTTP status codes for malformed or too
small requests.
DNS request processing code would ignore such requests. Such an
approach works well for other DNS transport but does not make much
sense for HTTP, not allowing it to complete the request/response
sequence.
Suppose execution has reached the point where DNS message handling
code has been called. In that case, it means that the HTTP request has
been successfully processed, and, thus, we are expected to respond to
it either with a message containing some DNS payload or at least to
return an error status code. This commit ensures that BIND behaves
this way.
This commit adds two new autoconf options `--enable-doh` (enabled by
default) and `--with-libnghttp2` (mandatory when DoH is enabled).
When DoH support is disabled the library is not linked-in and support
for http(s) protocol is disabled in the netmgr, named and dig.
The isc/platform.h header was left empty which things either already
moved to config.h or to appropriate headers. This is just the final
cleanup commit.
The last remaining defines needed for platforms without NAME_MAX and
PATH_MAX (I'm looking at you, GNU Hurd) were moved to isc/dir.h where
it's prevalently used.
The ISC_STRERRORSIZE was defined in isc/platform.h header as the
value was different between Windows and POSIX platforms. Now that
Windows is gone, move the define to where it belongs.
Previously, each protocol (TCPDNS, TLSDNS) has specified own function to
disable pipelining on the connection. An oversight would lead to
assertion failure when opcode is not query over non-TCPDNS protocol
because the isc_nm_tcpdns_sequential() function would be called over
non-TCPDNS socket. This commit removes the per-protocol functions and
refactors the code to have and use common isc_nm_sequential() function
that would either disable the pipelining on the socket or would handle
the request in per specific manner. Currently it ignores the call for
HTTP sockets and causes assertion failure for protocols where it doesn't
make sense to call the function at all.
The requirements for BIND 9.17+ now requires C11 support from the
compiler, so we can safely drop most of the stdatomic.h shims from
lib/isc/unix/include/stdatomic.h.
This commit removes support for clang atomic builtins (clang >= 3.6.0
includes stdatomic.h header) and for Gcc __sync builtins.
The only compatibility shim that remains is support for __atomic
builtins for Gcc >= 4.7.0 since CentOS 7 still includes only Gcc 4.8.1
and the proper stdatomic.h header was only introduced in Gcc >= 4.9.
We cannot use DoH for zone transfers. According to RFC8484 a DoH
request contains exactly one DNS message (see Section 6: Definition of
the "application/dns-message" Media Type,
https://datatracker.ietf.org/doc/html/rfc8484#section-6). This makes
DoH unsuitable for zone transfers as often (and usually!) these need
more than one DNS message, especially for larger zones.
As zone transfers over DoH are not (yet) standardised, nor discussed
in RFC8484, the best thing we can do is to return "not implemented."
Technically DoH can be used to transfer small zones which fit in one
message, but that is not enough for the generic case.
Also, this commit makes the server-side DoH code ensure that no
multiple responses could be attempted to be sent over one HTTP/2
stream. In HTTP/2 one stream is mapped to one request/response
transaction. Now the write callback will be called with failure error
code in such a case.
The Windows support has been completely removed from the source tree
and BIND 9 now no longer supports native compilation on Windows.
We might consider reviewing mingw-w64 port if contributed by external
party, but no development efforts will be put into making BIND 9 compile
and run on Windows again.
The libuv has a support for running long running tasks in the dedicated
threadpools, so it doesn't affect networking IO.
This commit adds isc_nm_work_enqueue() wrapper that would wraps around
the libuv API and runs it on top of associated worker loop.
The only limitation is that the function must be called from inside
network manager thread, so the call to the function should be wrapped
inside a (bound) task.
On some platforms, the __attribute__ constructor and destructor won't
take priorities and the compilation failed. On such platform would be
macOS. For this reason, the constructor/destructor in the libisc was
reworked to not use priorities, but have a single constructor and
destructor that calls the appropriate routines in correct order.
This commit removes the extra priority because it's now not needed and
it also breaks a compilation on macOS with GCC 10.
The isc_nmiface_t type was holding just a single isc_sockaddr_t,
so we got rid of the datatype and use plain isc_sockaddr_t in place
where isc_nmiface_t was used before. This means less type-casting and
shorter path to access isc_sockaddr_t members.
At the same time, instead of keeping the reference to the isc_sockaddr_t
that was passed to us when we start listening, we will keep a local
copy. This prevents the data race on destruction of the ns_interface_t
objects where pending nmsockets could reference the sockaddr of already
destroyed ns_interface_t object.
Previously, as a way of reducing the contention between threads a
clientmgr object would be created for each interface/IP address.
We tasks being more strictly bound to netmgr workers, this is no longer
needed and we can just create clientmgr object per worker queue (ncpus).
Each clientmgr object than would have a single task and single memory
context.
Since a client object is bound to a netmgr handle, each client
will always be processed by the same netmgr worker, so we can
simplify the code by binding client->task to the same thread as
the client. Since ns__client_request() now runs in the same event
loop as client->task events, is no longer necessary to pause the
task manager before launching them.
Also removed some functions in isc_task that were not used.
Instead of using fixed quantum, this commit adds atomic counter for
number of items on each queue and uses the number of netievents
scheduled to run as the limit of maximum number of netievents for a
single process_queue() run.
This prevents the endless loops when the netievent would schedule more
netievents onto the same loop, but we don't have to pick "magic" number
for the quantum.
This commit adds a new configuration option to set the receive and send
buffer sizes on the TCP and UDP netmgr sockets. The default is `0`
which doesn't set any value and just uses the value set by the operating
system.
There's no magic value here - set it too small and the performance will
drop, set it too large, the buffers can fill-up with queries that have
already timeouted on the client side and nobody is interested for the
answer and this would just make the server clog up even more by making
it produce useless work.
The `netstat -su` can be used on POSIX systems to monitor the receive
and send buffer errors.
The netmgr listening, stoplistening, pausing and resuming functions
now use barriers for synchronization, which makes the code much simpler.
isc/barrier.h defines isc_barrier macros as a front-end for uv_barrier
on platforms where that works, and pthread_barrier where it doesn't
(including TSAN builds).
all zone loading tasks have the privileged flag, but we only want
them to run as privileged tasks when the server is being initialized;
if we privilege them the rest of the time, the server may hang for a
long time after a reload/reconfig. so now we call isc_taskmgr_setmode()
to turn privileged execution mode on or off in the task manager.
isc_task_privileged() returns true if the task's privilege flag is
set *and* the taskmgr is in privileged execution mode. this is used
to determine in which netmgr event queue the task should be run.
There was a theoretical possibility of clogging up the queue processing
with an endless loop where currently processing netievent would schedule
new netievent that would get processed immediately. This wasn't such a
problem when only netmgr netievents were processed, but with the
addition of the tasks, there are at least two situation where this could
happen:
1. In lib/dns/zone.c:setnsec3param() the task would get re-enqueued
when the zone was not yet fully loaded.
2. Tasks have internal quantum for maximum number of isc_events to be
processed, when the task quantum is reached, the task would get
rescheduled and then immediately processed by the netmgr queue
processing.
As the isc_queue doesn't have a mechanism to atomically move the queue,
this commit adds a mechanism to quantize the queue, so enqueueing new
netievents will never stop processing other uv_loop_t events.
The default quantum size is 128.
Since the queue used in the network manager allows items to be enqueued
more than once, tasks are now reference-counted around task_ready()
and task_run(). task_ready() now has a public API wrapper,
isc_task_ready(), that the netmgr can use to reschedule processing
of a task if the quantum has been reached.
Incidental changes: Cleaned up some unused fields left in isc_task_t
and isc_taskmgr_t after the last refactoring, and changed atomic
flags to atomic_bools for easier manipulation.
With taskmgr running on top of netmgr, the ordering of how the tasks and
netmgr shutdown interacts was wrong as previously isc_taskmgr_destroy()
was waiting until all tasks were properly shutdown and detached. This
responsibility was moved to netmgr, so we now need to do the following:
1. shutdown all the tasks - this schedules all shutdown events onto
the netmgr queue
2. shutdown the netmgr - this also makes sure all the tasks and
events are properly executed
3. Shutdown the taskmgr - this now waits for all the tasks to finish
running before returning
4. Shutdown the netmgr - this call waits for all the netmgr netievents
to finish before returning
This solves the race when the taskmgr object would be destroyed before
all the tasks were finished running in the netmgr loops.
Previously, netmgr, taskmgr, timermgr and socketmgr all had their own
isc_<*>mgr_create() and isc_<*>mgr_destroy() functions. The new
isc_managers_create() and isc_managers_destroy() fold all four into a
single function and makes sure the objects are created and destroy in
correct order.
Especially now, when taskmgr runs on top of netmgr, the correct order is
important and when the code was duplicated at many places it's easy to
make mistake.
The former isc_<*>mgr_create() and isc_<*>mgr_destroy() functions were
made private and a single call to isc_managers_create() and
isc_managers_destroy() is required at the program startup / shutdown.
This commit adds support for generating backtraces on Windows and
refactors the isc_backtrace API to match the Linux/BSD API (without
the isc_ prefix)
* isc_backtrace_gettrace() was renamed to isc_backtrace(), the third
argument was removed and the return type was changed to int
* isc_backtrace_symbols() was added
* isc_backtrace_symbols_fd() was added and used as appropriate
The malloc attribute allows compiler to do some optmizations on
functions that behave like malloc/calloc, like assuming that the
returned pointer do not alias other pointers.
This commit changes the taskmgr to run the individual tasks on the
netmgr internal workers. While an effort has been put into keeping the
taskmgr interface intact, couple of changes have been made:
* The taskmgr has no concept of universal privileged mode - rather the
tasks are either privileged or unprivileged (normal). The privileged
tasks are run as a first thing when the netmgr is unpaused. There
are now four different queues in in the netmgr:
1. priority queue - netievent on the priority queue are run even when
the taskmgr enter exclusive mode and netmgr is paused. This is
needed to properly start listening on the interfaces, free
resources and resume.
2. privileged task queue - only privileged tasks are queued here and
this is the first queue that gets processed when network manager
is unpaused using isc_nm_resume(). All netmgr workers need to
clean the privileged task queue before they all proceed normal
operation. Both task queues are processed when the workers are
finished.
3. task queue - only (traditional) task are scheduled here and this
queue along with privileged task queues are process when the
netmgr workers are finishing. This is needed to process the task
shutdown events.
4. normal queue - this is the queue with netmgr events, e.g. reading,
sending, callbacks and pretty much everything is processed here.
* The isc_taskmgr_create() now requires initialized netmgr (isc_nm_t)
object.
* The isc_nm_destroy() function now waits for indefinite time, but it
will print out the active objects when in tracing mode
(-DNETMGR_TRACE=1 and -DNETMGR_TRACE_VERBOSE=1), the netmgr has been
made a little bit more asynchronous and it might take longer time to
shutdown all the active networking connections.
* Previously, the isc_nm_stoplistening() was a synchronous operation.
This has been changed and the isc_nm_stoplistening() just schedules
the child sockets to stop listening and exits. This was needed to
prevent a deadlock as the the (traditional) tasks are now executed on
the netmgr threads.
* The socket selection logic in isc__nm_udp_send() was flawed, but
fortunatelly, it was broken, so we never hit the problem where we
created uvreq_t on a socket from nmhandle_t, but then a different
socket could be picked up and then we were trying to run the send
callback on a socket that had different threadid than currently
running.
Since all the libraries are internal now, just cleanup the ISCAPI remnants
in isc_socket, isc_task and isc_timer APIs. This means, there's one less
layer as following changes have been done:
* struct isc_socket and struct isc_socketmgr have been removed
* struct isc__socket and struct isc__socketmgr have been renamed
to struct isc_socket and struct isc_socketmgr
* struct isc_task and struct isc_taskmgr have been removed
* struct isc__task and struct isc__taskmgr have been renamed
to struct isc_task and struct isc_taskmgr
* struct isc_timer and struct isc_timermgr have been removed
* struct isc__timer and struct isc__timermgr have been renamed
to struct isc_timer and struct isc_timermgr
* All the associated code that dealt with typing isc_<foo>
to isc__<foo> and back has been removed.
Previously, the taskmgr, timermgr and socketmgr had a constructor
variant, that would create the mgr on top of existing appctx. This was
no longer true and isc_<*>mgr was just calling isc_<*>mgr_create()
directly without any extra code.
This commit just cleans up the extra function.
The isc_nm_*connect() functions were refactored to always return the
connection status via the connect callback instead of sometimes returning
the hard failure directly (for example, when the socket could not be
created, or when the network manager was shutting down).
This commit changes the connect functions in all the network manager
modules, and also makes the necessary refactoring changes in places
where the connect functions are called.
util.h requires ISC_CONSTRUCTOR definition, which depends on config.h
inclusion. It does not include it from isc/util.h (or any other header).
Using isc/util.h fails hard when isc/util.h is used without including
bind's config.h.
Move the check to c file, where ISC_CONSTRUCTOR is used. Ensure config.h
is included there.
The current isc_time_now uses CLOCK_REALTIME_COARSE which only updates
on a timer tick. This clock is generally fine for millisecond accuracy,
but on servers with 100hz clocks, this clock is nowhere near accurate
enough for microsecond accuracy.
This commit adds a new isc_time_now_hires function that uses
CLOCK_REALTIME, which gives the current time, though it is somewhat
expensive to call. When microsecond accuracy is required, it may be
required to use extra resources for higher accuracy.
The RFC7828 specifies the keepalive interval to be 16-bit, specified in
units of 100 milliseconds and the configuration options tcp-*-timeouts
are following the suit. The units of 100 milliseconds are very
unintuitive and while we can't change the configuration and presentation
format, we should not follow this weird unit in the API.
This commit changes the isc_nm_(get|set)timeouts() functions to work
with milliseconds and convert the values to milliseconds before passing
them to the function, not just internally.
After the TCPDNS refactoring the initial and idle timers were broken and
only the tcp-initial-timeout was always applied on the whole TCP
connection.
This broke any TCP connection that took longer than tcp-initial-timeout,
most often this would affect large zone AXFRs.
This commit changes the timeout logic in this way:
* On TCP connection accept the tcp-initial-timeout is applied
and the timer is started
* When we are processing and/or sending any DNS message the timer is
stopped
* When we stop processing all DNS messages, the tcp-idle-timeout
is applied and the timer is started again
- style, cleanup, and removal of unnecessary code.
- combined isc_nm_http_add_endpoint() and isc_nm_http_add_doh_endpoint()
into one function, renamed isc_http_endpoint().
- moved isc_nm_http_connect_send_request() into doh_test.c as a helper
function; remove it from the public API.
- renamed isc_http2 and isc_nm_http2 types and functions to just isc_http
and isc_nm_http, for consistency with other existing names.
- shortened a number of long names.
- the caller is now responsible for determining the peer address.
in isc_nm_httpconnect(); this eliminates the need to parse the URI
and the dependency on an external resolver.
- the caller is also now responsible for creating the SSL client context,
for consistency with isc_nm_tlsdnsconnect().
- added setter functions for HTTP/2 ALPN. instead of setting up ALPN in
isc_tlsctx_createclient(), we now have a function
isc_tlsctx_enable_http2client_alpn() that can be run from
isc_nm_httpconnect().
- refactored isc_nm_httprequest() into separate read and send functions.
isc_nm_send() or isc_nm_read() is called on an http socket, it will
be stored until a corresponding isc_nm_read() or _send() arrives; when
we have both halves of the pair the HTTP request will be initiated.
- isc_nm_httprequest() is renamed isc__nm_http_request() for use as an
internal helper function by the DoH unit test. (eventually doh_test
should be rewritten to use read and send, and this function should
be removed.)
- added implementations of isc__nm_tls_settimeout() and
isc__nm_http_settimeout().
- increased NGHTTP2 header block length for client connections to 128K.
- use isc_mem_t for internal memory allocations inside nghttp2, to
help track memory leaks.
- send "Cache-Control" header in requests and responses. (note:
currently we try to bypass HTTP caching proxies, but ideally we should
interact with them: https://tools.ietf.org/html/rfc8484#section-5.1)
Instead of calling isc_tls_initialize()/isc_tls_destroy() explicitly use
gcc/clang attributes on POSIX and DLLMain on Windows to initialize and
shutdown OpenSSL library.
This resolves the issue when isc_nm_create() / isc_nm_destroy() was
called multiple times and it would call OpenSSL library destructors from
isc_nm_destroy().
At the same time, since we now have introduced the ctor/dtor for libisc,
this commit moves the isc_mem API initialization (the list of the
contexts) and changes the isc_mem_checkdestroyed() to schedule the
checking of memory context on library unload instead of executing the
code immediately.
The two memory debugging features: ISC_MEM_DEFAULTFILL
(ISC_MEMFLAG_FILL) and ISC_MEM_TRACKLINES were always enabled in all
builds and the former was only disabled in `named`.
This commits disables those two features in non-developer build to make
the memory allocator significantly faster.
This is yet another step into unlocking some parts of the memory
contexts. All the regularly updated variables has been turned into
atomic types, so we can later remove the locks when updating various
counters.
Also unlock as much code as possible without breaking anything.
On 24-core machine, the tests would crash because we would run out of
the hazard pointers. We now adjust the number of hazard pointers to be
in the <128,256> interval based on the number of available cores.
Note: This is just a band-aid and needs a proper fix.
Previously, the applications using libisc would be able to override the
internal memory methods with own implementation. This was no longer
possible, but the extra level of indirection was not removed. This
commit removes the extra level of indirection for the memory methods and
the default_memalloc() and default_memfree().
The internal memory allocator had an extra code to keep a list of blocks
for small size allocation. This would help to reduce the interactions
with the system malloc as the memory would be already allocated from the
system, but there's an extra cost associated with that - all the
allocations/deallocations must be locked, effectively eliminating any
optimizations in the system allocator targeted at multi-threaded
applications. While the isc_mem API is still using locks pretty heavily,
this is a first step into reducing the memory allocation/deallocation
contention.
The <isc/readline.h> header provided a compatibility shim to use when
other non-GNU readline libraries are in use. The two places where
readline library is being used is nslookup and nsupdate, so the header
file has been moved to bin/dig directory and it's directly included from
bin/nsupdate.
This also conceals any readline headers exposed from the libisc headers.
This commit completes the support for DNS-over-HTTP(S) built on top of
nghttp2 and plugs it into the BIND. Support for both GET and POST
requests is present, as required by RFC8484.
Both encrypted (via TLS) and unencrypted HTTP/2 connections are
supported. The latter are mostly there for debugging/troubleshooting
purposes and for the means of encryption offloading to third-party
software (as might be desirable in some environments to simplify TLS
certificates management).
This commit includes work-in-progress implementation of
DNS-over-HTTP(S).
Server-side code remains mostly untested, and there is only support
for POST requests.
This commit resurrects the old TLS code from
8f73c70d23.
It also includes numerous stability fixes and support for
isc_nm_cancelread() for the TLS layer.
The code was resurrected to be used for DoH.
* Following the example set in 634bdfb16d, the tlsdns netmgr
module now uses libuv and SSL primitives directly, rather than
opening a TLS socket which opens a TCP socket, as the previous
model was difficult to debug. Closes#2335.
* Remove the netmgr tls layer (we will have to re-add it for DoH)
* Add isc_tls API to wrap the OpenSSL SSL_CTX object into libisc
library; move the OpenSSL initialization/deinitialization from dstapi
needed for OpenSSL 1.0.x to the isc_tls_{initialize,destroy}()
* Add couple of new shims needed for OpenSSL 1.0.x
* When LibreSSL is used, require at least version 2.7.0 that
has the best OpenSSL 1.1.x compatibility and auto init/deinit
* Enforce OpenSSL 1.1.x usage on Windows
* Added a TLSDNS unit test and implemented a simple TLSDNS echo
server and client.
This is a part of the works that intends to make the netmgr stable,
testable, maintainable and tested. It contains a numerous changes to
the netmgr code and unfortunately, it was not possible to split this
into smaller chunks as the work here needs to be committed as a complete
works.
NOTE: There's a quite a lot of duplicated code between udp.c, tcp.c and
tcpdns.c and it should be a subject to refactoring in the future.
The changes that are included in this commit are listed here
(extensively, but not exclusively):
* The netmgr_test unit test was split into individual tests (udp_test,
tcp_test, tcpdns_test and newly added tcp_quota_test)
* The udp_test and tcp_test has been extended to allow programatic
failures from the libuv API. Unfortunately, we can't use cmocka
mock() and will_return(), so we emulate the behaviour with #define and
including the netmgr/{udp,tcp}.c source file directly.
* The netievents that we put on the nm queue have variable number of
members, out of these the isc_nmsocket_t and isc_nmhandle_t always
needs to be attached before enqueueing the netievent_<foo> and
detached after we have called the isc_nm_async_<foo> to ensure that
the socket (handle) doesn't disappear between scheduling the event and
actually executing the event.
* Cancelling the in-flight TCP connection using libuv requires to call
uv_close() on the original uv_tcp_t handle which just breaks too many
assumptions we have in the netmgr code. Instead of using uv_timer for
TCP connection timeouts, we use platform specific socket option.
* Fix the synchronization between {nm,async}_{listentcp,tcpconnect}
When isc_nm_listentcp() or isc_nm_tcpconnect() is called it was
waiting for socket to either end up with error (that path was fine) or
to be listening or connected using condition variable and mutex.
Several things could happen:
0. everything is ok
1. the waiting thread would miss the SIGNAL() - because the enqueued
event would be processed faster than we could start WAIT()ing.
In case the operation would end up with error, it would be ok, as
the error variable would be unchanged.
2. the waiting thread miss the sock->{connected,listening} = `true`
would be set to `false` in the tcp_{listen,connect}close_cb() as
the connection would be so short lived that the socket would be
closed before we could even start WAIT()ing
* The tcpdns has been converted to using libuv directly. Previously,
the tcpdns protocol used tcp protocol from netmgr, this proved to be
very complicated to understand, fix and make changes to. The new
tcpdns protocol is modeled in a similar way how tcp netmgr protocol.
Closes: #2194, #2283, #2318, #2266, #2034, #1920
* The tcp and tcpdns is now not using isc_uv_import/isc_uv_export to
pass accepted TCP sockets between netthreads, but instead (similar to
UDP) uses per netthread uv_loop listener. This greatly reduces the
complexity as the socket is always run in the associated nm and uv
loops, and we are also not touching the libuv internals.
There's an unfortunate side effect though, the new code requires
support for load-balanced sockets from the operating system for both
UDP and TCP (see #2137). If the operating system doesn't support the
load balanced sockets (either SO_REUSEPORT on Linux or SO_REUSEPORT_LB
on FreeBSD 12+), the number of netthreads is limited to 1.
* The netmgr has now two debugging #ifdefs:
1. Already existing NETMGR_TRACE prints any dangling nmsockets and
nmhandles before triggering assertion failure. This options would
reduce performance when enabled, but in theory, it could be enabled
on low-performance systems.
2. New NETMGR_TRACE_VERBOSE option has been added that enables
extensive netmgr logging that allows the software engineer to
precisely track any attach/detach operations on the nmsockets and
nmhandles. This is not suitable for any kind of production
machine, only for debugging.
* The tlsdns netmgr protocol has been split from the tcpdns and it still
uses the old method of stacking the netmgr boxes on top of each other.
We will have to refactor the tlsdns netmgr protocol to use the same
approach - build the stack using only libuv and openssl.
* Limit but not assert the tcp buffer size in tcp_alloc_cb
Closes: #2061
Add server-side TLS support to netmgr - that includes moving some of the
isc_nm_ functions from tcp.c to a wrapper in netmgr.c calling a proper
tcp or tls function, and a new isc_nm_listentls() function.
Add DoT support to tcpdns - isc_nm_listentlsdns().
this function sets the read timeout for the socket associated
with a netmgr handle and, if the timer is running, resets it.
for TCPDNS sockets it also sets the read timeout and resets the
timer on the outer TCP socket.
- isc_nm_tcpdnsconnect() sets up up an outgoing TCP DNS connection.
- isc_nm_tcpconnect(), _udpconnect() and _tcpdnsconnect() now take a
timeout argument to ensure connections time out and are correctly
cleaned up on failure.
- isc_nm_read() now supports UDP; it reads a single datagram and then
stops until the next time it's called.
- isc_nm_cancelread() now runs asynchronously to prevent assertion
failure if reading is interrupted by a non-network thread (e.g.
a timeout).
- isc_nm_cancelread() can now apply to UDP sockets.
- added shim code to support UDP connection in versions of libuv
prior to 1.27, when uv_udp_connect() was added
all these functions will be used to support outgoing queries in dig,
xfrin, dispatch, etc.
1. The isc__nm_tcp_send() and isc__nm_tcp_read() was not checking
whether the socket was still alive and scheduling reads/sends on
closed socket.
2. The isc_nm_read(), isc_nm_send() and isc_nm_resumeread() have been
changed to always return the error conditions via the callbacks, so
they always succeed. This applies to all protocols (UDP, TCP and
TCPDNS).
Attaching and detaching handle pointers will make it easier to
determine where and why reference counting errors have occurred.
A handle needs to be referenced more than once when multiple
asynchronous operations are in flight, so callers must now maintain
multiple handle pointers for each pending operation. For example,
ns_client objects now contain:
- reqhandle: held while waiting for a request callback (query,
notify, update)
- sendhandle: held while waiting for a send callback
- fetchhandle: held while waiting for a recursive fetch to
complete
- updatehandle: held while waiting for an update-forwarding
task to complete
control channel connection objects now contain:
- readhandle: held while waiting for a read callback
- sendhandle: held while waiting for a send callback
- cmdhandle: held while an rndc command is running
httpd connections contain:
- readhandle: held while waiting for a read callback
- sendhandle: held while waiting for a send callback
- rename isc_nmsocket_t->tcphandle to statichandle
- cancelread functions now take handles instead of sockets
- add a 'client' flag in socket objects, currently unused, to
indicate whether it is to be used as a client or server socket
While
if (isc_refcount_decrement() == 1) { // memory_order_release
isc_refcount_destroy(); // memory_order_acquire
...
}
is theoretically the most efficent in practice, using
memory_order_acq_rel produces the same code on x86_64 and doesn't
trigger tsan data races (which use a idealistic model) if
isc_refcount_destroy() is not called immediately. In fact
isc_refcount_destroy() could be removed if we didn't want
to check for the count being 0 when isc_refcount_destroy() is
called.
https://stackoverflow.com/questions/49112732/memory-order-in-shared-pointer-destructor
This commit updates and simplifies the checks for the readline support
in nslookup and nsupdate:
* Change the autoconf checks to pkg-config only, all supported
libraries have accompanying .pc files now.
* Add editline support in addition to libedit and GNU readline
* Add isc/readline.h shim header that defines dummy readline()
function when no readline library is available
When pk11_numbits() is passed a user provided input that contains all
zeroes (via crafted DNS message), it would crash with assertion
failure. Fix that by properly handling such input.
Created isc_refcount_decrement_expect macro to test conditionally
the return value to ensure it is in expected range. Converted
unchecked isc_refcount_decrement to use isc_refcount_decrement_expect.
Converted INSIST(isc_refcount_decrement()...) to isc_refcount_decrement_expect.
The stdatomic shims for non-C11 compilers (Windows, old gcc, ...) and
mutexatomic implemented only and minimal subset of the atomic types.
This commit adds 16-bit operations for Windows and all atomic types as
defined in standard.
the blackhole ACL was accidentally disabled with respect to client
queries during the netmgr conversion.
in order to make this work for TCP, it was necessary to add a return
code to the accept callback functions passed to isc_nm_listentcp() and
isc_nm_listentcpdns().
The isc_nm_cancelread() function cancels reading on a connected
socket and calls its read callback function with a 'result'
parameter of ISC_R_CANCELED.
the isc_nm_tcpconnect() function establishes a client connection via
TCP. once the connection is esablished, a callback function will be
called with a newly created network manager handle.
there is no need for a caller to reference-count socket objects.
they need tto be able tto close listener sockets (i.e., those
returned by isc_nm_listen{udp,tcp,tcpdns}), and an isc_nmsocket_close()
function has been added for that. other sockets are only accessed via
handles.
when built with "configure --enable-singletrace", named will produce
detailed query logging at the highest debug level for any query with
query ID zero.
this enables monitoring of the progress of a single query by specifying
the QID using "dig +qid=0". the "client" logging category should be set
to a low severity level to suppress logging of other queries. (the
chance of another query using QID=0 at the same time is only 1 in 2^16.)
"--enable-singletrace" turns on "--enable-querytrace" as well, so if the
logging severity is not lowered, all other queries will be logged
verbosely as well. compiling with either of these options will impair
query performance; they should only be turned on when testing or
troubleshooting.
Instead of using bind() and passing the listening socket to the children
threads using uv_export/uv_import use one thread that does the accepting,
and then passes the connected socket using uv_export/uv_import to a random
worker. The previous solution had thundering herd problems (all workers
waking up on one connection and trying to accept()), this one avoids this
and is simpler.
The tcp clients quota is simplified with isc_quota_attach_cb - a callback
is issued when the quota is available.
The rewrite of BIND 9 build system is a large work and cannot be reasonable
split into separate merge requests. Addition of the automake has a positive
effect on the readability and maintainability of the build system as it is more
declarative, it allows conditional and we are able to drop all of the custom
make code that BIND 9 developed over the years to overcome the deficiencies of
autoconf + custom Makefile.in files.
This squashed commit contains following changes:
- conversion (or rather fresh rewrite) of all Makefile.in files to Makefile.am
by using automake
- the libtool is now properly integrated with automake (the way we used it
was rather hackish as the only official way how to use libtool is via
automake
- the dynamic module loading was rewritten from a custom patchwork to libtool's
libltdl (which includes the patchwork to support module loading on different
systems internally)
- conversion of the unit test executor from kyua to automake parallel driver
- conversion of the system test executor from custom make/shell to automake
parallel driver
- The GSSAPI has been refactored, the custom SPNEGO on the basis that
all major KRB5/GSSAPI (mit-krb5, heimdal and Windows) implementations
support SPNEGO mechanism.
- The various defunct tests from bin/tests have been removed:
bin/tests/optional and bin/tests/pkcs11
- The text files generated from the MD files have been removed, the
MarkDown has been designed to be readable by both humans and computers
- The xsl header is now generated by a simple sed command instead of
perl helper
- The <irs/platform.h> header has been removed
- cleanups of configure.ac script to make it more simpler, addition of multiple
macros (there's still work to be done though)
- the tarball can now be prepared with `make dist`
- the system tests are partially able to run in oot build
Here's a list of unfinished work that needs to be completed in subsequent merge
requests:
- `make distcheck` doesn't yet work (because of system tests oot run is not yet
finished)
- documentation is not yet built, there's a different merge request with docbook
to sphinx-build rst conversion that needs to be rebased and adapted on top of
the automake
- msvc build is non functional yet and we need to decide whether we will just
cross-compile bind9 using mingw-w64 or fix the msvc build
- contributed dlz modules are not included neither in the autoconf nor automake
The pk11/constants.h header contained static CK_BYTE arrays and
we had to use #defines to pull only those we need. This commit
changes the constants to only define byte arrays with the content
and either use them directly or define the CK_BYTE arrays locally
where used.
We introduce a isc_quota_attach_cb function - if ISC_R_QUOTA is returned
at the time the function is called, then a callback will be called when
there's quota available (with quota already attached). The callbacks are
organized as a LIFO queue in the quota structure.
It's needed for TCP client quota - with old networking code we had one
single place where tcp clients quota was processed so we could resume
accepting when the we had spare slots, but it's gone with netmgr - now
we need to notify the listener/accepter that there's quota available so
that it can resume accepting.
Remove unused isc_quota_force() function.
The isc_quote_reserve and isc_quota_release were used only internally
from the quota.c and the tests. We should not expose API we are not
using.
tcpdns used transport-specific functions to operate on the outer socket.
Use generic ones instead, and select the proper call in netmgr.c.
Make the missing functions (e.g. isc_nm_read) generic and add type-specific
calls (isc__nm_tcp_read). This is the preparation for netmgr TLS layer.
The isc_mem API now crashes on memory allocation failure, and this is
the next commit in series to cleanup the code that could fail before,
but cannot fail now, e.g. isc_result_t return type has been changed to
void for the isc_log API functions that could only return ISC_R_SUCCESS.
The <isc/md.h> header directly included <openssl/evp.h> header which
enforced all users of the libisc library to explicitly list the include
path to OpenSSL and link with -lcrypto. By hiding the specific
implementation into the private namespace, we no longer enforce this.
In the long run, this might also allow us to switch cryptographic
library implementation without affecting the downstream users.
While making the isc_md_type_t type opaque, the API using the data type
was changed to use the pointer to isc_md_type_t instead of using the
type directly.
The <isc/md.h> header directly included <openssl/hmac.h> header which
enforced all users of the libisc library to explicitly list the include
path to OpenSSL and link with -lcrypto. By hiding the specific
implementation into the private namespace, we no longer enforce this.
In the long run, this might also allow us to switch cryptographic
library implementation without affecting the downstream users.
The two "functions" that isc/safe.h declared before were actually simple
defines to matching OpenSSL functions. The downside of the approach was
enforcing all users of the libisc library to explicitly list the include
path to OpenSSL and link with -lcrypto. By hiding the specific
implementation into the private namespace changing the defines into
simple functions, we no longer enforce this. In the long run, this
might also allow us to switch cryptographic library implementation
without affecting the downstream users.
The previous commit removed the code related to the internal symbol
table. On platforms where available, we can now use backtrace_symbols()
to print more verbose symbols table to the output.
As there's now general availability of backtrace() and
backtrace_symbols() functions (see below), the commit also removes the
usage of glibc internals and the custom stack tracing.
* backtrace(), backtrace_symbols(), and backtrace_symbols_fd() are
provided in glibc since version 2.1.
* backtrace(), backtrace_symbols(), and backtrace_symbols_fd() first
appeared in Mac OS X 10.5.
* The backtrace() library of functions first appeared in NetBSD 7.0 and
FreeBSD 10.0.
This commit simplifies a bit the lock management within dns_resolver_prime()
and prime_done() functions by means of turning resolver's attribute
"priming" into an atomic_bool and by creating only one dependent object on the
lock "primelock", namely the "primefetch" attribute.
By having the attribute "priming" as an atomic type, it save us from having to
use a lock just to test if priming is on or off for the given resolver context
object, within "dns_resolver_prime" function.
The "primelock" lock is still necessary, since dns_resolver_prime() function
internally calls dns_resolver_createfetch(), and whenever this function
succeeds it registers an event in the task manager which could be called by
another thread, namely the "prime_done" function, and this function is
responsible for disposing the "primefetch" attribute in the resolver object,
also for resetting "priming" attribute to false.
It is important that the invariant "priming == false AND primefetch == NULL"
remains constant, so that any thread calling "dns_resolver_prime" knows for sure
that if the "priming" attribute is false, "primefetch" attribute should also be
NULL, so a new fetch context could be created to fulfill this purpose, and
assigned to "primefetch" attribute under the lock protection.
To honor the explanation above, dns_resolver_prime is implemented as follow:
1. Atomically checks the attribute "priming" for the given resolver context.
2. If "priming" is false, assumes that "primefetch" is NULL (this is
ensured by the "prime_done" implementation), acquire "primelock"
lock and create a new fetch context, update "primefetch" pointer to
point to the newly allocated fetch context.
3. If "priming" is true, assumes that the job is already in progress,
no locks are acquired, nothing else to do.
To keep the previous invariant consistent, "prime_done" is implemented as follow:
1. Acquire "primefetch" lock.
2. Keep a reference to the current "primefetch" object;
3. Reset "primefetch" attribute to NULL.
4. Release "primefetch" lock.
5. Atomically update "priming" attribute to false.
6. Destroy the "primefetch" object by using the temporary reference.
This ensures that if "priming" is false, "primefetch" was already reset to NULL.
It doesn't make any difference in having the "priming" attribute not protected
by a lock, since the visible state of this variable would depend on the calling
order of the functions "dns_resolver_prime" and "prime_done".
As an example, suppose that instead of using an atomic for the "priming" attribute
we employed a lock to protect it.
Now suppose that "prime_done" function is called by Thread A, it is then preempted
before acquiring the lock, thus not reseting "priming" to false.
In parallel to that suppose that a Thread B is scheduled and that it calls
"dns_resolver_prime()", it then acquires the lock and check that "priming" is true,
thus it will consider that this resolver object is already priming and it won't do
any more job.
Conversely if the lock order was acquired in the other direction, Thread B would check
that "priming" is false (since prime_done acquired the lock first and set "priming" to false)
and it would initiate a priming fetch for this resolver.
An atomic variable wouldn't change this behavior, since it would behave exactly the
same, depending on the function call order, with the exception that it would avoid
having to use a lock.
There should be no side effects resulting from this change, since the previous
implementation employed use of the more general resolver's "lock" mutex, which
is used in far more contexts, but in the specifics of the "dns_resolver_prime"
and "prime_done" it was only used to protect "primefetch" and "priming" attributes,
which are not used in any of the other critical sections protected by the same lock,
thus having zero dependency on those variables.
hp implementation requires an object for each thread accessing
a hazard pointer. previous implementation had a hardcoded
HP_MAX_THREAD value of 128, which failed on machines with lots of
CPU cores (named uses 3n threads). We make isc__hp_max_threads
configurable at startup, with the value set to 4*named_g_cpus.
It's also important for this value not to be too big as we do
linear searches on a list.
The isc_refcount API that provides reference counting lost DbC checks for
overflows and underflows in the isc_refcount_{increment,decrement} functions.
The commit restores the overflow check in the isc_refcount_increment and
underflows check in the isc_refcount_decrement by checking for the previous
value to not be on the boundary.
- the socket stat counters have been moved from socket.h to stats.h.
- isc_nm_t now attaches to the same stats counter group as
isc_socketmgr_t, so that both managers can increment the same
set of statistics
- isc__nmsocket_init() now takes an interface as a paramter so that
the address family can be determined when initializing the socket.
- based on the address family and socket type, a group of statistics
counters will be associated with the socket - for example, UDP4Active
with IPv4 UDP sockets and TCP6Active with IPv6 TCP sockets. note
that no counters are currently associated with TCPDNS sockets; those
stats will be handled by the underlying TCP socket.
- the counters are not actually used by netmgr sockets yet; counter
increment and decrement calls will be added in a later commit.
After the network manager rewrite, tcp-higwater stats was only being
updated when a valid DNS query was received over tcp.
It turns out tcp-quota is updated right after a tcp connection is
accepted, before any data is read, so in the event that some client
connect but don't send a valid query, it wouldn't be taken into
account to update tcp-highwater stats, that is wrong.
This commit fix tcp-highwater to update its stats whenever a tcp connection
is established, independent of what happens after (timeout/invalid
request, etc).
The new ISC_THREAD_LOCAL macro unifies usage of platform dependent
Thread Local Storage definition thread_local vs __thread vs
__declspec(thread) to a single macro.
The commit also unifies the required level of support for TLS as for
some parts of the code it was mandatory and for some parts of the code
it wasn't.
FCTX_ATTR_SHUTTINGDOWN needs to be set and tested while holding the node
lock but the rest of the attributes don't as they are task locked. Making
fctx->attributes atomic allows both behaviours without races.
- restore support for tcp-initial-timeout, tcp-idle-timeout,
tcp-keepalive-timeout and tcp-advertised-timeout configuration
options, which were ineffective previously.
when the TCPDNS_CLIENTS_PER_CONN limit has been exceeded for a TCP
DNS connection, switch to sequential mode to ensure that memory cannot
be exhausted by too many simultaneous queries.
This allows a task to be temporary disabled so that objects won't be
processed simultaneously by libuv events and isc_task events. When a
task is paused, currently running events may complete, but no further
event will added to the run queue will be executed until the task is
unpaused.
When a task manager is created, we can now specify an `isc_nm`
object to associate with it; thereafter when the task manager is
placed into exclusive mode, the network manager will be paused.
This is a replacement for the existing isc_socket and isc_socketmgr
implementation. It uses libuv for asynchronous network communication;
"networker" objects will be distributed across worker threads reading
incoming packets and sending them for processing.
UDP listener sockets automatically create an array of "child" sockets
so each worker can listen separately.
TCP sockets are shared amongst worker threads.
A TCPDNS socket is a wrapper around a TCP socket, which handles the
the two-byte length field at the beginning of DNS messages over TCP.
(Other wrapper socket types can be implemented in the future to handle
DNS over TLS, DNS over HTTPS, etc.)
The double-locked queue implementation is still currently in use
in ns_client, but will be replaced by a fetch-and-add array queue.
This commit moves it from queue.h to list.h so that queue.h can be
used for the new data structure, and clean up dependencies between
list.h and types.h. Later, when the ISC_QUEUE is no longer is use,
it will be removed completely.
cppcheck 1.89 enabled certain value flow analysis mechanisms [1] which
trigger null pointer dereference false positives in lib/dns/rpz.c:
lib/dns/rpz.c:582:7: warning: Possible null pointer dereference: tgt_ip [nullPointer]
if (KEY_IS_IPV4(tgt_prefix, tgt_ip)) {
^
lib/dns/rpz.c:1419:44: note: Calling function 'adj_trigger_cnt', 4th argument 'NULL' value is 0
adj_trigger_cnt(rpzs, rpz_num, rpz_type, NULL, 0, true);
^
lib/dns/rpz.c:582:7: note: Null pointer dereference
if (KEY_IS_IPV4(tgt_prefix, tgt_ip)) {
^
lib/dns/rpz.c:596:7: warning: Possible null pointer dereference: tgt_ip [nullPointer]
if (KEY_IS_IPV4(tgt_prefix, tgt_ip)) {
^
lib/dns/rpz.c:1419:44: note: Calling function 'adj_trigger_cnt', 4th argument 'NULL' value is 0
adj_trigger_cnt(rpzs, rpz_num, rpz_type, NULL, 0, true);
^
lib/dns/rpz.c:596:7: note: Null pointer dereference
if (KEY_IS_IPV4(tgt_prefix, tgt_ip)) {
^
lib/dns/rpz.c:610:7: warning: Possible null pointer dereference: tgt_ip [nullPointer]
if (KEY_IS_IPV4(tgt_prefix, tgt_ip)) {
^
lib/dns/rpz.c:1419:44: note: Calling function 'adj_trigger_cnt', 4th argument 'NULL' value is 0
adj_trigger_cnt(rpzs, rpz_num, rpz_type, NULL, 0, true);
^
lib/dns/rpz.c:610:7: note: Null pointer dereference
if (KEY_IS_IPV4(tgt_prefix, tgt_ip)) {
^
It seems that cppcheck no longer treats at least some REQUIRE()
assertion failures as fatal, so add extra assertion macro definitions to
lib/isc/include/isc/util.h that are only used when the CPPCHECK
preprocessor macro is defined; these definitions make cppcheck 1.89
behave as expected.
There is an important requirement for these custom definitions to work:
cppcheck must properly treat abort() as a function which does not
return. In order for that to happen, the __GNUC__ macro must be set to
a high enough number (because system include directories are used and
system headers compile attributes away if __GNUC__ is not high enough).
__GNUC__ is thus set to the major version number of the GCC compiler
used, which is what that latter does itself during compilation.
[1] aaeec462e6
The OASIS pkcs11.h header has a restrictive license. Replace the
pkcs11.h pkcs11f.h and pkcs11t.h headers with pkcs11.h from p11-kit.
For source distribution, the license for the OASIS headers itself
doesn't pose any licensing problem when combined with MPL license, but
it possibly creates problem for downstream distributors of BIND 9.
Previously the libisc allocator had ability to run unlocked when threading was
disabled. As the threading is now always on, remove the ISC_MEMFLAG_NOLOCK
memory flag as it serves no purpose.
The isc_mem_createx() function was only used in the tests to eliminate using the
default flags (which as of writing this commit message was ISC_MEMFLAG_INTERNAL
and ISC_MEMFLAG_FILL). This commit removes the isc_mem_createx() function from
the public API.
Previously, the isc_mem_create() and isc_mem_createx() functions took `max_size`
and `target_size` as first two arguments. Those values were never used in the
BIND 9 code. The refactoring removes those arguments and let BIND 9 always use
the default values.
Previously, the isc_mem_create() and isc_mem_createx() functions could have
failed because of failed memory allocation. As this was no longer true and the
functions have always returned ISC_R_SUCCESS, the have been refactored to return
void.
The native implementation's conversion from the uint8_t buffers to uint64_t now
follows the reference implementation that doesn't require aligned buffers.
This commit changes the BIND cookie algorithms to match
draft-sury-toorop-dnsop-server-cookies-00. Namely, it changes the Client Cookie
algorithm to use SipHash 2-4, adds the new Server Cookie algorithm using SipHash
2-4, and changes the default for the Server Cookie algorithm to be siphash24.
Add siphash24 cookie algorithm, and make it keep legacy aes as
The ThreadSanitizer found several possible data races in our rwlock
implementation. This commit changes all the unprotected variables to atomic and
also changes the explicit memory ordering (atomic_<foo>_explicit(..., <order>)
functions to use our convenience macros (atomic_<foo>_<order>).
The json-c have previously leaked into the global namespace leading
to forced -I<include_path> for every compilation unit using isc/xml.h
header. This MR fixes the usage making the caller object opaque.
The libxml2 have previously leaked into the global namespace leading
to forced -I<include_path> for every compilation unit using isc/xml.h
header. This MR fixes the usage making the caller object opaque.
Move the macOS section of <isc/endian.h> to a lower spot as it is
believed not to be the most popular platform for running BIND. Add a
comment and remove redundant definitions.
Instead of only supporting Linux, try making <isc/endian.h> support
other GNU platforms as well. Since some compilers define __GNUC__ on
BSDs (e.g. Clang on FreeBSD), move the relevant section to the bottom of
the platform-specific part of <isc/endian.h>, so that it only gets
evaluated when more specific platform determination criteria are not
met. Also include <byteswap.h> so that any byte-swapping macros which
may be defined in that file on older platforms are used in the fallback
definitions of the nonstandard hto[bl]e{16,32,64}() and
[bl]e{16,32,64}toh() conversion functions.
While Solaris does not support the nonstandard hto[bl]e{16,32,64}() and
[bl]e{16,32,64}toh() conversion functions, it does have some
byte-swapping macros available in <sys/byteorder.h>. Ensure these
macros are used in the fallback definitions of the aforementioned
nonstandard functions.
Since the hto[bl]e{16,32,64}() and [bl]e{16,32,64}toh() conversion
functions are nonstandard, add fallback definitions of these functions
to <isc/endian.h>, so that their unavailability does not prevent
compilation from succeeding.
Current versions of DragonFly BSD, FreeBSD, NetBSD, and OpenBSD all
support the modern variants of functions converting values between host
and big-endian/little-endian byte order while older ones might not.
Ensure <isc/endian.h> works properly in both cases.
This work cleans up the API which includes couple of things:
1. Make the isc_appctx_t type fully opaque
2. Protect all access to the isc_app_t members via stdatomics
3. sigwait() is part of POSIX.1, remove dead non-sigwait code
4. Remove unused code: isc_appctx_set{taskmgr,sockmgr,timermgr}
The header file <isc/atomic.h> now contains convenience macros for
most useful explicit memory ordering for C11 stdatomics, only relaxed
and acquire-release semantics is being used. These macros SHOULD be
used instead of atomic_<func>_explicit functions.
- if the TCP quota has been exceeded but there are no clients listening
for new connections on the interface, we can now force attachment to the
quota using isc_quota_force(), instead of carrying on with the quota not
attached.
- the TCP client quota is now referenced via a reference-counted
'ns_tcpconn' object, one of which is created whenever a client begins
listening for new connections, and attached to by members of that
client's pipeline group. when the last reference to the tcpconn
object is detached, it is freed and the TCP quota slot is released.
- reduce code duplication by adding mark_tcp_active() function
- convert counters to stdatomic
(cherry picked from commit a8dd133d270873b736c1be9bf50ebaa074f5b38f)
(cherry picked from commit 4a8fc979c4)
If we know that we'll have a task pool doing specific thing it's better
to use this knowledge and bind tasks to task queues, this behaves better
than randomly choosing the task queue.
- use bound resolver tasks - we have a pool of tasks doing resolutions,
we can spread the load evenly using isc_task_create_bound
- quantum set universally to 25
While implementing the new unit testing framework cmocka, it was found that the
BIND 9 code doesn't compile when assertions are disabled or replaced with any
function (such as mock_assert() from cmocka unit testing framework) that's not
directly recognized as assertion by the compiler.
This made the compiler to complain about blocks of code that was recognized as
unreachable before, but now it isn't.
The changes in this commit include:
* assigns default values to couple of local variables,
* moves some return statements around INSIST assertions,
* adds __builtin_unreachable(); annotations after some INSIST assertions,
* fixes one broken assertion (= instead of ==)
Remove the following functions in order to simplify socket code:
- isc_socket_recvv()
- isc_socket_sendtov()
- isc_socket_sendtov2()
- isc_socket_sendv()
- this enables memory to be allocated and freed in dyndb modules
when named is linked statically. when we standardize on libtool,
this should become unnecessary.
- also, simplified the isc_mem_create/createx API by removing
extra compatibility functions
This commit reverts the previous change to use system provided
entropy, as (SYS_)getrandom is very slow on Linux because it is
a syscall.
The change introduced in this commit adds a new call isc_nonce_buf
that uses CSPRNG from cryptographic library provider to generate
secure data that can be and must be used for generating nonces.
Example usage would be DNS cookies.
The isc_random() API has been changed to use fast PRNG that is not
cryptographically secure, but runs entirely in user space. Two
contestants have been considered xoroshiro family of the functions
by Villa&Blackman and PCG by O'Neill. After a consideration the
xoshiro128starstar function has been used as uint32_t random number
provider because it is very fast and has good enough properties
for our usage pattern.
The other change introduced in the commit is the more extensive usage
of isc_random_uniform in places where the usage pattern was
isc_random() % n to prevent modulo bias. For usage patterns where
only 16 or 8 bits are needed (DNS Message ID), the isc_random()
functions has been renamed to isc_random32(), and isc_random16() and
isc_random8() functions have been introduced by &-ing the
isc_random32() output with 0xffff and 0xff. Please note that the
functions that uses stripped down bit count doesn't pass our
NIST SP 800-22 based random test.
- mark the 'geoip-use-ecs' option obsolete; warn when it is used
in named.conf
- prohibit 'ecs' ACL tags in named.conf; note that this is a fatal error
since simply ignoring the tags could make ACLs behave unpredictably
- re-simplify the radix and iptable code
- clean up dns_acl_match(), dns_aclelement_match(), dns_acl_allowed()
and dns_geoip_match() so they no longer take ecs options
- remove the ECS-specific unit and system test cases
- remove references to ECS from the ARM
- Replace external -DOPENSSL/-DPKCS11CRYPTO with properly AC_DEFINEd
HAVE_OPENSSL/HAVE_PKCS11
- Don't enforce the crypto provider from platform.h, just from dst_api.c
and configure scripts
The three functions has been modeled after the arc4random family of
functions, and they will always return random bytes.
The isc_random family of functions internally use these CSPRNG (if available):
1. getrandom() libc call (might be available on Linux and Solaris)
2. SYS_getrandom syscall (might be available on Linux, detected at runtime)
3. arc4random(), arc4random_buf() and arc4random_uniform() (available on BSDs and Mac OS X)
4. crypto library function:
4a. RAND_bytes in case OpenSSL
4b. pkcs_C_GenerateRandom() in case PKCS#11 library
Certain isc_buffer_*() functions might call memmove() with the second
argument (source) set to NULL and the third argument (length) set to 0.
While harmless, it triggers an ubsan warning:
runtime error: null pointer passed as argument 2, which is declared to never be null
Modify all memmove() call sites in lib/isc/include/isc/buffer.h and
lib/isc/buffer.c which may potentially use NULL as the second argument
(source) so that memmove() is only called if the third argument (length)
is non-zero.
4807. [cleanup] isc_rng_randombytes() returns a specified number of
bytes from the PRNG; this is now used instead of
calling isc_rng_random() multiple times. [RT #46230]
4768. [func] By default, memory is no longer filled with tag values
when it is allocated or freed; this improves
performance but makes debugging of certain memory
issues more difficult. "named -M fill" turns memory
filling back on. (Building "configure
--enable-developer", turns memory fill on by
default again; it can then be disabled with
"named -M nofill".) [RT #45123]
4762. [func] "update-policy local" is now restricted to updates
from local addresses. (Previously, other addresses
were allowed so long as updates were signed by the
local session key.) [RT #45492]
4724. [func] By default, BIND now uses the random number
functions provided by the crypto library (i.e.,
OpenSSL or a PKCS#11 provider) as a source of
randomness rather than /dev/random. This is
suitable for virtual machine environments
which have limited entropy pools and lack
hardware random number generators.
This can be overridden by specifying another
entropy source via the "random-device" option
in named.conf, or via the -r command line option;
however, for functions requiring full cryptographic
strength, such as DNSSEC key generation, this
cannot be overridden. In particular, the -r
command line option no longer has any effect on
dnssec-keygen.
This can be disabled by building with
"configure --disable-crypto-rand".
[RT #31459] [RT #46047]
4708. [cleanup] Legacy Windows builds (i.e. for XP and earlier)
are no longer supported. [RT #45186]
4707. [func] The lightweight resolver daemon and library (lwresd
and liblwres) have been removed. [RT #45186]
4706. [func] Code implementing name server query processing has
been moved from bin/named to a new library "libns".
Functions remaining in bin/named are now prefixed
with "named_" rather than "ns_". This will make it
easier to write unit tests for name server code, or
link name server functionality into new tools.
[RT #45186]
4579. [func] Logging channels and dnstap output files can now
be configured with a "suffix" option, set to
either "increment" or "timestamp", indicating
whether to use incrementing numbers or timestamps
as the file suffix when rolling over a log file.
[RT #42838]
4518. [func] The "print-time" option in the logging configuration
can now take arguments "local", "iso8601" or
"iso8601-utc" to indicate the format in which the
date and time should be logged. For backward
compatibility, "yes" is a synonym for "local".
[RT #42585]
4503. [cleanup] "make uninstall" now removes file installed by
BIND. (This currently excludes Python files
due to lack of support in setup.py.) [RT #42912]
4445. [cleanup] isc_errno_toresult() can now be used to call the
formerly private function isc__errno2result().
[RT #43050]
4444. [bug] Fixed some issues related to dyndb: A bug caused
braces to be omitted when passing configuration text
from named.conf to a dyndb driver, and there was a
use-after-free in the sample dyndb driver. [RT #43050]
Patch for dyndb driver submitted by Petr Spacek at Red Hat.
4411. [func] "rndc dnstap -roll" automatically rolls the
dnstap output file; the previous version is
saved with ".0" suffix, and earlier versions
with ".1" and so on. An optional numeric argument
indicates how many prior files to save. [RT #42830]
provisioning secondary servers in which a list of
zones to be served is stored in a DNS zone and can
be propagated to slaves via AXFR/IXFR. [RT #41581]
4375. [func] Add support for automatic reallocation of isc_buffer
to isc_buffer_put* functions. [RT #42394]
4235. [func] Added support in named for "dnstap", a fast method of
capturing and logging DNS traffic, and a new command
"dnstap-read" to read a dnstap log file. Use
"configure --enable-dnstap" to enable this
feature (note that this requires libprotobuf-c
and libfstrm). See the ARM for configuration details.
Thanks to Robert Edmonds of Farsight Security.
[RT #40211]
4224. [func] Added support for "dyndb", a new interface for loading
zone data from an external database, developed by
Red Hat for the FreeIPA project.
DynDB drivers fully implement the BIND database
API, and are capable of significantly better
performance and functionality than DLZ drivers,
while taking advantage of advanced database
features not available in BIND such as multi-master
replication.
Thanks to Adam Tkac and Petr Spacek of Red Hat.
[RT #35271]