This commit makes number of concurrent HTTP/2 streams per connection
configurable as a mean to fight DDoS attacks. As soon as the limit is
reached, BIND terminates the whole session.
The commit adds a global configuration
option (http-streams-per-connection) which can be overridden in an
http <name> {...} statement like follows:
http local-http-server {
...
streams-per-connection 100;
...
};
For now the default value is 100, which should be enough (e.g. NGINX
uses 128, but it is a full-featured WEB-server). When using lower
numbers (e.g. ~70), it is possible to hit the limit with
e.g. flamethrower.
This commit adds support for http-listener-clients global options as
well as ability to override the default in an HTTP server description,
like:
http local-http-server {
...
listener-clients 100;
...
};
This way we have ability to specify per-listener active connections
quota globally and then override it when required. This is exactly
what AT&T requested us: they wanted a functionality to specify quota
globally and then override it for specific IPs. This change
functionality makes such a configuration possible.
It makes sense: for example, one could have different quotas for
internal and external clients. Or, for example, one could use BIND's
internal ability to serve encrypted DoH with some sane quota value for
internal clients, while having un-encrypted DoH listener without quota
to put BIND behind a load balancer doing TLS offloading for external
clients.
Moreover, the code no more shares the quota with TCP, which makes
little sense anyway (see tcp-clients option), because of the nature of
interaction of DoH clients: they tend to keep idle opened connections
for longer periods of time, preventing the TCP and TLS client from
being served. Thus, the need to have a separate, generally larger,
quota for them.
Also, the change makes any option within "http <name> { ... };"
statement optional, making it easier to override only required default
options.
By default, the DoH connections are limited to 300 per listener. I
hope that it is a good initial guesstimate.
The isc/platform.h header was left empty which things either already
moved to config.h or to appropriate headers. This is just the final
cleanup commit.
The Windows support has been completely removed from the source tree
and BIND 9 now no longer supports native compilation on Windows.
We might consider reviewing mingw-w64 port if contributed by external
party, but no development efforts will be put into making BIND 9 compile
and run on Windows again.
when a serve-stale answer has been sent, the client continues waiting
for a proper answer. if a final completion event for the client does
arrive, it can just be cleaned up without sending a response, similar
to a canceled fetch.
Previously, as a way of reducing the contention between threads a
clientmgr object would be created for each interface/IP address.
We tasks being more strictly bound to netmgr workers, this is no longer
needed and we can just create clientmgr object per worker queue (ncpus).
Each clientmgr object than would have a single task and single memory
context.
The number of memory contexts created in the clientmgr was enormous. It
could easily create thousands of memory contexts because the formula was:
nprotocols * ncpus * ninterfaces * CLIENT_NMCTXS_PERCPU (8)
The original goal was to reduce the contention when allocating the
memory, but after a while nobody noticed that the amount of memory
context allocated would not reduce contention at all.
This commit removes the whole mctxpool and just uses the mctx from
clientmgr as the contention will be reduced directly in the allocator.
The stale-answer-client-timeout feature introduced a dependancy on
when a client may be detached from the handle. The dboption
DNS_DBFIND_STALEONLY was reused to track this attribute. This overloads
the meaning of this database option, and actually introduced a bug
because the option was checked in other places. In particular, in
'ns_query_done()' there is a check for 'RECURSING(qctx->client) &&
(!QUERY_STALEONLY(&qctx->client->query) || ...' and the condition is
satisfied because recursion has not completed yet and
DNS_DBFIND_STALEONLY is already cleared by that time (in
query_lookup()), because we found a useful answer and we should detach
the client from the handle after sending the response.
Add a new boolean to the client structure to keep track of client
detach from handle is allowed or not. It is only disallowed if we are
in a staleonly lookup and we didn't found a useful answer.
This commit completes the support for DNS-over-HTTP(S) built on top of
nghttp2 and plugs it into the BIND. Support for both GET and POST
requests is present, as required by RFC8484.
Both encrypted (via TLS) and unencrypted HTTP/2 connections are
supported. The latter are mostly there for debugging/troubleshooting
purposes and for the means of encryption offloading to third-party
software (as might be desirable in some environments to simplify TLS
certificates management).
This commit allows stale RRset to be used (if available) for responding
a query, before an attempt to refresh an expired, or otherwise resolve
an unavailable RRset in cache is made.
For that to work, a value of zero must be specified for
stale-answer-client-timeout statement.
To better understand the logic implemented, there are three flags being
used during database lookup and other parts of code that must be
understood:
. DNS_DBFIND_STALEOK: This flag is set when BIND fails to refresh a
RRset due to timeout (resolver-query-timeout), its intent is to
try to look for stale data in cache as a fallback, but only if
stale answers are enabled in configuration.
This flag is also used to activate stale-refresh-time window, since it
is the only way the database knows that a resolution has failed.
. DNS_DBFIND_STALEENABLED: This flag is used as a hint to the database
that it may use stale data. It is always set during query lookup if
stale answers are enabled, but only effectively used during
stale-refresh-time window. Also during this window, the resolver will
not try to resolve the query, in other words no attempt to refresh the
data in cache is made when the stale-refresh-time window is active.
. DNS_DBFIND_STALEONLY: This new introduced flag is used when we want
stale data from the database, but not due to a failure in resolution,
it also doesn't require stale-refresh-time window timer to be active.
As long as there is a stale RRset available, it should be returned.
It is mainly used in two situations:
1. When stale-answer-client-timeout timer is triggered: in that case
we want to know if there is stale data available to answer the
client.
2. When stale-answer-client-timeout value is set to zero: in that
case, we also want to know if there is some stale RRset available
to promptly answer the client.
We must also discern between three situations that may happen when
resolving a query after the addition of stale-answer-client-timeout
statement, and how to handle them:
1. Are we running query_lookup() due to stale-answer-client-timeout
timer being triggered?
In this case, we look for stale data, making use of
DNS_DBFIND_STALEONLY flag. If a stale RRset is available then
respond the client with the data found, mark this query as
answered (query attribute NS_QUERYATTR_ANSWERED), so when the
fetch completes the client won't be answered twice.
We must also take care of not detaching from the client, as a
fetch will still be running in background, this is handled by the
following snippet:
if (!QUERY_STALEONLY(&client->query)) {
isc_nmhandle_detach(&client->reqhandle);
}
Which basically tests if DNS_DBFIND_STALEONLY flag is set, which
means we are here due to a stale-answer-client-timeout timer
expiration.
2. Are we running query_lookup() due to resolver-query-timeout being
triggered?
In this case, DNS_DBFIND_STALEOK flag will be set and an attempt
to look for stale data will be made.
As already explained, this flag is algo used to activate
stale-refresh-time window, as it means that we failed to refresh
a RRset due to timeout.
It is ok in this situation to detach from the client, as the
fetch is already completed.
3. Are we running query_lookup() during the first time, looking for
a RRset in cache and stale-answer-client-timeout value is set to
zero?
In this case, if stale answers are enabled (probably), we must do
an initial database lookup with DNS_DBFIND_STALEONLY flag set, to
indicate to the database that we want stale data.
If we find an active RRset, proceed as normal, answer the client
and the query is done.
If we find a stale RRset we respond to the client and mark the
query as answered, but don't detach from the client yet as an
attempt in refreshing the RRset will still be made by means of
the new introduced function 'query_resolve'.
If no active or stale RRset is available, begin resolution as
usual.
The general logic behind the addition of this new feature works as
folows:
When a client query arrives, the basic path (query.c / ns_query_recurse)
was to create a fetch, waiting for completion in fetch_callback.
With the introduction of stale-answer-client-timeout, a new event of
type DNS_EVENT_TRYSTALE may invoke fetch_callback, whenever stale
answers are enabled and the fetch took longer than
stale-answer-client-timeout to complete.
When an event of type DNS_EVENT_TRYSTALE triggers fetch_callback, we
must ensure that the folowing happens:
1. Setup a new query context with the sole purpose of looking up for
stale RRset only data, for that matters a new flag was added
'DNS_DBFIND_STALEONLY' used in database lookups.
. If a stale RRset is found, mark the original client query as
answered (with a new query attribute named NS_QUERYATTR_ANSWERED),
so when the fetch completion event is received later, we avoid
answering the client twice.
. If a stale RRset is not found, cleanup and wait for the normal
fetch completion event.
2. In ns_query_done, we must change this part:
/*
* If we're recursing then just return; the query will
* resume when recursion ends.
*/
if (RECURSING(qctx->client)) {
return (qctx->result);
}
To this:
if (RECURSING(qctx->client) && !QUERY_STALEONLY(qctx->client)) {
return (qctx->result);
}
Otherwise we would not proceed to answer the client if it happened
that a stale answer was found when looking up for stale only data.
When an event of type DNS_EVENT_FETCHDONE triggers fetch_callback, we
proceed as before, resuming query, updating stats, etc, but a few
exceptions had to be added, most important of which are two:
1. Before answering the client (ns_client_send), check if the query
wasn't already answered before.
2. Before detaching a client, e.g.
isc_nmhandle_detach(&client->reqhandle), ensure that this is the
fetch completion event, and not the one triggered due to
stale-answer-client-timeout, so a correct call would be:
if (!QUERY_STALEONLY(client)) {
isc_nmhandle_detach(&client->reqhandle);
}
Other than these notes, comments were added in code in attempt to make
these updates easier to follow.
* Following the example set in 634bdfb16d, the tlsdns netmgr
module now uses libuv and SSL primitives directly, rather than
opening a TLS socket which opens a TCP socket, as the previous
model was difficult to debug. Closes#2335.
* Remove the netmgr tls layer (we will have to re-add it for DoH)
* Add isc_tls API to wrap the OpenSSL SSL_CTX object into libisc
library; move the OpenSSL initialization/deinitialization from dstapi
needed for OpenSSL 1.0.x to the isc_tls_{initialize,destroy}()
* Add couple of new shims needed for OpenSSL 1.0.x
* When LibreSSL is used, require at least version 2.7.0 that
has the best OpenSSL 1.1.x compatibility and auto init/deinit
* Enforce OpenSSL 1.1.x usage on Windows
* Added a TLSDNS unit test and implemented a simple TLSDNS echo
server and client.
previously query plugins were strictly synchrounous - the query
process would be interrupted at some point, data would be looked
up or a change would be made, and then the query processing would
resume immediately.
this commit enables query plugins to initiate asynchronous processes
and resume on a completion event, as with recursion.
Parse the configuration of tls objects into SSL_CTX* objects. Listen on
DoT if 'tls' option is setup in listen-on directive. Use DoT/DoH ports
for DoT/DoH.
As the query_prefetch() or query_rpzfetch() could be called during
"regular" fetch, we need to introduce separate storage for attaching
the nmhandle during prefetching the records. The query_prefetch()
and query_rpzfetch() are guarded for re-entrance by .query.prefetch
member of ns_client_t, so we can reuse the same .prefetchhandle for
both.
Attaching and detaching handle pointers will make it easier to
determine where and why reference counting errors have occurred.
A handle needs to be referenced more than once when multiple
asynchronous operations are in flight, so callers must now maintain
multiple handle pointers for each pending operation. For example,
ns_client objects now contain:
- reqhandle: held while waiting for a request callback (query,
notify, update)
- sendhandle: held while waiting for a send callback
- fetchhandle: held while waiting for a recursive fetch to
complete
- updatehandle: held while waiting for an update-forwarding
task to complete
control channel connection objects now contain:
- readhandle: held while waiting for a read callback
- sendhandle: held while waiting for a send callback
- cmdhandle: held while an rndc command is running
httpd connections contain:
- readhandle: held while waiting for a read callback
- sendhandle: held while waiting for a send callback
The basic scenario for the problem was that in the process of
resolving a query, if any rrset was eligible for prefetching, then it
would trigger a call to query_prefetch(), this call would run in
parallel to the normal query processing.
The problem arises due to the fact that both query_prefetch(), and,
in the original thread, a call to ns_query_recurse(), try to attach
to the recursionquota, but recursing client stats counter is only
incremented if ns_query_recurse() attachs to it first.
Conversely, if fetch_callback() is called before prefetch_done(),
it would not only detach from recursionquota, but also decrement
the stats counter, if query_prefetch() attached to te quota first
that would result in a decrement not matched by an increment, as
expected.
To solve this issue an atomic bool was added, it is set once in
ns_query_recurse(), allowing fetch_callback() to check for it
and decrement stats accordingly.
For a more compreensive explanation check the thread comment below:
https://gitlab.isc.org/isc-projects/bind9/-/issues/1719#note_145857
the blackhole ACL was accidentally disabled with respect to client
queries during the netmgr conversion.
in order to make this work for TCP, it was necessary to add a return
code to the accept callback functions passed to isc_nm_listentcp() and
isc_nm_listentcpdns().
NS_CLIENT_TCP_BUFFER_SIZE was 2 byte too large following the
move to netmgr add associated changes to lib/ns/client.c and
as a result an INSIST could be trigger if the DNS message being
constructed had a checkpoint stage that fell in those two extra
bytes. Adjusted NS_CLIENT_TCP_BUFFER_SIZE and cleaned up
client_allocsendbuf now that the previously reserved 2 bytes
are no longer used.
The rewrite of BIND 9 build system is a large work and cannot be reasonable
split into separate merge requests. Addition of the automake has a positive
effect on the readability and maintainability of the build system as it is more
declarative, it allows conditional and we are able to drop all of the custom
make code that BIND 9 developed over the years to overcome the deficiencies of
autoconf + custom Makefile.in files.
This squashed commit contains following changes:
- conversion (or rather fresh rewrite) of all Makefile.in files to Makefile.am
by using automake
- the libtool is now properly integrated with automake (the way we used it
was rather hackish as the only official way how to use libtool is via
automake
- the dynamic module loading was rewritten from a custom patchwork to libtool's
libltdl (which includes the patchwork to support module loading on different
systems internally)
- conversion of the unit test executor from kyua to automake parallel driver
- conversion of the system test executor from custom make/shell to automake
parallel driver
- The GSSAPI has been refactored, the custom SPNEGO on the basis that
all major KRB5/GSSAPI (mit-krb5, heimdal and Windows) implementations
support SPNEGO mechanism.
- The various defunct tests from bin/tests have been removed:
bin/tests/optional and bin/tests/pkcs11
- The text files generated from the MD files have been removed, the
MarkDown has been designed to be readable by both humans and computers
- The xsl header is now generated by a simple sed command instead of
perl helper
- The <irs/platform.h> header has been removed
- cleanups of configure.ac script to make it more simpler, addition of multiple
macros (there's still work to be done though)
- the tarball can now be prepared with `make dist`
- the system tests are partially able to run in oot build
Here's a list of unfinished work that needs to be completed in subsequent merge
requests:
- `make distcheck` doesn't yet work (because of system tests oot run is not yet
finished)
- documentation is not yet built, there's a different merge request with docbook
to sphinx-build rst conversion that needs to be rebased and adapted on top of
the automake
- msvc build is non functional yet and we need to decide whether we will just
cross-compile bind9 using mingw-w64 or fix the msvc build
- contributed dlz modules are not included neither in the autoconf nor automake
After the network manager rewrite, tcp-higwater stats was only being
updated when a valid DNS query was received over tcp.
It turns out tcp-quota is updated right after a tcp connection is
accepted, before any data is read, so in the event that some client
connect but don't send a valid query, it wouldn't be taken into
account to update tcp-highwater stats, that is wrong.
This commit fix tcp-highwater to update its stats whenever a tcp connection
is established, independent of what happens after (timeout/invalid
request, etc).
- restore support for tcp-initial-timeout, tcp-idle-timeout,
tcp-keepalive-timeout and tcp-advertised-timeout configuration
options, which were ineffective previously.
- ns__client_request() is now called by netmgr with an isc_nmhandle_t
parameter. The handle can then be permanently associated with an
ns_client object.
- The task manager is paused so that isc_task events that may be
triggred during client processing will not fire until after the netmgr is
finished with it. Before any asynchronous event, the client MUST
call isc_nmhandle_ref(client->handle), to prevent the client from
being reset and reused while waiting for an event to process. When
the asynchronous event is complete, isc_nmhandle_unref(client->handle)
must be called to ensure the handle can be reused later.
- reference counting of client objects is now handled in the nmhandle
object. when the handle references drop to zero, the client's "reset"
callback is used to free temporary resources and reiniialize it,
whereupon the handle (and associated client) is placed in the
"inactive handles" queue. when the sysstem is shutdown and the
handles are cleaned up, the client's "put" callback is called to free
all remaining resources.
- because client allocation is no longer handled in the same way,
the '-T clienttest' option has now been removed and is no longer
used by any system tests.
- the unit tests require wrapping the isc_nmhandle_unref() function;
when LD_WRAP is supported, that is used. otherwise we link a
libwrap.so interposer library and use that.
The double-locked queue implementation is still currently in use
in ns_client, but will be replaced by a fetch-and-add array queue.
This commit moves it from queue.h to list.h so that queue.h can be
used for the new data structure, and clean up dependencies between
list.h and types.h. Later, when the ISC_QUEUE is no longer is use,
it will be removed completely.
This variable will report the maximum number of simultaneous tcp clients
that BIND has served while running.
It can be verified by running rndc status, then inspect "tcp high-water:
count", or by generating statistics file, rndc stats, then inspect the
line with "TCP connection high-water" text.
The tcp-highwater variable is atomically updated based on an existing
tcp-quota system handled in ns/client.c.
This commit changes the BIND cookie algorithms to match
draft-sury-toorop-dnsop-server-cookies-00. Namely, it changes the Client Cookie
algorithm to use SipHash 2-4, adds the new Server Cookie algorithm using SipHash
2-4, and changes the default for the Server Cookie algorithm to be siphash24.
Add siphash24 cookie algorithm, and make it keep legacy aes as
- if the TCP quota has been exceeded but there are no clients listening
for new connections on the interface, we can now force attachment to the
quota using isc_quota_force(), instead of carrying on with the quota not
attached.
- the TCP client quota is now referenced via a reference-counted
'ns_tcpconn' object, one of which is created whenever a client begins
listening for new connections, and attached to by members of that
client's pipeline group. when the last reference to the tcpconn
object is detached, it is freed and the TCP quota slot is released.
- reduce code duplication by adding mark_tcp_active() function
- convert counters to stdatomic
(cherry picked from commit a8dd133d270873b736c1be9bf50ebaa074f5b38f)
(cherry picked from commit 4a8fc979c4)