There were two problems in the notify system test when it waited for
log messages to appear: the shellcheck refactoring introduced a call
to `wait_for_log` with a regex, but `wait_for_log` only supports fixed
strings, so it always ran for the full 45 second timeout; and the new
test to ensure that notify messages time out failed to reset the
nextpart pointer, so if the notify messages timed out before the test
ran, it would fail to see them.
This change adds a `wait_for_log_re` helper that matches a regex, and
uses it where appropriate in the notify system test, which stops the
test from waiting longer than necessary; and it resets the nextpart
pointer so that the notify timeout test works reliably.
Closes#3275
For some applications, it's useful to not listen on full battery of
threads. Add workers argument to all isc_nm_listen*() functions and
convenience ISC_NM_LISTEN_ONE and ISC_NM_LISTEN_ALL macros.
Prime the cache with a negative cache DS entry then make a query for
name beneath that entry. This will cause the DS entry to be retieved
as part of the validation process. Each RRset in the ncache entry
will be validated and the trust level for each will be updated.
dig previously set an exit code of 9 when a TCP connection failed
or when a UDP connection timed out, but when the server address is
localhost it's possible for a UDP query to fail with ISC_R_CONNREFUSED.
that code path didn't update the exit code, causing dig to exit with
status 0. we now set the exit code to 9 in this failure case.
Catalog zones change of ownership is special mechanism to facilitate
controlled migration of a member zone from one catalog to another.
It is implemented using catalog zones property named "coo" and is
documented in DNS catalog zones draft version 5 document.
Implement the feature using a new hash table in the catalog zone
structure, which holds the added "coo" properties for the catalog zone
(containing the target catalog zone's name), and the key for the hash
table being the member zone's name for which the "coo" property is being
created.
Change some log messages to have consistent zone name quoting types.
Update the ARM with change of ownership documentation and usage
examples.
Add tests which check newly the added features.
According to DNS catalog zones draft version 5 document, catalog
zone custom properties must be placed under the "ext" label.
Make necessary changes to support the new custom properties syntax in
catalog zones with version "2" of the schema.
Change the default catalog zones schema version from "1" to "2" in
ARM to prepare for the new features and changes which come starting
from this commit in order to support the latest DNS catalog zones draft
document.
Make some restructuring in ARM and rename the term catalog zone "option"
to "custom property" to better reflect the terms used in the draft.
Change the version of 'catalog1.zone.' catalog zone in the "catz" system
test to "2", and leave the version of 'catalog2.zone.' catalog zone at
version "1" to test both versions.
Add tests to check that the new syntax works only with the new schema
version, and that the old syntax works only with the legacy schema
version catalog zones.
In `+nssearch` mode `dig` starts the next query of the followup lookup
using `start_udp()` or `start_tcp()` calls without waiting for the
previous query to complete.
In UDP mode that happens in the `send_done()` callback of the previous
query, but in TCP mode that happens in the `start_tcp()` call of the
previous query (recursion) which doesn't work because `start_tcp()`
attaches the `lookup->current_query` to the query it is starting, so a
recursive call will result in an assertion failure.
Make the TCP mode to start the next query in `send_done()`, just like in
the UDP mode. During that time the `lookup->current_query` is already
detached by the `tcp_connected()`/`udp_ready()` callbacks.
Add a test case for a dynamically added CDS DELETE record and make
sure it is not removed when signing the zone. This happens because
BIND maintains CDS and CDNSKEY publishing and it will only allow
CDS DELETE records if the zone is transitioning to insecure. This is
a state that can be identified when using KASP through 'dnssec-policy',
but not when using 'auto-dnssec'.
Commit bf3fffff67 added a Python-based
name server (bin/tests/system/forward/ans11/ans.py) to the "forward"
system test, but did not update bin/tests/system/Makefile.am to ensure
Python is present in the test environment before the "forward" system
test is run. Update bin/tests/system/Makefile.am to enforce that
requirement.
Implement TCP support in the `ans11` Python-based DNS server.
Implement a control command channel in `ans11` to support an optional
silent mode of operation, which, when enabled, will ignore incoming
queries.
In the added check, make the `ans11` the NS server of
"a.root-servers.nil." for `ns3`, so it uses `ans11` (in silent mode)
for the regular (non-forwarded) name resolutions.
This will trigger the "hung fetch" scenario, which was causing `named`
to crash.
- Check that an NS in an authority section returned from a forwarder
which is above the name in a configured "forward first" or "forward
only" zone (i.e., net/NS in a response from a forwarder configured for
local.net) is not cached.
- Test that a DNAME for a parent domain will not be cached when sent
in a response from a forwarder configured to answer for a child.
- Check that glue is rejected if its name falls below that of zone
configured locally.
- Check that an extra out-of-bailiwick data in the answer section is
not cached (this was already working correctly, but was not explicitly
tested before).
Add a test case to check for lingering TCP sockets stuck in the
CLOSE_WAIT state. This can happen if a client sends some garbage after
its first query.
The system test runs the reproducer script and then sends another TCP
query to the resolver. The resolver is configured to allow one TCP
client only. If BIND has its TCP socket stuck in CLOSE_WAIT, it does
not have the resources available to answer the second query.
Note: A better test would be to check if the named daemon does not
have a TCP socket stuck in CLOSE_WAIT for example with netstat. When
running this test locally you can examine named with netstat manually.
But since netstat is platform specific it is not a good candidate to do
this as a system test.
If you, if you could return, don't let it burn.
Do you have to let it linger?
- Cranberries
This allows Gitlab to show nice summary for individual tests/test
directories and to expose the results in Gitlab API for consumption
elsewhere.
A catch: As of Gitlab 14.7.7, the detailed results are stored
only in artifacts and thus expire. All consumers (including API) need
to be "fast enough" to get the data before they disappear.
This also forces us to always store the artifacts intead of storing them
only on failure.
There are a couple of problems with dns_request_createvia(): a UDP
retry count of zero means unlimited retries (it should mean no
retries), and the overall request timeout is not enforced. The
combination of these bugs means that requests can be retried forever.
This change alters calls to dns_request_createvia() to avoid the
infinite retry bug by providing an explicit retry count. Previously,
the calls specified infinite retries and relied on the limit implied
by the overall request timeout and the UDP timeout (which did not work
because the overall timeout is not enforced). The `udpretries`
argument is also changed to be the number of retries; previously, zero
was interpreted as infinity because of an underflow to UINT_MAX, which
appeared to be a mistake. And `mdig` is updated to match the change in
retry accounting.
The bug could be triggered by zone maintenance queries, including
NOTIFY messages, DS parental checks, refresh SOA queries and stub zone
nameserver lookups. It could also occur with `nsupdate -r 0`.
(But `mdig` had its own code to avoid the bug.)
After some back and forth, it was decidede to match the configuration
option with unbound ("so-reuseport"), PowerDNS ("reuseport") and/or
nginx ("reuseport").
The error code path handling the `ISC_R_CANCELED` code lacks a
`clear_current_lookup()` call, without which dig hangs indefinitely
when handling the error.
Add the missing call to account for all references of the lookup so
it can be destroyed.
In `send_udp()` and `launch_next_query()` functions, when calling
`dighost_printmessage()` to print detailed information about the
sent query, dig always prints the data of the first query in the
lookup's queries list.
The first query in the list can be already finished, having its handles
freed, and accessing this information results in assertion failure.
Print the current query's information instead.
Previously, HAVE_SO_REUSEPORT_LB has been defined only in the private
netmgr-int.h header file, making the configuration of load balanced
sockets inoperable.
Move the missing HAVE_SO_REUSEPORT_LB define the isc/netmgr.h and add
missing isc_nm_getloadbalancesockets() implementation.
Previously, the option to enable kernel load balancing of the sockets
was always enabled when supported by the operating system (SO_REUSEPORT
on Linux and SO_REUSEPORT_LB on FreeBSD).
It was reported that in scenarios where the networking threads are also
responsible for processing long-running tasks (like RPZ processing, CATZ
processing or large zone transfers), this could lead to intermitten
brownouts for some clients, because the thread assigned by the operating
system might be busy. In such scenarious, the overall performance would
be better served by threads competing over the sockets because the idle
threads can pick up the incoming traffic.
Add new configuration option (`load-balance-sockets`) to allow enabling
or disabling the load balancing of the sockets.
In recv_done(), when dig decides to start the lookup's next query in
the line using `start_udp()` or `start_tcp()`, and for some reason,
no queries get started, dig doesn't cancel the lookup.
This can occur, for example, when there are two queries in the lookup,
one with a regular IP address, and another with a IPv4 mapped IPv6
address. When the regular IP address fails to serve the query, its
`recv_done()` callback starts the next query in the line (in this
case the one with a mapped IP address), but because `dig` doesn't
connect to such IP addresses, and there are no other queries in the
list, no new queries are being started, and the lookup keeps hanging.
After calling `start_udp()` or `start_tcp()` in `recv_done()`, check
if there are no pending/working queries then cancel the lookup instead
of only detaching from the current query.
The reference counting and isc_timer_attach()/isc_timer_detach()
semantic are actually misleading because it cannot be used under normal
conditions. The usual conditions under which is timer used uses the
object where timer is used as argument to the "timer" itself. This
means that when the caller is using `isc_timer_detach()` it needs the
timer to stop and the isc_timer_detach() does that only if this would be
the last reference. Unfortunately, this also means that if the timer is
attached elsewhere and the timer is fired it will most likely be
use-after-free, because the object used in the timer no longer exists.
Remove the reference counting from the isc_timer unit, remove
isc_timer_attach() function and rename isc_timer_detach() to
isc_timer_destroy() to better reflect how the API needs to be used.
The only caveat is that the already executed event must be destroyed
before the isc_timer_destroy() is called because the timer is no longet
attached to .ev_destroy_arg.
Previously, the task privileged mode has been used only when the named
was starting up and loading the zones from the disk as the "first" thing
to do. The privileged task was setup with quantum == 2, which made the
taskmgr/netmgr spin around the privileged queue processing two events at
the time.
The same effect can be achieved by setting the quantum to UINT_MAX (e.g.
practically unlimited) for the loadzone task, hence the privileged task
mode was removed in favor of just processing all the events on the
loadzone task in a single task_run().
Instead of passing the number of worker to the dns_zonemgr manually,
get the number of nm threads using the new isc_nm_getnworkers() call.
Additionally, remove the isc_pool API and manage the array of memory
context, zonetasks and loadtasks directly in the zonemgr.
After switching to per-thread resources in the zonemgr, the performance
was decreased because the memory context, zonetask and loadtask was
picked from the pool at random.
Pin the zone to single threadid (.tid) and align the memory context,
zonetask and loadtask to be the same, this sets the hard affinity of the
zone to the netmgr thread.
The zone counting in the named was used to properly size the zonemgr
resources (memory contexts, zonetasks and loadtasks). Since this is no
longer the case, remove the whole zone counting from named.
Previously, the zonemgr created 1 task per 100 zones and 1 memory
context per 1000 zones (with minimum 10 tasks and 2 memory contexts) to
reduce the contention between threads.
Instead of reducing the contention by having many resources, create a
per-nm_thread memory context, loadtask and zonetask and spread the zones
between just per-thread resources.
Note: this commit alone does decrease performance when loading the zone
by couple seconds (in case of 1M zone) and thus there's more work in
this whole MR fixing the performance.
a test case in the 'resolver' system test was reliant on
logged output that would only be present when query tracing
was enabled, as in developer builds. that test case is now
disabled when query tracing is not available. Thanks to
Anton Castelli.
The `udp_ready()` and `tcp_connected()` functions in dighost.c are
used for similar purposes for UDP and TCP respectively.
Synchronize the `udp_ready()` function entry code to behave like
`tcp_connected()` by adding input validation, debug messages and
early exit code when `cancel_now` is `true`.
When finishing the NSSEARCH task and there is no more followup
lookups to start, dig does not destroy the last lookup, which
causes it to hang indefinitely.
Rename the unused `first_pass` member of `dig_query_t` to `started`
and make it `true` in the first callback after `start_udp()` or
`start_tcp()` of the query to indicate that the query has been
started.
Create a new `check_if_queries_done()` function to check whether
all of the queries inside a lookup have been started and finished,
or canceled.
Use the mentioned function in the TRACE code block in `recv_done()`
to check whether the current query is the last one in the lookup and
cancel the lookup in that case to free the resources.
the line "$GENERATE 19-28/2147483645 $ CNAME x" should generate
a single CNAME with the owner "19.example.com", but prior to the
overflow bug it generated several CNAMEs, half of them with large
negative values.
we now test for the bugfix by using "named-checkzone -D" and
grepping for a single CNAME in the output.
Ensure the update zone name is mentioned in the NOTAUTH error message
in the server log, so that it is easier to track down problematic
update clients. There are two cases: either the update zone is
unrelated to any of the server's zones (previously no zone was
mentioned); or the update zone is a subdomain of one or more of the
server's zones (previously the name of the irrelevant parent zone was
misleadingly logged).
Closes#3209
The use of isc_appctx_t in dns_client was used to wait for
dns_client_startresolve() to finish the processing (the resolve_done()
task callback).
This has been replaced with standard bool+cond+lock combination removing
the need of isc_appctx_t altogether.
Previously, the isc_ht API would always take the key as a literal input
to the hashing function. Change the isc_ht_init() function to take an
'options' argument, in which ISC_HT_CASE_SENSITIVE or _INSENSITIVE can
be specified, to determine whether to use case-sensitive hashing in
isc_hash32() when hashing the key.
In couple places, we have missed INSIST(0) or ISC_UNREACHABLE()
replacement on some branches with UNREACHABLE(). Replace all
ISC_UNREACHABLE() or INSIST(0) calls with UNREACHABLE().
This commit extends the 'doth' system test with a set of Strict/Mutual
TLS related checks.
This commit also makes each doth NS instance use its own TLS
certificate that includes FQDN, IPv4, and IPv6 addresses, issued using
a common Certificate Authority, instead of ad-hoc certs.
Extend servers initialisation timeout to 60 seconds to improve the
tests stability in the CI as certain configurations could fail to
initialise on time under load.
This commit adds support for Strict/Mutual TLS into BIND. It does so
by implementing the backing code for 'hostname' and 'ca-file' options
of the 'tls' statement. The commit also updates the documentation
accordingly.
This commit adds support for Strict/Mutual TLS to dig.
The new command-line options and their behaviour are modelled after
kdig (+tls-ca, +tls-hostname, +tls-certfile, +tls-keyfile) for
compatibility reasons. That is, using +tls-* is sufficient to enable
DoT in dig, implying +tls-ca
If there is no other DNS transport specified via command-line,
specifying any of +tls-* options makes dig use DoT. In this case, its
behaviour is the same as if +tls-ca is specified: that is, the remote
peer's certificate is verified using the platform-specific
intermediate CA certificates store. This behaviour is introduced for
compatibility with kdig.
Because of the "goto" in the "if" body the "else" part is unnecessary
and adds another level of indentation.
Cleanup the code to not have the "else" part.
Catz logs a warning message when it is told to modify a zone which was
not added by the current catalog zone.
When logging a warning, distinguish the two cases when the zone
was not added by a catalog zone at all, and when the zone was
added by a different catalog zone.