Commit graph

10696 commits

Author SHA1 Message Date
Tony Finch
f4c2909353 Use wait_for_log_re in the autosign system test
Fix another occurrence of the mistake of passing a regex to
wait_for_log by using the new wait_for_log_re instead.
2022-04-19 17:00:31 +01:00
Tony Finch
4a30733ae5 Avoid timeouts in the notify system test
There were two problems in the notify system test when it waited for
log messages to appear: the shellcheck refactoring introduced a call
to `wait_for_log` with a regex, but `wait_for_log` only supports fixed
strings, so it always ran for the full 45 second timeout; and the new
test to ensure that notify messages time out failed to reset the
nextpart pointer, so if the notify messages timed out before the test
ran, it would fail to see them.

This change adds a `wait_for_log_re` helper that matches a regex, and
uses it where appropriate in the notify system test, which stops the
test from waiting longer than necessary; and it resets the nextpart
pointer so that the notify timeout test works reliably.

Closes #3275
2022-04-19 17:00:31 +01:00
Ondřej Surý
f55a4d3e55 Allow listening on less than nworkers threads
For some applications, it's useful to not listen on full battery of
threads.  Add workers argument to all isc_nm_listen*() functions and
convenience ISC_NM_LISTEN_ONE and ISC_NM_LISTEN_ALL macros.
2022-04-19 11:08:13 +02:00
Mark Andrews
d2d9910da2 Check that pending negative cache entries for DS can be used successfully
Prime the cache with a negative cache DS entry then make a query for
name beneath that entry. This will cause the DS entry to be retieved
as part of the validation process.  Each RRset in the ncache entry
will be validated and the trust level for each will be updated.
2022-04-19 08:38:26 +10:00
Evan Hunt
4eee6460ff ensure dig sets exitcode after local UDP connection failure
dig previously set an exit code of 9 when a TCP connection failed
or when a UDP connection timed out, but when the server address is
localhost it's possible for a UDP query to fail with ISC_R_CONNREFUSED.
that code path didn't update the exit code, causing dig to exit with
status 0. we now set the exit code to 9 in this failure case.
2022-04-15 10:32:31 -07:00
Aram Sargsyan
bb837db4ee Implement catalog zones change of ownership (coo) support
Catalog zones change of ownership is special mechanism to facilitate
controlled migration of a member zone from one catalog to another.

It is implemented using catalog zones property named "coo" and is
documented in DNS catalog zones draft version 5 document.

Implement the feature using a new hash table in the catalog zone
structure, which holds the added "coo" properties for the catalog zone
(containing the target catalog zone's name), and the key for the hash
table being the member zone's name for which the "coo" property is being
created.

Change some log messages to have consistent zone name quoting types.

Update the ARM with change of ownership documentation and usage
examples.

Add tests which check newly the added features.
2022-04-14 20:41:52 +00:00
Aram Sargsyan
cedfebc64a Implement catalog zones options new syntax based on custom properties
According to DNS catalog zones draft version 5 document, catalog
zone custom properties must be placed under the "ext" label.

Make necessary changes to support the new custom properties syntax in
catalog zones with version "2" of the schema.

Change the default catalog zones schema version from "1" to "2" in
ARM to prepare for the new features and changes which come starting
from this commit in order to support the latest DNS catalog zones draft
document.

Make some restructuring in ARM and rename the term catalog zone "option"
to "custom property" to better reflect the terms used in the draft.

Change the version of 'catalog1.zone.' catalog zone in the "catz" system
test to "2", and leave the version of 'catalog2.zone.' catalog zone at
version "1" to test both versions.

Add tests to check that the new syntax works only with the new schema
version, and that the old syntax works only with the legacy schema
version catalog zones.
2022-04-14 10:53:52 +00:00
Evan Hunt
6bf8535542 detach unfinished query when canceling
when a query was canceled while still in the process of connecting,
tcp_connected() and udp_ready() didn't detach the query object.
2022-04-14 09:34:40 +00:00
Aram Sargsyan
b944bf4120 Unify dig +nssearch next query starting code for TCP and UDP protocols
In `+nssearch` mode `dig` starts the next query of the followup lookup
using `start_udp()` or `start_tcp()` calls without waiting for the
previous query to complete.

In UDP mode that happens in the `send_done()` callback of the previous
query, but in TCP mode that happens in the `start_tcp()` call of the
previous query (recursion) which doesn't work because `start_tcp()`
attaches the `lookup->current_query` to the query it is starting, so a
recursive call will result in an assertion failure.

Make the TCP mode to start the next query in `send_done()`, just like in
the UDP mode. During that time the `lookup->current_query` is already
detached by the `tcp_connected()`/`udp_ready()` callbacks.
2022-04-14 09:34:40 +00:00
Matthijs Mekking
f08277f9fb Test CDS DELETE persists after zone sign
Add a test case for a dynamically added CDS DELETE record and make
sure it is not removed when signing the zone. This happens because
BIND maintains CDS and CDNSKEY publishing and it will only allow
CDS DELETE records if the zone is transitioning to insecure. This is
a state that can be identified when using KASP through 'dnssec-policy',
but not when using 'auto-dnssec'.
2022-04-13 13:26:59 +02:00
Michał Kępień
806f457147 Fix "forward" system test requirements
Commit bf3fffff67 added a Python-based
name server (bin/tests/system/forward/ans11/ans.py) to the "forward"
system test, but did not update bin/tests/system/Makefile.am to ensure
Python is present in the test environment before the "forward" system
test is run.  Update bin/tests/system/Makefile.am to enforce that
requirement.
2022-04-12 15:29:26 +02:00
Aram Sargsyan
848094d6f7
Add a hung fetch check while chasing DS in the forward system test
Implement TCP support in the `ans11` Python-based DNS server.

Implement a control command channel in `ans11` to support an optional
silent mode of operation, which, when enabled, will ignore incoming
queries.

In the added check, make the `ans11` the NS server of
"a.root-servers.nil." for `ns3`, so it uses `ans11` (in silent mode)
for the regular (non-forwarded) name resolutions.

This will trigger the "hung fetch" scenario, which was causing `named`
to crash.
2022-04-08 10:27:26 +02:00
Mark Andrews
bf3fffff67
Add tests for forwarder cache poisoning scenarios
- Check that an NS in an authority section returned from a forwarder
  which is above the name in a configured "forward first" or "forward
  only" zone (i.e., net/NS in a response from a forwarder configured for
  local.net) is not cached.
- Test that a DNAME for a parent domain will not be cached when sent
  in a response from a forwarder configured to answer for a child.
- Check that glue is rejected if its name falls below that of zone
  configured locally.
- Check that an extra out-of-bailiwick data in the answer section is
  not cached (this was already working correctly, but was not explicitly
  tested before).
2022-04-07 18:43:23 +02:00
Matthijs Mekking
b9ebde705b
Add system test lingering CLOSE_WAIT TCP sockets
Add a test case to check for lingering TCP sockets stuck in the
CLOSE_WAIT state. This can happen if a client sends some garbage after
its first query.

The system test runs the reproducer script and then sends another TCP
query to the resolver. The resolver is configured to allow one TCP
client only. If BIND has its TCP socket stuck in CLOSE_WAIT, it does
not have the resources available to answer the second query.

Note: A better test would be to check if the named daemon does not
have a TCP socket stuck in CLOSE_WAIT for example with netstat. When
running this test locally you can examine named with netstat manually.
But since netstat is platform specific it is not a good candidate to do
this as a system test.

If you, if you could return, don't let it burn.
Do you have to let it linger?
- Cranberries
2022-04-07 17:02:48 +02:00
Petr Špaček
d26d4f289f
Generate JUnit reports for unit & system tests
This allows Gitlab to show nice summary for individual tests/test
directories and to expose the results in Gitlab API for consumption
elsewhere.

A catch: As of Gitlab 14.7.7, the detailed results are stored
only in artifacts and thus expire. All consumers (including API) need
to be "fast enough" to get the data before they disappear.
This also forces us to always store the artifacts intead of storing them
only on failure.
2022-04-06 21:14:38 +02:00
Tony Finch
71ce8b0a51 Ensure that dns_request_createvia() has a retry limit
There are a couple of problems with dns_request_createvia(): a UDP
retry count of zero means unlimited retries (it should mean no
retries), and the overall request timeout is not enforced. The
combination of these bugs means that requests can be retried forever.

This change alters calls to dns_request_createvia() to avoid the
infinite retry bug by providing an explicit retry count. Previously,
the calls specified infinite retries and relied on the limit implied
by the overall request timeout and the UDP timeout (which did not work
because the overall timeout is not enforced). The `udpretries`
argument is also changed to be the number of retries; previously, zero
was interpreted as infinity because of an underflow to UINT_MAX, which
appeared to be a mistake. And `mdig` is updated to match the change in
retry accounting.

The bug could be triggered by zone maintenance queries, including
NOTIFY messages, DS parental checks, refresh SOA queries and stub zone
nameserver lookups. It could also occur with `nsupdate -r 0`.
(But `mdig` had its own code to avoid the bug.)
2022-04-06 17:12:48 +01:00
Tony Finch
5867c1b727 Make notify test shellcheck clean
Use POSIX shell syntax, and use functions to reduce repetition.
2022-04-06 17:12:08 +01:00
Artem Boldariev
8bec4a6bf6 Extend the doth system test
This commit adds simple checks that the TLS contexts in question are
indeed being updated on DoT and DoH listeners.
2022-04-06 18:45:57 +03:00
Ondřej Surý
7e71c4d0cc Rename the configuration option to load balance sockets to reuseport
After some back and forth, it was decidede to match the configuration
option with unbound ("so-reuseport"), PowerDNS ("reuseport") and/or
nginx ("reuseport").
2022-04-06 17:03:57 +02:00
Aram Sargsyan
5b2b3e589c Fix using unset pointer when printing a debug message in dighost.c
The used `query->handle` is always `NULL` at this point.

Change the code to use `handle` instead.
2022-04-05 11:20:42 +00:00
Aram Sargsyan
2771a5b64d Add a missing clear_current_lookup() call in recv_done()
The error code path handling the `ISC_R_CANCELED` code lacks a
`clear_current_lookup()` call, without which dig hangs indefinitely
when handling the error.

Add the missing call to account for all references of the lookup so
it can be destroyed.
2022-04-05 11:20:42 +00:00
Aram Sargsyan
f831e758d1 When using +qr in dig print the data of the current query
In `send_udp()` and `launch_next_query()` functions, when calling
`dighost_printmessage()` to print detailed information about the
sent query, dig always prints the data of the first query in the
lookup's queries list.

The first query in the list can be already finished, having its handles
freed, and accessing this information results in assertion failure.

Print the current query's information instead.
2022-04-05 11:20:41 +00:00
Mark Andrews
56fbed2f0f
Add regression test for CVE-2022-0635 2022-04-05 09:54:45 +02:00
Mark Andrews
9ef4d2b583 Use multiple fixed expressions for portable grep usage
Additionally add "network unreachable" as an expected error message.
2022-04-05 03:55:13 +00:00
Ondřej Surý
142c63dda8 Enable the load-balance-sockets configuration
Previously, HAVE_SO_REUSEPORT_LB has been defined only in the private
netmgr-int.h header file, making the configuration of load balanced
sockets inoperable.

Move the missing HAVE_SO_REUSEPORT_LB define the isc/netmgr.h and add
missing isc_nm_getloadbalancesockets() implementation.
2022-04-05 01:30:58 +02:00
Ondřej Surý
85c6e797aa Add option to configure load balance sockets
Previously, the option to enable kernel load balancing of the sockets
was always enabled when supported by the operating system (SO_REUSEPORT
on Linux and SO_REUSEPORT_LB on FreeBSD).

It was reported that in scenarios where the networking threads are also
responsible for processing long-running tasks (like RPZ processing, CATZ
processing or large zone transfers), this could lead to intermitten
brownouts for some clients, because the thread assigned by the operating
system might be busy.  In such scenarious, the overall performance would
be better served by threads competing over the sockets because the idle
threads can pick up the incoming traffic.

Add new configuration option (`load-balance-sockets`) to allow enabling
or disabling the load balancing of the sockets.
2022-04-04 23:10:04 +02:00
Aram Sargsyan
7e2f50c369 Fix dig hanging issue in cases when the lookup's next query can't start
In recv_done(), when dig decides to start the lookup's next query in
the line using `start_udp()` or `start_tcp()`, and for some reason,
no queries get started, dig doesn't cancel the lookup.

This can occur, for example, when there are two queries in the lookup,
one with a regular IP address, and another with a IPv4 mapped IPv6
address. When the regular IP address fails to serve the query, its
`recv_done()` callback starts the next query in the line (in this
case the one with a mapped IP address), but because `dig` doesn't
connect to such IP addresses, and there are no other queries in the
list, no new queries are being started, and the lookup keeps hanging.

After calling `start_udp()` or `start_tcp()` in `recv_done()`, check
if there are no pending/working queries then cancel the lookup instead
of only detaching from the current query.
2022-04-04 09:15:56 +00:00
Ondřej Surý
ae01ec2823 Don't use reference counting in isc_timer unit
The reference counting and isc_timer_attach()/isc_timer_detach()
semantic are actually misleading because it cannot be used under normal
conditions.  The usual conditions under which is timer used uses the
object where timer is used as argument to the "timer" itself.  This
means that when the caller is using `isc_timer_detach()` it needs the
timer to stop and the isc_timer_detach() does that only if this would be
the last reference.  Unfortunately, this also means that if the timer is
attached elsewhere and the timer is fired it will most likely be
use-after-free, because the object used in the timer no longer exists.

Remove the reference counting from the isc_timer unit, remove
isc_timer_attach() function and rename isc_timer_detach() to
isc_timer_destroy() to better reflect how the API needs to be used.

The only caveat is that the already executed event must be destroyed
before the isc_timer_destroy() is called because the timer is no longet
attached to .ev_destroy_arg.
2022-04-02 01:23:15 +02:00
Ondřej Surý
30e0fd942b Remove task privileged mode
Previously, the task privileged mode has been used only when the named
was starting up and loading the zones from the disk as the "first" thing
to do.  The privileged task was setup with quantum == 2, which made the
taskmgr/netmgr spin around the privileged queue processing two events at
the time.

The same effect can be achieved by setting the quantum to UINT_MAX (e.g.
practically unlimited) for the loadzone task, hence the privileged task
mode was removed in favor of just processing all the events on the
loadzone task in a single task_run().
2022-04-01 23:55:26 +02:00
Ondřej Surý
2bc7303af2 Use isc_nm_getnworkers to manage zone resources
Instead of passing the number of worker to the dns_zonemgr manually,
get the number of nm threads using the new isc_nm_getnworkers() call.

Additionally, remove the isc_pool API and manage the array of memory
context, zonetasks and loadtasks directly in the zonemgr.
2022-04-01 23:50:34 +02:00
Ondřej Surý
2707d0eeb7 Set hard thread affinity for each zone
After switching to per-thread resources in the zonemgr, the performance
was decreased because the memory context, zonetask and loadtask was
picked from the pool at random.

Pin the zone to single threadid (.tid) and align the memory context,
zonetask and loadtask to be the same, this sets the hard affinity of the
zone to the netmgr thread.
2022-04-01 23:50:34 +02:00
Ondřej Surý
8b4ba366dd Remove the zone counting in the named
The zone counting in the named was used to properly size the zonemgr
resources (memory contexts, zonetasks and loadtasks).  Since this is no
longer the case, remove the whole zone counting from named.
2022-04-01 23:50:34 +02:00
Ondřej Surý
a94678ff77 Create per-thread task and memory context for zonemgr
Previously, the zonemgr created 1 task per 100 zones and 1 memory
context per 1000 zones (with minimum 10 tasks and 2 memory contexts) to
reduce the contention between threads.

Instead of reducing the contention by having many resources, create a
per-nm_thread memory context, loadtask and zonetask and spread the zones
between just per-thread resources.

Note: this commit alone does decrease performance when loading the zone
by couple seconds (in case of 1M zone) and thus there's more work in
this whole MR fixing the performance.
2022-04-01 23:50:34 +02:00
Evan Hunt
5319d8adea fix resolver test when built without --enable-querytrace
a test case in the 'resolver' system test was reliant on
logged output that would only be present when query tracing
was enabled, as in developer builds. that test case is now
disabled when query tracing is not available. Thanks to
Anton Castelli.
2022-04-01 09:54:44 -07:00
Aram Sargsyan
4477f71868 Synchronze udp_ready() and tcp_connected() functions entry behavior
The `udp_ready()` and `tcp_connected()` functions in dighost.c are
used for similar purposes for UDP and TCP respectively.

Synchronize the `udp_ready()` function entry code to behave like
`tcp_connected()` by adding input validation, debug messages and
early exit code when `cancel_now` is `true`.
2022-04-01 10:56:27 +00:00
Aram Sargsyan
7d360bd05e Fix "dig +nssearch" indefinitely hanging issue
When finishing the NSSEARCH task and there is no more followup
lookups to start, dig does not destroy the last lookup, which
causes it to hang indefinitely.

Rename the unused `first_pass` member of `dig_query_t` to `started`
and make it `true` in the first callback after `start_udp()` or
`start_tcp()` of the query to indicate that the query has been
started.

Create a new `check_if_queries_done()` function to check whether
all of the queries inside a lookup have been started and finished,
or canceled.

Use the mentioned function in the TRACE code block in `recv_done()`
to check whether the current query is the last one in the lookup and
cancel the lookup in that case to free the resources.
2022-04-01 10:56:27 +00:00
Evan Hunt
bd814b79d4 add a system test for $GENERATE with an integer overflow
the line "$GENERATE 19-28/2147483645 $ CNAME x" should generate
a single CNAME with the owner "19.example.com", but prior to the
overflow bug it generated several CNAMEs, half of them with large
negative values.

we now test for the bugfix by using "named-checkzone -D" and
grepping for a single CNAME in the output.
2022-04-01 07:56:52 +00:00
Evan Hunt
2261c853b5 update shell syntax
clean up the shell syntax in the checkzone test prior to adding
a new test.
2022-04-01 07:56:52 +00:00
Tony Finch
84c4eb02e7 Log "not authoritative for update zone" more clearly
Ensure the update zone name is mentioned in the NOTAUTH error message
in the server log, so that it is easier to track down problematic
update clients. There are two cases: either the update zone is
unrelated to any of the server's zones (previously no zone was
mentioned); or the update zone is a subdomain of one or more of the
server's zones (previously the name of the irrelevant parent zone was
misleadingly logged).

Closes #3209
2022-03-30 12:50:30 +01:00
Ondřej Surý
3a650d973f Remove isc_appctx_t use in dns_client
The use of isc_appctx_t in dns_client was used to wait for
dns_client_startresolve() to finish the processing (the resolve_done()
task callback).

This has been replaced with standard bool+cond+lock combination removing
the need of isc_appctx_t altogether.
2022-03-29 14:14:49 -07:00
Tony Finch
29a3e77425 MacOS needs more IP addresses to run the system tests
The launchd script only counted up to 8 whereas ifconfig.sh went all
the way up to 10, and even a bit further than that.
2022-03-29 16:59:19 +01:00
Ondřej Surý
b05a991ad0 Make isc_ht optionally case insensitive
Previously, the isc_ht API would always take the key as a literal input
to the hashing function.  Change the isc_ht_init() function to take an
'options' argument, in which ISC_HT_CASE_SENSITIVE or _INSENSITIVE can
be specified, to determine whether to use case-sensitive hashing in
isc_hash32() when hashing the key.
2022-03-28 15:02:18 -07:00
Ondřej Surý
4dceab142d Consistenly use UNREACHABLE() instead of ISC_UNREACHABLE()
In couple places, we have missed INSIST(0) or ISC_UNREACHABLE()
replacement on some branches with UNREACHABLE().  Replace all
ISC_UNREACHABLE() or INSIST(0) calls with UNREACHABLE().
2022-03-28 23:26:08 +02:00
Artem Boldariev
cfea9a3aec Extend the 'doth' system test with Strict/Mutual TLS checks
This commit extends the 'doth' system test with a set of Strict/Mutual
TLS related checks.

This commit also makes each doth NS instance use its own TLS
certificate that includes FQDN, IPv4, and IPv6 addresses, issued using
a common Certificate Authority, instead of ad-hoc certs.

Extend servers initialisation timeout to 60 seconds to improve the
tests stability in the CI as certain configurations could fail to
initialise on time under load.
2022-03-28 16:22:53 +03:00
Artem Boldariev
7b9318bf72 Add missing plain HTTP options to dig's help output
A couple of dig options were missing in the help output, while been
properly documented and supported. This commit fixes this overlook.
2022-03-28 16:22:53 +03:00
Artem Boldariev
57f0251713 Add support for Strict/Mutual TLS into BIND
This commit adds support for Strict/Mutual TLS into BIND. It does so
by implementing the backing code for 'hostname' and 'ca-file' options
of the 'tls' statement. The commit also updates the documentation
accordingly.
2022-03-28 16:22:53 +03:00
Artem Boldariev
89d7059103 Restore disabled unused 'tls' options: 'ca-file' and 'hostname'
This commit restores the 'tls' options disabled in
78b73d0865.
2022-03-28 16:22:53 +03:00
Artem Boldariev
fd38a4e1bf Add support for Strict/Mutual TLS to dig
This commit adds support for Strict/Mutual TLS to dig.

The new command-line options and their behaviour are modelled after
kdig (+tls-ca, +tls-hostname, +tls-certfile, +tls-keyfile) for
compatibility reasons. That is, using +tls-* is sufficient to enable
DoT in dig, implying +tls-ca

If there is no other DNS transport specified via command-line,
specifying any of +tls-* options makes dig use DoT. In this case, its
behaviour is the same as if +tls-ca is specified: that is, the remote
peer's certificate is verified using the platform-specific
intermediate CA certificates store. This behaviour is introduced for
compatibility with kdig.
2022-03-28 16:22:53 +03:00
Aram Sargsyan
9b84bfb5f4 Cleanup the code to remove unnecessary indentation
Because of the "goto" in the "if" body the "else" part is unnecessary
and adds another level of indentation.

Cleanup the code to not have the "else" part.
2022-03-28 10:17:56 +00:00
Aram Sargsyan
d29e5f197b Log a warning when catz is told to modify a zone not added by catz
Catz logs a warning message when it is told to modify a zone which was
not added by the current catalog zone.

When logging a warning, distinguish the two cases when the zone
was not added by a catalog zone at all, and when the zone was
added by a different catalog zone.
2022-03-28 10:17:56 +00:00