Previously, upon receiving a query with TSIG, the server would log
an error and timeout. As there is no way to set up the keyring in the
class anyway (and I believe we don't need it), this commit lets such
queries parse but logs the fact that the query has TSIG.
However, there is a bug [1] in dnspython, which causes `make_response`
and `to_wire` to crash on messages constructed by `from_wire` with
`keyring=False`, so the hack with `message.__class__` is needed to work
around this.
This makes just enough changes for the tsig system test to work with
dnspython >= 2.0.0. On older version the server gives up.
[1] https://github.com/rthalley/dnspython/issues/1205
When compiling with meson, it may be easy to forget to compile system
test dependencies before running the tests. In that case, the test
results would be quite incosistent and unpredictable, with some tests
ending up with ERROR, some with FAILURE and others PASS, without a clear
indication that something is off before running the entire machinery.
Add a check to fail early on if the FEATURETEST binary isn't available,
indicating that system test dependencies were most likely not compiled.
When send_done() is called with a ISC_R_CANCELED status (e.g. because
of a signal from ctrl+c), dig can fail to shutdown because
check_if_done() is not called in the branch. Add a check_if_done()
call.
When reusing a TCP connection (because of the '+keepopen' option),
dig detaches from the query after launching it. This can cause a
crash in dig in rare cases when the "receive" callback is called
earlier than the "send" callback.
The '_cancel_lookup()' function detaches a query only if it's
found in the 'lookup->q' list. Before this commit, with one
additional detach happening before recv_done() -> _cancel_lookup()
is called, it didn't cause problems because an earlier _query_detach()
was unlinking the query from 'lookup->q' (because it was the last
reference), so the additional detach and the skipped detach were
undoing each other.
That is unless the "receive" callback was called earlier than the
"send" callback, in which case the additional detach wasn't destroying
the query (and wasn't unlinking it from 'lookup->q') because the "send"
callback's attachment was still there, and so _cancel_lookup() was
trying to "steal" the "send" callback's attachment and causing an
assertion on 'INSIST(query->sendhandle == NULL);'.
Delete the detachment which caused the described situation.
RRset ordering is now an enum inside struct rdataset attributes. This
was done to keep size to of the structure to its original value before
this MR.
I expect zero performance impact but it should be easier to deal with
attributes in debuggers and language servers.
The ns_client_t struct is reset and zero-ed out on every query,
but some fields (query, message, manager) are preserved.
We observe two things:
- The sendbuf field is going to be overwritten anyway, there's
no need to zero it out.
- The fields are copied out when the struct is zero-ed out, and
then copied back in. For the query field (which is 896 bytes)
this is very inefficient.
This commit makes the reset more efficient avoiding to unnecessary
zero-ing and copy.
The code which checks for both IPv4 and IPv6 mixed usage is inherently
unstable, since the address family is chosen randomly for each
connection.
Closes#5406
It's possible to use pytest.mark.flaky, which achieves the exact same
thing as our custom-defined isctest.mark.flaky -- attempts to rerun the
test on failure, but only is flaky package is available.
The test_kasp_case[secondary.kasp] can sometimes fail on freebsd13. It
appears the test gets stuck on some operation which should be very
quick, but for some reason takes at least a few seconds, causing the
cb_ixfr_is_signed() function to time out.
In one of the cases I investigated, it wasn't a query/response that
caused a timeout, but rather some operation in between. The test
attempts to read from a keyfile/statefile, but I see no reason why that
should block.
In any case, try to increase the timeout for the verification, as that
shouldn't hurt. Also allow the test to be re-run on freebsd13, as it's
likely to be caused by some odd behaviour on that platform -- the issue
doesn't appear anywhere else.
The check "unix socket message counts" sometimes fails with "dnstap
output file smaller than expected". This only happens on freebsd13 and
can't be reproduced easily. There was an attempt to decrease the
required file size in the past, but apparently, the issue can still
occur.
The serve_stale test has some inherent instabilities affecting many
different checks. While the failure rate isn't too high (about four
failures in past three weeks of nightlies), it gets ignored, because the
test has been unstable for a very long time.
This removes a leftover check which should've been removed in a prior
change (see #5244). The softhsm2 failures when attempting to delete the
token should be ignored.
Previously, the one-second sleep was unreliable, as it didn't properly
indicate that the rndc reconfig has been processed. The "test 'rndc
reconfig' with a broken config" check would sometimes fail under TSAN
in CI, because the previous rndc reconfig was still ongoing, and the
subsequent rndc reconfig was ignored.
These tests have been unstable under TSAN in the past, but it appears
that the same failure mode can happen outside of TSAN tests as well.
These tests have produced 12 failures combined in the past three weeks
in nightlies.
The fetchlimit test has failed 8 times in the nightly CI over the past
three weeks. That makes the overall failure rate somewhere around 1 %,
which isn't a lot, but is still annoying when lots of testing is going
on.
Rndc test "test 'rndc reconfig' with a broken config" was failing
intermittently.
Wait for 'running' to be logged rather than just using 'sleep 1' before
calling 'rndc reconfig' a second time to get the expected error message
rather than 'reconfig request ignored: already running'.
There are many system tests where we set `dnssec-validation yes;` only
to also set `trust-anchors { };` which effectively disables the
validation.
This commit replaces this convoluted setup with just
`dnssec-validation no;`.
When the interface-interval parser was changed from uint32 parser to
duration parser, the default value stayed at plain 60 which now means 60
seconds instead of 60 minutes. Fix the default value and the
documentation to match the reality.
Root trust anchors are automatically updated as described in RFC5011.
Add a system test which ensures the root DNSKEYs are always queried by
named during startup.
Because this test uses real internet DNS root servers, it is enabled
only when `CI_ENABLE_LIVE_INTERNET_TESTS` is set.
This test doesn't require artifact checking but when bundled in the same
directory with the shell based tests, the `system:clang:tsan` job was
failing non-deterministically.
The extra messages are typically traceback from assertion failures.
Previously, they'd be printed only after all individual test case
results have been printed. That made it difficult to pair the traceback
to the failing test in some cases, as the node information (aka test
name) might not always be present.
Instead, log any extra messages related to a particular test failure
directly after reporting its result, making the failure details more
readily available and easy to connect with a particular test case.
Make sure the queries and responses are logged at the DEBUG level, which
may provide useful information in case of failing tests.
This doesn't seem to significantly increase the overall artifacts size.
Previously, pytest.log.txt files from all system tests would take around
3 MB, with this change, it's around 8 MB).
In some cases, it's useful to log the sent and received DNS messages.
Add options to enable this on demand. Query is only logged the first
time it's sent, since it doesn't change. If response logging is turned
on, then each response is logged, since it might be different every
time.
When multiline message is logged, indent all but the first line (which
will be preceeded by the LOG_FORMAT). This improves the clarity of logs,
as it's immediately clear which lines are regular log output, and which
ones are multiline debug output.
Adjust the isctest.run.cmd() stdout/stderr logging to this new format.
The messages obtained from test results may contain stuff like detailed
failure/error information, tracebacks etc. In many cases, the message
will be empty, in which case it doesn't need to be logged.
For an example, run test with many test cases, e.g.
verify/test_verify.py, and inspect the tail of the pytest.log.txt before
and after this commit.
Use isctest.log logging facility for consistent and predictable logging
output rather than using print(). Remove writes of stderr, as that
output will be logged in the debug log in case the commands called with
isctest.run.cmd() fails.
We skip those by default as:
a) we don't want to stress the upstream servers in every CI pipeline
b) system tests need to be runnable in a isolated environment by default
Commit 5cd6c173ff changed the contents of
the PACKAGE_DESCRIPTION preprocessor macro from " (<description>)" to
just "<description>" and missed a spot while adjusting all uses of this
macro in the code base. Fix formatting for that malformed log message,
emitted upon named startup.
Previous CPU test relied on either missing default named.conf or the
missing permissions to write into its default directory. In short that
default configuration would be unusable with current user. It would hang
indefinitely at cpu test if the named user could write into directory
specified in default configuration.
Change it instead to explicitly try non-existent configuration file.
It will still fail immediately, but will not rely on running user or
presence of file at default configuration file path.
There is an ongoing debate about the usefulness of the extra artifacts
check. While it might be useful to detect unexpected behaviour in some
tests, it feels extraneous in many cases. This change provides a middle
ground by making the artifact checking optional. This might be
especially useful for writing new tests, since the author gets to decide
whether the check is useful -- and can utilize it, or can skip it for
sake of brevity.
The emptyzones system test ran two consecutive "rndc reload" commands
without waiting for the first one to complete. It used to work because
the commands were serialized, but now an rndc reconfig/reload command is
ignored if another one is already running, so the emptyzones test is
more likely to fail.
Fix this problem by waiting for the log message indicating that all the
zones are loaded before attempting the next reload.
Add a new system test which checks named output when starting,
reconfiguring and reloading the server. It checks that the steps where
configuration is loaded, when named enters exclusive mode, and when the
configuration is applied are all logged, and that they occur in the
correct order. This adds a guard/warning to keep the parsing of the
named.conf outside of the exclusive mode.
The configuration file was parsed when named was in exclusive
(i.e. single-threaded) mode and unable to answer queries. Because
the parsing is a self-contained operation, it is now done before
named enters exclusive mode.
This reduces the amount of time named can't answer queries when
reloading the configuration when the configuration file is large.
Note that exclusive mode is still used for applying the
configuration changes to the server.
Also, simplify the configuration logic by parsing the built-in
configuration only once at server start time.
In some rare cases, the softhsm2 utility reports failure to delete the
token directory, despite the token being found. Subsequent attempts to
delete the token again indicate that the token was deleted.
Ignore this cleanup error, as it doesn't prevent our tests from working
properly. There is also an attempt to delete the token before the test
starts which ensures a clean state before the test is executed, in case
there's actually a leftover token.
For duration measurements, i.e. deadlines and timeouts, it's more
suitable to use monotonic time as it's guaranteed to only go forward,
unlike time.time() which can be affected by local clock settings.
Allow use of exception (and by extension, assert statements) in the
called function in order to extract essential debug information about
the type of failure that was encountered.
In case the called function fails to succeed on the last retry and
raised an exception, log it as error and set it as the assert message to
propagate it through the pytest framework.
Create a test scenario where a signed zone is in multiple views and
then a key may be purged. This is a bug case where the key files are
removed by one view and then the other view starts complaining.
Add a zone using DS records that embed the private algorithm
identifier in the digest field. There are 2 DS record for an
unsupported DNSSEC algorithm one of which that doesn't have a
matching DNSKEY. This zone should validate as insecure as the
validator can establish that both DS records are for unsupported
DNSSEC algorithms.
There are 4 tests:
1) a zone using a known private OID. Validations should succeed
and return AD=1.
2) a zone using an unknown private OID. Validation should succeed
and return AD=0 as the DS to DNSKEY has provably unsupported
algorithm.
3) a zone using a known private OID and an extra DS record. Validation
should succeed as there is DS to DNSKEY with a known algorithm
linkage.
4) a zone using an unknown private OID and an extra DS record.
Validation should fail as only one of the DS records can be matched
to a provable unknown algorithm. The algorithm of the second DS
is indeterminate.
Use the existing RSASHA256 and RSASHA512 implementation to provide
working PRIVATEOID example implementations. We are using the OID
values normally associated with RSASHA256 (1.2.840.113549.1.1.11)
and RSASHA512 (1.2.840.113549.1.1.13).
Add support for proposed DS digest types that encode the private
algorithm identifier at the start of the DS digest as is done for
DNSKEY and RRSIG. This allows a DS record to identify the specific
DNSSEC algorithm, rather than a set of algorithms, when the algorithm
field is set to PRIVATEDNS or PRIVATEOID.
- dns_zone_cdscheck() has been extended to extract the key algorithms
from DNSKEY data when the CDS algorithm is PRIVATEOID or PRIVATEDNS.
- dns_zone_signwithkey() has been extended to support signing with
PRIVATEDNS and PRIVATEOID algorithms. The signing record (type 65534)
added at the zone apex to indicate the current state of automatic zone
signing can now contain an additional two-byte field for the DST
algorithm value, when the DNS secalg value isn't enough information.
dns_resolver_algorithm_supported() has been extended so in addition to
an algorithm number, it can also take a pointer to an RRSIG signature
field in which key information is encoded.
DST algorithm and DNSSEC algorithm values are not necessarily the same
anymore: if the DNSSEC algorithm value is PRIVATEOID or PRIVATEDNS, then
the DST algorithm will be mapped to something else. The conversion is
now done correctly where necessary.
- When the algorithm value for a DNSSEC key is set to PRIVATEOID
or PRIVATEDNS, that's a placeholder value indicating that the
real algorithm identifier is encoded into the key or signature
data. That means the DNSKEY algorithm value and the DST algorithm
value may not be identical, so we must now add environment variables
DEFAULT_ALGORITHM_DST_NUMBER, ALTERNATIVE_ALGORITHM_DST_NUMBER
and DISABLED_ALGORITHM_DST_NUMBER to the test suite, with support
for mapping from DST algorithm value to PRIVATEDNS or PRIVATEOID.
- Some test cases use RRSIGs that have been modified to force
validation to fail. When making those modifications, we now
preserve the first part of the signature, so that PRIVATEDNS and
PRIVATEOID algorithm identifier values will still work. (This
assumes that the identifiers are short and fit into the first
base64 block.)
When going insecure, we publish CDS and CDNSKEY DELETE records. Update
the check_apex function to test this.
Also, skip some tests in the 'check_rollover_step()' function. If
we change the DNSSEC Policy, keys that no longer match the policy will
be retired. When this exactly happens is hard to determine, as it
happens on the reconfigure. So for these tests, we skip the key timing
metadata checks.
Also, the zone becomes unsigned, so don't call 'check_zone_is_signed'
in those cases.
These test cases involve a reconfiguration. The first one is a zone
that changes from dynamic to inline-signing. The others are tests that
key lifetimes are updated correctly after changing them.
apply the existing journal file, if any, to the old version of the
database before diffing it against the new version. then, append
the diff to the end of the journal. this allows easy creation of
a journal file with multiple deltas, by running named-makejournal
successively.
- write the journal to a default location (file1.jnl) if it was not
specified on the commandline.
- exit with a clear error message if file2's SOA serial number is
not changed from file1.
move the "makejournal" tool from bin/tests/system to bin/tools
and rename it to "named-makejournal". add a man page. update
tests to use the new file location.
check that `delv +ns` sends iterative queries over both address
families when -4 and -6 are not used, and suppresses queries
appropriately when they are.
`delv +ns` invokes the same code to perform name resolution as `named`,
but it neglected to set up an IPv6 dispatch object first. Consequently,
it was behaving more like `named -4`.
It now sets up dispatch objects for both address families, and performs
resolver queries to both v4 and v6 addresses, except when one of the
address families has been suppressed by using `delv -4` or `delv -6`.
The "keyopts" field of the dns_zone object was added to support
"auto-dnssec"; at that time the "options" field already had most of
its 32 bits in use by other flags, so it made sense to add a new
field.
Since then, "options" has been widened to 64 bits, and "auto-dnssec"
has been obsoleted and removed. Most of the DNS_ZONEKEY flags are no
longer needed. The one that still seems useful (_FULLSIGN) has been
moved into DNS_ZONEOPT and the rest have been removed, along with
"keyopts" and its setter/getter functions.
Meson is a modern build system that has seen a rise in adoption and some
version of it is available in almost every platform supported.
Compared to automake, meson has the following advantages:
* Meson provides a significant boost to the build and configuration time
by better exploiting parallelism.
* Meson is subjectively considered to be better in readability.
These merits alone justify experimenting with meson as a way of
improving development time and ergonomics. However, there are some
compromises to ensure the transition goes relatively smooth:
* The system tests currently rely on various files within the source
directory. Changing this requirement is a non-trivial task that can't
be currently justified. Currently the last compiled build directory
writes into the source tree which is in turn used by pytest.
* The minimum version supported has been fixed at 0.61. Increasing this
value will require choosing a baseline of distributions that can
package with meson. On the contrary, there will likely be an attempt
to decrease this value to ensure almost universal support for building
BIND 9 with meson.
A dns_view_t has a queryonacl property, which is supposed to hold the
ACL matching the configuration "allow-query-on". However the code
parsing this configuration ACL was missing (or removed by mistake?),
hence this property was always NULL. The ACL was still built but
individually for each zone (which checks if the property exists in the
zone definition, view definition, and finally options definition).
It now create the ACL instance at the view level, enabling zones to
share the same (identical) ACL instead of having their own copies.
A "template" statement can contain the same configuration clauses
as a "zone" statement. A "zone" statement can now reference a
template, and all the clauses in that template will be used as
default values for the zone. For example:
template primary {
type primary;
file "$name.db";
initial-file "primary.db";
};
zone example.com {
template primary;
file "different-name.db"; // overrides the template
};
Special tokens can now be specified in a zone "file" option
in order to generate the filename parametrically. The first
instead of "$name" in the "file" option is replaced with the
zone origin, the first instance of "$type" is replaced with the
zone type (i.e., primary, secondary, etc), and the first instance
of "$view" is replaced with the view name..
This simplifies the creation of zones using initial-file templates.
For example:
$ rndc addzone <zonename> \
{ type primary; file "$name.db"; initial-file "template.db"
When loading a primary zone for the first time, if the zonefile
does not exist but an "initial-file" option has been set, then a
new file will be copied into place from the path specified by
"initial-file".
This can be used to simplify the process of adding new zones. For
instance, a template zonefile could be used by running:
$ rndc addzone example.com \
'{ type primary; file "example.db"; initial-file "template.db"; };'
there were some duplicated syntax checks in named_zone_configure()
that are no longer needed, now that we perform those same checks
using isccfg_check_zoneconf().
there were also some syntax checks that were *only* in
named_zone_configure(), which have now been moved to
isccfg_check_zoneconf(). test cases for them have been
added to the checkconf system test.
the function that checks zone syntax in libisccfg was previously
only called when loading named.conf, not when parsing an an
"rndc addzone" or "rndc modzone" command. this has been corrected.
note that some checks are still skipped: those that check for
duplication of filenames, key directories, etc. to fix this, we'd need
to export the symbol tables that are set up when loading named.conf and
preserve them so they could be reused later.
The "run.sh" script, used by "make test", changes the working
directory to the system test directory before executing pytest.
If the test drops hypothesis artifacts while running, this
can cause spurious test failures due to an apparent mismatch
between the contents of the system test directory and the
temporary pytest directory. This has been addressed by having
"run.sh" call pytest from the parent directory instead.
named-rrchecker now parses the braces which support multi-line input
from the beginning of the input rather than only when reading the
data fields of the record.
The odd-looking "\ " escape is required to italicize <character-string>
without italicizing the final "s". See reStructuredText Markup
Specification, sections "Inline markup recognition rules" and "Escaping
Mechanism". Most importantly:
Escaped whitespace characters are removed from the output document
together with the escaping backslash. This allows for character-level
inline markup.
Deduplicate the code for dynamic updates and increase code clarity by
using an actual dns.update.UpdateMessage rather than an undefined
intermediary format passed around as a list of arguments.
Move the 'csk-roll1' and 'csk-roll2' zones to the rollover test dir and
convert CSK rollover tests to pytest.
The DS swap spans multiple steps. Only the first time we should check
if the "CDS is now published" log is there, and only the first time we
should run 'rndc dnssec -checkds' on the keys. Add a new key to the
step dictionary to disable the DS swap checks.
This made me realize that we need to check for "is not None" in case
the value in the dictionary is False. Update check_rollover_step()
accordingly, and also add a log message which step/zone we are currently
checking.
Move the 'ksk-doubleksk' zones to the rollover test dir and convert KSK
rollover test to pytest.
Since the 'ksk-doubleksk' policy publishes different CDNSKEY/CDS RRsets,
update the 'check_rollover_step' to check which CDNSKEY/CDS RRsets should
be published and which should be prohibited. Update 'isctest.kasp'
accordingly.
We are changing the ZSK lifetime to unlimited in this test case as it
is of no importance (this actually discovered a bug in setting the
next time the keymgr should run).
Move the 'zsk-prepub' zones to the rollover test dir and convert ZSK
rollover test to pytest.
We need a way to signal a smooth rollover is going on. Signatures are
being replaced gradually during a ZSK rollover, so the existing
signatures of the predecessor ZSK are still being used. Add a smooth
operator to set the right expectations on what signatures are being
used.
Setting expected key relationships is a bit crude: a list of two
elements where the first element is the index of the expected keys that
is the predecessor, and the second element is the index of the expected
keys that is the successor.
We are changing the KSK lifetime to unlimited in this test case as it
is of no importance.
Move the 'enable-dnssec' to the rollover test dir and convert to pytest.
This requires new test functionality to check that "CDS is published"
messages are logged (or prohibited).
The setup part is slightly adapted such that it no longer needs to
set the '-P sync' value in most cases (this is then set by 'named'),
and to adjust for the inappropriate safety intervals fix.
Move the multi-signer test scenarios to the rollover directory and
convert tests to pytest.
- If the KeyProperties set the "legacy" to True, don't set expected
key times, nor check them. Also, when a matching key is found, set
key.external to True.
- External keys don't show up in the 'rndc dnssec -status' output so
skip them in the 'check_dnssecstatus' function. External keys never
sign RRsets, so also skip those keys in the '_check_signatures'
function.
- Key properties strings now can set expected key tag ranges, and if
KeyProperties have tag ranges set, they are checked.
In order to keep the kasp system test somewhat approachable, let's
move all rollover scenarios to its own test directory. Starting with
the manual rollover test cases.
A new test function is added to 'isctest.kasp', to verify that the
relationship metadata (Predecessor, Successor) is set correctly.
The configuration and setup for the zone 'manual-rollover.kasp' are
almost copied verbatim, the only exception is the keytimes. Similar
to the test kasp cases, we no longer set "SyncPublish/PublishCDS" in
the setup script. In addition to that, the offset is changed from one
day ago to one week ago, so that the key states match the timing
metadata (one day is too short to move a key from "hidden" to
"omnipresent").
Replace the custom DNS servers used in the "chain" system test with
new code based on the isctest.asyncserver module.
For ans3, replace the sequence of logical conditions present in Perl
code with zone files and a limited amount of custom logic applied on top
of them where necessary.
For ans4, replace the ctl_channel() and create_response() functions with
a custom control command handler coupled with a dynamically instantiated
response handler, making the code more robust and readable.
Migrate sendcmd() and its uses to the new way of sending control queries
to custom servers used in system tests.
To improve readability of sendcmd() calls used for controlling
isctest.asyncserver-based custom DNS servers, pass the command's name
and arguments as separate parameters.
Adding proper DNAME support to AsyncDnsServer would add complexity to
its code for little gain: DNAME use in custom system test servers is
limited to crafting responses that attempt to trigger bugs in named.
This fact will not be obvious to AsyncDnsServer users as it
automatically loads all zone files it finds and handles CNAME records
like a normal authoritative DNS server would.
Therefore, to prevent surprises:
- raise an exception whenever DNAME records are found in any of the
zone files loaded by AsyncDnsServer,
- add a new optional argument to the AsyncDnsServer constructor that
enables suppressing this new behavior, enabling zones with DNAME
records to be loaded anyway.
This enables response handlers to use the DNAME records present in zone
files in arbitrary ways without complicating the "base" code.
The constructor for the AsyncDnsServer class takes a 'load_zones'
argument that is not used anywhere and is not expected to be useful in
the future: zone files are not required for an AsyncDnsServer instance
to start and, if necessary, zone-based answers can be suppressed or
modified by installing a custom response handler.
dnspython does not treat CNAME records in zone files in any special way;
they are just RRsets belonging to zone nodes. Process CNAMEs when
preparing zone-based responses just like a normal authoritative DNS
server would.
The pytest cases checks if a zone is signed by looking at the NSEC
record at the apex. If that has an RRSIG record, it is considered
signed. But 'named' signs zones incrementally (in batches) and so
the zone may still lack some signatures. In other words, the tests
may consider a zone signed while in fact signing is not yet complete,
then performs additional checks such as is a subdomain signed with the
right key. If this check happens before the zone is actually fully
signed, the check will fail.
Fix this by using 'check_dnssec_verify' instead of
'check_is_zone_signed'. We were already doing this check, but we now
move it up. This will transfer the zone and then run 'dnssec-verify'
on the response. If the zone is partially signed, the check will fail,
and it will retry for up to ten times.
On FIPS-enabled platforms, we need to ensure a minimal version of
hypothesis which no longer uses MD5. This doesn't need to be enforced
for other platforms.
Move the import magic to a utility module to avoid copy-pasting the
boilerplate code around.
There were several methods how we used 'argv[0]'. Some programs had a
static value, some programs did use isc_file_progname(), some programs
stripped 'lt-' from the beginning of the name. And some used argv[0]
directly.
Unify the handling and all the variables into isc_commandline_progname
that gets populated by the new isc_commandline_init(argc, argv) call.
Instead of giving the memory context names with an explicit call to
isc_mem_setname(), add the name to isc_mem_create() call to have all the
memory contexts an unconditional name.
When request manager shuts down, it also shuts down all its ongoing
requests. Currently it calls their callback functions with a
ISC_R_SHUTTINGDOWN result code for the request. Since a request
manager can shutdown not only during named shutdown but also during
named reconfiguration, instead of sending ISC_R_SHUTTINGDOWN result
code send a ISC_R_CANCELED code to avoid confusion and errors with
the expectation that a ISC_R_SHUTTINGDOWN result code can only be
received during actual shutdown of named.
All the callback functions which are passed to either the
dns_request_create() or the dns_request_createraw() functions have
been analyzed to confirm that they can process both the
ISC_R_SHUTTINGDOWN and ISC_R_CANCELED result codes. Changes were
made where it was necessary.
This new test checks that named can correctly process an interrupted
SOA request during zone transfer, caused by reconfiguration.
Co-authored-by: Michał Kępień <michal@isc.org>
Enable existing rndc system tests (the python test function calling the
shell file was missing). Also update the extra artifacts list to remove
one generated file which was left behind.
replace the pattern `for (result = dns_rdataset_first(x); result ==
ISC_R_SUCCES; result = dns_rdataset_next(x)` with a new
`DNS_RDATASET_FOREACH` macro throughout BIND.
previously, ISC_LIST_FOREACH and ISC_LIST_FOREACH_SAFE were
two separate macros, with the _SAFE version allowing entries
to be unlinked during the loop. ISC_LIST_FOREACH is now also
safe, and the separate _SAFE macro has been removed.
similarly, the ISC_LIST_FOREACH_REV macro is now safe, and
ISC_LIST_FOREACH_REV_SAFE has also been removed.
The `max-rsa-exponent-size` could limit the exponents of the RSA
public keys during the DNSSEC verification. Instead of providing
a cryptic (not cryptographic) knob, hardcode the max exponent to
be 4096 (the theoretical maximum for DNSSEC).
This new option sets the delay, in seconds, to wait before sending
a set of NOTIFY messages for a zone. Whenever a NOTIFY message is
ready to be sent, sending will be deferred for this duration.
The test_idle_timeout check in the "timeouts" system test has been
failing often on FreeBSD 13 AWS hosts. Adding timestamped debug logging
shows that the time.sleep() calls used in that check are returning
significantly later than asked to on that platform (e.g. after 4 seconds
when just 1 second is requested), breaking the test's timing assumptions
and triggering false positives. These failures are not an indication of
a bug in named and have not been observed on any other platform. Mark
the problematic check as flaky, but only on FreeBSD 13, so that other
failure modes are caught appropriately.
In commit cc167266aa, the -g option was changed so it sets both
named_g_logstderr and also named_g_logflags to use ISO style timestamps
with tzinfo. Together with an error in named_log_setsafechannels(), that
change could cause the debugging level to be ignored.
Extract each section of the bundle and check that the expected
records are there. The old code was assuming that the records in
each section where in a particular order which didn't happen in
practice.
These tests ensure that if dnssec-policy is set on a higher level, the
zone is still signed (or unsigned) as expected. Or if a higher level
has an override, the new policy is honored as expected.
The isc_nm_getinitialtimeout() function (and also the previously used
isc_nm_gettimeouts() function) returns timeout value(s) in milliseconds,
while the dns_request_create() function expects timeout values in
seconds. Fix the bug by dividing the timeout value by MS_PER_SEC.
There is no added test, because it turns out delv doesn't support
setting custom timeout values (as opposed to what is suggested in
its man page). Tests should be added later when the '+timeout=T'
option is implemented.
Previously all kinds of TCP timeouts had a single getter and setter
functions. Separate each timeout to its own getter/setter functions,
because in majority of cases only one is required at a time, and it's
not optimal expanding those functions every time a new timeout value
is implemented.
Since notify messages now use the configured 'tcp-initial-timeout'
connect timeout value, the existing "checking notify retries expire
within 30 seconds" check in the "notify" system test is failing. Set
the 'tcp-initial-timeout' option for ns3 to the previously hardcoded
value of 15 seconds for the test to pass successfully.
The new 'tcp-primaries-timeout' configuration option works the same way
as the existing 'tcp-initial-timeout' option, but applies only to the
TCP connections made to the primary servers, so that the timeout value
can be set separately for them. The default is 15 seconds.
Also, while accommodating zone.c's code to support the new option, make
a light refactoring with the way UDP timeouts are calculated by using
definitions instead of hardcoded values.
For 'keystore.kasp', a setting 'key-directories' is used. If set, this
will expect a list of two directories, the first one is where the KSKs
will be stored, the second in the list is the ZSK key directory. This
may be expanded in the future to test more complex key storage cases.
The 'rumoured.kasp' zone is weird, the key timings can never match
those key states. But it is a regression test for an early day bug,
so we convert it, but skip the expected key times check.
These test cases follow the same pattern as many other, but all require
some additional checks. These are set in "additional-tests".
The "zsk-missing.autosign" zone is special handled, as it expects the
KSK to sign the SOA RRset (because the ZSK is unavailable).
The kasp/ns3/setup.sh script is updated so the SyncPublish is not set
(named will initialize it correctly). For the test zones that have
missing private key files we do need to set the expected key timing
metadata.
Remove the counterparts for the newly added test from the kasp shell
tests script.
The check_signatures code was initially created to be suitable for
the ksr system test, to test the Offline KSK feature. For that, a
key is expected to be signing if the current time is between
the timing metadata Active and Retired.
With dnssec-policy, the key timing metadata is indicative, the key
states determine the actual signing behavior.
Update the check_signatures function so that by default the signing
is derived from the key states (ksigning and zsigning). Add an
argument 'offline_ksk', if set the make sure that the zsigning is set
if the current time is between the Active and Retired timing metadata,
and for ksigning we just use the timing metadata (as the key is offline,
we cannot check the key states).
Another (upcoming) test case is where key files are missing. When the
ZSK private key file is missing, the KSK takes over. Add an argument
'zsk_missing', when set to True the expected zone signing (zsigning)
is reversed.
The zone 'pregenerated.kasp' is a case where there already exist more
keys than required. For this we set the 'pregenerated' setting. This
will change the 'keydir_to_keylist' function behavior: Only keys in use
are considered. A key is in use if all of the states are either
undefined, or set to 'hidden'.
The 'some-keys.kasp' zone is similar to 'pregenerated.kasp', except
only some keys have been pregenerated.
Write python-based tests for the many test cases from the kasp system
test. These test cases all follow the same pattern:
- Wait until the zone is signed.
- Check the keys from the key-directory against expected properties.
- Set the expected key timings derived from when the key was created.
- Check the key timing metadata against expected timings.
- Check the 'rndc dnssec -status' output.
- Check the apex is signed correctly.
- Check a subdomain is signed correctly.
- Verify that the zone is DNSSEC correct.
Remove the counterparts for the newly added test from the kasp shell
tests script.
Add a new test which gets an answer for a delegated zone, then
checks whether the 'stale-answer-client-timeout 0' mode (i.e. the
'stalefirst' mode) works for it.
The offical EDNS option name for "UL" is "UPDATE-LEASE". We now
emit "UPDATE-LEASE" instead of "UL", when printing messages, but
"UL" has been retained as an alias on the command line.
Update leases consist of 1 or 2 values, LEASE and KEY-LEASE. These
components are now emitted separately so they can be easily extracted
from YAML output. Tests have been added to check YAML correctness.
When rendering text, such as domain names or the EXTRA-TEXT
field of the EDE option, backslashes and quotation marks must
be escaped to ensure that the emitted message is valid YAML.
isctest.util was not imported so file_contents_contain could not be
found. And rename verify_keys to check_keys because it asserts in
isctest.run.retry_with_timeout.
This converts a special characters test case, a max-zone-ttl error
check, and two cases of insecure zones.
We no longer assert for having more than one DNSKEY and/or RRSIG
records. If the zone is insecure, this is no longer always true. And
we already check for the expected number of records in the
check_dnskeys/check_signatures functions.
This commit deals with converting the dynamic zone test cases to
pytest. The tests for 'inline-signing.kasp' are similar to the default
case, so these are added to 'test_kasp_default'.
Unfortunately I need to add sleep calls in between freezing, updating,
and thawing a zone. Without it the intermittent failures are too
frequent.
This commit deals with converting the test cases related to the default
dnssec-policy.
This requires a new method 'check_update_is_signed'. This method will
be used in future tests as well, and checks if an expected record is
in the zone and is properly signed.
Remove the counterparts for the newly added test from the kasp shell
tests script.
Convert the first couple of tests from 'kasp/tests.sh' to
'kasp/tests_kasp.py', those are test cases related to 'dnssec-keygen'
and 'dnssec-settime'.
For this, we also add a new KeyProperties method,
'policy_to_properties', that takes a list of strings which represent
the keys according to the dnssec-policy and the expected key states.
Many of the system tests now use jinja2 template engine. Adding jinja2
as a hard dependency is preferable than potentially silently skipping
many system tests.
These setup.sh scripts only do templating and copying files. Both of
these can be replaced with either jinja templates, or using plain files.
Since each test invocation creates its own temporary directory, copying
files to ensure a "clean" state is no longer necessary.
In cases where named writes some content to the files, a jinja template
can be used instead of a plain file to avoid an artifact check which
would detect a change to a git-tracked file.
All these setup files only use copy_setports function which can be done
with jinja2 templates instead -- simply by renaming the .in files to
.j2, without any other changes. The pytest runner will render these
templates during test setup without any need for an additional script.
DNS COOKIE and NSID should also be being processed when returning
BADVERS. Check that this has actually occured by looking for the
cookie and nsid in the response.
The original check_pid() always returned 0 on FreeBSD, even if the
process was still running. This makes the "verifying that named checks
for conflicting named processes" check fail on FreeBSD with TSAN.
Replace the custom DNS servers used in the "forward" system test with
new code based on the isctest.asyncserver module.
For ans6, instead of configuring the responses to send at runtime, set
them up when the server is started. Make sure the server supports
toggling response sending at runtime to enable simulating forwarder
timeouts as required by one of the checks.
For ans11, put most of the responses to be provided by that server into
a zone file, only retaining code modifying zone-based answers in the
form of a response handler, to improve code readability. Use explicit
domain names instead of variables as that server only handles a single
domain and fixed strings improve readability in this case. Make sure
the server supports toggling response sending at runtime to enable
simulating forwarder timeouts as required by one of the checks.
Migrate sendcmd() and its uses to the new way of sending control queries
to custom servers used in system tests.
Implement a reusable control command that makes it possible to
dynamically disable/enable sending responses to clients. This is a
typical use case for custom DNS servers employed in various BIND 9
system tests.
Some BIND 9 system tests need to dynamically change custom server
behavior at runtime. Existing custom servers typically use a separate
TCP socket for listening to control commands, which mimics what named
does, but adds extra complexity to the custom server's networking code
for no gain (given the purpose at hand). There is also no common way of
performing typical runtime actions (like toggling response dropping)
across all custom servers.
Instead of listening on a separate TCP socket in asyncserver.py, make it
detect DNS queries to a "magic" domain ("_control.") on the same port as
the one it uses for receiving "production" DNS traffic. This enables
query/response logging code to be reused for control traffic, clearly
denotes behavior changes in packet captures, facilitates implementing
commonly used features as reusable chunks of code (by making them "own"
distinct subdomains of the control domain), voids the need for separate
tools sending control commands, and enables using DNS facilities for
returning information to the user (e.g. RCODE for status codes, TXT
records for additional information, etc.).
With multiple and/or dynamically managed response handlers at play, it
becomes useful for debugging purposes to know which handler (if any) was
used for preparing each response sent by the server. Add debug logs
providing that information. Make class name the default string
representation of each response handler to prettify logs.
Extend AsyncDnsServer.install_response_handler() so that the provided
response handler can be inserted at the beginning of the handler list.
This enables installing a response handler that takes priority over all
previously installed handlers.
Add a new method, AsyncDnsServer.uninstall_response_handler(), which
enables removing a previously installed response handler.
Together, these two methods provide full control over the response
handler list at runtime.
Add a main() function to all custom servers based on isctest.asyncserver
and move server startup code there. This prevents redefining variables
from outer scope in custom server code as it evolves.
Prevent custom servers based on asyncserver.py from exiting prematurely
due to unhandled exceptions raised as a result of attempting to parse
invalid queries sent by clients.
The StreamWriter.wait_closed() method was introduced in Python 3.7, so
attempting to use it with Python 3.6 raises an exception. This has not
been noticed before because awaiting StreamWriter.wait_closed() is the
last action taken for each TCP connection and unhandled exceptions were
not causing the scripts based on AsyncServer to exit prematurely until
the previous commit.
As per Python documentation [1], awaiting StreamWriter.wait_closed()
after calling StreamWriter.close() is recommended, but not mandatory, so
try to use it if it is available, without taking any fallback action in
case it isn't.
[1] https://docs.python.org/3.13/library/asyncio-stream.html#asyncio.StreamWriter.close
Uncaught exceptions raised by tasks running on event loops are not
handled by Python's default exception handler, so they do not cause
scripts to die immediately with a non-zero exit code. Set up an
exception handler for AsyncServer code that makes any uncaught exception
the result of the Future that the top-level coroutine awaits. This
ensures that any uncaught exceptions cause scripts based on AsyncServer
to immediately exit with an error, enabling the system test framework to
fail tests in which custom servers encounter unforeseen problems.
For the kasp tests we need a new utility that can retrieve a list of
Keys from a given directory, belonging to a specific zone. This is
'keydir_to_keylist' and is the replacement of 'kasp.sh:get_keyids()'.
'next_key_event_eqauls' is a method to check when the next key event is
scheduled, needed for the rollover tests, and is the equivalent of shell
script 'check_next_key_event'.
This commit introduces replacements for the 'check_keys' and
'check_keytimes' from the shell test library. 'check_keys' is renamed
to 'verify_keys' because it does not assert.
For that, we introduce more functions for the class Key. The
'match_properties' function is used in 'verify_keys' to see if a set of
KeyProperties match the Key. This speficially ignores timing metadata.
The function resembles what is in 'kasp.sh:check_key()'.
The 'match_timingmetadata' function is used in 'check_keytimes' to see
if the timing metadata of a set of KeyProperties match the Key. The
values are checked in all three key files (except if the private key is
not available (set with properties["private"]), or if it is a legacy key
(set with properties["legacy"]).
An additional check function is added, to check if the key relationships
are set correctly. It follows a similar pattern as 'check_keytimes'. If
"Predecessor" and/or "Successor" are expected to be set in the state
file, this function checks so, and also verifies that they are not set
if they should not be.
Because we want to check the metadata in all three files, a new
value in the Key class is added: 'privatefile'. The 'get_metadata'
function is adapted so that we can also check metadata in other files.
Introduce methods to easily retrieve the TTL and public DNSKEY record
from the keyfile.
When checking if the CDS is equal to the expected value, use the DNSKEY
TTL instead of hardcoded 3600.
In isctest.kasp, introduce a new class 'KeyProperties' that can be used
to check if a Key matches expected properties. Properties are for the
time being divided in three parts: 'properties' that contain some
attributes of the expected properties (such as are we dealing with a
legacy key, is the private key available, and other things that do not
fit the metadata exactly), 'metadata' that contains expected metadata
(such as 'Algorithm', 'Lifetime', 'Length'), and 'timing', which is
metadata of the class KeyTimingMetadata.
The 'default()' method fills in the expected properties for the default
DNSSEC policy.
The 'set_expected_times()' sets the expected timing metadata, derived
from when the key was created. This method can take an offset to push
the expected timing metadata a duration in the future or back into the
past. If 'pregenerated=True', derive the expected timing metadata from
the 'Publish' metadata derived from the keyfile, rather than from the
'Created' metadata.
The calculations in the 'Ipub', 'IpubC' and 'Iret' methods are derived
from RFC 7583 DNSSEC Key Rollover Timing Considerations.
This is the first step of converting the kasp system test to pytest.
Well, perhaps not the first, because earlier the ksr system test was
already converted to pytest and then the `isctest/kasp.py` library
was already introduced. Lots of this code can be reused for the kasp
pytest code.
First of all, 'check_file_contents_equal' is moved out of the ksr test
and into the 'check' library. This feels the most appropriate place
for this function to be reused in other tests. Then, 'keystr_to_keylist'
is moved to the 'kasp' library.
Introduce two new methods that are unused in this point of time, but
we are going to need them for the kasp system test. 'zone_contains'
will be used to check if a signature exists in the zonefile. This way
we can tell whether the signature has been reused or refreshed.
'file_contents_contain' will be used to check if the comment and public
DNSKEY record in the keyfile is correct.
Reformat the section to be more consistent with the rest of the rndc
documentation and avoid using :program: directive which would needlessly
break rst links.
use the ISC_LIST_FOREACH pattern in places where lists had
been iterated using a different pattern from the typical
`for` loop: for example, `while (!ISC_LIST_EMPTY(...))` or
`while ((e = ISC_LIST_HEAD(...)) != NULL)`.
the pattern `for (x = ISC_LIST_HEAD(...); x != NULL; ISC_LIST_NEXT(...)`
has been changed to `ISC_LIST_FOREACH` throughout BIND, except in a few
cases where the change would be excessively complex.
in most cases this was a straightforward change. in some places,
however, the list element variable was referenced after the loop
ended, and the code was refactored to avoid this necessity.
also, because `ISC_LIST_FOREACH` uses typeof(list.head) to declare
the list elements, compilation failures can occur if the list object
has a `const` qualifier. some `const` qualifiers have been removed
from function parameters to avoid this problem, and where that was not
possible, `UNCONST` was used.
ISC_LIST_FOREACH and related macros now use 'typeof(list.head)' to
declare the list elements automatically; the caller no longer needs
to do so.
ISC_LIST_FOREACH_SAFE also now implicitly declares its own 'next'
pointer, so it only needs three parameters instead of four.
A recent change to the dnssec system test depended on a file
that is only in the source tree, not in the build tree, and was
therefore not available in out-of-tree builds.
libsystemd, despite being useful, adds a huge surface area for just
using the sd_notify API. libsystemd's surface has been exploited in the
past [1].
Implement the systemd notification protocol by hand since it is just
sending newline-delimited datagrams to a UNIX socket. The code shouldn't
need more attention in the future since the notification protocol is
covered under systemd's stability promise [2].
We don't need to support VSOCK-backed service notifications since they
are only intended for virtual machine inits.
[1]: https://www.openwall.com/lists/oss-security/2024/03/29/4
[2]: https://systemd.io/PORTABILITY_AND_STABILITY/