When performing QNAME minimization, named now sends an NS
query for the original QNAME, to prevent the parent zone from
receiving the QTYPE.
For example, when looking up example.com/A, we now send NS queries
for both com and example.com before sending the A query to the
servers for example.com. Previously, an A query for example.com
would have been sent to the servers for com.
Several system tests needed to be adjusted for the new query pattern:
- Some queries in the serve-stale test were sent to the wrong server.
- The synthfromdnssec test could fail due to timing issues; this
has been addressed by adding a 1-second delay.
- The cookie test could fail due to the a change in the count of
TSIG records received in the "check that missing COOKIE with a
valid TSIG signed response does not trigger TCP fallback" test case.
- The GL #4652 regression test case in the chain system test depends
on a particular query order, which no longer occurs when QNAME
minimization is active. We now disable qname-minimization
for that test.
Changed the default value for 'allow-transfer' to 'none'; zone
transfers now require explicit authorization.
Updated all system tests to specify an allow-transfer ACL when needed.
Revised the ARM to specify that the default is 'none'.
The lock-file configuration (both from configuration file and -X
argument to named) has better alternatives nowadays. Modern process
supervisor should be used to ensure that a single named process is
running on a given configuration.
Alternatively, it's possible to wrap the named with flock(1).
All changes in this commit were automated using the command:
shfmt -w -i 2 -ci -bn . $(find . -name "*.sh.in")
By default, only *.sh and files without extension are checked, so
*.sh.in files have to be added additionally. (See mvdan/sh#944)
The old name "common" clashes with the convention of system test
directory naming. It appears as a system test directory, but it only
contains helper files.
To reduce confusion and to allow automatic detection of issues with
possibly missing test files, rename the helper directory to "_common".
The leading underscore indicates the directory is different and the its
name can no longer be confused with regular system test directories.
Ensure all shell system tests are executed with the errexit option set.
This prevents unchecked return codes from commands in the test from
interfering with the tests, since any failures need to be handled
explicitly.
the default value of dnssec-validation is 'auto', which causes
a server to send a key refresh query to the root zone when starting
up. this is undesirable behavior in system tests, so this commit
sets dnssec-validation to either 'yes' or 'no' in all tests where
it had not previously been set.
this change had the mostly-harmless side effect of changing the cached
trust level of unvalidated answer data from 'answer' to 'authanswer',
which caused a few test cases in which dumped cache data was examined in
the serve-stale system test to fail. those test cases have now been
updated to expect 'authanswer'.
In order to run the shell system tests, the pytest runner has to pick
them up somehow. Adding an extra python file with a single function
for the shell tests for each system test proved to be the most
compatible way of running the shell tests across older pytest/xdist
versions.
Modify the legacy run.sh script to ignore these pytest-runner specific
glue files when executing tests written in pytest.
Remove parsing the configuration options 'alt-transfer-source',
'alt-transfer-source-v6', and 'use-alt-transfer-source', and remove
the corresponding code that implements the feature.
The system test should never attempt to start or stop any other server
than those that belong to that system test. Therefore, it is not
necessary to specify the system test name in function calls.
Additionally, this makes it possible to run the test inside a
differently named directory, as its name is automatically detected with
the $SYSTESTDIR variable. This enables running the system tests inside a
temporary directory.
Direct use of stop.pl was replaced with a more systematic approach to
use stop_servers helper function.
The checkbashisms script reports errors like this one:
script util/check-line-length.sh does not appear to have a #! interpreter line;
you may get strange results
Instead of expecting that telemetry check has already finished,
wait for it for maximum of three seconds, because named is run with
-tat=3, so the telemetry check must happen with 3 second window.
Co-authored-by: Evan Hunt <each@isc.org>
This commit converts the license handling to adhere to the REUSE
specification. It specifically:
1. Adds used licnses to LICENSES/ directory
2. Add "isc" template for adding the copyright boilerplate
3. Changes all source files to include copyright and SPDX license
header, this includes all the C sources, documentation, zone files,
configuration files. There are notes in the doc/dev/copyrights file
on how to add correct headers to the new files.
4. Handle the rest that can't be modified via .reuse/dep5 file. The
binary (or otherwise unmodifiable) files could have license places
next to them in <foo>.license file, but this would lead to cluttered
repository and most of the files handled in the .reuse/dep5 file are
system test files.
The ISC_MEM_DEBUGSIZE and ISC_MEM_DEBUGCTX did sanity checks on matching
size and memory context on the memory returned to the allocator. Those
will no longer needed when most of the allocator will be replaced with
jemalloc.
Any CI job:
- I:dnssec:file dnssec/ns1/trusted.keys not removed
- I:rpzrecurse:file rpzrecurse/ns3/named.run.prev not removed
system:clang:freebsd11:amd64:
- I:tkey:file tkey/ns1/named.conf-e not removed
system:gcc:sid:amd64:
- I🪞file mirror/ns3/_default.nzf not removed
system:gcc:xenial:amd64:
- I:rpzextra:file rpzextra/.cache/v/cache/lastfailed not removed
- I:rpzrecurse:file rpzrecurse/ns3/named.run.prev not removed
- I:shutdown:file shutdown/.cache/v/cache/lastfailed not removed
In order to lower the amount of memory allocated at startup by named
instances used in the BIND system test suite, set the default value of
"max-cache-size" for these to 2 megabytes. The purpose of this change
is to prevent named instances (or even entire virtual machines) from
getting killed by the operating system on the test host due to excessive
memory use.
Remove all "max-cache-size" statements from named configuration files
used in system tests ("checkconf" notwithstanding) to prevent confusion
as the "-T maxcachesize=..." command line option takes precedence over
configuration files.
The $SYSTEMTESTTOP shell variable if often set to .. in various shell
scripts inside bin/tests/system/, but most of the time it is only
used one line later, while sourcing conf.sh. This hardly improves
code readability.
$SYSTEMTESTTOP is also used for the purpose of referencing
scripts/files living in bin/tests/system/, but given that the
variable is always set to a short, relative path, we can drop it and
replace all of its occurrences with the relative path without adversely
affecting code readability.
this changes most visble uses of master/slave terminology in tests.sh
and most uses of 'type master' or 'type slave' in named.conf files.
files in the checkconf test were not updated in order to confirm that
the old syntax still works. rpzrecurse was also left mostly unchanged
to avoid interference with DNSRPS.
as "type primary" is preferred over "type master" now, it makes
sense to make "primaries" available as a synonym too.
added a correctness check to ensure "primaries" and "masters"
cannot both be used in the same zone.
The rewrite of BIND 9 build system is a large work and cannot be reasonable
split into separate merge requests. Addition of the automake has a positive
effect on the readability and maintainability of the build system as it is more
declarative, it allows conditional and we are able to drop all of the custom
make code that BIND 9 developed over the years to overcome the deficiencies of
autoconf + custom Makefile.in files.
This squashed commit contains following changes:
- conversion (or rather fresh rewrite) of all Makefile.in files to Makefile.am
by using automake
- the libtool is now properly integrated with automake (the way we used it
was rather hackish as the only official way how to use libtool is via
automake
- the dynamic module loading was rewritten from a custom patchwork to libtool's
libltdl (which includes the patchwork to support module loading on different
systems internally)
- conversion of the unit test executor from kyua to automake parallel driver
- conversion of the system test executor from custom make/shell to automake
parallel driver
- The GSSAPI has been refactored, the custom SPNEGO on the basis that
all major KRB5/GSSAPI (mit-krb5, heimdal and Windows) implementations
support SPNEGO mechanism.
- The various defunct tests from bin/tests have been removed:
bin/tests/optional and bin/tests/pkcs11
- The text files generated from the MD files have been removed, the
MarkDown has been designed to be readable by both humans and computers
- The xsl header is now generated by a simple sed command instead of
perl helper
- The <irs/platform.h> header has been removed
- cleanups of configure.ac script to make it more simpler, addition of multiple
macros (there's still work to be done though)
- the tarball can now be prepared with `make dist`
- the system tests are partially able to run in oot build
Here's a list of unfinished work that needs to be completed in subsequent merge
requests:
- `make distcheck` doesn't yet work (because of system tests oot run is not yet
finished)
- documentation is not yet built, there's a different merge request with docbook
to sphinx-build rst conversion that needs to be rebased and adapted on top of
the automake
- msvc build is non functional yet and we need to decide whether we will just
cross-compile bind9 using mingw-w64 or fix the msvc build
- contributed dlz modules are not included neither in the autoconf nor automake
- ns__client_request() is now called by netmgr with an isc_nmhandle_t
parameter. The handle can then be permanently associated with an
ns_client object.
- The task manager is paused so that isc_task events that may be
triggred during client processing will not fire until after the netmgr is
finished with it. Before any asynchronous event, the client MUST
call isc_nmhandle_ref(client->handle), to prevent the client from
being reset and reused while waiting for an event to process. When
the asynchronous event is complete, isc_nmhandle_unref(client->handle)
must be called to ensure the handle can be reused later.
- reference counting of client objects is now handled in the nmhandle
object. when the handle references drop to zero, the client's "reset"
callback is used to free temporary resources and reiniialize it,
whereupon the handle (and associated client) is placed in the
"inactive handles" queue. when the sysstem is shutdown and the
handles are cleaned up, the client's "put" callback is called to free
all remaining resources.
- because client allocation is no longer handled in the same way,
the '-T clienttest' option has now been removed and is no longer
used by any system tests.
- the unit tests require wrapping the isc_nmhandle_unref() function;
when LD_WRAP is supported, that is used. otherwise we link a
libwrap.so interposer library and use that.
The "mirror" system test expects all dig queries (including recursive
ones) to be responded to within 1 second, which turns out to be overly
optimistic in certain cases and leads to false positives being
triggered. Increase dig query timeout used throughout the "mirror"
system test to 2 seconds in order to alleviate the issue.
Currently, ns3 in the "mirror" system test sends trust anchor telemetry
queries every second as it is started with "-T tat=1". Given the number
of trust anchors configured on ns3 (9), TAT-related traffic clutters up
log files, hindering troubleshooting efforts. Increase TAT query
interval to 3 seconds in order to alleviate the issue.
Note that the interval chosen cannot be much higher if intermittent test
failures are to be avoided: TAT queries are only sent after the
configured number of seconds passes since resolver startup. Quick
experiments show that even on contemporary hardware, ns3 should be
running for at least 5 seconds before it is first shut down, so a
3-second TAT query interval seems to be a reasonable, future-proof
compromise. Ensure the relevant check is performed before ns3 is first
shut down to emphasize this trade-off and make it more clear by what
time TAT queries are expected to be sent.
When a mirror zone is verified, the 'ignore_kskflag' argument passed to
dns_zoneverify_dnssec() is set to false. This means that in order for
its verification to succeed, a mirror zone needs to have at least one
key with the SEP bit set configured as a trust anchor. This brings no
security benefit and prevents zones signed only using keys without the
SEP bit set from being mirrored, so change the value of the
'ignore_kskflag' argument passed to dns_zoneverify_dnssec() to true.
The "mirror" system test checks whether log messages announcing a mirror
zone coming into effect are emitted properly. However, the helper
functions responsible for waiting for zone transfers and zone loading to
complete do not wait for these exact log messages, but rather for other
ones preceding them, which introduces a possibility of false positives.
This problem cannot be addressed by just changing the log message to
look for because the test still needs to discern between transferring a
zone and loading a zone.
Add two new log messages at debug level 99 (which is what named
instances used in system tests are configured with) that are to be
emitted after the log messages announcing a mirror zone coming into
effect. Tweak the aforementioned helper functions to only return once
the log messages they originally looked for are followed by the newly
added log messages. This reliably prevents races when looking for
"mirror zone is now in use" log messages and also enables a workaround
previously put into place in the "mirror" system test to be reverted.
In the "mirror" system test, ns3 periodically sends trust anchor
telemetry queries to ns1 and ns2. It may thus happen that for some
non-recursive queries for names inside mirror zones which are not yet
loaded, ns3 will be able to synthesize a negative answer from the cached
records it obtained from trust anchor telemetry responses. In such
cases, NXDOMAIN responses will be sent with the root zone SOA in the
AUTHORITY section. Since the root zone used in the "mirror" system test
has the same serial number as ns2/verify.db.in and zone verification
checks look for the specified serial numbers anywhere in the answer, the
test could be broken if different zone names were used.
The +noauth dig option could be used to address this weakness, but that
would prevent entire responses from being stored for later inspection,
which in turn would hamper troubleshooting test failures. Instead, use
a different serial number for ns2/verify.db.in than for any other zone
used in the "mirror" system test and check the number of records in the
ANSWER section of each response.
Due to the way the "mirror" system test is set up, it is impossible for
the "verify-unsigned" and "verify-untrusted" zones to contain any serial
number other than the original one present in ns2/verify.db.in. Thus,
using presence of a different serial number in the SOA records of these
zones as an indicator of problems with mirror zone verification is
wrong. Look for the original zone serial number instead as that is the
one that will be returned by ns3 if one of the aforementioned zones is
successfully verified.
Log a message if a mirror zone becomes unusable for the resolver (most
usually due to the zone's expiration timer firing). Ensure that
verification failures do not cause a mirror zone to be unloaded
(instead, its last successfully verified version should be served if it
is available).
Log a message when a mirror zone is successfully loaded from disk and
subsequently verified.
This could have been implemented in a simpler manner, e.g. by modifying
an earlier code branch inside zone_postload() which checks whether the
zone already has a database attached and calls attachdb() if it does
not, but that would cause the resulting logs to indicate that a mirror
zone comes into effect before the "loaded serial ..." message is logged,
which would be confusing.
Tweak some existing sed commands used in the "mirror" system test to
ensure that separate test cases comprising it do not break each other.
Log a message when a mirror zone is successfully transferred and
verified, but only if no database for that zone was yet loaded at the
time the transfer was initiated.
This could have been implemented in a simpler manner, e.g. by modifying
zone_replacedb(), but (due to the calling order of the functions
involved in finalizing a zone transfer) that would cause the resulting
logs to suggest that a mirror zone comes into effect before its transfer
is finished, which would be confusing given the nature of mirror zones
and the fact that no message is logged upon successful mirror zone
verification.
Once the dns_zone_replacedb() call in axfr_finalize() is made, it
becomes impossible to determine whether the transferred zone had a
database attached before the transfer was started. Thus, that check is
instead performed when the transfer context is first created and the
result of this check is passed around in a field of the transfer context
structure. If it turns out to be desired, the relevant log message is
then emitted just before the transfer context is freed.
Taking this approach means that the log message added by this commit is
not timed precisely, i.e. mirror zone data may be used before this
message is logged. However, that can only be fixed by logging the
message inside zone_replacedb(), which causes arguably more dire issues
discussed above.
dns_zone_isloaded() is not used to double-check that transferred zone
data was correctly loaded since the 'shutdown_result' field of the zone
transfer context will not be set to ISC_R_SUCCESS unless axfr_finalize()
succeeds (and that in turn will not happen unless dns_zone_replacedb()
succeeds).
Use a zone's 'type' field instead of the value of its DNS_ZONEOPT_MIRROR
option for checking whether it is a mirror zone. This makes said zone
option and its associated helper function, dns_zone_mirror(), redundant,
so remove them. Remove a check specific to mirror zones from
named_zone_reusable() since another check in that function ensures that
changing a zone's type prevents it from being reused during
reconfiguration.
Force ns3 to use a constant source address (10.53.0.3) when sending
transfer requests for the "initially-unavailable" zone to prevent
failures of transfers not triggered by bin/tests/system/mirror/tests.sh
from causing fallback to using a source address for which transfers of
that zone are refused throughout the entire "mirror" system test since
that might yield false positives.
When query processing hits a delegation from a locally configured zone,
an attempt may be made to look for a better answer in the cache. In
such a case, the zone-sourced delegation data is set aside and the
lookup is retried using the cache database. When that lookup is
completed, a decision is made whether the answer found in the cache is
better than the answer found in the zone.
Currently, if the zone-sourced answer turns out to be better than the
one found in the cache:
- qctx->zdb is not restored into qctx->db,
- qctx->node, holding the zone database node found, is not even saved.
Thus, in such a case both qctx->db and qctx->node will point at cache
data. This is not an issue for BIND versions which do not support
mirror zones because in these versions non-recursive queries always
cause the zone-sourced delegation to be returned and thus the
non-recursive part of query_delegation() is never reached if the
delegation is coming from a zone. With mirror zones, however,
non-recursive queries may cause cache lookups even after a zone
delegation is found. Leaving qctx->db assigned to the cache database
when query_delegation() determines that the zone-sourced delegation is
the best answer to the client's query prevents DS records from being
added to delegations coming from mirror zones. Fix this issue by
keeping the zone database and zone node in qctx while the cache is
searched for an answer and then restoring them into qctx->db and
qctx->node, respectively, if the zone-sourced delegation turns out to be
the best answer. Since this change means that qctx->zdb cannot be used
as the glue database any more as it will be reset to NULL by RESTORE(),
ensure that qctx->db is not a cache database before attaching it to
qctx->client->query.gluedb.
Furthermore, current code contains a conditional statement which
prevents a mirror zone from being used as a source of glue records.
Said statement was added to prevent assertion failures caused by
attempting to use a zone database's glue cache for finding glue for an
NS RRset coming from a cache database. However, that check is overly
strict since it completely prevents glue from being added to delegations
coming from mirror zones. With the changes described above in place,
the scenario this check was preventing can no longer happen, so remove
the aforementioned check.
If qctx->zdb is not NULL, qctx->zfname will also not be NULL;
qctx->zsigrdataset may be NULL in such a case, but query_putrdataset()
handles pointers to NULL pointers gracefully. Remove redundant
conditional expressions to make the cleanup code in query_freedata()
match the corresponding sequences of SAVE() / RESTORE() macros more
closely.
dns_view_zonecut() may associate the dns_rdataset_t structure passed to
it even if it returns a result different then ISC_R_SUCCESS. Not
handling this properly may cause a reference leak. Fix by ensuring
'nameservers' is cleaned up in all relevant failure modes.