'rndc thaw' initiates asynchrous loading of all the zones
similar to 'rndc load'. Wait for the test zone's load to
complete before testing that it is updatable again.
(cherry picked from commit 5b3238aa85)
The covered type previously displayed as TYPE0 when it should
have reflected the records that was actually covered.
(cherry picked from commit 8ce359652a)
In some cases, BIND is not fast enough to fill the send buffer and
manages to answer all queries, contrary to what the test expects.
Repeat the check up to 3 times to limit this test instability.
(cherry picked from commit 681b23c398)
If the flaky plugin for pytest is available, use its decorator to
support re-running unstable tests. In case the package is missing,
execute the test as usual without attempts to re-run it in case of
failure.
This is mostly intended to increase the test stability in CI. Using a
custom decorator enables us to keep the flaky package as an optional
dependency.
(cherry picked from commit 5b703de733)
The mkeys system test could fail because root zone was resigned
within the same second as it was previously signed causing reloads
to fail. Add delays to the test to prevent this.
(cherry picked from commit 40e3529379)
Errors getting transfer statistics from named.run where not detected
as ret was not set to one if there hadn't been a success after looping
for a while.
(cherry picked from commit 287a1ac09b)
The setup.pl script has been replaced with static BIND configurations,
and in the course of this change, the unused ns1 server was removed.
This enhancement has greatly improved the overall test's readability.
(cherry picked from commit 08a8906cfc)
The shell version of the test was completed only after all DNS zone
updates were sent, even if the BIND server crashed while processing
them, leading to prolonged execution and potential hang in the CI
environment. The Python rewrite of the test ensures that DNS update
tasks finish within five minutes of starting, irrespective of a BIND
crash possibility or DNS zone updates not finishing in time.
(cherry picked from commit ecd7b30d0a)
The fstrm_capture utility is started in the background during the
"dnstap" system test. Consequently, "rndc dnstap-reopen" and similar
commands may be executed before fstrm_capture starts listening on the
Unix domain socket it is configured to receive dnstap data on. This
results in the dnstap data sent to that socket in the meantime to be
lost; while the fstrm writer thread is able to recover from such a
scenario within a couple of seconds (by reopening the configured dnstap
destination itself), only one write attempt is made for data
successfully queued to the writer thread, so dnstap frames can still be
lost in the process. This may happen during the "dnstap" system test,
leading to the dnstap output file being empty, which in turn causes the
test to fail.
Fix by waiting until fstrm_capture starts listening on the Unix domain
socket it is configured to use before asking named to reopen the
configured dnstap destination. Since various fstrm_capture versions log
different messages when the listening socket is set up, wait for a
common string that works for all fstrm_capture versions released to
date. Add a few extra debug messages indicating test progress and make
the test fail if the expected fstrm_capture log message is not generated
within 10 seconds.
(cherry picked from commit 26d3d97f12)
The fstrm_capture.out file is overwritten when the fstrm_capture utility
is restarted during the "dnstap" system test. Use a separate output
file for each fstrm_capture instance to ensure all output produced by
that tool during the "dnstap" system test is preserved for forensic
purposes.
(cherry picked from commit bd2941fc72)
'HOME=value command' should only change HOME for command but on
some platforms this occasionally sets HOME for the rest of the
test. Explicitly isolate the enviroment change using a sub shell.
(cherry picked from commit 96f75bba18)
The resolve binary is affected by GL#4119 which occassionally makes it
hand during system tests when running with TSAN. This is a workaround to
avoid wasting resources caused by a CI timeout for the system test tsan
jobs.
(cherry picked from commit 774b9bc629)
The conditions that trigger the crash:
- a stale record is in cache
- stale-answer-client-timeout is 0
- multiple clients query for the stale record, enough of them to exceed
the recursive-clients quota
- the response from the authoritative is sufficiently delayed so that
recursive-clients quota is exceeded first
The reproducer attempts to simulate this situation. However, it hasn't
proven to be 100 % reproducible, especially in CI. When reproducing
locally, the priming query also seems to sometimes interfere and prevent
the crash. When the reproducer is ran twice, it appears to be more
reliable in reproducing the issue.
(cherry picked from commit f617512d37)
In pytest 7.4.0, there were some changes to how the configuration file
for pytest is located. In our case, this resulted in a failure to find
the conftest.py with the needed fixtures which then prevented our python
tests from being executed successfully.
Configure the --confcutdir to ensure it points to the system test
directory, where our conftest.py is located.
Related https://github.com/pytest-dev/pytest/pull/11043
Because of a typo, the fetch.pl script tries to extract the server
address from the input parameter 'a' instead of 's'. Fix the typo.
(cherry picked from commit aa7538fd38)
Add this test scenario for a bug fixed a while ago. When a third key is
introduced while the previous rollover hasn't finished yet, the keymgr
could decide to remove the first two keys, because it was not checking
for an indirect dependency on the keys.
In other words, the previous bug behavior was that the first two keys
were removed from the zone too soon.
This test case checks that all three keys stay in the zone, and no keys
are removed premature after another new key has been introduced.
(cherry picked from commit 9c40cf0566)
In the kasp script, if one expected key is not found, continue checking
the other key ids, even if there is no match for the first one. This
provides a bit more information which keys mismatch and makes for
easier debugging test failures.
(cherry picked from commit 674249f66a)
Make the cds/setup.sh compatible with the workaround which relies on
testing the TSAN_OPTIONS variable which may not be set.
(cherry picked from commit 76d9873ef6)
Surround the variables which are checked whether they're executable in
double quotes. Without them, empty paths won't be properly interpreted
as not executable.
(manually picked from commit 06056c44a7)
Since delv can occasionally hang in system tests when running with TSAN
(see GL#4119), disable these tests as a workaround. Otherwise, the hung
delv process will just waste CI resources and prevent any meaningful
output from the rest of the test suite.
(cherry picked from commit fbcf37f914)
Previously, the first check silently failed, as 450 is apparently (in
the CI) the minimum output size for the dnstap output, rather than
470 which the test was expecting. Effectively, the check served as a 5
second sleep rather than waiting for the proper file size.
Additionally, check the expected file sizes and fail if expectations
aren't met.
(manually picked from commit 5f809e50b6)
On main, the minimum file size seems to 454 bytes, while on some
platforms in our CI setup for the 9.16 branch, it appears to be 450
instead.
The log message is supposed to contain the zone name which was
erroneously omitted, but didn't pop up during tests, since return code
was silently ignored.
Now it actually waits for the proper log message rather than being an
equivalent of 3 second sleep (which was also sufficient to make the test
pass, thus we detected no failure).
(cherry picked from commit 1dd4c2b9e2)
The purpose of the check is to verify the server has survived the
previous barrage of queries. This is done by sending a query and
checking we get a NOERROR response back.
Previously, that query could've been affected by a servfail cache - the
server would return a SERVFAIL answer, thus failing the check, despite
being up and running. Use version.bind txt ch query to avoid the
interference of servfail cache.
(cherry picked from commit dd7bcd2855)
The $t1 value equals $t2 due to the time elapsed between "rndc
managed-keys status" calls being equal to the normal active refresh
period (as calculated per rules listed in RFC 5011 section 2.3) minus an
"hour" (as set using -T mkeytimers). This value equality is expected to
happen on really slow machines. On our Windows CI runner, it happens
very often.
Add a test case where when priming the cache with a slow authoritative
resolver, the stale-answer-client-timeout option should not return
a delegation to the client (it should wait until an applicable answer
is found, if no entry is found in the cache).
(cherry picked from commit c3d4fd3449)
The line summarising TSAN reports was misplaced in the ASAN territory
and thus never used.
I also made core dumps, assertion failures, and TSAN reports detection
independent of each other.
(cherry picked from commit 0c4c7ddec4)
This check is too unstable on Windows. Given the bind-9.16 branch is in
security fixes-only mode, something unlikely to be investigated before
the branch goes EOL.
The set_key_default_values function hasn't been backported to bind-9.16
and produces a warning in the nsec3 system test:
tests.sh: line 234: set_key_default_values: command not found
Previously, if an exception would happen inside the `with` block, the
error handler would wait indefinitely for the process to end. That would
never happen, since the termination signal was never sent to named and
the test would get stuck.
Using the try-finally block ensures that the named process is always
killed and any exception or errors will be handled gracefully.
(cherry picked from commit 836e6ed284)
Improve code readability by splitting the test into more functions. Some
could be re-used later on for more general-purpose subprocess handling
or named checks.
(cherry picked from commit 9d64f1c1ed)
Empty-non-terminal NSEC records where not always removed when the
delegations generating them where removed via update. Check that
they now are.
(cherry picked from commit ad91a70d15)
At the time of test number (19), there were 10 "sending packet to
10.53.0.7" lines in the "legacy/ns1/named.run" file; usually, only seven
are present:
I:legacy:checking recursive lookup to edns 512 + no tcp server does not cause query loops (19)
I:legacy:ns1 sent 10 queries to ns7, expected less than 10
I:legacy:failed
Those three can be attributed to tests "8", "10", and "18", where the
dig of "resolution_fails()" retried after a timeout to succeed with
"status: SERVFAIL" subsequently, as seen in each of
dig.out.test{8,10,18} files.
;; communications error to 10.53.0.1#13093: timed out
; <<>> DiG 9.19.12-dev <<>> -p 13093 +tcp @10.53.0.1 edns512-notcp. TXT
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 5368
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
This retry is unnecessary because "resolution_fails()" considers timeout
a positive result.
(cherry picked from commit e05460c813)
Manual page were updated to indicate it, but rndc -h still displays it
as required parameter. Make it look like optional.
(cherry picked from commit 0627214568)
Depending upon when the directory is sampled there may be 2
(oldest version removed and rename / reopen is in progresss) or
3 old versions of the log file.
(cherry picked from commit f485bb19c0)
due to comparing logfile suffixes as 32 bit rather than 64 bit
integers, logfiles with timestamp suffixes that should have been
removed when rolling could be left in place. this has been fixed.
(cherry picked from commit 9a9e906306)
the logfileconfig system test did not conform to the style of
other tests, and was difficult to read and maintain. it has
been cleaned up and simplifeid in several ways:
- named.args used when appropriate so that named can be started with
specified command line arguments, instead of having it launched
directly from tests.sh
- unused root zone removed from named configuration
- an existing directory used instead of using 'mkdir' to create one
- dnssec-validation disabled to stop the server sending unnecessary
queries
incidental fix: removed leftover debugging printfs from logconf.c.
(manually picked from commit 76baed3343)
The serve-stale system test was intermittently failing due to a timing
issue:
I:serve-stale:check stale data.example TXT was refreshed...
I:serve-stale:failed
The RRset is refreshed, however, it first checks for an expected log
line, prior checking that the stale data.example TXT was refreshed
(using dig). This log line is there to ensure the record is actually
refreshed before we start querying again. Alternatively we could just
retry_quiet 10 <wait for dig output matches expectations>. It would
lower the chances for intermittent test failures, since there is no
longer a "check for log line, sleep one second if check fails, check
for log line, ...", prior to the check.
(cherry picked from commit 0bf36da305)
Avoid creating any temporary files in the current workdir.
Additional/changing files in the bin/tests/system directory are
problematic for pytest/xdist collection phase, which assumes the list of
files doesn't change between the collection phase of the main pytest
thread and the subsequent collection phase of the xdist worker threads.
Since the testcrypto.sh is also called during pytest initialization
through conf.sh.common (to detect feature support), this could
occasionally cause a race condition when the list of files would be
different for the main pytest thread and the xdist worker.
(cherry picked from commit 61330a7863)