Commit graph

10520 commits

Author SHA1 Message Date
Michal Nowak
43947e7198
Add test for CVE-2023-3341
(cherry picked from commit 7d1834b250)
2023-10-20 16:26:49 +02:00
Michal Nowak
531c96b8ed
Update the source code formatting using clang-format-17
2023-10-17 17:56:31 +02:00
Mark Andrews
652847b06e Document that reloading happens asynchronously
(cherry picked from commit e33dbd0cbd)
2023-09-27 10:14:35 +10:00
Mark Andrews
0e55d94653 Wait for the test zone to finish re-loading
'rndc thaw' initiates asynchronous loading of all the zones
similar to 'rndc load'.  Wait for the test zone's load to
complete before testing that it is updatable again.

(cherry picked from commit 5b3238aa85)
2023-09-26 14:12:36 +10:00
Mark Andrews
50bdf976c7 Check RRSIG covered type in negative cache entry
The covered type previously displayed as TYPE0 when it should
have reflected the records that were actually covered.

(cherry picked from commit 8ce359652a)
2023-09-18 16:40:55 +10:00
Tom Krizek
d941dd151c
Skip checkds test on Python<3.7
checkds test requires the capture_output argument for subprocess.run()
which was added in Python 3.7.

(cherry picked from commit 0361233b3d)
2023-08-23 14:51:25 +02:00
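
The commit above skips the test on old interpreters. As a minimal sketch of the underlying version dependency, the helper below (hypothetical, not taken from the BIND test suite) gates on the same condition and falls back to explicit pipes, which is an alternative to skipping rather than what the commit does:

```python
import subprocess
import sys

# capture_output was added to subprocess.run() in Python 3.7.
CAPTURE_OUTPUT_SUPPORTED = sys.version_info >= (3, 7)


def run_captured(cmd):
    """Run cmd and return (returncode, stdout, stderr) as text.

    Hypothetical helper for illustration only.
    """
    if CAPTURE_OUTPUT_SUPPORTED:
        proc = subprocess.run(cmd, capture_output=True, universal_newlines=True)
    else:
        # Equivalent behaviour on older interpreters.
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
    return proc.returncode, proc.stdout, proc.stderr
```

The same flag could feed `pytest.mark.skipif(not CAPTURE_OUTPUT_SUPPORTED, reason=...)` when skipping is preferred.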
Michal Nowak
2d951a900b
Mark test_send_timeout as flaky
In some cases, BIND is not fast enough to fill the send buffer and
manages to answer all queries, contrary to what the test expects.
Repeat the check up to 3 times to limit this test instability.

(cherry picked from commit 681b23c398)
2023-08-22 08:55:36 +02:00
Tom Krizek
f84e0f4ad0
Add custom flaky decorator to handle unstable tests
If the flaky plugin for pytest is available, use its decorator to
support re-running unstable tests. In case the package is missing,
execute the test as usual without attempts to re-run it in case of
failure.

This is mostly intended to increase the test stability in CI. Using a
custom decorator enables us to keep the flaky package as an optional
dependency.

(cherry picked from commit 5b703de733)
2023-08-22 08:55:36 +02:00
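
An optional-dependency decorator of the kind described above might look roughly like this. It is a sketch only: the wrapper name and defaults are assumptions, not the code from the commit. It relies on the `flaky` package exposing a `flaky` decorator that accepts `max_runs` and `min_passes`.

```python
try:
    # Optional dependency: the flaky plugin for pytest.
    from flaky import flaky as _flaky
except ImportError:
    _flaky = None


def flaky(max_runs=2, min_passes=1):
    """Return a decorator that re-runs unstable tests if flaky is installed.

    Without the package, the decorated test is returned unchanged and
    runs once, as usual.
    """
    if _flaky is not None:
        return _flaky(max_runs=max_runs, min_passes=min_passes)

    def passthrough(func):
        return func

    return passthrough


@flaky(max_runs=3)
def test_example():
    # Placeholder test body for illustration.
    assert 1 + 1 == 2
```

Keeping the import inside a `try` block means the test suite still collects and runs when the package is missing.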
Mark Andrews
a9ab5c215f Add sleeps so that the modification time changes
The mkeys system test could fail because root zone was resigned
within the same second as it was previously signed causing reloads
to fail.  Add delays to the test to prevent this.

(cherry picked from commit 40e3529379)
2023-08-15 09:39:53 +10:00
Mark Andrews
71c2bb46e5 Set ret=1 if _wait_for_stats does not succeed
Errors getting transfer statistics from named.run were not detected
as ret was not set to one if there hadn't been a success after looping
for a while.

(cherry picked from commit 287a1ac09b)
2023-08-08 23:42:23 +00:00
Michał Kępień
5698a6a905
Convert setup.pl into static configurations
The setup.pl script has been replaced with static BIND configurations,
and in the course of this change, the unused ns1 server was removed.
This enhancement has greatly improved the overall test's readability.

(cherry picked from commit 08a8906cfc)
2023-08-08 18:01:17 +02:00
Michal Nowak
b939741cfd
Rewrite stress test to pytest
The shell version of the test was completed only after all DNS zone
updates were sent, even if the BIND server crashed while processing
them, leading to prolonged execution and potential hang in the CI
environment. The Python rewrite of the test ensures that DNS update
tasks finish within five minutes of starting, irrespective of a BIND
crash possibility or DNS zone updates not finishing in time.

(cherry picked from commit ecd7b30d0a)
2023-08-08 18:01:17 +02:00
Michał Kępień
268b4392ba
Wait until fstrm_capture is ready
The fstrm_capture utility is started in the background during the
"dnstap" system test.  Consequently, "rndc dnstap-reopen" and similar
commands may be executed before fstrm_capture starts listening on the
Unix domain socket it is configured to receive dnstap data on.  This
results in the dnstap data sent to that socket in the meantime to be
lost; while the fstrm writer thread is able to recover from such a
scenario within a couple of seconds (by reopening the configured dnstap
destination itself), only one write attempt is made for data
successfully queued to the writer thread, so dnstap frames can still be
lost in the process.  This may happen during the "dnstap" system test,
leading to the dnstap output file being empty, which in turn causes the
test to fail.

Fix by waiting until fstrm_capture starts listening on the Unix domain
socket it is configured to use before asking named to reopen the
configured dnstap destination.  Since various fstrm_capture versions log
different messages when the listening socket is set up, wait for a
common string that works for all fstrm_capture versions released to
date.  Add a few extra debug messages indicating test progress and make
the test fail if the expected fstrm_capture log message is not generated
within 10 seconds.

(cherry picked from commit 26d3d97f12)
2023-08-07 14:01:51 +02:00
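
The wait described above (poll a log for a message, give up after 10 seconds) can be sketched in Python as follows. The actual test is a shell script. The function name and the message being searched for are assumptions made for illustration, since the commit message does not state the common string it waits for.

```python
import time


def wait_for_log_line(path, needle, timeout=10.0, interval=0.1):
    """Return True once `needle` appears in the file at `path`.

    Returns False if it has not appeared within `timeout` seconds.
    A missing file is treated the same as a file without the message.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with open(path, "r", errors="replace") as log:
                if any(needle in line for line in log):
                    return True
        except FileNotFoundError:
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A caller would fail the test when the function returns False, instead of proceeding to reopen the dnstap destination.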
Michał Kępień
c630efb2f1
Capture all fstrm_capture output
The fstrm_capture.out file is overwritten when the fstrm_capture utility
is restarted during the "dnstap" system test.  Use a separate output
file for each fstrm_capture instance to ensure all output produced by
that tool during the "dnstap" system test is preserved for forensic
purposes.

(cherry picked from commit bd2941fc72)
2023-08-07 14:01:51 +02:00
Mark Andrews
dc2ea03ea2
Use sub shell to isolate environment changes
'HOME=value command' should only change HOME for command but on
some platforms this occasionally sets HOME for the rest of the
test. Explicitly isolate the environment change using a sub shell.

(cherry picked from commit 96f75bba18)
2023-08-02 10:47:36 +02:00
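
The fix above is a shell change. The same isolation idea, expressed in Python purely as an analogy, is to pass a modified copy of the environment to the child process so the parent's HOME is never touched. The helper name and the path used in the usage note are hypothetical.

```python
import os
import subprocess
import sys


def run_with_home(cmd, home):
    """Run cmd with HOME overridden for the child process only."""
    child_env = dict(os.environ)  # copy, so the parent environment is untouched
    child_env["HOME"] = home
    return subprocess.run(
        cmd,
        env=child_env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
```

For example, `run_with_home([sys.executable, "-c", "import os; print(os.environ['HOME'])"], "/tmp/isolated-home")` prints the overridden value in the child while `os.environ` in the caller stays as it was.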
Štěpán Balážik
fa8d48ee86
Fix ecdsa256 check in ecdsa system test setup
Probably by copy-paste mistake, ecdsa384 was checked twice.

(cherry picked from commit 10194baa07)
2023-07-28 09:15:17 +02:00
Tom Krizek
9abdcb23a2
Disable resolve checks under TSAN
The resolve binary is affected by GL#4119 which occasionally makes it
hang during system tests when running with TSAN. This is a workaround to
avoid wasting resources caused by a CI timeout for the system test tsan
jobs.

(cherry picked from commit 774b9bc629)
2023-07-26 10:08:29 +02:00
Tom Krizek
3c30f4a408
Reproducer for CVE-2023-2911
The conditions that trigger the crash:
- a stale record is in cache
- stale-answer-client-timeout is 0
- multiple clients query for the stale record, enough of them to exceed
  the recursive-clients quota
- the response from the authoritative is sufficiently delayed so that
  recursive-clients quota is exceeded first

The reproducer attempts to simulate this situation. However, it hasn't
proven to be 100 % reproducible, especially in CI. When reproducing
locally, the priming query also seems to sometimes interfere and prevent
the crash. When the reproducer is run twice, it appears to be more
reliable in reproducing the issue.

(cherry picked from commit f617512d37)
2023-07-25 10:35:09 +02:00
Tom Krizek
90e33052d2
Configure pytest to properly locate conftest.py
In pytest 7.4.0, there were some changes to how the configuration file
for pytest is located. In our case, this resulted in a failure to find
the conftest.py with the needed fixtures which then prevented our python
tests from being executed successfully.

Configure the --confcutdir to ensure it points to the system test
directory, where our conftest.py is located.

Related https://github.com/pytest-dev/pytest/pull/11043
2023-07-20 13:27:03 +02:00
Aram Sargsyan
3a807e554f Fix a bug in a utility script for the statschannel system test
Because of a typo, the fetch.pl script tries to extract the server
address from the input parameter 'a' instead of 's'. Fix the typo.

(cherry picked from commit aa7538fd38)
2023-07-19 13:27:54 +00:00
Mark Andrews
ce17cdf9cb Use absolute path to locate run.gdb
(cherry picked from commit 3f7723cdff)
2023-07-19 12:53:43 +10:00
Matthijs Mekking
80a20c9643 Add test for "three is a crowd" bug (GL #2375)
Add this test scenario for a bug fixed a while ago. When a third key is
introduced while the previous rollover hasn't finished yet, the keymgr
could decide to remove the first two keys, because it was not checking
for an indirect dependency on the keys.

In other words, the previous bug behavior was that the first two keys
were removed from the zone too soon.

This test case checks that all three keys stay in the zone, and no keys
are removed prematurely after another new key has been introduced.

(cherry picked from commit 9c40cf0566)
2023-07-06 10:30:53 +02:00
Matthijs Mekking
83dd0c85a2 Check all keys despite early failure
In the kasp script, if one expected key is not found, continue checking
the other key ids, even if there is no match for the first one.  This
provides a bit more information about which keys mismatch and makes
debugging test failures easier.

(cherry picked from commit 674249f66a)
2023-07-06 10:28:41 +02:00
Tom Krizek
4efef8cb54
Check for unset variables only after conf.sh is loaded
Make the cds/setup.sh compatible with the workaround which relies on
testing the TSAN_OPTIONS variable which may not be set.

(cherry picked from commit 76d9873ef6)
2023-06-29 14:40:09 +02:00
Tom Krizek
2020ce2010
Fix checking for executables in shell conditions in tests
Surround the variables which are checked whether they're executable in
double quotes. Without them, empty paths won't be properly interpreted
as not executable.

(manually picked from commit 06056c44a7)
2023-06-29 13:19:47 +02:00
Tom Krizek
bd9dabc0c3
Only use delv if available in mkeys test
Check that $DELV is an executable before using it in a test.

(cherry picked from commit 384339dbba)
2023-06-29 13:16:50 +02:00
Tom Krizek
a904cd9a0e
Disable delv tests under TSAN
Since delv can occasionally hang in system tests when running with TSAN
(see GL#4119), disable these tests as a workaround. Otherwise, the hung
delv process will just waste CI resources and prevent any meaningful
output from the rest of the test suite.

(cherry picked from commit fbcf37f914)
2023-06-29 13:16:46 +02:00
Tom Krizek
0374c27fc5
Check for proper file size output in dnstap test
Previously, the first check silently failed, as 450 is apparently (in
the CI) the minimum output size for the dnstap output, rather than
470 which the test was expecting. Effectively, the check served as a 5
second sleep rather than waiting for the proper file size.

Additionally, check the expected file sizes and fail if expectations
aren't met.

(manually picked from commit 5f809e50b6)

On main, the minimum file size seems to be 454 bytes, while on some
platforms in our CI setup for the 9.16 branch, it appears to be 450
instead.
2023-06-26 14:33:43 +02:00
Tom Krizek
9cfc8da487
Check for proper log message in kasp test
The log message is supposed to contain the zone name which was
erroneously omitted, but didn't pop up during tests, since return code
was silently ignored.

Now it actually waits for the proper log message rather than being an
equivalent of 3 second sleep (which was also sufficient to make the test
pass, thus we detected no failure).

(cherry picked from commit 1dd4c2b9e2)
2023-06-26 13:08:09 +02:00
Michał Kępień
731a736a91
Add a tool for reproducing ISC SPNEGO bugs
Extend the "tsiggss" system test with reproducers for CVE-2020-8625 and
CVE-2021-25216.

(cherry picked from commit a47dc810f7)
2023-06-19 10:36:25 +02:00
Tom Krizek
328d0a1d0a
Avoid false positive in serve-stale system test check
The purpose of the check is to verify the server has survived the
previous barrage of queries. This is done by sending a query and
checking we get a NOERROR response back.

Previously, that query could've been affected by a servfail cache - the
server would return a SERVFAIL answer, thus failing the check, despite
being up and running. Use version.bind txt ch query to avoid the
interference of servfail cache.

(cherry picked from commit dd7bcd2855)
2023-06-13 14:16:44 +02:00
Michal Nowak
ca57ddf53e
Disable minimal update check with no keys on Windows
The $t1 value equals $t2 due to the time elapsed between "rndc
managed-keys status" calls being equal to the normal active refresh
period (as calculated per rules listed in RFC 5011 section 2.3) minus an
"hour" (as set using -T mkeytimers). This value equality is expected to
happen on really slow machines. On our Windows CI runner, it happens
very often.
2023-05-31 14:25:02 +02:00
Matthijs Mekking
2d5b975f3a Add serve-stale test case for GL #3950
Add a test case where when priming the cache with a slow authoritative
resolver, the stale-answer-client-timeout option should not return
a delegation to the client (it should wait until an applicable answer
is found, if no entry is found in the cache).

(cherry picked from commit c3d4fd3449)
2023-05-30 15:32:24 +02:00
Michal Nowak
f00a212cb8
TSAN summarising line was misplaced in run.sh
The line summarising TSAN reports was misplaced in the ASAN territory
and thus never used.

I also made core dumps, assertion failures, and TSAN reports detection
independent of each other.

(cherry picked from commit 0c4c7ddec4)
2023-05-19 14:55:23 +02:00
Michal Nowak
4078347d80
Disable exceeded quota check on Windows
This check is too unstable on Windows. Given the bind-9.16 branch is in
security fixes-only mode, this is unlikely to be investigated before
the branch goes EOL.
2023-05-18 16:57:55 +02:00
Michal Nowak
8cdfde8b35
Drop set_key_default_values function
The set_key_default_values function hasn't been backported to bind-9.16
and produces a warning in the nsec3 system test:

    tests.sh: line 234: set_key_default_values: command not found
2023-05-12 10:55:08 +02:00
Michal Nowak
4cc7feee24
Rewrite the ttl system test to pytest
(cherry picked from commit 0c05c3d97b)
2023-05-11 16:53:04 +02:00
Michal Nowak
ab9d43f814
Update sources to Clang 16 formatting
2023-05-11 14:26:14 +02:00
Tom Krizek
fd1d359965
Ensure named always terminates in the shutdown test
Previously, if an exception would happen inside the `with` block, the
error handler would wait indefinitely for the process to end. That would
never happen, since the termination signal was never sent to named and
the test would get stuck.

Using the try-finally block ensures that the named process is always
killed and any exception or errors will be handled gracefully.

(cherry picked from commit 836e6ed284)
2023-05-10 13:32:55 +02:00
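
The try-finally pattern described above can be sketched as follows. This is not the code from the shutdown test: a sleeping Python child stands in for named, and the function name is hypothetical.

```python
import subprocess
import sys


def run_check_against_process(check):
    """Start a stand-in long-running process, run `check`, always clean up.

    `check` receives the Popen object. Any exception it raises still
    propagates, but only after the process has been terminated and reaped.
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    try:
        check(proc)
    finally:
        # Runs on success and on failure alike, so the process cannot
        # outlive the test and the caller never waits on it forever.
        proc.terminate()
        proc.wait(timeout=10)
```

Without the `finally` clause, an exception raised by `check` would skip the termination step and leave the child running.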
Tom Krizek
7bbc38da95
Refactor shutdown test into more helper functions
Improve code readability by splitting the test into more functions. Some
could be re-used later on for more general-purpose subprocess handling
or named checks.

(cherry picked from commit 9d64f1c1ed)
2023-05-10 13:32:51 +02:00
Mark Andrews
aa73fda9bf Check removal of ENT when subdomains are removed
Empty-non-terminal NSEC records were not always removed when the
delegations generating them were removed via update. Check that
they now are.

(cherry picked from commit ad91a70d15)
2023-04-25 06:51:11 +01:00
Michal Nowak
6d5249c50c
Do not retry in resolution_fails() on timeout
At the time of test number (19), there were 10 "sending packet to
10.53.0.7" lines in the "legacy/ns1/named.run" file; usually, only seven
are present:

    I:legacy:checking recursive lookup to edns 512 + no tcp server does not cause query loops (19)
    I:legacy:ns1 sent 10 queries to ns7, expected less than 10
    I:legacy:failed

Those three can be attributed to tests "8", "10", and "18", where the
dig of "resolution_fails()" retried after a timeout to succeed with
"status: SERVFAIL" subsequently, as seen in each of
dig.out.test{8,10,18} files.

    ;; communications error to 10.53.0.1#13093: timed out

    ; <<>> DiG 9.19.12-dev <<>> -p 13093 +tcp @10.53.0.1 edns512-notcp. TXT
    ; (1 server found)
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 5368
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

This retry is unnecessary because "resolution_fails()" considers timeout
a positive result.

(cherry picked from commit e05460c813)
2023-04-06 11:46:57 +02:00
Petr Špaček
f8e094bab8
Make rndc dnstap -roll docs easier to read
(cherry picked from commit 2897a45644)
2023-04-05 16:10:20 +02:00
Petr Menšík
f316f581fd
Make it obvious -roll number is optional
The manual page was updated to indicate it, but rndc -h still displays it
as a required parameter. Make it look optional.

(cherry picked from commit 0627214568)
2023-04-05 14:06:26 +02:00
Matthijs Mekking
9e702807cb Fix kasp system test bug
This test was succeeding for the wrong reason (policy not found, rather
than bad key length).

(cherry picked from commit 106497b011)
2023-03-31 10:34:49 +02:00
Mark Andrews
c3f5ef24b2 Accept either 2 or 3 old versions of log file
Depending upon when the directory is sampled there may be 2
(oldest version removed and rename / reopen is in progress) or
3 old versions of the log file.

(cherry picked from commit f485bb19c0)
2023-03-28 10:03:33 +00:00
Evan Hunt
6e422ae3ae fixed a bug in rolling timestamp logfiles
due to comparing logfile suffixes as 32 bit rather than 64 bit
integers, logfiles with timestamp suffixes that should have been
removed when rolling could be left in place. this has been fixed.

(cherry picked from commit 9a9e906306)
2023-03-28 10:03:33 +00:00
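
To illustrate why a 32-bit comparison can misorder large numeric suffixes, the sketch below assumes the narrower comparison keeps only the low 32 bits of each value. That is an assumption made for illustration: the actual C code is not shown here and may have differed in detail. The sample values are hypothetical and chosen to straddle the 32-bit boundary.

```python
MASK32 = 0xFFFFFFFF


def is_older_32bit(a, b):
    """Compare suffixes after truncating both to their low 32 bits."""
    return (a & MASK32) < (b & MASK32)


def is_older_64bit(a, b):
    """Compare suffixes at full width."""
    return a < b


older = 2**32 - 1  # hypothetical suffix just below the 32-bit boundary
newer = 2**32      # hypothetical suffix just above it

print(is_older_64bit(older, newer))  # full-width comparison
print(is_older_32bit(older, newer))  # truncated comparison
```

With these values the full-width comparison orders the suffixes correctly, while the truncated one sees `newer` as 0 and gets the order backwards, which is how a file due for removal could be left in place.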
Evan Hunt
e2f7f63448 rewrite logfileconfig system test
the logfileconfig system test did not conform to the style of
other tests, and was difficult to read and maintain. it has
been cleaned up and simplified in several ways:

- named.args used when appropriate so that named can be started with
  specified command line arguments, instead of having it launched
  directly from tests.sh
- unused root zone removed from named configuration
- an existing directory used instead of using 'mkdir' to create one
- dnssec-validation disabled to stop the server sending unnecessary
  queries

incidental fix: removed leftover debugging printfs from logconf.c.

(manually picked from commit 76baed3343)
2023-03-28 10:03:33 +00:00
Matthijs Mekking
6a97848791 Update serve-stale system test
The serve-stale system test was intermittently failing due to a timing
issue:

    I:serve-stale:check stale data.example TXT was refreshed...
    I:serve-stale:failed

The RRset is refreshed, however, it first checks for an expected log
line, prior to checking that the stale data.example TXT was refreshed
(using dig). This log line is there to ensure the record is actually
refreshed before we start querying again. Alternatively we could just
retry_quiet 10 <wait for dig output matches expectations>. It would
lower the chances for intermittent test failures, since there is no
longer a "check for log line, sleep one second if check fails, check
for log line, ...", prior to the check.

(cherry picked from commit 0bf36da305)
2023-03-27 08:21:54 +00:00
Tom Krizek
d631ecdde7
testcrypto.sh: run in TMPDIR if possible
Avoid creating any temporary files in the current workdir.

Additional/changing files in the bin/tests/system directory are
problematic for pytest/xdist collection phase, which assumes the list of
files doesn't change between the collection phase of the main pytest
thread and the subsequent collection phase of the xdist worker threads.

Since the testcrypto.sh is also called during pytest initialization
through conf.sh.common (to detect feature support), this could
occasionally cause a race condition when the list of files would be
different for the main pytest thread and the xdist worker.

(cherry picked from commit 61330a7863)
2023-03-23 17:17:59 +01:00