Add this test scenario for a bug fixed a while ago. When a third key is
introduced while the previous rollover hasn't finished yet, the keymgr
could decide to remove the first two keys, because it was not checking
for an indirect dependency on the keys.
In other words, the previous bug behavior was that the first two keys
were removed from the zone too soon.
This test case checks that all three keys stay in the zone, and no keys
are removed premature after another new key has been introduced.
(cherry picked from commit 9c40cf0566)
In the kasp script, if one expected key is not found, continue checking
the other key ids, even if there is no match for the first one. This
provides a bit more information which keys mismatch and makes for
easier debugging test failures.
(cherry picked from commit 674249f66a)
Make the cds/setup.sh compatible with the workaround which relies on
testing the TSAN_OPTIONS variable which may not be set.
(cherry picked from commit 76d9873ef6)
Surround the variables which are checked whether they're executable in
double quotes. Without them, empty paths won't be properly interpreted
as not executable.
(manually picked from commit 06056c44a7)
Since delv can occasionally hang in system tests when running with TSAN
(see GL#4119), disable these tests as a workaround. Otherwise, the hung
delv process will just waste CI resources and prevent any meaningful
output from the rest of the test suite.
(cherry picked from commit fbcf37f914)
Previously, the first check silently failed, as 450 is apparently (in
the CI) the minimum output size for the dnstap output, rather than
470 which the test was expecting. Effectively, the check served as a 5
second sleep rather than waiting for the proper file size.
Additionally, check the expected file sizes and fail if expectations
aren't met.
(manually picked from commit 5f809e50b6)
On main, the minimum file size seems to 454 bytes, while on some
platforms in our CI setup for the 9.16 branch, it appears to be 450
instead.
The log message is supposed to contain the zone name which was
erroneously omitted, but didn't pop up during tests, since return code
was silently ignored.
Now it actually waits for the proper log message rather than being an
equivalent of 3 second sleep (which was also sufficient to make the test
pass, thus we detected no failure).
(cherry picked from commit 1dd4c2b9e2)
The purpose of the check is to verify the server has survived the
previous barrage of queries. This is done by sending a query and
checking we get a NOERROR response back.
Previously, that query could've been affected by a servfail cache - the
server would return a SERVFAIL answer, thus failing the check, despite
being up and running. Use version.bind txt ch query to avoid the
interference of servfail cache.
(cherry picked from commit dd7bcd2855)
The $t1 value equals $t2 due to the time elapsed between "rndc
managed-keys status" calls being equal to the normal active refresh
period (as calculated per rules listed in RFC 5011 section 2.3) minus an
"hour" (as set using -T mkeytimers). This value equality is expected to
happen on really slow machines. On our Windows CI runner, it happens
very often.
Add a test case where when priming the cache with a slow authoritative
resolver, the stale-answer-client-timeout option should not return
a delegation to the client (it should wait until an applicable answer
is found, if no entry is found in the cache).
(cherry picked from commit c3d4fd3449)
The line summarising TSAN reports was misplaced in the ASAN territory
and thus never used.
I also made core dumps, assertion failures, and TSAN reports detection
independent of each other.
(cherry picked from commit 0c4c7ddec4)
This check is too unstable on Windows. Given the bind-9.16 branch is in
security fixes-only mode, something unlikely to be investigated before
the branch goes EOL.
The set_key_default_values function hasn't been backported to bind-9.16
and produces a warning in the nsec3 system test:
tests.sh: line 234: set_key_default_values: command not found
Previously, if an exception would happen inside the `with` block, the
error handler would wait indefinitely for the process to end. That would
never happen, since the termination signal was never sent to named and
the test would get stuck.
Using the try-finally block ensures that the named process is always
killed and any exception or errors will be handled gracefully.
(cherry picked from commit 836e6ed284)
Improve code readability by splitting the test into more functions. Some
could be re-used later on for more general-purpose subprocess handling
or named checks.
(cherry picked from commit 9d64f1c1ed)
Empty-non-terminal NSEC records where not always removed when the
delegations generating them where removed via update. Check that
they now are.
(cherry picked from commit ad91a70d15)
At the time of test number (19), there were 10 "sending packet to
10.53.0.7" lines in the "legacy/ns1/named.run" file; usually, only seven
are present:
I:legacy:checking recursive lookup to edns 512 + no tcp server does not cause query loops (19)
I:legacy:ns1 sent 10 queries to ns7, expected less than 10
I:legacy:failed
Those three can be attributed to tests "8", "10", and "18", where the
dig of "resolution_fails()" retried after a timeout to succeed with
"status: SERVFAIL" subsequently, as seen in each of
dig.out.test{8,10,18} files.
;; communications error to 10.53.0.1#13093: timed out
; <<>> DiG 9.19.12-dev <<>> -p 13093 +tcp @10.53.0.1 edns512-notcp. TXT
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 5368
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
This retry is unnecessary because "resolution_fails()" considers timeout
a positive result.
(cherry picked from commit e05460c813)
Depending upon when the directory is sampled there may be 2
(oldest version removed and rename / reopen is in progresss) or
3 old versions of the log file.
(cherry picked from commit f485bb19c0)
due to comparing logfile suffixes as 32 bit rather than 64 bit
integers, logfiles with timestamp suffixes that should have been
removed when rolling could be left in place. this has been fixed.
(cherry picked from commit 9a9e906306)
the logfileconfig system test did not conform to the style of
other tests, and was difficult to read and maintain. it has
been cleaned up and simplifeid in several ways:
- named.args used when appropriate so that named can be started with
specified command line arguments, instead of having it launched
directly from tests.sh
- unused root zone removed from named configuration
- an existing directory used instead of using 'mkdir' to create one
- dnssec-validation disabled to stop the server sending unnecessary
queries
incidental fix: removed leftover debugging printfs from logconf.c.
(manually picked from commit 76baed3343)
The serve-stale system test was intermittently failing due to a timing
issue:
I:serve-stale:check stale data.example TXT was refreshed...
I:serve-stale:failed
The RRset is refreshed, however, it first checks for an expected log
line, prior checking that the stale data.example TXT was refreshed
(using dig). This log line is there to ensure the record is actually
refreshed before we start querying again. Alternatively we could just
retry_quiet 10 <wait for dig output matches expectations>. It would
lower the chances for intermittent test failures, since there is no
longer a "check for log line, sleep one second if check fails, check
for log line, ...", prior to the check.
(cherry picked from commit 0bf36da305)
Avoid creating any temporary files in the current workdir.
Additional/changing files in the bin/tests/system directory are
problematic for pytest/xdist collection phase, which assumes the list of
files doesn't change between the collection phase of the main pytest
thread and the subsequent collection phase of the xdist worker threads.
Since the testcrypto.sh is also called during pytest initialization
through conf.sh.common (to detect feature support), this could
occasionally cause a race condition when the list of files would be
different for the main pytest thread and the xdist worker.
(cherry picked from commit 61330a7863)
Previously, an AXFR request would be issued every second while waiting
for the zone to be signed. This might've been the cause of issues in CI
where many tests are running in parallel and any extra load may increase
test instability.
Instead, check for the last NSEC record to have a signature before
commencing the AXFR request to check the zone has been fully signed.
Also increase the time for the zone signing to a total of 60+10 seconds
up from the previous 30.
(cherry picked from commit 3291c891f6)
There's no point in continuing the dupsigs test if a failure is
detected. End the test early to avoid wasting time and resources.
(cherry picked from commit ad647dca13)
Ensure messages from dupsigs system test end up in its log rather than
stdout. Previously, the output was hard to debug when running the tests
in parallel and messages wouldn't end up in the dupsigs.log.
(cherry picked from commit cbe2559f37)
In v9.16, the number of expected signatures for the fully signed dupsigs
zone is 1009 rather than 1008, since there is one extra DNSKEY
signature. The test itself checks for the correct number, but the
barrier which waits for the zone to be fully signed doesn't.
In practice, this had the effect of always waiting the full 30 seconds
for the zone to be signed. Afterwards, the wait barrier would fail.
However, the return code isn't handled, so the test would proceed and
succeed anyway, since 30 seconds was enough time for the zone to get
fully signed.
This issue was introduced during a backport in commit
4840d6f9c9.
The dnstap system test fails intermittently, and it appears to be
a timing issue - adding a short delay after running 'fstrm_capture',
and before running 'dnstap -reopen' improves the situation from
50% failures (5 out of 10 times) to 0% failures (0 out of 20 times),
tested locally.
The reason is that 'fstrm_capture' is executed in the background,
and due to OS scheduling and other factors, the listener socket
may not be ready when the following command runs and tells 'named'
to (re)open it.
(cherry picked from commit fa686fcea5)
The trick is to configure a duplicate zone, which comes after the
catalog zone, where the duplicate zone is an existing member zone.
In that scenario, all the zones which come before the "faulty" zone
in the configuration file will fail to be reverted to the previous
version of the view after a reconfiguration error, and in this
particular case that will result in an assertion failure when the
catalog zone update is initiated, because it will be still tied to
the new version of the view, which was dismissed.
(cherry picked from commit 93c4f382f4)
Add the 'ixfr-from-differences yes;' option to trigger a failed
zone postload operation when a zone is updated but the serial
number is not updated, then issue two successive 'rndc reload'
commands to trigger the bug, which causes an assertion failure.
(cherry picked from commit a73b67456e)
The faulty "DLZ" configuration triggers a reconfiguration failure
in such a place where view reverting code is covered.
(cherry picked from commit 95f4bac002)
stop.pl tries to stop ns1 via rndc but fails to find rndc.conf because
the logfileconfig test code is unexpectedly executed from the
logfileconfig/ns1/ directory. Instead of stopping ns1 with rndc, it
waits for 30 seconds and then terminates ns1 with the TERM signal.
I:logfileconfig:testing default logfile using named -L file (9)
rndc: ../common/rndc.conf does not exist
Stopping ns1 with rndc was recently inadvertently introduced in
172826bfa8. Stop ns1 with the TERM signal
directly, as we did before.
Reproduce the assertion by configuring a 'named' resolver with
'recursive-clients 10;' configuration option and running 20
queries is parallel.
Also tweak the 'ans2/ans.pl' to simulate a 50ms network latency
when qname starts with "latency". This makes sure that queries
running in parallel don't get served immediately, thus allowing
the configured recursive clients quota limitation to be activated.
(cherry picked from commit 4b52b0b4a9)
bin/tests/system/get_algorithms.py:225:4: R1720: Unnecessary "else" after "raise", remove the "else" and de-indent the code inside it (no-else-raise)
(cherry picked from commit 8064ac6bec)
Include MD5 feature detection in featuretest tool and use it in some
places. When RHEL distribution or Fedora ELN is in FIPS mode, then MD5
algorithm is unavailable completely and even hmac-md5 algorithm usage
will always fail. Work that around by checking MD5 works and if not,
skipping its usage.
Those changes were dragged as downstream patch bind-9.11-fips-tests.patch
in Fedora and RHEL.
(cherry picked from commit 6ad794a8cd)
This adds an island of trust that is reachable from the root
where the trust anchors are added to island.conf.
This add an island of trust that is not reachable from the root
where the trust anchors are added to private.conf.
(cherry picked from commit 41bdb5b9fe)
Occasionally, the allotted 10 seconds for the "running" line to appear
in log after named is started proved insufficient in CI, especially
during increased load. Give named up to 60 seconds to start up to
mitigate this issue.
(cherry picked from commit b8bb4233e8)
the nsupdate system test was intermittently failing due to the update
quota not being exceeded when it should have been. this is most likely
a timing issue: the client is sending updates too slowly, or the server
is processing them too quickly, for the quota to fill. this commit
attempts to make that the failure less likely by increasing the number
of update transactions from 10 to 20.
(cherry picked from commit 06b1faf068)
Following deleting the root trust anchor and reconfiguring the
server it takes some time to for trust anchor to appear in 'rndc
managed-keys status' output. Retry several times.
(cherry picked from commit 71dbd09796)
Set the DS state after issuing 'rndc dnssec -checkds'. If the DS
was published, it should go in RUMOURED state, regardless whether it
is already safe to do so according to the state machine.
Leaving it in HIDDEN (or if it was magically already in OMNIPRESENT or
UNRETENTIVE) would allow for easy shoot in the foot situations.
Similar, if the DS was withdrawn, the state should be set to
UNRETENTIVE. Leaving it in OMNIPRESENT (or RUMOURED/HIDDEN)
would also allow for easy shoot in the foot situations.
(cherry picked from commit ee42f66fbe)
The condition was accidentally reversed during refactoring in
9730ac4c56 . It would result in skipped
tests on builds with proper support and false negatives on builds
without proper feature support.
Credit for reporting the issue and the fix goes to Stanislav Levin.
(cherry picked from commit 473cb530f4)
verify that updates are refused when the client is disallowed by
allow-query, and update forwarding is refused when the client is
is disallowed by update-forwarding.
verify that "too many DNS UPDATEs" appears in the log file when too
many simultaneous updates are processing.
(cherry picked from commit b91339b80e)