'HOME=value command' should only change HOME for command but on
some platforms this occasionally sets HOME for the rest of the
test. Explicitly isolate the enviroment change using a sub shell.
(cherry picked from commit 96f75bba18)
Add a unit test to check if the overmem purging in the RBTDB is
effective when mixed size RR data is inserted into the database.
Co-authored-by: Ondřej Surý <ondrej@isc.org>
Co-authored-by: Jinmei Tatuya <jtatuya@infoblox.com>
(manually picked from 269c03831f)
The resolve binary is affected by GL#4119 which occassionally makes it
hand during system tests when running with TSAN. This is a workaround to
avoid wasting resources caused by a CI timeout for the system test tsan
jobs.
(cherry picked from commit 774b9bc629)
The conditions that trigger the crash:
- a stale record is in cache
- stale-answer-client-timeout is 0
- multiple clients query for the stale record, enough of them to exceed
the recursive-clients quota
- the response from the authoritative is sufficiently delayed so that
recursive-clients quota is exceeded first
The reproducer attempts to simulate this situation. However, it hasn't
proven to be 100 % reproducible, especially in CI. When reproducing
locally, the priming query also seems to sometimes interfere and prevent
the crash. When the reproducer is ran twice, it appears to be more
reliable in reproducing the issue.
(cherry picked from commit f617512d37)
When a primary server is not responding, mark it as temporarialy
unreachable. This will prevent too many zones queuing up on a
unreachable server and allow the refresh process to move onto
the next primary sooner once it has been so marked.
The detach (and possibly close) netmgr events can cause additional
callbacks to be called when under exclusive mode. The detach can
trigger next queued TCP query to be processed and close will call
configured close callback.
Move the detach and close netmgr events from the priority queue to the
normal queue as the detaching and closing the sockets can wait for the
exclusive mode to be over.
(cherry picked from commit c2c2ec0c96)
In pytest 7.4.0, there were some changes to how the configuration file
for pytest is located. In our case, this resulted in a failure to find
the conftest.py with the needed fixtures which then prevented our python
tests from being executed successfully.
Configure the --confcutdir to ensure it points to the system test
directory, where our conftest.py is located.
Related https://github.com/pytest-dev/pytest/pull/11043
Because of a typo, the fetch.pl script tries to extract the server
address from the input parameter 'a' instead of 's'. Fix the typo.
(cherry picked from commit aa7538fd38)
readthedocs.org is switching to in-repo configuration file in favor of
the deprecated web interface as it allows more flexibility.
This also fixes our recent doc build issues, as we're switching to a
newer Python which is required by Sphinx.
See https://blog.readthedocs.com/migrate-configuration-v2/
(cherry picked from commit a1a0ccda6e)
Add this test scenario for a bug fixed a while ago. When a third key is
introduced while the previous rollover hasn't finished yet, the keymgr
could decide to remove the first two keys, because it was not checking
for an indirect dependency on the keys.
In other words, the previous bug behavior was that the first two keys
were removed from the zone too soon.
This test case checks that all three keys stay in the zone, and no keys
are removed premature after another new key has been introduced.
(cherry picked from commit 9c40cf0566)
In the kasp script, if one expected key is not found, continue checking
the other key ids, even if there is no match for the first one. This
provides a bit more information which keys mismatch and makes for
easier debugging test failures.
(cherry picked from commit 674249f66a)
Make the cds/setup.sh compatible with the workaround which relies on
testing the TSAN_OPTIONS variable which may not be set.
(cherry picked from commit 76d9873ef6)
Surround the variables which are checked whether they're executable in
double quotes. Without them, empty paths won't be properly interpreted
as not executable.
(manually picked from commit 06056c44a7)
Since delv can occasionally hang in system tests when running with TSAN
(see GL#4119), disable these tests as a workaround. Otherwise, the hung
delv process will just waste CI resources and prevent any meaningful
output from the rest of the test suite.
(cherry picked from commit fbcf37f914)
Previously, the first check silently failed, as 450 is apparently (in
the CI) the minimum output size for the dnstap output, rather than
470 which the test was expecting. Effectively, the check served as a 5
second sleep rather than waiting for the proper file size.
Additionally, check the expected file sizes and fail if expectations
aren't met.
(manually picked from commit 5f809e50b6)
On main, the minimum file size seems to 454 bytes, while on some
platforms in our CI setup for the 9.16 branch, it appears to be 450
instead.
The log message is supposed to contain the zone name which was
erroneously omitted, but didn't pop up during tests, since return code
was silently ignored.
Now it actually waits for the proper log message rather than being an
equivalent of 3 second sleep (which was also sufficient to make the test
pass, thus we detected no failure).
(cherry picked from commit 1dd4c2b9e2)
util/parse_tsan.py builds tables of mutexes, threads, and pointers it
finds in the TSAN report provided to it as a command-line argument and
then replaces all mentions of each of these entities so that they are
numbered sequentially in the processed report. For example, this line:
Cycle in lock order graph: M0 (...) => M5 (...) => M9 (...) => M0
is expected to become:
Cycle in lock order graph: M1 (...) => M2 (...) => M3 (...) => M1
Problems arise when the gaps between mutex/thread identifiers present on
a single line are smaller than the total number of mutexes/threads found
by the script so far. For example, the following line:
Cycle in lock order graph: M0 (...) => M1 (...) => M2 (...) => M0
first gets turned into:
Cycle in lock order graph: M1 (...) => M1 (...) => M2 (...) => M1
and then into:
Cycle in lock order graph: M2 (...) => M2 (...) => M2 (...) => M2
In other words, lines like this become garbled due to information loss.
The problem stems from the fact that the numbering scheme the script
uses for identifying mutexes and threads is exactly the same as the one
used by TSAN itself. Update util/parse_tsan.py so that it uses
zero-padded numbers instead, making the "overlapping" demonstrated above
impossible.
(cherry picked from commit 7f0790c82f)
This is just a slight tweak for the respdiff CI test. The new dataset
has a different set of queries and it results in a slightly more
SERVFAILs rather than timeouts in the respdiff-long-third-party test.
In our comparison script, timeouts are not counted towards the
threshold. While the total number of differences remains roughly the
same, the different distributions of them (among SERVFAIL vs timeout)
warrants a slight bump in the threshold in order to avoid test failures.
Related isc-private/bind-qa!65
The purpose of the check is to verify the server has survived the
previous barrage of queries. This is done by sending a query and
checking we get a NOERROR response back.
Previously, that query could've been affected by a servfail cache - the
server would return a SERVFAIL answer, thus failing the check, despite
being up and running. Use version.bind txt ch query to avoid the
interference of servfail cache.
(cherry picked from commit dd7bcd2855)