Commit graph

43749 commits

Author SHA1 Message Date
Ondřej Surý
7682bc21a9
Rewrite dns_adb LRU to SIEVE
The dns_adb cleaning is a little bit muddled, as it mixes the "TTL"
based cleaning (.expire_v4 and .expire_v6 for adbname, .expires for
adbentry) with overmem cleaning.

Rewrite the LRU-based cleaning to use the SIEVE algorithm and to do
overmem cleaning only, with a requirement to always clean up at least
twice the size of the newly added entry.
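The SIEVE eviction scheme named above can be sketched in C. This is a minimal sketch under stated assumptions, not the dns_adb code: `entry_t`, `sieve_t` and the `sieve_*` functions are hypothetical, entries are assumed to be heap-allocated, and `size` is a stand-in for an entry's memory footprint. A hit only sets a visited bit. Eviction moves a "hand" from the oldest entry toward the newest, clearing visited bits and evicting the first unvisited entry. The overmem routine keeps evicting until it has freed at least twice the size of the entry about to be added.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical entry type; not the real dns_adbname_t/dns_adbentry_t. */
typedef struct entry entry_t;
struct entry {
	entry_t *newer; /* toward the head (most recently inserted) */
	entry_t *older; /* toward the tail (least recently inserted) */
	size_t size;    /* memory accounted to this entry */
	bool visited;   /* set on a hit, cleared as the hand passes */
};

typedef struct {
	entry_t *head;
	entry_t *tail;
	entry_t *hand; /* NULL means "start from the tail" */
	size_t inuse;  /* total size of all cached entries */
} sieve_t;

/* New entries always go to the head, unvisited. */
static void
sieve_insert(sieve_t *s, entry_t *e) {
	e->visited = false;
	e->newer = NULL;
	e->older = s->head;
	if (s->head != NULL) {
		s->head->newer = e;
	}
	s->head = e;
	if (s->tail == NULL) {
		s->tail = e;
	}
	s->inuse += e->size;
}

/* Unlike LRU, a hit does not move the entry; it only sets one bit. */
static void
sieve_hit(entry_t *e) {
	e->visited = true;
}

static void
sieve_unlink(sieve_t *s, entry_t *e) {
	if (s->hand == e) {
		s->hand = e->newer; /* never leave the hand dangling */
	}
	if (e->newer != NULL) {
		e->newer->older = e->older;
	} else {
		s->head = e->older;
	}
	if (e->older != NULL) {
		e->older->newer = e->newer;
	} else {
		s->tail = e->newer;
	}
	e->newer = e->older = NULL;
	assert(s->inuse >= e->size);
	s->inuse -= e->size;
}

/* Walk the hand toward the head, wrapping to the tail; evict the first
 * unvisited entry.  Visited entries get their bit cleared (second chance). */
static entry_t *
sieve_evict(sieve_t *s) {
	if (s->tail == NULL) {
		return NULL;
	}
	entry_t *e = (s->hand != NULL) ? s->hand : s->tail;
	while (e->visited) {
		e->visited = false;
		e = (e->newer != NULL) ? e->newer : s->tail;
	}
	s->hand = e->newer; /* resume here on the next eviction */
	sieve_unlink(s, e);
	return e;
}

/* Overmem cleaning: free at least twice the size of the new entry. */
static size_t
sieve_overmem_cleanup(sieve_t *s, size_t newsize) {
	size_t freed = 0;
	while (freed < 2 * newsize) {
		entry_t *victim = sieve_evict(s);
		if (victim == NULL) {
			break; /* cache is empty */
		}
		freed += victim->size;
		free(victim);
	}
	return freed;
}
```

In this sketch a caller that is over its memory limit would call `sieve_overmem_cleanup()` with the new entry's size before calling `sieve_insert()`. Because hits never relink entries, lookups avoid the list manipulation that an LRU needs.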
2025-07-09 21:22:47 +02:00
Alessio Podda
e0d1d936de chg: dev: Replace per-zone lock buckets with global buckets
Qpzone employs a locking strategy where rwlocks are grouped into
buckets, and each zone gets 17 buckets.
This strategy is suboptimal in two ways:
 - If named is serving a single zone or a zone is the majority of the
   traffic, this strategy pretty much guarantees contention when using
   more than a dozen threads.
 - If named is serving many small zones, it causes substantial memory
   usage.

This commit switches the locking to a global table initialized at start
time. This should have three effects:
 - Performance should improve in the single zone case, since now we are
   selecting from a bigger pool of locks.
 - Memory consumption should go down significantly in the many zone
   cases.
 - Performance should not degrade substantially in the many zone cases.
   The reason for this is that, while we could have substantially more
   zones than locks, we can query/edit only O(num threads) at the same
   time. So by making the global table much bigger than the expected
   number of threads, we can limit contention.

Merge branch 'alessio/global-qpzone-lock-table' into 'main'

See merge request isc-projects/bind9!10446
2025-07-09 14:17:02 +00:00
Alessio Podda
25daa047d4 Replace per-zone lock buckets with global buckets
Qpzone employs a locking strategy where rwlocks are grouped into
buckets, and each zone gets 17 buckets.
This strategy is suboptimal in two ways:
 - If named is serving a single zone or a zone is the majority of the
   traffic, this strategy pretty much guarantees contention when using
   more than a dozen threads.
 - If named is serving many small zones, it causes substantial memory
   usage.

This commit switches the locking to a global table initialized at start
time. This should have three effects:
 - Performance should improve in the single zone case, since now we are
   selecting from a bigger pool of locks.
 - Memory consumption should go down significantly in the many zone
   cases.
 - Performance should not degrade substantially in the many zone cases.
   The reason for this is that, while we could have substantially more
   zones than locks, we can query/edit only O(num threads) at the same
   time. So by making the global table much bigger than the expected
   number of threads, we can limit contention.
2025-07-09 15:27:38 +02:00
Alessio Podda
512f1d3005 chg: dev: Extract the resigning heap into a separate struct
In the current implementation, the resigning heap is part of the zone
database. This leads to a cycle, as the database has a reference to its
nodes, but each node needs a reference to the database.

This MR splits the resigning heap into its own separate struct, in order
to help break the cycle.

Merge branch 'alessio/split-qpzone-heap-from-qpdb' into 'main'

See merge request isc-projects/bind9!10706
2025-07-09 11:05:52 +00:00
Alessio Podda
0b1785ec10 Extract the resigning heap into a separate struct
In the current implementation, the resigning heap is part of the zone
database. This leads to a cycle, as the database has a reference to its
nodes, but each node needs a reference to the database.

This MR splits the resigning heap into its own separate struct, in order
to help break the cycle.
2025-07-09 12:33:18 +02:00
Alessio Podda
c2a84bb17a Abstract bucket lock selection logic
Recovering the node lock from a pointer to the header and a pointer to
the db is a common operation. This commit abstracts it away into a
function, so that the node lock selection logic may be modified more
easily.
2025-07-09 12:33:18 +02:00
Mark Andrews
720fa14670 fix: dev: Fix a possible crash when adding a zone while recursing
A query for a zone that was not yet loaded may yield an unexpected result such as a CNAME or DNAME, triggering an assertion failure. This has been fixed.

Closes #5357

Merge branch '5357-resume-qmin-cname' into 'main'

See merge request isc-projects/bind9!10562
2025-07-09 10:55:28 +10:00
Petr Menšík
d2c6966232 Add a few extra WANT_QUERYTRACE logs into resume_qmin
Optionally print a few more details that are not passed to the event in
case dns_view_findzonecut returns an unexpected result. The result would
be visible later in foundevent, but the found fname would be lost, so
print it into the log.
2025-07-09 10:13:29 +10:00
Petr Mensik
2fd3da54f9 Handle CNAME and DNAME in resume_qmin in a special way
When an authoritative zone is loaded while a query minimization query
for the same zone is already pending, the query might receive unexpected
result codes.

Normally DNS_R_CNAME would lead to query_cname after processing sent
events, but dns_view_findzonecut does not fill the CNAME target into
event->foundevent. A usual lookup via query_lookup would always have
it filled.

Ideally, we would restart the query with an unmodified search name if an
unexpected change from recursing to a local zone cut were detected.
Until dns_view_findzonecut is modified to export the zone/cache source of
the cut, at least fail queries that went into an unexpected state.
2025-07-09 10:13:29 +10:00
Michal Nowak
f0ca86be7c new: ci: Add AlmaLinux 10
Merge branch 'mnowak/add-almalinux-10' into 'main'

See merge request isc-projects/bind9!10682
2025-07-08 15:59:27 +02:00
Michal Nowak
7c5c16ea6b
Do not add AlmaLinux 9 unit and system tests in MR pipelines
2025-07-08 14:51:47 +02:00
Michal Nowak
42367082cc
Add AlmaLinux 10
2025-07-08 14:51:47 +02:00
Michał Kępień
28226f979a fix: pkg: Fix named-makejournal man page installation
The man page for :iscman:`named-makejournal` was erroneously not
installed when building from a source tarball. This has been fixed.

See #5379

Merge branch '5379-fix-named-makejournal-man-page-installation' into 'main'

See merge request isc-projects/bind9!10709
2025-07-08 14:13:33 +02:00
Aydın Mercan
ccae13b482
Add missing files for meson built manpages
These manual entries still get built and installed but get excluded from
meson's rebuild detection.
2025-07-08 13:44:03 +03:00
Michał Kępień
caa0451e28
Fix named-makejournal man page installation
The man page for named-makejournal is erroneously not installed when
building from a source tarball.  Add that man page to the appropriate
lists in the build system so that it is installed both when building
from a Git repository and from a source tarball.
2025-07-08 13:44:03 +03:00
Michal Nowak
8936237ef6 fix: ci: Ensure PYTHON is set for every parse_tsan.py invocation
System tests' after_script missed the PYTHON environment variable
setup.

    $ find -name 'tsan.*' -exec "$PYTHON" util/parse_tsan.py {} \;
    find: '': No such file or directory

Merge branch 'mnowak/fix-parse_tsan-invocation' into 'main'

See merge request isc-projects/bind9!10683
2025-07-08 12:21:47 +02:00
Michal Nowak
8f858c4f03
Ensure PYTHON is set for every parse_tsan.py invocation
System tests' after_script missed the PYTHON environment variable
setup.

    $ find -name 'tsan.*' -exec "$PYTHON" util/parse_tsan.py {} \;
    find: '': No such file or directory
2025-07-08 11:05:00 +02:00
Ondřej Surý
754d17590e fix: usr: Clean enough memory when adding new ADB names/entries under memory pressure
The ADB memory cleaning is opportunistic even when we are under
memory pressure (in the overmem condition).  Separate the opportunistic
LRU cleaning from the overmem cleaning, and make the overmem cleaning
always clean up twice the size of the newly allocated adbname/adbentry
to ensure we never allocate more memory than the assigned limit.

Merge branch 'ondrej/enforce-memory-cleanup-in-ADB-when-overmem' into 'main'

See merge request isc-projects/bind9!10637
2025-07-08 09:49:30 +02:00
Ondřej Surý
eb0ffa0d5f
When overmem, clean enough memory when adding new ADB names/entries
The purge_stale_names()/purge_stale_entries() functions are opportunistic
even when we are under memory pressure (overmem).  Separate the
opportunistic LRU cleaning from the overmem cleaning.  This makes the
stale purging much simpler, as we don't have to try that hard, and makes
the overmem cleaning always clean up double the amount of the newly
allocated ADB name/entry.
2025-07-08 05:56:19 +02:00
Mark Andrews
8420adf218 chg: usr: use native shared library extension
Use the native shared library extension when building loadable
libraries.  For most platforms this is ".so", but for Darwin it
is ".dylib".
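In C code, the choice between the two suffixes can be made at compile time. The sketch below is only an illustration. `SHLIB_EXT` and `plugin_path()` are hypothetical. The commit itself changes the build system rather than adding code like this. Checking `__APPLE__` and `__MACH__` is a common way to detect Darwin through compiler-predefined macros.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical macro: ".dylib" on Darwin, ".so" on most other platforms. */
#if defined(__APPLE__) && defined(__MACH__)
#define SHLIB_EXT ".dylib"
#else
#define SHLIB_EXT ".so"
#endif

/* Build "<dir>/<name><ext>"; returns -1 if the buffer is too small. */
static int
plugin_path(char *buf, size_t len, const char *dir, const char *name) {
	assert(buf != NULL && len > 0);
	int n = snprintf(buf, len, "%s/%s%s", dir, name, SHLIB_EXT);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```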

Closes #5375

Merge branch '5375-use-native-shared-library-extension' into 'main'

See merge request isc-projects/bind9!10588
2025-07-08 01:24:40 +10:00
Mark Andrews
28a8933690 Use native shared library extension
For most platforms this is ".so" but for Darwin it is ".dylib".
2025-07-07 23:39:44 +10:00
Nicki Křížek
02d9fbfe26 chg: test: Improve system test stability
Tweak various system tests that have been unstable in the past few weeks.

Closes #5406

Merge branch 'nicki/improve-system-test-stability' into 'main'

See merge request isc-projects/bind9!10690
2025-07-07 14:04:10 +02:00
Nicki Křížek
b98660e93e Remove unstable check from digdelv test
The code which checks for both IPv4 and IPv6 mixed usage is inherently
unstable, since the address family is chosen randomly for each
connection.

Closes #5406
2025-07-07 13:29:15 +02:00
Nicki Křížek
4c487c811d Use pytest.mark.flaky as the flaky marker
It's possible to use pytest.mark.flaky, which achieves the exact same
thing as our custom-defined isctest.mark.flaky -- attempts to rerun the
test on failure, but only if the flaky package is available.
2025-07-07 13:29:15 +02:00
Nicki Křížek
126a59cef2 Mark secondary.kasp test case as flaky on freebsd13
The test_kasp_case[secondary.kasp] can sometimes fail on freebsd13. It
appears the test gets stuck on some operation which should be very
quick, but for some reason takes at least a few seconds, causing the
cb_ixfr_is_signed() function to time out.

In one of the cases I investigated, it wasn't a query/response that
caused a timeout, but rather some operation in between. The test
attempts to read from a keyfile/statefile, but I see no reason why that
should block.

In any case, try to increase the timeout for the verification, as that
shouldn't hurt. Also allow the test to be re-run on freebsd13, as it's
likely to be caused by some odd behaviour on that platform -- the issue
doesn't appear anywhere else.
2025-07-07 13:29:15 +02:00
Nicki Křížek
34867e1693 Allow dnstap system test rerun on freebsd13
The check "unix socket message counts" sometimes fails with "dnstap
output file smaller than expected". This only happens on freebsd13 and
can't be reproduced easily. There was an attempt to decrease the
required file size in the past, but apparently, the issue can still
occur.
2025-07-07 13:29:15 +02:00
Nicki Křížek
1e0df480c7 Mark the serve_stale system test as flaky
The serve_stale test has some inherent instabilities affecting many
different checks. While the failure rate isn't too high (about four
failures in the past three weeks of nightlies), it gets ignored, because the
test has been unstable for a very long time.
2025-07-07 13:29:15 +02:00
Nicki Křížek
6755d741e4 Remove token deletion check in keyfromlabel test
This removes a leftover check which should've been removed in a prior
change (see #5244). The softhsm2 failures when attempting to delete the
token should be ignored.
2025-07-07 13:29:15 +02:00
Nicki Křížek
87ab198b73 Use proper wait in rndc test
Previously, the one-second sleep was unreliable, as it didn't properly
indicate that the rndc reconfig has been processed. The "test 'rndc
reconfig' with a broken config" check would sometimes fail under TSAN
in CI, because the previous rndc reconfig was still ongoing, and the
subsequent rndc reconfig was ignored.
2025-07-07 13:29:15 +02:00
Nicki Křížek
66f6f4bba9 Allow reruns for test_json and test_xml tests
These tests have been unstable under TSAN in the past, but it appears
that the same failure mode can happen outside of TSAN tests as well.
These tests have produced 12 failures combined in the past three weeks
in nightlies.
2025-07-07 13:29:02 +02:00
Nicki Křížek
ae932eefc5 Increase test reruns for fetchlimit
The fetchlimit test has failed 8 times in the nightly CI over the past
three weeks. That makes the overall failure rate somewhere around 1 %,
which isn't a lot, but is still annoying when lots of testing is going
on.
2025-07-07 13:29:02 +02:00
Mark Andrews
a9575a4154 fix: test: rndc test: second 'rndc reconfig' happens too soon
Rndc test "test 'rndc reconfig' with a broken config" was failing
intermittently.

Wait for 'running' to be logged rather than just using 'sleep 1' before
calling 'rndc reconfig' a second time to get the expected error message
rather than 'reconfig request ignored: already running'.
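The "wait for a log line instead of sleeping" approach can be sketched in C. BIND's system tests use their own shell helpers for this. `log_offset()` and `wait_for_log()` below are hypothetical, and the 100 ms poll interval is arbitrary. The caller records the log offset before issuing the command. Without that, a matching line logged earlier, for example at server startup, would satisfy the wait immediately.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Current end of the log; record it before triggering the action. */
static long
log_offset(const char *path) {
	FILE *f = fopen(path, "r");
	long off = 0;

	if (f != NULL) {
		if (fseek(f, 0, SEEK_END) == 0) {
			off = ftell(f);
		}
		fclose(f);
	}
	return off < 0 ? 0 : off;
}

/*
 * Poll `path` from byte `start` until a line contains `needle` or
 * `timeout_s` seconds elapse.  Lines longer than the buffer are read in
 * pieces, so a needle spanning a piece boundary could be missed.
 */
static bool
wait_for_log(const char *path, long start, const char *needle, int timeout_s) {
	const struct timespec pause = { .tv_sec = 0, .tv_nsec = 100000000L };
	const time_t deadline = time(NULL) + timeout_s;
	char line[1024];

	assert(needle != NULL);
	for (;;) {
		FILE *f = fopen(path, "r");
		if (f != NULL) {
			if (fseek(f, start, SEEK_SET) == 0) {
				while (fgets(line, sizeof(line), f) != NULL) {
					if (strstr(line, needle) != NULL) {
						fclose(f);
						return true;
					}
				}
			}
			fclose(f);
		}
		if (time(NULL) >= deadline) {
			return false;
		}
		nanosleep(&pause, NULL);
	}
}
```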

Closes #5408

Merge branch '5408-rndc-test-second-rndc-reconfig-happens-too-soon' into 'main'

See merge request isc-projects/bind9!10687
2025-07-07 12:21:58 +10:00
Mark Andrews
8b7bbda2f1 rndc test: second 'rndc reconfig' happens too soon
Rndc test "test 'rndc reconfig' with a broken config" was failing
intermittently.

Wait for 'running' to be logged rather than just using 'sleep 1' before
calling 'rndc reconfig' a second time to get the expected error message
rather than 'reconfig request ignored: already running'.
2025-07-07 11:42:10 +10:00
Štěpán Balážik
7dcc654f2c chg: test: Disable DNSSEC validation instead of enabling it with empty TAs in system tests
There are many system tests where we set `dnssec-validation yes;` only
to also set `trust-anchors { };` which effectively disables the
validation.

This MR replaces this convoluted setup with just `dnssec-validation no;`.

Merge branch 'stepan/empty-trust-anchors-in-system-tests' into 'main'

See merge request isc-projects/bind9!10684
2025-07-06 16:54:41 +00:00
Štěpán Balážik
01d1ad7988 Disable DNSSEC validation instead of enabling it with empty TAs in tests
There are many system tests where we set `dnssec-validation yes;` only
to also set `trust-anchors { };` which effectively disables the
validation.

This commit replaces this convoluted setup with just
`dnssec-validation no;`.
2025-07-06 14:18:10 +00:00
Štěpán Balážik
67916aafad new: ci: Run an additional respdiff job for merge requests and schedules
On MRs it uses the merge target as the reference.
In schedules it uses the latest released version for this branch as the reference.

This MR lays the groundwork for using respdiff on non-standard configurations (like ECS) in the public repo; see https://gitlab.isc.org/isc-private/bind9/-/merge_requests/807#note_573140.

To reduce the future hassle when maintaining the -S version, most of the work (including an added job, so we know that it actually works) is done here.

Merge branch 'stepan/respdiff-against-merge-target-or-last-release' into 'main'

See merge request isc-projects/bind9!10664
2025-07-06 13:18:53 +00:00
Štěpán Balážik
9a6e8b9190 Run an additional respdiff job for merge requests and schedules
On MRs it uses the merge target as the reference.
In schedules it uses the latest released version for this branch as the
reference.
2025-07-06 13:18:42 +00:00
Mark Andrews
571d318466 fix: dev: Separate out adbname type flags
There are three adbname flags that are used to identify different
types of adbname lookups when hashing rather than using multiple
hash tables.  Separate these into their own structure element, as these
need to be able to be read without locking the adbname structure.
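A small C sketch shows the split. Everything here is hypothetical: `adbname_t`, the `NAME_TYPE_*` values, `name_hash()` and `adbname_matches()`. FNV-1a is used only because it is a convenient hash. The point is that the type bits used for hashing sit in their own field. That field is written once, before the name is shared, and never changes afterwards, so lookups can read it without taking the name's lock. The mutable `flags` stay under the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the three hashed adbname lookup types. */
enum {
	NAME_TYPE_X = 1U << 0,
	NAME_TYPE_Y = 1U << 1,
	NAME_TYPE_Z = 1U << 2,
};

typedef struct {
	char name[256];
	unsigned int type;    /* immutable after init: read without the lock */
	pthread_mutex_t lock; /* protects the fields below */
	unsigned int flags;   /* mutable state */
} adbname_t;

/* FNV-1a over the name, then the type bits, so each type hashes apart. */
static uint32_t
name_hash(const char *name, unsigned int type) {
	uint32_t h = 2166136261u;

	for (const unsigned char *p = (const unsigned char *)name; *p != '\0'; p++) {
		h ^= *p;
		h *= 16777619u;
	}
	h ^= type;
	h *= 16777619u;
	return h;
}

static void
adbname_init(adbname_t *n, const char *name, unsigned int type) {
	assert(strlen(name) < sizeof(n->name));
	snprintf(n->name, sizeof(n->name), "%s", name);
	n->type = type;
	n->flags = 0;
	pthread_mutex_init(&n->lock, NULL);
}

/* Hash-table key comparison: touches only immutable fields, so no lock. */
static bool
adbname_matches(const adbname_t *n, const char *name, unsigned int type) {
	return n->type == type && strcmp(n->name, name) == 0;
}
```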

Closes #5404

Merge branch '5404-seperate-out-adbname-type-flags' into 'main'

See merge request isc-projects/bind9!10677
2025-07-06 23:09:13 +10:00
Mark Andrews
9158e63218 Separate out adbname flags that are hashed
There are three adbname flags that are used to identify different
types of adbname lookups when hashing rather than using multiple
hash tables.  Separate these into their own structure element, as these
need to be able to be read without locking the adbname structure.
2025-07-06 22:33:27 +10:00
Michał Kępień
e5bcbaee99 chg: doc: Set up version for BIND 9.21.11
Merge branch 'michal/set-up-version-for-bind-9.21.11' into 'main'

See merge request isc-projects/bind9!10692
2025-07-04 22:16:23 +02:00
Michał Kępień
90c5583cf4 Update BIND version to 9.21.11-dev
2025-07-04 22:08:41 +02:00
Michał Kępień
205da98524
Update BIND version for release
2025-07-04 11:34:56 +02:00
Michał Kępień
aba823170b new: doc: Prepare documentation for BIND 9.21.10
Merge branch 'michal/prepare-documentation-for-bind-9.21.10' into 'v9.21.10-release'

See merge request isc-private/bind9!816
2025-07-04 11:33:25 +02:00
Michał Kępień
405938597f
Add release note for GL !8989
2025-07-03 22:54:36 +02:00
Michał Kępień
9cdaaa6511
Tweak and reword release notes
2025-07-03 22:54:36 +02:00
Michał Kępień
beb5214586
Prepare release notes for BIND 9.21.10
2025-07-03 22:54:36 +02:00
Michał Kępień
94a463138e
Generate changelog for BIND 9.21.10
2025-07-03 22:54:36 +02:00
Andoni Duarte
7fafa0e48f [CVE-2025-40777] sec: usr: Fix a possible assertion failure when using the 'stale-answer-client-timeout 0' option
In specific circumstances the :iscman:`named` resolver process could
terminate unexpectedly when stale answers were enabled and the
``stale-answer-client-timeout 0`` configuration option was used.
This has been fixed.

See isc-projects/bind9#5372

Merge branch '5372-security-serve-stale-crash-on-insist-unreachable' into 'v9.21.10-release'

See merge request isc-private/bind9!808
2025-07-03 10:52:28 +00:00
Aram Sargsyan
3d8bd8bbf1 Reset DNS_DBFIND_STALETIMEOUT in query_lookup()
If ns__query_start() is called because of a chained query (e.g.
after encountering a CNAME), a previously set DNS_DBFIND_STALETIMEOUT
flag on the query's 'dboptions' field can cause an assertion
failure if the new query's 'stalefirst' value is not true (e.g. if the
target qname is an authoritative zone for the server). Reset the
DNS_DBFIND_STALETIMEOUT flag in the query_lookup() function before
evaluating the 'stalefirst' value, and make sure to assign a fresh
value to the 'stalefirst' flag instead of conditionally assigning it
only if the value is 'true'.
2025-07-03 11:03:34 +02:00
Nicki Křížek
3719cf53c0 chg: ci: Allow flaky unit tests to be re-run in CI
Mark unstable unit tests with the `flaky` test suite. Execute the stable
ones separately in CI. Allow the flaky ones to be re-executed once in
case they fail.

Closes #5385

Merge branch '5385-rerun-flaky-unit-tests' into 'main'

See merge request isc-projects/bind9!10665
2025-07-02 13:49:00 +02:00