When a re-configuration fails, `apply_configuration` flows jump to a
cleanup label and, at some point, leave the exclusive mode and cleanup
the viewlist. It looks fine as the viewlist is at this point only
locally known (if this is a configuration failure, this is the new view
list, if this is a success, this is the old list which has been swapped
out from the production list during the exclusive mode).
However, the view and zone initialization code enqueues job callbacks,
for instance from `dns_zone_setsigninginterval` (but there are others
cases) which will be called for the new views and zones after the
exclusive mode is over.
Depending where the configuration fails, those views and zones can be
half-configured, for instance a view might have an unfrozen resolver.
Hence, leaving the exclusive mode before cleaning up those views ans
zones will immediately called the previously enqueued callbacks and lead
to this reconfiguration-failure crash stack:
```
isc_assertion_failed
dns_resolver_createfetch
do_keyfetch
isc__async_cb
...
uv_run
loop_thread
thread_body
thread_run
start_thread
...
```
To avoid the problem, the views are now cleaned up before leaving the
exclusive mode (which also clean up the zones and enqueued callbacks).
As context, the bug was introduced by !10910 which moved the creation
(not configuration) of the view outsides of the exclusive mode. This is
a safe move (as at this point, the newly view are only known locally by
`apply_configuration`) but the re-order was wrong regarding the point
where the exclusive mode was ended (before the change, the exclusive
mode as always ended before the new view are detached).
When performing a ZSK rollover, if the new DNSKEY is omnipresent, the :option:`rndc sign` command now signs the zone completely with the successor key, replacing all zone signatures from the predecessor key with new ones.
Closes#5483
Merge branch '5483-smooth-operator-bug' into 'main'
See merge request isc-projects/bind9!10867
After a full sign we no longer have to need to take the sign delay into
account. Update the timing checks in keymgr_transition_time to determine
the start of the interval: Either the last change, or if SigPublish/
SigDelete is set. The latter case indicates a full sign was done and
so we no longer have to take the sign delay into account.
When introducing the kasp logic, a full sign of the zone did not
generate new signatures for the new active keys during a ZSK rollover.
The introduced kasp logic ensured that the rollover is performed
smoothly, as in the signatures are only replaced if the old signature
is close to expiring (depending on the signatures-refresh option).
Fix by maintaining a fullsign boolean value in the signing structure,
that will ensure the RRsets are signed with the correct key, rather
than a similar good key.
In case of a fullsign, we can also remove signatures from inactive
keys.
Remove the unused dns_zone_signwithkey function.
``disable-algorithms`` could cause DNSSEC validation failures when the parent zone was
signed with the algorithms that were being disabled for the child zone.
This has been fixed; `disable-algorithms` now works
on a whole-of-zone basis.
If the zone's name is at or below the ``disable-algorithms`` name the algorithm
is disabled for that zone, using deepest match when there are multiple
``disable-algorithms`` clauses.
Closes#5165
Merge branch '5165-use-signer-name-when-disabling-dnssec-algorithms' into 'main'
See merge request isc-projects/bind9!10837
Test that if disable-algorithms is configured on a name that is below
the zonecut, it still validates (z.secure.example).
Test that if disable-algorithms is configured on a name that is above
the zonecut, it is treated as insecure (zonecut.ent.secure.example).
When disabling algorithms, use the signer name to determine if the
algorithm is disabled or not. This allows for algorithms to be
cleanly disabled on a zone level basis. Previously, just using the
records owner name, "disable-algorithms" could impact resolution of
names that where not disabled. This does now mean that
"disable-algorithms" can not be used to disable part of a zone anymore.
ACL configuration context variables are inconsistently named as `actx`,
`ac`, or `aclconfctx`, which caused confusion during code reviews. This
commit renames all `cfg_aclconfctx_t` variables to `aclctx`, which is
short, consistent, and unambiguous.
Closes#5530
Merge branch '5530-rename-actx' into 'main'
See merge request isc-projects/bind9!11003
ACL configuration context variables are inconsistently named as `actx`,
`ac`, or `aclconfctx`, which caused confusion during code reviews. This
commit renames all `cfg_aclconfctx_t` variables to `aclctx`, which is
short, consistent, and unambiguous.
A new option `-k` is added to `named-checkconf` that allows checking the `dnssec-policy` `keys` configuration against the configured key stores. If the found key files are not in sync with the given `dnssec-policy`, the check will fail.
This is useful to run before migrating to `dnssec-policy`.
Closes#5486
Merge branch '5486-named-checkconf-dnssec-policy-key-directory' into 'main'
See merge request isc-projects/bind9!10907
The DST_ALGORITHM_FORMATSIZE constant is unused. It could be used in
dst_kasp_key_format, but instead we will use DNS_NAME_FORMATSIZE
because it is used in other places too. Clean up the unused constant.
The configuration should also take into account the built-in
DNSSEC policies when verifying the keys in the key-directory match the
given policy. Update the code accordingly and add some good and
failure test cases.
With named-checkconf -k you can check your configuration including
checking the dnssec-policy keys against the configured keystores. If
there is a mismatch in the key files versus the policy, named-checkconf
will fail. This is useful for running before migrating to dnssec-policy.
For logging purposes, introduce a function that writes the identifying
information about a policy key into a string.
Allow a dnssec key to be initialized outside the keymgr code.
Add 'log_errors' to 'cfg_kasp_fromconfig' to avoid duplicate error
logs.
There's currently an issue with the shotgun workflow that's being
investigated. Until it's resolved, there's no point in creating the
shotgun jobs as they'll just fail.
Merge branch 'nicki/ci-temporarily-disable-shotgun-jobs' into 'main'
See merge request isc-projects/bind9!11005
There's currently an issue with the shotgun workflow that's being
investigated. Until it's resolved, there's no point in creating the
shotgun jobs as they'll just fail.
With the loopmgr rewrite in 9.20, the delv issue shoud no longer happen,
thus the delv tests can be executed under TSAN as well.
Related #4119
Merge branch 'nicki/delv-reenable-under-tsan' into 'main'
See merge request isc-projects/bind9!10996
Statically linking lib{isc,dns,ns,cfg,isccc} and enabling LTO shows over 10% improvements on all almost measurements in perflab. That said, we can't use Meson's option for LTO since it would result in every binary being compiled with LTO and a great increase in compile time.
To work around it, we add a configuration option that enables LTO and static linking only for the `named` binary.
Merge branch 'alessio/meson-lto-v2' into 'main'
See merge request isc-projects/bind9!10761
Enabling LTO yields substantial performance gains on both authoritative
and resolver benchmarks.
But since LTO defers many optimization passes to link time, enabling LTO
across the board would cause an increase in compilation time, as passes
that would be run only once would need to be run for each executable.
As a compromise, this commit adds a named-lto build option, that
compiles the individual object files with the -ffat-lto-object option
and then enables LTO only for the named executable. Object files are
reused between lib*.so and the named executable.
Enabling LTO in the subsequent commit requires the file names to be
unique and having same probes.d in each of the libraries breaks this
requirement. Rename probes.d to probes-{isc,dns,ns}.d files and adjust
the includes.
Refactor a bit of `apply_configuration` by extracting (into respective dedicated function) the logic to build the keystores list, the KASP list as well as creating the view/zones and configuring those. This is the next step of MR !10895 and !10901
While the code is extracted, some global variables has been changed into a function parameters which enable to have a clear view of the dependency of the function, typically, to know if it depends on local configuration object or runtime "production" object. The end goal (not in this MR, but later on) is to move as much as possible initialization logic outside of the exclusive mode.
As a first step, latest commits move the keystores list, KASP list and view/zones creation outside of the exclusive mode. (The view/zone configuration remain in exclusive mode for now, because of a dependency to the runtime "cachelist". This is the target of a next MR.
For the record; while moving the keystores list, KASP list and view/zone creation doesn't have a significant impact on the time the exclusive mode is taken (from my experiment on a 1M small zones instance); moving `configure_views` did have a _massive_ impact (basically, the time spend in the exclusive mode is then non calculable). Configuring views outside the exclusive mode needs more work, which will be done in future MRs.
See #4673
Merge branch 'colin/refactor-applyconfig' into 'main'
See merge request isc-projects/bind9!10910
In order to have a (minimal) test ensuring we don't move back
`apply_configuration` subroutines which can be done before the exclusive
lock is taken, `APPLY_CONFIGURATION_SUBROUTINE_LOG` macro is added and
used for the few subroutines already extracted from the exclusive mode.
Those expected logs are added in `configloading` system test checks.
When the server is configured (inside `apply_configuration`) a client
TLS context cache is created and attached to the global server object.
It is then used by `configure_view` flow (and also during runtime though
the zone manager).
It is now created before the exclusive mode, and the swap of the
previous TLS cache ctx is done at the end of the exclusive mode, if
everything went well.
This allows us (among other follow-up changes) to move the
`configure_views` function outside of the exclusive mode.
The keystores initialization, the KASP list initialization as well as
the initialization of the view no longer depends of any data shared by
running "production" objects during re-configuration of the server. This
allows us to move those outside (before) the exclusive mode is taken.
`named_g_actconfctx` is a global variable holding the ACL configuration
context alive (in particular, to dynamically load zones). However, this
object is build once per configuration (early) and is used only inside
server.c `apply_configuration` flow. (Two exceptions: the shutdown flow,
still in server.c and plugin check flow, which doesn't need it, so it's
NULL in such case).
Instead of leaving this global publicly exposed, it is now part of the
`named_server_t` object. This allows us to clearly see that, when
reconfigureing the server, the new instance of the ACL context is known
only by the newly built object and not currently used by "production"
object; and will help to move move logic before the exclusive mode is
taken.
The other advantage is that the ACL configuration context can now be
built before the exclusive lock as well.
The kasplist (dnssec-policy defined in the builtin and global
configuration options) was built inside apply_configuration. This
commit extracts this logic into its separate function.
In order to make the view configuration independent of the global
`server` object, the newly built kasplist is now passed as parameter.
(This eventually will help to be able to configure the views outside of
the exclusive mode by limiting its dependency to the global
`server`/`named_g_server`).
When creating/configuring the view, the user-defined views are built and
set into the viewlist, then builtin-view inside the builtin_viewlist.
But there is no seperate logic applied to those two lists, and they are
immediately merged into viewlist right after. This commit removes this
intermediate list and add builtin-views directly into the main viewlist
instead.
In order to help splitting apply_configuration, the inline loops and bit
of logic around it for views creation and configuration, each of those
are now in a dedicatated function.
Replace the locked hashmap with the lock-free hashtable from the RCU
library and protect the fetch contexts against reuse by replacing the
libisc reference counting with urcu_ref that can soft-fail in situation
where the reference count is already zero. This allows us to easily
skip re-using the fetch context if it is already in process of being
destroyed.
Merge branch 'ondrej/use-urcu-lfht-for-resolver-tables' into 'main'
See merge request isc-projects/bind9!10653
Previously, the fetch contexts were stored inside rwlocked hashmap
table. This was one of the most contended places for the resolver,
especially in the cold cache situation.
Replace the locked hashmap with the lock-free hashtable from the RCU
library and protect the fetch contexts against reuse by replacing the
libisc reference counting with urcu_ref that can soft-fail in situation
where the reference count is already zero. This allows us to easily
skip re-using the fetch context if it is already in process of being
destroyed.
Previously, the slabtops for "type" and its signature was only loosely
coupled and the headers could expire at different time (both TTL and LRU
based expiry). Add a .related member to the slabtop that allows us to
expire the headers in both related headers and also optimize the lookups
because now both slabtops are looked up at the same time.
Closes#3396
Merge branch '3396-bind-rrsigs-to-records' into 'main'
See merge request isc-projects/bind9!10985
Previously, the slabtops for "type" and its signature was only loosely
coupled and the headers could expire at different time (both TTL and LRU
based expiry). This commit expires the headers in both related
headers.
Co-authored-by: Matthijs Mekking <matthijs@isc.org>
Previously, the slabtops for "type" and its signature was only loosely
coupled. Add a .related member to the slabtop that allows us to
optimize the lookups because now both slabtops are looked up at the
same time.
Co-authored-by: Matthijs Mekking <matthijs@isc.org>
There was a pattern where first the header was checked for NULL
and then for being stale. In both cases the code path is the same
so it makes sense to put them in a separate function.
Co-authored-by: Matthijs Mekking <matthijs@isc.org>
This is the first MR in series that aims to reduce the node locking
by replacing the single-linked list of slabtop(s) and slabheader(s)
with CDS linked list. This commit doesn't do anything else beyond
replacing .next and .down links with the cds_list_head. The RCU
semantics will be added later.
Merge branch 'ondrej/use-rcu-list-for-slabtop' into 'main'
See merge request isc-projects/bind9!10944