If there happens to be a RRSIG(SOA) that is not at the zone apex
for any reason it should not be considered as a stopping condition
for incremental zone signing.
(cherry picked from commit b7cdc3583e)
When the keymgr needs to create new keys, it is possible it needs to
create multiple keys. The keymgr checks for keyid conflicts with
already existing keys, but it should also check against that it just
created.
(cherry picked from commit 668301f138)
Previously, the taskmgr, timermgr and socketmgr had a constructor
variant, that would create the mgr on top of existing appctx. This was
no longer true and isc_<*>mgr was just calling isc_<*>mgr_create()
directly without any extra code.
This commit just cleans up the extra function.
(cherry picked from commit 3388ef36b3)
Since all the libraries are internal now, just cleanup the ISCAPI remnants
in isc_socket, isc_task and isc_timer APIs. This means, there's one less
layer as following changes have been done:
* struct isc_socket and struct isc_socketmgr have been removed
* struct isc__socket and struct isc__socketmgr have been renamed
to struct isc_socket and struct isc_socketmgr
* struct isc_task and struct isc_taskmgr have been removed
* struct isc__task and struct isc__taskmgr have been renamed
to struct isc_task and struct isc_taskmgr
* struct isc_timer and struct isc_timermgr have been removed
* struct isc__timer and struct isc__timermgr have been renamed
to struct isc_timer and struct isc_timermgr
* All the associated code that dealt with typing isc_<foo>
to isc__<foo> and back has been removed.
(cherry picked from commit 16fe0d1f41)
"resolve" is used by the resolver system tests, and I'm not
certain whether delv exercises the same code, so rather than
remove it, I moved it to bin/tests/system.
(cherry picked from commit d0ec7d1f33)
sample code for export libraries is no longer needed and
this code is not used for any internal tests. also, sample-gai.c
had already been removed but there were some dangling references.
(cherry picked from commit 056afe7bdc)
the libdns client API is no longer being maintained for
external use, we can remove the code that isn't being used
internally, as well as the related tests.
(cherry picked from commit fb2a352e7c)
When setnsec3param() is schedule from zone_postload() there's no
guarantee that `zone->db` is not `NULL` yet. Thus when the
setnsec3param() is called, we need to check for `zone->db` existence and
reschedule the task, because calling `rss_post()` on a zone with empty
`.db` ends up with no-op (the function just returns).
(cherry picked from commit 0127ba6472)
BIND 9 attempts to look up GSSAPI OIDs for the Kerberos 5 and SPNEGO
mechanisms in the relevant header files provided by the Kerberos/GSSAPI
library used. Due to the differences between various Kerberos/GSSAPI
implementations, if any of the expected preprocessor macros
(GSS_KRB5_MECHANISM, GSS_SPNEGO_MECHANISM) is not defined in the header
files provided by the library used, the code in lib/dns/gssapictx.c
defines its own version of each missing macro, so that BIND 9 can
attempt to use the relevant security mechanisms anyway.
Commit a875dcc669, which contains a
partial backport of the changes introduced in commit
978c7b2e89, left a block of code in the
lib/dns/dst_internal.h header which defines the GSS_SPNEGO_MECHANISM
preprocessor macro to NULL if it is not defined by any header file
provided by the Kerberos/GSSAPI library used. This causes the
gss_add_oid_set_member() call in the mech_oid_set_create() helper
function to always return an error. This in turn causes the
dst_gssapi_acquirecred() function to also always return an error, which
ultimately prevents any named instance whose configuration includes the
"tkey-gssapi-credential" option from starting.
Remove the offending conditional definition of the GSS_SPNEGO_MECHANISM
preprocessor macro from lib/dns/dst_internal.h, so that a proper GSSAPI
OID is assigned to that macro in lib/dns/gssapictx.c when the
Kerberos/GSSAPI library used does not define it.
Too much logic was cramped inside the dns_journal_rollforward() that
made it harder to follow. The dns_journal_rollforward() was refactored
to work over already opened journal and some of the previous logic was
moved to new static zone_journal_rollforward() that separates the
journal "rollforward" logic from the "zone" logic.
(cherry picked from commit 55b942b4a0)
when dns_journal_rollforward returned ISC_R_RECOVERABLE the distintion
between 'up to date' and 'success' was lost, as a consequence
zone_needdump() was called writing out the zone file when it shouldn't
have been. This change restores that distintion. Adjust system
test to reflect visible changes.
(cherry picked from commit ec7a9af381)
Introduce some macros that can be reused in 'zone_load_soa_rr()' and
'zone_get_from_db()' to make those functions more readable.
(cherry picked from commit 8fcbef2423)
Shorten the code and make it less prone to initialisation errors
(it is still easy to forget adding an initializer, but it now defaults
to 0).
(cherry picked from commit 032110bd2e)
The draft says that the NSEC(3) TTL must have the same TTL value
as the minimum of the SOA MINIMUM field and the SOA TTL. This was
always the intended behaviour.
Update the zone structure to also track the SOA TTL. Whenever we
use the MINIMUM value to determine the NSEC(3) TTL, use the minimum
of MINIMUM and SOA TTL instead.
There is no specific test for this, however two tests need adjusting
because otherwise they failed: They were testing for NSEC3 records
including the TTL. Update these checks to use 600 (the SOA TTL),
rather than 3600 (the SOA MINIMUM).
(cherry picked from commit 9af8caa733)
It is more intuitive to have the countdown 'max-stale-ttl' as the
RRset TTL, instead of 0 TTL. This information was already available
in a comment "; stale (will be retained for x more seconds", but
Support suggested to put it in the TTL field instead.
(cherry picked from commit a83c8cb0af)
Before binding an RRset, check the time and see if this record is
stale (or perhaps even ancient). Marking a header stale or ancient
happens only when looking up an RRset in cache, but binding an RRset
can also happen on other occasions (for example when dumping the
database).
Check the time and compare it to the header. If according to the
time the entry is stale, but not ancient, set the STALE attribute.
If according to the time is ancient, set the ANCIENT attribute.
We could mark the header stale or ancient here, but that requires
locking, so that's why we only compare the current time against
the rdh_ttl.
Adjust the test to check the dump-db before querying for data. In the
dumped file the entry should be marked as stale, despite no cache
lookup happened since the initial query.
(cherry picked from commit debee6157b)
When introducing change 5149, "rndc dumpdb" started to print a line
above a stale RRset, indicating how long the data will be retained.
At that time, I thought it should also be possible to load
a cache from file. But if a TTL has a value of 0 (because it is stale),
stale entries wouldn't be loaded from file. So, I added the
'max-stale-ttl' to TTL values, and adjusted the $DATE accordingly.
Since we actually don't have a "load cache from file" feature, this
is premature and is causing confusion at operators. This commit
changes the 'max-stale-ttl' adjustments.
A check in the serve-stale system test is added for a non-stale
RRset (longttl.example) to make sure the TTL in cache is sensible.
Also, the comment above stale RRsets could have nonsensical
values. A possible reason why this may happen is when the RRset was
marked a stale but the 'max-stale-ttl' has passed (and is actually an
RRset awaiting cleanup). This would lead to the "will be retained"
value to be negative (but since it is stored in an uint32_t, you would
get a nonsensical value (e.g. 4294362497).
To mitigate against this, we now also check if the header is not
ancient. In addition we check if the stale_ttl would be negative, and
if so we set it to 0. Most likely this will not happen because the
header would already have been marked ancient, but there is a possible
race condition where the 'rdh_ttl + serve_stale_ttl' has passed,
but the header has not been checked for staleness.
(cherry picked from commit 2a5e0232ed)
add matching macros to pass arguments from called methods
to generic methods. This will reduce the amount of work
required when extending methods.
Also cleanup unnecessary UNUSED declarations.
(cherry picked from commit a88d3963e2)
Even if a call to gss_accept_sec_context() fails, it might still cause a
GSS-API response token to be allocated and left for the caller to
release. Make sure the token is released before an early return from
dst_gssapi_acceptctx().
(cherry picked from commit d954e152d9)
Both managed keys and regular zone journals need to be updated
immediately when a recoverable error is discovered.
(cherry picked from commit 0fbdf189c7)
Previously, dns_journal_begin_transaction() could reserve the wrong
amount of space. We now check that the transaction is internally
consistent when upgrading / downgrading a journal and we also handle the
bad transaction headers.
(cherry picked from commit 83310ffd92)
Instead of journal_write(), use correct format call journal_write_xhdr()
to write the dummy transaction header which looks at j->header_ver1 to
determine which transaction header to write instead of always writing a
zero filled journal_rawxhdr_t header.
(cherry picked from commit 5a6112ec8f)
Fix race between zone_maintenance and dns_zone_notifyreceive functions,
zone_maintenance was attempting to read a zone flag calling
DNS_ZONE_FLAG(zone, flag) while dns_zone_notifyreceive was updating
a flag in the same zone calling DNS_ZONE_SETFLAG(zone, ...).
The code reading the flag in zone_maintenance was not protected by the
zone's lock, to avoid a race the zone's lock is now being acquired
before an attempt to read the zone flag is made.
When we are recursing, RPZ processing is not allowed. But when we are
performing a lookup due to "stale-answer-client-timeout", we are still
recursing. This effectively means that RPZ processing is disabled on
such a lookup.
In this case, bail the "stale-answer-client-timeout" lookup and wait
for recursion to complete, as we we can't perform the RPZ rewrite
rules reliably.
(cherry picked from commit 3d3a6415f7)
The dboption DNS_DBFIND_STALEONLY caused confusion because it implies
we are looking for stale data **only** and ignore any active RRsets in
the cache. Rename it to DNS_DBFIND_STALETIMEOUT as it is more clear
the option is related to a lookup due to "stale-answer-client-timeout".
Rename other usages of "staleonly", instead use "lookup due to...".
Also rename related function and variable names.
(cherry picked from commit 839df94190)
When doing a staleonly lookup we don't want to fallback to recursion.
After all, there are obviously problems with recursion, otherwise we
wouldn't do a staleonly lookup.
When resuming from recursion however, we should restore the
RECURSIONOK flag, allowing future required lookups for this client
to recurse.
(cherry picked from commit 3f81d79ffb)
When implementing "stale-answer-client-timeout", we decided that
we should only return positive answers prematurely to clients. A
negative response is not useful, and in that case it is better to
wait for the recursion to complete.
To do so, we check the result and if it is not ISC_R_SUCCESS, we
decide that it is not good enough. However, there are more return
codes that could lead to a positive answer (e.g. CNAME chains).
This commit removes the exception and now uses the same logic that
other stale lookups use to determine if we found a useful stale
answer (stale_found == true).
This means we can simplify two test cases in the serve-stale system
test: nodata.example is no longer treated differently than data.example.
(cherry picked from commit aaed7f9d8c)
The NS_QUERYATTR_ANSWERED attribute is to prevent sending a response
twice. Without the attribute, this may happen if a staleonly lookup
found a useful answer and sends a response to the client, and later
recursion ends and also tries to send a response.
The attribute was also used to mask adding a duplicate RRset. This is
considered harmful. When we created a response to the client with a
stale only lookup (regardless if we actually have send the response),
we should clear the rdatasets that were added during that lookup.
Mark such rdatasets with the a new attribute,
DNS_RDATASETATTR_STALE_ADDED. Set a query attribute
NS_QUERYATTR_STALEOK if we may have added rdatasets during a stale
only lookup. Before creating a response on a normal lookup, check if
we can expect rdatasets to have been added during a staleonly lookup.
If so, clear the rdatasets from the message with the attribute
DNS_RDATASETATTR_STALE_ADDED set.
(cherry picked from commit 3d5429f61f)
With stale-answer-client-timeout, we may send a response to the client,
but we may want to hold on to the network manager handle, because
recursion is going on in the background, or we need to refresh a
stale RRset.
Simplify the setting of 'nodetach':
* During a staleonly lookup we should not detach the nmhandle, so just
set it prior to 'query_lookup()'.
* During a staleonly "stalefirst" lookup set the 'nodetach' to true
if we are going to refresh the RRset.
Now there is no longer the need to clear the 'nodetach' if we go
through the "dbfind_stale", "stale_refresh_window", or "stale_only"
paths.
(cherry picked from commit 48b0dc159b)
When doing a staleonly lookup, ignore active RRsets from cache. If we
don't, we may add a duplicate RRset to the message, and hit an
assertion failure in query.c because adding the duplicate RRset to the
ANSWER section failed.
This can happen on a race condition. When a client query is received,
the recursion is started. When 'stale-answer-client-timeout' triggers
around the same time the recursion completes, the following sequence
of events may happen:
1. Queue the "try stale" fetch_callback() event to the client task.
2. Add the RRsets from the authoritative response to the cache.
3. Queue the "fetch complete" fetch_callback() event to the client task.
4. Execute the "try stale" fetch_callback(), which retrieves the
just-inserted RRset from the database.
5. In "ns_query_done()" we are still recursing, but the "staleonly"
query attribute has already been cleared. In other words, the
query will resume when recursion ends (it already has ended but is
still on the task queue).
6. Execute the "fetch complete" fetch_callback(). It finds the answer
from recursion in the cache again and tries to add the duplicate to
the answer section.
This commit changes the logic for finding stale answers in the cache,
such that on "stale_only" lookups actually only stale RRsets are
considered. It refactors the code so that code paths for "dbfind_stale",
"stale_refresh_window", and "stale_only" are more clear.
First we call some generic code that applies in all three cases,
formatting the domain name for logging purposes, increment the
trystale stats, and check if we actually found stale data that we can
use.
The "dbfind_stale" lookup will return SERVFAIL if we didn't found a
usable answer, otherwise we will continue with the lookup
(query_gotanswer()). This is no different as before the introduction of
"stale-answer-client-timeout" and "stale-refresh-time".
The "stale_refresh_window" lookup is similar to the "dbfind_stale"
lookup: return SERVFAIL if we didn't found a usable answer, otherwise
continue with the lookup (query_gotanswer()).
Finally the "stale_only" lookup.
If the "stale_only" lookup was triggered because of an actual client
timeout (stale-answer-client-timeout > 0), and if database lookup
returned a stale usable RRset, trigger a response to the client.
Otherwise return and wait until the recursion completes (or the
resolver query times out).
If the "stale_only" lookup is a "stale-anwer-client-timeout 0" lookup,
preferring stale data over a lookup. In this case if there was no stale
data, or the data was not a positive answer, retry the lookup with the
stale options cleared, a.k.a. a normal lookup. Otherwise, continue
with the lookup (query_gotanswer()) and refresh the stale RRset. This
will trigger a response to the client, but will not detach the handle
because a fetch will be created to refresh the RRset.
(cherry picked from commit 92f7a67892)
The stale-answer-client-timeout feature introduced a dependancy on
when a client may be detached from the handle. The dboption
DNS_DBFIND_STALEONLY was reused to track this attribute. This overloads
the meaning of this database option, and actually introduced a bug
because the option was checked in other places. In particular, in
'ns_query_done()' there is a check for 'RECURSING(qctx->client) &&
(!QUERY_STALEONLY(&qctx->client->query) || ...' and the condition is
satisfied because recursion has not completed yet and
DNS_DBFIND_STALEONLY is already cleared by that time (in
query_lookup()), because we found a useful answer and we should detach
the client from the handle after sending the response.
Add a new boolean to the client structure to keep track of client
detach from handle is allowed or not. It is only disallowed if we are
in a staleonly lookup and we didn't found a useful answer.
(cherry picked from commit fee164243f)
Previously, every function had it's own #ifdef GSSAPI #else #endif block
that defined shim function in case GSSAPI was not being used. Now the
dummy shim functions have be split out into a single #else #endif block
at the end of the file.
This makes the gssapictx.c similar to 9.17.x code, making the backports
and reviews easier.
The Heimdal Kerberos library handles the OID sets in a different manner.
Unify the handling of the OID sets between MIT and Heimdal
implementations by dynamically creating the OID sets instead of using
static predefined set. This is how upstream recommends to handle the
OID sets.
The custom ISC SPNEGO mechanism implementation is no longer needed on
the basis that all major Kerberos 5/GSSAPI (mit-krb5, heimdal and
Windows) implementations support SPNEGO mechanism since 2006.
This commit removes the custom ISC SPNEGO implementation, and removes
the option from both autoconf and win32 Configure script. Unknown
options are being ignored, so this doesn't require any special handling.
Do not require config.h to use isc/util.h
See merge request isc-projects/bind9!4840
(cherry picked from commit 19b69e9a3b)
81eb3396 Do not require config.h to use isc/util.h
CDS/CDNSKEY DELETE records are only useful if they are signed,
otherwise the parent cannot verify these RRsets anyway. So once the DS
has been removed (and signaled to BIND), we can remove the DNSKEY and
RRSIG records, and at this point we can also remove the CDS/CDNSKEY
records.
(cherry picked from commit 6f31f62d69)
While not useful, having a CDS/CDNSKEY DELETE record in an unsigned
zone is not an error and "named-checkzone" should not complain.
(cherry picked from commit f211c7c2a1)
The 'keymgr_key_init()' function initializes key states if they have
not been set previously. It looks at the key timing metadata and
determines using the given times whether a state should be set to
RUMOURED or OMNIPRESENT.
However, the DNSKEY and ZRRSIG states were mixed up: When looking
at the Activate timing metadata we should set the ZRRSIG state, and
when looking at the Published timing metadata we should set the
DNSKEY state.
(cherry picked from commit 27e7d5f698)
The current isc_time_now uses CLOCK_REALTIME_COARSE which only updates
on a timer tick. This clock is generally fine for millisecond accuracy,
but on servers with 100hz clocks, this clock is nowhere near accurate
enough for microsecond accuracy.
This commit adds a new isc_time_now_hires function that uses
CLOCK_REALTIME, which gives the current time, though it is somewhat
expensive to call. When microsecond accuracy is required, it may be
required to use extra resources for higher accuracy.
(cherry picked from commit ebced74b19)
The tlsdns API is not yet used in the 9.16 branch and the tlsdns_test
fails too often. Temporarily disable running the test until it is
actually needed.
The RFC7828 specifies the keepalive interval to be 16-bit, specified in
units of 100 milliseconds and the configuration options tcp-*-timeouts
are following the suit. The units of 100 milliseconds are very
unintuitive and while we can't change the configuration and presentation
format, we should not follow this weird unit in the API.
This commit changes the isc_nm_(get|set)timeouts() functions to work
with milliseconds and convert the values to milliseconds before passing
them to the function, not just internally.
The udp, tcpdns and tlsdns contained lot of cut&paste code or code that
was very similar making the stack harder to maintain as any change to
one would have to be copied to the the other protocols.
In this commit, we merge the common parts into the common functions
under isc__nm_<foo> namespace and just keep the little differences based
on the socket type.
After the TCPDNS refactoring the initial and idle timers were broken and
only the tcp-initial-timeout was always applied on the whole TCP
connection.
This broke any TCP connection that took longer than tcp-initial-timeout,
most often this would affect large zone AXFRs.
This commit changes the timeout logic in this way:
* On TCP connection accept the tcp-initial-timeout is applied
and the timer is started
* When we are processing and/or sending any DNS message the timer is
stopped
* When we stop processing all DNS messages, the tcp-idle-timeout
is applied and the timer is started again