There was confusion about whether the interval was calculated from
the validity period provided on the command line (with -s and -e),
or from the signature being replaced.
Add text to clarify that the interval is calculated from the new
validity period.
In the case when `dnssec-signzone` is called on an already signed zone, and the private key file is unavailable, a signature that needs to be refreshed may be dropped without being able to generate a replacement. This has been fixed.
Closes#5126
Merge branch '5126-dnssec-signzone-retain-rrsig-if-key-is-offline' into 'main'
See merge request isc-projects/bind9!9951
Track inside the dns_dnsseckey structure whether we have seen the
private key, or if this key only has a public key file.
If the key only has a public key file, or a DNSKEY reference in the
zone, mark the key 'pubkey'. In dnssec-signzone, if the key only
has a public key available, consider the key to be offline. Any
signatures that should be refreshed for which the key is not available,
retain the signature.
So in the code, 'expired' becomes 'refresh', and the new 'expired'
is only used to determine whether we need to keep the signature if
the corresponding key is not available (retaining the signature if
it is not expired).
In the 'keysthatsigned' function, we can remove:
- key->force_publish = false;
- key->force_sign = false;
because they are redundant ('dns_dnsseckey_create' already sets these
values to false).
Add a test case for the scenario below.
There is a case when signing a zone with dnssec-signzone where the
private key file is moved outside the key directory (for offline
ksk purposes), and then the zone is resigned. The signature of the
DNSKEY needs refreshing, but is not expired.
Rather than removing the signature without having a valid replacement,
leave the signature in the zone (despite it needs to be refreshed).
If the generated status output exceeds 4096 it was silently truncated, now we output that the status was truncated.
Closes#4180
Merge branch '4180-possible-truncation-in-dns_keymgr_status' into 'main'
See merge request isc-projects/bind9!9905
Extended DNS error mechanism (EDE) may have several errors raised during a DNS resolution. `named` is now able to add up to three EDE codes in a DNS response. In the case of duplicate error codes, only the first one will be part of the DNS response.
Closes#5085
Merge branch '5085-multiple-ede' into 'main'
See merge request isc-projects/bind9!9952
Extended DNS error mechanism (EDE) enables to have several EDE raised
during a DNS resolution (typically, a DNSSEC query will do multiple
fetches which each of them can have an error). Add support to up to 3
EDE errors in an DNS response. If duplicates occur (two EDEs with the
same code, the extra text is not compared), only the first one will be
part of the DNS answer.
Because the maximum number of EDE is statically fixed, `ns_client_t`
object own a static vector of `DNS_DE_MAX_ERRORS` (instead of a linked
list, for instance). The array can be fully filled (all slots point to
an allocated `dns_ednsopt_t` object) or partially filled (or
empty). In such case, the first NULL slot means there is no more EDE
objects.
When 'ISC_R_TIMEDOUT' is received in 'tcp_recv()', it times out the
oldest response in the active responses queue, and only after that it
checks whether other active responses have also timed out. So when
setting a timeout value for a read operation after a successful
connection, it makes sense to take the timeout value from the oldest
response in the active queue too, because, theoretically, the responses
can have different timeout values, e.g. when the TCP dispatch is shared.
Currently 'resp' is always NULL. Previously when connect and read timeouts
were not separated in dispatch this affected only logging, but now since
we are setting a new timeout after a successful connection, we need to
choose a suitable response from the active queue.
Merge branch 'aram/dispatch-tcp_connected-fix' into 'main'
See merge request isc-projects/bind9!9927
Since the read timeout now works, the resolver time outs from the
dispatch level instead of from the "hung fetch" timer, and so the
EDE value in 'fctx_expired()' is not being set. Remove the expected
EDE value from the test.
When TCP is used, 'fctx_query()' adds one second to the rtt
(round-trip time) value, but there's a bug when the decision
about using TCP is made already after the calculation. Move the
block of the code which looks up the peers list to decide
whether to use TCP into a place that's before the rtt calculation
is performed. This commit doesn't add or remove any code, it just
moves the code and adds a comment block.
When 'ISC_R_TIMEDOUT' is received in 'tcp_recv()', it times out the
oldest response in the active responses queue, and only after that it
checks whether other active responses have also timed out. So when
setting a timeout value for a read operation after a successful
connection, it makes sense to take the timeout value from the oldest
response in the active queue too, because, theoretically, the responses
can have different timeout values, e.g. when the TCP dispatch is shared.
Currently 'resp' is always NULL. Previously when connect and read
timeouts were not separated in dispatch this affected only logging, but
now since we are setting a new timeout after a successful connection,
we need to choose a suitable response from the active queue.
Prevent lock contention among many worker threads referring to the same database node at the same time. This would improve zone and cache database performance for the heavily contended database nodes.
Closes#5130
Merge branch '5130-reduce-lock-contention-in-decrement-reference' into 'main'
See merge request isc-projects/bind9!9963
Previously, this function always acquires a node write lock if it
might need node cleanup in case the reference decrements to 0. In
fact, the lock is unnecessary if the reference is larger than 1 and it
can be optimized as an "easy" case. This optimization could even be
"necessary". In some extreme cases, many worker threads could repeat
acquring and releasing the reference on the same node, resulting in
severe lock contention for nothing (as the ref wouldn't decrement to 0
in most cases). This change would prevent noticeable performance
drop like query timeout for such cases.
Co-authored-by: JINMEI Tatuya <jtatuya@infoblox.com>
Co-authored-by: Ondřej Surý <ondrej@isc.org>
Shutdown the fetch context immediately after the last fetch has been canceled from
that particular fetch context.
Merge branch 'ondrej/shutdown-the-fetch-context-early' into 'main'
See merge request isc-projects/bind9!9958
Currently, the fetch context will continue running even when the last
fetch (response) has been removed from the context, so named can process
and cache the answer. This can lead to a situation where the number of
outgoing recursing clients exceeds the the configured number for
recursive-clients.
Be more stringent about the recursive-clients limit and shutdown the
fetch context immediately after the last fetch has been canceled from
that particular fetch context.
Resolver under heavy-load could exhaust the memory available for storing
the information in the Address Database (ADB) effectively evicting already
stored information in the ADB. The memory used to retrieve and provide
information from the ADB is now not a subject of the same memory limits
that are applied for storing the information in the Address Database.
Closes#5127
Merge branch '5127-change-ADB-memory-split' into 'main'
See merge request isc-projects/bind9!9954
Address Database (ADB) shares the memory for the short lived ADB
objects (finds, fetches, addrinfo) and the long lived ADB
objects (names, entries, namehooks). This could lead to a situation
where the resolver-heavy load would force evict ADB objects from the
database to point where ADB is completely empty, leading to even more
resolver-heavy load.
Make the short lived ADB objects use the other memory context that we
already created for the hashmaps. This makes the ADB overmem condition
to not be triggered by the ongoing resolver fetches.
The network manager layer has two different timers with their
own timeout values for TCP connections: connect timeout and read
timeout. Separate the connect and the read TCP timeouts in the
dispatch module too.
Closes#5009
Merge branch '5009-dispatch-separate-connect-and-read-timeouts' into 'main'
See merge request isc-projects/bind9!9698
In some places there was a limitation of the maximum timeout
value of INT16_MAX, which is only about 32 seconds. Refactor
the code to remove the limitation.
The network manager layer has two different timers with their
own timeout values for TCP connections: connect timeout and read
timeout. Separate the connect and the read TCP timeouts in the
dispatch module too.
struct fetchctx does have several fields which are now unused or confusing, removing those.
Merge branch 'colin/remove-fctx-validator' into 'main'
See merge request isc-projects/bind9!9945
struct fetchctx does have a list of pending validators as well as a
pointer to the HEAD validator. Remove the validator pointer to avoid
confusion, as there is no perticular reasons to have it directly
accessible outside of the list.
A bug in the qpzone database could trigger a crash when querying for a deleted name, or a newly-added empty non-terminal name, in an NSEC3-signed zone. This has been fixed.
Closes#5108
Merge branch '5108-nsec3-empty-node' into 'main'
See merge request isc-projects/bind9!9928
there was a database bug in which dns_db_find() could get a partial
match for the query name, but still set foundname to match the full
query name. this triggered an assertion when query_addwildcardproof()
assumed that foundname would be shorter.
the database bug has been fixed, but in case it happens again, we
can just copy the name instead of splitting it. we will also log a
warning that the closest-encloser name was invalid.
when adding a new NSEC3 record, dns_nsec3_addnsec3() uses a
dbiterator to seek to the newly created node and then find its
predecessor. dbiterators in the qpzone use snapshots, so changes
to the database are not reflected in an already-existing iterator.
consequently, when we add a new node, we have to create a new iterator
before we can seek to it.
this test adds a record with empty non-terminal nodes above it. this
has also been observed to trigger the crash in NSEC3 zones.
NOTE: the test currently fails, because while there is no crash, the
query results are not as expected. when we add a node below an ENT,
receive_secure_serial() gets DNS_R_PARTIALMATCH, and the signed
zone is never updated. this is not a regression from fixing the
crash bug; it's a separate inline-signing bug.
test that there's no crash when querying for a newly-deleted node.
(incidentally also renamed ns3/named.conf.in to ns3/named1.conf.in,
because named2.conf.in does exist, and they should match.)
when a requested name is found in the QP trie during a lookup, but its
records have been marked as nonexistent by a previous deletion, then
it's treated as a partial match, but the foundname could be left
pointing to the original qname rather than the parent. this could
lead to an assertion failure in query_findclosestnsec3().
Commit b121f02eac renamed the top-level
"primaries" block in bin/named/config.c to "remote-servers". This
configuration block lists the primary servers used for an IANA root zone
mirror when no primary servers are explicitly specified for it in the
configuration. However, the relevant part of the named_zone_configure()
function only looks for a top-level "primaries" block and not for any of
its synonyms. As a result, configuring an IANA root zone mirror with
just:
zone "." {
type mirror;
};
now results in a cryptic fatal error on startup:
loading configuration: not found
exiting (due to fatal error)
Fix by using the correct top-level block name in named_zone_configure().
Response policy zones (RPZ) and catalog zones were not working correctly if they had an $INCLUDE statement defined. This has been fixed.
Closes#5111
Merge branch '5111-includes-disable-rpz-and-catz-fix' into 'main'
See merge request isc-projects/bind9!9930
The code in zone_startload() disables RPZ and CATZ for a zone if
dns_master_loadfile() returns anything other than ISC_R_SUCCESS,
which makes sense, but it's an error because zone_startload() can
also return DNS_R_SEENINCLUDE upon success when the zone had an
$INCLUDE statement.
Add performance tests of DoH using the GET protocol to nightly pipelines.
Merge branch 'nicki/ci-shotgun-doh-get' into 'main'
See merge request isc-projects/bind9!9926
Adjust number of zones down to 23 to match those present when testing in FIPS mode.
Closes#5097
Merge branch '5097-checking-startup-notify-rate-limit-fails-on-ol-8-fips' into 'main'
See merge request isc-projects/bind9!9919