- use a value less than 2^32 for DNS_ZONEFLG_FIXJOURNAL; a larger value
could cause problems in some build environments. the zone flag
DNS_ZONEFLG_DIFFONRELOAD, which was no longer in use, has now been
deleted and its value reused for _FIXJOURNAL.
- style, cleanup, and removal of unnecessary code.
- combined isc_nm_http_add_endpoint() and isc_nm_http_add_doh_endpoint()
into one function, renamed isc_http_endpoint().
- moved isc_nm_http_connect_send_request() into doh_test.c as a helper
function; remove it from the public API.
- renamed isc_http2 and isc_nm_http2 types and functions to just isc_http
and isc_nm_http, for consistency with other existing names.
- shortened a number of long names.
- the caller is now responsible for determining the peer address.
in isc_nm_httpconnect(); this eliminates the need to parse the URI
and the dependency on an external resolver.
- the caller is also now responsible for creating the SSL client context,
for consistency with isc_nm_tlsdnsconnect().
- added setter functions for HTTP/2 ALPN. instead of setting up ALPN in
isc_tlsctx_createclient(), we now have a function
isc_tlsctx_enable_http2client_alpn() that can be run from
isc_nm_httpconnect().
- refactored isc_nm_httprequest() into separate read and send functions.
isc_nm_send() or isc_nm_read() is called on an http socket, it will
be stored until a corresponding isc_nm_read() or _send() arrives; when
we have both halves of the pair the HTTP request will be initiated.
- isc_nm_httprequest() is renamed isc__nm_http_request() for use as an
internal helper function by the DoH unit test. (eventually doh_test
should be rewritten to use read and send, and this function should
be removed.)
- added implementations of isc__nm_tls_settimeout() and
isc__nm_http_settimeout().
- increased NGHTTP2 header block length for client connections to 128K.
- use isc_mem_t for internal memory allocations inside nghttp2, to
help track memory leaks.
- send "Cache-Control" header in requests and responses. (note:
currently we try to bypass HTTP caching proxies, but ideally we should
interact with them: https://tools.ietf.org/html/rfc8484#section-5.1)
when the 'max-ixfr-ratio' option was added, journal transaction
headers were revised to include a count of RR's in each transaction.
this made it impossible to read old journal files after an upgrade.
this branch restores the ability to read version 1 transaction
headers. when rolling forward, printing journal contents, if
the wrong transaction header format is found, we can switch.
when dns_journal_rollforward() detects a version 1 transaction
header, it returns DNS_R_RECOVERABLE. this triggers zone_postload()
to force a rewrite of the journal file in the new format, and
also to schedule a dump of the zone database with minimal delay.
journal repair is done by dns_journal_compact(), which rewrites
the entire journal, ignoring 'max-journal-size'. journal size is
corrected later.
newly created journal files now have "BIND LOG V9.2" in their headers
instead of "BIND LOG V9". files with the new version string cannot be
read using the old transaction header format. note that this means
newly created journal files will be rejected by older versions of named.
named-journalprint now takes a "-x" option, causing it to print
transaction header information before each delta, including its
format version.
On each keymgr run, we now also check if key files can be removed.
The 'purge-keys' interval determines how long keys should be retained
after they have become completely hidden.
Key files should not be removed if it has a state that is set to
something else then HIDDEN, if purge-keys is 0 (disabled), if
the key goal is set to OMNIPRESENT, or if the key is unused (a key is
unused if no timing metadata set, and no states are set or if set,
they are set to HIDDEN).
If the last changed timing metadata plus the purge-keys interval is
in the past, the key files may be removed.
Add a dst_key_t variable 'purge' to signal that the key file should
not be written to file again.
Add a new option 'purge-keys' to 'dnssec-policy' that will purge key
files for deleted keys. The option determines how long key files
should be retained prior to removing the corresponding files from
disk.
If set to 0, the option is disabled and 'named' will not remove key
files from disk.
The only reason for including the gssapi.h from the dst/gssapi.h header
was to get the typedefs of gss_cred_id_t and gss_ctx_id_t. Instead of
using those types directly this commit introduces dns_gss_cred_id_t and
dns_gss_ctx_id_t types that are being used in the public API and
privately retyped to their counterparts when we actually call the gss
api.
This also conceals the gssapi headers, so users of the libdns library
doesn't have to add GSSAPI_CFLAGS to the Makefile when including libdns
dst API.
The lmdb.h doesn't have to be included from the dns/view.h header as it
is separately included where used. This stops exposing the inclusion of
lmdb.h from the libdns headers.
Add support for a "tls" key/value pair for zone primaries, referencing
either a "tls" configuration statement or "ephemeral". If set to use
TLS, zones will send SOA and AXFR/IXFR queries over a TLS channel.
If we did not attempt a fetch due to fetch-limits, we should not start
the stale-refresh-time window.
Introduce a new flag DNS_DBFIND_STALESTART to differentiate between
a resolver failure and unexpected error. If we are resuming, this
indicates a resolver failure, then start the stale-refresh-time window,
otherwise don't start the stale-refresh-time window, but still fall
back to using stale data.
(This commit also wraps some docstrings to 80 characters width)
First of all, there was a flaw in the code related to the
'stale-refresh-time' option. If stale answers are enabled, and we
returned stale data, then it was assumed that it was because we were
in the 'stale-refresh-time' window. But now we could also have returned
stale data because of a 'stale-answer-client-timeout'. To fix this,
introduce a rdataset attribute DNS_RDATASETATTR_STALE_WINDOW to
indicate whether the stale cache entry was returned because the
'stale-refresh-time' window is active.
Second, remove the special case handling when the result is
DNS_R_NCACHENXRRSET. This can be done more generic in the code block
when dealing with stale data.
Putting all stale case handling in the code block when dealing with
stale data makes the code more easy to follow.
Update documentation to be more verbose and to match then new code
flow.
The general logic behind the addition of this new feature works as
folows:
When a client query arrives, the basic path (query.c / ns_query_recurse)
was to create a fetch, waiting for completion in fetch_callback.
With the introduction of stale-answer-client-timeout, a new event of
type DNS_EVENT_TRYSTALE may invoke fetch_callback, whenever stale
answers are enabled and the fetch took longer than
stale-answer-client-timeout to complete.
When an event of type DNS_EVENT_TRYSTALE triggers fetch_callback, we
must ensure that the folowing happens:
1. Setup a new query context with the sole purpose of looking up for
stale RRset only data, for that matters a new flag was added
'DNS_DBFIND_STALEONLY' used in database lookups.
. If a stale RRset is found, mark the original client query as
answered (with a new query attribute named NS_QUERYATTR_ANSWERED),
so when the fetch completion event is received later, we avoid
answering the client twice.
. If a stale RRset is not found, cleanup and wait for the normal
fetch completion event.
2. In ns_query_done, we must change this part:
/*
* If we're recursing then just return; the query will
* resume when recursion ends.
*/
if (RECURSING(qctx->client)) {
return (qctx->result);
}
To this:
if (RECURSING(qctx->client) && !QUERY_STALEONLY(qctx->client)) {
return (qctx->result);
}
Otherwise we would not proceed to answer the client if it happened
that a stale answer was found when looking up for stale only data.
When an event of type DNS_EVENT_FETCHDONE triggers fetch_callback, we
proceed as before, resuming query, updating stats, etc, but a few
exceptions had to be added, most important of which are two:
1. Before answering the client (ns_client_send), check if the query
wasn't already answered before.
2. Before detaching a client, e.g.
isc_nmhandle_detach(&client->reqhandle), ensure that this is the
fetch completion event, and not the one triggered due to
stale-answer-client-timeout, so a correct call would be:
if (!QUERY_STALEONLY(client)) {
isc_nmhandle_detach(&client->reqhandle);
}
Other than these notes, comments were added in code in attempt to make
these updates easier to follow.
Configure "none" as a builtin policy. Change the 'cfg_kasp_fromconfig'
api so that the 'name' will determine what policy needs to be
configured.
When transitioning a zone from secure to insecure, there will be
cases when a zone with no DNSSEC policy (dnssec-policy none) should
be using KASP. When there are key state files available, this is an
indication that the zone once was DNSSEC signed but is reconfigured
to become insecure.
If we would not run the keymgr, named would abruptly remove the
DNSSEC records from the zone, making the zone bogus. Therefore,
change the code such that a zone will use kasp if there is a valid
dnssec-policy configured, or if there are state files available.
When using the `unixtime` or `date` method to update the SOA serial,
`named` and `dnssec-signzone` would silently fallback to `increment`
method to prevent the new serial number to be smaller than the old
serial number (using the serial number arithmetics). Add a warning
message when such fallback happens.
Add unit test to ensure the right NSEC3PARAM event is scheduled in
'dns_zone_setnsec3param()'. To avoid scheduling and managing actual
tasks, split up the 'dns_zone_setnsec3param()' function in two parts:
1. 'dns__zone_lookup_nsec3param()' that will check if the requested
NSEC3 parameters already exist, and if a new salt needs to be
generated.
2. The actual scheduling of the new NSEC3PARAM event (if needed).
When generating a new salt, compare it with the previous NSEC3
paremeters to ensure the new parameters are different from the
previous ones.
This moves the salt generation call from 'bin/named/*.s' to
'lib/dns/zone.c'. When setting new NSEC3 parameters, you can set a new
function parameter 'resalt' to enforce a new salt to be generated. A
new salt will also be generated if 'salt' is set to NULL.
Logging salt with zone context can now be done with 'dnssec_log',
removing the need for 'dns_nsec3_log_salt'.
Upon request from Mark, change the configuration of salt to salt
length.
Introduce a new function 'dns_zone_checknsec3aram' that can be used
upon reconfiguration to check if the existing NSEC3 parameters are
in sync with the configuration. If a salt is used that matches the
configured salt length, don't change the NSEC3 parameters.
Check 'nsec3param' configuration for the number of iterations. The
maximum number of iterations that are allowed are based on the key
size (see https://tools.ietf.org/html/rfc5155#section-10.3).
Check 'nsec3param' configuration for correct salt. If the string is
not "-" or hex-based, this is a bad salt.
Implement support for NSEC3 in dnssec-policy. Store the configuration
in kasp objects. When configuring a zone, call 'dns_zone_setnsec3param'
to queue an nsec3param event. This will ensure that any previous
chains will be removed and a chain according to the dnssec-policy is
created.
Add tests for dnssec-policy zones that uses the new 'nsec3param'
option, as well as changing to new values, changing to NSEC, and
changing from NSEC.
Before this update, BIND would attempt to do a full recursive resolution
process for each query received if the requested rrset had its ttl
expired. If the resolution fails for any reason, only then BIND would
check for stale rrset in cache (if 'stale-cache-enable' and
'stale-answer-enable' is on).
The problem with this approach is that if an authoritative server is
unreachable or is failing to respond, it is very unlikely that the
problem will be fixed in the next seconds.
A better approach to improve performance in those cases, is to mark the
moment in which a resolution failed, and if new queries arrive for that
same rrset, try to respond directly from the stale cache, and do that
for a window of time configured via 'stale-refresh-time'.
Only when this interval expires we then try to do a normal refresh of
the rrset.
The logic behind this commit is as following:
- In query.c / query_gotanswer(), if the test of 'result' variable falls
to the default case, an error is assumed to have happened, and a call
to 'query_usestale()' is made to check if serving of stale rrset is
enabled in configuration.
- If serving of stale answers is enabled, a flag will be turned on in
the query context to look for stale records:
query.c:6839
qctx->client->query.dboptions |= DNS_DBFIND_STALEOK;
- A call to query_lookup() will be made again, inside it a call to
'dns_db_findext()' is made, which in turn will invoke rbdb.c /
cache_find().
- In rbtdb.c / cache_find() the important bits of this change is the
call to 'check_stale_header()', which is a function that yields true
if we should skip the stale entry, or false if we should consider it.
- In check_stale_header() we now check if the DNS_DBFIND_STALEOK option
is set, if that is the case we know that this new search for stale
records was made due to a failure in a normal resolution, so we keep
track of the time in which the failured occured in rbtdb.c:4559:
header->last_refresh_fail_ts = search->now;
- In check_stale_header(), if DNS_DBFIND_STALEOK is not set, then we
know this is a normal lookup, if the record is stale and the query
time is between last failure time + stale-refresh-time window, then
we return false so cache_find() knows it can consider this stale
rrset entry to return as a response.
The last additions are two new methods to the database interface:
- setservestale_refresh
- getservestale_refresh
Those were added so rbtdb can be aware of the value set in configuration
option, since in that level we have no access to the view object.
since the network manager is now handling timeouts, xfrin doesn't
need an isc_task object.
it may be necessary to revert this later if we find that it's
important for zone_xfrdone() to be executed in the zone task context.
currently things seem to be working well without that, though.
The DNS Flag Day 2020 aims to remove the IP fragmentation problem from
the UDP DNS communication. In this commit, we implement the required
changes and simplify the logic for picking the EDNS Buffer Size.
1. The defaults for `edns-udp-size`, `max-udp-size` and
`nocookie-udp-size` have been changed to `1232` (the value picked by
DNS Flag Day 2020).
2. The probing heuristics that would try 512->4096->1432->1232 buffer
sizes has been removed and the resolver will always use just the
`edns-udp-size` value.
3. Instead of just disabling the PMTUD mechanism on the UDP sockets, we
now set IP_DONTFRAG (IPV6_DONTFRAG) flag. That means that the UDP
packets won't get ever fragmented. If the ICMP packets are lost the
UDP will just timeout and eventually be retried over TCP.
While working on 'rndc dnssec -rollover' I noticed the following
(small) issues:
- The key files where updated with hints set to "-when" and that
should always be "now.
- The kasp system test did not properly update the test number when
calling 'rndc dnssec -checkds' (and ensuring that works).
- There was a missing ']' in the rndc.c help output.
Add to the keymgr a function that will schedule a rollover. This
basically means setting the time when the key needs to retire,
and updating the key lifetime, then update the state file. The next
time that named runs the keymgr the new lifetime will be taken into
account.
The dns_message_create() function cannot soft fail (as all memory
allocations either succeed or cause abort), so we change the function to
return void and cleanup the calls.
The message buffer passed to ns__client_request is only valid for
the life of the the ns__client_request call. Save a copy of it
when we recurse or process a update as ns__client_request will
return before those operations complete.
Hold a weak reference to the view so that it can't go away while
nta is performing its lookups. Cancel nta timers once all external
references to the view have gone to prevent them triggering new work.
Add a new 'rndc' command 'dnssec -checkds' that allows the user to
signal named that a new DS record has been seen published in the
parent, or that an existing DS record has been withdrawn from the
parent.
Upon the 'checkds' request, 'named' will write out the new state for
the key, updating the 'DSPublish' or 'DSRemoved' timing metadata.
This replaces the "parent-registration-delay" configuration option,
this was unreliable because it was purely time based (if the user
did not actually submit the new DS to the parent for example, this
could result in an invalid DNSSEC state).
Because we cannot rely on the parent registration delay for state
transition, we need to replace it with a different guard. Instead,
if a key wants its DS state to be moved to RUMOURED, the "DSPublish"
time must be set and must not be in the future. If a key wants its
DS state to be moved to UNRETENTIVE, the "DSRemoved" time must be set
and must not be in the future.
By default, with '-checkds' you set the time that the DS has been
published or withdrawn to now, but you can set a different time with
'-when'. If there is only one KSK for the zone, that key has its
DS state moved to RUMOURED. If there are multiple keys for the zone,
specify the right key with '-key'.