Commit graph

12916 commits

Author SHA1 Message Date
Mark Andrews
4e192d2fe7 Address theoretical resource leak in dns_dt_open()
dns_dt_open() is not currently called with mode dns_dtmode_unix.

    *** CID 281489:  Resource leaks  (RESOURCE_LEAK)
    /lib/dns/dnstap.c: 983 in dns_dt_open()
    977
    978     		if (!dnstap_file(handle->reader)) {
    979     			CHECK(DNS_R_BADDNSTAP);
    980     		}
    981     		break;
    982     	case dns_dtmode_unix:
       CID 281489:  Resource leaks  (RESOURCE_LEAK)
       Variable "handle" going out of scope leaks the storage it points to.
    983     		return (ISC_R_NOTIMPLEMENTED);
    984     	default:
    985     		INSIST(0);
    986     		ISC_UNREACHABLE();
    987     	}
    988

(cherry picked from commit 003dd8cc70)
2021-02-23 09:41:15 +11:00
Mark Andrews
d68b85e555 Correctly detect when get_direction failed
(cherry picked from commit 009358d77d)
2021-02-19 11:39:12 +11:00
Mark Andrews
0cea486327 Test a LOC record with an invalid direction field
(cherry picked from commit 07902d9f9d)
2021-02-19 11:39:12 +11:00
Ondřej Surý
b04cb88462 Fix off-by-one bug in ISC SPNEGO implementation
The ISC SPNEGO implementation is based on mod_auth_kerb code.  When
CVE-2006-5989 was disclosed, the relevant fix was not applied to the
BIND 9 codebase, making the latter vulnerable to the aforementioned flaw
when "tkey-gssapi-keytab" or "tkey-gssapi-credential" is set in
named.conf.

The original description of CVE-2006-5989 was:

    Off-by-one error in the der_get_oid function in mod_auth_kerb 5.0
    allows remote attackers to cause a denial of service (crash) via a
    crafted Kerberos message that triggers a heap-based buffer overflow
    in the component array.

Later research revealed that this flaw also theoretically enables remote
code execution, though achieving the latter in real-world conditions is
currently deemed very difficult.

This vulnerability was responsibly reported as ZDI-CAN-12302 ("ISC BIND
TKEY Query Heap-based Buffer Overflow Remote Code Execution
Vulnerability") by Trend Micro Zero Day Initiative.
2021-02-17 22:36:08 +01:00
Ondřej Surý
d7b3a6a016 Rollback setting IP_DONTFRAG option on the UDP sockets
In DNS Flag Day 2020, the development branch started setting the
IP_DONTFRAG option on the UDP sockets.  It turned out, that this
code was incomplete leading to dropping the outgoing UDP packets.
Henceforth this commit rolls back this setting until we have a
proper fix that would send back empty response with TC flag set.

(cherry picked from commit 66eefac78c)
2021-02-17 14:41:56 +01:00
Michal Nowak
f483b102dd
Drop USE_OPENSSL constraint from dh_test
The USE_OPENSSL constraint in dh_test does not seems to be necessary
anymore, the test runs with PKCS#11 as well.

(cherry picked from commit c341e7f740)
2021-02-17 12:46:25 +01:00
Michal Nowak
ed38e32b69
Ensure dnstap_test returns SKIPPED_TEST_EXIT_CODE
Make sure lib/dns/tests/dnstap_test returns an exit code that indicates
a skipped test when dnstap is not enabled.

(cherry picked from commit c286341703)
2021-02-17 12:15:18 +01:00
Michal Nowak
04aff208fb
Use BIND 9.17 preprocessor macro to skip unit test
BIND 9.17 changed exit code of skipped test to meet Automake
expectations in fa505bfb0e. BIND 9.16 was
not rewritten to Automake, but for consistency reasons, the same
SKIPPED_TEST_EXIT_CODE preprocessor macro is used (though the actual
exit code differs from the one in BIND 9.17).

(cherry picked from commit fa505bfb0e)
2021-02-17 12:09:25 +01:00
Mark Andrews
d51b78c85b Stop including <gssapi.h> from <dst/gssapi.h> header
The only reason for including the gssapi.h from the dst/gssapi.h header
was to get the typedefs of gss_cred_id_t and gss_ctx_id_t.  Instead of
using those types directly this commit introduces dns_gss_cred_id_t and
dns_gss_ctx_id_t types that are being used in the public API and
privately retyped to their counterparts when we actually call the gss
api.

This also conceals the gssapi headers, so users of the libdns library
doesn't have to add GSSAPI_CFLAGS to the Makefile when including libdns
dst API.
2021-02-16 12:08:21 +11:00
Ondřej Surý
4bbe3e75de Stop including dnstap headers from <dns/dnstap.h>
The <fstrm.h> and <protobuf-c/protobuf-c.h> headers are only directly
included where used and we stopped exposing those headers from libdns
headers.
2021-02-16 12:08:21 +11:00
Mark Andrews
bf5aac225b Stop including <lmdb.h> from <dns/lmdb.h>
The lmdb.h header doesn't have to be included from the dns/lmdb.h
header as it can be separately included where used.  This stops
exposing the inclusion of lmdb.h from the libdns headers.
2021-02-16 12:08:21 +11:00
Mark Andrews
b8fc8742e5 Re-order include directories
${FSTRM_CFLAGS} ${PROTOBUF_C_CFLAGS} ${OPENSSL_CFLAGS} ${LMDB_CFLAGS}
need to appear after all directories in the build tree.
2021-02-16 12:08:21 +11:00
Diego Fronza
d89a8bf696 Fix dangling references to outdated views after reconfig
This commit fix a leak which was happening every time an inline-signed
zone was added to the configuration, followed by a rndc reconfig.

During the reconfig process, the secure version of every inline-signed
zone was "moved" to a new view upon a reconfig and it "took the raw
version along", but only once the secure version was freed (at shutdown)
was prev_view for the raw version detached from, causing the old view to
be released as well.

This caused dangling references to be kept for the previous view, thus
keeping all resources used by that view in memory.
2021-02-15 11:52:50 -03:00
Mark Andrews
6e30caed57 Silence Insecure data handling (TAINTED_SCALAR)
Coverity assumes that the memory holding any value read using byte
swapping is tainted.  As we store the NSEC3PARAM records in wire
form and iterations is byte swapped the memory holding the record
is marked as tainted.  nsec3->salt_length is marked as tainted
transitively. To remove the taint the value need to be range checked.
For a correctly formatted record region.length should match
nsec3->salt_length and provides a convenient value to check the field
against.

    *** CID 316507:  Insecure data handling  (TAINTED_SCALAR)
    /lib/dns/rdata/generic/nsec3param_51.c: 241 in tostruct_nsec3param()
    235     	region.length = rdata->length;
    236     	nsec3param->hash = uint8_consume_fromregion(&region);
    237     	nsec3param->flags = uint8_consume_fromregion(&region);
    238     	nsec3param->iterations = uint16_consume_fromregion(&region);
    239
    240     	nsec3param->salt_length = uint8_consume_fromregion(&region);
    >>>     CID 316507:  Insecure data handling  (TAINTED_SCALAR)
    >>>     Passing tainted expression "nsec3param->salt_length" to "mem_maybedup", which uses it as an offset.
    241     	nsec3param->salt = mem_maybedup(mctx, region.base,
    242     					nsec3param->salt_length);
    243     	if (nsec3param->salt == NULL) {
    244     		return (ISC_R_NOMEMORY);
    245     	}
    246     	isc_region_consume(&region, nsec3param->salt_length);

(cherry picked from commit c40133d840)
2021-02-12 10:43:19 +11:00
Mark Andrews
8302e9fb69 Silence Untrusted value as argument (TAINTED_SCALAR)
Coverity assumes that the memory holding any value read using byte
swapping is tainted.  As we store the NSEC3 records in wire form
and iterations is byte swapped the memory holding the record is
marked as tainted.  nsec3->salt_length and nsec3->next_length are
marked as tainted transitively. To remove the taint the values need
to be range checked.  Valid values for these should never exceed
region.length so that is becomes a reasonable value to check against.

    *** CID 316509:    (TAINTED_SCALAR)
    /lib/dns/rdata/generic/nsec3_50.c: 312 in tostruct_nsec3()
    306     	if (nsec3->salt == NULL) {
    307     		return (ISC_R_NOMEMORY);
    308     	}
    309     	isc_region_consume(&region, nsec3->salt_length);
    310
    311     	nsec3->next_length = uint8_consume_fromregion(&region);
    >>>     CID 316509:    (TAINTED_SCALAR)
    >>>     Passing tainted expression "nsec3->next_length" to "mem_maybedup", which uses it as an offset.
    312     	nsec3->next = mem_maybedup(mctx, region.base, nsec3->next_length);
    313     	if (nsec3->next == NULL) {
    314     		goto cleanup;
    315     	}
    316     	isc_region_consume(&region, nsec3->next_length);
    317
    /lib/dns/rdata/generic/nsec3_50.c: 305 in tostruct_nsec3()
    299     	region.length = rdata->length;
    300     	nsec3->hash = uint8_consume_fromregion(&region);
    301     	nsec3->flags = uint8_consume_fromregion(&region);
    302     	nsec3->iterations = uint16_consume_fromregion(&region);
    303
    304     	nsec3->salt_length = uint8_consume_fromregion(&region);
    >>>     CID 316509:    (TAINTED_SCALAR)
    >>>     Passing tainted expression "nsec3->salt_length" to "mem_maybedup", which uses it as an offset.
    305     	nsec3->salt = mem_maybedup(mctx, region.base, nsec3->salt_length);
    306     	if (nsec3->salt == NULL) {
    307     		return (ISC_R_NOMEMORY);
    308     	}
    309     	isc_region_consume(&region, nsec3->salt_length);
    310

(cherry picked from commit fd8d1337a5)
2021-02-12 10:43:19 +11:00
Michal Nowak
001413ed50
Drop AddressSanitizer constraint from libns unit tests
The AddressSanitizer constraint in some libns unit tests does not seem
to be necessary anymore, these tests run fine under AddressSanitizer.

(cherry picked from commit 613be8706e)
2021-02-10 11:03:27 +01:00
Matthijs Mekking
2afaff75ed Use stale on error also when unable to recurse
The 'query_usestale()' function was only called when in
'query_gotanswer()' and an unexpected error occurred. This may have
been "quota reached", and thus we were in some cases returning
stale data on fetch-limits (and if serve-stale enabled of course).

But we can also hit fetch-limits when recursing because we are
following a referral (in 'query_notfound()' and
'query_delegation_recurse()'). Here we should also check for using
stale data in case an error occurred.

Specifically don't check for using stale data when refetching a
zero TTL RRset from cache.

Move the setting of DNS_DBFIND_STALESTART into the 'query_usestale()'
function to avoid code duplication.

(cherry picked from commit 8bcd7fe69e)
2021-02-08 16:10:03 +01:00
Matthijs Mekking
dbf5428629 Only start stale refresh window when resuming
If we did not attempt a fetch due to fetch-limits, we should not start
the stale-refresh-time window.

Introduce a new flag DNS_DBFIND_STALESTART to differentiate between
a resolver failure and unexpected error. If we are resuming, this
indicates a resolver failure, then start the stale-refresh-time window,
otherwise don't start the stale-refresh-time window, but still fall
back to using stale data.

(This commit also wraps some docstrings to 80 characters width)

(cherry picked from commit aabdedeae3)
2021-02-08 16:07:43 +01:00
Matthijs Mekking
809ec0a224 Use stale data also if we are not resuming
Before this change, BIND will only fallback to using stale data if
there was an actual attempt to resolve the query. Then on a timeout,
the stale data from cache becomes eligible.

This commit changes this so that on any unexpected error stale data
becomes eligble (you would still have to have 'stale-answer-enable'
enabled of course).

If there is no stale data, this may return in an error again, so don't
loop on stale data lookup attempts. If the DNS_DBFIND_STALEOK flag is
set, this means we already tried to lookup stale data, so if that is
the case, don't use stale again.

(cherry picked from commit c6fd02aed5)
2021-02-08 16:07:43 +01:00
Mark Andrews
8092b7eec6 Remove redundant 'version == NULL' check
*** CID 318094:  Null pointer dereferences  (REVERSE_INULL)
    /lib/dns/rbtdb.c: 1389 in newversion()
    1383     	version->xfrsize = rbtdb->current_version->xfrsize;
    1384     	RWUNLOCK(&rbtdb->current_version->rwlock, isc_rwlocktype_read);
    1385     	rbtdb->next_serial++;
    1386     	rbtdb->future_version = version;
    1387     	RBTDB_UNLOCK(&rbtdb->lock, isc_rwlocktype_write);
    1388
       CID 318094:  Null pointer dereferences  (REVERSE_INULL)
       Null-checking "version" suggests that it may be null, but it has already been dereferenced on all paths leading to the check.
    1389     	if (version == NULL) {
    1390     		return (result);
    1391     	}
    1392
    1393     	*versionp = version;
    1394

(cherry picked from commit 456d53d1fb)
2021-02-08 16:17:52 +11:00
Mark Andrews
a900d79ea8 Cleanup redundant isc_rwlock_init() result checks
(cherry picked from commit 3b11bacbb7)
2021-02-08 15:13:49 +11:00
Mark Andrews
c2a5b88275 Attempt to silence untrusted loop bound
Assign hit_len + key_len to len and test the result
rather than recomputing and letting the compiler simplify.

    213        isc_region_consume(&region, 2); /* hit length + algorithm */
        9. tainted_return_value: Function uint16_fromregion returns tainted data. [show details]
        10. tainted_data_transitive: Call to function uint16_fromregion with tainted argument *region.base returns tainted data.
        11. tainted_return_value: Function uint16_fromregion returns tainted data.
        12. tainted_data_transitive: Call to function uint16_fromregion with tainted argument *region.base returns tainted data.
        13. var_assign: Assigning: key_len = uint16_fromregion(&region), which taints key_len.
    214        key_len = uint16_fromregion(&region);
        14. lower_bounds: Casting narrower unsigned key_len to wider signed type int effectively tests its lower bound.
        15. Condition key_len == 0, taking false branch.
    215        if (key_len == 0) {
    216                RETERR(DNS_R_FORMERR);
    217        }
        16. Condition !!(_r->length >= _l), taking true branch.
        17. Condition !!(_r->length >= _l), taking true branch.
    218        isc_region_consume(&region, 2);
        18. lower_bounds: Casting narrower unsigned key_len to wider signed type int effectively tests its lower bound.
        19. Condition region.length < (unsigned int)(hit_len + key_len), taking false branch.
    219        if (region.length < (unsigned)(hit_len + key_len)) {
    220                RETERR(DNS_R_FORMERR);
    221        }
    222
        20. lower_bounds: Casting narrower unsigned key_len to wider signed type int effectively tests its lower bound.
        21. Condition _r != 0, taking false branch.
    223        RETERR(mem_tobuffer(target, rr.base, 4 + hit_len + key_len));
        22. lower_bounds: Casting narrower unsigned key_len to wider signed type int effectively tests its lower bound.
        23. var_assign_var: Compound assignment involving tainted variable 4 + hit_len + key_len to variable source->current taints source->current.
    224        isc_buffer_forward(source, 4 + hit_len + key_len);
    225
    226        dns_decompress_setmethods(dctx, DNS_COMPRESS_NONE);

    CID 281461 (#1 of 1): Untrusted loop bound (TAINTED_SCALAR)
        24. tainted_data: Using tainted variable source->active - source->current as a loop boundary.
    Ensure that tainted values are properly sanitized, by checking that their values are within a permissible range.
    227        while (isc_buffer_activelength(source) > 0) {
    228                dns_name_init(&name, NULL);
    229                RETERR(dns_name_fromwire(&name, source, dctx, options, target));
    230        }

(cherry picked from commit 2f946c831a)
2021-02-08 14:05:11 +11:00
Mark Andrews
6da9f238d4 Allow A records below '_spf' labels as recommend by RFC7208
(cherry picked from commit 63c16c8506)
2021-02-03 16:26:25 +01:00
Matthijs Mekking
ce2a37a990 Use NUM_KEYSTATES constant where appropriate
We use the number 4 a lot when working on key states. Better to use
the NUM_KEYSTATES constant instead.

(cherry picked from commit 98ace6d97d)
2021-02-03 15:48:20 +01:00
Matthijs Mekking
a8fba11da9 Cleanup keymgr.c
Three small cleanups:

1. Remove an unused keystr/dst_key_format.
2. Initialize a dst_key_state_t state with NA.
3. Update false comment about local policy (local policy only adds
   barrier on transitions to the RUMOURED state, not the UNRETENTIVE
   state).

(cherry picked from commit 189d9a2d21)
2021-02-03 15:47:40 +01:00
Matthijs Mekking
ceac392e19 Fix DS/DNSKEY hidden or chained functions
There was a bug in function 'keymgr_ds_hidden_or_chained()'.

The funcion 'keymgr_ds_hidden_or_chained()' implements (3e) of rule2
as defined in the "Flexible and Robust Key Rollover" paper. The rules
says: All DS records need to be in the HIDDEN state, or if it is not
there must be a key with its DNSKEY and KRRSIG in OMNIPRESENT, and
its DS in the same state as the key in question. In human langauge,
if all keys have their DS in HIDDEN state you can do what you want,
but if a DS record is available to some validators, there must be
a chain of trust for it.

Note that the barriers on transitions first check if the current
state is valid, and then if the next state is valid too. But
here we falsely updated the 'dnskey_omnipresent' (now 'dnskey_chained')
with the next state. The next state applies to 'key' not to the state
to be checked. Updating the state here leads to (true) always, because
the key that will move its state will match the falsely updated
expected state. This could lead to the assumption that Key 2 would be
a valid chain of trust for Key 1, while clearly the presence of any
DS is uncertain.

The fix here is to check if the DNSKEY and KRRSIG are in OMNIPRESENT
state for the key that does not have its DS in the HIDDEN state, and
only if that is not the case, ensure that there is a key with the same
algorithm, that provides a valid chain of trust, that is, has its
DNSKEY, KRRSIG, and DS in OMNIPRESENT state.

The changes in 'keymgr_dnskey_hidden_or_chained()' are only cosmetical,
renaming 'rrsig_omnipresent' to 'rrsig_chained' and removing the
redundant initialization of the DST_KEY_DNSKEY expected state to NA.

(cherry picked from commit 291bcc3721)
2021-02-03 15:47:30 +01:00
Matthijs Mekking
6ff0e99fa7 Update keymgr_key_is_successor() calls
The previous commit changed the function definition of
'keymgr_key_is_successor()', this commit updates the code where
this function is called.

In 'keymgr_key_exists_with_state()' the logic is also updated slightly
to become more readable. First handle the easy cases:
- If the key does not match the state, continue with the next key.
- If we found a key with matching state, and there is no need to
  check the successor relationship, return (true).
- Otherwise check the successor relationship.

In 'keymgr_key_has_successor()' it is enough to check if a key has
a direct successor, so instead of calling 'keymgr_key_is_successor()',
we can just check 'keymgr_direct_dep()'.

In 'dns_keymgr_run()', we want to make sure that there is no
dependency on the keys before retiring excess keys, so replace
'keymgr_key_is_successor()' with 'keymgr_dep()'.

(cherry picked from commit 600915d1b2)
2021-02-03 15:47:23 +01:00
Matthijs Mekking
5e40515671 Implement Equation(2) of "Flexible Key Rollover"
So far the key manager could only deal with two keys in a rollover,
because it used a simplified version of the successor relationship
equation from "Flexible and Robust Key Rollover" paper. The simplified
version assumes only two keys take part in the key rollover and it
for that it is enough to check the direct relationship between two
keys (is key x the direct predecessor of key z and is key z the direct
successor of key x?).

But when a third key (or more keys) comes into the equation, the key
manager would assume that one key (or more) is redundant and removed
it from the zone prematurely.

Fix by implementing Equation(2) correctly, where we check for
dependencies on keys:

z ->T x: Dep(x, T) = ∅ ∧
         (x ∈ Dep(z, T) ∨
          ∃ y ∈ Dep(z, T)(y != z ∧ y ->T x ∧ DyKyRySy = DzKzRzSz))

This says: key z is a successor of key x if:
- key x depends on key z if z is a direct successor of x,
- or if there is another key y that depends on key z that has identical
  key states as key z and key y is a successor of key x.
- Also, key x may not have any other keys depending on it.

This is still a simplified version of Equation(2) (but at least much
better), because the paper allows for a set of keys to depend on a
key. This is defined as the set Dep(x, T). Keys in the set Dep(x, T)
have a dependency on key x for record type T. The BIND implementation
can only have one key in the set Dep(x, T). The function
'keymgr_dep()' stores this key in 'uint32_t *dep' if there is a
dependency.

There are two scenarios where multiple keys can depend on a single key:

1. Rolling keys is faster than the time required to finish the
   rollover procedure. This scenario is covered by the recursive
   implementation, and checking for a chain of direct dependencies
   will suffice.

2. Changing the policy, when a zone is requested to be signed with
   a different key length for example. BIND 9 will not mark successor
   relationships in this case, but tries to move towards the new
   policy. Since there is no successor relationship, the rules are
   even more strict, and the DNSSEC reconfiguration is actually slower
   than required.

Note: this commit breaks the build, because the function definition
of 'keymgr_key_is_successor' changed. This will be fixed in the
following commit.

(cherry picked from commit cc38527b63)
2021-02-03 15:47:14 +01:00
Matthijs Mekking
3f6dafe1f4 Remove initialize goal code
Since keys now have their goals initialized in 'keymgr_key_init()',
remove this redundant piece of code in 'keymgr_key_run()'.

(cherry picked from commit 82632fa6d9)
2021-02-03 08:42:51 +01:00
Matthijs Mekking
4170288a91 Correctly initialize old key with state file
The 'key_init()' function is used to initialize a state file for keys
that don't have one yet. This can happen if you are migrating from a
'auto-dnssec' or 'inline-signing' to a 'dnssec-policy' configuration.

It did not look at the "Inactive" and "Delete" timing metadata and so
old keys left behind in the key directory would also be considered as
a possible active key. This commit fixes this and now explicitly sets
the key goal to OMNIPRESENT for keys that have their "Active/Publish"
timing metadata in the past, but their "Inactive/Delete" timing
metadata in the future. If the "Inactive/Delete" timing metadata is
also in the past, the key goal is set to HIDDEN.

If the "Inactive/Delete" timing metadata is in the past, also the
key states are adjusted to either UNRETENTIVE or HIDDEN, depending on
how far in the past the metadata is set.

(cherry picked from commit 76cf72e65a)
2021-02-03 08:42:32 +01:00
Diego Fronza
51663408bc Fix race condition on check_stale_header
This commit fix a race that could happen when two or more threads have
failed to refresh the same RRset, the threads could simultaneously
attempt to update the header->last_refresh_fail_ts field in
check_stale_header, a field used to implement stale-refresh-time.

By making this field atomic we avoid such race.

(cherry picked from commit c75575e350)
2021-01-29 15:29:00 +01:00
Matthijs Mekking
99c72bf5da Update code flow in query.c wrt stale data
First of all, there was a flaw in the code related to the
'stale-refresh-time' option. If stale answers are enabled, and we
returned stale data, then it was assumed that it was because we were
in the 'stale-refresh-time' window. But now we could also have returned
stale data because of a 'stale-answer-client-timeout'. To fix this,
introduce a rdataset attribute DNS_RDATASETATTR_STALE_WINDOW to
indicate whether the stale cache entry was returned because the
'stale-refresh-time' window is active.

Second, remove the special case handling when the result is
DNS_R_NCACHENXRRSET. This can be done more generic in the code block
when dealing with stale data.

Putting all stale case handling in the code block when dealing with
stale data makes the code more easy to follow.

Update documentation to be more verbose and to match then new code
flow.

(cherry picked from commit fa0c9280d2)
2021-01-29 10:43:41 +01:00
Diego Fronza
0e62c53c5b Extracted common function from query_lookup and query_refresh_rrset
Both functions employed the same code lines to allocate query context
buffers, which are used to store query results, so this shared portion
of code was extracted out to a new function, qctx_prepare_buffers.

Also, this commit uses qctx_init to initialize the query context whitin
query_refresh_rrset function.

(cherry picked from commit 966060c03b)
2021-01-29 10:43:27 +01:00
Diego Fronza
5cbb28a40e Small optimization in query_usestale
This commit makes the code in query_usestale easier to follow, it also
doesn't attach/detach to the database if stale answers are not enabled.

(cherry picked from commit f89ac07b28)
2021-01-29 10:41:39 +01:00
Diego Fronza
8324c3ddfe Allow stale data to be used before name resolution
This commit allows stale RRset to be used (if available) for responding
a query, before an attempt to refresh an expired, or otherwise resolve
an unavailable RRset in cache is made.

For that to work, a value of zero must be specified for
stale-answer-client-timeout statement.

To better understand the logic implemented, there are three flags being
used during database lookup and other parts of code that must be
understood:

. DNS_DBFIND_STALEOK: This flag is set when BIND fails to refresh a
  RRset due to timeout (resolver-query-timeout), its intent is to
  try to look for stale data in cache as a fallback, but only if
  stale answers are enabled in configuration.

  This flag is also used to activate stale-refresh-time window, since it
  is the only way the database knows that a resolution has failed.

. DNS_DBFIND_STALEENABLED: This flag is used as a hint to the database
  that it may use stale data. It is always set during query lookup if
  stale answers are enabled, but only effectively used during
  stale-refresh-time window. Also during this window, the resolver will
  not try to resolve the query, in other words no attempt to refresh the
  data in cache is made when the stale-refresh-time window is active.

. DNS_DBFIND_STALEONLY: This new introduced flag is used when we want
  stale data from the database, but not due to a failure in resolution,
  it also doesn't require stale-refresh-time window timer to be active.
  As long as there is a stale RRset available, it should be returned.
  It is mainly used in two situations:

    1. When stale-answer-client-timeout timer is triggered: in that case
       we want to know if there is stale data available to answer the
       client.
    2. When stale-answer-client-timeout value is set to zero: in that
       case, we also want to know if there is some stale RRset available
       to promptly answer the client.

We must also discern between three situations that may happen when
resolving a query after the addition of stale-answer-client-timeout
statement, and how to handle them:

	1. Are we running query_lookup() due to stale-answer-client-timeout
       timer being triggered?

       In this case, we look for stale data, making use of
       DNS_DBFIND_STALEONLY flag. If a stale RRset is available then
       respond the client with the data found, mark this query as
       answered (query attribute NS_QUERYATTR_ANSWERED), so when the
       fetch completes the client won't be answered twice.

       We must also take care of not detaching from the client, as a
       fetch will still be running in background, this is handled by the
       following snippet:

       if (!QUERY_STALEONLY(&client->query)) {
           isc_nmhandle_detach(&client->reqhandle);
       }

       Which basically tests if DNS_DBFIND_STALEONLY flag is set, which
       means we are here due to a stale-answer-client-timeout timer
       expiration.

    2. Are we running query_lookup() due to resolver-query-timeout being
       triggered?

       In this case, DNS_DBFIND_STALEOK flag will be set and an attempt
       to look for stale data will be made.
       As already explained, this flag is algo used to activate
       stale-refresh-time window, as it means that we failed to refresh
       a RRset due to timeout.
       It is ok in this situation to detach from the client, as the
       fetch is already completed.

    3. Are we running query_lookup() during the first time, looking for
       a RRset in cache and stale-answer-client-timeout value is set to
       zero?

       In this case, if stale answers are enabled (probably), we must do
       an initial database lookup with DNS_DBFIND_STALEONLY flag set, to
       indicate to the database that we want stale data.

       If we find an active RRset, proceed as normal, answer the client
       and the query is done.

       If we find a stale RRset we respond to the client and mark the
       query as answered, but don't detach from the client yet as an
       attempt in refreshing the RRset will still be made by means of
       the new introduced function 'query_resolve'.

       If no active or stale RRset is available, begin resolution as
       usual.

(cherry picked from commit e219422575)
2021-01-29 10:39:09 +01:00
Diego Fronza
0aebad96b5 Added option for disabling stale-answer-client-timeout
This commit allows to specify "disabled" or "off" in
stale-answer-client-timeout statement. The logic to support this
behavior will be added in the subsequent commits.

This commit also ensures an upper bound to stale-answer-client-timeout
which equals to one second less than 'resolver-query-timeout'.

(cherry picked from commit 0ad6f594f6)
2021-01-29 10:38:58 +01:00
Diego Fronza
3478794a5d Add stale-answer-client-timeout option
The general logic behind the addition of this new feature works as
folows:

When a client query arrives, the basic path (query.c / ns_query_recurse)
was to create a fetch, waiting for completion in fetch_callback.

With the introduction of stale-answer-client-timeout, a new event of
type DNS_EVENT_TRYSTALE may invoke fetch_callback, whenever stale
answers are enabled and the fetch took longer than
stale-answer-client-timeout to complete.

When an event of type DNS_EVENT_TRYSTALE triggers fetch_callback, we
must ensure that the folowing happens:

1. Setup a new query context with the sole purpose of looking up for
   stale RRset only data, for that matters a new flag was added
   'DNS_DBFIND_STALEONLY' used in database lookups.

    . If a stale RRset is found, mark the original client query as
      answered (with a new query attribute named NS_QUERYATTR_ANSWERED),
      so when the fetch completion event is received later, we avoid
      answering the client twice.

    . If a stale RRset is not found, cleanup and wait for the normal
      fetch completion event.

2. In ns_query_done, we must change this part:
	/*
	 * If we're recursing then just return; the query will
	 * resume when recursion ends.
	 */
	if (RECURSING(qctx->client)) {
		return (qctx->result);
	}

   To this:

	if (RECURSING(qctx->client) && !QUERY_STALEONLY(qctx->client)) {
		return (qctx->result);
	}

   Otherwise we would not proceed to answer the client if it happened
   that a stale answer was found when looking up for stale only data.

When an event of type DNS_EVENT_FETCHDONE triggers fetch_callback, we
proceed as before, resuming query, updating stats, etc, but a few
exceptions had to be added, most important of which are two:

1. Before answering the client (ns_client_send), check if the query
   wasn't already answered before.

2. Before detaching a client, e.g.
   isc_nmhandle_detach(&client->reqhandle), ensure that this is the
   fetch completion event, and not the one triggered due to
   stale-answer-client-timeout, so a correct call would be:
   if (!QUERY_STALEONLY(client)) {
        isc_nmhandle_detach(&client->reqhandle);
   }

Other than these notes, comments were added in code in attempt to make
these updates easier to follow.

(cherry picked from commit 171a5b7542)
2021-01-29 10:38:32 +01:00
Diego Fronza
7bf8950a0a Added dns_view_staleanswerenabled() function
Since it takes a couple lines of code to check whether stale answers
are enabled for a given view, code was extracted out to a proper
function.

(cherry picked from commit 74840ec50b)
2021-01-29 10:35:26 +01:00
Diego Fronza
f3bd27373d Avoid iterating name twice when constructing fctx->info
This is a minor performance improvement, we store the result of the
first call to strlcat to use as an offset in the next call when
constructing fctx->info string.

(cherry picked from commit 49c40827f6)
2021-01-29 10:35:17 +01:00
Mark Andrews
6a0b751555 Require 'ctx' to be non-NULL in cfg_acl_fromconfig{,2}
(cherry picked from commit a8b55992a8)
2021-01-28 13:43:47 +11:00
Mark Andrews
afc75de0cc Optimise dnssec-verify
dns_dnssec_keyfromrdata() only needs to be called once per DNSKEY
rather than once per verification attempt.

(cherry picked from commit c75b325832)
2021-01-28 12:18:31 +11:00
Mark Andrews
b416d8fcdf Improve the diagnostic 'rndc retransfer' error message
(cherry picked from commit dd3520ae41)
2021-01-28 09:44:26 +11:00
Matthijs Mekking
4a36b6d918 Make opensslecdsa_parse use fromlabel
When 'opensslecdsa_parse()' encounters a label tag in the private key
file, load the private key with 'opensslecdsa_fromlabel()'. Otherwise
load it from the private structure.

This was attempted before with 'load_privkey()' and 'uses_engine()',
but had the same flaw as 'opensslecdsa_fromlabel()' had previously,
that is getting the private and public key separately, juggling with
pointers between EC_KEY and EVP_PKEY, did not create a valid
cryptographic key that could be used for signing.

(cherry picked from commit 57ac70ad46)
2021-01-26 15:04:59 +01:00
Matthijs Mekking
97185ecac2 Simplify opensslecdsa_fromlabel
The 'opensslecdsa_fromlabel()' function does not need to get the
OpenSSL engine twice to load the private and public key. Also no need
to call 'dst_key_to_eckey()' as the EC_KEY can be derived from the
loaded EVP_PKEY's.

Add some extra checks to ensure the key has the same base id and curve
(group nid) as the dst key.

Since we already have the EVP_PKEY, no need to call 'finalize_eckey()',
instead just set the right values in the key structure.

(cherry picked from commit 393052d6ff)
2021-01-26 15:04:51 +01:00
Matthijs Mekking
f555cec0af Replace EVP_DigestFinal with EVP_DigestFinal_ex
The openssl docs claim that EVP_DigestFinal() is obsolete and that
one should use EVP_DigestFinal_ex() instead.

(cherry picked from commit 1fcd0ef8bd)
2021-01-26 15:04:38 +01:00
Matthijs Mekking
9e2ea5efb1 Don't set pubkey if eckey already has public key
The 'ecdsa_check()' function tries to correctly set the public key
on the eckey, but this should be skipped if the public key is
retrieved via the private key.

(cherry picked from commit 06b9724152)
2021-01-26 15:04:21 +01:00
Matthijs Mekking
e3acfb44d5 ECDSA code should not use RSA label
The 'opensslecdsa_tofile()' function tags the label as an RSA label,
that is a copy paste error and should be of course an ECDSA label.

(cherry picked from commit 46afeca8bf)
2021-01-26 15:04:11 +01:00
Matthijs Mekking
8b25d3ab57 Correctly update pointers to pubkey and privkey
The functions 'load_pubkey_from_engine()' and
'load_privkey_from_engine()' did not correctly store the pointers.

Update both functions to add 'EC_KEY_set_public_key()' and
'EC_KEY_set_private_key()' respectively, so that the pointers to
the public and private keys survive the "load from engine" functions.

(cherry picked from commit 01239691a1)
2021-01-26 15:04:03 +01:00
Matthijs Mekking
f66df9f1b7 load_pubkey_from_engine() should load public key
The 'function load_pubkey_from_engine()' made a call to the libssl
function 'ENGINE_load_private_key'.  This is a copy paste error and
should be 'ENGINE_load_public_key'.

(cherry picked from commit 370285a62d)
2021-01-26 15:03:43 +01:00
Evan Hunt
077e2c2a74 add serial number to "transfer ended" log messages 2021-01-26 12:38:32 +01:00