Commit graph

9952 commits

Author SHA1 Message Date
Matthijs Mekking
acc95d4e1d Don't servfail on staleonly lookups
When a staleonly lookup doesn't find a satisfying answer, it should
not try to respond to the client.

This is not true when the initial lookup is staleonly (that is when
'stale-answer-client-timeout' is set to 0), because no resolver fetch
has been created at this point. In this case continue with the lookup
normally.

(cherry picked from commit f8b7b597e9)
2021-02-25 12:07:34 +01:00
Matthijs Mekking
84deb57bc3 Don't allow recursion on staleonly lookups
Fix a crash that can happen in the following scenario:

A client request is received. There is no data for it in the cache,
(not even stale data). A resolver fetch is created as part of
recursion.

Some time later, the fetch still hasn't completed, and
stale-answer-client-timeout is triggered. A staleonly lookup is
started. It will also find no data in the cache.

So 'query_lookup()' will call 'query_gotanswer()' with ISC_R_NOTFOUND,
so this will call 'query_notfound()' and this will start recursion.

We will eventually end up in 'ns_query_recurse()' and that requires
the client query fetch to be NULL:

    REQUIRE(client->query.fetch == NULL);

If the previously started fetch is still running this assertion
fails.

The crash is easily prevented by not requiring recursion for
staleonly lookups.

Also remove a redundant setting of the staleonly flag at the end of
'query_lookup_staleonly()' before destroying the query context.

Add a system test to catch this case.

(cherry picked from commit 9e061faaae)
2021-02-25 12:07:27 +01:00
Matthijs Mekking
ddfb9ea8c1 Add tests for NSEC3 on dynamic zones
GitLab issue #2498 is a bug report on NSEC3 with dynamic zones. Tests
for it in the nsec3 system test directory were missing.

(cherry picked from commit 0c0f10b53f)
2021-02-25 10:55:51 +01:00
Mark Andrews
c5ad174129 Silence CID 320481: Null pointer dereferences
*** CID 320481:  Null pointer dereferences  (REVERSE_INULL)
    /bin/tests/wire_test.c: 261 in main()
    255     			process_message(input);
    256     		}
    257     	} else {
    258     		process_message(input);
    259     	}
    260
       CID 320481:  Null pointer dereferences  (REVERSE_INULL)
       Null-checking "input" suggests that it may be null, but it has already been dereferenced on all paths leading to the check.
    261     	if (input != NULL) {
    262     		isc_buffer_free(&input);
    263     	}
    264
    265     	if (printmemstats) {
    266     		isc_mem_stats(mctx, stdout);

(cherry picked from commit 658c950d7b)
2021-02-24 00:08:57 +11:00
Matthijs Mekking
9b4e067206 Minor kasp test fixes
Two minor fixes in the kasp system test:

1. A wrong comment in ns3/setup.sh (we are subtracting 2 hours, not
   adding them).
2. 'get_keyids' used bad parameters "$1" "$2" when 'check_numkeys'
   failed. Also, 'check_numkeys' can use $DIR, $ZONE, and $NUMKEYS
   directly, no need to pass them.

(cherry picked from commit 5be26898c0)
2021-02-23 09:19:23 +01:00
Matthijs Mekking
fc9dcbf419 Test purge-keys option
Add some more zones to the kasp system test to test the 'purge-keys'
option. Three zones test that the predecessor key files are removed
after the purge keys interval, one test checks that the key files
are retained if 'purge-keys' is disabled. For that, we change the
times to 90 days in the past (the default value for 'purge-keys').

(cherry picked from commit 6333ff15f0)
2021-02-23 09:19:11 +01:00
Matthijs Mekking
45dcabf411 Add purge-keys config option
Add a new option 'purge-keys' to 'dnssec-policy' that will purge key
files for deleted keys. The option determines how long key files
should be retained prior to removing the corresponding files from
disk.

If set to 0, the option is disabled and 'named' will not remove key
files from disk.

(cherry picked from commit 313de3a7e2)
2021-02-23 09:18:55 +01:00
Michal Nowak
5ded078daa
Use FEATURETEST variable instead of a path
feature-test tool location needs to be determined by its associated
variable; otherwise, the tool is not found on Windows:

    setup.sh: line 22: ../feature-test: No such file or directory

(cherry picked from commit 102f012631)
2021-02-18 15:47:57 +01:00
Matthijs Mekking
1ffe0accf5 Fix eddsa system test
Use the correct conf.sh setup commands in ns3/sign.sh
2021-02-18 08:37:48 +00:00
Michal Nowak
c65c3d5153
Prevent Git to expand $systest
CentOS 8 "git status" unexpectedly expands search directory "tsig" to
also search in the "tsiggss" directory, thus incorrectly identifying
files as "not removed" in the "tsig" directory:

$ git status -su --ignored tsig
$ touch tsiggss/ns1/{named.run,named.memstats}
$ git status -su --ignored tsig
!! tsiggss/ns1/named.memstats
!! tsiggss/ns1/named.run

(cherry picked from commit f310b75250)
2021-02-18 08:20:54 +01:00
Michal Nowak
78c5a80817
Clean omitted files from system tests
Any CI job:
- I:dnssec:file dnssec/ns1/trusted.keys not removed
- I:rpzrecurse:file rpzrecurse/ns3/named.run.prev not removed

system:gcc:sid:amd64:
- I🪞file mirror/ns3/_default.nzf not removed

system:gcc:xenial:amd64:
- I:shutdown:file shutdown/.cache/v/cache/lastfailed not removed

(cherry picked from commit 14a104d121)
2021-02-18 08:20:54 +01:00
Michal Nowak
382ace6db6
Add system test name to "file not removed" info
(cherry picked from commit 10bf725ee2)
2021-02-18 08:20:54 +01:00
Michal Nowak
59499ecc3a
Merge UNTESTED and SKIPPED system test results
Descriptions of UNTESTED and SKIPPED system test results are very
similar to one another and it may be confusing when to pick one and
when the other. Merging these two system test results removes the
confusion.

(cherry picked from commit 29d7c6e449)
2021-02-17 12:06:33 +01:00
Mark Andrews
d51b78c85b Stop including <gssapi.h> from <dst/gssapi.h> header
The only reason for including the gssapi.h from the dst/gssapi.h header
was to get the typedefs of gss_cred_id_t and gss_ctx_id_t.  Instead of
using those types directly this commit introduces dns_gss_cred_id_t and
dns_gss_ctx_id_t types that are being used in the public API and
privately retyped to their counterparts when we actually call the gss
api.

This also conceals the gssapi headers, so users of the libdns library
doesn't have to add GSSAPI_CFLAGS to the Makefile when including libdns
dst API.
2021-02-16 12:08:21 +11:00
Ondřej Surý
4bbe3e75de Stop including dnstap headers from <dns/dnstap.h>
The <fstrm.h> and <protobuf-c/protobuf-c.h> headers are only directly
included where used and we stopped exposing those headers from libdns
headers.
2021-02-16 12:08:21 +11:00
Mark Andrews
bf5aac225b Stop including <lmdb.h> from <dns/lmdb.h>
The lmdb.h header doesn't have to be included from the dns/lmdb.h
header as it can be separately included where used.  This stops
exposing the inclusion of lmdb.h from the libdns headers.
2021-02-16 12:08:21 +11:00
Mark Andrews
b8fc8742e5 Re-order include directories
${FSTRM_CFLAGS} ${PROTOBUF_C_CFLAGS} ${OPENSSL_CFLAGS} ${LMDB_CFLAGS}
need to appear after all directories in the build tree.
2021-02-16 12:08:21 +11:00
Diego Fronza
80c1a44643 Test reconfig after adding inline signed zones won't crash named
This test ensures that named won't crash after many inline-signed zones
are added to configurarion, followed by a rndc reconfig.
2021-02-15 11:53:24 -03:00
Diego Fronza
d89a8bf696 Fix dangling references to outdated views after reconfig
This commit fix a leak which was happening every time an inline-signed
zone was added to the configuration, followed by a rndc reconfig.

During the reconfig process, the secure version of every inline-signed
zone was "moved" to a new view upon a reconfig and it "took the raw
version along", but only once the secure version was freed (at shutdown)
was prev_view for the raw version detached from, causing the old view to
be released as well.

This caused dangling references to be kept for the previous view, thus
keeping all resources used by that view in memory.
2021-02-15 11:52:50 -03:00
Matthijs Mekking
40e56b0dcc Refactor ecdsa system test
Similar to eddsa system test.

(cherry picked from commit 650b0d4691)
2021-02-09 16:06:50 +01:00
Matthijs Mekking
4538d8ddf2 Refactor eddsa system test
Test for Ed25519 and Ed448. If both algorithms are not supported, skip
test. If only one algorithm is supported, run test, skip the
unsupported algorithm. If both are supported, run test normally.

Create new ns3. This will test Ed448 specifically, while now ns2 only
tests Ed25519. This moves some files from ns2/ to ns3/.

(cherry picked from commit 8bf31d0592)
2021-02-09 16:06:50 +01:00
Matthijs Mekking
5af3a46ac0 Fix testcrypto.sh
Testing Ed448 was actually testing Ed25519.

(cherry picked from commit 572d7ec3b7)
2021-02-09 15:25:25 +01:00
Matthijs Mekking
70fbbedc24 Adjust serve-stale test
The number of queries to use in the burst can be reduced, as we have
a very low fetch limit of 1.

The dig command in 'wait_for_fetchlimits()' should time out sooner as
we expect a SERVFAIL to be returned promptly.

Enabling serve-stale can be done before hitting fetch-limits. This
reduces the chance that the resolver queries time out and fetch count
is reset. The chance of that happening is already slim because
'resolver-query-timeout' is 10 seconds, but better to first let the
data become stale rather than doing that while attempting to resolve.

(cherry picked from commit 00f575e7ef)
2021-02-08 16:10:12 +01:00
Matthijs Mekking
2e28e5587e Add test for serve-stale /w fetch-limits
Add a test case when fetch-limits are reached and we have stale data
in cache.

This test starts with a positive answer for 'data.example/TXT' in
cache.

1. Reload named.conf to set fetch limits.
2. Disable responses from the authoritative server.
3. Now send a batch of queries to the resolver, until hitting the
   fetch limits. We can detect this by looking at the response RCODE,
   at some point we will see SERVFAIL responses.
4. At that point we will turn on serve-stale.
5. Clients should see stale answers now.
6. An incoming query should not set the stale-refresh-time window,
   so a following query should still get a stale answer because of a
   resolver failure (and not because it was in the stale-refresh-time
   window).

(cherry picked from commit 11b74fc176)
2021-02-08 16:07:43 +01:00
Mark Andrews
a900d79ea8 Cleanup redundant isc_rwlock_init() result checks
(cherry picked from commit 3b11bacbb7)
2021-02-08 15:13:49 +11:00
Mark Andrews
7976a7264a Check that A record is accepted with _spf label present
(cherry picked from commit a3b2b86e7f)
2021-02-03 16:26:32 +01:00
Matthijs Mekking
76f7f598b3 Add kasp test todo for [#2375]
This bugfix has been manually verified but is missing a unit test.
Created GL #2471 to track this.

(cherry picked from commit 189f5a3f28)
2021-02-03 15:48:29 +01:00
Matthijs Mekking
314accf7ef Update legacy-keys kasp test
The 'legacy-keys.kasp' test checks that a zone with key files but not
yet state files is signed correctly. This test is expanded to cover
the case where old key files still exist in the key directory. This
covers bug #2406 where keys with the "Delete" timing metadata are
picked up by the keymgr as active keys.

Fix the 'legacy-keys.kasp' test, by creating the right key files
(for zone 'legacy-keys.kasp', not 'legacy,kasp').

Use a unique policy for this zone, using shorter lifetimes.

Create two more keys for the zone, and use 'dnssec-settime' to set
the timing metadata in the past, long enough ago so that the keys
should not be considered by the keymgr.

Update the 'key_unused()' test function, and consider keys with
their "Delete" timing metadata in the past as unused.

Extend the test to ensure that the keys to be used are not the old
predecessor keys (with their "Delete" timing metadata in the past).

Update the test so that the checks performed are consistent with the
newly configured policy.

(cherry picked from commit d4b2b7072d)
2021-02-03 08:41:00 +01:00
Michal Nowak
7aa33bf5e4
Add README.md file to rsabigexponent system test
This README.md describes why is bigkey needed.

(cherry picked from commit a247f24dfa)
2021-01-29 15:54:07 +01:00
Matthijs Mekking
dba01187e3 Rewrap comments to 80 char width serve-stale test
(cherry picked from commit d8c6655d7d)
2021-01-29 10:43:51 +01:00
Diego Fronza
b89fc52cd1 Add documentation for stale-answer-client-timeout
(cherry picked from commit 6ab9070457)
2021-01-29 10:39:31 +01:00
Diego Fronza
bea63000db Add system tests for stale-answer-client-timeout
This commit add 4 tests for the new option:
	1. Test default configuration of stale-answer-client-timeout, a
	   value of 1.8 seconds, with stale-refresh-time disabled.

	2. Test disabling of stale-answer-client-timeout.

	3. Test stale-answer-client-timeout with a value of zero, in this
	   case we take advantage of a log entry which shows that a stale
	   answer was promptly used before an attempt to refresh the RRset
	   is made. We also check, by activating a disabled authoritative
	   server, that the RRset was successfully refreshed after that.

	4. Test stale-answer-client-timeout 0 with stale-refresh-time 4, in
	   this test we want to ensure a couple things:

	   - If we have a stale RRSet entry in cache, a request must be
		 promptly answered with this data, while BIND must also attempt
		 to refresh the RRSet in background.

	   - If the attempt to refresh the RRSet times out, the RRSet must
		 have its stale-refresh-time window activated.

	   - If a new request for the same RRSet arrives, it must be
		 promptly answered with stale data due to stale-refresh-time
		 being active for this RRSet, in this case no attempt to refresh
		 the RRSet is made.

	   - Enable authoritative server, ensure that the RRSet was not
		 refreshed, to honor stale-refresh-time.

	   - Wait for stale-refresh-window time pass, send another request
		 for the same RRSet, this time we expect the answer to be the
		 stale entry in cache being hit due to
		 stale-answer-client-timeout 0.

	    - Send another request, this time we expect the answer to be an
		  active RRSet, since it must have been refreshed during the
		  previous request.

(cherry picked from commit 35fd039d03)
2021-01-29 10:39:20 +01:00
Diego Fronza
0aebad96b5 Added option for disabling stale-answer-client-timeout
This commit allows to specify "disabled" or "off" in
stale-answer-client-timeout statement. The logic to support this
behavior will be added in the subsequent commits.

This commit also ensures an upper bound to stale-answer-client-timeout
which equals to one second less than 'resolver-query-timeout'.

(cherry picked from commit 0ad6f594f6)
2021-01-29 10:38:58 +01:00
Diego Fronza
93ca968644 Adjusted serve-stale test
After the addition of stale-answer-client-timeout a test was broken due
to the following behavior expected by the test.

1. Prime cache data.example txt.
2. Disable authoritative server.
3. Send a query for data.example txt.
4. Recursive server will timeout and answer from cache with stale RRset.
5. Recursive server will activate stale-refresh-time due to the previous
   failure in attempting to refresh the RRset.
6. Send a query for data.example txt.
7. Expect stale answer from cache due to stale-refresh-time
window being active, even if authoritative server is up.

Problem is that in step 4, due to the new option
stale-answer-client-timeout, recursive server will answer with stale
data before the actual fetch completes.

Since the original fetch is still running in background, if we re-enable
the authoritative server during that time, the RRset will actually be
successfully refreshed, and stale-refresh-window will not be activated.

The next queries will fail because they expect the TTL of the RRset to
match the one in the stale cache, not the one just refreshed.

To solve this, we explicitly disable stale-answer-client-timeout for
this test, as it's not the feature we are interested in testing here
anyways.

(cherry picked from commit a12bf4b61b)
2021-01-29 10:38:47 +01:00
Diego Fronza
3478794a5d Add stale-answer-client-timeout option
The general logic behind the addition of this new feature works as
folows:

When a client query arrives, the basic path (query.c / ns_query_recurse)
was to create a fetch, waiting for completion in fetch_callback.

With the introduction of stale-answer-client-timeout, a new event of
type DNS_EVENT_TRYSTALE may invoke fetch_callback, whenever stale
answers are enabled and the fetch took longer than
stale-answer-client-timeout to complete.

When an event of type DNS_EVENT_TRYSTALE triggers fetch_callback, we
must ensure that the folowing happens:

1. Setup a new query context with the sole purpose of looking up for
   stale RRset only data, for that matters a new flag was added
   'DNS_DBFIND_STALEONLY' used in database lookups.

    . If a stale RRset is found, mark the original client query as
      answered (with a new query attribute named NS_QUERYATTR_ANSWERED),
      so when the fetch completion event is received later, we avoid
      answering the client twice.

    . If a stale RRset is not found, cleanup and wait for the normal
      fetch completion event.

2. In ns_query_done, we must change this part:
	/*
	 * If we're recursing then just return; the query will
	 * resume when recursion ends.
	 */
	if (RECURSING(qctx->client)) {
		return (qctx->result);
	}

   To this:

	if (RECURSING(qctx->client) && !QUERY_STALEONLY(qctx->client)) {
		return (qctx->result);
	}

   Otherwise we would not proceed to answer the client if it happened
   that a stale answer was found when looking up for stale only data.

When an event of type DNS_EVENT_FETCHDONE triggers fetch_callback, we
proceed as before, resuming query, updating stats, etc, but a few
exceptions had to be added, most important of which are two:

1. Before answering the client (ns_client_send), check if the query
   wasn't already answered before.

2. Before detaching a client, e.g.
   isc_nmhandle_detach(&client->reqhandle), ensure that this is the
   fetch completion event, and not the one triggered due to
   stale-answer-client-timeout, so a correct call would be:
   if (!QUERY_STALEONLY(client)) {
        isc_nmhandle_detach(&client->reqhandle);
   }

Other than these notes, comments were added in code in attempt to make
these updates easier to follow.

(cherry picked from commit 171a5b7542)
2021-01-29 10:38:32 +01:00
Mark Andrews
f217a0cbae Stop xmlFreeTextWriter being called twice
xmlFreeTextWriter could be called twice if xmlDocDumpFormatMemoryEnc
failed.

(cherry picked from commit b5cf54252a)
2021-01-28 21:42:44 +00:00
Mark Andrews
abb2eb9d97 Add a named acl example
(cherry picked from commit 3dee62cfa5)
2021-01-28 13:43:47 +11:00
Mark Andrews
85318b521d Pass an afg_aclconfctx_t structure to cfg_acl_fromconfig
in named_zone_inlinesigning.  A NULL pointer does not work.

(cherry picked from commit 2b3fcd7156)
2021-01-28 13:43:47 +11:00
Mark Andrews
a30675998b Check that 'nsupdate -y' works for all HMAC algorithms
(cherry picked from commit 4b01ba44ea)
2021-01-28 12:57:03 +11:00
Mark Andrews
52b73db20b Check 'rndc retransfer' of primary error message
(cherry picked from commit 8f36b8567a)
2021-01-28 09:44:26 +11:00
Mark Andrews
b416d8fcdf Improve the diagnostic 'rndc retransfer' error message
(cherry picked from commit dd3520ae41)
2021-01-28 09:44:26 +11:00
Mark Andrews
702a00d10e Report unknown dash option during the pre-parse phase
(cherry picked from commit 3361c0d6f8)
2021-01-26 14:18:54 +01:00
Evan Hunt
62202b0e6d prevent ixfr/ns1 being removed 2021-01-26 12:38:32 +01:00
Evan Hunt
9529d1ed0d add a system test for AXFR fallback when max-ixfr-ratio is exceeded
also cleaned up the ixfr system test:

- use retry_quiet when applicable
- use scripts to generate test zones
- improve consistency
2021-01-26 12:38:32 +01:00
Evan Hunt
57aadd6cea add syntax and setter/getter functions to configure max-ixfr-ratio 2021-01-26 12:38:32 +01:00
Evan Hunt
0a1e1ead94 check whether taskset works before running cpu test
the taskset command used for the cpu system test seems
to be failing under vmware, causing a test failure. we
can try the taskset command and skip the test if it doesn't
work.

(cherry picked from commit a8a49bb783)
2021-01-20 15:44:31 -08:00
Matthijs Mekking
87b44b59c8 Update documentation on -E option
The -E option does not default to pkcs11 if --with-pkcs11 is set,
but always needs to be set explicitly.

(cherry picked from commit 0536375d4cf61c9b570a32e808dde78a7ef859bf)
2021-01-19 09:06:01 +01:00
Matthijs Mekking
57c6017d91 Fix control flow issue CID 314969 in zoneconf.c
Coverity Scan identified the following issue in bin/named/zoneconf.c:

    *** CID 314969:  Control flow issues  (DEADCODE)
    /bin/named/zoneconf.c: 2212 in named_zone_inlinesigning()

    if (!inline_signing && !zone_is_dynamic &&
        cfg_map_get(zoptions, "dnssec-policy", &signing) == ISC_R_SUCCESS &&
        signing != NULL)
    {
        if (strcmp(cfg_obj_asstring(signing), "none") != 0) {
            inline_signing = true;
    >>>     CID 314969:  Control flow issues  (DEADCODE)
    >>>     Execution cannot reach the expression ""no"" inside this statement: "dns_zone_log(zone, 1, "inli...".
            dns_zone_log(
                zone, ISC_LOG_DEBUG(1), "inline-signing: %s",
                inline_signing
                ? "implicitly through dnssec-policy"
                : "no");
        } else {
                ...
        }
    }

This is because we first set 'inline_signing = true' and then check
its value in 'dns_zone_log'.

(cherry picked from commit 8df629d0b2)
2021-01-18 14:40:26 +00:00
Matthijs Mekking
f77ec3cf58 Update serve-stale system test with new defaults
(cherry picked from commit 3be65246f8)
2021-01-15 10:38:45 +01:00
Matthijs Mekking
4d48df7f97 Update serve-stale config defaults
Change the serve-stale configuration defaults so that they match the
recommendations from RFC 8767.

(cherry picked from commit e15a433b23)
2021-01-15 10:38:30 +01:00