Commit graph

4581 commits

Author SHA1 Message Date
Michał Kępień
fec4664ab0 Set "max-cache-size" in the "geoip2" system test
The named configuration files used in the "geoip2" system test cause a
rather large number of views (6-8) to be set up in each tested named
instance.  Each view has its own cache.

Commit aa72c31422 caused the RBT hash
table to be pre-allocated to a size derived from "max-cache-size", so
that it never needs to be rehashed.  The size of that hash table is not
expected to be significant enough to cause memory use issues in typical
conditions even for large "max-cache-size" settings.

However, these two factors combined can cause memory exhaustion issues
in GitLab CI, where we run multiple "instances" of the test suite in
parallel on the same runner, each test suite executes multiple system
tests concurrently, and each system test may potentially start multiple
named instances at the same time.  In practice, this problem currently
only seems to be affecting the "geoip2" system test, which is failing
intermittently due to named instances used by that test getting killed
by oom-killer.

Prevent the "geoip2" system test from failing intermittently by setting
"max-cache-size" in named configuration files used in that test to a low
value in order to keep memory usage at bay even with a large number of
views configured.

(cherry picked from commit 4292d5bdfe)
2020-08-05 11:08:24 +02:00
Matthijs Mekking
f3103660d0 keyword 'primaries' is unknown in 9.16
In 9.17 we introduced 'primaries' as a synonym for 'masters' in the
configuration file. This synonym has not been backported so change
the serve-stale test to make use of the 'masters' keyword.
2020-08-05 09:09:16 +02:00
Matthijs Mekking
c92de6cb44 stale-cache-enable is enabled by default
Because this is a backport, the option should default to keep the
serve-stale caching enabled.
2020-08-05 09:09:16 +02:00
Ondřej Surý
c4e6ade0e5 Add tests with stale-cache-disabled into serve-stale system test
Add a fifth named (ns5) that runs with `stale-cache-enable no;` and
check that there are no stale records in the cache.

(cherry picked from commit abc2ab9223)
2020-08-05 09:09:16 +02:00
Ondřej Surý
b48e9ab201 Add stale-cache-enable option and disable serve-stable by default
The current serve-stale implementation in BIND 9 stores all received
records in the cache for a max-stale-ttl interval (default 12 hours).

This allows DNS operators to turn the serve-stale answers in an event of
large authoritative DNS outage.  The caching of the stale answers needs
to be enabled before the outage happens or the feature would be
otherwise useless.

The negative consequence of the default setting is the inevitable
cache-bloat that happens for every and each DNS operator running named.

In this MR, a new configuration option `stale-cache-enable` is
introduced that allows the operators to selectively enable or disable
the serve-stale feature of BIND 9 based on their decision.

The newly introduced option has been disabled by default,
e.g. serve-stale is disabled in the default configuration and has to be
enabled if required.

(cherry picked from commit ce53db34d6)
2020-08-05 09:09:16 +02:00
Mark Andrews
20bc6aefff Check rcode is FORMERR
(cherry picked from commit 88ff6b846c)
2020-08-04 23:04:34 +10:00
Michał Kępień
e734651fbd Only run system tests as root in developer mode
Running system tests with root privileges is potentially dangerous.
Only allow it when explicitly requested (by building with
--enable-developer).

(cherry picked from commit 3ef106f69d)
2020-07-31 07:46:27 +02:00
Michal Nowak
0f319908f0
Remove cross-test dependency on ckdnsrps.sh 2020-07-30 16:25:23 +02:00
Michal Nowak
72a6b0dc6f
Fix name of the test directory of stop.pl in masterformat test 2020-07-30 16:24:18 +02:00
Michal Nowak
24f5f68d7a
Ensure test fails if packet.pl does not work as expected 2020-07-30 16:20:46 +02:00
Ondřej Surý
aa72c31422 Fix the rbt hashtable and grow it when setting max-cache-size
There were several problems with rbt hashtable implementation:

1. Our internal hashing function returns uint64_t value, but it was
   silently truncated to unsigned int in dns_name_hash() and
   dns_name_fullhash() functions.  As the SipHash 2-4 higher bits are
   more random, we need to use the upper half of the return value.

2. The hashtable implementation in rbt.c was using modulo to pick the
   slot number for the hash table.  This has several problems because
   modulo is: a) slow, b) oblivious to patterns in the input data.  This
   could lead to very uneven distribution of the hashed data in the
   hashtable.  Combined with the single-linked lists we use, it could
   really hog-down the lookup and removal of the nodes from the rbt
   tree[a].  The Fibonacci Hashing is much better fit for the hashtable
   function here.  For longer description, read "Fibonacci Hashing: The
   Optimization that the World Forgot"[b] or just look at the Linux
   kernel.  Also this will make Diego very happy :).

3. The hashtable would rehash every time the number of nodes in the rbt
   tree would exceed 3 * (hashtable size).  The overcommit will make the
   uneven distribution in the hashtable even worse, but the main problem
   lies in the rehashing - every time the database grows beyond the
   limit, each subsequent rehashing will be much slower.  The mitigation
   here is letting the rbt know how big the cache can grown and
   pre-allocate the hashtable to be big enough to actually never need to
   rehash.  This will consume more memory at the start, but since the
   size of the hashtable is capped to `1 << 32` (e.g. 4 mio entries), it
   will only consume maximum of 32GB of memory for hashtable in the
   worst case (and max-cache-size would need to be set to more than
   4TB).  Calling the dns_db_adjusthashsize() will also cap the maximum
   size of the hashtable to the pre-computed number of bits, so it won't
   try to consume more gigabytes of memory than available for the
   database.

   FIXME: What is the average size of the rbt node that gets hashed?  I
   chose the pagesize (4k) as initial value to precompute the size of
   the hashtable, but the value is based on feeling and not any real
   data.

For future work, there are more places where we use result of the hash
value modulo some small number and that would benefit from Fibonacci
Hashing to get better distribution.

Notes:
a. A doubly linked list should be used here to speedup the removal of
   the entries from the hashtable.
b. https://probablydance.com/2018/06/16/fibonacci-hashing-the-optimization-that-the-world-forgot-or-a-better-alternative-to-integer-modulo/

(cherry picked from commit e24bc324b4)
2020-07-30 11:57:24 +02:00
Diego Fronza
1a101f223c Add test for RPZ wildcard passthru ignored fix 2020-07-27 17:17:02 -03:00
Mark Andrews
b0942c2442 Check walking the hip rendezvous servers.
Also fixes extraneous white space at end of record when
there are no rendezvous servers.

(cherry picked from commit 78db46d746)
2020-07-24 15:24:49 +10:00
Michal Nowak
9509af7008
Check tests for core files regardless of test status
Failed test should be checked for core files et al. and have
backtrace generated.
2020-07-20 13:09:06 +02:00
Michal Nowak
ace988990a
Rationalize backtrace logging
GDB backtrace generated via "thread apply all bt full" is too long for
standard output, lets save them to .txt file among other log files.
2020-07-20 12:48:29 +02:00
Michal Nowak
c2bbe11349
Fold stop_servers_failed() to stop_servers() 2020-07-20 12:48:11 +02:00
Mark Andrews
90154d203b Add regression test for [GL !3735]
Check that resign interval is actually in days rather than hours
by checking that RRSIGs are all within the allowed day range.

(cherry picked from commit 11ecf7901b)
2020-07-14 12:11:42 +10:00
Mark Andrews
7e62d76b6b Don't verify the zone when setting expire to "now+1s" as it can fail
as too much wall clock time may have elapsed.

Also capture signzone output for forensic analysis

(cherry picked from commit a0e8a11cc6)
2020-07-13 12:42:46 +10:00
Ondřej Surý
b9b1366bf0 Add prereq.sh script to the shutdown system test
The shutdown test requires python, pytest and dnspython.
2020-07-03 08:54:01 +02:00
Matthijs Mekking
de02eb55b5 Fix kasp test set_keytime
While the creation and publication times of the various keys
in this policy are nearly at the same time there is a chance that
one key is created a second later than the other.

The `set_keytimes_algorithm_policy` mistakenly set the keytimes
for KEY3 based of the "published" time from KEY2.

(cherry picked from commit 24e07ae98e)
2020-07-02 04:56:20 +00:00
Diego Fronza
004849fd36 Added test for the fix
This test ensures that named will correctly shutdown
when receiving multiple control connections after processing
of either "rncd stop" or "kill -SIGTERM" commands.

Before the fix, named was crashing due to a race condition happening
between two threads, one running shutdown logic in named/server.c
and other handling control logic in controlconf.c.

This test tries to reproduce the above scenario by issuing multiple
queries to a target named instance, issuing either rndc stop or kill
-SIGTERM command to the same named instance, then starting multiple rndc
status connections to ensure it is not crashing anymore.

(cherry picked from commit 042e509753)
2020-07-01 12:52:51 +02:00
Matthijs Mekking
f1b3686cd2 Output rndc dnssec -status
Implement the 'rndc dnssec -status' command that will output
some information about the key states, such as which policy is
used for the zone, what keys are in use, and when rollover is
scheduled.

Add loose testing in the kasp system test, the actual times are
already tested via key file inspection.

(cherry picked from commit 19ce9ec1d4)
2020-07-01 09:57:44 +02:00
Evan Hunt
6b00e5f5a0 update the acl system test to include a blackhole test case
this ACL was previously untested, which allowed a regression to
go undetected.

(cherry picked from commit e3ee138098)
2020-06-30 19:41:42 -07:00
Michał Kępień
62f631f798 Silence PyYAML warning
Make yaml.load_all() use yaml.SafeLoader to address a warning currently
emitted when bin/tests/system/dnstap/ydump.py is run:

    ydump.py:28: YAMLLoadWarning: calling yaml.load_all() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
      for l in yaml.load_all(f.stdout):
2020-06-30 11:19:47 +10:00
Mark Andrews
089df5f8ef Add TOP_SRCDIR to conf.sh.in and conf.sh.win32 2020-06-29 23:50:45 +00:00
Mark Andrews
bf205b00c3 Check that 'rndc dnstap -roll <value>' works
(cherry picked from commit a289a57c7f)
2020-06-29 22:30:01 +00:00
Matthijs Mekking
7eed00502f kasp tests: fix wait for reconfig done
The wait until zones are signed after rndc reconfig is broken
because the zones are already signed before the reconfig.  Fix
by having a different way to ensure the signing of the zone is
complete.  This does require a call to the "wait_for_done_signing"
function after each "check_keys" call after the ns6 reconfig.

The "wait_for_done_signing" looks for a (newly added) debug log
message that named will output if it is done signing with a certain
key.

(cherry picked from commit a47192ed5b)
2020-06-29 08:09:40 +02:00
Matthijs Mekking
60752f8092 kasp tests: Replace while loops with retry_quiet
(cherry picked from commit cf76d839ae)
2020-06-29 08:09:32 +02:00
Evan Hunt
f171017570 append "0" to IPv6 addresses ending in "::" when printing YAML
such addresses broke some YAML parsers.

(cherry picked from commit a8baf79e33)
2020-06-25 18:57:06 -07:00
Matthijs Mekking
6d91799388 Add todo in dnssec system test for [GL #1689]
Add a note why we don't have a test case for the issue.

It is tricky to write a good test case for this if our tools are
not allowed to create signatures for unsupported algorithms.

(cherry picked from commit c6345fffe9)
2020-06-25 22:44:19 +10:00
Evan Hunt
dca3658720 "check-names primary" and "check-names secondary" were ignored
these keywords were added to the parser as synonyms for "master"
and "slave" but were never hooked in to the configuration of named,
so they were ignored. this has been fixed and the option is now
checked for correctness.

(cherry picked from commit ba31b189b4)
2020-06-22 14:30:14 +02:00
Mark Andrews
e522e80dc4 Add checking RFC 4592 responses examples to wildcard system test
(cherry picked from commit 30586aa054c9cd8a4e64c91ed78683a4b54c79bc)
2020-06-18 10:18:42 +02:00
Mark Andrews
0c23582ffd Improve the behaviour of yamlget.py when run with python2
(cherry picked from commit 9e72266705)
2020-06-05 10:51:01 +10:00
Mark Andrews
28a940fe69 Add +yaml support for EDE
(cherry picked from commit 0ec77c2b92)
2020-06-05 10:50:58 +10:00
Mark Andrews
6ac4e62fbc Ignore attempts to add DS records at zone apex
DS records belong in the parent zone at a zone cut and
are not retrievable with modern recursive servers.

(cherry picked from commit ae55fbbe9c)
2020-06-04 16:06:45 +02:00
Mark Andrews
b17f6eba6a Reject primary zones with an DS record at the zone apex.
DS records only belong at delegation points and if present
at the zone apex are invariably the result of administrative
errors.  Additionally they can't be queried for with modern
resolvers as the parent servers will be queried.

(cherry picked from commit 35a58d30c9)
2020-06-04 16:06:07 +02:00
Ondřej Surý
d85b936898 Reduce the default value for max-stale-ttl from 1 week to 12 hours
Originally, the default value for max-stale-ttl was 1 week, which could
and in some scenarios lead to cache exhaustion on a busy resolvers.
Picking the default value will always be juggling between value that's
useful (e.g. keeping the already cached records after they have already
expired and the upstream name servers are down) and not bloating the
cache too much (e.g. keeping everything for a very long time).  The new
default reflects what we think is a reasonable to time to react on both
sides (upstream authoritative and downstream recursive).

(cherry picked from commit 13fd3ecfab)
2020-06-03 10:45:09 +00:00
Matthijs Mekking
168d362b54 Fix bug in keymgr_key_has_successor
The logic in `keymgr_key_has_successor(key, keyring)` is flawed, it
returns true if there is any key in the keyring that has a successor,
while what we really want here is to make sure that the given key
has a successor in the given keyring.

Rather than relying on `keymgr_key_exists_with_state`, walk the
list of keys in the keyring and check if the key is a successor of
the given predecessor key.

(cherry picked from commit 0d578097ef)
2020-06-02 14:54:08 +02:00
Matthijs Mekking
e85c1aa74e Replace date -d with python script
The usage of 'date -d' in the kasp system test is not portable,
replace with a python script.  Also remove some leftover
"set_keytime 'yes'" calls.

(cherry picked from commit 5b3decaf48)
2020-06-02 11:36:25 +02:00
Matthijs Mekking
da2daea0e6 Test keytimes on algorithm rollover
This improves keytime testing on algorithm rollover.  It now
tests for specific times, and also tests for SyncPublish and
Removed keytimes.

(cherry picked from commit 61c1040ae5)
2020-06-02 11:36:08 +02:00
Matthijs Mekking
327d8bb273 Test keytimes on policy changes
This improves keytime testing on reconfiguration of the
dnssec-policy.

(cherry picked from commit da5e1e3a0f)
2020-06-02 11:36:01 +02:00
Matthijs Mekking
f026332f88 Test keytimes on CSK rollover
This improves keytime testing on CSK rollover.  It now
tests for specific times, and also tests for SyncPublish and
Removed keytimes.

Since an "active key" for ZSK and KSK means something
different, this makes it tricky to decide when a CSK is
active. An "active key" intuitively means the key is signing
so we say a CSK is active when it is creating zone signatures.

This change means a lot of timings for the CSK rollover tests
need to be adjusted.

The keymgr code needs a slight change on calculating the
prepublication time: For a KSK we need to include the parent
registration delay, but for CSK we look at the zone signing
property and stick with the ZSK prepublication calculation.

(cherry picked from commit e233433772)
2020-06-02 11:35:52 +02:00
Matthijs Mekking
8e0776d0d5 Test keytimes on KSK rollover
This improves keytime testing on KSK rollover.  It now
tests for specific times, and also tests for SyncPublish and
Removed keytimes.

(cherry picked from commit 649d0833ce)
2020-06-02 11:35:43 +02:00
Matthijs Mekking
437ec25c0c kasp: registration delay adjustments
Registration delay is not part of the Iret retire interval, thus
removed from the calculation when setting the Delete time metadata.

Include the registration delay in prepublication time, because
we need to prepublish the key sooner than just the Ipub
publication interval.

(cherry picked from commit 50bbbb76a8)
2020-06-02 11:35:32 +02:00
Matthijs Mekking
48a265b2c7 Test keytimes on ZSK rollover
This improves keytime testing on ZSK rollover.  It now
tests for specific times, and also tests for SyncPublish and
Removed keytimes.

(cherry picked from commit e01fcbbaf8)
2020-06-02 11:35:19 +02:00
Matthijs Mekking
0e1290c383 Test keytimes on enable-dnssec case
This improves keytime testing for enabling DNSSEC.  It now
tests for specific times, and also tests for SyncPublish.

(cherry picked from commit cf51c87fad)
2020-06-02 11:35:09 +02:00
Matthijs Mekking
e036a0a919 Start testing keytiming metadata
This commit adds testing keytiming metadata.  In order to facilitate
this, the kasp system test undergoes a few changes:

1. When finding a key file, rather than only saving the key ID,
   also save the base filename and creation date with `key_save`.
   These can be used later to set expected key times.
2. Add a test function `set_addkeytime` that takes a key, which
   keytiming to update, a datetime in keytiming format, and a number
   (seconds) to add, and sets the new time in the given keytime
   parameter of the given key.  This is used to set the expected key
   times.
3. Split `check_keys` in `check_keys` and `check_keytimes`.  First we
   need to find the keyfile before we can check the keytimes.
   We need to retrieve the creation date (and sometimes other
   keytimes) to determine the other expected key times.
4. Add helper functions to set the expected key times per policy.
   This avoids lots of duplication.

Check for keytimes for the first test cases (all that do not cover
rollovers).

(cherry picked from commit f8e34b57b4)
2020-06-02 11:34:49 +02:00
Matthijs Mekking
91d861b90d Stop keeping track of key parameter count
Stop tracking in the comments the number of key parameters in the
kasp system test, it adds nothing beneficial.

(cherry picked from commit 8483f71258)
2020-06-02 11:34:39 +02:00
Matthijs Mekking
cec9ddd18c Fix some more test output filenames
After removing dnssec-settime calls that set key rollover
relationship, we can adjust the counts in test output filenames.

Also fix a couple of more wrong counts in output filenames.

(cherry picked from commit 8204e31f0e)
2020-06-02 11:34:31 +02:00
Matthijs Mekking
f4d3a774f7 Set key rollover relationship without settime
Using dnssec-setttime after dnssec-keygen in the kasp system test
can lead to off by one second failures, so reduce the usage of
dnssec-settime in the setup scripts.  This commit deals with
setting the key rollover relationship (predecessor/successor).

(cherry picked from commit 5a590c47a5)
2020-06-02 11:34:22 +02:00