Commit graph

9969 commits

Author SHA1 Message Date
Diego Fronza
a3dbc5fb05 Added system test for stale-refresh-time
This test works as follows:
- Query for data.example rrset.
- Sleep until its TTL expires (2 secs).
- Disable authoritative server.
- Query for data.example again.
- Since the server is down, the answer comes from the stale cache, which
  has a configured stale-answer-ttl of 3 seconds.
- Enable authoritative server.
- Query for data.example again.
- Since the last query before activating the authoritative server
  failed, and since 'stale-refresh-time' seconds haven't elapsed yet, the
  answer should come from the stale cache and not from the authoritative
  server.
2020-11-11 12:53:24 -03:00
Diego Fronza
fc074f15a8 Adjusted ancient rrset system test
Before the stale-refresh-time feature, the system test for ancient rrset
was somewhat based on the average time the previous tests and queries
were taking, thus not very precise.

After the addition of stale-refresh-time the system test for ancient
rrset started to fail since the queries for stale records (low
max-stale-ttl) were not taking the time to do a full resolution
anymore, since the answers now were coming from the cache (because the
rrset were stale and within stale-refresh-time window after the
previous resolution failure).

To handle this, the correct time to wait before an rrset becomes ancient
is calculated from the max-stale-ttl configuration plus the TTL set in
the rrset used in the tests (ans2/ans.pl).

Then, before sending queries for ancient rrsets, we check whether we
still need to sleep to ensure those rrsets will be marked as ancient.
2020-11-11 12:53:24 -03:00
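The wait calculation described in the "Adjusted ancient rrset system test" commit above can be sketched as simple arithmetic. This is only an illustrative model, not the test script itself: the function name, its parameters, and the sample values are hypothetical.

```python
def seconds_until_ancient(cached_at, rrset_ttl, max_stale_ttl, now):
    """Return how long a test still has to sleep before an rrset is ancient.

    Assumption for this sketch: an rrset cached at `cached_at` becomes
    ancient once both its own TTL and max-stale-ttl have elapsed.
    All times are in seconds.
    """
    ancient_at = cached_at + rrset_ttl + max_stale_ttl
    # Never return a negative sleep time if the deadline already passed.
    return max(0, ancient_at - now)
```

A test would sleep for the returned number of seconds (possibly zero) before querying for the ancient rrset, instead of relying on how long earlier queries happened to take.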
Diego Fronza
5e47a13fd0 Warn if 'stale-refresh-time' < 30 (default)
RFC 8767 recommends that attempts to refresh be made no more
frequently than every 30 seconds.

Added a check to named-checkconf, which will warn if values below the
default are found in the configuration.

BIND will also log the warning during loading of configuration in the
same fashion.
2020-11-11 12:53:23 -03:00
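A minimal sketch of the kind of check the "Warn if 'stale-refresh-time' < 30" commit above describes. The function name and the warning text are invented for illustration and are not named-checkconf's actual output; any special-case values are not modeled.

```python
# Default taken from the commit title; everything else here is hypothetical.
DEFAULT_STALE_REFRESH_TIME = 30  # seconds


def check_stale_refresh_time(value):
    """Return a warning string for values below the default, else None."""
    if value < DEFAULT_STALE_REFRESH_TIME:
        return (
            "'stale-refresh-time' should be at least "
            f"{DEFAULT_STALE_REFRESH_TIME} seconds (got {value})"
        )
    return None
```

The value is still accepted in this model; the check only produces a warning, matching the commit's wording that the tool "will warn".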
Diego Fronza
4827ad0ec4 Add stale-refresh-time option
Before this update, BIND would attempt a full recursive resolution
for each query received if the requested rrset's TTL had expired.
Only if the resolution failed for any reason would BIND then check
for a stale rrset in cache (if 'stale-cache-enable' and
'stale-answer-enable' are on).

The problem with this approach is that if an authoritative server is
unreachable or is failing to respond, it is very unlikely that the
problem will be fixed within the next few seconds.

A better approach to improve performance in those cases is to record
the moment at which a resolution failed and, if new queries arrive for
that same rrset, to respond directly from the stale cache for a window
of time configured via 'stale-refresh-time'.

Only when this interval expires do we try a normal refresh of the
rrset.

The logic behind this commit is as follows:

- In query.c / query_gotanswer(), if the test of 'result' variable falls
  to the default case, an error is assumed to have happened, and a call
  to 'query_usestale()' is made to check if serving of stale rrset is
  enabled in configuration.

- If serving of stale answers is enabled, a flag will be turned on in
  the query context to look for stale records:
  query.c:6839
  qctx->client->query.dboptions |= DNS_DBFIND_STALEOK;

- A call to query_lookup() is made again; inside it, a call to
  'dns_db_findext()' is made, which in turn invokes rbtdb.c /
  cache_find().

- In rbtdb.c / cache_find(), the important part of this change is the
  call to 'check_stale_header()', a function that yields true if we
  should skip the stale entry, or false if we should consider it.

- In check_stale_header() we now check if the DNS_DBFIND_STALEOK option
  is set. If that is the case, we know that this new search for stale
  records was made due to a failure in a normal resolution, so we keep
  track of the time at which the failure occurred in rbtdb.c:4559:
  header->last_refresh_fail_ts = search->now;

- In check_stale_header(), if DNS_DBFIND_STALEOK is not set, then we
  know this is a normal lookup. If the record is stale and the query
  time falls between the last failure time and the last failure time +
  stale-refresh-time, then we return false so cache_find() knows it can
  consider this stale rrset entry to return as a response.

The last additions are two new methods to the database interface:
- setservestale_refresh
- getservestale_refresh

Those were added so rbtdb can be aware of the value set in the
configuration option, since at that level we have no access to the view
object.
2020-11-11 12:53:23 -03:00
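The decision described for check_stale_header() in the "Add stale-refresh-time option" commit above can be modeled roughly as follows. This is a simplified Python model, not the C code in rbtdb.c: only last_refresh_fail_ts and the STALEOK idea come from the commit message, while the Header class, the expires_at field, and the function signature are assumptions. Limits such as max-stale-ttl are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Header:
    expires_at: int                # hypothetical: time at which the TTL runs out
    last_refresh_fail_ts: int = 0  # 0 means no failed refresh recorded yet


def skip_stale_header(header, now, staleok, stale_refresh_time):
    """Return True to skip the entry, False to let the lookup consider it."""
    if now < header.expires_at:
        return False  # still fresh, nothing special to do
    if staleok:
        # This lookup is the retry after a failed resolution:
        # remember when the failure happened and allow the stale entry.
        header.last_refresh_fail_ts = now
        return False
    # Normal lookup: the stale entry is usable only inside the window
    # that starts at the last recorded failure.
    in_window = (
        header.last_refresh_fail_ts > 0
        and now < header.last_refresh_fail_ts + stale_refresh_time
    )
    return not in_window
```

With a 30-second window and a failure recorded at time 12, a normal lookup at time 20 is answered from the stale entry, while one at time 42 skips it and falls through to a regular refresh.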
Michal Nowak
9088052225
Drop unused headers 2020-11-11 10:08:12 +01:00
Michal Nowak
24d5052e74
Drop @OPENSSL_LIB@ in bigkey
@OPENSSL_LIB@ was brought back with the
2f9f6f1fac revert.
2020-11-11 09:49:40 +01:00
Michal Nowak
2f9f6f1fac
Revert "Drop bigkey"
This reverts commit ef6703351a.

It is believed that the bigkey test is still useful.
2020-11-10 17:34:05 +01:00
Ondřej Surý
fa424225af netmgr: Add additional safeguards to netmgr/tls.c
This commit adds a couple of additional safeguards against running
sends/reads on inactive sockets.  The changes were modeled after the
changes we made to netmgr/tcpdns.c.
2020-11-10 14:17:20 +01:00
Witold Kręcicki
d2a2804069 DoT test
Preliminary test for DNSoverTLS - add the dot-port template to system
tests, test a simple query to an authoritative server.
2020-11-10 14:17:18 +01:00
Witold Kręcicki
e94afa5bc0 Add 'ephemeral' keyword to 'tls' option in listen-on directive.
listen-on tls ephemeral will cause named to create an ephemeral
TLS self-signed certificate and key, stored only in memory.
2020-11-10 14:17:14 +01:00
Witold Kręcicki
38b78f59a0 Add DoT support to bind
Parse the configuration of tls objects into SSL_CTX* objects.  Listen on
DoT if the 'tls' option is set up in the listen-on directive.  Use
DoT/DoH ports for DoT/DoH.
2020-11-10 14:16:55 +01:00
Evan Hunt
8ed005f924 add parser support for TLS configuration options
This commit adds stub parser support and tests for:
- "tls" statement, specifying key and cert.
- an optional "tls" keyvalue in listen-on statements for DoT
  configuration.

Documentation for these options has also been added to the ARM, but
needs further work.
2020-11-10 14:16:49 +01:00
Evan Hunt
8886569e9d report peer address in TLS mode, and specify protocol
- peer address was not being reported correctly by "dig +tls"
- the protocol used is now reported in the dig output: UDP, TCP, or TLS.
2020-11-10 14:16:41 +01:00
Witold Kręcicki
03b2c948b6 add "dig +tls"
- add "+[no]tls" option to dig to enable TLS mode
- override the default port number in dig from 53 to 853 when using TLS
2020-11-10 14:16:35 +01:00
Mark Andrews
2b7128fede Check that DNSTAP captures forwarded UPDATE responses 2020-11-10 06:15:46 +00:00
Mark Andrews
06db7a153f Retry edns512 multiple times to trigger fallback to edns at 512
We want named to have slow resolving (multiple retries) when
there is a very small working MTU
2020-11-09 21:45:44 +00:00
Mark Andrews
b5145f46dc Fixup legacy test to account for not falling back to EDNS 512 lookups.
The SOA lookup for edns512 could succeed if the negative response
for ns.edns512/AAAA completed before all the edns512/SOA query
attempts are made.  The ns.edns512/AAAA lookup returns tc=1 and
the SOA record is cached after processing the NODATA response.
Lookup a TXT record at edns512 and look it up instead of the
SOA record.

Removed 'checking that TCP failures do not influence EDNS statistics
in the ADB' as it is no longer appropriate.
2020-11-09 21:45:44 +00:00
Evan Hunt
e011521ef1 address some possible shutdown races in xfrin
there were two failures observed during testing, both occurring
when 'rndc halt' was run rather than 'rndc stop' - the latter dumps
zone contents to disk and presumably introduced enough delay to
prevent the races:

- a failure when the zone was shut down and called dns_xfrin_detach()
  before the xfrin had finished connecting; the connect timeout
  terminated without detaching its handle
- a failure when the tcpdns socket timer fired after the outerhandle
  had already been cleared.

this commit incidentally addresses a failure observed in mutexatomic
due to a variable having been initialized incorrectly.
2020-11-09 12:33:37 -08:00
Ondřej Surý
127ba7e930 Add libssl libraries to Windows build
This commit extends the perl Configure script to also check for libssl
in addition to libcrypto and change the vcxproj source files to link
with both libcrypto and libssl.
2020-11-09 16:00:28 +01:00
Evan Hunt
49d53a4aa9 use netmgr for xfrin
Use isc_nm_tcpdnsconnect() in xfrin.c for zone transfers.
2020-11-09 13:45:43 +01:00
Ondřej Surý
b558eca633 dig: Refactor recv_done, so there's less exit paths
The recv_done() callback had many exit paths with different conditions,
and every path had its own set of destructors.  The refactored code now
has a unified exit path with descriptive goto labels matching the intent:

 - cancel_lookup
 - next_lookup
 - detach_query
 - keep_query

The only exception to the rule is the check_for_more_data() path, where
part of the query gets reused, so query->readhandle and query get
detached on their own, and by going to keep_query, we are just
skipping calling the destructors again.
2020-11-08 13:36:12 -08:00
Evan Hunt
88f5f3915b dig: prevent query from being detached if udpconnect fails on first attempt
FreeBSD sometimes returns spurious errors in UDP connect() attempts,
so we try a few times before giving up. However, each failed attempt
triggers a call to udp_ready() in dighost.c, and that was causing
the query object to be detached prematurely.
2020-11-07 21:11:58 +01:00
Ondřej Surý
6d63ffe46d dig: add reference counter to the dig_lookup_t object
Sometimes, the dig_lookup_t could be destroyed before the final
send_done() callback was called, leading to dereferencing an
already freed dig_lookup_t object.  By making the dig_lookup_t
reference counted, we are ensuring that it won't be freed until
the last reference (from dig_query_t .lookup) is released.
2020-11-07 21:11:42 +01:00
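The reference-counting idea from the dig_lookup_t commit above can be illustrated with a small model. This is a generic sketch in Python, not dig's C implementation; the RefCounted class and its attach/detach method names are hypothetical stand-ins for the attach and detach operations the dig commits refer to.

```python
class RefCounted:
    """Object that is destroyed only when its last reference is released."""

    def __init__(self, on_destroy):
        self.references = 1          # the creator holds the first reference
        self._on_destroy = on_destroy

    def attach(self):
        """Take an additional reference, e.g. for a pending callback."""
        assert self.references > 0, "attach after destroy"
        self.references += 1
        return self

    def detach(self):
        """Release one reference; destroy the object on the last release."""
        assert self.references > 0, "detach after destroy"
        self.references -= 1
        if self.references == 0:
            self._on_destroy()
```

If a pending callback holds its own reference, the creator detaching early no longer destroys the object; destruction happens only when the callback detaches too.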
Ondřej Surý
ecd70eb4b5 dig: add new debugging output
track creation, attachment and detachment of dig_query objects.
2020-11-07 20:49:53 +01:00
Ondřej Surý
a2bc627c30 dig: add reference counting to dig_query_t
add a reference counter to the dig_query object to ensure
it isn't freed until the last caller releases it.
2020-11-07 20:49:53 +01:00
Evan Hunt
5307bf64ce reduce timing dependencies in system tests
one of the tests in the resolver system test depends on dig
getting no response to its first two query attempts, and SERVFAIL
on the third after resolution times out.

using a 5-second retry timer in dig means the SERVFAIL response
could occur while dig is discarding the second query and preparing
to send the third. in this case the server's response could be
missed.  shortening the retry interval to 4 seconds ensures that
dig has already sent the third query when the SERVFAIL response
arrives.

also, the serve-stale system test could fail due to a race in which
it timed out after waiting ten seconds for a file to be written, and
the dig timeout was just a bit longer. this is addressed by extending
the dig timeout to 11 seconds for this test.
2020-11-07 20:49:53 +01:00
Evan Hunt
ea2b04c361 dig: use new netmgr timeout mechanism
use isc_nmhandle_settimeout() to set read/recv timeouts, and get rid
of connect_timeout() and related functions in dighost.c.
2020-11-07 20:49:53 +01:00
Evan Hunt
e12dc1faa2 dig: setup IDN whenever printing a message
because dig now uses the netmgr, printing of response messages
happens in a different thread than setup. the IDN output filtering
procedure, which is set using dns_name_settotextfilter(), is stored as
thread-local data, and so if it's set during setup, it won't be
accessible when printing. we now set it immediately before printing,
in the same thread, and clear it immediately afterward.
2020-11-07 20:49:53 +01:00
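The thread-local behaviour behind the "setup IDN whenever printing a message" commit above can be demonstrated with Python's threading.local. This only illustrates the general mechanism; set_filter, get_filter, and filter_seen_in_other_thread are hypothetical names, not BIND functions.

```python
import threading

# Each thread sees its own independent attributes on this object.
_local = threading.local()


def set_filter(func):
    """Store a text filter for the calling thread only."""
    _local.text_filter = func


def get_filter():
    """Return the calling thread's filter, or None if it never set one."""
    return getattr(_local, "text_filter", None)


def filter_seen_in_other_thread():
    """Report which filter a freshly started thread can see."""
    seen = []
    worker = threading.Thread(target=lambda: seen.append(get_filter()))
    worker.start()
    worker.join()
    return seen[0]
```

A filter set in the setup thread is visible there but not in the worker thread, which is why the value has to be set in the same thread that does the printing.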
Evan Hunt
cde27d2d2b dig: remove "+unexpected" option
The network manager does not support returning UDP datagrams to
clients from unexpected sources; it is therefore not possible for
dig to accept them.  The "+[no]unexpected" option has therefore
been removed from the dig command and its documentation.
2020-11-07 20:49:53 +01:00
Evan Hunt
94b7988efb convert dig/host/nslookup to use the netmgr
use netmgr functions instead of isc_socket for dig, host, and
nslookup. note that `dig +unexpected` is not working.
2020-11-07 20:49:53 +01:00
Michal Nowak
ef6703351a
Drop bigkey
The 'bigkey' binary is not used anywhere, therefore its sources should
be removed.
2020-11-05 17:17:14 +01:00
Michał Kępień
39191052ad Wait for the "fast-expire" zone to be transferred
In order for a "fast-expire/IN: response-policy zone expired" message to
be logged in ns3/named.run, the "fast-expire" zone must first be
transferred in by that server.  However, with unfavorable timing, ns3
may be stopped before it manages to fetch the "fast-expire" zone from
ns5 and after the latter has been reconfigured to no longer serve that
zone.  In such a case, the "rpz" system test will report a false
positive for the relevant check.  Prevent that from happening by
ensuring ns3 manages to transfer the "fast-expire" zone before getting
shut down.
2020-11-05 07:53:43 +01:00
Matthijs Mekking
518dd0bb17 kasp test: Use DEFAULT_ALGORITHM in tests.sh
Some setup scripts use DEFAULT_ALGORITHM in their dnssec-policy
and/or initial signing. The tests still used the literal values
13, ECDSAP256SHA256, and 256. Replace those occurrences where
appropriate.
2020-11-04 12:41:25 +01:00
Matthijs Mekking
7e0ec9f624 Add a test for RFC 8901 signer model 2
The new 'dnssec-policy' was already compatible with multi-signer
model 2, now we also have a test for it.
2020-11-04 12:40:04 +01:00
Mark Andrews
40ae128922 dnssec system test needs python and perl 2020-11-03 11:22:36 +11:00
Evan Hunt
5dcdc00b93 add netmgr functions to support outgoing DNS queries
- isc_nm_tcpdnsconnect() sets up an outgoing TCP DNS connection.
- isc_nm_tcpconnect(), _udpconnect() and _tcpdnsconnect() now take a
  timeout argument to ensure connections time out and are correctly
  cleaned up on failure.
- isc_nm_read() now supports UDP; it reads a single datagram and then
  stops until the next time it's called.
- isc_nm_cancelread() now runs asynchronously to prevent assertion
  failure if reading is interrupted by a non-network thread (e.g.
  a timeout).
- isc_nm_cancelread() can now apply to UDP sockets.
- added shim code to support UDP connection in versions of libuv
  prior to 1.27, when uv_udp_connect() was added

all these functions will be used to support outgoing queries in dig,
xfrin, dispatch, etc.
2020-10-30 11:11:54 +01:00
Mark Andrews
d7840f4b93 Check that a zone in the process of being signed resolves
ans10 simulates a local anycast server which has both signed and
unsigned instances of a zone.  'A' queries get answered from the
signed instance.  Everything else gets answered from the unsigned
instance.  The resulting answer should be insecure.
2020-10-30 00:17:24 +11:00
Evan Hunt
78af071c11 fix a typo in rpz test
"tcp-only" was not being tested correctly in the RPZ system test
because the option to the "digcmd" function that causes queries to
be sent via TCP was misspelled in one case, and was being interpreted
as a query name.

the "ckresult" function has also been changed to be case sensitive
for consistency with "digcmd".
2020-10-28 21:39:35 -07:00
Ondřej Surý
38f34c266d Fix possible NULL dereference in cd->dlz_destroy()
If the call to cd->dlz_create() in dlopen_dlz_create() fails, cd->dbdata
may be NULL when dlopen_dlz_destroy() gets called in the cleanup path
and passing NULL to the cd->dlz_destroy() callback may cause a NULL
dereference.  Ensure that does not happen by checking whether cd->dbdata
is non-NULL before calling the cd->dlz_destroy() callback.
2020-10-28 15:48:58 +01:00
Ondřej Surý
37b9511ce1 Use libuv's shared library handling capabilities
While libltdl is a feature-rich library, BIND 9 code only uses its basic
capabilities, which are also provided by libuv and which BIND 9 already
uses for other purposes.  As libuv's cross-platform shared library
handling interface is modeled after the POSIX dlopen() interface,
converting code using the latter to the former is simple.  Replace
libltdl function calls with their libuv counterparts, refactoring the
code as necessary.  Remove all use of libltdl from the BIND 9 source
tree.
2020-10-28 15:48:58 +01:00
Ondřej Surý
e2436159ab Refactor the cleanup code in lt_dl code
The cleanup code that would clean the object after plugin/dlz/dyndb
loading has failed was duplicating the destructor for the object, so
instead of the extra code, we just use the destructor instead.
2020-10-28 15:48:58 +01:00
Ondřej Surý
4e9a58a3e6 Unify lt_dlopen() error handling
Make sure an error gets logged when any lt_dlopen() call in the source
tree fails.  Also make sure that NULL values returned by lt_dlerror()
are replaced with a generic error message to prevent passing NULL as an
argument for the %s format specifier.
2020-10-28 15:48:58 +01:00
Ondřej Surý
0f49b02fc5 Remove redundant lt_dlerror() calls
The redundant lt_dlerror() calls were taken from the examples to clean
any previous errors from lt_dl...() calls.  However upon code
inspection, it was discovered there are no such paths that could cause
the lt_dlerror() to return spurious error messages.
2020-10-28 15:48:58 +01:00
Michal Nowak
c0c4c024c6
Replace a seq invocation with a shell loop
seq is not portable.  Use a while loop instead to make the "dnssec"
system test script POSIX-compatible.
2020-10-27 12:21:53 +01:00
Michal Nowak
481dfb9671
Get rid of bashisms in string comparisons
The double equal sign ('==') is a Bash-specific string comparison
operator.  Ensure the single equal sign ('=') is used in all POSIX shell
scripts in the system test suite in order to retain their portability.
2020-10-27 12:21:07 +01:00
Michal Nowak
f0b13873a3
Fix system test backtrace generation on OpenBSD
On Linux, the core dump contains the absolute path to the crashed binary

    Core was generated by `/home/newman/isc/ws/bind9/bin/named/.libs/lt-named -D glue-ns1 -X named.lock -m'.

However, on OpenBSD there's only a basename

    Core was generated by `named'.

This commit adds support for the latter and retains the former.
2020-10-26 14:58:15 +01:00
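Handling both forms of the "Core was generated by" line quoted in the OpenBSD backtrace commit above could look like the sketch below. It is an illustration only: the actual system test tooling is not shown in this log, and crashed_binary_name and the regular expression are hypothetical.

```python
import os
import re

# Capture the first word after the opening backtick: either an absolute
# path (Linux) or a bare program name (OpenBSD).
CORE_LINE = re.compile(r"Core was generated by `([^' ]+)")


def crashed_binary_name(gdb_line):
    """Return the basename of the crashed binary, or None if not found."""
    match = CORE_LINE.search(gdb_line)
    if match is None:
        return None
    return os.path.basename(match.group(1))
```

Reducing both variants to a basename lets the caller locate the binary the same way regardless of which platform produced the core dump.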
Michal Nowak
a0426e0466
Use a POSIX shell in bin/tests/system/ifconfig.sh
Some non-POSIX shells, like /bin/csh on FreeBSD, are unable to execute
the config.guess file:

    + /bin/csh /var/tmp/gitlab_runner/builds/YdCaoq4b/0/mnowak/bind9/config.guess
    timestamp=2018-02-24: Command not found.
    me=config.guess: Command not found.
    Unmatched '"'.

When ./configure is run, it attempts to locate a POSIX-compliant shell.
Use the result of that search in the bin/tests/system/ifconfig.sh
script.
2020-10-23 10:32:21 +02:00
Ondřej Surý
64e56a9704 Postpone the isc_app_shutdown() after rndc response has been sent
When `rndc stop` was received, isc_app_shutdown() was being called
before the response to the rndc client had been sent; as
isc_app_shutdown() also tears down the netmgr, the message was never
sent and rndc would complain about the connection being interrupted in
the middle of the transaction.  We now postpone the shutdown until after
the rndc response has been sent.
2020-10-22 11:46:58 -07:00
Ondřej Surý
f7c82e406e Fix the isc_nm_closedown() to actually close the pending connections
1. isc__nm_tcp_send() and isc__nm_tcp_read() were not checking
   whether the socket was still alive and were scheduling reads/sends
   on a closed socket.

2. The isc_nm_read(), isc_nm_send() and isc_nm_resumeread() have been
   changed to always return the error conditions via the callbacks, so
   they always succeed.  This applies to all protocols (UDP, TCP and
   TCPDNS).
2020-10-22 11:37:16 -07:00
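The two points of the isc_nm_closedown() commit above (check that the socket is still alive, and report every outcome through the callback) can be modeled as below. This is a synchronous toy model, not the netmgr's asynchronous C API; the Socket class, the send function, and the "success"/"canceled" result strings are all hypothetical.

```python
class Socket:
    """Minimal stand-in for a network socket."""

    def __init__(self):
        self.active = True
        self.sent = []

    def close(self):
        self.active = False


def send(sock, data, callback):
    """Queue a send; the outcome is reported only through callback(result).

    The function itself never fails or returns a status, so callers have
    exactly one place to handle errors.
    """
    if not sock.active:
        # Do not schedule work on a closed socket; tell the caller instead.
        callback("canceled")
        return
    sock.sent.append(data)
    callback("success")
```

Because the caller always receives exactly one callback per request, cleanup code lives in a single place instead of being split between a return-value check and the callback.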
Michal Nowak
7ef268bb4b
Drop unused bufferlist code 2020-10-22 13:11:16 +02:00