Commit graph

14250 commits

Author SHA1 Message Date
Matthijs Mekking
3c244da9d4 Recognize escapes when reading the public key
Escapes are valid in DNS names, and should be recognized when reading
the public key from disk.

(cherry picked from commit 71f023a1c3)
2023-11-20 08:35:30 +01:00
Evan Hunt
6b47d98a95 set loadtime during initial transfer of a secondary zone
when transferring in a non-inline-signing secondary for the first time,
we previously never set the value of zone->loadtime, so it remained
zero. this caused a test failure in the statschannel system test,
and that test case was temporarily disabled.  the value is now set
correctly and the test case has been reinstated.

(cherry picked from commit 9643281453)
2023-11-15 18:06:58 -08:00
Mark Andrews
617f73426d Adjust comment to have correct message limit value
(cherry picked from commit 560c245971)
2023-11-16 12:22:08 +11:00
Mark Andrews
ab2a450887 Check that buffer length in dns_message_renderbegin
The maximum DNS message size is 65535 octets. Check that the buffer
being passed to dns_message_renderbegin does not exceed this as the
compression code assumes that all offsets are no bigger than this.

(cherry picked from commit a069513234)
2023-11-16 12:22:08 +11:00
Ondřej Surý
6a85e79c0b
Reformat sources with up-to-date clang-format-17 2023-11-13 17:13:07 +01:00
Mark Andrews
ba7cfd2f92 Suppress reporting upcoming changes in root hints
To reduce the amount of log spam when root servers change their
addresses keep a table of upcoming changes by expected date and time
and suppress reporting differences for them until then.

Add initial entry for B.ROOT-SERVERS.NET, Nov 27, 2023.

(cherry picked from commit b69100b747)
2023-11-03 03:43:49 +11:00
Mark Andrews
15e13bd523 Update b.root-servers.net IP addresses
This covers both root hints and the default primaries for the root
zone mirror.  The official change date is Nov 27, 2023.

(cherry picked from commit 2ca2f7e985)
2023-11-03 03:43:49 +11:00
Michał Kępień
e974f98eb4
Improve stability of the jemalloc workaround
When jemalloc is linked into BIND 9 binaries (rather than preloaded or
used as the system allocator), depending on the decisions made by the
linker, the malloc() symbol may be resolved to a non-jemalloc
implementation at runtime.  Such a scenario foils the workaround added
in commit 2da371d005 as it relies on the
jemalloc implementation of malloc() to be executed.

Handle the above scenario properly by calling mallocx() explicitly
instead of relying on the runtime resolution of the malloc() symbol.
Use trivial wrapper functions to avoid the need to copy multiple #ifdef
lines from lib/isc/mem.c to lib/isc/trampoline.c.  Using a simpler
alternative, e.g. calling isc_mem_create() & isc_mem_destroy(), was
already considered before and rejected, as described in the log message
for commit 2da371d005.

ADJUST_ZERO_ALLOCATION_SIZE() is only used in isc__mem_free_noctx() to
concisely avoid compilation warnings about its 'size' parameter not
being used when building against jemalloc < 4.0.0 (as sdallocx() is then
redefined to dallocx(), which has a different signature).
2023-11-01 18:04:07 +01:00
Matthijs Mekking
76c9019403 Don't ignore auth zones when in serve-stale mode
When serve-stale is enabled and recursive resolution fails, the fallback
to lookup stale data always happens in the cache database. Any
authoritative data is ignored, and only information learned through
recursive resolution is examined.

If there is data in the cache that could lead to an answer, and this can
be just the root delegation, the resolver will iterate further, getting
closer to the answer that can be found by recursing down the root, and
eventually puts the final response in the cache.

Change the fallback to serve-stale to use 'query_getdb()', that finds
out the best matching database for the given query.

(cherry picked from commit 2322425016)
2023-10-31 13:52:08 +01:00
Mark Andrews
8b11061b91 Only declare 'engine' if it is used
Move the declaration of 'engine' within the appropriate #if/#endif
block.  Remove the UNUSED(engine) from the #else block.
2023-10-27 10:49:38 +11:00
Aram Sargsyan
2141bde46b Fix shutdown races in catzs
The dns__catz_update_cb() does not expect that 'catzs->zones'
can become NULL during shutdown.

Add similar checks in the dns__catz_update_cb() and dns_catz_zone_get()
functions to protect from such a case. Also add an INSIST in the
dns_catz_zone_add() function to explicitly state that such a case
is not expected there, because that function is called only during a
reconfiguration.

(cherry picked from commit 4eb4fa288c)
2023-10-23 10:53:40 +00:00
Mark Andrews
306ee4cb28 Adjust UDP timeouts used in zone maintenance
Drop timeout before resending a UDP request from 15 seconds to 5
seconds and add 1 second to the total time to allow for the reply
to the third request to arrive.  This will speed up the time it
takes for named to recover from a lost packet when refreshing a
zone and for it to determine that a primary is down.

(cherry picked from commit 29f399797d)
2023-10-20 00:16:01 +00:00
Mark Andrews
ebfbad29c1 Add parentheses around macro arguement 'msec'
The is needed to ensure that the multiplication is correctly done.
This was reported by Jinmei Tatuya.
2023-10-20 10:30:48 +11:00
Michal Nowak
7c6632e174
Update the source code formatting using clang-format-17 2023-10-18 09:02:57 +02:00
Matthijs Mekking
ac1b70ad00 Don't resign raw version of the zone
Update the function 'set_resigntime()' so that raw versions of
inline-signing zones are not scheduled to be resigned.

Also update the check in the same function for zone is dynamic, there
exists a function 'dns_zone_isdynamic()' that does a similar thing
and is more complete.

Also in 'zone_postload()' check whether the zone is not the raw
version of an inline-signing zone, preventing calculating the next
resign time.

(cherry picked from commit 741ce2d07a)
2023-10-16 10:34:17 +02:00
Ondřej Surý
905f8c5899
Don't undef <unit>_TRACE, instead add comment how to enable it
In units that support detailed reference tracing via ISC_REFCOUNT
macros, we were doing:

    /* Define to 1 for detailed reference tracing */
    #undef <unit>_TRACE

This would prevent using -D<unit>_TRACE=1 in the CFLAGS.

Convert the above mentioned snippet with just a comment how to enable
the detailed reference tracing:

    /* Add -D<unit>_TRACE=1 to CFLAGS for detailed reference tracing */

(cherry picked from commit 6afa961534)
2023-10-13 11:46:41 +02:00
Aram Sargsyan
c061b90cc6 Remove unnecessary NULL-checks in ns__client_setup()
All these pointers are guaranteed to be non-NULL.

Additionally, update a comment to remove obviously outdated
information about the function's requirements.

(cherry picked from commit b970556f21)
2023-10-02 10:04:56 +00:00
Aram Sargsyan
92e5173a9f Don't use an uninitialized link on an error path
Move the block on the error path, where the link is checked, to a place
where it makes sense, to avoid accessing an unitialized link when
jumping to the 'cleanup_query' label from 4 different places. The link
is initialized only after those jumps happen.

In addition, initilize the link when creating the object, to avoid
similar errors.

(cherry picked from commit fb7bbbd1be)
2023-09-28 10:30:42 +00:00
Ondřej Surý
818f4dc3a7
Explicitly cast chars to unsigned chars for <ctype.h> functions
Apply the semantic patch to catch all the places where we pass 'char' to
the <ctype.h> family of functions (isalpha() and friends, toupper(),
tolower()).

(cherry picked from commit 29caa6d1f0)
2023-09-22 17:01:59 +02:00
Michał Kępień
2f08b622bf Merge tag 'v9.18.19' into bind-9.18 2023-09-20 16:52:16 +02:00
Mark Andrews
933d03fc83 Correctly set the value of covered in dns_ncache_current
Fix the type and rdclass being passed to dns_rdata_tostruct so
that rrsig.covered is correctly set.

(cherry picked from commit 779980710c)
2023-09-18 15:51:44 +10:00
Aram Sargsyan
2a57c12922 Remove an unnecessary NULL-check
In the ns__client_put_cb() callback function the 'client->manager'
pointer is guaranteed to be non-NULL, because in ns__client_request(),
before setting up the callback, the ns__client_setup() function is
called for the 'client', which makes sure that 'client->manager' is set.

Removing the NULL-check resolves the following static analyzer warning:

    /lib/ns/client.c: 1675 in ns__client_put_cb()
    1669     		dns_message_puttemprdataset(client->message, &client->opt);
    1670     	}
    1671     	client_extendederror_reset(client);
    1672
    1673     	dns_message_detach(&client->message);
    1674
    >>>     CID 465168:  Null pointer dereferences  (REVERSE_INULL)
    >>>     Null-checking "client->manager" suggests that it may be null, but it has already been dereferenced on all paths leading to the check.
    1675     	if (client->manager != NULL) {
    1676     		ns_clientmgr_detach(&client->manager);
    1677     	}
    1678
    1679     	/*
    1680     	 * Detaching the task must be done after unlinking from
2023-09-14 10:39:24 +00:00
Artem Boldariev
18efa454a9
TLS DNS: fix 'isc__nm_uvreq_t' send request use after free error
In TLS DNS, it was possible for an 'isc__nm_uvreq_t' object to be used
twice when it was not implied, leading to use after free errors,
leading to abort()'s and/or crashes.

In particular, in the older version of the code, the following was
happening:

* 'tlsdns_send_direct()' is eventually called from
'isc__nm_async_tlsdnssend()' when sending data;

* 'tlsdns_send_direct()' could fail. If it happens early closer to the
beginning of the function, then the 'isc__nm_uvreq_t' object is passed
to 'isc__nm_failed_send_cb()' and eventually gets destroyed normally
without any negative consequences;

* However, if 'tlsdns_send_direct()' fails due to the call to
'tls_cycle()', then the send request object is re-queued to be
processed asynchronously later by a call to 'tlsdns_send_enqueue()'.

* The error processing code in 'tlsdns_send_direct()' is written
without an assumption that the send request object could be re-queued
in 'tlsdns_send_direct()' and, thus, it was passed to
'isc__nm_failed_send_cb()' for asynchronous error processing.

* As a result of asynchronous processing initiated by
`tlsdns_send_enqueue()` the 'isc__nm_uvreq_t' send request object gets
properly destroyed eventually.

* Then, as a result of asynchronous error processing initiated by the
'isc__nm_failed_send_cb()' call, the send request object is used a
second time. However, it was destroyed at the previous step.

So we have a (relatively) obvious logical mistake here. Why
'tls_cycle()' was failing is of no particular importance, as I can see
many ways for it to fail both on TLS and networking levels. In
particular, during testing with 'dnsperf', I have encountered the
following two cases:

1. 'result == ISC_R_TLSERROR' - the remote side was wrapping up,
causing errors on TLS level (e.g. handshake has not been completed
yet);

2. 'result == ISC_R_CONNECTIONRESET' - the remote side closed the
connection after we have initiated the asynchronous send request
operation and reached the point when its processing started.

One could ask: why haven't we encountered the error before? That
happened because 'send()' is, in general, a faster operation than
'read()' so we were lucky enough not to encounter this, as it took
dnsperf to make our code fail in a particular way to trigger the
problematic code path. It requires closing the connection shortly
after we have made an asynchronous 'send()' attempt and passed the
data for processing by the libuv code (or, to be precise, down our
technologies stack).
2023-09-08 11:14:22 +02:00
Mark Andrews
432a49a7b0
Limit isccc_cc_fromwire recursion depth
Named and rndc do not need a lot of recursion so the depth is
set to 10.
2023-09-07 19:50:27 +02:00
Mark Andrews
7f89f2d6bc Call ERR_clear_error on EVP_MD_fetch or EVP_##alg error
(cherry picked from commit 28adcf1831)
2023-09-06 15:47:05 +00:00
Mark Andrews
0325e4a1fb Adjust level of log messages when transferring in a zone
This raises the log level of messages treated as FORMERR to NOTICE
when transfering in a zone.  This also adds a missing log message
for TYPE0 and meta types received during a zone transfer.

(cherry picked from commit 6c3414739d)
2023-09-06 20:14:41 +10:00
Artem Boldariev
84d71c8e2c
TLS DNS: take into account partial writes by SSL_write_ex()
This commit changes TLS DNS so that partial writes by the
SSL_write_ex() function are taken into account properly. Now, before
doing encryption, we are flushing the buffers for outgoing encrypted
data.

The problem is fairly complicated and originates from the fact that it
is somewhat hard to understand by reading the documentation if and
when partial writes are supported/enabled or not, and one can get a
false impression that they are not supported or enabled by
default (https://www.openssl.org/docs/man3.1/man3/SSL_write_ex.html). I
have added a lengthy comment about that into the code because it will
be more useful there. The documentation on this topic is vague and
hard to follow.

The main point is that when SSL_write_ex() fails with
SSL_ERROR_WANT_WRITE, the OpenSSL code tells us that we need to flush
the outgoing buffers and then call SSL_write_ex() again with exactly
the same arguments in order to continue as partial write could have
happened on the previous call to SSL_write_ex() (that is not hard to
verify by calling BIO_pending(sock->tls.app_rbio) before and after the
call to SSL_write_ex() and comparing the returned values). This aspect
was not taken into account in the code.

Now, one can wonder how that could have led to the behaviour that we
saw in the #4255 bug report. In particular, how could we lose one
message and duplicate one twice? That is where things get interesting.

One needs to keep two things in mind (that is important):

Firstly, the possibility that two (or more) subsequent SSL_write_ex()
calls will be done with exactly the same arguments is very high (the
code does not guarantee that in any way, but in practice, that happens
a lot).

Secondly, the dnsperf (the software that helped us to trigger the bug)
bombed the test server with messages that contained exactly the same
data. The only difference in the responses is message IDs, which can
be found closer to the start of a message.

So, that is what was going on in the older version of the code:

1. During one of the isc_nm_send() calls, the SSL_write_ex() call
fails with SSL_ERROR_WANT_WRITE. Partial writing has happened, though,
and we wrote a part of the message with the message
ID (e.g. 2014). Nevertheless, we have rescheduled the complete send
operation asynchronously by a call to tlsdns_send_enqueue().

2. While the asynchronous request has not been completed, we try to
send the message (e.g. with ID 2015). The next isc_nm_send() or
re-queued send happens with a call to SSL_write_ex() with EXACTLY the
same arguments as in the case of the previous call. That is, we are
acting as if we want to complete the previously failed SSL_write_ex()
attempt (according to the OpenSSL documentation:
https://www.openssl.org/docs/man3.1/man3/SSL_write_ex.html, the
"Warnings" section). This way, we already have a start of the message
containing the previous ID (2014 in our case) but complete the write
request with the rest of the data given in the current write
attempt. However, as responses differ only in message ID, we end up
sending a valid (properly structured) DNS message but with the ID of
the previous one. This way, we send a message with ID from the
previous isc_nm_send() attempt. The message with the ID from the send
request from this attempt will never be sent, as the code thinks that
it is sending it now (that is how we send the message with ID 2014
instead of 2015, as in our example, thus making the message with ID
2015 never to be sent).

3. At some point later, the asynchronous send request (the rescheduled
on the first step) completes without an error, sending a second
message with the same ID (2014).

It took exhausting SSL write buffers (so that a data encryption
attempt cannot be completed in one operation) via long DoT streams in
order to exhibit the behaviour described above. The exhaustion
happened because we have not been trying to flush the buffers often
enough (especially in the case of multiple subsequent writes).

In my opinion, the origin of the problem can be described as follows:

It happened due to making wrong guesses caused by poorly written
documentation.
2023-09-05 18:03:44 +02:00
Artem Boldariev
1cc17f797e
Allocate DNS send buffers using dedicated per-worker memory arenas
This commit ensures that memory allocations related to DNS send
buffers are routed through dedicated per-worker memory arenas in order
to decrease memory usage on high load caused by TCP-based DNS
transports.

We do that by following jemalloc developers suggestions:

https://github.com/jemalloc/jemalloc/issues/2483#issuecomment-1639019699
https://github.com/jemalloc/jemalloc/issues/2483#issuecomment-1698173849
(cherry picked from commit 01cc7edcca)
2023-09-05 15:02:30 +02:00
Artem Boldariev
f5cb14265f
Add ability to set per jemalloc arena dirty and muzzy decay values
This commit adds couple of functions to change "dirty_decay_ms" and
"muzzy_decay_ms" settings on arenas associated with memory contexts.

(cherry picked from commit 6e98b58d15)
2023-09-05 15:02:30 +02:00
Artem Boldariev
16a45837ca
Make it possible to create memory contexts backed by jemalloc arenas
This commit extends the internal memory management middleware code in
BIND so that memory contexts backed by dedicated jemalloc arenas can
be created. A new function (isc_mem_create_arena()) is added for that.

Moreover, it extends the existing code so that specialised memory
contexts can be created easily, should we need that functionality for
other future purposes. We have achieved that by passing the flags to
the underlying jemalloc-related calls. See the above
isc_mem_create_arena(), which can serve as an example of this.

Having this opens up possibilities for creating memory contexts tuned
for specific needs.

(cherry picked from commit 8550c52588)
2023-09-05 15:02:30 +02:00
Artem Boldariev
d53ecb7720
Fix building BIND on DragonFly BSD (on both older an newer versions)
This commit ensures that BIND and supplementary tools still can be
built on newer versions of DragonFly BSD. It used to be the case, but
somewhere between versions 6.2 and 6.4 the OS developers rearranged
headers and moved some function definitions around.

Before that the fact that it worked was more like a coincidence, this
time we, at least, looked at the related man pages included with the
OS.

No in depth testing has been done on this OS as we do not really
support this platform - so it is more like a goodwill act. We can,
however, use this platform for testing purposes, too. Also, we know
that the OS users do use BIND, as it is included in its ports
directory.

Building with './configure' and './configure --without-jemalloc' have
been fixed and are known to work at the time the commit is made.

(cherry picked from commit 942569a1bb)
2023-09-05 10:33:51 +02:00
Ondřej Surý
c9d6f0e400
Deprecate 'dnssec-must-be-secure' option
The dnssec-must-be-secure feature was added in the early days of BIND 9
and DNSSEC and it makes sense only as a debugging feature.

Remove the feature to simplify the code.

(cherry picked from commit 9e0b348a2b)
2023-09-04 17:27:14 +02:00
Ondřej Surý
46969dca75 Unobfuscate the code-flow logic in got_transfer_quota()
This refactors the code flow in got_transfer_quota() to not use the
CHECK() macro as it really obfuscates the code flow logic here.

(cherry picked from commit 00cb151f8e)
2023-09-04 08:04:52 +00:00
Aram Sargsyan
07b2d19b92 Reset the 'result' before using it again
The 'result' variable should be reset to ISC_R_NOTFOUND again,
because otherwise a log message could be logged about not being
able to get the TLS configuration based on on the 'result' value
from the previous calls to get the TSIG key.

(cherry picked from commit 6cab7fc627)
2023-09-04 08:04:52 +00:00
Mark Andrews
29a93d2889 Check that buf is large enough
(cherry picked from commit 299f519b09)
2023-09-01 14:06:27 +10:00
Mark Andrews
f77ffa7953 Take ownership of pointer before freeing
(cherry picked from commit 9e2288208d)
2023-09-01 14:03:49 +10:00
Mark Andrews
4c27f80476 Add missing "Design by Contract" REQUIREs
(cherry picked from commit b442ae8d3e)
2023-09-01 14:00:14 +10:00
Mark Andrews
894b0970e6 Clear OpenSSL errors on TSL error paths
(cherry picked from commit 4f790b6c58)
2023-09-01 13:45:34 +10:00
Mark Andrews
0111782f1e Clear OpenSSL errors on context creation failures
(cherry picked from commit 96db614d69)
2023-09-01 13:45:34 +10:00
Mark Andrews
900efd613f Clear OpenSSL errors on SHA failures
(cherry picked from commit 247422c69f)
2023-09-01 13:45:34 +10:00
Mark Andrews
34a0bb146c Clear OpenSSL errors on engine errors
(cherry picked from commit 2ba62aebce)
2023-09-01 13:43:20 +10:00
Mark Andrews
aca6f3e82d Clear OpenSSL errors on EVP failures
(cherry picked from commit 4ea926934a)
2023-09-01 13:40:32 +10:00
Mark Andrews
b5b13771f2 Clear OpenSSL errors on EVP_PKEY_new failures
(cherry picked from commit 6df53cdb87)
2023-09-01 13:37:02 +10:00
Mark Andrews
4d37996b1a Clear OpenSSL errors on EC_KEY_get0_private_key failures
(cherry picked from commit 86b04368b0)
2023-09-01 13:34:14 +10:00
Mark Andrews
386e88d0e4 Clear OpenSSL errors on EVP_PKEY_get0_EC_KEY failures
(cherry picked from commit abd8c03592)
2023-09-01 13:30:30 +10:00
Mark Andrews
ababcd28c1 Clear OpenSSL errors on EVP_PKEY_get_bn_param failures
(cherry picked from commit d8a9adc821)
2023-09-01 13:24:02 +10:00
Mark Andrews
fb503aa275 Clear OpenSSL errors on EVP_MD_CTX_create failures
(cherry picked from commit 8529be30bb)
2023-09-01 13:13:59 +10:00
Mark Andrews
290896921d Clear OpenSSL errors on ECDSA_SIG_new failures
(cherry picked from commit eafcd41120)
2023-09-01 13:13:06 +10:00
Matthijs Mekking
6e078a79d5 After cache flush, restore serve-stale settings
When flushing the cache, we create a new cache database. The serve-stale
settings need to be restored after doing this. We already did this
for max-stale-ttl, but forgot to do this for stale-refresh-time.

(cherry picked from commit 3ae721db6c)
2023-08-31 11:13:08 +02:00
Mark Andrews
58be5d8ed0 rr_exists should not error if the name does not exist
rr_exists errored if the name did not exist in the zone.  This was
not an issue prior to the addition of krb5-subdomain-self-rhs and
ms-subdomain-self-rhs as the only name used was the zone name which
always existed.

(cherry picked from commit b76a15977a)
2023-08-30 10:05:09 +10:00