This commit changes TLS DNS so that partial writes by the
SSL_write_ex() function are taken into account properly. Now, before
doing encryption, we are flushing the buffers for outgoing encrypted
data.
The problem is fairly complicated and originates from the fact that it
is somewhat hard to understand by reading the documentation if and
when partial writes are supported/enabled or not, and one can get a
false impression that they are not supported or enabled by
default (https://www.openssl.org/docs/man3.1/man3/SSL_write_ex.html). I
have added a lengthy comment about that into the code because it will
be more useful there. The documentation on this topic is vague and
hard to follow.
The main point is that when SSL_write_ex() fails with
SSL_ERROR_WANT_WRITE, the OpenSSL code tells us that we need to flush
the outgoing buffers and then call SSL_write_ex() again with exactly
the same arguments in order to continue as partial write could have
happened on the previous call to SSL_write_ex() (that is not hard to
verify by calling BIO_pending(sock->tls.app_rbio) before and after the
call to SSL_write_ex() and comparing the returned values). This aspect
was not taken into account in the code.
Now, one can wonder how that could have led to the behaviour that we
saw in the #4255 bug report. In particular, how could we lose one
message and duplicate one twice? That is where things get interesting.
One needs to keep two things in mind (that is important):
Firstly, the possibility that two (or more) subsequent SSL_write_ex()
calls will be done with exactly the same arguments is very high (the
code does not guarantee that in any way, but in practice, that happens
a lot).
Secondly, the dnsperf (the software that helped us to trigger the bug)
bombed the test server with messages that contained exactly the same
data. The only difference in the responses is message IDs, which can
be found closer to the start of a message.
So, that is what was going on in the older version of the code:
1. During one of the isc_nm_send() calls, the SSL_write_ex() call
fails with SSL_ERROR_WANT_WRITE. Partial writing has happened, though,
and we wrote a part of the message with the message
ID (e.g. 2014). Nevertheless, we have rescheduled the complete send
operation asynchronously by a call to tlsdns_send_enqueue().
2. While the asynchronous request has not been completed, we try to
send the message (e.g. with ID 2015). The next isc_nm_send() or
re-queued send happens with a call to SSL_write_ex() with EXACTLY the
same arguments as in the case of the previous call. That is, we are
acting as if we want to complete the previously failed SSL_write_ex()
attempt (according to the OpenSSL documentation:
https://www.openssl.org/docs/man3.1/man3/SSL_write_ex.html, the
"Warnings" section). This way, we already have a start of the message
containing the previous ID (2014 in our case) but complete the write
request with the rest of the data given in the current write
attempt. However, as responses differ only in message ID, we end up
sending a valid (properly structured) DNS message but with the ID of
the previous one. This way, we send a message with ID from the
previous isc_nm_send() attempt. The message with the ID from the send
request from this attempt will never be sent, as the code thinks that
it is sending it now (that is how we send the message with ID 2014
instead of 2015, as in our example, thus making the message with ID
2015 never to be sent).
3. At some point later, the asynchronous send request (the rescheduled
on the first step) completes without an error, sending a second
message with the same ID (2014).
It took exhausting SSL write buffers (so that a data encryption
attempt cannot be completed in one operation) via long DoT streams in
order to exhibit the behaviour described above. The exhaustion
happened because we have not been trying to flush the buffers often
enough (especially in the case of multiple subsequent writes).
In my opinion, the origin of the problem can be described as follows:
It happened due to making wrong guesses caused by poorly written
documentation.
This commit adds couple of functions to change "dirty_decay_ms" and
"muzzy_decay_ms" settings on arenas associated with memory contexts.
(cherry picked from commit 6e98b58d15)
This commit extends the internal memory management middleware code in
BIND so that memory contexts backed by dedicated jemalloc arenas can
be created. A new function (isc_mem_create_arena()) is added for that.
Moreover, it extends the existing code so that specialised memory
contexts can be created easily, should we need that functionality for
other future purposes. We have achieved that by passing the flags to
the underlying jemalloc-related calls. See the above
isc_mem_create_arena(), which can serve as an example of this.
Having this opens up possibilities for creating memory contexts tuned
for specific needs.
(cherry picked from commit 8550c52588)
This commit ensures that BIND and supplementary tools still can be
built on newer versions of DragonFly BSD. It used to be the case, but
somewhere between versions 6.2 and 6.4 the OS developers rearranged
headers and moved some function definitions around.
Before that the fact that it worked was more like a coincidence, this
time we, at least, looked at the related man pages included with the
OS.
No in depth testing has been done on this OS as we do not really
support this platform - so it is more like a goodwill act. We can,
however, use this platform for testing purposes, too. Also, we know
that the OS users do use BIND, as it is included in its ports
directory.
Building with './configure' and './configure --without-jemalloc' have
been fixed and are known to work at the time the commit is made.
(cherry picked from commit 942569a1bb)
A negative or excessively large Content-Length could cause a crash
by making `INSIST(httpd->consume != 0)` fail.
(cherry picked from commit 26e10e8fb5)
Oracle Linux 7 sets __STDC_VERSION__ to 201112L, but doesn't define
__STDC_NO_ATOMICS__, so we try to include <stdatomic.h> without the
header present in the system. Since we are already detecting the header
in the autoconf, use the HAVE_STDATOMIC_H for more reliable detecting
whether <stdatomic.h> header is present.
The detach (and possibly close) netmgr events can cause additional
callbacks to be called when under exclusive mode. The detach can
trigger next queued TCP query to be processed and close will call
configured close callback.
Move the detach and close netmgr events from the priority queue to the
normal queue as the detaching and closing the sockets can wait for the
exclusive mode to be over.
In HTTP/1.0 and HTTP/1.1, RFC 9112 section 9.6 says the last response
in a connection should include a `Connection: close` header, but the
statschannel server omitted it.
In an HTTP/1.0 response, the statschannel server can sometimes send a
`Connection: keep-alive` header when it is about to close the
connection. There are two ways:
If the first request on a connection is keep-alive and the second
request is not, then _both_ responses have `Connection: keep-alive`
but the connection is (correctly) closed after the second response.
If a single request contains
Connection: close
Connection: keep-alive
then RFC 9112 section 9.3 says the keep-alive header is ignored, but
the statschannel sends a spurious keep-alive in its response, though
it correctly closes the connection.
To fix these bugs, make it more clear that the `httpd->flags` are part
of the per-request-response state. The Connection: flags are now
described in terms of the effect they have instead of what causes them
to be set.
(manually picked from commit e18ca83a3b)
Tasks can block for a long time, especially when used by tools in
interactive mode. Update the event loop's time to avoid unexpected
errors when processing later events during the same callback.
For example, newly started timers can fire too early, because the
current time was stale. See the note about uv_update_time() in the
https://docs.libuv.org/en/v1.x/timer.html#c.uv_timer_start page.
The isc_result_t enum was to sparse when each library code would skip to
next << 16 as a base. Remove the huge holes in the isc_result_t enum to
make the isc_result tables more compact.
This change required a rewrite how we map dns_rcode_t to isc_result_t
and back, so we don't ever return neither isc_result_t value nor
dns_rcode_t out of defined range.
(cherry picked from commit a8e6c3b8f7)
Report "permission denied" instead of "unexpected error"
when trying to update a zone file on a read-only file system.
(cherry picked from commit dd6acc1cac)
isc_mem_put NULL's the pointer to the memory being freed. The
equality test 'parent->r == node' was accidentally being turned
into a test against NULL.
(cherry picked from commit ac2e0bc3ff)
Seemingly by omission, sockstop netievent used by multi-layer sockets
was not a high priority event, like it should be (similarly to other
socket types).
In particular, that could make BIND stuck on reconfiguration after a
DoH-listener is removed from the configuration.
This commit fixes that.
The intention behind 'isc__nmsocket_stop()' was that the function
sends notifications on every worker thread, making them synchronise on
the barrier, then the initiating thread waits on it, too. This way we
ensure than no other operation will start when we shutting down the
listener.
However, it seems that due to mistake we have been passing the wrong
worker pointer into isc__nm_async_sockstop() from within the context
of an worker thread which has initiated shutting down. While
effectively we have not been using the pointer in this case, it could
cause maintenance issues later. This commit fixes that.
Stop deliberately breaking const rules by copying file->name into
dirbuf and truncating it there. Handle files located in the root
directory properly. Use unlinkat() from POSIX 200809.
(cherry picked from commit 9fcd42c672)
Removing old timestamp or increment versions of log backup files did
not work when the file is an absolute path: only the entry name was
provided to the file remove function.
The dirname was also bogus, since the file separater was put back too
soon.
Fix these issues to make log file rotation work when the file is
configured to be an absolute path.
(cherry picked from commit 70629d73da)
This reverts commit 6cdeb5b046 which added
wrapper around all the unit tests that would run the unit test in the
forked process.
This makes any debugging of the unit tests too hard. Futures attempts to
fix#3980 (closed) should add a custom automake test harness (log
driver) that would kill the unit test after configured timeout.
The CI doesn't provide useful forensics when a system test locks
up. Fork the process and kill it with ABRT if it is still running
after 20 minutes. Pass the exit status to the caller.
(cherry picked from commit 3d5c7cd46c)
The isc_fsaccess API was created to hide the implementation details
between POSIX and Windows APIs. As we are not supporting the Windows
APIs anymore, it's better to drop this API used in the DST part.
Moreover, the isc_fsaccess was setting the permissions in an insecure
manner - it operated on the filename, and not on the file descriptor
which can lead to all kind of attacks if unpriviledged user has read (or
even worse write) access to key directory.
Replace the code that operates on the private keys with code that uses
mkstemp(), fchmod() and atomic rename() at the end, so at no time the
private key files have insecure permissions.
(cherry picked from commit 263d232c79)
As it's impossible to get the current umask without modifying it at the
same time, initialize the current umask at the program start and keep
the loaded value internally. Add isc_os_umask() function to access the
starttime umask.
(cherry picked from commit aca7dd3961)
This commit contains the backport of the behaviour for handling TLS
connect callbacks when wrapping up.
The current behaviour have not caused any problems to us, yet, but we
are changing it to remain on the safer side.
This commit ensures that 'sock->tls.pending_req' is not getting
nullified during TLS connection timeout callback as it prevents the
connection callback being called when connecting was not successful.
We expect 'isc__nm_failed_connect_cb() to be called from
'isc__nm_tlsdns_shutdown()' when establishing connections was
successful, but with 'sock->tls.pending_req' nullified that will not
happen.
The code removed most likely was required in older iterations of the
NM, but to me it seems that now it does only harm. One of the well
know pronounced effects is leading to irrecoverable zone transfer
hangs via TLS.
Previously, the RPZ updates ran quantized on the main nm_worker loops.
As the quantum was set to 1024, this might lead to service
interruptions when large RPZ update was processed.
Change the RPZ update process to run as the offloaded work. The update
and cleanup loops were refactored to do as little locking of the
maintenance lock as possible for the shortest periods of time and the db
iterator is being paused for every iteration, so we don't hold the rbtdb
tree lock for prolonged periods of time.
(cherry picked from commit f106d0ed2b)
libuv support for receiving multiple UDP messages in a single system
call (recvmmsg()) has been tweaked several times between libuv versions
1.35.0 and 1.40.0. Mixing and matching libuv versions within that span
may lead to assertion failures and is therefore considered harmful, so
try to limit potential damage be preventing users from mixing libuv
versions with distinct sets of recvmmsg()-related flags.
(cherry picked from commit 735d09bffe)
The implementation of UDP recvmmsg in libuv 1.35 and 1.36 is
incomplete and could cause assertion failure under certain
circumstances.
Modify the configure and runtime checks to report a fatal error when
trying to compile or run with the affected versions.
(cherry picked from commit 251f411fc3)
isc_bind9 was a global bool used to indicate whether the library
was being used internally by BIND or by an external caller. external
use is no longer supported, but the variable was retained for use
by dyndb, which needed it only when being built without libtool.
building without libtool is *also* no longer supported, so the variable
can go away.
(cherry picked from commit 935879ed11)
This commit ensures that BIND and supplementary tools still can be
built on newer versions of DragonFly BSD. It used to be the case, but
somewhere between versions 6.2 and 6.4 the OS developers rearranged
headers and moved some function definitions around.
Before that the fact that it worked was more like a coincidence, this
time we, at least, looked at the related man pages included with the
OS.
No in depth testing has been done on this OS as we do not really
support this platform - so it is more like a goodwill act. We can,
however, use this platform for testing purposes, too. Also, we know
that the OS users do use BIND, as it is included in its ports
directory.
Building with './configure' and './configure --without-jemalloc' have
been fixed and are known to work at the time the commit is made.
(cherry picked from commit 942569a1bb)
Return 'isc_result_t' type value instead of 'bool' to indicate
the actual failure. Rename the function to something not suggesting
a boolean type result. Make changes in the places where the API
function is being used to check for the result code instead of
a boolean value.
(cherry picked from commit 41dc48bfd7)
If the OpenSSL SHA1_{Init,Update,Final} API is still available, use it.
The API has been deprecated in OpenSSL 3.0, but it is significantly
faster than EVP_MD API, so make an exception here and keep using it
until we can't.
(cherry picked from commit 25db8d0103)
Instead of going through another layer, use OpenSSL EVP_MD API directly
in the isc_iterated_hash() implementation. This shaves off couple of
microseconds in the microbenchmark.
(cherry picked from commit 36654df732)
as far as I can determine the order of operations is not important.
*** CID 351372: Concurrent data access violations (ATOMICITY)
/lib/isc/timer.c: 227 in timer_purge()
221 LOCK(&timer->lock);
222 if (!purged) {
223 /*
224 * The event has already been executed, but not
225 * yet destroyed.
226 */
>>> CID 351372: Concurrent data access violations (ATOMICITY)
>>> Using an unreliable value of "event" inside the second locked section. If the data that "event" depends on was changed by another thread, this use might be incorrect.
227 timerevent_unlink(timer, event);
228 }
229 }
230 }
231
232 void
(cherry picked from commit 98718b3b4b)
The reference counting and isc_timer_attach()/isc_timer_detach()
semantic are actually misleading because it cannot be used under normal
conditions. The usual conditions under which is timer used uses the
object where timer is used as argument to the "timer" itself. This
means that when the caller is using `isc_timer_detach()` it needs the
timer to stop and the isc_timer_detach() does that only if this would be
the last reference. Unfortunately, this also means that if the timer is
attached elsewhere and the timer is fired it will most likely be
use-after-free, because the object used in the timer no longer exists.
Remove the reference counting from the isc_timer unit, remove
isc_timer_attach() function and rename isc_timer_detach() to
isc_timer_destroy() to better reflect how the API needs to be used.
The only caveat is that the already executed event must be destroyed
before the isc_timer_destroy() is called because the timer is no longet
attached to .ev_destroy_arg.
(cherry picked from commit ae01ec2823)
The isc_task_purge() and isc_task_purgerange() were now unused, so sweep
the task.c file. Additionally remove unused ISC_EVENTATTR_NOPURGE event
attribute.
(cherry picked from commit c17eee034b)
Add isc_task_setquantum() function that modifies quantum for the future
isc_task_run() invocations.
NOTE: The current isc_task_run() caches the task->quantum into a local
variable and therefore the current event loop is not affected by any
quantum change.
(cherry picked from commit 15ea6f002f)
Instead of searching for the events to purge, keep the list of scheduled
events on the timer list and purge the events that we have scheduled.
(cherry picked from commit 3f8024b4a2f12fcd28a9dd813b6f1f3f11d506f2)
The isc_task_purgerange() was walking through all events on the task to
find a matching task. Instead use the ISC_LINK_LINKED to find whether
the event is active.
Cleanup the related isc_task_unsend() and isc_task_unsendrange()
functions that were not used anywhere.
(cherry picked from commit 17aed2f895)
Previously, an incremental hash table resizing was implemented for the
dns_rbt_t hash table implementation. Using that as a base, also
implement the incremental hash table resizing also for isc_ht API
hashtables:
1. During the resize, allocate the new hash table, but keep the old
table unchanged.
2. In each lookup, delete, or iterator operation, check both tables.
3. Perform insertion operations only in the new table.
4. At each insertion also move <r> elements from the old table to
the new table.
5. When all elements are removed from the old table, deallocate it.
To ensure that the old table is completely copied over before the new
table itself needs to be enlarged, it is necessary to increase the
size of the table by a factor of at least (<r> + 1)/<r> during resizing.
In our implementation <r> is equal to 1.
The downside of this approach is that the old table and the new table
could stay in memory for longer when there are no new insertions into
the hash table for prolonged periods of time as the incremental
rehashing happens only during the insertions.
(cherry picked from commit e42cb1f198)
Prefer the pthread_barrier implementation on platforms where it is
available over uv_barrier implementation. This also solves the problem
with thread sanitizer builds on macOS that doesn't have pthread barrier.
(cherry picked from commit d07c4a98da)
This reverts commit f17f5e831b that made
following change:
> The TLSDNS transport was not honouring the single read callback for
> TLSDNS client. It would call the read callbacks repeatedly in case the
> single TLS read would result in multiple DNS messages in the decoded
> buffer.
Turns out that this change broke XoT, so we are reverting the change
until we figure out a proper fix that will keep the design promise and
not break XoT at the same time.
Arthimetic on NULL pointers is undefined. Avoid arithmetic operations
when 'in' is NULL and require 'in' to be non-NULL if 'inlen' is not zero.
(cherry picked from commit 349c23dbb7)
DSCP has not been fully working since the network manager was
introduced in 9.16, and has been completely broken since 9.18.
This seems to have caused very few difficulties for anyone,
so we have now marked it as obsolete and removed the
implementation.
To ensure that old config files don't fail, the code to parse
dscp key-value pairs is still present, but a warning is logged
that the feature is obsolete and should not be used. Nothing is
done with configured values, and there is no longer any
range checking.
(cherry picked from commit 916ea26ead)
Additionally to renaming, it changes the function definition so that
it accepts a pointer to pointer instead of returning a pointer to the
new object.
It is mostly done to make it in line with other functions in the
module.
(cherry picked from commit 7962e7f575)