bind9

mirror of https://github.com/isc-projects/bind9.git synced 2026-05-28 04:34:54 -04:00

Author	SHA1	Message	Date
Artem Boldariev	d0907a3a1f	TLS DNS: Simplify tls_cycle_input() This commit simplifies code flow in the tls_cycle_input() and makes the incoming data processing similar to that in TCP DNS. In particular, now we decipher all the incoming data before making a single isc__nm_process_sock_buffer() call. Previously we would try to decipher data bit-by-bit before trying to process the deciphered bit via isc__nm_process_sock_buffer(). The previous approach made the code much less predictable, particularly when reading is paused or resumed. The newer approach also allowed us to get rid of some old kludges.	2025-03-24 09:49:38 +02:00
Aram Sargsyan	df373d7d99	Fix memory ordering for operations with quota->used and quota->waiting Change all the non-locked operations on 'quota->used' and 'quota->waiting' to "acq/rel" for inter-thread synchronization. Some loads are left as "relaxed", because they are under a locked mutex which also provides protection.	2025-03-04 09:57:34 +00:00
Aram Sargsyan	80d7d11f37	Use relaxed memory ordering for quota->max and quota->soft These variables are not critical for memory ordering issues and we can use the relaxed memory ordering, as done in the main branch.	2025-03-04 09:57:34 +00:00
Artem Boldariev	94bcd8c253	DoH: Bump the active streams processing limit This commit bumps the total number of active streams (= the opened streams for which a request is received, but response is not ready) to 60% of the total streams limit. The previous limit turned out to be too tight as revealed by longer (≥1h) runs of "stress:long:rpz:doh+udp:linux:*" tests. (cherry picked from commit `eaad0aefe6`)	2025-03-03 12:08:15 +02:00
Artem Boldariev	aa6fd85b0b	DoH: remove obsolete INSIST() check The check, while not active by default, has not been valid since commit `8b8f4d500d`. See the 'if (total == 0) { ...' branch below to understand why. (cherry picked from commit `217a1ebd79`)	2025-03-03 12:07:48 +02:00
Artem Boldariev	d9928ccb62	DoH: Flush HTTP write buffer on an outgoing DNS message Previously, the code would try to avoid sending any data regardless of what it is unless: a) The flush limit is reached; b) There are no sends in flight. This strategy is used to avoid too many send requests with small amounts of data. However, it has been proven to be too aggressive and, in fact, harms performance in some cases (e.g., on longer (≥1h) runs of "stress:long:rpz:doh+udp:linux:*"). Now, in addition to the listed cases, we also: c) Flush the buffer and perform a send operation when there is an outgoing DNS message passed to the code (which is indicated by the presence of a send callback). That helps improve performance for "stress:long:rpz:doh+udp:linux:*" tests. (cherry picked from commit `c5f7968856`)	2025-03-03 12:07:15 +02:00
Artem Boldariev	b4e8089694	DoH: Limit the number of delayed IO processing requests Previously, a function for continuing IO processing on the next UV tick was introduced (http_do_bio_async()). The intention behind this function was to ensure that http_do_bio() is eventually called at least once in the future. However, the current implementation allows queueing multiple such delayed requests needlessly. There is currently no need for these excessive requests as http_do_bio() can requeue them if needed. At the same time, each such request can lead to a memory allocation, particularly in BIND 9.18. This commit ensures that the number of enqueued delayed IO processing requests never exceeds one in order to avoid potentially bombarding IO threads with the delayed requests needlessly. (cherry picked from commit `0e1b02868a`)	2025-03-03 12:06:44 +02:00
Artem Boldariev	e525029b89	DoH: Simplify http_do_bio() This commit significantly simplifies the code flow in the http_do_bio() function, which is responsible for processing incoming and outgoing HTTP/2 data. It seems that the way it was structured before was indirectly caused by the presence of the missing callback calls bug, fixed in `8b8f4d500d`. The change introduced by this commit is known to remove a bottleneck and allows reproducible and measurable performance improvement for long runs (>= 1h) of "stress:long:rpz:doh+udp:linux:*" tests. Additionally, it fixes a similar issue with potentially missing send callback calls processing and hardens the code against use-after-free errors related to the session object (they can potentially occur). (cherry picked from commit `0956fb9b9e`)	2025-03-03 12:06:05 +02:00
Artem Boldariev	66bdddc51a	DoH: http_send_outgoing() return value is not used The value returned by http_send_outgoing() is not used anywhere, so we make it not return anything (void). Probably it is an omission from older times. (cherry picked from commit `2adabe835a`)	2025-02-19 19:42:15 +02:00
Artem Boldariev	0b9e8e6063	DoH: Fix missing send callback calls When handling outgoing data, there were a couple of rarely executed code paths that would not take into account that the callback MUST be called. It could lead to potential memory leaks and consequent shutdown hangs. (cherry picked from commit `8b8f4d500d`)	2025-02-19 19:42:15 +02:00
Artem Boldariev	f9aa7a298d	DoH: change how the active streams number is calculated This commit changes how the number of active HTTP streams is calculated and allows it to scale with the maximum number of streams per connection, instead of effectively capping at STREAM_CLIENTS_PER_CONN. The original limit is intended to define the pipelining limit for TCP/DoT. However, it appeared to be too restrictive for DoH, as it works quite differently and implements pipelining at the protocol level by multiplexing multiple streams. That makes each stream effectively a separate connection from the point of view of the rest of the codebase. (cherry picked from commit `a22bc2d7d4`)	2025-02-19 19:42:15 +02:00
Artem Boldariev	3c49824589	DoH: Track the amount of in flight outgoing data Previously we would limit the amount of incoming data to process based solely on the presence of uncompleted send requests. That worked; however, it was found to severely degrade performance in certain cases, as was revealed during extended testing. Now we switch to keeping track of how much data is in flight (or ready to be in flight) and limit the amount of processed incoming data when the amount of in flight data surpasses the given threshold, similarly to what we do in other transports. (cherry picked from commit `05e8a50818`)	2025-02-19 19:42:15 +02:00
Andoni Duarte Pintado	73997c8161	Merge tag 'v9.18.33' into bind-9.18	2025-01-29 17:23:11 +01:00
Ondřej Surý	d8206a939c	Reduce struct isc__nm_uvreq size from 1560 to 560 bytes The uv_req union member of struct isc__nm_uvreq contained libuv request types that we don't use. Turns out that uv_getnameinfo_t is 1000 bytes big and unnecessarily enlarged the whole structure. Remove all the unused members from the uv_req union.	2025-01-22 14:12:38 +01:00
Ondřej Surý	a7630c2c62	Reduce sizeof isc_sockaddr from 152 to 48 bytes After removing sockaddr_unix from isc_sockaddr, we can also remove sockaddr_storage and reduce the isc_sockaddr size from 152 bytes to just 48 bytes needed to hold IPv6 addresses. (cherry picked from commit `2367b6a2e1`)	2025-01-22 14:12:38 +01:00
Artem Boldariev	550b692343	DoH: reduce excessive bad request logging We started using isc_nm_bad_request() more actively throughout the codebase. In the case of HTTP/2 it can lead to a large number of useless "Bad Request" messages in the BIND log, as we often attempt to send such requests over effectively finished HTTP/2 sessions. This commit fixes that. (cherry picked from commit `937b5f8349`)	2025-01-15 16:50:13 +01:00
Artem Boldariev	796708775d	DoH: introduce manual read timer control This commit introduces manual read timer control as used by StreamDNS and its underlying transports. Before that, DoH code would rely on the timer control provided by TCP, which would reset the timer any time some data arrived. Now, the timer is restarted only when a full DNS message is processed in line with other DNS transports. That change is required because we should not stop the timer when reading from the network is paused due to throttling. We need a way to drop timed-out clients, particularly those who refuse to read the data we send. (cherry picked from commit `609a41517b`)	2025-01-15 16:49:32 +01:00
Artem Boldariev	ee42514be2	DoH: flooding clients detection This commit adds logic to make code better protected against clients that send valid HTTP/2 data that is useless from a DNS server perspective. Firstly, it adds logic that protects against clients who send too little useful (=DNS) data. We achieve that by adding a check that eventually detects such clients with an unfavorable useful to processed data ratio after the initial grace period. The grace period is limited to processing 128 KiB of data, which should be enough for sending the largest possible DNS message in a GET request and then some. This is the main safety belt that would detect even flooding clients that initially behave well in order to fool the server's checks. Secondly, in addition to the above, we introduce additional checks to detect outright misbehaving clients earlier: The code will treat clients that open too many streams (50) without sending any data for processing as flooding ones; The clients that managed to send 1.5 KiB of data without opening a single stream or submitting at least some DNS data will be treated as flooding ones. Of course, the behaviour described above is nothing but heuristic checks, so they can never be perfect. At the same time, they should be reasonable enough not to drop any valid clients, relatively easy to implement, and have negligible computational overhead. (cherry picked from commit `3425e4b1d0`)	2025-01-15 16:49:23 +01:00
Artem Boldariev	11a2956dce	DoH: process data chunk by chunk instead of all at once Initially, our DNS-over-HTTP(S) implementation would try to process as much incoming data from the network as possible. However, that might be undesirable as we might create too many streams (each effectively backed by a ns_client_t object). That is too forgiving as it might overwhelm the server and thrash its memory allocator, causing high CPU and memory usage. Instead of doing that, we resort to processing incoming data using a chunk-by-chunk processing strategy. That is, we split data into small chunks (currently 256 bytes) and process each of them asynchronously. However, we can process more than one chunk at once (up to 4 currently), given that the number of HTTP/2 streams has not increased while processing a chunk. That alone is not enough, though. In addition to the above, we should limit the number of active streams: those streams for which we have received a request and started processing it (the ones for which a read callback was called), as it is perfectly fine to have more opened streams than active ones. In the case we have reached or surpassed the limit of active streams, we stop reading AND processing the data from the remote peer. The number of active streams is effectively decreased only when responses associated with the active streams are sent to the remote peer. Overall, this strategy is very similar to the one used for other stream-based DNS transports like TCP and TLS. (cherry picked from commit `9846f395ad`)	2025-01-15 16:47:21 +01:00
Artem Boldariev	125bfd71d3	Add isc__nm_async_run() This commit adds isc__nm_async_run() which is very similar to isc_async_run() in newer versions of BIND: it allows calling a callback asynchronously. Potentially, it can be used to replace some other async operations in other networking code, in particular the delayed I/O calls in TLS and TCP DNS transports to name a few, and remove quite a lot of code, but we are unlikely to do that for the strictly maintenance only branch, so it is protected with DoH-related #ifdefs. It is implemented in a "universal" way mainly because doing it in the specific code requires the same amount of code and is not simpler.	2025-01-15 16:43:47 +01:00
Artem Boldariev	13d521fa5f	Implement TLS manual read timer control functionality This commit adds a manual TLS read timer control mode which is supposed to override automatic resetting of the timer when any data is received. It both depends on and complements similar functionality in TCP.	2025-01-15 15:34:43 +00:00
Artem Boldariev	a67b325542	Implement TCP manual read timer control functionality This commit adds a manual TCP read timer control mode which is supposed to override automatic resetting of the timer when any data is received. That can be accomplished by `isc__nmhandle_set_manual_timer()`. This functionality is supposed to be used by multilevel networking transports which require finer grained control over the read timer (TLS Stream, DoH). The commit is essentially an implementation of the functionality from newer versions of BIND.	2025-01-15 15:34:43 +00:00
Aram Sargsyan	73b6d9e9e5	Fix a bug in isc_rwlock_trylock() When isc_rwlock_trylock() fails to get a read lock because another writer was faster, it should wake up other waiting writers in case there are no other readers, but the current code forgets about the currently active writer when evaluating 'cntflag'. Unset the WRITER_ACTIVE bit in 'cntflag' before checking to see if there are other readers, otherwise the waiting writers, if they exist, might not wake up.	2025-01-07 13:30:26 +00:00
Ondřej Surý	43f7642e5d	Update picohttpparser.{c,h} with upstream repository Upstream code doesn't do regular releases, so we need to regularly sync the code from the upstream repository. This is synchronization up to the commit f8d0513 from Jan 29, 2024. (cherry picked from commit `d14a76e115`)	2024-12-08 12:30:11 +00:00
Matthijs Mekking	a0ce89bc15	Implement global limit for outgoing queries This global limit is not reset on query restarts and is a hard limit for any client request. Note: This commit has been significantly modified because of many merge conflicts due to the dns_resolver_createfetch api changes. (cherry picked from commit `16b3bd1cc7`)	2024-12-06 15:17:53 +00:00
Matthijs Mekking	3d0559621b	Implement getter function for counter limit (cherry picked from commit `ca7d487357`)	2024-12-06 15:17:53 +00:00
Matthijs Mekking	90fbe91997	Fix nsupdate hang when processing a large update The root cause is the fix for CVE-2024-0760 (part 3), which resets the TCP connection on a failed send. Specifically commit `4b7c6138` stops reading on the socket because the TCP connection is throttling. When the tcpdns_send_cb callback thinks about restarting reading on the socket, this fails because the socket is a client socket. And nsupdate is a client and is using the same netmgr code. This commit removes the requirement that the socket must be a server socket, allowing reading on the socket again after being throttled. (manually picked from commit `aa24b77d8b`)	2024-12-06 09:26:40 +00:00
Ondřej Surý	4fbdad515c	Move contributed DLZ modules into a separate repository The DLZ modules are poorly maintained as we only ensure they can still be compiled, the DLZ interface is blocking, so anything that blocks the query to the database blocks the whole server and they should not be used except in testing. The DLZ interface itself should be scheduled for removal. (cherry picked from commit `a6cce753e2`)	2024-11-26 16:24:35 +01:00
Mark Andrews	6fc76a1e87	Provide more visibility into configuration errors by logging SSL_CTX_use_certificate_chain_file and SSL_CTX_use_PrivateKey_file errors (cherry picked from commit `9006839ed7`)	2024-11-26 12:24:41 +11:00
Ondřej Surý	c5bac96fd0	Remove redundant parentheses from the return statement (cherry picked from commit `0258850f20`)	2024-11-19 16:06:16 +01:00
Petr Menšík	e5ffa52c6d	Remove unused <openssl/{hmac,engine}.h> headers from OpenSSL shims The <openssl/{hmac,engine}.h> headers were unused and including the <openssl/engine.h> header might cause build failure when OpenSSL doesn't have Engines support enabled. See https://fedoraproject.org/wiki/Changes/OpensslDeprecateEngine (cherry picked from commit `75a50925f7`)	2024-10-18 01:29:27 +00:00
Ondřej Surý	7ad2d6e986	Don't enable SO_REUSEADDR on outgoing UDP sockets Currently, the outgoing UDP sockets have enabled SO_REUSEADDR (SO_REUSEPORT on BSDs) which allows multiple UDP sockets to bind to the same address+port. There's one caveat though - only a single (the last one) socket is going to receive all the incoming traffic. This in turn could lead to incoming DNS message matching to invalid dns_dispatch and getting dropped. Disable setting the SO_REUSEADDR on the outgoing UDP sockets. This needs to be done explicitly because `uv_udp_open()` silently enables the option on the socket. (cherry picked from commit `eec30c33c2`)	2024-10-02 15:20:28 +02:00
Ondřej Surý	5bac885ace	Use release memory ordering when incrementing reference counter As the relaxed memory ordering doesn't ensure any memory synchronization, it is possible that the increment will succeed even in the case when it should not - there is a race between atomic_fetch_sub(..., acq_rel) and atomic_fetch_add(..., relaxed). Only the result is consistent, but the previous value for both calls could be same when both calls are executed at the same time. (cherry picked from commit `88227ea665`)	2024-10-02 09:09:03 +02:00
Nicki Křížek	50221d6ff1	Update code formatting clang 19 was updated in the base image. (cherry picked from commit `ebb5bd9c0f`)	2024-09-21 07:20:11 +00:00
alessio	01e3567243	Do not set SO_INCOMING_CPU We currently set SO_INCOMING_CPU incorrectly, and testing by Ondrej shows that fixing the issue and setting affinities is worse than letting the kernel schedule threads without constraints. So we should not set SO_INCOMING_CPU anymore. (cherry picked from commit `8b8149cdd2`)	2024-09-19 16:40:59 +02:00
Ondřej Surý	3012a97d58	Limit the outgoing UDP send queue size If the operating system UDP queue gets full and the outgoing UDP sending starts to be delayed, BIND 9 could exhibit memory spikes as it tries to enqueue all the outgoing UDP messages. As those are not going to be delivered anyway (as we argued when we stopped enlarging the operating system send and receive buffers), try to send the UDP messages directly using `uv_udp_try_send()` and if that fails, drop the outgoing UDP message. (cherry picked from commit `b576c4c977`)	2024-09-17 16:20:00 +02:00
Michal Nowak	fe8d6023e0	Update code formatting clang 19 was updated in the base image. (cherry picked from commit `ff69d07f`)	2024-09-11 11:47:10 +02:00
Ondřej Surý	c8f1fa0e47	Follow the number of CPU set by taskset/cpuset Administrators may wish to constrain the set of cores that BIND 9 runs on via the 'taskset', 'cpuset' or 'numactl' programs (or equivalent on other O/S), for example to achieve higher (or more stable) performance by more closely associating threads with individual NIC rx queues. If the admin has used taskset, it follows that BIND ought to automatically use the given number of CPUs rather than the system wide count. Co-Authored-By: Ray Bellis <ray@isc.org> (cherry picked from commit `5a2df8caf5`)	2024-09-03 14:54:40 +02:00
Ondřej Surý	015b390f62	Stop using malloc_usable_size and malloc_size Although the manual page of malloc_usable_size says: "Although the excess bytes can be overwritten by the application without ill effects, this is not good programming practice: the number of excess bytes in an allocation depends on the underlying implementation." it looks like the premise is broken with _FORTIFY_SOURCE=3 on newer systems and it might return a value that causes the program to stop with "buffer overflow" detected from the _FORTIFY_SOURCE. As we do have our own implementation that tracks the allocation size, we can stop relying on this introspection function. Also the newer manual page for malloc_usable_size changed the NOTES to: "The value returned by malloc_usable_size() may be greater than the requested size of the allocation because of various internal implementation details, none of which the programmer should rely on. This function is intended to only be used for diagnostics and statistics; writing to the excess memory without first calling realloc(3) to resize the allocation is not supported. The returned value is only valid at the time of the call." Remove usage of both malloc_usable_size() and malloc_size() to be on the safe side and only use the internal size tracking mechanism when jemalloc is not available. (cherry picked from commit `d61712d14e`)	2024-08-27 04:49:55 +02:00
Mark Andrews	b73a385696	Define ISC_ATTR_UNUSED macro for __attribute__((__unused__)) The ISC_ATTR_UNUSED macro was missing in BIND 9.18, which complicated things when backporting merge requests from main. As __attribute__((__unused__)) is ubiquitous, just define the macro.	2024-08-27 04:49:55 +02:00
Michal Nowak	b5caae0633	Use clang-format-19 to update formatting	2024-08-22 10:25:22 +02:00
Evan Hunt	a1b2c85d84	ensure fd is non-negative before calling dup() this silences a spurious warning from clang-scan 19.	2024-08-21 21:37:51 -07:00
Ondřej Surý	a49079c84c	Change the NS_PER_SEC (and friends) from enum to static const The new version of clang (19) has introduced stricter checks when mixing integer (and float) types with enums. In this case, we used enum {} as C17 doesn't have constexpr yet. Change the time conversion constants to be #defined constants because the RHEL 8 compiler doesn't consider static const unsigned int to be constant. (cherry picked from commit `b03e90e0d4`)	2024-08-19 15:32:03 +00:00
Ondřej Surý	e08d3a7932	Check the result of dirfd() before calling unlinkat() Instead of directly using the result of dirfd() in the unlinkat() call, check whether the returned file descriptor is actually valid. That doesn't really change the logic as the unlinkat() would fail with invalid descriptor anyway, but this is cleaner and will report the right error returned directly by dirfd() instead of EBADF from unlinkat(). (cherry picked from commit `59f4fdebc0`)	2024-08-19 11:23:05 +00:00
Ondřej Surý	bd8a1abc80	Remove code to read and parse /proc/net/if_inet6 on Linux The getifaddrs() function has worked fine for years, so we don't have to keep the callback to parse /proc/net/if_inet6 anymore. (cherry picked from commit `2fbf9757b8`)	2024-08-19 09:46:07 +00:00
Ondřej Surý	e707ee0946	Ignore errno returned from rewind() in the interface iterator The clang-scan 19 has reported that we are ignoring errno after the call to rewind(). As we don't really care about the result, just silence the error, the whole code will be removed in the development version anyway as it is not needed. (cherry picked from commit `dda5ba53df`)	2024-08-19 09:46:07 +00:00
Ondřej Surý	acabe271c5	Disassociate the SSL object from the cached SSL_SESSION When the SSL object was destroyed, it would invalidate all SSL_SESSION objects including the cached, but not yet used, TLS session objects. Properly disassociate the SSL object from the SSL_SESSION before we store it in the TLS session cache, so we can later destroy it without invalidating the cached TLS sessions. Co-authored-by: Ondřej Surý <ondrej@isc.org> Co-authored-by: Artem Boldariev <artem@isc.org> Co-authored-by: Aram Sargsyan <aram@isc.org> (cherry picked from commit `c11b736e44`)	2024-08-07 16:01:03 +00:00
Ondřej Surý	875755d9ea	Attach/detach to the listening child socket when accepting TLS When a TLS connection (TLSstream) was accepted, the children listening socket was not attached to sock->server and thus it could have been freed before all the accepted connections were actually closed. In turn, this would cause us to call isc_tls_free() too soon, causing cascading errors in pending SSL_read_ex() calls in the accepted connections. Properly attach and detach the children listening socket when accepting and closing the server connections. (cherry picked from commit `684f3eb8e6`)	2024-08-07 17:20:03 +02:00
Ondřej Surý	9615f5b348	Don't loop indefinitely when isc_task quantum is 'unlimited' Don't run more events than already scheduled. If the quantum is set to a high value, task_run() would execute the already scheduled events and all new events that result from running event->ev_action(). Setting the quantum to the number of scheduled events will postpone events scheduled after we enter the loop here to the next task_run() invocation.	2024-08-07 08:27:15 +02:00
Ondřej Surý	236de53c52	Use EXIT_SUCCESS and EXIT_FAILURE Instead of randomly using -1 or 1 as a failure status, properly utilize the EXIT_FAILURE define that's platform specific (as it should be). (cherry picked from commit `76997983fde02d9c32aa23bda30b65f1ebd4178c`)	2024-08-06 15:19:06 +02:00
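
The flooding-client heuristics described in commit ee42514be2 can be illustrated with a minimal sketch. This is not the BIND 9 implementation: the struct, the function name `is_flooding()`, and the field names are hypothetical. The 128 KiB grace period, the 50-stream limit, and the 1.5 KiB limit come from the commit message; the minimum useful-data ratio (`MIN_USEFUL_PERCENT`) is not stated there and is an assumed value, as is the exact reading of "too many streams (50)" as "50 or more".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Thresholds quoted in the commit message. */
#define GRACE_PERIOD_BYTES       ((size_t)128 * 1024) /* 128 KiB */
#define MAX_BYTES_WITHOUT_STREAM ((size_t)1536)       /* 1.5 KiB */
#define MAX_IDLE_STREAMS         ((size_t)50)

/* ASSUMPTION: the commit message does not give the ratio; 10% is made up. */
#define MIN_USEFUL_PERCENT       ((size_t)10)

/* Hypothetical per-session counters, not the real BIND 9 structure. */
typedef struct {
    size_t processed_bytes; /* all HTTP/2 data processed so far */
    size_t useful_bytes;    /* the part of it that was DNS data */
    size_t opened_streams;  /* streams opened by the client */
} session_stats_t;

/* Returns true when the session looks like a flooding client. */
bool
is_flooding(const session_stats_t *s) {
    assert(s != NULL);

    /* Early check 1: many streams opened, but no data for processing. */
    if (s->useful_bytes == 0 && s->opened_streams >= MAX_IDLE_STREAMS) {
        return true;
    }

    /*
     * Early check 2: 1.5 KiB received without a stream being opened or
     * any DNS data being submitted (one possible reading of the text).
     */
    if (s->processed_bytes >= MAX_BYTES_WITHOUT_STREAM &&
        (s->opened_streams == 0 || s->useful_bytes == 0))
    {
        return true;
    }

    /* Main check: after the grace period, require a minimum useful ratio. */
    if (s->processed_bytes > GRACE_PERIOD_BYTES &&
        s->useful_bytes * 100 < s->processed_bytes * MIN_USEFUL_PERCENT)
    {
        return true;
    }

    return false;
}
```

The early checks run before the ratio check so that outright misbehaving clients are rejected without waiting for the grace period to expire, which matches the ordering the commit message describes.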


4601 commits