9161 commits

Author SHA1 Message Date
Olivier Houchard
b9aa1c0e64 MEDIUM: tasks: Redispatch shared tasks when the thread is loaded
Now that there is no longer a shared wake queue, chances are that if a shared
task is scheduled, it will always end up on the same thread. In
wake_expired_tasks(), when a task has to be woken up, randomly look at
three other threads, and if the runqueue of the current thread is at least
two times bigger than the runqueue of one of the other threads, then give
that task to that thread, so that our load gets reduced.
If we're giving the task to another thread, then we have to add the
TASK_RUNNING flag until we have woken it up, otherwise the other thread could
just run it, if it gets woken up from another path, and free it while
we're still not done with it.
The factor of 2 has been chosen somewhat arbitrarily, and may be tweaked at a
later date if deemed not optimal.
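As a rough illustration of the redispatch rule described in this message, the
following sketch picks a target among a few candidate threads. The function
name, the run-queue length array and the candidate list are all assumptions
made for the example (the real code samples threads randomly from within
wake_expired_tasks()), and the empty-queue guard is an addition of the sketch,
not something the commit message states.

```c
#include <stddef.h>

/* Hypothetical sketch, not HAProxy code. <rq_len> holds one run-queue length
 * per thread, <cur> is the calling thread and <cand> lists the threads that
 * were picked at random (three of them in the commit message). Returns the
 * thread that should receive the task, which is <cur> when no candidate is
 * sufficiently less loaded.
 */
static int pick_redispatch_target(const unsigned int *rq_len, int cur,
                                  const int *cand, size_t nb_cand)
{
	size_t i;

	/* assumption of this sketch: an idle thread never gives work away */
	if (rq_len[cur] == 0)
		return cur;

	for (i = 0; i < nb_cand; i++) {
		int thr = cand[i];

		if (thr == cur)
			continue;
		/* our run queue is at least twice as long as the candidate's */
		if (rq_len[cur] >= 2 * rq_len[thr])
			return thr;
	}
	return cur;
}
```

With run queues of 10, 6, 5 and 9 entries and thread 0 as the caller, only
thread 2 satisfies 10 >= 2 * 5, so it would be the one receiving the task.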
2026-06-12 11:49:09 +02:00
Olivier Houchard
aaee6c463c MINOR: tasks: Remove wq_lock and the per-thread group wait queues
Now that they are no longer used, remove wq_lock and the per-thread
group wait queues.
2026-06-12 11:49:09 +02:00
Olivier Houchard
caa1cd0674 MINOR: tasks: Use __task_set_state_and_tid() in task_instant_wakeup()
Modify task_instant_wakeup() to use __task_set_state_and_tid().
It uses the new ownership behavior, but that's okay because
task_instant_wakeup() was not used anywhere.
2026-06-12 11:49:09 +02:00
Olivier Houchard
0988b9c773 MEDIUM: tasks: Remove the per-thread group wait queue
Totally remove the per-thread group wait queue. This was potentially a
source of contention, because there was only one global lock for all
those wait queues.
Instead, for shared tasks, there is now the concept of ownership for the
task. When a task is in the wait queue, run queue, or is running on a
particular thread, the task's tid is set to -2 - thread_tid, and only
that thread will be responsible for it until it is no longer running,
and in none of its queues.
When a shared task is scheduled to be run at a later time, if its
current tid is -1, then the current thread will take ownership, and put
it in its own wait queue. If it is already owned, then TASK_WOKEN_WQ is
added to the task's state, and a task_wakeup() is done, so that the
owner thread will add it in its wait queue.
If there is any owner, then a task_wakeup() will just add the task to
the owner's runqueue, otherwise the current thread will become the
owner.
2026-06-12 11:49:09 +02:00
Olivier Houchard
c9f3ddcb1e MINOR: tasks: Start using __task_set_state_and_tid()
Start using __task_set_state_and_tid() when we're changing the state of
the task while queueing it, in preparation for the future ownership
changes.
2026-06-12 11:49:09 +02:00
Olivier Houchard
74b16c5477 MINOR: tasks: Introduce __task_get_current_owner
Introduce a new function, __task_get_current_owner, that returns the
owner of a task based on its current tid.
-1 means there is no current owner, otherwise either the tid is >= 0, in
which case it will just return it, or it's < -1, in which case it will
return -2 - tid, the tid of the thread with the current ownership.
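The tid encoding described in this message can be sketched with two small
helpers. Both names are made up for the illustration and only mirror the
arithmetic stated in the commit messages; they are not the actual
__task_get_current_owner() or __task_get_new_tid_field() implementations.

```c
/* Hypothetical helpers mirroring the tid encoding described above. */

/* tid value marking temporary ownership of a shared task by thread <thr> */
static inline int shared_task_owned_tid(int thr)
{
	return -2 - thr;
}

/* Returns the owning thread for a tid field, or -1 when there is no owner:
 *   tid == -1  -> no current owner
 *   tid >= 0   -> returned as-is
 *   tid <  -1  -> shared task temporarily owned by thread (-2 - tid)
 */
static inline int task_owner_from_tid(int tid)
{
	if (tid >= -1)
		return tid;
	return -2 - tid;
}
```

The mapping is its own inverse for owned shared tasks: thread 5 is stored as
-7, and -7 decodes back to 5.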
2026-06-12 11:49:09 +02:00
Olivier Houchard
8b6d8f5e4f MINOR: tasks: Add __task_get_new_tid_field()
Introduce __task_get_new_tid_field(), that provides the tid to be used
for a task.
For shared tasks, to mark temporary ownership of a task, instead of -1,
the tid will be set to -2-tid, tid being the tid of the current thread.
2026-06-12 11:49:09 +02:00
Olivier Houchard
91f9e3a3dd MINOR: tasks: Introduce __task_set_state_and_tid
Introduce a new function, __task_set_state_and_tid, that can atomically
set a task's state and its tid. This will be used later, as the tid will
be used to indicate task ownership even for shared tasks.
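One possible way to update two fields atomically is to pack them into a single
word and use a compare-and-swap loop. The sketch below does that with C11
atomics. The 64-bit layout, the helper names and the "OR flags into the state"
semantics are all assumptions of the example; the commit message does not say
how __task_set_state_and_tid is actually implemented.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout: tid in the upper 32 bits, state flags in the lower 32 */
static inline uint64_t st_pack(uint32_t state, int32_t tid)
{
	return ((uint64_t)(uint32_t)tid << 32) | state;
}

static inline uint32_t st_state(uint64_t word)
{
	return (uint32_t)word;
}

static inline int32_t st_tid(uint64_t word)
{
	/* converting values above INT32_MAX is implementation-defined in C;
	 * it wraps to a negative tid on the usual two's complement compilers
	 */
	return (int32_t)(uint32_t)(word >> 32);
}

/* Atomically ORs <set> into the state and replaces the tid with <tid>.
 * Returns the previous word so that the caller can inspect the old owner.
 */
static inline uint64_t st_set_state_and_tid(_Atomic uint64_t *word,
                                            uint32_t set, int32_t tid)
{
	uint64_t old = atomic_load(word);

	/* on failure <old> is refreshed with the current value and we retry */
	while (!atomic_compare_exchange_weak(word, &old,
	                                     st_pack(st_state(old) | set, tid)))
		;
	return old;
}
```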
2026-06-12 11:49:09 +02:00
William Lallemand
4bb21dae2f MINOR: acme: publish ACME_DEPLOY event via event_hdl
Add EVENT_HDL_SUB_ACME_DEPLOY to the ACME family. It is published in
the dns-01 challenge path after the TXT record information has been
prepared, carrying the certificate store name, domain, account
thumbprint, dns_record value, and optionally the provider and vars
strings.

Lua subscribers using core.event_sub() receive the event data as an
AcmeEvent object, which is the same class used for ACME_NEWCERT and
carries the fields relevant to the event type.
2026-06-11 19:14:52 +02:00
William Lallemand
81d7624e01 MINOR: acme: publish ACME_NEWCERT event via event_hdl
Add a new EVENT_HDL_SUB_ACME_NEWCERT event type in the ACME family.
It is published after a new certificate has been successfully fetched
and installed. The event carries the certificate store name, allowing
subscribers to act on newly available certificates.

Lua subscribers using core.event_sub() receive the event data as an
AcmeEvent object with a crtname field containing the certificate store
name.
2026-06-11 19:14:52 +02:00
Willy Tarreau
7d63efa5f5 MINOR: errors: add ha_diag_notice() to report diag-level notifications
Right now the only way to report info that is only displayed in diag
mode with -dD is to use ha_diag_warning(). The problem is that this is
then counted as a warning and may result in errors when combined with
-dW, as happens for the CPU topology info:

  $ printf "global\nstats socket /tmp/sock1\n" | ./haproxy -dD -dW -c -f /dev/stdin; echo $?
  [NOTICE]   (10406) : haproxy version is 3.5-dev0-5091ac-35
  [NOTICE]   (10406) : path to executable is ./haproxy
  [DIAG]     (10406) : Created 20 threads split into 2 groups
  [ALERT]    (10406) : Some warnings were found and 'zero-warning' is set. Aborting.
  1

We need another level. This commit introduces ha_diag_notice() which only
emits a notification that doesn't count as a warning. Note that we could
even introduce an info level and revisit various messages so that notice
only reports certain events while info is for anything (like versions
above). That could be a future improvement.
2026-06-11 18:48:59 +02:00
Karol Kucharski
96b08e959c BUG/MEDIUM: ktls: defer enabling TLS ULP on a socket until connected
The Linux tls module requires a socket to be in TCP_ESTABLISHED state
before we can enable the TLS ULP on the socket, if the socket is in any
other state, then the setsockopt() call will fail, and we won't use
kTLS on that socket.
To make sure we're not doing it too early, defer it until the TLS
handshake is done, which means the TCP connection is established.

This should be backported up to 3.3.

Signed-off-by: Karol Kucharski <kkucharski@fastlogic.pl>
2026-06-11 14:18:31 +02:00
William Lallemand
5c0733db9a MEDIUM: lua: move longjmp annotation macros to hlua.h
__LJMP, WILL_LJMP() and MAY_LJMP() were defined locally in hlua.c,
making them unavailable to other modules that implement Lua bindings.
Move them to include/haproxy/hlua.h so they can be used outside of
hlua.c.
2026-06-11 14:40:27 +02:00
William Lallemand
d0fde90e16 MINOR: lua: add REGISTER_HLUA_STATE_INIT() to register state init callbacks
Add a registration mechanism so that modules outside of hlua.c can hook
into each lua_State creation. Modules call hap_register_hlua_state_init()
(or the REGISTER_HLUA_STATE_INIT() macro) with a callback of the form:

  int my_init(lua_State *L, char **errmsg);

The callback returns an ERR_* code. ERR_ALERT and ERR_WARN trigger
ha_alert()/ha_warning() respectively; any other non-zero errmsg is
emitted via ha_notice(). ERR_FATAL or ERR_ABORT cause exit(1).
Registered entries are freed in hlua_deinit().
2026-06-11 14:13:04 +02:00
William Lallemand
9e60d35aaf MINOR: acme: introduce acme_challenge_ready() for reuse outside the CLI
Extract the challenge-readiness logic from cli_acme_chall_ready_parse()
into a new acme_challenge_ready(crt, dns) function so it can be called
from other contexts such as Lua event handlers.

It slightly changes the messages on the CLI.
2026-06-11 11:33:27 +02:00
Olivier Houchard
3c923d075c MEDIUM: servers: Move to a per-thread idle connection cleanup task
Having a single task to take care of idle connection cleanup across all
servers leads to high contention. It uses a lock to maintain its tree of
servers to track, and then can acquire the idle_conns lock for each thread.
Instead, have one task per thread. Each thread will maintain its own
tree, so there will be no need for any lock, and it will just acquire
its own idle_conns lock, so it will lead to less contention.
This is a performance improvement, so backporting is optional, but may be
considered if it is worth it. That would require backporting commit
6f8dab2583 too.
2026-06-08 15:38:22 +02:00
Olivier Houchard
6f8dab2583 MINOR: servers: Add a back-pointer to the server in srv_per_thread
In struct srv_per_thread, add a pointer to the server, as with just a
pointer to srv_per_thread, we can't figure out the related server.
2026-06-08 15:37:50 +02:00
Willy Tarreau
7835e1fcbe [RELEASE] Released version 3.5-dev0
Released version 3.5-dev0 with the following main changes :
    - MINOR: version: mention that it's development again
2026-06-03 15:26:45 +02:00
Willy Tarreau
02f0101cde MINOR: version: mention that it's development again
This essentially reverts 1cf7dc07e9.
2026-06-03 15:25:53 +02:00
Willy Tarreau
1cf7dc07e9 MINOR: version: mention that it's 3.4 LTS now.
The version will be maintained up to around Q2 2031. Let's
also update the INSTALL file to mention this.
2026-06-03 15:00:25 +02:00
Willy Tarreau
b794190262 BUG/MEDIUM: chunk: do not rely on small trash by default for expressions
There's a corner case with get_trash_chunk_sz() combined with the use
of small bufs: if some incoming data is going to be inflated by a
converter in a non-predictable way (say url_enc etc) then there are
two possibilities:
  - either we try to allocate a size that corresponds to the data, but
    we risk allocating a small buf to convert a 900B chunk, that will
    now fail if it contains too many non-printable chars;
  - or we try to allocate 3x the size to be conservative, but without
    large bufs we'd fail to transcode any chunk larger than 5.3kB, even
    if it contains only printable chars.

The approach should definitely be refined and it is not 100% reliable
for now. Better temporarily ignore the small buffers for these particular
cases where the savings are not relevant, and see how to pass the knowledge
of the expected size ranges deeper down the API in 3.5. We may possibly rely
on the current trash size (instead of contents) or other mechanisms that
are yet to be specified. alloc_small_trash_chunk() gets the same change
BTW for the same reasons.

The comment for get_trash_chunk_sz() was updated to restate the importance
of being conservative when requesting a size.

No backport is needed.
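The sizing dilemma comes from converters whose output length depends on the
input's contents. The following sketch is not HAProxy's url_enc converter; it
is a deliberately simple percent-encoder (the function name and the choice of
leaving only alphanumeric characters untouched are assumptions) that shows the
two extremes: alphanumeric input keeps its size, while any other byte triples.
Assuming a 16384-byte buffer, a conservative 3x reservation caps the input at
16384 / 3 = 5461 bytes, which is consistent with the 5.3kB figure quoted above.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical percent-encoder used only to illustrate output inflation.
 * Alphanumeric bytes are copied as-is, every other byte becomes "%XX".
 * Returns the number of bytes written to <out>, or -1 if <size> is too small.
 */
static long pct_encode(const unsigned char *in, size_t len,
                       char *out, size_t size)
{
	static const char hex[] = "0123456789ABCDEF";
	size_t i, o = 0;

	for (i = 0; i < len; i++) {
		if (isalnum(in[i])) {
			if (o + 1 > size)
				return -1;
			out[o++] = (char)in[i];
		} else {
			if (o + 3 > size)
				return -1;
			out[o++] = '%';
			out[o++] = hex[in[i] >> 4];
			out[o++] = hex[in[i] & 15];
		}
	}
	return (long)o;
}
```

Three letters encode to 3 bytes while three control characters encode to 9,
so a buffer sized after the input length alone may or may not be enough.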
2026-06-03 14:45:54 +02:00
Christopher Faulet
1b4255a885 BUG/MEDIUM: channel: Fix condition to know if a channel may send
Historically, we considered that a channel could not send before the
connection was established. This was useful to know if the reserve should
still be respected for the receives. This was because it was possible to
rewrite the request on connection retry (because of http-send-name-header
option).

However nowadays, it is a useless limitation. Once data forwarding is
started, there are no longer rewrites on the request at the stream layer
(http-send-name-header option is handled by the muxes). And, since it is
possible to use small buffers to queue requests, it could be an issue,
because the reserve and the small buffer size are the same by default. Once
a small request was finally dequeued, the receives on client side were not
re-armed because we should still respect the reserve on receives
(channel_recv_limit() was returning 0 in that case).

To fix the issue, we must consider a channel may send since the underlying
stconn has reached the SC_ST_REQ state, instead of SC_ST_EST. Doing so, we
are able to ignore the reserve earlier and the receives can be re-armed even
with small buffers.

There is no reason to backport this patch, except if an issue is reported,
because only 3.4 is concerned. But it could theoretically be backported to
all stable versions.
2026-06-03 12:05:56 +02:00
Willy Tarreau
030a2bfeeb MINOR: debug: add -dA to dump an archive of all dependencies
This adds "-dA[file]" on the command line, which dumps an archive of all
dependencies detected at runtime into the designated file in tar format.
This is equivalent to "set-dumpable libs", but instead of keeping the libs
in memory, it dumps them into a file. This may be used after a core dump,
in order to provide all necessary libraries to developers to permit them
to exploit the core. This may not be available on all operating systems.
2026-06-01 15:01:32 +02:00
Willy Tarreau
f8fd6d25d8 MINOR: deinit: release the in-memory copy of shared libs
When shared libs were loaded via "set-dumpable libs", better release
them upon deinit, it will make valgrind happier. For this we now have
a new function free_collected_libs() in tools.c and call it in deinit().
2026-06-01 15:01:32 +02:00
Amaury Denoyelle
c989d9da6d CLEANUP: fix comment typo
Fix comment for H3_UNI_S_T_CTRL used for unidirectional streams.
2026-06-01 09:55:14 +02:00
Olivier Houchard
004ad29bb2 MINOR: quic: Copy sin6_flowinfo and sin6_scope_id too
In in46un_to_addr(), when copying a struct sockaddr_in6, copy the
sin6_flowinfo and sin6_scope_id, as they are part of the structure too.
They are unlikely to be of any use for us, but this is more correct
anyway.
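For illustration, a field-by-field copy that includes the two extra members
could look like the sketch below. The helper name is hypothetical and this is
not the body of in46un_to_addr(); it only shows which members of struct
sockaddr_in6 are involved.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical helper: copies every standard member of an IPv6 socket
 * address, including sin6_flowinfo and sin6_scope_id.
 */
static void copy_sockaddr_in6(struct sockaddr_in6 *dst,
                              const struct sockaddr_in6 *src)
{
	memset(dst, 0, sizeof(*dst));
	dst->sin6_family   = AF_INET6;
	dst->sin6_port     = src->sin6_port;
	dst->sin6_addr     = src->sin6_addr;
	dst->sin6_flowinfo = src->sin6_flowinfo;
	dst->sin6_scope_id = src->sin6_scope_id;
}
```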
2026-05-29 15:36:47 +02:00
Frederic Lecaille
54633f078c Revert "MEDIUM: quic: optimize HKDF operations by reusing per-thread contexts"
This reverts commit 4e0af590e8.
This patch does not work at all with AWSLC! This is incredible!

No need to backport.
2026-05-28 18:15:19 +02:00
Frederic Lecaille
4e0af590e8 MEDIUM: quic: optimize HKDF operations by reusing per-thread contexts
Allocating and freeing an OpenSSL EVP_PKEY_CTX context via
EVP_PKEY_CTX_new_id() and EVP_PKEY_CTX_free() on every HKDF cryptographic
operation (such as during stateless reset token generation) induces
unnecessary memory allocation overhead.

Optimize this by introducing a global per-thread context array
'quic_tls_hkdf_ctxs'. These contexts are allocated and initialized once
at startup via a POST_CHECK hook (quic_tls_alloc_hkdf_ctxs) and are
properly freed at exit via a POST_DEINIT hook (quic_tls_dealloc_hkdf_ctxs).

The functions quic_hkdf_extract(), quic_hkdf_expand(), and
quic_hkdf_extract_and_expand() now reuse the pre-allocated context
corresponding to the current thread ID ('tid'), removing dynamic
allocations from these frequent execution paths.

As a cleanup, quic_hkdf_expand() is now static and unexported from the
header file.

Should be easily backported to all versions for optimization purposes.
2026-05-28 17:47:31 +02:00
Amaury Denoyelle
fb828a4711 MINOR: mux_quic/flags: add missing flags
Add missing mux QUIC values for the dev flags utility, both for qcc and
qcs types.
2026-05-28 17:36:05 +02:00
Frederic Lecaille
7ad81403d0 CLEANUP: qpack: move encoded macros to qpack-t.h to avoid duplication
QPACK_LFL_WLN_BIT and related encoded field line bitmasks were defined
in both qpack-enc.c and qpack-dec.c. Moved them to qpack-t.h where
they are shared between encoder and decoder, eliminating the duplicate
definitions.

Should be backported to ease any further commit to come.
2026-05-27 18:40:53 +02:00
Frederic Lecaille
40313cd0d5 BUG/MINOR: qpack: Fix index calculation in debug functions
Although qpack_idx_to_name and qpack_idx_to_value are currently only
called within uncompiled debug code, they contained an index bug. They
passed absolute indexes directly to qpack_get_dte instead of relative
dynamic table indexes.

This patch fixes the logic by subtracting QPACK_SHT_SIZE and guarding
against static table index lookups.
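The index translation can be sketched as follows. The helper name is
hypothetical, and the constant is defined locally under the assumption that
the static table holds 99 entries, as listed in RFC 9204 Appendix A; the
actual value of QPACK_SHT_SIZE is not given in this message.

```c
/* Assumption: 99 static table entries, as in RFC 9204 Appendix A */
#define SKETCH_QPACK_SHT_SIZE 99

/* Hypothetical helper: converts an absolute index into an index relative to
 * the dynamic table. Returns -1 when <idx> designates a static table entry,
 * in which case no dynamic table lookup must be attempted.
 */
static int qpack_abs_to_dyn_idx(unsigned int idx)
{
	if (idx < SKETCH_QPACK_SHT_SIZE)
		return -1;
	return (int)(idx - SKETCH_QPACK_SHT_SIZE);
}
```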

Should be easily backported to all versions.
2026-05-27 18:40:53 +02:00
Christopher Faulet
3843f48faf BUG/MEDIUM: h1-htx: Sanitize parsing to properly handle upgrade requests
Thanks to previous patches, the request messages are now sanitized to
properly handle Upgrade requests. Now, if a 'connection: upgrade' header
value was found but no 'Upgrade' header, the 'upgrade' value is removed
from the 'connection' header. The opposite case is also handled: if an
'Upgrade' header was found, but no 'connection: upgrade' header value, all
occurrences of the 'Upgrade' header are refused.

This patch depends on the following ones:
  * MINOR: h1: Add a H1M flag to specify a non-empty 'Upgrade:' header was parsed
  * MINOR: http: Add function to remove all occurrences of a value in a header

It should fix the issue 3397. But the H2 part should be reviewed too, and
probably the H1 response parsing, to be consistent with this change.

The series should be backported as far as 2.4.
2026-05-26 18:28:07 +02:00
Christopher Faulet
b238c08015 MINOR: h1: Add a H1M flag to specify a non-empty 'Upgrade:' header was parsed
The H1_MF_UPG_HDR flag was introduced to let the H1 parser know that a
non-empty 'Upgrade:' header was parsed.

This patch is mandatory to fix a bug.
2026-05-26 18:28:07 +02:00
Christopher Faulet
547c2e4e78 MINOR: http: Add function to remove all occurrences of a value in a header
http_remove_header_value() function was added to parse a header value and
remove all occurrences of a specific value.
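To give an idea of what removing a value from a comma-separated header
involves, here is a standalone sketch working on plain C strings. It is not
http_remove_header_value(), whose signature is not shown in this message; the
function name, the arguments and the choice to rebuild the list into a
separate buffer are assumptions of the example.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Hypothetical helper: copies the comma-separated list <in> into <out> (of
 * <size> bytes), skipping every element equal to <tok> (case-insensitive).
 * Kept elements are joined with ", ". Returns the number of removed elements,
 * or -1 if <out> is too small.
 */
static int remove_list_value(const char *in, const char *tok,
                             char *out, size_t size)
{
	size_t toklen = strlen(tok);
	size_t olen = 0;
	int removed = 0;

	if (!size)
		return -1;

	while (*in) {
		const char *start, *end;

		/* isolate one element and trim surrounding blanks */
		while (*in == ' ' || *in == '\t')
			in++;
		start = in;
		while (*in && *in != ',')
			in++;
		end = in;
		if (*in == ',')
			in++;
		while (end > start && (end[-1] == ' ' || end[-1] == '\t'))
			end--;

		if (end == start)
			continue; /* empty element */

		if ((size_t)(end - start) == toklen &&
		    strncasecmp(start, tok, toklen) == 0) {
			removed++;
			continue;
		}

		/* room for the optional ", ", the element and the final NUL */
		if (olen + (olen ? 2 : 0) + (size_t)(end - start) + 1 > size)
			return -1;
		if (olen) {
			out[olen++] = ',';
			out[olen++] = ' ';
		}
		memcpy(out + olen, start, (size_t)(end - start));
		olen += (size_t)(end - start);
	}
	out[olen] = '\0';
	return removed;
}
```

Applied to "keep-alive, Upgrade, close" with the token "upgrade", the sketch
leaves "keep-alive, close" in the output buffer.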

This patch is mandatory to fix a bug.
2026-05-26 18:28:07 +02:00
Willy Tarreau
4a9ec66fd8 MEDIUM: tools: switch the main PRNG to a thread-local xoshiro256**
The current PRNG is xoroshiro128**, it was introduced in 2.2 with
commit 52bf83939 ("BUG/MEDIUM: random: implement a thread-safe and
process-safe PRNG").  It features a 2^128 sequence and can perform
2^64 or 2^96 jumps, though only the 2^96 jump is implemented. It
was initially designed to support both processes and threads, and
implements a shared state between threads instead of allocating
distinct sequences based on PID and thread numbers.

Since then, the PRNG's usage grew and processes have disappeared,
but the lock or the DWCAS are still there due to its shared nature,
and it's possible to trigger watchdog warnings by issuing 100 UUIDs
in a single log-format string.

Also, UUID and QUIC retry tokens now consume 128 bits from the PRNG
in two 64-bit calls, and used to weaken the PRNG by rapidly disclosing
its internal state on reasonably idle systems. This indicates that
most of the time we now need 128 bits.

This patch modernizes the internal generator by switching to xoshiro256**,
which has comparable properties (it's even faster), and features even
longer 2^256 periods, still returning 64 bits per call. It can be
initialized with 2^128 and 2^192 jumps. More details here:

   https://prng.di.unimi.it/
   https://prng.di.unimi.it/xoshiro256starstar.c

Here we implement a thread-local state instead of the old shared one,
so there is no more need for synchronization. The state is seeded at
boot, and each thread performs as many 2^192 jumps as their TID is
large. The master process performs a 2^128 jump where it used to
perform a 2^96 jump so that it doesn't overlap with any worker thread.
However a cleaner approach could be to perform a 2^128 jump for each
fork() (here the worker) and 2^192 for each thread. This might be for
a future improvement.

ha_random64_internal() is now the new PRNG, so that everything else
remains totally transparent. _ha_random64_pair_hashed() continues to
hash the first 128 bits of the state.
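For reference, the core step of xoshiro256** as published at the URL above,
combined with a thread-local state, can be sketched as follows. The variable
and function names are made up for the example and do not match HAProxy's; the
seeding helper is a plain copy, and the 2^128 and 2^192 jump functions
mentioned in this message are left out of the sketch.

```c
#include <stdint.h>

/* Per-thread generator state (C11 thread-local storage). It must be seeded
 * with something other than all zeroes before use.
 */
static _Thread_local uint64_t prng_state[4];

static inline uint64_t rotl64(uint64_t x, int k)
{
	return (x << k) | (x >> (64 - k));
}

/* Hypothetical seeding helper: copies a 256-bit seed into the state of the
 * calling thread.
 */
static void prng_seed(const uint64_t seed[4])
{
	int i;

	for (i = 0; i < 4; i++)
		prng_state[i] = seed[i];
}

/* One xoshiro256** step: returns 64 bits and advances the calling thread's
 * state. No lock nor atomic operation is needed since the state is private.
 */
static uint64_t prng_next(void)
{
	uint64_t *s = prng_state;
	const uint64_t result = rotl64(s[1] * 5, 7) * 9;
	const uint64_t t = s[1] << 17;

	s[2] ^= s[0];
	s[3] ^= s[1];
	s[1] ^= s[2];
	s[0] ^= s[3];
	s[2] ^= t;
	s[3] = rotl64(s[3], 45);

	return result;
}
```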

A simple config generating 100 UUIDs on 20 threads jumps from 135k to
1.25M req/s, which translates to a bump from 13.5M to 125M UUID/s,
or about 9 times faster. And the DWCAS can no longer be seen
in perf top:

Before: 13.5M/s
Overhead  Shared Object            Symbol
  99.04%  haproxy       [.] ha_random64_internal
   0.66%  haproxy       [.] _ha_random64_pair_hashed
   0.03%  libc-2.42.so  [.] __printf_buffer
   0.02%  [kernel]      [k] _raw_spin_lock
   0.01%  libc-2.42.so  [.] __strchrnul_avx2
   0.01%  [kernel]      [k] ktime_get
   0.01%  [kernel]      [k] lapic_next_deadline
   0.01%  haproxy       [.] sample_process
   0.01%  haproxy       [.] chunk_printf
   0.01%  libc-2.42.so  [.] __printf_buffer_write
   0.01%  [kernel]      [k] hrtimer_active
   0.01%  libc-2.42.so  [.] __memmove_avx_unaligned_erms
   0.01%  libc-2.42.so  [.] _itoa_word

After: 125M/s
  18.84%  libc-2.42.so      [.] __printf_buffer
   9.84%  haproxy           [.] sample_process
   8.33%  libc-2.42.so      [.] __strchrnul_avx2
   6.61%  libc-2.42.so      [.] __memmove_avx_unaligned_erms
   6.06%  libc-2.42.so      [.] __printf_buffer_write
   4.43%  haproxy           [.] strlcpy2
   4.09%  libc-2.42.so      [.] _itoa_word
   2.62%  haproxy           [.] sess_build_logline_orig
   2.12%  haproxy           [.] _ha_random64_pair_hashed
   1.28%  haproxy           [.] pool_put_to_cache
   1.06%  haproxy           [.] __pool_alloc
   1.00%  haproxy           [.] smp_fetch_uuid
   0.93%  haproxy           [.] lf_text_len
   0.82%  haproxy           [.] ha_generate_uuid_v4
2026-05-26 13:13:24 +02:00
Willy Tarreau
26c3b3f41d MINOR: tools: provide a function to generate a hashed random pair
A lot of places call two ha_random64() in a row to generate a 128-bit
random. While it's now safe against linear analysis thanks to the XXH64
call, it's still particularly expensive due to the lock.

Here we introduce a new function ha_random64_pair_hashed(), that feeds
two uint64_t with a hash of the PRNG's internal state, and make it
advance. This will cut in half the number of calls to ha_random64()
and should recover a part of the performance lost in the lock. For
now it's not used.
2026-05-26 13:13:24 +02:00
Willy Tarreau
9b6389c8a0 BUG/MEDIUM: tools: insert an XXH64 layer on the PRNG output
Consuming randoms in pairs directly exposes the internal PRNG's state
on moderately idle systems. This can allow predicting the next (or previous)
UUIDs, QUIC retry tokens, and WS keys for example. Let's insert an XXH64
call on the ha_random64() output to avoid this. We expand the boot seed
as the secret at boot, and use now_ns as the seed for each call. The
original ha_random64() function was renamed to ha_random64_internal()
for use cases where it's not a problem to directly use the internal
state.

The performance loss is only measurable when single-threaded. It drops
from 7.32M UUID per second to 7.16M. Above that there is no longer any
difference due to the DWCAS loop which reaches up to 98.5% CPU at 20
threads.

This will need to be backported to stable releases after a period of
observation.
2026-05-26 13:13:24 +02:00
Willy Tarreau
32fc35ef09 CLEANUP: resolvers: fix comment typos and wrong filenames in file headers
A few assorted comment fixes for resolvers (incorrect file name etc).
2026-05-25 10:57:14 +02:00
Willy Tarreau
8fe8d5fbe3 CLEANUP: resolvers: use read_n32() instead of open-coded big-endian read
In resolv_validate_dns_response(), the second DNS record parsing path
manually constructs a 32-bit big-endian TTL value from four individual
bytes using the expression:

  reader[0] * 16777216 + reader[1] * 65536 + reader[2] * 256 + reader[3]

We have read_n32() to do this, and it's more robust against unexpected
signedness surprises (which should not happen right here since reader is
unsigned char and we use -fwrapv so the result is defined). Also, let's
make the ttl an uint instead of an int. The TTL is only retrieved and not
used for now, so better clean it now.
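The signedness concern can be illustrated with a standalone equivalent of such
a helper. The function below is a hypothetical stand-in, not HAProxy's
read_n32(): in the open-coded expression each byte is promoted to int before
the multiplication, so a first byte of 0x80 or more exceeds INT_MAX, whereas
casting each byte to uint32_t first keeps the whole computation unsigned.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit big-endian (network order) read */
static inline uint32_t be32_read(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

For instance the bytes 00 01 51 80 decode to a TTL of 86400 seconds.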
2026-05-25 10:57:13 +02:00
Willy Tarreau
007d5946b4 BUILD: intops: mask the fail value in array_size_or_fail()
Cross-compilation on m68k fails in ssl_sock_resize_passphrase_cache()
where the compiler noticed the SIZE_MAX passed to realloc() in the
error path and complained that it's larger than PTRDIFF_MAX. This can
be disabled with -Walloc-size-larger-than=SIZE_MAX but in practice we
can simply hide the value and keep the warning to detect real failures
elsewhere. Let's pass it through DISGUISE() and also take this
opportunity for doing that inside an unlikely() clause since it's never
supposed to happen.
2026-05-25 07:33:35 +02:00
Remi Tricot-Le Breton
e2c3cd9eb7 BUG/MINOR: ocsp: Manage date too far away in the future
The check on the OCSP response expire time is based on the "Next Update"
field of the response, converted by my_timegm function that returns a
time_t (signed long). It is then stored in the 'expire' field of the
certificate_ocsp structure which is typed as a signed long.
When loading an OCSP response, if the "Next Update" time is too far in
the future and we are running on a 32-bit machine, we might end up with
negative times returned by my_timegm, which makes the comparison with
the current date fail and raises the "OCSP single response: no longer
valid." error message.

This problem typically happens in the ocsp_auto_update.vtc regtest since
the loaded OCSP response has a "Next Update" field in 2050.

This patch simply changes the type of the expire field to an unsigned
long since the 'my_timegm' function does not return '-1' in case of
error, contrary to the standard 'timegm' one.
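The wrap-around can be reproduced with fixed-width integers. In the sketch
below, all names are hypothetical and 32-bit types stand in for a 32-bit long:
2050-01-01 00:00:00 UTC is 2524608000 seconds after the epoch, which is above
2^31 - 1 and therefore negative once stored in a signed 32-bit field.

```c
#include <stdint.h>

/* What a 32-bit signed field ends up holding for <secs> (two's complement) */
static int32_t as_signed32(uint32_t secs)
{
	return (secs > INT32_MAX) ? (int32_t)((int64_t)secs - 4294967296LL)
	                          : (int32_t)secs;
}

/* Validity check with a signed 'expire' field: breaks past January 2038 */
static int valid_with_signed_expire(uint32_t next_update, uint32_t now)
{
	return as_signed32(next_update) > as_signed32(now);
}

/* Same check with an unsigned 'expire' field */
static int valid_with_unsigned_expire(uint32_t next_update, uint32_t now)
{
	return next_update > now;
}
```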

This patch can be backported to all stable branches.
2026-05-21 15:43:49 +02:00
Amaury Denoyelle
8fe8f78473 MINOR: connection: define mask CO_FL_WAIT_XPRT_L6
Define a new connection flag mask CO_FL_WAIT_XPRT_L6. This will be used
to indicate that a XPRT layer is running on top of layer 6. For now,
only xprt_qmux implements this method of operation.
2026-05-21 15:09:10 +02:00
Amaury Denoyelle
9e6e0fd149 MINOR: connection: define xprt_add_l6hs()
When the QMux protocol is used, the xprt_qmux layer is set up after SSL handshake
completion but prior to the MUX initialization. Once transport
parameters exchange is successful, the layer is removed and the MUX is
started.

The layer setup operation was performed directly on ssl_sock_io_cb().
Simplify the code by extracting it in a dedicated function
xprt_add_l6hs(). The function is generic so the requested XPRT layer
must be passed as argument.

The code is mostly identical. One difference is that a check is
performed to ensure no SSL handshake is pending. If this is the case,
the function is a noop. This will become useful to support QMux
transparently both in clear or on top of SSL.

Another minor addition is that CO_FL_XPRT_READY flag is automatically
reset by xprt_add_l6hs(). This allows the code to use
conn_xprt_start() standard function after XPRT init.
2026-05-21 15:09:10 +02:00
Willy Tarreau
b62ba7592a MINOR: intops: add a multiply overflow detection for ulong and size_t
Sometimes we'd like to know if some products overflow, so let's add a
pair of functions for this, for ulong and for size_t. For recent enough
compilers (gcc >= 5, clang >= 3.4) we just use __builtin_mul_overflow()
otherwise we rely on a division and a comparison before performing the
operation.

A third function, array_size_or_fail() computes the size of an array
of m elements of n bytes each, and returns the total size if it fits
in a size_t, otherwise ~0 if it does not so that passing this to
malloc() or any other variant would fail by trying to exhaust the
entire memory space.
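The two strategies described above can be sketched for size_t as follows. The
function names are hypothetical (only array_size_or_fail() is named in this
message, and its exact prototype is not shown), and the compiler test is a
simplification of the version thresholds quoted above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns non-zero if a * b overflows a size_t. The
 * product is stored into <res> when it fits.
 */
static inline int size_mul_overflows(size_t a, size_t b, size_t *res)
{
#if (defined(__GNUC__) && __GNUC__ >= 5) || defined(__clang__)
	return __builtin_mul_overflow(a, b, res);
#else
	/* fallback: one division and one comparison before multiplying */
	if (b && a > SIZE_MAX / b)
		return 1;
	*res = a * b;
	return 0;
#endif
}

/* Size of an array of <m> elements of <n> bytes each, or ~0 on overflow so
 * that a subsequent allocation of that size is bound to fail.
 */
static inline size_t array_size_or_max(size_t m, size_t n)
{
	size_t total;

	if (size_mul_overflows(m, n, &total))
		return ~(size_t)0;
	return total;
}
```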
2026-05-20 17:05:19 +02:00
Christopher Faulet
56e7f8ef31 MEDIUM: htx: Improve htx_xfer API to not count HTX meta-data
This patch adds to the htx_xfer() function the ability to transfer data
without accounting for the meta-data. By default, the <count> variable
includes the meta-data. But by setting the flag HTX_XFER_NO_METADATA, it is
possible to transfer HTX blocks without counting the meta-data. In that case,
<count> will not contain the blocks' meta-data and the return value will not
include them.
2026-05-20 16:21:02 +02:00
Christopher Faulet
99d48c3aec BUG/MINOR: htx: Fix value of HTX_XFER_HDRS_ONLY flag
HTX_XFER_* flags must be declared as a bitfield. However, the value of
HTX_XFER_HDRS_ONLY was set to 0x03 while it should be 0x04. So let's fix it.

This patch must be backported where the htx_xfer() function was backported
(5ead611cc "MEDIUM: htx: Add htx_xfer function to replace htx_xfer_blks").
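The reason 0x03 is wrong for a bitfield is that it is the union of bits 0x01
and 0x02, so it cannot be told apart from the two lower flags being set
together. The flag names in the sketch below are placeholders, not the real
HTX_XFER_* definitions; only the values matter.

```c
/* Hypothetical flag names; only the values matter for the illustration */
#define XF_FLAG_A        0x01u
#define XF_FLAG_B        0x02u
#define XF_FLAG_C_BROKEN 0x03u /* overlaps XF_FLAG_A | XF_FLAG_B */
#define XF_FLAG_C_FIXED  0x04u /* has a bit of its own */

/* Returns non-zero if every bit of <mask> is set in <flags> */
static inline int xf_all_set(unsigned int flags, unsigned int mask)
{
	return (flags & mask) == mask;
}
```

With the broken value, a caller passing A and B appears to have requested C
as well, and a plain non-zero test on C fires as soon as A or B is set.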
2026-05-20 16:21:02 +02:00
Amaury Denoyelle
47a61eb86d BUG/MINOR: mux_quic: do not exceed stream.max-concurrent on backend side
Fix usage of stream.max-concurrent QUIC setting on the backend side.
Contrary to frontend connections, this limit must be enforced by QUIC
MUX directly. This is necessary as the peer may allow a larger number of
concurrent streams via its flow control.

First, QUIC TP initial max bidi streams value is now set to 0. This is
fine as only the HTTP/3 client is expected to open bidirectional
streams.

The most important change is performed in qcm_avail_streams(). The
value first depends on the peer flow control. Now, it is further reduced
if necessary to not exceed the configured BE stream.max-concurrent.
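As a rough model of that computation, the sketch below takes the number of
streams allowed by the peer's flow control and caps it by what remains under
the configured limit. The function and parameter names are hypothetical, and
how qcm_avail_streams() actually counts opened streams is not described here.

```c
/* Hypothetical model: <fc_avail> is what the peer's flow control still
 * allows, <opened> the number of currently opened streams and
 * <max_concurrent> the configured backend limit.
 */
static unsigned int be_avail_streams(unsigned int fc_avail,
                                     unsigned int opened,
                                     unsigned int max_concurrent)
{
	unsigned int room = (opened < max_concurrent) ? max_concurrent - opened : 0;

	return (fc_avail < room) ? fc_avail : room;
}
```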

Note that this new behavior may further increase the current limitation on
QUIC BE reuse when a QCS instance is kept while its upper stream layer
is detached. In this case there is a risk that the connection is not
reinserted in the correct server pool, as an idle or avail one.

This is a breaking change as the meaning of the BE stream.max-concurrent
setting is changed in effect. However, this does not necessitate extra
warnings as the previous usage was in effect useless. Furthermore, QUIC
on the backend side is still considered as experimental.

This can be backported up to 3.3.
2026-05-20 14:42:03 +02:00
Willy Tarreau
d142c7f421 BUILD: traces: add USE_TRACE allowing to disable traces
This reduces the total code size by 6-10% and speeds up the build a
bit. It can be further reduced by disabling the trace decoding code
inside certain subsystems like muxes. But at least like this it will
help users on small systems to reduce the footprint when not needed
by explicitly passing USE_TRACE=0 (they remain enabled by default).
2026-05-20 11:46:43 +02:00
Olivier Houchard
de3f245df0 BUG/MEDIUM: servers: Store the connection hash with the parameter cache
When we store the negotiated server parameters, such as the ALPN, also
store the calculated hash with the connection. If it is different, as
can happen when the IP address differs because set-dst was used,
we certainly do not want to reuse the information in the cache,
otherwise we could end up using the wrong ALPN and mux.
That means we already have to calculate the hash in connect_server()
now, while before we would not do it for Websockets, if we could not do
connection reuse, as that's all the hash was used for.

This should fix GitHub issue #3386.

This should be backported as far as 3.2.
2026-05-20 10:29:22 +02:00
Amaury Denoyelle
89f3975acc MINOR: mux_quic: define ms_bidi_rel QCC member
Add a new QCC member <ms_bidi_rel>. This represents the number of
concurrent streams advertised similarly to ms_bidi, but as a relative
value.

This patch does not introduce any functional change. For now,
<ms_bidi_rel> will be equal to <ms_bidi_init>. However, with the
implementation of stream elasticity and dynamic adjustment for
concurrent max-streams-bidi, the former will be required to keep the
last advertised value.
2026-05-20 09:52:50 +02:00