HTTP/0.9 transcoder is minimal. In particular, it did not checked if the
HTX payload length was unknown. In this case, the stream shutdown is the
normal termination signal. As this condition was not reported to the
MUX, the stream would be closed via a RESET_STREAM during the stream
shut callback invokation.
Fix this by properly inspecting HTX response line prior to generating
the HTTP/0.9 response. If flag HTX_SL_F_XFER_LEN is not set, correctly
convert it to QCS flag QC_SF_UNKNOWN_PL_LENGTH. This ensures that MUX
will use a FIN signal instead of a RESET_STREAM frame when shut is
called by the upper stream layer.
This procedure is already implemented by HTTP/3 transcoder.
This bug was detected with haterm, because contrary to httpterm the
latter does not honour Connection keep-alive header in case of HTTP/1.0.
Thus connection close mode is used and no content-length is added.
This must be backported up to 2.8.
HTTP/0.9 is a simple protocol. The response only contains the body
without any status line nor header. An HTX start line must be built when
transcoding the message for haproxy stream layer on the first invocation
of rcv_buf() callback.
Previously, this condition was detected by using an access to the stream
object. However, it's possible to rely only on the QCS by checking the
value from <rx.offset> field. This is a better solution which completely
remove the superfluous dependency between hq-interop and the stream
layer.
hq-interop request parser is minimal. It simply extracts a method and a
path until the whitespace delimiter. Parsing is interrupted if the
delimiter cannot be found, and MUX is responsible to reinvoke it later
with new content added.
This patch adjusts hq-interop parsing in case of a missing delimiter.
Now it also checks if there is space remaining in the buffer. If this is
not the case, it returns a fatal error as parsing cannot be completed at
all.
This change has the side effect of preventing a BUG_ON() crash in MUX :
in case of a truncated parsing, qcs_transfer_rx_data() may be used to
realign content from the next buffer. However this function explicitely
forbids to be called with a full buffer as it could do nothing in this
case, hence this BUG_ON() to ensure parsing is never fully blocked.
The impact of this bug remains low despite the potential BUG_ON() crash.
This is because hq-interop is only used for QUIC debugging purpose and
should not be activated in production. HTTP/3 layer is immune as it
already ensures that frame length is never bigger than a buffer size
(except for DATA frames which can be parsed in a streaming mode).
Thanks to BeaCox <root@beacox.space> for having reported us this issue.
This should be backported up to 2.6. From 3.3, qcm_stream_rx_bufsz() is
using the older "qmux" prefix and must be renamed. Also, this function
does not exists in 3.0 and older so the test must be adjusted there as
well.
The FCGI demux record length field (drl) is uint16_t. In the
ignore_record path, the expression "fconn->drl += fconn->drp" overflows
to 0 when contentLength=65535 and paddingLength>=1. This causes the
state machine to consider the record complete without consuming any
buffer data. The remaining buffer contents are then parsed as new FCGI
record headers.
The same drl+=drp pattern at lines 2382/2418/2475 is not affected
because drl is guaranteed to be 0 at those points (all content bytes
are consumed before reaching end_transfer).
Widen drl from uint16_t to uint32_t so that the addition of drp
(uint8_t, max 255) cannot overflow.
Reported-by: Tristan (@TristanInSec)
For set-headers-bin and add-header-bin actions, the new HTX headers to add
or set are extracted from the result of a sample expression evaluation. This
result must be copied because it is based a static trash chunk and internal
functions manipulating HTX messages also use static chunks. So to not
overwrite the sample expression, for instance because a defrag is performed,
it must be copied into a dynamically allocated chunk.
It is also important because the sample expression may be based on part of
the HTX message itself.
This patch must be backported to 3.4.
QCS uses a multiple Rx buffer for incoming data to decode. However this
may cause the app proto layer to block on decoding if some data are
splitted across two buffers. In this case qcs_transfer_rx_data() is used
to move the data in the same current buffer.
However, this function contains a bug as the next Rx buffer is not
reinserted as expected in the QCS tree after the move operation. This
can cause data loss which will cause a transfer freeze. This may affect
POST requests on the frontend side as well as most backend transfers.
This patch fixes the reinsert operation. It is now performed as expected
if the next rxbuf is not fully truncated. In the opposite case, the
rxbuf can simply be freed completely.
This must be backported up to 3.2.
hpack_dht_insert() has three call sites for hpack_dht_defrag(). Two of
them (lines 293 and 306) correctly check for a NULL return and bail out
with -1. The third (line 353, data-space defrag path) assigns the return
value to dht and immediately dereferences it without a NULL check.
When pool_head_hpack_tbl is exhausted, hpack_dht_alloc() returns NULL,
hpack_dht_defrag() propagates it, and line 354 dereferences NULL+0x0a
(offsetof wrap), crashing the worker with SIGSEGV.
Add a NULL check consistent with the two other call sites.
This must be backported to all stable versions.
Reported-by: Tristan (@TristanInSec)
When 0RTT is enabled, a temporary buffer for early data is used. We read
from it first when the mux asks for data, and then we free it when it is
empty, but that is not right, because maybe we have more early data to
receive, and then we no longer have any buffer to store them, and that
will eventually end up with the connection closed in error.
To fix that, as long as we haven't received all the early data yet, just
reset the buffer, instead of freeing it.
This should fix github issue #3416
This should be backported up to 2.8.
QUIC flow control on bidirectional streams ensure that the peer cannot
emit more than what haproxy has allowed, which guarantees that buffering
is under controlled on the receiver side.
This limit is first announced on the transport parameter via
initial_max_stream_data_bidi_remote which is derived from configuration
value. QUIC MUX calculation for its streams is then directly based on
the transport parameter.
This mechanism works as expected on the frontend side, as in this case
all exchanges occur on remote streams opened by the opposite side.
However, this is not working as expected on the backend side, as in this
case transfers occur on streams opened locally by haproxy as the client.
Thus, configuration has no impact on backend side rxbuf which remains
set to a single buffer, causing important latency when retrieving large
objects.
This patch removes this limitation on the backend side by adjusting
quic_transport_params_init(). If <server> parameter is false, limitation
is set for initial_max_stream_data_bidi_local TP.
This must be backported up to 3.3.
The file specified by 'h1-case-adjust-file' directive is only loaded during
post-parsing. However when a relative path is used, the corresponding
absoulte path was not resolve during parsing. So the file could be loaded
relatively from the wrong location leading to a configuration error. It may
happen if several configuration files are used or if several
"default-config" are used. The last "default" location was always used.
To fix the issue, the absolute path of the file is now resolved when the
directive is parsed.
This patch should fix the issue #3415. It must be backported to all
versions.
with L7 retries are configured, when the max number of retries is reached
the error must be reported to the client. However, when it was an abort on a
reused connections, the client connection is silently closed. While it is
expected without L7 retries, to let the client retries on its own, it is
unexepcted with L7 retries.
So let's fix it by ignoring the SF_SRV_REUSED flag on the stream when a L7
retry fails. This way, a 502/425 will be reported to the client.
This patch should help to fix the issue #3414. It must be backported to all
supported versions.
A memset used for debug was left when "keep-query" option was added. Let's
remove it. This bug should be harmless but it consumes extra CPU for
nothing.
This patch should be backported as far as 3.2.
Whne the endpoint is shut (an applet or a mux), at least one of the SHW
flags must be set (SE_SHW_SILENT or SE_SHW_NORMAL). It is mandatory for
muxes to perform the shutdown. Otherwise, the shutdown could be ignored.
So let's add a BUG_ON() on it to be sure this never happen.
This patch documents the behavior where the internal HTTP client sets
the response status to 0 when an error is encountered by the stream
(SF_ERR_MASK).
This allows users to distinguish between an HTTP status code returned
by a remote server and an internal error generated by HAProxy (e.g.
connection timeout, connection refused, etc.).
With the httpclient it's difficult to get if the HTTP status code was
returned by the actual server or if it's the internal proxy that
generate the error.
This patch changes the behavior by setting the status to 0 when an error
is get by the stream.
There were already valid cases when the status was 0 on some error, so
that should not really change the error path in the scripts.
Refactor the Lua HTTP client to defer initialization. core.httpclient()
no longer initializes the internal HTTP client immediately. Instead,
initialization now occurs within hlua_httpclient_send() when a request
method (e.g., get, put, head) is invoked.
The HTTPClient class now serves as a factory for accessing methods, while
a new class, HTTPClientRequest, has been introduced to represent individual
requests and manage the HTTP client lifecycle.
This change allows multiple requests to be executed using a single
HTTP client instance:
local hc = core.httpclient()
local res1 = hc:get({url = "...", headers = ...})
local res2 = hc:post({url = "...", headers = ...})
local res3 = hc:put({url = "...", headers = ...})
This refactor maintains backward compatibility, as existing scripts that
instantiate a new core.httpclient() for every request will continue to
work as expected.
Move the lua httpclient code from hlua.c to http_client.c
The code is almost the same but the registering of the class which is
done in hlua_http_client_init_state(), from REGISTER_HLUA_STATE_INIT()
check_args() calls have been replaced by hlua_check_args().
hlua_httpclient_destroy_all() is exported so it can be called in hlua.c.
hlua_httpclient_table_to_hdrs() is made static.
hlua_pusherror() and check_args() are being exported.
check_args() is now a macro to hlua_check_args() so it's not confusing
when called outside hlua.c.
When a dynamic server is added with consistent hash balancing on the
backend, its lb_nodes elements are allocated and associated with a
calculated server key. This operation is performed in add server handler
via srv_alloc_lb(). By default, the server key is based on its ID.
However, automatic server ID is calculated later in add server handler,
which means the initial lb_nodes are not valid.
This could cause load balacing issue but in fact this is not directly
visible as the server key is recalculated when the dynamic server is
enabled via chash_queue_dequeue_srv() : all server lb_nodes are dequeued
and requeued with the now proper key.
Thus, "add server" handler must be corrected as it is buggy when
considering it alone. The simplest solution of the current patch is to
initialize server ID before srv_alloc_lb() is invoked. There is no issue
as handler runs under thread isolation so there is no risk of multiple
servers manipulating the same ID. Server insertion in proxy ID tree is
still performed at the end of the handler when all fallible operation
are completed.
The fact that server key is recalculated when the server is set to ready
state is a side effect of the following patch which was introduced in
3.0. What this means though is that users of older releases are facing a
bigger issue, with load-balancing not working as expected. Thus,
this patch is even more crucial for 2.8 and older releases.
faa8c3e024
MEDIUM: lb-chash: Deterministic node hashes based on server address
This should fix github issue #3413. Thanks to Joao Morais for is
analysis on the problem.
This must be backported to all stable releases.
When we receive a PUSH_PROMISE frame while we don't expect it, flag it
as a connection error, do not just set ret to H3_ERR_ID_ERROR, as it
would just be considered the number of bytes we read, and could lead to
random corruption. This should only happen with backend connections.
This should be backported whenever commit 4a8bb2fe5 is backported.
Now that there is no longer a shared wake queue, chances are if a shared task
is scheduled, it will always end up on the same thread. In
wake_expired_tasks(), when a task has to be waken up, randomly look to
three other threads, and if the runqueue of the current thread is at least
two time bigger than the runqueue of one of the other threads, then give
that task to that thread, so that our load gets reduced.
If we're giving the task to another thread, then we have to add the
TASK_RUNNING flag until we waked it up, otherwise the other thread could
just run it, if it gets waken up from another path, and free it while
we're still not done with it.
2 times has been chosen somewhat arbitrarily, and may be tweaked at a
later date if deemed not optimal.
Modify task_instant_wakeup() to use __task_set_state_and_tid().
It uses the new ownership behavior, but that's okay because
task_instant_wakeup() was not used anywhere.
Totally remove the per-thread group wait queue. This was potentially a
source of contention, because there were only a global lock for all
those wait queues.
Instead, for shared tasks, there is now the concept of ownership for the
task. When a task is in the wait queue, run queue, or is running on that
particular thread, the task's tid is set to -2 - thread_tid, and only
that thread will be responsible for it until it is no longer running,
and in none of its queue.
When a shared task is scheduled to be run at a later time, if its
current tid is -1, then the current thread will take ownership, and put
it in its own wait queue. If it is already owned, then TASK_WOKEN_WQ is
added to the task's state, and a task_wakeup() is done, so that the
owner thread will add it in its wait queue.
If there is any owner, then a task_wakeup() will just add the task to
the owner's runqueue, otherwise the current thread will become the
owner.
Introduce a new function, __task_get_current_owner, that returns the
owner of a task based on its current tid.
-1 means there is no current owner, otherwise either the tid is >= 0, in
which case it will just return it, or it's < -1, in which case it will
return -2 - tid, the tid of the thread with the current ownership.
Introduce __task_get_new_tid_field(), that provides the tid to be used
for a task.
For shared task, to mark temporary ownership of a task, instead of -1,
the tid will be set to -2-tid, tid being the tid of the current thread.
Introduce a new function, __task_set_state_and_tid, that atomically can
set a task's state and its tid. This will be used later, as the tid will
be used to indicate task ownership even for shared tasks.
This Lua script automates dns-01 ACME challenges using the Gandi LiveDNS
API v5. It subscribes to the ACME_DEPLOY event to set the required
_acme-challenge TXT record via the Gandi REST API, signals HAProxy that
the challenge is ready using ACME.challenge_ready(), then cleans up the
TXT record once the certificate is issued on ACME_NEWCERT.
The API key is read from the GANDI_API_KEY environment variable at
startup. Zone discovery is automatic: the script probes parent zones
from longest to shortest until Gandi accepts the record, which handles
both apex and wildcard certificates transparently.
Add EVENT_HDL_SUB_ACME_DEPLOY to the ACME family. It is published in
the dns-01 challenge path after the TXT record information has been
prepared, carrying the certificate store name, domain, account
thumbprint, dns_record value, and optionally the provider and vars
strings.
Lua subscribers using core.event_sub() receive the event data as an
AcmeEvent object, which is the same class used for ACME_NEWCERT and
carries the fields relevant to the event type.
Add a new EVENT_HDL_SUB_ACME_NEWCERT event type in the ACME family.
It is published after a new certificate has been successfully fetched
and installed. The event carries the certificate store name, allowing
subscribers to act on newly available certificates.
Lua subscribers using core.event_sub() receive the event data as an
AcmeEvent object with a crtname field containing the certificate store
name.
Using ha_diag_warning() to report the number of threads created resulted
in warnings being counted and possibly an error being fired when combined
with -dW:
$ printf "global\nstats socket /tmp/sock1\n" | ./haproxy -dD -dW -c -f /dev/stdin; echo $?
[NOTICE] (10406) : haproxy version is 3.5-dev0-5091ac-35
[NOTICE] (10406) : path to executable is ./haproxy
[DIAG] (10406) : Created 20 threads split into 2 groups
[ALERT] (10406) : Some warnings were found and 'zero-warning' is set. Aborting.
1
Now that we have ha_diag_notice(), let's use it:
$ printf "global\nstats socket /tmp/sock1\n" | ./haproxy -dD -dW -c -f /dev/stdin; echo $?
[DIAG] (10513) : Created 20 threads split into 2 groups
0
It would make sense to backport this to 3.2 because it helps validate configs
against diag warnings without triggering a false positive. It depends on
this previous patch:
MINOR: errors: add ha_diag_notice() to report diag-level notifications
Right now the only way to report info that is only displayed in diag
mode with -dD is to use ha_diag_warning(). The problem is that this is
then counted as a warning and may result in errors when combined with
-dW, as happens for the CPU topology info:
$ printf "global\nstats socket /tmp/sock1\n" | ./haproxy -dD -dW -c -f /dev/stdin; echo $?
[NOTICE] (10406) : haproxy version is 3.5-dev0-5091ac-35
[NOTICE] (10406) : path to executable is ./haproxy
[DIAG] (10406) : Created 20 threads split into 2 groups
[ALERT] (10406) : Some warnings were found and 'zero-warning' is set. Aborting.
1
We need another level. This commit introduces ha_diag_notice() which only
emits a notification that doesn't count as a warning. Note that we could
even introduce an info level and revisit various messages so that notice
only reports certain events while info is for anything (like versions
above). That could be a future improvement.
The Linux tls module requires a socket to be in TCP_ESTABLISHED state
before we can enable the TLS ULP on the socket, if the socket is in any
other state, then the setsockopt() call will fail, and we won't use
kTLS on that socket.
To make sure we're not doing it too early, defer it until the TLS
handshake is done, which means the TCP connection is established.
This should be backported up to 3.3.
Signed-off-by: Karol Kucharski <kkucharski@fastlogic.pl>
Add a new ACME global Lua table with a challenge_ready(crt, dns) method
that wraps acme_challenge_ready(). It marks the ACME challenge for domain
<dns> in certificate <crt> as ready and returns the number of remaining
challenges, or 0 when all challenges are ready and validation has been
triggered. A Lua error is raised if the certificate or domain is not found.
The ACME table is registered for each lua_State via the new
REGISTER_HLUA_STATE_INIT() mechanism.
__LJMP, WILL_LJMP() and MAY_LJMP() were defined locally in hlua.c,
making them unavailable to other modules that implement Lua bindings.
Move them to include/haproxy/hlua.h so they can be used outside of
hlua.c.
Add a registration mechanism so that modules outside of hlua.c can hook
into each lua_State creation. Modules call hap_register_hlua_state_init()
(or the REGISTER_HLUA_STATE_INIT() macro) with a callback of the form:
int my_init(lua_State *L, char **errmsg);
The callback returns an ERR_* code. ERR_ALERT and ERR_WARN trigger
ha_alert()/ha_warning() respectively; any other non-zero errmsg is
emitted via ha_notice(). ERR_FATAL or ERR_ABORT cause exit(1).
Registered entries are freed in hlua_deinit().
Output HTTP/3 header traces on the backend side. As previous commit,
this relies on h3_trace_header() function.
Extra calls are added for fields extracted from the request start-line
which produce an HTTP/3 pseudo-header.
Output trace for HTTP/3 headers manipulated on the frontend side. This
is implement via a new utility function h3_trace_header(), largely
inspired from existing h2_trace_header().
An extra call is added for HTTP/3 response :status which is extracted
from the HTX start line.
Define two new values for HTTP/3 trace verbosity : simple and advanced.
For now, these values are unused. However advanced level will become
useful to implement HTTP/3 header traces.
Extract the challenge-readiness logic from cli_acme_chall_ready_parse()
into a new acme_challenge_ready(crt, dns) function so it can be called
from other contexts such as Lua event handlers.
It slightly changes the messages on the CLI.
When an ACME order is re-used or when a domain was recently validated,
the CA may return status "valid" for an authorization without requiring
any challenge to be solved. In acme_res_auth(), this is handled by
setting auth->validated = 1 and jumping to out — but auth->ready is
never initialized and stays 0.
This became a bug in 3.4 when the "challenge-ready" option and the
ACME_CLI_WAIT state were introduced (commit 2b0c510aff). ACME_CLI_WAIT
computes:
all_cond_ready &= auth->ready;
across all authorizations. A single auth->ready == 0 drives the AND
to zero and the task waits indefinitely for a readiness signal that
will never arrive, since no challenge was published and no external
agent will ever call challenge_ready() for that domain.
Fix it by setting auth->ready = ctx->cfg->cond_ready for already-valid
authorizations, marking them as satisfying all required readiness
conditions so ACME_CLI_WAIT can proceed normally.
This should be backported to 3.4.
Commit 3c923d075 introduced a C99ism by declaring a variable in a for loop,
don't do that, especially since there already is a variable named "i"
declared.
This should fix the build when -std=c89 is used.
This should be backported if commit 3c923d075 is backported.
QUIC packets using a long header contains a Length field. Its value is
the length of the content following it, i.e. the packet number field and
the remaining payload (QUIC frames and TLS AEAD tag).
Computation to determine the packet length is performed in
qc_do_build_pkt(). However this calculation is incorrect when Initial
padding is added on a small enough Initial packet. As length field is
encoded as a varint, it changes the field size which grow from one to
two bytes, reducing in effect the total required padding length from one
byte. However, length value is not updated and thus is one byte bigger
than the final packet payload with padding.
Fix this calculation by reducing the length value after padding size has
been adjusted.
This bug caused the peer to reject such faulty packets. However, its
impact is minor as it only happened only for Initial with small enough
payload. Packets used for ClientHello/ServerHello exchanges should be
large enough and typically not concerned by this bug, except maybe in
case of fragmentation.
This bug was detected by testing QUIC backend with a quiche server. The
server endpoint reported the faulty packets with the following trace :
[2026-06-09T13:42:13.694179158Z ERROR quiche_server] 1b1b961c9c4ae1f470f3687510b120da1f5d5f5a recv failed: InvalidPacket
This must be backported up to 3.0.