haproxy

mirror of https://github.com/haproxy/haproxy.git synced 2026-03-27 12:56:09 -04:00

Author	SHA1	Message	Date
Jacques Heunis	91eb9b082b	BUG/MINOR: freq_ctr: Prevent possible signed overflow in freq_ctr_overshoot_period All of the other bandwidth-limiting code stores limits and intermediate (byte) counters as unsigned integers. The exception here is freq_ctr_overshoot_period which takes in unsigned values but returns a signed value. While this has the benefit of letting the caller know how far away from overshooting they are, this is not currently leveraged anywhere in the codebase, and it has the downside of halving the positive range of the result. More concretely though, returning a signed integer when all intermediate values are unsigned (and boundaries are not checked) could result in an overflow, producing values that are at best unexpected. In the case of flt_bwlim (the only usage of freq_ctr_overshoot_period in the codebase at the time of writing), an overflow could cause the filter to wait for a large number of milliseconds when in fact it shouldn't wait at all. This is a niche possibility, because it requires that a bandwidth limit is defined in the range [2^31, 2^32). In this case, the raw limit value would not fit into a signed integer, and close to the end of the period, the `(elapsed * freq)/period` calculation could produce a value which also doesn't fit into a signed integer. If at the same time `curr` (the number of events counted so far in the current period) is small, then we could get a very large negative value which overflows. This is undefined behaviour and could produce surprising results. 
The most obvious outcome is flt_bwlim sometimes waiting for a large amount of time in a case where it shouldn't wait at all, thereby incorrectly slowing down the flow of data. Converting just the return type from signed to unsigned (and checking for the overflow) prevents this undefined behaviour. It also makes the range of valid values consistent between the input and output of freq_ctr_overshoot_period and with the input and output of other freq_ctr functions, thereby reducing the potential for surprise in intermediate calculations: now everything supports the full 0 - 2^32 range.	2025-11-24 14:10:13 +01:00
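The overflow scenario described in the commit above can be illustrated with a minimal C sketch. It is not HAProxy's actual `freq_ctr_overshoot_period` code: the helper names `expected_events` and `overshoot_events` are hypothetical, and the sketch assumes `period > 0` and `elapsed <= period`. It only shows why the pro-rated budget `(elapsed * freq)/period` can exceed the signed 32-bit range when `freq` lies in [2^31, 2^32), and how an unsigned, clamped result avoids the problem.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not HAProxy code): pro-rated event budget after
 * <elapsed> ms of a <period> ms window allowing <freq> events.
 * The product is computed on 64 bits so it cannot overflow; the quotient
 * fits in 32 bits because elapsed <= period is assumed.
 */
static uint32_t expected_events(uint32_t elapsed, uint32_t period, uint32_t freq)
{
    assert(period > 0 && elapsed <= period);
    return (uint32_t)(((uint64_t)elapsed * freq) / period);
}

/* Hypothetical helper: number of events by which <curr> exceeds the budget,
 * or 0 when there is no overshoot. Staying unsigned and clamping at 0 avoids
 * the "curr - expected" subtraction that cannot be represented as an int
 * when <expected> is above 2^31 and <curr> is small.
 */
static uint32_t overshoot_events(uint32_t curr, uint32_t elapsed,
                                 uint32_t period, uint32_t freq)
{
    uint32_t expected = expected_events(elapsed, period, freq);

    return curr > expected ? curr - expected : 0;
}
```

With `freq = 3000000000`, `period = 1000` and `elapsed = 999`, the budget is 2997000000, which is larger than `INT32_MAX`; a small `curr` such as 10 then yields 0 instead of a meaningless negative value.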
Amaury Denoyelle	2829165f61	BUG/MEDIUM: server: do not use default SNI if manually set A new server feature "sni-auto" has been introduced recently. The objective is to automatically set the SNI value to the host header if no SNI is explicitly set. `668916c1a2` MEDIUM: server/ssl: Base the SNI value to the HTTP host header by default There is an issue with it: server SNI is currently always overwritten, even if explicitly set in the configuration file. Adjust check_config_validity() to ensure the default value is only used if <sni_expr> is NULL. This issue was detected as a memory leak on <sni_expr> was reported when SNI is explicitly set on a server line. This patch is related to github feature request #3081. No need to backport, unless the above patch is.	2025-11-24 11:45:18 +01:00
William Lallemand	5dbf06e205	MINOR: httpclient: complete the https log The httpsclient_log_format variable lacks a few values in the TLS fields that are now available as fetches. On the backend side we have: "%[fc_err]/%[ssl_fc_err,hex]/%[ssl_c_err]/%[ssl_c_ca_err]/%[ssl_fc_is_resumed] %[ssl_fc_sni]/%sslv/%sslc" We now have enough sample fetches to have this equivalent in the httpclient: "%[bc_err]/%[ssl_bc_err,hex]/%[ssl_c_err]/%[ssl_c_ca_err]/%[ssl_bc_is_resumed] %[ssl_bc_sni]/%[ssl_bc_protocol]/%[ssl_bc_cipher]" Instead of the current: "%[bc_err]/%[ssl_bc_err,hex]/-/-/%[ssl_bc_is_resumed] -/-/-"	2025-11-22 12:29:33 +01:00
William Lallemand	0cae2f0515	BUG/MINOR: acme: warning ‘ctx’ may be used uninitialized Please the compiler regarding the maybe-uninitialized warning: src/acme.c: In function ‘cli_acme_chall_ready_parse’: include/haproxy/task.h:215:9: error: ‘ctx’ may be used uninitialized [-Werror=maybe-uninitialized] 215 \| _task_wakeup(t, f, MK_CALLER(WAKEUP_TYPE_TASK_WAKEUP, 0, 0)) \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ src/acme.c:2903:17: note: in expansion of macro ‘task_wakeup’ 2903 \| task_wakeup(ctx->task, TASK_WOKEN_MSG); \| ^~~~~~~~~~~ src/acme.c:2862:26: note: ‘ctx’ was declared here 2862 \| struct acme_ctx *ctx; \| ^~~ Backport to 3.2.	2025-11-21 23:04:16 +01:00
William Lallemand	d77d3479ed	BUG/MINOR: acme: better challenge_ready processing Improve the challenge_ready processing: - do a lookup directly instead of looping in the task tree - only do a task_wakeup when all challenges are ready, to avoid starting the task and stopping it just after - compute the number of remaining challenges to set up - output a message giving the number of remaining challenges to set up and whether the task started again. Backport to 3.2.	2025-11-21 22:47:52 +01:00
William Lallemand	548e7079cd	BUG/MINOR: acme: prevent creating map entries with dns-01 We don't need map entries with dns-01. The patch must be backported to 3.2.	2025-11-21 12:28:41 +01:00
William Lallemand	26093121a3	BUG/MINOR: acme: handle multiple auth with the same name In case of the dns-01 challenge, it is possible to have a domain "example.com" and "*.example.com" in the same request. This will create 2 different auth objects, which need 2 different challenges. However the associated domain is "example.com" for both auth objects. When doing a "challenge_ready", the algorithm will break at the first domain found. But since the same domain can appear multiple times in this case, breaking at the first one prevents all auth objects from reaching a ready state. This patch just removes the break so we can loop on every auth object. Must be backported to 3.2.	2025-11-21 12:28:41 +01:00
Amaury Denoyelle	bbd83e3de9	BUG/MINOR: mux-quic: check access on qcs stream-endpoint Since the following commit, allocation of stream-endpoint has been delayed. The objective is to allocate it only for QCS attached to an upper stream object. commit `e6064c5616` OPTIM: mux-quic: delay FE sedesc alloc to stream creation However, some MUX functions are unsafe as qcs->sd is dereferenced without any check on it which will result in a crash. Fix this by testing that qcs->sd is allocated before using it. This does not need to be backported, unless the above patch is.	2025-11-21 11:16:07 +01:00
Frederic Lecaille	91f479604e	BUG/MEDIUM: quic-be: quic_conn_closed buffer overflow This bug impacts only the backends. Recent commits have modified quic_rx_pkt_parse() for the QUIC backend to handle the retry token, and version negotiation. This function is called for the quic_conn even when it is in closing state (so for the quic_conn_closed struct). The quic_conn struct and quic_conn_closed struct share some members thanks to the leading QUIC_CONN_COMMON struct. The recent modification impacts some members which do not exist for the quic_conn_closed struct, leading to buffer overflows if modified. For the backends only this patch: 1- silently drops the Retry packet (received/parsed only by backends) 2- silently drops the Initial packets received in closing state This is safe for the Initial packets because in closing state the datagrams are entirely skipped thanks to qc_rx_check_closing() in quic_dgram_parse(). No backport needed because the backend support arrived with the current dev.	2025-11-21 10:49:44 +01:00
Amaury Denoyelle	e6064c5616	OPTIM: mux-quic: delay FE sedesc alloc to stream creation On frontend side, a stream-endpoint is allocated on every qcs_new() invocation. However, this is only used for bidirectional request streams. This patch delays stream-endpoint allocation to qcs_attach_sc(), just prior to the instantiation of the upper stream object. This does not bring any behavior change but is a nice optimization.	2025-11-21 10:34:08 +01:00
Amaury Denoyelle	4fb8908605	BUG/MINOR: mux-quic: fix sedesc leak on BE side On backend side, streams are instantiated prior to their QCS MUX counterpart. Thus, QCS can reuse the stream-endpoint already allocated with the streams, either on qmux_init() or attach operation. However, a stream-endpoint is also always allocated in every qcs_new() invocation. For backend QCS, it is thus overwritten on qmux_init()/attach operation. This causes a memleak. Fix this by restricting allocation of stream-endpoint only for frontend connection. This does not need to be backported.	2025-11-21 10:34:08 +01:00
Amaury Denoyelle	9f16c64a8c	MINOR: h3: adjust sedesc update for known input payload len	2025-11-21 10:34:08 +01:00
Christopher Faulet	0629ce8f4b	BUG/MEDIUM: cli: State the cli have no more data to deliver if it yields A regression was introduced in the commit `2d7e3ddd4` ("BUG/MEDIUM: cli: do not return ACKs one char at a time"). When the CLI is processing a command line, we no longer send the response immediately. It is especially useful for clients sending a bunch of commands with very short responses. However, in that state, the CLI applet must state it has no more data to deliver. Otherwise it will be woken up again and again because data are found in its output buffer with no blocking conditions. In the worst cases, if the command rate is really high, this can trigger the watchdog. This patch must be backported where the patch above is, so probably as far as 3.0.	2025-11-21 10:00:15 +01:00
Christopher Faulet	dfdccbd2af	BUG/MEDIUM: applet: Fix conditions to detect spinning loop with the new API There was a mixup between read/send events and ability for an applet to receive and send. The fix seems obvious by reading it. The call-rate must be incremented when nothing was received from the applet while it was allowed and nothing was sent to the applet while it was allowed. This patch must be backported as far as 3.0.	2025-11-21 09:41:05 +01:00
Willy Tarreau	4cbff2cad9	MINOR: limits: display the computed maxconn using ha_notice() The computed maxconn was only displayed in verbose or debug modes. This is too bad because lots of users just don't know what they're starting with and can be trapped when an environment changes. Let's use ha_notice() instead of a conditional fprintf() so that it gets displayed right after the other startup messages, hoping that users will get used to seeing it and more easily spot anomalies. See github issue #3191 for more context.	2025-11-20 18:38:09 +01:00
Willy Tarreau	05c409f1be	BUG/MEDIUM: connection/ssl: also fix the ssl_sock_io_cb() regarding idle list The fix in commit `9481cef948` ("BUG/MEDIUM: connection: do not reinsert a purgeable conn in idle list") is also needed for ssl_sock_io_cb() which can also release an idle connection and must perform the same checks. This fix must be backported to all stable versions containing the fix above.	2025-11-20 17:19:50 +01:00
Amaury Denoyelle	b2664d4450	BUG/MINOR: quic: flag conn with CO_FL_FDLESS on backend side Connection struct defines a handle which can point to either a FD or a quic_conn. In the latter case, CO_FL_FDLESS must be set. This is already the case on frontend side. This patch fixes QUIC backend support. Before setting connection handle member to a quic_conn instance, ensure that CO_FL_FDLESS flag is set on the connection. Prior to this patch, a crash can occur in "show sess all". No need to backport.	2025-11-20 16:44:03 +01:00
Amaury Denoyelle	cd2962ee64	MINOR: quic: store source address for backend conns quic_conn has a local_addr member which is used to store the connection source address. On backend side, this member is initialized to NULL as the address is not yet known prior to connect. With this patch, quic_connect_server() is extended so that local_addr is updated after connect() success. Also, quic_sock_get_src() is completed for the backend side which now returns local_addr member. This step is necessary to properly support fetches bc_src/bc_src_port.	2025-11-20 16:44:03 +01:00
Christopher Faulet	0a7f3954b5	BUG/MEDIUM: config: Use the mux protocol ALPN by default for listeners if forced Since the commit `5003ac7fe` ("MEDIUM: config: set useful ALPN defaults for HTTPS and QUIC"), the ALPN is set by default to "h2,http/1.1" for HTTPS listeners. However, it is in conflict with the forced mux protocol, if any. Indeed, with the "proto" keyword, the mux can be forced. In that case, some combinations with the default ALPN will trigger connection errors. For instance, by setting "proto h2", it will not be possible to use the H1 multiplexer. So we must take care not to advertise it in the ALPN. Worse, since the commit above, most modern HTTP clients will try to use H2 because it is advertised in the ALPN. Setting "proto h1" on the bind line will thus cause all the traffic to be rejected in error. To fix the issue, and thanks to previous commits, if it is defined, we are now relying on the ALPN defined by the mux protocol by default. The H1 multiplexer (only the one that can be forced) defines it to "http/1.1" while the H2 multiplexer defines it to "h2". So by default, if one or another of these muxes is forced, and if no ALPN is set, the mux ALPN is used. Other multiplexers are not defining any default ALPN for now, because it is useless. In addition, only the listeners are concerned because there is no default ALPN on the server side. Finally, no test is performed if the ALPN is forced on the bind line. It is the user's responsibility to properly configure their listeners (at least for now). This patch depends on: * MINOR: config: Do proto detection for listeners before checks about ALPN * MINOR: muxes: Support an optional ALPN string when defining mux protocols The series must be backported as far as 2.8.	2025-11-20 16:14:52 +01:00
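The selection order described in the commit above can be sketched in C as follows. This is only an illustration, not HAProxy's code: `struct listener_sketch` and `pick_listener_alpn` are hypothetical names, and the ALPN is kept as a plain string here whereas the related commit stores it in binary format. The sketch assumes an HTTPS listener, for which the generic default is "h2,http/1.1".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view of the two inputs that matter for the decision. */
struct listener_sketch {
    const char *bind_alpn; /* ALPN set explicitly on the bind line, or NULL */
    const char *mux_alpn;  /* ALPN defined by a forced mux ("proto"), or NULL */
};

/* Returns the ALPN an HTTPS listener would advertise in this sketch:
 * 1. an ALPN set on the bind line always wins (no consistency check);
 * 2. otherwise the ALPN defined by the forced mux protocol, if any;
 * 3. otherwise the generic HTTPS default.
 */
static const char *pick_listener_alpn(const struct listener_sketch *l)
{
    assert(l != NULL);

    if (l->bind_alpn)
        return l->bind_alpn;
    if (l->mux_alpn)
        return l->mux_alpn;
    return "h2,http/1.1";
}
```

With "proto h1" forced and no explicit ALPN, the sketch returns "http/1.1", so H2 is no longer advertised to clients that the H1 multiplexer could not serve.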
Christopher Faulet	2ef8b91a00	MINOR: config: Do proto detection for listeners before checks about ALPN The verification of any forced mux protocol, via the "proto" keyword, for listeners is now performed before any tests on the ALPN. It will be mandatory to be able to force the default ALPN, if not forced on the bind line. This patch will be mandatory for the next fix.	2025-11-20 16:14:52 +01:00
Christopher Faulet	8e08a635eb	MINOR: muxes: Support an optional ALPN string when defining mux protocols When a multiplexer protocol is defined, it is now possible to specify the ALPN it supports, in binary format. This info is optional. For now only the h2 and the h1 multiplexers define an ALPN because this will be mandatory for a fix. But this could be used in the future for different purposes. This patch will be mandatory for the next fix.	2025-11-20 16:14:52 +01:00
Olivier Houchard	e9d34f991e	BUG/MEDIUM: queues: Don't forget to unlock the queue before exiting In assign_server_and_queue(), there's a rare case when the server was full, so we created a pendconn, another server was considered but in the meanwhile the pendconn was unqueued already, so we just left the function. We did so, however, while still holding the queue lock, which will ultimately lead to a deadlock, and ultimately the watchdog would kill the process. To fix that, just unlock the queue before leaving. This should be backported to 3.2.	2025-11-20 13:57:06 +01:00
William Lallemand	e0665d4ffe	BUG/MINOR: acme: alert when the map doesn't exist at startup When configuring an acme section with the 'map' keyword, the user must use an existing map. If the map doesn't exist, a log will be emitted when trying to add the challenge to the map. This patch changes the behavior by checking at startup if the map exists, so haproxy would warn and won't start with a non-existing map. This must be backported to 3.2.	2025-11-20 12:22:19 +01:00
Frederic Lecaille	fab7da0fd0	BUG/MEDIUM: quic-be/ssl_sock: TLS callback called without connection Contrary to TCP, QUIC does not SSL_free() its SSL * object when its ->close() XPRT callback is called. This has the side effect of triggering some BUG_ON(!conn), with <conn> the connection, from TLS callbacks registered at configuration parsing time, so after this <conn> has been released. This is the case for instance with ssl_sock_srv_verifycbk() whose role is to add some checks to the built-in server certificate verification process. This patch prevents dereferencing the pointer to <conn> inside several callbacks shared between TCP and QUIC. Thank you to @InputOutputZ for their report in GH #3188. As the QUIC backend feature arrived with the current 3.3 dev, no need to backport.	2025-11-20 11:36:57 +01:00
Willy Tarreau	8438ca273f	MINOR: limits: explain a bit better what to do when fd limits are exceeded As shown in github issue #3191, the error message shown when FD limits are exceeded is not very useful as-is, since the current hard limit is not displayed, and no suggestion is made about what to change in the config. Let's explain about maxconn/ulimit-n/fd-hard-limit, suggest dropping them or setting them to a context-based value at roughly 49% of the current limit minus the known used FDs for listeners and checks. This allows common "large" hard limits to report mostly round maxconns. Example: [ALERT] (25330) : [haproxy.main()] Cannot raise FD limit to 4001020, current limit is 1024 and hard limit is 4096. You may prefer to let HAProxy adjust the limit by itself; for this, please just drop any 'maxconn' and 'ulimit-n' from the global section, and possibly add 'fd-hard-limit' lower than this hard limit. You may also force a new 'maxconn' value that is a bit lower than half of the hard limit minus listeners and checks. This results in roughly 1500 here.	2025-11-20 08:44:52 +01:00
Willy Tarreau	91d4f4f618	MINOR: limits: keep a copy of the rough estimate of needed FDs in global struct It's always a pain to guess the number of FDs that can be needed by listeners, checks, threads, pollers etc. We have this estimate in global.maxsock before calling set_global_maxconn(), but we lose it the line after. Let's copy it into global.est_fd_usage and keep it. This will be helpful to try to provide more accurate suggestions for maxconn.	2025-11-20 08:44:52 +01:00
Frederic Lecaille	2c6720a163	MINOR: quic: unneeded xprt context variable passed as parameter The quic_conn ->xprt_ctx was passed to qc_send_ppkts(), and the quic_conn was retrieved from this context to be used inside this function, while the context itself was not used at all by this function. This patch simply passes the quic_conn directly to qc_send_ppkts(). This is all this function needs.	2025-11-20 08:17:44 +01:00
Amaury Denoyelle	d54d78fe9a	BUG/MINOR: quic: fix FD usage for quic_conn_closed on backend side On the frontend side, QUIC transfer can be performed either via a connection owned FD or multiplexed on the listener one. When a quic_conn is freed and converted to quic_conn_closed instance, its FD if open is closed and all exchanges are now multiplexed via the listener FD. This is different for the backend as connections only have the choice to use their owned FD. Thus, special care must be taken when freeing a connection and converting it to a quic_conn_closed instance. In this case, qc_release_fd() is delayed to the quic_conn_closed release. Furthermore, when the FD is transferred, its iocb and owner fields are updated to the new quic_conn_closed instance. Without it, a crash will occur when accessing the freed quic_conn tasklet. A newly dedicated handler quic_conn_closed_sock_fd_iocb is used to ensure access to quic_conn_closed members only.	2025-11-19 16:02:22 +01:00
Amaury Denoyelle	46c5c232d7	BUG/MINOR: quic: do not decrement jobs for backend conns jobs is a global counter which serves to account activity through the whole process. Soft-stop procedure will wait until this counter is reset to zero. jobs is not used for backend connections. Thus, it is not incremented when a QUIC backend connection is instantiated as expected. However, decrement is performed on all sides during quic_conn_release(). This causes the counter to wrap. Fix this by decrementing jobs only for frontend connections. Without this patch, soft stop procedure will hang indefinitely if QUIC backend connections were in use.	2025-11-19 16:02:22 +01:00
Amaury Denoyelle	1a22caa6ed	MINOR: quic: fix trace on quic_conn_closed release Adjust leaving trace of quic_release_cc_conn() so that the end of the function is properly reported.	2025-11-19 16:02:22 +01:00
Amaury Denoyelle	e55bcf5746	BUG/MINOR: mux-quic: implement max-reuse server parameter Properly implement support for max-reuse server keyword. This is done by adding a total count of streams seen for the whole connection. This value is used in avail_streams callback.	2025-11-19 16:02:22 +01:00
William Lallemand	c8540f7437	BUG/MINOR: ssl: remove dead code in ssl_sock_from_buf() When haproxy is compiled in -O0, the SSL_get_max_early_data() symbol is used in the generated assembly, however -O2 seems to remove this symbol when optimizing the code. It happens because `if conn_is_back(conn)` and `if (objt_listener(conn->target))` are opposed conditions, which means we never use the branch when objt_listener(conn->target) is true. This patch removes the dead code. Bonus: SSL_get_max_early_data() is not implemented in rustls, and that's the only thing preventing haproxy from starting with it. This can be backported to every stable branch.	2025-11-19 11:00:05 +01:00
William Lallemand	177816d2b8	BUG/MINOR: acme: P-256 doesn't work with openssl >= 3.0 When trying to use the P-256 curve in the acme configuration with OpenSSL 3.x, the generation of the account was failing because OpenSSL doesn't return a NIST or SECG curve name, but an ANSI X9.62 one. Since the ANSI X9.62 curve names were not in the list, it couldn't match anything supported. This patch fixes the issue by adding both prime192v1 and prime256v1 names in the struct curve array which is used during curve parsing. Must be backported to 3.2.	2025-11-18 11:34:28 +01:00
William Lallemand	9bf01a0d29	BUG/MINOR: mworker: wrong signals during startup Since the new master-worker model in 3.1, signals are registered in step_init_3(). However, those signals were supposed to be registered only for the worker or the standalone mode. It would call the wrong callback in the master even during configuration parsing. The patch sets the signal handlers to NULL for the master so it does nothing until they really are registered. Must be backported as far as 3.1.	2025-11-18 10:27:34 +01:00
William Lallemand	709cde6d08	BUG/MEDIUM: mworker: signals inconsistencies during startup and reload Since haproxy 3.1, the master-worker mode changed to let the worker parse the configuration instead of the master. Previously, signals were blocked during configuration parsing and unblocked before entering the polling loop of the master. This way it was impossible to start a reload during the configuration parsing. But with the new model, the polling loop is started in the master before the configuration parsing is finished, and the signals are still unblocked at this step. Meaning that it is possible to start a reload while the configuration is being parsed. This patch reintroduces the behavior of blocking the signals during configuration parsing adapted to the new model: - Before the exec() of the reload, signals are blocked. - When entering the polling loop, the SIGCHLD is unblocked because it is required to get a failure during configuration parsing in the worker - Once the configuration is parsed, upon success in _send_status() or upon failure in run_master_in_recovery_mode() all signals are unblocked. This patch must be backported as far as 3.1.	2025-11-18 10:05:42 +01:00
William Lallemand	b38405d156	CLEANUP: startup: move confusing msg variable Move the char *msg variable declared in main() in a sub-block since there are already multiple msg variables in other sub-blocks in this function. Also make it const.	2025-11-18 09:43:25 +01:00
Frederic Lecaille	37d01eea37	BUG/MEDIUM: quic-be: prevent use of MUX for 0-RTT sessions without secrets The QUIC backend crashes when its peer does not support 0-RTT. In this case, when the sessions are reused, no early-data level secrets are derived by the TLS stack. This leads to crashes from qc_send_mux() which does not suppose that both early-data level (qc->eel) and application level (qc->ael) cipher levels could be uninitialized. To fix this: - prevent qc_send_mux() from sending data if these two encryption levels are not initialized. In this case it returns QUIC_TX_ERR_NONE; - avoid waking up the MUX from XPRT ->start() callback if the MUX is ready but without early-data level secrets to send them; - ensure the MUX is woken up by qc_ssl_do_handshake() after handshake completion if it is ready calling qc_notify_send() Thank you to @InputOutputZ for having reported this issue in GH #3188. No need to backport because QUIC backends are a current 3.3 development feature.	2025-11-17 15:40:24 +01:00
William Lallemand	0367227375	MEDIUM: mworker: set the mworker-max-reloads to 50 There was no mworker-max-reloads value by default: it was set to INT_MAX, so it was impossible to reach. The default value is now 50, which is still high, but no worker should undergo that many reloads. This means that a worker will be killed with SIGTERM if it reaches this many reloads.	2025-11-17 11:54:30 +01:00
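The three steps listed in the commit above map naturally onto POSIX signal masks. The following C sketch uses only the standard `sigprocmask()` API; the helper names (`block_all_signals`, `unblock_sigchld`, `unblock_all_signals`, `is_blocked`) are hypothetical and are not HAProxy's own functions. A multithreaded process would typically use `pthread_sigmask()` instead.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Step 1 (before the exec() of the reload): block every signal.
 * SIGKILL and SIGSTOP cannot be blocked and are silently ignored.
 */
static int block_all_signals(void)
{
    sigset_t set;

    sigfillset(&set);
    return sigprocmask(SIG_SETMASK, &set, NULL);
}

/* Step 2 (when entering the polling loop): unblock SIGCHLD only, so that
 * a worker failing during configuration parsing can still be noticed.
 */
static int unblock_sigchld(void)
{
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, SIGCHLD);
    return sigprocmask(SIG_UNBLOCK, &set, NULL);
}

/* Step 3 (configuration parsed, on success or failure): unblock everything. */
static int unblock_all_signals(void)
{
    sigset_t set;

    sigemptyset(&set);
    return sigprocmask(SIG_SETMASK, &set, NULL);
}

/* Returns 1 if <sig> is currently blocked, 0 if not, -1 on error. */
static int is_blocked(int sig)
{
    sigset_t cur;

    assert(sig > 0);
    if (sigprocmask(SIG_BLOCK, NULL, &cur) != 0)
        return -1;
    return sigismember(&cur, sig);
}
```

Because the mask is inherited across `exec()`, blocking before the reload keeps a second reload signal pending until the mask is lifted, rather than losing it or handling it mid-parse.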
Amaury Denoyelle	c67a614e45	MINOR: quic: remove <ipv4> arg from qc_new_conn() Remove <ipv4> argument from qc_new_conn(). This parameter is unnecessary as it can be derived from the family type of the addresses also passed as argument.	2025-11-17 10:20:54 +01:00
Amaury Denoyelle	133f100467	MINOR: quic: refactor qc_new_conn() prototype The objective of this patch is to streamline qc_new_conn() usage so that it is similar for frontend and backend sides. Previously, several parameters were set only for frontend connections. These arguments are replaced by a single quic_rx_packet argument, which represents the INITIAL packet triggering the connection allocation on the server side. For a QUIC client endpoint, it remains NULL. This usage is considered more explicit. As a minor change, <target> is moved as the first argument of the function. This is considered useful as this argument determines whether the connection is a frontend or backend entry. Along with these changes, qc_new_conn() documentation has been reworded so that it is now up-to-date with the newest usage.	2025-11-17 10:13:40 +01:00
Amaury Denoyelle	035c026220	MINOR: quic: support multiple random CID generation for BE side When a new backend connection is instantiated, a CID is first randomly generated. It will serve as the first DCID for incoming packets from the server. Prior to this patch, if the generated CID collided with an entry from another connection, an error was reported and the connection could not be allocated. This patch improves this procedure by implementing retries when a collision occurs. Now, at most three attempts will be performed before giving up. This is the same procedure already performed for CIDs instantiated after RETIRE_CONNECTION_ID frame parsing. Along with this functional change, qc_new_conn() is refactored for backend instantiation. The CID generation is extracted from it and the value is passed as an argument. This is considered cleaner as the code is more similar between frontend and backend sides.	2025-11-17 10:11:04 +01:00
Amaury Denoyelle	8720130cc7	MINOR: quic: do not use quic_newcid_from_hash64 on BE side quic_newcid_from_hash64 is an external callback. If defined, it serves as a CID generation method, as an alternative to the default random implementation. This mechanism was not correctly implemented on the backend side. Indeed, <hash64> quic_conn member is only set for frontend connections. The simplest solution would be to properly define it also for backend ones. However, quic_newcid_from_hash64 derivation is really only useful for the frontend side for now. Thus, this patch disables using it on the backend side in favor of the default random generator. To implement this, quic_cid_generate() is split in two functions, for both methods of CIDs generation. This is the responsibility of the caller to select the proper method. On backend side, only random implementation is now used.	2025-11-17 10:11:04 +01:00
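The bounded-retry procedure described in the commit above (at most three attempts before giving up) can be sketched in C as shown below. Everything here is hypothetical and only illustrates the pattern: the CID is reduced to a 64-bit value, the generator and the collision check are passed as callbacks, and `seq_gen`/`seq_in_use` are a deterministic stand-in for a random generator and a CID lookup so that the behaviour is easy to follow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CID_MAX_ATTEMPTS 3 /* "at most three attempts" per the commit message */

typedef uint64_t (*cid_gen_fn)(void *ctx);
typedef bool (*cid_in_use_fn)(uint64_t cid, void *ctx);

/* Draws candidate CIDs until one does not collide, up to CID_MAX_ATTEMPTS
 * times. Returns true and stores the CID in <out> on success, false if
 * every attempt collided.
 */
static bool cid_generate_with_retry(cid_gen_fn gen, cid_in_use_fn in_use,
                                    void *ctx, uint64_t *out)
{
    assert(gen && in_use && out);

    for (int attempt = 0; attempt < CID_MAX_ATTEMPTS; attempt++) {
        uint64_t cid = gen(ctx);

        if (!in_use(cid, ctx)) {
            *out = cid;
            return true;
        }
    }
    return false;
}

/* Deterministic stand-ins for illustration: the generator returns
 * 0, 1, 2, ... and every value below <taken_below> counts as a collision.
 */
struct seq_ctx {
    uint64_t next;
    uint64_t taken_below;
};

static uint64_t seq_gen(void *ctx)
{
    return ((struct seq_ctx *)ctx)->next++;
}

static bool seq_in_use(uint64_t cid, void *ctx)
{
    return cid < ((struct seq_ctx *)ctx)->taken_below;
}
```

With `taken_below = 2`, the first two candidates collide and the third attempt succeeds with CID 2; with `taken_below = 5`, all three attempts collide and the caller has to report the failure.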
Christopher Faulet	fc6e3e9081	MINOR: stick-tables: Rename stksess shards to use buckets The shard keyword is already used by the peers and on the server lines. And it is unrelated to the session keys distribution. So instead of talking about shard for the session key hashing, we now use the term "bucket".	2025-11-17 07:42:51 +01:00
Frederic Lecaille	54eeda4b01	BUG/MINOR: quic-be: backend SSL session reuse fix (OpenSSL 3.5) This bug impacts only the QUIC backends when haproxy is compiled against OpenSSL 3.5 with QUIC API (HAVE_OPENSSL_QUIC). The QUIC clients could not reuse their SSL session because the TLS tickets received from the servers could not be provided to the TLS stack. This should be done when the stack calls ha_quic_ossl_crypto_recv_rcd() (OSSL_FUNC_SSL_QUIC_TLS_CRYPTO_RECV_RCD callback). According to the OpenSSL team, an SSL_read() call must be done after the handshake completion. It seems the correct location is at the same level as for SSL_process_quic_post_handshake() for quictls. Thank you to @mattcaswell, @Sashan and @vdukhovni for having helped in solving this issue. Must be backported to 3.1	2025-11-14 17:50:49 +01:00
Frederic Lecaille	644bf585c3	CLEANUP: quic: Missing successful SSL handshake backend trace (OpenSSL 3.5) This very minor issue impacts only the backend when compiled against OpenSSL 3.5 with QUIC API (HAVE_OPENSSL_QUIC). The "SSL handshake OK" trace was not dumped by a TRACE() call. This was very annoying when debugging. Modify the concerned code section, which is a bit ugly, and simplify it. The TRACE() call is done at a unique location from now on. Should be backported to 3.2 to ease any further backport.	2025-11-14 17:50:49 +01:00
Frederic Lecaille	f0c52f7160	BUG/MINOR: quic-be: missing version negotiation This bug impacts only the QUIC clients (or backends). The version negotiation was not supported at all for them. This is an oversight. Contrary to the QUIC server, which chooses the negotiated version after having received the transport parameters (in the ClientHello message), the client selects the negotiated version from the version field of the first Initial packet. Indeed, the server transport parameters are inside the ServerHello messages ciphered into Handshake packets. This non-intrusive patch does not impact the QUIC server implementation. It only selects the negotiated version from the first Initial packet received from the server and consequently initializes the TLS cipher context. Thank you to @InputOutputZ for having reported this issue in GH #3178. No need to backport because the QUIC backends support arrives with 3.3.	2025-11-14 17:37:34 +01:00
Willy Tarreau	0746aa68b8	BUG/MINOR: check: fix QUIC check test when QUIC disabled Latest commit `ef206d441c` ("MINOR: check: ensure QUIC checks configuration coherency") introduced a regression when QUIC is not compiled in. Indeed, not specifying a check proto sets mux_proto to NULL, which also happens to be the value of get_mux_proto("QUIC"), so it complains about QUIC. Let's add a non-null check in addition to this. No backport is needed.	2025-11-14 17:27:53 +01:00
Willy Tarreau	4a6dec7193	DEBUG: servers: add a few checks for stress-testing idle conns The latest idle conns fix `9481cef948` ("BUG/MEDIUM: connection: do not reinsert a purgeable conn in idle list") addresses a very hard-to-hit case which manifests itself when an attempt to reuse a connection fails because conn->mux is NULL: Program terminated with signal SIGSEGV, Segmentation fault. #0 0x0000655410b8642c in conn_backend_get (reuse_mode=4, srv=srv@entry=0x6554378a7140, sess=sess@entry=0x7cfe140948a0, is_safe=is_safe@entry=0, hash=hash@entry=910818338996668161) at src/backend.c:1390 1390 if (conn->mux->takeover && conn->mux->takeover(conn, i, 0) == 0) { However the condition that leads to this situation can be detected earlier, by the presence of the connection in the toremove_list, whose race window is much larger and easier to detect. This patch adds a few BUG_ON_STRESS() at selected places that can detect this condition. When built with -DDEBUG_STRESS and run under stress with two distinct processes communicating over H2 over SSL, under a stress of 400-500k req/s, the front process usually crashes in the first 10-30s triggering in _srv_add_idle() if the fix above is reverted (and it does not crash with the fix). This is mainly included to serve as an illustration of how to instrument the code for seamless stress testing.	2025-11-14 17:00:17 +01:00
Amaury Denoyelle	9481cef948	BUG/MEDIUM: connection: do not reinsert a purgeable conn in idle list A recent patch was introduced to fix a rare race condition in idle connection code which would result in a crash. The issue occurs when a MUX IO handler runs on top of a connection moved into the purgeable list. The connection would be considered as present in the idle list instead, and reinserted in it at the end of the handler while still in the purge list. `096999ee20` BUG/MEDIUM: connections: permit to permanently remove an idle conn This patch solves the described issue. However, it introduces another bug as it may clear connection flags when removing a connection from its parent list. These flags now serve primarily as a status which indicates that the connection is accounted by the server. When a backend connection is freed, server idle/used counters are decremented according to these flags. With the above patch, an incorrect counter could be adjusted and thus wrapping would occur. The first impact of this bug is that it may distort the estimated number of connections needed by servers, which would result either in poor reuse rate or too many idle connections kept. Another noticeable impact is that it may prevent server deletion. The main problem of the original and current issues is that connection flags are misinterpreted as telling if a connection is present in the idle list. As already described here, in fact these flags are solely a status which indicates that the connection is accounted in server counters. 
Thus, here are the definitive conclusions that can be drawn: * (conn->flags & CO_FL_LIST_MASK) == 1: the connection is accounted by the server; it may or may not be present in the idle list * (conn->flags & CO_FL_LIST_MASK) == 0: the connection is not accounted and not present in the idle list The discussion above does not mention the session list, but a similar pattern can be observed when the CO_FL_SESS_IDLE flag is set. To keep the original issue solved and fix the current one, the IO MUX handler prologues are rewritten. Now, flags are no longer checked for list membership and the LIST_INLIST macro is used instead. This is definitely clearer with regard to the conn_in_list purpose here. At the end of IO MUX handlers, conn idle flags may be checked if conn_in_list was true, to reinsert the connection either in the idle or safe list. This is considered safe as no function should modify idle flags when a connection is not stored in a list, except during the conn_free() operation. This patch must be backported to every stable version after revert of the above commit. It should be applicable up to 3.0 without any issue. On 2.8 and below, the <idle_list> connection member does not exist. It should be safe to check the <leaf_p> tree node as a replacement.	2025-11-14 16:06:34 +01:00
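The distinction drawn in 9481cef948 between "accounted by the server" (flags) and "present in the idle list" (list membership) can be illustrated with a small standalone sketch. Everything here is a simplified assumption made for the example: the `struct list` helpers mimic an intrusive circular list, `FL_IDLE_ACCOUNTED` stands in for the CO_FL_LIST_MASK bits, and `mux_io_handler()`, `conn_move_to_purge()` and `conn_release()` are hypothetical names, not the haproxy functions. Locking and the separate purge list are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal intrusive circular list; an element pointing to itself is detached. */
struct list { struct list *n, *p; };

static void list_init(struct list *l) { l->n = l->p = l; }

/* Membership test based on the element itself, in the spirit of LIST_INLIST. */
static bool list_inlist(const struct list *el) { return el->n != el; }

static void list_append(struct list *head, struct list *el)
{
    el->p = head->p;
    el->n = head;
    head->p->n = el;
    head->p = el;
}

static void list_del_init(struct list *el)
{
    el->n->p = el->p;
    el->p->n = el->n;
    list_init(el);
}

#define FL_IDLE_ACCOUNTED 0x1u  /* hypothetical stand-in for CO_FL_LIST_MASK */

struct conn {
    unsigned int flags;     /* accounting status only, NOT list membership */
    struct list idle_list;  /* attachment point into the server's idle list */
};

struct server {
    struct list idle_conns;
    unsigned int curr_idle; /* driven by flags, decremented on release */
};

/* Moving to purge detaches the conn but keeps its flags: still accounted. */
static void conn_move_to_purge(struct conn *c)
{
    list_del_init(&c->idle_list);
}

/* Handler sketch: membership is sampled from the list, never from flags.
 * Returns true if the connection was put back into the idle list.
 */
static bool mux_io_handler(struct server *srv, struct conn *c)
{
    bool conn_in_list = list_inlist(&c->idle_list);

    if (conn_in_list)
        list_del_init(&c->idle_list);  /* own the conn while working on it */

    /* ... I/O processing would happen here ... */

    if (conn_in_list && (c->flags & FL_IDLE_ACCOUNTED)) {
        list_append(&srv->idle_conns, &c->idle_list);
        return true;
    }
    return false;  /* purgeable conn: must not be reinserted */
}

/* Release adjusts the counter from flags, independently of membership. */
static void conn_release(struct server *srv, struct conn *c)
{
    if (list_inlist(&c->idle_list))
        list_del_init(&c->idle_list);
    if (c->flags & FL_IDLE_ACCOUNTED) {
        srv->curr_idle--;
        c->flags &= ~FL_IDLE_ACCOUNTED;
    }
}
```

With this split, a connection moved to purge keeps its flag, so the counter is decremented exactly once at release, while the handler never reinserts it because the list test reports it as detached.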
Amaury Denoyelle	d79295d89b	Revert "BUG/MEDIUM: connections: permit to permanently remove an idle conn" The target patch fixes a rare race condition which happens when a MUX IO handler is working on a connection already moved into the purge list. In this case, the handler would incorrectly move the connection back into the idle list. To fix this, conn_delete_from_tree() was extended to remove flags along with the connection from the idle list. This was performed when the connection is moved into the purge list. However, it introduces another issue related to the idle server connection accounting. Thus it is necessary to revert it prior to the incoming newer fix. This patch must be backported to every version where the original commit is present.	2025-11-14 16:06:34 +01:00


20327 commits