haproxy

mirror of https://github.com/haproxy/haproxy.git synced 2026-04-15 21:59:41 -04:00

Author	SHA1	Message	Date
Amaury Denoyelle	47dff5be52	MINOR: quic: implement cc-algo server keyword Extend QUIC server configuration so that congestion algorithm and maximum window size can be set on the server line. This can be achieved using quic-cc-algo keyword with a syntax similar to a bind line. This should be backported up to 3.3 as this feature is considered as necessary for full QUIC backend support. Note that this relies on the series of previous commits which should be picked first.	2025-12-01 15:53:58 +01:00
Amaury Denoyelle	979588227f	MINOR: quic: define quic_cc_algo as const Each QUIC congestion algorithm is defined as a structure with callbacks in it. Every quic_conn has a member pointing to the configured algorithm, inherited from the bind-conf keyword or to the default CUBIC value. Convert all these definitions to const. This ensures that there never will be an accidental modification of a globally shared structure. This also requires to mark quic_cc_algo field in bind_conf and quic_cc as const.	2025-12-01 15:05:41 +01:00
Willy Tarreau	36133759d3	[RELEASE] Released version 3.4-dev0 Released version 3.4-dev0 with the following main changes : - MINOR: version: mention that it's development again	2025-11-26 16:12:45 +01:00
Willy Tarreau	e8d6ffb692	MINOR: version: mention that it's development again This essentially reverts `d8ba9a2a92`.	2025-11-26 16:11:47 +01:00
Willy Tarreau	d8ba9a2a92	MINOR: version: mention that 3.3 is stable now This version will be maintained up to around Q1 2027. The INSTALL file also mentions it.	2025-11-26 15:54:30 +01:00
Amaury Denoyelle	49e6fca51b	MINOR: quic: use separate global quic_conns FE/BE lists Each quic_conn instance is stored in a global list. Its purpose is to be able to loop over all known connections during "show quic". Split this into two separate lists for frontend and backend usage. Another change is that closing backend connections do not move into quic_conns_clo list. They remain instead in their original list. The objective of this patch is to reduce the contention between the two sides. Note that this prevents backend connections from being listed in "show quic" now. This will be adjusted in a future patch.	2025-11-25 14:30:18 +01:00
Amaury Denoyelle	a5801e542d	MINOR: quic: split global CID tree between FE and BE sides QUIC CIDs are stored in a global tree. Prior to this patch, CIDs used on both frontend and backend sides were mixed together. This patch implements CID storage separation between FE and BE sides. The original tree quic_cid_trees is split as quic_fe_cid_trees/quic_be_cid_trees. This patch should reduce contention between frontend and backend usages. Also, it should reduce the risk of random CID collision.	2025-11-25 14:30:18 +01:00
Jacques Heunis	91eb9b082b	BUG/MINOR: freq_ctr: Prevent possible signed overflow in freq_ctr_overshoot_period All of the other bandwidth-limiting code stores limits and intermediate (byte) counters as unsigned integers. The exception here is freq_ctr_overshoot_period which takes in unsigned values but returns a signed value. While this has the benefit of letting the caller know how far away from overshooting they are, this is not currently leveraged anywhere in the codebase, and it has the downside of halving the positive range of the result. More concretely though, returning a signed integer when all intermediate values are unsigned (and boundaries are not checked) could result in an overflow, producing values that are at best unexpected. In the case of flt_bwlim (the only usage of freq_ctr_overshoot_period in the codebase at the time of writing), an overflow could cause the filter to wait for a large number of milliseconds when in fact it shouldn't wait at all. This is a niche possibility, because it requires that a bandwidth limit is defined in the range [2^31, 2^32). In this case, the raw limit value would not fit into a signed integer, and close to the end of the period, the `(elapsed * freq)/period` calculation could produce a value which also doesn't fit into a signed integer. If at the same time `curr` (the number of events counted so far in the current period) is small, then we could get a very large negative value which overflows. This is undefined behaviour and could produce surprising results. 
The most obvious outcome is flt_bwlim sometimes waiting for a large amount of time in a case where it shouldn't wait at all, thereby incorrectly slowing down the flow of data. Converting just the return type from signed to unsigned (and checking for the overflow) prevents this undefined behaviour. It also makes the range of valid values consistent between the input and output of freq_ctr_overshoot_period and with the input and output of other freq_ctr functions, thereby reducing the potential for surprise in intermediate calculations: now everything supports the full 0 - 2^32 range.	2025-11-24 14:10:13 +01:00
Christopher Faulet	8e08a635eb	MINOR: muxes: Support an optional ALPN string when defining mux protocols When a multiplexer protocol is defined, it is now possible to specify the ALPN it supports, in binary format. This info is optional. For now only the h2 and the h1 multiplexers define an ALPN because this will be mandatory for a fix. But this could be used in the future for different purposes. This patch will be mandatory for the next fix.	2025-11-20 16:14:52 +01:00
Willy Tarreau	91d4f4f618	MINOR: limits: keep a copy of the rough estimate of needed FDs in global struct It's always a pain to guess the number of FDs that can be needed by listeners, checks, threads, pollers etc. We have this estimate in global.maxsock before calling set_global_maxconn(), but we lose it the line after. Let's copy it into global.est_fd_usage and keep it. This will be helpful to try to provide more accurate suggestions for maxconn.	2025-11-20 08:44:52 +01:00
Frederic Lecaille	a88fdf8669	MINOR: quic/flags: add missing QUIC flags for flags dev tool. Add missing QUIC_FL_CONN_XPRT_CLOSED quic_conn flags definition.	2025-11-20 08:10:58 +01:00
Amaury Denoyelle	d54d78fe9a	BUG/MINOR: quic: fix FD usage for quic_conn_closed on backend side On the frontend side, QUIC transfer can be performed either via a connection owned FD or multiplexed on the listener one. When a quic_conn is freed and converted to quic_conn_closed instance, its FD if open is closed and all exchanges are now multiplexed via the listener FD. This is different for the backend as connections only have the choice to use their owned FD. Thus, special care must be taken when freeing a connection and converting it to a quic_conn_closed instance. In this case, qc_release_fd() is delayed to the quic_conn_closed release. Furthermore, when the FD is transferred, its iocb and owner fields are updated to the new quic_conn_closed instance. Without it, a crash will occur when accessing the freed quic_conn tasklet. A newly dedicated handler quic_conn_closed_sock_fd_iocb is used to ensure access to quic_conn_closed members only.	2025-11-19 16:02:22 +01:00
Amaury Denoyelle	e55bcf5746	BUG/MINOR: mux-quic: implement max-reuse server parameter Properly implement support for max-reuse server keyword. This is done by adding a total count of streams seen for the whole connection. This value is used in avail_streams callback.	2025-11-19 16:02:22 +01:00
Amaury Denoyelle	c67a614e45	MINOR: quic: remove <ipv4> arg from qc_new_conn() Remove <ipv4> argument from qc_new_conn(). This parameter is unnecessary as it can be derived from the family type of the addresses also passed as argument.	2025-11-17 10:20:54 +01:00
Amaury Denoyelle	133f100467	MINOR: quic: refactor qc_new_conn() prototype The objective of this patch is to streamline qc_new_conn() usage so that it is similar for frontend and backend sides. Previously, several parameters were set only for frontend connections. These arguments are replaced by a single quic_rx_packet argument, which represents the INITIAL packet triggering the connection allocation on the server side. For a QUIC client endpoint, it remains NULL. This usage is considered more explicit. As a minor change, <target> is moved as the first argument of the function. This is considered useful as this argument determines whether the connection is a frontend or backend entry. Along with these changes, qc_new_conn() documentation has been reworded so that it is now up-to-date with the newest usage.	2025-11-17 10:13:40 +01:00
Amaury Denoyelle	49edaca513	MINOR: quic: try to clarify quic_conn CIDs fields direction quic_conn has two fields named <dcid> and <scid>. It may cause confusion as it is not obvious how these fields are related to the connection direction. Try to improve this by extending the documentation of these two fields.	2025-11-17 10:11:04 +01:00
Amaury Denoyelle	8720130cc7	MINOR: quic: do not use quic_newcid_from_hash64 on BE side quic_newcid_from_hash64 is an external callback. If defined, it serves as a CID generation method, as an alternative to the default random implementation. This mechanism was not correctly implemented on the backend side. Indeed, <hash64> quic_conn member is only set for frontend connections. The simplest solution would be to properly define it also for backend ones. However, quic_newcid_from_hash64 derivation is really only useful for the frontend side for now. Thus, this patch disables using it on the backend side in favor of the default random generator. To implement this, quic_cid_generate() is split in two functions, one for each method of CID generation. It is the responsibility of the caller to select the proper method. On backend side, only random implementation is now used.	2025-11-17 10:11:04 +01:00
Christopher Faulet	fc6e3e9081	MINOR: stick-tables: Rename stksess shards to use buckets The shard keyword is already used by the peers and on the server lines. And it is unrelated with the session keys distribution. So instead of talking about shard for the session key hashing, we now use the term "bucket".	2025-11-17 07:42:51 +01:00
Willy Tarreau	675c86c4aa	DEBUG: add BUG_ON_STRESS(): a BUG_ON() implemented only when DEBUG_STRESS > 0 The purpose of this new BUG_ON is beyond BUG_ON_HOT(). While BUG_ON_HOT() is meant to be light but placed on very hot code paths, BUG_ON_STRESS() might be heavy and only used under stress-testing, to try to detect early that something bad is starting to happen. This one is not even type-checked when not defined because we don't want to risk the compiler emitting the slightest piece of code there in production mode, so as to give enough freedom to the developers.	2025-11-14 16:42:53 +01:00
Willy Tarreau	3d441e78e5	DEBUG: extend DEBUG_STRESS to ease testing and turn on extra checks DEBUG_STRESS is currently used only to expose "stress-level". With this patch, we go a bit further, by automatically forcing DEBUG_STRICT and DEBUG_STRICT_ACTION to their highest values in order to enable all BUG_ON levels, and make all of them result in a crash. In addition, care is taken to always only have 0 or 1 in the macro, so that it can be tested using "#if DEBUG_STRESS > 0" as well as "if (DEBUG_STRESS) { }" everywhere. The goal will be to ease insertion of extra tests for builds dedicated to stress-testing that enable possibly expensive extra checks on certain code paths that cannot reasonably be compiled in for production code right now.	2025-11-14 16:38:04 +01:00
Amaury Denoyelle	d79295d89b	Revert "BUG/MEDIUM: connections: permit to permanently remove an idle conn" The target patch fixes a rare race condition which happens when a MUX IO handler is working on a connection already moved into the purge list. In this case, the handler will incorrectly move back the connection into the idle list. To fix this, conn_delete_from_tree() was extended to remove flags along with the connection from the idle list. This was performed when the connection is moved into the purge list. However, it introduces another issue related to the idle server connection accounting. Thus it is necessary to revert it prior to the incoming newer fix. This patch must be backported to every version where the original commit is.	2025-11-14 16:06:34 +01:00
William Lallemand	3d15c07ed0	MINOR: cfgcond: add "awslc_api_atleast" and "awslc_api_before" AWS-LC features are not easily tested with just the openssl version constant. AWS-LC uses its own API versioning stored in the AWSLC_API_VERSION constant. This patch adds the two awslc_api_atleast and awslc_api_before predicates that help to check the AWS-LC API.	2025-11-14 11:01:45 +01:00
Amaury Denoyelle	8415254cea	MINOR: check: clarify check-reuse-pool interaction with reuse policy check-reuse-pool can only perform as expected if reuse policy on the backend is set to aggressive or higher. Update the documentation to reflect this and implement a server diag warning.	2025-11-14 10:44:05 +01:00
William Lallemand	2bdf5a7937	BUG/MEDIUM: acme: move from mt_list to a rwlock + ebmbtree The current ACME scheduler suffers from problems due to the way the tasks are stored: - MT_LIST are not scalable when having a lot of ACME tasks and having to look for a specific one. - the acme_task pointer was stored in the ckch_store in order to avoid going through the whole list. But a ckch_store can be updated and the pointer lost in the previous one. - when a task fails, the ptr in the ckch_store was not removed because we only work with a copy of the original ckch_store, it would need to lock the ckchs_tree and remove this pointer. This patch fixes the issues by removing the MT_LIST-based architecture, and replacing it by a simple ebmbtree + rwlock design. The pointer to the task is not stored anymore in the ckch_store, but instead it is stored in the acme_tasks tree. Finding a task is done by doing a lookup on this tree with a RDLOCK. Instead of checking if store->acme_task is not NULL, a lookup is also done. This allows removing the stuck "acme_task" pointer in the store, which was preventing the restart of an acme task when the previous one failed for this specific certificate. Must be backported in 3.2.	2025-11-13 15:18:12 +01:00
Frederic Lecaille	d84463f9f6	MINOR: quic-be: validate the 0-RTT transport parameters During 0-RTT sessions, some server transport parameters are reused after having been saved from previous sessions. These parameters must not be reduced when it resends them. The client must check this is the case when some early data are accepted by the server. This is what is implemented by this patch. Implement qc_early_tranport_params_validate() which checks the new server parameters are not reduced. Also implement qc_ssl_eary_data_accepted() which was not implemented for TLS stack without 0-RTT support (for instance wolfssl). That said this function was no more used. This is why the compilation against wolfssl could not fail.	2025-11-13 14:04:31 +01:00
Frederic Lecaille	6419b9f204	MEDIUM: quic-be: enable the use of 0-RTT This patch allows the use of 0-RTT feature on QUIC server lines with "allow-0rtt" option. In fact 0-RTT is really enabled only if ssl_sock_srv_try_reuse_sess() successfully manages to reuse the SSL session and the chosen application protocol from previous connections. Note that, at this time, 0-RTT works only with quictls and aws-lc as TLS stack. (0-RTT does not work at all (even for QUIC frontends) with libressl).	2025-11-13 14:04:31 +01:00
Frederic Lecaille	a4bbbc75db	MINOR: quic-be: Send post handshake frames from list of frames (0-RTT) This patch is required to make 0-RTT work. It modifies the prototype of quic_build_post_handshake_frames() to send post handshake frames from a list of frames in place of the application encryption level (used as <qc->ael> local variable). This patch does not modify at all the current QUIC stack behavior (even for QUIC frontends). It must be considered as a preparation for the code to come about 0-RTT support for QUIC backends.	2025-11-13 14:04:31 +01:00
Frederic Lecaille	6e14365a5b	MEDIUM: quic-be: modify ssl_sock_srv_try_reuse_sess() to reuse backend sessions (0-RTT) This function is called for both TCP and QUIC connections to reuse SSL sessions saved by ssl_sess_new_srv_cb() callback called upon new SSL session creation. In addition to this, a QUIC SSL session must reuse the ALPN and some specific QUIC transport parameters. This is what is added by this patch for QUIC 0-RTT sessions. Note that from now on, ssl_sock_srv_try_reuse_sess() may fail for QUIC connections if it did not manage to reuse the ALPN. The caller must be informed of such an issue. It must not enable 0-RTT for the current session in this case. This is impossible without ALPN which is required to start a mux. ssl_sock_srv_try_reuse_sess() is modified to always succeed for TCP connections.	2025-11-13 14:04:31 +01:00
Frederic Lecaille	5309dfb56b	MINOR: quic-be: Save the backend 0-RTT parameters For both TCP and QUIC connections, this is ssl_sess_new_srv_cb() callback which is called when a new SSL session is created. Its role is to save the session to be reused for the next sessions. This patch modifies this callback to save the QUIC parameters to be reused for the next 0-RTT sessions (or during SSL session resumption). The already existing path_params->nego_alpn member is used to store the ALPN as this is done for TCP alongside path_params->tps new quic_early_transport_params struct used to save the QUIC transport parameters to be reused for 0-RTT sessions.	2025-11-13 14:04:31 +01:00
Frederic Lecaille	41e40eb431	MINOR: quic-be: helper quic_reuse_srv_params() function to reuse server params (0-RTT) Implement quic_reuse_srv_params() whose role is to reuse the ALPN negotiated during a first connection to a QUIC backend alongside its transport parameters.	2025-11-13 14:04:31 +01:00
Frederic Lecaille	33564ca54c	MINOR: quic-be: helper functions to save/restore transport params (0-RTT) Define quic_early_transport_params new struct for QUIC transport parameters in relation with 0-RTT. These parameters must be saved during a first session to be reused for 0-RTT next sessions. qc_early_transport_params_cpy() copies the 0-RTT transport parameters to be saved during a first connection to a backend. The copy is made from a quic_transport_params struct to a quic_early_transport_params struct. On the contrary, qc_early_transport_params_reuse() copies the transport parameters to be reused for a 0-RTT session from a previous one. The copy is made from a quic_early_transport_params struct to a quic_transport_params struct. Also add QUIC_EV_EARLY_TRANSP_PARAMS trace event to dump such 0-RTT transport parameters from traces.	2025-11-13 14:04:31 +01:00
Frederic Lecaille	80070fe51c	MEDIUM: quic-be: Parse, store and reuse tokens provided by NEW_TOKEN Add a per thread ist struct to srv_per_thread struct to store the QUIC token to be reused for subsequent sessions. Parse at packet level (from qc_parse_ptk_frms()) these tokens and store them calling qc_try_store_new_token() newly implemented function. This is this new function which does its best (may fail) to update the tokens. Modify qc_do_build_pkt() to resend these tokens calling quic_enc_token() implemented by this patch.	2025-11-13 14:04:31 +01:00
Frederic Lecaille	8f23d4d287	MINOR: quic-be: Parse the NEW_TOKEN frame Rename ->data qf_new_token struct field to ->w_data to distinguish it from ->r_data new field used to parse the NEW_TOKEN frame. Indeed to build the NEW_TOKEN we need to write it to a static buffer into the frame struct. To parse it we only need to store the address of the token field into the RX buffer.	2025-11-13 14:04:31 +01:00
Amaury Denoyelle	5a8728d03a	MEDIUM/OPTIM: quic: alloc quic_conn after CID collision check On Initial packet parsing, a new quic_conn instance is allocated via qc_new_conn(). Then a CID is allocated with its value derived from client ODCID. On CID tree insert, a collision can occur if another thread was already parsing an Initial packet from the same client. In this case, the connection is released and the packet will be requeued to the other thread. Originally, CID collision check was performed prior to quic_conn allocation. This was changed by the commit below, as this could cause issue on quic_conn alloc failure. commit `4ae29be18c` BUG/MINOR: quic: Possible endless loop in quic_lstnr_dghdlr() However, this procedure is less optimal. Indeed, qc_new_conn() performs many steps, thus it could be better to skip it on Initial CID collision, which can happen frequently. This patch restores the older order of operations, with CID collision check prior to quic_conn allocation. To ensure this does not cause again the same bug, the CID is removed in case of quic_conn alloc failure. This should prevent any loop as it ensures that a CID found in the global tree does not point to a NULL quic_conn, unless the CID is attached to a foreign thread. When this thread will parse a re-enqueued packet, either the quic_conn is already allocated or the CID has been removed, triggering a fresh CID and quic_conn allocation procedure.	2025-11-10 12:10:14 +01:00
Amaury Denoyelle	2623e0a0b7	BUG/MEDIUM: quic: handle collision on CID generation CIDs are provided by haproxy so that the peer can use them as DCID of its packets. Their value is set via a random generator. It happens on several occasions during connection lifetime: * via ODCID derivation if haproxy is the server * on quic_conn init if haproxy is the client * during post-handshake if haproxy is the server * on RETIRE_CONNECTION_ID frame parsing CIDs are stored in a global tree. On ODCID derivation, a check is performed to ensure the CID is not a duplicate value. This is mandatory to properly handle multiple INITIAL packets from the same client on different thread. However, for the other cases, no check is performed for CID collision. As _quic_cid_insert() is silent, the issue is not detected at all. This results in a CID advertized to the peer but not stored in the global one. In the end, this may cause two issues. The first one is that packets from the client which use the new CID will be rejected by haproxy, most probably with a STATELESS_RESET. The second issue is that it can cause a crash during quic_conn release. Indeed, the CID is stored in the quic_conn local tree and thus eb_delete() for the global tree will be performed. As <leaf_p> member is uninit, this results in a segfault. Note that this issue is pretty rare. It can only be observed if running with a high number of concurrent connections in parallel, so that the random generator will provide duplicate values. Patch is still labelled as MEDIUM as this modifies code paths used frequently. To fix this, _quic_cid_insert() unsafe function is completely removed. Instead, quic_cid_insert() can be used, which reports an error code if a collision happens. CID are then stored in the quic_conn tree only after global tree insert success. Here is the solution for each steps if a collision occurs : * on init as client: the connection is completely released * post-handshake: the CID is immediately released. 
The connection is kept, but it will miss an extra CID. * on RETIRE_CONNECTION_ID parsing: a loop is implemented to retry random generation. If it fails several times, the connection is closed in error. A small convenience change is made to quic_cid_insert(). Output parameter <new_tid> can now be NULL, which is useful as most of the time callers do not care about it. This must be backported up to 2.6.	2025-11-10 12:10:14 +01:00
Amaury Denoyelle	419e5509d8	MINOR: quic: split CID alloc/generation function Split new_quic_cid() function into multiple ones. This patch should not introduce any visible change. The objective is to render CID allocation and generation more modular. The first advantage of this patch is to bring code simplification. In particular, conn CID sequence number increment and insertion into connection tree is simpler than before. Another improvement is also that errors could now be handled more easily at each step of the CID init. This patch is a prerequisite for the fix on CID collision, thus it must be backported prior to it to every affected version.	2025-11-10 12:10:14 +01:00
Christopher Faulet	ecc2c3a35d	MEDIUM: peers: Remove commitupdate field on stick-tables This stick-table field was atomically updated with the last update id pushed and dumped on the CLI but never used otherwise. And all peer sessions share the same id because it is a stick-table info. So the info in peers dump is pretty limited. So, let's remove it.	2025-11-07 12:17:53 +01:00
Ben Kallus	d5ca3bb3b4	IMPORT: cebtree: Replace offset calculation with offsetof to avoid UB This is the same as the equivalent fix in ebtree: The C standard specifies that it's undefined behavior to dereference NULL (even if you use & right after). The hand-rolled offsetof idiom &(((s*)NULL)->f) is thus technically undefined. This clutters the output of UBSan and is simple to fix: just use the real offsetof when it's available. This is cebtree commit 2d08958858c2b8a1da880061aed941324e20e748.	2025-11-07 07:32:58 +01:00
Willy Tarreau	14087e48b9	MINOR: tools: add env_suggest() to suggest alternate variable names The purpose here is to look in the environment for a variable whose name looks like the provided one. This will be used to try to auto- correct misspelled environment variables that would silently be turned to an empty string.	2025-11-06 19:57:44 +01:00
Willy Tarreau	a4d78dd4f5	MINOR: tools: add support for ist to the word fingerprinting functions The word fingerprinting functions are used to compare similar words to suggest a correctly spelled one that looks like what the user proposed. Currently the functions only support const char*, but there's no reason for this, and it would be convenient to support substrings extracted from random pieces of configurations. Here we're adding new variants "_with_len" that take these ISTs and which are in fact a slight change of the original ones that the old ones now rely on.	2025-11-06 19:57:44 +01:00
Willy Tarreau	0144426dfb	BUG/MEDIUM: server: close a race around ready_srv when deleting a server When a server is being disabled or deleted, in case it matches the backend's ready_srv, this one is reset. However it's currently done in a non-atomic way when the server goes down, and that could occasionally reset the entry matching another server, but more importantly if in parallel some requests are dequeued for that server, it may re-appear there after having been removed, leading to a possible crash once it is fully removed, as shown in issue #3177. Let's make sure we reset the pointer when detaching the server from the proxy, and use a CAS in both cases to only reset this server. This fix needs to be backported to 3.2. There, srv_detach() is in server.c instead of server.h. Thanks to Basha Mougamadou for the detailed report and the useful backtraces.	2025-11-06 19:57:44 +01:00
Christopher Faulet	a1b5325a7a	MINOR: channel: Remove total field from channels The <total> field in the channel structure is now useless, so it can be removed. The <bytes_in> field from the SC is used instead. This patch is related to issue #1617.	2025-11-06 15:01:29 +01:00
Christopher Faulet	1effe0fc0a	MINOR: applet: Add function to get amount of data in the output buffer The helper function applet_output_data() returns the amount of data in the output buffer of an applet. For applets using the new API, it is based on data present in the outbuf buffer. For legacy applets, it is based on input data present in the input channel's buffer. The HTX version, applet_htx_output_data(), is also available. This patch is related to issue #1617.	2025-11-06 15:01:29 +01:00
Christopher Faulet	4991a51208	MINOR: stats: Add stats about request and response bytes received and sent In previous patches, these counters were added per frontend, backend, server and listener. With this patch, these counters are reported on stats, including promex. Note that the stats file minor version was incremented by one because the shm_stats_file_object struct size has changed. This patch is related to issue #1617.	2025-11-06 15:01:29 +01:00
Christopher Faulet	0084baa6ba	MINOR: counters: Remove bytes_in and bytes_out counter from fe/be/srv/li bytes_in and bytes_out counters per frontend, backend, listener and server were removed and we now rely on req_in and res_in counters, respectively. This patch is related to issue #1617.	2025-11-06 15:01:29 +01:00
Christopher Faulet	567df50d91	MINOR: stream: Remove bytes_in and bytes_out counters from stream per-stream bytes_in and bytes_out counters were removed and replaced by req.in and res.in. Corresponding samples still exist but rely on the new counters. This patch is related to issue #1617.	2025-11-06 15:01:29 +01:00
Christopher Faulet	1c62a6f501	MINOR: counters: Add req_in/req_out/res_in/res_out counters for fe/be/srv/li Thanks to the previous patch, and based on info available on the stream, it is now possible to have counters for frontends, backends, servers and listeners to report number of bytes received and sent on both sides. This patch is related to issue #1617.	2025-11-06 15:01:29 +01:00
Christopher Faulet	ac9201f929	MINOR: stream: Add samples to get number of bytes received or sent on each side req.in and req.out samples can now be used to get the number of bytes received from a client and sent to the server. And res.in and res.out samples can be used to get the number of bytes received from a server and sent to the client. These info are stored in the logs structure inside a stream. This patch is related to issue #1617.	2025-11-06 15:01:28 +01:00
Christopher Faulet	629fbbce19	MINOR: stconn: Add counters to SC to know number of bytes received and sent <bytes_in> and <bytes_out> counters were added to SC to count, respectively, the number of bytes received from an endpoint or sent to an endpoint. These counters are updated for connections and applets. This patch is related to issue #1617.	2025-11-06 15:01:28 +01:00
Willy Tarreau	5fe4677231	MINOR: server: move the lock inside srv_add_idle() Almost all callers of _srv_add_idle() lock the list then call the function. It's not the most efficient and it requires some care from the caller to take care of that lock. Let's change this a little bit by having srv_add_idle() that takes the lock and calls _srv_add_idle() that is now inlined. This way callers don't have to handle the lock themselves anymore, and the lock is only taken around the sensitive parts, not the function call+return. Interestingly, perf tests show a small perf increase from 2.28-2.32M RPS to 2.32-2.37M RPS on a 128-thread system.	2025-11-06 13:16:24 +01:00
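The locking change described in `5fe4677231` can be illustrated with a minimal sketch. It assumes a plain pthread mutex and a singly linked list; the type and function names (`idle_list`, `idle_list_add()`, `idle_list_add_unlocked()`) are hypothetical and are not HAProxy's actual structures or locking primitives. The point is only the shape: an inlined unlocked helper plus a wrapper that holds the lock around the sensitive part.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not HAProxy's real types. */
struct idle_conn {
    struct idle_conn *next;
};

struct idle_list {
    pthread_mutex_t lock;
    struct idle_conn *head;
    unsigned int count;
};

/* Unlocked helper: the caller must already hold list->lock. */
static inline void idle_list_add_unlocked(struct idle_list *list,
                                          struct idle_conn *conn)
{
    conn->next = list->head;
    list->head = conn;
    list->count++;
}

/* Locked wrapper: callers no longer manage the lock themselves, and the
 * lock is held only around the list manipulation.
 */
static inline void idle_list_add(struct idle_list *list,
                                 struct idle_conn *conn)
{
    pthread_mutex_lock(&list->lock);
    idle_list_add_unlocked(list, conn);
    pthread_mutex_unlock(&list->lock);
}
```

With this layout, a caller that previously had to lock, call, and unlock now makes a single call to `idle_list_add()`.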
William Lallemand	546c67d137	MINOR: acme: generate a temporary key pair This patch provides two functions acme_gen_tmp_pkey() and acme_gen_tmp_x509(). These functions generate a unique keypair and X509 certificate that will be stored in tmp_x509 and tmp_pkey. If the key pair or certificate was already generated they will return the existing one. The key is an RSA2048 and the X509 is generated with an expiration in the past. The CN is "expired". These are just placeholders to be used if we don't have files.	2025-11-06 11:56:27 +01:00
William Lallemand	1df55b441b	MEDIUM: ssl/ckch: use ckch_store instead of ckch_data for ckch_conf_kws This is an API change, instead of passing a ckch_data alone, the ckch_conf_kws.func() is called with a ckch_store. This allows the callback to access the whole ckch_store, with the ckch_conf and the ckch_data. But it requires the ckch_conf to be actually put in the ckch_store before.	2025-11-06 11:56:27 +01:00
Amaury Denoyelle	b9809fe0d0	MINOR: quic: remove <mux_state> field This patch removes <mux_state> field from quic_conn structure. The purpose of this field was to indicate if MUX layer above quic_conn is not yet initialized, active, or already released. It became tedious to properly set it as initialization order of the various quic_conn/conn/MUX layers now differs between the frontend and backend sides, and also depending if 0-RTT is used or not. Recently, a new change introduced in connect_server() will allow to initialize QUIC MUX earlier if ALPN is cached on the server structure. This added another level of complexity. Thus, this patch removes <mux_state> field completely. Instead, a new flag QUIC_FL_CONN_XPRT_CLOSED is defined. It is set at a single place only on close XPRT callback invocation. It can be mixed with the new utility functions qc_wait_for_conn()/qc_is_conn_ready() to determine the status of conn/MUX layers now without an extra quic_conn field.	2025-11-05 14:03:34 +01:00
Willy Tarreau	096999ee20	BUG/MEDIUM: connections: permit to permanently remove an idle conn There's currently a function conn_delete_from_tree() which is used to detach an idle connection from the tree it's currently attached to so that it is no longer found. This function is used in three circumstances: - when picking a new connection that no longer has any avail stream - when temporarily working on the connection from an I/O handler, in which case it's re-added at the end - when killing a connection The 2nd case above is quite specific, as it requires to preserve the CO_FL_LIST_MASK flags so that the connection can be re-inserted into the proper tree when leaving the handler. However, there's a catch. When killing a connection, we want to be certain it will not be reinserted into the tree. The flags preservation is causing a tiny race if an I/O happens while the connection is in the kill list, because in this case the I/O handler will note the connection flags, do its work, then reinsert the connection where it believed it was, then the connection gets purged, and another user can find it in the tree. The issue is very difficult to reproduce. On a 128-thread machine it happens in H2 around 500k req/s after around 50M requests. In H1 it happens after around 1 billion requests. The fix here consists in passing an extra argument to the function to indicate if the removal is permanent or not. When it's permanent, the function will clear the associated flags. The callers were adjusted so that all those dequeuing a connection in order to kill it do it permanently and all other ones do it only temporarily. A slightly different approach could have worked: the function could always remove all flags, and the callers would need to restore them. But this would require trickier modifications of the various call places, compared to only passing 0/1 to indicate the permanent status. This will need to be backported to all stable versions. 
The issue was at least reproduced since 3.1 (not tested before). The patch will need to be adjusted for 3.2 and older, because a 2nd argument "thr" was added in 3.3, so the patch will not apply to older versions as-is.	2025-11-05 11:08:25 +01:00
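The permanent versus temporary removal described in `096999ee20` can be sketched as follows. This is a simplified model under stated assumptions: the flag names and values, the `conn` structure, and the `conn_detach()` / `conn_reattach()` helpers are hypothetical and only mimic the idea of the extra argument passed to conn_delete_from_tree(); the real function operates on trees and takes other arguments.

```c
#include <stdbool.h>

/* Hypothetical flag values standing in for the CO_FL_LIST_MASK bits. */
#define CONN_FL_IDLE_LIST  0x1u
#define CONN_FL_SAFE_LIST  0x2u
#define CONN_FL_LIST_MASK  (CONN_FL_IDLE_LIST | CONN_FL_SAFE_LIST)

struct conn {
    unsigned int flags;  /* remembers which list the connection belongs to */
    bool in_tree;        /* whether it can currently be found by a lookup */
};

/* Detach a connection. A temporary removal keeps the list flags so that an
 * I/O handler can re-insert it later; a permanent removal clears them.
 */
static void conn_detach(struct conn *c, bool permanent)
{
    c->in_tree = false;
    if (permanent)
        c->flags &= ~CONN_FL_LIST_MASK;
}

/* Re-insert after an I/O handler ran. Without list flags there is nowhere
 * to re-insert, so a connection that is being killed stays out of the tree.
 */
static bool conn_reattach(struct conn *c)
{
    if (!(c->flags & CONN_FL_LIST_MASK))
        return false;
    c->in_tree = true;
    return true;
}
```

In this model, the race goes away because a handler that runs while the connection sits in the kill list finds no list flags left to act on.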
Olivier Houchard	7d4aa7b22b	BUG/MEDIUM: server: Add a rwlock to path parameter Add a rwlock to control the server's path_parameter, to make sure multiple threads don't set it at the same time, and it can't be seen in an inconsistent state. Also don't set the parameter every time, only set them if they have changed, to prevent needless writes. This does not need to be backported.	2025-11-04 18:47:34 +01:00
Amaury Denoyelle	efe60745b3	MINOR: quic: remove connection arg from qc_new_conn() This patch is similar to the previous one, this time dealing with qc_new_conn(). This function was asymmetric on frontend and backend side, as connection argument was set only in the latter case. This was previously required due to qc_alloc_ssl_sock_ctx() signature. This has changed with the previous patch, thus qc_new_conn() can also be realigned on both FE and BE sides. <conn> member of quic_conn instance is always set outside it, in qc_xprt_start() on the backend case.	2025-11-04 17:47:42 +01:00
Amaury Denoyelle	5a17cade4f	MINOR: quic: do not set conn member if ssl_sock_ctx ssl_sock_ctx is a generic object used both on TCP/SSL and QUIC stacks. Most notably it contains a <conn> member which is a pointer to struct connection. On QUIC frontend side, this member is always set to NULL. Indeed, connection is only created after handshake completion. However, this has changed for backend side, where the connection is instantiated prior to its quic_conn counterpart. Thus, ssl_sock_ctx member would be set in this case as a convenience for use later in qc_ssl_do_hanshake(). However, this method was unsafe as the connection can be released, without resetting ssl_sock_ctx member. Thus, the previous patch fixes this by using the <conn> member through the quic_conn instance which is the proper way. Thus, this patch resets ssl_sock_ctx <conn> member to NULL. This is deemed the cleanest method as it ensures that both frontend and backend sides must not use it anymore.	2025-11-04 17:38:09 +01:00
Willy Tarreau	fd012b6c59	OPTIM: proxy: move atomically access fields out of the read-only ones Perf top showed that h1_snd_buf() was having great difficulties accessing the proxy's server_id_hdr_name field in the middle of the headers loop. Moving the assignment out of the loop to a local variable moved the problem there as well: \| if (!(h1m->flags & H1_MF_RESP) && isttest(h1c->px->server_id_hdr_n 0.10 \|20b0: mov -0x120(%rbp),%rdi 1.33 \| mov 0x60(%rdi),%r10 0.01 \| test %eax,%eax 0.18 \| jne 2118 12.87 \| mov 0x350(%r10),%rdi 0.01 \| test %rdi,%rdi 0.05 \| je 2118 \| mov 0x358(%r10),%r11 It turns out that there are several atomically accessed fields in its vicinity, causing the cache line to bounce all the time. Let's collect the few frequently changed fields and place them together at the end of the structure, and plug the 32-bit hole with another isolated field. Doing so also reduced a little bit the cost of decrementing be->be_conn in process_stream(), and overall the HTTP/1 performance increased by about 1% both on ARM and x86_64.	2025-11-03 13:54:49 +01:00
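The layout change in `fd012b6c59` amounts to keeping read-mostly fields away from fields that many threads update. A minimal sketch of that idea, assuming a 64-byte cache line and C11 `alignas`; the `px_sketch` structure and its field selection are hypothetical and do not reproduce the real struct proxy layout:

```c
#include <stdalign.h>
#include <stddef.h>

#define CACHELINE 64  /* assumed cache line size */

/* Hypothetical, heavily reduced proxy structure. */
struct px_sketch {
    /* Read-mostly configuration, consulted on the fast path. */
    const char *server_id_hdr_name;
    size_t server_id_hdr_len;
    unsigned int options;

    /* Frequently updated counters, grouped at the end and started on
     * their own cache line so that updating them does not invalidate
     * the line holding the read-mostly fields above.
     */
    alignas(CACHELINE) unsigned int be_conn;
    unsigned int served;
    unsigned int queued;
};
```

The alignment is what keeps a write to `be_conn` on one thread from evicting the line that another thread reads `server_id_hdr_name` from.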
Amaury Denoyelle	6bfabfdc77	OPTIM: backend: skip conn reuse for incompatible proxies When trying to reuse a backend connection, a connection hash is calculated to match an entry with similar parameters. Previously, this operation was skipped if the stream content wasn't based on HTTP, as it would have been incompatible with http-reuse. With the introduction of SPOP backends, this condition was removed, so that it can also benefit from connection reuse. However, this means that now the hash calculation is always performed when connecting to a server, even for TCP or log backends. This is unnecessary as these proxies cannot perform connection reuse. Note also that reuse mode is reset on postparsing for incompatible backends. This at least guarantees that no tree lookup will be performed via be_reuse_connection(). However, connection lookup is still performed in the session via session_get_conn() which is another unnecessary operation. Thus, this patch restores the condition so that reuse operations are now entirely skipped if a backend mode is incompatible. This is implemented via a new utility function named be_supports_conn_reuse(). This could be backported up to 3.1, as this commit could be considered as a performance regression for tcp/log backend modes.	2025-11-03 10:43:50 +01:00
Amaury Denoyelle	14a6468df5	MINOR: quic: reject conf with QUIC servers if not compiled Ensure that QUIC support is compiled into haproxy when a QUIC server is configured. This check is performed during _srv_parse_finalize() so that it is detected both on configuration parsing and when adding a dynamic server via the CLI. Note that this changes the behavior of srv_is_quic() utility function. Previously, it always returned false when QUIC support wasn't compiled. With this new check introduced, it is now guaranteed that a QUIC server won't exist if compilation support is not active. Hence srv_is_quic() does not rely anymore on USE_QUIC define.	2025-10-31 11:32:20 +01:00
Willy Tarreau	b0e8edaef2	MEDIUM: mux-h2: do not needlessly refrain from sending data early The mux currently refrains from sending data before H2_CS_FRAME_H, i.e. before the peer's SETTINGS frame was received. While it makes sense on the frontend, it's causing harm on the backend because it forces the first request to be sent in two halves over an extra RTT: first the preface and settings, second the request once the settings are received. This is totally contrary to the philosophy of the H2 protocol, consisting in permitting the client to send as soon as possible. Actually what happens is the following: - process_stream() calls connect_server() - connect_server() creates a connection, and if the proto/alpn is guessed or known, the mux is instantiated for the current request. - the H2 init code wakes the h2 tasklet up and returns - process_stream() tries to send the request using h2_snd_buf(), but that one sees that we're before H2_CS_FRAME_H, refrains from doing so and returns. - process_stream() subscribes and quits - the h2 tasklet can now execute to send the preface and settings, which leave as a first TCP segment. The connection is ready. - the iocb is woken again once the server's SETTINGS frame is received, turning the connection to the H2_CS_FRAME_H state, and the iocb wake up process_stream(). - process_stream() executes again and can try to send again. - h2_snd_buf() is called and finally sends the request as a second TCP segment. Not only this is inefficient, but it also renders 0-RTT and TFO impossible on H2 connections. When 0-RTT is used, only the preface and settings leave as early data (the very first data of that connection), which is totally pointless. In order to fix this, we have to go through a few steps: - first we need to let data be sent to a server immediately after the SETTINGS frame was sent (i.e. in H2_CS_SETTINGS1 state instead of H2_CS_FRAME_H). However, some protocol extensions are advertised by the server using SETTINGS (e.g. 
RFC8441) and some requests might need to know the existence of such extensions. For this reason we're adding a new h2c flag, H2_CF_SETTINGS_NEEDED, which indicates that some operations were not done because a server's SETTINGS frame is needed. This is set when trying to send a protocol upgrade or extended CONNECT during H2_CS_SETTINGS1, indicating that it's needed to wait for H2_CS_FRAME_H in this case. The flag is always set on frontend connections. This is what is being done in this patch. - second, we need to be able to push the preface opportunistically with the first h2_snd_buf() so that it's not needed to wake the tasklet up just to send that and wake process_stream() again. This will be in a separate patch. By doing the first step, we're at least saving one needless tasklet wakeup per connection (~9%), which results in ~5% backend connection rate increase.	2025-10-30 18:16:54 +01:00
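The first step described in `b0e8edaef2` can be modelled as a small decision function. This is only a sketch under stated assumptions: the enum, flag value, `h2c_sketch` structure and `h2c_may_send()` helper are hypothetical reductions of H2_CS_SETTINGS1, H2_CS_FRAME_H and H2_CF_SETTINGS_NEEDED, and keeping the flag set once raised is a simplification of this sketch, not something the commit message states.

```c
#include <stdbool.h>

/* Hypothetical reduced connection states, in the order they are reached. */
enum h2_cs_sketch { CS_PREFACE, CS_SETTINGS1, CS_FRAME_H };

#define CF_SETTINGS_NEEDED 0x1u  /* hypothetical flag value */

struct h2c_sketch {
    enum h2_cs_sketch state;
    unsigned int flags;
    bool is_backend;
};

/* Decide whether a request may be sent now. <needs_peer_settings> stands
 * for requests such as a protocol upgrade or extended CONNECT, which depend
 * on extensions advertised in the peer's SETTINGS frame.
 */
static bool h2c_may_send(struct h2c_sketch *c, bool needs_peer_settings)
{
    /* Peer SETTINGS already received: nothing left to wait for. */
    if (c->state >= CS_FRAME_H)
        return true;

    /* Frontend connections always wait; backend ones wait only for
     * requests that depend on the peer's SETTINGS.
     */
    if (!c->is_backend || needs_peer_settings)
        c->flags |= CF_SETTINGS_NEEDED;

    if (c->flags & CF_SETTINGS_NEEDED)
        return false;

    /* Backend side: sending is allowed once our own SETTINGS were sent. */
    return c->state == CS_SETTINGS1;
}
```

A plain backend request in the `CS_SETTINGS1` state is thus allowed to leave immediately, which is what makes early data useful on such connections.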
William Lallemand	1e2f920be6	MINOR: listener: implement bind_conf_find_by_name() Returns a pointer to the first bind_conf matching <name> in a frontend <front>. When name is prefixed by a @ (@<filename>:<linenum>), it tries to look for the corresponding filename and line of the configuration file. NULL is returned if no match is found.	2025-10-30 10:37:42 +01:00
sftcd	23f5cbb411	MINOR: ssl/ech: add logging and sample fetches for ECH status and outer SNI This patch adds functions to expose Encrypted Client Hello (ECH) status and outer SNI information for logging and sample fetching. Two new helper functions are introduced in ech.c: - conn_get_ech_status() places the ECH processing status string into a buffer. - conn_get_ech_outer_sni() retrieves the outer SNI value if ECH succeeded. Two new sample fetch keywords are added: - "ssl_fc_ech_status" returns the ECH status string. - "ssl_fc_ech_outer_sni" returns the outer SNI value seen during ECH. These allow ECH information to be used in HAProxy logs, ACLs, and captures.	2025-10-30 10:37:30 +01:00
sftcd	dba4fd248a	MEDIUM: ssl/ech: config and load keys This patch introduces the USE_ECH option in the Makefile to enable support for Encrypted Client Hello (ECH) with OpenSSL. A new function, load_echkeys, is added to load ECH keys from a specified directory. The SSL context initialization process in ssl_sock.c is updated to load these keys if configured. A new configuration directive, `ech`, is introduced to allow users to specify the ECH key directory in the listener configuration.	2025-10-30 10:37:12 +01:00
Remi Tricot-Le Breton	dc35a3487b	MINOR: ssl: Do not dump decrypted privkeys in 'dump ssl cert' A private key that is password protected and was decoded during init thanks to the password obtained thanks to 'ssl-passphrase-cmd' should not be dumped via 'dump ssl cert' CLI command.	2025-10-29 10:54:17 +01:00
Remi Tricot-Le Breton	478dd7bad0	MEDIUM: ssl: Add certificate password callback that calls external command When a certificate is protected by a password, we can provide the password via the dedicated pem_password_cb param provided to PEM_read_bio_PrivateKey. HAProxy will fetch the password automatically during init by calling a user-defined external command that should dump the right password on its standard output (see new 'ssl-passphrase-cmd' global option).	2025-10-29 10:54:17 +01:00
Remi Tricot-Le Breton	1ec59d3426	MINOR: init: Make devnullfd global and create it earlier in init The devnull fd might be needed during configuration parsing, if some options require to fork/exec for instance. So we now create it much earlier in the init process and without depending on the '-q' or '-d' parameters.	2025-10-29 10:54:17 +01:00
Willy Tarreau	2d7e3ddd4a	BUG/MEDIUM: cli: do not return ACKs one char at a time Since 3.0 where the CLI started to use rcv_buf, it appears that some external tools sending chained commands are randomly experiencing failures. Each time this happens when the whole command is sent as a single packet, immediately followed by a close. This is not a correct way to use the CLI but this has been working for ages for simple netcat-based scripts, so we should at least try to preserve this. The cause of the failure is that the first LF that acks a command is immediately sent back to the client and rejected due to the closed connection. This in turn forwards the error back to the applet which aborts its processing. Before 3.0 the responses would be queued into the buffer, then sent back to the channel, and would all fail at once. This changed when snd_buf/rcv_buf were implemented because the applets are much more responsive and since they yield between each command, they can deliver one ACK at a time that is immediately forwarded down the chain. An easy way to observe the problem is to send 5 map updates, a shutdown, and immediately close via tcploop, and in parallel run a periodic "show map" to count the number of elements: $ tcploop -U /tmp/sock1 C S:"add map #0 1 1; add map #0 2 2; add map #0 3 3; add map #0 4 4; add map #0 5 5\n" F K Before 3.0, there would always be 5 elements. Since 3.0 and before `20ec1de214` ("MAJOR: cli: Refacor parsing and execution of pipelined commands"), almost always 2. And since that commit above in 3.2, almost always one. Doing the same using socat or netcat shows almost always 5... 
It's entirely timing-dependent, and might even vary based on the RTT between the client and haproxy! The approach taken here consists in doing the same principle as MSG_MORE or Nagle but on the response buffer: the applet doesn't need to send a single ACK for each command when it has already been woken up and is scheduled to come back to work. It's fine (and even desirable) that ACKs are grouped in a single packet as much as possible. For this reason, this patch implements APPCTX_CLI_ST1_YIELD, a new CLI flag which indicates that the applet left in yielding condition, i.e. it has not finished its work. This flag is used by .rcv_buf to hold pending data. This way we won't return partial responses for no reason, and we can continue to emulate the previous behavior. One very nice benefit to this is that it saves huge amounts of CPU on the client. In the test below that tries to update 1M map entries, the CPU used by socat went from 100% to 0% and the total transfer time dropped by 28%: before: $ time awk 'BEGIN{ printf "prompt i\n"; for (i=0;i<1000000;i++) { \ printf "add map #0 %d %d\n",i,i,i }}' \| socat /tmp/sock1 - >/dev/null real 0m2.407s user 0m1.485s sys 0m1.682s after: $ time awk 'BEGIN{ printf "prompt i\n"; for (i=0;i<1000000;i++) { \ printf "add map #0 %d %d\n",i,i,i }}' \| socat /tmp/sock1 - >/dev/null real 0m1.721s user 0m0.952s sys 0m0.057s The difference is also quite visible on the number of syscalls during the test (for 1k updates): before: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 100.00 0.071691 0 100001 sendmsg after: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 100.00 0.000011 1 9 sendmsg This patch will need to be backported to 3.0, and depends on these two patches to be backported as well: MINOR: applet: do not put SE_FL_WANT_ROOM on rcv_buf() if the channel is empty MINOR: cli: create cli_raw_rcv_buf() 
from the generic applet_raw_rcv_buf()	2025-10-27 16:57:07 +01:00
Olivier Houchard	837351245a	BUG/MEDIUM: mt_list: Use atomic operations to prevent compiler optims As a follow-up to `f40f5401b9`, explicitly use atomic operations to set the prev and next fields, to make sure the compiler can't assume anything about it, and just does it. This should be backported after `f40f5401b9` up to 2.8.	2025-10-24 13:34:41 +02:00
Willy Tarreau	2ec6df59bf	BUILD: openssl-compat: fix build failure with OPENSSL=0 and KTLS=1 The USE_KTLS test is currently being done outside of the USE_OPENSSL guard so disabling USE_OPENSSL still results in build failures on libcs built with support for kernels before 4.17, because we enable KTLS by default on linux. Let's move the KTLS block inside the USE_OPENSSL guard instead. No backport is needed since KTLS is only in 3.3.	2025-10-24 10:45:02 +02:00
Aurelien DARRAGON	d655ed5f14	BUG/MAJOR: stats-file: ensure shm_stats_file_object struct mapping consistency (2nd attempt) This is a second attempt at fixing issues on 32bits systems which would trigger the following BUG_ON() statement: FATAL: bug condition "sizeof(struct shm_stats_file_object) != 544" matched at src/stats-file.c:825 shm_stats_file_object struct size changed, is is part of the exported API: ensure all precautions were taken (ie: shm_stats_file version change) before adjusting this This is a drop-in replacement for `d30b88a6c` + `4693ee0ff`, as suggested by Willy. Indeed, on supported platforms unsigned int can be assumed to be 4 bytes long, and long can be assumed to be 8 bytes long. As such, the previous attempt was overkill and added unnecessary maintenance complexity which could result in bugs if not used properly. Moreover, it would only partially solve the issue, since on little endian vs big endian architectures, the provisioned memory areas (originating from the same shm stats file) could be read differently by the host. Instead we fix the alignment issues, and this alone helps to ensure struct memory consistency on 64 vs 32bits platforms. It was tested on both i386 and i586. last_change and last_sess counters are now stored as unsigned int, as it helped to fix the alignment issues and they were found to be used as 32bits integers anyway. Thanks to Willy for problem analysis and the patch proposal. No backport needed.	2025-10-24 09:35:38 +02:00
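The approach taken in `d655ed5f14`, fixing alignment rather than forcing fixed-width wrappers, can be illustrated by a structure whose members are ordered so that no implicit padding is needed, plus a compile-time size check in the spirit of the BUG_ON() quoted above. The `ctr_sketch` structure is hypothetical, and the sketch assumes a 4-byte `unsigned int` and an 8-byte `unsigned long long`:

```c
#include <stddef.h>

/* Hypothetical counters block shared through a memory-mapped file. */
struct ctr_sketch {
    unsigned int last_change;   /* 4 bytes, offset 0 */
    unsigned int last_sess;     /* 4 bytes, offset 4 */
    unsigned long long bytes;   /* 8 bytes, offset 8: naturally aligned,
                                 * so no compiler-inserted padding is
                                 * required on 32-bit or 64-bit targets */
};

/* Fail the build, rather than at runtime, if the layout ever changes. */
_Static_assert(sizeof(struct ctr_sketch) == 16,
               "ctr_sketch is part of a shared layout, keep its size stable");
```

Pairing the two 32-bit members ahead of the 64-bit one is what keeps the offsets identical regardless of how strictly a given ABI aligns 64-bit integers.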
Aurelien DARRAGON	a931779dde	Revert "MINOR: compiler: add FIXED_SIZE(size, type, name) macro" This reverts commit `466a603b59`. Due to the last 2 commits, this macro is now unused, and will probably never be used, so let's get rid of that for now.	2025-10-24 09:35:34 +02:00
Aurelien DARRAGON	8277f891d2	Revert "MEDIUM: freq-ctr: use explicit-size types for freq-ctr struct" This reverts commit `4693ee0ff7`. As discussed in GH #3168, this works but it is not the proper way to fix the issue. See following commits.	2025-10-24 09:35:29 +02:00
Aurelien DARRAGON	c0d952ccc1	Revert "BUG/MAJOR: stats-file: ensure shm_stats_file_object struct mapping consistency" This reverts commit `d30b88a6cc`. As discussed in GH #3168, this works but it is not the proper way to fix the issue. See following commits.	2025-10-24 09:35:25 +02:00
Amaury Denoyelle	7ba4b0ad5f	BUG/MINOR: quic: rename and duplicate stream settings Several settings can be set to control stream multiplexing and associated receive window. Previously, all of these settings were configured using prefix "tune.quic.frontend.", despite being applied blindly on both sides. Fix this by duplicating these settings specific to frontend and backend side. Options are also renamed to use the standardized prefix "tune.quic.[be\|fe].stream." notation. Also, each option is individually renamed to better reflect its purpose and hide technical details relative to QUIC transport parameter naming : * max-data-size -> stream.rxbuf * max-streams-bidi -> stream.max-concurrent * stream-data-ratio -> stream.data-ratio No need to backport.	2025-10-23 16:49:20 +02:00
Amaury Denoyelle	d5142706f8	BUG/MINOR: quic: split option for congestion max window size	2025-10-23 16:49:20 +02:00
Amaury Denoyelle	33afba0dda	BUG/MINOR: quic: split max-idle-timeout option for FE/BE usage Streamline max-idle-timeout option. Rename it to use the newer cohesive naming scheme 'tune.quic.fe\|be.'. Two different fields were already defined in global struct. These fields are moved into quic_tune along with other QUIC settings. However, no parser was defined for backend option, this commit fixes this. No need to backport this.	2025-10-23 16:49:20 +02:00
Amaury Denoyelle	5bc659a4a2	MINOR: quic: rename frontend sock-per-conn setting On frontend side, a quic_conn can have a dedicated FD or use the listener one. These different modes can be activated via a global QUIC tune setting. This patch adjusts the option. First, it is renamed to the more meaningful name 'tune.quic.fe.sock-per-conn'. Also, arguments are now either 'default-on' or 'force-off'. The objective is to better highlight the relationship with 'quic-socket' bind option. The older option is deprecated and will be removed in 3.5.	2025-10-23 16:49:20 +02:00
Amaury Denoyelle	a14c6cee17	MINOR: quic: rename retry-threshold setting A QUIC global tune setting is defined to be able to force Retry emission prior to handshake. By definition, this ability is only supported by QUIC servers, hence it is a frontend option only. Rename the option to use "fe" prefix. The old option name is deprecated and will be removed in 3.5	2025-10-23 16:49:20 +02:00
Amaury Denoyelle	d248c5bd21	MINOR: quic: rename max Tx mem setting QUIC global memory can be limited across the entire process via a global tune setting. Previously, this setting used the misleading "frontend" prefix. As this is applied as a sum between all QUIC connections, both from frontend and backend sides, remove the prefix. The new option name is "tune.quic.mem.tx-max". The older option name is deprecated and will be removed in 3.5.	2025-10-23 16:49:20 +02:00
Amaury Denoyelle	9bfe9b9e21	MINOR: quic: split Tx options for FE/BE usage This patch is similar to the previous one, except that it is focused on Tx QUIC settings. It is now possible to toggle GSO and pacing on frontend and backend sides independently. As with the previous patch, options are renamed to use "fe/be" unified prefixes. This is part of the current series of commits which unify QUIC settings. Older options are deprecated and will be removed in the 3.5 release.	2025-10-23 16:49:20 +02:00
Amaury Denoyelle	33a8cb87a9	MINOR: quic: split congestion controler options for FE/BE usage Various settings can be configured related to the QUIC congestion controller. This patch duplicates them to be able to set independent values on frontend and backend sides. As with the previous patch, options are renamed to use "fe/be" unified prefixes. This is part of the current series of commits which unify QUIC settings. Older options are deprecated and will be removed in the 3.5 release.	2025-10-23 16:49:20 +02:00
Amaury Denoyelle	7640e9a9ee	MINOR: quic: duplicate glitches FE option on BE side Previously, QUIC glitches support was only implemented for frontend side. Extend this so that the option can be specified separately both on frontend and backend sides. Function _qcc_report_glitch() now retrieves the relevant max value based on connection side. In addition to this, the option has been renamed to use "fe/be" prefixes. This is part of the current series of commits which unify QUIC settings. Older options are deprecated and will be removed in the 3.5 release.	2025-10-23 16:49:20 +02:00
Amaury Denoyelle	b34cd0b506	MINOR: quic: rename "no-quic" to "tune.quic.listen" Rename the option to quickly enable/disable all QUIC listeners. It now takes an argument on/off. The documentation is extended to reflect the fact that QUIC backends are not impacted by this option. The older keyword is simply removed. Deprecation is considered unnecessary as this setting is only useful during debugging.	2025-10-23 16:47:58 +02:00
Amaury Denoyelle	42e5ec6519	MINOR: quic: prepare support for options on FE/BE side A major reorganization of QUIC settings is going to be performed. One of its objectives is to clearly define options which can be separately configured on frontend and backend proxy sides. To implement this, quic_tune structure is extended to support fe and be options. A set of macros/functions is also defined : it allows to retrieve an option defined on both sides with unified code, based on proxy side of a quic_conn/connection instance.	2025-10-23 15:06:01 +02:00
Olivier Houchard	f40f5401b9	BUG/MEDIUM: mt_lists: Avoid el->prev = el->next = el Avoid setting both el->prev and el->next on the same line. The goal is to set both el->prev and el->next to el, but a naive compiler, such as when we're using -O0, will set el->next first, then will set el->prev to the value of el->next, but if we're unlucky, el->next will have been set to something else by another thread. So explicitly set both to what we want. This should be backported up to 2.8.	2025-10-23 14:43:51 +02:00
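The hazard described in `f40f5401b9` comes from the chained assignment `el->prev = el->next = el`, where the value stored into `prev` may be re-read from `next`. A minimal sketch of the safer form, using C11 atomics in the spirit of the follow-up `837351245a`; the `node` type and `node_init()` helper are hypothetical and are not the mt_list implementation:

```c
#include <stdatomic.h>

/* Hypothetical doubly linked element; not the real mt_list layout. */
struct node {
    _Atomic(struct node *) next;
    _Atomic(struct node *) prev;
};

/* Make the element point to itself. Each field is written from <el>
 * directly, so the value stored in ->prev never depends on a re-read of
 * ->next that another thread may have changed in between.
 */
static void node_init(struct node *el)
{
    atomic_store_explicit(&el->next, el, memory_order_relaxed);
    atomic_store_explicit(&el->prev, el, memory_order_relaxed);
}
```

Relaxed ordering is used here only to keep the sketch small; the ordering required by a real lock-free list depends on how other threads observe these fields.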
Aurelien DARRAGON	d30b88a6cc	BUG/MAJOR: stats-file: ensure shm_stats_file_object struct mapping consistency As reported by @tianon on GH #3168, running haproxy on 32bits i386 platform would trigger the following BUG_ON() statement: FATAL: bug condition "sizeof(struct shm_stats_file_object) != 544" matched at src/stats-file.c:825 shm_stats_file_object struct size changed, is is part of the exported API: ensure all precautions were taken (ie: shm_stats_file version change) before adjusting this In fact, some efforts were already taken to ensure shm_stats_file_object struct size remains consistent on 64 vs 32 bits platforms, since shm_stats_file_object is part of the public API and directly exposed in the stats file. However, some parts were overlooked: some structs that are embedded in shm_stats_file_object struct itself weren't using fixed-width integers, and would sometimes be unaligned. The result of this is that it was up to the compiler (platform-dependent) to choose how to deal with such ambiguities, which could cause the struct mapping/size to be inconsistent from one platform to another. Hopefully this was caught by the BUG_ON() statement and with the precious help of @tianon. To fix this, we now use fixed-width integers everywhere for members (and submembers) of shm_stats_file_object struct, and we use explicit padding where missing to avoid automatic padding when we don't expect one. As for the previous commit, we leverage FIXED_SIZE() and FIXED_SIZE_ARRAY() macro to set the expected width for each integer without causing build issues on platform that don't support larger integers. No backport needed, this feature was introduced during 3.3-dev.	2025-10-22 20:52:22 +02:00
Aurelien DARRAGON	4693ee0ff7	MEDIUM: freq-ctr: use explicit-size types for freq-ctr struct freq-ctr struct is used by the shm_stats_file API, and more precisely, it is used in the shm_stats_file_object struct for counters. shm_stats_file_object struct must be platform-independent, thus we switch to using explicit size types (AKA fixed width integer types) for freq-ctr, in an attempt to make freq-ctr size and memory mapping consistent from one platform to another. We cannot simply use fixed-width integers because some of them are involved in atomic operations, and forcing a given width could cause build issues on some platforms where atomic ops are not implemented for large integers. Instead we leverage the FIXED_SIZE macro to keep handling the integers as before, but forcing them to be stored using the expected number of bytes (unused bytes will simply be ignored). No change of behavior should be expected.	2025-10-22 20:52:18 +02:00
Aurelien DARRAGON	466a603b59	MINOR: compiler: add FIXED_SIZE(size, type, name) macro FIXED_SIZE() macro can be used to instruct the compiler that the struct member named <name>, handled as <type>, must be stored using <size> bytes, even if the type used is actually smaller than the expected size. FIXED_SIZE_ARRAY() is similar to FIXED_SIZE() but for arrays: it takes an extra argument which is the number of members. They may be used for portability concerns to ensure a structure mapping remains consistent between platforms.	2025-10-22 20:52:12 +02:00
Amaury Denoyelle	f50425c021	MINOR: quic: remove received CRYPTO temporary tree storage The previous commit switched from ncbuf to ncbmbuf as storage for received CRYPTO frames. The latter ensures that buffering of such frames cannot fail anymore due to gaps size. Previously, extra mechanisms were implemented in the QUIC frames parsing function to overcome the limitation of ncbuf on gaps size. Before insertion, CRYPTO frames were stored in a temporary tree to order their insertion. As this is not necessary anymore, this commit removes the temporary tree insertion. This commit is closely associated with the previous bug fix. As it provides a neat optimization and code simplification, it can be backported with it, but not in the next immediate release to spot potential regression.	2025-10-22 15:24:02 +02:00
Amaury Denoyelle	4c11206395	BUG/MAJOR: quic: use ncbmbuf for CRYPTO handling In QUIC, TLS handshake messages such as ClientHello are encapsulated in CRYPTO frames. Each QUIC implementation can split the content in several frames of random sizes. In fact, this feature is now used by several clients, based on Chrome's so-called "Chaos protection" mechanism: https://quiche.googlesource.com/quiche/+/cb6b51054274cb2c939264faf34a1776e0a5bab7 To support this, haproxy uses a ncbuf storage to store received CRYPTO frames before passing them to the SSL library. However, this storage suffers from a limitation as gaps between two filled blocks cannot be smaller than 8 bytes. Thus, depending on the size of received CRYPTO frames and their order, ncbuf may not be sufficient. Over time, several mechanisms were implemented in haproxy QUIC frames parsing to overcome the ncbuf limitation. However, recent reports highlight that with some clients haproxy is not able to deal with CRYPTO frames reception. In particular, this is the case with the latest ngtcp2 release, which implements a similar chaos protection mechanism via the following patch. It also seems that this impacts haproxy interaction with firefox. commit 89c29fd8611d5e6d2f6b1f475c5e3494c376028c Author: Tatsuhiro Tsujikawa <tatsuhiro.t@gmail.com> Date: Mon Aug 4 22:48:06 2025 +0900 Crumble Client Initial CRYPTO (aka chaos protection) To fix haproxy CRYPTO frames buffering once and for all, an alternative non-contiguous buffer named ncbmbuf has been recently implemented. This type does not suffer from gaps size limitation, albeit at the cost of a small reduction in the size available for data storage. Thus, the purpose of this current patch is to replace ncbuf with the newer ncbmbuf for QUIC CRYPTO frames parsing. Now, ncbmb_add() is used to buffer received frames, which is guaranteed to succeed. The only remaining case of error is if a received frame offset and length exceed the ncbmbuf data storage, which would result in a CRYPTO_BUFFER_EXCEEDED error code. A notable behavior change when switching to ncbmbuf implementation is that NCB_ADD_COMPARE mode cannot be used anymore during add. Instead, crypto frame content received at a similar offset will be overwritten. A final note regarding STREAM frames parsing. For now, it is considered unnecessary to switch from ncbuf in this case. Indeed, QUIC clients do not perform aggressive fragmentation for them. Keeping ncbuf ensures that the data storage size is bigger than the equivalent ncbmbuf area. This should fix github issue #3141. This patch must be backported up to 2.6. It is first necessary to pick the relevant commits for ncbmbuf implementation prior to it.	2025-10-22 15:04:41 +02:00
Amaury Denoyelle	8b8ab2824e	MINOR: ncbmbuf: implement advance operation Implement ncbmb_advance() function for the ncbmbuf type. This allows bytes to be removed from the front of the buffer, regardless of the existing gaps. This is implemented by resetting the corresponding bits of the bitmap. As with the previous patch, this commit must be backported prior to the fix to come on QUIC CRYPTO frames parsing.	2025-10-22 15:04:06 +02:00
Amaury Denoyelle	42c495f3d7	MINOR: ncbmbuf: implement ncbmb_data() Implement ncbmb_data() function for the ncbmbuf type. Its purpose is similar to its ncbuf counterpart: it returns the size in bytes of data starting at a specific offset until the next gap. As with the previous patch, this commit must be backported prior to the fix to come on QUIC CRYPTO frames parsing.	2025-10-22 15:04:06 +02:00
Amaury Denoyelle	1e1a3aa6aa	MINOR: ncbmbuf: implement add This patch implements the add operation for the ncbmbuf type. This function is simpler than its ncbuf counterpart. Indeed, for now only NCB_ADD_OVERWRT mode is supported. This compromise has been chosen as ncbmbuf will be first used for QUIC CRYPTO frames handling, which does not require comparing existing filled blocks during insertion. As with the previous patch, this commit must be backported prior to the fix to come on QUIC CRYPTO frames parsing.	2025-10-22 15:04:06 +02:00
Amaury Denoyelle	b9f91ad3ff	MINOR: ncbmbuf: define new ncbmbuf type Define ncbmbuf which is an alternative non-contiguous buffer implementation. "bm" abbreviation stands for bitmap, which reflects how gaps and filled blocks are encoded. The main purpose of this implementation is to get rid of the ncbuf limitation regarding the minimal size for gaps between two blocks of data. This commit adds the new module ncbmbuf. Along with it, some utility functions such as ncbmb_make(), ncbmb_init() and ncbmb_is_empty() are defined. Public API of ncbmbuf will be extended in the following patches. This patch is not considered a bug fix. However, it will be required to fix an issue encountered on QUIC CRYPTO frames parsing. Thus, it will be necessary to backport the current patch prior to the fix to come.	2025-10-22 15:04:06 +02:00
Amaury Denoyelle	59f0bafef2	MINOR: ncbuf: extract common types ncbuf is a module which provides a non-contiguous buffer type implementation. This patch extracts some basic types related to it into a new file ncbuf_common.h. This patch will be useful to provide a new non-contiguous buffer alternative implementation based on a bitmap. This patch is not a bug fix. However, it is necessary for ncbmbuf implementation which will be required to fix a QUIC issue on CRYPTO frames parsing. Thus, it will be necessary to backport the current patch prior to the fix to come.	2025-10-22 11:11:20 +02:00
Olivier Houchard	d5562e31bd	MEDIUM: stick-tables: Remove the table lock Remove the table lock, it was only protecting the per-table expiration date, and that task is gone.	2025-10-20 15:04:47 +02:00
Olivier Houchard	8bc8a21b25	MEDIUM: stick-tables: Use a per-shard expiration task Instead of having per-table expiration tasks, just use one per shard. The task will now go through all the tables to expire entries. When a table gets an expiration earlier than the one previously known, it will be put in a mt-list, and the task will be responsible to put it into an eb32, ordered based on the next expiration. Each per-shard task will run on a different thread, so it should lead to a better load distribution than the per-table tasks.	2025-10-20 15:04:47 +02:00
Olivier Houchard	945aa0ea82	MINOR: initcalls: Add a new initcall stage, STG_INIT_2 Add a new initcall stage, STG_INIT_2, for stuff to be called after step_init_2() is called, so after we know for sure that global.nbthread will be set. Modify stick-tables stkt_late_init() to run at STG_INIT_2 instead of STG_INIT, in anticipation for it to be enhanced and have a need for global.nbthread.	2025-10-20 15:04:41 +02:00
Olivier Houchard	7a33b90b3c	BUG/MEDIUM: mt_list: Make sure not to unlock the element twice In mt_list_delete(), if the element was not in a list, then n and p will point to it, and so setting n->prev and n->next will be enough to unlock it. Don't do it twice, as once it's been done the first time, another thread may be working with it, and may have added it to a list already, and doing it a second time can lead to list inconsistencies. This should be backported up to 2.8.	2025-10-19 23:21:42 +02:00
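
The ncbmbuf commits listed above describe a non-contiguous buffer in which a bitmap records which bytes are filled, so that gaps of any size are allowed, insertion overwrites existing content, a data query returns the length of filled data from an offset up to the next gap, and an advance operation drops bytes at the front by resetting their bits. The Python sketch below only illustrates that idea and is not HAProxy's C implementation: the class `BitmapBuffer` and its method names are hypothetical, it keeps one boolean per byte instead of a packed bitmap, and it assumes a flat storage area with no wrapping.

```python
class BitmapBuffer:
    """Toy non-contiguous buffer: one flag per byte says whether it is filled.

    Hypothetical illustration only; names and layout are assumptions.
    """

    def __init__(self, size: int) -> None:
        self.size = size
        self.data = bytearray(size)      # storage area
        self.filled = [False] * size     # stand-in for the bitmap

    def add(self, offset: int, chunk: bytes) -> None:
        """Store chunk at offset, overwriting whatever was already there."""
        end = offset + len(chunk)
        if offset < 0 or end > self.size:
            # The only failure case: the data does not fit in the storage.
            raise ValueError("data exceeds buffer storage")
        self.data[offset:end] = chunk
        for i in range(offset, end):
            self.filled[i] = True

    def data_len(self, offset: int) -> int:
        """Number of filled bytes starting at offset, up to the next gap."""
        n = 0
        while offset + n < self.size and self.filled[offset + n]:
            n += 1
        return n

    def advance(self, count: int) -> None:
        """Drop count bytes at the front, whether they were filled or not."""
        if count < 0 or count > self.size:
            raise ValueError("cannot advance past the storage size")
        # Shift the remaining content to the front; the freed space
        # reappears at the tail as unfilled bytes.
        self.data = self.data[count:] + bytearray(count)
        self.filled = self.filled[count:] + [False] * count


if __name__ == "__main__":
    buf = BitmapBuffer(16)
    buf.add(4, b"o!!")          # arrives first, out of order
    buf.add(0, b"hel")          # leaves a one-byte gap at offset 3
    print(buf.data_len(0))      # contiguous data available at the front
    buf.add(3, b"l")            # fills the gap
    print(bytes(buf.data[:buf.data_len(0)]))
```

In this sketch a one-byte gap is perfectly valid, which is the property the commit messages give as the reason for introducing a bitmap-based variant; the cost, as those messages note for the real implementation, is that part of the memory is spent on tracking state rather than on data.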

8810 commits