haproxy

mirror of https://github.com/haproxy/haproxy.git synced 2026-04-15 21:59:41 -04:00

Author	SHA1	Message	Date
Willy Tarreau	9019a5db93	MEDIUM: counters: return aggregate extra counters in ->fill_stats() Now thanks to new macro EXTRA_COUNTERS_AGGR() we can iterate over all thread groups storages when returning the data for a given metric. This remains convenient and mostly transparent. The caller continues to pass the pointer to the metric in the first group, and offsets are calculated for all other groups and data summed. For now all groups except the first one contain only zeroes but reported values are nevertheless correct.	2026-02-26 17:03:53 +01:00
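The aggregation described in the commit above can be pictured with a minimal sketch. It assumes one storage area per thread group laid out at a fixed stride; the structure and function names (`demo_extra_counters`, `demo_aggr`) are hypothetical and are not HAProxy's actual EXTRA_COUNTERS_AGGR() implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: one storage area per thread group, all areas
 * placed at a constant stride. This is an assumption for illustration. */
struct demo_extra_counters {
    char   *base;        /* storage of the first thread group */
    size_t  stride;      /* distance in bytes between two groups */
    int     nb_tgroups;  /* number of thread groups */
};

/* Sum one metric over all groups. The caller passes the pointer to the
 * metric in the first group; the offset is reused for the other groups. */
static unsigned long long demo_aggr(const struct demo_extra_counters *ec,
                                    const unsigned long long *first)
{
    size_t off = (size_t)((const char *)first - ec->base);
    unsigned long long sum = 0;
    int tg;

    assert(off < ec->stride);
    for (tg = 0; tg < ec->nb_tgroups; tg++)
        sum += *(const unsigned long long *)(ec->base + tg * ec->stride + off);
    return sum;
}
```

With only the first group populated and the others zeroed, the sum equals the first group's value, which matches the commit's remark that reported values remain correct for now.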
Willy Tarreau	de0eddf512	MINOR: counters: add EXTRA_COUNTERS_BASE() to retrieve extra_counters base storage The goal is to always retrieve the storage address of the first thread group for the given module. This will be used to iterate over all thread groups. For now it returns the same value as EXTRA_COUNTERS_GET().	2026-02-26 17:03:53 +01:00
Willy Tarreau	8dd22a62a4	CLEANUP: counters: only retrieve zeroes for unallocated extra_counters Since version 2.4 with commit `7f8f6cb926` ("BUG/MEDIUM: stats: prevent crash if counters not alloc with dummy one") we can afford to always update extra_counters because we know they're always either allocated or linked to a dedicated trash. However, the ->fill_stats() callbacks continue to access such values, making it technically possible to retrieve random counters from this trash, which is not really clean. Let's implement an explicit test in the ->fill_stats() functions to only return 0 for the metric when not allocated like this. It's much cleaner because it guarantees that we're returning an empty counter in this case rather than random values. The situation currently happens for dummy servers like the ones used in Lua proxies as well as those used by rings (e.g. used for logging or traces). Normally, none of the objects retrieved via stats or Prometheus is concerned by this unallocated extra_counters situation, so this is more about a cleanup than a real fix.	2026-02-26 08:24:03 +01:00
Willy Tarreau	95a9f472d2	MEDIUM: counters: change the fill_stats() API to pass the module and extra_counters We'll soon need to iterate over thread groups in the fill_stats() functions, so let's first pass the extra_counters and stats_module pointers to the fill_stats functions. They now call EXTRA_COUNTERS_GET() themselves with these elements in order to retrieve the required pointer. Nothing else changed, and it's getting even a bit more transparent for callers. This doesn't change anything visible however.	2026-02-26 08:24:03 +01:00
Willy Tarreau	44932b6c41	BUG/MEDIUM: mux-h2: make sure to always report pending errors to the stream Some stream parsing errors that do not affect the connection result in the parsed block not being transferred from the rx buffer to the channel and not being reported upstream in rcv_buf(), causing the stconn to time out. Let's detect this condition, and propagate term flags anyway since no more progress will be made otherwise. This should be backported at least till 3.2, probably even 2.8.	2026-02-26 00:30:42 +01:00
Willy Tarreau	e67e36c9eb	MINOR: mux-h2: add a new setting, "tune.h2.log-errors" to tweak error logging The H2 mux currently logs whenever some decoding fails. Most of the errors happen at the connection level, but some are even at the stream level, meaning that multiple logs can be emitted for a given connection, which can quickly use some resource for little value. This new setting allows to tweak this and decide to only log errors that affect the connection, or even none at all. This should be backported at least as far as 3.2.	2026-02-25 22:43:40 +01:00
Willy Tarreau	cad6e0b3da	MINOR: mux-h2: also count glitches on invalid trailers Two cases were not causing glitches to be incremented: - invalid trailers - trailers on closed streams This patch addresses this. It could be backported, at least to 3.2.	2026-02-25 22:03:16 +01:00
Christopher Faulet	36282ae348	MEDIUM: mux-h1/mux-h2/mux-fcgi/h3: Disable 0-copy for buffers of different size Today, it is useless to check the buffers size before performing a 0-copy in muxes when data are sent, but it will be mandatory when the large buffers support on channels will be added. Indeed, muxes will still rely on normal buffers, so we must take care to never swap buffers of different size.	2026-02-18 13:26:21 +01:00
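The rule stated in the commit above (never swap buffers of different sizes) can be sketched as follows. The `demo_buf` structure and `demo_xfer()` helper are hypothetical simplifications, not the mux code: the sketch swaps the storage areas when both buffers have the same size and falls back to a copy otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical linear buffer: storage area, its size, and bytes in use. */
struct demo_buf {
    char   *area;
    size_t  size;
    size_t  data;
};

/* Move data from <src> to an empty <dst>.
 * Returns 1 when the buffers were swapped (zero-copy), 0 when copied. */
static int demo_xfer(struct demo_buf *dst, struct demo_buf *src)
{
    size_t n;

    assert(dst->data == 0);
    if (dst->size == src->size) {
        /* same size: exchanging the two descriptors is safe */
        struct demo_buf tmp = *dst;
        *dst = *src;
        *src = tmp;
        return 1;
    }
    /* different sizes: copy what fits and keep the rest in <src> */
    n = src->data < dst->size ? src->data : dst->size;
    memcpy(dst->area, src->area, n);
    memmove(src->area, src->area + n, src->data - n);
    dst->data = n;
    src->data -= n;
    return 0;
}
```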
Christopher Faulet	6bf450b7fe	MINOR: tree-wide: Use the buffer size instead of global setting when possible In many places, we rely on the global.tune.bufsize value instead of using the buffer size. For now, it is not a problem. But if we want to be able to deal with buffers of different sizes, it is good to reduce dependencies on the global value as far as possible. Most of the time, we can use the b_size() or c_size() functions. The main change is performed on the error snapshot where the buffer size was added into the error_snapshot structure.	2026-02-18 13:26:20 +01:00
Christopher Faulet	cda056b9f4	BUG/MEDIUM: mux-h2/quic: Stop sending via fast-forward if stream is closed It is illegal to send data if the stream is already closed. The case is properly handled when data are sent via snd_buf(), by draining the data. But it was still possible to process these data via nego_ff(). So, in this patch, both for the H2 and QUIC multiplexers, the fast-forward is disabled if the stream is closed and nothing is performed. Doing so, we will automatically fall back on the regular sending path and be able to drain data in snd_buf(). Thanks to Mike Walker for his investigation on the subject. This patch should be backported as far as 3.0.	2026-02-18 09:44:09 +01:00
Ilia Shipitsin	f8a77ecf62	CLEANUP: assorted typo fixes in the code, commits and doc	2025-12-25 19:45:29 +01:00
Willy Tarreau	0901f60cef	MINOR: mux-h2: perform a graceful close at 75% glitches threshold This avoids hitting the hard wall for connections with non-compliant peers that would be accumulating errors over long connections. We now permit to recycle the connection early enough to reset the connection counter. This was tested artificially by adding this to h2c_frt_handle_headers(): h2c_report_glitch(h2c, 1, "new stream"); or this to h2_detach(): h2c_report_glitch(h2c, 1, "detaching"); and injecting using h2load -c 1 -n 1000 0:4445 on a config featuring tune.h2.fe.glitches-threshold 1000: finished in 8.74ms, 85802.54 req/s, 686.62MB/s requests: 1000 total, 751 started, 751 done, 750 succeeded, 250 failed, 250 errored, 0 timeout status codes: 750 2xx, 0 3xx, 0 4xx, 0 5xx traffic: 6.00MB (6293303) total, 132.57KB (135750) headers (space savings 29.84%), 5.86MB (6144000) data min max mean sd +/- sd time for request: 9us 178us 10us 6us 99.47% time for connect: 139us 139us 139us 0us 100.00% time to 1st byte: 339us 339us 339us 0us 100.00% req/s : 87477.70 87477.70 87477.70 0.00 100.00% The failures are due to h2load not supporting reconnection.	2025-12-20 19:26:29 +01:00
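The 75% rule from the commit above can be sketched as a small decision function. This is an illustration only: the names are hypothetical, treating a threshold of 0 as "disabled" is an assumption, and the exact comparison used by the mux (strict or not) is not stated in the commit message.

```c
#include <assert.h>

/* Hypothetical outcome of a glitch count check. */
enum demo_glitch_action {
    DEMO_GLITCH_KEEP,      /* nothing to do */
    DEMO_GLITCH_GRACEFUL,  /* stop accepting new streams, close gracefully */
    DEMO_GLITCH_KILL       /* hard limit reached */
};

static enum demo_glitch_action demo_glitch_check(unsigned int glitches,
                                                 unsigned int threshold)
{
    /* 75% of the threshold, computed in 64 bits to avoid overflow */
    unsigned long long soft = (unsigned long long)threshold * 3 / 4;

    if (threshold == 0)          /* assumption: 0 means no threshold */
        return DEMO_GLITCH_KEEP;
    if (glitches >= threshold)
        return DEMO_GLITCH_KILL;
    if (glitches >= soft)
        return DEMO_GLITCH_GRACEFUL;
    return DEMO_GLITCH_KEEP;
}
```

With a threshold of 1000, the soft limit in this sketch is 750, which is in line with the roughly 750 requests that completed on the single h2load connection in the test quoted above.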
Willy Tarreau	52adeef7e1	MINOR: mux-h2: add missing glitch count for non-decodable H2 headers One rare error case, where a protocol error is produced on the stream because response headers cannot be decoded, wasn't being accounted as a glitch, so let's fix it.	2025-12-20 19:11:16 +01:00
Willy Tarreau	9a046fc3ad	BUG/MEDIUM: mux-h2: synchronize all conditions to create a new backend stream In H2 the conditions to create a new stream differ for a client and a server when a GOAWAY was exchanged. While on the server, any stream whose ID is lower than or equal to the one advertised in GOAWAY is valid, for a client it's forbidden to create any stream after receipt of a GOAWAY, even if its ID is lower than or equal to the last one, despite the server not being able to tell the difference from the number of streams in flight. Unfortunately, the logic in the code did not always reflect this specificity of the client (the backend code in our case), and most often considered that it was still permitted to create a new stream until the max_id was greater than or equal to the advertised last_id. This is for example what h2c_is_dead() and h2c_streams_left() do. In other places, such as h2_avail_streams(), the rule is properly taken into account. Very often the advertised last_id is the same, and this is also what haproxy does (which explains why it's impossible to reproduce the issue by chaining two haproxy layers), but a server may wish to advertise any ID including 2^31-1 as mentioned in the spec, and in this case the functions would behave differently. This discrepancy results in a corner case where a GOAWAY received on an idle connection will cause the next stream creation to be initially accepted but then rejected via h2_avail_streams(), and the connection left in a bad state, still attached to the session due to http-reuse safe, but not reinserted into idle list, since the backend code currently is not able to properly recover from this situation. Worse, the idle flags are no longer on it but TASK_F_USR1 still is, and this makes the recently added BUG_ON() rightfully trigger since this case is not supposed to happen. 
Admittedly more of the backend recovery code needs to be reworked, however the mux must consistently decide whether or not a connection may be reused or needs to be released. This commit fixes the affected logic by introducing a new function "h2c_reached_last_stream()" which says if a connection has reached its last stream, regardless of the side, and using this one everywhere max_id was compared to last_id. This is sufficient to address the corner case that be_reuse_connection() currently cannot recover from. This is in relation to GH issue #3215 and it should be sufficient to fix the issue there. Thanks to Chris Staite for reporting the issue and kudos to Amaury for spotting the events sequence that can lead to this situation. This patch must be backported to 3.3 first, then to older versions later. It's worth noting that it's much more difficult to observe the issue before 3.3 because the BUG_ON() is not there, and the possibly non-released connection might end up being killed for other reasons (timeouts etc). But one possible visible effect might be the impossibility to delete a server (which Chris observed in 3.3).	2025-12-18 17:01:32 +01:00
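The side-dependent rule described in this commit can be sketched as below. This is a reading of the commit message, not the body of h2c_reached_last_stream(): the structure, its field names, and the convention that a negative `last_sid` means "no GOAWAY received" are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced view of an H2 connection. */
struct demo_h2_conn {
    int is_backend;  /* 1 when we act as the H2 client */
    int last_sid;    /* last stream ID from GOAWAY, negative if none (assumed) */
    int max_id;      /* highest stream ID used so far */
};

/* Returns non-zero if no further stream may be created on <c>. */
static int demo_reached_last_stream(const struct demo_h2_conn *c)
{
    assert(c != NULL);
    if (c->last_sid < 0)
        return 0;               /* no GOAWAY exchanged yet */
    if (c->is_backend)
        return 1;               /* a client may not create streams after GOAWAY */
    return c->max_id >= c->last_sid;  /* server side: IDs up to last_sid are valid */
}
```

The backend case is the one that matters for the bug: a server advertising a large last stream ID such as 2^31-1 would pass the plain max_id versus last_id comparison, yet the client must still stop creating streams.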
Willy Tarreau	3ec5818807	MINOR: h2/trace: emit a trace of the received RST_STREAM type Right now we don't get any state trace when receiving an RST_STREAM, and this is not convenient because RST_STREAM(0) is not visible at all, except in developer level because the function is entered and left. Let's extract the RST code first and always log it using TRACE_PRINTF() (along with h2c/h2s) so that it's possible to detect certain codes being used.	2025-12-10 15:58:56 +01:00
Christopher Faulet	8e08a635eb	MINOR: muxes: Support an optional ALPN string when defining mux protocols When a multiplexer protocol is defined, it is now possible to specify the ALPN it supports, in binary format. This info is optional. For now only the h2 and the h1 multiplexers define an ALPN because this will be mandatory for a fix. But this could be used in the future for other purposes. This patch will be mandatory for the next fix.	2025-11-20 16:14:52 +01:00
Willy Tarreau	4a6dec7193	DEBUG: servers: add a few checks for stress-testing idle conns The latest idle conns fix `9481cef948` ("BUG/MEDIUM: connection: do not reinsert a purgeable conn in idle list") addresses a very hard-to-hit case which manifests itself as a failed attempt to reuse a connection because conn->mux is NULL: Program terminated with signal SIGSEGV, Segmentation fault. #0 0x0000655410b8642c in conn_backend_get (reuse_mode=4, srv=srv@entry=0x6554378a7140, sess=sess@entry=0x7cfe140948a0, is_safe=is_safe@entry=0, hash=hash@entry=910818338996668161) at src/backend.c:1390 1390 if (conn->mux->takeover && conn->mux->takeover(conn, i, 0) == 0) { However the condition that leads to this situation can be detected earlier, by the presence of the connection in the toremove_list, whose race window is much larger and easier to detect. This patch adds a few BUG_ON_STRESS() at selected places that can detect this condition. When built with -DDEBUG_STRESS and run under stress with two distinct processes communicating over H2 over SSL, under a stress of 400-500k req/s, the front process usually crashes in the first 10-30s triggering in _srv_add_idle() if the fix above is reverted (and it does not crash with the fix). This is mainly included to serve as an illustration of how to instrument the code for seamless stress testing.	2025-11-14 17:00:17 +01:00
Amaury Denoyelle	9481cef948	BUG/MEDIUM: connection: do not reinsert a purgeable conn in idle list A recent patch was introduced to fix a rare race condition in idle connection code which would result in a crash. The issue occurs when a MUX IO handler runs on top of a connection moved into the purgeable list. The connection would be considered as present in the idle list instead, and reinserted in it at the end of the handler while still in the purge list. `096999ee20` BUG/MEDIUM: connections: permit to permanently remove an idle conn This patch solves the described issue. However, it introduces another bug as it may clear connection flags when removing a connection from its parent list. However, these flags now serve primarily as a status which indicates that the connection is accounted by the server. When a backend connection is freed, server idle/used counters are decremented according to these flags. With the above patch, an incorrect counter could be adjusted and thus wrapping could occur. The first impact of this bug is that it may distort the estimated number of connections needed by servers, which would result either in poor reuse rate or too many idle connections kept. Another noticeable impact is that it may prevent server deletion. The main problem of the original and current issues is that connection flags are misinterpreted as telling if a connection is present in the idle list. As already described here, in fact these flags are solely a status which indicates that the connection is accounted in server counters.
Thus, here are the definitive conclusions that can be drawn: * (conn->flags & CO_FL_LIST_MASK) == 1: the connection is accounted by the server; it may or may not be present in the idle list * (conn->flags & CO_FL_LIST_MASK) == 0: the connection is not accounted and not present in the idle list The discussion above does not mention session list, but a similar pattern can be observed when CO_FL_SESS_IDLE flag is set. To keep the original issue solved and fix the current one, the IO MUX handler prologues are rewritten. Now, flags are no longer checked for list membership and the LIST_INLIST macro is used instead. This is definitely clearer with conn_in_list purpose here. At the end of IO MUX handlers, conn idle flags may be checked if conn_in_list was true, to reinsert the connection either in idle or safe list. This is considered safe as no function should modify idle flags when a connection is not stored in a list, except during conn_free() operation. This patch must be backported to every stable version after revert of the above commit. It should be applicable up to 3.0 without any issue. On 2.8 and below, <idle_list> connection member does not exist. It should be safe to check <leaf_p> tree node as a replacement.	2025-11-14 16:06:34 +01:00
Amaury Denoyelle	d79295d89b	Revert "BUG/MEDIUM: connections: permit to permanently remove an idle conn" The target patch fixes a rare race condition which happens when a MUX IO handler is working on a connection already moved into the purge list. In this case, the handler will incorrectly move the connection back into the idle list. To fix this, conn_delete_from_tree() was extended to remove flags along with the connection from the idle list. This was performed when the connection is moved into the purge list. However, it introduces another issue related to the idle server connection accounting. Thus it is necessary to revert it prior to the incoming newer fix. This patch must be backported to every version where the original commit is.	2025-11-14 16:06:34 +01:00
Willy Tarreau	5fe4677231	MINOR: server: move the lock inside srv_add_idle() Almost all callers of _srv_add_idle() lock the list then call the function. It's not the most efficient and it requires some care from the caller to take care of that lock. Let's change this a little bit by having srv_add_idle() that takes the lock and calls _srv_add_idle() that is now inlined. This way callers don't have to handle the lock themselves anymore, and the lock is only taken around the sensitive parts, not the function call+return. Interestingly, perf tests show a small perf increase from 2.28-2.32M RPS to 2.32-2.37M RPS on a 128-thread system.	2025-11-06 13:16:24 +01:00
Willy Tarreau	096999ee20	BUG/MEDIUM: connections: permit to permanently remove an idle conn There's currently a function conn_delete_from_tree() which is used to detach an idle connection from the tree it's currently attached to so that it is no longer found. This function is used in three circumstances: - when picking a new connection that no longer has any avail stream - when temporarily working on the connection from an I/O handler, in which case it's re-added at the end - when killing a connection The 2nd case above is quite specific, as it requires to preserve the CO_FL_LIST_MASK flags so that the connection can be re-inserted into the proper tree when leaving the handler. However, there's a catch. When killing a connection, we want to be certain it will not be reinserted into the tree. The flags preservation is causing a tiny race if an I/O happens while the connection is in the kill list, because in this case the I/O handler will note the connection flags, do its work, then reinsert the connection where it believed it was, then the connection gets purged, and another user can find it in the tree. The issue is very difficult to reproduce. On a 128-thread machine it happens in H2 around 500k req/s after around 50M requests. In H1 it happens after around 1 billion requests. The fix here consists in passing an extra argument to the function to indicate if the removal is permanent or not. When it's permanent, the function will clear the associated flags. The callers were adjusted so that all those dequeuing a connection in order to kill it do it permanently and all other ones do it only temporarily. A slightly different approach could have worked: the function could always remove all flags, and the callers would need to restore them. But this would require trickier modifications of the various call places, compared to only passing 0/1 to indicate the permanent status. This will need to be backported to all stable versions. 
The issue was at least reproduced since 3.1 (not tested before). The patch will need to be adjusted for 3.2 and older, because a 2nd argument "thr" was added in 3.3, so the patch will not apply to older versions as-is.	2025-11-05 11:08:25 +01:00
Willy Tarreau	59c599f3f0	BUG/MEDIUM: mux-h2: make sure not to move a dead connection to idle In h2_detach(), it looks possible to place a dead connection back to the idle list, and to later call h2_release() on it once detected as dead. It's not certain that it happens but nothing in the code shows it is not possible, so better make sure it cannot happen. This should be preventively backported to all versions.	2025-11-05 11:08:25 +01:00
Willy Tarreau	a1f26ca307	BUG/MINOR: mux-h2: send the preface along with the first request if needed Tests involving 0-RTT and H2 on the backend show that 0-RTT is being partially used but does not work. The analysis shows that only the preface and settings are sent using early-data and the request is sent separately. As explained in the previous patch, this is caused by the fact that a wakeup of the iocb is needed just to send the preface, then a new call to process_stream is needed to try sending again. Here with this patch, we're making h2_snd_buf() able to send the preface if it was not yet sent. Thanks to this, the preface, settings and first request can now leave as a single TCP segment. In case of TLS with 0-RTT, it now allows all the block to leave in early data. Even in clear-text H2, we're now seeing a 15% lower context-switch count, and the number of calls to process_stream() per connection dropped from 3 to 2. The connection rate increased by an extra 9.5%. Compared to without the last 3 patches, this is a 22% reduction of context-switches, 33% reduction of process_stream() calls, and 15.7% increase in connection rate. And more importantly, 0-RTT now really works with H2 on the backend, saving one full RTT on the first request. This fix is only for a missed optimization and a non-functional 0-RTT on the backend. It's worth backporting it, but it doesn't cause enough harm to hurry a backport. Better wait for it to live a little bit in 3.3 (till at least a week or two after the final release) before backporting it. It's not sure that it's worth going beyond 3.2 in any case.
It depends on these two previous commits: MEDIUM: mux-h2: do not needlessly refrain from sending data early MINOR: mux-h2: extract the code to send preface+settings into its own function	2025-10-30 18:16:54 +01:00
Willy Tarreau	d5aa3e19cc	MINOR: mux-h2: extract the code to send preface+settings into its own function The code that deals with sending preface + settings and changing the state currently is in h2_process_mux(), but we'll want to do it as well from h2_snd_buf(), so let's move it to a dedicated function first. At this point there is no functional change.	2025-10-30 18:16:54 +01:00
Willy Tarreau	b0e8edaef2	MEDIUM: mux-h2: do not needlessly refrain from sending data early The mux currently refrains from sending data before H2_CS_FRAME_H, i.e. before the peer's SETTINGS frame was received. While it makes sense on the frontend, it's causing harm on the backend because it forces the first request to be sent in two halves over an extra RTT: first the preface and settings, second the request once the settings are received. This is totally contrary to the philosophy of the H2 protocol, consisting in permitting the client to send as soon as possible. Actually what happens is the following: - process_stream() calls connect_server() - connect_server() creates a connection, and if the proto/alpn is guessed or known, the mux is instantiated for the current request. - the H2 init code wakes the h2 tasklet up and returns - process_stream() tries to send the request using h2_snd_buf(), but that one sees that we're before H2_CS_FRAME_H, refrains from doing so and returns. - process_stream() subscribes and quits - the h2 tasklet can now execute to send the preface and settings, which leave as a first TCP segment. The connection is ready. - the iocb is woken again once the server's SETTINGS frame is received, turning the connection to the H2_CS_FRAME_H state, and the iocb wake up process_stream(). - process_stream() executes again and can try to send again. - h2_snd_buf() is called and finally sends the request as a second TCP segment. Not only this is inefficient, but it also renders 0-RTT and TFO impossible on H2 connections. When 0-RTT is used, only the preface and settings leave as early data (the very first data of that connection), which is totally pointless. In order to fix this, we have to go through a few steps: - first we need to let data be sent to a server immediately after the SETTINGS frame was sent (i.e. in H2_CS_SETTINGS1 state instead of H2_CS_FRAME_H). However, some protocol extensions are advertised by the server using SETTINGS (e.g. 
RFC8441) and some requests might need to know the existence of such extensions. For this reason we're adding a new h2c flag, H2_CF_SETTINGS_NEEDED, which indicates that some operations were not done because a server's SETTINGS frame is needed. This is set when trying to send a protocol upgrade or extended CONNECT during H2_CS_SETTINGS1, indicating that it's needed to wait for H2_CS_FRAME_H in this case. The flag is always set on frontend connections. This is what is being done in this patch. - second, we need to be able to push the preface opportunistically with the first h2_snd_buf() so that it's not needed to wake the tasklet up just to send that and wake process_stream() again. This will be in a separate patch. By doing the first step, we're at least saving one needless tasklet wakeup per connection (~9%), which results in ~5% backend connection rate increase.	2025-10-30 18:16:54 +01:00
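The first step described in this commit (allowing data to be sent in the SETTINGS1 state unless the peer's SETTINGS frame is required) can be sketched as a small predicate. The state and flag names below only mirror H2_CS_SETTINGS1, H2_CS_FRAME_H and H2_CF_SETTINGS_NEEDED; the structure, the function, and the choice to keep waiting once the flag is set are assumptions made for illustration, not the mux's actual logic.

```c
#include <assert.h>

/* Hypothetical subset of connection states, in increasing order. */
enum demo_cs { DEMO_CS_PREFACE, DEMO_CS_SETTINGS1, DEMO_CS_FRAME_H };

#define DEMO_CF_SETTINGS_NEEDED 0x1u  /* peer's SETTINGS frame is required */

struct demo_mux {
    enum demo_cs st;
    unsigned int flags;  /* frontend connections start with the flag set */
};

/* Returns 1 if a request may be sent now, 0 if it must wait.
 * <needs_settings> is set for requests relying on extensions advertised
 * in SETTINGS (e.g. protocol upgrade or extended CONNECT). */
static int demo_may_send(struct demo_mux *m, int needs_settings)
{
    if (m->st >= DEMO_CS_FRAME_H)
        return 1;                       /* peer's SETTINGS already received */
    if (m->st < DEMO_CS_SETTINGS1)
        return 0;                       /* our own SETTINGS not sent yet */
    if (needs_settings)
        m->flags |= DEMO_CF_SETTINGS_NEEDED;
    return !(m->flags & DEMO_CF_SETTINGS_NEEDED);
}
```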
Christopher Faulet	914538cd39	MEDIUM: htx: Remove the HTX extra field Thanks to previous changes, it is now possible to remove the <extra> field from the HTX structure. HTX_FL_ALTERED_PAYLOAD flag is also removed because it is now unused.	2025-10-08 11:10:42 +02:00
Christopher Faulet	bc8c6c42f4	MINOR: mux-h2: Set known input payload length of the sedesc Set <kip> value when data are transferred to the upper layer, in h2_rcv_buf(). The new <body_len> field of the H2S is used to increment <kip> value and then it is reset. The patch relies on the previous one ("MINOR: mux-h2: Save the known length of the payload").	2025-10-08 11:01:36 +02:00
Christopher Faulet	3a6a576e73	MINOR: mux-h2: Use <body_len> H2S field for payload without content-length Before, the <body_len> H2S field was only used to verify that the announced content-length value was respected. Now, this field is used for all messages. Messages with a content-length are still handled the same way. <body_len> is set to the content-length value and decremented by the size of each DATA frame. For other messages, the value is initialized to ULLONG_MAX and still decremented by the size of each DATA frame. This change is mandatory to properly define the known input payload length value of the sedesc.	2025-10-08 11:01:36 +02:00
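The <body_len> accounting described in the commit above can be sketched as follows. The structure and helpers are hypothetical; in particular, rejecting a DATA frame larger than the remaining length is an assumption about how the content-length check could be expressed, not a copy of the mux code.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical, reduced view of an H2 stream. */
struct demo_h2s {
    unsigned long long body_len;  /* remaining payload, or ULLONG_MAX if unknown */
};

/* Initialize <body_len> from the headers: the announced content-length
 * when there is one, ULLONG_MAX otherwise. */
static void demo_h2s_init(struct demo_h2s *s, int has_cl, unsigned long long cl)
{
    s->body_len = has_cl ? cl : ULLONG_MAX;
}

/* Account for one DATA frame of <len> bytes.
 * Returns 0 on success, -1 if the frame exceeds what remains (assumed check). */
static int demo_h2s_on_data(struct demo_h2s *s, unsigned long long len)
{
    if (len > s->body_len)
        return -1;
    s->body_len -= len;
    return 0;
}
```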
Willy Tarreau	2d6b5c7a60	MEDIUM: connection: reintegrate conn_hash_node into connection Previously the conn_hash_node was placed outside the connection due to the big size of the eb64_node that could have negatively impacted frontend connections. But having it outside also means that one extra allocation is needed for each backend connection, and that one memory indirection is needed for each lookup. With the compact trees, the tree node is smaller (16 bytes vs 40) so the overhead is much lower. By integrating it into the connection, we're also eliminating one pointer from the connection to the hash node and one pointer from the hash node to the connection (in addition to the extra object bookkeeping). This results in saving at least 24 bytes per total backend connection, and only inflates connections by 16 bytes (from 240 to 256), which is a reasonable compromise. Tests on a 64-core EPYC show a 2.4% increase in the request rate (from 2.08 to 2.13 Mrps).	2025-09-16 09:23:46 +02:00
Willy Tarreau	ceaf8c1220	MEDIUM: connection: move idle connection trees to ceb64 Idle connection trees currently require a 56-byte conn_hash_node per connection, which can be reduced to 32 bytes by moving to ceb64. While ceb64 is theoretically slower, in practice here we're essentially dealing with trees that almost always contain a single key and many duplicates. In this case, ceb64 insert and lookup functions become faster than eb64 ones because all duplicates are a list accessed in O(1) while it's a subtree for eb64. In tests it is impossible to tell the difference between the two, so it's worth reducing the memory usage. This commit brings the following memory savings to conn_hash_node (one per backend connection), and to srv_per_thread (one per thread and per server), listed as struct: before, after, delta: conn_hash_node: 56, 32, -24; srv_per_thread: 96, 72, -24. The delicate part is conn_delete_from_tree(), because we need to know the tree root the connection is attached to. But thanks to recent cleanups, it's now clear enough (i.e. idle/safe/avail vs session are easy to distinguish).	2025-09-16 09:23:46 +02:00
Willy Tarreau	95b8adff67	MINOR: connection: pass the thread number to conn_delete_from_tree() We'll soon need to choose the server's root based on the connection's flags, and for this we'll need the thread it's attached to, which is not always the current one. This patch simply passes the thread number from all callers. They know it because they just set the idle_conns lock on it prior to calling the function.	2025-09-16 09:23:46 +02:00
Christopher Faulet	b901e56acd	BUG/MEDIUM: mux-h2: Reinforce conditions to report an error to app-layer stream This patch relies on the previous one ("BUG/MEDIUM: mux-h2: Report RST/error to app-layer stream during 0-copy fwding"). When the end of the connection is detected, so when the H2_CF_END_REACHED flag is set after the shutdown was received and all incoming data were processed, if a stream is blocked by the flow control (the stream one or the connection one), an error must be reported to the app-layer stream. Otherwise, outgoing data won't be sent and the opposite side will handle this as a lack of room. So the stream will be blocked until the write timeout is triggered. By reporting the error early, the stream can be immediately closed. This patch should be backported to 3.2. For older versions, it is probably a good idea to wait for a bug report.	2025-09-09 16:30:54 +02:00
Christopher Faulet	22e14f7b54	BUG/MEDIUM: mux-h2: Report RST/error to app-layer stream during 0-copy fwding In h2_nego_ff(), it is important to report reset and error to app-layer stream and to send the RST-STREAM frame accordingly. It is not clear if it is an issue or not. But it is clearly a difference with the classical forwarding via h2_snd_buf. And it is mandatory for the next fix. This patch should be backported to 3.2. But it is probably a good idea to not backport it on older versions, except if a bug is reported in this area.	2025-09-09 16:30:21 +02:00
Christopher Faulet	3b7112aa1d	BUG/MINOR: mux-h2: Remove H2_CF_DEM_DFULL flags when the demux buffer is reset This only happens when a connection error is detected or when the H2 connection is in ERR/ERR2 state. The demux buffer is explicitly reset. In that case, it is important to remove the flag reporting this buffer as full. It is probably worth to backport this patch to 3.2. But it is not mandatory on older versions because it does not fix any known issue.	2025-09-09 16:29:14 +02:00
Christopher Faulet	12edcccc82	BUG/MEDIUM: mux-h2: Restart reading when mbuf ring is no longer full When the mbuf ring buffer is full, the flag H2_CF_DEM_MROOM is set on the H2 connection to block any demux. It is important to properly handle ACK frames. However, we must take care to restart reading when some data were removed from the mbuf. Otherwise, we may block the demux for no reason. It is especially an issue if the demux buffer is full. In that case, the H2 connection is blocked, waiting for the timeout. This patch should be backported to 3.2. But it is probably a good idea to not backport it on older versions, except if a bug is reported in this area.	2025-09-09 16:07:20 +02:00
Christopher Faulet	c6e4584d2b	BUG/MEDIUM: mux-h2: Don't block receives in H2_CS_ERROR and H2_CS_ERROR2 states The H2 connection is switched to ERR when a GOAWAY must be sent and in ERR2 when it is sent. In these states, no more data can be emitted by the mux. But there is no reason to not try to process incoming data or to not try to receive data. It is especially important to be able to get the shutdown from the TCP connection when a SSL connection was previously detected. Otherwise, it is possible to block a H2 connection until its timeout expiration to be able to close it. This patch should be backported to 3.2. But it is probably a good idea to not backport it on older versions, except if a bug is reported in this area.	2025-09-09 16:07:20 +02:00
Christopher Faulet	626d7934cf	BUG/MEDIUM: mux-h2: Reset MUX blocking flags when a send error is caught When a send error is detected on the underlying connection, a pending error is reported to the H2 connection by setting H2_CF_ERR_PENDING flag. When this happens the tail of the mux ring buffer is reset. However some blocking flags remain set and have no chance to be removed later because of the pending error. Especially the flag H2_CF_DEM_MROOM which blocks data demultiplexing. Thus, it is possible to block a H2 connection with unparsed incoming data. Worse, if a read event is received, it could lead to a wakeup loop between the H2 connection and the underlying SSL connection. The H2 connection is unable to convert the pending error to a fatal error because the demultiplexing is blocked. In the meantime, it tries to receive more data because of the not-consumed read event. On the underlying connection side, the error detected earlier blocks the read, but the H2 connection is woken up to handle the error. To fix the issue, blocking flags must be removed when a send error is caught, H2_CF_MUX_MFULL and H2_CF_DEM_MROOM flags. But, it is not necessary to only release the tail of the mbuf ring. When a send error is detected, all outgoing data can be flushed. So, now, in h2_send(), h2_release_mbuf() function is called on pending error. The mbuf ring is fully released and H2_CF_MUX_MFULL and H2_CF_DEM_MROOM flags are removed. Many thanks to Krzysztof Kozłowski for his help to spot this issue. This patch could be backported at least as far as 2.8. But it is a bit sensitive. So, it is probably a good idea to backport it to 3.2 for now and wait for bug reports on older versions.	2025-09-09 16:07:20 +02:00
Amaury Denoyelle	687df405fe	BUG/MINOR: connection: streamline conn detach from lists Over their lifetime, connections are attached to different lists. These lists depend on whether connection is on frontend or backend side. Attach point members are stored via a union in struct connection. The next commit reorganizes them so that a proper frontend/backend separation is performed : commit `a96f1286a7` BUG/MINOR: connection: rearrange union list members On conn_free(), connection instance must be removed from these lists to ensure there is no use-after-free case. However code was still shaky there, despite no real issue. Indeed, <toremove_list> was detached for all connections, despite being used on backend side only. This patch streamlines the freeing of connection. Now, <toremove_list> detach is performed in conn_backend_deinit(). Moreover, a new helper conn_frontend_deinit() is defined. It ensures that <stopping_list> detach is done. Prior it was performed individually by muxes. Note that a similar procedure is performed when the connection is reversed. Hence, conn_frontend_deinit() is now used here as well, rendering reversal from FE to BE or vice versa symmetrical. As mentioned above, no crash occurred prior to this patch, but the code was fragile, in particular access to <toremove_list> for frontend connections. Thus this patch is considered as a bug fix worthy of a backport along with above mentioned patch, currently up to 3.0.	2025-09-04 18:31:20 +02:00
Amaury Denoyelle	1868ca9a95	MINOR: conn/muxes/ssl: add ASSUME_NONNULL() prior to _srv_add_idle When manipulating idle backend connections for input/output processing, special care is taken to ensure the connection cannot be accessed by another thread, for example via a takeover. When processing is over, connection is reinserted in its original list. A connection can either be attached to a session (private ones) or a server idle tree. In the latter case, <srv> is guaranteed to be non null prior to _srv_add_idle() thanks to CO_FL_LIST_MASK comparison with conn flags. This patch adds an ASSUME_NONNULL() to better reflect this. This should fix coverity reports from github issue #3095.	2025-09-01 15:35:22 +02:00
Amaury Denoyelle	d971d3fed8	MINOR: muxes: adjust takeover with buf_wait interaction Takeover operation defines an argument <release>. It's a boolean which if set indicates that freed connection resources during the takeover do not have to be reallocated on the new thread. Typically, it is set to false when takeover is performed to reuse a connection. However, when used to be able to delete a connection from a different thread, <release> should be set to true. Previously, <release> was only set in conjunction with "del server" handler. This operation was performed under thread isolation, which guarantees that not thread-safe operation such as removal from buf_wait list could be performed on takeover if <release> was true. In the contrary case, takeover operation would fail. Recently, "del server" handler has been adjusted to remove idle connection cleanup with takeover. As such, <release> is never set to true in remaining takeover usage. However, takeover is also used to enforce strict-maxconn on a server. This is performed to delete a connection from any thread, which is the primary reason of <release> to true. But for the moment as takeover implementers consider that thread isolation is active if <release> is set, this is not yet applicable for strict-maxconn usage. Thus, the purpose of this patch is to adjust takeover implementation. Remove assumption between <release> and thread-isolation mode. It's not possible to remove a connection from a buf_wait list, an error will be returned in any case.	2025-08-28 16:09:48 +02:00
Amaury Denoyelle	73fd12e928	MEDIUM: conn/muxes/ssl: remove BE priv idle conn from sess on IO This is a direct follow-up of previous patch which adjusts idle private connections access via input/output handlers. This patch implements the handlers prologue part. Now, private idle connections require a similar treatment with non-private idle connections. Thus, private conns are removed temporarily from their session under protection of idle_conns lock. As locking usage is already performed in input/output handler, session_unown_conn() cannot be called. Thus, a new function session_detach_idle_conn() is implemented in session module, which performs basically the same operation but relies on external locking.	2025-08-28 15:08:35 +02:00
Amaury Denoyelle	8de0807b74	MEDIUM: conn/muxes/ssl: reinsert BE priv conn into sess on IO completion When dealing with input/output on a connection related handler, special care must be taken prior to access the connection if it is considered as idle, as it could be manipulated by another thread. Thus, connection is first removed from its idle tree before processing. The connection is reinserted on processing completion unless it has been freed during it. Idle private connections are not concerned by this, because takeover is not applied on them. However, a future patch will implement purging of these connections along with regular idle ones. As such, it is necessary to also protect private connections usage now. This is the subject of this patch and the next one. With this patch, input/output handlers epilogue of muxes/SSL/conn_notify_mux() are adjusted. A new code path is able to deal with a connection attached to a session instead of a server. In this case, session_reinsert_idle_conn() is used. Contrary to session_add_conn(), this new function is reserved for idle connections usage after a temporary removal. Contrary to _srv_add_idle() used by regular idle connections, session_reinsert_idle_conn() may fail as an allocation can be required. If this happens, the connection is immediately destroyed. This patch has no effect for now. It must be coupled with the next one which will temporarily remove private idle connections on input/output handler prologue.	2025-08-28 15:08:35 +02:00
Amaury Denoyelle	9574867358	MINOR: muxes: enforce thread-safety for private idle conns When a backend connection becomes idle, muxes must activate some protection to mark future access on it as dangerous. Indeed, once a connection is inserted in an idle list, it may be manipulated by another thread, either via takeover or scheduled for purging. Private idle connections are stored into a session instead of the server tree. They are never subject to a takeover for reuse or purge mechanism. As such, currently they do not require the same level of protection. However, a new patch will introduce support for private idle connections purging. Thus, the purpose of this patch is to ensure protection is activated as well now. TASK_F_USR1 was already set on them as an anticipation for such need. Only some extra operations were missing, most notably xprt_set_idle() invocation. Also, return path of muxes detach operation is adjusted to ensure such connections are never accessed after insertion.	2025-08-28 14:55:21 +02:00
Amaury Denoyelle	67df6577ff	MEDIUM: server: close new idle conns if server in maintenance Currently, when a server is set on maintenance mode, its idle connections are scheduled for purge. However, this does not prevent currently used connections from becoming idle later on, even if the server is still off. Change this behavior : an idle connection is now rejected by the server if it is in maintenance. This is implemented with a new condition in srv_add_to_idle_list() which returns an error value. In this case, muxes stream detach callback will immediately free the connection. A similar change is also performed in each MUX and SSL I/O handlers and in conn_notify_mux(). An idle connection is not reinserted in its idle list if server is in maintenance, but instead it is immediately freed.	2025-08-28 14:55:18 +02:00
Amaury Denoyelle	901de11157	BUG/MEDIUM: mux-h2: fix crash on idle-ping due to unwanted ABORT_NOW An ABORT_NOW() was used during debugging idle-ping but was not removed from the final code. This may cause a crash, in particular when mixing idle-ping with shorter http-request/http-keep-alive values. Fix this situation by removing ABORT_NOW() statement. This should fix github issue #3079. This must be backported up to 3.2.	2025-08-21 14:21:11 +02:00
Olivier Houchard	3d685fcb7d	MINOR: xprt: Add recvmsg() and sendmsg() parameters to rcv_buf() and snd_buf(). In rcv_buf() and snd_buf(), use sendmsg/recvmsg instead of send and recv, and add two new optional parameters to provide msg_control and msg_controllen. Those are unused for now, but will be used later for kTLS.	2025-08-20 17:28:03 +02:00
Willy Tarreau	c264ea1679	MEDIUM: tree-wide: replace most DECLARE_POOL with DECLARE_TYPED_POOL This will make the pools size and alignment automatically inherit the type declaration. It was done like this: sed -i -e 's:DECLARE_POOL(\([^,]*,[^,]*,\s*\)sizeof(\([^)]*\))):DECLARE_TYPED_POOL(\1\2):g' $(git grep -lw DECLARE_POOL src addons) sed -i -e 's:DECLARE_STATIC_POOL(\([^,]*,[^,]*,\s*\)sizeof(\([^)]*\))):DECLARE_STATIC_TYPED_POOL(\1\2):g' $(git grep -lw DECLARE_STATIC_POOL src addons) 81 replacements were made. The only remaining ones are those which set their own size without depending on a structure. The few ones with an extra size were manually handled. It also means that the requested alignments are now checked against the type's. Given that none is specified for now, no issue is reported. It was verified with "show pools detailed" that the definitions are exactly the same, and that the binaries are similar.	2025-08-11 19:55:30 +02:00
Amaury Denoyelle	697f7d1142	MINOR: muxes: refactor private connection detach Following the latest adjustment on session_add_conn() / session_check_idle_conn(), detach muxes callbacks were rewritten for private connection handling. Nothing really fancy here : some more explicit comments and the removal of duplicate checks on idle conn status for muxes with true multiplexing support.	2025-07-30 16:14:00 +02:00
Amaury Denoyelle	dd9645d6b9	MINOR: session: do not release conn in session_check_idle_conn() session_check_idle_conn() is called to flag a connection already inserted in a session list as idle. If the session limit on the number of idle connections (max-session-srv-conns) is exceeded, the connection is removed from the session list. In addition to the connection removal, session_check_idle_conn() directly calls MUX destroy callback on the connection. This means the connection is freed by the function itself and should not be used by the caller anymore. This is not practical when an alternative connection closure method should be used, such as a graceful shutdown with QUIC. As such, remove MUX destroy invocation : this is now the responsibility of the caller to either close or release immediately the connection.	2025-07-30 11:43:41 +02:00
Amaury Denoyelle	ec1ab8d171	MINOR: session: remove redundant target argument from session_add_conn() session_add_conn() uses three arguments : connection and session instances, plus a void pointer labelled as target. Typically, it represents the server, but can also be a backend instance (for example on dispatch). In fact, this argument is redundant as <target> is already a member of the connection. This commit simplifies session_add_conn() by removing it. A BUG_ON() on target is extended to ensure it is never NULL.	2025-07-30 11:39:57 +02:00


1059 commits