This converter will be in charge of performing the same operation as the
'jwt_verify' one, except that it takes a full-on PEM certificate path
instead of a public key path as its parameter.
The certificate path can be provided either directly as a string or via
a variable. This makes it possible to perform token validation with
certificates that are not known at init time.
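For illustration, usage could look like this (a sketch only: the exact
converter syntax is assumed here to mirror jwt_verify's):

  http-request set-var(txn.bearer) http_auth_bearer
  http-request set-var(txn.alg) var(txn.bearer),jwt_header_query('$.alg')
  http-request set-var(txn.cert) str(/etc/haproxy/pki/app1.pem)
  http-request deny unless { var(txn.bearer),jwt_verify_cert(txn.alg,txn.cert) -m int 1 }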
The jwt_verify converter will not take full-on certificates anymore,
in favor of the new, soon-to-come jwt_verify_cert. We might end up with
a new jwt_verify_hmac in the future as well, which would allow
deprecating the jwt_verify converter and removing the need for a
specific internal tree for public keys.
The logic that always looks into the internal jwt tree by default and
resorts to locking the ckch tree as little as possible will also be
removed. This gets rid of the duplicated references to EVP_PKEYs: the
one in the jwt tree entry and the one in the ckch_store.
While refactoring the HTX to remove the extra field, a bug was
introduced in the stream-connector part. The <kip> (known input payload)
value of a sedesc was moved to <kop> (known output payload) on the same
sedesc. Of course, this is totally wrong: the <kip> value of a sedesc
must be forwarded to the opposite side.
In addition, the operation is performed in sc_conn_send(), where we
manipulate the stream-connectors. So the se_fwd_kip() function was
changed to use the stream-connectors directly.
The function, now called sc_ep_fwd_kip(), takes both stream-connectors
to properly forward <kip> from one side to the opposite one.
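Schematically (a sketch; sc_opposite() stands for however the opposite
stream-connector is retrieved at the call site):

  /* before (wrong): <kip> moved to <kop> on the same sedesc */
  se_fwd_kip(sc->sedesc);

  /* after: <kip> of one side feeds <kop> of the opposite side */
  sc_ep_fwd_kip(sc, sc_opposite(sc));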
The bug is 3.3-specific. No backport needed.
Thanks to the previous changes, it is now possible to remove the
<extra> field from the HTX structure. The HTX_FL_ALTERED_PAYLOAD flag is
also removed because it is now unused.
For now, the HTX extra value is used to specify the known part, in
bytes, of the HTTP payload we will receive. It may concern the full
payload if a content-length is specified, or the current chunk for a
chunk-encoded message. The main purpose of this value is to be used on
the opposite side to be able to announce chunks bigger than a buffer. It
can also be used to check the validity of the payload on the sending
path, to properly detect too big or too short payloads.
However, setting this information in the HTX message itself is not
really appropriate because the information is lost when the HTX message
is consumed and the underlying buffer released. So the producer must
take care to always add it to all HTX messages. It is especially an
issue when the payload is altered by a filter.
So to fix this design issue, the information will be moved into the
sedesc, which is a persistent area where it can be saved. In addition,
to avoid any ambiguity between what the producer says and what the
consumer sees, the information will be split in two fields. In this
patch, the fields are added:
* kip : The known input payload length
* kop : The known output payload length
The producer will be responsible for setting the <kip> value. The
stream will be responsible for decrementing <kip> and incrementing <kop>
accordingly. And the consumer will be responsible for removing consumed
bytes from <kop>.
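A minimal sketch of the new fields (types and exact placement in the
struct are assumptions; the comments reflect the rules above):

  struct sedesc {
          /* ... existing fields ... */
          uint64_t kip; /* known input payload length, set by the producer */
          uint64_t kop; /* known output payload length, drained by the consumer */
  };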
With this function we can now pass the desired default value for the
abortonclose option when neither the option nor its opposite was set.
Let's also take this opportunity to use it directly from the HTTP
analyser, since there's no point in re-checking the proxy's mode there.
As discussed on https://github.com/orgs/haproxy/discussions/3146 and on
the mailing list, there's a marked preference for having abortonclose
enabled by default when relevant. The point being that with today's
internet, the large majority of requests sent with a closed input
channel are aborted requests, and that it's pointless to waste resources
processing them.
This patch now considers both "option abortonclose" and its opposite
"no option abortonclose" to figure out whether abortonclose is enabled
or disabled in a backend. When neither is set (thus not even inherited
from a defaults section), then it considers the proxy's mode, and HTTP
mode implies abortonclose by default.
This may make some legacy services fail starting with 3.3. In this case
it will be sufficient to add "no option abortonclose" in either the
affected backend or the defaults section it derives from. But for
internet-facing proxies it's better to stay with the option enabled.
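A minimal sketch of the resulting decision logic, as implemented in the
proxy_abrt_close() helper from the preparation patch below (flag and
field names are assumptions):

  static inline int proxy_abrt_close(const struct proxy *px)
  {
          if (px->options & PR_O_ABRT_CLOSE)     /* "option abortonclose" */
                  return 1;
          if (px->no_options & PR_O_ABRT_CLOSE)  /* "no option abortonclose" */
                  return 0;
          /* neither set nor inherited: HTTP mode implies abortonclose */
          return px->mode == PR_MODE_HTTP;
  }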
In order to prepare for changing the way abortonclose works, let's
replace the direct flag check with a similarly named function
(proxy_abrt_close) which returns the on/off status of the directive
for the proxy. For now it simply reflects the flag's state.
Normally, when reading a full buffer, or exactly the requested size, it
is not really possible to know if the peer had closed immediately after,
and usually we don't care. There's a problematic case, though, which is
with SSL: the SSL layer reads in small chunks of a few bytes, and can
consume a client_hello this way, then start computation without knowing
yet that the client has aborted. In order to permit knowing more, we now
introduce a new read flag, CO_RFL_TRY_HARDER, which says that if we've
read up to the permitted limit and the flag is set, then we attempt one
extra byte using MSG_PEEK to detect whether the connection was closed
immediately after that content or not. The first use case will obviously
be related to SSL and client_hello, but it might possibly also make sense
on HTTP responses to detect a pending FIN at the end of a response (e.g.
if a close was already advertised).
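A sketch of the probe (a minimal illustration; the real integration in
the transport layer differs):

  /* <done> bytes were read, exactly filling the permitted <count> */
  if ((flags & CO_RFL_TRY_HARDER) && done == count) {
          char tmp;

          if (recv(fd, &tmp, 1, MSG_PEEK | MSG_DONTWAIT) == 0) {
                  /* a pending FIN immediately follows the data: the
                   * peer has already closed, report it to the caller
                   * (e.g. flag a read shutdown on the connection).
                   */
          }
  }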
Now it will be possible to query any thread's scheduler state, not
only the current one. This aims at simplifying the watchdog checks
for reported threads. The operation is now a simple atomic xchg.
Like "acme-vars", the "provider-name" option in the acme section is
used in the case of the DNS-01 challenge and is sent to the dpapi sink.
It is used to pass the name of a DNS provider in order to choose the
DNS API to use.
This patch implements cfg_parse_acme_vars_provider(), which parses
either the acme-vars or provider-name options and escapes their strings.
Example:
$ ( echo "@@1 show events dpapi -w -0"; cat - ) | socat /tmp/master.sock - | cat -e
<0>2025-09-18T17:53:58.831140+02:00 acme deploy foobpar.pem thumbprint gDvbPL3w4J4rxb8gj20mGEgtuicpvltnTl6j1kSZ3vQ$
acme-vars "var1=foobar\"toto\",var2=var2"$
provider-name "godaddy"$
{$
"identifier": {$
"type": "dns",$
"value": "example.com"$
},$
"status": "pending",$
"expires": "2025-09-25T14:41:57Z",$
[...]
In the case of the dns-01 challenge, the agent that handles the
challenge might need some extra information which depends on the DNS
provider.
This patch introduces the "acme-vars" option in the acme section, which
allows passing this data to the dpapi sink. The double quotes will be
escaped when printed in the sink.
Example:
global
setenv VAR1 'foobar"toto"'
acme LE
directory https://acme-staging-v02.api.letsencrypt.org/directory
challenge DNS-01
acme-vars "var1=${VAR1},var2=var2"
Would output:
$ ( echo "@@1 show events dpapi -w -0"; cat - ) | socat /tmp/master.sock - | cat -e
<0>2025-09-18T17:53:58.831140+02:00 acme deploy foobpar.pem thumbprint gDvbPL3w4J4rxb8gj20mGEgtuicpvltnTl6j1kSZ3vQ$
acme-vars "var1=foobar\"toto\",var2=var2"$
{$
"identifier": {$
"type": "dns",$
"value": "example.com"$
},$
"status": "pending",$
"expires": "2025-09-25T14:41:57Z",$
[...]
This patch looks huge, but it has a very simple goal: protect all
accesses to shared stats pointers (either reads or writes), because
we now consider that these pointers may be NULL.
The reason behind this is that despite all precautions taken to ensure
the pointers shouldn't be NULL when not expected, there are still corner
cases (i.e.: frontend stats used on a backend with no FE cap and vice
versa) where we could try to access a memory area which is not
allocated. Willy stumbled on such cases while playing with the ring
servers upon connection errors, which eventually led to process crashes
(since 3.3, when shared stats were implemented).
Also, we may decide later that shared stats are optional and should
be disabled on the proxy to save memory and CPU, and this patch is
a step further towards that goal.
So in essence, this patch ensures shared stats pointers are always
initialized (including NULL), and adds necessary guards before shared
stats pointers are dereferenced. Since we already had some checks
for backend and listener stats, and the pointer address retrieval
should stay in the CPU cache, let's hope that this patch doesn't impact
stats performance much.
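In practice the guards look like this (a sketch: the counter name and
the exact shared-stats layout are illustrative):

  /* shared stats pointers may now legitimately be NULL */
  if (px->fe_counters.shared.tg)
          HA_ATOMIC_INC(&px->fe_counters.shared.tg[tgid]->denied_req);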
If we see that another thread is already busy trying to announce the
dropped counter, there's no point going there, so let's just skip the
whole operation in sink_write() and avoid disturbing the other thread.
This results in a boost from 244k to 262k req/s.
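The idea boils down to this pattern (a sketch with illustrative names,
not the actual sink fields):

  /* only announce dropped events if no other thread is already at it */
  if (!HA_ATOMIC_XCHG(&sink->announcing, 1)) {
          announce_dropped_events(sink);   /* hypothetical helper */
          HA_ATOMIC_STORE(&sink->announcing, 0);
  }
  /* otherwise skip entirely: the other thread will take care of it */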
Currently there's a small mistake in the way the trace function and
macros interact. The calling function's name is known as a constant up
to the macro and is passed as-is to the __trace() function. That one
needs to know its length and will call ist() on it, resulting in a real
call to strlen() while that length was known before the call. Let's use
an ist instead of a const char* for __trace() and __trace_enabled() so
that we can now completely avoid calling strlen() during this operation.
This has significantly reduced the weight of __trace_enabled() in perf
top.
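In short (a sketch; the real prototypes carry more arguments):

  /* before: the name's length is recomputed inside __trace() */
  void __trace(const char *func, ...);   /* ist(func) -> strlen() */

  /* after: the macro builds the ist at the call site, where the
   * compiler can resolve the length of __FUNCTION__ at build time */
  void __trace(const struct ist func, ...);
  __trace(ist(__FUNCTION__), ...);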
It is possible on at least Linux and FreeBSD to set the congestion control
algorithm to be used with outgoing connections, among the list of supported
and permitted ones. Let's expose this setting with "cc". Unknown or
forbidden algorithms will be ignored and the default one will continue to
be used.
It is possible on at least Linux and FreeBSD to set the congestion control
algorithm to be used with incoming connections, among the list of supported
and permitted ones. Let's expose this setting with "cc". Permission issues
might be reported (as warnings).
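Under the hood this maps to the TCP_CONGESTION socket option on both
systems; roughly (a sketch, not the actual keyword-parsing code):

  #include <sys/socket.h>
  #include <netinet/in.h>
  #include <netinet/tcp.h>
  #include <string.h>

  /* apply the algorithm named by the "cc" keyword to a socket; this
   * typically fails with ENOENT for an unknown algorithm and EPERM
   * for a forbidden one, which callers may ignore or turn into a
   * warning depending on the side.
   */
  static int set_cc(int fd, const char *algo)
  {
          return setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION,
                            algo, strlen(algo));
  }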
Let's explicitly mention that fe_counters_shared_tg and
be_counters_shared_tg structs are embedded in shm_stats_file_object
struct so any change in those structs will result in shm stats file
incompatibility between processes, thus extra precaution must be
taken when making changes to them.
Note that the provisioning made in the shm_stats_file_object struct could
be used to add members to {fe,be}_counters_shared_tg without changing
shm_stats_file_object struct size if needed in order to preserve
shm stats file version.
The variables trees use the immediate cebtree API; better to use the
item one, which is more expressive and safer. The "node" field was
renamed to "name_node" to avoid any ambiguity.
Previously the conn_hash_node was placed outside the connection due
to the big size of the eb64_node that could have negatively impacted
frontend connections. But having it outside also means that one
extra allocation is needed for each backend connection, and that one
memory indirection is needed for each lookup.
With the compact trees, the tree node is smaller (16 bytes vs 40) so
the overhead is much lower. By integrating it into the connection,
we're also eliminating one pointer from the connection to the hash
node and one pointer from the hash node to the connection (in addition
to the extra object bookkeeping). This results in saving at least 24
bytes in total per backend connection, while only inflating the
connection by 16 bytes (from 240 to 256), which is a reasonable
compromise.
Tests on a 64-core EPYC show a 2.4% increase in the request rate
(from 2.08 to 2.13 Mrps).
Idle connection trees currently require a 56-byte conn_hash_node per
connection, which can be reduced to 32 bytes by moving to ceb64. While
ceb64 is theoretically slower, in practice here we're essentially
dealing with trees that almost always contain a single key and many
duplicates. In this case, ceb64 insert and lookup functions become
faster than eb64 ones because all duplicates are a list accessed in
O(1) while it's a subtree for eb64. In tests it is impossible to tell
the difference between the two, so it's worth reducing the memory
usage.
This commit brings the following memory savings to conn_hash_node
(one per backend connection), and to srv_per_thread (one per thread
and per server):
  struct           before  after  delta
  conn_hash_node     56      32    -24
  srv_per_thread     96      72    -24
The delicate part is conn_delete_from_tree(), because we need to
know the tree root the connection is attached to. But thanks to
recent cleanups, it's now clear enough (i.e. idle/safe/avail vs
session are easy to distinguish).
We'll soon need to choose the server's root based on the connection's
flags, and for this we'll need the thread it's attached to, which is
not always the current one. This patch simply passes the thread number
from all callers. They know it because they just set the idle_conns
lock on it prior to calling the function.
The proxy struct has several small holes that deserved to be plugged by
moving a few fields around. Now we're down to 3056 from 3072 previously,
and the remaining holes are small.
At the moment, compared to before this series, we're seeing these
sizes:
  type\size   7d554ca62  current  delta
  listener        752       704     -48 (-6.4%)
  server         4032      3840    -192 (-4.8%)
  proxy          3184      3056    -128 (-4%)
  stktable       3392      3328     -64 (-1.9%)
Configs with many servers have shrunk by about 4% in RAM and configs
with many proxies by about 3%.
The struct server still has a lot of holes and padding that make it
quite big. By moving a few fields around between areas which do not
interact (e.g. boot vs aligned areas), it's quite easy to plug some
of them and/or to arrange larger ones which could be reused later with
a bit more effort. Here we've reduced holes by 40 bytes, allowing the
struct to shrink by one more cache line (64 bytes). The new size is
3840 bytes.
The server ID is currently stored as a 32-bit int using an eb32 tree.
It's used essentially to find holes in order to automatically assign IDs,
and to detect duplicates. Let's change this to use compact trees instead
in order to save 24 bytes in struct server for this node, plus 8 bytes in
struct proxy. The server struct is still 3904 bytes large (due to
alignment) and the proxy struct is 3072.
The listener ID is currently stored as a 32-bit int using an eb32 tree.
It's used essentially to find holes in order to automatically assign IDs,
and to detect duplicates. Let's change this to use compact trees instead
in order to save 24 bytes in struct listener for this node, plus 8 bytes
in struct proxy. The struct listener is now 704 bytes large, and the
struct proxy 3080.
The proxy ID is currently stored as a 32-bit int using an eb32 tree.
It's used essentially to find holes in order to automatically assign IDs,
and to detect duplicates. Let's change this to use compact trees instead
in order to save 24 bytes in struct proxy for this node, plus 8 bytes in
the root (which is static so not much relevant here). Now the proxy is
3088 bytes large.
This was previously achieved via the generic get_next_id() but we'll soon
get rid of generic ID trees so let's have a dedicated server_get_next_id().
As a bonus it reduces the exposure of the tree's root outside of the functions.
This was previously achieved via the generic get_next_id() but we'll soon
get rid of generic ID trees so let's have a dedicated listener_get_next_id().
As a bonus it reduces the exposure of the tree's root outside of the functions.
This is used to index the proxy's name and it contains a copy of the
pointer to the proxy's name in <id>. Changing that for a ceb_node placed
just before <id> saves 32 bytes in struct proxy, which is now 3112
bytes large.
Here we need to continue to support duplicates since they're still
allowed between type-incompatible proxies.
Interestingly, the use of cebis_next_dup() instead of cebis_next() in
proxy_find_by_name() allows us to get rid of an strcmp() that was
performed for each use_backend rule. A test with a large config
(100k backends) shows that we can get 3% extra performance on a
config involving a static use_backend rule (3.09M to 3.18M rps),
and even 4.5% on a dynamic rule selecting a random backend (2.47M
to 2.59M).
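Schematically (signatures approximate; node_to_proxy() and
mode_compatible() are hypothetical stand-ins for the container-of and
capability checks):

  /* all proxies sharing a name are duplicates of the same key: after
   * the first match, walking the dups needs no further strcmp() */
  node = cebis_lookup(&proxies_by_name, name);
  while (node && !mode_compatible(node_to_proxy(node), mode))
          node = cebis_next_dup(&proxies_by_name, node);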
This member is used to index the hostname_dn contents for DNS resolution.
Let's replace it with a cebis_tree to save another 32 bytes (24 for the
node + 8 by avoiding the duplication of the pointer). The struct server is
now at 3904 bytes.
This is used to index the server name and it contains a copy of the
pointer to the server's name in <id>. Changing that for a ceb_node placed
just before <id> saves 32 bytes in struct server, which remains 3968
bytes large due to alignment. The proxy struct shrinks by 8 bytes to 3144.
It's worth noting that the current way duplicate names are handled remains
based on the previous mechanism where dups were permitted. Ideally we
should now reject them during insertion and use unique key trees instead.
This contains the text representation of the server's address, for use
with stick-tables with "srvkey addr". Switching them to a compact node
saves 24 more bytes from this structure. The key was moved to an external
pointer "addr_key" right after the node.
The server struct is now 3968 bytes (down from 4032) due to alignment, and
the proxy struct shrinks by 8 bytes to 3152.
The current guid struct size is 56 bytes. Once reduced using compact
trees, it goes down to 32 (almost half). We're not on a critical path
and size matters here, so better switch to this.
It's worth noting that the name part could also be stored in the
guid_node at the end to save 8 extra bytes (no pointer needed anymore);
however, the purpose of this struct is to be embedded into other ones,
which is not compatible with having a dynamic size.
Affected struct sizes in bytes:
             Before  After  Diff
  server       4032   4032     0*
  proxy        3184   3160    -24
  listener      752    728    -24
*: struct server is full of holes and padding (176 bytes) and is
64-byte aligned. Moving the guid_node elsewhere such as after sess_conn
reduces it to 3968, or one less cache line. There's no point in moving
anything now because forthcoming patches will arrange other parts.
cebs_tree nodes are 24 bytes smaller than ebst_tree ones (16B vs 40B),
and pattern references are only used during map/acl updates, so their
storage is pure loss between updates (which most of the time never
happen). By switching their indexing to compact trees, we can save 16 to
24 bytes per entry depending on alignment (here it's 24 per struct but
16 in practice, as malloc's alignment keeps 8 bytes unused).
Tested on core i7-8650U running at 3.0 GHz, with a file containing
17.7M IP addresses (16.7M different):
$ time ./haproxy -c -f acl-ip.cfg
This saves 280 MB of RAM for 17.7M IP addresses (16 bytes per entry
amounts to ~283 MB), and slightly speeds up the startup (5.8%, from
19.2s to 18.2s), a part of which can possibly be attributed to having
less memory to write. Note that this is on small strings. On larger ones
such as user-agents, ebtree doesn't reread the whole key and might be
more efficient.
Before:
RAM (VSZ/RSS): 4443912 3912444
real 0m19.211s
user 0m18.138s
sys 0m1.068s
Overhead Command Shared Object Symbol
44.79% haproxy haproxy [.] ebst_insert
25.07% haproxy haproxy [.] ebmb_insert_prefix
3.44% haproxy libc-2.33.so [.] __libc_calloc
2.71% haproxy libc-2.33.so [.] _int_malloc
2.33% haproxy haproxy [.] free_pattern_tree
1.78% haproxy libc-2.33.so [.] inet_pton4
1.62% haproxy libc-2.33.so [.] _IO_fgets
1.58% haproxy libc-2.33.so [.] _int_free
1.56% haproxy haproxy [.] pat_ref_push
1.35% haproxy libc-2.33.so [.] malloc_consolidate
1.16% haproxy libc-2.33.so [.] __strlen_avx2
0.79% haproxy haproxy [.] pat_idx_tree_ip
0.76% haproxy haproxy [.] pat_ref_read_from_file
0.60% haproxy libc-2.33.so [.] __strrchr_avx2
0.55% haproxy libc-2.33.so [.] unlink_chunk.constprop.0
0.54% haproxy libc-2.33.so [.] __memchr_avx2
0.46% haproxy haproxy [.] pat_ref_append
After:
RAM (VSZ/RSS): 4166108 3634768
real 0m18.114s
user 0m17.113s
sys 0m0.996s
Overhead Command Shared Object Symbol
38.99% haproxy haproxy [.] cebs_insert
27.09% haproxy haproxy [.] ebmb_insert_prefix
3.63% haproxy libc-2.33.so [.] __libc_calloc
3.18% haproxy libc-2.33.so [.] _int_malloc
2.69% haproxy haproxy [.] free_pattern_tree
1.99% haproxy libc-2.33.so [.] inet_pton4
1.74% haproxy libc-2.33.so [.] _IO_fgets
1.73% haproxy libc-2.33.so [.] _int_free
1.57% haproxy haproxy [.] pat_ref_push
1.48% haproxy libc-2.33.so [.] malloc_consolidate
1.22% haproxy libc-2.33.so [.] __strlen_avx2
1.05% haproxy libc-2.33.so [.] __strcmp_avx2
0.80% haproxy haproxy [.] pat_idx_tree_ip
0.74% haproxy libc-2.33.so [.] __memchr_avx2
0.69% haproxy libc-2.33.so [.] __strrchr_avx2
0.69% haproxy libc-2.33.so [.] _IO_getline_info
0.62% haproxy haproxy [.] pat_ref_read_from_file
0.56% haproxy libc-2.33.so [.] unlink_chunk.constprop.0
0.56% haproxy libc-2.33.so [.] cfree@GLIBC_2.2.5
0.46% haproxy haproxy [.] pat_ref_append
If the addresses are totally disordered (via "shuf" on the input file),
we see both implementations reach exactly 68.0s (slower due to much
higher cache miss ratio).
On large strings such as user agents (1 million here), it's now slightly
slower (+9%):
Before:
real 0m2.475s
user 0m2.316s
sys 0m0.155s
After:
real 0m2.696s
user 0m2.544s
sys 0m0.147s
But such patterns are much less common than short ones, and the memory
savings do still count.
Note that while it could be tempting to get rid of the list that chains
all these pat_ref_elt together and only enumerate them by walking along
the tree to save 16 extra bytes per entry, that's not possible due to
the problem that insertion ordering is critical (think overlapping regex
such as /index.* and /index.html). Currently it's not possible to proceed
differently because patterns are first pre-loaded into the pat_ref via
pat_ref_read_from_file_smp() and later indexed by pattern_read_from_file(),
which has to only redo the second part anyway for maps/acls declared
multiple times.
The support for duplicates is necessary for various use cases related
to config names, so let's upgrade to the latest version which brings
this support. This updates the cebtree code to commit 808ed67 (tag
0.5.0). A few tiny adaptations were needed:
- replace a few ceb_node** with ceb_root** since pointers are now
  tagged;
- replace cebu*.h with ceb*.h since both are now merged in the same
include file. This way we can drop the unused cebu*.h files from
cebtree that are provided only for compatibility.
- rename immediate storage functions to cebXX_imm_XXX() as per the API
change in 0.5 that makes immediate explicit rather than implicit.
This only affects vars and tools.c:copy_file_name().
The tests continue to work.
If an ocsp response is set to be updated automatically and some
certificate or CA updates are performed on the CLI, if the CLI update
happens while the OCSP response is being updated and is then detached
from the update tree, it might be wrongly inserted into the update tree
in 'ssl_sock_load_ocsp', and then reinserted when the update finishes.
The update tree then gets corrupted and we could end up crashing when
accessing other nodes in the ocsp response update tree.
This patch must be backported up to 2.8.
This patch fixes GitHub #3100.
An eb tree was used to anticipate an infinite number of custom log
steps configured at the proxy level. It turns out that it makes no sense
to configure that many logging steps for a proxy, and the cost of the eb
tree is non negligible in terms of memory footprint, especially when
used in a defaults section.
Instead, let's use a simple bitmask, which allows up to 64 logging steps
configured at the proxy level. If we lack space some day (and need more
than 64 logging steps to be configured), we could simply modify
"struct log_steps" to spread the bitmask over multiple 64-bit integers,
with minor adjustments where the mask is set and checked.
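A minimal sketch of the approach ("struct log_steps" is from the patch;
the member and helper names are assumptions):

  #include <stdint.h>

  struct log_steps {
          uint64_t steps; /* bit N set = logging step ID N enabled */
  };

  static inline void log_step_set(struct log_steps *ls, unsigned int id)
  {
          ls->steps |= (uint64_t)1 << id;            /* requires id < 64 */
  }

  static inline int log_step_isset(const struct log_steps *ls, unsigned int id)
  {
          return !!(ls->steps & ((uint64_t)1 << id));
  }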
By checking the current thread's locking status, it becomes possible
to know during a memory allocation whether it's performed under a lock
or not. Both pools and memprofile functions were instrumented to check
for this and to increment the memprofile bin's locked_calls counter.
This one, when not zero, is reported on "show profiling memory" with a
percentage of all allocations that such locked allocations represent.
This way it becomes possible to try to target certain code paths that
are particularly expensive. Example:
$ socat - /tmp/sock1 <<< "show profiling memory"|grep lock
20297301 0 2598054528 0| 0x62a820fa3991 sockaddr_alloc+0x61/0xa3 p_alloc(128) [pool=sockaddr] [locked=54962 (0.2 %)]
0 20297301 0 2598054528| 0x62a820fa3a24 sockaddr_free+0x44/0x59 p_free(-128) [pool=sockaddr] [locked=34300 (0.1 %)]
9908432 0 1268279296 0| 0x62a820eb8524 main+0x81974 p_alloc(128) [pool=task] [locked=9908432 (100.0 %)]
9908432 0 554872192 0| 0x62a820eb85a6 main+0x819f6 p_alloc(56) [pool=tasklet] [locked=9908432 (100.0 %)]
263001 0 63120240 0| 0x62a820fa3c97 conn_new+0x37/0x1b2 p_alloc(240) [pool=connection] [locked=20662 (7.8 %)]
71643 0 47307584 0| 0x62a82105204d pool_get_from_os_noinc+0x12d/0x161 posix_memalign(660) [locked=5393 (7.5 %)]
When task profiling is enabled, the pool alloc/free code will measure the
time it takes to perform memory allocation after a cache miss or memory
freeing to the shared cache or OS. The time taken with the thread-local
cache is never measured as measuring that time is very expensive compared
to the pool access time. Here doing so costs around 2% performance at 2M
req/s, only when task profiling is enabled, so this remains reasonable.
The scheduler takes care of collecting that time and updating the
sched_activity entry corresponding to the current task when task profiling
is enabled.
The goal clearly is to track places that are wasting CPU time allocating
and releasing too often, or causing large evictions. It shows up like
this in "show profiling tasks aggr":
Tasks activity over 11.428 sec till 0.000 sec ago:
function calls cpu_tot cpu_avg lkw_avg lkd_avg mem_avg lat_avg
process_stream 44183891 16.47m 22.36us 491.0ns 1.154us 1.000ns 101.1us
h1_io_cb 57386064 4.011m 4.193us 20.00ns 16.00ns - 29.47us
sc_conn_io_cb 42088024 49.04s 1.165us - - - 54.67us
h1_timeout_task 438171 196.5ms 448.0ns - - - 100.1us
srv_cleanup_toremove_conns 65 1.468ms 22.58us 184.0ns 87.00ns - 101.3us
task_process_applet 3 508.0us 169.3us - 107.0us 1.847us 29.67us
srv_cleanup_idle_conns 6 225.3us 37.55us 15.74us 36.84us - 49.47us
accept_queue_process 2 45.62us 22.81us - - 4.949us 54.33us
This new column will be used for reporting the average time spent
allocating or freeing memory in a task when task profiling is enabled.
For now it is not updated.
When DEBUG_THREAD > 0 and task profiling is enabled, we'll now measure
the time spent with at least one lock held for each task. The time is
collected by the locking operations: taking a lock that raises the level
to one starts the measurement, and releasing it back to zero stops it.
An accumulator is updated in the thread_ctx struct, which the scheduler
collects when the task returns and adds to the sched_activity entry of
the related task.
This allows observing figures like this one:
Tasks activity over 259.516 sec till 0.000 sec ago:
function calls cpu_tot cpu_avg lkw_avg lkd_avg lat_avg
h1_io_cb 15466589 2.574m 9.984us - - 33.45us <- sock_conn_iocb@src/sock.c:1099 tasklet_wakeup
sc_conn_io_cb 8047994 8.325s 1.034us - - 870.1us <- sc_app_chk_rcv_conn@src/stconn.c:844 tasklet_wakeup
process_stream 7734689 4.356m 33.79us 1.990us 1.641us 1.554ms <- sc_notify@src/stconn.c:1206 task_wakeup
process_stream 7734292 46.74m 362.6us 278.3us 132.2us 972.0us <- stream_new@src/stream.c:585 task_wakeup
sc_conn_io_cb 7733158 46.88s 6.061us - - 68.78us <- h1_wake_stream_for_recv@src/mux_h1.c:3633 tasklet_wakeup
task_process_applet 6603593 4.484m 40.74us 16.69us 34.00us 96.47us <- sc_app_chk_snd_applet@src/stconn.c:1043 appctx_wakeup
task_process_applet 4761796 3.420m 43.09us 18.79us 39.28us 138.2us <- __process_running_peer_sync@src/peers.c:3579 appctx_wakeup
process_table_expire 4710662 4.880m 62.16us 9.648us 53.95us 158.6us <- run_tasks_from_lists@src/task.c:671 task_queue
stktable_add_pend_updates 4171868 6.786s 1.626us - 1.487us 47.94us <- stktable_add_pend_updates@src/stick_table.c:869 tasklet_wakeup
h1_io_cb 2871683 1.198s 417.0ns 70.00ns 69.00ns 1.005ms <- h1_takeover@src/mux_h1.c:5659 tasklet_wakeup
process_peer_sync 2304957 5.368s 2.328us - 1.156us 68.54us <- stktable_add_pend_updates@src/stick_table.c:873 task_wakeup
process_peer_sync 1388141 3.174s 2.286us - 1.130us 52.31us <- run_tasks_from_lists@src/task.c:671 task_queue
stktable_add_pend_updates 463488 3.530s 7.615us 2.000ns 7.134us 771.2us <- stktable_touch_with_exp@src/stick_table.c:654 tasklet_wakeup
Here we see that almost the entirety of stktable_add_pend_updates() is
spent under a lock, that 1/3 of the execution time of process_stream()
was performed under a lock and that 2/3 of it was spent waiting for a
lock (this is related to the 10 track-sc present in this config), and
that the locking time in process_peer_sync() has now been significantly
reduced. This is more visible with "show profiling tasks aggr":
Tasks activity over 475.354 sec till 0.000 sec ago:
function calls cpu_tot cpu_avg lkw_avg lkd_avg lat_avg
h1_io_cb 25742539 3.699m 8.622us 11.00ns 10.00ns 188.0us
sc_conn_io_cb 22565666 1.475m 3.920us - - 473.9us
process_stream 21665212 1.195h 198.6us 140.6us 67.08us 1.266ms
task_process_applet 16352495 11.31m 41.51us 17.98us 36.55us 112.3us
process_peer_sync 7831923 17.15s 2.189us - 1.107us 41.27us
process_table_expire 6878569 6.866m 59.89us 9.359us 51.91us 151.8us
stktable_add_pend_updates 6602502 14.77s 2.236us - 2.060us 119.8us
h1_timeout_task 801 703.4us 878.0ns - - 185.7us
srv_cleanup_toremove_conns 347 12.43ms 35.82us 240.0ns 70.00ns 1.924ms
accept_queue_process 142 1.384ms 9.743us - - 340.6us
srv_cleanup_idle_conns 74 475.0us 6.418us 896.0ns 5.667us 114.6us