We'll need to permit any user to update its own tgroup's extra counters
instead of the global ones. For this we now store the per-tgroup step
between two consecutive data storages, for when they're stored in a
tgroup array. When shared (e.g. resolvers or listeners), we just store
zero to indicate that it doesn't scale with tgroups. For now only the
registration was handled, it's not used yet.
Servers, proxies, listeners and resolvers all use extra_counters. We'll
need to move the storage to per-tgroup for those where it matters. Now
we're relying on an external storage, and the data member of the struct
was replaced with a pointer to that pointer to data called datap. When
the counters are registered, these datap are set to point to relevant
locations. In the case of proxies and servers, it points to the first
tgrp's storage. For listeners and resolvers, it points to a local
storage. The rationale here is that listeners are limited to a single
group anyway, and that resolvers have a low enough load so that we do
not care about contention there.
Nothing should change for the user at this point.
During proxy_finalize(), a lookup is performed over the servers by name
tree to detect any collision. Only the first conflict for each server
instance is reported to avoid a combinatory explosion with too many
alerts shown.
Previously, this was written using a for loop without any iteration.
Replace this by a simple if statement as this is cleaner.
This should fix github issue #3276.
Auto SNI configuration is configured during check config validity.
However, nothing was implemented for dynamic servers.
Fix this by implementing auto SNI configuration during "add server" CLI
handler. Auto SNI configuration code is moved in a dedicated function
srv_configure_auto_sni() called both for static and dynamic servers.
Along with this, allows the keyword "no-sni-auto" on dynamic servers, so
that this process can be deactivated if wanted. Note that "sni-auto"
remains unavailable as it only makes sense with default-servers which
are never used for dynamic server creation.
This must be backported up to 3.3.
There was no check on the result of strdup() used to setup auto SNI on a
server instance during check config validity. In case of failure, the
error would be silently ignored as the following server_parse_exprs()
does nothing when <sni_expr> server field is NULL. Hence, no SNI would
be used on the server, without any error nor warning reported.
Fix this by adding a check on strdup() return value. On error, ERR_ABORT
is reported along with an alert, parsing should be interrupted as soon
as possible.
This must be backported up to 3.3. Note that the related code in this
case is present in cfgparse.c source file.
Add a pointer to function to proxies as ->stream_new_from_sc proxy
struct member to instantiate stream from connection as this is done by
all the muxes when they call sc_new_from_endp(). The default value for
this pointer is obviously stream_new() which is exported by this patch.
At many places, we rely on global.tune.bufsize value instead of using the buffer
size. For now, it is not a problem. But if we want to be able to deal with
buffers of different sizes, it is good to reduce as far as possible dependencies
on the global value. most of time, we can use b_size() or c_size()
functions. The main change is performed on the error snapshot where the buffer
size was added into the error_snapshot structure.
For "add backend" implementation, postparsing code in
check_config_validity() from cfgparse.c has been extracted in a new
dedicated function named proxy_finalize() into proxy.c.
This has caused unexpected compilation issue as in the latter file
TLSEXT_TYPE_application_layer_protocol_negotiation macro may be
undefined, in particular when building without QUIC support. Thus, code
related to default ALPN on binds is discarded after the preprocessing
stage.
Fix this by including openssl-compat header file into proxy source file.
This should be sufficient to ensure SSL related defines are properly
included.
This should fix recent issues on SSL regtests.
No need to backport.
When a backend is created at runtime, the new proxy instance is inserted
at the end of proxies_list. This operation is buggy if this list is
empty : the code causes a null dereference which will lead to a crash.
This causes the following compilation error :
CC src/proxy.o
src/proxy.c: In function 'cli_parse_add_backend':
src/proxy.c:4933:36: warning: null pointer dereference [-Wnull-dereference]
4933 | proxies_list->next = px;
| ~~~~~~~~~~~~~~~~~~~^~~~
This patch fixes this issue. Note that in reality it cannot occur at
this moment as proxies_list cannot be empty (haproxy requires at least
one frontend to start, and the list also always contains internal
proxies).
No need to backport.
This patch fixes the following compilation error :
src/proxy.c:4954:12: error: format string is not a string literal
(potentially insecure) [-Werror,-Wformat-security]
4954 | ha_notice(msg);
| ^~~
No need to backport.
Implement proxy ID generation for dynamic backends. This is performed
through the already function existing proxy_get_next_id().
As an optimization, lookup will performed starting from a global
variable <dynpx_next_id>. It is initialized to the greatest ID assigned
after parsing, and updated each time a backend instance is created. When
backend deletion will be implemented, it could be lowered to the newly
available slot.
Implement the required operations for "add backend" handler. This
requires a new proxy allocation, settings copy from the specified
default instance and proxy config finalization. All handlers registered
via REGISTER_POST_PROXY_CHECK() are also called on the newly created
instance.
If no error were encountered, the newly created proxy is finally
attached in the proxies list.
This commits completes "add backend" handler with some checks performed
on the specified default proxy instance. These are additional checks
outside of the already existing inheritance rules, specific to dynamic
backends.
For now, a default proxy is considered not compatible if it is not in
mode TCP/HTTP. Also, a default proxy is rejected if it references HTTP
errors. This limitation may be lifted in the future, when HTTP errors
are partiallay reworked.
Add an optional "mode" argument to "add backend" CLI command. This
argument allows to specify if the backend is in TCP or HTTP mode.
By default, it is mandatory, unless the inherited default proxy already
explicitely specifies the mode. To differentiate if TCP mode is implicit
or explicit, a new proxy flag PR_FL_DEF_EXPLICIT_MODE is defined. It is
set for every defaults instances which explicitely defined their mode.
Define a basic CLI handler for "add backend".
For now, this handler only performs a parsing of the name argument and
return an error if a duplicate already exists. It runs under thread
isolation, to guarantee thread safety during the proxy creation.
This feature is considered in development. CLI command requires to set
experimental-mode.
Define a new utility function str_to_proxy_mode() which is able to
convert a string into the corresponding proxy mode if possible. This new
function is used for the parsing of "mode" configuration proxy keyword.
This patch will be reused for dynamic backend implementation, in order
to parse a similar "mode" argument via a CLI handler.
If a proxy is referencing a defaults instance, some checks must be
performed to ensure that inheritance will be compatible. Refcount of the
defaults instance may also be incremented if some settings cannot be
copied. This operation is performed when parsing a new proxy of defaults
section which references a defaults, either implicitely or explicitely.
This patch extracts this code into a dedicated function named
proxy_ref_defaults(). This in turn may call defaults_px_ref()
(previously called proxy_ref_defaults()) to increment its refcount.
The objective of this patch is to be able to reuse defaults inheritance
validation for dynamic backends created at runtime, outside of the
parsing code.
A lot of proxies initialization code is delayed on post-parsing stage,
as it depends on the configuration fully parsed. This is performed via a
loop on proxies_list.
Extract this code in a dedicated function proxy_finalize(). This patch
will be useful for dynamic backends creation.
Note that for the moment the code has been extracted as-is. With each
new features, some init code was added there. This has become a giant
loop with no real ordering. A future patch may provide some cleanup in
order to reorganize this.
A few capture pools can fail in case of too large values for example.
These include the req_uri, capture, and caphdr pools, and may be triggered
with "tune.http.logurilen 2147483647" in the global section, or one of
these in a frontend:
capture request header name len 2147483647
http-request capture src len 2147483647
tcp-request content capture src len 2147483647
These seem to be the only occurrences where create_pool()'s return value
is assigned without being checked, so let's add the proper check for
errors there. This can be backported as a hardening measure though the
risks and impacts are extremely low.
This patch changes the handling of named defaults sections. Prior to
this patch, every unreferenced defaults proxies were removed on post
parsing. Now by default, these sections are kept after postparsing and
only purged on deinit. The objective is to allow reusing them as base
configuration for dynamic backends.
To implement this, refcount of every still addressable named sections is
incremented by one after parsing. This ensures that they won't be
removed even if referencing proxies are removed at runtime. This is done
via the new function proxy_ref_all_defaults().
To ensure defaults instances are still properly removed on deinit, the
inverse operation is performed : refcount is decremented by one on every
defaults sections via proxy_unref_all_defaults().
The original behavior can still be used by using the new global keyword
tune.defaults.purge. This is useful for users using configuration with
large number of defaults and not interested in dynamic backends
creation.
Defaults section are indexed by their name in defproxy_by_name tree. For
named sections, there is no duplicate : if two instances have the same
name, the older one is removed from the tree. However, this was not the
case for unnamed defaults which are all stored inconditionnally in
defproxy_by_name.
This commit introduces a new approach for unnamed defaults. Now, these
instances are never inserted in the defproxy_by_name tree. Indeed, this
is not needed as no tree lookup is performed with empty names. This may
optimize slightly config parsing with a huge number of named and unnamed
defaults sections, as the first ones won't fill up the tree needlessly.
However, defproxy_by_name tree is also used to purge unreferenced
defaults instances, both on postparsing and deinit. Thus, a new approach
is needed for unnamed sections cleanup. Now, each time a new defaults is
parsed, if the previous instance is unnamed, it is freed unless if
referenced by a proxy. When config parsing is ended, a similar operation
is performed to ensure the last unnamed defaults section won't stay in
memory. To implement this, last_defproxy static variable is now set to
global. Unnamed sections which cannot be removed due to proxies
referencing proxies will still be removed when such proxies are freed
themselves, at runtime or on deinit.
Defaults proxies instance are stored in a global name tree. When there
is a name conflict and the older entry cannot be simply discarded as it
is already referenced, the older entry is instead removed from the name
tree and inserted into the orphaned list.
The purpose of the orphaned list was to guarantee that any remaining
unreferenced defaults are purged either on postparsing or deinit.
However, this is in fact completely useless. Indeed on postparsing,
orphaned entries are always referenced. On deinit instead, defaults are
already freed along the cleanup of all frontend/backend instances clean
up, thanks to their refcounting.
This patch streamlines this by removing orphaned list. Instead, a
defaults section is inserted into a new global defaults_list during
their whole lifetime. This is not strictly necessary but it ensures that
defaults instances can still be accessed easily in the future if needed
even if not present in the name tree. On deinit, a BUG_ON() is added to
ensure that defaults_list is indeed emptied.
Another benefit from this patch is to simplify the defaults deletion
procedure. Orphaned simple list is replaced by a proper double linked
list implementation, so a single LIST_DELETE() is now performed. This
will be notably useful as defaults may be removed at runtime in the
future if backends deletion at runtime is implemented.
This patch renames functions which deal with defaults section. A common
"defaults_px_" prefix is defined. This serves as a marker to identify
functions which can only be used with proxies defaults capability. New
BUG_ON() are enforced to ensure this is valid.
Also, older proxy_unref_or_destroy_defaults() is renamed
defaults_px_detach().
Function proxy_preset_defaults() purpose has evolved over time.
Originally, it was only used to initialize defaults proxies instances.
Until today, it was extended so that all proxies use it. Its objective
is to initialize settings to common default values.
To remove the confusion, this function is now removed. Its content is
integrated directly into init_new_proxy().
A defaults proxy instance may be move into the orphaned list when it is
replaced by a newer section with the same name. This is attached via
<next> member as a single linked list entry. However, proxy free does
not clear <next> attach point.
This causes a crash on deinit if orphaned list is not empty. First, all
frontend/backend instances are freed. This triggers the release of every
referenced defaults instances as their refcount reach zero, but orphaned
list is not clean up. A loop is then conducted on orphaned list via
proxy_destroy_all_unref_defaults(). This causes a segfault due to access
on already freed entries.
To fix this, this patch extends proxy_destroy_defaults(). If orphaned
list is not empty, a loop is performed to remove a possible entry of the
currently released defaults instance. This ensures that loop over
orphaned list won't be able to access to already freed entries.
This bug is pretty rare as it requires to have duplicate name in
defaults sections, and also to use settings which forces defaults
referencing, such as TCP/HTTP rules. This can be reproduced with the
minimal config here :
defaults def
http-request return status 200
frontend fe
bind :20080
defaults def
Note that in fact orphaned list looping is not strictly necessary, as
defaults instances are automatically removed via refcounting. This will
be the purpose of a future patch. However, to limit the risk of
regression on stable releases during backport, this patch uses the more
direct approach for now.
This must be backported up to 3.1.
A recent patch has introduced a new state for proxies : unpublished
backends. Such backends won't be eligilible for traffic, thus
use_backend/default_backend rules which target them won't match and
content switching rules processing will continue.
This patch defines a new frontend keywords 'force-be-switch'. This
keyword allows to ignore unpublished or disabled state. Thus,
use_backend/default_backend will match even if the target backend is
unpublished or disabled. This is useful to be able to test a backend
instance before exposing it outside.
This new keyword is converted into a persist rule of new type
PERSIST_TYPE_BE_SWITCH, stored in persist_rules list proxy member. This
is the only persist rule applicable to frontend side. Prior to this
commit, pure frontend proxies persist_rules list were always empty.
This new features requires adjustment in process_switching_rules(). Now,
when a use_backend/default_backend rule matches with an non eligible
backend, frontend persist_rules are inspected to detect if a
force-be-switch is present so that the backend may be selected.
Define a new set of CLI commands publish/unpublish backend <be>. The
objective is to be able to change the status of a backend to
unpublished. Such a backend is considered ineligible to traffic : this
allows to skip use_backend rules which target it.
Note that contrary to disabled/stopped proxies, an unpublished backend
still has server checks running on it.
Internally, a new proxy flags PR_FL_BE_UNPUBLISHED is defined. CLI
commands handler "publish backend" and "unpublish backend" are executed
under thread isolation. This guarantees that the flag can safely be set
or remove in the CLI handlers, and read during content-switching
processing.
force-persist proxy keyword is converted into a persist_rule, stored in
proxy persist_rules list member. Each new rule is dynamically allocated
during parsing.
This commit fixes the memory leak on deinit due to a missing free on
persist_rules list entries. This is done via deinit_proxy()
modification. Each rule in the list is freed, along with its associated
ACL condition type.
This can be backported to every stable version.
Instead of statically allocating the per-thread group counters,
based on the max number of thread groups available, allocate
them dynamically, based on the number of thread groups actually
used. That way we can increase the maximum number of thread
groups without using an unreasonable amount of memory.
This reverts commit a6504c9cfb.
Each supported QUIC algo are associated with a set of callbacks defined
in a structure quic_cc_algo. Originally, bind_conf would use a constant
pointer to one of these definitions.
During pacing implementation, this field was transformed into a
dynamically allocated value copied from the original definition. The
idea was to be able to tweak settings at the listener level. However,
this was never used in practice. As such, revert to the original model.
This may need to be backported to support QUIC congestion control
algorithm support on the server line in version 3.3.
In 3.2, commit f879b9a18 ("MINOR: proxies: Add a per-thread group field
to struct proxy") introduced struct proxy_per_tgroup that is declared as
thread_aligned, but is allocated using calloc(). Thus it is at risk of
crashing on machines using instructions requiring 64-byte alignment such
as AVX512. Let's use ha_aligned_zalloc_typed() instead of malloc().
For 3.2, we don't have aligned allocations, so instead the THREAD_ALIGNED()
will have to be removed from the struct definition. Alternately, we could
manually align it as is done for fdtab.
Commit fd012b6c5 ("OPTIM: proxy: move atomically access fields out of
the read-only ones") caused the proxy struct to be 64-byte aligned,
which allows the compiler to use optimizations such as AVX512 to zero
certain fields. However the struct was allocated using calloc() so it
was not necessarily aligned, causing segv on startup on compatible
machines. Let's just use ha_aligned_zalloc_typed() to allocate the
struct.
No backport is needed.
The "abortonclose" option was recently deprecated in frontends because its
action was essentially limited to the backend part (queuing etc). But in
3.3 we started to support it for TLS on frontends, though it would only
work when placed in a defaults section. Let's officially support it in
frontends, and take this opportunity to clarify the documentation on this
topic, which was incomplete regarding frontend and TLS support. Now the
doc tries to better cover the different use cases.
It is possible on at least Linux and FreeBSD to set the congestion control
algorithm to be used with incoming connections, among the list of supported
and permitted ones. Let's expose this setting with "cc". Permission issues
might be reported (as warnings).
The server ID is currently stored as a 32-bit int using an eb32 tree.
It's used essentially to find holes in order to automatically assign IDs,
and to detect duplicates. Let's change this to use compact trees instead
in order to save 24 bytes in struct server for this node, plus 8 bytes in
struct proxy. The server struct is still 3904 bytes large (due to
alignment) and the proxy struct is 3072.
The listener ID is currently stored as a 32-bit int using an eb32 tree.
It's used essentially to find holes in order to automatically assign IDs,
and to detect duplicates. Let's change this to use compact trees instead
in order to save 24 bytes in struct listener for this node, plus 8 bytes
in struct proxy. The struct listener is now 704 bytes large, and the
struct proxy 3080.
The proxy ID is currently stored as a 32-bit int using an eb32 tree.
It's used essentially to find holes in order to automatically assign IDs,
and to detect duplicates. Let's change this to use compact trees instead
in order to save 24 bytes in struct proxy for this node, plus 8 bytes in
the root (which is static so not much relevant here). Now the proxy is
3088 bytes large.
This is used to index the proxy's name and it contains a copy of the
pointer to the proxy's name in <id>. Changing that for a ceb_node placed
just before <id> saves 32 bytes to the struct proxy, which is now 3112
bytes large.
Here we need to continue to support duplicates since they're still
allowed between type-incompatible proxies.
Interestingly, the use of cebis_next_dup() instead of cebis_next() in
proxy_find_by_name() allows us to get rid of an strcmp() that was
performed for each use_backend rule. A test with a large config
(100k backends) shows that we can get 3% extra performance on a
config involving a static use_backend rule (3.09M to 3.18M rps),
and even 4.5% on a dynamic rule selecting a random backend (2.47M
to 2.59M).
This contains the text representation of the server's address, for use
with stick-tables with "srvkey addr". Switching them to a compact node
saves 24 more bytes from this structure. The key was moved to an external
pointer "addr_key" right after the node.
The server struct is now 3968 bytes (down from 4032) due to alignment, and
the proxy struct shrinks by 8 bytes to 3152.
An eb tree was used to anticipate for infinite amount of custom log steps
configured at a proxy level. In turns out this makes no sense to configure
that much logging steps for a proxy, and the cost of the eb tree is non
negligible in terms of memory footprint, especially when used in a default
section.
Instead, let's use a simple bitmask, which allows up to 64 logging steps
configured at proxy level. If we lack space some day (and need more than
64 logging steps to be configured), we could simply modify
"struct log_steps" to spread the bitmask over multiple 64bits integers,
minor some adjustments where the mask is set and checked.
stktable_trash_oldest() does insist a lot on purging what was requested,
only limited by STKTABLE_MAX_UPDATES_AT_ONCE. This is called in two
conditions, one to allocate a new stksess, and the other one to purge
entries of a stopping process. The cost of iterating over all shards
is huge, and a shard lock is taken each time before looking up entries.
Moreover, multiple threads can end up doing the same and looking hard for
many entries to purge when only one is needed. Furthermore, all threads
start from the same shard, hence synchronize their locks. All of this
costs a lot to other operations such as access from peers.
This commit simplifies the approach by ignoring the budget, starting
from a random shard number, and using a trylock so as to be able to
give up early in case of contention. The approach chosen here consists
in trying hard to flush at least one entry, but once at least one is
evicted or at least one trylock failed, then a failure on the trylock
will result in finishing.
The function now returns a success as long as one entry was freed.
With this, tests no longer show watchdog warnings during tests, though
a few still remain when stopping the tests (which are not related to
this function but to the contention from process_table_expire()).
With this change, under high contention some entries' purge might be
postponed and the table may occasionally contain slightly more entries
than their size (though this already happens since stksess_new() first
increments ->current before decrementing it).
Measures were made on a 64-core system with 8 peers
of 16 threads each, at CPU saturation (350k req/s each doing 10
track-sc) for 10M req, with 3 different approaches:
- this one resulted in 1500 failures to find an entry (0.015%
size overhead), with the lowest contention and the fairest
peers distibution.
- leaving only after a success resulted in 229 failures (0.0029%
size overhead) but doubled the time spent in the function (on
the write lock precisely).
- leaving only when both a success and a failed lock were met
resulted in 31 failures (0.00031% overhead) but the contention
was high enough again so that peers were not all up to date.
Considering that a saturated machine might exceed its entries by
0.015% is pretty minimal, the mechanism is kept.
This should be backported to 3.2 after a bit more testing as it
resolves some watchdog warnings and panics. It requires precedent
commit "MINOR: stick-table: permit stksess_new() to temporarily
allocate more entries" to over-allocate instead of failing in case
of contention.
Willy reported that the following config would segfault right after the
"removing incomplete section 'peer' is emitted:
peers peers
bind :2300
server n10 127.0.0.1:2310
listen dummy
bind localhost:9999
This is caused by the fact that stop_proxy(), which tries to read shared
counters, is called during early init while shared counters are not yet
initialized. To fix the crash, let's check if we're still during starting
phase, in which case we assume the counters are not initialized and we
assume 0 value instead.
No backport needed unless 16eb0fab31 ("MAJOR: counters: dispatch counters
over thread groups") is.
counters_{fe,be}_shared_prepare now take an extra <errmsg> parameter
that contains additional hints about the error in case of failure.
It must be freed accordingly since it is allocated using memprintf
CLI command "show servers conn" is used as a debugging tool to monitor
the number of connections per server. This patch extends its output by
adding the content of two server counters.
<served> is the first added column. It represents the number of active
streams on a server. <curr_sess_idle_conns> is the second added column.
This is a recently added value which account private idle connections
referencing a server.
We used to allocate and prepare listener counters from
check_config_validity() all at once. But it isn't correct, since at that
time listeners's guid are not inserted yet, thus
counters_fe_shared_prepare() cannot work correctly, and so does
shm_stats_file_preload() which is meant to be called even earlier.
Thus in this commit (and to prepare for upcoming shm shared counters
preloading patches), we handle the shared listener counters prep in
proxy_postcheck(), which means that between the allocation and the
prep there is the proper window for listener's guid insertion and shm
counters preloading.
No change of behavior expected when shm shared counters are not
actually used.
It happens that we free servers at various places in the code, both
on error paths and at runtime thanks to the "server delete" feature. In
order to switch to an aligned struct, we'll need to change the calloc()
and free() calls. Let's first spot them and switch them to srv_alloc()
and srv_free() instead of using calloc() and either free() or ha_free().
An easy trap to fall into is that some of them are default-server
entries. The new srv_free() function also resets the pointer like
ha_free() does.
This was done by running the following coccinelle script all over the
code:
@@
struct server *srv;
@@
(
- free(srv)
+ srv_free(&srv)
|
- ha_free(&srv)
+ srv_free(&srv)
)
@@
struct server *srv;
expression e1;
expression e2;
@@
(
- srv = malloc(e1)
+ srv = srv_alloc()
|
- srv = calloc(e1, e2)
+ srv = srv_alloc()
)
This is marked medium because despite spotting all call places, we can
never rule out the possibility that some out-of-tree patches would
allocate their own servers and continue to use the old API... at their
own risk.
post_section_px_cleanup(), which was implemented in abcc73830
("MEDIUM: proxy: register a post-section cleanup function"), is called
for the current section no matter if the parsing was aborted due to
a fatal error. In this case, the curproxy pointer may point to NULL,
yet post_section_px_cleanup() assumes curproxy pointer is always valid,
which could lead to NULL-deref.
For instance, the config below will cause SEGFAULT:
listen toto titi
To fix the issue, let's simply consider that the curproxy pointer may
be NULL in post_section_px_cleanup(), in which case we skip the cleanup
for the curproxy since there is nothing we can do.
No backport needed
75e480d10 ("MEDIUM: stats: avoid 1 indirection by storing the shared
stats directly in counters struct") took care of renaming
counters_fe_shared_init() but we forgot counters_be_shared_init().
Let's fix that for consistency