We anticipated that the do-log action should be expanded with optional
arguments at some point. Now that we heard of multiple use-cases
that could be achieved with do-log action, but that are limitated by the
fact that all do-log statements inherit from the implicit log-profile
defined on the logger, we need to provide a way for the user to specify
that custom log-profile that could be used per do-log actions individually
This is what we try to achieve in this commit, by leveraging the
prerequisite work performed by the last 2 commits.
The new field th_ctx->rq_tot_peak contains the computed peak run queue
length averaged over the last 512 calls. This is computed when entering
process_runnable_tasks. It will not take into account new tasks that are
created or woken up during this round nor those which are evicted, which
is the reason why we're using a peak measurement to increase chances to
observe transient high values. Tests have shown that 512 samples are good
to provide a relatively smooth average measurement while still fading
away in a matter of milliseconds at high loads. Since this value is
only updated once per round, it cannot be used as a statistic and
shouldn't be exposed, it's only for internal use (self-regulation).
Sometimes it's desirable to observe fading away peak values, where a new
value that is higher than the historical one instantly replaces it,
otherwise contributes to it. It is convenient when trying to observe
certain phenomenons like peak queue sizes. The new function
swrate_add_peak_local() does that to a private variable (no atomic ops
involved as it's not worth the cost since such use cases are typically
local).
When "set-dumpable" is set to "libs", in addition to marking the process
dumpable, haproxy also reads the binary and shared objects into memory as
a tar archive in a page-aligned location so that these files are easily
extractable from a future core dump. The goal here is to always have
access to the exact same binary and libs as those which caused the core
to happen. It's indeed very frequent to miss some of these, or to get
mismatching files due to a local update that didn't experience a reload,
or to get those of a host system instead of the container.
The in-memory tar file presents everything under a directory called
"core-%d" where %d corresponds to the PID of the worker process. In
order to ease the finding of these data in the core dump, the memory
area is contiguous and surrounded by PROT_NONE pages so that it appears
in its own segment in the core file. The total size used by this is a
few tens of MB, which is not a problem on large systems.
New function load_file_into_tar() concatenates a file into an in-memory
tar archive and grows its size. Only the base name and a provided prefix
are used to name the faile. If the file cannot be loaded, it's added as
size zero and permissions 0 to show that it failed to load. This will
be used to load post-mortem information so it needs to remain simple.
The purpose here is to create a tar file header in memory from a known
file name, prefix, size and mode. It will be used to prepare archives
of libs in use for improved debugging, but may probably be useful for
other purposes due to its simplicity.
We no longer rely on now_offset stored in the shm-stats-file. Instead
haproxy automatically computes the now_offset relative to the monotonic
clock and the shared global clock.
Indeed, the previous model based on static now_offset when monotonic
clock is available proved to be insufficient when used in
combination with shm-stats-file (that is when monotonic clock is shared
between multiple co-processes). In ideal situation co-processes would
correctly apply the offset to their local monotonic clock and end up
with consistent now_ns. But when restarting from an existing
shm-stats-file from a previous session (ie: prior to reboot), then the
local monotonic clock would no longer be consistent with the one used
to update the file previously, so applying a static offset would fail
to restore clock consistency.
For this specific issue, a workaround was brought by 09bf116
("BUG/MEDIUM: stats-file: detect and fix inconsistent shared clock when resuming from shm-stats-file")
but the solution implemented there was deemed too fragile, because there
is a 60sec window where the fix would fail to detect inconsistent clock
and would leave haproxy with a broken clock ranging from 0 to 60 seconds,
which can be huge..
By simply recomputing the now_offset each time we learn from another
process (through the shared map by reading global_now_ns), we simply
recompute our local offset (difference between OUR monotonic clock
and the SHARED one). Also, in clock_update_global_date(), we make
sure we always recompute the now_offset as now_ms may have been
updated from shared clock if shared clock was ahead of us.
Thanks to that new logic, interrupted processes, resumed processes,
processed started with shm-stats-file from previous session now
correctly recover from those various situations and multiple
co-processes with diverting clocks on startup end up converging to
the same values.
Since it is no longer relevant to save now_offset in the map, it was
removed but to prevent shm-stats-file incompatibility with previous
versions, 8-byte hole was forced, and we didn't bump the shm-stats-file
version on purpose.
This patch may be backported in 3.3 after a solid period of observation
to ensure we didn't break things.
We defer processing of the "-dt" options until after the configuration
file has been read. This will be useful if we ever allow trace sources
to be registered later, for instance with LUA.
No backport needed.
Keywords registered out of an initcall will have a TH_EX_CTX_CLI_KWL
execution context pointing to the keyword list. The report will indicate
the 5 first words of the first command of the list, e.g.:
exec_ctx: cli kwl starting with 'debug counters '
This should also work for CLI keywords registered in Lua.
Now CLI keywords registered via an initcall will be tracked during
execution, by keeping a link to their initcall location. "show threads"
now shows "exec_ctx: kw registered at @debug.c:3093" which indeed
corresponds to the initcall for the debugging commands.
Till now the CLI didn't know what keyword was being processed after it
was parsed. In order to report the execution context, we'll need to
store it. And this may even help for post-mortem analysis to know the
exact keyword being processed, so let's store the pointer in the cli_ctx
part of the appctx.
It allows to know when a thread is currnetly running inside an applet.
For example now "show threads" will show "applet '<CLI>'" for the thread
issuing this command.
It now appears almost everywhere due to callbacks (e.g. ssl_sock_io_cb).
Muxes also become visible now on memory profiling. A small test on h1+ssl
yields 838 lines of statistics. The number of buckets should definitely
be increased, and more grouping criteria should be added.
A performance test was conducted to observe the possible effect of
setting the execution context on each task switch, and it didn't change
at all, remaining at about 1.01 billion ctxsw/s on a 128-thread EPYC.
Most calls to mux ops were instrumented with a CALL_MUX_WITH_RET() or
CALL_MUX_NO_RET() macro in order to make the current thread's context
point to the called mux and be able to track its allocations. Only
a bunch of harmless mux_ctl() and ->subscribe/unsubscribe calls were
left untouched since useless. But destroy/detach/shut/init/snd_buf
and rcv_buf are now tracked.
It will not show allocations performed in IO callback via tasklet
wakeups however.
In order to ease reading of the output, cmp_memprof_ctx() knows about
muxes and sorts based on the .subscribe function address instead of
the mux_ops address so as to keep various callers grouped.
Doing this allows to report the allocations/releases performed by filters
when running with memory profiling enabled. The flt_conf pointer is kept
and the report shows the filter name.
A bit similar to what was done for sample fetch functions and converters,
we now store with each action keyword the location of the initcall when
they're registered this way. Since there are many functions only calling
a LIST_APPEND() (one per ruleset), we now implement a dedicated function
to store the context in all keywords before doing the append.
However that's not sufficient, because keywords are not mandatory for
actions, so we cannot safely rely on rule->kw. Thus we then set the
exec_ctx per rule when they are all scanned in check_action_rules(),
based on the keyword if it exists, otherwise we make a context from
the action_ptr function if it is set (it should).
Finally at all call points we now check rule->exec_ctx.
The purpose here is to be able to spot certain callbacks, such as the
SSL message callbacks, which are difficult to associate to anything.
Thus we introduce a new context type, TH_EX_CTX_FUNC, for which the
context is just the function pointed to by the void *pointer. One
difficulty with callbacks is that the allocation and release contexts
will likely be different, so the code should be properly structured
to allow proper tracking, either by instrumenting all calls, or by
making sure that the free calls are easy to spot in a report.
With the two new context types TH_EX_CTX_SMPF/CONV, we can now also
report contexts corresponding to direct calls to sample_register_fetches()
and sample_register_convs(). In this case, the first word of the keyword
list is reported.
Now keywords are registered with an exec_ctx and this one is passed
when calling ->process. The ctx is of type INITCALL when passed via
an initcall where we know the file name and line number.
This was tested with and extra "malloc(15)" added in smp_fetch_path()
which shows that it works:
$ socat /tmp/sock1 - <<< "show profiling memory"|grep via
Calls | Tot Bytes | Caller and method [via]
1893399 0 60592592 0| 0x78b2ec task_run_applet+0x3339c malloc(32) [via initcall @http_fetch.c:2416]
When the execution context is set to TH_EX_CTX_INITCALL, the pointer
points to a valid initcall, and the decoder will show "kw registered
at %s:%d" with file and line number of the initcall declaration. It's
up to the caller to make the initcall pointer point to the one that was
set during the initcall. The purpose here is to be able to preserve and
pass that knowledge of an initcall down the chain so that future calls
to functions registered via the initcall are still assigned to it.
The INITCALL macros will now store the file and line number where they
are declared into the initcall struct, and RUN_INITCALLS() will assign
them to the global caller_file and caller_line variables, and will even
set caller_initcall to the current initall so that at any instant such
functions know where their caller declared them. This will help with
error messages and traces where a bit of context will be welcome.
The new function chunk_append_thread_ctx() appends to a buffer the given
execution context based on its type and pointer. The goal is to easily
use it in profiling output and thread dumps. For now it only handles
TH_EX_CTX_NONE (which prints nothing) and TH_EX_CTX_OTHER (which indicates
"other ctx" followed by the pointer). It will be extended by new types as
they arrive.
This now allows to report the same function in multiple bins based on the
th_ctx's exec_ctx discriminant. It's also worth noting that the context is
not atomically committed, but this shouldn't be a problem since a single
entry can get it. In the worst case, a second thread trying to create the
same context in parallel would create a different bin just for this call,
which is harmless. The same situation already exists with the caller
pointer.
We have the struct made of a type and a pointer in the th_ctx and a
function to switch it for the current thread. Two macros are provided
to enclose a callee within a temporary context. For now only type OTHER
is supported (only a generic pointer).
The purpose here is to combine two pointers and a long argument instead
of having the caller perform the mixing. Also it's cleaner and more
efficient this was as the arg is mixed after the multiplications, and
modern processors are efficient at multiplying then adding.
We'll need to further extend the pointer hashing code to pass extra
parameters and to retrieve the dropped bits, so let's first split the
part that hashes the pointer from the part that reduces the hash to
the desired size.
Starting with OpenSSL 4.0, X509_get_subject_name(), X509_get_issuer_name(),
and X509_CRL_get_issuer() return a const-qualified X509_NAME pointer.
Similarly, X509_NAME_get_entry() returns a const X509_NAME_ENTRY *, and
X509_NAME_ENTRY_get_data() returns a const ASN1_STRING *.
Introduce the __X509_NAME_CONST__ macro (defined to 'const' for OpenSSL
>= 4.0.0, empty for WolfSSL and older OpenSSL version which lacks const
on these APIs) and use it to qualify X509_NAME * variables and the
parameters of the three DN helper functions ssl_sock_get_dn_entry(),
ssl_sock_get_dn_formatted(), and ssl_sock_get_dn_oneline(). This avoids
both const-qualifier warnings on OpenSSL 4.0 and discarded-qualifier
warnings on WolfSSL, without needing explicit casts at call sites.
In ssl_sock.c (ssl_get_client_ca_file) and ssl_gencert.c
(ssl_sock_do_create_cert), a __X509_NAME_CONST__ X509_NAME * variable was
being reused to store the result of X509_NAME_dup() and then passed to
mutating functions (X509_NAME_add_entry_by_txt, X509_NAME_free). Introduce
separate X509_NAME * variables (xn_dup, subject) to hold the mutable
duplicate.
Original patch from Alexandr Nedvedicky <sashan@openssl.org>:
https://www.mail-archive.com/haproxy@formilux.org/msg46696.html
wake_srv_chk() function is now only used by srv_chk_io_cb(), the
health-checl I/O callback function. So let's remove it. The code of the
function was moved in srv_chk_io_cb().
At we fail to create a mux, in conn_create_mux(), instead of calling the
app_ops .wake() callback function, we can directly call sc_conn_process().
At this stage, we know we are using an connection, so it is safe to do so.
At the end of task_run_applet() and task_process_applet(), instead of
calling the app_ops .wake() callback function, we can directly call
sc_applet_process(). At this stage, we know we are using an applet, so it is
safe to do so.
The 'jwt_tokenize' function that can be used to split a JWT token into
its subparts can either fully process the token (from beginning to end)
when we need to check its signature, or only partially when using the
jwt_header_query or jwt_member_query converters. In this case we relied
on the fact that the return value of the 'jwt_tokenize' function was not
checked because a '-1' was returned (which was not actually an error).
In order to make this logic more explicit, the 'jwt_tokenize' function
now has a way to warn the caller that the token was invalid (less
subparts than the specified 'item_num') or that the token was not
processed in full (enough subparts found without parsing the token all
the way).
The function will now only return 0 if we found strictly the same number
of subparts as 'item_num'.
Existing "compression" filter is a multi-purpose filter that will try
to compress both requests and responses according to "compression"
settings, such as "compression direction".
One of the pre-requisite work identified to implement decompression
filter is that we needed a way to manually define the sequence of
enabled filters to chain them in the proper order to make
compression and decompression chains work as expected in regard
to the intended use-case.
Due to the current nature of the "compression" filter this was not
possible, because the filter has a combined action as it will try
to compress both requests and responses, and as we are about to
implement "filter-sequence" directive, we will not be able to
change the order of execution of the compression filter between
requests and responses.
A possible solution we identified to solve this issue is to split the
existing "compression" filter into 2 distinct filters, one which is
request-oriented, "comp-req", and another one which is response-oriented
"comp-res". This is what we are doing in this commit. Compression logic
in itself is unchanged, "comp-req" will only aim to compress the request
while "comp-res" will try to compress the response. Both filters will
still be invoked on request and responses hooks, but they only do their
part of the job.
From now on, to compress both requests and responses, both filters have
to be enabled on the proxy. To preserve original behavior, the "compression"
filter is still supported, what it does is that it instantiates both
"comp-req" and "comp-res" filters implicitly, as the compression filter is
now effectively split into 2 separate filters under the hood.
When using "comp-res" and "comp-req" filters explicitly, the use of the
"compression direction" setting is not relevant anymore. Indeed, the
compression direction is assumed as soon as one or both filters are
enabled. Thus "compression direction" is kept as a legacy option in
order to configure the "compression" generic filter.
Documentation was updated.
Add a new option, "stats calculate-max-counters [on|off]".
It makes it possible to disable the calculation of max counters, as they
can have a performance cost.
Introduce COUNTERS_UPDATE_MAX(), and use it instead of using
HA_ATOMIC_UPDATE_MAX() directly.
For now it just calls HA_ATOMIC_UPDATE_MAX(), but will later be modified
so that we can disable max calculation.
This can be backported up to 2.8 if the usage of COUNTERS_UPDATE_MAX()
generates too many conflicts.