A new type of transaction was introduced for master-cli streams. So
SF_TXN_PCLI flag and functions to allocate and destroy PCLI transactions
were added.
In the stream structure, all pcli_* members were moved in the pcli
transaction and the txn union was updated accordingly.
When it was ambiguous, a test on the transaction type was performed. For
instance to destroy the transaciton.
To be able to deal with different types of transaction for a stream, new
stream flags was added to know the transaction type when allocated. For now
only HTTP transactions can be allocated, so only SF_TXN_HTTP was
introduced. The mask SF_TXN_MASK must be used to get the transaction type.
The transaction type is set when it is allocated and removed when it is
destroyed.
The HTTP transaction is moved in an union. For now, it is the only possible
transaction that can be allocated. But that will change. Thanks to this
commit and the next one, it will be possible to deal with different kind of
transactions for a stream.
This patch looks quite huge, but it is more or less a renaming of all
accesses to "txn" field by "txn.http".
With the introduction of the `generate_unique_id()` helper, the actual
complicated logic is sitting in a different file. Allow inlining of
`stream_generate_unique_id()`, so that callers can benefit from an abstraction
without hiding away the access of `strm->unique_id` behind a function call.
This new function will handle the actual generation of the unique ID according
to a format. The caller is responsible to check that no unique ID is stored
yet.
The return value of the `if()` and `else` branch is identical. We can just move
it out of conditional paths.
Reviewed-by: Volker Dusch <github@wallbash.com>
In order to improve our ability to distinguish operations that had
already started from others under high loads, it would be nice to know
if an application layer (stream) has started to work with an endpoint
or not. The use case typically is a frontend mux instantiating a stream
to instantly cancel it. Currently this info will take some time to be
detected and processed if the applcation's task takes time to wake up.
By flagging the sedesc with SE_FL_APP_STARTED the first time a the app
layer starts, the lower layers can know whether they're cancelling a
stream that has started to work or not, and act accordingly. For now
this is done unconditionally on the backend, and performed early in the
only two app layers that can be reached by a frontend: process_stream()
and process_hstream() (for haterm).
When picking a mux, pay attention to its MX_FL_FRAMED. If it is set,
then it means we explicitely want QUIC, so don't use that mux for any
protocol that is not QUIC.
Thanks to previous commits, it is possible to use small buffers at different
places: to store the request when a connection is queued or when L7 retries
are enabled, or for health-checks requests. However, there was no
configuration parameter to fine tune small buffer use.
It is now possible, thanks to the proxy option "use-small-buffers".
Documentation was updated accordingly.
When a HTX stream is queued, if the request is small enough, it is moved
into a small buffer. This should save memory on instances intensively using
queues.
Applet and connection receive function were update to block receive when a
small buffer is in use.
The commit 9f1e9ee0e ("DEBUG: stream: Display the currently running rule in
stream dump") revealed a bug. When a stream is dumped, if it is blocked on a
rule, we must take care the rule has a keyword to display its name.
Indeed, some action parsings are inlined with the rule parser. In that case,
there is no keyword attached to the rule.
Because of this bug, crashes can be experienced when a stream is
dumped. Now, when there is no keyword, "?" is display instead.
This patch must be backported as far as 2.6.
A bit similar to what was done for sample fetch functions and converters,
we now store with each action keyword the location of the initcall when
they're registered this way. Since there are many functions only calling
a LIST_APPEND() (one per ruleset), we now implement a dedicated function
to store the context in all keywords before doing the append.
However that's not sufficient, because keywords are not mandatory for
actions, so we cannot safely rely on rule->kw. Thus we then set the
exec_ctx per rule when they are all scanned in check_action_rules(),
based on the keyword if it exists, otherwise we make a context from
the action_ptr function if it is set (it should).
Finally at all call points we now check rule->exec_ctx.
Introduce COUNTERS_UPDATE_MAX(), and use it instead of using
HA_ATOMIC_UPDATE_MAX() directly.
For now it just calls HA_ATOMIC_UPDATE_MAX(), but will later be modified
so that we can disable max calculation.
This can be backported up to 2.8 if the usage of COUNTERS_UPDATE_MAX()
generates too many conflicts.
Since the 3.1, when stream's info are dump, it is possible to print the
yielding filter on each channel, if any. It was useful to detect buggy
filter on spinning loop. But it is not possible to detect a filter consuming
too much CPU per-execution. We can see a filter was executing in the
backtrace reported by the watchdog, but we are unable to spot the specific
one.
Thanks to this patch, it is now possible. When a dump is emitted, the
running or yield filters on each channel are now displayed with their
current state (RUNNING or YIELDING).
This patch could be backported as far as 3.2 because it could be useful to
spot issues. But the filter API was slightly refactored in 3.4, so this
patch should be adapted.
Since the 2.5, when stream's info are dump, it is possible to print the
yielding rule, if any. It was useful to detect buggy rules on spinning
loop. But it is not possible to detect a rule consuming too much CPU
per-execution. We can see a rule was executing in the backtrace reported by
the watchdog, but we are unable to spot the specific rule.
Thanks to this patch, it is now possible. When a dump is emitted, the
running or yield rule is now displayed with its current state (RUNNING or
YIELDING).
This patch could be backported as far as 3.2 because it could be useful to
spot issues.
A number of C files include stats.h or stats-t.h, many of which were
just to access the counters. Now those which really need counters rely
on counters.h or counters-t.h, which already reduces the amount of
preprocessed code to be built (~3000 lines or about 0.05%).
Add a pointer to function to proxies as ->stream_new_from_sc proxy
struct member to instantiate stream from connection as this is done by
all the muxes when they call sc_new_from_endp(). The default value for
this pointer is obviously stream_new() which is exported by this patch.
Add the ability to set connect, queue and tarpit timeouts from the
set-timeout action. This is especially useful when using set-dst to
dynamically connect to servers.
This patch also adds the relevant fe_/be_/cur_ sample fetches for these
timeouts.
b_is_default() and b_is_large() can now be used to know if a buffer is a
default buffer or a large one. _b_free() now relies on it.
These functions are also used when possible (stream_free(),
stream_release_buffers() and http_wait_for_msg_body()).
Add the support for large bufers. A dedicated memory pool is added. The size
of these buffers must be explicitly configured by setting
"tune.bufsize.large" directive. If it is not set, the pool is not
created. In addition, if the size for large buffers is the same than for
regular buffer, the feature is automatically disable.
For now, large buffers remain unused.
Check the channels buffers size on release before trying to offer it to
waiting entities. Only normal buffers must be considered. This will be
mandatory when the large buffers support on channels will be added.
It is not a bug fix, because there is no way to hit the issue for now. But
there is nothing preventing a loop of synchronous sends in process_stream().
Indead, when a synchronous send is successfully performed, we restart the
SCs evaluation and at the end another synchronous send is attempted. So with
an endpoint consuming data bit by bit or with a filter fowarding few bytes
at each call, it is possible to loop for a while in process_stream().
Because it is not expected, we now limit the number of synchronous send per
wakeup to two calls. In a nominal case, it should never be more. This commit
is mandatory to be able to handle large buffers on channels
There is no reason to backport this commit except if the large buffers
support on channels are backported.
In the historical implementation, all filter related information where
stored at the stream level (using struct strm_flt * context), and filters
iteration was performed at the stream level also.
We identified that this was not ideal and would make the implementation of
future filters more complex since filters ordering should be handled in
a different order during request and response handling for decompression
for instance.
To make such thing possible, in this commit we migrate some channel
specific filter contexts in the channel directly (request or response),
and we implement 2 additional filter lists, one on the request channel
and another on the response channel. The historical stream filter list
is kept as-is because in some contexts only the stream is available and
we have to iterate on all filters. But for functions where we only are
interested in request side or response side filters, we now use dedicated
channel filters list instead.
The only overhead is that the "struct filter" was expanded by two "struct
list".
For now, no change of behavior is expected.
A recent patch has introduced a new state for proxies : unpublished
backends. Such backends won't be eligilible for traffic, thus
use_backend/default_backend rules which target them won't match and
content switching rules processing will continue.
This patch defines a new frontend keywords 'force-be-switch'. This
keyword allows to ignore unpublished or disabled state. Thus,
use_backend/default_backend will match even if the target backend is
unpublished or disabled. This is useful to be able to test a backend
instance before exposing it outside.
This new keyword is converted into a persist rule of new type
PERSIST_TYPE_BE_SWITCH, stored in persist_rules list proxy member. This
is the only persist rule applicable to frontend side. Prior to this
commit, pure frontend proxies persist_rules list were always empty.
This new features requires adjustment in process_switching_rules(). Now,
when a use_backend/default_backend rule matches with an non eligible
backend, frontend persist_rules are inspected to detect if a
force-be-switch is present so that the backend may be selected.
A proxy can be marked as disabled using the keyword with the same name.
The doc mentions that it won't process any traffic. However, this is not
really the case for backends as they may still be selected via switching
rules during stream processing.
In fact, currently access to disabled backends will be conducted up to
assign_server(). However, no eligible server is found at this stage,
resulting in a connection closure or an HTTP 503, which is expected. So
in the end, servers in disabled backends won't receive any traffic. But
this is only because post-parsing steps are not performed on such
backends. Thus, this can be considered as functional but only via
side-effects.
This patch clarifies the handling of disable backends, so that they are
never selected via switching rules. Now, process_switching_rules() will
ignore disable backends and continue rules evaluation.
As this is a behavior change, this patch is labelled as medium. The
documentation manuel for use_backend is updated accordingly.
This commit rewrites process_switching_rules() function. The objective
is to simplify backend selection so that a single unified
stream_set_backend() call is kept, both for regular and default backends
case.
This patch will be useful to add new capabilities on backends, in the
context of dynamic backend support implementation.
Instead of statically allocating the per-thread group counters,
based on the max number of thread groups available, allocate
them dynamically, based on the number of thread groups actually
used. That way we can increase the maximum number of thread
groups without using an unreasonable amount of memory.
The <total> field in the channel structure is now useless, so it can be
removed. The <bytes_in> field from the SC is used instead.
This patch is related to issue #1617.
bytes_in and bytes_out counters per frontend, backend, listener and server
were removed and we now rely on, respectively on, req_in and res_in
counters.
This patch is related to issue #1617.
per-stream bytes_in and bytes_out counters was removed and replaced by
req.in and res.in. Coorresponding samples still exists but replies on new
counters.
This patch is related to issue #1617.
Thanks to the previous patch, and based on info available on the stream, it
is now possible to have counters for frontends, backends, servers and
listeners to report number of bytes received and sent on both sides.
This patch is related to issue #1617.
req.in and req.out samples can now be used to get the number of bytes
received by a client and send to the server. And res.in and res.out samples
can be used to get the number of bytes received by a server and send to the
client. These info are stored in the logs structure inside a stream.
This patch is related to issue #1617.
<bytes_in> and <bytes_out> counters were added to SC to count, respectively,
the number of bytes received from an endpoint or sent to an endpoint. These
counters are updated for connections and applets.
This patch is related to issue #1617.
Thanks for previous changes, it is now possible to remove the <extra> field
from the HTX structure. HTX_FL_ALTERED_PAYLOAD flag is also removed because
it is now unsued.
In order to prepare for changing the way abortonclose works, let's
replace the direct flag check with a similarly named function
(proxy_abrt_close) which returns the on/off status of the directive
for the proxy. For now it simply reflects the flag's state.
This patch looks huge, but it has a very simple goal: protect all
accessed to shared stats pointers (either read or writes), because
we know consider that these pointers may be NULL.
The reason behind this is despite all precautions taken to ensure the
pointers shouldn't be NULL when not expected, there are still corner
cases (ie: frontends stats used on a backend which no FE cap and vice
versa) where we could try to access a memory area which is not
allocated. Willy stumbled on such cases while playing with the rings
servers upon connection error, which eventually led to process crashes
(since 3.3 when shared stats were implemented)
Also, we may decide later that shared stats are optional and should
be disabled on the proxy to save memory and CPU, and this patch is
a step further towards that goal.
So in essence, this patch ensures shared stats pointers are always
initialized (including NULL), and adds necessary guards before shared
stats pointers are de-referenced. Since we already had some checks
for backends and listeners stats, and the pointer address retrieval
should stay in cpu cache, let's hope that this patch doesn't impact
stats performance much.
This contains the text representation of the server's address, for use
with stick-tables with "srvkey addr". Switching them to a compact node
saves 24 more bytes from this structure. The key was moved to an external
pointer "addr_key" right after the node.
The server struct is now 3968 bytes (down from 4032) due to alignment, and
the proxy struct shrinks by 8 bytes to 3152.
This patchs reverts commit a498e527b ("BUG/MAJOR: stream: Remove READ/WRITE
events on channels after analysers eval") because of a regression. It was an
attempt to properly detect synchronous sends, even when the stream was woken
up on a write event. However, the fix was wrong because it could mask
shutdowns performed during process_stream() and block the stream.
Indeed, when a shutdown is performed, because an error occurred for
instance, a write event is reported. The commit above could mask this event
while the shutdown prevent any synchronous sends. In such case, the stream
could remain blocked infinitly because an I/O event was missed.
So to properly fix the original issue (#3070), the write event must not be
masked before a synchronous send. Instead, we now force the channel analysis
by setting explicitly CF_WAKE_ONCE flags on the corresponding channel if a
write event is reported after the synchronous send. CF_WRITE_EVENT flag is
remove explicitly just before, so it is quite easy to detect.
This patch must be backport to all stable version in same time of the commit
above.
Normally the connect loop cannot loop, but some recent traces can easily
convince one of the opposite. Let's add a counter, including in panic
dumps, in order to avoid the repeated long head scratching sessions
starting with "and what if...". In addition, if it's found to loop, this
time it will be certain and will indicate what to zoom in. This should
be backported to 3.2.
It is possible to miss a synchronous write event in process_stream() if the
stream was woken up on a write event. In that case, it is possible to freeze
the stream until the next I/O event or timeout.
Concretely, the stream is woken up with CF_WRITE_EVENT on a channel. this
flag is removed from the channel when we leave proces_stream(). But before
leaving process_stream(), when a synchronous send is tried on this channel,
the flag is removed and eventually set again on success. But this event is
masked by the previous one, and the channel is not resync as it should be.
To fix the bug, CF_READ_EVENT and CF_WRITE_EVENT flags are removed from a
channel after the corresponding analysers evaluation. This way, we will be
able to detect a successful synchronous send to restart analysers evaluation
based on the new channel state. It is safe (or it should be) to do so
becaues these flags are only used by analysers and tested to resync the
stream inside process_stream().
It is a very old bug and I guess all versions are affected. It was observed
on 2.9 and higher, and with the master/worker only. But it could affect any
stream. It is tagged a MAJOR because this area is really sensitive to any
change.
This patch should fix the issue #3070. It should probably be backported to
all stable versions, but only after a period of observation and with a
special care because this area is really sensitive to changes. It is
probably reasonnable to backport it as far as 3.0 and wait for older
versions.
Thanks to Valentine for its help on this issue !
This will make the pools size and alignment automatically inherit
the type declaration. It was done like this:
sed -i -e 's:DECLARE_POOL(\([^,]*,[^,]*,\s*\)sizeof(\([^)]*\))):DECLARE_TYPED_POOL(\1\2):g' $(git grep -lw DECLARE_POOL src addons)
sed -i -e 's:DECLARE_STATIC_POOL(\([^,]*,[^,]*,\s*\)sizeof(\([^)]*\))):DECLARE_STATIC_TYPED_POOL(\1\2):g' $(git grep -lw DECLARE_STATIC_POOL src addons)
81 replacements were made. The only remaining ones are those which set
their own size without depending on a structure. The few ones with an
extra size were manually handled.
It also means that the requested alignments are now checked against the
type's. Given that none is specified for now, no issue is reported.
It was verified with "show pools detailed" that the definitions are
exactly the same, and that the binaries are similar.
Following commit 75e480d10 ("MEDIUM: stats: avoid 1 indirection by storing
the shared stats directly in counters struct"), in order to minimize the
impact of the recent sharded counters work, we try to push things a bit
further in this patch by storing and using "fast" pointers at the session
and stream levels when available to avoid costly indirections and
systematic "tgid" resolution (which can not be cached by the CPU due to
its THREAD-local nature).
Indeed, we know that a session/stream is tied to a given CPU, thanks to
this we know that the tgid for a given session/stream will never change.
Given that, we are able to store sharded frontend and listener counters
pointer at the session level (namely sess->fe_tgcounters and
sess->li_tgcounters), and once the backend and the server are selected,
we are also able to store backend and server sharded counters
pointer at the stream level (namely s->be_tgcounters and s->sv_tgcounters)
Everywhere we rely on these counters and the stream or session context is
available, we use the fast pointers it instead of the indirect pointers
path to make the pointer resolution a bit faster.
This optimization proved to bring a few percents back, and together with
the previous 75e480d10 commit we now fixed the performance regression (we
are back to back with 3.2 stats performance)
Between 3.2 and 3.3-dev we noticed a noticeable performance regression
due to stats handling. After bisecting, Willy found out that recent
work to split stats computing accross multiple thread groups (stats
sharding) was responsible for that performance regression. We're looking
at roughly 20% performance loss.
More precisely, it is the added indirections, multiplied by the number
of statistics that are updated for each request, which in the end causes
a significant amount of time being spent resolving pointers.
We noticed that the fe_counters_shared and be_counters_shared structures
which are currently allocated in dedicated memory since a0dcab5c
("MAJOR: counters: add shared counters base infrastructure")
are no longer huge since 16eb0fab31 ("MAJOR: counters: dispatch counters
over thread groups") because they now essentially hold flags plus the
per-thread group id pointer mapping, not the counters themselves.
As such we decided to try merging fe_counters_shared and
be_counters_shared in their parent structures. The cost is slight memory
overhead for the parent structure, but it allows to get rid of one
pointer indirection. This patch alone yields visible performance gains
and almost restores 3.2 stats performance.
counters_fe_shared_get() was renamed to counters_fe_shared_prepare() and
now returns either failure or success instead of a pointer because we
don't need to retrieve a shared pointer anymore, the function takes care
of initializing existing pointer.