The HTX functions used to add new HTX blocks in a message have been moved to
the header file to inline them in calling functions. These functions are
small enough.
A normalized URI is the internal term used to specify an URI is stored using
the absolute format (scheme + authority + path). For now, it is only used
for H2 clients. It is the default and recommended format for H2 request.
However, it is unusual for H1 servers to receive such URI. So in this case,
we only send the path of the absolute URI. It is performed for H1 servers,
but not for FCGI applications. This patch fixes the difference.
Note that it is not a real bug, because FCGI applications should support
abosolute URI.
Note also a normalized URI is only detected for H2 clients when a request is
received. There is no such test on the H1 side. It means an absolute URI
received from an H1 client will be sent without modification to an H1 server
or a FCGI application.
To make it possible, a dedicated function has been added to get the H1
URI. This function is called by the H1 and the FCGI multiplexer when a
request is sent to a server.
This patch should fix the issue #1232. It must be backported as far as 2.2.
The error path of the NUMA topology detection introduced in commit
b56a7c89a ("MEDIUM: cfgparse: detect numa and set affinity if needed")
lacks an initialization resulting in possible crashes at boot. No
backport is needed since that was introduced in 2.4-dev.
In proxy.c, when process is stopping we try to flush tables content
using 'stktable_trash_oldest'. A check on a counter "table->syncing" was
made to verify if there is no pending resync in progress.
But using multiple threads this counter can be increased by an other thread
only after some delay, so the content of some tables can be trashed earlier and
won't be pushed to the new process (after reload, some tables appear reset and
others don't).
This patch re-names the counter "table->syncing" to "table->refcnt" and
the counter is increased during configuration parsing (registering a table to
a peer section) to protect tables during runtime and until resync of a new
process has succeeded or failed.
The inc/dec operations are now made using atomic operations
because multiple peer sections could refer to the same table in futur.
This fix addresses github #1216.
This patch should be backported on all branches multi-thread support (v >= 1.8)
The peers task handling the "stopping" could wake up multiple
times in stopping state with WOKEN_SIGNAL: the connection to the
local peer initiated on the first processing was immediatly
shutdown by the next processing of the task and the old process
exits considering it is unable to connect. It results on
empty stick-tables after a reload.
This patch checks the flag 'PEERS_F_DONOTSTOP' to know if the
signal is considered and if remote peers connections shutdown
is already done or if a connection to the local peer must be
established.
This patch should be backported on all supported branches (v >= 1.6)
The old process checked each table resync status even if
the resync process is finished. This behavior had no known impact
except useless processing and was discovered during debugging on
an other issue.
This patch could be backported in all supported branches (v >= 1.6)
but once again, it has no impact except avoid useless processing.
In tv_update_date(), we calculate the new global date based on the local
one. It's very likely that other threads will end up with the exact same
now_ms date (at 1 million wakeups/s it happens 99.9% of the time), and
even the microsecond was measured to remain unchanged ~70% of the time
with 16 threads, simply because sometimes another thread already updated
a more recent version of it.
In such cases, performing a CAS to the global variable requires a cache
line flush which brings nothing. By checking if they're changed before
writing, we can divide by about 6 the number of writes to the global
variables, hence the overall contention.
In addition, it's worth noting that all threads will want to update at
the same time, so let's place a cpu relax call before trying again, this
will spread attempts apart.
The time adjustment is very rare, even at high pool rates. Tests show
that only 0.2% of tv_update_date() calls require a change of offset. Such
concurrent writes to a shared variable have an important impact on future
loads, so let's only update the variable if it changed.
The compilation is currently broken on platform without USE_CPU_AFFINITY
set. An error has been reported by the cygwin build of the CI.
This does not need to be backported.
In file included from include/haproxy/global-t.h:27,
from include/haproxy/global.h:26,
from include/haproxy/fd.h:33,
from src/ev_poll.c:22:
include/haproxy/cpuset-t.h:32:3: error: #error "No cpuset support implemented on this platform"
32 | # error "No cpuset support implemented on this platform"
| ^~~~~
include/haproxy/cpuset-t.h:37:2: error: unknown type name ‘CPUSET_REPR’
37 | CPUSET_REPR cpuset;
| ^~~~~~~~~~~
make: *** [Makefile:944: src/ev_poll.o] Error 1
make: *** Waiting for unfinished jobs....
In file included from include/haproxy/global-t.h:27,
from include/haproxy/global.h:26,
from include/haproxy/fd.h:33,
from include/haproxy/connection.h:30,
from include/haproxy/ssl_sock.h:27,
from src/ssl_sample.c:30:
include/haproxy/cpuset-t.h:32:3: error: #error "No cpuset support implemented on this platform"
32 | # error "No cpuset support implemented on this platform"
| ^~~~~
include/haproxy/cpuset-t.h:37:2: error: unknown type name ‘CPUSET_REPR’
37 | CPUSET_REPR cpuset;
| ^~~~~~~~~~~
make: *** [Makefile:944: src/ssl_sample.o] Error 1
Fix the warning treated as error on the CI for the macOS compilation :
"src/haproxy.c:2939:23: error: unused variable 'set'
[-Werror,-Wunused-variable]"
This does not need to be backported.
Render numa detection optional with a global configuration statement
'no numa-cpu-mapping'. This can be used if the applied affinity of the
algorithm is not optimal. Also complete the documentation with this new
keyword.
On process startup, the CPU topology of the machine is inspected. If a
multi-socket CPU machine is detected, automatically define the process
affinity on the first node with active cpus. This is done to prevent an
impact on the overall performance of the process in case the topology of
the machine is unknown to the user.
This step is not executed in the following condition :
- a non-null nbthread statement is present
- a restrictive 'cpu-map' statement is present
- the process affinity is already restricted, for example via a taskset
call
For the record, benchmarks were executed on a machine with 2 CPUs
Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz. In both clear and ssl
scenario, the performance were sub-optimal without the automatic
rebinding on a single node.
Allow to specify multiple cpu ids/ranges in parse_cpu_set separated by a
comma. This is optional and must be activated by a parameter.
The comma support is disabled for the parsing of the 'cpu-map' config
statement. However, it will be useful to parse files in sysfs when
inspecting the cpus topology for NUMA automatic process binding.
Create a function thread_cpu_mask_forced. Its purpose is to report if a
restrictive cpu mask is active for the current proces, for example due
to a taskset invocation. It is only implemented for the linux platform
currently.
Use the platform independent type hap_cpuset for the cpu-map statement
parsing. This allow to address CPU index greater than LONGBITS.
Update the documentation to reflect the removal of this limit except for
platforms without cpu_set_t type or equivalent.
Replace the unsigned long parameter by a hap_cpuset. This allows to
address CPU with index greater than LONGBITS.
This function is used to parse the 'cpu-map' statement. However at the
moment, the result is casted back to a long to store it in the global
structure. The next step is to replace ulong in in cpu_map in the
global structure with hap_cpuset.
This module can be used to manipulate a cpu sets in a platform agnostic
way. Use the type cpu_set_t/cpuset_t if available on the platform, or
fallback to unsigned long, which limits de facto the maximum cpu index
to LONGBITS.
The H2_CF_RCVD_SHUT flag is used to report a read0 was encountered. It is
used by the H2 mux to properly handle shutdowns. However, this flag is only
set when no data are received. If it is detected at the socket level when
some data are received, it is not handled. And because the event was
reported on the connection, any other read attempts are blocked. In this
case, we are unable to close the connection and release the mux
immediately. We must wait the mux timeout expires.
This patch should fix the issue #1231. It must be backported as far as 2.0.
As we now embed the library we don't need to support the older 1.0 API
any more, so we can remove the explicit calls to slz_make_crc_table()
and slz_prepare_dist_table().
Now that SLZ is merged, let's update the makefile and compression
files to use it. As a result, SLZ_INC and SLZ_LIB are neither defined
nor used anymore.
USE_SLZ is enabled by default ("USE_SLZ=default") and can be disabled
by passing "USE_SLZ=" or by enabling USE_ZLIB=1.
The doc was updated to reflect the changes.
SLZ is rarely packaged by distros and there have been complaints about
the CPU and memory usage of ZLIB, leading to some suggestions to better
address the issue by simply integrating SLZ into the tree (just 3 files).
See discussions below:
https://www.mail-archive.com/haproxy@formilux.org/msg38037.htmlhttps://www.mail-archive.com/haproxy@formilux.org/msg40079.htmlhttps://www.mail-archive.com/haproxy@formilux.org/msg40365.html
This patch does just this, after minor adjustments to these files:
- tables.h was renamed to slz-tables.h
- tables.h had the precomputed tables removed since not used here
- slz.c uses includes <import/slz*> instead of "slz*.h"
The slz commit imported here was b06c172 ("slz: avoid a build warning
with -Wimplicit-fallthrough"). No other change was performed either to
SLZ nor to haproxy at this point so that this operation may be replicated
if needed for a future version.
Since commit 3f12887 ("MINOR: mworker: don't use children variable
anymore"), the oldpids array is not used anymore to generate the new -sf
parameters. So we don't need to set nb_oldpids to 0 during the first
start of the master process.
This patch fixes a bug when 2 masters process tries to synchronize their
peers, there is a small chances that it won't work because nb_oldpids
equals 0.
Should be backported as far as 2.0.
This bug affects the peers synchronisation code which rely on the
nb_oldpids variable to synchronize the peer from the old PID.
In the case the process is not started in master-worker mode and tries
to synchronize using the peers, there is a small chance that won't work
because nb_oldpids equals 0.
Fix the bug by setting the variable to 0 only in the case of the
master-worker when not reloaded.
It could also be a problem when trying to synchronize the peers between
2 masters process which should be fixed in another patch.
Bug exists since commit 8a361b5 ("BUG/MEDIUM: mworker: don't reuse PIDs
passed to the master").
Sould be backported as far as 1.8.
The application of a cpu-map statement with both process and threads
is broken (P-Q/1 or 1/P-Q notation).
For example, before the fix, when using P-Q/1, proc_t1 would be updated.
Then it would be AND'ed with thread which is still 0 and thus does
nothing.
Another problem is when using 1/1[-Q], thread[0] is defined. But if
there is multiple processes, every processes will use this define
affinity even if it should be applied only to 1st process.
The solution to the fix is a little bit too complex for my taste and
there is maybe a simpler solution but I did not wish to break the
storage of global.cpu_map, as it is quite painful to test all the
use-cases. Besides, this code will probably be clean up when
multiprocess support removed on the future version.
Let's try to explain my logic.
* either haproxy runs in multiprocess or multithread mode. If on
multiprocess, we should consider proc_t1 (P-Q/1 notation). If on
multithread, we should consider thread (1/P-Q notation). However
during parsing, the final number of processes or threads is unknown,
thus we have to consider the two possibilities.
* there is a special case for the first thread / first process which is
present in both execution modes. And as a matter of fact cpu-map 1 or
1/1 notation represents the same thing. Thus, thread[0] and proc_t1[0]
represents the same thing. To solve this problem, only thread[0] is
used for this special case.
This fix must be backported up to 2.0.
This normalizer removes "/./" segments from the path component.
Usually the dot refers to the current directory which renders those segments redundant.
See GitHub Issue #714.
Currently the delimiter is hardcoded as ampersand (&) but the function takes the delimiter as a paramter.
This patch replaces the hardcoded ampersand with the given delimiter.
When header are splitted over several frames, payload of HEADERS and
CONTINUATION frames are merged to form a unique HEADERS frame before
decoding the payload. To do so, info about the current frame are updated
(dff, dfl..) with info of the next one. Here there is a bug when the frame
length (dfl) is update. We must add the next frame length (hdr.dfl) and not
only the amount of data found in the buffer (clen). Because HEADERS frames
are decoded in one pass, dfl value is the whole frame length or 0. nothing
intermediary.
This patch must be backported as far as 2.0.
In the function decoding payload of HEADERS frames, an internal error is
returned if the frame length is too large. it cannot exceed the buffer
size. The same is true when headers are splitted on several frames. The
payload of HEADERS and CONTINUATION frames are merged and the overall size
must not exceed the buffer size.
However, there is a bug when the current frame is big enough to only have
the space for a part of the header of the next frame. Because, in this case,
we wait for more data, to have the whole frame header. We don't properly
detect that the headers are too large to be stored in one buffer. In fact
the test to trigger this error is not accurate. When the buffer is full, the
error is reported if the frame length exceeds the amount of data in the
buffer. But in reality, an error must be reported when we are unable to
decode the current frame while the buffer is full. Because, in this case, we
know there is no way to change this state.
When the bug happens, the H2 connection is woken up in loop, consumming all
the CPU. But the traffic is not blocked for all that.
This patch must be backported as far as 2.0.
gcc still reports a potential null pointer dereference in delete server
function event with a BUG_ON before it. Remove the misleading NULL check
in the for loop which should never happen.
This does not need to be backported.
Implement a new CLI command 'del server'. It can be used to removed a
dynamically added server. Only servers in maintenance mode can be
removed, and without pending/active/idle connection on it.
Add a new reg-test for this feature. The scenario of the reg-test need
to first add a dynamic server. It is then deleted and a client is used
to ensure that the server is non joinable.
The management doc is updated with the new command 'del server'.
cli_parse_add_server can be executed in parallel by several CLI
instances and so must be thread-safe. The critical points of the
function are :
- server duplicate detection
- insertion of the server in the proxy list
The mode of operation has been reversed. The server is first
instantiated and parsed. The duplicate check has been moved at the end
just before the insertion in the proxy list, under the thread isolation.
Thus, the thread safety is guaranteed and server allocation is kept
outside of locks/thread isolation.
Config information has been added into the logsrv struct. The filename
is duplicated and should be freed on exit.
Introduced in the current release.
This does not need to be backported.
The current "ADD" vs "ADDQ" is confusing because when thinking in terms
of appending at the end of a list, "ADD" naturally comes to mind, but
here it does the opposite, it inserts. Several times already it's been
incorrectly used where ADDQ was expected, the latest of which was a
fortunate accident explained in 6fa922562 ("CLEANUP: stream: explain
why we queue the stream at the head of the server list").
Let's use more explicit (but slightly longer) names now:
LIST_ADD -> LIST_INSERT
LIST_ADDQ -> LIST_APPEND
LIST_ADDED -> LIST_INLIST
LIST_DEL -> LIST_DELETE
The same is true for MT_LISTs, including their "TRY" variant.
LIST_DEL_INIT keeps its short name to encourage to use it instead of the
lazier LIST_DELETE which is often less safe.
The change is large (~674 non-comment entries) but is mechanical enough
to remain safe. No permutation was performed, so any out-of-tree code
can easily map older names to new ones.
The list doc was updated.
This improves the use of local variables in sample_conv_json_query:
- Use the enum type for the return value of `mjson_find`.
- Do not use single letter variables.
- Reduce the scope of variables that are only needed in a single branch.
- Add missing newlines after variable declaration.
The test in srv_alloc_lb() to allocate the lb_nodes[] array used in the
consistent hash was incorrect, it wouldn't do it for consistent hash and
could do it for regular random.
No backport is needed as this was added for dynamic servers in 2.4-dev by
commit f99f77a50 ("MEDIUM: server: implement 'add server' cli command").
Amaury noticed that I managed to break the build of DEBUG_FAIL_ALLOC
for the second time with 207c09509 ("MINOR: pools: move the fault
injector to __pool_alloc()"). The joy of endlessly reworking patch
sets... No backport is needed, that was in the just merged cleanup
series.
This function has become too big (251 bytes) and is now hurting
performance a lot, with up to 4% request rate being lost over the last
pool changes. Let's move it to pool.c as a regular function. Other
attempts were made to cut it in half but it's still inefficient. Doing
this results in saving ~90kB of object code, and even 112kB since the
pool changes, with code that is even slightly faster!
Conversely, pool_get_from_cache(), which remains half of this size, is
still faster inlined, likely in part due to the immediate use of the
returned pointer afterwards.
Till now it used to call it only if there were not too many objects into
the local cache otherwise would send the latest one directly into the
shared cache. Now it always sends to the local cache and it's up to the
local cache to free its oldest objects. From a cache freshness perspective
it's better this way since we always evict cold objects instead of hot
ones. From an API perspective it's better because it will help make the
shared cache invisible to the public API.
Till now we could only evict oldest objects from all local caches using
pool_evict_from_local_caches() until the cache size was satisfying again,
but there was no way to evict excess objects from a single cache, which
is the reason why pool_put_to_cache() used to refrain from putting into
the local cache and would directly write to the shared cache, resulting
in massive writes when caches were full.
Let's add this new function now. It will stop once the number of objects
in the local cache is no higher than 16+total/8 or the cache size is no
more than 75% full, just like before.
For now the function is not used.
Continuing the unification of local and shared pools, now the usage of
pools is governed by CONFIG_HAP_POOLS without which allocations and
releases are performed directly from the OS using pool_alloc_nocache()
and pool_free_nocache().
There are two levels of freeing to the OS:
- code that wants to keep the pool's usage counters updated uses
pool_free_area() and handles the counters itself. That's what
pool_put_to_shared_cache() does in the no-global-pools case.
- code that does not want to update the counters because they were
already updated only calls pool_free_area().
Let's extract these calls to establish the symmetry with pool_get_from_os()
and pool_alloc_nocache(), resulting in pool_put_to_os() (which only updates
the allocated counter) and pool_free_nocache() (which also updates the used
counter). This will later allow to simplify the generic code.
A part of the code cannot be factored out because it still uses non-atomic
inc/dec for pool->used and pool->allocated as these are located under the
pool's lock. While it can make sense in terms of bus cycles, it does not
make sense in terms of code normalization. Further, some operations were
still performed under a lock that could be totally removed via the use of
atomic ops.
There is still one occurrence in pool_put_to_shared_cache() in the locked
code where pool_free_area() is called under the lock, which must absolutely
be fixed.
Now there's one part dealing with the allocation itself and keeping
counters up to date, and another one on top of it to return such an
allocated pointer to the user and update the use count and stats.
This is in anticipation for being able to group cache-related parts.
The release code is still done at once.
Till now it was limited to objects allocated from the OS which means
it had little use as soon as pools were enabled. Let's move it upper
in the layers so that any code can benefit from fault injection. In
addition this allows to pass a new flag POOL_F_NO_FAIL to disable it
if some callers prefer a no-failure approach.
ha_random() is quite heavy and uses atomic ops or even a lock on some
architectures. Here we don't seek good randoms, just statistical ones,
so let's use the statistical prng instead.
Now the multi-level cache becomes more visible:
pool_get_from_local_cache()
pool_put_to_local_cache()
pool_get_from_shared_cache()
pool_put_to_shared_cache()
The functions were rightfully called from/to_cache when the thread-local
cache was considered as the only cache, but this is getting terribly
confusing. Let's call them from/to local_cache to make it clear that
it is not related with the shared cache.
As a side note, since pool_evict_from_cache() used not to work for a
particular pool but for all of them at once, it was renamed to
pool_evict_from_local_caches() (plural form).
They were strictly equivalent, let's remerge them and rename them to
pool_alloc_nocache() as it's the call which performs a real allocation
which does not check nor update the cache. The only difference in the
past was the former taking the lock and not the second but now the lock
is not needed anymore at this stage since the pool's list is not touched.
In addition, given that the "avail" argument is no longer used by the
function nor by its callers, let's drop it.
Now we don't loop anymore trying to refill multiple items at once, and
an allocated object is directly returned to the requester instead of
being stored into the shared pool. This has multiple benefits. The
first one is that no locking is needed anymore on the allocation path
and the second one is that the loop will no longer cause latency spikes.
This is a first step towards unifying all the fallback code. Right now
these two functions are the only ones which do not update the needed_avg
rate counter since there's currently no shared pool kept when using them.
But their code is similar to what could be used everywhere except for
this one, so let's make them capable of maintaining usage statistics.
As a side effect the needed field in "show pools" will now be populated.
The mem_should_fail() call enabled by DEBUG_FAIL_ALLOC used to be placed
only in the no-cache version of the allocator. Now we can generalize it
to all modes and remove the exclusive test on CONFIG_HAP_NO_GLOBAL_POOLS.
We're going to make the local pool always present unless pools are
completely disabled. This means that pools are always enabled by
default, regardless of the use of threads. Let's drop this notion
of "local" pools and make it just "pool". The equivalent debug
option becomes DEBUG_NO_POOLS instead of DEBUG_NO_LOCAL_POOLS.
For now this changes nothing except the option and dropping the
dependency on USE_THREAD.
Initially per-thread pool caches were stored into a fixed-size array.
But this was a bit ugly because the last allocated pools were not able
to benefit from the cache at all. As a work around to preserve
performance, a size of 64 cacheable pools was set by default (there
are 51 pools at the moment, excluding any addon and debugging code),
so all in-tree pools were covered, at the expense of higher memory
usage.
In addition an index had to be calculated for each pool, and was used
to acces the pool cache head into that array. The pool index was not
even stored into the pools so it was required to determine it to access
the cache when the pool was already known.
This patch changes this by moving the pool cache head into the pool
head itself. This way it is certain that each pool will have its own
cache. This removes the need for index calculation.
The pool cache head is 32 bytes long so it was aligned to 64B to avoid
false sharing between threads. The extra cost is not huge (~2kB more
per pool than before), and we'll make better use of that space soon.
The pool cache head contains the size, which should probably be removed
since it's already in the pool's head.
When building with DEBUG_FAIL_ALLOC we call a random generator to decide
whether the pool alloc should succeed or fail, and there was a preliminary
debugging mechanism to keep sort of a history of the previous decisions. But
it was never used, enforces a lock during the allocation, and forces to use
static variables, all of which are limiting the ability to pursue the pools
cleanups with no real benefit. Let's get rid of them now.
Since recent commit ae07592 ("MEDIUM: pools: add CONFIG_HAP_NO_GLOBAL_POOLS
and CONFIG_HAP_GLOBAL_POOLS") the pre-allocation of all desired reserved
buffers was not done anymore on systems not using the shared cache. This
basically has no practical impact since these ones will quickly be refilled
by all the ones used at run time, but it may confuse someone checking if
they're allocated in "show pools".
That's only 2.4-dev, no backport is needed.
When running with CONFIG_HAP_NO_GLOBAL_POOLS, it's theoritically possible
to keep an incorrect count of allocated entries in a pool because the
allocated counter was used as a cumulated counter of alloc calls instead
of a number of currently allocated items (it's possible the meaning has
changed over time). The only impact in this mode essentially is that
"show pools" will report incorrect values. But this would only happen on
limited pools, which is not even certain still exist.
This was added by recent commit 0bae07592 ("MEDIUM: pools: add
CONFIG_HAP_NO_GLOBAL_POOLS and CONFIG_HAP_GLOBAL_POOLS") so no backport
is needed.
This patch renames all existing uri-normalizers into a more consistent naming
scheme:
1. The part of the URI that is being touched.
2. The modification being performed as an explicit verb.
This normalizer merges `../` path segments with the predecing segment, removing
both the preceding segment and the `../`.
Empty segments do not receive special treatment. The `merge-slashes` normalizer
should be executed first.
See GitHub Issue #714.
When the session is aborted before any connection attempt to any server, the
number of connection retries reported in the logs is wrong. It happens
because when the retries counter is not strictly positive, we consider the
max number of retries was reached and the backend retries value is used. It
is obviously wrong when no connectioh was performed.
In fact, at this stage, the retries counter is initialized to 0. But the
backend stream-interface is in the INI state. Once it is set to SI_ST_REQ,
the counter is set to the backend value. And it is the only possible state
transition from INI state. Thus it is safe to rely on it to fix the bug.
This patch must be backported to all stable versions.
The http_get_stline() was designed to be called from HTTP analyzers. Thus
before any data forwarding. To prevent any invalid usage, two BUG_ON()
statements were added. However, it is not a good idea because it is pretty
hard to be sure no HTTP sample fetch will never be called outside the
analyzers context. Especially because there is at least one possible area
where it may happens. An HTTP sample fetch may be used inside the unique-id
format string. On the normal case, it is generated in AN_REQ_HTTP_INNER
analyzer. But if an error is reported too early, the id is generated when
the log is emitted.
So, it is safer to remove the BUG_ON() statements and consider the normal
behavior is to return NULL if the first block is not a start-line. Of
course, this means all calling functions must test the return value or be
sure the start-line is really there.
This patch must be backported as far as 2.0.
This patch adds 4 new sample fetches to get the source and the destination
info (ip address and port) of the backend connection :
* bc_dst : Returns the destination address of the backend connection
* bc_dst_port : Returns the destination port of the backend connection
* bc_src : Returns the source address of the backend connection
* bc_src_port : Returns the source port of the backend connection
The configuration manual was updated accordingly.
When method sample fetch is called, if an exotic method is found
(HTTP_METH_OTHER), when smp_prefetch_htx() is called, we must be sure the
start-line is still there. Otherwise, HAproxy may crash because of a NULL
pointer dereference, for instance if the method sample fetch is used inside
a unique-id format string. Indeed, the unique id may be generated when the
log message is emitted. At this stage, the request channel is empty.
This patch must be backported as far as 2.0. But the bug exists in all
stable versions for the legacy HTTP mode too. Thus it must be adapted to the
legacy HTTP mode and backported to all other stable versions.
For all ssl_bc_* sample fetches, the test on the keyword when called from a
health-check is inverted. We must be sure the 5th charater is a 'b' to
retrieve a connection.
This patch must be backported as far as 2.2.
bc_http_major sample fetch now works when it is called from a
tcp-check. When it happens, the session origin is a check. The backend
connection is retrieved from the conn-stream attached to the check.
If required, this path may easily be backported as far as 2.2.
fc_http_major and bc_http_major sample fetches return the major digit of the
HTTP version used, respectively, by the frontend and the backend
connections, based on the mux. However, in reality, "2" is returned if the
H2 mux is detected, otherwise "1" is inconditionally returned, regardless
the mux used. Thus, if called for a raw TCP connection, "1" is returned.
To fix this bug, we now get the multiplexer flags, if there is one, to be
sure MX_FL_HTX is set.
I guess it was made this way on purpose when the H2 multiplexer was
introduced in the 1.8 and with the legacy HTTP mode there is no other
solution at the connection level. Thus this patch should be backported as
far as 2.2. For the 2.0, it must be evaluated first because of the legacy
HTTP mode.
When a log-format string is built from an health-check, the session origin
is the health-check itself and not a connection. In addition, there is no
stream. It means for now some formats are not supported: %s, %sc, %b, %bi,
%bp, %si and %sp.
Thanks to this patch, the session origin is converted to a check. So it is
possible to retrieve the backend and the backend connection. Note this
session have no listener, thus %ft format must be guarded.
This patch is light and standalone, thus it may be backported as far as 2.2
if required. However, because the error is human, it is probably better to
wait a bit to be sure everything is properly protected.
The dummy frontend used to create the session of the tcp-checks is
initialized without identifier. However, it is required because this id may
be used without any guard, for instance in log-format string via "%f" or
when fe_name sample fetch is called. Thus, an unset id may lead to crashes.
This patch must be backported as far as 2.2.
When a thread ends its harmeless period, we must only consider running
threads when testing threads_want_rdv_mask mask. To do so, we reintroduce
all_threads_mask mask in the bitwise operation (It was removed to fix a
deadlock).
Note that for now it is useless because there is no way to stop threads or
to have threads reserved for another task. But it is safer this way to avoid
bugs in the future.
With the json_query can a JSON value be extacted from a header
or body of the request and saved to a variable.
This converter makes it possible to handle some JSON workload
to route requests to different backends.
This library is required for the subsequent patch which adds
the JSON query possibility.
It is necessary to change the include statement in "src/mjson.c"
because the imported includes in haproxy are in "include/import"
orig: #include "mjson.h"
new: #include <import/mjson.h>
ub64dec and ub64enc are the base64url equivalent of b64dec and base64
converters. base64url encoding is the "URL and Filename Safe Alphabet"
variant of base64 encoding. It is also used in in JWT (JSON Web Token)
standard.
RFC1421 mention in base64.c file is deprecated so it was replaced with
RFC4648 to which existing converters, base64/b64dec, still apply.
Example:
HAProxy:
http-request return content-type text/plain lf-string %[req.hdr(Authorization),word(2,.),ub64dec]
Client:
Token=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1c2VyIjoiZm9vIiwia2V5IjoiY2hhZTZBaFhhaTZlIn0.5VsVj7mdxVvo1wP5c0dVHnr-S_khnIdFkThqvwukmdg
$ curl -H "Authorization: Bearer ${TOKEN}" http://haproxy.local
{"user":"foo","key":"chae6AhXai6e"}
Adjust the size of the sample buffer before we change the "area"
pointer. The change in size is calculated as the difference between the
original pointer and the new start pointer. But since the
`smp->data.u.str.area` assignment results in `smp->data.u.str.area` and
`start` being the same pointer, we always ended up substracting zero.
This changes it to change the size by the actual amount it changed.
I'm not entirely sure what the impact of this is, but the previous code
seemed wrong.
[wt: from what I can see the only harmful case is when the output is
converted to a stick-table key, it could result in zeroing past the
end of the buffer; other cases do not touch beyond ->data]
All allocation errors in cfg_parse_listen() are now handled in a unique
place under the "alloc_error" label. This simplify a bit error handling in
this function.
At several places during the proxy section parsing, memory allocation was
performed with no check. Result is now tested and an error is returned if
the allocation fails.
This patch may be backported to all stable version but it only fixes
allocation errors during configuration parsing. Thus, it is not mandatory.
Allocation error are now handled in bind_conf_alloc() functions. Thus
callers, when not already done, are also updated to catch NULL return value.
This patch may be backported (at least partially) to all stable
versions. However, it only fix errors durung configuration parsing. Thus it
is not mandatory.
Allocated variables are now released when an error occurred during
use_backend, use-server, force/ignore-parsing, stick-table, stick and stats
directives parsing. For some of these directives, allocation errors have
been added.
This patch may be backported to all stable version but it only fixes leaks
or allocation errors during configuration parsing. Thus, it is not
mandatory. It should fix issue #1119.
When an error occurred in hlua_register_cli(), the allocated lua function
and keyword must be released to avoid memory leaks.
This patch depends on "MINOR: hlua: Add function to release a lua
function". It may be backported in all stable versions.
When an error occurred in hlua_register_service(), the allocated lua
function and keyword must be released to avoid memory leaks.
This patch depends on "MINOR: hlua: Add function to release a lua
function". It may be backported in all stable versions.
When an error occurred in hlua_register_action(), the allocated lua function
and keyword must be released to avoid memory leaks.
This patch depends on "MINOR: hlua: Add function to release a lua
function". It may be backported in all stable versions.
hen an error occurred in action_register_lua(), the allocated hlua rule and
arguments must be released to avoid memory leaks.
This patch may be backported in all stable versions.
When an error occurred in hlua_register_fetches(), the allocated lua
function and keyword must be released to avoid memory leaks.
This patch depends on "MINOR: hlua: Add function to release a lua
function". It may be backported in all stable versions. It should fix#1112.
When an error occurred in hlua_register_converters(), the allocated lua
function and keyword must be released to avoid memory leaks.
This patch depends on "MINOR: hlua: Add function to release a lua
function". It may be backported in all stable versions.
When an error occurred in hlua_register_task(), the allocated lua context
and task must be released to avoid memory leaks.
This patch may be backported in all stable versions.
Add the trace support for the checks. Only tcp-check based health-checks are
supported, including the agent-check.
In traces, the first argument is always a check object. So it is easy to get
all info related to the check. The tcp-check ruleset, the conn-stream and
the connection, the server state...
Since 1.8 for simplicity the time offset used to compensate for time
drift and jumps had been stored per thread. But with a global time,
the complexit has significantly increased.
What this patch does in order to address this is to get back to the
origins of the pre-thread time drift correction, and keep a single
offset between the system's date and the current global date.
The thread first verifies from the before_poll date if the time jumped
backwards or forward, then either fixes it by computing the new most
likely date, or applies the current offset to this latest system date.
In the first case, if the date is out of range, the old one is reused
with the max_wait offset or not depending on the interrupted flag.
Then it compares its date to the global date and updates both so that
both remain monotonic and that the local date always reflects the
latest known global date.
In order to support atomic updates to the offset, it's saved as a
ullong which contains both the tv_sec and tv_usec parts in its high
and low words. Note that a part of the patch comes from the inlining
of the equivalent of tv_add applied to the offset to make sure that
signed ints are permitted (otherwise it depends on how timeval is
defined).
This is significantly more reliable than the previous model as the
global time should move in a much smoother way, and not according
to what thread last updated it, and the thread-local time should
always be very close to the global one.
Note that (at least for debugging) a cheap way to measure processing
lag would consist in measuring the difference between global_now_ms
and now_ms, as long as other threads keep it up-to-date.
Instead of using two CAS loops, better compute the two units
simultaneously and update them at once. There is no guarantee that
the update will be synchronous, but we don't care, what matters is
that both are monotonically updated and that global_now_ms always
follows the last known value of global_now.
In the global_now loop, we used to set tmp_adj from adjusted, then
set update it from tmp_now, then set adjusted back to tmp_adj, and
finally set now from adjusted. This is a long and unneeded set of
moves resulting from years of code changes. Let's just set now
directly in the loop, stop using adjusted and remove tmp_adj.
The time initialization was made a bit complex because we rely on a
dummy negative argument to reset all fields, leaving no distinction
between process-level initialization and thread-level initialization.
This patch changes this by introducing two functions, one for the
process and the second one for the threads. This removes ambigous
test and makes sure that the relevant fields are always initialized
exactly once. This also offers a better solution to the bug fixed in
commit b48e7c001 ("BUG/MEDIUM: time: make sure to always initialize
the global tick") as there is no more special values for global_now_ms.
It's simple enough to be backported if any other time-related issues
are encountered in stable versions in the future.
It was only used by freq_ctr and is not used anymore. In addition the
local curr_sec_ms was removed, as well as the equivalent extern
definitions which did not exist anymore either.
It remains cumbersome to preserve two versions of the freq counters and
two different internal clocks just for this. In addition, the savings
from using two different mechanisms are not that important as the only
saving is a divide that is replaced by a multiply, but now thanks to
the freq_ctr_total() unificaiton the code could also be simplified to
optimize it in case of constants.
This patch turns all non-period freq_ctr functions to static inlines
which call the period-based ones with a period of 1 second. A direct
benefit is that a single internal clock is now needed for any counter
and that they now all rely on ticks.
These 1-second counters are essentially used to report request rates
and to enforce a connection rate limitation in listeners. It was
verified that these continue to work like before.
Both structures are identical except the name of the field starting
the period and its description. Let's call them all freq_ctr and the
period's start "curr_tick" which is generic.
This is only a temporary change and fields are expected to remain
the same with no code change (verified).
This one is the easiest to implement, it just requires a call and a
divide of the result. Anti-flapping correction for low-rates was
preserved.
Now calls using a constant period will be able to use a reciprocal
multiply for the period instead of a divide.
Most of the functions designed to read a counter over a period go through
the same complex loop and only differ in the way they use the returned
values, so it was worth implementing all this into freq_ctr_total() which
returns the total number of events over a period so that the caller can
finish its operation using a divide or a remaining time calculation. As
a special case, read_freq_ctr_period() doesn't take pending events but
requires to enable an anti-flapping correction at very low frequencies.
Thus the function implements it when pend<0.
Thanks to this function it will be possible to reimplement the other ones
as inline and merge the per-second ones with the arbitrary period ones
without always adding the cost of a 64 bit divide.
This variable almost never changes and is read a lot in time-critical
sections. threads_want_rdv_mask is read very often as well in
thread_harmless_end() and is almost never changed (only when someone
uses thread_isolate()). Let's move both to read_mostly.
This one only contains the list of per-thread kqueue FDs, and is used
a lot during updates. Let's mark it read_mostly to avoid false sharing
of FDs placed at the extremities.
This one only contains the list of per-thread epoll FDs, and is used
a lot during updates. Let's mark it read_mostly to avoid false sharing
of FDs placed at the extremities.
Some pointer to arrays such as fdtab, fdinfo, polled_mask etc are never
written to at run time but are used a lot. fdtab accesses appear a lot in
perf top because ha_used_fds is in the same cache line and is modified
all the time. This patch moves all these read-mostly variables to the
read_mostly section when defined. This way their cache lines will be
able to remain in shared state in all CPU caches.
Some variables are mostly read (mostly pointers) but they tend to be
merged with other ones in the same cache line, slowing their access down
in multi-thread setups. This patch declares an empty, aligned variable
in a section called "read_mostly". This will force a cache-line alignment
on this section so that any variable declared in it will be certain to
avoid false sharing with other ones. The section will be eliminated at
link time if not used.
A __read_mostly attribute was added to compiler.h to ease use of this
section.
Interestingly, all arrays used to declare patterns were read-write while
only hard-coded. Let's mark them const so that they move from data to
rodata and don't risk to experience false sharing.
In mux_pt_io_cb(), if a connection error or a shutdown is detected, the mux
is destroyed. Thus we must be careful to not use it in a trace message once
destroyed.
No backport needed. This patch should fix the issue #1220.
As for the other muxes, traces are now supported in the pt mux. All parts of
the multiplexer is covered by these traces. Events are splitted by
categories (connection, stream, rx and tx).
In traces, the first argument is always a connection. So it is easy to get
the mux context (conn->ctx). The second argument is always a conn-stream and
mau be NUUL. The third one is a buffer and it may also be NULL. Depending on
the context it is the request or the response. In all cases it is owned by a
channel. Finally, the fourth argument is an integer value. Its meaning
depends on the calling context.
This patch re-works configuration parsing, it removes the "server"
lines from "resolvers" sections introduced in commit 56fc5d9eb:
MEDIUM: resolvers: add supports of TCP nameservers in resolvers.
It also extends the nameserver lines to support stream server
addresses such as:
resolvers
nameserver localhost tcp@127.0.0.1:53
Doing so, a part of nameserver's init code was factorized in
function 'parse_resolvers' and removed from 'post_parse_resolvers'.
This patch replaces roughly all occurrences of an HA_ATOMIC_ADD(&foo, 1)
or HA_ATOMIC_SUB(&foo, 1) with the equivalent HA_ATOMIC_INC(&foo) and
HA_ATOMIC_DEC(&foo) respectively. These are 507 changes over 45 files.
The fetch_and_xxx variant is often missing for add/sub/and/or. In fact
it was only provided for ADD under the name XADD which corresponds to
the x86 instruction name. But for destructive operations like AND and
OR it's missing even more as it's not possible to know the value before
modifying it.
This patch explicitly adds HA_ATOMIC_FETCH_{OR,AND,ADD,SUB} which
cover these standard operations, and renames XADD to FETCH_ADD (there
were only 6 call places).
In the future, backport of fixes involving such operations could simply
remap FETCH_ADD(x) to XADD(x), FETCH_SUB(x) to XADD(-x), and for the
OR/AND if needed, these could possibly be done using BTS/BTR.
It's worth noting that xchg could have been renamed to fetch_and_store()
but xchg already has well understood semantics and it wasn't needed to
go further.
Currently our atomic ops return a value but it's never known whether
the fetch is done before or after the operation, which causes some
confusion each time the value is desired. Let's create an explicit
variant of these operations suffixed with _FETCH to explicitly mention
that the fetch occurs after the operation, and make use of it at the
few call places.
Slightly reorder the status flags to better match their order in the
"state" field, and also decode the "shut" state which is particularly
useful and already part of this field.
There is a function called fd_write_frag_line() that's essentially used
by loggers and that is used to write an atomic message line over a file
descriptor using writev(). However a lock is required around the writev()
call to prevent messages from multiple threads from being interleaved.
Till now a SPIN_TRYLOCK was used on a dedicated lock that was common to
all FDs. This is quite not pretty as if there are multiple output pipes
to collect logs, there will be quite some contention. Now that there
are empty flags left in the FD state and that we can finally use atomic
ops on them, let's add a flag to indicate the FD is locked for exclusive
access by a syscall. At least the locking will now be on an FD basis and
not the whole process, so we can remove the log_lock.
No need to keep this flag apart any more, let's merge it into the global
state. The bit was not cleared in fd_insert() because the only user is
the function used to create and atomically send a log message to a pipe
FD, which never registers the fd. Here we clear it nevertheless for the
sake of clarity.
Note that with an extra cleaning pass we could have a bit number
here and simply use a BTS to test and set it.
No need to keep this flag apart any more, let's merge it into the global
state. The CLI's output state was extended to 6 digits and the linger/cloned
flags moved inside the parenthesis.
For a long time we've had fdtab[].ev and fdtab[].state which contain two
arbitrary sets of information, one is mostly the configuration plus some
shutdown reports and the other one is the latest polling status report
which also contains some sticky error and shutdown reports.
These ones used to be stored into distinct chars, complicating certain
operations and not even allowing to clearly see concurrent accesses (e.g.
fd_delete_orphan() would set the state to zero while fd_insert() would
only set the event to zero).
This patch creates a single uint with the two sets in it, still delimited
at the byte level for better readability. The original FD_EV_* values
remained at the lowest bit levels as they are also known by their bit
value. The next step will consist in merging the remaining bits into it.
The whole bits are now cleared both in fd_insert() and _fd_delete_orphan()
because after a complete check, it is certain that in both cases these
functions are the only ones touching these areas. Indeed, for
_fd_delete_orphan(), the thread_mask has already been zeroed before a
poller can call fd_update_event() which would touch the state, so it
is certain that _fd_delete_orphan() is alone. Regarding fd_insert(),
only one thread will get an FD at any moment, and it as this FD has
already been released by _fd_delete_orphan() by definition it is certain
that previous users have definitely stopped touching it.
Strictly speaking there's no need for clearing the state again in
fd_insert() but it's cheap and will remove some doubts during some
troubleshooting sessions.
In preparation of merging FD_POLL* and FD_EV*, this only changes the
value of FD_POLL_* to use bits 8-15 (the second byte). The size of the
field has been temporarily extended to 32 bits already, as well as
the temporary variables that carry the new composite value inside
fd_update_events(). The resulting fdtab entry becomes temporarily
unaligned. All places making access to .ev or FD_POLL_* were carefully
inspected to make sure they were safe regarding this change. Only one
temporary update was needed for the "show fd" code. The code was only
slightly inflated at this step.
The regression was introduced by commit previous commit 94aab06:
MEDIUM: log: support tcp or stream addresses on log lines.
This previous patch tries to retrieve the used protocol parsing
the address using the str2sa_range function but forgets that
the raw file descriptor adresses don't specify a protocol
and str2sa_range probes an error.
This patch re-work the str2sa_range function to stop
probing error if an authorized RAW_FD address is parsed
whereas the caller request also a protocol.
It also modify the code of parse_logsrv to switch on stream
logservers only if a protocol was detected.
An explicit stream address prefix such as "tcp6@" "tcp4@"
"stream+ipv6@" "stream+ipv4@" or "stream+unix@" will
allocate an implicit ring buffer with a forward server
targeting the given address.
This is usefull to simply send logs to a log server in tcp
and It doesn't need to declare a ring section in configuration.
This patch registers the parsed file and the line where a log server
is declared to make those information available in configuration
post check.
Those new informations were added on error messages probed resolving
ring names on post configuration check.
Since the internal function str2sa_range is used to addresses
for different objects ('server', 'bind' but also 'log' or
'nameserver') we notice that some combinations are missing.
"ip@" is introduced to authorize the prefix "dgram+ip@" or
"stream+ip@" which dectects automatically IP version but
specify dgram or stream.
"tcp@" was introduced and is an alias for "stream+ip@".
"tcp6" and "tcp4" are now aliases for "stream+ipv6@" and
"stream+ipv4@".
"uxst@" and "uxdg@" are now aliases for "stream+unix@" and
"dgram+unix@".
This patch also adds a complete section in documentation to
describe adresses and their prefixes.
Commit c20ad0d8db (BUG/MINOR: tools: make
parse_time_err() more strict on the timer validity) broke parsing the "us"
unit in timers. It caused `parse_time_err()` to return the string "s",
which indicates an error.
Now if the "u" is followed by an "s" we properly continue processing the
time instead of immediately failing.
This fixes#1209. It must be backported to all stable versions.
When a script retrieves request data from an HTTP applet, line per line or
not, we must be sure to properly detect the end of the request by checking
HTX_FL_EOM flag when everything was consumed. Otherwise, the script may
hang.
It is pretty easy to reproduce the bug by calling applet:receive() without
specifying any length. If the request is not chunked, the function never
returns.
The bug was introduced when the EOM block was removed. Thus, it is specific
to the 2.4. This patch should fix the issue #1207. No backport needed.
HTTP_2.0 predefined macro returns true for HTTP/2 requests. HTTP/2 doen't
convey a version information, so this macro may seem a bit strange. But for
compatiblity reasons, internally, the "HTTP/2.0" version is set. Thus, it is
handy to rely on it to differenciate HTTP/1 and HTTP/2 requests.
Some predefined ACLs were still based on deprecated sample fetches, like
req_proto_http or req_ver. Now, they use non-deprecated sample fetches. In
addition, the usage lines in the configuration manual have been updated to
be more explicit.
Both the source file and the dummy library are now at the same place.
Maybe the build howto could be moved there as well to make things even
cleaner.
The Makefile, MAINTAINERS, doc, and vtest matrix were updated.
Both the source file and the dummy library are now at the same place.
Maybe the build howto could be moved there as well to make things even
cleaner.
The Makefile, MAINTAINERS, doc, github build matrix, coverity checks
and travis CI's build were updated.
Now it's much cleaner, both 51d.c and the dummy library live together and
are easier to spot and maintain. The build howto probably ought to be moved
there as well. Makefile, docs and MAINTAINERS were updated, as well as
the github CI's build matrix, travis CI's, and coverity checks.
The following directories were moved from contrib/ to dev/ to make their
use case a bit clearer. In short, only developers are expected to ever
go there. The makefile was updated to build and clean from these ones.
base64/ flags/ hpack/ plug_qdisc/ poll/ tcploop/ trace/
Add a diagnostic to check that two servers of the same backend does not
use the same cookie value. Ignore backup servers as it is quite common
for them to share a cookie value with a primary one.
Define MODE_DIAG which is used to run haproxy in diagnostic mode. This
mode is used to output extra warnings about possible configuration
blunder or sub-optimal usage. It can be activated with argument '-dD'.
A new output function ha_diag_warning is implemented reserved for
diagnostic output. It serves to standardize the format of diagnostic
messages.
A macro HA_DIAG_WARN_COND is also available to automatically check if
diagnostic mode is on before executing the diagnostic check.
In issue #1200 Coverity believes we may use an uninitialized field
smp.sess here while it's not possible because the returned variable
necessarily matches SCOPE_PROC hence smp.sess is not used. But it
cannot see this and it could be confusing if the code later evolved
into something more complex. That's not a critical path so let's
first reset the sample.
A bug was introduced when the legacy HTTP mode was removed. To capture the
HTTP version of the request or the response, we rely on the message state to
be sure the status line was received. However, the test is inverted. The
version can be captured if message headers were received, not the opposite.
This patch must be backported as far as 2.2.
Historically, an option was added to wait for the request payload (option
http-buffer-request). This option has 2 drawbacks. First, it is an ON/OFF
option for the whole proxy. It cannot be enabled on demand depending on the
message. Then, as its name suggests, it only works on the request side. The
only option to wait for the response payload was to write a dedicated
filter. While it is an acceptable solution for complex applications, it is a
bit overkill to simply match strings in the body.
To make everyone happy, this patch adds a dedicated HTTP action to wait for
the message payload, for the request or the response depending it is used in
an http-request or an http-response ruleset. The time to wait is
configurable and, optionally, the minimum payload size to have before stop
to wait.
Both the http action and the old http analyzer rely on the same internal
function.
L6 sample fetches are now ignored when called from an HTTP proxy. Thus, a
warning is emitted during the startup if such usage is detected. It is true
for most ACLs and for log-format strings. Unfortunately, it is a bit painful
to do so for sample expressions.
This patch relies on the commit "MINOR: action: Use a generic function to
check validity of an action rule list".
The check_action_rules() function is now used to check the validity of an
action rule list. It is used from check_config_validity() function to check
L5/6/7 rulesets.
It is not really a context-less sample fetch, but it is internal. And it
only fails if no stream is attached to the sample. This way, it is still
possible to use it on an HTTP proxy (L6 sample fetches are ignored now for
HTTP proxies).
If the commit "BUG/MINOR: payload/htx: Ingore L6 sample fetches for HTX
streams/checks" is backported, it may be a good idea to backport this one
too. But only as far as 2.2.
Use a L6 sample fetch on an HTX streams or a HTX health-check is meaningless
because data are not raw but structured. So now, these sample fetches fail
when called from an HTTP proxy. In addition, a warning has been added in the
configuration manual, at the begining of the L6 sample fetches section.
Note that req.len and res.len samples return the HTX data size instead of
failing. It is not accurate because it does not reflect the buffer size nor
the raw data length. But we keep it for backward compatibility purpose.
However it remains a bit strange to use it on an HTTP proxy.
This patch may be backported to all versions supporting the HTX, i.e as far
as 2.0. But the part about the health-checks is only valid for the 2.2 and
upper.
If a 'switch-mode http' tcp action is configured on a listener with no
backend, a warning is displayed to remember HTTP connections cannot be
routed to TCP servers. Indeed, backend connection is still established using
the proxy mode.
It is now possible to perform HTTP upgrades on a TCP stream from the
frontend side. To do so, a tcp-request content rule must be defined with the
switch-mode action, specifying the mode (for now, only http is supported)
and optionnaly the proto (h1 or h2).
This way it could be possible to set HTTP directives on a TCP frontend which
will only be evaluated if an upgrade is performed. This new way to perform
HTTP upgrades should replace progressively the old way, consisting to route
the request to an HTTP backend. And it should be also a good start to remove
all HTTP processing from tcp-request content rules.
This action is terminal, it stops the ruleset evaluation. It is only
available on proxy with the frontend capability.
The configuration manual has been updated accordingly.
The code responsible to perform an HTTP upgrade from a TCP stream is moved
in a dedicated function, stream_set_http_mode().
The stream_set_backend() function is slightly updated, especially to
correctly set the request analysers.
Now allocation and initialization of HTTP transactions are performed in a
unique function. Historically, there were two functions because the same TXN
was reset for K/A connections in the legacy HTTP mode. Now, in HTX, K/A
connections are handled at the mux level. A new stream, and thus a new TXN,
is created for each request. In addition, the function responsible to end
the TXN is now also reponsible to release it.
So, now, http_create_txn() and http_destroy_txn() must be used to create and
destroy an HTTP transaction.
It is just a small cleanup. AN_REQ_FLT_HTTP_HDRS and AN_RES_FLT_HTTP_HDRS
analysers are now set in HTTP analysers at the same place
AN_REQ_HTTP_XFER_BODY and AN_RES_HTTP_XFER_BODY are set.
We now use the stream instead of the proxy to know if we are processing HTTP
data or not. If the stream is an HTX stream, it means we are dealing with
HTTP data. It is more accurate than the proxy mode because when an HTTP
upgrade is performed, the proxy is not changed and only the stream may be
used.
Note that it was not a problem to rely on the proxy because HTTP upgrades
may only happen when an HTTP backend was set. But, we will add the support
of HTTP upgrades on the frontend side, after te tcp-request rules
evaluation. In this context, we cannot rely on the proxy mode.
Add "none" in the list of supported mux protocols. It relies on the
passthrough multiplexer and use almost the same mux_ops structure. Only the
flags differ because this "new" mux does not support the upgrades. "none"
was chosen to explicitly stated there is not processing at the mux level.
Thus it is now possible to set "proto none" or "check-proto none" on
bind/server lines, depending on the context. However, when set, no upgrade
to HTTP is performed. It may be a way to disable HTTP upgrades per bind
line.
Add "h1" in the list of supported mux protocols. It relies on the H1
multiplexer and use the almost the same mux_ops structure. Only the flags
differ because this "new" mux does not support the upgrades.
Thus it is now possible to set "proto h1" or "check-proto h1" on bind/server
lines, depending on the context. However, when set, no upgrade to HTTP/2 is
performed. It may be a way to disable implicit HTTP/2 upgrades per bind
line.
MX_FL_NO_UPG flag may now be set on a multiplexer to explicitly disable
upgrades from this mux. For now, it is set on the FCGI multiplexer because
it is not supported and there is no upgrade on backend-only multiplexers. It
is also set on the H2 multiplexer because it is clearly not supported.
No warning is emitted if some http-after-response rules are configured on a
TCP proxy while such warning messages are emitted for other HTTP ruleset in
same condition. It is just an oversight.
This patch may be backported as far as 2.2.
When a TCP stream is first upgraded to H1 and then to H2, we must be sure to
inhibit any connect and to properly handle the TCP stream destruction.
When the TCP stream is upgraded to H1, the HTTP analysers are set. Thus
http_wait_for_request() is called. In this case, the server connection must
be blocked, waiting for the request analysis. Otherwise, a server may be
assigned to the stream too early. It is especially a problem if the stream
is finally destroyed because of an implicit upgrade to H2.
In this case, the stream processing must be properly aborted to not have a
stalled stream. Thus, if a shutdown is detected in http_wait_for_request()
when an HTTP upgrade is performed, the stream is aborted.
It is a 2.4-specific bug. No backport is needed.
Always set frontend HTTP analysers when an HTX stream is created. It is only
useful in case a destructive HTTP upgrades (TCP>H2) because the frontend is
a TCP proxy.
In fact, to be strict, we must only set these analysers when the upgrade is
performed before setting the backend (it is not supported yet, but this
patch is required to do so), in the frontend part. If the upgrade happens
when the backend is set, it means the HTTP processing is just the backend
buisness. But there is no way to make the difference when a stream is
created, at least for now.
When an HTX stream is created, be sure to always create the HTTP txn object,
regardless of the ".http_needed" value of the frontend. That happens when a
destructive HTTP upgrades is performed (TCP>H2). The frontend is a TCP
proxy. If there is no dependency on the HTTP part, the HTTP transaction is
not created at this stage but only when the backend is set. For now, it is
not a problem. But an HTTP txn will be mandatory to fully support TCP to
HTTP upgrades after frontend tcp-request rules evaluation.
When a TCP stream is upgraded to H2 stream, a destructive upgrade is
performed. It means the TCP stream is silently released while a new one is
created. It is of course more complicated but it is what we observe from the
stream point of view.
That was performed by returning an error when the backend was set. It is
neither really elegant nor accurate. So now, instead of returning an error
from stream_set_backend() in case of destructive HTTP upgrades, the TCP
stream processing is aborted and no error is reported. However, the result
is more or less the same.
sess_log() was called twice if an error occurred on the preface parsing, in
h2c_frt_recv_preface() and in h2_process_demux().
This patch must be backported as far as 2.0.
The fix in commit 7b0e00d94 ("BUG/MINOR: http_fetch: make hdr_ip() reject
trailing characters") made hdr_ip() more sensitive to empty fields, for
example if a trusted proxy incorrectly sends the header with an empty
value, we could return 0.0.0.0 which is not correct. Let's make sure we
only assign an IPv4 type here when a non-empty address was found.
This should be backported to all branches where the fix above was
backported.
Historically we've used SOL_IP/SOL_IPV6/SOL_TCP everywhere as the socket
level value in getsockopt() and setsockopt() but as we've seen over time
it regularly broke the build and required to have them defined to their
IPPROTO_* equivalent. The Linux ip(7) man page says:
Using the SOL_IP socket options level isn't portable; BSD-based
stacks use the IPPROTO_IP level.
And it indeed looks like a pure linuxism inherited from old examples and
documentation. strace also reports SOL_* instead of IPPROTO_*, which does
not help... A check to linux/in.h shows they have the same values. Only
SOL_SOCKET and other non-IP values make sense since there is no IPPROTO
equivalent.
Let's get rid of this annoying confusion by removing all redefinitions of
SOL_IP/IPV6/TCP and using IPPROTO_* instead, just like any other operating
system. This also removes duplicated tests for the same value.
Note that this should not result in exposing syscalls to other OSes
as the only ones that were still conditionned to SOL_IPV6 were for
IPV6_UNICAST_HOPS which already had an IPPROTO_IPV6 equivalent, and
IPV6_TRANSPARENT which is Linux-specific.
Lukas reported in issue #1203 that the previous fix for silent-drop in
commit ab79ee8b1 ("BUG/MINOR: tcp: fix silent-drop workaround for IPv6")
breaks the build on FreeBSD/MacOS due to SOL_IPV6 not being defined. On
these platforms, IPPROTO_IPV6 must be used instead, so this should fix
it.
This needs to be backported to whatever version the fix above is backported
to.
As reported in github issue #1203 the TTL-based workaround that is used
when permissions are insufficient for the TCP_REPAIR trick does not work
for IPv6 because we're using only SOL_IP with IP_TTL. In IPv6 we have to
use SOL_IPV6 and IPV6_UNICAST_HOPS. Let's pick the right one based on the
source address's family.
This may be backported to all versions.
The issue with non-rotating freq counters was addressed in commit 8cc586c73
("BUG/MEDIUM: freq_ctr/threads: use the global_now_ms variable") using the
global date. But an issue remained with the comparison of the most recent
time. Since the initial time in the structure is zero, the tick_is_lt()
works on half of the periods depending on the first date an entry is
touched. And the wrapping happened last night:
$ date --date=@$(((($(date +%s) * 1000) & -0x8000000) / 1000))
Mon Mar 29 23:59:46 CEST 2021
So users of the last fix (backported to 2.3.8) may experience again an
always increasing rate for the next 24 days if they restart their process.
Let's always update the time if the latest date was not updated yet. It
will likely be simplified once the function is reorganized but this will
do the job for now.
Note that since this timer is only used by freq counters, no other
sub-system is affected. The bug can easily be tested with this config
during the right time period (i.e. today to today+24 days + N*49.7 days):
global
stats socket /tmp/sock1
frontend web
bind :8080
mode http
http-request track-sc0 src
stick-table type ip size 1m expire 1h store http_req_rate(2s)
Issuing 'socat - /tmp/sock1 <<< "show table web"' should show a stable
rate after 2 seconds.
The fix must be backported to 2.3 and any other version the fix above
goes into.
Thanks to Thomas SIMON and Sander Klein for quickly reporting this issue
with a working reproducer.
When a backend is in status DOWN and going UP it is currently displayed
as yellow ("active UP, going down") instead of orange ("active DOWN, going
UP"). This patches restyles the table rows to actually match the
legend.
This may be backported to any version, the issue appeared in 1.7-dev2
with commit 0c378efe8 ("MEDIUM: stats: compute the color code only in
the HTML form").
In payload() and payload_lv() sample fetches, if the buffer is empty, we
must wait for more data by setting SMP_F_MAY_CHANGE flag on the sample.
Otherwise, when it happens in an ACL, nothing is returned (because the
buffer is empty) and the ACL is considered as finished (success or failure
depending on the test).
As a workaround, the buffer length may be tested first. For instance :
tcp-request inspect-delay 1s
tcp-request content reject unless { req.len gt 0 } { req.payload(0,0),fix_is_valid }
instead of :
tcp-request inspect-delay 1s
tcp-request content reject if ! { req.payload(0,0),fix_is_valid }
This patch must be backported as far as 2.2.
Commit b1adf03df ("MEDIUM: backend: use a trylock when trying to grab an
idle connection") solved a contention issue on the backend under normal
condition, but there is another one further, which only happens when the
number of FDs in use is considered too high, and which obviously causes
random crashes with just 16 threads once the number of FDs is about to be
exhausted.
Like the aforementioned patch, this one should be backported to 2.3.
set var <name> <expression>
Allows to set or overwrite the process-wide variable 'name' with the result
of expression <expression>. Only process-wide variables may be used, so the
name must begin with 'proc.' otherwise no variable will be set. The
<expression> may only involve "internal" sample fetch keywords and converters
even though the most likely useful ones will be str('something') or int().
Note that the command line parser doesn't know about quotes, so any space in
the expression must be preceeded by a backslash. This command requires levels
"operator" or "admin". This command is only supported on a CLI connection
running in experimental mode (see "experimental-mode on").
Just like for "set-var" in the global section, the command uses a temporary
dummy proxy to create a temporary "set-var(name)" rule to assign the value.
The reg test was updated to verify that an updated global variable is properly
reflected in subsequent HTTP responses.
Process-wide variables can now be displayed from the CLI using "get var"
followed by the variable name. They must all start with "proc." otherwise
they will not be found. The output is very similar to the one of the
debug converter, with a type and value being reported for the embedded
sample.
This command is limited to clients with the level "operator" or higher,
since it can possibly expose traffic-related data.
In order to process samples from the command line interface we'll need
rules as well, and these rules will have to be marked as coming from
the CLI parser. This new origin is used for this.
In order to prepare for supporting calling sample expressions from the
CLI, let's create a new CLI_PARSER parsing context. This one supports
constants and internal samples only.
While we do support process-wide variables ("proc.<name>"), there was
no way to preset them from the configuration. This was particularly
limiting their usefulness since configs involving them always had to
first check if the variable was set prior to performing an operation.
This patch adds a new "set-var" directive in the global section that
supports setting the proc.<name> variables from an expression, like
other set-var actions do. The syntax however follows what is already
being done for setenv, which consists in having one argument for the
variable name and another one for the expression.
Only "constant" expressions are allowed here, such as "int", "str"
etc, combined with arithmetic or string converters, and variable
lookups. A few extra sample fetch keywords like "date", "rand" and
"uuid" are also part of the constant expressions and may make sense
to allow to create a random key or differentiate processes.
The way it was done consists in parsing a dummy rule an executing the
expression in the CFG_PARSE context, then releasing the expression.
This is safe because the sample that variables store does not hold a
back pointer to expression that created them.
In order to process samples from the config file we'll need rules as
well, and these rules will have to be marked as coming from the
config parser. This new origin is used for this.
We'd sometimes like to be able to process samples while parsing
the configuration based on purely internal thing but that's not
possible right now. Let's add a new CFG_PARSER context for samples
which only permits constant samples (i.e. those which do not change
in the process' life and which are stable during config parsing).
A number of keywords are really constant and safe to use at config
time. This is the case for str(), int() etc but also env(), hostname(),
nbproc() etc. By extension a few other ones which can be useful to
preset values in a configuration were enabled as well, like data(),
rand() or uuid(). At the moment this doesn't change anything as they
are still only usable from runtime rules.
The "var()" keyword was also marked as const as it can definitely
return stable stuff at boot time.
This level indicates that everything it constant in the expression during
the whole process' life and that it may safely be used at config parsing
time.
For now smp_resolve_args() complains on stderr via ha_alert(), but if we
want to make it a bit more dynamic, we need it to return errors in an
allocated message. Let's pass it an error pointer and have it fill it.
On return we indent the output if it contains more than one line.
The "stopping" sample fetch keyword was accidently duplicated in 1.9
by commit 70fe94419 ("MINOR: sample: add cpu_calls, cpu_ns_avg,
cpu_ns_tot, lat_ns_avg, lat_ns_tot"). This has no effect so no
backport is needed.
This sample fetch doesn't require any L4 client session in practice, as
get_var() now checks for the session. This is important to remove this
dependency in order to support accessing variables in scope "proc" from
anywhere.
Instantiate both lua Socket servers tcp/ssl using standard function
new_server. There is currently no need to tune their settings except to
activate the ssl mode with noverify for the second one. Both servers are
freed with the free_server function.
Replace static initialization of the lua Socket proxy with the standard
function alloc_new_proxy. The settings proxy are properly applied thanks
to PR_CAP_LUA. The proxy is freed with the free_proxy function.
Define a new cap PR_CAP_LUA. It can be used to allocate the internal
proxy for lua Socket class. This cap overrides default settings for
preferable values in the lua context.
Move all liberation code related to a proxy in a dedicated function
free_proxy in proxy.c. For now, this function is only called in
haproxy.c. In the future, it will be used to free the lua proxy.
This helps to clean up haproxy.c.
Create a new function parse_new_proxy specifically designed to allocate
a new proxy from the configuration file and copy settings from the
default proxy.
The function alloc_new_proxy is reduced to a minimal allocation. It is
used for default proxy allocation and could also be used for internal
proxies such as the lua Socket proxy.
Move deinit_acl_cond and deinit_act_rules from haproxy.c respectively in
acl.c and action.c. The name of the functions has been slightly altered,
replacing the prefix deinit_* by free_* to reflect their purpose more
clearly.
This change has been made in preparation to the implementation of a free
proxy function. As a side-effect, it helps to clean up haproxy.c.
Create a new module init which contains code related to REGISTER_*
macros for initcalls. init.h is included in api.h to make init code
available to all modules.
It's a step to clean up a bit haproxy.c/global.h.
If the first active line of a crt-list file is also the first mentioned
certificate of a frontend that does not have the strict-sni option
enabled, then its certificate will be used as the default one. We then
do not want this instance to be removable since it would make a frontend
lose its default certificate.
Considering that a crt-list file can be used by multiple frontends, and
that its first mentioned certificate can be used as default certificate
for only a subset of those frontends, we do not want the line to be
removable for some frontends and not the others. So if any of the ckch
instances corresponding to a crt-list line is a default instance, the
removal of the crt-list line will be forbidden.
It can be backported as far as 2.2.
The default SSL_CTX used by a specific frontend is the one of the first
ckch instance created for this frontend. If this instance has SNIs, then
the SSL context is linked to the instance through the list of SNIs
contained in it. If the instance does not have any SNIs though, then the
SSL_CTX is only referenced by the bind_conf structure and the instance
itself has no link to it.
When trying to update a certificate used by the default instance through
a cli command, a new version of the default instance was rebuilt but the
default SSL context referenced in the bind_conf structure would not be
changed, resulting in a buggy behavior in which depending on the SNI
used by the client, he could either use the new version of the updated
certificate or the original one.
This patch adds a reference to the default SSL context in the default
ckch instances so that it can be hot swapped during a certificate
update.
This should fix GitHub issue #1143.
It can be backported as far as 2.2.
In issue #1197, Stphane Graber reported a rare case of crash that
results from an attempt to close an already closed H1 connection. It
indeed looks like under some circumstances it should be possible to
call the h1_shutw_conn() function more than once, though these
conditions are not very clear.
Without going through a deep analysis of all possibilities, one
potential case seems to be a detach() called with pending output data,
causing H1C_F_ST_SHUTDOWN to be set on the connection, then h1_process()
being immediately called on I/O, causing h1_send() to flush these data
and call h1_shutw_conn(), and finally the upper stream calling cs_shutw()
hence h1_shutw(), which itself will call h1_shutw_conn() again while the
transport and control layers have already been released. But the whole
sequence is not certain as it's not very clear in which case it's
possible to leave h1_send() without the connection anymore (at least
the obuf is empty).
However what is certain is that a shutdown function must be idempotent,
so let's fix h1_shutw_conn() regarding this point. Stphane reported the
issue as far back as 2.0, so this patch should be backported this far.
The hdr_ip() sample fetch function will try to extract IP addresses
from a header field. These IP addresses are parsed using url2ipv4()
and if it fails it will fall back to inet_pton(AF_INET6), otherwise
will fail.
There is a small problem there which is that if a field starts with
an IP address and is immediately followed by some garbage, the IP
address part is still returned. This is a problem with fields such
as x-forwarded-for because it prevents detection of accidental
corruption or bug along the chain. For example, the following string:
x-forwarded-for: 1.2.3.4; 5.6.7.8
or this one:
x-forwarded-for: 1.2.3.4O ( the last one being the letter 'O')
would still return "1.2.3.4" despite the trailing characters. This is
bad because it will silently cover broken code running on intermediary
proxies and may even in some cases allow haproxy to pass improperly
formatted headers after they were apparently validated, for example,
if someone extracts the address from this field to place it into
another one.
This issue would only affect the IPv4 parser, because the IPv6 parser
already uses inet_pton() which fails at the first invalid character and
rejects trailing port numbers.
In strict compliance with RFC7239, let's make sure that if there are any
characters left in the string, the parsing fails and makes hdr_ip()
return nothing. However, a special case has to be handled to support
IPv4 addresses followed by a colon and a valid port number, because till
now the parser used to implicitly accept them and it appears that this
practice, though rare, does exist at least in Azure:
https://docs.microsoft.com/en-us/azure/application-gateway/how-application-gateway-works
This issue has always been there so the fix may be backported to all
versions. It will need the following commit in order to work as expected:
MINOR: tools: make url2ipv4 return the exact number of bytes parsed
Many thanks to https://twitter.com/melardev and the BitMEX Security Team
for their detailed report.
The function's return value is currently used as a boolean but we'll
need it to return the number of bytes parsed. Right now it returns
it minus one, unless the last char doesn't match what is permitted.
Let's update this to make it more usable.
If an isolated thread is marked as harmless, it will loop forever in
thread_harmless_till_end() waiting no threads are isolated anymore. It never
happens because the current thread is isolated. To fix the bug, we exclude
the current thread for the test. We now wait for all other threads to leave
the rendez-vous point.
This bug only seems to occurr if HAProxy is compiled with DEBUG_UAF, when
pool_gc() is called. pool_gc() isolates the current thread, while
pool_free_area() set the thread as harmless when munmap is called.
This patch must be backported as far as 2.0.
Release the lock before calling mux destroy in connect_server when
trying to kill an idle connection because the pool high count has been
reached.
The lock must be released because the mux destroy will call
srv_release_conn which also takes the lock to remove the connection from
the tree. As the connection was already deleted from the tree at this
stage, it is safe to release the lock, and the removal in
srv_release_conn will be a noop.
It does not need to be backported because it is only present in the
current release. It has been introduced by
5c7086f6b0
MEDIUM: connection: protect idle conn lists with locks
In fd_delete(), if we're running with no double-width cas, take the
fd_mig_lock before setting thread_mask to 0 to make sure that
another thread calling fd_set_running() won't miss the new value of
thread_mask and set its bit in running_mask after we checked it.
This should be backported to 2.2 as part of the series fixing fd_delete().
Christopher discovered an issue mostly affecting 2.2 and to a less extent
2.3 and above, which is that it's possible to deadlock a soft-stop when
several threads are using a same listener:
thread1 thread2
unbind_listener() fd_set_running()
lock(listener) listener_accept()
fd_delete() lock(listener)
while (running_mask); -----> deadlock
unlock(listener)
This simple case disappeared from 2.3 due to the removal of some locked
operations at the end of listener_accept() on the regular path, but the
architectural problem is still here and caused by a lock inversion built
around the loop on running_mask in fd_clr_running_excl(), because there
are situations where the caller of fd_delete() may hold a lock that is
preventing other threads from dropping their bit in running_mask.
The real need here is to make sure the last user deletes the FD. We have
all we need to know the last one, it's the one calling fd_clr_running()
last, or entering fd_delete() last, both of which can be summed up as
the last one calling fd_clr_running() if fd_delete() calls fd_clr_running()
at the end. And we can prevent new threads from appearing in running_mask
by removing their bits in thread_mask.
So what this patch does is that it sets the running_mask for the thread
in fd_delete(), clears the thread_mask, thus marking the FD as orphaned,
then clears the running mask again, and completes the deletion if it was
the last one. If it was not, another thread will pass through fd_clr_running
and will complete the deletion of the FD.
The bug is easily reproducible in 2.2 under high connection rates during
soft close. When the old process stops its listener, occasionally two
threads will deadlock and the old process will then be killed by the
watchdog. It's strongly believed that similar situations do exist in 2.3
and 2.4 (e.g. if the removal attempt happens during resume_listener()
called from listener_accept()) but if so, they should be much harder to
trigger.
This should be backported to 2.2 as the issue appeared with the FD
migration. It requires previous patches "fd: make fd_clr_running() return
the remaining running mask" and "MINOR: fd: remove the unneeded running
bit from fd_insert()".
Notes for backport: in 2.2, the fd_dodelete() function requires an extra
argument "do_close" indicating whether we want to remove and close the FD
(fd_delete) or just delete it (fd_remove). While this information is not
conveyed along the chain, we know that late calls always imply do_close=1
become do_close=0 exclusively results from fd_remove() which is only used
by the config parser and the master, both of which are single-threaded,
hence are always the last ones in the running_mask. Thus it is safe to
assume that a postponed FD deletion always implies do_close=1.
Thanks to Olivier for his help in designing this optimal solution.
When a lua context is allocated, its stack must be initialized to NULL
before attaching it to its owner (task, stream or applet). Otherwise, if
the watchdog is fired before the stack is really created, that may lead to a
segfault because we try to dump the traceback of an uninitialized lua stack.
It is easy to trigger this bug if a lua script do a blocking call while
another thread try to initialize a new lua context. Because of the global
lua lock, the init is blocked before the stack creation. Of course, it only
happens if the script is executed in the shared global context.
This patch must be backported as far as 2.0.
The commit reverts following commits:
* 83926a04 BUG/MEDIUM: debug/lua: Don't dump the lua stack if not dumpable
* a61789a1 MEDIUM: lua: Use a per-thread counter to track some non-reentrant parts of lua
Instead of relying on a Lua function to print the lua traceback into the
debugger, we are now using our own internal function (hlua_traceback()).
This one does not allocate memory and use a chunk instead. This avoids any
issue with a possible deadlock in the memory allocator because the thread
processing was interrupted during a memory allocation.
This patch relies on the commit "BUG/MEDIUM: debug/lua: Use internal hlua
function to dump the lua traceback". Both must be backported wherever the
patches above are backported, thus as far as 2.0
The separator string is now configurable, passing it as parameter when the
function is called. In addition, the message have been slightly changed to
be a bit more readable.
If an unknown CA file was first mentioned in an "add ssl crt-list" CLI
command, it would result in a call to X509_STORE_load_locations which
performs a disk access which is forbidden during runtime. The same would
happen if a "ca-verify-file" or "crl-file" was specified. This was due
to the fact that the crt-list file parsing and the crt-list related CLI
commands parsing use the same functions.
The patch simply adds a new parameter to all the ssl_bind parsing
functions so that they know if the call is made during init or by the
CLI, and the ssl_store_load_locations function can then reject any new
cafile_entry creation coming from a CLI call.
It can be backported as far as 2.2.
Previous commit 69ba35146 ("MINOR: tools: introduce new option
PA_O_DEFAULT_DGRAM on str2sa_range.") managed to introduce a
parenthesis imbalance that broke the build. No backport is needed.
str2sa_range function options PA_O_DGRAM and PA_O_STREAM are used to
define the supported address types but also to set the default type
if it is not explicit. If the used address support both STREAM and DGRAM,
the default was always set to STREAM.
This patch introduce a new option PA_O_DEFAULT_DGRAM to force the
default to DGRAM type if it is not explicit in the address field
and both STREAM and DGRAM are supported. If only DGRAM or only STREAM
is supported, it continues to be considered as the default.
In commit a1ecbca0a ("BUG/MINOR: freq_ctr/threads: make use of the last
updated global time"), for period-based counters, the millisecond part
of the global_now variable was used as the date for the new period. But
it's wrong, it only works with sub-second periods as it wraps every
second, and for other periods the counters never rotate anymore.
Let's make use of the newly introduced global_now_ms variable instead,
which contains the global monotonic time expressed in milliseconds.
This patch needs to be backported wherever the patch above is backported.
It depends on previous commit "MINOR: time: also provide a global,
monotonic global_now_ms timer".
The period-based freq counters need the global date in milliseconds,
so better calculate it and expose it rather than letting all call
places incorrectly retrieve it.
Here what we do is that we maintain a new globally monotonic timer,
global_now_ms, which ought to be very close to the global_now one,
but maintains the monotonic approach of now_ms between all threads
in that global_now_ms is always ahead of any now_ms.
This patch is made simple to ease backporting (it will be needed for
a subsequent fix), but it also opens the way to some simplifications
on the time handling: instead of computing the local time and trying
to force it to the global one, we should soon be able to proceed in
the opposite way, that is computing the new global time an making the
local one just the latest snapshot of it. This will bring the benefit
of making sure that the global time is always ahead of the local one.
The function's purpose used to be to fail a buffer allocation if that
allocation wouldn't result in leaving some buffers available. Thus,
some allocations could succeed and others fail for the sole purpose of
trying to provide 2 buffers at once to process_stream(). But things
have changed a lot with 1.7 breaking the promise that process_stream()
would always succeed with only two buffers, and later the thread-local
pool caches that keep certain buffers available that are not accounted
for in the global pool so that local allocators cannot guess anything
from the number of currently available pools.
Let's just replace all last uses of b_alloc_margin() with b_alloc() once
for all.
pool_alloc_dirty() is the version below pool_alloc() that never performs
the memory poisonning. It should only be called directly for very large
unstructured areas for which enabling memory poisonning would not bring
anything but could significantly hurt performance (e.g. buffers). Using
this function here will not provide any benefit and will hurt the ability
to debug.
It would be desirable to backport this, although it does not cause any
user-visible bug, it just complicates debugging.
pool_alloc_dirty() is the version below pool_alloc() that never performs
the memory poisonning. It should only be called directly for very large
unstructured areas for which enabling memory poisonning would not bring
anything but could significantly hurt performance (e.g. buffers). Using
this function here will not provide any benefit and will hurt the ability
to debug.
It would be desirable to backport this, although it does not cause any
user-visible bug, it just complicates debugging.
pool_alloc_dirty() is the version below pool_alloc() that never performs
the memory poisonning. It should only be called directly for very large
unstructured areas for which enabling memory poisonning would not bring
anything but could significantly hurt performance (e.g. buffers). Using
this function here will not provide any benefit and will hurt the ability
to debug.
It would be desirable to backport this, although it does not cause any
user-visible bug, it just complicates debugging.
pool_alloc_dirty() is the version below pool_alloc() that never performs
the memory poisonning. It should only be called directly for very large
unstructured areas for which enabling memory poisonning would not bring
anything but could significantly hurt performance (e.g. buffers). Using
this function here will not provide any real benefit, it only avoids the
area being poisonned before being zeroed. Ideally a pool_calloc() function
should be provided for this.
pool_alloc_dirty() is the version below pool_alloc() that never performs
the memory poisonning. It should only be called directly for very large
unstructured areas for which enabling memory poisonning would not bring
anything but could significantly hurt performance (e.g. buffers). Using
this function here will not provide any benefit and will hurt the ability
to debug.
It would be desirable to backport this, although it does not cause any
user-visible bug, it just complicates debugging.
This fixes a gcc warning about a missing const on defproxy for
mem_parse_global_fail_alloc.
This is needed since the commit :
018251667e
CLEANUP: config: make the cfg_keyword parsers take a const for the
defproxy
When we try to dump the stack of a lua context, if it is not dumpable,
nothing is performed and a message is emitted instead. This happens when a
lua execution was interrupted inside a non-reentrant part.
This patch depends on following commit :
* MEDIUM: lua: Use a per-thread counter to track some non-reentrant parts of lua
Thanks to this patch, we avoid a possible deadllock if the lua is
interrupted by the watchdog in the lua memory allocator, because realloc()
is not async-signal-safe.
Both patches must be backported as far as 2.0.
Some parts of the Lua are non-reentrant. We must be sure to carefully track
these parts to not dump the lua stack when it is interrupted inside such
parts. For now, we only identified the custom lua allocator. If the thread
is interrupted during the memory allocation, we must not try to print the
lua stack wich also allocate memory. Indeed, realloc() is not
async-signal-safe.
In this patch we introduce a thread-local counter. It is incremented before
entering in a non-reentrant part and decremented when exiting. It is only
performed in hlua_alloc() for now.
It was misspelled (expect-netscaler-ip instead of expect-netscaler-cip). 2
commits are concerned :
* db67b0ed7 MINOR: tcp-rules: suggest approaching action names on mismatch
* 72d012fbd CLEANUP: tcp-rules: add missing actions in the tcp-request error message
The first one will not be backported, but the second one was backported as
far as 1.8. Thus this one may also be backported, but only the 2nd part
about the list of accepted keywords.
Now that connections aren't being reused when they failed, remove the
reset() method. It was unimplemented anywhere, except for H1 where it did
nothing, anyway.
Add a start() method to ssl_sock. It is responsible with initiating the
SSL handshake, currently by just scheduling the tasklet, instead of doing
it in the init() method, when all the XPRT may not have been initialized.
Add a start_method to xprt_handshake. It schedules the tasklet that does
the handshake. This used to be done in xprt_handshake_add_xprt(), but that's
a much better place.
Introduce a new XPRT method, start(). The init() method will now only
initialize whatever is needed for the XPRT to run, but any action the XPRT
has to do before being ready, such as handshakes, will be done in the new
start() method. That way, we will be sure the full stack of xprt will be
initialized before attempting to do anything.
The init() call is also moved to conn_prepare(). There's no longer any reason
to wait for the ctrl to be ready, any action will be deferred until start(),
anyway. This means conn_xprt_init() is no longer needed.
The proto "uxdg" (UNIX DGRAM) was not declared, causing an error trying
to put a socket unix on "dgram-bind" into a log-forward section.
This patch introduces the missing "uxdg" protocol by adding proto_uxdg.c
which was fully created based on the code available for the other
protocols.
This patch should be backported to version 2.3 and above.
Allow to specify the mux proto for a dynamic server. It must be
compatible with the backend mode to be accepted. The reg-tests has been
extended for this error case.
Enable a subset of server options to be used as keywords on the CLI
command 'add server'. These options are safe and can be applied
flawlessly for a dynamic server.
Add a new cli command 'add server'. This command is used to create a new
server at runtime attached on an existing backend. The syntax is the
following one :
$ add server <be_name>/<sv_name> [<kws>...]
This command is only available through experimental mode for the moment.
Currently, no server keywords are supported. They will be activated
individually when deemed properly functional and safe.
Another limitation is put on the backend load-balancing algorithm. The
algorithm must use consistent hashing to guarantee a minimal
reallocation of existing connections on the new server insertion.
Remove static qualifier on stats_allocate_proxy_counters_internal. This
function will be used to allocate extra counters at runtime for dynamic
servers.
Prepare the server parsing API to support dynamic servers.
- define a new parsing flag to be used for dynamic servers
- each keyword contains a new field dynamic_ok to indicate if it can be
used for a dynamic server. For now, no keyword are supported.
- do not copy settings from the default server for a new dynamic server.
- a dynamic server is created in a maintenance mode and requires an
explicit 'enable server' command.
- a new server flag named SRV_F_DYNAMIC is created. This flag is set for
all servers created at runtime. It might be useful later, for example
to know if a server can be purged.
Modify the API of parse_server function. Use flags to describe the type
of the parsed server instead of discrete arguments. These flags can be
used to specify if a server/default-server/server-template is parsed.
Additional parameters are also specified (parsing of the address
required, resolve of a name must be done immediately).
It is now unneeded to use strcmp on args[0] in parse_server. Also, the
calls to parse_server are more explicit thanks to the flags.
Move server linked into proxy backend list outside of _srv_parse_init to
parse_server.
This is groundwork for dynamic servers support. There will be two
differences in case of a dynamic server :
- the server will be attached to the proxy list only at the very end of the
operations when everything is ok
- the server will be directly attached to the end of the server proxy
list
Move every ha_alert calls in parsing functions into parse_server.
Parsing functions now support a pointer-to-string argument which will be
allocated with an error message if needed via memprintf.
parse_server has then the responsibility to display errors with ha_alert.
This is groundwork for dynamic server. No traces should be printed on
stderr as a response to a cli command. cli_err will replace ha_alert in
this case.
The huge parse_server function is splitted into two smaller ones.
* _srv_parse_init allocates a new server instance and parses the address
parameter
* _srv_parse_kw parse the current server keyword
This simplify a bit the parse_server function. Besides, it will be
useful for dynamic server creation.
Move server-keyword hardcoded in parse_server into the srv_kws list of
server.c. Now every server keywords is checked through srv_find_kw. This
has the effect to reduce the size of parse_server. As a side-effect,
common kw list can be reduced.
This change has been made to be able to quickly discard these keywords
in case of a dynamic server.
The idle conn task is is a global task used to cleanup backend
connections marked for deletion. Previously, it was only only allocated
if at least one server in the configuration has idle connections.
This assumption won't be valid anymore when new servers can be created
at runtime with idle connections. Always allocate the global idle conn
task.
Declaring a master CLI socket without activating the master-worker mode
is likely a user error, so we issue a warning.
This patch can be backported as far as 1.8.
If the configuration file contains a 'unix-bind prefix' directive, and
if we use the -S option and specify a UNIX socket path, the path of the
socket will be prepended with the value of the unix-bind prefix.
For instance, if we have 'unix-bind prefix /tmp/sockets/' and we use
'-S /tmp/master-socket' on the command line, we will get this error:
Starting proxy MASTER:
cannot bind UNIX socket (No such file or directory) [/tmp/sockets/tmp/master-socket]
So this patch adds an exception, and will ignore the unix-bind prefix
for the master CLI socket.
This patch can be backported as far as 1.9.
The freq counters were using the thread's own time as the start of the
current period. The problem is that in case of contention, it was
occasionally possible to perform non-monotonic updates on the edge of
the next second, because if the upfront thread updates a counter first,
it causes a rotation, then the second thread loses the race from its
older time, and tries again, and detects a different time again, but
in the past so it only updates the counter, then a third thread on the
new date would detect a change again, thus provoking a rotation again.
The effect was triple:
- rare loss of stored values during certain transitions from one
period to the next one, causing counters to report 0
- half of the threads forced to go through the slow path every second
- difficult convergence when using many threads where the CAS can fail
a lot and we can observe N(N-1) attempts for N threads to complete
This patch fixes this issue in two ways:
- first, it now makes use og the monotonic global_now value which also
happens to be volatile and to carry the latest known time; this way
time will never jump backwards anymore and only the first thread
updates it on transition, the other ones do not need to.
- second, re-read the time in the loop after each failure, because
if the date changed in the counter, it means that one thread knows
a more recent one and we need to update. In this case if it matches
the new current second, the fast path is usable.
This patch relies on previous patch "MINOR: time: export the global_now
variable" and must be backported as far as 1.8.
This is the process-wide monotonic time that is used to update each
thread's own time. It may be required at a few places where a strictly
monotonic clock is required such as freq_ctr. It will be have to be
backported as a dependency of a forthcoming fix.
DNS hostname comparisons were fixed to be case-insensitive (see b17b88487
"BUG/MEDIUM: dns: Consider the fact that dns answers are
case-insensitive"). However 2 comparisons are still case-sensitive.
This patch must be backported as far as 1.8.
Some are not always easy to spot with "chk" vs "check" or hyphens at
some places and not at others. Now entering "option http-close" properly
suggests "httpclose" and "option tcp-chk" suggests "tcp-check". There's
no need to consider the proxy's capabilities, what matters is to figure
what related word the user tried to spell, and there are not that many
options anyway.
Now the suggested keywords are sorted with the most relevant ones first
instead of scanning them all in registration order and only dumping the
proposed ones:
- "tra"
trace <module> [cmd [args...]] : manage live tracing
operator : lower the level of the current CLI session to operator
user : lower the level of the current CLI session to user
show trace [<module>] : show live tracing state
- "pool"
show pools : report information about the memory pools usage
add acl : add acl entry
del map : delete map entry
user : lower the level of the current CLI session to user
del acl : delete acl entry
- "sh ta"
show stat : report counters for each proxy and server [desc|json|no-maint|typed|up]*
show tasks : show running tasks
set table [id] : update or create a table entry's data
show table [id]: report table usage stats or dump this table's contents
trace <module> [cmd [args...]] : manage live tracing
- "sh state"
show stat : report counters for each proxy and server [desc|json|no-maint|typed|up]*
set table [id] : update or create a table entry's data
show table [id]: report table usage stats or dump this table's contents
show servers state [id]: dump volatile server information (for backend <id>)
show sess [id] : report the list of current sessions or dump this session
Till now the fuzzy matching would only work on the same number of words,
but this doesn't account for commands like "show servers conn" which
involve 3 words and were not proposed when entering only "show conn".
Let's improve the situation by building the two fingerprints separately
for the correct keyword sequence and the entered one, then compare them.
This can result in slightly larger variations due to the different string
lengths but is easily compensated for. Thanks to this, we can now see
"show servers conn" when entering "show conn", and the following choices
are relevant to correct typos:
- "show foo"
show sess [id] : report the list of current sessions or dump this session
show info : report information about the running process [desc|json|typed]*
show env [var] : dump environment variables known to the process
show fd [num] : dump list of file descriptors in use
show pools : report information about the memory pools usage
- "show stuff"
show sess [id] : report the list of current sessions or dump this session
show info : report information about the running process [desc|json|typed]*
show stat : report counters for each proxy and server [desc|json|no-maint|typed|up]*
show fd [num] : dump list of file descriptors in use
show tasks : show running tasks
- "show stafe"
show sess [id] : report the list of current sessions or dump this session
show stat : report counters for each proxy and server [desc|json|no-maint|typed|up]*
show fd [num] : dump list of file descriptors in use
show table [id]: report table usage stats or dump this table's contents
show tasks : show running tasks
- "show state"
show stat : report counters for each proxy and server [desc|json|no-maint|typed|up]*
show servers state [id]: dump volatile server information (for backend <id>)
It's still visible that the shorter ones continue to easily match, such
as "show sess" not having much in common with "show foo" but what matters
is that the best candidates are definitely relevant. Probably that listing
them in match order would further help.
While sums of squares usually give excellent results in fixed-sise
patterns, they don't work well to compare different sized ones such
as when some sub-words are missing, because a word such as "server"
contains "er" twice, which will rsult in an extra distance of at
least 4 for just this e->r transition compared to another one missing
it. This is one of the main reasons why "show conn" only proposes
"show info" on the CLI. Maybe an improved approach consisting in
using squares only for exact same lengths would work, but it would
still make it difficult to spot reversed characters.
The distance between two words can be high due to a sub-word being missing
and in this case it happens that other totally unrealted words are proposed
because their average score looks lower thanks to being shorter. Here we're
introducing the notion of presence of each character so that word sequences
that contain existing sub-words are favored against the shorter ones having
nothing in common. In addition we do not distinguish being/end from a
regular delimitor anymore. That made it harder to spot inverted words.
In commit a0e8eb8ca ("MINOR: cfgparse: suggest correct spelling for
unknown words in global section") we got the ability to locate a better
matching word in case of error. But it mistakenly used the CFG_LISTEN
class of words instead of CFG_GLOBAL, resulting in proposing unsuitable
matches in addition to the long hard-coded list. Now, "tune.dh-param"
correctly proposes "tune.ssl.default-dh-param".
No backport is needed.
I somehow managed to re-break the "help" command in b736458bf ("MEDIUM:
cli: apply spelling fixes for known commands before listing them")
after fixing it once. A null-deref happens when checking the args
early in the processing.
No backport is needed as this was introduced in 2.4-dev12.
When tasklets were derived from tasks, there was no immediate need for
the scheduler to know their status after execution, and in a spirit of
simplicity they just started to always return NULL. The problem is that
it simply prevents the scheduler from 1) accounting their execution time,
and 2) keeping track of their current execution status. Indeed, a remote
wake-up could very well end up manipulating a tasklet that's currently
being executed. And this is the reason why those handlers have to take
the idle lock before checking their context.
In 2.5 we'll take care of making tasklets and tasks work more similarly,
but trouble is to be expected if we continue to propagate the trend of
returning NULL everywhere, especially if some fixes relying on a stricter
model later need to be backported. For this reason this patch updates all
known tasklet handlers to make them return NULL only when the tasklet was
freed. It has no effect for now and isn't even guaranteed to always be
100% safe but it puts the code into the right direction for this.
There were still a very small list of functions, variables and fields
called "stats_" while they were really purely CLI-centric. There's the
frontend called "stats_fe" in the global section, which instantiates a
"cli_applet" called "<CLI>" so it was renamed "cli_fe".
The "alloc_stats_fe" function cas renamed to "cli_alloc_fe" which also
better matches the naming convention of all cli-specific functions.
Finally the "stats_permission_denied_msg" used to return an error on
the CLI was renamed "cli_permission_denied_msg".
Now there's no more "stats_something" that designates the CLI.
This is the number of args accepted on a command received on the CLI,
is has long been totally independent of stats and should not carry
this misleading "stats" name anymore.
Now instead of comparing words at an exact position, we build a fingerprint
made of all of them, so that we can check for them in any position. For
example, "show conn serv" finds "show servers conn" and that "set servers
maxconn" proposes both "set server" and "set maxconn servers".
Instead of making a new one from scratch, let's support not wiping the
existing fingerprint and updating it, and to do the same char by char.
The word-by-word one will still result in multiple beginnings and ends,
but that will accurately translate word boundaries. The char-based one
has more flexibility and requires that the caller maintains the previous
char to indicate the transition, which also allows to insert delimiters
for example.
Entering "show tls" would still emit 35 entries. By measuring the distance
between all unknown words and the candidates, we can sort them and pick the
10 most likely candidates. This works reasonably well, as now "show tls"
only proposes "show tls-keys", "show threads", "show pools" and "show tasks".
If the distance is still too high or if a word is missing, the whole
prefix list continues to be dumped, thus "show" alone will still report
the entire list of commands beginning with "show".
It's still impossible to skip a word, for example "show conn" will not
propose "show servers conn" because the distance is calculated for each
word individually. Some changes to the distance calculation to support
updating an existing map could easily address this. But this is already
a great improvement.
The error message on the CLI has become unreadable due to the long list
and it's not even sorted, making it even harder to figure the right
command.
This patch starts by looking if some of the words match something known,
and if so, will limit the listing only to those commands that start like
the current one. The "help", "prompt" and "quit" commands are always
shown to help the user try something else. Now thanks to this, typing
"add" or "del" will only list "add acl", "add map" and not 50 lines
anymore.
As a small bonus, we won't print "Unknown command" anymore in response
to the "help" command.
By doing so we can report more accurate information about what's wrong.
As a first step, we already distinguish the case of expert-only commands
from other ones.
Now that the appctx contains the master level, it greatly simplifies
all the tests, as we can simply verify that keyword levels match the
effective level without having to cheat with applet pointers. This
also allows to fold the expert test in them.
Right now the code is a bit hackish, it tests for the keyword's level
flags but checks the applet's origin to compare the bits. Let's start
by properly setting the ACCESS_MASTER_ONLY and ACCESS_MASTER flags on
the master CLI's bind_conf so that they are automatically present
all the time.
These 3 commands are functionally valid both in master and worker CLIs.
However, while they do have a valid handler, they are not permitted by
the code and work partially by chance in the master:
- "prompt" and "quit" are intercepted by the request analyser
- "help" triggers an error, which results in displaying the error
message
Let's make sure they are permitted so that we don't count errors there and
that we can report appropriate help.
This bug has always been there but it doesn't have any functional effect
at the moment since "help" can only show the error message. As such, there
is no need to backport it.
The loop looking for existing ADD items to renew their last_seen must ignore
the items already renewed in the same loop. To do so, we rely on the
last_seen time. because it is now based on now_ms, it is safe.
Doing so avoid to match several time the same ADD item when the same IP
address is found in several ADD item. This reduces the number of extra DNS
resolutions.
This patch depends on "MINOR: resolvers: Use milliseconds for cached items
in resolver responses". Both may be backported as far as 2.2 if necessary.
The last time when an item was seen in a resolver responses is now stored in
milliseconds instead of seconds. This avoid some corner-cases at the
edges. This also simplifies time comparisons.
At startup, if a SRV resolution is set for a server, no DNS resolution is
created. We must wait the first SRV resolution to know if it must be
triggered. It is important to do so for two reasons.
First, during a "classical" startup, a server based on a SRV resolution has
no hostname. Thus the created DNS resolution is useless. Best waiting the
first SRV resolution. It is not really a bug at this stage, it is just
useless.
Second, in the same situation, if the server state is loaded from a file,
its hosname will be set a bit later. Thus, if there is no additionnal record
for this server, because there is already a DNS resolution, it inhibits any
new DNS resolution. But there is no hostname attached to the existing DNS
resolution. So no resolution is performed at all for this server.
To avoid any problem, it is fairly easier to handle this special case during
startup. But this means we must be prepared to have no "resolv_requester"
field for a server at runtime.
This patch must be backported as far as 2.2.
Another way to say it: "Safely unlink requester from a requester callbacks".
Requester callbacks must never try to unlink a requester from a resolution, for
the current requester or another one. First, these callback functions are called
in a loop on a request list, not necessarily safe. Thus unlink resolution at
this place, may be unsafe. And it is useless to try to make these loops safe
because, all this stuff is placed in a loop on a resolution list. Unlink a
requester may lead to release a resolution if it is the last requester.
However, the unkink is necessary because we cannot reset the server state
(hostname and IP) with some pending DNS resolution on it. So, to workaround
this issue, we introduce the "safe" unlink. It is only performed from a
requester callback. In this case, the unlink function never releases the
resolution, it only reset it if necessary. And when a resolution is found
with an empty requester list, it is released.
This patch depends on the following commits :
* MINOR: resolvers: Purge answer items when a SRV resolution triggers an error
* MINOR: resolvers: Use a function to remove answers attached to a resolution
* MINOR: resolvers: Directly call srvrq_update_srv_state() when possible
* MINOR: resolvers: Add function to change the srv status based on SRV resolution
All the series must be backported as far as 2.2. It fixes a regression
introduced by the commit b4badf720 ("BUG/MINOR: resolvers: new callback to
properly handle SRV record errors").
don't release resolution from requester cb
When the server status must be updated from the result of a SRV resolution,
we can directly call srvrq_update_srv_state(). It is simpler and this avoid
a test on the server DNS resolution.
This patch is mandatory for the next commit. It also rely on "MINOR:
resolvers: Directly call srvrq_update_srv_state() when possible".
srvrq_update_srv_status() update the server status based on result of SRV
resolution. For now, it is only used from snr_update_srv_status() when
appropriate.
When a SRV request trigger an error, if we decide to handle the error
because last_valid duration is expired, the answer list may be purged. All
items are considered as obsolete.
resolv_purge_resolution_answer_records() must be used to removed all answers
attached to a resolution. For now, it is only used when a resolution is
released.
When a ADD item attached to a SRV item is removed because it is obsolete, we
must trigger a DNS resolution to be sure the hostname still resolves or
not. There is no other way to be the entry is still valid. And we cannot set
the server in RMAINT immediatly, because a DNS server may be inconsitent and
may stop to add some additionnal records.
The opposite is also true. If a valid ADD item is still attached to a SRV
item, any DNS resolution must be stopped. There is no reason to perform
extra resolution in this case.
This patch must be backported as far as 2.2.
If no ADD item is found for a SRV item in a SRV response, a DNS resolution
is triggered. When it succeeds, we must be sure the SRV item is still
alive. Otherwise the DNS resolution must be ignored.
This patch depends on the commit "MINOR: resolvers: Move last_seen time of
an ADD into its corresponding SRV item". Both must be backported as far as
2.2.
This function search for a SRV answer item associated to a requester
whose type is server.
This is mainly useful to "link" a server to its SRV record when no
additional record were found to configure the IP address.
This patch is required by a bug fix.
For each ADD item found in a SRV response, we try to find a corresponding
ADD item already attached to an existing SRV item. If found, the ADD
last_seen time is updated, otherwise we try to find a SRV item with no ADD
to attached the new one.
However, the loop is buggy. Instead of comparing 2 ADD items, it compares
the new ADD item with the SRV item. Because of this bug, we are unable to
renew last_seen time of existing ADD.
This patch must be backported as far as 2.2.
when a server status is updated based on a SRV item, it is always set to UP,
regardless it has an IP address defined or not. For instance, if only a SRV
item is received, with no additional record, only the server hostname is
defined. We must wait to have an IP address to set the server as UP.
This patch must be backported as far as 2.2.
When a server is set in RMAINT becaues of a SRV resolution failure, the
server DNS resolution, if any, must be unlink first. It is mandatory to
handle the change in the context of a SRV resolution.
This patch must be backported as far as 2.2.
When a DNS resolution error is detected, in snr_resolution_error_cb(), the
server address must be reset only if the server status has changed. It this
case, it means the server is set to RMAINT. Thus the server address may by
reset.
This patch fixes a bug introduced by commit d127ffa9f ("BUG/MEDIUM:
resolvers: Reset address for unresolved servers"). It must be backported as
far as 2.0.
When an error is received for a DNS resolution, for instance a NXDOMAIN
error, the server must be considered to have no address when its status is
updated, not the opposite.
Concretly, because this parameter is not used on error path in
snr_update_srv_status(), there is no impact.
This patch must be backported as far as 1.8.
This reverts commit a331a1e8eb.
This commit fixes a real bug, but it also reveals some hidden bugs, mostly
because of some design issues. Thus, in itself, it create more problem than
it solves. So revert it for now. All known bugs will be addressed in next
commits.
This patch should be backported as far as 2.2.
This was introduced in previous commit 49c2b45c1 ("MINOR: cfgparse/server:
try to fix spelling mistakes on server lines"), the loop was changed but
the increment left. No backport is needed.
This adds support for action_suggest() in http-request, http-response
and http-after-response rulesets. For example:
parsing [/dev/stdin:2]: 'http-request' expects (...), but got 'del-hdr'. Did you mean 'del-header' maybe ?
action_suggest() will return a pointer to an action whose keyword more or
less ressembles the passed argument. It also accepts to be more tolerant
against prefixes (since actions taking arguments are handled as prefixes).
This will be used to suggest approaching words.
Just like with the server keywords, now's the turn of "bind" keywords.
The difference is that 100% of the bind keywords are registered, thus
we do not need the list of extra keywords.
There are multiple bind line parsers today, all were updated:
- peers
- log
- dgram-bind
- cli
$ printf "listen f\nbind :8000 tcut\n" | ./haproxy -c -f /dev/stdin
[NOTICE] 070/101358 (25146) : haproxy version is 2.4-dev11-7b8787-26
[NOTICE] 070/101358 (25146) : path to executable is ./haproxy
[ALERT] 070/101358 (25146) : parsing [/dev/stdin:2] : 'bind :8000' unknown keyword 'tcut'; did you mean 'tcp-ut' maybe ?
[ALERT] 070/101358 (25146) : Error(s) found in configuration file : /dev/stdin
[ALERT] 070/101358 (25146) : Fatal errors found in configuration.
Let's apply the fuzzy match to server keywords so that we can avoid
dumping the huge list of supported keywords each time there is a spelling
mistake, and suggest proper spelling instead:
$ printf "listen f\nserver s 0 sendpx-v2\n" | ./haproxy -c -f /dev/stdin
[NOTICE] 070/095718 (24152) : haproxy version is 2.4-dev11-caa6e3-25
[NOTICE] 070/095718 (24152) : path to executable is ./haproxy
[ALERT] 070/095718 (24152) : parsing [/dev/stdin:2] : 'server s' unknown keyword 'sendpx-v2'; did you mean 'send-proxy-v2' maybe ?
[ALERT] 070/095718 (24152) : Error(s) found in configuration file : /dev/stdin
[ALERT] 070/095718 (24152) : Fatal errors found in configuration.
The global section also knows a large number of keywords that are not
referenced in any list, so this needed them to be specifically listed.
It becomes particularly handy now because some tunables are never easy
to remember, but now it works remarkably well:
$ printf "global\nsched.queue_depth\n" | ./haproxy -c -f /dev/stdin
[NOTICE] 070/093007 (23457) : haproxy version is 2.4-dev11-dd8ee5-24
[NOTICE] 070/093007 (23457) : path to executable is ./haproxy
[ALERT] 070/093007 (23457) : parsing [/dev/stdin:2] : unknown keyword 'sched.queue_depth' in 'global' section; did you mean 'tune.runqueue-depth' maybe ?
[ALERT] 070/093007 (23457) : Error(s) found in configuration file : /dev/stdin
[ALERT] 070/093007 (23457) : Fatal errors found in configuration.
Let's start by the largest keyword list, the listeners. Many keywords were
still not part of a list, so a common_kw_list array was added to list the
not enumerated ones. Now for example, typing "tmout" properly suggests
"timeout":
$ printf "frontend f\ntmout client 10s\n" | ./haproxy -c -f /dev/stdin
[NOTICE] 070/091355 (22545) : haproxy version is 2.4-dev11-3b728a-21
[NOTICE] 070/091355 (22545) : path to executable is ./haproxy
[ALERT] 070/091355 (22545) : parsing [/dev/stdin:2] : unknown keyword 'tmout' in 'frontend' section; did you mean 'timeout' maybe ?
[ALERT] 070/091355 (22545) : Error(s) found in configuration file : /dev/stdin
[ALERT] 070/091355 (22545) : Fatal errors found in configuration.
Instead of just reporting "unknown keyword", let's provide a function which
will look through a list of registered keywords for a similar-looking word
to the one that wasn't matched. This will help callers suggest correct
spelling. Also, given that a large part of the config parser still relies
on a long chain of strcmp(), we'll need to be able to pass extra candidates.
Thus the function supports an optional extra list for this purpose.
This introduces two functions, one which creates a fingerprint of a word,
and one which computes a distance between two words fingerprints. The
fingerprint is made by counting the transitions between one character and
another one. Here we consider the 26 alphabetic letters regardless of
their case, then any digit as a digit, and anything else as "other". We
also consider the first and last locations as transitions from begin to
first char, and last char to end. The distance is simply the sum of the
squares of the differences between two fingerprints. This way, doubling/
missing a letter has the same cost, however some repeated transitions
such as "e"->"r" like in "server" are very unlikely to match against
situations where they do not exist. This is a naive approach but it seems
to work sufficiently well for now. It may be refined in the future if
needed.
The error message for http-request and http-response starts with a comma
that very likely is a leftover from a previous list construct. Let's remove
it: "'http-request' expects , 'wait-for-handshake', 'use-service' ...".
The error message after "http-response set-var" isn't very clear:
[ALERT] 070/115043 (30526) : parsing [/dev/stdin:2] : error detected in proxy 'f' while parsing 'http-response set-var' rule : invalid variable 'set-var'. Expects 'set-var(<var-name>)' or 'unset-var(<var-name>)'.
Let's change it to this instead:
[ALERT] 070/115608 (30799) : parsing [/dev/stdin:2] : error detected in proxy 'f' while parsing 'http-response set-var' rule : invalid or incomplete action 'set-var'. Expects 'set-var(<var-name>)' or 'unset-var(<var-name>)'.
With a wrong action name, it also works better (it's handled as a prefix
due to the opening parenthesis):
[ALERT] 070/115608 (30799) : parsing [/dev/stdin:2] : error detected in proxy 'f' while parsing 'http-response set-varxxx' rule : invalid or incomplete action 'set-varxxx'. Expects 'set-var(<var-name>)' or 'unset-var(<var-name>)'.
The tcp-request error message only mentions "accept", "reject" and
track-sc*, but there are a few other ones that were missing, so let's
add them.
This could be backported, though it's not likely that it will help anyone
with an existing config.
The refactoring in commit 131b07be3 ("MEDIUM: server: Refactor
apply_server_state() to make it more readable") also had a copy-paste
error resulting in using global.server_state_file instead of the
function's argument, which easily crashes with a conf having a
state file in a backend and no global state file.
In addition, let's simplify the code and get rid of strcpy() which
almost certainly will break the build on OpenBSD.
This was introduced in 2.4-dev10, no backport is needed.
The refactoring in commit 131b07be3 ("MEDIUM: server: Refactor
apply_server_state() to make it more readable") made the global
server_state_base be dereferenced before being checked, resulting
in a crash on certain files.
This happened in 2.4-dev10, no backport is needed.
When a "tcp-check" or a "http-check" rule is parsed, we try to get the
previous rule in the ruleset to get its index. We must take care to reset
the pointer on this rule in case an error is triggered later on the
parsing. Otherwise, the same rule may be released twice. For instance, it
happens with such line :
http-check meth GET uri / ## note there is no "send" parameter
This patch must be backported as far as 2.2.
If an agent-check is configured for a server, When the response is parsed,
the .health threshold of the agent must be updated on up/down/stopped/fail
command and not the threshold of the health-check. Otherwise, the
agent-check will compete with the health-check and may mark a DOWN server as
UP.
This patch should fix the issue #1176. It must be backported as far as 2.2.
CF_FL_ANALYZE flag is used to know a channel is filtered. It is important to
synchronize request and response channels when the filtering ends.
However, it is possible to call all request analyzers before starting the
filtering on the response channel. This means flt_end_analyze() may be
called for the request channel before flt_start_analyze() on the response
channel. Thus because CF_FL_ANALYZE flag is not set on the response channel,
we consider the filtering is finished on both sides. The consequence is that
flt_end_analyze() is not called for the response and backend filters are
unregistered before their execution on the response channel.
It is possible to encounter this bug on TCP frontend or CONNECT request on
HTTP frontend if the client shutdown is reveiced with the first read.
To fix this bug, CF_FL_ANALYZE is set when filters are attached to the
stream. It means, on the request channel when the stream is created, in
flt_stream_start(). And on both channels when the backend is set, in
flt_set_stream_backend().
This patch must be backported as far as 1.7.
Setting multiple http-request track-scX rules generates entries
which never expires.
If there was already an entry registered by a previous http rule
'stream_track_stkctr(&s->stkctr[rule->action], t, ts)' didn't
register the new 'ts' into the stkctr. And function is left
with no reference on 'ts' whereas refcount had been increased
by the '_get_entry'
The patch applies the same policy as the one showed on tcp track
rules and if there is successive rules the track counter keep the
first entry registered in the counter and nothing more is computed.
After validation this should be backported in all versions.
The recent default runqueue size reduction appeared to have significantly
lowered performance on low-thread count configs. Testing various values
runqueue values on different workloads under thread counts ranging from
1 to 64, it appeared that lower values are more optimal for high thread
counts and conversely. It could even be drawn that the optimal value for
various workloads sits around 280/sqrt(nbthread), and probably has to do
with both the L3 cache usage and how to optimally interlace the threads'
activity to minimize contention. This is much easier to optimally
configure, so let's do this by default now.
Instead of setting a hard-limit on runqueue-depth and keeping it short
to maintain fairness, let's allow the scheduler to automatically cut
the existing one in two equal halves if its size is between the configured
size and its double. This will allow to increase the default value while
keeping a low latency.
Emeric found that SSL+keepalive traffic had dropped quite a bit in the
recent changes, which could be bisected to recent commit 9205ab31d
("MINOR: ssl: mark the SSL handshake tasklet as heavy"). Indeed, a
first incarnation of this commit made use of the TASK_SELF_WAKING
flag but the last version directly used TASK_HEAVY, but it would still
continue to remove the already absent TASK_SELF_WAKING one instead of
TASK_HEAVY. As such, the SSL traffic remained processed with low
granularity.
No backport is needed as this is only 2.4.
The tests were made on slz and the zlib parsers for memlevel and windowsize
managed to escape the change made by commit 018251667 ("CLEANUP: config: make
the cfg_keyword parsers take a const for the defproxy"). This is now fixed.
These are some leftovers from the ancient code where they were still
called sessions, but these areas in the code remain confusing due to
this naming. They were now called "strm" which will not even affect
indenting nor alignment.
When implementing a client applet, a NULL dereference was encountered on
the error path which increment the counters.
Indeed, the counters incremented are the one in the listener which does
not exist in the case of client applets, so in sess->listener->counters,
listener is NULL.
This patch fixes the access to the listener structure when accessing
from a sesssion, most of the access are the counters in error paths.
Must be backported as far as 1.8.
The default proxy was passed as a variable to all parsers instead of a
const, which is not without risk, especially when some timeout parsers used
to make some int pointers point to the default values for comparisons. We
want to be certain that none of these parsers will modify the defaults
sections by accident, so it's important to mark this proxy as const.
This patch touches all occurrences found (89).
grq_total was incremented when picking tasks from the global run queue,
but this variable was not defined with threads disabled, and the code
was optimized away at -O2. No backport is needed.