When "set-dumpable" is set to "libs", in addition to marking the process
dumpable, haproxy also reads the binary and shared objects into memory as
a tar archive in a page-aligned location so that these files are easily
extractable from a future core dump. The goal here is to always have
access to the exact same binary and libs as those which caused the core
to happen. It's indeed very frequent to miss some of these, or to get
mismatching files due to a local update that didn't experience a reload,
or to get those of a host system instead of the container.
The in-memory tar file presents everything under a directory called
"core-%d" where %d corresponds to the PID of the worker process. In
order to ease the finding of these data in the core dump, the memory
area is contiguous and surrounded by PROT_NONE pages so that it appears
in its own segment in the core file. The total size used by this is a
few tens of MB, which is not a problem on large systems.
During a soft reload, a starting worker sends sock_pair[0] to the master
via send_fd_uxst(), then reads on sock_pair[1] waiting for the master to
acknowledge receipt. Because of a documented macOS sendmsg(2) bug, the
worker must keep sock_pair[0] open until the master confirms the fd was
received by the CLI applet. This means the read() on sock_pair[1] will
never return 0 (EOF), since the worker itself still holds a reference to
sock_pair[0]. The worker can only unblock when the master actively sends
a byte back. If the master crashes before doing so, the worker blocks
indefinitely in read().
Fix this by setting a 2-second SO_RCVTIMEO on sock_pair[1] before the
read(), so the worker can unblock and continue regardless of the master's
state.
This was introduced by d7f6819161 ("BUG/MEDIUM: mworker: fix startup
and reload on macOS").
This should be backported to 3.1 and later.
We defer processing of the "-dt" options until after the configuration
file has been read. This will be useful if we ever allow trace sources
to be registered later, for instance with LUA.
No backport needed.
In master-worker mode, when a freshly forked worker looks up its own
entry in proc_list to send its "READY" status to the master, the loop
was breaking on the first process with pid == -1 regardless of its
type. If a non-worker process (e.g. a master or program) also had
pid == -1, the wrong entry could be selected, causing send_fd_uxst()
to use an invalid ipc_fd.
Fix this by adding a PROC_O_TYPE_WORKER check to the loop condition,
and add a BUG_ON() assertion to catch any case where the loop exits
without finding a valid worker entry.
Must be backported to 3.1.
Implement proxy refcount for Lua proxy class. This is similar to the
server class.
In summary, proxy_take() is used to increment refcount when a Lua proxy
is instantiated. proxy_drop() is called via Lua garbage collector. To
ensure a deleted backend is released asap, hlua_check_proxy() now
returns NULL if PR_FL_DELETED is set.
This approach is directly dependable on Lua GC execution. As such, it
probably suffers from the same limitations as the ones already described
in the previous commit. With the current patch, "del backend" is not
directly impacted though. However, the final proxy deinit may happen
after a long period of time, which could cause memory pressure increase.
One final observations regarding deinit : it is necessary to delay a
BUG_ON() which checks that defaults proxies list is empty. Now this must
be executed after Lua deinit (called via post_deinit_list). This should
guarantee that all proxies and their defaults refcount are null.
Rename proxy conf <refcount> to <def_ref>. This field only serves for
defaults proxy instances. The objective is to avoid confusion with the
newly introduced <refcount> field used for dynamic backends.
As an optimization, it could be possible to remove <def_ref> and only
use <refcount> also for defaults proxies usage. However for now the
simplest solution is implemented.
This patch does not bring any functional change.
Implement refcount notion into proxy structure. The objective is to be
able to increment refcount on proxy to prevent its deletion temporarily.
This is similar to the server refcount : "del backend" is not blocked
and will remove the targetted instance from the global proxies_list.
However, the final free operation is delayed until the refcount is null.
As stated above, the API is similar to servers. Proxies are initialized
with a refcount of 1. Refcount can be incremented via proxy_take(). When
no longer useful, refcount is decremented via proxy_drop() which
replaces the older free_proxy(). Deinit is only performed once refcount
is null.
This commit also defines flag PR_FL_DELETED. It is set when a proxy
instance has been removed via a "del backend" CLI command. This should
serve as indication to modules which may still have a refcount on the
target proxy so that they can release it as soon as possible.
Note that this new refcount is completely ignored for a default proxy
instance. For them, proxy_take() is pure noop. Free is immediately
performed on first proxy_drop() invokation.
A number of C files include stats.h or stats-t.h, many of which were
just to access the counters. Now those which really need counters rely
on counters.h or counters-t.h, which already reduces the amount of
preprocessed code to be built (~3000 lines or about 0.05%).
This patches allows the argv[] parsing function to be redefined from
others C modules. This is done extracting the function which really
parse the argv[] array to implement haproxy_init_args(). This function
is declared as a weak symbol which may be overloaded by others C module.
Same thing for copy_argv() which checks/cleanup/modifies the argv array.
One may want this function to be redefined. This is the case when other
C modules do not handle the same command line option. Copying such
argv[] would lead to conflicts with the original haproxy argv[] during
the copy.
This patch provides the possibility to initialize haproxy without
configuration file. This may be identified by the new global and exported
<fileless_mode> and <fileless_cfg> variables which may be used to
provide a struct cfgfile to haproxy by others means than a physical
file (built in memory).
When enabled, this fileless mode skips all the configuration files
parsing.
Add the support for large bufers. A dedicated memory pool is added. The size
of these buffers must be explicitly configured by setting
"tune.bufsize.large" directive. If it is not set, the pool is not
created. In addition, if the size for large buffers is the same than for
regular buffer, the feature is automatically disable.
For now, large buffers remain unused.
The call to init_buffer() during the worker startup may fail. In that case,
an error message is displayed but the error was not properly handled. So
let's add the proper check and exit on error.
Features prefixed by "HAVE_WORKING_" in the haproxy -vv feature list,
are features that are detected during runtime.
This patch splits these features on another line in haproxy -vv. This
line is named "Detected feature list".
The feature list in haproxy -vv is partly generated from the Makefile
using the USE_* keywords, but it's also possible to add keywords in the
feature list using hap_register_feature(), which adds the keyword at the
end of list. When doing so, the list is not correctly sorted anymore.
This patch fixes the problem by splitting the string using an array of
ist and applying a qsort() on it.
Commit 3674afe8a0 ("BUG/MEDIUM: threads: Atomically set TH_FL_SLEEPING
and clr FL_NOTIFIED") accidentally left a strange-looking line wrapping
making one think of an editing mistake, let's fix it and keep it on a
single line given that even indented wrapping is almost as large.
This can be backported with the fix above till 2.8 to keep the patch
context consistent between versions.
Fix unhandled strdup() failure when initializing global.log_tag.
Bug was introduced with the fix UAF for global progname pointer from
351ae5dbe. So it must be backported as far as 3.1.
Initially when init_early was introduced the progname string was a local
used for temporary storage of log_tag. Now it's global and detached from
log_tag enough. Thus, in the past we could inform that log_tag
allocation has been failed but not now.
Must be backported since the progname string became global, that is
v3.1-dev9-96-g49772c55e
Give global.maxthrpertgroup its default value at global creation,
instead of later when we're trying to detect the thread count.
It is used when verifying the configuration file validity, and if it was
not set in the config file, in a few corner cases, the value of 0 would
be used, which would then reject perfectly fine configuration files.
This should be backported to 3.3.
When we're about to enter polling, atomically set TH_FL_SLEEPING and
remove TH_FL_NOTIFIED, instead of doing it in sequence. Otherwise,
another thread may sett that both the TH_FL_SLEEPING and the
TH_FL_NOTIFIED bits are set, and don't wake up the thread then it should
be doing that.
This prevents a bug where a thread is sleeping while it should be
handling a new connection, which can happen if there are very few
incoming connection. This is easy to reproduce when using only two
threads, and injecting with only one connection, the connection may then
never be handled.
This should be backported up to 2.8.
This patch changes the handling of named defaults sections. Prior to
this patch, every unreferenced defaults proxies were removed on post
parsing. Now by default, these sections are kept after postparsing and
only purged on deinit. The objective is to allow reusing them as base
configuration for dynamic backends.
To implement this, refcount of every still addressable named sections is
incremented by one after parsing. This ensures that they won't be
removed even if referencing proxies are removed at runtime. This is done
via the new function proxy_ref_all_defaults().
To ensure defaults instances are still properly removed on deinit, the
inverse operation is performed : refcount is decremented by one on every
defaults sections via proxy_unref_all_defaults().
The original behavior can still be used by using the new global keyword
tune.defaults.purge. This is useful for users using configuration with
large number of defaults and not interested in dynamic backends
creation.
Defaults section are indexed by their name in defproxy_by_name tree. For
named sections, there is no duplicate : if two instances have the same
name, the older one is removed from the tree. However, this was not the
case for unnamed defaults which are all stored inconditionnally in
defproxy_by_name.
This commit introduces a new approach for unnamed defaults. Now, these
instances are never inserted in the defproxy_by_name tree. Indeed, this
is not needed as no tree lookup is performed with empty names. This may
optimize slightly config parsing with a huge number of named and unnamed
defaults sections, as the first ones won't fill up the tree needlessly.
However, defproxy_by_name tree is also used to purge unreferenced
defaults instances, both on postparsing and deinit. Thus, a new approach
is needed for unnamed sections cleanup. Now, each time a new defaults is
parsed, if the previous instance is unnamed, it is freed unless if
referenced by a proxy. When config parsing is ended, a similar operation
is performed to ensure the last unnamed defaults section won't stay in
memory. To implement this, last_defproxy static variable is now set to
global. Unnamed sections which cannot be removed due to proxies
referencing proxies will still be removed when such proxies are freed
themselves, at runtime or on deinit.
Defaults proxies instance are stored in a global name tree. When there
is a name conflict and the older entry cannot be simply discarded as it
is already referenced, the older entry is instead removed from the name
tree and inserted into the orphaned list.
The purpose of the orphaned list was to guarantee that any remaining
unreferenced defaults are purged either on postparsing or deinit.
However, this is in fact completely useless. Indeed on postparsing,
orphaned entries are always referenced. On deinit instead, defaults are
already freed along the cleanup of all frontend/backend instances clean
up, thanks to their refcounting.
This patch streamlines this by removing orphaned list. Instead, a
defaults section is inserted into a new global defaults_list during
their whole lifetime. This is not strictly necessary but it ensures that
defaults instances can still be accessed easily in the future if needed
even if not present in the name tree. On deinit, a BUG_ON() is added to
ensure that defaults_list is indeed emptied.
Another benefit from this patch is to simplify the defaults deletion
procedure. Orphaned simple list is replaced by a proper double linked
list implementation, so a single LIST_DELETE() is now performed. This
will be notably useful as defaults may be removed at runtime in the
future if backends deletion at runtime is implemented.
This patch renames functions which deal with defaults section. A common
"defaults_px_" prefix is defined. This serves as a marker to identify
functions which can only be used with proxies defaults capability. New
BUG_ON() are enforced to ensure this is valid.
Also, older proxy_unref_or_destroy_defaults() is renamed
defaults_px_detach().
Now that it is unused, eliminate all_tgroups_mask, as we can't 64bits
masks to represent thread groups, if we want to be able to have more
than 64 thread groups.
Released version 3.4-dev2 with the following main changes :
- BUG/MEDIUM: mworker/listener: ambiguous use of RX_F_INHERITED with shards
- BUG/MEDIUM: http-ana: Properly detect client abort when forwarding response (v2)
- BUG/MEDIUM: stconn: Don't report abort from SC if read0 was already received
- BUG/MEDIUM: quic: Don't try to use hystart if not implemented
- CLEANUP: backend: Remove useless test on server's xprt
- CLEANUP: tcpcheck: Remove useless test on the xprt used for healthchecks
- CLEANUP: ssl-sock: Remove useless tests on connection when resuming TLS session
- REGTESTS: quic: fix a TLS stack usage
- REGTESTS: list all skipped tests including 'feature cmd' ones
- CI: github: remove openssl no-deprecated job
- CI: github: add a job to test the master branch of OpenSSL
- CI: github: openssl-master.yml misses actions/checkout
- BUG/MEDIUM: backend: Do not remove CO_FL_SESS_IDLE in assign_server()
- CI: github: use git prefix for openssl-master.yml
- BUG/MEDIUM: mux-h2: synchronize all conditions to create a new backend stream
- REGTESTS: fix error when no test are skipped
- MINOR: cpu-topo: Turn the cpu policy configuration into a struct
- MEDIUM: cpu-topo: Add a "threads-per-core" keyword to cpu-policy
- MEDIUM: cpu-topo: Add a "cpu-affinity" option
- MEDIUM: cpu-topo: Add a new "max-threads-per-group" global keyword
- MEDIUM: cpu-topo: Add the "per-thread" cpu_affinity
- MEDIUM: cpu-topo: Add the "per-ccx" cpu_affinity
- BUG/MINOR: cpu-topo: fix -Wlogical-not-parentheses build with clang
- DOC: config: fix number of values for "cpu-affinity"
- MINOR: tools: add a secure implementation of memset
- MINOR: mux-h2: add missing glitch count for non-decodable H2 headers
- MINOR: mux-h2: perform a graceful close at 75% glitches threshold
- MEDIUM: mux-h1: implement basic glitches support
- MINOR: mux-h1: perform a graceful close at 75% glitches threshold
- MEDIUM: cfgparse: acknowledge that proxy ID auto numbering starts at 2
- MINOR: cfgparse: remove useless checks on no server in backend
- OPTIM/MINOR: proxy: do not init proxy management task if unused
- MINOR: patterns: preliminary changes for reorganization
- MEDIUM: patterns: reorganize pattern reference elements
- CLEANUP: patterns: remove dead code
- OPTIM: patterns: cache the current generation
- MINOR: tcp: add new bind option "tcp-ss" to instruct the kernel to save the SYN
- MINOR: protocol: support a generic way to call getsockopt() on a connection
- MINOR: tcp: implement the get_opt() function
- MINOR: tcp_sample: implement the fc_saved_syn sample fetch function
- CLEANUP: assorted typo fixes in the code, commits and doc
- BUG/MEDIUM: cpu-topo: Don't forget to reset visited_ccx.
- BUG/MAJOR: set the correct generation ID in pat_ref_append().
- BUG/MINOR: backend: fix the conn_retries check for TFO
- BUG/MINOR: backend: inspect request not response buffer to check for TFO
- MINOR: net_helper: add sample converters to decode ethernet frames
- MINOR: net_helper: add sample converters to decode IP packet headers
- MINOR: net_helper: add sample converters to decode TCP headers
- MINOR: net_helper: add ip.fp() to build a simplified fingerprint of a SYN
- MINOR: net_helper: prepare the ip.fp() converter to support more options
- MINOR: net_helper: add an option to ip.fp() to append the TTL to the fingerprint
- MINOR: net_helper: add an option to ip.fp() to append the source address
- DOC: config: fix the length attribute name for stick tables of type binary / string
- MINOR: mworker/cli: only keep positive PIDs in proc_list
- CLEANUP: mworker: remove duplicate list.h include
- BUG/MINOR: mworker/cli: fix show proc pagination using reload counter
- MINOR: mworker/cli: extract worker "show proc" row printer
- MINOR: cpu-topo: Factorize code
- MINOR: cpu-topo: Rename variables to better fit their usage
- BUG/MEDIUM: peers: Properly handle shutdown when trying to get a line
- BUG/MEDIUM: mux-h1: Take care to update <kop> value during zero-copy forwarding
- MINOR: threads: Avoid using a thread group mask when stopping.
- MINOR: hlua: Add support for lua 5.5
- MEDIUM: cpu-topo: Add an optional directive for per-group affinity
- BUG/MEDIUM: mworker: can't use signals after a failed reload
- BUG/MEDIUM: stconn: Move data from <kip> to <kop> during zero-copy forwarding
- DOC: config: fix a few typos and refine cpu-affinity
- MINOR: receiver: Remove tgroup_mask from struct shard_info
- BUG/MINOR: quic: fix deprecated warning for window size keyword
Remove the "stopped_tgroup_mask" variable, that indicated which thread
groups were stopping, and instead just use "stopped_tgroups", a counter
indicating how many thread groups are stopping. We want to remove all
thread group masks, so that we can increase the maximum number of thread
groups past 64.
It's always a pain to guess the number of FDs that can be needed by
listeners, checks, threads, pollers etc. We have this estimate in
global.maxsock before calling set_global_maxconn(), but we lose it
the line after. Let's copy it into global.est_fd_usage and keep it.
This will be helpful to try to provide more accurate suggestions for
maxconn.
Since the new master-worker model in 3.1, signals are registered in
step_init_3(). However, those signals were supposed to be registered
only for the worker or the standalone mode. It would call the wrong
callback in the master even during configuration parsing.
The patch set the signals handler to NULL for the master so it does
nothing until they really are registered.
Must be backported as far as 3.1.
Since haproxy 3.1, the master-worker mode changed to let the worker
parse the configuration instead of the master.
Previously, signals were blocked during configuration parsing and
unblocked before entering the polling loop of the master. This way it
was impossible to start a reload during the configuration parsing.
But with the new model, the polling loop is started in the master before
the configuration parsing is finished, and the signals are still
unblocked at this step. Meaning that it is possible to start a reload
while the configuration is parsing.
This patch reintroduce the behavior of blocking the signals during
configuration parsing adapted to the new model:
- Before the exec() of the reload, signals are blocked.
- When entering the polling loop, the SIGCHLD is unblocked because it is
required to get a failure during configuration parsing in the worker
- Once the configuration is parsed, upon success in _send_status() or
upon failure in run_master_in_recovery_mode() every signals are unblocked.
This patch must be backported as far as 3.1.
Move the char *msg variable declared in main() in a sub-block since
there's already multiple msg variable in other sub-blocks in this
function.
Also make it const.
Since commit "65760d MINOR: init: Make devnullfd global and create it
earlier in init" the devnullfd file descriptor pointing to /dev/null
is created regardless of the process's parameters so we can use it in
all 'stdio_quiet' calls instead or recreating an FD.
The devnull fd might be needed during configuration parsing, if some
options require to fork/exec for instance. So we now create it much
earlier in the init process and without depending on the '-q' or '-d'
parameters.
During init we were calling 'stdio_quiet' and passing the previously
created 'devnullfd' file descriptor. But the 'stdio_quiet' was also
closed afterwards which raised an error (EBADF).
If we keep from closing FDs that were opened outside of the
'stdio_quiet' function we will let the caller manage its FD and avoid
double close calls.
This patch can be backported to all stable branches.
Several settings can be set to control stream multiplexing and
associated receive window. Previously, all of these settings were
configured using prefix "tune.quic.frontend.", despite being applied
blindly on both sides.
Fix this by duplicating these settings specific to frontend and backend
side. Options are also renamed to use the standardize prefix
"tune.quic.[be|fe].stream." notation.
Also, each option is individually renamed to better reflect its purpose
and hide technical details relative to QUIC transport parameter naming :
* max-data-size -> stream.rxbuf
* max-streams-bidi -> stream.max-concurrent
* stream-data-ratio -> stream.data-ratio
No need to backport.
Streamline max-idle-timeout option. Rename it to use the newer cohesive
naming scheme 'tune.quic.fe|be.'.
Two different fields were already defined in global struct. These fields
are moved into quic_tune along with other QUIC settings. However, no
parser was defined for backend option, this commit fixes this.
No need to backport this.
On frontend side, a quic_conn can have a dedicated FD or use the
listener one. These different modes can be activated via a global QUIC
tune setting.
This patch adjusts the option. First, it is renamed to the more
meaningful name 'tune.quic.fe.sock-per-conn'. Also, arguments are now
either 'default-on' or 'force-off'. The objective is to better highlight
reliationship with 'quic-socket' bind option.
The older option is deprecated and will be removed in 3.5.
A QUIC global tune setting is defined to be able to force Retry emission
prior to handshake. By definition, this ability is only supported by
QUIC servers, hence it is a frontend option only.
Rename the option to use "fe" prefix. The old option name is deprecated
and will be removed in 3.5
QUIC global memory can be limited across the entire process via a global
tune setting. Previously, this setting used to misleading "frontend"
prefix. As this is applied as a sum between all QUIC connections, both
from frontend and backend sides, remove the prefix. The new option name
is "tune.quic.mem.tx-max".
The older option name is deprecated and will be removed in 3.5.
Various settings can be configured related to QUIC congestion controler.
This patch duplicates them to be able to set independent values on
frontend and backend sides.
As with previous patch, option are renamed to use "fe/be" unified
prefixes. This is part of the current serie of commits which unify QUIC
settings. Older options are deprecated and will be removed on 3.5
release.
Add a new initcall stage, STG_INIT_2, for stuff to be called after
step_init_2() is called, so after we know for sure that global.nbthread
will be set.
Modify stick-tables stkt_late_init() to run at STG_INIT_2 instead of
STG_INIT, in anticipation for it to be enhanced and have a need for
global.nbthread.
Historically, when the purge of pools was forced by sending a SIGQUIT to
haproxy, information about the pools were first dumped. It is now totally
pointless because these info can be retrieved via the CLI. It is even less
relevant now because the purge is forced typically when there are memroy
issues and to dump pools information, data must be allocated.
dump_pools_info() function was simplified because it is now called only from
an applet. No reason to still try to dump info on stderr.
I continue to mistakenly set the traces using "-dtXXX" and to have to
refer to the doc to figure that it requires a separate argument and
differs from some other options. Worse, "-dthelp" doesn't say anything
and silently ignores the argument.
Let's make the parser take whatever follows "-dt" as the argument if
present, otherwise take the next one (as it currently does). Doing
this even allows to simplify the code, and is easier to figure the
syntax since "-dthelp" now works.
This patch introduces three new command line flags to display HAProxy version
info more flexibly:
- `-vqs` outputs the short version string without commit info (e.g., "3.3.1").
- `-vqb` outputs only the branch (major.minor) part of the version (e.g., "3.3").
- `-vq` outputs the full version string with suffixes (e.g., "3.3.1-dev5-1bb975-71").
This allows easier parsing of version info in automation while keeping existing -v and -vv behaviors.
The command line argument parsing now calls `display_version_plain()` with a
display_mode parameter to select the desired output format. The function handles
stripping of commit or patch info as needed, depending on the mode.
Signed-off-by: Nikita Kurashkin <nkurashkin@stsoft.ru>
Like many exposed network deamons, haproxy does normally not need to run
as root and strongly recommends against this, unless strictly necessary.
On some operating systems, capabilities even totally alleviate this need.
Lately, maybe due to a raise of containerization or automated config
generation or a bit of both, we've observed a resurgence of this bad
practice, possibly due to the fact that users are just not aware of the
conditions they're using their daemon.
Let's add a warning at boot when starting as root without having requested
it using "uid" or "user". And take this opportunity for warning the user
about the existence of capabilities when supported, and encouraging the
use of a chroot.
This is achieved by leaving global.uid set to -1 by default, allowing us
to detect if it was explicitly set or not.
add initial support for the "shm-stats-file" directive and
associated "shm-stats-file-max-objects" directive. For now they are
flagged as experimental directives.
The shared memory file is automatically created by the first process.
The file is created using open() so it is up to the user to provide
relevant path (either on regular filesystem or ramfs for performance
reasons). The directive takes only one argument which is path of the
shared memory file. It is passed as-is to open().
The maximum number of objects per thread-group (hard limit) that can be
stored in the shm is defined by "shm-stats-file-max-objects" directive,
Upon initial creation, the main shm stats file header is provisioned with
the version which must remains the same to be compatible between processes
and defaults to 2k. which means approximately 1mb max per thread group
and should cover most setups. When the limit is reached (during startup)
an error is reported by haproxy which invites the user to increase the
"shm-stats-file-max-objects" if desired, but this means more memory will
be allocated. Actual memory usage is low at start, because only the mmap
(mapping) is provisionned with the maximum number of objects to avoid
relocating the memory area during runtime, but the actual shared memory
file is dynamically resized when objects are added (resized by following
half power of 2 curve when new objects are added, see upcoming commits)
For now only the file is created, further logic will be implemented in
upcoming commits.
The fix in 4a9e3e102e ("BUG/MINOR: haproxy: only tid 0 must not sleep
if got signal") had the nasty side effect of breaking the graceful
reload operations: threads whose id is non-zero could quit too early and
not process incoming traffic, which is visible with broken connections
during reloads. They just need to ignore the the stopping condition
until the signal queue is empty. In any case, it's the thread in charge
of the signal queue which will notify them once it receives the signal.
It was verified that connections are no longer broken with this fix,
and that the issue that required it (#2537, looping threads on reload)
does not re-appear with the reproducer, while it still did without the
fix above. Since the fix above was backported to every stable version,
this one will also have to.
Fix read return value unused result.
src/haproxy.c: In function ‘main’:
src/haproxy.c:3630:17: error: ignoring return value of ‘read’ declared with attribute ‘warn_unused_result’ [-Werror=unused-result]
3630 | read(sock_pair[1], &c, 1);
| ^~~~~~~~~~~~~~~~~~~~~~~~~
Must be backported where d7f6819 is backported.
Since the mworker rework in haproxy 3.1, the worker need to tell the
master that it is ready. This is done using the sockpair protocol by
sending a _send_status message to the master.
It seems that the sockpair protocol is buggy on macOS because of a known
issue around fd transfer documented in sendmsg(2):
https://man.freebsd.org/cgi/man.cgi?sendmsg(2) BUGS section
Because sendmsg() does not necessarily block until the data has been
transferred, it is possible to transfer an open file descriptor across
an AF_UNIX domain socket (see recv(2)), then close() it before it has
actually been sent, the result being that the receiver gets a closed
file descriptor. It is left to the application to implement an
acknowledgment mechanism to prevent this from happening.
Indeed the recv side of the sockpair is closed on the send side just
after the send_fd_uxst(), which does not implement an acknowledgment
mechanism. So the master might never recv the _send_status message.
In order to implement an acknowledgment mechanism, a blocking read() is
done before closing the recv fd on the sending side, so we are sure that
the message was read on the other side.
This was only reproduced on macOS, meaning the master CLI is also
impacted on macOS. But no solution was found on macOS for it.
Implementing an acknowledgment mechanism would complexify too much the
protocol in non-blocking mode.
The problem was reported in ticket #3045, reproduced and analyzed by
@cognet.
Must be backported as far as 3.1.
When pre-check and post-check postparsing hooks= are evaluated in
step_init_2() potential fatal errors are ignored during the iteration
and are only taken into account at the end of the loop. This is not ideal
because some errors (ie: memory errors) could cause multiple alert
messages in a row, which could make troubleshooting harder for the user.
Let's stop as soon as a fatal error is encountered for post parsing
hooks, as we use to do everywhere else.