haproxy/src
Willy Tarreau 2a4523f6f4 BUG/MAJOR: pools: fix possible race with free() in the lockless variant
In GH issue #1275, Fabiano Nunes Parente provided a nicely detailed
report showing reproducible crashes under musl. Musl is one of the libs
coming with a simple allocator for which we prefer to keep the shared
cache. On x86 we have a DWCAS so the lockless implementation is enabled
for such libraries.

And this implementation has had a small race since day one: the allocator
will need to read the first object's <next> pointer to place it into the
free list's head. If another thread picks the same element and immediately
releases it, while both the local and the shared pools are too crowded, it
will be freed to the OS. If the libc's allocator immediately releases it,
the memory area is unmapped and we can have a crash while trying to read
that pointer. However there is no problem as long as the item remains
mapped in memory because whatever value found there will not be placed
into the head since the counter will have changed.
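
The race described above can be sketched with a deliberately simplified
pop operation. This is not HAProxy's code: the names (slots, head,
racy_pop, push) are hypothetical, and a 32-bit index plus a 32-bit
sequence counter packed into one 64-bit word stand in for the
pointer+counter pair that a DWCAS would update. Because the items live
in a static array here, the dangerous read can never fault; in the real
allocator the item is heap memory that may already have been unmapped.

```c
#include <stdatomic.h>
#include <stdint.h>

#define NITEMS 4
#define NIL    UINT32_MAX            /* "empty list" marker (hypothetical) */

/* Hypothetical layout: items are named by their index in a static array. */
struct slot { uint32_t next; };
static struct slot slots[NITEMS];

/* High 32 bits: sequence counter, low 32 bits: index of the first item. */
static _Atomic uint64_t head = UINT32_MAX;   /* seq 0, empty list */

static uint64_t pack(uint32_t seq, uint32_t idx)
{
    return ((uint64_t)seq << 32) | idx;
}

static uint32_t racy_pop(void)
{
    uint64_t old = atomic_load(&head);

    for (;;) {
        uint32_t idx = (uint32_t)old;

        if (idx == NIL)
            return NIL;

        /* DANGER: between the load of 'old' and this read, another thread
         * may have popped the same item and released it to the OS. With
         * real heap memory this read can hit an unmapped page. If the page
         * is still mapped, a stale value is harmless: the counter has
         * changed, so the compare-and-swap below fails and we retry.
         */
        uint32_t next = slots[idx].next;

        if (atomic_compare_exchange_weak(&head, &old,
                                         pack((uint32_t)(old >> 32) + 1, next)))
            return idx;
        /* on failure 'old' was refreshed with the current head */
    }
}

static void push(uint32_t idx)
{
    uint64_t old = atomic_load(&head);

    do {
        slots[idx].next = (uint32_t)old;   /* link to the current first item */
    } while (!atomic_compare_exchange_weak(&head, &old,
                                           pack((uint32_t)(old >> 32) + 1, idx)));
}
```

The sequence counter protects the head from receiving a stale value, but
nothing protects the read of the item itself, which is the point made
above.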

The probability for this to happen is extremely low, but as analyzed by
Fabiano, it increases with the buffer size. On 16 threads it's relatively
easy to reproduce with 2MB buffers above 200k req/s, where it should
happen within the first 20 seconds of traffic usually.

This is a structural issue for which there are two non-trivial solutions:
  - place a read lock in the alloc call and a barrier made of lock/unlock
    in the free() call to force operations to be serialized; this will have
    a big performance impact since free() is already one of the contention
    points;

  - change the allocator to use a self-locked head, similar to what is
    done in the MT_LISTS. This requires two memory writes to the head
    instead of a single one, thus the overhead is exactly one memory
    write during alloc and one during free;

This patch implements the second option. A new POOL_DUMMY pointer was
defined for the locked pointer value, so that the head can be both read
and locked with a single xchg call. The code was carefully optimized so
that the locked period remains as short as possible and that bus writes
are avoided as much as possible while the lock is held.
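
A minimal sketch of the self-locked head idea, written with C11 atomics,
may help when reading the patch. It is an illustration only and not the
actual HAProxy code: struct item, HEAD_BUSY (playing the role of
POOL_DUMMY), free_list, list_pop and list_push are hypothetical names,
and taking the lock in the push path the same way as in the pop path is
an assumption made for simplicity.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical free-list node: the first word of a free object. */
struct item { struct item *next; };

/* Sentinel meaning "head is locked"; plays the role of POOL_DUMMY. */
#define HEAD_BUSY ((struct item *)1)

static _Atomic(struct item *) free_list = NULL;

/* Read and lock the head with a single exchange. While another thread
 * holds the lock, spin on plain loads so that no bus write is issued.
 */
static struct item *lock_head(void)
{
    struct item *head;

    while ((head = atomic_exchange(&free_list, HEAD_BUSY)) == HEAD_BUSY) {
        while (atomic_load(&free_list) == HEAD_BUSY)
            ;   /* wait for the holder to publish a new head */
    }
    return head;
}

static struct item *list_pop(void)
{
    struct item *head = lock_head();

    if (head == NULL) {
        atomic_store(&free_list, NULL);     /* unlock, list still empty */
        return NULL;
    }
    /* The head is locked, so no other thread can pop this item and free
     * it under us: reading head->next is safe.
     */
    atomic_store(&free_list, head->next);   /* second write: unlock */
    return head;
}

static void list_push(struct item *it)
{
    struct item *head;

    assert(it != NULL && it != HEAD_BUSY);
    head = lock_head();
    it->next = head;
    atomic_store(&free_list, it);           /* second write: unlock */
}
```

Each operation writes the head twice (the exchange, then the store that
publishes the new head), which matches the cost of one extra memory
write per alloc and per free mentioned for this option.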

Tests show that while a bit slower than the original lockless
implementation on large buffers (2MB), it's 2.6 times faster than both
the no-cache and the locked implementations on such large buffers, and
remains as fast as or faster than all the other implementations when
buffers are 48k or higher. Tests were also run on arm64 with similar
results.

Note that this code is not used on modern libcs featuring a fast allocator.

A nice benefit of this change is that since it removes a dependency on
the DWCAS, it will be possible to remove the locked implementation and
replace it with this one, which is then usable on all systems, thus
significantly increasing their performance with large buffers.

Given that lockless pools were introduced in 1.9 (not supported anymore),
this patch will have to be backported as far as 2.0. The code changed
several times in this area and is subject to many ifdefs which will
complicate the backport. What is important is to remove all the DWCAS
code from the shared cache alloc/free lockless code and replace it with
this one. The pool_flush() code is basically the same code as the
allocator, retrieving the whole list at once. If in doubt regarding what
barriers to use in older versions, it's safe to use the generic ones.

This patch depends on the following previous commits:

 - MINOR: pools: do not maintain the lock during pool_flush()
 - MINOR: pools: call malloc_trim() under thread isolation
 - MEDIUM: pools: use a single pool_gc() function for locked and lockless

The last one also removes one occurrence of an unneeded DWCAS in the
code that was incompatible with this fix. The removal of the now unused
seq field will happen in a future patch.

Many thanks to Fabiano for his detailed report, and to Olivier for
his help on this issue.
2021-06-10 17:46:50 +02:00
..
acl.c CLEANUP: lists/tree-wide: rename some list operations to avoid some confusion 2021-04-21 09:20:17 +02:00
action.c MINOR: errors: specify prefix "config" for parsing output 2021-06-07 17:19:16 +02:00
activity.c MINOR: activity/cli: optionally support sorting by address on "show profiling" 2021-05-13 10:00:17 +02:00
applet.c BUG/MINOR: applet: Notify the other side if data were consumed by an applet 2021-04-28 10:51:08 +02:00
arg.c BUG/MINOR: http: Missing calloc return value check in make_arg_list 2021-05-31 10:51:09 +02:00
auth.c BUILD: auth: include missing list.h 2021-05-08 12:29:51 +02:00
backend.c MINOR: backend: Don't release SI endpoint anymore in connect_server() 2021-06-01 15:54:50 +02:00
base64.c MINOR: sample: add ub64dec and ub64enc converters 2021-04-13 17:28:13 +02:00
cache.c BUILD: cache: include tools.h in cache.c 2021-05-08 13:03:55 +02:00
calltrace.c BUILD: trace: include tools.h 2020-09-25 17:54:48 +02:00
cfgdiag.c CLEANUP: assorted typo fixes in the code and comments 2021-04-26 10:42:58 +02:00
cfgparse-global.c BUILD: fix usage of ha_alert without format string 2021-05-07 15:07:21 +02:00
cfgparse-listen.c BUILD: config: include tools.h in cfgparse-listen.c 2021-05-08 13:00:23 +02:00
cfgparse-ssl.c BUILD: make tune.ssl.keylog available again 2021-06-09 17:10:13 +02:00
cfgparse-tcp.c MINOR: server: prepare parsing for dynamic servers 2021-03-18 15:51:12 +01:00
cfgparse-unix.c MINOR: listener: create a new struct "settings" in bind_conf 2020-09-16 20:13:13 +02:00
cfgparse.c MINOR: haproxy: Add -cc argument 2021-06-08 11:17:19 +02:00
channel.c CLEANUP: channel: fix comment in ci_putblk. 2021-02-13 09:43:17 +01:00
check.c MINOR: errors: specify prefix "config" for parsing output 2021-06-07 17:19:16 +02:00
chunk.c MINOR: pool: move pool declarations to read_mostly 2021-04-10 19:27:41 +02:00
cli.c BUILD: cli: appease a null-deref warning in cli_gen_usage_msg() 2021-05-10 07:47:05 +02:00
compression.c BUG/MINOR: compression: Missing calloc return value check in comp_append_type/algo 2021-05-31 10:51:04 +02:00
connection.c BUILD: connection: move list_mux_proto() to connection.c 2021-05-08 20:24:09 +02:00
cpuset.c BUG/MAJOR: fix build on musl with cpu_set_t support 2021-04-27 14:11:26 +02:00
debug.c CLEANUP: cli/tree-wide: properly re-align the CLI commands' help messages 2021-05-07 11:51:26 +02:00
dgram.c REORG: dgram: rename proto_udp to dgram 2020-06-11 10:18:59 +02:00
dict.c CLEANUP: atomic/tree-wide: replace single increments/decrements with inc/dec 2021-04-07 18:18:37 +02:00
dns.c DOC: fix a few remainig cases of "Haproxy" and "HAproxy" in doc and comments 2021-05-09 06:50:46 +02:00
dynbuf.c CLEANUP: pools: re-merge pool_refill_alloc() and __pool_refill_alloc() 2021-04-19 15:24:33 +02:00
eb32sctree.c REORG: ebtree: move the include files from ebtree to include/import/ 2020-06-11 09:31:11 +02:00
eb32tree.c REORG: ebtree: move the include files from ebtree to include/import/ 2020-06-11 09:31:11 +02:00
eb64tree.c REORG: ebtree: move the include files from ebtree to include/import/ 2020-06-11 09:31:11 +02:00
ebimtree.c CLEANUP: include: tree-wide alphabetical sort of include files 2020-06-11 10:18:59 +02:00
ebistree.c REORG: ebtree: move the include files from ebtree to include/import/ 2020-06-11 09:31:11 +02:00
ebmbtree.c REORG: ebtree: move the include files from ebtree to include/import/ 2020-06-11 09:31:11 +02:00
ebpttree.c REORG: ebtree: move the include files from ebtree to include/import/ 2020-06-11 09:31:11 +02:00
ebsttree.c REORG: ebtree: move the include files from ebtree to include/import/ 2020-06-11 09:31:11 +02:00
ebtree.c BUG/MEDIUM: ebtree: use a byte-per-byte memcmp() to compare memory blocks 2020-06-16 11:30:33 +02:00
errors.c BUG: errors: remove printf positional args for user messages context 2021-06-08 11:40:44 +02:00
ev_epoll.c MINOR: epoll: move epoll_fd to read_mostly 2021-04-10 19:27:41 +02:00
ev_evports.c CLEANUP: atomic/tree-wide: replace single increments/decrements with inc/dec 2021-04-07 18:18:37 +02:00
ev_kqueue.c MINOR: kqueue: move kqueue_fd to read_mostly 2021-04-10 19:27:41 +02:00
ev_poll.c CLEANUP: atomic/tree-wide: replace single increments/decrements with inc/dec 2021-04-07 18:18:37 +02:00
ev_select.c CLEANUP: Remove useless malloc() casts 2021-04-08 20:11:58 +02:00
extcheck.c CLEANUP: lists/tree-wide: rename some list operations to avoid some confusion 2021-04-21 09:20:17 +02:00
fcgi-app.c MINOR: errors: specify prefix "config" for parsing output 2021-06-07 17:19:16 +02:00
fcgi.c CLEANUP: include: tree-wide alphabetical sort of include files 2020-06-11 10:18:59 +02:00
fd.c BUILD: fd: include log.h from fd.c 2021-05-08 20:35:39 +02:00
filters.c BUG/MEDIUM: filters: Exec pre/post analysers only one time per filter 2021-05-21 09:59:00 +02:00
fix.c CLEANUP: assorted typo fixes in the code and comments 2020-12-21 11:24:48 +01:00
flt_http_comp.c BUG/MEDIUM: compression: Add a flag to know the filter is still processing data 2021-06-10 08:57:55 +02:00
flt_spoe.c BUILD: spoe: flt_spoe.c needs tools.h 2021-05-08 12:57:17 +02:00
flt_trace.c CLEANUP: Use istadv(const struct ist, const size_t) whenever possible 2021-03-03 05:07:10 +01:00
freq_ctr.c CLEANUP: freq_ctr: make arguments of freq_ctr_total() const 2021-04-28 17:44:37 +02:00
frontend.c MINOR: http-ana: Simplify creation/destruction of HTTP transactions 2021-04-01 11:06:48 +02:00
h1.c MEDIUM: h1: add a WebSocket key on handshake if needed 2021-01-28 16:37:14 +01:00
h1_htx.c MINOR: muxes/h1-htx: Realign input buffer using b_slow_realign_ofs() 2021-05-25 10:41:50 +02:00
h2.c CLEANUP: htx: Remove unsued hdrs_bytes field from the HTX start-line 2021-04-28 10:51:08 +02:00
haproxy.c MINOR: haproxy: Add -cc argument 2021-06-08 11:17:19 +02:00
hash.c REORG: include: move base64.h, errors.h and hash.h from common to to haproxy/ 2020-06-11 10:18:56 +02:00
hlua.c BUILD: hlua: include proxy.h from hlua.c 2021-05-08 20:35:39 +02:00
hlua_fcn.c MINOR: stats: pass the appctx flags to stats_fill_info() 2021-05-08 10:52:12 +02:00
hpack-dec.c CLEANUP: Use isttest(const struct ist) whenever possible 2021-03-03 05:07:10 +01:00
hpack-enc.c CLEANUP: include: tree-wide alphabetical sort of include files 2020-06-11 10:18:59 +02:00
hpack-huff.c CONTRIB: move some dev-specific tools to dev/ 2021-04-02 17:48:42 +02:00
hpack-tbl.c MINOR: pool: move pool declarations to read_mostly 2021-04-10 19:27:41 +02:00
http.c MINOR: http: Add HTTP 501-not-implemented error message 2021-01-21 15:21:12 +01:00
http_acl.c CLEANUP: acl: don't reference the generic pattern deletion function anymore 2020-11-05 19:27:09 +01:00
http_act.c BUG/MINOR: http: Missing calloc return value check in parse_http_req_capture 2021-05-31 10:50:55 +02:00
http_ana.c MINOR: http-ana: Use -1 status for client aborts during queuing and connect 2021-06-02 17:17:34 +02:00
http_conv.c MINOR: http-conv: Don't check if argument list is set in sample converters 2021-01-29 13:26:02 +01:00
http_fetch.c BUG/MINOR: http_fetch: fix possible uninit sockaddr in fetch_url_ip/port 2021-05-10 14:48:55 +02:00
http_htx.c MINOR: errors: specify prefix "config" for parsing output 2021-06-07 17:19:16 +02:00
http_rules.c BUG/MINOR: http: Missing calloc return value check while parsing redirect rule 2021-05-31 10:51:08 +02:00
htx.c CLEANUP: htx: Remove unsued hdrs_bytes field from the HTX start-line 2021-04-28 10:51:08 +02:00
init.c CLEANUP: lists/tree-wide: rename some list operations to avoid some confusion 2021-04-21 09:20:17 +02:00
lb_chash.c CLEANUP: backend: fix incorrect comments on locking conditions for lb functions 2021-06-04 15:40:50 +02:00
lb_fas.c CLEANUP: backend: fix incorrect comments on locking conditions for lb functions 2021-06-04 15:40:50 +02:00
lb_fwlc.c CLEANUP: backend: fix incorrect comments on locking conditions for lb functions 2021-06-04 15:40:50 +02:00
lb_fwrr.c CLEANUP: backend: fix incorrect comments on locking conditions for lb functions 2021-06-04 15:40:50 +02:00
lb_map.c MINOR: lb/map: use seek lock and read locks where appropriate 2020-10-17 19:04:27 +02:00
listener.c BUILD: listener: include proxy.h from listener.c 2021-05-08 20:35:39 +02:00
log.c REORG: errors: split errors reporting function from log.c 2021-06-07 16:58:15 +02:00
lru.c CLEANUP: lists/tree-wide: rename some list operations to avoid some confusion 2021-04-21 09:20:17 +02:00
mailers.c MEDIUM: mailers: use "HAProxy" nor "HAproxy" in the subject of messages 2021-05-09 06:45:16 +02:00
map.c MINOR: map/acl: print the count of all the map/acl entries in "show map/acl" 2021-05-25 08:44:45 +02:00
mjson.c MINOR: sample: converter: Add mjson library. 2021-04-15 17:05:38 +02:00
mqtt.c CLEANUP: assorted typo fixes in the code and comments 2020-12-21 11:24:48 +01:00
mux_fcgi.c CLEANUP: mux-fcgi: Don't needlessly store result of data/trailers parsing 2021-06-02 12:04:42 +02:00
mux_h1.c MINOR: errors: specify prefix "config" for parsing output 2021-06-07 17:19:16 +02:00
mux_h2.c MEDIUM: connection: close front idling connection on soft-stop 2021-05-05 14:39:23 +02:00
mux_pt.c MINOR: trace: make trace sources read_mostly 2021-04-10 19:29:26 +02:00
mworker-prog.c CLEANUP: lists/tree-wide: rename some list operations to avoid some confusion 2021-04-21 09:20:17 +02:00
mworker.c BUG/MINOR: worker: Missing calloc return value check in mworker_env_to_proc_list 2021-05-31 10:51:06 +02:00
namespace.c REORG: include: move the error reporting functions to from log.h to errors.h 2020-06-11 10:18:59 +02:00
pattern.c MINOR: map/acl: print the count of all the map/acl entries in "show map/acl" 2021-05-25 08:44:45 +02:00
payload.c BUILD: payload: include tools.h in payload.c 2021-05-08 13:55:40 +02:00
peers.c BUG/MINOR: peers: Missing calloc return value check in peers_register_table 2021-05-31 10:50:46 +02:00
pipe.c CLEANUP: atomic/tree-wide: replace single increments/decrements with inc/dec 2021-04-07 18:18:37 +02:00
pool.c BUG/MAJOR: pools: fix possible race with free() in the lockless variant 2021-06-10 17:46:50 +02:00
proto_quic.c MINOR: fd: move .linger_risk into fdtab[].state 2021-04-07 18:07:49 +02:00
proto_sockpair.c MINOR: fd: move .linger_risk into fdtab[].state 2021-04-07 18:07:49 +02:00
proto_tcp.c MINOR: fd: move .linger_risk into fdtab[].state 2021-04-07 18:07:49 +02:00
proto_udp.c BUILD: udp: include tools.h from proto_udp.c 2021-05-08 13:59:56 +02:00
proto_uxdg.c BUG/MINOR: protocol: add missing support of dgram unix socket. 2021-03-18 18:30:29 +01:00
proto_uxst.c MINOR: fd: move .linger_risk into fdtab[].state 2021-04-07 18:07:49 +02:00
protocol.c CLEANUP: lists/tree-wide: rename some list operations to avoid some confusion 2021-04-21 09:20:17 +02:00
proxy.c MINOR: errors: specify prefix "config" for parsing output 2021-06-07 17:19:16 +02:00
qpack-tbl.c CLEANUP: qpack: Wrong comment about the draft for QPACK static header table. 2021-01-04 12:31:28 +01:00
queue.c BUILD: queue: include tools.h from queue.c 2021-05-08 13:59:05 +02:00
quic_cc.c MINOR: quic: Import C source code files for QUIC protocol. 2020-12-23 11:57:26 +01:00
quic_cc_newreno.c MINOR: quic: Add traces to congestion avoidance NewReno callback. 2020-12-23 11:57:26 +01:00
quic_frame.c CLEANUP: assorted typo fixes in the code and comments 2021-01-06 16:26:50 +01:00
quic_sock.c CLEANUP: lists/tree-wide: rename some list operations to avoid some confusion 2021-04-21 09:20:17 +02:00
quic_tls.c MINOR: quic: Update the initial salt to that of draft-29. 2020-12-23 11:57:26 +01:00
raw_sock.c MINOR: fd: move .linger_risk into fdtab[].state 2021-04-07 18:07:49 +02:00
regex.c OPTIM: regex: PCRE2 use JIT match when JIT optimisation occured. 2020-08-14 07:53:40 +02:00
resolvers.c MINOR: errors: specify prefix "config" for parsing output 2021-06-07 17:19:16 +02:00
ring.c CLEANUP: lists/tree-wide: rename some list operations to avoid some confusion 2021-04-21 09:20:17 +02:00
sample.c BUILD: sample: use strtoll() instead of atoll() 2021-05-14 08:51:53 +02:00
server.c BUG/MINOR: server: explicitly set "none" init-addr for dynamic servers 2021-06-10 17:44:05 +02:00
server_state.c BUILD: server-state: include tools.h from server_state.c 2021-05-08 13:08:34 +02:00
session.c BUILD: session: include tools.h in session.c 2021-05-08 13:03:04 +02:00
sha1.c BUILD: use inttypes.h instead of stdint.h 2019-04-01 07:44:56 +02:00
shctx.c CLEANUP: lists/tree-wide: rename some list operations to avoid some confusion 2021-04-21 09:20:17 +02:00
signal.c CLEANUP: lists/tree-wide: rename some list operations to avoid some confusion 2021-04-21 09:20:17 +02:00
sink.c BUILD: sink: include proxy.h in sink.c 2021-05-08 20:24:09 +02:00
slz.c IMPORT: slz: use inttypes.h instead of stdint.h 2021-05-14 08:44:52 +02:00
sock.c MINOR: fd: move .linger_risk into fdtab[].state 2021-04-07 18:07:49 +02:00
sock_inet.c MINOR: fd: move .exported into fdtab[].state 2021-04-07 18:10:36 +02:00
sock_unix.c MINOR: fd: move .exported into fdtab[].state 2021-04-07 18:10:36 +02:00
ssl_ckch.c MINOR: ssl: Add the "show ssl cert foo.pem.ocsp" CLI command 2021-06-10 16:44:11 +02:00
ssl_crtlist.c MEDIUM: ssl: Chain ckch instances in ca-file entries 2021-05-17 10:50:24 +02:00
ssl_sample.c BUILD: make tune.ssl.keylog available again 2021-06-09 17:10:13 +02:00
ssl_sock.c MINOR: ssl: Add new "show ssl ocsp-response" CLI command 2021-06-10 16:44:11 +02:00
ssl_utils.c BUILD: ssl: ssl_utils requires chunk.h 2021-05-08 12:52:56 +02:00
stats.c BUG/MINOR: stats: fix lastchk metric that got accidently lost 2021-05-12 17:50:16 +02:00
stick_table.c BUG/MINOR: peers: Missing calloc return value check in peers_register_table 2021-05-31 10:50:46 +02:00
stream.c Revert "MEDIUM: http-ana: Deal with L7 retries in HTTP analysers" 2021-05-25 10:51:20 +02:00
stream_interface.c MINOR: http-ana: Perform L7 retries because of status codes in response analyser 2021-05-26 13:56:06 +02:00
task.c MINOR: task: stop including stream.h from task.c 2021-05-08 20:27:08 +02:00
tcp_act.c CLEANUP: atomic/tree-wide: replace single increments/decrements with inc/dec 2021-04-07 18:18:37 +02:00
tcp_rules.c BUG/MINOR: http: Missing calloc return value check while parsing tcp-request rule 2021-05-31 10:51:02 +02:00
tcp_sample.c MINOR: tcp_samples: Be able to call bc_src/bc_dst from the health-checks 2021-04-19 08:31:05 +02:00
tcpcheck.c MINOR: errors: specify prefix "config" for parsing output 2021-06-07 17:19:16 +02:00
thread.c BUILD: thread: include log.h from thread.c 2021-05-08 20:35:39 +02:00
time.c BUG/MEDIUM: time: fix updating of global_now upon clock drift 2021-04-28 17:43:55 +02:00
tools.c CLEANUP: tools: Make errptr const in parse_line() 2021-06-08 10:56:10 +02:00
trace.c CLEANUP: cli/tree-wide: properly re-align the CLI commands' help messages 2021-05-07 11:51:26 +02:00
uri_auth.c CLEANUP: Compare the return value of XXXcmp() functions with zero 2021-01-04 10:09:02 +01:00
uri_normalizer.c MINOR: uri_normalizer: Add fragment-encode normalizer 2021-05-11 17:24:32 +02:00
vars.c BUG/MINOR: vars: Be sure to have a session to get checks variables 2021-06-02 11:55:14 +02:00
version.c BUILD: Fix build by including haproxy/global.h 2020-06-16 23:36:04 +02:00
wdt.c BUILD: wdt: include signal-t.h 2021-05-08 12:29:01 +02:00
xprt_handshake.c MEDIUM: connections: Implement a start() method for xprt_handshake. 2021-03-19 15:33:04 +01:00
xprt_quic.c BUG/MEDIUM: quic: fix null deref on error path in qc_conn_init() 2021-05-10 07:40:27 +02:00