HAProxy - Load balancer
Find a file
Willy Tarreau 21447b1dd4 BUG/MAJOR: stick-tables: fix race with peers in entry expiration
In 2.9 with commit 7968fe3889 ("MEDIUM: stick-table: change the ref_cnt
atomically") we significantly relaxed the stick-tables locking when
dealing with peers by adjusting the ref_cnt atomically and moving it
out of the lock.

However it opened a tiny window that became problematic in 3.0-dev7
when the table's contention was lowered by commit 1a088da7c2 ("MAJOR:
stktable: split the keys across multiple shards to reduce contention").

What happens is that some peers may access the entry for reading at
the moment it's about to expire, and while the read accesses to push
the data remain unnoticed (possibly that from time to time we push
crap), but the releasing of the refcount causes a new write that may
damage anything else. The scenario is the following:

  process_table_expire()               peer_send_teachmsgs()

                                       RDLOCK(&updt_lock);
     tick_is_expired() != 0

     ebmb_delete(ts->key);

     if (ts->upd.node.leaf_p) {
                                       HA_ATOMIC_INC(&ts->ref_cnt);
                                       RDUNLOCK(&updt_lock);
          WRLOCK(&updt_lock);
          eb32_delete(&ts->upd);
     }
     __stksess_free(t, ts);
                                       peer_send_updatemsg(ts);
                                       RDLOCK(&updt_lock);
                                       HA_ATOMIC_DEC(&ts->ref_cnt);

Here it's clear that the bottom part of peer_send_teachmsgs() believes
to be protected but may act on freed data.

This is more visible when enabling -dMtag,no-merge,integrity because
the ATOMIC_DEC(&ref_cnt) decrements one byte in the area, that makes
the eviction check fail while the tag has the address of the left
__stksess_free(), proving a completed pool_free() before the decrement,
and the anomaly there is pretty visible in the crash dump. Changing
INC()/DEC() with ADD(2)/DEC(2) shows that the byte is now off by two,
confirming that the operation happened there.

The solution is not very hard, it consists in checking for the ref_cnt
on the left after grabbing the lock, and doing both before deleting the
element, so that we have the guarantee that either the peer will not
take it or that it has already started taking it.

This was proven to be sufficient, as instead of crashing after 3s of
injection with 4 peers, 16 threads and 130k RPS, it survived for 15mn.

In order to stress the setup, a config involving 4+ peers, tracking
HTTP request with randoms and applying a bwlim-out filter with a
random key, with a client made of 160 h2 conns downloading 10 streams
of 4MB objects in parallel managed to trigger it within a few seconds:

  frontend ft
    http-request track-sc0 rand(100000) table tbl
    filter bwlim-out lim-out limit 2047m key rand(100000000),ipmask(32) min-size 1 table tbl
    http-request set-bandwidth-limit lim-out
    use_backend bk

  backend bk
    server s1 198.18.0.30:8000
    server s2 198.18.0.34:8000

  backend tbl
        stick-table type ip size 1000k expire 1s store http_req_cnt,bytes_in_rate(1s),bytes_out_rate(1s) peers peers

This seems to be very dependent on the timing and setup though.

This will need to be backported to 2.9. This part of the code was
reindented with shards but the block should remain mostly unchanged.
The logic to apply is the same.
2024-04-12 18:00:13 +02:00
.github BUILD: makefile: also drop DEBUG_CFLAGS 2024-04-11 17:33:28 +02:00
addons CLEANUP: assorted typo fixes in the code and comments 2024-03-05 11:50:34 +01:00
admin BUILD: address a few remaining calloc(size, n) cases 2024-02-10 11:37:27 +01:00
dev MEDIUM: ring: use the topmost bit of the tail as a lock 2024-03-25 17:34:19 +00:00
doc DOC: configuration: Add 3.12 Certificate Storage 2024-04-12 15:38:54 +02:00
examples CLEANUP: assorted typo fixes in the code and comments 2023-11-23 16:23:14 +01:00
include MINOR: ssl: rename ckchs_load_cert_file to new_ckch_store_load_files_path 2024-04-12 15:38:54 +02:00
reg-tests REGTESTS: ssl: test simple case of crt-store 2024-04-12 15:38:54 +02:00
scripts CI: ssl: add yet another OpenSSL download fallback 2024-02-07 11:05:45 +01:00
src BUG/MAJOR: stick-tables: fix race with peers in entry expiration 2024-04-12 18:00:13 +02:00
tests Revert "MAJOR: import: update mt_list to support exponential back-off" 2023-09-15 17:13:43 +02:00
.cirrus.yml CI: cirrus-ci: display gdb bt if any 2023-09-22 08:28:30 +02:00
.gitattributes MINOR: Configure the cpp userdiff driver for *.[ch] in .gitattributes 2021-02-22 18:17:57 +01:00
.gitignore CONTRIB: Add vi file extensions to .gitignore 2023-06-02 18:14:34 +02:00
.mailmap DOC: update Tim's address in .mailmap 2021-09-16 09:14:14 +02:00
.travis.yml CI: travis-ci: temporarily disable arm64 builds 2021-08-07 07:28:15 +02:00
BRANCHES DOC: fix some spelling issues over multiple files 2021-01-08 14:53:47 +01:00
BSDmakefile BUILD: makefile: commit the tiny FreeBSD makefile stub 2023-05-24 17:17:36 +02:00
CHANGELOG [RELEASE] Released version 3.0-dev7 2024-04-06 17:02:07 +02:00
CONTRIBUTING CLEANUP: assorted typo fixes in the code and comments 2021-08-16 12:37:59 +02:00
INSTALL DOC: install: clarify the build process by splitting it into subsections 2024-04-11 18:02:26 +02:00
LICENSE LICENSE: add licence exception for OpenSSL 2012-09-07 13:52:26 +02:00
MAINTAINERS MAJOR: spoe: Deprecate the SPOE filter 2024-03-15 11:29:39 +01:00
Makefile CLEANUP: makefile: make the output of the "opts" target more readable 2024-04-11 17:33:28 +02:00
README DOC: create a BRANCHES file to explain the life cycle 2019-06-15 22:00:14 +02:00
SUBVERS BUILD: use format tags in VERDATE and SUBVERS files 2013-12-10 11:22:49 +01:00
VERDATE [RELEASE] Released version 3.0-dev7 2024-04-06 17:02:07 +02:00
VERSION [RELEASE] Released version 3.0-dev7 2024-04-06 17:02:07 +02:00

The HAProxy documentation has been split into a number of different files for
ease of use.

Please refer to the following files depending on what you're looking for :

  - INSTALL for instructions on how to build and install HAProxy
  - BRANCHES to understand the project's life cycle and what version to use
  - LICENSE for the project's license
  - CONTRIBUTING for the process to follow to submit contributions

The more detailed documentation is located into the doc/ directory :

  - doc/intro.txt for a quick introduction on HAProxy
  - doc/configuration.txt for the configuration's reference manual
  - doc/lua.txt for the Lua's reference manual
  - doc/SPOE.txt for how to use the SPOE engine
  - doc/network-namespaces.txt for how to use network namespaces under Linux
  - doc/management.txt for the management guide
  - doc/regression-testing.txt for how to use the regression testing suite
  - doc/peers.txt for the peers protocol reference
  - doc/coding-style.txt for how to adopt HAProxy's coding style
  - doc/internals for developer-specific documentation (not all up to date)