haproxy/src
Bin Wang 95fad5ba4b BUG/MAJOR: stream-int: don't re-arm recv if send fails
When
    1) HAProxy is configured to enable splicing in both directions
    2) after some high load, there are 2 input channels whose socket buffer
       is non-empty and whose pipe is full at the same time, sitting in
       `fd_cache` without any other fds,

the 2 channels will repeatedly be stopped for receiving (pipe full) and woken
up for receiving (data in socket), thus going out of and back into `fd_cache`
and making their fds swap locations in `fd_cache`.

There is a `if (entry < fd_cache_num && fd_cache[entry] != fd) continue;`
statement in `fd_process_cached_events` to prevent frequent polling, but since
the only 2 fds are constantly swapping location, `fd_cache[entry] != fd` will
always hold true, thus HAProxy can't make any progress.
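
The swap described above can be reproduced with a toy model. This is only a
sketch under assumptions: `toy_fd_cache`, `toy_cache_add()`, `toy_cache_del()`
and `toy_process_cached_events()` are hypothetical stand-ins, not HAProxy's
code. Removal is assumed to move the last entry into the freed slot, and the
I/O handler is reduced to "leave the cache, then optionally come back". The
`limit` argument exists only so that the demonstration terminates.

```c
#include <assert.h>

#define TOY_MAX_FDS 8

int toy_fd_cache[TOY_MAX_FDS];   /* hypothetical stand-in for fd_cache */
int toy_cache_num;               /* number of entries in use */

/* Append an fd at the end of the cache. */
void toy_cache_add(int fd)
{
    assert(toy_cache_num < TOY_MAX_FDS);
    toy_fd_cache[toy_cache_num++] = fd;
}

/* Remove an fd by moving the last entry into its slot (assumed policy). */
void toy_cache_del(int fd)
{
    for (int i = 0; i < toy_cache_num; i++) {
        if (toy_fd_cache[i] == fd) {
            toy_fd_cache[i] = toy_fd_cache[--toy_cache_num];
            return;
        }
    }
}

/* Walk the cache once. Returns the number of handler calls, capped at
 * <limit>. When <rearm> is non-zero, each fd re-enters the cache right
 * after leaving it, as happens when a failed send wakes the reader up.
 */
int toy_process_cached_events(int rearm, int limit)
{
    int calls = 0;
    int entry = 0;

    while (entry < toy_cache_num && calls < limit) {
        int fd = toy_fd_cache[entry];

        calls++;
        toy_cache_del(fd);          /* pipe full: stop receiving */
        if (rearm)
            toy_cache_add(fd);      /* woken up again: back in the cache */

        /* same test as quoted above: the slot now holds another fd,
         * so it is processed again without advancing.
         */
        if (entry < toy_cache_num && toy_fd_cache[entry] != fd)
            continue;
        entry++;
    }
    return calls;
}
```

With fds 3 and 4 in the cache, `toy_process_cached_events(1, 1000)` returns
1000 (the cap is reached and both fds are still cached), whereas
`toy_process_cached_events(0, 1000)` returns 2 and leaves the cache empty.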

The root cause of the issue is twofold:
  - there is a single fd_cache, for next events and for the ones being
    processed, while using two distinct arrays would avoid the problem.

  - the write side of the stream interface wakes the read side up even
    when it couldn't write, and this one really is a bug.

Due to CF_WRITE_PARTIAL not being cleared during fast forwarding, a failed
send() attempt will still cause ->chk_rcv() to be called on the other side,
re-creating an entry for its connection fd in the cache, causing the same
sequence to be repeated indefinitely without any opportunity to make progress.

CF_WRITE_PARTIAL used to serve the purpose of these tests: checking whether a
recent write operation was performed. It's part of the CF_WRITE_ACTIVITY
set and is tested to check if timeouts need to be updated. It's also used to
detect if a failed connect() may be retried.

What this patch does is use CF_WROTE_DATA() to check for a successful write
for connection retransmits, and clear CF_WRITE_PARTIAL before preparing
to send in stream_int_notify(). This way, timeouts are still updated each
time a write succeeds, but chk_rcv() won't be called anymore after a failed
write.
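
The effect of clearing the flag before the send attempt can be illustrated
with a small stand-alone model. It is a sketch, not the patch itself:
`struct toy_channel`, `TOY_CF_WRITE_PARTIAL` (with an arbitrary value),
`toy_send()` and `toy_notify()` are hypothetical names, and the "send" only
pretends to write up to `room` bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_CF_WRITE_PARTIAL 0x1u   /* illustrative value, not HAProxy's */

struct toy_channel {
    unsigned int flags;     /* holds TOY_CF_WRITE_PARTIAL when set */
    int chk_rcv_calls;      /* how many times the reader was woken up */
};

/* Pretend to send <len> bytes with <room> bytes available in the pipe.
 * The flag is set only when at least one byte was written.
 */
int toy_send(struct toy_channel *chn, int room, int len)
{
    int sent = len < room ? len : room;

    if (sent > 0)
        chn->flags |= TOY_CF_WRITE_PARTIAL;
    return sent;
}

/* Try to send, then wake the reader up if the flag reports a write.
 * With <clear_first> false, a flag left over from an earlier write
 * survives a failed send and still triggers the wake-up.
 */
int toy_notify(struct toy_channel *chn, int room, int len, bool clear_first)
{
    int sent;

    assert(chn != NULL);
    if (clear_first)
        chn->flags &= ~TOY_CF_WRITE_PARTIAL;

    sent = toy_send(chn, room, len);

    if (chn->flags & TOY_CF_WRITE_PARTIAL)
        chn->chk_rcv_calls++;       /* stands in for ->chk_rcv() */
    return sent;
}
```

Starting from a channel whose flag is already set and with `room` equal to 0,
`toy_notify()` with `clear_first` false still increments `chk_rcv_calls`,
while the same call with `clear_first` true leaves it at 0. A later call with
room available increments it as expected.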

It seems the fix is required all the way down to 1.5.

Without this patch, the only workaround at this point is to disable splicing
in at least one direction. Strictly speaking, splicing is not absolutely
required, as regular forwarding could theoretically cause the issue to happen
if the timing is appropriate, but in practice it appears impossible to
reproduce it without splicing, and even with splicing it may vary.

The following config manages to reproduce it after a few attempts (haproxy
going 100% CPU and having to be killed) :

  global
      maxpipes 50000
      maxconn 10000

  listen srv1
      option splice-request
      option splice-response
      bind :8001
      server s1 127.0.0.1:8002

  server$ tcploop 8002 L N20 A R10 S1000000 R10 S1000000 R10 S1000000 R10 S1000000 R10 S1000000
  client$ tcploop 8001 N20 C T S1000000 R10 J
2017-10-05 11:20:16 +02:00
..
51d.c CLEANUP: 51d: move global settings out of the global section 2016-12-21 21:30:54 +01:00
acl.c BUG/MEDIUM: map/acl: fix unwanted flags inheritance. 2017-07-04 10:45:53 +02:00
applet.c MINOR: applet: Check applets_active_queue before processing applets queue 2017-09-05 10:21:29 +02:00
arg.c BUG/MEDIUM: arg: ensure that we properly unlink unresolved arguments on error 2017-04-13 12:20:52 +02:00
auth.c CLEANUP: auth: use the build options list to report its support 2016-12-21 21:30:54 +01:00
backend.c MEDIUM: check: server states and weight propagation re-work 2017-09-05 15:23:16 +02:00
base64.c [MINOR] add encode/decode function for 30-bit integers from/to base64 2010-10-30 19:04:33 +02:00
buffer.c MINOR: buffers: Move swap_buffer into buffer.c and add deinit_buffer function 2017-09-05 10:34:30 +02:00
cfgparse.c MINOR: listeners: new function create_listeners 2017-09-15 11:49:52 +02:00
channel.c BUG/MEDIUM: buffers: Fix how input/output data are injected into buffers 2017-03-31 14:36:04 +02:00
checks.c MEDIUM: checks: do not allocate a permanent connection anymore 2017-10-04 19:36:29 +02:00
chunk.c MINOR: chunks: Use dedicated function to init/deinit trash buffers 2017-09-05 10:22:20 +02:00
cli.c BUG/MEDIUM: cli: fix "show fd" crash when dumping closed FDs 2017-10-04 20:28:26 +02:00
compression.c MINOR: compression: fix -vv output without zlib/slz 2017-01-11 16:11:11 +01:00
connection.c MEDIUM: connection: get rid of data->init() which was not for data 2017-08-30 07:04:04 +02:00
da.c CLEANUP: da: move global settings out of the global section 2016-12-21 21:30:54 +01:00
dns.c MINOR: net_helper: add functions to read from vectors 2017-09-20 11:27:31 +02:00
ev_epoll.c MINOR: polling: Use fd_update_events to update events seen for a fd 2017-09-05 15:45:11 +02:00
ev_kqueue.c MINOR: polling: Use fd_update_events to update events seen for a fd 2017-09-05 15:45:11 +02:00
ev_poll.c MINOR: polling: Use fd_update_events to update events seen for a fd 2017-09-05 15:45:11 +02:00
ev_select.c MINOR: polling: Use fd_update_events to update events seen for a fd 2017-09-05 15:45:11 +02:00
fd.c MINOR: fd: Move (de)allocation of fdtab and fdinfo in (de)init_pollers 2017-09-05 10:49:45 +02:00
filters.c BUG/MEDIUM: filters: Be sure to call flt_end_analyze for both channels 2017-07-06 23:07:36 +02:00
flt_http_comp.c BUG/MINOR: compression: Check response headers before http-response rules eval 2017-09-15 18:42:23 +02:00
flt_spoe.c BUG/MINOR: spoe: Don't rely on SPOE ctx in debug message when its creation failed 2017-09-15 18:42:23 +02:00
flt_trace.c MINOR: filters: Add check_timeouts callback to handle timers expiration on streams 2016-11-21 15:29:58 +01:00
freq_ctr.c BUG/MINOR: time: frequency counters are not totally accurate 2012-12-29 21:50:07 +01:00
frontend.c MINOR: frontend: don't retrieve ALPN on the critical path 2017-09-15 11:49:27 +02:00
haproxy.c MEDIUM: check: server states and weight propagation re-work 2017-09-05 15:23:16 +02:00
hash.c MINOR: hash: add new function hash_crc32 2015-01-20 19:48:05 +01:00
hdr_idx.c OPTIM/MINOR: move the hdr_idx pools out of the proxy struct 2011-10-24 18:15:04 +02:00
hlua.c MEDIUM: session: count the frontend's connections at a single place 2017-09-15 11:49:52 +02:00
hlua_fcn.c BUG/MINOR: lua: Fix bitwise logic for hlua_server_check_* functions. 2017-07-28 15:24:57 +02:00
i386-linux-vsys.c MEDIUM: listener: add support for linux's accept4() syscall 2012-10-08 20:11:03 +02:00
lb_chash.c MEDIUM: check: server states and weight propagation re-work 2017-09-05 15:23:16 +02:00
lb_fas.c MEDIUM: check: server states and weight propagation re-work 2017-09-05 15:23:16 +02:00
lb_fwlc.c MEDIUM: check: server states and weight propagation re-work 2017-09-05 15:23:16 +02:00
lb_fwrr.c MEDIUM: check: server states and weight propagation re-work 2017-09-05 15:23:16 +02:00
lb_map.c MEDIUM: check: server states and weight propagation re-work 2017-09-05 15:23:16 +02:00
listener.c MEDIUM: session: count the frontend's connections at a single place 2017-09-15 11:49:52 +02:00
log.c BUG/MINOR: log: fixing small memory leak in error code path. 2017-09-21 17:44:31 +02:00
lru.c MINOR: lru: new function to delete <nb> least recently used keys 2016-01-11 07:31:35 +01:00
mailers.c MEDIUM: Add parsing of mailers section 2015-02-03 00:24:16 +01:00
map.c MINOR: add severity information to cli feedback messages 2017-09-13 13:38:32 +02:00
memory.c MINOR: memory: remove macros 2017-07-21 09:54:03 +02:00
namespace.c CLEANUP: namespaces: use the build options list to report it 2016-12-21 21:30:54 +01:00
pattern.c BUG/MEDIUM: map/acl: fix unwanted flags inheritance. 2017-07-04 10:45:53 +02:00
payload.c BUG: payload: fix payload not retrieving arbitrary lengths 2017-03-20 07:25:37 +01:00
peers.c MEDIUM: session: count the frontend's connections at a single place 2017-09-15 11:49:52 +02:00
pipe.c BUILD/MINOR: silent a build warning in src/pipe.c (fcntl) 2011-10-24 17:09:22 +02:00
proto_http.c BUG/MEDIUM: http: Return an error when url_dec sample converter failed 2017-10-05 11:11:34 +02:00
proto_tcp.c BUG/MEDIUM: tcp/http: set-dst-port action broken 2017-10-04 04:36:17 +02:00
proto_udp.c CLEANUP: fix inconsistency between fd->iocb, proto->accept and accept() 2016-04-14 11:18:22 +02:00
proto_uxst.c BUG/MINOR: unix: properly check for octal digits in the "mode" argument 2017-10-04 14:43:44 +02:00
protocol.c BUILD: protocol: fix some build errors on OpenBSD 2016-08-10 19:31:58 +02:00
proxy.c MINOR: listeners: make listeners count consistent with reality 2017-09-15 11:49:52 +02:00
queue.c MEDIUM: check: server states and weight propagation re-work 2017-09-05 15:23:16 +02:00
raw_sock.c REORG/MEDIUM: connection: introduce the notion of connection handle 2017-08-24 19:30:04 +02:00
rbtree.c [MINOR] imported the rbtree function from Linux kernel 2007-01-07 02:12:57 +01:00
regex.c MEDIUM: regex: pcre2 support 2016-12-28 12:51:51 +01:00
sample.c MINOR: samples: Handle the type SMP_T_METH when we duplicate a sample in smp_dup 2017-07-24 17:15:47 +02:00
server.c BUG/MEDIUM: server: unwanted behavior leaving maintenance mode on tracked stopping server (take2) 2017-09-21 17:37:38 +02:00
session.c MEDIUM: session: count the frontend's connections at a single place 2017-09-15 11:49:52 +02:00
shctx.c MEDIUM: ssl: Add support for OpenSSL 1.1.0 2016-11-08 20:54:41 +01:00
signal.c MEDIUM: mworker: handle reload and signals 2017-06-02 10:56:32 +02:00
ssl_sock.c MINOR: ssl: Remove useless checks on bind_conf or bind_conf->is_ssl 2017-09-15 18:42:23 +02:00
standard.c MINOR: tools: add a portable timegm() alternative 2017-07-19 19:15:06 +02:00
stats.c MINOR: unix: remove the now unused proto_uxst.h file 2017-09-15 11:49:52 +02:00
stick_table.c MINOR: add severity information to cli feedback messages 2017-09-13 13:38:32 +02:00
stream.c BUG/MAJOR: stream-int: don't re-arm recv if send fails 2017-10-05 11:20:16 +02:00
stream_interface.c BUG/MAJOR: stream-int: don't re-arm recv if send fails 2017-10-05 11:20:16 +02:00
task.c MINOR: tasks: Move Lua notification from Lua to tasks 2017-09-11 18:59:40 +02:00
tcp_rules.c MINOR: tcp-rules: check that the listener exists before updating its counters 2016-12-22 23:26:37 +01:00
time.c CLEANUP: time: curr_sec_ms doesn't need to be exported 2017-03-29 15:24:33 +02:00
trace.c BUG/MEDIUM: trace.c: rdtsc() is defined in two files 2016-04-09 22:27:01 +02:00
uri_auth.c CLEANUP: uniformize last argument of malloc/calloc 2016-04-03 14:17:42 +02:00
vars.c MINOR: samples: Don't allocate memory for SMP_T_METH sample when method is known 2017-07-24 17:16:11 +02:00
wurfl.c CLEANUP: wurfl: move global settings out of the global section 2016-12-21 21:30:54 +01:00
xxhash.c CLEANUP: remove unneeded casts 2016-04-03 14:17:42 +02:00