haproxy/include/types
Willy Tarreau 2b57cb8f30 MEDIUM: protocol: implement a "drain" function in protocol layers
Since commit cfd97c6f was merged into 1.5-dev14 (BUG/MEDIUM: checks:
prevent TIME_WAITs from appearing also on timeouts), some valid health
checks sometimes used to show some TCP resets. For example, this HTTP
health check sent to a local server :

  19:55:15.742818 IP 127.0.0.1.16568 > 127.0.0.1.8000: S 3355859679:3355859679(0) win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
  19:55:15.742841 IP 127.0.0.1.8000 > 127.0.0.1.16568: S 1060952566:1060952566(0) ack 3355859680 win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
  19:55:15.742863 IP 127.0.0.1.16568 > 127.0.0.1.8000: . ack 1 win 257
  19:55:15.745402 IP 127.0.0.1.16568 > 127.0.0.1.8000: P 1:23(22) ack 1 win 257
  19:55:15.745488 IP 127.0.0.1.8000 > 127.0.0.1.16568: FP 1:146(145) ack 23 win 257
  19:55:15.747109 IP 127.0.0.1.16568 > 127.0.0.1.8000: R 23:23(0) ack 147 win 257

After some discussion with Chris Huang-Leaver, it appeared clear that
what we want is to only send the RST when we have no other choice, which
means when the server has not closed. So we still keep SYN/SYN-ACK/RST
for pure TCP checks, but don't want to see an RST emitted as above when
the server has already sent the FIN.

The solution against this consists in implementing a "drain" function at
the protocol layer, which, when defined, causes as much as possible of
the input socket buffer to be flushed to make recv() return zero so that
we know that the server's FIN was received and ACKed. On Linux, we can make
use of MSG_TRUNC on TCP sockets, which has the benefit of draining everything
at once without even copying data. On other platforms, we read up to one
buffer of data before the close. If recv() manages to get the final zero,
we don't disable lingering. Same for hard errors. Otherwise we do.

In practice, on HTTP health checks we generally find that the close was
pending and is returned upon first recv() call. The network trace becomes
cleaner :

  19:55:23.650621 IP 127.0.0.1.16561 > 127.0.0.1.8000: S 3982804816:3982804816(0) win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
  19:55:23.650644 IP 127.0.0.1.8000 > 127.0.0.1.16561: S 4082139313:4082139313(0) ack 3982804817 win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
  19:55:23.650666 IP 127.0.0.1.16561 > 127.0.0.1.8000: . ack 1 win 257
  19:55:23.651615 IP 127.0.0.1.16561 > 127.0.0.1.8000: P 1:23(22) ack 1 win 257
  19:55:23.651696 IP 127.0.0.1.8000 > 127.0.0.1.16561: FP 1:146(145) ack 23 win 257
  19:55:23.652628 IP 127.0.0.1.16561 > 127.0.0.1.8000: F 23:23(0) ack 147 win 257
  19:55:23.652655 IP 127.0.0.1.8000 > 127.0.0.1.16561: . ack 24 win 257

This change should be backported to 1.4 which is where Chris encountered
this issue. The code is different, so probably the tcp_drain() function
will have to be put in the checks only.
2013-06-10 20:33:23 +02:00
..
acl.h MEDIUM: acl: have a pointer to the keyword name in acl_expr 2013-04-03 02:13:01 +02:00
arg.h MAJOR: sample: maintain a per-proxy list of the fetch args to resolve 2013-04-03 02:13:02 +02:00
auth.h [REORG] http: move the http-request rules to proto_http 2011-03-13 22:00:24 +01:00
backend.h BUG/MAJOR: checks: don't call set_server_status_* when no LB algo is set 2012-05-19 19:09:46 +02:00
capture.h [MAJOR] last bunch of capture changes for mempool v2 2007-05-13 22:46:04 +02:00
channel.h CLEANUP: channel: remove any reference of the hijackers 2012-11-11 23:05:39 +01:00
checks.h MINOR: checks: add on-marked-up option 2012-06-03 23:48:42 +02:00
compression.h MEDIUM: compression: use pool for comp_ctx 2012-11-21 01:56:47 +01:00
connection.h REORG: tproxy: prepare the transparent proxy defines for accepting other OSes 2013-05-11 08:03:37 +02:00
counters.h MINOR: stats: report the total number of compressed responses per front/back 2012-11-24 14:54:13 +01:00
fd.h MAJOR: polling: remove unused callbacks from the poller struct 2012-11-11 21:02:34 +01:00
freq_ctr.h [MINOR] freq_ctr: add new types and functions for periods different from 1s 2010-08-10 14:01:09 +02:00
global.h MINOR: ssl: add a global tunable for the max SSL/TLS record size 2013-02-21 07:53:13 +01:00
hdr_idx.h [BUG] files were missing for hdr_idx in previous commit 2006-12-04 02:20:02 +01:00
lb_chash.h [MEDIUM] build: switch ebtree users to use new ebtree version 2009-10-26 21:10:04 +01:00
lb_fas.h MEDIUM: backend: add the 'first' balancing algorithm 2012-02-21 22:27:27 +01:00
lb_fwlc.h [MEDIUM] build: switch ebtree users to use new ebtree version 2009-10-26 21:10:04 +01:00
lb_fwrr.h [MEDIUM] build: switch ebtree users to use new ebtree version 2009-10-26 21:10:04 +01:00
lb_map.h [CLEANUP] proxy: move last lb-specific bits to their respective files 2009-10-03 18:41:18 +02:00
listener.h MINOR: ssl: add support for the "alpn" bind keyword 2013-04-03 02:13:02 +02:00
log.h BUG/MINOR: log: make log-format, unique-id-format and add-header more independant 2012-12-28 09:51:00 +01:00
obj_type.h MAJOR: connection: replace struct target with a pointer to an enum 2012-11-12 00:42:33 +01:00
peers.h MEDIUM: checks: Add agent health check 2013-02-13 11:03:28 +01:00
pipe.h [MEDIUM] introduce pipe pools 2009-01-25 13:49:53 +01:00
port_range.h [MEDIUM] add support for binding to source port ranges during connect 2009-06-10 12:23:32 +02:00
proto_http.h MEDIUM: http: add support for "http-request tarpit" rule 2012-12-28 14:47:19 +01:00
proto_tcp.h MEDIUM: counters: add support for tracking a third counter 2013-05-29 00:37:16 +02:00
protocol.h MEDIUM: protocol: implement a "drain" function in protocol layers 2013-06-10 20:33:23 +02:00
proxy.h BUG/MEDIUM: log: fix regression on log-format handling 2013-04-12 18:13:46 +02:00
queue.h [MAJOR] ported pendconn to mempools v2 2007-05-13 20:19:55 +02:00
sample.h MINOR: sample: provide a function to report the name of a sample check point 2013-04-03 02:13:00 +02:00
server.h MEDIUM: connection: introduce "struct conn_src" for servers and proxies 2012-12-09 10:04:39 +01:00
session.h MINOR: log: add a new flag 'L' for locally processed requests 2013-06-10 16:42:09 +02:00
signal.h [MEDIUM] signals: add support for registering functions and tasks 2010-08-27 18:00:40 +02:00
ssl_sock.h MEDIUM: ssl: improve crt-list format to support negation 2013-05-07 22:11:54 +02:00
stick_table.h MEDIUM: counters: add a new "gpc0_rate" counter in stick-tables 2013-05-29 15:54:14 +02:00
stream_interface.h MEDIUM: stats: add proxy name filtering on the statistic page 2013-04-15 22:50:33 +02:00
task.h [MEDIUM] signals: add support for registering functions and tasks 2010-08-27 18:00:40 +02:00
template.h [CLEANUP] included common/version.h everywhere 2006-06-29 18:54:54 +02:00