By default, when data is sent over a socket, both the write timeout and the
read timeout for that socket are refreshed, because we consider that there is
activity on that socket, and we have no other means of guessing if we should
receive data or not.
While this default behaviour is desirable for almost all applications, there
exists a situation where it is desirable to disable it, and only refresh the
read timeout if there is incoming data. This happens on sessions with large
timeouts and low amounts of exchanged data such as telnet sessions. If the
server suddenly disappears, the output data accumulates in the system's
socket buffers, both timeouts are correctly refreshed, and there is no way
to know the server does not receive them, so we don't time out. However, when
the underlying protocol always echoes sent data, it would be enough by itself
to detect the issue using the read timeout. Note that this problem does not
happen with more verbose protocols because data won't accumulate long in the
socket buffers.
When this option is set on the frontend, it will disable read timeout updates
on data sent to the client. There is probably little use for this case. When
the option is set on the backend, it will disable read timeout updates on
data sent to the server. Doing so will typically break large HTTP posts from
slow lines, so use it with caution.
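The difference between the two behaviours can be sketched as follows. This is
only an illustrative model in Python, not haproxy's code: the SocketTimers
class, its fields and the "independent" flag are hypothetical names chosen for
the example.

```python
from dataclasses import dataclass


@dataclass
class SocketTimers:
    """Hypothetical model of one socket's expiration dates, in milliseconds."""
    timeout_ms: int
    independent: bool = False  # models the option described above
    read_expire: int = 0
    write_expire: int = 0

    def on_send(self, now_ms: int) -> None:
        # Sending data always refreshes the write timeout.
        self.write_expire = now_ms + self.timeout_ms
        # By default it also refreshes the read timeout.
        if not self.independent:
            self.read_expire = now_ms + self.timeout_ms

    def on_receive(self, now_ms: int) -> None:
        # Only incoming data refreshes the read timeout in every case.
        self.read_expire = now_ms + self.timeout_ms

    def read_timed_out(self, now_ms: int) -> bool:
        return now_ms >= self.read_expire
```

With the flag set, a peer that keeps accepting data into its buffers but
never answers is detected by the read timeout alone.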
The lbprm structure has moved to backend.h, where it should be, and
all algo-specific types and declarations have moved to their specific
files. The proxy struct is now much more readable.
This patch implements "description" (proxy and global) and "node" (global)
options, removes "node-name" and adds "show-node" & "show-desc" options
for "stats". It also changes the way the header lines (with proxy name) and
the statistics are displayed, so stats no longer look so clumsy with very
long names.
Instead of "node-name" it is possible to use show-node/show-desc with
an optional parameter that overrides a default node/description.
    backend cust-0045
        # report specific values for this customer
        stats show-node Europe
        stats show-desc Master node for Europe, Asia, Africa
This patch adds health logging so it is possible to check what
was happening before a crash. Failed health checks are logged if
the server is UP and succeeded health checks if the server is DOWN,
so the amount of additional information is limited.
I also reworked the code a little:
- check_status_description[] and check_status_info[] are now
joined into check_statuses[]
- set_server_check_status updates not only s->check_status and
s->check_duration but also s->result making the code simpler
Changes in v3:
- for now calculate and use local versions of health/rise/fall/state,
it is a slow path, no harm should be done. One day we may centralize
processing of the checks and remove the duplicated code.
- also log checks that are restoring current state
- use "conditionally succeeded" for 404 with disable-on-404
sess_establish() used to resort to protocol-specific guesses
in order to set rep->analysers. This is no longer needed as it
gets set from the frontend and the backend as a copy of what
was defined in the configuration.
Analyser bitmaps are now stored in the frontend and backend, and
combined at configuration time. That way, set_session_backend()
does not need to perform any protocol-specific combinations.
This Linux-specific option was never really used in production and
has since been superseded by new splicing options brought by recent
Linux kernels.
It caused several particular cases in the code because the kernel
would take care of the session without haproxy being able to do
anything on it, which became hard to handle in the new architecture.
Let's simply get rid of it now that there is a replacement available.
The new statement "persist rdp-cookie" enables RDP cookie
persistence. The RDP cookie is then extracted from the RDP
protocol, and compared against available servers. If a server
matches the RDP cookie, then it gets the connection.
This patch propagates the ACL conditions' "requires" bitfield
to the proxies. This makes it possible to know exactly what a
proxy might have to support for any request, which helps in knowing
whether we have to allocate some space for certain types of
structures or not (eg: the hdr_idx struct).
The concept might be extended to a lot more types of information,
such as detecting whether we need to allocate some space for some
request ACLs which need a result in the response, etc...
This new option enables combining of request buffer data with
the initial ACK of an outgoing TCP connection. Doing so saves
one packet per connection, which is quite noticeable on workloads
consisting mostly of small objects. The option is not enabled by
default.
This option disables TCP quick ack upon accept. It is also
automatically enabled in HTTP mode, unless the option is
explicitly disabled with "no option tcp-smart-accept".
This saves one packet per connection, which can spare a reasonable
amount of bandwidth for servers processing small requests.
Sometimes we would want to implement implicit default options,
but for this we need to be able to disable them, which requires
keeping track of "no option" settings. With this change, an option
explicitly disabled in a defaults section will still be seen as
explicitly disabled. There should be no regression as nothing makes
use of this yet.
Some users want to keep the max sessions/s seen on servers, frontends
and backends for capacity planning. It's easy to grab it while the
session count is updated, so let's keep it.
Some people are using haproxy in a shared environment where the
system logger by default sends alert and emerg messages to all
consoles, which happens when all servers go down on a backend for
instance. These people cannot always change the system configuration
and would like to limit the outgoing messages level in order not to
disturb the local users.
The addition of an optional 4th field on the "log" line permits
exactly this. The minimal log level ensures that all outgoing logs
will have at least this level. So the logs are not filtered out,
just set to this level.
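The effect of the minimal level can be sketched as below. This is a Python
illustration, not the haproxy code: it assumes the standard syslog severity
order (lower number means more severe) and reads "at least this level" as
"messages more severe than the minimal level are relabelled to it". The
function name is hypothetical.

```python
# Standard syslog severities; index 0 is the most severe.
SYSLOG_LEVELS = ["emerg", "alert", "crit", "err",
                 "warning", "notice", "info", "debug"]


def outgoing_level(level: str, min_level: str) -> str:
    """Return the level a message is actually sent with.

    Nothing is filtered out: a message more severe than min_level is
    only relabelled, and less severe messages are left untouched.
    """
    lvl = SYSLOG_LEVELS.index(level)
    cap = SYSLOG_LEVELS.index(min_level)
    return SYSLOG_LEVELS[max(lvl, cap)]
```

For instance, with a minimal level of "notice", an "emerg" message leaves as
"notice" and no longer reaches all consoles, while "info" stays "info".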
Here is a patch I made that allows balancing on any HTTP header
field.
[WT:
made minor changes:
- turned 'balance header name' into 'balance hdr(name)' to match more
closely the ACL syntax for easier future convergence
- renamed the proxy structure fields header_* => hh_*
- made it possible to apply the domain name reduction to any header, not
only "host" since it makes sense to do it with other ones.
Otherwise patch looks good.
/WT]
Some big traffic sites have trouble dealing with logs and tend to
disable them. Here are two new options to help cope with massive
logs.
- dontlog-normal only disables logging for 100% successful
connections, other ones will still be logged
- log-separate-errors will cause non-100% successful connections
to be logged at level "err" instead of level "info" so that a
properly configured syslog daemon can send them to a different
file for longer retention.
I have attached a patch which will add on every http request a new
header 'X-Original-To'. If you have HAProxy running in transparent mode
with a big number of SQUID servers behind it, it is very nice to have
the original destination ip as a common header to make decisions based
on it.
The whole thing is configurable with a new option 'originalto'. I have
updated the source code as well as the documentation. The 'haproxy-en.txt'
and 'haproxy-fr.txt' files are untouched, due to my lack of French
language knowledge. ;)
Also the patch adds this header for IPv4 only. I haven't any IPv6 test
environment running here and don't know if getsockopt() with SO_ORIGINAL_DST
will work on IPv6. If someone knows it and wants to test it I can modify
the diff. Feel free to ask me questions or things which should be changed. :)
--Maik
The byte counters have long been 64-bit to avoid overflows. But on
several sites nowadays, we see session counters wrap around every 10 days
or so. So it was time to switch counters to 64-bit, including
error and warning counters which can theoretically rise as fast as session
counters even if in practice there is very low risk.
The performance impact should not be noticeable since those counters are
only updated once per session. The stats output has been carefully checked
for proper types on both 32- and 64-bit platforms.
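To give an order of magnitude for the wrap-around mentioned above, the
arithmetic can be checked with a few lines of Python. The helper names are
made up for the example, and the 5000 sessions/s figure used afterwards is
only an illustrative load, not a measurement.

```python
SECONDS_PER_DAY = 86_400


def rate_to_wrap(bits: int, days: float) -> float:
    """Average events per second that overflow an unsigned counter
    of `bits` bits in `days` days."""
    return 2 ** bits / (days * SECONDS_PER_DAY)


def days_to_wrap(bits: int, rate_per_s: float) -> float:
    """Days needed to overflow an unsigned counter of `bits` bits
    at a sustained rate of `rate_per_s` events per second."""
    return 2 ** bits / rate_per_s / SECONDS_PER_DAY
```

A 32-bit counter wrapping every 10 days corresponds to a sustained average
of a little under 5000 sessions per second. At that same rate, a 64-bit
counter would need tens of billions of days to wrap.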
Sometimes it is required to let invalid requests pass because
applications sometimes take time to be fixed and other servers
do not care. Thus we provide two new options :
option accept-invalid-http-request (for the frontend)
option accept-invalid-http-response (for the backend)
When those options are set, invalid requests or responses do
not cause a 403/502 error to be generated.
The new "rate-limit sessions" statement sets a limit on the number of
new connections per second on the frontend. As it is extremely accurate
(about 0.1%), it is efficient at limiting resource abuse or DoS.
With this change, all frontends, backends, and servers maintain a session
counter and a timer to compute a session rate over the last second. This
value will be very useful because it varies instantly and can be used to
check thresholds. This value is also reported in the stats in a new "rate"
column.
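One possible way to compute such a rate from a counter and a timer is
sketched below in Python. It is an assumption made for illustration, not a
description of the actual implementation: the SessionRate class and its
fields are hypothetical, and the estimate simply adds the current second's
count to the share of the previous second still inside the one-second window.

```python
class SessionRate:
    """Hypothetical per-second event counter with a sliding estimate."""

    def __init__(self) -> None:
        self.curr_sec = 0  # second the current bucket belongs to
        self.curr = 0      # events counted during that second
        self.prev = 0      # events counted during the previous second

    def _rotate(self, now_ms: int) -> None:
        sec = now_ms // 1000
        if sec == self.curr_sec:
            return
        # After exactly one second the current bucket becomes the
        # previous one; after a longer silence both are stale.
        self.prev = self.curr if sec == self.curr_sec + 1 else 0
        self.curr = 0
        self.curr_sec = sec

    def add(self, now_ms: int, count: int = 1) -> None:
        self._rotate(now_ms)
        self.curr += count

    def rate(self, now_ms: int) -> int:
        """Estimated number of events over the last second."""
        self._rotate(now_ms)
        remaining_ms = 1000 - now_ms % 1000
        return self.curr + self.prev * remaining_ms // 1000

    def may_accept(self, now_ms: int, limit: int) -> bool:
        """Threshold check in the spirit of a per-second limit."""
        return self.rate(now_ms) < limit
```

Because the estimate is recomputed on every read, it reacts as soon as the
traffic changes, which is what makes it usable for threshold checks.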
Each proxy instance, either frontend or backend, now has some room
dedicated to storing a complete dated request or response in case
of parsing error. This will make it possible to consult errors in
order to find the exact cause, which is particularly important for
troubleshooting faulty applications.
The "bind-process" keyword lets the admin select which instances may
run on which process (in multi-process mode). It makes it easier to
more evenly distribute the load across multiple processes by avoiding
having too many listen to the same IP:ports.
Specifying "interface <name>" after the "source" statement allows
one to bind to a specific interface for proxy<->server traffic.
This makes it possible to use multiple links to reach multiple
servers, and to force traffic to pass via an interface different
from the one the system would have chosen based on the routing
table.
Three new options have been added when CONFIG_HAP_LINUX_SPLICE is
set :
- splice-request
- splice-response
- splice-auto
They are used to enable splicing per frontend/backend. They are also
supported in defaults sections. The "splice-auto" option is meant to
automatically turn splice on for buffers marked as fast streamers.
This should save quite a bunch of file descriptors.
It was required to add a new "options2" field to the proxy structure
because the original "options" is full.
When global.maxpipes is not set, it is automatically adjusted to
the max of the sums of all frontends' and backends' maxconns for
those which have at least one splice option enabled.
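One reading of this rule is sketched below in Python: sum the maxconn of the
frontends using splicing, do the same for the backends, and keep the larger
of the two sums. Both this reading and the dictionary keys ("kind",
"maxconn", "splice") are assumptions made for the example.

```python
def auto_maxpipes(proxies) -> int:
    """Guess a pipe limit from hypothetical proxy descriptions.

    Each proxy is a dict with the keys 'kind' ('frontend' or
    'backend'), 'maxconn' (int) and 'splice' (True when at least one
    splice option is enabled).
    """
    fe_sum = sum(p["maxconn"] for p in proxies
                 if p["splice"] and p["kind"] == "frontend")
    be_sum = sum(p["maxconn"] for p in proxies
                 if p["splice"] and p["kind"] == "backend")
    return max(fe_sum, be_sum)
```

Proxies without any splice option do not contribute, so a configuration
which never uses splicing ends up with a limit of zero in this model.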
It is now possible to set or clear a cookie during a redirection. This
is useful for logout pages, or for protecting against some DoSes. Check
the documentation for the options supported by the "redirect" keyword.
(cherry-picked from commit 4af993822e880d8c932f4ad6920db4c9242b0981)
If "drop-query" is present on a "redirect" line using the "prefix" mode,
then the returned Location header will be the request URI without the
query-string. This may be used on some login/logout pages, or when it
must be decided to redirect the user to a non-secure server.
(cherry-picked from commit f2d361ccd73aa16538ce767c766362dd8f0a88fd)
Because I needed it in my situation - here's a quick patch to
allow changing of the "x-forwarded-for" header by using a suboption to
"option forwardfor".
Suboption "header XYZ" will change the header name from "x-forwarded-for" to "XYZ".
Default is still "x-forwarded-for" if the header name isn't defined.
Also the suboption 'except a.b.c.d/z' still works on the same line.
So it's now: option forwardfor [except a.b.c.d[/z]] [header XYZ]
Some people need to inspect contents of TCP requests before
deciding to forward a connection or not. A future extension
of this demand might consist in selecting a server farm
depending on the protocol detected in the request.
For this reason, a new state CL_STINSPECT has been added on
the client side. It is immediately entered upon accept() if
the statement "tcp-request inspect-delay <xxx>" is found in
the frontend configuration. Haproxy will then wait up to
this amount of time trying to find a matching ACL, and will
either accept or reject the connection depending on the
"tcp-request content <action> {if|unless}" rules, where
<action> is either "accept" or "reject".
Note that it only waits that long if no definitive verdict
can be found earlier. That generally implies calling a fetch()
function which does not have enough information to decode
some contents, or a match() function which only finds the
beginning of what it's looking for.
It is only at the ACL level that partial data may be processed
as such, because we need to distinguish between MISS and FAIL
*before* applying the term negation.
Thus it is enough to add "| ACL_PARTIAL" to the last argument
when calling acl_exec_cond() to indicate that we expect
ACL_PAT_MISS to be returned if some data is missing (for
fetch() or match()). This is the only case we may return
this value. For this reason, the ACL check in process_cli()
has become a lot simpler.
A new ACL "req_len" of type "int" has been added. Right now
it is already possible to drop requests which talk too early
(eg: for SMTP) or which don't talk at all (eg: HTTP/SSL).
Also, the acl fetch() functions have been extended in order
to permit reporting of missing data in case of fetch failure,
using the ACL_TEST_F_MAY_CHANGE flag.
The default behaviour is unchanged, and if no rule matches,
the request is accepted.
As a side effect, all layer 7 fetching functions have been
cleaned up so that they now check for the validity of the
layer 7 pointer before dereferencing it.
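The reason MISS must be told apart from FAIL before the negation is applied
can be shown with a small three-valued model. This Python sketch is only an
illustration: the Pat enum, negate() and eval_term() are hypothetical names,
and only the MISS/FAIL distinction and the "partial" flag come from the
description above.

```python
from enum import Enum


class Pat(Enum):
    FAIL = 0  # the data is there and does not match
    MISS = 1  # not enough data yet to decide
    PASS = 2  # the data is there and matches


def negate(result: Pat) -> Pat:
    # "Not enough data" stays undecided whatever the polarity.
    if result is Pat.MISS:
        return Pat.MISS
    return Pat.PASS if result is Pat.FAIL else Pat.FAIL


def eval_term(result: Pat, negated: bool, partial: bool) -> Pat:
    """Evaluate one ACL term, optionally negated.

    Without the partial flag, missing data is folded into FAIL
    before the negation, as for a caller which cannot wait.
    """
    if result is Pat.MISS and not partial:
        result = Pat.FAIL
    return negate(result) if negated else result
```

With a negated term and missing data, folding MISS into FAIL too early would
turn "don't know yet" into a match, hence the need for the distinction.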
This is the first attempt at moving all internal parts from
using struct timeval to integer ticks. These provide simpler
and faster code due to simplified operations, and this change
also saved about 64 bytes per session.
A new header file has been added : include/common/ticks.h.
It is possible that some functions should finally not be inlined
because they're used quite a lot (eg: tick_first, tick_add_ifset
and tick_is_expired). More measurements are required in order to
decide whether this is interesting or not.
Some function and variable names are still subject to change for
better overall logic.
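The kind of simplification integer ticks allow can be sketched as follows.
This is a Python model written for illustration, not the contents of ticks.h:
it assumes 32-bit wrapping millisecond ticks and a zero value meaning "no
expiration", and only borrows the function names quoted above.

```python
TICK_ETERNITY = 0   # assumed sentinel: timer not set
MASK = 0xFFFFFFFF   # ticks are modelled as 32-bit wrapping integers


def _signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a signed difference."""
    x &= MASK
    return x - (1 << 32) if x & 0x80000000 else x


def tick_add(now: int, timeout: int) -> int:
    t = (now + timeout) & MASK
    # Never produce the sentinel by accident on wrap-around.
    return t if t != TICK_ETERNITY else 1


def tick_add_ifset(now: int, timeout: int) -> int:
    if timeout == TICK_ETERNITY:
        return TICK_ETERNITY
    return tick_add(now, timeout)


def tick_is_expired(timer: int, now: int) -> bool:
    if timer == TICK_ETERNITY:
        return False
    return _signed32(timer - now) <= 0


def tick_first(a: int, b: int) -> int:
    """Return the earlier of two timers, ignoring unset ones."""
    if a == TICK_ETERNITY:
        return b
    if b == TICK_ETERNITY:
        return a
    return a if _signed32(a - b) <= 0 else b
```

Each operation is a single integer addition or comparison, and comparisons
keep working across the wrap-around thanks to the signed difference.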
A new "redirect" keyword adds the ability to send an HTTP 301/302/303
redirection to either an absolute location or to a prefix followed by
the original URI. The redirection is conditioned by ACL rules, so it
becomes very easy to move parts of a site to another site using this.
This work was almost entirely done at Exceliance by Emeric Brun.
A test-case has been added in the tests/ directory.
This patch makes it possible to specify a domain used when inserting a cookie
providing session stickiness. Useful for example with wildcard domains.
The patch adds one new variable to the struct proxy: cookiedomain.
When set, the domain is appended to a Set-Cookie header.
Domain name is validated using the new invalid_domainchar() function.
It is basically invalid_char() limited to [A-Za-z0-9_.-]. Yes, the test
is too trivial and does not cover all wrong situations, but the main
purpose is to detect most common mistakes, not intentional abuses.
The underscore ("_") character is not RFC-valid but as it is
often (mis)used, I decided to allow it.
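The check described above amounts to the following, shown here as a Python
sketch rather than the actual C function. Returning the index of the first
offending character (or None) is an assumption made for the example.

```python
import string

# The character set quoted above: [A-Za-z0-9_.-]
ALLOWED = frozenset(string.ascii_letters + string.digits + "_.-")


def invalid_domainchar(name: str):
    """Return the index of the first character outside the allowed
    set, or None when every character is acceptable."""
    for i, c in enumerate(name):
        if c not in ALLOWED:
            return i
    return None
```

As said, this only catches common mistakes such as a space or a slash: a
name like "a..b" passes even though it is not a valid domain.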
This patch adds two optional arguments "len" and "depth" to
"balance uri". They are used to limit the length in characters
of the analysis, as well as the number of directory components
it applies to.
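The effect of the two arguments on the part of the URI which gets hashed can
be sketched as below. This is a Python illustration under stated assumptions,
not the actual code: the query string is assumed to be excluded, one
directory level is assumed to be counted per slash, and the function name is
hypothetical.

```python
def uri_hash_input(uri: str, length: int = 0, depth: int = 0) -> str:
    """Return the part of the URI considered for hashing.

    length: maximum number of characters to analyse (0 = no limit).
    depth:  maximum number of directory components (0 = no limit).
    """
    path = uri.split("?", 1)[0]
    if length > 0:
        path = path[:length]
    if depth > 0:
        slashes = 0
        for i, c in enumerate(path):
            if c == "/":
                slashes += 1
                # Stop right before the slash opening the next level.
                if slashes == depth + 1:
                    path = path[:i]
                    break
    return path
```

With a depth of 1, all objects below "/img" hash to the same value and thus
go to the same server, whatever the rest of the path.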
This patch extends the "url_param" load balancing method by introducing
the "check_post" option. Using this option enables analysis of the beginning
of POST requests to search for the specified URL parameter.
The patch also fixes a few minor typos in comments that were discovered
during code review.
The new "leastconn" LB algorithm selects the server which has the
least established or pending connections. The weights are considered,
so that a server with a weight of 20 will get twice as many connections
as the server with a weight of 10.
The algorithm respects the minconn/maxconn settings, as well as the
slowstart since it is a dynamic algorithm. It also correctly supports
backup servers (one and all).
It is generally suited for protocols with long sessions (such as remote
terminals and databases), as it will ensure that upon restart, a server
with no connection will take all new ones until its load is balanced
with others.
A test configuration has been added in order to ease regression testing.
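The weighted selection can be illustrated with a naive Python model. It is
not the algorithm's implementation (which has to be much faster than a
linear scan): it only shows that picking the server with the lowest
connections-to-weight ratio yields the distribution described above. The
function names and the tuple layout are made up for the example.

```python
from fractions import Fraction


def pick_leastconn(servers):
    """servers: list of (name, weight, conns) tuples.

    Return the name of the usable server with the lowest number of
    connections relative to its weight; ties go to the first one.
    """
    usable = [s for s in servers if s[1] > 0]
    return min(usable, key=lambda s: Fraction(s[2], s[1]))[0]


def simulate(weights, nb_conns):
    """Assign nb_conns long-lived connections, none being closed."""
    conns = {name: 0 for name in weights}
    for _ in range(nb_conns):
        name = pick_leastconn([(s, weights[s], conns[s]) for s in weights])
        conns[name] += 1
    return conns
```

Running simulate({"a": 20, "b": 10}, 300) leaves 200 connections on "a" and
100 on "b", i.e. twice as many on the server with twice the weight.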
This patch adds two new variables: fastinter and downinter.
When server state is:
- non-transitionally UP -> inter (no change)
- transitionally UP (going down), unchecked or transitionally DOWN (going up) -> fastinter
- down -> downinter
It makes it possible to set something like:
server sr6 127.0.51.61:80 cookie s6 check inter 10000 downinter 20000 fastinter 500 fall 3 weight 40
In the above example haproxy uses 10000ms between checks but as soon as
one check fails fastinter (500ms) is used. If the server is down,
downinter (20000ms) is used, or fastinter (500ms) once one check passes.
Fastinter is also used when haproxy starts.
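The selection of the delay before the next check can be summarized by the
following Python sketch. The state names are labels invented for the
example, and falling back to "inter" when fastinter or downinter is not set
is an assumption.

```python
def next_check_interval(state, inter, fastinter=None, downinter=None):
    """Return the delay in ms before the next health check.

    state is one of 'up', 'going_down', 'going_up', 'unchecked'
    or 'down' (hypothetical labels for the states listed above).
    """
    fast = fastinter if fastinter is not None else inter
    down = downinter if downinter is not None else inter
    if state == "up":
        return inter
    if state == "down":
        return down
    if state in ("going_down", "going_up", "unchecked"):
        return fast
    raise ValueError("unknown state: %r" % (state,))
```

With the values of the example above (inter 10000, fastinter 500, downinter
20000), a stable server is checked every 10s, a dead one every 20s, and any
server in transition every 500ms.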
A new "timeout.check" variable was added. If set, haproxy uses it as an additional
read timeout, but only after a connection has already been established. I was
thinking about using "timeout.server" here but most people set this
with an additional reserve but still want checks to kick out laggy servers.
Please also note that in most cases check request is much simpler
and faster to handle than normal requests so this timeout should be smaller.
I also changed the timeout used for establishing check connections.
Changes from the previous version:
- use tv_isset() to check if the timeout is set,
- use min("timeout connect", "inter") but only if "timeout check" is set
as this min alone may be too short for full (connect + read) check,
- debug code (fprintf) commented/removed
- documentation
Compile tested only (sorry!) as I'm currently traveling but changes
are rather small and trivial.
In order to offer DoS protection, it may be required to lower the maximum
accepted time to receive a complete HTTP request without affecting the client
timeout. This helps protect against established connections on which
nothing is sent. The client timeout cannot offer a good protection against
this abuse because it is an inactivity timeout, which means that if the
attacker sends one character every now and then, the timeout will not
trigger. With the HTTP request timeout, no matter what speed the client
types, the request will be aborted if it does not complete in time.
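The difference between an inactivity timeout and a deadline on the whole
request can be shown with a small Python simulation. It is only a model:
the function and parameter names are hypothetical, the connection is assumed
to be accepted at time 0, and the request is assumed never to complete.

```python
def abort_time(arrivals_ms, client_timeout_ms, http_request_timeout_ms=None):
    """Return the time at which the connection gets aborted, or None
    if it is still open when the last byte of the list arrives.

    arrivals_ms: sorted times at which the client sends one byte.
    """
    last_activity = 0
    for t in arrivals_ms:
        # The inactivity timeout restarts after each received byte...
        expiry = last_activity + client_timeout_ms
        # ...while the request deadline is fixed once and for all.
        if http_request_timeout_ms is not None:
            expiry = min(expiry, http_request_timeout_ms)
        if t >= expiry:
            return expiry
        last_activity = t
    return None
```

A client sending one byte every 4 seconds never trips a 5 second inactivity
timeout, but is cut after 10 seconds when a 10 second request timeout is set.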
Add the "backlog" parameter to frontends, to give hints to
the system about the approximate listen backlog desired size.
In order to protect against SYN flood attacks, one solution is
to increase the system's SYN backlog size. Depending on the
system, sometimes it is just tunable via a system parameter,
sometimes it is not adjustable at all, and sometimes the system
relies on hints given by the application at the time of the
listen() syscall. By default, HAProxy passes the frontend's
maxconn value to the listen() syscall. On systems which can
make use of this value, it can sometimes be useful to be able
to specify a different value, hence this backlog parameter.
The code in haproxy-1.3.13.1 only supports syslogging to an internet
address. The attached patch:
- Adds support for syslogging to a UNIX domain socket (e.g., /dev/log).
If the address field begins with '/' (absolute file path), then
AF_UNIX is used to construct the socket. Otherwise, AF_INET is used.
- Achieves clean single-source build on both Mac OS X and Linux
(sockaddr_in.sin_len and sockaddr_un.sun_len fields aren't always present).
For handling sendto() failures in send_log(), it appears that the existing
code is fine (no need to close/recreate socket) for both UDP and UNIX-domain
syslog server. So I left things alone (did not close/recreate socket).
Closing/recreating socket after each failure would also work, but would lead
to increased amount of unnecessary socket creation/destruction if syslog is
temporarily unavailable for some reason (especially for verbose loggers).
Please consider this patch for inclusion into the upstream haproxy codebase.
One user reported that an indicator was missing in the statistics:
the number of times each server was selected by load balancing. It
is in fact the total number of sessions assigned to a server by the
load balancing algorithm. It should directly reflect the weight for
"fair" algorithms such as round-robin, since it will not account for
persistent connections.
It should help a lot in tuning each server's weight depending on the
load it receives.
Now the connect timeout, tarpit timeout and queue timeout are
distinct. In order to retain compatibility with older versions,
if either queue or tarpit is left unset both in the proxy and
in the default proxy, then it is inherited from the connect
timeout as before.
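The inheritance rule can be written down as follows. This Python sketch only
restates the rule above: the dictionaries stand for the timeouts explicitly
set in a proxy and in the default proxy, a missing key meaning "unset", and
the function name is hypothetical.

```python
def effective_timeouts(proxy: dict, defaults: dict) -> dict:
    """Resolve the connect, queue and tarpit timeouts of a proxy."""

    def lookup(name):
        # The proxy's own setting wins over the default proxy's one.
        return proxy.get(name, defaults.get(name))

    connect = lookup("connect")
    resolved = {"connect": connect}
    for name in ("queue", "tarpit"):
        value = lookup(name)
        # Unset in both places: inherit from the connect timeout.
        resolved[name] = value if value is not None else connect
    return resolved
```

A configuration which only sets the connect timeout therefore keeps behaving
as with older versions.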
Under certain circumstances, it is very useful to be able to fail some
monitor requests. One specific case is when the number of servers in
the backend falls below a certain level. The new "monitor fail" construct
followed by either "if"/"unless" <condition> makes it possible to specify
ACL-based conditions which will make the monitor return 503 instead of
200. Any number of conditions can be passed. Another use may be to limit
the requests to local networks only.
When an HTTP server returns "404 not found", it indicates that at least
part of it is still running. For this reason, it can be convenient for
application administrators to be able to consider code 404 as valid,
but for a server which does not want to participate in load balancing
anymore. This is useful to seamlessly exclude a server from a farm
without acting on the load balancer. For instance, let's consider that
haproxy checks for the "/alive" file. To enable load balancing on a
server, the admin would simply do :
# touch /var/www/alive
And to disable the server, he would simply do :
# rm /var/www/alive
Another immediate gain from doing this is that it is now possible to
send NOTICE messages instead of ALERT messages when a server is first
disabled, then goes down. This provides a graceful shutdown method.
To enable this behaviour, specify "http-check disable-on-404" in the
backend.