Since 1.5, the request body analyser has become independant from any
other element and does not even disturb the message forwarder anymore.
And since it's disabled by default, we can place it before most
analysers so that it's can preempt any other one if an intermediary
one enables it.
This behavior is already existing for the "WAIT_HTTP" analyzer,
this patch just extends the system to any analyzer that would
be waked up on response activity.
Commit bb2e669 ("BUG/MAJOR: http: correctly rewind the request body
after start of forwarding") was incorrect/incomplete. It used to rely on
CF_READ_ATTACHED to stop updating msg->sov once data start to leave the
buffer, but this is unreliable because since commit a6eebb3 ("[BUG]
session: clear BF_READ_ATTACHED before next I/O") merged in 1.5-dev1,
this flag is only ephemeral and is cleared once all analysers have
seen it. So we can start updating msg->sov again each time we pass
through this place with new data. With a sufficiently large amount of
data, it is possible to make msg->sov wrap and validate the if()
condition at the top, causing the buffer to advance by about 2GB and
crash the process.
Note that the offset cannot be controlled by the attacker because it is
a sum of millions of small random sizes depending on how many bytes were
read by the server and how many were left in the buffer, only because
of the speed difference between reading and writing. Also, nothing is
written, the invalid pointer resulting from this operation is only read.
Many thanks to James Dempsey for reporting this bug and to Chris Forbes for
narrowing down the faulty area enough to make its root cause analysable.
This fix must be backported to haproxy 1.5.
On the mailing list, seri0528@naver.com reported an issue when
using balance url_param or balance uri. The request would sometimes
stall forever.
Cyril Bonté managed to reproduce it with the configuration below :
listen test :80
mode http
balance url_param q
hash-type consistent
server s demo.1wt.eu:80
and found it appeared with this commit : 80a92c0 ("BUG/MEDIUM: http:
don't start to forward request data before the connect").
The bug is subtle but real. The problem is that the HTTP request
forwarding analyzer refrains from starting to parse the request
body when some LB algorithms might need the body contents, in order
to preserve the data pointer and avoid moving things around during
analysis in case a redispatch is later needed. And in order to detect
that the connection establishes, it watches the response channel's
CF_READ_ATTACHED flag.
The problem is that a request analyzer is not subscribed to a response
channel, so it will only see changes when woken for other (generally
correlated) reasons, such as the fact that part of the request could
be sent. And since the CF_READ_ATTACHED flag is cleared once leaving
process_session(), it is important not to miss it. It simply happens
that sometimes the server starts to respond in a sequence that validates
the connection in the middle of process_session(), that it is detected
after the analysers, and that the newly assigned CF_READ_ATTACHED is
not used to detect that the request analysers need to be called again,
then the flag is lost.
The CF_WAKE_WRITE flag doesn't work either because it's cleared upon
entry into process_session(), ie if we spend more than one call not
connecting.
Thus we need a new flag to tell the connection initiator that we are
specifically interested in being notified about connection establishment.
This new flag is CF_WAKE_CONNECT. It is set by the requester, and is
cleared once the connection succeeds, where CF_WAKE_ONCE is set instead,
causing the request analysers to be scanned again.
For future versions, some better options will have to be considered :
- let all analysers subscribe to both request and response events ;
- let analysers subscribe to stream interface events (reduces number
of useless calls)
- change CF_WAKE_WRITE's semantics to persist across calls to
process_session(), but that is different from validating a
connection establishment (eg: no data sent, or no data to send)
The bug was introduced in 1.5-dev23, no backport is needed.
We store the time stamp of last read in the channel in order to
be able to measure some bit rate and pause lengths. We only use
16 bits which were unused for this. We don't need more, as it
allows us to measure with a millisecond precision for up to 65s.
Since commit 6b66f3e ([MAJOR] implement autonomous inter-socket forwarding)
introduced in 1.3.16-rc1, we've been relying on a stupid mechanism to wake
up the task after a write, which was an exact copy-paste of the reader side.
The principle was that if we empty a buffer and there's no forwarding
scheduled or if the *producer* is not in a connected state, then we wake
the task up.
That does not make any sense. It happens to wake up too late sometimes (eg,
when the request analyser waits for some room in the buffer to start to
work), and leads to unneeded wakeups in client-side keep-alive, because
the task is woken up when the response is sent, while the analysers are
simply waiting for a new request.
In order to fix this, we introduce a new channel flag : CF_WAKE_WRITE. It
is designed so that an analyser can explicitly request being notified when
some data were written. It is used only when the HTTP request or response
analysers need to wait for more room in the buffers. It is automatically
cleared upon wake up.
The flag is also automatically set by the functions which try to write into
a buffer from an applet when they fail (bi_putblk() etc...).
That allows us to remove the stupid condition above and avoid some wakeups.
In http-server-close and in http-keep-alive modes, this reduces from 4 to 3
the average number of wakeups per request, and increases the overall
performance by about 1.5%.
This value is stored as unsigned in chn->to_forward. Having it defined
as signed makes it impossible to pass channel_forward() a previously
saved value because the argument will be zero-extended during the
conversion to long long, while the test will be performed using sign
extension. There is no impact on existing code right now.
Hijackers were functions designed to inject data into channels in the
distant past. They became unused around 1.3.16, and since there has
not been any user of this mechanism to date, it's uncertain whether
the mechanism still works (and it's not really useful anymore). So
better remove it as well as the pointer it uses in the channel struct.
Now that the buffer is moved out of the channel, it is possible to move
the pointer earlier in the struct and reorder some fields. This new
ordering improves overall performance by 2%, mainly saved in the HTTP
parsers and data transfers.
With this commit, we now separate the channel from the buffer. This will
allow us to replace buffers on the fly without touching the channel. Since
nobody is supposed to keep a reference to a buffer anymore, doing so is not
a problem and will also permit some copy-less data manipulation.
Interestingly, these changes have shown a 2% performance increase on some
workloads, probably due to a better cache placement of data.
This is a massive rename of most functions which should make use of the
word "channel" instead of the word "buffer" in their names.
In concerns the following ones (new names) :
unsigned long long channel_forward(struct channel *buf, unsigned long long bytes);
static inline void channel_init(struct channel *buf)
static inline int channel_input_closed(struct channel *buf)
static inline int channel_output_closed(struct channel *buf)
static inline void channel_check_timeouts(struct channel *b)
static inline void channel_erase(struct channel *buf)
static inline void channel_shutr_now(struct channel *buf)
static inline void channel_shutw_now(struct channel *buf)
static inline void channel_abort(struct channel *buf)
static inline void channel_stop_hijacker(struct channel *buf)
static inline void channel_auto_connect(struct channel *buf)
static inline void channel_dont_connect(struct channel *buf)
static inline void channel_auto_close(struct channel *buf)
static inline void channel_dont_close(struct channel *buf)
static inline void channel_auto_read(struct channel *buf)
static inline void channel_dont_read(struct channel *buf)
unsigned long long channel_forward(struct channel *buf, unsigned long long bytes)
Some functions provided by channel.[ch] have kept their "buffer" name because
they are really designed to act on the buffer according to some information
gathered from the channel. They have been moved together to the same place in
the file for better readability but they were not changed at all.
The "buffer" memory pool was also renamed "channel".
Get rid of these confusing BF_* flags. Now channel naming should clearly
be used everywhere appropriate.
No code was changed, only a renaming was performed. The comments about
channel operations was updated.
This is similar to the recent removal of BF_OUT_EMPTY. This flag was very
problematic because it relies on permanently changing information such as the
to_forward value, so it had to be updated upon every change to the buffers.
Previous patch already got rid of its users.
One part of the change is sensible : the flag was also part of BF_MASK_STATIC,
which is used by process_session() to rescan all analysers in case the flag's
status changes. At first glance, none of the analysers seems to change its
mind base on this flag when it is subject to change, so it seems fine not to
add variation checks here. Otherwise it's possible that checking the buffer's
input and output is more reliable than checking the flag's replacement.
This flag was very problematic because it was composite in that both changes
to the pipe or to the buffer had to cause this flag to be updated, which is
not always simple (eg: there may not even be a channel attached to a buffer
at all).
There were not that many users of this flags, mostly setters. So the flag got
replaced with a macro which reports whether the channel is empty or not, by
checking both the pipe and the buffer.
One part of the change is sensible : the flag was also part of BF_MASK_STATIC,
which is used by process_session() to rescan all analysers in case the flag's
status changes. At first glance, none of the analysers seems to change its
mind base on this flag when it is subject to change, so it seems fine not to
add variation checks here. Otherwise it's possible that checking the buffer's
output size is more useful than checking the flag's replacement.