Commit graph

164 commits

Author SHA1 Message Date
Remi Tricot-Le Breton
51058d64a6 MINOR: cache: Consider invalid Age values as stale
Do not store responses that have an invalid age header (non numerical,
negative ...).
2020-12-04 10:21:56 +01:00
Remi Tricot-Le Breton
72cffaf440 MEDIUM: cache: Remove cache entry in case of POST on the same resource
In case of successful unsafe method on a stored resource, the cached entry
must be invalidated (see RFC7234#4.4).
A "non-error response" is one with a 2xx (Successful) or 3xx (Redirection)
status code.
This implies that the primary hash must now be calculated on requests
that have an unsafe method (POST or PUT for instance) so that we can
disable the corresponding entries when we process the response.
2020-12-04 10:21:56 +01:00
Remi Tricot-Le Breton
fcea374fdf MINOR: cache: Add extra "cache-control" value checks
The Cache-Control max-age and s-maxage directives should be followed by
a positive numerical value (see RFC 7234#5.2.1.1). According to the
specs, a sender "should not" generate a quoted-string value but we will
still accept this format.
2020-12-04 10:21:56 +01:00
Remi Tricot-Le Breton
795e1412b0 MINOR: cache: Do not store stale entry
When a response has an Age header (filled in by another cache on the
message's path) that is greater than its defined maximum age (extracted
either from cache-control directives or an expires header), it is
already stale and should not be cached.
2020-12-04 10:21:56 +01:00
Remi Tricot-Le Breton
3243447f83 MINOR: cache: Add entry to the tree as soon as possible
When many concurrent requests targeting the same resource were seen, the
cache could sometimes be filled by too many partial responses resulting
in the impossibility to cache a single one of them. This happened
because the actual tree insertion happened only after all the payload of
every response was seen. So until then, every response was added to the
cache because none of the streams knew that a similar request/response
was already being treated.
This patch consists in adding the cache_entry as soon as possible in the
tree (right after the first packet) so that the other responses do not
get cached as well (if they have the same primary key).
A "complete" flag is also added to the cache_entry so that we know if
all the payload is already stored in the entry or if it is still being
processed.
2020-12-02 16:38:42 +01:00
Remi Tricot-Le Breton
8bb72aa82f MINOR: cache: Improve accept_encoding_normalizer
Turn the "Accept-Encoding" value to lower case before processing it.
Calculate the CRC on every token instead of a sorted concatenation of
them all (in order to avoir copying them) then XOR all the CRCs into a
single hash (while ignoring duplicates).
2020-12-02 16:32:54 +01:00
Tim Duesterhus
23b2945c1c BUG/CRITICAL: cache: Fix trivial crash by sending accept-encoding header
Since commit 3d08236cb3 HAProxy can be trivially
crashed remotely by sending an `accept-encoding` HTTP request header that
contains 16 commas.

This is because the `values` array in `accept_encoding_normalizer` accepts only
16 entries and it is not verified whether the end is reached during looping.

Fix this issue by checking the length. This patch also simplifies the ist
processing in the loop, because it manually calculated offsets and lengths,
when the ist API exposes perfectly safe functions to advance and truncate ists.

I wonder whether the accept_encoding_normalizer function is able to re-use some
existing function for parsing headers that may contain lists of values. I'll
leave this evaluation up to someone else, only patching the obvious crash.

This commit is 2.4-dev specific and was merged just a few hours ago. No
backport needed.
2020-11-25 10:23:00 +01:00
Remi Tricot-Le Breton
754b2428d3 MINOR: cache: Add a process-vary option that can enable/disable Vary processing
The cache section's process-vary option takes a 0 or 1 value to disable
or enable the vary processing.
When disabled, a response containing such a header will never be cached.
When enabled, we will calculate a preliminary hash for a subset of request
headers on all the incoming requests (which might come with a cpu cost) which
will be used to build a secondary key for a given request (see RFC 7234#4.1).
The default value is 0 (disabled).
2020-11-24 16:52:57 +01:00
Remi Tricot-Le Breton
1785f3dd96 MEDIUM: cache: Add the Vary header support
Calculate a preliminary secondary key for every request we see so that
we can have a real secondary key if the response is cacheable and
contains a manageable Vary header.
The cache's ebtree is now allowed to have multiple entries with the same
primary key. Two of those entries will be distinguished thanks to
secondary keys stored in the cache_entry (based on hashes of a subset of
their headers).
When looking for an entry in the cache (cache_use), we still use the
primary key (built the same way as before), but in case of match, we
also need to check if the entry has a vary signature. If it has one, we
need to perform an extra check based on the newly built secondary key.
We will only be able to forge a response out of the cache if both the
primary and secondary keys match with one of our entries. Otherwise the
request will be forwarder to the server.
2020-11-24 16:52:57 +01:00
Remi Tricot-Le Breton
3d08236cb3 MINOR: cache: Prepare helper functions for Vary support
The Vary functionality is based on a secondary key that needs to be
calculated for every request to which a server answers with a Vary
header. The Vary header, which can only be found in server responses,
determines which headers of the request need to be taken into account in
the secondary key. Since we do not want to have to store all the headers
of the request until we have the response, we will pre-calculate as many
sub-hashes as there are headers that we want to manage in a Vary
context. We will only focus on a subset of headers which are likely to
be mentioned in a Vary response (accept-encoding and referer for now).
Every managed header will have its own normalization function which is
in charge of transforming the header value into a core representation,
more robust to insignificant changes that could exist between multiple
clients. For instance, two accept-encoding values mentioning the same
encodings but in different orders should give the same hash.
This patch adds a function that parses a Vary header value and checks if
all the values belong to our supported subset. It also adds the
normalization functions for our two headers, as well as utility
functions that can prebuild a secondary key for a given request and
transform it into an actual secondary key after the vary signature is
determined from the response.
2020-11-24 16:52:57 +01:00
Christopher Faulet
fc633b6eff CLEANUP: config: Return ERR_NONE from config callbacks instead of 0
Return ERR_NONE instead of 0 on success for all config callbacks that should
return ERR_* codes. There is no change because ERR_NONE is a macro equals to
0. But this makes the return value more explicit.
2020-11-13 16:26:10 +01:00
Remi Tricot-Le Breton
cc9bf2e5fe MEDIUM: cache: Change caching conditions
Do not cache responses that do not have an explicit expiration time
(s-maxage or max-age Cache-Control directives or Expires header) or a
validator (ETag or Last-Modified headers) anymore, as suggested in
RFC 7234#3.
The TX_FLAG_IGNORE flag is used instead of the TX_FLAG_CACHEABLE so as
not to change the behavior of the checkcache option.
2020-11-12 11:22:05 +01:00
Remi Tricot-Le Breton
8c2db71326 BUG/MINOR: cache: Inverted variables in http_calc_maxage function
The maxage and smaxage variables were inadvertently assigned the
Cache-Control s-maxage and max-age values respectively when it should
have been the other way around.

This can be backported on all branches after 1.8 (included).
2020-10-30 14:29:29 +01:00
Remi Tricot-Le Breton
a6476114ec MINOR: cache: Add Expires header value parsing
When no Cache-Control max-age or s-maxage information is present in a
cached response, we need to parse the Expires header value (RFC 7234#5.3).
An invalid Expires date value or a date earlier than the reception date
will make the cache_entry stale upon creation.
For now, the Cache-Control and Expires headers are parsed after the
insertion of the response in the cache so even if the parsing of the
Expires results in an already stale entry, the entry will exist in the
cache.
2020-10-30 11:08:38 +01:00
Remi Tricot-Le Breton
bf97121f1c MINOR: cache: Create res.cache_hit and res.cache_name sample fetches
Res.cache_hit sample fetch returns a boolean which is true when the HTTP
response was built out of a cache. The cache's name is returned by the
res.cache_name sample_fetch.

This resolves GitHub issue #900.
2020-10-27 18:25:43 +01:00
Remi Tricot-Le Breton
53161d81b8 MINOR: cache: Process the If-Modified-Since header in conditional requests
If a client sends a conditional request containing an If-Modified-Since
header (and no If-None-Match header), we try to compare the date with
the one stored in the cache entry (coming either from a Last-Modified
head, or a Date header, or corresponding to the first response's
reception time). If the request's date is earlier than the stored one,
we send a "304 Not Modified" response back. Otherwise, the stored is sent
(through a 200 OK response).

This resolves GitHub issue #821.
2020-10-27 18:10:25 +01:00
Remi Tricot Le Breton
27091b4dd0 MINOR: cache: Store the "Last-Modified" date in the cache_entry
In order to manage "If-Modified-Since" requests, we need to keep a
reference time for our cache entries (to which the conditional request's
date will be compared).
This reference is either extracted from the "Last-Modified" header, or
the "Date" header, or the reception time of the response (in decreasing
order of priority).
The date values are converted into seconds since epoch in order to ease
comparisons and to limit storage space.
2020-10-27 18:10:25 +01:00
Tim Duesterhus
e0142340b2 BUG/MINOR: cache: Check the return value of http_replace_res_status
Send the full body if the status `304` cannot be applied. This should be the
most graceful failure.

Specific for 2.3, no backport needed.
2020-10-27 17:01:49 +01:00
Remi Tricot-Le Breton
6cb10384a3 MEDIUM: cache: Add support for 'If-None-Match' request header
Partial support of conditional HTTP requests. This commit adds the
support of the 'If-None-Match' header (see RFC 7232#3.2).
When a client specifies a list of ETags through one or more
'If-None-Match' headers, they are all compared to the one that might have
been stored in the corresponding http cache entry until one of them
matches.
If a match happens, a specific "304 Not Modified" response is
sent instead of the cached data. This response has all the stored
headers but no other data (see RFC 7232#4.1). Otherwise, the whole cached data
is sent.
Although unlikely in a GET/HEAD request, the "If-None-Match: *" syntax is
valid and also receives a "304 Not Modified" response (RFC 7434#4.3.2).

This resolves a part of GitHub issue #821.
2020-10-22 16:10:20 +02:00
Remi Tricot-Le Breton
dbb65b5a7a MEDIUM: cache: Store the ETag information in the cache_entry
When sent by a server for a given resource, the ETag header is
stored in the coresponding cache entry (as any other header). So in
order to perform future ETag comparisons (for subsequent conditional
HTTP requests), we keep the length of the ETag and its offset
relative to the start of the cache_entry.
If no ETag header exists, the length and offset are zero.
2020-10-22 16:10:20 +02:00
Tim Duesterhus
d7c6e6a71d CLEANUP: cache: Fix leak of cconf->c.name during config check
During the config check, the post parsing is not performed. Thus, cache filters
are not fully initialized and their cache name are never released. To be able to
release them, a flag is now set when a cache filter is fully initialized. On
deinit, if the flag is not set, it means the cache name must be freed.

The patch should fix #849. No backport needed.

[Cf: Tim is the patch author, but I added the commit message]
2020-10-07 14:07:29 +02:00
Tim Duesterhus
ff4d86becd MINOR: cache: Reject duplicate cache names
Using a duplicate cache name most likely is the result of a misgenerated
configuration. There is no good reason to allow this, as the duplicate
caches can't be referred to.

This commit resolves GitHub issue #820.

It can be argued whether this is a fix for a bug or not. I'm erring on the
side of caution and marking this as a "new feature". It can be considered for
backporting to 2.2, but for other branches the risk of accidentally breaking
some working (but non-ideal) configuration might be too large.
2020-08-18 22:51:24 +02:00
Tim Duesterhus
ea969f6f26 DOC: cache: Use '<name>' instead of '<id>' in error message
When the cache name is left out in 'filter cache' the error message refers
to a missing '<id>'. The name of the cache is called 'name' within the docs.

Adjust the error message for consistency.

The error message was introduced in 99a17a2d91.
This commit first appeared in 1.9, thus the patch must be backported to 1.9+.
2020-08-18 22:51:24 +02:00
Christopher Faulet
810df06145 MEDIUM: htx: Add a flag on a HTX message when no more data are expected
The HTX_FL_EOI flag must now be set on a HTX message when no more data are
expected. Most of time, it must be set before adding the EOM block. Thus, if
there is no space for the EOM, there is still an information to know all data
were received and pushed in the HTX message. There is only an exception for the
HTTP replies (deny, return...). For these messages, the flag is set after all
blocks are pushed in the message, including the EOM block, because, on error,
we remove all inserted data.
2020-07-22 16:43:32 +02:00
Willy Tarreau
b2551057af CLEANUP: include: tree-wide alphabetical sort of include files
This patch fixes all the leftovers from the include cleanup campaign. There
were not that many (~400 entries in ~150 files) but it was definitely worth
doing it as it revealed a few duplicates.
2020-06-11 10:18:59 +02:00
Willy Tarreau
36979d9ad5 REORG: include: move the error reporting functions to from log.h to errors.h
Most of the files dealing with error reports have to include log.h in order
to access ha_alert(), ha_warning() etc. But while these functions don't
depend on anything, log.h depends on a lot of stuff because it deals with
log-formats and samples. As a result it's impossible not to embark long
dependencies when using ha_warning() or qfprintf().

This patch moves these low-level functions to errors.h, which already
defines the error codes used at the same places. About half of the users
of log.h could be adjusted, sometimes revealing other issues such as
missing tools.h. Interestingly the total preprocessed size shrunk by
4%.
2020-06-11 10:18:59 +02:00
Willy Tarreau
6be7849f39 REORG: include: move cfgparse.h to haproxy/cfgparse.h
There's no point splitting the file in two since only cfgparse uses the
types defined there. A few call places were updated and cleaned up. All
of them were in C files which register keywords.

There is nothing left in common/ now so this directory must not be used
anymore.
2020-06-11 10:18:58 +02:00
Willy Tarreau
dfd3de8826 REORG: include: move stream.h to haproxy/stream{,-t}.h
This one was not easy because it was embarking many includes with it,
which other files would automatically find. At least global.h, arg.h
and tools.h were identified. 93 total locations were identified, 8
additional includes had to be added.

In the rare files where it was possible to finalize the sorting of
includes by adjusting only one or two extra lines, it was done. But
all files would need to be rechecked and cleaned up now.

It was the last set of files in types/ and proto/ and these directories
must not be reused anymore.
2020-06-11 10:18:58 +02:00
Willy Tarreau
a264d960f6 REORG: include: move proxy.h to haproxy/proxy{,-t}.h
This one is particularly difficult to split because it provides all the
functions used to manipulate a proxy state and to retrieve names or IDs
for error reporting, and as such, it was included in 73 files (down to
68 after cleanup). It would deserve a small cleanup though the cut points
are not obvious at the moment given the number of structs involved in
the struct proxy itself.
2020-06-11 10:18:58 +02:00
Willy Tarreau
aeed4a85d6 REORG: include: move log.h to haproxy/log{,-t}.h
The current state of the logging is a real mess. The main problem is
that almost all files include log.h just in order to have access to
the alert/warning functions like ha_alert() etc, and don't care about
logs. But log.h also deals with real logging as well as log-format and
depends on stream.h and various other things. As such it forces a few
heavy files like stream.h to be loaded early and to hide missing
dependencies depending where it's loaded. Among the missing ones is
syslog.h which was often automatically included resulting in no less
than 3 users missing it.

Among 76 users, only 5 could be removed, and probably 70 don't need the
full set of dependencies.

A good approach would consist in splitting that file in 3 parts:
  - one for error output ("errors" ?).
  - one for log_format processing
  - and one for actual logging.
2020-06-11 10:18:58 +02:00
Willy Tarreau
c7babd8570 REORG: include: move filters.h to haproxy/filters{,-t}.h
Just a minor change, moved the macro definitions upwards. A few caller
files were updated since they didn't need to include it.
2020-06-11 10:18:58 +02:00
Willy Tarreau
c2b1ff04e5 REORG: include: move http_ana.h to haproxy/http_ana{,-t}.h
It was moved without any change, however many callers didn't need it at
all. This was a consequence of the split of proto_http.c into several
parts that resulted in many locations to still reference it.
2020-06-11 10:18:58 +02:00
Willy Tarreau
f1d32c475c REORG: include: move channel.h to haproxy/channel{,-t}.h
The files were moved with no change. The callers were cleaned up a bit
and a few of them had channel.h removed since not needed.
2020-06-11 10:18:58 +02:00
Willy Tarreau
5e539c9b8d REORG: include: move stream_interface.h to haproxy/stream_interface{,-t}.h
Almost no changes, removed stdlib and added buf-t and connection-t to
the types to avoid a warning.
2020-06-11 10:18:58 +02:00
Willy Tarreau
83487a833c REORG: include: move cli.h to haproxy/cli{,-t}.h
Almost no change except moving the cli_kw struct definition after the
defines. Almost all users had both types&proto included, which is not
surprizing since this code is old and it used to be the norm a decade
ago. These places were cleaned.
2020-06-11 10:18:58 +02:00
Willy Tarreau
c761f843da REORG: include: move http_rules.h to haproxy/http_rules.h
There was no include file. This one still includes types/proxy.h.
2020-06-11 10:18:57 +02:00
Willy Tarreau
122eba92b7 REORG: include: move action.h to haproxy/action{,-t}.h
List.h was missing for LIST_ADDQ(). A few unneeded includes of action.h
were removed from certain files.

This one still relies on applet.h and stick-table.h.
2020-06-11 10:18:57 +02:00
Willy Tarreau
87735330d1 REORG: include: move http_htx.h to haproxy/http_htx{,-t}.h
A few includes had to be added, namely list-t.h in the type file and
types/proxy.h in the proto file. actions.h was including http-htx.h
but didn't need it so it was dropped.
2020-06-11 10:18:57 +02:00
Willy Tarreau
334099c324 REORG: include: move shctx to haproxy/shctx{,-t}.h
Minor cleanups were applied, some includes were missing from the types
file and some were incorrect in a few C files (duplicated or not using
path).
2020-06-11 10:18:57 +02:00
Willy Tarreau
16f958c0e9 REORG: include: split common/htx.h into haproxy/htx{,-t}.h
Most of the file was a large set of HTX elements manipulation functions
and few types, so splitting them allowed to further reduce dependencies
and shrink the build time. Doing so revealed that a few files (h2.c,
mux_pt.c) needed haproxy/buf.h and were previously getting it through
htx.h. They were fixed.
2020-06-11 10:18:57 +02:00
Willy Tarreau
6131d6a731 REORG: include: move common/net_helper.h to haproxy/net_helper.h
No change was necessary.
2020-06-11 10:18:57 +02:00
Willy Tarreau
8d36697dee REORG: include: move base64.h, errors.h and hash.h from common to to haproxy/
These ones do not depend on any other file. One used to include
haproxy/api.h but that was solely for stddef.h.
2020-06-11 10:18:56 +02:00
Willy Tarreau
4c7e4b7738 REORG: include: update all files to use haproxy/api.h or api-t.h if needed
All files that were including one of the following include files have
been updated to only include haproxy/api.h or haproxy/api-t.h once instead:

  - common/config.h
  - common/compat.h
  - common/compiler.h
  - common/defaults.h
  - common/initcall.h
  - common/tools.h

The choice is simple: if the file only requires type definitions, it includes
api-t.h, otherwise it includes the full api.h.

In addition, in these files, explicit includes for inttypes.h and limits.h
were dropped since these are now covered by api.h and api-t.h.

No other change was performed, given that this patch is large and
affects 201 files. At least one (tools.h) was already freestanding and
didn't get the new one added.
2020-06-11 10:18:42 +02:00
Willy Tarreau
8d2b777fe3 REORG: ebtree: move the include files from ebtree to include/import/
This is where other imported components are located. All files which
used to directly include ebtree were touched to update their include
path so that "import/" is now prefixed before the ebtree-related files.

The ebtree.h file was slightly adjusted to read compiler.h from the
common/ subdirectory (this is the only change).

A build issue was encountered when eb32sctree.h is loaded before
eb32tree.h because only the former checks for the latter before
defining type u32. This was addressed by adding the reverse ifdef
in eb32tree.h.

No further cleanup was done yet in order to keep changes minimal.
2020-06-11 09:31:11 +02:00
Christopher Faulet
2a37cdbe6b BUG/MINOR: cache: Don't needlessly test "cache" keyword in parse_cache_flt()
parse_cache_flt() is the registered callback for the "cache" filter keyword. It
is only called when the "cache" keyword is found on a filter line. So, it is
useless to test the filter name in the callback function.

This patch should fix the issue #634. It may be backported as far as 1.9.
2020-05-18 17:47:18 +02:00
Ilya Shipitsin
6fb0f2148f CLEANUP: assorted typo fixes in the code and comments
This is sixth iteration of typo fixes
2020-04-02 16:25:45 +02:00
Christopher Faulet
65554e1b95 MINOR: cache/filters: Initialize the cache filter when stream is created
Since the HTX mode is the only mode to process HTTP messages, the stream is
created for a uniq transaction. The keep-alive is handled at the mux level. So,
the cache filter can be initialized when the stream is created and released with
the stream. Concretly, .channel_start_analyze and .channel_end_analyze callback
functions are replaced by .attach and .detach ones.

With this change, it is no longer necessary to call FLT_START_FE/BE and FLT_END
analysers for the cache filter.
2020-03-06 15:36:04 +01:00
Christopher Faulet
497c759558 BUG/MEDIUM: cache/filters: Fix loop on HTX blocks caching the response payload
During the payload filtering, the offset is relative to the head of the HTX
message and not its first index. This index is the position of the first block
to (re)start the HTTP analysis. It must be used during HTTP analysis but not
during the payload forwarding.

So, from the cache point of view, when we loop on the HTX blocks to cache the
response payload, we must start from the head of the HTX message. To ease the
loop, we use the function htx_find_offset().

This patch must be backported as far as 2.0. It depends on the commit "MINOR:
htx: Add a function to return a block at a specific an offset". So this one must
be backported first.
2020-03-06 14:12:59 +01:00
Willy Tarreau
8b5075806d CLEANUP: cache: use read_u32/write_u32 to access the cache entry's hash
Enabling strict aliasing fails on the cache's hash which is a series of
20 bytes cast as u32. And in practice it could even fail on some archs
if the http_txn didn't guarantee the hash was properly aligned. Let's
use read_u32() to read the value and write_u32() to set it, this makes
sure the compiler emits the correct code to access these and knows about
the intentional aliasing.
2020-02-25 09:35:07 +01:00
Tim Duesterhus
d34b1ce5a2 BUG/MINOR: cache: Fix leak of cache name in error path
This issue was introduced in commit 99a17a2d91
which first appeared in tag v1.9-dev11. This bugfix should be backported
to HAProxy 1.9+.
2020-01-18 06:45:54 +01:00