We used to have 3 types of counters with a huge overlap :
- listener counters : stats collected for each bind line
- proxy counters : union of the frontend and backend counters
- server counters : stats collected per server
It happens that quite a good part was common between listeners and
proxies due to the frontend counters being updated at the two locations,
and that similarly the server and proxy counters were overlapping and
being updated together.
This patch cleans this up to propose only two types of counters :
- fe_counters: used by frontends and listeners, related to
incoming connections activity
- be_counters: used by backends and servers, related to outgoing
connections activity
This allowed to remove some non-sensical counters from both parts. For
frontends, the following entries were removed :
cum_lbconn, last_sess, nbpend_max, failed_conns, failed_resp,
retries, redispatches, q_time, c_time, d_time, t_time
For backends, this ones was removed : intercepted_req.
While doing this it was discovered that we used to incorrectly report
intercepted_req for backends in the HTML stats, which was always zero
since it's never updated.
Also it revealed a few inconsistencies (which were not fixed as they
are harmless). For example, backends count connections (cum_conn)
instead of sessions while servers count sessions and not connections.
Over the long term, some extra cleanups may be performed by having
some counters update functions touching both the server and backend
at the same time, as well as both the frontend and listener, to
ensure that all sides have all their stats properly filled. The stats
dump will also be able to factor the dump functions by counter types.
Using the last rate counters, we now compute the queue, connect, response
and total times per server and per backend with a 95% accuracy over the last
1024 samples. The operation is cheap so we don't need to condition it.
Summary:
Track and report last session time on the stats page for each server
in every backend, as well as the backend.
This attempts to address the requirement in the ROADMAP
- add a last activity date for each server (req/resp) that will be
displayed in the stats. It will be useful with soft stop.
The stats page reports this as time elapsed since last session. This
change does not adequately address the requirement for long running
session (websocket, RDP... etc).
Depending on the content-types and accept-encoding fields, some responses
might or might not be compressed. Let's have a counter of the number of
compressed responses and report it in the stats to help improve compression
usage.
Some cosmetic issues were fixed in the CSV output too (missing commas at the
end).
It was a bit frustrating to have no idea about the bandwidth saved by
HTTP compression. Now we have per-frontend and per-backend stats. The
stats on the HTTP interface are shown in a hover title in the "bytes out"
column if at least something was fed to the compressor. 3 new columns
appeared in the CSV stats output.
These requests are mainly monitor requests, as well as stats requests when
the stats are processed by the frontend. Having this counter helps explain
the difference in number of sessions that is sometimes observed between a
frontend and a backend.
It's very annoying that frontend and backend stats are merged because we
don't know what we're observing. For instance, if a "listen" instance
makes use of a distinct backend, it's impossible to know what the bytes_out
means.
Some points take care of not updating counters twice if the backend points
to the frontend, indicating a "listen" instance. The thing becomes more
complex when we try to add support for server side keep-alive, because we
have to maintain a pointer to the backend used for last request, and to
update its stats. But we can't perform such comparisons anymore because
the counters will not match anymore.
So in order to get rid of this situation, let's have both frontend AND
backend stats in the "struct proxy". We simply update the relevant ones
during activity. Some of them are only accounted for in the backend,
while others are just for frontend. Maybe we can improve a bit on that
later, but the essential part is that those counters now reflect what
they really mean.
It's very disturbing to see the "denied req" counter increase without
any other session counter moving. In fact, we can't count a rejected
TCP connection as "denied req" as we have not yet instanciated any
session at all. Let's use a new counter for that.
Now we're able to reject connections very early, so we need to use a
different counter for the connections that are received and the ones
that are accepted and converted into sessions, so that the rate limits
can still apply to the accepted ones. The session rate must still be
used to compute the rate limit, so that we can reject undesired traffic
without affecting the rate.
Often we need to understand why some transfers were aborted or what
constitutes server response errors. With those two counters, it is
now possible to detect an unexpected transfer abort during a data
phase (eg: too short HTTP response), and to know what part of the
server response errors may in fact be assigned to aborted transfers.
It is wrong to merge FE and BE stats for a proxy because when we consult a
BE's stats, it reflects the FE's stats eventhough the BE has received no
traffic. The most common example happens with listen instances, where the
backend gets credited for all the trafic even when a use_backend rule makes
use of another backend.
Implement decreasing health based on observing communication between
HAProxy and servers.
Changes in this version 2:
- documentation
- close race between a started check and health analysis event
- don't force fastinter if it is not set
- better names for options
- layer4 support
Changes in this version 3:
- add stats
- port to the current 1.4 tree
There are a few remaining max values that need to move to counters.
Also, the counters are more often used than some config information,
so get them closer to the other useful struct members for better cache
efficiency.