From 3020fde525896bc863fd2b6f43ac313f13a08dfa Mon Sep 17 00:00:00 2001
From: Willy Tarreau
Date: Wed, 8 Apr 2026 17:53:24 +0200
Subject: [PATCH] BUG/MAJOR: slz: always make sure to limit fixed output to
 less than worst case literals

Literals are sent in two ways:
  - in EOB state, unencoded and prefixed with their length
  - in FIXED state, huffman-encoded

And references are only sent in FIXED state.

The API promises that the amount of data will not grow by more than 5
bytes every 65535 input bytes (the comment was adjusted to mention this
last point). This is guaranteed by the literal encoding in EOB state
(BT, LEN, NLEN + bytes), which is supposed to be the worst case by
design.

However, as reported by Greg KH, this is currently not true: the test
that decides whether or not to switch to FIXED state to send references
doesn't properly account for the number of bytes needed to roll back to
the *exact* same state in EOB, which means sending EOB, BT, alignment,
LEN and NLEN in addition to the referenced bytes, versus sending the
encoding for the reference. By not taking into account the cost of
returning to the initial state (BT+LEN+NLEN), it was possible to stay
too long in the FIXED state and to consume the extra bytes that are
needed to return to the EOB state, resulting in much more output data
in case of multiple switchovers (up to 6.25% increase was measured in
tests, or 1/16, which matches worst case estimates based on the code).

And this check is only valid when starting from EOB (in order to
restore the same state that offers this guarantee). When already in
FIXED state, the encoded reference is always smaller than or the same
size as the data. The smallest match length we support is 4 bytes, and
when encoded this is no more than 28 bits, so it is safe to stay in
FIXED state as long as needed while checking the possibility of
switching back to EOB.
This very slightly reduces the compression ratio (-0.17% on a Linux
kernel source tree) but makes sure we respect the API promise of no
more than 5 extra bytes per 65535 bytes of input. A side effect of the
slightly simpler check is a ~7.5% increase in compression speed.

Many thanks to Greg for the detailed report that made it possible to
reproduce the issue. This is libslz upstream commit
002e838935bf298d967f670036efa95822b6c84e.

Note: in haproxy's default configuration (tune.bufsize 16384,
tune.maxrewrite 1024), this problem cannot be triggered, because the
reserve limits input to 15360 bytes, and the overflow is at most 960
bytes, resulting in 16320 bytes total, which still fits into the
buffer. However, reducing tune.maxrewrite below 964, or raising
tune.bufsize above 17408, can result in overflows for specially
crafted patterns. A workaround for larger buffers consists in always
setting tune.maxrewrite to at least 1/16 of tune.bufsize.

Reported-by: Greg Kroah-Hartman
Link: https://www.mail-archive.com/haproxy@formilux.org/msg46837.html
---
 src/slz.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 53 insertions(+), 8 deletions(-)

diff --git a/src/slz.c b/src/slz.c
index 002b345f8..2d670f83c 100644
--- a/src/slz.c
+++ b/src/slz.c
@@ -508,11 +508,11 @@ static void reset_refs(union ref *refs, long count)
 }
 
 /* Compresses bytes from <in> into <out> according to RFC1951. The
- * output result may be up to 5 bytes larger than the input, to which 2 extra
- * bytes may be added to send the last chunk due to BFINAL+EOB encoding (10
- * bits) when <more> is not set. The caller is responsible for ensuring there
- * is enough room in the output buffer for this. The amount of output bytes is
- * returned, and no CRC is computed.
+ * output result may be up to 5 bytes larger than the input for each 65535
+ * input bytes, to which 2 extra bytes may be added to send the last chunk due
+ * to BFINAL+EOB encoding (10 bits) when <more> is not set.
+ * The caller is responsible for ensuring there is enough room in the output
+ * buffer for this. The amount of output bytes is returned, and no CRC is
+ * computed.
  */
 long slz_rfc1951_encode(struct slz_stream *strm, unsigned char *out, const unsigned char *in, long ilen, int more)
 {
@@ -642,10 +642,55 @@ long slz_rfc1951_encode(struct slz_stream *strm, unsigned char *out, const unsig
 	/* direct mapping of dist->huffman code */
 	dist = fh_dist_table[pos - last - 1];
 
-	/* if encoding the dist+length is more expensive than sending
-	 * the equivalent as bytes, lets keep the literals.
+	/* If encoding the dist+length is more expensive than sending
+	 * the equivalent as bytes, let's keep the literals. This is
+	 * important for the compression ratio and also for standing by
+	 * our promise that the output is no longer than input + 5 bytes
+	 * every 65535. The explanation is that a valid stream starts in
+	 * EOB and ends in EOB, which is the state where we can grow by
+	 * up to 5 bytes (1 byte for BT, 2 for LEN, 2 for NLEN, then
+	 * literal data). Sending compressed Huffman data (reference or
+	 * fixed encoding) may only be accepted if it is guaranteed that
+	 * it will not represent more bytes in the worst case. Switching
+	 * from EOB to FIXED requires 3 bits of BT. Then either fixed
+	 * data (8 or 9 bits) or references (15-33 bits) follow.
+	 * Switching back to EOB requires EOB (7 bits), and in order to
+	 * get back to the situation where bytes can be added with no
+	 * overhead, the BT, align, LEN and NLEN have to be sent as well:
+	 *
+	 *   Before switching:
+	 *          [BT] [ALIGN] [LEN] [NLEN] [PLIT]
+	 *   bits:    3     7     16    16    8*plit
+	 *
+	 *   In case of sending mlen bytes as literals:
+	 *          [BT] [ALIGN] [LEN] [NLEN] [PLIT] [MLEN]
+	 *   bits:    3     7     16    16    8*plit 8*mlen
+	 *
+	 *   In case of sending mlen bytes as literals, we add:
+	 *          [MLEN]
+	 *   bits:  8*mlen
+	 *
+	 *   In case of sending a reference, we add in the worst case:
+	 *          [BT] [CODE] [DIST] [EOB] [BT] [ALIGN] [LEN] [NLEN]
+	 *   bits:    3   clen   dlen    7    3     7      16    16
+	 *
+	 * Thus for literals we add 8*mlen bits, and for a reference we
+	 * add clen+dlen+52 bits, so we only send a reference if its
+	 * encoding plus these 52 extra bits is shorter than the literal
+	 * encoding, i.e. if clen+dlen+52 <= 8*mlen. Of course there are
+	 * plenty of opportunities to be much shorter once we switch to
+	 * FIXED, and a stricter tracking could allow sending references
+	 * more often. But here we at least guarantee that if data fit
+	 * as literals, they also fit using a temporary switch to FIXED.
+	 *
+	 * Regarding encoding sizes, clen is 7 for mlen 3..10, up to 12
+	 * for mlen 131..257. dlen is 8 for distances 1..4, up to 21 for
+	 * distances 16385..32768. Thus mlen <= 130 produces clen+dlen
+	 * <= 32 bits. This means that for mlen 4 or above, the encoded
+	 * reference is always smaller than the data it references. This
+	 * is what guarantees that once switched to FIXED we can stay in
+	 * it for as long as needed.
 	 */
-	if ((dist & 0x1f) + (code >> 16) + 8 >= 8 * mlen + bit9)
+	if (strm->state == SLZ_ST_EOB && (dist & 0x1f) + (code >> 16) + 52 > 8 * mlen)
 		goto send_as_lit;
 
 	/* first, copy pending literals */