redis/tests
Salvatore Sanfilippo 154fdcee01
Test tcp deadlock fixes (#14667)
**Disclaimer: this patch was created with the help of AI**

My experience with the Redis test not passing on older hardware didn't
stop with the other PR opened for the same problem. There was another
deadlock happening when the test wrote a lot of commands without
reading the replies back, and the cause seems related to something
these tests have in common: they create a deferred client (which does
not read replies at all unless asked to) and flood the server with 1
million requests without reading anything back. This results in a
networking issue where the TCP socket stops accepting more data, and the
test hangs forever.

Reading those replies from time to time allows the test to run on such
older hardware.

Ping oranagra, who introduced at least one of the bulk write tests.
AFAIK there is no problem with the test if we change it in this way,
since the slave buffer is going to be filled anyway. But better to be
sure that it was not intentional to write all that data without reading
it back, for some reason I can't see.

IMPORTANT NOTE: **I am NOT sure at all** that the TCP socket senses
congestion on one side and also stops the other side, but anyway this
fix works well and is likely a good idea in general. At the same time, I
doubt there is a pending bug in Redis that makes it hang if the output
buffer is too large, or if we are flooding the system with too many
commands without reading anything back. So the actual cause remains
unclear. I remember that Redis, when the output limit is reached, could
kill the client, and not lower the priority of command processing. Maybe
Oran knows more about this.

## LLM commit message.

The test "slave buffer are counted correctly" was hanging indefinitely
on slow machines. The test sends 1M pipelined commands without reading
responses, which triggers a TCP-level deadlock.

Root cause: When the test client sends commands without reading
responses:
1. Server processes commands and sends responses
2. Client's TCP receive buffer fills (client not reading)
3. Server's TCP send buffer fills
4. Packets get dropped due to buffer pressure
5. TCP congestion control interprets this as network congestion
6. cwnd (congestion window) drops to 1, RTO increases exponentially
7. After multiple backoffs, RTO reaches ~100 seconds
8. Connection becomes effectively frozen

This was confirmed by examining TCP socket state showing cwnd:1,
backoff:9, rto:102912ms, and rwnd_limited:100% on the client side.
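
The reported figures are at least arithmetically consistent with exponential backoff. The sketch below doubles a base retransmission timeout once per backoff step. The base of 201 ms is an assumption inferred from the numbers above (201 × 2^9 = 102912), and the 120 s cap is assumed to match the usual Linux maximum; neither value is stated in the commit itself.

```python
def backed_off_rto_ms(base_rto_ms: int, backoff: int, max_rto_ms: int = 120_000) -> int:
    """Double the base RTO once per backoff step, capped at max_rto_ms.

    base_rto_ms and max_rto_ms are illustrative assumptions, not values
    taken from the Redis test or from the commit message.
    """
    return min(base_rto_ms * (2 ** backoff), max_rto_ms)


# Walk through backoff steps 0..9 with the assumed 201 ms base.
for step in range(10):
    print(f"backoff={step} rto={backed_off_rto_ms(201, step)}ms")
```

Under these assumptions, one more backoff step would already hit the cap, which is why the connection looks frozen for practical purposes.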

The fix interleaves reads with writes by processing responses every
10,000 commands. This prevents TCP buffers from filling to the point
where congestion control triggers the pathological backoff behavior.
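
The interleaving pattern can be sketched outside the test suite as follows. This is not the actual Tcl change: it is a minimal Python illustration that uses a local socket pair and a toy server that answers every command with `+PONG`. The names `echo_server` and `send_interleaved`, the command format, and the batch size of 500 are all hypothetical.

```python
import socket
import threading


def echo_server(conn: socket.socket) -> None:
    """Toy server: reply +PONG to every newline-terminated command."""
    with conn, conn.makefile("rb") as reader:
        for _line in reader:
            conn.sendall(b"+PONG\n")


def send_interleaved(sock: socket.socket, total: int, batch: int) -> int:
    """Send `total` commands, draining pending replies every `batch` commands."""
    received = 0
    pending = 0
    with sock.makefile("rb") as reader:
        for _ in range(total):
            sock.sendall(b"PING\n")
            pending += 1
            if pending == batch:
                # Read back everything sent so far so buffers never fill up.
                for _ in range(pending):
                    reader.readline()
                    received += 1
                pending = 0
        # Drain replies for the final partial batch, if any.
        for _ in range(pending):
            reader.readline()
            received += 1
    return received


client, server = socket.socketpair()
threading.Thread(target=echo_server, args=(server,), daemon=True).start()
replies = send_interleaved(client, total=10_000, batch=500)
client.close()
print(replies)
```

Because replies are consumed at a bounded interval, the amount of unread data in flight stays small regardless of how many commands are sent in total.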

The test still validates the same functionality (slave buffer memory
accounting) since the measurement happens after all commands complete.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-07 14:26:22 +08:00
assets Optimistic locking for string objects - compare-and-set and compare-and-delete (#14435) 2025-10-21 10:32:49 +03:00
cluster Add Atomic Slot Migration (ASM) support (#14414) 2025-10-22 15:56:20 +03:00
helpers Fix daily CI for atomic slot migration (#14459) 2025-10-25 09:00:33 +08:00
integration Fix infinite loop during reverse iteration due to invalid numfields of corrupted stream (#14472) 2026-01-05 21:16:53 +08:00
modules New eviction policies - least recently modified (#14624) 2026-01-06 20:57:31 +08:00
sentinel Fix timing issue for sentinel master-reboot test (#14312) 2025-09-05 14:49:19 +08:00
support fix test tag leakage that can result in skipping tests (#14572) 2025-11-26 09:13:21 +02:00
tmp minor fixes to the new test suite, html doc updated 2010-05-14 18:48:33 +02:00
unit Test tcp deadlock fixes (#14667) 2026-01-07 14:26:22 +08:00
vectorset Add daily CI for vectorset (#14302) 2025-12-10 08:52:43 +08:00
instances.tcl Fix some daily CI issues (#14217) 2025-07-28 10:53:57 +08:00
README.md Add thread sanitizer run to daily CI (#13964) 2025-06-02 10:13:23 +03:00
test_helper.tcl Add Atomic Slot Migration (ASM) support (#14414) 2025-10-22 15:56:20 +03:00

Redis Test Suite

The normal execution mode of the test suite involves starting and manipulating local redis-server instances, inspecting process state, log files, etc.

The test suite also supports execution against an external server, which is enabled using the --host and --port parameters. When executing against an external server, tests tagged external:skip are skipped.

There are additional runtime options that can further adjust the test suite to match different external server configurations:

| Option | Impact |
| --- | --- |
| --singledb | Only use database 0, don't assume others are supported. |
| --ignore-encoding | Skip all checks for specific encoding. |
| --ignore-digest | Skip key value digest validations. |
| --cluster-mode | Run in strict Redis Cluster compatibility mode. |
| --large-memory | Enables tests that consume more than 100 MB. |

Tags

Tags are applied to tests to classify them according to the subsystem they test, but also to indicate compatibility with different run modes and required capabilities.

Tags can be applied in different context levels:

  • start_server context
  • tags context that bundles several tests together
  • single test context

The following compatibility and capability tags are currently used:

| Tag | Indicates |
| --- | --- |
| external:skip | Not compatible with external servers. |
| cluster:skip | Not compatible with --cluster-mode. |
| large-memory | Test that requires more than 100 MB. |
| tls:skip | Not compatible with --tls. |
| tsan:skip | Not compatible with running under thread sanitizer. |
| needs:repl | Uses replication and needs to be able to SYNC from server. |
| needs:debug | Uses the DEBUG command or other debugging focused commands (like OBJECT REFCOUNT). |
| needs:pfdebug | Uses the PFDEBUG command. |
| needs:config-maxmemory | Uses CONFIG SET to manipulate memory limit, eviction policies, etc. |
| needs:config-resetstat | Uses CONFIG RESETSTAT to reset statistics. |
| needs:reset | Uses RESET to reset client connections. |
| needs:save | Uses SAVE or BGSAVE to create an RDB file. |

When using an external server (--host and --port), filtering using the external:skip tag is done automatically.

When using --cluster-mode, filtering using the cluster:skip tag is done automatically.

When not using --large-memory, filtering using the large-memory tag is done automatically.

It is also possible to filter tests by tag explicitly. For example, to run tests against a server that does not permit SYNC, use:

./runtest --host <host> --port <port> --tags -needs:repl
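
The filtering rules above can be modelled roughly as follows. This is a simplified Python sketch, not the test suite's Tcl implementation. The function names are hypothetical, the tag names come from the table above, and treating tags without a leading `-` as an allow-list is an assumption about how `--tags` behaves.

```python
def parse_tags_option(value: str) -> tuple[list[str], list[str]]:
    """Split a --tags value into (allow, deny); a leading '-' means deny."""
    allow: list[str] = []
    deny: list[str] = []
    for tag in value.split():
        if tag.startswith("-"):
            deny.append(tag[1:])
        else:
            allow.append(tag)
    return allow, deny


def should_run(test_tags, allow, deny, external=False,
               cluster_mode=False, large_memory=False) -> bool:
    """Decide whether a test with test_tags runs under the given options."""
    denied = set(deny)
    # Automatic filters implied by the run mode.
    if external:
        denied.add("external:skip")
    if cluster_mode:
        denied.add("cluster:skip")
    if not large_memory:
        denied.add("large-memory")
    tags = set(test_tags)
    if tags & denied:
        return False
    # Assumption: when allow tags are given, a test needs at least one of them.
    if allow and not (tags & set(allow)):
        return False
    return True


allow, deny = parse_tags_option("-needs:repl")
print(should_run(["repl", "needs:repl"], allow, deny, external=True))
print(should_run(["string"], allow, deny, external=True))
```

With `--tags -needs:repl` against an external server, the first test is skipped because it carries a denied tag, while the second carries none and runs.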