mirror of https://github.com/redis/redis.git synced 2026-02-03 20:39:54 -05:00

Salvatore Sanfilippo 154fdcee01 Test tcp deadlock fixes (#14667)	2026-01-07 14:26:22 +08:00

Disclaimer: this patch was created with the help of AI

My experience with the Redis test not passing on older hardware didn't stop just with the other PR opened with the same problem. There was another deadlock happening when the test was writing a lot of commands without reading it back, and the cause seems related to the fact that such tests have something in common. They create a deferred client (that does not read replies at all, if not asked to), flood the server with 1 million of requests without reading anything back. This results in a networking issue where the TCP socket stops accepting more data, and the test hangs forever.

To read those replies from time to time allows to run the test on such older hardware.

Ping oranagra that introduced at least one of the bulk writes tests. AFAIK there is no problem in the test, if we change it in this way, since the slave buffer is going to be filled anyway. But better to be sure that it was not intentional to write all those data without reading back for some reason I can't see.

IMPORTANT NOTE: I am NOT sure at all that the TCP socket senses congestion in one side and also stops the other side, but anyway this fix works well and is likely a good idea in general. At the same time, I doubt there is a pending bug in Redis that makes it hang if the output buffer is too large, or we are flooding the system with too many commands without reading anything back. So the actual cause remains cloudy. I remember that Redis, when the output limit is reached, could kill the client, and not lower the priority of command processing. Maybe Oran knows more about this.

## LLM commit message.

The test "slave buffer are counted correctly" was hanging indefinitely on slow machines. The test sends 1M pipelined commands without reading responses, which triggers a TCP-level deadlock.

Root cause: When the test client sends commands without reading responses:

1. Server processes commands and sends responses
2. Client's TCP receive buffer fills (client not reading)
3. Server's TCP send buffer fills
4. Packets get dropped due to buffer pressure
5. TCP congestion control interprets this as network congestion
6. cwnd (congestion window) drops to 1, RTO increases exponentially
7. After multiple backoffs, RTO reaches ~100 seconds
8. Connection becomes effectively frozen

This was confirmed by examining TCP socket state showing cwnd:1, backoff:9, rto:102912ms, and rwnd_limited:100% on the client side.

The fix interleaves reads with writes by processing responses every 10,000 commands. This prevents TCP buffers from filling to the point where congestion control triggers the pathological backoff behavior.

The test still validates the same functionality (slave buffer memory accounting) since the measurement happens after all commands complete.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
..
assets	Optimistic locking for string objects - compare-and-set and compare-and-delete (#14435)	2025-10-21 10:32:49 +03:00
cluster	Add Atomic Slot Migration (ASM) support (#14414)	2025-10-22 15:56:20 +03:00
helpers	Fix daily CI for atomic slot migration (#14459)	2025-10-25 09:00:33 +08:00
integration	Fix infinite loop during reverse iteration due to invalid numfields of corrupted stream (#14472)	2026-01-05 21:16:53 +08:00
modules	New eviction policies - least recently modified (#14624)	2026-01-06 20:57:31 +08:00
sentinel	Fix timing issue for sentinel master-reboot test (#14312)	2025-09-05 14:49:19 +08:00
support	fix test tag leakage that can result in skipping tests (#14572)	2025-11-26 09:13:21 +02:00
tmp	minor fixes to the new test suite, html doc updated	2010-05-14 18:48:33 +02:00
unit	Test tcp deadlock fixes (#14667)	2026-01-07 14:26:22 +08:00
vectorset	Add daily CI for vectorset (#14302)	2025-12-10 08:52:43 +08:00
instances.tcl	Fix some daily CI issues (#14217)	2025-07-28 10:53:57 +08:00
README.md	Add thread sanitizer run to daily CI (#13964)	2025-06-02 10:13:23 +03:00
test_helper.tcl	Add Atomic Slot Migration (ASM) support (#14414)	2025-10-22 15:56:20 +03:00

README.md

Redis Test Suite

The normal execution mode of the test suite involves starting and manipulating local redis-server instances, inspecting process state, log files, etc.

The test suite also supports execution against an external server, which is enabled using the `--host` and `--port` parameters. When executing against an external server, tests tagged `external:skip` are skipped.

There are additional runtime options that can further adjust the test suite to match different external server configurations:

Option	Impact
`--singledb`	Only use database 0, don't assume others are supported.
`--ignore-encoding`	Skip all checks for specific encoding.
`--ignore-digest`	Skip key value digest validations.
`--cluster-mode`	Run in strict Redis Cluster compatibility mode.
`--large-memory`	Enable tests that consume more than 100 MB of memory.
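
As a minimal sketch of how these options combine with `--host` and `--port`, the snippet below assembles a `runtest` command line and prints it instead of executing it. The function name `build_runtest_cmd` and the default host and port values are illustrative assumptions, not part of the test suite; only flags listed in the table above are used.

```shell
#!/bin/sh
# Dry-run sketch: assemble a runtest command line for an external server.
# HOST and PORT defaults are placeholder values (assumptions).
HOST="${HOST:-127.0.0.1}"
PORT="${PORT:-6379}"

# build_runtest_cmd is a hypothetical helper, not part of the test suite.
build_runtest_cmd() {
    # Start with the external-server parameters, then append any
    # extra options passed by the caller (for example --singledb).
    cmd="./runtest --host $HOST --port $PORT"
    for opt in "$@"; do
        cmd="$cmd $opt"
    done
    printf '%s\n' "$cmd"
}

# Example: a server that exposes only database 0 and whose
# key/value digests should not be validated.
build_runtest_cmd --singledb --ignore-digest
```

Printing the command rather than running it makes it easy to review the chosen options before pointing the suite at a real server.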

Tags

Tags are applied to tests to classify them according to the subsystem they test, but also to indicate compatibility with different run modes and required capabilities.

Tags can be applied at different context levels:

* `start_server` context
* `tags` context that bundles several tests together
* A single test context

The following compatibility and capability tags are currently used:

Tag	Indicates
`external:skip`	Not compatible with external servers.
`cluster:skip`	Not compatible with `--cluster-mode`.
`large-memory`	Requires more than 100 MB of memory.
`tls:skip`	Not compatible with `--tls`.
`tsan:skip`	Not compatible with running under thread sanitizer.
`needs:repl`	Uses replication and needs to be able to `SYNC` from server.
`needs:debug`	Uses the `DEBUG` command or other debugging focused commands (like `OBJECT REFCOUNT`).
`needs:pfdebug`	Uses the `PFDEBUG` command.
`needs:config-maxmemory`	Uses `CONFIG SET` to manipulate memory limit, eviction policies, etc.
`needs:config-resetstat`	Uses `CONFIG RESETSTAT` to reset statistics.
`needs:reset`	Uses `RESET` to reset client connections.
`needs:save`	Uses `SAVE` or `BGSAVE` to create an RDB file.

When using an external server (`--host` and `--port`), tests tagged `external:skip` are filtered out automatically.

When using `--cluster-mode`, tests tagged `cluster:skip` are filtered out automatically.

When not using `--large-memory`, tests tagged `large-memory` are filtered out automatically.
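
The three automatic filtering rules can be modeled with a short shell function. This is an illustrative model only, not the test suite's actual implementation: `auto_skipped_tags` is a hypothetical name, and the `large-memory` tag name is taken from the tag table above.

```shell
#!/bin/sh
# Illustrative model: print the tags that the automatic filtering
# rules exclude for a given set of runtest options.
# auto_skipped_tags is a hypothetical helper, not part of the suite.
auto_skipped_tags() {
    external=0; cluster=0; large=0
    for opt in "$@"; do
        case "$opt" in
            --host|--port)  external=1 ;;  # external server in use
            --cluster-mode) cluster=1 ;;
            --large-memory) large=1 ;;
        esac
    done
    skipped=""
    [ "$external" -eq 1 ] && skipped="$skipped external:skip"
    [ "$cluster" -eq 1 ] && skipped="$skipped cluster:skip"
    # Large-memory tests are skipped unless explicitly enabled.
    [ "$large" -eq 0 ] && skipped="$skipped large-memory"
    # Trim the leading space before printing.
    printf '%s\n' "${skipped# }"
}

# Example: external server in cluster mode, without --large-memory.
auto_skipped_tags --host 10.0.0.5 --port 6379 --cluster-mode
```

Note that the large-memory rule is inverted relative to the other two: the tag is excluded by default and only included when the flag is present.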

In addition, tests can be filtered manually with `--tags`. For example, to run tests on a server that does not permit `SYNC`, use:

./runtest --host <host> --port <port> --tags -needs:repl