Commit graph

2527 commits

Author SHA1 Message Date
Mincho Paskalev
b5a37c0e42
Add cmd tips for HOTKEYS. Return err when hotkeys START specifies invalid slots (#14761)
Some checks are pending
CI / test-ubuntu-latest (push) Waiting to run
CI / test-sanitizer-address (push) Waiting to run
CI / build-debian-old (push) Waiting to run
CI / build-macos-latest (push) Waiting to run
CI / build-32bit (push) Waiting to run
CI / build-libc-malloc (push) Waiting to run
CI / build-centos-jemalloc (push) Waiting to run
CI / build-old-chain-jemalloc (push) Waiting to run
Codecov / code-coverage (push) Waiting to run
External Server Tests / test-external-standalone (push) Waiting to run
External Server Tests / test-external-cluster (push) Waiting to run
External Server Tests / test-external-nodebug (push) Waiting to run
Reply-schemas linter / reply-schemas-linter (push) Waiting to run
Spellcheck / Spellcheck (push) Waiting to run
- When slots outside the range owned by a node are passed to `HOTKEYS START
SLOTS ...`, the command now returns an error.
- Changed the cmd tips for the HOTKEYS subcommands so that they reflect
the special nature of the command in cluster mode, i.e. it should be
issued against a single node only. Clients should not be concerned with
cluster management or aggregation of results.
- Changed the reply schema to return an array of maps. For a single node
this returns an array of one element; results fetched from multiple nodes
can then easily be concatenated into one array.
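
Since each node now replies with an array of maps, aggregating across nodes reduces to list concatenation. A minimal client-side sketch (hypothetical helper name, illustrative reply contents):

```python
# Sketch, not Redis source: with each node replying an array of maps,
# a client-side aggregator can simply concatenate the arrays.

def merge_node_replies(replies):
    """Each element of `replies` is one node's array-of-maps reply."""
    merged = []
    for reply in replies:
        merged.extend(reply)
    return merged

node_a = [{"tracking-active": 1, "sample-ratio": 1}]
node_b = [{"tracking-active": 1, "sample-ratio": 10}]
combined = merge_node_replies([node_a, node_b])
print(len(combined))  # 2
```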
2026-02-03 17:54:32 +02:00
Slavomir Kaslev
bafaec5b6a
Fix HOTKEYS to track each command in a MULTI/EXEC block (#14756)
Fix HOTKEYS to track each command in a MULTI/EXEC block.
2026-02-02 09:50:44 +02:00
Slavomir Kaslev
ca681f997e
Add LTRIM/LREM and RM_StringTruncate() memory tracking tests (#14751)
Add LTRIM/LREM and RM_StringTruncate() memory tracking tests.
2026-01-29 13:04:46 +02:00
Mincho Paskalev
591fc90263
Change reply schema for hotkeys get to use map instead of flat array (#14749)
Follow #14680
The reply of `HOTKEYS GET` is an unordered collection of key-value pairs,
so it is more reasonable for it to be a map in RESP3 instead of a flat array.
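
For RESP2 clients that still receive the flat array shape, conversion to the map shape is mechanical. A small sketch, with illustrative field names:

```python
# Sketch: converting a RESP2-style flat [k1, v1, k2, v2, ...] reply into
# the map shape the RESP3 schema now uses. Field names are illustrative.

def pairs_to_map(flat):
    if len(flat) % 2 != 0:
        raise ValueError("flat reply must contain key-value pairs")
    return dict(zip(flat[0::2], flat[1::2]))

flat_reply = ["tracking-active", 1, "sample-ratio", 10]
print(pairs_to_map(flat_reply))  # {'tracking-active': 1, 'sample-ratio': 10}
```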
2026-01-29 11:21:05 +02:00
debing.sun
beb75e40bf
Fix test failure when using bind "*" in introspection.tcl (#14745)
The reason for the failure is that when the server is started with bind *,
the host is set to *; when reconnecting, the client does not recognize
this host.
So this fix skips checking whether the server is ready.
2026-01-27 20:50:06 +08:00
Mincho Paskalev
b209e8afde
Fix hotkey info metric names. Disable HOTKEY SLOTS param for non-cluster (#14742)
Some hotkeys CPU metrics display time in milliseconds, others in
microseconds.

Change the metrics showing command execution time to consistently use
microseconds, with the `-us` suffix to indicate that.

Also, disable the `SLOTS` param for `HOTKEYS START` if we are not in
cluster mode.
2026-01-26 13:32:05 +02:00
Stav-Levi
a765ee8238
Add security configuration warnings at startup (#14708)
Adds startup-time security warnings when the default user permits
unauthenticated access, with behavior dependent on protected-mode and
bind settings.
Warnings are skipped in Sentinel mode since it disables protected-mode
by design.

- No password + no protected-mode + no bind: warn about accepting
  connections from any IP/interface
- No password + no protected-mode: warn about accepting connections
  from any IP on configured interface
- No password + protected-mode enabled: warn about accepting
  connections from local clients
2026-01-26 16:58:53 +08:00
debing.sun
18538461d1
Add separate statistics for active expiration of keys and hash fields (#14727)
### Summary

Adds `expired_keys_active` and `expired_subkeys_active` counters to
track keys and hash fields expired by the active expiration cycle,
distinguishing them from lazy expirations.
These new metrics are exposed in INFO stats output.

### Motivation

Currently, Redis tracks the total number of expired keys (expired_keys)
and expired hash fields (expired_subkeys), but there is no way to
differentiate expirations triggered by the active expire cycle from lazy
expirations.
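
The relationship between the counters can be modeled as follows; this is an illustrative sketch with invented names, not the server's internal accounting:

```python
# Illustrative model (not Redis internals) of the split counters: every
# expiration bumps the total, and active-cycle expirations additionally
# bump the new *_active counter, so lazy = total - active.

from dataclasses import dataclass

@dataclass
class ExpireStats:
    expired_keys: int = 0
    expired_keys_active: int = 0

    def on_key_expired(self, active_cycle: bool):
        self.expired_keys += 1
        if active_cycle:
            self.expired_keys_active += 1

    @property
    def expired_keys_lazy(self):
        return self.expired_keys - self.expired_keys_active

stats = ExpireStats()
stats.on_key_expired(active_cycle=True)   # expired by the active cycle
stats.on_key_expired(active_cycle=False)  # lazily expired on access
print(stats.expired_keys, stats.expired_keys_active, stats.expired_keys_lazy)
# 2 1 1
```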

---------

Co-authored-by: Moti Cohen <moti.cohen@redis.com>
2026-01-22 22:30:25 +08:00
Slavomir Kaslev
5dec7d3675
Add key allocation sizes histogram (#14695)
Add key allocation sizes histograms based on previous memory accounting work
in #14363 and #14451.

The histograms are exposed via `INFO keysizes` and use logarithmic (power-of-2) bins,
similar to the current key sizes/lengths histogram implementation, in the following fields:

    db0_distrib_lists_sizes:1=...,2=...,4=...
    db0_distrib_sets_sizes:1=...,2=...,4=...
    db0_distrib_hashes_sizes:1=...,2=...,4=...
    db0_distrib_zsets_sizes:1=...,2=...,4=...

To avoid confusion with the existing distrib_strings_sizes histograms, which are based
on string lengths, we don't report allocation size histograms for strings.

So far, per-key and per-slot memory accounting code has relied on type-specific functions
(hashTypeAllocSize(), listTypeAllocSize(), zsetAllocSize(), etc.) to compute data structure
allocation sizes, since they are faster and we only need to track size deltas, not the
complete allocation size along with the kvobj and key length overhead. To keep the allocation
sizes histogram consistent, the memory accounting code has been switched to kvobjAllocSize(),
which does return the total allocation size.

Note that the feature is enabled with the `key-bytes-stats` or `cluster-slot-stats` config
in the redis config file at startup.
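
The power-of-2 binning can be sketched as follows; `histogram_bin` is a hypothetical helper, and the bin-label convention (largest power of two not exceeding the size) is an assumption, not taken from the Redis source:

```python
# Sketch of logarithmic (power-of-2) binning, as an illustration of the
# bin labels (1=...,2=...,4=...) shown above. The helper name and the
# exact labeling convention are assumptions.

from collections import Counter

def histogram_bin(alloc_size: int) -> int:
    """Largest power of two <= alloc_size (assumed bin convention)."""
    assert alloc_size >= 1
    return 1 << (alloc_size.bit_length() - 1)

sizes = [1, 2, 3, 4, 5, 8, 9, 1024]
hist = Counter(histogram_bin(s) for s in sizes)
print(sorted(hist.items()))  # [(1, 1), (2, 2), (4, 2), (8, 2), (1024, 1)]
```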
2026-01-22 09:40:04 +02:00
Paulo Sousa
c4baa64ea8
Optimize peak memory stats by switching from per-command checks to threshold-based (#14692)
This PR optimizes peak memory tracking by moving from **per-command
checks** to a **threshold-based mechanism** in `zmalloc`.

Instead of updating peak memory on every command, peak tracking is now
triggered only when a thread's memory delta exceeds **100KB**. This
reduces runtime overhead while keeping peak memory accuracy acceptable.

## Implementation Details

- Peak memory is tracked atomically in `zmalloc` when a thread's memory
delta exceeds 100KB
- Thread-safe peak updates using CAS
- Peak tracking considers both:
  - current used memory
  - zmalloc-reported peak memory
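
The threshold-plus-CAS idea can be sketched as below. This is an illustrative Python model with invented names; a lock stands in for the hardware compare-and-swap that the C code would use:

```python
# Illustrative model of threshold-based peak tracking: each thread keeps
# a local delta and only touches shared state once the delta exceeds the
# 100KB threshold, publishing the peak with a CAS-style retry loop.
# All names here are invented, not zmalloc's actual symbols.

import threading

THRESHOLD = 100 * 1024

class PeakTracker:
    def __init__(self):
        self._lock = threading.Lock()   # stands in for hardware CAS
        self.used = 0
        self.peak = 0

    def _compare_and_swap_peak(self, expected, new):
        with self._lock:
            if self.peak == expected:
                self.peak = new
                return True
            return False

    def flush_delta(self, delta):
        """Publish a thread-local delta, then raise the peak if needed."""
        with self._lock:
            self.used += delta
            snapshot = self.used
        while True:                      # CAS retry loop
            current = self.peak
            if snapshot <= current:
                return
            if self._compare_and_swap_peak(current, snapshot):
                return

class ThreadLocalAccounting:
    def __init__(self, shared):
        self.shared = shared
        self.delta = 0

    def on_alloc(self, n):
        self.delta += n
        if abs(self.delta) >= THRESHOLD:  # threshold check, not per-command
            self.shared.flush_delta(self.delta)
            self.delta = 0

shared = PeakTracker()
local = ThreadLocalAccounting(shared)
for _ in range(300):
    local.on_alloc(1024)                  # 300KB allocated in 1KB steps
print(shared.peak)                        # peak published in 100KB batches
```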

## Performance Results (ARM AArch64)

All performance numbers were obtained on an **AWS m8g.metal (ARM
AArch64)** instance.

The database was pre-populated with **1M keys**, each holding a **1KB
value**.
Benchmarks were executed using memtier with a **10 SET : 90 GET ratio**
and **pipeline = 10** ([full benchmark spec.
here](https://github.com/redis/redis-benchmarks-specification/blob/main/redis_benchmarks_specification/test-suites/memtier_benchmark-1Mkeys-string-setget200c-1KiB-pipeline-10.yml)).

| Environment | Baseline `redis/redis` unstable (median ± std.dev) | Comparison `paulorsousa/redis` `f05a4bd273cb4d63ff03d33e6207837b6e51de86` (median) | % change (higher better) | Note |
|------------------------------|---------------------------------:|----------:|-------:|-----------------------|
| oss-standalone | 802,830 ± 0.2% (7 datapoints) | 796,660 | -0.8% | No change |
| oss-standalone-02-io-threads | 982,698 ± 0.6% (7 datapoints) | 980,520 | -0.2% | No change |
| oss-standalone-04-io-threads | 2,573,244 ± 1.9% (7 datapoints) | 2,630,931 | +2.2% | Potential improvement |
| oss-standalone-08-io-threads | 2,343,609 ± 1.6% (7 datapoints) | 2,455,630 | +4.8% | Improvement |
2026-01-21 22:52:31 +08:00
Mincho Paskalev
e3c38aab66
Handle primary/replica clients in IO threads (#14335)
# Problem

While introducing async IO threads
(https://github.com/redis/redis/pull/13695), primary and replica
clients were left to be handled in the main thread due to data races and
synchronization issues. This PR solves that issue, with the additional
hope that it increases replication performance.

# Overview

## Moving the clients to IO threads

Since clients first participate in the handshake and RDB replication
phases, it was decided to move them to an IO thread after RDB replication
is done. For the primary client this was trivial, as the master client is
created only after RDB sync (plus some additional checks, see
`isClientMustHandledByMainThread`). Replica clients, though, are moved to
IO threads immediately after connection (as are all clients), so
currently in `unstable` replication happens while this client is in an
IO thread. In this PR it is moved to the main thread after receiving the
first `REPLCONF` message from the replica, but this is a bit hacky and we
can remove it. I didn't find issues between the two versions.

## Primary client (replica node)

We have a few issues here:
- during `serverCron`, `replicationCron` is run, which periodically
sends a `REPLCONF ACK` message to the master and also checks for a
timed-out master. To prevent data races we utilize `IOThreadClientsCron`.
The client is periodically sent to the main thread, and during
`processClientsFromIOThread` it is checked whether it needs to run the
replication cron behaviour.

- data races with the main thread: specifically, the `lastinteraction`
and `read_reploff` members of the primary client, which are written in
`readQueryFromClient`, could be accessed at the same time from the main
thread during execution of `INFO REPLICATION` (`genRedisInfoString`). To
solve this the members were duplicated: if the client is in an IO thread
it writes to the duplicates, and they are synced with the original
variables each time the client is sent to the main thread (meaning `INFO
REPLICATION` could potentially return stale values).

- during `freeClient` the primary client is fetched to the main thread,
but when caching it (`replicationCacheMaster`) the thread id remains the
id of the IO thread it came from. This creates problems when resurrecting
the master client. Here the call to `unbindClientFromIOThreadEventLoop`
in `freeClient` was rewritten to call `keepClientInMainThread`, which
automatically fixes the problem.

- during `exitScriptTimedoutMode` the master is queued for reprocessing
(specifically, to process any pending commands ASAP after it's unblocked).
We do that by putting it in the `server.unblocked_clients` list, which is
processed in the next `beforeSleep` cycle in the main thread. Since this
would create contention between the main and IO threads, we skip this
queueing in `unblocked_clients` and just queue the client to the main
thread; `processClientsFromIOThread` will process the pending commands
just as the main thread would have.

## Replica clients (primary node)

We move the client after RDB replication is done and after the
replication backlog is fed its first message.
We do that so the client's reference to the first replication backlog
node is initialized before it's read from an IO thread, hence no
contention with the main thread on it.

### Shared replication buffer

Currently in unstable the replication buffer is shared amongst clients.
This is done via clients holding references to the nodes inside the
buffer. A node from the buffer can be trimmed once each replica client
has read it and sent its contents. The reference is
`client->ref_repl_buf_node`. The replication buffer is written to by the
main thread in `feedReplicationBuffer`, and the refcounting is intrusive:
it lives inside the replication-buffer nodes themselves.

Since the replica client changes the refcount (decreasing the refcount of
the node it has just read and increasing the refcount of the next node
it starts to read) during `writeToClient`, we have a data race with the
main thread when it feeds the replication buffer. Moreover, the main
thread also updates the `used` size of the node (how much has been
written to it, compared to its capacity), which the replica client relies
on to know how much to read. Obviously the replica being in an IO thread
creates another data race here. To mitigate these issues a few new
variables were added to the client struct:

- `io_curr_repl_node` - the starting node this replica is reading from
inside the IO thread
- `io_bound_repl_node` - the last node in the replication buffer the
replica sees before being sent to the IO thread.

These values are only allowed to be updated in the main thread. The
client keeps track of how much it has read into the buffer via the old
`ref_repl_buf_node`. Generally, while in an IO thread the replica client
will now keep a refcount on `io_curr_repl_node` until it has processed
all the nodes up to `io_bound_repl_node`; at that point it is returned to
the main thread, which can safely update the refcounts.
The `io_bound_repl_node` reference is there so the replica knows when to
stop reading from the repl buffer: imagine the replica reads from the
last node of the replication buffer while the main thread feeds data to
it; we would create a data race on the `used` value
(`_writeToClientSlave` (IO thread) vs `feedReplicationBuffer` (main)).
That's why this value is updated just before the replica is sent to an
IO thread.
*NOTE*: this means that when replicas are handled by IO threads they
will hold more than one node at a time (i.e. `io_curr_repl_node` up to
`io_bound_repl_node`), so trimming will happen a bit less frequently.
Tests show no significant problems with that.
(Thanks to @ShooterIT for the `io_curr_repl_node` and `io_bound_repl_node`
mechanism; my initial implementation had similar semantics but was way
less clear.)

Example of how this works:

* Replication buffer state at time N:
   | node 0 | ... | node M, used_size K |
* The replica caches `io_curr_repl_node`=0, `io_bound_repl_node`=M and
`io_bound_block_pos`=K
* The replica moves to an IO thread and processes all the data it sees
* Replication buffer state at time N + 1:
| node 0 | ... | node M, used_size Full | | node M + 1 | | node M + 2,
used_size L |, where Full > K
* The replica moves to the main thread at time N + 1; at this point the
following happens:
   - the refcount of node 0 (io_curr_repl_node) is decreased
- `ref_repl_buf_node` becomes node M (io_bound_repl_node) (we still have
Full - K bytes to process from there)
- the refcount of node M is increased (now all nodes from 0 up to M-1
inclusive can be trimmed unless some other replica holds a reference to
them)
- and just before the replica is sent back to the IO thread the
following are updated:
   - `io_bound_repl_node` becomes node M+2
   - `io_bound_block_pos` becomes L

Note that the replica client is only moved to main if it has processed
all the data it knows about (i.e. up to `io_bound_repl_node` +
`io_bound_block_pos`)
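
The window mechanism above can be illustrated with a toy model (invented names, simplified to one replica and byte counts only):

```python
# Toy model of the io_curr/io_bound window: the IO thread reads only up
# to the bound captured by the main thread, so the main thread can keep
# appending without a race. Names and structure are illustrative only.

class Node:
    def __init__(self, used):
        self.used = used          # bytes written by the main thread
        self.refcount = 0

buffer = [Node(100), Node(100), Node(40)]   # last node has used_size K=40

# Main thread: capture the window before handing the replica to IO.
io_curr, io_bound, bound_pos = 0, 2, buffer[2].used
buffer[io_curr].refcount += 1               # hold only the starting node

# IO thread: read everything up to the captured bound, nothing beyond,
# even if the main thread grows the last node or appends new nodes.
read = sum(buffer[i].used for i in range(io_curr, io_bound)) + bound_pos
buffer[2].used = 100                        # main thread keeps feeding
buffer.append(Node(70))

# Back in the main thread: move the refcount forward so nodes before the
# bound can be trimmed, then a fresh window would be captured.
buffer[io_curr].refcount -= 1
buffer[io_bound].refcount += 1
print(read)                                 # 240 bytes sent to the replica
```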

### Replica clients kept in main as much as possible

During implementation an issue arose: how fast can the replica client
learn about new data in the replication buffer, and how fast can it trim
it? For that to happen ASAP, whenever a replica is moved to the main
thread it remains there until the replication buffer is fed new data. At
that point it's put in the pending write queue and special-cased in
handleClientsWithPendingWrites so that it's sent to an IO thread ASAP to
write the new data to the replica. Also, since each time the replica
writes all the repl data it knows about, after it's sent to the main
thread `processClientsFromIOThread` can immediately update the refcounts
and trim whatever it can.

### ACK messages from replicas

The primary needs to periodically read the `REPLCONF ACK` messages sent
by its replica clients. Since a replica can remain in the main thread
indefinitely if no DB change occurs, a new atomic `pending_read` flag was
added, set during `readQueryFromClient`. If a replica client has a
pending read, it is returned to the IO thread to process the read even if
there is no pending repl data to write.

### Replicas during shutdown

During shutdown the main thread pauses write actions and periodically
checks whether all replicas have reached the same replication offset as
the primary node. During `finishShutdown` that may or may not be the
case. Either way, client data may be read from the replicas, and we may
even try to write any pending data to them inside
`flushSlavesOutputBuffers`. To prevent races, all replicas are moved from
IO threads to the main thread via `fetchClientFromIOThread`. Cancelling
the shutdown should be ok, since the mechanism employed by
`handleClientsWithPendingWrites` should return the client to an IO thread
when needed.

## Notes

While adding new tests, timing issues with TSan tests were found and
fixed.

There is also a data race caught by TSan on the `last_error` member of
the `client` struct. It happens when both an IO thread and the main
thread make a syscall using the same `client` instance; this can happen
only for primary and replica clients, since their data can be accessed by
commands sent from other clients. A specific example is the `INFO
REPLICATION` command.
Although other such races were fixed, as described above, this one is
insignificant and it was decided to suppress it in `tsan.sup`.

---------

Co-authored-by: Yuan Wang <wangyuancode@163.com>
Co-authored-by: Yuan Wang <yuan.wang@redis.com>
2026-01-21 16:19:12 +02:00
Slavomir Kaslev
b9c00b27f8
Make cluster-slot-stats-enabled config multivalued (#14719)
This allows users to specify exactly what per slot statistics are to be
collected -- CPU, network traffic and/or memory used.

The config accepts multiple values as a space-separated list:
  - cpu: Track CPU usage per slot (cpu-usec metric)
  - net: Track network bytes per slot (network-bytes-in, network-bytes-out metrics)
  - mem: Track memory usage per slot (memory-bytes metric)
  - yes: Enable all tracking (equivalent to "cpu net mem")
  - no: Disable all tracking (default)

Note: Memory tracking (mem) can ONLY be enabled at startup. If you try to enable
memory tracking via CONFIG SET when it wasn't enabled at startup, the command will
fail. However, you can disable memory tracking at runtime by removing the 'mem' flag.
Once disabled, memory tracking cannot be re-enabled without restarting the server.
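
The parsing and the mem-at-startup constraint can be sketched as follows (hypothetical function names; the real config code lives in C):

```python
# Sketch of the multivalued cluster-slot-stats-enabled semantics
# described above. Function names are invented for illustration.

ALL_FLAGS = {"cpu", "net", "mem"}

def parse_slot_stats(value):
    tokens = value.split()
    if tokens == ["yes"]:
        return set(ALL_FLAGS)           # shorthand for "cpu net mem"
    if tokens == ["no"] or not tokens:
        return set()
    flags = set(tokens)
    if not flags <= ALL_FLAGS:
        raise ValueError("unknown flags: %s" % (flags - ALL_FLAGS))
    return flags

def config_set(startup_flags, new_value):
    new_flags = parse_slot_stats(new_value)
    # mem can only be enabled at startup; it may be removed at runtime
    # but never re-added afterwards.
    if "mem" in new_flags and "mem" not in startup_flags:
        raise ValueError("mem tracking can only be enabled at startup")
    return new_flags

print(parse_slot_stats("yes") == {"cpu", "net", "mem"})   # True
print(sorted(config_set({"cpu", "net", "mem"}, "cpu net")))  # ['cpu', 'net']
```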
2026-01-21 15:36:03 +02:00
Yuan Wang
a2e901c93d
Fix inaccurate IO thread client count due to delayed freeing (#14723)
There is a failure in CI:
```
*** [err]: Clients are evenly distributed among io threads in tests/unit/introspection.tcl
Expected '2' to be equal to '1' (context: type eval line 3 cmd {assert_equal $cur_clients 1} proc ::start_server)
```

There might be a client used for health checks (to detect whether the
server is up) that has not been freed in time. This can lead to an
inaccurate count of connected clients processed by IO threads, so we wait
for it to close completely.
2026-01-21 18:13:40 +08:00
Stav-Levi
25f780b662
Fix crash when calling internal container command without arguments (#14690)
Addresses a crash and clarifies errors around container commands.

- Update server.c to handle container commands with no subcommand: emit
"missing subcommand. Try HELP."; keep "unknown subcommand" for invalid
subcommands; for unknown commands, include args preview only when
present
- Add a test module command subcommands.internal_container with a
subcommand for validation
- Add unit test asserting missing subcommand error when calling the
internal container command without arguments
2026-01-21 08:38:04 +02:00
debing.sun
e76e3af5b7
Fix some test timing issues in replication.tcl and maxmemory.tcl (#14718)
1) Replace fixed sleep with wait_for_condition to avoid flaky test
failures when checking master_current_sync_attempts counter.

2) Similar to https://github.com/redis/redis/pull/14674, use
assert_lessthan_equal instead of assert_lessthan to verify the idle
time.
2026-01-20 19:25:15 +08:00
debing.sun
d2da5cca37
Fix timeout waiting for blocked clients in pause test (#14716)
To verify the pause duration, we need to wait for the client to be
unpaused and the command to complete, so add `$rd read` to wait for the
command to finish.

The test failure was caused by $rd still being blocked and not closed in
the previous test, so the next test would see 2 blocked clients instead
of 1, causing it to fail.
2026-01-20 17:12:22 +08:00
Yuan Wang
cfa6129040
Minor fixes for ASM (#14707)
- **TCL test failure**

https://github.com/redis/redis/actions/runs/21121021310/job/60733781853#step:6:5705
```
[err]: Test cluster module notifications when replica restart with RDB during importing
in tests/unit/cluster/atomic-slot-migration.tcl
Expected '{sub: cluster-slot-migration-import-started, source_node_id:28c64b3f462f3c29aa3c96c2ba5dff948dfe315b, destination_node_id:1382a4b4ca86621e39068ee8b25524a44a21bbc1, task_id:4d185a5398be94edac0dd77fff094eb7f5c73ec4, slots:0-100}' to be equal to '{sub: cluster-slot-migration-import-started, source_node_id:28c64b3f462f3c29aa3c96c2ba5dff948dfe315b, destination_node_id:1382a4b4ca86621e39068ee8b25524a44a21bbc1, task_id:4d185a5398be94edac0dd77fff094eb7f5c73ec4, slots:0-100} {sub: cluster-slot-migration-import-completed, source_node_id:28c64b3f462f3c29aa3c96c2ba5dff948dfe315b, destination_node_id:1382a4b4ca86621e39068ee8b25524a44a21bbc1, task_id:4d185a5398be94edac0dd77fff094eb7f5c73ec4, slots:0-100}' (context: type eval line 29 cmd {assert_equal  [list  "sub: cluster-slot-migration-import-started, source_node_id:$src_id, destination_node_id:$dest_id, task_id:$task_id, slots:0-100"  ] [R 4 asm.get_cluster_event_log]} proc ::test)
```
If there is a delay before the check runs, the ASM task may already have
completed, so we would get both `started` and `completed` ASM log entries
instead of only `started`. The check feels fragile, so delete it; we
check all logs later anyway.
```
                restart_server -4 true false true save ;# rdb save
---> if there is a delay, the ASM task should complete
                # the asm task info in rdb will fire module event
                assert_equal  [list \
                    "sub: cluster-slot-migration-import-started, source_node_id:$src_id, destination_node_id:$dest_id, task_id:$task_id, slots:0-100" \
                ] [R 4 asm.get_cluster_event_log]
```
- **Start BGSAVE for slot snapshot ASAP**
Since we consider the migrating client a replica that wants diskless
replication, it will wait for `repl-diskless-sync-delay` to start a new
fork after the last child exits. But a slot snapshot cannot actually be
shared with other slaves, so we can start BGSAVE for it immediately.

  Also resolves internal ticket RED-177974.
2026-01-19 19:57:20 +08:00
debing.sun
39881fa6f2
Reply Copy Avoidance (#14608)
This PR is based on https://github.com/valkey-io/valkey/pull/2078

# Reply Copy Avoidance Optimization

This PR introduces an optimization to avoid unnecessary memory copies
when sending replies to clients in Redis.

## Overview

Currently, Redis copies reply data into client output buffers before
sending responses. This PR implements a mechanism to avoid these copies
in certain scenarios, improving performance and reducing memory
overhead.

### Key Changes
* Added a capability to reply construction that allows interleaving
regular replies with copy-avoid replies in client reply buffers
* Extended write-to-client handlers to support copy-avoid replies
* Added copy avoidance of string bulk replies when copy avoidance is
indicated by I/O threads
* Copy avoidance is beneficial for performance, regardless of object
size, only from a certain number of threads upward, so it is enabled only
once that thread count is reached

**Note**: when copy avoidance is disabled, the content and handling of
client reply buffers remain as before this PR

---------

Signed-off-by: Alexander Shabanov <alexander.shabanov@gmail.com>
Signed-off-by: xbasel <103044017+xbasel@users.noreply.github.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Alexander Shabanov <alexander.shabanov@gmail.com>
Co-authored-by: xbasel <103044017+xbasel@users.noreply.github.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Slavomir Kaslev <slavomir.kaslev@gmail.com>
Co-authored-by: moticless <moticless@github.com>
Co-authored-by: Yuan Wang <yuan.wang@redis.com>
2026-01-19 11:09:16 +08:00
Filipe Oliveira (Redis)
7f541b9607
Prefetch client fields before prefetching command-related data (#14700)
This PR refines the prefetch strategy by removing ineffective
dictionary-level prefetching (issued too close to its use in the
pipeline) and improving prefetch usage in IO threads. The goal is to
better align prefetches with predictable access patterns.

## Changes

- Removed speculative prefetching from `dictFindLinkInternal()`,
simplifying the dictionary lookup hot path.
- Introduced a two-phase prefetch approach in
`prefetchIOThreadCommands()`:
  - Phase 1: Prefetch client structures and `pending_cmds`
- Phase 2: Add commands to the batch and prefetch follow-up fields
(`reply`, `mem_usage_bucket`)

## Performance

Measured with
`memtier_benchmark-1Mkeys-string-setget2000c-1KiB-pipeline-16`.

| Environment                  | % change |
|-----------------------------|----------|
| oss-standalone               | -0.1%    |
| oss-standalone-02-io-threads | +0.4%    |
| oss-standalone-04-io-threads | +1.6%    |
| oss-standalone-08-io-threads | +2.3%    |
| oss-standalone-12-io-threads | +0.7%    |
| oss-standalone-16-io-threads | +1.9%    |

Overall, this shows an ~2% throughput improvement on IO-threaded
configurations, with no meaningful impact on non-IO-threaded setups.

---------

Co-authored-by: Yuan Wang <wangyuancode@163.com>
2026-01-18 20:14:39 +08:00
Mincho Paskalev
c93e4a62c6
Add hotkeys detection (#14680)
# Description

Introducing a new method for identifying hotkeys inside a Redis server
during a tracking time period.

Hotkeys in this context are defined by two metrics:
* Percentage of time spent by the CPU on the key out of the total time
during the tracking period
* Percentage of network bytes (input+output) used for the key out of the
total network bytes used by Redis during the tracking period

## Usage

Although the API is subject to change, the general idea is for the user
to initiate a hotkeys tracking process that runs for some time. The keys'
metrics are recorded in a probabilistic structure, after which the user
can fetch the top K of them.
### Current API

```
HOTKEYS START
            <METRICS count [CPU] [NET]>
            [COUNT k] 
            [DURATION duration]
            [SAMPLE ratio]
            [SLOTS count slot…]

HOTKEYS GET
HOTKEYS STOP
HOTKEYS RESET

```

### HOTKEYS START

Start a tracking session if none is already started, or if the previous
one was stopped or reset. Returns an error if one is in progress.

* METRICS count [CPU] [NET] - choose one or more metrics to track
* COUNT k - track top K keys
* DURATION duration - preset how long the tracking session should last
* SAMPLE ratio - a key is tracked with probability 1/ratio
* SLOTS count slot... - only track a key if it belongs to one of the
chosen slots

### HOTKEYS GET

Returns an array of the metrics chosen for tracking and various other
metadata. Returns (nil) if no tracking was started or it was reset.

```
127.0.0.1:6379> hotkeys get
1) "tracking-active"
2) 1
3) "sample-ratio"
4) <ratio>
5) "selected-slots" (empty array if no slots selected)
6) 1) 0
   2) 5
   3) 6
7) "sampled-command-selected-slots-ms" (show on condition sample-ratio > 1 and selected-slots != empty-array)
8) <time-in-milliseconds>
9) "all-commands-selected-slots-ms" (show on condition selected-slots != empty-array)
10) <time-in-milliseconds>
11) "all-commands-all-slots-ms"
12) <time-in-milliseconds>
13) "net-bytes-sampled-commands-selected-slots" (show on condition sample-ratio > 1 and selected-slots != empty-array)
14) <num-bytes>
15) "net-bytes-all-commands-selected-slots" (show on condition selected-slots != empty-array)
16) <num-bytes>
17) "net-bytes-all-commands-all-slots"
18) <num-bytes>
19) "collection-start-time-unix-ms"
20) <start-time-unix-timestamp-in-ms>
21) "collection-duration-ms"
22) <duration-in-milliseconds>
23) "used-cpu-sys-ms"
24) <duration-in-millisec>
25) "used-cpu-user-ms"
26) <duration-in-millisec>
27) "total-net-bytes"
28) <num-bytes>
29) "by-cpu-time"
30) 1) key-1_1
    2) <millisec>
    ...
    19) key-10_1
    20) <millisec>
31) "by-net-bytes"
32) 1) key-1_2
    2) <num-bytes>
    ...
    19) key-10_2
    20) <num-bytes>

```

### HOTKEYS STOP

Stop the tracking session; the user can still get results from `HOTKEYS GET`.

### HOTKEYS RESET

Releases the resources used for hotkeys tracking; allowed only when
tracking is stopped. Returns an error if tracking is active.

## Additional changes

The `INFO` command now has a "hotkeys" section with 3 fields:
* tracking_active - a boolean flag indicating whether or not we
currently track hotkeys.
* used-memory - memory overhead of the structures used for hotkeys
tracking.
* cpu-time - time in ms spent updating the hotkey structure.

## Implementation

Independent of the API, the implementation is based on a probabilistic
structure - [Cuckoo Heavy
Keeper](https://dl.acm.org/doi/abs/10.14778/3746405.3746434) (CHK) -
with an added min-heap to keep track of the top K hotkeys' names. CHK is
loosely based on
[HeavyKeeper](https://www.usenix.org/conference/atc18/presentation/gong),
which is used in RedisBloom's TopK, but has higher throughput.

Random fixed-probability sampling is available via the `HOTKEYS START
SAMPLE <ratio>` param. Each key is sampled with probability `1/ratio`.
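As a rough illustration of these two mechanisms (the names `TopKTracker`, `record`, and `top_k` are invented for this sketch; exact per-key counters stand in for the CHK sketch, so only the top-K extraction and the 1/ratio sampling gate are shown):

```python
import heapq
import random

class TopKTracker:
    """Toy top-K hotkey tracker.

    The real feature uses a probabilistic Cuckoo Heavy Keeper sketch;
    here exact per-key counters stand in, so only the min-heap top-K
    extraction and the 1/ratio sampling gate are illustrated.
    """

    def __init__(self, k, sample_ratio=1):
        self.k = k
        self.sample_ratio = sample_ratio
        self.counts = {}

    def record(self, key, cost):
        # A key is tracked with probability 1/ratio, as with
        # `HOTKEYS START ... SAMPLE <ratio>`.
        if self.sample_ratio > 1 and random.random() >= 1 / self.sample_ratio:
            return
        self.counts[key] = self.counts.get(key, 0) + cost

    def top_k(self):
        # heapq keeps the k heaviest keys without sorting everything.
        return heapq.nlargest(self.k, self.counts.items(), key=lambda kv: kv[1])

tracker = TopKTracker(k=2)
for key, cost in [("a", 5), ("b", 1), ("a", 5), ("c", 3)]:
    tracker.record(key, cost)
assert tracker.top_k() == [("a", 10), ("c", 3)]
```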

## Performance implications

With a low enough sample rate (controlled by `HOTKEYS START SAMPLE
<ratio>`) there is a negligible performance hit. Tracking every key,
though, can incur up to a 15% hit in [the worst
case](https://github.com/redis/redis-benchmarks-specification/blob/main/redis_benchmarks_specification/test-suites/memtier_benchmark-1Mkeys-string-get-10B-pipeline-500.yml)
after running the tests in this
[bench](https://github.com/redis/redis-benchmarks-specification/).

---------

Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com>
Co-authored-by: Slavomir Kaslev <slavomir.kaslev@gmail.com>
Co-authored-by: debing.sun <debing.sun@redis.com>
2026-01-16 17:15:28 +02:00
Moti Cohen
11e73c66a8
Modules KeyMeta (Keys Metadata) (#14445)
Modules KeyMeta (Keys Metadata)

Redis modules often need to associate additional metadata with keys in
the keyspace. The objective is to create a unified and extensible
interface, usable by modules, Redis core, and perhaps later by users,
that facilitates the association and management of metadata with keys.
While extending RedisModuleTypes might be an easier path, this proposal
goes one step further: a general-purpose mechanism that lets modules
attach metadata to any key, independent of the underlying data type.

A major part of this feature involves defining how metadata is managed
throughout a key’s lifecycle. Modules will be able to optionally
register distinct metadata classes, each with its own lifecycle
callbacks and capable of storing an arbitrary 8-byte value per key.
These metadata values are embedded directly within Redis’s core
key-value objects to ensure fast access and automatic callback execution
as keys are created, updated, or deleted. Each 8 bytes of metadata can
represent either a simple primitive value or a pointer/handle to more
complex data managed externally by the module and serialized to RDB
along with the key.

Key Features:
- Modules can register up to 7 metadata classes (8 total, 1 reserved)
- Each class: 4-char name + 5-bit version (e.g., "SRC1" v1)
- Each class attaches 8 bytes per key (value or pointer/handle)
- Separate namespace from module data types

Module API:
- RedisModule_CreateKeyMetaClass() - Register metadata class
- RedisModule_ReleaseKeyMetaClass() - Release metadata class
- RedisModule_SetKeyMeta() - Attach/update metadata
- RedisModule_GetKeyMeta() - Retrieve metadata

Lifecycle Callbacks:
- copy, rename, move - Handle key operations
- unlink, free - Handle key deletion/expiration
- rdb_save, rdb_load - RDB persistence
- aof_rewrite - AOF rewrite support

Implementation:
- Metadata slots allocated before kvobj in reverse class ID order
- 8-bit metabits bitmap tracks active classes per key
- Minimal memory overhead - only allocated slots consume memory
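A toy model of this per-key layout, with hypothetical names, assuming only what the bullets above state: an 8-bit metabits bitmap marks which classes are active, and each active class contributes one 8-byte slot.

```python
class KeyMeta:
    """Toy model of the metabits bitmap + 8-byte slots (not Redis code)."""

    MAX_CLASSES = 8  # 8 total, 1 reserved by core

    def __init__(self):
        self.metabits = 0   # bit i set => class i has a slot on this key
        self.slots = {}     # class_id -> 8-byte value (or pointer/handle)

    def set_meta(self, class_id, value):
        assert 0 <= class_id < self.MAX_CLASSES
        assert 0 <= value < 2 ** 64   # each slot holds exactly 8 bytes
        self.metabits |= 1 << class_id
        self.slots[class_id] = value

    def get_meta(self, class_id):
        if not (self.metabits >> class_id) & 1:
            return None               # class not attached to this key
        return self.slots[class_id]

meta = KeyMeta()
meta.set_meta(3, 0xDEADBEEF)
assert meta.get_meta(3) == 0xDEADBEEF
assert meta.get_meta(1) is None      # only allocated slots consume memory
assert bin(meta.metabits) == "0b1000"
```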

RDB Serialization (v13):
- New opcode RDB_OPCODE_KEY_METADATA
- Compact 32-bit class spec: 24-bit name + 5-bit ver + 3-bit flags
- Self-contained format: [META,] TYPE, KEY, VALUE
- Portable across cluster nodes

Integration:
- Core ops: dbAdd, dbSet, COPY, MOVE, RENAME, DELETE
- DUMP/RESTORE support
- AOF rewrite via module callbacks
- Defragmentation support
- Module type I/O refactored to ModuleEntityId
2026-01-15 23:11:17 +02:00
Sergei Georgiev
221409788a
Add idempotency support to XADD via IDMPAUTO and IDMP parameters (#14615)
# Overview

This PR introduces idempotency support to Redis Streams' XADD command,
enabling automatic deduplication of duplicate message submissions
through optional IDMPAUTO and IDMP parameters with producer
identification. This enables reliable at-least-once delivery while
preventing duplicate entries in streams.

## Problem Statement

Current Redis Streams implementations lack built-in idempotency
mechanisms, making reliable at-least-once delivery impossible without
accepting duplicates:

- **Application-level tracking**: Developers must maintain separate data
structures to track submitted messages
- **Race conditions**: Network failures and retries can result in
duplicate stream entries
- **Complexity overhead**: Each producer must implement custom
deduplication logic
- **Memory inefficiency**: External deduplication systems duplicate
Redis's storage capabilities

This lack of native idempotency support creates reliability challenges
in distributed systems where at-least-once delivery semantics are
required but exactly-once processing is desired.

## Solution

Extends XADD with optional idempotency parameters that include producer
identification:

```
XADD key [NOMKSTREAM] [KEEPREF | DELREF | ACKED] [IDMPAUTO pid | IDMP pid iid] [MAXLEN | MINID [= | ~] threshold [LIMIT count]] <* | id> field value [field value ...]
```

### Producer ID (pid)

- **pid** (producer id): A unique identifier for each producer
- Must be unique per producer instance
- Producers must use the same pid after restart to access their
persisted idempotency tracking
- Enables per-producer idempotency tracking, isolating duplicate
detection between different producers

**Format**: Binary or string, recommended max 36 bytes

**Generation**: 
- **Recommended**: UUID v4 for globally unique identification
- **Alternative**: `hostname:process_id` or application-assigned IDs

### Idempotency Modes

**IDMPAUTO pid (Automatic Idempotency)**:

- Producer specifies its pid, Redis automatically calculates a unique
idempotent ID (iid) based on entry content
- Hash calculation combines XXH128 hashing of individual field-value
pairs using an order-independent Sum + XOR approach with rotation (each
pair: `XXH128(field || field_length || value)`)
- 16-byte binary iid with extremely low accidental collision probability
- XXH128 is a non-cryptographic hash function: fast and
well-distributed, but does NOT prevent intentional collision attacks
- For protection against adversarial collision crafting, use IDMP mode
with cryptographically-signed idempotent IDs
- Order-independent: field ordering does not affect the calculated iid
- If (pid, iid) pair exists in producer's IDMP map: returns existing
entry ID without creating duplicate entry
- Generally slower than manual mode due to hash calculation overhead

**IDMP pid iid (Manual Idempotency)**:

- Caller provides explicit producer id (pid) and idempotent ID (iid) for
deduplication
- iid must be unique per message (either globally or per pid)
- Faster processing than IDMPAUTO (no hash calculation overhead)
- Enables shorter iids for reduced memory footprint
- If (pid, iid) pair exists in producer's IDMP map: returns existing
entry ID without comparing field contents
- Caller responsible for iid uniqueness and consistency across retries

Both modes can only be specified when entry ID is `*` (auto-generated).
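A sketch of the IDMPAUTO idea under stated assumptions: XXH128 is not in the Python stdlib, so `blake2b` with a 16-byte digest stands in for it, and the exact Sum + XOR + rotation mix is assumed rather than taken from the Redis source; the point illustrated is only the order-independent, length-separated combination.

```python
import hashlib

MASK = (1 << 128) - 1

def _pair_hash(field: bytes, value: bytes) -> int:
    # Stand-in for XXH128(field || field_length || value); the length
    # separator prevents ambiguous concatenations like ("ab","c")/("a","bc").
    data = field + len(field).to_bytes(8, "little") + value
    return int.from_bytes(hashlib.blake2b(data, digest_size=16).digest(), "little")

def auto_iid(pairs) -> bytes:
    # Order-independent combination: a commutative fold (sum and xor)
    # over per-pair hashes, with an assumed rotate-left-by-1 mix.
    acc_sum, acc_xor = 0, 0
    for field, value in pairs:
        h = _pair_hash(field, value)
        acc_sum = (acc_sum + h) & MASK
        acc_xor ^= ((h << 1) | (h >> 127)) & MASK
    return ((acc_sum ^ acc_xor) & MASK).to_bytes(16, "little")

a = auto_iid([(b"temp", b"21"), (b"unit", b"C")])
b = auto_iid([(b"unit", b"C"), (b"temp", b"21")])  # reordered fields
assert a == b        # field order does not change the iid
assert len(a) == 16  # 16-byte binary iid
```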

### Deduplication Logic

When XADD is called with idempotency parameters:

1. Redis checks if the message was recently added to the stream based on
the (pid, iid) pair
2. If the (pid, iid) pair matches a recently-seen pair for that
producer, the message is assumed to be identical
3. No duplicate message is added to the stream; the existing entry ID is
returned
4. With **IDMP pid iid**: Redis does not compare the specified fields
and their values—two messages with the same (pid, iid) are assumed
identical
5. With **IDMPAUTO pid**: Redis calculates the iid from message content
and checks for duplicates

## IDMP Map: Per-Producer Time and Capacity-Based Expiration

Each producer with idempotency enabled maintains its own isolated IDMP
map (iid → entry_id) with dual expiration criteria:

**Time-based expiration (duration)**:

- Each iid expires automatically after duration seconds from insertion
- Provides operational guarantee: Redis will not forget an iid before
duration elapses (unless capacity reached)
- Configurable per-stream via XCFGSET

**Capacity-based expiration (maxsize)**:

- Each producer's map enforces maximum capacity of maxsize entries
- When capacity reached, oldest iids for that producer are evicted
regardless of remaining duration
- Prevents unbounded memory growth during extended usage
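The dual expiration criteria can be sketched like this (a toy model with invented names, not Redis code; timestamps are passed in explicitly to keep it deterministic):

```python
from collections import OrderedDict

class IdmpMap:
    """Toy per-producer iid -> entry_id map with time- and
    capacity-based expiration, as described above."""

    def __init__(self, duration_s=100, maxsize=100):
        self.duration_s = duration_s
        self.maxsize = maxsize
        self.entries = OrderedDict()   # iid -> (entry_id, inserted_at)

    def _expire(self, now):
        # Time-based: drop iids older than duration_s (oldest first).
        while self.entries:
            iid, (_, ts) = next(iter(self.entries.items()))
            if now - ts < self.duration_s:
                break
            self.entries.popitem(last=False)

    def add(self, iid, entry_id, now):
        """Returns the existing entry_id on a duplicate, else stores."""
        self._expire(now)
        if iid in self.entries:
            return self.entries[iid][0]       # duplicate: no new entry
        if len(self.entries) >= self.maxsize:
            self.entries.popitem(last=False)  # capacity-based eviction
        self.entries[iid] = (entry_id, now)
        return entry_id

m = IdmpMap(duration_s=100, maxsize=2)
assert m.add("iid-1", "1-1", now=0) == "1-1"
assert m.add("iid-1", "1-2", now=1) == "1-1"   # duplicate detected
assert m.add("iid-2", "1-2", now=2) == "1-2"
assert m.add("iid-3", "1-3", now=3) == "1-3"   # capacity: evicts iid-1
assert m.add("iid-1", "1-4", now=4) == "1-4"   # iid-1 was forgotten
```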

### Configuration Commands

**XINFO STREAM**: View current configuration and metrics

Use `XINFO STREAM key` to retrieve idempotency configuration
(idmp-duration, idmp-maxsize) along with tracking metrics.

**XCFGSET**: Configure expiration parameters

```
XCFGSET key [IDMP-DURATION duration] [IDMP-MAXSIZE maxsize]
```

- **duration**: Seconds to retain each iid (range: 1-86400 seconds)
- **maxsize**: Maximum iids to track per producer (range: 1-10,000
entries)
- Calling XCFGSET clears all existing producer IDMP maps for the stream

**Default Configuration** (when XCFGSET not called):

- Duration: 100 seconds
- Maxsize: 100 iids per producer
- Runtime configurable via: `stream-idmp-duration` and
`stream-idmp-maxsize`

## Response Behavior

**On first submission** (pid, iid) pair not in producer's map:

- Entry added to stream with generated entry ID
- (pid, iid) pair stored in producer's IDMP map with current timestamp
- Returns new entry ID

**On duplicate submission** (pid, iid) pair exists in producer's map:

- No entry added to stream
- Returns existing entry ID from producer's IDMP map
- Identical response to original submission (client cannot distinguish)

## Stream Metadata

XINFO STREAM extended with idempotency metrics and configuration:

- **idmp-duration**: The duration value (in seconds) configured for the
stream's IDMP map
- **idmp-maxsize**: The maxsize value configured for the stream's IDMP
map
- **pids-tracked**: Current number of producers with active IDMP maps
- **iids-tracked**: Current total number of iids across all producers'
IDMP maps (reflects active iids that haven't expired or been evicted)
- **iids-added**: Lifetime cumulative count of entries added with
idempotency parameters
- **iids-duplicates**: Lifetime cumulative count of duplicate iids
detected across all producers

## Persistence and Restart Behavior

**IDMP maps are fully persisted and restored across Redis restarts**:

- **RDB/AOF**: All pid-iid pairs, timestamps, and configuration are
included in snapshots and AOF logs
- **Recovery**: On restart, all tracked (pid, iid) pairs remain valid
and operational
- **Producer Requirement**: Producers must reuse the same pid after
restart to access their persisted IDMP map
- **Configuration**: Stream-level settings (duration, maxsize) persist
across restarts
- **Important**: Calling XCFGSET after restart clears restored IDMP maps
(same behavior as during runtime)

## Key Benefits

- **Enables At-most-once Producer Semantics**: Makes it possible to
safely retry message submissions without creating duplicates
- **Automatic Retry Safety**: Network failures and retries cannot create
duplicate entries
- **Producer Isolation**: Each producer maintains independent
idempotency tracking
- **Memory Efficient**: Time and capacity-based expiration per producer
prevents unbounded growth
- **Flexible Implementation**: Choose automatic (IDMPAUTO) or manual
(IDMP) based on performance needs
- **Backward Compatible**: Fully optional parameters with zero impact on
existing XADD behavior
- **Collision Resistant**: XXH128 with Sum + XOR combination and
field-length separators provides high-quality non-cryptographic hashing
for IDMPAUTO with extremely low collision probability and prevents
ambiguous concatenation attacks
2026-01-15 21:58:44 +08:00
Filipe Oliveira (Redis)
7e7c7b0558
Fix flaky test failures in caused by clock precision issues with monotonic clock. (#14697)
Fix flaky test failures in `tests/unit/moduleapi/blockedclient.tcl`
caused by
clock precision issues with monotonic clock.

The test runs a command that blocks for 200ms and then asserts the
elapsed time
is >= 200ms. Due to clock skew and timing precision differences, the
measured
time occasionally comes back as 199ms, causing spurious test failures.
2026-01-14 19:44:05 +08:00
Vitah Lin
e396dd3385
Fix flaky stream LRM test due to timing precision (#14674)
2026-01-09 10:14:44 +08:00
Yuan Wang
858a8800e2
Propagate migrate task info to replicas (#14672)
- Allow replicas to track the master's migrate task state
Previously we only propagated import task info to replicas, but now we
also support propagating migrate task info, so the new master can
initiate slot trimming again if needed after a failover; this avoids
data redundancy.

- Prevent replicas from initiating slot trimming actively
Since the source side lacks a data cleaning mechanism, we allowed
replicas to continue pending slot trimming, but it is not a good idea to
let replicas trim actively. As we introduce the feature above, we can
delete this logic.
2026-01-08 19:06:57 +08:00
Slavomir Kaslev
5aa47347e7
Fix CLUSTER SLOT-STATS test Lua scripts (#14671)
Fix hard-coded keys in test Lua scripts, which are incompatible with
cluster mode.

Reported-by: Oran Agra <oran@redis.com>
2026-01-08 11:16:50 +02:00
Stav-Levi
73249497d4
Fix ACL key-pattern bypass in MSETEX command (#14659)
MSETEX doesn't properly check ACL key permissions for all keys - only
the first key is validated.

MSETEX arguments look like: MSETEX <numkeys> key1 val1 key2 val2 ... EX
seconds

Keys are at every 2nd position (step=2). When Redis extracts keys for
ACL checking, it calculates where the last key is:

last = first + numkeys - 1;          /* bug: calculation ignores step */
last = first + (numkeys - 1) * step; /* fix */

With 2 keys starting at position 2:

Bug: last = 2 + 2 - 1 = 3 → only checks position 2

Fixes #14657
2026-01-08 08:41:55 +02:00
Salvatore Sanfilippo
154fdcee01
Test tcp deadlock fixes (#14667)
**Disclaimer: this patch was created with the help of AI**

My experience with the Redis test not passing on older hardware didn't
stop just with the other PR opened with the same problem. There was
another deadlock happening when the test was writing a lot of commands
without reading it back, and the cause seems related to the fact that
such tests have something in common: they create a deferred client
(which does not read replies at all unless asked to) and flood the
server with 1 million requests without reading anything back. This
results in a networking issue where the TCP socket stops accepting more
data, and the test hangs forever.

Reading those replies from time to time allows the test to run on such
older hardware.

Ping oranagra that introduced at least one of the bulk writes tests.
AFAIK there is no problem in the test, if we change it in this way,
since the slave buffer is going to be filled anyway. But better to be
sure that it was not intentional to write all those data without reading
back for some reason I can't see.

IMPORTANT NOTE: **I am NOT sure at all** that the TCP socket senses
congestion in one side and also stops the other side, but anyway this
fix works well and is likely a good idea in general. At the same time, I
doubt there is a pending bug in Redis that makes it hang if the output
buffer is too large, or we are flooding the system with too many
commands without reading anything back. So the actual cause remains
cloudy. I remember that Redis, when the output limit is reached, could
kill the client, and not lower the priority of command processing. Maybe
Oran knows more about this.

## LLM commit message.

The test "slave buffer are counted correctly" was hanging indefinitely
on slow machines. The test sends 1M pipelined commands without reading
responses, which triggers a TCP-level deadlock.

Root cause: When the test client sends commands without reading
responses:
1. Server processes commands and sends responses
2. Client's TCP receive buffer fills (client not reading)
3. Server's TCP send buffer fills
4. Packets get dropped due to buffer pressure
5. TCP congestion control interprets this as network congestion
6. cwnd (congestion window) drops to 1, RTO increases exponentially
7. After multiple backoffs, RTO reaches ~100 seconds
8. Connection becomes effectively frozen

This was confirmed by examining TCP socket state showing cwnd:1,
backoff:9, rto:102912ms, and rwnd_limited:100% on the client side.

The fix interleaves reads with writes by processing responses every
10,000 commands. This prevents TCP buffers from filling to the point
where congestion control triggers the pathological backoff behavior.

The test still validates the same functionality (slave buffer memory
accounting) since the measurement happens after all commands complete.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-07 14:26:22 +08:00
debing.sun
0cb1ee0dc1
New eviction policies - least recently modified (#14624)
### Summary

This PR introduces two new maxmemory eviction policies: `volatile-lrm`
and `allkeys-lrm`.
LRM (Least Recently Modified) is similar to LRU but only updates the
timestamp on write operations, not read operations. This makes it useful
for evicting keys that haven't been modified recently, regardless of how
frequently they are read.

### Core Implementation

The LRM implementation reuses the existing LRU infrastructure but with a
key difference in when timestamps are updated:

- **LRU**: Updates timestamp on both read and write operations
- **LRM**: Updates timestamp only on write operations via `updateLRM()`

### Key changes:
Extend `keyModified()` to accept an optional `robj *val` parameter and
call `updateLRM()` when a value is provided. Since `keyModified()`
serves as the unified entry point for all key modifications, placing the
LRM update here ensures timestamps are consistently updated across all
write operations.
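The behavioral difference can be sketched as follows (a toy model that keeps two separate timestamps for contrast; the real implementation reuses the single LRU clock field and merely changes when it is updated):

```python
class Key:
    """Toy model contrasting LRU and LRM timestamps (not Redis code)."""

    def __init__(self, now):
        self.lru = now   # refreshed on reads and writes
        self.lrm = now   # refreshed on writes only

    def read(self, now):
        self.lru = now

    def write(self, now):
        self.lru = now
        self.lrm = now

k = Key(now=0)
k.write(now=1)
k.read(now=5)    # a read refreshes LRU but not LRM
assert (k.lru, k.lrm) == (5, 1)

# Under allkeys-lrm, a key that is read constantly but never modified
# still looks "old" and stays eligible for eviction.
```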

---------

Co-authored-by: oranagra <oran@redislabs.com>
Co-authored-by: Yuan Wang <yuan.wang@redis.com>
2026-01-06 20:57:31 +08:00
debing.sun
9ca860be9e
Fix XTRIM/XADD with approx not deletes entries for DELREF/ACKED strategies (#14623)
This bug was introduced by #14130 and found by guybe7 

When using XTRIM/XADD with approx mode (~) and DELREF/ACKED delete
strategies, if a node was eligible for removal but couldn't be removed
directly (because consumer group references need to be checked), the
code would incorrectly break out of the loop instead of continuing to
process entries within the node. This fix allows the per-entry deletion
logic to execute for eligible nodes when using non-KEEPREF strategies.
2026-01-05 21:17:36 +08:00
debing.sun
4eda670de9
Fix infinite loop during reverse iteration due to invalid numfields of corrupted stream (#14472)
Follow https://github.com/redis/redis/pull/14423

In https://github.com/redis/redis/pull/14423,
I thought the last lpNext operation of the iterator occurred at the end
of streamIteratorGetID().
However, I overlooked the fact that after calling
`streamIteratorGetID()`, we might still use `streamIteratorGetField()`
to continue moving within the current entry.
This means that during reverse iteration, the iterator could move back
to a previous entry position.

To fix this, this PR records the current position at the beginning of
streamIteratorGetID().
When we enter it again next time, we ensure that the entry position does
not exceed the previous one; that is, during forward iteration the entry
must be greater than the last entry position, and during reverse
iteration it must be smaller than the last entry position.

Note that the fix for https://github.com/redis/redis/pull/14423 has been
replaced by this fix.
2026-01-05 21:16:53 +08:00
Stav-Levi
860b8c772a
Add TLS certificate-based automatic client authentication (#14610)
This PR implements support for automatic client authentication based on
a field in the client's TLS certificate.
We adopt ValKey’s PR: https://github.com/valkey-io/valkey/pull/1920

API Changes:

Add new configuration `tls-auth-clients-user`
  - Allowed values: `off` (default), `CN`.
  - `off` – disable TLS certificate–based auto-authentication.
  - `CN` – derive the ACL username from the Common Name (CN) field of
the client certificate.
 
New INFO stat
  - `acl_access_denied_tls_cert`
- Counts failed TLS certificate–based authentication attempts, i.e. TLS
connections where a client certificate was presented, a username was
derived from it, but no matching ACL user was found.

New ACL LOG reason
  - Reason string: `"tls-cert"`
- Emitted when a client certificate’s Common Name fails to match any
existing ACL user.


Implementation Details:

- Added getCertFieldByName() utility to extract fields from peer
certificates.

- Added autoAuthenticateClientFromCert() to handle automatic login logic
post-handshake.

- Integrated automatic authentication into the TLSAccept function after
handshake completion.

- Updated test suite (tests/integration/tls.tcl) to validate the
feature.
2025-12-25 14:07:58 +02:00
Ozan Tezcan
fde3576f88
Fix adjacent slot range behavior in ASM operations (#14637)
This PR contains a few changes for ASM:

**Bug fix:** 
- Fixes an issue in ASM when adjacent slot ranges are provided in
CLUSTER MIGRATION IMPORT command (e.g. 0-10 11-100). ASM task keeps the
original slot ranges as given, but later the source node reconstructs
the slot ranges from the config update as a single range (e.g. 0-100).
This causes asmLookupTaskBySlotRangeArray() to fail to match the task,
and the source node incorrectly marks the ASM task as failed. Although
the migration completes successfully, the source node performs a
blocking trim operation for these keys, assuming the slot ownership
changed outside of an ASM operation. With this PR, redis merges adjacent
slot ranges in a slot range array to avoid this problem.
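The merging step can be sketched as (illustrative Python, not the actual C code; `merge_adjacent` is an invented name):

```python
def merge_adjacent(ranges):
    # Merge adjacent (or overlapping) inclusive slot ranges, e.g.
    # [(0, 10), (11, 100)] -> [(0, 100)], mirroring the normalization
    # described above.
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

assert merge_adjacent([(0, 10), (11, 100)]) == [(0, 100)]
assert merge_adjacent([(11, 100), (0, 10)]) == [(0, 100)]
assert merge_adjacent([(0, 10), (20, 30)]) == [(0, 10), (20, 30)]
```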
 
 **Other improvements:**
- Indicate the imported/migrated key count in the log once the ASM
operation is completed.
 - Use error return value instead of assert in parseSlotRangesOrReply()
- Validate slot range array that is given by cluster implementation on
ASM_EVENT_IMPORT_START.

---------

Co-authored-by: Yuan Wang <yuan.wang@redis.com>
2025-12-23 11:54:12 +03:00
Yuan Wang
33391a7b61
Support delay trimming slots after finishing migrating slots (#14567)
This PR introduces a mechanism that allows a module to temporarily
disable trimming after an ASM migration operation so it can safely
finish ongoing asynchronous jobs that depend on keys in migrating (and
about to be trimmed) slots.

1. **ClusterDisableTrim/ClusterEnableTrim**
We introduce the `ClusterDisableTrim`/`ClusterEnableTrim` Module APIs to
allow a module to disable/enable slot trimming
    ```
    /* Disable automatic slot trimming. */
    int RM_ClusterDisableTrim(RedisModuleCtx *ctx)

    /* Enable automatic slot trimming */
    int RM_ClusterEnableTrim(RedisModuleCtx *ctx)
    ```

**Please notice**: Redis will not start any subsequent import or migrate
ASM operations while slot trimming is disabled, so modules must
re-enable trimming immediately after completing their pending work.

The only valid and meaningful time for a module to disable trimming
appears to be after the MIGRATE_COMPLETED event.

2. **REDISMODULE_OPEN_KEY_ACCESS_TRIMMED**
Added REDISMODULE_OPEN_KEY_ACCESS_TRIMMED to RM_OpenKey() so that a
module can operate on keys in the unowned slots after trimming is
paused.

Now we don't delete a key that is part of a trim job when we access it.
`expireIfNeeded` returns `KEY_VALID` if `EXPIRE_ALLOW_ACCESS_TRIMMED` is
set; otherwise, it returns `KEY_TRIMMED` without deleting the key.

3. **REDISMODULE_CTX_FLAGS_TRIM_IN_PROGRESS**
We also extend RM_GetContextFlags() to include a flag
REDISMODULE_CTX_FLAGS_TRIM_IN_PROGRESS indicating whether a trimming job
is pending (due to trim pause) or in progress. Modules could
periodically poll this flag to synchronize their internal state, e.g.,
if a trim job was delayed or if the module incorrectly assumed trimming
was still active.

Bugfix: RM_SetClusterFlags could not clear a flag after enabling it first.

---------

Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com>
2025-12-16 16:30:56 +08:00
Yuan Wang
f3316c3a1a
Introduce flushdb option for repl-diskless-load (#14596)
The `repl-diskless-load` feature can effectively reduce full
synchronization time, but it is perhaps not widely used: the `swapdb`
option requires double `maxmemory`, and `on-empty-db` only works for the
first full sync (the replica must have no data).

This PR introduces a new option: `flushdb` - always flush the entire
dataset before the diskless load. If the diskless load fails, the replica
will lose all existing data.

Of course, this brings a risk of data loss, but it provides a choice for
users who want to reduce full sync time and accept that risk.
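Assuming the option value named in this PR, a replica opting into the new mode might be configured like this (a sketch, not verified config syntax):

```
# On the replica: accept the data-loss risk in exchange for a faster full sync.
repl-diskless-load flushdb
```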
2025-12-15 11:25:53 +08:00
Stav-Levi
23aca15c8c
Fix the flexibility of argument positions in the Redis API's (#14416)
This PR implements flexible keyword-based argument parsing for all 12
hash field expiration commands, allowing users to specify arguments in
any logical order rather than being constrained by rigid positional
requirements.
This enhancement follows Redis's modern design of keyword-based flexible
argument ordering and significantly improves user experience.

Commands with flexible parsing:
HEXPIRE, HPEXPIRE, HEXPIREAT, HPEXPIREAT, HGETEX, HSETEX

Some examples:

HEXPIRE:
* All these are equivalent and valid:
HEXPIRE key EX 60 NX FIELDS 2 f1 f2
HEXPIRE key NX EX 60 FIELDS 2 f1 f2  
HEXPIRE key FIELDS 2 f1 f2 EX 60 NX
HEXPIRE key FIELDS 2 f1 f2 NX EX 60
HEXPIRE key NX FIELDS 2 f1 f2 EX 60

HGETEX:
* All these are equivalent and valid:
HGETEX key EX 60 FIELDS 2 f1 f2
HGETEX key FIELDS 2 f1 f2 EX 60

HSETEX:
* All these are equivalent and valid:
HSETEX key FNX EX 60 FIELDS 2 f1 v1 f2 v2
HSETEX key EX 60 FNX FIELDS 2 f1 v1 f2 v2
HSETEX key FIELDS 2 f1 v1 f2 v2 FNX EX 60
HSETEX key FIELDS 2 f1 v1 f2 v2 EX 60 FNX
HSETEX key FNX FIELDS 2 f1 v1 f2 v2 EX 60
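The parsing style above can be sketched as a small keyword-driven loop; this is a hypothetical simplification (only EX, NX, and FIELDS, for a HEXPIRE-like command), not the actual Redis parser:

```python
# Order-insensitive keyword parsing: scan tokens, dispatch on the keyword,
# and consume however many arguments that keyword takes.
def parse_hexpire(args):
    opts, i = {}, 0
    while i < len(args):
        kw = args[i].upper()
        if kw == "EX":
            opts["ex"] = int(args[i + 1]); i += 2
        elif kw == "NX":
            opts["nx"] = True; i += 1
        elif kw == "FIELDS":
            n = int(args[i + 1])
            opts["fields"] = args[i + 2 : i + 2 + n]; i += 2 + n
        else:
            raise ValueError(f"syntax error near {args[i]}")
    return opts

# Different orderings parse to the same result:
a = parse_hexpire(["EX", "60", "NX", "FIELDS", "2", "f1", "f2"])
b = parse_hexpire(["FIELDS", "2", "f1", "f2", "NX", "EX", "60"])
assert a == b == {"ex": 60, "nx": True, "fields": ["f1", "f2"]}
```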
2025-12-14 09:35:12 +02:00
debing.sun
679e009b73
Add daily CI for vectorset (#14302)
2025-12-10 08:52:43 +08:00
Slavomir Kaslev
5299ccf2a9
Add kvstore type and decouple kvstore from its metadata (#14543)
Decouple kvstore from its metadata by introducing a `kvstoreType` structure
of callbacks. This resolves the abstraction-layer violation of having kvstore
include `server.h` directly.

Move (again) the cluster slot statistics to the per-slot dicts' metadata. The
`canFreeDict` callback prevents empty per-slot dicts from being freed when
doing so would lose per-slot statistics.
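The struct-of-callbacks decoupling can be sketched like this; the names are illustrative stand-ins, not the actual Redis definitions:

```python
# The kvstore only sees a kvstoreType of callbacks, never server internals.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class KvstoreType:
    # Consulted before freeing an empty per-slot dict; returning False keeps
    # the dict alive (e.g. to preserve per-slot statistics in its metadata).
    can_free_dict: Callable[[int], bool]

@dataclass
class Kvstore:
    ktype: KvstoreType
    dicts: dict = field(default_factory=dict)

    def try_free_empty_dict(self, slot):
        if not self.dicts.get(slot) and self.ktype.can_free_dict(slot):
            self.dicts.pop(slot, None)
            return True
        return False

slots_with_stats = {5}
kv = Kvstore(KvstoreType(can_free_dict=lambda s: s not in slots_with_stats))
kv.dicts = {5: {}, 9: {}}
assert kv.try_free_empty_dict(9) is True    # no stats: safe to free
assert kv.try_free_empty_dict(5) is False   # stats present: kept alive
```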

Co-authored-by: Ran Tidhar <ran.tidhar@redis.com>
2025-12-08 21:12:33 +02:00
Yuan Wang
cb71dec0c3
Disable RDB compression when diskless replication is used (#14575)
Fixes #14538

If the master uses diskless synchronization and the replica uses
diskless load, we can disable RDB compression to reduce full sync time.
I tested on AWS and found we could reduce time by 20-40%.

In terms of implementation, when the replica can use diskless load, it
sends `replconf rdb-no-compress 1` to the master so that the master
delivers an RDB without compression.

If your network is slow, please disable repl-diskless-load, and maybe even
repl-diskless-sync.

---------

Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com>
2025-12-04 09:24:23 +08:00
Ozan Tezcan
08b63b6ceb
Fix flaky ASM tests (#14604)
1. Fix "Simple slot migration with write load" by introducing an artificial
delay in the traffic generator to slow it down for TSAN builds. Failed test:
https://github.com/redis/redis/actions/runs/19720942981/job/56503213650

2. Fix "Test RM_ClusterCanAccessKeysInSlot returns false for unowned
slots" by waiting for config propagation before checking it on a replica.
Failed test:
https://github.com/redis/redis/actions/runs/19841852142/job/56851802772
2025-12-03 12:12:48 +03:00
Ozan Tezcan
3c57a8fc92
Retry an ASM import step when the source node is temporarily not ready (#14599)
The cluster implementation may be temporarily unavailable and return an
error to the `ASM_EVENT_MIGRATE_PREP` event to prevent starting a new
migration. Although this is most likely a transient condition, the
source node has no way to distinguish it from a real error, so it must
fail the import attempt and start a new one.

In Redis, failing an attempt is cheap, but in other cluster
implementations it may require cleaning up resources and can cause
unnecessary disruption.

This PR introduces a new `-NOTREADY` error reply for the `CLUSTER
SYNCSLOTS SYNC` command. When the source replies with `-NOTREADY`, the
destination can recognize the condition as transient and periodically
retry sending the `CLUSTER SYNCSLOTS SYNC` step instead of failing the
attempt.
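The destination-side retry behavior can be sketched as follows; `send_syncslots_sync` is a hypothetical transport stub, and the retry/backoff policy is illustrative:

```python
# Retry CLUSTER SYNCSLOTS SYNC while the source replies -NOTREADY; only a
# real error (or exhausting retries) fails the import attempt.
def run_import_step(send_syncslots_sync, max_retries=5):
    for _ in range(max_retries):
        reply = send_syncslots_sync()
        if reply == "-NOTREADY":
            continue          # transient: retry instead of failing the attempt
        if reply.startswith("-"):
            raise RuntimeError(f"import failed: {reply}")
        return reply          # source accepted the sync step
    raise RuntimeError("import failed: source stayed NOTREADY")

replies = iter(["-NOTREADY", "-NOTREADY", "+OK"])
assert run_import_step(lambda: next(replies)) == "+OK"
```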
2025-12-02 13:38:22 +03:00
Oran Agra
82fbf213eb
fix test tag leakage that can result in skipping tests (#14572)
some error handling paths didn't remove the tags they added, but most
importantly, if the start_server proc is given the "tags" argument more
than once, on exit, it only removed the last one.

this problem exists in start_cluster in list.tcl, and the result was
that the "external:skip cluster modules" were not removed
2025-11-26 09:13:21 +02:00
RoyBenMoshe
39200596f4
SCAN: restore original filter order (#14537)
In #14121, the order of the SCAN filters was changed: before #14121 the
order was pattern, expiration, and type; after #14121, the pattern filter
became last. This breaking change altered the original behavior, causing
SCAN with a pattern to also remove the expired keys.
This PR reorders the filters to be consistent with the original behavior
and extends a test to cover this scenario.
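Why the order matters can be sketched like this (a simplified model, not the actual SCAN implementation): the expiration check deletes expired keys as a side effect, so running the pattern filter first keeps non-matching keys untouched.

```python
# Pattern filter first, then the expiration check (which deletes as it goes).
import fnmatch

def scan_filter(db, pattern, now):
    out = []
    for key, (value, expire_at) in list(db.items()):
        if not fnmatch.fnmatch(key, pattern):
            continue                      # pattern first: key left untouched
        if expire_at is not None and expire_at <= now:
            del db[key]                   # expiration check deletes the key
            continue
        out.append(key)
    return out

db = {"user:1": ("a", None), "tmp:1": ("b", 0)}   # tmp:1 already expired
assert scan_filter(db, "user:*", now=100) == ["user:1"]
assert "tmp:1" in db   # expired but not matched, so not deleted
```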
2025-11-25 15:30:43 +08:00
lihp
0288d70820
Fixes an issue where EXEC checks ACL during AOF loading (#14545)
This PR fixes an issue (#14541) where EXEC's ACL recheck was still being
performed during AOF loading, which could cause AOF loading to fail if the
ACL rules were changed and no longer allow some commands in a MULTI/EXEC
block.
2025-11-22 11:52:31 +08:00
debing.sun
bb6389e823
Fix min_cgroup_last_id cache not updated when destroying consumer group (#14552)
## Problem

When destroying a consumer group with `XGROUP DESTROY`, the cached
`min_cgroup_last_id` was not being invalidated. This caused incorrect
behavior when using `XDELEX` with the `ACKED` option, as the cache still
referenced the destroyed group's `last_id`.

## Solution

Invalidate the `min_cgroup_last_id` cache when the destroyed group's
`last_id` equals the cached minimum. The cache will be recalculated on
the next call to `streamEntryIsReferenced()`.
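The invalidation rule can be sketched like this; the stream structure is a hypothetical simplification, with only the names mirroring the text:

```python
# Cached minimum of consumer-group last_ids; invalidated (set to None) when
# the destroyed group held the cached minimum, and recalculated lazily.
class Stream:
    def __init__(self):
        self.groups = {}                 # group name -> last_id
        self.min_cgroup_last_id = None   # cached minimum; None == invalidated

    def min_last_id(self):
        if self.min_cgroup_last_id is None and self.groups:
            self.min_cgroup_last_id = min(self.groups.values())
        return self.min_cgroup_last_id

    def destroy_group(self, name):
        last_id = self.groups.pop(name)
        # Invalidate when the destroyed group held the cached minimum, so the
        # next lookup recalculates it instead of using a stale value.
        if last_id == self.min_cgroup_last_id:
            self.min_cgroup_last_id = None

s = Stream()
s.groups = {"g1": 5, "g2": 9}
assert s.min_last_id() == 5
s.destroy_group("g1")
assert s.min_last_id() == 9   # recalculated, not the stale 5
```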

---------

Co-authored-by: guybe7 <guy.benoish@redislabs.com>
2025-11-21 22:37:17 +08:00
Ozan Tezcan
b632e9df6a
Fix flaky ASM write load test (#14551)
Extend write pause timeout to stabilize ASM write load test under TSAN.

Failing test for reference:
https://github.com/redis/redis/actions/runs/19520561209/job/55882882951
2025-11-21 12:18:28 +03:00
Yuan Wang
7a3cb3b4b3
Fix CI flaky tests (#14531)
- https://github.com/redis/redis/actions/runs/19200504999/job/54887625884
   avoid calling `start_write_load` before pausing the destination node

- https://github.com/redis/redis/actions/runs/18958533020/job/54140746904
the replica may not have synced with the master yet, so it did not update the counter
2025-11-19 17:10:57 +08:00
Mincho Paskalev
837b14c89a
Fix ASan Daily (#14527)
After https://github.com/redis/redis/pull/14226 module tests started
running with ASan enabled.

`auth.c` blocks the user on auth and spawns a thread that sleeps for
0.5s before unblocking the client and returning.

A Tcl test unloads the module, which may happen just after the spawned
thread unblocks the client. In that case, if the unloading finishes fast
enough, the spawned thread may try to execute code from the module's
already-unloaded dynamic library, resulting in a segfault.

Fix: just wait for the thread in the module's OnUnload method.
2025-11-19 10:56:18 +02:00
Oran Agra
0a6eacff1f
Add variable key-spec flags to SET IF* and DELEX (#14529)
These commands behave like DEL and SET (blindly remove or overwrite) when
they are given no IF* flags, and require access to the key's value when
they do run with these flags.

This makes sure they have the VARIABLE_FLAGS flag, and a getKeysProc that
can provide the right flags depending on the arguments used (the plain
flags, used when the arguments are unknown, are the common-denominator
ones).

Move the lookupKey call in DELEX to avoid a double lookup, which also
means some syntax errors (namely arity) are checked and reported before
checking the existence of the key.
2025-11-12 11:36:10 +02:00
Sergei Georgiev
90ba7ba4dc
Fix XREADGROUP CLAIM to return delivery metadata as integers (#14524)
### Problem
The XREADGROUP command with CLAIM parameter incorrectly returns delivery
metadata (idle time and delivery count) as strings instead of integers,
contradicting the Redis specification.

### Solution
Updated the XREADGROUP CLAIM implementation to return delivery metadata
fields as integers, aligning with the documented specification and
maintaining consistency with Redis response conventions.

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2025-11-11 19:05:22 +08:00