Commit graph

12918 commits

Mincho Paskalev
b5a37c0e42
Add cmd tips for HOTKEYS. Return err when hotkeys START specifies invalid slots (#14761)
- When slots outside a node's owned range are passed to `HOTKEYS START
SLOTS ...`, the command now returns an error.
- Changed the command tips for the HOTKEYS subcommands so that they reflect
the special nature of the command in cluster mode, i.e. it should be
issued against a single node only; clients should not have to care about
cluster management or aggregation of results.
- Changed the reply schema to return an array of maps. A single node
returns a one-element array; results from multiple nodes can then easily
be concatenated into one array.
2026-02-03 17:54:32 +02:00
I
02700f11cd
RDB Channel connections mistakenly discovered by Sentinel (#14728) (#14729)
Fix RDB Channel connections mistakenly discovered by Sentinel

During a full sync, if the main replication connection is interrupted but
the rdbchannel connection is still active, it remains visible in the
`INFO replication` output. Currently the rdbchannel connection does not
send `REPLCONF ip-address`, so in a meshed scenario where the source IP
addresses of the two connections differ, Sentinel treats them as
separate replicas. This commit adds `REPLCONF ip-address` to the
rdbchannel replica handshake when `server.slave_announce_ip` is enabled.

fixes: https://github.com/redis/redis/issues/14728

Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com>
2026-02-02 14:02:44 +03:00
Slavomir Kaslev
bafaec5b6a
Fix HOTKEYS to track each command in a MULTI/EXEC block (#14756)
Fix HOTKEYS to track each command in a MULTI/EXEC block.
2026-02-02 09:50:44 +02:00
RoyBenMoshe
bf6287d087
Redact user input in selected logs. (#14748)
This PR continues the work in #14645, further ensuring sensitive user
data is not exposed in logs when hide_user_data_from_log is enabled.

- Redact empty key notices during RDB load.
- Redact key names in eviction/expiration debug logs.
- Block DEBUG SCRIPT output and suppress raw string dumps in crash object
debug when redaction is enabled.
- Redact malformed MODULE LOAD argument snippets and unresolved module
configuration logs.
- Redact key names during Lua globals allow-list warnings.
2026-01-29 23:42:19 +08:00
Martin Dimitrov
0024d5dfde
Vectorize binary quantization path for vectorsets distance calculation (#14492)
This PR adds SIMD vectorization for binary quantization distance
calculation, similar to PR #14474.

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2026-01-29 19:59:48 +08:00
Slavomir Kaslev
ca681f997e
Add LTRIM/LREM and RM_StringTruncate() memory tracking tests (#14751)
Add LTRIM/LREM and RM_StringTruncate() memory tracking tests.
2026-01-29 13:04:46 +02:00
Mincho Paskalev
591fc90263
Change reply schema for hotkeys get to use map instead of flat array (#14749)
Follow-up to #14680.
The reply of `HOTKEYS GET` is an unordered collection of key-value pairs, so
it is more natural to return it as a map in RESP3 than as a flat array.
2026-01-29 11:21:05 +02:00
Filipe Oliveira (Redis)
319153fe46
[Vector sets] Replace manual popcount with __builtin_popcountll for binary vector distance (#13962)
This PR replaces the manual `popcount64()` implementation with
`__builtin_popcountll()` for computing Hamming distance in binary
vectors, when the underlying hardware supports the `POPCNT` instruction.

The built-in version simplifies the code and enables the compiler to
emit a single `POPCNT` instruction on supported CPUs, which is
significantly faster than the manual bitwise method. You can verify the
difference here:
[https://godbolt.org/z/TxWMcE8M3](https://godbolt.org/z/TxWMcE8M3) — the
manual version generates a long sequence of instructions (approximately
34 on modern HW) versus a single `popcnt` instruction when using
`__builtin_popcountll()`.

## Portability across platforms

This change maintains full portability across platforms and compilers.
The use of `__builtin_popcountll()` is guarded by the `HAVE_POPCNT`
macro, which is defined only when the compiler supports the
target("popcnt") attribute. At runtime, we also check
`__builtin_cpu_supports("popcnt")` to ensure the hardware provides
support for the instruction. If not available, the implementation safely
falls back to the original manual `popcount64()` logic.

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2026-01-28 09:42:46 +08:00
debing.sun
beb75e40bf
Fix test failure when using bind "*" in introspection.tcl (#14745)
The failure occurs because when the server is started with `bind *`, the
host is set to `*`, and on reconnect the client cannot resolve that host.
The fix skips checking whether the server is ready in this case.
2026-01-27 20:50:06 +08:00
Sergei Georgiev
cecdc99873
Optimize Redis XREADGROUP CLAIM (#14726)
## Overview
This PR optimizes Redis Streams consumer group performance by replacing
the `pel_by_time` rax tree with a doubly-linked list, delivering
significant performance improvements for NACK updates and XREADGROUP
CLAIM operations while also reducing memory usage.

## The Problem
Consumer groups maintain a time-ordered index of pending entries using a
radix tree (`pel_by_time`). Every time a pending entry is reclaimed or
delivered, we need to update its delivery time, which currently
requires:

```c
raxRemovePelByTime(group->pel_by_time, old_time, &id);  // O(k) where k=key length
nack->delivery_time = current_time;
raxInsertPelByTime(group->pel_by_time, current_time, &id); // O(k) where k=key length
```

## The Key Insight

**99% of delivery_time updates set the value to the current time** —
which means they're appending to the tail of a time-ordered structure.

We're using a radix tree (O(k) operations where k is key length, plus
tree traversal overhead) for what is essentially an append-only workload
(should be O(1)).

## The Solution

Replace the rax tree with a doubly-linked list embedded directly in each
`streamNACK`:

```c
typedef struct streamNACK {
    mstime_t delivery_time;
    uint64_t delivery_count;
    streamConsumer *consumer;
    listNode *cgroup_ref_node;
    streamID id;                    // NEW
    struct streamNACK *pel_prev;    // NEW
    struct streamNACK *pel_next;    // NEW
} streamNACK;
```

Now updating a NACK becomes:
```c
pelListUpdate(group, nack, current_time);  // O(1): unlink + append
```

## Why This Works

**Typical case (99%):** Delivery time = current time
- Unlink from current position: O(1) — just update 2-4 pointers
- Append to tail: O(1) — update tail pointer and link

**Edge case (1%):** XCLAIM with explicit past IDLE time
- Still handled correctly by `pelListInsertSorted()` which scans
backward from tail
- Rare enough that O(N) worst case doesn't matter

## Memory Reduction

The linked list approach uses less memory than the rax tree:

**What we add:**
- 3 new fields in `streamNACK`: `id` (16 bytes) + `pel_prev` (8 bytes) +
`pel_next` (8 bytes) = 32 bytes per entry

**What we remove:**
- Entire `pel_by_time` rax tree with its node overhead (~40-50 bytes per
entry)

**Net result:** Lower memory footprint per pending entry, plus better
cache locality from eliminating the separate tree structure.

## Performance Impact

### Theoretical Analysis

| Operation | Before | After |
|-----------|--------|-------|
| NACK update | O(k) × 2 + tree overhead | O(1) |
| CLAIM iteration | O(k) per entry + traversal | O(1) per entry |

*k = key length (32 bytes: timestamp + stream ID)*

For a consumer group with 10,000 pending entries claiming 100 oldest:
- **Before:** Tree traversal + key comparisons for each operation
- **After:** Simple pointer updates

**Key Findings:**
- **28% higher throughput** for XREADGROUP with CLAIM
- **22% lower average latency** (0.195ms → 0.152ms)
- **21% lower P99 latency** (0.212ms → 0.168ms)
- XADD performance unchanged (69K ops/sec both implementations)
2026-01-27 20:16:16 +08:00
Martin Dimitrov
37f685908e
Vectorized the quantized 8-bit vector distance calculation (#14474)
This pull request vectorizes the 8-bit quantization vector-search path
in a similar way to the non-quantization path.
The assembly intrinsics are a bit more complicated than in the
non-quantization path, since we are operating on 8-bit integers and we
need to worry about preventing overflow. Thus, after loading the 8-bit
integers, they are extended into 16-bits before multiplying and
accumulating into 32-bit integers.

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2026-01-27 09:25:59 +08:00
Yuan Wang
48aa1ce524
Avoid allocating and releasing list node in reply copy avoidance (#14739)
Optimizes handling of clients with referenced replies by embedding the
`pending_ref_reply_node` list node in `client` and avoiding
per-operation node alloc/free.

Benchmarks show an improvement of ~2% with 4 and 16 IO threads, and ~1% with 8 IO threads.
2026-01-26 19:49:36 +08:00
Mincho Paskalev
b209e8afde
Fix hotkey info metric names. Disable HOTKEY SLOTS param for non-cluster (#14742)
Some hotkeys CPU metrics display time in milliseconds, others in
microseconds.

Change the metrics showing time of command executions to all use
microseconds and use the `-us` postfix to show that.

Also, disable the `SLOTS` param for `HOTKEYS START` if we are not in
cluster mode.
2026-01-26 13:32:05 +02:00
Stav-Levi
a765ee8238
Add security configuration warnings at startup (#14708)
Adds startup-time security warnings when the default user permits
unauthenticated access, with behavior dependent on protected-mode and
bind settings.
Warnings are skipped in Sentinel mode, which disables protected-mode
by design.

- No password + no protected-mode + no bind: warn about accepting
  connections from any IP/interface
- No password + no protected-mode: warn about accepting connections
  from any IP on configured interface
- No password + protected-mode enabled: warn about accepting
  connections from local clients
2026-01-26 16:58:53 +08:00
Yuan Wang
6e2cbd51c3
Fix deferred freeing of objects whose refcount is greater than 1 (#14738)
In https://github.com/redis/redis/pull/14440 we removed the refcount
check in
[tryDeferFreeClientObject](235e688b01 (diff-252bce0cc340542712f0c1adf62e9035ea47a4a064321fbf40ec3dd4b814aaf2R1509)).
That was fine in the 8.4 version, since after command execution the
refcount of a kvobject was always 1.
But in #14608 (8.6 RC1) we changed this assumption, incrementing the
refcount when a client references a kvobject in a reply. So now, if the
refcount of a kvobject is greater than 1, we may let an IO thread call
`decrRefCount`; this is a data race and may cause a memory leak.
2026-01-25 18:41:16 +08:00
debing.sun
18538461d1
Add separate statistics for active expiration of keys and hash fields (#14727)
### Summary

Adds `expired_keys_active` and `expired_subkeys_active` counters to
track keys and hash fields expired by the active expiration cycle,
distinguishing them from lazy expirations.
These new metrics are exposed in INFO stats output.

### Motivation

Currently, Redis tracks the total number of expired keys (`expired_keys`)
and expired hash fields (`expired_subkeys`), but there is no way to
differentiate between expirations triggered by the active expire cycle
and lazy expiration.

---------

Co-authored-by: Moti Cohen <moti.cohen@redis.com>
2026-01-22 22:30:25 +08:00
Filipe Oliveira (Redis)
3ff37ea815
Reduce per command syscalls by reusing cached time when HW monotonic clock is available (#14713)
This PR reduces per-command `ustime()` syscalls in `call()` by reusing
cached time and batching wall-clock updates when HW monotonic time is
available.

### What changed
- Pass `server.ustime` to `enterExecutionUnit()` instead of calling
`ustime()`.
- Use HW monotonic clock to measure duration and accumulate it across
commands.
- Refresh cached time with `ustime()` only when accumulated duration >
**10µs** or after **25 commands**.
- Fall back to direct `ustime()` when the HW monotonic clock isn't available.

### Impact
- `ustime` CPU: **4.58% → 0.25%**, which leads to a ~4% boost in max QPS

### Notes
- Time drift is bounded (≤10µs or 25 commands).
- No behavior change on non-HW-monotonic systems.

---------

Co-authored-by: Yuan Wang <yuan.wang@redis.com>
2026-01-22 17:45:45 +08:00
Sergei Georgiev
d099c10581
added: stream-idmp-duration and stream-idmp-maxsize to redis.conf (#14725)
## Summary
Adds missing IDMP configuration parameters to redis.conf, previously
omitted in #14615

## Changes
- Added `stream-idmp-duration` configuration parameter with
documentation
- Added `stream-idmp-maxsize` configuration parameter with documentation
- Both parameters were already implemented in the code (src/config.c,
src/server.h) but were missing from redis.conf

## Configuration Parameters

### stream-idmp-duration
- **Purpose**: Duration (in seconds) to remember IDMP identifiers for
duplicate detection
- **Range**: 1 to 86400 seconds (1 second to 24 hours)
- **Default**: 100 seconds
- **Modifiable**: Yes, via CONFIG SET at runtime

### stream-idmp-maxsize
- **Purpose**: Maximum number of IDMP identifiers to track per producer
per stream
- **Range**: 1 to 10000 entries
- **Default**: 100 entries
- **Modifiable**: Yes, via CONFIG SET at runtime
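
Putting both together, a redis.conf fragment with the documented defaults might look like this (a sketch; the exact comment wording in the shipped redis.conf may differ):

```
# Seconds to remember IDMP identifiers for duplicate detection (1-86400).
stream-idmp-duration 100

# Maximum IDMP identifiers tracked per producer per stream (1-10000).
stream-idmp-maxsize 100
```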
2026-01-22 15:40:57 +08:00
Slavomir Kaslev
5dec7d3675
Add key allocation sizes histogram (#14695)
Add key allocation sizes histograms based on previous memory accounting work
in #14363 and #14451.

The histograms are exposed via `INFO keysizes` and use logarithmic (power-of-2) bins,
similar to current key sizes/length histogram implementation in the following fields:

    db0_distrib_lists_sizes:1=...,2=...,4=...
    db0_distrib_sets_sizes:1=...,2=...,4=...
    db0_distrib_hashes_sizes:1=...,2=...,4=...
    db0_distrib_zsets_sizes:1=...,2=...,4=...

To avoid confusion with the existing distrib_strings_sizes histograms, which are based on
string lengths, we don't report allocation-size histograms for strings.

So far, per-key and per-slot memory accounting code has relied on type-specific functions
(hashTypeAllocSize(), listTypeAllocSize(), zsetAllocSize(), etc.) to compute data structure
allocation sizes, since that is faster and we only need to track size deltas rather than the
complete allocation size along with the kvobj and key-length overhead. To keep the allocation
sizes histogram consistent, the memory accounting code has been switched to kvobjAllocSize(),
which returns the total allocation size.

Note that the feature is enabled with the `key-bytes-stats` or `cluster-slot-stats` config in
the redis config file at startup.
2026-01-22 09:40:04 +02:00
cui
262ed50201
fix: two typos (#14655)
2026-01-22 10:33:58 +08:00
Paulo Sousa
c4baa64ea8
Optimize peak memory stats by switching from per-command checks to threshold-based (#14692)
This PR optimizes peak memory tracking by moving from **per-command
checks** to a **threshold-based mechanism** in `zmalloc`.

Instead of updating peak memory on every command, peak tracking is now
triggered only when a thread's memory delta exceeds **100KB**. This
reduces runtime overhead while keeping peak memory accuracy acceptable.

## Implementation Details

- Peak memory is tracked atomically in `zmalloc` when a thread's memory
delta exceeds 100KB
- Thread-safe peak updates using CAS
- Peak tracking considers both:
  - current used memory
  - zmalloc-reported peak memory

## Performance Results (ARM AArch64)

All performance numbers were obtained on an **AWS m8g.metal (ARM
AArch64)** instance.

The database was pre-populated with **1M keys**, each holding a **1KB
value**.
Benchmarks were executed using memtier with a **10 SET : 90 GET ratio**
and **pipeline = 10** ([full benchmark spec.
here](https://github.com/redis/redis-benchmarks-specification/blob/main/redis_benchmarks_specification/test-suites/memtier_benchmark-1Mkeys-string-setget200c-1KiB-pipeline-10.yml)).

| Environment | Baseline `redis/redis` unstable (median ± std.dev) | Comparison `paulorsousa/redis` `f05a4bd273cb4d63ff03d33e6207837b6e51de86` (median) | % change (higher better) | Note |
|------------------------------|----------------------------------------------------|----------------------------------------------------------------------------------:|--------------------------|-----------------------|
| oss-standalone | 802,830 ± 0.2% (7 datapoints) | 796,660 | -0.8% | No change |
| oss-standalone-02-io-threads | 982,698 ± 0.6% (7 datapoints) | 980,520 | -0.2% | No change |
| oss-standalone-04-io-threads | 2,573,244 ± 1.9% (7 datapoints) | 2,630,931 | +2.2% | Potential improvement |
| oss-standalone-08-io-threads | 2,343,609 ± 1.6% (7 datapoints) | 2,455,630 | +4.8% | Improvement |
2026-01-21 22:52:31 +08:00
Mincho Paskalev
e3c38aab66
Handle primary/replica clients in IO threads (#14335)
# Problem

While introducing Async IO
threads (https://github.com/redis/redis/pull/13695), primary and replica
clients were left to be handled inside the main thread due to data-race
and synchronization issues. This PR solves that, with the additional
hope of increasing replication performance.

# Overview

## Moving the clients to IO threads

Since clients first go through a handshake and an RDB replication
phase, it was decided to move them to an IO thread after RDB replication
is done. For the primary client this was trivial, as the master client is
created only after RDB sync (plus some additional checks, see
`isClientMustHandledByMainThread`). Replica clients, though, are moved to
IO threads immediately after connection (as all clients are), so
currently in `unstable` replication happens while this client is in an
IO thread. In this PR it is moved to the main thread after receiving the
first `REPLCONF` message from the replica; this is a bit hacky and we
can remove it. I didn't find issues between the two versions.

## Primary client (replica node)

We have a few issues here:
- During `serverCron`, `replicationCron` is run, which periodically
sends a `REPLCONF ACK` message to the master and also checks for a
timed-out master. To prevent data races we utilize `IOThreadClientsCron`:
the client is periodically sent to the main thread, and during
`processClientsFromIOThread` it is checked whether it needs to run the
replication cron behaviour.

- Data races with the main thread: specifically, the `lastinteraction` and
`read_reploff` members of the primary client, written in
`readQueryFromClient`, could be accessed at the same time from the main
thread during execution of `INFO REPLICATION` (`genRedisInfoString`). To
solve this, the members were duplicated: if the client is in an IO thread
it writes to the duplicates, which are synced with the original
variables each time the client is sent to the main thread (meaning `INFO
REPLICATION` could potentially return stale values).

- During `freeClient` the primary client is fetched to the main thread, but
when caching it (`replicationCacheMaster`) the thread id remains the
id of the IO thread it came from. This creates problems when resurrecting
the master client. Here the call to `unbindClientFromIOThreadEventLoop`
in `freeClient` was rewritten to call `keepClientInMainThread`, which
automatically fixes the problem.

- During `exitScriptTimedoutMode` the master is queued for reprocessing
(specifically, to process any pending commands ASAP after it's unblocked)
by putting it in the `server.unblocked_clients` list, which is
processed in the next `beforeSleep` cycle in the main thread. Since this
would create contention between the main and IO threads, we skip this
queueing in `unblocked_clients` and just queue the client to the main
thread; `processClientsFromIOThread` will process the pending commands
just as the main thread would have.

## Replica clients (primary node)

We move the client after RDB replication is done and after the replication
backlog has been fed its first message.
We do that so that the client's reference to the first replication
backlog node is initialized before it's read from an IO thread, hence no
contention with the main thread on it.

### Shared replication buffer

Currently in unstable the replication buffer is shared among clients
via clients holding references to the nodes inside the
buffer. A node from the buffer can be trimmed once each replica client
has read it and sent its contents. The reference is
`client->ref_repl_buf_node`. The replication buffer is written to by the
main thread in `feedReplicationBuffer`, and the refcounting is intrusive:
it lives inside the replication-buffer nodes themselves.

Since the replica client changes the refcounts during `writeToClient`
(decreasing the refcount of the node it has just read and increasing the
refcount of the next node it starts to read), we have a data race with the
main thread when it feeds the replication buffer. Moreover, the main
thread also updates the `used` size of a node (how much has been written
to it, compared to its capacity), which the replica client relies on to
know how much to read; a replica in an IO thread creates another data
race here. To mitigate these issues, a few new variables were added to
the client struct:

- `io_curr_repl_node` - the node this replica starts reading from
inside the IO thread
- `io_bound_repl_node` - the last node in the replication buffer the
replica sees before being sent to the IO thread.

These values are only allowed to be updated in the main thread. The client
keeps track of how far it has read into the buffer via the old
`ref_repl_buf_node`. Generally, while in an IO thread the replica client
keeps a refcount on `io_curr_repl_node` until it has processed
all the nodes up to `io_bound_repl_node`; at that point it's returned to
the main thread, which can safely update the refcounts.
The `io_bound_repl_node` reference exists so the replica knows when to
stop reading from the repl buffer: imagine the replica reading from the
last node of the replication buffer while the main thread feeds data to
it; we would create a data race on the `used` value
(`_writeToClientSlave` (IO thread) vs `feedReplicationBuffer` (main)).
That's why this value is updated just before the replica is sent
to an IO thread.
*NOTE*: this means that when replicas are handled by IO threads they
will hold more than one node at a time (i.e. `io_curr_repl_node` up to
`io_bound_repl_node`), meaning trimming will happen a bit less
frequently. Tests show no significant problems with that.
(thanks to @ShooterIT for the `io_curr_repl_node` and `io_bound_repl_node`
mechanism, as my initial implementation had similar semantics but was way
less clear)

Example of how this works:

* Replication buffer state at time N:
   | node 0| ... | node M, used_size K |
* replica caches `io_curr_repl_node`=0, `io_bound_repl_node`=M and
`io_bound_block_pos`=K
* replica moves to IO thread and processes all the data it sees
* Replication buffer state at time N + 1:
| node 0| ... | node M, used_size Full | |node M + 1| |node M + 2,
used_size L|, where Full > K
* replica moves to main thread at time N + 1, at this point the following
happens
   - refcount of node 0 (io_curr_repl_node) is decreased
- `ref_repl_buf_node` becomes node M (io_bound_repl_node) (only the
first K bytes of it have been processed so far)
- refcount of node M is increased (now all nodes from 0 through M-1
inclusive can be trimmed, unless some other replica holds a reference to
them)
- And just before the replica is sent back to the IO thread the
following are updated:
   - `io_bound_repl_node` ref becomes node M+2
   - `io_bound_block_pos` becomes L

Note that the replica client is only moved back to the main thread once
it has processed all the data it knows about (i.e. up to
`io_bound_repl_node` + `io_bound_block_pos`).
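The hand-off described above can be sketched roughly like this (a minimal illustration with hypothetical struct shapes and function names, not the actual Redis source):

```c
#include <stddef.h>

/* Hypothetical, simplified node and client types; the real structures
 * in Redis carry much more state. */
typedef struct replBufNode {
    long refcount;               /* how many replicas still reference us */
    struct replBufNode *next;
} replBufNode;

typedef struct replicaClient {
    replBufNode *io_curr_repl_node;  /* node held while in the IO thread  */
    replBufNode *io_bound_repl_node; /* last node visible before hand-off */
    replBufNode *ref_repl_buf_node;  /* classic reference, main-thread only */
} replicaClient;

/* Runs in the main thread when the replica returns from the IO thread:
 * release the node it started from and pin the node it stopped at. */
void replicaBackToMain(replicaClient *c) {
    c->io_curr_repl_node->refcount--;
    c->ref_repl_buf_node = c->io_bound_repl_node;
    c->ref_repl_buf_node->refcount++;
    /* every node before io_bound_repl_node is now trimmable,
     * unless another replica still references it */
}
```

Because both refcount updates happen only in the main thread, the IO thread never touches refcounts at all.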

### Replica clients kept in main as much as possible

During implementation an issue arose - how fast can the replica client
learn about new data in the replication buffer, and how fast can the
buffer be trimmed. In order for that to happen ASAP, whenever a replica
is moved to the main thread it remains there until the replication
buffer is fed new data. At that point it is put in the pending write
queue and special-cased in `handleClientsWithPendingWrites` so that it
is sent to an IO thread ASAP to write the new data to the replica. Also,
since each time the replica writes all the repl data it knows about,
once it is sent back to the main thread `processClientsFromIOThread` is
able to immediately update the refcounts and trim whatever it can.

### ACK messages from replicas

The primary needs to periodically read `REPLCONF ACK` messages from
replica clients. Since a replica can remain in the main thread
indefinitely if no DB change occurs, a new atomic flag, `pending_read`,
was added; it is set during `readQueryFromClient`. If a replica client
has a pending read it is sent back to the IO thread in order to process
the read even if there is no pending repl data to write.
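A minimal sketch of such a flag, assuming C11 atomics (the names and the surrounding types are illustrative, not the Redis source):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative client with the atomic flag; in Redis the flag lives in
 * the much larger client struct. */
typedef struct client {
    atomic_bool pending_read;
} client;

/* Called when the connection becomes readable while the replica sits
 * in the main thread: just mark it. */
void markPendingRead(client *c) {
    atomic_store(&c->pending_read, true);
}

/* Main thread: decide whether the replica must go back to an IO thread
 * even though there is no repl data to write. Clears the flag. */
bool takePendingRead(client *c) {
    return atomic_exchange(&c->pending_read, false);
}
```

`atomic_exchange` both reads and clears the flag in one step, so a read event cannot be lost between the check and the reset.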

### Replicas during shutdown

During shutdown the main thread pauses write actions and periodically
checks whether all replicas have reached the same replication offset as
the primary node. During `finishShutdown` that may or may not be the
case. Either way, client data may be read from the replicas, and we may
even try to write pending data to them inside `flushSlavesOutputBuffers`.
In order to prevent races, all the replicas in IO threads are moved to
the main thread via `fetchClientFromIOThread`. Cancelling the shutdown
should be fine, since the mechanism employed by
`handleClientsWithPendingWrites` will return the client back to an IO
thread when needed.

## Notes

While adding new tests, timing issues in TSan tests were found and
fixed.

There is also a data race caught by TSan on the `last_error` member of
the `client` struct. It happens when both an IO thread and the main
thread make a syscall using the same `client` instance - this can happen
only for primary and replica clients, since their data can be accessed
by commands sent from other clients. A specific example is the `INFO
REPLICATION` command.
Although other such races were fixed, as described above, this one is
insignificant and it was decided to ignore it in `tsan.sup`.

---------

Co-authored-by: Yuan Wang <wangyuancode@163.com>
Co-authored-by: Yuan Wang <yuan.wang@redis.com>
2026-01-21 16:19:12 +02:00
Slavomir Kaslev
b9c00b27f8
Make cluster-slot-stats-enabled config multivalued (#14719)
This allows users to specify exactly what per slot statistics are to be
collected -- CPU, network traffic and/or memory used.

The config accepts multiple values as a space-separated list:
  - cpu: Track CPU usage per slot (cpu-usec metric)
  - net: Track network bytes per slot (network-bytes-in, network-bytes-out metrics)
  - mem: Track memory usage per slot (memory-bytes metric)
  - yes: Enable all tracking (equivalent to "cpu net mem")
  - no: Disable all tracking (default)

Note: Memory tracking (mem) can ONLY be enabled at startup. If you try to enable
memory tracking via CONFIG SET when it wasn't enabled at startup, the command will
fail. However, you can disable memory tracking at runtime by removing the 'mem' flag.
Once disabled, memory tracking cannot be re-enabled without restarting the server.
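One plausible way to parse such a space-separated multivalued config into bit flags (a sketch only; the flag names and the parsing helper are hypothetical, not the actual Redis config code):

```c
#include <string.h>
#include <stddef.h>

#define SLOT_STATS_CPU (1 << 0)
#define SLOT_STATS_NET (1 << 1)
#define SLOT_STATS_MEM (1 << 2)
#define SLOT_STATS_ALL (SLOT_STATS_CPU | SLOT_STATS_NET | SLOT_STATS_MEM)

/* Returns a bitmask of enabled trackers, or -1 on an unknown token. */
int parseSlotStatsConfig(const char *value) {
    if (strcmp(value, "yes") == 0) return SLOT_STATS_ALL;
    if (strcmp(value, "no") == 0) return 0;
    int flags = 0;
    const char *p = value;
    while (*p) {
        while (*p == ' ') p++;           /* skip separators */
        if (!*p) break;
        const char *start = p;
        while (*p && *p != ' ') p++;     /* scan one token */
        size_t len = (size_t)(p - start);
        if (len == 3 && strncmp(start, "cpu", 3) == 0)      flags |= SLOT_STATS_CPU;
        else if (len == 3 && strncmp(start, "net", 3) == 0) flags |= SLOT_STATS_NET;
        else if (len == 3 && strncmp(start, "mem", 3) == 0) flags |= SLOT_STATS_MEM;
        else return -1;                  /* reject unknown tokens */
    }
    return flags;
}
```

The "mem at startup only" rule described above would then be an extra check on the resulting mask inside the CONFIG SET handler.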
2026-01-21 15:36:03 +02:00
EmilyZHANG00
5656e99c7c
Modify the condition for adding CLIENT_IO_CLOSE_ASAP flag (#14709)
1. CLIENT_IO_CLOSE_ASAP is a flag for c->io_flags, which does not match
c->flags. The flag corresponding to c->flags is CLIENT_CLOSE_ASAP.
2. If we want to asynchronously free a client running on an io-thread,
we should check its c->io_flags to determine if CLIENT_IO_CLOSE_ASAP has
already been added. If it hasn't been added before, then
CLIENT_IO_CLOSE_ASAP should be added.
2026-01-21 19:39:15 +08:00
Yuan Wang
a2e901c93d
Fix inaccurate IO thread client count due to delayed freeing (#14723)
There is a failure in CI:
```
*** [err]: Clients are evenly distributed among io threads in tests/unit/introspection.tcl
Expected '2' to be equal to '1' (context: type eval line 3 cmd {assert_equal $cur_clients 1} proc ::start_server)
```

There might be a client used for health checks (to detect if the server
is up) that has not been freed in time. This can lead to an inaccurate
count of connected clients processed by IO threads, so we wait for it
to close completely.
2026-01-21 18:13:40 +08:00
Stav-Levi
25f780b662
Fix crash when calling internal container command without arguments (#14690)
Addresses crash and clarifies errors around container commands.

- Update server.c to handle container commands with no subcommand: emit
"missing subcommand. Try HELP."; keep "unknown subcommand" for invalid
subcommands; for unknown commands, include args preview only when
present
- Add a test module command subcommands.internal_container with a
subcommand for validation
- Add unit test asserting missing subcommand error when calling the
internal container command without arguments
2026-01-21 08:38:04 +02:00
Yuan Wang
5c5c7c5a2c
Quick user ACL permission verification (#14714)
Optimizes ACL evaluation by adding a fast path for fully privileged users.
2026-01-21 14:34:53 +08:00
Slavomir Kaslev
818046d031
Avoid memory allocation on quicklist iteration (#14720)
This change is the last in the series (see #14200 and #14473) where we
store iterators on the stack rather than allocating them on the heap.
2026-01-21 08:09:49 +02:00
Yuan Wang
e8240017fd
Fix prefetch size (#14715)
Fixes prefetch sizing so when the remaining work is smaller than
the effective max batch (2× configured), Redis prefetches it all
at once instead of splitting into an inefficient tiny tail batch.
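The sizing rule can be expressed roughly as follows (a sketch of the idea with illustrative names, not the PR's exact code):

```c
#include <stddef.h>

/* If what's left fits within twice the configured max batch, take it all;
 * otherwise take a full max batch and leave the rest for the next round.
 * This avoids ending with a tiny, inefficient tail batch. */
size_t nextPrefetchBatch(size_t remaining, size_t max_batch) {
    if (remaining <= 2 * max_batch) return remaining;
    return max_batch;
}
```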
2026-01-20 19:43:57 +08:00
debing.sun
e76e3af5b7
Fix some test timing issues in replication.tcl and maxmemory.tcl (#14718)
1) Replace fixed sleep with wait_for_condition to avoid flaky test
failures when checking master_current_sync_attempts counter.

2) Similar to https://github.com/redis/redis/pull/14674, use
assert_lessthan_equal instead of assert_lessthan to verify the idle
time.
2026-01-20 19:25:15 +08:00
debing.sun
d2da5cca37
Fix timeout waiting for blocked clients in pause test (#14716)
To verify the pause duration, we need to wait for the client to be
unpaused and the command to complete, so add `$rd read` to wait for the
command to finish.

The test failure was caused by $rd still being blocked and not closed in
the previous test, so the next test would get 2 blocked clients instead
of 1 client, causing the test to fail.
2026-01-20 17:12:22 +08:00
Omer Shadmi
8816bcd973
MOD-13504: Update Search to RC1 8.5.90 (#14717)
Update Search to 8.6 RC1 version 8.5.90
2026-01-20 11:07:15 +02:00
Mincho Paskalev
1ab0cd228f
Add hotkeys memory to memory-overhead and fix hotkeys info keys (#14711)
Add the memory overhead of the hotkeyStats structure to
`used_memory_overhead`, add `hotkeys-` prefix to hotkey keys in INFO and
remove `used_memory` in the hotkeys info section as it's unneeded (too
little memory for us to care about).

Tnx @oranagra for pointing
[this](https://github.com/redis/redis/pull/14680#discussion_r2702151804)
out.
2026-01-20 10:35:00 +02:00
Yuan Wang
cfa6129040
Minor fixes for ASM (#14707)
- **TCL test failure**

https://github.com/redis/redis/actions/runs/21121021310/job/60733781853#step:6:5705
```
[err]: Test cluster module notifications when replica restart with RDB during importing
in tests/unit/cluster/atomic-slot-migration.tcl
Expected '{sub: cluster-slot-migration-import-started, source_node_id:28c64b3f462f3c29aa3c96c2ba5dff948dfe315b, destination_node_id:1382a4b4ca86621e39068ee8b25524a44a21bbc1, task_id:4d185a5398be94edac0dd77fff094eb7f5c73ec4, slots:0-100}' to be equal to '{sub: cluster-slot-migration-import-started, source_node_id:28c64b3f462f3c29aa3c96c2ba5dff948dfe315b, destination_node_id:1382a4b4ca86621e39068ee8b25524a44a21bbc1, task_id:4d185a5398be94edac0dd77fff094eb7f5c73ec4, slots:0-100} {sub: cluster-slot-migration-import-completed, source_node_id:28c64b3f462f3c29aa3c96c2ba5dff948dfe315b, destination_node_id:1382a4b4ca86621e39068ee8b25524a44a21bbc1, task_id:4d185a5398be94edac0dd77fff094eb7f5c73ec4, slots:0-100}' (context: type eval line 29 cmd {assert_equal  [list  "sub: cluster-slot-migration-import-started, source_node_id:$src_id, destination_node_id:$dest_id, task_id:$task_id, slots:0-100"  ] [R 4 asm.get_cluster_event_log]} proc ::test)
```
If there is a delay before the check runs, the ASM task may complete, so
we would get both `started & completed` ASM logs instead of only the
`started` log. This feels fragile, so delete the check; we check all
logs later.
```
                restart_server -4 true false true save ;# rdb save
---> if there is a delay, the ASM task should complete
                # the asm task info in rdb will fire module event
                assert_equal  [list \
                    "sub: cluster-slot-migration-import-started, source_node_id:$src_id, destination_node_id:$dest_id, task_id:$task_id, slots:0-100" \
                ] [R 4 asm.get_cluster_event_log]
```
- **Start BGSAVE for slot snapshot ASAP**
Since we consider the migrating client a replica that wants diskless
replication, it will wait for `repl-diskless-sync-delay` to start a
new fork after the last child exits. But a slot snapshot cannot actually
be shared with other slaves, so we can start BGSAVE for it immediately.

  also resolve internal ticket RED-177974.
2026-01-19 19:57:20 +08:00
Tom Gabsow
c42d07a76e
MOD-13505 Update DataType Modules to 8.5.90 (#14705)
update data type modules to 8.6 RC1
time series v8.5.90
bloom v8.5.90
json v8.5.90
2026-01-19 09:37:07 +02:00
Filipe Oliveira (Redis)
c118f91b25
Optimize lpDecodeBacklen() fast paths by removing loop-based decoding (#14662)
## Optimization details

The current `lpDecodeBacklen()` implementation decodes the backlen using
a loop with backward pointer mutation and a branch-heavy termination
condition:

```c
do {
    val |= (uint64_t)(p[0] & 127) << shift;
    if (!(p[0] & 128)) break;
    shift += 7;
    p--;
    if (shift > 28) return UINT64_MAX;
} while(1);
```

While correct, this structure introduces avoidable overhead in hot
paths:

- repeated loop control
- unpredictable branch (if (!(p[0] & 128)) break)
- increased front-end pressure and bad speculation

### Optimization

This PR replaces the loop with a straight-line implementation optimized
for the common case:

- explicit fast paths for 1–2 byte backlen encodings (dominant in
practice)
- unrolled handling up to the maximum 5-byte encoding
- no pointer mutation, no loop, fewer branches
- identical encoding semantics and validation behavior

leading to a 5.3% boost on listpack iterator heavy benchmark on HASH
datatype.
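An unrolled decoder with the same semantics as the loop above might look like this (illustrative reconstruction; the PR's actual code may differ):

```c
#include <stdint.h>

/* p points at the LAST byte of the backlen; bytes are consumed backwards.
 * Each byte contributes 7 bits; a set high bit means "more bytes follow".
 * Anything longer than 5 bytes is invalid, mirroring the shift > 28 check. */
uint64_t decodeBacklenUnrolled(const unsigned char *p) {
    uint64_t val = p[0] & 127;
    if (!(p[0] & 128)) return val;                 /* 1-byte fast path */
    val |= (uint64_t)(p[-1] & 127) << 7;
    if (!(p[-1] & 128)) return val;                /* 2-byte fast path */
    val |= (uint64_t)(p[-2] & 127) << 14;
    if (!(p[-2] & 128)) return val;
    val |= (uint64_t)(p[-3] & 127) << 21;
    if (!(p[-3] & 128)) return val;
    val |= (uint64_t)(p[-4] & 127) << 28;
    if (!(p[-4] & 128)) return val;
    return UINT64_MAX;                             /* too long: invalid */
}
```

The 1- and 2-byte paths cover the dominant cases, and each path is branch-predictable straight-line code with no pointer mutation.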
2026-01-19 14:23:15 +08:00
debing.sun
39881fa6f2
Reply Copy Avoidance (#14608)
This PR is based on https://github.com/valkey-io/valkey/pull/2078

# Reply Copy Avoidance Optimization

This PR introduces an optimization to avoid unnecessary memory copies
when sending replies to clients in Redis.

## Overview

Currently, Redis copies reply data into client output buffers before
sending responses. This PR implements a mechanism to avoid these copies
in certain scenarios, improving performance and reducing memory
overhead.

### Key Changes
* Added a capability to reply construction allowing regular replies to
be interleaved with copy-avoid replies in client reply buffers
* Extended write-to-client handlers to support copy-avoid replies
* Added copy avoidance of string bulk replies when copy avoidance is
indicated by I/O threads
* Copy avoidance is beneficial for performance, regardless of object
size, only once a certain number of IO threads are running, so it is
enabled only from that thread count onward.

**Note**: When copy avoidance is disabled, the content and handling of
client reply buffers remain as before this PR

---------

Signed-off-by: Alexander Shabanov <alexander.shabanov@gmail.com>
Signed-off-by: xbasel <103044017+xbasel@users.noreply.github.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Alexander Shabanov <alexander.shabanov@gmail.com>
Co-authored-by: xbasel <103044017+xbasel@users.noreply.github.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Slavomir Kaslev <slavomir.kaslev@gmail.com>
Co-authored-by: moticless <moticless@github.com>
Co-authored-by: Yuan Wang <yuan.wang@redis.com>
2026-01-19 11:09:16 +08:00
Moti Cohen
609acaad02
Optimize zset to use dict with no_value=1 (#14701)
* Embed sds element inside skiplist nodes: Changed zset dict to store
zskiplistNode* as keys (with no_value=1) instead of storing sds keys and
double* values, eliminating redundant sds storage and enabling
single-allocation nodes
* Single allocation for skiplist nodes: Each node now contains: fixed
fields + level[] array + embedded sds, reducing memory fragmentation and
allocation overhead. This optimization is based on https://github.com/valkey-io/valkey/pull/1427
* Optimize lookups with dictFindLink: Use dictFindLink in zsetAdd to
avoid double hash table lookup when inserting new elements (find + add
becomes single operation)
* Simplify score updates
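The single-allocation sizing can be sketched like so (struct shapes simplified and hypothetical; the real node carries more fields):

```c
#include <stddef.h>

typedef struct zskiplistLevelEntry {
    void *forward;
    unsigned long span;
} zskiplistLevelEntry;

/* Size of one allocation holding: fixed header fields, the level[] array,
 * and the element string embedded at the end (plus a NUL terminator). */
size_t zslNodeAllocSize(int level, size_t element_len) {
    size_t header = sizeof(double)   /* score */
                  + sizeof(void *);  /* backward pointer */
    return header
         + (size_t)level * sizeof(zskiplistLevelEntry)
         + element_len + 1;          /* embedded sds-like string */
}
```

Everything lives in one malloc'd block, so the dict can point straight at the node and still reach the element string without a second allocation.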
2026-01-18 14:52:50 +02:00
Filipe Oliveira (Redis)
7f541b9607
Prefetch client fields before prefetching command-related data (#14700)
This PR refines the prefetch strategy by removing ineffective (too close
in the pipeline) dictionary-level prefetching and improving prefetch
usage in IO threads. The goal is to better align prefetches with
predictable access patterns.

## Changes

- Removed speculative prefetching from `dictFindLinkInternal()`,
simplifying the dictionary lookup hot path.
- Introduced a two-phase prefetch approach in
`prefetchIOThreadCommands()`:
  - Phase 1: Prefetch client structures and `pending_cmds`
- Phase 2: Add commands to the batch and prefetch follow-up fields
(`reply`, `mem_usage_bucket`)

## Performance

Measured with
`memtier_benchmark-1Mkeys-string-setget2000c-1KiB-pipeline-16`.

| Environment                  | % change |
|-----------------------------|----------|
| oss-standalone               | -0.1%    |
| oss-standalone-02-io-threads | +0.4%    |
| oss-standalone-04-io-threads | +1.6%    |
| oss-standalone-08-io-threads | +2.3%    |
| oss-standalone-12-io-threads | +0.7%    |
| oss-standalone-16-io-threads | +1.9%    |

Overall, this shows an ~2% throughput improvement on IO-threaded
configurations, with no meaningful impact on non-IO-threaded setups.
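The two-phase structure might be sketched as follows (the client layout and function name are hypothetical; `__builtin_prefetch` is the GCC/Clang intrinsic):

```c
#include <stddef.h>

/* Hypothetical, trimmed-down client; the real struct is much larger. */
typedef struct client {
    void *pending_cmds;
    void *reply;
    void *mem_usage_bucket;
} client;

/* Phase 1 touches the client structs themselves; by the time phase 2
 * runs, their pointer fields are (hopefully) already in cache, so
 * dereferencing them to issue the follow-up prefetches is cheap.
 * Returns the number of clients processed. */
size_t prefetchClients(client **clients, size_t n) {
    for (size_t i = 0; i < n; i++) {          /* phase 1 */
        __builtin_prefetch(clients[i]);
        __builtin_prefetch(&clients[i]->pending_cmds);
    }
    for (size_t i = 0; i < n; i++) {          /* phase 2 */
        __builtin_prefetch(clients[i]->reply);
        __builtin_prefetch(clients[i]->mem_usage_bucket);
    }
    return n;
}
```

Prefetch instructions never fault, so issuing them for not-yet-valid pointers is architecturally safe; they are hints only.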

---------

Co-authored-by: Yuan Wang <wangyuancode@163.com>
2026-01-18 20:14:39 +08:00
Mincho Paskalev
c93e4a62c6
Add hotkeys detection (#14680)
# Description

Introducing a new method for identifying hotkeys inside a Redis server
during a tracking time period.

Hotkeys in this context are defined by two metrics:
* Percentage of CPU time spent on the key, out of the total time during
the tracking period
* Percentage of network bytes (input+output) used for the key, out of
the total network bytes used by Redis during the tracking period

## Usage

Although the API is subject to change the general idea is for the user
to initiate a hotkeys tracking process which should run for some time.
The keys' metrics are recorded inside a probabilistic structure and
after that the user is able to fetch the top K of them.

### Current API

```
HOTKEYS START
            <METRICS count [CPU] [NET]>
            [COUNT k] 
            [DURATION duration]
            [SAMPLE ratio]
            [SLOTS count slot…]

HOTKEYS GET
HOTKEYS STOP
HOTKEYS RESET

```

### HOTKEYS START

Start a tracking session if none is already started, or the previous one
was stopped or reset. Returns an error if one is in progress.

* METRICS count [CPU] [NET] - choose one or more metrics to track
* COUNT k - track top K keys
* DURATION duration - preset how long the tracking session should last
* SAMPLE ratio - a key is tracked with probability 1/ratio
* SLOTS count slot... - Only track a key if it's in a slot amongst the
chosen ones

### HOTKEYS GET

Returns an array of the chosen tracked metrics and various other
metadata, or (nil) if no tracking was started or it was reset.

```
127.0.0.1:6379> hotkeys get
1) "tracking-active"
2) 1
3) "sample-ratio"
4) <ratio>
5) "selected-slots" (empty array if no slots selected)
6) 1) 0
   2) 5
   3) 6
7) "sampled-command-selected-slots-ms" (show on condition sample-ratio > 1 and selected-slots != empty-array)
8) <time-in-milliseconds>
9) "all-commands-selected-slots-ms" (show on condition selected-slots != empty-array)
10) <time-in-milliseconds>
11) "all-commands-all-slots-ms"
12) <time-in-milliseconds>
13) "net-bytes-sampled-commands-selected-slots" (show on condition sample-ratio > 1 and selected-slots != empty-array)
14) <num-bytes>
15) "net-bytes-all-commands-selected-slots" (show on condition selected-slots != empty-array)
16) <num-bytes>
17) "net-bytes-all-commands-all-slots"
18) <num-bytes>
19) "collection-start-time-unix-ms"
20) <start-time-unix-timestamp-in-ms>
21) "collection-duration-ms"
22) <duration-in-milliseconds>
23) "used-cpu-sys-ms"
24) <duration-in-millisec>
25) "used-cpu-user-ms"
26) <duration-in-millisec>
27) "total-net-bytes"
28) <num-bytes>
29) "by-cpu-time"
30) 1) key-1_1
    2) <millisec>
    ...
    19) key-10_1
    20) <millisec>
31) "by-net-bytes"
32) 1) key-1_2
    2) <num-bytes>
    ...
    19) key-10_2
    20) <num-bytes>

```

### HOTKEYS STOP

Stop tracking session but user can still get results from `HOTKEYS GET`.

### HOTKEYS RESET

Release the resources used for hotkeys tracking; allowed only when
tracking is stopped. Returns an error if tracking is active.

## Additional changes

The `INFO` command now has a "hotkeys" section with 3 fields:
* tracking_active - a boolean flag indicating whether or not hotkeys are
currently being tracked.
* used-memory - memory overhead of the structures used for hotkeys
tracking.
* cpu-time - time in ms spent updating the hotkey structure.

## Implementation

Independent of the API, the implementation is based on a probabilistic
structure - [Cuckoo Heavy
Keeper](https://dl.acm.org/doi/abs/10.14778/3746405.3746434) - with an
added min-heap to keep track of the top K hotkeys' names. CHK is loosely
based on
[HeavyKeeper](https://www.usenix.org/conference/atc18/presentation/gong),
which is used in RedisBloom's TopK, but has higher throughput.

Random fixed-probability sampling is available via the `HOTKEYS start
sample <ratio>` param. Each key is sampled with probability `1/ratio`.

## Performance implications

With a low enough sample rate (controlled by `HOTKEYS start sample
<ratio>`) there is a negligible performance hit. Tracking every key,
though, can incur up to a 15% hit in [the worst
case](https://github.com/redis/redis-benchmarks-specification/blob/main/redis_benchmarks_specification/test-suites/memtier_benchmark-1Mkeys-string-get-10B-pipeline-500.yml),
measured by running the tests in this
[bench](https://github.com/redis/redis-benchmarks-specification/).

---------

Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com>
Co-authored-by: Slavomir Kaslev <slavomir.kaslev@gmail.com>
Co-authored-by: debing.sun <debing.sun@redis.com>
2026-01-16 17:15:28 +02:00
Filipe Oliveira (Redis)
9bbeaa312d
Avoid unnecessary command rewriting for SET command with expiration (#14699)
This PR optimizes SET commands with expiration options (EX/PX/EXAT) by
using in-place argument replacement instead of full command vector
rewrite during replication propagation/AOF.

`rewriteClientCommandVector` was taking ~6% of CPU cycles on a mixed
SET+GET benchmark with expiration; this change eliminates that overhead.
2026-01-16 09:57:37 +08:00
Moti Cohen
11e73c66a8
Modules KeyMeta (Keys Metadata) (#14445)
Modules KeyMeta (Keys Metadata)

Redis modules often need to associate additional metadata with keys in
the keyspace. The objective is to create a unified and extensible
interface, usable by modules, Redis core, and maybe later by users, that
facilitates the association and management of metadata with keys.
While extending RedisModuleTypes might be an easier path, this proposal
goes one step further: a general-purpose mechanism that lets modules
attach metadata to any key, independent of the underlying data type.

A major part of this feature involves defining how metadata is managed
throughout a key’s lifecycle. Modules will be able to optionally
register distinct metadata classes, each with its own lifecycle
callbacks and capable of storing arbitrary 8-byte value per key. These
metadata values will be embedded directly within Redis’s core key-value
objects to ensure fast access and automatic callback execution as keys
are created, updated, or deleted. Each 8 bytes of metadata can represent
either a simple primitive value or a pointer/handle to more complex,
externally managed data by the module and RDB serialized along with the
key.

Key Features:
- Modules can register up to 7 metadata classes (8 total, 1 reserved)
- Each class: 4-char name + 5-bit version (e.g., "SRC1" v1)
- Each class attaches 8 bytes per key (value or pointer/handle)
- Separate namespace from module data types

Module API:
- RedisModule_CreateKeyMetaClass() - Register metadata class
- RedisModule_ReleaseKeyMetaClass() - Release metadata class
- RedisModule_SetKeyMeta() - Attach/update metadata
- RedisModule_GetKeyMeta() - Retrieve metadata

Lifecycle Callbacks:
- copy, rename, move - Handle key operations
- unlink, free - Handle key deletion/expiration
- rdb_save, rdb_load - RDB persistence
- aof_rewrite - AOF rewrite support

Implementation:
- Metadata slots allocated before kvobj in reverse class ID order
- 8-bit metabits bitmap tracks active classes per key
- Minimal memory overhead - only allocated slots consume memory

RDB Serialization (v13):
- New opcode RDB_OPCODE_KEY_METADATA
- Compact 32-bit class spec: 24-bit name + 5-bit ver + 3-bit flags
- Self-contained format: [META,] TYPE, KEY, VALUE
- Portable across cluster nodes

Integration:
- Core ops: dbAdd, dbSet, COPY, MOVE, RENAME, DELETE
- DUMP/RESTORE support
- AOF rewrite via module callbacks
- Defragmentation support
- Module type I/O refactored to ModuleEntityId
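The 8-bit metabits bookkeeping reduces to simple bit operations (a sketch under the layout described above; the helper names are illustrative):

```c
#include <stdint.h>

/* One bit per metadata class (class IDs 0..7, 8 classes total). */
static inline int keyHasMeta(uint8_t metabits, int class_id) {
    return (metabits >> class_id) & 1;
}
static inline uint8_t keySetMeta(uint8_t metabits, int class_id) {
    return metabits | (uint8_t)(1u << class_id);
}
static inline uint8_t keyClearMeta(uint8_t metabits, int class_id) {
    return metabits & (uint8_t)~(1u << class_id);
}
```

Since only the slots whose bits are set are allocated (in reverse class ID order before the kvobj), a key with no metadata pays nothing beyond the bitmap itself.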
2026-01-15 23:11:17 +02:00
Sergei Georgiev
221409788a
Add idempotency support to XADD via IDMPAUTO and IDMP parameters (#14615)
# Overview

This PR introduces idempotency support to Redis Streams' XADD command,
enabling automatic deduplication of duplicate message submissions
through optional IDMPAUTO and IDMP parameters with producer
identification. This enables reliable at-least-once delivery while
preventing duplicate entries in streams.

## Problem Statement

Current Redis Streams implementations lack built-in idempotency
mechanisms, making reliable at-least-once delivery impossible without
accepting duplicates:

- **Application-level tracking**: Developers must maintain separate data
structures to track submitted messages
- **Race conditions**: Network failures and retries can result in
duplicate stream entries
- **Complexity overhead**: Each producer must implement custom
deduplication logic
- **Memory inefficiency**: External deduplication systems duplicate
Redis's storage capabilities

This lack of native idempotency support creates reliability challenges
in distributed systems where at-least-once delivery semantics are
required but exactly-once processing is desired.

## Solution

Extends XADD with optional idempotency parameters that include producer
identification:

```
XADD key [NOMKSTREAM] [KEEPREF | DELREF | ACKED] [IDMPAUTO pid | IDMP pid iid] [MAXLEN | MINID [= | ~] threshold [LIMIT count]] <* | id> field value [field value ...]
```

### Producer ID (pid)

- **pid** (producer id): A unique identifier for each producer
- Must be unique per producer instance
- Producers must use the same pid after restart to access their
persisted idempotency tracking
- Enables per-producer idempotency tracking, isolating duplicate
detection between different producers

**Format**: Binary or string, recommended max 36 bytes

**Generation**: 
- **Recommended**: UUID v4 for globally unique identification
- **Alternative**: `hostname:process_id` or application-assigned IDs

### Idempotency Modes

**IDMPAUTO pid (Automatic Idempotency)**:

- Producer specifies its pid, Redis automatically calculates a unique
idempotent ID (iid) based on entry content
- Hash calculation combines XXH128 hashing of individual field-value
pairs using an order-independent Sum + XOR approach with rotation (each
pair: `XXH128(field || field_length || value)`)
- 16-byte binary iid with extremely low accidental collision probability
- XXH128 is a non-cryptographic hash function: fast and
well-distributed, but does NOT prevent intentional collision attacks
- For protection against adversarial collision crafting, use IDMP mode
with cryptographically-signed idempotent IDs
- Order-independent: field ordering does not affect the calculated iid
- If (pid, iid) pair exists in producer's IDMP map: returns existing
entry ID without creating duplicate entry
- Generally slower than manual mode due to hash calculation overhead
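The order-independent Sum + XOR combination described above can be sketched as follows. This is a minimal illustration, not the actual implementation: `hashlib.blake2b` stands in for XXH128, and the exact rotation and final-combination scheme are assumptions.

```python
import hashlib

MASK = (1 << 128) - 1

def h128(data: bytes) -> int:
    # Stand-in for XXH128: any well-distributed 128-bit hash works for
    # illustration; the real implementation uses the faster XXH128.
    return int.from_bytes(hashlib.blake2b(data, digest_size=16).digest(), "big")

def rotl128(x: int, r: int) -> int:
    # 128-bit left rotation.
    return ((x << r) | (x >> (128 - r))) & MASK

def auto_iid(fields: dict) -> bytes:
    # Hash each field-value pair with a field-length separator to avoid
    # ambiguous concatenations (e.g. "ab"+"c" vs "a"+"bc").
    acc_sum, acc_xor = 0, 0
    for field, value in fields.items():
        pair_hash = h128(field + len(field).to_bytes(4, "big") + value)
        # Sum and XOR are both commutative, so field order is irrelevant.
        acc_sum = (acc_sum + pair_hash) & MASK
        acc_xor ^= rotl128(pair_hash, 1)
    # Combine the two order-independent accumulators into a 16-byte iid.
    return ((acc_sum ^ acc_xor) & MASK).to_bytes(16, "big")

# Field order does not change the result:
a = auto_iid({b"temp": b"21", b"unit": b"C"})
b = auto_iid({b"unit": b"C", b"temp": b"21"})
assert a == b
```

The rotation applied before the XOR accumulator keeps the two accumulators from cancelling each other out for single-pair entries.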

**IDMP pid iid (Manual Idempotency)**:

- Caller provides explicit producer id (pid) and idempotent ID (iid) for
deduplication
- iid must be unique per message (either globally or per pid)
- Faster processing than IDMPAUTO (no hash calculation overhead)
- Enables shorter iids for reduced memory footprint
- If (pid, iid) pair exists in producer's IDMP map: returns existing
entry ID without comparing field contents
- Caller responsible for iid uniqueness and consistency across retries

Both modes can only be specified when entry ID is `*` (auto-generated).

### Deduplication Logic

When XADD is called with idempotency parameters:

1. Redis checks if the message was recently added to the stream based on
the (pid, iid) pair
2. If the (pid, iid) pair matches a recently-seen pair for that
producer, the message is assumed to be identical
3. No duplicate message is added to the stream; the existing entry ID is
returned
4. With **IDMP pid iid**: Redis does not compare the specified fields
and their values—two messages with the same (pid, iid) are assumed
identical
5. With **IDMPAUTO pid**: Redis calculates the iid from message content
and checks for duplicates
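To illustrate (key, pid, iid, and entry IDs below are hypothetical), a retried manual-mode XADD with the same (pid, iid) pair returns the original entry ID and does not grow the stream:

```
> XADD events IDMP producer-1 msg-42 * temp 21
"1700000000000-0"
> XADD events IDMP producer-1 msg-42 * temp 21
"1700000000000-0"
> XLEN events
(integer) 1
```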

## IDMP Map: Per-Producer Time and Capacity-Based Expiration

Each producer with idempotency enabled maintains its own isolated IDMP
map (iid → entry_id) with dual expiration criteria:

**Time-based expiration (duration)**:

- Each iid expires automatically after duration seconds from insertion
- Provides operational guarantee: Redis will not forget an iid before
duration elapses (unless capacity reached)
- Configurable per-stream via XCFGSET

**Capacity-based expiration (maxsize)**:

- Each producer's map enforces maximum capacity of maxsize entries
- When capacity reached, oldest iids for that producer are evicted
regardless of remaining duration
- Prevents unbounded memory growth during extended usage
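The dual expiration policy can be sketched as a small ordered map, where insertion order doubles as age order. This is an illustrative model under the assumptions above, not the actual Redis data structure.

```python
import time
from collections import OrderedDict

class IdmpMap:
    """Per-producer iid -> entry_id map with time- and capacity-based
    expiration (illustrative sketch, not the Redis implementation)."""

    def __init__(self, duration, maxsize):
        self.duration = duration      # seconds each iid is retained
        self.maxsize = maxsize        # max iids tracked for this producer
        self.entries = OrderedDict()  # iid -> (entry_id, inserted_at)

    def _expire(self, now):
        # Drop iids older than `duration`; oldest entries come first.
        while self.entries:
            iid, (_, ts) = next(iter(self.entries.items()))
            if now - ts < self.duration:
                break
            self.entries.popitem(last=False)

    def lookup(self, iid, now=None):
        now = time.time() if now is None else now
        self._expire(now)
        entry = self.entries.get(iid)
        return entry[0] if entry else None

    def insert(self, iid, entry_id, now=None):
        now = time.time() if now is None else now
        self._expire(now)
        if len(self.entries) >= self.maxsize:
            # Capacity reached: evict the oldest iid regardless of age.
            self.entries.popitem(last=False)
        self.entries[iid] = (entry_id, now)
```

A duplicate XADD would first call `lookup`; a hit returns the stored entry ID, a miss appends to the stream and calls `insert`.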

### Configuration Commands

**XINFO STREAM**: View current configuration and metrics

Use `XINFO STREAM key` to retrieve idempotency configuration
(idmp-duration, idmp-maxsize) along with tracking metrics.

**XCFGSET**: Configure expiration parameters

```
XCFGSET key [IDMP-DURATION duration] [IDMP-MAXSIZE maxsize]
```

- **duration**: Seconds to retain each iid (range: 1-86,400 seconds)
- **maxsize**: Maximum iids to track per producer (range: 1-10,000
entries)
- Calling XCFGSET clears all existing producer IDMP maps for the stream

**Default Configuration** (when XCFGSET not called):

- Duration: 100 seconds
- Maxsize: 100 iids per producer
- Runtime configurable via: `stream-idmp-duration` and
`stream-idmp-maxsize`
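For example (stream name and replies are hypothetical), per-stream settings could be adjusted with XCFGSET and the server-wide defaults changed at runtime via CONFIG SET:

```
> XCFGSET events IDMP-DURATION 600 IDMP-MAXSIZE 1000
OK
> CONFIG SET stream-idmp-duration 200
OK
```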

## Response Behavior

**On first submission** (pid, iid) pair not in producer's map:

- Entry added to stream with generated entry ID
- (pid, iid) pair stored in producer's IDMP map with current timestamp
- Returns new entry ID

**On duplicate submission** (pid, iid) pair exists in producer's map:

- No entry added to stream
- Returns existing entry ID from producer's IDMP map
- Identical response to original submission (client cannot distinguish)

## Stream Metadata

XINFO STREAM extended with idempotency metrics and configuration:

- **idmp-duration**: The duration value (in seconds) configured for the
stream's IDMP map
- **idmp-maxsize**: The maxsize value configured for the stream's IDMP
map
- **pids-tracked**: Current number of producers with active IDMP maps
- **iids-tracked**: Current total number of iids across all producers'
IDMP maps (reflects active iids that haven't expired or been evicted)
- **iids-added**: Lifetime cumulative count of entries added with
idempotency parameters
- **iids-duplicates**: Lifetime cumulative count of duplicate iids
detected across all producers
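An excerpt of what the extended XINFO STREAM reply might look like (field order and values are purely illustrative):

```
> XINFO STREAM events
...
"idmp-duration"   (integer) 100
"idmp-maxsize"    (integer) 100
"pids-tracked"    (integer) 3
"iids-tracked"    (integer) 42
"iids-added"      (integer) 1250
"iids-duplicates" (integer) 7
...
```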

## Persistence and Restart Behavior

**IDMP maps are fully persisted and restored across Redis restarts**:

- **RDB/AOF**: All pid-iid pairs, timestamps, and configuration are
included in snapshots and AOF logs
- **Recovery**: On restart, all tracked (pid, iid) pairs remain valid
and operational
- **Producer Requirement**: Producers must reuse the same pid after
restart to access their persisted IDMP map
- **Configuration**: Stream-level settings (duration, maxsize) persist
across restarts
- **Important**: Calling XCFGSET after restart clears restored IDMP maps
(same behavior as during runtime)

## Key Benefits

- **At-most-once Insertion per Message**: Each (pid, iid) pair produces
at most one stream entry, so submissions can be retried safely
- **Automatic Retry Safety**: Network failures and retries cannot create
duplicate entries
- **Producer Isolation**: Each producer maintains independent
idempotency tracking
- **Memory Efficient**: Time and capacity-based expiration per producer
prevents unbounded growth
- **Flexible Implementation**: Choose automatic (IDMPAUTO) or manual
(IDMP) based on performance needs
- **Backward Compatible**: Fully optional parameters with zero impact on
existing XADD behavior
- **Collision Resistant**: The XXH128 Sum + XOR combination gives
IDMPAUTO high-quality non-cryptographic hashing with extremely low
accidental collision probability, while the field-length separators
prevent ambiguous-concatenation attacks
2026-01-15 21:58:44 +08:00
Moti Cohen
8da9c04bb8
Eliminate zslGetRank calls from ZCOUNT/ZLEXCOUNT + simplify zslGetRankByNode() (#14684)
- Eliminate redundant zslGetRank calls in ZCOUNT/ZLEXCOUNT
- Simplify and clarify zslGetRankByNode()
- Add UT for skiplist
2026-01-15 11:48:16 +02:00
Filipe Oliveira (Redis)
7e7c7b0558
Fix flaky test failures caused by clock precision issues with monotonic clock. (#14697)
Fix flaky test failures in `tests/unit/moduleapi/blockedclient.tcl`
caused by clock precision issues with the monotonic clock.

The test runs a command that blocks for 200ms and then asserts that the
elapsed time is >= 200ms. Due to clock skew and timing precision
differences, the measured time occasionally comes back as 199ms, causing
spurious test failures.
2026-01-14 19:44:05 +08:00
Kalin Staykov
23a947ee3d fix Rust installation checksums 2026-01-14 13:41:20 +02:00
LukeMathWalker
cb842d6cc2 chore: Bump Rust stable version from 1.88.0 to 1.92.0 2026-01-14 12:10:53 +02:00
cui
b23122d002
Remove unused comment in ebuckets.c (#14694)
2026-01-14 14:25:58 +08:00
Filipe Oliveira (Redis)
3c96680cfb
Enable hardware clock by default on ARM AArch64. (#14676)
Redis can already use a processor-provided hardware counter as a
high-performance monotonic clock. On some architectures this must be
enabled carefully, but on ARM AArch64 the situation is different:

- The ARM Generic Timer is architecturally mandatory for all processors
that implement the AArch64 execution state.
- The system counter (`CNTVCT_EL0`) and its frequency (`CNTFRQ_EL0`) are
guaranteed to exist and provide a monotonic time source (per the *“The
Generic Timer in AArch64 state”* section of the *Arm® Architecture
Reference Manual for Armv8-A* —
https://developer.arm.com/documentation/ddi0487/latest).

Because of this architectural guarantee, it is safe to enable the
hardware clock by default on ARM AArch64.
As detailed below, this gives around a 5% boost on io-thread
deployments for a simple strings benchmark.
2026-01-13 20:12:04 +08:00
Salvatore Sanfilippo
60a4fa2e4b
Vsets: Remove stale note about replication from README. (#14528) 2026-01-13 16:13:59 +08:00