PR https://github.com/redis/redis/pull/14937 updates the Codecov
workflow configuration for `codecov/codecov-action` v6.
The action no longer accepts the singular `file` input, so this switches
to `files` to ensure `./src/redis.info` is uploaded correctly.
## Problem
The test `HOTKEYS - commands inside MULTI/EXEC` in
`tests/unit/hotkeys.tcl` is flaky on fast hardware. This PR raises its
inner loop count from 7 to 30 to make `key2` reliably appear in the CPU
top-K.
Failed CI:
https://github.com/redis/redis/actions/runs/25051455424/job/73380034469?pr=15128
Inside `MULTI`/`EXEC`, each queued command's per-command CPU time is
recorded as `c->duration = ustime() - call_timer` (microseconds,
integer). Very fast commands such as `SET` against a small value can
complete in less than 1 µs and therefore be measured as `0`.
`hotkeyStatsUpdateCurrentCmd` then forwards that zero duration as the
weight to `chkTopKUpdate`, which has an explicit early return on `weight
== 0`:
```c
sds chkTopKUpdate(chkTopK *topk, char *item, int itemlen, counter_t weight) {
if (weight == 0) return NULL;
...
}
```
In the original test, `key2` is `SET` only 7 times inside the
transaction. On fast hosts (the failure was observed on an ARM box with
`ustime()` ticking at 1 µs resolution) it is possible for all 7 calls to
be measured as 0 µs, which means `key2` is never inserted into the CPU
top-K and the assertion
```tcl
assert [dict exists $cpu_result $key2]
```
fails. `key1` has 21 calls and is statistically safe.
The author already anticipated this and left a comment ("Send multiple
commands to avoid <1us cpu for $key2"), but 7 iterations turned out to
be insufficient.
## Changes
Bump the iteration count from 7 to 30. With `key2` now `SET` 30 times
the probability of every single call being measured as 0 µs becomes
negligible on any realistic hardware.
Fixes #15085
## Problem
getKeySlot() may return `server.current_client->slot` while a command is
executing instead of computing the slot from the provided string.
The unsubscribe can be triggered by another client, in which case
`server.current_client` is not the client being unsubscribed, so
`getKeySlot()` would return that client's cached slot. Using this wrong
slot would make the lookup in `type.serverPubSubChannels` miss the
channel and ultimately trigger the assertion below.
## Fix
Always use keyHashSlot() instead of getKeySlot() on unsubscribe.
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
## Summary
Add a batched prefetch fast path for `HGETALL` on hashtable-encoded
hashes. When iterating large hash tables, pointer chasing through
scattered heap allocations (`dictEntry` → `Entry` → value SDS) causes
cache misses that dominate CPU time (~10% flat in `dictNext`).
The new path collects dict entries in batches of configured batch size,
issues software prefetches for the `Entry` structs and their value SDS
data, then emits replies while the data is cache-warm. This hides memory
latency by overlapping prefetch with reply generation.
Following #14890
## Problem
RM_GetUserUsername() documents that the returned RedisModuleString can
be freed via automatic memory management, but it always creates the
string with ctx=NULL so it cannot be tracked by RedisModule_AutoMemory.
Modules following the documentation may leak memory.
## Fix
Fixes `RedisModule_GetUserUsername` to accept a `RedisModuleCtx *` and create the returned `RedisModuleString` with that context, allowing RedisModule auto-memory management to track/free it as documented.
Optimize SET key value GET propagation rewriting in setGenericCommand()
by removing GET arguments in-place with rewriteClientCommandArgument().
This avoids the overhead of allocating a new argv vector and
incrementing reference counts for every retained argument.
The optimization is scoped to the no-expire SET ... GET rewrite path.
It also adds test coverage for cases with repeated GET tokens to
ensure robust string semantics and consistent replication behavior.
Changes:
- Use rewriteClientCommandArgument(c, j, NULL) for in-place removal.
- Eliminate redundant argv allocations and refcount increments.
- Improve performance of SET GET in high-throughput write streams.
Fixes checkPrefixCollisionsOrReply() to return 0 (failure) on any provided-prefix self-overlap, instead of accidentally returning a non-zero loop index for overlaps found after the first prefix.
Signed-off-by: Raj Danday <rajkripal.danday@gmail.com>
Made-with: Cursor
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> **Low Risk**
> Documentation-only change; no code or runtime behavior is affected,
but it changes the official intake channels for vulnerability reports.
>
> **Overview**
> Updates `SECURITY.md` to redirect vulnerability reporters from
emailing the core team to using the **Redis Vulnerability Disclosure
Program** link, with GitHub’s *Report a Vulnerability* as an
alternative.
>
> Adds a dedicated security contact email (`security@redis.com`) for
questions and includes brief rationale for the new reporting path.
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
eeaa8c4ada. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
## Summary
Follow-up to #15065. The merged code calls `vecReserve(&keys, count)`
where `count` is user-supplied. A client can pass a giant `COUNT` (e.g.
`HSCAN k 0 COUNT 10000000000000`) and the server pre-allocates the
corresponding pointer slots before any work happens — ~80 TB on a 64-bit
build. This pre-reserve DoS surface was flagged in code review.
## Fix
Drop the pre-reserve entirely. The vec already starts on a 256-pointer
stack buffer and grows by doubling, driven by the **actual cardinality**
of the dictionary, not by the user-supplied `COUNT`.
## Why drop the pre-reserve (vs cap it)
The pre-reserve doesn't buy measurable performance — `vecPush()`'s
grow-by-doubling path is amortized O(1) and the dominant cost on SCAN
workloads is the per-entry callback work, not vector growth.
`raxRecursiveFree` and `raxRecursiveFreeWithCtx` used C call-stack
recursion to walk the entire radix tree during `raxFree`. On trees with
pathologically deep paths (long keys with no shared prefixes) this could
overflow the thread stack and crash the process.
This PR replaces both recursive functions with a single unified
iterative helper (`raxFreeNodesWithCallback`) that maintains an explicit
heap-allocated `raxStack` — the same stack structure already used
elsewhere in the rax code (e.g. `raxIterator`). The helper accepts both
callback variants (with and without a user-supplied context) so the two
public entry points `raxFreeWithCallback` and `raxFreeWithCbAndContext`
now both delegate to it. Child pointers are now enumerated forward from
`raxNodeFirstChildPtr` instead of backward from `raxNodeLastChildPtr`,
which is simpler and consistent with how the rest of the codebase
traverses children. No functional change: every node is still visited
exactly once, its optional data callback is still invoked before the
node is freed, and `rax->numnodes` is decremented identically.
**Summary**
Detects and rejects corrupt stream RDB payloads where the same NACK
(pending entry) is referenced by more than one consumer, which violates
a stream data-structure invariant.
**Changes**
- **`rdbLoadObject` (stream consumer PEL loading)**: Added a guard that
checks `nack->consumer != NULL` before assigning the consumer pointer.
When a second consumer's PEL references a NACK that was already claimed
by a prior consumer, the loader now reports a corrupt RDB error and
aborts instead of silently overwriting the pointer. Without this check,
two consumers share the same `streamNACK`, and freeing the first
consumer's PEL leaves the second with a dangling pointer.
- **`corrupt-dump.tcl`**: Added a regression test that crafts a stream
with two consumers (`consumerA`, `consumerB`) whose PELs both reference
the same entry (`1-0`). The `RESTORE` command is expected to fail with
`"Bad data format"`, and the server must remain responsive (`PING`
succeeds).
**Benefits**
- **Fail-fast on corrupt data**: The invariant violation is caught at
load time with a clear diagnostic message rather than manifesting as a
crash later during normal operation.
- **Regression coverage**: The crafted payload in the test ensures this
class of corruption is permanently guarded against.
### Problem
In `scanGenericCommand`, `maxiterations = count * 10` overflows when
`count > LONG_MAX / 10`, causing undefined behavior.
### Changed
1. Use saturating arithmetic to prevent overflow.
2. Added a test to trigger the overflow path, detectable by UBSan.
RM_RegisterClusterMessageReceiver() unlinks a receiver node from the
clusterReceivers[type] linked list when the callback is set to NULL, but
when removing the head node (prev == NULL), the code updates
clusterReceivers[type]->next instead of clusterReceivers[type] itself.
This leaves clusterReceivers[type] pointing to the freed node, so any
later traversal through clusterReceivers[type] dereferences a dangling
pointer.
Fix by updating clusterReceivers[type] directly when prev == NULL.
Fixes #15057
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
Several `addReplyError` and `addReplyErrorFormat` calls in
`xnackCommand` included a redundant `"ERR "` prefix in the message
string. Since `addReplyErrorLength` already prepends `-ERR ` to the RESP
reply, clients received `ERR ERR ...` for these error paths.
This PR removes the redundant prefix from all five affected calls and
tightens the corresponding test patterns to match from the beginning of
the error message (`"ERR ..."` instead of `"*...*"`), so any future
double-prefix regression will be caught.
## Root cause
Roughly 50% of random double scores generated by the ZADD listpack
workload have 17-19 significant digits, so their mantissas exceed
`MAX_MANTISSA_FAST_PATH` (`2^53`). These inputs fall through to the
`strtod()` fallback:
```c
char buf[128];
memcpy(buf, nptr, len);           /* copy so we can NUL-terminate */
buf[len] = '\0';
double result = strtod(buf, ...); /* glibc strtod — ~10× slower on ARM */
```
The original C++ `fast_float` library handled the same 17-19 digit
inputs with Eisel-Lemire / bigint arithmetic without falling back to
`strtod()`. That is what the pure-C replacement lost.
## Fix
Compute `mantissa * 10^exponent` in 128-bit integer arithmetic using
`__uint128_t`, then convert to double with a single IEEE
round-to-nearest-even cast. Supported for `|exp| in [0, 19]` where
`10^|exp|` fits in `uint64`; cases outside that range (or otherwise
outside the fast path's preconditions) still fall through to `strtod()`.
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
Update RediSearch module version to 8.8 RC1 (v8.7.90)
Made with [Cursor](https://cursor.com)
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> **Low Risk**
> Low risk: a single version bump that changes which RediSearch git tag
is cloned/built; main risk is build/runtime incompatibility from the
upstream RC update.
>
> **Overview**
> Updates the RediSearch module build configuration to fetch and build
upstream `redisearch` tag `v8.7.90` (8.8 RC1) instead of `v8.5.90`.
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
21e121c738. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
In addReplyErrorLength and addReplyErrorFormatInternal, `-ERR` is
automatically prepended if the message doesn’t start with `-`, so the
initial `-ERR` is unnecessary. Also, trailing `\r\n` will be trimmed, so
it doesn’t need to be included.
---------
Signed-off-by: charsyam <charsyam@naver.com>
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Co-authored-by: debing.sun <debing.sun@redis.com>
### Problem
While the new type `OBJ_GCRA` was added, several related code paths were
not updated accordingly, leading to failures in the
`reply-schemas-validator` CI job and `corrupt-dump-fuzzer.tcl`.
##### reply-schemas-validator
Failed CI:
https://github.com/redis/redis/actions/runs/24485248057/job/71558533290#step:10:903
```shell
Traceback (most recent call last):
File "/home/runner/work/redis/redis/./utils/req-res-log-validator.py", line 238, in process_file
jsonschema.validate(instance=res.json, schema=req.schema, cls=schema_validator)
File "/home/runner/.local/lib/python3.12/site-packages/jsonschema/validators.py", line 1121, in validate
raise error
jsonschema.exceptions.ValidationError: 'rate_limit' is not valid under any of the given schemas
Failed validating 'oneOf' in schema['patternProperties']['^.*$']['properties']['group']:
{'description': 'the functional group to which the command belongs',
'oneOf': [{'const': 'bitmap'},
{'const': 'cluster'},
{'const': 'connection'},
{'const': 'generic'},
{'const': 'geo'},
{'const': 'hash'},
{'const': 'hyperloglog'},
{'const': 'list'},
{'const': 'module'},
{'const': 'pubsub'},
{'const': 'scripting'},
{'const': 'sentinel'},
{'const': 'server'},
{'const': 'set'},
{'const': 'sorted-set'},
{'const': 'stream'},
{'const': 'string'},
{'const': 'transactions'}]}
On instance['gcrasetvalue']['group']:
'rate_limit'
```
##### `corrupt-dump-fuzzer.tcl`
Also fixed the failing test `Fuzzer corrupt restore payloads -
sanitize_dump: yes` in `tests/integration/corrupt-dump-fuzzer.tcl`.
Failed daily test :
https://github.com/redis/redis/actions/runs/24485248057/job/71558533312#step:6:8652
```shell
Server crashed (by signal: 0, err: key "gcra" not known in dictionary), with payload: "\x1C\x0A\x02\x5F\x37\xC0\x06\xC0\x00\x02\x5F\x39\xC0\x08\x02\x5F\x33\x02\x5F\x35\x02\x5F\x31\xC0\x02\xC0\x04\x0E\x00\xA9\x71\xBF\xEE\x6F\x46\xEF\xA6"
violating commands:
Done 1434 cycles in 600 seconds.
RESTORE: successful: 601, rejected: 833
Total commands sent in traffic: 1194776, crashes during traffic: 1 (0 by signal).
[: Fuzzer corrupt restore payloads - sanitize_dump: yes in tests/integration/corrupt-dump-fuzzer.tcl
Expected '1' to be equal to '0' (context: type eval line 155 cmd {assert_equal $stat_terminated_in_traffic 0} proc ::test)
[147/147 done]: integration/corrupt-dump-fuzzer (1201 seconds)
```
### Changed
This change completes the necessary updates across all relevant
components to ensure consistent handling of the rate_limit group and
restores CI stability.
## Motivation
Redis's existing keyspace notification system operates at the **key
level** only — when a hash field is modified via `HSET`, `HDEL`, or
`HEXPIRE`, the subscriber receives the key name and the event type, but
not **which fields** were affected. As a result, these notifications have
very little practical value for field-level consumers.
This PR introduces a subkey notification system that extends keyspace
events to include field-level (subkey) details for hash operations,
through both Pub/Sub channels and the Module API.
## New Pub/Sub Notification Channels
Four new channels are added:
| Channel Format | Payload |
|----------------|---------|
| `__subkeyspace@<db>__:<key>` | `<event>\|<len>:<subkey>[,...]` |
| `__subkeyevent@<db>__:<event>` | `<key_len>:<key>\|<len>:<subkey>[,...]` |
| `__subkeyspaceitem@<db>__:<key>\n<subkey>` | `<event>` |
| `__subkeyspaceevent@<db>__:<event>\|<key>` | `<len>:<subkey>[,...]` |
**Design rationale for 4 channels:**
- **Subkeyspace**: Subscribe to a specific key, receive all field
changes in a single message — efficient for key-centric consumers.
- **Subkeyevent**: Subscribe to a specific event type, receive
key+fields — efficient for event-centric consumers.
- **Subkeyspaceitem**: Subscribe to a specific key+field combination —
the most selective, one message per field, no parsing needed.
- **Subkeyspaceevent**: Subscribe to event+key combination, receiving
only the affected fields — server-side filtering on both dimensions.
Subkeys are encoded in a length-prefixed format (`<len>:<subkey>`) to
support binary-safe field names containing delimiters.
**Safety guards:**
- Events containing `|` are skipped for the `__subkeyspace` and
`__subkeyspaceevent` channels (to avoid parsing ambiguity).
- Keys containing `\n` are skipped for the `__subkeyspaceitem` channel
(newline is the key/subkey separator).
- Subkeys channels are only published when `subkeys != NULL && count >
0`.
## Hash Command Integration
The following hash operations now emit subkey level notifications with
the affected field names:
| Command | Event | Subkeys |
|---------|-------|---------|
| `HSET` / `HMSET` | `hset` | All fields being set |
| `HSETNX` | `hset` | The field (if set) |
| `HDEL` | `hdel` | All fields deleted |
| `HGETDEL` | `hdel` / `hexpired` | Deleted or lazily expired fields |
| `HGETEX` | `hexpire` / `hpersist` / `hdel` / `hexpired` | Affected fields per event |
| `HINCRBY` | `hincrby` | The field |
| `HINCRBYFLOAT` | `hincrbyfloat` | The field |
| `HEXPIRE` / `HPEXPIRE` / `HEXPIREAT` / `HPEXPIREAT` | `hexpire` | Updated fields |
| `HPERSIST` | `hpersist` | Persisted fields |
| `HSETEX` | `hset` / `hdel` / `hexpire` / `hexpired` | Affected fields per event |
| Field expiration (active/lazy) | `hexpired` | All expired fields (batched) |
For field expiration, expired fields are collected into a dynamic array
and sent as a single batched notification after the expiration loop,
rather than one notification per field.
## Module API
Three new APIs and one new callback type:
```c
/* Function pointer type for keyspace event notifications with subkeys from modules. */
typedef void (*RedisModuleNotificationWithSubkeysFunc)(
RedisModuleCtx *ctx, int type, const char *event,
RedisModuleString *key, RedisModuleString **subkeys, int count);
/* Subscribe to keyspace notifications with subkey information.
*
* This is the extended version of RM_SubscribeToKeyspaceEvents. When subkeys
* are available, the `subkeys` array and `count` are passed to the callback.
* `subkeys` contains only the names of affected subkeys (values are not included),
* and `count` is the number of elements. The array may contain duplicates when
* the same subkey appears more than once in a command (e.g. HSET key f1 v1 f1 v2
* produces subkeys=["f1","f1"], count=2). When no subkeys are present, `subkeys`
* will be NULL and `count` will be 0. Whether events without subkeys are delivered
* depends on the `flags` parameter (see below).
*
* `types` is a bit mask of event types the module is interested in
* (using the same REDISMODULE_NOTIFY_* flags as RM_SubscribeToKeyspaceEvents).
*
* `flags` controls delivery filtering:
* - REDISMODULE_NOTIFY_FLAG_NONE: The callback is invoked for all matching
* events regardless of whether subkeys are present, so a separate
* RM_SubscribeToKeyspaceEvents registration can be omitted.
* - REDISMODULE_NOTIFY_FLAG_SUBKEYS_REQUIRED: The callback is only invoked
* when subkeys are not empty. Events without subkey information (e.g. SET,
* EXPIRE, DEL) are skipped.
*
* The callback signature is:
* void callback(RedisModuleCtx *ctx, int type, const char *event,
* RedisModuleString *key, RedisModuleString **subkeys, int count);
*
* The subkeys array and its contents are only valid during the callback.
* The underlying objects may be stack-allocated or temporary, so
* RM_RetainString must NOT be used on them. To keep a subkey beyond
* the callback (e.g. in a RM_AddPostNotificationJob callback), use
* RM_HoldString (which handles static objects by copying) or
* RM_CreateStringFromString to make a deep copy before returning.
*/
int RM_SubscribeToKeyspaceEventsWithSubkeys(RedisModuleCtx *ctx, int types, int flags, RedisModuleNotificationWithSubkeysFunc callback);
/* Unregister a module's callback from keyspace notifications with subkeys
* for specific event types.
*
* This function removes a previously registered subscription identified by
* the event mask, delivery flags, and the callback function.
*
* Parameters:
* - ctx: The RedisModuleCtx associated with the calling module.
* - types: The event mask representing the notification types to unsubscribe from.
* - flags: The delivery flags that were used during registration.
* - callback: The callback function pointer that was originally registered.
*
* Returns:
* - REDISMODULE_OK on successful removal of the subscription.
* - REDISMODULE_ERR if no matching subscription was found. */
int RM_UnsubscribeFromKeyspaceEventsWithSubkeys(
RedisModuleCtx *ctx, int types, int flags,
RedisModuleNotificationWithSubkeysFunc cb);
/* Like RM_NotifyKeyspaceEvent, but also triggers subkey-level notifications
* when subkeys are provided. Both key-level (keyspace/keyevent) and
* subkey-level (subkeyspace/subkeyevent/subkeyspaceitem/subkeyspaceevent)
* channels are published to, depending on the server configuration.
*
* This is the extended version of RM_NotifyKeyspaceEvent and can actually
* replace it. When called with subkeys=NULL and count=0, it behaves
* identically to RM_NotifyKeyspaceEvent. */
int RM_NotifyKeyspaceEventWithSubkeys(
RedisModuleCtx *ctx, int type, const char *event,
RedisModuleString *key, RedisModuleString **subkeys, int count);
```
## Configuration
Subkey notifications are controlled via the existing
`notify-keyspace-events` configuration string with four new characters:
`notify-keyspace-events "STIV"`
**S** -> Subkeyspace events, published with `__subkeyspace@<db>__:<key>`
prefix.
**T** -> Subkeyevent events, published with
`__subkeyevent@<db>__:<event>` prefix.
**I** -> Subkeyspaceitem events, published per subkey with
`__subkeyspaceitem@<db>__:<key>\n<subkey>` prefix.
**V** -> Subkeyspaceevent events, published with
`__subkeyspaceevent@<db>__:<event>|<key>` prefix.
These flags are **independent** from the existing key-level flags (`K`,
`E`, etc.). Enabling subkey notifications does **not** implicitly enable
or depend on keyspace/keyevent notifications, and vice versa.
## Known Limitations
- **Duplicate fields in subkey notifications**: Subkey notification
payloads may contain duplicate field names when the same field is
affected more than once within a single command. Since duplicate fields
are not the common case and deduplication would introduce significant
overhead on every notification, we chose not to deduplicate at this
time.
- **Subkeys must be sds-encoded objects**: The implementation assumes
each subkey is an sds-encoded object and accesses it via `subkey->ptr`;
an assertion enforces this, so Redis will crash if a non-sds subkey is
passed.
## Summary
This PR fixes two issues when processing corrupt data in
rdbLoadCheckModuleValue():
1. When handling `RDB_MODULE_OPCODE_STRING` opcode,
rdbGenericLoadStringObject() can return NULL on a corrupt payload. The
code called decrRefCount(o) unconditionally without a NULL check,
resulting in a NULL pointer dereference crash.
2. The while loop condition was `!= RDB_MODULE_OPCODE_EOF`, which means
a truncated payload (causing rdbLoadLen to return RDB_LENERR) would
never exit the loop, since `RDB_LENERR != RDB_MODULE_OPCODE_EOF` is
always true, potentially causing an infinite hang.
HSETEX crashed on assert() with a SIGABRT when the same field appeared
more than once in the FIELDS list and an expiry time was given
(EX/PX/EXAT/PXAT).
Root cause: hfieldPersist() and the KEEP_TTL path in hashTypeSet() both
asserted that dictExpireMeta->expireMeta.trash == 0, meaning the hash
must be globally registered in the HFE DS. This is incorrect during
HSETEX execution because hashTypeSetExDone(), which registers the hash
globally and clears trash, is called only at the end of the flow. The private
per-field ebuckets are fully valid regardless of the global registration state.
Fix: Remove both incorrect assertions. The operations on the private
ebuckets (ebRemove in hfieldPersist, ebAdd in the KEEP_TTL path) are
correct and do not require the hash to be globally registered.
Tests: Added two regression tests covering the crash scenarios:
- HSETEX EX with a duplicate field (existing field, expiry given)
- HSETEX FNX EX with a duplicate field (no prior field, FNX condition
passes)
[PR #14826](https://github.com/redis/redis/pull/14826) introduced a new rate
limiting command which stores its internal implementation-detail data
into a string key.
Since this prevents a client from detecting type errors, accidental
overwrites, or value invalidations (e.g. via SET or INCR), this PR
introduces a new data type, OBJ_GCRA, created specifically for that new
command.
Furthermore, a new RATE_LIMIT KSN type was introduced for emitting "gcra" events on such keys.
GCRASETTAT was renamed to GCRASETVALUE.
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
Refactor command propagation code to reduce overhead on master
Currently, the main bottleneck is `feedReplicationBuffer()`. It is
called for each argument in the command and has bookkeeping overhead on
every call (e.g. checking whether to attach replicas to the replication
backlog). It is also not inlined by the compiler. These costs become
more visible with pipelining and commands with many arguments (e.g. HSET
with many fields).
Changes:
- Defer all bookkeeping to be done once per command instead of once per
command argument.
- Refactor the hot path so the compiler can inline
`replBufWriterAppend()`.
- Add `replBufWritterAppendBulkLen()` that uses shared RESP headers for
small values, avoiding formatting overhead.
These changes should not introduce any behavioral change.
**TODO:** In a follow-up PR, explore forwarding the exact command from
the client querybuf to avoid re-serialization. Many commands are
propagated without modification and can benefit from this.
--
| Benchmark | Before (ops/s) | After (ops/s) | Improvement |
|---|---|---|---|
| SET | 256,048 | 265,131 | **+3%** |
| SET (pipeline) | 1,477,310 | 1,671,272 | **+13%** |
| HSET 10 fields | 145,000 | 158,000 | **+9%** |
| HSET 10 fields (pipeline) | 363,483 | 430,855 | **+18%** |
| HSET 10 fields, 15B values (pipeline) | 387,443 | 487,135 | **+26%** |
| ZADD 5 members | 180,700 | 193,519 | **+7%** |
| ZADD 5 members (pipeline) | 466,453 | 564,872 | **+21%** |
------
Co-authored-by: Yuan Wang <yuan.wang@redis.com>
During RDB saving and AOF rewriting, the fork child already dismisses
(madvise(MADV_DONTNEED)) individual key-value objects after serializing them.
However, the hash table bucket arrays of each dict were never dismissed,
leaving large contiguous allocations subject to CoW when the parent
modifies them.
This PR extends the dismiss mechanism to cover dict bucket arrays,
reducing CoW memory overhead.
- **Expires kvstore** — dismissed upfront before saving starts, since the
child never accesses the expires kvstore directly now that the expire
time is embedded in the key object.
- **Slot dicts** (cluster mode) — dismissed per-slot as the iterator moves
to the next slot during RDB saving or AOF rewriting.
- **DB keys kvstore** (standalone mode) — dismissed per-DB after each DB is
fully serialized during RDB saving or AOF rewriting.
The fast_float dependency required C++ (libstdc++) to build Redis. This
commit replaces the 3800-line C++ template library with a minimal pure C
implementation (~360 lines) that provides the same functionality needed
by Redis.
This is **very important** because the Redis build process would fail
without g++ installed, a common situation in Linux distributions even
after installing the basic build tools: we want the build process of
Redis to be the simplest possible. Also Redis sometimes is compiled in
embedded systems lacking the g++ toolchain. There is no reason to depend
on C++ in a project written in C.
## The C implementation uses
1. Fast path (Clinger's algorithm) for numbers with mantissa <= 2^53 and
exponent in [-22, 22], covering ~99% of real-world cases.
2. Fallback to strtod() for complex cases to ensure correctly-rounded
results.
## Changes
- Move the new `fast_float_strtod.c` (C implementation) from deps into
the Redis core, since it is now a single file and no longer needs a
separate directory.
- Remove all C++ dependencies.
The implementation was tested against both strtod and the original C++
implementation with 10,000+ test cases including edge cases, special
values (inf/nan), and random inputs.
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
Co-authored-by: Mincho Paskalev <minchopaskal@gmail.com>
Co-authored-by: Moti Cohen <moti.cohen@redis.com>
`xinfoReplyWithStreamInfo` passed the wrong key (`c->argv[1]`) instead of
`c->argv[2]` to `updateSlotAllocSize` when updating per-slot memory
tracking.
Fix by passing the key explicitly to `xinfoReplyWithStreamInfo` instead
of relying on a hardcoded argv index.
Also, add the `-DDEBUG_ASSERTIONS` flag to the test-ubuntu-jemalloc CI
to cover this debug assertion.
### Overview
This PR adds a new `COUNT` aggregation mode to the `ZUNIONSTORE`,
`ZINTERSTORE`, `ZUNION`, and `ZINTER` sorted set commands. When
`AGGREGATE COUNT` is specified, the resulting score for each element
reflects how many input sets contain it (optionally scaled by
`WEIGHTS`), rather than combining the actual scores of the elements.
This enables a common use case — counting set membership frequency —
directly at the command level, without application-side workarounds.
### Problem Statement
For developers who need to know **how many input sorted sets contain
each element**, there is no single-command solution today.
**Example:** given several game leaderboards, find how many leaderboards
each player appears in.
The existing aggregation modes (`SUM`, `MIN`, `MAX`) all operate on the
elements' scores. To ignore scores and just count set membership, you'd
currently need to copy each sorted set with all scores set to 1, then
run `ZUNIONSTORE`/`ZINTERSTORE` with `SUM` — requiring multiple round
trips, temporary keys, and application-level locking to avoid races.
A `COUNT` aggregation mode solves this directly.
### Solution
Introduces `AGGREGATE COUNT` as a fourth aggregation mode:
- `ZINTER numkeys key [key ...] [WEIGHTS weight [weight ...]] [AGGREGATE
<SUM | MIN | MAX | COUNT>] [WITHSCORES]`
- `ZINTERSTORE destination numkeys key [key ...] [WEIGHTS weight [weight
...]] [AGGREGATE <SUM | MIN | MAX | COUNT>]`
- `ZUNION numkeys key [key ...] [WEIGHTS weight [weight ...]] [AGGREGATE
<SUM | MIN | MAX | COUNT>] [WITHSCORES]`
- `ZUNIONSTORE destination numkeys key [key ...] [WEIGHTS weight [weight
...]] [AGGREGATE <SUM | MIN | MAX | COUNT>]`
When `COUNT` is specified, **the scores in the input sets are ignored**.
Note that `WEIGHTS` is **not** ignored — each set contributes its weight
(default 1) per element, and the contributions are summed.
**Implementation details:**
A new helper function `zuiWeightedScore()` computes the per-set
contribution:
```c
inline static double zuiWeightedScore(double score, double weight, int aggregate) {
return (aggregate == REDIS_AGGR_COUNT) ? weight : weight * score;
}
```
The `zunionInterAggregate()` function treats `COUNT` identically to
`SUM` — it adds the per-set contributions. All four call sites where
`weight * score` was previously computed inline are updated to use
`zuiWeightedScore()`.
### Examples
```
> ZADD s1 1 foo 1 bar
> ZADD s2 2 foo 2 bar
> ZADD s3 3 foo
```
**With `SUM` (existing behavior, for comparison):**
```
> ZINTERSTORE t1 3 s1 s2 s3 WEIGHTS 10 5 3 AGGREGATE SUM
(integer) 1
> ZRANGE t1 0 -1 WITHSCORES
1) "foo"
2) "29"
> ZUNIONSTORE t1 3 s1 s2 s3 WEIGHTS 10 5 3 AGGREGATE SUM
(integer) 2
> ZRANGE t1 0 -1 WITHSCORES
1) "bar"
2) "20"
3) "foo"
4) "29"
```
**With `COUNT` and `WEIGHTS`:**
```
> ZINTERSTORE t1 3 s1 s2 s3 WEIGHTS 10 5 3 AGGREGATE COUNT
(integer) 1
> ZRANGE t1 0 -1 WITHSCORES
1) "foo"
2) "18"
> ZUNIONSTORE t1 3 s1 s2 s3 WEIGHTS 10 5 3 AGGREGATE COUNT
(integer) 2
> ZRANGE t1 0 -1 WITHSCORES
1) "bar"
2) "15"
3) "foo"
4) "18"
```
**With `COUNT` and no specified `WEIGHTS`** — resulting score equals the
number of input sorted sets containing the element:
```
> ZINTERSTORE t1 3 s1 s2 s3 AGGREGATE COUNT
(integer) 1
> ZRANGE t1 0 -1 WITHSCORES
1) "foo"
2) "3"
> ZUNIONSTORE t1 3 s1 s2 s3 AGGREGATE COUNT
(integer) 2
> ZRANGE t1 0 -1 WITHSCORES
1) "bar"
2) "2"
3) "foo"
4) "3"
```
### Backward Compatibility
This is a fully additive change. The new `COUNT` keyword is only
recognized after the `AGGREGATE` token in the four affected commands.
Existing commands, arguments, and default behavior (`AGGREGATE SUM`) are
completely unchanged. No new command is introduced, and no existing
response format is modified.
Validate HEXPIRE-family field counts without parser overflow:
- Keep the flexible option order; only require that the fields fit in `argv`.
- Add tests for `INT_MAX` numfields across `HEXPIRE`/`HPEXPIRE`/`HEXPIREAT`/`HPEXPIREAT`.

Fixes #14985
### Problem
The dict `stream_idmp_keys` was using `objectKeyPointerValueDictType`. In this dict type, entries are expected to have `robj` keys and pointer values, but `stream_idmp_keys` does not use the value field at all.
### Solution
This PR fixes the issue by introducing a new dict type (`objectKeyNoValueDictType`) for `stream_idmp_keys`.
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
This PR implements the tarball creation job by reusing the 01 script.
It splits the original job into smaller jobs and moves the gate and test
jobs before the upload job.
The job outputs the SHA and the size of the tarball.
Link to a run:
https://github.com/m-marinov/redis/actions/runs/23437802059
## Summary
Fixes consumer replication inconsistency when `XREADGROUP` is called for
a new consumer but no `XCLAIM` commands are propagated to the replica.
Previously, consumer creation was only propagated to replicas when
`noack=true`, relying on `XCLAIM` propagation to implicitly create the
consumer in the non-NOACK path. However, if no messages exist to read,
no `XCLAIM` is generated, and the consumer is silently lost on the
replica.
This is a follow-up to the original fix in
[redis/redis#7140](https://github.com/redis/redis/issues/7140) /
[redis/redis#7526](https://github.com/redis/redis/pull/7526), which
introduced `XGROUP CREATECONSUMER` propagation but only for the `NOACK`
case.
## Changes
- **`xreadgroupCommand` (src/t_stream.c):** Replaced the `if (noack)`
guard around the `streamPropagateConsumerCreation()` call with a
deferred check after `streamReplyWithRange()`. Consumer creation is now
propagated when `noack || propCount == 0` — that is, only when no
`XCLAIM` commands were generated. This avoids redundant propagation in
the common case where `XCLAIM` already implicitly creates the consumer
on the replica, while correctly handling both the NOACK path (where
PEL/XCLAIM is skipped entirely) and the no-messages path (where there is
nothing to XCLAIM).
- **Test (tests/unit/type/stream-cgroups.tcl):** Added replication test
`"XREADGROUP propagates new consumer to replica"` that sets up a
master-replica pair and verifies consumer propagation in two cases: (1)
without NOACK when no messages are available to deliver, and (2) with
NOACK when messages are delivered but XCLAIM is skipped.
## Benefits
- **Master-replica consistency:** Consumers created by `XREADGROUP` are
now visible on replicas whenever no `XCLAIM` would otherwise create them
— covering both the NOACK path and the empty-stream path.
- **No redundant propagation:** The `noack || propCount == 0` condition
avoids emitting a superfluous `XGROUP CREATECONSUMER` when `XCLAIM` commands
are already propagated and would implicitly create the consumer on the
replica.
### Overview
This PR enhances Redis Streams consumer groups by adding a new `XNACK`
command that allows consumers to explicitly release pending messages
back to the group without acknowledging them. Released (NACKed) entries
become immediately available for re-delivery to other consumers,
eliminating the idle-timeout delay currently required for message
recovery. The command supports three modes — SILENT, FAIL, and FATAL —
giving consumers fine-grained control over delivery counter semantics to
handle graceful shutdowns, transient failures, and poison messages
respectively.
### Problem Statement
For developers using Redis Streams with consumer groups, there are
several common scenarios where a consumer needs to release a message it
has claimed without acknowledging it:
1. **Transient internal failures**: A consumer may fail to process a
message because of problems unrelated to the message itself — for
example, it cannot connect to an external service to fetch required
context. The message is perfectly valid and should be retried promptly
by another consumer.
2. **Resource pressure**: A consumer under resource stress (low CPU, low
memory) may be unable to handle a specific message (e.g., a complex or
large message) within acceptable QoS. It should leave the opportunity to
other consumers in the group, with minimal delay.
3. **Graceful shutdown**: A consumer about to shut down would like to
immediately release all unprocessed messages it has claimed, so they can
be picked up by remaining consumers without waiting for idle timeouts.
4. **Poison / malicious messages**: A consumer may detect or suspect
that a claimed message is invalid or malicious and wants to mark it as
permanently failed (for dead-letter queue processing when available).
**Currently, a consumer cannot NACK a message.** It can either:
- **XACK** it — marks it as "processed" and removes it from the PEL
entirely, losing the ability to redeliver it
- **Leave it pending** — requires other consumers to discover it via
`XPENDING` and claim it via `XCLAIM`/`XAUTOCLAIM` or `XREADGROUP CLAIM`
after the idle timeout expires, introducing a long, unnecessary delay
In all these cases, the logic that applications must implement
introduces **message handling delays**, **implementation complexity**,
and **code duplication** across consumer implementations.
### Solution
Introduces a new `XNACK` (Negative ACKnowledge) command that explicitly
releases pending messages from their owning consumer back to the group's
PEL, making them immediately claimable via `XCLAIM` and `XAUTOCLAIM`,
and prioritized for re-delivery in `XREADGROUP CLAIM`:
```
XNACK key group <SILENT|FAIL|FATAL> IDS numids id [id ...] [RETRYCOUNT count] [FORCE]
```
When executed, the command:
1. **Disassociates** the entry from its owning consumer (`consumer =
NULL`)
2. **Repositions** the entry to the head of the PEL time-ordered list
(`delivery_time = 0`), making it immediately claimable with any
`min-idle-time` threshold
3. **Adjusts the delivery counter** based on the specified mode, giving
consumers fine-grained control over retry semantics
4. **Returns** the count of successfully NACKed entries
**Mode** controls the delivery counter adjustment and communicates the
reason for the NACK:
| Mode | Delivery Counter Behavior | Use Case |
|----------|---------------------------------------------------|---------------------------------------------|
| `SILENT` | Decrement by 1 (undo the delivery increment) | Consumer shutdown / transient internal error — the delivery "didn't count" |
| `FAIL` | No change (keep the incremented value) | Message too complex for this consumer, but may work for others — count this as an attempt |
| `FATAL` | Set to `LLONG_MAX` | Invalid / suspected malicious message — mark as permanently failed |
The three modes map directly to the real-world scenarios above:
- **SILENT** for graceful shutdown or transient failures unrelated to
the message
- **FAIL** for resource-constrained consumers that cannot handle a
specific message
- **FATAL** for poison message detection and dead-letter queue
integration
**Optional parameters:**
- **`RETRYCOUNT count`**: Directly sets `delivery_count` to the
specified value, overriding the mode-based adjustment
- **`FORCE`**: Creates new unowned PEL entries for IDs that are not
already in the group PEL (the entry must exist in the stream). When
`FORCE` creates an entry, the delivery counter is set to `0` (or to
`RETRYCOUNT` if specified, or to `LLONG_MAX` if mode is `FATAL`). This
is used internally for AOF rewrite and replication.
### Response Format
The command returns an integer — the number of messages successfully
NACKed (released back to the group PEL):
```
127.0.0.1:6379> XADD mystream 1-0 f v1
"1-0"
127.0.0.1:6379> XADD mystream 2-0 f v2
"2-0"
127.0.0.1:6379> XGROUP CREATE mystream grp 0
OK
127.0.0.1:6379> XREADGROUP GROUP grp c1 STREAMS mystream >
1) 1) "mystream"
2) 1) 1) "1-0"
2) 1) "f"
2) "v1"
2) 1) "2-0"
2) 1) "f"
2) "v2"
127.0.0.1:6379> XNACK mystream grp FAIL IDS 2 1-0 2-0
(integer) 2
```
After XNACK, the entries appear with an empty consumer in XPENDING
output:
```
127.0.0.1:6379> XPENDING mystream grp - + 10
1) 1) "1-0"
2) ""
3) (integer) -1
4) (integer) 1
2) 1) "2-0"
2) ""
3) (integer) -1
4) (integer) 1
```
### NACK Zone: Data Structure Extension
To support unowned PEL entries and ensure they are prioritized for
re-delivery, a **NACK zone** is introduced at the head of the existing
PEL time-ordered doubly-linked list. A new `pel_nack_tail` pointer is
added to the `streamCG` structure:
**PEL ordering:**
```
[pel_time_head] <-> ... <-> [pel_nack_tail] <-> [owned entries...] <-> [pel_time_tail]
|_____________ NACK zone ______________| |_______ normal PEL ________|
```
The head of the PEL contains all NACKed messages (FIFO-ordered),
followed by all delivered messages that were not NACKed (same order as
today). This ensures NACKed messages are always prioritized over idle
pending messages.
The delivery order for `XREADGROUP` is therefore:
1. If `CLAIM` was specified: first deliver NACKed messages, then deliver
due pending messages (current behavior)
2. Deliver new entries after the group's last-delivered-id (current
behavior)
**Structure Design:**
- NACKed entries occupy positions from `pel_time_head` to
`pel_nack_tail` in the time-ordered list
- Their `delivery_time` is set to `0`, ensuring they always appear
"oldest" and are immediately claimable
- Their `consumer` pointer is set to `NULL`, marking them as unowned
- `pel_nack_tail` is `NULL` when no NACKed entries exist
**Key Properties:**
- **O(1) insertion**: New NACKed entries are inserted right after
`pel_nack_tail` (or at the list head if the zone is empty)
- **FIFO ordering** among NACKed entries: entries are re-delivered in the
order they were NACKed
- **Immediate claimability**: Since `delivery_time = 0`, NACKed entries
have maximum idle time and satisfy any `min-idle-time` threshold in
`XCLAIM` and `XAUTOCLAIM`. In `XREADGROUP CLAIM`, NACKed entries are
also prioritized over other pending entries due to their position at the
head of the PEL.
- **Zone integrity**: The `pelListInsertSorted` function is updated to
stop scanning at the `pel_nack_tail` boundary, ensuring owned entries
are never placed inside the NACK zone
### Impact on Existing Commands
All commands that interact with the PEL are updated to handle unowned
(`consumer = NULL`) entries:
- **XPENDING**: Shows NACKed entries with an empty consumer name
- **XCLAIM / XAUTOCLAIM**: Can claim NACKed entries (they satisfy any
min-idle-time since `delivery_time = 0`)
- **XREADGROUP CLAIM**: NACKed entries are picked up by the claim phase
- **XACK**: Works correctly on NACKed entries (removes from group PEL)
- **XINFO STREAM FULL**: Displays NACKed entries with an empty consumer
name
- **XGROUP DELCONSUMER**: Unaffected — NACKed entries are not in any
consumer's PEL
Propagation is also updated: when `XCLAIM` or `XAUTOCLAIM` encounters a
deleted stream entry for an unowned NACK, it propagates `XACK` (instead
of `XCLAIM`) to replicas and AOF, since there is no source consumer to
reference.
### Persistence
**RDB:**
- A new RDB type `RDB_TYPE_STREAM_LISTPACKS_5` (type 27) is introduced
- After saving consumer PEL entries, the NACK zone stream IDs are saved
separately (count + encoded IDs)
- On load, NACK zone entries are reconstructed by looking them up in the
group PEL, unlinking from their sorted position, and re-inserting into
the NACK zone via `pelListInsertNacked`
- Backward compatibility is preserved: old RDB types continue to load
with the existing validation (all entries must have consumers)
**AOF:**
- AOF rewrite emits `XNACK <key> <group> FAIL IDS <n> <id...> RETRYCOUNT
<cnt> FORCE` commands for entries in the NACK zone
- Consecutive entries with the same `delivery_count` are batched into a
single command (up to `AOF_REWRITE_ITEMS_PER_CMD` IDs per command)
### Defragmentation
The defragmentation logic is restructured to handle unowned entries:
- **`defragStreamCGPendingEntry`** (new): Walks the group-level PEL rax,
defragments each NACK, updates the doubly-linked list pointers
(`pel_prev`, `pel_next`), `pel_time_head`, `pel_time_tail`,
`pel_nack_tail`, and the consumer PEL back-pointer for owned entries
- **`defragStreamConsumerPendingEntry`** (simplified): Only fixes up
back-pointers to the possibly-relocated consumer and CG, since actual
defragmentation is now done at the group-level walk. Unowned (NACK zone)
entries have no consumer PEL walk, so the group-level pass is their only
chance
### Key Benefits
- **Immediate re-delivery**: NACKed entries are instantly claimable by
other consumers via `XCLAIM` and `XAUTOCLAIM` (since `delivery_time = 0`
satisfies any `min-idle-time`), and prioritized for re-delivery in
`XREADGROUP CLAIM`, eliminating idle-time delays that can range from
seconds to minutes
- **Explicit release semantics**: Consumers can release messages
intentionally, with fine-grained control over retry behavior — a
capability that exists in competing systems like RabbitMQ
- **Flexible retry control**: Three modes (SILENT, FAIL, FATAL) plus
RETRYCOUNT cover the full spectrum of failure handling strategies, from
graceful shutdown to poison message detection
- **Reduced application complexity**: Eliminates the need for
application-level workarounds involving XPENDING polling, arbitrary idle
timeouts, and manual XCLAIM orchestration
- **Dead-letter queue readiness**: FATAL mode + delivery count enables
straightforward poison message detection and future DLQ integration
- **Backward compatibility**: Fully optional new command with zero
breaking changes to existing behavior
`M_CreateKeyMetaClass()` allows registration only:
- when `DEBUG enable-module-keymeta-runtime-registration 1` is set (replaces `server.enable_debug_cmd`)
- under `REDISMODULE_CTX_FLAGS_SERVER_STARTUP`, in addition to `module->onload`
As part of KSN, modules must not modify keys. However, RediSearch
modifies key metadata in some flows, which may invalidate the local
kvobj pointer.
Introduce KSN_INVALIDATE_KVOBJ() to explicitly invalidate kvobj after
notifications, preventing further access by Redis core. Currently
relevant for hash keys without HFE.
Changes:
- Add KSN_INVALIDATE_KVOBJ() to guard unsafe flows
- Apply invalidation beyond hash-specific paths
- Extend KSN side-effect coverage for DELEX and MOVE
- Rearrange flows to avoid kvobj access after notification
- Include additional tests from @JoanFM (#14939)
Behavior:
No intended behavior change and no reordering of notifications.
## Summary
Unregisters stream keys from `db->stream_idmp_keys` when IDMP
configuration is changed via `XCFGSET`. Previously, changing
`IDMP-DURATION` or `IDMP-MAXSIZE` would clear all IDMP producers but
leave the key registered in the tracking dictionary, causing the cron
job `handleExpiredIdmpEntries()` to needlessly iterate over streams with
no IDMP data.
## Changes
- **`xcfgsetCommand` (src/t_stream.c):** Added
`dictDelete(c->db->stream_idmp_keys, key)` in the `if (changed)` block
to immediately untrack the key when IDMP configuration changes clear all
producers.
## Benefits
- **Immediate cleanup of tracking state:** The stream key is removed
from `stream_idmp_keys` when configuration changes, rather than relying
on the cron to detect and clean up the stale entry on a subsequent pass.
- **Reduced unnecessary cron work:** The cron no longer wastes cycles
inspecting streams that have no IDMP producers.
In the "PUBSUB channels/shardchannels" test, we call `sunsubscribe`
without channels, but the number of loop iterations in
`consume_subscribe_messages()` is determined by the size of the channels
list. When the list is empty, the loop runs zero times and never reads
the `sunsubscribe` response returned by the server.
This means that when verifying the channel count, the previous command
might not have completed yet, so this PR adds a read after
`sunsubscribe`.
Rename the `void *value` parameter to `void *reserved` in keymeta
`rdb_save` and `aof_rewrite` module callbacks, and pass `NULL` at call
sites.
Originally the `value` parameter was planned to pass the internal kvobj
for core use of key metadata, but since modules cannot use it in any
meaningful way, it should not be exposed. The parameter is kept as a
reserved slot for potential future use.