Fix flaky test failures in `tests/unit/moduleapi/blockedclient.tcl`
caused by clock precision issues with the monotonic clock.
The test runs a command that blocks for 200ms and then asserts the
elapsed time
is >= 200ms. Due to clock skew and timing precision differences, the
measured
time occasionally comes back as 199ms, causing spurious test failures.
Redis can already use a processor-provided hardware counter as a
high-performance monotonic clock. On some architectures this must be
enabled carefully, but on ARM AArch64 the situation is different:
- The ARM Generic Timer is architecturally mandatory for all processors
that implement the AArch64 execution state.
- The system counter (`CNTVCT_EL0`) and its frequency (`CNTFRQ_EL0`) are
guaranteed to exist and provide a monotonic time source (per the *“The
Generic Timer in AArch64 state”* section of the *Arm® Architecture
Reference Manual for Armv8-A* —
https://developer.arm.com/documentation/ddi0487/latest).
Because of this architectural guarantee, it is safe to enable the
hardware clock by default on ARM AArch64.
As detailed below, this gives us around a 5% boost on io-thread
deployments for a simple strings benchmark.
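For illustration, a minimal sketch of reading that counter on AArch64
(the register names are from the Arm ARM cited above; the helper names
are mine):
```c
#include <stdint.h>

/* Read the virtual counter (CNTVCT_EL0): a monotonic tick count that
 * is architecturally guaranteed on AArch64. The isb prevents the read
 * from being speculated ahead of prior instructions. */
static inline uint64_t getMonotonicTicks(void) {
    uint64_t cnt;
    __asm__ __volatile__("isb; mrs %0, cntvct_el0" : "=r"(cnt));
    return cnt;
}

/* Read the counter frequency in Hz (CNTFRQ_EL0), used to convert
 * ticks into wall-clock durations. */
static inline uint64_t getMonotonicFreq(void) {
    uint64_t frq;
    __asm__ __volatile__("mrs %0, cntfrq_el0" : "=r"(frq));
    return frq;
}
```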
Introduce encodeEntryKey() helper to centralize key encoding logic for
no_value dicts, replacing 4 instances of duplicated code.
This also fixes a bug in dictDefragBucket() where:
- Before: *bucketref = newkey (loses ENTRY_PTR_IS_EVEN_KEY tag)
- After: *bucketref = encodeEntryKey(d, newkey) (preserves tag bits)
The bug affects dicts with no_value=1 and keys_are_odd=0 when defragKey
callback returns a relocated pointer. Currently theoretical as main DB
dict uses defragKey=NULL.
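A hedged sketch of the helper's shape (the tag constant and
parameterization are illustrative, not the exact dict.c definitions):
```c
#include <stdint.h>

#define ENTRY_PTR_IS_EVEN_KEY 2 /* illustrative tag bit */

/* In a no_value dict the key is stored directly in the bucket. Keys
 * that are guaranteed odd need no tag; even-aligned keys get a tag bit
 * so lookups can tell them apart from regular dictEntry pointers. */
static void *encodeEntryKey(int keys_are_odd, void *key) {
    if (keys_are_odd) return key;
    return (void *)((uintptr_t)key | ENTRY_PTR_IS_EVEN_KEY);
}
```
With the helper, the defrag path writes `*bucketref = encodeEntryKey(...)`,
preserving the tag bits instead of storing the raw pointer.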
Vector sets have the ability to also ask for ground truth by performing
an O(N) scan.
This makes it possible to run a recall test against any key holding a
vector set, letting users verify the best EF value to use and how HNSW
performs depending on the data set in a given key (the level of
clustering significantly changes how vectors near/far from a cluster
behave).
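For example (a hedged sketch; the TRUTH option forces the exact O(N)
scan):
VSIM key VALUES 3 0.1 0.2 0.3 COUNT 10 EF 200
VSIM key VALUES 3 0.1 0.2 0.3 COUNT 10 TRUTH
The recall for EF=200 is the overlap between the approximate result set
and the ground-truth result set.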
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
- Allow replicas to track the master's migrate task state
Previously we only propagated import task info to replicas; now we also
propagate migrate task info, so the new master can initiate slot
trimming again if needed after failover. This avoids data redundancy.
- Prevent replicas from initiating slot trimming actively
Since the source side lacks a data cleaning mechanism, we allow replicas
to continue pending slot trimming, but it is not a good idea to let
replicas trim actively. With the feature introduced above, we can delete
this logic.
This optimization is based on Valkey valkey-io/valkey#1389
ZRANK no longer performs per-level string comparisons when walking
the skiplist. Instead, it retrieves the skiplist node directly from
the hash table entry via pointer arithmetic.
Rank is computed by walking upward from the node and summing spans
using stored node heights, eliminating costly byte-wise comparisons.
This improves ZRANK throughput by 2–14% depending on score
distribution.
---------
Co-authored-by: Ran Shidlansik <ranshid@amazon.com>
MSETEX doesn't properly check ACL key permissions for all keys - only
the first key is validated.
MSETEX arguments look like: MSETEX <numkeys> key1 val1 key2 val2 ... EX
seconds
Keys are at every 2nd position (step=2). When Redis extracts keys for
ACL checking, it calculates where the last key is:
Bug: last = first + numkeys - 1;          (the calculation ignores step)
Fix: last = first + (numkeys - 1) * step;
With 2 keys starting at position 2:
Bug: last = 2 + 2 - 1 = 3 → only checks position 2
Fix: last = 2 + (2-1)*2 = 4 → checks positions 2 and 4
Fixes #14657
## Summary
This bug was introduced by https://github.com/redis/redis/pull/14623.
Before PR #14623, when a stream node was going to be fully removed, we
would just delete the whole node directly instead of iterating through
and deleting each entry.
Now, with the XTRIM/XADD flags, we have to iterate and delete entries
one by one. However, the implementation in issue #8169 didn’t consider
the case where all entries are removed, so `p` can end up being NULL.
Fixes an UndefinedBehaviorSanitizer error in `streamTrim()` when marking
the last entry in a listpack as deleted. The issue occurs when
performing pointer arithmetic on a NULL pointer after `lpNext()` reaches
the end of the listpack.
## Solution
If `p` is NULL, we skip the delta calculation and the calculation of
the new `p`.
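A minimal sketch of the guard, assuming the shape of the surrounding
code (`lp` is the listpack, `p` the current entry):
```c
/* lpNext() returns NULL once we step past the last listpack entry; in
 * that case all entries were removed and there is no delta or new
 * position to compute. */
unsigned char *next = lpNext(lp, p);
if (next != NULL) {
    delta = next - p; /* pointer arithmetic is safe: both are non-NULL */
    p = next;
}
```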
**Disclaimer: this patch was created with the help of AI**
My experience with Redis tests not passing on older hardware didn't
stop with the other PR opened for the same problem. There was another
deadlock happening when a test wrote a lot of commands without reading
the replies back, and the cause seems related to the fact that such
tests have something in common: they create a deferred client (which
does not read replies at all unless asked to) and flood the server with
1 million requests without reading anything back. This results in a
networking issue where the TCP socket stops accepting more data, and
the test hangs forever.
Reading those replies from time to time allows the test to run on such
older hardware.
Pinging oranagra, who introduced at least one of the bulk-write tests.
AFAIK there is no problem in the test if we change it in this way,
since the slave buffer is going to be filled anyway. But better to be
sure that it was not intentional to write all that data without reading
back for some reason I can't see.
IMPORTANT NOTE: **I am NOT sure at all** that a TCP socket sensing
congestion on one side also stalls the other side, but anyway this fix
works well and is likely a good idea in general. At the same time, I
doubt there is a pending bug in Redis that makes it hang if the output
buffer is too large, or if we flood the system with too many commands
without reading anything back. So the actual cause remains cloudy. I
remember that Redis, when the output buffer limit is reached, is
supposed to kill the client, not lower the priority of command
processing. Maybe Oran knows more about this.
## LLM commit message.
The test "slave buffer are counted correctly" was hanging indefinitely
on slow machines. The test sends 1M pipelined commands without reading
responses, which triggers a TCP-level deadlock.
Root cause: When the test client sends commands without reading
responses:
1. Server processes commands and sends responses
2. Client's TCP receive buffer fills (client not reading)
3. Server's TCP send buffer fills
4. Packets get dropped due to buffer pressure
5. TCP congestion control interprets this as network congestion
6. cwnd (congestion window) drops to 1, RTO increases exponentially
7. After multiple backoffs, RTO reaches ~100 seconds
8. Connection becomes effectively frozen
This was confirmed by examining TCP socket state showing cwnd:1,
backoff:9, rto:102912ms, and rwnd_limited:100% on the client side.
The fix interleaves reads with writes by processing responses every
10,000 commands. This prevents TCP buffers from filling to the point
where congestion control triggers the pathological backoff behavior.
The test still validates the same functionality (slave buffer memory
accounting) since the measurement happens after all commands complete.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
This change simplifies the dictionary API for handling stored keys by
replacing the previous dict stored-key mechanism with a cleaner
`keyFromStoredKey` callback approach.
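A hedged sketch of the callback's shape (everything except the
`keyFromStoredKey` name is illustrative):
```c
/* The dict type maps whatever is stored in the table (e.g. an embedded
 * entry) back to the logical key used for hashing and comparison,
 * replacing the separate stored-key mechanism. */
typedef struct dictType {
    /* ...existing callbacks... */
    const void *(*keyFromStoredKey)(const void *storedKey);
} dictType;
```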
### Summary
This PR introduces two new maxmemory eviction policies: `volatile-lrm`
and `allkeys-lrm`.
LRM (Least Recently Modified) is similar to LRU but only updates the
timestamp on write operations, not read operations. This makes it useful
for evicting keys that haven't been modified recently, regardless of how
frequently they are read.
### Core Implementation
The LRM implementation reuses the existing LRU infrastructure but with a
key difference in when timestamps are updated:
- **LRU**: Updates timestamp on both read and write operations
- **LRM**: Updates timestamp only on write operations via `updateLRM()`
### Key changes:
Extend `keyModified()` to accept an optional `robj *val` parameter and
call `updateLRM()` when a value is provided. Since `keyModified()`
serves as the unified entry point for all key modifications, placing
the LRM update here ensures timestamps are consistently updated across
all write operations.
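A hedged sketch of the hook's shape (the policy flag name and the
surrounding bookkeeping are illustrative):
```c
/* keyModified() is the single entry point for writes, so refreshing
 * the LRM clock here covers every write path. Reads never pass through
 * here, which is exactly the LRU/LRM difference. */
void keyModified(redisDb *db, robj *key, robj *val) {
    /* ...existing modification bookkeeping... */
    if (val != NULL && (server.maxmemory_policy & MAXMEMORY_FLAG_LRM))
        updateLRM(val); /* stamp the value's clock on writes only */
}
```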
---------
Co-authored-by: oranagra <oran@redislabs.com>
Co-authored-by: Yuan Wang <yuan.wang@redis.com>
This bug was introduced by #14130 and found by guybe7.
When using XTRIM/XADD with approx mode (~) and DELREF/ACKED delete
strategies, if a node was eligible for removal but couldn't be removed
directly (because consumer group references need to be checked), the
code would incorrectly break out of the loop instead of continuing to
process entries within the node. This fix allows the per-entry deletion
logic to execute for eligible nodes when using non-KEEPREF strategies.
Follow-up to https://github.com/redis/redis/pull/14423.
In https://github.com/redis/redis/pull/14423, I thought the last lpNext
operation of the iterator occurred at the end of streamIteratorGetID().
However, I overlooked the fact that after calling
`streamIteratorGetID()`, we might still use `streamIteratorGetField()`
to continue moving within the current entry.
This means that during reverse iteration, the iterator could move back
to a previous entry position.
To fix this, in this PR I record the current position at the beginning
of streamIteratorGetID(). When we enter it again, we ensure that the
entry position does not exceed the previous one: during forward
iteration the entry must be greater than the last entry position, and
during reverse iteration it must be smaller than the last entry
position.
Note that the fix from https://github.com/redis/redis/pull/14423 has
been replaced by this fix.
This PR fixes a bug on Solaris where setsockopt() fails with EINVAL
when TCP keepalive parameters fall below the kernel's 10-second
minimum. When the user-configured interval is divided by 3 to
calculate TCP_KEEPINTVL, values below 30 seconds result in
intervals less than 10 seconds (e.g., interval=25 → intvl=8),
causing connection failures.
The fix adds `if (intvl < 10) intvl = 10;` to enforce the minimum and
simplifies the TCP_KEEPALIVE_ABORT_THRESHOLD calculation for older
Solaris versions. This behavior is documented in the Oracle
Solaris TCP manual.
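A sketch of the clamp in context (variable names follow the description
above):
```c
/* Derive the probe interval from the configured keepalive time; the
 * Solaris kernel rejects values below 10 seconds with EINVAL. */
int intvl = interval / 3;
if (intvl < 10) intvl = 10; /* enforce the 10-second kernel minimum */
if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) != 0)
    return ANET_ERR;
```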
Problem:
The activeDefragHfieldDictCallback was wrongly using dictSetKey(),
which sets both key and value. However, hash field dictionaries use
no_value=1 (since PR #14595), causing the assertion
`assert(!d->type->no_value)` to fail.
Solution:
* Replace `dictSetKey(d, (dictEntry *)de, newEntry)` with
`dictSetKeyAtLink(d, newEntry, &plink, 0)` which properly handles both
regular dictEntry and the optimized `no_value=1` case where keys are
stored directly in the hash table. The callback already receives the
plink parameter pointing to the exact location that needs updating (see
the sketch after this list).
* Following PR #14595, a value can now be optionally embedded in
`entry`. As a result, `activeDefragEntry()` defragments an entry's
value only when `entryGetValuePtrRef(entry) != NULL`.
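A hedged sketch of the corrected callback (the exact signature may
differ; the `dictSetKeyAtLink()` call matches the description in the
first bullet):
```c
/* plink points at the exact bucket/link slot for this entry, so
 * dictSetKeyAtLink() can rewrite it correctly whether the dict stores
 * full dictEntry structs or bare keys (no_value=1). */
void activeDefragHfieldDictCallback(void *privdata, void *entryRef,
                                    dictEntryLink plink) {
    dict *d = privdata;
    void *newEntry = activeDefragAlloc(entryRef);
    if (newEntry != NULL)
        dictSetKeyAtLink(d, newEntry, &plink, 0); /* 0 = existing item */
}
```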
This PR continues the work from
[#13400](https://github.com/redis/redis/pull/13400), following the
discussion in
[#11747](https://github.com/redis/redis/pull/11747#discussion_r1094418111),
to further ensure sensitive user data is not exposed in logs when
hide_user_data_from_log is enabled.
- Introduce redactLogCstr() helper for safe, centralized log redaction.
- Update ACL and networking log messages to use redacted values where
appropriate.
- Prevent leaking raw query buffer contents.
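A hedged sketch of the helper's shape (the placeholder text is
illustrative):
```c
/* Centralizing redaction keeps call sites one-liners: when
 * hide-user-data-from-log is enabled, user-supplied strings are
 * replaced with a fixed placeholder before they reach the log. */
const char *redactLogCstr(const char *str) {
    return server.hide_user_data_from_log ? "*redacted*" : str;
}

/* Call-site usage: serverLog(LL_WARNING, "...%s", redactLogCstr(s)); */
```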
During hash RDB loading (`rdbLoadObject`), the element count `len` is
consumed as entries are read.
In the listpack -> hashtable (HT) spillover path, we later used the
*remaining* `len` for `dictTryExpand`.
By that point `len` may no longer represent the original cardinality
(and can be 0), which can skip/undersize the pre-sizing and lead to
extra rehash/expansion work while loading large hashes.
The same issue existed in the hash-with-metadata (field expire) load
path.
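A hedged sketch of the fix's shape (variable names are illustrative):
```c
uint64_t original_len = len; /* snapshot before the load loop */
/* ... load loop: while (len > 0) { len--; ... } ... */

/* Pre-size with the original cardinality, not the consumed counter. */
if (dictTryExpand(d, original_len) != DICT_OK) {
    /* allocation failure: abort the load as before */
}
```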
Flags defined (mutually exclusive):
plainFlag = flags & RDB_LOAD_PLAIN
sdsFlag = flags & RDB_LOAD_SDS
robjFlag = !(plainFlag || sdsFlag)
If robjFlag is true, the function returns early. Otherwise we are in the
plain/sds path:
plainFlag → allocate with ztrymalloc_usable()
sdsFlag → allocate with sdstrynewlen()
Thus, in error handling only two cases exist:
plainFlag → zfree(buf)
else → sdsFlag → sdsfree(buf)
The hfldFlag branch assumed a third allocation path that no longer
exists after PR #14595, making `entryFree(buf, NULL)` unreachable.
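In code form, the reachable error handling reduces to this sketch:
```c
/* Only two allocation paths can reach the error label: */
if (plainFlag)
    zfree(buf);   /* allocated with ztrymalloc_usable() */
else
    sdsfree(buf); /* sdsFlag path: allocated with sdstrynewlen() */
/* The old hfldFlag branch calling entryFree(buf, NULL) is unreachable. */
```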
This PR implements support for automatic client authentication based on
a field in the client's TLS certificate.
We adopt Valkey's PR: https://github.com/valkey-io/valkey/pull/1920
API Changes:
Add new configuration `tls-auth-clients-user`:
- Allowed values: `off` (default), `CN`.
- `off` – disable TLS certificate–based auto-authentication.
- `CN` – derive the ACL username from the Common Name (CN) field of the
client certificate.
New INFO stat
- `acl_access_denied_tls_cert`
- Counts failed TLS certificate–based authentication attempts, i.e. TLS
connections where a client certificate was presented, a username was
derived from it, but no matching ACL user was found.
New ACL LOG reason
- Reason string: `"tls-cert"`
- Emitted when a client certificate’s Common Name fails to match any
existing ACL user.
Implementation Details:
- Added getCertFieldByName() utility to extract fields from peer
certificates.
- Added autoAuthenticateClientFromCert() to handle automatic login logic
post-handshake.
- Integrated automatic authentication into the TLSAccept function after
handshake completion.
- Updated test suite (tests/integration/tls.tcl) to validate the
feature.
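A usage sketch (the certificate subject is illustrative):
# redis.conf
tls-auth-clients-user CN
A client presenting a certificate with subject CN=alice is then
automatically authenticated as ACL user "alice" after the handshake,
with no AUTH command needed.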
The condition guarding `o->refcount++` in `incrRefCount` is `if
(o->refcount < OBJ_FIRST_SPECIAL_REFCOUNT)`, meaning refcount can
accidentally reach the first special refcount (currently
`OBJ_STATIC_REFCOUNT`).
Fixed the condition to be `if (o->refcount < OBJ_FIRST_SPECIAL_REFCOUNT
- 1)`.
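A sketch of the corrected function (the special-refcount branch is
elided):
```c
void incrRefCount(robj *o) {
    /* Stop one below the first special value so a normal refcount can
     * never collide with OBJ_STATIC_REFCOUNT. */
    if (o->refcount < OBJ_FIRST_SPECIAL_REFCOUNT - 1) {
        o->refcount++;
    } else {
        /* special refcounts (shared/static objects): handled as before */
    }
}
```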
Unifies field–value pairs and optional expiration into a single allocation,
removing the generic mstr abstraction (mstr was discarded because its overly
generic design added complexity and runtime overhead without clear benefits
for hash workloads). The new Entry layout supports embedded values (≤128B)
and pointer-based values, with expiration metadata integrated for per-field hash
TTLs. Update hash dictionaries to no_value=1 and apply optimizations to avoid
regressions. This significantly reduces hash memory usage (~30–50%) with minimal
performance impact.
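A purely illustrative sketch of the single-allocation idea (the real
layout in the codebase differs in detail):
```c
#include <stdint.h>

/* One allocation holds optional TTL metadata, the field, and either an
 * embedded short value (<=128B) or a pointer to a larger one. */
typedef struct entry {
    uint64_t ttl;         /* optional per-field expiration time */
    uint8_t val_embedded; /* 1: value bytes follow the field inline */
    uint8_t data[];       /* field bytes, then value bytes or pointer */
} entry;
```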
This PR contains a few changes for ASM:
**Bug fix:**
- Fixes an issue in ASM when adjacent slot ranges are provided in
CLUSTER MIGRATION IMPORT command (e.g. 0-10 11-100). ASM task keeps the
original slot ranges as given, but later the source node reconstructs
the slot ranges from the config update as a single range (e.g. 0-100).
This causes asmLookupTaskBySlotRangeArray() to fail to match the task,
and the source node incorrectly marks the ASM task as failed. Although
the migration completes successfully, the source node performs a
blocking trim operation for these keys, assuming the slot ownership
changed outside of an ASM operation. With this PR, Redis merges
adjacent slot ranges in a slot range array to avoid this problem.
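A hedged sketch of the merge (classic interval coalescing over ranges
sorted by start slot; the names are illustrative):
```c
typedef struct { int start, end; } slotRange;

/* Coalesce adjacent/overlapping ranges in place and return the new
 * count, e.g. [0-10][11-100] becomes [0-100]. Assumes sorted input. */
int mergeAdjacentSlotRanges(slotRange *r, int n) {
    int j = 0;
    for (int i = 1; i < n; i++) {
        if (r[i].start <= r[j].end + 1) {
            if (r[i].end > r[j].end) r[j].end = r[i].end; /* extend */
        } else {
            r[++j] = r[i]; /* gap: start a new merged range */
        }
    }
    return n ? j + 1 : 0;
}
```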
**Other improvements:**
- Indicate the imported/migrated key count in the log once the ASM
operation is completed.
- Use an error return value instead of an assert in
parseSlotRangesOrReply().
- Validate the slot range array given by the cluster implementation on
ASM_EVENT_IMPORT_START.
---------
Co-authored-by: Yuan Wang <yuan.wang@redis.com>
These bugs were located by @rantidhar.
This PR fixes two related issues in kvstore iterator handling when
dealing with empty kvstores:
1. If the kvstore is empty, kvstoreGetFirstNonEmptyDictIndex() may
return 0. For example, during defragmentation, it may only be when
calling kvstoreGetNextNonEmptyDictIndex() that the invalid slot is
detected. This fix ensures that kvstoreGetFirstNonEmptyDictIndex() will
eventually return -1 and terminate the defragmentation process.
However, currently, when the kvstore is created, the number of
dictionary arrays is at least 1, so this is just a defensive fix.
2. If a kvstoreIterator is initialized but not used by calling
kvstoreIteratorNextDict() before it is released, then during the
kvstoreIteratorReset(), using didx(-1) to access the dictionary array
could lead to an out-of-bounds access. However, in the current code,
there will never be a situation where kvstoreIteratorNextDict() is not
called, so this is just a defensive fix.
---------
Co-authored-by: rantidhar <ran.tidhar@redis.com>
`kqueue` has the capability of batch-applying events:
> The kevent(), kevent64() and kevent_qos() system calls are used to
> register events with the queue, and return any pending events to the
> user. The changelist argument is a pointer to an array of kevent,
> kevent64_s or kevent_qos_s structures, as defined in <sys/event.h>.
> All changes contained in the changelist are applied before any
> pending events are read from the queue. The nchanges argument gives
> the size of changelist.
This PR implements this functionality for `kqueue`, with which we're
able to eliminate a large number of `kevent(2)` system calls.
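A minimal sketch of the pattern (standard kevent(2) usage, not the
exact ae_kqueue.c code):
```c
#include <sys/event.h>

/* Instead of one kevent() call per registration, queue changes in a
 * changelist and submit them together with the poll: all changes are
 * applied before pending events are read. */
struct kevent changes[2], events[64];
EV_SET(&changes[0], fd1, EVFILT_READ,  EV_ADD, 0, 0, NULL);
EV_SET(&changes[1], fd2, EVFILT_WRITE, EV_ADD, 0, 0, NULL);
int nev = kevent(kq, changes, 2, events, 64, NULL);
```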
## References
[FreeBSD - kqueue](https://man.freebsd.org/cgi/man.cgi?kqueue)
Using a different cluster implementation instead of the legacy one may
result in inconsistent slot migration logs, which can cause confusion.
Therefore, we should centralize these logs within the slot migration
process itself rather than relying on the specific cluster
implementation.
All commands that invoke streamReplyWithRange with a group argument
pass end as NULL, so the incorrect stream ID comparison is never
triggered. In other words, even if this bug remained unfixed, no
incident would occur.
This PR introduces a mechanism that allows a module to temporarily
disable trimming after an ASM migration operation so it can safely
finish ongoing asynchronous jobs that depend on keys in migrating (and
about to be trimmed) slots.
1. **ClusterDisableTrim/ClusterEnableTrim**
We introduce the `ClusterDisableTrim`/`ClusterEnableTrim` Module APIs
to allow a module to disable/enable slot trimming:
```
/* Disable automatic slot trimming. */
int RM_ClusterDisableTrim(RedisModuleCtx *ctx)
/* Enable automatic slot trimming */
int RM_ClusterEnableTrim(RedisModuleCtx *ctx)
```
**Please notice**: Redis will not start any subsequent import or migrate
ASM operations while slot trimming is disabled, so modules must
re-enable trimming immediately after completing their pending work.
The only valid and meaningful time for a module to disable trimming
appears to be after the MIGRATE_COMPLETED event.
2. **REDISMODULE_OPEN_KEY_ACCESS_TRIMMED**
Added REDISMODULE_OPEN_KEY_ACCESS_TRIMMED to RM_OpenKey() so that a
module can operate on keys in unowned slots after trimming is paused.
We no longer delete a key that is part of a pending trim job when it is
accessed this way. `expireIfNeeded` returns `KEY_VALID` if
`EXPIRE_ALLOW_ACCESS_TRIMMED` is set; otherwise it returns
`KEY_TRIMMED` without deleting the key.
3. **REDISMODULE_CTX_FLAGS_TRIM_IN_PROGRESS**
We also extend RM_GetContextFlags() to include a flag
REDISMODULE_CTX_FLAGS_TRIM_IN_PROGRESS indicating whether a trimming job
is pending (due to trim pause) or in progress. Modules could
periodically poll this flag to synchronize their internal state, e.g.,
if a trim job was delayed or if the module incorrectly assumed trimming
was still active.
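A hedged usage sketch from a module's point of view, tying the three
pieces together (the surrounding event-handler context is assumed):
```c
/* On MIGRATE_COMPLETED: pause trimming, finish asynchronous jobs that
 * still touch keys in the migrated (about-to-be-trimmed) slots, then
 * re-enable trimming promptly, since ASM operations are blocked while
 * trimming is disabled. */
RedisModule_ClusterDisableTrim(ctx);

RedisModuleKey *key = RedisModule_OpenKey(ctx, keyname,
        REDISMODULE_READ | REDISMODULE_OPEN_KEY_ACCESS_TRIMMED);
/* ... complete the pending asynchronous work ... */
RedisModule_CloseKey(key);

RedisModule_ClusterEnableTrim(ctx);
```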
Bugfix: RM_SetClusterFlags could not clear a flag after enabling it first.
---------
Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com>
This PR introduces a new flag `--name <client-name>` to `redis-cli`.
This allows users to specify a persistent client name that remains
associated with the connection.
Implementation Details:
- Configuration: Added `client_name` field to the global config struct.
- Argument Parsing: Updated `parseOptions` to handle the `--name` flag.
- Unified Logic (`cliSetName`): Introduced a helper function
`cliSetName` that sends `CLIENT SETNAME <name>` immediately after the
connection is established. This ensures the name is set consistently
for both RESP2 and RESP3 modes.
- Documentation: Updated `redis-cli --help` output to include the new
flag.
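A usage sketch:
$ redis-cli --name my-app
127.0.0.1:6379> CLIENT GETNAME
"my-app"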
This PR can close #14585.
The `repl-diskless-load` feature can effectively reduce the time of
full synchronization, but maybe it is not widely used.
The `swapdb` option needs double `maxmemory`, and `on-empty-db` only
works on the first full sync (the replica must have no data).
This PR introduces a new option: `flushdb` - always flush the entire
dataset before diskless load. If the diskless load fails, the replica
will lose all existing data.
Of course, this brings the risk of data loss, but it provides a choice
if you want to reduce full sync time and accept this risk.
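In redis.conf this would look like:
repl-diskless-load flushdb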
This PR implements flexible keyword-based argument parsing for all 12
hash field expiration commands, allowing users to specify arguments in
any logical order rather than being constrained by rigid positional
requirements.
This enhancement follows Redis's modern design of keyword-based flexible
argument ordering and significantly improves user experience.
Commands with Flexible Parsing
HEXPIRE, HPEXPIRE, HEXPIREAT, HPEXPIREAT, HGETEX, HSETEX
Some examples:
HEXPIRE:
* All these are equivalent and valid:
HEXPIRE key EX 60 NX FIELDS 2 f1 f2
HEXPIRE key NX EX 60 FIELDS 2 f1 f2
HEXPIRE key FIELDS 2 f1 f2 EX 60 NX
HEXPIRE key FIELDS 2 f1 f2 NX EX 60
HEXPIRE key NX FIELDS 2 f1 f2 EX 60
HGETEX:
* All these are equivalent and valid:
HGETEX key EX 60 FIELDS 2 f1 f2
HGETEX key FIELDS 2 f1 f2 EX 60
HSETEX:
* All these are equivalent and valid:
HSETEX key FNX EX 60 FIELDS 2 f1 v1 f2 v2
HSETEX key EX 60 FNX FIELDS 2 f1 v1 f2 v2
HSETEX key FIELDS 2 f1 v1 f2 v2 FNX EX 60
HSETEX key FIELDS 2 f1 v1 f2 v2 EX 60 FNX
HSETEX key FNX FIELDS 2 f1 v1 f2 v2 EX 60
1. GitHub has deprecated older macOS runners, and macos-13 is no longer
supported. Updating to macos-26 ensures that CI workflows continue to
run without interruption.
2. Previously, cross-platform-actions/action@v0.22.0 used runs-on:
macos-13. I checked the latest version of cross-platform-actions, and
the official examples now use runs-on: ubuntu. I think we can switch
from macOS to Ubuntu.
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
Decouple kvstore from its metadata by introducing `kvstoreType` structure of
callbacks. This resolves the abstraction layer violation of having kvstore
include `server.h` directly.
Move (again) cluster slot statistics to per slot dicts' metadata. The callback
`canFreeDict` is used to prevent freeing empty per slot dicts from losing per
slot statistics.
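A hedged sketch of the callback structure (only `canFreeDict` is named
above; the other member is illustrative):
```c
/* The embedder supplies metadata behavior through callbacks, so
 * kvstore.c no longer needs to include server.h. */
typedef struct kvstoreType {
    size_t (*metadataSize)(dict *d); /* per-dict metadata bytes */
    int (*canFreeDict)(dict *d);     /* 0 = keep an empty dict, e.g. it
                                        still holds per-slot stats */
} kvstoreType;
```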
Co-authored-by: Ran Tidhar <ran.tidhar@redis.com>