Commit graph

12870 commits

Author SHA1 Message Date
Filipe Oliveira (Redis)
3c96680cfb
Enable hardware clock by default on ARM AArch64. (#14676)
Redis can already use a processor-provided hardware counter as a
high-performance monotonic clock. On some architectures this must be
enabled carefully, but on ARM AArch64 the situation is different:

- The ARM Generic Timer is architecturally mandatory for all processors
that implement the AArch64 execution state.
- The system counter (`CNTVCT_EL0`) and its frequency (`CNTFRQ_EL0`) are
guaranteed to exist and provide a monotonic time source (per the *“The
Generic Timer in AArch64 state”* section of the *Arm® Architecture
Reference Manual for Armv8-A* —
https://developer.arm.com/documentation/ddi0487/latest).

Because of this architectural guarantee, it is safe to enable the
hardware clock by default on ARM AArch64.
As detailed below, this gives us around a 5% boost on io-threads
deployments for a simple strings benchmark.
2026-01-13 20:12:04 +08:00
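A minimal sketch of reading the ARM Generic Timer from user space, assuming a GCC/Clang toolchain. `CNTVCT_EL0` and `CNTFRQ_EL0` are the registers named above; the helper names `monotonic_ticks`, `ticks_per_second` and `monotonic_us`, and the `isb` barrier before the counter read, are assumptions of this sketch and not Redis's actual monotonic clock API. On other architectures it falls back to POSIX `clock_gettime(CLOCK_MONOTONIC)`.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Hypothetical helpers for illustration, not the Redis monotonic API. */
#if defined(__aarch64__)
/* Read the virtual count register of the ARM Generic Timer. */
static inline uint64_t monotonic_ticks(void) {
    uint64_t v;
    __asm__ __volatile__("isb\n\tmrs %0, cntvct_el0" : "=r"(v) :: "memory");
    return v;
}
/* Counter frequency in Hz, as reported by CNTFRQ_EL0. */
static inline uint64_t ticks_per_second(void) {
    uint64_t f;
    __asm__ __volatile__("mrs %0, cntfrq_el0" : "=r"(f));
    return f;
}
#else
/* Fallback: nanosecond ticks from the POSIX monotonic clock. */
static inline uint64_t monotonic_ticks(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}
static inline uint64_t ticks_per_second(void) { return 1000000000ULL; }
#endif

/* Convert ticks to microseconds, splitting the division to avoid overflow. */
static inline uint64_t monotonic_us(void) {
    uint64_t t = monotonic_ticks(), f = ticks_per_second();
    return (t / f) * 1000000ULL + (t % f) * 1000000ULL / f;
}
```

Because the counter is architecturally guaranteed on AArch64, the `#if` branch needs no runtime capability check, which is the property the commit relies on.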
Salvatore Sanfilippo
60a4fa2e4b
Vsets: Remove stale note about replication from README. (#14528) 2026-01-13 16:13:59 +08:00
Moti Cohen
cc1660abdd
Refactor dict key encoding and fix defrag tag bit bug (#14682)
Introduce encodeEntryKey() helper to centralize key encoding logic for
no_value dicts, replacing 4 instances of duplicated code.

This also fixes a bug in dictDefragBucket() where:
- Before: *bucketref = newkey (loses ENTRY_PTR_IS_EVEN_KEY tag)
- After: *bucketref = encodeEntryKey(d, newkey) (preserves tag bits)

The bug affects dicts with no_value=1 and keys_are_odd=0 when the
defragKey callback returns a relocated pointer. It is currently
theoretical, as the main DB dict uses defragKey=NULL.
2026-01-12 13:19:03 +02:00
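The tag-loss bug can be illustrated with a small pointer-tagging sketch. The tag value `TAG_EVEN_KEY`, the mask, and the helpers `encode_entry_key`, `decode_entry_key` and `slot_has_even_key_tag` are hypothetical stand-ins for the `ENTRY_PTR_IS_EVEN_KEY` machinery in dict.c, and the sketch assumes even keys are at least 8-byte aligned. The point is only that writing the raw pointer into a slot drops the tag, while routing every write through one encoding helper keeps it.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical tag layout: the low 3 bits of an 8-byte-aligned pointer are free. */
#define TAG_MASK     ((uintptr_t)7)
#define TAG_EVEN_KEY ((uintptr_t)2) /* stand-in for ENTRY_PTR_IS_EVEN_KEY */

typedef struct {
    bool keys_are_odd; /* keys live at odd addresses, so they self-identify */
} dict_cfg;

/* Single place that decides how a no_value dict stores a key in a slot. */
static void *encode_entry_key(const dict_cfg *d, void *key) {
    if (d->keys_are_odd) return key;                /* odd pointer needs no tag */
    return (void *)((uintptr_t)key | TAG_EVEN_KEY); /* even key: add tag bits */
}

static bool slot_has_even_key_tag(const void *slot) {
    return ((uintptr_t)slot & TAG_MASK) == TAG_EVEN_KEY;
}

static void *decode_entry_key(const void *slot) {
    uintptr_t v = (uintptr_t)slot;
    if (v & 1) return (void *)v;    /* odd key stored untouched */
    return (void *)(v & ~TAG_MASK); /* strip the tag from an even key */
}
```

Assigning `key` directly to the slot (as the old `*bucketref = newkey` did) yields a value for which `slot_has_even_key_tag()` is false, so later lookups would misinterpret the entry.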
Salvatore Sanfilippo
391530cd15
[Vector sets]: redis-cli recall testing abilities (#14408)
Vector sets can also compute the ground truth by performing an
O(N) scan.
This makes it possible to run a recall test against any key holding a
vector set, letting users verify the best EF value to use and how HNSW
performs depending on the data set in a given key (the level of
clustering significantly changes how vectors near/far from a cluster
behave).

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2026-01-12 12:40:39 +08:00
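Recall here means the fraction of the true k nearest neighbours (found by the exhaustive O(N) scan) that also appear in the approximate HNSW result. A minimal sketch of that computation, assuming results are identified by distinct integer element IDs and that squared Euclidean distance is the metric; `exact_top_k` and `recall_at_k` are hypothetical helpers, not the redis-cli or vector sets API.

```c
#include <stddef.h>
#include <float.h>

/* Squared Euclidean distance (assumed metric for this sketch). */
static float dist2(const float *a, const float *b, size_t dim) {
    float s = 0.0f;
    for (size_t i = 0; i < dim; i++) { float d = a[i] - b[i]; s += d * d; }
    return s;
}

/* Ground truth: O(N) scan keeping the k closest IDs in a sorted array.
 * Assumes 1 <= k <= 64. */
static void exact_top_k(const float *vecs, size_t n, size_t dim,
                        const float *query, size_t k, int *out_ids) {
    float best[64];
    if (k == 0 || k > 64) return;
    for (size_t j = 0; j < k; j++) { best[j] = FLT_MAX; out_ids[j] = -1; }
    for (size_t i = 0; i < n; i++) {
        float d = dist2(vecs + i * dim, query, dim);
        if (d >= best[k - 1]) continue;
        size_t j = k - 1; /* insertion: shift worse entries down by one */
        while (j > 0 && best[j - 1] > d) {
            best[j] = best[j - 1]; out_ids[j] = out_ids[j - 1]; j--;
        }
        best[j] = d; out_ids[j] = (int)i;
    }
}

/* recall@k = |approx ∩ exact| / k */
static double recall_at_k(const int *approx, const int *exact, size_t k) {
    size_t hits = 0;
    for (size_t i = 0; i < k; i++)
        for (size_t j = 0; j < k; j++)
            if (approx[i] == exact[j]) { hits++; break; }
    return (double)hits / (double)k;
}
```

Repeating this for a sample of queries at several EF values shows where recall stops improving for a given key's data distribution.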
Vitah Lin
e396dd3385
Fix flaky stream LRM test due to timing precision (#14674)
2026-01-09 10:14:44 +08:00
Yuan Wang
858a8800e2
Propagate migrate task info to replicas (#14672)
- Allow replicas to track the master's migrate task state
Previously, we only propagated import task info to replicas; now we
also propagate migrate task info, so that after a failover the new
master can initiate slot trimming again if needed, which avoids data
redundancy.

- Prevent replicas from actively initiating slot trimming
Because the source side lacked a data-cleaning mechanism, we allowed
replicas to continue pending slot trimming, but it is not a good idea to
let replicas trim actively. With the feature above in place, we can
delete this logic.
2026-01-08 19:06:57 +08:00
Moti Cohen
29f733484a
Optimize ZRANK by avoiding string comparisons during skiplist traversal (#14636)
This optimization is based on Valkey valkey-io/valkey#1389

ZRANK no longer performs per-level string comparisons when walking
the skiplist. Instead, it retrieves the skiplist node directly from
the hash table entry via pointer arithmetic.

Rank is computed by walking upward from the node and summing spans
using stored node heights, eliminating costly byte-wise comparisons.

This improves ZRANK throughput by 2–14% depending on score
distribution.

--------- 
Co-authored-by: Ran Shidlansik <ranshid@amazon.com>
2026-01-08 11:20:52 +02:00
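The span-summing idea can be sketched on a tiny, statically built skiplist. This assumes the Redis span convention in which a NULL forward pointer at level i stores the distance from the node to the last element, so that summing spans from a node to the end of the list gives `length - rank`. The path repeatedly follows the current node's highest forward pointer, so it only climbs to nodes at least as tall. `sl_node`, `sl_build` and `sl_rank_by_node` are hypothetical simplifications, not the actual `zskiplistNode` layout or the Valkey/Redis implementation.

```c
#include <stddef.h>

#define SL_MAXLEVEL 4

/* Hypothetical, simplified skiplist node (no score/element fields). */
typedef struct sl_node {
    int height;                          /* levels in use, 1..SL_MAXLEVEL */
    struct sl_node *forward[SL_MAXLEVEL];
    unsigned long span[SL_MAXLEVEL];     /* rank distance to forward[l]; when
                                            forward[l] is NULL, distance to the
                                            last node (Redis-style convention) */
} sl_node;

/* Link nodes[0..n-1], already in sorted order, given their heights. */
static void sl_build(sl_node *nodes, const int *heights, size_t n) {
    for (size_t i = 0; i < n; i++) {
        nodes[i].height = heights[i];
        for (int l = 0; l < SL_MAXLEVEL; l++) {
            nodes[i].forward[l] = NULL;
            nodes[i].span[l] = 0;
        }
        for (int l = 0; l < heights[i]; l++) {
            size_t j = i + 1;
            while (j < n && heights[j] <= l) j++; /* next node present at level l */
            if (j < n) {
                nodes[i].forward[l] = &nodes[j];
                nodes[i].span[l] = (unsigned long)(j - i);
            } else {
                nodes[i].span[l] = (unsigned long)(n - 1 - i);
            }
        }
    }
}

/* 1-based rank of x with no key comparisons: sum spans along each node's
 * highest forward pointer; the total is length - rank(x). */
static unsigned long sl_rank_by_node(const sl_node *x, unsigned long length) {
    unsigned long dist = 0;
    for (;;) {
        int top = x->height - 1;
        dist += x->span[top];
        if (x->forward[top] == NULL) break;
        x = x->forward[top];
    }
    return length - dist;
}
```

Since only node pointers and stored heights are touched, the cost no longer depends on how long the member strings are or how similar their prefixes are.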
Slavomir Kaslev
5aa47347e7
Fix CLUSTER SLOT-STATS test Lua scripts (#14671)
Fix hard-coded keys in test Lua scripts, which are incompatible with
cluster mode.

Reported-by: Oran Agra <oran@redis.com>
2026-01-08 11:16:50 +02:00
Stav-Levi
73249497d4
Fix ACL key-pattern bypass in MSETEX command (#14659)
MSETEX doesn't properly check ACL key permissions for all keys - only
the first key is validated.

MSETEX arguments look like: MSETEX <numkeys> key1 val1 key2 val2 ... EX seconds

Keys are at every 2nd position (step=2). When Redis extracts keys for
ACL checking, it calculates where the last key is:

Before: last = first + numkeys - 1;          => calculation ignores step
After:  last = first + (numkeys-1) * step;

With 2 keys starting at position 2:

Bug: last = 2 + 2 - 1 = 3 → only checks position 2
Fix: last = 2 + (2-1)*2 = 4 → checks positions 2 and 4

Fixes #14657
2026-01-08 08:41:55 +02:00
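The off-by-step error is easy to reproduce in isolation. A minimal sketch that computes which argv positions an ACL check would visit for `MSETEX <numkeys> key1 val1 key2 val2 ...`; `last_key_buggy`, `last_key_fixed` and `collect_key_positions` are hypothetical names, not Redis's actual key-extraction code.

```c
#include <stddef.h>

/* Buggy: treats keys as contiguous, ignoring the key/value stride. */
static int last_key_buggy(int first, int numkeys, int step) {
    (void)step;
    return first + numkeys - 1;
}

/* Fixed: the last key sits (numkeys-1) strides after the first one. */
static int last_key_fixed(int first, int numkeys, int step) {
    return first + (numkeys - 1) * step;
}

/* Collect argv positions first, first+step, ... up to and including last. */
static size_t collect_key_positions(int first, int last, int step,
                                    int *out, size_t cap) {
    size_t n = 0;
    for (int i = first; i <= last && n < cap; i += step) out[n++] = i;
    return n;
}
```

For `MSETEX 2 k1 v1 k2 v2 EX 10`, the buggy bound stops at argv[3], so only argv[2] (`k1`) is checked, while the fixed bound also reaches argv[4] (`k2`).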
debing.sun
85ab4cab58
Fix UBSan error in stream trim when processing last entry (#14669)
## Summary

This bug was introduced by https://github.com/redis/redis/pull/14623

Before PR #14623, when a stream node was going to be fully removed, we
would just delete the whole node directly instead of iterating through
and deleting each entry.

Now, with the XTRIM/XADD flags, we have to iterate and delete entries
one by one. However, the implementation in issue #8169 didn’t consider
the case where all entries are removed, so `p` can end up being NULL.

Fixes an UndefinedBehaviorSanitizer error in `streamTrim()` when marking
the last entry in a listpack as deleted. The issue occurs when
performing pointer arithmetic on a NULL pointer after `lpNext()` reaches
the end of the listpack.

## Solution
If p is NULL, we skip the delta calculation and the calculation of
new `p`.
2026-01-07 20:51:41 +08:00
Salvatore Sanfilippo
154fdcee01
Test tcp deadlock fixes (#14667)
**Disclaimer: this patch was created with the help of AI**

My experience with Redis tests not passing on older hardware didn't
end with the other PR I opened for the same problem. There was another
deadlock when the test wrote a lot of commands without reading the
replies back, and the cause seems related to something these tests have
in common: they create a deferred client (which does not read replies at
all unless asked to) and flood the server with 1 million requests
without reading anything back. This results in a networking issue where
the TCP socket stops accepting more data, and the test hangs forever.

Reading those replies from time to time allows the test to run on such
older hardware.

Ping oranagra, who introduced at least one of the bulk-write tests.
AFAIK there is no problem with changing the test this way, since the
slave buffer is going to be filled anyway. But better to be sure that
writing all that data without reading it back was not intentional for
some reason I can't see.

IMPORTANT NOTE: **I am NOT sure at all** that the TCP socket senses
congestion on one side and also stops the other side, but anyway this
fix works well and is likely a good idea in general. At the same time, I
doubt there is a pending bug in Redis that makes it hang if the output
buffer is too large, or if we flood the system with too many commands
without reading anything back. So the actual cause remains unclear. I
remember that Redis, when the output limit is reached, could kill the
client rather than lower the priority of command processing. Maybe
Oran knows more about this.

## LLM commit message.

The test "slave buffer are counted correctly" was hanging indefinitely
on slow machines. The test sends 1M pipelined commands without reading
responses, which triggers a TCP-level deadlock.

Root cause: When the test client sends commands without reading
responses:
1. Server processes commands and sends responses
2. Client's TCP receive buffer fills (client not reading)
3. Server's TCP send buffer fills
4. Packets get dropped due to buffer pressure
5. TCP congestion control interprets this as network congestion
6. cwnd (congestion window) drops to 1, RTO increases exponentially
7. After multiple backoffs, RTO reaches ~100 seconds
8. Connection becomes effectively frozen

This was confirmed by examining TCP socket state showing cwnd:1,
backoff:9, rto:102912ms, and rwnd_limited:100% on the client side.

The fix interleaves reads with writes by processing responses every
10,000 commands. This prevents TCP buffers from filling to the point
where congestion control triggers the pathological backoff behavior.

The test still validates the same functionality (slave buffer memory
accounting) since the measurement happens after all commands complete.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-07 14:26:22 +08:00
Moti Cohen
da4c5eec82
Replace fragile dict stored-key API with getKeyId callback (#14646)
This change simplifies the dictionary API for handling stored keys by
replacing the previous dict stored-key mechanism with a cleaner
`keyFromStoredKey` callback approach.
2026-01-06 18:57:28 +02:00
debing.sun
0cb1ee0dc1
New eviction policies - least recently modified (#14624)
### Summary

This PR introduces two new maxmemory eviction policies: `volatile-lrm`
and `allkeys-lrm`.
LRM (Least Recently Modified) is similar to LRU but only updates the
timestamp on write operations, not read operations. This makes it useful
for evicting keys that haven't been modified recently, regardless of how
frequently they are read.

### Core Implementation

The LRM implementation reuses the existing LRU infrastructure but with a
key difference in when timestamps are updated:

- **LRU**: Updates timestamp on both read and write operations
- **LRM**: Updates timestamp only on write operations via `updateLRM()`

### Key changes:
Extend `keyModified()` to accept an optional `robj *val` parameter and call
`updateLRM()` when a value is provided. Since `keyModified()` serves as
the unified entry point for all key modifications, placing the LRM
update here ensures timestamps are consistently updated across all write
operations.

---------

Co-authored-by: oranagra <oran@redislabs.com>
Co-authored-by: Yuan Wang <yuan.wang@redis.com>
2026-01-06 20:57:31 +08:00
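The difference between the two policies comes down to which accesses refresh the timestamp. A minimal sketch, assuming a single per-object clock field; the names `obj`, `on_read`, `on_write` and `pick_victim` are hypothetical, this is not Redis's `updateLRM()`/`keyModified()` code, and it uses an exact scan where Redis samples keys for eviction.

```c
#include <stddef.h>

typedef enum { POLICY_LRU, POLICY_LRM } policy;

typedef struct {
    unsigned long clock; /* last access (LRU) or last modification (LRM) */
} obj;

/* Reads refresh the timestamp only under LRU. */
static void on_read(obj *o, policy p, unsigned long now) {
    if (p == POLICY_LRU) o->clock = now;
}

/* Writes refresh the timestamp under both policies. */
static void on_write(obj *o, policy p, unsigned long now) {
    (void)p;
    o->clock = now;
}

/* Evict the object with the oldest timestamp (exact scan for clarity). */
static size_t pick_victim(const obj *objs, size_t n) {
    size_t victim = 0;
    for (size_t i = 1; i < n; i++)
        if (objs[i].clock < objs[victim].clock) victim = i;
    return victim;
}
```

A key written once and then read constantly is protected under LRU but becomes the first candidate under LRM, which is the behaviour `volatile-lrm`/`allkeys-lrm` are meant to provide.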
debing.sun
9ca860be9e
Fix XTRIM/XADD with approx not deletes entries for DELREF/ACKED strategies (#14623)
This bug was introduced by #14130 and found by guybe7 

When using XTRIM/XADD with approx mode (~) and DELREF/ACKED delete
strategies, if a node was eligible for removal but couldn't be removed
directly (because consumer group references need to be checked), the
code would incorrectly break out of the loop instead of continuing to
process entries within the node. This fix allows the per-entry deletion
logic to execute for eligible nodes when using non-KEEPREF strategies.
2026-01-05 21:17:36 +08:00
debing.sun
4eda670de9
Fix infinite loop during reverse iteration due to invalid numfields of corrupted stream (#14472)
Follow-up to https://github.com/redis/redis/pull/14423

In https://github.com/redis/redis/pull/14423, I thought the last lpNext
operation of the iterator occurred at the end of streamIteratorGetID().
However, I overlooked the fact that after calling
`streamIteratorGetID()`, we might still use `streamIteratorGetField()`
to continue moving within the current entry.
This means that during reverse iteration, the iterator could move back
to a previous entry position.

To fix this, this PR records the current position at the beginning of
streamIteratorGetID(). When we enter it again, we ensure that the entry
position has moved in the iteration direction: during forward iteration
it must be greater than the last entry position, and during reverse
iteration it must be smaller than the last entry position.

Note that the fix for https://github.com/redis/redis/pull/14423 has been
replaced by this fix.
2026-01-05 21:16:53 +08:00
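The guard described above can be sketched as follows: the iterator remembers the position of the entry it last returned and refuses an entry that does not strictly advance in the iteration direction. The `pos_guard` struct and `accept_entry_pos` are hypothetical simplifications (positions are modelled as byte offsets into the listpack), not the actual `streamIterator` fields.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool rev;        /* true for reverse iteration */
    bool has_last;   /* whether an entry has been returned yet */
    size_t last_pos; /* offset of the last entry returned */
} pos_guard;

/* Return true if 'pos' strictly progresses; false means a corrupted stream
 * (e.g. a bogus numfields) moved the cursor the wrong way, so stop. */
static bool accept_entry_pos(pos_guard *g, size_t pos) {
    if (g->has_last) {
        if (!g->rev && pos <= g->last_pos) return false;
        if (g->rev && pos >= g->last_pos) return false;
    }
    g->last_pos = pos;
    g->has_last = true;
    return true;
}
```

Because every accepted position is strictly monotonic within a finite listpack, the iteration is bounded even when field counts are corrupted.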
Andy Pan
7511a1919b
Sanitize TCP_KEEPINTVL and simplify TCP_KEEPALIVE_ABORT_THRESHOLD on Solaris (#13142)
This PR fixes a bug on Solaris where setsockopt() fails with EINVAL
when TCP keepalive parameters fall below the kernel's 10-second
minimum. When the user-configured interval is divided by 3 to
calculate TCP_KEEPINTVL, values below 30 seconds result in
intervals less than 10 seconds (e.g., interval=25 → intvl=8),
causing connection failures.

The fix adds `if (intvl < 10) intvl = 10;` to enforce the minimum and
simplifies the TCP_KEEPALIVE_ABORT_THRESHOLD calculation for older
Solaris versions. This behavior is documented in the Oracle
Solaris TCP manual.
2026-01-05 09:57:33 +02:00
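The arithmetic is small enough to check by hand. A minimal sketch of the clamp, assuming the configured interval is divided by 3 with integer division as described above; `keepalive_intvl` and `SOLARIS_MIN_KEEPALIVE_SECS` are hypothetical names, not the actual `anetKeepAlive()` code.

```c
/* Solaris rejects TCP keepalive timers below 10 seconds with EINVAL. */
#define SOLARIS_MIN_KEEPALIVE_SECS 10

/* Derive the probe interval from the user-configured keepalive interval. */
static int keepalive_intvl(int interval) {
    int intvl = interval / 3;
    if (intvl < SOLARIS_MIN_KEEPALIVE_SECS) intvl = SOLARIS_MIN_KEEPALIVE_SECS;
    return intvl;
}
```

With `interval = 25`, `25 / 3` is 8, which Solaris would reject; the clamp raises it to 10, while `interval = 60` still yields 20.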
Moti Cohen
16068d6b63
Fix: Use dictSetKeyAtLink in activeDefragHfieldDictCallback (#14654)
Problem:
The activeDefragHfieldDictCallback was wrongly using dictSetKey(), which
sets both key and value. However, hash field dictionaries use no_value=1
(since PR #14595), causing the assertion `assert(!d->type->no_value)` to
fail.

Solution:
* Replace `dictSetKey(d, (dictEntry *)de, newEntry)` with
`dictSetKeyAtLink(d, newEntry, &plink, 0)`, which properly handles both
regular dictEntry and the optimized `no_value=1` case where keys are
stored directly in the hash table. The callback already receives the
plink parameter pointing to the exact location that needs updating.
* Following PR #14595, the value can now optionally be embedded in
`entry`. As a result, `activeDefragEntry()` defragments an entry's
value only when `entryGetValuePtrRef(entry) != NULL`.
2026-01-04 14:38:08 +02:00
igalperelman
ea72406275
Updated readme: added a Redis Cloud paragraph (#14651)
Making Redis Cloud a little more visible in the readme file.
2026-01-04 09:09:14 +02:00
Andy Pan
eb2661a46d
Detect accept4() on specific versions of various platforms (#14558)
This PR has mainly done three things:
1. Enable `accept4()` on DragonFlyBSD 4.3+
2. Fix failures to detect the presence of `accept4()` caused by the
missing <sys/param.h> on two OSes: NetBSD and OpenBSD
3. Drop support for FreeBSD < 10.0 in `redis`: FreeBSD 10 is past EOL,
as are the two major versions following it, so `defined(__FreeBSD__)`
is sufficient.

- [param.h in
DragonFlyBSD](7485684fa5/sys/sys/param.h (L129-L257))
- [param.h in
FreeBSD](https://github.com/freebsd/freebsd-src/blob/main/sys/sys/param.h#L46-L76)
- [param.h in
NetBSD](b5f8d2f930/sys/sys/param.h (L53-L70))
- [param.h in
OpenBSD](d9c286e032/sys/sys/param.h (L40-L45))

---------

Signed-off-by: Andy Pan <i@andypan.me>
2026-01-04 15:05:07 +08:00
zzj
0ef4a4e7e3
Fix some comment spelling typos (#14648)
2026-01-04 10:38:11 +08:00
RoyBenMoshe
29346eb7dd
Hide PII from ACL log (#14645)
This PR continues the work from
[#13400](https://github.com/redis/redis/pull/13400), following the
discussion in
[#11747](https://github.com/redis/redis/pull/11747#discussion_r1094418111),
to further ensure sensitive user data is not exposed in logs when
hide_user_data_from_log is enabled.

- Introduce redactLogCstr() helper for safe, centralized log redaction.
- Update ACL and networking log messages to use redacted values where
appropriate.
- Prevent leaking raw query buffer contents.
2026-01-04 10:35:30 +08:00
Yueyang (Terry) Tao
174307530b
Expand hash dicts using original length when rdb loading (#14635)
During hash RDB loading (`rdbLoadObject`), the element count `len` is
consumed as entries are read.
In the listpack -> hashtable (HT) spillover path, we later used the
*remaining* `len` for `dictTryExpand`.
By that point `len` may no longer represent the original cardinality
(and can be 0), which can skip/undersize the pre-sizing and lead to
extra rehash/expansion work while loading large hashes.

The same issue existed in the hash-with-metadata (field expire) load
path.
2025-12-26 14:22:40 +08:00
Moti Cohen
e4b69f9a13
Remove dead code leftover (#14640)
Flags defined (mutually exclusive):
plainFlag = flags & RDB_LOAD_PLAIN
sdsFlag = flags & RDB_LOAD_SDS
robjFlag = !(plainFlag || sdsFlag)

If robjFlag is true, the function returns early. Otherwise we are in the
plain/sds path:
plainFlag → allocate with ztrymalloc_usable()
sdsFlag → allocate with sdstrynewlen()

Thus, in error handling only two cases exist:
plainFlag → zfree(buf)
else → sdsFlag → sdsfree(buf)

The hfldFlag branch assumed a third allocation path that no longer exists
after PR #14595, making entryFree(buf, NULL) unreachable.
2025-12-25 14:14:24 +02:00
Stav-Levi
860b8c772a
Add TLS certificate-based automatic client authentication (#14610)
This PR implements support for automatic client authentication based on
a field in the client's TLS certificate.
We adopt Valkey's PR: https://github.com/valkey-io/valkey/pull/1920

API Changes:

New configuration `tls-auth-clients-user`
- Allowed values: `off` (default), `CN`.
- `off` – disable TLS certificate–based auto-authentication.
- `CN` – derive the ACL username from the Common Name (CN) field of the
client certificate.

New INFO stat `acl_access_denied_tls_cert`
- Counts failed TLS certificate–based authentication attempts, i.e. TLS
connections where a client certificate was presented, a username was
derived from it, but no matching ACL user was found.

New ACL LOG reason `"tls-cert"`
- Emitted when a client certificate's Common Name fails to match any
existing ACL user.


Implementation Details:

- Added getCertFieldByName() utility to extract fields from peer
certificates.

- Added autoAuthenticateClientFromCert() to handle automatic login logic
post-handshake.

- Integrated automatic authentication into the TLSAccept function after
handshake completion.

- Updated test suite (tests/integration/tls.tcl) to validate the
feature.
2025-12-25 14:07:58 +02:00
itayTziv
877c09f662
incrRefCount off-by-one error (#14647)
The condition guarding `o->refcount++` in `incrRefCount` was
`if (o->refcount < OBJ_FIRST_SPECIAL_REFCOUNT)`, meaning refcount could
accidentally reach the first special refcount (currently
`OBJ_STATIC_REFCOUNT`).
Fixed the condition to be
`if (o->refcount < OBJ_FIRST_SPECIAL_REFCOUNT - 1)`.
2025-12-25 18:51:57 +08:00
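A minimal sketch of the boundary. It assumes the special refcounts sit at the top of the `int` range, with the shared refcount modelled as `INT_MAX` and the first special value being the static refcount at `INT_MAX - 1`; these values and the `incr_ref` helper are assumptions for illustration, not copied from object.c.

```c
#include <limits.h>
#include <stdbool.h>

#define SHARED_REFCOUNT        INT_MAX       /* assumed value */
#define STATIC_REFCOUNT        (INT_MAX - 1) /* assumed value */
#define FIRST_SPECIAL_REFCOUNT STATIC_REFCOUNT

/* Hypothetical helper: returns true if the count was incremented.
 * With the old condition (refcount < FIRST_SPECIAL_REFCOUNT), a count of
 * FIRST_SPECIAL_REFCOUNT - 1 could be bumped into the special range and be
 * mistaken for a stack-allocated (static) object. */
static bool incr_ref(int *refcount) {
    if (*refcount < FIRST_SPECIAL_REFCOUNT - 1) {
        (*refcount)++;
        return true;
    }
    return false; /* saturated or special: leave it unchanged */
}
```

The largest ordinary count is now `FIRST_SPECIAL_REFCOUNT - 1`, so an incremented count can never collide with a special marker.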
Moti Cohen
238a626859
Hash - Unify Field-Value into a single struct along with dict no_value=1 (#14595)
Unifies field–value pairs and optional expiration into a single allocation, 
removing the generic mstr abstraction (mstr was discarded because its overly 
generic design added complexity and runtime overhead without clear benefits 
for hash workloads). The new Entry layout supports embedded values (≤128B) 
and pointer-based values, with expiration metadata integrated for per-field hash 
TTLs. Updates hash dictionaries to no_value=1 and applies optimizations to
avoid regressions. This significantly reduces hash memory usage (~30–50%)
with minimal performance impact.
2025-12-23 12:19:00 +02:00
Ozan Tezcan
fde3576f88
Fix adjacent slot range behavior in ASM operations (#14637)
This PR contains a few changes for ASM:

**Bug fix:** 
- Fixes an issue in ASM when adjacent slot ranges are provided in
CLUSTER MIGRATION IMPORT command (e.g. 0-10 11-100). ASM task keeps the
original slot ranges as given, but later the source node reconstructs
the slot ranges from the config update as a single range (e.g. 0-100).
This causes asmLookupTaskBySlotRangeArray() to fail to match the task,
and the source node incorrectly marks the ASM task as failed. Although
the migration completes successfully, the source node performs a
blocking trim operation for these keys, assuming the slot ownership
changed outside of an ASM operation. With this PR, Redis merges adjacent
slot ranges in a slot range array to avoid this problem.

**Other improvements:**
- Logs the imported/migrated key count once an ASM operation has
completed.
- Uses an error return value instead of an assert in parseSlotRangesOrReply().
- Validates the slot range array given by the cluster implementation on
ASM_EVENT_IMPORT_START.

---------

Co-authored-by: Yuan Wang <yuan.wang@redis.com>
2025-12-23 11:54:12 +03:00
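Merging adjacent ranges can be sketched as a sort followed by one pass that joins any range starting at or before `prev.end + 1`. The `slot_range` struct and `merge_slot_ranges` are hypothetical, not the Redis ASM code; the sketch also merges overlapping ranges, which Redis may instead reject as invalid input.

```c
#include <stdlib.h>
#include <stddef.h>

typedef struct { int start, end; } slot_range; /* inclusive bounds */

static int cmp_range(const void *a, const void *b) {
    const slot_range *x = a, *y = b;
    return (x->start > y->start) - (x->start < y->start);
}

/* Sort ranges and merge those that touch (end + 1 == next start) or overlap.
 * Compacts the array in place and returns the new number of ranges. */
static size_t merge_slot_ranges(slot_range *r, size_t n) {
    if (n == 0) return 0;
    qsort(r, n, sizeof(*r), cmp_range);
    size_t out = 0;
    for (size_t i = 1; i < n; i++) {
        if (r[i].start <= r[out].end + 1) {
            if (r[i].end > r[out].end) r[out].end = r[i].end;
        } else {
            r[++out] = r[i];
        }
    }
    return out + 1;
}
```

With this normalization, `0-10 11-100` from CLUSTER MIGRATION IMPORT and the `0-100` rebuilt from the config update compare equal, so a task lookup by slot range can match.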
h.o.t. neglected
c5f3d3e11c
Fix use-after-free in hnsw_cursor_free (#14627)
Closes https://github.com/redis/redis/issues/14626.

Note that this method is not currently used anywhere.
2025-12-22 10:34:50 +08:00
John
0d5d75e04d
Fix incorrect comment about LRU clock resolution in initObjectLRUOrLFU (#14582)
The comment was incorrect: the LRU_CLOCK_RESOLUTION macro is 1000, and
its own comment describes it as the LRU clock resolution in ms.
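Since one LRU clock tick is 1000 ms, the clock advances once per second. The sketch below illustrates that arithmetic; the helper `lruClockFromMs()` and the 24-bit clock width are assumptions made for this example:

```c
#include <assert.h>

/* Illustrative constants: the 1000 ms resolution stated above and an
 * assumed 24-bit LRU clock field. */
#define LRU_BITS 24
#define LRU_CLOCK_MAX ((1u << LRU_BITS) - 1)
#define LRU_CLOCK_RESOLUTION 1000 /* milliseconds per LRU clock tick */

/* Hypothetical helper: convert a millisecond timestamp into an LRU clock
 * value. One tick is one second, and the value wraps at LRU_CLOCK_MAX. */
unsigned int lruClockFromMs(long long ms) {
    assert(ms >= 0);
    return (unsigned int)((ms / LRU_CLOCK_RESOLUTION) & LRU_CLOCK_MAX);
}
```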
2025-12-20 19:30:15 +08:00
debing.sun
1e974e6311
Fix kvstoreGetFirstNonEmptyDictIndex() and kvstoreIteratorReset() for empty kvstore (#14625)
These bugs were found by @rantidhar.

This PR fixes two related issues in kvstore iterator handling when
dealing with empty kvstores:

1. If the kvstore is empty, kvstoreGetFirstNonEmptyDictIndex() may
return 0 instead of -1. During defragmentation, for example, the invalid
slot may then only be detected later, when
kvstoreGetNextNonEmptyDictIndex() is called. This fix ensures that
kvstoreGetFirstNonEmptyDictIndex() returns -1 so that the
defragmentation process terminates. However, a kvstore is currently
always created with at least one dictionary array, so this is only a
defensive fix.

2. If a kvstoreIterator is initialized and then released without
kvstoreIteratorNextDict() ever being called, kvstoreIteratorReset()
uses didx -1 to index the dictionary array, which could cause an
out-of-bounds access. The current code never releases an iterator
without calling kvstoreIteratorNextDict(), so this is also only a
defensive fix.
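Both guards can be sketched against a simplified model. `toyKvstore`, `toyGetFirstNonEmptyDictIndex()` and `toyIteratorDidxIsValid()` are hypothetical names, and the real kvstore may not locate non-empty dicts with a linear scan like this one:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified kvstore: only per-dict key counts are modeled. */
typedef struct {
    int num_dicts;
    const unsigned long *dict_sizes; /* number of keys in each dict */
} toyKvstore;

/* Return the index of the first non-empty dict, or -1 if every dict is
 * empty, so a caller such as a defrag loop can stop immediately. */
int toyGetFirstNonEmptyDictIndex(const toyKvstore *kvs) {
    assert(kvs != NULL);
    for (int i = 0; i < kvs->num_dicts; i++) {
        if (kvs->dict_sizes[i] > 0) return i;
    }
    return -1;
}

/* Guard for a reset path: an iterator that never advanced still holds
 * didx == -1, which must not be used to index the dict array. */
int toyIteratorDidxIsValid(const toyKvstore *kvs, int didx) {
    assert(kvs != NULL);
    return didx >= 0 && didx < kvs->num_dicts;
}
```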

---------

Co-authored-by: rantidhar <ran.tidhar@redis.com>
2025-12-19 11:40:01 +08:00
Andy Pan
a9f0f07b7c
Merge kqueue events to reduce system calls (#14557)
`kqueue` can apply a batch of event changes in a single call:

> The kevent(), kevent64() and kevent_qos() system calls are used to
> register events with the queue, and return any pending events to the
> user. The changelist argument is a pointer to an array of kevent,
> kevent64_s or kevent_qos_s structures, as defined in <sys/event.h>. All
> changes contained in the changelist are applied before any pending
> events are read from the queue. The nchanges argument gives the size of
> changelist.

This PR uses this capability in the `kqueue` backend, which significantly
reduces the number of `kevent(2)` system calls.

## References

[FreeBSD - kqueue](https://man.freebsd.org/cgi/man.cgi?kqueue)
2025-12-18 19:51:02 +08:00
Yuan Wang
dd67275033
Unify slot migration logs across cluster implementations (#14628)
Using a different cluster implementation instead of the legacy one may
result in inconsistent slot migration logs, which can cause confusion.
Therefore, we should centralize these logs within the slot migration
process itself rather than relying on the specific cluster
implementation.
2025-12-18 18:22:25 +08:00
Moti Cohen
2e69130ea3
Improve dict pointer tagging doc (#14616)
Clarifies the pointer tagging scheme used in Redis dicts, particularly
for the no_value=1 optimization introduced in #11595.
2025-12-18 09:24:45 +02:00
John
081693f32e
Fix incorrect comment about STATS_METRIC_* Macro in server.h (#14620) 2025-12-18 14:43:05 +08:00
fanpei91
e6e0cf5764
Fix incorrect stream ID comparison in streamReplyWithRangeFromConsumerPEL() (#14619)
All commands that invoke streamReplyWithRange() with a group argument
pass end as NULL, so the incorrect stream ID comparison is never
triggered. In other words, even if this bug remained unfixed, no
incident would occur.
2025-12-16 17:00:22 +08:00
Yuan Wang
33391a7b61
Support delay trimming slots after finishing migrating slots (#14567)
This PR introduces a mechanism that allows a module to temporarily
disable trimming after an ASM migration operation so it can safely
finish ongoing asynchronous jobs that depend on keys in migrating (and
about to be trimmed) slots.

1. **ClusterDisableTrim/ClusterEnableTrim**
We introduce the `ClusterDisableTrim`/`ClusterEnableTrim` module APIs to
allow a module to disable/enable automatic slot trimming:
    ```
    /* Disable automatic slot trimming. */
    int RM_ClusterDisableTrim(RedisModuleCtx *ctx);

    /* Enable automatic slot trimming. */
    int RM_ClusterEnableTrim(RedisModuleCtx *ctx);
    ```

**Please note**: Redis will not start any subsequent import or migrate
ASM operations while slot trimming is disabled, so modules must
re-enable trimming immediately after completing their pending work.

The only valid and meaningful time for a module to disable trimming
appears to be after the MIGRATE_COMPLETED event.

2. **REDISMODULE_OPEN_KEY_ACCESS_TRIMMED**
Added REDISMODULE_OPEN_KEY_ACCESS_TRIMMED to RM_OpenKey() so that modules
can access keys in unowned slots while trimming is paused.

In addition, a key that belongs to a trim job is no longer deleted when
it is accessed: `expireIfNeeded` returns `KEY_VALID` if
`EXPIRE_ALLOW_ACCESS_TRIMMED` is set; otherwise, it returns `KEY_TRIMMED`
without deleting the key.

3. **REDISMODULE_CTX_FLAGS_TRIM_IN_PROGRESS**
We also extend RM_GetContextFlags() to include a flag
REDISMODULE_CTX_FLAGS_TRIM_IN_PROGRESS indicating whether a trimming job
is pending (due to trim pause) or in progress. Modules could
periodically poll this flag to synchronize their internal state, e.g.,
if a trim job was delayed or if the module incorrectly assumed trimming
was still active.
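The key-access rule from item 2 above can be sketched as a small decision function. The enum values, the flag and `toyCheckTrimmedKey()` are hypothetical stand-ins for `KEY_VALID`, `KEY_TRIMMED` and `EXPIRE_ALLOW_ACCESS_TRIMMED`; regular TTL expiry is deliberately left out:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for KEY_VALID / KEY_TRIMMED and
 * EXPIRE_ALLOW_ACCESS_TRIMMED; values are illustrative. */
typedef enum { TOY_KEY_VALID, TOY_KEY_TRIMMED } toyKeyStatus;
#define TOY_ALLOW_ACCESS_TRIMMED (1 << 0)

/* Decide what an access sees for a key that may belong to a pending trim
 * job. The key is never deleted here: it is either visible (the caller
 * opted in) or reported as trimmed. Regular TTL expiry is not modeled. */
toyKeyStatus toyCheckTrimmedKey(bool in_trim_job, int flags) {
    assert(flags >= 0);
    if (!in_trim_job) return TOY_KEY_VALID;
    return (flags & TOY_ALLOW_ACCESS_TRIMMED) ? TOY_KEY_VALID : TOY_KEY_TRIMMED;
}
```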

Bugfix: RM_SetClusterFlags could not clear a flag after enabling it first.

---------

Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com>
2025-12-16 16:30:56 +08:00
Rushabh Mehta
ddbd96d8ae
Add --name flag to redis-cli for setting client name (#14588)
This PR introduces a new flag `--name <client-name>` to `redis-cli`.
This allows users to specify a persistent client name that remains
associated with the connection.

Implementation Details:
- Configuration: Added `client_name` field to the global config struct.
- Argument Parsing: Updated `parseOptions` to handle the `--name` flag.
- Unified Logic (`cliSetName`): Introduced a helper function cliSetName
that sends `CLIENT SETNAME <name>` immediately after the connection is
established. This ensures the name is set consistently for both RESP2
and RESP3 modes.
- Documentation: Updated `redis-cli --help` output to include the new
flag.
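A trimmed-down sketch of the `--name` handling: scan the arguments for the flag, then build the `CLIENT SETNAME` command to send after connecting. Both helpers are hypothetical; a real client would send the name as a separate protocol argument rather than as one formatted string:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, minimal option parsing: only --name is handled.
 * Returns 0 on success, -1 if --name is missing its argument. */
int toyParseName(int argc, char **argv, const char **client_name) {
    assert(client_name != NULL);
    *client_name = NULL;
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "--name") == 0) {
            if (i + 1 >= argc) return -1; /* flag given without a value */
            *client_name = argv[++i];
        }
    }
    return 0;
}

/* Format the command that would be sent right after connecting.
 * Returns the snprintf() result. */
int toyFormatSetName(char *buf, size_t len, const char *name) {
    return snprintf(buf, len, "CLIENT SETNAME %s", name);
}
```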

This PR closes #14585.
2025-12-15 21:43:51 +08:00
Yuan Wang
f3316c3a1a
Introduce flushdb option for repl-diskless-load (#14596)
The `repl-diskless-load` feature can significantly reduce full
synchronization time, but it may not be widely used: the `swapdb`
option needs double `maxmemory`, and `on-empty-db` only works on the
first full sync (when the replica has no data).

This PR introduces a new option, `flushdb`: always flush the entire
dataset before a diskless load. If the diskless load fails, the replica
loses all existing data.

This brings a risk of data loss, but it offers a choice for users who
want to reduce full sync time and can accept that risk.
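The trade-offs between the `repl-diskless-load` modes above can be summarized as a decision table. This is a hypothetical sketch derived only from the description in this PR; the enum and struct names are invented, and the disk-based path is not modeled:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative enum of the repl-diskless-load choices discussed above. */
typedef enum {
    LOAD_DISABLED,
    LOAD_ON_EMPTY_DB,
    LOAD_SWAPDB,
    LOAD_FLUSHDB
} toyDisklessLoad;

typedef struct {
    bool diskless;    /* load straight from the socket? */
    bool flush_first; /* diskless only: drop existing data before loading */
    bool keep_backup; /* diskless only: keep old data until the load ends */
} toyLoadPlan;

/* Hypothetical decision table for one full sync. */
toyLoadPlan toyPlanFullSync(toyDisklessLoad mode, bool replica_has_data) {
    toyLoadPlan p = {false, false, false};
    switch (mode) {
    case LOAD_DISABLED:
        break;                          /* disk-based load; not modeled */
    case LOAD_ON_EMPTY_DB:
        p.diskless = !replica_has_data; /* only when there is nothing to lose */
        break;
    case LOAD_SWAPDB:
        p.diskless = true;
        p.keep_backup = true;           /* costs roughly double memory */
        break;
    case LOAD_FLUSHDB:
        p.diskless = true;
        p.flush_first = true;           /* data is lost if the load fails */
        break;
    }
    return p;
}
```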
2025-12-15 11:25:53 +08:00
Stav-Levi
23aca15c8c
Fix the flexibility of argument positions in the Redis API's (#14416)
This PR implements flexible keyword-based argument parsing for all 12
hash field expiration commands, allowing users to specify arguments in
any logical order rather than being constrained by rigid positional
requirements.
This enhancement follows Redis's modern design of keyword-based flexible
argument ordering and significantly improves user experience.

Commands with flexible parsing:
HEXPIRE, HPEXPIRE, HEXPIREAT, HPEXPIREAT, HGETEX, HSETEX

Some examples; within each group, all forms are equivalent and valid:

HEXPIRE:
- `HEXPIRE key EX 60 NX FIELDS 2 f1 f2`
- `HEXPIRE key NX EX 60 FIELDS 2 f1 f2`
- `HEXPIRE key FIELDS 2 f1 f2 EX 60 NX`
- `HEXPIRE key FIELDS 2 f1 f2 NX EX 60`
- `HEXPIRE key NX FIELDS 2 f1 f2 EX 60`

HGETEX:
- `HGETEX key EX 60 FIELDS 2 f1 f2`
- `HGETEX key FIELDS 2 f1 f2 EX 60`

HSETEX:
- `HSETEX key FNX EX 60 FIELDS 2 f1 v1 f2 v2`
- `HSETEX key EX 60 FNX FIELDS 2 f1 v1 f2 v2`
- `HSETEX key FIELDS 2 f1 v1 f2 v2 FNX EX 60`
- `HSETEX key FIELDS 2 f1 v1 f2 v2 EX 60 FNX`
- `HSETEX key FNX FIELDS 2 f1 v1 f2 v2 EX 60`
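Keyword-based parsing can walk the arguments once and dispatch on each keyword, rejecting duplicates, so argument order no longer matters. A minimal sketch handling only `EX`, `NX` and `FIELDS`; the struct and function names are hypothetical and this is not the parser from the PR:

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical parse result for the keyword arguments shown above. */
typedef struct {
    long long ex;    /* -1 if EX was not given */
    int nx;          /* 1 if NX was given */
    long numfields;  /* 0 if FIELDS was not given */
    int first_field; /* index of the first field name in argv */
} toyHashExpireArgs;

/* Case-insensitive token comparison. */
static int tokEq(const char *a, const char *b) {
    while (*a && *b) {
        if (toupper((unsigned char)*a) != toupper((unsigned char)*b)) return 0;
        a++; b++;
    }
    return *a == *b;
}

/* Parse argv[start..argc-1] as keywords in any order; each keyword may
 * appear at most once. Returns 0 on success, -1 on a syntax error. */
int toyParseFlexibleArgs(int argc, char **argv, int start, toyHashExpireArgs *out) {
    assert(out != NULL);
    out->ex = -1; out->nx = 0; out->numfields = 0; out->first_field = -1;
    for (int i = start; i < argc; i++) {
        char *end;
        if (tokEq(argv[i], "EX")) {
            if (out->ex != -1 || i + 1 >= argc) return -1;
            out->ex = strtoll(argv[++i], &end, 10);
            if (*end != '\0' || out->ex <= 0) return -1;
        } else if (tokEq(argv[i], "NX")) {
            if (out->nx) return -1;
            out->nx = 1;
        } else if (tokEq(argv[i], "FIELDS")) {
            if (out->numfields || i + 1 >= argc) return -1;
            long n = strtol(argv[++i], &end, 10);
            if (*end != '\0' || n <= 0 || i + n >= argc) return -1;
            out->numfields = n;
            out->first_field = i + 1;
            i += (int)n; /* skip over the field names */
        } else {
            return -1;   /* unknown keyword */
        }
    }
    return out->numfields ? 0 : -1; /* FIELDS is mandatory */
}
```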
2025-12-14 09:35:12 +02:00
Lior Kogan
9b7254c810
Clarify that BUILD_WITH_MODULES=yes is not supported on 32 bit systems. (#14606)
Following #14618, this PR updates the README file.
2025-12-11 11:07:47 +02:00
YaacovHazan
ec84bd6143
Prevent building with modules on 32-bit systems (#14618)
Redis modules do not support 32-bit architectures. The build now fails
early when modules are enabled on such systems.
2025-12-11 11:04:30 +02:00
debing.sun
679e009b73
Add daily CI for vectorset (#14302)
2025-12-10 08:52:43 +08:00
Vitah Lin
4499d68748
Cleanup redundant declaration of getSlotOrReply() (#14576)
2025-12-09 17:58:19 +08:00
Vitah Lin
3bcacd8a21
Upgrade GitHub Actions macOS runner (#14613)
1. GitHub has deprecated older macOS runners, and macos-13 is no longer
supported. Updating to macos-26 ensures that CI workflows continue to
run without interruption.
2. Previously, cross-platform-actions/action@v0.22.0 used runs-on:
macos-13. I checked the latest version of cross-platform-actions, and
the official examples now use runs-on: ubuntu. I think we can switch
from macOS to Ubuntu.

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2025-12-09 15:01:58 +08:00
Slavomir Kaslev
5299ccf2a9
Add kvstore type and decouple kvstore from its metadata (#14543)
Decouple kvstore from its metadata by introducing a `kvstoreType` structure
of callbacks. This resolves the abstraction-layer violation of kvstore
including `server.h` directly.

Move cluster slot statistics (again) to the per-slot dicts' metadata. The
`canFreeDict` callback prevents empty per-slot dicts from being freed, so
their per-slot statistics are not lost.
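The decoupling can be sketched as a callback table supplied by the owner: the kvstore asks `canFreeDict` before freeing an empty dict and never inspects the statistics itself. The member name `canFreeDict` comes from the text above; its signature, the other types and the policy function are assumptions for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-dict metadata holding a toy slot statistic. */
typedef struct {
    unsigned long keys;
    unsigned long long reads; /* can be non-zero even when keys == 0 */
} toyDictMeta;

/* Callback table in the spirit of kvstoreType; signature is assumed. */
typedef struct {
    bool (*canFreeDict)(const toyDictMeta *meta);
} toyKvstoreType;

typedef struct {
    const toyKvstoreType *type; /* NULL means "no callbacks" */
    toyDictMeta *dicts[4];
} toyKvstore;

/* Drop an empty dict only if the owner's callback allows it. A real
 * implementation would free memory; this toy just clears the pointer. */
void toyMaybeFreeDict(toyKvstore *kvs, int didx) {
    assert(didx >= 0 && didx < 4);
    toyDictMeta *m = kvs->dicts[didx];
    if (m == NULL || m->keys != 0) return;
    if (kvs->type && kvs->type->canFreeDict && !kvs->type->canFreeDict(m))
        return;
    kvs->dicts[didx] = NULL;
}

/* Example owner policy: an empty dict may be freed only if it carries no
 * statistics worth keeping. */
bool canFreeIfNoStats(const toyDictMeta *meta) {
    return meta->reads == 0;
}
```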

Co-authored-by: Ran Tidhar <ran.tidhar@redis.com>
2025-12-08 21:12:33 +02:00
debing.sun
dd57b141b9
Clean up lookahead-related code (#14562)
## Summary

Clean up lookahead-related code (https://github.com/redis/redis/issues/14440)
by consolidating slot extraction logic.

## Changes

* Replace `GETSLOT_NOKEYS` with `INVALID_CLUSTER_SLOT`
* Refactor `getSlotFromCommand()` to reuse `extractSlotFromKeysResult()`
* Make `extractSlotFromKeysResult()` behavior more unified and readable
* Fix comment alignment
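Using a single sentinel for "no keys" lets slot extraction follow one code path. A hypothetical sketch; `TOY_INVALID_CLUSTER_SLOT`, `TOY_CROSS_SLOT`, their values and `toySlotFromKeySlots()` are illustrative and not the real Redis definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative sentinels; the real values in Redis may differ. */
#define TOY_INVALID_CLUSTER_SLOT (-1)
#define TOY_CROSS_SLOT (-2)

/* Given the hash slots of a command's keys, return the common slot,
 * TOY_INVALID_CLUSTER_SLOT when the command has no keys, or
 * TOY_CROSS_SLOT when keys map to different slots. */
int toySlotFromKeySlots(const int *slots, int numkeys) {
    assert(numkeys >= 0);
    if (numkeys == 0) return TOY_INVALID_CLUSTER_SLOT;
    for (int i = 1; i < numkeys; i++) {
        if (slots[i] != slots[0]) return TOY_CROSS_SLOT;
    }
    return slots[0];
}
```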

---------

Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com>
2025-12-08 14:47:39 +08:00
Yuan Wang
cb71dec0c3
Disable RDB compression when diskless replication is used (#14575)
Fixes #14538

If the master uses diskless synchronization and the replica uses
diskless load, we can disable RDB compression to reduce full sync time.
I tested on AWS and found we could reduce time by 20-40%.

In terms of implementation, when the replica can use diskless load, it
sends `replconf rdb-no-compress 1` to the master, asking it to deliver an
uncompressed RDB.

If your network is slow, please disable `repl-diskless-load`, and
possibly even `repl-diskless-sync`.

---------

Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com>
2025-12-04 09:24:23 +08:00
Ozan Tezcan
08b63b6ceb
Fix flaky ASM tests (#14604)
1. Fix "Simple slot migration with write load" by adding an artificial
delay to the traffic generator to slow it down for TSan builds. Failed test:
https://github.com/redis/redis/actions/runs/19720942981/job/56503213650

2. Fix "Test RM_ClusterCanAccessKeysInSlot returns false for unowned
slots" by waiting for config propagation before checking it on a replica.
Failed test:
https://github.com/redis/redis/actions/runs/19841852142/job/56851802772
2025-12-03 12:12:48 +03:00
Ozan Tezcan
3c57a8fc92
Retry an ASM import step when the source node is temporarily not ready (#14599)
The cluster implementation may be temporarily unavailable and return an
error to the `ASM_EVENT_MIGRATE_PREP` event to prevent starting a new
migration. Although this is most likely a transient condition, the
destination node has no way to distinguish it from a real error, so it must
fail the import attempt and start a new one.

In Redis, failing an attempt is cheap, but in other cluster
implementations it may require cleaning up resources and can cause
unnecessary disruption.

This PR introduces a new `-NOTREADY` error reply for the `CLUSTER
SYNCSLOTS SYNC` command. When the source replies with `-NOTREADY`, the
destination can recognize the condition as transient and periodically
retry the `CLUSTER SYNCSLOTS SYNC` step instead of failing the
attempt.
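On the destination side, the behavior described above amounts to classifying the source's reply. A hypothetical sketch; the enum and function are illustrative, and treating any non-error reply as success is an assumption:

```c
#include <assert.h>
#include <string.h>

typedef enum { SYNC_OK, SYNC_RETRY_LATER, SYNC_FAIL_ATTEMPT } toySyncAction;

/* Classify the source's reply to CLUSTER SYNCSLOTS SYNC. A -NOTREADY
 * error is treated as transient and retried later; any other error
 * fails the import attempt. */
toySyncAction toyClassifySyncReply(const char *reply) {
    assert(reply != NULL);
    if (reply[0] != '-') return SYNC_OK; /* assumption: non-error == success */
    /* Match the error code exactly: "-NOTREADY" followed by end or space. */
    size_t n = strlen("-NOTREADY");
    if (strncmp(reply, "-NOTREADY", n) == 0 &&
        (reply[n] == '\0' || reply[n] == ' '))
        return SYNC_RETRY_LATER;
    return SYNC_FAIL_ATTEMPT;
}
```

Matching the whole error code, rather than only a prefix, keeps an unrelated error whose name merely starts with `NOTREADY` from being retried forever.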
2025-12-02 13:38:22 +03:00
Ozan Tezcan
86c63588b0
Refactor some of ASM and slot-stats functions (#14587)
This PR does not introduce any behavioral changes.

- Refactored and moved verifyClusterConfigWithData() into cluster.c.
- Refactored and centralized ASM and slot-stats initialization
functions.

These changes place shared logic in a common location so it can be
reused by different cluster implementations.
2025-11-29 22:41:58 +03:00