Commit graph

12622 commits

Yuan Wang
70a079db5e
Improve multithreaded performance with memory prefetching (#14017)
This PR is based on: https://github.com/valkey-io/valkey/pull/861

> ### Memory Access Amortization
> (Designed and implemented by [dan
touitou](https://github.com/touitou-dan))
> 
> Memory Access Amortization (MAA) is a technique designed to optimize
the performance of dynamic data structures by reducing the impact of
memory access latency. It is applicable when multiple operations need to
be executed concurrently. The principle behind it is that for certain
dynamic data structures, executing operations in a batch is more
efficient than executing each one separately.
> 
> Rather than executing operations sequentially, this approach
interleaves the execution of all operations. This is done in such a way
that whenever a memory access is required during an operation, the
program prefetches the necessary memory and transitions to another
operation. This ensures that when one operation is blocked awaiting
memory access, other memory accesses are executed in parallel, thereby
reducing the average access latency.
> 
> We applied this method in the development of dictPrefetch, which takes
as parameters a vector of keys and dictionaries. It ensures that all
memory addresses required to execute dictionary operations for these
keys are loaded into the L1-L3 caches when executing commands.
Essentially, dictPrefetch is an interleaved execution of dictFind for
all the keys.

### Implementation in Redis
When the main thread processes clients with ready-to-execute commands
(i.e., clients for which the IO thread has parsed the commands), a batch
of up to 16 commands is created. Initially, the commands' argv arrays,
which were allocated by the IO threads, are prefetched into the main
thread's L1 cache. Subsequently, all the dict entries and values
required for the commands are prefetched from the dictionary before
command execution.

#### Memory prefetching for main hash table
As shown in the picture, after https://github.com/redis/redis/pull/13806
, we unified key and value and the dict now uses the no_value
optimization, so memory prefetching has 4 steps:

1. prefetch the bucket of the hash table
2. prefetch the entry associated with the given key's hash
3. prefetch the kv object of the entry
4. prefetch the value data of the kv object

We also need to handle the case where the dict entry is itself a pointer
to the kv object; in that case, step 3 is simply skipped.

MAA improves single-threaded memory access efficiency by interleaving
the execution of multiple independent operations, allowing memory-level
parallelism and better CPU utilization. Its key idea is batch-wise
interleaved execution: split a batch of independent operations (such as
multiple key lookups) into multiple state machines, and interleave their
progress within a single thread to hide the memory access latency of
individual requests.

The difference between serial execution and interleaved execution:
**naive serial execution**
```
key1: step1 → wait → step2 → wait → done
key2: step1 → wait → step2 → wait → done
```
**interleaved execution**
```
key1: step1   → step2   → done
key2:   step1 → step2   → done
key3:     step1 → step2 → done
         ↑ While waiting for key1’s memory, progress key2/key3
```
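Batch-wise interleaving of this kind can be sketched in C with `__builtin_prefetch` (a simplified illustration under assumed structures — `lookup_state` and the step names are hypothetical, not the actual Redis code):

```c
#include <stddef.h>

/* A toy per-key lookup state machine. The real code walks
 * bucket -> entry -> kv object -> value; here just two steps. */
enum lookup_step { STEP_BUCKET, STEP_ENTRY, STEP_DONE };

struct lookup_state {
    enum lookup_step step;
    const void *next_addr; /* memory the next step will touch */
};

/* Advance every in-flight lookup one step per round: issue a prefetch
 * for the memory the lookup needs next, then move on to the other
 * lookups. While one key's cache miss is in flight, the others make
 * progress, hiding the average access latency. */
static void interleave_batch(struct lookup_state *batch, size_t n) {
    size_t in_flight = n;
    while (in_flight > 0) {
        in_flight = 0;
        for (size_t i = 0; i < n; i++) {
            if (batch[i].step == STEP_DONE) continue;
            __builtin_prefetch(batch[i].next_addr);
            batch[i].step++; /* next round runs the prefetched step */
            if (batch[i].step != STEP_DONE) in_flight++;
        }
    }
}
```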

#### New configuration
This PR introduces a new configuration, `prefetch-batch-max-size`, but
we consider it a low-level optimization, so the config is hidden:
When multiple commands are parsed by the I/O threads and ready for
execution, we take advantage of knowing the next set of commands and
prefetch their required dictionary entries in a batch. This reduces
memory access costs. The optimal batch size depends on the specific
workflow of the user. The default batch size is 16, which can be
modified using the 'prefetch-batch-max-size' config.
When the config is set to 0, prefetching is disabled.

---------

Co-authored-by: Uri Yagelnik <uriy@amazon.com>
Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com>
2025-06-05 08:57:43 +08:00
Slavomir Kaslev
b7c6755b1b
Add thread sanitizer run to daily CI (#13964)
Add thread sanitizer run to daily CI.

A few tests are skipped in tsan runs for two reasons:
* Stack trace producing tests (oom, `unit/moduleapi/crash`, etc) are
tagged `tsan:skip` because redis calls `backtrace()` in signal handler
which turns out to be signal-unsafe since it might allocate memory (e.g.
glibc 2.39 does it through a call to `_dl_map_object_deps()`).
* A few tests become flaky with thread sanitizer builds and don't finish
within the expected deadlines because of the additional tsan overhead.
Instead of skipping those tests, this can be improved in the future by
allowing more iterations when waiting for tsan builds.

Deadlock detection is disabled for now because of tsan limitation where
max 64 locks can be taken at once.

There is one outstanding (false-positive?) race in jemalloc which is
suppressed in `tsan.sup`.

Fixed a few races the thread sanitizer reported having to do with writes
from signal handlers. Since in a multi-threaded setting signal handlers
might be called on any thread (modulo pthread_sigmask) while the main
thread is running, the `volatile sig_atomic_t` type is not sufficient,
and atomics are used instead.
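The pattern behind the fix can be sketched as follows (a minimal illustration of replacing `volatile sig_atomic_t` with a C11 atomic; the flag name is hypothetical, not the actual Redis variable):

```c
#include <signal.h>
#include <stdatomic.h>

/* `volatile sig_atomic_t` is only guaranteed safe between a thread and
 * a signal handler running on that same thread. With IO threads, the
 * handler may run on a different thread than the reader, so a C11
 * atomic (lock-free for int on mainstream platforms) is needed. */
static atomic_int shutdown_requested;

static void term_handler(int sig) {
    (void)sig;
    atomic_store(&shutdown_requested, 1); /* thread- and signal-safe */
}

static void install_term_handler(void) {
    signal(SIGTERM, term_handler);
}
```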
2025-06-02 10:13:23 +03:00
Ozan Tezcan
7f60945bc6
Fix short read issue that causes exit() on replica (#14085)
When `repl-diskless-load` is enabled on a replica, and it is in the
process of loading an RDB file, a broken connection detected by the main
channel may trigger a call to rioAbort(). This sets a flag to cause the
rdb channel to fail on the next rioRead() call, allowing it to perform
necessary cleanup.

However, there are specific scenarios where the error is checked using
rioGetReadError(), which does not account for the RIO_ABORT flag (see
[source](79b37ff535/src/rdb.c (L3098))).
As a result, the error goes undetected. The code then proceeds to
validate a module type, fails to find a match, and calls
rdbReportCorruptRDB() which logs the following error and exits the
process:

```
The RDB file contains module data I can't load: no matching module type '_________'
```

To fix this issue, the RIO_ABORT flag has been removed. Now, rioAbort()
sets both read and write error flags, so that subsequent operations and
error checks properly detect the failure.
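The shape of the fix can be modeled with a simplified rio-like struct (a hedged sketch; the real flags and functions live in rio.c/rio.h and carry more state):

```c
/* Before the fix, rioAbort() set a separate abort flag that
 * rioGetReadError()-style checks did not consult. After the fix it
 * sets both error flags, so every error-check path sees the failure. */
#define RIO_FLAG_READ_ERROR  (1 << 0)
#define RIO_FLAG_WRITE_ERROR (1 << 1)

typedef struct { int flags; } rio;

static void rioAbort(rio *r) {
    r->flags |= RIO_FLAG_READ_ERROR | RIO_FLAG_WRITE_ERROR;
}

static int rioGetReadError(rio *r) {
    return (r->flags & RIO_FLAG_READ_ERROR) != 0;
}

static int rioGetWriteError(rio *r) {
    return (r->flags & RIO_FLAG_WRITE_ERROR) != 0;
}
```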

Additional keys were added to the short read test; with this change it
reproduces the issue. We hit that problematic line once per key. My
guess is that with many smaller keys, the likelihood of the connection
being killed at just the right moment increases.
2025-05-28 12:43:59 +03:00
kei-nan
161326d332
Avoid performing IO on coverage when child exits due to signal handler (#14072)
When Redis is compiled with COVERAGE_TEST, using the fork API can run
into the following issue:
- Forked process calls `RedisModule_ExitFromChild` - child process
starts to report its COW while performing IO operations
- Parent process terminates child process with
`RedisModule_KillForkChild`
- Child process signal handler gets called while an IO operation is in
progress
- exit() is called because COVERAGE_TEST was on during compilation.
- exit() tries to perform more IO operations in its exit handlers.
- process gets deadlocked

Backtrace snippet:
```
#0  futex_wait (private=0, expected=2, futex_word=0x7e1220000c50) at ../sysdeps/nptl/futex-internal.h:146
#1  __GI___lll_lock_wait_private (futex=0x7e1220000c50) at ./nptl/lowlevellock.c:34
#2  0x00007e1234696429 in __GI__IO_flush_all () at ./libio/genops.c:698
#3  0x00007e123469680d in _IO_cleanup () at ./libio/genops.c:843
#4  0x00007e1234647b74 in __run_exit_handlers (status=status@entry=255, listp=<optimized out>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at ./stdlib/exit.c:129
#5  0x00007e1234647bbe in __GI_exit (status=status@entry=255) at ./stdlib/exit.c:138
#6  0x00005ef753264e13 in exitFromChild (retcode=255) at /home/jonathan/CLionProjects/redis/src/server.c:263
#7  sigKillChildHandler (sig=<optimized out>) at /home/jonathan/CLionProjects/redis/src/server.c:6794
#8  <signal handler called>
#9  0x00007e1234685b94 in _IO_fgets (buf=buf@entry=0x7e122dafdd90 "KSM:", ' ' <repeats 19 times>, "0 kB\n", n=n@entry=1024, fp=fp@entry=0x7e1220000b70) at ./libio/iofgets.c:47
#10 0x00005ef75326c5e0 in fgets (__stream=<optimized out>, __n=<optimized out>, __s=<optimized out>, __s=<optimized out>, __n=<optimized out>, __stream=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/stdio2.h:200
#11 zmalloc_get_smap_bytes_by_field (field=0x5ef7534c42fd "Private_Dirty:", pid=<optimized out>) at /home/jonathan/CLionProjects/redis/src/zmalloc.c:928
#12 0x00005ef75338ab1f in zmalloc_get_private_dirty (pid=-1) at /home/jonathan/CLionProjects/redis/src/zmalloc.c:978
#13 sendChildInfoGeneric (info_type=CHILD_INFO_TYPE_MODULE_COW_SIZE, keys=0, progress=-1, pname=0x5ef7534c95b2 "Module fork") at /home/jonathan/CLionProjects/redis/src/childinfo.c:71
#14 0x00005ef75337962c in sendChildCowInfo (pname=0x5ef7534c95b2 "Module fork", info_type=CHILD_INFO_TYPE_MODULE_COW_SIZE) at /home/jonathan/CLionProjects/redis/src/server.c:6895
#15 RM_ExitFromChild (retcode=0) at /home/jonathan/CLionProjects/redis/src/module.c:11468
```

The change makes the exit()/_exit() calls conditional, based on a
parameter to the exitFromChild() function.
The signal handler should exit without IO operations, since it doesn't
know its history (we may have been in the middle of an IO operation
before it was called).

---------

Co-authored-by: Yuan Wang <wangyuancode@163.com>
2025-05-28 16:27:52 +08:00
Moti Cohen
79b37ff535
Fix RESTORE with TTL (#14071)
restoreCommand() creates a key-value object (kv) with a TTL in two steps.
During the second step, setExpire() may reallocate the kv object. To ensure
correct behavior, kv must be updated after this call, as it might be used later
in the function.
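The bug pattern is classic realloc-style pointer invalidation; a hedged toy model (all names here are hypothetical — the real setExpire() operates on the unified kvobj and has a different signature):

```c
#include <stdlib.h>
#include <string.h>

/* Toy object: attaching a TTL grows (and thus may move) the object,
 * exactly like realloc(). */
typedef struct { size_t len; int has_ttl; char payload[]; } kv_t;

/* May reallocate; the caller MUST adopt the returned pointer, since
 * the old one may have been freed. Continuing to use the stale `kv`
 * afterwards is the bug this commit fixes in restoreCommand(). */
static kv_t *kvAttachExpire(kv_t *kv, long long when) {
    kv_t *nkv = realloc(kv, sizeof(kv_t) + kv->len + sizeof(long long));
    memcpy(nkv->payload + nkv->len, &when, sizeof(when));
    nkv->has_ttl = 1;
    return nkv;
}
```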
2025-05-28 08:02:10 +03:00
Salvatore Sanfilippo
0ac822e154
Implement WITHATTRIBS for VSIM. (#14065)
Hi, as described, this implements WITHATTRIBS, a feature requested by a
few users, and indeed needed.
This was first requested by @rowantrollope, but I was not sure how to
make it work with RESP2 and RESP3 in a clean way; hopefully this is it.

The patch includes tests and documentation updates.
2025-05-27 22:12:48 +08:00
debing.sun
bb23eb0b01
Fix incorrect server.cronloops update in defragWhileBlocked() causing timer to run twice as fast (#14081)
This bug was introduced in
[#13814](https://github.com/redis/redis/issues/13814), and was found by
@guybe7.
It incorrectly moved the update of `server.cronloops` from
`whileBlockedCron()` to `activeDefragTimeProc()`,
causing the cron-based timers to effectively run twice as fast when
active defrag is enabled.
As a result, memory statistics are not updated during blocked
operations.
The fix is extracted from https://github.com/redis/redis/pull/13995;
because it needs to be backported, a separate PR is used for the repair.
2025-05-27 17:14:06 +08:00
guybe7
6349a7c4f9
Add GETRANGE tests with negative indices (#13950)
Inspired by https://github.com/redis/redis/pull/12272
2025-05-27 09:41:28 +08:00
debing.sun
e93b44560c
Resolve bounds checks on cluster_legacy.c (#13970)
Based on https://github.com/valkey-io/valkey/pull/1463 and
https://github.com/valkey-io/valkey/pull/1481

In the full CI
failure (https://github.com/redis/redis/actions/runs/14595343452/job/40979173087?pr=13965)
on version 7.0 we are getting a number of errors like:
```
array subscript ‘clusterMsg[0]’ is partly outside array bounds of ‘unsigned char[2272]’
```

Which is basically GCC telling us that we have an object which is longer
than the underlying storage of the allocation. We actually do this a
lot, but GCC is generally not aware of how big the underlying allocation
is, so it doesn't throw this error. We are specifically getting this
error because the msgBlock can be of variable length depending on the
type of message, but GCC assumes it's the longest one possible. The
solution I went with here was make the message type optional, so that it
wasn't included in the size. I think this also makes some sense, since
it's really just a helper for us to easily cast the object around.

This compilation warning only occurs in version 7.2, because in [this
PR](https://github.com/redis/redis/pull/13073) we started passing
`-flto` to `CFLAGS` by default, and it seems that with `-flto` GCC is
unable to detect such warnings. That change, however, is not present in
version 7.2.
So, to reproduce this compilation warning in versions after 7.2, we can
pass `OPTIMIZATION=-O2` manually.

---------

Co-authored-by: madolson <34459052+madolson@users.noreply.github.com>
2025-05-26 11:52:06 +03:00
Salvatore Sanfilippo
22ebb06eb3
LOLWUT for Redis 8. (#14048)
# Add LOLWUT 8: TAPE MARK I - Computer Poetry Generation

This PR introduces LOLWUT 8, implementing Nanni Balestrini's
groundbreaking TAPE MARK I algorithm from 1962 - one of the first
experiments in computer-generated poetry.

## Background

TAPE MARK I, created by Italian poet Nanni Balestrini and published in
Almanacco Letterario Bompiani (1962), represents a [pioneering moment in
computational creativity](https://en.wikipedia.org/wiki/Digital_poetry).
Using an IBM 7090 mainframe, Balestrini developed an algorithm that
combines verses from three different literary sources:

1. **Diary of Hiroshima** by Michihito Hachiya
2. **The Mystery of the Elevator** by Paul Goldwin  
3. **Tao Te Ching** by Lao Tse

The algorithm selects and arranges verses based on metrical
compatibility rules and ensures alternation between different literary
sources, creating unique poetic combinations with each execution.

## Implementation

This LOLWUT command faithfully reproduces Balestrini's original
algorithm.
The main difference is that the default output is in English, and not in
Italian. However it should be noted that Balestrini used three poems
that were not in Italian anyway, so the translation process was already
part of it. In the English versions, sometimes I operated minimal
changes in order to preserve either the metric, or to make sure that the
sentence stands on its own (like adding "it" before expands rapidly).

## Cultural Significance

TAPE MARK I predates most computational art experiments and demonstrates
the early intersection of literature, technology, and algorithmic
creativity. This implementation honors that pioneering work while making
it accessible to a modern audience through Redis's LOLWUT tradition.

Each execution generates a unique poem, just as Balestrini intended.

Trivia: the original code, running on an IBM 7090, used six minutes to
generate each verse :D

**IMPORTANT** This commit should be back-ported to Redis 8.
2025-05-26 09:27:45 +03:00
Vitah Lin
35e15962b5
Add length check before content comparison in equalStringObjects (#14062)
### Issue 

Previously, when only string equality needed to be determined, the
comparison logic still performed unnecessary `memcmp()` calls to check
string ordering, even when the lengths were not equal.

### Change
This PR adds a length check before content comparison in the

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2025-05-24 11:47:13 +08:00
Yuan Wang
7998a2a05f
To reduce memory churn during client list transfer (#14068)
When we transfer clients between the IO threads and the main thread, the
creation and destruction of list nodes also consumes some CPU. In this
commit, we reuse list nodes to avoid this overhead.
2025-05-23 19:21:29 +08:00
Hüseyin Açacak
645858d518
Add size_t cast for RM_call() in module tests (#14061)
This PR addresses a potential misalignment issue when using `va_args`.
Without this fix,
[argument](9a9aa921bc/src/module.c (L6249-L6264))
values may occasionally become incorrect due to stack alignment
inconsistencies.
2025-05-23 10:10:11 +08:00
Yuan Wang
99d30654e8
Let IO threads free argv and rewrite objects (#13968)
Some objects are allocated in the IO threads; we should let the IO
threads free them, so we can avoid memory arena contention and also
reduce the load on the main thread.

These objects include:
- client argv objects
- the rewrite objects that are `OBJ_ENCODING_RAW` encoded strings only,
since only this type of object is usually allocated by IO threads.

For the implementation, if the client is assigned to IO threads, we
create a `deferred_objects` array of size 32. We put objects into
`deferred_objects` when the main thread wants to free the above objects,
and they are finally freed by the IO threads.
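A hedged sketch of the idea (the name and size come from the PR text, but the struct layout and helper are assumptions, not the actual Redis code):

```c
#include <stddef.h>

#define DEFERRED_OBJECTS_SIZE 32

/* The main thread parks IO-thread-allocated objects here instead of
 * freeing them itself; the owning IO thread frees them later, avoiding
 * allocator arena contention between threads. */
typedef struct {
    void *objs[DEFERRED_OBJECTS_SIZE];
    size_t count;
} deferredObjects;

/* Called on the main thread. Returns 1 if the free was deferred, or
 * 0 if the batch is full and the caller must free the object itself. */
static int deferObjectFree(deferredObjects *d, void *obj) {
    if (d->count == DEFERRED_OBJECTS_SIZE) return 0;
    d->objs[d->count++] = obj;
    return 1;
}
```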
2025-05-23 09:09:58 +08:00
Vitah Lin
d592cb7409
Cleanup redundant declaration of kvstoreDictFetchValue() (#14066) 2025-05-22 20:54:37 +08:00
debing.sun
ba88a7fbb6
Fix crash when freeing newly created node when nodeIp2String fail (#14055)
This PR is a reference from
https://github.com/valkey-io/valkey/pull/1535

In the process of handling the ARM64-with-TLS CI
failure (https://github.com/redis/redis/pull/14024), we found that in a
slow environment the nodes might frequently disconnect and trigger this
assertion.

---------

Co-authored-by: Binbin <binloveplay1314@qq.com>
2025-05-21 10:04:40 +08:00
Moti Cohen
8bd50a3b35
Refine and optimize dbSetValue() (#14040)
Optimize and simplify dbSetValue() by consolidating its code into one place for the case where keepTTL is not required.
2025-05-20 13:23:54 +03:00
Salvatore Sanfilippo
871d4c4004
Test: check always for memory leaks on MacOS. (#14060)
When running the Redis test suite on MacOS, the test detects that the
operating system is able to use "leaks" to check for memory leaks, and
executes this check after every spawned server is terminated.

While we have the ability to run the tests in environments able to
detect memory issues, the fact that it is possible to check for leaks at
every run, basically for free, is very valuable, and allows fixing leaks
immediately on your laptop before submitting a PR.

However, the feature avoided running leaks when no test was run: this
check was added in the early stages of Redis, when all the tests were
like:

server {
   test { ... }
}

So the check counts the number of tests run, and if no test is
executed, no leaks detection is performed. However now we have certain
tests that are in the form:

test {
    server { ... }
}

For instance, just loading a corrupted RDB or the like. In this case, the
leaks test is not executed. This commit removes the check so that the
leaks test is always executed.
2025-05-20 17:46:56 +08:00
Mincho Paskalev
8dfb823c51
Implement DIFF, DIFF1, ANDOR and ONE for BITOP (#13898)
This PR adds 4 new operators to the `BITOP` command - `DIFF`, `DIFF1`,
`ANDOR` and `ONE`. They enable redis clients to atomically do
non-trivial logical operations that are useful for checking membership
of a bitmap against a group of bitmaps.
 
* **DIFF**
    `BITOP DIFF dest srckey1 srckey2 [key...]`

    **Description**
DIFF(*X*, *A1*, *A2*, *...*, *AN*) = *X* ∧ ¬(*A1* ∨ *A2* ∨ *...* ∨
*AN*), i.e. the bits set in *X* that are not set in any of *A1*, *A2*,
*…*, *AN*

    **NOTE**
    Command expects at least 2 source keys.

* **DIFF1**
    `BITOP DIFF1 dest srckey1 srckey2 [key...]`

    **Description**
DIFF1(*X*, *A1*, *A2*, *...*, *AN*) = ¬*X* ∧ (*A1* ∨ *A2* ∨ *...* ∨
*AN*), i.e. the bits set in one or more of *A1*, *A2*, *…*, *AN* that
are not set in *X*

    **NOTE**
    Command expects at least 2 source keys.

* **ANDOR**
    `BITOP ANDOR dest srckey1 srckey2 [key...]`

    **Description**
ANDOR(*X*, *A1*, *A2*, *...*, *AN*) = *X* ∧ (*A1* ∨ *A2* ∨ *...* ∨
*AN*), i.e. the bits set in *X* that are also set in at least one of
*A1*, *A2*, *…*, *AN*

    **NOTE**
    Command expects at least 2 source keys.

* **ONE**
    `BITOP ONE dest key [key...]`

    **Description**
    ONE(*A1*, *A2*, *...*, *AN*) = *X*, where
if *X[i]* is the *i*-th bit of *X*, then *X[i]* = 1 if and only if there
is an *m* such that *A_m[i]* = 1 and *A_n[i]* = 0 for all *n != m*, i.e.
bit *X[i]* is set only if it is set in exactly one of *A1*, *A2*, *...*,
*AN*

**Return value**
As with all other `BITOP` operators, the return value for all the new
ones is the length in bytes of the longest input key.
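The four operators reduce to simple bitwise formulas; a per-byte sketch (a hypothetical helper, not the Redis implementation, which also has an AVX2 fast path):

```c
#include <stdint.h>
#include <stddef.h>

/* Combine the sources into an OR-of-all mask and a "seen in 2+ sources"
 * mask, then derive all four operators for one byte position. */
static void bitopNewOps(uint8_t x, const uint8_t *srcs, size_t n,
                        uint8_t *diff, uint8_t *diff1,
                        uint8_t *andor, uint8_t *one) {
    uint8_t or_all = 0, seen_twice = 0;
    for (size_t i = 0; i < n; i++) {
        seen_twice |= or_all & srcs[i]; /* bits set in 2+ sources */
        or_all |= srcs[i];              /* bits set in any source */
    }
    *diff  = x & (uint8_t)~or_all;          /* in X, in no source      */
    *diff1 = (uint8_t)~x & or_all;          /* in some source, not X   */
    *andor = x & or_all;                    /* in X and in some source */
    *one   = or_all & (uint8_t)~seen_twice; /* in exactly one source   */
}
```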

EDIT:
Besides adding the new commands, a couple more changes were made:
- Added an AVX2 path for more optimized computation of the BITOP
operations (including the new ones)
- Removed the hard limit of max 16 source keys for the fast path to be
used; now, no matter the number of keys, we can enter the fast path
given the keys are long enough.

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2025-05-20 10:45:50 +03:00
Mincho Paskalev
391e3452ca
Optimize hash and zset with lpBatchAppend (#13981)
# Description

Use lpBatchAppend/Insert instead of multiple calls to lpAppend/Insert in
hash and zset

HSET Improvement 2.2%
ZADD Improvement 3.3%
2025-05-20 10:44:46 +03:00
Hyeon Sung
9a9aa921bc
extract duplicated AOF list formatting logic into helper function (#14012)
Separated the repeated logic for iterating and formatting AOF info from
both the history and incremental AOF lists into a new helper function
named appendAofInfoFromList. This improves code readability, reduces
duplication, and makes the getAofManifestAsString function cleaner and
easier to maintain.

No changes in behavior were introduced.
2025-05-20 11:08:03 +08:00
Moti Cohen
51ad2f8d00
Fix keysizes - SPOP with count (case 3) and SETRANGE (#14028)
This commit addresses issues with the keysizes histogram tracking in two
Redis commands:

**SPOP with count (case 3)**
In the spopWithCountCommand function, when handling case 3 (where the
number of elements to return is very large, approaching the size of the
set itself), the keysizes histogram was not being properly updated. This
PR adds the necessary call to updateKeysizesHist() to ensure the
histogram accurately reflects the changes in set size after the
operation.

**SETRANGE command**
Fixed an issue in the setrangeCommand function where the keysizes
histogram wasn't being properly updated when modifying strings. The PR
ensures that the histogram correctly tracks the old and new lengths of
the string after a SETRANGE operation.

Added tests accordingly.
2025-05-19 16:59:21 +03:00
debing.sun
5d0d64b062
Add support to defrag ebuckets incrementally (#13842)
In PR #13229, we introduced the ebucket for HFE.
Before this PR, when updating eitems stored in ebuckets, the lack of
incremental defragmentation support for non-kvstore data structures
(until PR #13814) meant that we had to reverse-lookup the position of
the eitem in the ebucket and then perform the update.
This approach was inefficient, as it often required frequent traversals
of the segment list to locate and update the item.

To address this issue, this PR implements incremental defragmentation
for hash dict ebuckets and server.hexpires.
By incrementally defragging the ebuckets, we also perform
defragmentation for the associated items, eliminating the need for
frequent traversals of the segment list to defrag each eitem.

---------

Co-authored-by: Moti Cohen <moticless@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-05-18 12:38:53 +08:00
chx9
090f6252c4
Fix typo in object.c (#13589) 2025-05-17 12:33:37 +08:00
Moti Cohen
3b19c7919b
Fix test INFO overhead for 32bit architecture (#14035)
This PR fixes the `tests/unit/info.tcl` test to properly handle 32-bit 
architectures by dynamically determining the pointer size based on the 
architecture instead of hardcoding it to 8 bytes.
2025-05-15 12:35:36 +03:00
debing.sun
ae0bb6e82a
Fix internal-secret test flakiness under slow environment (#14024)
In the original test, we started a cluster with 20 instances (10 masters
+ 10 replicas), which led to frequent disconnections of instances in a
slow environment, resulting in an inability to achieve consistency.

This PR reduces the number of instances from 20 to 6.
2025-05-14 16:31:41 +08:00
Vitah Lin
232f2fb077
Include missing getchannels.tcl in moduleapi tests and fix incorrect assertions (#14037) 2025-05-14 08:57:01 +08:00
Mincho Paskalev
6995d8ac17
Fix build flags for dependencies (#14038)
PR https://github.com/redis/redis/pull/13916 introduced a regression:
by overriding the `CFLAGS` and `LDFLAGS` variables for all of the
dependencies, hiredis and fast_float lost some of their compiler/linker
flags.

This PR makes it so we can pass additional CFLAGS/LDFLAGS to hiredis
without overriding them, since it has a somewhat more complex Makefile.
As for fast_float, passing CFLAGS/LDFLAGS from outside no longer breaks
the expected behavior.

The build step in the CI was changed so that MacOS is now built with
TLS, to catch such errors in the future.
2025-05-13 16:56:22 +03:00
Ozan Tezcan
a0b22576b8
Fix flaky replication test (#14034)
- Fix flaky replication test which checks memory usage on master
- Fix comments in another replication test
2025-05-13 13:29:27 +03:00
Yuan Wang
d5f7672b77
Make IO thread and main thread process in parallel and reduce notifications (#13969)
In pipeline mode, especially with TLS, two IO threads may perform worse
than a single thread. One reason is that the IO threads and the main
thread cannot process in parallel. Now, the IO threads will deliver
clients once the pending client list exceeds 16 entries, instead of
finishing processing all clients first; this approach lets the IO
threads and the main thread process in parallel as much as possible.

IO threads may also do some unnecessary notification of the main thread.
The notification is based on eventfd, and read(2) and write(2) on an
eventfd are costly system calls. While the threads are running, they can
check the pending client list to process in `beforeSleep`, so in this
commit, if both the main thread and the IO thread are running, they can
pass the client without notification, and these transferred clients will
be processed in `beforeSleep`.
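The two decisions can be sketched as predicates (the threshold comes from the PR text; the helper names and exact conditions are assumptions, not the actual event-loop code):

```c
#include <stdbool.h>
#include <stddef.h>

#define DELIVER_BATCH 16

/* Deliver clients early instead of waiting until all clients are
 * processed, so the peer thread can work in parallel. */
static bool shouldDeliverNow(size_t pending_clients) {
    return pending_clients > DELIVER_BATCH;
}

/* Skip the costly eventfd write(2) when the peer thread is running,
 * since it will poll the pending client list in beforeSleep() anyway. */
static bool shouldNotifyPeer(bool peer_running) {
    return !peer_running;
}
```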
2025-05-13 15:13:48 +08:00
Salvatore Sanfilippo
65e164caff
[Vector sets] More rdb loading fixes (#14032)
Hi all, this PR fixes two things:

1. An assertion that prevented the RDB loading from recovering if there
was a quantization type mismatch (with regression test).
2. Two code paths that just returned NULL without proper cleanup during
RDB loading.
2025-05-12 21:57:38 +03:00
Moti Cohen
e1789e4368
keyspace - Unify key and value & use dict no_value=1 (#13806)
The idea of packing the key (`sds`), value (`robj`) and optionally TTL
into a single struct in memory was mentioned a few times in the past by
the community in various flavors. This approach improves memory
efficiency, reduces pointer dereferences for faster lookups, and
simplifies expiration management by keeping all relevant data in one
place. This change goes along with setting the keyspace's dict to
no_value=1, saving a considerable amount of memory.

Two more motivations that well aligned with this unification are:

- Prepare the groundwork for replacing EXPIRE scan based implementation
and evaluate instead new `ebuckets` data structure that was introduced
as part of [Hash Field Expiration
feature](https://redis.io/blog/hash-field-expiration-architecture-and-benchmarks/).
Using this data structure requires embedding the ExpireMeta structure
within each object.
- Consider replacing dict with a more space efficient open addressing
approach hash table that might rely on keeping a single pointer to
object.

Before this PR, I POC'ed on a variant of open addressing hash-table and
was surprised to find that dict with no_value actually could provide a
good balance between performance, memory efficiency, and simplicity.
This realization prompted the separation of the unification step from
the evaluation of a new hash table to avoid introducing too many changes
at once and to evaluate its impact independently before considering
replacement of existing hash-table. On an earlier
[commit](https://github.com/redis/redis/pull/13683) I extended dict
no_value optimization (which saves keeping dictEntry where possible) to
be relevant also for objects with even addresses in memory. Combining it
with this unification saves a considerable amount of memory for
keyspace.

# kvobj
This PR adopts Valkey’s
[packing](3eb8314be6)
layout and logic for key, value, and TTL. However, unlike Valkey
implementation, which retained a common `robj` throughout the project,
this PR distinguishes between the general-purpose, overused `robj`, and
the new `kvobj`, which embeds both the key and value and is used by the
keyspace. Conceptually, `robj` serves as a base class, while `kvobj`
acts as a derived class.

Two new flags are introduced into the redis object, `iskvobj` and `expirable`:
```
struct redisObject {
    unsigned type:4;
    unsigned encoding:4;
    unsigned lru:LRU_BITS;
    unsigned iskvobj : 1;             /* new flag */
    unsigned expirable : 1;           /* new flag */
    unsigned refcount : 30;           /* modified: 32bits->30bits */
    void *ptr;
};

typedef struct redisObject robj;
typedef struct redisObject kvobj;
```
When the `iskvobj` flag is set, the object also includes the key, which
is appended to the end of the object. If the `expirable` flag is set, an
additional 8 bytes are added to the object. If the object is of type
string, and the string is rather short, then it will be embedded as
well.

As a result, all keys in the keyspace are promoted to be of type
`kvobj`. This term attempts to align with the existing Redis object,
robj, and the kvstore data structure.

# EXPIRE Implementation
As `kvobj` embeds the expiration time as well, looking up expiration
times is now an O(1) operation. The EXPIRE hash table is now also set to
`no_value` mode, directly referencing `kvobj` entries, which in turn
saves memory.

Next, I plan to evaluate replacing the EXPIRE implementation with the
[ebuckets](https://github.com/redis/redis/blob/unstable/src/ebuckets.h)
data structure, which would eliminate keyspace scans for expired keys.
This requires embedding `ExpireMeta` within each `kvobj` of each key
with expiration. In such implementation, the `expirable` flag will be
shifted to indicate whether `ExpireMeta` is attached.


# Implementation notes

## Manipulating keyspace (find, modify, insert)
Initially, unifying the key and value into a single object and storing
it in a dict with the `no_value` optimization seemed like a quick win.
However, it (quickly) became clear that this change required deeper
modifications to how keys are manipulated. The challenge was handling
cases where the dictEntry is opted out due to the no_value optimization.
In such cases, many of the APIs that return the dictEntry from a lookup
become insufficient, as the result might just be the key itself. To
address this issue, a new-old approach is used: return a "link" to the
looked-up key's `dictEntry` instead of the `dictEntry` itself. The term
`link` was already somewhat available in the dict API, and is well
aligned with the new dictEntLink declaration:
```
typedef dictEntry **dictEntLink;
```
This PR introduces two new function APIs to dict to leverage returned
link from the search:
```
dictEntLink dictFindLink(dict *d, const void *key, dictEntLink *bucket);
void dictSetKeyAtLink(dict *d, void *key, dictEntLink *link, int newItem);
```
After calling `link = dictFindLink(...)`, any necessary updates must be
performed immediately after by calling `dictSetKeyAtLink()` without any
intervening operations on the given dict. Otherwise, the `dictEntLink`
may become invalid. Example:
```
/* replace existing key */
link = dictFindLink(d, key, &bucket);
// ... Do something, but don't modify the dict ...
// assert(link != NULL);
dictSetKeyAtLink(d, kv, &link, 0);

/* Add new value (If there is no space for the new key, the dict will be
   expanded and the bucket will be looked up again.) */
link = dictFindLink(d, key, &bucket);
// ... Do something, but don't modify the dict ...
// assert(link == NULL);
dictSetKeyAtLink(d, kv, &bucket, 1);
```
## dict.h 
- The dict API had become cluttered with many unused functions; I have
removed these from dict.h.
- Additionally, APIs specifically related to hash maps (no_value=0),
primarily those handling key-value access, have been gathered and
isolated.
- Entirely removed internal functions ending with “*ByHash()” that were
originally added for optimization and are no longer required.
- A few other legacy dict functions were adapted at the API level to
work with the term dictEntLink as well.
- Simplified and generalized an optimization related to comparing the
lengths of string keys.

## Hash Field Expiration
Until now, each hash object with expiration on its fields had to
maintain a reference to its own key name, so that if a field was
actively expired, the key name could be resolved for the expiration
notification. This is no longer necessary.

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2025-05-12 10:15:17 +03:00
Mincho Paskalev
4d9b4d6e51
Input output traffic stats and command process count for each client. (#13944) 2025-05-09 16:55:47 +03:00
Mincho Paskalev
fdbf88032c
Add MSan and integrate it with CI (#13916)
## Description
Memory Sanitizer (MSan) detects use of uninitialized memory. While
Address Sanitizer catches a wide range of memory-safety issues, it does
not specifically detect uninitialized-memory usage, so Memory Sanitizer
complements it. This PR adds an MSan run to the daily build, with the
possibility of incorporating it into the ci.yml workflow in the future
if needed.

The changes in the source files fix false-positive issues and should not
have any runtime implications.

Note: Valgrind performs checks similar to both ASan and MSan, but the
sanitizers run significantly faster.

## Limitations
- Memory sanitizer is only supported by Clang.
- MSAN documentation states that all dependencies, including the
standard library, must be compiled with MSAN. However, it also mentions
there are interceptors for common libc functions, so compiling the
standard library with the MSAN flag is not strictly necessary.
Therefore, we are not compiling libc with MSAN.

---------

Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com>
2025-05-09 11:44:54 +03:00
Vitah Lin
8148e4116e
Remove redundant HASH_SET_COPY macro and hfieldGetExpireTime declaration (#13991) 2025-05-09 15:22:28 +08:00
George Padron
538713e622
Fix minor grammatical error (#14022) 2025-05-09 15:20:56 +08:00
Yuan Wang
191afb8903
Reclaim page cache memory used by the AOF file after loading (#13811)
We can reclaim the page cache memory used by the AOF file after loading,
since we don't read the AOF again; this corresponds to
https://github.com/redis/redis/pull/11248

In a test after loading a 9.5GB AOF, this PR uses much less
`buff/cache` than unstable.

**Unstable**
```
$ free -m
               total        used        free      shared  buff/cache   available
Mem:           31293       16181        4562          13       10958       15111
Swap:              0           0           0
```
**This PR**
```
$ free -m
               total        used        free      shared  buff/cache   available
Mem:           31293       15391       15854          13         439       15902
Swap:              0           0           0
```

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2025-05-09 10:05:37 +08:00
Yuan Wang
714ea20fec
Delete the tmp file that was accidentally submitted (#14027)
#13809 accidentally submitted a tmp rdb file, so delete it
2025-05-09 09:48:44 +08:00
Will Johnston
98335e1237
doc: adding TOC to readme, improving usability (#14021)
The PR aims to improve the README usability for new users as well as
developers looking to go in depth.

Key improvements include:

- **Structure & Navigation:**
  - Introduces a detailed Table of Contents for easier navigation.
  - Improved overall organization of sections.
- **Content:**
  - Expanded "What is Redis?" with a section for "Key use cases"
  - Expanded the "Why choose Redis?" section
  - New "Getting started" section, including Redis starter projects and ordering of sections based on desired use for new users
  - Changes to the "Redis data types, processing engines, and capabilities" section for better readability and consistency
  - Formatting markdown blocks to specify language
2025-05-08 22:05:21 +03:00
Moti Cohen
30d5f05637
Fix various KEYSIZES enumeration issues (#13923)
There are several issues with maintaining histogram counters.

Ideally, the hooks would be placed in the low-level datatype
implementations. However, this logic is triggered in various contexts
and doesn’t always map directly to a stored DB key. As a result, the
hooks sit closer to the high-level commands layer. It’s a bit messy, but
the right way to ensure histogram counters behave correctly is through
broad test coverage.

* Fix inaccuracies around deletion scenarios.
* Fix inaccuracies around modules calls. Added corresponding tests.
* The info-keysizes.tcl test has been extended to operate on meaningful
datasets
* Validate histogram correctness in edge cases involving collection
deletions.
* Add new macro debugServerAssert(). Effective only if compiled with
DEBUG_ASSERTIONS.

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2025-05-08 10:59:12 +03:00
Yuan Wang
6a436b6f72
Redis-cli gets RDB by RDB channel (#13809)
Now that we have the RDB channel from
https://github.com/redis/redis/pull/13732, the child process can
transfer the RDB in the background instead of it being handled by the
main thread. So when redis-cli gets an RDB from the server, we can adopt
this approach to reduce the main thread's load.

---------

Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com>
2025-05-08 08:47:29 +08:00
Salvatore Sanfilippo
a46624e10e
[Vector sets] RDB IO errors handling (#13978)
This PR adds support for REDISMODULE_OPTIONS_HANDLE_IO_ERRORS,
and tests for short reads and corrupted RESTORE payloads.

Please note that I also removed the comment about async-loading support,
since we should already be covered: there is no manipulation of global
data structures in Vector Sets, except for the unique ID used to create
new vector sets with different IDs.
2025-05-07 21:49:00 +03:00
debing.sun
ac0bef15b5
Correctly update kvstore overhead after emptying or releasing dict (#13984)
Close #13973

This PR fixes two bugs.
1)  `overhead_hashtable_lut` isn't updated correctly
    This bug was introduced by https://github.com/redis/redis/pull/12913
We only updated `overhead_hashtable_lut` at the beginning and end of
rehashing, but forgot to update it when a dict is emptied or
released.

This PR introduces a new `bucketChanged` callback to track changes in
the bucket size.
The `rehashingStarted` and `rehashingCompleted` callbacks are no longer
responsible for bucket changes; these are handled entirely by
`bucketChanged`. This also avoids having to register three callbacks to
track bucket-size changes; now only one is needed.

In most cases it is triggered together with `rehashingStarted` or
`rehashingCompleted`, except when a dict is being emptied or released;
in those cases, even if the dict is not rehashing, we still need to
subtract its current size.

On the other hand, `overhead_hashtable_lut` duplicated `bucket_count`,
so we removed `overhead_hashtable_lut` and use `bucket_count` instead.

Note that this bug only happens with cluster mode, because we don't use
KVSTORE_FREE_EMPTY_DICTS without cluster.

2) The size of `dict_size_index` was double-counted in memory usage.
`dict_size_index` is created at startup, so its memory usage is already
counted in `used_memory_startup`.
However, when counting the overhead we repeated the calculation, which
could make the overhead exceed the total memory usage.

---------

Co-authored-by: Yuan Wang <yuan.wang@redis.com>
2025-05-07 16:45:23 +08:00
Alexander Gorbulya
97d7d2f865
Fix typo in replication state log message (#13805)
The log message incorrectly referred to the expected state as
`RECEIVE_PSYNC`,
while it should be `RECEIVE_PSYNC_REPLY`. This aligns the log with the
actual state check.
2025-05-07 15:28:45 +08:00
Yuan Wang
57a5f51f26
Reduce the call of ERR_clear_error (#13903)
Flame graphs show that `ERR_clear_error` consumes significant CPU in TLS
mode, and some calls to it are duplicates: in `tlsHandleEvent` we call
`ERR_clear_error`, but we also call it when reading and writing, so the
extra call is unnecessary.

Benchmarks show this commit brings a 2-3% performance improvement.
2025-05-07 15:24:08 +08:00
Eran Hadad
a3f1d09a7d
Update TS, JSON and Bloom to 8.0.1 (#14013) 2025-05-06 21:20:29 +03:00
alonre24
14578b3b8b
RQE - bump version to 8.0.1 (#14011) 2025-05-06 21:19:43 +03:00
chx9
11954d925e
Fix sds leak in slaveTryPartialResynchronization (#13996)
1. Fix an sds leak in `slaveTryPartialResynchronization`
2. Delete incorrect comments
2025-05-06 21:53:52 +08:00
Lior Kogan
2668356595
LICENSE.txt wrongly included the text of GPLv3 instead of AGPLv3 (#14010) 2025-05-06 14:45:36 +03:00
Vitah Lin
47505c3533
Fix 'Client output buffer hard limit is enforced' test causing infinite loop (#13934)
This PR fixes an issue in the CI test for client-output-buffer-limit,
which was causing an infinite loop when running on macOS 15.4.

### Problem

This test starts two clients, R and R1:
```
R1 subscribe foo
R publish foo bar
```

When R executes `PUBLISH foo bar`, the server first stores the message
`bar` in R1's buf. Only when the space in buf is insufficient does it
call `_addReplyProtoToList`.
Inside this function, `closeClientOnOutputBufferLimitReached` is invoked
to check whether client R1's output buffer has reached its configured
limit.
On macOS 15.4, because the server writes to the client at high speed,
R1's buf never gets full. As a result,
`closeClientOnOutputBufferLimitReached` is never triggered,
causing the test to never exit and fall into an infinite loop.
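
For context, the limit the test exercises is the pubsub class of
`client-output-buffer-limit`; the defaults shipped in redis.conf look
like this (for pubsub clients, a 32mb hard limit and an 8mb soft limit
sustained for 60 seconds):
```
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit replica 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
```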

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2025-05-06 10:44:16 +08:00