This PR is based on: https://github.com/valkey-io/valkey/pull/861
> ### Memory Access Amortization
> (Designed and implemented by [dan
touitou](https://github.com/touitou-dan))
>
> Memory Access Amortization (MAA) is a technique designed to optimize
the performance of dynamic data structures by reducing the impact of
memory access latency. It is applicable when multiple operations need to
be executed concurrently. The principle behind it is that for certain
dynamic data structures, executing operations in a batch is more
efficient than executing each one separately.
>
> Rather than executing operations sequentially, this approach
interleaves the execution of all operations. This is done in such a way
that whenever a memory access is required during an operation, the
program prefetches the necessary memory and transitions to another
operation. This ensures that when one operation is blocked awaiting
memory access, other memory accesses are executed in parallel, thereby
reducing the average access latency.
>
> We applied this method in the development of dictPrefetch, which takes
as parameters a vector of keys and dictionaries. It ensures that all
memory addresses required to execute dictionary operations for these
keys are loaded into the L1-L3 caches when executing commands.
Essentially, dictPrefetch is an interleaved execution of dictFind for
all the keys.
### Implementation in Redis
When the main thread processes clients with ready-to-execute commands
(i.e., clients for which the IO thread has parsed the commands), a batch
of up to 16 commands is created. Initially, each command's argv, which
was allocated by the IO thread, is prefetched into the main thread's L1
cache. Subsequently, all the dict entries and values required by the
commands are prefetched from the dictionary before command execution.
#### Memory prefetching for main hash table
As shown in the picture, after https://github.com/redis/redis/pull/13806
unified the key and value and switched the dict to the no_value
optimization, memory prefetching has 4 steps:
1. prefetch the bucket of the hash table
2. prefetch the entry associated with the given key's hash
3. prefetch the kv object of the entry
4. prefetch the value data of the kv object
We also need to handle the case where the dict entry is itself a pointer
to the kv object; in that case, step 3 is simply skipped.
MAA can improve single-threaded memory access efficiency by
interleaving the execution of multiple independent operations, enabling
memory-level parallelism and better CPU utilization. Its key idea is
batch-wise interleaved execution: split a batch of independent
operations (such as multiple key lookups) into multiple state machines,
and interleave their progress within a single thread to hide the memory
access latency of individual requests.
The difference between serial execution and interleaved execution:
**naive serial execution**
```
key1: step1 → wait → step2 → wait → done
key2: step1 → wait → step2 → wait → done
```
**interleaved execution**
```
key1: step1 → step2 → done
key2: step1 → step2 → done
key3: step1 → step2 → done
↑ While waiting for key1’s memory, progress key2/key3
```
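The interleaved execution above can be sketched in C over a toy chained hash table (illustrative types only, not the actual Redis dict): each pass issues a prefetch for every lookup's next node and then advances all lookups one step, so the cache misses of the whole batch overlap.

```c
#include <stddef.h>

/* Toy structures for illustration; not the real Redis dict. */
typedef struct node {
    unsigned long key;
    int value;
    struct node *next;
} node;

/* Interleaved batch lookup: a node is prefetched in one pass and only
 * dereferenced in the next, so waiting for one key's memory overlaps
 * with progressing the other keys. Assumes n <= 16 (the batch size). */
void batch_find(node **buckets, size_t mask, const unsigned long *keys,
                node **out, size_t n) {
    node *cur[16];
    if (n > 16) n = 16;
    for (size_t i = 0; i < n; i++) {
        cur[i] = buckets[keys[i] & mask];
        __builtin_prefetch(cur[i]);   /* start loading each bucket head */
        out[i] = NULL;
    }
    int progress = 1;
    while (progress) {
        progress = 0;
        for (size_t i = 0; i < n; i++) {
            if (cur[i] == NULL) continue;       /* this lookup is done */
            if (cur[i]->key == keys[i]) {       /* prefetched last pass */
                out[i] = cur[i];
                cur[i] = NULL;
                continue;
            }
            cur[i] = cur[i]->next;
            if (cur[i]) {
                __builtin_prefetch(cur[i]);     /* overlap the next miss */
                progress = 1;
            }
        }
    }
}
```

Serial lookups would pay each chain hop's latency in full; here up to 16 misses are in flight at once, which is the effect the batched prefetching exploits.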
#### New configuration
This PR introduces a new configuration, `prefetch-batch-max-size`, but
since we consider it a low-level optimization, the config is hidden:
When multiple commands are parsed by the I/O threads and ready for
execution, we take advantage of knowing the next set of commands and
prefetch their required dictionary entries in a batch. This reduces
memory access costs. The optimal batch size depends on the specific
workflow of the user. The default batch size is 16, which can be
modified using the 'prefetch-batch-max-size' config.
When the config is set to 0, prefetching is disabled.
---------
Co-authored-by: Uri Yagelnik <uriy@amazon.com>
Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com>
Add thread sanitizer run to daily CI.
A few tests are skipped in tsan runs for two reasons:
* Stack trace producing tests (oom, `unit/moduleapi/crash`, etc) are
tagged `tsan:skip` because redis calls `backtrace()` in signal handler
which turns out to be signal-unsafe since it might allocate memory (e.g.
glibc 2.39 does it through a call to `_dl_map_object_deps()`).
* A few tests become flaky with thread sanitizer builds and don't
finish within the expected deadlines because of the additional tsan
overhead. Instead of skipping those tests, this can be improved in the
future by allowing more iterations when waiting for tsan builds.
Deadlock detection is disabled for now because of tsan limitation where
max 64 locks can be taken at once.
There is one outstanding (false-positive?) race in jemalloc which is
suppressed in `tsan.sup`.
Fix a few races the thread sanitizer reported, having to do with writes
from signal handlers. Since in a multi-threaded setting signal handlers
may be called on any thread (modulo pthread_sigmask) while the main
thread is running, the `volatile sig_atomic_t` type is not sufficient,
and atomics are used instead.
When `repl-diskless-load` is enabled on a replica, and it is in the
process of loading an RDB file, a broken connection detected by the main
channel may trigger a call to rioAbort(). This sets a flag to cause the
rdb channel to fail on the next rioRead() call, allowing it to perform
necessary cleanup.
However, there are specific scenarios where the error is checked using
rioGetReadError(), which does not account for the RIO_ABORT flag (see
[source](79b37ff535/src/rdb.c (L3098))).
As a result, the error goes undetected. The code then proceeds to
validate a module type, fails to find a match, and calls
rdbReportCorruptRDB() which logs the following error and exits the
process:
```
The RDB file contains module data I can't load: no matching module type '_________'
```
To fix this issue, the RIO_ABORT flag has been removed. Now, rioAbort()
sets both read and write error flags, so that subsequent operations and
error checks properly detect the failure.
Additional keys were added to the short read test; with this change it
reproduces the issue. We hit that problematic line once per key, and my
guess is that with many smaller keys, the likelihood of the connection
being killed at just the right moment increases.
When Redis is compiled with COVERAGE_TEST, using the fork API can run
into the following issue:
- Forked process calls `RedisModule_ExitFromChild` - child process
starts to report its COW while performing IO operations
- Parent process terminates child process with
`RedisModule_KillForkChild`
- Child process signal handler gets called while an IO operation is
called
- exit() is called because COVERAGE_TEST was on during compilation.
- exit() tries to perform more IO operations in its exit handlers.
- process gets deadlocked
Backtrace snippet:
```
#0 futex_wait (private=0, expected=2, futex_word=0x7e1220000c50) at ../sysdeps/nptl/futex-internal.h:146
#1 __GI___lll_lock_wait_private (futex=0x7e1220000c50) at ./nptl/lowlevellock.c:34
#2 0x00007e1234696429 in __GI__IO_flush_all () at ./libio/genops.c:698
#3 0x00007e123469680d in _IO_cleanup () at ./libio/genops.c:843
#4 0x00007e1234647b74 in __run_exit_handlers (status=status@entry=255, listp=<optimized out>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at ./stdlib/exit.c:129
#5 0x00007e1234647bbe in __GI_exit (status=status@entry=255) at ./stdlib/exit.c:138
#6 0x00005ef753264e13 in exitFromChild (retcode=255) at /home/jonathan/CLionProjects/redis/src/server.c:263
#7 sigKillChildHandler (sig=<optimized out>) at /home/jonathan/CLionProjects/redis/src/server.c:6794
#8 <signal handler called>
#9 0x00007e1234685b94 in _IO_fgets (buf=buf@entry=0x7e122dafdd90 "KSM:", ' ' <repeats 19 times>, "0 kB\n", n=n@entry=1024, fp=fp@entry=0x7e1220000b70) at ./libio/iofgets.c:47
#10 0x00005ef75326c5e0 in fgets (__stream=<optimized out>, __n=<optimized out>, __s=<optimized out>, __s=<optimized out>, __n=<optimized out>, __stream=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/stdio2.h:200
#11 zmalloc_get_smap_bytes_by_field (field=0x5ef7534c42fd "Private_Dirty:", pid=<optimized out>) at /home/jonathan/CLionProjects/redis/src/zmalloc.c:928
#12 0x00005ef75338ab1f in zmalloc_get_private_dirty (pid=-1) at /home/jonathan/CLionProjects/redis/src/zmalloc.c:978
#13 sendChildInfoGeneric (info_type=CHILD_INFO_TYPE_MODULE_COW_SIZE, keys=0, progress=-1, pname=0x5ef7534c95b2 "Module fork") at /home/jonathan/CLionProjects/redis/src/childinfo.c:71
#14 0x00005ef75337962c in sendChildCowInfo (pname=0x5ef7534c95b2 "Module fork", info_type=CHILD_INFO_TYPE_MODULE_COW_SIZE) at /home/jonathan/CLionProjects/redis/src/server.c:6895
#15 RM_ExitFromChild (retcode=0) at /home/jonathan/CLionProjects/redis/src/module.c:11468
```
The change is to make the choice between exit() and _exit() conditional,
based on a parameter to the exitFromChild function.
The signal handler should exit without IO operations, since it doesn't
know its history (we may have been in the middle of an IO operation when
it was called).
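A minimal sketch of why the handler must avoid exit() (hypothetical names, not the Redis code): _exit() terminates immediately, skipping the atexit/stdio cleanup that deadlocked in the backtrace above.

```c
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

/* A signal handler should use _exit(): it skips exit handlers and stdio
 * flushing, which could otherwise re-take libio locks already held by an
 * interrupted fgets() and deadlock. */
static void kill_child_handler(int sig) {
    (void)sig;
    _exit(255); /* async-signal-safe immediate termination */
}

/* Fork a child that signals itself; return the exit status the parent sees. */
int child_exit_status(void) {
    pid_t pid = fork();
    if (pid == 0) {
        signal(SIGUSR1, kill_child_handler);
        raise(SIGUSR1);
        _exit(1); /* not reached */
    }
    int status = 0;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```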
---------
Co-authored-by: Yuan Wang <wangyuancode@163.com>
restoreCommand() creates a key-value object (kv) with a TTL in two steps.
During the second step, setExpire() may reallocate the kv object. To ensure
correct behavior, kv must be updated after this call, as it might be used later
in the function.
Hi, as described, this implements WITHATTRIBS, a feature requested by a
few users, and indeed needed.
This was requested the first time by @rowantrollope but I was not sure
how to make it work with RESP2 and RESP3 in a clean way, hopefully
that's it.
The patch includes tests and documentation updates.
This bug was introduced in
[#13814](https://github.com/redis/redis/issues/13814), and was found by
@guybe7.
It incorrectly moved the update of `server.cronloops` from
`whileBlockedCron()` to `activeDefragTimeProc()`,
causing the cron-based timers to effectively run twice as fast when
active defrag is enabled.
As a result, memory statistics are not updated during blocked
operations.
This repairs parts of https://github.com/redis/redis/pull/13995;
because it needs to be backported, a separate PR is used for the
repair.
Based on https://github.com/valkey-io/valkey/pull/1463 and
https://github.com/valkey-io/valkey/pull/1481
In a failure of the full CI run
(https://github.com/redis/redis/actions/runs/14595343452/job/40979173087?pr=13965)
on version 7.0 we are getting a number of errors like:
```
array subscript ‘clusterMsg[0]’ is partly outside array bounds of ‘unsigned char[2272]’
```
This is basically GCC telling us that we have an object which is longer
than the underlying storage of the allocation. We actually do this a
lot, but GCC is generally not aware of how big the underlying allocation
is, so it doesn't throw this error. We are specifically getting this
error because the msgBlock can be of variable length depending on the
type of message, but GCC assumes it's the longest one possible. The
solution I went with here was to make the message type optional, so that
it isn't included in the size. I think this also makes some sense, since
it's really just a helper for us to easily cast the object around.
This compilation warning only occurs in version 7.0, because in [this
PR](https://github.com/redis/redis/pull/13073) we started passing
`-flto` in `CFLAGS` by default, and it seems that with `-flto` GCC is
unable to detect such warnings; that change is not present in version
7.0.
So, to reproduce this compilation warning in versions after 7.0, we can
pass `OPTIMIZATION=-O2` manually.
---------
Co-authored-by: madolson <34459052+madolson@users.noreply.github.com>
# Add LOLWUT 8: TAPE MARK I - Computer Poetry Generation
This PR introduces LOLWUT 8, implementing Nanni Balestrini's
groundbreaking TAPE MARK I algorithm from 1962 - one of the first
experiments in computer-generated poetry.
## Background
TAPE MARK I, created by Italian poet Nanni Balestrini and published in
Almanacco Letterario Bompiani (1962), represents a [pioneering moment in
computational creativity](https://en.wikipedia.org/wiki/Digital_poetry).
Using an IBM 7090 mainframe, Balestrini developed an algorithm that
combines verses from three different literary sources:
1. **Diary of Hiroshima** by Michihito Hachiya
2. **The Mystery of the Elevator** by Paul Goldwin
3. **Tao Te Ching** by Lao Tse
The algorithm selects and arranges verses based on metrical
compatibility rules and ensures alternation between different literary
sources, creating unique poetic combinations with each execution.
## Implementation
This LOLWUT command faithfully reproduces Balestrini's original
algorithm.
The main difference is that the default output is in English rather than
Italian. However, it should be noted that Balestrini used three poems
that were not in Italian anyway, so the translation process was already
part of it. In the English versions, I sometimes made minimal changes in
order to preserve the meter, or to make sure that a sentence stands on
its own (like adding "it" before "expands rapidly").
## Cultural Significance
TAPE MARK I predates most computational art experiments and demonstrates
the early intersection of literature, technology, and algorithmic
creativity. This implementation honors that pioneering work while making
it accessible to a modern audience through Redis's LOLWUT tradition.
Each execution generates a unique poem, just as Balestrini intended.
Trivia: the original code, running on an IBM 7090, took six minutes to
generate each verse :D
**IMPORTANT** This commit should be back-ported to Redis 8.
### Issue
Previously, even when only string equality needed to be determined, the
comparison logic still performed unnecessary `memcmp()` calls to check
string ordering, even if the lengths were not equal.
### Change
This PR adds a length check before the content comparison in the
`equalStringObjects` function.
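A simplified sketch of the optimization (not the actual `equalStringObjects` code): with the length check first, `memcmp()` only runs when the lengths already match.

```c
#include <string.h>
#include <stddef.h>

/* Equality-only comparison: strings of different lengths can never be
 * equal, so reject them before touching the string contents. */
int equal_raw_strings(const char *a, size_t alen,
                      const char *b, size_t blen) {
    if (alen != blen) return 0;       /* cheap reject, no memory scan */
    return memcmp(a, b, alen) == 0;   /* content compare only when needed */
}
```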
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
When we transfer clients between IO thread and main thread, the creation
and destruction of list nodes also consume some CPU. In this commit, we
reuse list nodes to avoid this issue.
This PR addresses a potential misalignment issue when using `va_args`.
Without this fix,
[argument](9a9aa921bc/src/module.c (L6249-L6264))
values may occasionally become incorrect due to stack alignment
inconsistencies.
For objects that are allocated in the IO threads, we should let the IO
threads free them, so that we avoid memory arena contention and also
reduce the load on the main thread.
These objects include:
- client argv objects
- the rewrite objects that are only `OBJ_ENCODING_RAW` encoded strings,
since this type of object is usually the only one allocated by IO
threads.
For the implementation, if the client is assigned to IO threads, we
create a `deferred_objects` array of size 32. We put objects into
`deferred_objects` when the main thread wants to free the above objects,
and they are finally freed by the IO threads.
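A hypothetical sketch of the mechanism (the names and the size-32 array mirror the description above; this is not the actual Redis code): the main thread queues objects instead of freeing them, and the owning IO thread later frees the whole queue.

```c
#include <stdlib.h>

#define DEFERRED_OBJECTS_SIZE 32  /* matches the size-32 array described */

typedef struct deferredObjects {
    void *objs[DEFERRED_OBJECTS_SIZE];
    int count;
} deferredObjects;

/* Main thread: queue the object for its IO thread instead of freeing it.
 * Returns 0 when the queue is full, in which case the caller frees inline. */
int defer_free(deferredObjects *d, void *o) {
    if (d->count == DEFERRED_OBJECTS_SIZE) return 0;
    d->objs[d->count++] = o;
    return 1;
}

/* IO thread: free everything queued for it, in its own malloc arena. */
void flush_deferred(deferredObjects *d, void (*freefn)(void *)) {
    for (int i = 0; i < d->count; i++) freefn(d->objs[i]);
    d->count = 0;
}
```

Freeing in the allocating thread keeps each allocation within the arena that created it, which is the contention the PR avoids.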
When running the Redis tests on macOS, the test suite detects that the
operating system is able to use "leaks" to check for memory leaks, and
executes this check after every spawned server is terminated.
While we have the ability to run the tests in environments able to
detect memory issues, the fact that it is possible to check for leaks at
every run, basically for free, is very valuable, and allows fixing leaks
immediately on your laptop before submitting a PR.
However, the feature skipped the leaks check when no test was run: this
check was added in the early days of Redis, when all the tests were
like:
```
server {
    test { ... }
}
```
So the check counts the number of tests run, and if no test was
executed, no leaks detection is performed. However, we now have certain
tests of the form:
```
test {
    server { ... }
}
```
For instance, tests that just load a corrupted RDB or the like. In such
cases, the leaks check was not executed. This commit removes the check
so that the leaks check is always executed.
This PR adds 4 new operators to the `BITOP` command - `DIFF`, `DIFF1`,
`ANDOR` and `ONE`. They enable redis clients to atomically do
non-trivial logical operations that are useful for checking membership
of a bitmap against a group of bitmaps.
* **DIFF**
`BITOP DIFF dest srckey1 srckey2 [key...]`
**Description**
DIFF(*X*, *A1*, *A2*, *...*, *AN*) = *X* ∧ ¬(*A1* ∨ *A2* ∨ *...* ∨
*AN*), i.e. the bits set in *X* that are not set in any of *A1*, *A2*,
*…*, *AN*
**NOTE**
Command expects at least 2 source keys.
* **DIFF1**
`BITOP DIFF1 dest srckey1 srckey2 [key...]`
**Description**
DIFF1(*X*, *A1*, *A2*, *...*, *AN*) = ¬*X* ∧ (*A1* ∨ *A2* ∨ *...* ∨
*AN*), i.e. the bits set in one or more of *A1*, *A2*, *…*, *AN* that
are not set in *X*
**NOTE**
Command expects at least 2 source keys.
* **ANDOR**
`BITOP ANDOR dest srckey1 srckey2 [key...]`
**Description**
ANDOR(*X*, *A1*, *A2*, *...*, *AN*) = *X* ∧ (*A1* ∨ *A2* ∨ *...* ∨
*AN*), i.e. the bits set in *X* that are also set in one or more of
*A1*, *A2*, *…*, *AN*
**NOTE**
Command expects at least 2 source keys.
* **ONE**
`BITOP ONE dest key [key...]`
**Description**
ONE(*A1*, *A2*, *...*, *AN*) = *X*, where,
if *X[i]* is the *i*-th bit of *X*, then *X[i] = 1* if and only if there
is an *m* such that *A_m[i] = 1* and *A_n[i] = 0* for all *n ≠ m*, i.e.
bit *X[i]* is set only if it is set in exactly one of *A1*, *A2*, *...*,
*AN*
**Return value**
As with all other `BITOP` operators, the return value for the new ones
is the number of bytes of the longest source key.
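The per-byte semantics of two of the new operators can be sketched as follows (an illustration of the definitions above, not the Redis implementation):

```c
#include <stddef.h>

/* DIFF: the bits of x that are not set in any of a[0..n-1]. */
unsigned char bitop_diff_byte(unsigned char x, const unsigned char *a,
                              size_t n) {
    unsigned char any = 0;
    for (size_t i = 0; i < n; i++) any |= a[i];  /* union of all sources */
    return x & (unsigned char)~any;
}

/* ONE: the bits set in exactly one of a[0..n-1]. */
unsigned char bitop_one_byte(const unsigned char *a, size_t n) {
    unsigned char once = 0, twice = 0;
    for (size_t i = 0; i < n; i++) {
        twice |= once & a[i];  /* bits already seen that appear again */
        once |= a[i];          /* bits seen at least once */
    }
    return once & (unsigned char)~twice;
}
```

The real command applies this logic across whole bitmaps (with the AVX2 fast path mentioned below); it is shown a byte at a time here for clarity.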
EDIT:
Besides adding the new operators, a couple more changes were made:
- Added an AVX2 path for more optimized computation of the BITOP
operations (including the new ones)
- Removed the hard limit of max 16 source keys for the fast path: now,
no matter the number of keys, we can enter the fast path as long as the
keys are long enough.
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
Separated the repeated logic for iterating and formatting AOF info from
both the history and incremental AOF lists into a new helper function
named appendAofInfoFromList. This improves code readability, reduces
duplication, and makes the getAofManifestAsString function cleaner and
easier to maintain.
No changes in behavior were introduced.
This commit addresses issues with the keysizes histogram tracking in two
Redis commands:
**SPOP with count (case 3)**
In the spopWithCountCommand function, when handling case 3 (where the
number of elements to return is very large, approaching the size of the
set itself), the keysizes histogram was not being properly updated. This
PR adds the necessary call to updateKeysizesHist() to ensure the
histogram accurately reflects the changes in set size after the
operation.
**SETRANGE command**
Fixed an issue in the setrangeCommand function where the keysizes
histogram wasn't being properly updated when modifying strings. The PR
ensures that the histogram correctly tracks the old and new lengths of
the string after a SETRANGE operation.
Added tests accordingly.
In PR #13229, we introduced ebuckets for HFE.
Before this PR, when updating eitems stored in ebuckets, the lack of
incremental defragmentation support for non-kvstore data structures
(until PR #13814) meant that we had to reverse-lookup the position of
the eitem in the ebucket and then perform the update.
This approach was inefficient, as it often required frequent traversals
of the segment list to locate and update the item.
To address this issue, this PR implements incremental defragmentation
for hash dict ebuckets and server.hexpires.
By incrementally defragging the ebuckets, we also perform
defragmentation for the associated items, eliminating the need for
frequent traversals of the segment list to defrag each eitem.
---------
Co-authored-by: Moti Cohen <moticless@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
This PR fixes the `tests/unit/info.tcl` test to properly handle 32-bit
architectures by dynamically determining the pointer size based on the
architecture instead of hardcoding it to 8 bytes.
In the original test, we started a cluster with 20 instances (10 masters
+ 10 replicas), which leads to frequent disconnections of instances in a
slow environment, resulting in an inability to achieve consistency.
This PR reduces the number of instances from 20 to 6.
PR https://github.com/redis/redis/pull/13916 introduced a regression:
by overriding the `CFLAGS` and `LDFLAGS` variables for all of the
dependencies, hiredis and fast_float lost some of their compiler/linker
flags.
This PR makes it possible to pass additional CFLAGS/LDFLAGS to hiredis
without overriding them, as it has a somewhat more complex Makefile. As
for fast_float, passing CFLAGS/LDFLAGS from outside now doesn't break
the expected behavior.
The CI build step was changed so that macOS is now built with TLS, to
catch such errors in the future.
In pipeline mode, especially with TLS, two IO threads may perform worse
than a single thread; one reason is that the IO threads and the main
thread cannot process in parallel. Now, the IO threads deliver clients
once the pending client list exceeds 16, instead of finishing processing
all clients first. This approach lets the IO threads and the main thread
work in parallel as much as possible.
IO threads may also do some unnecessary notification of the main thread.
The notification is based on eventfd, and read(2) and write(2) on an
eventfd are costly system calls. While running, the threads can check
the pending client list in `beforeSleep`, so with this commit, if both
the main thread and the IO thread are running, they pass clients without
notification, and these transferred clients are processed in
`beforeSleep`.
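For illustration, the eventfd handshake that is now skipped when both threads are awake looks roughly like this (a Linux-only sketch, not the Redis code); each notify/wake pair costs a write(2) plus a read(2) system call, which is the overhead being saved:

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* One notification round-trip: the notifier write(2)s a counter into the
 * eventfd, and the sleeping side read(2)s it to wake up and consume it. */
int eventfd_roundtrip(void) {
    int efd = eventfd(0, 0);
    if (efd < 0) return -1;
    uint64_t v = 1, got = 0;
    if (write(efd, &v, sizeof(v)) != (ssize_t)sizeof(v)) {
        close(efd);
        return -1;
    }
    if (read(efd, &got, sizeof(got)) != (ssize_t)sizeof(got)) {
        close(efd);
        return -1;
    }
    close(efd);
    return (int)got;
}
```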
Hi all, this PR fixes two things:
1. An assertion, that prevented the RDB loading from recovery if there
was a quantization type mismatch (with regression test).
2. Two code paths that just returned NULL without proper cleanup during
RDB loading.
The idea of packing the key (`sds`), value (`robj`) and optionally TTL
into a single struct in memory was mentioned a few times in the past by
the community in various flavors. This approach improves memory
efficiency, reduces pointer dereferences for faster lookups, and
simplifies expiration management by keeping all relevant data in one
place. This change goes along with setting keyspace's dict to
no_value=1, and saving considerable amount of memory.
Two more motivations that well aligned with this unification are:
- Prepare the groundwork for replacing EXPIRE scan based implementation
and evaluate instead new `ebuckets` data structure that was introduced
as part of [Hash Field Expiration
feature](https://redis.io/blog/hash-field-expiration-architecture-and-benchmarks/).
Using this data structure requires embedding the ExpireMeta structure
within each object.
- Consider replacing dict with a more space efficient open addressing
approach hash table that might rely on keeping a single pointer to
object.
Before this PR, I POC'ed on a variant of open addressing hash-table and
was surprised to find that dict with no_value actually could provide a
good balance between performance, memory efficiency, and simplicity.
This realization prompted the separation of the unification step from
the evaluation of a new hash table to avoid introducing too many changes
at once and to evaluate its impact independently before considering
replacement of existing hash-table. On an earlier
[commit](https://github.com/redis/redis/pull/13683) I extended dict
no_value optimization (which saves keeping dictEntry where possible) to
be relevant also for objects with even addresses in memory. Combining it
with this unification saves a considerable amount of memory for
keyspace.
# kvobj
This PR adopts Valkey’s
[packing](3eb8314be6)
layout and logic for key, value, and TTL. However, unlike Valkey
implementation, which retained a common `robj` throughout the project,
this PR distinguishes between the general-purpose, overused `robj`, and
the new `kvobj`, which embeds both the key and value and used by the
keyspace. Conceptually, `robj` serves as a base class, while `kvobj`
acts as a derived class.
Two new flags introduced into redis object, `iskvobj` and `expirable`:
```
struct redisObject {
unsigned type:4;
unsigned encoding:4;
unsigned lru:LRU_BITS;
unsigned iskvobj : 1; /* new flag */
unsigned expirable : 1; /* new flag */
unsigned refcount : 30; /* modified: 32bits->30bits */
void *ptr;
};
typedef struct redisObject robj;
typedef struct redisObject kvobj;
```
When the `iskvobj` flag is set, the object includes also the key and it
is appended to the end of the object. If the `expirable` flag is set, an
additional 8 bytes are added to the object. If the object is of type
string, and the string is rather short, then it will be embedded as
well.
As a result, all keys in the keyspace are promoted to be of type
`kvobj`. This term attempts to align with the existing Redis object,
robj, and the kvstore data structure.
# EXPIRE Implementation
As `kvobj` embeds the expiration time as well, looking up expiration
times is now an O(1) operation. In addition, the EXPIRE hash table is
now set to `no_value` mode, directly referencing `kvobj` entries, which
in turn saves memory.
Next, I plan to evaluate replacing the EXPIRE implementation with the
[ebuckets](https://github.com/redis/redis/blob/unstable/src/ebuckets.h)
data structure, which would eliminate keyspace scans for expired keys.
This requires embedding `ExpireMeta` within each `kvobj` of each key
with expiration. In such implementation, the `expirable` flag will be
shifted to indicate whether `ExpireMeta` is attached.
# Implementation notes
## Manipulating keyspace (find, modify, insert)
Initially, unifying the key and value into a single object and storing
it in dict with `no_value` optimization seemed like a quick win.
However, it (quickly) became clear that this change required deeper
modifications to how keys are manipulated. The challenge was handling
cases where a dictEntry is opted out due to the no_value optimization.
In such cases, many of the APIs that return the dictEntry from a lookup
become insufficient, as the returned pointer might just be the key
itself. To address this issue, a new-old approach was adopted: returning
a "link" to the looked-up key's `dictEntry` instead of the `dictEntry`
itself. The term `link` was already somewhat present in the dict API,
and aligns well with the new dictEntLink declaration:
```
typedef dictEntry **dictEntLink;
```
This PR introduces two new function APIs to dict to leverage returned
link from the search:
```
dictEntLink dictFindLink(dict *d, const void *key, dictEntLink *bucket);
void dictSetKeyAtLink(dict *d, void *key, dictEntLink *link, int newItem);
```
After calling `link = dictFindLink(...)`, any necessary updates must be
performed immediately after by calling `dictSetKeyAtLink()` without any
intervening operations on given dict. Otherwise, `dictEntLink` may
become invalid. Example:
```
/* Replace an existing key. */
link = dictFindLink(d, key, &bucket);
// ... Do something, but don't modify the dict ...
// assert(link != NULL);
dictSetKeyAtLink(d, kv, &link, 0);

/* Add a new key (if there is no space for the new key, the dict will be
 * expanded and the bucket will be looked up again). */
link = dictFindLink(d, key, &bucket);
// ... Do something, but don't modify the dict ...
// assert(link == NULL);
dictSetKeyAtLink(d, kv, &bucket, 1);
```
## dict.h
- The dict API had become cluttered with many unused functions; I have
removed these from dict.h.
- Additionally, APIs specifically related to hash maps (no_value=0),
primarily those handling key-value access, have been gathered and
isolated.
- Entirely removed internal functions ending with "*ByHash()" that were
originally added for optimization and are not required any more.
- A few other legacy dict functions were adapted at the API level to
work with the term dictEntLink as well.
- Simplified and generalized an optimization related to comparing the
lengths of string keys.
## Hash Field Expiration
Until now, each hash object with expiration on fields needed to
maintain a reference to its own key name, so that if it was
active-expired, the key name could be resolved for the notification's
sake. Now there is no need for that anymore.
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
## Description
Memory sanitizer (MSAN) is used to detect use-of-uninitialized memory
issues. While Address Sanitizer catches a wide range of memory safety
issues, it doesn't specifically detect uninitialized memory usage.
Therefore, Memory Sanitizer complements Address Sanitizer. This PR adds
MSAN run to the daily build, with the possibility of incorporating it
into the ci.yml workflow in the future if needed.
Changes in source files fix false-positive issues and they should not
introduce any runtime implications.
Note: Valgrind performs checks similar to both ASAN and MSAN, but the
sanitizers run significantly faster.
## Limitations
- Memory sanitizer is only supported by Clang.
- MSAN documentation states that all dependencies, including the
standard library, must be compiled with MSAN. However, it also mentions
there are interceptors for common libc functions, so compiling the
standard library with the MSAN flag is not strictly necessary.
Therefore, we are not compiling libc with MSAN.
---------
Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com>
We can reclaim the page cache memory used by the AOF file after loading,
since we don't read the AOF again; this corresponds to
https://github.com/redis/redis/pull/11248
In a test after loading a 9.5GB AOF, this PR uses much less
`buff/cache` than unstable.
**Unstable**
```
$ free -m
total used free shared buff/cache available
Mem: 31293 16181 4562 13 10958 15111
Swap: 0 0 0
```
**This PR**
```
$ free -m
total used free shared buff/cache available
Mem: 31293 15391 15854 13 439 15902
Swap: 0 0 0
```
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
The PR aims to improve the README usability for new users as well as
developers looking to go in depth.
Key improvements include:
- **Structure & Navigation:**
- Introduces a detailed Table of Contents for easier navigation.
- Improved overall organization of sections.
- **Content:**
- Expanded "What is Redis?" with section for "Key use cases"
- Expanded "Why choose Redis?" section
- New "Getting started" section, including Redis starter projects and
ordering of sections based on desired use for new users
- Changes to "Redis data types, processing engines, and capabilities"
section for better readability and consistency
- Formatting markdown blocks to specify language
There are several issues with maintaining histogram counters.
Ideally, the hooks would be placed in the low-level datatype
implementations. However, this logic is triggered in various contexts
and doesn’t always map directly to a stored DB key. As a result, the
hooks sit closer to the high-level commands layer. It’s a bit messy, but
the right way to ensure histogram counters behave correctly is through
broad test coverage.
* Fix inaccuracies around deletion scenarios.
* Fix inaccuracies around modules calls. Added corresponding tests.
* The info-keysizes.tcl test has been extended to operate on meaningful
datasets
* Validate histogram correctness in edge cases involving collection
deletions.
* Add new macro debugServerAssert(). Effective only if compiled with
DEBUG_ASSERTIONS.
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
Now that we have the RDB channel from
https://github.com/redis/redis/pull/13732, the child process can
transfer the RDB in the background instead of it being handled by the
main thread. So when redis-cli gets an RDB from the server, we can adopt
this approach to reduce the main thread's load.
---------
Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com>
This PR adds support for REDISMODULE_OPTIONS_HANDLE_IO_ERRORS, and
tests for short read and corrupted RESTORE payload.
Please note that I also removed the comment about async loading support,
since we should already be covered: Vector Sets do not manipulate global
data structures, except for the unique ID used to create new vector sets
with different IDs.
Closes #13973
This PR fixed two bugs.
1) `overhead_hashtable_lut` isn't updated correctly
This bug was introduced by https://github.com/redis/redis/pull/12913
We only update `overhead_hashtable_lut` at the beginning and end of
rehashing, but we forgot to update it when a dict is emptied or
released.
This PR introduces a new `bucketChanged` callback to track changes in
the bucket size.
Now, the `rehashingStarted` and `rehashingCompleted` callbacks are no
longer responsible for bucket changes; these are handled entirely by
`bucketChanged`. This also avoids having to register three callbacks to
track changes of the bucket size; now only one is needed.
In most cases it is triggered together with `rehashingStarted` or
`rehashingCompleted`, except when a dict is being emptied or released;
in those cases, even though the dict is not rehashing, we still need to
subtract its current size.
On the other hand, `overhead_hashtable_lut` was duplicated with
`bucket_count`, so we remove `overhead_hashtable_lut` and use
`bucket_count` instead.
Note that this bug only happens with cluster mode, because we don't use
KVSTORE_FREE_EMPTY_DICTS without cluster.
2) The size of `dict_size_index` was counted twice in memory usage.
`dict_size_index` is created at startup, so its memory usage is already
counted in `used_memory_startup`.
However, when counting the overhead, we repeated the calculation, which
could cause the overhead to exceed the total memory usage.
---------
Co-authored-by: Yuan Wang <yuan.wang@redis.com>
The log message incorrectly referred to the expected state as
`RECEIVE_PSYNC`,
while it should be `RECEIVE_PSYNC_REPLY`. This aligns the log with the
actual state check.
From the flame graph, we can see that `ERR_clear_error` costs a lot of
CPU in TLS mode, and some calls to it are duplicated: in
`tlsHandleEvent` we call `ERR_clear_error`, but we also call it when
reading and writing, so the extra call is unnecessary.
Benchmarks show this commit brings a 2-3% performance improvement.
This PR fixes an issue in the CI test for client-output-buffer-limit,
which was causing an infinite loop when running on macOS 15.4.
### Problem
This test starts two clients, R and R1:
```
R1 subscribe foo
R publish foo bar
```
When R executes `PUBLISH foo bar`, the server first stores the message
`bar` in R1's buf. Only when the space in buf is insufficient does it
call `_addReplyProtoToList`.
Inside this function, `closeClientOnOutputBufferLimitReached` is invoked
to check whether client R1's output buffer has reached its configured
limit.
On macOS 15.4, because the server writes to the client at high speed,
R1's buf never gets full. As a result,
`closeClientOnOutputBufferLimitReached` is never triggered in the test,
causing the test to never exit and fall into an infinite loop.
---------
Co-authored-by: debing.sun <debing.sun@redis.com>