Commit graph

12622 commits

Yuan Wang
70a079db5e
Improve multithreaded performance with memory prefetching (#14017)
This PR is based on: https://github.com/valkey-io/valkey/pull/861

> ### Memory Access Amortization
> (Designed and implemented by [dan
touitou](https://github.com/touitou-dan))
> 
> Memory Access Amortization (MAA) is a technique designed to optimize
the performance of dynamic data structures by reducing the impact of
memory access latency. It is applicable when multiple operations need to
be executed concurrently. The principle behind it is that for certain
dynamic data structures, executing operations in a batch is more
efficient than executing each one separately.
> 
> Rather than executing operations sequentially, this approach
interleaves the execution of all operations. This is done in such a way
that whenever a memory access is required during an operation, the
program prefetches the necessary memory and transitions to another
operation. This ensures that when one operation is blocked awaiting
memory access, other memory accesses are executed in parallel, thereby
reducing the average access latency.
> 
> We applied this method in the development of dictPrefetch, which takes
as parameters a vector of keys and dictionaries. It ensures that all
memory addresses required to execute dictionary operations for these
keys are loaded into the L1-L3 caches when executing commands.
Essentially, dictPrefetch is an interleaved execution of dictFind for
all the keys.

### Implementation in Redis
When the main thread processes clients with ready-to-execute commands
(i.e., clients for which the IO thread has parsed the commands), a batch
of up to 16 commands is created. Initially, the commands' argv arrays,
which were allocated by the IO threads, are prefetched into the main
thread's L1 cache. Subsequently, all the dict entries and values
required for the commands are prefetched from the dictionary before
command execution.

#### Memory prefetching for main hash table
As shown in the picture, after https://github.com/redis/redis/pull/13806
, we unified key and value and the dict now uses the no_value
optimization, so memory prefetching has 4 steps:

1. prefetch the bucket of the hash table
2. prefetch the entry associated with the given key's hash
3. prefetch the kv object of the entry
4. prefetch the value data of the kv object

We also need to handle the case where the dict entry is itself a pointer
to the kv object; in that case, step 3 is simply skipped.

MAA improves single-threaded memory access efficiency by interleaving
the execution of multiple independent operations, allowing memory-level
parallelism and better CPU utilization. Its key idea is batch-wise
interleaved execution: split a batch of independent operations (such as
multiple key lookups) into multiple state machines, and interleave their
progress within a single thread to hide the memory access latency of
individual requests.

The difference between serial execution and interleaved execution:
**naive serial execution**
```
key1: step1 → wait → step2 → wait → done
key2: step1 → wait → step2 → wait → done
```
**interleaved execution**
```
key1: step1   → step2   → done
key2:   step1 → step2   → done
key3:     step1 → step2 → done
         ↑ While waiting for key1’s memory, progress key2/key3
```
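Batch-wise interleaving of this kind can be sketched in C with `__builtin_prefetch` (a simplified illustration under assumed structures — `lookup_state` and the step names are hypothetical, not the actual Redis code):

```c
#include <stddef.h>

/* A toy per-key lookup state machine. The real code walks
 * bucket -> entry -> kv object -> value; here just two steps. */
enum lookup_step { STEP_BUCKET, STEP_ENTRY, STEP_DONE };

struct lookup_state {
    enum lookup_step step;
    const void *next_addr; /* memory the next step will touch */
};

/* Advance every in-flight lookup one step per round: issue a prefetch
 * for the memory the lookup needs next, then move on to the other
 * lookups. While one key's cache miss is in flight, the others make
 * progress, hiding the average access latency. */
static void interleave_batch(struct lookup_state *batch, size_t n) {
    size_t in_flight = n;
    while (in_flight > 0) {
        in_flight = 0;
        for (size_t i = 0; i < n; i++) {
            if (batch[i].step == STEP_DONE) continue;
            __builtin_prefetch(batch[i].next_addr);
            batch[i].step++; /* next round runs the prefetched step */
            if (batch[i].step != STEP_DONE) in_flight++;
        }
    }
}
```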

#### New configuration
This PR introduces a new configuration, `prefetch-batch-max-size`, but
we consider it a low-level optimization, so the config is hidden:
When multiple commands are parsed by the I/O threads and ready for
execution, we take advantage of knowing the next set of commands and
prefetch their required dictionary entries in a batch. This reduces
memory access costs. The optimal batch size depends on the specific
workflow of the user. The default batch size is 16, which can be
modified using the 'prefetch-batch-max-size' config.
When the config is set to 0, prefetching is disabled.

---------

Co-authored-by: Uri Yagelnik <uriy@amazon.com>
Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com>
2025-06-05 08:57:43 +08:00
Slavomir Kaslev
b7c6755b1b
Add thread sanitizer run to daily CI (#13964)
Add thread sanitizer run to daily CI.

A few tests are skipped in tsan runs for two reasons:
* Stack trace producing tests (oom, `unit/moduleapi/crash`, etc) are
tagged `tsan:skip` because redis calls `backtrace()` in signal handler
which turns out to be signal-unsafe since it might allocate memory (e.g.
glibc 2.39 does it through a call to `_dl_map_object_deps()`).
* A few tests become flaky with thread sanitizer builds and don't finish
within the expected deadlines because of the additional tsan overhead.
Instead of skipping those tests, this can be improved in the future by
allowing more iterations when waiting for tsan builds.

Deadlock detection is disabled for now because of tsan limitation where
max 64 locks can be taken at once.

There is one outstanding (false-positive?) race in jemalloc which is
suppressed in `tsan.sup`.

Fixed a few races the thread sanitizer reported having to do with writes
from signal handlers. Since in a multi-threaded setting signal handlers
might be called on any thread (modulo pthread_sigmask) while the main
thread is running, the `volatile sig_atomic_t` type is not sufficient,
and atomics are used instead.
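The pattern behind the fix can be sketched as follows (a minimal illustration of replacing `volatile sig_atomic_t` with a C11 atomic; the flag name is hypothetical, not the actual Redis variable):

```c
#include <signal.h>
#include <stdatomic.h>

/* `volatile sig_atomic_t` is only guaranteed safe between a thread and
 * a signal handler running on that same thread. With IO threads, the
 * handler may run on a different thread than the reader, so a C11
 * atomic (lock-free for int on mainstream platforms) is needed. */
static atomic_int shutdown_requested;

static void term_handler(int sig) {
    (void)sig;
    atomic_store(&shutdown_requested, 1); /* thread- and signal-safe */
}

static void install_term_handler(void) {
    signal(SIGTERM, term_handler);
}
```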
2025-06-02 10:13:23 +03:00
Ozan Tezcan
7f60945bc6
Fix short read issue that causes exit() on replica (#14085)
When `repl-diskless-load` is enabled on a replica, and it is in the
process of loading an RDB file, a broken connection detected by the main
channel may trigger a call to rioAbort(). This sets a flag to cause the
rdb channel to fail on the next rioRead() call, allowing it to perform
necessary cleanup.

However, there are specific scenarios where the error is checked using
rioGetReadError(), which does not account for the RIO_ABORT flag (see
[source](79b37ff535/src/rdb.c (L3098))).
As a result, the error goes undetected. The code then proceeds to
validate a module type, fails to find a match, and calls
rdbReportCorruptRDB() which logs the following error and exits the
process:

```
The RDB file contains module data I can't load: no matching module type '_________'
```

To fix this issue, the RIO_ABORT flag has been removed. Now, rioAbort()
sets both read and write error flags, so that subsequent operations and
error checks properly detect the failure.
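The shape of the fix can be modeled with a simplified rio-like struct (a hedged sketch; the real flags and functions live in rio.c/rio.h and carry more state):

```c
/* Before the fix, rioAbort() set a separate abort flag that
 * rioGetReadError()-style checks did not consult. After the fix it
 * sets both error flags, so every error-check path sees the failure. */
#define RIO_FLAG_READ_ERROR  (1 << 0)
#define RIO_FLAG_WRITE_ERROR (1 << 1)

typedef struct { int flags; } rio;

static void rioAbort(rio *r) {
    r->flags |= RIO_FLAG_READ_ERROR | RIO_FLAG_WRITE_ERROR;
}

static int rioGetReadError(rio *r) {
    return (r->flags & RIO_FLAG_READ_ERROR) != 0;
}

static int rioGetWriteError(rio *r) {
    return (r->flags & RIO_FLAG_WRITE_ERROR) != 0;
}
```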

Additional keys were added to the short read test; with this change it
reproduces the issue. We hit that problematic line once per key. My
guess is that with many smaller keys, the likelihood of the connection
being killed at just the right moment increases.
2025-05-28 12:43:59 +03:00
kei-nan
161326d332
Avoid performing IO on coverage when child exits due to signal handler (#14072)
When Redis is compiled with COVERAGE_TEST, using the fork API can run
into the following issue:
- Forked process calls `RedisModule_ExitFromChild` - child process
starts to report its COW while performing IO operations
- Parent process terminates child process with
`RedisModule_KillForkChild`
- Child process signal handler gets called while an IO operation is in
progress
- exit() is called because COVERAGE_TEST was on during compilation.
- exit() tries to perform more IO operations in its exit handlers.
- process gets deadlocked

Backtrace snippet:
```
#0  futex_wait (private=0, expected=2, futex_word=0x7e1220000c50) at ../sysdeps/nptl/futex-internal.h:146
#1  __GI___lll_lock_wait_private (futex=0x7e1220000c50) at ./nptl/lowlevellock.c:34
#2  0x00007e1234696429 in __GI__IO_flush_all () at ./libio/genops.c:698
#3  0x00007e123469680d in _IO_cleanup () at ./libio/genops.c:843
#4  0x00007e1234647b74 in __run_exit_handlers (status=status@entry=255, listp=<optimized out>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at ./stdlib/exit.c:129
#5  0x00007e1234647bbe in __GI_exit (status=status@entry=255) at ./stdlib/exit.c:138
#6  0x00005ef753264e13 in exitFromChild (retcode=255) at /home/jonathan/CLionProjects/redis/src/server.c:263
#7  sigKillChildHandler (sig=<optimized out>) at /home/jonathan/CLionProjects/redis/src/server.c:6794
#8  <signal handler called>
#9  0x00007e1234685b94 in _IO_fgets (buf=buf@entry=0x7e122dafdd90 "KSM:", ' ' <repeats 19 times>, "0 kB\n", n=n@entry=1024, fp=fp@entry=0x7e1220000b70) at ./libio/iofgets.c:47
#10 0x00005ef75326c5e0 in fgets (__stream=<optimized out>, __n=<optimized out>, __s=<optimized out>, __s=<optimized out>, __n=<optimized out>, __stream=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/stdio2.h:200
#11 zmalloc_get_smap_bytes_by_field (field=0x5ef7534c42fd "Private_Dirty:", pid=<optimized out>) at /home/jonathan/CLionProjects/redis/src/zmalloc.c:928
#12 0x00005ef75338ab1f in zmalloc_get_private_dirty (pid=-1) at /home/jonathan/CLionProjects/redis/src/zmalloc.c:978
#13 sendChildInfoGeneric (info_type=CHILD_INFO_TYPE_MODULE_COW_SIZE, keys=0, progress=-1, pname=0x5ef7534c95b2 "Module fork") at /home/jonathan/CLionProjects/redis/src/childinfo.c:71
#14 0x00005ef75337962c in sendChildCowInfo (pname=0x5ef7534c95b2 "Module fork", info_type=CHILD_INFO_TYPE_MODULE_COW_SIZE) at /home/jonathan/CLionProjects/redis/src/server.c:6895
#15 RM_ExitFromChild (retcode=0) at /home/jonathan/CLionProjects/redis/src/module.c:11468
```

The change makes the exit()/_exit() calls conditional, based on a
parameter to the exitFromChild() function.
The signal handler should exit without IO operations, since it doesn't
know its history (we may have been in the middle of an IO operation
before it was called).

---------

Co-authored-by: Yuan Wang <wangyuancode@163.com>
2025-05-28 16:27:52 +08:00
Moti Cohen
79b37ff535
Fix RESTORE with TTL (#14071)
restoreCommand() creates a key-value object (kv) with a TTL in two steps.
During the second step, setExpire() may reallocate the kv object. To ensure
correct behavior, kv must be updated after this call, as it might be used later
in the function.
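The bug pattern is classic realloc-style pointer invalidation; a hedged toy model (all names here are hypothetical — the real setExpire() operates on the unified kvobj and has a different signature):

```c
#include <stdlib.h>
#include <string.h>

/* Toy object: attaching a TTL grows (and thus may move) the object,
 * exactly like realloc(). */
typedef struct { size_t len; int has_ttl; char payload[]; } kv_t;

/* May reallocate; the caller MUST adopt the returned pointer, since
 * the old one may have been freed. Continuing to use the stale `kv`
 * afterwards is the bug this commit fixes in restoreCommand(). */
static kv_t *kvAttachExpire(kv_t *kv, long long when) {
    kv_t *nkv = realloc(kv, sizeof(kv_t) + kv->len + sizeof(long long));
    memcpy(nkv->payload + nkv->len, &when, sizeof(when));
    nkv->has_ttl = 1;
    return nkv;
}
```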
2025-05-28 08:02:10 +03:00
Salvatore Sanfilippo
0ac822e154
Implement WITHATTRIBS for VSIM. (#14065)
Hi, as described, this implements WITHATTRIBS, a feature requested by a
few users, and indeed needed.
This was first requested by @rowantrollope, but I was not sure how to
make it work with RESP2 and RESP3 in a clean way; hopefully this is it.

The patch includes tests and documentation updates.
2025-05-27 22:12:48 +08:00
debing.sun
bb23eb0b01
Fix incorrect server.cronloops update in defragWhileBlocked() causing timer to run twice as fast (#14081)
This bug was introduced in
[#13814](https://github.com/redis/redis/issues/13814), and was found by
@guybe7.
It incorrectly moved the update of `server.cronloops` from
`whileBlockedCron()` to `activeDefragTimeProc()`,
causing the cron-based timers to effectively run twice as fast when
active defrag is enabled.
As a result, memory statistics are not updated during blocked
operations.
The fix is extracted from https://github.com/redis/redis/pull/13995;
because it needs to be backported, a separate PR is used for the repair.
2025-05-27 17:14:06 +08:00
guybe7
6349a7c4f9
Add GETRANGE tests with negative indices (#13950)
Inspired by https://github.com/redis/redis/pull/12272
2025-05-27 09:41:28 +08:00
debing.sun
e93b44560c
Resolve bounds checks on cluster_legacy.c (#13970)
Based on https://github.com/valkey-io/valkey/pull/1463 and
https://github.com/valkey-io/valkey/pull/1481

In the full CI
failure (https://github.com/redis/redis/actions/runs/14595343452/job/40979173087?pr=13965)
on version 7.0 we are getting a number of errors like:
```
array subscript ‘clusterMsg[0]’ is partly outside array bounds of ‘unsigned char[2272]’
```

Which is basically GCC telling us that we have an object which is longer
than the underlying storage of the allocation. We actually do this a
lot, but GCC is generally not aware of how big the underlying allocation
is, so it doesn't throw this error. We are specifically getting this
error because the msgBlock can be of variable length depending on the
type of message, but GCC assumes it's the longest one possible. The
solution I went with here was make the message type optional, so that it
wasn't included in the size. I think this also makes some sense, since
it's really just a helper for us to easily cast the object around.

This compilation warning only occurs in version 7.2, because in [this
PR](https://github.com/redis/redis/pull/13073) we started passing
`-flto` to `CFLAGS` by default, and it seems that with `-flto` GCC is
unable to detect such warnings. That change, however, is not present in
version 7.2.
So, to reproduce this compilation warning in versions after 7.2, we can
pass `OPTIMIZATION=-O2` manually.

---------

Co-authored-by: madolson <34459052+madolson@users.noreply.github.com>
2025-05-26 11:52:06 +03:00
Salvatore Sanfilippo
22ebb06eb3
LOLWUT for Redis 8. (#14048)
# Add LOLWUT 8: TAPE MARK I - Computer Poetry Generation

This PR introduces LOLWUT 8, implementing Nanni Balestrini's
groundbreaking TAPE MARK I algorithm from 1962 - one of the first
experiments in computer-generated poetry.

## Background

TAPE MARK I, created by Italian poet Nanni Balestrini and published in
Almanacco Letterario Bompiani (1962), represents a [pioneering moment in
computational creativity](https://en.wikipedia.org/wiki/Digital_poetry).
Using an IBM 7090 mainframe, Balestrini developed an algorithm that
combines verses from three different literary sources:

1. **Diary of Hiroshima** by Michihito Hachiya
2. **The Mystery of the Elevator** by Paul Goldwin  
3. **Tao Te Ching** by Lao Tse

The algorithm selects and arranges verses based on metrical
compatibility rules and ensures alternation between different literary
sources, creating unique poetic combinations with each execution.

## Implementation

This LOLWUT command faithfully reproduces Balestrini's original
algorithm.
The main difference is that the default output is in English, and not in
Italian. However it should be noted that Balestrini used three poems
that were not in Italian anyway, so the translation process was already
part of it. In the English versions, sometimes I operated minimal
changes in order to preserve either the metric, or to make sure that the
sentence stands on its own (like adding "it" before expands rapidly).

## Cultural Significance

TAPE MARK I predates most computational art experiments and demonstrates
the early intersection of literature, technology, and algorithmic
creativity. This implementation honors that pioneering work while making
it accessible to a modern audience through Redis's LOLWUT tradition.

Each execution generates a unique poem, just as Balestrini intended.

Trivia: the original code, running on an IBM 7090, used six minutes to
generate each verse :D

**IMPORTANT** This commit should be back-ported to Redis 8.
2025-05-26 09:27:45 +03:00
Vitah Lin
35e15962b5
Add length check before content comparison in equalStringObjects (#14062)
### Issue 

Previously, when only string equality needed to be determined, the
comparison logic still performed unnecessary `memcmp()` calls to check
string ordering, even when the lengths were not equal.

### Change
This PR adds a length check before content comparison in the

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2025-05-24 11:47:13 +08:00
Yuan Wang
7998a2a05f
To reduce memory churn during client list transfer (#14068)
When we transfer clients between the IO threads and the main thread, the
creation and destruction of list nodes also consumes some CPU. In this
commit, we reuse list nodes to avoid this overhead.
2025-05-23 19:21:29 +08:00
Hüseyin Açacak
645858d518
Add size_t cast for RM_call() in module tests (#14061)
This PR addresses a potential misalignment issue when using `va_args`.
Without this fix,
[argument](9a9aa921bc/src/module.c (L6249-L6264))
values may occasionally become incorrect due to stack alignment
inconsistencies.
2025-05-23 10:10:11 +08:00
Yuan Wang
99d30654e8
Let IO threads free argv and rewrite objects (#13968)
Some objects are allocated in the IO threads; we should let the IO
threads free them, so we can avoid memory arena contention and also
reduce the load on the main thread.

These objects include:
- client argv objects
- the rewrite objects that are `OBJ_ENCODING_RAW` encoded strings only,
since only this type of object is usually allocated by IO threads.

For the implementation, if the client is assigned to IO threads, we
create a `deferred_objects` array of size 32. We put objects into
`deferred_objects` when the main thread wants to free the above objects,
and they are finally freed by the IO threads.
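A hedged sketch of the idea (the name and size come from the PR text, but the struct layout and helper are assumptions, not the actual Redis code):

```c
#include <stddef.h>

#define DEFERRED_OBJECTS_SIZE 32

/* The main thread parks IO-thread-allocated objects here instead of
 * freeing them itself; the owning IO thread frees them later, avoiding
 * allocator arena contention between threads. */
typedef struct {
    void *objs[DEFERRED_OBJECTS_SIZE];
    size_t count;
} deferredObjects;

/* Called on the main thread. Returns 1 if the free was deferred, or
 * 0 if the batch is full and the caller must free the object itself. */
static int deferObjectFree(deferredObjects *d, void *obj) {
    if (d->count == DEFERRED_OBJECTS_SIZE) return 0;
    d->objs[d->count++] = obj;
    return 1;
}
```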
2025-05-23 09:09:58 +08:00
Vitah Lin
d592cb7409
Cleanup redundant declaration of kvstoreDictFetchValue() (#14066) 2025-05-22 20:54:37 +08:00
debing.sun
ba88a7fbb6
Fix crash when freeing newly created node when nodeIp2String fail (#14055)
This PR is a reference from
https://github.com/valkey-io/valkey/pull/1535

In the process of handling the ARM64-with-TLS CI
failure (https://github.com/redis/redis/pull/14024), we found that in a
slow environment the nodes might frequently disconnect and trigger this
assertion.

---------

Co-authored-by: Binbin <binloveplay1314@qq.com>
2025-05-21 10:04:40 +08:00
Moti Cohen
8bd50a3b35
Refine and optimize dbSetValue() (#14040)
Optimize and simplify dbSetValue() by consolidating its code into one place for the case where keepTTL is not required.
2025-05-20 13:23:54 +03:00
Salvatore Sanfilippo
871d4c4004
Test: check always for memory leaks on MacOS. (#14060)
When running the Redis test suite on MacOS, the test detects that the
operating system is able to use "leaks" to check for memory leaks, and
executes this check after every spawned server is terminated.

While we have the ability to run the tests in environments able to
detect memory issues, the fact that it is possible to check for leaks at
every run, basically for free, is very valuable, and allows fixing leaks
immediately on your laptop before submitting a PR.

However, the feature avoided running leaks when no test was run: this
check was added in the early stages of Redis, when all the tests were
like:

server {
   test { ... }
}

So the check counts the number of tests run, and if no test is
executed, no leaks detection is performed. However now we have certain
tests that are in the form:

test {
    server { ... }
}

For instance, just loading a corrupted RDB or the like. In this case, the
leaks test is not executed. This commit removes the check so that the
leaks test is always executed.
2025-05-20 17:46:56 +08:00
Mincho Paskalev
8dfb823c51
Implement DIFF, DIFF1, ANDOR and ONE for BITOP (#13898)
This PR adds 4 new operators to the `BITOP` command - `DIFF`, `DIFF1`,
`ANDOR` and `ONE`. They enable redis clients to atomically do
non-trivial logical operations that are useful for checking membership
of a bitmap against a group of bitmaps.
 
* **DIFF**
    `BITOP DIFF dest srckey1 srckey2 [key...]`

    **Description**
DIFF(*X*, *A1*, *A2*, *...*, *AN*) = *X* ∧ ¬(*A1* ∨ *A2* ∨ *...* ∨
*AN*), i.e. the bits set in *X* that are not set in any of *A1*, *A2*,
*…*, *AN*

    **NOTE**
    Command expects at least 2 source keys.

* **DIFF1**
    `BITOP DIFF1 dest srckey1 srckey2 [key...]`

    **Description**
DIFF1(*X*, *A1*, *A2*, *...*, *AN*) = ¬*X* ∧ (*A1* ∨ *A2* ∨ *...* ∨
*AN*), i.e. the bits set in one or more of *A1*, *A2*, *…*, *AN* that
are not set in *X*

    **NOTE**
    Command expects at least 2 source keys.

* **ANDOR**
    `BITOP ANDOR dest srckey1 srckey2 [key...]`

    **Description**
ANDOR(*X*, *A1*, *A2*, *...*, *AN*) = *X* ∧ (*A1* ∨ *A2* ∨ *...* ∨
*AN*), i.e. the bits set in *X* that are also set in at least one of
*A1*, *A2*, *…*, *AN*

    **NOTE**
    Command expects at least 2 source keys.

* **ONE**
    `BITOP ONE dest key [key...]`

    **Description**
    ONE(*A1*, *A2*, *...*, *AN*) = *X*, where
if *X[i]* is the *i*-th bit of *X*, then *X[i]* = 1 if and only if there
is an *m* such that *A_m[i]* = 1 and *A_n[i]* = 0 for all *n != m*, i.e.
bit *X[i]* is set only if it is set in exactly one of *A1*, *A2*, *...*,
*AN*

**Return value**
As with all other `BITOP` operators, the return value for all the new
ones is the length in bytes of the longest input key.
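The four operators reduce to simple bitwise formulas; a per-byte sketch (a hypothetical helper, not the Redis implementation, which also has an AVX2 fast path):

```c
#include <stdint.h>
#include <stddef.h>

/* Combine the sources into an OR-of-all mask and a "seen in 2+ sources"
 * mask, then derive all four operators for one byte position. */
static void bitopNewOps(uint8_t x, const uint8_t *srcs, size_t n,
                        uint8_t *diff, uint8_t *diff1,
                        uint8_t *andor, uint8_t *one) {
    uint8_t or_all = 0, seen_twice = 0;
    for (size_t i = 0; i < n; i++) {
        seen_twice |= or_all & srcs[i]; /* bits set in 2+ sources */
        or_all |= srcs[i];              /* bits set in any source */
    }
    *diff  = x & (uint8_t)~or_all;          /* in X, in no source      */
    *diff1 = (uint8_t)~x & or_all;          /* in some source, not X   */
    *andor = x & or_all;                    /* in X and in some source */
    *one   = or_all & (uint8_t)~seen_twice; /* in exactly one source   */
}
```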

EDIT:
Besides adding the new commands, a couple more changes were made:
- Added an AVX2 path for more optimized computation of the BITOP
operations (including the new ones)
- Removed the hard limit of max 16 source keys for the fast path to be
used; now, no matter the number of keys, we can enter the fast path
given the keys are long enough.

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2025-05-20 10:45:50 +03:00
Mincho Paskalev
391e3452ca
Optimize hash and zset with lpBatchAppend (#13981)
# Description

Use lpBatchAppend/Insert instead of multiple calls to lpAppend/Insert in
hash and zset

HSET Improvement 2.2%
ZADD Improvement 3.3%
2025-05-20 10:44:46 +03:00
Hyeon Sung
9a9aa921bc
extract duplicated AOF list formatting logic into helper function (#14012)
Separated the repeated logic for iterating and formatting AOF info from
both the history and incremental AOF lists into a new helper function
named appendAofInfoFromList. This improves code readability, reduces
duplication, and makes the getAofManifestAsString function cleaner and
easier to maintain.

No changes in behavior were introduced.
2025-05-20 11:08:03 +08:00
Moti Cohen
51ad2f8d00
Fix keysizes - SPOP with count (case 3) and SETRANGE (#14028)
This commit addresses issues with the keysizes histogram tracking in two
Redis commands:

**SPOP with count (case 3)**
In the spopWithCountCommand function, when handling case 3 (where the
number of elements to return is very large, approaching the size of the
set itself), the keysizes histogram was not being properly updated. This
PR adds the necessary call to updateKeysizesHist() to ensure the
histogram accurately reflects the changes in set size after the
operation.

**SETRANGE command**
Fixed an issue in the setrangeCommand function where the keysizes
histogram wasn't being properly updated when modifying strings. The PR
ensures that the histogram correctly tracks the old and new lengths of
the string after a SETRANGE operation.

Added tests accordingly.
2025-05-19 16:59:21 +03:00
debing.sun
5d0d64b062
Add support to defrag ebuckets incrementally (#13842)
In PR #13229, we introduced the ebucket for HFE.
Before this PR, when updating eitems stored in ebuckets, the lack of
incremental defragmentation support for non-kvstore data structures
(until PR #13814) meant that we had to reverse-lookup the position of
the eitem in the ebucket and then perform the update.
This approach was inefficient, as it often required frequent traversals
of the segment list to locate and update the item.

To address this issue, this PR implements incremental defragmentation
for hash dict ebuckets and server.hexpires.
By incrementally defragging the ebuckets, we also perform
defragmentation for the associated items, eliminating the need for
frequent traversals of the segment list to defrag each eitem.

---------

Co-authored-by: Moti Cohen <moticless@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-05-18 12:38:53 +08:00
chx9
090f6252c4
Fix typo in object.c (#13589) 2025-05-17 12:33:37 +08:00
Moti Cohen
3b19c7919b
Fix test INFO overhead for 32bit architecture (#14035)
This PR fixes the `tests/unit/info.tcl` test to properly handle 32-bit 
architectures by dynamically determining the pointer size based on the 
architecture instead of hardcoding it to 8 bytes.
2025-05-15 12:35:36 +03:00
debing.sun
ae0bb6e82a
Fix internal-secret test flakiness under slow environment (#14024)
In the original test, we started a cluster with 20 instances (10 masters
+ 10 replicas), which led to frequent disconnections of instances in a
slow environment, resulting in an inability to achieve consistency.

This PR reduces the number of instances from 20 to 6.
2025-05-14 16:31:41 +08:00
Vitah Lin
232f2fb077
Include missing getchannels.tcl in moduleapi tests and fix incorrect assertions (#14037) 2025-05-14 08:57:01 +08:00
Mincho Paskalev
6995d8ac17
Fix build flags for dependencies (#14038)
PR https://github.com/redis/redis/pull/13916 introduced a regression:
by overriding the `CFLAGS` and `LDFLAGS` variables for all of the
dependencies, hiredis and fast_float lost some of their compiler/linker
flags.

This PR makes it so we can pass additional CFLAGS/LDFLAGS to hiredis
without overriding them, since it has a somewhat more complex Makefile.
As for fast_float, passing CFLAGS/LDFLAGS from outside no longer breaks
the expected behavior.

The build step in the CI was changed so that MacOS is now built with
TLS, to catch such errors in the future.
2025-05-13 16:56:22 +03:00
Ozan Tezcan
a0b22576b8
Fix flaky replication test (#14034)
- Fix flaky replication test which checks memory usage on master
- Fix comments in another replication test
2025-05-13 13:29:27 +03:00
Yuan Wang
d5f7672b77
Make IO thread and main thread process in parallel and reduce notifications (#13969)
In pipeline mode, especially with TLS, two IO threads may perform worse
than a single thread. One reason is that the IO threads and the main
thread cannot process in parallel. Now, the IO threads will deliver
clients once the pending client list exceeds 16 entries, instead of
finishing processing all clients first; this approach lets the IO
threads and the main thread process in parallel as much as possible.

IO threads may also do some unnecessary notification of the main thread.
The notification is based on eventfd, and read(2) and write(2) on an
eventfd are costly system calls. While the threads are running, they can
check the pending client list to process in `beforeSleep`, so in this
commit, if both the main thread and the IO thread are running, they can
pass the client without notification, and these transferred clients will
be processed in `beforeSleep`.
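The two decisions can be sketched as predicates (the threshold comes from the PR text; the helper names and exact conditions are assumptions, not the actual event-loop code):

```c
#include <stdbool.h>
#include <stddef.h>

#define DELIVER_BATCH 16

/* Deliver clients early instead of waiting until all clients are
 * processed, so the peer thread can work in parallel. */
static bool shouldDeliverNow(size_t pending_clients) {
    return pending_clients > DELIVER_BATCH;
}

/* Skip the costly eventfd write(2) when the peer thread is running,
 * since it will poll the pending client list in beforeSleep() anyway. */
static bool shouldNotifyPeer(bool peer_running) {
    return !peer_running;
}
```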
2025-05-13 15:13:48 +08:00
Salvatore Sanfilippo
65e164caff
[Vector sets] More rdb loading fixes (#14032)
Hi all, this PR fixes two things:

1. An assertion that prevented the RDB loading from recovering if there
was a quantization type mismatch (with regression test).
2. Two code paths that just returned NULL without proper cleanup during
RDB loading.
2025-05-12 21:57:38 +03:00
Moti Cohen
e1789e4368
keyspace - Unify key and value & use dict no_value=1 (#13806)
The idea of packing the key (`sds`), value (`robj`) and optionally TTL
into a single struct in memory was mentioned a few times in the past by
the community in various flavors. This approach improves memory
efficiency, reduces pointer dereferences for faster lookups, and
simplifies expiration management by keeping all relevant data in one
place. This change goes along with setting the keyspace's dict to
no_value=1, saving a considerable amount of memory.

Two more motivations that well aligned with this unification are:

- Prepare the groundwork for replacing EXPIRE scan based implementation
and evaluate instead new `ebuckets` data structure that was introduced
as part of [Hash Field Expiration
feature](https://redis.io/blog/hash-field-expiration-architecture-and-benchmarks/).
Using this data structure requires embedding the ExpireMeta structure
within each object.
- Consider replacing dict with a more space efficient open addressing
approach hash table that might rely on keeping a single pointer to
object.

Before this PR, I POC'ed on a variant of open addressing hash-table and
was surprised to find that dict with no_value actually could provide a
good balance between performance, memory efficiency, and simplicity.
This realization prompted the separation of the unification step from
the evaluation of a new hash table to avoid introducing too many changes
at once and to evaluate its impact independently before considering
replacement of existing hash-table. On an earlier
[commit](https://github.com/redis/redis/pull/13683) I extended dict
no_value optimization (which saves keeping dictEntry where possible) to
be relevant also for objects with even addresses in memory. Combining it
with this unification saves a considerable amount of memory for
keyspace.

# kvobj
This PR adopts Valkey’s
[packing](3eb8314be6)
layout and logic for key, value, and TTL. However, unlike Valkey
implementation, which retained a common `robj` throughout the project,
this PR distinguishes between the general-purpose, overused `robj`, and
the new `kvobj`, which embeds both the key and value and is used by the
keyspace. Conceptually, `robj` serves as a base class, while `kvobj`
acts as a derived class.

Two new flags are introduced into the redis object, `iskvobj` and `expirable`:
```
struct redisObject {
    unsigned type:4;
    unsigned encoding:4;
    unsigned lru:LRU_BITS;
    unsigned iskvobj : 1;             /* new flag */
    unsigned expirable : 1;           /* new flag */
    unsigned refcount : 30;           /* modified: 32bits->30bits */
    void *ptr;
};

typedef struct redisObject robj;
typedef struct redisObject kvobj;
```
When the `iskvobj` flag is set, the object also includes the key, which
is appended to the end of the object. If the `expirable` flag is set, an
additional 8 bytes are added to the object. If the object is of type
string, and the string is rather short, then it will be embedded as
well.

As a result, all keys in the keyspace are promoted to be of type
`kvobj`. This term attempts to align with the existing Redis object,
robj, and the kvstore data structure.

# EXPIRE Implementation
As `kvobj` embeds the expiration time as well, looking up expiration
times is now an O(1) operation. The EXPIRE hash table is now also set to
`no_value` mode, directly referencing `kvobj` entries, which in turn
saves memory.

Next, I plan to evaluate replacing the EXPIRE implementation with the
[ebuckets](https://github.com/redis/redis/blob/unstable/src/ebuckets.h)
data structure, which would eliminate keyspace scans for expired keys.
This requires embedding `ExpireMeta` within each `kvobj` of each key
with expiration. In such implementation, the `expirable` flag will be
shifted to indicate whether `ExpireMeta` is attached.


# Implementation notes

## Manipulating keyspace (find, modify, insert)
Initially, unifying the key and value into a single object and storing
it in a dict with the `no_value` optimization seemed like a quick win.
However, it (quickly) became clear that this change required deeper
modifications to how keys are manipulated. The challenge was handling
cases where the dictEntry is opted out due to the no_value optimization.
In such cases, many of the APIs that return the dictEntry from a lookup
become insufficient, as the result might just be the key itself. To
address this issue, a new-old approach is used: return a "link" to the
looked-up key's `dictEntry` instead of the `dictEntry` itself. The term
`link` was already somewhat available in the dict API, and is well
aligned with the new dictEntLink declaration:
```
typedef dictEntry **dictEntLink;
```
This PR introduces two new function APIs to dict to leverage returned
link from the search:
```
dictEntLink dictFindLink(dict *d, const void *key, dictEntLink *bucket);
void dictSetKeyAtLink(dict *d, void *key, dictEntLink *link, int newItem);
```
After calling `link = dictFindLink(...)`, any necessary updates must be
performed immediately after by calling `dictSetKeyAtLink()` without any
intervening operations on the given dict. Otherwise, the `dictEntLink`
may become invalid. Example:
```
/* replace existing key */
link = dictFindLink(d, key, &bucket);
// ... Do something, but don't modify the dict ...
// assert(link != NULL);
dictSetKeyAtLink(d, kv, &link, 0);

/* Add new value (If there is no space for the new key, the dict will be
   expanded and the bucket will be looked up again.) */
link = dictFindLink(d, key, &bucket);
// ... Do something, but don't modify the dict ...
// assert(link == NULL);
dictSetKeyAtLink(d, kv, &bucket, 1);
```
## dict.h 
- The dict API had become cluttered with many unused functions; I have
removed these from dict.h.
- Additionally, APIs specifically related to hash maps (no_value=0),
primarily those handling key-value access, have been gathered and
isolated.
- Entirely removed internal functions ending with “*ByHash()” that were
originally added for optimization and are no longer required.
- A few other legacy dict functions were adapted at the API level to
work with the term dictEntLink as well.
- Simplified and generalized an optimization related to comparing the
lengths of string keys.

## Hash Field Expiration
Until now, each hash object with expiration on its fields had to
maintain a reference to its own key name, so that if a field was
actively expired, the key name could be resolved for the expiration
notification. This is no longer necessary.

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2025-05-12 10:15:17 +03:00
Mincho Paskalev
4d9b4d6e51
Input output traffic stats and command process count for each client. (#13944) 2025-05-09 16:55:47 +03:00
Mincho Paskalev
fdbf88032c
Add MSan and integrate it with CI (#13916)
## Description
Memory Sanitizer (MSan) detects use of uninitialized memory. While
Address Sanitizer catches a wide range of memory-safety issues, it does
not specifically detect uninitialized-memory usage, so Memory Sanitizer
complements it. This PR adds an MSan run to the daily build, with the
possibility of incorporating it into the ci.yml workflow in the future
if needed.

The changes in the source files fix false-positive issues and should not
have any runtime implications.

Note: Valgrind performs checks similar to both ASan and MSan, but the
sanitizers run significantly faster.

## Limitations
- Memory sanitizer is only supported by Clang.
- MSAN documentation states that all dependencies, including the
standard library, must be compiled with MSAN. However, it also mentions
there are interceptors for common libc functions, so compiling the
standard library with the MSAN flag is not strictly necessary.
Therefore, we are not compiling libc with MSAN.

---------

Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com>
2025-05-09 11:44:54 +03:00
Vitah Lin
8148e4116e
Remove redundant HASH_SET_COPY macro and hfieldGetExpireTime declaration (#13991) 2025-05-09 15:22:28 +08:00
George Padron
538713e622
Fix minor grammatical error (#14022) 2025-05-09 15:20:56 +08:00
Yuan Wang
191afb8903
Reclaim page cache memory used by the AOF file after loading (#13811)
We can reclaim the page cache memory used by the AOF file after loading,
since we don't read the AOF again; this corresponds to
https://github.com/redis/redis/pull/11248

In a test after loading a 9.5GB AOF, this PR uses much less
`buff/cache` than unstable.

**Unstable**
```
$ free -m
               total        used        free      shared  buff/cache   available
Mem:           31293       16181        4562          13       10958       15111
Swap:              0           0           0
```
**This PR**
```
$ free -m
               total        used        free      shared  buff/cache   available
Mem:           31293       15391       15854          13         439       15902
Swap:              0           0           0
```

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2025-05-09 10:05:37 +08:00
Yuan Wang
714ea20fec
Delete the tmp file that was accidentally submitted (#14027)
#13809 accidentally submitted a tmp rdb file, so delete it
2025-05-09 09:48:44 +08:00
Will Johnston
98335e1237
doc: adding TOC to readme, improving usability (#14021)
The PR aims to improve the README usability for new users as well as
developers looking to go in depth.

Key improvements include:

- **Structure & Navigation:**
  - Introduces a detailed Table of Contents for easier navigation.
  - Improved overall organization of sections.
- **Content:**
  - Expanded "What is Redis?" with a section for "Key use cases"
  - Expanded the "Why choose Redis?" section
  - New "Getting started" section, including Redis starter projects and ordering of sections based on desired use for new users
  - Changes to the "Redis data types, processing engines, and capabilities" section for better readability and consistency
  - Formatting markdown blocks to specify language
2025-05-08 22:05:21 +03:00
Moti Cohen
30d5f05637
Fix various KEYSIZES enumeration issues (#13923)
There are several issues with maintaining histogram counters.

Ideally, the hooks would be placed in the low-level datatype
implementations. However, this logic is triggered in various contexts
and doesn’t always map directly to a stored DB key. As a result, the
hooks sit closer to the high-level commands layer. It’s a bit messy, but
the right way to ensure histogram counters behave correctly is through
broad test coverage.

* Fix inaccuracies around deletion scenarios.
* Fix inaccuracies around modules calls. Added corresponding tests.
* The info-keysizes.tcl test has been extended to operate on meaningful
datasets
* Validate histogram correctness in edge cases involving collection
deletions.
* Add new macro debugServerAssert(). Effective only if compiled with
DEBUG_ASSERTIONS.

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2025-05-08 10:59:12 +03:00
Yuan Wang
6a436b6f72
Redis-cli gets RDB by RDB channel (#13809)
Now that we have the RDB channel from
https://github.com/redis/redis/pull/13732, the child process can
transfer the RDB in the background instead of it being handled by the
main thread. So when redis-cli gets an RDB from the server, we can adopt
this approach to reduce the main thread's load.

---------

Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com>
2025-05-08 08:47:29 +08:00
Salvatore Sanfilippo
a46624e10e
[Vector sets] RDB IO errors handling (#13978)
This PR adds support for REDISMODULE_OPTIONS_HANDLE_IO_ERRORS,
and tests for short reads and corrupted RESTORE payloads.

Please note that I also removed the comment about async-loading support,
since we should already be covered: there is no manipulation of global
data structures in Vector Sets, except for the unique ID used to create
new vector sets with different IDs.
2025-05-07 21:49:00 +03:00
debing.sun
ac0bef15b5
Correctly update kvstore overhead after emptying or releasing dict (#13984)
Close #13973

This PR fixes two bugs.
1)  `overhead_hashtable_lut` isn't updated correctly
    This bug was introduced by https://github.com/redis/redis/pull/12913
We only updated `overhead_hashtable_lut` at the beginning and end of
rehashing, but forgot to update it when a dict is emptied or
released.

This PR introduces a new `bucketChanged` callback to track changes in
the bucket size.
The `rehashingStarted` and `rehashingCompleted` callbacks are no longer
responsible for bucket changes; these are handled entirely by
`bucketChanged`. This also avoids having to register three callbacks to
track bucket-size changes; now only one is needed.

In most cases it is triggered together with `rehashingStarted` or
`rehashingCompleted`, except when a dict is being emptied or released;
in those cases, even if the dict is not rehashing, we still need to
subtract its current size.

On the other hand, `overhead_hashtable_lut` duplicated `bucket_count`,
so we removed `overhead_hashtable_lut` and use `bucket_count` instead.

Note that this bug only happens with cluster mode, because we don't use
KVSTORE_FREE_EMPTY_DICTS without cluster.

2) The size of `dict_size_index` was double-counted in memory usage.
`dict_size_index` is created at startup, so its memory usage is already
counted in `used_memory_startup`.
However, when counting the overhead we repeated the calculation, which
could make the overhead exceed the total memory usage.

---------

Co-authored-by: Yuan Wang <yuan.wang@redis.com>
2025-05-07 16:45:23 +08:00
Alexander Gorbulya
97d7d2f865
Fix typo in replication state log message (#13805)
The log message incorrectly referred to the expected state as
`RECEIVE_PSYNC`,
while it should be `RECEIVE_PSYNC_REPLY`. This aligns the log with the
actual state check.
2025-05-07 15:28:45 +08:00
Yuan Wang
57a5f51f26
Reduce the call of ERR_clear_error (#13903)
Flame graphs show that `ERR_clear_error` consumes significant CPU in TLS
mode, and some calls to it are duplicates: in `tlsHandleEvent` we call
`ERR_clear_error`, but we also call it when reading and writing, so the
extra call is unnecessary.

Benchmarks show this commit brings a 2-3% performance improvement.
2025-05-07 15:24:08 +08:00
Eran Hadad
a3f1d09a7d
Update TS, JSON and Bloom to 8.0.1 (#14013) 2025-05-06 21:20:29 +03:00
alonre24
14578b3b8b
RQE - bump version to 8.0.1 (#14011) 2025-05-06 21:19:43 +03:00
chx9
11954d925e
Fix sds leak in slaveTryPartialResynchronization (#13996)
1. Fix an sds leak in `slaveTryPartialResynchronization`
2. Delete incorrect comments
2025-05-06 21:53:52 +08:00
Lior Kogan
2668356595
LICENSE.txt wrongly included the text of GPLv3 instead of AGPLv3 (#14010) 2025-05-06 14:45:36 +03:00
Vitah Lin
47505c3533
Fix 'Client output buffer hard limit is enforced' test causing infinite loop (#13934)
This PR fixes an issue in the CI test for client-output-buffer-limit,
which was causing an infinite loop when running on macOS 15.4.

### Problem

This test starts two clients, R and R1:
```
R1 subscribe foo
R publish foo bar
```

When R executes `PUBLISH foo bar`, the server first stores the message
`bar` in R1's buf. Only when the space in buf is insufficient does it
call `_addReplyProtoToList`.
Inside this function, `closeClientOnOutputBufferLimitReached` is invoked
to check whether client R1's output buffer has reached its configured
limit.
On macOS 15.4, because the server writes to the client at high speed,
R1's buf never gets full. As a result,
`closeClientOnOutputBufferLimitReached` is never triggered,
causing the test to never exit and fall into an infinite loop.
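
For context, the limit the test exercises is the pubsub class of
`client-output-buffer-limit`; the defaults shipped in redis.conf look
like this (for pubsub clients, a 32mb hard limit and an 8mb soft limit
sustained for 60 seconds):
```
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit replica 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
```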

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2025-05-06 10:44:16 +08:00