redis/tests/unit
Yuan Wang 70a079db5e
Improve multithreaded performance with memory prefetching (#14017)
This PR is based on: https://github.com/valkey-io/valkey/pull/861

> ### Memory Access Amortization
> (Designed and implemented by [dan touitou](https://github.com/touitou-dan))
> 
> Memory Access Amortization (MAA) is a technique designed to optimize
the performance of dynamic data structures by reducing the impact of
memory access latency. It is applicable when multiple operations need to
be executed concurrently. The principle behind it is that for certain
dynamic data structures, executing operations in a batch is more
efficient than executing each one separately.
> 
> Rather than executing operations sequentially, this approach
interleaves the execution of all operations. This is done in such a way
that whenever a memory access is required during an operation, the
program prefetches the necessary memory and transitions to another
operation. This ensures that when one operation is blocked awaiting
memory access, other memory accesses are executed in parallel, thereby
reducing the average access latency.
> 
> We applied this method in the development of dictPrefetch, which takes
as parameters a vector of keys and dictionaries. It ensures that all
memory addresses required to execute dictionary operations for these
keys are loaded into the L1-L3 caches when executing commands.
Essentially, dictPrefetch is an interleaved execution of dictFind for
all the keys.

### Implementation in Redis
When the main thread processes clients with ready-to-execute commands
(i.e., clients whose commands the IO thread has already parsed), it
creates a batch of up to 16 commands. First, each command's argv, which
was allocated by the IO thread, is prefetched into the main thread's L1
cache. Then all the dict entries and values required by the commands are
prefetched from the dictionary before the commands are executed.

#### Memory prefetching for main hash table
As shown in the picture, after https://github.com/redis/redis/pull/13806
the key and value are unified into a single kv object and the dict uses
the no_value optimization, so memory prefetching takes 4 steps:

1. prefetch the bucket of the hash table
2. prefetch the entry associated with the given key's hash
3. prefetch the kv object of the entry
4. prefetch the value data of the kv object

We also need to handle the case where the dict entry is itself the
pointer to the kv object; in that case, step 3 is skipped.
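
The four steps, including the skip of step 3, can be sketched as a small per-key state machine. This is only an illustration under stated assumptions, not the code from this PR: the types `kvobj`, `entry`, `table`, and `lookup`, the low-bit tag that marks a bucket holding a kv pointer directly, and the functions `advance` and `prefetch_batch` are all hypothetical. The sketch ignores collision chains and key comparison, and relies on the GCC/Clang builtin `__builtin_prefetch`.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy types (hypothetical); they only mimic the shapes described above. */
typedef struct kvobj { const char *key; void *value; } kvobj;
typedef struct entry { kvobj *kv; } entry;

/* A bucket holds 0 (empty), an entry pointer, or a kvobj pointer with the
 * low bit set (assumed tagging scheme for the "entry is the kv pointer" case). */
typedef struct table { uintptr_t *buckets; size_t mask; } table;

typedef enum { ST_BUCKET, ST_ENTRY, ST_KVOBJ, ST_VALUE, ST_DONE } state;

typedef struct lookup {
    uint64_t hash;      /* precomputed hash of the key */
    state st;           /* next step to run for this key */
    uintptr_t *bucket;
    entry *e;
    kvobj *kv;
} lookup;

/* Run exactly one step for one key: issue a prefetch, then yield. */
static void advance(const table *t, lookup *l) {
    switch (l->st) {
    case ST_BUCKET:                       /* step 1: the bucket */
        l->bucket = &t->buckets[l->hash & t->mask];
        __builtin_prefetch(l->bucket);
        l->st = ST_ENTRY;
        break;
    case ST_ENTRY: {                      /* step 2: the entry */
        uintptr_t p = *l->bucket;         /* bucket was prefetched earlier */
        if (p == 0) { l->st = ST_DONE; break; }
        if (p & 1) {
            /* The entry is the kv pointer itself: skip step 3. */
            l->kv = (kvobj *)(p & ~(uintptr_t)1);
            __builtin_prefetch(l->kv);
            l->st = ST_VALUE;
        } else {
            l->e = (entry *)p;
            __builtin_prefetch(l->e);
            l->st = ST_KVOBJ;
        }
        break;
    }
    case ST_KVOBJ:                        /* step 3: the kv object */
        l->kv = l->e->kv;
        __builtin_prefetch(l->kv);
        l->st = ST_VALUE;
        break;
    case ST_VALUE:                        /* step 4: the value data */
        __builtin_prefetch(l->kv->value);
        l->st = ST_DONE;
        break;
    case ST_DONE:
        break;
    }
}

/* Round-robin over all keys so that one key's memory wait overlaps with
 * the other keys' steps. Returns the total number of steps executed. */
static size_t prefetch_batch(const table *t, lookup *ls, size_t n) {
    size_t steps = 0, done = 0;
    for (size_t i = 0; i < n; i++) {
        ls[i].st = ST_BUCKET;
        ls[i].bucket = NULL; ls[i].e = NULL; ls[i].kv = NULL;
    }
    while (done < n) {
        for (size_t i = 0; i < n; i++) {
            if (ls[i].st == ST_DONE) continue;
            advance(t, &ls[i]);
            steps++;
            if (ls[i].st == ST_DONE) done++;
        }
    }
    return steps;
}
```

Each call to `advance` touches only memory that an earlier step already requested, which is what lets the round-robin loop in `prefetch_batch` overlap the memory latency of different keys.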

MAA can improve single-threaded memory access efficiency by
interleaving the execution of multiple independent operations, which
allows memory-level parallelism and better CPU utilization. Its key
point is batch-wise interleaved execution: a batch of independent
operations (such as multiple key lookups) is split into multiple state
machines, and their progress is interleaved within a single thread to
hide the memory access latency of individual requests.

The difference between serial execution and interleaved execution:

**naive serial execution**
```
key1: step1 → wait → step2 → wait → done
key2: step1 → wait → step2 → wait → done
```
**interleaved execution**
```
key1: step1   → step2   → done
key2:   step1 → step2   → done
key3:     step1 → step2 → done
         ↑ While waiting for key1’s memory, progress key2/key3
```

#### New configuration
This PR introduces a new configuration, `prefetch-batch-max-size`.
Because it is a low-level optimization, the config is hidden. Its
description:
When multiple commands are parsed by the I/O threads and ready for
execution, we take advantage of knowing the next set of commands and
prefetch their required dictionary entries in a batch. This reduces
memory access costs. The optimal batch size depends on the user's
specific workload. The default batch size is 16, which can be
modified using the `prefetch-batch-max-size` config.
When the config is set to 0, prefetching is disabled.
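
How a batch limit like this could shape the main loop is sketched below. It is a minimal sketch, not the PR's code: the helpers `next_batch_size` and `count_batches` and the macro name are hypothetical. Only the default of 16 and the "0 disables prefetching" rule come from the description above.

```c
#include <stddef.h>

/* Default stated in the config description; the macro name is hypothetical. */
#define PREFETCH_BATCH_MAX_SIZE_DEFAULT 16

/* How many of the `ready` commands to prefetch together in the next batch.
 * Returns 0 when prefetching is disabled (batch_max == 0). */
static size_t next_batch_size(size_t ready, size_t batch_max) {
    if (batch_max == 0) return 0;            /* config set to 0: disabled */
    return ready < batch_max ? ready : batch_max;
}

/* Number of prefetch batches needed to drain `ready` commands. */
static size_t count_batches(size_t ready, size_t batch_max) {
    size_t batches = 0;
    while (ready > 0) {
        size_t n = next_batch_size(ready, batch_max);
        if (n == 0) break;                   /* disabled: no batches at all */
        ready -= n;
        batches++;
    }
    return batches;
}
```

With the default limit, 40 ready commands would be handled as batches of 16, 16, and 8. With a limit of 0 no batch is formed, and in this sketch the commands would simply run without prefetching.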

---------

Co-authored-by: Uri Yagelnik <uriy@amazon.com>
Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com>
2025-06-05 08:57:43 +08:00
..
cluster Fix internal-secret test flakiness under slow environment (#14024) 2025-05-14 16:31:41 +08:00
moduleapi Add thread sanitizer run to daily CI (#13964) 2025-06-02 10:13:23 +03:00
type Add GETRANGE tests with negative indices (#13950) 2025-05-27 09:41:28 +08:00
acl-v2.tcl Fix Read/Write key pattern selector (CVE-2024-51741) 2025-01-13 21:20:19 +02:00
acl.tcl Free current client asynchronously after user permissions changes (#13274) 2024-05-30 22:09:30 +08:00
aofrw.tcl Attempt to solve MacOS CI issues in GH Actions (#12013) 2023-04-12 09:19:21 +03:00
auth.tcl Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
bitfield.tcl Add BITFIELD_RO basic tests for non-repl use cases (#12187) 2023-05-18 12:16:46 +03:00
bitops.tcl Implement DIFF, DIFF1, ANDOR and ONE for BITOP (#13898) 2025-05-20 10:45:50 +03:00
client-eviction.tcl Add thread sanitizer run to daily CI (#13964) 2025-06-02 10:13:23 +03:00
dump.tcl Fix RESTORE with TTL (#14071) 2025-05-28 08:02:10 +03:00
expire.tcl Fix dictionary use-after-free in active expire and make kvstore iter to respect EMPTY flag (#13135) 2024-03-18 17:41:54 +02:00
functions.tcl Trigger Lua GC after script loading (#13407) 2024-07-16 09:28:47 +08:00
geo.tcl adding geo command edge cases tests (#12274) 2023-06-20 12:50:03 +03:00
hyperloglog.tcl Fix bug in PFMERGE command (#13672) 2024-12-18 14:41:04 +08:00
info-command.tcl Make INFO command variadic (#6891) 2022-02-08 13:14:42 +02:00
info-keysizes.tcl Fix keysizes - SPOP with count (case 3) and SETRANGE (#14028) 2025-05-19 16:59:21 +03:00
info.tcl Fix test INFO overhead for 32bit architecture (#14035) 2025-05-15 12:35:36 +03:00
introspection-2.tcl RED-129256, Fix TOUCH command from script in no-touch mode (#13512) 2024-09-12 11:33:26 +03:00
introspection.tcl Input output traffic stats and command process count for each client. (#13944) 2025-05-09 16:55:47 +03:00
keyspace.tcl Prevent pattern matching abuse (CVE-2024-31228) 2024-10-08 20:55:44 +03:00
latency-monitor.tcl Add printing for LATENCY related tests (#12514) 2023-08-27 11:42:55 +03:00
lazyfree.tcl Fix timing issue in lazyfree test (#13926) 2025-04-13 20:32:16 +08:00
limits.tcl Improve test suite to handle external servers better. (#9033) 2021-06-09 15:13:24 +03:00
maxmemory.tcl Add thread sanitizer run to daily CI (#13964) 2025-06-02 10:13:23 +03:00
memefficiency.tcl Add thread sanitizer run to daily CI (#13964) 2025-06-02 10:13:23 +03:00
multi.tcl Fix propagation of entries_read by calling streamPropagateGroupID unconditionally (#12898) 2024-02-29 09:48:20 +02:00
networking.tcl Improve multithreaded performance with memory prefetching (#14017) 2025-06-05 08:57:43 +08:00
obuf-limits.tcl Fix 'Client output buffer hard limit is enforced' test causing infinite loop (#13934) 2025-05-06 10:44:16 +08:00
oom-score-adj.tcl Check user's oom_score_adj write permission for oom-score-adj test (#13111) 2024-03-05 14:42:28 +02:00
other.tcl Cluster compatibility check (#13846) 2025-03-20 10:35:53 +08:00
pause.tcl Fix potential infinite loop of RANDOMKEY during client pause (#13863) 2025-03-20 21:32:12 +08:00
printver.tcl Print version info before running the test 2011-05-20 11:44:54 +02:00
protocol.tcl Fix crash due to cron argv release (#13725) 2025-01-08 09:57:23 +08:00
pubsub.tcl Fix order of KSN for hgetex command (#13931) 2025-04-14 13:31:31 +03:00
pubsubshard.tcl Async IO Threads (#13695) 2024-12-23 14:16:40 +08:00
querybuf.tcl Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
quit.tcl flushSlavesOutputBuffers should not write to replicas scheduled to drop (#12242) 2023-06-12 14:05:34 +03:00
replybufsize.tcl Introduce debug command to disable reply buffer resizing (#10360) 2022-03-01 14:40:29 +02:00
scan.tcl Revert "improve performance for scan command when matching data type (#12395)" 2025-02-05 20:49:42 +02:00
scripting.tcl Fix memory leak of jemalloc tcache on function flush command (#13661) 2024-11-21 14:12:58 +03:00
shutdown.tcl Tests: Do not save an RDB by default and add a SIGTERM default AOFRW test (#12064) 2023-04-18 16:14:26 +03:00
slowlog.tcl Exit early if slowlog/acllog max len set to zero (#12965) 2024-01-22 16:01:04 -08:00
sort.tcl Fix get # option in sort command (#13608) 2024-10-22 09:55:00 +08:00
tls.tcl Add support for reading encrypted keyfiles. (#8644) 2021-03-22 13:27:46 +02:00
tracking.tcl Bump codespell from 2.2.4 to 2.2.5 (#12557) 2023-09-08 16:10:17 +03:00
violations.tcl Run large-memory tests as solo. (#10626) 2022-04-24 17:29:35 +03:00
wait.tcl WAITAOF: Update fsynced_reploff_pending even if there's nothing to fsync (#12622) 2023-09-28 17:19:20 +03:00