Commit graph

298 commits

Author SHA1 Message Date
Slavomir Kaslev
ca681f997e
Add LTRIM/LREM and RM_StringTruncate() memory tracking tests (#14751)
Add LTRIM/LREM and RM_StringTruncate() memory tracking tests.
2026-01-29 13:04:46 +02:00
Paulo Sousa
c4baa64ea8
Optimize peak memory stats by switching from per-command checks to threshold-based (#14692)
Some checks are pending
CI / test-ubuntu-latest (push) Waiting to run
CI / test-sanitizer-address (push) Waiting to run
CI / build-debian-old (push) Waiting to run
CI / build-macos-latest (push) Waiting to run
CI / build-32bit (push) Waiting to run
CI / build-libc-malloc (push) Waiting to run
CI / build-centos-jemalloc (push) Waiting to run
CI / build-old-chain-jemalloc (push) Waiting to run
Codecov / code-coverage (push) Waiting to run
External Server Tests / test-external-standalone (push) Waiting to run
External Server Tests / test-external-cluster (push) Waiting to run
External Server Tests / test-external-nodebug (push) Waiting to run
Spellcheck / Spellcheck (push) Waiting to run
This PR optimizes peak memory tracking by moving from **per-command
checks** to a **threshold-based mechanism** in `zmalloc`.

Instead of updating peak memory on every command, peak tracking is now
triggered only when a thread's memory delta exceeds **100KB**. This
reduces runtime overhead while keeping peak memory accuracy acceptable.

## Implementation Details

- Peak memory is tracked atomically in `zmalloc` when a thread's memory
delta exceeds 100KB
- Thread-safe peak updates using CAS
- Peak tracking considers both:
  - current used memory
  - zmalloc-reported peak memory

## Performance Results (ARM AArch64)

All performance numbers were obtained on an **AWS m8g.metal (ARM
AArch64)** instance.

The database was pre-populated with **1M keys**, each holding a **1KB
value**.
Benchmarks were executed using memtier with a **10 SET : 90 GET ratio**
and **pipeline = 10** ([full benchmark spec.
here](https://github.com/redis/redis-benchmarks-specification/blob/main/redis_benchmarks_specification/test-suites/memtier_benchmark-1Mkeys-string-setget200c-1KiB-pipeline-10.yml)).

| Environment | Baseline `redis/redis` unstable (median ± std.dev) |
Comparison `paulorsousa/redis`
`f05a4bd273cb4d63ff03d33e6207837b6e51de86` (median) | % change (higher
better) | Note |

|------------------------------|----------------------------------------------------|----------------------------------------------------------------------------------:|--------------------------|-----------------------|
| oss-standalone | 802,830 ± 0.2% (7 datapoints) | 796,660 | -0.8% | No
change |
| oss-standalone-02-io-threads | 982,698 ± 0.6% (7 datapoints) | 980,520
| -0.2% | No change |
| oss-standalone-04-io-threads | 2,573,244 ± 1.9% (7 datapoints) |
2,630,931 | +2.2% | Potential improvement |
| oss-standalone-08-io-threads | 2,343,609 ± 1.6% (7 datapoints) |
2,455,630 | +4.8% | Improvement |
2026-01-21 22:52:31 +08:00
Mincho Paskalev
e3c38aab66
Handle primary/replica clients in IO threads (#14335)
# Problem

While introducing Async IO
threads(https://github.com/redis/redis/pull/13695) primary and replica
clients were left to be handled inside main thread due to data race and
synchronization issues. This PR solves this issue with the additional
hope it increases performance of replication.

# Overview

## Moving the clients to IO threads

Since clients first participate in a handshake and an RDB replication
phases it was decided they are moved to IO-thread after RDB replication
is done. For primary client this was trivial as the master client is
created only after RDB sync (+ some additional checks one can see in
`isClientMustHandledByMainThread`). Replica clients though are moved to
IO threads immediately after connection (as are all clients) so
currently in `unstable` replication happens while this client is in
IO-thread. In this PR it was moved to main thread after receiving the
first `REPLCONF` message from the replica, but it is a bit hacky and we
can remove it. I didn't find issues between the two versions.

## Primary client (replica node)

We have few issues here:
- during `serverCron` a `replicationCron` is ran which periodically
sends `REPLCONF ACK` message to the master, also checks for timed-out
master. In order to prevent data races we utilize`IOThreadClientsCron`.
The client is periodically sent to main thread and during
`processClientsFromIOThread` it's checked if it needs to run the
replication cron behaviour.

- data races with main thread - specifically `lastinteraction` and
`read_reploff` members of the primary client that are written to in
`readQueryFromClient` could be accessed at the same time from main
thread during execution of `INFO REPLICATION`(`genRedisInfoString`). To
solve this the members were duplicated so if the client is in IO-thread
it writes to the duplicates and they are synced with the original
variables each time the client is send to main thread ( that means `INFO
REPLICATION` could potentially return stale values).

- During `freeClient` the primary client is fetched to main thread but
when caching it(`replicationCacheMaster`) the thread id will remain the
id of the IO thread it was from. This creates problems when resurrecting
the master client. Here the call to `unbindClientFromIOThreadEventLoop`
in `freeClient` was rewritten to call `keepClientInMainThread` which
automatically fixes the problem.

- During `exitScriptTimedoutMode` the master is queued for reprocessing
(specifically process any pending commands ASAP after it's unblocked).
We do that by putting it in the `server.unblocked_clients` list, which
are processed in the next `beforeSleep` cycle in main thread. Since this
will create a contention between main and IO thread, we just skip this
queueing in `unblocked_clients` and just queue the client to main thread
- the `processClientsFromIOThread` will process the pending commands
just as main would have.

## Replica clients (primary node)

We move the client after RDB replication is done and after replication
backlog is fed with its first message.
We do that so that the client's reference to the first replication
backlog node is initialized before it's read from IO-thread, hence no
contention with main thread on it.

### Shared replication buffer

Currently in unstable the replication buffer is shared amongst clients.
This is done via clients holding references to the nodes inside the
buffer. A node from the buffer can be trimmed once each replica client
has read it and send its contents. The reference is
`client->ref_repl_buf_node`. The replication buffer is written to by
main thread in `feedReplicationBuffer` and the refcounting is intrusive
- it's inside the replication-buffer nodes themselves.

Since the replica client changes the refcount (decreases the refcount of
the node it has just read, and increases the refcount of the next node
it starts to read) during `writeToClient` we have a data race with main
thread when it feeds the replication buffer. Moreover, main thread also
updates the `used` size of the node - how much it has written to it,
compared to its capacity which the replica client relies on to know how
much to read. Obviously replica being in IO-thread creates another data
race here. To mitigate these issues a few new variables were added to
the client's struct:

- `io_curr_repl_node` - starting node this replica is reading from
inside IO-thread
- `io_bound_repl_node` - the last node in the replication buffer the
replica sees before being send to IO-thread.

These values are only allowed to be updated in main thread. The client
keeps track of how much it has read into the buffer via the old
`ref_repl_buf_node`. Generally while in IO-thread the replica client
will now keep refcount of the `io_curr_repl_node` until it's processed
all the nodes up to `io_bound_repl_node` - at that point its returned to
main thread which can safely update the refcounts.
The `io_bound_repl_node` reference is there so the replica knows when to
stop reading from the repl buffer - imagine that replica reads from the
last node of the replication buffer while main thread feeds data to it -
we will create a data race on the `used` value
(`_writeToClientSlave`(IO-thread) vs `feedReplicationBuffer`(main)).
That's why this value is updated just before the replica is being send
to IO thread.
*NOTE*, this means that when replicas are handled by IO threads they
will hold more than one node at a time (i.e `io_curr_repl_node` up to
`io_bound_repl_node`) meaning trimming will happen a bit less
frequently. Tests show no significant problems with that.
(tnx to @ShooterIT for the `io_curr_repl_node` and `io_bound_repl_node`
mechanism as my initial implementation had similar semantics but was way
less clear)

Example of how this works:

* Replication buffer state at time N:
   | node 0| ... | node M, used_size K |
* replica caches `io_curr_repl_node`=0, `io_bound_repl_node`=M and
`io_bound_block_pos`=K
* replica moves to IO thread and processes all the data it sees
* Replication buffer state at time N + 1:
| node 0| ... | node M, used_size Full | |node M + 1| |node M + 2,
used_size L|, where Full > M
* replica moves to main thread at time N + 1, at this point following
happens
   - refcount to node 0 (io_curr_repl_node) is decreased
- `ref_repl_buf_node` becomes node M(io_bound_repl_node) (we still have
size-K bytes to process from there)
- refcount to node M is increased (now all nodes from 0 up to M-1
including can be trimmed unless some other replica holds reference to
them)
- And just before the replica is send back to IO thread the following
are updated:
   - `io_bound_repl_node` ref becomes node M+2
   - `io_bound_block_pos` becomes L

Note that replica client is only moved to main if it has processed all
the data it knows about (i.e up to `io_bound_repl_node` +
`io_bound_block_pos`)

### Replica clients kept in main as much as possible

During implementation an issue arose - how fast is the replica client
able to get knowledge about new data from the replication buffer and how
fast can it trim it. In order for that to happen ASAP whenever a replica
is moved to main it remains there until the replication buffer is fed
new data. At that point its put in the pending write queue and special
cased in handleClientsWithPendingWrites so that its send to IO thread
ASAP to write the new data to replica. Also since each time the replica
writes its whole repl data it knows about that means after it's send to
main thread `processClientsFromIOThread` is able to immediately update
the refcounts and trim whatever it can.

### ACK messages from primary

Slave clients need to periodically read `REPLCONF ACK` messages from
client. Since replica can remain in main thread indefinitely if no DB
change occurs, a new atomic `pending_read` was added during
`readQueryFromClient`. If a replica client has a pending read it's
returned back to IO-thread in order to process the read even if there is
no pending repl data to write.

### Replicas during shutdown

During shutdown the main thread pauses write actions and periodically
checks if all replicas have reached the same replication offset as the
primary node. During `finishShutdown` that may or may not be the case.
Either way a client data may be read from the replicas and even we may
try to write any pending data to them inside `flushSlavesOutputBuffers`.
In order to prevent races all the replicas from IO threads are moved to
main via `fetchClientFromIOThread`. A cancel of the shutdown should be
ok, since the mechanism employed by `handleClientsWithPendingWrites`
should return the client back to IO thread when needed.

## Notes

While adding new tests timing issues with Tsan tests were found and
fixed.

Also there is a data race issue caught by Tsan on the `last_error`
member of the `client` struct. It happens when both IO-thread and main
thread make a syscall using a `client` instance - this can happen only
for primary and replica clients since their data can be accessed by
commands send from other clients. Specific example is the `INFO
REPLICATION` command.
Although other such races were fixed, as described above, this once is
insignificant and it was decided to be ignored in `tsan.sup`.

---------

Co-authored-by: Yuan Wang <wangyuancode@163.com>
Co-authored-by: Yuan Wang <yuan.wang@redis.com>
2026-01-21 16:19:12 +02:00
Stav-Levi
25f780b662
Fix crash when calling internal container command without arguments (#14690)
Addresses crash and clarifies errors around container commands.

- Update server.c to handle container commands with no subcommand: emit
"missing subcommand. Try HELP."; keep "unknown subcommand" for invalid
subcommands; for unknown commands, include args preview only when
present
- Add a test module command subcommands.internal_container with a
subcommand for validation
- Add unit test asserting missing subcommand error when calling the
internal container command without arguments
2026-01-21 08:38:04 +02:00
Moti Cohen
11e73c66a8
Modules KeyMeta (Keys Metadata) (#14445)
Some checks are pending
CI / test-ubuntu-latest (push) Waiting to run
CI / test-sanitizer-address (push) Waiting to run
CI / build-debian-old (push) Waiting to run
CI / build-macos-latest (push) Waiting to run
CI / build-32bit (push) Waiting to run
CI / build-libc-malloc (push) Waiting to run
CI / build-centos-jemalloc (push) Waiting to run
CI / build-old-chain-jemalloc (push) Waiting to run
Codecov / code-coverage (push) Waiting to run
External Server Tests / test-external-standalone (push) Waiting to run
External Server Tests / test-external-cluster (push) Waiting to run
External Server Tests / test-external-nodebug (push) Waiting to run
Spellcheck / Spellcheck (push) Waiting to run
Modules KeyMeta (Keys Metadata)

Redis modules often need to associate additional metadata with keys in
the keyspace. The objective is to create a unified and extensible
interface, usable by modules, Redis core, and maybe later by the users,
that facilitate the association and management of metadata with keys.
While extending RedisModuleTypes might be an easier path, this proposal
goes one step further: a general-purpose mechanism that lets attach
metadata to any key, independent of underlying data type.

A major part of this feature involves defining how metadata is managed
throughout a key’s lifecycle. Modules will be able to optionally
register distinct metadata classes, each with its own lifecycle
callbacks and capable of storing arbitrary 8-byte value per key. These
metadata values will be embedded directly within Redis’s core key-value
objects to ensure fast access and automatic callback execution as keys
are created, updated, or deleted. Each 8 bytes of metadata can represent
either a simple primitive value or a pointer/handle to more complex,
externally managed data by the module and RDB serialized along with the
key.

Key Features:
- Modules can register up to 7 metadata classes (8 total, 1 reserved)
- Each class: 4-char name + 5-bit version (e.g., "SRC1" v1)
- Each class attaches 8 bytes per key (value or pointer/handle)
- Separate namespace from module data types

Module API:
- RedisModule_CreateKeyMetaClass() - Register metadata class
- RedisModule_ReleaseKeyMetaClass() - Release metadata class
- RedisModule_SetKeyMeta() - Attach/update metadata
- RedisModule_GetKeyMeta() - Retrieve metadata

Lifecycle Callbacks:
- copy, rename, move - Handle key operations
- unlink, free - Handle key deletion/expiration
- rdb_save, rdb_load - RDB persistence
- aof_rewrite - AOF rewrite support

Implementation:
- Metadata slots allocated before kvobj in reverse class ID order
- 8-bit metabits bitmap tracks active classes per key
- Minimal memory overhead - only allocated slots consume memory

RDB Serialization (v13):
- New opcode RDB_OPCODE_KEY_METADATA
- Compact 32-bit class spec: 24-bit name + 5-bit ver + 3-bit flags
- Self-contained format: [META,] TYPE, KEY, VALUE
- Portable across cluster nodes

Integration:
- Core ops: dbAdd, dbSet, COPY, MOVE, RENAME, DELETE
- DUMP/RESTORE support
- AOF rewrite via module callbacks
- Defragmentation support
- Module type I/O refactored to ModuleEntityId
2026-01-15 23:11:17 +02:00
Filipe Oliveira (Redis)
7e7c7b0558
Fix flaky test failures in caused by clock precision issues with monotonic clock. (#14697)
Some checks are pending
CI / test-ubuntu-latest (push) Waiting to run
CI / test-sanitizer-address (push) Waiting to run
CI / build-debian-old (push) Waiting to run
CI / build-macos-latest (push) Waiting to run
CI / build-32bit (push) Waiting to run
CI / build-libc-malloc (push) Waiting to run
CI / build-centos-jemalloc (push) Waiting to run
CI / build-old-chain-jemalloc (push) Waiting to run
Codecov / code-coverage (push) Waiting to run
External Server Tests / test-external-standalone (push) Waiting to run
External Server Tests / test-external-cluster (push) Waiting to run
External Server Tests / test-external-nodebug (push) Waiting to run
Spellcheck / Spellcheck (push) Waiting to run
Fix flaky test failures in `tests/unit/moduleapi/blockedclient.tcl`
caused by
clock precision issues with monotonic clock.

The test runs a command that blocks for 200ms and then asserts the
elapsed time
is >= 200ms. Due to clock skew and timing precision differences, the
measured
time occasionally comes back as 199ms, causing spurious test failures.
2026-01-14 19:44:05 +08:00
debing.sun
0cb1ee0dc1
New eviction policies - least recently modified (#14624)
### Summary

This PR introduces two new maxmemory eviction policies: `volatile-lrm`
and `allkeys-lrm`.
LRM (Least Recently Modified) is similar to LRU but only updates the
timestamp on write operations, not read operations. This makes it useful
for evicting keys that haven't been modified recently, regardless of how
frequently they are read.

### Core Implementation

The LRM implementation reuses the existing LRU infrastructure but with a
key difference in when timestamps are updated:

- **LRU**: Updates timestamp on both read and write operations
- **LRM**: Updates timestamp only on write operations via `updateLRM()`

### Key changes:
Add `keyModified()` to accept an optional `robj *val` parameter and call
`updateLRM()` when a value is provided. Since `keyModified()` serves as
the unified entry point for all key modifications, placing the LRM
update here ensures timestamps are consistently updated across all write
operations

---------

Co-authored-by: oranagra <oran@redislabs.com>
Co-authored-by: Yuan Wang <yuan.wang@redis.com>
2026-01-06 20:57:31 +08:00
Yuan Wang
7a3cb3b4b3
Fix CI flaky tests (#14531)
Some checks failed
CI / test-ubuntu-latest (push) Has been cancelled
CI / test-sanitizer-address (push) Has been cancelled
CI / build-debian-old (push) Has been cancelled
CI / build-macos-latest (push) Has been cancelled
CI / build-32bit (push) Has been cancelled
CI / build-libc-malloc (push) Has been cancelled
CI / build-centos-jemalloc (push) Has been cancelled
CI / build-old-chain-jemalloc (push) Has been cancelled
Codecov / code-coverage (push) Has been cancelled
External Server Tests / test-external-standalone (push) Has been cancelled
External Server Tests / test-external-cluster (push) Has been cancelled
External Server Tests / test-external-nodebug (push) Has been cancelled
Spellcheck / Spellcheck (push) Has been cancelled
- https://github.com/redis/redis/actions/runs/19200504999/job/54887625884
   avoid calling `start_write_load` before pausing the destination node

- https://github.com/redis/redis/actions/runs/18958533020/job/54140746904
maybe the replica did not sync with master, then the replica did not update the counter
2025-11-19 17:10:57 +08:00
Mincho Paskalev
91b5808fd6
Fix numeric config boundry check (#14286)
Revert a breaking change introduced in #14051 described in this comment
https://github.com/redis/redis/pull/14051#discussion_r2281765769

The non-negative check inside `checkNumericBoundaries` was ignoring that
passing a big unsigned long long value (> 2^63-1) will be passed as
negative value and will never reach the lower/upper boundary check.

The check is removed, reverting the breaking change.

This allows for RedisModule_ConfigSetNumeric to pass big unsigned number, albeit via `long long` parameter. Added comments about this behaviour.

Added tests for https://github.com/redis/redis/pull/14051#discussion_r2281765769
2025-10-28 13:40:44 +02:00
Stav-Levi
e0bb2a31d4
Remove unnecessary skip:external for module tests (#14354)
This PR follows: https://github.com/redis/redis/pull/14226
this pr goal is to remove unnecessary skip:external
2025-09-28 17:09:50 +03:00
debing.sun
60adba48aa
Introduce DEBUG_DEFRAG compilation option to allow run test with activedefrag when allocator is not jemalloc (#14326)
This PR is based on https://github.com/valkey-io/valkey/pull/1303

This PR introduces a DEBUG_DEFRAG compilation option that enables
activedefrag functionality even when the allocator is not jemalloc, and
always forces defragmentation regardless of the amount or ratio of
fragmentation.

## Using
```
make SANITIZER=address DEBUG_DEFRAG=<force|fully>
./runtest --debug-defrag
```

* DEBUG_DEFRAG=force
   * Ignore the threshold for defragmentation to ensure that
defragmentation is always triggered.
   * Always reallocate pointers to probe for correctness issues in pointer
reallocation.

* DEBUG_DEFRAG=fully
   * Includes everything in the option `force`.
   * Additionally performs a full defrag on every defrag cycle, which is
significantly slower but more accurate.

---------

Co-authored-by: Ran Shidlansik <ranshid@amazon.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: oranagra <oran@redislabs.com>
2025-09-10 12:52:20 +08:00
debing.sun
e2b8f8ff6d
Fix timing issue for module defrag test (#14305)
These two tests often fail in the slow environment.
1. `Module defrag: late defrag with cursor works` test
`defragtest_datatype_resumes` in a defrag cycle does not always reach 10
times, so increase the threshold and move the assertion of
`defragtest_datatype_resumes` to `wait_for_condition`.

2. `Module defrag: global defrag works` test
     Increase the waiting time for this test.
2025-08-27 22:25:32 +08:00
Stav-Levi
81df8deca6
Run module tests as part of the base redis testsuit (#14226)
integrate module API tests into default test suite
- Add module_tests target to main Makefile to build test modules
- Include unit/moduleapi in test_dirs to run module tests with ./runtest
- Module API tests now run by default instead of requiring
runtest-moduleapi

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2025-08-26 14:49:05 +03:00
guybe7
ca9ede6968
Expose "touches-arbitrary-keys" flag to Redis modules (#14290)
This commit adds support for the "touches-arbitrary-keys" command flag
in Redis modules, allowing module commands to be properly marked when
they modify keys not explicitly provided as arguments, to avoid wrapping
replicated commands with MULTI/EXEC.

Changes:
- Added "touches-arbitrary-keys" flag parsing in
commandFlagsFromString()
- Updated module command documentation to describe the new flag
- Added test implementation in zset module with zset.delall command to
demonstrate and verify the flag functionality

The zset.delall command serves as a test case that scans the keyspace
and deletes all zset-type keys, properly using the new flag since it
modifies keys not provided via argv.

This commit adds a new `zset.delall` command to the zset test module
that iterates through the keyspace and deletes all keys of type "zset".

Key changes:
- Added zset_delall() function that uses RedisModule_Scan to iterate
through all keys in the keyspace
- Added zset_delall_callback() that checks each key's type and deletes
zset keys using RedisModule_Call with "DEL" command
- Registered the new command with "write touches-arbitrary-keys" flags
since it modifies arbitrary keys not provided via argv
- Added support for "touches-arbitrary-keys" flag in module command
parsing
- Added comprehensive tests for the new functionality

The command returns the number of deleted zset keys and properly handles
replication by using the "s!" format specifier with RedisModule_Call to
ensure DEL commands are replicated to slaves and AOF.

Usage: ZSET.DELALL
Returns: Integer count of deleted zset keys
2025-08-20 07:32:24 +08:00
h.o.t. neglected
3fa7a656f1
Fix memory leak in RM_GetCommandKeysWithFlags (#14243)
Fix https://github.com/redis/redis/issues/14208
As mentioned in the above issue, RM_GetCommandKeysWithFlags could have
memory leak when the number of keys is larger than MAX_KEYS_BUFFER. This
PR fixes it by calling getKeysFreeResult before the function's return. A
TCL testcase is created to verify the fix.
2025-08-06 15:09:08 +08:00
Moti Cohen
c55e33a99f
KEYSIZES - Fix resolving key slot on modules (#14240)
In cluster mode with modules, for a given key, the slot resolution for
the KEYSIZES histogram update was incorrect. As a result, the histogram
might gracefully ignored those keys instead or update the wrong slot
histogram.
2025-07-31 15:44:25 +03:00
Stav-Levi
82396716d0
Add API to allow Redis modules to unsubscribe from keyspace notifications
This API complements module subscribe by enabling modules to unsubscribe
from specific keyspace event notifications when they are no longer
needed.
This helps reduce performance overhead and unnecessary callback
invocations.

The function matches subscriptions based on event mask, callback
pointer,
and module identity. If a matching subscription is found, it is removed.

Returns REDISMODULE_OK if a subscription was successfully removed,
otherwise REDISMODULE_ERR.
2025-07-28 10:17:48 +03:00
debing.sun
457089b1fe
Create global data before test instead of module load for module defrag test (#13951)
After #13816, we added defragmentation support for moduleDict, which
significantly increased global data size.
As a result, the defragmentation tests for non-global data were
affected.
Now, we move the creation of global data to before the global data test
to avoid it interfering with other tests.
Fixed the simple key test failure due to forgetting to reset stats.
2025-07-22 20:44:12 +08:00
Mincho Paskalev
15706f2e82
Module set/get config API (#14051)
# Problem

Some redis modules need to call `CONFIG GET/SET` commands. Server may be
ran with `rename-command CONFIG ""`(or something similar) which leads to
the module being unable to access the config.

# Solution

Added new API functions for use by modules
```
RedisModuleConfigIterator* RedisModule_GetConfigIterator(RedisModuleCtx *ctx, const char *pattern);
void RedisModule_ReleaseConfigIterator(RedisModuleCtx *ctx, RedisModuleConfigIterator *iter);
const char *RedisModule_ConfigIteratorNext(RedisModuleConfigIterator *iter);
int RedisModule_GetConfigType(const char *name, RedisModuleConfigType *res);
int RedisModule_GetBoolConfig(RedisModuleCtx *ctx, const char *name, int *res);
int RedisModule_GetConfig(RedisModuleCtx *ctx, const char *name, RedisModuleString **res);
int RedisModule_GetEnumConfig(RedisModuleCtx *ctx, const char *name, RedisModuleString **res);
int RedisModule_GetNumericConfig(RedisModuleCtx *ctx, const char *name, long long *res);
int RedisModule_SetBoolConfig(RedisModuleCtx *ctx, const char *name, int value, RedisModuleString **err);
int RedisModule_SetConfig(RedisModuleCtx *ctx, const char *name, RedisModuleString *value, RedisModuleString **err);
int RedisModule_SetEnumConfig(RedisModuleCtx *ctx, const char *name, RedisModuleString *value, RedisModuleString **err);
int RedisModule_SetNumericConfig(RedisModuleCtx *ctx, const char *name, long long value, RedisModuleString **err);
```

## Implementation

The work is mostly done inside `config.c` as I didn't want to expose the
config dict outside of it. That means each of these module functions has
a corresponding method in `config.c` that actually does the job. F.e
`RedisModule_SetEnumConfig` calls `moduleSetEnumConfig` which is
implemented in `config.c`

## Notes

Also, refactored `configSetCommand` and `restoreBackupConfig` functions
for the following reasons:
- code and logic is now way more clear in `configSetCommand`. Only
caveat here is removal of an optimization that skipped running apply
functions that already have ran in favour of code clarity.
- Both functions needlessly separated logic for module configs and
normal configs whereas no such separation is needed. This also had the
side effect of removing some allocations.
- `restoreBackupConfig` now has clearer interface and can be reused with
ease. One of the places I reused it is for the individual
`moduleSet*Config` functions, each of which needs the restoration
functionality but for a single config only.

## Future

Additionally, a couple considerations were made for potentially
extending the API in the future
- if need be an API for atomically setting multiple config values can be
added - `RedisModule_SetConfigsTranscationStart/End` or similar that can
be put around `RedisModule_Set*Config` calls.
- if performance is an issue an API
`RedisModule_GetConfigIteratorNextWithTypehint` or similar may be added
in order not to incur the additional cost of calling
`RedisModule_GetConfigType`.

---------

Co-authored-by: Oran Agra <oran@redislabs.com>
2025-07-03 13:46:33 +03:00
Mincho Paskalev
ad8c7de3fe
Fix assertion in updateClientMemUsageAndBucket (#14152)
## Description

`updateClientMemUsageAndBucket` is called from the main thread to update
memory usage and memory bucket of a client. That's why it has assertion
that it's being called by the main thread.

But it may also be called from a thread spawned by a module.
Specifically, when a module calls `RedisModule_Call` which in turn calls
`call`->`replicationFeedMonitors`->`updateClientMemUsageAndBucket`.
This is generally safe as module calls inside a spawned thread should be
guarded by a call to `ThreadSafeContextLock`, i.e the module is holding
the GIL at this point.

This commit fixes the assertion inside `updateClientMemUsageAndBucket`
so that it encompasses that case also. Generally calls from
module-spawned threads are safe to operate on clients that are not
running on IO-threads when the module is holding the GIL.

---------

Co-authored-by: Yuan Wang <wangyuancode@163.com>
Co-authored-by: debing.sun <debing.sun@redis.com>
2025-07-02 11:55:57 +03:00
Slavomir Kaslev
b7c6755b1b
Add thread sanitizer run to daily CI (#13964)
Add thread sanitizer run to daily CI.

Few tests are skipped in tsan runs for two reasons:
* Stack trace producing tests (oom, `unit/moduleapi/crash`, etc) are
tagged `tsan:skip` because redis calls `backtrace()` in signal handler
which turns out to be signal-unsafe since it might allocate memory (e.g.
glibc 2.39 does it through a call to `_dl_map_object_deps()`).
* Few tests become flaky with thread sanitizer builds and don't finish
in expected deadlines because of the additional tsan overhead. Instead
of skipping those tests, this can improved in the future by allowing
more iterations when waiting for tsan builds.

Deadlock detection is disabled for now because of tsan limitation where
max 64 locks can be taken at once.

There is one outstanding (false-positive?) race in jemalloc which is
suppressed in `tsan.sup`.

Fix few races thread sanitizer reported having to do with writes from
signal handlers. Since in multi-threaded setting signal handlers might
be called on any thread (modulo pthread_sigmask) while the main thread
is running, `volatile sig_atomic_t` type is not sufficient and atomics
are used instead.
2025-06-02 10:13:23 +03:00
Ozan Tezcan
7f60945bc6
Fix short read issue that causes exit() on replica (#14085)
When `repl-diskless-load` is enabled on a replica, and it is in the
process of loading an RDB file, a broken connection detected by the main
channel may trigger a call to rioAbort(). This sets a flag to cause the
rdb channel to fail on the next rioRead() call, allowing it to perform
necessary cleanup.

However, there are specific scenarios where the error is checked using
rioGetReadError(), which does not account for the RIO_ABORT flag (see
[source](79b37ff535/src/rdb.c (L3098))).
As a result, the error goes undetected. The code then proceeds to
validate a module type, fails to find a match, and calls
rdbReportCorruptRDB() which logs the following error and exits the
process:

```
The RDB file contains module data I can't load: no matching module type '_________'
```

To fix this issue, the RIO_ABORT flag has been removed. Now, rioAbort()
sets both read and write error flags, so that subsequent operations and
error checks properly detect the failure.

Additional keys were added to the short read test. It reproduces the
issue with this change. We hit that problematic line once per key. My
guess is that with many smaller keys, the likelihood of the connection
being killed at just the right moment increases.
2025-05-28 12:43:59 +03:00
Vitah Lin
232f2fb077
Include missing getchannels.tcl in moduleapi tests and fix incorrect assertions (#14037) 2025-05-14 08:57:01 +08:00
Moti Cohen
30d5f05637
Fix various KEYSIZES enumeration issues (#13923)
There are several issues with maintaining histogram counters.

Ideally, the hooks would be placed in the low-level datatype
implementations. However, this logic is triggered in various contexts
and doesn’t always map directly to a stored DB key. As a result, the
hooks sit closer to the high-level commands layer. It’s a bit messy, but
the right way to ensure histogram counters behave correctly is through
broad test coverage.

* Fix inaccuracies around deletion scenarios.
* Fix inaccuracies around modules calls. Added corresponding tests.
* The info-keysizes.tcl test has been extended to operate on meaningful
datasets
* Validate histogram correctness in edge cases involving collection
deletions.
* Add new macro debugServerAssert(). Effective only if compiled with
DEBUG_ASSERTIONS.

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2025-05-08 10:59:12 +03:00
Oran Agra
2a189709e0
avoid possible use-after-free with module KSN changes (#13875)
in #13505, we changed the code to use the string value of the key rather
than the integer value on the stack, but we have a test in
unit/moduleapi/keyspace_events that uses keyspace notification hook to
modify the value with RM_StringDMA, which can cause this value to be
released before used. the reason it didn't happen so far is because we
were using shared integers, so releasing the object doesn't free it.
2025-03-24 12:24:52 +02:00
debing.sun
cb02bd190b
Fix timing issue in module defrag test (#13870)
After #13840, the data we populate becomes more complex and slower, we
always wait for a defragmentation cycle to end before verifying that the
test is okay.
However, in some slow environments, an entire defragmentation cycle can
exceed 5 seconds, and in my local test using 'taskset -c 0' it can reach
6 seconds, so increase the threshold to avoid test failures.
2025-03-20 21:22:47 +08:00
Yuan Wang
951ec79654
Cluster compatibility check (#13846)
### Background
The program runs normally in standalone mode, but migrating to cluster
mode may cause errors, this is because some cross slot commands can not
run in cluster mode. We should provide an approach to detect this issue
when running in standalone mode, and need to expose a metric which
indicates the usage of no incompatible commands.

### Solution
To avoid perf impact, we introduce a new config
`cluster-compatibility-sample-ratio` which define the sampling ratio
(0-100) for checking command compatibility in cluster mode. When a
command is executed, it is sampled at the specified ratio to determine
if it complies with Redis cluster constraints, such as cross-slot
restrictions.

A new metric is exposed: `cluster_incompatible_ops` in `info stats`
output.

The following operations will be considered incompatible operations.

- cross-slot command
   If a command has multiple cross slot keys, it is incompatible
- `swap, copy, move, select` command
These commands involve multi databases in some cases, we don't allow
multiple DB in cluster mode, so there are not compatible
- Module command with `no-cluster` flag
If a module command has `no-cluster` flag, we will encounter an error
when loading module, leading to fail to load module if cluster is
enabled, so this is incompatible.
- Script/function with `no-cluster` flag
Similar with module command, if we declare `no-cluster` in shebang of
script/function, we also can not run it in cluster mode
- `sort` command by/get pattern
When `sort` command has `by/get` pattern option, we must ask that the
pattern slot is equal with the slot of keys, otherwise it is
incompatible in cluster mode.

- The script/function command accesses the keys and declared keys have
different slots
For the script/function command, we not only check the slot of declared
keys, but only check the slot the accessing keys, if they are different,
we think it is incompatible.

**Besides**, commands like `keys, scan, flushall, script/function
flush`, that in standalone mode iterate over all data to perform the
operation, are only valid for the server that executes the command in
cluster mode and are not broadcasted. However, this does not lead to
errors, so we do not consider them as incompatible commands.

### Performance impact test
**cross slot test**
Below are the test commands and results. When using MSET with 8 keys,
performance drops by approximately 3%.

**single key test**
It may be due to the overhead of the sampling function, and single-key
commands could cause a 1-2% performance drop.
2025-03-20 10:35:53 +08:00
debing.sun
f364dcca2d
Make RM_DefragRedisModuleDict API support incremental defragmentation for dict leaf (#13840)
After https://github.com/redis/redis/pull/13816, we make a new API to
defrag RedisModuleDict.
Currently, we only support incremental defragmentation of the dictionary
itself, but the defragmentation of values is still not incremental. If
the values are very large, it could lead to significant blocking.
Therefore, in this PR, we have added incremental defragmentation for the
values.

The main change is to the `RedisModuleDefragDictValueCallback`, we
modified the return value of this callback.
When the callback returns 1, we will save the `seekTo` as the key of the
current unfinished node, and the next time we enter, we will continue
defragmenting this node.
When the return value is 0, we will proceed to the next node.

## Test
Since each dictionary in the global dict originally contained only 10
strings, but now it has been changed to a nested dictionary, each
dictionary now has 10 sub-dictionaries, with each sub-dictionary
containing 10 strings, this has led to a corresponding reduction in the
defragmentation time obtained from other tests.
Therefore, the other tests have been modified to always wait for
defragmentation to be turned off before the test begins, then start it
after creating fragmentation, ensuring that they can always run for a
full defragmentation cycle.

---------

Co-authored-by: ephraimfeldblum <ephraim.feldblum@redis.com>
2025-03-04 17:19:41 +08:00
debing.sun
7939ba031d
Enable the callback to be NULL for RM_DefragRedisModuleDict() and reduce the system calls of RM_DefragShouldStop() (#13830)
1) Enable the callback to be NULL for RM_DefragRedisModuleDict()
    Because the dictionary may store only the key without the value.

2) Reduce the system calls of RM_DefragShouldStop()
The API checks the following thresholds before performing a time check:
over 512 defrag hits, or over 1024 defrag misses, and performs the time
judgment if any of these thresholds are reached.

3) Added defragmentation statistics for dictionary items to cover the
associated code for RM_DefragRedisModuleDict().

4) Removed `module_ctx` from `defragModuleCtx` struct, which can be
replaced by a temporary variable.

---------

Co-authored-by: oranagra <oran@redislabs.com>
2025-02-26 20:04:29 +08:00
debing.sun
ee933d9e2b
Fixed passing incorrect endtime value for module context (#13822)
1) Fix a bug that passing an incorrect endtime to module.
   This bug was found by @ShooterIT.
After #13814, all endtime will be monotonic time, and we should no
longer convert it to ustime relative.
Add assertions to prevent endtime from being much larger thatn the
current time.

2) Fix a race in test `Reduce defrag CPU usage when module data can't be
defragged`

---------

Co-authored-by: ShooterIT <wangyuancode@163.com>
2025-02-23 12:58:48 +08:00
debing.sun
032357ec0f
Add RM_DefragRedisModuleDict module API (#13816)
After #13815, we introduced incremental defragmentation for global data
for module.
Now we added a new module API `RM_DefragRedisModuleDict` to incremental
defrag `RedisModuleDict`.

This PR adds a new APIs and a new defrag callback:
```c
RedisModuleDict *RM_DefragRedisModuleDict(RedisModuleDefragCtx *ctx, RedisModuleDict *dict, RedisModuleDefragDictValueCallback valueCB, RedisModuleString **seekTo);

typedef void *(*RedisModuleDefragDictValueCallback)(RedisModuleDefragCtx *ctx, void *data, unsigned char *key, size_t keylen);
```

Usage:
```c
RedisModuleString *seekTo = NULL;
RedisModuleDict *dict = = RedisModule_CreateDict(ctx);
... populate the dict code ...
/* Defragment a dictionary completely */
do {
    RedisModuleDict *new = RedisModule_DefragRedisModuleDict(ctx, dict, defragGlobalDictValueCB, &seekTo);
    if (new != NULL) {
        dict = new;
    }
} while (seekTo);
```

---------

Co-authored-by: ShooterIT <wangyuancode@163.com>
Co-authored-by: oranagra <oran@redislabs.com>
2025-02-20 21:09:29 +08:00
debing.sun
695126ccce
Add support for incremental defragmentation of global module data (#13815)
## Description

Currently, when performing defragmentation on non-key data within the
module, we cannot process the defragmentation incrementally. This
limitation affects the efficiency and flexibility of defragmentation in
certain scenarios.
The primary goal of this PR is to introduce support for incremental
defragmentation of global module data.

## Interface Change
New module API `RegisterDefragFunc2`

This is a more advanced version of `RM_RegisterDefragFunc`, in that it
takes a new callbacks(`RegisterDefragFunc2`) that has a return value,
and can use RM_DefragShouldStop in and indicate that it should be called
again later, or is it done (returned 0).

## Note
The `RegisterDefragFunc` API remains available.

---------

Co-authored-by: ShooterIT <wangyuancode@163.com>
Co-authored-by: oranagra <oran@redislabs.com>
2025-02-20 00:28:16 +08:00
jonathan keinan
7a40fd630d * fix comments 2025-02-06 13:16:33 +02:00
jonathan keinan
b9361ad5fe * only use new api if override-default was provided as an argument 2025-02-06 13:16:33 +02:00
kei-nan
f164012c19 Update tests/unit/moduleapi/moduleconfigs.tcl
Co-authored-by: nafraf <nafraf@users.noreply.github.com>
2025-02-06 13:16:33 +02:00
jonathan keinan
49455c43ae * change foo to goo so test will be correct and pass 2025-02-06 13:16:33 +02:00
jonathan keinan
c2694fb696 * change config value in test to be different than overwritten value 2025-02-06 13:16:33 +02:00
jonathan keinan
f7353db7eb * fix test
* cleanup strval2 on if an error during the OnLoad was encountered.
2025-02-06 13:16:33 +02:00
jonathan keinan
294492dbf2 * fix tests
* add some logging to test module
2025-02-06 13:16:33 +02:00
jonathan keinan
fd5c325886 * initial commit 2025-02-06 13:16:33 +02:00
Raz Monsonego
04589f90d7
Add internal connection and command mechanism (#13740)
# PR: Add Mechanism for Internal Commands and Connections in Redis

This PR introduces a mechanism to handle **internal commands and
connections** in Redis. It includes enhancements for command
registration, internal authentication, and observability.

## Key Features

1. **Internal Command Flag**:
   - Introduced a new **module command registration flag**: `internal`.
- Commands marked with `internal` can only be executed by **internal
connections**, AOF loading flows, and master-replica connections.
- For any other connection, these commands will appear as non-existent.

2. **Support for internal authentication added to `AUTH`**:
- Used by depicting the special username `internal connection` with the
right internal password, i.e.,: `AUTH "internal connection"
<internal_secret>`.
- No user-defined ACL username can have this name, since spaces are not
aloud in the ACL parser.
   - Allows connections to authenticate as **internal connections**.
- Authenticated internal connections can execute internal commands
successfully.

4. **Module API for Internal Secret**:
- Added the `RedisModule_GetInternalSecret()` API, that exposes the
internal secret that should be used as the password for the new `AUTH
"internal connection" <password>` command.
- This API enables the modules to authenticate against other shards as
local connections.

## Notes on Behavior

- **ACL validation**:
- Commands dispatched by internal connections bypass ACL validation, to
give the caller full access regardless of the user with which it is
connected.

- **Command Visibility**:
- Internal commands **do not appear** in `COMMAND <subcommand>` and
`MONITOR` for non-internal connections.
- Internal commands **are logged** in the slow log, latency report and
commands' statistics to maintain observability.

- **`RM_Call()` Updates**:
  - **Non-internal connections**:
- Cannot execute internal commands when the command is sent with the `C`
flag (otherwise can).
- Internal connections bypass ACL validations (i.e., run as the
unrestricted user).

- **Internal commands' success**:
- Internal commands succeed upon being sent from either an internal
connection (i.e., authenticated via the new `AUTH "internal connection"
<internal_secret>` API), an AOF loading process, or from a master via
the replication link.
Any other connections that attempt to execute an internal command fail
with the `unknown command` error message raised.

- **`CLIENT LIST` flags**:
  - Added the `I` flag, to indicate that the connection is internal.

- **Lua Scripts**:
   - Prevented internal commands from being executed via Lua scripts.

---------

Co-authored-by: Meir Shpilraien <meir@redis.com>
2025-02-05 11:48:08 +02:00
Raz Monsonego
c688537d49
Add flag for ability of a module context to execute debug commands (#13774)
This PR adds a flag to the `RM_GetContextFlags` module-API function that
depicts whether the context may execute debug commands, according to
redis's standards.
2025-02-03 09:52:41 +02:00
debing.sun
f86575f210
Gradually reduce defrag CPU usage when defragmentation is ineffective (#13752)
This PR addresses an issue where if a module does not provide a
defragmentation callback, we cannot defragment the fragmentation it
generates. However, the defragmentation process still considers a large
amount of fragmentation to be present, leading to more aggressive
defragmentation efforts that ultimately have no effect.

To mitigate this, the PR introduces a mechanism to gradually reduce the
CPU consumption for defragmentation when the defragmentation
effectiveness is poor. This occurs when the fragmentation rate drops
below 2% and the hit ratio is less than 1%, or when the fragmentation
rate increases by no more than 2%. The CPU consumption will be gradually
decreased until it reaches the minimum threshold defined by
`active-defrag-cycle-min`.

---------

Co-authored-by: oranagra <oran@redislabs.com>
2025-01-24 11:35:32 +08:00
Yuan Wang
64a40b20d9
Async IO Threads (#13695)
## Introduction
Redis introduced IO Thread in 6.0, allowing IO threads to handle client
request reading, command parsing and reply writing, thereby improving
performance. The current IO thread implementation has a few drawbacks.
- The main thread is blocked during IO thread read/write operations and
must wait for all IO threads to complete their current tasks before it
can continue execution. In other words, the entire process is
synchronous. This prevents the efficient utilization of multi-core CPUs
for parallel processing.

- When the number of clients and requests increases moderately, it
causes all IO threads to reach full CPU utilization due to the busy wait
mechanism used by the IO threads. This makes it challenging for us to
determine which part of Redis has reached its bottleneck.

- When IO threads are enabled with TLS and io-threads-do-reads, a
disconnection of a connection with pending data may result in it being
assigned to multiple IO threads simultaneously. This can cause race
conditions and trigger assertion failures. Related issue:
redis#12540

Therefore, we designed an asynchronous IO threads solution. The IO
threads adopt an event-driven model, with the main thread dedicated to
command processing, meanwhile, the IO threads handle client read and
write operations in parallel.

## Implementation
### Overall
As before, we did not change the fact that all client commands must be
executed on the main thread, because Redis was originally designed to be
single-threaded, and processing commands in a multi-threaded manner
would inevitably introduce numerous race and synchronization issues. But
now each IO thread has independent event loop, therefore, IO threads can
use a multiplexing approach to handle client read and write operations,
eliminating the CPU overhead caused by busy-waiting.

the execution process can be briefly described as follows:
the main thread assigns clients to IO threads after accepting
connections, IO threads will notify the main thread when clients
finish reading and parsing queries, then the main thread processes
queries from IO threads and generates replies, IO threads handle
writing reply to clients after receiving clients list from main thread,
and then continue to handle client read and write events.

### Each IO thread has independent event loop
We now assign each IO thread its own event loop. This approach
eliminates the need for the main thread to perform the costly
`epoll_wait` operation for handling connections (except for specific
ones). Instead, the main thread processes requests from the IO threads
and hands them back once completed, fully offloading read and write
events to the IO threads.

Additionally, all TLS operations, including handling pending data, have
been moved entirely to the IO threads. This resolves the issue where
io-threads-do-reads could not be used with TLS.

### Event-notified client queue
To facilitate communication between the IO threads and the main thread,
we designed an event-notified client queue. Each IO thread and the main
thread have two such queues to store clients waiting to be processed.
These queues are also integrated with the event loop to enable handling.
We use pthread_mutex to ensure the safety of queue operations, as well
as data visibility and ordering, and race conditions are minimized, as
each IO thread and the main thread operate on independent queues,
avoiding thread suspension due to lock contention. And we implemented an
event notifier based on `eventfd` or `pipe` to support event-driven
handling.

### Thread safety
Since the main thread and IO threads can execute in parallel, we must
handle data race issues carefully.

**client->flags**
The primary tasks of IO threads are reading and writing, i.e.
`readQueryFromClient` and `writeToClient`. However, IO threads and the
main thread may concurrently modify or access `client->flags`, leading
to potential race conditions. To address this, we introduced an io-flags
variable to record operations performed by IO threads, thereby avoiding
race conditions on `client->flags`.

**Pause IO thread**
In the main thread, we may want to operate data of IO threads, maybe
uninstall event handler, access or operate query/output buffer or resize
event loop, we need a clean and safe context to do that. We pause IO
thread in `IOThreadBeforeSleep`, do some jobs and then resume it. To
avoid thread suspended, we use busy waiting to confirm the target
status. Besides we use atomic variable to make sure memory visibility
and ordering. We introduce these functions to pause/resume IO Threads as
below.
```
pauseIOThread, resumeIOThread
pauseAllIOThreads, resumeAllIOThreads
pauseIOThreadsRange, resumeIOThreadsRange
```
Testing has shown that `pauseIOThread` is highly efficient, allowing the
main thread to execute nearly 200,000 operations per second during
stress tests. Similarly, `pauseAllIOThreads` with 8 IO threads can
handle up to nearly 56,000 operations per second. But operations
performed between pausing and resuming IO threads must be quick;
otherwise, they could cause the IO threads to reach full CPU
utilization.

**freeClient and freeClientAsync**
The main thread may need to terminate a client currently running on an
IO thread, for example, due to ACL rule changes, reaching the output
buffer limit, or evicting a client. In such cases, we need to pause the
IO thread to safely operate on the client.

**maxclients and maxmemory-clients updating**
When adjusting `maxclients`, we need to resize the event loop for all IO
threads. Similarly, when modifying `maxmemory-clients`, we need to
traverse all clients to calculate their memory usage. To ensure safe
operations, we pause all IO threads during these adjustments.

**Client info reading**
The main thread may need to read a client’s fields to generate a
descriptive string, such as for the `CLIENT LIST` command or logging
purposes. In such cases, we need to pause the IO thread handling that
client. If information for all clients needs to be displayed, all IO
threads must be paused.

**Tracking redirect**
Redis supports the tracking feature and can even send invalidation
messages to a connection with a specified ID. But the target client may
be running on IO thread, directly manipulating the client’s output
buffer is not thread-safe, and the IO thread may not be aware that the
client requires a response. In such cases, we pause the IO thread
handling the client, modify the output buffer, and install a write event
handler to ensure proper handling.

**clientsCron**
In the `clientsCron` function, the main thread needs to traverse all
clients to perform operations such as timeout checks, verifying whether
they have reached the soft output buffer limit, resizing the
output/query buffer, or updating memory usage. To safely operate on a
client, the IO thread handling that client must be paused.
If we were to pause the IO thread for each client individually, the
efficiency would be very low. Conversely, pausing all IO threads
simultaneously would be costly, especially when there are many IO
threads, as clientsCron is invoked relatively frequently.
To address this, we adopted a batched approach for pausing IO threads.
At most, 8 IO threads are paused at a time. The operations mentioned
above are only performed on clients running in the paused IO threads,
significantly reducing overhead while maintaining safety.

### Observability
In the current design, the main thread always assigns clients to the IO
thread with the least clients. To clearly observe the number of clients
handled by each IO thread, we added the new section in INFO output. The
`INFO THREADS` section can show the client count for each IO thread.
```
# Threads
io_thread_0:clients=0
io_thread_1:clients=2
io_thread_2:clients=2
```

Additionally, in the `CLIENT LIST` output, we also added a field to
indicate the thread to which each client is assigned.

`id=244 addr=127.0.0.1:41870 laddr=127.0.0.1:6379 ... resp=2 lib-name=
lib-ver= io-thread=1`

## Trade-off
### Special Clients
For certain special types of clients, keeping them running on IO threads
would result in severe race issues that are difficult to resolve.
Therefore, we chose not to offload these clients to the IO threads.

For replica, monitor, subscribe, and tracking clients, main thread may
directly write them a reply when conditions are met. Race issues are
difficult to resolve, so we have them processed in the main thread. This
includes the Lua debug clients as well, since we may operate connection
directly.

For blocking client, after the IO thread reads and parses a command and
hands it over to the main thread, if the client is identified as a
blocking type, it will be remained in the main thread. Once the blocking
operation completes and the reply is generated, the client is
transferred back to the IO thread to send the reply and wait for event
triggers.

### Clients Eviction
To support client eviction, it is necessary to update each client’s
memory usage promptly during operations such as read, write, or command
execution. However, when a client operates on an IO thread, it is not
feasible to update the memory usage immediately due to the risk of data
races. As a result, memory usage can only be updated either in the main
thread while processing commands or in the `ClientsCron` periodically.
The downside of this approach is that updates might experience a delay
of up to one second, which could impact the precision of memory
management for eviction.

To avoid incorrectly evicting clients. We adopted a best-effort
compensation solution, when we decide to eviction a client, we update
its memory usage again before evicting, if the memory used by the client
does not decrease or memory usage bucket is not changed, then we will
evict it, otherwise, not evict it.

However, we have not completely solved this problem. Due to the delay in
memory usage updates, it may lead us to make incorrect decisions about
the need to evict clients.

### Defragment
In the majority of cases we do NOT use the data from argv directly in
the db.
1. key names
We store a copy that we allocate in the main thread, see `sdsdup()` in
`dbAdd()`.
2. hash key and value
We store key as hfield and store value as sds, see `hfieldNew()` and
`sdsdup()` in `hashTypeSet()`.
3. other datatypes
   They don't even use SDS, so there is no reference issues.

But in some cases client the data from argv may be retain by the main
thread.
As a result, during fragmentation cleanup, we need to move allocations
from the IO thread’s arena to the main thread’s arena. We always
allocate new memory in the main thread’s arena, but the memory released
by IO threads may not yet have been reclaimed. This ultimately causes
the fragmentation rate to be higher compared to creating and allocating
entirely within a single thread.
The following cases below will lead to memory allocated by the IO thread
being kept by the main thread.
1. string related command: `append`, `getset`, `mset` and `set`.
If `tryObjectEncoding()` does not change argv, we will keep it directly
in the main thread, see the code in `tryObjectEncoding()`(specifically
`trimStringObjectIfNeeded()`)
2. block related command.
    the key names will be kept in `c->db->blocking_keys`.
3. watch command
    the key names will be kept in `c->db->watched_keys`.
4. [s]subscribe command
    channel name will be kept in `serverPubSubChannels`.
5. script load command
    script will be kept in `server.lua_scripts`.
7. some module API: `RM_RetainString`, `RM_HoldString`

Those issues will be handled in other PRs.

## Testing
### Functional Testing
The commit with enabling IO Threads has passed all TCL tests, but we did
some changes:
**Client query buffer**: In the original code, when using a reusable
query buffer, ownership of the query buffer would be released after the
command was processed. However, with IO threads enabled, the client
transitions from an IO thread to the main thread for processing. This
causes the ownership release to occur earlier than the command
execution. As a result, when IO threads are enabled, the client's
information will never indicate that a shared query buffer is in use.
Therefore, we skip the corresponding query buffer tests in this case.
**Defragment**: Add a new defragmentation test to verify the effect of
io threads on defragmentation.
**Command delay**: For deferred clients in TCL tests, due to clients
being assigned to different threads for execution, delays may occur. To
address this, we introduced conditional waiting: the process proceeds to
the next step only when the `client list` contains the corresponding
commands.

### Sanitizer Testing
The commit passed all TCL tests and reported no errors when compiled
with the `fsanitizer=thread` and `fsanitizer=address` options enabled.
But we made the following modifications: we suppressed the sanitizer
warnings for clients with watched keys when updating `client->flags`, we
think IO threads read `client->flags`, but never modify it or read the
`CLIENT_DIRTY_CAS` bit, main thread just only modifies this bit, so
there is no actual data race.

## Others
### IO thread number
In the new multi-threaded design, the main thread is primarily focused
on command processing to improve performance. Typically, the main thread
does not handle regular client I/O operations but is responsible for
clients such as replication and tracking clients. To avoid breaking
changes, we still consider the main thread as the first IO thread.

When the io-threads configuration is set to a low value (e.g., 2),
performance does not show a significant improvement compared to a
single-threaded setup for simple commands (such as SET or GET), as the
main thread does not consume much CPU for these simple operations. This
results in underutilized multi-core capacity. However, for more complex
commands, having a low number of IO threads may still be beneficial.
Therefore, it’s important to adjust the `io-threads` based on your own
performance tests.

Additionally, you can clearly monitor the CPU utilization of the main
thread and IO threads using `top -H -p $redis_pid`. This allows you to
easily identify where the bottleneck is. If the IO thread is the
bottleneck, increasing the `io-threads` will improve performance. If the
main thread is the bottleneck, the overall performance can only be
scaled by increasing the number of shards or replicas.

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
Co-authored-by: oranagra <oran@redislabs.com>
2024-12-23 14:16:40 +08:00
Moti Cohen
c51c96656b
modules API: Add test for ACL check of empty prefix (#13678)
- Add empty string test for the new API
`RedisModule_ACLCheckKeyPrefixPermissions`.
- Fix order of checks: `(pattern[patternLen - 1] != '*' || patternLen ==
0)`

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2024-12-10 09:16:30 +02:00
Moti Cohen
0dd057222b
Modules API: new HashFieldMinExpire(). Add flag REDISMODULE_HASH_EXPIRE_TIME to HashGet(). (#13676)
This PR introduces API to query Expiration time of hash fields.

# New `RedisModule_HashFieldMinExpire()`
For a given hash, retrieves the minimum expiration time across all
fields. If no fields have expiration or if the key is not a hash then
return `REDISMODULE_NO_EXPIRE` (-1).
```
mstime_t RM_HashFieldMinExpire(RedisModuleKey *hash);
```

# Extension to `RedisModule_HashGet()`
Adds a new flag, `REDISMODULE_HASH_EXPIRE_TIME`, to retrieve the
expiration time of a specific hash field. If the field does not exist or
has no expiration, returns `REDISMODULE_NO_EXPIRE`. It is fully
backward-compatible (RM_HashGet retains its original behavior unless the
new flag is used).

Example:
```
mstime_t expiry1, expiry2;
RedisModule_HashGet(mykey, REDISMODULE_HASH_EXPIRE_TIME, "field1", &expiry1, NULL);
RedisModule_HashGet(mykey, REDISMODULE_HASH_EXPIRE_TIME, "field1", &expiry1, "field2", &expiry2, NULL);
```
2024-12-05 11:14:52 +02:00
Moti Cohen
06b144aa09
Modules API: Add RedisModule_ACLCheckKeyPrefixPermissions (#13666)
This PR introduces a new API function to the Redis Module API:
```
int RedisModule_ACLCheckKeyPrefixPermissions(RedisModuleUser *user, RedisModuleString *prefix, int flags);
```
Purpose:
The function checks if a given user has access permissions to any key
that match a specific prefix. This validation is based on the user’s ACL
permissions and the specified flags.

Note, this prefix-based approach API may fail to detect prefixes that
are individually uncovered but collectively covered by the patterns. For
example the prefix `ID-*` is not fully included in pattern `ID-[0]*` and
is not fully included in pattern `ID-[^0]*` but it is fully included in
the set of patterns `{ID-[0]*, ID-[^0]*}`
2024-11-28 18:33:58 +02:00
Moti Cohen
155634502d
modules API: Support register unprefixed config parameters (#13656)
PR #10285 introduced support for modules to register four types of
configurations — Bool, Numeric, String, and Enum. Accessible through the
Redis config file and the CONFIG command.

With this PR, it will be possible to register configuration parameters
without automatically prefixing the parameter names. This provides
greater flexibility in configuration naming, enabling, for instance,
both `bf-initial-size` or `initial-size` to be defined in the module
without automatically prefixing with `<MODULE-NAME>.`. In addition it
will also be possible to create a single additional alias via the same
API. This brings us another step closer to integrate modules into redis
core.

**Example:** Register a configuration parameter `bf-initial-size` with
an alias `initial-size` without the automatic module name prefix, set
with new `REDISMODULE_CONFIG_UNPREFIXED` flag:
```
RedisModule_RegisterBoolConfig(ctx, "bf-initial-size|initial-size", default_val, optflags | REDISMODULE_CONFIG_UNPREFIXED, getfn, setfn, applyfn, privdata);
```
# API changes
Related functions that now support unprefixed configuration flag
(`REDISMODULE_CONFIG_UNPREFIXED`) along with optional alias:
```
RedisModule_RegisterBoolConfig
RedisModule_RegisterEnumConfig
RedisModule_RegisterNumericConfig
RedisModule_RegisterStringConfig
```

# Implementation Details:
`config.c`: On load server configuration, at function
`loadServerConfigFromString()`, it collects all unknown configurations
into `module_configs_queue` dictionary. These may include valid module
configurations or invalid ones. They will be validated later by
`loadModuleConfigs()` against the configurations declared by the loaded
module(s).
`Module.c:` The `ModuleConfig` structure has been modified to store now:
(1) Full configuration name (2) Alias (3) Unprefixed flag status -
ensuring that configurations retain their original registration format
when triggered in notifications.

Added error printout:
This change introduces an error printout for unresolved configurations,
detailing each unresolved parameter detected during startup. The last
line in the output existed prior to this change and has been retained to
systems relies on it:
```
595011:M 18 Nov 2024 08:26:23.616 # Unresolved Configuration(s) Detected:
595011:M 18 Nov 2024 08:26:23.616 #  >>> 'bf-initiel-size 8'
595011:M 18 Nov 2024 08:26:23.616 #  >>> 'search-sizex 32'
595011:M 18 Nov 2024 08:26:23.616 # Module Configuration detected without loadmodule directive or no ApplyConfig call: aborting
```

# Backward Compatibility:
Existing modules will function without modification, as the new
functionality only applies if REDISMODULE_CONFIG_UNPREFIXED is
explicitly set.

# Module vs. Core API Conflict Behavior
The new API allows to modules loading duplication of same configuration
name or same configuration alias, just like redis core configuration
allows (i.e. the users sets two configs with a different value, but
these two configs are actually the same one). Unlike redis core, given a
name and its alias, it doesn't allow have both configuration on load. To
implement it, it is required to modify DS `module_configs_queue` to
reflect the order of their loading and later on, during
`loadModuleConfigs()`, resolve pairs of names and aliases and which one
is the last one to apply. "Relaxing" this limitation can be deferred to
a future update if necessary, but for now, we error in this case.
2024-11-21 09:55:02 +02:00
nafraf
5b84dc9678
Fix module loadex command crash due to invalid config (#13653)
Fix to https://github.com/redis/redis/issues/13650

providing an invalid config to a module with datatype crashes when redis
tries to unload the module due to the invalid config

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2024-11-21 14:14:14 +08:00
Ozan Tezcan
54038811c0
Print command tokens on a crash when hide-user-data-from-log is enabled (#13639)
If `hide-user-data-from-log` config is enabled, we don't print client
argv in the crashlog to avoid leaking user info.
Though, debugging a crash becomes harder as we don't see the command
arguments causing the crash.

With this PR, we'll be printing command tokens to the log. As we have
command tokens defined in json schema for each command, using this data,
we can find tokens in the client argv.

e.g. 
`SET key value GET EX 10` ---> we'll print `SET * * GET EX *` in the
log.

Modules should introduce their command structure via
`RM_SetCommandInfo()`.
Then, on a crash we'll able to know module command tokens.
2024-11-11 09:34:18 +03:00