`kqueue` can apply events in batches:
> The kevent(), kevent64() and kevent_qos() system calls are used to
register events with the queue, and return any pending events to the
user. The changelist argument is a pointer to an array of kevent,
kevent64_s or kevent_qos_s structures, as defined in <sys/event.h>. All
changes contained in the changelist are applied before any pending
events
are read from the queue. The nchanges argument gives the size of
changelist.
This PR implements this batching for `kqueue`, which lets us avoid many
`kevent(2)` system calls.
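The batching idea can be sketched in portable C as follows. This is a minimal model, not the code from this PR: `change`, `change_batch`, `batch_add`, `batch_flush` and the `submit_fn` callback are hypothetical names, and the callback stands in for a single `kevent()` call that receives the whole changelist and its length.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_CAP 4 /* hypothetical changelist capacity */

/* Stand-in for struct kevent: only the fields needed for the illustration. */
typedef struct change {
    int fd;     /* descriptor being registered */
    int filter; /* e.g. read or write interest */
    int flags;  /* e.g. add or delete */
} change;

/* Submits the whole changelist at once. With kqueue this would be one
 * kevent() call that receives `list` as changelist and `n` as nchanges. */
typedef int (*submit_fn)(const change *list, size_t n, void *ud);

typedef struct change_batch {
    change list[BATCH_CAP];
    size_t n;
    submit_fn submit;
    void *ud;
} change_batch;

static void batch_init(change_batch *b, submit_fn submit, void *ud) {
    assert(submit != NULL);
    b->n = 0;
    b->submit = submit;
    b->ud = ud;
}

/* Submit all queued changes with a single call, then reset the batch. */
static int batch_flush(change_batch *b) {
    if (b->n == 0) return 0;
    int rc = b->submit(b->list, b->n, b->ud);
    b->n = 0;
    return rc;
}

/* Queue one change; flush first if the changelist is already full. */
static int batch_add(change_batch *b, int fd, int filter, int flags) {
    if (b->n == BATCH_CAP && batch_flush(b) != 0) return -1;
    b->list[b->n++] = (change){fd, filter, flags};
    return 0;
}

/* Example callback that only counts how often a submission happens. */
static int count_submit(const change *list, size_t n, void *ud) {
    (void)list;
    (void)n;
    (*(int *)ud)++;
    return 0;
}
```

With a capacity of 4, queuing five changes and flushing once at the end results in two submissions instead of five.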
## References
[FreeBSD - kqueue](https://man.freebsd.org/cgi/man.cgi?kqueue)
Using a different cluster implementation instead of the legacy one may
result in inconsistent slot migration logs, which can cause confusion.
Therefore, we should centralize these logs within the slot migration
process itself rather than relying on the specific cluster
implementation.
All commands that invoke streamReplyWithRange with a group argument
pass end as NULL, so the incorrect stream ID comparison is never
triggered. In other words, even if this bug remained unfixed, no
incident would occur.
This PR introduces a mechanism that allows a module to temporarily
disable trimming after an ASM migration operation so it can safely
finish ongoing asynchronous jobs that depend on keys in migrating (and
about to be trimmed) slots.
1. **ClusterDisableTrim/ClusterEnableTrim**
We introduce the `ClusterDisableTrim`/`ClusterEnableTrim` module APIs to
allow modules to disable/enable slot trimming.
```
/* Disable automatic slot trimming. */
int RM_ClusterDisableTrim(RedisModuleCtx *ctx);
/* Enable automatic slot trimming. */
int RM_ClusterEnableTrim(RedisModuleCtx *ctx);
```
**Please note**: Redis will not start any subsequent import or migrate
ASM operations while slot trimming is disabled, so modules must
re-enable trimming immediately after completing their pending work.
The only valid and meaningful time for a module to disable trimming
appears to be after the MIGRATE_COMPLETED event.
2. **REDISMODULE_OPEN_KEY_ACCESS_TRIMMED**
Added `REDISMODULE_OPEN_KEY_ACCESS_TRIMMED` to `RM_OpenKey()` so that
modules can operate on keys in unowned slots after trimming is paused.
A key that belongs to a trim job is no longer deleted when it is
accessed: `expireIfNeeded` returns `KEY_VALID` if
`EXPIRE_ALLOW_ACCESS_TRIMMED` is set; otherwise it returns `KEY_TRIMMED`
without deleting the key.
3. **REDISMODULE_CTX_FLAGS_TRIM_IN_PROGRESS**
We also extend RM_GetContextFlags() to include a flag
REDISMODULE_CTX_FLAGS_TRIM_IN_PROGRESS indicating whether a trimming job
is pending (due to trim pause) or in progress. Modules could
periodically poll this flag to synchronize their internal state, e.g.,
if a trim job was delayed or if the module incorrectly assumed trimming
was still active.
Bugfix: RM_SetClusterFlags could not clear a flag after enabling it first.
---------
Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com>
This PR introduces a new flag `--name <client-name>` to `redis-cli`.
This allows users to specify a persistent client name that remains
associated with the connection.
Implementation Details:
- Configuration: Added `client_name` field to the global config struct.
- Argument Parsing: Updated `parseOptions` to handle the `--name` flag.
- Unified Logic (`cliSetName`): Introduced a helper function cliSetName
that sends `CLIENT SETNAME <name>` immediately after the connection is
established. This ensures the name is set consistently for both RESP2
and RESP3 modes.
- Documentation: Updated `redis-cli --help` output to include the new
flag.
This PR closes #14585
The `repl-diskless-load` feature can effectively reduce full
synchronization time, but it may not be widely used.
The `swapdb` option needs double `maxmemory`, and `on-empty-db` only works
on the first full sync (the replica must have no data).
This PR introduces a new option, `flushdb`: always flush the entire
dataset before diskless load. If the diskless load fails, the replica
will lose all existing data.
Of course, it brings the risk of data loss, but it provides a choice if
you want to reduce full sync time and accept this risk.
This PR implements flexible keyword-based argument parsing for all 12
hash field expiration commands, allowing users to specify arguments in
any logical order rather than being constrained by rigid positional
requirements.
This enhancement follows Redis's modern design of keyword-based flexible
argument ordering and significantly improves user experience.
Commands with Flexible Parsing
HEXPIRE, HPEXPIRE, HEXPIREAT, HPEXPIREAT, HGETEX, HSETEX
Some examples:
HEXPIRE:
* All these are equivalent and valid:
HEXPIRE key EX 60 NX FIELDS 2 f1 f2
HEXPIRE key NX EX 60 FIELDS 2 f1 f2
HEXPIRE key FIELDS 2 f1 f2 EX 60 NX
HEXPIRE key FIELDS 2 f1 f2 NX EX 60
HEXPIRE key NX FIELDS 2 f1 f2 EX 60
HGETEX:
* All these are equivalent and valid:
HGETEX key EX 60 FIELDS 2 f1 f2
HGETEX key FIELDS 2 f1 f2 EX 60
HSETEX:
* All these are equivalent and valid:
HSETEX key FNX EX 60 FIELDS 2 f1 v1 f2 v2
HSETEX key EX 60 FNX FIELDS 2 f1 v1 f2 v2
HSETEX key FIELDS 2 f1 v1 f2 v2 FNX EX 60
HSETEX key FIELDS 2 f1 v1 f2 v2 EX 60 FNX
HSETEX key FNX FIELDS 2 f1 v1 f2 v2 EX 60
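Order-independent keyword parsing of this kind can be sketched as a loop that dispatches on each keyword it meets. The sketch below is a simplified model, not the parser added by this PR: `parsed_args` and `parse_flexible` are hypothetical names, and only the `EX <seconds>` and `FIELDS <numfields> <field...>` keywords from the HGETEX examples are handled.

```c
#include <assert.h>
#include <stdlib.h>
#include <strings.h> /* strcasecmp (POSIX) */

typedef struct parsed_args {
    long ttl_seconds; /* -1 when EX was not given */
    int first_field;  /* argv index of the first field, -1 if absent */
    long numfields;   /* number of fields following FIELDS */
} parsed_args;

/* Parse "EX <seconds>" and "FIELDS <n> <f1..fn>" in any order, starting at
 * argv[start]. Returns 0 on success, -1 on a syntax error. */
static int parse_flexible(int argc, const char **argv, int start,
                          parsed_args *out) {
    assert(argv != NULL && out != NULL);
    out->ttl_seconds = -1;
    out->first_field = -1;
    out->numfields = 0;

    int i = start;
    while (i < argc) {
        char *end;
        if (!strcasecmp(argv[i], "EX")) {
            /* Reject duplicates and a keyword without its value. */
            if (out->ttl_seconds != -1 || i + 1 >= argc) return -1;
            long v = strtol(argv[i + 1], &end, 10);
            if (end == argv[i + 1] || *end != '\0' || v < 0) return -1;
            out->ttl_seconds = v;
            i += 2;
        } else if (!strcasecmp(argv[i], "FIELDS")) {
            /* numfields must be present before it is read. */
            if (out->first_field != -1 || i + 1 >= argc) return -1;
            long n = strtol(argv[i + 1], &end, 10);
            if (end == argv[i + 1] || *end != '\0' || n <= 0) return -1;
            if (n > argc - (i + 2)) return -1; /* not enough fields left */
            out->numfields = n;
            out->first_field = i + 2;
            i += 2 + (int)n;
        } else {
            return -1; /* unknown keyword */
        }
    }
    /* FIELDS is mandatory in this model. */
    return out->first_field == -1 ? -1 : 0;
}
```

Because `FIELDS` consumes exactly `numfields` arguments, the loop can resume keyword matching right after the field list, which is what makes both orderings parse to the same result.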
1. GitHub has deprecated older macOS runners, and macos-13 is no longer
supported. Updating to macos-26 ensures that CI workflows continue to
run without interruption.
2. Previously, cross-platform-actions/action@v0.22.0 used runs-on:
macos-13. I checked the latest version of cross-platform-actions, and
the official examples now use runs-on: ubuntu. I think we can switch
from macOS to Ubuntu.
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
Decouple kvstore from its metadata by introducing `kvstoreType` structure of
callbacks. This resolves the abstraction layer violation of having kvstore
include `server.h` directly.
Move (again) cluster slot statistics to per slot dicts' metadata. The callback
`canFreeDict` is used to prevent freeing empty per slot dicts from losing per
slot statistics.
Co-authored-by: Ran Tidhar <ran.tidhar@redis.com>
Fixes #14538
If the master uses diskless synchronization and the replica uses
diskless load, we can disable RDB compression to reduce full sync time.
I tested on AWS and found we could reduce time by 20-40%.
In terms of implementation, when the replica can use diskless load, it
sends `replconf rdb-no-compress 1` to the master to request an RDB
without compression.
If your network is slow, please disable `repl-diskless-load`, and maybe
even `repl-diskless-sync`.
---------
Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com>
The cluster implementation may be temporarily unavailable and return an
error to the `ASM_EVENT_MIGRATE_PREP` event to prevent starting a new
migration. Although this is most likely a transient condition, the
source node has no way to distinguish it from a real error, so it must
fail the import attempt and start a new one.
In Redis, failing an attempt is cheap, but in other cluster
implementations it may require cleaning up resources and can cause
unnecessary disruption.
This PR introduces a new `-NOTREADY` error reply for the `CLUSTER
SYNCSLOTS SYNC` command. When the source replies with `-NOTREADY`, the
destination can recognize the condition as transient and periodically
retry the `CLUSTER SYNCSLOTS SYNC` step instead of failing the
attempt.
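On the destination side, the decision can be sketched as a small reply classifier. This is an illustrative model only: `sync_action` and `classify_sync_reply` are hypothetical names, and treating every non-error reply as success is an assumption, since the text does not describe the success reply.

```c
#include <assert.h>
#include <string.h>

typedef enum sync_action {
    SYNC_PROCEED, /* reply is not an error: continue with the migration */
    SYNC_RETRY,   /* transient condition: resend the step later */
    SYNC_FAIL     /* any other error: fail the attempt */
} sync_action;

/* Classify the raw reply line received for CLUSTER SYNCSLOTS SYNC. */
static sync_action classify_sync_reply(const char *reply) {
    assert(reply != NULL);
    if (strncmp(reply, "-NOTREADY", 9) == 0) return SYNC_RETRY;
    if (reply[0] == '-') return SYNC_FAIL;
    return SYNC_PROCEED; /* assumption: non-error replies mean success */
}
```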
This PR does not introduce any behavioral changes.
- Refactored and moved verifyClusterConfigWithData() into cluster.c.
- Refactored and centralized ASM and slot-stats initialization
functions.
These changes place shared logic in a common location so it can be
reused by different cluster implementations.
If we have to kill an RDB child at shutdown, we wait for the child
process to exit and then resume the shutdown. We did not clear the
child_pid variable, since we were going to terminate anyway. But if
the shutdown is then aborted due to another issue further down that
function, we would try to kill that child again, and the waitpid would
never return.
Reproduced in the test "SHUTDOWN can proceed if shutdown command was
with nosave"
Some error handling paths didn't remove the tags they added. More
importantly, if the start_server proc is given the "tags" argument more
than once, it only removed the last one on exit.
This problem exists in start_cluster in list.tcl, and the result was
that the "external:skip cluster modules" tags were not removed.
An incorrect constant value prevented the configuration from being
broadcast immediately. As a result, the config was only broadcast later
as part of periodic ping messages, causing other nodes to learn about
the configuration change with a delay.
Introduced by https://github.com/redis/redis/pull/14504.
Currently, there is a `TODO` regarding adding a check if the backup
directory exists in `redis-cli`.
This PR adds a check to improve usability:
- If the backup directory does not exist, the user is informed with an
error message.
- If the specified path exists but is not a directory, an error is now
properly reported.
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
In #14121, the order of the SCAN filters was changed: before #14121 the
order was pattern, expiration, and type; after #14121 the pattern filter
became last. This was a breaking change to the original behavior, which
causes SCAN with a pattern to also remove expired keys.
This PR reorders the filters to be consistent with the original behavior
and extends a test to cover this scenario.
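Why the order matters can be shown with a small model. It assumes that the expiration check deletes expired keys as a side effect; the `entry` type, the prefix-based `matches_prefix` stand-in for pattern matching, and the other names are hypothetical and not taken from the Redis source.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef struct entry {
    const char *name;
    const char *type;
    bool expired;
    bool deleted; /* set when the expiration check removes the key */
} entry;

/* Stand-in for pattern matching: a plain prefix test. */
static bool matches_prefix(const entry *e, const char *prefix) {
    return strncmp(e->name, prefix, strlen(prefix)) == 0;
}

/* Stand-in for the expiration check: deletes the key if it is expired. */
static bool expire_if_needed(entry *e) {
    if (e->expired) e->deleted = true;
    return e->expired;
}

/* Filters in the original order: pattern, expiration, type.
 * Returns true when the key belongs in the reply. */
static bool scan_filter(entry *e, const char *prefix, const char *type) {
    assert(e != NULL);
    if (prefix && !matches_prefix(e, prefix)) return false; /* left untouched */
    if (expire_if_needed(e)) return false;
    if (type && strcmp(e->type, type) != 0) return false;
    return true;
}
```

With the pattern filter first, an expired key that does not match the pattern is skipped before the expiration check runs, so the scan does not delete it.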
This PR fixes an issue (#14541) where EXEC’s ACL recheck was still
performed during AOF loading. This could cause AOF loading to fail if
the ACL rules had been changed and no longer allowed some of the
commands in a MULTI-EXEC block.
Broadcast configuration changes immediately to the other cluster nodes.
This ensures that all nodes are aware of configuration updates and the
node’s latest epoch immediately, preventing subsequent messages from
being incorrectly rejected as stale.
This issue was observed in an Atomic Slot Migration (ASM) scenario:
- Node B joins the cluster, but there is a config epoch collision with
Node A.
- Node A increments its epoch but has not yet broadcast the new
configuration.
- Meanwhile, Node B starts an import operation. When it finishes, Node B
bumps its epoch and broadcasts the new configuration immediately.
- However, since Node A already has a higher epoch, it ignores Node B’s
update, causing the import operation to fail.
## Problem
When destroying a consumer group with `XGROUP DESTROY`, the cached
`min_cgroup_last_id` was not being invalidated. This caused incorrect
behavior when using `XDELEX` with the `ACKED` option, as the cache still
referenced the destroyed group's `last_id`.
## Solution
Invalidate the `min_cgroup_last_id` cache when the destroyed group's
`last_id` equals the cached minimum. The cache will be recalculated on
the next call to `streamEntryIsReferenced()`.
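The invalidation rule can be modelled as follows. This is a simplified sketch and not the Redis data structures: `stream_model`, `min_last_id` and `destroy_group` are hypothetical names, with `min_last_id` playing the role of the lazy recalculation on the next lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct stream_id { uint64_t ms, seq; } stream_id;

typedef struct stream_model {
    stream_id group_last_id[8]; /* last_id of each consumer group */
    int ngroups;
    stream_id cached_min;       /* cached minimum last_id */
    bool cache_valid;
} stream_model;

static int id_cmp(stream_id a, stream_id b) {
    if (a.ms != b.ms) return a.ms < b.ms ? -1 : 1;
    if (a.seq != b.seq) return a.seq < b.seq ? -1 : 1;
    return 0;
}

/* Return the minimum last_id, recomputing it only if the cache is stale. */
static bool min_last_id(stream_model *s, stream_id *out) {
    if (s->ngroups == 0) return false;
    if (!s->cache_valid) {
        s->cached_min = s->group_last_id[0];
        for (int i = 1; i < s->ngroups; i++)
            if (id_cmp(s->group_last_id[i], s->cached_min) < 0)
                s->cached_min = s->group_last_id[i];
        s->cache_valid = true;
    }
    *out = s->cached_min;
    return true;
}

/* Remove group `idx`; drop the cache only if that group defined the minimum. */
static void destroy_group(stream_model *s, int idx) {
    assert(idx >= 0 && idx < s->ngroups);
    if (s->cache_valid && id_cmp(s->group_last_id[idx], s->cached_min) == 0)
        s->cache_valid = false;
    s->group_last_id[idx] = s->group_last_id[--s->ngroups];
}
```

Destroying a group whose `last_id` is above the cached minimum leaves the cache intact, so the recalculation cost is only paid when the minimum may actually have changed.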
---------
Co-authored-by: guybe7 <guy.benoish@redislabs.com>
This PR is a follow-up to #14200, where we preferred storing iterators on the
stack rather than allocating them on the heap. Here we continue this for
iterators over hashes, lists, sets and kvstores.
Quicklist iterators still use heap allocation and will be addressed
soon. The reason is that `NULL` is a perfectly valid quicklist iterator value,
and handling this would be better reviewed separately from the mostly
mechanical changes here.
After https://github.com/redis/redis/pull/14226 module tests started
running with ASan enabled.
`auth.c` blocks the user on auth and spawns a thread that sleeps for
0.5s before unblocking the client and returning.
A Tcl test unloads the module, which may happen just after the spawned
thread unblocks the client. In that case, if the unloading finishes fast
enough, the spawned thread may try to execute code from the module's
dynamic library after it has been unloaded, resulting in a segfault.
Fix: wait on the thread in the module's OnUnload method.
These commands behave as DEL and SET (blindly Remove or Overwrite) when
they don't get IF* flags, and require the value of the key when they do
run with these flags.
Make sure they have the VARIABLE_FLAGS flag and a getKeysProc that can
provide the right flags depending on the arguments used (the plain
flags, used when arguments are unknown, are the common-denominator ones).
Move lookupKey call in DELEX to avoid double lookup, which also means
(some, namely arity) syntax errors are checked (and reported) before
checking the existence of the key.
Per-slot memory statistics are collected only if
'cluster-slot-stats-enabled' is enabled on startup.
Document this behavior in 'redis.conf'.
---------
Co-authored-by: Oran Agra <oran@redislabs.com>
### Problem
The XREADGROUP command with CLAIM parameter incorrectly returns delivery
metadata (idle time and delivery count) as strings instead of integers,
contradicting the Redis specification.
### Solution
Updated the XREADGROUP CLAIM implementation to return delivery metadata
fields as integers, aligning with the documented specification and
maintaining consistency with Redis response conventions.
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
Verify that fields keep their expiration time following an RDB load.
Verify that hashes that had HFEs are not counted in subexpiry (as
reported by `info keyspace`) following an RDB load.
A flamegraph shows that, even after PR #14480, there is a lot of
redundant work when propagating multiple XCLAIMs within an
XREADGROUP.
This PR refactors streamPropagateXCLAIM to add a new static inline
variant, `streamPropagateXCLAIMCopyFree()`, which accepts pre-created
`robj*` arguments.
This enables reusing argument objects across multiple XCLAIM
propagations, reducing repeated creation and destruction costs during
high-throughput consumer group operations.
The logic added in https://github.com/redis/redis/pull/14402
introduced overhead to XREADGROUP even when the added feature is not
used.
This PR tries to mitigate it, by removing unnecessary streamEncodeID()
calls and redundant byte-swapping operations from the stream iterator
hot path.
By comparing stream IDs directly in native-endian form, we eliminate
repeated encoding and memcmp() calls that were responsible for a
significant portion of total CPU time during stream iteration.
Additionally, endian conversion helpers are modernized to leverage
compiler-provided intrinsics (__builtin_bswap*) for single-instruction
byte-swaps on supported compilers.
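The equivalence this optimization relies on can be sketched in a few lines: comparing the two 64-bit parts of an ID directly gives the same ordering as encoding both IDs in big-endian form and calling `memcmp()`. The names below (`stream_id`, `compare_native`, `compare_encoded`, `swap64`) are hypothetical and this is not the code from the PR; `__builtin_bswap64` is the GCC/Clang builtin, with a shift-based fallback for other compilers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct stream_id { uint64_t ms, seq; } stream_id;

/* Byte-swap helper: compiler builtin where available, loop otherwise. */
static inline uint64_t swap64(uint64_t v) {
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_bswap64(v);
#else
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (r << 8) | (v & 0xff);
        v >>= 8;
    }
    return r;
#endif
}

/* Encode an ID as 16 big-endian bytes (ms first, then seq). */
static void encode_id(unsigned char buf[16], stream_id id) {
    for (int i = 0; i < 8; i++) {
        buf[i] = (unsigned char)(id.ms >> (56 - 8 * i));
        buf[8 + i] = (unsigned char)(id.seq >> (56 - 8 * i));
    }
}

/* Slow path: encode both IDs, then compare the byte strings. */
static int compare_encoded(stream_id a, stream_id b) {
    unsigned char ea[16], eb[16];
    encode_id(ea, a);
    encode_id(eb, b);
    return memcmp(ea, eb, 16);
}

/* Fast path: compare the integers directly in native byte order. */
static int compare_native(stream_id a, stream_id b) {
    if (a.ms != b.ms) return a.ms < b.ms ? -1 : 1;
    if (a.seq != b.seq) return a.seq < b.seq ? -1 : 1;
    return 0;
}
```

Both functions agree on the sign of the result, so the native comparison can replace the encode-and-`memcmp()` sequence wherever only the ordering is needed.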
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
Pull request #14039 introduced `CLUSTER SLOT-STATS` command based on
valkey's implementation but unintentionally picked up unnecessary
changes to getKeySlot() caching behavior.
Reported-by: ran.tidhar@redis.com
This issue was introduced by redis/redis#14130.
The problem is that when the number of IDs exceeds STREAMID_STATIC_VECTOR_LEN (8), the code did not reallocate memory for the IDs array, which causes a stack buffer overflow.
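The usual shape of this pattern is a small stack array with a heap fallback for larger inputs. The sketch below illustrates it under that assumption; `STATIC_VECTOR_LEN` mirrors the value 8 mentioned above, while `stream_id` and `sum_ids` are hypothetical names and not the code touched by the fix.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define STATIC_VECTOR_LEN 8 /* mirrors STREAMID_STATIC_VECTOR_LEN (8) */

typedef struct stream_id { uint64_t ms, seq; } stream_id;

/* Fill `count` IDs and sum their ms parts. The stack buffer is used when it
 * is large enough; otherwise the array is allocated on the heap.
 * Returns 0 on success, -1 if the allocation fails. */
static int sum_ids(size_t count, uint64_t *sum) {
    stream_id static_ids[STATIC_VECTOR_LEN];
    stream_id *ids = static_ids;
    assert(sum != NULL);

    if (count > STATIC_VECTOR_LEN) {
        /* Without this branch, writes past index 7 would overflow the
         * stack buffer. */
        ids = malloc(sizeof(*ids) * count);
        if (ids == NULL) return -1;
    }

    for (size_t i = 0; i < count; i++) {
        ids[i].ms = i + 1;
        ids[i].seq = 0;
    }

    *sum = 0;
    for (size_t i = 0; i < count; i++) *sum += ids[i].ms;

    if (ids != static_ids) free(ids);
    return 0;
}
```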
When the HGETEX command is used with the FIELDS option but without the required
numfields argument, the server would attempt to access an out-of-bounds argv index.
This PR adds a check to ensure numfields is present before accessing it,
returning an error if it is missing. Also includes a test case to cover this scenario.
The MurmurHash64A function in hyperloglog.c used an int parameter for length,
causing integer overflow when processing PFADD entries larger than 2GB.
This could lead to server crashes.
Changed the len parameter from int to size_t to properly handle
large inputs up to SIZE_MAX in HyperLogLog operations.
Refer to the implementation in facebook/mcrouter@2dbee3d/mcrouter/lib/fbi/hash.c#L54
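For illustration, a MurmurHash64A variant that takes a `size_t` length could look like the sketch below. It is not the exact code from `hyperloglog.c` or from the referenced implementation: the name `murmur_hash64a` is hypothetical, and blocks are read byte by byte as little-endian so that the result does not depend on host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* MurmurHash64A with a size_t length, so inputs larger than INT_MAX bytes
 * are not truncated or turned into a negative length. */
static uint64_t murmur_hash64a(const void *key, size_t len, uint32_t seed) {
    assert(key != NULL);
    const uint64_t m = 0xc6a4a7935bd1e995ULL;
    const int r = 47;
    const uint8_t *data = key;
    const uint8_t *end = data + (len - (len & 7));
    uint64_t h = seed ^ ((uint64_t)len * m);

    /* Process the input in 8-byte little-endian blocks. */
    while (data != end) {
        uint64_t k = 0;
        for (int i = 7; i >= 0; i--) k = (k << 8) | data[i];
        k *= m;
        k ^= k >> r;
        k *= m;
        h ^= k;
        h *= m;
        data += 8;
    }

    /* Mix in the remaining 0..7 bytes. */
    switch (len & 7) {
    case 7: h ^= (uint64_t)data[6] << 48; /* fall through */
    case 6: h ^= (uint64_t)data[5] << 40; /* fall through */
    case 5: h ^= (uint64_t)data[4] << 32; /* fall through */
    case 4: h ^= (uint64_t)data[3] << 24; /* fall through */
    case 3: h ^= (uint64_t)data[2] << 16; /* fall through */
    case 2: h ^= (uint64_t)data[1] << 8;  /* fall through */
    case 1: h ^= (uint64_t)data[0];
            h *= m;
    }

    h ^= h >> r;
    h *= m;
    h ^= h >> r;
    return h;
}
```

With an `int` parameter, a length above `INT_MAX` cannot be represented, which is why the block boundary computed from it ends up wrong; `size_t` keeps the full length.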