redis/tests
Mincho Paskalev c93e4a62c6
Add hotkeys detection (#14680)
# Description

Introducing a new method for identifying hotkeys inside a redis server
during a tracking time period.

Hotkeys in this context are defined by two metrics:
* Percentage of CPU time spent on the key out of the total CPU time
during the tracking period
* Percentage of network bytes (input+output) used for the key out of the
total network bytes used by Redis during the tracking period
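Both metrics are plain ratios over the same tracking period. Here is a minimal sketch of that arithmetic. The names `hotkey_shares`, `per_key` and `total` are hypothetical and not part of Redis.

```python
def hotkey_shares(per_key, total):
    """Return each key's share of `total` as a percentage.

    `per_key` maps key name -> amount attributed to that key (CPU time in ms,
    or input+output network bytes); `total` is the server-wide amount over
    the same tracking period. Hypothetical helper, not Redis code.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return {key: 100.0 * amount / total for key, amount in per_key.items()}


cpu_ms = {"user:1": 30, "user:2": 10}
print(hotkey_shares(cpu_ms, total=200))  # {'user:1': 15.0, 'user:2': 5.0}
```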

## Usage

Although the API is subject to change, the general idea is for the user
to start a hotkeys tracking session that runs for some time. The keys'
metrics are recorded in a probabilistic structure, from which the user
can then fetch the top K keys.

### Current API

```
HOTKEYS START
            <METRICS count [CPU] [NET]>
            [COUNT k] 
            [DURATION duration]
            [SAMPLE ratio]
            [SLOTS count slot…]

HOTKEYS GET
HOTKEYS STOP
HOTKEYS RESET

```
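The START syntax above can be illustrated by a small argument builder. This is a hypothetical helper that only assembles tokens in the order shown and does no server I/O. Whether the server accepts the options in another order, and which unit DURATION uses, are not stated here and are left open.

```python
def hotkeys_start_args(metrics, count=None, duration=None, sample=None, slots=None):
    """Build a HOTKEYS START argument list following the syntax above.

    Hypothetical helper for illustration; it sends nothing to a server.
    """
    metrics = [m.upper() for m in metrics]
    if not metrics or any(m not in ("CPU", "NET") for m in metrics):
        raise ValueError("METRICS needs one or more of CPU, NET")
    if len(set(metrics)) != len(metrics):
        raise ValueError("duplicate metric")
    # METRICS is prefixed with the number of metrics that follow.
    args = ["HOTKEYS", "START", "METRICS", str(len(metrics)), *metrics]
    if count is not None:
        args += ["COUNT", str(count)]
    if duration is not None:
        args += ["DURATION", str(duration)]
    if sample is not None:
        args += ["SAMPLE", str(sample)]
    if slots:
        # SLOTS is likewise prefixed with the number of slots.
        args += ["SLOTS", str(len(slots)), *map(str, slots)]
    return args


print(hotkeys_start_args(["cpu", "net"], count=10, sample=100))
# ['HOTKEYS', 'START', 'METRICS', '2', 'CPU', 'NET', 'COUNT', '10', 'SAMPLE', '100']
```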

### HOTKEYS START

Start a tracking session if none has been started yet, or if the previous
one was stopped or reset. Return an error if a session is in progress.

* METRICS count [CPU] [NET] - choose one or more metrics to track
* COUNT k - track top K keys
* DURATION duration - preset how long the tracking session should last
* SAMPLE ratio - a key is tracked with probability 1/ratio
* SLOTS count slot... - Only track a key if it's in a slot amongst the
chosen ones
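The text does not say how a key's slot is determined. A reasonable assumption is the standard Redis Cluster hash slot: CRC16/XMODEM of the key (or of its `{hash tag}`) modulo 16384. The sketch below implements that rule. `in_selected_slots` is a hypothetical helper, and it treats an empty selection as "all slots".

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16/XMODEM (poly 0x1021, init 0), the variant Redis Cluster uses."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: bytes) -> int:
    """Cluster hash slot; hashes only the {tag} when it is non-empty."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end > start + 1:  # a '}' follows '{' with at least one byte between
            key = key[start + 1:end]
    return crc16_xmodem(key) % 16384


def in_selected_slots(key: bytes, selected: set) -> bool:
    """Hypothetical SLOTS filter: an empty selection tracks every slot."""
    return not selected or key_slot(key) in selected


print(key_slot(b"somekey"))  # 11058
```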

### HOTKEYS GET

Return an array with the tracked metrics and other metadata, or (nil) if
no tracking was started or it was reset.

```
127.0.0.1:6379> hotkeys get
1) "tracking-active"
2) 1
3) "sample-ratio"
4) <ratio>
5) "selected-slots" (empty array if no slots selected)
6) 1) 0
   2) 5
   3) 6
7) "sampled-command-selected-slots-ms" (show on condition sample-ratio > 1 and selected-slots != empty-array)
8) <time-in-milliseconds>
9) "all-commands-selected-slots-ms" (show on condition selected-slots != empty-array)
10) <time-in-milliseconds>
11) "all-commands-all-slots-ms"
12) <time-in-milliseconds>
13) "net-bytes-sampled-commands-selected-slots" (show on condition sample-ratio > 1 and selected-slots != empty-array)
14) <num-bytes>
15) "net-bytes-all-commands-selected-slots" (show on condition selected-slots != empty-array)
16) <num-bytes>
17) "net-bytes-all-commands-all-slots"
18) <num-bytes>
19) "collection-start-time-unix-ms"
20) <start-time-unix-timestamp-in-ms>
21) "collection-duration-ms"
22) <duration-in-milliseconds>
23) "used-cpu-sys-ms"
24) <duration-in-millisec>
25) "used-cpu-user-ms"
26) <duration-in-millisec>
27) "total-net-bytes"
28) <num-bytes>
29) "by-cpu-time"
30) 1) key-1_1
    2) <millisec>
    ...
    19) key-10_1
    20) <millisec>
31) "by-net-bytes"
32) 1) key-1_2
    2) <num-bytes>
    ...
    19) key-10_2
    20) <num-bytes>

```
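The GET reply is a flat list of name/value pairs, and the two top-K lists are themselves flat key/metric lists. A client can fold it into a mapping as sketched below. This assumes the reply has already been decoded into Python lists and strings by some client, and that every name, including "by-net-bytes", is a plain string. Conditional fields are simply absent from the result.

```python
def reply_to_dict(reply):
    """Fold a flat [name, value, name, value, ...] HOTKEYS GET reply into a dict."""
    if reply is None:
        return None  # (nil): no tracking was started, or it was reset
    if len(reply) % 2:
        raise ValueError("expected an even number of elements")
    result = {}
    for name, value in zip(reply[0::2], reply[1::2]):
        if name in ("by-cpu-time", "by-net-bytes"):
            # Nested flat list of key, metric, key, metric, ...
            value = list(zip(value[0::2], value[1::2]))
        result[name] = value
    return result


sample = ["tracking-active", 1, "sample-ratio", 1, "selected-slots", [],
          "by-cpu-time", ["key-1", 12, "key-2", 7]]
print(reply_to_dict(sample)["by-cpu-time"])  # [('key-1', 12), ('key-2', 7)]
```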

### HOTKEYS STOP

Stop the tracking session. Results can still be fetched with `HOTKEYS GET`.

### HOTKEYS RESET

Release the resources used for hotkeys tracking. Only allowed when
tracking is stopped; return an error if a tracking session is active.
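The START/STOP/RESET/GET rules above form a small state machine. The toy model below captures only the session state and no metrics. It is not Redis code, and the error texts are made up. Two behaviours are assumptions the text does not specify: `tracking-active` reports 0 after STOP, and STOP without an active session is a no-op.

```python
class HotkeysSession:
    """Toy model of the HOTKEYS START/STOP/RESET/GET state rules."""

    def __init__(self):
        self.active = False
        self.has_results = False  # False plays the role of a (nil) GET reply

    def start(self):
        # Allowed when nothing was started, or the last session was stopped/reset.
        if self.active:
            raise RuntimeError("a tracking session is already in progress")
        self.active = True
        self.has_results = True

    def stop(self):
        # Results stay readable through get() until reset().
        self.active = False

    def reset(self):
        if self.active:
            raise RuntimeError("cannot reset while tracking is active")
        self.has_results = False

    def get(self):
        if not self.has_results:
            return None
        # Assumption: tracking-active is 0 once the session is stopped.
        return {"tracking-active": 1 if self.active else 0}
```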

## Additional changes

The `INFO` command now has a "hotkeys" section with 3 fields:
* tracking_active - a boolean flag indicating whether hotkeys are
currently being tracked.
* used-memory - memory overhead of the structures used for hotkeys
tracking.
* cpu-time - time in ms spent updating the hotkeys structure.
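`INFO` replies use `# Section` header lines followed by `field:value` lines, so the three fields can be read with a generic parser like the one below. The sample reply text is hypothetical: the exact section header and the values are made up. Only the field names come from the list above.

```python
def parse_info(text: str) -> dict:
    """Parse an INFO-style reply: '# Section' headers and 'field:value' lines."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or section header
        name, _, value = line.partition(":")
        fields[name] = value
    return fields


# Hypothetical reply; header name and values are invented for illustration.
reply = "# Hotkeys\r\ntracking_active:1\r\nused-memory:12345\r\ncpu-time:3\r\n"
info = parse_info(reply)
print(info["tracking_active"], info["used-memory"], info["cpu-time"])  # 1 12345 3
```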

## Implementation

Independent of the API, the implementation is based on a probabilistic
structure, [Cuckoo Heavy Keeper](https://dl.acm.org/doi/abs/10.14778/3746405.3746434)
(CHK), with an added min-heap to keep track of the top K hotkeys' names.
CHK is loosely based on
[HeavyKeeper](https://www.usenix.org/conference/atc18/presentation/gong)
(used in RedisBloom's TopK) but has higher throughput.
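The CHK layout itself is not described here. The sketch below shows the older technique CHK builds on instead: a simplified HeavyKeeper whose buckets hold a fingerprint and a count that decays with probability `b^-count`, feeding a size-K min-heap of key names. It is not Redis's implementation. The names, the table sizes, the decay base 1.08 and the unit-weight updates (instead of CPU-time or byte weights) are all assumptions chosen for illustration.

```python
import hashlib
import heapq
import random


class HeavyKeeperTopK:
    """Simplified HeavyKeeper plus a size-K min-heap (illustrative only)."""

    def __init__(self, k, width=64, depth=4, decay=1.08, seed=0):
        self.k, self.width, self.depth, self.decay = k, width, depth, decay
        self.rng = random.Random(seed)
        # depth rows of width buckets; each bucket is [fingerprint, count].
        self.rows = [[[None, 0] for _ in range(width)] for _ in range(depth)]
        self.heap = []  # (estimated_count, key), smallest estimate on top

    def _hash(self, key, salt):
        digest = hashlib.blake2b(f"{salt}:{key}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big")

    def add(self, key):
        fp = self._hash(key, "fp")
        estimate = 0
        for row in range(self.depth):
            bucket = self.rows[row][self._hash(key, row) % self.width]
            if bucket[1] == 0:
                bucket[0], bucket[1] = fp, 1  # empty bucket: claim it
            elif bucket[0] == fp:
                bucket[1] += 1  # same key: count up
            elif self.rng.random() < self.decay ** -bucket[1]:
                # Exponential decay: large counts are rarely decremented.
                bucket[1] -= 1
                if bucket[1] == 0:
                    bucket[0], bucket[1] = fp, 1  # take over the bucket
            if bucket[0] == fp:
                estimate = max(estimate, bucket[1])
        self._update_heap(key, estimate)

    def _update_heap(self, key, estimate):
        for i, (_, existing) in enumerate(self.heap):
            if existing == key:  # already tracked: refresh its estimate
                self.heap[i] = (estimate, key)
                heapq.heapify(self.heap)
                return
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, (estimate, key))
        elif estimate > self.heap[0][0]:
            heapq.heapreplace(self.heap, (estimate, key))  # evict current minimum

    def top(self):
        return sorted(self.heap, reverse=True)
```

In this design the sketch bounds memory to `width * depth` buckets regardless of how many distinct keys appear. The min-heap keeps only the K names worth reporting.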

Sampling uses a random, fixed probability set by the
`HOTKEYS START SAMPLE <ratio>` parameter: each key is sampled with
probability `1/ratio`.
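A fixed-probability sampler reduces to one uniform draw per key access, as in the sketch below. The function name and the use of Python's `random` module are illustrative assumptions; how Redis draws its random numbers is not described here.

```python
import random


def should_sample(ratio: int, rng: random.Random) -> bool:
    """Track one key access with probability 1/ratio (ratio == 1 tracks all)."""
    if ratio < 1:
        raise ValueError("ratio must be >= 1")
    # randrange(ratio) is uniform over 0..ratio-1, so == 0 has probability 1/ratio.
    return rng.randrange(ratio) == 0


rng = random.Random(42)
trials = 100_000
hits = sum(should_sample(10, rng) for _ in range(trials))
print(hits / trials)  # close to 0.1
```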

## Performance implications

With a low enough sample rate (that is, a high enough `ratio` in
`HOTKEYS START SAMPLE <ratio>`), the performance hit is negligible.
Tracking every key, however, can cost up to 15% throughput in [the worst
case](https://github.com/redis/redis-benchmarks-specification/blob/main/redis_benchmarks_specification/test-suites/memtier_benchmark-1Mkeys-string-get-10B-pipeline-500.yml),
as measured with this
[benchmark suite](https://github.com/redis/redis-benchmarks-specification/).

---------

Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com>
Co-authored-by: Slavomir Kaslev <slavomir.kaslev@gmail.com>
Co-authored-by: debing.sun <debing.sun@redis.com>
2026-01-16 17:15:28 +02:00
assets Optimistic locking for string objects - compare-and-set and compare-and-delete (#14435) 2025-10-21 10:32:49 +03:00
cluster Add Atomic Slot Migration (ASM) support (#14414) 2025-10-22 15:56:20 +03:00
helpers Fix daily CI for atomic slot migration (#14459) 2025-10-25 09:00:33 +08:00
integration Fix infinite loop during reverse iteration due to invalid numfields of corrupted stream (#14472) 2026-01-05 21:16:53 +08:00
modules Modules KeyMeta (Keys Metadata) (#14445) 2026-01-15 23:11:17 +02:00
sentinel Fix timing issue for sentinel master-reboot test (#14312) 2025-09-05 14:49:19 +08:00
support fix test tag leakage that can result in skipping tests (#14572) 2025-11-26 09:13:21 +02:00
tmp minor fixes to the new test suite, html doc updated 2010-05-14 18:48:33 +02:00
unit Add hotkeys detection (#14680) 2026-01-16 17:15:28 +02:00
vectorset Add daily CI for vectorset (#14302) 2025-12-10 08:52:43 +08:00
instances.tcl Fix some daily CI issues (#14217) 2025-07-28 10:53:57 +08:00
README.md Add thread sanitizer run to daily CI (#13964) 2025-06-02 10:13:23 +03:00
test_helper.tcl Add Atomic Slot Migration (ASM) support (#14414) 2025-10-22 15:56:20 +03:00

Redis Test Suite

The normal execution mode of the test suite involves starting and manipulating local redis-server instances, inspecting process state, log files, etc.

The test suite also supports execution against an external server, which is enabled using the --host and --port parameters. When executing against an external server, tests tagged external:skip are skipped.

There are additional runtime options that can further adjust the test suite to match different external server configurations:

| Option | Impact |
|---|---|
| --singledb | Only use database 0; don't assume other databases are supported. |
| --ignore-encoding | Skip all checks for specific encodings. |
| --ignore-digest | Skip key value digest validations. |
| --cluster-mode | Run in strict Redis Cluster compatibility mode. |
| --large-memory | Enable tests that consume more than 100 MB. |

Tags

Tags are applied to tests to classify them according to the subsystem they test, but also to indicate compatibility with different run modes and required capabilities.

Tags can be applied in different context levels:

  • a start_server context
  • a tags context that bundles several tests together
  • a single test context

The following compatibility and capability tags are currently used:

| Tag | Indicates |
|---|---|
| external:skip | Not compatible with external servers. |
| cluster:skip | Not compatible with --cluster-mode. |
| large-memory | Test that requires more than 100 MB. |
| tls:skip | Not compatible with --tls. |
| tsan:skip | Not compatible with running under thread sanitizer. |
| needs:repl | Uses replication and needs to be able to SYNC from the server. |
| needs:debug | Uses the DEBUG command or other debugging-focused commands (like OBJECT REFCOUNT). |
| needs:pfdebug | Uses the PFDEBUG command. |
| needs:config-maxmemory | Uses CONFIG SET to manipulate the memory limit, eviction policies, etc. |
| needs:config-resetstat | Uses CONFIG RESETSTAT to reset statistics. |
| needs:reset | Uses RESET to reset client connections. |
| needs:save | Uses SAVE or BGSAVE to create an RDB file. |

When using an external server (--host and --port), filtering using the external:skip tag is done automatically.

When using --cluster-mode, filtering using the cluster:skip tag is done automatically.

When not using --large-memory, filtering using the large-memory tag is done automatically.

In addition, tests can be filtered explicitly with --tags, where a tag prefixed with - excludes the tests that carry it. For example, to run tests on a server that does not permit SYNC use:

./runtest --host <host> --port <port> --tags -needs:repl
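The skip rules above can be modeled roughly as follows. This is a hypothetical Python model, not the Tcl code the test suite actually runs. It covers only deny rules (automatic ones plus `-`-prefixed `--tags` entries) and ignores allow-listed tags. The tag names come from the table above.

```python
def is_skipped(test_tags, external=False, cluster_mode=False,
               large_memory=False, extra_tags=()):
    """Return True if a test carrying `test_tags` would be filtered out."""
    denied = set()
    if external:  # --host/--port given
        denied.add("external:skip")
    if cluster_mode:  # --cluster-mode given
        denied.add("cluster:skip")
    if not large_memory:  # --large-memory not given
        denied.add("large-memory")
    # '--tags -needs:repl' style arguments: a leading '-' denies that tag.
    denied.update(t[1:] for t in extra_tags if t.startswith("-"))
    return bool(denied & set(test_tags))


print(is_skipped(["needs:repl"], external=True, extra_tags=["-needs:repl"]))  # True
```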