|
# Description
Introduces a new method for identifying hotkeys inside a Redis server
during a tracking period.
Hotkeys in this context are defined by two metrics:
* Percentage of CPU time spent on the key, out of the total time during
the tracking period
* Percentage of network bytes (input+output) used for the key, out of the
total network bytes used by Redis during the tracking period
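The two percentages can be illustrated with a few lines of arithmetic. This is a minimal sketch, not the server's code: the function name and its parameters are hypothetical, and the inputs are assumed to be the per-key and total figures collected over one tracking period.

```python
def hotness_percentages(key_cpu_ms, key_net_bytes, total_cpu_ms, total_net_bytes):
    """Return (cpu_pct, net_pct) for one key over a tracking period.

    Hypothetical helper: a zero total yields 0.0 instead of a division error.
    """
    cpu_pct = 100.0 * key_cpu_ms / total_cpu_ms if total_cpu_ms else 0.0
    net_pct = 100.0 * key_net_bytes / total_net_bytes if total_net_bytes else 0.0
    return cpu_pct, net_pct


# A key that used 250 ms out of 1000 ms and 4096 bytes out of 65536 bytes.
print(hotness_percentages(250, 4096, 1000, 65536))  # (25.0, 6.25)
```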
## Usage
Although the API is subject to change, the general idea is for the user
to initiate a hotkeys tracking process that runs for some time.
The keys' metrics are recorded in a probabilistic structure, and
afterwards the user can fetch the top K keys.
### Current API
```
HOTKEYS START
<METRICS count [CPU] [NET]>
[COUNT k]
[DURATION duration]
[SAMPLE ratio]
[SLOTS count slot…]
HOTKEYS GET
HOTKEYS STOP
HOTKEYS RESET
```
### HOTKEYS START
Start a tracking session if none has been started yet, or if the previous
one was stopped or reset. Return an error if one is in progress.
* METRICS count [CPU] [NET] - choose one or more metrics to track
* COUNT k - track top K keys
* DURATION duration - preset how long the tracking session should last
* SAMPLE ratio - a key is tracked with probability 1/ratio
* SLOTS count slot... - only track a key if it's in one of the chosen
slots
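The option grammar above can be turned into an argument list mechanically. The sketch below is an illustration only: `build_hotkeys_start` is a hypothetical helper, it follows the "Current API" grammar as written (which is subject to change), and it makes no assumption about the unit of `DURATION`.

```python
def build_hotkeys_start(metrics, count=None, duration=None, sample=None, slots=None):
    """Assemble the argument list for HOTKEYS START (hypothetical helper)."""
    if not metrics:
        raise ValueError("at least one metric (CPU or NET) is required")
    # METRICS and SLOTS are both prefixed with the number of items that follow.
    args = ["HOTKEYS", "START", "METRICS", str(len(metrics)), *metrics]
    if count is not None:
        args += ["COUNT", str(count)]
    if duration is not None:
        args += ["DURATION", str(duration)]
    if sample is not None:
        args += ["SAMPLE", str(sample)]
    if slots:
        args += ["SLOTS", str(len(slots)), *map(str, slots)]
    return args


# Track both metrics, keep the top 10 keys, sample 1 in 100, restrict to 3 slots.
print(" ".join(build_hotkeys_start(["CPU", "NET"], count=10, sample=100, slots=[0, 5, 6])))
```

The resulting list could be passed to any client that accepts raw command arguments.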
### HOTKEYS GET
Return an array of the chosen metrics to track and various other metadata.
Return (nil) if no tracking was started or it was reset.
```
127.0.0.1:6379> hotkeys get
1) "tracking-active"
2) 1
3) "sample-ratio"
4) <ratio>
5) "selected-slots" (empty array if no slots selected)
6) 1) 0
   2) 5
   3) 6
7) "sampled-command-selected-slots-ms" (show on condition sample-ratio > 1 and selected-slots != empty-array)
8) <time-in-milliseconds>
9) "all-commands-selected-slots-ms" (show on condition selected-slots != empty-array)
10) <time-in-milliseconds>
11) "all-commands-all-slots-ms"
12) <time-in-milliseconds>
13) "net-bytes-sampled-commands-selected-slots" (show on condition sample-ratio > 1 and selected-slots != empty-array)
14) <num-bytes>
15) "net-bytes-all-commands-selected-slots" (show on condition selected-slots != empty-array)
16) <num-bytes>
17) "net-bytes-all-commands-all-slots"
18) <num-bytes>
19) "collection-start-time-unix-ms"
20) <start-time-unix-timestamp-in-ms>
21) "collection-duration-ms"
22) <duration-in-milliseconds>
23) "used-cpu-sys-ms"
24) <duration-in-millisec>
25) "used-cpu-user-ms"
26) <duration-in-millisec>
27) "total-net-bytes"
28) <num-bytes>
29) "by-cpu-time"
30) 1) key-1_1
    2) <millisec>
    ...
    19) key-10_1
    20) <millisec>
31) "by-net-bytes"
32) 1) key-1_2
    2) <num-bytes>
    ...
    19) key-10_2
    20) <num-bytes>
```
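The reply above is a flat list of alternating field names and values, with the per-key results nested the same way. A client could fold it into dictionaries roughly as follows. This is a sketch under the assumption that the reply arrives as a Python list of decoded strings and integers; both function names are hypothetical.

```python
def pairs_to_dict(flat):
    """Turn [name1, value1, name2, value2, ...] into {name1: value1, ...}."""
    if len(flat) % 2:
        raise ValueError("expected an even number of elements")
    return dict(zip(flat[0::2], flat[1::2]))


def parse_hotkeys_reply(reply):
    """Parse a HOTKEYS GET reply; None stands for the (nil) reply."""
    if reply is None:
        return None
    result = pairs_to_dict(reply)
    # The per-metric rankings are themselves flat key/value lists.
    for field in ("by-cpu-time", "by-net-bytes"):
        if field in result:
            result[field] = pairs_to_dict(result[field])
    return result


sample = ["tracking-active", 1, "sample-ratio", 1,
          "by-cpu-time", ["key-1_1", 12, "key-2_1", 7]]
print(parse_hotkeys_reply(sample)["by-cpu-time"])  # {'key-1_1': 12, 'key-2_1': 7}
```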
### HOTKEYS STOP
Stop the tracking session. Results remain available through `HOTKEYS GET`.
### HOTKEYS RESET
Release the resources used for hotkeys tracking. This is only allowed once
tracking has stopped; return an error if a tracking session is active.
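The START, GET, STOP and RESET rules describe a small state machine, which the toy model below tries to capture. It is not the server implementation: the class and state names are hypothetical, and two behaviours the description leaves open are assumptions here (STOP with nothing active and RESET with nothing started are both treated as no-ops).

```python
class HotkeysSession:
    """Toy model of the tracking session lifecycle (hypothetical)."""

    def __init__(self):
        self.state = "idle"  # one of "idle", "active", "stopped"

    def start(self):
        # START is rejected only while a session is in progress.
        if self.state == "active":
            raise RuntimeError("tracking session already in progress")
        self.state = "active"

    def stop(self):
        # Assumption: stopping when nothing is active does nothing.
        if self.state == "active":
            self.state = "stopped"

    def reset(self):
        # RESET is rejected while tracking is active.
        if self.state == "active":
            raise RuntimeError("cannot reset while tracking is active")
        self.state = "idle"

    def get(self):
        # None models the (nil) reply when nothing was started or after a reset.
        if self.state == "idle":
            return None
        return {"tracking-active": int(self.state == "active")}


session = HotkeysSession()
session.start()
session.stop()
print(session.get())  # {'tracking-active': 0}
session.reset()
print(session.get())  # None
```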
## Additional changes
The `INFO` command now has a "hotkeys" section with 3 fields:
* tracking_active - a boolean flag indicating whether hotkeys are
currently being tracked.
* used-memory - memory overhead of the structures used for hotkeys
tracking.
* cpu-time - time in ms spent updating the hotkey structure.
## Implementation
Independent of the API, the implementation is based on a probabilistic
structure, [Cuckoo Heavy
Keeper](https://dl.acm.org/doi/abs/10.14778/3746405.3746434) (CHK),
with an added min-heap to keep track of the names of the top K hotkeys.
CHK is loosely based on
[HeavyKeeper](https://www.usenix.org/conference/atc18/presentation/gong),
which is used in RedisBloom's TopK, but has higher throughput.
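The role of the min-heap can be shown in isolation. The sketch below is not Cuckoo Heavy Keeper: it assumes exact per-key counts are already available in a dictionary (standing in for the probabilistic estimates) and only illustrates how a min-heap bounded to K entries keeps the current top K. The function name is hypothetical.

```python
import heapq


def top_k(counts, k):
    """Return the k largest (count, key) pairs, highest first."""
    if k <= 0:
        return []
    heap = []  # min-heap; heap[0] is the weakest of the current top k
    for key, count in counts.items():
        if len(heap) < k:
            heapq.heappush(heap, (count, key))
        elif count > heap[0][0]:
            # The new key beats the weakest tracked key, so it takes its place.
            heapq.heapreplace(heap, (count, key))
    return sorted(heap, reverse=True)


print(top_k({"a": 5, "b": 1, "c": 9, "d": 3}, 2))  # [(9, 'c'), (5, 'a')]
```

Because the heap root is always the smallest tracked count, deciding whether a key enters the top K takes a single comparison.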
Keys are sampled at random with a fixed probability, set via the
`HOTKEYS start sample <ratio>` param. Each key is sampled with probability `1/ratio`.
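Sampling with probability `1/ratio` can be sketched as a single random draw per key access. This is an illustration only; `should_sample` is a hypothetical name, and the server's actual random source and sampling code are not shown in this description.

```python
import random


def should_sample(ratio, rng=random):
    """Return True with probability 1/ratio (ratio is a positive integer)."""
    if ratio < 1:
        raise ValueError("ratio must be >= 1")
    # randrange(ratio) is uniform over 0..ratio-1, so 0 comes up 1 time in ratio.
    return rng.randrange(ratio) == 0


rng = random.Random(42)
hits = sum(should_sample(10, rng) for _ in range(100_000))
print(hits / 100_000)  # close to 0.1
```

With `ratio` set to 1 the draw always returns 0, so every key is tracked.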
## Performance implications
With a low enough sampling probability (that is, a high enough `<ratio>` in
`HOTKEYS start sample <ratio>`) the performance hit is negligible. Tracking
every key, though, can incur up to a 15% hit in [the worst
case](https://github.com/redis/redis-benchmarks-specification/blob/main/redis_benchmarks_specification/test-suites/memtier_benchmark-1Mkeys-string-get-10B-pipeline-500.yml),
as measured by running the tests in this
[bench](https://github.com/redis/redis-benchmarks-specification/).
---------
Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com>
Co-authored-by: Slavomir Kaslev <slavomir.kaslev@gmail.com>
Co-authored-by: debing.sun <debing.sun@redis.com>
|
# Redis Test Suite
The normal execution mode of the test suite involves starting and manipulating
local redis-server instances, inspecting process state, log files, etc.
The test suite also supports execution against an external server, which is
enabled using the --host and --port parameters. When executing against an
external server, tests tagged external:skip are skipped.
There are additional runtime options that can further adjust the test suite to match different external server configurations:
| Option | Impact |
|---|---|
| `--singledb` | Only use database 0, don't assume others are supported. |
| `--ignore-encoding` | Skip all checks for specific encoding. |
| `--ignore-digest` | Skip key value digest validations. |
| `--cluster-mode` | Run in strict Redis Cluster compatibility mode. |
| `--large-memory` | Enables tests that consume more than 100mb. |
## Tags
Tags are applied to tests to classify them according to the subsystem they test, but also to indicate compatibility with different run modes and required capabilities.
Tags can be applied in different context levels:
- `start_server` context
- `tags` context that bundles several tests together
- A single test context.
The following compatibility and capability tags are currently used:
| Tag | Indicates |
|---|---|
| `external:skip` | Not compatible with external servers. |
| `cluster:skip` | Not compatible with `--cluster-mode`. |
| `large-memory` | Test that requires more than 100mb. |
| `tls:skip` | Not compatible with `--tls`. |
| `tsan:skip` | Not compatible with running under thread sanitizer. |
| `needs:repl` | Uses replication and needs to be able to `SYNC` from server. |
| `needs:debug` | Uses the `DEBUG` command or other debugging focused commands (like `OBJECT REFCOUNT`). |
| `needs:pfdebug` | Uses the `PFDEBUG` command. |
| `needs:config-maxmemory` | Uses `CONFIG SET` to manipulate memory limit, eviction policies, etc. |
| `needs:config-resetstat` | Uses `CONFIG RESETSTAT` to reset statistics. |
| `needs:reset` | Uses `RESET` to reset client connections. |
| `needs:save` | Uses `SAVE` or `BGSAVE` to create an RDB file. |
When using an external server (--host and --port), filtering using the
external:skip tags is done automatically.
When using --cluster-mode, filtering using the cluster:skip tag is done
automatically.
When not using --large-memory, filtering using the largemem:skip tag is done
automatically.
In addition, it is possible to specify additional configuration. For example, to
run tests on a server that does not permit SYNC use:
`./runtest --host <host> --port <port> --tags -needs:repl`
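The filtering rules can be modelled as a deny list built from the run mode plus any tags the user excludes with a leading `-`. The sketch below is a simplified model in Python, not the suite's actual Tcl logic: both function names are hypothetical, and treating a tag without a leading `-` as an allow-list entry is an assumption.

```python
def build_deny_tags(external=False, cluster_mode=False, user_tags=()):
    """Collect tags whose tests should be skipped (hypothetical helper)."""
    deny = set()
    if external:
        deny.add("external:skip")  # automatic with --host and --port
    if cluster_mode:
        deny.add("cluster:skip")   # automatic with --cluster-mode
    # A leading "-" on a --tags value excludes tests carrying that tag.
    deny.update(tag[1:] for tag in user_tags if tag.startswith("-"))
    return deny


def should_run(test_tags, deny_tags, allow_tags=()):
    """Skip a test if it carries a denied tag; honour an optional allow list."""
    tags = set(test_tags)
    if tags & set(deny_tags):
        return False
    return not allow_tags or bool(tags & set(allow_tags))


deny = build_deny_tags(external=True, user_tags=["-needs:repl"])
print(should_run(["repl", "needs:repl"], deny))  # False
print(should_run(["string"], deny))              # True
```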