This PR adds SIMD vectorization for binary quantization distance
calculation, similar to PR #14474.
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
This PR replaces the manual `popcount64()` implementation with
`__builtin_popcountll()` for computing Hamming distance in binary
vectors, when the underlying hardware supports the `POPCNT` instruction.
The built-in version simplifies the code and enables the compiler to
emit a single `POPCNT` instruction on supported CPUs, which is
significantly faster than the manual bitwise method. You can verify the
difference here:
[https://godbolt.org/z/TxWMcE8M3](https://godbolt.org/z/TxWMcE8M3) — the
manual version generates a long sequence of instructions (approximately
34 on modern hardware) versus a single `POPCNT` instruction when using
`__builtin_popcountll()`.
## Portability across platforms
This change maintains full portability across platforms and compilers.
The use of `__builtin_popcountll()` is guarded by the `HAVE_POPCNT`
macro, which is defined only when the compiler supports the
`target("popcnt")` attribute. At runtime, we also check
`__builtin_cpu_supports("popcnt")` to ensure the hardware supports the
instruction. If it is not available, the implementation safely falls
back to the original manual `popcount64()` logic.
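A sketch of the dispatch described above; `HAVE_POPCNT` and the fallback body are modeled on this description, not copied from the actual source:

```c
#include <stdint.h>

/* Classic SWAR bit-count, used when POPCNT is unavailable. */
static inline uint64_t popcount64_fallback(uint64_t x) {
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
    return (x * 0x0101010101010101ULL) >> 56;
}

#if defined(HAVE_POPCNT)
/* Compiled with the popcnt target so the builtin lowers to one POPCNT. */
__attribute__((target("popcnt")))
static inline uint64_t popcount64_hw(uint64_t x) {
    return (uint64_t)__builtin_popcountll(x);
}
#endif

/* Hamming distance of one 64-bit word of a binary vector. */
static uint64_t hamming_word(uint64_t a, uint64_t b) {
#if defined(HAVE_POPCNT)
    if (__builtin_cpu_supports("popcnt"))
        return popcount64_hw(a ^ b);
#endif
    return popcount64_fallback(a ^ b);
}
```

The runtime `__builtin_cpu_supports()` check keeps one binary portable across CPUs with and without the instruction.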
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
This pull request vectorizes the 8-bit quantization vector-search path
in a similar way to the non-quantization path.
The assembly intrinsics are a bit more complicated than in the
non-quantization path, since we are operating on 8-bit integers and we
need to worry about preventing overflow. Thus, after loading the 8-bit
integers, they are extended into 16-bits before multiplying and
accumulating into 32-bit integers.
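The widening scheme can be sketched with SSE intrinsics (an illustration of the technique, not the PR's actual code, which may use different vector widths):

```c
#include <immintrin.h>
#include <stdint.h>
#include <stddef.h>

/* Dot product of two int8 vectors: load 8 bytes at a time, sign-extend
   each int8 lane to int16, then multiply-accumulate into int32 lanes so
   the products cannot overflow. */
__attribute__((target("sse4.1")))
static int32_t dot_i8(const int8_t *a, const int8_t *b, size_t n) {
    __m128i acc = _mm_setzero_si128();  /* four int32 partial sums */
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        /* Load 8 bytes each and sign-extend every int8 lane to int16. */
        __m128i va = _mm_cvtepi8_epi16(_mm_loadl_epi64((const __m128i *)(a + i)));
        __m128i vb = _mm_cvtepi8_epi16(_mm_loadl_epi64((const __m128i *)(b + i)));
        /* int16 * int16 -> int32 pairs, horizontally added in one step. */
        acc = _mm_add_epi32(acc, _mm_madd_epi16(va, vb));
    }
    int32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, acc);
    int32_t sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    for (; i < n; i++)  /* scalar tail for leftover elements */
        sum += (int32_t)a[i] * (int32_t)b[i];
    return sum;
}
```

`_mm_madd_epi16` does the multiply and the first level of accumulation in a single instruction, which is why the int8 -> int16 extension step pays off.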
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
(https://github.com/RediSearch/RediSearch/pull/7076,
https://github.com/RediSearch/RediSearch/pull/6857) - Introducing
`FT.HYBRID`, a new command that enables hybrid queries combining both
text and vector search, with support for **RRF** and **LINEAR** result
combination. This update enhances performance and reliability through a
more efficient networking layer, smoother query execution, and improved
overall stability.
(https://github.com/RediSearch/RediSearch/pull/7065) - Add
`search-default-scorer` configuration to set the default text scorer
across queries (by default it is BM25).
(https://github.com/RediSearch/RediSearch/pull/7022) - Handle Atomic
Slot Migration events when slots move from one node to another in
cluster mode.
(https://github.com/RediSearch/RediSearch/pull/6769,
https://github.com/RediSearch/RediSearch/pull/6828,
https://github.com/RediSearch/RediSearch/pull/6877,
https://github.com/RediSearch/RediSearch/pull/6921) - Introduce query
time memory guardrails by adding a new `search-on-oom` configuration
which defines the query engine behavior when OOM (Out Of Memory) is
reached. OOM checks are applied to `FT.SEARCH`, `FT.AGGREGATE`, and
`FT.HYBRID` commands. The behavior on OOM can be configured to one of
three modes: `IGNORE`, `FAIL`, or `RETURN`.
* `IGNORE` - The default behavior: run queries anyway (not recommended
for heavy queries returning a large result set).
* `FAIL` - Fail query execution immediately if any of the nodes is in an
OOM state when query execution starts.
* `RETURN` - A best-effort approach that returns partial results when
OOM is detected on only some of the nodes in cluster mode.
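Assuming the module option is reachable through the standard `CONFIG SET` interface (as `search-*` options usually are), switching modes would look something like:

```
CONFIG SET search-on-oom RETURN
```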
Conducted tests on an IceLake server with the LAION 512 1M vectors
dataset. The perf profile shows that the function
`vector_distance_float` consumes the majority of the computation time
(this profile was captured during upload). Vectorizing this function
leads to about a 2.38x speedup in upload time and a 1.45x speedup in RPS
during search.
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
Co-authored-by: Omer Shadmi <76992134+oshadmi@users.noreply.github.com>
This is basically the Vector Set iteration primitive.
It exploits the underlying radix tree implementation.
The usage pattern is strongly reminiscent of other Redis commands doing
similar things.
The command usage is straightforward:
```
> VRANGE word_embeddings_int8 [Redis + 10
1) "Redis"
2) "Rediscover"
3) "Rediscover_Ashland"
4) "Rediscover_Northern_Ireland"
5) "Rediscovered"
6) "Rediscovered_Bookshop"
7) "Rediscovering"
8) "Rediscovering_God"
9) "Rediscovering_Lost"
10) "Rediscovers"
```
The above command returns 10 elements (or fewer, if fewer are available
in the specified range) from "Redis" (inclusive) to the maximum possible
element. The comparison is performed byte by byte, as `memcmp()` would
do, so the elements have a total order. The start and end of the range
can each be either a string prefixed by `[` or `(` (the prefix is
mandatory) to tell the command whether that bound is inclusive or
exclusive, or one of the special symbols `-` and `+`, which denote the
minimum and maximum possible elements, respectively.
More info can be found in the implementation itself and in the README
file change.
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
This PR avoids a potential crash when efSearch is too large: the
resulting allocation could either crash the server or trigger an integer
overflow, where less memory is allocated than expected.
- Limit the accepted EF in the request to 100,000, as in VADD
- Limit the ef search to the number of nodes in the HNSW graph
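The two limits combine into a simple clamp; the function name is illustrative and the cap value is taken from the PR text:

```c
#include <stddef.h>

/* Sketch of the efSearch clamping: cap the request at a fixed maximum
   (the same cap VADD uses) and never search wider than the graph. */
static size_t clamp_ef_search(size_t requested_ef, size_t hnsw_nodes) {
    const size_t EF_MAX = 100000;
    size_t ef = requested_ef < EF_MAX ? requested_ef : EF_MAX;
    return ef < hnsw_nodes ? ef : hnsw_nodes;
}
```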
This PR adds a `--ollama-url` option to `cli.py`, the lightweight
redis-cli-like tool that expands `!"text"` arguments into embeddings via
Ollama.
Previously, the embedding call was hardcoded to
http://localhost:11434/api/embeddings. With this change, users can
specify a custom Ollama server URL when starting the tool.
If no URL is provided, the tool defaults to what it was before.
This PR fixes a bug in the `hnsw_cursor_free` function where the prev
pointer was never updated during cursor list traversal. As a result, if
the cursor being freed was not the head of the list, it would not be
correctly unlinked, potentially causing memory leaks or corruption of
the cursor list.
Note that since `hnsw_cursor_free()` is never used for now, this PR does
not actually fix any bug.
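The fix follows the classic singly linked list unlink pattern; a sketch with illustrative names (not the actual `hnsw.c` structures):

```c
#include <stddef.h>

typedef struct cursor {
    int id;
    struct cursor *next;
} cursor;

/* Remove `target` from the list headed at *head; returns 1 if found.
   The key point is that `prev` must be advanced on every iteration,
   otherwise a non-head node is never unlinked correctly. */
static int cursor_unlink(cursor **head, cursor *target) {
    cursor *prev = NULL, *cur = *head;
    while (cur) {
        if (cur == target) {
            if (prev) prev->next = cur->next; /* non-head: relink prev */
            else *head = cur->next;           /* head: move the head   */
            return 1;
        }
        prev = cur;       /* the update the buggy version was missing */
        cur = cur->next;
    }
    return 0;
}
```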
Hi, this PR implements the following changes:
1. The EPSILON option of VSIM is now documented.
2. The EPSILON behavior was fixed: the cutoff was incorrectly applied
against the 0-2 interval provided by the underlying cosine distance
instead of the 0-1 similarity interval the user sees. So an EPSILON of
0.2 only returned elements with a similarity between 1 and 0.9 instead
of between 1 and 0.8. This is a *breaking change*, but the command was
not documented so far, and it is a fix: the user sees the similarity
score, so the old behavior was a total mismatch. I believe this fix
should definitely be backported as soon as possible.
3. There are now tests.
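A small sketch of the corrected cutoff, under the assumption (from the description above) that the underlying cosine distance lies in [0, 2] while the user-visible similarity lies in [0, 1]; the function name is illustrative:

```python
def within_epsilon(cos_distance: float, epsilon: float) -> bool:
    """Map a cosine distance in [0, 2] to a similarity in [0, 1] and
    apply the EPSILON cutoff on the score the user actually sees."""
    similarity = 1.0 - cos_distance / 2.0
    return similarity >= 1.0 - epsilon
```

With `epsilon = 0.2` this accepts similarities down to 0.8, matching the documented behavior rather than the old 0.9 cutoff.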
Thanks for checking,
Salvatore
VRANDMEMBER had a bug when exactly two elements were present in the
vector set: we selected a fixed number of random paths to take, and this
would always lead to the same element. This PR should kindly be
back-ported to Redis 8.x.
Hello, this is a patch that improves vector sets in two ways:
1. It makes the RDB format compatible with big endian machines: yeah,
they are close to nonexistent nowadays, but it is still better to be
correct. The behavior remains unchanged on little endian systems; the
patch only changes what happens on big endian systems so that they load
and emit the exact same format produced by little endian ones. The
implementation was *already largely safe* but for one detail.
2. More importantly, this PR saves nodes worst link score / index in a
backward compatible way, introducing also versioning information for the
serialized node encoding, that could be useful in the future. With this
information, that in the past was not saved for a programming error
(mine), there is no longer need to compute the worst link info at
runtime when loading data. This results in a speed improvement of about
30% when loading data from disk / RESTORE. The saving performance is
unaffected.
The patch was tested with care to be sure that data produced with old
vector sets implementations are loaded without issues (that is, the
backward compatibility was hand-tested). The new code is tested by the
persistence test already in the test suite, so no new test was added.
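The little-endian-on-disk convention from point 1 can be sketched as follows (illustrative helpers, not the actual vector sets serialization code):

```c
#include <stdint.h>

/* Emit a uint32 in little-endian byte order regardless of host
   endianness, so big and little endian hosts produce identical bytes. */
static void put_u32_le(uint8_t *buf, uint32_t v) {
    buf[0] = (uint8_t)(v & 0xff);
    buf[1] = (uint8_t)((v >> 8) & 0xff);
    buf[2] = (uint8_t)((v >> 16) & 0xff);
    buf[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Read the value back, again independent of host byte order. */
static uint32_t get_u32_le(const uint8_t *buf) {
    return (uint32_t)buf[0] | ((uint32_t)buf[1] << 8) |
           ((uint32_t)buf[2] << 16) | ((uint32_t)buf[3] << 24);
}
```

Because the byte positions are fixed arithmetically rather than by `memcpy()` of the in-memory representation, the same file loads and saves identically on both endiannesses.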
The SHA256 checksums for Rust 1.88.0 were incorrect, causing checksum
verification failures during installation. Updated with the correct
official checksums from https://static.rust-lang.org/dist/:
- x86_64-unknown-linux-gnu:
7b5437c1d18a174faae253a18eac22c32288dccfc09ff78d5ee99b7467e21bca
- x86_64-unknown-linux-musl:
200bcf3b5d574caededba78c9ea9d27e7afc5c6df4154ed0551879859be328e1
- aarch64-unknown-linux-gnu:
d5decc46123eb888f809f2ee3b118d13586a37ffad38afaefe56aa7139481d34
- aarch64-unknown-linux-musl:
f8b3a158f9e5e8cc82e4d92500dd2738ac7d8b5e66e0f18330408856235dec35
This PR introduces "IN" overloading for strings in Vector Sets VSIM
FILTER expressions.
Now it is possible to do something like:
```
"foo" IN "foobar"
```
IN continues to work as usual when the second operand is an array,
checking for membership of the left operand.
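The overloaded semantics can be modeled like this (a purely illustrative Python sketch of the filter evaluator, not the actual C implementation):

```python
def vsim_in(left, right):
    """IN with a string right operand is substring containment;
    IN with an array right operand stays a membership test."""
    if isinstance(right, str):
        return right.find(left) != -1      # "foo" IN "foobar"
    return any(left == item for item in right)  # "a" IN ["a", "b"]
```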
Ping @rowantrollope who requested this feature. I'm evaluating whether
to add glob matching functionality via the `=~` operator, but I probably
need to do an optimization round on our glob matching function first.
Glob matching can be slower, but the complexity of the greedy search in
the graph remains unchanged, so it may be a good idea to have it.
Case insensitive search will likely not be added, however, since it
would require handling Unicode, which is kinda outside the scope of
Redis filters. The user is still able to perform `"foo" in "foobar" ||
"FOO" in "foobar"` at least.
These changes improve the Vector Sets tests a bit:
* DB9 is used instead of the target DB. After a successful test the DB
is left empty.
* If the replica is not available, the replication tests are skipped
without errors but just a warning.
* Other refactoring stuff.
This PR introduces the initial configuration infrastructure for
vector-sets, along with a new option:
`vset-force-single-threaded-execution`. When enabled, it applies the
`NOTHREAD` flag to VSIM and disables the `CAS` option for VADD, thereby
enforcing single-threaded execution.
Note: This mode is not optimized for single-threaded performance.
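Assuming the option follows the usual boolean config conventions (a sketch, not verified against the actual implementation), enabling it would look like:

```
CONFIG SET vset-force-single-threaded-execution yes
```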
---------
Co-authored-by: GuyAv46 <47632673+GuyAv46@users.noreply.github.com>
Co-authored-by: debing.sun <debing.sun@redis.com>
Vector Sets deserialization was not designed to resist corrupted data,
assuming that a good checksum would mean everything is fine. However
Redis allows the user to specify extra protection via a specific
configuration option.
This commit makes the implementation more resistant, at the cost of some
slowdown. It also fixes an unrelated serialization bug (one with no
memory corruption effects): the worst index / distance was not
serialized, which could lower the quality of a graph after links are
replaced. I'll address the serialization issues in a new PR that will
focus on that aspect alone (already work in progress).
The net result is that loading vector sets is, while the serialization
of the worst index/distance is missing (always, for now), 100% slower,
that is, 2 times the loading time we had before. Once that information
is added, loading will be just 10-15% slower, i.e., only the cost of the
new sanity checks.
It may be worth exporting to modules a way to state whether the advanced
sanity checks are needed or not. Anyway, most of the slowdown in this
patch comes from having to recompute the worst neighbor, since the
detection of duplicated and non-reciprocal links was already heavily
optimized with probabilistic algorithms.
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
Hi, as described, this implements WITHATTRIBS, a feature requested by a
few users, and indeed needed.
This was requested the first time by @rowantrollope but I was not sure
how to make it work with RESP2 and RESP3 in a clean way, hopefully
that's it.
The patch includes tests and documentation updates.
Hi all, this PR fixes two things:
1. An assertion that prevented RDB loading from recovering when there
was a quantization type mismatch (with a regression test).
2. Two code paths that just returned NULL without proper cleanup during
RDB loading.