Retrieval (Python)

betterdb-retrieval is the Python counterpart to @betterdb/retrieval: a developer-facing retrieval SDK over Valkey Search (FT.*). It provides a typed index schema, an idempotent index lifecycle, upsert/delete, and vector, filtered, and hybrid queries. It is built on betterdb-valkey-search-kit.

It uses the same architecture and the same Valkey data format as the TypeScript package, so a TypeScript app and a Python app can share the same index.

Prerequisites

  • Valkey 8.0+ with the valkey-search module loaded
  • Or Amazon ElastiCache for Valkey (8.0+)
  • Or Google Cloud Memorystore for Valkey
  • Python >= 3.11

Installation

pip install betterdb-retrieval valkey

Quick start

import asyncio

from valkey.asyncio import Valkey

from betterdb_retrieval import Retriever, UpsertEntry


async def embed(text: str) -> list[float]:
    ...  # replace with a call that returns an embedding


async def main() -> None:
    client = Valkey.from_url("redis://localhost:6379")

    retriever = Retriever(
        client=client,
        name="docs",
        schema={
            "fields": {
                "category": {"type": "tag"},
                "year": {"type": "numeric", "sortable": True},
            },
            "vector": {"algorithm": "hnsw", "metric": "cosine"},
        },
        embed_fn=embed,
    )

    # Create the index if it doesn't exist (idempotent; dims resolved from embed_fn).
    await retriever.create_index()

    # Embed the text and store it with its filterable fields.
    await retriever.upsert([
        UpsertEntry(
            id="doc1",
            text="Valkey is a high-performance key-value store",
            fields={"category": "db", "year": 2024},
        ),
    ])

    # KNN search restricted to documents tagged category=db.
    hits = await retriever.query(
        text="fast in-memory database",
        k=5,
        filter={"category": "db"},
    )
    print(hits)


# `await` is only valid inside a coroutine, so run the example through asyncio.
asyncio.run(main())
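
The embed function in the quick start is left as a placeholder. For local smoke tests without an embedding model, a deterministic stand-in can be enough to exercise the index lifecycle. The sketch below is an assumption for illustration only: toy_embed and DIMS are hypothetical names, and the hashed bag-of-words vectors carry no real semantic meaning.

```python
import asyncio
import hashlib
import math

DIMS = 8  # hypothetical toy dimension; real embedding models use far more


async def toy_embed(text: str) -> list[float]:
    """Deterministic stand-in embedding: hash tokens into buckets, then L2-normalize."""
    vec = [0.0] * DIMS
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        # The first digest byte picks the bucket this token contributes to.
        vec[digest[0] % DIMS] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    # Empty input yields an all-zero vector; skip normalization in that case.
    return [x / norm for x in vec] if norm else vec


vector = asyncio.run(toy_embed("fast in-memory database"))
print(len(vector))
```

Passing a function like this as embed_fn keeps the example self-contained; swap in a real model before judging result quality.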

Retriever API

Method | Description
create_index() | Create the index if absent (idempotent). Vector dimension is taken from schema["vector"]["dims"] or resolved by probing embed_fn.
upsert(entries) | Embed each entry’s text and write it as a hash with its fields.
delete(ids) | Delete documents by id.
query(*, k, text=None, vector=None, filter=None, hybrid=None) | KNN search. Provide text (embedded for you) or a precomputed vector, a positive k, an optional filter (tag/numeric fields), and hybrid="rerank" to post-process hits through a rerank_fn. Returns list[QueryHit].
describe_index() / health() | Index stats: doc count, indexing state, dimension, percent indexed, and an optional estimated recall.
drop_index() | Drop the index (no-op if it doesn’t exist).
register() / unregister() | Publish/remove a discovery marker in the shared __betterdb:caches registry, ownership-checked so it never clobbers a foreign cache type.
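
The dimension rule described for create_index() (an explicit schema["vector"]["dims"] wins, otherwise embed_fn is probed) can be sketched as follows. This is not the library's actual implementation: resolve_dims, fake_embed, and the probe string are hypothetical and only illustrate the order of precedence.

```python
import asyncio
from typing import Any, Awaitable, Callable

EmbedFn = Callable[[str], Awaitable[list[float]]]


async def resolve_dims(schema: dict[str, Any], embed_fn: EmbedFn) -> int:
    """Return the vector dimension: explicit schema value first, else probe embed_fn."""
    dims = schema.get("vector", {}).get("dims")
    if dims is not None:
        return int(dims)
    # No explicit dims: embed a throwaway string and measure the result.
    probe = await embed_fn("dimension probe")
    if not probe:
        raise ValueError("embed_fn returned an empty vector; cannot infer dims")
    return len(probe)


async def fake_embed(text: str) -> list[float]:
    # Hypothetical stand-in for a real embedding model.
    return [0.0] * 384


schema = {"vector": {"algorithm": "hnsw", "metric": "cosine"}}
print(asyncio.run(resolve_dims(schema, fake_embed)))
```

Setting dims explicitly avoids one embedding call at index creation, which may matter when embed_fn is a paid or rate-limited API.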

QueryHit.score is the raw KNN vector distance (lower is closer), not a similarity. Rank hits in ascending order of score.
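
Because score is a distance, code that expects "higher is better" needs an explicit conversion. The sketch below uses a hypothetical Hit dataclass as a stand-in for QueryHit (only id and score are assumed), and the conversion assumes the cosine metric with distance defined as 1 minus cosine similarity; verify that definition against your Valkey Search version before relying on it.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical stand-in for QueryHit; only the fields used here."""
    id: str
    score: float  # raw KNN distance, lower is closer


def rank_hits(hits: list[Hit]) -> list[Hit]:
    """Order hits from closest to farthest (ascending distance)."""
    return sorted(hits, key=lambda h: h.score)


def cosine_similarity_from_distance(distance: float) -> float:
    """Convert a cosine distance to a similarity, assuming distance = 1 - similarity."""
    return 1.0 - distance


hits = [Hit("doc2", 0.42), Hit("doc1", 0.08), Hit("doc3", 0.17)]
ranked = rank_hits(hits)
print([h.id for h in ranked])  # ['doc1', 'doc3', 'doc2']
```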

The query() method is keyword-only. UpsertEntry, QueryHit, and IndexDescription are dataclasses with snake_case fields; the schema TypedDicts keep camelCase keys (fieldName, efConstruction) to match the wire format.

Observability

Pass metrics (a RetrievalMetrics) and/or tracer (a RetrievalTracer) to instrument every operation. create_prometheus_metrics() provides a ready-made prometheus-client implementation.

Interoperability with the TypeScript package

The Python and TypeScript packages use the same index schema and the same hash field layout, so an index written by one can be queried by the other. BetterDB Monitor treats them identically through the shared discovery registry.

See also