Agent Cache

@betterdb/agent-cache is a standalone, framework-agnostic, multi-tier exact-match cache for AI agent workloads backed by Valkey. Three cache tiers behind one connection: LLM responses, tool results, and session state. Every cache operation emits an OpenTelemetry span and updates Prometheus metrics, giving teams full production observability without additional instrumentation. No modules required — works on vanilla Valkey 7+, ElastiCache, Memorystore, MemoryDB, and any Redis-compatible endpoint.

Prerequisites

  • Valkey 7+ or Redis 6.2+ (no modules, no RediSearch, no RedisJSON)
  • Or Amazon ElastiCache for Valkey / Redis
  • Or Google Cloud Memorystore for Valkey
  • Or Amazon MemoryDB
  • Node.js >= 20

Installation

npm install @betterdb/agent-cache iovalkey

iovalkey is a peer dependency — you must install it alongside the package.

Quick start

import Valkey from 'iovalkey';
import { AgentCache } from '@betterdb/agent-cache';

const client = new Valkey({ host: 'localhost', port: 6379 });

const cache = new AgentCache({
  client,
  tierDefaults: {
    llm:     { ttl: 3600 },
    tool:    { ttl: 300 },
    session: { ttl: 1800 },
  },
});

// LLM response caching
const params = {
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'What is Valkey?' }],
  temperature: 0,
};

const result = await cache.llm.check(params);

if (!result.hit) {
  const response = await callLlm(params);
  await cache.llm.store(params, response);
}

// Tool result caching
const weather = await cache.tool.check('get_weather', { city: 'Sofia' });
if (!weather.hit) {
  const data = await getWeather({ city: 'Sofia' });
  await cache.tool.store('get_weather', { city: 'Sofia' }, JSON.stringify(data));
}

// Session state
await cache.session.set('thread-1', 'last_intent', 'book_flight');
const intent = await cache.session.get('thread-1', 'last_intent');

Why agent-cache

As of 2026, no existing caching solution for AI agents provides all three of the following: multi-tier caching (LLM responses, tool results, and session state in one package), built-in observability (OpenTelemetry spans and Prometheus metrics at the cache operation level), and no module requirements (works on vanilla Valkey without RedisJSON or RediSearch).

| Capability | @betterdb/agent-cache | LangChain RedisCache | LangGraph checkpoint-redis | AutoGen RedisStore | LiteLLM Redis | Upstash + Vercel AI SDK |
|---|---|---|---|---|---|---|
| Multi-tier (LLM + Tool + State) | ✅ | ❌ LLM only | ❌ State only | ❌ LLM only | ❌ LLM only | ❌ LLM only |
| Built-in OTel + Prometheus | ✅ | ❌ | ❌ | ❌ | ⚠️ Partial | ❌ |
| No modules required | ✅ | ✅ | ❌ Redis 8 + modules | ✅ | ✅ | ❌ Upstash only |
| Framework adapters | ✅ LC, LG, AI SDK | ❌ LC only | ❌ LG only | ❌ AutoGen only | ❌ LiteLLM only | ❌ AI SDK only |

Configuration reference

| Option | Type | Default | Description |
|---|---|---|---|
| client | Valkey | required | An iovalkey client instance. The caller owns the connection lifecycle |
| name | string | 'betterdb_ac' | Key prefix for all Valkey keys |
| defaultTtl | number | undefined | Default TTL in seconds. undefined means no expiry |
| tierDefaults.llm.ttl | number | undefined | Default TTL for LLM cache entries |
| tierDefaults.tool.ttl | number | undefined | Default TTL for tool cache entries |
| tierDefaults.session.ttl | number | undefined | Default TTL for session entries |
| costTable | Record<string, ModelCost> | undefined | Model pricing for cost savings tracking |
| telemetry.tracerName | string | '@betterdb/agent-cache' | OpenTelemetry tracer name |
| telemetry.metricsPrefix | string | 'agent_cache' | Prefix for all Prometheus metric names |
| telemetry.registry | Registry | prom-client default | prom-client Registry to register metrics on |

ModelCost format

{
  'gpt-4o': { inputPer1k: 0.0025, outputPer1k: 0.01 },
  'gpt-4o-mini': { inputPer1k: 0.00015, outputPer1k: 0.0006 },
}

Cache tiers

LLM cache

Caches LLM responses by exact match on model, messages, temperature, top_p, max_tokens, and tools.

Key format: {name}:llm:{sha256_hash}

TTL precedence: per-call ttl > tierDefaults.llm.ttl > defaultTtl

// Check for cached response
const result = await cache.llm.check({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello' }],
  temperature: 0,
});

// Store a response with token counts for cost tracking
await cache.llm.store(params, response, {
  ttl: 3600,
  tokens: { input: 10, output: 50 },
});

// Invalidate all entries for a specific model
const deleted = await cache.llm.invalidateByModel('gpt-4o');

Cache keys are computed by serializing parameters with recursively sorted object keys before SHA-256 hashing. This means { city: 'Sofia', units: 'metric' } and { units: 'metric', city: 'Sofia' } produce the same cache key.
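
For reference, that canonicalization can be sketched in a few lines using Node's built-in crypto module. This is an illustrative sketch of the documented behavior, not the package's internal implementation; the helper names are hypothetical:

```typescript
import { createHash } from 'node:crypto';

// Recursively sort object keys so logically equal params serialize identically.
// Array order is preserved (message order matters for LLM params).
function canonicalize(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.keys(value as Record<string, unknown>)
        .sort()
        .map((k) => [k, canonicalize((value as Record<string, unknown>)[k])]),
    );
  }
  return value;
}

// Hypothetical key derivation matching the documented {name}:{tier}:{sha256} format.
function cacheKey(name: string, tier: string, params: unknown): string {
  const hash = createHash('sha256')
    .update(JSON.stringify(canonicalize(params)))
    .digest('hex');
  return `${name}:${tier}:${hash}`;
}

const a = cacheKey('betterdb_ac', 'llm', { city: 'Sofia', units: 'metric' });
const b = cacheKey('betterdb_ac', 'llm', { units: 'metric', city: 'Sofia' });
// a === b: argument key order does not affect the cache key
```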

Tool cache

Caches tool/function call results by tool name and argument hash.

Key format: {name}:tool:{toolName}:{sha256_hash}

TTL precedence: per-call ttl > tool policy > tierDefaults.tool.ttl > defaultTtl
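
The precedence chain amounts to a walk from most to least specific source, taking the first defined value. A minimal sketch (hypothetical names, not the package's internals):

```typescript
// Documented precedence for the tool tier:
// per-call ttl > tool policy > tierDefaults.tool.ttl > defaultTtl.
interface TtlSources {
  perCall?: number;
  toolPolicy?: number;
  tierDefault?: number;
  globalDefault?: number;
}

function resolveTtl(s: TtlSources): number | undefined {
  // First defined value wins; undefined at every level means no expiry.
  return s.perCall ?? s.toolPolicy ?? s.tierDefault ?? s.globalDefault;
}

resolveTtl({ perCall: 300, tierDefault: 600 });       // 300: per-call wins
resolveTtl({ toolPolicy: 600, globalDefault: 3600 }); // 600: policy beats global default
resolveTtl({});                                       // undefined: no expiry
```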

// Check for cached result
const result = await cache.tool.check('get_weather', { city: 'Sofia' });

// Store a result with API cost tracking
await cache.tool.store('get_weather', { city: 'Sofia' }, jsonResult, {
  ttl: 300,
  cost: 0.001,
});

// Set a persistent per-tool TTL policy
await cache.tool.setPolicy('get_weather', { ttl: 600 });

// Invalidate all results for a tool
const deleted = await cache.tool.invalidateByTool('get_weather');

// Invalidate a specific tool+args combination
const existed = await cache.tool.invalidate('get_weather', { city: 'Sofia' });

Session store

Key-value storage for agent session state with sliding window TTL. Fields are stored as individual Valkey keys (not Redis HASHes), enabling per-field TTL.

Key format: {name}:session:{threadId}:{field}

TTL behavior: get() refreshes TTL on hit (sliding window). set() sets TTL. touch() refreshes TTL on all fields.

// Get/set individual fields
await cache.session.set('thread-1', 'last_intent', 'book_flight');
const intent = await cache.session.get('thread-1', 'last_intent');

// Get all fields for a thread
const all = await cache.session.getAll('thread-1');

// Delete a field
await cache.session.delete('thread-1', 'last_intent');

// Destroy entire thread (including LangGraph checkpoints)
const deleted = await cache.session.destroyThread('thread-1');

// Refresh TTL on all fields
await cache.session.touch('thread-1');

Stats and self-optimization

stats()

Returns aggregate statistics for all tiers:

const stats = await cache.stats();
// {
//   llm: { hits: 150, misses: 50, total: 200, hitRate: 0.75 },
//   tool: { hits: 300, misses: 100, total: 400, hitRate: 0.75 },
//   session: { reads: 1000, writes: 500 },
//   costSavedMicros: 12500000, // $12.50 in microdollars
//   perTool: {
//     get_weather: { hits: 200, misses: 50, hitRate: 0.8, ttl: 300 },
//   }
// }
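
As a worked example of the cost accounting, here is how a single hit's savings could be derived from a costTable entry and the stored token counts. This is a hypothetical sketch of the arithmetic implied by costSavedMicros, not the package's actual internals:

```typescript
interface ModelCost {
  inputPer1k: number;  // dollars per 1k input tokens
  outputPer1k: number; // dollars per 1k output tokens
}

// Convert the tokens a cache hit avoided into microdollars saved.
function savedMicros(
  cost: ModelCost,
  tokens: { input: number; output: number },
): number {
  const dollars =
    (tokens.input / 1000) * cost.inputPer1k +
    (tokens.output / 1000) * cost.outputPer1k;
  return Math.round(dollars * 1_000_000); // microdollars, as in costSavedMicros
}

// A gpt-4o hit that skipped 10 input and 50 output tokens:
savedMicros({ inputPer1k: 0.0025, outputPer1k: 0.01 }, { input: 10, output: 50 });
// → 525 microdollars ($0.000525)
```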

toolEffectiveness()

Returns per-tool effectiveness rankings with TTL recommendations:

const effectiveness = await cache.toolEffectiveness();
// [
//   { tool: 'get_weather', hitRate: 0.85, costSaved: 5.00, recommendation: 'increase_ttl' },
//   { tool: 'search', hitRate: 0.6, costSaved: 2.50, recommendation: 'optimal' },
//   { tool: 'rare_api', hitRate: 0.1, costSaved: 0.10, recommendation: 'decrease_ttl_or_disable' },
// ]

| Recommendation | Criteria |
|---|---|
| increase_ttl | Hit rate > 80% and current TTL < 1 hour |
| optimal | Hit rate 40–80% |
| decrease_ttl_or_disable | Hit rate < 40% |
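
The thresholds reduce to a small decision function, sketched here for reference (the function is illustrative, not part of the API; hit rates outside all listed bands are treated as optimal):

```typescript
type Recommendation = 'increase_ttl' | 'optimal' | 'decrease_ttl_or_disable';

// Mirror of the documented criteria table.
function recommend(hitRate: number, ttlSeconds: number): Recommendation {
  if (hitRate > 0.8 && ttlSeconds < 3600) return 'increase_ttl';
  if (hitRate < 0.4) return 'decrease_ttl_or_disable';
  return 'optimal';
}

recommend(0.85, 300); // 'increase_ttl': hot tool, short TTL
recommend(0.6, 300);  // 'optimal'
recommend(0.1, 600);  // 'decrease_ttl_or_disable': cache barely helps
```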

Framework adapters

Three optional adapters are available as subpath exports. None of them adds framework dependencies to the base package; install an adapter's peer dependency only if you use that adapter.

LangChain

Import from @betterdb/agent-cache/langchain. Requires @langchain/core >= 0.3.0 as a peer dependency.

import { ChatOpenAI } from '@langchain/openai';
import { BetterDBLlmCache } from '@betterdb/agent-cache/langchain';

const model = new ChatOpenAI({
  model: 'gpt-4o',
  cache: new BetterDBLlmCache({ cache }),
});

The adapter implements LangChain’s BaseCache interface.

Vercel AI SDK

Import from @betterdb/agent-cache/ai. Requires ai ^6.0.135 as a peer dependency.

import { wrapLanguageModel } from 'ai';
import { openai } from '@ai-sdk/openai';
import { createAgentCacheMiddleware } from '@betterdb/agent-cache/ai';

const model = wrapLanguageModel({
  model: openai('gpt-4o'),
  middleware: createAgentCacheMiddleware({ cache }),
});

The middleware intercepts non-streaming doGenerate calls. On a cache hit, the model is not called and the response includes providerMetadata: { agentCache: { hit: true } } so consumers can distinguish cached responses from real zero-token calls. Responses containing tool-call parts are not cached to avoid breaking tool-calling workflows.
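
A consumer-side check for that flag might look like the following sketch, assuming the result shape described above (the function name is hypothetical):

```typescript
// Distinguish a cache hit from a genuine (possibly zero-token) model call by
// inspecting the providerMetadata the middleware attaches on hits.
function servedFromCache(result: {
  providerMetadata?: Record<string, Record<string, unknown>>;
}): boolean {
  return result.providerMetadata?.agentCache?.hit === true;
}

servedFromCache({ providerMetadata: { agentCache: { hit: true } } }); // true
servedFromCache({}); // false: no metadata means the model was actually called
```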

LangGraph

Import from @betterdb/agent-cache/langgraph. Requires @langchain/langgraph-checkpoint >= 0.1.0 as a peer dependency.

Works on vanilla Valkey 7+ with no modules. Unlike langgraph-checkpoint-redis, this does not require Redis 8.0+, RedisJSON, or RediSearch.

import { StateGraph } from '@langchain/langgraph';
import { BetterDBSaver } from '@betterdb/agent-cache/langgraph';

const checkpointer = new BetterDBSaver({ cache });

const graph = new StateGraph({ channels: schema })
  .addNode('agent', agentNode)
  .compile({ checkpointer });

The saver implements the full LangGraph checkpoint protocol including pendingWrites reconstruction, supporting interrupt/resume workflows, human-in-the-loop patterns, and parallel node execution.

Storage layout:

| Key pattern | Contents |
|---|---|
| {name}:session:{thread_id}:checkpoint:{id} | JSON-serialized CheckpointTuple |
| {name}:session:{thread_id}:checkpoint:latest | Pointer to the most recent checkpoint |
| {name}:session:{thread_id}:writes:{checkpoint_id}\|{task_id}\|{channel}\|{idx} | JSON-serialized pending write value |

Observability

OpenTelemetry

Every public method emits a span via the @opentelemetry/api tracer. Spans require an OpenTelemetry SDK to be configured in the host application — this package does not bundle an SDK.

| Span name | Key attributes |
|---|---|
| agent_cache.llm.check | cache.key, cache.model, cache.hit |
| agent_cache.llm.store | cache.key, cache.model, cache.ttl, cache.bytes |
| agent_cache.llm.invalidateByModel | cache.model, cache.deleted_count |
| agent_cache.tool.check | cache.key, cache.tool_name, cache.hit |
| agent_cache.tool.store | cache.key, cache.tool_name, cache.ttl, cache.bytes |
| agent_cache.tool.invalidateByTool | cache.tool_name, cache.deleted_count |
| agent_cache.session.get | cache.key, cache.thread_id, cache.field, cache.hit |
| agent_cache.session.set | cache.key, cache.thread_id, cache.field, cache.ttl, cache.bytes |
| agent_cache.session.getAll | cache.thread_id, cache.field_count |
| agent_cache.session.destroyThread | cache.thread_id, cache.deleted_count |
| agent_cache.session.touch | cache.thread_id, cache.touched_count |

Prometheus

All metric names are prefixed with the configured telemetry.metricsPrefix (default: agent_cache).

| Metric | Type | Labels | Description |
|---|---|---|---|
| {prefix}_requests_total | Counter | cache_name, tier, result, tool_name | Total cache requests. result is hit or miss |
| {prefix}_operation_duration_seconds | Histogram | cache_name, tier, operation | Duration of cache operations in seconds |
| {prefix}_cost_saved_total | Counter | cache_name, tier, model, tool_name | Estimated cost saved in dollars from cache hits |
| {prefix}_stored_bytes_total | Counter | cache_name, tier | Total bytes stored in cache |
| {prefix}_active_sessions | Gauge | cache_name | Approximate number of active session threads |

BetterDB Monitor integration

Connect BetterDB Monitor to the same Valkey instance and it will automatically detect the agent cache stats hash ({name}:__stats) and surface hit rates, cost savings, and per-tool effectiveness in the dashboard. No additional configuration is required.

Design tradeoffs

Individual keys vs Redis hashes for session state

Session fields are stored as individual Valkey keys, not as fields inside a single Redis HASH per thread. This allows per-field TTL and atomic operations on individual fields. The trade-off is that getAll() and destroyThread() require a SCAN + pipeline instead of a single HGETALL or DEL. For typical agent sessions with dozens of fields, this is negligible. For sessions with thousands of fields, a HASH-based approach would be faster for bulk reads.

Plain JSON strings vs RedisJSON for LangGraph checkpoints

The LangGraph adapter stores checkpoints as plain JSON strings via SET/GET, not via RedisJSON path operations. This is what makes the adapter work on vanilla Valkey 7+ and every managed service without module configuration. The trade-off is that list() with filtering requires SCAN + parse instead of indexed queries. For typical checkpoint volumes (hundreds to low thousands per thread), this is fast enough. langgraph-checkpoint-redis uses RedisJSON + RediSearch for O(1) indexed lookups — if you have millions of checkpoints per thread, use that instead.

Counter-based stats vs event streams

Cache statistics are stored as atomic counters in a single Valkey hash (HINCRBY), not as event streams. BetterDB Monitor computes rates by diffing counter values over time windows. The trade-off is no per-request event detail — you get aggregate hit rates and cost savings, not a log of every cache operation. Event streams are planned for a future release.
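
Rate-by-diffing can be sketched as follows. This mirrors what Monitor does with two snapshots of the monotonic counters; the names are hypothetical and this code does not ship in the package:

```typescript
// A snapshot of the hit/miss counters from the stats hash at one point in time.
interface Snapshot {
  hits: number;
  misses: number;
}

// Hit rate over the window between two snapshots; null when there was no traffic.
function windowHitRate(earlier: Snapshot, later: Snapshot): number | null {
  const hits = later.hits - earlier.hits;
  const misses = later.misses - earlier.misses;
  const total = hits + misses;
  return total > 0 ? hits / total : null;
}

windowHitRate({ hits: 100, misses: 50 }, { hits: 190, misses: 80 });
// 90 hits out of 120 requests in the window = 0.75
```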

Approximate active session tracking

The active_sessions Prometheus gauge is approximate — it tracks threads seen via an in-memory LRU (bounded at 10k entries), incremented on first write, decremented on destroyThread(). It does not survive process restarts and may drift if threads expire via TTL without an explicit destroy. For accurate session counts, query Valkey directly with SCAN.

Cluster mode

Cluster support works by running SCAN on each master node sequentially and merging results. When an iovalkey Cluster client is passed, destroyThread(), invalidateByModel(), invalidateByTool(), flush(), getAll(), touch(), and scanFieldsByPrefix() automatically iterate all master nodes. The trade-off is N sequential SCAN loops (one per master) instead of 1. For typical deployments with 3–6 masters, this is negligible — the operations were already O(n) over all keys. No API or configuration changes are needed; pass a Cluster instance and everything works correctly.

Known limitations

Streaming

The Vercel AI SDK adapter does not cache streaming LLM responses. To cache streamed output, accumulate the full response yourself before storing it. A cached response is always returned as a complete string, never re-streamed token by token.

LangGraph list() memory usage

The list() method loads all checkpoint data for a thread into memory before filtering and applying the limit. For typical agent deployments with hundreds of checkpoints per thread, this is acceptable. The limit: 1 fast path short-circuits by reading checkpoint:latest directly. For threads with thousands of large checkpoints, consider using langgraph-checkpoint-redis with Redis 8+ instead.

Self-healing corrupt entries

Corrupt (unparseable JSON) cache entries in the LLM and tool tiers are deleted on first detection and treated as misses. This ensures a bad entry is replaced by a fresh fetch on the next request instead of poisoning reads until its TTL expires.
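
The read path can be pictured as the following sketch, where deleteKey stands in for the Valkey DEL call and all names are illustrative rather than the package's internals:

```typescript
type CheckResult<T> = { hit: true; value: T } | { hit: false };

// Self-healing read: a missing key is a miss; a corrupt entry is deleted
// and also reported as a miss, so the caller re-fetches and re-stores.
function readEntry<T>(raw: string | null, deleteKey: () => void): CheckResult<T> {
  if (raw === null) return { hit: false };
  try {
    return { hit: true, value: JSON.parse(raw) as T };
  } catch {
    deleteKey(); // drop the corrupt entry so it cannot linger until TTL expiry
    return { hit: false };
  }
}

let deleted = false;
const result = readEntry<{ ok: boolean }>('{"ok":true', () => {
  deleted = true;
});
// result.hit === false and deleted === true: truncated JSON is purged as a miss
```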