One-Line Summary: HNSW (Hierarchical Navigable Small World) is the dominant approximate-nearest-neighbor index used by agent vector stores — it is the data structure underneath AgentDB, Pinecone, Qdrant, Weaviate, and most production memory layers, and understanding its trade-offs explains a lot about why agent recall feels the way it does.

Prerequisites: Vector databases, AgentDB and vector stores in harnesses

What Is HNSW?

HNSW is a graph-based approximate-nearest-neighbor (ANN) index. Each indexed vector is a node; edges connect vectors that are near each other under the chosen distance metric (cosine, L2). The graph has multiple layers — top layers are sparse (long-range edges that quickly traverse the space), bottom layers are dense (local refinement). A query walks down the layers, narrowing the candidate set at each step.
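
To make the layer walk concrete, here is a toy greedy descent over a hand-built two-layer graph. This is a minimal sketch, not real HNSW: the actual algorithm keeps a candidate beam of size efSearch per layer and assigns nodes to layers probabilistically at insert time, and every name and graph below is made up for illustration.

```python
import numpy as np

def layered_greedy_search(layers, vectors, query, entry_point):
    """layers: adjacency dicts ordered sparsest (top) to densest (bottom).
    vectors: node id -> np.ndarray. Returns the id of the best node found."""
    current = entry_point
    for graph in layers:
        moved = True
        while moved:  # hill-climb: hop to any strictly closer neighbor
            moved = False
            best = np.linalg.norm(vectors[current] - query)
            for neighbor in graph.get(current, ()):
                d = np.linalg.norm(vectors[neighbor] - query)
                if d < best:
                    best, current, moved = d, neighbor, True
    return current

# Eight points on a line; a sparse top layer and a dense bottom layer.
vectors = {i: np.array([float(i)]) for i in range(8)}
top = {0: [4], 4: [0]}                                       # long-range hops
bottom = {i: [j for j in (i - 1, i + 1) if 0 <= j < 8] for i in range(8)}
print(layered_greedy_search([top, bottom], vectors, np.array([6.2]), 0))  # -> 6
```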

The properties that make it the dominant choice:

  • Sub-linear query time: ~O(log n) typical for top-K nearest neighbors.
  • Decent write performance: New vectors can be inserted incrementally without rebuilding the index.
  • Good recall at modest memory cost: 95–99% recall is achievable with reasonable parameters.
  • No GPU required: Works on CPU at agent-relevant query rates.
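
The incremental-insert property is easy to see with hnswlib, a widely used open-source HNSW implementation (it sits under the hood of some vector stores); the dimensions, capacities, and parameter values below are illustrative, not recommendations.

```python
import numpy as np
import hnswlib

dim = 384
index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=10_000, ef_construction=200, M=16)

# Insert an initial batch, then append more later -- no rebuild required.
first = np.random.rand(1_000, dim).astype(np.float32)
index.add_items(first, ids=np.arange(1_000))

later = np.random.rand(500, dim).astype(np.float32)
index.add_items(later, ids=np.arange(1_000, 1_500))

index.resize_index(50_000)  # grow capacity in place when headroom runs out

index.set_ef(64)  # efSearch; must be >= k
labels, distances = index.knn_query(later[:1], k=5)
```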

Why Agents Care About HNSW

The properties above translate directly to user experience. An agent that takes 200 ms to retrieve relevant memories at every turn (HNSW on a small index) feels very different from one that takes 2 seconds (a linear scan or a poorly tuned index). At ruflo's claimed scale (millions of trajectories), HNSW is the difference between a feasible retrieval step and an infeasible one.

The other reason: HNSW exposes tunable knobs (M, efConstruction, efSearch) that map directly to agent-relevant trade-offs. Higher M → more memory, higher recall. Higher efSearch → slower query, higher recall. Knowing these knobs is necessary if you're operating an agent memory layer in production.
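
One way to internalize the efSearch trade-off is to sweep it against brute-force ground truth on synthetic data. A sketch using hnswlib and numpy; all sizes and parameter values are illustrative.

```python
import time
import numpy as np
import hnswlib

dim, n, k = 128, 20_000, 10
rng = np.random.default_rng(0)
data = rng.standard_normal((n, dim)).astype(np.float32)
queries = rng.standard_normal((100, dim)).astype(np.float32)

# Exact ground truth by brute force (cosine = dot product after normalizing).
data /= np.linalg.norm(data, axis=1, keepdims=True)
queries /= np.linalg.norm(queries, axis=1, keepdims=True)
truth = np.argsort(-(queries @ data.T), axis=1)[:, :k]

index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=n, ef_construction=200, M=16)
index.add_items(data)

for ef in (16, 64, 256):
    index.set_ef(ef)  # efSearch: candidate-list size during the query
    t0 = time.perf_counter()
    labels, _ = index.knn_query(queries, k=k)
    per_query_ms = (time.perf_counter() - t0) / len(queries) * 1e3
    recall = np.mean([len(set(l) & set(t)) / k for l, t in zip(labels, truth)])
    print(f"efSearch={ef:4d}  recall@{k}={recall:.3f}  {per_query_ms:.2f} ms/query")
```

Higher efSearch monotonically improves recall at the cost of latency; the sweet spot depends on the index size and the recall your memory layer actually needs.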

Key Technical Details

  • Distance metric: Cosine for embeddings (default for most LLM-derived vectors); L2 for some image-text embeddings; inner product for some recommendation use cases.
  • M parameter: Number of bidirectional links per node. Higher M = more memory + better recall. Defaults of 16–32 are typical for agent workloads.
  • efConstruction: How many candidates to consider when adding a new node. Higher = slower indexing, better-quality graph.
  • efSearch: How many candidates to consider during a query. The main runtime tunable.
  • Filter-and-search: HNSW + metadata filtering is well-supported (Qdrant, Pinecone). Pre-filtering narrows the search space before the graph walk; post-filtering rejects non-matching results afterward (see the sketch after this list).
  • Memory footprint: Roughly 4–8× the raw vector size, depending on M and dimensionality. As a back-of-envelope: 1M 1536-dim float32 vectors are ~6 GB raw, so budget roughly 25–50 GB indexed.
  • Updates and deletes: HNSW handles incremental inserts well. Deletes are typically tombstoned rather than removed from the graph, so sustained delete-heavy workloads need periodic compaction.
  • Cold-start: Building an HNSW index from scratch is O(n log n) in time; memory grows linearly with n (scaled by the per-node link budget M).
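
To see filter-and-search end to end, here is a hedged sketch with the qdrant-client Python package, which applies payload filters during the HNSW traversal rather than as a pure post-filter. The collection name, payload fields, and vectors are made up, and the in-memory mode is for experimentation only.

```python
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, VectorParams, PointStruct, Filter, FieldCondition, MatchValue,
)

client = QdrantClient(":memory:")  # in-process mode, no server needed
client.create_collection(
    collection_name="agent_memories",
    vectors_config=VectorParams(size=4, distance=Distance.COSINE),
)
client.upsert(
    collection_name="agent_memories",
    points=[
        PointStruct(id=1, vector=[0.1, 0.2, 0.3, 0.4], payload={"user_id": "u42"}),
        PointStruct(id=2, vector=[0.4, 0.3, 0.2, 0.1], payload={"user_id": "u7"}),
    ],
)
# Nearest neighbors restricted to one user's memories.
hits = client.search(
    collection_name="agent_memories",
    query_vector=[0.1, 0.2, 0.3, 0.4],
    query_filter=Filter(
        must=[FieldCondition(key="user_id", match=MatchValue(value="u42"))]
    ),
    limit=5,
)
```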

How Harnesses & Frameworks Implement This

  • ruflo: AgentDB uses HNSW directly.
  • LangGraph (with Pinecone/Qdrant/Weaviate): HNSW under the hood.
  • AutoGen + Chroma: HNSW.
  • CrewAI + Qdrant/Chroma: HNSW.
  • Claude Code: Whatever the connected vector-DB MCP uses (typically HNSW).
  • Most others: Indirectly, via the vector-DB choice.

Connections to Other Concepts

  • agentdb-and-vector-stores-in-harnesses.md — The level this sits at.
  • harness-owned-memory.md — The category.
  • cross-session-memory-strategies.md — When HNSW recall fires.
  • ../../ai-agent-concepts/06-knowledge-and-retrieval/semantic-search.md — Foundational coverage.

Further Reading

  • Malkov & Yashunin, "Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs" (2016) — The original HNSW paper.
  • Pinecone / Qdrant engineering blogs — Practical HNSW tuning guides.