One-Line Summary: HNSW (Hierarchical Navigable Small World) is the dominant approximate-nearest-neighbor index used by agent vector stores — it is the data structure underneath AgentDB, Pinecone, Qdrant, Weaviate, and most production memory layers, and understanding its trade-offs explains a lot about why agent recall feels the way it does.
Prerequisites: Vector databases, AgentDB and vector stores in harnesses
## What Is HNSW?
HNSW is a graph-based approximate-nearest-neighbor (ANN) index. Each indexed vector is a node; edges connect vectors that are near each other under the chosen distance metric (cosine, L2). The graph has multiple layers — top layers are sparse (long-range edges that quickly traverse the space), bottom layers are dense (local refinement). A query walks down the layers, narrowing the candidate set at each step.
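The layered greedy walk can be sketched in a few lines. This is a toy, hand-built two-layer graph (hypothetical coordinates and edges, not produced by a real insert procedure) — a real HNSW assigns each node's top layer randomly at insert time and searches with a beam of `efSearch` candidates rather than a single greedy walker:

```python
import math

# Hypothetical 2-D points and a hand-built two-layer neighbor graph.
points = {0: (0.0, 0.0), 1: (5.0, 0.0), 2: (9.0, 1.0), 3: (9.5, 4.0), 4: (2.0, 8.0)}
layers = [
    {0: [2], 2: [0]},                                    # top layer: sparse, long-range edges
    {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2], 4: [0]},   # bottom layer: dense, local edges
]

def greedy_search(query, entry=0):
    """Descend the layers, greedily hopping to any neighbor closer to the query."""
    cur = entry
    for layer in layers:
        improved = True
        while improved:
            improved = False
            for nb in layer.get(cur, []):
                if math.dist(points[nb], query) < math.dist(points[cur], query):
                    cur, improved = nb, True
    return cur
```

The top layer makes the long jump across the space; the bottom layer refines locally. For a query near `(9.5, 4.0)`, the walk jumps `0 → 2` on the sparse layer, then `2 → 3` on the dense one.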
The properties that make it the dominant choice:
- Sub-linear query time: ~O(log n) typical for top-K nearest neighbors.
- Decent write performance: New vectors can be inserted incrementally without rebuilding the index.
- Good recall at modest memory cost: 95–99% recall is achievable with reasonable parameters.
- No GPU required: runs on CPU at agent-relevant query rates.
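"Recall" in the list above means overlap with the exact brute-force answer. A small helper to measure it, shown against toy numpy data (nothing HNSW-specific — it works on any ANN index's output):

```python
import numpy as np

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-K neighbors that the ANN search returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

# Toy ground truth: brute-force exact top-10 under L2 distance.
rng = np.random.default_rng(0)
db = rng.normal(size=(1000, 64)).astype(np.float32)
query = rng.normal(size=64).astype(np.float32)
exact_top10 = np.argsort(np.linalg.norm(db - query, axis=1))[:10]

# An ANN index that returns 9 of the 10 true neighbors scores 0.9.
```

In practice you compute `exact_top10` by brute force on a held-out sample of queries and sweep index parameters until `recall_at_k` lands in the 95–99% range quoted above.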
## Why Agents Care About HNSW
The properties above translate directly to user experience. An agent that retrieves relevant memories in ~200ms per turn (HNSW on a small index) feels different from one that takes 2 seconds (a brute-force scan or a poorly tuned index). At ruflo's claimed scale (millions of trajectories), HNSW is the difference between feasible and infeasible.
The other reason: HNSW exposes tunable knobs (M, efConstruction, efSearch) that map directly to agent-relevant trade-offs. Higher M → more memory, higher recall. Higher efSearch → slower query, higher recall. Knowing these knobs is necessary if you're operating an agent memory layer in production.
## Key Technical Details
- Distance metric: Cosine for embeddings (default for most LLM-derived vectors); L2 for some image-text embeddings; inner product for some recommendation use cases.
- M parameter: Number of bidirectional links per node. Higher M = more memory + better recall. Defaults of 16–32 are typical for agent workloads.
- efConstruction: How many candidates to consider when adding a new node. Higher = slower indexing, better-quality graph.
- efSearch: How many candidates to consider during a query. The main runtime tunable.
- Filter-and-search: HNSW + metadata filtering is well-supported (Qdrant, Pinecone). Pre-filter narrows the search space; post-filter rejects results that don't match.
- Memory footprint: Roughly 4–8× the raw vector size, depending on M and dimensionality.
- Updates and deletes: HNSW handles incremental inserts well. Deletes are typically tombstones; periodic compaction is needed for sustained delete-heavy workloads.
- Cold-start: Building an HNSW index from scratch takes O(n log n) time, with memory growing roughly linearly in n.
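On the distance-metric point above: for unit-normalized vectors, cosine similarity and inner product coincide, which is why many stores normalize embeddings at ingest and then use the cheaper inner-product search. A quick check on toy numpy vectors:

```python
import numpy as np

a = np.array([3.0, 4.0])
b = np.array([1.0, 2.0])

# Cosine similarity of the raw vectors.
cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Normalize first: plain inner product now equals cosine similarity.
an, bn = a / np.linalg.norm(a), b / np.linalg.norm(b)
assert np.isclose(an @ bn, cos)
```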
## How Harnesses & Frameworks Implement This
| Harness / Framework | HNSW backend |
|---|---|
| ruflo | AgentDB uses HNSW directly |
| LangGraph (with Pinecone/Qdrant/Weaviate) | HNSW under the hood |
| AutoGen + Chroma | HNSW |
| CrewAI + Qdrant/Chroma | HNSW |
| Claude Code | Whatever the connected vector-DB MCP uses (typically HNSW) |
| Most others | Indirectly via vector-DB choice |
## Connections to Other Concepts
- agentdb-and-vector-stores-in-harnesses.md — The level this sits at.
- harness-owned-memory.md — The category.
- cross-session-memory-strategies.md — When HNSW recall fires.
- ../../ai-agent-concepts/06-knowledge-and-retrieval/semantic-search.md — Foundational coverage.
## Further Reading
- Malkov & Yashunin, "Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs" (2016) — The original HNSW paper.
- Pinecone / Qdrant engineering blogs — Practical HNSW tuning guides.