Agents & RAG · Module 25 · 7 min read

Agentic Hybrid RAG

Two complementary improvements over naive RAG — hybrid sparse-dense retrieval, and an LLM-driven loop — stacked. The result is the configuration most modern production RAG systems land at.

The five-bullet version

  • Hybrid retrieval combines BM25 (lexical) and dense vector search; agentic adds an LLM in a loop.
  • Stacked together: hybrid retrieval gives a stronger first stage; the agent decides when to retrieve, with what query, against which index.
  • Per-query, the agent can choose BM25-heavy (rare terms, IDs, code) or dense-heavy (paraphrases, semantic).
  • Most production retrieval systems in 2026 look like this — hybrid first stage, optional rerank, agent loop on top.
  • Cost grows: hybrid is ~2× the retrieval work; agentic adds multiple LLM calls per query.

§ 00 · TWO IDEAS, COMBINED · Hybrid + agentic, why both

We’ve covered both pieces separately. Hybrid retrieval combines lexical (BM25) and dense (vector) retrieval, scored together via rank fusion or weighted sum; it covers the paraphrase and exact-match failure modes that pure dense or pure lexical miss (see Advanced RAG). Agentic RAG puts the LLM in the retrieval loop: the model iteratively decides what to retrieve, observes results, and decides what to retrieve next (see the Agentic RAG lesson). They fix orthogonal problems: hybrid improves the quality of any single retrieval; the agent removes the limitation of running exactly one retrieval per question.

Stack them: hybrid retrieval as the building block, agentic loop as the controller. Each retrieval the agent issues is itself a hybrid search.

§ 01 · THE HYBRID LAYER · Quick refresher

For each query, run two retrievals in parallel:

  • BM25 over an inverted index (lexical, exact-term match).
  • Dense vector search over embeddings (semantic, paraphrase-tolerant).

Combine via Reciprocal Rank Fusion (RRF) or a weighted score sum. RRF is a simple, parameter-free fusion method: for each doc, compute Σ 1/(k + rank_i) across the retrievers; it is robust to retrievers having very different score scales. Optionally rerank the fused list with a cross-encoder.

Fig 1 · The hybrid retrieval pipeline: query → BM25 (lexical) and dense (vector) in parallel → RRF fusion (top-50) → cross-encoder rerank (top-5) → LLM. Two paths, fused, optionally reranked; each catches what the other misses. The most common production shape.
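The fusion step is small enough to show in full. A minimal sketch of RRF over ranked doc-id lists (the constant k = 60 is the value commonly used in the literature; the doc ids are made up):

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: score(doc) = sum over retrievers of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:  # each ranking: list of doc ids, best first
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["d3", "d1", "d7"]   # lexical ranking
dense_hits = ["d1", "d9", "d3"]  # vector ranking
print(rrf_fuse([bm25_hits, dense_hits]))  # → ['d1', 'd3', 'd9', 'd7']
```

Note that RRF only looks at ranks, never raw scores, which is exactly why it tolerates BM25 and cosine-similarity scales being incomparable.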

§ 02 · THE AGENTIC LOOP ON TOP · Driving multiple hybrid retrievals

With hybrid retrieval as a primitive, the agent operates above:

  1. Receive user question.
  2. Decide what to retrieve. Issue a hybrid_search(query) call.
  3. Observe the top-k results.
  4. Have enough? Answer. Need more? Issue another hybrid_search.
  5. Loop with step cap (typically 3–5).

Each retrieval is hybrid; the agent only sees the merged top-k. The agent doesn’t need to know whether dense or BM25 surfaced a given chunk — the fusion is invisible.
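The five steps above can be sketched as a short control loop. `hybrid_search` and `llm_decide` are hypothetical stand-ins for your retrieval stack and model call, not a specific framework's API:

```python
MAX_STEPS = 4  # step cap, per the 3-5 range above

def answer(question, hybrid_search, llm_decide):
    """Agentic loop: retrieve, observe, decide to answer or search again."""
    context = []
    query = question
    for _ in range(MAX_STEPS):
        context.extend(hybrid_search(query))      # agent only sees the merged top-k
        decision = llm_decide(question, context)  # {"action": "answer"|"search", ...}
        if decision["action"] == "answer":
            return decision["text"]
        query = decision["query"]                 # reformulated sub-query
    # Step cap hit: force a best-effort answer from what we have
    return llm_decide(question, context, force_answer=True)["text"]
```

The step cap matters in production: without it, a model that keeps reformulating can burn unbounded retrievals and LLM calls on a single question.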

§ 03 · PER-QUERY ROUTING DECISIONS · Letting the agent choose its weapon

A modest extension: expose multiple retrieval tools, so the agent can pick which strategy fits each sub-query.

The agent learns to route: when the question says “what does error E-7321 mean,” lexical wins; when the question is “how do I roll back a release,” semantic wins; default otherwise to hybrid. A small system-prompt nudge teaches the routing.
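One way this looks in practice, as a sketch: expose three tools whose descriptions carry the routing hint, plus a cheap heuristic fallback that mirrors the system-prompt nudge. The tool names and the regex are illustrative assumptions, not a specific framework's API:

```python
import re

# Tool schema handed to the agent; the descriptions do the routing teaching.
TOOLS = [
    {"name": "lexical_search",  "description": "BM25. Best for error codes, IDs, exact strings."},
    {"name": "semantic_search", "description": "Dense vectors. Best for paraphrases and how-to questions."},
    {"name": "hybrid_search",   "description": "Both, RRF-fused. Safe default."},
]

def route(query):
    """Heuristic fallback routing when no LLM decision is available."""
    if re.search(r"\b[A-Z]+-?\d{3,}\b", query):  # looks like an error code or ID
        return "lexical_search"
    if query.lower().startswith(("how do i", "how to", "why")):
        return "semantic_search"
    return "hybrid_search"

route("what does error E-7321 mean")    # → "lexical_search"
route("how do I roll back a release")   # → "semantic_search"
```

In most deployments the LLM does the routing via tool choice and a prompt nudge; a deterministic heuristic like this is mainly useful as a cheap pre-filter or a fallback when you want to skip an LLM call.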

§ 04 · PRODUCTION SHAPE AND TRADE-OFFS · What this looks like at scale

Cost profile compared to vanilla RAG:

  • Hybrid first stage: roughly 2× the retrieval work of a single index.
  • Agent loop: multiple LLM calls per query instead of one.
  • Optional cross-encoder rerank: extra per-candidate scoring on the fused top-k.

When this is the right choice:

  • Mixed query types against one corpus: exact-term lookups (IDs, error codes) and paraphrased questions.
  • Evals show multi-hop or cross-document questions dominating the failure tail.

When to skip:

  • Queries are mostly single-hop and a one-shot hybrid pipeline already answers them well; the agent loop then adds latency and cost for little gain.

CHECK · An enterprise search app has 50k mixed docs (PDFs, code, wikis). Users ask both “what’s our SOC 2 policy” (paraphrase) and “what does error 9341 mean” (exact). Best retrieval architecture?

§ 05 · TAKING THIS FORWARD · Where this is going

The current frontier is making the loop cheaper (smaller, faster models for the routing/decision steps; cached intermediate retrievals) and the retrieval smarter (learned fusion weights, per-query embedding model selection). The architectural shape is stable; the engineering keeps improving.

§ · GOING DEEPER · Hybrid retrieval and when to add the agent on top

Hybrid retrieval = sparse + dense, fused. BM25 (Robertson & Zaragoza 2009) is a strong baseline for queries with exact terms — error codes, product names, acronyms. Dense vectors handle paraphrase. Fusing the two via Reciprocal Rank Fusion (RRF) or weighted score combination consistently beats either alone. This is the pattern almost every production search system uses in 2026.

Adding an agent loop on top buys you two things: multi-step questions and query reformulation. The model can run a cheap hybrid search, observe what came back, decide a sub-question wasn’t answered, and run a different search. Costs latency but unlocks question shapes the one-shot pipeline can’t handle. Worth the cost when the eval shows multi-hop or cross-document queries dominating the tail.

§ · FURTHER READING · References & deeper sources

  1. Robertson & Zaragoza (2009). The Probabilistic Relevance Framework: BM25 and Beyond · Foundations and Trends in IR
  2. Bruch et al. (2023). An Analysis of Fusion Functions for Hybrid Retrieval · ACM TOIS
  3. Karpukhin et al. (2020). Dense Passage Retrieval · EMNLP
  4. Yao et al. (2022). ReAct · ICLR
  5. Gao et al. (2023). RAG Survey · arXiv

Original figures live in the linked sources — open the papers for the canonical visuals in their full context.