Agents & RAG · Module 24 · 8 min read

Agentic RAG

Standard RAG retrieves once, then generates. Agentic RAG lets the model drive — decide what to look up, look at the result, decide what to look up next. Multi-hop questions, query refinement, and tool use beyond search all open up.

The five-bullet version

  • Naive RAG retrieves once and answers. Multi-hop or refining queries break this pattern.
  • Agentic RAG puts the LLM in a loop with a search() tool — it decides what to retrieve, observes, decides what's next.
  • The agent can decompose questions, route to different indices, and combine results from multiple searches.
  • Tool use generalizes beyond search — call APIs, run code, query databases as part of the same loop.
  • Pay attention to latency and cost: every loop step adds an LLM call.

§ 00 · WHAT NAIVE RAG CAN’T DO · Single-shot retrieval’s limits

Naive RAG embeds the user query, retrieves the closest chunks, hands them to the LLM. One round trip, one answer. Many real questions don’t fit that shape.

Three failure shapes:

  • Multi-hop questions: the answer chains facts from different documents, but a single retrieval only surfaces chunks near the original query.
  • Queries that need refinement: the user's phrasing doesn't match the corpus vocabulary, and one-shot RAG never gets to reformulate after seeing results.
  • Tasks beyond search: answering requires an API call, a computation, or a database query that a single retrieve-then-generate pass can't perform.

§ 01 · AGENTIC = LLM IN THE RETRIEVAL LOOP · The model drives

Agentic RAG treats retrieval as a tool the model calls multiple times in a loop:

  1. Receive user question.
  2. Decide what to search for first.
  3. Call search(query).
  4. Read the results.
  5. Decide: have enough? Answer. Need more? Pick the next search.
  6. Loop until done or step cap reached.
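The six steps above can be sketched as a small loop. This is a minimal sketch, not a production implementation: `llm` and `search` are hypothetical stand-ins, where `llm` returns either `{"action": "search", "query": ...}` or `{"action": "answer", "text": ...}`, and `search` returns text chunks.

```python
def agentic_rag(question, llm, search, max_steps=6):
    observations = []  # accumulated search results the model has seen
    for _ in range(max_steps):
        # The model sees the question plus everything retrieved so far
        # and decides the next step itself (steps 2, 4, and 5).
        decision = llm(question=question, observations=observations)
        if decision["action"] == "answer":
            return decision["text"]
        # Otherwise it chose a new search query; run it and loop (step 3).
        observations.append(search(decision["query"]))
    # Step cap reached (step 6): force a final answer from what we have.
    return llm(question=question, observations=observations,
               force_answer=True)["text"]
```

The step cap matters: without it, a model that keeps deciding "need more" loops forever.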
Lab · agentic retrieval trace · Two retrievals, chained by the model itself
Step 1/7 · Decide: User asks: 'Which of our 2025 customers are renewing in Q1 2026, and what's their ARR?' I need two retrievals — a list of 2025 customers, then their renewal dates and ARR. Let me search for the customer list first.

§ 02 · QUERY ROUTING & DECOMPOSITION · Smarter retrieval setup

Two specific patterns inside the loop:

  • Query routing: the agent picks which index or collection to search (product docs, billing records, engineering wiki) based on the question.
  • Query decomposition: the agent splits a compound question into standalone sub-queries, retrieves for each, and combines the results.
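Routing and decomposition compose naturally: decompose first, then route each sub-query. A hedged sketch, where `llm_split`, `llm_route`, and the per-index search functions are all hypothetical stand-ins:

```python
def answer_with_routing(question, llm_split, llm_route, searchers):
    # Decomposition: break a compound question into standalone sub-queries.
    sub_queries = llm_split(question)
    evidence = []
    for sub in sub_queries:
        # Routing: pick the index whose corpus best matches this sub-query.
        index_name = llm_route(sub, list(searchers))
        evidence.append(searchers[index_name](sub))
    return evidence  # combined results, handed to the generator
```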

§ 03 · TOOL USE BEYOND SEARCH · Retrieval is one tool among several

Once the model is in a tool-calling loop, retrieval is just one tool. Add others:

  • Database queries for structured lookups the index can't answer.
  • API calls for live data such as pricing or account status.
  • Code execution for calculations and transformations.

The model decides which tool fits each step. Retrieval becomes one option in a richer action space. The pattern blends RAG with general agentic behavior — see the Agentic Patterns lesson.
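One way to picture this richer action space is a tool registry the loop dispatches into. The names and signatures below are illustrative, not a real library's API:

```python
TOOLS = {
    "search": lambda q: "chunks for " + q,      # vector/hybrid retrieval
    "sql":    lambda q: "rows for " + q,        # database query
    "calc":   lambda expr: str(eval(expr)),     # run code -- sandbox this in practice!
}

def run_tool(decision):
    # `decision` is the model's tool call, e.g. {"tool": "sql", "input": "..."}
    return TOOLS[decision["tool"]](decision["input"])
```

The loop from § 01 stays the same; only the set of actions the model can pick from grows.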

§ 04 · WHEN AGENTIC EARNS ITS COST · Latency vs flexibility

Every loop iteration is at least one LLM call plus one tool call. Naive RAG is one round trip; agentic might be 5–10. Latency multiplies; token spend multiplies. Use agentic RAG when:

  • Questions are multi-hop or comparative, so no single retrieval can gather all the evidence.
  • Queries need routing or decomposition across several indices.
  • Answering requires tools beyond search (APIs, databases, code).

Skip agentic RAG when:

  • Most questions are single-hop lookups that one retrieval already answers.
  • Latency or token budgets are tight and one-shot recall is acceptable.
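To make "latency multiplies" concrete, here is a back-of-envelope comparison. The per-call numbers are invented for illustration:

```python
LLM_SECONDS, TOOL_SECONDS, LLM_DOLLARS = 1.5, 0.2, 0.01   # assumed costs

def pipeline_cost(llm_calls, tool_calls):
    latency = llm_calls * LLM_SECONDS + tool_calls * TOOL_SECONDS
    return latency, llm_calls * LLM_DOLLARS

naive   = pipeline_cost(llm_calls=1, tool_calls=1)  # retrieve once, answer
agentic = pipeline_cost(llm_calls=8, tool_calls=7)  # 7 loop steps + final answer
```

Under these assumptions the agentic pipeline is roughly 8x the latency and 8x the token spend of the naive one, which is why the one-shot path should stay the default for questions that fit it.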

CHECK · A customer support bot answers questions about your product. 80% of questions need one doc lookup. 20% need 2–3 lookups across product + billing. What's the right architecture?

§ 05 · TAKING THIS FORWARD · Adjacent variants to know

Next: Agentic Hybrid RAG (combines agentic looping with hybrid sparse-dense retrieval) and HyPA-RAG (a hybrid parameter-adaptive variant that tunes retrieval settings per query). The space is moving fast; the patterns above are durable, the labels are not.

§ · GOING DEEPER · When to upgrade from one-shot RAG to an agent

Agentic RAG places the LLM in a loop with retrieval as a tool the model can call multiple times. ReAct (Yao et al. 2022) defined the pattern: alternate Thought tokens with Action (tool call) tokens, observe results, decide what to do next. For single-hop factual queries, one-shot retrieval still wins on latency and cost; for multi-hop or comparative questions, the loop is the only thing that works.

Two follow-ups in the literature improve the basic loop. FLARE (Jiang et al. 2023) decides when to retrieve by watching the model’s token confidence — it only triggers retrieval when the next token is uncertain. Self-RAG (Asai et al. 2023) trains the model to emit special tokens marking when to retrieve and when to critique its own output. Both cut unnecessary retrievals without sacrificing recall on the questions that need them.
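The FLARE-style confidence gate can be sketched in a few lines. This is a simplified reading of the idea, not the paper's implementation: given a tentative continuation and its per-token probabilities, retrieve only if some token falls below a threshold, using the draft itself as the search query.

```python
def flare_step(draft_tokens, token_probs, threshold=0.6):
    # If the tentative continuation contains a low-confidence token,
    # signal a retrieval using the draft text as the query; otherwise
    # keep the draft and skip retrieval entirely.
    if min(token_probs) < threshold:
        return {"retrieve": True, "query": " ".join(draft_tokens)}
    return {"retrieve": False}
```

The payoff is the skip path: confident spans generate straight through with zero retrieval calls, which is exactly where naive agentic loops waste latency.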

§ · FURTHER READING · References & deeper sources

  1. Yao et al. (2022). ReAct: Synergizing Reasoning and Acting in Language Models · ICLR
  2. Schick et al. (2023). Toolformer: Language Models Can Teach Themselves to Use Tools · NeurIPS
  3. Asai et al. (2023). Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection · ICLR
  4. Jiang et al. (2023). Active Retrieval Augmented Generation (FLARE) · EMNLP
  5. Trivedi et al. (2023). Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions (IRCoT) · ACL

Original figures live in the linked sources — open the papers for the canonical visuals in their full context.