One-Line Summary: Assembling the five research skills into a LangGraph state machine with typed state, conditional routing, and a system prompt that guides the research workflow.
Prerequisites: implementing-the-skill-set.md, project-overview-and-requirements.md, choosing-your-framework.md
What Is the Agent Graph?
Skills are like instruments in an orchestra -- each plays perfectly alone, but without a conductor and score there is no symphony. The agent graph defines execution order, branching conditions, and the shared state passing information between skills.
In LangGraph terms, the agent graph is a StateGraph where each node reads from and writes to shared state, and edges define transitions. Some edges are unconditional, others conditional based on state. The graph compiles into a runnable that executes the full research workflow. Your decisions about state shape, edge conditions, and node boundaries determine whether the agent is robust or fragile.
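The node-and-edge model described above can be sketched in plain Python before bringing in LangGraph. This is only an illustration of the idea, not LangGraph's implementation: `run_graph`, `collect`, `finish`, and `enough` are hypothetical names, and the merge rule (a returned key overwrites the old value) is a simplifying assumption.

```python
from typing import Callable

State = dict
Node = Callable[[State], dict]


def run_graph(nodes, edges, routers, state, start, end="END", max_steps=50):
    """Run nodes from `start` until `end`, merging each partial update."""
    current = start
    for _ in range(max_steps):
        if current == end:
            return state
        update = nodes[current](state)
        state = {**state, **update}  # only the returned keys change
        if current in routers:
            # Conditional edge: a routing function picks the next node.
            route_fn, mapping = routers[current]
            current = mapping[route_fn(state)]
        else:
            # Unconditional edge: the next node is fixed.
            current = edges[current]
    raise RuntimeError("step limit reached; check for a routing loop")


def collect(state: State) -> dict:
    """Toy node: append one item per visit."""
    return {"items": state["items"] + [len(state["items"])]}


def finish(state: State) -> dict:
    """Toy node: mark the run as complete."""
    return {"status": "done"}


def enough(state: State) -> str:
    """Toy routing function: loop until three items are collected."""
    return "yes" if len(state["items"]) >= 3 else "no"


final = run_graph(
    nodes={"collect": collect, "finish": finish},
    edges={"finish": "END"},
    routers={"collect": (enough, {"yes": "finish", "no": "collect"})},
    state={"items": [], "status": "starting"},
    start="collect",
)
print(final)  # {'items': [0, 1, 2], 'status': 'done'}
```

The `max_steps` guard plays the same role as the search-round cap used later in this section: a loop in the graph needs an explicit exit.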
How It Works
State Definition
```python
import asyncio
import json
import time

from typing_extensions import TypedDict
from langgraph.graph import StateGraph, START, END

DEPTH_TO_SOURCES = {"quick": 3, "standard": 5, "deep": 8}


class ResearchState(TypedDict):
    topic: str
    depth: str  # "quick" | "standard" | "deep"
    focus_areas: list[str]
    search_queries: list[str]
    search_results: list[dict]
    search_rounds_completed: int
    pages_read: list[dict]
    pages_failed: list[str]
    summaries: list[dict]
    all_claims: list[dict]  # {"claim": str, "source_url": str}
    fact_checks: list[dict]
    report: dict | None
    metadata: dict
    status: str
    error_log: list[str]
```

Node Functions
Each node takes current state, returns a partial update:
```python
# llm_client and the skill functions and models (web_search, read_page,
# summarize, fact_check, write_report and their Input/Output types) are
# assumed to be imported from the module built in implementing-the-skill-set.md.

async def generate_queries_node(state: ResearchState) -> dict:
    topic, focus = state["topic"], state.get("focus_areas", [])
    prompt = (f"Generate 3 diverse search queries for: {topic}\n"
              f"{'Focus: ' + ', '.join(focus) if focus else ''}\n"
              f'JSON: {{"queries": ["q1", "q2", "q3"]}}')
    resp = await llm_client.chat.completions.create(
        model="gpt-4o-mini",
        # SYSTEM_PROMPT is defined at module level, further down.
        messages=[{"role": "system", "content": SYSTEM_PROMPT},
                  {"role": "user", "content": prompt}],
        response_format={"type": "json_object"}, temperature=0.7)
    # Do not reset search_rounds_completed here: this node runs again on every
    # "need_more" loop, and a reset would defeat the round cap in routing.
    return {"search_queries": json.loads(resp.choices[0].message.content)["queries"]}


async def search_node(state: ResearchState) -> dict:
    existing = {r["url"] for r in state.get("search_results", [])}
    new = []
    for q in state["search_queries"]:
        out = await web_search(WebSearchInput(query=q, num_results=5))
        for r in out.results:
            if r.url not in existing:  # de-duplicate across queries and rounds
                new.append(r.model_dump())
                existing.add(r.url)
    return {"search_results": state.get("search_results", []) + new,
            "search_rounds_completed": state["search_rounds_completed"] + 1}


async def read_pages_node(state: ResearchState) -> dict:
    target = DEPTH_TO_SOURCES[state.get("depth", "standard")]
    failed = list(state.get("pages_failed", []))
    # Skip URLs that were already read or are already known to fail.
    skip = {p["url"] for p in state.get("pages_read", [])} | set(failed)
    urls = [r["url"] for r in state["search_results"]
            if r["url"] not in skip][:target]
    results = await asyncio.gather(
        *[read_page(ReadPageInput(url=u)) for u in urls], return_exceptions=True)
    pages = list(state.get("pages_read", []))
    errors = list(state.get("error_log", []))
    # gather preserves input order, so each result pairs with its URL.
    for url, r in zip(urls, results):
        if isinstance(r, Exception):
            failed.append(url)
            errors.append(f"read_page {url}: {r!r}")
        elif r.success:
            pages.append(r.model_dump())
        else:
            failed.append(url)
    return {"pages_read": pages, "pages_failed": failed, "error_log": errors}


async def summarize_node(state: ResearchState) -> dict:
    done = {s.get("source_url") for s in state.get("summaries", [])}
    new_pages = [p for p in state["pages_read"] if p["url"] not in done]
    results = await asyncio.gather(
        *[summarize(SummarizeInput(text=p["content"], focus=state["topic"],
                                   source_url=p["url"])) for p in new_pages],
        return_exceptions=True)
    sums = list(state.get("summaries", []))
    claims = list(state.get("all_claims", []))
    for r in results:
        if isinstance(r, Exception):
            continue  # the page is retried if the graph loops back
        sums.append(r.model_dump())
        claims.extend({"claim": c, "source_url": r.source_url} for c in r.key_claims)
    return {"summaries": sums, "all_claims": claims}


async def fact_check_node(state: ResearchState) -> dict:
    # Only the first five claims are checked, to bound cost.
    results = await asyncio.gather(
        *[fact_check(FactCheckInput(claim=c["claim"], original_source=c["source_url"]))
          for c in state["all_claims"][:5]], return_exceptions=True)
    return {"fact_checks": [r.model_dump() for r in results
                            if not isinstance(r, Exception)]}


async def write_report_node(state: ResearchState) -> dict:
    report = await write_report(WriteReportInput(
        topic=state["topic"],
        summaries=[SummarizeOutput(**s) for s in state["summaries"]],
        fact_checks=[FactCheckOutput(**fc) for fc in state["fact_checks"]],
        sources=[WebSearchResult(**r) for r in state["search_results"]]))
    return {"report": report.model_dump(), "status": "done"}
```

Edge Logic and Conditional Routing
```python
def should_search_more(state: ResearchState) -> str:
    target = DEPTH_TO_SOURCES[state.get("depth", "standard")]
    if len(state.get("summaries", [])) >= target:
        return "enough_sources"
    if state.get("search_rounds_completed", 0) >= 3:
        return "enough_sources"  # Cap to prevent infinite loops
    return "need_more"
```

Complete Graph Construction
```python
def build_research_graph() -> StateGraph:
    graph = StateGraph(ResearchState)
    graph.add_node("generate_queries", generate_queries_node)
    graph.add_node("search", search_node)
    graph.add_node("read_pages", read_pages_node)
    graph.add_node("summarize", summarize_node)
    graph.add_node("fact_check", fact_check_node)
    graph.add_node("write_report", write_report_node)
    graph.add_edge(START, "generate_queries")
    graph.add_edge("generate_queries", "search")
    graph.add_edge("search", "read_pages")
    graph.add_edge("read_pages", "summarize")
    graph.add_conditional_edges("summarize", should_search_more,
        {"enough_sources": "fact_check", "need_more": "generate_queries"})
    graph.add_edge("fact_check", "write_report")
    graph.add_edge("write_report", END)
    return graph


research_graph = build_research_graph().compile()
```

System Prompt and Runner
```python
SYSTEM_PROMPT = """You are a thorough research assistant. Guidelines:
1. Generate diverse queries covering different angles of the topic.
2. Focus on facts, data, and specific verifiable claims.
3. When fact-checking, look for independent corroboration.
4. Distinguish verified facts from unverified claims in the report.
5. If information is contradictory, present both perspectives.
6. Never fabricate information. If you cannot find something, say so."""


async def run_research_agent(topic: str, depth: str = "standard",
                             focus_areas: list[str] | None = None) -> dict:
    start = time.monotonic()
    state: ResearchState = {
        "topic": topic, "depth": depth, "focus_areas": focus_areas or [],
        "search_queries": [], "search_results": [], "search_rounds_completed": 0,
        "pages_read": [], "pages_failed": [], "summaries": [], "all_claims": [],
        "fact_checks": [], "report": None, "metadata": {}, "status": "starting",
        "error_log": []}
    final = await research_graph.ainvoke(state)
    final["metadata"] = {"duration_s": round(time.monotonic() - start, 1),
        "sources": len(final["search_results"]), "pages": len(final["pages_read"])}
    return final
```

Why It Matters
The Graph Is the Architecture
Node boundaries determine what can be parallelized, retried, and traced, because each node is a unit that can be re-run or inspected on its own. Using six nodes balances granularity against overhead.
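As a rough illustration of why the boundary matters, the sketch below retries each page fetch independently and runs the fetches concurrently inside one node-sized function. Everything here is hypothetical: `with_retry`, `flaky_fetch`, and the example URLs stand in for the real skill functions, and the backoff values are arbitrary.

```python
import asyncio


async def with_retry(make_call, attempts=3, base_delay=0.01):
    """Await make_call() up to `attempts` times with exponential backoff."""
    for attempt in range(attempts):
        try:
            return await make_call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            await asyncio.sleep(base_delay * 2 ** attempt)


calls = {}  # url -> number of attempts, kept only for illustration


async def flaky_fetch(url):
    """Hypothetical page reader: some URLs fail once, some always fail."""
    calls[url] = calls.get(url, 0) + 1
    if "flaky" in url and calls[url] == 1:
        raise ConnectionError(f"transient failure for {url}")
    if "dead" in url:
        raise ConnectionError(f"permanent failure for {url}")
    return f"content of {url}"


async def read_pages(urls):
    """Fetch all URLs concurrently; one failure does not cancel the others."""
    results = await asyncio.gather(
        *[with_retry(lambda u=u: flaky_fetch(u)) for u in urls],
        return_exceptions=True)
    # gather preserves input order, so results can be zipped with urls.
    pages = {u: r for u, r in zip(urls, results) if not isinstance(r, Exception)}
    failed = [u for u, r in zip(urls, results) if isinstance(r, Exception)]
    return pages, failed


pages, failed = asyncio.run(
    read_pages(["a.example", "flaky.example", "dead.example"]))
print(sorted(pages), failed)
```

Because the retry wraps a single fetch rather than the whole node, a transient failure on one URL costs one extra request instead of a full re-run.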
Conditional Routing Enables Adaptive Behavior
should_search_more makes this an agent rather than a pipeline. Enough sources? Proceed. Not enough? Loop back.
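Since the routing function is plain Python over a dict, it can be exercised without running the graph. The sketch below copies `should_search_more` so that it is self-contained; the `MAX_SEARCH_ROUNDS` constant and the sample states are illustrative assumptions.

```python
DEPTH_TO_SOURCES = {"quick": 3, "standard": 5, "deep": 8}
MAX_SEARCH_ROUNDS = 3  # named here for readability; the original inlines 3


def should_search_more(state: dict) -> str:
    """Return the routing label for the edge leaving the summarize node."""
    target = DEPTH_TO_SOURCES[state.get("depth", "standard")]
    if len(state.get("summaries", [])) >= target:
        return "enough_sources"
    if state.get("search_rounds_completed", 0) >= MAX_SEARCH_ROUNDS:
        return "enough_sources"  # cap reached: stop looping
    return "need_more"


# Three sample states: target met, target missed, and cap reached.
cases = {
    "target met": {"depth": "quick", "summaries": [{}] * 3,
                   "search_rounds_completed": 1},
    "target missed": {"depth": "standard", "summaries": [{}] * 2,
                      "search_rounds_completed": 1},
    "cap reached": {"depth": "deep", "summaries": [{}] * 2,
                    "search_rounds_completed": 3},
}
decisions = {name: should_search_more(s) for name, s in cases.items()}
for name, decision in decisions.items():
    print(f"{name}: {decision}")
```

The third case is the one worth testing: without the round cap, a topic with too few readable sources would loop indefinitely.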
Key Technical Details
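- Each node returns only the fields it modified, and LangGraph merges that partial update into shared state. Because ResearchState declares no reducers, a returned key overwrites the old value, which is why the nodes rebuild full lists before returning them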
- Conditional edges are synchronous -- no LLM calls in routing functions
- asyncio.gather in the read and summarize nodes enables a 3-5x speedup
- Graph compilation validates structure at build time; state serialization adds ~1-5ms per transition
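The merge behavior in the first bullet can be imitated in a few lines. LangGraph lets a state key carry a reducer through `Annotated[...]`; the `merge` helper and `DemoState` below are hypothetical plain-Python stand-ins that mimic that rule, not LangGraph's own code.

```python
import operator
from typing import Annotated, TypedDict, get_args, get_origin, get_type_hints


class DemoState(TypedDict):
    status: str                                    # no reducer: overwritten
    error_log: Annotated[list[str], operator.add]  # reducer: concatenated


def merge(schema, state: dict, update: dict) -> dict:
    """Apply a partial update, using a key's reducer when one is declared."""
    hints = get_type_hints(schema, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        hint = hints[key]
        if get_origin(hint) is Annotated:
            reducer = get_args(hint)[1]  # first metadata item is the reducer
            merged[key] = reducer(state[key], value)
        else:
            merged[key] = value
    return merged


before = {"status": "starting", "error_log": ["timeout: a.example"]}
update = {"status": "reading", "error_log": ["404: b.example"]}
after = merge(DemoState, before, update)
print(after)
# {'status': 'reading', 'error_log': ['timeout: a.example', '404: b.example']}
```

With a reducer on a list key, a node could return only its new entries instead of rebuilding the whole list, at the cost of a slightly less obvious state definition.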
Common Misconceptions
"The LLM should decide which node to visit next": Routing is deterministic Python, not a language task. Deterministic checks are faster, cheaper, and more predictable.
"You need to pass the full state to every node": Each node should only read what it needs and write what it modifies. Touching everything signals the state needs refactoring.
Connections to Other Concepts
- implementing-the-skill-set.md -- Skill functions called by each node
- project-overview-and-requirements.md -- Architecture diagram this implements
- running-and-iterating.md -- Testing and improving this graph
Further Reading
- Harrison Chase, "LangGraph: Build Stateful Agents" (2024) -- Official documentation and tutorials
- Anthropic, "Building Effective Agents" (2024) -- Multi-step agent architectural patterns