One-Line Summary: A plan string is what an LLM emits when you ask it to "plan first": a numbered list embedded in chain-of-thought. A plan graph is a structured, typed representation of the plan that the harness can inspect, verify, and replay. Graphs are considerably more reliable for non-trivial tasks, at the cost of more upfront engineering.
Prerequisites: Plan-and-execute, A* planner for agents
What Is the Distinction?
A plan string looks like:
1. Search the codebase for "auth"
2. Read the matching files
3. Write a summary

It is human-readable and easy to produce. It is also opaque to the harness: there is no way to ask "what are step 2's preconditions?" or "what does step 2 produce?" The agent has to re-parse the string mentally on every turn, which is unreliable as plans get longer.
A plan graph looks like (in pseudocode):

    plan = StateGraph()
      .add_node("search", { tool: "Grep", input: { pattern: "auth" } })
      .add_node("read", { tool: "Read", input: "{{search.matches}}" })
      .add_node("summarize", { tool: "LLM", input: "{{read.contents}}" })
      .add_edge("search", "read")
      .add_edge("read", "summarize");

It is verbose, but the harness can inspect it, validate connections, replay it, checkpoint at each node, and run it as a state machine. LangGraph is built around this idea; ruflo's ruflo-goals produces plans of this shape; the Claude Agent SDK has structured-plan adapters.
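To make "the harness can inspect it, validate connections" concrete, here is a minimal sketch in plain Python. It is not LangGraph's API: `Node`, `PlanGraph`, the `outputs` field, and `validate` are hypothetical names introduced for illustration, and the sketch assumes each node declares the fields it produces so that `{{node.field}}` references can be checked before anything runs.

```python
import re
from dataclasses import dataclass, field

# Matches template references such as {{search.matches}}.
REF_PATTERN = re.compile(r"\{\{\s*(\w+)\.(\w+)\s*\}\}")


@dataclass
class Node:
    name: str
    tool: str
    input: object        # a literal value, a dict, or a "{{node.field}}" template
    outputs: tuple = ()  # assumption: fields this node promises to produce


@dataclass
class PlanGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_node(self, node):
        self.nodes[node.name] = node
        return self

    def add_edge(self, src, dst):
        self.edges.append((src, dst))
        return self

    def validate(self):
        """Return a list of problems found without running any tool."""
        problems = []
        for src, dst in self.edges:
            for end in (src, dst):
                if end not in self.nodes:
                    problems.append(f"edge references unknown node '{end}'")
        for node in self.nodes.values():
            for ref_node, ref_field in REF_PATTERN.findall(str(node.input)):
                producer = self.nodes.get(ref_node)
                if producer is None:
                    problems.append(f"{node.name}: unknown node '{ref_node}'")
                elif ref_field not in producer.outputs:
                    problems.append(
                        f"{node.name}: '{ref_node}' does not produce '{ref_field}'"
                    )
                elif (ref_node, node.name) not in self.edges:
                    problems.append(f"{node.name}: no edge from '{ref_node}'")
        return problems


plan = (
    PlanGraph()
    .add_node(Node("search", "Grep", {"pattern": "auth"}, outputs=("matches",)))
    .add_node(Node("read", "Read", "{{search.matches}}", outputs=("contents",)))
    .add_node(Node("summarize", "LLM", "{{read.contents}}", outputs=("summary",)))
    .add_edge("search", "read")
    .add_edge("read", "summarize")
)
print(plan.validate())  # prints [] because every reference resolves
```

If a node referred to `{{search.files}}` instead, `validate` would report the mismatch before any tool ran, which is the kind of question a plan string cannot answer.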
Why Graphs Beat Strings
For trivial plans (3 steps, sequential, no branches), strings are fine. For non-trivial plans, graphs win on every axis that matters:
- Verifiability: A graph's preconditions/postconditions can be checked statically. A string's cannot.
- Replayability: A graph can be re-executed from any node. A string would have to be re-parsed.
- Branching: Conditional execution is natural in graphs and awkward in strings.
- Parallelism: Independent nodes can run in parallel. A string gives the harness no way to tell which steps are independent, so it runs them one at a time.
- Checkpointing: A graph has natural checkpoint boundaries (between nodes). A string doesn't.
- Observability: Each node's input/output can be logged. String execution leaves only the final transcript.
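Several of the properties above (replayability, checkpointing, per-node logging, parallelism) can be sketched with the standard library alone. The code below is an illustrative toy, not any framework's executor: `run_plan`, `parallel_batches`, and the lambda "tools" are hypothetical, and the graph is assumed to be acyclic and given as a mapping from each node to the nodes it depends on.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_plan(deps, actions, checkpoint=None, start_from=None):
    """Run nodes in dependency order and log what happened to each one.

    deps:       {node: set of nodes it depends on}
    actions:    {node: function(results_so_far) -> output}
    checkpoint: outputs saved by an earlier run ({node: output})
    start_from: if set, that node and everything downstream of it is re-run;
                every other node reuses its checkpointed output.
    """
    results = dict(checkpoint or {})
    log = []
    stale = {start_from} if start_from is not None else set()
    for node in TopologicalSorter(deps).static_order():
        if stale & set(deps.get(node, ())):
            stale.add(node)  # an upstream node is being re-run
        if node in results and node not in stale:
            log.append((node, "reused"))
            continue
        results[node] = actions[node](results)
        log.append((node, "ran"))
    return results, log


def parallel_batches(deps):
    """Group nodes into batches whose members do not depend on each other."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


# Toy stand-ins for real tools; the outputs are made up for illustration.
deps = {"search": set(), "read": {"search"}, "summarize": {"read"}}
actions = {
    "search": lambda r: ["auth.py", "login.py"],
    "read": lambda r: {f: f"<contents of {f}>" for f in r["search"]},
    "summarize": lambda r: f"{len(r['read'])} files mention auth",
}

results, first_log = run_plan(deps, actions)
_, replay_log = run_plan(deps, actions, checkpoint=results, start_from="read")
print(first_log)   # every node ran
print(replay_log)  # "search" is reused, "read" and "summarize" run again

fan_in = {
    "grep_auth": set(),
    "grep_session": set(),
    "summarize": {"grep_auth", "grep_session"},
}
print(parallel_batches(fan_in))  # the two grep nodes share the first batch
```

The log is the observability story in miniature: each node leaves its own record, and the checkpoint is simply the dictionary of node outputs from the earlier run.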
Why Strings Persist Anyway
Two reasons. First, strings are zero-friction: any LLM produces them. Graphs require schema, infrastructure, and discipline. Second, strings are flexible: an agent can change its plan mid-thought without re-instantiating anything. Graphs are more rigid.
For ad-hoc, short-horizon tasks, strings are still the right answer — the engineering overhead of a graph isn't justified. The threshold above which graphs win is roughly 5 steps and/or any branching/parallelism.
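As a rough illustration of that rule of thumb, the decision can be written as a small predicate. The function name and parameters are hypothetical, and the default threshold of 5 is the approximate figure from the text, not a measured value.

```python
def prefer_graph(step_count, has_branching=False, has_parallelism=False,
                 step_threshold=5):
    """Return True when a plan graph is likely worth its engineering overhead."""
    return step_count > step_threshold or has_branching or has_parallelism


print(prefer_graph(3))                      # False: short and sequential
print(prefer_graph(3, has_branching=True))  # True: any branching tips the balance
print(prefer_graph(8))                      # True: long plan
```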
Key Technical Details
- Graph schema is the engineering investment: Defining node types, input/output types, and edge semantics is the work. Once you have the schema, generating graphs from LLMs is much easier.
- Hybrid approach: Many harnesses produce a string plan first, then parse it into a graph for execution. LangGraph's create_react_agent is closer to string-based; explicit StateGraph is graph-based.
- Cycles in graphs: Plan graphs often need cycles (loops, retries). State graphs handle this; pure DAGs don't.
- State-as-node-output: Each node should produce well-typed output other nodes can read by name. Avoids string-passing antipatterns.
- LLM-as-node: Graph nodes can themselves be LLM calls. This is how planning and execution interleave.
- Tooling cost: Graph-based plans need visualization, debugging, replay tools. Without them, graphs are harder to debug than strings.
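The hybrid approach from the list above can be sketched as a small parser that turns a numbered-list plan into a linear graph. This is a simplified illustration under stated assumptions: `parse_plan_string` is a hypothetical helper, steps are assumed to be lines starting with a number followed by `.` or `)`, and every step is assumed to depend only on the one before it. A real harness would also need to infer tools, inputs, and branches.

```python
import re

# A step line looks like "2. Read the matching files" or "2) Read ...".
STEP = re.compile(r"^\s*(\d+)[.)]\s+(.*\S)\s*$")


def parse_plan_string(text):
    """Turn a numbered-list plan into (nodes, edges) for a linear graph."""
    nodes, edges = {}, []
    previous = None
    for line in text.splitlines():
        match = STEP.match(line)
        if not match:
            continue  # skip blank lines and surrounding chain-of-thought prose
        node_id = f"step_{match.group(1)}"
        nodes[node_id] = {"description": match.group(2)}
        if previous is not None:
            edges.append((previous, node_id))  # each step follows the last one
        previous = node_id
    return nodes, edges


plan_text = """Let me plan first.
1. Search the codebase for "auth"
2. Read the matching files
3. Write a summary
"""

nodes, edges = parse_plan_string(plan_text)
print(list(nodes))  # ['step_1', 'step_2', 'step_3']
print(edges)        # [('step_1', 'step_2'), ('step_2', 'step_3')]
```

The result only recovers ordering. Typed inputs and outputs still have to come from a schema, which is why the schema is described as the main engineering investment.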
How Harnesses & Frameworks Implement This
| Harness / Framework | Plan representation |
|---|---|
| Claude Code | String (chain-of-thought planning) |
| Claude Agent SDK | Either; has structured-plan adapters |
| ruflo | Both — ruflo-goals produces graph-shaped plans |
| LangGraph | Graph (its core abstraction) |
| AutoGen | String + group-chat dynamics |
| CrewAI | String + sequential/hierarchical processes |
| OpenAI Agents SDK | String + tracing (closer to graph in retrospect) |
| Codex CLI | String |
| Cursor | String |
Connections to Other Concepts
- goal-oriented-action-planning.md, a-star-planner-for-agents.md: Produce graph plans.
- plan-rollback-and-checkpointing.md: Easier with graphs.
- plan-driven-vs-reactive-harnesses.md: Graph-based plans tend toward plan-driven; strings tend toward reactive.
Further Reading
- LangGraph documentation — The strongest case for the graph approach.
- Anthropic, "Building Effective Agents" — Argues for keeping plan complexity proportional to task complexity.