Plan Graphs vs. Plan Strings

One-Line Summary: A plan string is what an LLM emits when you ask it to "plan first": a numbered list embedded in chain-of-thought. A plan graph is a structured, typed representation of the plan that the harness can inspect, verify, and replay. Graphs are dramatically more reliable for non-trivial tasks, at the cost of more upfront engineering.

Prerequisites: Plan-and-execute, A* planner for agents

What Is the Distinction?

A plan string looks like:

1. Search the codebase for "auth"
2. Read the matching files
3. Write a summary

It is human-readable and easy to produce. It is also opaque to the harness: there is no way to ask "what are step 2's preconditions?" or "what does step 2 produce?" The agent has to re-interpret the string on every turn, which becomes unreliable as plans get longer.
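To make the opacity concrete, here is a minimal Python sketch of roughly the most a harness can recover from the plan string above without calling an LLM again: a list of step texts. The regex and the name `parse_plan_string` are illustrative assumptions, not part of any framework.

```python
import re

PLAN_STRING = """\
1. Search the codebase for "auth"
2. Read the matching files
3. Write a summary
"""

# Matches numbered lines such as "2. Read the matching files".
STEP_RE = re.compile(r"^\s*(\d+)\.\s+(.*\S)\s*$")


def parse_plan_string(text: str) -> list[str]:
    """Extract step descriptions from a numbered plan string.

    Free text is all that comes back: no tool names, no typed inputs or
    outputs, and no dependencies beyond the implied top-to-bottom order.
    """
    steps = []
    for line in text.splitlines():
        match = STEP_RE.match(line)
        if match:
            steps.append(match.group(2))
    return steps


steps = parse_plan_string(PLAN_STRING)
print(steps)
# Nothing in `steps` says what step 2 consumes or produces.
```

This is also the first stage of the hybrid approach described later: the step texts still have to be mapped onto tools and typed inputs before they can be executed as a graph.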

A plan graph looks like (in pseudocode):

plan = StateGraph()
  .add_node("search", { tool: "Grep", input: { pattern: "auth" } })
  .add_node("read", { tool: "Read", input: "{{search.matches}}" })
  .add_node("summarize", { tool: "LLM", input: "{{read.contents}}" })
  .add_edge("search", "read")
  .add_edge("read", "summarize");

It is more verbose, but the harness can inspect it, validate its connections, replay it, checkpoint at each node, and run it as a state machine. LangGraph is built around this idea; ruflo's ruflo-goals produces plans of this shape; the Claude Agent SDK has structured-plan adapters.
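The pseudocode above can be made concrete with a minimal Python sketch. It assumes stub functions in place of Grep, Read, and an LLM call, resolves `{{node.field}}` references against earlier node outputs, runs nodes in topological order, and keeps every node's output so a later run can resume from that checkpoint. `PlanGraph`, `run`, and the stub tools are hypothetical names; this is not LangGraph's API, and it only handles acyclic plans.

```python
import re
from dataclasses import dataclass, field
from typing import Any

# Hypothetical stub tools standing in for Grep, Read, and an LLM call.
FAKE_FILES = ["auth/login.py", "auth/session.py", "billing/invoice.py"]


def grep(args):
    return {"matches": [p for p in FAKE_FILES if args["pattern"] in p]}


def read(paths):
    return {"contents": [f"<contents of {p}>" for p in paths]}


def llm(texts):
    return {"summary": f"Summary of {len(texts)} files"}


TOOLS = {"Grep": grep, "Read": read, "LLM": llm}
REF = re.compile(r"^\{\{(\w+)\.(\w+)\}\}$")  # a whole-string "{{node.field}}"


@dataclass
class Node:
    tool: str
    inputs: Any  # a literal, a dict of literals/refs, or a "{{node.field}}" ref


@dataclass
class PlanGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_node(self, name, tool, inputs):
        self.nodes[name] = Node(tool, inputs)
        return self  # returning self allows chaining, as in the pseudocode

    def add_edge(self, src, dst):
        self.edges.append((src, dst))
        return self

    def order(self):
        """Topological order via Kahn's algorithm; rejects cycles."""
        indegree = {n: 0 for n in self.nodes}
        for _, dst in self.edges:
            indegree[dst] += 1
        ready = [n for n, d in indegree.items() if d == 0]
        result = []
        while ready:
            n = ready.pop(0)
            result.append(n)
            for src, dst in self.edges:
                if src == n:
                    indegree[dst] -= 1
                    if indegree[dst] == 0:
                        ready.append(dst)
        if len(result) != len(self.nodes):
            raise ValueError("plan graph contains a cycle")
        return result


def resolve(value, outputs):
    """Replace "{{node.field}}" references with earlier node outputs."""
    if isinstance(value, dict):
        return {k: resolve(v, outputs) for k, v in value.items()}
    if isinstance(value, str):
        m = REF.match(value)
        if m:
            return outputs[m.group(1)][m.group(2)]
    return value


def run(plan, checkpoint=None):
    """Execute nodes in order; nodes already present in `checkpoint` are skipped."""
    outputs = dict(checkpoint or {})  # node name -> that node's output
    for name in plan.order():
        if name in outputs:
            continue  # restored from the checkpoint, not re-executed
        node = plan.nodes[name]
        outputs[name] = TOOLS[node.tool](resolve(node.inputs, outputs))
    return outputs


plan = (
    PlanGraph()
    .add_node("search", "Grep", {"pattern": "auth"})
    .add_node("read", "Read", "{{search.matches}}")
    .add_node("summarize", "LLM", "{{read.contents}}")
    .add_edge("search", "read")
    .add_edge("read", "summarize")
)
outputs = run(plan)
print(outputs["summarize"])  # {'summary': 'Summary of 2 files'}

# Replay only the last node by restoring the first two from a checkpoint.
replayed = run(plan, {k: outputs[k] for k in ("search", "read")})
```

Because every node's output is stored under its name, the `outputs` dict doubles as a checkpoint. A real harness would persist it between runs rather than keep it in memory.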

Why Graphs Beat Strings

For trivial plans (3 steps, sequential, no branches), strings are fine. For non-trivial plans, graphs win on every axis that matters:

Verifiability: A graph's preconditions/postconditions can be checked statically. A string's cannot.
Replayability: A graph can be re-executed from any node. A string plan would first have to be re-parsed and re-interpreted.
Branching: Conditional execution is natural in graphs and awkward in strings.
Parallelism: Independent nodes can run in parallel. A string plan gives the harness no way to tell which steps are independent, so they run one after another.
Checkpointing: A graph has natural checkpoint boundaries (between nodes). A string doesn't.
Observability: Each node's input/output can be logged. String execution leaves only the final transcript.
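Two of these axes, verifiability and parallelism, can be shown with a small sketch. Assuming each node declares the output fields it produces and references upstream outputs with the `{{node.field}}` syntax from the pseudocode above, a harness can reject a miswired plan before anything runs and group independent nodes into waves that could execute concurrently. The node schema and the names `validate` and `parallel_waves` are hypothetical.

```python
import re
from collections import defaultdict

REF = re.compile(r"\{\{(\w+)\.(\w+)\}\}")  # finds every "{{node.field}}"

# Hypothetical plan: two independent searches feed one summary.
NODES = {
    "search_auth": {"produces": {"matches"}, "inputs": "auth"},
    "search_login": {"produces": {"matches"}, "inputs": "login"},
    "summarize": {
        "produces": {"summary"},
        "inputs": "{{search_auth.matches}} {{search_login.matches}}",
    },
}
EDGES = [("search_auth", "summarize"), ("search_login", "summarize")]


def ancestors(node, edges):
    """All nodes that have a path to `node`."""
    parents = defaultdict(set)
    for src, dst in edges:
        parents[dst].add(src)
    seen, stack = set(), list(parents[node])
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(parents[n])
    return seen


def validate(nodes, edges):
    """Static check: each reference names an upstream node and a field it produces."""
    problems = []
    for name, spec in nodes.items():
        upstream = ancestors(name, edges)
        for ref_node, ref_field in REF.findall(spec["inputs"]):
            if ref_node not in nodes:
                problems.append(f"{name}: unknown node {ref_node!r}")
            elif ref_node not in upstream:
                problems.append(f"{name}: {ref_node!r} is not upstream")
            elif ref_field not in nodes[ref_node]["produces"]:
                problems.append(f"{name}: {ref_node!r} does not produce {ref_field!r}")
    return problems


def parallel_waves(nodes, edges):
    """Group nodes into waves; no node in a wave has a path to another in it."""
    indegree = {n: 0 for n in nodes}
    for _, dst in edges:
        indegree[dst] += 1
    wave = sorted(n for n, d in indegree.items() if d == 0)
    waves = []
    while wave:
        waves.append(wave)
        nxt = []
        for n in wave:
            for src, dst in edges:
                if src == n:
                    indegree[dst] -= 1
                    if indegree[dst] == 0:
                        nxt.append(dst)
        wave = sorted(nxt)
    return waves


print(validate(NODES, EDGES))        # []
print(parallel_waves(NODES, EDGES))  # [['search_auth', 'search_login'], ['summarize']]

# A miswired plan is caught before any tool runs.
broken = dict(NODES, summarize={"produces": {"summary"}, "inputs": "{{search_auth.contents}}"})
print(validate(broken, EDGES))  # ["summarize: 'search_auth' does not produce 'contents'"]
```

Neither check is possible on a plan string, because the string carries no declared outputs and no edges to inspect.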

Why Strings Persist Anyway

Two reasons. First, strings are zero-friction: any LLM produces them. Graphs require a schema, infrastructure, and discipline. Second, strings are flexible: an agent can change its plan mid-thought without re-instantiating anything. Graphs are more rigid.

For ad-hoc, short-horizon tasks, strings are still the right answer: the engineering overhead of a graph isn't justified. As a rough rule of thumb, graphs start to win at around 5 steps, or as soon as the plan needs any branching or parallelism.
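The rule of thumb can be written down as a tiny decision function, for example in a harness that picks a plan representation per task. The cutoff constant and the function name are hypothetical; the value 5 is the rough figure from the text, not a measured threshold.

```python
# Hypothetical cutoff; the text gives "roughly 5 steps" as a rule of thumb.
GRAPH_STEP_THRESHOLD = 5


def choose_plan_representation(num_steps: int, has_branching: bool, has_parallelism: bool) -> str:
    """Return "graph" or "string" following the rule of thumb in the text."""
    if has_branching or has_parallelism:
        return "graph"  # any branching or parallelism favors a graph
    if num_steps >= GRAPH_STEP_THRESHOLD:
        return "graph"  # long sequential plans also favor a graph
    return "string"


print(choose_plan_representation(3, False, False))  # string
print(choose_plan_representation(3, True, False))   # graph
print(choose_plan_representation(8, False, False))  # graph
```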

Key Technical Details

Graph schema is the engineering investment: Defining node types, input/output types, and edge semantics is the work. Once you have the schema, generating graphs from LLMs is much easier.
Hybrid approach: Many harnesses produce a string plan first, then parse it into a graph for execution. LangGraph's create_react_agent sits closer to the string end of the spectrum; an explicit StateGraph is graph-based.
Cycles in graphs: Plan graphs often need cycles (loops, retries). State graphs handle this; pure DAGs don't.
State-as-node-output: Each node should produce well-typed output that other nodes can read by name. This avoids string-passing antipatterns.
LLM-as-node: Graph nodes can themselves be LLM calls. This is how planning and execution interleave.
Tooling cost: Graph-based plans need visualization, debugging, and replay tools. Without them, graphs are harder to debug than strings.

How Harnesses & Frameworks Implement This

Harness / Framework	Plan representation
Claude Code	String (chain-of-thought planning)
Claude Agent SDK	Either; has structured-plan adapters
ruflo	Both — `ruflo-goals` produces graph-shaped plans
LangGraph	Graph (its core abstraction)
AutoGen	String + group-chat dynamics
CrewAI	String + sequential/hierarchical processes
OpenAI Agents SDK	String + tracing (closer to graph in retrospect)
Codex CLI	String
Cursor	String

Connections to Other Concepts

goal-oriented-action-planning.md, a-star-planner-for-agents.md — Produce graph plans.
plan-rollback-and-checkpointing.md — Easier with graphs.
plan-driven-vs-reactive-harnesses.md — Graph-based plans tend toward plan-driven; strings tend toward reactive.