One-Line Summary: Speculative planning explores multiple candidate plans in parallel — picking the best one only after partial execution — at higher token cost in exchange for lower wall-clock latency and better outcomes on hard tasks; closer to chess-engine search than to typical LLM planning.
Prerequisites: Plan-and-execute, plan graphs vs plan strings, A* planner for agents
What Is Speculative Planning?
Standard planning produces one plan and executes it; if it fails, replan. Speculative planning produces several plans (or several first steps), partially executes them in parallel, evaluates the results, and commits to the most promising branch. The remaining branches are discarded.
This is structurally similar to speculative execution in CPUs and to MCTS in chess engines: spend extra cycles on paths you might not need, in exchange for the ability to choose the best path with information you only have after partial execution.
How It Works
Three components:
- Branching: At a decision point, generate K candidate next-actions (or full plans) instead of one.
- Parallel partial execution: Execute each branch up to a cutoff (one step, several steps, until cost threshold).
- Selection: Score the results and commit to one branch. Discard the others.
The cutoff and the selection criterion are the main design choices. Loose cutoffs (run each branch to completion) approach mesh-style exploration at its full cost. Tight cutoffs (one step per branch) are cheaper but force selection on noisier signals.
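The branch/partial-execute/select loop can be sketched with `asyncio`. This is a minimal sketch in a toy domain; `propose`, `execute_step`, and `score_trace` are hypothetical stand-ins for your planner, executor, and evaluator, not a real API.

```python
import asyncio
from dataclasses import dataclass, field

@dataclass
class Branch:
    plan: list                                  # candidate action sequence
    trace: list = field(default_factory=list)   # observations from partial execution
    score: float = 0.0

def propose(goal, k):
    """Stand-in planner: generate K candidate plans for the goal."""
    return [[f"{goal}-plan{i}-step{j}" for j in range(3)] for i in range(k)]

async def execute_step(action):
    """Stand-in executor: pretend to act, return an observation."""
    await asyncio.sleep(0)                      # yield, as a real tool call would
    return f"ok:{action}"

def score_trace(trace):
    """Stand-in evaluator: count successful observations."""
    return sum(1.0 for obs in trace if obs.startswith("ok"))

async def run_to_cutoff(branch, cutoff):
    """Partially execute one branch: at most `cutoff` steps."""
    for action in branch.plan[:cutoff]:
        branch.trace.append(await execute_step(action))
    branch.score = score_trace(branch.trace)
    return branch

async def speculate(goal, k=3, cutoff=2):
    """Branch, partially execute in parallel, select, discard the rest."""
    branches = [Branch(plan=p) for p in propose(goal, k)]
    done = await asyncio.gather(*(run_to_cutoff(b, cutoff) for b in branches))
    return max(done, key=lambda b: b.score)     # commit to the best branch

winner = asyncio.run(speculate("deploy"))
```

The cutoff and K are the two knobs from above: raising `cutoff` gives the evaluator more signal per branch; raising `k` widens the search. Both multiply token spend.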
Why It Matters
For high-stakes tasks where execution latency matters more than token cost — debugging in production, time-pressured agentic CI/CD, real-time decision making — speculative planning hides latency by overlapping exploration. For low-stakes tasks, it is overkill.
The other use case: tasks where a small fraction of plans succeed but you can't tell which in advance. Speculation lets you try several without serializing them.
Key Technical Details
- Token cost is K×: Branching multiplies spend by the number of branches. Pick K small (2–4) unless cost is irrelevant.
- Branches must be independent: If branches share side effects, parallel execution corrupts state. Only speculate on read-only operations or sandbox each branch.
- Selection criteria matter: Naive "branch with the longest plan wins" is wrong. Score by progress-toward-goal, by predicted final cost, or by an LLM-as-judge.
- Discarded branches still cost real money: Don't speculate just because it sounds clever; cost-justify each instance.
- Bandit-like dynamics: Online learning on which branches tend to succeed lets the system invest more compute in promising branches and prune unpromising ones early.
- State isolation is hard: If branches read/write a database, you need isolation per branch. Easier with read-only or pure-compute branches.
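The isolation point above can be made concrete for the simplest case, where the agent's world state is an in-memory structure: give each branch a deep copy and merge back only the winner. A sketch, assuming the state is a plain dict; `apply` and the `select` callback are illustrative names, not a real API.

```python
import copy

def apply(state, action):
    """Stand-in executor: mutate the branch-local state."""
    state["log"] = state.get("log", []) + [action]
    return state

def speculate_isolated(state, candidate_actions, select):
    # One sandboxed copy per branch; the shared `state` is never touched.
    sandboxes = [copy.deepcopy(state) for _ in candidate_actions]
    results = [apply(sb, a) for sb, a in zip(sandboxes, candidate_actions)]
    winner = select(results)      # e.g. a heuristic score or LLM-as-judge
    return winner                 # commit: the winner's copy becomes the new state

base = {"log": []}
new_state = speculate_isolated(base, ["fix-a", "fix-b"], select=lambda rs: rs[0])
# `base` is unchanged; only the committed branch's effects survive
```

Real external state (a database, a filesystem) needs the equivalent of `deepcopy` at the infrastructure level: per-branch transactions, snapshots, or sandboxed containers. If that is not available, restrict speculation to read-only branches.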
How Harnesses & Frameworks Implement This
| Harness / Framework | Speculative planning |
|---|---|
| Claude Code | DIY — generally not worth the complexity for coding tasks |
| Claude Agent SDK | DIY |
| ruflo | Limited first-class — ruflo-goals supports plan branching |
| LangGraph | DIY — naturally expressible as parallel graph branches |
| AutoGen | DIY |
| CrewAI | ✗ |
| OpenAI Agents SDK | DIY |
| Codex CLI | ✗ |
| Cursor | ✗ |
Connections to Other Concepts
- goal-oriented-action-planning.md, a-star-planner-for-agents.md — A* itself is a search; speculation extends it.
- plan-graphs-vs-plan-strings.md — Graph-based plans branch naturally.
- multi-step-plan-evaluation.md — Evaluation is the selection criterion.
- harness-cost-models.md — Speculation is a major cost lever (in either direction).
Further Reading
- Russell & Norvig, AIMA — Classical search algorithms with speculation.