Plan Rollback and Checkpointing

One-Line Summary: Rollback is the harness's ability to undo an agent's actions (file edits, branch creation, tool side effects) when a plan fails or is replanned; checkpointing is the snapshotting that makes rollback possible. Together they are the difference between a recoverable agent and a destructive one.

Prerequisites: Adaptive replanning, plan-and-execute, agent state management

What Is Plan Rollback?

When an agent's plan fails partway, the partial work it did remains: edited files, created branches, sent emails, deleted records. Rollback is the discipline of undoing that partial work cleanly so the system returns to a known-good state. Done right, this means you can be aggressive with autopilot (run for hours unattended) because mistakes are cheap to recover from.

Done wrong — or not at all — agents leave a trail of half-finished changes that are worse than no change at all.

How It Works

Rollback requires checkpoints. Three common strategies:

Filesystem-level: Use git as a checkpoint substrate. Before risky operations, the harness creates a branch or stash; on failure, it resets to that point. LangGraph's checkpointer family is similar in spirit, although it snapshots graph state rather than files; Claude Code's "git as truth" pattern is the closest in practice.
Application-level: The harness records every mutation as a (target, before, after) tuple. Rollback applies the inverses in reverse order. This works for in-memory state but is brittle for filesystem and side-effecting tools.
Transactional: Some tools support transactions natively (databases, source control, certain file systems). The harness wraps tool calls in transactions and aborts on failure.
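
The application-level strategy above can be sketched as an undo log over a plain dictionary. This is a minimal illustration, not any particular harness's implementation: the names `UndoLog`, `Mutation`, `write`, and `checkpoint` are hypothetical, and the state is assumed to hold immutable values (mutable values would need to be copied before being logged).

```python
from dataclasses import dataclass, field
from typing import Any

_MISSING = object()  # sentinel: the target did not exist before the mutation


@dataclass
class Mutation:
    target: str
    before: Any
    after: Any


@dataclass
class UndoLog:
    """Hypothetical in-memory state store that logs every write."""
    state: dict[str, Any] = field(default_factory=dict)
    log: list[Mutation] = field(default_factory=list)

    def write(self, target: str, value: Any) -> None:
        # Record (target, before, after) before applying the change.
        before = self.state.get(target, _MISSING)
        self.log.append(Mutation(target, before, value))
        self.state[target] = value

    def checkpoint(self) -> int:
        # A checkpoint is just a position in the log.
        return len(self.log)

    def rollback(self, checkpoint: int) -> list[str]:
        # Apply inverses in reverse order until the log is back at the checkpoint.
        undone: list[str] = []
        while len(self.log) > checkpoint:
            m = self.log.pop()
            if m.before is _MISSING:
                self.state.pop(m.target, None)
            else:
                self.state[m.target] = m.before
            undone.append(m.target)
        return undone
```

Taking a checkpoint at the start of each plan step and calling `rollback` with that value reverts the step and everything after it; the returned list of targets can be shown to the user.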

The harness usually combines strategies — git for files, application-level for memory, transactional for databases.
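
For the transactional strategy, a rough sketch with Python's standard `sqlite3` module is shown here. The helper name `run_step` is hypothetical, and the example assumes each plan step is a list of SQL data-modification statements that should apply all together or not at all.

```python
import sqlite3


def run_step(conn: sqlite3.Connection, statements: list[str]) -> bool:
    """Apply every statement atomically; return False if the step was aborted."""
    try:
        # Used as a context manager, the connection commits when the block
        # succeeds and rolls back when it raises.
        with conn:
            for sql in statements:
                conn.execute(sql)
        return True
    except sqlite3.Error:
        # The partial work of this step has already been rolled back.
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

# The second insert violates the primary key, so the first is undone as well.
failed = run_step(conn, [
    "INSERT INTO users VALUES (1, 'ada')",
    "INSERT INTO users VALUES (1, 'duplicate')",
])
```

Note that in `sqlite3`'s default transaction mode only data-modification statements (INSERT, UPDATE, DELETE) open the implicit transaction, so schema changes would need explicit transaction handling.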

Why It Matters

Without rollback, the user has to read every action and confirm it because the cost of a mistake is permanent. With rollback, autopilot becomes safe: the agent can try aggressive plans because failed attempts are reverted. This is one of the largest UX-quality differences between mature and immature harnesses.

Rollback also reshapes the cost-of-mistake calculation. Without rollback, every wrong action costs the user attention to undo. With rollback, the cost is roughly the tokens and time spent on the failed attempt.

Key Technical Details

Not everything is rollbackable: Sent emails, posted Slack messages, deleted production rows. The harness should refuse risky non-rollbackable actions in autopilot mode (or require explicit confirmation).
Checkpoint granularity matters: Per-tool-call is the finest granularity but expensive. Per-plan-step is usually the right trade-off. Per-session is too coarse, because the only state it can restore is the very beginning.
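Git as checkpoint substrate: For coding agents, git is the natural fit. Pre-action: git stash or git branch wip-N (a branch captures only committed work, so commit first). Post-action: keep the result, or run git reset --hard followed by git clean -fd, because reset alone leaves newly created untracked files in place.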
State storage: Checkpoints have to live somewhere. In-memory works for short sessions; persistent storage (sqlite, file) works for resumable ones.
Compounding rollbacks: Rolling back action N may invalidate actions N+1..M. The harness has to either revert the whole tail or verify that the later actions do not depend on N.
User-visible rollback: A good harness shows what was undone, not just that something was undone. "I reverted the changes to auth.py and db.py" beats "rolled back."
Idempotency: Tools should be idempotent when possible, meaning that applying the same action twice produces the same state. This makes rollback simpler, because an action retried after a rollback cannot double-apply its effect.
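
The git-based checkpointing and user-visible rollback described in the list above might look roughly like the following. It is a sketch under stated assumptions rather than how any listed harness works: the helpers `git`, `checkpoint`, and `rollback` are hypothetical, the `git` executable is assumed to be on the PATH, and checkpoints are committed directly onto the current branch (a real harness would more likely use a scratch branch or a private ref).

```python
import subprocess
from pathlib import Path


def git(repo: Path, *args: str) -> str:
    """Run a git command inside `repo` and return its stdout; raises on failure."""
    result = subprocess.run(
        ["git", *args], cwd=repo, check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def checkpoint(repo: Path, label: str) -> str:
    """Commit the whole working tree and return the checkpoint's commit hash."""
    git(repo, "add", "-A")  # include untracked files in the snapshot
    git(
        repo,
        "-c", "user.name=harness",                # placeholder identity
        "-c", "user.email=harness@example.invalid",
        "-c", "commit.gpgsign=false",
        "commit", "--allow-empty", "-m", f"checkpoint: {label}",
    )
    return git(repo, "rev-parse", "HEAD")


def rollback(repo: Path, commit: str) -> list[str]:
    """Restore the tree to `commit` and return the paths that were reverted."""
    git(repo, "add", "-A")  # stage everything so new files appear in the diff
    changed = git(repo, "diff", "--cached", "--name-only", commit).splitlines()
    git(repo, "reset", "--hard", commit)  # restore tracked and staged files
    git(repo, "clean", "-fd")             # remove anything still untracked
    return changed
```

A harness could call `checkpoint(repo, "before step 3")` ahead of each plan step and, on failure, report the list returned by `rollback` so the user sees exactly which files were reverted.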

How Harnesses & Frameworks Implement This

Harness / Framework	Rollback / Checkpointing
Claude Code	Built-in file-edit checkpoints with rewind for code and conversation; git for durable history
Claude Agent SDK	Pluggable — bring your own
ruflo	First-class — checkpointing across plugins
LangGraph	First-class — `Checkpointer` family (Memory, SQLite, Postgres)
AutoGen	DIY
CrewAI	Limited
OpenAI Agents SDK	Tracing supports replay; rollback DIY
Codex CLI	Git-based
Cursor	Limited — undo for IDE actions

Connections to Other Concepts

adaptive-replanning.md — Replanning often requires rolling back first.
plan-graphs-vs-plan-strings.md — Structured plans are easier to checkpoint.
permission-and-tool-scoping-primitives.md — Some non-rollbackable tools should be permission-gated.
../../langgraph-agents/04-memory-and-persistence/checkpointers.md — Foundational LangGraph coverage.