One-Line Summary: Rollback is the harness's ability to undo actions taken by an agent (file edits, branch creates, tool side effects) when a plan fails or is replanned; checkpointing is the snapshotting that makes rollback possible — together they are the difference between a recoverable agent and a destructive one.

Prerequisites: Adaptive replanning, plan-and-execute, agent state management

What Is Plan Rollback?

When an agent's plan fails partway, the partial work it did remains: edited files, created branches, sent emails, deleted records. Rollback is the discipline of undoing that partial work cleanly so the system returns to a known-good state. Done right, this means you can be aggressive with autopilot (run for hours unattended) because mistakes are cheap to recover from.

Done wrong — or not at all — agents leave a trail of half-finished changes that are worse than no change at all.

How It Works

Rollback requires checkpoints. Three common strategies:

  1. Filesystem-level: Use git as a checkpoint substrate. Before risky operations, the harness creates a branch or stash; on failure, it resets. LangGraph's checkpointer family applies the same snapshot-and-restore idea to graph state rather than files, and Claude Code's "git as truth" pattern is the closest match in practice.
  2. Application-level: The harness records every mutation as a (target, before, after) tuple. Rollback applies the inverses in reverse order. This works for in-memory state but is brittle for filesystem and side-effecting tools.
  3. Transactional: Some tools support transactions natively (databases, source control, certain file systems). The harness wraps tool calls in transactions and aborts on failure.
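A minimal sketch of the application-level strategy (2), assuming the agent's in-memory state is a flat Python dict. UndoLog, write, mark, and rollback are hypothetical names, not any framework's API. A checkpoint here is just a position in the log. Rolling back pops (target, before, after) records and restores each "before" value, newest first.

```python
from dataclasses import dataclass, field
from typing import Any

_MISSING = object()  # sentinel: the key did not exist before the mutation


@dataclass
class Mutation:
    target: str  # which state key was changed
    before: Any  # value before the change (_MISSING if the key was absent)
    after: Any   # value after the change (kept for auditing or replay)


@dataclass
class UndoLog:
    state: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def write(self, key: str, value: Any) -> None:
        # Record (target, before, after) first, then apply the mutation.
        # Values are stored by reference, so callers must replace values rather
        # than mutate them in place; this is one reason the approach is brittle.
        self.log.append(Mutation(key, self.state.get(key, _MISSING), value))
        self.state[key] = value

    def mark(self) -> int:
        # A checkpoint is simply the current length of the log.
        return len(self.log)

    def rollback(self, checkpoint: int = 0) -> list:
        # Apply inverses newest-first until the log is back at the checkpoint.
        undone = []
        while len(self.log) > checkpoint:
            m = self.log.pop()
            if m.before is _MISSING:
                self.state.pop(m.target, None)
            else:
                self.state[m.target] = m.before
            undone.append(m.target)
        return undone


mem = UndoLog()
mem.write("plan", "v1")
cp = mem.mark()
mem.write("plan", "v2")
mem.write("scratch", [1, 2])
print(mem.rollback(cp))  # ['scratch', 'plan']
print(mem.state)         # {'plan': 'v1'}
```

This works cleanly only because every mutation goes through write. A tool that touches a file or an external service behind the log's back leaves nothing to invert.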

The harness usually combines strategies — git for files, application-level for memory, transactional for databases.
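For the transactional part of that mix, here is a sketch using Python's built-in sqlite3 module. Its connection object works as a context manager: it commits when the block exits normally and rolls back when an exception escapes. run_in_transaction is a hypothetical harness helper. Each tool call is assumed to be a function that receives the connection, and add_user and failing_tool are illustrative stand-ins.

```python
import sqlite3


def run_in_transaction(conn: sqlite3.Connection, tool_calls) -> bool:
    """Run tool calls atomically: either all of their writes commit or none do."""
    try:
        with conn:  # commits on normal exit, rolls back if an exception escapes
            for call in tool_calls:
                call(conn)
        return True
    except Exception:
        # The with-block already rolled back; report failure so the harness can replan.
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('ada')")
conn.commit()


def add_user(c):  # hypothetical tool call that writes a row
    c.execute("INSERT INTO users (name) VALUES ('bob')")


def failing_tool(c):  # hypothetical later tool call in the same step that fails
    raise RuntimeError("tool failed")


ok = run_in_transaction(conn, [add_user, failing_tool])
rows = conn.execute("SELECT name FROM users").fetchall()
print(ok, rows)  # False [('ada',)]
```

The insert from add_user never becomes visible because the failure aborts the whole transaction. The harness does not need to know how to invert an INSERT.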

Why It Matters

Without rollback, the user has to read every action and confirm it because the cost of a mistake is permanent. With rollback, autopilot becomes safe: the agent can try aggressive plans because failed attempts are reverted. This is one of the largest UX-quality differences between mature and immature harnesses.

Rollback also reshapes the cost-of-mistake calculation. Without rollback, every wrong action costs the user attention to undo. With rollback, the cost drops to roughly the tokens spent on the wrong attempt.

Key Technical Details

  • Not everything is rollbackable: Sent emails, posted Slack messages, and deleted production rows cannot be undone by the harness. In autopilot mode the harness should refuse risky non-rollbackable actions or require explicit confirmation.
  • Checkpoint granularity matters: Per-tool-call checkpoints are the finest-grained but expensive. Per-plan-step is usually the right trade-off. Per-session is too coarse to recover from a single failed step.
  • Git as checkpoint substrate: For coding agents, git is the natural fit. Pre-action: snapshot the working tree, e.g. commit it to a wip-N branch or git stash it (a bare git branch records only the last commit, not uncommitted edits). Post-action: keep the result, or git reset --hard to the checkpoint and git clean -fd to remove files the action created.
  • State storage: Checkpoints have to live somewhere. In-memory storage works for short sessions; persistent storage (SQLite, files) is needed for resumable ones.
  • Compounding rollbacks: Rolling back action N may invalidate actions N+1 through M that built on it. The harness has to either revert the whole tail or check which later actions are still compatible with the restored state.
  • User-visible rollback: A good harness shows what was undone, not just that something was undone. "I reverted the changes to auth.py and db.py" beats "rolled back."
  • Idempotency: Tools should be idempotent when possible, meaning that applying the same action twice produces the same state as applying it once. This makes rollback and retry simpler.
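Here is a sketch of git as the checkpoint substrate. It assumes git is on PATH and that the agent works in a local repository. The helpers git, checkpoint, and rollback are hypothetical names. A bare git branch only records the current commit, so this version commits the whole working tree, untracked files included, at each checkpoint. Rollback pairs git reset --hard with git clean, because reset alone leaves newly created untracked files behind. Committing onto the current branch is a simplification. A real harness might commit to a wip-N branch instead, or squash checkpoints afterwards.

```python
import subprocess
import tempfile
from pathlib import Path


def git(repo: Path, *args: str) -> str:
    """Run a git command inside repo and return its stdout (raises on failure)."""
    result = subprocess.run(["git", *args], cwd=repo, check=True,
                            capture_output=True, text=True)
    return result.stdout.strip()


def checkpoint(repo: Path, label: str) -> str:
    """Commit the whole working tree, untracked files included, and return the SHA."""
    git(repo, "add", "-A")
    # Inline identity and no signing so the sketch runs without global git config;
    # --allow-empty guarantees a checkpoint even when nothing changed.
    git(repo, "-c", "user.name=harness", "-c", "user.email=harness@localhost",
        "-c", "commit.gpgsign=false",
        "commit", "--allow-empty", "-q", "-m", f"checkpoint: {label}")
    return git(repo, "rev-parse", "HEAD")


def rollback(repo: Path, sha: str) -> None:
    """Restore tracked files to the checkpoint and delete files created after it."""
    git(repo, "reset", "--hard", "-q", sha)
    git(repo, "clean", "-fdq")  # note: without -x, ignored files are left alone


# Demo in a throwaway repository.
repo = Path(tempfile.mkdtemp())
git(repo, "init", "-q")
(repo / "auth.py").write_text("def login(): ...\n")
sha = checkpoint(repo, "before step 3")
(repo / "auth.py").write_text("half-edited\n")  # the agent's failed step
(repo / "db.py").write_text("new file\n")
rollback(repo, sha)
print((repo / "auth.py").read_text(), (repo / "db.py").exists())
```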
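Several of the points above fit into one small runner. It takes one snapshot per plan step and refuses non-rollbackable steps in autopilot. It rolls back the whole tail, and its report names what was undone. This is a sketch under loud assumptions. The workspace is a dict standing in for files. Snapshots are full deep copies, which is cheap only for small state. PlanRunner, Step, ConfirmationRequired, and rollback_to are hypothetical names.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable

_MISSING = object()  # sentinel for "key absent" when diffing snapshots


@dataclass
class Step:
    name: str
    action: Callable[[dict], None]  # hypothetical tool call; mutates the workspace
    rollbackable: bool = True       # False for sends, posts, production deletes


class ConfirmationRequired(Exception):
    """Raised instead of running a non-rollbackable step in autopilot mode."""


@dataclass
class PlanRunner:
    workspace: dict  # stands in for files: path -> contents
    autopilot: bool = True
    checkpoints: list = field(default_factory=list)  # (step name, snapshot before it)

    def run(self, step: Step) -> None:
        if self.autopilot and not step.rollbackable:
            raise ConfirmationRequired(step.name)
        # One checkpoint per plan step, taken before the step touches anything.
        self.checkpoints.append((step.name, copy.deepcopy(self.workspace)))
        step.action(self.workspace)

    def rollback_to(self, step_name: str) -> str:
        """Undo step_name and every later step (the whole tail); describe what changed."""
        names = [name for name, _ in self.checkpoints]
        index = names.index(step_name)
        snapshot = self.checkpoints[index][1]
        changed = sorted(
            key for key in self.workspace.keys() | snapshot.keys()
            if self.workspace.get(key, _MISSING) != snapshot.get(key, _MISSING)
        )
        self.workspace.clear()
        self.workspace.update(snapshot)
        del self.checkpoints[index:]
        undone = ", ".join(reversed(names[index:]))
        return f"Undid {undone}; reverted changes to {', '.join(changed)}."


def broken_refactor(ws):
    ws["auth.py"] = "half-written"
    raise RuntimeError("tests failed")


runner = PlanRunner(workspace={"auth.py": "v1"})
runner.run(Step("edit auth", lambda ws: ws.update({"auth.py": "v2"})))
runner.run(Step("add db", lambda ws: ws.update({"db.py": "new"})))
try:
    runner.run(Step("refactor auth", broken_refactor))
except RuntimeError:
    # Replanning decides "add db" was the wrong move, so revert it and its tail.
    report = runner.rollback_to("add db")
print(report)            # Undid refactor auth, add db; reverted changes to auth.py, db.py.
print(runner.workspace)  # {'auth.py': 'v2'}
```

Restoring the snapshot taken before a step reverts that step and everything after it in one move. That is the "revert the whole tail" option. Detecting which later steps are still compatible would need per-step diffs rather than full snapshots.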

How Harnesses & Frameworks Implement This

Harness / Framework    Rollback / Checkpointing
Claude Code            Git-based for files; conversation rewind for state
Claude Agent SDK       Pluggable; bring your own
ruflo                  First-class: checkpointing across plugins
LangGraph              First-class: Checkpointer family (Memory, SQLite, Postgres)
AutoGen                DIY
CrewAI                 Limited
OpenAI Agents SDK      Tracing supports replay; rollback is DIY
Codex CLI              Git-based
Cursor                 Limited: undo for IDE actions

Connections to Other Concepts

  • adaptive-replanning.md — Replanning often requires rolling back first.
  • plan-graphs-vs-plan-strings.md — Structured plans are easier to checkpoint.
  • permission-and-tool-scoping-primitives.md — Some non-rollbackable tools should be permission-gated.
  • ../../langgraph-agents/04-memory-and-persistence/checkpointers.md — Foundational LangGraph coverage.

Further Reading

  • LangGraph, Checkpointers documentation — The strongest framework-level treatment.