One-Line Summary: Rollback is the harness's ability to undo actions taken by an agent (file edits, branch creation, tool side effects) when a plan fails or is replanned; checkpointing is the snapshotting that makes rollback possible. Together they are the difference between a recoverable agent and a destructive one.
Prerequisites: Adaptive replanning, plan-and-execute, agent state management
What Is Plan Rollback?
When an agent's plan fails partway, the partial work it did remains: edited files, created branches, sent emails, deleted records. Rollback is the discipline of undoing that partial work cleanly so the system returns to a known-good state. Done right, this means you can be aggressive with autopilot (run for hours unattended) because mistakes are cheap to recover from.
When rollback is done wrong, or not done at all, agents leave a trail of half-finished changes that is worse than no change at all.
How It Works
Rollback requires checkpoints. Three common strategies:
- Filesystem-level: Use git as a checkpoint substrate. Before risky operations, the harness creates a branch or stash; on failure, it resets. LangGraph's checkpointer family is closest to this in spirit (state-based), and Claude Code's "git as truth" pattern is closest in practice.
- Application-level: The harness records every mutation as a (target, before, after) tuple. Rollback applies the inverses in reverse order. This works for in-memory state but is brittle for filesystem and side-effecting tools.
- Transactional: Some tools support transactions natively (databases, source control, certain file systems). The harness wraps tool calls in transactions and aborts on failure.
The harness usually combines strategies — git for files, application-level for memory, transactional for databases.
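The application-level strategy above can be sketched as a small undo log. This is a minimal illustration under stated assumptions, not any harness's actual implementation: `UndoLog`, `checkpoint`, and `rollback` are hypothetical names, and the "target" is simply a key in an in-memory dict.

```python
from dataclasses import dataclass, field
from typing import Any

_MISSING = object()  # sentinel: the key did not exist at that point


@dataclass
class UndoLog:
    """Hypothetical undo log: records (target, before, after) per mutation."""
    state: dict[str, Any]
    entries: list[tuple[str, Any, Any]] = field(default_factory=list)

    def set(self, key: str, value: Any) -> None:
        before = self.state.get(key, _MISSING)
        self.entries.append((key, before, value))
        self.state[key] = value

    def delete(self, key: str) -> None:
        before = self.state.pop(key)  # raises KeyError if the key is absent
        self.entries.append((key, before, _MISSING))

    def checkpoint(self) -> int:
        """A checkpoint is just the current length of the log."""
        return len(self.entries)

    def rollback(self, to: int = 0) -> list[str]:
        """Apply inverses in reverse order until the log is back at `to`."""
        undone = []
        while len(self.entries) > to:
            key, before, _after = self.entries.pop()
            if before is _MISSING:
                self.state.pop(key, None)   # inverse of a create
            else:
                self.state[key] = before    # inverse of an update or delete
            undone.append(key)
        return undone


state = {"branch": "main"}
log = UndoLog(state)
mark = log.checkpoint()
log.set("branch", "wip-1")
log.set("draft", "hello")
log.delete("branch")
undone = log.rollback(mark)  # state is {"branch": "main"} again
```

Values are stored by reference, so a value mutated in place after being logged would not be restored correctly. That is one reason this approach suits simple in-memory state better than files or side-effecting tools.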
Why It Matters
Without rollback, the user has to read every action and confirm it because the cost of a mistake is permanent. With rollback, autopilot becomes safe: the agent can try aggressive plans because failed attempts are reverted. This is one of the largest UX-quality differences between mature and immature harnesses.
Rollback also reshapes the cost-of-mistake calculation. Without rollback, every wrong action costs the user attention to undo. With rollback, the cost is approximately just the tokens spent on the wrong attempt.
Key Technical Details
- Not everything is rollbackable: Sent emails, posted Slack messages, and deleted production rows cannot be undone. The harness should refuse risky non-rollbackable actions in autopilot mode (or require explicit confirmation).
- Checkpoint granularity matters: Per-tool-call gives the finest control but is expensive. Per-plan-step is usually right. Per-session is too coarse.
- Git as checkpoint substrate: For coding agents, git is the natural fit. Pre-action: `git stash` or `git branch wip-N`. Post-action: keep or `git reset --hard`.
- State storage: Checkpoints have to live somewhere. In-memory works for short sessions; persistent storage (sqlite, file) works for resumable ones.
- Compounding rollbacks: Rolling back action N may invalidate actions N+1..M. The harness has to either revert the whole tail or detect compatibility.
- User-visible rollback: A good harness shows what was undone, not just that something was undone. "I reverted the changes to `auth.py` and `db.py`" beats "rolled back."
- Idempotency: Tools should be idempotent when possible: applying the same action twice produces the same state. This makes rollback simpler.
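Several of the points above (per-plan-step checkpoints, refusing non-rollbackable actions in autopilot, reverting the whole tail, and reporting what was undone) can be combined in one small sketch. Everything here is an assumption made for illustration: `Step`, `StepCheckpointer`, and `NonRollbackableError` are hypothetical names, and the "files" are entries in an in-memory dict, not a real working tree.

```python
import copy
from dataclasses import dataclass
from typing import Any, Callable

State = dict[str, Any]


@dataclass
class Step:
    name: str
    run: Callable[[State], None]  # mutates the state in place
    reversible: bool = True       # False for emails, Slack posts, etc.


class NonRollbackableError(RuntimeError):
    """Raised when autopilot is asked to run a step that cannot be undone."""


class StepCheckpointer:
    def __init__(self, state: State, autopilot: bool = True) -> None:
        self.state = state
        self.autopilot = autopilot
        # One snapshot per plan step: (step name, state before the step).
        self._snapshots: list[tuple[str, State]] = []

    def run(self, step: Step) -> None:
        if self.autopilot and not step.reversible:
            raise NonRollbackableError(
                f"{step.name} cannot be undone; confirmation required")
        self._snapshots.append((step.name, copy.deepcopy(self.state)))
        try:
            step.run(self.state)
        except Exception:
            # Undo the failed step's partial work, then re-raise.
            self.rollback_to(len(self._snapshots) - 1)
            raise

    def rollback_to(self, index: int) -> str:
        """Restore the state from before step `index`, dropping the whole tail."""
        names = [name for name, _ in self._snapshots[index:]]
        _, before = self._snapshots[index]
        self.state.clear()
        self.state.update(before)
        del self._snapshots[index:]
        return "Reverted steps: " + ", ".join(names)


state = {"auth.py": "v1", "db.py": "v1"}
cp = StepCheckpointer(state)
cp.run(Step("edit auth.py", lambda s: s.update({"auth.py": "v2"})))
cp.run(Step("edit db.py", lambda s: s.update({"db.py": "v2"})))
report = cp.rollback_to(0)  # reverts both steps, not just the first

try:
    cp.run(Step("send email", lambda s: None, reversible=False))
    refused = False
except NonRollbackableError:
    refused = True
```

Reverting the whole tail is the simpler of the two options for compounding rollbacks; detecting which later steps remain compatible would need dependency information that this sketch does not track. Deep-copying the full state per step is also only practical for small states.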
How Harnesses & Frameworks Implement This
| Harness / Framework | Rollback / Checkpointing |
|---|---|
| Claude Code | Git-based for files; conversation rewind for state |
| Claude Agent SDK | Pluggable — bring your own |
| ruflo | First-class — checkpointing across plugins |
| LangGraph | First-class — Checkpointer family (Memory, SQLite, Postgres) |
| AutoGen | DIY |
| CrewAI | Limited |
| OpenAI Agents SDK | Tracing supports replay; rollback DIY |
| Codex CLI | Git-based |
| Cursor | Limited — undo for IDE actions |
Connections to Other Concepts
- `adaptive-replanning.md`: Replanning often requires rolling back first.
- `plan-graphs-vs-plan-strings.md`: Structured plans are easier to checkpoint.
- `permission-and-tool-scoping-primitives.md`: Some non-rollbackable tools should be permission-gated.
- `../../langgraph-agents/04-memory-and-persistence/checkpointers.md`: Foundational LangGraph coverage.
Further Reading
- LangGraph, Checkpointers documentation — The strongest framework-level treatment.