One-Line Summary: Autopilot modes let the harness run an agent without per-action user confirmation, bounded by a budget (tokens, time, steps), gated by permission scopes, monitored by background workers, and ended by an explicit checkpoint where the user reviews the work; they are the UX surface that makes long-horizon agentic work practical.

Prerequisites: Permission and tool scoping primitives, hooks and lifecycle events, plan rollback and checkpointing

What Are Autopilot Modes?

A non-autopilot session: the user types, the agent acts, the user confirms, the user types again. An autopilot session: the user states a goal, the agent acts (often for many turns), the agent or harness decides when to stop, and the user reviews the result. The model is roughly the same in both cases; what differs is when the user is involved.

Different harnesses expose autopilot at different granularities:

  • Codex CLI: manual / on-failure / auto are explicit modes selectable per session.
  • Cursor: "agent mode" expands the per-turn budget — closer to a long autopilot turn than to a multi-hour session.
  • Claude Code: approval is configured per tool; whole-session autopilot is achieved by allowlisting the tools the agent needs.
  • ruflo: autopilot is a first-class operating mode; the ruflo-autopilot plugin handles budget and safety gating.

How They Work

Autopilot relies on three subsystems:

  1. Budget: Tokens, dollars, wall-clock time, or tool-call count. The harness enforces the budget; when it runs out, the session pauses and surfaces what was done.
  2. Permission gating: Permission rules still apply in autopilot. Destructive operations are either denied automatically or routed for approval to a trusted automation identity instead of the user.
  3. Checkpointing: At natural breakpoints (plan complete, file group finished), the harness saves state so the user can review and approve before the next phase.
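The three subsystems can be sketched as a single loop. This is a minimal illustration that assumes a hypothetical agent that proposes one `Action` at a time, or `None` when it considers the goal complete. `Action`, `Budget`, `RunResult`, and `run_autopilot` are illustrative names and not the API of any harness in this article. The loop behaves as follows:

- It pauses softly when the budget is spent.
- It skips actions the permission gate rejects.
- It hands control back to the user after every `checkpoint_every` completed actions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical types for illustration; not any specific harness's API.


@dataclass
class Action:
    tool: str
    args: str
    tokens: int  # model tokens spent proposing this action


@dataclass
class Budget:
    max_steps: int
    max_tokens: int
    steps_used: int = 0
    tokens_used: int = 0

    def exhausted(self) -> bool:
        return self.steps_used >= self.max_steps or self.tokens_used >= self.max_tokens

    def charge(self, action: Action) -> None:
        self.steps_used += 1
        self.tokens_used += action.tokens


@dataclass
class RunResult:
    status: str = "done"  # "done", "paused_budget", or "paused_checkpoint"
    completed: List[Action] = field(default_factory=list)
    denied: List[Action] = field(default_factory=list)


def run_autopilot(
    next_action: Callable[[], Optional[Action]],
    execute: Callable[[Action], None],
    is_allowed: Callable[[Action], bool],
    budget: Budget,
    checkpoint_every: int,
) -> RunResult:
    result = RunResult()
    while True:
        # 1. Budget: pause (not abort) once the budget is spent.
        if budget.exhausted():
            result.status = "paused_budget"
            return result
        action = next_action()
        if action is None:  # the agent reports the goal is complete
            return result
        budget.charge(action)
        # 2. Permission gating still applies in autopilot.
        if not is_allowed(action):
            result.denied.append(action)
            continue
        execute(action)
        result.completed.append(action)
        # 3. Checkpoint: hand control back to the user for review.
        if len(result.completed) % checkpoint_every == 0:
            result.status = "paused_checkpoint"
            return result


# Usage: a scripted "agent" whose shell action is blocked by the gate.
plan = iter([
    Action("edit", "src/a.py", 300),
    Action("shell", "rm -rf build/", 50),
    Action("edit", "src/b.py", 400),
    Action("test", "pytest", 200),
])
budget = Budget(max_steps=10, max_tokens=5000)
executed: List[str] = []


def step() -> RunResult:
    return run_autopilot(
        next_action=lambda: next(plan, None),
        execute=lambda a: executed.append(a.args),
        is_allowed=lambda a: a.tool != "shell",
        budget=budget,
        checkpoint_every=2,
    )


first = step()
print(first.status, [a.args for a in first.denied])  # paused_checkpoint ['rm -rf build/']
second = step()  # the user approved the checkpoint; resume
print(second.status, executed)  # done ['src/a.py', 'src/b.py', 'pytest']
```

Calling `run_autopilot` again with the same `plan` and `budget` resumes where the checkpoint left off. In this sketch, denied actions still count against the budget because the model spent tokens proposing them. A harness could reasonably make the opposite choice.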

Without these, autopilot devolves into "the agent ran for an hour and did unspeakable things to my code" — exactly the failure mode that gives autopilot a bad reputation.

Why It Matters

For long-horizon work — overnight refactors, multi-day research, batch analysis of thousands of items — autopilot is the difference between "the agent helped" and "I would have done it faster myself." But autopilot without budgets, permissions, and checkpoints is dangerous; it produces the agent-equivalent of a runaway process.

A useful framing: autopilot is a configurable trade-off between user attention and autonomous capability. Tighter budgets and more frequent checkpoints demand more of the user's attention and let the agent accomplish less between reviews. Looser budgets demand less attention and let the agent do more, provided the harness's safety machinery is good enough.
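The following is a rough illustration of that trade-off. It assumes that a task takes a fixed number of agent steps. It also assumes the user reviews once each time either the step budget or the checkpoint interval ends an unattended stretch. The `AutopilotProfile` class and the two presets are hypothetical; real harnesses expose different knobs under different names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AutopilotProfile:
    """Hypothetical autopilot knobs; real harnesses name and expose these differently."""
    name: str
    max_steps: int         # budget: steps before a soft pause for review
    checkpoint_every: int  # steps between scheduled reviews

    def steps_between_reviews(self) -> int:
        # Whichever limit trips first ends the unattended stretch.
        return min(self.max_steps, self.checkpoint_every)

    def reviews_needed(self, task_steps: int) -> int:
        # Each unattended stretch ends in one user review (the last reviews the final result).
        return math.ceil(task_steps / self.steps_between_reviews())


SUPERVISED = AutopilotProfile("supervised", max_steps=20, checkpoint_every=5)
OVERNIGHT = AutopilotProfile("overnight", max_steps=200, checkpoint_every=50)

for profile in (SUPERVISED, OVERNIGHT):
    print(profile.name, profile.reviews_needed(120))
# supervised 24
# overnight 3
```

Under these assumptions, the overnight profile asks for eight times fewer reviews on the same task. That reduction is exactly the attention it gives up. Whether the trade is safe depends on the permission gating and checkpointing behind it.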

Key Technical Details

  • Budget exhaustion should be soft, not hard: When the budget runs out, pause and ask rather than abort. Hard aborts lose work.
  • Per-tool budgets vs. global budgets: A global budget is simpler. Per-tool budgets let you cap network calls separately from local edits.
  • Trust escalation: A long-running autopilot session that is behaving well can earn larger budgets between check-ins ("you can run more before I check in"). This is similar to behavioral-trust scoring, but scoped to a single session.
  • Pause-and-resume should preserve everything: Autopilot pause should snapshot enough state that the user can review, edit the plan, and resume.
  • Notification design: Autopilot needs to notify the user at appropriate moments — checkpoint reached, important decision needs approval, error triggered. Too few notifications → user feels out of the loop; too many → notification fatigue.
  • Default-deny destructive ops: rm -rf, git push --force, and DROP TABLE should never be auto-allowed, even in autopilot. The time saved by skipping confirmation is not worth the disasters.
  • Audit trail is non-negotiable: Every action taken in autopilot must be logged. Without that log, debugging is impossible.
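Several of these details can be combined in one pre-execution gate: soft budget exhaustion, a per-tool cap alongside a global cap, default-deny for destructive operations, and an audit trail. The sketch below is illustrative only. `AutopilotGate`, the `allow`/`pause`/`deny` decisions, and the regex deny list are assumptions, not any harness's permission-rule syntax. A short regex list is also far from a complete defense against destructive commands.

```python
import json
import re
from collections import Counter
from typing import Dict, List

# Hypothetical deny list; a real harness would use its own permission-rule syntax.
DESTRUCTIVE_PATTERNS = [
    re.compile(r"\brm\s+-(rf|fr)\b"),
    re.compile(r"\bgit\s+push\b.*\s(--force|-f)\b"),
    re.compile(r"\bdrop\s+table\b", re.IGNORECASE),
]


class AutopilotGate:
    """Decides allow / pause / deny for each proposed tool call and logs every decision."""

    def __init__(self, per_tool_limits: Dict[str, int], global_limit: int):
        self.per_tool_limits = per_tool_limits
        self.global_limit = global_limit
        self.calls: Counter = Counter()
        self.audit: List[str] = []  # JSON lines; a real harness would persist these

    def check(self, tool: str, command: str) -> str:
        tool_limit = self.per_tool_limits.get(tool, self.global_limit)
        if any(p.search(command) for p in DESTRUCTIVE_PATTERNS):
            decision = "deny"  # never auto-allowed, even in autopilot
        elif sum(self.calls.values()) >= self.global_limit or self.calls[tool] >= tool_limit:
            decision = "pause"  # soft exhaustion: stop and ask, keep all state
        else:
            decision = "allow"
            self.calls[tool] += 1
        self.audit.append(json.dumps({"tool": tool, "command": command, "decision": decision}))
        return decision


gate = AutopilotGate(per_tool_limits={"network": 1}, global_limit=10)
print(gate.check("shell", "pytest -q"))                   # allow
print(gate.check("network", "curl https://example.com"))  # allow
print(gate.check("network", "curl https://example.org"))  # pause: network cap reached
print(gate.check("shell", "rm -rf build/"))               # deny
print(gate.check("sql", "DROP TABLE users;"))             # deny
print(len(gate.audit))                                    # 5
```

A `pause` leaves the counters and audit log intact, so the user can raise a limit and resume. A `deny` is returned regardless of how much budget remains.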

How Harnesses & Frameworks Implement This

Harness / Framework    Autopilot mode
Claude Code            Per-tool approval modes; whole-session autopilot via allowlist
Claude Agent SDK       Configurable; the default loop is autopilot-ready
ruflo                  First-class, via the ruflo-autopilot plugin
LangGraph              DIY; graphs are autopilot-shaped
AutoGen                Limited; controlled via human_input_mode
CrewAI                 Yes; runs in autopilot by default
OpenAI Agents SDK      Yes; Runner.run() is autopilot-shaped
Codex CLI              Native: auto mode
Cursor                 Native: agent mode

Connections to Other Concepts

  • background-worker-pattern.md — Workers monitor autopilot in flight.
  • permission-and-tool-scoping-primitives.md — The safety substrate.
  • plan-rollback-and-checkpointing.md — What lets autopilot fail gracefully.
  • continuous-execution-loops.md — The runtime model.

Further Reading

  • ruvnet, ruflo-autopilot documentation.
  • Anthropic, Claude Code permissions — gating in autopilot scenarios.