One-Line Summary: Checkpointers let you inspect the current state, walk through the full history, and replay execution from any previous checkpoint -- enabling time travel for debugging and recovery.
Prerequisites: checkpointers.md, thread-based-memory.md, graph-state.md
What Is State Inspection and Replay?
Imagine you have a security camera system that records every room in a building. At any moment you can check the live feed (current state), rewind the tape to see what happened at 2:47 PM (state history), or start a new recording branch from any point in the past (replay). That is exactly what LangGraph provides when you use a checkpointer.
At each step of graph execution (a super-step, in LangGraph's terminology), the checkpointer saves a snapshot of the entire graph state. These snapshots form a timeline that you can inspect, search, and branch from. This is more than logging: you can resume execution from any historical checkpoint, which makes it possible to debug issues, test alternative paths, and recover from errors by rolling back to a known good state.
State inspection and replay transform agent development from "run it and hope" to "run it, see everything that happened, and try again from any point." This is particularly valuable for complex multi-step agents where failures can occur deep in a workflow.
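The idea of a snapshot timeline that can be resumed from any point can be pictured with a small toy model. The sketch below is not how LangGraph implements checkpointing; `run_steps` and the three step functions are hypothetical names, state is a plain dict, and only the standard library is used. It records a snapshot after each step, then replays from an earlier snapshot so that only the later steps run again.

```python
import copy


def run_steps(steps, state, start=0, log=None):
    """Run steps[start:] on a copy of `state`; return the snapshot taken after each step."""
    log = [] if log is None else log
    current = copy.deepcopy(state)
    snapshots = []
    for name, step in steps[start:]:
        current = step(current)
        log.append(name)  # record which steps actually executed
        snapshots.append(copy.deepcopy(current))
    return snapshots


# Hypothetical steps standing in for graph nodes.
def load(state):
    return {**state, "rows": [3, 1, 2]}


def sort_rows(state):
    return {**state, "rows": sorted(state["rows"])}


def summarize(state):
    return {**state, "summary": f"max={state['rows'][-1]}"}


steps = [("load", load), ("sort", sort_rows), ("summarize", summarize)]

# First run: one snapshot per step.
first_log = []
timeline = run_steps(steps, {"request": "analyze"}, log=first_log)

# Replay from the snapshot taken after "load": only the later steps run again.
replay_log = []
replayed = run_steps(steps, timeline[0], start=1, log=replay_log)

print(first_log)   # ['load', 'sort', 'summarize']
print(replay_log)  # ['sort', 'summarize']
```

The deep copies matter in this toy: without them, a later step could mutate an earlier snapshot and the timeline would no longer be trustworthy as a rollback target.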
How It Works
Inspecting Current State
get_state() returns the current state of a thread along with metadata about what would execute next:
```python
# Assumes `graph` was compiled with a checkpointer
config = {"configurable": {"thread_id": "debug-session"}}

# Run the graph
graph.invoke({"messages": [("user", "Analyze this data")]}, config=config)

# Inspect current state
state_snapshot = graph.get_state(config)

# The full state values
print(state_snapshot.values["messages"])

# Which nodes would run next (empty if graph is complete)
print(state_snapshot.next)  # e.g., ("tools",) or ()

# The checkpoint config for this exact snapshot
print(state_snapshot.config)
```
Walking Through State History
get_state_history() returns every checkpoint in reverse chronological order:
```python
config = {"configurable": {"thread_id": "debug-session"}}

for snapshot in graph.get_state_history(config):
    checkpoint_id = snapshot.config["configurable"]["checkpoint_id"]
    # Number of messages accumulated in state at this checkpoint
    message_count = len(snapshot.values.get("messages", []))
    next_nodes = snapshot.next
    print(f"Checkpoint: {checkpoint_id}")
    print(f"  Messages so far: {message_count}")
    print(f"  Next to execute: {next_nodes}")
    print()
```
Replaying from a Specific Checkpoint
To resume execution from a historical checkpoint, include its checkpoint_id in the config:
```python
# Find the checkpoint you want to replay from.
# `some_condition` is a placeholder for your own predicate on a snapshot.
target_config = None
for snapshot in graph.get_state_history(config):
    if some_condition(snapshot):
        target_config = snapshot.config
        break

# Resume from that exact point -- pass None as input to continue
if target_config is not None:
    result = graph.invoke(None, config=target_config)
```
Branching with Modified State
You can also modify state before replaying, creating an alternative execution branch:
```python
# Get a historical checkpoint
for snapshot in graph.get_state_history(config):
    if snapshot.next == ("agent",):
        # update_state() writes a new checkpoint and returns its config
        branch_config = graph.update_state(
            snapshot.config,
            {"messages": [("user", "Try a different approach instead")]},
        )
        # Resume from the modified state, not from the original snapshot
        result = graph.invoke(None, config=branch_config)
        break
```
Practical Debugging Workflow
```python
def debug_thread(graph, thread_id: str):
    """Print a complete execution trace for a thread, oldest step first."""
    config = {"configurable": {"thread_id": thread_id}}
    print(f"=== Execution trace for thread: {thread_id} ===\n")
    # get_state_history() yields newest first, so reverse it for a chronological trace
    snapshots = list(graph.get_state_history(config))
    for step, snap in enumerate(reversed(snapshots)):
        msgs = snap.values.get("messages", [])
        # content may be a string or a list of content blocks, so stringify before truncating
        last_msg = str(msgs[-1].content)[:80] if msgs else "(empty)"
        print(f"Step {step}: next={snap.next} | last_msg={last_msg}")
```
Why It Matters
- Debugging complex agents -- instead of adding print statements and re-running, inspect the exact state at every step of a failed execution.
- Reproducing bugs -- share a thread_id and checkpoint_id to reproduce the exact conditions that caused an issue.
- A/B testing agent behavior -- replay from a checkpoint with different prompts, tools, or state modifications to compare outcomes.
- Error recovery -- when a node fails partway through a long workflow, roll back to the last successful checkpoint and retry.
- Audit trails -- maintain a complete record of every state transition for compliance or analysis.
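For the error-recovery case, the search for "the last successful checkpoint" can be written as a small helper. This is a minimal sketch under stated assumptions: `find_rollback_target`, `no_error_messages`, and `fake_snapshot` are hypothetical names, the health check is only an illustration, and the demo uses `SimpleNamespace` stand-ins that mimic the `values`, `next`, and `config` attributes of a snapshot rather than a real graph.

```python
from types import SimpleNamespace


def find_rollback_target(history, is_healthy):
    """Return the newest snapshot for which `is_healthy` is true, or None.

    `history` is assumed to be ordered newest first.
    """
    for snapshot in history:
        if is_healthy(snapshot):
            return snapshot
    return None


def no_error_messages(snapshot):
    """Hypothetical health check: no message text mentions an error."""
    messages = snapshot.values.get("messages", [])
    return not any("error" in str(m).lower() for m in messages)


def fake_snapshot(checkpoint_id, messages, next_nodes):
    """Build a stand-in object with the attributes the helper relies on."""
    return SimpleNamespace(
        values={"messages": messages},
        next=next_nodes,
        config={"configurable": {"thread_id": "demo", "checkpoint_id": checkpoint_id}},
    )


# Newest first, mirroring the order described for get_state_history().
history = [
    fake_snapshot("c3", ["analyze", "calling tool", "Error: timeout"], ()),
    fake_snapshot("c2", ["analyze", "calling tool"], ("tools",)),
    fake_snapshot("c1", ["analyze"], ("agent",)),
]

target = find_rollback_target(history, no_error_messages)
rollback_id = target.config["configurable"]["checkpoint_id"]
print(rollback_id)  # c2
```

With a real graph, the `history` argument would be `graph.get_state_history(config)`, and the returned snapshot's `config` could be passed to `graph.invoke(None, config=...)` to retry from that point.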
Key Technical Details
- get_state() returns a StateSnapshot with values, next, config, metadata, and parent_config.
- get_state_history() yields snapshots in reverse chronological order (newest first).
- Each snapshot's config contains a unique checkpoint_id that identifies that exact point in time.
- update_state() writes a new checkpoint with modified values, creating a branch in the history.
- Replaying from a checkpoint re-executes nodes from that point forward; it does not skip to the end.
- The parent_config field links each snapshot to its predecessor, forming a chain.
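The parent chain in the last point can be followed with a few lines of code, which is handy when a collection of snapshots contains more than one branch and you want the ancestry of a single snapshot. The sketch below is illustrative only: `lineage` and `_cfg` are hypothetical helpers, and the snapshots are `SimpleNamespace` stand-ins exposing just `config` and `parent_config`, with the first checkpoint assumed to have `parent_config` set to `None`.

```python
from types import SimpleNamespace


def _cfg(checkpoint_id):
    """Build a config dict shaped like the ones shown in this document."""
    return {"configurable": {"thread_id": "demo", "checkpoint_id": checkpoint_id}}


def lineage(snapshot, snapshots):
    """Return `snapshot` and its ancestors, newest first, by following parent_config."""
    by_id = {s.config["configurable"]["checkpoint_id"]: s for s in snapshots}
    chain = [snapshot]
    parent_config = snapshot.parent_config
    while parent_config is not None:
        parent = by_id.get(parent_config["configurable"]["checkpoint_id"])
        if parent is None:  # parent is not in the supplied collection
            break
        chain.append(parent)
        parent_config = parent.parent_config
    return chain


c1 = SimpleNamespace(config=_cfg("c1"), parent_config=None)
c2 = SimpleNamespace(config=_cfg("c2"), parent_config=_cfg("c1"))
c3 = SimpleNamespace(config=_cfg("c3"), parent_config=_cfg("c2"))
c4 = SimpleNamespace(config=_cfg("c4"), parent_config=_cfg("c2"))  # branch created from c2

lineage_ids = [
    s.config["configurable"]["checkpoint_id"] for s in lineage(c4, [c4, c3, c2, c1])
]
print(lineage_ids)  # ['c4', 'c2', 'c1']
```

Note that `c3` does not appear in the result: it sits on the original branch, not on the path that leads to `c4`.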
Common Misconceptions
- "Replay re-runs the entire graph from the start." Replay starts from the specified checkpoint. All state up to that point is restored from the saved snapshot, and only subsequent nodes are re-executed.
- "You can only inspect state after the graph finishes." You can inspect state at any time, including while the graph is paused at an interrupt point.
- "State history is only available with PostgresSaver." All checkpointer backends (MemorySaver, SqliteSaver, PostgresSaver) support full state history and replay.
- "update_state modifies the original checkpoint." It creates a new checkpoint with the modified values. The original checkpoint remains unchanged, preserving the full history.
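The last point, that an update adds a checkpoint instead of rewriting one, can be illustrated with a toy append-only store. This is a sketch of the idea and not LangGraph's implementation: `fork_with_update` and the dict-based `store` are hypothetical, and the list-append merge only mimics a message-appending reducer (how a real update is merged depends on the reducers defined in the state schema).

```python
import copy


def fork_with_update(store, checkpoint_id, update, new_id):
    """Write a new checkpoint derived from an existing one; never mutate the source."""
    source = store[checkpoint_id]
    values = copy.deepcopy(source["values"])
    for key, value in update.items():
        if isinstance(values.get(key), list):
            # Assumption: list-valued keys are merged by appending, like a message reducer
            values[key] = values[key] + list(value)
        else:
            values[key] = value
    store[new_id] = {"values": values, "parent": checkpoint_id}
    return new_id


store = {
    "c1": {"values": {"messages": ["user: analyze this data"]}, "parent": None},
}

new_id = fork_with_update(
    store,
    "c1",
    {"messages": ["user: try a different approach"]},
    new_id="c2",
)

print(store["c1"]["values"]["messages"])  # original is unchanged
print(store["c2"]["values"]["messages"])  # new checkpoint holds the merged state
print(store["c2"]["parent"])              # c1
```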
Connections to Other Concepts
- checkpointers.md -- the persistence layer that makes inspection and replay possible
- thread-based-memory.md -- threads organize the state history being inspected
- interrupt-and-resume.md -- human-in-the-loop patterns that rely on state inspection
- state-schema-design.md -- well-designed state makes inspection output meaningful
- long-term-memory-store.md -- cross-thread knowledge that complements per-thread state history