ReasoningBank

One-Line Summary: ReasoningBank is ruflo's named pattern for storing whole trajectories — the sequence of (state, decision, outcome) tuples an agent produced — as memory the system can replay or learn from; it is a specialized vector store optimized for trajectory-shaped data and a key driver of ruflo's claimed self-learning behavior.

Prerequisites: AgentDB and vector stores in harnesses, trajectory learning, harness-owned memory

What Is the ReasoningBank?

A regular vector store keeps point-shaped memories: "this fact, this snippet, this decision." A trajectory-shaped store keeps sequences: "the agent saw X, decided Y, observed Z; then saw W, decided V; the whole rollout succeeded/failed." The difference matters because what an agent should do next depends on which decisions led to good outcomes, not on isolated facts.

The ReasoningBank, as implemented in ruflo, indexes trajectories by their initial state and outcome label. When the agent is in a similar state again, it can retrieve relevant past trajectories: "the last three times I was in this state, here is what worked and what didn't." That retrieval is fed into the agent's context so it can either repeat what worked or avoid what didn't.
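To make the shape of a trajectory-shaped memory concrete, here is a minimal sketch of one possible data model. The names `Step`, `Trajectory`, `Outcome`, and `index_key` are hypothetical and chosen for illustration; they are not taken from ruflo's code.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    PARTIAL = "partially succeeded"


@dataclass(frozen=True)
class Step:
    state: str     # what the agent saw
    decision: str  # what it decided to do
    observed: str  # what it observed afterwards


@dataclass
class Trajectory:
    steps: list[Step]  # ordered; the sequence is the unit of memory
    outcome: Outcome   # label for the whole rollout, not for one step

    @property
    def initial_state(self) -> str:
        return self.steps[0].state

    @property
    def index_key(self) -> tuple[str, str]:
        # Assumption: the bank indexes by (initial state, outcome label),
        # as the text describes. A real store would embed the state.
        return (self.initial_state, self.outcome.value)


rollout = Trajectory(
    steps=[Step("X", "Y", "Z"), Step("W", "V", "done")],
    outcome=Outcome.SUCCEEDED,
)
print(rollout.index_key)
```

A point-shaped store would keep each `Step` as an unrelated record; here the outcome label is attached to the whole sequence.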

How It Works

Three pipeline stages:

Logging: Every meaningful agent action is recorded with its full context — pre-state, action, post-state, eventual outcome. Recording is automatic via hooks.
Indexing: At session end, the trajectory is post-processed: outcome label assigned (succeeded / failed / partially succeeded), initial state embedded, key decisions tagged, then written to the bank.
Retrieval: At decision points the agent (or a meta-agent) queries the bank by current-state embedding. Returned trajectories — with their outcomes — are summarized into the prompt.
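The three stages above can be sketched end to end in a few dozen lines. This is an illustrative toy under stated assumptions, not ruflo's implementation: `Bank`, `Record`, `embed`, `cosine`, and `summarize` are hypothetical names, and the bag-of-words `embed` stands in for whatever learned embedding model a real system would use.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy bag-of-words embedding (assumption: stands in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class Record:
    initial_state: str
    decisions: list[str]
    outcome: str  # "succeeded", "failed", or "partially succeeded"
    embedding: Counter


@dataclass
class Bank:
    records: list[Record] = field(default_factory=list)
    pending: list[tuple[str, str, str]] = field(default_factory=list)

    def log(self, pre_state: str, action: str, post_state: str) -> None:
        """Logging: called from a hook after each meaningful action."""
        self.pending.append((pre_state, action, post_state))

    def index(self, outcome: str) -> None:
        """Indexing: at session end, label, embed, and write to the bank."""
        if not self.pending:
            return
        initial = self.pending[0][0]
        decisions = [action for _, action, _ in self.pending]
        self.records.append(Record(initial, decisions, outcome, embed(initial)))
        self.pending = []

    def retrieve(self, current_state: str, k: int = 3) -> list[Record]:
        """Retrieval: rank stored trajectories by similarity to the current state."""
        query = embed(current_state)
        ranked = sorted(self.records, key=lambda r: cosine(query, r.embedding), reverse=True)
        return ranked[:k]


def summarize(records: list[Record]) -> str:
    """Compact text for the prompt: outcome, starting state, key decisions."""
    return "\n".join(
        f"[{r.outcome}] {r.initial_state}: {' -> '.join(r.decisions)}" for r in records
    )


bank = Bank()
bank.log("tests fail with import error in auth module", "pin dependency version", "tests pass")
bank.index("succeeded")
bank.log("build is slow on CI", "add cache", "still slow")
bank.index("failed")
print(summarize(bank.retrieve("import error in auth tests", k=1)))
```

Failed trajectories are stored alongside successful ones, so the summary handed to the agent can carry both "what worked" and "what didn't."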

Variants of this pattern have been published as "case-based reasoning," "experience replay," and (in 2025–2026 papers) "trajectory memory."

Why It Matters

ReasoningBank is what makes "the system gets better at our codebase" go from a marketing line to a measurable property. Without it, what the agent can do is bounded by whatever information is in its context at that moment. With it, the agent has organizational memory across sessions and even across users.

The cost is engineering: trajectory indexing is more involved than point-vector indexing, retrieval is more expensive (returned trajectories are long), and stale trajectories rot worse than stale facts (a fix that worked six months ago may no longer apply).

Key Technical Details

Trajectory size matters: Whole trajectories can be thousands of tokens. Stored summaries (key-decisions extract) are cheaper to retrieve than full transcripts.
Outcome labeling is the hard part: Did the trajectory succeed? Often unclear. Labels can come from explicit user signals, tests passing, downstream feedback, or LLM-as-judge.
Negative trajectories are valuable: "The agent tried X and it failed" is as useful for future sessions as "X worked."
Retrieval-time summarization: Returning a 5000-token trajectory verbatim is wasteful. Summarize on retrieval to a compact "lessons learned."
Privacy considerations: Trajectories can include sensitive context. Per-user, per-project, or per-team scoping is essential.
Aging: Old trajectories should be discounted or pruned as the codebase / context evolves.
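Two of the points above, scoping and aging, can be combined into a single ranking step at retrieval time. The sketch below is one possible approach, assuming an exponential age discount; the names `Hit`, `age_weight`, and `rank`, and the 90-day half-life, are hypothetical choices for illustration and do not come from ruflo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    summary: str       # stored key-decisions extract, not the full transcript
    scope: str         # e.g. a per-user, per-project, or per-team identifier
    similarity: float  # similarity to the current state, in [0, 1]
    age_days: float


def age_weight(age_days: float, half_life_days: float = 90.0) -> float:
    """Exponential discount: the weight halves every `half_life_days`."""
    return 0.5 ** (age_days / half_life_days)


def rank(hits: list[Hit], scope: str, k: int = 3, half_life_days: float = 90.0) -> list[Hit]:
    # Scoping: trajectories from other scopes are never considered.
    visible = [h for h in hits if h.scope == scope]
    # Aging: similarity is discounted by age before ranking.
    return sorted(
        visible,
        key=lambda h: h.similarity * age_weight(h.age_days, half_life_days),
        reverse=True,
    )[:k]


hits = [
    Hit("old fix", "proj-a", 0.9, 360),
    Hit("recent fix", "proj-a", 0.6, 10),
    Hit("other team", "proj-b", 0.99, 1),
]
for h in rank(hits, scope="proj-a"):
    print(h.summary)
```

With these numbers the recent, less similar trajectory outranks the year-old one, and the `proj-b` trajectory is excluded regardless of its similarity. Pruning could be implemented as a hard cutoff on the same weight.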

How Harnesses & Frameworks Implement This

Harness / Framework	Trajectory memory
Claude Code	None natively
Claude Agent SDK	DIY
ruflo	First-class — `ReasoningBank` as named subsystem
LangGraph	DIY — checkpointers + custom retriever
AutoGen	DIY
CrewAI	Limited — long-term memory partially overlaps
OpenAI Agents SDK	DIY; tracing data is a starting point
Codex CLI / Cursor	None

Connections to Other Concepts

agentdb-and-vector-stores-in-harnesses.md — The substrate.
trajectory-learning.md — Closely related; ReasoningBank is the storage layer for trajectory learning.
sona-self-learning-neural-patterns.md — Higher-level learning that uses ReasoningBank as input.
harness-owned-memory.md — The category.