One-Line Summary: Model routing is the harness-layer decision of which model handles which turn — a small fast model for routing/classification, a large smart model for hard reasoning, a code-tuned model for coding subtasks; routing is the second-largest cost lever after caching, and a major source of harness differentiation.
Prerequisites: Harness cost models, harness primitives
What Is Model Routing?
A naive harness uses one model for everything. A routed harness picks the right model per turn based on what's happening. The routing dimensions:
- Capability tier: Easy turns (formatting, classification) → cheap small model. Hard turns (multi-step planning, hard debugging) → large frontier model.
- Specialization: Code-heavy turns → code-tuned model. Vision-heavy turns → multimodal model.
- Context length: Short turns → standard context model. Long-context turns → extended context model.
- Latency: User-blocking turns → fast model. Background turns → cheap-but-slow model.
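These dimensions compose into a single selection function. A minimal sketch, assuming hypothetical model names, thresholds, and a priority ordering; none of these identifiers correspond to a real provider:

```python
from dataclasses import dataclass

@dataclass
class Turn:
    complexity: str       # "easy" | "hard"
    domain: str           # "code" | "vision" | "general"
    context_tokens: int
    user_blocking: bool

def select_model(turn: Turn) -> str:
    # Specialization wins first: a code-tuned model beats a generalist on code.
    if turn.domain == "code":
        return "code-model"
    if turn.domain == "vision":
        return "multimodal-model"
    # Context length: spill to an extended-context variant past a threshold.
    if turn.context_tokens > 100_000:
        return "long-context-model"
    # Capability tier, then latency: hard turns get the frontier model;
    # easy turns split on whether the user is waiting.
    if turn.complexity == "hard":
        return "frontier-model"
    return "fast-small-model" if turn.user_blocking else "cheap-slow-model"
```

The ordering encodes a priority (specialization, then context, then tier, then latency); a real router might weight these differently or score them jointly.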
ruflo's claimed 75% cost reduction without quality loss comes mainly from routing — sending easy turns to Haiku-class models while reserving Opus-class models for hard turns.
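The arithmetic behind a figure like 75% is simple blended cost. The per-turn prices below are illustrative placeholders, not ruflo's or any provider's actual rates:

```python
# Illustrative per-turn costs; real per-token rates vary by provider and over time.
big_cost_per_turn = 0.10     # Opus-class turn
small_cost_per_turn = 0.005  # Haiku-class turn (~20x cheaper)

baseline = 1000 * big_cost_per_turn  # every turn on the big model
# Routed: 80% of turns classified easy and sent to the small model.
routed = 800 * small_cost_per_turn + 200 * big_cost_per_turn

savings = 1 - routed / baseline
print(f"{savings:.0%}")  # -> 76%
```

The savings track two numbers: the fraction of turns that are genuinely easy, and the price gap between tiers.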
How It Works
A routing pipeline:
- Per-turn classifier: A small, cheap model (or rules) classifies the current turn — what type, how complex.
- Model selection: The router maps classification → model.
- Fallback policy: If the chosen model fails or is unavailable, fall back to a known-good alternative.
- Cost accounting: Per-turn cost is tracked; budgets are enforced.
The classifier can itself be a source of regressions: a routing decision that sends a hard turn to a small model produces bad output. Good routers err on the side of "if uncertain, use the bigger model."
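The four pipeline stages can be sketched in a few lines. The classifier stub, model names, prices, and fallback table are all assumptions for illustration, not a real provider API:

```python
# Placeholder per-turn prices and a symmetric fallback table (illustrative only).
COST_PER_TURN = {"small-model": 0.005, "frontier-model": 0.10}
FALLBACK = {"frontier-model": "small-model", "small-model": "frontier-model"}

def classify(turn_text: str) -> tuple[str, float]:
    """Stage 1: cheap per-turn classification. Stub rule: long prompts look hard."""
    if len(turn_text) > 500:
        return "frontier-model", 0.9
    return "small-model", 0.6

def route(turn_text: str, available: set[str], ledger: list[float]) -> str:
    # Stage 2: map classification -> model, erring big when uncertain.
    model, confidence = classify(turn_text)
    if confidence < 0.7:
        model = "frontier-model"
    # Stage 3: fall back to a known-good alternative if the choice is unavailable.
    if model not in available:
        model = FALLBACK[model]
    # Stage 4: record per-turn cost so budgets can be enforced upstream.
    ledger.append(COST_PER_TURN[model])
    return model
```

A real classifier would be a small model call or a richer rule set; the low-confidence upgrade in stage 2 is what implements the "if uncertain, use the bigger model" bias.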
Why It Matters
Routing is a major reason production agent costs don't balloon to 5–10× what they need to be. A team running thousands of agent invocations daily, all on the largest model, pays far more than necessary. A routed deployment matches model size to task — most tasks land at the cheap end of the distribution, and the expensive turns are reserved for the few that need them.
Routing is also a quality lever, not just a cost lever. A code-tuned model on a coding turn outperforms a generalist of equivalent size. Routing toward specialization is a quality investment.
Key Technical Details
- Classifier latency adds up: A 50ms classification on every turn is real overhead. Cache classifications per session prefix.
- Hysteresis prevents thrashing: Once routed to a model, stay there for a few turns unless signals strongly change.
- Multi-provider routing complicates failover: If you route across Claude, GPT, Gemini, you need to handle differing tool-call formats, prompt engineering quirks, and capabilities.
- Quality regressions are easy to miss: Routing changes can degrade output in ways users don't immediately notice. Track quality metrics post-routing-change.
- Per-tenant overrides: Some users prefer "always use the largest model"; offer a setting.
- Router-as-agent: A learned router can outperform rule-based routing but adds complexity. Start with rules.
- Locality matters: A router that switches between providers loses prompt-cache benefits (caches are per-provider).
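The hysteresis point above can be made concrete with a sticky router that dwells on its current model unless the classifier's signal is strong. The dwell count and confidence threshold are arbitrary illustrative values:

```python
class StickyRouter:
    """Dampens routing thrash: stay on the current model for a few turns
    unless the classifier suggests a switch with high confidence."""

    def __init__(self, min_dwell: int = 3, switch_confidence: float = 0.85):
        self.current = None
        self.turns_on_current = 0
        self.min_dwell = min_dwell
        self.switch_confidence = switch_confidence

    def choose(self, suggested: str, confidence: float) -> str:
        if self.current is None:
            self.current = suggested
        elif suggested != self.current:
            # Switch early only on a strong signal; otherwise wait out the dwell.
            if (self.turns_on_current >= self.min_dwell
                    or confidence >= self.switch_confidence):
                self.current = suggested
                self.turns_on_current = 0
        self.turns_on_current += 1
        return self.current
```

A weak suggestion to switch is absorbed; a strong one (or a long enough dwell) goes through.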
How Harnesses & Frameworks Implement This
| Harness / Framework | Routing |
|---|---|
| Claude Code | Per-session model selection; no per-turn routing |
| Claude Agent SDK | Programmatic — DIY router |
| ruflo | First-class — multi-provider router across Claude, GPT, Gemini, Cohere, Ollama |
| LangGraph | DIY — different nodes can use different models |
| AutoGen | Per-agent model — limited turn-level routing |
| CrewAI | Per-agent model |
| OpenAI Agents SDK | Per-agent model |
| Codex CLI | Per-session |
| Cursor | Per-session + free tier-bundled fast models for autocompletes |
Connections to Other Concepts
- harness-cost-models.md — Routing is the second-largest lever.
- prompt-and-context-caching.md — The largest lever; interacts with routing.
- the-75-percent-savings-claim.md — Routing is most of the reason.
- claude-code-vs-codex-vs-cursor.md — Routing differences across harnesses.
- ../../llm-concepts/07-inference-and-deployment/model-routing.md — Foundational coverage.
Further Reading
- ruvnet, ruflo multi-provider routing documentation.
- Vercel AI SDK / OpenRouter — provider-abstraction and routing-as-a-service offerings.