One-Line Summary: Micro-LoRA adapters are small, project-scoped low-rank fine-tunes (typically <50MB) that the harness can load on top of a base model to bias it toward the project's conventions, vocabulary, and successful trajectories — emerging in 2026 as a way to give agents a kind of parametric memory without the cost of full fine-tuning.
Prerequisites: LoRA, harness-owned memory, trajectory learning
What Is a Micro-LoRA Adapter?
LoRA (Low-Rank Adaptation) is a fine-tuning technique that adds small low-rank matrices to a frozen base model, achieving most of the benefit of a full fine-tune at a tiny fraction of the trainable parameter count and training cost. A micro-LoRA is a LoRA adapter trained on a small dataset (a single project, a single user, a single team's trajectories) — typically 1k–100k examples, producing an adapter of a few megabytes.
In a harness setting, micro-LoRA adapters become a parametric memory: a way to encode project-specific patterns directly into the model rather than into the prompt. At inference, the harness loads the adapter, runs the base model with the adapter's deltas applied, and the model's outputs reflect the adapter's training distribution.
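To make the idea of parametric memory concrete, the sketch below shows the LoRA mechanism with numpy: the adapter is a pair of low-rank matrices whose scaled product is added to a frozen base weight at load time. The shapes, rank, and scaling values here are illustrative, not taken from any particular harness.

```python
# Minimal numpy sketch of the LoRA mechanism; shapes and scaling are illustrative.
import numpy as np

d, r, alpha = 4096, 8, 16          # hidden size, adapter rank, scaling factor (illustrative)
W = np.random.randn(d, d)          # frozen base weight, e.g. an attention projection
A = np.random.randn(r, d) * 0.01   # trained low-rank factor (part of the adapter)
B = np.zeros((d, r))               # second factor; initialized to zero so the delta starts at zero

# When the harness loads the adapter, the effective weight becomes the base
# weight plus a scaled low-rank delta. The base weights are never modified.
W_eff = W + (alpha / r) * (B @ A)

# The adapter stores only A and B, which is why it stays in the megabyte range.
print(A.size + B.size, "adapter params vs", W.size, "base params for this one matrix")
```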
How It Fits in a Harness
The pipeline ruflo uses (and the broader 2026 pattern):
- Trajectory collection: As described in trajectory-learning.md.
- Filtering: Successful trajectories form the training set.
- Adapter training: Parameter-efficient fine-tuning produces an adapter (e.g., LoRA rank 8–32).
- Adapter registry: Adapters are tagged with scope (project, team, user) and stored.
- Runtime selection: At session start, the harness picks the right adapter(s) based on context.
- Composition: Multiple adapters can be loaded simultaneously (with weighted blending) — e.g., a "Python project" adapter plus a "this team's style" adapter.
The training is typically run by the harness operator (not the user) on accumulated trajectories. Hosted ruflo deployments do this on a schedule.
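A minimal sketch of the registry and runtime-selection steps is shown below. Every class, field, and path in it is hypothetical, chosen only to illustrate scope-tagged lookup at session start; ruflo's actual schema may differ.

```python
# Hypothetical sketch of an adapter registry and session-start selection.
# All class names, fields, and paths are illustrative, not ruflo's actual API.
from dataclasses import dataclass

@dataclass
class AdapterRecord:
    name: str
    scope: str       # "project" | "team" | "user"
    scope_id: str    # e.g. repo slug or team id
    path: str        # where the trained adapter weights are stored
    weight: float    # blend weight applied when composing adapters

REGISTRY = [
    AdapterRecord("python-conventions", "project", "acme/billing", "s3://adapters/proj.bin", 0.5),
    AdapterRecord("team-style", "team", "platform", "s3://adapters/team.bin", 0.3),
]

def select_adapters(project: str, team: str) -> list[AdapterRecord]:
    """At session start, return every registered adapter whose scope matches the context."""
    return [
        a for a in REGISTRY
        if (a.scope == "project" and a.scope_id == project)
        or (a.scope == "team" and a.scope_id == team)
    ]

# The harness would pass the selected adapters (and their weights) to the serving layer.
print(select_adapters("acme/billing", "platform"))
```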
Why It Matters
Micro-LoRA is how agent systems get persistent improvement without bloating prompts. A 50MB adapter encodes patterns that would otherwise need 100k+ tokens of in-context examples. The trade is engineering and infrastructure complexity (training, hosting, serving with adapters) for runtime savings (smaller prompts, faster inference).
The other reason it matters: it is the most natural answer to the data-locality question. A team that doesn't want to expose proprietary code to a centralized vector store can still benefit from training a private adapter on their trajectories.
Key Technical Details
- Rank and target modules: Typical LoRA rank for micro-adapters is 4–16. Targeting only the attention Q and V projections is enough for most agent-behavior biases.
- Catastrophic forgetting: Mitigated by LoRA's design (base model frozen) but still possible if rank is high or training overfits.
- Adapter composition: Multiple adapters can be combined with per-adapter weights (roughly base + 0.5 * project + 0.3 * team); a sketch of this follows the list. Composition adds up to a few percent inference overhead.
- Hosting: Adapters can be served from the same infrastructure as the base model (e.g., vLLM multi-LoRA serving). Self-hosted deployments need adapter-aware serving stacks.
- Privacy footprint: Adapter weights themselves can leak training data via known attacks. Treat adapter files with the sensitivity of training data.
- Eval is non-trivial: Micro-LoRA can subtly bias outputs in ways that are hard to detect without project-specific evals.
- Update frequency: Daily or weekly adapter retraining is feasible; per-session is not.
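To ground the rank/target-module and composition points, here is a hedged sketch using the Hugging Face peft library. The model name and adapter paths are placeholders, and the weighted-blend call is one way peft exposes composition; it is an illustration of the API shape, not ruflo's pipeline.

```python
# Sketch of the rank/target-module and composition points using the Hugging Face
# `peft` library. The model name and adapter paths are placeholders; this shows
# the API shape, not ruflo's actual training or serving pipeline.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, PeftModel, get_peft_model

# Training-time configuration: rank 8, attention Q/V projections only.
base = AutoModelForCausalLM.from_pretrained("base-model-name")  # placeholder model id
config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)   # wrap the frozen base for training on filtered trajectories
model.print_trainable_parameters()     # typically well under 1% of the base parameter count

# Composition at load time: blend two previously trained adapters, roughly
# 0.5 * project + 0.3 * team on top of the frozen base.
base = AutoModelForCausalLM.from_pretrained("base-model-name")  # fresh base for serving
peft_model = PeftModel.from_pretrained(base, "adapters/project", adapter_name="project")
peft_model.load_adapter("adapters/team", adapter_name="team")
peft_model.add_weighted_adapter(
    adapters=["project", "team"],
    weights=[0.5, 0.3],
    adapter_name="project_plus_team",
    combination_type="linear",
)
peft_model.set_adapter("project_plus_team")
```

The blend weights and combination type are exactly the knobs that make evaluation non-trivial: small changes here can shift output style in ways only project-specific evals will catch.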
How Harnesses & Frameworks Implement This
| Harness / Framework | Micro-LoRA support |
|---|---|
| Claude Code | None directly (depends on Anthropic's adapter support roadmap) |
| Claude Agent SDK | Future-facing — deployable once the provider supports per-tenant adapters |
| ruflo | First-class — ruflo-ruvllm adapter pipeline |
| LangGraph | DIY — connect to a serving stack that supports LoRA |
| AutoGen / CrewAI | DIY |
| OpenAI Agents SDK | None — OpenAI fine-tuning is full-model only at the time of writing |
| Codex CLI / Cursor | None |
Connections to Other Concepts
- trajectory-learning.md — The training-data source.
- harness-owned-memory.md — The category.
- cross-session-memory-strategies.md — A complementary technique (non-parametric).
- ../../llm-concepts/06-parameter-efficient-fine-tuning/lora.md — Foundational coverage.
Further Reading
- Hu et al., "LoRA: Low-Rank Adaptation of Large Language Models" (2021) — The foundational paper.
- ruvnet, ruflo-ruvllm — Adapter serving for agents.