One-Line Summary: Micro-LoRA adapters are small, project-scoped low-rank fine-tunes (typically <50MB) that the harness can load on top of a base model to bias it toward the project's conventions, vocabulary, and successful trajectories — emerging in 2026 as a way to give agents a kind of parametric memory without the cost of full fine-tuning.

Prerequisites: LoRA, harness-owned memory, trajectory learning

What Is a Micro-LoRA Adapter?

LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning technique that adds small trainable low-rank matrices alongside a frozen base model, achieving most of the benefit of a full fine-tune at a tiny fraction of the trainable parameter count and training cost. A micro-LoRA is a LoRA adapter trained on a small, narrowly scoped dataset (a single project, a single user, a single team's trajectories) — typically 1k–100k examples — producing an adapter of a few megabytes.
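
The mechanism can be sketched in a few lines of NumPy (a toy illustration; the hidden size and rank are assumed for the example, not tied to any particular model):

```python
import numpy as np

d = 4096      # hidden size of a hypothetical base model
r = 8         # LoRA rank, far smaller than d
alpha = 16    # LoRA scaling factor

W = np.random.randn(d, d).astype(np.float32)          # frozen base weight
A = np.random.randn(r, d).astype(np.float32) * 0.01   # trainable down-projection
B = np.zeros((d, r), dtype=np.float32)                # trainable up-projection, zero-init

# Effective weight at inference: base plus the scaled low-rank delta.
W_eff = W + (alpha / r) * (B @ A)

# The adapter stores only A and B, a tiny fraction of the base parameters.
adapter_params = A.size + B.size   # 2 * r * d
base_params = W.size               # d * d
print(adapter_params / base_params)  # -> 0.00390625
```

Because B is zero-initialized, the adapter starts as a no-op and training only moves the output distribution where the data pushes it.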

In a harness setting, micro-LoRA adapters become a parametric memory: a way to encode project-specific patterns directly into the model rather than into the prompt. At inference, the harness loads the adapter, runs the base model with the adapter's deltas applied, and the model's outputs reflect the adapter's training distribution.

How It Fits in a Harness

The pipeline ruflo uses (and the broader 2026 pattern):

  1. Trajectory collection: As described in trajectory-learning.md.
  2. Filtering: Successful trajectories form the training set.
  3. Adapter training: Parameter-efficient fine-tuning produces an adapter (e.g., LoRA rank 8–32).
  4. Adapter registry: Adapters are tagged with scope (project, team, user) and stored.
  5. Runtime selection: At session start, the harness picks the right adapter(s) based on context.
  6. Composition: Multiple adapters can be loaded simultaneously (with weighted blending) — e.g., a "Python project" adapter plus a "this team's style" adapter.
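
A minimal sketch of steps 2 and 4–6 (the names `Adapter`, `select_adapters`, and the registry entries are illustrative, not the ruflo API; the training step itself is elided):

```python
from dataclasses import dataclass

# 2. Filtering: only successful trajectories enter the training set.
trajectories = [
    {"task": "fix-auth-bug", "success": True},
    {"task": "refactor-db", "success": False},
]
train_set = [t for t in trajectories if t["success"]]

# 4. Adapter registry: adapters tagged with scope and stored.
@dataclass
class Adapter:
    name: str
    scope: str      # "project" | "team" | "user"
    scope_id: str   # which project/team/user the adapter was trained for
    weight: float   # blending weight applied at load time

REGISTRY = [
    Adapter("py-proj-conventions", "project", "acme-api", 0.5),
    Adapter("team-style", "team", "platform", 0.3),
    Adapter("other-proj", "project", "legacy-app", 0.5),
]

# 5./6. Runtime selection and composition: load every adapter whose scope
# matches the session context, blended by weight.
def select_adapters(registry, context):
    return [a for a in registry if context.get(a.scope) == a.scope_id]

session = {"project": "acme-api", "team": "platform"}
loaded = select_adapters(REGISTRY, session)
print([a.name for a in loaded])  # -> ['py-proj-conventions', 'team-style']
```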

The training is typically run by the harness operator (not the user) on accumulated trajectories. Hosted ruflo deployments do this on a schedule.

Why It Matters

Micro-LoRA is how agent systems get persistent improvement without bloating prompts: a 50MB adapter can encode patterns that would otherwise require 100k+ tokens of in-context examples. The trade-off is engineering and infrastructure complexity (training, hosting, adapter-aware serving) in exchange for runtime savings (smaller prompts, faster inference).
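
A rough sizing check makes the "few megabytes" claim concrete (all numbers here are assumptions for illustration: a 7B-class model with hidden size 4096, 32 layers, rank 16 on the Q and V projections, fp16 storage):

```python
hidden = 4096        # model hidden size (assumed)
layers = 32          # transformer layers (assumed)
rank = 16            # LoRA rank
modules = 2          # Q and V projections per layer
bytes_per_param = 2  # fp16

# Each targeted projection adds A (rank x hidden) and B (hidden x rank).
params_per_module = 2 * rank * hidden
total_params = params_per_module * modules * layers
size_mb = total_params * bytes_per_param / 1e6

print(total_params, round(size_mb, 1))  # -> 8388608 16.8
```

Under those assumptions the adapter is ~8.4M parameters and well under the 50MB ceiling mentioned above.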

The other reason it matters: it is the most natural answer to the data-locality question. A team that doesn't want to expose proprietary code to a centralized vector store can still benefit from training a private adapter on their trajectories.

Key Technical Details

  • Rank and target modules: Typical LoRA rank for micro-adapters is 4–16. Targeting the attention Q and V projections is enough for most agent-behavior biases.
  • Catastrophic forgetting: Mitigated by LoRA's design (base model frozen) but still possible if rank is high or training overfits.
  • Adapter composition: Multiple adapters with weights (base + 0.5 * project + 0.3 * team). Composition adds up to a few percent inference overhead.
  • Hosting: Adapters can be served from the same infra as the base model (vLLM/multi-LoRA serving). Self-hosted deployments need adapter-aware serving stacks.
  • Privacy footprint: Adapter weights themselves can leak training data via known attacks. Treat adapter files with the sensitivity of training data.
  • Eval is non-trivial: Micro-LoRA can subtly bias outputs in ways that are hard to detect without project-specific evals.
  • Update frequency: Daily or weekly adapter retraining is feasible; per-session is not.
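
The weighted composition described above (base + 0.5 * project + 0.3 * team) can be sketched with NumPy. This is a toy illustration with made-up shapes; real multi-LoRA serving stacks apply the low-rank deltas at runtime rather than materializing a merged weight matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 4  # toy hidden size and rank

def lora_delta(rng, d, r):
    # One adapter's low-rank update: B @ A, rank at most r.
    A = rng.standard_normal((r, d))
    B = rng.standard_normal((d, r))
    return B @ A

W_base = rng.standard_normal((d, d))
project_delta = lora_delta(rng, d, r)
team_delta = lora_delta(rng, d, r)

# Weighted blend of two scoped adapters on one base model.
W_eff = W_base + 0.5 * project_delta + 0.3 * team_delta

# The combined update is still low-rank (at most 2r here), which is why
# stacking a few adapters costs only a few percent of inference overhead.
combined = 0.5 * project_delta + 0.3 * team_delta
print(np.linalg.matrix_rank(combined) <= 2 * r)  # -> True
```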

How Harnesses & Frameworks Implement This

Harness / Framework       Micro-LoRA support
Claude Code               None directly (depends on Anthropic's adapter support roadmap)
Claude Agent SDK          Future-facing — deployable once the provider supports per-tenant adapters
ruflo                     First-class — ruflo-ruvllm adapter pipeline
LangGraph                 DIY — connect to a serving stack that supports LoRA
AutoGen / CrewAI          DIY
OpenAI Agents SDK         None — OpenAI fine-tuning is full-model only at the time of writing
Codex CLI / Cursor

Connections to Other Concepts

  • trajectory-learning.md — The training-data source.
  • harness-owned-memory.md — The category.
  • cross-session-memory-strategies.md — A complementary technique (non-parametric).
  • ../../llm-concepts/06-parameter-efficient-fine-tuning/lora.md — Foundational coverage.

Further Reading

  • Hu et al., "LoRA: Low-Rank Adaptation of Large Language Models" (2021) — The foundational paper.
  • ruvnet, ruflo-ruvllm — Adapter serving for agents.