Testgap and Coverage Workers

One-Line Summary: A testgap worker continuously identifies code without test coverage and proposes (or generates) tests; coverage workers track what is and isn't covered, surface deltas after each session, and prevent slow erosion of test quality, making both among the highest-leverage background workers because they do work that humans skip under time pressure.

Prerequisites: Background worker pattern, audit and optimize workers

What Is a Testgap Worker?

A test gap is a unit of code (a function, a branch, a file, a feature) that has no tests covering it. Test gaps accumulate naturally: a new feature ships without tests because, under deadline, skipping them is faster than writing them. A testgap worker periodically scans for these gaps and surfaces them: "the parseConfig function added in your last session has no tests; here are three I can write."

A coverage worker is a thinner relative: it tracks code-coverage metrics over time, alerts on regressions, and prompts the main agent (or user) when coverage drops. Unlike a testgap worker, it doesn't propose specific tests — just tracks the metric.

The two often work together: coverage detects the regression, testgap proposes the fix.
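The coverage worker's half of this can be very small. A minimal sketch, assuming each snapshot is a dict in the shape of coverage.py's JSON report (produced by `coverage json`), where `totals.percent_covered` holds the overall percentage. The function name and the `tolerance` threshold are illustrative choices, not part of any tool:

```python
from typing import Optional


def coverage_regression(previous: dict, current: dict,
                        tolerance: float = 0.5) -> Optional[str]:
    """Return an alert message if total coverage fell by more than
    `tolerance` percentage points between two snapshots, else None."""
    before = previous["totals"]["percent_covered"]
    after = current["totals"]["percent_covered"]
    drop = before - after
    if drop > tolerance:
        return f"coverage dropped {drop:.1f} points ({before:.1f}% -> {after:.1f}%)"
    return None


# Two hypothetical snapshots taken before and after a session.
last_run = {"totals": {"percent_covered": 82.0}}
this_run = {"totals": {"percent_covered": 79.5}}
alert = coverage_regression(last_run, this_run)
```

A non-None result is the signal that would hand off to the testgap worker; the tolerance keeps small fluctuations from triggering it.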

How It Works

A testgap worker pipeline:

Trigger: After a session ends, or on commit, or daily.
Diff vs. test coverage: Use coverage.py, Istanbul, or an equivalent tool to find untested code added or modified since the last run.
Prioritize: Public APIs > internal functions > test helpers. Critical paths > leaf utilities. Recent changes > old.
Generate test proposals: Use the LLM to write candidate tests. Run them in a sandbox to verify they pass.
Surface: Land proposed tests in a scratchpad / PR comment / suggestion list. The main agent (or user) decides whether to accept.
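The diff and prioritize steps above can be sketched as follows. This assumes a coverage.py-style JSON report, where each entry under `files` lists its `missing_lines`, and a hypothetical `changed` mapping of file path to changed line numbers (for example, parsed from a diff). The ranking heuristic is a naming-convention guess at "public vs. internal vs. test helper", not a rule from any tool:

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Dict, List, Set


@dataclass
class Gap:
    path: str
    lines: List[int]  # line numbers that are both changed and untested


def find_gaps(report: dict, changed: Dict[str, Set[int]]) -> List[Gap]:
    """Intersect uncovered lines with lines changed since the last run."""
    gaps = []
    for path, data in report["files"].items():
        untested = sorted(set(data["missing_lines"]) & changed.get(path, set()))
        if untested:
            gaps.append(Gap(path, untested))
    return gaps


def rank(gap: Gap) -> int:
    """Lower rank is more urgent (assumed heuristic based on file names)."""
    parts = PurePosixPath(gap.path).parts
    if "tests" in parts or parts[-1].startswith("test_"):
        return 2  # test helpers last
    if parts[-1].startswith("_"):
        return 1  # underscore-prefixed modules treated as internal
    return 0      # everything else treated as public API


def prioritize(gaps: List[Gap]) -> List[Gap]:
    # Within a rank, larger gaps come first.
    return sorted(gaps, key=lambda g: (rank(g), -len(g.lines)))


report = {"files": {
    "app/api.py": {"missing_lines": [10, 11, 40]},
    "app/_util.py": {"missing_lines": [5]},
    "tests/helpers.py": {"missing_lines": [3]},
    "app/old.py": {"missing_lines": [1, 2]},  # untested but unchanged
}}
changed = {"app/api.py": {10, 11}, "app/_util.py": {5}, "tests/helpers.py": {3}}
ordered = prioritize(find_gaps(report, changed))
```

Restricting gaps to changed lines keeps the worker focused on recent work; old untested code such as `app/old.py` is left for a separate, lower-priority sweep.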

Ruflo's testgaps worker does roughly this with the additional twist of dispatching specialized sub-agents for different test types (unit, integration, e2e).

Why It Matters

Tests are the kind of work that pays off later but requires effort now — exactly the kind of work humans defer and that benefits most from being autonomously generated. A background testgap worker turns "I should write tests for this" into "the agent already proposed tests; I just review and accept."

The honest qualifier: auto-generated tests have quality problems. They can be tautological (testing the implementation back at itself), miss edge cases, or create fragile mocks. A good testgap worker biases toward suggesting test targets humans can fill in, with proposed tests as starting points rather than finished work.

Key Technical Details

Coverage tools are the substrate: coverage.py, Istanbul, JaCoCo, etc. Without coverage data, a testgap worker has no way to know where the gaps are.
Test framework conventions are project-specific: A worker that proposes tests in pytest style for a Mocha project is wasted output. Detect the existing convention.
Run proposed tests before reporting: A test that doesn't run is noise. The worker should execute its proposals and only report passing ones.
Mutation testing as a quality check: A test that doesn't catch a mutated version of the function is weak. Mutation testing tools can score test proposals.
Avoid test pollution: Workers shouldn't commit tests autonomously; a human should review and commit them.
Triangulation: Multiple test types per untested function (unit + property-based + integration) catch more.
Cost vs. value: Testgap workers are expensive (LLM calls per gap). Limit per-session work; queue the rest for next time.
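To illustrate the mutation-testing check from the list above, here is a toy sketch using only Python's standard `ast` module. It applies a single mutation operator (swapping `+` and `-` everywhere at once) and accepts a proposed test only if it passes on the original function and fails on the mutant. The helper names are hypothetical; dedicated mutation-testing tools generate many mutants, one change at a time, and should be preferred in practice:

```python
import ast
from typing import Callable


class SwapAddSub(ast.NodeTransformer):
    """Toy mutation operator: turn every + into - and every - into +."""

    def visit_BinOp(self, node: ast.BinOp) -> ast.BinOp:
        self.generic_visit(node)
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()
        elif isinstance(node.op, ast.Sub):
            node.op = ast.Add()
        return node


def load(tree: ast.Module, name: str) -> Callable:
    """Compile a module AST and return the function called `name`."""
    namespace: dict = {}
    exec(compile(tree, "<candidate>", "exec"), namespace)
    return namespace[name]


def passes(test: Callable, func: Callable) -> bool:
    try:
        test(func)
        return True
    except Exception:  # assertion failures and crashes both count as failing
        return False


def kills_mutant(source: str, name: str, test: Callable) -> bool:
    """True if `test` passes on the original code and fails on the mutant."""
    original = load(ast.parse(source), name)
    mutated = ast.fix_missing_locations(SwapAddSub().visit(ast.parse(source)))
    mutant = load(mutated, name)
    return passes(test, original) and not passes(test, mutant)


SOURCE = "def total(a, b):\n    return a + b\n"


def strong_test(f):
    assert f(2, 3) == 5


def weak_test(f):
    assert f(4, 0) == 4  # also true for a - b, so the mutant survives


strong_ok = kills_mutant(SOURCE, "total", strong_test)
weak_ok = kills_mutant(SOURCE, "total", weak_test)
```

Here `strong_test` kills the mutant and `weak_test` does not, which is the kind of signal a worker could use to drop or flag a weak proposal before surfacing it.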

How Harnesses & Frameworks Implement This

Harness / Framework	Testgap workers
Claude Code	DIY
Claude Agent SDK	DIY
ruflo	First-class — `testgaps` is one of 12 default workers
LangGraph	DIY
AutoGen	DIY
CrewAI	DIY
OpenAI Agents SDK	DIY
Codex CLI / Cursor	✗

Connections to Other Concepts

background-worker-pattern.md — Parent concept.
audit-and-optimize-workers.md — Same family.
event-driven-harness-architectures.md — Infrastructure.
../../multi-skill-agent/08-testing-multi-skill-agents/integration-testing.md — Foundational coverage.