PII Gating and AIDefence

One-Line Summary: PII gating is the harness-layer scrubbing of personally identifiable information (along with secrets, credentials, and sensitive metadata) from data flowing across trust boundaries; ruflo's AIDefence plugin is the reference implementation, identifying 14+ classes of sensitive data and redacting, blocking, or alerting according to configured policy.

Prerequisites: Cross-machine agent federation, permission and tool scoping primitives

What Is PII Gating?

When agent state crosses a trust boundary (sent to a federated peer, written to an external log, posted to a third-party API), one question must be answered: does this contain anything that should not leave? PII gating is the discipline of asking that question consistently, automatically, and at the harness layer rather than per-tool.

The categories of "sensitive" extend beyond classical PII:

Personal: names, emails, phone numbers, addresses, SSNs, DOBs.
Credentials: API keys, passwords, OAuth tokens, SSH keys, AWS credentials.
Internal: internal hostnames, IP ranges, service names, commit messages with vulnerability details.
Confidential business: customer data, unreleased product details, financial figures.
Code-level: secrets accidentally committed, embedded credentials in code or config.

A PII gate scans outbound payloads for these categories, redacts what it finds (or blocks, depending on policy), and emits an audit event.

How It Works

A typical PII-gating pipeline:

Hook into outbound traffic: All outbound tool calls, federation messages, and log writes are intercepted.
Detection: Pattern matching (regex for credit cards, SSNs), entropy heuristics (high-entropy strings look like keys), trained classifiers (NER for names), and vocabulary lookups (hostnames, API key prefixes).
Action: Redact (replace with [REDACTED-EMAIL]), block (refuse the operation), or alert (allow but warn).
Audit: Log the detection event with category and action.
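
The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not AIDefence's implementation: the `gate` function, the `PATTERNS` table, the `GateResult` type, and the 4.0 bits-per-character entropy threshold are all assumptions chosen for the example, and the NER and vocabulary-lookup detectors are omitted.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical detector table (category -> compiled pattern); illustrative only.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "AWS-KEY": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}
# Candidate secrets: long runs of key-like characters, checked for entropy below.
TOKEN = re.compile(r"[A-Za-z0-9+/_=-]{24,}")


def shannon_entropy(s: str) -> float:
    """Bits per character of the string's empirical character distribution."""
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


@dataclass
class GateResult:
    allowed: bool
    payload: str
    audit: list = field(default_factory=list)


def gate(payload: str, action: str = "redact",
         entropy_threshold: float = 4.0) -> GateResult:
    # Detection: regex matches plus high-entropy tokens.
    findings = []
    for category, pattern in PATTERNS.items():
        findings += [(category, m.span()) for m in pattern.finditer(payload)]
    for m in TOKEN.finditer(payload):
        if shannon_entropy(m.group()) >= entropy_threshold:
            findings.append(("HIGH-ENTROPY", m.span()))

    # Audit: one event per detection, recording category and action.
    audit = [{"category": c, "action": action} for c, _ in findings]

    # Action: block refuses, alert passes the payload through, redact rewrites it.
    if not findings or action == "alert":
        return GateResult(True, payload, audit)
    if action == "block":
        return GateResult(False, "", audit)

    out, cursor = [], 0
    for category, (start, end) in sorted(findings, key=lambda f: f[1]):
        if start < cursor:
            continue  # span overlaps one that was already redacted
        out.append(payload[cursor:start])
        out.append(f"[REDACTED-{category}]")
        cursor = end
    out.append(payload[cursor:])
    return GateResult(True, "".join(out), audit)
```

With these assumed patterns, `gate("mail alice@example.com").payload` yields `mail [REDACTED-EMAIL]`, and the same call with `action="block"` returns `allowed=False` with an empty payload.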

Ruflo's AIDefence plugin layers additional behaviors on top: prompt-injection detection on inbound payloads, CVE pattern detection in code being shared, and "trust-score adjustments" when peers attempt to send data that looks suspicious.

Why It Matters

Outbound data leaks are among the most frequently cited risks in enterprise agent deployments. A tool call that ships a project's whole .env file to a third-party API is a breach. A federated agent that includes raw customer data in a shared decision is a privacy incident. PII gating is the harness's promise that this category of mistake doesn't slip through silently.

The other reason it matters: it's a regulatory checkbox. GDPR, HIPAA, and SOC 2 controls all require demonstrable handling of sensitive data. A harness with first-class PII gating produces that evidence in one place (the audit events); one without it leaves the evidence to be assembled tool by tool, which is far harder to defend.

Key Technical Details

Detection precision and recall trade off: Aggressive regex catches more (high recall) but produces false positives (low precision). NER models usually do better on free-text entities such as names, but cost compute. Tune to your tolerance.
Custom dictionaries: Every organization has its own sensitive terms (project codenames, internal endpoints). PII gates need extension points.
Scope by destination: A name being sent to a federated peer is different from the same name being written to a local log. Same data, different policies.
Reversible vs. irreversible redaction: Reversible redaction replaces the value with a token and stores the mapping, so a receiver with proper authority can request unredaction. Irreversible redaction is safer because no mapping exists to be leaked.
False negatives are the dangerous failure: A PII gate that misses a leak is worse than one that over-redacts. Default to over-redact and let users adjust.
Composability with sandboxing: PII gating + sandboxing (network egress restriction) gives layered defense. Don't rely on one alone.
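Performance: Scanning every outbound payload adds latency. Compile patterns once and reuse them aggressively. Run scans asynchronously only where the policy is alert-only; redact and block decisions must complete before the data leaves.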
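
Several of the points above (scope by destination, reversible redaction, over-redacting by default, compiling patterns once) can be combined in one short sketch. Everything named here is hypothetical: the `POLICY` table, the destination names, the `TokenVault` class, and the token format are assumptions for illustration, and a real vault would need persistent, access-controlled storage rather than an in-memory dict.

```python
import re
import secrets

# Compiled once at import time and reused for every payload.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical policy table: same data, different action per destination.
POLICY = {
    "local_log": "alert",
    "federated_peer": "redact",
    "third_party_api": "block",
}
DEFAULT_ACTION = "block"  # unknown destinations fail closed


class TokenVault:
    """In-memory token store for reversible redaction (illustrative only)."""

    def __init__(self):
        self._store = {}

    def tokenize(self, value: str, category: str) -> str:
        token = f"[{category}:{secrets.token_hex(4)}]"
        self._store[token] = value
        return token

    def reveal(self, token: str, authorized: bool) -> str:
        if not authorized:
            raise PermissionError("unredaction requires authorization")
        return self._store[token]


def apply_policy(payload: str, destination: str, vault: TokenVault):
    """Return (action, payload); payload is None when the send is blocked."""
    if not EMAIL.search(payload):
        return "allow", payload
    action = POLICY.get(destination, DEFAULT_ACTION)
    if action == "block":
        return action, None
    if action == "redact":
        redacted = EMAIL.sub(lambda m: vault.tokenize(m.group(), "EMAIL"), payload)
        return action, redacted
    return action, payload  # alert: pass through; the caller logs a warning
```

A payload containing an email address is passed through with an alert when bound for `local_log`, tokenized when bound for `federated_peer`, and refused for `third_party_api` or any destination missing from the table. Only a caller passing `authorized=True` to `reveal` can recover the original value from a token.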

How Harnesses & Frameworks Implement This

Harness / Framework	PII gating
Claude Code	DIY via `PreToolUse` hooks
Claude Agent SDK	DIY
ruflo	First-class: `ruflo-aidefence` (14+ classes of sensitive data, plus injection and CVE detection)
LangGraph	DIY
AutoGen	DIY
CrewAI	Limited
OpenAI Agents SDK	Guardrails partially overlap
Codex CLI / Cursor	Limited
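
For the DIY rows, the gate is typically a small script attached to a pre-execution hook. The sketch below shows the shape such a script might take for a Claude Code `PreToolUse` hook. It assumes the hook receives a JSON event on stdin with `tool_name` and `tool_input` fields and that exit code 2 blocks the tool call with stderr as the reason; verify both against the current hooks documentation before relying on them. The `SECRET` pattern and the `decide` and `main` functions are hypothetical names for this example.

```python
import json
import re
import sys

# Hypothetical credential pattern: AWS access key IDs and PEM private-key headers.
SECRET = re.compile(r"\bAKIA[0-9A-Z]{16}\b|-----BEGIN [A-Z ]*PRIVATE KEY-----")


def decide(event: dict) -> tuple[int, str]:
    """Return (exit_code, message) for one hook event.

    Assumption: exit code 2 blocks the tool call and 0 allows it.
    """
    serialized = json.dumps(event.get("tool_input", {}))
    if SECRET.search(serialized):
        tool = event.get("tool_name", "tool")
        return 2, f"Blocked {tool}: input appears to contain a credential."
    return 0, ""


def main(stdin=sys.stdin) -> int:
    # Assumption: the harness writes the hook event as JSON to stdin.
    code, message = decide(json.load(stdin))
    if message:
        print(message, file=sys.stderr)
    return code

# When installed as a hook script, end the file with: raise SystemExit(main())
```

This blocks rather than redacts, which is the simplest behavior to get right in a hook; redaction would require the hook to rewrite the tool input, which depends on what the harness allows.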

Connections to Other Concepts

cross-machine-agent-federation.md — The natural setting.
permission-and-tool-scoping-primitives.md — Same defense-in-depth philosophy.
prompt-injection-defense-in-harnesses.md — The other side of harness-layer protection.
behavioral-trust-scoring.md — PII detections feed trust score updates.