One-Line Summary: Byzantine fault-tolerant (BFT) protocols handle peers that do not merely fail but actively misbehave (returning wrong data, breaking the protocol, colluding), at the cost of needing at least 3f+1 peers to tolerate f bad ones; for federated agent systems with peers from untrusted parties, BFT is the right correctness model.

Prerequisites: Raft for agents, consensus in multi-agent systems, cross-machine agent federation

What Is Byzantine Fault Tolerance?

The "Byzantine generals problem" (Lamport, Shostak, Pease, 1982) asks: how do n generals coordinate an attack when some of them may be traitors who lie about their messages? The classic answer requires at least 3f+1 generals to tolerate f traitors, and the protocols that achieve this kind of agreement are collectively called Byzantine fault-tolerant (BFT).

For agent systems, "traitors" are peers that may be:

  • Compromised (a plugin from an untrusted source).
  • Buggy (returning malformed responses without raising).
  • Adversarial (a federated peer from a competing organization).
  • Subject to prompt injection (yielding to attacker control).

BFT protocols ensure the system reaches correct consensus despite such peers, as long as at most f of them go rogue out of 3f+1 total.
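The sizing rule can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the quorum formula is the standard quorum-intersection bound (any two quorums must overlap in at least f+1 peers, so at least one honest peer is in both), which reduces to 2f+1 when n = 3f+1.

```python
def max_faulty(n: int) -> int:
    """Largest f such that n >= 3f + 1."""
    if n < 1:
        raise ValueError("need at least one peer")
    return (n - 1) // 3


def min_peers(f: int) -> int:
    """Smallest cluster that tolerates f Byzantine peers."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1


def quorum(n: int) -> int:
    """Smallest vote count such that any two quorums share f + 1 peers.

    This is ceil((n + f + 1) / 2); for n = 3f + 1 it equals 2f + 1.
    """
    f = max_faulty(n)
    return (n + f + 2) // 2
```

For example, `min_peers(1)` is 4 and `quorum(4)` is 3, while `max_faulty(6)` is still 1: adding peers only helps once the next 3f+1 threshold is reached.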

How It Works

Practical Byzantine Fault Tolerance (PBFT) is the most-cited reference. Simplified flow:

  1. Client → primary: A client sends a request to a designated primary peer.
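  2. Pre-prepare: The primary assigns the request a sequence number and broadcasts it to all backups.
  3. Prepare: Each backup that accepts the pre-prepare broadcasts a prepare message to all other peers.
  4. Commit: Once a peer holds the pre-prepare plus 2f matching prepares (2f+1 matching messages in total), it broadcasts a commit.
  5. Reply: Once a peer has received 2f+1 matching commits (including its own), it executes the request and replies to the client, which accepts the result after f+1 matching replies.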
  6. View change: If the primary is suspected faulty, peers initiate a view change to elect a new one.
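The vote counting in the prepare and commit phases can be sketched as a small single-process simulation. This is a simplification under stated assumptions, not PBFT itself: `Slot` and `run_round` are hypothetical names, replica 0 is assumed to be the primary, the pre-prepare is counted as the primary's vote toward the 2f+1 threshold, faulty peers are modeled only as silent (not lying), and signatures, digests, sequence numbers, and view changes are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    """Vote bookkeeping one replica keeps for a single request."""
    f: int
    prepare_votes: set = field(default_factory=set)
    commit_votes: set = field(default_factory=set)
    commit_sent: bool = False
    executed: bool = False

    @property
    def quorum(self) -> int:
        return 2 * self.f + 1

    def on_pre_prepare(self, primary_id: int) -> None:
        # Simplification: the pre-prepare counts as the primary's vote.
        self.prepare_votes.add(primary_id)

    def on_prepare(self, sender_id: int) -> bool:
        """Record a prepare; return True the first time a commit is due."""
        self.prepare_votes.add(sender_id)  # a set ignores duplicate senders
        if not self.commit_sent and len(self.prepare_votes) >= self.quorum:
            self.commit_sent = True
            return True
        return False

    def on_commit(self, sender_id: int) -> None:
        """Record a commit; mark the request executed at quorum."""
        self.commit_votes.add(sender_id)
        if len(self.commit_votes) >= self.quorum:
            self.executed = True


def run_round(n: int, f: int, silent: set) -> dict:
    """Drive one request through n replicas; peers in `silent` send nothing."""
    primary = 0
    live = [r for r in range(n) if r not in silent]
    slots = {r: Slot(f) for r in live}
    if primary in silent:
        # No pre-prepare arrives, so nothing proceeds until a view change.
        return {r: False for r in live}
    for r in live:
        slots[r].on_pre_prepare(primary)
    committers = []
    for sender in live:
        if sender == primary:
            continue  # the primary's vote was the pre-prepare itself
        for r in live:
            if slots[r].on_prepare(sender):
                committers.append(r)
    for sender in committers:
        for r in live:
            slots[r].on_commit(sender)
    return {r: slots[r].executed for r in live}
```

With four replicas and one silent backup, `run_round(4, 1, silent={3})` lets all three live replicas execute. With two silent backups, `run_round(4, 1, silent={2, 3})` leaves the remaining two stuck below quorum: nothing incorrect is executed, but no progress is made either.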

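The cost is communication: PBFT's all-to-all prepare and commit phases send O(n²) messages per request. HotStuff reduces the normal case to O(n) by routing votes through the leader, which aggregates them, and it chains phases across requests so they pipeline. Tendermint still votes all-to-all but folds leader rotation into the normal path.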
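To make the gap between all-to-all and leader-relayed voting concrete, the following sketch counts protocol messages for one request. It is a back-of-the-envelope model, not a measurement: client traffic is ignored, every peer is assumed honest and responsive, and the `phases` parameter of the leader-relayed model is an illustrative assumption rather than the phase count of any specific protocol.

```python
def pbft_messages(n: int) -> int:
    """Messages for one request in PBFT's normal case (client traffic excluded)."""
    pre_prepare = n - 1              # primary -> each backup
    prepare = (n - 1) * (n - 1)      # each backup -> every other peer
    commit = n * (n - 1)             # every peer -> every other peer
    return pre_prepare + prepare + commit


def leader_relayed_messages(n: int, phases: int = 3) -> int:
    """Messages when votes go only to the leader and back.

    `phases` is an assumed number of broadcast-and-vote rounds.
    """
    per_phase = 2 * (n - 1)          # leader -> all, then all -> leader
    return phases * per_phase
```

At n = 4 the two models are close (24 versus 18 messages), but at n = 100 the all-to-all count is 19,800 against 594, which is why quadratic protocols are usually kept to small clusters.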

Why It Matters for Agents

Federated agent platforms (ruflo's federation mode, multi-organization agent collaborations, agent marketplaces with installable third-party agents) necessarily face Byzantine peers. A plugin from an unknown developer might be malicious, a federated peer might be compromised, or a plugin updated yesterday might have introduced a bug.

BFT is the protocol-level answer to "can we trust this swarm's consensus?" Without it, one bad peer can corrupt outputs, manipulate memory writes, or subvert decisions silently.

Key Technical Details

  • 3f+1 minimum: To tolerate f Byzantine peers. To tolerate 1 you need 4; to tolerate 2 you need 7.
  • BFT is more expensive than Raft: More peers, more messages, longer latencies. Use it when the threat model justifies the cost.
  • Cryptographic signatures are required: Messages must be signed so receivers can verify origin. (See mtls-and-ed25519-for-agent-trust.md.)
  • View changes have liveness implications: Aggressive view-change triggers cause thrashing; conservative ones leave faulty primaries in place.
  • Hybrid setups are common: BFT for inter-organization communication; Raft within each organization. Cheaper inside, robust at the boundary.
  • Performance ceilings: Throughput depends heavily on cluster size, batching, and network conditions; even modern BFT (HotStuff) is commonly cited at around 10k tx/sec. Agent workloads are typically well within that range.
  • Protocol choice depends on adversary model: For crash-stop faults, Raft suffices. For Byzantine faults under asynchrony, use a HoneyBadger-style protocol. For Byzantine faults under partial synchrony, use PBFT or HotStuff.
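The view-change trade-off above is often handled with a timeout that grows after each failed view, so the system reacts quickly to a first suspicion but does not thrash when the network is merely slow. The sketch below shows one such policy, doubling with a cap, in the spirit of the timeout doubling described for PBFT. The function name, the default cap, and the parameter names are assumptions for illustration.

```python
def view_change_timeout(
    base_seconds: float,
    failed_views: int,
    cap_seconds: float = 60.0,  # assumed upper bound, not from any protocol spec
) -> float:
    """How long to wait on the current primary before suspecting it.

    Doubles for every consecutive view that failed to make progress,
    up to `cap_seconds`.
    """
    if base_seconds <= 0:
        raise ValueError("base_seconds must be positive")
    if failed_views < 0:
        raise ValueError("failed_views must be non-negative")
    return min(base_seconds * (2 ** failed_views), cap_seconds)
```

A small `base_seconds` corresponds to the aggressive end of the trade-off and a large one to the conservative end; resetting `failed_views` to zero once a view commits a request keeps the timeout from staying inflated after a transient outage.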

How Harnesses & Frameworks Implement This

Harness / Framework     BFT support
Claude Code             None
Claude Agent SDK        DIY
ruflo                   First-class (selectable per swarm in federation mode)
LangGraph               DIY
AutoGen                 DIY
CrewAI                  DIY
OpenAI Agents SDK       DIY
Codex CLI / Cursor

Connections to Other Concepts

  • raft-for-agents.md — The crash-fault-tolerant counterpart.
  • cross-machine-agent-federation.md — The natural setting.
  • behavioral-trust-scoring.md — Reputation layer above protocol consensus.
  • mtls-and-ed25519-for-agent-trust.md — The signature substrate.

Further Reading

  • Castro & Liskov, "Practical Byzantine Fault Tolerance" (1999).
  • Yin et al., "HotStuff: BFT Consensus with Linearity and Responsiveness" (2018).