One-Line Summary: Byzantine fault-tolerant (BFT) protocols handle the case where peers may not just fail but actively misbehave (returning wrong data, breaking the protocol, colluding), at the cost of needing 3f+1 peers to tolerate f bad ones; for federated agent systems with peers from untrusted parties, BFT is the right correctness model.
Prerequisites: Raft for agents, consensus in multi-agent systems, cross-machine agent federation
What Is Byzantine Fault Tolerance?
The "Byzantine generals problem" (Lamport, Shostak, Pease, 1982) asks: how do n generals coordinate an attack when some of them may be traitors who lie about their messages? The answer requires 3f+1 generals to tolerate f traitors — and the protocols that achieve this are collectively called Byzantine fault-tolerant (BFT).
For agent systems, "traitors" are peers that may be:
- Compromised (a plugin from an untrusted source).
- Buggy (returning malformed responses without raising).
- Adversarial (a federated peer from a competing organization).
- Subject to prompt injection (yielding to attacker control).
BFT protocols ensure the system reaches correct consensus despite such peers, as long as at most f of the 3f+1 peers go rogue.
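The sizing rule can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the quorum formula is the general one for cluster sizes that are not exactly 3f+1 (it reduces to 2f+1 when n = 3f+1).

```python
def max_faulty(n: int) -> int:
    """Largest f such that n >= 3f + 1."""
    if n < 1:
        raise ValueError("need at least one peer")
    return (n - 1) // 3


def min_peers(f: int) -> int:
    """Smallest cluster that tolerates f Byzantine peers."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1


def quorum(n: int) -> int:
    """Votes needed so that any two quorums share at least one honest peer.

    Equals ceil((n + f + 1) / 2), which is 2f + 1 when n == 3f + 1.
    """
    f = max_faulty(n)
    return (n + f) // 2 + 1


if __name__ == "__main__":
    for n in (4, 5, 7, 10):
        print(f"n={n}: tolerates f={max_faulty(n)}, quorum={quorum(n)}")
```

Note that adding a fifth or sixth peer to a four-peer cluster does not raise f; it only raises the quorum size, which is why BFT deployments are usually sized at exactly 3f+1.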
How It Works
Practical Byzantine Fault Tolerance (PBFT) is the most-cited reference. Simplified flow:
- Client → primary: A client sends a request to a designated primary peer.
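- Pre-prepare: The primary assigns the request a sequence number and broadcasts it to all backups.
- Prepare: Each backup that accepts the pre-prepare broadcasts a prepare message acknowledging the request.
- Commit: Once a peer holds 2f+1 matching messages for the request (the primary's pre-prepare plus prepares from distinct backups, its own included), it broadcasts a commit.
- Reply: Once a peer has received 2f+1 matching commits, it executes and replies to the client. The client accepts the result after f+1 matching replies, since at least one of them must come from an honest peer.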
- View change: If the primary is suspected faulty, peers initiate a view change to elect a new one.
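The vote counting in the prepare and commit steps above can be sketched as a small state machine for one replica. This is a toy model under stated assumptions, not PBFT as specified: the `Replica` class and its method names are hypothetical, message signatures are assumed to be verified before these handlers run, and view changes and checkpoints are left out.

```python
from collections import defaultdict


class Replica:
    """Toy PBFT-style vote counter for a single replica (hypothetical API)."""

    def __init__(self, replica_id: int, f: int, primary_id: int):
        self.replica_id = replica_id
        self.primary_id = primary_id
        self.quorum = 2 * f + 1
        self.pre_prepared = set()              # slots with a pre-prepare
        self.prepare_votes = defaultdict(set)  # slot -> backup ids
        self.commit_votes = defaultdict(set)   # slot -> replica ids
        self.prepared = set()
        self.executed = []

    # A slot is (view, sequence_number, request_digest). A vote for a
    # different digest lands on a different key and is never counted.

    def on_pre_prepare(self, slot):
        self.pre_prepared.add(slot)
        return self._advance(slot)

    def on_prepare(self, slot, sender):
        if sender == self.primary_id:
            return []  # the primary votes via its pre-prepare only
        self.prepare_votes[slot].add(sender)  # set ignores duplicates
        return self._advance(slot)

    def on_commit(self, slot, sender):
        self.commit_votes[slot].add(sender)
        return self._advance(slot)

    def _advance(self, slot):
        actions = []
        have = 1 + len(self.prepare_votes[slot])  # pre-prepare + prepares
        if (slot in self.pre_prepared and slot not in self.prepared
                and have >= self.quorum):
            self.prepared.add(slot)
            self.commit_votes[slot].add(self.replica_id)  # own commit
            actions.append("broadcast_commit")
        if (slot in self.prepared and slot not in self.executed
                and len(self.commit_votes[slot]) >= self.quorum):
            self.executed.append(slot)
            actions.append("execute_and_reply")
        return actions
```

With f = 1 and four replicas, replica 1 becomes prepared once it has the pre-prepare plus prepares from two distinct backups (its own included), and executes after commits from three distinct replicas. Commits that arrive early are stored and counted once the slot is prepared.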
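The cost is communication: PBFT's all-to-all prepare and commit phases take O(n²) messages per request. HotStuff reduces the normal case to O(n) by routing votes through the leader and aggregating them into a single certificate, and it pipelines ("chains") phases across requests. Tendermint keeps all-to-all voting, so its message count stays quadratic, but it simplifies leader rotation.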
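To make the quadratic cost concrete, the sketch below counts normal-case messages for one request. Both functions are rough models rather than exact protocol accounting: `pbft_messages` ignores client traffic and checkpoints, and `leader_relayed_messages` is a hypothetical idealization of a leader-relayed pattern of the kind HotStuff uses, with the phase count as an assumed parameter.

```python
def pbft_messages(n: int) -> int:
    """Approximate replica-to-replica messages for one PBFT request."""
    pre_prepare = n - 1          # primary -> each backup
    prepare = (n - 1) * (n - 1)  # each backup -> every other replica
    commit = n * (n - 1)         # every replica -> every other replica
    return pre_prepare + prepare + commit


def leader_relayed_messages(n: int, phases: int = 3) -> int:
    """Idealized linear pattern: per phase, leader -> all, then all -> leader."""
    return phases * 2 * (n - 1)


if __name__ == "__main__":
    for n in (4, 7, 100):
        print(n, pbft_messages(n), leader_relayed_messages(n))
```

Under this model four replicas exchange 24 messages per request with all-to-all voting and 18 with leader relay; at 100 replicas the gap is 19,800 versus 594.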
Why It Matters for Agents
Federated agent platforms — ruflo's federation mode, multi-organization agent collaborations, agent marketplaces with installable third-party agents — necessarily face Byzantine peers. A plugin from an unknown developer might be malicious, a federated peer might be compromised, a plugin updated yesterday might have introduced a bug.
BFT is the protocol-level answer to "can we trust this swarm's consensus?" Without it, one bad peer can corrupt outputs, manipulate memory writes, or subvert decisions silently.
Key Technical Details
- 3f+1 minimum: To tolerate f Byzantine peers. To tolerate 1 you need 4; to tolerate 2 you need 7.
- BFT is more expensive than Raft: More peers, more messages, longer latencies. Use it only when the threat model justifies the cost.
- Cryptographic signatures are required: Messages must be signed so receivers can verify origin. (See mtls-and-ed25519-for-agent-trust.md.)
- View changes have liveness implications: Aggressive view-change triggers cause thrashing; conservative ones leave faulty primaries in place.
- Hybrid setups are common: BFT for inter-organization communication; Raft within each organization. Cheaper inside, robust at the boundary.
- Performance ceilings: Even modern BFT (HotStuff) tops out on the order of 10k tx/sec, and throughput falls as the peer count and network latency grow. Agent workloads typically sit well within that range.
- Protocol choice depends on adversary model: Crash-stop tolerance is Raft. Byzantine + asynchronous is HoneyBadger-style. Byzantine + partial synchrony is PBFT/HotStuff.
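One common way to balance the view-change trade-off listed above is to start with a short timeout and double it after each consecutive failed view, which is the approach PBFT describes. The sketch below assumes hypothetical names (`ViewChangeTimer`, `on_timeout`, `on_progress`) and adds two choices of its own: a cap on the timeout and a reset to the base value once a request commits.

```python
class ViewChangeTimer:
    """Exponential backoff for view-change timeouts (illustrative sketch)."""

    def __init__(self, base_timeout: float = 1.0, max_timeout: float = 60.0):
        if base_timeout <= 0 or max_timeout < base_timeout:
            raise ValueError("need 0 < base_timeout <= max_timeout")
        self.base_timeout = base_timeout
        self.max_timeout = max_timeout
        self.current = base_timeout
        self.view = 0

    def on_timeout(self) -> float:
        """Primary suspected: move to the next view and wait twice as long."""
        self.view += 1
        self.current = min(self.current * 2, self.max_timeout)
        return self.current

    def on_progress(self) -> None:
        """A request committed in this view: return to the short timeout."""
        self.current = self.base_timeout


if __name__ == "__main__":
    timer = ViewChangeTimer(base_timeout=1.0, max_timeout=8.0)
    print([timer.on_timeout() for _ in range(4)])
```

A short base timeout replaces a faulty primary quickly, while the doubling keeps a slow network from triggering view changes faster than they can complete.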
How Harnesses & Frameworks Implement This
| Harness / Framework | BFT support |
|---|---|
| Claude Code | None |
| Claude Agent SDK | DIY |
| ruflo | First-class — selectable per swarm in federation mode |
| LangGraph | DIY |
| AutoGen | DIY |
| CrewAI | DIY |
| OpenAI Agents SDK | DIY |
| Codex CLI / Cursor | None |
Connections to Other Concepts
- raft-for-agents.md: The crash-fault-tolerant counterpart.
- cross-machine-agent-federation.md: The natural setting.
- behavioral-trust-scoring.md: Reputation layer above protocol consensus.
- mtls-and-ed25519-for-agent-trust.md: The signature substrate.
Further Reading
- Castro & Liskov, "Practical Byzantine Fault Tolerance" (1999).
- Yin et al., "HotStuff: BFT Consensus with Linearity and Responsiveness" (2018).