05 · Case study · Neptune
Multi-Agent Fact-Checker
Claude and GPT-4 agents argue over claims, and a confidence-weighted arbiter settles them with live web evidence, cutting fact-checking from about 7 hours to about 2 hours per script.
Year
2026
Role
Designer & builder (Finary recruitment case)
Context
Case study
Stack
Claude API · GPT-4 · Tavily · Notion webhooks
7h → 2h
per script fact-checked
−71%
verification time
2
competing frontier models
The challenge
Fact-checking a long-form video script took around seven hours of manual source-hunting. A single LLM can't be trusted as the sole checker, because it can state hallucinated facts with full confidence. The interesting question: can two models keep each other honest?
What I built
- A claim-extraction stage splitting a script into individually verifiable statements.
- Two competing agents — Claude and GPT-4 — independently verifying each claim and scoring their own confidence.
- Live evidence retrieval through the Tavily search API, so verdicts cite current sources rather than training data.
- A confidence-weighted arbitration layer resolving disagreements between the two agents instead of naively trusting either.
- A Notion webhook delivering the annotated report — claim, verdict, confidence, sources — into the team's existing workspace.
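To make the arbitration step concrete, here is a minimal sketch of one way a confidence-weighted arbiter could combine two verdicts. The case study does not spell out the actual rule, so everything here is an assumption for illustration: the names `AgentVerdict`, `Ruling`, and `arbitrate`, the signed-mean scoring, and the 0.2 margin.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentVerdict:
    agent: str         # hypothetical label, e.g. "claude" or "gpt-4"
    supported: bool    # True if the agent judges the claim supported by the evidence
    confidence: float  # self-reported confidence in [0, 1]


@dataclass(frozen=True)
class Ruling:
    verdict: str   # "supported", "refuted", or "needs_review"
    score: float   # mean signed confidence in [-1, 1]


def arbitrate(verdicts: list[AgentVerdict], margin: float = 0.2) -> Ruling:
    """Combine agent verdicts using the mean of their signed confidences.

    A "supported" vote counts as +confidence, a "refuted" vote as -confidence.
    Scores inside (-margin, margin) are sent to a human instead of being decided.
    The margin value is an illustrative assumption, not a tuned parameter.
    """
    if not verdicts:
        raise ValueError("at least one verdict is required")
    for v in verdicts:
        if not 0.0 <= v.confidence <= 1.0:
            raise ValueError(f"confidence out of range for {v.agent}: {v.confidence}")

    signed = [v.confidence if v.supported else -v.confidence for v in verdicts]
    score = sum(signed) / len(signed)

    if score >= margin:
        return Ruling("supported", score)
    if score <= -margin:
        return Ruling("refuted", score)
    return Ruling("needs_review", score)
```

Under this rule, two confident agents that agree produce a clear ruling, while two confident agents that disagree nearly cancel out and land in `needs_review`. Two agents that agree but are both unsure also fall inside the margin, so weak consensus is not mistaken for a verified claim.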
Results
- Fact-checking time dropped from ~7 hours to ~2 hours per script — a 71% reduction with sources attached to every verdict.
- In this project, the adversarial two-model setup worked better than single-model verification, because disagreement between the models shows exactly where human attention should go.
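The idea of routing human attention to disagreements can be sketched as a simple review queue. This is an illustrative assumption rather than the project's code: `CheckedClaim`, `review_queue`, and the ordering rule (most confident disagreements first) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckedClaim:
    text: str
    claude_supported: bool
    claude_confidence: float  # self-reported, in [0, 1]
    gpt4_supported: bool
    gpt4_confidence: float    # self-reported, in [0, 1]


def review_queue(claims: list[CheckedClaim]) -> list[CheckedClaim]:
    """Return only the claims the two agents disagree on.

    Claims are ordered so that disagreements where BOTH agents are confident
    come first: the sort key is the lower of the two confidences, descending.
    """
    disputed = [c for c in claims if c.claude_supported != c.gpt4_supported]
    return sorted(
        disputed,
        key=lambda c: min(c.claude_confidence, c.gpt4_confidence),
        reverse=True,
    )
```

A reviewer working through this queue sees the sharpest conflicts first and never has to open claims where both agents reached the same verdict.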