01 · Case study · Neptune

Multi-Agent Fact-Checker

Two AI agents argue every claim and a confidence-weighted arbiter settles it with live web evidence — cutting video fact-checking from 7h to 1h.

Year

2026

Role

Designer & builder

Context

Case study

Stack

Claude API · GPT-4 · Tavily · Notion webhooks

7h → 1h

per video fact-checked

−86%

verification time

~10h

saved per week

The challenge

Fact-checking a long-form video script meant about seven hours of manual source-hunting per video — roughly ten hours a week. A single LLM can't be trusted to do it: it hallucinates with confidence and cites nothing. The real question was whether two models could keep each other honest.

What I built

A claim-extraction stage that splits a script into individually verifiable statements.
Two competing agents — Claude and GPT-4 — that verify each claim independently and score their own confidence.
Live evidence retrieval through the Tavily search API, so every verdict cites current sources rather than training data.
A confidence-weighted arbiter that resolves disagreements between the two agents instead of trusting either blindly.
A Notion trigger: paste a script, get back an annotated report — claim, verdict, confidence, sources — in the team's existing workspace.
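
To make the arbiter step concrete, here is a minimal Python sketch of one way confidence-weighted arbitration could work. It is an illustration under stated assumptions, not the production code: the names `AgentVerdict`, `Ruling` and `arbitrate`, the verdict labels, and the scoring rule (sum each agent's self-reported confidence per verdict, pick the heaviest, flag any disagreement for a human) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AgentVerdict:
    agent: str          # e.g. "claude" or "gpt-4"
    verdict: str        # hypothetical labels: "supported", "refuted", "unverifiable"
    confidence: float   # self-reported by the agent, 0.0 to 1.0
    sources: list[str] = field(default_factory=list)


@dataclass
class Ruling:
    claim: str
    verdict: str
    confidence: float
    sources: list[str]
    needs_review: bool  # True whenever the agents disagreed


def arbitrate(claim: str, verdicts: list[AgentVerdict]) -> Ruling:
    """Pick the verdict backed by the most total confidence (assumed rule)."""
    if not verdicts:
        raise ValueError("need at least one agent verdict")

    # Sum self-reported confidence per verdict label.
    totals: dict[str, float] = {}
    for v in verdicts:
        totals[v.verdict] = totals.get(v.verdict, 0.0) + v.confidence

    # max() returns the first label with the highest total,
    # so an exact tie goes to the label that appeared first.
    winner = max(totals, key=totals.get)

    # Keep only sources cited by agents that backed the winning verdict,
    # de-duplicated in first-seen order.
    sources: list[str] = []
    for v in verdicts:
        if v.verdict == winner:
            for url in v.sources:
                if url not in sources:
                    sources.append(url)

    return Ruling(
        claim=claim,
        verdict=winner,
        # Confidence behind the winner, averaged over all agents.
        confidence=totals[winner] / len(verdicts),
        sources=sources,
        needs_review=len(totals) > 1,
    )


if __name__ == "__main__":
    ruling = arbitrate(
        "Example claim from a script.",
        [
            AgentVerdict("claude", "supported", 0.9, ["https://example.com/a"]),
            AgentVerdict("gpt-4", "refuted", 0.6, ["https://example.com/b"]),
        ],
    )
    print(ruling.verdict, round(ruling.confidence, 2), ruling.needs_review)
```

Averaging the winning confidence over all agents means a split decision scores lower than an agreement at the same confidence levels, and the `needs_review` flag marks every disagreement so a human can look at it first.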

Results

Fact-checking dropped from ~7 hours to ~1 hour per video — an 86% cut, roughly ten hours back every week, with sources attached to every verdict.
The adversarial setup beat single-model checking: disagreement between the two models is exactly where human attention belongs.


theo.david@audencia.com