05 — Case study · Case study · Neptune

Multi-Agent Fact-Checker

Claude and GPT-4 agents argue over claims; a confidence-weighted arbiter settles them with live web evidence — cutting fact-checking from 7h to 2h.

Year

2026

Role

Designer & builder (Finary recruitment case)

Context

Case study

Stack

Claude API · GPT-4 · Tavily · Notion webhooks

7h → 2h

per script fact-checked

−71%

verification time

2

competing frontier models

The challenge

Fact-checking a long-form video script took around seven hours of manual source-hunting per script. A single LLM can't be trusted as the checker — it hallucinates with confidence. The interesting question: can two models keep each other honest?

What I built

Results

theo.david@audencia.com