Ajay Yadav
Voice AI · Observability

Outcome Health

Observability for voice AI agents — built from 1,048 call transcripts and one sobering number: 0.1% goal completion rate

Built with: Python, pandas, Data Analysis, HTML prototype, Product Strategy
1,048 Reggie calls analyzed
0.1% goal completion rate found
21.5% of calls had hallucinations
4 hrs analysis + working prototype
01
The brief

Voice AI agents sound coherent and still fail their primary goal — silently

Operators of voice AI agents have rich conversation logs but no outcome-level observability. An agent can run thousands of calls, handle objections smoothly, and never achieve the goal it was deployed for — with nothing in the dashboard to surface that failure.

The core problem: operators don't know where to start, what's important, or what to do next. Every product decision in Outcome Health was anchored to those three questions.

I analyzed 1,048 voice AI call transcripts and 18,502 turns using Python and pandas, surfaced five failure categories, and built a working HTML prototype demonstrating the full Outcome Health dashboard end-to-end.

02
What I found

Five failure categories. Each one a different kind of product gap.

Finding 01
0.1% goal completion

The demo booking wasn't happening

One confirmed demo booking across 1,048 calls. 97 callers showed explicit intent to book (9.3%) — but Reggie was responding with a URL redirect instead of invoking the Cal.com booking action. The most expensive problem was invisible to the customer.
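The two rates in this finding follow directly from the reported counts. A minimal check in plain Python, using only the numbers above (the variable and function names are illustrative, not taken from the original analysis):

```python
# Counts reported in Finding 01.
total_calls = 1048
calls_with_booking_intent = 97
confirmed_bookings = 1

def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

intent_rate = pct(calls_with_booking_intent, total_calls)
completion_rate = pct(confirmed_bookings, total_calls)
print(f"intent: {intent_rate}%  completion: {completion_rate}%")
```

The gap between the two figures is the point of the finding: intent was present in roughly one call in eleven, while completion happened once.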

Finding 02
2.46 / 7 BANT dimensions

Reggie barely qualified leads

Budget, authority, and timeline were never captured, not in a single call. Reggie knew the caller's name and maybe their company, and stopped there. The data needed for pipeline scoring simply didn't exist.

Finding 03
21.5% hallucination rate

1 in 5 calls had an unsourced claim

Healthcare callers heard them 43.4% of the time. 49 calls showed a doubling-down pattern — Reggie repeated or amplified a claim when the caller pushed back. The specific usage claims attached to named customers (Google, Guitar Center) couldn't be verified from any KB source.

Finding 04
48 calls, 1 confirmed lost

A template variable spoken aloud lost a real prospect

Andrew Wolcock, Honda of Princeton, gave his callback number. Reggie confirmed with the literal string "[PHONE]" spoken aloud. Satish was never notified. This is the most emotionally resonant finding: a named customer, a named loss, a fixable root cause.
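A failure like this is cheap to scan for once you know to look. The sketch below is one possible check, not the detection logic used in the analysis: it assumes unresolved template variables appear as bracketed upper-case tokens like "[PHONE]", and that each turn is a dict with hypothetical "speaker" and "text" keys.

```python
import re

# Assumption: unresolved template variables look like [UPPER_CASE] tokens,
# as in the "[PHONE]" example above. Real templates may use other syntax.
PLACEHOLDER = re.compile(r"\[[A-Z][A-Z0-9_]*\]")

def find_unresolved_placeholders(turns):
    """Return (turn_index, placeholder) pairs found in agent turns.

    `turns` is a list of dicts with hypothetical keys "speaker" and "text".
    """
    hits = []
    for i, turn in enumerate(turns):
        if turn["speaker"] != "agent":
            continue  # only the agent can leak a template variable
        for match in PLACEHOLDER.findall(turn["text"]):
            hits.append((i, match))
    return hits

# Made-up transcript for illustration, not taken from the dataset.
sample = [
    {"speaker": "caller", "text": "You can reach me at my office line."},
    {"speaker": "agent", "text": "Great, I have your number as [PHONE]."},
]
print(find_unresolved_placeholders(sample))
```

Returning the turn index alongside the token keeps the hit traceable to the exact line of transcript evidence.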

Finding 05
34 voicemail + 91 repetition loops

125 calls wasted on voicemail and loops

The longest stuck call: 55 minutes, 101 turns, 7 distinct loops. The fix already existed in Agent Builder — voicemail detection is a configurable action. The customer just didn't know they needed it because no dashboard surfaced the pattern.
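Repetition loops can be approximated with a simple run-length check over the agent's turns. This is a sketch under stated assumptions: a "loop" is taken to mean the agent saying effectively the same thing several times in a row, and the threshold of three is a placeholder, not the definition used in the analysis.

```python
import re

def normalize(text):
    """Lowercase and strip punctuation so near-identical turns compare equal."""
    return re.sub(r"[^a-z0-9 ]+", "", text.lower()).strip()

def count_repetition_loops(agent_turns, min_repeats=3):
    """Count runs where the agent repeats itself min_repeats or more times in a row."""
    loops = 0
    run_length = 1
    for prev, curr in zip(agent_turns, agent_turns[1:]):
        if normalize(prev) == normalize(curr):
            run_length += 1
            if run_length == min_repeats:
                loops += 1  # count each run once, when it first qualifies
        else:
            run_length = 1
    return loops

# Made-up agent turns for illustration.
toy_turns = [
    "Are you still there?",
    "are you still there",
    "Are you still there?!",
    "Okay.",
    "Hello?",
    "Hello?",
]
print(count_repetition_loops(toy_turns))
```

Exact-match comparison after normalization will miss paraphrased loops; a production check would likely need fuzzier matching.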

“The biggest insight wasn't any single finding — it was that a customer looking at today's dashboards would have no idea any of this was happening. Outcome visibility was the product gap, not conversation quality.”

03
What I built

Outcome Health

A five-panel CI dashboard that answers the three questions Regal customers couldn't answer today: where to start, what's important, and what to do next.

Panel 01

Goal Funnel

Five-step funnel from call start → goal intent → booking attempt → completion. Shows exactly where the drop-off is — not just that it exists.
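One way to compute a funnel like this, sketched with the stages named above: record the furthest stage each call reached, count how many calls reached each stage, then find the transition that loses the most calls. The stage labels and the toy data are illustrative, not identifiers or results from the prototype.

```python
# Ordered funnel stages; the names are illustrative labels for the stages
# listed above, not identifiers from the prototype.
STAGES = ["call_start", "goal_intent", "booking_attempt", "completion"]

def funnel_counts(calls):
    """Count calls reaching each stage. `calls` maps call id -> furthest stage reached."""
    counts = {stage: 0 for stage in STAGES}
    for furthest in calls.values():
        reached = STAGES.index(furthest)
        for stage in STAGES[: reached + 1]:
            counts[stage] += 1
    return counts

def biggest_drop(counts):
    """Return the (from_stage, to_stage) pair that loses the most calls."""
    pairs = zip(STAGES, STAGES[1:])
    return max(pairs, key=lambda p: counts[p[0]] - counts[p[1]])

# Toy data for illustration only.
toy_calls = {
    "c1": "call_start", "c2": "goal_intent", "c3": "goal_intent",
    "c4": "call_start", "c5": "completion", "c6": "call_start",
}
counts = funnel_counts(toy_calls)
print(counts, biggest_drop(counts))
```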

Panel 02

Qualification Scorecard

BANT dimensions captured per call, aggregated over time. Customer-configurable — each agent owner defines which dimensions matter for their use case.
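A rough sketch of how a configurable scorecard could be aggregated. It assumes each call is a dict mapping a dimension name to its captured value (or leaving it absent), and that the agent owner supplies the list of dimensions; the dimension names and toy calls below are hypothetical.

```python
from statistics import mean

def scorecard(calls, dimensions):
    """Summarise how often each configured dimension was captured.

    `calls` is a list of dicts mapping dimension name -> captured value or None.
    `dimensions` is the owner-configured list of dimensions that matter.
    """
    per_call = [sum(call.get(d) is not None for d in dimensions) for call in calls]
    capture_rate = {
        d: sum(call.get(d) is not None for call in calls) / len(calls)
        for d in dimensions
    }
    return {"avg_captured": mean(per_call), "capture_rate": capture_rate}

# Hypothetical configuration and toy calls, not taken from the dataset.
dims = ["name", "company", "budget", "authority", "timeline"]
toy = [
    {"name": "A", "company": "X"},
    {"name": "B"},
    {"name": "C", "company": "Y", "budget": None},
    {"name": "D", "company": "Z"},
]
result = scorecard(toy, dims)
print(result["avg_captured"], result["capture_rate"])
```

Passing the dimension list as an argument is what makes the panel configurable: changing which dimensions matter changes the score without touching the aggregation.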

Panel 03

Risk Monitor

Live Tracker alerts for hallucinations and template variable errors. One-click drill-down to the transcript evidence. One-click fix path back to Agent Builder.

Panel 04

Failure Mode Analysis

Clusters the most common failure patterns across all calls. Surfaces patterns the customer didn't know to look for — not just the ones they already track.

Panel 05

Suggested Next 3 Fixes

Ranked by volume × estimated lift. Fix 1: wire Cal.com action (46 calls affected). Fix 2: remove template variable risk (1 confirmed lost prospect). Fix 3: tighten healthcare KB (106 callers affected).
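The ranking rule itself is a one-line sort. In the sketch below the volumes come from the findings (46, 48, and 106), but the lift values are invented placeholders chosen only to make the example run; the write-up does not report the estimates actually used.

```python
def rank_fixes(fixes):
    """Sort fixes by volume x estimated lift, highest score first."""
    return sorted(fixes, key=lambda f: f["volume"] * f["lift"], reverse=True)

# Volumes come from the findings; the lift values are invented placeholders.
fixes = [
    {"name": "Tighten healthcare KB", "volume": 106, "lift": 0.1},
    {"name": "Wire Cal.com action", "volume": 46, "lift": 0.5},
    {"name": "Remove template variable risk", "volume": 48, "lift": 0.4},
]
ranked = rank_fixes(fixes)
for fix in ranked:
    print(fix["name"], round(fix["volume"] * fix["lift"], 1))
```

Because the score is a product, a high-volume issue with low expected lift can rank below a smaller one with a clear fix, which is why volume alone does not set the order.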

Phase 1 roadmap: 4 weeks

1 frontend + 1 backend engineer. No new ML required. Everything in Phase 1 is built on existing Tracker and Custom AI Analysis infrastructure — composing capabilities that already exist in a new outcome-oriented view.

04
Key decisions

The calls that shaped what got built — and what didn't

Phase 1 uses zero new ML

The Goal Funnel is an existing Tracker with a new configuration. The Risk Monitor panel surfaces existing Trackers in a new view. The Qualification Scorecard pulls from Custom AI Analysis. Phase 1 is infrastructure composition, not new infrastructure. That's how it ships in 4 weeks.

Named it Outcome Health, not observability

Regal already has an Observability section. But Outcome Health is analytical, operational, and prescriptive — not just monitoring. The word 'outcome' does specific work: it fills the gap that Regal Improve and Coverage Gap don't touch.

Built a working prototype, not a Figma wireframe

The brief listed 'build your own app' as an option. A labeled wireframe shows structure. A working prototype demonstrates that the product actually answers the three customer questions: where to start, what's important, what to do next. Clicking the Honda of Princeton row and reading the transcript is a different experience from looking at a static mockup.

Five findings, not three

Each finding maps to a distinct customer decision. Each decision maps directly to one panel in Outcome Health. Dropping either the data quality finding (one real customer lost) or the compute waste finding (most operationally clear) would have weakened both the analysis and the product framing.

See it in the browser

The working Outcome Health prototype — styled to Regal's actual CI interface.