Observability for voice AI agents — built from 1,048 call transcripts and one sobering number: 0.1% goal completion rate
Operators of voice AI agents have rich conversation logs but no outcome-level observability. An agent can run thousands of calls, handle objections smoothly, and never achieve the goal it was deployed for — with nothing in the dashboard to surface that failure.
The core problem: operators don't know where to start, what's important, or what to do next. Every product decision in Outcome Health was anchored to those three questions.
I analyzed 1,048 voice AI call transcripts and 18,502 turns using Python and pandas, surfaced five failure categories, and built a working HTML prototype demonstrating the full Outcome Health dashboard end-to-end.
One confirmed demo booking across 1,048 calls. 97 callers showed explicit intent to book (9.3%) — but Reggie was responding with a URL redirect instead of invoking the Cal.com booking action. The most expensive problem was invisible to the customer.
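The two rates quoted here can be reproduced with a few lines of Python. This is a minimal sketch: the counts are the ones reported above, while the helper `rate_pct` and the constant names are illustrative and not taken from the original analysis.

```python
def rate_pct(numerator: int, denominator: int, digits: int = 1) -> float:
    """Return numerator/denominator as a percentage rounded to `digits` places."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return round(100 * numerator / denominator, digits)


# Counts reported in the analysis.
TOTAL_CALLS = 1048
INTENT_TO_BOOK = 97
CONFIRMED_BOOKINGS = 1

# Share of calls where the caller explicitly asked to book.
intent_rate = rate_pct(INTENT_TO_BOOK, TOTAL_CALLS)

# Share of calls that ended in a confirmed booking (goal completion rate).
completion_rate = rate_pct(CONFIRMED_BOOKINGS, TOTAL_CALLS)

print(f"intent: {intent_rate}%  completion: {completion_rate}%")
```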
Budget, authority, and timeline went uncaptured in every one of the 1,048 calls. Reggie knew the caller's name and maybe their company, and stopped there. The data needed for pipeline scoring simply didn't exist.
Healthcare callers heard unverifiable claims 43.4% of the time. 49 calls showed a doubling-down pattern: Reggie repeated or amplified a claim when the caller pushed back. The specific usage claims attached to named customers (Google, Guitar Center) couldn't be verified from any KB source.
Andrew Wolcock, Honda of Princeton, gave his callback number. Reggie confirmed with the literal string "[PHONE]" spoken aloud. Satish was never notified. This is the most emotionally resonant finding: a named customer, a named loss, a fixable root cause.
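A check for this class of error can be sketched as a regular-expression scan over agent utterances. The pattern below is an assumption about what an unrendered template variable looks like (an all-caps token in square brackets such as "[PHONE]", or a name in double curly braces). The function name and the sample utterance are hypothetical.

```python
import re

# Assumed placeholder shapes: [PHONE], [FIRST_NAME], {{ first_name }}.
PLACEHOLDER_RE = re.compile(
    r"\[[A-Z][A-Z0-9_]*\]"                        # bracketed all-caps token
    r"|\{\{\s*[A-Za-z_][A-Za-z0-9_]*\s*\}\}"      # double-curly template variable
)


def find_unrendered_placeholders(agent_text: str) -> list[str]:
    """Return every placeholder-like token found in one agent utterance."""
    return PLACEHOLDER_RE.findall(agent_text)


# Illustrative utterance, not a quote from the transcripts.
hits = find_unrendered_placeholders("Great, I have your number as [PHONE].")
print(hits)
```

Any non-empty result would flag the call for review, with the matched token serving as the transcript evidence.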
The longest stuck call: 55 minutes, 101 turns, 7 distinct loops. The fix already existed in Agent Builder — voicemail detection is a configurable action. The customer just didn't know they needed it because no dashboard surfaced the pattern.
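One way to surface this pattern is to count how often the agent repeats the same utterance within a call. The sketch below assumes a transcript is a list of `(speaker, text)` pairs and treats any agent utterance repeated at least `min_repeats` times as a loop. Both the data shape and the threshold are assumptions, and this is not necessarily how the loops above were counted.

```python
import re
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace so near-identical turns match."""
    cleaned = re.sub(r"[^a-z0-9 ]+", "", text.lower())
    return re.sub(r"\s+", " ", cleaned).strip()


def repeated_agent_utterances(turns, min_repeats=3):
    """Map each normalized agent utterance to its count, keeping only repeated ones."""
    counts = Counter(
        normalize(text) for speaker, text in turns if speaker == "agent"
    )
    return {utt: n for utt, n in counts.items() if utt and n >= min_repeats}


# Illustrative transcript of an agent talking to a voicemail greeting.
sample_turns = [
    ("agent", "Hi, is this a good time?"),
    ("caller", "Please leave a message."),
    ("agent", "Hi, is this a good time?"),
    ("caller", "Please leave a message."),
    ("agent", "Hi is this a good time"),
]
loops = repeated_agent_utterances(sample_turns)
print(loops)
```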
“The biggest insight wasn't any single finding — it was that a customer looking at today's dashboards would have no idea any of this was happening. Outcome visibility was the product gap, not conversation quality.”
A five-panel CI dashboard that answers the three questions Regal customers couldn't answer today: where to start, what's important, and what to do next.
Five-step funnel from call start → goal intent → booking attempt → completion. Shows exactly where the drop-off is — not just that it exists.
BANT dimensions captured per call, aggregated over time. Customer-configurable — each agent owner defines which dimensions matter for their use case.
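The aggregation behind such a scorecard can be sketched as a capture rate per configured dimension. The record shape (one dict per call, with a dimension counted as captured when its value is non-empty) and the function name are assumptions for illustration. The sample records are made up.

```python
def capture_rates(calls, dimensions):
    """Return the fraction of calls in which each configured dimension was captured."""
    total = len(calls)
    if total == 0:
        return {dim: 0.0 for dim in dimensions}
    return {
        dim: sum(1 for call in calls if call.get(dim)) / total
        for dim in dimensions
    }


# The agent owner chooses which dimensions matter for their use case.
configured = ["budget", "authority", "timeline"]

# Illustrative records: name and company captured, nothing else.
sample_calls = [
    {"name": "Caller A", "company": "Example Co"},
    {"name": "Caller B", "budget": None},
]
rates = capture_rates(sample_calls, configured)
print(rates)
```

Passing a different `dimensions` list is all it takes to re-score the same calls for a different use case, which is the customer-configurable behavior described above.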
Live Tracker alerts for hallucinations and template variable errors. One-click drill-down to the transcript evidence. One-click fix path back to Agent Builder.
Clusters the most common failure patterns across all calls. Surfaces patterns the customer didn't know to look for — not just the ones they already track.
Ranked by volume × estimated lift. Fix 1: wire Cal.com action (46 calls affected). Fix 2: remove template variable risk (1 confirmed lost prospect). Fix 3: tighten healthcare KB (106 callers affected).
1 frontend + 1 backend engineer. No new ML required. Everything in Phase 1 is built on existing Tracker and Custom AI Analysis infrastructure — composing capabilities that already exist in a new outcome-oriented view.
The Goal Funnel is an existing Tracker with a new configuration. The Risk Monitor panels are existing Trackers surfaced in a new view. The Qualification Scorecard pulls from Custom AI Analysis. Phase 1 is infrastructure composition — not new infrastructure. That's how it ships in 4 weeks.
Regal already has an Observability section. But Outcome Health is analytical, operational, and prescriptive — not just monitoring. The word 'outcome' does specific work: it fills the gap that Regal Improve and Coverage Gap don't touch.
The brief listed 'build your own app' as an option. A labeled wireframe shows structure. A working prototype demonstrates that the product actually answers the three customer questions: where to start, what's important, what to do next. Clicking the Honda of Princeton row and reading the transcript is a different experience.
Each finding maps to a distinct customer decision. Each decision maps directly to one panel in Outcome Health. Dropping either the data quality finding (one real customer lost) or the compute waste finding (most operationally clear) would have weakened both the analysis and the product framing.
The working Outcome Health prototype — styled to Regal's actual CI interface.