samora.ai
Backed by Y Combinator

Data & Evaluation for Voice AI.

The richest multilingual voice dataset and the most rigorous evaluation framework - built from 2M+ live production calls.

Built on a foundation of scale.

The data, languages, and metrics powering Voice AI evaluation at every layer.

65,000+

Contributors building & evaluating Voice AI.

80+

Languages covered across global voice datasets.

40+

Automated eval metrics across every conversation.

3-Layer

Eval stack: automated AI scoring, industry-specific rubrics, and human expert review.

Google · Microsoft · Morgan Stanley · UNICEF

From spec to delivery.

We align on target languages, dialects, demographics, recording conditions, and use-case domain. Together we draft an evaluation rubric - what counts as a pass, what counts as a failure event, and how each turn should be scored.

  • Language & dialect targeting
  • Demographic & domain spec
  • Custom eval rubric

Spin up language-specific data collection in days.

From contributor recruitment to verbatim transcription with dialect codes, every dataset is shaped to your model and your market.

  • 65,000+ contributors across 80+ countries
  • Verbatim transcripts with dialect codes
  • Full consent & provenance chain
  • Custom demographic & annotation specs

Live coverage: 80+ countries · 65,000+ contributors

  • Studio / Controlled: High-fidelity recordings in acoustically treated environments.
  • Call Center / Telephony: Real production telephony audio with codec realism.
  • Mobile & Outdoor: In-the-wild captures across devices and ambient conditions.
  • Synthetic / Edge Cases: Targeted adversarial and long-tail scenarios.

The most rigorous evaluation framework for voice AI.

A three-layer stack - automated, industry-tuned, and human-reviewed - applied to every conversation your agent has.

LAYER 01
WER · Diarization · Intent Match · Entity F1 · MOS · SNR · Prosody · Latency · Turn-taking

Turn-by-Turn AI Evaluation

Automated metrics applied to every turn across transcript, speech, and behavior.

Transcript · Speech · Behavioral
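
For illustration, one of the Layer 01 transcript metrics, word error rate (WER), is conventionally defined as the word-level edit distance between a reference transcript and the recognized text, divided by the number of reference words. The sketch below is a generic textbook computation for a single turn, not Samora's implementation; the function name `word_error_rate` and the lowercase/whitespace normalization are assumptions made for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumed normalization: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference, which is one reason to read it per turn alongside the other metrics rather than in isolation.
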
LAYER 02
FDCPA Checks · Right Party Contact · Promise to Pay · Mini-Miranda · Screening Fidelity · Bias Surface · Drop-off · Scheduling Conv. · Resolution

Industry-Specific Evaluation

Domain rubrics built with operators in each vertical.

Debt Collection · Recruitment · Customer Support
LAYER 03
Root Cause · Severity · Recurrence · Recovery · Prompt Diffs · Tool Use · Knowledge Gaps · Routing · Prompt Injection

Human Expert Evaluation

Trained reviewers catch what models miss.

Failure Mode ID · Agent Improvement · Adversarial & Edge Cases

Failure Events, not pass/fail scores.

Call-level pass/fail loses the signal. Samora logs each failure as a structured event with turn, type, severity, and recovery - so you can fix the exact failure mode, not relitigate the entire call.

samora-logs
event_id: fe_28a91c3d
turn: 4 of 12
type: compliance_gap
severity: high
recovery: succeeded · +1 turn
action: re-collect: hi-IN, debt
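
The record above could be modeled roughly as follows. This is a minimal sketch, assuming field names that mirror the log; the `FailureEvent` class, its field types, and the `count_by_type` helper are hypothetical and are not a Samora API.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class FailureEvent:
    """One structured failure within a call (hypothetical schema)."""
    event_id: str
    turn: int            # turn at which the failure occurred
    total_turns: int     # number of turns in the call
    type: str            # failure mode, e.g. "compliance_gap"
    severity: str        # e.g. "high"
    recovered: bool      # whether the agent recovered
    recovery_turns: int  # turns taken to recover
    action: str          # recommended follow-up


def count_by_type(events: Iterable[FailureEvent]) -> Counter:
    """Tally events by failure mode instead of by call."""
    return Counter(event.type for event in events)


# Values copied from the sample log entry shown above.
example = FailureEvent(
    event_id="fe_28a91c3d",
    turn=4,
    total_turns=12,
    type="compliance_gap",
    severity="high",
    recovered=True,
    recovery_turns=1,
    action="re-collect: hi-IN, debt",
)
```

Tallying events by type rather than by call is what makes it possible to see which failure mode recurs, which a single call-level pass/fail score would hide.
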
2M+
Production calls
40+
Eval metrics
80+
Languages

Built ground-up for code-switching, dialects, and low-resource languages - not English with translations bolted on.

Frequently asked questions.

80+ languages spanning major and low-resource locales, with dialect-level metadata (e.g. hi-IN, es-MX, en-NG). New locales spin up in days through our contributor network.

Most custom pipelines deliver first batches within 7–14 days from spec sign-off, scaling weekly thereafter.

Layer 01 is turn-by-turn AI scoring across transcript, speech, and behavior. Layer 02 applies industry-specific rubrics. Layer 03 is human expert review for failure modes and adversarial cases.

Structured records of every individual failure - turn, type, severity, recovery, and recommended action - instead of a single call-level pass/fail.

No. Every engagement is on-demand and scoped to your model, locale, and domain. This is the only way to guarantee distributional fit.

Explicit informed consent is captured per contributor, with full provenance chains, PII redaction, and regional compliance (GDPR, sector-specific) by default.

Ready to scope your data & evaluation needs?

Tell us your model, your locales, and your edge cases. We'll come back with a pipeline and an evaluation plan in 48 hours.

Founded by alumni of Stanford & Microsoft