Cloud API · Commercial

Inworld STT

Voice-profiling STT: <100ms latency, Realtime API from $0.015/min, native voice cloning, 100+ languages

Latency (best case): 92ms

Latency (typical): 150ms

WER (general audio): not listed

Price per minute: $0.0060/min

Comparative Scores

Accuracy (WER): 9/10

Streaming latency: 10/10

Multilingual: 10/10

Sovereignty: 3/10

Price accessibility: 7/10

Streaming quality: 10/10

Architecture

Architecture: Multi-provider router (Whisper large-v3 + AssemblyAI Universal Streaming + proprietary)

Parameters: N/A (cloud router)

Languages: 100+

Self-hostable: No

Streaming: Yes

WER (clean audio): 3%

GamiWays

Phase 1 MVP: emotional STT + Axis 2 Avatar Behavior

High strategic value for GamiWays Axis 2 (Avatar Behavior) and the Emotional Toolbox. Voice profiling enables real-time emotion detection without a separate model, feeding directly into avatar expression selection and LLM prompt conditioning. The documented <100ms latency is compatible with the Phase 1 target. Evaluate as the primary STT if emotion-aware routing is required in Phase 1.
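As a rough illustration of how per-chunk emotion labels could feed avatar expression selection and LLM prompt conditioning, here is a minimal Python sketch. The `VoiceProfile` shape, the expression names, the confidence threshold, and the prompt wording are all assumptions made for illustration, not Inworld's actual response schema; only the four emotion labels come from this page.

```python
from dataclasses import dataclass


# Hypothetical shape of a voice-profile signal; not Inworld's real schema.
@dataclass
class VoiceProfile:
    emotion: str       # happy, calm, angry or frustrated (labels listed on this page)
    confidence: float  # assumed score between 0.0 and 1.0


# Illustrative mapping from detected emotion to an avatar expression.
EXPRESSIONS = {
    "happy": "smile",
    "calm": "neutral",
    "angry": "concerned",
    "frustrated": "empathetic",
}


def select_expression(profile: VoiceProfile, min_confidence: float = 0.6) -> str:
    """Pick an avatar expression, falling back to neutral on weak or unknown signals."""
    if profile.confidence < min_confidence:
        return "neutral"
    return EXPRESSIONS.get(profile.emotion, "neutral")


def condition_prompt(base_prompt: str, profile: VoiceProfile,
                     min_confidence: float = 0.6) -> str:
    """Append a short emotion hint to the system prompt when the signal is confident."""
    if profile.confidence < min_confidence or profile.emotion not in EXPRESSIONS:
        return base_prompt
    hint = f"The user currently sounds {profile.emotion}; adapt your tone accordingly."
    return f"{base_prompt}\n{hint}"
```

Keeping the mapping in a plain dictionary makes it easy to tune per avatar, and the confidence gate avoids flickering expressions on noisy chunks.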

Analysis

Inworld STT (2025–2026) stands out as the most feature-rich cloud STT API for interactive voice agents. It documents sub-100ms latency and covers 100+ languages through multi-provider routing (Whisper large-v3 + AssemblyAI). Its distinctive feature is real-time voice profiling, which extracts emotion (happy/calm/angry/frustrated), accent, age, pitch, and vocal style on every streaming chunk. The Realtime API (full STT+LLM+TTS pipeline) starts at $0.015/min, 4x cheaper than OpenAI Realtime ($0.06/min). Voice cloning is native: built-in, cloned, and custom voices, with up to 3,000 custom voices on the Growth plan. RAG is supported via function calling (tool calling mid-conversation). Zero Data Retention (ZDR) is supported, on-premise deployment is available on Enterprise, and the API is drop-in compatible with the OpenAI Realtime API.

Strengths

<100ms documented streaming latency (92ms TTFA)
Realtime API $0.015/min — 4x cheaper than OpenAI Realtime
Native voice cloning: built-in + cloned + custom (up to 3,000 voices)
RAG via function calling mid-conversation
Voice profiling: emotion, accent, age, pitch, vocal style
100+ languages (Whisper large-v3 backend)
Semantic + acoustic VAD
ZDR — audio never stored
On-premise available (Enterprise)
Drop-in compatible with OpenAI Realtime API
Condition Router: route by emotion/language/tier
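To make the idea of routing by emotion, language, or tier concrete, the sketch below implements a first-match rule router in Python. It is only an assumption about how such a router could be modelled: the `Context`, `Rule`, and `route` names, the rule order, and the target flow names are hypothetical and do not reflect Inworld's Condition Router configuration format.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Context:
    emotion: str   # e.g. "angry", taken from the voice-profile signal
    language: str  # e.g. "fr"
    tier: str      # e.g. "free" or "enterprise"


@dataclass(frozen=True)
class Rule:
    name: str
    matches: Callable[[Context], bool]
    target: str


# Hypothetical rules, evaluated top to bottom; the first match wins.
RULES = [
    Rule("upset-users", lambda c: c.emotion in {"angry", "frustrated"}, "deescalation-flow"),
    Rule("french-speakers", lambda c: c.language == "fr", "french-flow"),
    Rule("enterprise-tier", lambda c: c.tier == "enterprise", "priority-flow"),
]


def route(ctx: Context, rules=RULES, default: str = "default-flow") -> str:
    """Return the target of the first matching rule, or the default flow."""
    for rule in rules:
        if rule.matches(ctx):
            return rule.target
    return default
```

With first-match semantics the rule order encodes priority: in this sketch an angry French speaker is sent to the de-escalation flow before the language rule is considered.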

Weaknesses

On-premise Enterprise only (not available on lower tiers)
Pricing less transparent than Deepgram (credit-based system)
Multi-provider adds latency variability
Vendor lock-in risk if using full Inworld stack (STT+TTS+LLM)
No fine-tuning on custom data
400%+ price increase reported in 2026 (market consolidation risk)

STT Capabilities

Streaming: Yes

Bidirectional WebSocket streaming. <100ms documented latency (92ms TTFA). Interim results with voice profile signals on every audio chunk. Semantic + acoustic VAD. Configurable endpointing.
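Interim results are typically overwritten by later hypotheses until a final result arrives. The Python sketch below shows one way a client could assemble a running transcript from such a stream. The message fields `text` and `is_final` are assumed for illustration and are not taken from Inworld's documentation; the WebSocket transport itself is left out.

```python
from dataclasses import dataclass, field


@dataclass
class TranscriptAssembler:
    """Merge interim and final STT messages into a displayable transcript."""
    finalized: list = field(default_factory=list)
    interim: str = ""

    def handle(self, message: dict) -> str:
        # `text` and `is_final` are hypothetical field names.
        text = message.get("text", "")
        if message.get("is_final", False):
            # A final result replaces the pending interim hypothesis.
            if text:
                self.finalized.append(text)
            self.interim = ""
        else:
            # Each interim result overwrites the previous one.
            self.interim = text
        return self.display()

    def display(self) -> str:
        parts = self.finalized + ([self.interim] if self.interim else [])
        return " ".join(parts)
```

A caller would pass each decoded server message to `handle` and render the returned string, so the user sees text as it is recognized rather than only after endpointing.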

Diarization: Yes

Custom Vocabulary: Yes

Word Timestamps: Yes

Auto Punctuation: Yes

Multilingual: Yes (100+ languages)

Pricing

Price / minute: $0.0060

Price / hour: $0.360

Free tier

Free tier available. Growth plan: $1,500 credits/month with 40% off.

STT only: $0.006–0.012/min depending on the model. Realtime API (full STT+LLM+TTS pipeline): from $0.015/min (vs OpenAI Realtime at $0.06/min). Growth plan: 40% discount ($1,500 credits/month). On-prem: Enterprise only. Free tier available.
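The per-minute rates above translate into monthly budgets with simple arithmetic. The Python sketch below uses only the rates quoted on this page; the helper names are hypothetical, and treating the Growth discount as a flat 40% off usage is an assumption, since the page does not specify how the credits are applied.

```python
# Rates quoted on this page, in USD per minute.
STT_MIN_RATE = 0.006
STT_MAX_RATE = 0.012
REALTIME_RATE = 0.015
OPENAI_REALTIME_RATE = 0.06


def hourly_rate(rate_per_min: float) -> float:
    """Convert a per-minute rate to a per-hour rate."""
    return round(rate_per_min * 60, 4)


def monthly_cost(minutes: float, rate_per_min: float, discount: float = 0.0) -> float:
    """Estimate monthly spend; `discount` is a fraction, e.g. 0.4 for 40% off (assumed flat)."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return round(minutes * rate_per_min * (1.0 - discount), 2)
```

For example, 10,000 minutes per month costs $150 at the Realtime API rate and $600 at the quoted OpenAI Realtime rate, which is the 4x gap cited on this page.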

Sovereignty & Compliance

On-premise: No (Enterprise only)

Cloud API only on standard plans; on-premise deployment is listed as Enterprise only. Zero Data Retention (ZDR) is available: audio is processed in real time and never stored.

GDPR: Compliant

Data residency: US (default). EU data residency on Enterprise.

Strategic & Business Analysis

Inworld STT — Strategic Positioning

Beyond technical specs: where does this tool sit in the ecosystem, what are the risks and strategic implications for GamiWays?

Inworld STT offers the best sovereignty credentials among commercial STT providers — full on-premise, EU data residency, SOC 2 Type II + GDPR + HIPAA. But its 400%+ pricing increases are a strategic red flag that accelerates open-source migration.

Cloud + On-premise

Lock-in risk: High

Sovereignty fit: High

Open-source threat: Medium

Pricing: Rising ↑

A. Strategic Positioning

Target customer: Developer / Enterprise — gaming, regulated industries, real-time interactive experiences

Sub-200ms real-time STT with full on-premise deployment, EU/India data residency, and SOC 2 Type II + GDPR + HIPAA compliance.

B. Competitive Moat

Sub-200ms real-time STT with high accuracy — surpassing comparable models in latency
Full on-premise deployment + EU/India data residency — sovereignty trifecta
SOC 2 Type II, GDPR, HIPAA — enterprise compliance for regulated industries

Vulnerability: Significant pricing increases (400%+) reported by users could push clients toward open-source alternatives. Open-source models closing the quality gap.

E. Strategic Questions for GamiWays

Sovereignty fit

Full on-premise deployment + EU data residency + SOC 2 Type II + GDPR + HIPAA. Best sovereignty fit among commercial STT providers.

Build vs. Buy

Buy for Phase 1 real-time requirements. Monitor pricing carefully. For Phase 2, evaluate Whisper/Voxtral self-hosted as cost-effective sovereignty alternative.

Lock-in risk

Proprietary models + significant pricing increases create high lock-in risk. On-premise deployment reduces cloud dependency but not vendor dependency.

Roadmap alignment

Good for Phase 1 real-time agents. Phase 2 sovereignty is technically satisfied but pricing risk is high. Monitor pricing trajectory carefully.

Data Freshness

Updated 2 May 2026

https://inworld.ai/resources/best-speech-to-text-apis

Update note: Realtime API $0.015/min confirmed (May 2026), 4x cheaper than OpenAI Realtime. Native voice cloning confirmed: built-in + cloned + custom (up to 3,000 custom voices on Growth). RAG via function calling mid-conversation. On-premise available on Enterprise. ZDR on all plans. EU data residency on Enterprise. 400%+ price increase reported in 2026.

Reference Sources

Inworld Realtime API (docs)
Inworld STT Product Page (docs)
Inworld Pricing, May 2026 (pricing)
Inworld STT Benchmark Article (benchmark)
Artificial Analysis STT (benchmark)
Inworld vs OpenAI Realtime API Comparison (benchmark)