Cloud API · Commercial

Deepgram Aura 2

Ultra-low-latency TTS optimized for voice agents, with <100ms time to first audio (TTFA)

TTFA (best case): 80ms

TTFA (typical): 150ms

Price per million chars: $15

ELO Score: N/A

Comparative Scores

Voice quality: 6/10

Latency: 9/10

Voice cloning: 1/10

Expressiveness: 4/10

Sovereignty: 3/10

Price accessibility: 7/10

Multilingual: 1/10

Architecture

Architecture: Proprietary streaming neural TTS

Parameters: N/A (cloud)

Languages: 1

Self-hostable: No

Streaming: Yes
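
Because Aura 2 is a cloud-only streaming service, a client reaches it over HTTP. The sketch below only builds a request and sends nothing. The endpoint path `/v1/speak`, the `model` query parameter, the `Token` authorization scheme, and the voice name `aura-2-thalia-en` are assumptions that this profile does not state; check them against the Deepgram docs before use. `build_speak_request` is a hypothetical helper name.

```python
import json
import urllib.parse
import urllib.request

# Assumed values: verify against the Deepgram docs before relying on them.
ASSUMED_ENDPOINT = "https://api.deepgram.com/v1/speak"
ASSUMED_MODEL = "aura-2-thalia-en"  # hypothetical choice of pre-built voice


def build_speak_request(
    text: str, api_key: str, model: str = ASSUMED_MODEL
) -> urllib.request.Request:
    """Build (but do not send) a text-to-speech HTTP request."""
    if not text:
        raise ValueError("text must be non-empty")
    # The voice is selected through a query parameter (assumption).
    url = f"{ASSUMED_ENDPOINT}?{urllib.parse.urlencode({'model': model})}"
    body = json.dumps({"text": text}).encode("utf-8")
    headers = {
        "Authorization": f"Token {api_key}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    req = build_speak_request("Hello from the voice agent.", api_key="YOUR_API_KEY")
    print(req.get_method(), req.full_url)
```

Keeping request construction separate from sending makes the assumed parameters easy to inspect and correct without touching the network.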

DigiDouble

Phase 1 MVP: Integrated ASR+TTS stack

Relevant for the Phase 1 MVP if Deepgram Nova-3 is used for ASR. Sourcing ASR and TTS from a single provider simplifies integration and reduces latency. English-only support and the lack of voice cloning are significant constraints.

Analysis

Deepgram Aura 2 is optimized for voice agent pipelines, achieving <100ms TTFA in the best case (about 150ms typical). It is best used together with Deepgram Nova-3 ASR (75–200ms) for a complete, low-latency ASR+TTS stack. It offers no voice cloning and limited expressiveness; the focus is speed and reliability for English voice agents.
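
To make the combined latency concrete, the sketch below adds the ASR range quoted above (75–200ms) to the TTFA figures from this profile (80ms best case, 150ms typical). It assumes the two stages run strictly one after the other and ignores LLM inference, network round trips, and audio playback, none of which this profile quantifies. `LatencyRange` is a hypothetical helper type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyRange:
    """A latency interval in milliseconds."""
    low_ms: int
    high_ms: int

    def __add__(self, other: "LatencyRange") -> "LatencyRange":
        # Sequential stages: the lower and upper bounds simply add up.
        return LatencyRange(self.low_ms + other.low_ms, self.high_ms + other.high_ms)


# Figures taken from this profile.
ASR_NOVA3 = LatencyRange(75, 200)   # Deepgram Nova-3 ASR
TTS_AURA2 = LatencyRange(80, 150)   # Aura 2 TTFA: best case, typical

if __name__ == "__main__":
    budget = ASR_NOVA3 + TTS_AURA2
    print(f"ASR + TTS: {budget.low_ms}-{budget.high_ms} ms")  # ASR + TTS: 155-350 ms
```

Under these assumptions the speech stages alone account for roughly 155 to 350ms before any language model time is added.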

Strengths

<100ms TTFA
Optimized for voice agent pipelines
Natural pairing with Deepgram ASR
Reliable at scale

Weaknesses

No voice cloning
English only (Aura 2)
Limited expressiveness
No lip-sync data

Voice Capabilities

Voice Cloning: No

No voice cloning. Pre-built voices only.

Emotion Control: No

Limited emotion control. Focused on natural, neutral speech.

Streaming: Yes

<100ms TTFA. Optimized for real-time voice agents. Often paired with Deepgram Nova-3 ASR (75–200ms) for a full stack.
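
A TTFA claim like this can be checked against any streaming client by timing the first non-empty audio chunk. The sketch below is provider-agnostic: `time_to_first_chunk` is a hypothetical helper that accepts any iterable of byte chunks, and it assumes the iterable is lazy, so that the request is only issued when iteration begins. The demo uses a fake stream with an artificial delay rather than a real Deepgram call.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def time_to_first_chunk(
    chunks: Iterable[bytes],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, bytes]:
    """Return (elapsed milliseconds, first non-empty chunk) for a stream."""
    start = clock()
    for chunk in chunks:
        if chunk:  # skip empty keep-alive chunks
            return (clock() - start) * 1000.0, chunk
    raise ValueError("stream ended before any audio arrived")


def fake_stream(delay_s: float) -> Iterator[bytes]:
    """Stand-in for a real audio stream: waits, then yields two chunks."""
    time.sleep(delay_s)
    yield b"\x00" * 320
    yield b"\x00" * 320


if __name__ == "__main__":
    ttfa_ms, first = time_to_first_chunk(fake_stream(0.05))
    print(f"TTFA: {ttfa_ms:.1f} ms, first chunk: {len(first)} bytes")
```

Measured this way, the figure includes network and client overhead, so it is expected to sit above a vendor's best-case number.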

Lip-sync Data: No

No native lip-sync data.

Pricing

Price / 1M chars: $15

Price / minute: $0.0150

Free tier: $200 free credits on signup

$15/1M chars. Often bundled with Deepgram ASR for a full voice agent stack.
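
The per-character and per-minute prices above can be related with a little arithmetic. The sketch below uses the $15 per million characters rate from this profile; the figure of 1,000 characters per minute of speech is an assumption inferred from the two listed prices, not a number the source states. The function names are hypothetical.

```python
PRICE_PER_MILLION_CHARS_USD = 15.0   # from this profile
ASSUMED_CHARS_PER_MINUTE = 1000      # assumption: inferred from $0.0150/minute


def cost_for_chars(num_chars: int) -> float:
    """Cost in USD to synthesize the given number of characters."""
    return num_chars * PRICE_PER_MILLION_CHARS_USD / 1_000_000


def cost_per_minute(chars_per_minute: int = ASSUMED_CHARS_PER_MINUTE) -> float:
    """Cost in USD for one minute of speech at the assumed speaking rate."""
    return cost_for_chars(chars_per_minute)


def chars_covered_by_credit(credit_usd: float) -> int:
    """Whole characters that a given credit amount pays for."""
    return int(credit_usd / PRICE_PER_MILLION_CHARS_USD * 1_000_000)


if __name__ == "__main__":
    print(f"Per minute: ${cost_per_minute():.4f}")
    print(f"$200 credit covers about {chars_covered_by_credit(200.0):,} characters")
```

At the listed rate, the $200 signup credit covers a little over 13 million characters.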

Sovereignty & Compliance

On-premise: No

Cloud only. Enterprise on-premise via agreement.

GDPR: Compliant

Data residency: US, EU

Strategic & Business Analysis

Deepgram Aura 2 — Strategic Positioning

Beyond technical specs: where does this tool sit in the ecosystem, what are the risks and strategic implications for DigiDouble?

Deepgram Aura is the unified voice platform play: STT+TTS+LLM in one API, sub-200ms latency, and a $1.3B valuation. However, its VPC-only stance limits its sovereignty appeal for regulated European deployments.

Cloud + VPC

Lock-in risk: Medium

Sovereignty fit: Medium

Open-source threat: Medium

Pricing: Commoditizing ↓↓

A. Strategic Positioning

Target customer: Developer / Enterprise — real-time voice agents, unified STT+TTS platform

Sub-200ms TTS latency as part of a unified STT+TTS+LLM platform — the one-stop shop for voice AI agent infrastructure.

B. Competitive Moat

Unified STT+TTS+LLM API — reduces integration complexity and latency vs multi-vendor stacks
Sub-200ms TTS latency with natural voices — competitive with Cartesia for real-time agents
Series C of $130M (Jan 2026) at a $1.3B valuation: financial strength for R&D

Vulnerability: No on-premise TTS option (VPC only). Open-source models catching up. Pricing pressure from competitors.

E. Strategic Questions for DigiDouble

Sovereignty fit

EU data residency via VPC available. No full on-premise for TTS. Moderate sovereignty fit — better than pure cloud, worse than Inworld.

Build vs. Buy

Buy for Phase 1 (unified platform, low integration overhead). Evaluate on-premise alternatives for Phase 2 sovereignty.

Lock-in risk

Unified STT+TTS platform creates integration lock-in. VPC deployment and competitive pricing reduce dependency risk.

Roadmap alignment

Good for Phase 1 (unified STT+TTS for voice agents). Phase 2 requires on-premise TTS — consider Inworld or open-source for sovereignty.

Data Freshness

Updated 30 April 2026

Deepgram docs + Introl Voice AI guide, Jan 2026

Update note: Deepgram Aura-2 pricing: $15/1M chars. TTFA <100ms. English only. Best paired with Deepgram Nova-3 ASR for full voice agent stack.

Reference Sources

Deepgram Aura-2 Pricing (pricing)
Deepgram Aura-2 Docs (docs)
Artificial Analysis TTS (benchmark)