
Open Source, #12 on Artificial Analysis, MIT license

Chatterbox (Resemble AI)

MIT license — beats ElevenLabs in blind tests (63.75% preference)


TTFA (time to first audio, best case): 150 ms
TTFA (typical): 300 ms
Price per million chars: $40
ELO score: 1050

Comparative Scores

Voice quality: 7/10
Latency: 7/10
Voice cloning: 8/10
Expressiveness: 8/10
Sovereignty: 10/10
Price accessibility: 9/10
Multilingual: 1/10

Architecture

Architecture: Flow matching + 1-step decoder (350M params)
Parameters: 350M
Languages: 1 (English)
Self-hostable: Yes
Streaming: Yes

GamiWays

Phase 1 MVP: sovereign voice cloning

Excellent for sovereign Phase 1 MVP with voice cloning. MIT license enables unrestricted deployment on Swiss infrastructure. English-only is a limitation for multilingual GamiWays use cases. Emotional exaggeration control aligns with Axis 2 (expressive avatar).

Analysis

Chatterbox by Resemble AI achieved 63.75% user preference vs ElevenLabs in blind tests. MIT license enables unrestricted commercial use and self-hosting. Emotional exaggeration control parameter is unique. 350M params with 1-step decoder. #1 trending TTS on HuggingFace in December 2025.
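To put the 63.75% blind-test preference on the same footing as the ELO figure, the sketch below converts a head-to-head preference rate into an equivalent rating gap. It assumes the standard logistic Elo model (400-point scale); the function names are illustrative, and nothing here implies that Resemble AI or Artificial Analysis compute their numbers this way.

```python
import math


def elo_gap_from_preference(p: float) -> float:
    """Rating gap implied by a head-to-head win rate p under the standard Elo model."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return 400.0 * math.log10(p / (1.0 - p))


def preference_from_elo_gap(gap: float) -> float:
    """Inverse mapping: expected win rate for a given rating gap."""
    return 1.0 / (1.0 + 10.0 ** (-gap / 400.0))


# 63.75% preference corresponds to a gap of about 98 Elo points.
gap = elo_gap_from_preference(0.6375)
print(f"Implied Elo gap: {gap:.1f}")
```

The blind test and the Artificial Analysis leaderboard are separate evaluations, so this gap is only a way to read the preference figure, not a prediction of leaderboard positions.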

Strengths

63.75% preference vs ElevenLabs (blind test)
MIT license — unrestricted use
Emotional exaggeration control
Zero-shot voice cloning
#1 HuggingFace trending Dec 2025

Weaknesses

English only
GPU required for real-time
No lip-sync data
$40/1M chars managed (4× Inworld)

Voice Capabilities

Voice Cloning: Yes

Zero-shot voice cloning. Emotional exaggeration control parameter. 63.75% preference vs ElevenLabs in blind tests.

Emotion Control: Yes

Emotional exaggeration control parameter (0–1 scale). Unique feature for expressive synthesis.
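A minimal sketch of how the exaggeration parameter might be passed when self-hosting. It assumes the `chatterbox-tts` Python package as shown in the project's public README (`ChatterboxTTS.from_pretrained` and `generate` with an `exaggeration` argument); the helper names `clamp_exaggeration` and `synthesize` are hypothetical, and the 0–1 range follows the description above rather than a verified library limit.

```python
from typing import Optional


def clamp_exaggeration(value: float) -> float:
    """Keep the setting inside the 0-1 scale described above (assumed range)."""
    return max(0.0, min(1.0, float(value)))


def synthesize(text: str, exaggeration: float = 0.5,
               reference_wav: Optional[str] = None, device: str = "cuda"):
    """Generate speech with a given emotional intensity.

    The import is done lazily so this file loads even without the package installed.
    Assumption: the installed package exposes the API shown in its README.
    """
    from chatterbox.tts import ChatterboxTTS

    model = ChatterboxTTS.from_pretrained(device=device)
    # reference_wav is an optional short clip used for zero-shot voice cloning.
    return model.generate(
        text,
        audio_prompt_path=reference_wav,
        exaggeration=clamp_exaggeration(exaggeration),
    )


print(clamp_exaggeration(1.7))   # out-of-range input is clamped to the upper bound
print(clamp_exaggeration(0.25))  # in-range input is passed through
```

Loading the model once and reusing it across requests is the usual pattern in a real service; it is reloaded per call here only to keep the sketch short.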

Streaming: Yes

Streaming capable. ~150ms TTFA on GPU. 1-step decoder reduces latency vs multi-step models.
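The TTFA figures can be checked on your own hardware by timing the first non-empty chunk of a streamed response. The sketch below is engine-agnostic: `fake_stream` is a hypothetical stand-in for whatever chunk iterator a streaming deployment exposes, not a Chatterbox API.

```python
import time
from typing import Iterable, Iterator, Tuple


def time_to_first_audio(chunks: Iterable[bytes]) -> Tuple[float, bytes]:
    """Return (elapsed milliseconds, first non-empty chunk) for a lazy audio stream."""
    start = time.perf_counter()
    for chunk in chunks:
        if chunk:  # skip empty keep-alive chunks
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            return elapsed_ms, chunk
    raise ValueError("stream ended without producing audio")


def fake_stream(delay_s: float = 0.05, n_chunks: int = 3) -> Iterator[bytes]:
    """Hypothetical stand-in for a real TTS stream: sleeps, then yields silence."""
    for _ in range(n_chunks):
        time.sleep(delay_s)
        yield b"\x00" * 320


ttfa_ms, first = time_to_first_audio(fake_stream())
print(f"TTFA: {ttfa_ms:.0f} ms, first chunk: {len(first)} bytes")
```

Because the generator is lazy, the timer starts just before the first chunk is requested, which approximates the moment a request would be sent. Measuring over many requests and reporting a median and a tail percentile gives a fairer picture than a single best-case run.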

Lip-sync Data: No

No native lip-sync data. Can be paired with external aligner.

Pricing

Price / 1M chars: $40
Price / minute: $0.0400
Free tier: Free (open weights, MIT)

$40/1M chars (Chatterbox HD managed). Self-hosted: near-zero cost.
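The pricing figures can be related with simple arithmetic. The sketch below assumes roughly 1,000 characters per minute of speech, which is the rate implied by $40 per million characters and $0.04 per minute; it is an inference, not a published figure. The GPU cost in the break-even helper is a purely hypothetical input.

```python
PRICE_PER_MILLION_CHARS_USD = 40.0  # managed price quoted above
CHARS_PER_MINUTE = 1000             # assumption: implied by the $0.04/minute figure


def managed_cost_usd(chars: int) -> float:
    """Managed-service cost for a given number of synthesized characters."""
    return chars / 1_000_000 * PRICE_PER_MILLION_CHARS_USD


def price_per_minute_usd(chars_per_minute: int = CHARS_PER_MINUTE) -> float:
    """Per-minute price derived from the per-character price."""
    return managed_cost_usd(chars_per_minute)


def break_even_chars(monthly_gpu_cost_usd: float) -> float:
    """Monthly character volume above which self-hosting costs less than managed.

    Only the GPU bill is counted; engineering and operations time are ignored.
    """
    return monthly_gpu_cost_usd / PRICE_PER_MILLION_CHARS_USD * 1_000_000


print(f"Per minute: ${price_per_minute_usd():.4f}")
print(f"Break-even at a hypothetical $400/month GPU: {break_even_chars(400.0):,.0f} chars")
```

"Near-zero cost" for self-hosting therefore holds only once the fixed GPU cost is spread over enough volume; below the break-even point the managed tier is cheaper.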

Sovereignty & Compliance

On-premise: Yes

Full self-hosting under MIT license. No usage restrictions.

GDPR: Compliant

Data residency: Fully local — no data leaves the server.

Strategic & Business Analysis

Chatterbox (Resemble AI) — Strategic Positioning

Beyond technical specs: where does this tool sit in the ecosystem, what are the risks and strategic implications for GamiWays?

Chatterbox is the open-source TTS that beat ElevenLabs in blind tests (63.75% user preference). With emotion exaggeration control and an MIT license, it is the strongest sovereignty-first alternative to premium cloud TTS.

Open-source / self-hosted

Lock-in risk: Low
Sovereignty fit: High
Open-source threat: Low
Pricing: Stable

A. Strategic Positioning

Target customer: Developer / Enterprise — open-source, emotion control, voice cloning

MIT-licensed TTS by Resemble AI: 63.75% user preference over ElevenLabs in blind tests, with emotion exaggeration control and zero-shot voice cloning.

B. Competitive Moat

63.75% user preference over ElevenLabs in blind evaluations, the strongest quality result among open-source TTS models
Emotion exaggeration slider + zero-shot voice cloning from 5 seconds of audio
Backed by Resemble AI ($13M raised 2025) — commercial support available

Vulnerability: uncertain monetization under an open-core strategy. Resemble AI's pivot to deepfake detection (Dec 2025) may shift focus away from Chatterbox.

E. Strategic Questions for GamiWays

Sovereignty fit

Fully self-hostable on Swiss/EU infrastructure under the MIT license. Resemble AI's deepfake detection focus adds an audio security layer.

Build vs. Buy

Build (integrate open-source) for both Phase 1 and Phase 2. Best quality-sovereignty combination in open-source TTS.

Lock-in risk

MIT-licensed open source: zero vendor lock-in. GamiWays can fork and maintain if needed.

Roadmap alignment

Excellent: best open-source quality for Phase 1, full sovereignty for Phase 2. Natural fit for GamiWays's progressive deployment strategy.


Data Freshness

Updated 3 May 2026

Sources: Artificial Analysis TTS (May 2026) + Resemble AI benchmark

Update note: Chatterbox v1 (Apr 2025): ELO 1050 (Artificial Analysis, May 2026, ranked #12). 63.75% preference vs ElevenLabs (blind test). MIT license. 350M params, flow matching + 1-step decoder. Chatterbox v2 in development (multilingual support announced). Voice cloning: yes (zero-shot). Managed: $40/1M chars. Self-hosted: free.

Reference Sources

Chatterbox GitHub (docs)
Resemble AI Blog (news)
HuggingFace TTS Arena (benchmark)
Artificial Analysis TTS Benchmarks (benchmark)
Chatterbox HuggingFace (docs)