Open Source | #9 Artificial Analysis | Apache 2.0

Kokoro 82M v1.0

Highest-ranked open-weight TTS — ELO 1059, 82M params, Apache 2.0

TTFA (time to first audio, best case): 60ms
TTFA (typical): 120ms
Price per million chars: $0.70
ELO Score: 1059

Comparative Scores

Voice quality: 7/10
Latency: 9/10
Voice cloning: 1/10
Expressiveness: 5/10
Sovereignty: 10/10
Price accessibility: 10/10
Multilingual: 2/10

Architecture

Architecture: Lightweight transformer (82M params)
Parameters: 82M
Languages: 2 (American and British English; the April 2026 update note reports 8)
Self-hostable: Yes
Streaming: Yes
DigiDouble
Phase 1 MVP: maximum sovereignty

Strong candidate for the sovereign Phase 1 MVP. Runs on Swiss Exoscale GPU infrastructure. The lack of voice cloning is a significant limitation for personalized DigiDouble use cases; pair it with XTTS-v2 or Chatterbox where voice cloning is needed.

Analysis

Kokoro 82M v1.0 is the highest-ranked open-weight TTS model on Artificial Analysis (ELO 1059, rank #9 in the Jan 2026 leaderboard). At just 82M parameters, it runs efficiently on CPU or a modest GPU, reaching 36× real-time speed on a T4. The Apache 2.0 license enables full sovereign deployment. It has the best price-performance among open models: $0.70/1M chars for managed inference.
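To make the 36× real-time figure concrete, the arithmetic can be sketched as follows. This is a rough illustration rather than a benchmark: it assumes the factor stays constant regardless of utterance length and ignores model load time and batching. The function name is illustrative only.

```python
def synthesis_seconds(audio_seconds: float, realtime_factor: float = 36.0) -> float:
    """Wall-clock seconds needed to synthesize `audio_seconds` of speech.

    A real-time factor of 36 means 36 seconds of audio per second of compute.
    Assumption: the factor is constant across utterance lengths.
    """
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor


if __name__ == "__main__":
    # One minute of speech at the quoted 36x factor: about 1.67 s of compute.
    print(f"{synthesis_seconds(60):.2f} s")
    # One hour of GPU time yields 36 hours of audio under the same assumption.
    print(f"{3600 * 36 / 3600:.0f} audio hours per GPU hour")
```

Under these assumptions, a 10-second reply would take roughly 0.28 s to synthesize in full, which is why streaming the first chunk early matters more for perceived latency than raw throughput.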

Strengths

  • ELO 1059 — #1 open-weight model
  • 82M params — runs on CPU
  • 36× real-time on T4 GPU
  • Apache 2.0 — full sovereignty
  • $0.70/1M chars managed

Weaknesses

  • No voice cloning
  • Narrow language coverage: listed as English only (American/British); the April 2026 update note reports 8 languages
  • Limited emotion control
  • No lip-sync data

Voice Capabilities

Voice Cloning: No

No zero-shot voice cloning. Pre-built voices only (American/British English).

Emotion Control: No

Limited emotion control. Natural prosody but no explicit emotion tags.

Streaming: Yes

Streaming-capable. 36× real-time on T4 GPU. <100ms on modern hardware.
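One way to check the TTFA and streaming figures on your own hardware is to time how long a streaming call takes to yield its first audio chunk. The sketch below assumes the TTS call can be wrapped as a Python iterator of byte chunks. `fake_stream` is a hypothetical stand-in used only to make the example runnable; it is not the Kokoro API.

```python
import time
from typing import Iterable, Iterator, Tuple


def measure_ttfa(chunks: Iterable[bytes]) -> Tuple[float, int]:
    """Return (seconds until the first chunk arrives, total bytes received)."""
    start = time.perf_counter()
    ttfa = None
    total_bytes = 0
    for chunk in chunks:
        if ttfa is None:
            # First chunk received: this is the time to first audio.
            ttfa = time.perf_counter() - start
        total_bytes += len(chunk)
    if ttfa is None:
        raise ValueError("stream produced no audio")
    return ttfa, total_bytes


def fake_stream(first_delay_s: float = 0.05,
                n_chunks: int = 3,
                chunk_bytes: int = 4800) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS call (assumption, not a real API)."""
    time.sleep(first_delay_s)  # simulated model latency before the first chunk
    for _ in range(n_chunks):
        yield b"\x00" * chunk_bytes


if __name__ == "__main__":
    ttfa, size = measure_ttfa(fake_stream())
    print(f"TTFA: {ttfa * 1000:.0f} ms, {size} bytes received")
```

Replacing `fake_stream()` with a generator around the real synthesis call would give a measurement comparable to the 60ms and 120ms figures quoted above, provided network and model warm-up are accounted for separately.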

Lip-sync Data: No

No native lip-sync data. Can be paired with external aligner.

Pricing

Price / 1M chars: $0.70
Price / minute: $0.0007
Free tier: Free (open weights)

$0.70/1M chars (managed inference). Self-hosted: near-zero marginal cost.
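The relation between the per-character and per-minute prices can be sketched as below. The speaking rate of about 1,000 characters per minute is an assumption introduced here; it is not stated on this page, but it is the rate at which $0.70/1M chars works out to $0.0007 per minute.

```python
PRICE_PER_MILLION_CHARS_USD = 0.70   # managed inference price quoted above
ASSUMED_CHARS_PER_MINUTE = 1000      # assumption: not stated in the source


def cost_usd(num_chars: int,
             price_per_million: float = PRICE_PER_MILLION_CHARS_USD) -> float:
    """Managed-inference cost in USD for synthesizing `num_chars` characters."""
    return num_chars / 1_000_000 * price_per_million


def cost_per_minute_usd(chars_per_minute: int = ASSUMED_CHARS_PER_MINUTE,
                        price_per_million: float = PRICE_PER_MILLION_CHARS_USD) -> float:
    """Cost of one minute of generated speech at the assumed speaking rate."""
    return cost_usd(chars_per_minute, price_per_million)


if __name__ == "__main__":
    print(f"per minute: ${cost_per_minute_usd():.4f}")
    print(f"per hour:   ${cost_per_minute_usd() * 60:.3f}")
```

A faster or slower voice changes the per-minute figure proportionally, so the per-character price is the more reliable number for budgeting.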

Sovereignty & Compliance

On-premise: Yes

Full self-hosting. Apache 2.0 license. Runs on CPU without GPU.

GDPR: Compliant

Data residency: Fully local — no data leaves the server.

Strategic & Business Analysis

Kokoro 82M v1.0 — Strategic Positioning

Beyond technical specs: where does this tool sit in the ecosystem, and what are the risks and strategic implications for DigiDouble?

Kokoro proves that TTS quality no longer requires massive compute: 82M parameters, <2GB VRAM, Apache 2.0 — the sovereignty-first choice for DigiDouble Phase 2 deployments on Swiss infrastructure.

Open-source / self-hosted
Lock-in risk: Low
Sovereignty fit: High
Open-source threat: Low
Pricing: Stable

A. Strategic Positioning

Target customer: Developer / Privacy-conscious SMB — self-hosted, resource-constrained deployments

82M-parameter open-source TTS with quality comparable to larger models; runs on <2GB VRAM under Apache 2.0 with zero licensing cost.
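As a sanity check on the <2GB VRAM figure, the memory taken by the weights alone can be estimated from the parameter count. This is a back-of-the-envelope sketch: it covers weights only and leaves out activations, audio buffers, and framework overhead, and the precision the released checkpoint actually uses is an assumption here.

```python
# Bytes per parameter for common numeric formats.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_megabytes(num_params: int, dtype: str = "fp32") -> float:
    """Approximate size of the model weights in megabytes (10^6 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


if __name__ == "__main__":
    params = 82_000_000  # 82M parameters, as quoted above
    for dtype in ("fp32", "fp16", "int8"):
        print(f"{dtype}: {weight_megabytes(params, dtype):.0f} MB")
```

Even at full 32-bit precision the weights come to about 328 MB, which leaves ample headroom within a 2GB budget for runtime overhead.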

B. Competitive Moat

  • Lightweight architecture (82M params, <2GB VRAM) — runs on commodity hardware
  • Apache 2.0 license — zero licensing cost, full commercial use rights
  • Community-driven development with growing ecosystem integrations

Vulnerability: No commercial support. Limited voice diversity vs larger models. Community-only maintenance creates enterprise adoption risk.

E. Strategic Questions for DigiDouble

Sovereignty fit

Fully self-hostable on Swiss/EU infrastructure. Zero data leaves the deployment environment. Best sovereignty score among TTS options.

Build vs. Buy

Build (integrate and customize) for Phase 2 sovereignty. Use as baseline for Phase 1 MVP if quality is sufficient — no cost, full control.

Lock-in risk

Open-source Apache 2.0 — zero vendor lock-in. DigiDouble owns the full stack. Only risk is internal expertise dependency.

Roadmap alignment

Strong for Phase 2 sovereignty. Phase 1 depends on quality requirements — Kokoro may be sufficient for many use cases.

Data Freshness

Updated 30 April 2026

Artificial Analysis Speech Leaderboard, Jan 2026

Update note: Kokoro 82M v1.0 ELO 1055 (rank #14, Apr 2026). Apache 2.0. Replicate hosted: $0.65/1M chars. Self-hosted: free. 8 languages confirmed.
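For a sense of scale between the two ELO figures quoted on this page (1059 and 1055), the standard Elo expected-score formula can be applied. This assumes the leaderboard uses the classic logistic curve with a 400-point scale, which this page does not state, so treat the result as indicative only.

```python
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Expected share of pairwise preferences won by A over B (classic Elo).

    Assumption: standard logistic Elo curve with a 400-point scale.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    # The two scores quoted on this page differ by 4 points.
    p = expected_win_rate(1059, 1055)
    print(f"expected win rate at +4 ELO: {p:.3f}")
```

Under that assumption a 4-point gap corresponds to an expected preference rate of about 50.6%, so the change in score between the two snapshots is small compared with the change in rank.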