Kokoro 82M v1.0
Highest-ranked open-weight TTS — ELO 1059, 82M params, Apache 2.0
Architecture
Strong candidate for a sovereign Phase 1 MVP. Runs on Swiss Exoscale GPU infrastructure. The lack of voice cloning is a significant limitation for personalized DigiDouble use cases; pair with XTTS-v2 or Chatterbox where cloning is needed.
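The pairing suggested above can be pictured as a small routing step in front of the TTS engines. The following is a minimal sketch only: the `TtsRequest` type, the `choose_engine` function, and the engine labels are hypothetical names introduced for illustration, not part of Kokoro, XTTS-v2, or Chatterbox.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TtsRequest:
    """Hypothetical request shape; the field names are illustrative."""
    text: str
    needs_voice_cloning: bool = False


def choose_engine(request: TtsRequest, cloning_engine: str = "xtts-v2") -> str:
    """Return an engine label for the request.

    Assumption: Kokoro handles every request that can use a pre-built
    voice, and a cloning-capable engine handles the rest.
    """
    if request.needs_voice_cloning:
        return cloning_engine
    return "kokoro-82m"
```

Under this assumption, only requests that need a personalized voice leave the lightweight Kokoro path, so the cloning engine can be sized for a smaller share of traffic.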
Analysis
Kokoro 82M v1.0 is the highest-ranked open-weight TTS model on the Artificial Analysis Speech Leaderboard (ELO 1059, rank #9 as of Jan 2026). At just 82M parameters, it runs efficiently on a CPU or a modest GPU, reaching 36× real-time speed on a T4. The Apache 2.0 license enables full sovereign deployment. It offers the best price-performance among open models: $0.70/1M chars for managed inference.
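To give the ELO figure some intuition, the sketch below applies the classic Elo expected-score formula. This is an assumption: the leaderboard's exact rating method is not described here, and the 1000-rated opponent is a hypothetical reference point, not a real entry.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected share of pairwise preference votes for model A over model B.

    Assumes the classic Elo logistic curve with a 400-point scale; the
    leaderboard may compute ratings differently.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


# Kokoro (ELO 1059) against a hypothetical model rated 1000.
preference_share = elo_expected_score(1059, 1000)
```

Under these assumptions, a 59-point gap corresponds to being preferred in roughly 58% of head-to-head comparisons, which shows that gaps of a few points between neighboring models are small.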
Strengths
- ELO 1059 — #1 open-weight model
- 82M params — runs on CPU
- 36× real-time on T4 GPU
- Apache 2.0 — full sovereignty
- $0.70/1M chars managed
Weaknesses
- No voice cloning
- Narrow language coverage: American/British English voices in this assessment (the Apr 2026 update note confirms 8 languages)
- Limited emotion control
- No lip-sync data
Voice Capabilities
No zero-shot voice cloning. Pre-built voices only: American/British English in the Jan 2026 data, 8 languages per the Apr 2026 update note.
Limited emotion control. Natural prosody but no explicit emotion tags.
Streaming-capable. 36× real-time on T4 GPU. <100 ms latency on modern hardware.
No native lip-sync data. Can be paired with external aligner.
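The 36× real-time figure translates directly into compute time per clip. The sketch below assumes "36× real-time" means 36 seconds of audio per second of compute and ignores model load time and other fixed overhead; the function name is illustrative.

```python
def synthesis_seconds(audio_seconds: float, speedup: float = 36.0) -> float:
    """Compute time needed to synthesize a clip of the given duration.

    Assumes a constant real-time speedup factor and no fixed overhead.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup


# A one-minute clip at the quoted T4 speedup.
one_minute_clip = synthesis_seconds(60.0)
```

With these assumptions, a one-minute clip takes under two seconds of GPU time, which is what makes streaming playback practical.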
Pricing
$0.70/1M chars (managed inference). Self-hosted: near-zero marginal cost.
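A quick way to budget managed inference is to scale the per-million-character rate by expected volume. This is a minimal sketch; the function name and the 5M-character monthly volume are illustrative assumptions, and the default rate is the $0.70 figure quoted above.

```python
def managed_cost_usd(characters: int, usd_per_million_chars: float = 0.70) -> float:
    """Managed-inference cost for a given character volume."""
    if characters < 0:
        raise ValueError("characters must be non-negative")
    return characters / 1_000_000 * usd_per_million_chars


# Hypothetical monthly volume of 5 million characters.
monthly_cost = managed_cost_usd(5_000_000)
```

At the quoted rate, 5 million characters cost about $3.50, so managed pricing is unlikely to be the deciding factor at MVP volumes.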
Sovereignty & Compliance
Full self-hosting. Apache 2.0 license. Runs on CPU without GPU.
Data residency: Fully local — no data leaves the server.
Kokoro 82M v1.0 — Strategic Positioning
Beyond technical specs: where does this tool sit in the ecosystem, what are the risks and strategic implications for DigiDouble?
Kokoro proves that TTS quality no longer requires massive compute: 82M parameters, <2GB VRAM, Apache 2.0 — the sovereignty-first choice for DigiDouble Phase 2 deployments on Swiss infrastructure.
A. Strategic Positioning
Target customer: Developer / Privacy-conscious SMB — self-hosted, resource-constrained deployments
An 82M-parameter open-source TTS model with quality comparable to larger models. It runs in <2GB VRAM under Apache 2.0 with zero licensing cost.
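The <2GB VRAM figure is plausible from parameter count alone. The sketch below estimates memory for the weights only, assuming a fixed number of bytes per parameter; activations, audio buffers, and runtime overhead are not included, and the function and table names are illustrative.

```python
# Bytes per parameter for common numeric formats.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_mb(num_params: int = 82_000_000, dtype: str = "fp32") -> float:
    """Approximate memory for model weights only, in megabytes (10^6 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1_000_000


fp32_mb = weight_memory_mb()              # 328.0
fp16_mb = weight_memory_mb(dtype="fp16")  # 164.0
```

Even in fp32 the weights take about 328 MB, which leaves most of a 2GB budget for activations and runtime overhead.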
B. Competitive Moat
- Lightweight architecture (82M params, <2GB VRAM) — runs on commodity hardware
- Apache 2.0 license — zero licensing cost, full commercial use rights
- Community-driven development with growing ecosystem integrations
Vulnerability: No commercial support. Limited voice diversity vs larger models. Community-only maintenance creates enterprise adoption risk.
E. Strategic Questions for DigiDouble
Sovereignty fit
Fully self-hostable on Swiss/EU infrastructure. Zero data leaves the deployment environment. Best sovereignty score among TTS options.
Build vs. Buy
Build (integrate and customize) for Phase 2 sovereignty. Use as baseline for Phase 1 MVP if quality is sufficient — no cost, full control.
Lock-in risk
Open-source Apache 2.0 — zero vendor lock-in. DigiDouble owns the full stack. Only risk is internal expertise dependency.
Roadmap alignment
Strong for Phase 2 sovereignty. Phase 1 depends on quality requirements — Kokoro may be sufficient for many use cases.
Data Freshness
Artificial Analysis Speech Leaderboard, Jan 2026
Update note (Apr 2026): Kokoro 82M v1.0 is now at ELO 1055 (rank #14), down from ELO 1059 (rank #9) in Jan 2026. Apache 2.0. Replicate hosted: $0.65/1M chars. Self-hosted: free. 8 languages confirmed.