Open Source · CC-BY 4.0

Kyutai TTS 1.6B

Delayed streams modeling — streaming-native, timestamps, batching

TTFA (best case): 100 ms
TTFA (typical): 200 ms
Price per million chars: Free
ELO Score: not listed

Comparative Scores

Voice quality: 7/10
Latency: 8/10
Voice cloning: 6/10
Expressiveness: 6/10
Sovereignty: 10/10
Price accessibility: 10/10
Multilingual: 2/10

Architecture

Architecture: Delayed streams modeling (novel technique)
Parameters: 1.6B
Languages: 2
Self-hostable: Yes
Streaming: Yes
DigiDouble
Axis 1 R&D: streaming architecture

Relevant to Axis 1 R&D (latency). Delayed streams modeling is a novel architecture worth studying. Its native timestamps are directly usable for avatar lip-sync, and the CC-BY 4.0 license enables sovereign deployment.

Analysis

Kyutai TTS 1.6B uses a novel 'delayed streams modeling' technique that enables streaming-native generation with timestamps and batching. It was released in July 2025 by Kyutai, the French AI lab that created Moshi, under a CC-BY 4.0 license. It is related to the Moshi full-duplex speech model, and its architecture is worth studying for Axis 1 R&D.
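As a rough illustration of the idea behind delayed streams, the toy sketch below places a text stream and an audio stream on a shared frame grid and shifts the audio stream right by a fixed number of frames, so each audio frame is paired with text that was seen earlier. The names `delay_stream` and `align_streams`, the padding token, and the delay of 2 frames are illustrative assumptions; this is not Kyutai's implementation or API.

```python
from typing import List, Sequence, Tuple

PAD = "<pad>"  # hypothetical padding token, not taken from Kyutai's code


def delay_stream(tokens: Sequence[str], delay: int, length: int) -> List[str]:
    """Shift a stream right by `delay` frames and pad it to `length` frames."""
    if delay < 0:
        raise ValueError("delay must be non-negative")
    shifted = [PAD] * delay + list(tokens)
    shifted += [PAD] * max(0, length - len(shifted))
    return shifted[:length]


def align_streams(
    text: Sequence[str], audio: Sequence[str], audio_delay: int
) -> List[Tuple[str, str]]:
    """Pair text and audio frame by frame, with audio lagging by `audio_delay`."""
    length = max(len(text), len(audio) + audio_delay)
    text_row = delay_stream(text, 0, length)
    audio_row = delay_stream(audio, audio_delay, length)
    return list(zip(text_row, audio_row))


# Toy data: four text tokens and four audio frames, audio delayed by 2 frames.
frames = align_streams(["he", "llo", "wor", "ld"], ["a0", "a1", "a2", "a3"], audio_delay=2)
for step, (text_tok, audio_tok) in enumerate(frames):
    print(step, text_tok, audio_tok)
```

In this toy layout the first audio frame only appears at step 2, after two text tokens have been consumed. That fixed lag is what lets a model of this kind emit audio while text is still arriving.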

Strengths

  • Streaming-native via delayed streams
  • Native timestamps for lip-sync
  • Batching support
  • CC-BY 4.0 — sovereign deployment
  • From Kyutai (Moshi creators)

Weaknesses

  • Limited language support
  • Limited emotion control
  • Less community adoption than Kokoro/Chatterbox

Voice Capabilities

Voice Cloning: Yes

Voice conditioning from audio samples. Related to Moshi speech-to-speech model.

Emotion Control: No

Natural prosody. Limited explicit emotion control.

Streaming: Yes

Streaming-native via delayed streams modeling. Timestamps enabled. Batching supported.
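To check the TTFA figures quoted on this card against a self-hosted deployment, time-to-first-audio can be measured on any stream of audio chunks. The sketch below is a minimal, hedged example: `measure_ttfa` and `fake_stream` are hypothetical helpers, and the fake stream only simulates a server with a sleep. It should be replaced with whatever chunk iterator the actual client exposes.

```python
import time
from typing import Iterable, Iterator, Tuple


def measure_ttfa(chunks: Iterable[bytes]) -> Tuple[float, int]:
    """Return (seconds until the first non-empty chunk, total bytes received)."""
    start = time.perf_counter()
    ttfa = None
    total = 0
    for chunk in chunks:
        if ttfa is None and chunk:
            ttfa = time.perf_counter() - start
        total += len(chunk)
    if ttfa is None:
        raise ValueError("stream produced no audio")
    return ttfa, total


def fake_stream(first_delay_s: float = 0.05, n_chunks: int = 3) -> Iterator[bytes]:
    """Stand-in for a real TTS stream (assumption): waits, then yields silence."""
    time.sleep(first_delay_s)
    for _ in range(n_chunks):
        yield b"\x00" * 1920  # arbitrary chunk size chosen for the example


ttfa_s, total_bytes = measure_ttfa(fake_stream())
print(f"TTFA: {ttfa_s * 1000:.0f} ms, received {total_bytes} bytes")
```

Because the timer starts before the first chunk is requested, the measurement includes request setup as well as model latency, which is usually the number that matters for an interactive avatar.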

Lip-sync Data: Yes

Timestamps natively supported via delayed streams modeling.
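For avatar lip-sync, word-level timestamps need to be mapped onto video frames. The sketch below shows one way to do that, assuming timestamps arrive as (word, start, end) in seconds. The `WordTiming` structure and the 24 fps frame rate are assumptions made for illustration, not the model's actual output format.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class WordTiming:
    """Hypothetical timestamp record: one word with start/end times in seconds."""
    word: str
    start_s: float
    end_s: float


def to_frame_spans(timings: Sequence[WordTiming], fps: int) -> List[Tuple[str, int, int]]:
    """Map each word to the inclusive range of video frames it covers."""
    spans = []
    for t in timings:
        first = int(t.start_s * fps)                      # frame containing the start
        last = max(first, math.ceil(t.end_s * fps) - 1)   # last frame before the end
        spans.append((t.word, first, last))
    return spans


timings = [WordTiming("hello", 0.0, 0.5), WordTiming("world", 0.5, 1.25)]
spans = to_frame_spans(timings, fps=24)
print(spans)
```

An animation layer could then pick a mouth shape for each frame range. Going from words to phonemes or visemes would need an additional step that is not covered here.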

Pricing

Price / 1M chars: Free
Price / minute: Free
Free tier: Free (open weights)

Open weights — self-hosting cost only.
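"Free" here means no licence fee, so the effective price per million characters depends on GPU rental cost and throughput. The sketch below shows the arithmetic. The hourly price and the characters-per-second figure are placeholder assumptions, not measured values for Kyutai TTS 1.6B.

```python
def cost_per_million_chars(gpu_price_per_hour: float, chars_per_second: float) -> float:
    """Estimate GPU cost to synthesize one million characters.

    Both inputs are assumptions to be replaced with real measurements.
    """
    if chars_per_second <= 0:
        raise ValueError("chars_per_second must be positive")
    seconds_needed = 1_000_000 / chars_per_second
    return gpu_price_per_hour * seconds_needed / 3600


# Placeholder figures: a GPU rented at 1.80 per hour, sustaining 500 chars/s
# across all batched streams.
estimate = cost_per_million_chars(gpu_price_per_hour=1.80, chars_per_second=500)
print(f"Estimated cost per 1M chars: {estimate:.2f}")
```

Batching changes this figure directly: if the same GPU serves more concurrent streams, aggregate characters per second rise and the cost per million characters falls proportionally.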

Sovereignty & Compliance

On-premise: Yes

Full self-hosting under CC-BY 4.0.

GDPR: Compliant

Data residency: Fully local when self-hosted.

Strategic & Business Analysis

Kyutai TTS 1.6B — Strategic Positioning

Beyond technical specs: where does this tool sit in the ecosystem, and what are the risks and strategic implications for DigiDouble?

Kyutai Moshi is the EU's answer to real-time voice AI: full-duplex speech at 160 ms latency, open-source, from a French lab backed by €300M in research funding. Kyutai TTS 1.6B comes from the same lab, and its CC-BY 4.0 license permits commercial deployment with attribution.

Open-source / self-hosted
Lock-in risk: Low
Sovereignty fit: High
Open-source threat: Low
Pricing: Stable

A. Strategic Positioning

Target customer: Researcher / Developer — real-time duplex voice, French lab

Open-source real-time voice models from Kyutai, backed by €300M research funding: Moshi offers full-duplex voice-to-voice speech with 160 ms latency, and Kyutai TTS 1.6B is the lab's streaming text-to-speech model, released under CC-BY 4.0.

B. Competitive Moat

  • Full-duplex speech model — handles interruptions and overlapping speech naturally
  • Neural audio codec (Mimi) + speech-text foundation model — unique architecture
  • €300M research funding for Kyutai from backers including Xavier Niel and Rodolphe Saadé: long-term R&D commitment

Vulnerability: Non-profit research lab model, so the commercial monetization path is unclear. CC-BY 4.0 allows commercial use but requires attribution.

E. Strategic Questions for DigiDouble

Sovereignty fit

French research lab, EU-aligned values, self-hostable on Swiss/EU infrastructure. CC-BY 4.0 allows research, prototype, and commercial use, with attribution.

Build vs. Buy

Use for research/prototype (Phase 1). For Phase 2 commercial deployment, CC-BY 4.0 permits commercial use with attribution; confirm the attribution requirements before launch and keep Apache 2.0 alternatives as a fallback.

Lock-in risk

Open weights under CC-BY 4.0: no vendor lock-in when self-hosted. Commercial deployment requires attribution to Kyutai.

Roadmap alignment

Excellent for research and the Phase 1 prototype. CC-BY 4.0 also covers Phase 2 commercial deployment, provided attribution is given. Scaleway EU partnership helps.

Data Freshness

Updated 30 April 2026

Kyutai blog, Jul 2025

Update note: Kyutai TTS 1.6B released Jul 2025. CC-BY 4.0. Delayed streams modeling: streaming-native with timestamps. Related to the Moshi speech-to-speech (S2S) model. Self-hosted on GPU.