Open Source, CC-BY 4.0

Kyutai TTS 1.6B

Delayed streams modeling — streaming-native, timestamps, batching

TTFA (time to first audio, best case): 100 ms

TTFA (typical): 200 ms

Price per million chars: Free

ELO Score: n/a

Comparative Scores

Voice quality: 7/10

Latency: 8/10

Voice cloning: 6/10

Expressiveness: 6/10

Sovereignty: 10/10

Price accessibility: 10/10

Multilingual: 2/10

Architecture

Architecture: Delayed streams modeling (novel technique)

Parameters: 1.6B

Languages: 2

Self-hostable: Yes

Streaming: Yes

DigiDouble

Axis 1 R&D: streaming architecture

Relevant to Axis 1 R&D (latency). Delayed streams modeling is a novel architecture worth studying, its native timestamps are directly usable for avatar lip-sync, and the CC-BY 4.0 license enables sovereign deployment.

Analysis

Kyutai TTS 1.6B uses a novel 'delayed streams modeling' technique that enables streaming-native generation with timestamps and batching. It was released in July 2025 under a CC-BY 4.0 license by Kyutai, the French AI lab that created Moshi, and it is related to the Moshi full-duplex speech model. Its architecture is worth studying for Axis 1 R&D.
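As a rough intuition for what "delayed streams" can mean, the sketch below places a text-token stream and an audio-frame stream on one shared step grid, with the audio stream shifted by a fixed delay so that each audio frame is produced after some text lookahead. This is a toy illustration, not Kyutai's implementation: the function name `delay_streams`, the `PAD` marker, the token strings, and the delay value are all assumptions made for the example.

```python
from typing import List, Optional, Tuple

PAD = None  # hypothetical marker for steps where a stream has no value


def delay_streams(
    text: List[str], audio: List[str], delay: int
) -> List[Tuple[Optional[str], Optional[str]]]:
    """Align two streams on one step grid, shifting `audio` right by `delay`.

    At step t the pair holds text[t] and audio[t - delay], so the audio frame
    emitted at step t may depend on text up to and including step t.
    """
    if delay < 0:
        raise ValueError("delay must be non-negative")
    n_steps = max(len(text), len(audio) + delay)
    grid = []
    for t in range(n_steps):
        text_tok = text[t] if t < len(text) else PAD
        a = t - delay  # index into the undelayed audio stream
        audio_tok = audio[a] if 0 <= a < len(audio) else PAD
        grid.append((text_tok, audio_tok))
    return grid


if __name__ == "__main__":
    # Illustrative values only: three text tokens, three audio frames, delay 2.
    for step, pair in enumerate(
        delay_streams(["he", "llo", "!"], ["a0", "a1", "a2"], delay=2)
    ):
        print(step, pair)
```

Because both streams advance one step at a time, generation can begin before the full text is known, which is one plausible reading of "streaming-native".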

Strengths

Streaming-native via delayed streams
Native timestamps for lip-sync
Batching support
CC-BY 4.0 — sovereign deployment
From Kyutai (Moshi creators)

Weaknesses

Limited language support
Limited emotion control
Less community adoption than Kokoro/Chatterbox

Voice Capabilities

Voice Cloning: Yes

Voice conditioning from audio samples. Related to Moshi speech-to-speech model.

Emotion Control: No

Natural prosody. Limited explicit emotion control.

Streaming: Yes

Streaming-native via delayed streams modeling. Timestamps enabled. Batching supported.
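To check TTFA figures against a self-hosted deployment, time-to-first-audio can be measured on any iterator of audio chunks. The sketch below is a minimal, client-agnostic helper: `measure_ttfa` and `fake_tts_stream` are hypothetical names, and the fake stream only stands in for a real streaming TTS client, whose API is not described here.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def measure_ttfa(
    stream: Iterable[bytes], clock: Callable[[], float] = time.perf_counter
) -> Tuple[float, int]:
    """Return (seconds until the first non-empty chunk, total bytes received)."""
    start = clock()
    ttfa = None
    total = 0
    for chunk in stream:
        if chunk and ttfa is None:
            ttfa = clock() - start  # first audible data has arrived
        total += len(chunk)
    if ttfa is None:
        raise ValueError("stream produced no audio")
    return ttfa, total


def fake_tts_stream(n_chunks: int = 3, first_delay_s: float = 0.05) -> Iterator[bytes]:
    """Stand-in for a real streaming TTS client (hypothetical)."""
    time.sleep(first_delay_s)  # simulated model warm-up before the first chunk
    for _ in range(n_chunks):
        yield b"\x00" * 1920  # placeholder chunk size, not a real codec frame


if __name__ == "__main__":
    ttfa_s, n_bytes = measure_ttfa(fake_tts_stream())
    print(f"TTFA: {ttfa_s * 1000:.0f} ms, received {n_bytes} bytes")
```

The timer starts before the first chunk is requested, so request setup and model warm-up are included in the measurement.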

Lip-sync Data: Yes

Timestamps natively supported via delayed streams modeling.
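For avatar lip-sync, timestamps ultimately have to be mapped onto video frames. The sketch below assumes word-level timings in integer milliseconds; the `WordTiming` type, its field names, and `words_per_frame` are hypothetical, since the model's actual timestamp format is not specified here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WordTiming:
    word: str
    start_ms: int  # assumed unit: milliseconds from the start of the audio
    end_ms: int


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def words_per_frame(timings: List[WordTiming], fps: int, duration_ms: int) -> List[str]:
    """Return, for each video frame, the word being spoken ('' for silence)."""
    n_frames = ceil_div(duration_ms * fps, 1000)
    frames = [""] * n_frames
    for w in timings:
        first = max(0, w.start_ms * fps // 1000)
        last = min(n_frames, ceil_div(w.end_ms * fps, 1000))
        for f in range(first, last):
            frames[f] = w.word  # later words overwrite earlier ones on overlap
    return frames


if __name__ == "__main__":
    timings = [WordTiming("hello", 0, 300), WordTiming("world", 500, 800)]
    print(words_per_frame(timings, fps=10, duration_ms=1000))
```

A real pipeline would map words to phonemes and visemes before driving the avatar; the per-frame word track is only the scheduling step.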

Pricing

Price / 1M chars: Free

Price / minute: Free

Free tier: Free (open weights)

Open weights — self-hosting cost only.
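"Self-hosting cost only" can be turned into a per-million-character figure once GPU price and throughput are known. The sketch below shows the arithmetic only: the function name is hypothetical, and the inputs in the usage example are placeholders, not measured values for Kyutai TTS 1.6B or quotes from any provider.

```python
def cost_per_million_chars(gpu_hourly_cost: float, chars_per_second: float) -> float:
    """Self-hosting cost per 1M characters, in the same currency as the GPU price.

    gpu_hourly_cost:   what one GPU costs per hour (placeholder input)
    chars_per_second:  sustained synthesis throughput on that GPU, summed over
                       all batched requests (placeholder input)
    """
    if gpu_hourly_cost < 0 or chars_per_second <= 0:
        raise ValueError("cost must be >= 0 and throughput must be > 0")
    chars_per_hour = chars_per_second * 3600
    return gpu_hourly_cost / chars_per_hour * 1_000_000


if __name__ == "__main__":
    # Placeholder numbers purely to exercise the formula.
    print(cost_per_million_chars(gpu_hourly_cost=3.6, chars_per_second=1000.0))
```

Since throughput is summed over concurrent requests, batching support lowers the effective cost per character for the same GPU.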

Sovereignty & Compliance

On-premise: Yes

Full self-hosting under CC-BY 4.0.

GDPR: Compliant

Data residency: Fully local when self-hosted.

Strategic & Business Analysis

Kyutai TTS 1.6B — Strategic Positioning

Beyond technical specs: where does this tool sit in the ecosystem, what are the risks and strategic implications for DigiDouble?

Kyutai Moshi is the EU's answer to real-time voice AI — full-duplex speech at 160ms latency, €300M research-backed, open-source from a French lab. CC-BY-NC license is the only commercial deployment hurdle.

Open-source / self-hosted

Lock-in risk: Low

Sovereignty fit: High

Open-source threat: Low

Pricing: Stable

A. Strategic Positioning

Target customer: Researcher / Developer — real-time duplex voice, French lab

Open-source real-time voice-to-voice model (CC-BY-NC 4.0) from Kyutai — full-duplex speech with 160ms latency, backed by €300M research funding.

B. Competitive Moat

Full-duplex speech model — handles interruptions and overlapping speech naturally
Neural audio codec (Mimi) + speech-text foundation model — unique architecture
€300M research funding from Kyutai (Xavier Niel, Rodolphe Saadé) — long-term R&D commitment

Vulnerability: CC-BY-NC 4.0 license restricts commercial use. Non-profit research lab model — commercial monetization path unclear.

E. Strategic Questions for DigiDouble

Sovereignty fit

French research lab, EU-aligned values, self-hostable on Swiss/EU infrastructure. CC-BY-NC limits commercial use but research/prototype use is free.

Build vs. Buy

Use for research/prototype (Phase 1). For Phase 2 commercial deployment, negotiate CC-BY-NC commercial license or switch to Apache 2.0 alternatives.

Lock-in risk

Open-source with CC-BY-NC — zero vendor lock-in for non-commercial use. Commercial deployment requires license negotiation with Kyutai.

Roadmap alignment

Excellent for research and Phase 1 prototype. Phase 2 commercial deployment requires license clarification. Scaleway EU partnership helps.

Data Freshness

Updated 30 April 2026

Kyutai blog, Jul 2025

Update note: Kyutai TTS 1.6B released Jul 2025. CC-BY 4.0. Delayed streams modeling — streaming-native with timestamps. Part of Moshi S2S system. Self-hosted on GPU.

Reference Sources

Kyutai GitHub (docs)
Kyutai Blog (news)