
Open Source · CC-BY 4.0

Moshi (Kyutai)

Full-duplex spoken dialogue — simultaneous listening and speaking


200ms

TTFA (time to first audio, best case)

500ms

TTFA (typical)

Free

Price per million chars

—

ELO Score

Comparative Scores

Voice quality: 7/10

Latency: 8/10

Voice cloning: 1/10

Expressiveness: 6/10

Sovereignty: 10/10

Price accessibility: 10/10

Multilingual: 1/10

Architecture

Architecture: Speech-text foundation model + Mimi codec (full-duplex)

Parameters: 7B

Languages: 1 (English)

Self-hostable: Yes

Streaming: Yes

DigiDouble

Axis 1 R&D: Full-duplex conversation

Key reference for Axis 1 R&D (full-duplex conversation). Full-duplex capability is the long-term goal for DigiDouble because it enables natural interruption handling. The CC-BY 4.0 license enables sovereign deployment. Not suitable for the Phase 1 MVP; evaluate for Axis 1 advanced research.

Analysis

Moshi is a full-duplex speech-text foundation model from Kyutai, a French AI lab. Unlike turn-based systems, it listens and speaks simultaneously. It uses the Mimi streaming neural audio codec, and its CC-BY 4.0 license enables sovereign deployment. It serves as a reference for full-duplex conversation research.
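
To make the contrast with turn-based systems concrete, here is a toy frame-by-frame simulation. It is only an illustrative sketch: the function names, the fixed utterance length, and the "stop talking as soon as the user is heard" rule are hypothetical simplifications, not Moshi's learned behavior or its API.

```python
from typing import List


def turn_based(user_speaking: List[bool], utterance_len: int) -> List[str]:
    """Agent ignores incoming audio until its own utterance is finished."""
    out = []
    for t, _heard in enumerate(user_speaking):
        # Input is not consulted while the agent holds the floor.
        out.append("speech" if t < utterance_len else "silence")
    return out


def full_duplex(user_speaking: List[bool], utterance_len: int) -> List[str]:
    """Agent reads one input frame AND emits one output frame at every step."""
    out = []
    interrupted = False
    for t, heard in enumerate(user_speaking):
        # Hypothetical policy: yield the floor once overlapping speech is heard.
        interrupted = interrupted or heard
        talking = t < utterance_len and not interrupted
        out.append("speech" if talking else "silence")
    return out


# The user starts talking at frame 2, while the agent is mid-utterance.
user = [False, False, True, True, False, False]
print("turn-based: ", turn_based(user, utterance_len=5))
print("full-duplex:", full_duplex(user, utterance_len=5))
```

In the turn-based variant the agent talks over the user for three more frames; in the full-duplex variant it falls silent on the same frame in which the interruption arrives.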

Strengths

Full-duplex: simultaneous listen + speak
CC-BY 4.0 — commercial sovereign deployment
Mimi streaming codec
From Kyutai (European AI lab)
Research reference for full-duplex

Weaknesses

English only
Requires A100 for real-time
No voice cloning
Research-grade stability

Voice Capabilities

Voice Cloning: No

No voice cloning. Fixed voice output.

Emotion Control: No

Natural prosody from end-to-end training. No explicit emotion control.

Streaming: Yes

Full-duplex: simultaneous listening and speaking. Streaming neural audio codec (Mimi). Real-time capable on A100.
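
The 160ms latency figure quoted elsewhere in this profile can be related to the codec's frame rate with a little arithmetic. The sketch below assumes figures from Kyutai's published description of Mimi and Moshi (24 kHz audio, 12.5 frames per second, one frame of acoustic delay); none of these numbers appear on this page, so treat them as assumptions, and note that measured latency on real hardware is higher than this theoretical floor.

```python
# Assumed figures (not stated in this profile): taken from Kyutai's
# published description of the Mimi codec and the Moshi model.
SAMPLE_RATE_HZ = 24_000      # audio sample rate
FRAME_RATE_HZ = 12.5         # codec frames per second
ACOUSTIC_DELAY_FRAMES = 1    # delay between text and audio streams


def frame_duration_ms(frame_rate_hz: float) -> float:
    """Duration of one codec frame in milliseconds."""
    return 1000.0 / frame_rate_hz


def samples_per_frame(sample_rate_hz: int, frame_rate_hz: float) -> int:
    """Number of raw audio samples covered by one codec frame."""
    return round(sample_rate_hz / frame_rate_hz)


def theoretical_latency_ms(frame_rate_hz: float, delay_frames: int) -> float:
    """One frame must be buffered before encoding, plus the acoustic delay."""
    return frame_duration_ms(frame_rate_hz) * (1 + delay_frames)


print(frame_duration_ms(FRAME_RATE_HZ))                       # 80.0
print(samples_per_frame(SAMPLE_RATE_HZ, FRAME_RATE_HZ))       # 1920
print(theoretical_latency_ms(FRAME_RATE_HZ, ACOUSTIC_DELAY_FRAMES))  # 160.0
```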

Lip-sync Data: No

No native lip-sync data.

Pricing

Price / 1M chars

Free

Price / minute

Free

Free tier

Free (open weights)

Open weights — self-hosting cost only.
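
Since the only cost is self-hosting, a per-minute figure depends entirely on GPU pricing and on how many conversations one GPU serves at once. The helper below is a rough sketch of that arithmetic; the function name, the hourly rate, and the concurrency value are hypothetical placeholders, not measured or quoted numbers.

```python
def cost_per_audio_minute(gpu_hourly_rate: float, concurrent_sessions: int) -> float:
    """Approximate hosting cost for one minute of one conversation.

    gpu_hourly_rate: price of the GPU instance per hour (any currency).
    concurrent_sessions: real-time conversations sharing that GPU.
    """
    if concurrent_sessions < 1:
        raise ValueError("concurrent_sessions must be at least 1")
    return gpu_hourly_rate / 60.0 / concurrent_sessions


# Hypothetical inputs: replace with a real quote and a measured concurrency.
example_rate = 3.0      # currency units per GPU-hour (placeholder)
example_sessions = 2    # simultaneous conversations per GPU (placeholder)
print(f"{cost_per_audio_minute(example_rate, example_sessions):.3f} per minute")
```

This ignores idle time, storage, and engineering effort, so it gives a lower bound rather than a full cost of ownership.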

Sovereignty & Compliance

On-premise: Yes

Full self-hosting under CC-BY 4.0. Commercial use allowed.

GDPR: Compliant

Data residency: Fully local when self-hosted.

Strategic & Business Analysis

Moshi (Kyutai) — Strategic Positioning

Beyond technical specs: where does this tool sit in the ecosystem, what are the risks and strategic implications for DigiDouble?

Moshi is Kyutai's open-source model for real-time full-duplex voice AI: 160ms theoretical latency (about 200ms in practice), developed by a French research lab, with weights released under CC-BY 4.0. It is available today as a research reference for natural voice interaction.

Open-source / self-hosted

Lock-in risk: Low

Sovereignty fit: High

Open-source threat: Low

Pricing: Stable

A. Strategic Positioning

Target customer: Researcher / Developer — real-time duplex voice, French lab

Kyutai's full-duplex real-time voice AI handles interruptions, overlapping speech, and natural conversation flow at 160ms theoretical latency.

B. Competitive Moat

Full-duplex speech — handles interruptions and overlapping speech without turn-taking
160ms theoretical latency (about 200ms in practice), competitive with commercial real-time voice solutions
€300M Kyutai research backing — long-term open-source commitment

Vulnerability: Research model with uncertain production readiness and enterprise support. The CC-BY 4.0 weights permit commercial use but require attribution.

E. Strategic Questions for DigiDouble

Sovereignty fit

French lab, EU-aligned, self-hostable. CC-BY 4.0 permits commercial as well as research/prototype use, with attribution, and deployment stays sovereign.

Build vs. Buy

Use for research/prototype (Phase 1). For Phase 2 commercial use, the CC-BY 4.0 weights need no license negotiation; keep permissively licensed alternatives (Ultravox, Chatterbox) as a fallback if production stability falls short.

Lock-in risk

Open weights under CC-BY 4.0: zero vendor lock-in for both non-commercial and commercial deployment, subject to attribution.

Roadmap alignment

Excellent for research and Phase 1 prototyping. The CC-BY 4.0 weights pose no licensing obstacle to Phase 2 commercial deployment; research-grade stability is the main risk.


Data Freshness

Updated 30 April 2026

Kyutai GitHub + research blog, 2024–2025

Update note: Moshi released Sep 2024 by Kyutai. CC-BY 4.0. Full-duplex S2S with inner monologue. 7B params. Self-hosted on GPU.

Reference Sources

Moshi GitHub (docs)
Kyutai Research Blog (news)
HuggingFace Moshi (docs)