DigiDouble
A conversational avatar of any person, built from their existing video archives. Personalized, sovereign, real-time.
This portal documents the state of the art, research gaps, and technology choices for the DigiDouble project. It serves as a reference for industrial partners, Innosuisse evaluators, and IDIAP researchers. It also provides independent information on voice synthesis and video avatar challenges.
Speech Synthesis & Recognition
Challenges independent of the DigiDouble project
TTFB (time to first byte) target for first TTS audio
TTS engines evaluated (cloud + open-source)
cost per minute depending on TTS engine
STT architectures compared (Nova-3, Whisper, Inworld)
The Voice Pipeline section provides an independent analysis of the TTS/STT ecosystem: technical comparisons, latency benchmarks, cost models, and stack recommendations. This information is useful for any conversational AI project, not only DigiDouble.
Streaming Video Avatars
Challenges independent of the DigiDouble project
video avatar platforms compared
latency achieved by Simli Trinity-1
cost per minute depending on platform
actor covers multi-style criterion (LemonSlice)
The Video Avatars section provides an independent analysis of the streaming video avatar ecosystem: platform comparisons, an interactive cost simulator, and business and market challenges. This information is useful for any project integrating a conversational avatar.
Explore the portal
Three independent sections. Each can be read on its own or alongside the others.
DigiDouble — Project & Research
Product vision, target architecture, Innosuisse research axes, identified gaps, and academic assessment.
Speech Synthesis & Recognition
Complete comparison of TTS and STT engines, latency benchmarks, recommended stack, and Phase 1 pipeline.
Streaming Video Avatars
Comparison of commercial platforms, cost simulator, business challenges, behavior, and emotional design.