Custom Voice Tool Ranking
Weight criteria to your context and get a dynamic TTS / STT ranking.
TTS Ranking — 16 tools
Sorted by weighted score
Inworld TTS-1.5 + Realtime API
#1 quality benchmark — ELO 1160, sub-120ms Mini, Realtime S2S + STT + LLM Router
Orpheus 3B
LLM-based TTS — ultra-natural speech with emotion tags and non-verbals
Voxtral TTS (Mistral)
Open-weights TTS from Mistral — fast, adaptable, 9 languages (Mar 2026)
ElevenLabs v3
Industry reference — 380+ voices, 70+ languages, emotional range
Chatterbox (Resemble AI)
MIT license — beats ElevenLabs in blind tests (63.75% preference)
Sesame CSM
Conversational Speech Model — crosses the uncanny valley of voice
Dia (Nari Labs)
Ultra-realistic dialogue generation — multi-speaker, emotion, non-verbals
Kyutai TTS 1.6B
Delayed streams modeling — streaming-native, timestamps, batching
Cartesia Sonic 3
Fastest TTFA on the market — 40ms, State Space Model architecture
Hume AI Octave 2
LLM-based emotional TTS — natural language emotion control
Fish Audio OpenAudio S1
Pay-as-you-go voice cloning — 70% cheaper than ElevenLabs
Kokoro 82M v1.0
Highest-ranked open-weight TTS — ELO 1059, 82M params, Apache 2.0
Ultravox v0.5
Speech-to-speech model — ~100ms latency, no ASR/TTS pipeline needed
Moshi (Kyutai)
Full-duplex spoken dialogue — simultaneous listening and speaking
OpenAI Realtime API
GPT-4o speech-to-speech — integrated LLM + voice, WebSocket
Deepgram Aura 2
Ultra-low latency TTS optimized for voice agents — <100ms
Methodology: Raw scores (1–10) are sourced from public benchmarks (Artificial Analysis ELO, Koenecke WER, measured TTFA). Each tool's final score is the weighted average of its raw scores, using the weight you assign to each criterion. Sovereignty and lock-in badges come from the DigiDouble strategic analysis.
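The weighted-average step in the methodology can be sketched roughly as follows. This is a minimal illustration, not the page's actual implementation: the criterion names (`quality`, `latency`), the tool names, and every score below are made-up placeholders, and the function names are hypothetical.

```python
def weighted_score(scores, weights):
    """Weighted average of raw 1-10 scores.

    scores:  criterion -> raw score for one tool
    weights: criterion -> user-chosen weight (missing criteria count as 0)
    """
    total_weight = sum(weights.get(c, 0.0) for c in scores)
    if total_weight <= 0:
        raise ValueError("at least one scored criterion needs a positive weight")
    weighted_sum = sum(scores[c] * weights.get(c, 0.0) for c in scores)
    return weighted_sum / total_weight


def rank(tools, weights):
    """Return tool names sorted by weighted score, best first."""
    return sorted(
        tools,
        key=lambda name: weighted_score(tools[name], weights),
        reverse=True,
    )


# Placeholder data (assumption): two fictional tools, two criteria.
tools = {
    "Tool A": {"quality": 9, "latency": 6},
    "Tool B": {"quality": 7, "latency": 9},
}

quality_first = {"quality": 3, "latency": 1}
latency_first = {"quality": 1, "latency": 3}

print(rank(tools, quality_first))  # ['Tool A', 'Tool B']
print(rank(tools, latency_first))  # ['Tool B', 'Tool A']
```

Dividing by the sum of the weights keeps the result on the same 1-10 scale as the raw scores, so changing the weights reorders the tools without inflating their scores.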