Tavus (Phoenix-4 + Raven-1)
Conversational Video Interface with a full emotional-intelligence stack: Raven-1 perception, Phoenix-4 rendering, and Sparrow-1 turn-taking
TTFR Latency
~500ms
real-time
Cost / minute
$0.32/min
real-time
Visual Quality
10/10
estimated score
Protocols
WebRTC, REST, Daily SDK
Avatar Customisation
RAG / Knowledge Base
Native RAG via Persona: provide document_ids or document_tags. Avatar queries knowledge base in real-time during conversation.
Behavior & Personality
Persona system_prompt + Objectives (guided conversational goals) + Guardrails (strict behavioral limits). 30+ languages.
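
A minimal sketch of how the Persona fields named above might be assembled into a request body. Only `system_prompt`, `document_ids`, and `document_tags` come from the description above; the flat payload layout, the helper name `build_persona_payload`, and the sample values are assumptions for illustration, not the documented Tavus API. No request is sent.

```python
import json
from typing import Optional, Sequence


def build_persona_payload(
    system_prompt: str,
    document_ids: Optional[Sequence[str]] = None,
    document_tags: Optional[Sequence[str]] = None,
) -> dict:
    """Assemble a hypothetical Persona body; the layout is an assumption."""
    if not system_prompt.strip():
        raise ValueError("system_prompt must not be empty")
    payload: dict = {"system_prompt": system_prompt}
    # RAG is optional: attach knowledge-base references only when given.
    if document_ids:
        payload["document_ids"] = list(document_ids)
    if document_tags:
        payload["document_tags"] = list(document_tags)
    return payload


if __name__ == "__main__":
    body = build_persona_payload(
        "You are a patient onboarding guide.",  # sample prompt, hypothetical
        document_tags=["onboarding"],           # sample tag, hypothetical
    )
    print(json.dumps(body, indent=2))
```

Objectives and Guardrails are left out of the sketch because their field names are not given here.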
Body Language & Gestures
Phoenix-4 (2026): generates full head, hair, eyes, pose and expression from scratch every frame — no pre-recorded video loops. Active listening: nods, tilts, microexpressions react to what the user says in real time.
Facial Expressions
Phoenix-4 emotional intelligence: smooth transitions between emotional states, emergent microexpressions, no brute-force emotion. Raven-1 perception layer feeds emotional context to Phoenix-4 in real time (context freshness < 300ms).
Voice & Voice Cloning
Tone and accent customisation. 30+ languages. Echo mode: drive avatar with external audio stream.
Persona Fine-Tuning
Objectives + Guardrails system allows fine-grained persona control. Text Respond mode for scripted interactions.
Avatar Training
Best Practices
- 01. 1 min continuous speech (clear articulation, teeth visible)
- 02. 1 min neutral listening (closed mouth, no expression)
- 03. Waist-up, seated, ~1m from camera
- 04. Diffuse lighting, static background
- 05. Verbal consent declaration required
API Analysis
Protocols
SDKs
Concurrent Sessions
1 (Free) → 15+ (Growth) → unlimited (Enterprise)
Rate Limits
S3 pre-signed URLs required for training media
Key Features
- Raven-1 (2026): multimodal perception. Audio-visual fusion of tone, expression, gaze, and posture, emitted as natural-language output for LLMs. Context freshness < 300ms; audio perception < 100ms.
- Phoenix-4 (2026): fully generated face/hair/eyes/pose every frame — no video loops. Active listening behaviors. Smooth emotional transitions with microexpressions.
- Sparrow-1: turn-taking model for natural conversation flow
- Raven-1 tool calling: OpenAI-compatible schema, callbacks on user laughter, emotional thresholds, attention shifts
- Echo mode: lip-sync on external audio stream
- Text Respond: generate response from text input
- Cerebras chip integration for ultra-fast LLM inference
- Webhooks for training completion and conversation state
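
The "OpenAI-compatible schema" mentioned for Raven-1 tool calling can be illustrated with a tool definition in the OpenAI function-tool shape. This is a sketch under assumptions: the tool name `on_user_laughter`, its `intensity` parameter, and the `is_function_tool` helper are hypothetical, and how Tavus registers or triggers such a tool is not shown.

```python
# Hypothetical tool in the OpenAI function-tool shape:
# {"type": "function", "function": {name, description, parameters}}
LAUGHTER_TOOL = {
    "type": "function",
    "function": {
        "name": "on_user_laughter",  # hypothetical name
        "description": "Called when the perception layer detects laughter.",
        "parameters": {
            "type": "object",  # parameters are described with JSON Schema
            "properties": {
                "intensity": {
                    "type": "number",
                    "description": "Assumed 0.0 to 1.0 laughter strength.",
                },
            },
            "required": ["intensity"],
        },
    },
}


def is_function_tool(tool: dict) -> bool:
    """Shallow structural check; stricter than needed (requires parameters)."""
    fn = tool.get("function")
    if tool.get("type") != "function" or not isinstance(fn, dict):
        return False
    name = fn.get("name")
    params = fn.get("parameters", {})
    return (
        isinstance(name, str)
        and bool(name)
        and isinstance(params, dict)
        and params.get("type") == "object"
    )
```

A check like this is only a guard against malformed definitions before they are sent; it does not validate the nested JSON Schema itself.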
API Constraints
- Dependency on Daily for WebRTC layer
- S3 pre-signed URLs required for media upload
- 4–5h training time for custom replicas
Pricing Model
| Plan | Price | Included minutes | Overage |
|---|---|---|---|
| Free | $0/mo | 25 min conversation | N/A |
| Starter | $59/mo | 100 min | $0.37/min |
| Growth | $397/mo | 1,250 min | $0.32/min |
| Enterprise | Custom | Custom | Negotiated |
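
To compare plans for a given usage level, the table can be turned into a small cost function. The sketch below uses only the figures in the table and assumes overage is billed linearly per minute beyond the included quota; Enterprise is omitted because its terms are custom, and the Free plan is treated as having no overage option.

```python
# Figures copied from the pricing table; linear overage billing is an assumption.
PLANS = {
    "Free": {"base": 0.0, "included": 25, "overage": None},
    "Starter": {"base": 59.0, "included": 100, "overage": 0.37},
    "Growth": {"base": 397.0, "included": 1250, "overage": 0.32},
}


def monthly_cost(plan: str, minutes: int) -> float:
    """Return the estimated monthly bill in USD for `minutes` of conversation."""
    p = PLANS[plan]
    extra = max(0, minutes - p["included"])
    if extra and p["overage"] is None:
        raise ValueError(f"{plan} plan has no overage; {minutes} min exceeds quota")
    return round(p["base"] + extra * (p["overage"] or 0.0), 2)


def cheapest_plan(minutes: int) -> str:
    """Pick the lowest-cost plan that can serve `minutes`."""
    costs = {}
    for name in PLANS:
        try:
            costs[name] = monthly_cost(name, minutes)
        except ValueError:
            continue  # plan cannot cover this usage
    return min(costs, key=costs.get)
```

For example, `monthly_cost("Starter", 150)` is 59 + 50 × 0.37 = 77.50, and a fully used Growth quota works out to 397 / 1250 ≈ $0.318 per minute, in line with the headline $0.32/min.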
Hidden costs / watch out
- Replica training: $40–$65 per additional replica trained
- Video generation billed separately from conversation minutes
Sovereignty & Hosting
Sovereignty Score
Hosting
AWS US
GDPR
Yes
On-premise
No
Sovereignty detail
AWS US. SOC2 Type II + HIPAA (Growth+). No EU hosting.
Constraints & Limits
- No manual control of specific hand/arm gestures
- 4–5h training time for custom replicas
- High-quality video required (1080p min, 4K recommended)
- US hosting only
- SOC2/HIPAA only on Growth+ plans
- Mandatory verbal consent for personal replicas
GamiWays Relevance
Score
9/10
As of April 2026, Tavus is the most advanced commercial platform for emotional intelligence in conversational video avatars. The Raven-1 + Phoenix-4 + Sparrow-1 stack is the reference architecture for GamiWays's target capabilities. Raven-1's perception layer (audio-visual fusion, < 300ms context freshness) directly addresses GamiWays Axis 3 (Contextual Awareness). Phoenix-4's fully-generated rendering (no video loops, active listening behaviors) sets the quality benchmark. Main limitations for GamiWays: US-only hosting (GDPR sovereignty concern), high cost ($0.32/min), and no open-source equivalent available yet.