Commercial | Sovereignty 1/5

Tavus (Phoenix-4 + Raven-1)

Conversational Video Interface with full emotional intelligence stack — Raven-1 perception + Phoenix-4 rendering + Sparrow-1 turn-taking

TTFR Latency

~500ms

real-time

Cost / minute

$0.320/min

real-time

Visual Quality

10/10

estimated score

Protocols

WebRTC, REST, Daily SDK

Avatar Customisation

RAG / Knowledge Base

Native RAG via Persona: provide document_ids or document_tags. Avatar queries knowledge base in real-time during conversation.
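A minimal sketch of how the knowledge-base link might be assembled into a Persona request body. Only the `system_prompt`, `document_ids` and `document_tags` field names come from this page; the helper name and sample values are hypothetical, and the exact request shape should be confirmed against the Tavus API reference before use.

```python
from typing import Optional


def build_persona_payload(
    system_prompt: str,
    document_ids: Optional[list[str]] = None,
    document_tags: Optional[list[str]] = None,
) -> dict:
    """Assemble a Persona request body (hypothetical helper, not an SDK call)."""
    if not document_ids and not document_tags:
        raise ValueError("provide document_ids or document_tags to enable RAG")
    payload: dict = {"system_prompt": system_prompt}
    # Only include the knowledge-base keys that were actually supplied.
    if document_ids:
        payload["document_ids"] = list(document_ids)
    if document_tags:
        payload["document_tags"] = list(document_tags)
    return payload


# Sample values are placeholders, not real document identifiers.
payload = build_persona_payload(
    "You are a concise onboarding guide.",
    document_tags=["onboarding"],
)
```

Keeping the payload builder separate from the HTTP call makes it easy to unit-test which documents a persona can reach.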

Behavior & Personality

Persona system_prompt + Objectives (guided conversational goals) + Guardrails (strict behavioral limits). 30+ languages.

Body Language & Gestures

Phoenix-4 (2026): generates full head, hair, eyes, pose and expression from scratch every frame — no pre-recorded video loops. Active listening: nods, tilts, microexpressions react to what the user says in real time.

Facial Expressions

Phoenix-4 emotional intelligence: smooth transitions between emotional states, emergent microexpressions, no brute-force emotion. Raven-1 perception layer feeds emotional context to Phoenix-4 in real time (context freshness < 300ms).

Voice & Voice Cloning

Tone and accent customisation. 30+ languages. Echo mode: drive avatar with external audio stream.

Persona Fine-Tuning

Objectives + Guardrails system allows fine-grained persona control. Text Respond mode for scripted interactions.

Avatar Training

Video required: Yes
Duration: 2 minutes (1 min speech + 1 min neutral listening)
Resolution: 1080p minimum, 4K recommended
Format: MP4 (H.264/AAC) or WebM, 25 fps min
Consent required: Yes (mandatory)
Processing time: 4–5 hours
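The requirements above can be checked before uploading a clip. The sketch below is illustrative only: the `TrainingVideo` type and `check_training_video` function are hypothetical, the metadata is assumed to have been extracted elsewhere, and the 2-minute duration is treated as a minimum, which is an assumption.

```python
from dataclasses import dataclass


@dataclass
class TrainingVideo:
    duration_s: float        # total length in seconds
    height_px: int           # vertical resolution, e.g. 1080 or 2160
    fps: float
    container: str           # "mp4" or "webm"
    has_verbal_consent: bool


def check_training_video(v: TrainingVideo) -> list[str]:
    """Return a list of problems; an empty list means the clip meets the listed specs."""
    problems = []
    if v.duration_s < 120:
        problems.append("duration below 2 minutes")
    if v.height_px < 1080:
        problems.append("resolution below 1080p")
    if v.fps < 25:
        problems.append("frame rate below 25 fps")
    if v.container.lower() not in {"mp4", "webm"}:
        problems.append("container must be MP4 or WebM")
    if not v.has_verbal_consent:
        problems.append("verbal consent declaration missing")
    return problems


clip = TrainingVideo(duration_s=118, height_px=2160, fps=30,
                     container="MP4", has_verbal_consent=True)
print(check_training_video(clip))  # ['duration below 2 minutes']
```

Rejecting a clip locally is cheap compared with discovering a problem after the 4–5 hour processing window.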

Best Practices

  • 01. 1 min continuous speech (clear articulation, teeth visible)
  • 02. 1 min neutral listening (closed mouth, no expression)
  • 03. Waist-up, seated, ~1m from camera
  • 04. Diffuse lighting, static background
  • 05. Verbal consent declaration required

API Analysis

Protocols

REST, WebRTC (Daily), WebSocket

SDKs

JavaScript/React, Python
Webhooks: Yes

Concurrent Sessions

1 (Free) → 15+ (Growth) → unlimited (Enterprise)
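A client that opens more sessions than the plan allows will presumably be refused, so it can help to cap concurrency on the caller's side. The following is a sketch using a standard `asyncio.Semaphore`; the `PLAN_LIMITS` table uses 15 as a stand-in for "15+" on Growth, and the placeholder session does no real network work.

```python
import asyncio

# Limits taken from the tiers above; 15 stands in for "15+" on Growth.
PLAN_LIMITS = {"free": 1, "growth": 15}


async def run_sessions(plan: str, n_sessions: int) -> int:
    """Run n placeholder sessions and return the peak number active at once."""
    gate = asyncio.Semaphore(PLAN_LIMITS[plan])
    active = 0
    peak = 0

    async def session() -> None:
        nonlocal active, peak
        async with gate:               # wait for a free slot
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # stand-in for a real conversation
            active -= 1

    await asyncio.gather(*(session() for _ in range(n_sessions)))
    return peak


peak = asyncio.run(run_sessions("growth", 40))
print(peak)  # 15
```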

Rate Limits

S3 pre-signed URLs required for training media

Key Features

  • Raven-1 (2026): multimodal perception (audio-visual fusion of tone, expression, gaze and posture) with natural-language output for LLMs. Context freshness < 300ms; audio perception < 100ms.
  • Phoenix-4 (2026): fully generated face/hair/eyes/pose every frame — no video loops. Active listening behaviors. Smooth emotional transitions with microexpressions.
  • Sparrow-1: turn-taking model for natural conversation flow
  • Raven-1 tool calling: OpenAI-compatible schema, callbacks on user laughter, emotional thresholds, attention shifts
  • Echo mode: lip-sync on external audio stream
  • Text Respond: generate response from text input
  • Cerebras chip integration for ultra-fast LLM inference
  • Webhooks for training completion and conversation state
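To make the "OpenAI-compatible schema" item concrete, here is a sketch of a tool definition in the OpenAI function-calling format, plus a small handler for the resulting call. The tool name, its parameters and the handler are hypothetical; how Raven-1 registers tools and delivers callbacks is not described on this page and should be taken from the vendor documentation.

```python
import json

# Hypothetical tool: the name, description and parameters are illustrative only.
flag_confusion_tool = {
    "type": "function",
    "function": {
        "name": "flag_user_confusion",
        "description": "Record that the user appears confused so the agent can slow down.",
        "parameters": {
            "type": "object",
            "properties": {
                "confidence": {"type": "number", "minimum": 0, "maximum": 1},
                "cue": {"type": "string", "enum": ["expression", "tone", "gaze"]},
            },
            "required": ["confidence", "cue"],
        },
    },
}


def handle_tool_call(name: str, arguments_json: str) -> str:
    """Dispatch one tool call; arguments arrive as a JSON string in this format."""
    if name != flag_confusion_tool["function"]["name"]:
        raise ValueError(f"unknown tool: {name}")
    args = json.loads(arguments_json)
    # Assumed policy: only act on reasonably confident detections.
    if args["confidence"] >= 0.7:
        return f"slow down (cue: {args['cue']})"
    return "no action"


print(handle_tool_call("flag_user_confusion", '{"confidence": 0.9, "cue": "gaze"}'))
# slow down (cue: gaze)
```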

API Constraints

  • Dependency on Daily for WebRTC layer
  • S3 pre-signed URLs required for media upload
  • 4–5h training time for custom replicas

Pricing Model

Model: Monthly subscription + pay-as-you-go
Plan | Price | Included minutes | Overage
Free | $0/mo | 25 min conversation | N/A
Starter | $59/mo | 100 min | $0.37/min
Growth | $397/mo | 1250 min | $0.32/min
Enterprise | Custom | Custom | Negotiated
Free tier
Cloud only
Enterprise pricing

Hidden costs / watch out

  • Replica training: $40–$65 per extra training
  • Video generation billed separately from conversation minutes
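The plan figures above can be turned into a rough monthly estimate. This sketch covers only the Starter and Growth conversation minutes listed in the table; it assumes overage is billed linearly per minute and ignores replica training fees and separately billed video generation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    monthly_fee: float      # USD per month
    included_min: int       # conversation minutes included
    overage_per_min: float  # USD per minute beyond the included quota


PLANS = {
    "starter": Plan(59.0, 100, 0.37),
    "growth": Plan(397.0, 1250, 0.32),
}


def monthly_cost(plan: Plan, minutes_used: int) -> float:
    """Subscription fee plus linear overage (an assumption about billing)."""
    extra = max(0, minutes_used - plan.included_min)
    return round(plan.monthly_fee + extra * plan.overage_per_min, 2)


def cheapest_plan(minutes_used: int) -> str:
    return min(PLANS, key=lambda name: monthly_cost(PLANS[name], minutes_used))


print(monthly_cost(PLANS["starter"], 500))  # 207.0
print(monthly_cost(PLANS["growth"], 2000))  # 637.0
print(cheapest_plan(1250))                  # growth
```

Under these assumptions Starter stays cheaper up to roughly 1,013 minutes a month, after which Growth wins. Growth's fee spread over its included minutes is $397 / 1250 ≈ $0.32/min, which matches the headline cost per minute.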

Sovereignty & Hosting

Sovereignty Score

1/5

Hosting

AWS US

GDPR

Yes

On-premise

No

Sovereignty detail

AWS US. SOC2 Type II + HIPAA (Growth+). No EU hosting.

Constraints & Limits

  • No manual control of specific hand/arm gestures
  • 4–5h training time for custom replicas
  • High-quality video required (1080p min, 4K recommended)
  • US hosting only
  • SOC2/HIPAA only on Growth+ plans
  • Mandatory verbal consent for personal replicas

GamiWays Relevance

Score

9/10

As of April 2026, Tavus is the most advanced commercial platform for emotional intelligence in conversational video avatars. The Raven-1 + Phoenix-4 + Sparrow-1 stack is the reference architecture for GamiWays's target capabilities. Raven-1's perception layer (audio-visual fusion, < 300ms context freshness) directly addresses GamiWays Axis 3 (Contextual Awareness). Phoenix-4's fully-generated rendering (no video loops, active listening behaviors) sets the quality benchmark. Main limitations for GamiWays: US-only hosting (GDPR sovereignty concern), high cost ($0.32/min), and no open-source equivalent available yet.