Anam.ai
One-shot photorealistic avatar with native RAG and 180ms latency
TTFR Latency
~180ms
real-time
Cost / minute
$0.120/min
real-time
Visual Quality
8/10
estimated score
Protocols
WebRTC, REST
Avatar Customisation
RAG / Knowledge Base
Native RAG (beta): upload PDF, MD, TXT documents. Avatar queries knowledge base in real-time. Available on Explorer+ plans.
Behavior & Personality
Structured 5-block prompt: Personality, Environment, Tone, Objectives, Guardrails. Fine-grained response style and politeness control.
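A minimal sketch of how the five blocks could be assembled into a single system prompt before a session starts. The `PersonaPrompt` class, the bracketed section labels, and the sample text are all hypothetical; this review does not document the exact format Anam.ai expects, so treat the layout as an assumption.

```python
from dataclasses import dataclass, fields


@dataclass
class PersonaPrompt:
    """Hypothetical container for the five prompt blocks named above."""
    personality: str
    environment: str
    tone: str
    objectives: str
    guardrails: str

    def render(self) -> str:
        # Emit one labelled section per block, in declaration order.
        sections = []
        for f in fields(self):
            text = getattr(self, f.name).strip()
            if not text:
                raise ValueError(f"{f.name} block is empty")
            sections.append(f"[{f.name.capitalize()}]\n{text}")
        return "\n\n".join(sections)


prompt = PersonaPrompt(
    personality="Patient tutor who explains step by step.",
    environment="One-to-one video lesson in a browser.",
    tone="Warm, polite, concise.",
    objectives="Help the learner solve the exercise themselves.",
    guardrails="Stay on the course topic; never give medical advice.",
)
print(prompt.render())
```

Keeping each block as a separate field makes it easy to swap one of them (for example Tone) per session without touching the rest.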
Body Language & Gestures
Cara-3 generates photorealistic body language and micro-movements dynamically based on conversation context. Not manually scriptable.
Facial Expressions
Photorealistic facial expressions auto-generated from conversation context. High fidelity emotional range.
Voice & Voice Cloning
Stability, clarity, speed adjustments. ElevenLabs integration for custom voice cloning (Professional+ plans only).
Persona Fine-Tuning
5-block structured prompt system enables deep persona definition. Session-level personality override available.
Avatar Training
Best Practices
- 01. Front-facing, well-lit photo
- 02. Neutral background
- 03. No obstructions
- 04. Or: text-to-avatar generation (no photo needed)
- 05. Custom voice: ElevenLabs audio samples (few minutes)
API Analysis
Protocols
WebRTC, REST
SDKs
Concurrent Sessions
Unlimited (Growth+)
Rate Limits
Session duration: 3–10 min (lower plans) → unlimited (Growth+)
Key Features
- 180ms median server latency
- 25fps video output
- Custom LLM support (OpenAI-compatible endpoints or client-side)
- Voice activity detection: sensitivity and silence controls
- Multilingual support
- Tool calling support
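To put the two headline numbers side by side, the arithmetic below converts the 25fps output rate into a frame interval and expresses the 180ms median server latency in frames. The `perceived_latency_ms` helper is an assumption: it simply adds one network round trip to the server figure, which this review does not confirm as the right model.

```python
FPS = 25                  # video output rate listed above
SERVER_LATENCY_MS = 180   # median server latency listed above


def frame_interval_ms(fps: int) -> float:
    """Time between consecutive video frames, in milliseconds."""
    return 1000 / fps


def frames_during(latency_ms: float, fps: int) -> float:
    """How many frame intervals fit into a given latency."""
    return latency_ms / frame_interval_ms(fps)


def perceived_latency_ms(network_rtt_ms: float,
                         server_ms: float = SERVER_LATENCY_MS) -> float:
    """Rough estimate (assumption): server latency plus one network round trip."""
    return server_ms + network_rtt_ms


print(frame_interval_ms(FPS))                  # 40.0
print(frames_during(SERVER_LATENCY_MS, FPS))   # 4.5
```

At 25fps the server-side delay is therefore about four and a half frames; client network conditions come on top of that.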
API Constraints
- RAG still in beta
- Session duration limited on lower plans
- Custom voice cloning requires Professional ($999/mo)
- No webhook support
Pricing Model
| Plan | Price | Included minutes | Overage |
|---|---|---|---|
| Free | $0/mo | 30 min | N/A |
| Starter | $12/mo | 50 min | $0.16/min |
| Explorer | $49/mo | 250 min | $0.14/min |
| Growth | $299/mo | 2000 min | $0.12/min |
| Professional | $999/mo | 5000 min | $0.11/min |
| Enterprise | Custom | Unlimited | Custom |
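The table can be turned into a quick monthly cost estimate. The sketch below assumes overage is billed linearly per minute beyond the included allowance, which the table suggests but does not state outright. Free (no overage) and Enterprise (custom pricing) are left out, and feature gates such as session duration limits are ignored. The function names are illustrative.

```python
from typing import Tuple

# (monthly fee in USD, included minutes, overage in USD per minute), from the table above.
PLANS = {
    "Starter": (12.0, 50, 0.16),
    "Explorer": (49.0, 250, 0.14),
    "Growth": (299.0, 2000, 0.12),
    "Professional": (999.0, 5000, 0.11),
}


def monthly_cost(plan: str, minutes: int) -> float:
    """Fee plus overage for the given usage, rounded to cents."""
    fee, included, overage = PLANS[plan]
    extra_minutes = max(0, minutes - included)
    return round(fee + extra_minutes * overage, 2)


def cheapest_plan(minutes: int) -> Tuple[str, float]:
    """Plan with the lowest estimated bill for the given usage."""
    costs = ((name, monthly_cost(name, minutes)) for name in PLANS)
    return min(costs, key=lambda pair: pair[1])


print(cheapest_plan(1000))   # ('Explorer', 154.0)
print(cheapest_plan(3000))   # ('Growth', 419.0)
```

On price alone, Explorer with overage stays cheaper than Growth until roughly 2,250 minutes per month, so the move to Growth is more likely to be driven by its unlimited session duration and concurrency than by the per-minute rate.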
Hidden costs / watch out
- Custom voice cloning: Professional plan only ($999/mo)
- RAG feature: Explorer+ only
- Watermark on Free and Starter plans
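Because several features are gated by tier, it can help to work out the lowest plan that covers a given feature set. The mapping below is a sketch built only from statements in this review. The feature keys and the function name are hypothetical, and "no_watermark" is inferred from the watermark applying to Free and Starter only.

```python
PLAN_ORDER = ["Free", "Starter", "Explorer", "Growth", "Professional", "Enterprise"]

# Lowest tier at which each capability is listed in this review.
MIN_PLAN = {
    "rag": "Explorer",                    # RAG: Explorer+ only
    "no_watermark": "Explorer",           # inferred: watermark on Free and Starter
    "unlimited_sessions": "Growth",       # unlimited concurrency/duration: Growth+
    "voice_cloning": "Professional",      # custom voice cloning: Professional+
    "zero_data_retention": "Enterprise",  # Zero Data Retention option: Enterprise
}


def lowest_plan_for(features) -> str:
    """Return the lowest tier that includes every requested feature."""
    highest = max((PLAN_ORDER.index(MIN_PLAN[f]) for f in features), default=0)
    return PLAN_ORDER[highest]


print(lowest_plan_for(["rag", "no_watermark"]))    # Explorer
print(lowest_plan_for(["rag", "voice_cloning"]))   # Professional
```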
Sovereignty & Hosting
Sovereignty Score
Hosting
Cloud (AWS/GCP). Zero Data Retention option for Enterprise.
GDPR
Yes
On-premise
No
Sovereignty detail
HIPAA and SOC 2 certified. Zero Data Retention option for Enterprise. Cloud-based (AWS/GCP).
Constraints & Limits
- No manual control of specific gestures or posture
- RAG in beta (experimental)
- Custom voice cloning requires $999/mo plan
- Session duration limited on lower tiers
- No on-premise standard option
DigiDouble Relevance
Score
9/10
Fastest median latency (180ms) among commercial platforms. One-shot avatar creation is ideal for rapid prototyping. Native RAG (beta) and the structured persona system align well with DigiDouble's educational use case. Limitation: custom voice cloning requires the Professional plan ($999/mo).