Back/Anam.ai
commercialSovereignty 3/5

Anam.ai

One-shot photorealistic avatar with native RAG and 180ms latency

TTFR Latency

~180ms

real-time

Cost / minute

$0.120/min

real-time

Visual Quality

8/10

estimated score

Protocols

WebRTC, REST

Avatar Customisation

RAG / Knowledge Base

Native RAG (beta): upload PDF, MD, TXT documents. Avatar queries knowledge base in real-time. Available on Explorer+ plans.

Behavior & Personality

Structured 5-block prompt: Personality, Environment, Tone, Objectives, Guardrails. Fine-grained response style and politeness control.

Body Language & Gestures

Cara-3 generates photorealistic body language and micro-movements dynamically based on conversation context. Not manually scriptable.

Facial Expressions

Photorealistic facial expressions auto-generated from conversation context. High fidelity emotional range.

Voice & Voice Cloning

Stability, clarity, speed adjustments. ElevenLabs integration for custom voice cloning (Professional+ plans only).

Persona Fine-Tuning

5-block structured prompt system enables deep persona definition. Session-level personality override available.

Avatar Training

Video required No (image sufficient)
DurationSingle image (One-Shot Avatar)
ResolutionHigh-quality photo, front-facing, neutral background
FormatJPEG / PNG
Consent required No
Processing timeUnder 2 minutes

Best Practices

  • 01.Front-facing, well-lit photo
  • 02.Neutral background
  • 03.No obstructions
  • 04.Or: text-to-avatar generation (no photo needed)
  • 05.Custom voice: ElevenLabs audio samples (few minutes)

API Analysis

Protocols

RESTWebRTC

SDKs

JavaScript/TypeScriptPython
Webhooks No

Concurrent Sessions

Unlimited (Growth+)

Rate Limits

Session duration: 3–10 min (lower plans) → unlimited (Growth+)

Key Features

  • 180ms median server latency
  • 25fps video output
  • Custom LLM support (OpenAI-compatible endpoints or client-side)
  • Voice activity detection: sensitivity and silence controls
  • Multilingual support
  • Tool calling support

API Constraints

  • RAG still in beta
  • Session duration limited on lower plans
  • Custom voice cloning requires Professional ($999/mo)
  • No webhook support

Pricing Model

Model: Monthly subscription + per-second overage
PlanPriceIncluded minutesOverage
Free$0/mo30 minN/A
Starter$12/mo50 min$0.16/min
Explorer$49/mo250 min$0.14/min
Growth$299/mo2000 min$0.12/min
Professional$999/mo5000 min$0.11/min
EnterpriseCustomUnlimitedCustom
Free tier
Cloud only
Enterprise pricing

Hidden costs / watch out

  • Custom voice cloning: Professional plan only ($999/mo)
  • RAG feature: Explorer+ only
  • Watermark on Free and Starter plans

Sovereignty & Hosting

Sovereignty Score

3/5

Hosting

Cloud (AWS/GCP). Zero Data Retention option for Enterprise.

GDPR

Yes

On-premise

No

Sovereignty detail

HIPAA + SOC-II certified. Zero Data Retention option for Enterprise. Cloud-based (AWS/GCP).

Constraints & Limits

  • No manual control of specific gestures or posture
  • RAG in beta — experimental
  • Custom voice cloning requires $999/mo plan
  • Session duration limited on lower tiers
  • No on-premise standard option

DigiDouble Relevance

Score

9/10

Fastest median latency (180ms) among commercial platforms. One-Shot avatar creation ideal for rapid prototyping. Native RAG (beta) and structured persona system align well with DigiDouble's educational use case. Limitation: custom voice requires expensive plan.