D-ID (Expressive V4)
Enterprise AI agents with expressive avatars and native RAG
TTFR Latency
~450ms
real-time
Cost / minute
$0.40/min
real-time
Visual Quality
8/10
estimated score
Protocols
WebRTC, REST
Avatar Customisation
RAG / Knowledge Base
Native 'Knowledge' API: connect custom knowledge bases (PDF, text). The agent queries them in real time during the conversation.
Behavior & Personality
System prompt configuration for personality, tone, response style. Full agent persona definition.
Body Language & Gestures
V4 Expressive: adaptive body language generated to match voice inflection and emotional context. Natural posture.
Facial Expressions
V4 Expressive: emotion and micro-expression control via sentiment tags (joy, seriousness, surprise).
Voice & Voice Cloning
ElevenLabs + Microsoft Azure TTS partnerships. Hundreds of multilingual voices. Pitch, speed, accent control. Voice cloning available.
Persona Fine-Tuning
Full agent persona: system prompt + knowledge base + voice + visual identity. Consistent role maintenance throughout interaction.
Avatar Training
Best Practices
- 01. Frontal, uniform lighting with no shadows
- 02. Neutral, fixed background
- 03. Stable camera at eye level
- 04. Natural speech with pauses
- 05. Avoid excessive head movements
- 06. Clean audio recording (no background noise)
- 07. No face occlusions (sunglasses, hands)
API Analysis
Protocols
SDKs
Concurrent Sessions
Plan-dependent
Rate Limits
BYO-S3 available on Pro+ plans
Key Features
- Knowledge API for native RAG
- Agents API for full conversational agent management
- Expressive V4 emotion tags
- ElevenLabs + Azure TTS integration
- WebRTC real-time streaming
API Constraints
- Expressive V4 requires specific credits and is not available on all stock avatars
- No standard on-premise option
- Data stored on D-ID infrastructure by default (BYO-S3 on Pro+)
- Celebrity face moderation
Pricing Model
| Plan | Price | Included minutes | Overage |
|---|---|---|---|
| Trial | Free (14 days) | Limited | N/A |
| Build | $18/mo | 32 min streaming | ~$0.56/min |
| Launch | $40/mo | 90 min streaming | ~$0.44/min |
| Scale | $158/mo | 400 min streaming | ~$0.40/min |
| Enterprise | Custom | Custom | Custom |
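To compare plans for a given monthly usage, the table can be turned into a small estimator. The sketch below is illustrative only: the `Plan` class and the helper names are hypothetical, the figures are copied from the table, and it assumes overage is billed linearly per minute at the approximate rate listed, which the table does not confirm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    monthly_price: float       # USD per month, from the table
    included_minutes: int      # streaming minutes included per month
    overage_per_minute: float  # approximate USD per extra minute


# Trial and Enterprise are omitted: the table gives no numeric rates for them.
PLANS = [
    Plan("Build", 18.0, 32, 0.56),
    Plan("Launch", 40.0, 90, 0.44),
    Plan("Scale", 158.0, 400, 0.40),
]


def monthly_cost(plan: Plan, minutes_used: float) -> float:
    """Estimated bill, assuming linear per-minute overage (an assumption)."""
    extra = max(0.0, minutes_used - plan.included_minutes)
    return plan.monthly_price + extra * plan.overage_per_minute


def effective_rate(plan: Plan) -> float:
    """Price per included minute when the allowance is fully used."""
    return plan.monthly_price / plan.included_minutes


def cheapest_plan(minutes_used: float) -> Plan:
    """Plan with the lowest estimated bill for the given usage."""
    return min(PLANS, key=lambda p: monthly_cost(p, minutes_used))


if __name__ == "__main__":
    for usage in (20, 100, 400):
        best = cheapest_plan(usage)
        print(f"{usage} min -> {best.name}: ${monthly_cost(best, usage):.2f}")
```

The effective per-minute rate of each plan's included allowance works out to roughly the same figure as its listed overage rate, so under these assumptions a higher tier mainly pays off once usage approaches its included minutes.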
Hidden costs / watch out
- Expressive V4 carries a credit premium
- BYO-S3 only on Pro+
- 20–30% discount on annual commitment
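As a rough illustration of the annual-commitment discount, the sketch below computes the range of yearly cost for a plan. It assumes the 20–30% discount applies to twelve times the monthly list price; the source does not state how the discount is actually applied or which plans qualify, and `annual_cost_range` is a hypothetical helper.

```python
def annual_cost_range(monthly_price: float,
                      low_discount: float = 0.20,
                      high_discount: float = 0.30) -> tuple[float, float]:
    """Return (best_case, worst_case) yearly cost in USD.

    Assumes the discount applies to 12 x the monthly list price,
    which is an assumption, not a documented billing rule.
    """
    full_year = monthly_price * 12
    best_case = full_year * (1 - high_discount)
    worst_case = full_year * (1 - low_discount)
    return best_case, worst_case


if __name__ == "__main__":
    # Scale plan list price taken from the pricing table: $158/mo.
    best, worst = annual_cost_range(158.0)
    print(f"Scale, annual commitment: ${best:.2f} to ${worst:.2f}")
```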
Sovereignty & Hosting
Sovereignty Score
Hosting
AWS US-East-1
GDPR
Yes
On-premise
No
Sovereignty detail
D-ID is an Israeli company hosting on AWS US-East-1. GDPR principles are respected, but there is no EU datacenter and no EU hosting option.
Constraints & Limits
- No manual hand/arm gesture control
- US hosting only (AWS US-East-1)
- Expressive V4 not available on all stock avatars
- Celebrity face moderation
- No standard on-premise option
DigiDouble Relevance
Score
7/10
Strong native RAG and expressive emotion control make D-ID compelling for educational agents. V4 Expressive is a significant upgrade. Main limitations: US-only hosting and a higher cost per minute than Simli or bitHuman.