Simli (Trinity-1)
Ultra-low latency real-time avatar from a single image
TTFR Latency
~300ms
real-time
Cost / minute
$0.009/min
real-time
Visual Quality
7/10
estimated score
Protocols
WebRTC, REST, WebSocket, LiveKit, Pipecat
Avatar Customisation
RAG / Knowledge Base
Available only through a third-party LLM integration (OpenAI or Anthropic, via Pipecat/LiveKit). Custom knowledge bases are fed to the LLM layer; Simli handles Speech-to-Video only.
Behavior & Personality
Behavior is defined by the connected LLM's prompt. Simli is a Speech-to-Video renderer; personality lives in the LLM layer.
Body Language & Gestures
Head movements and facial micro-expressions auto-generated. No complex hand/body gesture API.
Facial Expressions
Realistic facial expressions and smooth animation via Trinity-1. Gaussian model for photorealistic face cloning.
Voice & Voice Cloning
ElevenLabs integration for voice customisation (tone, accent, speed). Simli handles audio-to-video sync.
Persona Fine-Tuning
Persona lives in the LLM layer (external). Simli only handles visual rendering from audio input.
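The split described above (persona, knowledge, and voice live in external layers, while Simli only turns audio into video) can be sketched as a chain of pluggable stages. This is a minimal illustration, not Simli's SDK: the class name `AvatarPipeline`, its field names, and the stage signatures are all hypothetical, and each stage would wrap whichever ASR, LLM, TTS, and rendering client is actually used.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AvatarPipeline:
    """Chains bring-your-own stages in front of a speech-to-video renderer.

    All names here are hypothetical; they only illustrate where each
    responsibility sits.
    """
    transcribe: Callable[[bytes], str]   # ASR: user audio -> text
    respond: Callable[[str, str], str]   # LLM: (persona prompt, user text) -> reply
    synthesize: Callable[[str], bytes]   # TTS: reply text -> speech audio
    render: Callable[[bytes], bytes]     # speech-to-video: audio -> video
    persona_prompt: str = "You are a helpful assistant."

    def turn(self, user_audio: bytes) -> bytes:
        """Run one conversational turn through every stage in order."""
        text = self.transcribe(user_audio)
        reply = self.respond(self.persona_prompt, text)
        speech = self.synthesize(reply)
        # The renderer only ever sees audio; persona never reaches it.
        return self.render(speech)
```

Changing the persona means changing `persona_prompt` or the `respond` stage; the `render` stage stays untouched, which mirrors the separation the card describes.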
Avatar Training
Best Practices
- 01. Front-facing photo, well-lit, neutral expression
- 02. Closed mouth
- 03. No obstructions (glasses, hair over face)
- 04. Gaussian model: stricter quality requirements for photorealism
API Analysis
Protocols
SDKs
Concurrent Sessions
1 (Free) → 2 (Hobby) → 10 (Pro) → 50 (Scale)
Rate Limits
Avatar slots: 1 (Free) → 1 (Hobby) → 5 (Pro) → 30 (Scale)
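As a quick way to read the two limit ladders together, the sketch below picks the smallest listed plan that satisfies both a concurrent-session and an avatar-slot requirement. The figures are copied from the lines above; the names `PLAN_LIMITS` and `smallest_plan` are illustrative, and Enterprise is treated as "not listed" because its limits are custom.

```python
# Limits per plan, copied from the card, ordered from smallest to largest.
PLAN_LIMITS = {
    "Free": {"concurrent_sessions": 1, "avatar_slots": 1},
    "Hobby": {"concurrent_sessions": 2, "avatar_slots": 1},
    "Pro": {"concurrent_sessions": 10, "avatar_slots": 5},
    "Scale": {"concurrent_sessions": 50, "avatar_slots": 30},
}


def smallest_plan(sessions: int, avatars: int):
    """Return the first plan whose limits cover both needs, else None."""
    # Dicts preserve insertion order, so plans are checked smallest first.
    for name, limits in PLAN_LIMITS.items():
        if (sessions <= limits["concurrent_sessions"]
                and avatars <= limits["avatar_slots"]):
            return name
    return None  # Beyond Scale: would need a custom (Enterprise) agreement.
```

Note that avatar slots can be the binding constraint: two avatars already require Pro, even with a single concurrent session.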
Key Features
- POST /compose/token — session token
- GET /compose/ice — ICE servers for WebRTC
- Native LiveKit and Pipecat integration
- Speech-to-Video pipeline: audio in → video out
- <300ms end-to-end latency
API Constraints
- Manual WebRTC negotiation required for custom implementations
- No built-in LLM or TTS (bring your own)
- Avatar slots limited by plan
- No webhook support
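For a custom implementation, the manual WebRTC negotiation would start from the two endpoints listed under Key Features. The sketch below only builds the two HTTP requests with Python's standard library and does not send them. Only the paths `/compose/token` and `/compose/ice` come from this card; the host in `BASE_URL`, the authentication header name, and the JSON payload shape are assumptions that must be checked against Simli's own documentation.

```python
import json
from urllib.request import Request

# Placeholder host: this card does not state the API base URL.
BASE_URL = "https://api.example.invalid"


def token_request(api_key, payload, auth_header="Authorization"):
    """Build (but do not send) the POST /compose/token request.

    The header name and payload fields are assumptions, not documented here.
    """
    return Request(
        f"{BASE_URL}/compose/token",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", auth_header: api_key},
        method="POST",
    )


def ice_request(api_key, auth_header="Authorization"):
    """Build (but do not send) the GET /compose/ice request."""
    return Request(
        f"{BASE_URL}/compose/ice",
        headers={auth_header: api_key},
        method="GET",
    )
```

Once the real host and fields are known, each request could be sent with `urllib.request.urlopen`, and the ICE servers returned by the second call would be handed to the WebRTC peer connection before the offer/answer exchange.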
Pricing Model
| Plan | Price | Included minutes | Overage |
|---|---|---|---|
| Free | $0/mo | 50 min/mo | N/A |
| Hobby | $10/mo | 1000 min/mo | $0.01/min |
| Pro | $49/mo | 5500 min/mo | $0.0095/min |
| Scale | $249/mo | 27500 min/mo | $0.009/min |
| Enterprise | Custom | Custom | Custom |
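To make the table concrete, here is a small estimator for the monthly Simli bill at a given usage level. The prices, included minutes, and overage rates are taken from the table; the function names are illustrative, and the sketch assumes overage minutes can be bought without a cap, which the table does not confirm. Enterprise is omitted because its pricing is custom.

```python
from decimal import Decimal

# (monthly price, included minutes, overage rate per minute or None)
PLANS = {
    "Free": (Decimal("0"), 50, None),
    "Hobby": (Decimal("10"), 1000, Decimal("0.01")),
    "Pro": (Decimal("49"), 5500, Decimal("0.0095")),
    "Scale": (Decimal("249"), 27500, Decimal("0.009")),
}


def monthly_cost(plan: str, minutes: int) -> Decimal:
    """Base price plus overage for minutes beyond the included quota."""
    base, included, overage = PLANS[plan]
    extra = max(0, minutes - included)
    if extra and overage is None:
        raise ValueError(f"{plan} plan has no overage; {minutes} min exceeds quota")
    return base + (overage * extra if extra else Decimal("0"))


def cheapest_plan(minutes: int) -> str:
    """Plan with the lowest total cost for the given usage."""
    costs = {}
    for name in PLANS:
        try:
            costs[name] = monthly_cost(name, minutes)
        except ValueError:
            continue  # Plan cannot serve this usage level.
    return min(costs, key=costs.get)
```

Under these assumptions, 3,000 minutes costs $30 on Hobby (10 + 2,000 × 0.01) versus $49 on Pro, so the plan with the larger quota is not automatically cheaper. Concurrency and avatar-slot limits may still force the higher tier.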
Hidden costs / watch out
- ElevenLabs TTS billed separately
- LLM API costs (OpenAI/Anthropic) billed separately
Sovereignty & Hosting
Sovereignty Score
Hosting
Cloud (Norwegian company, EEA jurisdiction)
GDPR
Yes
On-premise
No
Sovereignty detail
Norwegian company (Simli AS). EEA jurisdiction (Norway is outside the EU but applies GDPR through the EEA Agreement). No EU datacenter explicitly confirmed. No on-premise option.
Constraints & Limits
- No body gesture API (head movements only)
- No built-in LLM or TTS — must integrate separately
- Avatar slots limited by plan tier
- No webhook support
- No on-premise option
DigiDouble Relevance
Score
9/10
Best price/performance ratio for real-time video rendering. Ideal as a Speech-to-Video module in DigiDouble's modular pipeline. Ultra-low cost (from $0.009/min, the Scale-tier overage rate) and <300ms latency. Limitation: no built-in AI stack, so ASR, LLM, and TTS must be integrated separately.