Commercial

Sovereignty 3/5

Simli (Trinity-1)

Ultra-low latency real-time avatar from a single image

TTFR Latency

~300ms

real-time

Cost / minute

$0.009/min (lowest rate, Scale-plan overage)

real-time

Visual Quality

7/10

estimated score

Protocols

WebRTC, REST, WebSocket, LiveKit, Pipecat

Avatar Customisation

RAG / Knowledge Base

Via third-party LLM integration (OpenAI, Anthropic via Pipecat/LiveKit). Custom knowledge bases fed to the LLM layer. Simli handles Speech-to-Video only.

Behavior & Personality

Behavior defined by connected LLM prompt. Simli is a Speech-to-Video renderer — personality lives in the LLM layer.

Body Language & Gestures

Head movements and facial micro-expressions auto-generated. No complex hand/body gesture API.

Facial Expressions

Realistic facial expressions and smooth animation via Trinity-1. Gaussian model for photorealistic face cloning.

Voice & Voice Cloning

ElevenLabs integration for voice customisation (tone, accent, speed). Simli handles audio-to-video sync.

Persona Fine-Tuning

Persona lives in the LLM layer (external). Simli only handles visual rendering from audio input.

Avatar Training

Video required: No (image sufficient)

Duration: Single high-quality image

Resolution: High-quality JPEG or PNG, front-facing

Format: JPEG / PNG (via /faces/trinity endpoint)

Consent required: No

Processing time: Minutes to a few hours
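
The format requirement above (JPEG or PNG) can be checked locally before an upload is attempted. The following is a minimal sketch that inspects the file signature only; it does not call Simli, and the function names are illustrative assumptions rather than part of any SDK. It cannot verify the photographic requirements (front-facing, lighting, expression).

```python
from typing import Optional

# Standard file signatures ("magic bytes") for the two accepted formats.
JPEG_MAGIC = b"\xff\xd8\xff"
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def detect_image_format(header: bytes) -> Optional[str]:
    """Return 'jpeg', 'png', or None based on the leading bytes of a file."""
    if header.startswith(JPEG_MAGIC):
        return "jpeg"
    if header.startswith(PNG_MAGIC):
        return "png"
    return None


def is_accepted_source_image(path: str) -> bool:
    """Hypothetical pre-upload check: True if the file looks like JPEG or PNG."""
    with open(path, "rb") as fh:
        return detect_image_format(fh.read(8)) is not None
```

Checking the signature rather than the file extension catches renamed files, which would otherwise only fail after the upload.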

Best Practices

01. Front-facing photo, well-lit, neutral expression
02. Closed mouth
03. No obstructions (glasses, hair over face)
04. Gaussian model: stricter quality requirements for photorealism

API Analysis

Protocols

REST, WebRTC, WebSocket, LiveKit, Pipecat

SDKs

JavaScript/TypeScript, Python

Webhooks: No

Concurrent Sessions

1 (Free) → 2 (Hobby) → 10 (Pro) → 50 (Scale)

Rate Limits

Avatar slots: 1 (Free) → 1 (Hobby) → 5 (Pro) → 30 (Scale)
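
The two per-plan limits above (concurrent sessions and avatar slots) jointly determine the smallest usable plan. A minimal sketch of that lookup follows, using only the figures listed in this section; the names `PlanLimits` and `smallest_plan` are illustrative.

```python
from typing import NamedTuple, Optional


class PlanLimits(NamedTuple):
    name: str
    concurrent_sessions: int
    avatar_slots: int


# Limits as listed above, ordered from smallest to largest plan.
PLAN_LIMITS = [
    PlanLimits("Free", 1, 1),
    PlanLimits("Hobby", 2, 1),
    PlanLimits("Pro", 10, 5),
    PlanLimits("Scale", 50, 30),
]


def smallest_plan(sessions: int, slots: int) -> Optional[str]:
    """Return the first plan that satisfies both limits, or None if none does."""
    for plan in PLAN_LIMITS:
        if plan.concurrent_sessions >= sessions and plan.avatar_slots >= slots:
            return plan.name
    return None  # beyond Scale; Enterprise limits are custom
```

Note that needing a second avatar slot forces a jump to Pro even at low concurrency, because Hobby still has a single slot.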

Key Features

POST /compose/token — session token
GET /compose/ice — ICE servers for WebRTC
Native LiveKit and Pipecat integration
Speech-to-Video pipeline: audio in → video out
<300ms end-to-end latency
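
The two endpoints listed above can be sketched as plain HTTP requests. This example only builds the request objects and sends nothing. The base URL is a placeholder, and the request body, authentication scheme, and response shape are not specified in this profile, so they are left to the caller; consult the vendor's API documentation for the real values.

```python
import json
import urllib.request

# ASSUMPTION: placeholder host, not Simli's real base URL.
BASE_URL = "https://api.example.invalid"


def build_token_request(payload: dict) -> urllib.request.Request:
    """Build (but do not send) POST /compose/token.

    The payload fields are not documented here; the caller supplies them.
    """
    return urllib.request.Request(
        url=f"{BASE_URL}/compose/token",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_ice_request() -> urllib.request.Request:
    """Build (but do not send) GET /compose/ice, which lists ICE servers."""
    return urllib.request.Request(url=f"{BASE_URL}/compose/ice", method="GET")
```

In a custom WebRTC integration, the ICE servers returned by the second call would be passed to the peer connection configuration before the offer/answer exchange.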

API Constraints

Manual WebRTC negotiation required for custom implementations
No built-in LLM or TTS (bring your own)
Avatar slots limited by plan
No webhook support

Pricing Model

Model: Monthly subscription + included minutes

Plan	Price	Included minutes	Overage
Free	$0/mo	50 min/mo	N/A
Hobby	$10/mo	1000 min/mo	$0.01/min
Pro	$49/mo	5500 min/mo	$0.0095/min
Scale	$249/mo	27500 min/mo	$0.009/min
Enterprise	Custom	Custom	Custom
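
The table above translates into a simple monthly cost formula: base price plus overage rate times the minutes used beyond the included allowance. A minimal sketch follows, using only the listed figures; the function names are illustrative, and the Free plan is treated as having no overage (usage beyond 50 minutes is rejected).

```python
from typing import Dict, Optional, Tuple

# plan -> (monthly price in USD, included minutes, overage USD/min or None)
PRICING: Dict[str, Tuple[float, int, Optional[float]]] = {
    "Free": (0.0, 50, None),
    "Hobby": (10.0, 1000, 0.01),
    "Pro": (49.0, 5500, 0.0095),
    "Scale": (249.0, 27500, 0.009),
}


def monthly_cost(plan: str, minutes: float) -> float:
    """Base price plus overage for minutes beyond the included allowance."""
    base, included, overage = PRICING[plan]
    extra = max(0.0, minutes - included)
    if extra > 0 and overage is None:
        raise ValueError(f"{plan} plan has no overage; {minutes} min exceeds {included}")
    return base + extra * (overage or 0.0)


def cheapest_plan(minutes: float) -> Tuple[str, float]:
    """Return (plan name, cost) for the lowest-cost plan covering the usage."""
    costs = {}
    for name in PRICING:
        try:
            costs[name] = monthly_cost(name, minutes)
        except ValueError:
            continue  # plan cannot cover this usage
    return min(costs.items(), key=lambda item: item[1])
```

For example, 6,000 minutes on Pro costs 49 + 500 × 0.0095 = $53.75. Because Hobby overage is only $0.01/min, Hobby stays cheaper than Pro up to 4,900 minutes per month, provided its 2-session concurrency limit is acceptable.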

Free tier

Cloud only

Enterprise pricing

Hidden costs / watch out

ElevenLabs TTS billed separately
LLM API costs (OpenAI/Anthropic) billed separately

Sovereignty & Hosting

Sovereignty Score

3/5

Hosting

Cloud (Norwegian company, EEA jurisdiction)

GDPR

Yes

On-premise

No

Sovereignty detail

Norwegian company (Simli AS). EEA jurisdiction: Norway is not an EU member, but GDPR applies there through the EEA Agreement. No explicit EU datacenter confirmed. No on-premise.

Constraints & Limits

No body gesture API (head movements only)
No built-in LLM or TTS — must integrate separately
Avatar slots limited by plan tier
No webhook support
No on-premise option

DigiDouble Relevance

Score

9/10

Best price/performance ratio for real-time video rendering. Ideal as a Speech-to-Video module in DigiDouble's modular pipeline: very low cost (down to $0.009/min at Scale-plan overage) and latency of around 300ms. Limitation: no built-in AI stack, so ASR, LLM, and TTS must be integrated separately.
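
The modular arrangement described above, with the Speech-to-Video renderer as the last of four swappable stages, can be sketched as follows. This is a simplified, synchronous, single-turn illustration under stated assumptions: the type aliases and `run_turn` are hypothetical names, each stage is a stand-in for a vendor integration, and a real deployment would stream audio and video rather than pass whole buffers.

```python
from typing import Callable

# Each stage is a plain callable so that any vendor can be swapped in.
Asr = Callable[[bytes], str]              # user audio -> transcript
Llm = Callable[[str], str]                # transcript -> reply text
Tts = Callable[[str], bytes]              # reply text -> speech audio
SpeechToVideo = Callable[[bytes], bytes]  # speech audio -> rendered video


def run_turn(
    user_audio: bytes,
    asr: Asr,
    llm: Llm,
    tts: Tts,
    renderer: SpeechToVideo,
) -> bytes:
    """Run one conversational turn through ASR -> LLM -> TTS -> renderer."""
    transcript = asr(user_audio)
    reply = llm(transcript)
    speech = tts(reply)
    # The renderer only sees audio; persona and knowledge live upstream.
    return renderer(speech)
```

Keeping the renderer's input limited to audio mirrors the constraint noted throughout this profile: persona, knowledge base, and voice are all decided before the Speech-to-Video stage is reached.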
