Commercial · Sovereignty 3/5

Simli (Trinity-1)

Ultra-low latency real-time avatar from a single image

TTFR Latency

~300ms

real-time

Cost / minute

$0.009/min

real-time

Visual Quality

7/10

estimated score

Protocols

WebRTC, REST, WebSocket, LiveKit, Pipecat

Avatar Customisation

RAG / Knowledge Base

Via third-party LLM integration (OpenAI, Anthropic via Pipecat/LiveKit). Custom knowledge bases fed to the LLM layer. Simli handles Speech-to-Video only.

Behavior & Personality

Behavior defined by connected LLM prompt. Simli is a Speech-to-Video renderer — personality lives in the LLM layer.

Body Language & Gestures

Head movements and facial micro-expressions auto-generated. No complex hand/body gesture API.

Facial Expressions

Realistic facial expressions and smooth animation via Trinity-1. Gaussian model for photorealistic face cloning.

Voice & Voice Cloning

ElevenLabs integration for voice customisation (tone, accent, speed). Simli handles audio-to-video sync.

Persona Fine-Tuning

Persona lives in the LLM layer (external). Simli only handles visual rendering from audio input.
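The division of labour described above (knowledge, behavior, voice and persona all sit outside the renderer) can be sketched as a chain of pluggable stages. This is a minimal illustration only: `AvatarPipeline` and its four stage names are hypothetical, and the stub lambdas stand in for real ASR, LLM, TTS and Speech-to-Video services.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AvatarPipeline:
    """Chains the external stages in front of a Speech-to-Video renderer."""

    asr: Callable[[bytes], str]       # user audio -> transcript
    llm: Callable[[str], str]         # transcript -> reply text (persona, RAG live here)
    tts: Callable[[str], bytes]       # reply text -> speech audio (voice lives here)
    render: Callable[[bytes], bytes]  # speech audio -> video (the renderer's only job)

    def respond(self, user_audio: bytes) -> bytes:
        transcript = self.asr(user_audio)
        reply = self.llm(transcript)
        speech = self.tts(reply)
        return self.render(speech)


# Stub stages so the sketch runs without any external service.
pipeline = AvatarPipeline(
    asr=lambda audio: audio.decode("utf-8"),
    llm=lambda text: f"You said: {text}",
    tts=lambda text: text.encode("utf-8"),
    render=lambda speech: b"VIDEO[" + speech + b"]",
)

video = pipeline.respond(b"hello")
```

Swapping the LLM or TTS provider only changes one callable; the rendering stage never sees text or prompts, only audio.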

Avatar Training

Video required: No (image sufficient)
Duration: Single high-quality image
Resolution: High-quality JPEG or PNG, front-facing
Format: JPEG / PNG (via /faces/trinity endpoint)
Consent required: No
Processing time: Minutes to a few hours

Best Practices

  • 01. Front-facing photo, well-lit, neutral expression
  • 02. Closed mouth
  • 03. No obstructions (glasses, hair over face)
  • 04. Gaussian model: stricter quality requirements for photorealism
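Only the file-format requirement (JPEG or PNG) lends itself to an automated pre-upload check; framing, lighting and expression still need a visual review. A minimal sketch using the standard file signatures of the two formats follows. The function name `detect_image_format` is hypothetical and not part of any Simli SDK.

```python
from typing import Optional

# Standard leading bytes ("magic numbers") of the two accepted formats.
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
JPEG_MAGIC = b"\xff\xd8\xff"


def detect_image_format(data: bytes) -> Optional[str]:
    """Return 'png' or 'jpeg' based on the file signature, else None."""
    if data.startswith(PNG_MAGIC):
        return "png"
    if data.startswith(JPEG_MAGIC):
        return "jpeg"
    return None


def is_accepted_format(data: bytes) -> bool:
    """True if the bytes look like a JPEG or PNG file."""
    return detect_image_format(data) is not None
```

Checking the signature rather than the file extension catches renamed files, for example a GIF saved as `.png`.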

API Analysis

Protocols

REST, WebRTC, WebSocket, LiveKit, Pipecat

SDKs

JavaScript/TypeScript, Python
Webhooks: No

Concurrent Sessions

1 (Free) → 2 (Hobby) → 10 (Pro) → 50 (Scale)

Rate Limits

Avatar slots: 1 (Free) → 1 (Hobby) → 5 (Pro) → 30 (Scale)
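The two limits above can be combined to find the smallest plan that fits a deployment. A minimal sketch, using only the figures listed here; the helper name `smallest_plan` is hypothetical, and Enterprise is left out because its limits are custom.

```python
from typing import Optional

# (plan, max concurrent sessions, max avatar slots), as listed above.
PLAN_LIMITS = [
    ("Free", 1, 1),
    ("Hobby", 2, 1),
    ("Pro", 10, 5),
    ("Scale", 50, 30),
]


def smallest_plan(sessions: int, avatar_slots: int) -> Optional[str]:
    """Return the first (cheapest) plan satisfying both limits, else None."""
    for name, max_sessions, max_slots in PLAN_LIMITS:
        if sessions <= max_sessions and avatar_slots <= max_slots:
            return name
    return None  # beyond Scale: Enterprise territory, limits are custom
```

Note that avatar slots can force an upgrade before concurrency does: two avatars with two concurrent sessions already require Pro, since Hobby allows only one slot.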

Key Features

  • POST /compose/token — session token
  • GET /compose/ice — ICE servers for WebRTC
  • Native LiveKit and Pipecat integration
  • Speech-to-Video pipeline: audio in → video out
  • <300ms end-to-end latency
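For a custom WebRTC client, the two endpoints listed above would be called before negotiation starts. The sketch below only builds the request objects and sends nothing. Everything except the two paths is an assumption: the base URL is a placeholder, and the auth header name and the `faceId` payload field are hypothetical, so check the official API reference for the real values.

```python
import json
import urllib.request

# Hypothetical placeholders: neither the base URL, the auth header name,
# nor the payload field is confirmed by the text above.
BASE_URL = "https://api.example.invalid"
AUTH_HEADER = "x-api-key"


def build_token_request(api_key: str, face_id: str) -> urllib.request.Request:
    """POST /compose/token: request a session token (payload shape assumed)."""
    body = json.dumps({"faceId": face_id}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/compose/token",
        data=body,
        headers={AUTH_HEADER: api_key, "Content-Type": "application/json"},
        method="POST",
    )


def build_ice_request(api_key: str) -> urllib.request.Request:
    """GET /compose/ice: fetch ICE servers for the WebRTC peer connection."""
    return urllib.request.Request(
        f"{BASE_URL}/compose/ice",
        headers={AUTH_HEADER: api_key},
        method="GET",
    )


token_req = build_token_request("demo-key", "demo-face")
ice_req = build_ice_request("demo-key")
```

Passing either object to `urllib.request.urlopen` would perform the call once real values are filled in. The LiveKit and Pipecat integrations presumably hide this step.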

API Constraints

  • Manual WebRTC negotiation required for custom implementations
  • No built-in LLM or TTS (bring your own)
  • Avatar slots limited by plan
  • No webhook support

Pricing Model

Model: Monthly subscription + included minutes
Plan        Price     Included minutes   Overage
Free        $0/mo     50 min/mo          N/A
Hobby       $10/mo    1,000 min/mo       $0.01/min
Pro         $49/mo    5,500 min/mo       $0.0095/min
Scale       $249/mo   27,500 min/mo      $0.009/min
Enterprise  Custom    Custom             Custom
Free tier
Cloud only
Enterprise pricing
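The subscription-plus-overage model can be turned into a monthly estimate: plan price plus any minutes beyond the included quota at the plan's overage rate. A minimal sketch using the published figures; the function name `monthly_cost` is hypothetical, and treating Free-tier overuse as an error is an assumption, since no overage rate is listed for it.

```python
# plan -> (monthly price in USD, included minutes, overage USD/min or None)
PLAN_PRICING = {
    "Free": (0.0, 50, None),
    "Hobby": (10.0, 1000, 0.01),
    "Pro": (49.0, 5500, 0.0095),
    "Scale": (249.0, 27500, 0.009),
}


def monthly_cost(plan: str, minutes_used: int) -> float:
    """Plan price plus overage for minutes beyond the included quota."""
    price, included, overage = PLAN_PRICING[plan]
    extra = max(0, minutes_used - included)
    if extra and overage is None:
        # Assumption: a plan without an overage rate cannot exceed its quota.
        raise ValueError(f"{plan} includes {included} min and has no overage rate")
    return price + extra * (overage or 0.0)
```

For example, 6,500 minutes in a month comes to about $65.00 on Hobby (5,500 overage minutes) versus $58.50 on Pro (1,000 overage minutes), so the higher tier is cheaper at that volume. TTS and LLM costs are not included.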

Hidden costs / watch out

  • ElevenLabs TTS billed separately
  • LLM API costs (OpenAI/Anthropic) billed separately

Sovereignty & Hosting

Sovereignty Score

3/5

Hosting

Cloud (Norwegian company, EEA jurisdiction)

GDPR

Yes

On-premise

No

Sovereignty detail

Norwegian company (Simli AS). Norway is in the EEA rather than the EU, so GDPR applies through the EEA Agreement. No EU datacenter explicitly confirmed. No on-premise option.

Constraints & Limits

  • No body gesture API (head movements only)
  • No built-in LLM or TTS — must integrate separately
  • Avatar slots limited by plan tier
  • No webhook support
  • No on-premise option

DigiDouble Relevance

Score

9/10

Best price/performance ratio for real-time video rendering. Ideal as a Speech-to-Video module in DigiDouble's modular pipeline: cost as low as $0.009/min (the Scale-tier overage rate) and <300ms latency. Limitation: no built-in AI stack, so ASR, LLM, and TTS must be integrated separately.