
commercial

Sovereignty 5/5

bitHuman

Edge-deployable avatar with gesture API and self-hosted option

TTFR Latency

~300ms

real-time

Cost / minute

$0.050/min

real-time

Visual Quality

7/10

estimated score

Protocols

WebRTC, REST, LiveKit

Avatar Customisation

RAG / Knowledge Base

Agent Context API: inject 'silent knowledge' (documents, data) into the avatar's context. CO-STAR framework for structured knowledge.

Behavior & Personality

CO-STAR framework: Context, Objective, Style, Tone, Audience, Response. Emotional tone (empathetic, calm). Structured response style.
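The six CO-STAR sections listed above can be assembled into a single persona prompt. The following is a minimal sketch, assuming the prompt is passed to the platform as plain text; `CoStarPrompt` and `render` are hypothetical names for illustration and are not part of any bitHuman SDK.

```python
from dataclasses import dataclass, fields


@dataclass
class CoStarPrompt:
    """Hypothetical container for the six CO-STAR sections (not a bitHuman SDK type)."""
    context: str
    objective: str
    style: str
    tone: str
    audience: str
    response: str

    def render(self) -> str:
        # Emit one labelled block per section, in CO-STAR order.
        return "\n\n".join(
            f"# {f.name.upper()}\n{getattr(self, f.name)}" for f in fields(self)
        )


# Example values are illustrative only.
prompt = CoStarPrompt(
    context="You are a virtual receptionist for a dental clinic.",
    objective="Help visitors book, move, or cancel appointments.",
    style="Short, structured answers.",
    tone="Empathetic and calm.",
    audience="Patients with no technical background.",
    response="Plain text, at most three sentences.",
)
text = prompt.render()
```

Keeping each section as a separate field makes it easy to vary one dimension, such as tone, without rewriting the rest of the persona.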

Body Language & Gestures

Dynamics API: trigger specific gestures via keywords or API commands (hand signs, nods, laughs). Unique feature among commercial platforms.
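Keyword-based gesture triggering can be pictured as a lookup from spoken words to gesture names. The sketch below is a local illustration of that idea only: the table, the gesture names, and `gestures_for` are all hypothetical and do not reflect the actual Dynamics API or its identifiers.

```python
import re
from typing import List

# Hypothetical keyword-to-gesture table; the gesture names are placeholders,
# not identifiers from the bitHuman Dynamics API.
GESTURE_KEYWORDS = {
    "hello": "wave",
    "yes": "nod",
    "haha": "laugh",
}


def gestures_for(utterance: str) -> List[str]:
    """Return gestures whose keyword appears in the utterance, in spoken order."""
    words = re.findall(r"[a-z']+", utterance.lower())
    triggered: List[str] = []
    for word in words:
        gesture = GESTURE_KEYWORDS.get(word)
        # Skip unknown words and avoid triggering the same gesture twice.
        if gesture and gesture not in triggered:
            triggered.append(gesture)
    return triggered
```

Calling `gestures_for("Hello! Yes, yes, haha.")` returns `["wave", "nod", "laugh"]`; an utterance with no matching keyword returns an empty list.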

Facial Expressions

Real-time facial expressions at 25fps, synchronized with audio (lip-sync). Emotion-driven micro-expressions.

Voice & Voice Cloning

Voice cloning via Voice Upload. OpenAI and ElevenLabs TTS integration. Custom voice persona.

Persona Fine-Tuning

CO-STAR prompt framework + Dynamics API for gesture triggers. Persona can include non-human characters (animals, illustrated characters).

Avatar Training

Video required: No (image sufficient)

Duration: Image OR short video (max 30 seconds for Likeness)

Resolution: Image: <10MB, frontal, neutral lighting. Video: single person, centered, minimal movement.

Format: Image: JPEG/PNG. Video: MP4 (max 30s)

Consent required: No

Processing time: Minutes

Best Practices

01. Image: frontal, well-lit, neutral expression
02. Video (Likeness): single person, centered, minimal movement
03. Max 30 seconds video
04. No complex gestures in training footage
05. Expression model requires GPU for custom faces outside catalog
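The file-level requirements above (image under 10MB in JPEG/PNG, video in MP4 of at most 30 seconds) can be checked before upload. This is a minimal sketch of such a pre-flight check; `check_training_media` is a hypothetical helper, the caller is assumed to supply the file size and video duration, and reading "10MB" as 10 × 1024 × 1024 bytes is an assumption. Framing, lighting, and movement are not checked.

```python
from pathlib import Path
from typing import List, Optional

MAX_IMAGE_BYTES = 10 * 1024 * 1024  # "<10MB"; binary megabytes are an assumption
MAX_VIDEO_SECONDS = 30
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}
VIDEO_SUFFIXES = {".mp4"}


def check_training_media(
    filename: str, size_bytes: int, duration_s: Optional[float] = None
) -> List[str]:
    """Return a list of problems; an empty list means the file passes these checks."""
    suffix = Path(filename).suffix.lower()
    problems: List[str] = []
    if suffix in IMAGE_SUFFIXES:
        if size_bytes >= MAX_IMAGE_BYTES:
            problems.append("image must be under 10 MB")
    elif suffix in VIDEO_SUFFIXES:
        if duration_s is None:
            problems.append("video duration unknown")
        elif duration_s > MAX_VIDEO_SECONDS:
            problems.append("video must be at most 30 seconds")
    else:
        problems.append(f"unsupported format: {suffix or 'none'}")
    return problems
```

For example, `check_training_media("clip.mp4", 5_000_000, 45.0)` reports that the video exceeds the 30-second limit, while a 2 MB PNG passes with an empty list.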

API Analysis

Protocols

REST, WebRTC, LiveKit

SDKs

JavaScript/TypeScript, Python

Webhooks: No

Concurrent Sessions

Plan-dependent

Rate Limits

250 credits fixed cost for initial agent generation

Key Features

Dynamics API: gesture triggers via keywords or API
CO-STAR framework for structured persona
Self-hosted deployment (CPU or GPU)
Edge deployment on ARM/x86 hardware
Offline operation possible
LiveKit integration for WebRTC streaming

API Constraints

Expression model (GPU) required for custom faces outside catalog
LiveKit dependency for real-time streaming
Manual memory persistence via context injection
No webhook support

Pricing Model

Model: Credit-based subscription

Plan	Price	Included allowance	Overage
Free	$0/mo	~99 min (Essence)	N/A
Basic	$10/mo	700 credits	$0.01/min (Essence)
Pro	$25/mo	2300 credits	$0.01/min (Essence)
Creator	$75/mo	10000 credits	$0.01/min (Essence)
Business	$200/mo	30000 credits	$0.01/min (Essence)

Free tier

On-premise available

Enterprise pricing

Hidden costs / watch out

250 credits fixed for agent generation
Expression model (GPU): 4 credits/min vs 1 credit/min for Essence
GPU hardware cost if self-hosted
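The effect of these costs on usable minutes can be worked out from the figures on this page: plan credits, 1 credit/min for Essence, 4 credits/min for Expression, and 250 credits per agent generation. The sketch below assumes that agent generation is paid from the same monthly credit pool and that one agent is generated in the month; both are assumptions, and `included_minutes` is a hypothetical helper.

```python
AGENT_GENERATION_CREDITS = 250                      # fixed cost per generated agent
CREDITS_PER_MINUTE = {"essence": 1, "expression": 4}
PLAN_CREDITS = {"Basic": 700, "Pro": 2300, "Creator": 10000, "Business": 30000}


def included_minutes(plan: str, model: str, new_agents: int = 1) -> int:
    """Whole minutes left after paying the fixed agent-generation cost.

    Assumes generation is charged against the plan's monthly credits.
    """
    remaining = PLAN_CREDITS[plan] - new_agents * AGENT_GENERATION_CREDITS
    return max(remaining, 0) // CREDITS_PER_MINUTE[model]


for plan in PLAN_CREDITS:
    print(plan, included_minutes(plan, "essence"), included_minutes(plan, "expression"))
```

Under these assumptions, the Basic plan leaves 450 Essence minutes or 112 Expression minutes in a month where one agent is generated, so the fixed generation cost weighs most heavily on the smaller plans.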

Sovereignty & Hosting

Sovereignty Score

5/5

Hosting

Self-hosted (on-premise, CPU or GPU) OR cloud. Full sovereignty option.

GDPR

Yes

On-premise

Yes

Sovereignty detail

Full self-hosted option (on-premise, CPU or GPU). Total data sovereignty. No cloud dependency required.

Constraints & Limits

Expression model requires GPU (higher cost)
LiveKit dependency for streaming
No webhook support
Manual memory persistence required
Body customisation limited vs. facial expressions

DigiDouble Relevance

Score

8/10

Unique Dynamics API for gesture control is a key differentiator. Self-hosted CPU deployment enables total sovereignty and 10x cost reduction. Ideal for DigiDouble's Swiss sovereignty requirements. Limitation: Expression model requires GPU for custom faces.
