LemonSlice (LS-2.1)
20B Video DiT · Multi-style avatars (human + cartoon + mascot) · Real-time emotion triggering
TTFR Latency: ~3000ms (real-time)
Cost / minute: $0.210/min (real-time)
Visual Quality: 8/10 (estimated score)
Protocols: REST, WebSocket, WebRTC (via Self-Managed Pipeline)
Avatar Customisation
RAG / Knowledge Base
No native RAG. The knowledge base must be managed externally by the developer's LLM layer.
Behavior & Personality
System prompt passed to the integrated LLM layer. Developers can also bring their own LLM (BYOLLM) and control behavior entirely.
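As a rough illustration of the system-prompt and BYOLLM wiring described above, here is a minimal sketch of a session configuration. The field names (`behavior`, `llm`, `mode`, `endpoint`) and the placeholder avatar id are assumptions for illustration, not the confirmed LemonSlice schema.

```python
import json

def build_session_config(system_prompt, byollm_url=None):
    """Build a JSON session config; a BYOLLM endpoint is optional.

    Field names are hypothetical — consult the LemonSlice API
    reference for the real schema.
    """
    config = {
        "avatar_id": "avatar_123",  # placeholder id
        "behavior": {"system_prompt": system_prompt},
    }
    if byollm_url:
        # BYOLLM: LemonSlice renders video only; text generation
        # is delegated to the developer's own endpoint.
        config["llm"] = {"mode": "byollm", "endpoint": byollm_url}
    else:
        config["llm"] = {"mode": "integrated"}
    return json.dumps(config)

cfg = json.loads(build_session_config(
    "You are a friendly science tutor.", "https://example.com/llm"))
```

The same shape covers both paths: omit the endpoint to use the integrated LLM, or pass one to take full control of behavior.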
Body Language & Gestures
LS-2.1 generates full-body motion from audio. Gestures, head movements and posture are synthesised by the 20B DiT model — not pre-recorded loops.
Facial Expressions
LS-2.1 Emotion API: trigger specific emotional states (joy, surprise, concern, neutral) via API call. Real-time emotion blending with smooth transitions. Context freshness < 500ms.
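A sketch of what an Emotion API request body might look like, combining the four emotional states listed above with the 0.0–1.0 intensity range mentioned under Key Features. The payload shape is an assumption; only the emotion names and the intensity range come from the source.

```python
# Emotions listed in the LS-2.1 Emotion API description.
VALID_EMOTIONS = {"joy", "surprise", "concern", "neutral"}

def emotion_payload(emotion, intensity):
    """Build an emotion-trigger payload; intensity is clamped to 0.0–1.0.

    The dict shape is hypothetical — the real request schema may differ.
    """
    if emotion not in VALID_EMOTIONS:
        raise ValueError(f"unsupported emotion: {emotion}")
    return {"emotion": emotion, "intensity": max(0.0, min(1.0, intensity))}

payload = emotion_payload("joy", 1.4)  # over-range intensity is clamped to 1.0
```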
Voice & Voice Cloning
Integrated TTS or BYOTTS (Bring Your Own TTS). Supports ElevenLabs, Cartesia, and custom audio streams. 20+ languages.
Persona Fine-Tuning
Avatar style is configurable at creation: photorealistic human, cartoon, mascot, animal, or stylised character — all from a single image or reference. Style is locked per avatar instance.
Avatar Training
Best Practices
1. Single front-facing image with clear face/character visibility
2. For cartoon/mascot: provide a reference sheet with multiple angles if available
3. Consistent lighting on the reference image improves temporal coherence
4. For photorealistic: a neutral expression in the reference yields the best emotional range
5. Optional: a 30s reference video improves lip-sync accuracy
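The best practices above can be mirrored in a small pre-flight check before submitting an avatar-creation job. The input field names (`front_facing_image`, `style`, `reference_sheet`, `expression`, `reference_video_30s`) are illustrative assumptions, not the real creation API.

```python
def preflight(ref):
    """Return a list of warnings mirroring the training best practices.

    `ref` is a dict describing the reference assets; its keys are
    hypothetical and should be adapted to the actual creation API.
    """
    warnings = []
    if not ref.get("front_facing_image"):
        warnings.append("provide a single front-facing image")
    if ref.get("style") in {"cartoon", "mascot"} and not ref.get("reference_sheet"):
        warnings.append("cartoon/mascot: multi-angle reference sheet recommended")
    if ref.get("style") == "photorealistic" and ref.get("expression") != "neutral":
        warnings.append("neutral expression yields best emotional range")
    if not ref.get("reference_video_30s"):
        warnings.append("optional 30s reference video improves lip-sync")
    return warnings

issues = preflight({"front_facing_image": True, "style": "mascot"})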
API Analysis
Protocols
SDKs
Concurrent Sessions: plan-dependent (Hosted) · limited by GPU capacity (Self-Managed)
Rate Limits: 10 concurrent sessions on Hosted API (Growth plan) · no limit on Self-Managed
Key Features
- LemonSlice-2 (Dec 2025): 20B Video DiT, 20 FPS on single A100 GPU — 10× efficiency vs LS-1
- LS-2.1 (Q1 2026): adds real-time emotion triggering + action API (wave, nod, point, etc.)
- UNIQUE: only commercial platform supporting cartoons, mascots, animals alongside photorealistic humans
- Zero-shot avatar creation from 1 image — no training video required
- Self-Managed Pipeline: deploy LS-2 on your own GPU infrastructure for full sovereignty
- BYOLLM + BYOTTS: bring your own LLM and TTS, LemonSlice handles only video rendering
- Emotion API: trigger joy, surprise, concern, neutral with configurable intensity (0.0–1.0)
- Action API: trigger gestures (wave, nod, point, shrug) via API call
- Temporal coherence: 20B DiT maintains identity across long sessions without drift
- Multi-character scenes: up to 3 avatars in a single session (beta)
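To complement the Emotion API, the Action API gestures listed above could be triggered with a message along these lines, e.g. over the WebSocket channel. The gesture names come from the feature list; the message shape is an assumption.

```python
import json

# Gestures named in the Action API feature description.
GESTURES = {"wave", "nod", "point", "shrug"}

def action_message(gesture):
    """Serialise a gesture trigger as a JSON message.

    The {"type": "action", ...} envelope is hypothetical — check the
    LemonSlice protocol documentation for the real message format.
    """
    if gesture not in GESTURES:
        raise ValueError(f"unknown gesture: {gesture}")
    return json.dumps({"type": "action", "gesture": gesture})
```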
API Constraints
- Self-Managed Pipeline requires A100/H100 GPU (not consumer hardware)
- Hosted API: ~3s end-to-end latency (not suitable for sub-2s target without Self-Managed)
- No native RAG — developer must manage knowledge base externally
- Cartoon/mascot styles require style reference image for best results
- Multi-character beta: limited to 3 avatars, no cross-avatar interaction API yet
Pricing Model
| Plan | Price | Included minutes | Overage |
|---|---|---|---|
| Free | $0/mo | 30 min/mo (hosted) | N/A |
| Starter | $49/mo | 200 min | $0.25/min |
| Growth | $199/mo | 950 min | $0.21/min |
| Self-Managed | $499/mo | Unlimited (own GPU) | GPU cost only |
| Enterprise | Custom | Custom | Negotiated |
Hidden costs / watch out
- Self-Managed requires A100/H100 GPU rental ($2–4/hr on cloud)
- BYOTTS costs billed separately by TTS provider
- Fine-tuning jobs billed per compute hour
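A back-of-envelope comparison of the Growth and Self-Managed plans, using the figures from the pricing table and the $2–4/hr GPU rental noted above. It assumes the GPU is billed only for minutes of active rendering, which is optimistic (idle capacity would add cost).

```python
def growth_cost(minutes):
    """Growth plan: $199 base covers 950 min, then $0.21/min overage."""
    overage = max(0.0, minutes - 950) * 0.21
    return 199 + overage

def self_managed_cost(minutes, gpu_rate_hr=3.0):
    """Self-Managed: $499 licence plus cloud GPU billed per active hour.

    gpu_rate_hr=3.0 is the midpoint of the quoted $2–4/hr range.
    """
    return 499 + (minutes / 60) * gpu_rate_hr

# At 3000 min/mo: Growth = $629.50, Self-Managed = $649.00 at $3/hr,
# so Self-Managed only pays off at higher volumes (or cheaper GPUs).
```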
Sovereignty & Hosting
Sovereignty Score
Hosting: US (hosted) · On-premise possible via Self-Managed Pipeline
GDPR: Yes
On-premise: Yes
Sovereignty detail: US-hosted by default. The Self-Managed Pipeline allows on-premise GPU deployment; sovereignty is possible but requires infrastructure setup.
Constraints & Limits
- ~3s hosted latency exceeds DigiDouble's 2s target unless the Self-Managed Pipeline is used
- Self-Managed requires A100/H100 GPU (significant infrastructure investment)
- No native RAG integration
- US-hosted by default (Self-Managed enables sovereignty)
- Multi-character scenes limited to 3 avatars (beta)
- Cartoon/mascot style locked at avatar creation — cannot switch style mid-session
DigiDouble Relevance
Score: 8/10
Strategically relevant for DigiDouble's Emotional Toolbox and Character Design axes. UNIQUE capability: multi-style avatars (cartoons, mascots, animals) enable non-human pedagogical characters — a gap no other commercial platform covers. Self-Managed Pipeline aligns with DigiDouble's sovereignty requirement. Main challenge: ~3s hosted latency requires Self-Managed deployment to meet the 2s target. Strong candidate for Gamilab integration (gamified avatars).