LemonSlice (LS-2.1)
20B Video DiT · Multi-style avatars (human + cartoon + mascot) · Real-time emotion triggering
TTFR Latency: ~3000ms (real-time)
Cost / minute: $0.210/min (real-time)
Visual Quality: 8/10 (estimated score)
Protocols: REST, WebSocket, WebRTC (via Self-Managed Pipeline)
Avatar Customisation
RAG / Knowledge Base
No native RAG. The knowledge base must be managed externally by the developer's LLM layer.
Behavior & Personality
System prompt passed to the integrated LLM layer. Developers can also bring their own LLM (BYOLLM) and control behavior entirely.
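As a rough illustration of the system-prompt and BYOLLM wiring described above, here is a minimal sketch of a session configuration. The field names (`behavior`, `llm`, `mode`, `endpoint`) and the placeholder avatar id are assumptions for illustration, not the confirmed LemonSlice schema.

```python
import json

def build_session_config(system_prompt, byollm_url=None):
    """Build a JSON session config; a BYOLLM endpoint is optional.

    Field names are hypothetical — consult the LemonSlice API
    reference for the real schema.
    """
    config = {
        "avatar_id": "avatar_123",  # placeholder id
        "behavior": {"system_prompt": system_prompt},
    }
    if byollm_url:
        # BYOLLM: LemonSlice renders video only; text generation
        # is delegated to the developer's own endpoint.
        config["llm"] = {"mode": "byollm", "endpoint": byollm_url}
    else:
        config["llm"] = {"mode": "integrated"}
    return json.dumps(config)

cfg = json.loads(build_session_config(
    "You are a friendly science tutor.", "https://example.com/llm"))
```

The same shape covers both paths: omit the endpoint to use the integrated LLM, or pass one to take full control of behavior.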
Body Language & Gestures
LS-2.1 generates full-body motion from audio. Gestures, head movements and posture are synthesised by the 20B DiT model — not pre-recorded loops.
Facial Expressions
LS-2.1 Emotion API: trigger specific emotional states (joy, surprise, concern, neutral) via API call. Real-time emotion blending with smooth transitions. Context freshness < 500ms.
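A sketch of what an Emotion API request body might look like, combining the four emotional states listed above with the 0.0–1.0 intensity range mentioned under Key Features. The payload shape is an assumption; only the emotion names and the intensity range come from the source.

```python
# Emotions listed in the LS-2.1 Emotion API description.
VALID_EMOTIONS = {"joy", "surprise", "concern", "neutral"}

def emotion_payload(emotion, intensity):
    """Build an emotion-trigger payload; intensity is clamped to 0.0–1.0.

    The dict shape is hypothetical — the real request schema may differ.
    """
    if emotion not in VALID_EMOTIONS:
        raise ValueError(f"unsupported emotion: {emotion}")
    return {"emotion": emotion, "intensity": max(0.0, min(1.0, intensity))}

payload = emotion_payload("joy", 1.4)  # over-range intensity is clamped to 1.0
```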
Voice & Voice Cloning
Integrated TTS or BYOTTS (Bring Your Own TTS). Supports ElevenLabs, Cartesia, and custom audio streams. 20+ languages.
Persona Fine-Tuning
Avatar style is configurable at creation: photorealistic human, cartoon, mascot, animal, or stylised character — all from a single image or reference. Style is locked per avatar instance.
Avatar Training
Best Practices
1. Single front-facing image with clear face/character visibility
2. For cartoon/mascot: provide a reference sheet with multiple angles if available
3. Consistent lighting on the reference image improves temporal coherence
4. For photorealistic: a neutral expression in the reference yields the best emotional range
5. Optional: a 30s reference video improves lip-sync accuracy
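The best practices above can be mirrored in a small pre-flight check before submitting an avatar-creation job. The input field names (`front_facing_image`, `style`, `reference_sheet`, `expression`, `reference_video_30s`) are illustrative assumptions, not the real creation API.

```python
def preflight(ref):
    """Return a list of warnings mirroring the training best practices.

    `ref` is a dict describing the reference assets; its keys are
    hypothetical and should be adapted to the actual creation API.
    """
    warnings = []
    if not ref.get("front_facing_image"):
        warnings.append("provide a single front-facing image")
    if ref.get("style") in {"cartoon", "mascot"} and not ref.get("reference_sheet"):
        warnings.append("cartoon/mascot: multi-angle reference sheet recommended")
    if ref.get("style") == "photorealistic" and ref.get("expression") != "neutral":
        warnings.append("neutral expression yields best emotional range")
    if not ref.get("reference_video_30s"):
        warnings.append("optional 30s reference video improves lip-sync")
    return warnings

issues = preflight({"front_facing_image": True, "style": "mascot"})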
API Analysis
Protocols
SDKs
Concurrent Sessions: plan-dependent (Hosted) · limited by GPU capacity (Self-Managed)
Rate Limits: 10 concurrent sessions on Hosted API (Growth plan) · no limit on Self-Managed
Key Features
- LemonSlice-2 (Dec 2025): 20B Video DiT, 20 FPS on single A100 GPU — 10× efficiency vs LS-1
- LS-2.1 (Q1 2026): adds real-time emotion triggering + action API (wave, nod, point, etc.)
- UNIQUE: only commercial platform supporting cartoons, mascots, animals alongside photorealistic humans
- Zero-shot avatar creation from 1 image — no training video required
- Self-Managed Pipeline: deploy LS-2 on your own GPU infrastructure for full sovereignty
- BYOLLM + BYOTTS: bring your own LLM and TTS, LemonSlice handles only video rendering
- Emotion API: trigger joy, surprise, concern, neutral with configurable intensity (0.0–1.0)
- Action API: trigger gestures (wave, nod, point, shrug) via API call
- Temporal coherence: 20B DiT maintains identity across long sessions without drift
- Multi-character scenes: up to 3 avatars in a single session (beta)
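To complement the Emotion API, the Action API gestures listed above could be triggered with a message along these lines, e.g. over the WebSocket channel. The gesture names come from the feature list; the message shape is an assumption.

```python
import json

# Gestures named in the Action API feature description.
GESTURES = {"wave", "nod", "point", "shrug"}

def action_message(gesture):
    """Serialise a gesture trigger as a JSON message.

    The {"type": "action", ...} envelope is hypothetical — check the
    LemonSlice protocol documentation for the real message format.
    """
    if gesture not in GESTURES:
        raise ValueError(f"unknown gesture: {gesture}")
    return json.dumps({"type": "action", "gesture": gesture})
```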
API Constraints
- Self-Managed Pipeline requires A100/H100 GPU (not consumer hardware)
- Hosted API: ~3s end-to-end latency (not suitable for sub-2s target without Self-Managed)
- No native RAG — developer must manage knowledge base externally
- Cartoon/mascot styles require style reference image for best results
- Multi-character beta: limited to 3 avatars, no cross-avatar interaction API yet
Pricing Model
| Plan | Price | Included minutes | Overage |
|---|---|---|---|
| Free | $0/mo | 30 min/mo (hosted) | N/A |
| Starter | $49/mo | 200 min | $0.25/min |
| Growth | $199/mo | 950 min | $0.21/min |
| Self-Managed | $499/mo | Unlimited (own GPU) | GPU cost only |
| Enterprise | Custom | Custom | Negotiated |
Hidden costs / watch out
- Self-Managed requires A100/H100 GPU rental ($2–4/hr on cloud)
- BYOTTS costs billed separately by TTS provider
- Fine-tuning jobs billed per compute hour
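A back-of-envelope comparison of the Growth and Self-Managed plans, using the figures from the pricing table and the $2–4/hr GPU rental noted above. It assumes the GPU is billed only for minutes of active rendering, which is optimistic (idle capacity would add cost).

```python
def growth_cost(minutes):
    """Growth plan: $199 base covers 950 min, then $0.21/min overage."""
    overage = max(0.0, minutes - 950) * 0.21
    return 199 + overage

def self_managed_cost(minutes, gpu_rate_hr=3.0):
    """Self-Managed: $499 licence plus cloud GPU billed per active hour.

    gpu_rate_hr=3.0 is the midpoint of the quoted $2–4/hr range.
    """
    return 499 + (minutes / 60) * gpu_rate_hr

# At 3000 min/mo: Growth = $629.50, Self-Managed = $649.00 at $3/hr,
# so Self-Managed only pays off at higher volumes (or cheaper GPUs).
```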
Sovereignty & Hosting
Sovereignty Score
Hosting: US (hosted) · On-premise possible via Self-Managed Pipeline
GDPR: Yes
On-premise: Yes
Sovereignty detail: US-hosted by default. The Self-Managed Pipeline allows on-premise GPU deployment; sovereignty is possible but requires infrastructure setup.
Constraints & Limits
- ~3s hosted latency exceeds DigiDouble's 2s target unless the Self-Managed Pipeline is used
- Self-Managed requires A100/H100 GPU (significant infrastructure investment)
- No native RAG integration
- US-hosted by default (Self-Managed enables sovereignty)
- Multi-character scenes limited to 3 avatars (beta)
- Cartoon/mascot style locked at avatar creation — cannot switch style mid-session
DigiDouble Relevance
Score: 8/10
Strategically relevant for DigiDouble's Emotional Toolbox and Character Design axes. UNIQUE capability: multi-style avatars (cartoons, mascots, animals) enable non-human pedagogical characters — a gap no other commercial platform covers. Self-Managed Pipeline aligns with DigiDouble's sovereignty requirement. Main challenge: ~3s hosted latency requires Self-Managed deployment to meet the 2s target. Strong candidate for Gamilab integration (gamified avatars).