Commercial · Sovereignty: 2/5

LemonSlice (LS-2.1)

20B Video DiT · Multi-style avatars (human + cartoon + mascot) · Real-time emotion triggering

TTFR Latency: ~3000 ms (real-time)

Cost / minute: $0.210/min (real-time)

Visual Quality: 8/10 (estimated score)

Protocols

REST, WebSocket, WebRTC (via Self-Managed Pipeline)

Avatar Customisation

RAG / Knowledge Base

No native RAG. Knowledge base must be managed externally by the developer's LLM layer.

Behavior & Personality

System prompt passed to the integrated LLM layer. Developer can also bring their own LLM (BYOLLM) and control behavior entirely.

Body Language & Gestures

LS-2.1 generates full-body motion from audio. Gestures, head movements and posture are synthesised by the 20B DiT model — not pre-recorded loops.

Facial Expressions

LS-2.1 Emotion API: trigger specific emotional states (joy, surprise, concern, neutral) via API call. Real-time emotion blending with smooth transitions. Context freshness < 500ms.
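A minimal sketch of building an emotion-trigger request: the four emotion names and the 0.0–1.0 intensity range come from the platform's published feature list, but the payload shape and clamping behaviour are our assumptions.

```python
# Emotion set and intensity range are documented; the payload shape is assumed.
VALID_EMOTIONS = {"joy", "surprise", "concern", "neutral"}

def emotion_payload(emotion: str, intensity: float) -> dict:
    if emotion not in VALID_EMOTIONS:
        raise ValueError(f"unsupported emotion: {emotion}")
    # Clamp intensity into the documented 0.0-1.0 range.
    intensity = max(0.0, min(1.0, intensity))
    return {"emotion": emotion, "intensity": intensity}
```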

Voice & Voice Cloning

Integrated TTS or BYOTTS (Bring Your Own TTS). Supports ElevenLabs, Cartesia, and custom audio streams. 20+ languages.
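For the custom-audio-stream path, a common pattern is framing raw TTS output into fixed-size chunks before streaming. The sketch below assumes 16-bit mono PCM and 20 ms frames — the actual ingest format LemonSlice expects is not documented here.

```python
# Frame raw PCM audio (assumed 16-bit mono) into fixed-duration chunks for
# streaming, e.g. over a WebSocket. Frame duration and format are assumptions.
def frame_pcm(audio: bytes, sample_rate: int = 16_000,
              frame_ms: int = 20) -> list[bytes]:
    bytes_per_frame = sample_rate * 2 * frame_ms // 1000  # 2 bytes/sample
    return [audio[i:i + bytes_per_frame]
            for i in range(0, len(audio), bytes_per_frame)]
```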

Persona Fine-Tuning

Avatar style is configurable at creation: photorealistic human, cartoon, mascot, animal, or stylised character — all from a single image or reference. Style is locked per avatar instance.

Avatar Training

Video required: No (image sufficient)
Duration: 1 image (zero-shot) or short video clip (optional, for higher fidelity)
Resolution: 512×512 minimum · 1024×1024 recommended
Format: JPEG, PNG, MP4
Consent required: Yes (mandatory)
Processing time: < 5 minutes (zero-shot) · 15–30 minutes (fine-tuned)
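The requirements above lend themselves to a pre-flight check before uploading a reference asset. The function name is ours; the rules mirror the table (minimum 512×512, JPEG/PNG/MP4, mandatory consent).

```python
# Pre-flight validation of a reference asset against the training requirements
# listed above. Helper name and return shape are our own convention.
ALLOWED_FORMATS = {"jpeg", "jpg", "png", "mp4"}

def validate_reference(fmt: str, width: int, height: int,
                       consent: bool) -> list[str]:
    """Return a list of problems; an empty list means the asset is acceptable."""
    problems = []
    if fmt.lower() not in ALLOWED_FORMATS:
        problems.append(f"unsupported format: {fmt}")
    if min(width, height) < 512:
        problems.append("below 512x512 minimum")
    if not consent:
        problems.append("consent is mandatory")
    return problems
```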

Best Practices

  • Single front-facing image with clear face/character visibility
  • For cartoon/mascot: provide a reference sheet with multiple angles if available
  • Consistent lighting on the reference image improves temporal coherence
  • For photorealistic: neutral expression in reference yields best emotional range
  • Optional: 30 s reference video for improved lip-sync accuracy

API Analysis

Protocols

REST · WebSocket · WebRTC (Self-Managed Pipeline)

SDKs

JavaScript/TypeScript SDK · Python SDK · React component library
Webhooks: Yes

Concurrent Sessions

Hosted: plan-dependent · Self-Managed: limited by GPU capacity

Rate Limits

Hosted API: 10 concurrent sessions (Growth) · Self-Managed: no limit
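On the hosted Growth plan it is worth guarding the 10-session ceiling client-side rather than discovering it via rejected requests. A plain counter sketch follows; in an async client the same idea is `asyncio.Semaphore(10)`.

```python
# Client-side guard for the hosted Growth plan's 10-concurrent-session limit.
# Our own helper, not part of the LemonSlice SDK.
class SessionGate:
    def __init__(self, limit: int = 10):
        self.limit = limit
        self.active = 0

    def acquire(self) -> bool:
        """Reserve a session slot; False when the plan limit is reached."""
        if self.active >= self.limit:
            return False
        self.active += 1
        return True

    def release(self) -> None:
        self.active = max(0, self.active - 1)
```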

Key Features

  • LemonSlice-2 (Dec 2025): 20B Video DiT, 20 FPS on single A100 GPU — 10× efficiency vs LS-1
  • LS-2.1 (Q1 2026): adds real-time emotion triggering + action API (wave, nod, point, etc.)
  • UNIQUE: only commercial platform supporting cartoons, mascots, animals alongside photorealistic humans
  • Zero-shot avatar creation from 1 image — no training video required
  • Self-Managed Pipeline: deploy LS-2 on your own GPU infrastructure for full sovereignty
  • BYOLLM + BYOTTS: bring your own LLM and TTS, LemonSlice handles only video rendering
  • Emotion API: trigger joy, surprise, concern, neutral with configurable intensity (0.0–1.0)
  • Action API: trigger gestures (wave, nod, point, shrug) via API call
  • Temporal coherence: 20B DiT maintains identity across long sessions without drift
  • Multi-character scenes: up to 3 avatars in a single session (beta)
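The Action API above can be sketched the same way as the Emotion API: the gesture names (wave, nod, point, shrug) are from the feature list, while the request shape is an assumption.

```python
# Gesture set is documented; session_id field and payload shape are assumed.
SUPPORTED_ACTIONS = {"wave", "nod", "point", "shrug"}

def action_payload(session_id: str, action: str) -> dict:
    if action not in SUPPORTED_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    return {"session_id": session_id, "action": action}
```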

API Constraints

  • Self-Managed Pipeline requires A100/H100 GPU (not consumer hardware)
  • Hosted API: ~3s end-to-end latency (not suitable for sub-2s target without Self-Managed)
  • No native RAG — developer must manage knowledge base externally
  • Cartoon/mascot styles require style reference image for best results
  • Multi-character beta: limited to 3 avatars, no cross-avatar interaction API yet

Pricing Model

Model: Subscription + usage-based · Self-Managed: flat GPU fee
Plan          Price     Included minutes      Overage
Free          $0/mo     30 min/mo (hosted)    N/A
Starter       $49/mo    200 min               $0.25/min
Growth        $199/mo   950 min               $0.21/min
Self-Managed  $499/mo   Unlimited (own GPU)   GPU cost only
Enterprise    Custom    Custom                Negotiated

Free tier · On-premise available · Enterprise pricing
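Monthly spend under these plans is a simple base-plus-overage calculation; the numbers below come straight from the pricing table.

```python
# Plan table: (base price, included minutes, overage per extra minute).
# Figures taken from the pricing table above; Free has no overage path.
PLANS = {
    "Free":    (0.0,   30,  None),
    "Starter": (49.0,  200, 0.25),
    "Growth":  (199.0, 950, 0.21),
}

def monthly_cost(plan: str, minutes: int) -> float:
    base, included, overage = PLANS[plan]
    extra = max(0, minutes - included)
    if extra and overage is None:
        raise ValueError("plan has no overage; upgrade required")
    return base + extra * (overage or 0.0)
```

For example, 1000 minutes on Growth is $199 plus 50 overage minutes at $0.21, i.e. $209.50.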

Hidden costs / watch out

  • Self-Managed requires A100/H100 GPU rental ($2–4/hr on cloud)
  • BYOTTS costs billed separately by TTS provider
  • Fine-tuning jobs billed per compute hour
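A rough break-even sketch between hosted Growth and Self-Managed, under two assumptions of ours: rendering occupies the GPU minute-for-minute, and the GPU rents at the mid-range $3/hr ($0.05/min). Actual GPU utilisation and rental rates will shift the crossover.

```python
# Break-even estimate: hosted Growth ($199/mo, 950 min included, $0.21/min
# overage) vs Self-Managed ($499/mo flat + GPU rental). GPU minute-for-minute
# usage and the $3/hr rate are assumptions, not vendor figures.
def hosted_growth_cost(minutes: float) -> float:
    return 199.0 + max(0.0, minutes - 950) * 0.21

def self_managed_cost(minutes: float, gpu_per_hour: float = 3.0) -> float:
    return 499.0 + minutes * gpu_per_hour / 60.0

def break_even_minutes(gpu_per_hour: float = 3.0, step: int = 10) -> int:
    m = 0
    while self_managed_cost(m, gpu_per_hour) > hosted_growth_cost(m):
        m += step
    return m
```

Under these assumptions the crossover lands somewhere around 3,100 minutes per month; below that, hosted Growth stays cheaper even with overage.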

Sovereignty & Hosting

Sovereignty Score

2/5

Hosting

US (hosted) · On-premise possible via Self-Managed Pipeline

GDPR

Yes

On-premise

Yes

Sovereignty detail

US-hosted by default. Self-Managed Pipeline allows on-premise GPU deployment — sovereignty possible but requires infrastructure setup.

Constraints & Limits

  • ~3s hosted latency — above DigiDouble 2s target without Self-Managed Pipeline
  • Self-Managed requires A100/H100 GPU (significant infrastructure investment)
  • No native RAG integration
  • US-hosted by default (Self-Managed enables sovereignty)
  • Multi-character scenes limited to 3 avatars (beta)
  • Cartoon/mascot style locked at avatar creation — cannot switch style mid-session

DigiDouble Relevance

Score

8/10

Strategically relevant for DigiDouble's Emotional Toolbox and Character Design axes. UNIQUE capability: multi-style avatars (cartoons, mascots, animals) enable non-human pedagogical characters — a gap no other commercial platform covers. Self-Managed Pipeline aligns with DigiDouble's sovereignty requirement. Main challenge: ~3s hosted latency requires Self-Managed deployment to meet the 2s target. Strong candidate for Gamilab integration (gamified avatars).