
Cloud API | #4 Artificial Analysis | Commercial

OpenAI Realtime API

GPT-4o speech-to-speech — integrated LLM + voice, WebSocket


TTFA (best case): 300 ms

TTFA (typical): 700 ms

Price per million chars: Free

ELO Score: 1106

Comparative Scores

Voice quality: 8/10

Latency: 6/10

Voice cloning: 1/10

Expressiveness: 7/10

Sovereignty: 1/10

Price accessibility: 4/10

Multilingual: 9/10

Architecture

Architecture: GPT-4o multimodal (speech-to-speech, integrated LLM)

Parameters: N/A (cloud, GPT-4o scale)

Languages: 50

Self-hostable: No

Streaming: Yes

GamiWays

Phase 1 MVP: Reference benchmark

Reference point for Phase 1 MVP benchmarking. Not suitable for GamiWays production because it offers neither data sovereignty nor voice cloning. Use it as the quality and latency baseline when comparing against Ultravox (end-to-end) and the cascading sovereign stack.
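A minimal sketch of how a time-to-first-audio (TTFA) comparison across providers could be measured, assuming each provider can be wrapped in a callable that returns an iterable of audio chunks. The names `time_to_first_audio`, `summarize`, and `fake_stream` are hypothetical and only illustrate the measurement; they are not part of any provider SDK.

```python
import time
from statistics import median
from typing import Callable, Iterable, Iterator, List


def time_to_first_audio(start_stream: Callable[[], Iterable[bytes]],
                        clock: Callable[[], float] = time.perf_counter) -> float:
    """Return seconds from issuing the request to the first non-empty audio chunk."""
    started = clock()
    for chunk in start_stream():
        if chunk:  # skip empty frames that carry no audio
            return clock() - started
    raise RuntimeError("stream ended before any audio arrived")


def summarize(samples: List[float]) -> dict:
    """Best-case and typical (median) TTFA in milliseconds."""
    return {
        "best_ms": round(min(samples) * 1000),
        "median_ms": round(median(samples) * 1000),
    }


def fake_stream() -> Iterator[bytes]:
    """Hypothetical stand-in for a provider's audio stream."""
    yield b""            # empty frame before audio starts
    yield b"\x00\x01"    # first real audio chunk
    yield b"\x02\x03"
```

Running the same harness against each candidate (OpenAI Realtime, Ultravox, the cascading stack) with identical prompts and network conditions would keep the best-case and typical figures comparable.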

Analysis

The OpenAI Realtime API provides GPT-4o speech-to-speech with integrated reasoning over a full-duplex-capable WebSocket stream. It holds an ELO of 1106 (TTS rank #4). Median latency is 1.536 s, against 0.864 s for Ultravox. It offers no voice cloning and no sovereignty. It is best suited to teams already in the OpenAI ecosystem that need integrated LLM + voice.

Strengths

GPT-4o reasoning integrated
ELO 1106 — rank #4
50+ languages
WebSocket full-duplex
Well-documented API

Weaknesses

No voice cloning
No sovereignty (US cloud)
$0.10/min — expensive at scale
1.536s median latency (vs 0.864s Ultravox)

Voice Capabilities

Voice Cloning: No

No voice cloning. 6 pre-built voices (Alloy, Echo, Fable, Onyx, Nova, Shimmer).

Emotion Control: Yes

Natural emotional range from GPT-4o. Limited explicit emotion control.

Streaming: Yes

WebSocket streaming. Full-duplex capable. Integrated with GPT-4o reasoning. 1.536s median latency (vs 0.864s for Ultravox).
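As a rough illustration of the WebSocket event flow, the sketch below builds a session-configuration message and extracts audio from streamed server messages. The endpoint URL, model name, and event names (`session.update`, `response.audio.delta`, `response.done`) are assumptions based on the beta-era Realtime interface and should be checked against the current OpenAI docs. The helper functions are hypothetical, and the transport itself (opening the socket, authentication headers) is deliberately left out.

```python
import base64
import json
from typing import Iterable, Iterator

# Assumed endpoint and model name; verify against the current OpenAI docs.
REALTIME_URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"


def session_update(voice: str = "alloy", instructions: str = "") -> str:
    """Build the client event that configures the session (voice, modalities)."""
    return json.dumps({
        "type": "session.update",
        "session": {
            "modalities": ["audio", "text"],
            "voice": voice,
            "instructions": instructions,
        },
    })


def audio_chunks(server_messages: Iterable[str]) -> Iterator[bytes]:
    """Yield decoded audio bytes from streamed server events until the response ends."""
    for raw in server_messages:
        event = json.loads(raw)
        if event.get("type") == "response.audio.delta":
            # Audio arrives as base64-encoded chunks in the "delta" field.
            yield base64.b64decode(event["delta"])
        elif event.get("type") == "response.done":
            return
```

Because audio arrives as incremental deltas, playback can begin on the first chunk rather than after the full response, which is what the TTFA figures on this page measure.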

Lip-sync Data: No

No native lip-sync data.

Pricing

Price / 1M chars: Free

Price / minute: $0.10

Free tier: API credits on signup

GPT-4o Realtime is billed at $0.06/min for audio input plus $0.24/min for audio output, or about $0.10/min on average.
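The page does not state how the ~$0.10/min average is derived. One plausible reading is that each side produces billed audio for only part of a conversation minute. The sketch below shows that arithmetic; the talk-share values are hypothetical assumptions, and only the two per-minute rates come from the pricing note.

```python
INPUT_PER_MIN = 0.06   # USD per minute of audio input (from the pricing note)
OUTPUT_PER_MIN = 0.24  # USD per minute of audio output (from the pricing note)


def cost_per_conversation_minute(user_talk_share: float,
                                 agent_talk_share: float) -> float:
    """Blended USD cost for one minute of conversation.

    Each share is the fraction of the minute during which that side
    produces billed audio (an assumption, not a documented billing rule).
    """
    if not (0.0 <= user_talk_share <= 1.0 and 0.0 <= agent_talk_share <= 1.0):
        raise ValueError("shares must be between 0 and 1")
    return round(INPUT_PER_MIN * user_talk_share
                 + OUTPUT_PER_MIN * agent_talk_share, 4)
```

With both shares at 1.0 the result is the $0.30/min combined list rate. A hypothetical split of 0.5 for the user and 0.3 for the agent lands near $0.10/min, so the blended figure depends heavily on how much the agent speaks.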

Sovereignty & Compliance

On-premise: No

Cloud only. No on-premise option.

GDPR: Compliant

Data residency: US by default. EU data residency is available on enterprise plans.

Strategic & Business Analysis

OpenAI Realtime API — Strategic Positioning

Beyond technical specs: where does this tool sit in the ecosystem, what are the risks and strategic implications for GamiWays?

OpenAI Realtime API combines GPT-4o reasoning with real-time voice. It is the most capable option, but also the most expensive and the least compatible with sovereignty requirements. Cloud-only delivery and a combined $0.30/min ($0.06 input + $0.24 output) make it a Phase 1 prototype tool, not a Phase 2 production choice.

Cloud SaaS only

Lock-in risk: High

Sovereignty fit: Low

Open-source threat: High

Pricing: Falling ↓

A. Strategic Positioning

Target customer: Developer / Enterprise — GPT-4o integration, real-time voice agents

GPT-4o native real-time voice API — speech-to-speech with LLM reasoning, emotional voice, and OpenAI ecosystem integration.

B. Competitive Moat

Native GPT-4o integration — LLM reasoning + voice in one API call
OpenAI brand and ecosystem — massive developer adoption and trust
Emotional voice with natural interruption handling

Vulnerability: Cloud-only — no on-premise option. High cost. EU data sovereignty concerns. OpenAI's legal/regulatory exposure.

E. Strategic Questions for GamiWays

Sovereignty fit

Cloud-only with limited EU data residency. OpenAI's US jurisdiction creates sovereignty risk for Swiss/EU regulated deployments.

Build vs. Buy

Buy for Phase 1 if GPT-4o reasoning is required. For Phase 2 sovereignty, switch to an open-source stack (Ultravox + Kokoro/Chatterbox).

Lock-in risk

Deep GPT-4o integration creates strong ecosystem lock-in. Switching costs are high if LLM reasoning is core to the voice agent design.

Roadmap alignment

Good for Phase 1 prototyping with GPT-4o. Incompatible with Phase 2 sovereignty requirements without major architectural changes.


Data Freshness

Updated 30 April 2026

OpenAI docs + Ultravox AIEWF eval, Feb 2026

Update note: OpenAI Realtime API (GPT-4o) pricing: $0.06/min audio input + $0.24/min audio output. ELO 1106 (rank #4, Apr 2026). TTFA ~300ms. WebSocket streaming. 57 languages. No on-premise.

Reference Sources

OpenAI Realtime API Pricing (pricing)
OpenAI Realtime API Docs (docs)
Artificial Analysis TTS (benchmark)