OpenAI Realtime API
GPT-4o speech-to-speech — integrated LLM + voice, WebSocket
Architecture
Reference point for Phase 1 MVP benchmarking. Not suitable for GamiWays production because it offers neither data sovereignty nor voice cloning. Use it as a quality and latency benchmark, and compare it against Ultravox (end-to-end) and the cascading sovereign stack.
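A minimal sketch of how such a latency comparison could be computed, assuming each system is wrapped in a callable that performs one voice round trip. The names `measure_latencies`, `compare`, and `run_turn` are hypothetical and not part of any vendor SDK; the figures in the usage note are the medians quoted in this profile, not new measurements.

```python
import time
from statistics import median
from typing import Callable, Dict, List


def measure_latencies(run_turn: Callable[[], None], n: int = 20) -> List[float]:
    """Time n calls of run_turn (one voice round trip each), in seconds."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        run_turn()  # hypothetical: send audio, wait for the reply to start
        samples.append(time.perf_counter() - start)
    return samples


def compare(candidate: List[float], baseline: List[float]) -> Dict[str, float]:
    """Compare median latency of a candidate system against a baseline."""
    c, b = median(candidate), median(baseline)
    return {
        "candidate_median_s": c,
        "baseline_median_s": b,
        "delta_s": c - b,   # absolute gap in seconds
        "ratio": c / b,     # how many times slower the candidate is
    }
```

Feeding in the published medians, `compare([1.536], [0.864])` gives a gap of about 0.67s and a ratio of about 1.78, which is the OpenAI Realtime vs. Ultravox difference cited in this profile.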
Analysis
OpenAI Realtime API provides GPT-4o speech-to-speech with integrated reasoning over a full-duplex-capable WebSocket stream. It scores ELO 1106 (TTS rank #4). Median latency is 1.536s versus 0.864s for Ultravox, roughly 1.8x slower. It offers no voice cloning and no sovereignty. Best suited to teams already in the OpenAI ecosystem that need integrated LLM + voice.
Strengths
- GPT-4o reasoning integrated
- ELO 1106 — rank #4
- 50+ languages
- WebSocket full-duplex
- Well-documented API
Weaknesses
- No voice cloning
- No sovereignty (US cloud)
- ~$0.10/min average ($0.30/min with input and output rates combined), expensive at scale
- 1.536s median latency (vs 0.864s Ultravox)
Voice Capabilities
No voice cloning. 6 pre-built voices (Alloy, Echo, Fable, Onyx, Nova, Shimmer).
Natural emotional range from GPT-4o. Limited explicit emotion control.
WebSocket streaming. Full-duplex capable. Integrated with GPT-4o reasoning. 1.536s median latency (vs 0.864s for Ultravox).
No native lip-sync data.
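To make the WebSocket integration concrete, here is a hedged sketch that only builds the connection parameters and the first two client events, without opening a socket. The endpoint, model name, `OpenAI-Beta` header, and the `session.update` / `response.create` event names follow OpenAI's beta Realtime documentation as understood at the time of writing and are assumptions to verify against the current docs. The helper names (`connection_params`, `session_update`, `response_create`) are hypothetical.

```python
import json

# Assumed values from OpenAI's beta Realtime docs; verify before use.
REALTIME_URL = "wss://api.openai.com/v1/realtime"
MODEL = "gpt-4o-realtime-preview"


def connection_params(api_key: str) -> dict:
    """URL and headers a WebSocket client would use to open the session."""
    return {
        "url": f"{REALTIME_URL}?model={MODEL}",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "OpenAI-Beta": "realtime=v1",
        },
    }


def session_update(voice: str = "alloy", instructions: str = "") -> str:
    """JSON event that selects a pre-built voice and sets system instructions."""
    event = {
        "type": "session.update",
        "session": {
            "modalities": ["text", "audio"],
            "voice": voice,
            "instructions": instructions,
        },
    }
    return json.dumps(event)


def response_create() -> str:
    """JSON event asking the model to start generating a reply."""
    return json.dumps({"type": "response.create"})
```

A client would send `session_update()` once after connecting, stream audio in both directions over the same socket, and send `response_create()` when it wants a reply. Voice selection is limited to the pre-built voices, consistent with the lack of voice cloning noted above.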
Pricing
$0.06/min audio input + $0.24/min audio output (GPT-4o Realtime). ~$0.10/min average.
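The per-minute figures can be read in more than one way, so a small worked calculation helps. The sketch below assumes each conversation minute is split between user audio (input rate) and model audio (output rate); that billing model and the function name `cost_per_call_minute` are assumptions for illustration, not OpenAI's stated formula.

```python
INPUT_RATE = 0.06   # USD per minute of audio input (from the pricing above)
OUTPUT_RATE = 0.24  # USD per minute of audio output (from the pricing above)


def cost_per_call_minute(output_share: float) -> float:
    """Blended USD cost of one conversation minute.

    output_share is the fraction of the minute in which the model speaks;
    the remainder is treated as user audio input (an assumption).
    """
    if not 0.0 <= output_share <= 1.0:
        raise ValueError("output_share must be between 0 and 1")
    return INPUT_RATE * (1.0 - output_share) + OUTPUT_RATE * output_share


def combined_rate() -> float:
    """Upper bound: both rates applied to the same full minute."""
    return INPUT_RATE + OUTPUT_RATE
```

Under this assumption, a model that speaks about 22% of the time costs close to $0.10/min, an even split costs $0.15/min, and applying both rates to the full minute gives the $0.30/min upper bound.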
Sovereignty & Compliance
Cloud only. No on-premise option.
Data residency: US (default). EU data residency is available for enterprise customers.
OpenAI Realtime API — Strategic Positioning
Beyond technical specs: where does this tool sit in the ecosystem, and what are the risks and strategic implications for GamiWays?
OpenAI Realtime API combines GPT-4o reasoning with real-time voice, making it the most capable option but also the most expensive and sovereignty-hostile. Cloud-only delivery and a combined rate of $0.30/min ($0.06 input + $0.24 output) make it a Phase 1 prototype tool, not a Phase 2 production choice.
A. Strategic Positioning
Target customer: Developer / Enterprise — GPT-4o integration, real-time voice agents
GPT-4o native real-time voice API — speech-to-speech with LLM reasoning, emotional voice, and OpenAI ecosystem integration.
B. Competitive Moat
- Native GPT-4o integration — LLM reasoning + voice in one API call
- OpenAI brand and ecosystem — massive developer adoption and trust
- Emotional voice with natural interruption handling
Vulnerability: Cloud-only — no on-premise option. High cost. EU data sovereignty concerns. OpenAI's legal/regulatory exposure.
E. Strategic Questions for GamiWays
Sovereignty fit
Cloud-only with limited EU data residency. OpenAI's US jurisdiction creates sovereignty risk for Swiss/EU regulated deployments.
Build vs. Buy
Buy for Phase 1 if GPT-4o reasoning is required. For Phase 2 sovereignty, switch to an open-source stack (Ultravox + Kokoro/Chatterbox).
Lock-in risk
Deep GPT-4o integration creates strong ecosystem lock-in. Switching costs are high if LLM reasoning is core to the voice agent design.
Roadmap alignment
Good for Phase 1 prototyping with GPT-4o. Incompatible with Phase 2 sovereignty requirements without major architectural changes.
Data Freshness
OpenAI docs + Ultravox AIEWF eval, Feb 2026
Update note: OpenAI Realtime API (GPT-4o) pricing: $0.06/min audio input + $0.24/min audio output. ELO 1106 (rank #4, Apr 2026). TTFA (time to first audio) ~300ms. WebSocket streaming. 57 languages. No on-premise option.