OpenAI’s New Models Listen, Translate & Act in Real Time

OpenAI is moving beyond the keyboard with the launch of three new audio models: GPT-Realtime-2, GPT-Realtime-Translate and GPT-Realtime-Whisper.
Voice is becoming one of the most common and natural ways for people to use software, helping them multitask and handle tasks while on the move.
The company aims to make voice-based AI more conversational and capable of completing tasks in real time by giving software agents a natural tone and the ability to reason.
GPT-Realtime-2 is OpenAI’s first voice model with GPT‑5‑class reasoning. It can handle harder requests and carry the conversation forward naturally.
GPT-Realtime-Translate is a new live translation model that translates speech from more than 70 input languages into 13 output languages while keeping pace with the speaker.
It targets customer support, education and other settings.
GPT-Realtime-Whisper, by contrast, is a streaming speech-to-text model that transcribes speech live as a speaker talks.
It makes it easier to generate live captions, meeting notes and workflow updates.
Moving beyond talk
OpenAI says developers are building around emerging patterns such as voice-to-action, in which systems reason through requests and complete tasks.
GPT-Realtime-2, built for live voice interactions, keeps the conversation moving while it reasons through a request, calls tools and handles corrections or interruptions.
It then responds in a way that fits the moment.
A system-to-voice approach enables users to turn content into live spoken guidance.
Travellers can search for flights and hotels conversationally, while the system handles changes such as adjusting hotel reservations after flight delays.
Cobus Kok, VP AI Experiences at Priceline, says: “GPT-Realtime-2 stood out for how well it handles complex requests, coordinates multiple tool calls at once, and keeps the interaction feeling natural.
“For Penny, Priceline’s AI travel agent, that translates into quicker, more practical support by voice – especially when travellers need to adjust plans in real time.”
In a voice-to-voice approach, AI can help live conversations continue across languages, tasks or changing context.
Here, GPT-Realtime-Translate can help developers build live multilingual voice experiences where each person speaks in their preferred language.
The model supports translation from more than 70 languages into 13 output languages.
Deutsche Telekom is building voice support experiences where customers can speak in the language they’re most comfortable using, while the model translates the conversation in real time.
The system targets customer support, education and media platforms serving global audiences.
Bridging communication gaps
Vimeo also uses GPT-Realtime-Translate to translate product education videos live as they play.
This allows global customers to hear updates in their preferred language without waiting for separately produced versions.
Prateek Sachan, Co-Founder and CTO at BolnaAI, says: “Building voice AI for India means handling diverse regional phonetics. GPT-Realtime-Translate delivered 12.5% lower Word Error Rates than any other model we tested.”
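For readers unfamiliar with the metric, word error rate (WER) is conventionally the number of word substitutions, deletions and insertions needed to turn a model's transcript into a reference transcript, divided by the number of words in the reference. The Python sketch below shows that standard calculation; it is illustrative only and is not how BolnaAI or OpenAI ran their tests. The function name and sample sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER: word-level edit distance divided by reference length.

    Illustrative sketch only; lowercasing and whitespace splitting are
    simplifying assumptions, not a standard text-normalisation scheme.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution, or a match if cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one of six reference words is dropped.
wer = word_error_rate("book a hotel near the airport",
                      "book hotel near the airport")
print(round(wer, 3))  # 0.167
```

A lower figure means fewer transcription mistakes per spoken word.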
GPT-Realtime-Whisper, meanwhile, provides live speech-to-text transcription, allowing captions and meeting notes to be generated as a speaker talks.
The streaming transcription model is built for low-latency experiences.
The model makes live speech usable inside business workflows as it happens. Teams can power captions for broadcasts or generate summaries while conversations are still in progress.
Additionally, the Realtime API incorporates multiple layers of safeguards to prevent misuse, such as active classifiers that halt sessions violating harmful-content guidelines.
Usage policies also prohibit the distribution of outputs for spam or deceptive purposes. Developers must ensure it is clear to end users when they are interacting with AI.



