After facing backlash over an earlier version, OpenAI has bounced back and launched its latest Advanced Voice Mode (AVM) on ChatGPT—a smarter voice assistant that can sense emotions and lets users interrupt it mid-conversation.
The AVM debuted at the company’s Spring Launch event in May but was later shelved after criticism that one of its voices sounded like Scarlett Johansson. Since late July, the newest edition has been available in alpha to select ChatGPT Plus subscribers.
“We’re starting to roll out advanced Voice Mode to a small group of ChatGPT Plus users. Advanced Voice Mode offers more natural, real-time conversations, allows you to interrupt anytime, and senses and responds to your emotions,” OpenAI wrote in an announcement on X.
Better performance and capabilities
The artificial intelligence (AI) giant claims the new product outperforms its previous version, thanks to GPT-4o’s video and audio capabilities.
While the earlier Voice Mode relied on a pipeline of three separate models, the new AVM uses a single model end-to-end across vision, text, and audio. Because all processing happens in one neural network, response latencies fall below the previous pipeline’s averages of 2.8 seconds (with GPT-3.5) and 5.4 seconds (with GPT-4).
Moreover, OpenAI has taken steps to prevent copyright infringement by adding new filters that refuse requests to create music or other types of licensed media.
Besides audio, the AVM is also designed to assist with its screen and video-sharing features. However, these are not yet accessible in alpha.
OpenAI stated that it would gather user feedback to refine the model, with plans to report on GPT-4o’s performance, safety evaluations, and limitations in August before rolling out the product to all ChatGPT Plus subscribers sometime in the fall.
Storyteller, singer, and more
Since its release, many ChatGPT Plus users have shared their reactions after trying the AVM, uploading videos that show OpenAI’s voice assistant telling stories and singing songs with accurate pronunciation and even accents.
One of them was @nickfloats on X, who posted a clip asking ChatGPT to “tell me a story as if you’re an airline pilot telling it to passengers on a flight.” After just a few seconds, the chatbot narrated a story with an added effect, as if it were coming from an intercom.
A YouTube video also shows ChatGPT as a language coach, giving instructions on how to pronounce French words correctly. Similarly, the AI is heard speaking Turkish while telling an emotional story in a separate clip.
In a video shared on X, the chatbot mimicked different regional US accents, including New York, Boston, and Wisconsin. It also delivered when asked to sing “Happy Birthday” in a blues style, and again when asked to perform the same song while imitating various animals.
Sky voice on pause
Five voices—Breeze, Cove, Ember, Juniper, and Sky—were introduced with the first version of Voice Mode in September 2023. Among them, Sky drew controversy for resembling Scarlett Johansson, who later threatened legal action against OpenAI for allegedly using her voice without permission.
Since then, the company has paused using Sky’s voice in its products and released a statement that said, “The voice of Sky is not Scarlett Johansson’s, and it was never intended to resemble hers. We cast the voice actor behind Sky’s voice before any outreach to Ms. Johansson. Out of respect for Ms. Johansson, we have paused using Sky’s voice in our products. We are sorry to Ms. Johansson that we didn’t communicate better.”
Currently, the latest AVM offers only four preset voices.