OpenAI begins public release of ChatGPT Advanced Voice Mode



Four months after it was first shown to the public, OpenAI is finally bringing its new human-like conversational voice interface for ChatGPT, “ChatGPT Advanced Voice Mode,” to users beyond the initial small test group and waitlist.

All paid subscribers to OpenAI’s ChatGPT Plus and Team plans will have access to the new ChatGPT Advanced Voice Mode, though according to OpenAI, access will be rolled out gradually over the next few days. It will initially be available in the US.

The company plans to roll out ChatGPT Advanced Voice Mode to Edu and Enterprise plan subscribers next week.

OpenAI is also adding the ability to store custom instructions for the voice assistant and a “memory” of behaviors the user wants it to exhibit, similar to features introduced for the text version of ChatGPT earlier this year.

Also available today are five new, distinctly styled voices: Arbor, Maple, Sol, Spruce, and Vale — joining the previous four voices, Breeze, Juniper, Cove, and Ember, that users could talk to using ChatGPT’s older, less advanced voice mode.

This means ChatGPT users, individuals on the Plus plan and small businesses on the Team plan, can use the chatbot by speaking to it rather than typing commands. Users will know they’ve entered Advanced Voice Mode via a pop-up when they access voice mode in the app.

“Since the alpha release, we’ve used what we’ve learned to improve accents and overall speaking speed and fluidity in ChatGPT’s most popular foreign languages,” the company said. “You’ll also see a new design for Advanced Voice Mode, featuring an animated blue globe.”

OpenAI did not provide audio samples of the new voices.

These updates are only available on the GPT-4o model, not on the recently released preview model, o1. ChatGPT users can also use custom instructions and memory to ensure voice mode is personalized and responds according to their preferences across all conversations.

AI voice chat race

Since the emergence of AI-powered voice assistants like Apple’s Siri and Amazon’s Alexa, developers have been looking to make the generative AI conversational experience more human-like.

ChatGPT featured audio even before voice mode was released, through its Read Aloud function. But the goal of Advanced Voice Mode is to give users a more human-like conversational experience, a concept other AI developers are also looking to emulate.

Hume AI, a startup founded by former Google DeepMind researcher Alan Cowen, has released the second version of its Empathic Voice Interface, a human-like voice assistant that can detect emotion in a person’s voice and is available to developers through a dedicated API.

In July, French AI company Kyutai released Moshi, an open-source AI voice assistant.

Google has also added voices to its Gemini chatbot via Gemini Live as it aims to catch up with OpenAI. Reuters reported that Meta is developing voices resembling those of popular actors for its Meta AI platform.

By making AI voices available to paying users across its platform, OpenAI says it is bringing the technology into the hands of a far wider audience than other companies have so far.

A launch following delays and controversy

But the idea of AI voices speaking in real time and responding with the appropriate emotion hasn’t always been well-received.

OpenAI’s attempt to add voice to ChatGPT was initially controversial. At its May event announcing GPT-4o and its voice mode, people noticed similarities between one of the voices, Sky, and that of actress Scarlett Johansson.

It didn’t help that OpenAI CEO Sam Altman took to social media to post the word “her,” referencing the film in which Johansson voices an AI assistant. The controversy sparked concerns about AI developers impersonating the voices of famous people.

The company denied the allegation that the voice referenced Johansson and said it had no intention of hiring actors who sounded like famous people.

The company said users are limited to the nine voices provided by OpenAI. It also said it evaluated the model’s safety before launch.

“We tested our model’s voice capabilities with external red team members who spoke a total of 45 different languages and represented 29 different geographies,” the company told reporters.

However, OpenAI delayed the launch of Advanced Voice Mode from the originally planned late June date to “late July or early August,” and even then opened it only to a first group of users selected by OpenAI, such as Ethan Mollick, a professor at the Wharton School of Business at the University of Pennsylvania. The company said the extra time was needed to continue safety testing and red-teaming the voice mode to prevent it from being used for possible fraud and abuse.

Clearly, the company now feels it has done enough to release the mode more broadly, which is consistent with OpenAI’s generally more cautious approach of late: working hand in hand with the US and UK governments and giving them the opportunity to preview new models, such as the o1 series, before launch.