ChatGPT received a cool new Advanced Voice Mode earlier this week, and while it's only rolling out to a small subset of paid subscribers for now (in alpha testing), we've now been able to see several samples of the feature in action.
Clips of the feature are popping up all over the internet, on sites like YouTube and X, as the lucky ChatGPT Plus users who have access put it to use on a variety of tasks. As The Verge reports, these range from requests to sing a song in a certain style or imitate accents to tackling the nuances of correct pronunciation in other languages.
If you recall, this feature was revealed as part of the GPT-4o release a few months back. Advanced Voice Mode was then delayed, apparently over concerns that its security needed to be beefed up, but it's here now and, as mentioned, already producing some impressive results.
For example, The Verge highlights a YouTube clip in which ChatGPT gives a user a lesson on pronouncing French words, and the AI proves quite helpful.
Here's another example: a request to sing “Happy Birthday” in a “soulful blues” style. Or how about ChatGPT telling some jokes in different voices (shy, angry)?
ChatGPT Advanced Voice Mode counting as fast as he can to 10, then to 50 (this blew my mind – he stopped to catch his breath like a human would) pic.twitter.com/oZMCPO5RPh (July 31, 2024)
Finally, check out the X posts above and below, in which ChatGPT's Advanced Voice Mode counts quickly and then takes on US regional accents.
ChatGPT's advanced voice mode tests various US regional accents pic.twitter.com/UvDeQUNHLp (July 31, 2024)
If you're interested in getting involved, OpenAI has told us that all ChatGPT Plus subscribers will receive the advanced voice mode later this year. The full rollout should be completed by “late fall,” so in theory, everyone should have it by the time December rolls around.
Analysis: 50 shades of cool
If you've watched the demos above, they're great, aren't they? If you haven't, they're well worth a look.
A lot of attention to detail has gone into making Advanced Voice Mode feel more human and real. Notice the self-imposed, artificial struggle when it counts to 50 very quickly, including a pause for breath, which is a really neat touch.
Or take the blues singing excursion, which isn't just about the singing itself (which is very well done, to be sure), but also the detailed explanation of how a singer might approach the song, and the natural style and delivery of the AI's voice here and elsewhere. These interactions take AI to new heights of realism, even if there are still some details to be worked out.
On the latter note, we weren't all that impressed with the American accents, though it was a difficult task and they improved a bit when the user asked ChatGPT to emphasize them more. And while the AI's responses are generally very quick, concise and fluent, you'll notice the occasional moment of silence or confusion if you watch a number of these clips online.
However, remember that the advanced voice mode is still in its alpha phase, and given that, it's really impressive – surprisingly good in some scenarios. This might be one of the areas where AI is advancing so fast it's scary…