AI voices generally aim to sound realistic in a friendly way, imitating a relaxed, cheerful, helpful tone. But a new open source model called Dia leans toward the more emotional end of the vocal spectrum, including some really intense screams.
Dia's creators at Nari Labs are a small group, but they have given AI voices the option to sound like a somewhat melodramatic performer, capable of realistic laughing, coughing, throat clearing, sniffing and, yes, shouting.
You may not think that shouting is a big problem for AI right now, but shouting is difficult to fake. It isn't just talking loudly; it is a completely different mode of speech.
Emotionally expressive speech is a gap in most AI voices. It is easy for a voice model to read a bedtime story. It is much harder for one to sound as if it were trying to calm a friend, or as if it had just seen something shocking. Most commercial models avoid sounding robotic by smoothing out the tone of voice, which leaves no room for the kind of uneven audio that emotional speech requires.
Dia treats nonverbal communication as part of the performance. It knows that “(cough)” is not something to be ignored or read out literally. It knows that a scream is not just a louder line. And it performs these things with a level of timing, pitch modulation and breath control that makes them feel more real.
An enterprising user even used it to recreate a bit of the famous Leeroy Jenkins sketch from World of Warcraft.
That does not mean that OpenAI, ElevenLabs, Google, Sesame and others have not produced incredible AI voice models. You can customize OpenAI's Advanced Voice Mode to speak with different emotions, and ElevenLabs is good at interpreting capitalization and punctuation to adjust speech, but that is not the same as shouting in surprise or bursting into laughter.
Sesame is particularly good at sounding and reacting like a real person, but even its models err on the side of happy and generally positive behaviors.
Of course, realism is subjective, and you could work out fairly quickly that Dia is an AI voice. On the other hand, fake screams and laughs are also human sounds in the right context.
Two university students. One still in the army. Zero funding. One ridiculous goal: build a TTS model that rivals NotebookLM Podcast, ElevenLabs Studio and Sesame CSM. Somehow … we did it. Here's how 👇 pic.twitter.com/8cfjsegcix April 21, 2025
Shout for AI
What makes this a bigger story than “AI voice learns a party trick” is what it signals for the wider race in AI toward emotional intelligence.
We are quickly entering an era in which it will not be enough for your assistant to say the right thing; it will have to say it in the right way. Think of customer service bots that sound genuine, teachers that sound encouraging rather than merely instructive, and in-game characters that convey sincerity.
Of course, giving AI the power to emote makes it more persuasive and, therefore, potentially more manipulative. If emotional speech becomes just another AI tool, then more than a few people may want to shout.
Even so, I imagine it would be fun to write a ghost story that Dia doesn't just read but performs, shouts and all.