OpenAI’s delay in releasing ChatGPT’s impressive voice mode upset many AI chatbot fans, and a rival may now have beaten the company to it. French AI developer Kyutai has unveiled a real-time voice assistant called Moshi.
Moshi is designed to hold lifelike voice conversations with users, much like Alexa or Google Assistant, but it is powered by the kind of large language model that underpins ChatGPT and its rivals, in this case the Helium 7B model. According to Kyutai, Moshi can speak in multiple accents and has 70 different speech styles and emotions. The AI can even handle two audio streams at once, allowing Moshi to listen and speak simultaneously.
Kyutai’s development of Moshi involved fine-tuning the model on more than 100,000 synthetic dialogues created using text-to-speech (TTS) technology. The goal was to teach Moshi the nuances and tones of human communication. The company even collaborated with a professional voice artist to improve Moshi’s voice quality.
The assistant was trained jointly on text and audio and is optimized for multiple backends, meaning it can run locally on devices like laptops without needing to connect to the cloud. Kyutai presents this as a way to maintain privacy and security by keeping sensitive data from being transmitted over the internet. You can see a demo of Moshi here.
Open talk
Kyutai announced that Moshi will be an open-source project, with its code and model framework released to provide a foundation for future innovations. The open-source approach may also help mitigate the complaints larger AI companies face about the security and ethics of their closed models. Kyutai’s backers, including French billionaire Xavier Niel, are pushing for the open-source approach.
Kyutai is also working on AI-powered audio identification, watermarking, and signature tracking systems that will be incorporated into Moshi. These features will help identify AI-generated audio, promoting accountability and traceability, while ensuring that AI-generated content can be monitored and verified.
Moshi is still in development, but the voice mode shown in the presentation is impressive. If Moshi catches on, it may act as a catalyst for voice-enabled versions of ChatGPT and its rivals, or accelerate the incorporation of LLMs into Alexa and other voice assistants.
If you want to try Moshi, a demo is available online, where you can also sign up for early access to the full chatbot.