Meta AI researchers announced Thursday that they have developed a new set of artificial intelligence models called Seamless Communication that aims to enable more natural and authentic communication between languages, essentially making the concept of a universal voice translator a reality. The models were published this week along with accompanying research papers and data.
The flagship model, called Seamless, combines capabilities from three other models (SeamlessExpressive, SeamlessStreaming and SeamlessM4T v2) into a unified system. According to the research paper, Seamless is “the first publicly available system that unlocks expressive communication between languages in real time.”
How Seamless works as a universal real-time translator
The Seamless translator represents a new frontier in the use of AI for global communication. It combines three sophisticated neural network models to enable real-time translation between more than 100 spoken and written languages, while preserving the vocal style, emotion and prosody of the speaker’s voice.
SeamlessExpressive focuses on preserving the vocal style and emotional nuances of the speaker’s voice when translating between languages. As the paper describes, “translations must capture the nuances of human expression. While existing translation tools are adept at capturing the content of a conversation, they typically rely on robotic, monotonous text-to-speech systems for their production.”
SeamlessStreaming enables near real-time translation with about two seconds of latency. The researchers say it is the “first massively multilingual model” to offer such fast translation across nearly 100 spoken and written languages.
The third model, SeamlessM4T v2, serves as the foundation for the other two models. It is an improved version of the original SeamlessM4T model released last year. The new architecture offers “improved consistency between text and voice output,” according to the paper.
“In summary, Seamless gives us a fundamental insight into the technical foundation needed to turn the Universal Speech Translator from a science fiction concept to a real-world technology,” the researchers wrote.
Potential to transform global communication
The models’ capabilities could enable new voice-based communication experiences, from real-time multilingual conversations using smart glasses to automatically dubbed videos and podcasts. Researchers suggest it could also help break down language barriers for immigrants and others who struggle with communication.
“By making our work public, we hope that researchers and developers can expand the impact of our contributions by building technologies aimed at bridging multilingual connections in an increasingly interconnected and interdependent world,” the paper states.
However, the researchers acknowledge that the technology could also be misused for voice phishing scams, deepfakes and other harmful applications. To promote safe and responsible use of the models, they implemented several measures, including audio watermarking and new techniques to reduce hallucinated toxic outputs.
Models posted publicly on Hugging Face
In line with Meta’s commitment to open research and collaboration, the Seamless Communication models have been released publicly on Hugging Face and GitHub.
The collection includes the Seamless, SeamlessExpressive, SeamlessStreaming and SeamlessM4T v2 models along with their accompanying metadata.
By making these next-generation natural language processing models freely available, Meta hopes to enable other researchers and developers to build on and extend this work to help connect people across languages and cultures. The release underscores Meta’s leadership in open source AI and provides a valuable new resource for the research community.
“Overall, the multidimensional experiences that Seamless can generate could lead to a radical change in the way machine-assisted interlingual communication is achieved,” the researchers concluded.