Fri Jan 17 2025

Meta launches SeamlessM4T, a sophisticated artificial intelligence model that translates and transcribes voice and text in 101 languages.

Farewell to the Tower of Babel. Meta's artificial intelligence can handle 101 languages and produces synthesized voice translations in 36 different languages.

Meta has developed SeamlessM4T, an artificial intelligence model that translates and transcribes voice and text in 101 languages. This technology, which could bring us closer to the universal translator imagined in "The Hitchhiker's Guide to the Galaxy," aims to solve the challenges of communication across multiple languages.

Meta's new model stands out for its ability to provide instant translations from voice to voice and from text to voice, while maintaining the expression and tone of the speakers. SeamlessM4T, whose name stands for "Massively Multilingual and Multimodal Machine Translation," integrates multiple functions into a single system, offering an improvement in translation accuracy ranging from 8% to 23%. Its robustness to background noise and variations in speech has also improved by 50%.

Researchers have paid special attention to factors such as the toxicity of translations and gender biases, implementing strategies to mitigate these issues during the training and inference of the model. Marta Costa-Jussà, who leads this project in Meta's artificial intelligence division, mentions that the model was trained using one million hours of natural conversational audio, which expands its translation capabilities to languages that were not specifically included in its training.

Meta has decided to make the model and its data available to the public for non-commercial purposes, thereby encouraging research and development in the field of voice translation. However, SeamlessM4T still faces notable challenges, especially in contexts such as medicine and law, where accuracy is crucial. The translation of proper names and colloquial expressions, along with the handling of accents and gender bias, still requires refinement.

Machine translation has seen significant advancements in recent decades, largely due to the use of neural networks supported by large volumes of data. However, the scarcity of data for less widely spoken languages limits the impact of machine translation. Despite this, the Meta team has leveraged its experience with previous projects, such as No Language Left Behind, to expand its translation capabilities to languages with less representation on the Internet.

To train SeamlessM4T, millions of hours of speech and their respective translations were collected, using various sources, including the United Nations. This allowed for the automatic association of audio and text segments across different languages, thereby improving the system's effectiveness.
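
The article does not describe how audio and text segments were matched, so the following is only a minimal sketch of one common approach to this kind of pairing, not Meta's implementation. It assumes each segment has already been converted into a vector in a shared space, so that segments with the same meaning lie close together. The function names, the toy vectors, and the 0.8 threshold are all hypothetical and chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mine_pairs(audio_segments, text_segments, threshold=0.8):
    """Pair each audio segment with its most similar text segment.

    A pair is kept only if its similarity reaches the threshold
    (an arbitrary illustrative value, not one taken from the article).
    """
    pairs = []
    for audio_id, audio_vec in audio_segments.items():
        best_id, best_score = None, threshold
        for text_id, text_vec in text_segments.items():
            score = cosine(audio_vec, text_vec)
            if score >= best_score:
                best_id, best_score = text_id, score
        if best_id is not None:
            pairs.append((audio_id, best_id, round(best_score, 3)))
    return pairs


# Toy, hand-written embeddings (hypothetical): real systems would obtain
# these vectors from trained speech and text encoders.
audio_segments = {
    "audio_en_1": [0.9, 0.1, 0.0],
    "audio_en_2": [0.0, 0.2, 0.9],
    "audio_en_3": [0.5, 0.5, 0.5],  # no close match, so it is discarded
}
text_segments = {
    "text_fr_1": [1.0, 0.0, 0.0],
    "text_fr_2": [0.0, 0.0, 1.0],
}

pairs = mine_pairs(audio_segments, text_segments)
for audio_id, text_id, score in pairs:
    print(audio_id, "<->", text_id, score)
```

In this toy example the first two audio segments are paired with a text segment, while the third falls below the threshold and is left unpaired. Discarding weak matches in this way is what keeps automatically collected pairs usable as training data.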