October 20, 2020

Facebook has introduced M2M-100, which it describes as the first multilingual machine translation (MMT) model that can translate between any pair of 100 languages without relying on English-language data. The model, released as open source, will “improve the quality of translations for billions of people,” Facebook promises.

A model based on 15 billion parameters

Currently, most multilingual translation models rely on English-language data to perform their translations. For example, to go from Chinese to French, most tools first translate from Chinese to English and then from English to French. Because any error or ambiguity introduced in the first step carries over into the second, this process often produces unreliable translations.
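The difference between the two routes can be illustrated with a small sketch. The lookup tables below are hypothetical toy data standing in for translation models, not anything taken from M2M-100. They use one well-known case: Chinese distinguishes informal 你 from formal 您, French distinguishes tu from vous, but English has only "you", so the distinction is lost at the pivot.

```python
# Toy lookup tables standing in for translation models (hypothetical data,
# chosen only to illustrate information loss at the English pivot).
ZH_TO_EN = {"你": "you", "您": "you"}    # English collapses informal/formal
EN_TO_FR = {"you": "tu"}                 # the second hop has to pick one form
ZH_TO_FR = {"你": "tu", "您": "vous"}    # a direct model can keep the distinction


def translate_via_pivot(word: str) -> str:
    """Chinese -> English -> French: two hops through the pivot language."""
    return EN_TO_FR[ZH_TO_EN[word]]


def translate_direct(word: str) -> str:
    """Chinese -> French in a single hop."""
    return ZH_TO_FR[word]


if __name__ == "__main__":
    for word in ("你", "您"):
        print(word, "pivot:", translate_via_pivot(word),
              "direct:", translate_direct(word))
```

In this toy setup the pivot route renders both Chinese forms identically, while the direct route keeps them apart.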

Facebook’s artificial intelligence research division has been working to change that with M2M-100, a multilingual model that can translate between any pair of 100 languages, going directly from one language to another without passing through English. The result, according to Facebook: translations that are considerably more accurate than those of earlier models.

To achieve this result, the researchers behind the project spent several years collecting 7.5 billion sentences, together with their translations in different languages, to train their model, which now has 15 billion parameters. This long and tedious task started with CommonCrawl (through the CCAligned and CCMatrix datasets) to gather sample text from the web. fastText, a text classification system that Facebook developed a few years ago, then identified the language in which each text was written.

Finally, the LASER mining tool automatically located sentences that are translations of one another in different languages. Angela Fan, research fellow for this project, explains: “[LASER] reads the sentences, takes the texts and creates a mathematical representation of them so that the sentences that have the same meaning correspond to the same thought”.
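The idea Fan describes, matching sentences whose mathematical representations are close, can be sketched as follows. Everything here is an assumption made for illustration: the three-dimensional vectors are hand-written stand-ins for real sentence embeddings, `mine_pairs` is a hypothetical helper, and plain cosine similarity with a fixed threshold is a simplification that is not claimed to be the scoring LASER actually uses.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


# Hypothetical toy "embeddings": sentences with the same meaning are given
# nearby vectors, which is what a multilingual encoder is trained to produce.
FRENCH = {
    "Le chat dort.": (0.90, 0.10, 0.00),
    "Il pleut.": (0.00, 0.20, 0.90),
}
CHINESE = {
    "猫在睡觉。": (0.88, 0.15, 0.05),   # "The cat is sleeping."
    "下雨了。": (0.05, 0.10, 0.95),     # "It is raining."
    "我喜欢茶。": (0.10, 0.90, 0.10),   # "I like tea." (no French match)
}


def mine_pairs(source, target, threshold=0.9):
    """Pair each source sentence with its nearest target sentence,
    keeping the pair only if the similarity reaches the threshold."""
    pairs = []
    for src_text, src_vec in source.items():
        best_text, best_score = max(
            ((tgt_text, cosine(src_vec, tgt_vec))
             for tgt_text, tgt_vec in target.items()),
            key=lambda item: item[1],
        )
        if best_score >= threshold:
            pairs.append((src_text, best_text))
    return pairs


if __name__ == "__main__":
    for src, tgt in mine_pairs(FRENCH, CHINESE):
        print(src, "<->", tgt)
```

The threshold keeps unrelated sentences out of the mined set: a sentence with no counterpart, such as the third Chinese one here, is never selected as anyone's nearest neighbour.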

M2M-100 goes beyond instant translations on Facebook

For Facebook, the use case for M2M-100 is obvious. With this model, Mark Zuckerberg’s social network can significantly improve the quality of the instant translations available on its platform and make posts easier for users to read. As a reminder, 20 billion translations are carried out on Facebook every day. Before it gets there, however, the multilingual machine translation model still has to pass several conformity tests.