Date: 2020-10-20

Facebook has developed the first multilingual machine translation model that can translate between any pair of 100 languages without using English as an intermediary. The AI system is called M2M-100.

According to Facebook research assistant Angela Fan, this is an important step towards a universal model that understands all languages across different tasks. The company has not yet said when the model will be deployed. For now, the technology remains a research project.

How the study was conducted

First, the research team collected 7.5 billion sentence pairs in 100 languages from the internet, giving priority to the language pairs for which users most often request translations.

Next, the languages were divided into 14 groups based on linguistic, geographical and cultural similarities. One such group, for example, includes languages widely spoken in India, such as Hindi, Bengali and Marathi. To connect the groups to one another, the team designated a small number of bridge languages in each group.

In the case of the Indian languages, Hindi, Bengali and Tamil served as bridge languages for the Indo-Aryan languages. With this technique, the company says it outperformed English-centric systems by 10 points on BLEU, a metric for evaluating machine translation, reaching a score of 20.1.
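The grouping-and-bridge idea can be illustrated with a minimal Python sketch. It assumes that every language pair inside a group is collected, while pairs across groups are collected only between bridge languages. That rule, the second group with its bridges, and the function name `pairs_to_mine` are illustrative assumptions, not Facebook's actual configuration.

```python
from itertools import combinations

# Toy setup. Only the first group's languages and bridges come from the
# article; the second group and its bridges are hypothetical.
GROUPS = {
    "indian": ["Hindi", "Bengali", "Marathi", "Tamil"],
    "romance": ["French", "Spanish", "Italian"],  # hypothetical group
}
BRIDGES = {
    "indian": ["Hindi", "Bengali", "Tamil"],
    "romance": ["French", "Spanish"],  # hypothetical bridges
}


def pairs_to_mine(groups, bridges):
    """Return the unordered language pairs to collect data for."""
    pairs = set()
    # Every pair of languages inside the same group.
    for members in groups.values():
        pairs.update(frozenset(p) for p in combinations(members, 2))
    # Across groups, only bridge-to-bridge pairs.
    for g1, g2 in combinations(bridges, 2):
        for a in bridges[g1]:
            for b in bridges[g2]:
                pairs.add(frozenset((a, b)))
    return pairs


selected = pairs_to_mine(GROUPS, BRIDGES)
all_langs = [lang for members in GROUPS.values() for lang in members]
total = len(list(combinations(all_langs, 2)))
print(f"{len(selected)} of {total} possible pairs selected")
```

In this toy setup, Hindi-French is kept because both are bridge languages, while Marathi-Italian is skipped. The saving grows quickly as the number of languages increases.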

“When translating, say, from Chinese to French, most English-centric multilingual models train from Chinese to English and from English to French, because English training data is widely available,” explained Angela Fan. “Our model trains directly on Chinese to French data to better preserve meaning.”
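The difference Fan describes can be sketched in Python as a question of which translation directions have training data. The function `translation_path` and the language codes are hypothetical. The sketch only counts translation steps and does not model translation quality.

```python
def translation_path(src, tgt, trained_directions, pivot="en"):
    """Return the sequence of languages a sentence passes through.

    trained_directions is a set of (source, target) pairs with training data.
    """
    if (src, tgt) in trained_directions:
        return [src, tgt]  # direct translation, no intermediary
    if (src, pivot) in trained_directions and (pivot, tgt) in trained_directions:
        return [src, pivot, tgt]  # two steps through the pivot language
    raise ValueError(f"no route from {src} to {tgt}")


# English-centric data: every direction involves English.
english_centric = {("zh", "en"), ("en", "zh"), ("fr", "en"), ("en", "fr")}
# Many-to-many data additionally covers Chinese-French directly.
many_to_many = english_centric | {("zh", "fr"), ("fr", "zh")}

print(translation_path("zh", "fr", english_centric))  # ['zh', 'en', 'fr']
print(translation_path("zh", "fr", many_to_many))     # ['zh', 'fr']
```

Each extra step through a pivot language is another chance for meaning to be lost, which is the motivation for training on direct pairs.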

Although the model has not yet been incorporated into Facebook, where users post content in more than 160 languages, the team's tests indicate that it can support a wide variety of translation directions.



