
Meta develops AI translator for unwritten languages
Meta developed an AI system to translate the unwritten Hokkien language into English.
Hokkien is spoken across Southeast Asia and has about 49 million native speakers. It has no single standard writing system, and it is one of around 3,500 such languages worldwide.
To train AI to understand human speech, researchers typically feed the computer a large corpus of written transcripts. For Hokkien, which lacks a standard written form, assembling such a dataset is problematic.
Meta researchers focused on developing a speech-to-speech system. According to company representatives, they converted speech samples into a sequence of acoustic sounds that were used to create the language’s waveforms.
These signals were then combined with Mandarin Chinese to create pseudo-tokens. Meta described Mandarin as a “related language” for Hokkien.
“We first translated English (or Hokkien) speech into Mandarin text, and then translated back into Hokkien (or English) and added it to the training data,” Meta CEO Mark Zuckerberg said.
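The pivot step described in the quote can be illustrated with a minimal sketch. This is not Meta's code: the function names, the `SpeechPair` type, and the toy stand-in translators are all hypothetical, and a real system would call speech recognition, translation, and speech synthesis models where this example only formats strings.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class SpeechPair:
    """One training example: an original clip and its generated translation."""
    source_audio: str  # identifier of the original recording
    target_audio: str  # identifier of the generated target-language speech


def build_pivot_pairs(
    recordings: List[str],
    to_mandarin_text: Callable[[str], str],
    from_mandarin_text: Callable[[str], str],
) -> List[SpeechPair]:
    """Route each recording through Mandarin text and collect the pairs."""
    pairs = []
    for audio in recordings:
        mandarin = to_mandarin_text(audio)       # speech -> Mandarin text
        translated = from_mandarin_text(mandarin)  # Mandarin text -> target speech
        pairs.append(SpeechPair(audio, translated))
    return pairs


# Hypothetical stand-ins for real models; they only tag their input.
def fake_to_mandarin(audio: str) -> str:
    return f"zh_text({audio})"


def fake_to_hokkien(text: str) -> str:
    return f"hokkien_speech({text})"


training_data = build_pivot_pairs(
    ["en_clip_001.wav", "en_clip_002.wav"],
    fake_to_mandarin,
    fake_to_hokkien,
)
```

Each resulting pair links an English clip to Hokkien speech produced via Mandarin text; swapping the two stand-ins gives the Hokkien-to-English direction.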
The system is still under development, as the AI can translate only one sentence at a time. Still, Zuckerberg is confident the technology can be applied to other similar languages.
The company has released the project’s source code so that other researchers can use it in their work.
Meta also released a speech-to-speech matrix, described as “a large collection of speech-to-speech transformations developed with an innovative set of natural language processing tools”.
In July, the company introduced the AI model NLLB-200 for online translations. The algorithm supports 200 languages, including less widely spoken languages.
In September, Meta developed a “brain AI decoder” for turning thought into speech. Its accuracy reached 73% when using a set of 793 words.