Speech recognition and text-to-speech models must be trained on thousands of hours of audio, but that much data does not exist for languages that are not widely spoken in industrialized countries or are on the verge of extinction. Meta therefore took an unusual approach: it trained its model on audio recordings of religious texts.
"We turned to religious texts, such as the Bible, that have been translated into many different languages and whose translations have been studied extensively in text-based language translation research," the company announced. "There are publicly available audio recordings of these translations being read in different languages."
By incorporating audio recordings of the Bible and similar texts, the Meta researchers increased the number of languages fed into the model to more than 4,000.
According to Meta, however, the model is not biased toward producing religious-sounding text, because its developers used a different, more constrained approach than the one behind hallucination-prone systems such as ChatGPT. The company also says that although most of the religious recordings are read by men, the model handles women's voices just as well.
Meta says in its announcement that it expects the tool to help reverse a negative trend, accelerated by big tech companies, that is leading to the disappearance of less widely used languages. Popular software used around the world supports only 100 languages or fewer, so people use their native languages less than ever before. Meta expects that technologies such as speech recognition and VR/AR will enable everyone to speak and learn in their native language.