Translation tech helps business work with a world of words

Researchers in the United States say they could help bring about a tenfold increase in the number of languages that work with automatic speech recognition

Researchers at Carnegie Mellon University are working on expanding the number of languages that benefit from modern language technologies such as voice-to-text transcription, automatic captioning, instantaneous translation and voice recognition. 

Currently, only a small fraction of the 7,000 to 8,000 languages spoken worldwide have access to these tools, with only around 200 languages having automatic speech recognition capabilities. The team aims to increase this number to potentially 2,000 languages.

"A lot of people in this world speak diverse languages, but language technology tools aren't being developed for all of them," says Xinjian Li, a PhD student at the School of Computer Science's Language Technologies Institute (LTI). "Developing technology and a good language model for all people is one of the goals of this research."

Li is a member of a research team that is working to make it easier to create speech recognition models by simplifying the data requirements. The team, which includes faculty members Shinji Watanabe, Florian Metze, David Mortensen and Alan Black, recently presented their latest work, called ASR2K: Speech Recognition for Approximately 2,000 Languages Without Audio, at the Interspeech 2022 conference in South Korea.

Linguistic elements shared across languages

To create a speech recognition model, most existing technologies require two types of data: text and audio. While text data is readily available for thousands of languages, audio data is not. The research team aims to eliminate the need for audio data by focusing on linguistic elements that are common across many languages.

Traditionally, speech recognition models focus on a language's phonemes: the distinct sounds that distinguish one word from another, such as the "d" in "dog" that differentiates it from "log" and "cog". Languages also have phones, which describe how a word actually sounds when spoken. Multiple phones might correspond to a single phoneme, which means that even though different languages may have different phonemes, the underlying phones they draw on can be the same.

To address this, the LTI team is developing a speech recognition model that moves away from phonemes and instead relies on information about how phones are shared between languages, thereby reducing the effort needed to build separate models for each language.
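The idea can be illustrated with a short Python sketch. Everything below is hypothetical and purely illustrative, not the team's actual ASR2K system or data: the phone symbols, phoneme labels and per-language mapping tables are invented simply to show the many-to-one relationship between phones and phonemes described above.

# Illustrative sketch only (hypothetical data): the output of a single shared
# phone recogniser is mapped onto language-specific phonemes, so the
# per-language part is just a lookup table rather than a full acoustic model.

# Hypothetical phone-to-phoneme tables. In English, the aspirated phone [tʰ]
# and the plain phone [t] are variants of one phoneme /t/; in Hindi they would
# map to two distinct phonemes.
PHONE_TO_PHONEME = {
    "english": {"t": "/t/", "tʰ": "/t/", "d": "/d/", "ɒ": "/ɒ/", "g": "/g/"},
    "hindi": {"t": "/t/", "tʰ": "/tʰ/", "d": "/d/"},
}


def phones_to_phonemes(phones, language):
    """Map a sequence of recognised phones to the phonemes of one language.

    Unknown phones are passed through unchanged to keep the sketch simple;
    a real system would need a learned or rule-based fallback.
    """
    table = PHONE_TO_PHONEME[language]
    return [table.get(p, p) for p in phones]


# The same recognised phone sequence, interpreted for different languages.
recognised = ["d", "ɒ", "g"]                      # output of the shared phone recogniser
print(phones_to_phonemes(recognised, "english"))  # ['/d/', '/ɒ/', '/g/']
print(phones_to_phonemes(["tʰ"], "english"))      # ['/t/']  (same phoneme as plain [t])
print(phones_to_phonemes(["tʰ"], "hindi"))        # ['/tʰ/'] (a distinct phoneme)

Because the phone recogniser itself can be shared across languages, only the lightweight mapping layer changes from one language to the next, which is what, in principle, removes the need to collect audio for every new language.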

"We are trying to remove this audio data requirement, which helps us move from 100 or 200 languages to 2,000," says Li. "This is the first research to target such a large number of languages, and we're the first team aiming to expand language tools to this scope."

The researchers say the work has so far improved existing language tools by just five per cent, but they hope it will inspire both their own future work and that of other researchers.

"Each language is a very important factor in its culture. Each language has its own story, and if you don't try to preserve languages, those stories might be lost," says Li. "Developing this kind of speech recognition system and this tool is a step to try to preserve those languages."
