Google on Monday gave a brief update on its ambitious AI language model intended to support 1,000 different languages.
In November 2022, Google announced a model that would support the world’s one thousand most-spoken languages, bringing greater inclusion to billions of people around the globe. Elaborating on its progress so far, Google said that its Universal Speech Model (USM), a family of state-of-the-art speech models, has 2 billion parameters trained on 12 million hours of speech and 28 billion sentences of text, spanning 300+ languages.
USM, which is used in YouTube (e.g., for closed captions), can perform automatic speech recognition (ASR) not only on widely spoken languages like English and Mandarin, but also on under-resourced languages such as Amharic, Cebuano, Assamese, and Azerbaijani.
Google has identified two key challenges in ASR: conventional supervised learning approaches do not scale, and models must improve in a computationally efficient manner as the company expands language coverage and quality.
“The development of USM is a critical effort towards realising Google’s mission to organize the world’s information and make it universally accessible. We believe USM’s base model architecture and training pipeline comprise a foundation on which we can build to expand speech modeling to the next 1,000 languages,” Google wrote in its blog post.