Google has announced a new artificial intelligence project called the 1,000 Languages Initiative, which aims to have a single AI language model that can support the 1,000 most spoken languages on Earth.
It is a step forward from Google’s current language model, which supports 400 languages, and is aimed at preserving some of the languages that are spoken by fewer than one million people and may be in danger of extinction in the next few generations.
It also opens up “low-resource” languages, which do not have much native representation on the web, to a wider audience.
Google has a clear route to making this technology available to others, as it already operates the world’s most used translation app, Google Translate. Translate has also been integrated into Chrome and other Google services, allowing users to translate webpages from English and other languages into their own native language.
A single large AI language model is seen as the way forward, instead of building separate models for each language. As the model integrates more languages, it should in theory be able to learn them more quickly and potentially fill some of the gaps in its coverage of a language that may come from underrepresentation.
“By having a single model that is exposed to and trained on many different languages, we get much better performance on our low resource languages,” said Zoubin Ghahramani, vice president of research at Google AI, to The Verge. “The way we get to 1,000 languages is not by building 1,000 different models. Languages are like organisms, they’ve evolved from one another and they have certain similarities. And we can find some pretty spectacular advances in what we call zero-shot learning when we incorporate data from a new language into our 1,000 language model and get the ability to translate from a high-resource language to a low-resource language.”
Artificial intelligence has been used for many language tasks in the past, including as a resource for reviving dead languages. MIT’s Computer Science and Artificial Intelligence Laboratory developed a model that could decipher a lost language without knowing its relation to other languages, and Google’s DeepMind developed a system that could recognize patterns on ancient stone inscriptions and translate them.
“One of the really interesting things about large AI language models and language research in general is that they can do lots and lots of different tasks,” said Ghahramani. “The same language model can turn commands for a robot into code; it can solve maths problems; it can do translation. The really interesting things about language models is they’re becoming repositories of a lot of knowledge, and by probing them in different ways you can get to different bits of useful functionality.”
Google has made a flurry of AI announcements, including a public beta of its Imagen text-to-image model, which currently only allows people to build isometric cities and monsters. It also shared a text-to-video model and an AI writing assistant.