DeepMind says its new language model can beat others 25 times its size

Called RETRO (for “Retrieval-Enhanced Transformer”), the AI consults an external database of text passages while it generates new sentences. With that help, it matches the performance of neural networks 25 times its size, cutting the time and cost needed to train very large models. The researchers also claim that the database makes it easier to analyze what the AI has learned, which could help with filtering out bias and toxic language.

“Being able to look things up on the fly instead of having to memorize everything can often be useful, as it is for humans,” says Jack Rae at DeepMind, who leads the firm’s language research.
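The idea of looking things up instead of memorizing can be illustrated with a toy retrieval step. The sketch below is only an illustration under loose assumptions. It scores each stored passage by how many words it shares with a query and returns the best match. RETRO itself does not work this way: it compares learned neural representations across a far larger database. The functions `overlap_score` and `retrieve` and the three sample passages are hypothetical.

```python
from collections import Counter
from typing import List


def overlap_score(query: str, passage: str) -> int:
    """Toy similarity (an assumption, not RETRO's method): count shared words."""
    q = Counter(query.lower().split())
    p = Counter(passage.lower().split())
    # Counter intersection keeps the minimum count of each shared word.
    return sum((q & p).values())


def retrieve(query: str, database: List[str], k: int = 1) -> List[str]:
    """Return the k passages with the highest word-overlap score."""
    ranked = sorted(database, key=lambda passage: overlap_score(query, passage), reverse=True)
    return ranked[:k]


# Hypothetical mini "database" of passages the model can consult.
database = [
    "The Eiffel Tower is in Paris.",
    "Swahili is spoken widely in East Africa.",
    "GitHub hosts source code repositories.",
]

print(retrieve("where is the eiffel tower", database))
```

In this toy version the model does not need to have memorized where the Eiffel Tower is. It only needs to find the passage that states it.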

Language models generate text by predicting what words come next in a sentence or conversation. The larger a model, the more information about the world it can learn during training, which makes its predictions better. GPT-3 has 175 billion parameters—the values in a neural network that store data and get adjusted as the model learns. Microsoft’s Megatron-Turing language model has 530 billion parameters. But large models also take vast amounts of computing power to train, putting them out of reach of all but the richest organizations.
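Predicting the next word can be illustrated with a toy model that simply counts which word most often follows each word in its training text. This is only a sketch to show the idea of next-word prediction. Models such as GPT-3 and RETRO do not store counts like these. Instead, they learn billions of parameters in a neural network. The names `train_bigrams` and `predict_next` and the sample sentence are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional


def train_bigrams(text: str) -> Dict[str, Counter]:
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    counts: Dict[str, Counter] = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def predict_next(counts: Dict[str, Counter], word: str) -> Optional[str]:
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


model = train_bigrams("the cat sat on the mat the cat ran")
print(predict_next(model, "the"))
```

Training on more text gives the counts more to draw on, which loosely mirrors the article's point that larger models trained on more data make better predictions.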

With RETRO, DeepMind has tried to cut the costs of training without cutting how much the AI learns. The researchers trained the model on a vast data set of news articles, Wikipedia pages, books, and text from GitHub, an online code-hosting platform. The data set contains text in 10 languages, including English, Spanish, German, French, Russian, Chinese, Swahili, and Urdu.