AI is already training itself, claim MIT and Google experts

February 09, 2023

In-context learning means large AI language models can accomplish a task after seeing only a few examples, essentially training themselves as they go along

Scientists from MIT, Google Research, and Stanford University are working to unravel the mystery known as in-context learning, in which a large language model learns to accomplish a task after seeing just a handful of examples.

Researchers have found that large neural network models, such as GPT-3, can contain smaller, simpler linear models within them. The larger model can then run a simple learning algorithm to train this linear model for a new task, using only information it already holds. In effect, the model simulates and trains a smaller version of itself.

GPT-3 is a large language model that can generate human-like text, ranging from poetry to code. Trained on vast amounts of internet data, it takes a small piece of input text and predicts what is likely to come next.

However, these models have even more potential. Researchers are investigating a phenomenon called in-context learning, where a large language model can learn to perform a new task after seeing only a few examples, without being specifically trained for that task. For example, if the model is provided with a few sentences and their sentiments (positive or negative), it can predict the sentiment of a new sentence correctly.
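The sentiment example can be pictured as a single block of text handed to the model. The sketch below only builds such a few-shot prompt; the helper name `build_few_shot_prompt`, the "Sentence:/Sentiment:" layout, and the example sentences are illustrative assumptions, not the format used in the study, and no model is called.

```python
def build_few_shot_prompt(examples, query):
    """Format labeled examples plus one unlabeled query as a single prompt string.

    Hypothetical helper for illustration; the layout is an assumption.
    """
    blocks = [f"Sentence: {text}\nSentiment: {label}" for text, label in examples]
    # The query is left unlabeled so the model's next-token prediction fills it in.
    blocks.append(f"Sentence: {query}\nSentiment:")
    return "\n\n".join(blocks)


# Made-up demonstration examples (not taken from the research).
examples = [
    ("The film was a delight from start to finish.", "positive"),
    ("I want those two hours of my life back.", "negative"),
    ("A warm, funny and generous story.", "positive"),
]

prompt = build_few_shot_prompt(examples, "The plot dragged and the acting was flat.")
print(prompt)
```

Nothing about the model changes here: the examples live entirely in the input text, and the model is simply asked to continue it.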

Normally, a model like GPT-3 would need to be retrained with new data to perform a new task, updating its parameters as it learns. With in-context learning, however, the model's parameters are never updated, so it appears to pick up a new task without learning anything in the usual sense.

In-context learning does away with expensive retraining

This research is crucial in understanding in-context learning and the algorithms these large models can implement. According to Ekin Akyürek, a computer science graduate student and lead author of a study on this phenomenon, this opens up further exploration and could lead to models being able to perform new tasks without the need for expensive retraining.

"Usually, if you want to fine-tune these models, you need to collect domain-specific data and do some complex engineering. But now we can just feed it an input, five examples, and it accomplishes what we want. So, in-context learning is an unreasonably efficient learning phenomenon that needs to be understood," Akyürek says.

Joining Akyürek on the paper are Dale Schuurmans, a research scientist at Google Brain and professor of computing science at the University of Alberta; as well as senior authors Jacob Andreas, the X Consortium Assistant Professor in the MIT Department of Electrical Engineering and Computer Science and a member of the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL); Tengyu Ma, an assistant professor of computer science and statistics at Stanford; and Denny Zhou, principal scientist and research director at Google Brain. The research is due to be presented at the International Conference on Learning Representations.

Many in the machine-learning research community believe that large language models like GPT-3 are capable of in-context learning because of how they are trained, explains Akyürek.

GPT-3, for instance, has hundreds of billions of parameters and was trained by reading a massive amount of text from various sources on the internet, such as Wikipedia articles and Reddit posts. On this view, when the model is shown examples of a new task, it has likely already encountered something very similar in its training data, so it is repeating patterns rather than truly learning to perform a new task.

Akyürek hypothesised instead that in-context learners are not just matching past patterns, but actually learning to perform new tasks. Experiments with synthetic data, which the models could not have seen during training, supported this: he and his team found that the models could still learn from just a few examples. This led them to believe that these neural network models may contain smaller machine-learning models that can be trained to complete new tasks.

"This could account for most of the learning phenomena we have observed with these large models," says Akyürek.

Promise of transformer's hidden depths

To test this hypothesis, the researchers used a transformer neural network model, similar in architecture to GPT-3, but specifically trained for in-context learning. Through their exploration of the transformer's architecture, the researchers were able to theoretically prove that the model could write a linear model within its hidden states. In a neural network, the hidden states refer to the layers between the input and output layers. The researchers found that this linear model was written in the earliest layers of the transformer. The transformer then updated this linear model by implementing simple learning algorithms.
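To make "updating a linear model with a simple learning algorithm" concrete, here is a minimal sketch that fits a small linear model to a handful of (input, output) examples by plain gradient descent on squared error, then predicts for a new input. It is not the researchers' code and does not involve a transformer: the choice of gradient descent, the function names, the hidden task `true_weights`, and the step size are all assumptions made for illustration.

```python
import random


def predict(weights, x):
    """Output of the linear model: the dot product of weights and input."""
    return sum(w * xi for w, xi in zip(weights, x))


def fit_in_context(examples, steps=2000, lr=0.3):
    """Fit linear weights to (x, y) examples by gradient descent on mean squared error."""
    dim = len(examples[0][0])
    weights = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for x, y in examples:
            err = predict(weights, x) - y
            for j in range(dim):
                grad[j] += 2 * err * x[j] / len(examples)
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


rng = random.Random(0)
true_weights = [1.5, -2.0]  # hypothetical hidden task, unknown to the learner

# A handful of synthetic "in-context" examples generated from the hidden task.
examples = []
for _ in range(8):
    x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    examples.append((x, predict(true_weights, x)))

learned = fit_in_context(examples)
query = [0.3, -0.7]
print("learned weights:", learned)
print("prediction for query:", predict(learned, query))
```

The point of the sketch is the shape of the computation: a few examples go in, a small set of linear weights is adjusted step by step, and a prediction comes out, all without touching any parameters of a larger model.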

The researchers describe this as the model simulating and training a smaller version of itself. The researchers conducted probing experiments to further support this hypothesis, looking into the transformer's hidden layers to recover specific information.

“In this case, we tried to recover the actual solution to the linear model, and we could show that the parameter is written in the hidden states. This means the linear model is in there somewhere,” says Akyürek.
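One common way to check whether a quantity is "written" in a set of vectors is a linear probe: train a simple linear readout to predict the quantity from the vectors and see how well it does on held-out data. The toy sketch below uses synthetic vectors in place of real transformer hidden states, so every number in it is made up; the encoding vector `ENCODING`, the noise level, and the function names are assumptions, and the study's actual probing setup may differ.

```python
import random

rng = random.Random(0)
ENCODING = [0.5, -1.0, 0.25, 2.0]  # assumption: how the toy "hidden state" stores the parameter


def fake_hidden_state(w):
    """Synthetic stand-in for a hidden state: parameter w, linearly encoded, plus noise."""
    return [a * w + rng.gauss(0, 0.01) for a in ENCODING]


def make_dataset(n):
    """Pairs of (hidden state, true parameter) for n randomly drawn tasks."""
    data = []
    for _ in range(n):
        w = rng.uniform(-1, 1)
        data.append((fake_hidden_state(w), w))
    return data


def read_out(probe, h):
    """Linear probe: predict the parameter as a weighted sum of hidden-state entries."""
    return sum(p * hi for p, hi in zip(probe, h))


def train_probe(data, steps=300, lr=0.1):
    """Fit the probe by gradient descent on mean squared error."""
    probe = [0.0] * len(ENCODING)
    for _ in range(steps):
        grad = [0.0] * len(probe)
        for h, w in data:
            err = read_out(probe, h) - w
            for j, hj in enumerate(h):
                grad[j] += 2 * err * hj / len(data)
        probe = [p - lr * g for p, g in zip(probe, grad)]
    return probe


def probe_error(probe, data):
    """Mean absolute error of the probe's read-out."""
    return sum(abs(read_out(probe, h) - w) for h, w in data) / len(data)


train, held_out = make_dataset(200), make_dataset(50)
probe = train_probe(train)
print("held-out probe error:", probe_error(probe, held_out))
print("error of always guessing 0:", probe_error([0.0] * len(ENCODING), held_out))
```

If the probe's held-out error is far below that of a trivial guess, the parameter is linearly recoverable from the vectors, which is the sense in which it can be said to be "in there somewhere".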

Building off this theoretical work, the researchers say they may be able to enable a transformer to perform in-context learning by adding just two layers to the neural network. There are still many technical details to work out before that would be possible, says Akyürek, but it could help engineers create models that can complete new tasks without the need for retraining with new data.

“The paper sheds light on one of the most remarkable properties of modern large language models — their ability to learn from data given in their inputs, without explicit training,” says Mike Lewis, a research scientist at Facebook AI Research who was not involved with this work. “Using the simplified case of linear regression, the authors show theoretically how models can implement standard learning algorithms while reading their input, and empirically which learning algorithms best match their observed behaviour. These results are a stepping stone to understanding how models can learn more complex tasks and will help researchers design better training methods for language models to further improve their performance.”

Akyürek plans to continue exploring in-context learning with more complex functions than the linear models they studied in this work.

“With this work, people can now visualise how these models can learn from exemplars. So, my hope is that it changes some people’s views about in-context learning,” says Akyürek. “These models are not as dumb as people think. They don’t just memorise these tasks. They can learn new tasks, and we have shown how that can be done.”
