To infinity and beyond for AI agents taking the long view

A new AI framework can teach a group of agents to find an optimal long-term solution, an approach that could benefit applications such as self-driving cars

Researchers from a group of organisations including MIT and the MIT-IBM Watson AI Lab have developed a machine-learning framework that enables cooperative or competitive AI agents to consider what other agents will do far into the future, not just over the next few steps.

The agents then adapt their behaviours accordingly to influence other agents’ future behaviours and arrive at an optimal, long-term solution. Researchers say the framework could be used by a group of autonomous drones working together to find a lost hiker in a thick forest, or by self-driving cars that strive to keep passengers safe by anticipating the future moves of other vehicles on a busy highway.

“When AI agents are cooperating or competing, what matters most is when their behaviours converge at some point in the future,” says Dong-Ki Kim, a graduate student in the MIT Laboratory for Information and Decision Systems (LIDS) and lead author of a paper describing this framework. “There are a lot of transient behaviours along the way that don’t matter very much in the long run. Reaching this converged behaviour is what we really care about, and we now have a mathematical way to enable that.”

The senior author is Jonathan P. How, the Richard C. Maclaurin Professor of Aeronautics and Astronautics and a member of the MIT-IBM Watson AI Lab. Co-authors include others at the MIT-IBM Watson AI Lab, IBM Research, Mila-Quebec Artificial Intelligence Institute, and Oxford University. The research will be presented at the Conference on Neural Information Processing Systems.

Multiagent reinforcement learning by trial and error

The researchers focused on a problem known as multiagent reinforcement learning. Reinforcement learning is a form of machine learning in which an AI agent learns by trial and error. Researchers reward the agent for “good” behaviours that help it achieve a goal. The agent adapts its behaviour to maximise that reward until it eventually becomes an expert at a task.
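The trial-and-error loop described above can be illustrated with a minimal single-agent sketch. It is not taken from the paper: the three-action setup, the reward values, and the `run_bandit` helper are all assumptions chosen for illustration, and the epsilon-greedy rule shown here is a standard textbook technique rather than the researchers' method.

```python
import random


def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Learn by trial and error which action pays the most on average.

    true_means: hypothetical average reward of each action (unknown to the agent).
    epsilon: probability of trying a random action instead of the best-known one.
    """
    rng = random.Random(seed)
    n_actions = len(true_means)
    estimates = [0.0] * n_actions  # the agent's current guess of each action's value
    counts = [0] * n_actions       # how often each action has been tried

    for _ in range(steps):
        if rng.random() < epsilon:
            # Trial: occasionally explore a random action.
            action = rng.randrange(n_actions)
        else:
            # Otherwise exploit the action that currently looks best.
            action = max(range(n_actions), key=lambda a: estimates[a])

        # The environment returns a noisy reward for the chosen action.
        reward = rng.gauss(true_means[action], 0.5)

        # Error correction: move the running average towards the observed reward.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts


estimates, counts = run_bandit([0.2, 0.5, 1.0])
best_action = max(range(len(counts)), key=lambda a: counts[a])
print("Most frequently chosen action:", best_action)
```

After enough trials the agent spends most of its time on the action with the highest average reward, which is the sense in which it "becomes an expert" at this toy task.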

But when many cooperative or competing agents simultaneously learn, things become increasingly complex. As agents consider more future steps of their fellow agents and how their own behaviour influences others, the problem soon requires far too much computational power to solve efficiently. This is why other approaches only focus on the short term.

“The AIs really want to think about the end of the game, but they don’t know when the game will end,” says Kim. “They need to think about how to keep adapting their behaviour into infinity so they can win at some far time in the future. Our paper essentially proposes a new objective that enables an AI to think about infinity.”

It is impossible to plug infinity into an algorithm, so the researchers designed their system so that agents focus on a future point at which their behaviours converge, a general concept the researchers call an “active equilibrium.”

The machine-learning framework they developed, known as FURTHER (which stands for FUlly Reinforcing acTive influence witH averagE Reward), enables agents to learn how to adapt their behaviours as they interact with other agents to achieve this active equilibrium.
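The “average reward” in the framework's name points to a long-run objective. As a rough illustration only, and not the FURTHER algorithm itself, the sketch below compares a discounted return, which favours near-term reward, with a long-run average reward on two made-up reward streams. The streams, the horizon of 1,000 steps, and the deliberately steep discount factor of 0.5 are all assumptions chosen to make the contrast visible.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards where step t is weighted by gamma ** t (short-term focus)."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def average_reward(rewards):
    """Mean reward per step over the whole stream (long-term focus)."""
    return sum(rewards) / len(rewards)


horizon = 1000  # stands in for "a very long time"; true infinity cannot be simulated

# Hypothetical behaviour A: a large payoff immediately, then a mediocre one forever.
short_sighted = [5.0] + [0.5] * (horizon - 1)

# Hypothetical behaviour B: nothing for ten steps, then a better payoff forever.
far_sighted = [0.0] * 10 + [1.0] * (horizon - 10)

gamma = 0.5  # assumed discount factor, steep on purpose

print("Discounted:", discounted_return(short_sighted, gamma),
      "vs", discounted_return(far_sighted, gamma))
print("Average:   ", average_reward(short_sighted),
      "vs", average_reward(far_sighted))
```

Under the discounted measure the short-sighted stream scores higher, while under the average-reward measure the far-sighted stream does, because early transient steps carry almost no weight over a long horizon.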

“The challenge was thinking about infinity,” says Kim. “We had to use a lot of different mathematical tools to enable that, and make some assumptions to get it to work in practice.”

Researchers tested their approach against other multiagent reinforcement learning frameworks in several different scenarios, including a pair of robots fighting sumo-style and a battle pitting two 25-agent teams against one another. In both instances, the AI agents using FURTHER won the games more often.

While games were used in the testing phase, researchers say FURTHER could be used to tackle any kind of multiagent problem. Economists, for example, could apply it when seeking to develop sound policy in situations where many interacting entities have behaviours and interests that change over time.
