Research highlight

Computer science: AI revisits its past to solve complex tasks

Nature

February 25, 2021

A family of reinforcement learning algorithms that score higher than human players and state-of-the-art artificial intelligence systems at classic Atari video games, such as Montezuma’s Revenge and Pitfall, is reported in this week’s Nature. Collectively known as Go-Explore, the algorithms offer a way to improve the exploration of complex environments, which may be an important step towards creating truly intelligent learning agents.

Reinforcement learning can be used to train artificial intelligence systems to make decisions by exploring and understanding complicated environments, and to learn how to optimally acquire rewards. A reward might be given when a robot reaches a specific location or when a level in a video game is completed. However, existing reinforcement learning algorithms seem to struggle when complex environments offer little feedback.

Adrien Ecoffet, Joost Huizinga and colleagues identify two main impediments to effective exploration and present a family of algorithms that addresses both. Go-Explore explores environments thoroughly and builds up an archive of where it has been, ensuring that it does not forget the route to a promising intermediate stage or successful outcome (the reward). The authors demonstrate the potential of the family of algorithms by using them to solve all previously unsolved Atari 2600 games. Go-Explore quadruples previous scores on Montezuma’s Revenge and surpasses average human performance on Pitfall (where previous algorithms were unable to score any points). Go-Explore can also solve a simulated robotic task in which a robot arm must pick up an object and put it on one of four shelves, two of which are behind latched doors.
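The idea of remembering visited places and returning to them before exploring further can be illustrated with a minimal Python sketch. It is not the authors' implementation: the SparseChain environment, the go_explore function, the visit-count weighting and all parameter values are assumptions made for illustration, and returning to a stored place by replaying recorded actions only works here because the toy environment is deterministic.

```python
import random


class SparseChain:
    """Hypothetical toy environment: a walk along a line with one reward at the far end."""

    def __init__(self, length=30):
        self.length = length
        self.pos = 0
        self.rewarded = False

    def reset(self):
        self.pos = 0
        self.rewarded = False
        return self.pos

    def step(self, action):
        # action is -1 (left) or +1 (right); the position is clamped to [0, length]
        self.pos = max(0, min(self.length, self.pos + action))
        reward = 0.0
        if self.pos == self.length and not self.rewarded:
            self.rewarded = True  # the reward is paid only once per episode
            reward = 1.0
        return self.pos, reward


def go_explore(env, iterations=500, explore_steps=10, seed=0):
    """Illustrative archive-based exploration loop (assumed design, not the paper's code)."""
    rng = random.Random(seed)
    start = env.reset()
    # Archive: visited place -> actions that lead there, score so far, times selected
    archive = {start: {"actions": [], "score": 0.0, "visits": 0}}

    for _ in range(iterations):
        # 1. Select a stored place, favouring those selected less often (assumed heuristic)
        cells = list(archive)
        weights = [1.0 / (archive[c]["visits"] + 1) ** 0.5 for c in cells]
        cell = rng.choices(cells, weights=weights)[0]
        entry = archive[cell]
        entry["visits"] += 1

        # 2. Return to it by replaying the stored actions from a fresh reset
        env.reset()
        for a in entry["actions"]:
            env.step(a)
        actions = list(entry["actions"])
        score = entry["score"]

        # 3. Explore from there with random actions, recording anything new or better
        for _ in range(explore_steps):
            a = rng.choice((-1, 1))
            state, reward = env.step(a)
            actions.append(a)
            score += reward
            known = archive.get(state)
            if (
                known is None
                or score > known["score"]
                or (score == known["score"] and len(actions) < len(known["actions"]))
            ):
                archive[state] = {
                    "actions": list(actions),
                    "score": score,
                    "visits": 0 if known is None else known["visits"],
                }
    return archive
```

Calling go_explore(SparseChain(30)) returns the archive; if the far end was reached, the entry stored under key 30 holds an action sequence that leads there. Purely random ten-step walks from the start could never cover 30 positions, which is why the archive matters in this toy setting.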

The simple principles of remembering and returning to promising areas for exploration are a powerful and general approach to exploration, the authors note. They suggest that the algorithms presented here could have applications in robotics, language understanding and drug design.

After the embargo ends, the full paper will be available at: https://www.nature.com/articles/s41586-020-03157-9

doi: 10.1038/s41586-020-03157-9
