A new version of the AlphaGo computer program is able to teach itself to rapidly master the classic strategy game Go, starting from a blank slate and without human input, reports a paper published in Nature this week. The new program, called AlphaGo Zero, defeated its predecessor (which defeated Go champion Lee Sedol in a tournament in March 2016) by 100 games to 0.
A grand challenge for artificial intelligence is to develop an algorithm that learns challenging concepts from a blank slate and reaches superhuman proficiency. To beat world-champion human players at Go, a previous version of AlphaGo was trained through a combination of supervised learning based on millions of human expert moves and reinforcement learning from self-play. That version of AlphaGo was trained over several months and required multiple machines and 48 TPUs (specialized chips for neural network training).
Here, David Silver, Julian Schrittwieser, Karen Simonyan, Demis Hassabis and colleagues introduce AlphaGo Zero, which learns solely from the games that it plays against itself, starting from random moves, with only the board and pieces as inputs and without human data. AlphaGo Zero uses a single neural network, which is trained to predict the program’s own move selection and the winner of its games, improving with each iteration of self-play. The new program uses a single machine and 4 TPUs.
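As a rough illustration of what it means for one network to be trained on both its own move selection and the game winner, the sketch below combines two error terms for a single board position: a cross-entropy term that compares the network's move probabilities with the moves the program actually favoured, and a squared-error term that compares the predicted winner with the real result. The function name, argument names and the exact form of the two terms are assumptions made for this example; the summary above does not specify them, and any regularization term is left out.

```python
import math


def combined_loss(policy_probs, search_probs, predicted_value, outcome):
    """Hypothetical per-position training loss (illustrative only).

    policy_probs    -- network's predicted probability for each move (sums to 1)
    search_probs    -- probabilities of the moves the program actually selected
    predicted_value -- network's predicted winner, a number in [-1, 1]
    outcome         -- actual result for the player to move: +1 win, -1 loss
    """
    eps = 1e-12  # small constant so log(0) is never evaluated

    # Squared error between the predicted winner and the actual winner.
    value_loss = (outcome - predicted_value) ** 2

    # Cross-entropy between the selected-move distribution (target)
    # and the network's predicted move distribution.
    policy_loss = -sum(
        target * math.log(prob + eps)
        for target, prob in zip(search_probs, policy_probs)
    )

    return value_loss + policy_loss


if __name__ == "__main__":
    # Two legal moves, an undecided network, and a game that was eventually won.
    print(combined_loss([0.5, 0.5], [0.5, 0.5], 0.0, 1.0))
```

Under these assumptions, a network that is unsure about both the move and the winner receives a large loss, while one whose predictions match the selected move and the final result receives a loss close to zero. Each round of self-play supplies fresh positions of this kind to train on.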
After a few days of training, including almost 5 million games of self-play, AlphaGo Zero could outperform humans and defeat all previous versions of AlphaGo. As the program trained, it independently discovered some of the same game principles that took humans thousands of years to conceptualize and also developed novel strategies that provide new insights into this ancient game.