One of the main goals of artificial intelligence research is to build a system that can teach itself, so that it can be applied across varying situations. Google’s DeepMind, which is working towards just that goal, has made a huge leap towards realizing its ambition.
In a paper posted to the arXiv preprint server (hosted by Cornell University), researchers at DeepMind describe an artificial intelligence called AlphaZero. This creation was able to reach a “superhuman level of play” in chess, among other games, defeating champion programs that were built to specialize in those games.
AlphaZero was given only the rules of each game, then used a reinforcement learning algorithm to teach itself tactics. After eight hours of self-training it was able to defeat the previous generation of AI at the game Go. That previous AI, AlphaGo Zero, had already beaten the human Go champion.
As if that wasn’t enough, the new AlphaZero then trained for just four hours at chess before beating the world champion chess-playing program, Stockfish. It then went on for a further two hours of training in shogi before defeating the world’s greatest shogi program, Elmo.
While three world-champion-level victories in just 24 hours are impressive, the real feat is that AlphaZero wasn’t specifically created to play any of these games.
This new system uses reinforcement learning to find a balance between exploration of uncharted territory and exploitation of current knowledge. Crucially, it applies this in a far broader way than previous game-specific programs.
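AlphaZero’s Monte Carlo tree search weighs these two pressures with a PUCT-style selection score: a term rewarding moves that have scored well so far (exploitation) plus a term rewarding moves the network rates highly but that have been tried rarely (exploration). A minimal sketch, with an illustrative constant name and value not taken from the paper:

```python
import math

def puct_score(q, prior, n_parent, n_child, c_puct=1.5):
    """Selection score for one child move in the search tree.

    q        -- average value of the move so far (exploitation term)
    prior    -- the network's move probability for this move
    n_parent -- visit count of the parent position
    n_child  -- visit count of this move
    c_puct   -- illustrative exploration constant (not the paper's value)
    """
    exploration = c_puct * prior * math.sqrt(n_parent) / (1 + n_child)
    return q + exploration

# An unvisited move with a strong prior can outrank a well-explored,
# decent-scoring move, which is what drives the search to try it.
fresh = puct_score(q=0.0, prior=0.5, n_parent=100, n_child=0)
tried = puct_score(q=0.9, prior=0.1, n_parent=100, n_child=50)
```

As the child’s visit count grows, its exploration bonus shrinks, so the score gradually comes to be dominated by the measured value q.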
The paper explains, “The AlphaZero algorithm is a more generic version of the AlphaGo Zero algorithm that was first introduced in the context of Go. It replaces the handcrafted knowledge and domain specific augmentations used in traditional game-playing programs with deep neural networks and a tabula rasa reinforcement learning algorithm.”
More specifically, “Instead of a handcrafted evaluation function and move ordering heuristics, AlphaZero utilizes a deep neural network (p, v) = fθ(s) with parameters θ. This neural network takes the board position s as an input and outputs a vector of move probabilities p with components pa = Pr(a|s) for each action a, and a scalar value v estimating the expected outcome z from position s, v ≈ E[z|s]. AlphaZero learns these move probabilities and value estimates entirely from self-play; these are then used to guide its search.”
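The key point in that passage is that one network produces two outputs from the same position: a policy head giving move probabilities p, and a value head giving a scalar v in [-1, 1]. A toy NumPy sketch of that shape (the layer sizes, random weights, and single hidden layer are illustrative stand-ins, not the paper’s deep residual architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: a 64-feature board encoding, 32 hidden units,
# 8 legal moves. These are toy values, not AlphaZero's.
N_FEATURES, N_HIDDEN, N_MOVES = 64, 32, 8

W_shared = rng.normal(0.0, 0.1, (N_HIDDEN, N_FEATURES))
W_policy = rng.normal(0.0, 0.1, (N_MOVES, N_HIDDEN))
W_value = rng.normal(0.0, 0.1, (1, N_HIDDEN))

def f_theta(s):
    """(p, v) = f_theta(s): shared trunk feeding two heads."""
    h = np.tanh(W_shared @ s)          # shared representation of position s
    logits = W_policy @ h
    p = np.exp(logits - logits.max())
    p /= p.sum()                       # softmax -> move probabilities
    v = float(np.tanh(W_value @ h)[0]) # value estimate, squashed to [-1, 1]
    return p, v

s = rng.normal(size=N_FEATURES)        # toy stand-in for a board position
p, v = f_theta(s)
```

In training, p is pushed towards the move frequencies the search actually chose, and v towards the self-play game outcome z, which is what “learns these move probabilities and value estimates entirely from self-play” refers to.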
All that means the AlphaZero program can train faster, and therefore more efficiently, than its competition. The paper describes this: “In AlphaGo Zero, self-play games were generated by the best player from all previous iterations. After each iteration of training, the performance of the new player was measured against the best player; if it won by a margin of 55% then it replaced the best player and self-play games were subsequently generated by this new player. In contrast, AlphaZero simply maintains a single neural network that is updated continually, rather than waiting for an iteration to complete.”
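The contrast in that quote can be sketched as two update schemes: a gated one, where a candidate must win an evaluation match by a margin before it replaces the best player, and a continuous one, where the single network’s parameters are simply adjusted after every step. The function names, learning rate, and plain-list weights below are illustrative:

```python
def alphago_zero_gate(candidate_winrate, threshold=0.55):
    """Gated scheme (AlphaGo Zero): the candidate replaces the best
    player only if it wins evaluation games by the required margin."""
    return candidate_winrate >= threshold

def alphazero_update(weights, gradients, lr=0.01):
    """Continuous scheme (AlphaZero): one network, updated after every
    training step -- no evaluation gate, no waiting for an iteration."""
    return [w - lr * g for w, g in zip(weights, gradients)]

# A 54% candidate would be discarded under the gate...
kept = alphago_zero_gate(0.54)
# ...whereas the continuous scheme just keeps nudging the same weights.
new_weights = alphazero_update([1.0, -0.5], [10.0, -10.0])
```

Dropping the evaluation gate removes an entire match-playing step from each training cycle, which is part of why AlphaZero reaches strong play in hours rather than days.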
In matches of 100 games each at chess, Go and shogi, moves were limited to one minute of thinking time per turn. AlphaGo Zero had three days of training behind it, while Stockfish and Elmo were the results of years of programming work. AlphaZero taught itself all three games in mere hours.
Speaking earlier this year about the Go-specific AlphaGo Zero, DeepMind CEO Demis Hassabis claimed a future version would help solve scientific problems like designing new drugs and discovering new materials. While the AlphaZero update is a step towards that broader applicability, it is still focused squarely on game scenarios for now.
The paper clarifies just how impressive AlphaZero’s achievement in chess is: “The game of chess represented the pinnacle of AI research over several decades. State-of-the-art programs are based on powerful engines that search many millions of positions, leveraging handcrafted domain expertise and sophisticated domain adaptations. AlphaZero is a generic reinforcement learning algorithm – originally devised for the game of Go – that achieved superior results within a few hours, searching a thousand times fewer positions, given no domain knowledge except the rules of chess. Furthermore, the same algorithm was applied without modification to the more challenging game of shogi, again outperforming the state of the art within a few hours.”