I bet everyone has heard the news about the world's new chess champion, the "alien" AlphaZero. No, it isn't actually an alien. It's a piece of software, but it does not work like typical software. It uses techniques loosely inspired by the workings of the human brain, and with its computational power it plays far better than any human.
AlphaZero was created at DeepMind, the Google-owned research lab famous for its work advancing AI.
Google’s DeepMind
DeepMind Technologies Limited is a British artificial intelligence research company founded by Demis Hassabis, Shane Legg and Mustafa Suleyman in 2010. The start-up was later acquired by Google in 2014.
The company carries out pioneering research in AI, developing programs that can learn to solve complex problems by observing their environment, a technique known as machine learning.
DeepMind made a breakthrough last year with AlphaGo, a program that mastered the famous game of Go. Go is an ancient and complex game of strategy and intuition in which two players take turns placing black and white stones on a 19-by-19 grid. The number of possible board positions is astronomically large, far more than the number of atoms in the observable universe, and many believed that no AI program could play it successfully. Yet AlphaGo went on to defeat world champion Lee Sedol.
AlphaGo was effective because it had been trained on millions of moves made by past masters and could predict its own chances of winning from a given position, adjusting its strategy accordingly. It learned by analyzing data from 100,000 professional human games and by playing against itself some 30 million times.
AlphaZero
AlphaZero is a generalized (and improved) version of DeepMind's AlphaGo, or more precisely of its successor AlphaGo Zero. The creators of AlphaZero recently published an academic paper on arXiv, which has not yet been peer reviewed. The paper describes a game-playing program that was able to master the games of Go, chess and shogi (Japanese chess) within 24 hours. According to the paper’s authors,
“Starting from random play and given no domain knowledge except the game rules, AlphaZero achieved within 24 hours a superhuman level of play in the games of chess and shogi as well as Go, and convincingly defeated a world champion program in each case.”
In a series of 100 games against Stockfish 8, the reigning computer chess champion, AlphaZero did not lose a single game. It won 25 games playing White, with the first-move advantage, and another three playing Black. The remaining 72 games were drawn. Even more impressive, AlphaZero's playing strength overtook Stockfish's after only four hours of self-training. In shogi it surpassed Elmo, the 2017 computer shogi world champion, after less than two hours of training, and in Go it surpassed the version of AlphaGo that defeated Lee Sedol after eight hours.
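To put the match result in perspective, here is a back-of-the-envelope tally using standard chess scoring (1 point per win, half a point per draw). The Elo figure is our own rough estimate, assuming the textbook logistic Elo formula applied to this single match; it is not a number reported in the paper.

```python
import math

# Match result against Stockfish 8 as described above.
wins_as_white, wins_as_black, losses, games = 25, 3, 0, 100
wins = wins_as_white + wins_as_black
draws = games - wins - losses

# Standard chess scoring: 1 point per win, 0.5 per draw, 0 per loss.
score = wins + 0.5 * draws
expected = score / games  # AlphaZero's fraction of the available points

# Rough estimate only: invert the logistic Elo model E = 1 / (1 + 10**(-D / 400))
# to get the rating gap D implied by this one match (not a figure from the paper).
elo_gap = -400 * math.log10(1 / expected - 1)

print(f"{wins} wins, {draws} draws, {losses} losses")
print(f"score {score}/{games}, implied Elo gap ~{elo_gap:.0f}")
```

A 64% score corresponds to a gap of roughly 100 rating points under this model. A single 100-game match is a small sample, so treat the number as illustrative.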
Because AI research has historically gone through cycles of hype followed by disappointment, researchers are always wary of another "AI winter", a period of sharply reduced funding and interest. Results such as these are encouraging signs that we're headed in the right direction.
DeepMind co-founder and CEO Demis Hassabis presented further details of the system at the recent Neural Information Processing Systems (NIPS) conference in California. According to Hassabis, “It doesn’t play like a human and it doesn’t play like a program. It plays in a third, almost alien way.”
Hassabis speculates that because AlphaZero teaches itself, it is free of habits that human players tend to have in chess, such as assigning fixed values to individual pieces and trying above all to minimize material losses.
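For contrast, the sketch below shows the kind of fixed piece values human players learn early on (pawn 1, knight and bishop 3, rook 5, queen 9) and a naive evaluation that simply counts them. It reads only the piece-placement field of a FEN string. The function name is hypothetical, and this is the conventional human heuristic, not anything AlphaZero uses.

```python
# Conventional material values taught to human players, in pawn units.
# The king can never be captured, so it adds nothing to the material count.
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9, "K": 0}


def material_balance(placement: str) -> int:
    """White's material minus Black's, from the piece-placement field of a FEN string.

    Hypothetical helper: uppercase letters are White pieces, lowercase are Black;
    digits (runs of empty squares) and '/' (rank separators) are skipped.
    """
    balance = 0
    for ch in placement:
        value = PIECE_VALUES.get(ch.upper())
        if value is None:
            continue  # a digit or '/', not a piece
        balance += value if ch.isupper() else -value
    return balance


START = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR"
NO_WHITE_QUEEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNB1KBNR"
print(material_balance(START))           # 0: material is level
print(material_balance(NO_WHITE_QUEEN))  # -9: White is a queen down
```

AlphaZero's neural network instead learns its own evaluation of whole positions from self-play, rather than starting from a fixed table like this one.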
Reinforcement Learning
AlphaZero was able to rediscover 1,400 years of human chess knowledge in an amazingly short amount of time. It combines a reinforcement learning algorithm with a deep neural network, and takes only the arrangement of the pieces on the board as input.
Reinforcement learning is a type of machine learning in which an AI agent learns, by interacting with its environment, which course of action achieves its goal with the greatest total reward. The technique uses a system of rewards and punishments, similar to how children are taught right from wrong.
If the agent performs an action which takes it towards the goal, it is rewarded. If the action takes it further away from the goal, the agent is punished.
Let's say that in a simple environment a good action earns +1 point and a bad action earns -1 point. When the agent reaches the goal, we add up all the points it gathered along the way. If the agent tried two different ways to reach its goal, one with a cumulative reward of +12 and another with +15, which one do you think the agent will adopt?
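A toy version of this bookkeeping, assuming undiscounted rewards and just two candidate routes. The route names and reward sequences are made up to match the +12 and +15 totals in the example.

```python
def total_reward(rewards):
    """Cumulative (undiscounted) reward collected along one route to the goal."""
    return sum(rewards)


def best_route(routes):
    """Pick the route whose cumulative reward is highest."""
    return max(routes, key=lambda name: total_reward(routes[name]))


# Hypothetical routes: +1 for each good action, -1 for each bad one.
routes = {
    "route A": [+1] * 14 + [-1] * 2,  # 14 good actions, 2 bad ones: total +12
    "route B": [+1] * 15,             # 15 good actions: total +15
}

for name, rewards in routes.items():
    print(name, total_reward(rewards))
print("agent prefers:", best_route(routes))
```

Real reinforcement learning agents usually discount rewards that arrive later and have to estimate these totals from experience rather than enumerate every route, but the principle of preferring the higher cumulative reward is the same.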
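This is how AlphaZero learned to tell good moves from bad ones without any prior knowledge of chess beyond the rules. In its case the reward is simply the result of each game (+1 for a win, 0 for a draw, -1 for a loss), so it has to work out for itself which of its moves contributed to that result. AlphaZero mastered chess by trial and error, playing against itself and improving with each game.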
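AlphaZero itself combines a deep neural network with Monte Carlo tree search, which is far too large to show here. As a much simpler stand-in, the sketch below learns a small game purely by self-play from the rules alone: Nim with 10 stones, where players alternately take 1 to 3 stones and whoever takes the last stone wins. It uses a lookup table of move values updated with a constant-step Monte Carlo rule instead of a neural network, and no tree search. The game, function names and hyperparameters are our own assumptions, not DeepMind's.

```python
import random
from collections import defaultdict

MOVES = (1, 2, 3)  # a player may remove 1, 2 or 3 stones per turn


def legal_moves(stones):
    """The only game knowledge the learner gets: which moves the rules allow."""
    return [m for m in MOVES if m <= stones]


def pick_move(q, stones, epsilon, rng):
    """Epsilon-greedy: usually play the best-valued move, sometimes explore."""
    moves = legal_moves(stones)
    if rng.random() < epsilon:
        return rng.choice(moves)
    return max(moves, key=lambda m: q[(stones, m)])


def train_by_self_play(start=10, games=20000, epsilon=0.2, alpha=0.1, seed=0):
    """Simplified stand-in for self-play: learn q[(stones, move)] for the mover."""
    rng = random.Random(seed)
    q = defaultdict(float)  # unseen moves start at a neutral value of 0
    for _ in range(games):
        stones, history = start, []
        # Both "players" are the same learner sharing one value table.
        while stones > 0:
            move = pick_move(q, stones, epsilon, rng)
            history.append((stones, move))
            stones -= move
        # Whoever took the last stone won (+1). Players alternate, so the
        # outcome flips sign at each earlier move as we walk backwards.
        outcome = 1.0
        for key in reversed(history):
            q[key] += alpha * (outcome - q[key])
            outcome = -outcome
    return q


def best_move(q, stones):
    """Greedy move under the learned values (no exploration)."""
    return max(legal_moves(stones), key=lambda m: q[(stones, m)])


q = train_by_self_play()
for stones in range(1, 11):
    print(stones, "stones -> take", best_move(q, stones))
```

Starting from nothing but the rules, the table should come to prefer moves that leave the opponent a multiple of four stones, the known winning strategy for this game, though with fewer games or a different seed a few positions may still be misjudged.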
According to the paper’s authors, AlphaZero learned chess openings on its own and, as it improved, gradually discarded some opening moves in favor of others.
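In chess, AlphaZero searches only about 80,000 positions per second, compared with roughly 70 million for Stockfish. As the authors put it, “AlphaZero compensates for the lower number of evaluations by using its deep neural network to focus much more selectively on the most promising variations - arguably a more ‘human-like’ approach to search.”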
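The selectivity the authors describe comes from how the tree search decides which move to explore next. Below is a minimal sketch of a PUCT-style selection rule of the kind described in DeepMind's AlphaGo Zero and AlphaZero papers, simplified to a constant exploration weight `c_puct`. The move names, priors, visit counts and `c_puct = 1.5` are hypothetical; in the real system the prior probabilities come from the neural network.

```python
import math


def puct_score(prior, visits, total_value, parent_visits, c_puct=1.5):
    """Average value so far (exploitation) plus a prior-weighted exploration bonus."""
    q = total_value / visits if visits else 0.0
    u = c_puct * prior * math.sqrt(parent_visits) / (1 + visits)
    return q + u


def select_move(children, c_puct=1.5):
    """children maps move -> (prior probability, visit count, summed value)."""
    parent_visits = sum(visits for _, visits, _ in children.values())
    return max(children, key=lambda m: puct_score(*children[m], parent_visits, c_puct))


# Hypothetical search statistics at one position; priors would come from the network.
children = {
    "e4": (0.60, 10, 5.0),  # favored by the network, already explored 10 times
    "d4": (0.30, 5, 3.0),   # decent prior, good results, fewer visits
    "a3": (0.01, 0, 0.0),   # the network considers this move very unlikely
}
print(select_move(children))
```

With these numbers the search next explores d4: its average value is higher and it has fewer visits than e4, while a3, which the network rates as very unlikely, gets almost no exploration bonus. That is how a network prior lets the search ignore most legal moves and spend its limited evaluations on a few promising lines.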
In this manner it learned chess on its own, somewhat as humans do. Another advantage you may have noticed: since it requires no prior game knowledge or special techniques beyond the rules of the game, the same algorithm can be trained to play games other than chess.
And that is exactly what the AlphaZero team did: they trained the same algorithm on two other games, Go and shogi, and AlphaZero emerged victorious in both.
AlphaZero is a general learning agent, at least for two-player board games of this kind, able to learn from a minimal amount of information, and I hope most of you can imagine its capabilities. Present it with a problem, give it the basic rules and see what solution it suggests. This is one of the major applications of AI: solving complex problems in new ways that could lead to solutions not previously considered.
There are exciting implications for the techniques used in AlphaZero, largely because of this ability to learn from minimal information. In principle, they could be applied to a number of areas such as medical diagnosis, weather and disaster prediction, and better management in organizations and governments. The possibilities are endless.
According to Hassabis, the program is so powerful because it is “no longer constrained by the limits of human knowledge.” As an example, Hassabis believes that, if applied to medicine, this approach could help defeat Alzheimer’s disease, finding in a matter of weeks a cure that might take humans hundreds of years to discover. In Hassabis’ words, “Ultimately we want to harness algorithmic breakthroughs like this to help solve all sorts of pressing real world problems.”
AI programs may be able to push forward human understanding of what is possible and positively impact people's lives. It is fascinating to see how far AI research has come, and to speculate how much further we can still go.