AlphaGo Zero

AlphaGo Zero is a later version of AlphaGo.  It’s leaner, meaner, and smarter than its predecessor, which beat the best Go players in the world.  This version would beat them even more soundly, and it beat the earlier version in every game they played.

I’ve been fascinated by this because of my experience playing Go.  I was a weak amateur player (7 kyu), but I used to enjoy knowing that, at that level, I could still beat the best Go-playing software (11 kyu at the time).  Kyu numbers get smaller as you improve until they reach 1 kyu; after that the ranks start at 1 dan and climb for the good players.  I took a certain pride in playing a game that, unlike chess, hadn’t been mastered by computers.

Articles explaining why Go is so difficult for a computer focus on the 19×19 grid the game is played on and on the astronomical number of possible games (well, more than astronomical: there are more possible games than atoms in the Universe), which makes a blind search for the best moves hopeless.
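To put a rough number on that, here is a back-of-the-envelope sketch.  Each of the 361 points is empty, black, or white, so 3^361 is an upper bound on board configurations (many of them illegal), and the number of possible games is larger still.  The 10^80 figure for atoms in the Universe is the commonly quoted order of magnitude, used here as an assumption.

```python
# Rough upper bound on Go board configurations versus atoms in the Universe.
POINTS = 19 * 19                # 361 intersections on the board
configurations = 3 ** POINTS    # each point: empty, black, or white (includes illegal boards)
ATOMS_ESTIMATE = 10 ** 80       # commonly quoted order of magnitude; an assumption


def digits(n):
    """Number of decimal digits in a positive integer."""
    return len(str(n))


print(digits(configurations))            # how many digits the bound has
print(configurations > ATOMS_ESTIMATE)   # is the bound larger than the atom estimate?
```

Even this count of boards, which ignores the order in which stones were played, dwarfs the atom estimate.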

But it’s not the 19×19 grid that’s the problem.  It’s the fact that a computer simply hasn’t been able to calculate the score, even of a finished game.  And if it can’t know the score, it can’t decide which moves are best for improving its score.

Here’s the problem. Stones are placed on the board in an attempt to control territory.  If you place your stones in a tight pattern, your opponent can place stones more loosely and surround more territory.  But, and here’s the complexity of the game, if those loose stones are too loose, they can be surrounded and captured.
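The capture rule itself is mechanical, and a minimal sketch shows how little code it takes.  A group of connected stones is captured once it has no empty neighbouring points (liberties).  The board encoding below (strings of ".", "B", "W") and the function names are my own choices for illustration.

```python
def group_and_liberties(board, row, col):
    """Flood-fill from the stone at (row, col).

    Returns (group, liberties): the set of connected same-coloured stones
    and the set of empty points adjacent to that group.
    """
    colour = board[row][col]
    if colour == ".":
        raise ValueError("no stone at that point")
    size = len(board)
    group, liberties = set(), set()
    stack = [(row, col)]
    while stack:
        r, c = stack.pop()
        if (r, c) in group:
            continue
        group.add((r, c))
        # Only the four orthogonal neighbours count in Go.
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < size and 0 <= nc < len(board[nr]):
                if board[nr][nc] == ".":
                    liberties.add((nr, nc))
                elif board[nr][nc] == colour:
                    stack.append((nr, nc))
    return group, liberties


def is_captured(board, row, col):
    """A group with no liberties is captured."""
    return not group_and_liberties(board, row, col)[1]


# A toy 5x5 position: one black stone fully surrounded, two black stones with room.
board = [
    ".W...",
    "WBW..",
    ".W...",
    ".....",
    "..BB.",
]
print(is_captured(board, 1, 1))   # the surrounded single stone
print(is_captured(board, 4, 2))   # the pair on the bottom edge
```

Counting liberties only says which stones are captured right now.  It says nothing about whether stones that still have liberties will survive, and that is the hard part.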

At the end of the game, then, the board will have clusters of stones around various local battles for territory and control.  Some of those stones will have been abandoned as dead, others will definitely be alive, and the players will understand the status of each.  But there is no way, short of playing out every scenario, for a computer to determine which stones are alive and which are dead.

This is the very first, easiest example in a collection of Go exercises available online.  Is the one black stone surrounded by white dead?  Looks like it.  Does white then have 4 points of territory?  Or will black, with one more stone inside white’s position, kill the white stones and own the entire corner?

If this were part of a real game, then depending on whose turn it is, a white or black stone would be played at C1, and the players would understand that the situation is resolved and requires no further play.  But how can a computer figure out the status of those white stones?

To make matters worse, even if black were to play at C1 and the players were to recognize the white stones as dead, the white stones could still come back to life if black went on to make extremely stupid but legal moves.

The answer is neural networks: pattern-matching machine learning software.  It is no longer necessary to code an algorithm that works out the score of a situation like this and decides what it is worth.  The same technology that is used in self-driving cars to evaluate the visual field around the car is used to evaluate board positions, after studying zillions of games.

Once that technology lets the computer assess the value of different board positions, it’s back to boring old AI planning software to search for the best moves.
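Here is a minimal sketch of that division of labour, under loud assumptions.  AlphaGo’s own search is Monte Carlo tree search guided by its networks; to keep the example short I have swapped in a plain depth-limited negamax, and swapped Go for a toy take-away game (take 1 to 3 stones from a pile, whoever takes the last stone wins).  The `evaluate` function is a hand-written stand-in for a learned value network, and every name here is hypothetical.

```python
def evaluate(pile):
    """Stand-in for a learned value function.

    Returns a score in [-1, 1] from the point of view of the player to move.
    In this toy game a pile that is a multiple of 4 is lost for the mover.
    """
    return -1.0 if pile % 4 == 0 else 1.0


def legal_moves(pile):
    """A move takes 1, 2, or 3 stones, never more than remain."""
    return [take for take in (1, 2, 3) if take <= pile]


def negamax(pile, depth):
    """Depth-limited search: look ahead `depth` moves, then ask the evaluator."""
    if pile == 0:
        return -1.0               # the opponent took the last stone, so the mover lost
    if depth == 0:
        return evaluate(pile)     # search horizon reached: trust the evaluator
    # My best result is the worst I can make things for my opponent.
    return max(-negamax(pile - take, depth - 1) for take in legal_moves(pile))


def best_move(pile, depth=2):
    """Pick the move whose resulting position is worst for the opponent."""
    return max(legal_moves(pile), key=lambda take: -negamax(pile - take, depth - 1))


print(best_move(10))   # the move that leaves the opponent a multiple of 4
```

The search code knows nothing about what makes a position good.  All of that judgment lives in `evaluate`, which is the piece the neural network replaces.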

Back to AlphaGo Zero.  Here’s what’s scary about it.  AlphaGo studied vast libraries of human Go games to learn how to play.  AlphaGo Zero simply played against itself.  It rederived centuries of Go wisdom about the best combinations of opening moves, and then built on that knowledge.

It didn’t need people to teach it how to play.
