or: Mastering The Game Of Go With Deep Neural Networks And Tree Search

A convnet (the value network) learns to predict the outcome of a game from a board position, and an RL policy plays against itself (using the value predictions as a guide) to generate the training games for that value network.
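The self-play loop described above can be illustrated on a much smaller problem. What follows is a minimal sketch under loud assumptions, not the paper's method: the game is Nim (take 1 to 3 stones, whoever takes the last stone wins) rather than Go, a lookup table stands in for the convnet, and an epsilon-greedy rule stands in for the RL policy. All function names (`legal_moves`, `choose_move`, `self_play_game`, `train`) are hypothetical and chosen only for this illustration.

```python
import random


def legal_moves(stones):
    """Toy Nim: a player may take 1, 2 or 3 stones."""
    return [m for m in (1, 2, 3) if m <= stones]


def choose_move(stones, value, epsilon, rng):
    """Epsilon-greedy stand-in for the policy, guided by value predictions.

    value[s] estimates the win probability of the player to move with s
    stones left, so the greedy move leaves the opponent in the position
    with the lowest estimate. Unseen positions default to 0.5.
    """
    moves = legal_moves(stones)
    if rng.random() < epsilon:
        return rng.choice(moves)
    return min(moves, key=lambda m: value.get(stones - m, 0.5))


def self_play_game(start, value, epsilon, rng):
    """Play one game against itself; return (position, outcome) pairs.

    outcome is 1.0 if the player to move in that position went on to win.
    """
    history = []
    stones, player, winner = start, 0, None
    while stones > 0:
        history.append((stones, player))
        stones -= choose_move(stones, value, epsilon, rng)
        winner = player  # whoever moved last took the final stone
        player = 1 - player
    return [(s, 1.0 if p == winner else 0.0) for s, p in history]


def train(start=10, games=2000, epsilon=0.2, seed=0):
    """Fit the value table to self-play outcomes with a running mean."""
    rng = random.Random(seed)
    value = {0: 0.0}  # no stones left: the player to move has already lost
    counts = {}
    for _ in range(games):
        for stones, outcome in self_play_game(start, value, epsilon, rng):
            counts[stones] = counts.get(stones, 0) + 1
            old = value.get(stones, 0.5)
            value[stones] = old + (outcome - old) / counts[stones]
    return value


if __name__ == "__main__":
    learned = train()
    for stones in sorted(learned):
        print(stones, round(learned[stones], 2))
```

The two halves feed each other in the same way as in the summary: `self_play_game` consults the current value estimates to pick moves, and `train` uses the finished games as regression targets for those same estimates. In the real system a convnet trained by gradient descent would replace the running-mean table.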