
AlphaZero

AlphaZero is a computer program developed by the artificial intelligence research company DeepMind to master the games of chess, shogi and Go. This algorithm uses an approach similar to AlphaGo Zero.

On December 5, 2017, the DeepMind team released a preprint paper introducing AlphaZero, which within
24 hours of training achieved a superhuman level of play in these three games by defeating world-
champion programs Stockfish, Elmo, and the three-day version of AlphaGo Zero. In each case it made use
of custom tensor processing units (TPUs) that the Google programs were optimized to use.[1] AlphaZero
was trained solely via self-play using 5,000 first-generation TPUs to generate the games and 64 second-
generation TPUs to train the neural networks, all in parallel, with no access to opening books or endgame
tables. After four hours of training, DeepMind estimated AlphaZero was playing chess at a higher Elo
rating than Stockfish 8; after nine hours of training, the algorithm defeated Stockfish 8 in a time-controlled
100-game tournament (28 wins, 0 losses, and 72 draws).[1][2][3] The trained algorithm played on a single
machine with four TPUs.

DeepMind's paper on AlphaZero was published in the journal Science on 7 December 2018;[4] however,
the AlphaZero program itself has not been made available to the public.[5] In 2019, DeepMind published a
new paper detailing MuZero, a new algorithm able to generalise AlphaZero's work, playing both Atari and
board games without knowledge of the rules or representations of the game.[6]

Relation to AlphaGo Zero


AlphaZero (AZ) is a more generalized variant of the AlphaGo Zero (AGZ) algorithm, and is able to play
shogi and chess as well as Go. Differences between AZ and AGZ include:[1]

AZ has hard-coded rules for setting search hyperparameters.
The neural network is now updated continually.
AZ doesn't use symmetries, unlike AGZ.
Chess can end in a draw, unlike Go; therefore, AlphaZero takes into account the possibility of a drawn game.

Stockfish and Elmo


In terms of raw search speed, AlphaZero's Monte Carlo tree search evaluates just 80,000 positions per second in chess and 40,000 in shogi, compared to 70 million for Stockfish and 35 million for Elmo. AlphaZero compensates for the lower number of evaluations by using its deep neural network to focus much more selectively on the most promising variations.[1]
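That selective focus comes from the PUCT selection rule used in AlphaGo-style Monte Carlo tree search, in which the network's move priors steer exploration toward promising branches while visit counts discourage re-searching settled ones. The sketch below is illustrative, not DeepMind's code: the dictionary-based node layout and the `c_puct` constant are assumptions for the example.

```python
import math

def puct_select(node, c_puct=1.5):
    """Pick the child move maximizing the PUCT score, as in AlphaZero-style MCTS.

    Each child stores: P (network prior), N (visit count), W (total value).
    The score is Q + U, where Q = W/N exploits known value and
    U = c_puct * P * sqrt(parent visits) / (1 + N) explores by prior.
    """
    total_visits = sum(child["N"] for child in node["children"].values())
    best_move, best_score = None, -float("inf")
    for move, child in node["children"].items():
        q = child["W"] / child["N"] if child["N"] > 0 else 0.0  # mean value so far
        u = c_puct * child["P"] * math.sqrt(total_visits) / (1 + child["N"])
        if q + u > best_score:
            best_move, best_score = move, q + u
    return best_move

# A barely-visited move with a healthy prior can outrank a well-searched one:
node = {"children": {
    "e4": {"P": 0.6, "N": 10, "W": 6.0},   # searched often, Q = 0.6
    "d4": {"P": 0.3, "N": 1,  "W": 0.2},   # barely searched, high exploration bonus
}}
print(puct_select(node))  # -> d4
```

This is why far fewer evaluations can suffice: the prior term concentrates the search budget where the network already expects good moves, instead of spreading it uniformly as a brute-force engine must.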

Training
AlphaZero was trained solely via self-play, using 5,000 first-generation TPUs to generate the games and 64
second-generation TPUs to train the neural networks. In parallel, the in-training AlphaZero was
periodically matched against its benchmark (Stockfish, Elmo, or AlphaGo Zero) in brief one-second-per-
move games to determine how well the training was progressing. DeepMind judged that AlphaZero's
performance exceeded the benchmark after around four hours of training for Stockfish, two hours for Elmo,
and eight hours for AlphaGo Zero.[1]
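The training loop described above can be sketched in miniature. Everything concrete here is a stand-in: the random placeholder policy, the opaque tuple states, and the random game termination replace the real neural network and game rules. Only the data flow mirrors the description: play games against yourself, label every position with the final outcome, and feed those pairs back as policy and value targets.

```python
import random

def self_play_game(policy, max_moves=200):
    """Play one game against itself, recording (state, move) pairs.

    `policy` maps a state to a move. States are opaque tuples and the game
    ends at random -- a toy stand-in for real chess/shogi/Go rules.
    """
    history, state = [], ()
    for _ in range(max_moves):
        move = policy(state)
        history.append((state, move))
        state = state + (move,)
        if random.random() < 0.05:       # stand-in for reaching a terminal position
            break
    outcome = random.choice([1, 0, -1])  # win / draw / loss from the first player's view
    return history, outcome

def training_step(buffer, games_per_step=10):
    """Generate self-play games and collect (state, move, outcome) examples,
    the targets a network update would consume for its policy and value heads."""
    policy = lambda state: random.randint(0, 9)  # placeholder for the neural network
    for _ in range(games_per_step):
        history, outcome = self_play_game(policy)
        buffer.extend((s, m, outcome) for s, m in history)
    return buffer
```

In the real system the placeholder policy is the network guiding MCTS, thousands of such games run in parallel on TPUs, and a separate set of TPUs consumes the buffer to update the network, which then generates the next batch of games.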

Preliminary results

Outcome

Chess

In AlphaZero's chess match against Stockfish 8 (2016 TCEC world champion), each program was given one minute per move. In the match broadcast, AlphaZero played under the English flag and Stockfish under the Norwegian one.[7] Stockfish was allocated 64 threads and a hash size of 1 GB,[1] a setting that Stockfish's Tord Romstad later criticized as suboptimal.[8][note 1] AlphaZero was trained on chess for a total of nine hours before the match. During the match, AlphaZero ran on a single machine with four application-specific TPUs. In 100 games from the normal starting position, AlphaZero won 25 games as White, won 3 as Black, and drew the remaining 72.[9] In a series of twelve 100-game matches (of unspecified time or resource constraints) against Stockfish starting from the 12 most popular human openings, AlphaZero won 290, drew 886 and lost 24.[1]

Shogi

AlphaZero was trained on shogi for a total of two hours before the tournament. In 100 shogi games against
Elmo (World Computer Shogi Championship 27 summer 2017 tournament version with YaneuraOu 4.73
search), AlphaZero won 90 times, lost 8 times and drew twice.[9] As in the chess games, each program got
one minute per move, and Elmo was given 64 threads and a hash size of 1 GB.[1]

Go

After 34 hours of self-play training in Go, AlphaZero played against AlphaGo Zero, winning 60 games and losing 40.[1][9]

Analysis

DeepMind stated in its preprint, "The game of chess represented the pinnacle of AI research over several
decades. State-of-the-art programs are based on powerful engines that search many millions of positions,
leveraging handcrafted domain expertise and sophisticated domain adaptations. AlphaZero is a generic
reinforcement learning algorithm – originally devised for the game of go – that achieved superior results
within a few hours, searching a thousand times fewer positions, given no domain knowledge except the
rules."[1] DeepMind's Demis Hassabis, a chess player himself, called AlphaZero's play style "alien": It
sometimes wins by offering counterintuitive sacrifices, like offering up a queen and bishop to exploit a
positional advantage. "It's like chess from another dimension."[10]

Given the difficulty in chess of forcing a win against a strong opponent, the +28 –0 =72 result is a
significant margin of victory. However, some grandmasters, such as Hikaru Nakamura and Komodo
developer Larry Kaufman, downplayed AlphaZero's victory, arguing that the match would have been
closer if the programs had access to an opening database (since Stockfish was optimized for that
scenario).[11] Romstad additionally pointed out that Stockfish is not optimized for rigidly fixed-time moves
and the version used was a year old.[8][12]
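The logistic Elo model makes the size of that margin concrete: 28 wins and 72 draws is a 64% score, which implies a rating gap of roughly 100 points. This is a back-of-the-envelope calculation, ignoring draw models and confidence intervals.

```python
import math

def elo_gap(score):
    """Elo difference implied by an expected score under the logistic model:
    E = 1 / (1 + 10**(-d/400))  =>  d = 400 * log10(E / (1 - E))."""
    return 400 * math.log10(score / (1 - score))

# AlphaZero's +28 -0 =72 result is 28 + 72/2 = 64 points out of 100.
print(round(elo_gap(0.64)))  # -> 100, i.e. about a 100-Elo advantage
```

At engine level, where well-matched opponents draw the bulk of their games, a 100-point gap with zero losses is an unusually lopsided result.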

Similarly, some shogi observers argued that the Elmo hash size was too low, that the resignation settings
and the "EnteringKingRule" settings (cf. shogi § Entering King) may have been inappropriate, and that
Elmo is already obsolete compared with newer programs.[13][14]

Reaction and criticism

Newspapers headlined that the chess training took only four hours: "It was managed in little more than the time
between breakfast and lunch."[2][15] Wired described AlphaZero as "the first multi-skilled AI board-game
champ".[16] AI expert Joanna Bryson noted that Google's "knack for good publicity" was putting it in a
strong position against challengers. "It's not only about hiring the best programmers. It's also very political,
as it helps make Google as strong as possible when negotiating with governments and regulators looking at
the AI sector."[9]

Human chess grandmasters generally expressed excitement about AlphaZero. Danish grandmaster Peter
Heine Nielsen likened AlphaZero's play to that of a superior alien species.[9] Norwegian grandmaster Jon
Ludvig Hammer characterized AlphaZero's play as "insane attacking chess" with profound positional
understanding.[2] Former champion Garry Kasparov said, "It's a remarkable achievement, even if we
should have expected it after AlphaGo."[11][17]

Grandmaster Hikaru Nakamura was less impressed, stating: "I don't necessarily put a lot of credibility in the
results simply because my understanding is that AlphaZero is basically using the Google supercomputer
and Stockfish doesn't run on that hardware; Stockfish was basically running on what would be my laptop.
If you wanna have a match that's comparable you have to have Stockfish running on a supercomputer as
well."[8]

Top US correspondence chess player Wolff Morrow was also unimpressed, claiming that AlphaZero would
probably not make the semifinals of a fair competition such as TCEC where all engines play on equal
hardware. Morrow further stated that although he might not be able to beat AlphaZero if AlphaZero played
drawish openings such as the Petroff Defence, AlphaZero would not be able to beat him in a
correspondence chess game either.[18]

Motohiro Isozaki, the author of YaneuraOu, noted that although AlphaZero did comprehensively beat Elmo, AlphaZero's rating in shogi stopped growing at a point at most 100–200 above Elmo's. He argued that this gap is small enough that Elmo and other shogi software should be able to catch up within one to two years.[19]

Final results
DeepMind addressed many of the criticisms in their final version of the paper, published in December 2018
in Science.[4] They further clarified that AlphaZero was not running on a supercomputer; it was trained
using 5,000 tensor processing units (TPUs), but only ran on four TPUs and a 44-core CPU in its
matches.[20]

Chess
In the final results, Stockfish version 8 ran under the same conditions as in the TCEC superfinal: 44 CPU cores, Syzygy endgame tablebases, and a 32 GB hash size. Instead of a fixed time control of one minute per move, both engines were given 3 hours plus 15 seconds per move to finish the game. In a 1000-game match, AlphaZero won with a score of 155 wins, 6 losses, and 839 draws. DeepMind also played a series of games using the TCEC opening positions, which AlphaZero also won convincingly. Stockfish needed 10-to-1 time odds to match AlphaZero.[21]

Shogi

As with Stockfish, Elmo ran under the same conditions as in the 2017 CSA championship. The version of Elmo used was WCSC27 in combination with YaneuraOu 2017 Early KPPT 4.79 64AVX2 TOURNAMENT. Elmo operated on the same hardware as Stockfish: 44 CPU cores and a 32 GB hash size. AlphaZero won 98.2% of games when playing sente (i.e. having the first move) and 91.2% overall.

Reactions and criticisms

Human grandmasters were generally impressed with AlphaZero's games against Stockfish.[21] Former
world champion Garry Kasparov said it was a pleasure to watch AlphaZero play, especially since its style
was open and dynamic like his own.[22][23]

In the computer chess community, Komodo developer Mark Lefler called it a "pretty amazing
achievement", but also pointed out that the data was old, since Stockfish had gained a lot of strength since
January 2018 (when Stockfish 8 was released). Fellow developer Larry Kaufman said AlphaZero would
probably lose a match against the latest version of Stockfish, Stockfish 10, under Top Chess Engine
Championship (TCEC) conditions. Kaufman argued that the only advantage of neural network–based
engines was that they used a GPU, so if there was no regard for power consumption (e.g. in an equal-
hardware contest where both engines had access to the same CPU and GPU) then anything the GPU
achieved was "free". Based on this, he stated that the strongest engine was likely to be a hybrid with neural
networks and standard alpha–beta search.[24]

AlphaZero inspired the computer chess community to develop Leela Chess Zero, using the same
techniques as AlphaZero. Leela contested several championships against Stockfish, where it showed
roughly similar strength to Stockfish, although Stockfish has since pulled away.[25]

In 2019, DeepMind published MuZero, a unified system that played excellent chess, shogi, and Go, as well as games in the Atari Learning Environment, without being pre-programmed with their rules.[26][27]

See also
AlphaGo
AlphaDev
AlphaFold
General game playing
MuZero
Leela Chess Zero
Pluribus (poker bot)

Notes
1. Stockfish developer Tord Romstad responded with

The match results by themselves are not particularly meaningful because of the
rather strange choice of time controls and Stockfish parameter settings: The
games were played at a fixed time of 1 minute/move, which means that Stockfish
has no use of its time management heuristics (lot of effort has been put into
making Stockfish identify critical points in the game and decide when to spend
some extra time on a move; at a fixed time per move, the strength will suffer
significantly). The version of Stockfish used is one year old, was playing with far
more search threads than has ever received any significant amount of testing,
and had way too small hash tables for the number of threads. I believe the
percentage of draws would have been much higher in a match with more normal
conditions.[8]

References
1. Silver, David; Hubert, Thomas; Schrittwieser, Julian; Antonoglou, Ioannis; Lai, Matthew;
Guez, Arthur; Lanctot, Marc; Sifre, Laurent; Kumaran, Dharshan; Graepel, Thore; Lillicrap,
Timothy; Simonyan, Karen; Hassabis, Demis (December 5, 2017). "Mastering Chess and
Shogi by Self-Play with a General Reinforcement Learning Algorithm". arXiv:1712.01815 (htt
ps://arxiv.org/abs/1712.01815) [cs.AI (https://arxiv.org/archive/cs.AI)].
2. Knapton, Sarah; Watson, Leon (December 6, 2017). "Entire human chess knowledge
learned and surpassed by DeepMind's AlphaZero in four hours" (https://www.telegraph.co.u
k/science/2017/12/06/entire-human-chess-knowledge-learned-surpassed-deepminds-alpha
zero/). Telegraph.co.uk. Retrieved December 6, 2017.
3. Vincent, James (December 6, 2017). "DeepMind's AI became a superhuman chess player in
a few hours, just for fun" (https://www.theverge.com/2017/12/6/16741106/deepmind-ai-chess
-alphazero-shogi-go). The Verge. Retrieved December 6, 2017.
4. Silver, David; Hubert, Thomas; Schrittwieser, Julian; Antonoglou, Ioannis; Lai, Matthew;
Guez, Arthur; Lanctot, Marc; Sifre, Laurent; Kumaran, Dharshan; Graepel, Thore; Lillicrap,
Timothy; Simonyan, Karen; Hassabis, Demis (December 7, 2018). "A general reinforcement
learning algorithm that masters chess, shogi, and go through self-play" (https://doi.org/10.11
26%2Fscience.aar6404). Science. 362 (6419): 1140–1144. Bibcode:2018Sci...362.1140S
(https://ui.adsabs.harvard.edu/abs/2018Sci...362.1140S). doi:10.1126/science.aar6404 (http
s://doi.org/10.1126%2Fscience.aar6404). PMID 30523106 (https://pubmed.ncbi.nlm.nih.gov/
30523106).
5. "Chess Terms: AlphaZero" (https://chess.com/terms/alphazero-chess-engine). Chess.com.
Retrieved July 30, 2022.
6. Schrittwieser, Julian; Antonoglou, Ioannis; Hubert, Thomas; Simonyan, Karen; Sifre, Laurent;
Schmitt, Simon; Guez, Arthur; Lockhart, Edward; Hassabis, Demis; Graepel, Thore; Lillicrap,
Timothy (2020). "Mastering Atari, Go, chess and shogi by planning with a learned model".
Nature. 588 (7839): 604–609. arXiv:1911.08265 (https://arxiv.org/abs/1911.08265).
Bibcode:2020Natur.588..604S (https://ui.adsabs.harvard.edu/abs/2020Natur.588..604S).
doi:10.1038/s41586-020-03051-4 (https://doi.org/10.1038%2Fs41586-020-03051-4).
PMID 33361790 (https://pubmed.ncbi.nlm.nih.gov/33361790). S2CID 208158225 (https://ap
i.semanticscholar.org/CorpusID:208158225).
7. "AlphaZero vs. Stockfish 2017" (https://chess24.com/en/embed-custom-tournament/condens
ed/alphazero-vs-stockfish).
8. "AlphaZero: Reactions From Top GMs, Stockfish Author" (https://www.chess.com/news/view/
alphazero-reactions-from-top-gms-stockfish-author). chess.com. December 8, 2017.
Retrieved December 9, 2017.
9. " 'Superhuman' Google AI claims chess crown" (https://www.bbc.com/news/technology-4225
1535). BBC News. December 6, 2017. Retrieved December 7, 2017.
10. Knight, Will (December 8, 2017). "Alpha Zero's "Alien" Chess Shows the Power, and the
Peculiarity, of AI" (https://www.technologyreview.com/s/609736/alpha-zeros-alien-chess-sho
ws-the-power-and-the-peculiarity-of-ai/). MIT Technology Review. Retrieved December 11,
2017.
11. "Google's AlphaZero Destroys Stockfish In 100-Game Match" (https://www.chess.com/news/
view/google-s-alphazero-destroys-stockfish-in-100-game-match). Chess.com. Retrieved
December 7, 2017.
12. Katyanna Quach. "DeepMind's AlphaZero AI clobbered rival chess app on non-level
playing...board" (https://www.theregister.co.uk/2017/12/14/deepmind_alphazero_ai_unfair).
The Register (December 14, 2017).
13. "Some concerns on the matching conditions between AlphaZero and Shogi engine" (http://www.uuunuuun.com/single-post/2017/12/07/Some-concerns-on-the-matching-conditions-between-AlphaZero-and-Shogi-engine). コンピュータ将棋レーティング [Computer Shogi Ratings]. "uuunuuun" (a blogger who rates free shogi engines). Retrieved December 9, 2017. (via "瀧澤誠 @elmo (@mktakizawa) | Twitter" (https://twitter.com/mktakizawa). mktakizawa (elmo developer). December 9, 2017. Retrieved December 11, 2017.)
14. "DeepMind 社がやねうら王に注目し始めたようです" [DeepMind seems to have started paying attention to YaneuraOu] (http://yaneuraou.yaneu.com/2017/12/07/deepmind%E7%A4%BE%E3%81%8C%E3%82%84%E3%81%AD%E3%81%86%E3%82%89%E7%8E%8B%E3%81%AB%E6%B3%A8%E7%9B%AE%E3%81%97%E5%A7%8B%E3%82%81%E3%81%9F%E3%82%88%E3%81%86%E3%81%A7%E3%81%99/). The developer of YaneuraOu, a search component used by elmo. December 7, 2017. Retrieved December 9, 2017.
15. Badshah, Nadeem (December 7, 2017). "Google's DeepMind robot becomes world-beating
chess grandmaster in four hours" (https://www.thetimes.co.uk/article/google-s-deepmind-alp
hazero-becomes-world-beating-chess-grandmaster-in-four-hours-hcppp9vr2). The Times of
London. Retrieved December 7, 2017.
16. "Alphabet's Latest AI Show Pony Has More Than One Trick" (https://www.wired.com/story/al
phabets-latest-ai-show-pony-has-more-than-one-trick/). WIRED. December 6, 2017.
Retrieved December 7, 2017.
17. Gibbs, Samuel (December 7, 2017). "AlphaZero AI beats champion chess program after
teaching itself in four hours" (https://www.theguardian.com/technology/2017/dec/07/alphazer
o-google-deepmind-ai-beats-champion-program-teaching-itself-to-play-four-hours). The
Guardian. Retrieved December 8, 2017.
18. "Talking modern correspondence chess" (https://en.chessbase.com/post/correspondence-ch
ess-and-correspondence-database-2018). Chessbase. June 26, 2018. Retrieved July 11,
2018.
19. "DeepMind 社がやねうら王に注目し始めたようです" [DeepMind seems to have started paying attention to YaneuraOu] (http://yaneuraou.yaneu.com/2017/12/07/deepmind%E7%A4%BE%E3%81%8C%E3%82%84%E3%81%AD%E3%81%86%E3%82%89%E7%8E%8B%E3%81%AB%E6%B3%A8%E7%9B%AE%E3%81%97%E5%A7%8B%E3%82%81%E3%81%9F%E3%82%88%E3%81%86%E3%81%A7%E3%81%99/). YaneuraOu official site. December 7, 2017.
20. As given in the Science paper, a TPU is "roughly similar in inference speed to a Titan V
GPU, although the architectures are not directly comparable" (Ref. 24).
21. "AlphaZero Crushes Stockfish In New 1,000-Game Match" (https://www.chess.com/news/vie
w/updated-alphazero-crushes-stockfish-in-new-1-000-game-match). December 6, 2018.
22. Sean Ingle (December 11, 2018). " 'Creative' AlphaZero leads way for chess computers and,
maybe, science" (https://www.theguardian.com/sport/2018/dec/11/creative-alphazero-leads-
way-chess-computers-science). The Guardian.
23. Albert Silver (December 7, 2018). "Inside the (deep) mind of AlphaZero" (https://en.chessbas
e.com/post/the-full-alphazero-paper-is-published-at-long-last). Chessbase.
24. "Komodo MCTS (Monte Carlo Tree Search) is the new star of TCEC" (http://www.chessdom.
com/komodo-mcts-monte-carlo-tree-search-is-the-new-star-of-tcec/). Chessdom. December
18, 2018.
25. See TCEC and Leela Chess Zero.
26. "Could Artificial Intelligence Save Us From Itself?" (https://fortune.com/2019/11/26/ai-is-the-p
roblem-and-the-solution/). Fortune. 2019. Retrieved February 29, 2020.
27. "DeepMind's MuZero teaches itself how to win at Atari, chess, shogi, and Go" (https://venture
beat.com/2019/11/20/deepminds-muzero-teaches-itself-how-to-win-at-atari-chess-shogi-and
-go/). VentureBeat. November 20, 2019. Retrieved February 29, 2020.

External links
Chessprogramming wiki on AlphaZero (https://www.chessprogramming.org/AlphaZero)
Chess.com Youtube playlist for AlphaZero vs. Stockfish (https://www.youtube.com/watch?v=
akgalUq5vew&list=PL-qLOQ-OEls607FPLAsPZ6De4f1W3ZF-I)

Retrieved from "https://en.wikipedia.org/w/index.php?title=AlphaZero&oldid=1163299642"
