Reinforcement
Title of the paper: "Mastering Chess and Shogi by Self-Play with a General Reinforcement
Learning Algorithm".
Objective:
This study demonstrates that AlphaZero can reach a superhuman level of play in chess, shogi,
and Go within 24 hours, given no domain knowledge other than the rules of each game.
Hardware Components:
The authors report running AlphaZero on a single machine with four TPUs (Tensor Processing
Units) during match play.
Software Components
AlphaZero's algorithm combines a deep neural network with the Monte Carlo tree search (MCTS)
algorithm.
The network is trained by reinforcement learning from self-play, so its play improves over time.
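To make the combination of network and tree search concrete, the sketch below shows one common way a network's move priors can steer MCTS selection, using the PUCT-style rule described for the AlphaGo family. It is a minimal illustration, not the authors' code: the Node class, the puct_select helper, the example statistics, and the constant c_puct = 1.5 are all assumptions made here.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    prior: float                  # P(s, a): move probability from the policy network (assumed given)
    visits: int = 0               # N(s, a): how often the search tried this move
    value_sum: float = 0.0        # W(s, a): sum of values backed up through this move
    children: dict = field(default_factory=dict)

    def q(self) -> float:
        # Mean value Q(s, a); unvisited moves default to 0.
        return self.value_sum / self.visits if self.visits else 0.0


def puct_select(parent: Node, c_puct: float = 1.5):
    """Return the move maximizing Q + U, where U favors high-prior, rarely visited moves."""
    total = sum(child.visits for child in parent.children.values())

    def score(child: Node) -> float:
        u = c_puct * child.prior * math.sqrt(total) / (1 + child.visits)
        return child.q() + u

    return max(parent.children, key=lambda move: score(parent.children[move]))


# Hypothetical statistics after a few simulations from some position.
root = Node(prior=1.0)
root.children = {
    "e4": Node(prior=0.6, visits=10, value_sum=5.0),
    "d4": Node(prior=0.3, visits=1, value_sum=0.2),
    "a3": Node(prior=0.1),
}
chosen = puct_select(root)
```

With these made-up numbers the rule picks "d4": it has a lower mean value than "e4" but has barely been explored, so the exploration term dominates.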
b) Linear combinations: The raw evaluation of a position is computed as a linear
combination of handcrafted features and their weights.
c) Quiescence search: Before the evaluation function is applied, this search is used to
resolve ongoing tactical situations.
g) Iterative deepening: This method is used to order moves in the search, which makes
alpha-beta search more effective.
h) Opening book: The moves at the start of the game are chosen from a prepared
opening book.
i) Endgame tablebase: This tablebase is created by exhaustive retrograde analysis
and contains the optimal move for every position with six, sometimes seven, or fewer
pieces.
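Items b), c), and g) above can be sketched together on a toy game tree. This is a simplified illustration under stated assumptions, not the code of any real engine: the Position class, the two features (material balance and mobility), the WEIGHTS values, and the example positions are all hypothetical.

```python
from dataclasses import dataclass, field
from math import inf

# Hypothetical weights for two handcrafted features: material balance, mobility.
WEIGHTS = [1.0, 0.1]


@dataclass
class Position:
    features: list                              # feature values from the side to move's view
    moves: list = field(default_factory=list)   # entries: (name, is_capture, child Position)


def evaluate(pos):
    """b) Raw evaluation as a linear combination of features and weights."""
    return sum(w * f for w, f in zip(WEIGHTS, pos.features))


def quiescence(pos, alpha, beta):
    """c) Follow capture moves until the position is quiet, then trust the evaluation."""
    stand_pat = evaluate(pos)
    if stand_pat >= beta:
        return stand_pat
    alpha = max(alpha, stand_pat)
    for _, is_capture, child in pos.moves:
        if not is_capture:
            continue
        score = -quiescence(child, -beta, -alpha)
        if score >= beta:
            return score
        alpha = max(alpha, score)
    return alpha


def negamax(pos, depth, alpha, beta):
    """Alpha-beta search in negamax form; leaves are scored via quiescence search."""
    if not pos.moves:
        return evaluate(pos)
    if depth == 0:
        return quiescence(pos, alpha, beta)
    best = -inf
    for _, _, child in pos.moves:
        best = max(best, -negamax(child, depth - 1, -beta, -alpha))
        alpha = max(alpha, best)
        if alpha >= beta:
            break                               # cutoff: the opponent will avoid this line
    return best


def iterative_deepening(root, max_depth):
    """g) Search depth 1, 2, ... and try the previous best root move first."""
    best_name, best_score, best_move = None, -inf, None
    for depth in range(1, max_depth + 1):
        best_score, alpha = -inf, -inf
        for move in list(root.moves):
            name, _, child = move
            score = -negamax(child, depth - 1, -inf, -alpha)
            if score > best_score:
                best_name, best_score, best_move = name, score, move
            alpha = max(alpha, score)
        root.moves.sort(key=lambda m: m is not best_move)   # stable: best move goes first
    return best_name, best_score


# Toy example: grabbing a pawn looks good statically but loses the queen to a recapture.
after_recapture = Position(features=[-8, 0])
after_grab = Position(features=[-1, 0], moves=[("recapture_queen", True, after_recapture)])
after_quiet = Position(features=[0, -2])
root = Position(features=[0, 0], moves=[("grab_pawn", True, after_grab),
                                        ("quiet_move", False, after_quiet)])
best_name, best_score = iterative_deepening(root, max_depth=2)
```

A purely static evaluation after one move would prefer "grab_pawn", which leaves the opponent a pawn down. Because the quiescence search follows the pending recapture before scoring, the search settles on "quiet_move" instead.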
Major Contribution
1) Proposing the AlphaZero technique: The authors propose a new program called AlphaZero,
a more general version of AlphaGo Zero, the program that achieved superhuman performance
in Go.
2) Single algorithm procedure: The authors introduce a single algorithm that learns to
play and master several games without any game-specific knowledge beyond the rules of each
game.
3) Performance examination: The authors compare the performance of AlphaZero against
strong existing programs: Stockfish 8 for chess and Elmo for shogi.
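The "single algorithm" idea in item 2 can be illustrated with a game-agnostic self-play loop that only touches a game through its rules. This is a rough sketch under several assumptions: the Game interface, the toy Nim game, and the random policy are invented here for illustration. In AlphaZero the moves are chosen by a network-guided tree search rather than at random.

```python
import random
from typing import Protocol


class Game(Protocol):
    """Hypothetical rules-only interface; nothing here is specific to one game."""
    to_move: int                       # +1 for the first player, -1 for the second

    def legal_moves(self) -> list: ...
    def play(self, move) -> "Game": ...
    def winner(self): ...              # +1 / -1 for the winning player, None if not over


class Nim:
    """Toy game: players alternately take 1 to 3 stones; taking the last stone wins."""

    def __init__(self, stones=7, to_move=1):
        self.stones, self.to_move = stones, to_move

    def legal_moves(self):
        return [n for n in (1, 2, 3) if n <= self.stones]

    def play(self, move):
        return Nim(self.stones - move, -self.to_move)

    def winner(self):
        # When no stones remain, the player who just moved has won.
        return -self.to_move if self.stones == 0 else None


def random_policy(game, rng):
    # Placeholder for a learned policy: pick any legal move uniformly.
    return rng.choice(game.legal_moves())


def self_play(game, policy, rng):
    """Play one game against itself; return (state, move, outcome for the mover) examples."""
    history = []
    while game.winner() is None:
        move = policy(game, rng)
        history.append((game, move, game.to_move))
        game = game.play(move)
    result = game.winner()
    return [(state, move, result * player) for state, move, player in history]


examples = self_play(Nim(stones=7), random_policy, random.Random(0))
```

Each example pairs a position and move with the final outcome from the mover's point of view (+1 for a win, -1 for a loss), which is the kind of signal a value network could be trained on. Swapping Nim for another class with the same three methods would leave self_play unchanged.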
Pros
1. Simplification: The paper presents AlphaZero, a single algorithm that can learn to
play several board games at an expert level without any prior knowledge of the game
beyond its rules.
2. New methodology: The authors train AlphaZero with reinforcement learning and
self-play, a novel approach compared to traditional game-playing programs, which
depend heavily on handcrafted rules.
3. Higher Efficacy: AlphaZero's performance against top baseline programs shows that
the proposed method is effective at learning and becoming proficient in complex
games.
4. Encouraging further research: The results achieved by AlphaZero encourage further
research on reinforcement learning and AI techniques for complex tasks beyond
board games.
Cons
1. Restricted Domains: The paper shows that AlphaZero can reach expert level at board
games, but its application to other fields remains unexplored.
It is not clear how well the algorithm would perform on tasks with poorly defined rules
or huge state spaces.
2. Resources for computation: Training and evaluating AlphaZero require substantial
computational resources, which may be a barrier for researchers or developers with
restricted access to such resources.
3. Deficiency in code or implementation details: The paper does not provide complete
implementation details or source code, which makes it challenging for others to
reproduce or build on the results.