
Name: Pradyumna Anil Kumar Kubear

Title of the paper: "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm"

Objective:

This paper provides an overview of the AlphaZero algorithm. It streamlines the approach taken by the AlphaGo Zero program to achieve superhuman performance, starting from scratch, in domains that are extremely difficult to master.

The study demonstrates that AlphaZero can master the games of chess, shogi, and Go within 24 hours, with no prior domain knowledge beyond the rules of each game.

Assumptions and Predictions (Hardware, Software and Networking):

Hardware Components:

The authors mention the use of four TPUs (Tensor Processing Units), on which the Monte Carlo tree search was executed during play.

Software Components:

AlphaZero's algorithm combines a deep neural network with the Monte Carlo tree search algorithm.

The network is trained through self-play, improving its performance over time.
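The coupling between the network and the search can be sketched as follows. This is a minimal illustration of the PUCT selection rule used in AlphaZero-style Monte Carlo tree search, where the network's prior probability steers exploration and the averaged value steers exploitation; the dict-based node representation and the constant `c_puct` are assumptions made for this example, not details taken from the paper.

```python
import math

def puct_score(prior, value, visits, parent_visits, c_puct=1.5):
    """PUCT score for one child node: the network's prior probability
    guides exploration, the averaged search value guides exploitation."""
    exploration = c_puct * prior * math.sqrt(parent_visits) / (1 + visits)
    return value + exploration

def select_child(children, parent_visits):
    """During tree descent, pick the child with the highest PUCT score."""
    return max(
        children,
        key=lambda c: puct_score(c["prior"], c["value"], c["visits"], parent_visits),
    )

# Example: with equal values and visit counts, the network prior decides.
children = [
    {"prior": 0.2, "value": 0.0, "visits": 0},
    {"prior": 0.8, "value": 0.0, "visits": 0},
]
best = select_child(children, parent_visits=10)
```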

The authors also discuss the following techniques, which are the standard components of traditional chess engines:

a) Handcrafted features: Evaluating positions requires characteristics such as material point values, material-imbalance tables, piece-square tables, mobility, trapped pieces, pawn structure, and other handcrafted patterns.

b) Linear combinations: The raw evaluation of a position is computed as a linear combination of the handcrafted features and their weights.

c) Quiescence search: Before the evaluation function is applied, this search is used to resolve ongoing tactical situations, so that unstable positions are not evaluated prematurely.

d) Minimax search: This strategy, combined with quiescence search, is employed to ascertain a position's final evaluation.
e) Alpha-beta pruning: This technique removes branches that are provably dominated by another variation.

f) Aspiration windows: This technique narrows the search bounds around the expected score in order to achieve more cutoffs in the search tree.

g) Iterative deepening: The search is repeated at successively greater depths, which improves move ordering and thus the effectiveness of alpha-beta search.

h) Opening book: A precomputed opening book is used to choose moves at the start of the game.

i) Endgame tablebase: Built by comprehensive retrograde analysis, the tablebase stores the outcome and best moves for every possible position with six or seven or fewer pieces.
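Items b), d), and e) above can be sketched together. The following is a toy illustration, not the implementation of any real engine: the evaluation is a simple weighted material sum (a linear combination of handcrafted features), and the game is represented as a nested list where leaves are numeric evaluations; both representations are assumptions made for the example.

```python
def evaluate_material(piece_counts, weights):
    """Linear combination of handcrafted features:
    a weighted sum of material counts."""
    return sum(weights[piece] * count for piece, count in piece_counts.items())

def alphabeta(node, depth, alpha, beta, maximizing):
    """Minimax search with alpha-beta pruning over a toy game tree.
    Internal nodes are lists of children; leaves are numeric evaluations."""
    if depth == 0 or not isinstance(node, list):
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, depth - 1, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:  # prune: the minimizer will never allow this line
                break
        return value
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, depth - 1, alpha, beta, True))
            beta = min(beta, value)
            if beta <= alpha:  # prune: the maximizer already has a better option
                break
        return value

# Example: the second subtree is cut off after its first leaf (2), because
# the maximizer already has a guaranteed 3 from the first subtree.
result = alphabeta([[3, 5], [2, [9, 1]]], 3, float("-inf"), float("inf"), True)  # → 3
```

In a real engine the leaves would be board positions scored by `evaluate_material` (plus the other handcrafted terms), and iterative deepening would call `alphabeta` at depth 1, 2, 3, … in turn.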

Major Contributions

1) Proposing the AlphaZero technique: The authors propose AlphaZero, a simplified and generalized version of AlphaGo Zero, the well-known AI that defeated professional Go players.

2) Single-algorithm procedure: The authors introduce a single algorithm that learns to play and master multiple games without game-specific knowledge.

3) Performance examination: The authors compare AlphaZero's performance against strong existing programs, such as Stockfish 8 in chess and Elmo in shogi.

Pros

1. Simplification: The paper presents AlphaZero, a single algorithm capable of learning to master many board games without any prior game-specific knowledge.
2. New methodology: The authors use reinforcement learning and self-play to train AlphaZero, a novel approach compared to traditional AI methods, which depend heavily on handcrafted rules.
3. Higher efficacy: AlphaZero's performance against top baseline programs illustrates the effectiveness of the proposed method in learning and mastering complex games.
4. Encouraging further research: AlphaZero's achievements inspire further research into reinforcement learning and AI techniques for complex tasks beyond board games.
Cons

1. Restricted domains: The paper demonstrates AlphaZero's mastery of board games, but its applicability to other fields remains unexplored. It is unclear how well the algorithm would perform on real-world tasks with poorly defined rules or enormous state spaces.

2. Computational resources: Training and evaluating AlphaZero require substantial computational resources, which may be a barrier for researchers or developers with limited access to such hardware.
3. Lack of code or implementation details: The paper does not fully disclose implementation details or source code, which makes it challenging for others to reproduce or build on the results.
