
Topic for the class:

Module-III
Adversarial Search
V.S.V.S.MURTHY
Assistant Professor
Department of CSE
GITAM School of Technology (GST)
Visakhapatnam – 530045
Email: [email protected]
Mobile: 9989720516

Department of CSE, GST CSEN2031: AI 1


Games
 Multi-agent environments require each agent to consider the actions of other agents and how they affect its own welfare.
 Competitive environments, in which the agents' goals are in conflict, give rise to adversarial search problems, commonly known as games.
 In AI, the most common games are deterministic, two-player, turn-taking, zero-sum games of perfect information (such as chess).
 Games are good examples of adversarial search:
– States are easy to represent.
– Agents are restricted to a finite number of actions.
– The outcome for an agent is defined by precise rules.
– Yet the games themselves are too hard to solve exactly.



Games
Example: Which games are Adversarial?

8-PUZZLE (not adversarial)
N-QUEENS (not adversarial)
CHESS (adversarial)
TIC-TAC-TOE (adversarial)



Games
A game is formally defined with the following elements:
 S0: The initial state, which specifies how the game is set up at the start.
 PLAYER(s): Defines which player has the move in a state.
 ACTIONS(s): Returns the set of legal moves in a state.
 RESULT(s, a): The transition model, which defines the result of a move.
 TERMINAL-TEST(s): A terminal test, which is true when the game is over and false otherwise. States where the game has
ended are called terminal states.
 UTILITY(s, p): A utility function (also called an objective function or payoff function) defines the final numeric value for a game that ends in terminal state s for a player p. In chess, the outcome is a win, loss, or draw, with values +1, 0, or ½.
 A zero-sum game is defined as one where the total payoff to all players is the same for every instance of the game. Chess is zero-sum because every game has a total payoff of either 0 + 1, 1 + 0, or ½ + ½.
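These formal elements can be sketched as a minimal Python interface; the class and method names below are illustrative assumptions, not part of the slides:

```python
# A sketch of the formal game definition as a Python interface.
# Method names mirror the slide's elements (S0, PLAYER, ACTIONS, ...).
class Game:
    def initial_state(self):            # S0: how the game is set up at the start
        raise NotImplementedError
    def player(self, s):                # PLAYER(s): which player has the move in s
        raise NotImplementedError
    def actions(self, s):               # ACTIONS(s): the set of legal moves in s
        raise NotImplementedError
    def result(self, s, a):             # RESULT(s, a): the transition model
        raise NotImplementedError
    def terminal_test(self, s):         # TERMINAL-TEST(s): is the game over?
        raise NotImplementedError
    def utility(self, s, p):            # UTILITY(s, p): e.g. +1, 0, or 1/2 in chess
        raise NotImplementedError
```

A concrete game (tic-tac-toe, chess) would subclass this and fill in each method.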



Game Tree
• Consider a Tic-Tac-Toe game with two players:
• Player 1: MAX
• Player 2: MIN
• MAX moves first (places X), followed by MIN (places O).
• The initial state, ACTIONS function, and RESULT function define the
game tree for the game
– where the nodes are game states and the edges are moves.

Department of CSE, GIT ECS302: AI 5


• From the initial state, MAX has nine possible moves.
• Play alternates between MAX’s placing an X and MIN’s placing an O
until we reach leaf nodes corresponding to terminal states such that
one player has three in a row or all the squares are filled.



Game tree
The number on each leaf node indicates the utility value of the terminal state from the point of view
of MAX; high values are assumed to be good for MAX and bad for MIN.



OPTIMAL DECISIONS IN GAMES
• Normal search problem:
• The optimal solution is a sequence of actions leading to a goal state.
• Adversarial search problem:
• MIN interferes with that sequence of actions.
• A strategy for MAX must therefore:
– Specify a move in the initial state.
– Consider every possible response by MIN.
– Specify MAX's moves in response to each of them.



Continued...

For example:
Moves by MAX at the root node are a1, a2, a3, …
Possible replies to a1 from MIN are b1, b2, …



Continued...

 The optimal strategy can be determined from the minimax value of each node, written as MINIMAX(n).
 The minimax value of a node is the utility (for MAX) of being in the corresponding state, assuming that both
players play optimally from there to the end of the game.
 The minimax value of a terminal state is its utility.
 MAX prefers to move to a state of maximum value, and MIN prefers a state of minimum value.



Minimax Algorithm

• The minimax algorithm computes the minimax decision from the current state.
• It uses a simple recursive computation of the minimax values of each successor state, directly implementing the
defining equations.
• The recursion proceeds all the way down to the leaves of the tree, and then the minimax values are backed up
through the tree as the recursion unwinds.
 The minimax algorithm performs a complete depth-first exploration of the game tree.
 If the maximum depth of the tree is m and there are b legal moves at each point:
• Time complexity: O(b^m).
• Space complexity: O(bm), for an implementation that generates all actions at once.



Continued...

The minimax values are backed up through recursion


• The algorithm first recurses down to the three bottom left nodes and
uses the UTILITY function on them to discover that their values are 3,
12, and 8, respectively.
• Then it takes the minimum of these values, 3, and returns it as the
backed-up value of node B.
• A similar process gives the backed-up values of 2 for C and 2 for D.
• Finally, we take the maximum of 3, 2, and 2 to get the backed-up value
of 3 for the root node.



Optimal decisions in multiplayer games
 The single value for each node is replaced with a vector of values (one for each player).
 For players A, B, and C, a vector <vA, vB, vC> is associated with each node.
 This vector gives the value of terminal states from each player's viewpoint.
 UTILITY function returns a vector of utilities.
 Multiplayer games usually involve alliances, whether formal or informal, among the players. Alliances are made
and broken as the game proceeds.



ALPHA–BETA PRUNING

 The problem with minimax search is that the number of game states it has to examine is exponential in the depth
of the tree.
 With pruning, it is possible to compute the correct minimax decision without looking at every node in the game
tree.
 Alpha–beta pruning: when applied to a standard minimax tree, it returns the same move as minimax
would, but prunes away branches that cannot possibly influence the final decision.



Continued...



Continued...

 Alpha–beta pruning gets its name from the following two parameters that describe bounds on the backed-up
values that appear anywhere along the path.
α = the value of the best (i.e., highest-value) choice found so far at any choice point along the path for MAX.
β = the value of the best (i.e., lowest-value) choice found so far at any choice point along the path for MIN.
 Alpha–beta search updates the values of α and β as it goes along and prunes the remaining branches at a node
(i.e., terminates the recursive call) as soon as the value of the current node is known to be worse than the current
α or β value for MAX or MIN, respectively.
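The update-and-prune rule can be sketched over the same kind of toy tree used for minimax (the nested-list encoding is an assumption):

```python
# Alpha-beta search: alpha = best value found so far for MAX along the
# path, beta = best value found so far for MIN along the path.
def alphabeta(node, alpha, beta, maximizing):
    if not isinstance(node, list):         # terminal: return its utility
        return node
    if maximizing:
        v = float('-inf')
        for child in node:
            v = max(v, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, v)
            if alpha >= beta:              # beta cutoff: MIN will avoid this node
                break
        return v
    else:
        v = float('inf')
        for child in node:
            v = min(v, alphabeta(child, alpha, beta, True))
            beta = min(beta, v)
            if alpha >= beta:              # alpha cutoff: MAX has a better option
                break
        return v

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alphabeta(tree, float('-inf'), float('inf'), True))  # 3, same as minimax
```

On this tree, once the first successor of C returns 2 (below alpha = 3), the remaining successors of C are pruned.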



Continued...

Move ordering
• The effectiveness of alpha–beta pruning is highly dependent on the order in which the states are examined.
• We could not prune any successors of D at all because the worst successors (from the point of view of MIN) were
generated first.
• If the third successor of D had been generated first, we would have been able to prune the other two.
• With perfect move ordering, alpha–beta needs to examine only O(b^(m/2)) nodes to pick the best move, instead of O(b^m) for minimax.
• Dynamic move-ordering schemes try first the moves that were found to be best in the past; such moves are called killer moves, and trying them first is the killer move heuristic.



Continued...



IMPERFECT REAL-TIME DECISIONS

 The minimax algorithm generates the entire game search space, whereas the alpha–beta algorithm allows us to
prune large parts of it.
 Alpha–beta pruning helps, but searches can still take too long (they must go all the way to the leaves).
 To improve on this, terminate the search early, based on a heuristic evaluation function that treats non-terminal nodes as if they were leaves.
 So, modify minimax or alpha–beta in two ways:
 Replace the utility function with a heuristic evaluation function EVAL.
 Replace the terminal test with a cutoff test CUTOFF-TEST.

 Heuristic minimax for state s and maximum depth d:
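Following the standard formulation, this value can be written as:

```latex
\text{H-MINIMAX}(s, d) =
\begin{cases}
\text{EVAL}(s) & \text{if CUTOFF-TEST}(s, d) \\
\max_{a \in \text{ACTIONS}(s)} \text{H-MINIMAX}(\text{RESULT}(s, a),\ d + 1) & \text{if PLAYER}(s) = \text{MAX} \\
\min_{a \in \text{ACTIONS}(s)} \text{H-MINIMAX}(\text{RESULT}(s, a),\ d + 1) & \text{if PLAYER}(s) = \text{MIN}
\end{cases}
```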



EVALUATION FUNCTIONS

An evaluation function provides an estimate of the expected utility of the game from a given position.

Good evaluation functions are a must; bad ones lose you the game.
Desirable properties of an evaluation function:

a) It should order terminal states in the same way as the true utility function does.

b) Its computation must take a reasonable time.
c) It should be strongly correlated with the actual chance of winning.
i) We can't examine everything, since search is cut off at some states; this introduces uncertainty.
ii) This is computational uncertainty, not uncertainty from random chance.



EVALUATION FUNCTIONS(Working)

Calculate features of a state.

Define categories, or equivalence classes, of states: the states in each category have the same values for all the features.
E.g., all "1 pawn vs. 2 pawns" states. Each category will include some wins, some losses, and some draws.
The function estimates the ratio of these outcomes for each category.

For "1 pawn vs. 2 pawns", the W:L:D ratio might be 72:20:8.

Use this ratio to compute an expected value and order positions by it.
Expected value: (0.72 × +1) + (0.20 × 0) + (0.08 × ½) = 0.76.
This kind of analysis requires too many categories and too much experience to estimate all the probabilities of winning.
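The expected-value computation above, written out (using the chess convention win = +1, loss = 0, draw = ½):

```python
# Expected value of a category from its estimated outcome probabilities.
def expected_value(p_win, p_loss, p_draw):
    return p_win * 1 + p_loss * 0 + p_draw * 0.5

# The slide's 72:20:8 W:L:D example
print(expected_value(0.72, 0.20, 0.08))  # approximately 0.76
```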



EVALUATION FUNCTIONS(Working)

Most evaluation functions compute separate numerical contributions from each feature and then combine them to find the total value. For example, in chess:
• a pawn is worth 1,
• a knight or bishop is worth 3,
• a rook 5,
• the queen 9.
Mathematically, this is known as a weighted linear function because it can be expressed as

EVAL(s) = w1 f1(s) + w2 f2(s) + · · · + wn fn(s),

where each wi is a weight and each fi is a feature of the position.
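A sketch of this weighted linear idea using the material values above; the dict-based interface is an assumption for illustration:

```python
# Standard chess material weights from the slide.
WEIGHTS = {'pawn': 1, 'knight': 3, 'bishop': 3, 'rook': 5, 'queen': 9}

def material_eval(counts_max, counts_min):
    """Each feature f_i is (MAX's count of piece i) - (MIN's count);
    EVAL is the weighted sum  sum_i w_i * f_i."""
    return sum(w * (counts_max.get(p, 0) - counts_min.get(p, 0))
               for p, w in WEIGHTS.items())

# MAX is up a rook, MIN is up two pawns: 5 - 2 = 3 in MAX's favour
print(material_eval({'rook': 1}, {'pawn': 2}))  # 3
```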



Cutting off Search

Modify ALPHA-BETA-SEARCH to call the heuristic EVAL function when it is appropriate to cut off the search:
if CUTOFF-TEST(state, depth) then return EVAL(state)
Choose a depth d that allows a move to be selected within the desired time frame.
Set a fixed depth limit so that CUTOFF-TEST(state, depth) controls the amount of search.
 This is not perfect and not a guarantee; it just gives the best chance. Counter-moves may exist even for the highest-evaluated move.
The evaluation function should be applied only to positions that are quiescent: positions unlikely to exhibit wild swings in value in the near future.
Non-quiescent positions can be expanded further until quiescent positions are reached. This extra search is called a quiescence search.
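A depth-limited sketch combining the cutoff test with an EVAL call; the toy tree and the `flat_avg` evaluation function are assumptions, not a real chess evaluator:

```python
# Depth-limited minimax: CUTOFF-TEST is a simple depth limit, and EVAL
# is applied to non-terminal nodes at the cutoff as if they were leaves.
def h_minimax(node, depth, limit, maximizing, eval_fn):
    if not isinstance(node, list):          # true terminal state
        return node
    if depth >= limit:                      # CUTOFF-TEST(state, depth)
        return eval_fn(node)                # EVAL(state)
    values = [h_minimax(c, depth + 1, limit, not maximizing, eval_fn)
              for c in node]
    return max(values) if maximizing else min(values)

# Crude stand-in EVAL: the average of all leaves below a node.
def flat_avg(node):
    leaves, stack = [], [node]
    while stack:
        n = stack.pop()
        if isinstance(n, list):
            stack.extend(n)
        else:
            leaves.append(n)
    return sum(leaves) / len(leaves)

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(h_minimax(tree, 0, 1, True, flat_avg))   # estimate at depth limit 1
```

With a deep enough limit, this reduces to ordinary minimax.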



Forward Pruning

In forward pruning, some moves at a given node are pruned immediately, without further consideration.

Beam search is one type of forward pruning:
 on each ply, consider only a "beam" of the n best moves rather than all possible moves.
There is no guarantee that the best move doesn't get pruned.
The PROBCUT (probabilistic cut) algorithm is a forward-pruning version of alpha–beta search that uses statistics gained from prior experience to decide which moves are probably safe to cut out.
 Alpha–beta search prunes any node that is provably outside the current (α, β) window; PROBCUT also prunes nodes that are probably outside the window.
It does a shallow search to compute a backed-up value v for a node.
It then uses its statistics to estimate how likely a value of v at depth d is to fall outside (α, β).
An Othello program built with PROBCUT beat the regular alpha–beta version of itself most of the time.



Search vs lookup

It is overkill to search the entire game tree just to choose an opening move.

Good openings and endgames have been known for a long time.
 For these situations, use a lookup table to find the best move (much quicker than search).
The table works well for opening moves, which have been studied extensively by humans.
 For the endgame, the computer is better, since it can quickly think through all the possible combinations.
Closing in on a checkmate can take a human a long time to figure out.
The computer instead computes a policy: a mapping from every possible state to the best move in that state.
Then it just looks up the move instead of recomputing it over and over.



How big will the KBNK lookup table be?

The numbers:
There are 462 ways that the two kings can be placed on the board without being adjacent.
That leaves 62 empty squares for the bishop, 61 for the knight, and either player may have the move.
So there are just 462 × 62 × 61 × 2 = 3,494,568 possible positions. Some of these are checkmates; put them in a table.
From the table, perform a retrograde search: a search through the moves in reverse.
Looking at all possibilities eventually yields a guaranteed set of moves and a win for KBNK (king, bishop, and knight vs. king).
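The arithmetic above can be checked directly:

```python
# KBNK table-size count from the slide.
kings = 462      # non-adjacent placements of the two kings
bishop = 62      # remaining empty squares for the bishop
knight = 61      # then for the knight
to_move = 2      # either side may have the move
positions = kings * bishop * knight * to_move
print(positions)  # 3494568
```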



Stochastic Games

 Games that include a random element (such as dice rolls) are stochastic games, e.g., backgammon.
 The black player knows where all the pieces are, but cannot know ahead of time where white will move, because of the random dice roll. A standard game tree therefore cannot be built.



Stochastic Games(Game Tree)

 Such games require a game tree containing chance nodes in addition to MIN and MAX nodes.
 Chance nodes represent the possible dice rolls.



Stochastic Games(Continued)

 Each branch from a chance node is labeled with the probability of the corresponding dice roll: 1/36 for each double, 1/18 for each other roll.
 Because of this uncertainty, it is only possible to calculate a position's expected value: the probability-weighted average over all possible outcomes of the chance node.
 This generalizes the deterministic game's minimax value to an expectiminimax value for games with chance nodes.
 For chance nodes, sum the values of all outcomes, weighted by their probabilities:
EXPECTIMINIMAX(s) = Σr P(r) · EXPECTIMINIMAX(RESULT(s, r)),
 where r represents a possible dice roll (or other chance event) and RESULT(s, r) is the same state as s, with the additional fact that the result of the dice roll is r.
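A sketch of expectiminimax on a toy tree; the tuple-based node encoding below is an assumption for illustration:

```python
# Node kinds: ('max', children), ('min', children),
# ('chance', [(prob, child), ...]); a bare number is a terminal utility.
def expectiminimax(node):
    if not isinstance(node, tuple):
        return node                                    # terminal utility
    kind, children = node
    if kind == 'max':
        return max(expectiminimax(c) for c in children)
    if kind == 'min':
        return min(expectiminimax(c) for c in children)
    # chance node: probability-weighted sum over the outcomes r
    return sum(p * expectiminimax(c) for p, c in children)

# MAX chooses between a fair coin flip worth 4-or-0 and a certain 1.5
tree = ('max', [('chance', [(0.5, 4), (0.5, 0)]),
                ('chance', [(1.0, 1.5)])])
print(expectiminimax(tree))  # 2.0: the gamble's expected value beats 1.5
```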



Evaluation functions for games of chance

 The presence of chance nodes means that one has to be more careful about what the evaluation values mean.
 With leaf values [1, 2, 3, 4], move a1 is best; with leaf values [1, 20, 30, 400], move a2 is best. The scale of the values matters, not just their ordering.

 If the program knew in advance all the dice rolls that would occur, then solving the game would take O(b^m) time, where b is the branching factor and m is the maximum depth.
 Since expectiminimax also considers all the possible dice-roll sequences, it takes O(b^m n^m) time, where n is the number of distinct rolls.
 In backgammon n is 21 and b is usually around 20, but at times can be as high as 4000 for dice rolls that are doubles.
Evaluation functions for games of chance

 By putting bounds on the possible values of the utility function, something like alpha–beta pruning can be done to improve performance.

 Example:
 If all utility values are between −2 and +2, then the value of every leaf node is bounded.
 We can then place an upper bound on the value of a chance node without looking at all its children.
 Alternative: Monte Carlo simulation.
 Start with an alpha–beta search algorithm.
 Have the program play thousands of games against itself, using random dice rolls.
 This provides a win percentage that can be used as a heuristic, which works well for backgammon.
 For games with dice, this type of simulation is called a rollout.
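The rollout idea can be sketched with a toy dice-race game standing in for backgammon; the game, function name, and parameters are assumptions for illustration:

```python
import random

def rollout_win_rate(target=20, n_games=10_000, seed=0):
    """Estimate the first mover's win rate by playing many random games.

    Toy race game: players alternate rolling one die and adding it to
    their score; the first to reach `target` wins. The win fraction over
    many random playouts serves as the heuristic value of the position.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_games):
        scores = [0, 0]                    # [first mover, second mover]
        turn = 0
        while max(scores) < target:
            scores[turn] += rng.randint(1, 6)   # the chance event (dice roll)
            turn = 1 - turn
        wins += scores[0] >= target        # did the first mover win?
    return wins / n_games

print(rollout_win_rate())  # somewhat above 0.5: the first mover has the edge
```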



State of the Art Game Programs

 Chess: IBM's DEEP BLUE chess program, known for defeating world champion Garry Kasparov, ran on a parallel computer with 30 IBM RS/6000 processors doing alpha–beta search.
 It also contained 480 custom VLSI chess processors that performed move generation and move ordering for the last few levels of the tree, and evaluated the leaf nodes.
 Deep Blue searched up to 30 billion positions per move, routinely reaching depth 14.
 The evaluation function had over 8000 features, many of them describing highly specific patterns of pieces. It used an "opening book" of about 4000 positions and a database of 700,000 grandmaster games from which consensus recommendations could be extracted.
 The system also used a large endgame database of solved positions containing all positions with five pieces and
many with six pieces.



State of the Art Game Programs

 Pruning heuristics which are effective in reducing the branching factor to less than 3 (compared with the actual
branching factor of about 35) have been used.
 Null move heuristic, which generates a good lower bound on the value of a position, using a shallow search in
which the opponent gets to move twice at the beginning. This lower bound often allows alpha–beta pruning
without the expense of a full-depth search.
 Futility pruning, which helps decide in advance which moves will cause a beta cutoff in the successor nodes.
 HYDRA can be seen as the successor to DEEP BLUE with FPGA (Field Programmable Gate Array) chips. HYDRA
reaches 18 plies deep rather than just 14 because of aggressive use of the null move heuristic and forward
pruning.
 RYBKA, winner of the 2008 and 2009 World Computer Chess Championships, is considered the strongest current
computer player. It uses an off-the-shelf 8-core 3.2 GHz Intel Xeon processor, but little is known about the design
of the program. RYBKA’s main advantage appears to be its evaluation function.



State of the Art Game Programs

Checkers: Jonathan Schaeffer and colleagues developed CHINOOK, which uses alpha–beta search. CHINOOK defeated the long-running human champion, and it has since been able to play perfectly by using alpha–beta search combined with a database of 39 trillion endgame positions.



State of the Art Game Programs

Othello, also called Reversi, is probably more popular as a computer game than as a board game. It has a smaller search space than chess, with usually 5 to 15 legal moves per position. In 1997, the LOGISTELLO program (Buro, 2002) defeated the human world champion, Takeshi Murakami, by six games to none. Humans are no match for computers at Othello.



State of the Art Game Programs

Backgammon: Gerry Tesauro (1992) combined reinforcement learning with neural networks to develop a remarkably accurate evaluator, used with a search to depth 2 or 3. After playing more than a million training games against itself, Tesauro's program, TD-GAMMON, became competitive with top human players.



State of the Art Game Programs

Go is the most popular board game in Asia. The board is 19 × 19 and the branching factor starts at 361, which is too daunting for regular alpha–beta search methods. Programs such as MOGO instead use Monte Carlo rollouts. The UCT (upper confidence bounds on trees) method works by making random moves in the first few iterations, and over time guiding the sampling process to prefer moves that have led to wins in previous samples. Some programs also include special techniques from combinatorial game theory to analyze endgames. These techniques decompose a position into sub-positions that can be analyzed separately and then combined.



State of the Art Game Programs

Bridge is a card game of imperfect information: a player’s cards


are hidden from the other players. Bridge is also a multiplayer
game with four players instead of two, although the players are
paired into two teams. Optimal play in partially observable
games like bridge can include elements of information gathering,
communication, and careful weighing of probabilities. Many of
these techniques are used in the Bridge Baron program
(Smith et al., 1998), which won the 1997 computer bridge
championship. Bridge Baron is one of the few successful game-playing systems to use complex, hierarchical plans involving high-level ideas, such as finessing and squeezing, that are familiar to bridge players.



State of the Art Game Programs

Scrabble: Most people think the hard part about Scrabble is coming up with good words, but given the official dictionary, it turns out to be rather easy to program a move generator to find the highest-scoring move (Gordon, 1994). The problem is that Scrabble is both partially observable and stochastic: you do not know what letters the other player has or what letters you will draw next. So playing Scrabble well combines the difficulties of backgammon and bridge. The QUACKLE program defeated the former world champion, David Boys, 3–2.


