
Topic for the class:

Module-III
Adversarial Search
V.S.V.S.MURTHY
Assistant Professor
Department of CSE
GITAM School of Technology (GST)
Visakhapatnam – 530045
Email: [email protected]
Mobile: 9989720516

Department of CSE, GST CSEN2031: AI 1


Games
 Multi-agent environments require each agent to consider the actions of other agents and how they affect its own welfare.
 Competitive environments, in which the agents' goals are in conflict, give rise to adversarial search problems, commonly known as games.
 In AI, the most common games are deterministic, two-player, turn-taking, zero-sum games of perfect information (such as chess).
 Games are good examples of adversarial search:
– States are easy to represent.
– Agents are restricted to a finite number of actions.
– The outcome for an agent is defined by precise rules.
– Yet the games themselves are too hard to solve exactly.



Games
Example: Which games are Adversarial?

8-PUZZLE (not adversarial)
N-QUEENS (not adversarial)
CHESS (adversarial)
TIC-TAC-TOE (adversarial)



Games
A game is formally defined with the following elements:
 S0: The initial state, which specifies how the game is set up at the start.
 PLAYER(s): Defines which player has the move in a state.
 ACTIONS(s): Returns the set of legal moves in a state.
 RESULT(s, a): The transition model, which defines the result of a move.
 TERMINAL-TEST(s): A terminal test, which is true when the game is over and false otherwise. States where the game has
ended are called terminal states.
 UTILITY(s, p): A utility function (also called an objective function or payoff function) defines the final numeric value for a game that ends in terminal state s for a player p. In chess, the outcome is a win, loss, or draw, with values +1, 0, or ½.
 A zero-sum game is defined as one where the total payoff to all players is the same for every instance of the game. Chess is zero-sum because every game has a total payoff of either 0 + 1, 1 + 0, or ½ + ½.
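These formal elements can be sketched as a minimal Python interface; the class and method names below are illustrative assumptions, not part of the slides:

```python
# A sketch of the formal game definition as a Python interface.
# Method names mirror the slide's elements (S0, PLAYER, ACTIONS, ...).
class Game:
    def initial_state(self):            # S0: how the game is set up at the start
        raise NotImplementedError
    def player(self, s):                # PLAYER(s): which player has the move in s
        raise NotImplementedError
    def actions(self, s):               # ACTIONS(s): the set of legal moves in s
        raise NotImplementedError
    def result(self, s, a):             # RESULT(s, a): the transition model
        raise NotImplementedError
    def terminal_test(self, s):         # TERMINAL-TEST(s): is the game over?
        raise NotImplementedError
    def utility(self, s, p):            # UTILITY(s, p): e.g. +1, 0, or 1/2 in chess
        raise NotImplementedError
```

A concrete game (tic-tac-toe, chess) would subclass this and fill in each method.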



Game Tree
• Consider a Tic-Tac-Toe game with two players:
• Player 1: MAX
• Player 2: MIN
• MAX moves first (places X), followed by MIN (places O).
• The initial state, ACTIONS function, and RESULT function define the
game tree for the game
– where the nodes are game states and the edges are moves.

Department of CSE, GIT ECS302: AI 5


• From the initial state, MAX has nine possible moves.
• Play alternates between MAX’s placing an X and MIN’s placing an O
until we reach leaf nodes corresponding to terminal states such that
one player has three in a row or all the squares are filled.



Game tree
The number on each leaf node indicates the utility value of the terminal state from the point of view
of MAX; high values are assumed to be good for MAX and bad for MIN.



OPTIMAL DECISIONS IN GAMES
• Normal search problem:
• The optimal solution is a sequence of actions leading to a goal state.
• Adversarial search problem:
• MIN interferes with that sequence of actions.
• A strategy for MAX must therefore:
– Specify a move in the initial state.
– Consider every possible response by MIN.
– Specify MAX's moves in response to each of them.



Continued...

For example:
Moves by MAX at the root node are a1, a2, a3, …
Possible replies to a1 from MIN are b1, b2, …



Continued...

 The optimal strategy can be determined from the minimax value of each node, written as MINIMAX(n).
 The minimax value of a node is the utility (for MAX) of being in the corresponding state, assuming that both
players play optimally from there to the end of the game.
 The minimax value of a terminal state is its utility.
 MAX prefers to move to a state of maximum value, and MIN prefers a state of minimum value.



Minimax Algorithm

• The minimax algorithm computes the minimax decision from the current state.
• It uses a simple recursive computation of the minimax values of each successor state, directly implementing the
defining equations.
• The recursion proceeds all the way down to the leaves of the tree, and then the minimax values are backed up
through the tree as the recursion unwinds.
 The minimax algorithm performs a complete depth-first exploration of the game tree.
 If the maximum depth of the tree is m and there are b legal moves at each point:
• Time complexity: O(b^m).
• Space complexity: O(bm), for an implementation that generates all actions at once.



Continued...

The minimax values are backed up through recursion


• The algorithm first recurses down to the three bottom left nodes and
uses the UTILITY function on them to discover that their values are 3,
12, and 8, respectively.
• Then it takes the minimum of these values, 3, and returns it as the
backed-up value of node B.
• A similar process gives the backed-up values of 2 for C and 2 for D.
• Finally, we take the maximum of 3, 2, and 2 to get the backed-up value
of 3 for the root node.



Optimal decisions in multiplayer games
 The single value for each node is replaced with a vector of values (one for each player).
 For players A, B, and C, a vector <vA, vB, vC> is associated with each node.
 This vector gives the value of terminal states from each player's viewpoint.
 UTILITY function returns a vector of utilities.
 Multiplayer games usually involve alliances, whether formal or informal, among the players. Alliances are made
and broken as the game proceeds.



ALPHA–BETA PRUNING

 The problem with minimax search is that the number of game states it has to examine is exponential in the depth
of the tree.
 With pruning, it is possible to compute the correct minimax decision without looking at every node in the game
tree.
 Alpha–beta pruning: when applied to a standard minimax tree, it returns the same move as minimax
would, but prunes away branches that cannot possibly influence the final decision.



Continued...



Continued...

 Alpha–beta pruning gets its name from the following two parameters that describe bounds on the backed-up
values that appear anywhere along the path.
α = the value of the best (i.e., highest-value) choice found so far at any choice point along the path for MAX.
β = the value of the best (i.e., lowest-value) choice found so far at any choice point along the path for MIN.
 Alpha–beta search updates the values of α and β as it goes along and prunes the remaining branches at a node
(i.e., terminates the recursive call) as soon as the value of the current node is known to be worse than the current
α or β value for MAX or MIN, respectively.
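The update-and-prune rule can be sketched over the same kind of toy tree used for minimax (the nested-list encoding is an assumption):

```python
# Alpha-beta search: alpha = best value found so far for MAX along the
# path, beta = best value found so far for MIN along the path.
def alphabeta(node, alpha, beta, maximizing):
    if not isinstance(node, list):         # terminal: return its utility
        return node
    if maximizing:
        v = float('-inf')
        for child in node:
            v = max(v, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, v)
            if alpha >= beta:              # beta cutoff: MIN will avoid this node
                break
        return v
    else:
        v = float('inf')
        for child in node:
            v = min(v, alphabeta(child, alpha, beta, True))
            beta = min(beta, v)
            if alpha >= beta:              # alpha cutoff: MAX has a better option
                break
        return v

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alphabeta(tree, float('-inf'), float('inf'), True))  # 3, same as minimax
```

On this tree, once the first successor of C returns 2 (below alpha = 3), the remaining successors of C are pruned.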



Continued...

Move ordering
• The effectiveness of alpha–beta pruning is highly dependent on the order in which the states are examined.
• We could not prune any successors of D at all because the worst successors (from the point of view of MIN) were
generated first.
• If the third successor of D had been generated first, we would have been able to prune the other two.
• With perfect move ordering, alpha–beta needs to examine only O(b^(m/2)) nodes to pick the best move, instead of O(b^m) for minimax.
• Dynamic move-ordering schemes try first the moves that were found to be best in the past; such moves are called killer moves, and trying them first is the killer move heuristic.



Continued...



IMPERFECT REAL-TIME DECISIONS

 The minimax algorithm generates the entire game search space, whereas the alpha–beta algorithm allows us to
prune large parts of it.
 Alpha–beta pruning helps, but searches can still take too long (they must go all the way to the leaves).
 To improve on this, terminate the search early, based on a heuristic evaluation function that treats non-terminal nodes as if they were leaves.
 So, modify minimax or alpha–beta in two ways:
 Replace the utility function with a heuristic evaluation function EVAL.
 Replace the terminal test with a cutoff test CUTOFF-TEST.

 Heuristic minimax for state s and maximum depth d:
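Following the standard formulation, this value can be written as:

```latex
\text{H-MINIMAX}(s, d) =
\begin{cases}
\text{EVAL}(s) & \text{if CUTOFF-TEST}(s, d) \\
\max_{a \in \text{ACTIONS}(s)} \text{H-MINIMAX}(\text{RESULT}(s, a),\ d + 1) & \text{if PLAYER}(s) = \text{MAX} \\
\min_{a \in \text{ACTIONS}(s)} \text{H-MINIMAX}(\text{RESULT}(s, a),\ d + 1) & \text{if PLAYER}(s) = \text{MIN}
\end{cases}
```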



EVALUATION FUNCTIONS

An evaluation function provides an estimate of the expected utility of the game from a given position.

Good evaluation functions are a must; bad ones lose you the game.
Desirable properties of an evaluation function:

a) It should order terminal states in the same way as the true utility function does.

b) Its computation must take a reasonable time.
c) It should be strongly correlated with the actual chance of winning.
i) We can't examine everything, since search is cut off at some states; this introduces uncertainty.
ii) This is computational uncertainty, not uncertainty from random chance.



EVALUATION FUNCTIONS(Working)

Calculate features of a state.

Define categories, or equivalence classes, of states: the states in each category have the same values for all the features.
E.g., all "1 pawn vs. 2 pawns" states. Each category will include some wins, some losses, and some draws.
The function estimates the ratio of these outcomes for each category.

For "1 pawn vs. 2 pawns", the W:L:D ratio might be 72:20:8.

Use this ratio to compute an expected value and order positions by it.
Expected value: (0.72 × +1) + (0.20 × 0) + (0.08 × ½) = 0.76.
This kind of analysis requires too many categories and too much experience to estimate all the probabilities of winning.
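The expected-value computation above, written out (using the chess convention win = +1, loss = 0, draw = ½):

```python
# Expected value of a category from its estimated outcome probabilities.
def expected_value(p_win, p_loss, p_draw):
    return p_win * 1 + p_loss * 0 + p_draw * 0.5

# The slide's 72:20:8 W:L:D example
print(expected_value(0.72, 0.20, 0.08))  # approximately 0.76
```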



EVALUATION FUNCTIONS(Working)

Most evaluation functions compute separate numerical contributions from each feature and then combine them to find the total value. For example, in chess:
• a pawn is worth 1,
• a knight or bishop is worth 3,
• a rook 5,
• the queen 9.
Mathematically, this is known as a weighted linear function because it can be expressed as

EVAL(s) = w1 f1(s) + w2 f2(s) + · · · + wn fn(s),

where each wi is a weight and each fi is a feature of the position.
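A sketch of this weighted linear idea using the material values above; the dict-based interface is an assumption for illustration:

```python
# Standard chess material weights from the slide.
WEIGHTS = {'pawn': 1, 'knight': 3, 'bishop': 3, 'rook': 5, 'queen': 9}

def material_eval(counts_max, counts_min):
    """Each feature f_i is (MAX's count of piece i) - (MIN's count);
    EVAL is the weighted sum  sum_i w_i * f_i."""
    return sum(w * (counts_max.get(p, 0) - counts_min.get(p, 0))
               for p, w in WEIGHTS.items())

# MAX is up a rook, MIN is up two pawns: 5 - 2 = 3 in MAX's favour
print(material_eval({'rook': 1}, {'pawn': 2}))  # 3
```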



Cutting off Search

Modify ALPHA-BETA-SEARCH to call the heuristic EVAL function when it is appropriate to cut off the search:
if CUTOFF-TEST(state, depth) then return EVAL(state)
Choose a depth d that allows a move to be selected within the desired time frame.
Set a fixed depth limit so that CUTOFF-TEST(state, depth) controls the amount of search.
 This is not perfect and not a guarantee; it just gives the best chance. Counter-moves may exist even for the highest-evaluated move.
The evaluation function should be applied only to positions that are quiescent: positions unlikely to exhibit wild swings in value in the near future.
Non-quiescent positions can be expanded further until quiescent positions are reached. This extra search is called a quiescence search.
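A depth-limited sketch combining the cutoff test with an EVAL call; the toy tree and the `flat_avg` evaluation function are assumptions, not a real chess evaluator:

```python
# Depth-limited minimax: CUTOFF-TEST is a simple depth limit, and EVAL
# is applied to non-terminal nodes at the cutoff as if they were leaves.
def h_minimax(node, depth, limit, maximizing, eval_fn):
    if not isinstance(node, list):          # true terminal state
        return node
    if depth >= limit:                      # CUTOFF-TEST(state, depth)
        return eval_fn(node)                # EVAL(state)
    values = [h_minimax(c, depth + 1, limit, not maximizing, eval_fn)
              for c in node]
    return max(values) if maximizing else min(values)

# Crude stand-in EVAL: the average of all leaves below a node.
def flat_avg(node):
    leaves, stack = [], [node]
    while stack:
        n = stack.pop()
        if isinstance(n, list):
            stack.extend(n)
        else:
            leaves.append(n)
    return sum(leaves) / len(leaves)

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(h_minimax(tree, 0, 1, True, flat_avg))   # estimate at depth limit 1
```

With a deep enough limit, this reduces to ordinary minimax.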



Forward Pruning

In forward pruning, some moves at a given node are pruned immediately, without further consideration.

Beam search is one type of forward pruning:
 on each ply, consider only a "beam" of the n best moves rather than all possible moves.
There is no guarantee that the best move doesn't get pruned.
The PROBCUT (probabilistic cut) algorithm is a forward-pruning version of alpha–beta search that uses statistics gained from prior experience to decide which moves are probably safe to cut out.
 Alpha–beta search prunes any node that is provably outside the current (α, β) window; PROBCUT also prunes nodes that are probably outside the window.
It does a shallow search to compute a backed-up value v for a node.
It then uses its statistics to estimate how likely a value of v at depth d is to fall outside (α, β).
An Othello program built with PROBCUT beat the regular alpha–beta version of itself most of the time.



Search vs lookup

It is overkill to search the entire game tree just to choose an opening move.

Good openings and endgames have been known for a long time.
 For these situations, use a lookup table to find the best move (much quicker than search).
The table works well for opening moves, which have been studied extensively by humans.
 For the endgame, the computer is better, since it can quickly think through all the possible combinations.
Closing in on a checkmate can take a human a long time to figure out.
The computer instead computes a policy: a mapping from every possible state to the best move in that state.
Then it just looks up the move instead of recomputing it over and over.



How big will the KBNK lookup table be?

The numbers:
There are 462 ways that the two kings can be placed on the board without being adjacent.
That leaves 62 empty squares for the bishop, 61 for the knight, and either player may have the move.
So there are just 462 × 62 × 61 × 2 = 3,494,568 possible positions. Some of these are checkmates; put them in a table.
From the table, perform a retrograde search: a search through the moves in reverse.
Looking at all possibilities eventually yields a guaranteed set of moves and a win for KBNK (king, bishop, and knight vs. king).
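The arithmetic above can be checked directly:

```python
# KBNK table-size count from the slide.
kings = 462      # non-adjacent placements of the two kings
bishop = 62      # remaining empty squares for the bishop
knight = 61      # then for the knight
to_move = 2      # either side may have the move
positions = kings * bishop * knight * to_move
print(positions)  # 3494568
```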



Stochastic Games

 Games that include a random element (such as dice rolls) are stochastic games, e.g., backgammon.
 The black player knows where all the pieces are, but cannot know ahead of time where white will move, because of the random dice roll. A standard game tree therefore cannot be built.



Stochastic Games(Game Tree)

 Such games require a game tree containing chance nodes in addition to MIN and MAX nodes.
 Chance nodes represent the possible dice rolls.



Stochastic Games(Continued)

 Each branch from a chance node is labeled with the probability of the corresponding dice roll: 1/36 for each double, 1/18 for each other roll.
 Because of this uncertainty, it is only possible to calculate a position's expected value: the probability-weighted average over all possible outcomes of the chance node.
 This generalizes the deterministic game's minimax value to an expectiminimax value for games with chance nodes.
 For chance nodes, sum the values of all outcomes, weighted by their probabilities:
EXPECTIMINIMAX(s) = Σr P(r) · EXPECTIMINIMAX(RESULT(s, r)),
 where r represents a possible dice roll (or other chance event) and RESULT(s, r) is the same state as s, with the additional fact that the result of the dice roll is r.
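A sketch of expectiminimax on a toy tree; the tuple-based node encoding below is an assumption for illustration:

```python
# Node kinds: ('max', children), ('min', children),
# ('chance', [(prob, child), ...]); a bare number is a terminal utility.
def expectiminimax(node):
    if not isinstance(node, tuple):
        return node                                    # terminal utility
    kind, children = node
    if kind == 'max':
        return max(expectiminimax(c) for c in children)
    if kind == 'min':
        return min(expectiminimax(c) for c in children)
    # chance node: probability-weighted sum over the outcomes r
    return sum(p * expectiminimax(c) for p, c in children)

# MAX chooses between a fair coin flip worth 4-or-0 and a certain 1.5
tree = ('max', [('chance', [(0.5, 4), (0.5, 0)]),
                ('chance', [(1.0, 1.5)])])
print(expectiminimax(tree))  # 2.0: the gamble's expected value beats 1.5
```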



Evaluation functions for games of chance

 The presence of chance nodes means that one has to be more careful about what the evaluation values mean.
 With leaf values [1, 2, 3, 4], move a1 is best; with leaf values [1, 20, 30, 400], move a2 is best. The scale of the values matters, not just their ordering.

 If the program knew in advance all the dice rolls that would occur, then solving the game would take O(b^m) time, where b is the branching factor and m is the maximum depth.
 Since expectiminimax also considers all the possible dice-roll sequences, it takes O(b^m n^m) time, where n is the number of distinct rolls.
 In backgammon n is 21 and b is usually around 20, but at times can be as high as 4000 for dice rolls that are doubles.
Evaluation functions for games of chance

 By putting bounds on the possible values of the utility function, something like alpha–beta pruning can be done to improve performance.

 Example:
 If all utility values are between −2 and +2, then the value of every leaf node is bounded.
 We can then place an upper bound on the value of a chance node without looking at all its children.
 Alternative: Monte Carlo simulation.
 Start with an alpha–beta search algorithm.
 Have the program play thousands of games against itself, using random dice rolls.
 This provides a win percentage that can be used as a heuristic, which works well for backgammon.
 For games with dice, this type of simulation is called a rollout.
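The rollout idea can be sketched with a toy dice-race game standing in for backgammon; the game, function name, and parameters are assumptions for illustration:

```python
import random

def rollout_win_rate(target=20, n_games=10_000, seed=0):
    """Estimate the first mover's win rate by playing many random games.

    Toy race game: players alternate rolling one die and adding it to
    their score; the first to reach `target` wins. The win fraction over
    many random playouts serves as the heuristic value of the position.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_games):
        scores = [0, 0]                    # [first mover, second mover]
        turn = 0
        while max(scores) < target:
            scores[turn] += rng.randint(1, 6)   # the chance event (dice roll)
            turn = 1 - turn
        wins += scores[0] >= target        # did the first mover win?
    return wins / n_games

print(rollout_win_rate())  # somewhat above 0.5: the first mover has the edge
```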



State of the Art Game Programs

 Chess: IBM's DEEP BLUE chess program, known for defeating world champion Garry Kasparov, ran on a parallel computer with 30 IBM RS/6000 processors doing alpha–beta search.
 It also contained 480 custom VLSI chess processors that performed move generation and move ordering for the last few levels of the tree, and evaluated the leaf nodes.
 Deep Blue searched up to 30 billion positions per move, routinely reaching depth 14.
 The evaluation function had over 8000 features, many of them describing highly specific patterns of pieces. It used an "opening book" of about 4000 positions and a database of 700,000 grandmaster games from which consensus recommendations could be extracted.
 The system also used a large endgame database of solved positions containing all positions with five pieces and
many with six pieces.



State of the Art Game Programs

 Pruning heuristics which are effective in reducing the branching factor to less than 3 (compared with the actual
branching factor of about 35) have been used.
 Null move heuristic, which generates a good lower bound on the value of a position, using a shallow search in
which the opponent gets to move twice at the beginning. This lower bound often allows alpha–beta pruning
without the expense of a full-depth search.
 Futility pruning, which helps decide in advance which moves will cause a beta cutoff in the successor nodes.
 HYDRA can be seen as the successor to DEEP BLUE with FPGA (Field Programmable Gate Array) chips. HYDRA
reaches 18 plies deep rather than just 14 because of aggressive use of the null move heuristic and forward
pruning.
 RYBKA, winner of the 2008 and 2009 World Computer Chess Championships, is considered the strongest current
computer player. It uses an off-the-shelf 8-core 3.2 GHz Intel Xeon processor, but little is known about the design
of the program. RYBKA’s main advantage appears to be its evaluation function.



State of the Art Game Programs

Checkers: Jonathan Schaeffer and colleagues developed CHINOOK, which uses alpha–beta search. CHINOOK defeated the long-running human champion, and it has since been able to play perfectly by using alpha–beta search combined with a database of 39 trillion endgame positions.



State of the Art Game Programs

Othello, also called Reversi, is probably more popular as a computer game than as a board game. It has a smaller search space than chess, with usually 5 to 15 legal moves per position. In 1997, the LOGISTELLO program (Buro, 2002) defeated the human world champion, Takeshi Murakami, by six games to none. Humans are no match for computers at Othello.



State of the Art Game Programs

Backgammon: Gerry Tesauro (1992) combined reinforcement learning with neural networks to develop a remarkably accurate evaluator, used with a search to depth 2 or 3. After playing more than a million training games against itself, Tesauro's program, TD-GAMMON, became competitive with top human players.



State of the Art Game Programs

Go is the most popular board game in Asia. The board is 19 × 19 and the branching factor starts at 361, which is too daunting for regular alpha–beta search methods. Programs such as MOGO instead use Monte Carlo rollouts. The UCT (upper confidence bounds on trees) method works by making random moves in the first few iterations, and over time guiding the sampling process to prefer moves that have led to wins in previous samples. Some programs also include special techniques from combinatorial game theory to analyze endgames. These techniques decompose a position into sub-positions that can be analyzed separately and then combined.



State of the Art Game Programs

Bridge is a card game of imperfect information: a player’s cards


are hidden from the other players. Bridge is also a multiplayer
game with four players instead of two, although the players are
paired into two teams. Optimal play in partially observable
games like bridge can include elements of information gathering,
communication, and careful weighing of probabilities. Many of
these techniques are used in the Bridge Baron program
(Smith et al., 1998), which won the 1997 computer bridge
championship. Bridge Baron is one of the few successful game-playing systems to use complex, hierarchical plans involving high-level ideas, such as finessing and squeezing, that are familiar to bridge players.



State of the Art Game Programs

Scrabble: Most people think the hard part about Scrabble is coming up with good words, but given the official dictionary, it turns out to be rather easy to program a move generator to find the highest-scoring move (Gordon, 1994). The problem is that Scrabble is both partially observable and stochastic: you do not know what letters the other player has or what letters you will draw next. So playing Scrabble well combines the difficulties of backgammon and bridge. The QUACKLE program defeated the former world champion, David Boys, 3–2.


