AI Unit 2 Adversarial Search
🞆 ADVERSARIAL SEARCH
• In previous topics, we studied search strategies that involve only a single agent aiming to find a solution, often expressed as a sequence of actions.
• An environment with more than one agent is termed a multi-agent environment, where each agent is an opponent of the others, playing against them while considering the other agents' actions and the effect of those actions on its own performance.
• Searches in which two or more players with conflicting goals try to explore the same search space for a solution are called adversarial searches, often known as games.
• Games are modeled as a search problem together with a heuristic evaluation function; these are the two main factors that help to model and solve games in AI.
TYPES OF GAMES IN AI:

                        Deterministic       Chance moves (non-deterministic/stochastic)
Perfect information     Chess, Checkers     Backgammon, Monopoly
• Imperfect information: games in which the agents do not have all the information about the game and are not fully aware of what is going on, e.g., Battleship and blind tic-tac-toe.
• Deterministic games: games that follow a strict pattern and set of rules, with no randomness associated with them, e.g., Chess, Checkers, Go, tic-tac-toe.
• From the initial state, MAX has 9 possible moves, as he starts first. MAX places x and MIN places o, and both players play alternately until we reach a leaf node where one player has three in a row or all squares are filled.
• Both players compute, for each node, the minimax value: the best achievable utility against an optimal adversary.
• Suppose both players know tic-tac-toe well and play their best game. Each player does his best to prevent the other from winning; MIN acts against MAX in the game.
• So in the game tree we have a layer of MAX and a layer of MIN, and each layer is called a ply. MAX places x, then MIN puts o to prevent MAX from winning, and the game continues until a terminal node is reached.
• In the end, either MIN wins, MAX wins, or it is a draw. This game tree is the whole search space of possibilities for MIN and MAX playing tic-tac-toe and taking turns alternately.
Hence adversarial search with the minimax procedure works as follows:
• It aims to find the optimal strategy for MAX to win the game.
• In the game tree, the optimal leaf node could appear at any depth of the tree, so it follows the approach of depth-first search.
• It proceeds all the way down to the terminal nodes, and then propagates the minimax values back up the tree.
🞆 In a given game tree, the optimal strategy can be determined from the minimax value of each node, which can be written as MINIMAX(n). MAX prefers to move to a state of maximum value and MIN prefers to move to a state of minimum value; the defining equation is given below.
🞆 In the two-ply game tree of Figure 5.2, the possible moves for MAX at the root node are labeled a1, a2, and a3.
🞆 The possible replies to a1 for MIN are b1, b2, b3, and so on.
🞆 This particular game ends after one move each by MAX and MIN.
(In game parlance, we say that this tree is one move deep, consisting
of two half-moves, each of which is called a ply.)
🞆 The utilities of the terminal states in this game range from 2 to 14.
🞆 Given a game tree, the optimal strategy can be determined
from the minimax value of each node, which we write as
MINIMAX(n).
🞆 The minimax value of a node is the utility (for MAX) of being
in the corresponding state, assuming that both players play
optimally from there to the end of the game.
🞆 Obviously, the minimax value of a terminal state is just its
utility.
🞆 Furthermore, given a choice, MAX prefers to move to a state
of maximum value, whereas MIN prefers a state of minimum
value.
🞆 So we have the following:
🞆 MINIMAX(s) =
      UTILITY(s)                                   if TERMINAL-TEST(s)
      max_{a ∈ Actions(s)} MINIMAX(RESULT(s, a))   if PLAYER(s) = MAX
      min_{a ∈ Actions(s)} MINIMAX(RESULT(s, a))   if PLAYER(s) = MIN
🞆 Let us apply these definitions to the game tree in Figure 5.2.
🞆 The terminal nodes on the bottom level get their utility values from the
game’s UTILITY function.
🞆 The first MIN node, labeled B, has three successor states with values 3,
12, and 8, so its minimax value is 3.
🞆 Similarly, the other two MIN nodes have minimax value 2.
🞆 The root node is a MAX node; its successor states have minimax values 3,
2, and 2; so it has a minimax value of 3.
🞆 We can also identify the minimax decision
at the root: action a1 is the optimal choice for
MAX because it leads to the state with the
highest minimax value.
🞆 This definition of optimal play for MAX
assumes that MIN also plays optimally—it
maximizes the worst-case outcome for MAX.
What if MIN does not play optimally? Then it
is easy to show that MAX will do even better.
THE MINIMAX ALGORITHM
🞆 The minimax algorithm (Figure 5.3) computes the minimax decision from
the current state. It uses a simple recursive computation of the minimax
values of each successor state, directly implementing the defining
equations. The recursion proceeds all the way down to the leaves of the tree,
and then the minimax values are backed up through the tree as the
recursion unwinds. For example, in Figure 5.2, the algorithm first recurses
down to the three bottom-left nodes and uses the UTILITY function on
them to discover that their values are 3, 12, and 8, respectively. Then it
takes the minimum of these values, 3, and returns it as the backed-up value
of node B. A similar process gives the backed-up values of 2 for C and 2 for
D. Finally, we take the maximum of 3, 2, and 2 to get the backed-up value
of 3 for the root node. The minimax algorithm performs a complete
depth-first exploration of the game tree.
🞆 If the maximum depth of the tree is m and there are b legal moves at each
point, then the time complexity of the minimax algorithm is O(b^m). The
space complexity is O(bm) for an algorithm that generates all actions at
once, or O(m) for an algorithm that generates actions one at a time.
For real games, of course, the time cost is totally impractical, but
this algorithm serves as the basis for the mathematical analysis of games
and for more practical algorithms.
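As an illustration, here is a minimal Python sketch of this recursive procedure (not the book's Figure 5.3 verbatim). The nested-list game encoding is an assumption: an inner list is a state whose elements are the results of the legal moves, and an integer is a terminal utility for MAX. The leaf values are those of Figure 5.2 as commonly given.

```python
def minimax_value(state, max_to_move):
    """Backed-up minimax value of a state (complete depth-first exploration)."""
    if isinstance(state, int):
        return state                      # UTILITY(s): terminal states are plain integers
    values = [minimax_value(s, not max_to_move) for s in state]
    return max(values) if max_to_move else min(values)

def minimax_decision(state):
    """Index of the move MAX should choose at the root."""
    return max(range(len(state)),
               key=lambda a: minimax_value(state[a], max_to_move=False))

# Two-ply tree of Figure 5.2: three MIN nodes B, C, D under a MAX root,
# with leaf utilities 3, 12, 8 | 2, 4, 6 | 14, 5, 2.
figure_5_2 = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax_value(figure_5_2, max_to_move=True))  # 3  (B=3, C=2, D=2)
print(minimax_decision(figure_5_2))                 # 0, i.e. action a1
```

The backed-up values match the discussion above: B gets 3, the other two MIN nodes get 2, and the root gets 3, so a1 is the minimax decision.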
EXAMPLE PROBLEM
🞆 Step 1: In the first step, the algorithm generates the entire game tree and
applies the utility function to get the utility values for the terminal states. In
the tree diagram below, let A be the initial state of the tree. Suppose the
maximizer takes the first turn, with worst-case initial value -∞, and the
minimizer takes the next turn, with worst-case initial value +∞.
Step 2: Now, first we find the utility values for the maximizer. Its
initial value is -∞, so we compare each terminal value with the
maximizer's initial value and keep the higher one, i.e., the maximum
among them all.
• For node D: max(-1, 4) = 4
• For node E: max(2, 6) = 6
• For node F: max(-3, -5) = -3
• For node G: max(0, 7) = 7
Step 3: In the next step it is the minimizer's turn, so it compares
the node values with +∞ and finds the third-layer node values.
• For node B: min(4, 6) = 4
• For node C: min(-3, 7) = -3
Step 4: Now it is the maximizer's turn again, and it chooses the
maximum of all node values to find the value of the root node. In this
game tree there are only 4 layers, so we reach the root node
immediately, but in real games there will be many more layers.
• For node A: max(4, -3) = 4
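The hand computation above can be checked with a small, self-contained sketch. The nested-list tree is an assumed encoding of the example: inner lists are moves, integers are terminal utilities.

```python
# Example game tree: A (MAX) -> B, C (MIN) -> D, E, F, G (MAX) -> terminals.
tree = [[[-1, 4], [2, 6]],    # B -> D(-1, 4), E(2, 6)
        [[-3, -5], [0, 7]]]   # C -> F(-3, -5), G(0, 7)

def minimax(node, maximizing):
    if isinstance(node, int):      # terminal state: return its utility
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

print(minimax(tree, True))  # 4  (D=4, E=6, F=-3, G=7; B=4, C=-3; A=4)
```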
🞆 Properties of Mini-Max algorithm:
• Complete: the Min-Max algorithm is complete. It will definitely find a solution
(if one exists) in a finite search tree.
• Optimal: the Min-Max algorithm is optimal if both opponents play
optimally.
• Time complexity: as it performs DFS over the game tree, the time
complexity of the Min-Max algorithm is O(b^m), where b is the branching factor of the
game tree and m is the maximum depth of the tree.
• Space complexity: the space complexity of the Mini-Max algorithm is also similar
to DFS, which is O(bm).
🞆 Limitation of the minimax algorithm:
🞆 The main drawback of the minimax algorithm is that it gets really slow for
complex games such as Chess, Go, etc. These games have a huge
branching factor, and the player has many choices to consider. This limitation of
the minimax algorithm can be mitigated by alpha-beta pruning, which we
discuss in the next topic.
ALPHA-BETA PRUNING
• Alpha-beta pruning is a modified version of the minimax algorithm. It is an
optimization technique for the minimax algorithm.
• As we saw in the minimax search algorithm, the number of game
states it has to examine is exponential in the depth of the tree. We cannot
eliminate the exponent, but we can effectively cut it in half: with perfect move
ordering, alpha-beta search examines only about O(b^(m/2)) nodes.
• Hence there is a technique by which we can compute the correct minimax decision
without checking every node of the game tree; this technique is called pruning.
• It involves two threshold parameters, alpha and beta, for future expansion, so
it is called alpha-beta pruning. It is also called the Alpha-Beta Algorithm.
• Alpha-beta pruning can be applied at any depth of a tree, and sometimes it prunes
not only the tree leaves but also entire subtrees.
• Alpha: the best (highest-value) choice we have found so far at any point along
the path of the Maximizer. The initial value of alpha is -∞.
• Beta: the best (lowest-value) choice we have found so far at any point along
the path of the Minimizer. The initial value of beta is +∞.
🞆 Alpha-beta pruning applied to a standard minimax tree returns the same move
as the standard algorithm, but it removes the nodes that do not really affect
the final decision and only make the algorithm slow. Pruning these nodes
makes the algorithm fast.
• While backtracking the tree, the node values (not the alpha and beta values)
are passed up to the parent nodes.
• The alpha and beta values are only passed down to the child nodes.
Step 1: In the first step, the MAX player makes the first move
from node A, where α = -∞ and β = +∞. These values of alpha and beta
are passed down to node B, where again α = -∞ and β = +∞, and node B
passes the same values to its child D.
Step 2: At node D, the value of α is calculated, as it is MAX's turn.
The value of α is compared first with 2 and then with 3; max(2, 3) = 3
becomes the value of α at node D, and the node value is also 3.
Step 3: Next, the algorithm traverses the other successor of node B, which
is node E, and the values α = -∞ and β = 3 are passed down.
Step 4: At node E, MAX takes its turn, and the value of alpha changes:
the current value of alpha is compared with 5, so max(-∞, 5) = 5. Hence at
node E, α = 5 and β = 3. Since α >= β, the right successor of E is pruned,
and the algorithm does not traverse it; the value at node E is 5.
Step 5: Next, the algorithm backtracks the tree from node B to node A.
At node A, alpha is updated to the maximum available value, 3, since
max(-∞, 3) = 3, and β = +∞. These two values are now passed to the right
successor of A, which is node C.
At node C, α = 3 and β = +∞, and the same values are passed on to
node F.
Step 6: At node F, the value of α is again compared, first with the left child,
0 (max(3, 0) = 3), and then with the right child, 1 (max(3, 1) = 3);
α remains 3, but the node value of F becomes 1.
🞆 Step 7: Node F returns the node value 1 to node C. At C, α = 3
and β = +∞; here the value of beta is changed: it is compared
with 1, so min(+∞, 1) = 1. Now at C, α = 3 and β = 1,
and again the condition α >= β is satisfied, so the next child
of C, which is G, is pruned, and the algorithm does not
compute the entire subtree G.
🞆 Step 8: C now returns the value 1 to A, where the best value
for A is max(3, 1) = 3. The final game tree shows which nodes
were computed and which were never computed. Hence the
optimal value for the maximizer is 3 in this example.
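The steps above can be reproduced with a minimal self-contained sketch. The tree encoding and the stand-in values for the pruned leaves (9 and the pair 7, 5) are assumptions; any values work there, since pruned nodes cannot change the result.

```python
import math

def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf, seen=None):
    if isinstance(node, int):              # terminal state
        if seen is not None:
            seen.append(node)              # record which leaves are actually evaluated
        return node
    best = -math.inf if maximizing else math.inf
    for child in node:
        value = alphabeta(child, not maximizing, alpha, beta, seen)
        if maximizing:
            best = max(best, value)
            alpha = max(alpha, best)       # best choice for MAX found so far
        else:
            best = min(best, value)
            beta = min(beta, best)         # best choice for MIN found so far
        if alpha >= beta:                  # the condition used in the steps above
            break                          # prune the remaining children
    return best

# Tree from the worked example above.
tree = [[[2, 3], [5, 9]],     # B -> D(2, 3), E(5, pruned)
        [[0, 1], [7, 5]]]     # C -> F(0, 1), G(pruned entirely)
seen = []
print(alphabeta(tree, True, seen=seen))   # 3: the optimal value for the maximizer
print(seen)                               # [2, 3, 5, 0, 1]: pruned leaves never evaluated
```

The recorded leaf list confirms that E's right child and the whole subtree G are never visited, exactly as in Steps 4 and 7.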
2-PLY WITH ALPHA-BETA PRUNING
🞆 Consider again the two-ply game tree from
Figure 5.2. Let’s go through the calculation of
the optimal decision once more, this time
paying careful attention to what we know at
each point in the process. The steps are
explained in Figure 5.5. The outcome is that
we can identify the minimax decision without
ever evaluating two of the leaf nodes.
🞆 The general principle is this: consider a node
n somewhere in the tree (see Figure 5.6),
such that Player has a choice of moving to
that node. If Player has a better choice m
either at the parent node of n or at any
choice point further up, then n will never be
reached in actual play. So once we have
found out enough about n (by examining
some of its descendants) to reach this
conclusion, we can prune it.
🞆 Alpha–beta search updates the values of α
and β as it goes along and prunes the
remaining branches at a node (i.e.,
terminates the recursive call) as soon as the
value of the current node is known to be
worse than the current α or β value for MAX
or MIN, respectively. The complete algorithm
is given in Figure 5.7.
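Figure 5.7 itself is not reproduced here; the following is a sketch in its style, with separate MAX-VALUE and MIN-VALUE functions threading α and β, using the same nested-list game encoding assumed earlier.

```python
import math

def alpha_beta_search(state):
    """Index of the best root action for MAX."""
    best_action, alpha = None, -math.inf
    for a, result in enumerate(state):
        v = min_value(result, alpha, math.inf)
        if v > alpha:
            alpha, best_action = v, a
    return best_action

def max_value(state, alpha, beta):
    if isinstance(state, int):
        return state                       # terminal: its utility
    v = -math.inf
    for result in state:
        v = max(v, min_value(result, alpha, beta))
        if v >= beta:                      # MIN above would never allow this node
            return v                       # prune the remaining successors
        alpha = max(alpha, v)
    return v

def min_value(state, alpha, beta):
    if isinstance(state, int):
        return state
    v = math.inf
    for result in state:
        v = min(v, max_value(result, alpha, beta))
        if v <= alpha:                     # MAX above already has a better option
            return v
        beta = min(beta, v)
    return v

print(alpha_beta_search([[3, 12, 8], [2, 4, 6], [14, 5, 2]]))  # 0 -> action a1
```

On this tree the search evaluates only seven of the nine leaves: after the second MIN node's first leaf (2) drops its value to 2 <= α = 3, its remaining two leaves are cut off, matching the claim that two leaf nodes are never evaluated.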
SUMMARY
🞆 A game can be defined by the initial state (how the board is
set up), the legal actions in each state, the result of each action, a
terminal test (which says when the game is
over), and a utility function that applies to
terminal states.
🞆 In two-player zero-sum games with perfect
information, the minimax algorithm can
select optimal moves by a depth-first
enumeration of the game tree.
🞆 The alpha–beta search algorithm computes
the same optimal move as minimax, but
achieves much greater efficiency by
eliminating subtrees that are provably
irrelevant.
OPTIMAL DECISIONS IN MULTIPLAYER GAMES
🞆 Many popular games allow more than two players. Let us examine
how to extend the minimax idea to multiplayer games. This is
straightforward from the technical viewpoint, but raises some
interesting new conceptual issues.
🞆 First, we need to replace the single value for each node with a
vector of values. For example, in a three-player game with players
A, B, and C, a vector (vA, vB, vC) is associated with each node. For
terminal states, this vector gives the utility of the state from each
player's viewpoint. (In two-player, zero-sum games, the two-element
vector can be reduced to a single value because the values are always
opposite.) The simplest way to implement this is to have the UTILITY
function return a vector of utilities.
🞆 Now we have to consider nonterminal states. Consider the node
marked X in the game tree shown in Figure 5.4. In that state, player C
chooses what to do. The two choices lead to terminal states with utility
vectors (vA = 1, vB = 2, vC = 6) and (vA = 4, vB = 2, vC = 3).
Since 6 is bigger than 3, C should choose the first move. This means
that if state X is reached, subsequent play will lead to a terminal state
with utilities (vA = 1, vB = 2, vC = 6). Hence, the backed-up value of X
is this vector.
🞆 The backed-up value of a node n is always the utility vector of the
successor state with the highest value for the player choosing at n.
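A sketch of this vector-valued backup (often called max^n) in Python; the node and utility vectors below simply reproduce the node-X example from the text, and the turn-taking order is an assumption.

```python
# Each terminal is a utility vector (vA, vB, vC); an inner list is a choice node.
def maxn(node, player, num_players):
    if isinstance(node, tuple):
        return node                                  # terminal utility vector
    nxt = (player + 1) % num_players                 # players move in fixed rotation
    children = [maxn(child, nxt, num_players) for child in node]
    # Back up the successor vector that is best in the mover's own component.
    return max(children, key=lambda v: v[player])

# Node X: player C (index 2) chooses between the two terminal states.
x = [(1, 2, 6), (4, 2, 3)]
print(maxn(x, player=2, num_players=3))  # (1, 2, 6): C prefers vC = 6 over 3
```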
🞆 Anyone who plays multiplayer games, such as Diplomacy, quickly
becomes aware that much more is going on than in two-player games.
Multiplayer games usually involve alliances, whether formal or informal,
among the players. Alliances are made and broken as the game proceeds.
How are we to understand such behavior? Are alliances a natural
consequence of optimal strategies for each player in a multiplayer game?
It turns out that they can be.
🞆 For example, suppose A and B are in weak positions and C is in a stronger
position. Then it is often optimal for both A and B to attack C
rather than each other, lest C destroy each of them
individually. In this way, collaboration emerges from purely
selfish behavior. Of course, as soon as C weakens under
the joint onslaught, the alliance loses its value, and either
A or B could violate the agreement. In some cases, explicit
alliances merely make concrete what would have happened
anyway. In other cases, a social stigma attaches to breaking
an alliance, so players must balance the immediate
advantage of breaking an alliance against the long-term
disadvantage of being perceived as untrustworthy.
🞆 If the game is not zero-sum, then collaboration can also occur with
just two players. Suppose, for example, that there is a terminal state
with utilities (vA = 1000, vB = 1000) and that 1000 is the highest
possible utility for each player. Then the optimal strategy is for both
players to do everything possible to reach this state; that is, the
players will automatically cooperate to achieve a mutually desirable goal.