
ADVERSARIAL SEARCH AND GAMES

Nguyễn Ngọc Thảo – Nguyễn Hải Minh


{nnthao, nhminh}@fit.hcmus.edu.vn
Outline
• Two-player zero-sum games
• Optimal decisions in games
• Heuristic alpha-beta tree search
• Stochastic games

Two-player zero-sum games

Game theory and AI games
• Game theory views any multiagent environment as a game.
• The impact of each agent on the others is significant, regardless of whether the agents are cooperative or competitive.
• Each agent needs to consider the actions of the other agents and how they affect its own welfare.

Two-player zero-sum games
• The games most commonly studied within AI are perfect-information (fully observable), deterministic, turn-taking, two-player, zero-sum games.
• Zero-sum: what is good for one player is just as bad for the other; there is no “win-win” outcome.
• The terms are slightly different from those in search: Action → Move and State → Position.

Two-player zero-sum games
• Two players are MAX and MIN. MAX moves first.
• The players take turns moving until the game is over.
• At the end of the game, points are awarded to the winning
player and penalties are given to the loser.

• The two players are rational, trying to maximize their utilities.

Game formulation
• S₀: The initial state, which specifies how the game is set up at the start.
• TO-MOVE(s): The player whose turn it is to move in state s.
• ACTIONS(s): The set of legal moves in state s.
• RESULT(s, a): The transition model, which defines the state resulting from taking action a in state s.
• IS-TERMINAL(s): A terminal test, which is true when the game is over and false otherwise.
• States where the game has ended are called terminal states.
• UTILITY(s, p): A utility function, which defines the final numeric value to player p when the game ends in terminal state s.
• E.g., chess: win (+1), loss (−1), draw (0); backgammon: [0, 192]

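To make the formulation concrete, here is a minimal Python sketch of this interface for a tiny hypothetical game ("take 1 or 2 objects from a pile; whoever takes the last one wins"), chosen so every method fits in a few lines. The class and method names are illustrative, not from the lecture:

from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    pile: int      # objects remaining on the pile
    player: str    # 'MAX' or 'MIN', the player to move

class PileGame:
    def initial_state(self):                 # S0
        return State(pile=5, player='MAX')

    def to_move(self, s):                    # TO-MOVE(s)
        return s.player

    def actions(self, s):                    # ACTIONS(s)
        return [n for n in (1, 2) if n <= s.pile]

    def result(self, s, a):                  # RESULT(s, a): transition model
        nxt = 'MIN' if s.player == 'MAX' else 'MAX'
        return State(pile=s.pile - a, player=nxt)

    def is_terminal(self, s):                # IS-TERMINAL(s)
        return s.pile == 0

    def utility(self, s, player):            # UTILITY(s, p)
        # The player who took the last object (i.e., NOT the one to move
        # in the terminal state) wins: +1 to the winner, -1 to the loser.
        winner = 'MIN' if s.player == 'MAX' else 'MAX'
        return +1 if player == winner else -1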
State space graph and Game tree
• The initial state, ACTIONS function, and RESULT function define the state space graph.
• The complete game tree is a search tree that follows every sequence of moves all the way to a terminal state.
• It may be infinite if the state space itself is unbounded or if the rules of the game allow infinitely repeating positions.

A game tree for Tic-tac-toe

• MAX uses the search tree to pick the next move.
• There are fewer than 9! terminal nodes (with only 5,478 distinct states) from the point of view of MAX.


Examples of game: Checkers

• Complexity
• ≈ 10¹⁸ nodes, which may require 100,000 years at 10⁶ positions/sec
• Chinook (1989-2007)
• The first computer program to win a world champion title in a competition against humans
• 1990: won 2 games in competition with world champion Tinsley (final score: 2-4, with 33 draws); 1994: 6 draws
• Chinook’s search: ran on regular PCs and played perfectly by using alpha-beta search combined with a database of 39 trillion endgame positions

Examples of game: Chess
• Complexity
• b ≈ 35, d ≈ 100 → about 10¹⁵⁴ nodes (!!)
• Completely impractical to search exhaustively
• Deep Blue (May 11, 1997)
• Kasparov lost a 6-game match against IBM’s Deep Blue: 1 win for Kasparov, 2 wins for Deep Blue, and 3 draws.
• In the future, the focus will be on allowing computers to LEARN to play chess rather than being TOLD how to play.

Deep Blue
• Ran on a parallel computer with 30 IBM RS/6000 processors doing alpha–beta search
• Searched up to 30 billion positions per move, with an average depth of 14 (able to reach 40 plies in some lines)
• Evaluation function: 8,000 features
• highly specific patterns of pieces (~4,000 positions)
• 700,000 grandmaster games in its database
• Even working at 200 million positions/sec, Deep Blue would require 10¹⁰⁰ years to evaluate all possible games (the universe is only about 10¹⁰ years old).
• Now: algorithmic improvements have allowed programs running on standard PCs to win World Computer Chess Championships.
• Pruning heuristics reduce the effective branching factor to less than 3.

Examples of game: Go

(1 million trillion trillion trillion trillion more configurations than chess!)

• Complexity
• 19×19 board, b ≈ 361, average depth ≈ 200
• ≈ 10¹⁷⁴ possible board configurations
• The control of territory is unpredictable until the endgame
• AlphaGo (2016) by Google
• Beat 9-dan professional Lee Sedol (4-1)
• Machine learning + Monte Carlo search guided by a “value network” and a “policy network” (implemented using deep neural network technology)
• Learned from humans + learned by itself (self-play games)

An overview of AlphaGo

Optimal decisions in games
The minimax algorithm
• Assume that both agents play optimally from any state s to the end of the game.
• The minimax value of state s is the utility (for MAX) of being in s.
• The minimax value of a terminal state is just its utility.
• MAX prefers to move to a state of maximum value, and MIN prefers a state of minimum value.

The minimax algorithm
• The recursion proceeds all the way down to the leaf nodes and then backs up the minimax values as the recursion unwinds.

Image credit: Medium
Minimax example: A two-ply game tree
MAX’s best move at the root is a₁, leading to the state with the highest minimax value. MIN’s best reply is b₁, leading to the state with the lowest minimax value.

(Figure: Δ nodes are MAX’s turn to move; ∇ nodes are MIN’s turn to move. Terminal nodes show the utility values for MAX; the other nodes are labeled with their minimax values.)
function MINIMAX-SEARCH(game, state) returns an action
  player ← game.TO-MOVE(state)
  value, move ← MAX-VALUE(game, state)
  return move

function MAX-VALUE(game, state) returns a (utility, move) pair
  if game.IS-TERMINAL(state) then return game.UTILITY(state, player), null
  v ← –∞
  for each a in game.ACTIONS(state) do
    v2, a2 ← MIN-VALUE(game, game.RESULT(state, a))
    if v2 > v then
      v, move ← v2, a
  return v, move

function MIN-VALUE(game, state) returns a (utility, move) pair
  if game.IS-TERMINAL(state) then return game.UTILITY(state, player), null
  v ← +∞
  for each a in game.ACTIONS(state) do
    v2, a2 ← MAX-VALUE(game, game.RESULT(state, a))
    if v2 < v then
      v, move ← v2, a
  return v, move
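In the pseudocode, player is shared with MINIMAX-SEARCH through the enclosing scope; the Python transcription below (a sketch) passes it explicitly instead:

import math

def minimax_search(game, state):
    # Returns the minimax-optimal action for the player to move.
    player = game.to_move(state)
    _, move = max_value(game, state, player)
    return move

def max_value(game, state, player):
    if game.is_terminal(state):
        return game.utility(state, player), None
    v, move = -math.inf, None
    for a in game.actions(state):
        v2, _ = min_value(game, game.result(state, a), player)
        if v2 > v:
            v, move = v2, a
    return v, move

def min_value(game, state, player):
    if game.is_terminal(state):
        return game.utility(state, player), None
    v, move = math.inf, None
    for a in game.actions(state):
        v2, _ = max_value(game, game.result(state, a), player)
        if v2 < v:
            v, move = v2, a
    return v, move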
What if MIN does not play optimally?
• Against a suboptimal opponent, MAX will do at least as well as against an optimal one.
• However, that does not mean that it is always best to play the minimax-optimal move when facing a suboptimal opponent.
• Consider the following situation: optimal play by both sides leads to a draw (TIE). MAX makes a risky move, after which MIN has 10 possible responses that all seem reasonable, but 9 of them are a loss for MIN and only one is a loss for MAX. If MIN cannot discover the optimal move, MAX will probably win (WIN?).
An evaluation of Minimax algorithm
• Minimax performs a complete depth-first exploration of the game tree.
• Completeness: Yes (if the tree is finite)
• Optimality: Yes (against an optimal opponent)
• Time complexity: O(b^m) → infeasible for practical games
• m: the maximum depth of the tree; b: the number of legal moves at each point
• Space complexity: O(bm) (depth-first exploration)

Optimality in multiplayer games
• The UTILITY function is extended to return a vector of utilities.
• For terminal states, this vector gives the utility of the state from each player’s viewpoint.

(Figure: A three-ply game tree with three players (A, B, C). Each node is labeled with values from the viewpoint of each player. The best move is marked at the root.)
Optimality in multiplayer games
• Multiplayer games usually involve alliances, which are made and broken as the game proceeds.
• E.g., while A and B are weak and C is strong, A forms an alliance with B; once C becomes weak, A or B may violate the agreement.
• If the game is not zero-sum, collaboration can also occur with just two players.
Quiz 01: Minimax algorithm
• Calculate the minimax value for each of the remaining nodes.
• Which node should MAX and MIN choose?

Alpha-beta pruning
• Alpha-beta pruning aims to cut back any branches of the game tree that cannot possibly influence the final decision.
• Even in the worst case, it is as good as plain minimax.
• Suppose the player plans to move to a node n. If the player has a better choice either at the same level (e.g., node m′) or at any point higher up in the tree (e.g., node m), then the player will never move to n.
Alpha-beta pruning
• The two parameters, α and β, describe the bounds on the backed-up values that appear anywhere along the path.
• α = the value of the best (i.e., highest-value) choice found so far at any choice point along the path for MAX.
• β = the value of the best (i.e., lowest-value) choice found so far at any choice point along the path for MIN.
• The algorithm updates these values as it goes along.
• A node is pruned as soon as its value is known to be worse than the current α (for MAX) or β (for MIN).
Alpha-beta pruning: An example

• Let the two unevaluated successors of node C have values x and y. Then the value of the root node is given by:

MINIMAX(root) = max(min(3, 12, 8), min(2, x, y), min(14, 5, 2))
              = max(3, min(2, x, y), 2)
              = max(3, z, 2)    where z = min(2, x, y) ≤ 2
              = 3

• The root’s value does not depend on x and y, so those two successors can be pruned.
function ALPHA-BETA-SEARCH(game, state) returns an action
  player ← game.TO-MOVE(state)
  value, move ← MAX-VALUE(game, state, –∞, +∞)
  return move

function MAX-VALUE(game, state, α, β) returns a (utility, move) pair
  if game.IS-TERMINAL(state) then return game.UTILITY(state, player), null
  v ← –∞
  for each a in game.ACTIONS(state) do
    v2, a2 ← MIN-VALUE(game, game.RESULT(state, a), α, β)
    if v2 > v then
      v, move ← v2, a
      α ← MAX(α, v)
    if v ≥ β then return v, move
  return v, move

function MIN-VALUE(game, state, α, β) returns a (utility, move) pair
  if game.IS-TERMINAL(state) then return game.UTILITY(state, player), null
  v ← +∞
  for each a in game.ACTIONS(state) do
    v2, a2 ← MAX-VALUE(game, game.RESULT(state, a), α, β)
    if v2 < v then
      v, move ← v2, a
      β ← MIN(β, v)
    if v ≤ α then return v, move
  return v, move
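A matching Python transcription of alpha-beta (again a sketch, with player passed explicitly):

import math

def alpha_beta_search(game, state):
    player = game.to_move(state)
    _, move = ab_max_value(game, state, player, -math.inf, math.inf)
    return move

def ab_max_value(game, state, player, alpha, beta):
    if game.is_terminal(state):
        return game.utility(state, player), None
    v, move = -math.inf, None
    for a in game.actions(state):
        v2, _ = ab_min_value(game, game.result(state, a), player, alpha, beta)
        if v2 > v:
            v, move = v2, a
            alpha = max(alpha, v)
        if v >= beta:          # MIN above will never allow this branch
            return v, move
    return v, move

def ab_min_value(game, state, player, alpha, beta):
    if game.is_terminal(state):
        return game.utility(state, player), None
    v, move = math.inf, None
    for a in game.actions(state):
        v2, _ = ab_max_value(game, game.result(state, a), player, alpha, beta)
        if v2 < v:
            v, move = v2, a
            beta = min(beta, v)
        if v <= alpha:         # MAX above already has a better option
            return v, move
    return v, move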
Good move ordering
• It might be worthwhile to first examine the successors that are likely to be best.
• E.g., the successors of node D in the previous example.
• Alpha–beta with perfect move ordering can solve a tree roughly twice as deep as minimax in the same amount of time.
• Perfect move ordering: O(b^(m/2)) → effective branching factor √b
• Random move ordering: O(b^(3m/4)) for moderate b
• Obviously, we cannot achieve perfect move ordering in practice.

Good move ordering
• Dynamic move-ordering schemes bring us quite close to the theoretical limit.
• E.g., trying first the moves that were found to be best in the past
• Killer move heuristic: run iterative deepening, searching 1 ply deep and recording the best path, then use that path to order moves when searching 1 ply deeper.
• A transposition table avoids re-evaluating a state by caching the heuristic values of states already seen.
• Transpositions are different permutations of the move sequence that end up in the same position.

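A minimal sketch of a transposition table layered on the minimax routine above (it assumes State is hashable; a production table would also record, for alpha-beta, whether a stored value is exact or only a bound):

transposition_table = {}

def cached_minimax(game, state, player):
    # Memoized minimax value of a state.
    if state in transposition_table:
        return transposition_table[state]      # transposition: reuse the value
    if game.is_terminal(state):
        v = game.utility(state, player)
    elif game.to_move(state) == player:        # MAX node
        v = max(cached_minimax(game, game.result(state, a), player)
                for a in game.actions(state))
    else:                                      # MIN node
        v = min(cached_minimax(game, game.result(state, a), player)
                for a in game.actions(state))
    transposition_table[state] = v
    return v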
Quiz 02: Alpha-beta pruning
• Calculate the minimax value for each of the remaining nodes.
• Which node should MAX and MIN choose?

Imperfect real-time decisions

• Evaluation functions
• Cutting off search
• Forward pruning
• Search versus Lookup
Heuristic minimax
• Both minimax and alpha-beta pruning search all the way to terminal states.
• This depth is usually impractical, because moves must be made in a reasonable amount of time (~minutes).
• Cut off the search earlier, at some depth limit.
• Use an evaluation function: an estimate of the desirability of a position (win, lose, tie?).

Evaluation functions
• The evaluation function should order the terminal states in the same way as the true utility function does:
• States that are wins must evaluate better than draws, which in turn must be better than losses.
• The computation must not take too long!
• For nonterminal states, the evaluation should be strongly correlated with the actual chances of winning.

Evaluation functions
• For chess, typically a linear weighted sum of features:
Eval(s) = w₁f₁(s) + w₂f₂(s) + … + wₙfₙ(s)
• where fᵢ could be the number of each kind of piece on the board, and wᵢ could be the value of the pieces
• E.g., Eval(s) = 9q + 5r + 3b + 3n + p
• Implicit strong assumption: the contribution of each feature is independent of the values of the other features.
• E.g., assigning the value 3 to a bishop ignores the fact that bishops are more powerful in the endgame → a nonlinear combination may be needed.

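A sketch of such a material-count evaluation in Python, where each feature fᵢ(s) is the difference in piece counts between the two sides (the piece counts below are made-up examples):

WEIGHTS = {'q': 9, 'r': 5, 'b': 3, 'n': 3, 'p': 1}

def eval_material(counts_max, counts_min):
    # Eval(s) = sum_i w_i * f_i(s), where f_i(s) is MAX's count of piece
    # type i minus MIN's count.
    return sum(w * (counts_max[piece] - counts_min[piece])
               for piece, w in WEIGHTS.items())

# Example: MAX is up one rook but down one pawn -> 5*(+1) + 1*(-1) = 4
print(eval_material({'q': 1, 'r': 2, 'b': 2, 'n': 2, 'p': 7},
                    {'q': 1, 'r': 1, 'b': 2, 'n': 2, 'p': 8}))   # prints 4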
Cutting off search
• Minimax with cutoff is identical to plain minimax except that:
1. IS-TERMINAL is replaced by IS-CUTOFF
2. UTILITY is replaced by EVAL

If IS-CUTOFF(state, depth) then return EVAL(state)
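In Python, the change to the minimax sketch is small (a sketch; eval_fn is any evaluation function, such as the material evaluator above):

def h_minimax(game, state, player, eval_fn, depth=0, limit=4):
    # Depth-limited minimax: the terminal test becomes a cutoff test and
    # the true utility is replaced by a heuristic estimate at the frontier.
    if game.is_terminal(state):
        return game.utility(state, player)
    if depth >= limit:                         # IS-CUTOFF(state, depth)
        return eval_fn(state)                  # EVAL(state)
    values = [h_minimax(game, game.result(state, a), player, eval_fn,
                        depth + 1, limit)
              for a in game.actions(state)]
    return max(values) if game.to_move(state) == player else min(values)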

• Does it work in practice?
• With a budget of b^m = 10⁶ nodes and b = 35, we get m ≈ 4.
• But 4-ply lookahead is a hopeless chess player!
• 4-ply ≈ human novice; 8-ply ≈ typical PC or human master; 12-ply ≈ Deep Blue, Kasparov

A more sophisticated cutoff test
• Quiescent positions are those unlikely to exhibit wild swings in value in the near future.
• E.g., in chess, positions in which favorable captures can be made are not quiescent for an evaluation function that counts only material.
• Quiescence search: expand nonquiescent positions until quiescent positions are reached.
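A rough Python sketch of the idea (game.capture_moves is a hypothetical helper returning the "noisy" moves, e.g., captures; including eval_fn(state) among the candidates lets the side to move "stand pat" and decline all captures):

def quiescence_value(game, state, player, eval_fn):
    # Instead of trusting eval_fn at a cutoff, keep expanding noisy moves
    # (here: captures) until the position is quiet, then evaluate.
    if game.is_terminal(state):
        return game.utility(state, player)
    noisy = game.capture_moves(state)          # assumed helper, not in the slides
    if not noisy:
        return eval_fn(state)                  # quiet position: safe to evaluate
    candidates = [eval_fn(state)]              # "stand pat": decline every capture
    candidates += [quiescence_value(game, game.result(state, a), player, eval_fn)
                   for a in noisy]
    return max(candidates) if game.to_move(state) == player else min(candidates)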

Quiescent positions: An example

(Figure: Two chess positions that differ only in the position of the rook at lower right. In (a), Black has an advantage of a knight and two pawns, which should be enough to win the game. In (b), White will capture the queen, giving it an advantage that should be strong enough to win.)
A more sophisticated cutoff test
• Horizon effect: the program is facing an unavoidable serious loss and temporarily avoids it with delaying tactics.

(Figure: With Black to move, the black bishop is surely doomed. But Black can forestall that event by checking the white king with its pawns, forcing the king to capture the pawns.)

A more sophisticated cutoff test
• Singular extension: a move that is “clearly better” than all other moves in a given position.
• At the depth limit, the algorithm still allows a singular extension to be considered → a deeper search tree, yet only a few singular extensions, so the extra cost is small.
• Beam search
• Forward pruning: consider only a “beam” of the n best moves
• Most humans consider only a few moves from each position
• PROBCUT, or probabilistic cut, algorithm (Buro, 1995)
• Search vs. Lookup
• Use table lookup rather than search for the openings and endings of games
Stochastic games
Stochastic behaviors
• Uncertain outcomes controlled by chance, not an adversary!
• Why wouldn’t we know what the result of an action will be?
• Explicit randomness: rolling dice
• Unpredictable opponents: the ghosts respond randomly
• Actions can fail: when a robot is moving, wheels might slip

Expectimax search
• Values reflect average-case (expectimax) outcomes, not worst-case (minimax) outcomes.
• Expectimax search: compute the average score under optimal play.
• Max nodes are as in minimax search.
• Chance nodes are like min nodes, but the outcome is uncertain.
• Calculate expected utilities, i.e., take the probability-weighted average of the children’s values.

Expectimax search
• For minimax, the scale of the terminal values doesn’t matter: any monotonic transformation that gives better states higher evaluations leaves the play unchanged.
• For expectimax, the magnitudes need to be meaningful, since values are averaged.

Image credit: CSE 473


Quiz 03: Expectimax search
• Calculate the expectimax values for all the nodes.

(Figure: an expectimax tree with three chance nodes; the branch probabilities at each chance node are 1/2, 1/4, and 1/4.)
Expectimax search: Pseudo code

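A minimal Python sketch of expectimax (game.probability(s) is an assumed helper giving the probability of chance outcome s; chance nodes take the place of MIN nodes here):

def expectimax_value(game, state, player):
    # MAX nodes: best child, as in minimax. Chance nodes: expected value,
    # i.e., the probability-weighted average of the children's values.
    if game.is_terminal(state):
        return game.utility(state, player)
    successors = [game.result(state, a) for a in game.actions(state)]
    if game.to_move(state) == player:              # MAX node
        return max(expectimax_value(game, s, player) for s in successors)
    return sum(game.probability(s) * expectimax_value(game, s, player)
               for s in successors)                # chance node

For example, a chance node with two equally likely children worth 8 and 24 has value 0.5·8 + 0.5·24 = 16.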
Expectimax pruning

Is it possible to perform pruning in expectimax search?

Expectimax pruning
• Pruning is only possible with knowledge of a fixed range of values.

How to prune this tree?
• Each child has an equal probability of being chosen.
• The values can only be in the range 0-9 (inclusive).
• E.g., once one of two equally likely children evaluates to 0, the chance node’s value is at most (0 + 9)/2 = 4.5; if MAX already has an option worth 5, the other child need not be examined.

Depth-limited expectimax

