2025 Lecture03 AdversarialSearch
ADVERSARIAL SEARCH AND GAMES
Two-player zero-sum games
Game theory and AI games
• Game theory views any multiagent environment as a game.
• The impact of each agent on the others is significant, regardless of
whether the agents are cooperative or competitive.
• Each agent needs to consider the actions of other agents
and how they affect its own welfare.
Two-player zero-sum games
• The games most commonly studied within AI are deterministic, turn-taking, two-player, zero-sum games of perfect information.
Two-player zero-sum games
• Two players are MAX and MIN. MAX moves first.
• The players take turns moving until the game is over.
• At the end of the game, points are awarded to the winning
player and penalties are given to the loser.
Game formulation
• S0: The initial state, which specifies how the game is set up at the start.
• TO-MOVE(𝑠): The player whose turn it is to move in state 𝑠.
• ACTIONS(𝑠): The set of legal moves in state 𝑠.
• RESULT(𝑠, 𝑎): The transition model, which defines the state resulting from taking action 𝑎 in state 𝑠.
• IS-TERMINAL(𝑠): A terminal test, which is true when the game is over and false otherwise.
• States where the game has ended are called terminal states.
• UTILITY(𝑠, 𝑝): A utility function that defines the final numeric value to player 𝑝 when the game ends in terminal state 𝑠.
• E.g., chess: win (+1), loss (−1), and draw (0); backgammon: [0, 192]
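This formulation maps directly onto a small programming interface. Below is a minimal Python sketch of it; the class and method names are our own choice, and a concrete game such as tic-tac-toe would implement each method.

class Game:
    """A minimal sketch of the game formulation above."""

    def initial_state(self):            # S0
        raise NotImplementedError

    def to_move(self, s):               # TO-MOVE(s): whose turn it is
        raise NotImplementedError

    def actions(self, s):               # ACTIONS(s): legal moves in s
        raise NotImplementedError

    def result(self, s, a):             # RESULT(s, a): transition model
        raise NotImplementedError

    def is_terminal(self, s):           # IS-TERMINAL(s): game over?
        raise NotImplementedError

    def utility(self, s, p):            # UTILITY(s, p): e.g., +1/0/-1 in chess
        raise NotImplementedError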
State space graph and Game tree
• The initial state, 𝐴𝐶𝑇𝐼𝑂𝑁𝑆 function, and 𝑅𝐸𝑆𝑈𝐿𝑇 function define the state space graph.
• The complete game tree is a search tree that follows every
sequence of moves all the way to a terminal state.
• It may be infinite if the state space itself is unbounded or if the rules of the game allow infinitely repeating positions.
A game tree for Tic-tac-toe
Examples of games: Checkers
• Complexity
• ~10^18 nodes, which may require ~100k years at 10^6 positions/sec
• Chinook (1989-2007)
• The first computer program that won the world champion title in a
competition against humans
• 1992: won 2 games in a match against world champion Marion Tinsley (final score: 2-4, with 33 draws). 1994: 6 draws
• Chinook’s search
• Ran on regular PCs; played perfectly by using alpha-beta search combined with a database of 39 trillion endgame positions
Examples of games: Chess
• Complexity
• b ≈ 35, d ≈ 100, so ~10^154 nodes (!!)
• Completely impractical to search exhaustively
• Deep Blue (May 11, 1997)
• Kasparov lost a 6-game match against IBM’s Deep Blue: 1 win for Kasparov, 2 wins for Deep Blue, and 3 draws.
• In the future, focus will be to allow computers to LEARN to
play chess rather than being TOLD how it should play
Deep Blue
• Ran on a parallel computer with 30 IBM RS/6000 processors doing alpha-beta search
• Searched up to 30 billion positions per move, average depth 14 (able to reach up to 40 plies)
• Evaluation function: 8000 features, many describing highly specific patterns of pieces
• Opening book of ~4000 positions and 700,000 grandmaster games in database
• Working at 200 million positions/sec, even Deep Blue would require 10^100 years to evaluate all possible games.
• (The universe is only ~10^10 years old.)
• Now: algorithmic improvements have allowed programs running on standard PCs
to win World Computer Chess Championships.
• Pruning heuristics reduce the effective branching factor to less than 3
Examples of games: Go
• 1 million trillion trillion trillion trillion more configurations than chess!
• Complexity
• Board of 19×19, b ≈ 361, average game length d ≈ 200
• ~10^174 possible board configurations
• The control of territory is unpredictable
until the endgame
• AlphaGo (2016) by Google
• Beat 9-dan professional Lee Sedol (4-1)
• Machine learning + Monte Carlo search guided by a “value network”
and a “policy network” (implemented using deep neural network
technology)
• Learning from human expert games + learning by itself (self-play games)
An overview of AlphaGo
Optimal decisions in games
The minimax algorithm
• Assume that both agents play optimally from any state 𝑠 to
the end of the game.
• The minimax value of state 𝑠 is the utility (for MAX) of being in 𝑠.
• The minimax value of a terminal state is just its utility.
• MAX prefers to move to a state of maximum value, and MIN
prefers a state of minimum value.
For MAX:
$$\mathrm{MINIMAX}(s) = \begin{cases} \mathrm{UTILITY}(s, \mathit{MAX}) & \text{if } \mathrm{IS\text{-}TERMINAL}(s)\\ \max_{a \in \mathrm{ACTIONS}(s)} \mathrm{MINIMAX}(\mathrm{RESULT}(s,a)) & \text{if } \mathrm{TO\text{-}MOVE}(s) = \mathit{MAX}\\ \min_{a \in \mathrm{ACTIONS}(s)} \mathrm{MINIMAX}(\mathrm{RESULT}(s,a)) & \text{if } \mathrm{TO\text{-}MOVE}(s) = \mathit{MIN} \end{cases}$$
The minimax algorithm
• Recursively proceeds all the way down to the leaf nodes, then backs up the minimax values as the recursion unwinds.
Image credit: Medium
Minimax example: A two-ply game tree
MAX’s best move at the root is 𝑎1, leading to the state with the highest minimax value. MIN’s best reply is 𝑏1, leading to the state with the lowest minimax value.
∆ nodes: MAX’s turn to move; ∇ nodes: MIN’s turn to move. Terminal nodes show the utility values for MAX; other nodes are labeled with their minimax values.
function MINIMAX-SEARCH(game, state) returns an action
player ← game.TO-MOVE(state)
value, move ← MAX-VALUE(game, state)
return move
function MAX-VALUE(game, state) returns a (utility, move) pair
if game.IS-TERMINAL(state) then return game.UTILITY(state, player), null
v ← –∞
for each a in game.ACTIONS(state) do
v2, a2 ← MIN-VALUE(game, game.RESULT(state, a))
if v2 > v then
v, move ← v2, a
return v, move
function MIN-VALUE(game, state) returns a (utility, move) pair
if game.IS-TERMINAL(state) then return game.UTILITY(state, player), null
v ← +∞
for each a in game.ACTIONS(state) do
v2, a2 ← MAX-VALUE(game, game.RESULT(state, a))
if v2 < v then
v, move ← v2, a
return v, move
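The pseudocode translates almost line for line into Python. A minimal sketch, assuming the Game interface introduced earlier; capturing player once at the top also makes explicit the pseudocode's implicit scoping of player inside MAX-VALUE and MIN-VALUE.

import math

def minimax_search(game, state):
    """Return the minimax-optimal action for the player to move."""
    player = game.to_move(state)

    def max_value(s):
        if game.is_terminal(s):
            return game.utility(s, player), None
        v, move = -math.inf, None
        for a in game.actions(s):
            v2, _ = min_value(game.result(s, a))
            if v2 > v:
                v, move = v2, a
        return v, move

    def min_value(s):
        if game.is_terminal(s):
            return game.utility(s, player), None
        v, move = math.inf, None
        for a in game.actions(s):
            v2, _ = max_value(game.result(s, a))
            if v2 < v:
                v, move = v2, a
        return v, move

    return max_value(state)[1]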
What if MIN does not play optimally?
• Against a suboptimal MIN, MAX will do at least as well as against an optimal player.
• However, that does not mean that it is always best to play
the optimal move when facing a suboptimal opponent.
• Consider the following situation.
Optimal play by both sides leads to a draw (TIE).
Optimality in multiplayer games
• The 𝑈𝑇𝐼𝐿𝐼𝑇𝑌 function is extended to return a vector of utilities.
• For terminal states, this vector gives the utility of the state from each
player’s viewpoint.
A three-ply game tree with three players (A, B, C). Each node is labeled with values from the viewpoint of each player. The best move is marked at the root.
Optimality in multiplayer games
• Multiplayer games usually involve alliances, which are made
and broken as the game proceeds.
Quiz 01: Minimax algorithm
• Calculate the minimax value for each of the remaining nodes.
• Which moves should MAX and MIN choose?
Alpha-beta pruning
• Alpha-beta pruning aims to cut back any branches of the
game tree that cannot possibly influence the final decision.
• In the worst case, it explores no more nodes than minimax and always returns the same move.
Alpha-beta pruning
• The two parameters, 𝜶 and 𝜷, describe bounds on the backed-up values that appear anywhere along the path.
• 𝜶 = the value of the best (i.e., highest-value) choice we have found
so far at any choice point along the path for MAX.
• β = the value of the best (i.e., lowest-value) choice we have found so
far at any choice point along the path for MIN.
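Threading 𝛼 and 𝛽 through the recursion gives the following minimal Python sketch (same assumed Game interface as before); each branch is abandoned as soon as its value crosses the opponent's bound.

import math

def alpha_beta_search(game, state):
    """Minimax with alpha-beta pruning; returns the chosen action."""
    player = game.to_move(state)

    def max_value(s, alpha, beta):
        if game.is_terminal(s):
            return game.utility(s, player), None
        v, move = -math.inf, None
        for a in game.actions(s):
            v2, _ = min_value(game.result(s, a), alpha, beta)
            if v2 > v:
                v, move = v2, a
                alpha = max(alpha, v)
            if v >= beta:          # MIN would never allow this: prune
                return v, move
        return v, move

    def min_value(s, alpha, beta):
        if game.is_terminal(s):
            return game.utility(s, player), None
        v, move = math.inf, None
        for a in game.actions(s):
            v2, _ = max_value(game.result(s, a), alpha, beta)
            if v2 < v:
                v, move = v2, a
                beta = min(beta, v)
            if v <= alpha:         # MAX would never allow this: prune
                return v, move
        return v, move

    return max_value(state, -math.inf, math.inf)[1]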
Alpha-beta pruning: An example
• Let the two unevaluated successors of node 𝐶 have values 𝑥 and 𝑦.
• Then the value of the root node is given by
$$\begin{aligned} \mathrm{MINIMAX}(\mathit{root}) &= \max(\min(3,12,8),\ \min(2,x,y),\ \min(14,5,2))\\ &= \max(3,\ \min(2,x,y),\ 2)\\ &= \max(3,\ z,\ 2) \quad \text{where } z = \min(2,x,y) \le 2\\ &= 3 \end{aligned}$$
• The root’s value is independent of 𝑥 and 𝑦, so the leaves 𝑥 and 𝑦 never need to be evaluated.
Good move ordering
• Dynamic move-ordering schemes bring us quite close to the
theoretical limit.
• E.g., trying first the moves that were found to be best in the past
• Iterative deepening also helps: search 1 ply deep and record the best path of moves, then search 1 ply deeper trying that path first. The moves found best in this way are called killer moves, and trying them first is the killer move heuristic.
• A transposition table avoids re-evaluating a state by caching its heuristic value.
• Transpositions are different permutations of the move sequence that
end up in the same position.
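In code, a transposition table can be as simple as a dictionary keyed by (hashable) positions. A minimal sketch, where evaluate stands in for whatever heuristic is being cached:

transposition_table = {}    # position -> cached heuristic value

def cached_value(state, evaluate):
    """Return the cached value of state, computing it on the first visit."""
    if state not in transposition_table:
        transposition_table[state] = evaluate(state)
    return transposition_table[state]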
Quiz 02: Alpha-beta pruning
• Calculate the minimax value for each of the remaining nodes.
• Which moves should MAX and MIN choose?
Imperfect real-time decisions
• Evaluation functions
• Cutting off search
• Forward pruning
• Search versus Lookup
Heuristic minimax
• Both minimax and alpha-beta pruning search all the way to
terminal states.
• Searching to this depth is usually impractical because moves must be made in a reasonable amount of time (~ minutes).
• Cut off the search earlier with some depth limit
• Use an evaluation function
• An estimation for the desirability of position (win, lose, tie?)
Evaluation functions
• The evaluation function should order the terminal states in the same way as the true utility function does:
• States that are wins must evaluate better than draws, which in turn
must be better than losses.
• The computation must not take too long!
• For nonterminal states, the evaluation should be strongly correlated with the actual chances of winning.
Evaluation functions
• For chess, typically linear weighted sum of features
𝑬𝒗𝒂𝒍(𝒔) = 𝒘𝟏 𝒇𝟏 (𝒔) + 𝒘𝟐 𝒇𝟐 (𝒔) + … + 𝒘𝒏 𝒇𝒏 (𝒔)
• where 𝑓𝑖 could be the numbers of each kind of piece on the board,
and 𝑤𝑖 could be the values of the pieces
• E.g., 𝐸𝑣𝑎𝑙(𝑠) = 9𝑞 + 5𝑟 + 3𝑏 + 3𝑛 + 𝑝
• Implicit strong assumption: the contribution of each feature
is independent of the values of the other features.
• E.g., assigning the value 3 to a bishop ignores the fact that bishops are more powerful in the endgame → use a nonlinear combination
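A minimal Python sketch of the linear material count above; the piece-count dictionaries are a simplifying assumption (a real program would extract these features from the board):

# Eval(s) = 9q + 5r + 3b + 3n + p, computed as MAX's material minus MIN's.
PIECE_VALUES = {"q": 9, "r": 5, "b": 3, "n": 3, "p": 1}

def material_eval(counts_max, counts_min):
    """counts_* map a piece letter to how many of that piece a side has."""
    def material(counts):
        return sum(PIECE_VALUES[piece] * n for piece, n in counts.items())
    return material(counts_max) - material(counts_min)

# E.g., a side with 1 queen, 2 rooks, and 8 pawns scores 9 + 10 + 8 = 27.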
Cutting off search
• MINIMAX-CUTOFF is identical to MINIMAX-VALUE except:
1. IS-TERMINAL is replaced by IS-CUTOFF
2. UTILITY is replaced by EVAL
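Spelled out in the style of the earlier minimax recurrence, heuristic minimax at search depth 𝑑 is:
$$\mathrm{H\text{-}MINIMAX}(s, d) = \begin{cases} \mathrm{EVAL}(s, \mathit{MAX}) & \text{if } \mathrm{IS\text{-}CUTOFF}(s, d)\\ \max_{a \in \mathrm{ACTIONS}(s)} \mathrm{H\text{-}MINIMAX}(\mathrm{RESULT}(s,a),\ d+1) & \text{if } \mathrm{TO\text{-}MOVE}(s) = \mathit{MAX}\\ \min_{a \in \mathrm{ACTIONS}(s)} \mathrm{H\text{-}MINIMAX}(\mathrm{RESULT}(s,a),\ d+1) & \text{if } \mathrm{TO\text{-}MOVE}(s) = \mathit{MIN} \end{cases}$$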
A more sophisticated cutoff test
• Quiescent positions are those unlikely to exhibit wild swings
in value in the near future.
• E.g., in chess, positions in which favorable captures can be made are not quiescent for an evaluation function that counts only material
• Quiescence search: expand nonquiescent positions until
quiescent positions are reached.
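A minimal sketch of quiescence search, where evaluate (a heuristic from MAX's viewpoint) and noisy_actions (e.g., the pending captures worth resolving) are hypothetical helpers supplied by the caller:

def quiescence_eval(game, s, evaluate, noisy_actions):
    """Back up values only from quiet positions; keep expanding noisy ones."""
    if game.is_terminal(s):
        return game.utility(s, "MAX")      # assumes "MAX" names that player
    stand_pat = evaluate(s)                # value if we stop searching here
    noisy = list(noisy_actions(s))
    if not noisy:                          # quiescent: safe to evaluate
        return stand_pat
    values = [quiescence_eval(game, game.result(s, a), evaluate, noisy_actions)
              for a in noisy]
    # The side to move may resolve a noisy move or "stand pat" on the estimate.
    if game.to_move(s) == "MAX":
        return max([stand_pat] + values)
    return min([stand_pat] + values)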
Quiescent positions: An example
Two chess positions that differ only in the position of the rook at lower right.
In (a), Black has an advantage of a knight and two pawns, which should be
enough to win the game. In (b), White will capture the queen, giving it an
advantage that should be strong enough to win.
A more sophisticated cutoff test
• Horizon effect: the program is facing an unavoidable serious loss and temporarily avoids it with delaying tactics that push the loss beyond the search horizon.
A more sophisticated cutoff test
• Singular extension: a move that is “clearly better” than all
other moves in a given position.
• When the search reaches the depth limit, the algorithm still allows a legal singular extension to be considered → a deeper search tree, yet cheap because there are only a few singular extensions.
• Beam search
• Forward pruning: consider only a “beam” of the 𝑛 best moves
• Most humans consider only a few moves from each position
• PROBCUT, or probabilistic cut, algorithm (Buro, 1995)
• Search vs. lookup
• Use table lookup rather than search for the opening and the endgame
Stochastic games
Stochastic behaviors
• Uncertain outcomes controlled by chance, not an adversary!
• Why wouldn’t we know what the result of an action will be?
• Explicit randomness: rolling dice
• Unpredictable opponents: the ghosts respond randomly
• Actions can fail: when a robot is moving, wheels might slip
Expectimax search
• Values reflect the average-case (expectimax) outcomes, not
worst-case (minimax) outcomes.
• Expectimax search: compute the average score for optimal
play
• Max nodes are as in minimax search
• Chance nodes are like min nodes, but the outcome is uncertain
• Calculate expected utilities, i.e., take the probability-weighted average of the children’s values
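For example (with made-up numbers), a chance node whose children have values 8, 24, and −12 with probabilities 1/2, 1/3, and 1/6 gets the expected value 8·(1/2) + 24·(1/3) + (−12)·(1/6) = 4 + 8 − 2 = 10.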
Expectimax search
• For minimax, the scale of the evaluation function doesn't matter
• Any monotonic transformation preserves the decision: we only need better states to have higher evaluations
• For expectimax, magnitudes must be meaningful, since values are averaged
Expectimax search: Pseudo code
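One possible rendering in Python, reusing the Game interface from earlier plus a hypothetical game.chance_outcomes(s) that yields (probability, action) pairs at chance nodes:

def expectimax_value(game, s, player):
    """MAX nodes maximize; chance nodes take probability-weighted averages."""
    if game.is_terminal(s):
        return game.utility(s, player)
    if game.to_move(s) == player:                    # MAX node
        return max(expectimax_value(game, game.result(s, a), player)
                   for a in game.actions(s))
    # Chance node: expected value over the possible outcomes.
    return sum(p * expectimax_value(game, game.result(s, a), player)
               for p, a in game.chance_outcomes(s))

Note there is no general pruning counterpart here: a single unexplored outcome can shift the average, which is why the pruning discussed next requires a bounded range of values.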
Expectimax pruning
Expectimax pruning
• Pruning is possible only with knowledge of a fixed, bounded range of values.
Depth-limited expectimax