03 Adversarial Search
*************************
- GAMES -
*************************
[~] Covers competitive environments, in which the agents' goals are in conflict,
giving rise to adversarial search problems.
[~] Begins with a definition of the optimal move and an algorithm for finding it,
then looks at techniques for choosing a good move when time is limited.
[-] Pruning allows us to ignore portions of the search tree that make no
difference to the final choice.
[-] Heuristic evaluation functions allow us to approximate the true utility of
a state without doing a complete search.
[~] Consider two-player games between MAX and MIN. A game can be formally
defined by the following elements (see the sketch after this list):
[-] S0: the initial state
[-] PLAYER(S): defines which player has the move in state S
[-] ACTIONS(S): the set of legal moves in a state
[-] RESULT(S, a): the transition model, defines the result of a move
[-] TERMINAL-TEST(S): true when the game is over.
[-] UTILITY(S, p): the utility function defines the final numeric value for a game
that ends in terminal state S for player p (e.g., in chess: 1 for a win, 0 for a
loss, 1/2 for a draw).
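A minimal Python sketch of this interface, using a toy "subtraction game"
(players alternately remove 1 or 2 stones; whoever takes the last stone wins).
All names here, such as NimGame, are illustrative assumptions, not from the text:

    class NimGame:
        """Toy game implementing the formal elements S0, PLAYER, ACTIONS,
        RESULT, TERMINAL-TEST, and UTILITY described above."""

        def initial_state(self):            # S0
            return (7, 'MAX')               # (stones left, player to move)

        def player(self, s):                # PLAYER(S)
            return s[1]

        def actions(self, s):               # ACTIONS(S): legal moves
            return [a for a in (1, 2) if a <= s[0]]

        def result(self, s, a):             # RESULT(S, a): transition model
            return (s[0] - a, 'MIN' if s[1] == 'MAX' else 'MAX')

        def terminal_test(self, s):         # TERMINAL-TEST(S)
            return s[0] == 0

        def utility(self, s, p):            # UTILITY(S, p): 1 win, 0 loss
            # The player who just took the last stone wins, so the player
            # to move in the terminal state has lost.
            return 0 if s[1] == p else 1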
**********************************************
- OPTIMAL DECISIONS IN GAMES -
**********************************************
[~] In game parlance, one move consists of two half-moves, one from each
player; each half-move is called a ply. A tree that is one move deep is thus
two plies deep.
[~] The optimal strategy can be determined from the minimax value of each node,
MINIMAX(n): the utility of being in the corresponding state, assuming both
players play optimally.
[-] The minimax value of a terminal state is simply its utility.
[-] MINIMAX(S) =
        UTILITY(S)                          if TERMINAL-TEST(S)
        max_a MINIMAX(RESULT(S, a))         if PLAYER(S) = MAX
        min_a MINIMAX(RESULT(S, a))         if PLAYER(S) = MIN
[~] The minimax algorithm computes the minimax decision from the current state
(see the sketch after this list).
[-] Uses a simple recursive computation of the minimax values of each successor
state.
[-] Minimax values are backed up through the tree as the recursion unwinds.
[-] Performs a complete depth-first exploration of the game tree => time
complexity O(b^m), where b is the number of legal moves at each point and m is
the maximum depth of the tree.
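A straightforward recursive implementation of the minimax decision, written
against the hypothetical game interface sketched earlier (a sketch of the
idea, not the textbook's pseudocode verbatim):

    def minimax_decision(game, state):
        """Return the action for MAX that leads to the highest minimax value."""
        return max(game.actions(state),
                   key=lambda a: min_value(game, game.result(state, a)))

    def max_value(game, state):
        if game.terminal_test(state):
            return game.utility(state, 'MAX')
        return max(min_value(game, game.result(state, a))
                   for a in game.actions(state))

    def min_value(game, state):
        if game.terminal_test(state):
            return game.utility(state, 'MAX')
        return min(max_value(game, game.result(state, a))
                   for a in game.actions(state))

For the toy game above, minimax_decision(NimGame(), (7, 'MAX')) should pick a
move that leaves the opponent a multiple of 3 stones, the losing position in
the subtraction game.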
[~] For multiplayer games, replace the single value at each node with a vector
of values (sketched below).
[-] The backed-up value of a node n is always the utility vector of the
successor state with the highest value for the player choosing at n.
[-] Multiplayer games usually involve alliances: collaboration can emerge from
purely selfish behavior.
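A short sketch of this multiplayer backup rule, assuming a hypothetical helper
game.utility_vector(s) that returns a mapping from each player to that
player's payoff:

    def vector_value(game, state):
        """Back up the utility vector of the successor that is best for
        the player choosing at this node (purely selfish play)."""
        if game.terminal_test(state):
            return game.utility_vector(state)   # hypothetical helper
        p = game.player(state)
        return max((vector_value(game, game.result(state, a))
                    for a in game.actions(state)),
                   key=lambda v: v[p])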
**************************************
- ALPHA-BETA PRUNING -
**************************************
[~] Problem with minimax search: the number of game states is exponential in
the depth of the tree.
[-] We cannot eliminate the exponent, but we can effectively cut it in half.
[-] It is possible to compute the correct minimax decision without looking at
every node.
=> Alpha-beta pruning.
[~] Alpha-beta pruning can be applied to trees of any depth, and it is often
possible to prune entire subtrees.
[~] General principle: consider a node n somewhere in the tree, such that
Player has a choice of moving to it.
[-] If Player has a better choice m, either at the parent node of n or at any
choice point further up, then n will never be reached in actual play.
[~] alpha: the value of the best choice found so far at any choice point along
the path for MAX
beta: the value of the best choice found so far at any choice point along
the path for MIN
[-] Alpha-beta search updates the values of alpha and beta as it goes along,
and prunes the remaining branches at a node as soon as the value of the current
node is known to be worse than the current alpha or beta value for MAX or MIN,
respectively (see the sketch below).
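A sketch of alpha-beta search over the same hypothetical game interface;
alpha and beta are the bounds described above, and the early returns are the
prunes:

    def alpha_beta_decision(game, state):
        """Pick MAX's move using alpha-beta pruning."""
        return max(game.actions(state),
                   key=lambda a: ab_min(game, game.result(state, a),
                                        float('-inf'), float('inf')))

    def ab_max(game, state, alpha, beta):
        if game.terminal_test(state):
            return game.utility(state, 'MAX')
        v = float('-inf')
        for a in game.actions(state):
            v = max(v, ab_min(game, game.result(state, a), alpha, beta))
            if v >= beta:       # MIN above would never let play reach here
                return v        # prune the remaining successors
            alpha = max(alpha, v)
        return v

    def ab_min(game, state, alpha, beta):
        if game.terminal_test(state):
            return game.utility(state, 'MAX')
        v = float('inf')
        for a in game.actions(state):
            v = min(v, ab_max(game, game.result(state, a), alpha, beta))
            if v <= alpha:      # MAX above already has a better choice m
                return v        # prune the remaining successors
            beta = min(beta, v)
        return v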
[~] Move ordering: the effectiveness of alpha-beta pruning is highly dependent
on the order in which states are examined.
[-] It may be worthwhile to examine first the successors that are likely to
be best.
[-] If this can be done, time complexity reduces to O(b^(m/2)): the effective
branching factor becomes sqrt(b) instead of b.
[+] Alpha-beta can then solve a tree twice as deep as minimax in the same
amount of time.
[-] Can add dynamic move-ordering schemes: try first the moves that were
found to be best in the past.
[+] Can apply iterative deepening search: search 1 ply deep first, then
2 plies, and so on.
[-] Repeated states may occur frequently due to transpositions: different
permutations of the move sequence that end up in the same position.
[+] Worthwhile to store the evaluation of the resulting position in a hash
table the first time it is encountered => transposition table (sketched below).
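A minimal sketch of a transposition table: memoized minimax, where each
position is evaluated once even if reached by several move orders. Combining
a table with alpha-beta is subtler in practice, since a pruned search yields
only a bound; real engines therefore store the bound type and search depth as
well:

    def minimax_tt(game, state, table):
        """Minimax with a transposition table (assumes states are hashable)."""
        if state in table:
            return table[state]          # position already evaluated
        if game.terminal_test(state):
            v = game.utility(state, 'MAX')
        elif game.player(state) == 'MAX':
            v = max(minimax_tt(game, game.result(state, a), table)
                    for a in game.actions(state))
        else:
            v = min(minimax_tt(game, game.result(state, a), table)
                    for a in game.actions(state))
        table[state] = v                 # store for future transpositions
        return v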
*************************************************
- IMPERFECT REAL-TIME DECISIONS -
*************************************************
[~] Should cut off the search earlier and apply a heuristic evaluation function
to states.
[-] Replace the utility function by a heuristic evaluation function EVAL,
which estimates the position's utility.
[-] Replace the terminal test by a cutoff test that decides when to apply EVAL.
[~] Evaluation function: estimates the expected utility of the game from a
given position.
[-] First, the evaluation function should order terminal states the same way
the true utility function does: win states must be evaluated better than draws.
[-] Second, the computation must not take too long!
[-] Finally, for nonterminal states, the evaluation function should be strongly
correlated with the actual chances of winning.
[-] Most evaluation functions compute separate numerical contributions from
each feature and then combine them to find the total value (see the sketch
below).
[+] Mathematically, this is called a weighted linear function:
EVAL(s) = w1*f1(s) + w2*f2(s) + ... + wn*fn(s), where fi(s) is a feature
of the state and wi is the weight of the corresponding feature.
[+] Can lead to errors due to the approximate nature of the evaluation
function.
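A small sketch of a weighted linear evaluation function; the material features
and weights below are illustrative chess conventions, not values from the
text:

    def eval_weighted_linear(state, features, weights):
        """EVAL(s) = w1*f1(s) + w2*f2(s) + ... + wn*fn(s)."""
        return sum(w * f(state) for w, f in zip(weights, features))

    # Example: material balance from MAX's point of view, where state is
    # assumed to be a dict of piece counts such as {'P': 8, 'p': 7, ...}.
    features = [
        lambda s: s.get('P', 0) - s.get('p', 0),   # pawn difference
        lambda s: s.get('N', 0) - s.get('n', 0),   # knight difference
        lambda s: s.get('Q', 0) - s.get('q', 0),   # queen difference
    ]
    weights = [1, 3, 9]   # classic material values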
[~] Need a more sophisticated cutoff test (a depth-limited sketch follows this
list):
[-] The EVAL function should only be applied to quiescent positions: those
unlikely to exhibit wild swings in value in the near future.
[-] Horizon effect: arises when facing an opponent's move that causes serious
damage and is ultimately unavoidable.
[+] Can be postponed temporarily by delaying tactics.
[+] Can mitigate the horizon effect with the singular extension: a move
that is "clearly better" than all others in a given position.
[-] Forward pruning: prune some moves at a node without further consideration.
[+] Can use beam search.
[+] Dangerous => use ProbCut (a forward-pruning approach based on
statistics gained from prior experience).
[-] Table lookup: rather than search, use lookup tables for the openings and
endings of games.
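A depth-limited sketch of heuristic minimax, where a cutoff test replaces the
terminal test and EVAL replaces UTILITY (the fixed depth limit here stands in
for a real quiescence-aware cutoff):

    def h_minimax(game, state, depth, eval_fn, cutoff_depth):
        """Heuristic minimax with a simple depth-based cutoff test."""
        if game.terminal_test(state):
            return game.utility(state, 'MAX')
        if depth >= cutoff_depth:        # CUTOFF-TEST(state, depth)
            return eval_fn(state)        # apply EVAL instead of searching on
        values = [h_minimax(game, game.result(state, a), depth + 1,
                            eval_fn, cutoff_depth)
                  for a in game.actions(state)]
        return max(values) if game.player(state) == 'MAX' else min(values)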
************************************
- STOCHASTIC GAMES -
************************************
[~] Unpredictable external events can put us into unforeseen situations
=> stochastic games.
[~] The game tree must include chance nodes in addition to MAX and MIN nodes.
[-] Branches leading from each chance node denote the possible dice rolls, for
example.
[~] We still need to make correct decisions, but positions no longer have
definite minimax values.
[-] We can, however, calculate the expected value of a position.
[-] This leads us to generalize the minimax value for deterministic games to an
expectiminimax value for games with chance nodes.
[+] Terminal, MAX, and MIN nodes (for which the dice roll is known) work
exactly the same way as before.
[+] For chance nodes, we compute the expected value: the sum of the values
over all outcomes, weighted by the probability of each outcome.
EXPECTIMINIMAX(S) =
        UTILITY(S)                                 if TERMINAL-TEST(S)
        max_a EXPECTIMINIMAX(RESULT(S, a))         if PLAYER(S) = MAX
        min_a EXPECTIMINIMAX(RESULT(S, a))         if PLAYER(S) = MIN
        sum_r P(r) * EXPECTIMINIMAX(RESULT(S, r))  if PLAYER(S) = CHANCE
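A sketch of expectiminimax, assuming a hypothetical helper
game.chance_outcomes(s) that yields (outcome, probability) pairs such as dice
rolls:

    def expectiminimax(game, state):
        """MAX/MIN nodes back up max/min as before; chance nodes back up
        the probability-weighted average over all outcomes."""
        if game.terminal_test(state):
            return game.utility(state, 'MAX')
        player = game.player(state)
        if player == 'MAX':
            return max(expectiminimax(game, game.result(state, a))
                       for a in game.actions(state))
        if player == 'MIN':
            return min(expectiminimax(game, game.result(state, a))
                       for a in game.actions(state))
        # chance node: expected value, weighted by each outcome's probability
        return sum(p * expectiminimax(game, game.result(state, r))
                   for r, p in game.chance_outcomes(state))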
[~] The presence of chance nodes makes the evaluation function more sensitive:
[-] The program can behave totally differently if the scale of some evaluation
values is changed!
[-] To avoid this sensitivity, EVAL must be a positive linear transformation of
the probability of winning from a position.