AI & ML (Unit-1)
What is Search?
a) Search is the systematic examination of states to find a path from the start/root
state to the goal state.
b) The set of possible states, together with operators defining their connectivity
constitute the search space.
c) The output of a search algorithm is a solution, that is, a path from the initial
state to a state that satisfies the goal test.
Fig: 1.13
A transition model, which describes what each action does. RESULT(s, a)
returns the state that results from doing action a in state s. For example,
RESULT(Arad, ToZerind) = Zerind.
An action cost function, denoted by ACTION-COST(s, a, s′) when we are
programming or c(s, a, s′) when we are doing math, that gives the numeric cost
of applying action a in state s to reach state s′. A problem-solving agent should
use a cost function that reflects its own performance measure; for example, for
route-finding agents, the cost of an action might be the length in miles or it
might be the time it takes to complete the action.
A sequence of actions forms a path, and a solution is a path from the initial
state to a goal state. We assume that action costs are additive; that is, the total
cost of a path is the sum of the individual action costs.
An optimal solution has the lowest path cost among all solutions.
The state space can be represented as a graph in which the vertices are states
and the directed edges between them are actions. The map of Romania shown
in figure is such a graph, where each road indicates two actions, one in each
direction.
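To make these components concrete, here is a small illustrative sketch in Python (an assumption about how one might encode them, not part of the original notes; the road distances are the standard Romania map values). ACTIONS, RESULT, and ACTION-COST become simple functions over a tiny road map, and a path cost is the sum of the action costs.

# A minimal, hypothetical encoding of a route-finding problem fragment.
# Each road gives two actions, one in each direction; distances are road lengths.
ROADS = {
    ("Arad", "Sibiu"): 140,
    ("Arad", "Timisoara"): 118,
    ("Arad", "Zerind"): 75,
    ("Sibiu", "Fagaras"): 99,
    ("Fagaras", "Bucharest"): 211,
}
ROADS.update({(b, a): d for (a, b), d in list(ROADS.items())})   # make roads two-way

def actions(state):
    """The cities reachable by one road from `state` (the applicable actions)."""
    return [b for (a, b) in ROADS if a == state]

def result(state, action):
    """Transition model RESULT(s, a): here an action simply names the next city."""
    return action

def action_cost(state, action, next_state):
    """ACTION-COST(s, a, s'): the length of the road between the two cities."""
    return ROADS[(state, next_state)]

def path_cost(path):
    """Action costs are additive: the cost of a path is the sum of its step costs."""
    return sum(action_cost(a, b, b) for a, b in zip(path, path[1:]))

print(path_cost(["Arad", "Sibiu", "Fagaras", "Bucharest"]))   # 140 + 99 + 211 = 450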
1.7.2 Formulating problems
We derive a formulation of the problem in terms of the initial state, successor
function, goal test, and path cost.
Our formulation of the problem of getting to Bucharest is a model—an
abstract mathematical description—and not the real thing.
Compare the simple atomic state description Arad to an actual cross-country
trip, where the state of the world includes so many things: the traveling
companions, the current radio program, the scenery out of the window, the
proximity of law enforcement officers, the distance to the next rest stop, the
condition of the road, the weather, the traffic, and so on.
All these considerations are left out of our model because they are irrelevant
to the problem of finding a route to Bucharest.
The process of removing detail from a representation is called abstraction.
Fig: 1.14
b. 8-puzzle:
An 8-puzzle consists of a 3x3 board with eight numbered tiles and a blank space.
A tile adjacent to the blank space can slide into the space. The object is to reach the
specified goal state, as shown in figure 1.15.
Fig: 1.15
c. 8-queens problem
The goal of 8-queens problem is to place 8 queens on the chessboard such that no
queen attacks any other. (A queen attacks any piece in the same row, column or
diagonal).
The following figure shows an attempted solution that fails: the queen in the
rightmost column is attacked by the queen at the top left.
An incremental formulation involves operators that augment the state description,
starting with an empty state; for the 8-queens problem, this means each action adds a
queen to the state.
A complete-state formulation starts with all 8 queens on the board and moves them
around.
In either case the path cost is of no interest because only the final state counts.
The first incremental formulation one might try is the following :
States : Any arrangement of 0 to 8 queens on board is a state.
Initial state : No queen on the board.
Successor function : Add a queen to any empty square.
Goal Test : 8 queens are on the board, none attacked.
In this formulation, we have 64 · 63 · ... · 57 ≈ 3 × 10^14 possible sequences to investigate.
A better formulation would prohibit placing a queen in any square that is already
attacked.
States : Arrangements of n queens (0 ≤ n ≤ 8), one per column in the leftmost n
columns, with no queen attacking another, are states.
Successor function : Add a queen to any square in the leftmost empty
column such that it is not attacked by any other queen.
This formulation reduces the 8-queens state space from 3 × 10^14 to just 2,057, and
solutions are easy to find.
For 100 queens the initial formulation has roughly 10^400 states, whereas the
improved formulation has about 10^52 states.
This is a huge reduction, but the improved state space is still too big for the
algorithms to handle.
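The improved formulation relies on a test for whether a square is attacked by a queen already placed. Below is a minimal sketch of that test and of the column-by-column successor function (the helper names are assumptions, not from the text):

def attacked(rows_so_far, row):
    """rows_so_far[i] is the row of the queen already placed in column i.
    Return True if a queen placed at `row` in the next column would be attacked."""
    col = len(rows_so_far)
    return any(r == row or abs(r - row) == col - c
               for c, r in enumerate(rows_so_far))

def successors(rows_so_far):
    """Successor function: add a queen to any non-attacked square in the
    leftmost empty column (attacked squares are never generated)."""
    return [rows_so_far + [r] for r in range(8) if not attacked(rows_so_far, r)]

print(len(successors([])))    # 8 choices for the first column
print(len(successors([0])))   # only 6 squares in the second column are not attacked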
TOURING PROBLEMS
Touring problems are closely related to route-finding problems, but with an important
difference.
Consider, for example, the problem "Visit every city at least once," as shown in the
Romania map.
As with route-finding the actions correspond to trips between adjacent cities.
The initial state would be "In Bucharest; visited {Bucharest}". An intermediate state would
be "In Vaslui; visited {Bucharest, Urziceni, Vaslui}".
Goal test would check whether the agent is in Bucharest and all 20 cities have been
visited.
In the water jug problem in Artificial Intelligence, we are provided with two jugs:
one having the capacity to hold 3 gallons of water and the other having the capacity to
hold 4 gallons. There is no other measuring equipment available and the jugs do not
have any kind of marking on them. The agent's task is to fill the 4-gallon jug with
exactly 2 gallons of water by using only these two jugs and no other material.
Initially, both jugs are empty.
So, to solve this problem, the following set of rules was proposed, as shown in figure 1.16.
Production rules for solving the water jug problem
Here, let x denote the 4-gallon jug and y denote the 3-gallon jug.
Fig 1.16
The listed production rules contain all the actions that could be performed by the
agent in transferring the contents of the jugs. But, to solve the water jug problem in a
minimum number of moves, the following rules should be applied in the given
sequence, as shown in figure 1.17.
Fig 1.17 Solution of water jug problem according to the production rules
Among the rules applied: if (x + y) ≥ 4 and y > 0, pour water from the 3-gallon jug into the 4-gallon jug until it is full, giving the state (4, y − (4 − x)); if (x + y) ≥ 3 and x > 0, pour water from the 4-gallon jug into the 3-gallon jug until it is full, giving the state (x − (3 − y), 3).
On reaching the 7th attempt, we reach a state which is our goal state. Therefore,
at this state, our problem is solved.
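The same production rules can also be explored mechanically. The sketch below is an illustrative assumption (it does not follow the figure's exact rule numbering): a breadth-first search over (x, y) states, where x is the 4-gallon jug and y is the 3-gallon jug, stopping when x holds 2 gallons.

from collections import deque

def next_states(x, y):
    """All states reachable from (x, y) by one rule: fill, empty, or pour
    between the 4-gallon jug (x) and the 3-gallon jug (y)."""
    pour_xy = min(x, 3 - y)            # amount that can be poured from x into y
    pour_yx = min(y, 4 - x)            # amount that can be poured from y into x
    return {
        (4, y), (x, 3),                # fill either jug
        (0, y), (x, 0),                # empty either jug
        (x - pour_xy, y + pour_xy),    # pour x -> y
        (x + pour_yx, y - pour_yx),    # pour y -> x
    }

def solve(start=(0, 0)):
    frontier, parent = deque([start]), {start: None}
    while frontier:
        state = frontier.popleft()
        if state[0] == 2:                    # goal test: 2 gallons in the 4-gallon jug
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for s in next_states(*state):
            if s not in parent:
                parent[s] = state
                frontier.append(s)

print(solve())   # a 6-step solution, e.g. (0,0) (0,3) (3,0) (3,3) (4,2) (0,2) (2,0)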
1.9 SEARCH ALGORITHMS
A search algorithm takes a search problem as input and returns a solution, or an
indication of failure.
We consider algorithms that try to find a path that reaches a goal state.
Each node in the search tree corresponds to a state in the state space and the edges in
the search tree correspond to actions.
The root of the tree corresponds to the initial state of the problem.
The state space describes the set of states in the world, and the actions that allow
transitions from one state to another.
The search tree describes paths between these states, reaching towards the goal. The
search tree may have multiple paths to any given state, but each node in the tree has a
unique path back to the root (as in all trees)
Figure 1.18 shows the first few steps in finding a path from Arad to Bucharest.
The root node of the search tree is at the initial state, Arad.
We can expand the node, by considering the available ACTIONS for that state, using
the RESULT function to see where those actions lead to, and generating a new node
(called a child node or successor node) for each of the resulting states. Each child
node has Arad as its parent node.
At each stage, we have expanded every node on the frontier, extending every path
with all applicable actions that don’t result in a state that has already been reached.
At the third stage, the topmost city (Oradea) has two successors, both of which have
already been reached by other paths, so no paths are extended from Oradea.
Nodes that have been expanded and nodes on the frontier that have been generated are
shown. Nodes that could be generated next are shown in faint dashed lines. In the
bottom tree there is a cycle from Arad to Sibiu to Arad; that can’t be an optimal path,
so search should not continue from there.
Fig: 1.18
Now we must choose which of these three child nodes to consider next. This is the
essence of search—following up one option now and putting the others aside for later.
Suppose we choose to expand Sibiu first; this results in a set of six unexpanded nodes. We call
this set the frontier of the search tree. We say that any state that has had a node generated
for it has been reached (whether or not that node has been expanded).
1.9.1 Best-first search
How do we decide which node from the frontier to expand next?
A very general approach is called best-first search, in which we choose a
node n with the minimum value of some evaluation function f(n); the algorithm
is shown in figure 1.19.
On each iteration we choose a node on the frontier with minimum value,
return it if its state is a goal state, and otherwise apply EXPAND to generate
child nodes.
Each child node is added to the frontier if it has not been reached before, or is
re-added if it is now being reached with a path that has a lower path cost than
any previous path.
The algorithm returns either an indication of failure, or a node that represents
a path to a goal. By employing different functions, we get different specific
algorithms, which this chapter will cover.
Fig 1.19 The best-first search algorithm, and the function for expanding a node.
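As a companion to the pseudocode of figure 1.19, here is a compact Python sketch of the same idea (an illustrative assumption, not the book's exact algorithm): a priority queue keyed on f, and a reached table holding the best path cost found so far for each state.

import heapq

def best_first_search(start, goal_test, successors, f):
    """successors(state) yields (action_cost, next_state) pairs;
    f(state, path_cost) is the evaluation function to be minimized."""
    frontier = [(f(start, 0), 0, start, [start])]      # entries: (f-value, g, state, path)
    reached = {start: 0}                               # best path cost found so far per state
    while frontier:
        _, g, state, path = heapq.heappop(frontier)    # node with minimum f
        if goal_test(state):
            return path, g
        for cost, nxt in successors(state):            # expand the node
            g2 = g + cost
            if nxt not in reached or g2 < reached[nxt]:    # new state, or a cheaper path to it
                reached[nxt] = g2
                heapq.heappush(frontier, (f(nxt, g2), g2, nxt, path + [nxt]))
    return None                                        # failure

With f(state, g) = g this behaves as uniform-cost search; with f(state, g) = g + h(state) it becomes the A* search of section 2.1.2.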
Search data structures
Search algorithms require a data structure to keep track of the search tree. A node in
the tree is represented by a data structure with four components:
node.STATE: the state to which the node corresponds;
node.PARENT: the node in the tree that generated this node;
node.ACTION: the action that was applied to the parent’s state to generate this node;
node.PATH-COST: the total cost of the path from the initial state to this node. In
mathematical formulas, we use g(node) as a synonym for PATH-COST.
We need a data structure to store the frontier.
The appropriate choice is a queue of some kind, because the operations on a frontier
are:
IS-EMPTY(frontier) returns true only if there are no nodes in the frontier.
POP(frontier) removes the top node from the frontier and returns it.
TOP(frontier) returns (but does not remove) the top node of the frontier.
ADD(node, frontier) inserts node into its proper place in the queue.
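A sketch of these pieces in Python (field and function names are assumptions), showing the four node components and the queue operations listed above; a priority queue ordered by path cost is used here:

import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Node:
    path_cost: float                                        # PATH-COST (used as the priority)
    state: object = field(compare=False)                    # STATE
    parent: "Node" = field(default=None, compare=False)     # PARENT
    action: object = field(default=None, compare=False)     # ACTION

def is_empty(frontier):  return len(frontier) == 0          # IS-EMPTY
def pop(frontier):       return heapq.heappop(frontier)     # POP: remove and return the top node
def top(frontier):       return frontier[0]                 # TOP: return but do not remove
def add(node, frontier): heapq.heappush(frontier, node)     # ADD: insert at its proper place

frontier = []
add(Node(0, "Arad"), frontier)
add(Node(140, "Sibiu", parent=top(frontier), action="Arad->Sibiu"), frontier)
print(pop(frontier).state)   # "Arad" - the lowest-cost node comes out first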
MEASURING PROBLEM-SOLVING PERFORMANCE
The output of a problem-solving algorithm is either failure or a solution.
The algorithm's performance can be measured in four ways:
i. Completeness: Is the algorithm guaranteed to find a solution when
there is one?
ii. Optimality : Does the strategy find the optimal solution?
iii. Time complexity: How long does it take to find a solution?
iv. Space complexity: How much memory is needed to perform the
search?
1.10 UNINFORMED SEARCH STRATEGIES (Blind search)
The term means that the strategies have no additional information about states
beyond that provided in the problem definition.
All they can do is generate successors and distinguish a goal state from a non-goal
state.
All search strategies are distinguished by the order in which nodes are expanded.
1.10.1 Breadth-first search
Breadth-first search is a simple strategy in which the root node is expanded first,
then all the successors of the root node are expanded next, then their successors,
and so on.
In general, all the nodes at a given depth in the search tree are expanded before any
nodes at the next level are expanded.
BFS is an instance of the general graph-search algorithm in which the shallowest
unexpanded node is chosen for expansion. This is achieved very simply by using
a FIFO queue for the frontier.
Fig: 1.20
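A minimal BFS sketch (illustrative; it assumes a successors(state) function that returns the neighboring states), using a FIFO queue of paths for the frontier:

from collections import deque

def breadth_first_search(start, goal_test, successors):
    if goal_test(start):                       # goal test applied when a node is generated
        return [start]
    frontier = deque([[start]])                # FIFO queue of paths
    reached = {start}                          # states that already have a node
    while frontier:
        path = frontier.popleft()              # shallowest unexpanded node first
        for nxt in successors(path[-1]):
            if nxt not in reached:
                if goal_test(nxt):
                    return path + [nxt]
                reached.add(nxt)
                frontier.append(path + [nxt])
    return None                                # failure: no solution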
(Depth-first search, by contrast, always expands the deepest node in the frontier first.) For a
state space with branching factor b and maximum depth m, depth-first search
requires storage of only O(bm) nodes.
1.10.4 Depth-limited search
The embarrassing failure of depth-first search in infinite state spaces can be
alleviated by supplying depth-first search with a predetermined depth limit l.
That is, nodes at depth l are treated as if they have no successors. This approach is
called depth-limited search.
The depth limit solves the infinite-path problem. Unfortunately, it also introduces an
additional source of incompleteness if we choose l < d, that is, if the shallowest goal is
beyond the depth limit.
Depth-limited search will also be non-optimal if we choose l > d. Its time complexity
is O(b^l) and its space complexity is O(bl). Depth-first search can be viewed as a
special case of depth-limited search with l = ∞.
Fig: 1.22 The Recursive implementation of Depth-limited tree search:
1.10.5 Iterative deepening search
Iterative deepening search is used in combination with depth-first tree search, that
finds the best depth limit.
It does this by gradually increasing the limit: first 0, then 1, then 2, and so on,
until a goal is found. This will occur when the depth limit reaches d, the depth of the
shallowest goal node.
The algorithm is shown in figure 1.23; it repeatedly applies depth-limited
search with increasing limits. It terminates when a solution is found or when the
depth-limited search returns failure, meaning that no solution exists.
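A sketch of depth-limited search and the iterative-deepening driver around it (illustrative, again assuming a successors(state) function that returns neighboring states):

def depth_limited_search(state, goal_test, successors, limit, path=None):
    """Recursive depth-limited search: returns a path, 'cutoff', or None (failure)."""
    path = (path or []) + [state]
    if goal_test(state):
        return path
    if limit == 0:
        return "cutoff"                            # the limit was hit; a goal may lie deeper
    cutoff_occurred = False
    for nxt in successors(state):
        if nxt in path:                            # avoid cycles along the current path
            continue
        result = depth_limited_search(nxt, goal_test, successors, limit - 1, path)
        if result == "cutoff":
            cutoff_occurred = True
        elif result is not None:
            return result
    return "cutoff" if cutoff_occurred else None

def iterative_deepening_search(start, goal_test, successors, max_depth=50):
    """Repeatedly apply depth-limited search with limits 0, 1, 2, ..."""
    for limit in range(max_depth + 1):
        result = depth_limited_search(start, goal_test, successors, limit)
        if result != "cutoff":
            return result                          # a solution path, or None if none exists
    return "cutoff"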
Fig: 1.23
Fig: 1.24
Advantage:
Bidirectional search is fast and it requires less memory.
Disadvantage:
We should know the goal state in advance.
Performance Evaluation
Completeness Bidirectional search is complete if branching factor b is finite and if we
use BFS in both searches.
Optimality Bidirectional search is optimal.
Time Complexity: O(b^(d/2)) if it uses BFS in both directions (where b is the branching factor
and d is the depth of the search tree, i.e., the number of levels).
Space Complexity: O(b^(d/2)).
2.1 Informed (Heuristic) Search Strategies
Informed search strategy is one that uses problem-specific knowledge beyond the
definition of the problem itself.
It can find solutions more efficiently than an uninformed strategy.
The hints come in the form of a heuristic function, denoted h(n).
Where, h(n) = estimated cost of the cheapest path from the state at node n to a goal state.
For example, in route-finding problems, we can estimate the distance from the current
state to a goal by computing the straight-line distance on the map between the two
points.
Best-first search
Best-first search is an instance of general TREE-SEARCH or GRAPH-SEARCH
algorithm in which a node is selected for expansion based on an evaluation function
f(n).
The node with the lowest evaluation is selected for expansion, because the evaluation
measures the distance to the goal.
This can be implemented using a priority queue, a data structure that maintains the
fringe in ascending order of f-values.
Heuristic functions
A heuristic function, or simply a heuristic, is a function that ranks alternatives in
various search algorithms at each branching step, based on the available information, in
order to decide which branch to follow during a search.
The key component of the best-first search algorithm is a heuristic function, denoted by
h(n):
h(n) = estimated cost of the cheapest path from node n to a goal node.
For example, in Romania, one might estimate the cost of the cheapest path from Arad
to Bucharest via the straight-line distance from Arad to Bucharest (Figure 2.1).
Heuristic functions are the most common form in which additional knowledge is
imparted to the search algorithm.
Fig: 2.2
Figure 2.2 shows the progress of greedy best-first search using h_SLD to find a path from Arad
to Bucharest. The first node to be expanded from Arad will be Sibiu, because it is closer to
Bucharest than either Zerind or Timisoara. The next node to be expanded will be
Fagaras, because it is closest. Fagaras in turn generates Bucharest, which is the goal.
Properties of greedy search
Complete?? No – can get stuck in loops; complete in finite spaces with repeated-state checking.
Time?? O(b^m), but a good heuristic can give dramatic improvement.
Space?? O(b^m) – keeps all nodes in memory.
Optimal?? No.
Greedy best-first search is not optimal, and it is incomplete.
The worst-case time and space complexity is O(b^m), where m is the maximum depth
of the search space.
2.1.2 A* Search
A* Search is the most widely used form of best-first search. The evaluation function
f(n) is
obtained by combining
i. g(n) = the cost to reach the node, and
ii. h(n) = the estimated cost to get from the node to the goal:
f(n) = g(n) + h(n).
A* search is both optimal and complete. A* is optimal if h(n) is an admissible
heuristic. The obvious example of an admissible heuristic is the straight-line distance
h_SLD, which can never be an overestimate.
A* search is optimal if h(n) is an admissible heuristic – that is, provided that h(n)
never overestimates the cost to reach the goal.
An obvious example of an admissible heuristic is the straight-line distance h_SLD that
we used in getting to Bucharest. The progress of an A* tree search for Bucharest is
shown in Figure 2.2.
The values of g are computed from the step costs shown in the Romania map
(figure 2.1). Also, the values of h_SLD are given in figure 2.1.
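A self-contained A* sketch on a fragment of the Romania map (the road lengths and straight-line distances are the standard values; the function and variable names are assumptions, not from the notes):

import heapq

ROAD_MAP = {"Arad": {"Sibiu": 140, "Timisoara": 118, "Zerind": 75},
            "Sibiu": {"Arad": 140, "Fagaras": 99, "Rimnicu Vilcea": 80},
            "Rimnicu Vilcea": {"Sibiu": 80, "Pitesti": 97},
            "Pitesti": {"Rimnicu Vilcea": 97, "Bucharest": 101},
            "Fagaras": {"Sibiu": 99, "Bucharest": 211},
            "Timisoara": {"Arad": 118}, "Zerind": {"Arad": 75}, "Bucharest": {}}
H_SLD = {"Arad": 366, "Sibiu": 253, "Rimnicu Vilcea": 193, "Pitesti": 100,
         "Fagaras": 176, "Timisoara": 329, "Zerind": 374, "Bucharest": 0}

def a_star(start, goal):
    frontier = [(H_SLD[start], 0, start, [start])]   # entries: (f = g + h, g, city, path)
    best_g = {start: 0}
    while frontier:
        f, g, city, path = heapq.heappop(frontier)
        if city == goal:
            return path, g
        for nxt, step in ROAD_MAP[city].items():
            g2 = g + step
            if g2 < best_g.get(nxt, float("inf")):   # keep only the cheapest path to each city
                best_g[nxt] = g2
                heapq.heappush(frontier, (g2 + H_SLD[nxt], g2, nxt, path + [nxt]))
    return None

print(a_star("Arad", "Bucharest"))
# (['Arad', 'Sibiu', 'Rimnicu Vilcea', 'Pitesti', 'Bucharest'], 418)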
Fig: 2.2
A* search is complete.
Whether A* is cost-optimal depends on certain properties of the heuristic.
A key property is admissibility: an admissible heuristic is one that never
overestimates the cost to reach a goal.
A slightly stronger property is called consistency. A heuristic h(n) is consistent if, for
every node n and every successor n′ of n generated by an action a, we have:
h(n) ≤ c(n, a, n′) + h(n′).
This is a form of the triangle inequality, which stipulates that a side of a triangle
cannot be longer than the sum of the other two sides. An example
of a consistent heuristic is the straight-line distance that we used in getting to
Bucharest.
Fig: 2.3
Fig: 2.4
The average solution cost for a randomly generated 8-puzzle instance is about 22
steps.
The branching factor is about 3. (When the empty tile is in the middle, there are four
possible moves; when it is in a corner there are two; and when it is along an edge
there are three.)
This means that an exhaustive search to depth 22 would look at about 3^22 ≈ 3.1 × 10^10
states.
By keeping track of repeated states, we could cut this down by a factor of about
170,000, because there are only 9!/2 = 181,440 distinct states that are reachable. This
is a manageable number, but the corresponding number for the 15-puzzle is roughly
10^13.
If we want to find the shortest solutions by using A*, we need a heuristic function that
never overestimates the number of steps to the goal.
The two commonly used heuristic functions for the 8-puzzle are:
i. h1 = the number of misplaced tiles.
For figure 2.6, all of the eight tiles are out of position, so the start state would have h1
= 8. h1 is an admissible heuristic.
ii. h2 = the sum of the distances of the tiles from their goal positions.
This is called the city block distance or Manhattan distance.
h2 is admissible, because all any move can do is move one tile one step closer to the goal.
Tiles 1 to 8 in start state give a Manhattan distance of
h2 = 3 + 1 + 2 + 2 + 2 + 3 + 3 + 2 = 18.
Neither of these overestimates the true solution cost, which is 26.
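A small sketch of the two heuristics, for an 8-puzzle state written as a tuple of 9 entries with 0 for the blank; the start and goal layouts below are assumed to match the standard example (they give h1 = 8 and h2 = 18):

GOAL  = (0, 1, 2, 3, 4, 5, 6, 7, 8)      # assumed goal layout (blank first)
START = (7, 2, 4, 5, 0, 6, 8, 3, 1)      # assumed start layout of the example

def h1(state):
    """Number of misplaced tiles (the blank is not counted)."""
    return sum(1 for i, tile in enumerate(state) if tile != 0 and tile != GOAL[i])

def h2(state):
    """Sum of the Manhattan (city-block) distances of the tiles from their goal squares."""
    total = 0
    for i, tile in enumerate(state):
        if tile == 0:
            continue
        g = GOAL.index(tile)
        total += abs(i // 3 - g // 3) + abs(i % 3 - g % 3)
    return total

print(h1(START), h2(START))   # 8 18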
Fig 2.8
Pattern databases
The idea behind pattern databases is to store these exact solution costs for every
possible subproblem instance- in our example, every possible configuration of the
four tiles and the blank.
Then we compute an admissible heuristic for each state encountered during a
search simply by looking up the corresponding subproblem configuration in the
database.
The database itself is constructed by searching back from the goal and recording
the cost of each new pattern encountered;
1.2.4 Generating heuristics with landmarks
There are online services that host maps with tens of millions of vertices and find
cost-optimal driving directions in milliseconds (figure 2.9)
How can they do that, when the best search algorithms we have considered so far
are about a million times slower?
There are many tricks, but the most important one is precomputation of some
optimal path costs.
Although the precomputation can be time-consuming, it need only be done once,
and then can be amortized over billions of user search requests.
Fig 2.9
If the optimal path happens to go through a landmark, this heuristic will be exact; if
not it is inadmissible—it overestimates the cost to the goal.
In an A* search, if you have exact heuristics, then once you reach a node that is on an
optimal path, every node you expand from then on will be on an optimal path.
Some route-finding algorithms save even more time by adding shortcuts—artificial
edges in the graph that define an optimal multi-action path.
Could an agent learn how to search better? The answer is yes, and the method rests on
an important concept called the metalevel state space.
Each state in a metalevel state space captures the internal (computational) state of a
program that is searching in an ordinary state space such as the map of Romania.
(To keep the two concepts separate, we call the map of Romania an object-level state
space.)
Each action in the metalevel state space is a computation step that alters the internal
state; for example, each computation step in A* expands a leaf node and adds its
successors to the tree.
For harder problems, there will be many such missteps, and a metalevel learning
algorithm can learn from these experiences to avoid exploring unpromising subtrees.
The goal of learning is to minimize the total cost of problem solving, trading off
computational expense and path cost.
1.2.6 Learning heuristics from experience
One way to invent a heuristic is to devise a relaxed problem for which an optimal
solution can be found easily.
An alternative is to learn from experience. “Experience” here means solving lots
of 8-puzzles, for instance.
Each optimal solution to an 8-puzzle problem provides an example (goal, path)
pair. From these examples, a learning algorithm can be used to construct a
function that can approximate the true path cost for other states that arise during
search.
2.3 LOCAL SEARCH AND OPTIMIZATION PROBLEM
Local Search
Local search algorithms operate using a single current node and generally move
only to neighbors of that node.
Local search methods keep only a small number of nodes in memory. They are suitable for
problems in which the solution is the goal state itself, not the path to it.
Local search has two key advantages:
They use very little memory - usually a constant amount
They can often find reasonable solutions in large or infinite state spaces for which
systematic algorithms are unsuitable.
Optimization Problem
In addition to finding goals, local search algorithms are useful for solving pure optimization
problems, in which the aim is to find the best state according to an objective function.
Hill Climbing and Simulated annealing are examples of local search algorithms
2.3.1 Hill Climbing search
It is a local search algorithm which continuously moves in the direction of
increasing elevation/value to find the peak of the mountain or best solution to the
problem.
It terminates when it reaches a peak value where no neighbor has a higher value.
Hill climbing is sometimes called greedy local search because it grabs a good
neighbor state without thinking ahead about where to go next.
Fig: 2.10
In figure 2.10, h = the number of pairs of queens that are attacking each other, either
directly or indirectly; h = 17 for the state shown.
A local minimum in the 8-queens state space: a state with h = 1 but in which every
successor has a higher cost.
Limitations:
Hill climbing cannot reach the optimum/best state (global maximum) if it enters any of
the following regions:
Local Maxima
A local maximum is a peak that is higher than each of its neighbouring states but
lower than the global maximum.
Plateaus
A plateau is a flat area of the state-space landscape.
It can be a flat local maximum, from which no uphill exit exists, or a shoulder, from
which progress is possible.
Ridges
A Ridge is an area which is higher than surrounding states, but it cannot be
reached in a single move.
Fig: 2.12
A ridge, shown in figure 2.12, results in a sequence of local maxima that is very
difficult for a greedy algorithm to navigate.
Variations of Hill Climbing
In steepest-ascent hill climbing, all successors are compared and the one closest to the
solution is chosen.
Steepest-ascent hill climbing is like best-first search in that it tries all possible
extensions of the current path instead of only one.
It can give a better solution than simple hill climbing but is more time-consuming.
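A sketch of steepest-ascent hill climbing on the 8-queens complete-state formulation, with h = number of attacking pairs as above (the helper names are assumptions):

import random

def attacking_pairs(rows):
    """h: the number of pairs of queens attacking each other (rows[c] = row of the queen in column c)."""
    n = len(rows)
    return sum(1 for c1 in range(n) for c2 in range(c1 + 1, n)
               if rows[c1] == rows[c2] or abs(rows[c1] - rows[c2]) == c2 - c1)

def hill_climb(n=8):
    current = [random.randrange(n) for _ in range(n)]      # a random complete state
    while True:
        h = attacking_pairs(current)
        best, best_h = current, h
        for col in range(n):                               # every successor: move one queen
            for row in range(n):                           # within its own column
                if row != current[col]:
                    succ = current[:col] + [row] + current[col + 1:]
                    sh = attacking_pairs(succ)
                    if sh < best_h:
                        best, best_h = succ, sh
        if best_h >= h:          # no successor is better: a peak (possibly only a local optimum)
            return current, h
        current = best

print(hill_climb())              # h = 0 means a solution; h > 0 means the search got stuck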
2.3.2 Simulated Annealing:
Annealing is the process used to temper or harden metals and glass by heating them to
a high temperature and then gradually cooling them, thus allowing the material to
reach a low-energy crystalline state.
The simulated annealing algorithm is quite similar to hill climbing.
Instead of picking the best move, however, it picks a random move.
If the move improves the situation, it is always accepted.
Otherwise the algorithm accepts the move with some probability less than 1.
It does not need to check all the neighbors; a random move is chosen, and moves to
worse states may sometimes be accepted.
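A generic simulated-annealing sketch (the cooling schedule and constants are arbitrary assumptions): a worse move is accepted with probability e^(delta/T), and the commented example reuses the attacking_pairs helper from the hill-climbing sketch above.

import math, random

def simulated_annealing(initial, energy, random_successor,
                        start_temp=10.0, cooling=0.99, steps=20000):
    """Generic simulated annealing: `energy` is minimized; a worse move
    (delta < 0) is accepted with probability exp(delta / T)."""
    current, temp = initial, start_temp
    for _ in range(steps):
        if energy(current) == 0:                           # perfect state reached
            return current
        succ = random_successor(current)                   # a random move, not the best one
        delta = energy(current) - energy(succ)             # > 0 means the move is an improvement
        if delta > 0 or random.random() < math.exp(delta / temp):
            current = succ                                 # good moves always, bad moves sometimes
        temp *= cooling                                    # gradually lower the temperature
    return current

# Example use on 8-queens, reusing attacking_pairs from the hill-climbing sketch:
def random_queen_move(rows):
    col = random.randrange(len(rows))
    return rows[:col] + [random.randrange(len(rows))] + rows[col + 1:]

# solution = simulated_annealing([random.randrange(8) for _ in range(8)],
#                                attacking_pairs, random_queen_move)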
2.3.3 Local Beam Search
The local beam search algorithm keeps track of k states rather than just one.
It begins with k randomly generated states.
At each step, all the successors of all k states are generated.
If any one is a goal, the algorithm halts. Otherwise, it selects the k best successors from
the complete list and repeats.
Limitations
It explores only the best k nodes, which can mean a lack of diversity; to remove this problem,
stochastic beam search came into the picture.
Instead of choosing the best k from the pool of candidate successors, stochastic beam
search chooses k successors at random, with the probability of choosing a given
successor being an increasing function of its value.
2.3.4 Evolutionary algorithms
A genetic algorithm is a variant of stochastic beam search in which successor states
are generated by combining two parent states rather than by modifying a single
state.
This algorithm reflects the process of natural selection where the fittest individuals are
selected for reproduction in order to produce offspring of the next generation.
The fitness function evaluates how close a given solution is to the optimal solution
of the desired problem. A fitness function should return higher values for better
states.
Fig:2.13
Fig 2.13 c)
Fig 2.14
Fig 2.13 d)
Fig 2.13
Fig 3.1
Fig 3.2
So, we can disregard x and y nodes because MIN will pick 2, no matter what.
Why? The 2 is smaller than any of the other nodes, in the other branches.
Another way of looking at it:
Fig 3.6
Works, in part, because of DFS.
Where did the name come from?
Keeping track of the best values for MAX (alpha) and the best values for MIN (beta):
Fig 3.7
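A sketch of minimax with alpha-beta pruning (illustrative; it assumes a game object with is_terminal, utility, moves, and result methods, which is not an interface defined in these notes):

import math

def alpha_beta(state, game, depth, alpha=-math.inf, beta=math.inf, maximizing=True):
    """Minimax with alpha-beta pruning. `alpha` holds the best value found so far
    for MAX, `beta` the best for MIN; branches that cannot change the result are pruned."""
    if depth == 0 or game.is_terminal(state):
        return game.utility(state)
    if maximizing:
        value = -math.inf
        for move in game.moves(state):
            value = max(value, alpha_beta(game.result(state, move), game,
                                          depth - 1, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:        # MIN already has a better option elsewhere: prune
                break
        return value
    else:
        value = math.inf
        for move in game.moves(state):
            value = min(value, alpha_beta(game.result(state, move), game,
                                          depth - 1, alpha, beta, True))
            beta = min(beta, value)
            if beta <= alpha:        # MAX already has a better option elsewhere: prune
                break
        return value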
Move Ordering
Order in which states are looked at can dramatically impact performance.
Depending on the values of each state, it can be determined to examine fewer nodes.
o Determine the smallest value sooner and you don’t need to look at the others.
If examinations begin with the likely best successors:
o Alpha-beta need only examine O(b^(m/2)) nodes.
o Minimax needs O(b^m).
o Branching factor essentially becomes √b instead of b.
Chess would go from 35 to something like 6.
Dynamic move-ordering schemes can improve it further.
Example: use moves that were best in the past.
These best moves are often called killer moves.
Trying them first is called the killer move heuristic.
In certain games, transpositions can kill performance. (certain moves that are mirrors
of each other).
Example: chess pieces,
[a1, b1, a2, b2] mirrors [a2, b2, a1, b1]
Pieces ending up in the same position, just different
order of same moves to get there.
The redundant paths to repeated states can cause an exponential increase in search
cost, and that keeping a table of previously reached states can address this problem.
In game tree search, repeated states can occur because of transpositions—different
permutations of the move sequence that end up in the same position, and the problem
can be addressed with a transposition table that caches the heuristic value of states.
Keep a transposition table. Ignore the duplicates.
Similar to the explored list from GRAPH-SEARCH
Fig 3.8
Not perfect: certain pieces gain in power in the end game relative to others (e.g., bishops
on a sparse board).
Nonlinear valuing systems are often used (a pair of bishops is worth slightly more than
twice the value of a single bishop, and a bishop is worth more in the end game).
3.3.2 Cutting off search
Time to end the search early.
Replace the call to the TERMINAL-TEST function with a cutoff test (and the utility
function with a heuristic evaluation function).
Choose a depth d that allows for evaluation within the desired time frame.
When time runs out, pick the best move found so far.
Not perfect, not a guarantee, just gives the best chance. Counter moves exist even for the
highest evaluated move.
An improvement: add a quiescence search that keeps searching until quiescent
positions are reached – positions whose evaluation is unlikely to swing wildly in the
near future. Positions that could be especially bad down a branch (e.g., with a capture
pending) are not quiescent and get this special treatment.
The horizon effect is more difficult to eliminate. It arises when the program is facing
an opponent’s move that causes serious damage and is ultimately unavoidable, but
can be temporarily avoided by the use of delaying tactics.
Consider the chess position below: It is clear that there is no way for the black bishop
to escape.
Example: the white rook can capture it by moving to h1, then a1, then a2; a capture at
depth 6 ply.
Fig 3.9
The game of Go illustrates two major weaknesses of heuristic alpha–beta tree search:
i. Go has a branching factor that starts at 361, which means alpha–beta search
would be limited to only 4 or 5 ply.
ii. It is difficult to define a good evaluation function for Go because material
value is not a strong indicator and most positions are in flux until the
endgame. In response to these two challenges, modern Go programs have
abandoned alpha–beta search and instead use a strategy called Monte Carlo
tree search (MCTS)
The basic MCTS strategy does not use a heuristic evaluation function. Instead, the
value of a state is estimated as the average utility over a number of simulations of
complete games starting from the state.
A simulation (also called a playout or rollout) chooses moves first for one player,
then for the other, repeating until a terminal position is reached. At that point the rules
of the game determine who has won or lost, and by what score.
To get useful information from the playout we need a playout policy that biases the
moves towards good ones. For Go and other games, playout policies have been
successfully learned from self-play by using neural networks.
Given a playout policy, we next need to decide two things:
i. from what positions do we start the playouts, and
ii. how many playouts do we allocate to each position?
Monte Carlo search does simulations starting from the current state of the game, and
tracks which of the possible moves from the current position has the highest win
percentage.
For some stochastic games this converges to optimal play as the number of simulations
increases, but for most games it is not sufficient – we need a selection policy that
selectively focuses the computational resources on the important parts of the game tree.
It balances two factors:
i. exploration of states that have had few playouts, and
ii. exploitation of states that have done well in past playouts, to get a more
accurate estimate of their value.
Monte Carlo tree search does that by maintaining a search tree and growing it on each
iteration of the following four steps, as shown in Figure
SELECTION:
Starting at the root of the search tree, we choose a move leading to a
successor node, and repeat that process, moving down the tree to a leaf.
Figure 5.10(a) shows a search tree with the root representing a state where
white has just moved, and white has won 37 out of the 100 playouts done so
far.
The thick arrow shows the selection of a move by black that leads to a node
where black has won 60/79 playouts. This is the best win percentage among
the three moves.
Selection continues on to the leaf node marked 27/35.
EXPANSION:
We grow the search tree by generating a new child of the selected node;
Figure 5.10(b) shows the new node marked with 0/0.
SIMULATION:
We perform a playout from the newly generated child node, choosing moves
for both players according to the playout policy.
These moves are not recorded in the search tree. In the figure, the simulation
results in a win for black.
BACK-PROPAGATION:
We now use the result of the simulation to update all the search tree nodes
going up to the root.
Since black won the playout, black nodes are incremented in both the number
of wins and the number of playouts, so 27/35 becomes 28/36 and 60/79
becomes 61/80.
Since white lost, the white nodes are incremented in the number of playouts
only, so 16/53 becomes 16/54 and the root 37/100 becomes 37/101.
We repeat these four steps either for a set number of iterations, or until the allotted
time has expired, and then return the move with the highest number of playouts.
One very effective selection policy is called “upper confidence bounds applied to
trees” or UCT. The policy ranks each possible move based on an upper confidence
bound formula called UCB1.
For a node n, the formula is UCB1(n) = U(n)/N(n) + C × √( log N(Parent(n)) / N(n) ), where C is an exploration constant:
Fig 3.10
Where,
U(n) is the total utility of all playouts that went through node n,
N(n) is the number of playouts through node n, and Parent(n) is the parent node of n in the
tree.
U(n)/N(n) is the exploitation term: the average utility of n.
The term with the square root is the exploration term: it has the count N(n) in the
denominator, which means the term will be high for nodes that have only been
explored a few times.
In the numerator it has the log of the number of times we have explored the parent of
n.
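A direct transcription of the UCB1 formula as a small function (the exploration constant C is an assumption; √2 is a common choice):

import math

def ucb1(U_n, N_n, N_parent, C=math.sqrt(2)):
    """Upper confidence bound for a node: the exploitation term U(n)/N(n)
    plus the exploration term C * sqrt(log N(Parent(n)) / N(n))."""
    if N_n == 0:
        return math.inf              # unvisited children are always tried first
    return U_n / N_n + C * math.sqrt(math.log(N_parent) / N_n)

print(ucb1(60, 79, 100))             # the 60/79 node from the selection example above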
The pseudo code shows the complete UCT MCTS algorithm. When the iterations
terminate, the move with the highest number of playouts is returned.
The idea is that a win record based on more playouts is more trustworthy evidence; the
UCB1 formula ensures that the node with the most playouts is almost always the node
with the highest win percentage.
Advantages of Monte Carlo Tree Search:
It does not necessarily require any tactical knowledge about the game
A general MCTS implementation can be reused for any number of games with little
modification
MCTS supports asymmetric expansion of the search tree based on the circumstances
in which it is operating.
A disadvantage is that, as the tree grows rapidly after a few iterations, MCTS requires a
huge amount of memory.
Stochastic games (such as backgammon) require a game tree containing chance nodes in
addition to MIN and MAX nodes; the chance nodes represent the possible dice rolls.
Fig 3.12
Each chance node's branches are weighted by the probabilities of the dice rolls (1/36, 1/18, etc.).
Uncertainty. Only possible to calculate a position’s expected value: the average of all
possible outcomes of the chance nodes.
So, generalize the deterministic game’s minimax value to an expectiminimax value
for games with chance nodes.
Terminal, MIN, and MAX nodes stay the same.
For chance nodes, sum the values of all outcomes, weighted by their probabilities:
EXPECTIMINIMAX(s) = Σ_r P(r) × EXPECTIMINIMAX(RESULT(s, r)),
where r is a possible dice roll and RESULT(s, r) is the state s with roll r applied.
3.5.1 Evaluation functions for games of chance
Because of chance nodes, the meaning of evaluation values is a bit more dicey than in
deterministic games. Consider:
Assigning values to the leaves has different outcomes (who knew?) [1, 2, 3, 4] leads
to taking a1, but [1, 20, 30, 400] leads to taking a2.
Fig 3.13
Fig 3.14
The general AND-OR search algorithm can be applied to the belief-state space to find
guaranteed checkmates.
It finds midgame checkmates up to depth 9, which most humans can’t
do.
In addition to guaranteed checkmates, there are probabilistic checkmates, which
make no sense in fully observable games; they rely on randomized play.
By moving randomly, the white king eventually bumps into the black king.
Black can’t keep guessing escape moves forever.
In KBNK(King, Bishop, Knight vs King) endgame:
White gives black infinite choices.
Eventually black guesses wrong.
This reveals black’s position.
This ends in checkmate.
Hard to find probabilistic checkmate with a reasonable depth, except endgame.
Usually you get an accidental checkmate early on, where the random choices just
work out.
So, how likely is a strategy to win? How likely is it that the board state described by the
belief state is the actual true board state?
Now, not all belief states are equally likely. Certain moves are more important than
others, skewing the probabilities.
But, a player may want to avoid being predictable, skewing the probabilities even
more.
So, to play optimally, some randomness has to be built into moves on the part of the
player.
Leads to the idea of an equilibrium solution.
3.6.2 Card games
Many examples of stochastic partial observability.
Example:
Randomly deal cards at game start.
Cards hidden from other players.
Bridge, poker, hearts, etc.
Not exactly like dice, but suggests an algorithm:
Solve all possible deals of the invisible cards as if fully observable.
Then, pick best move average over all the deals.
Then, for every deal s with probability P(s), we can say the desired move is:
argmax_a Σ_s P(s) MINIMAX(RESULT(s, a))
Number of deals can be huge, so solving all of them can be impossible.
Instead, use a Monte Carlo approximation
o i.e., don’t add up all deals, take a random sample of N deals.
o Consider the probability of s appearing in that sample is P(s), then:
argmax_a (1/N) Σ_{i=1}^{N} MINIMAX(RESULT(s_i, a))
o The bigger the N, the better the approximation.
3.7 Constraint Satisfaction Problem
A constraint satisfaction problem is one of the standard search problems where
instead of saying that state is a black box, we say that state is defined by variables and
values.
Each state has a certain set of variables, each variable has a certain set of values, and a
complete assignment to all the variables creates a final state.
A problem is solved when each variable has a value that satisfies all the constraints on
the variable. A problem described this way is called a constraint satisfaction
problem, or CSP.
This is a simple example of a formal representation language and it allows for general
purpose algorithms with more power than standard search algorithms.
3.7.1 Defining Constraint Satisfaction Problems
A solution is a specific assignment of a value to each variable such that all the
constraints are satisfied.
It can be helpful to visualize a CSP as a constraint graph, as shown in Fig (b).
In a constraint graph, each node is a variable and each edge connects two variables
that participate in a constraint.
This kind of constraint graph is a binary constraint graph, where each constraint
relates at most two variables; such CSPs are called binary CSPs.
A state has many variables, which we call state variables, and each state variable is a node.
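For the Australia map-coloring CSP behind this constraint graph, the variables, domains, and binary constraints can be written down directly; a small sketch (the data layout is an assumption):

# Australia map-coloring as a binary CSP: variables, domains, and the
# "adjacent regions must differ" constraints (one entry per constraint-graph edge).
VARIABLES = ["WA", "NT", "SA", "Q", "NSW", "V", "T"]
DOMAINS = {v: ["red", "green", "blue"] for v in VARIABLES}
NEIGHBORS = {
    "WA": ["NT", "SA"], "NT": ["WA", "SA", "Q"], "SA": ["WA", "NT", "Q", "NSW", "V"],
    "Q": ["NT", "SA", "NSW"], "NSW": ["Q", "SA", "V"], "V": ["SA", "NSW"], "T": [],
}

def consistent(var, value, assignment):
    """A value is consistent if no assigned neighbor already has the same color."""
    return all(assignment.get(n) != value for n in NEIGHBORS[var])

# A complete assignment is a solution if every variable's value is consistent.
solution = {"WA": "red", "NT": "green", "SA": "blue", "Q": "red",
            "NSW": "green", "V": "red", "T": "red"}
print(all(consistent(v, solution[v], solution) for v in VARIABLES))   # True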
3.7.3 Variations on the CSP
i. Discrete variables
Finite domains
The simplest kind of CSP involves variables that are discrete and have finite domains.
Map coloring problems are of this kind.
The 8-queens problem can also be viewed as a finite-domain
CSP, where the variables Q1, Q2, ..., Q8 are the positions of the queens in columns
1, ..., 8, and each variable has the domain {1, 2, 3, 4, 5, 6, 7, 8}.
If the maximum domain size of any variable in a CSP is d, then the number of
possible complete assignments is O(d^n) - that is, exponential in the number of
variables.
Infinite domains
Discrete variables can also have infinite domains - for example, the set of integers or
the set of strings.
With infinite domains, it is no longer possible to describe constraints by enumerating
all allowed combinations of values. Instead, a constraint language is needed, such as
StartJob1 + 5 ≤ StartJob3.
ii. Continuous variables
CSPs with continuous domains are very common in the real world.
For example, in the field of operations research, the scheduling of experiments on the
Hubble Telescope requires very precise timing of observations; the start and finish
of each observation are continuous-valued variables that must obey a variety of
astronomical, precedence and power constraints.
The best known category of continuous-domain CSPs is that of linear
programming problems, where the constraints must be linear inequalities
forming a convex region.
Linear programming problems can be solved in time polynomial in the
number of variables.
Varieties of CSPs
Fig 3.16
Each letter stands for a distinct digit; the aim is to find a substitution of digits for
letters such that the resulting sum is arithmetically correct, with the added restriction
that no leading zeros are allowed.
The constraint hypergraph for the cryptarithmetic problem shows the Alldiff
constraint as well as the column addition constraints.
Each constraint is a square box connected to the variables it contains.
3.8 Constraint Propagation
A number of inference techniques use the constraints to infer which variable/value
pairs are consistent and which are not. These include node, arc, path, and k-consistency.
constraint propagation: Using the constraints to reduce the number of legal values
for a variable, which in turn can reduce the legal values for another variable, and so
on.
local consistency: If we treat each variable as a node in a graph and each binary
constraint as an arc, then the process of enforcing local consistency in each part of the
graph causes inconsistent values to be eliminated throughout the graph.
There are different types of local consistency:
3.8.4 K- consistency:
K-consistency: A CSP is k-consistent if, for any set of k-1 variables and for any
consistent assignment to those variables, a consistent value can always be assigned to
any kth variable.
1-consistency = node consistency; 2-consistency = arc consistency; 3-consistency =
path consistency.
A CSP is strongly k-consistent if it is k-consistent and is also (k - 1)-consistent,
(k – 2)-consistent, … all the way down to 1-consistent.
If we take a CSP with n nodes and make it strongly n-consistent, we are guaranteed to find a
solution in time O(n²d). But any algorithm for establishing n-consistency must take time
exponential in n in the worst case, and it also requires space that is exponential in n.
3.8.5 Global constraints
A global constraint is one involving an arbitrary number of variables (but not
necessarily all variables). Global constraints can be handled by special-purpose
algorithms that are more efficient than general-purpose methods.
3.8.6 Sudoku
A Sudoku board consists of 81 squares, some of which are initially filled with digits
from 1 to 9. The puzzle is to fill in all the remaining squares such that no digit appears
twice in any row, column, or box. A row, column, or 3 × 3 box is called a unit.
Fig 3.17
A Sudoku puzzle can be considered a CSP with 81 variables, one for each square. We
use the variable names A1 through A9 for the top row (left to right), down to I1
through I9 for the bottom row. The empty squares have the domain {1, 2, 3, 4, 5, 6, 7,
8, 9} and the pre-filled squares have a domain consisting of a single value.
There are 27 different Alldiff constraints: one for each row, column, and box of 9
squares:
Alldiff(A1, A2, A3, A4, A5, A6, A7, A8, A9)
Alldiff(B1, B2, B3, B4, B5, B6, B7, B8, B9)
…
Alldiff(A1, B1, C1, D1, E1, F1, G1, H1, I1)
Alldiff(A2, B2, C2, D2, E2, F2, G2, H2, I2)
…
Alldiff(A1, A2, A3, B1, B2, B3, C1, C2, C3)
Alldiff(A4, A5, A6, B4, B5, B6, C4, C5, C6)
Fig 3.18
Degree heuristic:
The degree heuristic attempts to reduce the branching factor on future choices by
selecting the variable that is involved in the largest number of constraints on other
unassigned variables. [useful tie-breaker]
E.g. SA is the variable with highest degree 5; the other variables have degree 2 or 3; T
has degree 0.
ORDER-DOMAIN-VALUES
Value selection: fail-last.
If we are trying to find all the solutions to a problem (not just the first one), then the
ordering does not matter.
Least-constraining-value heuristic: prefer the value that rules out the fewest choices
for the neighboring variables in the constraint graph. (Try to leave the maximum
flexibility for subsequent variable assignments.)
e.g., suppose we have generated the partial assignment WA=red and NT=green and our
next choice is for Q. Blue would be a bad choice because it eliminates the last
legal value left for Q's neighbor, SA; the heuristic therefore prefers red to blue.
The minimum-remaining-values and degree heuristics are domain-independent
methods for deciding which variable to choose next in a backtracking search.
The least-constraining-value heuristic helps in deciding which value to try first for a
given variable.
3.9.2 Interleaving search and inference
INFERENCE: every time we make a choice of a value for a variable, we have a new
opportunity to infer domain reductions on the neighboring variables.
One of the simplest forms of inference is called forward checking. Whenever a
variable X is assigned, the forward-checking process establishes arc consistency for it:
for each unassigned variable Y that is connected to X by a constraint, delete from Y’s
domain any value that is inconsistent with the value chosen for X.
There is no reason to do forward checking if we have already done arc consistency as
a preprocessing step.
Fig 3.19
Advantage: For many problems the search will be more effective if we combine the
MRV heuristic with forward checking.
Disadvantage: Forward checking only makes the current variable arc-consistent, but
doesn’t look ahead and make all the other variables arc-consistent.
Forward checking can supply the conflict set with no extra work.
Whenever forward checking based on an assignment X=x deletes a value from Y’s
domain, add X=x to Y’s conflict set.
If the last value is deleted from Y’s domain, the assignments in the conflict set of Y are
added to the conflict set of X.
In fact, every branch pruned by backjumping is also pruned by forward checking.
Hence simple backjumping is redundant in a forward-checking search or in a search
that uses stronger consistency checking (such as MAC).
Conflict-directed backjumping:
e.g.
consider the partial assignment which is proved to be inconsistent: {WA=red,
NSW=red}.
We try T=red next and then assign NT, Q, V, SA; no assignment can work for these
last 4 variables.
Eventually we run out of values to try at NT, but simple backjumping cannot work
because NT does not have a complete conflict set of preceding variables that caused
it to fail.
The set {WA, NSW} is a deeper notion of the conflict set for NT: it caused NT, together
with any subsequent variables, to have no consistent solution. So the algorithm should
backtrack to NSW and skip over T.
A backjumping algorithm that uses conflict sets defined in this way is called conflict-
direct backjumping.
How to Compute:
When a variable’s domain becomes empty, a “terminal” failure occurs; that variable
has a standard conflict set.
Let Xj be the current variable, let conf(Xj) be its conflict set. If every possible value
for Xj fails, backjump to the most recent variable Xi in conf(Xj), and set
conf(Xi) ← conf(Xi)∪conf(Xj) – {Xi}.
The conflict set for a variable means there is no solution from that variable onward,
given the preceding assignment to the conflict set.
e.g.
assign WA, NSW, T, NT, Q, V, SA.
SA fails, and its conflict set is {WA, NT, Q}. (standard conflict set)
Backjump to Q, its conflict set is {NT, NSW}∪{WA,NT,Q}-{Q} = {WA, NT, NSW}.
Backtrack to NT, its conflict set is {WA}∪{WA,NT,NSW}-{NT} = {WA, NSW}.
Hence the algorithm backjump to NSW. (over T)
After backjumping from a contradiction, how to avoid running into the same problem
again:
The idea of finding a minimum set of variables from the conflict set that causes the problem.
This set of variables, along with their corresponding values, is called a no-good. We then
record the no-good, either by adding a new constraint to the CSP or by keeping a separate
cache of no-goods.
Backtracking occurs when no legal assignment can be found for a variable.
A backjumping algorithm that uses conflict sets defined in this way is called
conflict-directed backjumping.
Conflict-directed backjumping backtracks directly to the source of the problem.
3.10 Local search for CSPs
Local search algorithms turn out to be very effective in solving many CSPs. They use
a complete-state formulation, where each state assigns a value to every variable, and
the search changes the value of one variable at a time.
As an example, we’ll use the 8-queens problem, as defined as a CSP. In Figure, we
start on the left with a complete assignment to the 8 variables; typically this will
violate several constraints.
We then randomly choose a conflicted variable, which turns out to be the one in the
rightmost column.
Fig 3.20
The min-conflicts heuristic: In choosing a new value for a variable, select the value
that results in the minimum number of conflicts with other variables.
In the above figure we see there are two rows that only violate one constraint; we pick
Q8=3 (that is, we move the queen to the 8th column, 3rd row).
On the next iteration, in the middle board of the figure, we select Q6 as the variable to
change, and note that moving the queen to the 8th row results in no conflicts.
At this point there are no more conflicted variables, so we have a solution. The
algorithm is shown in Figure 3.21.
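A sketch of the MIN-CONFLICTS idea for n-queens (illustrative helper names; rows[c] gives the row of the queen in column c):

import random

def conflicts(rows, col, row):
    """The number of queens that attack square (row, col)."""
    return sum(1 for c, r in enumerate(rows)
               if c != col and (r == row or abs(r - row) == abs(c - col)))

def min_conflicts(n=8, max_steps=10000):
    rows = [random.randrange(n) for _ in range(n)]       # a complete, usually conflicted, assignment
    for _ in range(max_steps):
        conflicted = [c for c in range(n) if conflicts(rows, c, rows[c]) > 0]
        if not conflicted:
            return rows                                  # solution: no variable is in conflict
        col = random.choice(conflicted)                  # pick a conflicted variable at random
        rows[col] = min(range(n),                        # choose the value (row) that minimizes
                        key=lambda r: conflicts(rows, col, r))   # the number of conflicts
    return None                                          # failure within the step budget

print(min_conflicts())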
The landscape of a CSP under the min-conflicts heuristic usually has a series of
plateaux. Simulated annealing and plateau search (i.e., allowing sideways moves to
another state with the same score) can help local search find its way off a plateau.
This wandering on the plateau can be directed with tabu search: keeping a small list
of recently visited states and forbidding the algorithm to return to those states.
Constraint weighting: a technique that can help concentrate the search on the
important constraints.
Each constraint is given a numeric weight Wi, initially all 1.
At each step, the algorithm chooses a variable/value pair to change that will result in
the lowest total weight of all violated constraints.
Fig 3.21
The weights are then adjusted by incrementing the weight of each constraint that is
violated by the current assignment.
Local search can be used in an online setting when the problem changes; this is
particularly important in scheduling problems.
Fig 3.22
Fig 3.23
3.11.1Tree Decomposition
A tree decomposition must satisfy the following three requirements:
i. Every variable in the original problem appears in at least one of the
subproblems.
ii. If two variables are connected by a constraint in the original problem, they
must appear together (along with the constraint) in at least one of the
subproblems.
iii. If a variable appears in two subproblems in a tree, it must appear in every
subproblem along the path connecting those subproblems.
Fig 3.24