
UNIT-1

PROBLEM SOLVING METHODS

Problem Solving Methods - Search Strategies - Uninformed - Informed - Heuristics - Local Search Algorithms and Optimization Problems - Searching with Partial Observations - Constraint Satisfaction Problems - Constraint Propagation - Backtracking Search - Game Playing - Optimal Decisions in Games - Alpha-Beta Pruning
2.1 Problem Formulation
 An important aspect of intelligence is goal-based problem solving.
 The solution of many problems can be described by finding a sequence of actions that
lead to a desirable goal.
 Each action changes the state and the aim is to find the sequence of actions and states
that lead from the initial (start) state to a final (goal) state.
 A well-defined problem can be described by:
a) Initial state
b) Operator or successor function - for any state x returns s(x), the set
of states reachable from x with one action
c) State space - all states reachable from initial by any sequence of
actions
d) Path - sequence through state space
e) Path cost - function that assigns a cost to a path. Cost of a path is the
sum of costs of individual actions along the path
f) Goal test - test to determine if at goal state

 What is Search?
a) Search is the systematic examination of states to find path from the start/root
state to the goal state.
b) The set of possible states, together with operators defining their connectivity
constitute the search space.
c) The output of a search algorithm is a solution, that is, a path from the initial
state to a state that satisfies the goal test.

1.7 Problem-solving agents


 A Problem solving agent is a goal-based agent.
 It decides what to do by finding sequence of actions that lead to desirable states.
 The agent can adopt a goal and aim at satisfying it.
 To illustrate the agent's behaviour, consider an example where our agent is in the city of Arad, in Romania. The agent has to adopt the goal of getting to Bucharest.
 Goal formulation
 based on the current situation and the agent's performance measure, is the first
step in problem solving.
 The agent's task is to find out which sequence of actions will get to a goal
state.
 Problem formulation
 is the process of deciding what actions and states to consider given a goal.
 Search
 Before taking any action in the real world, the agent simulates sequences of
actions in its model, searching until it finds a sequence of actions that reaches
the goal. Such a sequence is called a solution.
 The agent might have to simulate multiple sequences that do not reach the
goal, but eventually it will find a solution (such as going from Arad to Sibiu to
Fagaras to Bucharest), or it will find that no solution is possible.
 Execution
 The agent can now execute the actions in the solution, one at a time.

Fig: 1.13

1.7.1 Search problems and solutions


A search problem can be defined formally as follows
 A set of possible states that the environment can be in. We call this the state
space.
 The initial state that the agent starts in. For example: Arad
 A set of one or more goal states. Sometimes there is one goal state (e.g.,
Bucharest), sometimes there is a small set of alternative goal states, and
sometimes the goal is defined by a property that applies to many states
(potentially an infinite number). For example, in a vacuum-cleaner world, the
goal might be to have no dirt in any location, regardless of any other facts
about the state.
 The actions available to the agent. Given a state s, ACTIONS(s) returns a finite set of actions that can be executed in s. We say that each of these actions is applicable in s. For example, ACTIONS(Arad) = {ToSibiu, ToTimisoara, ToZerind}.
 A transition model, which describes what each action does. RESULT(s, a) returns the state that results from doing action a in state s. For example, RESULT(Arad, ToZerind) = Zerind.
 An action cost function, denoted by ACTION-COST(s, a, s') when we are programming or c(s, a, s') when we are doing math, that gives the numeric cost of applying action a in state s to reach state s'. A problem-solving agent should use a cost function that reflects its own performance measure; for example, for route-finding agents, the cost of an action might be the length in miles or it might be the time it takes to complete the action.
 A sequence of actions forms a path, and a solution is a path from the initial
state to a goal state. We assume that action costs are additive; that is, the total
cost of a path is the sum of the individual action costs.
 An optimal solution has the lowest path cost among all solutions.
 The state space can be represented as a graph in which the vertices are states
and the directed edges between them are actions. The map of Romania shown
in figure is such a graph, where each road indicates two actions, one in each
direction.
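The components above can be made concrete with a small Python sketch (an illustrative sketch, not code from the text; the map fragment and its road distances follow the standard Romania map of the figure):

# A minimal illustration of a search problem (fragment of the Romania map).
# The road distances follow the standard map and are shown only as an example.
ROADS = {
    ("Arad", "Zerind"): 75, ("Arad", "Sibiu"): 140, ("Arad", "Timisoara"): 118,
    ("Sibiu", "Fagaras"): 99, ("Sibiu", "Rimnicu Vilcea"): 80,
    ("Fagaras", "Bucharest"): 211, ("Rimnicu Vilcea", "Pitesti"): 97,
    ("Pitesti", "Bucharest"): 101,
}
NEIGHBOURS = {}
for (a, b), d in ROADS.items():          # each road gives two actions, one in each direction
    NEIGHBOURS.setdefault(a, {})[b] = d
    NEIGHBOURS.setdefault(b, {})[a] = d

class RouteProblem:
    def __init__(self, initial, goal):
        self.initial, self.goal = initial, goal
    def actions(self, state):             # ACTIONS(s): applicable actions (here, target cities)
        return list(NEIGHBOURS[state])
    def result(self, state, action):      # RESULT(s, a): transition model
        return action
    def action_cost(self, state, action, result):   # ACTION-COST(s, a, s')
        return NEIGHBOURS[state][result]
    def is_goal(self, state):             # goal test
        return state == self.goal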
1.7.2 Formulating problems
 We derive a formulation of the problem in terms of the initial state, successor
function , goal test, and path cost
 Our formulation of the problem of getting to Bucharest is a model—an
abstract mathematical description—and not the real thing.
 Compare the simple atomic state description Arad to an actual cross-country
trip, where the state of the world includes so many things: the traveling
companions, the current radio program, the scenery out of the window, the
proximity of law enforcement officers, the distance to the next rest stop, the
condition of the road, the weather, the traffic, and so on.
 All these considerations are left out of our model because they are irrelevant
to the problem of finding a route to Bucharest.
 The process of removing detail from a representation is called abstraction.

1.8 EXAMPLE PROBLEMS


 The problem solving approach has been applied to a vast array of task environments.
Some best known problems are summarized below.
 They are distinguished as toy or real-world problems.
i. A Toy problem (standardized) is intended to illustrate various
problem solving methods. It can be easily used by different researchers
to compare the performance of algorithms.
ii. A Real-world problem is one whose solutions people actually care
about.
1.8.1 TOY PROBLEMS
a. Vacuum World Example
 States: The agent is in one of two locations, each of which might or might not contain dirt. Thus there are 2 × 2^2 = 8 possible world states.
 Initial state: Any state can be designated as initial state.
 Successor function: This generates the legal states that result from trying the three actions (Left, Right, Suck). The complete state space is shown in figure 1.14 below.
 Goal Test : This tests whether all the squares are clean.
 Path cost: Each step costs one, so the path cost is the number of steps in the path.
Vacuum World State Space

Fig: 1.14

b. 8-puzzle:
 An 8-puzzle consists of a 3x3 board with eight numbered tiles and a blank space.
 A tile adjacent to the blank space can slide into the space. The object is to reach the specified goal state, as shown in figure 1.15.

Fig: 1.15

The problem formulation is as follows:


 States : A state description specifies the location of each of the eight tiles and the
blank in one of the nine squares.
 Initial state : Any state can be designated as the initial state. It can be noted that any
given goal can be reached from exactly half of the possible initial states.
 Successor function: This generates the legal states that result from trying the four actions (blank moves Left, Right, Up or Down).
 Goal Test: This checks whether the state matches the goal configuration shown in figure 1.15. (Other goal configurations are possible.)
 Path cost: Each step costs 1, so the path cost is the number of steps in the path.
 The 8-puzzle belongs to the family of sliding-block puzzles, which are often used as
test problems for new search algorithms in AI.
 This general class is known as NP-complete.
 The 8-puzzle has 9!/2 = 181,440 reachable states and is easily solved.
 The 15-puzzle (4 × 4 board) has around 1.3 trillion states, and random instances can be solved optimally in a few milliseconds by the best search algorithms.
 The 24-puzzle (on a 5 × 5 board) has around 10^25 states, and random instances are still quite difficult to solve optimally with current machines and algorithms.

c. 8-queens problem
 The goal of 8-queens problem is to place 8 queens on the chessboard such that no
queen attacks any other. (A queen attacks any piece in the same row, column or
diagonal).
 The following figure shows an attempted solution that fails: the queen in the right
most column is attacked by the queen at the top left.
 An incremental formulation involves operators that augment the state description, starting with an empty state; for the 8-queens problem, this means each action adds a queen to the state.
 A complete-state formulation starts with all 8 queens on the board and moves them around.

In either case the path cost is of no interest because only the final state counts.
 The first incremental formulation one might try is the following :
 States : Any arrangement of 0 to 8 queens on board is a state.
 Initial state : No queen on the board.
 Successor function : Add a queen to any empty square.
 Goal Test : 8 queens are on the board, none attacked.
 In this formulation, we have 64 · 63 · … · 57 ≈ 1.8 × 10^14 possible sequences to investigate.
 A better formulation would prohibit placing a queen in any square that is already
attacked.
 States: Arrangements of n queens (0 <= n <= 8), one per column in the leftmost n columns, with no queen attacking another.
 Successor function: Add a queen to any square in the leftmost empty column such that it is not attacked by any other queen.
 This formulation reduces the 8-queens state space from 1.8 × 10^14 to just 2,057, and solutions are easy to find.
 For 100 queens the initial formulation has roughly 10^400 states, whereas the improved formulation has about 10^52 states.
 This is a huge reduction, but the improved state space is still too big for the
algorithms to handle.
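A brief Python sketch of the improved incremental formulation (an illustrative sketch, not code from the text; states are tuples of queen rows, one per filled column from the left):

def attacks(r1, c1, r2, c2):
    # Two queens attack each other if they share a row, a column, or a diagonal.
    return r1 == r2 or c1 == c2 or abs(r1 - r2) == abs(c1 - c2)

def successors(state):
    # state is a tuple of rows; queen i sits in column i (leftmost columns filled first).
    col = len(state)
    children = []
    for row in range(8):
        if not any(attacks(row, col, r, c) for c, r in enumerate(state)):
            children.append(state + (row,))
    return children

# Initial state: the empty tuple () - no queens on the board.
# Goal test: len(state) == 8, i.e. all eight columns filled with no queen attacked.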

1.8.2 REAL WORLD PROBLEMS


 A real world problem is one whose solutions people actually care about.
They tend not to have a single agreed-upon description, but an attempt is made to give the general flavour of their formulations.
 The following are the some real world problems,
 Route Finding Problem
 Touring Problems
 Travelling Salesman Problem
 Robot Navigation
ROUTE-FINDING PROBLEM
 Route-finding problem is defined in terms of specified locations and transitions along
links between them.
 Route-finding algorithms are used in a variety of applications, such as routing in
computer networks, military operations planning, and airline travel planning systems.
a. AIRLINE TRAVEL PROBLEM
The airline travel problem is specified as follows:
 States : Each is represented by a location(e.g.,an airport) and the
current time.
 Initial state : This is specified by the problem.
 Successor function: This returns the states resulting from taking any scheduled flight (further specified by seat class and location), leaving later than the current time plus the within-airport transit time, from the current airport to another.
 Goal Test: Are we at the destination by some prespecified time?
 Path cost: This depends upon the monetary cost, waiting time, flight time, customs and immigration procedures, seat quality, time of day, type of airplane, frequent-flyer mileage awards, and so on.

TOURING PROBLEMS
 Touring problems are closely related to route-finding problems, but with an important
difference.
 Consider for example, the problem, "Visit every city at least once" as shown in
Romania map.
 As with route-finding the actions correspond to trips between adjacent cities.
 Initial state would be "In Bucharest; visited {Bucharest}".
 Intermediate state would be "In Vaslui; visited {Bucharest, Urziceni, Vaslui}".
 Goal test would check whether the agent is in Bucharest and all 20 cities have been visited.

THE TRAVELLING SALESPERSON PROBLEM (TSP)


 TSP is a touring problem in which each city must be visited exactly once.
 The aim is to find the shortest tour. The problem is known to be NP-hard.
 Enormous efforts have been expended to improve the capabilities of TSP algorithms.
 These algorithms are also used in tasks such as planning movements of automatic
circuit-board drills and of stocking machines on shop floors
VLSI layout
 A VLSI layout problem requires positioning millions of components and connections
on a chip to minimize area, minimize circuit delays, minimize stray capacitances, and
maximize manufacturing yield.
 The layout problem is split into two parts: cell layout and channel routing.
ROBOT navigation
 ROBOT navigation is a generalization of the route-finding problem.
 Rather than a discrete set of routes, a robot can move in a continuous space with an
infinite set of possible actions and states.
 For a circular Robot moving on a flat surface, the space is essentially two-
dimensional.
 When the robot has arms and legs or wheels that also must be controlled, the search
space becomes multi-dimensional.
 Advanced techniques are required to make the search space finite.

AUTOMATIC ASSEMBLY SEQUENCING


 The example includes assembly of intricate objects such as electric motors.
 The aim in assembly problems is to find the order in which to assemble the parts of
some objects.
 If the wrong order is chosen, there will be no way to add some part later without undoing some work already done.
 Another important assembly problem is protein design, in which the goal is to find a sequence of amino acids that will fold into a three-dimensional protein with the right properties to cure some disease.
1.8.3 Water jug Problem

 In the water jug problem in Artificial Intelligence, we are provided with two jugs: one with a capacity of 3 gallons of water and the other with a capacity of 4 gallons. There is no other measuring equipment available, and the jugs have no markings on them. The agent's task is to fill the 4-gallon jug with exactly 2 gallons of water using only these two jugs and no other material. Initially, both jugs are empty.
 So, to solve this problem, the following set of rules was proposed, as shown in figure 1.16.
 Production rules for solving the water jug problem
 Here, let x denote the 4-gallon jug and y denote the 3-gallon jug.
Fig 1.16

 The listed production rules contain all the actions that could be performed by the agent in transferring the contents of the jugs. But to solve the water jug problem in the minimum number of moves, the following rules should be applied in the given sequence, as shown in figure 1.17.

Fig 1.17 Solution of water jug problem according to the production rules

 On reaching the 7th attempt, we reach a state which is our goal state. Therefore,
at this state, our problem is solved.
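A short breadth-first sketch of the water jug problem in Python (an illustrative sketch; x is the 4-gallon jug and y the 3-gallon jug, following the notation above, and the moves mirror the fill/empty/pour production rules of figure 1.16):

from collections import deque

def water_jug_bfs(goal=2):
    # A state is (x, y): gallons in the 4-gallon and the 3-gallon jug.
    start = (0, 0)
    parents = {start: None}
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        if x == goal:                                 # goal: exactly 2 gallons in the 4-gallon jug
            path, s = [], (x, y)
            while s is not None:
                path.append(s)
                s = parents[s]
            return list(reversed(path))
        moves = [
            (4, y), (x, 3),                           # fill either jug
            (0, y), (x, 0),                           # empty either jug
            (min(4, x + y), max(0, x + y - 4)),       # pour the 3-gallon jug into the 4-gallon jug
            (max(0, x + y - 3), min(3, x + y)),       # pour the 4-gallon jug into the 3-gallon jug
        ]
        for s in moves:
            if s not in parents:
                parents[s] = (x, y)
                queue.append(s)
    return None

print(water_jug_bfs())   # one shortest path, e.g. (0,0) (0,3) (3,0) (3,3) (4,2) (0,2) (2,0)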
1.9 SEARCH ALGORITHMS
 A search algorithm takes a search problem as input and returns a solution, or an
indication of failure.
 We consider algorithms that try to find a path that reaches a goal state.
 Each node in the search tree corresponds to a state in the state space and the edges in
the search tree correspond to actions.
 The root of the tree corresponds to the initial state of the problem.
 The state space describes the set of states in the world, and the actions that allow
transitions from one state to another.
 The search tree describes paths between these states, reaching towards the goal. The
search tree may have multiple paths to any given state, but each node in the tree has a
unique path back to the root (as in all trees)
 Figure 1.18 shows the first few steps in finding a path from Arad to Bucharest.
 The root node of the search tree is at the initial state, Arad.
 We can expand the node, by considering the available ACTIONS for that state, using
the RESULT function to see where those actions lead to, and generating a new node
(called a child node or successor node) for each of the resulting states. Each child
node has Arad as its parent node.
 At each stage, we have expanded every node on the frontier, extending every path
with all applicable actions that don’t result in a state that has already been reached.
 At the third stage, the topmost city (Oradea) has two successors, both of which have
already been reached by other paths, so no paths are extended from Oradea.
 Nodes that have been expanded and nodes on the frontier that have been generated are
shown. Nodes that could be generated next are shown in faint dashed lines. In the
bottom tree there is a cycle from Arad to Sibiu to Arad; that can’t be an optimal path,
so search should not continue from there.
Fig: 1.18

 Now we must choose which of these three child nodes to consider next. This is the
essence of search—following up one option now and putting the others aside for later.
Suppose we choose to expand Sibiu first, results a set of 6 unexpanded nodes. We call
this the frontier of the search tree. We say that any state that has had a node generated
for it has been reached (whether or not that node has been expanded).
1.9.1 Best-first search
 How do we decide which node from the frontier to expand next?
 A very general approach is called best-first search, in which we choose a node with the minimum value of some evaluation function f(n); the algorithm is shown in figure 1.19.
 On each iteration we choose a node on the frontier with minimum value,
return it if its state is a goal state, and otherwise apply EXPAND to generate
child nodes.
 Each child node is added to the frontier if it has not been reached before, or is
re-added if it is now being reached with a path that has a lower path cost than
any previous path.
 The algorithm returns either an indication of failure, or a node that represents
a path to a goal. By employing different functions, we get different specific
algorithms, which this chapter will cover.
Fig 1.19 The best-first search algorithm, and the function for expanding a node.
 Search data structures
Search algorithms require a data structure to keep track of the search tree. A node in
the tree is represented by a data structure with four components:
node.STATE: the state to which the node corresponds;
node.PARENT: the node in the tree that generated this node;
node.ACTION: the action that was applied to the parent's state to generate this node;
node.PATH-COST: the total cost of the path from the initial state to this node. In mathematical formulas, we use g(node) as a synonym for PATH-COST.
 We need a data structure to store the frontier.
 The appropriate choice is a queue of some kind, because the operations on a frontier
are:
IS-EMPTY(frontier) returns true only if there are no nodes in the frontier.
POP(frontier) removes the top node from the frontier and returns it.
TOP(frontier) returns (but does not remove) the top node of the frontier.
ADD(node, frontier) inserts node into its proper place in the queue.
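A compact Python sketch of the node data structure and the best-first search loop described above (an illustrative sketch assuming the RouteProblem interface from the earlier example; f can be any evaluation function):

import heapq, itertools

class Node:
    def __init__(self, state, parent=None, action=None, path_cost=0.0):
        self.state, self.parent = state, parent
        self.action, self.path_cost = action, path_cost

def expand(problem, node):
    # Generate a child node for every action applicable in node.state.
    for action in problem.actions(node.state):
        s2 = problem.result(node.state, action)
        cost = node.path_cost + problem.action_cost(node.state, action, s2)
        yield Node(s2, node, action, cost)

def best_first_search(problem, f):
    counter = itertools.count()                  # tie-breaker so the heap never compares Node objects
    node = Node(problem.initial)
    frontier = [(f(node), next(counter), node)]
    reached = {problem.initial: node}
    while frontier:
        _, _, node = heapq.heappop(frontier)     # node with the minimum f value
        if problem.is_goal(node.state):
            return node
        for child in expand(problem, node):
            s = child.state
            if s not in reached or child.path_cost < reached[s].path_cost:
                reached[s] = child
                heapq.heappush(frontier, (f(child), next(counter), child))
    return None                                  # failure

With f(n) = PATH-COST (that is, g(n)) this behaves as uniform-cost search; with f(n) = g(n) + h(n) it becomes A* (section 2.1.2).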
 MEASURING PROBLEM-SOLVING PERFORMANCE
 The output of problem-solving algorithm is either failure or a solution.
 The algorithm's performance can be measured in four ways :
i. Completeness: Is the algorithm guaranteed to find a solution when
there is one?
ii. Optimality : Does the strategy find the optimal solution
iii. Time complexity: How long does it take to find a solution?
iv. Space complexity: How much memory is needed to perform the
search?
1.10 UNINFORMED SEARCH STRATEGIES (Blind search)
 The term means that the strategies have no additional information about states
beyond that provided in the problem definition.
 All they can do is generate successors and distinguish a goal state from a non-goal
state.
 All search strategies are distinguished by the order in which nodes are expanded.
1.10.1 Breadth-first search
 Breadth-first search is a simple strategy in which the root node is expanded first,
then all the successors of the root node are expanded next, then their successors,
and so on.
 In general all the nodes are expanded at a given depth in the search tree before any
nodes at the next level are expanded.
 BFS is an instance of the general graph-search algorithm in which the shallowest
unexpanded node is chosen for expansion. This is achieved very simply by using
a FIFO queue for the frontier.

Fig: 1.20 Breadth first search algorithm


 Now suppose that the solution is at depth d. Then the total number of nodes generated is 1 + b + b^2 + ⋯ + b^d = O(b^d). All the nodes remain in memory, so both time and space complexity are O(b^d).
 The memory requirements are a bigger problem for breadth-first search than the
execution time.
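A minimal FIFO-queue version in Python (reusing the Node and expand helpers from the best-first sketch; BFS can safely apply the goal test when a node is generated rather than when it is expanded):

from collections import deque

def breadth_first_search(problem):
    node = Node(problem.initial)
    if problem.is_goal(node.state):
        return node
    frontier = deque([node])                 # FIFO queue: shallowest nodes come out first
    reached = {problem.initial}
    while frontier:
        node = frontier.popleft()
        for child in expand(problem, node):
            if problem.is_goal(child.state):
                return child                 # early goal test is safe for BFS
            if child.state not in reached:
                reached.add(child.state)
                frontier.append(child)
    return None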
1.10.2 Uniform-cost search
 Uniform-cost search does not care about the number of steps a path has, but only
about their total cost.
 By a simple extension, we can find an algorithm that is optimal with any step-cost
function.
 Instead of expanding the shallowest node, uniform-cost search expands the node n with the lowest path cost g(n).
 This is done by storing the frontier as a priority queue ordered by g .

Fig: 1.20

 The algorithm is shown in figure 1.20


 Uniform-cost search on a graph. The algorithm is identical to the general graph search
algorithm, except for the use of a priority queue and the addition of an extra check in
case a shorter path to a frontier state is discovered.
 The data structure for frontier needs to support efficient membership testing, so it
should combine the capabilities of a priority queue and a hash table.
1.10.3 Depth-first search
 Depth-first search always expands the deepest node in the current frontier of the
search tree.
 The progress of the search is illustrated in figure 1.21.
 The search proceeds immediately to the deepest level of the search tree, where the
nodes have no successors.
 Once all those nodes are expanded, they are dropped from the frontier, and the search "backs up" to the next deepest node that still has unexplored successors.
Fig: 1.21

 For a state space with branching factor b and maximum depth m, depth-first search requires storage of only O(bm) nodes.
1.10.4 Depth-limited search
 The embarrassing failure of depth-first search in infinite state spaces can be alleviated by supplying depth-first search with a predetermined depth limit l. That is, nodes at depth l are treated as if they have no successors. This approach is called depth-limited search.
 The depth limit solves the infinite-path problem. Unfortunately, it also introduces an additional source of incompleteness if we choose l < d, that is, if the shallowest goal is beyond the depth limit.
 Depth-limited search will also be non-optimal if we choose l > d. Its time complexity is O(b^l) and its space complexity is O(bl). Depth-first search can be viewed as a special case of depth-limited search with l = ∞.
Fig: 1.22 The Recursive implementation of Depth-limited tree search:
1.10.5 Iterative deepening search
 Iterative deepening search is used in combination with depth-first tree search to find the best depth limit.
 It does this by gradually increasing the limit (first 0, then 1, then 2, and so on) until a goal is found. This will occur when the depth limit reaches d, the depth of the shallowest goal node.
 The algorithm is shown in figure 1.23; it repeatedly applies depth-limited search with increasing limits. It terminates when a solution is found, or if the depth-limited search returns failure, meaning that no solution exists.

Fig: 1.23
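A recursive sketch of depth-limited search and the iterative-deepening driver in Python (again reusing the Node and expand helpers; the string "cutoff" distinguishes hitting the depth limit from a genuine failure, and cycle checking is omitted for brevity):

def depth_limited_search(problem, limit):
    def recurse(node, limit):
        if problem.is_goal(node.state):
            return node
        if limit == 0:
            return "cutoff"                  # depth limit reached on this path
        cutoff_occurred = False
        for child in expand(problem, node):
            result = recurse(child, limit - 1)
            if result == "cutoff":
                cutoff_occurred = True
            elif result is not None:
                return result
        return "cutoff" if cutoff_occurred else None
    return recurse(Node(problem.initial), limit)

def iterative_deepening_search(problem):
    depth = 0
    while True:
        result = depth_limited_search(problem, depth)
        if result != "cutoff":
            return result                    # a solution node, or None meaning no solution exists
        depth += 1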

1.10.6 Bidirectional Search


 The idea behind bidirectional search is to run two simultaneous searches
 One is the forward search from the initial state and
 other is the backward search from the goal state,
 It stops when the two searches meet in the middle.
 The motivation is that b^(d/2) + b^(d/2) is much less than b^d.
 The general best-first bidirectional search algorithm is shown in Figure 1.24.
 We pass in two versions of the problem and the evaluation function, one in the forward direction (subscript F) and one in the backward direction (subscript B).
 When the evaluation function is the path cost, we know that the first solution
found will be an optimal solution, but with different evaluation functions that is
not necessarily true.
 Therefore, we keep track of the best solution found so far, and might have to
update that several times before the TERMINATED test proves that there is no
possible better solution remaining.

Fig: 1.24

Advantage:
 Bidirectional search is fast and it requires less memory.
Disadvantage:
 We should know the goal state in advance.
Performance Evaluation
Completeness Bidirectional search is complete if branching factor b is finite and if we
use BFS in both searches.
Optimality Bidirectional search is optimal.
Time Complexity O ¿) if it used BFS (where b is the branching factors or number of
nodes and d is the depth of the search tree or number of levels in search tree).
Space ComplexityO ¿
2.1 Informed (Heuristic) Search Strategies
 Informed search strategy is one that uses problem-specific knowledge beyond the
definition of the problem itself.
 It can find solutions more efficiently than uninformed strategy.
 The hints come in the form of a heuristic function, denoted h(n).
Where, h(n) = estimated cost of the cheapest path from the state at node n to a goal state.
 For example, in route-finding problems, we can estimate the distance from the current
state to a goal by computing the straight-line distance on the map between the two
points
Best-first search
 Best-first search is an instance of general TREE-SEARCH or GRAPH-SEARCH
algorithm in which a node is selected for expansion based on an evaluation function
f(n).
 The node with the lowest evaluation is selected for expansion, because the evaluation measures the distance to the goal.
 This can be implemented using a priority queue, a data structure that maintains the fringe in ascending order of f-values.
Heuristic functions
 A heuristic function, or simply a heuristic, is a function that ranks alternatives in various search algorithms at each branching step, based on the available information, in order to decide which branch to follow during the search.
 The key component of the best-first search algorithm is a heuristic function, denoted by h(n):
h(n) = estimated cost of the cheapest path from node n to a goal node.
 For example, in Romania, one might estimate the cost of the cheapest path from Arad to Bucharest by the straight-line distance from Arad to Bucharest (Figure 2.1).
 Heuristic function are the most common form in which additional knowledge is
imparted to the search algorithm.

2.1.1 Greedy Best-first search


 Greedy best-first search tries to expand the node that is closest to the goal, on the grounds that this is likely to lead to a solution quickly.
 It evaluates the nodes by using the heuristic function f(n) = h(n).
 Taking the example of Route-finding problems in Romania , the goal is to reach
Bucharest starting from the city Arad.
 We need to know the straight-line distances to Bucharest from various cities, as shown in Figure 2.1. For example, the initial state is In(Arad), and the straight-line distance heuristic h_SLD(In(Arad)) is found to be 366.
 Using the straight-line distance heuristic h_SLD, the goal state can be reached faster.

Fig: 2.2
Figure 2.2 shows the progress of greedy best-first search using h_SLD to find a path from Arad to Bucharest. The first node to be expanded from Arad will be Sibiu, because it is closer to Bucharest than either Zerind or Timisoara. The next node to be expanded will be Fagaras, because it is closest. Fagaras in turn generates Bucharest, which is the goal.
Properties of greedy search
 Complete?? No; it can get stuck in loops (e.g., oscillating between Iasi and Neamt when the goal is Fagaras).
It is complete in a finite space with repeated-state checking.
 Time?? O(b^m), but a good heuristic can give dramatic improvement.
 Space?? O(b^m); keeps all nodes in memory.
 Optimal?? No.
 Greedy best-first search is not optimal, and it is incomplete.
 The worst-case time and space complexity is O(b^m), where m is the maximum depth of the search space.
2.1.2 A* Search
 A* Search is the most widely used form of best-first search. The evaluation function
f(n) is
obtained by combining
i. g(n) = the cost to reach the node, and
ii. h(n) = the cost to get from the node to the goal:
f(n) = g(n) + h(n).
 A* search is both optimal and complete. A* is optimal if h(n) is an admissible heuristic. The obvious example of an admissible heuristic is the straight-line distance h_SLD, which can never be an overestimate.
 A* search is optimal if h(n) is an admissible heuristic, that is, provided that h(n) never overestimates the cost to reach the goal.
 An obvious example of an admissible heuristic is the straight-line distance h_SLD that we used in getting to Bucharest. The progress of an A* tree search for Bucharest is shown in Figure 2.2.
 The values of g are computed from the step costs shown in the Romania map (Figure 2.1). The values of h_SLD are also given in Figure 2.1.
Fig: 2.2

 A* search is complete.
 Whether A* is cost-optimal depends on certain properties of the heuristic.
 A key property is admissibility: an admissible heuristic is one that never
overestimates the cost to reach a goal.
 A slightly stronger property is called consistency. A heuristic is consistent if, for every node n and every successor n' of n generated by an action a, we have:
h(n) ≤ c(n, a, n') + h(n').
 This is a form of the triangle inequality, which stipulates that a side of a triangle cannot be longer than the sum of the other two sides (see figure 2.3). An example of a consistent heuristic is the straight-line distance h_SLD that we used in getting to Bucharest.

Fig: 2.3
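Since A* is best-first search with f(n) = g(n) + h(n), it follows directly from the earlier best-first sketch; the h_SLD values below are the standard straight-line distances for a few of the cities and are shown only for illustration:

H_SLD = {"Arad": 366, "Sibiu": 253, "Timisoara": 329, "Zerind": 374,
         "Fagaras": 176, "Rimnicu Vilcea": 193, "Pitesti": 100, "Bucharest": 0}

def astar_search(problem, h):
    # A*: order the frontier by f(n) = g(n) + h(n).
    return best_first_search(problem, lambda n: n.path_cost + h(n.state))

def greedy_best_first_search(problem, h):
    # Greedy best-first: order the frontier by h(n) alone.
    return best_first_search(problem, lambda n: h(n.state))

goal_node = astar_search(RouteProblem("Arad", "Bucharest"), H_SLD.get)

On the small map fragment used earlier, A* returns the path Arad - Sibiu - Rimnicu Vilcea - Pitesti - Bucharest with path cost 418.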

2.1.3 Memory-bounded search


 The main issue with A* is its use of memory.
 Memory is split between the frontier and the reached states.
 In our implementation of best-first search, a state that is on the frontier is stored in
two places: as a node in the frontier (so we can decide what to expand next) and as an
entry in the table of reached states (so we know if we have visited the state before).
 We can keep reference counts of the number of times a state has been reached, and
remove it from the reached table when there are no more ways to reach the state.
 Beam search limits the size of the frontier. The easiest approach is to keep only the k nodes with the best f-scores, discarding any other expanded nodes.
 This makes the search incomplete and suboptimal, but we can choose to make good
use of available memory, and the algorithm executes fast because it expands fewer
nodes.
2.1.4 Iterative-deepening A* search
 (IDA*) is to A* what iterative-deepening search is to depth first: IDA* gives us the
benefits of A* without the requirement to keep all reached states in memory, at a cost
of visiting some states multiple times.
 It is a very important and commonly used algorithm for problems that do not fit in
memory.
 In standard iterative deepening the cutoff is the depth, which is increased by one each
iteration. In IDA* the cutoff is the f-cost (g + h); at each iteration, the cutoff value is the smallest f-cost of any node that exceeded the cutoff on the previous iteration.
2.1.5 Recursive Best-first Search(RBFS)
 Recursive best-first search is a simple recursive algorithm that attempts to mimic the
operation of standard best-first search,but using only linear space. The algorithm is
shown in figure 2.4.
 Its structure is similar to that of recursive depth-first search, but rather than continuing indefinitely down the current path, it keeps track of the f-value of the best alternative path available from any ancestor of the current node.
 If the current node exceeds this limit, the recursion unwinds back to the alternative path. As the recursion unwinds, RBFS replaces the f-value of each node along the path with the best f-value of its children.

Fig: 2.4

Figure 2.5 shows how RBFS reaches Bucharest.


 RBFS is optimal if the heuristic function is admissible. Its space complexity is
linear in the depth of the deepest optimal solution, but its time complexity is rather
difficult to characterize: it depends both on the accuracy of the heuristic function and
on how often the best path changes as nodes are expanded. It expands nodes in order of increasing f-score, even if f is nonmonotonic.
 IDA* and RBFS suffer from using too little memory. Between iterations, IDA*
retains only a single number: the current f-cost limit.
 RBFS retains more information in memory, but it uses only linear space: even if
more memory were available, RBFS has no way to make use of it.
 Because they forget most of what they have done, both algorithms may end up
reexploring the same states many times over
 To determine how much memory we have available, and allow an algorithm to use all
of it. Two algorithms that do this are MA* (memory-bounded A*) and SMA*
(simplified MA*).
2.1.6 SMA*
 SMA* proceeds just like A*, expanding the best leaf until memory is full.
 At this point, it cannot add a new node to the search tree without dropping an old one.
 SMA* always drops the worst leaf node, the one with the highest f-value.
 Like RBFS, SMA* then backs up the value of the forgotten node to its parent.
 In this way, the ancestor of a forgotten subtree knows the quality of the best path in
that subtree.
 With this information, SMA* regenerates the subtree only when all other paths have
been shown to look worse than the path it has forgotten.
 Another way of saying this is that if all the descendants of a node n are forgotten, then we will not know which way to go from n, but we will still have an idea of how worthwhile it is to go anywhere from n.
2.2 Heuristic Functions
 A heuristic function, or simply a heuristic, is a function that ranks alternatives in various search algorithms at each branching step, based on the available information, in order to decide which branch to follow during the search.
The 8-puzzle
 The 8-puzzle is an example of a heuristic search problem. The object of the puzzle is to slide the tiles horizontally or vertically into the empty space until the configuration matches the goal configuration (Figure 2.6).

 The average solution cost for a randomly generated 8-puzzle instance is about 22
steps.
 The branching factor is about 3. (When the empty tile is in the middle, there are four possible moves; when it is in a corner there are two; and when it is along an edge there are three.)
 This means that an exhaustive search to depth 22 would look at about 3^22 ≈ 3.1 × 10^10 states.
 By keeping track of repeated states, we could cut this down by a factor of about
170,000, because there are only 9!/2 = 181,440 distinct states that are reachable. This
is a manageable number, but the corresponding number for the 15-puzzle is roughly 10^13.
 If we want to find the shortest solutions by using A*,we need a heuristic function that
never overestimates the number of steps to the goal.
 The two commonly used heuristic functions for the 8-puzzle are:
i. h1 = the number of misplaced tiles.
For figure 2.6, all of the eight tiles are out of position, so the start state would have h1 = 8. h1 is an admissible heuristic.
ii. h2 = the sum of the distances of the tiles from their goal positions.
This is called the city block distance or Manhattan distance.
h2 is admissible, because any move can only bring one tile one step closer to the goal.
Tiles 1 to 8 in the start state give a Manhattan distance of
h2 = 3 + 1 + 2 + 2 + 2 + 3 + 3 + 2 = 18.
Neither of these overestimates the true solution cost, which is 26.
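Both heuristics are easy to compute directly. A Python sketch (states are 9-tuples listing tiles row by row with 0 for the blank; the goal tuple is whatever configuration figure 2.6 uses):

def h1(state, goal):
    # Number of misplaced tiles (the blank, written 0, is not counted).
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)

def h2(state, goal):
    # Sum of the Manhattan (city block) distances of the tiles from their goal squares.
    goal_pos = {tile: (i // 3, i % 3) for i, tile in enumerate(goal)}
    total = 0
    for i, tile in enumerate(state):
        if tile != 0:
            r, c = i // 3, i % 3
            gr, gc = goal_pos[tile]
            total += abs(r - gr) + abs(c - gc)
    return total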

2.2.1 The effect of heuristic accuracy on performance


The Effective Branching factor
One way to characterize the quality of a heuristic is the effective branching factor b*. If the total number of nodes generated by A* for a particular problem is N, and the solution depth is d, then b* is the branching factor that a uniform tree of depth d would have to have in order to contain N + 1 nodes. Thus,
N + 1 = 1 + b* + (b*)^2 + ⋯ + (b*)^d.

 For example, if A* finds a solution at depth 5 using 52 nodes, then the effective branching factor is 1.92.
 To test the heuristic functions h1 and h2, 1200 random problems with solution lengths from 2 to 24 were generated and solved with iterative deepening search and with A* search using both h1 and h2.
 Figure 2.7 gives the average number of nodes expanded by each strategy and the
effective branching factor.
 The results suggest that h2 is better than h1, and is far better than using iterative deepening search.
Fig 2.7 Comparison of the search cost and effective branching factor
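The value of b* for the example above (depth 5, 52 nodes) can be recovered numerically by solving N + 1 = 1 + b* + (b*)^2 + ⋯ + (b*)^d, for instance by bisection (an illustrative sketch):

def effective_branching_factor(N, d, tol=1e-6):
    # Solve N + 1 = 1 + b + b^2 + ... + b^d for b by bisection.
    def total(b):
        return sum(b ** i for i in range(d + 1))
    lo, hi = 1.0, float(N + 1)              # the root lies between 1 and N + 1
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if total(mid) < N + 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(effective_branching_factor(52, 5), 2))   # about 1.92, matching the example above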

2.2.2 Generating heuristics from relaxed problems


Relaxed problems
 A problem with fewer restrictions on the actions is called a relaxed problem
 The cost of an optimal solution to a relaxed problem is an admissible heuristic for the
original problem
 If the rules of the 8-puzzle are relaxed so that a tile can move anywhere, then h1(n)
gives the shortest solution
 If the rules are relaxed so that a tile can move to any adjacent square, then h2(n)
gives the shortest solution
 Hence, the cost of an optimal solution to a relaxed problem is an admissible heuristic
for the original problem.
 For example, if the 8-puzzle actions are described as
A tile can move from square X to square Y if
X is adjacent to Y and Y is blank,
 we can generate three relaxed problems by removing one or both of the conditions:
i. A tile can move from square X to square Y if X is adjacent to
Y.
ii. A tile can move from square X to square Y if Y is blank.
iii. A tile can move from square X to square Y.
 From (i), we can derive h2 (Manhattan distance). The reasoning is that h2 would be the proper score if we moved each tile in turn to its destination.
 From (iii), we can derive h1 (misplaced tiles), because it would be the proper score if tiles could move to their intended destination in one action.
 If the relaxed problem is hard to solve, then the values of the corresponding heuristic
will be expensive to obtain.
 A program called ABSOLVER can generate heuristics automatically from problem
definitions, using the “relaxed problem” method.
2.2.3 Generating admissible heuristics from sub problems: Pattern databases
 Admissible heuristics can also be derived from the solution cost of a subproblem
of a given problem.

Fig 2.8

Pattern databases
 The idea behind pattern databases is to store these exact solution costs for every
possible subproblem instance- in our example, every possible configuration of the
four tiles and the blank.
 Then we compute an admissible heuristic for each state encountered during a
search simply by looking up the corresponding subproblem configuration in the
database.
 The database itself is constructed by searching back from the goal and recording
the cost of each new pattern encountered;
2.2.4 Generating heuristics with landmarks
 There are online services that host maps with tens of millions of vertices and find
cost-optimal driving directions in milliseconds (figure 2.9)
 How can they do that, when the best search algorithms we have considered so far
are about a million times slower?
 There are many tricks, but the most important one is precomputation of some
optimal path costs.
 Although the precomputation can be time-consuming, it need only be done once,
and then can be amortized over billions of user search requests.
Fig 2.9

 If the optimal path happens to go through a landmark, this heuristic will be exact; if
not it is inadmissible—it overestimates the cost to the goal.
 In an A* search, if you have exact heuristics, then once you reach a node that is on an
optimal path, every node you expand from then on will be on an optimal path.
 Some route-finding algorithms save even more time by adding shortcuts—artificial
edges in the graph that define an optimal multi-action path.

 This is called a differential heuristic.


2.2.5 Learning to search better

 Could an agent learn how to search better? The answer is yes, and the method rests on
an important concept called the metalevel state space.
 Each state in a metalevel state space captures the internal (computational) state of a
program that is searching in an ordinary state space such as the map of Romania.
(To keep the two concepts separate, we call the map of Romania an object-level state space.)
 Each action in the metalevel state space is a computation step that alters the internal
state; for example, each computation step in A* expands a leaf node and adds its
successors to the tree.
 For harder problems, there will be many such missteps, and a metalevel learning
algorithm can learn from these experiences to avoid exploring unpromising subtrees.
 The goal of learning is to minimize the total cost of problem solving, trading off
computational expense and path cost.
2.2.6 Learning heuristics from experience
 One way to invent a heuristic is to devise a relaxed problem for which an optimal
solution can be found easily.
 An alternative is to learn from experience. “Experience” here means solving lots
of 8-puzzles, for instance.
 Each optimal solution to an 8-puzzle problem provides an example (goal, path)
pair. From these examples, a learning algorithm can be used to construct a
function that can approximate the true path cost for other states that arise during
search.
2.3 LOCAL SEARCH AND OPTIMIZATION PROBLEM
Local Search
 Local search algorithms operate using a single current node and generally move
only to neighbors of that node.
 Local search methods keep only a small number of nodes in memory. They are suitable for problems in which the solution is the goal state itself, and not the path to it.
Local search has two key advantages:
 They use very little memory - usually a constant amount
 They can often find reasonable solutions in large or infinite state spaces for which
systematic algorithms are unsuitable.
Optimization Problem
In addition to finding goals, local search algorithms are useful for solving pure optimization
problems, in which the aim is to find the best state according to an objective function.
Hill Climbing and Simulated annealing are examples of local search algorithms
2.3.1 Hill Climbing search
 It is a local search algorithm which continuously moves in the direction of
increasing elevation/value to find the peak of the mountain or best solution to the
problem.
 It terminates when it reaches a peak value where no neighbor has a higher value.
 Hill climbing is sometimes called greedy local search because it grabs a good
neighbor state without thinking ahead about where to go next.
Fig: 2.10

To illustrate hill climbing, we will use the 8-queens problem


Fig: 2.11 (a) The 8-queens problem: place 8 queens on a chess board so that no queen
attacks another
Fig: (b) An 8-queens state with heuristic cost estimate. The board shows the value of h for each possible successor obtained by moving a queen within its column. There are 8 moves that are tied for best, with h = 12.

 h = number of pairs of queens that are attacking each other, either directly or indirectly; h = 17 for the state shown above.
 A local minimum in the 8-queens state space is a state with h = 1 from which every successor has a higher cost.
Limitations:
 Hill climbing cannot reach the optimum/best state(global maximum) if it enters any of
the following regions:
Local Maxima
 A local maximum is a peak that is higher than each of its neighbouring states but
lower than the global maximum.
Plateaus
 A plateau is a flat area of the state-space landscape.
 It can be a flat local maximum, from which no uphill exit exists, or a shoulder, from which progress is possible.
Ridges
 A Ridge is an area which is higher than surrounding states, but it cannot be
reached in a single move.

Fig: 2.12

 A ridge, as shown in figure 2.12, results in a sequence of local maxima that is very difficult for a greedy algorithm to navigate.
Variations of Hill Climbing
 In steepest Ascent hill climbing all successors are compared and the closest to the
solution is chosen.
 Steepest ascent hill climbing is like best-first search, which tries all possible
extensions of the current path instead of only one.
 It tends to give better solutions but is time-consuming.
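A steepest-ascent style sketch for 8-queens in Python (an illustrative sketch; a state is a tuple of 8 rows, one queen per column, and since h counts attacking pairs, climbing uphill in value means moving downhill in h):

import random

def conflicts(state):
    # h: the number of pairs of queens attacking each other (same row or same diagonal).
    n = len(state)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if state[i] == state[j] or abs(state[i] - state[j]) == j - i)

def hill_climb(n=8):
    state = tuple(random.randrange(n) for _ in range(n))
    while True:
        neighbours = [state[:c] + (r,) + state[c + 1:]
                      for c in range(n) for r in range(n) if r != state[c]]
        best = min(neighbours, key=conflicts)        # steepest descent in h
        if conflicts(best) >= conflicts(state):
            return state, conflicts(state)           # no better neighbour (0 conflicts = solution)
        state = best

When the returned state still has conflicts, the search has reached a local minimum or plateau; random-restart hill climbing simply calls hill_climb again from a new random state.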
2.3.2 Simulated Annealing:
 Annealing is the process used to temper or harden metals and glass by heating them to
a high temperature and then gradually cooling them, thus allowing the material to
reach a low-energy crystalline state.
 The simulated annealing algorithm is quite similar to hill climbing.
 Instead of picking the best move, however, it picks a random move.
 If the move improves the situation, it is always accepted.
 Otherwise the algorithm accepts the move with some probability less than 1.
 Checks all the neighbours.
 Moves to a worse state may sometimes be accepted.
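A generic simulated-annealing sketch in Python (the neighbour generator, value function and cooling schedule are placeholders to be supplied by the problem; an exponential schedule such as schedule = lambda t: 1.0 * 0.99 ** t is a common choice):

import math, random

def simulated_annealing(initial, neighbours, value, schedule, max_steps=100000):
    # value(state) is the objective to maximize; schedule(t) gives the temperature at step t.
    current = initial
    for t in range(1, max_steps + 1):
        T = schedule(t)
        if T <= 0:
            return current
        nxt = random.choice(neighbours(current))     # pick a random move
        delta = value(nxt) - value(current)
        # Always accept improvements; accept worsening moves with probability e^(delta/T).
        if delta > 0 or random.random() < math.exp(delta / T):
            current = nxt
    return current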
2.3.3 Local Beam Search
 The local beam search algorithm keeps track of k states rather than just one.
 It begins with k randomly generated states.
 At each step, all the successors of all states are generated.
 If any one is a goal, the algorithm halts. Otherwise, it selects the k best successors from the complete list and repeats.
Limitations
 It explores only the best k nodes, which can mean a lack of diversity; to address this problem, stochastic beam search was introduced.
 Instead of choosing the best k from the pool of candidate successors, stochastic beam
search chooses k successors at random, with the probability of choosing a given
successor being an increasing function of its value.
2.3.4 Evolutionary algorithms
 A genetic algorithm is a variant of stochastic beam search in which successor states
are generated by combining two parent states rather than by modifying a single
state.
 This algorithm reflects the process of natural selection where the fittest individuals are
selected for reproduction in order to produce offspring of the next generation.
 The fitness function evaluates how close a given solution is to the optimal solution
of the desired problem. A fitness function should return higher values for better
states.
Fig:2.13

Consider a 8 queens problem:


 The problem is to place 8 queens on a chess board so that none of them attacks another.
 A chess board can be considered a plain board with eight columns and eight rows.
 Here, the fitness function is the number of pairs of non-attacking queens.
 There is a population of individuals (states), in which the fittest (highest value)
individuals produce offspring (successor states) that populate the next generation, a
process called recombination.

Fig 2.13 c)
Fig 2.14
Fig 2.13 d)
Fig 2.13

A genetic algorithm is characterized by the following:


 The size of the population.
 The representation of each individual. In evolution strategies, an individual is
a sequence of real numbers, and in genetic programming an individual is a
computer program.
 The selection process for selecting the individuals who will become the
parents of the next generation: one possibility is to select from all individuals
with probability proportional to their fitness score.
 The recombination procedure. One common approach is to randomly select a
crossover point to split each of the parent strings, and recombine the parts to
form two children, one with the first part of parent 1 and the second part of
parent 2; the other with the second part of parent 1 and the first part of parent
2.
 The mutation rate, which determines how often offspring have random
mutations to their representation. Once an offspring has been generated,
every bit in its composition is flipped with probability equal to the
mutation rate.
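A compact genetic-algorithm sketch for 8-queens along the lines described above (an illustrative sketch; individuals are 8-tuples of rows, fitness is the number of non-attacking pairs, 28 for a solution, and the population size, mutation rate and generation count are illustrative):

import random

def fitness(ind):
    # Number of non-attacking pairs of queens (28 means a perfect 8-queens solution).
    n = len(ind)
    attacking = sum(1 for i in range(n) for j in range(i + 1, n)
                    if ind[i] == ind[j] or abs(ind[i] - ind[j]) == j - i)
    return n * (n - 1) // 2 - attacking

def genetic_algorithm(pop_size=100, mutation_rate=0.05, generations=1000, n=8):
    population = [tuple(random.randrange(n) for _ in range(n)) for _ in range(pop_size)]
    for _ in range(generations):
        best = max(population, key=fitness)
        if fitness(best) == n * (n - 1) // 2:
            return best
        weights = [fitness(ind) + 1 for ind in population]   # +1 keeps every weight positive
        new_population = []
        for _ in range(pop_size):
            p1, p2 = random.choices(population, weights=weights, k=2)   # fitness-proportional selection
            cut = random.randrange(1, n)                                # crossover point
            child = list(p1[:cut] + p2[cut:])
            for i in range(n):                                          # per-position mutation
                if random.random() < mutation_rate:
                    child[i] = random.randrange(n)
            new_population.append(tuple(child))
        population = new_population
    return max(population, key=fitness)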

3.1 Game theory


 Multiagent environments are a form of competitive environment if the agent’s goals
are conflicting.
 Such environments lead to adversarial search problems (games).
 Mathematical game theory considers multiagent environments as a game when agent
decisions significantly impact other agents.
3.1.1 Two- player zero-sum games
 Most common games in AI are deterministic, turn-taking, two-player, zero-sum games of perfect information (chess, checkers).
 Another way to put it: deterministic, fully observable environments with agents
alternating turns having end game utility values always equal and opposite.
 One player wins, the other must lose.
 Interesting problems to study because they are hard.
 Chess has average branching factor of 35
 Must make a decision, even if optimal takes too long to find.
 Games penalize inefficiency
 Some ideas for saving time:
 Pruning lets you ignore parts of search trees.
 Heuristic evaluation function help you to make good guesses without
completing an entire search.
 Some games have to deal with imperfect information.
 Can’t see all the cards at once in solitaire.
 To start, consider a two-player game with players MIN and MAX
 MAX goes first.
 Turns alternate until game ends.
 Points given to winner, penalties to loser.
 Then, a game becomes a search problem.
 Properties of the search problem:
S0: the initial state (the game setup).
PLAYER(s): whose turn it is to move in state s.
ACTIONS(s): the legal moves in state s.
RESULT(s, a): the transition model, the result of the move.
TERMINAL-TEST(s): a terminal test; returns true if the game is over.
Game-ending states are terminal states.
UTILITY(s): the utility function (objective function), which assigns a "score".
 The score:
 In chess, +1 for a win, 0 for a loss, ½ for tie
 In backgammon, from 0 to +192.
 Zero-sum game: the total payoff to all players is the same for every instance of the game.
 Chess is zero-sum. Payoffs: 0 + 1, 1 + 0, or ½ + ½.
 The game’s game tree is defined by ACTIONS, RESULT and initial state:

Fig 3.1

 Notes about the tree:


 MAX starts and can make 9 moves.
 Play alternates.
 Play ends once reaching a leaf node.
 Leaves are terminal states: three-in-a-row or a draw (a "cat's game").
 Leaf numbers are utility values from MAX’s POV.
 High values good for MAX, low values good for MIN.
 The tree is fairly small for tic-tac-toe: fewer than 9! terminal nodes.
 The tree is huge for chess: over 10^40 nodes.
 Tree is theoretical.
 A “search tree” is a tree superimposed on the game tree.
 Look at just enough nodes to make a move.
 Player always picks the best move it can.
3.2 OPTIMAL DECISION IN GAMES
 An optimal search provides an action sequence ending in a winning terminal state.
 In adversarial searches, MIN has an impact on this search.
 In response, MAX finds a contingent strategy.
 What is this?
 Max picks an initial move and then comes up with moves in response
to every MIN counter move.
 Basically, MAX is anticipating its moves, all of MIN's counter moves, MAX's counter-counter moves, and so on.
 Analogous to the AND-OR search algorithm.
MAX plays the role of OR.
MIN plays the role of AND.
 Think of it this way: an optimal strategy provides outcomes at least as good as any other strategy, assuming a perfect opponent.
 How to find it?
 Since even TIC-TAC-TOE is too complicated, consider:

Fig 3.2

 To start, MAX can take moves a1, a2, a3.


 MIN can reply with b1, b2, b3, etc.
 Game ends with one move by each.
 So:
 The game tree is one move deep.
 There are 2 half moves.
 Each half move is called a ply.
 Terminal state utilities range from 2 to 14.
 Find the optimal strategy using the minimax value of each node:
 Use the function MINIMAX(n).
 This value assumes both players play optimally for the entire game.
 The minimax value of a terminal state is just its utility.
 MAX prefers a move with maximum value.
 MIN prefers a move with minimum value.
 A definition for the function MINIMAX:
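The definition, in the standard textbook form, is:

\[
\mathrm{MINIMAX}(s)=
\begin{cases}
\mathrm{UTILITY}(s) & \text{if TERMINAL-TEST}(s)\\
\max_{a \in \mathrm{ACTIONS}(s)} \mathrm{MINIMAX}(\mathrm{RESULT}(s,a)) & \text{if PLAYER}(s)=\mathrm{MAX}\\
\min_{a \in \mathrm{ACTIONS}(s)} \mathrm{MINIMAX}(\mathrm{RESULT}(s,a)) & \text{if PLAYER}(s)=\mathrm{MIN}
\end{cases}
\]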

 MAX will fare even better if MIN plays suboptimally.


 To start, the best minimax decision for MAX is move a1.
 MIN's best response is b1.
3.2.1 The minimax search algorithm
 Minimax is a kind of backtracking algorithm that is used in game theory to find the
optimal move for a player.
 It is widely used in two player turn-based games.
 Example: chess, checkers, tic-tac-toe.
 In minimax the two players are called MAX and MIN.
 MAX tries to maximize the value.
 MIN tries to minimize the value.
 The minimax algorithm finds the minimax decision for the current state.
 Uses recursion to get to a terminal state (leaf)
 The minimax values are backed up through the recursion tree.
Fig 3.3

 Performs a complete depth-first search of the game tree


 For a tree of depth m with b moves at each point, the time complexity is O(b^m).
 Space complexity is O(bm) when generating all actions at once, or O(m) if actions are generated one at a time.
 For real games the time cost is completely impractical, but minimax serves as the starting point for more practical algorithms.
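A recursive depth-first sketch in Python (an illustration assuming a game object with the interface listed in section 3.1.1: actions, result, terminal_test and utility, with the utility given from MAX's point of view):

def minimax_decision(game, state):
    # MAX chooses the action whose resulting state has the highest backed-up value.
    return max(game.actions(state),
               key=lambda a: min_value(game, game.result(state, a)))

def max_value(game, state):
    if game.terminal_test(state):
        return game.utility(state)           # utility is taken from MAX's point of view
    return max(min_value(game, game.result(state, a)) for a in game.actions(state))

def min_value(game, state):
    if game.terminal_test(state):
        return game.utility(state)
    return min(max_value(game, game.result(state, a)) for a in game.actions(state))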
3.2.2 Optimal decisions in multiplayer games
 The single value for each node gets replaced with a vector of values (one for each player).
 For players A, B and C, the vector <vA, vB, vC> would be assigned to each node.
 The vector gives values at terminal nodes from each player’s perspective.
 How? Have UTILITY return a vector of utilities.
 Consider:
Fig 3.4

 C chooses a move from the ‘X’ state.


 The choice gives either <vA = 1, vB = 2, vC = 6> or <vA = 4, vB = 2, vC = 3>.
 C picks the former because vC = 6.
 So, play will lead to the state with the vector containing that value.
 Then, the “backed-up” value of X is the vector with that value.
 Multiplayer games usually involve alliances, whether formal or informal, among the
players. Alliances are made and broken as the game proceeds.
 Multiplayer alliances can lead to complications.
 C too strong? A and B gang up.

3.2.3 Alpha-Beta Pruning


 Alpha-beta pruning is a modified version of the minimax algorithm
 It is an optimization technique for the minimax algorithm.
 Alpha-beta pruning is the pruning(cutting down) of useless branches in decision
trees.
 Alpha (α): the best (highest) value found so far for MAX.
Initial value: α = −∞.
The MAX player will only update the value of alpha.
 Beta (β): the best (lowest) value found so far for MIN.
Initial value: β = +∞.
The MIN player will only update the value of beta.
 Pruning condition: prune when α ≥ β.
 How? Figure out the right minimax decision without examining all possible nodes.
 How? Use pruning to ignore parts of the tree.
 The technique is called alpha-beta pruning.
 Used to get the same decision as minimax without all the work.
 Consider:
Fig 3.5

 We can also simplify the formula for minimax:
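For the two-ply tree of figure 3.5 (whose leaf values, in the standard example, are assumed to be 3, 12, 8 under the first MIN node, 2, x, y under the second, and 14, 5, 2 under the third), the simplification reads:

\[
\begin{aligned}
\mathrm{MINIMAX}(root) &= \max(\min(3,12,8),\ \min(2,x,y),\ \min(14,5,2))\\
&= \max(3,\ \min(2,x,y),\ 2)\\
&= \max(3,\ z,\ 2) \quad \text{where } z=\min(2,x,y)\le 2\\
&= 3.
\end{aligned}
\]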

 So, we can disregard x and y nodes because MIN will pick 2, no matter what.
 Why? MIN's value for that branch will be at most 2, which is already worse for MAX than the value 3 available from the first branch.
 Another way of looking at it:

Fig 3.6
 Works, in part, because of DFS.
 Where did the name come from?
 Keeping track of the best values for MAX(alpha) and the best values for MIN(beta):

Fig 3.7
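A sketch of alpha-beta search in Python (using the same assumed game interface as the minimax sketch above; the comments mark where each prune happens):

def alphabeta_search(game, state):
    def max_value(s, alpha, beta):
        if game.terminal_test(s):
            return game.utility(s)
        v = float("-inf")
        for a in game.actions(s):
            v = max(v, min_value(game.result(s, a), alpha, beta))
            if v >= beta:
                return v                     # prune: MIN above will never let play reach here
            alpha = max(alpha, v)
        return v

    def min_value(s, alpha, beta):
        if game.terminal_test(s):
            return game.utility(s)
        v = float("inf")
        for a in game.actions(s):
            v = min(v, max_value(game.result(s, a), alpha, beta))
            if v <= alpha:
                return v                     # prune: MAX above will never let play reach here
            beta = min(beta, v)
        return v

    # Return the root action with the best backed-up value for MAX.
    return max(game.actions(state),
               key=lambda a: min_value(game.result(state, a), float("-inf"), float("inf")))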

Move Ordering
 Order in which states are looked at can dramatically impact performance.
 Depending on the values of each state, it can be determined to examine fewer nodes.
o Determine the smallest value sooner and you don’t need to look at the others.
 If examinations begin with the likely best successors:
o Alpha-beta need only examine O(b^(m/2)) nodes.
o Minimax needs O(b^m).
o The branching factor essentially becomes √b instead of b.
 Chess would go from 35 to something like 6.
 Dynamic move-ordering schemes can improve it further.
Example: use moves that were best in the past.
 These best moves are often called killer moves.
 Trying them first is called the killer move heuristic.
 In certain games, transpositions can kill performance. (certain moves that are mirrors
of each other).
 Example: chess pieces,
[a1, b1, a2, b2] mirrors [a2, b2, a1, b1]
 Pieces ending up in the same position, just different
order of same moves to get there.
 The redundant paths to repeated states can cause an exponential increase in search
cost, and that keeping a table of previously reached states can address this problem.
 In game tree search, repeated states can occur because of transpositions—different
permutations of the move sequence that end up in the same position, and the problem
can be addressed with a transposition table that caches the heuristic value of states.
 Keep a transposition table. Ignore the duplicates.
 Similar to the explored list from GRAPH-SEARCH
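One way to sketch this is to memoize backed-up values keyed by the position and the player to move; the keying scheme below is illustrative (real chess programs typically use a compact position hash such as Zobrist hashing):

def cached_minimax(node, maximizing=True, table=None):
    """Minimax with a transposition table over a nested-tuple game tree.

    Transpositions (different move orders reaching the same position) show up
    here as identical subtrees; their backed-up value is computed once and
    then reused from the table.
    """
    if table is None:
        table = {}
    if not isinstance(node, tuple):              # leaf utility
        return node
    key = (node, maximizing)
    if key in table:                             # repeated state: reuse value
        return table[key]
    values = [cached_minimax(c, not maximizing, table) for c in node]
    value = max(values) if maximizing else min(values)
    table[key] = value
    return value

# The subtree (7, 1) is reached twice (a transposition) but searched only once.
shared = (7, 1)
assert cached_minimax(((shared, (2, 9)), ((4, 6), shared))) == 7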

3.3 Heuristic Alpha-Beta search


 Alpha-beta pruning can help, but searches can still take too long (they have to go all the way to the leaves).
 To improve on this, terminate searches early and evaluate the cut-off states with a heuristic evaluation function.
 Treats non-terminal nodes like leaves.
 So, modify min-max or alpha-beta by:
 Replace utility function with heuristic function EVAL.
 Replace terminal test with a cutoff test.
 So, we will then have a heuristic minimax function for states with a max depth of d:

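In the usual notation (with ACTIONS, RESULT, and PLAYER as in ordinary minimax), the cutoff version can be written as:

H-MINIMAX(s, d) =
    EVAL(s)                                              if CUTOFF-TEST(s, d)
    max_{a ∈ ACTIONS(s)} H-MINIMAX(RESULT(s, a), d + 1)  if PLAYER(s) = MAX
    min_{a ∈ ACTIONS(s)} H-MINIMAX(RESULT(s, a), d + 1)  if PLAYER(s) = MIN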
3.3.1 Evaluation functions


 Provide an estimate of the utility from a given position of a move.
 Provide a sort of short cut.
 Good evaluation functions are a must; bad ones lose you the game.
 Should order terminal states by desirability.
 The computation must take reasonable time.
 Should be “strongly correlated” to the actual chance of a win.
 Can’t examine everything, we’re cutting off some states. Introduces
uncertainty.
 Computational uncertainty, not random chance uncertainty.
 How do evaluation functions work?
 They calculate features of a state.
 Defines categories or equivalence classes of states.
 Example: all two-pawns-versus-one-pawn states.
 Each category will win some, lose some, draw some.
 Function figures out the ratio for each outcome.
 For two pawns versus one pawn, experience might give a 72% / 20% / 8% win : loss : draw ratio.
 Use this ratio to compute an expected value and order states based on it.
 Expected value: 0.72 × 1 + 0.20 × 0 + 0.08 × 1/2 = 0.76
 Still too slow. Instead, figure out values for contributions from features and add them
up. Called material value.
 Example: chess. Approximately:
 Pawns are worth 1, knights and bishops 3, rooks 5, queens 9.
 Situations could be valued too.
 Add them up to evaluate a position.
 Mathematically known as a weighted linear function.

Fig 3.8

 Not perfect: certain pieces gain power in the endgame relative to others (e.g., bishops on a sparse board).
 Nonlinear valuation schemes are therefore often used (e.g., a pair of bishops is worth a bit more than twice a single bishop, and a bishop is worth more in the endgame). A small sketch of a weighted linear evaluation follows.
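A minimal sketch of such a weighted linear evaluation, with illustrative chess-style material features (the feature names and weights here are assumptions, not a tuned function):

# EVAL(s) = sum_i w_i * f_i(s), with rough chess material values as weights.
WEIGHTS = {"pawns": 1, "knights": 3, "bishops": 3, "rooks": 5, "queens": 9}

def material_eval(features):
    """Score a position from MAX's point of view.

    features maps each feature name to (own count - opponent count),
    e.g. {"pawns": +1, "rooks": -1} means one pawn up but a rook down.
    """
    return sum(WEIGHTS[name] * value for name, value in features.items())

# One pawn up but a rook down: 1*1 + 5*(-1) = -4, a bad position for MAX.
assert material_eval({"pawns": 1, "rooks": -1}) == -4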
3.3.2 Cutting off search
 Time to end the search early.
 Replace a call to a TERMINAL-TEST function with:

if CUTOFF-TEST(state, depth) then return EVAL(state)

 Choose a depth d that allows for evaluation within the desired time frame.
 When time runs out, pick the best move.
 Not perfect, not a guarantee, just gives the best chance: counter moves may exist even for the highest-evaluated move.
 An improvement: apply EVAL only to quiescent positions, i.e., ones with no pending move (such as an imminent capture) that would wildly swing the evaluation.
 A quiescence search expands non-quiescent positions further until quiescent ones are reached, and evaluates only those.
 The horizon effect is more difficult to eliminate. It arises when the program is facing
an opponent’s move that causes serious damage and is ultimately unavoidable, but
can be temporarily avoided by the use of delaying tactics.
 Consider the chess position below: It is clear that there is no way for the black bishop
to escape.
 Example: the white rook can capture it by moving to h1, then a1, then a2; a capture at
depth 6 ply.

Fig 3.9

3.3.3 Forward pruning


 Another type of pruning eliminates moves without any consideration. Called forward
pruning.
 After all, most people consider only a few chess moves at a time.
 One type of forward pruning is the beam search.
 What? For each ply, look at only a beam of n moves, not all possible moves.
 Bad? Can be. No guarantee best move doesn’t get pruned.
 The PROBCUT algorithm uses statistics on previous games to guess which moves are
probably the safest to cut out.
 It does a shallow search to compute a backed-up value v for a node.
 It then uses statistics from previous games to estimate how likely a score of v at depth d is to lie outside the current search window, and prunes the node if it probably does.
 An Othello program built this way beat traditional alpha-beta programs most of the time.
 Combining all these techniques can get a chess game to look ahead up to 10 plies in
reasonable time.
 That’s expert levels of chess play.
 Need further tuning to reach grand master, including some special tables we’ll add….
3.3.4 Search vs lookup
 Overkill to look at entire tree for just the opening move.
 Good openings and endings have been known for awhile.
 For these situations, use a table to find the best move. (much faster).
 Tables work well for roughly the first 10 moves of a game, which have been studied extensively by humans.
 For end game, computer is better thinking about all the combinations possible
quickly.
 Closing in on the checkmate can take a human a lot of time to figure out.
 Computer computes a policy, a mapping from every possible state to the best move in
that state.
 Then, just look up that move instead of recomputing it over and over.
 Consider KBNK (king bishop knight king) scenario.
 Numbers:
 There are 462 ways to place the two kings on the board so that they are not adjacent.
 That leaves 62 squares for the bishop, 61 for the knight, and there are two choices for which player moves next.
 So, 462 × 62 × 61 × 2 = 3,494,568 possible positions. Some are checkmates; put them all in a table.
 From the table, perform retrograde search, which is a search through moves in
reverse.
 Look at all possibilities. Eventually, you get a guaranteed set of moves and a win for the KBNK side.
3.4 Monte Carlo Tree Search

 The game of Go illustrates two major weaknesses of heuristic alpha–beta tree search:
i. Go has a branching factor that starts at 361, which means alpha–beta search
would be limited to only 4 or 5 ply.
ii. It is difficult to define a good evaluation function for Go because material
value is not a strong indicator and most positions are in flux until the
endgame. In response to these two challenges, modern Go programs have
abandoned alpha–beta search and instead use a strategy called Monte Carlo
tree search (MCTS)
 The basic MCTS strategy does not use a heuristic evaluation function. Instead, the
value of a state is estimated as the average utility over a number of simulations of
complete games starting from the state.
 A simulation (also called a playout or rollout) chooses moves first for one player, then for the other, repeating until a terminal position is reached. At that point the rules of the game determine who has won or lost, and by what score.
 To get useful information from the playout we need a playout policy that biases the
moves towards good ones. For Go and other games, playout policies have been
successfully learned from self-play by using neural networks.
 Given a playout policy, we next need to decide two things:
i. from what positions do we start the playouts, and
ii. how many playouts do we allocate to each position?
 Pure Monte Carlo search does N simulations starting from the current state of the game, and tracks which of the possible moves from the current position has the highest win percentage.
 For some stochastic games this converges to optimal play as N increases, but for most games it is not sufficient; we need a selection policy that selectively focuses the computational resources on the important parts of the game tree.
 It balances two factors:
i. exploration of states that have had few playouts, and
ii. exploitation of states that have done well in past playouts, to get a more
accurate estimate of their value.
 Monte Carlo tree search does that by maintaining a search tree and growing it on each
iteration of the following four steps, as shown in Figure
SELECTION:
 Starting at the root of the search tree, we choose a move leading to a
successor node, and repeat that process, moving down the tree to a leaf.
 Figure 5.10(a) shows a search tree with the root representing a state where
white has just moved, and white has won 37 out of the 100 playouts done so
far.
 The thick arrow shows the selection of a move by black that leads to a node
where black has won 60/79 playouts. This is the best win percentage among
the three moves.
 Selection continues on to the leaf node marked 27/35.
EXPANSION:
 We grow the search tree by generating a new child of the selected node;
Figure 5.10(b) shows the new node marked with 0/0.
SIMULATION:
 We perform a playout from the newly generated child node, choosing moves
for both players according to the playout policy.
 These moves are not recorded in the search tree. In the figure, the simulation
results in a win for black.

BACK-PROPAGATION:
 We now use the result of the simulation to update all the search tree nodes
going up to the root.
 Since black won the playout, black nodes are incremented in both the number of wins and the number of playouts, so 27/35 becomes 28/36 and 60/79 becomes 61/80.
 Since white lost, the white nodes are incremented in the number of playouts
only, so 16/53 becomes 16/54 and the root 37/100 becomes 37/101.
 We repeat these four steps either for a set number of iterations, or until the allotted
time has expired, and then return the move with the highest number of playouts.
 One very effective selection policy is called “upper confidence bounds applied to
trees” or UCT. The policy ranks each possible move based on an upper confidence
bound formula called UCB1.
 For a node n, the formula is:
UCB1(n) = U(n) / N(n) + C × √( log N(PARENT(n)) / N(n) )
Where,
U(n) is the total utility of all playouts that went through node n,
N(n) is the number of playouts through node n,
PARENT(n) is the parent node of n in the tree, and
C is a constant that balances exploitation and exploration.
U(n)/N(n) is the exploitation term: the average utility of n.
 The term with the square root is the exploration term: it has the count N (n) in the
denominator, which means the term will be high for nodes that have only been
explored a few times.
 In the numerator it has the log of the number of times we have explored the parent of
n.

 The pseudo code shows the complete UCT MCTS algorithm. When the iterations
terminate, the move with the highest number of playouts is returned.
 The idea is that a win rate backed by many playouts is a more reliable estimate than the same rate backed by only a few playouts.
 With enough iterations, the UCB1 formula ensures that the node with the most playouts is almost always also the node with the highest win percentage (a small sketch of the selection computation follows).
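A small sketch of the UCB1 computation used by the selection step; the node fields and the exploration constant C = √2 are illustrative choices, not fixed by the algorithm:

import math

def ucb1(utility, playouts, parent_playouts, c=math.sqrt(2)):
    """UCB1 score of a node: exploitation term plus exploration term.

    utility / playouts grows with the observed win rate through this node;
    the square-root term grows for children that have been visited rarely.
    Unvisited nodes get +infinity so each child is tried at least once.
    """
    if playouts == 0:
        return math.inf
    exploit = utility / playouts
    explore = c * math.sqrt(math.log(parent_playouts) / playouts)
    return exploit + explore

def select_child(children, parent_playouts):
    """SELECTION step: pick the child with the highest UCB1 score."""
    return max(children, key=lambda ch: ucb1(ch["U"], ch["N"], parent_playouts))

# The three children of the 37/100 root in the figure (wins from black's view).
children = [{"U": 60, "N": 79}, {"U": 1, "N": 10}, {"U": 2, "N": 11}]
assert select_child(children, parent_playouts=100) == {"U": 60, "N": 79}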
Advantages of Monte Carlo Tree Search:

 MCTS is a simple algorithm to implement.

 It does not necessarily require any tactical knowledge about the game

 A general MCTS implementation can be reused for any number of games with little
modification

 Focuses on nodes with higher chances of winning the game

 Algorithm is very straightforward to implement

 MCTS supports asymmetric expansion of the search tree based on the circumstances
in which it is operating.

Disadvantages of Monte Carlo Tree Search:

 As the tree growth becomes rapid after a few iterations, it requires a huge amount of
memory.

 Computationally inefficient: when there are a large number of variables bound by different constraints, it requires a lot of time and computation to approximate a solution using this method.

3.5 STOCHASTIC GAMES


 Chess is a deterministic game.
 Games with random chance (dice rolls) are stochastic games.
 Example: backgammon.
 Black player knows where all the pieces are, but can’t know ahead of time where
white will move because of the random dice roll.
 Can’t make a standard game tree.
Fig 3.11

 Requires a tree containing chance nodes in addition to min and max nodes.
 They consider the possible dice rolls.
Fig 3.12

 Each chance node gets dictated by probabilities of the die rolls 1/36, 1/18, etc.
 Uncertainty. Only possible to calculate a position’s expected value: the average of all
possible outcomes of the chance nodes.
 So, generalize the deterministic game’s minimax value to an expectiminimax value
for games with chance nodes.
 Terminal, MIN, and MAX nodes stay the same.
 For chance nodes, sum the values of all outcomes, weighted by their probabilities:
EXPECTIMINIMAX(s) = Σ_r P(r) × EXPECTIMINIMAX(RESULT(s, r))   (at chance nodes)
where r is a dice roll, P(r) is its probability, and RESULT(s, r) is the state s with the added fact that the roll is r.
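A compact sketch of the expectiminimax backup, assuming chance nodes carry an explicit probability for each outcome (the tree encoding is illustrative):

def expectiminimax(node):
    """Back up values through a tree with MAX, MIN, and CHANCE nodes.

    A node is either a number (terminal utility) or a pair (kind, children):
      ("max", [child, ...])            MAX picks the best child,
      ("min", [child, ...])            MIN picks the worst child,
      ("chance", [(p, child), ...])    expected value over the outcomes.
    """
    if not isinstance(node, tuple):
        return node
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    # chance node: weighted average of the outcomes' values
    return sum(p * expectiminimax(c) for p, c in children)

# A fair coin decides which MIN node is reached: 0.5*2 + 0.5*4 = 3.
tree = ("max", [("chance", [(0.5, ("min", [2, 5])), (0.5, ("min", [4, 7]))])])
assert expectiminimax(tree) == 3.0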
3.5.1 Evaluation functions for games of chance
 Because of chance nodes, the meaning of evaluation values is a bit more dicey than in
deterministic games. Consider:
 Assigning values to the leaves has different outcomes (who knew?) [1, 2, 3, 4] leads
to taking a1, but [1, 20, 30, 400] leads to taking a2.
Fig 3.13

 Dealing with this involves a linear transformation of win probability of a position


 If the program knew all the dice rolls ahead of time, solving the game would be just like solving a deterministic game, with time O(b^m), where b is the branching factor and m is the maximum depth.
 Because expectiminimax must also consider all the chance nodes, the cost is O(b^m n^m), where n is the number of distinct dice rolls.
 For example: backgammon has a b of ~20 and an n of 21. Sometimes b can be as
much as 4000. 3 plies is about it.
 By putting bounds on possible utility function values, then something like alpha-beta
pruning can be done to improve performance.
 Example:
 If all utility values are between -2 and +2, the leaf node values are bounded.
 Then an upper bound can be placed on a chance node without looking at all
children.
 Alternative: Monte Carlo simulation.
 Evaluate a position by starting from it with an alpha-beta (or similar) search.
 Play many games against yourself, using random dice rolls.
 This provides a win percentage that can be used as a heuristic evaluation.
 Works well for backgammon.
 For games with dice, referred to as a rollout.
3.6 Partially Observable Games
 Games where certain aspects are unknown are games with partial observability.
 Games with the “fog of war” are examples.
Scouts, spies, feints, bluffs, etc. are possible.
3.6.1 Kriegspiel: partially observable chess
 Deterministic partially observable games keep the opponent choices a secret.
 Battleship, Stratego, Kriegspiel.
Kriegspiel rules:
 Black and white see only their pieces, with a referee conducting the game.
 Player tells ref about a move, ref resolves the move.
 Humans pull it off, computers can leverage belief states.
 Starting off, white’s belief state is a singleton, black hasn’t moved yet.
 After black’s move, white belief state can have 20 positions because black can
respond in 20 ways.
 Keeping track of the belief state is the problem of state estimation.
 Kriegspiel can be handled with the machinery for partially observable, nondeterministic search problems: belief states and AND-OR search.
 RESULT combines white's own (known) move with black's unpredictable reply.
 Strategy changes in partially-observable games:
 Moves are decided based on every possible percept sequence we could
get.
 Not on each move the opponent might make.
 With kriegspiel, a guaranteed checkmate is one in which each possible percept sequence leads to a checkmate, no matter what the opponent does.
 Opponent’s belief state doesn’t matter.
 Simplifies things a ton. Here’s a part of a guaranteed checkmate for King and Rook vs
a King situation.

Fig 3.14
 The general AND-OR search algorithm can be applied to the belief-state space to find
guaranteed checkmates.
 It finds midgame checkmates up to depth 9, which most humans can’t
do.
 In addition to guaranteed checkmates, kriegspiel admits probabilistic checkmates, which make no sense in fully observable games; they rely on randomized moves.
 By moving randomly, the white king eventually bumps into the black king.
 Black can’t keep guessing escape moves forever.
 In KBNK(King, Bishop, Knight vs King) endgame:
 White gives black infinite choices.
 Eventually black guesses wrong.
 This reveals black’s position.
 This ends in checkmate.
 Hard to find probabilistic checkmate with a reasonable depth, except endgame.
 Usually you get an accidental checkmate early on, where the random choices just
work out.
 So, how likely is a strategy to win? That depends on how likely each board state in the belief state is to be the true board state.
 Now, not all belief states are equally likely. Certain moves are more important than
others, skewing the probabilities.
 But, a player may want to avoid being predictable, skewing the probabilities even
more.
 So, to play optimally, some randomness has to be built into moves on the part of the
player.
 Leads to the idea of an equilibrium solution.
3.6.2 Card games
 Many examples of stochastic partial observability.
 Example:
 Randomly deal cards at game start.
 Cards hidden from other players.
 Bridge, poker, hearts, etc.
 Not exactly like dice, but suggests an algorithm:
 Solve all possible deals of the invisible cards as if fully observable.
 Then, pick the move whose value, averaged over all the deals, is best.
 Then, for every deal s with probability P(s), we can say the desired move is:
argmax_a Σ_s P(s) MINIMAX(RESULT(s, a))
 Number of deals can be huge, so solving all of them can be impossible.
 Instead, use a Monte Carlo approximation
o i.e., don’t add up all deals, take a random sample of N deals.
o Each deal s appears in the sample with probability P(s); then compute:
argmax_a (1/N) Σ_{i=1}^{N} MINIMAX(RESULT(s_i, a))
o The bigger the N, the better the approximation.
3.7 Constraint Satisfaction Problem
 A constraint satisfaction problem is one of the standard search problems where
instead of saying that state is a black box, we say that state is defined by variables and
values.
 Each state has a certain set of variables and each variable has a certain set of values
and a complete assignment to all the variables, creates a final state.
 A problem is solved when each variable has a value that satisfies all the constraints on
the variable. A problem described this way is called a constraint satisfaction
problem, or CSP.
 This is a simple example of a formal representation language and it allows for general
purpose algorithms with more power than standard search algorithms.
3.7.1 Defining Constraint Satisfaction Problems
 Formally, a CSP consists of three components:
i. X, a set of variables {X1, … , Xn};
ii. D, a set of domains {D1, … , Dn}, one for each variable; and
iii. C, a set of constraints that specify allowable combinations of values.
3.7.2 Example Problem: Map Coloring


Consider the map of Australia, and we try to solve the graph coloring problem or the map
coloring problem.
 Here, there are seven states in the map and given with three colors, red, blue and
green.
 The task is to color the map such that no two adjacent states have the same color.
 This is a very standard graph-theory problem; to pose it as a constraint satisfaction problem, we use seven variables, one for each state.
 Assign one variable for each state and the domains would be red, blue and green.
 These are the three colors that we are allowed to use for coloring each variable and
then the constraint would be that Western Australia cannot be equal to Northern
Territory and so on.
Fig 3.15

 A solution is a specific assignment to each variable such that all constraints are satisfied.
 It can be helpful to visualize a CSP as a constraint graph, as shown in Fig (b).
 In a constraint graph, each node is a variable and each edge determines whether
there is a constraint between those two variables or not.
 This kind of a constraint graph is a binary constraint graph, where each constraint
relates at most two variables and such a CSP are called binary CSPs.
 A state has many variables and we call them as state variables where, each state
variable is a node.
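A minimal Python encoding of this CSP (the variable abbreviations follow the usual Australia map; the dictionary-based representation is just one convenient choice):

# Variables, domains, and binary constraints of the Australia map-coloring CSP.
variables = ["WA", "NT", "SA", "Q", "NSW", "V", "T"]
domains = {v: {"red", "green", "blue"} for v in variables}

# Every pair of adjacent regions must receive different colors.
neighbors = [("WA", "NT"), ("WA", "SA"), ("NT", "SA"), ("NT", "Q"), ("SA", "Q"),
             ("SA", "NSW"), ("SA", "V"), ("Q", "NSW"), ("NSW", "V")]

def consistent(assignment):
    """True if no constraint is violated by the (possibly partial) assignment."""
    return all(assignment[a] != assignment[b]
               for a, b in neighbors
               if a in assignment and b in assignment)

# One valid solution: adjacent regions never share a color.
solution = {"WA": "red", "NT": "green", "SA": "blue", "Q": "red",
            "NSW": "green", "V": "red", "T": "red"}
assert consistent(solution)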
3.7.3 Variations on the CSP
i. Discrete variables
Finite domains
 The simplest kind of CSP involves variables that are discrete and have finite domains.
 Map coloring problems are of this kind.
 The 8-queens problem can also be viewed as a finite-domain CSP, where the variables Q1, Q2, …, Q8 are the positions of the queens in columns 1, …, 8, and each variable has the domain {1, 2, 3, 4, 5, 6, 7, 8}.
 If the maximum domain size of any variable in a CSP is d, then the number of possible complete assignments is O(d^n), that is, exponential in the number of variables n.
Infinite domains
 Discrete variables can also have infinite domains - for example,the set of integers or
the set of strings.
 With infinite domains, it is no longer possible to describe constraints by enumerating all allowed combinations of values. Instead, a constraint language is needed, such as StartJob1 + 5 ≤ StartJob3.
ii. Continuous variables
 CSPs with continuous domains are very common in real world.
 For example, in operation research field, the scheduling of experiments on the
Hubble Telescope requires very precise timing of observations; the start and finish
of each observation are continuous-valued variables that must obey a variety of
astronomical, precedence and power constraints.
 The best known category of continuous-domain CSPs is that of linear
programming problems, where the constraints must be linear inequalities
forming a convex region.
 Linear programming problems can be solved in time polynomial in the
number of variables.
Varieties of CSPs
 In addition to absolute constraints, whose violation rules out a potential solution, many CSPs include preference constraints indicating which solutions are preferred.
 Example: red is better than green.
 Preferences are often represented by a cost for each variable assignment; a CSP with such costs is called a constraint optimization problem.
Cryptarithmetic
 Another example is provided by cryptarithmetic puzzles
 Each letter in a cryptarithmetic puzzle represents a different digit.
 This requirement is represented as a single global Alldiff constraint over all the letters, for example Alldiff(F, T, U, W, R, O) in the classic puzzle TWO + TWO = FOUR.
Fig 3.16
 Each letter stands for a distinct digit; the aim is to find a substitution of digits for
letters such that the resulting sum is arithmetically correct, with the added restriction
that no leading zeros are allowed.
 The constraint hypergraph for the cryptarithmetic problem (Fig 3.16) shows the Alldiff constraint as well as the column addition constraints.
 Each constraint is a square box connected to the variables it contains.
3.8 Constraint Propagation
 A number of inference techniques use the constraints to infer which variable/value pairs are consistent and which are not. These include node, arc, path, and k-consistency.
constraint propagation: Using the constraints to reduce the number of legal values
for a variable, which in turn can reduce the legal values for another variable, and so
on.
local consistency: If we treat each variable as a node in a graph and each binary
constraint as an arc, then the process of enforcing local consistency in each part of the
graph causes inconsistent values to be eliminated throughout the graph.
 There are different types of local consistency:

3.8.1 Node consistency


 A single variable (a node in the CSP network) is node-consistent if all the values in
the variable’s domain satisfy the variable’s unary constraint.
 For example, in the variant of the Australia map-coloring problem where South
Australians dislike green, the variable starts with domain {red, green, blue} , and we
can make it node consistent by eliminating green, leaving SA with the reduced
domain {red, blue}.
 We say that a network is node-consistent if every variable in the network is node-
consistent.

3.8.2 Arc consistency


 A variable in a CSP is arc-consistent if every value in its domain satisfies the
variable’s binary constraints.
 Xi is arc-consistent with respect to another variable Xj if for every value in the current
domain Di there is some value in the domain Dj that satisfies the binary constraint on
the arc (Xi, Xj).
 A network is arc-consistent if every variable is arc-consistent with every other
variable.
 Arc consistency tightens down the domains (unary constraint) using the arcs (binary
constraints).
 AC-3 maintains a queue of arcs which initially contains all the arcs in the CSP.
 AC-3 then pops off an arbitrary arc (Xi, Xj) from the queue and makes Xi arc-
consistent with respect to Xj.
 If this leaves Di unchanged, just moves on to the next arc;
 But if this revises Di, then add to the queue all arcs (Xk, Xi) where Xk is a neighbor of
Xi.
 If Di is revised down to nothing, then the whole CSP has no consistent solution, return
failure;
 Otherwise, keep checking, trying to remove values from the domains of variables
until no more arcs are in the queue.
 The result is an arc-consistent CSP that have the same solutions as the original one
but have smaller domains.
 The complexity of AC-3: assume a CSP with n variables, each with domain size at most d, and with c binary constraints (arcs). Checking the consistency of one arc can be done in O(d²) time, so the total worst-case time is O(cd³).
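A compact sketch of AC-3, assuming the CSP is given as variable domains (sets), a neighbor map, and a binary-constraint predicate; the names are illustrative:

from collections import deque

def revise(domains, xi, xj, constraint):
    """Remove values of xi that have no supporting value in xj's domain."""
    removed = {vi for vi in domains[xi]
               if not any(constraint(xi, vi, xj, vj) for vj in domains[xj])}
    domains[xi] -= removed
    return bool(removed)

def ac3(domains, neighbors, constraint):
    """Enforce arc consistency in place; return False if some domain empties.

    domains:    dict variable -> set of candidate values
    neighbors:  dict variable -> iterable of constrained variables
    constraint: predicate constraint(xi, vi, xj, vj) -> bool
    """
    queue = deque((xi, xj) for xi in domains for xj in neighbors[xi])
    while queue:
        xi, xj = queue.popleft()
        if revise(domains, xi, xj, constraint):
            if not domains[xi]:
                return False                     # inconsistency detected
            for xk in neighbors[xi]:
                if xk != xj:
                    queue.append((xk, xi))       # recheck arcs pointing at xi
    return True

For the map-coloring CSP sketched earlier, the constraint predicate would simply be lambda xi, vi, xj, vj: vi != vj.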

3.8.3 Path consistency:


 A two-variable set {Xi, Xj} is path-consistent with respect to a third variable Xm if, for
every assignment {Xi = a, Xj = b} consistent with the constraint on {Xi, Xj}, there is
an assignment to Xm that satisfies the constraints on {Xi, Xm} and {Xm, Xj}.
 Path consistency tightens the binary constraints by using implicit constraints that are
inferred by looking at triples of variables.

3.8.4 K- consistency:
 K-consistency: A CSP is k-consistent if, for any set of k-1 variables and for any
consistent assignment to those variables, a consistent value can always be assigned to
any kth variable.
 1-consistency = node consistency; 2-consisency = arc consistency; 3-consistensy =
path consistency.
 A CSP is strongly k-consistent if it is k-consistent and is also (k - 1)-consistent,
(k – 2)-consistent, … all the way down to 1-consistent.
 If we take a CSP with n nodes and make it strongly n-consistent, we are guaranteed to find a solution in time O(n²d). But any algorithm for establishing n-consistency must take time exponential in n in the worst case, and also requires space exponential in n.
3.8.5 Global constraints
 A global constraint is one involving an arbitrary number of variables (but not
necessarily all variables). Global constraints can be handled by special-purpose
algorithms that are more efficient than general-purpose methods.

i) inconsistency detection for Alldiff constraints


 A simple algorithm: first remove any variable in the constraint that has a singleton domain, and delete that variable's value from the domains of the remaining variables. Repeat as long as there are singleton variables. If at any point an empty domain is produced, or there are more variables than domain values left, then an inconsistency has been detected.
 A simple consistency procedure for a higher-order constraint is sometimes more effective than applying arc consistency to an equivalent set of binary constraints.

ii) inconsistency detection for resource constraint (the atmost constraint)


 We can detect an inconsistency simply by checking the sum of the minimum of
the current domains;
e.g.
 Atmost(10, P1, P2, P3, P4): no more than 10 personnel are assigned in total.
If each variable has the domain {3, 4, 5, 6}, the Atmost constraint cannot be satisfied.
 We can enforce consistency by deleting the maximum value of any domain if it is not
consistent with the minimum values of the other domains.
e.g. If each variable in the example has the domain {2, 3, 4, 5, 6}, the values 5 and 6 can
be deleted from each domain.

iii) inconsistency detection for bounds consistent


 For large resource-limited problems with integer values, domains are represented by
upper and lower bounds and are managed by bounds propagation.
e.g.
 suppose there are two flights F1 and F2 in an airline-scheduling problem, for which the
planes have capacities 165 and 385, respectively. The initial domains for the numbers
of passengers on each flight are
D1 = [0, 165] and D2 = [0, 385].
 Now suppose we have the additional constraint that the two flight together must carry
420 people: F1 + F2 = 420. Propagating bounds constraints, we reduce the domains to
D1 = [35, 165] and D2 = [255, 385].
 A CSP is bounds consistent if for every variable X, and for both the lower-bound
and upper-bound values of X, there exists some value of Y that satisfies the constraint
between X and Y for every variable Y.

3.8.6 Sudoku
 A Sudoku board consists of 81 squares, some of which are initially filled with digits from 1 to 9. The puzzle is to fill in all the remaining squares such that no digit appears twice in any row, column, or box. A row, column, or 3×3 box is called a unit.
Fig 3.17

 A Sudoku puzzle can be considered a CSP with 81 variables, one for each square. We
use the variable names A1 through A9 for the top row (left to right), down to I1
through I9 for the bottom row. The empty squares have the domain {1, 2, 3, 4, 5, 6, 7,
8, 9} and the pre-filled squares have a domain consisting of a single value.
 There are 27 different Alldiff constraints: one for each row, column, and box of 9
squares:
Alldiff(A1, A2, A3, A4, A5, A6, A7, A8, A9)
Alldiff(B1, B2, B3, B4, B5, B6, B7, B8, B9)

Alldiff(A1, B1, C1, D1, E1, F1, G1, H1, I1)
Alldiff(A2, B2, C2, D2, E2, F2, G2, H2, I2)

Alldiff(A1, A2, A3, B1, B2, B3, C1, C2, C3)
Alldiff(A4, A5, A6, B4, B5, B6, C4, C5, C6)

3.9 Backtracking search for CSPs


 Backtracking search, a form of depth-first search, is commonly used for solving
CSPs. Inference can be interwoven with search.
Commutativity:
 CSPs are all commutative. A problem is commutative if the order of application of
any given set of actions has no effect on the outcome.
Backtracking search:
 A depth-first search that chooses values for one variable at a time and backtracks
when a variable has no legal values left to assign.
 Backtracking algorithm repeatedly chooses an unassigned variable, and then tries all
values in the domain of that variable in turn, trying to find a solution. If an
inconsistency is detected, then BACKTRACK returns failure, causing the previous
call to try another value.
 There is no need to supply BACKTRACKING-SEARCH with a domain-specific
initial state, action function, transition model, or goal test.
 BACKTRACKING-SEARCH keeps only a single representation of a state and alters that representation rather than creating new ones.

Fig 3.18
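A minimal Python sketch of this backtracking scheme (for brevity it picks the first unassigned variable rather than using the MRV heuristic discussed below; the names are illustrative and the consistency predicate matches the map-coloring encoding sketched earlier):

def backtracking_search(variables, domains, consistent):
    """Depth-first backtracking search for a CSP; returns an assignment or None."""
    def backtrack(assignment):
        if len(assignment) == len(variables):
            return assignment                    # every variable assigned
        var = next(v for v in variables if v not in assignment)
        for value in domains[var]:
            assignment[var] = value
            if consistent(assignment):
                result = backtrack(assignment)
                if result is not None:
                    return result
            del assignment[var]                  # undo and try another value
        return None                              # no value worked: backtrack
    return backtrack({})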

To solve CSPs efficiently without domain-specific knowledge, address following questions:


i) function SELECT-UNASSIGNED-VARIABLE: which variable should be
assigned next?
Function ORDER-DOMAIN-VALUES: in what order should its values be tried?
ii) function INFERENCE : what inferences should be performed at each step in the
search?
iii) When the search arrives at an assignment that violates a constraint, can the search
avoid repeating this failure?

3.9.1Variable and value ordering


 The backtracking algorithm contains the line var ← SELECT-UNASSIGNED-VARIABLE(csp).
 Variable selection: fail first.
Minimum-remaining-values (MRV) heuristic:
 The idea of choosing the variable with the fewest “legal” value. A.k.a. “most
constrained variable” or “fail-first” heuristic, it picks a variable that is most likely
to cause a failure soon thereby pruning the search tree.
 If some variable X has no legal values left, the MRV heuristic will select X and
failure will be detected immediately—avoiding pointless searches through other
variables.
 E.g. After the assignment for WA=red and NT=green, there is only one possible
value for SA, so it makes sense to assign SA=blue next rather than assigning Q.

Degree heuristic:
 The degree heuristic attempts to reduce the branching factor on future choices by
selecting the variable that is involved in the largest number of constraints on other
unassigned variables. [useful tie-breaker]
 E.g. SA is the variable with highest degree 5; the other variables have degree 2 or 3; T
has degree 0.
 ORDER-DOMAIN-VALUES: value selection, fail last.
 If we are trying to find all the solution to a problem (not just the first one), then the
ordering does not matter.
 Least-constraining-value heuristic: prefers the value that rules out the fewest choice
for the neighboring variables in the constraint graph. (Try to leave the maximum
flexibility for subsequent variable assignments.)
 e.g. We have generated the partial assignment with WA=red and NT=green and that
our next choice is for Q. Blue would be a bad choice because it eliminates the last
legal value left for Q’s neighbor, SA, therefore prefers red to blue.
 The minimum-remaining-values and degree heuristic are domain-independent
methods for deciding which variable to choose next in a backtracking search.
The least-constraining-value heuristic helps in deciding which value to try first for a
given variable.
3.9.2 Interleaving search and inference
 INFERENCE can be performed every time we make a choice of a value for a variable.
 One of the simplest forms of inference is called forward checking. Whenever a
variable X is assigned, the forward-checking process establishes arc consistency for it:
for each unassigned variable Y that is connected to X by a constraint, delete from Y’s
domain any value that is inconsistent with the value chosen for X.
 There is no reason to do forward checking if we have already done arc consistency as
a preprocessing step.
Fig 3.19

 Advantage: For many problems the search will be more effective if we combine the
MRV heuristic with forward checking.
 Disadvantage: Forward checking only makes the current variable arc-consistent, but
doesn’t look ahead and make all the other variables arc-consistent.

MAC (Maintaining Arc Consistency) algorithm:


 [More powerful than forward checking, detect this inconsistency.] After a variable
Xi is assigned a value, the INFERENCE procedure calls AC-3, but instead of a queue
of all arcs in the CSP, we start with only the arcs(Xj, Xi) for all Xj that are unassigned
variables that are neighbors of Xi.
 From there, AC-3 does constraint propagation in the usual way, and if any variable
has its domain reduced to the empty set, the call to AC-3 fails and we know to
backtrack immediately.
 Chronological backtracking: BACKTRACKING-SEARCH has a simple policy; when a branch of the search fails, back up to the preceding variable and try a different value for it.
e.g.
 Suppose we have generated the partial assignment {Q=red, NSW=green, V=blue,
T=red}.
 When we try the next variable SA, we see every value violates a constraint.
 We back up to T and try a new color, it cannot resolve the problem.
3.9.3 Intelligent backtracking:
Backtrack to a variable that was responsible for making one of the possible values of the next
variable (e.g. SA) impossible.
 Conflict set for a variable: A set of assignments that are in conflict with some value
for that variable.
(e.g. The set {Q=red, NSW=green, V=blue} is the conflict set for SA.)
 backjumping method: Backtracks to the most recent assignment in the conflict set.
(e.g. backjumping would jump over T and try a new value for V.)

 Forward checking can supply the conflict set with no extra work.
 Whenever forward checking based on an assignment X=x deletes a value from Y’s
domain, add X=x to Y’s conflict set;
 If the last value is deleted from Y’s domain, the assignment in the conflict set of Y are
added to the conflict set of X.
 In fact,every branch pruned by backjumping is also pruned by forward checking.
Hence simple backjumping is redundant in a forward-checking search or in a search
that uses stronger consistency checking (such as MAC).

Conflict-directed backjumping:
e.g.
 consider the partial assignment which is proved to be inconsistent: {WA=red,
NSW=red}.
 We try T=red next and then assign NT, Q, V, SA, no assignment can work for these
last 4 variables.
 Eventually we run out of values to try at NT, but simple backjumping cannot work because NT doesn't have a complete conflict set of preceding variables that caused it to fail.
 The set {WA, NSW} is a deeper notion of the conflict set for NT, caused NT together
with any subsequent variables to have no consistent solution. So the algorithm should
backtrack to NSW and skip over T.
 A backjumping algorithm that uses conflict sets defined in this way is called conflict-
direct backjumping.
How to Compute:
 When a variable’s domain becomes empty, the “terminal” failure occurs, that variable
has a standard conflict set.
 Let Xj be the current variable, let conf(Xj) be its conflict set. If every possible value
for Xj fails, backjump to the most recent variable Xi in conf(Xj), and set
conf(Xi) ← conf(Xi)∪conf(Xj) – {Xi}.
 The conflict set for a variable means that there is no solution from that variable onward, given the preceding assignment to the conflict set.
e.g.
assign WA, NSW, T, NT, Q, V, SA.
SA fails, and its conflict set is {WA, NT, Q}. (standard conflict set)
Backjump to Q, its conflict set is {NT, NSW}∪{WA,NT,Q}-{Q} = {WA, NT, NSW}.
Backtrack to NT, its conflict set is {WA}∪{WA,NT,NSW}-{NT} = {WA, NSW}.
Hence the algorithm backjump to NSW. (over T)

 After backjumping from a contradiction, how to avoid running into the same problem
again:

3.9.4 Constraint learning:

The idea of finding a minimum set of variables from the conflict set that causes the problem.
This set of variables, along with their corresponding values, is called a no-good. We then
record the no-good, either by adding a new constraint to the CSP or by keeping a separate
cache of no-goods.
 Backtracking occurs when no legal assignment can be found for a variable.
 A backjumping algorithm that uses conflict sets defined in this way is called
conflict-directed backjumping.
 Conflict-directed backjumping backtracks directly to the source of the problem.
3.10 Local search for CSPs

 Local search algorithms turn out to be very effective in solving many CSPs. They use
a complete-state formulation, where each state assigns a value to every variable, and
the search changes the value of one variable at a time.
 As an example, we’ll use the 8-queens problem, as defined as a CSP. In Figure, we
start on the left with a complete assignment to the 8 variables; typically this will
violate several constraints.
 We then randomly choose a conflicted variable, which turns out to be Q8, the rightmost column.

Fig 3.20

 The min-conflicts heuristic: In choosing a new value for a variable, select the value
that results in the minimum number of conflicts with other variables.
 In the above figure we see there are two rows that only violate one constraint; we pick
Q8=3 (that is, we move the queen to the 8th column, 3rd row).
 On the next iteration, in the middle board of the figure, we select Q6 as the variable to
change, and note that moving the queen to the 8th row results in no conflicts.
 At this point there are no more conflicted variables, so we have a solution. The
algorithm is shown in Figure 3.21.
 The landscape of a CSP under the min-conflicts heuristic usually has a series of plateaus. Simulated annealing and plateau search (i.e., allowing sideways moves to another state with the same score) can help local search find its way off a plateau. (A small min-conflicts sketch for n-queens appears at the end of this section.)
 This wandering on the plateau can be directed with tabu search: keeping a small list
of recently visited states and forbidding the algorithm to return to those states.
 Constraint weighting: a technique that can help concentrate the search on the
important constraints.
 Each constraint is given a numeric weight Wi, initially all 1.
 At each step, the algorithm chooses a variable/value pair to change that will result in
the lowest total weight of all violated constraints.
Fig 3.21

 The weights are then adjusted by incrementing the weight of each constraint that is
violated by the current assignment.
 Local search can be used in an online setting when the problem changes, this is
particularly important in scheduling problems.
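A minimal min-conflicts sketch for n-queens (the state encoding and step limit are illustrative; there is no guarantee of success within max_steps):

import random

def min_conflicts(n=8, max_steps=10_000):
    """Solve n-queens by local search with the min-conflicts heuristic.

    State: columns[c] = row of the queen in column c (a complete assignment).
    Each step picks a conflicted column and moves its queen to the row that
    minimizes the number of conflicts with the other queens.
    """
    def conflicts(columns, col, row):
        return sum(1 for c in range(n) if c != col and
                   (columns[c] == row or abs(columns[c] - row) == abs(c - col)))

    columns = [random.randrange(n) for _ in range(n)]    # random complete state
    for _ in range(max_steps):
        conflicted = [c for c in range(n) if conflicts(columns, c, columns[c])]
        if not conflicted:
            return columns                               # solution found
        col = random.choice(conflicted)
        columns[col] = min(range(n), key=lambda r: conflicts(columns, col, r))
    return None                                          # give up after max_steps

print(min_conflicts(8))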

3.11 The Structure of Problems


 We examine ways in which the structure of the problem, as represented by the
constraint graph, can be used to find solutions quickly.
 From CSP, we represent constraint graph

Fig 3.22

DAC & Topological sort


 A constraint graph is a tree when any two variables are connected by only one path.
 Any tree-structured CSP can be solved in linear time in the number of variables.
 A CSP is defined to be directed arc-consistent (DAC) under an ordering of variables
X 1 , X 2 , … X n if and only if every X i is arc-consistent with each X j for j>i.
 An ordering of the variables such that each variable appears after its parent in the tree.
Such an ordering is called a topological sort.
 We have a directed arc-consistent (DAC) graph, we can just march down the list of
variables and choose any remaining value. Since each link from a parent to its child is
arc-consistent, we know that for any value we choose for the parent, there will be a
valid value left to choose for the child.
Two ways to reduce constraint graphs to trees
3.11.1 Cutset conditioning
 The general algorithm is as follows:
 Choose a subset S of the CSP’s variables such that the constraint graph
becomes a tree after removal of S. S is called a cycle cutset.
 For each possible assignment to the variables in S that satisfies all
constraints on S,
a) remove from the domains of the remaining variables any
values that are inconsistent with the assignment for S, and
b) If the remaining CSP has a solution, return it together with
the assignment for S.

Fig 3.23
3.11.2 Tree Decomposition
 A tree decomposition must satisfy the following three requirements:
i. Every variable in the original problem appears in at least one of the subproblems.
ii. If two variables are connected by a constraint in the original problem, they must appear together (along with the constraint) in at least one of the subproblems.
iii. If a variable appears in two subproblems in a tree, it must appear in every subproblem along the path connecting those subproblems.

Fig 3.24

3.11.3 Value symmetry


 Consider the map-coloring problem with d colors. For every consistent solution, there is actually a set of d! solutions formed by permuting the color names.
 For example, on the Australia map we know that WA,NT, and SA must all
have different colors, but there are 3!=6 ways to assign three colors to three
regions.
 This is called value symmetry.
