AI unit 3
Game theory
Game theory is used in various fields to lay out various situations and
predict their most likely outcomes. Businesses may use it, for example, to set
prices, decide whether to acquire another firm, and determine how to handle a
lawsuit.
KEY TAKEAWAYS
Here are a few terms commonly used in the study of game theory:
Game: Any set of circumstances that has a result dependent on the actions
of two or more decision-makers (players).
Player: A strategic decision-maker within the context of the game.
Strategy: A complete plan of action a player will take given the set of
circumstances that might arise within the game.
Payoff: The payout a player receives from arriving at a particular
outcome. The payout can be in any quantifiable form, from dollars
to utility.
Information set: The information available at a given point in the game.
The term information set is most often applied when the game has a
sequential component.
Equilibrium: The point in a game where both players have made their
decisions and an outcome is reached.
In most cases, the Nash equilibrium is reached over time. Once it is reached, however, no player has an incentive to deviate from it unilaterally. After we learn how to find the Nash equilibrium, take a look at how a unilateral move would affect the situation. Does it make any sense? It shouldn't, and that's why the Nash equilibrium is described as "no regrets."
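To make the "no regrets" idea concrete, here is a minimal sketch that checks whether a cell of a two-player payoff table is a Nash equilibrium by testing every unilateral deviation. The payoff numbers and the is_nash helper are illustrative assumptions, not taken from these notes.

    # Minimal sketch: test whether a cell of a 2x2 game is a Nash equilibrium.
    # The payoff matrices below are illustrative (prisoner's-dilemma-like), not from the notes.
    payoff_A = [[3, 0],   # row player's payoffs
                [5, 1]]
    payoff_B = [[3, 5],   # column player's payoffs
                [0, 1]]

    def is_nash(row, col):
        # Row player must not gain by switching rows unilaterally.
        if any(payoff_A[r][col] > payoff_A[row][col] for r in range(2)):
            return False
        # Column player must not gain by switching columns unilaterally.
        if any(payoff_B[row][c] > payoff_B[row][col] for c in range(2)):
            return False
        return True

    print([(r, c) for r in range(2) for c in range(2) if is_nash(r, c)])  # -> [(1, 1)]

For these payoffs, the only cell that survives the unilateral-deviation test is (1, 1): neither player regrets their choice given the other's.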
Economics
Business
Game theory in business may most resemble a game tree as shown below.
A company may start in position one and must decide on two outcomes. However,
there are continually other decisions to be made; the final payoff amount is not
known until the final decision has been processed.
Project Management
When dealing with an internal team, game theory may be less prevalent as
all participants working for the same employer often have a greater shared
interest in success. However, third-party consultants or external parties assisting
with a project may be incentivized by other means separate from the project's
success.
The strategy of Black Friday shopping is at the heart of game theory. The
concept holds that should companies reduce prices, more consumers will buy
more goods. The relationship between a consumer, a good, and the financial
exchange to transfer ownership plays a major part in game theory as each
consumer has a different set of expectations.
When there is a direct conflict between multiple parties striving for the
same outcome, it is often called a zero-sum game. This means that for every
winner, there is a loser. Alternatively, it means that the collective net benefit
received is equal to the collective net benefit lost. Lots of sporting events are a
zero-sum game as one team wins and another team loses.
Game theory can begin and end in a single instance. Like much of life, the
underlying competition starts, progresses, ends, and cannot be redone. This is
often the case with equity traders, who must wisely choose their entry point and
exit point as their decision may not easily be undone or retried.
On the other hand, some repeated games continue on and seemingly never
end. These types of games often contain the same participants each time, and
each party has the knowledge of what occurred last time. For example, consider
rival companies trying to price their goods. Whenever one makes a price
adjustment, so may the other. This circular competition repeats itself across
product cycles or sale seasonality.
There are several "games" that game theory analyzes. Below, we will
briefly describe a few of these.
"Tit for tat" is said to be the optimal strategy in a prisoner's dilemma. Tit
for tat was introduced by Anatol Rapoport, who developed a strategy in which
each participant in an iterated prisoner's dilemma follows a course of action
consistent with their opponent's previous turn. For example, if provoked, a player
subsequently responds with retaliation; if unprovoked, the player cooperates.
The image below depicts the dilemma where the choice of the participant
in the column and the choice of the participant in the row may clash. For example,
both parties may receive the most favorable outcome if both choose row/column
1. However, each faces the risk of strong adverse outcomes should the other party
not choose the same outcome.
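As an illustration of the tit-for-tat strategy described above, the sketch below plays an iterated prisoner's dilemma in which one player simply copies the opponent's previous move. The payoff values and the always-defecting opponent are illustrative assumptions, not taken from these notes.

    # Iterated prisoner's dilemma with a tit-for-tat player (illustrative payoffs).
    # 'C' = cooperate, 'D' = defect; payoffs are (player A, player B).
    PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
              ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

    def tit_for_tat(opponent_history):
        # Cooperate first, then repeat whatever the opponent did last round.
        return 'C' if not opponent_history else opponent_history[-1]

    def always_defect(opponent_history):
        return 'D'

    def play(rounds=10):
        score_a = score_b = 0
        hist_a, hist_b = [], []          # each player's past moves
        for _ in range(rounds):
            a = tit_for_tat(hist_b)      # tit-for-tat reacts to B's history
            b = always_defect(hist_a)
            pa, pb = PAYOFF[(a, b)]
            score_a, score_b = score_a + pa, score_b + pb
            hist_a.append(a)
            hist_b.append(b)
        return score_a, score_b

    print(play())   # tit-for-tat concedes only the first round, then retaliates every round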
Dictator Game
This is a simple game in which Player A must decide how to split a cash
prize with Player B, who has no input into Player A’s decision. While this is not
a game theory strategy per se, it does provide some interesting insights into
people’s behavior. Experiments reveal about 50% keep all the money to
themselves, 5% split it equally, and the other 45% give the other participant a
smaller share.
Volunteer’s Dilemma
In a volunteer’s dilemma, someone must undertake a chore or task for the common good. The worst possible outcome is realized if nobody volunteers.
The Centipede Game
The centipede game concludes as soon as a player takes the stash, with that player getting the larger portion and the other player getting the smaller portion. The game has a pre-defined total number of rounds, which are known to each player in advance.
Game theory participants can decide between a few primary ways to play
their game. In general, each participant must decide what level of risk they are
willing to take and how far they are willing to go to pursue the best possible
outcome.
Maximax Strategy
Maximin Strategy
Dominant Strategy
Pure Strategy
Mixed Strategy
A mixed strategy may seem like random chance, but there is much thought
that must go into devising a plan of mixing elements or actions. Consider the
relationship between a baseball pitcher and batter. The pitcher cannot throw the
same pitch each time; otherwise, the batter could predict what would come next.
Instead, the pitcher must mix their strategy from pitch to pitch to create a sense of unpredictability from which they hope to benefit.
The biggest issue with game theory is that, like most other economic
models, it relies on the assumption that people are rational actors that are self-
interested and utility-maximizing. Of course, we are social beings who do
cooperate often at our own expense. Game theory cannot account for the fact that
in some situations we may fall into a Nash equilibrium, and other times not,
depending on the social context and who the players are.
Games are usually intriguing because they are difficult to solve. Chess,
for example, has an average branching factor of around 35, and games
frequently stretch to 50 moves per player, therefore the search tree has roughly 35^100 or 10^154 nodes (despite the search graph having “only” about 10^40
unique nodes). As a result, games, like the real world, necessitate the ability to
make some sort of decision even when calculating the best option is impossible.
Let us start with games with two players, whom we’ll refer to as MAX and MIN
for obvious reasons. MAX is the first to move, and then they take turns until the
game is finished. At the conclusion of the game, the victorious player receives
points, while the loser receives penalties. A game can be formalized as a type of
search problem that has the following elements:
S0: The initial state of the game, which describes how it is set up at the start.
Player(s): Defines which player in a state has the move.
Actions(s): Returns a state’s set of legal moves.
Result(s, a): A transition model that defines a move’s outcome.
Terminal-Test(s): A terminal test that returns true if the game is over but
false otherwise. Terminal states are those in which the game has come to a
conclusion.
Utility(s, p): A utility function (also known as a payout function or objective function) determines the final numeric value for a game that concludes in the terminal state s for player p. The result in chess is a win, a loss, or a draw, with values of +1, 0, or 1/2. Backgammon’s payoffs range from 0 to +192, but certain games have a greater range of possible outcomes. A zero-sum game is defined (confusingly) as one in which the total reward to all players is the same for each game instance. Chess is a zero-sum game because each game has a payoff of 0 + 1, 1 + 0, or 1/2 + 1/2. “Constant-sum” would have been a preferable name, but zero-sum is the usual term and makes sense if each participant is charged an entry fee of 1/2.
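A minimal sketch of how these elements can be mapped onto code is shown below; the class and method names are illustrative, chosen to mirror the formal elements above rather than taken from any particular library.

    # Illustrative skeleton of the game formalization above (names are assumptions).
    class Game:
        def initial_state(self):         # S0: how the game is set up at the start
            raise NotImplementedError
        def player(self, s):             # Player(s): which player has the move in s
            raise NotImplementedError
        def actions(self, s):            # Actions(s): legal moves in state s
            raise NotImplementedError
        def result(self, s, a):          # Result(s, a): transition model
            raise NotImplementedError
        def terminal_test(self, s):      # Terminal-Test(s): is the game over?
            raise NotImplementedError
        def utility(self, s, p):         # Utility(s, p): final value for player p
            raise NotImplementedError

The later code sketches in this unit assume this interface.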
The game tree for the game is defined by the beginning state, ACTIONS
function, and RESULT function—a tree in which the nodes are game states and
the edges represent movements. The figure below depicts a portion of the tic-
tac-toe game tree (noughts and crosses). MAX may make nine different
maneuvers from his starting position. The game alternates between MAX placing an X and MIN placing an O until we reach leaf nodes corresponding to
terminal states, such as one player having three in a row or all of the squares
being filled. The utility value of the terminal state from the perspective of MAX
is shown by the number on each leaf node; high values are thought to be
beneficial for MAX and bad for MIN.
The game tree for tic-tac-toe is relatively small, with fewer than 9! = 362,880 terminal nodes. However, because there are over 10^40 nodes in chess, the game
tree is better viewed as a theoretical construct that cannot be realized in the
actual world. But, no matter how big the game tree is, MAX’s goal is to find a
solid move. A tree that is superimposed on the whole game tree and examines
enough nodes to allow a player to identify what move to make is referred to as
a search tree.
We’ll move to the trivial game in the figure below since even a simple
game like tic-tac-toe is too complex for us to draw the full game tree on one
page. MAX’s root node moves are designated by the letters a1, a2, and a3.
MIN’s probable answers to a1 are b1, b2, b3, and so on. This game is over after
MAX and MIN each make one move. (In game terms, this tree is one move deep, consisting of two half-moves, each of which is referred to as a ply.) The
terminal states in this game have utility values ranging from 2 to 14.
Given a game tree, the optimal strategy can be determined from the minimax value of each node, which we write as MINIMAX(n). The minimax value of a node is the utility (for MAX) of being in the corresponding state, assuming that both players play optimally from there to the end of the game. The minimax value of a terminal state is just its utility. Furthermore, given the option, MAX prefers to move to a state of maximum value, whereas MIN prefers a state of minimum value. So here’s what we’ve got:
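The standard recursive definition, using the functions introduced earlier, is:
MINIMAX(s) =
UTILITY(s), if TERMINAL-TEST(s) is true
max over a in ACTIONS(s) of MINIMAX(RESULT(s, a)), if PLAYER(s) = MAX
min over a in ACTIONS(s) of MINIMAX(RESULT(s, a)), if PLAYER(s) = MIN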
Let’s use these definitions to analyze the game tree shown in the figure
above. The game’s UTILITY function provides utility values to the terminal
nodes on the bottom level. Because the first MIN node, B, has three successor
states with values of 3, 12, and 8, its minimax value is 3. Similarly, the other two MIN nodes have a minimax value of 2. The root node is a MAX node; its successors have minimax values of 3, 2, and 2, so its own minimax value is 3. We can also identify the minimax decision at the root: action a1 is the best option for MAX since it leads to the state with the highest minimax value.
This concept of optimal MAX play requires that MIN plays optimally as
well—it maximizes MAX’s worst-case outcome. What happens if MIN isn’t
performing at its best? Then it’s a simple matter of demonstrating that MAX can
perform even better. Other strategies may outperform the minimax method against suboptimal opponents, but they will necessarily do worse against opponents who play optimally.
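A compact sketch of the minimax computation is given below. It assumes the illustrative Game interface sketched earlier (player, actions, result, terminal_test, utility); those method names are assumptions, not a fixed API.

    # Minimax value of a state, from the point of view of `player` (illustrative Game interface).
    def minimax_value(game, state, player):
        if game.terminal_test(state):
            return game.utility(state, player)
        values = [minimax_value(game, game.result(state, a), player)
                  for a in game.actions(state)]
        # The player to move picks the largest value; the opponent picks the smallest.
        return max(values) if game.player(state) == player else min(values)

    def minimax_decision(game, state):
        player = game.player(state)
        # Choose the action leading to the successor with the highest minimax value.
        return max(game.actions(state),
                   key=lambda a: minimax_value(game, game.result(state, a), player))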
Alpha-beta search
Alpha-beta pruning keeps track of two parameters:
1. Alpha: The best (highest-value) choice we have found so far at any point along the path of Maximizer. The initial value of alpha is -∞.
2. Beta: The best (lowest-value) choice we have found so far at any point along the path of Minimizer. The initial value of beta is +∞.
o Alpha-beta pruning applied to a standard minimax algorithm returns the same move as the standard algorithm does, but it removes all the nodes that do not really affect the final decision and only make the algorithm slow. Hence, by pruning these nodes, it makes the algorithm fast.
The main condition required for alpha-beta pruning is: α >= β
Step 1: At the first step, the Max player starts from the root node A, where α = -∞ and β = +∞. These values of alpha and beta are passed down to node B, and node B passes the same values to its child node D.
Step 2: At node D, the value of α will be calculated, as it is Max's turn. The value of α is compared first with 2 and then with 3, and max(2, 3) = 3 will be the value of α at node D; the node value will also be 3.
Step 3: Now the algorithm backtracks to node B, where the value of β will change, as this is Min's turn. Now β = +∞ is compared with the available subsequent node's value, i.e. min(∞, 3) = 3; hence at node B now α = -∞ and β = 3.
In the next step, the algorithm traverses the next successor of node B, which is node E, and the values α = -∞ and β = 3 will also be passed.
Step 4: At node E, Max will take its turn, and the value of alpha will change. The current value of alpha will be compared with 5, so max(-∞, 5) = 5; hence at node E α = 5 and β = 3. Since α >= β, the right successor of E will be pruned, and the algorithm will not traverse it; the value at node E will be 5.
Step 5: In the next step, the algorithm again backtracks the tree, from node B to node A. At node A, the value of alpha will be changed; the maximum available value is 3, as max(-∞, 3) = 3, and β = +∞. These two values are now passed to the right successor of A, which is node C.
At node C, α = 3 and β = +∞, and the same values will be passed on to node F.
Step 6: At node F, the value of α will again be compared with the left child, which is 0, so max(3, 0) = 3, and then with the right child, which is 1, so max(3, 1) = 3. α remains 3, but the node value of F becomes 1.
Step 7: Node F returns the node value 1 to node C. At C, α = 3 and β = +∞; here the value of beta will be changed, as it is compared with 1, so min(∞, 1) = 1. Now at C, α = 3 and β = 1, and again the condition α >= β is satisfied, so the next child of C, which is G, will be pruned, and the algorithm will not compute the entire sub-tree G.
Step 8: C now returns the value 1 to A. Here the best value for A is max(3, 1) = 3. The final game tree shows the nodes that were computed and the nodes that were never computed. Hence the optimal value for the maximizer is 3 for this example.
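A sketch of alpha-beta pruning over the same illustrative Game interface used earlier is shown below; it follows the general algorithm rather than the specific worked tree above.

    import math

    # Alpha-beta pruning sketch (assumes the illustrative Game interface above).
    def alphabeta_decision(game, state):
        player = game.player(state)

        def max_value(s, alpha, beta):
            if game.terminal_test(s):
                return game.utility(s, player)
            v = -math.inf
            for a in game.actions(s):
                v = max(v, min_value(game.result(s, a), alpha, beta))
                if v >= beta:        # MIN will never let play reach this branch
                    return v          # prune the remaining successors
                alpha = max(alpha, v)
            return v

        def min_value(s, alpha, beta):
            if game.terminal_test(s):
                return game.utility(s, player)
            v = math.inf
            for a in game.actions(s):
                v = min(v, max_value(game.result(s, a), alpha, beta))
                if v <= alpha:       # MAX already has a better option elsewhere
                    return v          # prune the remaining successors
                beta = min(beta, v)
            return v

        # Pick the root action whose MIN-reply value is largest.
        return max(game.actions(state),
                   key=lambda a: min_value(game.result(state, a), -math.inf, math.inf))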
Monte Carlo Tree Search (MCTS) is a heuristic search algorithm that has gained significant attention and popularity within the field of artificial intelligence, especially in the area of decision-making and game playing. It is known for its ability to effectively handle complex and strategic games with massive search spaces, in which traditional algorithms may struggle due to the huge number of feasible moves or actions.
MCTS combines the principles of Monte Carlo methods, which rely upon random sampling and statistical evaluation, with tree-based search techniques. Unlike traditional search algorithms that rely upon exhaustive exploration of the entire search space, MCTS focuses on sampling and exploring only promising areas of the search space.
The core idea behind MCTS is to build a search tree incrementally by simulating multiple random plays (often known as rollouts or playouts) from the current game state. These simulations are carried out until a terminal state or a predefined depth is reached. The results of these simulations are then backpropagated up the tree, updating the statistics of the nodes visited during the play, including the number of visits and the win ratios.
1. Handling Complex and Strategic Games: MCTS excels in games with large search spaces. Although it is best known for game playing, its principles and techniques are applicable to other problem domains as well. MCTS has been successfully applied to planning problems, scheduling, optimization, and decision-making in various real-world scenarios. Its ability to handle complex decision-making and uncertainty makes it valuable in a range of applications.
7. Domain Independence: MCTS is relatively domain-independent. It does not require domain-specific knowledge or handcrafted evaluation functions; apart from the rules and end conditions of the game, nothing else needs to be supplied.
Selection: In this process, the MCTS algorithm traverses the current tree
from the root node using a specific strategy. The strategy uses an evaluation
function to optimally select nodes with the highest estimated value. MCTS
uses the Upper Confidence Bound (UCB) formula applied to trees as the
strategy in the selection process to traverse the tree. It balances the
exploration-exploitation trade-off. During tree traversal, a node is selected
based on some parameters that return the maximum value. The formula typically used for this purpose (the standard UCB1 formula applied to trees, reconstructed here) is:
Si = xi + C * sqrt(ln(t) / ni)
where:
Si = value of a node i
xi = empirical mean (average reward) of a node i
C = a constant that balances exploration and exploitation
t = total number of simulations (visits of the parent node)
ni = number of times node i has been visited
When traversing the tree during the selection process, the child node that returns the greatest value from the above equation is the one that gets selected. During traversal, once a child node is found that is also a leaf node, MCTS jumps to the expansion step.
Expansion: In this process, a new child node is added to the tree to that node
which was optimally reached during the selection process.
Simulation: In this process, a simulation is performed by choosing moves or
strategies until a result or predefined state is achieved.
Backpropagation: After determining the value of the newly added node, the remaining tree must be updated. The backpropagation process is performed, propagating from the new node back to the root node. During this process, the number of simulations stored in each node is incremented. Also, if the new node's simulation results in a win, then the number of wins is also incremented.
The above steps can be visually understood by the diagram given below:
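In code form, the four steps can be sketched as follows. The Node fields, the uniform-random rollout policy, and the assumption that states are hashable and comparable are simplifications made for this illustration; they are not taken from these notes.

    import math, random

    class Node:
        def __init__(self, state, parent=None):
            self.state, self.parent = state, parent
            self.children = []    # expanded child nodes
            self.visits = 0       # number of simulations through this node
            self.wins = 0.0       # accumulated reward

    def ucb1(node, C=1.4):
        # UCB1 score used during selection (unvisited nodes are tried first).
        if node.visits == 0:
            return math.inf
        return (node.wins / node.visits +
                C * math.sqrt(math.log(node.parent.visits) / node.visits))

    def mcts(game, root_state, iterations=1000):
        root = Node(root_state)
        for _ in range(iterations):
            # 1. Selection: descend by UCB1 while the node is fully expanded.
            node = root
            while node.children and len(node.children) == len(game.actions(node.state)):
                node = max(node.children, key=ucb1)
            # 2. Expansion: add one untried child if the state is not terminal.
            if not game.terminal_test(node.state):
                tried = {c.state for c in node.children}   # assumes hashable states
                untried = [a for a in game.actions(node.state)
                           if game.result(node.state, a) not in tried]
                if untried:
                    child = Node(game.result(node.state, random.choice(untried)), node)
                    node.children.append(child)
                    node = child
            # 3. Simulation: random playout until a terminal state is reached.
            state = node.state
            while not game.terminal_test(state):
                state = game.result(state, random.choice(game.actions(state)))
            # Reward is taken from the root player's perspective (a simplification).
            reward = game.utility(state, game.player(root_state))
            # 4. Backpropagation: update visit counts and rewards up to the root.
            while node is not None:
                node.visits += 1
                node.wins += reward
                node = node.parent
        # Recommend the action leading to the most-visited child of the root.
        best = max(root.children, key=lambda c: c.visits)
        return next(a for a in game.actions(root_state)
                    if game.result(root_state, a) == best.state)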
These types of algorithms are particularly useful in turn based games
where there is no element of chance in the game mechanics, such as Tic Tac
Toe, Connect 4, Checkers, Chess, Go, etc. This has recently been used by
Artificial Intelligence Programs like AlphaGo, to play against the world’s top
Go players. But, its application is not limited to games only. It can be used in
any situation which is described by state-action pairs and simulations used to
forecast outcomes.
2. MCTS can play effectively without any knowledge of the particular domain, apart from the rules and end conditions, and it can find its own moves and learn from them by playing random playouts.
3. MCTS can be saved at any intermediate state, and that state can be reused whenever required.
1. As the tree growth becomes rapid after a few iterations, it requires a huge
amount of memory.
2. There is a bit of a reliability issue with Monte Carlo Tree Search. In certain scenarios, a single branch or path might lead to a loss against the opponent when the method is implemented for turn-based games. This is mainly due to the vast number of move combinations: each of the nodes might not be visited enough times to understand its result or outcome in the long run.
3. The MCTS algorithm needs a huge number of iterations to be able to effectively decide the most efficient path. So, there is a bit of a speed issue there.
2. Sample Efficiency: MCTS typically requires a large number of rollouts to obtain accurate statistics and make informed decisions. This can be computationally expensive, especially in complex domains with a large search space. Improving the sample efficiency of MCTS is an ongoing research area.
3. High Variance: The outcomes of individual rollouts in MCTS can be highly
variable due to the random nature of the simulations. This can lead to
inconsistent estimations of action values and introduce noise in the decision-
making process. Techniques such as variance reduction and progressive
widening are used to mitigate this issue.
4. Heuristic Design: MCTS relies on heuristics to guide the search and prioritize promising moves, and designing good heuristics can be difficult in some domains.
5. Complex Game Properties: Certain game properties introduce additional challenges and issues for MCTS. For example, games with hidden or imperfect information, large branching factors, or continuous action spaces require adaptations and extensions of the basic MCTS algorithm to handle these complexities effectively.
Stochastic games
White knows his or her own legal moves, but he or she has no idea how
Black will roll, and thus has no idea what Black’s legal moves will be. That
means White won’t be able to build a standard game tree like the ones in chess or tic-tac-toe. In backgammon, in addition to MAX and MIN nodes, a game tree must
include chance nodes. The figure below depicts chance nodes as circles. The
possible dice rolls are indicated by the branches leading from each chance node;
each branch is labelled with the roll and its probability. There are 36 different
ways to roll two dice, each equally likely, yet there are only 21 distinct rolls
because a 6–5 is the same as a 5–6. P (1–1) = 1/36 because each of the six
doubles (1–1 through 6–6) has a probability of 1/36. Each of the other 15 rolls
has a 1/18 chance of happening.
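To evaluate a tree containing such chance nodes, the value of a chance node is the probability-weighted average of its children's values (the expectiminimax idea). A minimal sketch extending the earlier minimax code is given below; the chance_outcomes method, returning (probability, resulting state) pairs, and the 'CHANCE' player label are assumed additions to the illustrative Game interface.

    # Expectiminimax-style sketch: chance nodes take the expected value of their children.
    # chance_outcomes(state) is an assumed helper returning (probability, next_state) pairs.
    def expecti_value(game, state, player):
        if game.terminal_test(state):
            return game.utility(state, player)
        mover = game.player(state)
        if mover == 'CHANCE':
            return sum(p * expecti_value(game, s, player)
                       for p, s in game.chance_outcomes(state))
        values = [expecti_value(game, game.result(state, a), player)
                  for a in game.actions(state)]
        return max(values) if mover == player else min(values)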
Hidden States: The environment's true state, also known as the hidden state,
evolves according to a probabilistic process. The agent's observations provide
noisy or incomplete information about this hidden state.
Belief State: To handle partial observability, the agent maintains a belief state,
which is a probability distribution over possible hidden states. The belief state
captures the agent's uncertainty about the true state of the environment.
Action and Observation: The agent takes actions based on its belief state, and it
receives observations that depend on the hidden state. These observations help
the agent update its belief state and make decisions.
Objective and Policy: The agent's goal is to find a policy—a mapping from belief
states to actions—that maximizes a specific objective, such as cumulative
rewards or long-term expected utility.
Belief Space Methods: These methods work directly in the space of belief states
and involve updating beliefs based on observations and actions. Techniques like
the POMDP forward algorithm and backward induction are used to compute
optimal policies.
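As a concrete illustration of how a belief state is updated after taking an action and receiving an observation, the sketch below applies the standard Bayes filter update b'(s') ∝ O(o | s', a) * Σ_s T(s' | s, a) * b(s); the two-state transition and observation tables are illustrative values, not taken from these notes.

    # Bayes-filter belief update over a small illustrative two-state model.
    def update_belief(belief, action, observation, T, O, states):
        new_belief = {}
        for s2 in states:
            # Predict: probability of landing in s2 given the action.
            prior = sum(T[(s, action, s2)] * belief[s] for s in states)
            # Correct: weight by the likelihood of the received observation.
            new_belief[s2] = O[(s2, action, observation)] * prior
        total = sum(new_belief.values())
        return {s: v / total for s, v in new_belief.items()}   # normalize

    states = ['good', 'bad']
    T = {('good', 'wait', 'good'): 0.9, ('good', 'wait', 'bad'): 0.1,
         ('bad', 'wait', 'good'): 0.2, ('bad', 'wait', 'bad'): 0.8}
    O = {('good', 'wait', 'ok'): 0.8, ('good', 'wait', 'alarm'): 0.2,
         ('bad', 'wait', 'ok'): 0.3, ('bad', 'wait', 'alarm'): 0.7}
    belief = {'good': 0.5, 'bad': 0.5}
    print(update_belief(belief, 'wait', 'alarm', T, O, states))  # belief shifts toward 'bad'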
Basically, there are three different categories of constraints with regard to the variables:
o Unary constraints are the simplest kind of constraint, because they limit the value of only one variable.
o Binary constraints: These constraints relate two variables. For example, a variable x2 may be required to take a value between the values of x1 and x3.
o Global constraints: This kind of constraint can involve an arbitrary number of variables.
The main kinds of constraints are handled using certain kinds of resolution methodologies:
Think of a Sudoku puzzle where some of the squares have initial fills of certain
integers.
You must complete the empty squares with numbers between 1 and 9, making sure that no row, column, or block contains a repeated integer.
Solving this kind of problem is a fairly elementary constraint satisfaction task: a problem must be solved while taking certain constraints into consideration.
The integer range (1-9) that can occupy the empty spaces is referred to as the domain, while the empty spaces themselves are referred to as variables. The values of the variables are drawn from the domain. Constraints are the rules that determine which values from the domain a variable may take.
Constraint propagation
Backtracking
Even simple backtracking (BT) performs some kind of consistency
technique and it can be seen as a combination of pure generate &
test and a fraction of arc consistency. The BT algorithm tests arc
consistency among already instantiated variables, i.e., the algorithm
checks the validity of constraints considering the partial instantiation.
Because the domains of instantiated variables contain just one value,
it is possible to check only those constraints/arcs containing the last
instantiated variable. If any domain is reduced then the
corresponding constraint is not consistent and the algorithm
backtracks to a new instantiation.
Forward Checking
Look Ahead
Look ahead prunes the search tree further than forward checking but, again, it should be noted that look ahead does even
more work when each assignment is added to the current partial
solution than forward checking.
The following figure shows which constraints are tested when the above
described propagation techniques are applied.
Backtracking search for CSP
Define CSP
CSPs represent a state with a set of variable/value pairs and represent the
conditions for a solution by a set of constraints on the variables. Many important
real-world problems can be described as CSPs. CSP (constraint satisfaction
problem): Use a factored representation (a set of variables, each of which has a
value) for each state, a problem that is solved when each variable has a value that
satisfies all the constraints on the variable is called a CSP.
A relation can be represented as: a. an explicit list of all tuples of values that
satisfy the constraint; or b. an abstract relation that supports two operations. (e.g.
if X1 and X2 both have the domain {A,B}, the constraint saying “the two variables
must have different values” can be written as a. <(X1,X2),[(A,B),(B,A)]> or b.
<(X1,X2),X1≠X2>.)
Assignment:
An assignment that does not violate any constraints is called a consistent or legal
assignment;
A complete assignment is one in which every variable is assigned;
A partial assignment is one that assigns values to only some of the variables.
Map coloring
To formulate a CSP:
define the variables to be the regions X = {WA, NT, Q, NSW, V, SA, T};
define the domain of each variable to be Di = {red, green, blue};
the constraints require neighboring regions to have distinct colors.
Why formulate a problem as a CSP?
1) CSPs yield a natural representation for a wide variety of problems;
2) CSP solvers can be faster than state-space searchers because the CSP solver can quickly eliminate large swatches of the search space;
3) With CSP, once we find out that a partial assignment is not a solution, we can immediately discard further refinements of the partial assignment.
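A minimal sketch of this formulation as plain data, assuming a simple dictionary-based representation (variable to domain, plus a neighbor list), is shown below; the later search sketches reuse these names.

    # Australia map-coloring CSP as plain data (illustrative representation).
    variables = ['WA', 'NT', 'Q', 'NSW', 'V', 'SA', 'T']
    domains = {v: ['red', 'green', 'blue'] for v in variables}
    # Neighboring regions, which must receive different colors.
    neighbors = {
        'WA': ['NT', 'SA'], 'NT': ['WA', 'SA', 'Q'], 'Q': ['NT', 'SA', 'NSW'],
        'NSW': ['Q', 'SA', 'V'], 'V': ['SA', 'NSW'],
        'SA': ['WA', 'NT', 'Q', 'NSW', 'V'], 'T': [],
    }

    def consistent(var, value, assignment):
        # A value is consistent if no already-assigned neighbor has the same color.
        return all(assignment.get(n) != value for n in neighbors[var])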
Job-shop scheduling
Precedence constraints: Whenever a task T1 must occur before task T2, and T1 takes duration d1 to complete, we add an arithmetic constraint of the form T1 + d1 ≤ T2.
Disjunctive constraint: AxleF and AxleB must not overlap in time, so either AxleF finishes before AxleB starts or the other way around: (AxleF + dAxle ≤ AxleB) or (AxleB + dAxle ≤ AxleF), where dAxle is the time needed to install an axle.
Di = {1, 2, 3, …, 27}.
The simplest kind of CSP involves variables that have discrete, finite domains.
E.g. Map-coloring problems, scheduling with time limits, the 8-queens problem.
A discrete domain can be infinite. e.g. The set of integers or strings. With infinite
domains, to describe constraints, a constraint language must be used instead of
enumerating all allowed combinations of values.
CSP with continuous domains are common in the real world and are widely
studied in the field of operations research.
The simplest type is the unary constraint, which restricts the value of a single
variable.
A binary constraint relates two variables. (e.g. SA≠NSW.) A binary CSP is one
with only binary constraints, can be represented as a constraint graph.
e.g. If the original graph has variables {X,Y,Z} and constraints <(X,Y,Z),C1> and
<(X,Y),C2>, then the dual graph would have variables {C1,C2} with the binary
constraint <(X,Y),R1>, where (X,Y) are the shared variables and R1 is a new
relation that defines the constraint between the shared variables.
We might prefer a global constraint (such as Alldiff) rather than a set of binary constraints for two reasons: first, it is easier and less error-prone to write the problem description; second, special-purpose inference algorithms can be designed for global constraints that are not available for a set of more primitive constraints.
local consistency: If we treat each variable as a node in a graph and each binary
constraint as an arc, then the process of enforcing local consistency in each part
of the graph causes inconsistent values to be eliminated throughout the graph.
A single variable (a node in the CSP network) is node-consistent if all the values
in the variable’s domain satisfy the variable’s unary constraint.
Arc consistency
AC-3 algorithm:
AC-3 maintains a queue of arcs which initially contains all the arcs in the CSP.
AC-3 then pops off an arbitrary arc (Xi, Xj) from the queue and makes Xi arc-
consistent with respect to Xj.
If this leaves Di unchanged, the algorithm just moves on to the next arc;
But if this revises Di, then add to the queue all arcs (Xk, Xi) where Xk is a neighbor
of Xi.
If Di is revised down to nothing, then the whole CSP has no consistent solution,
return failure;
Otherwise, keep checking, trying to remove values from the domains of variables
until no more arcs are in the queue.
The result is an arc-consistent CSP that has the same solutions as the original one but smaller domains.
Assume a CSP with n variables, each with domain size at most d, and
with c binary constraints (arcs). Checking consistency of an arc can be done in O(d^2) time, so the total worst-case time is O(cd^3).
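A sketch of AC-3 following the description above is given below. The CSP is assumed to be given as sets of domain values plus an allowed(Xi, x, Xj, y) function that says whether the pair of values satisfies the constraint on the arc; this representation is an assumption for the example, not a specific library's API.

    from collections import deque

    # AC-3 sketch: `domains` maps each variable to a set of values,
    # `neighbors` maps each variable to its constraint neighbors.
    def ac3(domains, neighbors, allowed):
        queue = deque((Xi, Xj) for Xi in domains for Xj in neighbors[Xi])
        while queue:
            Xi, Xj = queue.popleft()
            if revise(domains, Xi, Xj, allowed):
                if not domains[Xi]:
                    return False          # a domain became empty: no consistent solution
                for Xk in neighbors[Xi]:
                    if Xk != Xj:
                        queue.append((Xk, Xi))
        return True                       # arc-consistent CSP, possibly with smaller domains

    def revise(domains, Xi, Xj, allowed):
        # Remove values of Xi that have no supporting value left in Xj's domain.
        removed = {x for x in domains[Xi]
                   if not any(allowed(Xi, x, Xj, y) for y in domains[Xj])}
        domains[Xi] -= removed
        return bool(removed)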
Path consistency
Path consistency tightens the binary constraints by using implicit constraints that
are inferred by looking at triples of variables.
K-consistency
K-consistency: A CSP is k-consistent if, for any set of k-1 variables and for any
consistent assignment to those variables, a consistent value can always be
assigned to any kth variable.
Global constraints
A simple algorithm: First remove any variable in the constraint that has a
singleton domain, and delete that variable’s value from the domains of the
remaining variables. Repeat as long as there are singleton variables. If at any point
an empty domain is produced or there are more variables than domain values left,
then an inconsistency has been detected.
e.g.
Atmost(10, P1, P2, P3, P4): no more than 10 personnel are assigned in total.
If each variable has the domain {3, 4, 5, 6}, the Atmost constraint cannot be
satisfied.
e.g. If each variable in the example has the domain {2, 3, 4, 5, 6}, the values 5
and 6 can be deleted from each domain.
e.g.
Suppose there are two flights, F1 and F2, with capacities 165 and 385 passengers respectively, so the initial domains are D1 = [0, 165] and D2 = [0, 385].
Now suppose we have the additional constraint that the two flights together must carry 420 people: F1 + F2 = 420. Propagating bounds constraints, we reduce the domains to D1 = [35, 165] and D2 = [255, 385].
A CSP is bounds consistent if for every variable X, and for both the lower-bound
and upper-bound values of X, there exists some value of Y that satisfies the
constraint between X and Y for every variable Y.
Sudoku
A Sudoku puzzle can be considered a CSP with 81 variables, one for each square.
We use the variable names A1 through A9 for the top row (left to right), down to
I1 through I9 for the bottom row. The empty squares have the domain {1, 2, 3, 4,
5, 6, 7, 8, 9} and the pre-filled squares have a domain consisting of a single value.
There are 27 different Alldiff constraints: one for each row, column, and box of
9 squares:
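The 27 Alldiff units can be generated mechanically; the short sketch below builds them using the A1..I9 naming scheme described above.

    # Build the 27 Alldiff units (9 rows, 9 columns, 9 boxes) for the Sudoku CSP.
    rows, cols = 'ABCDEFGHI', '123456789'
    squares = [r + c for r in rows for c in cols]           # 'A1' ... 'I9'
    units = ([[r + c for c in cols] for r in rows] +        # rows
             [[r + c for r in rows] for c in cols] +        # columns
             [[r + c for r in rs for c in cs]                # 3x3 boxes
              for rs in ('ABC', 'DEF', 'GHI')
              for cs in ('123', '456', '789')])
    assert len(units) == 27 and all(len(u) == 9 for u in units)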
Backtracking search: A depth-first search that chooses values for one variable
at a time and backtracks when a variable has no legal values left to assign.
Backtracking algorithm repeatedly chooses an unassigned variable, and then tries
all values in the domain of that variable in turn, trying to find a solution. If an
inconsistency is detected, then BACKTRACK returns failure, causing the
previous call to try another value.
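A sketch of this depth-first backtracking search over the dictionary representation used in the map-coloring example is shown below; select-unassigned-variable here simply takes the first unassigned variable, with none of the heuristics discussed next.

    # Backtracking search sketch for a CSP given as (variables, domains, consistent),
    # where consistent(var, value, assignment) is the checking function defined earlier.
    def backtracking_search(variables, domains, consistent, assignment=None):
        assignment = {} if assignment is None else assignment
        if len(assignment) == len(variables):
            return assignment                  # complete, consistent assignment
        var = next(v for v in variables if v not in assignment)   # no ordering heuristic
        for value in domains[var]:
            if consistent(var, value, assignment):
                assignment[var] = value
                result = backtracking_search(variables, domains, consistent, assignment)
                if result is not None:
                    return result
                del assignment[var]            # undo the assignment and try the next value
        return None                            # failure: causes the previous call to backtrack

With the map-coloring data above, backtracking_search(variables, domains, consistent) returns a coloring such as {'WA': 'red', 'NT': 'green', ...}.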
Backtracking search raises the following questions:
1) Which variable should be assigned next, and in what order should its values be tried?
2) What inferences should be performed at each step in the search?
3) When the search arrives at an assignment that violates a constraint, can the search avoid repeating this failure?
SELECT-UNASSIGNED-VARIABLE
Variable selection—fail-first
Minimum-remaining-values (MRV) heuristic: The idea of choosing the
variable with the fewest “legal” value. A.k.a. “most constrained variable” or “fail-
first” heuristic, it picks a variable that is most likely to cause a failure soon thereby
pruning the search tree. If some variable X has no legal values left, the MRV
heuristic will select X and failure will be detected immediately—avoiding
pointless searches through other variables.
E.g. After the assignment for WA=red and NT=green, there is only one possible
value for SA, so it makes sense to assign SA=blue next rather than assigning Q.
[Powerful guide]
Degree heuristic: The degree heuristic attempts to reduce the branching factor
on future choices by selecting the variable that is involved in the largest number
of constraints on other unassigned variables. [useful tie-breaker]
e.g. SA is the variable with highest degree 5; the other variables have degree 2 or
3; T has degree 0.
ORDER-DOMAIN-VALUES
Value selection—fail-last
If we are trying to find all the solutions to a problem (not just the first one), then
the ordering does not matter.
Least-constraining-value heuristic: prefers the value that rules out the fewest choices for the neighboring variables in the constraint graph. (Try to leave the
maximum flexibility for subsequent variable assignments.)
e.g. We have generated the partial assignment with WA=red and NT=green and
that our next choice is for Q. Blue would be a bad choice because it eliminates the last legal value left for Q’s neighbor, SA; the heuristic therefore prefers red to blue.
Advantage: For many problems the search will be more effective if we combine
the MRV heuristic with forward checking.
Disadvantage: Forward checking only makes the current variable arc-consistent,
but doesn’t look ahead and make all the other variables arc-consistent.
3. Intelligent backtracking
chronological backtracking: The BACKTRACKING-SEARCH in Fig 6.5. When
a branch of the search fails, back up to the preceding variable and try a different
value for it. (The most recent decision point is revisited.)
e.g.
When we try the next variable SA, we see every value violates a constraint.
Conflict set for a variable: A set of assignments that are in conflict with some
value for that variable.
(e.g. The set {Q=red, NSW=green, V=blue} is the conflict set for SA.)
backjumping method: Backtracks to the most recent assignment in the conflict
set.
(e.g. backjumping would jump over T and try a new value for V.)
Forward checking can supply the conflict set with no extra work.
If the last value is deleted from Y’s domain, the assignment in the conflict set of
Y are added to the conflict set of X.
e.g.
We try T=red next and then assign NT, Q, V, SA, no assignment can work for
these last 4 variables.
Eventually we run out of values to try at NT, but simple backjumping cannot work
because NT doesn’t have a complete conflict set of preceding variables that
caused to fail.
The set {WA, NSW} is a deeper notion of the conflict set for NT: it is the set of preceding variables that caused NT, together with any subsequent variables, to have no consistent solution. So the algorithm should backtrack to NSW and skip over T.
When a variable’s domain becomes empty, the “terminal” failure occurs, that
variable has a standard conflict set.
Let Xj be the current variable, and let conf(Xj) be its conflict set. If every possible value for Xj fails, backjump to the most recent variable Xi in conf(Xj), and set conf(Xi) = conf(Xi) ∪ conf(Xj) - {Xi}.
The conflict set for a variable means that there is no solution from that variable onward, given the preceding assignment to the conflict set.
e.g.
SA fails, and its conflict set is {WA, NT, Q}. (standard conflict set)
Backjump to Q, its conflict set is {NT, NSW}∪{WA,NT,Q}-{Q} = {WA, NT,
NSW}.
After backjumping from a contradiction, how to avoid running into the same
problem again:
Constraint learning: The idea of finding a minimum set of variables from the
conflict set that causes the problem. This set of variables, along with their
corresponding values, is called a no-good. We then record the no-good, either by
adding a new constraint to the CSP or by keeping a separate cache of no-goods.
Local search algorithms for CSPs use a complete-state formulation: the initial
state assigns a value to every variable, and the search changes the value of one variable at a time.
The min-conflicts heuristic: In choosing a new value for a variable, select the
value that results in the minimum number of conflicts with other variables.
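A sketch of min-conflicts local search over the same dictionary-style representation is given below; the conflict count here checks the "different values" constraint used in the map-coloring example, which is an assumption made for this illustration.

    import random

    # Min-conflicts local search sketch for a CSP given as (variables, domains, neighbors).
    def min_conflicts(variables, domains, neighbors, max_steps=10000):
        # Start from a complete (random) assignment, then repair it.
        assignment = {v: random.choice(domains[v]) for v in variables}
        for _ in range(max_steps):
            conflicted = [v for v in variables
                          if conflicts(v, assignment[v], assignment, neighbors)]
            if not conflicted:
                return assignment              # no violated constraints: solution found
            var = random.choice(conflicted)
            # Choose the value that minimizes the number of conflicts for var.
            assignment[var] = min(domains[var],
                                  key=lambda val: conflicts(var, val, assignment, neighbors))
        return None                            # no solution found within max_steps

    def conflicts(var, value, assignment, neighbors):
        # Number of neighboring variables whose current value clashes with `value`.
        return sum(1 for n in neighbors[var] if assignment.get(n) == value)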
Local search techniques in Section 4.1 can be used in local search for CSPs.
The landscape of a CSP under the min-conflicts heuristic usually has a series of plateaus. Simulated annealing and plateau search (i.e. allowing sideways moves to another state with the same score) can help local search find its way off a plateau. This wandering on the plateau can be directed with tabu search: keeping a small list of recently visited states and forbidding the algorithm to return to those states.
Constraint weighting: a technique that can help concentrate the search on the
important constraints.
At each step, the algorithm chooses a variable/value pair to change that will result
in the lowest total weight of all violated constraints.
The weights are then adjusted by incrementing the weight of each constraint that
is violated by the current assignment.
Local search can be used in an online setting when the problem changes; this is particularly important in scheduling problems.
The structure of problem
The structure of the problem, as represented by the constraint graph, can be used to find solutions quickly.
By using directed arc consistency (DAC), any tree-structured CSP can be solved in time linear in the number of variables.
Choose an ordering of the variable such that each variable appears after its parent
in the tree. (topological sort)
Any tree with n nodes has n-1 arcs, so we can make this graph directed arc-consistent in O(n) steps, each of which must compare up to d possible domain values for 2 variables, for a total time of O(nd^2).
Once we have a directed arc-consistent graph, we can just march down the list of
variables and choose any remaining value.
Since each link from a parent to its child is arc consistent, we won’t have to
backtrack, and can move linearly through the variables.
There are 2 primary ways to reduce more general constraint graphs to trees:
Choose a subset S of the CSP’s variables such that the constraint graph becomes
a tree after removal of S. S is called a cycle cutset.
For each possible assignment to the variables in S that satisfies all constraints on
S,
(a) remove from the domain of the remaining variables any values that are
inconsistent with the assignment for S, and
(b) If the remaining CSP has a solution, return it together with the assignment
for S.
·Every variable in the original problem appears in at least one of the subproblems.
·If 2 variables are connected by a constraint in the original problem, they must appear together (along with the constraint) in at least one of the subproblems.
·If a variable appears in 2 subproblems in the tree, it must appear in every subproblem along the path connecting those subproblems.
If we can solve all the subproblems, then construct a global solution as follows:
First, view each subproblem as a “mega-variable” whose domain is the set of all
solutions for the subproblem.
Then, solve the constraints connecting the subproblems using the efficient
algorithm for trees.
Tree width:
The tree width of a tree decomposition of a graph is one less than the size of the largest subproblem.
The tree width of the graph itself is the minimum tree width among all its tree
decompositions.
e.g.
Consider the map-coloring problem with n colors: for every consistent solution, there is actually a set of n! solutions formed by permuting the color names (value symmetry).
On the Australia map, WA, NT, and SA must all have different colors, so there are 3! = 6 ways to assign colors to these three regions.