
UNIT III

GAME PLAYING AND CSP

Syllabus : Game theory – optimal decisions in games – alpha-beta


search – monte-carlo tree search – stochastic games – partially
observable games. Constraint satisfaction problems – constraint
propagation – backtracking search for CSP – local search for CSP –
structure of CSP.

Game theory

What Is Game Theory?

Game theory is a theoretical framework for conceiving social situations among competing players. In some respects, game theory is the science of strategy, or at least the optimal decision-making of independent and competing actors in a strategic setting.

Game theory is used in various fields to lay out various situations and
predict their most likely outcomes. Businesses may use it, for example, to set
prices, decide whether to acquire another firm, and determine how to handle a
lawsuit.

KEY TAKEAWAYS

 Game theory is a theoretical framework to conceive social situations among competing players.
 The intention of game theory is to produce optimal decision-making of
independent and competing actors in a strategic setting.
 Using game theory, real-world scenarios for such situations as pricing
competition and product releases (and many more) can be laid out and
their outcomes predicted.
 Scenarios include the prisoner's dilemma and the dictator game among
many others.
 Different types of game theory include cooperative/non-cooperative, zero-
sum/non-zero-sum, and simultaneous/sequential.
How Game Theory Works

Game theory tries to understand the strategic actions of two or more "players" in a given situation containing set rules and outcomes. Any time we have a situation with two or more players that involves known payouts or quantifiable consequences, we can use game theory to help determine the most likely outcomes.

The focus of game theory is the game, which serves as a model of an interactive situation among rational players. The key to game theory is that one player's payoff is contingent on the strategy implemented by the other player.

The game identifies the players' identities, preferences, and available strategies and how these strategies affect the outcome. Depending on the model, various other requirements or assumptions may be necessary.

Game theory has a wide range of applications, including psychology, evolutionary biology, war, politics, economics, and business. Despite its many advances, game theory is still a young and developing science.

Useful Terms in Game Theory

Here are a few terms commonly used in the study of game theory:
 Game: Any set of circumstances that has a result dependent on the actions
of two or more decision-makers (players).
 Players: A strategic decision-maker within the context of the game.
 Strategy: A complete plan of action a player will take given the set of
circumstances that might arise within the game.
 Payoff: The payout a player receives from arriving at a particular
outcome. The payout can be in any quantifiable form, from dollars
to utility.
 Information set: The information available at a given point in the game.
The term information set is most usually applied when the game has a
sequential component.
 Equilibrium: The point in a game where both players have made their
decisions and an outcome is reached.

The Nash Equilibrium

Nash equilibrium is an outcome reached that, once achieved, means no player can increase payoff by changing decisions unilaterally. It can also be thought of as "no regrets," in the sense that once a decision is made, the player will have no regrets about it when considering the consequences.

The Nash equilibrium is reached over time, in most cases. However, once
the Nash equilibrium is reached, it will not be deviated from. After we learn how
to find the Nash equilibrium, take a look at how a unilateral move would affect
the situation. Does it make any sense? It shouldn't, and that's why the Nash
equilibrium is described as "no regrets."

Generally, there can be more than one equilibrium in a game. However, this usually occurs in games with more complex elements than two choices by two players. In simultaneous games that are repeated over time, one of these multiple equilibria is reached after some trial and error.
This scenario of different choices over time before reaching equilibrium
is most often played out in the business world when two firms are determining
prices for highly interchangeable products, such as airfare or soft drinks.

Impact of Game Theory

Game theory is present in almost every industry or field of research. Its expansive theory can pertain to many situations, making it a versatile and important theory to comprehend. Here are several fields of study directly impacted by game theory.

Economics

Game theory brought about a revolution in economics by addressing crucial problems in prior mathematical economic models. For instance, neoclassical economics struggled to understand entrepreneurial anticipation and could not handle imperfect competition. Game theory turned attention away from steady-state equilibrium toward the market process.

Economists often use game theory to understand the behavior of oligopoly firms. It helps to predict likely outcomes when firms engage in certain behaviors, such as price-fixing and collusion.

Business

In business, game theory is beneficial for modeling competing behaviors between economic agents. Businesses often have several strategic choices that affect their ability to realize economic gain. For example, businesses may face dilemmas such as whether to retire existing products and develop new ones or employ new marketing strategies.
Businesses can often choose their opponent as well. Some focus on external forces and compete against other market participants. Others set internal goals and strive to be better than their previous versions. Whether external or internal, companies are always competing for resources, attempting to hire the best candidates away from rivals, and trying to win customers' attention away from competing goods.

Game theory in business may most resemble a game tree as shown below.
A company may start in position one and must decide on two outcomes. However,
there are continually other decisions to be made; the final payoff amount is not
known until the final decision has been processed.
Project Management

Project management involves social aspects of game theory, as different participants may have different influences. For example, a project manager may be incentivized to successfully complete a building development project. Meanwhile, the construction worker may be incentivized to work slower for safety or delay the project to incur more billable hours.

When dealing with an internal team, game theory may be less prevalent as
all participants working for the same employer often have a greater shared
interest for success. However, third-party consultants or external parties assisting
with a project may be incentivized by other means separate from the project's
success.

Consumer Product Pricing

The strategy of Black Friday shopping is at the heart of game theory. The
concept holds that should companies reduce prices, more consumers will buy
more goods. The relationship between a consumer, a good, and the financial
exchange to transfer ownership plays a major part in game theory as each
consumer has a different set of expectations.

Other than sweeping sales in advance of the holiday season, companies must utilize game theory when pricing products for launch or in anticipation of competition from rival goods. A balance must be found. Price a good too low and it won't reap profit, yet price a good too high and it might scare customers toward a substitute.
Types of Game Theories

Cooperative vs. Non-Cooperative Games

Although there are many types of game theories, such as symmetric/asymmetric, simultaneous/sequential, and so on, cooperative and non-cooperative game theories are the most common. Cooperative game theory deals with how coalitions, or cooperative groups, interact when only the payoffs are known. It is a game between coalitions of players rather than between individuals, and it questions how groups form and how they allocate the payoff among players.

Non-cooperative game theory deals with how rational economic agents deal with each other to achieve their own goals. The most common non-cooperative game is the strategic game, in which only the available strategies and the outcomes that result from a combination of choices are listed. A simplistic example of a real-world non-cooperative game is rock-paper-scissors.

Zero-Sum vs. Non-Zero Sum Games

When there is a direct conflict between multiple parties striving for the
same outcome, it is often called a zero-sum game. This means that for every
winner, there is a loser. Alternatively, it means that the collective net benefit
received is equal to the collective net benefit lost. Lots of sporting events are a
zero-sum game as one team wins and another team loses.

A non-zero-sum game is one in which all participants can win or lose at the same time. Consider business partnerships that are mutually beneficial and foster value for both entities. Instead of competing and attempting to "win," both parties benefit.
Investing and trading stocks is sometimes considered a zero-sum game.
After all, one market participant will buy a stock and another participant sells
that same stock for the same price. However, because different investors have
different risk appetites and investing goals, it may be mutually beneficial for both
parties to transact.

Simultaneous Move vs. Sequential Move Games

Many times in life, game theory presents itself in simultaneous move situations. This means each participant must continually make decisions at the same time their opponent is making decisions. As companies devise their marketing, product development, and operational plans, competing companies are also doing the same thing at the same time.

In some cases, there is an intentional staggering of decision-making steps, enabling one party to see the other party's moves before making their own. This is usually present in negotiations; one party lists their demands, then the other party has a designated amount of time to respond and list their own.

One Shot vs. Repeated Games

Game theory can begin and end in a single instance. Like much of life, the
underlying competition starts, progresses, ends, and cannot be redone. This is
often the case with equity traders, who must wisely choose their entry point and
exit point as their decision may not easily be undone or retried.

On the other hand, some repeated games continue on and seamlessly never
end. These types of games often contain the same participants each time, and
each party has the knowledge of what occurred last time. For example, consider
rival companies trying to price their goods. Whenever one makes a price
adjustment, so may the other. This circular competition repeats itself across
product cycles or sale seasonality.

In the example below, a depiction of the Prisoner's Dilemma (discussed in the next section) is shown. In this depiction, after the first iteration occurs, there is no payoff. Instead, a second iteration of the game occurs, bringing with it a new set of outcomes not possible under one shot games.

Examples of Game Theory

There are several "games" that game theory analyzes. Below, we will
briefly describe a few of these.

The Prisoner's Dilemma


The Prisoner's Dilemma is the most well-known example of game
theory. Consider the example of two criminals arrested for a crime. Prosecutors
have no hard evidence to convict them. However, to gain a confession, officials
remove the prisoners from their solitary cells and question each one in separate
chambers. Neither prisoner has the means to communicate with the
other. Officials present four deals, often displayed as a 2 x 2 box.

1. If both confess, they will each receive a five-year prison sentence.
2. If Prisoner 1 confesses, but Prisoner 2 does not, Prisoner 1 will get three years and Prisoner 2 will get nine years.
3. If Prisoner 2 confesses, but Prisoner 1 does not, Prisoner 1 will get 10 years, and Prisoner 2 will get two years.
4. If neither confesses, each will serve two years in prison.

The most favorable strategy is to not confess. However, neither is aware of the other's strategy and, without certainty that one will not confess, both will likely confess and receive a five-year prison sentence. The Nash equilibrium suggests that in a prisoner's dilemma, both players will make the move that is best for them individually but worse for them collectively.

"Tit for tat" is said to be the optimal strategy in a prisoner's dilemma. Tit
for tat was introduced by Anatol Rapoport, who developed a strategy in which
each participant in an iterated prisoner's dilemma follows a course of action
consistent with their opponent's previous turn. For example, if provoked, a player
subsequently responds with retaliation; if unprovoked, the player cooperates.

The image below depicts the dilemma where the choice of the participant
in the column and the choice of the participant in the row may clash. For example,
both parties may receive the most favorable outcome if both choose row/column
1. However, each faces the risk of strong adverse outcomes should the other party
not choose the same outcome.

Dictator Game

This is a simple game in which Player A must decide how to split a cash
prize with Player B, who has no input into Player A’s decision. While this is not
a game theory strategy per se, it does provide some interesting insights into
people’s behavior. Experiments reveal about 50% keep all the money to
themselves, 5% split it equally, and the other 45% give the other participant a
smaller share.

The dictator game is closely related to the ultimatum game, in which Player A is given a set amount of money, part of which has to be given to Player B, who can accept or reject the amount given. The catch is if the second player rejects the amount offered, both A and B get nothing. The dictator and ultimatum games hold important lessons for charitable giving and philanthropy.

Volunteer’s Dilemma

In a volunteer’s dilemma, someone has to undertake a chore or job for the common good. The worst possible outcome is realized if nobody volunteers. For example, consider a company in which accounting fraud is rampant, though top management is unaware of it. Some junior employees in the accounting department are aware of the fraud but hesitate to tell top management because it would result in the employees involved in the fraud being fired and most likely prosecuted.

Being labeled as a whistleblower may also have some repercussions down the line. But if nobody volunteers, the large-scale fraud may result in the company’s eventual bankruptcy and the loss of everyone’s jobs.

The Centipede Game

The centipede game is an extensive-form game in game theory in which two players alternately get a chance to take the larger share of a slowly increasing money stash. It is arranged so that if a player passes the stash to their opponent who then takes the stash, the player receives a smaller amount than if they had taken the pot.

The centipede game concludes as soon as a player takes the stash, with
that player getting the larger portion and the other player getting the smaller
portion. The game has a pre-defined total number of rounds, which are known to
each player in advance.

Types of Game Theory Strategies

Game theory participants can decide between a few primary ways to play
their game. In general, each participant must decide what level of risk they are
willing to take and how far they are willing to go to pursue the best possible
outcome.

Maximax Strategy

A maximax strategy involves no hedging. The participant is either all in or all out; they'll either win big or face the worst consequence. Consider new start-up companies introducing new products to the market. Their new product may result in the company's market cap increasing fifty-fold. On the other hand, a failed product launch will leave the company bankrupt. The participant is willing to take a chance on achieving the best outcome even if the worst outcome is possible.

Maximin Strategy

A maximin strategy in game theory results in the participant choosing the best of the worst payoff. The participant has decided to hedge risk and sacrifice full benefit in exchange for avoiding the worst outcome. Often, companies face and accept this strategy when considering lawsuits. By settling out of court and avoiding a public trial, companies agree to an adverse outcome. However, that outcome could have been worse if the case had gone to trial.

Dominant Strategy

In a dominant strategy, a participant performs actions that are the best outcome for the player, irrespective of what other participants decide to do. In business, this may be a situation where a company decides to scale and expand to a new market, regardless of whether a competing company has decided to move into the market as well. In the Prisoner's Dilemma, the dominant strategy would be to confess.

Pure Strategy

Pure strategy entails the least amount of strategic decision-making, as pure strategy is simply a defined choice that is made regardless of external forces or actions of others. Consider a game of rock-paper-scissors in which one participant decides to throw the same shape each trial. As the outcome for this participant is well-defined in advance (outcomes are either a specific shape or not that specific shape), the strategy is defined as pure.
Mixed Strategy

A mixed strategy may seem like random chance, but much thought must go into devising a plan of mixing elements or actions. Consider the relationship between a baseball pitcher and batter. The pitcher cannot throw the same pitch each time; otherwise, the batter could predict what would come next. Instead, the pitcher must mix their strategy from pitch to pitch to create a sense of unpredictability from which they hope to benefit.

Limitations of Game Theory

The biggest issue with game theory is that, like most other economic
models, it relies on the assumption that people are rational actors that are self-
interested and utility-maximizing. Of course, we are social beings who do
cooperate often at our own expense. Game theory cannot account for the fact that
in some situations we may fall into a Nash equilibrium, and other times not,
depending on the social context and who the players are.

In addition, game theory often struggles to factor in human elements such as loyalty, honesty, or empathy. Though statistical and mathematical computations can dictate what a best course of action should be, humans may not take this course due to incalculable and complex scenarios of self-sacrifice or manipulation. Game theory may analyze a set of behaviors, but it cannot truly forecast the human element.

Optimal decisions in games

Humans’ intellectual capacities have been engaged by games for as long as civilization has existed, sometimes to an alarming degree. Games are an intriguing subject for AI researchers because of their abstract character. A game’s state is simple to depict, and actors are usually limited to a small number of actions with predetermined results. Physical games, such as croquet and ice hockey, contain significantly more intricate descriptions, a much wider variety of possible actions, and rather ambiguous regulations defining the legality of activities. With the exception of robot soccer, these physical games have not piqued the AI community’s interest.

Games are usually intriguing because they are difficult to solve. Chess, for example, has an average branching factor of around 35, and games frequently stretch to 50 moves per player, so the search tree has roughly 35^100, or about 10^154, nodes (although the search graph has "only" about 10^40 distinct nodes). As a result, games, like the real world, necessitate the ability to make some sort of decision even when calculating the best option is impossible.

Inefficiency is also heavily punished in games. Whereas a half-efficient implementation of A* search will merely take twice as long to complete, a chess program that is half as efficient in using its available time will almost certainly be beaten, all other factors being equal. As a result, a number of intriguing ideas for making the best use of time have emerged from this research.

Optimal Decision Making in Games

Let us start with games with two players, whom we’ll refer to as MAX and MIN
for obvious reasons. MAX is the first to move, and then they take turns until the
game is finished. At the conclusion of the game, the victorious player receives
points, while the loser receives penalties. A game can be formalized as a type of
search problem that has the following elements:

 S0: The initial state of the game, which describes how it is set up at the start.
 Player (s): Defines which player in a state has the move.
 Actions (s): Returns a state’s set of legal moves.
 Result (s, a): A transition model that defines a move’s outcome.
 Terminal-Test (s): A terminal test that returns true if the game is over but
false otherwise. Terminal states are those in which the game has come to a
conclusion.
 Utility (s, p): A utility function (also known as a payoff function or objective function) determines the final numeric value for a game that concludes in the terminal state s for player p. The result in chess is a win, a loss, or a draw, with values of +1, 0, or 1/2. Backgammon’s payoffs range from 0 to +192, but certain games have a greater range of possible outcomes. A zero-sum game is defined (confusingly) as one in which the total payoff to all players is the same for every instance of the game. Chess is a zero-sum game because each game has a payoff of 0 + 1, 1 + 0, or 1/2 + 1/2. “Constant-sum” would have been a preferable name, but zero-sum is the usual term and makes sense if each participant is charged an entry fee of 1/2.
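
To make these six elements concrete, here is a minimal sketch in Python of how such a game could be represented (the class and method names are illustrative, not taken from any particular library):

    class Game:
        """Abstract two-player game in the formalization above."""

        def initial_state(self):      # S0: how the game is set up at the start
            raise NotImplementedError

        def player(self, s):          # PLAYER(s): 'MAX' or 'MIN', whoever has the move
            raise NotImplementedError

        def actions(self, s):         # ACTIONS(s): the set of legal moves in state s
            raise NotImplementedError

        def result(self, s, a):       # RESULT(s, a): the state produced by move a in s
            raise NotImplementedError

        def terminal_test(self, s):   # TERMINAL-TEST(s): True if the game is over in s
            raise NotImplementedError

        def utility(self, s, p):      # UTILITY(s, p): final numeric value of s for player p
            raise NotImplementedError

A concrete game such as tic-tac-toe would fill in these six methods; the search algorithms discussed below need nothing more than this interface.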

The game tree for the game is defined by the initial state, the ACTIONS function, and the RESULT function: a tree in which the nodes are game states and the edges represent moves. The figure below depicts a portion of the tic-tac-toe game tree (noughts and crosses). MAX may make nine different moves from the initial state. Play alternates between MAX placing an X and MIN placing an O until we reach leaf nodes corresponding to terminal states, such as one player having three in a row or all of the squares being filled. The number on each leaf node is the utility value of the terminal state from the perspective of MAX; high values are good for MAX and bad for MIN.

The game tree for tic-tac-toe is relatively small, with fewer than 9! = 362,880 terminal nodes. However, because there are over 10^40 nodes in chess, the game tree is better viewed as a theoretical construct that cannot be realized in the physical world. But, no matter how big the game tree is, MAX's goal is to find a good move. A tree that is superimposed on the full game tree and examines enough nodes to allow a player to determine what move to make is referred to as a search tree.

A sequence of actions leading to a goal state (a terminal state that is a win) would be the best solution in a typical search problem. In an adversarial search, MIN has something to say about it. MAX must therefore devise a contingent strategy that specifies MAX's move in the initial state, then MAX's moves in the states resulting from every conceivable MIN response, then MAX's moves in the states resulting from every possible MIN reaction to those moves, and so on. This is quite similar to the AND-OR search method, with MAX acting as OR and MIN acting as AND. Against an infallible opponent, an optimal strategy produces results that are at least as good as any other strategy. We'll start by demonstrating how to find this optimal strategy.

We’ll move to the trivial game in the figure below since even a simple
game like tic-tac-toe is too complex for us to draw the full game tree on one
page. MAX’s root node moves are designated by the letters a1, a2, and a3.
MIN’s probable answers to a1 are b1, b2, b3, and so on. This game is over after
MAX and MIN each make one move. (In game terms, this tree consists of two
half-moves and is one move deep, each of which is referred to as a ply.) The
terminal states in this game have utility values ranging from 2 to 14.

Game’s Utility Function

The optimal strategy can be found from the minimax value of each node n in the game tree, which we write as MINIMAX(n). The minimax value of a node is the utility (for MAX) of being in the corresponding state, assuming that both players play optimally from there to the end of the game. The minimax value of a terminal state is just its utility. Furthermore, given the choice, MAX prefers to move to a state of maximum value, whereas MIN prefers a state of minimum value. So we have:

    MINIMAX(s) =
        UTILITY(s)                                           if TERMINAL-TEST(s)
        max over a in ACTIONS(s) of MINIMAX(RESULT(s, a))    if PLAYER(s) = MAX
        min over a in ACTIONS(s) of MINIMAX(RESULT(s, a))    if PLAYER(s) = MIN
Let's use these definitions to analyze the game tree shown in the figure above. The game's UTILITY function assigns utility values to the terminal nodes on the bottom level. The first MIN node, B, has three successor states with values 3, 12, and 8, so its minimax value is 3. Similarly, the other two MIN nodes have minimax value 2. The root node is a MAX node; its successors have minimax values 3, 2, and 2, so it has a minimax value of 3. We can also identify the minimax decision at the root: action a1 is the best choice for MAX because it leads to the state with the highest minimax value.

This definition of optimal play for MAX assumes that MIN also plays optimally: it maximizes MAX's worst-case outcome. What if MIN does not play optimally? Then it is easy to show that MAX can do at least as well, and possibly better. Other strategies may do better than minimax against suboptimal opponents, but those strategies necessarily do worse against optimal opponents.
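
Putting the definition above into code, a minimal recursive minimax in Python, written against the hypothetical Game interface sketched earlier, might look like this:

    def minimax_decision(game, state):
        """Return the action for MAX that leads to the highest minimax value."""
        return max(game.actions(state),
                   key=lambda a: min_value(game, game.result(state, a)))

    def max_value(game, state):
        if game.terminal_test(state):
            return game.utility(state, 'MAX')
        return max(min_value(game, game.result(state, a))
                   for a in game.actions(state))

    def min_value(game, state):
        if game.terminal_test(state):
            return game.utility(state, 'MAX')   # utilities are measured from MAX's point of view
        return min(max_value(game, game.result(state, a))
                   for a in game.actions(state))

On the small tree in the figure above this returns a1, matching the hand calculation.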

Alpha-beta search

o Alpha-beta pruning is a modified version of the minimax algorithm. It is an optimization technique for the minimax algorithm.
o As we have seen, the number of game states the minimax search algorithm has to examine is exponential in the depth of the tree. We cannot eliminate the exponent, but we can effectively cut it in half. There is a technique by which we can compute the correct minimax decision without checking every node of the game tree, and this technique is called pruning. It involves two threshold parameters, alpha and beta, for future expansion, so it is called alpha-beta pruning. It is also called the Alpha-Beta algorithm.
o Alpha-beta pruning can be applied at any depth of a tree, and sometimes it prunes not only leaves but entire sub-trees.
o The two parameters can be defined as:
1. Alpha: The best (highest-value) choice we have found so far at any point along the path of the Maximizer. The initial value of alpha is -∞.
2. Beta: The best (lowest-value) choice we have found so far at any point along the path of the Minimizer. The initial value of beta is +∞.
o Alpha-beta pruning returns the same move as the standard minimax algorithm, but it removes all the nodes that do not really affect the final decision and only make the algorithm slow. By pruning these nodes, it makes the algorithm faster.

Condition for Alpha-beta pruning:

The main condition required for alpha-beta pruning (i.e., for a branch to be pruned) is:

1. α >= β

Key points about alpha-beta pruning:

o The Max player will only update the value of alpha.
o The Min player will only update the value of beta.
o While backtracking up the tree, node values (not alpha and beta values) are passed to parent nodes.
o Alpha and beta values are passed down only to child nodes.

Working of Alpha-Beta Pruning:

Let's take an example of a two-player search tree to understand the working of alpha-beta pruning.

Step 1: At the first step, the Max player starts with the first move from node A, where α = -∞ and β = +∞. These values of alpha and beta are passed down to node B, where again α = -∞ and β = +∞, and node B passes the same values to its child D.

Step 2: At node D, the value of α is calculated, as it is Max's turn. The value of α is compared first with 2 and then with 3, so max(2, 3) = 3 becomes the value of α at node D, and the node value will also be 3.

Step 3: The algorithm now backtracks to node B, where the value of β changes because it is Min's turn. β = +∞ is compared with the available successor node value, i.e. min(∞, 3) = 3; hence at node B, α = -∞ and β = 3.
In the next step, the algorithm traverses the next successor of node B, which is node E, and the values α = -∞ and β = 3 are passed down as well.

Step 4: At node E, Max takes its turn, and the value of alpha changes. The current value of alpha is compared with 5, so max(-∞, 5) = 5; hence at node E, α = 5 and β = 3. Since α >= β, the right successor of E is pruned and the algorithm does not traverse it, and the value at node E becomes 5.

Step 5: In the next step, the algorithm again backtracks up the tree, from node B to node A. At node A, the value of alpha is changed to the maximum available value, 3, as max(-∞, 3) = 3, and β = +∞. These two values are now passed to the right successor of A, which is node C.

At node C, α = 3 and β = +∞, and the same values are passed on to node F.

Step 6: At node F, the value of α is again compared, first with the left child, which is 0, giving max(3, 0) = 3, and then with the right child, which is 1, giving max(3, 1) = 3. α therefore remains 3, but the node value of F becomes 1.

Step 7: Node F returns the node value 1 to node C. At C, α = 3 and β = +∞; here the value of beta is changed by comparing it with 1, so min(∞, 1) = 1. Now at C, α = 3 and β = 1, and again the condition α >= β is satisfied, so the next child of C, which is G, is pruned and the algorithm does not compute the entire sub-tree rooted at G.

Step 8: C now returns the value 1 to A, where the best value for A is max(3, 1) = 3. The final game tree shows which nodes were computed and which were never computed. The optimal value for the maximizer in this example is therefore 3.
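
The walk-through above can be summarized in code. The following is a compact sketch of alpha-beta search in Python, again written against the hypothetical Game interface used earlier; it returns the same move as plain minimax but cuts off a branch as soon as α >= β:

    import math

    def alpha_beta_search(game, state):
        """Return the best action for MAX using alpha-beta pruning."""
        best_action, best_value = None, -math.inf
        alpha, beta = -math.inf, math.inf
        for a in game.actions(state):
            v = ab_min(game, game.result(state, a), alpha, beta)
            if v > best_value:
                best_action, best_value = a, v
            alpha = max(alpha, best_value)
        return best_action

    def ab_max(game, state, alpha, beta):
        if game.terminal_test(state):
            return game.utility(state, 'MAX')
        v = -math.inf
        for a in game.actions(state):
            v = max(v, ab_min(game, game.result(state, a), alpha, beta))
            if v >= beta:           # MIN already has a better option elsewhere: prune
                return v
            alpha = max(alpha, v)   # only the Max player updates alpha
        return v

    def ab_min(game, state, alpha, beta):
        if game.terminal_test(state):
            return game.utility(state, 'MAX')
        v = math.inf
        for a in game.actions(state):
            v = min(v, ab_max(game, game.result(state, a), alpha, beta))
            if v <= alpha:          # MAX already has a better option elsewhere: prune
                return v
            beta = min(beta, v)     # only the Min player updates beta
        return v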

Move Ordering in Alpha-Beta pruning:

The effectiveness of alpha-beta pruning is highly dependent on the order in which each node is examined. Move order is an important aspect of alpha-beta pruning.

It can be of two types:

o Worst ordering: In some cases, the alpha-beta pruning algorithm does not prune any of the leaves of the tree and works exactly like the minimax algorithm. In this case, it also consumes more time because of the overhead of maintaining the alpha and beta values; such an ordering is called worst ordering. Here the best move occurs on the right side of the tree. The time complexity for such an order is O(b^m).
o Ideal ordering: The ideal ordering for alpha-beta pruning occurs when a lot of pruning happens in the tree and the best moves occur on the left side of the tree. Since we apply DFS, the algorithm searches the left side of the tree first, and with ideal ordering it can go roughly twice as deep as the minimax algorithm in the same amount of time. The complexity with ideal ordering is O(b^(m/2)).

Rules to find good ordering:

Following are some rules to find good ordering in alpha-beta pruning:

o Try first the move that was best at the shallowest level of the search.
o Order the nodes in the tree such that the best nodes are checked first.
o Use domain knowledge while finding the best move. E.g., for chess, try this order: captures first, then threats, then forward moves, then backward moves.
o Bookkeep the states that have already been evaluated, as there is a possibility that states may repeat.
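
As a small illustration of how such ordering can be bolted onto the alpha-beta sketch above, the legal moves can simply be sorted by a cheap, domain-specific heuristic before they are explored (score_move below is an assumed helper, e.g. one that scores captures above quiet moves in chess):

    def ordered_actions(game, state, score_move):
        """Return the legal moves sorted so the most promising are examined first,
        which lets alpha-beta prune earlier and more often."""
        return sorted(game.actions(state),
                      key=lambda a: score_move(state, a),
                      reverse=True)

Replacing game.actions(state) with ordered_actions(game, state, score_move) inside the alpha-beta loops is all that is needed.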

Monte-carlo tree search

Monte Carlo Tree Search (MCTS) is a heuristic search algorithm that has gained considerable attention and popularity in the field of artificial intelligence, especially in the areas of decision-making and game playing. It is known for its ability to handle complex, strategic games with very large search spaces, where traditional algorithms may struggle because of the enormous number of possible moves.

MCTS combines the principles of Monte Carlo methods, which rely on random sampling and statistical evaluation, with tree-based search techniques. Unlike traditional search algorithms that rely on exhaustive exploration of the entire search space, MCTS concentrates on sampling and exploring only promising regions of the search space.
The central idea behind MCTS is to build a search tree incrementally by running many random plays (often called rollouts or playouts) from the current game state. These simulations are carried out until a terminal state or a predefined depth is reached. The results of the simulations are then backpropagated up the tree, updating the statistics of the nodes visited during the play, such as the number of visits and the win ratio.

As the search progresses, MCTS dynamically balances exploration and exploitation. It selects moves by considering both the exploitation of highly promising moves with high win ratios and the exploration of unexplored or less-explored moves. This balancing is achieved using an upper confidence bound (UCB) formula, such as Upper Confidence Bounds applied to Trees (UCT), to decide which moves or nodes to visit during the search.

MCTS has been successfully applied in numerous domains, including board games (e.g., Go, chess, and shogi), card games (e.g., poker), and video games. It has achieved remarkable performance in many challenging game-playing scenarios, frequently surpassing human expertise. MCTS has also been extended and adapted to other problem domains, including planning, scheduling, and optimization.

One of the notable advantages of MCTS is its ability to handle games with unknown or imperfect information, as it relies on statistical sampling rather than complete knowledge of the game state. Additionally, MCTS is scalable and can be parallelized effectively, making it suitable for distributed computing and multi-core architectures.

Monte Carlo Tree Search (MCTS) is a search technique in the field of Artificial Intelligence (AI). It is a probabilistic, heuristic-driven search algorithm that combines classic tree search implementations with the machine learning principles of reinforcement learning.

In tree search, there is always the possibility that the current best action is actually not the most optimal action. In such cases, the MCTS algorithm becomes useful, as it continues to evaluate other alternatives periodically during the learning phase by executing them, instead of only the currently perceived optimal strategy. This is known as the "exploration-exploitation trade-off". It exploits the actions and strategies that have been found to be the best so far, but it must also continue to explore the local space of alternative decisions and find out whether they could replace the current best.
Exploration helps in exploring and discovering the unexplored parts of
the tree, which could result in finding a more optimal path. In other words, we
can say that exploration expands the tree’s breadth more than its depth.
Exploration can be useful to ensure that MCTS is not overlooking any
potentially better paths. But it quickly becomes inefficient in situations with
large number of steps or repetitions. In order to avoid that, it is balanced out by
exploitation. Exploitation sticks to a single path that has the greatest estimated
value. This is a greedy approach and this will extend the tree’s depth more than
its breadth. In simple words, UCB formula applied to trees helps to balance the
exploration-exploitation trade-off by periodically exploring relatively
unexplored nodes of the tree and discovering potentially more optimal paths
than the one it is currently exploiting.
For this characteristic, MCTS becomes particularly useful in making
optimal decisions in Artificial Intelligence (AI) problems.

Why use Monte Carlo Tree Search (MCTS) ?

Here are some reasons why MCTS is commonly used:

1. Handling Complex and Strategic Games: MCTS excels in games with large search spaces, complex dynamics, and strategic decision-making. It has been successfully applied to games like Go, chess, shogi, poker, and many others, achieving remarkable performance that often surpasses human expertise. MCTS can effectively explore and evaluate different moves or actions, leading to strong gameplay and decision-making in such games.
2. Unknown or Imperfect Information: MCTS is suitable for games or scenarios with unknown or imperfect information. It relies on statistical sampling and does not require complete knowledge of the game state. This makes MCTS applicable to domains where uncertainty or incomplete information exists, such as card games or real-world scenarios with limited or unreliable data.
3. Learning from Simulations: MCTS learns from simulations or rollouts to estimate the value of actions or states. Through repeated iterations, MCTS gradually refines its knowledge and improves decision-making. This learning aspect makes MCTS adaptive and capable of adapting to changing circumstances or evolving strategies.
4. Optimizing Exploration and Exploitation: MCTS effectively balances exploration and exploitation during the search process. It intelligently explores unexplored areas of the search space while exploiting promising actions based on existing knowledge. This exploration-exploitation trade-off allows MCTS to find a balance between discovering new possibilities and exploiting known good actions.
5. Scalability and Parallelization: MCTS is inherently scalable and can be parallelized efficiently. It can utilize distributed computing resources or multi-core architectures to speed up the search and handle larger search spaces. This scalability makes MCTS applicable to problems that require significant computational resources.
6. Applicability Beyond Games: While MCTS gained prominence in game-playing domains, its principles and techniques are applicable to other problem domains as well. MCTS has been successfully applied to planning problems, scheduling, optimization, and decision-making in various real-world scenarios. Its ability to handle complex decision-making and uncertainty makes it valuable in a range of applications.
7. Domain Independence: MCTS is relatively domain-independent. It does not require domain-specific knowledge or heuristics to operate. Although domain-specific enhancements can be made to improve performance, the basic MCTS algorithm can be applied to a wide range of problem domains without significant modifications.

Monte Carlo Tree Search (MCTS) algorithm:


In MCTS, nodes are the building blocks of the search tree. These nodes are formed based on the outcome of a number of simulations. The process of Monte Carlo Tree Search can be broken down into four distinct steps, viz., selection, expansion, simulation and backpropagation. Each of these steps is explained in detail below:

 Selection: In this process, the MCTS algorithm traverses the current tree from the root node using a specific strategy. The strategy uses an evaluation function to optimally select nodes with the highest estimated value. MCTS uses the Upper Confidence Bound (UCB) formula applied to trees as the strategy in the selection process to traverse the tree. It balances the exploration-exploitation trade-off. During tree traversal, a node is selected based on some parameters that return the maximum value. The formula typically used for this purpose is

    Si = xi + C * sqrt(ln t / ni)

where:
Si = value of node i
xi = empirical mean (average reward) of node i
C = a constant (the exploration parameter)
t = total number of simulations (visits of the parent node)
ni = number of simulations (visits) of node i

When traversing the tree during the selection process, the child node that returns the greatest value from the above equation is the one that gets selected. During traversal, once a child node is found which is also a leaf node, MCTS jumps into the expansion step.
 Expansion: In this process, a new child node is added to the tree to that node
which was optimally reached during the selection process.
 Simulation: In this process, a simulation is performed by choosing moves or
strategies until a result or predefined state is achieved.
 Backpropagation: After determining the value of the newly added node, the remaining tree must be updated. The backpropagation process is therefore performed, propagating from the new node back up to the root node. During this process, the number of simulations stored in each node is incremented, and if the new node's simulation results in a win, the number of wins is also incremented.

These four steps are repeated as a cycle (selection, expansion, simulation, backpropagation) until the computational budget is exhausted.
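
The following condensed sketch in Python shows the shape of these four steps. It reuses the hypothetical Game interface from earlier and the UCT formula from the selection step; it is an illustrative skeleton rather than a tuned implementation (in particular, rewards are kept from MAX's point of view throughout for simplicity):

    import math, random

    class Node:
        def __init__(self, state, parent=None, action=None):
            self.state, self.parent, self.action = state, parent, action
            self.children = []       # expanded child nodes
            self.untried = None      # legal actions not yet expanded
            self.visits, self.wins = 0, 0.0

    def uct(node, c=1.4):
        # Upper Confidence Bound applied to trees: exploitation term + exploration term
        return (node.wins / node.visits
                + c * math.sqrt(math.log(node.parent.visits) / node.visits))

    def mcts(game, root_state, iterations=1000):
        root = Node(root_state)
        root.untried = list(game.actions(root_state))
        for _ in range(iterations):
            node = root
            # 1. Selection: descend while the node is fully expanded and has children
            while not node.untried and node.children:
                node = max(node.children, key=uct)
            # 2. Expansion: add one child for an untried action
            if node.untried:
                a = node.untried.pop()
                child = Node(game.result(node.state, a), parent=node, action=a)
                child.untried = list(game.actions(child.state))
                node.children.append(child)
                node = child
            # 3. Simulation: play random moves until a terminal state is reached
            state = node.state
            while not game.terminal_test(state):
                state = game.result(state, random.choice(list(game.actions(state))))
            reward = game.utility(state, 'MAX')
            # 4. Backpropagation: update visit and win statistics up to the root
            while node is not None:
                node.visits += 1
                node.wins += reward
                node = node.parent
        # The move actually played is the most-visited child of the root
        return max(root.children, key=lambda n: n.visits).action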
These types of algorithms are particularly useful in turn based games
where there is no element of chance in the game mechanics, such as Tic Tac
Toe, Connect 4, Checkers, Chess, Go, etc. This has recently been used by
Artificial Intelligence Programs like AlphaGo, to play against the world’s top
Go players. But, its application is not limited to games only. It can be used in
any situation which is described by state-action pairs and simulations used to
forecast outcomes.

Advantages of Monte Carlo Tree Search:

1. MCTS is a simple algorithm to implement.
2. Monte Carlo Tree Search is a heuristic algorithm. MCTS can operate effectively without any knowledge of the particular domain, apart from the rules and end conditions, and can find its own moves and learn from them by playing random playouts.
3. MCTS can be saved in any intermediate state, and that state can be reused in future use cases whenever required.
4. MCTS supports asymmetric expansion of the search tree based on the circumstances in which it is operating.


Disadvantages of Monte Carlo Tree Search:

1. Because the tree grows rapidly after a few iterations, it requires a huge amount of memory.
2. There is a bit of a reliability issue with Monte Carlo Tree Search. In certain scenarios there might be a single branch or path that leads to a loss against the opponent when the method is used for turn-based games. This is mainly due to the vast number of move combinations; each node might not be visited enough times to understand its result or outcome in the long run.
3. The MCTS algorithm needs a huge number of iterations to be able to decide the most efficient path effectively, so there is also a bit of a speed issue.

Issues in Monte Carlo Tree Search:

Here are some common issues associated with MCTS:

1. Exploration-Exploitation Trade-off: MCTS faces the challenge of balancing exploration and exploitation during the search. It needs to explore different branches of the search tree to gather information about their potential, while also exploiting promising actions based on existing knowledge. Achieving the right balance is crucial for the algorithm’s effectiveness and performance.
2. Sample Efficiency: MCTS can require a large number of simulations or rollouts to obtain accurate statistics and make informed decisions. This can be computationally expensive, especially in complex domains with a large search space. Improving the sample efficiency of MCTS is an ongoing research area.
3. High Variance: The outcomes of individual rollouts in MCTS can be highly variable due to the random nature of the simulations. This can lead to inconsistent estimations of action values and introduce noise in the decision-making process. Techniques such as variance reduction and progressive widening are used to mitigate this issue.
4. Heuristic Design: MCTS relies on heuristics to guide the search and prioritize actions or nodes. Designing effective and domain-specific heuristics can be challenging, and the quality of the heuristics directly affects the algorithm’s performance. Developing accurate heuristics that capture the characteristics of the problem domain is an important aspect of using MCTS.
5. Computation and Memory Requirements: MCTS can be computationally intensive, especially in games with long horizons or complex dynamics. The algorithm’s performance depends on the available computational resources, and in resource-constrained environments, it may not be feasible to run MCTS with a sufficient number of simulations. Additionally, MCTS requires memory to store and update the search tree, which can become a limitation in memory-constrained scenarios.
6. Overfitting: In certain cases, MCTS can overfit to specific patterns or biases present in the early simulations, which can lead to suboptimal decisions. To mitigate this issue, techniques such as exploration bonuses, progressive unpruning, and rapid action-value estimation have been proposed to encourage exploration and avoid premature convergence.
7. Domain-specific Challenges: Different domains and problem types can introduce additional challenges and issues for MCTS. For example, games with hidden or imperfect information, large branching factors, or continuous action spaces require adaptations and extensions of the basic MCTS algorithm to handle these complexities effectively.

Stochastic games

Many unforeseeable external occurrences can place us in unforeseen circumstances in real life. Many games, such as dice tossing, have a random element to reflect this unpredictability. These are known as stochastic games. Backgammon is a classic game that mixes skill and luck. The legal moves are determined by rolling dice at the start of each player's turn. White, for example, has rolled a 6–5 and has four alternative moves in the backgammon scenario shown in the figure below.

This is a standard backgammon position. The object of the game is to get all of one's pieces off the board as quickly as possible. White moves in a clockwise direction toward 25, while Black moves in a counterclockwise direction toward 0. A piece can advance to any position unless multiple opponent pieces are there; if there is only one opponent piece, it is captured and must start over. White has rolled a 6–5 and must pick between four valid moves: (5–10,5–11), (5–11,19–24), (5–10,10–16), and (5–11,11–16), where the notation (5–11,11–16) denotes moving one piece from position 5 to 11 and then another from 11 to 16.

White knows his or her own legal moves, but he or she has no idea how Black will roll, and thus has no idea what Black's legal moves will be. That means White cannot build a standard game tree of the kind used in chess or tic-tac-toe. In backgammon, in addition to MAX and MIN nodes, a game tree must include chance nodes. The figure below depicts chance nodes as circles. The branches leading from each chance node denote the possible dice rolls; each branch is labelled with the roll and its probability. There are 36 ways to roll two dice, each equally likely, yet there are only 21 distinct rolls because a 6–5 is the same as a 5–6. Each of the six doubles (1–1 through 6–6) has a probability of 1/36, so P(1–1) = 1/36. Each of the other 15 distinct rolls has a 1/18 chance of happening.

The next step is to understand how to make good decisions. Obviously, we want to pick the move that leads to the best position. Positions, however, do not have definite minimax values. Instead, we can only compute a position's expected value, which is the average over all possible outcomes of the chance nodes.

As a result, we can generalize the deterministic minimax value to an expectiminimax value for games with chance nodes. Terminal nodes and MAX and MIN nodes (for which the dice roll is already known) work exactly as before. For chance nodes we compute the expected value, which is the sum of the values of all outcomes, weighted by the probability of each chance action:

    EXPECTIMINIMAX(s) = sum over r of P(r) * EXPECTIMINIMAX(RESULT(s, r))    for a chance node s

where r is a possible dice roll (or other random event) and RESULT(s, r) denotes the same state as s, with the additional fact that the result of the dice roll is r.
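
A sketch of this computation in Python is shown below. It assumes the game object additionally exposes chance nodes through hypothetical methods chance_to_move(s), outcomes(s) and probability(s, r); those names are illustrative and not part of any standard library:

    def expectiminimax(game, state):
        """Value of a state, from MAX's point of view, in a game with chance nodes."""
        if game.terminal_test(state):
            return game.utility(state, 'MAX')
        if game.chance_to_move(state):
            # Chance node: expected value over all rolls r, weighted by their probability P(r)
            return sum(game.probability(state, r) *
                       expectiminimax(game, game.result(state, r))
                       for r in game.outcomes(state))
        values = [expectiminimax(game, game.result(state, a))
                  for a in game.actions(state)]
        return max(values) if game.player(state) == 'MAX' else min(values)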

Partially observable games

Partially Observable Games, often referred to as Partially Observable Markov Decision Processes (POMDPs), are a class of problems and models in artificial intelligence that involve decision-making in situations where an agent's observations do not provide complete information about the underlying state of the environment. POMDPs are an extension of Markov Decision Processes (MDPs) to scenarios where uncertainty and partial observability are significant factors. They are commonly used to model and solve problems in various domains, including robotics, healthcare, finance, and game playing.

Key Characteristics of Partially Observable Games (POMDPs):

Partial Observability: In POMDPs, the agent's observations are incomplete and do not directly reveal the true state of the environment. This introduces uncertainty, as the agent must reason about the possible states given its observations.

Hidden States: The environment's true state, also known as the hidden state,
evolves according to a probabilistic process. The agent's observations provide
noisy or incomplete information about this hidden state.
Belief State: To handle partial observability, the agent maintains a belief state,
which is a probability distribution over possible hidden states. The belief state
captures the agent's uncertainty about the true state of the environment.

Action and Observation: The agent takes actions based on its belief state, and it
receives observations that depend on the hidden state. These observations help
the agent update its belief state and make decisions.

Objective and Policy: The agent's goal is to find a policy—a mapping from belief
states to actions—that maximizes a specific objective, such as cumulative
rewards or long-term expected utility.
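
The belief-state update that links actions, observations, and hidden states is a direct application of Bayes' rule. A small discrete sketch in Python is given below; the transition table T and observation table O are illustrative data structures, not tied to any particular library:

    def update_belief(belief, action, observation, T, O, states):
        """Bayes-filter update of a discrete belief state.
        belief[s]      : current probability that the hidden state is s
        T[(s, a, s2)]  : probability of moving from s to s2 under action a
        O[(s2, o)]     : probability of observing o when the hidden state is s2
        """
        new_belief = {}
        for s2 in states:
            # Predict the probability of reaching s2, then weight by the observation likelihood
            predicted = sum(T[(s, action, s2)] * belief[s] for s in states)
            new_belief[s2] = O[(s2, observation)] * predicted
        total = sum(new_belief.values())
        return {s: p / total for s, p in new_belief.items()}   # normalize to a distribution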

Solving Partially Observable Games (POMDPs):

Solving POMDPs is challenging due to the added complexity of partial observability. Traditional techniques used for MDPs, such as dynamic programming and value iteration, are not directly applicable to POMDPs. Instead, specialized algorithms and techniques are developed to address the partial observability:

Belief Space Methods: These methods work directly in the space of belief states and involve updating beliefs based on observations and actions. Techniques like the POMDP forward algorithm and backward induction are used to compute optimal policies.

Particle Filtering: Particle filters are used to maintain an approximation of the belief state using a set of particles, each representing a possible state hypothesis.

Point-Based Methods: These methods focus on selecting a subset of belief states (points) that are critical for decision-making. Techniques like PBVI (Point-Based Value Iteration) and POMCP (Partially Observable Monte Carlo Planning) fall under this category.

Approximate Solutions: Due to the complexity of exact solutions, approximate methods such as online planning, heuristic-based policies, and reinforcement learning techniques are often employed to find near-optimal solutions.

Applications of Partially Observable Games:

Partially Observable Games have numerous real-world applications, including:

Robotics: Robot navigation, exploration, and manipulation tasks in uncertain and partially observable environments.

Healthcare: Optimal patient treatment scheduling and management under uncertainty.

Financial Planning: Portfolio optimization, trading, and risk management in financial markets.

Game Playing: Modeling opponents in games with hidden information, such as poker and strategic board games.

Partially Observable Games (POMDPs) are a powerful framework for modeling decision-making under uncertainty and partial observability. They provide a way to represent and solve problems where agents must reason about hidden states and make optimal decisions based on incomplete observations.

Constraint satisfaction problems


We have seen a wide variety of methods, including adversarial search and local search, for solving different kinds of problems. Every one of these methods has a single purpose in mind: to find a solution that achieves the goal. However, in adversarial search and local search there were no restrictions on how the agents could arrive at their answers.

This section examines constraint satisfaction, another kind of problem-solving method. As its name implies, constraint satisfaction means that a problem must be solved while adhering to a set of restrictions or rules.

A problem is said to be solved using the constraint satisfaction approach when its variables satisfy a strict set of conditions or rules. This kind of method leads naturally to a study of the complexity and structure of the problem itself.

There are mainly three basic components in a constraint satisfaction problem:

Variables: The things that need to be determined are variables. Variables in a CSP are the objects that must have values assigned to them in order to satisfy a particular set of constraints. Boolean, integer, and categorical variables are just a few examples of the various types of variables. For instance, variables could stand for the puzzle cells that need to be filled with numbers in a sudoku puzzle.

Domains: The range of potential values that a variable can have is represented by its domain. Depending on the problem, a domain may be finite or infinite. For instance, in Sudoku, the set of numbers from 1 to 9 can serve as the domain of a variable representing a puzzle cell.
Constraints: The guidelines that control how variables relate to one another are
known as constraints. Constraints in a CSP define the ranges of possible values
for variables. Unary constraints, binary constraints, and higher-order constraints
are only a few examples of the various sorts of constraints. For instance, in a
sudoku problem, the restrictions might be that each row, column, and 3×3 box
can only have one instance of each number from 1 to 9.

Constraint Satisfaction Problems (CSP) representation:

 A finite set of variables V1, V2, ..., Vn.
 A non-empty domain for every variable: D1, D2, ..., Dn.
 A finite set of constraints C1, C2, ..., Cm,
 where each constraint Ci restricts the possible values of variables,
 e.g., V1 ≠ V2.
 Each constraint Ci is a pair <scope, relation>.
 Example: <(V1, V2), V1 not equal to V2>
 Scope = the set of variables that participate in the constraint.
 Relation = a list of valid combinations of variable values.
 The relation might be an explicit list of permitted combinations, or an abstract relation that supports membership testing and listing of valid combinations.
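
As an illustration, such a representation can be written down directly as plain Python data structures (the layout below is only one possible choice, not a specific library's API):

    variables   = ['V1', 'V2', 'V3']
    domains     = {'V1': [1, 2, 3], 'V2': [1, 2, 3], 'V3': [1, 2, 3]}

    # Each constraint is a pair <scope, relation>; here the relation is a function
    # that tests whether the values assigned to the scope are allowed.
    constraints = [
        (('V1', 'V2'), lambda a, b: a != b),   # V1 must differ from V2
        (('V2', 'V3'), lambda a, b: a != b),   # V2 must differ from V3
    ]

    def satisfies(assignment, constraints):
        """Check every constraint whose scope is fully assigned."""
        for scope, relation in constraints:
            if all(v in assignment for v in scope):
                if not relation(*(assignment[v] for v in scope)):
                    return False
        return True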

Constraint Satisfaction Problems (CSP) algorithms:


 The backtracking algorithm is a depth-first search algorithm that
methodically investigates the search space of potential solutions up until a
solution is discovered that satisfies all the restrictions. The method begins by
choosing a variable and giving it a value before repeatedly attempting to give
values to the other variables. The method returns to the prior variable and
tries a different value if at any time a variable cannot be given a value that
fulfills the requirements. Once all assignments have been tried or a solution
that satisfies all constraints has been discovered, the algorithm ends.
 The forward-checking algorithm is a variation of the backtracking
algorithm that condenses the search space using a type of local consistency.
For each unassigned variable, the method keeps a list of remaining values and
applies local constraints to eliminate inconsistent values from these sets. The
algorithm examines a variable’s neighbors after it is given a value to see
whether any of its remaining values become inconsistent and removes them
from the sets if they do. The algorithm goes backward if, after forward
checking, a variable has no more values.
 Constraint-propagation algorithms are a class of algorithms that use local consistency and inference to prune the search space. They operate by propagating constraints between variables and using the information obtained to remove inconsistent values from the variable domains.

Types of Constraints in CSP

Basically, there are three different categories of constraints on the variables:

o Unary constraints are the simplest kind of constraint because they only limit the value of one variable.
o Binary constraints: these constraints relate two variables, for example a constraint stating that x1 and x2 must take different values.
o Global constraints: this kind of constraint involves an arbitrary number of variables.

Certain kinds of constraints are handled by particular solution techniques:

o Linear constraints: frequently used in linear programming, where every variable (carrying an integer value) occurs only in linear equations or inequalities.
o Non-linear constraints: used in non-linear programming, where variables (integer values) appear in non-linear expressions.

Think of a Sudoku puzzle in which some of the squares are initially filled with certain integers.

You must complete the empty squares with numbers between 1 and 9, making sure that no row, column, or block contains a repeated integer. This is an elementary constraint satisfaction problem: a problem must be solved while taking certain constraints into consideration.

The integer range (1–9) that can occupy the empty squares is referred to as the domain, while the empty squares themselves are referred to as the variables. The values of the variables are drawn from the domain, and constraints are the rules that determine which values a variable may take.

Constraint propagation

In the previous sections we presented two rather different schemes for


solving the CSP: backtracking and consistency techniques. A third possible
scheme is to embed a consistency algorithm inside a backtracking algorithm as
follows.

As a skeleton we use the simple backtracking algorithm that incrementally


instantiates variables and extends a partial solution that specifies consistent
values for some of the variables, toward a complete solution, by repeatedly
choosing a value for another variable. After assigning a value to the variable,
some consistency technique is applied to the constraint graph. Depending on the
degree of consistency technique we get various constraint satisfaction algorithms.

Backtracking
Even simple backtracking (BT) performs some kind of consistency
technique and it can be seen as a combination of pure generate &
test and a fraction of arc consistency. The BT algorithm tests arc
consistency among already instantiated variables, i.e., the algorithm
checks the validity of constraints considering the partial instantiation.
Because the domains of instantiated variables contain just one value,
it is possible to check only those constraints/arcs containing the last
instantiated variable. If any domain is reduced then the
corresponding constraint is not consistent and the algorithm
backtracks to a new instantiation.

The BT algorithm detects the inconsistency as soon as it appears and is therefore far more efficient than the simple generate & test approach, but it still has to perform too much search.

Example: (4-queens problem and BT)


The BT algorithm can be easily extended to backtrack to the
conflicting variable and, thus, to incorporate some form of look-back
scheme or intelligent backtracking. Nevertheless, this adds some
additional expenses to the algorithm and it seems that preventing
possible future conflicts is more reasonable than recovering from
them.

Forward Checking

Forward checking is the easiest way to prevent future conflicts.


Instead of enforcing arc consistency on the instantiated variables, it performs a restricted form of arc consistency on the not yet instantiated variables. We speak about restricted arc consistency because
forward checking checks only the constraints between the current
variable and the future variables. When a value is assigned to the
current variable, any value in the domain of a "future" variable
which conflicts with this assignment is (temporarily) removed from
the domain. The advantage of this is that if the domain of a future
variable becomes empty, it is known immediately that the current
partial solution is inconsistent. Forward checking therefore allows
branches of the search tree that will lead to failure to be pruned
earlier than with simple backtracking. Note that whenever a new variable is considered, all its remaining values are guaranteed to be consistent with the past variables, so checking an assignment against the past assignments is no longer necessary.

Forward checking detects the inconsistency earlier than simple


backtracking and thus it allows branches of the search tree that will
lead to failure to be pruned earlier than with simple backtracking.
This reduces the search tree and (hopefully) the overall amount of
work done. But it should be noted that forward checking does more
work when each assignment is added to the current partial solution.
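
The pruning step itself can be sketched in a few lines of Python (the helper name forward_check, and the domains/neighbors/constraint parameters, are illustrative assumptions rather than a fixed API):

# A minimal forward-checking sketch. domains maps variable -> set of remaining values,
# neighbors gives the constraint graph, and constraint(x, vx, y, vy) is True when
# the two values are compatible.
def forward_check(var, value, assignment, domains, neighbors, constraint):
    """After assigning var=value, prune conflicting values from unassigned neighbors.
    Returns the removed values (so they can be restored on backtracking), or None
    if some future domain becomes empty."""
    removed = {}
    for future in neighbors[var]:
        if future in assignment:
            continue
        conflicting = {v for v in domains[future] if not constraint(var, value, future, v)}
        if conflicting:
            domains[future] -= conflicting
            removed[future] = conflicting
            if not domains[future]:
                return None              # dead end detected immediately
    return removed

On backtracking, the caller would restore the pruned values (domains[future] |= removed[future] for each entry).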

Example: (4-queens problem and FC)

Forward checking is almost always a much better choice than simple


backtracking.

Look Ahead

Forward checking checks only the constraints between the current


variable and the future variables. So why not perform full arc consistency, which will further reduce the domains and remove possible conflicts? This approach is called (full) look ahead or maintaining arc consistency (MAC).

The advantage of look ahead is that it also detects conflicts between future variables and therefore allows branches of the search tree that will lead to failure to be pruned earlier than with forward checking. Also, as with forward checking, whenever a new variable is considered, all its remaining values are guaranteed to be consistent with the past variables, so checking an assignment against the past assignments is not necessary.

Look ahead prunes the search tree further than forward checking but, again, it does even more work than forward checking each time an assignment is added to the current partial solution.

Example: (4-queens problem and LA)

Comparison of propagation techniques

The following figure shows which constraints are tested when the above
described propagation techniques are applied.
Backtracking search for CSP

Define CSP

CSPs represent a state with a set of variable/value pairs and represent the conditions for a solution by a set of constraints on the variables. Many important real-world problems can be described as CSPs. A CSP (constraint satisfaction problem) uses a factored representation for each state (a set of variables, each of which has a value); a problem that is solved when each variable has a value that satisfies all the constraints on the variable is called a CSP.

A CSP consists of 3 components:

·X is a set of variables, {X1, …, Xn}.

·D is a set of domains, {D1, …, Dn}, one for each variable.


Each domain Di consists of a set of allowable values, {v1, …, vk} for variable Xi.

·C is a set of constraints that specify allowable combination of values.

Each constraint Ci consists of a pair <scope, rel>, where scope is a tuple of


variables that participate in the constraint, and rel is a relation that defines the
values that those variables can take on.

A relation can be represented as: a. an explicit list of all tuples of values that
satisfy the constraint; or b. an abstract relation that supports two operations. (e.g.
if X1 and X2 both have the domain {A,B}, the constraint saying “the two variables
must have different values” can be written as a. <(X1,X2),[(A,B),(B,A)]> or b.
<(X1,X2),X1≠X2>.

Assignment:

Each state in a CSP is defined by an assignment of values to some of the variables,


{Xi=vi, Xj=vj, …};

An assignment that does not violate any constraints is called a consistent or legal
assignment;
A complete assignment is one in which every variable is assigned;

A solution to a CSP is a consistent, complete assignment;

A partial assignment is one that assigns values to only some of the variables.

Map coloring

To formulate a CSP:

define the variables to be the regions X = {WA, NT, Q, NSW, V, SA, T}.

The domain of each variable is the set Di = {red, green, blue}.

The constraints are C = {SA≠WA, SA≠NT, SA≠Q, SA≠NSW, SA≠V, WA≠NT, NT≠Q, Q≠NSW, NSW≠V}. ( SA≠WA is a shortcut for <(SA,WA),SA≠WA>. )

Constraint graph: The nodes of the graph correspond to the variables of the problem, and a link connects any two variables that participate in a constraint.
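
One possible encoding of this formulation in Python is sketched below (the dictionary layout and the helper consistent are our own illustrative choices); the same data is reused by the backtracking sketch later in this unit:

# Illustrative encoding of the Australia map-coloring CSP.
variables = ["WA", "NT", "Q", "NSW", "V", "SA", "T"]
domains = {v: {"red", "green", "blue"} for v in variables}

# The constraint graph: one "not equal" constraint per edge.
neighbors = {
    "SA":  ["WA", "NT", "Q", "NSW", "V"],
    "WA":  ["SA", "NT"],
    "NT":  ["SA", "WA", "Q"],
    "Q":   ["SA", "NT", "NSW"],
    "NSW": ["SA", "Q", "V"],
    "V":   ["SA", "NSW"],
    "T":   [],
}

def consistent(var, value, assignment):
    # A value is consistent if no already-assigned neighbor has the same color.
    return all(assignment.get(n) != value for n in neighbors[var])
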
Advantage of formulating a problem as a CSP:

1) CSPs yield a natural representation for a wide variety of problems;

2) CSP solvers can be faster than state-space searchers because the CSP solver can quickly eliminate large swaths of the search space;

3) With CSP, once we find out that a partial assignment is not a solution, we can
immediately discard further refinements of the partial assignment.

4) We can see why an assignment is not a solution—which variables violate a constraint.

Job-shop scheduling

Consider a small part of the car assembly, consisting of 15 tasks: install


axles (front and back), affix all four wheels (right and left, front and
back), tighten nuts for each wheel, affix hubcaps, and inspect the final assembly.
Represent the tasks with 15 variables:
X = {AxleF, AxleB, WheelRF, WheelLF, WheelRB, WheelLB, NutsRF, NutsLF, NutsRB,
NutsLB, CapRF, CapLF, CapRB, CapLB, Inspect}. The value of each variable is the
time that the task starts.

Precedence constraints: Whenever a task T1 must occur before task T2, and T1 takes duration d1 to complete, we add an arithmetic constraint of the form T1 + d1 ≤ T2. So,

AxleF + 10 ≤ WheelRF ; AxleF + 10 ≤ WheelLF ; AxleB + 10 ≤ WheelRB ; AxleB +


10 ≤ WheelLB ;

WheelRF + 1 ≤ NutsRF ; WheelLF + 1 ≤ NutsLF ; WheelRB + 1 ≤ NutsRB ; WheelLB +


1 ≤ NutsLB ;

NutsRF + 2 ≤ CapRF ; NutsLF + 2 ≤ CapLF ; NutsRB + 2 ≤ CapRB ; NutsLB + 2 ≤


CapLB ;

Disjunctive constraint: AxleF and AxleB must not overlap in time. So,

( AxleF + 10 ≤ AxleB ) or ( AxleB + 10 ≤ AxleF )


Assert that the inspection comes last and takes 3 minutes. For every variable except Inspect we add a constraint of the form X + dX ≤ Inspect.

If there is a requirement to get the whole assembly done in 30 minutes, we can achieve that by limiting the domain of all variables:

Di = {1, 2, 3, …, 27}.
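
These arithmetic constraints can be written directly as predicates over task start times. The short sketch below (the function names precedes and no_overlap, and the sample start times, are purely illustrative) uses the durations from the text (axle 10 minutes, wheel 1 minute, nuts 2 minutes):

# Hypothetical encoding of a few job-shop constraints as predicates over start times.
def precedes(t1_start, d1, t2_start):
    """Precedence constraint T1 + d1 <= T2."""
    return t1_start + d1 <= t2_start

def no_overlap(axle_f, axle_b, d=10):
    """Disjunctive constraint: the two axle tasks must not overlap."""
    return axle_f + d <= axle_b or axle_b + d <= axle_f

# Example partial schedule (start times in minutes); the values are illustrative only.
schedule = {"AxleF": 1, "AxleB": 11, "WheelRF": 11, "NutsRF": 12, "CapRF": 14}
print(precedes(schedule["AxleF"], 10, schedule["WheelRF"]))  # True
print(no_overlap(schedule["AxleF"], schedule["AxleB"]))      # True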

Variation on the CSP formalism

a. Types of variables in CSPs

The simplest kind of CSP involves variables that have discrete, finite domains.
E.g. Map-coloring problems, scheduling with time limits, the 8-queens problem.

A discrete domain can be infinite. e.g. The set of integers or strings. With infinite
domains, to describe constraints, a constraint language must be used instead of
enumerating all allowed combinations of values.
CSPs with continuous domains are common in the real world and are widely studied in the field of operations research.

The best known category of continuous-domain CSPs is that of linear


programming problems, where constraints must be linear equalities or
inequalities. Linear programming problems can be solved in time polynomial in
the number of variables.

b. Types of constraints in CSPs

The simplest type is the unary constraint, which restricts the value of a single
variable.

A binary constraint relates two variables. (e.g. SA≠NSW.) A binary CSP is one
with only binary constraints, can be represented as a constraint graph.

We can also describe higher-order constraints. (e.g. The ternary


constraint Between(X, Y, Z).)

A constraint involving an arbitrary number of variables is called a global constraint. (It need not involve all the variables in a problem.) One of the most common global constraints is Alldiff, which says that all of the variables involved in the constraint must have different values.

Constraint hypergraph: consists of ordinary nodes (circles in the figure) and


hypernodes (the squares), which represent n-ary constraints.

Two ways to transform an n-ary CSP to a binary one:

a. Every finite domain constraint can be reduced to a set of binary constraints if


enough auxiliary variables are introduced, so we could transform any CSP into
one with only binary constraints.
b. The dual-graph transformation: create a new graph in which there will be one
variable for each constraint in the original graph, and one binary constraint for
each pair of constraints in the original graph that share variables.

e.g. If the original graph has variables {X,Y,Z} and constraints <(X,Y,Z),C1> and <(X,Y),C2>, then the dual graph would have variables {C1,C2} with the binary constraint <(X,Y),R1>, where (X,Y) are the shared variables and R1 is a new relation that defines the constraint between the shared variables.

We might prefer a global constraint (such as Alldiff) rather than a set of binary
constraints for two reasons:

1) easier and less error-prone to write the problem description.

2) possible to design special-purpose inference algorithms for global constraints


that are not available for a set of more primitive constraints.

Absolute constraints: Violation of which rules out a potential solution.

Preference constraints: indicate which solutions are preferred; they are included in many real-world CSPs. Preference constraints can often be encoded as costs on individual variable assignments; with this formulation, CSPs with preferences can be solved with optimization search methods. We can call such a problem a constraint optimization problem (COP). Linear programming problems do this kind of optimization.

Constraint propagation: Inference in CSPs

A number of inference techniques use the constraints to infer which variable/value pairs are consistent and which are not. These include node, arc, path, and k-consistency.

constraint propagation: Using the constraints to reduce the number of legal


values for a variable, which in turn can reduce the legal values for another
variable, and so on.

local consistency: If we treat each variable as a node in a graph and each binary
constraint as an arc, then the process of enforcing local consistency in each part
of the graph causes inconsistent values to be eliminated throughout the graph.

There are different types of local consistency:


Node consistency

A single variable (a node in the CSP network) is node-consistent if all the values
in the variable’s domain satisfy the variable’s unary constraint.

We say that a network is node-consistent if every variable in the network is node-


consistent.

Arc consistency

A variable in a CSP is arc-consistent if every value in its domain satisfies the


variable’s binary constraints.

Xi is arc-consistent with respect to another variable Xj if for every value in the


current domain Di there is some value in the domain Dj that satisfies the binary
constraint on the arc (Xi, Xj).

A network is arc-consistent if every variable is arc-consistent with every other


variable.
Arc consistency tightens down the domains (unary constraint) using the arcs
(binary constraints).

AC-3 algorithm:

AC-3 maintains a queue of arcs which initially contains all the arcs in the CSP.

AC-3 then pops off an arbitrary arc (Xi, Xj) from the queue and makes Xi arc-
consistent with respect to Xj.
If this leaves Di unchanged, the algorithm just moves on to the next arc;

But if this revises Di, then add to the queue all arcs (Xk, Xi) where Xk is a neighbor
of Xi.

If Di is revised down to nothing, then the whole CSP has no consistent solution,
return failure;

Otherwise, keep checking, trying to remove values from the domains of variables
until no more arcs are in the queue.

The result is an arc-consistent CSP that has the same solutions as the original one but smaller domains.

The complexity of AC-3:

Assume a CSP with n variables, each with domain size at most d, and with c binary constraints (arcs). Checking consistency of an arc can be done in O(d²) time, so the total worst-case time is O(cd³).
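
A compact Python sketch of AC-3 along these lines is shown below (ac3 and revise are our own illustrative function names; domains maps each variable to its current set of values, neighbors gives the constraint graph, and constraint(xi, vi, xj, vj) tests compatibility of two values):

from collections import deque

def ac3(domains, neighbors, constraint):
    # Initially the queue contains every arc of the CSP.
    queue = deque((xi, xj) for xi in domains for xj in neighbors[xi])
    while queue:
        xi, xj = queue.popleft()
        if revise(domains, xi, xj, constraint):
            if not domains[xi]:
                return False            # some domain became empty: no solution
            for xk in neighbors[xi]:
                if xk != xj:
                    queue.append((xk, xi))
    return True

def revise(domains, xi, xj, constraint):
    """Remove values of xi that have no supporting value in xj; return True if revised."""
    removed = {vi for vi in domains[xi]
               if not any(constraint(xi, vi, xj, vj) for vj in domains[xj])}
    domains[xi] -= removed
    return bool(removed)

For instance, calling ac3(domains, neighbors, lambda xi, vi, xj, vj: vi != vj) on the map-coloring CSP defined earlier enforces arc consistency on the coloring constraints.
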
Path consistency

Path consistency: A two-variable set {Xi, Xj} is path-consistent with respect to


a third variable Xm if, for every assignment {Xi = a, Xj = b} consistent with the
constraint on {Xi, Xj}, there is an assignment to Xm that satisfies the constraints
on {Xi, Xm} and {Xm, Xj}.

Path consistency tightens the binary constraints by using implicit constraints that
are inferred by looking at triples of variables.

K-consistency

K-consistency: A CSP is k-consistent if, for any set of k-1 variables and for any
consistent assignment to those variables, a consistent value can always be
assigned to any kth variable.

1-consistency = node consistency; 2-consistency = arc consistency; 3-consistency = path consistency.

A CSP is strongly k-consistent if it is k-consistent and is also (k - 1)-consistent,


(k – 2)-consistent, … all the way down to 1-consistent.
If we take a CSP with n nodes and make it strongly n-consistent, we are guaranteed to find a solution in time O(n²d). But any algorithm for establishing n-consistency must take time exponential in n in the worst case, and it also requires space that is exponential in n.

Global constraints

A global constraint is one involving an arbitrary number of variables (but not


necessarily all variables). Global constraints can be handled by special-purpose
algorithms that are more efficient than general-purpose methods.

1) inconsistency detection for Alldiff constraints

A simple algorithm: First remove any variable in the constraint that has a singleton domain, and delete that variable’s value from the domains of the remaining variables. Repeat as long as there are singleton variables. If at any point an empty domain is produced or there are more variables than domain values left, then an inconsistency has been detected.
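
A rough Python rendering of this procedure might look like the following (alldiff_inconsistent is an illustrative helper name; it returns True when an inconsistency is detected):

# Sketch of the simple Alldiff inconsistency check described above.
# domains maps each variable in the constraint to its set of possible values.
def alldiff_inconsistent(domains):
    doms = {v: set(d) for v, d in domains.items()}
    changed = True
    while changed:
        changed = False
        for var, dom in doms.items():
            if len(dom) == 1:
                value = next(iter(dom))
                # Delete the singleton value from every other domain.
                for other, other_dom in doms.items():
                    if other != var and value in other_dom:
                        other_dom.discard(value)
                        changed = True
                        if not other_dom:
                            return True   # empty domain: inconsistent
    # More variables than remaining distinct values: inconsistent.
    return len(doms) > len(set().union(*doms.values()))

print(alldiff_inconsistent({"A": {1}, "B": {1, 2}, "C": {1, 2}}))   # True: B and C are both forced onto 2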

A simple consistency procedure for a higher-order constraint is sometimes more effective than applying arc consistency to an equivalent set of binary constraints.
2) inconsistency detection for resource constraint (the atmost constraint)

We can detect an inconsistency simply by checking the sum of the minimum values of the current domains;

e.g.

Atmost(10, P1, P2, P3, P4): no more than 10 personnel are assigned in total.

If each variable has the domain {3, 4, 5, 6}, the Atmost constraint cannot be
satisfied.

We can enforce consistency by deleting the maximum value of any domain if it


is not consistent with the minimum values of the other domains.

e.g. If each variable in the example has the domain {2, 3, 4, 5, 6}, the values 5
and 6 can be deleted from each domain.
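
Both the check and the pruning can be sketched directly (the helper names atmost_consistent and atmost_prune are illustrative; domains is a dict of variable -> set of integers):

# Sketch of consistency checking and pruning for an Atmost resource constraint.
def atmost_consistent(limit, domains):
    """Inconsistent if even the minimum values already exceed the limit."""
    return sum(min(d) for d in domains.values()) <= limit

def atmost_prune(limit, domains):
    """Delete any value that cannot coexist with the minimums of the other domains."""
    pruned = {}
    for var, dom in domains.items():
        others_min = sum(min(d) for v, d in domains.items() if v != var)
        pruned[var] = {val for val in dom if others_min + val <= limit}
    return pruned

doms = {f"P{i}": {2, 3, 4, 5, 6} for i in range(1, 5)}
print(atmost_consistent(10, doms))       # True: 2+2+2+2 = 8 <= 10
print(atmost_prune(10, doms)["P1"])      # {2, 3, 4}: the values 5 and 6 are removed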

3) inconsistency detection for bounds consistent


For large resource-limited problems with integer values, domains are represented
by upper and lower bounds and are managed by bounds propagation.

e.g.

suppose there are two flights F1 and F2 in an airline-scheduling problem, for


which the planes have capacities 165 and 385, respectively. The initial domains
for the numbers of passengers on each flight are

D1 = [0, 165] and D2 = [0, 385].

Now suppose we have the additional constraint that the two flights together must carry 420 people: F1 + F2 = 420. Propagating bounds constraints, we reduce the domains to

D1 = [35, 165] and D2 = [255, 385].

A CSP is bounds consistent if for every variable X, and for both the lower-bound
and upper-bound values of X, there exists some value of Y that satisfies the
constraint between X and Y for every variable Y.
Sudoku

A Sudoku puzzle can be considered a CSP with 81 variables, one for each square.
We use the variable names A1 through A9 for the top row (left to right), down to
I1 through I9 for the bottom row. The empty squares have the domain {1, 2, 3, 4,
5, 6, 7, 8, 9} and the pre-filled squares have a domain consisting of a single value.

There are 27 different Alldiff constraints: one for each row, column, and box of
9 squares:

Alldiff(A1, A2, A3, A4, A5, A6, A7, A8, A9)

Alldiff(B1, B2, B3, B4, B5, B6, B7, B8, B9)

Alldiff(A1, B1, C1, D1, E1, F1, G1, H1, I1)

Alldiff(A2, B2, C2, D2, E2, F2, G2, H2, I2)


Alldiff(A1, A2, A3, B1, B2, B3, C1, C2, C3)

Alldiff(A4, A5, A6, B4, B5, B6, C4, C5, C6)
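
The 81 variables and the 27 Alldiff scopes can be generated mechanically, for example as in the short sketch below (plain Python; the names are chosen here for illustration):

# Sketch: generating the 81 Sudoku variables and the 27 Alldiff constraint scopes.
rows = "ABCDEFGHI"
cols = "123456789"
variables = [r + c for r in rows for c in cols]       # A1 ... I9, 81 variables

row_scopes = [[r + c for c in cols] for r in rows]
col_scopes = [[r + c for r in rows] for c in cols]
box_scopes = [[rows[3*br + i] + cols[3*bc + j] for i in range(3) for j in range(3)]
              for br in range(3) for bc in range(3)]

alldiff_scopes = row_scopes + col_scopes + box_scopes
print(len(variables), len(alldiff_scopes))            # 81 27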

Backtracking search for CSPs

Backtracking search, a form of depth-first search, is commonly used for solving


CSPs. Inference can be interwoven with search.

Commutativity: CSPs are all commutative. A problem is commutative if the


order of application of any given set of actions has no effect on the outcome.

Backtracking search: A depth-first search that chooses values for one variable
at a time and backtracks when a variable has no legal values left to assign.
Backtracking algorithm repeatedly chooses an unassigned variable, and then tries
all values in the domain of that variable in turn, trying to find a solution. If an
inconsistency is detected, then BACKTRACK returns failure, causing the
previous call to try another value.
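
A minimal recursive version of this search can be sketched as follows (the function names and the static variable/value ordering are simplifying assumptions; the SELECT-UNASSIGNED-VARIABLE and ORDER-DOMAIN-VALUES hooks discussed below would replace them in a fuller implementation):

# A minimal recursive backtracking-search sketch for a binary CSP.
def backtracking_search(variables, domains, neighbors, constraint):
    return backtrack({}, variables, domains, neighbors, constraint)

def backtrack(assignment, variables, domains, neighbors, constraint):
    if len(assignment) == len(variables):
        return assignment                       # complete, consistent assignment
    var = next(v for v in variables if v not in assignment)   # static ordering
    for value in domains[var]:
        # Check the candidate value against all already-assigned neighbors.
        if all(constraint(var, value, n, assignment[n])
               for n in neighbors[var] if n in assignment):
            assignment[var] = value
            result = backtrack(assignment, variables, domains, neighbors, constraint)
            if result is not None:
                return result
            del assignment[var]                 # undo and try the next value
    return None                                 # no value worked: backtrack

# Example (using the map-coloring data sketched earlier in this unit):
# solution = backtracking_search(variables, domains, neighbors,
#                                lambda x, vx, y, vy: vx != vy)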

There is no need to supply BACKTRACKING-SEARCH with a domain-specific


initial state, action function, transition model, or goal test.

BACKTRACKING-SEARCH keeps only a single representation of a state and alters that representation rather than creating new ones.
To solve CSPs efficiently without domain-specific knowledge, we address the following questions:

1)function SELECT-UNASSIGNED-VARIABLE: which variable should be


assigned next?
function ORDER-DOMAIN-VALUES: in what order should its values be tried?

2)function INFERENCE: what inferences should be performed at each step in the


search?

3)When the search arrives at an assignment that violates a constraint, can the
search avoid repeating this failure?

1. Variable and value ordering

SELECT-UNASSIGNED-VARIABLE

Variable selection—fail-first
Minimum-remaining-values (MRV) heuristic: The idea of choosing the variable with the fewest “legal” values. A.k.a. the “most constrained variable” or “fail-first” heuristic, it picks a variable that is most likely to cause a failure soon, thereby pruning the search tree. If some variable X has no legal values left, the MRV
heuristic will select X and failure will be detected immediately—avoiding
pointless searches through other variables.

E.g. After the assignment for WA=red and NT=green, there is only one possible
value for SA, so it makes sense to assign SA=blue next rather than assigning Q.

[Powerful guide]

Degree heuristic: The degree heuristic attempts to reduce the branching factor
on future choices by selecting the variable that is involved in the largest number
of constraints on other unassigned variables. [useful tie-breaker]

e.g. SA is the variable with highest degree 5; the other variables have degree 2 or
3; T has degree 0.
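
Both heuristics can be stated as small selection functions (a sketch with illustrative names, assuming domains holds the current pruned domains and neighbors the constraint graph):

def mrv(variables, domains, assignment):
    """Minimum-remaining-values: pick the unassigned variable with the fewest legal values."""
    unassigned = [v for v in variables if v not in assignment]
    return min(unassigned, key=lambda v: len(domains[v]))

def degree(variables, neighbors, assignment):
    """Degree heuristic: pick the unassigned variable involved in the most constraints
    on other unassigned variables (useful as a tie-breaker for MRV)."""
    unassigned = [v for v in variables if v not in assignment]
    return max(unassigned,
               key=lambda v: sum(1 for n in neighbors[v] if n not in assignment))
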
ORDER-DOMAIN-VALUES

Value selection—fail-last

If we are trying to find all the solutions to a problem (not just the first one), then the ordering does not matter.

Least-constraining-value heuristic: prefers the value that rules out the fewest choices for the neighboring variables in the constraint graph. (Try to leave the maximum flexibility for subsequent variable assignments.)

e.g. Suppose we have generated the partial assignment with WA=red and NT=green and that our next choice is for Q. Blue would be a bad choice because it eliminates the last legal value left for Q’s neighbor, SA; the heuristic therefore prefers red to blue.

The minimum-remaining-values and degree heuristic are domain-independent


methods for deciding which variable to choose next in a backtracking search.
The least-constraining-value heuristic helps in deciding which value to try first
for a given variable.

2. Interleaving search and inference


INFERENCE

forward checking: [One of the simplest forms of inference.] Whenever a


variable X is assigned, the forward-checking process establishes arc consistency
for it: for each unassigned variable Y that is connected to X by a constraint, delete
from Y’s domain any value that is inconsistent with the value chosen for X.

There is no reason to do forward checking if we have already done arc consistency


as a preprocessing step.

Advantage: For many problems the search will be more effective if we combine
the MRV heuristic with forward checking.
Disadvantage: Forward checking only makes the current variable arc-consistent,
but doesn’t look ahead and make all the other variables arc-consistent.

MAC (Maintaining Arc Consistency) algorithm: [More powerful than forward checking; it detects inconsistencies that forward checking misses.] After a variable Xi is assigned a value, the INFERENCE procedure calls AC-3, but instead of a queue of all arcs in the CSP, we start with only the arcs (Xj, Xi) for all Xj that are unassigned variables that are neighbors of Xi. From there, AC-3 does constraint propagation in the usual way, and if any variable has its domain reduced to the empty set, the call to AC-3 fails and we know to backtrack immediately.
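
In outline, MAC-style inference can be sketched as below (illustrative names again; the sketch assumes domains[xi] has already been reduced to the single assigned value):

from collections import deque

def mac_inference(xi, assignment, domains, neighbors, constraint):
    # Start with only the arcs (Xj, Xi) for unassigned neighbors Xj of Xi.
    queue = deque((xj, xi) for xj in neighbors[xi] if xj not in assignment)
    while queue:
        a, b = queue.popleft()
        removed = {va for va in domains[a]
                   if not any(constraint(a, va, b, vb) for vb in domains[b])}
        if removed:
            domains[a] -= removed
            if not domains[a]:
                return False          # a domain became empty: backtrack immediately
            for xk in neighbors[a]:
                if xk != b and xk not in assignment:
                    queue.append((xk, a))
    return True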

3. Intelligent backtracking
chronological backtracking: The BACKTRACKING-SEARCH in Fig 6.5. When a branch of the search fails, back up to the preceding variable and try a different value for it. (The most recent decision point is revisited.)

e.g.

Suppose we have generated the partial assignment {Q=red, NSW=green, V=blue,


T=red}.

When we try the next variable SA, we see that every value violates a constraint.

We back up to T and try a new color; this cannot resolve the problem.

Intelligent backtracking: Backtrack to a variable that was responsible for


making one of the possible values of the next variable (e.g. SA) impossible.

Conflict set for a variable: A set of assignments that are in conflict with some
value for that variable.

(e.g. The set {Q=red, NSW=green, V=blue} is the conflict set for SA.)
backjumping method: Backtracks to the most recent assignment in the conflict
set.

(e.g. backjumping would jump over T and try a new value for V.)

Forward checking can supply the conflict set with no extra work.

Whenever forward checking based on an assignment X=x deletes a value from


Y’s domain, add X=x to Y’s conflict set;

If the last value is deleted from Y’s domain, the assignments in the conflict set of Y are added to the conflict set of X.

In fact, every branch pruned by backjumping is also pruned by forward checking. Hence simple backjumping is redundant in a forward-checking search or in a search that uses stronger consistency checking (such as MAC).
Conflict-directed backjumping:

e.g.

consider the partial assignment which is proved to be inconsistent: {WA=red,


NSW=red}.

We try T=red next and then assign NT, Q, V, SA, no assignment can work for
these last 4 variables.

Eventually we run out of values to try at NT, but simple backjumping cannot help, because NT does not have a complete conflict set of preceding variables that caused it to fail.

The set {WA, NSW} is a deeper notion of the conflict set for NT; it is what caused NT, together with any subsequent variables, to have no consistent solution. So the algorithm should backtrack to NSW and skip over T.

A backjumping algorithm that uses conflict sets defined in this way is called conflict-directed backjumping.
How to Compute:

When a variable’s domain becomes empty, a “terminal” failure occurs; that variable has a standard conflict set.

Let Xj be the current variable, let conf(Xj) be its conflict set. If every possible
value for Xj fails, backjump to the most recent variable Xi in conf(Xj), and set

conf(Xi) ← conf(Xi)∪conf(Xj) – {Xi}.

The conflict set for a variable means that there is no solution from that variable onward, given the preceding assignment to the conflict set.

e.g.

assign WA, NSW, T, NT, Q, V, SA.

SA fails, and its conflict set is {WA, NT, Q}. (standard conflict set)
Backjump to Q, its conflict set is {NT, NSW}∪{WA,NT,Q}-{Q} = {WA, NT,
NSW}.

Backtrack to NT, its conflict set is {WA}∪{WA,NT,NSW}-{NT} = {WA,


NSW}.

Hence the algorithm backjump to NSW. (over T)

After backjumping from a contradiction, how to avoid running into the same
problem again:

Constraint learning: The idea of finding a minimum set of variables from the
conflict set that causes the problem. This set of variables, along with their
corresponding values, is called a no-good. We then record the no-good, either by
adding a new constraint to the CSP or by keeping a separate cache of no-goods.

Backtracking occurs when no legal assignment can be found for a


variable. Conflict-directed backjumping backtracks directly to the source of the
problem.
Local search for CSPs

Local search algorithms for CSPs use a complete-state formulation: the initial state assigns a value to every variable, and the search changes the value of one variable at a time.

The min-conflicts heuristic: In choosing a new value for a variable, select the
value that results in the minimum number of conflicts with other variables.
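
A bare-bones version of min-conflicts can be sketched as follows (names are illustrative; conflicts counts the violated binary constraints for a candidate value):

import random

def min_conflicts(variables, domains, neighbors, constraint, max_steps=10000):
    # Complete-state formulation: start with a random value for every variable.
    assignment = {v: random.choice(list(domains[v])) for v in variables}
    for _ in range(max_steps):
        conflicted = [v for v in variables
                      if conflicts(v, assignment[v], assignment, neighbors, constraint) > 0]
        if not conflicted:
            return assignment
        var = random.choice(conflicted)
        # Choose the value that minimizes the number of conflicts with the neighbors.
        assignment[var] = min(domains[var],
                              key=lambda val: conflicts(var, val, assignment, neighbors, constraint))
    return None

def conflicts(var, value, assignment, neighbors, constraint):
    return sum(1 for n in neighbors[var] if not constraint(var, value, n, assignment[n]))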

Local search techniques in Section 4.1 can be used in local search for CSPs.
The landscape of a CSP under the min-conflicts heuristic usually has a series of plateaux. Simulated annealing and plateau search (i.e. allowing sideways moves to another state with the same score) can help local search find its way off a plateau. This wandering on the plateau can be directed with tabu search: keeping a small list of recently visited states and forbidding the algorithm to return to those states.

Constraint weighting: a technique that can help concentrate the search on the
important constraints.

Each constraint is given a numeric weight Wi, initially all 1.

At each step, the algorithm chooses a variable/value pair to change that will result
in the lowest total weight of all violated constraints.

The weights are then adjusted by incrementing the weight of each constraint that
is violated by the current assignment.

Local search can be used in an online setting when the problem changes; this is particularly important in scheduling problems.
The structure of problem

1. The structure of constraint graph

The structure of the problem, as represented by the constraint graph, can be used to find solutions quickly.

e.g. The problem can be decomposed into 2 independent subproblems:


Coloring T and coloring the mainland.
Tree: A constraint graph is a tree when any two variables are connected by only one path.

Directed arc consistency (DAC): A CSP is defined to be directed arc-consistent


under an ordering of variables X1, X2, … , Xn if and only if every Xi is arc-
consistent with each Xj for j>i.

By using DAC, any tree-structured CSP can be solved in time linear in the number
of variables.

How to solve a tree-structure CSP:

Pick any variable to be the root of the tree;

Choose an ordering of the variable such that each variable appears after its parent
in the tree. (topological sort)

Any tree with n nodes has n−1 arcs, so we can make this graph directed arc-consistent in O(n) steps, each of which must compare up to d possible domain values for 2 variables, for a total time of O(nd²).
Once we have a directed arc-consistent graph, we can just march down the list of
variables and choose any remaining value.

Since each link from a parent to its child is arc consistent, we won’t have to
backtrack, and can move linearly through the variables.
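
The whole procedure fits in a short sketch (illustrative names; it assumes variables is already ordered so that each variable appears after its parent, and parent maps each variable to its parent or None for the root):

# Sketch of the linear-time tree-structured CSP procedure described above.
def tree_csp_solver(variables, domains, parent, constraint):
    doms = {v: set(domains[v]) for v in variables}
    # Make the graph directed arc-consistent, working from the leaves toward the root.
    for child in reversed(variables):
        p = parent[child]
        if p is None:
            continue
        doms[p] = {vp for vp in doms[p]
                   if any(constraint(p, vp, child, vc) for vc in doms[child])}
        if not doms[p]:
            return None                      # no solution
    # March down the ordering, picking any value consistent with the parent's choice.
    assignment = {}
    for var in variables:
        p = parent[var]
        candidates = (doms[var] if p is None
                      else {v for v in doms[var] if constraint(p, assignment[p], var, v)})
        if not candidates:
            return None
        assignment[var] = next(iter(candidates))
    return assignment
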
There are 2 primary ways to reduce more general constraint graphs to trees:

1. Based on removing nodes;


e.g. We can delete SA from the graph by fixing a value for SA and deleting from
the domains of other variables any values that are inconsistent with the value
chosen for SA.

The general algorithm:

Choose a subset S of the CSP’s variables such that the constraint graph becomes
a tree after removal of S. S is called a cycle cutset.

For each possible assignment to the variables in S that satisfies all constraints on
S,

(a) remove from the domain of the remaining variables any values that are
inconsistent with the assignment for S, and

(b) If the remaining CSP has a solution, return it together with the assignment
for S.

Time complexity: O(dᶜ · (n−c)d²), where c is the size of the cycle cutset.


Cutset conditioning: the name of this overall algorithmic approach; efficient approximation algorithms are used to find a small cycle cutset.

2. Based on collapsing nodes together

Tree decomposition: construct a tree decomposition of the constraint graph into a set of connected subproblems; each subproblem is solved independently, and the resulting solutions are then combined.
A tree decomposition must satisfy 3 requirements:

·Every variable in the original problem appears in at least one of the subproblems.

·If 2 variables are connected by a constraint in the original problem, they must
appear together (along with the constraint) in at least one of the subproblems.

·If a variable appears in 2 subproblems in the tree, it must appear in every subproblem along the path connecting those subproblems.

We solve each subproblem independently.

If any one has no solution, the entire problem has no solution.

If we can solve all the subproblems, then construct a global solution as follows:

First, view each subproblem as a “mega-variable” whose domain is the set of all
solutions for the subproblem.
Then, solve the constraints connecting the subproblems using the efficient
algorithm for trees.

A given constraint graph admits many tree decompositions;

In choosing a decomposition, the aim is to make the subproblems as small as


possible.

Tree width:

The tree width of a tree decomposition of a graph is one less than the size of the largest subproblem.

The tree width of the graph itself is the minimum tree width among all its tree
decompositions.

Time complexity: O(ndʷ⁺¹), where w is the tree width of the graph.

The complexity of solving a CSP is strongly related to the structure of its


constraint graph. Tree-structured problems can be solved in linear time. Cutset
conditioning can reduce a general CSP to a tree-structured one and is quite
efficient if a small cutset can be found. Tree decomposition techniques
transform the CSP into a tree of subproblems and are efficient if the tree width of
constraint graph is small.
2. The structure in the values of variables

By introducing a symmetry-breaking constraint, we can break the value


symmetry and reduce the search space by a factor of n!.

e.g.

Consider the map-coloring problem with n colors: for every consistent solution, there is actually a set of n! solutions formed by permuting the color names. (This is value symmetry.)

On the Australia map, WA, NT and SA must all have different colors, so there
are 3!=6 ways to assign.

We can impose an arbitrary ordering constraint NT<SA<WA that requires the 3


values to be in alphabetical order. This constraint ensures that only one of the n! solutions is possible: {NT=blue, SA=green, WA=red}. (This is the symmetry-breaking constraint.)
