AI - Unit 3
Adversarial Search Methods (Game Theory) - Mini max algorithm - Alpha beta pruning -
Constraint satisfaction problems – Constraints – Crypt Arithmetic Puzzles – Constraint
Domain – CSP as a search problem (Room colouring).
Adversarial search
• Adversarial search is a search in which we examine the problems that arise when we try
to plan ahead in a world where other agents are planning against us.
• The Adversarial Search involves more than one entity, each with competing aims and
purposes. These entities are put against one another in a game-like environment and each
player's strategy or game approach alters depending on the opponent's move.
• Used in game playing in which one can trace the movement of an enemy or opponent.
Adversarial search
• Blind and heuristic search strategies are associated with only a single agent that aims to
find a solution, which is often expressed as a sequence of actions.
• But there might be some situations where more than one agent is searching for the solution
in the same search space, and this situation usually occurs in game playing.
• An environment with more than one agent is termed a multi-agent environment. In a game,
each agent is an opponent of the others and plays against them; each agent needs to
consider the actions of the other agents and the effect of those actions on its own performance.
• Searches in which two or more players with conflicting goals are trying to explore the same
search space for the solution, are called adversarial searches, often known as Games.
• Modelling the game as a search problem and defining a heuristic evaluation function are the two
main factors which help to model and solve games in AI.
Types of Games in AI
Perfect information
• Agents can look into the complete board.
• Agents have all the information about the game, and they can see each other moves also.
• Examples are Chess, Checkers, Go, etc.
Imperfect information
• Agents do not have all the information about the game and are not aware of everything that is going on.
• Examples are Battleship, blind tic-tac-toe, Bridge, etc.
Deterministic games
• Games which follow a strict pattern and set of rules for the games
• There is no randomness associated with them.
• Examples are chess, Checkers, Go, tic-tac-toe, etc.
Non-deterministic games
• Games which have various unpredictable events and have a factor of chance or luck.
• This factor of chance or luck is introduced by either dice or cards.
• These are random, and each action response is not fixed.
• Such games are also called as stochastic games.
• Example: Backgammon, Monopoly, Poker, etc.
Zero-Sum Game
A zero-sum game is a mathematical representation, in game theory and economic theory, of a
situation that involves two sides, where the result is an advantage for one side and an equivalent
loss for the other.
In other words, player one's gain is equivalent to player two's loss, with the result that the net
improvement in benefit of the game is zero.
Solving such game problems in AI requires embedded thinking or backward reasoning.
Formalization of the problem
A game can be defined as a type of search in AI which can be formalized with the following
elements:
• Initial state: It specifies how the game is set up at the start.
• Player(s): It specifies which player has the move in a state.
• Action(s): It returns the set of legal moves in state space.
• Result(s, a): It is the transition model, which specifies the result of moves in the state
space.
• Terminal-Test(s): The terminal test is true if the game is over and false otherwise. States
where the game has ended are called terminal states.
• Utility(s, p): A utility function gives the final numeric value for a game that ends in
terminal states s for player p. It is also called payoff function.
For Chess, the outcomes are a win, loss, or draw and its payoff values are +1, 0, ½.
And for tic-tac-toe, utility values are +1, -1, and 0.
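To make this formalization concrete, here is a minimal Python sketch (not from the slides) of the six elements for a toy two-player game; the game, its rules, and all names are illustrative assumptions.

# Toy game: the two players alternately pick one number from a shared list;
# MAX wants the difference (MAX total - MIN total) to be as high as possible.
class PickNumberGame:
    def initial_state(self):
        # Initial state: remaining numbers, MAX's total, MIN's total, whose turn it is.
        return ([3, -1, 4, 2], 0, 0, "MAX")

    def player(self, state):
        return state[3]                        # Player(s): which player has the move

    def actions(self, state):
        return list(range(len(state[0])))      # Actions(s): indices of pickable numbers

    def result(self, state, action):           # Result(s, a): transition model
        remaining, max_total, min_total, turn = state
        picked = remaining[action]
        rest = remaining[:action] + remaining[action + 1:]
        if turn == "MAX":
            return (rest, max_total + picked, min_total, "MIN")
        return (rest, max_total, min_total + picked, "MAX")

    def terminal_test(self, state):
        return len(state[0]) == 0              # Terminal-Test(s): is the game over?

    def utility(self, state, player):
        # Utility(s, p): final payoff for player p in terminal state s.
        score = state[1] - state[2]
        return score if player == "MAX" else -score

game = PickNumberGame()
s = game.initial_state()
print(game.player(s), game.actions(s), game.terminal_test(s))   # MAX [0, 1, 2, 3] False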
Mini-Max Algorithm
• Minimax is a kind of backtracking algorithm that is used in decision making and game
theory to find the optimal move for a player, assuming that your opponent also plays
optimally.
• It is widely used in two player turn-based games such as Tic-Tac-Toe, Backgammon,
Mancala, Chess, etc.
• In Minimax the two players are called the maximizer and the minimizer.
• The maximizer (Player 1) tries to get the highest score possible, while
the minimizer (Player 2) tries to minimize Player 1's score.
Minimax Algorithm Steps
• Scoring-Based Games: the final difference between Player 1's and Player 2's scores is used;
Player 1 Score - Player 2 Score determines the value of each leaf node.
Mini-Max Algorithm
In the tree diagram below, find the utility values and the best strategy for Max.
Let A be the initial state of the tree.
Step 1: In the first step of the algorithm, the game tree is generated, as shown.
Step 2: The utility values/scores for the terminal states are given.
Step 3: Backtracking
• Suppose the maximizer takes the first turn, which has a worst-case initial value of -infinity,
and the minimizer takes the next turn, which has a worst-case initial value of +infinity.
• Now, we first find the utility values for the Maximizer. Its initial value is -∞, so we
compare each terminal-state value with the Maximizer's initial value and determine the
higher node values; it will find the maximum among them all.
• In the next step it is the minimizer's turn, so it will compare all node values with +∞ and
determine the third-layer node values.
• Now it is the Maximizer's turn: it will again choose the maximum of all node values and find
the maximum value for the root node. In this game tree there are only 4 layers, so we reach
the root node immediately, but in real games there will be many more layers.
For node A max(4, -3)= 4
That was the complete workflow of the minimax two player
game.
[Game-tree figure for the worked example: terminal values 11, 48, 53, 74, 23, 30, 50, 45; the backed-up value at the root is 48.]
Minimax(N):
    if N is a terminal node
        value ← eval(N)
    else if N is a max node
        value ← -infinity
        for each child C of N
            value ← max(value, Minimax(C))
    else
        value ← +infinity
        for each child C of N
            value ← min(value, Minimax(C))
    return value
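The pseudocode above translates almost line for line into runnable Python. In this sketch a node is assumed to be either a number (a terminal evaluation) or a list of child nodes; the nesting of the example tree is reconstructed from the leaf values in the figure and is an assumption.

def minimax(node, is_max):
    if isinstance(node, (int, float)):      # terminal node: value = eval(N)
        return node
    if is_max:                              # max node: best value over children
        value = float("-inf")
        for child in node:
            value = max(value, minimax(child, False))
        return value
    value = float("inf")                    # min node: worst value over children
    for child in node:
        value = min(value, minimax(child, True))
    return value

# Leaf values from the example figure: 11, 48, 53, 74, 23, 30, 50, 45.
tree = [[[11, 48], [53, 74]], [[23, 30], [50, 45]]]
print(minimax(tree, True))                  # 48, matching the root value in the figure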
Example 1 The figure shows a game tree with evaluations W (win), L (loss) and D (draw) from Max's
perspective. In this game tree the labels P, Q, R, S, T indicate strategies/moves at the root.
[Game-tree figure: the leaf evaluations at the horizon are W, L, and D, from Max's perspective.]
1. What is the outcome (W, D or L) of the game when both players play perfectly?
2. Which of the moves P, Q, R, S, T are the best moves for Max?
3. Which of the moves P, Q, R, S, T are the best moves for MIN?
Example 2: The figure shows a 4-ply game tree with evaluation function values at the horizon. The nodes in the
horizon are assigned reference numbers A,B,C,...,P.
[Figure: a worked minimax example on a number-picking game. Player 1 and Player 2 alternately pick a number from either end of the list [1, 5, 233, 7], and the scores are backed up the game tree level by level.]
Score calculation (back-propagation): the final difference between Player 1's and Player 2's scores is used;
Player 1 Score - Player 2 Score determines the value of each leaf node.
Winning strategy (from the figure): starting from [1, 5, 233, 7], Player 1's first move is Pick(7).
Time complexity: O(b^d), where b is the branching factor and d is the depth (number of plies) of the tree.
Space complexity: O(bd), where b is the branching factor and d is the maximum depth of the tree (similar to DFS).
Alpha-Beta Pruning
• Alpha-Beta Pruning is an optimization technique for the Minimax algorithm used in
decision-making and game theory.
• It helps reduce the number of nodes that need to be evaluated in a game tree, making
Minimax more efficient without affecting the final decision.
• In the minimax search algorithm, the number of game states that have to be examined is
exponential in the depth of the tree.
• Alpha-beta pruning is a modified version of the minimax algorithm.
• We cannot eliminate the exponent, but we can effectively cut it in half: there is a
technique by which we can compute the correct minimax decision without checking every node
of the game tree, and this technique is called pruning.
• Alpha-beta pruning can be applied at any depth of a tree, and sometimes it prunes not only
the tree leaves but also entire sub-trees.
Alpha-Beta Pruning
• Alpha-beta pruning applied to a standard minimax algorithm returns the same move as the standard
algorithm does, but it removes all the nodes that do not really affect the final decision and only
slow the algorithm down. By pruning these nodes, it makes the algorithm fast.
This involves two threshold parameters, Alpha and Beta, for future expansion, so it is called
alpha-beta pruning. It is also called the Alpha-Beta Algorithm.
The two parameters can be defined as:
• Alpha: The best (highest-value) choice we have found so far at any point along the path of
Maximizer. The initial value of alpha is -∞.
• Beta: The best (lowest-value) choice we have found so far at any point along the path of
Minimizer. The initial value of beta is +∞.
• At the first step, the Max player will start the first move from node A, where α = -∞ and β = +∞.
These values of alpha and beta are passed down to node B, where again α = -∞ and β = +∞, and
node B passes the same values to its child D.
• At node D, the value of α will be calculated, as it is Max's turn. The value of α is compared
first with 2 and then with 3, and max(-∞, 2, 3) = 3 will be the value of α at node D; the node
value will also be 3.
• Now the algorithm backtracks to node B, where the value of β will change, as this is Min's turn.
Now β = +∞ is compared with the available successor node's value, i.e. min(+∞, 3) = 3; hence at
node B now α = -∞ and β = 3.
• In the next step, the algorithm traverses the next successor of node B, which is node E, and the
values α = -∞ and β = 3 are passed down.
• At node E, Max will take its turn, and the value of alpha will change. The current value of alpha
is compared with 5, so max(-∞, 5) = 5; hence at node E α = 5 and β = 3, where α >= β, so the
right successor of E is pruned and the algorithm does not traverse it. The value at node E will be 5.
• At the next step, the algorithm again backtracks the tree, from node B to node A. At node A, the
value of alpha is changed to the maximum available value 3, as max(-∞, 3) = 3, and β = +∞. These
two values are now passed to the right successor of A, which is node C.
• At node C, α = 3 and β = +∞, and the same values are passed on to node F.
• At node F, the value of α is again compared with the left child, which is 0, so max(3, 0) = 3,
and then with the right child, which is 1, so max(3, 1) = 3; α remains 3, but the node value of F
becomes 1.
• Node F returns the node value 1 to node C. At C, α = 3 and β = +∞; here the value of beta is
changed: it is compared with 1, so min(+∞, 1) = 1. Now at C, α = 3 and β = 1, and again the
condition α >= β is satisfied, so the next child of C, which is G, is pruned and the algorithm
does not compute the entire sub-tree G.
• C now returns the value 1 to A, and the best value for A is max(3, 1) = 3. The final game tree
shows the nodes which were computed and the nodes which were never computed. Hence the optimal
value for the maximizer is 3 for this example.
Example 1: When AlphaBeta(root,-INF,+INF) is invoked, it passes the (alpha,beta) bounds to descendants,
where the bounds are updated with values received from horizon nodes.
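The following is a runnable Python sketch of the AlphaBeta procedure described above. A node is assumed to be either a numeric horizon value or a list of children; the tree mirrors the worked example (B's subtrees have leaves 2, 3 and 5, C's subtrees have leaves 0, 1 and a pruned branch), and the values inside the pruned branches are arbitrary placeholders.

def alphabeta(node, alpha, beta, is_max):
    if isinstance(node, (int, float)):       # horizon node: return its evaluation
        return node
    if is_max:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)        # best (highest) choice for MAX so far
            if alpha >= beta:                # cut-off: MIN will never allow this branch
                break
        return value
    value = float("inf")
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True))
        beta = min(beta, value)              # best (lowest) choice for MIN so far
        if alpha >= beta:                    # cut-off: MAX will never allow this branch
            break
    return value

tree = [[[2, 3], [5, 9]], [[0, 1], [7, 6]]]  # 9, 7, 6 sit in branches that get pruned
print(alphabeta(tree, float("-inf"), float("inf"), True))   # 3, the optimal value for MAX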
• A wide variety of methods, including adversarial search and local search, are used to address
different kinds of problems.
• Every problem-solving method has a single purpose in mind: to find a solution that enables the
goal to be achieved.
• However, in adversarial search and local search there were no restrictions on how the agent
could reach its answers.
• This section examines constraint satisfaction, another kind of problem-solving method.
• As its name implies, constraint satisfaction means that a problem must be solved while adhering
to a set of restrictions or guidelines.
Constraint Satisfaction Problem (CSP) deals with solving problems by identifying constraints and
finding solutions that satisfy those constraints.
Significance of Constraint Satisfaction Problem in AI
Domain
• Domains describe the variety of possible values that a variable might have.
• A domain may be finite or limitless, depending on the problem.
• For example, in Sudoku, a variable that represents a puzzle cell can have as its domain a
range of values from 1 to 9.
• It is denoted by “D”. Domains can be finite, like {1, 2, 3}, or continuous, such as real
numbers between 0 and 1.
Key Elements of CSPs
Constraints
• Constraints are the rules that control how variables interact with one another.
• The ranges of acceptable values for variables are determined by constraints in a CSP.
• The different types of constraints include unary constraints, binary constraints, and
higher-order constraints, to mention a few.
• For example, in a sudoku puzzle, the limitations might be that only one of each
number from 1 to 9 can appear in each row, column, and 3*3 boxes
• Constraints can be expressed in various ways, such as equations, inequalities, or logical
expressions.
A constraint satisfaction problem is defined by three components:
X: a set of variables.
D: a set of domains, one for each variable; every variable has its own domain.
C: a set of constraints that the variables must satisfy.
Constraints
Unary Constraints
• Unary constraints limit the possible values of a single variable without considering the
values of other variables.
• It is the easiest constraint to find, as it has only one parameter. Example: The
expression X1 ≠ 7 says that the variable X1 cannot have the value 7.
Binary Constraints
• Binary constraints describe the relationship between two variables and consist of only
two variables.
• Example: X1< X2 indicates that X1 must be less than X2 in order to be true.
Constraints
Global Constraints
• In contrast to unary or binary constraints, global constraints involve multiple variables
and impose a more complex relationship or restriction between them.
• Global constraints are often used in CSP problems to capture higher-level patterns,
structures, or rules.
• These restrictions can apply to any number of variables at once and are not limited to
pairwise interactions.
Alldifferent Constraint
• The Alldifferent constraint (AllDiff) requires that each variable in a set of variables has a
unique value.
• You commonly apply alldifferent constraints, when you want to be sure that no two
variables in a set can take the same value.
• Example: The expression alldifferent(X1, X2, X3) ensures that the values of X1, X2, and
X3 must be unique.
Sum Constraint
• The Sum Constraint requires that the sum of the values assigned to a group of variables
meet a particular requirement.
• It is useful for expressing restrictions like “the sum of these variables should equal a
certain value.”
• Example: The expression Sum(X1, X2, X3) = 15 demands that the sum of the values for
X1, X2, and X3 be 15.
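As an illustration (not part of the slides), the constraint types above can be written as simple Python predicates over an assignment, i.e. a dictionary mapping variables to values.

def unary_not_7(a):                     # Unary:   X1 != 7
    return a["X1"] != 7

def binary_less(a):                     # Binary:  X1 < X2
    return a["X1"] < a["X2"]

def alldifferent(a, *names):            # Global:  alldifferent(X1, X2, X3)
    values = [a[n] for n in names]
    return len(values) == len(set(values))

def sum_equals(a, target, *names):      # Global:  Sum(X1, X2, X3) = 15
    return sum(a[n] for n in names) == target

assignment = {"X1": 4, "X2": 5, "X3": 6}
print(unary_not_7(assignment),
      binary_less(assignment),
      alldifferent(assignment, "X1", "X2", "X3"),
      sum_equals(assignment, 15, "X1", "X2", "X3"))    # True True True True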
Domain Categories in CSP
In Constraint Satisfaction Problems (CSPs), domain categories refer to the set of possible values that
can be assigned to each variable in the problem.
The specific categories or domains can vary depending on the nature of the CSP, but here are some
common domain categories:
Finite Domain: Variables in many CSPs have finite domains that are made up of discrete values.
Examples comprise:
• Binary Domains: Domains that only have two values (for binary CSPs, this would be 0 and 1).
• Integer Domains: Domains made up of a limited number of integer values, such as 1, 2, 3, and 4, are
known as integer domains.
• Enumeration Domains: Domains containing a limited number of distinct values, such as “red, green,
and blue” in an issue involving color assignment.
Continuous Domains: Some CSPs contain variables whose domains are continuous, i.e., they can accept any real
number falling within a given range.
Examples comprise:
• Real-valued Domains: Variables may accept any real number that falls within a given range (for example,
X ∈ [0, 1]).
• Interval Domains: Variables are limited to a specific range of real values (e.g., X ∈ [−π, π]).
Algorithms in CSP
Constraint Satisfaction Problems (CSPs) are typically solved using various algorithms designed to
find a consistent assignment of values to variables that satisfies all the constraints.
Some of the common algorithms used for solving CSPs include:
The Backtracking Algorithm
• The backtracking algorithm is a popular method for resolving CSPs.
• It looks for the search space by picking a variable, setting a value for it, and then recursively
scanning through the other variables.
• In the event of a conflict, it goes back and tries a different value for the preceding variable.
Forward Checking
• The backtracking technique has been improved using forward checking.
• It tracks the remaining accurate values of the unassigned variables after each assignment and
reduces the domains of variables whose values don’t match the assigned ones.
• As a result, the search space is smaller, and constraint propagation is more effectively
accomplished.
Constraint Propagation
• Constraint propagation techniques reduce the search space by removing values inconsistent with
current assignments through local consistency checks.
• To do this, techniques like generalized arc consistency and path consistency are applied.
Real-World Examples of CSPs
To illustrate CSPs, consider the following examples:
•Sudoku Puzzles: In Sudoku, the variables are the empty cells, the domains are numbers from 1 to 9,
and the constraints ensure that no number is repeated in a row, column, or 3x3 subgrid.
•Scheduling Problems: In university course scheduling, variables might represent classes, domains
represent time slots, and constraints ensure that classes with overlapping students or instructors cannot
be scheduled simultaneously.
•Map Coloring: In the map coloring problem, variables represent regions or countries, domains
represent available colors, and constraints ensure that adjacent regions must have different colors.
These examples demonstrate how CSPs provide a framework for modeling and solving problems that
require satisfying various conditions and limitations, making them a fundamental tool in AI and
operations research.
Example: Formulate the map coloring problem for the map of Australia, shown below
X: {WA, NT, SA, Q, NSW, V, T}, where each variable represents a state or territory of Australia.
D: {red, green, blue}, where each variable has the same domain of three colors.
C: {< (WA, NT), WA != NT >, < (WA, SA), WA != SA >, < (NT, SA), NT != SA >, < (NT, Q), NT != Q >,
< (SA, Q), SA != Q >, < (SA, NSW), SA != NSW >, < (SA, V), SA != V >, < (Q, NSW), Q != NSW >,
< (NSW, V), NSW != V >}. Each constraint is a binary constraint that states that two
adjacent regions must have different colors.
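One possible way to encode this formulation as plain Python data (the layout is an assumption for illustration):

variables = ["WA", "NT", "SA", "Q", "NSW", "V", "T"]
domains = {v: {"red", "green", "blue"} for v in variables}
constraints = [                      # each adjacent pair must get different colors
    ("WA", "NT"), ("WA", "SA"), ("NT", "SA"), ("NT", "Q"),
    ("SA", "Q"), ("SA", "NSW"), ("SA", "V"), ("Q", "NSW"), ("NSW", "V"),
]

def consistent(assignment):
    # True if no constraint is violated by the (possibly partial) assignment.
    return all(assignment[a] != assignment[b]
               for a, b in constraints
               if a in assignment and b in assignment)

print(consistent({"WA": "red", "NT": "green", "SA": "blue"}))   # True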
Backtracking Algorithm
A recursive depth-first search that tries to assign values to variables one by one and backtracks
if a conflict is found.
It is a systematic search algorithm that explores possible assignments for variables,
backtracking when it encounters constraints that cannot be satisfied.
Combined with forward checking, this method is more efficient than pure backtracking because it
prevents some conflicts before they happen, reducing unnecessary computations.
The algorithm starts with an empty assignment and selects the first variable to assign.
1) According to the MRV heuristic (using the degree heuristic as a tie-breaker, since all domains
are initially equal), the variable selected first is SA, as it has the most neighbors (five).
The algorithm then tries to assign a value to SA and, according to the LCV heuristic, it chooses
blue (every color is equally constraining at this point).
The algorithm then uses forward checking to prune the domains of the neighboring variables and
updates the domains as follows:
• WA: {green, red}, NT: {green, red}, Q: {green, red}, NSW: {green, red}, V: {green, red}, T: {red, green, blue}
2) The algorithm then recurses to the next level and selects the next variable to assign.
According to the MRV heuristic, the variable selected is NT (its domain has two values left and it has three
neighbors).
The algorithm then tries to assign a value to NT and, according to the LCV heuristic, it chooses
green, as it is the least constraining value for the neighboring variables.
The algorithm then uses forward checking to prune the domains of the neighboring variables and
updates the domains as follows:
WA: {red}, Q: {red}, NSW: {green, red}, V: {green, red}, T: {red, green, blue}
3) The algorithm then recurses to the next level and selects the next variable to assign.
According to the MRV heuristic, the variable selected is Q, as it also has three
neighbors and only one value left.
The algorithm then assigns red to Q, as it is the only remaining value for Q.
The algorithm then uses forward checking to prune the domains of the neighboring variables and
updates the domains as follows:
WA: {red}, NSW: {green}, V: {green, red}, T: {red, green, blue}
and so on….
Final coloring
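For completeness, here is a compact, runnable sketch of the search described above: backtracking with MRV (degree as tie-breaker), LCV, and forward checking on the Australia map. Heuristic details such as tie-breaking are simplifications, so this is illustrative rather than a definitive implementation.

NEIGHBORS = {
    "WA": {"NT", "SA"}, "NT": {"WA", "SA", "Q"},
    "SA": {"WA", "NT", "Q", "NSW", "V"},
    "Q": {"NT", "SA", "NSW"}, "NSW": {"SA", "Q", "V"}, "V": {"SA", "NSW"}, "T": set(),
}
COLORS = ["red", "green", "blue"]

def backtrack(assignment, domains):
    if len(assignment) == len(NEIGHBORS):
        return assignment
    # MRV: fewest remaining values; ties broken by degree (most unassigned neighbors).
    unassigned = [v for v in NEIGHBORS if v not in assignment]
    var = min(unassigned,
              key=lambda v: (len(domains[v]),
                             -len([n for n in NEIGHBORS[v] if n not in assignment])))
    # LCV: try the value that rules out the fewest choices for the neighbors.
    def conflicts(value):
        return sum(value in domains[n] for n in NEIGHBORS[var] if n not in assignment)
    for value in sorted(domains[var], key=conflicts):
        # Forward checking: remove this value from unassigned neighbors' domains.
        pruned = {n: domains[n] - {value}
                  for n in NEIGHBORS[var] if n not in assignment}
        if all(pruned.values()):                       # no neighbor domain wiped out
            result = backtrack({**assignment, var: value},
                               {**domains, **pruned, var: {value}})
            if result is not None:
                return result
    return None                                        # dead end: backtrack

print(backtrack({}, {v: set(COLORS) for v in NEIGHBORS}))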
Example for usefulness of forward checking
Forward checking is based on the idea that once variable X_i is assigned a value v, certain
future variable-value pairs (X_j, v') become impossible.
Starting with WA and proceeding without applying any heuristics, we can reach a point where SA has no color options left.
Constraint Propagation: Reducing the domain of variables based on constraint compliance is
known as constraint propagation.
• Extends forward checking by spreading the effect of constraints beyond immediate neighbors,
ensuring a more global reduction of the search space.
• Constraints are propagated between related variables.
• Inconsistent values are eliminated from variable domains by leveraging information gained from other
variables.
• These algorithms refine the search space by making inferences, removing values that would lead to
conflicts.
Steps in Constraint Propagation for Map Coloring
1. Initial Setup: Each region starts with a set of available colors.
2. Assign a Color: When a region is assigned a color, constraint propagation updates the domains of its
neighboring regions.
3. Eliminate Conflicting Colors: The chosen color is removed from the available colors of adjacent regions and
further propagate this restriction recursively.
4. Use techniques like Arc Consistency (AC-3) to ensure all remaining uncolored regions still have at least one
valid color.
5. Propagate Constraints: If a neighboring region now has only one available color, it must be assigned that
color, further restricting other regions.
6. Repeat Until No More Reductions: Continue eliminating invalid options until no more values can be pruned.
NT and SA cannot both be blue!
Constraint propagation repeatedly enforces
constraints locally
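The sketch below illustrates the AC-3 procedure mentioned in step 4, specialized to the not-equal constraints of map coloring (an illustrative assumption, not the slides' code). With WA = red and Q = green already propagated, it detects that NT and SA cannot both be blue.

from collections import deque

def ac3(domains, neighbors):
    # Prune domains until every arc (Xi, Xj) is consistent; return False on a wipe-out.
    queue = deque((xi, xj) for xi in neighbors for xj in neighbors[xi])
    while queue:
        xi, xj = queue.popleft()
        # Remove values of Xi that have no supporting value in Xj (constraint Xi != Xj).
        removed = {v for v in domains[xi] if not any(v != w for w in domains[xj])}
        if removed:
            domains[xi] -= removed
            if not domains[xi]:
                return False
            queue.extend((xk, xi) for xk in neighbors[xi] if xk != xj)
    return True

neighbors = {"WA": {"NT", "SA"}, "NT": {"WA", "SA", "Q"},
             "SA": {"WA", "NT", "Q"}, "Q": {"NT", "SA"}}
domains = {"WA": {"red"}, "Q": {"green"},
           "NT": {"red", "green", "blue"}, "SA": {"red", "green", "blue"}}
print(ac3(domains, neighbors))   # False: NT and SA would both be forced to blue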
Example: 4-Queens Problem
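The 4-Queens figure is not reproduced here. As a CSP it has one variable per column (the row of that column's queen), domain {1, 2, 3, 4}, and constraints that no two queens share a row or a diagonal; the short sketch below simply enumerates the solutions.

from itertools import permutations

def safe(rows):
    # rows[c] is the row of the queen in column c; permutations already give
    # distinct rows, so only the diagonal constraints remain to be checked.
    return all(abs(rows[i] - rows[j]) != j - i
               for i in range(len(rows)) for j in range(i + 1, len(rows)))

print([rows for rows in permutations(range(1, 5)) if safe(rows)])
# [(2, 4, 1, 3), (3, 1, 4, 2)]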
Cryptarithmetic puzzles
Solve the cryptarithmetic problem shown in the figure using the strategy of backtracking with forward
checking and the MRV and least-constraining-value heuristics.
A cryptarithmetic problem. Each letter stands for a distinct digit; the aim is to find a substitution of
digits for letters such that the resulting sum is arithmetically correct, with the added restriction that no
leading zeroes are allowed.
Alldiff (F,T,U,W,R,O)
C_1, C_2, and C_3 are auxiliary variables representing the digit carried over into the tens, hundreds, or
thousands column. The carries can take the values {0,1}
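The formulation above corresponds to TWO + TWO = FOUR. A systematic backtracking search with MRV/LCV and the carry variables C_1, C_2, C_3 would examine far fewer assignments, but the brute-force sketch below (an illustration, not the intended solution method) shows what a satisfying substitution looks like.

from itertools import permutations

for T, W, O, F, U, R in permutations(range(10), 6):
    if T == 0 or F == 0:                    # no leading zeroes allowed
        continue
    two = 100 * T + 10 * W + O
    four = 1000 * F + 100 * O + 10 * U + R
    if two + two == four:
        print(f"{two} + {two} = {four}")    # one valid substitution, e.g. 734 + 734 = 1468
        break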
Consistent or Legal Assignment: An assignment is called consistent or legal if it does not violate
any constraints.
Complete Assignment: An assignment in which every variable is given a value; if it is also
consistent, it is a solution of the CSP.
Partial Assignment: An assignment that gives values to only some of the variables; such assignments
are also called incomplete assignments.
INTELLIGENT AGENTS
• Adopt the view that intelligence is concerned mainly with rational action.
• Ideally, an intelligent agent takes the best possible action in a situation.
• We study the problem of building agents that are intelligent in this sense.
• The concept of rationality can be applied to a wide variety of agents operating in any
imaginable environment.
AGENTS AND ENVIRONMENTS
An agent is anything that can be
viewed as perceiving its environment
through sensors and acting upon that
environment through actuators.
• A human agent has eyes, ears, and other organs for sensors and hands, legs, vocal tract, and
so on for actuators.
• A robotic agent might have cameras and infrared range finders for sensors and various
motors for actuators.
• A software agent receives keystrokes, file contents, and network packets as sensory inputs
and acts on the environment by displaying on the screen, writing files, and sending
network packets.
Percept
• The term percept refers to the agent’s perceptual inputs at any given instant.
• An agent’s percept sequence is the complete history of everything the agent has ever
perceived.
• An agent’s choice of action at any given instant can depend on the entire percept
sequence observed to date, but not on anything it hasn’t perceived.
• The various vacuum-world agents can be defined simply by filling in the right-hand
column in various ways.
• The obvious question, then, is this:
• What is the right way to fill out the table?
• In other words, what makes an agent good or bad, intelligent or stupid?
Agent Function and Agent Program
• Mathematically, an agent’s behavior is described by the agent function that maps
any given percept sequence to an action: [f: P* → A]
• Tabulating the agent function that describes any given agent would, for most agents,
yield a very large table—infinite, in fact, unless we place a bound on the length of
percept sequences we want to consider.
• Given an agent to experiment with, we can construct this table by trying out all possible
percept sequences and recording which actions the agent does in response.
• The table is, of course, an external characterization of the agent.
• Internally, the agent function for an artificial agent will be implemented by an
agent program.
• The agent program runs on the physical architecture to produce f:
Agent = architecture + program
• It is important to keep these two ideas distinct: the agent function is an abstract
mathematical description; the agent program is a concrete implementation, running within
some physical system.
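A small sketch of this distinction (percept and action names are assumptions for illustration): the agent function is the unbounded table, while the agent program is the finite code that produces the same behavior.

percepts = []                 # the percept sequence observed so far

# A few rows of the tabulated agent function for a vacuum-world agent.
table = {
    (("A", "Dirty"),): "Suck",
    (("A", "Clean"),): "Right",
    (("B", "Dirty"),): "Suck",
    (("B", "Clean"),): "Left",
    (("A", "Dirty"), ("A", "Clean")): "Right",
    # ... the complete table is unbounded unless percept sequences are bounded.
}

def table_driven_agent(percept):
    # Agent program that implements the agent function by looking up the whole history.
    percepts.append(percept)
    return table.get(tuple(percepts))

print(table_driven_agent(("A", "Dirty")))   # Suck
print(table_driven_agent(("A", "Clean")))   # Right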
Performance Measure
Good Behavior: The Concept Of Rationality
• A rational agent is one that does the right thing—every entry in the table for
the agent function is filled out correctly.
• Obviously, doing the right thing is better than doing the wrong thing, but
what does it mean to do the right thing?
• By considering the consequences of the agent’s behavior.
Omniscience
• The state of knowing everything
• An omniscient agent knows the actual outcome of its actions and can act accordingly; but
omniscience is impossible in reality.
Learning
• Doing actions in order to modify future percepts—sometimes called information
gathering
• A rational agent should not only gather information but also learn as much as possible from
what it perceives.
• The agent’s initial configuration could reflect some prior knowledge of the environment,
but as the agent gains experience this may be modified and augmented.
• There are extreme cases in which the environment is completely known a priori.
• In such cases, the agent need not perceive or learn; it simply acts correctly.
• Such agents are fragile.
• Successful agents split the task of computing the agent function into three different
periods:
• When the agent is being designed, some of the computation is done by its designers;
when it is deliberating on its next action, the agent does more computation; and as it
learns from its experience, it does even more computation to decide how to modify its
behavior.
Autonomy
• If an agent relies on the prior knowledge of its designer rather than on its own percepts,
we say that the agent lacks autonomy.
• A rational agent should be autonomous—it should learn what it can to compensate for
partial or incorrect prior knowledge.
• An agent seldom requires complete autonomy from the start: when the agent has had little or
no experience, it would have to act randomly unless the designer gave some assistance.
• It would be reasonable to provide an artificial intelligent agent with some initial
knowledge as well as an ability to learn.
• After sufficient experience of its environment, the behavior of a rational agent can
become effectively independent of its prior knowledge.
• Hence, the incorporation of learning allows one to design a single rational agent that will
succeed in a vast variety of environments.
THE NATURE OF ENVIRONMENTS
• Task environments, which are essentially the “problems” to which rational agents are
the “solutions”.
• Specifying the task environment
• The performance measure, the environment, and the agent’s actuators and sensors are
grouped under the heading of the task environment. Acronymically PEAS (Performance,
Environment, Actuators, Sensors)
• In designing an agent, the first step must always be to specify the task environment as
fully as possible.
• List of agent types includes some programs that operate in the entirely artificial
environment defined by keyboard input and character output on a screen.
• In fact, what matters is not the distinction between “real” and “artificial” environments,
but the complexity of the relationship among the behavior of the agent, the percept
sequence generated by the environment, and the performance measure.
• Some “real” environments are actually quite simple.
• In contrast, some software agents (or software robots or softbots) exist in rich, unlimited
domains.
TYPES OF ENVIRONMENTS
Fully observable vs. partially observable:
• If an agent’s sensors give it access to the complete state of the environment at each
point in time, then we say that the task environment is fully observable.
• A task environment is effectively fully observable if the sensors detect all aspects that
are relevant to the choice of action; relevance, in turn, depends on the performance
measure.
• Fully observable environments are convenient because the agent need not maintain
any internal state to keep track of the world.
• An environment might be partially observable because of noisy and inaccurate sensors
or because parts of the state are simply missing from the sensor data.
• If the agent has no sensors at all then the environment is unobservable.
• One might think that in such cases the agent’s plight is hopeless, but, the agent’s goals
may still be achievable, sometimes with certainty.
Single agent vs. multi agent:
• The distinction between single-agent and multiagent environments may seem simple
enough.
• For example, an agent solving a crossword puzzle by itself is clearly in a single-agent
environment, whereas an agent playing chess is in a two-agent environment.
• Chess is a competitive multiagent environment.
• In the taxi-driving environment, avoiding collisions maximizes the performance
measure of all agents, so it is a partially cooperative multiagent environment.
• It is also partially competitive because, for example, only one car can occupy a parking
space.
• The agent-design problems in multiagent environments are often quite different from
those in single-agent environments; for example, communication often emerges as a
rational behavior in multiagent environments; in some competitive environments,
randomized behavior is rational because it avoids the pitfalls of predictability
• Episodic vs. sequential:
• In an episodic task environment, the agent’s experience is divided into atomic
episodes.
• In each episode the agent receives a percept and then performs a single action.
• Crucially, the next episode does not depend on the actions taken in previous
episodes. Many classification tasks are episodic. For example, an agent that has to
spot defective parts on an assembly line bases each decision on the current part,
regardless of previous decisions; moreover, the current decision doesn’t affect
whether the next part is defective.
• In sequential environments, on the other hand, the current decision could affect all
future decisions.
• Chess and taxi driving are sequential: in both cases, short-term actions can have
long-term consequences.
• Episodic environments are much simpler than sequential environments because the
agent does not need to think ahead.
Static vs. dynamic:
• If the environment can change while an agent is deliberating, then we say the
environment is dynamic for that agent; otherwise, it is static.
• Static environments are easy to deal with because the agent need not keep looking at
the world while it is deciding on an action, nor need it worry about the passage of time.
• Dynamic environments, on the other hand, are continuously asking the agent what it
wants to do; if it hasn’t decided yet, that counts as deciding to do nothing.
• If the environment itself does not change with the passage of time but the agent’s
performance score does, then we say the environment is semi-dynamic.
• Taxi driving is clearly dynamic: the other cars and the taxi itself keep moving while
the driving algorithm dithers about what to do next.
• Chess, when played with a clock, is semi-dynamic.
• Crossword puzzles are static.
Discrete vs. continuous:
• The discrete/continuous distinction applies to the state of the environment, to the way
time is handled, and to the percepts and actions of the agent.
• For example, the chess environment has a finite number of distinct states (excluding the
clock).
• Chess also has a discrete set of percepts and actions.
• Taxi driving is a continuous-state and continuous-time problem: the speed and location
of the taxi and of the other vehicles sweep through a range of continuous values and do
so smoothly over time.
• Taxi-driving actions are also continuous (steering angles, etc.). Input from digital
cameras is discrete, strictly speaking, but is typically treated as representing continuously
varying intensities and locations
Known vs. unknown
• Strictly speaking, this distinction refers not to the environment itself but to the agent’s
(or designer’s) state of knowledge about the “laws of physics” of the environment.
• In a known environment, the outcomes (or outcome probabilities if the environment is
stochastic) for all actions are given.
• Obviously, if the environment is unknown, the agent will have to learn how it works in
order to make good decisions.
• Note that the distinction between known and unknown environments is not the same as
the one between fully and partially observable environments.
• It is quite possible for a known environment to be partially observable—for example, in
solitaire card games, we know the rules but are still unable to see the cards that have not yet
been turned over.
• Conversely, an unknown environment can be fully observable—in a new video game,
the screen may show the entire game state, but we still don’t know what the buttons do until
we try them.
THE STRUCTURE OF AGENTS
• We can describe agents by their behavior—the action that is performed after any given sequence
of percepts.
• The job of AI is to design an agent program that implements the agent function— the
mapping from percepts to actions.
• We assume this program will run on some sort of computing device with physical
sensors and actuators—we call this the architecture
agent = architecture + program .
• Obviously, the program chosen has to be one that is appropriate for the architecture.
• If the program is going to recommend actions like Walk, the architecture had better
have legs. The architecture might be just an ordinary PC, or it might be a robotic car
with several onboard computers, cameras, and other sensors.
• In general, the architecture makes the percepts from the sensors available to the
program, runs the program, and feeds the program’s action choices to the actuators as
they are generated.
Agent programs
• Four basic kinds of agent programs that embody the principles underlying almost all
intelligent systems:
• Simple reflex agents
• Model-based reflex agents
• Goal-based agents
• Utility-based agents
• The agent programs take the current percept as input from the sensors and return an action
to the actuators.
• Note the difference from the agent function, which takes the entire percept history: the agent
program takes just the current percept as input because nothing more is available from the environment;
• if the agent’s actions need to depend on the entire percept sequence, the agent will have to
remember the percepts.
• Each kind of agent program combines particular components in particular ways to generate
actions
Simple reflex agents
• The simplest kind of agent is the simple reflex agent. These agents select actions on the
basis of the current percept, ignoring the rest of the percept history.
• For example, the vacuum agent is a simple reflex agent, because its decision is based only on the
current location and on whether that location contains dirt (a small sketch of such an agent
appears after this list).
• Notice that the vacuum agent program is very small indeed compared to the
corresponding table.
• The most obvious reduction comes from ignoring the percept history, which cuts down the
number of possibilities from 4^T to just 4.
• A further, small reduction comes from the fact that when the current square is dirty, the
action does not depend on the location.
• Simple reflex behaviors occur even in more complex environments.
• A simple reflex agent is the runt of the litter.
• It has very limited intelligence and operates on a direct condition-action rule.
• These rule-based agents aren’t suited for complex tasks. However, they’re perfectly adept at the
specific tasks they’re designed for.
• Simple reflex agents are suited for straightforward tasks in a predictable environment. This kind of
agent’s actions affect the world around it, but only in specific tasks.
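As referenced above, here is a minimal sketch of a simple reflex agent for the vacuum world (percept and action names are assumptions): it looks only at the current percept and applies condition-action rules, ignoring the percept history.

def reflex_vacuum_agent(percept):
    location, status = percept          # current percept only, no history
    if status == "Dirty":               # condition-action rules
        return "Suck"
    return "Right" if location == "A" else "Left"

print(reflex_vacuum_agent(("A", "Dirty")))   # Suck
print(reflex_vacuum_agent(("B", "Clean")))   # Left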
Thermostats
It’s 6pm in the winter? Crank that heat up. It’s noon in the summer? This simple reflex agent, with its
limited intelligence, will turn on the AC.
Automatic doors
While its perceived intelligence is low, automatic doors are often examples of simple reflex agents. This
AI agent senses a human in front of a door, and it opens. Beautifully simple.
Smoke detectors
This AI agent operates from your kitchen ceiling. Yep, it’s a simple reflex agent, too. It senses smoke,
and it sounds an alarm.
Basic spam filters
Some agents in artificial intelligence have been helping us daily for years. The email spam filter is one
of these. Basic versions don’t use natural language processing, but rather keywords or the sender’s
reputation.
Model-based reflex agents
• The most effective way to handle partial observability is for the agent to keep track of the
part of the world it can’t see now. That is, the agent should maintain some sort of internal
state that depends on the percept history and thereby reflects at least some of the
unobserved aspects of the current state.
• For the braking problem, the internal state is not too extensive: just the previous frame from
the camera, allowing the agent to detect when two red lights at the edge of the vehicle go
on or off simultaneously.
• For other driving tasks such as changing lanes, the agent needs to keep track of where the
other cars are if it can’t see them all at once. And for any driving to be possible at all, the
agent needs to keep track of where its keys are.
• Updating this internal state information as time goes by requires two kinds of knowledge
to be encoded in the agent program.
• First, we need some information about how the world evolves independently of the agent
—for example, that an overtaking car generally will be closer behind than it was a moment
ago.
• Second, we need some information about how the agent’s own actions affect the world—
for example, that when the agent turns the steering wheel clockwise, the car turns to the
right, or that after driving for five minutes northbound on the freeway, one is usually about
five miles north of where one was five minutes ago.
• This knowledge about “how the world works”—whether implemented in simple Boolean
circuits or in complete scientific theories—is called a model of the world. An agent that
uses such a model is called a model-based agent (a structural sketch appears at the end of this list).
• When you need to adapt to information that isn’t always visible or predictable, model-based
reflex agents are the tool to use.
• Unlike simple reflex agents that react solely based on current perceptions, model-based reflex
agents maintain an internal state that allows them to predict partially observable environments.
This is an internal model of the section of the world relevant to their duties.
• This model is constantly updated with incoming data from their environment, so that the AI
agent can make inferences about unseen parts of the environment and anticipate future
conditions.
• They assess the potential outcomes of their actions before making decisions, allowing them to
handle complications. This is especially useful when doing complex tasks, like driving a car in
a city, or managing an automated smart home system.
• Because of their ability to combine past knowledge and real-time data, model-based reflex
agents can optimize their performance, no matter the task. Like a human, they can make
context-aware decisions, even when the conditions are unpredictable.
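The structural sketch referenced earlier in this list is shown below. The "model" here is only a placeholder that remembers the latest percept; the class and rule names are illustrative assumptions.

class ModelBasedReflexAgent:
    def __init__(self):
        self.state = {}                  # internal state: best guess about the world

    def update_state(self, percept):
        # "How the world evolves" and "what my actions do" would be encoded here;
        # this placeholder simply merges the newest percept into the state.
        self.state.update(percept)

    def rule_match(self):
        # Condition-action rules applied to the inferred state, not the raw percept.
        if self.state.get("car_ahead_braking"):
            return "initiate_braking"
        return "keep_driving"

    def __call__(self, percept):
        self.update_state(percept)
        return self.rule_match()

agent = ModelBasedReflexAgent()
print(agent({"car_ahead_braking": True}))    # initiate_braking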
Autonomous Vehicles
Even though these cars span multiple types of intelligent agents, they’re a good example of model-
based reflex agents.
Complex systems like traffic and pedestrian movements are exactly the kind of challenge that model-
based reflex agents are designed for.
Their internal model is used to make real-time decisions on the road, like braking when another car
runs a red light, or slowing down rapidly when the car ahead does the same. Their internal system is
constantly updating based on their environmental inputs: other cars, activity at crosswalks, the
weather.
Modern irrigation systems
Model-based reflex agents are the powerhouse behind modern irrigation systems. Their ability to
respond to unexpected environmental feedback is perfectly suited for weather and soil moisture levels.
The AI agent’s internal model represents and predicts various environmental factors, like soil moisture
levels, weather conditions, and plant water requirements.
These agents continuously collect data from sensors in their fields, including real-time information on
humidity, temperature, and precipitation.
By analyzing this data, the model-based reflex agent can make informed decisions about when to
water, how much water to dispense, and which zones of a field require more attention. This predictive
capability allows the irrigation system to optimize water usage, ensuring that plants receive exactly
what they need to thrive (without wasting water).
Home automation systems
The internal model here is that of a home’s environment – these systems are continuously updated with
data from sensors, and use this information to inform their decisions.
A thermostat will detect changing temperatures and configure as needed. Or a lighting system might
detect darkness outdoors and adjust accordingly – since this darkness might come from nighttime, or
from an unexpected thunderstorm, it requires an intelligent agent to both anticipate and react to
differences.
Goal-based agents
• Knowing something about the current state of the environment is not always enough to
decide what to do.
• For example, at a road junction, the taxi can turn left, turn right, or go straight on.
• The correct decision depends on where the taxi is trying to get to.
• In other words, as well as a current state description, the agent needs some sort of goal
information that describes situations that are desirable—for example, being at the
passenger’s destination.
• The agent program can combine this with the model (the same information as was used in
the model based reflex agent) to choose actions that achieve the goal.
• Sometimes goal-based action selection is straightforward—
for example, when goal satisfaction results immediately
from a single action.
• Sometimes it will be more tricky—for example, when the
agent has to consider long sequences of twists and turns in
order to find a way to achieve the goal.
• Search and planning are the subfields of AI devoted to
finding action sequences that achieve the agent’s goals.
• Goal-based AI agents are designed to achieve specific goals with artificial intelligence.
• Instead of just responding to stimuli, these rational agents are capable of considering the future
consequences of their actions, so they can make strategic decisions to reach their goals.
• Unlike simple reflex agents, which respond directly to stimuli based on condition-action rules, goal-
based agents evaluate and plan actions to meet their goals.
• What makes them distinct from other types of intelligent agents is their ability to combine foresight
and strategic planning to navigate towards specific outcomes.
Roomba- Robotic vacuum cleaners are designed with a specific goal: clean all accessible floor space.
This goal-based agent has a simple goal, and it does it well.
All the decisions made by this goal-based agent (like when to rotate) are made in pursuit of this lofty
goal. The cats that sit on top of them are just a bonus.
Project Management Software
While it may also use a utility-based agent, project management software usually focuses on achieving
a specific project objective.
These AI agents will often schedule tasks and allocate resources so that a team is optimized to complete
a project on time. The agent evaluates the most likely course of success and actions it on behalf of a
team.
Video Game AI
In strategy and role-playing games, AI characters act as goal-based agents – their objectives might
range from defending a location to defeating an opponent.
These dolled-up AI agents consider a variety of strategies and resources – which attack to use, which
power-up to burn – so that they can achieve their goal.
Utility-based agents
• Goals alone are not enough to generate high-quality behavior in most environments.
• For example, many action sequences will get the taxi to its destination (thereby achieving
the goal) but some are quicker, safer, more reliable, or cheaper than others.
• Goals just provide a crude binary distinction between “happy” and “unhappy” states.
• A more general performance measure should allow a comparison of different world states
according to exactly how happy they would make the agent.
• Because “happy” does not sound very scientific, economists and computer scientists use
the term utility instead
• Performance measure assigns a score to any given
sequence of environment states, so it can easily distinguish
between more and less desirable ways of getting to the
taxi’s destination.
• An agent’s utility function is essentially an internalization
of the performance measure. If the internal utility function
and the external performance measure are in agreement,
then an agent that chooses actions to maximize its utility
will be rational according to the external performance
measure.
• Like goal-based agents, a utility-based agent has many advantages in terms of flexibility and learning.
• Furthermore, in two kinds of cases, goals are inadequate but a utility-based agent can still
make rational decisions.
• First, when there are conflicting goals, only some of which can be achieved (for
example, speed and safety), the utility function specifies the appropriate tradeoff.
• Second, when there are several goals that the agent can aim for, none of which can
be achieved with certainty, utility provides a way in which the likelihood of success can
be weighed against the importance of the goals.
• Partial observability and stochasticity are ubiquitous in the real world, and so, therefore, is
decision making under uncertainty.
• Technically speaking, a rational utility-based agent chooses the action that maximizes the
expected utility of the action outcomes—that is, the utility the agent expects to derive, on
average, given the probabilities and utilities of each outcome (a small numeric sketch appears
at the end of this list).
• Any rational agent must behave as if it possesses a utility function whose expected value
it tries to maximize.
• An agent that possesses an explicit utility function can make rational decisions with a
general-purpose algorithm that does not depend on the specific utility function being
maximized.
• In this way, the “global” definition of rationality—designating as rational those agent
functions that have the highest performance—is turned into a “local” constraint on
rational-agent designs that can be expressed in a simple program.
• Unlike simpler agents that might merely react to environmental stimuli, utility-based agents
evaluate their potential actions based on the expected utility. They’ll predict how useful or
beneficial each option is in regards to their set goal.
• Utility-based agents excel in complex decision-making environments with multiple potential
outcomes – like balancing different risks in order to make investment decisions, or weigh side
effects of treatment options.
• The utility function of these intelligent agents is a mathematical representation of its preferences.
The utility function maps to the world around it, deciding and ranking which option is the most
preferable. Then a utility agent can choose the optimal action.
• Since they can process large amounts of data, they’re useful in any field that involves high-stakes
decision-making.
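The numeric sketch referenced above shows the expected-utility rule on two invented taxi actions; the probabilities and utilities are made up purely for illustration.

actions = {
    # action: list of (probability, utility) pairs over its possible outcomes
    "fast_route": [(0.5, 10), (0.5, -8)],    # quicker, but with a risk of a jam
    "safe_route": [(1.0, 6)],                # slower, but certain
}

def expected_utility(outcomes):
    return sum(p * u for p, u in outcomes)

for action, outcomes in actions.items():
    print(action, expected_utility(outcomes))          # fast_route 1.0 / safe_route 6.0
print(max(actions, key=lambda a: expected_utility(actions[a])))   # safe_route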
Financial Trading
Utility-based agents are well-suited for stock and cryptocurrency markets – they’re able to buy or sell
based on algorithms that aim to maximize financial returns or minimize losses. This type of utility
function can take into account both historical data and real-time market data.
Dynamic Pricing Systems
Ever paid extra for an Uber or Lyft in the rain? That’s a utility-based agent at work – they can adjust
prices in real-time for flights, hotels, or ride-sharing, based on demand, competition, or time of
booking.
Smart Grid Controllers
These types of intelligent agents are the ‘smart’ in smart grids: it’s utility-based agents that control the
distribution and storage of electricity.
They optimize the use of resources based on demand forecasts and energy prices to improve
efficiency and reduce costs.
Personalized Content Recommendations
You finish watching a movie and Netflix recommends 3 more just like it.
Streaming services like Netflix and Spotify use utility-based agents to suggest similar content to
users. The optimized utility here is how likely you are to click on it.
Learning agents
• A learning agent can be divided into four conceptual components
• The most important distinction is between the learning element, which is responsible for
making improvements, and the performance element, which is responsible for selecting
external actions.
• The performance element is what we have previously considered to be the entire agent: it
takes in percepts and decides on actions.
• The learning element uses feedback from the critic on how the agent is doing and determines
how the performance element should be modified to do better in the future.
• The design of the learning element depends very much on the design of the performance
element.
• When trying to design an agent that learns a certain capability, the first question is not
“How am I going to get it to learn this?” but “What kind of performance element will my
agent need to do this once it has learned how?”
• Given an agent design, learning mechanisms can be constructed to improve every part of
the agent.
• The critic tells the learning element how well the agent is doing with respect to a fixed
performance standard. The critic is necessary because the percepts themselves provide no
indication of the agent’s success.
• For example, a chess program could receive a percept indicating that it has checkmated its
opponent, but it needs a performance standard to know that this is a good thing; the
percept itself does not say so.
• It is important that the performance standard be fixed. Conceptually, one should think of it
as being outside the agent altogether because the agent must not modify it to fit its own
behavior.
• The last component of the learning agent is the problem generator.
• It is responsible for suggesting actions that will lead to new and informative experiences.
• The point is that if the performance element had its way, it would keep doing the actions
that are best, given what it knows.
• But if the agent is willing to explore a little and do some perhaps suboptimal actions in the
short run, it might discover much better actions for the long run.
• The problem generator’s job is to suggest these exploratory actions. This is what scientists
do when they carry out experiments.
• Learning agents stand out due to their ability to adapt and improve over time based on their
experiences.
• Unlike more static AI agents that operate solely on pre-programmed rules or models, a learning
agent can evolve its behavior and strategies. Because of this learning element, they’re most often
used in changing environments.
Fraud Detection
Fraud detection systems operate by continuously collecting data and then adjusting to recognize
fraudulent patterns more effectively. Since scammers are always changing their tactics, fraud detection
agents need to keep adapting, too.
Content Recommendation
Platforms like Netflix and Amazon use a system equipped with a learning agent to improve their
recommendations for movies, shows, and products.
Even if your profile says you should like horror and thriller movies, if you suddenly switch to rom-
coms, your recommendations will adapt. Just like us, it’s always learning.
Speech Recognition Software
Applications like Google Assistant and Siri make use of a learning agent to better understand our
garbled attempts to speak to them.
It’s thanks to learning agents that these systems get better at understanding accents and slang – so we
can ask Siri things like, “Och, Siri, can ye find me the nearest chippy for some supper? I'm pure
peckish!"
Adaptive Thermostats
Even smart thermostats – like Nest – learn from user behavior, like when users tend to be home or
away, and their preferred temperatures.
This information might always be changing, so thermostats must be able to adapt over time – this
makes them another example of a learning agent.
Solving Problems by Searching
• A problem-solving agent is one kind of goal-based agent.
• Problem-solving agents use atomic representations—that is, states of the world are
considered as wholes, with no internal structure visible to the problem solving algorithms.
• Goal-based agents that use more advanced factored or structured representations are
usually called planning agents.
• Problem solving begins with precise definitions of problems and their solutions
• uninformed search algorithms—algorithms that are given no information about the
problem other than its definition. Although some of these algorithms can solve any
solvable problem, none of them can do so efficiently.
• Informed search algorithms, can do quite well given some guidance on where to look
for solutions.
PROBLEM-SOLVING AGENTS
Formulation
Goal formulation, based on the current situation and the agent’s performance measure, is
the first step in problem solving.
• Imagine an agent in the city of Arad, Romania, enjoying a touring holiday.
• The agent’s performance measure contains many factors: it wants to improve its suntan,
improve its Romanian, take in the sights, enjoy the nightlife (such as it is), avoid
hangovers, and so on.
• The decision problem is a complex one involving many tradeoffs and careful reading of
guidebooks.
• Now, suppose the agent has a nonrefundable ticket to fly out of Bucharest the following
day. In that case, it makes sense for the agent to adopt the goal of getting to Bucharest.
• Courses of action that don’t reach Bucharest on time can be rejected without further
consideration and the agent’s decision problem is greatly simplified.
• Goals help organize behavior by limiting the objectives that the agent is trying to
achieve and hence the actions it needs to consider.
Problem formulation is the process of deciding what actions and states to consider,given a
goal.
• Consider a goal to be a set of world states—exactly those states in which the goal is
satisfied.
• The agent’s task is to find out how to act, now and in the future, so that it reaches a goal
state.
• Before it can do this, it needs to decide (or we need to decide on its behalf) what sorts of
actions and states it should consider.
• If it were to consider actions at the level of “move the left foot forward an inch” or “turn
the steering wheel one degree left,” the agent would probably never find its way out of
the parking lot, let alone to Bucharest, because at that level of detail there is too much
uncertainty in the world and there would be too many steps in a solution.
• Let us assume that the agent will consider actions at the level of driving from one major
town to another. Each state therefore corresponds to being in a particular town