AI-unit-4
Planning
Classical Planning: Definition of Classical Planning, Algorithms for Planning with State
Space Search, Planning Graphs, Other Classical Planning Approaches, Analysis of Planning
Approaches.
Planning and Acting in the Real World: Time, Schedules, and Resources, Hierarchical Planning,
Planning and Acting in Nondeterministic Domains, Multiagent Planning
Classical Planning: AI has been defined as the study of rational action, which means that planning, devising a plan of action to achieve one's goals, is a critical part of AI. We have seen two examples of planning agents so far: the search-based problem-solving agent and the hybrid propositional logical agent.
DEFINITION OF CLASSICAL PLANNING: The problem-solving agent can find sequences of actions that result in a goal state, but it deals with atomic representations of states and thus needs good domain-specific heuristics to perform well. The hybrid propositional logical agent can find plans without domain-specific heuristics because it uses domain-independent heuristics based on the logical structure of the problem, but it relies on ground (variable-free) propositional inference, which means that it may be swamped when there are many actions and states. For example, in the wumpus world, the simple action of moving a step forward had to be repeated for all four agent orientations, T time steps, and n^2 current locations.
In response to this, planning researchers have settled on a factored representation, one in which a state of the world is represented by a collection of variables. We use a language called PDDL, the Planning Domain Definition Language, which allows us to express all 4Tn^2 actions with one action schema. There have been several versions of PDDL; we select a simple version and alter its syntax to be consistent with the rest of the book. We now show how PDDL describes the four things we need to define a search problem: the initial state, the actions that are available in a state, the result of applying an action, and the goal test.
Each state is represented as a conjunction of fluents that are ground, functionless atoms. For example, Poor ∧ Unknown might represent the state of a hapless agent, and a state in a package delivery problem might be At(Truck1, Melbourne) ∧ At(Truck2, Sydney).
Database semantics is used: the closed-world assumption means that any fluents that are not mentioned are false, and the unique names assumption means that Truck1 and Truck2 are distinct.
A set of ground (variable-free) actions can be represented by a single action schema. The schema is a
lifted representation—it lifts the level of reasoning from propositional logic to a restricted subset of
first-order logic. For example, here is an action schema for flying a plane from one location to another:
Action(Fly(p, from, to),
  PRECOND: At(p, from) ∧ Plane(p) ∧ Airport(from) ∧ Airport(to)
  EFFECT: ¬At(p, from) ∧ At(p, to))
The schema consists of the action name, a list of all the variables used in the schema, a precondition, and an effect.
A set of action schemas serves as a definition of a planning domain. A specific problem within the
domain is defined with the addition of an initial state and a goal.
Consider an air cargo transport problem involving loading and unloading cargo and flying it from place to place. The problem can be defined with three actions: Load, Unload, and Fly. The actions affect two
predicates: In(c, p) means that cargo c is inside plane p, and At(x, a) means that object x (either plane
or cargo) is at airport a. Note that some care must be taken to make sure the At predicates are
maintained properly. When a plane flies from one airport to another, all the cargo inside the plane
goes with it. In first-order logic it would be easy to quantify over all objects that are inside the plane.
But basic PDDL does not have a universal quantifier, so we need a different solution. The approach we
use is to say that a piece of cargo ceases to be At anywhere when it is In a plane; the cargo only
becomes At the new airport when it is unloaded. So At really means “available for use at a given
location.”
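To make the state-transition semantics concrete, here is a small illustrative Python sketch (it is not PDDL itself and not from the text above; the constants C1, P1, SFO, and JFK are invented for the example). It represents ground actions by their preconditions, add lists, and delete lists, and applies them to a state represented as a set of fluent strings:

from collections import namedtuple

Action = namedtuple("Action", ["name", "precond", "add", "delete"])

def applicable(action, state):
    # An action is applicable in a state if all of its preconditions hold there.
    return action.precond <= state

def result(state, action):
    # RESULT(s, a) = (s - DEL(a)) | ADD(a)
    return frozenset((state - action.delete) | action.add)

# One ground instance of each air cargo action schema (hypothetical constants).
load = Action("Load(C1, P1, SFO)",
              precond={"At(C1, SFO)", "At(P1, SFO)"},
              add={"In(C1, P1)"},
              delete={"At(C1, SFO)"})
fly = Action("Fly(P1, SFO, JFK)",
             precond={"At(P1, SFO)"},
             add={"At(P1, JFK)"},
             delete={"At(P1, SFO)"})
unload = Action("Unload(C1, P1, JFK)",
                precond={"In(C1, P1)", "At(P1, JFK)"},
                add={"At(C1, JFK)"},
                delete={"In(C1, P1)"})

state = frozenset({"At(C1, SFO)", "At(P1, SFO)"})
for a in (load, fly, unload):
    assert applicable(a, state)
    state = result(state, a)
assert "At(C1, JFK)" in state   # the cargo has reached JFK

The result function implements the transition RESULT(s, a) = (s - DEL(a)) | ADD(a), which is the semantics assumed by the forward state-space search discussed below.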
We consider the theoretical complexity of planning and distinguish two decision problems. PlanSAT is
the question of whether there exists any plan that solves a planning problem. Bounded PlanSAT asks
whether there is a solution of length k or less; this can be used to find an optimal plan.
The first result is that both decision problems are decidable for classical planning. The proof follows
from the fact that the number of states is finite. But if we add function symbols to the language, then
the number of states becomes infinite, and PlanSAT becomes only semidecidable: an algorithm exists
that will terminate with the correct answer for any solvable problem, but may not terminate on
unsolvable problems. The Bounded PlanSAT problem remains decidable even in the presence of
function symbols.
Both PlanSAT and Bounded PlanSAT are in the complexity class PSPACE, a class that is larger (and hence
more difficult) than NP and refers to problems that can be solved by a deterministic Turing machine
with a polynomial amount of space. Even if we make some rather severe restrictions, the problems
remain quite difficult.
Algorithms for Planning with State-Space Search:
Now that we have shown how a planning problem maps into a search problem, we can solve planning
problems with any of the heuristic search algorithms from Chapter 3 or a local search algorithm from
Chapter 4 (provided we keep track of the actions used to reach the goal). From the earliest days of
planning research (around 1961) until around 1998 it was assumed that forward state-space search
was too inefficient to be practical. It is not hard to come up with reasons why.
First, forward search is prone to exploring irrelevant actions. Consider the noble task of buying a copy
of AI: A Modern Approach from an online bookseller. Suppose there is an action schema Buy(isbn) with
effect Own(isbn). ISBNs are 10 digits, so this action schema represents 10 billion ground actions; an uninformed forward search would have to consider all of them, even though only one is relevant to the goal of owning a particular book.
Second, planning problems often have large state spaces. Consider an air cargo problem with 10
airports, where each airport has 5 planes and 20 pieces of cargo. The goal is to move all the cargo at
airport A to airport B. There is a simple solution to the problem: load the 20 pieces of cargo into one
of the planes at A, fly the plane to B, and unload the cargo. Finding the solution can be difficult because
the average branching factor is huge: each of the 50 planes can fly to 9 other airports, and each of the
200 packages can be either unloaded (if it is loaded) or loaded into any plane at its airport (if it is
unloaded). So in any state there is a minimum of 450 actions (when all the packages are at airports
with no planes) and a maximum of 10,450 (when all packages and planes are at the same airport). On
average, let's say there are about 2000 possible actions per state, so the search graph up to the depth of the obvious solution (about 41 actions deep) has about 2000^41 nodes.
In regression search we start at the goal and apply the actions backward until we find a sequence of
steps that reaches the initial state. It is called relevant-states search because we only consider actions
that are relevant to the goal (or current state). As in belief-state search (Section 4.4), there is a set of
relevant states to consider at each step, not just a single state.
In general, backward search works only when we know how to regress from a state description to the
predecessor state description. For example, it is hard to search backwards for a solution to the n-
queens problem because there is no easy way to describe the states that are one move away from the
goal. Happily, the PDDL representation was designed to make it easy to regress actions—if a domain
can be expressed in PDDL, then we can do regression search on it.
The final issue is deciding which actions are candidates to regress over. In the forward direction we
chose actions that were applicable—those actions that could be the next step in the plan. In backward
search we want actions that are relevant—those actions that could be the last step in a plan leading
up to the current goal state.
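Continuing the same illustrative Python sketch (and handling only positive goal literals), relevance and regression over a ground action can be written directly from the definition g' = (g - ADD(a)) | PRECOND(a):

def relevant(action, goal):
    # Relevant: achieves at least one goal literal and deletes none of them.
    return bool(action.add & goal) and not (action.delete & goal)

def regress(goal, action):
    # Regressed goal description: g' = (g - ADD(a)) | PRECOND(a)
    return frozenset((goal - action.add) | action.precond)

goal = frozenset({"At(C1, JFK)"})
assert relevant(unload, goal)
print(regress(goal, unload))    # frozenset({'In(C1, P1)', 'At(P1, JFK)'})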
Neither forward nor backward search is efficient without a good heuristic function. Recall from Chapter
3 that a heuristic function h(s) estimates the distance from a state s to the goal and that if we can
derive an admissible heuristic for this distance—one that does not overestimate—then we can use A∗
search to find optimal solutions. An admissible heuristic can be derived by defining a relaxed problem
that is easier to solve. The exact cost of a solution to this easier problem then becomes the heuristic
for the original problem.
By definition, there is no way to analyze an atomic state, and thus it requires some ingenuity by a
human analyst to define good domain-specific heuristics for search problems with atomic states.
Planning uses a factored representation for states and action schemas. That makes it possible to define
good domain-independent heuristics and for programs to automatically apply a good domain-
independent heuristic for a given problem.
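As one hedged illustration of a relaxed-problem heuristic (reusing the Action tuples from the earlier sketch; this particular relaxation is not named in the text above), we can ignore delete lists and count how many layers of parallel action application are needed before every goal fluent becomes reachable. This count never overestimates the number of actions in a shortest real plan:

def relaxed_level_h(state, goal, actions):
    # Ignore delete lists and count the layers of (parallel) action
    # application needed before all goal fluents are reachable.
    # Returns None if the goal is unreachable even in the relaxed problem.
    reached, steps = set(state), 0
    while not goal <= reached:
        new = set()
        for a in actions:
            if a.precond <= reached:
                new |= a.add - reached
        if not new:
            return None
        reached |= new
        steps += 1
    return steps

print(relaxed_level_h(frozenset({"At(C1, SFO)", "At(P1, SFO)"}),
                      frozenset({"At(C1, JFK)"}),
                      [load, fly, unload]))
# 2  (the real shortest plan needs 3 actions, so the estimate is optimistic)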
Planning Graphs:
All of the heuristics we have suggested can suffer from inaccuracies. This section shows how a special
data structure called a planning graph can be used to give better heuristic estimates. These heuristics
can be applied to any of the search techniques we have seen so far. Alternatively, we can search for a
solution over the space formed by the planning graph, using an algorithm called GRAPHPLAN.
A planning problem asks if we can reach a goal state from the initial state. Suppose we are given a tree
of all possible actions from the initial state to successor states, and their successors, and so on. If we
indexed this tree appropriately, we could answer the planning question “can we reach state G from
state S0” immediately, just by looking it up. Of course, the tree is of exponential size, so this approach
is impractical. A planning graph is a polynomial-size approximation to this tree that can be constructed
quickly. The planning graph can’t answer definitively whether G is reachable from S0, but it can
estimate how many steps it takes to reach G. The estimate is always correct when it reports the goal is
not reachable, and it never overestimates the number of steps, so it is an admissible heuristic.
A planning graph is a directed graph organized into levels: first a level S0 for the initial state, consisting
of nodes representing each fluent that holds in S0; then a level A0 consisting of nodes for each ground
action that might be applicable in S0; then alternating levels Si followed by Ai; until we reach a
termination condition (to be discussed later).
Roughly speaking, a literal appears at level Si if it could be true after i steps. We say "roughly speaking" because the planning graph records only a restricted subset of the possible negative interactions among actions; therefore, a literal might show up at level Sj when actually it could not be true until a later level, if at all. (A literal will never show up too late.) Despite the possible error, the level j at which a literal first appears is a good estimate of how difficult it is to achieve the literal from the initial state.
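The level at which a literal first appears can be computed with a simple relaxed construction. The sketch below (reusing the earlier Action representation and ignoring both delete lists and mutexes, so it is only an approximation of a full planning graph) records the first state level Si at which each fluent shows up:

def literal_levels(init, actions, max_levels=10):
    # First level S_i at which each fluent appears, with delete lists and
    # mutexes ignored; literals persist from level to level, so persistence
    # actions are implicit in this sketch.
    level_of = {f: 0 for f in init}
    current = set(init)
    for i in range(1, max_levels + 1):
        nxt = set(current)
        for a in actions:
            if a.precond <= current:
                nxt |= a.add
        for f in nxt - current:
            level_of[f] = i
        if nxt == current:          # the graph has leveled off
            break
        current = nxt
    return level_of

print(literal_levels({"At(C1, SFO)", "At(P1, SFO)"}, [load, fly, unload]))
# In(C1, P1) and At(P1, JFK) first appear at level 1, At(C1, JFK) at level 2.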
We now define mutex links for both actions and literals. A mutex relation holds between two actions
at a given level if any of the following three conditions holds:
Inconsistent effects: one action negates an effect of the other. For example, Eat(Cake) and the
persistence of Have(Cake) have inconsistent effects because they disagree on the effect Have(Cake).
Interference: one of the effects of one action is the negation of a precondition of the other. For example, Eat(Cake) interferes with the persistence of Have(Cake) by negating its precondition.
Competing needs: one of the preconditions of one action is mutually exclusive with a precondition of
the other. For example, Bake(Cake) and Eat(Cake) are mutex because they compete on the value of the
Have(Cake) precondition.
A mutex relation holds between two literals at the same level if one is the negation of the other or if
each possible pair of actions that could achieve the two literals is mutually exclusive. This condition is
called inconsistent support. For example, Have(Cake) and Eaten(Cake) are mutex in S1 because the
only way of achieving Have(Cake), the persistence action, is mutex with the only way of achieving
Eaten(Cake), namely Eat(Cake). In S2 the two literals are not mutex, because there are new ways of
achieving them, such as Bake(Cake) and the persistence of Eaten(Cake), that are not mutex.
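The three action-mutex conditions translate almost directly into code. The following sketch (again reusing the earlier Action tuples; the explicit persistence action and the treatment of competing needs via a set of literal-mutex pairs are modeling choices of this sketch, not definitions from the text) checks whether two actions at the same level are mutex:

def actions_mutex(a1, a2, literal_mutexes=frozenset()):
    # Inconsistent effects: one action deletes a fluent the other adds.
    inconsistent_effects = bool((a1.add & a2.delete) | (a2.add & a1.delete))
    # Interference: one action deletes a precondition of the other.
    interference = bool((a1.delete & a2.precond) | (a2.delete & a1.precond))
    # Competing needs: some pair of preconditions was mutex at the previous
    # literal level (passed in here as a set of frozenset pairs).
    competing_needs = any(frozenset((p, q)) in literal_mutexes
                          for p in a1.precond for q in a2.precond)
    return inconsistent_effects or interference or competing_needs

eat = Action("Eat(Cake)", precond={"Have(Cake)"},
             add={"Eaten(Cake)"}, delete={"Have(Cake)"})
persist_have = Action("Persist(Have(Cake))", precond={"Have(Cake)"},
                      add={"Have(Cake)"}, delete=set())
print(actions_mutex(eat, persist_have))   # True: Eat deletes what the persistence preserves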
Other Classical Planning Approaches:
Currently the most popular and effective approaches to fully automated planning are: translating to a Boolean satisfiability (SAT) problem, forward state-space search with carefully crafted heuristics, and search using a planning graph. These three approaches are not the only ones tried in the 40-year history of automated planning. Figure 10.11 shows some of the top systems in the International Planning Competitions, which have been held every even year since 1998. In this section we first describe the translation to a satisfiability problem and then describe three other influential approaches: planning as first-order logical deduction; as constraint satisfaction; and as plan refinement.
We saw how SATPLAN solves planning problems that are expressed in propositional logic. Here we
show how to translate a PDDL description into a form that can be processed by SATPLAN. The
translation is a series of straightforward steps:
Define the initial state: assert F^0 for every fluent F in the problem's initial state, and ¬F^0 for every fluent not mentioned in the initial state.
Propositionalize the goal: for every variable in the goal, replace the literals that contain the variable with a disjunction over constants. For example, the goal of having block A on another block, On(A, x) ∧ Block(x), in a world with objects A, B, and C, would be replaced by the goal (On(A, A) ∧ Block(A)) ∨ (On(A, B) ∧ Block(B)) ∨ (On(A, C) ∧ Block(C)).
Add successor-state axioms: for each fluent F, add an axiom of the form
F^(t+1) ⇔ ActionCausesF^t ∨ (F^t ∧ ¬ActionCausesNotF^t),
where ActionCausesF is a disjunction of all the ground actions that have F in their add list, and ActionCausesNotF is a disjunction of all the ground actions that have F in their delete list.
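As a rough illustration of the translation (a sketch only: it builds the axiom as a readable string rather than the CNF clauses a real SAT solver would need, and the fluent and action names are taken from the earlier air cargo example), the successor-state axiom for one fluent at one time step might be generated like this:

def successor_state_axiom(fluent, t, causes, causes_not):
    # Build F^{t+1} <=> ActionCausesF^t v (F^t ^ ~ActionCausesNotF^t) as text,
    # where causes / causes_not are the ground actions with F in their
    # add list / delete list respectively.
    cause = " v ".join(f"{a}^{t}" for a in causes) or "False"
    cause_not = " v ".join(f"{a}^{t}" for a in causes_not) or "False"
    return f"{fluent}^{t + 1} <=> ({cause}) v ({fluent}^{t} ^ ~({cause_not}))"

print(successor_state_axiom("At(P1, JFK)", 0,
                            causes=["Fly(P1, SFO, JFK)"],
                            causes_not=["Fly(P1, JFK, SFO)"]))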
Analysis of Planning Approaches:
Planning combines the two major areas of AI we have covered so far: search and logic. A planner can
be seen either as a program that searches for a solution or as one that (constructively) proves the
existence of a solution. The cross-fertilization of ideas from the two areas has led both to
improvements in performance amounting to several orders of magnitude in the last decade and to an
increased use of planners in industrial applications. Unfortunately, we do not yet have a clear
understanding of which techniques work best on which kinds of problems. Quite possibly, new
techniques will emerge that dominate existing methods.
Sometimes it is possible to solve a problem efficiently by recognizing that negative interactions can be ruled out. We say that a problem has serializable subgoals if there exists an order of subgoals such that the planner can achieve them in that order without having to undo any of the previously achieved subgoals. For example, in the blocks world, if the goal is to build a tower (e.g., A on B, which in turn is on C, which in turn is on the Table, as in Figure 10.4), then the subgoals are serializable bottom to top: if we first achieve C on Table, we will never have to undo it while we are achieving the other subgoals. Planners such as GRAPHPLAN, SATPLAN, and FF have moved the field of planning forward by raising the level of performance of planning systems.
Planning and Acting in the Real World: Time, Schedules, and Resources
The classical planning representation talks about what to do, and in what order, but the representation
cannot talk about time: how long an action takes and when it occurs. For example, the planners of
Chapter 10 could produce a schedule for an airline that says which planes are assigned to which flights,
but we really need to know departure and arrival times as well. This is the subject matter of scheduling.
The real world also imposes many resource constraints; for example, an airline has a limited number
of staff—and staff who are on one flight cannot be on another at the same time. This section covers
methods for representing and solving planning problems that include temporal and resource
constraints.
The approach we take in this section is “plan first, schedule later”: that is, we divide the overall problem
into a planning phase in which actions are selected, with some ordering constraints, to meet the goals
of the problem, and a later scheduling phase, in which temporal information is added to the plan to
ensure that it meets resource and deadline constraints.
This approach is common in real-world manufacturing and logistical settings, where the planning
phase is often performed by human experts. The automated methods of Chapter 10 can also be used
for the planning phase, provided that they produce plans with just the minimal ordering constraints
required for correctness. GRAPHPLAN (Section 10.3), SATPLAN (Section 10.4.1), and partial-order
planners (Section 10.4.4) can do this; search-based methods (Section 10.2) produce totally ordered
plans, but these can easily be converted to plans with minimal ordering constraints.
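As a minimal sketch of the scheduling phase (assuming each action has a known duration and a set of predecessors from the plan's ordering constraints; the action names and durations below are invented for illustration), earliest start times can be computed by propagating through the partial order:

def earliest_start_times(durations, predecessors):
    # ES(a) = max over predecessors p of ES(p) + duration(p), computed in a
    # topological order; assumes the ordering constraints form a DAG.
    es = {}
    remaining = set(durations)
    while remaining:
        ready = [a for a in remaining
                 if all(p in es for p in predecessors.get(a, ()))]
        for a in ready:
            es[a] = max((es[p] + durations[p]
                         for p in predecessors.get(a, ())), default=0)
            remaining.remove(a)
    return es

# Hypothetical assembly example: durations in minutes.
durations = {"AddEngine": 30, "AddWheels": 30, "Inspect": 10}
predecessors = {"AddWheels": ["AddEngine"], "Inspect": ["AddWheels"]}
print(earliest_start_times(durations, predecessors))
# {'AddEngine': 0, 'AddWheels': 30, 'Inspect': 60}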
Hierarchical Planning:
The problem-solving and planning methods of the preceding chapters all operate with a fixed set of
atomic actions. Actions can be strung together into sequences or branching networks; state-of-the-art
algorithms can generate solutions containing thousands of actions.
For plans executed by the human brain, atomic actions are muscle activations. In very round numbers, we have about 10^3 muscles to activate (639, by some counts, but many of them have multiple subunits); we can modulate their activation perhaps 10 times per second; and we are alive and awake for about 10^9 seconds in all. Thus, a human life contains about 10^13 actions, give or take one or two orders of magnitude. Even if we restrict ourselves to planning over much shorter time horizons, say a two-week vacation in Hawaii, a detailed motor plan would contain around 10^10 actions. This is a lot more than 1000.
To bridge this gap, AI systems will probably have to do what humans appear to do: plan at higher levels
of abstraction. A reasonable plan for the Hawaii vacation might be “Go to San Francisco airport; take
Hawaiian Airlines flight 11 to Honolulu; do vacation stuff for two weeks; take Hawaiian Airlines flight
12 back to San Francisco; go home.” Given such a plan, the action “Go to San Francisco airport” can be
viewed as a planning task in itself, with a solution such as “Drive to the long-term parking lot; park;
take the shuttle to the terminal.” Each of these actions, in turn, can be decomposed further, until we
reach the level of actions that can be executed without deliberation to generate the required motor
control sequence.
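The idea of refinement can be sketched as follows; the refinement library and action names here are invented for illustration and are not a full HTN formalism. A high-level action is repeatedly replaced by one of its refinements until only primitive actions (those with no refinements) remain:

# A hypothetical library of refinements: each high-level action maps to a
# list of possible decompositions (sequences of lower-level actions).
refinements = {
    "TakeVacation": [["GoTo(SFO)", "Fly(SFO, HNL)", "DoVacationStuff",
                      "Fly(HNL, SFO)", "GoHome"]],
    "GoTo(SFO)": [["Drive(Home, SFOLongTermParking)",
                   "Shuttle(SFOLongTermParking, SFO)"],
                  ["Taxi(Home, SFO)"]],
}

def expand(action, choose=lambda options: options[0]):
    # Recursively replace high-level actions with a chosen refinement until
    # only primitive actions (those with no refinements) remain.
    if action not in refinements:
        return [action]                      # primitive action
    plan = []
    for step in choose(refinements[action]):
        plan.extend(expand(step, choose))
    return plan

print(expand("TakeVacation"))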
Planning and Acting in Nondeterministic Domains: While the basic concepts are the same as in
Chapter 4, there are also significant differences. These arise because planners deal with factored
representations rather than atomic representations. This affects the way we represent the agent’s
capability for action and observation and the way we represent belief states—the sets of possible
physical states the agent might be in—for unobservable and partially observable environments. We
can also take advantage of many of the domain-independent methods given in Chapter 10 for
calculating search heuristics.
Consider this problem: given a chair and a table, the goal is to have them match—have the same color.
In the initial state we have two cans of paint, but the colors of the paint and the furniture are unknown.
Only the table is initially in the agent’s field of view:
Init(Object(Table) ∧ Object(Chair) ∧ Can(C1) ∧ Can(C2) ∧ InView(Table))
Goal(Color(Chair, c) ∧ Color(Table, c))
There are two actions: removing the lid from a paint can and painting an object using the paint from
an open can. The action schemas are straightforward, with one exception: we now allow preconditions
and effects to contain variables that are not part of the action’s variable list. That is, Paint(x, can) does
not mention the variable c, representing the color of the paint in the can. In the fully observable case,
this is not allowed—we would have to name the action Paint(x, can, c). But in the partially observable
case, we might or might not know what color is in the can. (The variable c is universally quantified, just
like all the other variables in an action schema.)
Action(Paint(x, can),
  PRECOND: Object(x) ∧ Can(can) ∧ Color(can, c) ∧ Open(can)
  EFFECT: Color(x, c))
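To illustrate the belief-state view (a sketch only, reusing the applicable and result functions from the classical-planning sketch; the choice to drop worlds in which an action is inapplicable is just one possible convention), a belief state can be represented as a set of possible worlds, updated by actions and filtered by percepts:

def update_belief(belief, action):
    # Apply a deterministic ground action to every world in the belief state;
    # worlds where the action is inapplicable are simply dropped here.
    return frozenset(result(s, action) for s in belief if applicable(action, s))

def filter_by_percept(belief, fluent, observed=True):
    # Keep only the worlds consistent with an observation of the fluent.
    return frozenset(s for s in belief if (fluent in s) == observed)

# Two possible worlds: the paint in can C1 is either red or green.
belief = frozenset({frozenset({"Color(C1, Red)", "Object(Table)"}),
                    frozenset({"Color(C1, Green)", "Object(Table)"})})
print(filter_by_percept(belief, "Color(C1, Red)"))   # only the red world survives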
Multiagent Planning:
We have assumed that only one agent is doing the sensing, planning, and acting. When there are multiple agents in the environment, each agent faces a multiagent planning problem in which it tries to achieve its own goals with the help or hindrance of others.
Between the purely single-agent and truly multiagent cases is a wide spectrum of problems that exhibit various degrees of decomposition of the monolithic agent. An agent with multiple effectors that can operate concurrently (for example, a human who can type and speak at the same time) needs to do multieffector planning to manage each effector while handling positive and negative interactions among the effectors. When the effectors are physically decoupled into detached units, as in a fleet of delivery robots in a factory, multieffector planning becomes multibody planning. A
multibody problem is still a “standard” single-agent problem as long as the relevant sensor information
collected by each body can be pooled—either centrally or within each body—to form a common
estimate of the world state that then informs the execution of the overall plan; in this case, the multiple
bodies act as a single body.
When a single entity is doing the planning, there is really only one goal, which all the bodies necessarily
share. When the bodies are distinct agents that do their own planning, they may still share identical
goals; for example, two human tennis players who form a doubles team share the goal of winning the
match. Even with shared goals, however, the multibody and multiagent cases are quite different. In a multibody robotic doubles team, a single plan dictates which body will go where on the court and which body will hit the ball. In a multiagent doubles team, on the other hand, each agent decides what to do; without some method for coordination, both agents may decide to cover the same part of the court and each may leave the ball for the other to hit.
For the time being, we will treat the multieffector, multibody, and multiagent settings in the same way, labeling them generically as multiactor settings, using the generic term actor to cover effectors, bodies, and agents. The goal of this section is to work out how to define transition models, correct plans, and efficient planning algorithms for the multiactor setting.
A correct plan is one that, if executed by the actors, achieves the goal. (In the true multiagent setting,
of course, the agents may not agree to execute any particular plan, but at least they will know what
plans would work if they did agree to execute them.) For simplicity, we assume perfect
synchronization: each action takes the same amount of time and actions at each point in the joint plan
are simultaneous.
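Under the perfect-synchronization assumption, one simple and deliberately naive sketch of a joint transition model applies each actor's action to the shared state at once, reusing the earlier Action representation; a real multiactor formalism would state and check concurrency constraints explicitly rather than relying on the assertions below:

def joint_result(state, joint_action):
    # Apply a tuple of per-actor actions executed simultaneously, assuming
    # the actions neither conflict nor otherwise interact.
    assert all(a.precond <= state for a in joint_action)
    adds = set().union(*(a.add for a in joint_action))
    dels = set().union(*(a.delete for a in joint_action))
    assert not (adds & dels), "conflicting simultaneous effects"
    return frozenset((state - dels) | adds)

state = frozenset({"At(C1, SFO)", "At(P1, SFO)", "At(P2, JFK)"})
fly_back = Action("Fly(P2, JFK, SFO)", precond={"At(P2, JFK)"},
                  add={"At(P2, SFO)"}, delete={"At(P2, JFK)"})
print(joint_result(state, (load, fly_back)))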
The standard approach to loosely coupled problems is to pretend the problems are completely
decoupled and then fix up the interactions. For the transition model, this means writing action
schemas as if the actors acted independently. Let’s see how this works for the doubles tennis problem.
Let’s suppose that at one point in the game, the team has the goal of returning the ball that has been
hit to them and ensuring that at least one of them is covering the net.
Now let us consider the true multiagent setting in which each agent makes its own plan. To start with,
let us assume that the goals and knowledge base are shared. One might think that this reduces to the
multibody case—each agent simply computes the joint solution and executes its own part of that
solution. Alas, the "the" in "the joint solution" is misleading. For our doubles team, more than one joint solution exists: in plan 1, agent A returns the ball while agent B covers the net; in plan 2, B returns the ball while A covers the net.
If both agents can agree on either plan 1 or plan 2, the goal will be achieved. But if A chooses plan 2
and B chooses plan 1, then nobody will return the ball. Conversely, if A chooses 1 and B chooses 2,
then they will both try to hit the ball.
One option is to adopt a convention before engaging in joint activity. A convention is any constraint on
the selection of joint plans. For example, the convention “stick to your side of the court” would rule
out plan 1, causing the doubles partners to select plan 2. Drivers on a road face the problem of not
colliding with each other; this is (partially) solved by adopting the convention “stay on the right side of the road.”
Conventions can also arise through evolutionary processes. For example, seed-eating harvester ants
are social creatures that evolved from the less social wasps. Colonies of ants execute very elaborate
joint plans without any centralized control (the queen's job is to reproduce, not to do centralized planning) and with very limited computation, communication, and memory capabilities in each ant (Gordon, 2000, 2007). The colony has many roles, including interior workers, patrollers, and foragers. Each ant chooses to perform a role according to the local conditions it observes. One final example of cooperative multiagent behavior
appears in the flocking behavior of birds.
We can obtain a reasonable simulation of a flock if each bird agent (sometimes called a boid) observes the positions of its nearest neighbors and then chooses the heading and acceleration that maximizes the weighted sum of these three components:
Cohesion: a positive score for getting closer to the average position of the neighbors.
Separation: a negative score for getting too close to any one neighbor.
Alignment: a positive score for getting closer to the average heading of the neighbors.
[Figure 11.11: (a) a simulated flock of birds, using Reynolds's boids model; (b) an actual flock.]
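A hedged sketch of one boid's decision (the candidate headings, weights, and scoring functions below are illustrative choices, not values from the text) picks the heading that maximizes a weighted sum of cohesion, separation, and alignment scores:

import math

def choose_heading(me, neighbors, candidates=None,
                   w_cohesion=1.0, w_separation=1.5, w_alignment=1.0):
    # Each boid is a tuple (x, y, heading in radians); pick the candidate
    # heading with the best weighted score of the three components.
    if candidates is None:
        candidates = [2 * math.pi * k / 16 for k in range(16)]
    cx = sum(b[0] for b in neighbors) / len(neighbors)   # neighbor centroid
    cy = sum(b[1] for b in neighbors) / len(neighbors)
    avg_heading = math.atan2(sum(math.sin(b[2]) for b in neighbors),
                             sum(math.cos(b[2]) for b in neighbors))

    def score(h):
        # Position after a unit step in direction h.
        nx, ny = me[0] + math.cos(h), me[1] + math.sin(h)
        cohesion = -math.hypot(nx - cx, ny - cy)          # closer to centroid is better
        separation = -sum(1.0 / (math.hypot(nx - b[0], ny - b[1]) + 1e-6)
                          for b in neighbors)             # penalize crowding
        alignment = math.cos(h - avg_heading)             # agree with the average heading
        return (w_cohesion * cohesion + w_separation * separation
                + w_alignment * alignment)

    return max(candidates, key=score)

flock = [(0.0, 0.0, 0.1), (1.0, 0.5, 0.2), (0.5, 1.5, 0.0)]
print(choose_heading(flock[0], flock[1:]))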
If all the boids execute this policy, the flock exhibits the emergent behavior of flying as a pseudorigid body with roughly constant density that does not disperse over time, and that occasionally makes sudden swooping motions. You can see a still image in Figure 11.11(a) and compare it to an actual
flock in (b). As with ants, there is no need for each agent to possess a joint plan that models the actions
of other agents. The most difficult multiagent problems involve both cooperation with members of
one’s own team and competition against members of opposing teams, all without centralized control.