Notes AI 4
🞂 In many real-life applications, the problem has structure that cannot be learned with Deep Learning (DL) alone.
🞂 Solving optimization problems with learning alone is hard, but integrating planning techniques with heuristic guidance learned by DL has produced some of the most famous success stories of AI to date.
⮩ The Go player AlphaGo combines planning (Monte Carlo tree search) with deep learning (learned heuristic guidance) to select the next move.
⮩ Samsung's cognitive assistant combines a knowledge graph, planning, and deep learning to answer complicated queries.
Components of a Planning System
🞂 Planning refers to the process of computing several steps of a problem-solving
procedure before executing any of them.
🞂 Planning consists of the following important steps:
1. Choose the best rule to apply next, based on the best available heuristics.
⮩ The most widely used technique for selecting appropriate rules is first to isolate a set of differences between the current state and the desired goal state, and then to identify those rules that are relevant to reducing those differences.
⮩ If there are several rules, a variety of other heuristic information can be exploited to choose among them.
2. Apply the chosen rule for computing the new problem state.
⮩ In simple systems, applying rules is easy. Each rule simply specifies the problem state that would
result from its application.
⮩ In complex systems, we must be able to deal with rules that specify only a small part of the complete problem state.
⮩ One way is to describe, for each action, each of the changes it makes to the state description.
3. Detect when a solution has been found.
⮩ A planning system has succeeded in finding a solution to a problem when it has found a sequence of operators that transforms the initial problem state into the goal state.
⮩ How will it know when this has been done?
⮩ In simple problem-solving systems, this question is easily answered by a straightforward match of the state descriptions.
⮩ One representative formalism for planning systems is predicate logic. Suppose that, as part of our goal, we have the predicate P(x).
⮩ To see whether P(x) is satisfied in some state, we ask whether we can prove P(x) given the assertions that describe that state and the axioms that define the world model.
4. Detect dead ends so that they can be abandoned and the system’s effort is directed in
more fruitful directions.
⮩ As a planning system is searching for a sequence of operators to solve a particular problem, it
must be able to detect when it is exploring a path that can never lead to a solution.
⮩ The same reasoning mechanisms that can be used to detect a solution can often be used to detect a dead end.
⮩ If the search process is reasoning forward from the initial state, it can prune any path that leads to a state from which the goal state cannot be reached.
⮩ If the search process is reasoning backward from the goal state, it can terminate a path either because it is certain that the initial state cannot be reached or because little progress is being made.
🞂 Predicates: In order to specify both the conditions under which an operation may be performed and the results of performing it, we need the following predicates (a code sketch follows this list):
1. ON(A, B): Block A is on Block B.
2. ONTABLE(A): Block A is on the table.
3. CLEAR(A): There is nothing on the top of Block A.
4. HOLDING(A): The arm is holding Block A.
5. ARMEMPTY: The arm is holding nothing.
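Below is a minimal sketch of how such a state description could be represented in code. The set-of-tuples encoding and the helper name `holds` are illustrative choices, not part of the original notes.

```python
# A blocks-world state as a set of ground predicates.
# Each predicate is a tuple: ("ON", "A", "B"), ("CLEAR", "A"), ("ARMEMPTY",).

initial_state = {
    ("ON", "A", "B"),        # Block A is on Block B
    ("ONTABLE", "B"),        # Block B is on the table
    ("ONTABLE", "C"),
    ("CLEAR", "A"),
    ("CLEAR", "C"),
    ("ARMEMPTY",),
}

def holds(state, *predicate):
    """Check whether a ground predicate is true in a state."""
    return tuple(predicate) in state

print(holds(initial_state, "CLEAR", "A"))    # True
print(holds(initial_state, "HOLDING", "A"))  # False
```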
STRIPS-Based Approach to Robot Control
🞂 It uses first-order logic and theorem proving to plan strategies from start to goal.
🞂 STRIPS language: the "classical" approach that most planners use; it lends itself to efficient planning algorithms.
🞂 Environment: office environment with specially colored and shaped objects.
🞂 STRIPS planner: developed for this system to determine the actions the robot should take to achieve its goals.
🞂 STRIPS (STanford Research Institute Problem Solver) is a restrictive way to express states, actions, and goals, but it leads to more efficiency.
Robot Problem-Solving Systems (STRIPS)
🞂 ADD list: list of new predicates that the operator causes to become true.
🞂 DELETE list: list of old predicates that the operator causes to become false.
🞂 PRECONDITIONS list: contains those predicates that must be true for the operator to be applied.
🞂 STRIPS-style operators for the Blocks World problem are listed below (a sketch of operator application follows the list):
• STACK(x, y)
• P: CLEAR(y) Λ HOLDING(x)
• D: CLEAR(y) Λ HOLDING(x)
• A: ARMEMPTY Λ ON(x, y)
• UNSTACK(x, y)
• P: ON(x, y) Λ CLEAR(x) Λ ARMEMPTY
• D: ON(x, y) Λ ARMEMPTY
• A: HOLDING(x) Λ CLEAR(y)
• PICKUP(x)
• P: CLEAR(x) Λ ONTABLE(x) Λ ARMEMPTY
• D: ONTABLE(x) Λ ARMEMPTY
• A: HOLDING(x)
• PUTDOWN(x)
• P: HOLDING(x)
• D: HOLDING(x)
• A: ONTABLE(x) Λ ARMEMPTY
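As a sketch of how the P/D/A lists drive state change, here is a minimal Python rendering of operator application, assuming the set-of-predicates state encoding shown earlier (the function names are illustrative):

```python
# A STRIPS operator as precondition / delete / add sets of ground predicates.

def make_stack(x, y):
    return {
        "name":   f"STACK({x},{y})",
        "pre":    {("CLEAR", y), ("HOLDING", x)},
        "delete": {("CLEAR", y), ("HOLDING", x)},
        "add":    {("ARMEMPTY",), ("ON", x, y)},
    }

def apply_operator(state, op):
    """Apply op if all preconditions hold; otherwise return None."""
    if not op["pre"] <= state:           # preconditions must all be true
        return None
    return (state - op["delete"]) | op["add"]

state = {("CLEAR", "B"), ("HOLDING", "A"), ("ONTABLE", "B")}
print(apply_operator(state, make_stack("A", "B")))
# contains ('ONTABLE', 'B'), ('ARMEMPTY',), ('ON', 'A', 'B')
```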
Goal Stack Planning
🞂 Goal Stack Planning is one of the simplest planning algorithms; it is designed to handle problems that include compound goals.
🞂 It utilizes STRIPS as a formal language for specifying and manipulating the world with
which it is working.
🞂 This approach uses a stack for plan generation. The stack can contain sub-goals and actions described using predicates. The sub-goals can be solved one by one, in any order.
🞂 It starts by pushing the unsatisfied goals onto the stack.
🞂 Then it pushes the individual sub-goals onto the stack and pops an element off the stack.
🞂 When popping an element off the stack, the element could be
⮩ either a predicate describing a situation in our world, or
⮩ an action that can be applied to the world under consideration.
🞂 So, a decision has to be made based on the kind of element we are popping off the stack.
🞂 If it is a predicate, it is compared with the description of the current world; if it is already satisfied in the current situation, there is nothing to do, because it is already true.
🞂 On the contrary, if the predicate is not true, we have to select and push onto the stack a relevant action that satisfies the predicate.
🞂 After pushing the relevant action onto the stack, its preconditions also have to be pushed onto the stack.
🞂 In order to apply an operation, its preconditions must be satisfied, i.e., the present situation of the world should be suitable for applying the operation. For that reason, the preconditions are pushed onto the stack immediately after an action is pushed.
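The following is a simplified sketch of the goal-stack loop just described, assuming ground (variable-free) operators in the P/D/A form used above. It omits re-checking of compound goals and backtracking, so it illustrates the control flow rather than a complete planner:

```python
# Simplified goal-stack planning loop (illustrative, no backtracking).

def goal_stack_plan(state, goals, operators):
    stack = list(goals)                      # push the unsatisfied goals
    plan = []
    while stack:
        top = stack.pop()
        if isinstance(top, dict):            # an action: apply it
            state = (state - top["delete"]) | top["add"]
            plan.append(top["name"])
        elif top in state:                   # predicate already true: done
            continue
        else:                                # push an achieving action...
            op = next(o for o in operators if top in o["add"])
            stack.append(op)
            stack.extend(op["pre"])          # ...then its preconditions
    return plan, state
```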
Goal Stack Planning – Example
🞂 Let's start with the Blocks World example: the initial state is the current description of our world, and the goal state is what we have to achieve.
[Figure: initial and goal block configurations for blocks A, B, C]
Machine Learning
🞂 Machine learning systems can be used to:
• Discover new things or structures that were previously unknown (data mining, scientific discovery)
• Reproduce an important aspect of intelligent behaviour
• Fill in skeletal or incomplete observations or specifications about a domain (this expands the domain of expertise and lessens the brittleness of the system)
• Build software agents that can adapt to their users or to other software agents
• Machine learning systems also discover patterns without prior expected results
🞂 Learning may be open box or black box:
• Open box: changes are clearly visible in the knowledge base and clearly interpretable by the human users.
• Black box: changes done to the system are not readily visible or understandable.
Learner Architecture
• Machine learning systems have four main components (sketched in code below):
➢ Knowledge Base (KB):
✓ what is being learnt
✓ representation of the domain
✓ description and representation of the problem space
➢ Performer: interacts with the environment, using the KB to carry out the task.
➢ Critic: evaluates the performer's output and produces feedback.
➢ Learner: takes output from the critic and modifies something in the KB or the performer.
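A minimal sketch of how the four components might fit together in code; the class and method names are illustrative, and the "task" here is a trivial lookup:

```python
# Four-component learning system: KB, performer, critic, learner.

class LearningSystem:
    def __init__(self, kb):
        self.kb = kb                          # what is being learnt

    def perform(self, task):
        """Performer: use the KB to respond to a task."""
        return self.kb.get(task, "unknown")

    def critique(self, response, ideal):
        """Critic: compare the response with the desired outcome."""
        return response == ideal

    def learn(self, task, ideal):
        """Learner: on failure, modify the KB so the task succeeds next time."""
        if not self.critique(self.perform(task), ideal):
            self.kb[task] = ideal

system = LearningSystem(kb={})
system.learn("capital of France", "Paris")
print(system.perform("capital of France"))    # Paris
```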
Learning Agent Architecture
[Figure: learning agent architecture]
Learning Examples
Problem | Representation | Performer (interacts with human) | Critic | Learner
Animal guessing game | Binary decision tree | Walk the tree and ask the associated questions | Human feedback | Elicit a new question from the user and add it to the binary tree
Playing chess | The board layout, game rules, moves | Chain through the rules to identify a move, use conflict resolution to choose one, output the move | Who won (credit assignment problem) | Increase the weights for some rules and decrease them for others
Categorizing documents | Vector of word frequencies, corpus of documents | Apply appropriate functions to identify which category the file belongs to | A set of human-categorized documents | Modify the weights on the functions and improve categorization
Fixing computers | Frequency matrix of causes and symptoms | Use known symptoms to identify potential causes | Human input about symptoms and causes observed for a specific case | Update the frequency matrix with actual symptoms and outcomes
Identifying digits in Optical Character Recognition | Probability of digits, matrix of pixels, percentage of light, number of straight lines | Input the features for a digit, output the probability that it is each digit from 0 to 9 | Human-categorized training set | Modify the weights on the network of associations
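The first table row can be made concrete with a short sketch. This interactive version walks a binary decision tree and, on a wrong guess, elicits a new question from the user and splices it into the tree (all names are illustrative):

```python
# Animal guessing game: the learner grows the decision tree on failure.

class Node:
    def __init__(self, text, yes=None, no=None):
        self.text, self.yes, self.no = text, yes, no   # leaf if yes is None

def ask(question):
    return input(question + " (y/n) ").strip().lower().startswith("y")

def play(node):
    while node.yes is not None:                  # walk the question nodes
        node = node.yes if ask(node.text) else node.no
    if ask(f"Is it a {node.text}?"):
        print("Got it!")
    else:                                        # learn: turn leaf into question
        animal = input("What was it? ")
        question = input(f"Give a question that is 'yes' for {animal}: ")
        node.yes, node.no = Node(animal), Node(node.text)
        node.text = question

root = Node("Does it live in water?", Node("fish"), Node("dog"))
play(root)   # after a wrong guess, the tree has one more question
```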
Learning Paradigms
Rote Learning
• It is also called memorization because the knowledge is simply copied into the knowledge base without any modification (direct entry of rules and facts).
• As computed values are stored, this technique can save a significant amount of time.
• The rote learning technique can also be used in complex learning systems, provided sophisticated techniques are employed to use the stored values quickly, and there is generalization to keep the amount of stored information down to a manageable level.
• A checkers-playing program, for example, uses this technique to learn the board positions it evaluates in its look-ahead search.
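Rote learning maps naturally onto memoization: computed values are stored so that later requests become table lookups. A minimal sketch (the evaluation function here is a stand-in):

```python
# Rote learning as memoization: store computed values, look them up later.

cache = {}

def evaluate(position):
    if position in cache:               # rote-learned value: fast lookup
        return cache[position]
    value = expensive_evaluation(position)
    cache[position] = value             # store for future use
    return value

def expensive_evaluation(position):
    # Stand-in for a costly static evaluation (e.g. of a checkers board).
    return sum(ord(c) for c in position) % 100

print(evaluate("board-42"))   # computed and stored
print(evaluate("board-42"))   # retrieved from the cache
```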
Learning by taking advice
• This is the easiest and simplest way of learning.
• Also, there can be several sources for taking advice, such as humans (experts), the internet, etc.
• However, this type of learning requires more inference than rote learning.
• The program must operationalize the advice by turning it into one or more expressions that contain concepts and actions the program can use during execution.
• This ability to operationalize knowledge is very critical for learning. It is also an important aspect of Explanation-Based Learning (EBL).
Learning in Problem Solving
• When a program does not learn from advice, it can learn by generalizing from its own experiences. The following sections cover:
➢ Learning by parameter adjustment
➢ Learning with macro-operators
➢ Learning by chunking
➢ The utility problem
Learning by parameter adjustment
• Here the learning system relies on an evaluation procedure that combines information from several sources into a single summary statistic.
• For example, factors such as demand and production capacity may be combined into a single score indicating the likelihood of increased production.
• But it is difficult to know a priori how much weight should be attached to each factor.
• The correct weights can be found by starting with some estimate of the correct settings and then allowing the program to modify the settings based on its experience.
• Features that appear to be good predictors of overall success will have their weights increased, while those that do not will have their weights decreased.
• In game programs, for example, factors such as piece advantage and mobility are combined into a single score to decide whether a particular board position is desirable. This single score is itself knowledge that the program gathers by means of calculation.
• The evaluation function has the polynomial form c1t1 + c2t2 + … + cntn. The t terms are the values of the features that contribute to the evaluation; the c terms are the coefficients (weights) attached to each of these values. As learning progresses, the c values change.
• The procedure is:
➢ Start with some estimate of the correct weight settings.
➢ Modify the weights in the program on the basis of accumulated experience.
➢ Features that appear to be good predictors will have their weights increased, while those that do not will have their weights decreased.
• This method is very useful in situations where very little additional knowledge is available, or in programs in which it is combined with more knowledge-intensive methods.
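A minimal sketch of the weight-update idea, using a simple error-driven rule (one common way to realize parameter adjustment; the notes do not fix a specific update formula):

```python
# Parameter adjustment: score = c1*t1 + ... + cn*tn, then nudge the weights.

def score(weights, features):
    return sum(c * t for c, t in zip(weights, features))

def adjust(weights, features, outcome, predicted, rate=0.01):
    """Move each weight in proportion to its feature's contribution."""
    error = outcome - predicted
    return [c + rate * error * t for c, t in zip(weights, features)]

weights = [0.5, 0.5]                 # initial estimate of the settings
features = [2.0, -1.0]               # e.g. piece advantage, mobility
predicted = score(weights, features)
weights = adjust(weights, features, outcome=1.0, predicted=predicted)
print(weights)                       # weights shifted toward the outcome
```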
Learning with Macro-Operators
• The domain-specific knowledge we need can be learned in the form of macro-operators: reusable sequences of primitive operators that are stored and applied as a unit, as sketched below.
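A sketch of how a macro-operator could be assembled from a successful sequence of primitive STRIPS-style operators (P/D/A sets as before; this simplified composition ignores variable bindings):

```python
# Compose a sequence of ground operators into one macro-operator.

def compose_macro(ops):
    pre, add, delete = set(), set(), set()
    for op in ops:
        pre |= (op["pre"] - add)      # preconditions not supplied earlier
        add = (add - op["delete"]) | op["add"]
        delete = (delete - op["add"]) | op["delete"]
    return {"pre": pre, "delete": delete, "add": add}
```

Applying the macro then behaves like applying the whole sequence at once, so the planner can reuse it as a single step.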
Learning by chunking
• A production system consists of a set of rules in if-then form: given a particular situation, they specify the actions to be performed. For example: if it is raining, then take an umbrella.
• A production system also contains a knowledge base, a control strategy, and a rule applier. To solve a problem, the system compares the present situation with the left-hand sides of the rules. If there is a match, the system performs the actions described in the right-hand side of the corresponding rule.
• Problem solvers solve problems by applying the rules. Some of these rules may be more useful than others, and their results are stored as chunks.
• Several chunks may encode a single macro-operator, and one chunk may participate in a number of macro sequences.
• Chunks learned at the beginning of problem solving may be used at a later stage. The system keeps each chunk to use in solving other problems.
• Soar is a general cognitive architecture for developing intelligent systems. Soar requires
knowledge to solve various problems. It acquires knowledge using chunking mechanism.
The system learns reflexively when impasses have been resolved. An impasse arises when
the system does not have sufficient knowledge. Consequently, Soar chooses a new
problem space (set of states and the operators that manipulate the states) in a bid to
resolve the impasse. While resolving the impasse, the individual steps of the task plan are
grouped into larger steps known as chunks. The chunks decrease the problem space
search and so increase the efficiency of performing the task.
• In Soar, the knowledge is stored in long-term memory. Soar uses the chunking mechanism
to create productions that are stored in long-term memory. A chunk is nothing but a large
production that does the work of an entire sequence of smaller ones. The productions have
a set of conditions or patterns to be matched to working memory which consists of current
goals, problem spaces, states and operators and a set of actions to perform when the
production fires. Chunks are generalized before storing. When the same impasse occurs
again, the chunks so collected can be used to resolve it.
The Utility Problem
• The utility problem in learning systems occurs when knowledge learned in an attempt to
improve a system's performance degrades it instead.
• The problem appears in many AI systems, but it is most familiar in speedup learning.
Speedup learning systems are designed to improve their performance by learning control
rules which guide their problem-solving performance. These systems often exhibit the
undesirable property of actually slowing down if they are allowed to learn in an unrestricted
fashion.
• Each individual control rule is guaranteed to have a positive utility (improve performance)
but, in concert, they have a negative utility (degrade performance).
• One of the causes of the utility problem is the serial nature of current hardware. The more
control rules that speedup learning systems acquire, the longer it takes for the system to
test them on each cycle.
• One solution to the utility problem is to design a parallel memory system that eliminates the increase in match cost. This approach moves the matching problem away from the central processor and into the memory of the system. These so-called active memories allow memory search to occur in "nearly constant time" in the number of data items, relying on the memory for fast, simple inference and reminding.
Learning in Problem Solving
The Utility Problem
• The PRODIGY program maintains a utility measure for each control rule. This measure takes into account the average savings provided by the rule, the frequency of its application, and the cost of matching it.
• When PRODIGY proposes a new control rule, it first estimates the rule's utility; if the estimate is negative, the rule is discarded. If not, it is placed in long-term memory with the other rules and is then monitored during subsequent problem solving.
• If its utility falls, the rule is discarded.
• Empirical experiments have demonstrated the effectiveness of keeping only those control rules with high utility.
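The utility measure described above can be written down directly. A sketch (the numbers and rule names are made up for illustration):

```python
# PRODIGY-style rule utility:
#   utility = average savings * application frequency - match cost

def rule_utility(avg_savings, application_freq, match_cost):
    return avg_savings * application_freq - match_cost

# Keep only rules whose measured utility stays positive.
rules = {"prefer-goal-A": (5.0, 0.4, 1.5), "avoid-op-B": (0.5, 0.1, 2.0)}
kept = {name for name, (s, f, c) in rules.items() if rule_utility(s, f, c) > 0}
print(kept)   # {'prefer-goal-A'}
```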
Learning by Analogy
[Figure: a hydraulics junction with flows Qa = 3 and Qb = 9 feeding an unknown flow Qc, shown beside an electrical circuit with currents I1, I2 and I3 = I1 + I2]
One may infer, by analogy with Kirchhoff's current law (I3 = I1 + I2), that the flows combine the same way, giving Qc = Qa + Qb = 12; the laws of hydraulics are similar to Kirchhoff's laws and Ohm's law.
[Figure: examples of analogies]
Transformational Analogy
Look for a similar solution and copy it to the new situation, making suitable substitutions where appropriate.
E.g., geometry: if you know about the lengths of line segments and have a proof that certain lines are equal, then you can make similar assertions about angles.
Transformational analogy does not look at how the problem was solved; it only looks at the final solution. But the history of the problem solution, the steps involved, is often relevant.
Derivational Analogy
[Figure: two analogous geometry problems, one about line segments on a line (points A, B, C, D) and one about angles at a vertex (rays AB, AC, AD, AE)]

GIVEN: AB = CD              GIVEN: ∠BAC = ∠DAE

AB = CD                     ∠BAC = ∠DAE
BC = BC                     ∠CAD = ∠CAD
AB + BC = BC + CD           ∠BAC + ∠CAD = ∠CAD + ∠DAE
AC = BD                     ∠BAD = ∠CAE

Derivational analogy replays the line of reasoning itself: each step of the known derivation (about segments) is carried over and adapted to the new problem (about angles), rather than only the final solution being copied.
Explanation based Learning
• Explanation-based learning (EBL) extracts a general rule from a single example: using a domain theory, the system explains why the example is an instance of the goal concept, and then generalizes that explanation.
• An EBL system typically takes four inputs: a training example, a goal concept, an operationality criterion, and a domain theory.
Learning by Discovery
An entity acquires knowledge without the help of a teacher.
• AM (Lenat's Automated Mathematician) discovered mathematical concepts this way, for example:
• Integers: it is possible to count the elements of a set, and the image of this counting function, the integers, is an interesting set in its own right.
• Addition: formed from the union of two disjoint sets and the counting function applied to it.
• Prime numbers: factorisation of numbers was explored, and numbers with only one factor were discovered.
• BACON, another discovery program, holds some variables constant and attempts to notice trends in the data, from which inferences are made.
• BACON has also been applied to Kepler's third law, Ohm's law, conservation of momentum, and Joule's law.
Clustering
• It is a common descriptive task where one seeks to identify a finite set of categories
or clusters to describe the data. For example, we may want to cluster houses to find
distribution patterns.
• A cluster is a collection of data objects that are similar to one another within the same
cluster and are dissimilar to the objects in other clusters. Clustering analysis helps
construct meaningful partitioning of a large set of objects.
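The notes do not prescribe a particular clustering algorithm; as one common example, here is a minimal k-means sketch on one-dimensional data:

```python
# Minimal k-means: alternate assignment and re-centering (1-D data).
import random

def kmeans(points, k, iterations=20):
    centers = random.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:                       # assign to nearest center
            i = min(range(k), key=lambda j: (p - centers[j]) ** 2)
            clusters[i].append(p)
        centers = [sum(c) / len(c) if c else centers[i]   # move centers
                   for i, c in enumerate(clusters)]
    return centers, clusters

centers, clusters = kmeans([1.0, 1.2, 0.8, 5.0, 5.3, 4.9], k=2)
print(centers)   # two centers, near 1.0 and 5.1
```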
AutoClass
• AutoClass is a clustering algorithm based upon the Bayesian approach for
determining optimal classes in large datasets.
• Given a set X = {X1, …, Xn} of data instances Xi with unknown classes, the goal of Bayesian classification is to search for the best class description that predicts the data in a model space.
• Class membership is expressed probabilistically.
• AutoClass calculates the likelihood of each instance belonging to each class Ci and then calculates a set of normalized weights wij = Ci / Σj Cj for each instance.
• Weighted statistics relevant to each term of the class likelihood are then calculated for each class.
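The weight calculation just described can be sketched directly: given per-class likelihoods for each instance, normalize them so each instance's class weights sum to one (the variable names are illustrative):

```python
# Probabilistic class membership: w[i][j] = L[i][j] / sum_j L[i][j]

def membership_weights(likelihoods):
    weights = []
    for row in likelihoods:        # one row of class likelihoods per instance
        total = sum(row)
        weights.append([l / total for l in row])
    return weights

L = [[0.2, 0.6], [0.9, 0.1]]       # two instances, two classes
print(membership_weights(L))       # each row now sums to 1
```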
Formal Learning
• The theory of the learnable (Valiant) classifies problems by how difficult they are to learn.
• Formally, a device can learn a concept if, given positive and negative examples, it can produce an algorithm that will classify future examples correctly with probability 1/h.
• If the number of training examples required is polynomial in h, t, and f, then the system is said to be trainable.
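The h, t, f formulation is only sketched in these notes. As a closely related standard result, the PAC bound for a finite hypothesis class gives a concrete polynomial sample size: m >= (1/eps) * (ln|H| + ln(1/delta)) examples suffice for error at most eps with confidence 1 - delta. A worked sketch (the function name is ours):

```python
# Standard PAC sample bound for a finite hypothesis class.
import math

def pac_sample_bound(hypothesis_count, eps, delta):
    return math.ceil((math.log(hypothesis_count) + math.log(1 / delta)) / eps)

print(pac_sample_bound(hypothesis_count=2**10, eps=0.1, delta=0.05))  # 100
```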
Other Learning Models
Neural net learning and genetic learning
• Neural networks: learn by adjusting the weights on connections between simple units, based on training examples.
• Genetic learning: evolves a population of candidate solutions through selection, crossover, and mutation.
General Learning Model
As noted earlier, learning can be accomplished using a number of different methods, such as by memorizing facts, by being told, or by studying examples like problem solutions. Learning requires that new knowledge structures be created from some form of input stimulus. This new knowledge must then be assimilated into a knowledge base and be tested in some way for its utility. Testing means that the knowledge should be used in the performance of some task from which meaningful feedback can be obtained, where the feedback provides some measure of the accuracy and usefulness of the newly acquired knowledge.
The general learning model is depicted in Figure 4.1, where the environment has been included as part of the overall learner system. The environment may be regarded either as a form of nature which produces random stimuli or as a more organized training source, such as a teacher, which provides carefully selected training examples for the learner component. The actual form of environment used will depend on the particular learning paradigm. In any case, some representation language must be assumed for communication between the environment and the learner. The language may be the same representation scheme as that used in the knowledge base (such as a form of predicate calculus). When they are chosen to be the same, we say the single representation trick is being used. This usually results in a simpler implementation, since it is not necessary to transform between two or more different representations.
For some systems the environment may be a user working at a keyboard. Other systems will use program modules to simulate a particular environment. In even more realistic cases, the system will have real physical sensors which interface with some world environment.
Inputs to the learner component may be physical stimuli of some type, or descriptive, symbolic training examples. The information conveyed to the learner component is used to create and modify knowledge structures in the knowledge base. This same knowledge is used by the performance component to carry out some task, such as solving a problem, playing a game, or classifying instances of some concept.
Given a task, the performance component produces a response describing its actions in performing the task. The critic module then evaluates this response relative to an optimal response.
Feedback, indicating whether or not the performance was acceptable, is then sent by the critic module to the learner component for its subsequent use in modifying the structures in the knowledge base. If proper learning was accomplished, the system's performance will improve with the changes made to the knowledge base.
The cycle described above may be repeated a number of times, until the performance of the system has reached some acceptable level, until a known learning goal has been reached, or until changes cease to occur in the knowledge base after some chosen number of training examples have been observed.
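The cycle can be summarized as a training loop. A minimal sketch in which the performer, critic, and learner are passed in as functions (all names illustrative):

```python
# The learn-perform-critique cycle: repeat until performance is acceptable.

def training_cycle(kb, examples, perform, critic, learner, max_rounds=100):
    for _ in range(max_rounds):
        all_ok = True
        for task, ideal in examples:
            response = perform(kb, task)          # performance component
            if not critic(response, ideal):       # critic evaluates response
                kb = learner(kb, task, ideal)     # learner modifies the KB
                all_ok = False
        if all_ok:                                # acceptable performance
            break
    return kb

kb = training_cycle(
    kb={}, examples=[("2+2", "4")],
    perform=lambda kb, t: kb.get(t),
    critic=lambda r, ideal: r == ideal,
    learner=lambda kb, t, ideal: {**kb, t: ideal},
)
print(kb)   # {'2+2': '4'}
```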
There are several important factors which influence a system's ability to learn, in addition to the form of representation used. They include the types of training provided, the form and extent of any initial background knowledge, the type of feedback provided, and the learning algorithms used.
The type of training used in a system can have a strong effect on performance, much the same as it does for humans. Training may consist of randomly selected instances, or of examples that have been carefully selected and ordered for presentation. The instances may be positive examples of some concept or task being learned, they may be negative, or they may be a mixture of both positive and negative. The instances may be well focused, using only relevant information, or they may contain a variety of facts and details, including irrelevant data.
Many forms of learning can be characterized as a search through a space of possible hypotheses or solutions. To make learning more efficient, it is necessary to constrain this search process or reduce the search space. One method of achieving this is through the use of background knowledge, which can be used to constrain the search space or to exercise control operations which limit the search process.
Feedback is essential to the learner component, since otherwise it would never know whether the knowledge structures in the knowledge base were improving or whether they were adequate for the performance of the given tasks. The feedback may be a simple yes-or-no type of evaluation, or it may contain more useful information describing why a particular action was good or bad. Also, the feedback may be completely reliable, providing an accurate assessment of the performance, or it may contain noise; that is, the feedback may actually be incorrect some of the time. Intuitively, the feedback must be accurate more than 50% of the time; otherwise the system will unlearn more than it learns. If the feedback is reliable and carries useful information, the learner should be able to build up a useful corpus of knowledge quickly. On the other hand, if the feedback is noisy or unreliable, the learning process may be very slow and the resultant knowledge incorrect.