Artificial Intelligence: (Unit 5: Machine Learning)
What is Learning?
“Learning denotes changes in the system that are adaptive in the sense that they enable the
system to do the same task (or tasks drawn from the same population) more effectively the
next time.” --Herbert Simon
Types of Learning:
The strategies for learning can be classified according to the amount of inference the
system has to perform on its training data. In increasing order we have
1. Rote learning – the new knowledge is implanted directly with no inference at all, e.g.
simple memorisation of past events, or a knowledge engineer’s direct programming of
rules elicited from a human expert into an expert system.
2. Supervised learning – the system is supplied with a set of training examples consisting
of inputs and corresponding outputs, and is required to discover the relation or mapping
between them, e.g. as a series of rules, or a neural network (a small illustrative sketch follows below).
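For concreteness, here is a minimal sketch of supervised learning in Python. The exam-score data and the one-parameter threshold rule are invented purely for illustration; they are not taken from the text.

# Supervised learning in miniature: the system is given (input, output)
# pairs and must discover the mapping itself.  The data and the
# threshold-rule hypothesis space are assumptions made for this sketch.

# training examples supplied by a "teacher": exam score -> pass/fail label
training_data = [(35, "fail"), (48, "fail"), (62, "pass"), (80, "pass")]

def learn_threshold(examples):
    """Search for a threshold that classifies every training example correctly."""
    for candidate in range(0, 101):
        if all((score >= candidate) == (label == "pass") for score, label in examples):
            return candidate
    return None

threshold = learn_threshold(training_data)
print("learned rule: pass if score >=", threshold)                    # prints 49 for this data
print("prediction for 55:", "pass" if 55 >= threshold else "fail")    # -> pass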
Early expert systems relied on rote learning, but for modern AI systems we are generally
interested in the supervised learning of various levels of rules.
As with many other types of AI system, it is much more efficient to give the system
enough knowledge to get it started, and then leave it to learn the rest for itself. We may
even end up with a system that learns to be better than a human expert.
The general learning approach is to generate potential improvements, test them, and
discard those which do not work. Naturally, there are many ways we might generate the
potential improvements, and many ways we can test their usefulness. At one extreme, there
are model driven (top-down) generators of potential improvements, guided by an
understanding of how the problem domain works. At the other, there are data driven
(bottom-up) generators, guided by patterns in some set of training data.
Machine Learning:
As regards machines, we might say, very broadly, that a machine learns whenever it
changes its structure, program, or data (based on its inputs or in response to external
information) in such a manner that its expected future performance improves. Some of
these changes, such as the addition of a record to a database, fall comfortably within the
province of other disciplines and are not necessarily better understood for being called
learning. But, for example, when the performance of a speech-recognition machine
improves after hearing several samples of a person's speech, we feel quite justified in saying that the machine has learned.
Machine learning usually refers to the changes in systems that perform tasks associated
with artificial intelligence (AI). Such tasks involve recognition, diagnosis, planning, robot
control, prediction, etc. The changes might be either enhancements to already performing
systems or synthesis of new systems.
Concept learning refers to a learning task in which a human or machine learner is
trained to classify objects by being shown a set of example objects along with their class
labels. The learner will simplify what has been observed in an example. This simplified
version of what has been learned will then be applied to future examples. Concept learning
ranges from simple to complex because learning takes place over many domains. The more difficult a concept is, the less likely the learner is to simplify it successfully, and therefore the less likely the learner is to learn it. This learning from examples is formalized by the idea of a version space.
The most specific hypotheses (i.e., the specific boundary SB) are the hypotheses that cover
the observed positive training examples, and as little of the remaining feature space as
possible. These are hypotheses which if reduced any further would exclude a positive
training example, and hence become inconsistent. These minimal hypotheses essentially
constitute a (pessimistic) claim that the true concept is defined just by the positive data.
The most general hypotheses (i.e., the general boundary GB) are those which cover the
observed positive training examples, but also cover as much of the remaining feature space as possible
without including any negative training examples. These are hypotheses which if enlarged
any further would include a negative training example, and hence become inconsistent.
Tentative heuristics are represented using version spaces. A version space represents all the
alternative plausible descriptions of a heuristic. A plausible description is one that is
applicable to all known positive examples and no known negative example.
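To make the notions of "covering" and generality concrete, here is a small sketch in Python using the attribute-tuple representation from the car example later in this chapter, where "?" is a wildcard that matches any value. The particular tuples below are illustrative assumptions.

# A hypothesis is a tuple of attribute values; "?" matches anything.
# (Attribute order follows the car example: origin, manufacturer,
# color, decade, type.)

def matches(hypothesis, example):
    """True if the hypothesis covers (is applicable to) the example."""
    return all(h == "?" or h == e for h, e in zip(hypothesis, example))

most_general  = ("?", "?", "?", "?", "?")                    # covers the whole feature space
most_specific = ("Japan", "Honda", "Blue", 1980, "Economy")  # covers exactly one example
target        = ("Japan", "?", "?", "?", "Economy")          # "Japanese economy car"

example = ("Japan", "Toyota", "Blue", 1990, "Economy")
print(matches(most_general, example))    # True
print(matches(most_specific, example))   # False
print(matches(target, example))          # True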
Fundamental Assumptions
1. The training data are correct; there are no erroneous instances.
2. A correct description (the target concept) is a conjunction of some of the attributes with their values.
Diagrammatical Guidelines
Nodes in the generalization tree are connected to a model that matches everything in its
subtree.
Nodes in the specialization tree are connected to a model that matches only one thing in its
subtree.
In the diagram below, the specialization tree is colored red, and the generalization tree is
colored green.
The key idea in version space learning is that specialization of the general models and
generalization of the specific models may ultimately lead to just one correct model that
matches all observed positive examples and does not match any negative examples.
That is, each time a negative example is used to specialize the general models, those
specific models that match the negative example are eliminated and each time a positive
example is used to generalize the specific models, those general models that fail to match
the positive example are eliminated. Eventually, the positive and negative examples may
be such that only one general model and one identical specific model survive.
The version space method handles positive and negative examples symmetrically.
Given:
A representation language.
A set of positive and negative examples expressed in that language.
Compute: a concept description that is consistent with all the positive examples and none
of the negative examples.
Method:
Initialize G, the set of maximally general hypotheses, to contain one element: the null description (all features are variables).
Initialize S, the set of maximally specific hypotheses, to contain one element: the first positive example.
Accept a new training example.
  o If the example is positive:
    1. Generalize all the specific models to match the positive example, but ensure the following:
       - The new specific models involve minimal changes.
       - Each new specific model is a specialization of some general model.
       - No new specific model is a generalization of some other specific model.
    2. Prune away all the general models that fail to match the positive example.
  o If the example is negative:
    1. Specialize all general models to prevent a match with the negative example, but ensure the following:
       - The new general models involve minimal changes.
       - Each new general model is a generalization of some specific model.
       - No new general model is a specialization of some other general model.
    2. Prune away all the specific models that match the negative example.
  o If S and G are both singleton sets, then:
    - If they are identical, output their value and halt.
    - If they are different, the training cases were inconsistent. Output this result and halt.
  o Otherwise, continue accepting new training examples.
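The method just listed can be sketched directly in Python. The attribute-tuple hypothesis representation (with "?" as a wildcard) follows the car example in Problem 1 below; the training sequence in the demo is a hypothetical one, chosen so that the boundaries converge to the "Japanese economy car" description, since the text only gives the first positive example explicitly.

# A sketch of the version-space (candidate-elimination) method above.
# A hypothesis is a tuple of attribute values in which "?" matches anything.

def matches(h, x):
    """True if hypothesis h covers example x."""
    return all(a == "?" or a == v for a, v in zip(h, x))

def more_general_or_equal(h1, h2):
    """True if h1 covers at least everything that h2 covers."""
    return all(a == "?" or a == b for a, b in zip(h1, h2))

def generalize(s, x):
    """Minimal generalization of specific model s so that it matches x."""
    return tuple(a if a == v else "?" for a, v in zip(s, x))

def specialize(g, s, x):
    """Minimal specializations of general model g that exclude the negative
    example x while remaining generalizations of the specific model s."""
    return [g[:i] + (s[i],) + g[i + 1:]
            for i, a in enumerate(g)
            if a == "?" and s[i] != "?" and s[i] != x[i]]

def candidate_elimination(examples):
    first_example, first_is_positive = examples[0]
    assert first_is_positive, "this sketch expects the first example to be positive"
    S = first_example                  # most specific: the first positive example
    G = [("?",) * len(S)]              # most general: the null description
    for x, is_positive in examples:
        if is_positive:
            G = [g for g in G if matches(g, x)]       # prune general models
            if not matches(S, x):
                S = generalize(S, x)                  # generalize the specific model
                assert any(more_general_or_equal(g, S) for g in G)
        else:
            if matches(S, x):
                raise ValueError("inconsistent training data")
            G = [h for g in G
                 for h in ([g] if not matches(g, x) else specialize(g, S, x))]
            # keep only the maximally general surviving models
            G = [g for g in G
                 if not any(h != g and more_general_or_equal(h, g) for h in G)]
    return S, G

if __name__ == "__main__":
    # hypothetical training data: (origin, manufacturer, color, decade, type)
    data = [
        (("Japan", "Honda",    "Blue",  1980, "Economy"), True),
        (("Japan", "Toyota",   "Green", 1970, "Sports"),  False),
        (("Japan", "Toyota",   "Blue",  1990, "Economy"), True),
        (("USA",   "Chrysler", "Red",   1980, "Economy"), False),
        (("Japan", "Honda",    "White", 1980, "Economy"), True),
    ]
    S, G = candidate_elimination(data)
    print("S =", S)   # ('Japan', '?', '?', '?', 'Economy')
    print("G =", G)   # [('Japan', '?', '?', '?', 'Economy')]

On this assumed data the two boundary sets collapse to the single description (Japan, ?, ?, ?, Economy), which is the result quoted for the worked problem below.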
Problem 1: Learn the concept "Japanese economy car" from positive and negative examples of cars, each described by the attribute tuple (origin, manufacturer, color, decade, type).
Solution:
Initialize G to a singleton set that includes everything: G = { (?, ?, ?, ?, ?) }
Initialize S to a singleton set that includes the first positive example: S = { (Japan, Honda, Blue, 1980, Economy) }
These models represent the most general and the most specific heuristics one might learn.
The actual heuristic to be learned, "Japanese Economy Car", probably lies between them
somewhere within the version space.
After the remaining positive and negative training examples have been processed (specializing G on each negative example and generalizing S on each positive one), the two boundary sets converge to the same description:
G = { (Japan, ?, ?, ?, Economy) }
S = { (Japan, ?, ?, ?, Economy) }
Since S and G are identical singleton sets, the learned concept is (Japan, ?, ?, ?, Economy), i.e. "Japanese economy car".
Explanation Based Learning (EBL):

Explanation based learning uses a rich domain theory to construct a justified generalization from a single training example. An EBL system takes four kinds of input:

A training example : what the learning program sees in the world. (specific facts that rule out some possible hypotheses)
A goal concept : a high level description of what the program is supposed to learn. (the set of all possible conclusions)
A domain theory : a set of rules that describe relationships between objects and actions in a domain. (axioms about a domain of interest)
An operational criterion : a description of the form in which the learned concept must be expressed so that it is usable by the performance system.
From these, EBL computes a generalization of the training example that is sufficient not only to describe the goal concept but also to satisfy the operational criterion.
Explanation: the domain theory is used to prune away all unimportant aspects of the
training example with respect to the goal concept.
Generalisation: the explanation is generalized as far as possible while still describing the goal concept.
An example of EBL using a perfect domain theory is a program that learns to play chess by
being shown examples. A specific chess position that contains an important feature, say,
"Forced loss of black queen in two moves," includes many irrelevant features, such as the
specific scattering of pawns on the board. EBL can take a single training example and
determine what the relevant features are in order to form a generalization.
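The chess example is too large to spell out here, so the following Python sketch uses a hypothetical toy domain theory for the concept "cup" (an example commonly used for EBL, not taken from this text). It shows the explanation step concretely: only the features of the training example that the domain theory actually uses in proving the goal concept are kept, and the irrelevant ones (color, owner, age) are pruned away.

# Toy EBL sketch: a propositional domain theory (Horn rules) is used to
# explain why one training example satisfies the goal concept, and the
# features used in that explanation form the generalization.
# The "cup" theory and the example object are assumptions for illustration.

DOMAIN_THEORY = {                       # head: list of subgoals
    "cup": ["liftable", "stable", "open_vessel"],
    "liftable": ["light", "has_handle"],
    "stable": ["flat_bottom"],
    "open_vessel": ["concave_top"],
}

# every observed feature of one object; most are irrelevant to "cup"
EXAMPLE = {"light", "has_handle", "flat_bottom", "concave_top",
           "color_red", "owner_fred", "made_in_1990"}

def explain(goal, facts, theory):
    """Return the set of example features used to prove the goal,
    or None if the goal cannot be proved from the facts."""
    if goal in facts:
        return {goal}                   # a leaf of the explanation tree
    if goal not in theory:
        return None                     # not provable
    used = set()
    for subgoal in theory[goal]:
        sub = explain(subgoal, facts, theory)
        if sub is None:
            return None
        used |= sub
    return used

relevant = explain("cup", EXAMPLE, DOMAIN_THEORY)
print("features kept by the explanation:", sorted(relevant))
# -> ['concave_top', 'flat_bottom', 'has_handle', 'light']
# Generalization: any object with these four features satisfies the goal
# concept, regardless of its color, owner, or age.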
Learning by Analogy:
The question in the figure above represents some known aspects of a new case, which has
unknown aspects to be determined. In deduction, the known aspects are compared (by a
version of structure mapping called unification) with the premises of some implication.
Then the unknown aspects, which answer the question, are derived from the conclusion of
the implication. In analogy, the known aspects of the new case are compared with the
corresponding aspects of the older cases. The case that gives the best match may be
assumed as the best source of evidence for estimating the unknown aspects of the new
case. The other cases show alternative possibilities for those unknown aspects; the closer
the agreement among the alternatives, the stronger the evidence for the conclusion.
1. Retrieve: Given a target problem, retrieve cases from memory that are relevant to
solving it. A case consists of a problem, its solution, and, typically, annotations
about how the solution was derived. For example, suppose Fred wants to prepare
blueberry pancakes. Being a novice cook, the most relevant experience he can
recall is one in which he successfully made plain pancakes. The procedure he
followed for making the plain pancakes, together with justifications for decisions
made along the way, constitutes Fred's retrieved case.
2. Reuse: Map the solution from the previous case to the target problem. This may
involve adapting the solution as needed to fit the new situation. In the pancake
example, Fred must adapt his retrieved solution to include the addition of
blueberries.
3. Revise: Having mapped the previous solution to the target situation, test the new
solution in the real world (or a simulation) and, if necessary, revise. Suppose Fred
adapted his pancake solution by adding blueberries to the batter. After mixing, he
discovers that the batter has turned blue – an undesired effect. This suggests the
following revision: delay the addition of blueberries until after the batter has been
ladled into the pan.
4. Retain: After the solution has been successfully adapted to the target problem,
store the resulting experience as a new case in memory. Fred, accordingly, records
his newfound procedure for making blueberry pancakes, thereby enriching his set
of stored experiences, and better preparing him for future pancake-making
demands.
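The four steps above can be written down as a small skeleton in Python. The case representation, the feature-overlap similarity measure, and the canned adaptation and repair rules below are all assumptions made only to keep the sketch short; a real case-based reasoner would use much richer versions of each step.

# Skeleton of the retrieve / reuse / revise / retain cycle.
# The cases, similarity measure and adaptation rules are illustrative
# assumptions, not part of the original text.

case_library = [
    {"problem": {"dish": "pancakes", "extra": None},
     "solution": ["mix batter", "ladle into pan", "flip when bubbling"]},
    {"problem": {"dish": "omelette", "extra": None},
     "solution": ["beat eggs", "pour into pan", "fold"]},
]

def similarity(p1, p2):
    """Count the attributes on which two problem descriptions agree."""
    return sum(1 for k in p1 if k in p2 and p1[k] == p2[k])

def retrieve(problem):
    """1. Retrieve: find the most similar stored case."""
    return max(case_library, key=lambda case: similarity(problem, case["problem"]))

def reuse(case, problem):
    """2. Reuse: adapt the old solution to the new problem."""
    solution = list(case["solution"])
    if problem.get("extra"):
        solution.insert(1, "add " + problem["extra"] + " to batter")
    return solution

def revise(solution, problem):
    """3. Revise: repair the solution after testing it (a canned repair here)."""
    if problem.get("extra") == "blueberries":
        # testing showed the batter turns blue, so delay the addition
        solution.remove("add blueberries to batter")
        solution.append("add blueberries after the batter is ladled into the pan")
    return solution

def retain(problem, solution):
    """4. Retain: store the new experience as a case."""
    case_library.append({"problem": problem, "solution": solution})

target = {"dish": "pancakes", "extra": "blueberries"}
case = retrieve(target)
draft = reuse(case, target)
final = revise(draft, target)
retain(target, final)
print(final)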
Transformational Analogy:
Suppose you are asked to prove a theorem in plane geometry. You might look for a
previous theorem that is very similar and copy its proof, making substitutions when
necessary. The idea is to transform the solution of a previous problem into a solution for the current problem. The following figure shows this process:
[Figure: transformational analogy between the new problem and a previously solved problem.]
Derivational Analogy:
Notice that transformational analogy does not look at how the old problem was solved; it only looks at the final solution. Often the twists and turns involved in solving an old problem are relevant to solving a new problem. The detailed history of a problem-solving episode is called a derivation. Analogical reasoning that takes these histories into account is called derivational analogy.
[Figure: derivational analogy between the new problem and a previously solved problem.]
Refer Book: E. Rich, K. Knight, S. B. Nair, Artificial Intelligence, Tata McGraw Hill (pages 371-372).
Refer Book: P. H. Winston, Artificial Intelligence, Addison-Wesley (around page 220).
Learning in Neural Networks:

The learning algorithm we demonstrate is the same across all the output neurons; therefore everything that follows is applied to a single neuron in isolation. We first define some variables:

x = (x(1), ..., x(n)) is the input vector,
w = (w(1), ..., w(n)) is the weight vector,
b is the bias term,
y is the actual output of the neuron and d is the desired output.

Assume for convenience that the bias term b is zero. An extra dimension n + 1 can be added to the input vectors x, with x(n + 1) = 1, in which case w(n + 1) replaces the bias term.
To train the neuron, the following steps are carried out for the training examples:
1. Initialise the weights (for example, to zero or to small random values).
2. Present an input vector x together with its desired output d.
3. Calculate the actual output: the appropriate weights are applied to the inputs, and the resulting weighted sum is passed to a function f which produces the output y = f(w(1)x(1) + ... + w(n)x(n) + b).
4. Adapt the weights so as to reduce the difference between the desired output d and the actual output y.
Steps 3 and 4 are repeated until the iteration error is less than a user-specified error
threshold or a predetermined number of iterations have been completed.
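The text above describes the procedure but not the exact adaptation rule, so the following Python sketch fills that in with the standard perceptron learning rule, w(i) <- w(i) + rate * (d - y) * x(i), together with a step function for f. The AND training set is an assumed example, and the bias is handled by the extra-input trick mentioned above.

# Minimal single-neuron training loop.  The step activation and the
# update w(i) <- w(i) + rate * (d - y) * x(i) are the standard perceptron
# learning rule (an assumption; the text does not give the exact rule).
# The bias b is replaced by the weight of an extra input fixed at 1.

def step(weighted_sum):
    return 1 if weighted_sum >= 0 else 0

def train(examples, rate=0.1, max_iterations=100):
    n = len(examples[0][0])
    w = [0.0] * (n + 1)                   # w[n] plays the role of the bias term
    for _ in range(max_iterations):
        errors = 0
        for x, d in examples:
            x = list(x) + [1.0]           # extra dimension x(n + 1) = 1
            y = step(sum(wi * xi for wi, xi in zip(w, x)))                 # step 3: actual output
            if y != d:
                errors += 1
                w = [wi + rate * (d - y) * xi for wi, xi in zip(w, x)]     # step 4: adapt weights
        if errors == 0:                   # iteration error is zero: stop early
            break
    return w

# logical AND as an assumed training set: (inputs, desired output)
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights = train(data)
print("weights:", weights)
for x, d in data:
    y = step(sum(wi * xi for wi, xi in zip(weights, list(x) + [1.0])))
    print(x, "->", y, "(desired:", d, ")")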