
NPTEL

Video Course on Machine Learning

Professor Carl Gustaf Jansson, KTH

Week 4: Inductive Learning based on Symbolic Representations and Weak Theories

Video 4.4 Instance Based Learning Part 1


Subtopics for this lecture

- Instance Based Learning in general
- The structure of the instance space
- K-Nearest Neighbor algorithm
- Distance and Similarity metrics
- Weighted Nearest Neighbor algorithm
- Binary Linear Classifier
- Support Vector Machines
- Kernel Methods
- The Kernel Trick enabling binary non-linear classification
Instance Based Learning
Synonym: Memory-based learning

Instance-based learning is a family of learning algorithms that, instead of performing explicit generalization, compare new problem instances with instances seen in training, which have been stored in memory. It is called instance-based because it evaluates test cases directly against the training instances themselves.

Instance based learning is a kind of lazy learning, where the evaluation is only
approximated locally and all computation is deferred until classification.

In the worst case, a hypothesis is a list of n training items and the computational
complexity of classifying a single new instance is O(n).

One advantage that instance based learning has over other methods of machine
learning is its high flexibility to adapt its model to previously unseen data.

Instance-based learners may store a new instance and/or throw an old instance away.
The structure of the instance space
In many machine learning approaches the internal structure of the instance space
is not explicitly considered.

However, the character of the instance space will always implicitly influence the
performance of learning algorithms even if it is not explicitly considered in the
algorithm design.

In contrast, for instance based learning, the character of the instance space is of
key importance.

In earlier lectures, a few crucial structural aspects of the instance space have been
mentioned:
- The number of features
- The value set of features
- Instances with special status: prototypes, outliers and near misses
- Similarity or distance measures
- Structural properties of the whole space such as sparseness, density etc.
These aspects will come into play now.
K-Nearest Neighbor Algorithm (KNN)
In the k-nearest neighbors algorithm (k-NN) the analysis is based on the k closest
training examples in the instance space.

k is a predefined positive integer, typically small and odd. An optimal k can potentially be found by special techniques (hyperparameter optimization techniques).

The typical representation of an instance x is a feature vector (a_1(x), a_2(x), ..., a_n(x)) together with a target function value f(x). The training phase is simply the storage of the feature vectors of all training instances in a data structure.

A distance metric is always needed. A default metric is the Euclidean distance
d(x_i, x_j) = sqrt( Sum_{r=1..n} (a_r(x_i) - a_r(x_j))^2 ).
A metric is typically manually defined but can also be learned.

The k-NN algorithm can be used for both classification and regression:

• In k-NN classification, the output is a class membership. A query instance x_q is assigned the class label most common among its k nearest neighbors. If k = 1, the instance is simply assigned the class of that single nearest neighbor.
• In k-NN regression, the output is the property value for the query instance x_q, computed as the average of the values of its k nearest neighbors.

For the examples we will use binary classification in a two-dimensional feature space.
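
As a concrete illustration, here is a minimal sketch of the k-NN prediction step, assuming Euclidean distance and a small hand-made data set (the function name and toy data are illustrative, not taken from the lecture):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3, mode="classify"):
    """Predict the label (classification) or value (regression) of x_query."""
    # Euclidean distance from the query to every stored training instance
    dists = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]                # indices of the k closest instances
    if mode == "classify":
        # majority vote among the k nearest neighbors
        return Counter(y_train[nearest]).most_common(1)[0][0]
    # regression: average of the neighbors' target values
    return y_train[nearest].mean()

# Toy two-dimensional binary classification problem
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [3.0, 3.2], [3.1, 2.9]])
y = np.array(["BLUE", "BLUE", "BLUE", "RED", "RED"])
print(knn_predict(X, y, np.array([1.1, 1.0]), k=3))   # -> BLUE
```

The same neighbor search supports both modes: a majority vote for classification and an average for regression.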
Illustration of a simple classification application of the K-nearest neighbor algorithm

[Figure: the circles represent instances of the algorithm for values of K = 1, 3, 5. Legend: query instance, instance classified as BLUE, instance classified as RED]

In the example the query instance is classified as follows:
- In the k=1 case: BLUE
- In the k=3 case: RED
- In the k=5 case: BLUE
Implicit representation or visualization of
the Hypothesis space
It is obvious that in the case of instance-based learning there is only one
explicit space, the instance space. A hypothesis is never explicitly built
up.

The 'hypotheses' are implicit in the structure of the instance space.

One form of such an implicit representation is the so-called Voronoi Diagram.

A Voronoi Diagram is a partitioning of the decision surface into convex polyhedral surroundings of the training instances.

Each polyhedron covers the potential query instances positively determined by a training instance. Query points outside a specific polyhedron are closer to another training instance.

The approach can be extended to more than two dimensions.
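
For a two-dimensional instance space the diagram can be computed and drawn directly; a minimal sketch using SciPy's scipy.spatial.Voronoi, with purely illustrative random training points:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import Voronoi, voronoi_plot_2d

# Illustrative 2D training instances (not from the lecture)
rng = np.random.default_rng(0)
points = rng.uniform(0, 10, size=(12, 2))

# Each Voronoi cell is the region of query points closest to one training
# instance, i.e. the decision region of a 1-NN classifier for that instance.
vor = Voronoi(points)
voronoi_plot_2d(vor)
plt.title("Voronoi cells of the training instances (1-NN regions)")
plt.show()
```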


Normed and Inner Product Euclidean Vector Spaces
In this presentation we only consider vectors in a Euclidean space.

A normed vector space is a vector space over the real or complex numbers on which a norm, or length, is defined. A norm is a real-valued function that has the following properties:
1. A norm is written as d(x) or |x|, where x is a vector
2. d(x) >= 0
3. d(k*x) = |k| * d(x)
4. d(x + y) <= d(x) + d(y)
5. A Euclidean norm is written as ||x|| and equals sqrt( Sum_{r=1..n} a_r(x)^2 )
A norm applied to the difference of two vectors is called a distance: d(x - y).

An inner product space is a normed Euclidean vector space on which an inner product, or dot product, is defined. The inner product associates each pair of vectors in the space with a scalar quantity. Inner products allow the introduction of the intuitive geometrical notion of the angle between two vectors.
6. An inner product or dot product is written as (x.y), where x and y are vectors
7. (x.y) = Sum_{r=1..n} a_r(x) * a_r(y)
8. (x.y) = d(x) * d(y) * cos(angle between x and y)
9. If the angle is 90 degrees, (x.y) = d(x) * d(y) * cos(90°) = 0, i.e. x and y are orthogonal
10. A Euclidean norm ||x|| = sqrt( (x.x) ).
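
A short NumPy sketch of properties 5-10 for two concrete vectors (the example vectors are purely illustrative):

```python
import numpy as np

x = np.array([3.0, 4.0])
y = np.array([4.0, 3.0])

norm_x = np.sqrt(np.dot(x, x))        # Euclidean norm ||x|| = sqrt((x.x)) = 5.0
dot_xy = np.dot(x, y)                 # inner product (x.y) = Sum a_r(x)*a_r(y) = 24.0
cos_xy = dot_xy / (np.linalg.norm(x) * np.linalg.norm(y))  # (x.y) = ||x|| ||y|| cos(angle)
dist_xy = np.linalg.norm(x - y)       # norm of a difference of vectors = a distance
print(norm_x, dot_xy, cos_xy, dist_xy)   # 5.0 24.0 0.96 1.414...
```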
Distance and Similarity Metrics
A distance metric (measure, function) is typically a real-valued function that quantifies
the distance between two objects:
• distances between a point and itself are zero: d(x,x) = 0;
• all other distances are larger than zero: d(x, y) > 0
• distances are symmetric: d(y,x) = d(x, y)
• detours can not shorten the distance: d(x, z) <= d(x, y)+d(y, z)

Distance metrics and similarity metrics have been developed more or less
independently for different purposes, but intuitively specific similarity metrics are
inverses of corresponding distance metrics and can be transformed into each
other.

Typically, similarity metrics take values in the range -1...0...1, where 1 means that the objects are regarded as identical and -1 corresponds to the maximum distance considered by the corresponding distance metric. Distance metrics can take arbitrary values from 0 to infinity. Through suitable transforms and normalizations, distance and similarity metrics can be made comparable.

We will exemplify this with metrics in a normed Euclidean vector space and metrics based on overlapping elements.
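
One common form of such a transform (an assumption for illustration; the lecture does not prescribe a particular one) maps an unbounded distance to a bounded similarity, for example s = 1 / (1 + d):

```python
def distance_to_similarity(d):
    """Map a distance in [0, infinity) to a similarity in (0, 1]; d = 0 gives similarity 1."""
    return 1.0 / (1.0 + d)

for d in [0.0, 0.5, 2.0, 10.0]:
    print(d, "->", round(distance_to_similarity(d), 3))   # 1.0, 0.667, 0.333, 0.091
```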
Metrics in Normed and Inner product vector spaces
Minkowski distance
The Minkowski distance is a metric in a normed Euclidean vector space.
d(x_i, x_j) = ( Sum_{r=1..n} |a_r(x_i) - a_r(x_j)|^k )^(1/k), range: 0..infinity

Manhattan or taxicab distance = the Minkowski distance with k=1
d(x_i, x_j) = Sum_{r=1..n} |a_r(x_i) - a_r(x_j)|, range: 0..infinity
The sum of the absolute differences of the Cartesian coordinates of the two vectors.

Euclidean distance = the Minkowski distance with k=2
||x_i - x_j|| = d(x_i, x_j) = sqrt( Sum_{r=1..n} (a_r(x_i) - a_r(x_j))^2 ), range: 0..infinity
The classic Euclidean distance according to the theorem of Pythagoras.

Chebyshev or chessboard distance = the Minkowski distance as k -> infinity
d(x_i, x_j) = lim_{k->infinity} ( Sum_{r=1..n} |a_r(x_i) - a_r(x_j)|^k )^(1/k) = max_{r=1..n} |a_r(x_i) - a_r(x_j)|, range: 0..infinity
The greatest of the differences along any coordinate dimension; the minimum number of moves a king requires to move between two squares on a chessboard.

Cosine similarity measure
s(x_i, x_j) = cos(angle between x_i and x_j) = ( Sum_{r=1..n} a_r(x_i) * a_r(x_j) ) / ( sqrt( Sum_{r=1..n} a_r(x_i)^2 ) * sqrt( Sum_{r=1..n} a_r(x_j)^2 ) ), range: -1..1
The cosine measure disregards the magnitude of the vectors, which is preferable for certain data sets.
Different metrics give rise to different Voronoi diagrams
Example of Cosine similarity
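
The Minkowski family and the cosine measure fit in a few lines of NumPy; a minimal sketch (function names and example vectors are illustrative):

```python
import numpy as np

def minkowski(x, y, k):
    """Minkowski distance of order k; k=1 Manhattan, k=2 Euclidean, k=inf Chebyshev."""
    if np.isinf(k):
        return np.max(np.abs(x - y))
    return np.sum(np.abs(x - y) ** k) ** (1.0 / k)

def cosine_similarity(x, y):
    """Cosine of the angle between x and y; ignores the vectors' magnitudes."""
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
print(minkowski(x, y, 1), minkowski(x, y, 2), minkowski(x, y, np.inf))  # 6.0 3.742 3.0
print(cosine_similarity(x, y))   # 1.0: same direction, different magnitude
```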
Metrics based on overlapping elements
One category of metrics measures the degree of overlap of elements in sets, arrays or vectors. Elements can be binary digits, numbers or words.

Levenshtein Distance
A string metric for measuring the difference between two sequences.
Informally, the Levenshtein distance between two words is the minimum
number of single-character edits (insertions, deletions or substitutions)
required to change one word into the other.

Jaccard Similarity, Index or Coefficient
The Jaccard index, or Jaccard coefficient, measures similarity between finite sample sets and is defined as the size of the intersection divided by the size of the union of the sample sets.

Hamming distance
The Hamming distance between two strings of equal length is the number of
positions at which the corresponding symbols are different. In other words, it
measures the minimum number of substitutions required to change one string
into the other.
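
Minimal sketches of the three overlap-based metrics in plain Python (the example strings and sets are illustrative):

```python
def hamming(s1, s2):
    """Number of positions at which two equal-length strings differ."""
    assert len(s1) == len(s2)
    return sum(c1 != c2 for c1, c2 in zip(s1, s2))

def jaccard(a, b):
    """Size of the intersection divided by the size of the union of two sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def levenshtein(s1, s2):
    """Minimum number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, 1):
        cur = [i]
        for j, c2 in enumerate(s2, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (c1 != c2)))    # substitution
        prev = cur
    return prev[-1]

print(hamming("karolin", "kathrin"))     # 3
print(jaccard({1, 2, 3}, {2, 3, 4}))     # 0.5
print(levenshtein("kitten", "sitting"))  # 3
```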
Issues to consider for the k-nearest neighbor algorithm
In binary (two class) classification problems, it is helpful to choose k to be an odd number as
this avoids tied votes. One way of choosing the empirically optimal k in this setting is via the
bootstrap method.

The principle of "majority voting" for deciding the class labels can be problematic when the
class distribution is skewed.

Instances of a more frequent class tend to dominate the prediction of the new examples, because
they tend to be more common among the k nearest neighbors due to their large number.

Irrelevant features within a large feature set tend to degrade performance.

The simple model where all instances are treated fairly using the same distance metric may be
inadequate e.g. in sparse instance spaces.
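
The weighted nearest neighbor algorithm listed among the subtopics addresses this last point; as a preview, a minimal sketch of distance-weighted voting, assuming inverse-distance weights (one common choice, not necessarily the lecture's exact formulation):

```python
import numpy as np
from collections import defaultdict

def weighted_knn_classify(X_train, y_train, x_query, k=5):
    """Classify x_query with votes weighted by the inverse distance to the query."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = defaultdict(float)
    for i in nearest:
        votes[y_train[i]] += 1.0 / (dists[i] + 1e-12)   # closer neighbors count more
    return max(votes, key=votes.get)
```

Closer neighbors then outweigh a larger number of more distant ones, which mitigates the effect of a skewed class distribution.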
The Bootstrap method
The bootstrap method is a statistical technique for estimating quantities about a population by averaging estimates computed from multiple small data samples.

Importantly, samples are constructed by drawing observations from a large data sample one at a time and returning them to the data sample after they have been chosen. This allows a given observation to be included in a given small sample more than once. This approach to sampling is called sampling with replacement.

The bootstrap method can be used to estimate a quantity of a population. This is done by repeatedly taking small samples, calculating the statistic, and taking the average of the calculated statistics.

We can summarize this procedure as follows:


1. Choose a number of bootstrap samples to perform
2. Choose a sample size
3. For each bootstrap sample
a) Draw a sample with replacement with the chosen size
b) Calculate the statistic on the sample
4. Calculate the mean of the calculated sample statistics.
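
A minimal sketch of this procedure in Python, assuming NumPy and the mean as the statistic of interest (the data values are illustrative):

```python
import numpy as np

def bootstrap_estimate(data, statistic, n_samples=1000, sample_size=None, seed=0):
    """Average a statistic over bootstrap samples drawn with replacement."""
    rng = np.random.default_rng(seed)
    sample_size = sample_size or len(data)                         # step 2
    estimates = []
    for _ in range(n_samples):                                     # step 3
        sample = rng.choice(data, size=sample_size, replace=True)  # step 3a
        estimates.append(statistic(sample))                        # step 3b
    return np.mean(estimates)                                      # step 4

data = np.array([2.3, 1.9, 2.7, 3.1, 2.0, 2.5, 1.8, 2.9])
print(bootstrap_estimate(data, np.mean))   # close to the sample mean of `data`
```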
To be continued in Part 2
