Machine Learning KNN Presentation

This document provides an overview of classification using the k-nearest neighbors (k-NN) algorithm. It discusses key concepts like labeled vs unlabeled data, different distance measures like Euclidean and Manhattan distance, properties of distances, and nearest neighbor search. The document also explains the k-NN algorithm and classification using a majority vote of the k closest neighbors. An example is shown to classify a new point based on its 3 nearest neighbors from a training set.


Classification Using K-Nearest Neighbor

Background

Prepared by
Anand Bhosale
Supervised vs. Unsupervised

• Supervised learning: labeled data
• Unsupervised learning: unlabeled data

Labeled (supervised):        Unlabeled (unsupervised):

X1   X2    Class             X1   X2
10   100   Square            10   100
2    4     Root              2    4
Distances

• Distances are used to measure similarity.
• There are many ways to measure the distance between two instances:
  – Euclidean distance
  – Hamming distance
  – Minkowski distance
  – Mahalanobis distance
Distances

• Manhattan Distance:
  |X1 - X2| + |Y1 - Y2|

• Euclidean Distance:
  Sqrt((X1 - X2)^2 + (Y1 - Y2)^2)
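The two formulas above can be sketched directly in code. This is a minimal illustration; the helper names `manhattan` and `euclidean` are our own, not from the slides, and the functions generalize the two-coordinate formulas to any dimension.

```python
import math

def manhattan(p, q):
    # |X1 - X2| + |Y1 - Y2|, summed over every coordinate
    return sum(abs(a - b) for a, b in zip(p, q))

def euclidean(p, q):
    # sqrt((X1 - X2)^2 + (Y1 - Y2)^2), summed over every coordinate
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(manhattan((3, 7), (7, 7)))  # 4
print(euclidean((3, 7), (7, 4)))  # 5.0
```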
Properties of Distance

• Non-negativity: Dist(x, y) >= 0
• Symmetry: Dist(x, y) = Dist(y, x)
• Triangle inequality (detours cannot shorten a distance):
  Dist(x, z) <= Dist(x, y) + Dist(y, z)
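The three properties can be checked numerically on a small sample of points. A quick sketch, using the Euclidean distance as the metric under test (the point set is illustrative only):

```python
import itertools
import math

def dist(p, q):
    # Euclidean distance between two points of equal dimension
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

points = [(7, 7), (7, 4), (3, 4), (1, 4), (3, 7)]

# Exhaustively check all triples of sample points.
for x, y, z in itertools.product(points, repeat=3):
    assert dist(x, y) >= 0                                  # non-negativity
    assert dist(x, y) == dist(y, x)                         # symmetry
    assert dist(x, z) <= dist(x, y) + dist(y, z) + 1e-12    # triangle inequality
print("all three properties hold on the sample points")
```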
Hamming Distance

• The Hamming distance between two equal-length strings (or bit vectors) is the number of positions at which the corresponding symbols differ.
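A minimal sketch of the Hamming distance; the function name `hamming` is our own:

```python
def hamming(a, b):
    # Count the positions at which two equal-length sequences differ.
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

print(hamming("karolin", "kathrin"))    # 3
print(hamming("1011101", "1001001"))    # 2
```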
Distance Measures

• Distance measure: what does it mean to be "similar"?

• Minkowski distance (the m-norm):

  d(x, y) = ||x - y||_m = ( Σ i=1..N |x_i - y_i|^m )^(1/m)

• Chebyshev distance: the limiting case m → ∞,
  d(x, y) = max_i |x_i - y_i|

• Mahalanobis distance (S is the covariance matrix of the data):

  d(x, y) = Sqrt( (x - y)^T S^(-1) (x - y) )
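The Minkowski family can be sketched in a few lines; setting m = 1 gives the Manhattan distance and m = 2 the Euclidean distance, while the Chebyshev distance takes the maximum coordinate difference. The function names below are our own:

```python
def minkowski(x, y, m):
    # (sum_i |x_i - y_i|^m)^(1/m); m=1 is Manhattan, m=2 is Euclidean
    return sum(abs(a - b) ** m for a, b in zip(x, y)) ** (1.0 / m)

def chebyshev(x, y):
    # Limit of the Minkowski distance as m -> infinity
    return max(abs(a - b) for a, b in zip(x, y))

print(minkowski((3, 7), (7, 4), 1))  # 7.0  (Manhattan)
print(minkowski((3, 7), (7, 4), 2))  # 5.0  (Euclidean)
print(chebyshev((3, 7), (7, 4)))     # 4    (Chebyshev)
```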
Nearest Neighbor and Exemplar

Ways to choose an exemplar (a representative point) for a set of instances:

• Arithmetic mean
• Geometric mean
• Medoid
• Centroid
Arithmetic Mean

• The arithmetic mean of n values is their sum divided by n.

Geometric Mean

• A term between two terms of a geometric sequence is the geometric mean of the two terms.
• Example: in the geometric sequence 4, 20, 100, ... (with a common ratio of 5), 20 is the geometric mean of 4 and 100: Sqrt(4 × 100) = 20.
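The exemplars listed above can be computed on a small point set. A sketch, using the example training points from later slides (variable names are our own; the medoid is the member of the set that minimizes total distance to the others, while the centroid is the coordinate-wise arithmetic mean and need not be a member of the set):

```python
import math

points = [(7, 7), (7, 4), (3, 4), (1, 4)]

# Centroid: coordinate-wise arithmetic mean of the points
centroid = tuple(sum(c) / len(points) for c in zip(*points))

# Geometric mean of 4 and 100, as in the sequence example
gmean = math.sqrt(4 * 100)

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Medoid: the actual point with the smallest total distance to the rest
medoid = min(points, key=lambda p: sum(euclidean(p, q) for q in points))

print(centroid)  # (4.5, 4.75)
print(gmean)     # 20.0
print(medoid)    # (3, 4)
```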
Nearest Neighbor Search

• Given: a set P of n points in R^d.
• Goal: a data structure which, given a query point q, finds the nearest neighbor p of q in P.
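The simplest realization of this goal is a linear scan, shown below as a sketch (function names are our own). In practice, tree structures such as k-d trees answer queries faster than the O(n) scan:

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def nearest_neighbor(P, q):
    # Linear scan over P: O(n) per query.
    return min(P, key=lambda p: euclidean(p, q))

P = [(7, 7), (7, 4), (3, 4), (1, 4)]
print(nearest_neighbor(P, (3, 7)))  # (3, 4)
```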
K-NN

• (k, l)-NN: reduce complexity by having a threshold l on the majority vote. We can restrict the associations through (k, l)-NN, assigning a class only when at least l of the k nearest neighbours agree.

K = 5
K-NN

• Select the K = 5 nearest neighbors of the query point by computing their Euclidean distances.
• Assign the class held by the majority of those K = 5 instances.
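The two steps above (find the k closest points, then take a majority vote) can be sketched as follows. The function names are our own, and the demo uses k = 3 since the example training set below has only four points:

```python
import math
from collections import Counter

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn_classify(training, query, k):
    # training: list of (point, label) pairs.
    # Step 1: sort by distance to the query and keep the k closest.
    neighbors = sorted(training, key=lambda t: euclidean(t[0], query))[:k]
    # Step 2: majority vote among the k neighbors' labels.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

training = [((7, 7), "BAD"), ((7, 4), "BAD"), ((3, 4), "GOOD"), ((1, 4), "GOOD")]
print(knn_classify(training, (3, 7), k=3))  # GOOD
```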
Example

Points   X1 (Acid Durability)   X2 (Strength)   Y (Classification)
P1       7                      7               BAD
P2       7                      4               BAD
P3       3                      4               GOOD
P4       1                      4               GOOD
KNN Example

Points   X1 (Acid Durability)   X2 (Strength)   Y (Classification)
P1       7                      7               BAD
P2       7                      4               BAD
P3       3                      4               GOOD
P4       1                      4               GOOD
P5       3                      7               ?
Scatter Plot

(Figure: scatter plot of the training points P1–P4 and the query point P5.)

Euclidean Distance From Each Point
KNN

Euclidean distance of P5(3, 7) from each training point:

  P1(7, 7):  Sqrt((7-3)^2 + (7-7)^2) = 4
  P2(7, 4):  Sqrt((7-3)^2 + (4-7)^2) = 5
  P3(3, 4):  Sqrt((3-3)^2 + (4-7)^2) = 3
  P4(1, 4):  Sqrt((1-3)^2 + (4-7)^2) = Sqrt(13) ≈ 3.61
3 Nearest Neighbours

  P1(7, 7):  distance 4                class BAD
  P2(7, 4):  distance 5                class BAD
  P3(3, 4):  distance 3                class GOOD
  P4(1, 4):  distance Sqrt(13) ≈ 3.61  class GOOD

The 3 nearest neighbours of P5 are P3, P4, and P1: two GOOD and one BAD, so the majority vote classifies P5 as GOOD.
KNN Classification

Points   X1 (Durability)   X2 (Strength)   Y (Classification)
P1       7                 7               BAD
P2       7                 4               BAD
P3       3                 4               GOOD
P4       1                 4               GOOD
P5       3                 7               GOOD
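The worked example can be reproduced end to end in a few lines. This sketch computes the four distances, keeps the three nearest neighbours, and takes the majority vote (variable names are our own):

```python
import math
from collections import Counter

training = [((7, 7), "BAD"), ((7, 4), "BAD"), ((3, 4), "GOOD"), ((1, 4), "GOOD")]
query = (3, 7)  # P5

# Distance/label pairs, sorted nearest first.
dists = sorted(
    (math.sqrt(sum((a - b) ** 2 for a, b in zip(p, query))), label)
    for p, label in training
)
top3 = dists[:3]
print(top3)  # nearest three: 3.0 GOOD, Sqrt(13) ≈ 3.61 GOOD, 4.0 BAD

# Majority vote among the 3 nearest neighbours.
result = Counter(label for _, label in top3).most_common(1)[0][0]
print(result)  # GOOD
```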
Variation In KNN
Different Values of K
References

• Peter Flach, Machine Learning: The Art and Science of Algorithms that Make Sense of Data.
• A presentation on the KNN algorithm, West Virginia University, published May 22, 2015.
Thanks
