Nearest-Neighbor Classifier (Instance-Based Learning)

Ref & Acknowledgments
1. Dr B S Panda, IIT Delhi
2. R. Zemel, R. Urtasun, S. Fidler, University of Toronto
3. Dr Sudeshna Sarkar, IIT Kharagpur

Lazy vs. Eager Learning
• Lazy learning (e.g., instance-based learning): simply stores the training data (or does only minor processing) and waits until it is given a test tuple.
• Eager learning: given a set of training tuples, constructs a classification model before receiving new (e.g., test) data to classify, e.g., Naïve Bayes, decision trees, SVM.
• Time: lazy methods spend less time in training but more time in predicting.
• Accuracy: a lazy method effectively uses a richer hypothesis space, since it uses many local linear functions to form an implicit global approximation to the target function; an eager method must commit to a single hypothesis that covers the entire instance space.

Nearest Neighbors: Decision Boundaries
• The nearest-neighbor algorithm does not explicitly compute decision boundaries, but these can be inferred.
• Decision boundaries: the Voronoi diagram visualization shows how the input space is divided into classes.
• Each line segment of the boundary is equidistant between two points of opposite classes.
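The Voronoi picture described above can be reproduced in a few lines. The sketch below is only an illustration (the synthetic data, grid resolution and plotting choices are assumptions, not part of the slides): it colours each point of a 2-D grid by the label of its nearest training point, which makes the implicit 1-NN decision boundary visible.

# Sketch: visualize the implicit 1-NN decision boundary (Voronoi regions) in 2-D.
# Synthetic data; each grid point is coloured by the class of its nearest training point.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(20, 2))            # 20 training points in 2-D
y = (X[:, 0] + X[:, 1] > 1).astype(int)        # two classes split by a diagonal

xx, yy = np.meshgrid(np.linspace(0, 1, 300), np.linspace(0, 1, 300))
grid = np.c_[xx.ravel(), yy.ravel()]
dist = np.linalg.norm(grid[:, None, :] - X[None, :, :], axis=2)   # grid-to-training distances
labels = y[dist.argmin(axis=1)].reshape(xx.shape)                 # label of the nearest point

plt.contourf(xx, yy, labels, alpha=0.3)        # Voronoi-style class regions
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors="k")
plt.title("1-NN decision regions (boundary segments equidistant between classes)")
plt.show()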
Classification: Parametric vs. Non-parametric
• Linear regression relates two variables with a straight line; nonlinear regression relates the variables using a curve.
• When the characteristics (parameters) of such a line or curve must be estimated, the classification method is parametric.
• The alternative is non-parametric: typically simple methods for approximating discrete-valued or real-valued target functions (they work for classification or regression problems).

Instance-Based Learning
• One way of solving the task of approximating discrete- or real-valued target functions.
• We have training examples (x_n, f(x_n)), n = 1..N.
• Key idea:
  – just store the training examples;
  – when a test example is given, find the closest matches.
Nearest Neighbors: Decision Boundaries
[Figure: example of a 2-D decision boundary]
Instance-Based Classifiers
• Store the training records as a set of stored cases, each with attributes Atr1, …, AtrN and a class label (e.g., A, B, B, C, A, C, B).
• Use the stored training records to predict the class label of unseen cases (records with attributes Atr1, …, AtrN but no label).

Inductive Assumption
• Similar inputs map to similar outputs.
  – If this is not true, learning is impossible.
  – If it is true, learning reduces to defining "similar".
• Not all similarities are created equal:
  – predicting a person's weight may depend on different attributes than predicting their IQ.
Nearest Neighbors: Decision Boundaries
[Figure: example of a 3-D decision boundary]

Nearest Neighbors: Multi-modal Data
• Nearest-neighbor approaches can work with multi-modal data.
• Multi-modal data: data that spans different types and contexts (e.g., imaging, text, or genetics).

Instance-Based Classifiers
• Examples:
  – Rote-learner: memorizes the entire training data and classifies a record only if its attributes match one of the training examples exactly.
  – Nearest neighbor: uses the k "closest" points (nearest neighbors) to perform classification.

Nearest-Neighbor Classifiers
• Requires three things:
  – the set of stored records;
  – a distance metric to compute the distance between records;
  – the value of k, the number of nearest neighbors to retrieve.
• To classify an unknown record:
  – compute its distance to the training records;
  – identify the k nearest neighbors;
  – use the class labels of the nearest neighbors to determine the class label of the unknown record (e.g., by taking a majority vote).
Nearest Neighbor Classifiers
• Basic idea:
  – If it walks like a duck and quacks like a duck, then it's probably a duck.
• [Diagram: training records → compute distance to the test record → choose the k "nearest" records]

Definition of Nearest Neighbor
• [Figure: (a) 1-nearest neighbor, (b) 2-nearest neighbor, (c) 3-nearest neighbor]
• The k nearest neighbors of a record x are the data points that have the k smallest distances to x.

Nearest Neighbors [Pic by Olga Veksler]
• Nearest neighbors are sensitive to mis-labeled data ("class noise"). Solution?
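One way to see this sensitivity concretely: the small sketch below (synthetic 1-D data; all values and helper names are illustrative assumptions) flips one training label near the query and compares k = 1 with k = 3, where the majority vote absorbs the noise.

# Sketch: sensitivity to class noise (mis-labeled training points).
# With k = 1 a single flipped label changes the prediction; a larger k smooths it out.
import numpy as np

def knn_predict(X, y, x_query, k):
    nn = np.argsort(np.linalg.norm(X - x_query, axis=1))[:k]
    return np.bincount(y[nn]).argmax()                 # majority vote among the k nearest

X = np.array([[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]])
y = np.array([0, 0, 0, 1, 1, 1])
y_noisy = y.copy()
y_noisy[2] = 1                                         # mislabel one point near the query

print(knn_predict(X, y_noisy, np.array([0.25]), k=1))  # 1 -> fooled by the noisy label
print(knn_predict(X, y_noisy, np.array([0.25]), k=3))  # 0 -> majority vote recovers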
k-Nearest Neighbors
[Figure: Pic by Olga Veksler]

Nearest Neighbor Classification…
• Choosing the value of k:
  – If k is too small, the classifier is sensitive to noise points.
  – If k is too large, the neighborhood may include points from other classes.
  – We can use cross-validation to find k (see the sketch after this slide).
  – A rule of thumb is k < sqrt(n), where n is the number of training examples.

K-NN: Issues (Complexity) & Remedies
• Expensive at test time: to find one nearest neighbor of a query point x, we must compute the distance to all N training examples; the complexity is O(kdN) for kNN. Remedies:
  – use a subset of the dimensions;
  – pre-sort the training examples into fast data structures (e.g., kd-trees, also used in the sketch below);
  – compute only an approximate distance (e.g., LSH);
  – remove redundant data (e.g., condensing).
• Storage requirements: all training data must be stored.
  – Remove redundant data (e.g., condensing).
• High-dimensional data: the "curse of dimensionality".
  – The required amount of training data increases exponentially with the dimension.
  – The computational cost also increases.
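A sketch of the cross-validation and kd-tree remedies above (scikit-learn is an assumed dependency, and the synthetic dataset, fold count and range of k are illustrative, not part of the slides):

# Sketch: choose k by cross-validation, using a kd-tree to speed up neighbor search.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

best_k, best_score = None, -np.inf
for k in range(1, int(np.sqrt(len(X)))):                # rule of thumb: k < sqrt(n)
    clf = KNeighborsClassifier(n_neighbors=k, algorithm="kd_tree")  # kd-tree index
    score = cross_val_score(clf, X, y, cv=5).mean()     # 5-fold cross-validation
    if score > best_score:
        best_k, best_score = k, score

print(f"best k = {best_k}, cross-validated accuracy = {best_score:.3f}")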
Remedies: Remove Redundancy
• If all of a sample's Voronoi neighbors have the same class, the sample is redundant: remove it.

Nearest Neighbor Classification
• Compute the distance between two points (see the sketch below):
  – Euclidean distance: d(p, q) = sqrt( Σ_i (p_i − q_i)² )
  – Manhattan distance: d(p, q) = Σ_i |p_i − q_i|
  – q-norm (Minkowski) distance: d(p, q) = ( Σ_i |p_i − q_i|^q )^(1/q)

Nearest Neighbor Classification: Issues
• Scaling issues:
  – If some attributes (coordinates of x) have larger ranges, they are treated as more important.
  – Example:
    • the height of a person may vary from 1.5 m to 1.8 m;
    • the weight of a person may vary from 60 kg to 100 kg;
    • the income of a person may vary from Rs 10K to Rs 2 lakh.
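The three distance measures can be written directly in NumPy. The sketch below is illustrative (function names and example vectors are assumptions); the exponent is called q_exp to avoid clashing with the vector q:

# Sketch: Euclidean, Manhattan and q-norm (Minkowski) distances from the slide.
import numpy as np

def euclidean(p, q):
    return np.sqrt(np.sum((p - q) ** 2))

def manhattan(p, q):
    return np.sum(np.abs(p - q))

def q_norm(p, q, q_exp):
    # q_exp = 2 gives the Euclidean distance, q_exp = 1 the Manhattan distance.
    return np.sum(np.abs(p - q) ** q_exp) ** (1.0 / q_exp)

p = np.array([1.0, 2.0, 3.0])
q = np.array([2.0, 0.0, 4.0])
print(euclidean(p, q), manhattan(p, q), q_norm(p, q, q_exp=3))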
Nearest Neighbor Classification
• Determine the class from the nearest-neighbor list:
  – Take the majority vote of the class labels among the k nearest neighbors:
    y′ = argmax_v Σ_{(x_i, y_i) ∈ D_z} I(v = y_i),
    where D_z is the set of the k closest training examples to z and I(·) is the indicator function.
  – Or weigh each vote according to distance (see the sketch after this group of slides):
    y′ = argmax_v Σ_{(x_i, y_i) ∈ D_z} w_i · I(v = y_i),
    with weight factor w_i = 1/d(x′, x_i)².

Nearest Neighbor Classification: Scaling Issue
• Attributes may have to be scaled to prevent distance measures from being dominated by one of the attributes.
• Normalize the scale:
  – Simple option: linearly scale the range of each feature to lie, e.g., in [0, 1].
  – Alternatively, linearly scale each dimension to have zero mean and unit variance: compute the mean µ and variance σ² of attribute x_j and scale it as (x_j − µ)/σ.

Nearest Neighbor Classification…
• k-NN classifiers are lazy learners:
  – they do not build a model explicitly, unlike eager learners such as decision tree induction and rule-based systems;
  – they naturally form complex decision boundaries and adapt to the data density;
  – if we have lots of samples, kNN typically works well.
• Problems:
  – sensitive to class noise;
  – sensitive to the scales of the attributes;
  – distances are less meaningful in high dimensions;
  – classifying unknown records is relatively expensive.
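A sketch of the distance-weighted vote combined with zero-mean, unit-variance scaling (the synthetic data, the small constant added to 1/d², and all names below are illustrative assumptions, not code from the slides):

# Sketch: distance-weighted kNN vote (w = 1/d^2) on z-score-scaled features.
import numpy as np
from collections import defaultdict

def zscore(x, mean, std):
    return (x - mean) / std

def weighted_knn_predict(X_train, y_train, x_query, k=3):
    d = np.linalg.norm(X_train - x_query, axis=1)        # Euclidean distances
    nearest = np.argsort(d)[:k]                          # indices of the k closest records
    votes = defaultdict(float)
    for i in nearest:
        votes[y_train[i]] += 1.0 / (d[i] ** 2 + 1e-12)   # weight factor w = 1/d^2
    return max(votes, key=votes.get)                     # class with the largest weighted vote

rng = np.random.default_rng(1)
X = np.c_[rng.normal(170, 10, 100), rng.normal(70, 15, 100)]   # height (cm), weight (kg)
y = (X[:, 0] > 170).astype(int)
mean, std = X.mean(axis=0), X.std(axis=0)
X_scaled = zscore(X, mean, std)                          # so neither attribute dominates
query = zscore(np.array([175.0, 80.0]), mean, std)
print(weighted_knn_predict(X_scaled, y, query, k=5))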
Nearest Neighbor Classification: Issues
• Irrelevant and correlated attributes add noise to the distance measure:
  – eliminate some attributes, or
  – vary and possibly adapt the weights of the attributes.
• Non-metric attributes (symbols):
  – use the Hamming distance.

The kNN Classification Algorithm
Let k be the number of nearest neighbors and D be the set of training examples.
1. for each test example z = (x′, y′) do
2.   Compute d(x′, x), the distance between z and every example (x, y) ∈ D.
3.   Select D_z ⊆ D, the set of the k closest training examples to z.
4.   y′ = argmax_v Σ_{(x_i, y_i) ∈ D_z} I(v = y_i)
5. end for
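A direct, runnable translation of the five steps above (a sketch only: the Euclidean metric, the array-based representation of D, and the toy data are assumptions):

# Sketch: direct translation of the kNN classification algorithm above.
# D is represented by the arrays X (examples) and y (labels).
import numpy as np

def knn_classify(X, y, X_test, k):
    preds = []
    for z in X_test:                                   # 1. for each test example z
        d = np.linalg.norm(X - z, axis=1)              # 2. distance to every example in D
        Dz = np.argsort(d)[:k]                         # 3. the k closest training examples
        values, counts = np.unique(y[Dz], return_counts=True)
        preds.append(values[np.argmax(counts)])        # 4. y' = argmax_v sum I(v = y_i)
    return np.array(preds)                             # 5. end for

X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.1, 4.9]])
y = np.array([0, 0, 1, 1])
print(knn_classify(X, y, np.array([[1.1, 0.9], [4.8, 5.2]]), k=3))   # -> [0 1]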
KNN Classification
[Figure: scatter plot of loan amount ($0–$2,50,000) vs. age (0–70), with Default and Non-Default classes]

NN Classification: Issue with Distance Measure
• Problems with the Euclidean measure:
  – High-dimensional data:
    • curse of dimensionality: all vectors are almost equidistant to the query vector.
  – It can produce undesirable results, e.g., for binary vectors:
    • 111111111110 vs 011111111111: d = 1.4142
    • 100000000000 vs 000000000001: d = 1.4142
    The first pair has ten 1-bits in common and the second pair has none, yet both Euclidean distances are identical.
• Solution: normalize the vectors to unit length (see the sketch below).
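A quick numerical check of the example above (a sketch; NumPy only, helper names are illustrative) shows that after unit-length normalization the two pairs are no longer equidistant:

# Sketch: the binary-vector example, before and after unit-length normalization.
import numpy as np

def bits(s):
    return np.array([int(c) for c in s], dtype=float)

pairs = [("111111111110", "011111111111"),   # differ in 2 bits but share ten 1-bits
         ("100000000000", "000000000001")]   # differ in 2 bits and share nothing

for a_str, b_str in pairs:
    a, b = bits(a_str), bits(b_str)
    raw = np.linalg.norm(a - b)                               # plain Euclidean distance
    an, bn = a / np.linalg.norm(a), b / np.linalg.norm(b)     # normalize to unit length
    print(f"{a_str} vs {b_str}: d = {raw:.4f}, normalized d = {np.linalg.norm(an - bn):.4f}")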