
Learning with Prototypes

CS771: Introduction to Machine Learning


Nisheeth
Supervised Learning
[Figure: Labeled training data (images tagged "cat" or "dog") is fed to a supervised learning algorithm, which produces a cat-vs-dog prediction model; a new test image given to this model yields a predicted label (cat/dog).]

Some Types of Supervised Learning Problems
 Consider building an ML module for an e-mail client

 Some tasks that we may want this module to perform


 Predicting whether an email is spam or normal: Binary Classification
 Predicting which of the many folders the email should be sent to: Multi-class Classification
 Predicting all the relevant tags for an email: Tagging or Multi-label Classification
 Predicting the spam-score of an email: Regression
 Predicting which email(s) should be shown at the top: Ranking
 Predicting which emails are work/study-related emails: One-class Classification

 These predictive modeling tasks can be formulated as supervised learning problems

 Today: A very simple supervised learning model for binary/multi-class classification


 This model doesn’t require any fancy maths – just computing means and distances
Some Notation and Conventions
 In ML, inputs are usually represented by vectors, e.g., [0.5, 0.3, 0.6, 0.1, 0.2, 0.5, 0.9, 0.2, 0.1, 0.5]

 A vector consists of an array of scalar values


 Geometrically, a vector is just a point in a vector space, e.g.,
 A length-2 vector, e.g., (0.5, 0.3), is a point in a 2-dim vector space
 A length-3 vector, e.g., (0.5, 0.3, 0.6), is a point in a 3-dim vector space
(Likewise for higher dimensions, even though harder to visualize)

 Unless specified otherwise


 Small letters in bold font will denote vectors, e.g., $\mathbf{a}$, $\mathbf{b}$, $\mathbf{x}$, etc.
 Small letters in normal font will denote scalars, e.g., $a$, $b$, $x$, etc.
 Capital letters in bold font will denote matrices (2-dim arrays), e.g., $\mathbf{A}$, $\mathbf{W}$, $\mathbf{X}$, etc.
Some Notation and Conventions
 A single vector will be assumed to be of the form $\mathbf{x} = [x_1, x_2, \ldots, x_D]$

 Unless specified otherwise, vectors will be assumed to be column vectors


 So we will assume $\mathbf{x}$ to be a column vector of size $D \times 1$
 Assuming each element to be a real-valued scalar, $x_d \in \mathbb{R}$, or $\mathbf{x} \in \mathbb{R}^D$ ($\mathbb{R}$: space of reals)

 If $\mathbf{x}$ is a feature vector representing, say, an image, then


 $D$ denotes the dimensionality of this feature vector (number of features)
 $x_d$ (a scalar) denotes the value of feature $d$ in the image

 For denoting multiple vectors, we will use a subscript with each vector, e.g., $\mathbf{x}_1, \mathbf{x}_2, \ldots$
 N images denoted by N feature vectors $\mathbf{x}_1, \ldots, \mathbf{x}_N$, or compactly as $\{\mathbf{x}_n\}_{n=1}^{N}$
 The vector $\mathbf{x}_n$ denotes the $n$-th image
 $x_{nd}$ (a scalar) denotes the $d$-th feature (of $D$) of the $n$-th image
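A minimal NumPy sketch of these conventions (the feature values and data matrix below are made up, purely to illustrate the indexing):

import numpy as np

# N = 3 images, each represented by a D = 4 dimensional feature vector
# (the numbers are invented, purely for illustration)
X = np.array([[0.5, 0.3, 0.6, 0.1],
              [0.2, 0.5, 0.9, 0.2],
              [0.1, 0.5, 0.4, 0.7]])

N, D = X.shape      # number of inputs, dimensionality
x_2 = X[1]          # the 2nd feature vector (0-indexed in code)
x_23 = X[1, 2]      # scalar: the 3rd feature of the 2nd image
print(N, D, x_2, x_23)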
Some Basic Operations on Vectors
 Addition/subtraction of two vectors gives another vector of the same size

 The mean (average or centroid) of $N$ vectors $\mathbf{x}_1, \ldots, \mathbf{x}_N$

$\boldsymbol{\mu} = \frac{1}{N} \sum_{n=1}^{N} \mathbf{x}_n$   (of the same size as each $\mathbf{x}_n$)
 The inner/dot product of two vectors $\mathbf{a}$ and $\mathbf{b}$

$\langle \mathbf{a}, \mathbf{b} \rangle = \mathbf{a}^\top \mathbf{b} = \sum_{i=1}^{D} a_i b_i$   (a real-valued number denoting how "similar" $\mathbf{a}$ and $\mathbf{b}$ are, assuming both $\mathbf{a}$ and $\mathbf{b}$ have unit Euclidean norm)

 For a vector $\mathbf{a}$, its Euclidean norm is defined via its inner product with itself: $\|\mathbf{a}\|_2 = \sqrt{\langle \mathbf{a}, \mathbf{a} \rangle} = \sqrt{\sum_{i=1}^{D} a_i^2}$
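A quick NumPy sketch of these operations (the vectors are arbitrary example values):

import numpy as np

a = np.array([0.5, 0.3, 0.6])
b = np.array([0.1, 0.2, 0.5])
X = np.stack([a, b])             # a small collection of vectors, one per row

mean_vec = X.mean(axis=0)        # mean/centroid, same size as each vector
inner = a @ b                    # inner/dot product, a real number
norm_a = np.sqrt(a @ a)          # Euclidean norm via inner product with itself
print(mean_vec, inner, norm_a, np.linalg.norm(a))   # last two values match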
Computing Distances
 Euclidean (L2 norm) distance between two vectors $\mathbf{a}$ and $\mathbf{b}$

$d_2(\mathbf{a}, \mathbf{b}) = \|\mathbf{a} - \mathbf{b}\|_2 = \sqrt{\sum_{i=1}^{D} (a_i - b_i)^2} = \sqrt{(\mathbf{a} - \mathbf{b})^\top (\mathbf{a} - \mathbf{b})} = \sqrt{\mathbf{a}^\top \mathbf{a} + \mathbf{b}^\top \mathbf{b} - 2\,\mathbf{a}^\top \mathbf{b}}$
(square root of the inner product of the difference vector with itself; the last expression is in terms of inner products of the individual vectors)
 Weighted Euclidean distance between two vectors $\mathbf{a}$ and $\mathbf{b}$

$d_w(\mathbf{a}, \mathbf{b}) = \sqrt{\sum_{i=1}^{D} w_i (a_i - b_i)^2} = \sqrt{(\mathbf{a} - \mathbf{b})^\top \mathbf{W} (\mathbf{a} - \mathbf{b})}$
($\mathbf{W}$ is a DxD diagonal matrix with the weights $w_i$ on its diagonal. Weights may be known or even learned from data in ML problems)
 Absolute (L1 norm) distance between two vectors $\mathbf{a}$ and $\mathbf{b}$

$d_1(\mathbf{a}, \mathbf{b}) = \|\mathbf{a} - \mathbf{b}\|_1 = \sum_{i=1}^{D} |a_i - b_i|$
(the L1 norm distance is also known as the Manhattan distance or taxicab distance; it's a very natural notion of distance between two points in some vector space)

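A short NumPy sketch of the three distances above (the weight vector is an arbitrary choice, just for illustration):

import numpy as np

a = np.array([0.5, 0.3, 0.6])
b = np.array([0.1, 0.2, 0.5])
w = np.array([1.0, 2.0, 0.5])     # per-feature weights (assumed known here)
W = np.diag(w)                    # D x D diagonal weight matrix

d2 = np.sqrt(np.sum((a - b) ** 2))               # Euclidean (L2) distance
d2_alt = np.sqrt(a @ a + b @ b - 2 * (a @ b))    # same, via inner products
dw = np.sqrt((a - b) @ W @ (a - b))              # weighted Euclidean distance
d1 = np.sum(np.abs(a - b))                       # absolute (L1 / Manhattan) distance
print(d2, d2_alt, dw, d1)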

Our First Supervised Learner

Prelude: A Very Primitive Classifier
(The idea also applies to multi-class classification: use one image per class, and predict the label based on the distances of the test image from all such images)

 Consider a binary classification problem – cat vs dog

 Assume training data with just 2 images – one cat image and one dog image

 Given a new test image (cat/dog), how do we predict its label?

 A simple idea: Predict using its distance from each of the 2 training images

If d(test image, cat image) < d(test image, dog image), predict cat, else predict dog
Wait. Is it ML? It seems to be just a simple "rule". Where is the "learning" part in this?
Even this simple model can be learned, e.g., for the feature extraction/selection part and/or for the distance computation part.
Some possibilities: Use a feature learning/selection algorithm to extract features, and use a Mahalanobis distance where you learn the W matrix (instead of using a predefined W), using "distance metric learning" techniques.
Improving Our Primitive Classifier
 Just one input per class may not sufficiently capture variations in a class

 A natural improvement can be by using more inputs per class


[Figure: several training images per class, each labeled "cat" or "dog"]

 We will consider two approaches to do this


 Learning with Prototypes (LwP)
 Nearest Neighbors (NN)
 Both LwP and NN will use multiple inputs per class but in different ways

Learning to predict categories

[Figure: labeled "dog" and "cat" points in feature space, with an unlabeled test point marked "??" between them]

Learning with Prototypes (LwP)
 Basic idea: Represent each class by a “prototype” vector

 Class Prototype: The “mean” or “average” of inputs from that class

Averages (prototypes) of each of the handwritten digits 1-9

 Predict label of each test input based on its distances from the class prototypes
 Predicted label will be the class whose prototype is closest to the test input

 How we compute distances can have an effect on the accuracy of this model
(may need to try Euclidean, weighted Euclidean, Mahalanobis, or something else)
Pic from: https://www.reddit.com/r/dataisbeautiful/comments/3wgbv9/average_handwritten_digit_oc/
Learning with Prototypes (LwP): An Illustration
 Suppose the task is binary classification (two classes assumed pos and neg)

 Training data: $N$ labelled examples $\{(\mathbf{x}_n, y_n)\}_{n=1}^{N}$, with $y_n \in \{-1, +1\}$


 Assume $N_+$ examples from the positive class and $N_-$ examples from the negative class
 Assume green is positive and red is negative

$\boldsymbol{\mu}_- = \frac{1}{N_-} \sum_{n: y_n = -1} \mathbf{x}_n$        $\boldsymbol{\mu}_+ = \frac{1}{N_+} \sum_{n: y_n = +1} \mathbf{x}_n$
[Figure: training points of the two classes with their prototypes $\boldsymbol{\mu}_-$ and $\boldsymbol{\mu}_+$, and a test example]

For LwP, the prototype vectors ($\boldsymbol{\mu}_-$ and $\boldsymbol{\mu}_+$ here) define the "model"
LwP straightforwardly generalizes to more than 2 classes as well (multi-class classification) – K prototypes for K classes
LwP: The Prediction Rule, Mathematically
 What does the prediction rule for LwP look like mathematically?

 Assume we are using Euclidean distances here

$\|\boldsymbol{\mu}_- - \mathbf{x}\|^2 = \|\boldsymbol{\mu}_-\|^2 + \|\mathbf{x}\|^2 - 2\langle \boldsymbol{\mu}_-, \mathbf{x} \rangle$
$\|\boldsymbol{\mu}_+ - \mathbf{x}\|^2 = \|\boldsymbol{\mu}_+\|^2 + \|\mathbf{x}\|^2 - 2\langle \boldsymbol{\mu}_+, \mathbf{x} \rangle$
[Figure: prototypes $\boldsymbol{\mu}_-$ and $\boldsymbol{\mu}_+$ with a test example $\mathbf{x}$]

 Prediction Rule: Predict label as +1 if $\|\boldsymbol{\mu}_+ - \mathbf{x}\| < \|\boldsymbol{\mu}_- - \mathbf{x}\|$, otherwise -1

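A minimal NumPy sketch of LwP for binary classification with Euclidean distance (the toy data and labels are invented for illustration):

import numpy as np

# Toy training data: N x D feature matrix and labels in {-1, +1} (made up)
X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]])
y = np.array([-1, -1, +1, +1])

# "Training" = computing the two class prototypes (means)
mu_pos = X[y == +1].mean(axis=0)
mu_neg = X[y == -1].mean(axis=0)

def predict(x):
    """Predict +1 if the test point is closer to the positive prototype."""
    return +1 if np.linalg.norm(mu_pos - x) < np.linalg.norm(mu_neg - x) else -1

print(predict(np.array([5.5, 8.5])), predict(np.array([1.2, 1.9])))   # +1, -1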
LwP: The Prediction Rule, Mathematically
 Let’s expand the prediction rule expression a bit more

 Thus LwP with Euclidean distance is equivalent to a linear model with


 Weight vector $\mathbf{w} = 2(\boldsymbol{\mu}_+ - \boldsymbol{\mu}_-)$
 Bias term $b = \|\boldsymbol{\mu}_-\|^2 - \|\boldsymbol{\mu}_+\|^2$
(Will look at linear models more formally and in more detail later)

 Prediction rule therefore is: Predict +1 if $\mathbf{w}^\top \mathbf{x} + b > 0$, else predict -1


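Continuing the earlier sketch, one can check numerically that the distance rule and the equivalent linear-model form agree (the prototype values below are assumed, for illustration only):

import numpy as np

mu_pos = np.array([5.5, 8.5])     # assumed class prototypes (illustrative values)
mu_neg = np.array([1.25, 1.9])

w = 2 * (mu_pos - mu_neg)                       # weight vector
b = mu_neg @ mu_neg - mu_pos @ mu_pos           # bias term

x = np.array([3.0, 4.0])                        # a test input
dist_rule = np.linalg.norm(mu_pos - x) < np.linalg.norm(mu_neg - x)
linear_rule = (w @ x + b) > 0
print(dist_rule, linear_rule)                   # the two rules give the same answer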
LwP: Some Failure Cases
 Here is a case where LwP with Euclidean distance may not work well

(Can use feature scaling or use Mahalanobis distance to handle such cases; will discuss this in the next lecture)
[Figure: two elongated, non-spherical classes with prototypes $\boldsymbol{\mu}_-$ and $\boldsymbol{\mu}_+$, and a test example that Euclidean LwP assigns to the wrong class]

 In general, if classes are not equisized and spherical, LwP with Euclidean
distance will usually not work well (but improvements possible; will discuss
later)
LwP: Some Key Aspects
 Very simple, interpretable, and lightweight model
 Just requires computing and storing the class prototype vectors

 Works with any number of classes (thus for multi-class classification as well)

 Can be generalized in various ways to improve it further, e.g.,


 Modeling each class by a probability distribution rather than just a prototype vector
 Using distances other than the standard Euclidean distance (e.g., Mahalanobis)

 With a learned distance function, can work very well even with very few
examples from each class (used in some “few-shot learning” models nowadays
– if interested, please refer to “Prototypical Networks for Few-shot Learning”)
Learning with Prototypes (LwP)
$\boldsymbol{\mu}_- = \frac{1}{N_-} \sum_{n: y_n = -1} \mathbf{x}_n$        $\boldsymbol{\mu}_+ = \frac{1}{N_+} \sum_{n: y_n = +1} \mathbf{x}_n$
[Figure: the two class prototypes with the vector $\mathbf{w}$ pointing from $\boldsymbol{\mu}_-$ to $\boldsymbol{\mu}_+$, and the decision boundary as the perpendicular bisector of the line joining them]

 Prediction rule for LwP (for binary classification with Euclidean distance): predict +1 if $\mathbf{w}^\top \mathbf{x} + b > 0$, otherwise -1, where $\mathbf{w} = \boldsymbol{\mu}_+ - \boldsymbol{\mu}_-$ if Euclidean distance is used

 Decision boundary: the perpendicular bisector of the line joining the class prototype vectors

For LwP, the prototype vectors (or their difference) define the "model". $\boldsymbol{\mu}_+$ and $\boldsymbol{\mu}_-$ (or just $\mathbf{w}$ in the Euclidean distance case) are the model parameters.
Exercise: Show that for the binary classification case, the score can be written as $f(\mathbf{x}) = \sum_{n=1}^{N} \alpha_n \langle \mathbf{x}_n, \mathbf{x} \rangle + b$

So the "score" of a test point is a weighted sum of its similarities with each of the N training inputs. Many supervised learning models have $f(\mathbf{x})$ in this form, as we will see when we discuss kernel methods later.

Note: Even though $f(\mathbf{x})$ can be expressed in this form, if N > D, this may be more expensive to compute (O(N) time) compared to $\mathbf{w}^\top \mathbf{x} + b$ (O(D) time). However, the form is still very useful, as we will see later.

Can throw away the training data after computing the prototypes and just keep the model parameters for test time in such "parametric" models.
Improving LwP when classes are complex-shaped
 Using weighted Euclidean or Mahalanobis distance can sometimes help
$d_w(\mathbf{a}, \mathbf{b}) = \sqrt{\sum_{i=1}^{D} w_i (a_i - b_i)^2}$
(Use a smaller $w_i$ for the horizontal-axis feature in this example)
[Figure: two elongated classes with prototypes $\boldsymbol{\mu}_+$ and $\boldsymbol{\mu}_-$, where down-weighting the horizontal feature makes the distances more sensible]

 Note: Mahalanobis distance also has the effect of rotating the axes, which helps

$d_w(\mathbf{a}, \mathbf{b}) = \sqrt{(\mathbf{a} - \mathbf{b})^\top \mathbf{W} (\mathbf{a} - \mathbf{b})}$
($\mathbf{W}$ will be a 2x2 symmetric matrix in this case, chosen by us or learned. A good $\mathbf{W}$ will help bring points from the same class closer and move different classes apart)
[Figure: the two classes and their prototypes $\boldsymbol{\mu}_-$, $\boldsymbol{\mu}_+$ before and after the transformation induced by $\mathbf{W}$]
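A small NumPy sketch of the weighted Euclidean and Mahalanobis-style distances used above (the weights and the W matrix are assumed example values, not learned):

import numpy as np

def weighted_euclidean(a, b, w):
    """Weighted Euclidean distance with per-feature weights w (a 1-D array)."""
    return np.sqrt(np.sum(w * (a - b) ** 2))

def mahalanobis_like(a, b, W):
    """Distance sqrt((a-b)^T W (a-b)) for a symmetric PSD matrix W."""
    d = a - b
    return np.sqrt(d @ W @ d)

a = np.array([2.0, 1.0])
b = np.array([0.0, 2.0])
w = np.array([0.1, 1.0])          # smaller weight on the first (horizontal) feature
W = np.diag(w)                    # weighted Euclidean is the diagonal special case
print(weighted_euclidean(a, b, w), mahalanobis_like(a, b, W))   # identical here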
Improving LwP when classes are complex-shaped
 Even with weighted Euclidean or Mahalanobis dist, LwP still a linear classifier

 Exercise: Prove the above fact. You may use the following hint
 Mahalanobis dist can be written as $d_W(\mathbf{a}, \mathbf{b}) = \sqrt{(\mathbf{a} - \mathbf{b})^\top \mathbf{W} (\mathbf{a} - \mathbf{b})}$
 $\mathbf{W}$ is a symmetric matrix and thus can be written as $\mathbf{W} = \mathbf{L}^\top \mathbf{L}$ for some matrix $\mathbf{L}$
 Showing it for Mahalanobis is enough. Weighted Euclidean is a special case with diagonal $\mathbf{W}$

 LwP can be extended to learn nonlinear decision boundaries if we use nonlinear distances/similarities (more on this when we talk about kernels)

Note: Modeling each class not just by a mean but by a probability distribution can also help in learning nonlinear decision boundaries. More on this when we discuss probabilistic models for classification.
LwP as a subroutine in other ML models
 For data-clustering (unsupervised learning), K-means clustering is a popular algo

 K-means also computes means/centres/prototypes of groups of unlabeled points


 Harder than LwP since labels are unknown. But we can do the following
 Guess the label of each point, compute means using the guessed labels (will see K-means in detail later; a small sketch follows below)
 Refine labels using these means (assign each point to the current closest mean)
 Repeat until means don’t change anymore
 Many other models also use LwP as a subroutine
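A compact sketch of this loop in NumPy, with prototype computation and closest-prototype assignment as the inner steps (the toy 2-cluster data is made up; real K-means also needs care with empty clusters):

import numpy as np

rng = np.random.default_rng(0)
# Toy unlabeled data: two well-separated blobs in 2-D (made up for illustration)
X = np.vstack([rng.normal(0.0, 0.5, (20, 2)), rng.normal(3.0, 0.5, (20, 2))])

K = 2
labels = rng.integers(0, K, size=len(X))           # guess the label of each point
for _ in range(10):                                # repeat until means don't change
    mus = np.stack([X[labels == k].mean(axis=0) for k in range(K)])   # LwP-style prototypes
    dists = np.linalg.norm(X[:, None, :] - mus[None, :, :], axis=2)   # distance to each prototype
    new_labels = dists.argmin(axis=1)              # assign each point to the closest mean
    if np.array_equal(new_labels, labels):
        break
    labels = new_labels
print(mus)                                         # the K cluster centres (prototypes)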
Next Lecture
 Nearest Neighbors
