
Linear Models for Classification

Sumeet Agarwal, EEL709

(Most figures from Bishop, PRML)


Approaches to classification
Discriminant function: directly assigns each data point x to a particular class Ci
Model the conditional class distribution p(Ci|x): allows separation of inference and decision
Generative approach: model the class likelihoods, p(x|Ci), and priors, p(Ci); use Bayes' theorem to get the posteriors:
p(Ci|x) ∝ p(x|Ci)p(Ci)
Linear discriminant functions
y(x) = wTx + w0
Multiple Classes
Building a K-class classifier by combining multiple two-class discriminants (one-versus-the-rest or one-versus-one) leads to ambiguous regions of input space.
Multiple Classes
Consider instead a single K-class discriminant, comprising K linear functions:
yk(x) = wkTx + wk0
and assign x to class Ck if yk(x) > yj(x) for all j ≠ k.
This implies singly connected and convex decision regions:
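As an illustration (not part of the original slides), a minimal NumPy sketch of this decision rule, assuming a weight matrix W whose k-th column holds (wk0, wk), with a constant feature prepended to x for the bias:

    import numpy as np

    def discriminant_scores(X, W):
        # X: (N, D) data matrix; W: (D + 1, K) weights, with row 0 holding the biases w_k0.
        X_aug = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend a constant feature for the bias
        return X_aug @ W                                   # y_k(x) = w_k^T x + w_k0, one column per class

    def classify(X, W):
        # Assign each x to the class C_k with the largest discriminant y_k(x).
        return np.argmax(discriminant_scores(X, W), axis=1)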
Least squares for classification
Too sensitive to outliers:
Least squares for classification
Also problematic because the binary (or 1-of-K) target values are evidently non-Gaussian, whereas least squares implicitly assumes Gaussian-distributed targets:
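For reference, a hedged sketch of the least-squares approach being criticised here (function and variable names are mine): fit the weights to 1-of-K target vectors by ordinary least squares and classify by the largest output.

    import numpy as np

    def fit_least_squares_classifier(X, t, K):
        # Encode the integer labels t as 1-of-K target rows, then solve the
        # least-squares problem min_W ||X_aug W - T||^2.
        N = X.shape[0]
        T = np.zeros((N, K))
        T[np.arange(N), t] = 1.0
        X_aug = np.hstack([np.ones((N, 1)), X])   # constant feature for the biases
        W, *_ = np.linalg.lstsq(X_aug, T, rcond=None)
        return W

Outliers pull the fitted hyperplanes towards themselves, which is why the decision boundaries in the slide's figures degrade.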
Fisher's linear discriminant
A linear classification model is like a 1-D projection of the data: y = wTx.
Thus we need to find a decision threshold along this 1-D projection (line). The simplest measure of separation is the difference between the projected class means: m2 − m1 = wT(m2 − m1). If the classes have non-diagonal covariances, a better idea is to use the Fisher criterion:

J(w) = (m2 − m1)2 / (s12 + s22)

where sk2 denotes the variance of class k in the 1-D projection.

Maximising J(w) attempts to give a large separation between the projected class means, while also keeping a small variance within each class.
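A minimal sketch of maximising J(w) for two classes, using the standard closed-form solution w ∝ SW^-1(m2 − m1), with SW the within-class scatter matrix (this formula is from Bishop, not spelled out on the slide; names are mine):

    import numpy as np

    def fisher_direction(X1, X2):
        # X1, X2: (N1, D) and (N2, D) samples from the two classes.
        m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
        # Within-class scatter S_W: sum of the per-class scatter matrices.
        S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
        w = np.linalg.solve(S_W, m2 - m1)   # direction maximising J(w)
        return w / np.linalg.norm(w)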
Fisher's linear discriminant

Figures: projection onto the line joining the class means (left) vs. the Fisher discriminant direction (right)


The Perceptron

Figure: perceptron diagram. Basis functions φ1(x), ..., φ4(x) are weighted by w1, ..., w4 and summed; the activation function f(·) is a step function, equal to +1 when wTφ(x) ≥ 0 and −1 otherwise, giving the output f(wTφ(x)).

A non-linear transformation in the form of a step function


is applied to the weighted sum of the input features. This
is inspired by the way neurons appear to function,
mimicking the action potential.
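A small sketch of that forward pass, assuming the simple basis φ(x) = (1, x) and a step activation returning ±1 (names are illustrative):

    import numpy as np

    def perceptron_predict(X, w):
        # phi(x) = (1, x): prepend a constant feature, then apply the step
        # activation f(a) = +1 if a >= 0 else -1 to the weighted sum w^T phi(x).
        Phi = np.hstack([np.ones((X.shape[0], 1)), X])
        return np.where(Phi @ w >= 0, 1, -1)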
The perceptron criterion
We'd like a weight vector w such that wTφ(xi) > 0 for xi ∈ C1 (say, ti = 1) and wTφ(xi) < 0 for xi ∈ C2 (ti = −1).
Thus, we want wTφ(xi)ti > 0 for all i; the data points for which this does not hold are misclassified.
The perceptron criterion tries to minimise the 'magnitude' of misclassification, i.e., it tries to minimise −wTφ(xi)ti over the misclassified points (the set of which is denoted by M):

EP(w) = −Σi∈M wTφ(xi)ti

Why not just count the number of misclassified points? Because that count is a piecewise constant function of w, so its gradient is zero almost everywhere, making optimisation hard.
Learning by gradient descent
w(τ+1) = w(τ) − η∇EP(w)
       = w(τ) + ηφ(xi)ti
(if xi is misclassified)

We can show that after this update, the error due to xi is reduced:

−w(τ+1)Tφ(xi)ti = −w(τ)Tφ(xi)ti − (φ(xi)ti)Tφ(xi)ti
                < −w(τ)Tφ(xi)ti

(having set η = 1, which can be done without loss of generality)
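Putting the criterion and the update rule together, a hedged sketch of the stochastic perceptron algorithm with η = 1, again taking φ(x) = (1, x) and labels ti ∈ {−1, +1} (the epoch limit is an arbitrary choice of mine):

    import numpy as np

    def train_perceptron(X, t, n_epochs=100):
        # X: (N, D) inputs; t: (N,) labels in {-1, +1}.
        Phi = np.hstack([np.ones((X.shape[0], 1)), X])
        w = np.zeros(Phi.shape[1])
        for _ in range(n_epochs):
            updated = False
            for phi_i, t_i in zip(Phi, t):
                if (w @ phi_i) * t_i <= 0:     # misclassified point
                    w = w + phi_i * t_i        # w <- w + eta * phi(x_i) * t_i, with eta = 1
                    updated = True
            if not updated:                    # all points satisfy w^T phi(x_i) t_i > 0
                break
        return w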
Perceptron convergence

Perceptron
convergence
theorem
guarantees
exact solution in
finite steps for
linearly
separable data;
but no
convergence for
nonseparable
data
Gaussian Discriminant Analysis
Generative approach, with class-conditional densities
(likelihoods) modelled as Gaussians

For the case of two classes, we have:

p(C1|x) = p(x|C1)p(C1) / [p(x|C1)p(C1) + p(x|C2)p(C2)] = σ(a),

where a = ln[ p(x|C1)p(C1) / (p(x|C2)p(C2)) ] and σ(a) = 1/(1 + exp(−a)) is the logistic sigmoid.
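A hedged NumPy/SciPy sketch of this generative computation for two Gaussian class-conditional densities (parameter names are mine; the slide only shows the equation and a figure):

    import numpy as np
    from scipy.stats import multivariate_normal

    def gda_posterior(x, mu1, Sigma1, mu2, Sigma2, prior1=0.5):
        # p(C1|x) via Bayes' theorem from Gaussian class-conditionals and class priors.
        p1 = multivariate_normal.pdf(x, mean=mu1, cov=Sigma1) * prior1
        p2 = multivariate_normal.pdf(x, mean=mu2, cov=Sigma2) * (1 - prior1)
        return p1 / (p1 + p2)   # equivalently sigma(a) with a = ln(p1 / p2)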
Gaussian Discriminant Analysis
In the Gaussian case (class-conditionals N(x|μk, Σ) with a shared covariance Σ), we get

p(C1|x) = σ(wTx + w0), where w = Σ^-1(μ1 − μ2) and
w0 = −(1/2) μ1TΣ^-1μ1 + (1/2) μ2TΣ^-1μ2 + ln[ p(C1)/p(C2) ].

The assumption of equal covariance matrices thus leads to linear decision boundaries.
Gaussian Discriminant Analysis

Allowing for unequal covariance matrices for different classes leads to quadratic decision boundaries.
Parameter estimation for GDA
Likelihood (assuming equal covariance matrices), with tn = 1 for class C1, tn = 0 for C2, and prior p(C1) = π:

p(t, X | π, μ1, μ2, Σ) = Πn [π N(xn|μ1, Σ)]^tn [(1 − π) N(xn|μ2, Σ)]^(1 − tn)

Maximum Likelihood Estimators:

π = N1/N,  μ1 = (1/N1) Σn tn xn,  μ2 = (1/N2) Σn (1 − tn) xn,
Σ = (N1/N) S1 + (N2/N) S2, where Sk is the covariance of the data in class Ck.
Logistic Regression
An example of a probabilistic discriminative model.
Rather than learning p(x|Ci) and p(Ci), it attempts to directly learn p(Ci|x).
Advantages: fewer parameters, and better performance if the assumptions made in the class-conditional density formulation are inaccurate.
We have seen how the class posterior for a two-class setting can be written as a logistic sigmoid acting on a linear function of the feature vector φ:

p(C1|φ) = y(φ) = σ(wTφ)

This model is called logistic regression, even though it is a model for classification, not regression!
Parameter learning
If we let

yn = p(C1|φn) = σ(wTφn)

then the likelihood function is

p(t|w) = Πn yn^tn (1 − yn)^(1 − tn)

and we can define a corresponding error, the negative log-likelihood, known as cross-entropy:

E(w) = −ln p(t|w) = −Σn [ tn ln yn + (1 − tn) ln(1 − yn) ]
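A minimal sketch of evaluating this error (the clipping is my addition, purely for numerical safety):

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def cross_entropy(w, Phi, t, eps=1e-12):
        # Phi: (N, M) design matrix of basis-function values; t: (N,) targets in {0, 1}.
        y = np.clip(sigmoid(Phi @ w), eps, 1 - eps)
        return -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))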
Parameter learning
The derivative of the sigmoid function is given by:

dσ/da = σ(a)(1 − σ(a))

Using this, we can obtain the gradient of the error function with respect to w:

∇E(w) = Σn (yn − tn)φn

Thus the contribution to the gradient from point n is given by the 'error' between the model prediction and the actual class label, (yn − tn), times the basis function vector for that point, φn.
We could use this for sequential learning by gradient descent, exactly as for least-squares linear regression.
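A hedged sketch of that sequential (stochastic) gradient-descent loop; the learning rate eta and the epoch count are illustrative choices of mine, not from the slides:

    import numpy as np

    def train_logistic_regression(Phi, t, eta=0.1, n_epochs=100):
        # Sequential gradient descent on the cross-entropy error:
        # w <- w - eta * (y_n - t_n) * phi_n, taking each point n in turn.
        w = np.zeros(Phi.shape[1])
        for _ in range(n_epochs):
            for phi_n, t_n in zip(Phi, t):
                y_n = 1.0 / (1.0 + np.exp(-(w @ phi_n)))
                w = w - eta * (y_n - t_n) * phi_n
        return w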
Nonlinear basis functions: applying fixed nonlinear basis functions φ(x) can make classes that are not linearly separable in the original input space linearly separable in the feature space.
