CSC446 : Pattern Recognition
Prof. Dr. Mostafa G. M. Mostafa
Faculty of Computer & Information Sciences
Computer Science Department
AIN SHAMS UNIVERSITY
Parameter Estimation :
Ch3 : Maximum-Likelihood
& Problem of Dimensionality
(Study DHS-Chapter 3 – Sec. 3.1, 3.2, 3.3, 3.7, 3.8)
• Introduction
• Maximum-Likelihood Estimation
– The General Principle
– The Gaussian Case: unknown μ and Σ
– Bias
• Problem of Dimensionality
• Component Analysis and Discrimination
ML & Bayesian Estimation
• Introduction
– Data availability in a Bayesian framework
• We could design an optimal classifier if we knew:
– P(ωi) (priors)
– p(x | ωi) (class-conditional densities)
Unfortunately, we rarely have this complete
information!
– Design a classifier from a training sample
• No problem with prior estimation
• Samples are often too small for class-conditional
estimation (large dimension of feature space!)
ML & Bayesian Estimation
– A priori information about the problem
– Normality of P(x | ωi)
• Characterized by 2 parameters:
P(x | ωi) ~ N(μi, Σi)
• Estimation techniques
– Maximum-Likelihood (ML) and Bayesian
estimation
– Results are nearly identical, but the approaches
are different
ML & Bayesian Estimation
• Maximum-Likelihood vs. Bayesian
– Parameters in ML estimation are fixed but
unknown!
– Best parameters are obtained by maximizing the
probability of obtaining the samples observed
– Bayesian methods view the parameters as
random variables having some known
distribution
– In either approach, we use P(ωi | x) for our
classification rule!
ML & Bayesian Estimation
• Advantages:
– Has good convergence properties as the sample
size increases
– Simpler than alternative methods.
• General principle
– Assume we have c classes and
P(x | ωj) ≡ P(x | ωj, θj) ~ N(μj, Σj),
where θj = (μj, Σj)
Maximum-Likelihood Estimation
• Our goal is to use the information provided by the training
samples D to estimate θ = (θ1, θ2, …, θc), where θi (i = 1, 2,
…, c) denotes the parameters associated with each category.
• Suppose that D contains n samples, x1, x2, …, xn, that are i.i.d.
• The ML estimate of θ is, by definition, the value that maximizes
P(D | θ):
“It is the value of θ that best agrees with the actually
observed training sample.”
$$P(D \mid \theta) = \prod_{k=1}^{n} P(x_k \mid \theta)$$
Maximum-Likelihood Estimation
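To see the principle concretely, P(D | θ) can be evaluated on a grid of candidate values of θ. A minimal sketch, assuming a Gaussian with known σ = 1 and unknown mean θ (the data reuse the C1 sample from Assignment 3 below):

```python
import numpy as np

D = np.array([10.0, 8.0, 7.0, 11.0, 9.0, 12.0, 6.0])  # assumed i.i.d. sample

def likelihood(theta, sigma=1.0):
    # P(D | theta) = product over k of p(x_k | theta) for i.i.d. samples
    p = np.exp(-(D - theta) ** 2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
    return p.prod()

thetas = np.linspace(5.0, 15.0, 1001)
L = np.array([likelihood(t) for t in thetas])
print("theta_hat ~", thetas[L.argmax()], "; sample mean =", D.mean())
```

The grid maximum coincides with the sample mean — the value that “best agrees with the actually observed training sample.”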
• Optimal Estimation
– Let θ = (θ1, θ2, …, θp)ᵗ and let ∇θ be the gradient
operator
– We define l(θ) as the log-likelihood function:
l(θ) = ln P(D | θ)
– New problem statement:
“determine the θ that maximizes the log-likelihood”
$$\nabla_\theta = \left[\frac{\partial}{\partial\theta_1}, \frac{\partial}{\partial\theta_2}, \ldots, \frac{\partial}{\partial\theta_p}\right]^t \qquad\qquad \hat{\theta} = \arg\max_{\theta}\, l(\theta)$$
The MLE Method
Set of necessary conditions for an optimum is:
$$\nabla_\theta\, l(\theta) = \sum_{k=1}^{n} \nabla_\theta \ln P(x_k \mid \theta) = 0$$

or, at the estimate θ̂:

$$\sum_{k=1}^{n} \nabla_\theta \ln P(x_k \mid \hat{\theta}) = 0$$

The MLE Method
• ML: Univariate Gaussian Case, unknown μ and σ:
– That is, p(xk | θ) ~ N(μ, σ²), where (θ1, θ2) = (μ, σ²)
– The log-likelihood for a single point xk is:
$$l(\theta) = \ln p(x_k \mid \theta) = -\frac{1}{2}\ln 2\pi\theta_2 - \frac{1}{2\theta_2}(x_k - \theta_1)^2$$

$$\nabla_\theta\, l = \begin{bmatrix} \dfrac{\partial l}{\partial\theta_1} \\[2ex] \dfrac{\partial l}{\partial\theta_2} \end{bmatrix} = \begin{bmatrix} \dfrac{1}{\theta_2}(x_k - \theta_1) \\[2ex] -\dfrac{1}{2\theta_2} + \dfrac{(x_k - \theta_1)^2}{2\theta_2^{\,2}} \end{bmatrix} = 0$$
MLE: Example (1)
Summation over the n points gives:

$$\sum_{k=1}^{n} \frac{1}{\hat{\theta}_2}\,(x_k - \hat{\theta}_1) = 0 \qquad (1)$$

$$-\sum_{k=1}^{n} \frac{1}{\hat{\theta}_2} + \sum_{k=1}^{n} \frac{(x_k - \hat{\theta}_1)^2}{\hat{\theta}_2^{\,2}} = 0 \qquad (2)$$

From eq. (1) and eq. (2), we obtain:

$$\hat{\mu} = \frac{1}{n}\sum_{k=1}^{n} x_k \qquad\qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{k=1}^{n}(x_k - \hat{\mu})^2$$

MLE: Example (1)
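A quick numeric check of these estimates (an illustrative sketch; the data reuse the C1 sample from Assignment 3 below):

```python
import numpy as np

x = np.array([10.0, 8.0, 7.0, 11.0, 9.0, 12.0, 6.0])  # assumed sample data
n = len(x)

mu_hat = x.sum() / n                     # mu_hat = (1/n) * sum of x_k
var_hat = ((x - mu_hat) ** 2).sum() / n  # sigma^2_hat = (1/n) * sum of (x_k - mu_hat)^2

# Agrees with numpy's built-ins (np.var uses the biased 1/n convention by default).
print(mu_hat, var_hat, np.mean(x), np.var(x))
```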
MLE: Example (2)
• The Multivariate Gaussian case:
– That is, p(xk | θ) ~ N(μ, Σ), (unknown μ and Σ)
(Samples are drawn from a multivariate normal
population). Therefore:
  )()(
2
1
)2(ln
2
1
)|(ln 1
  
k
t
k
d
k xxxp


n
k
t
kk
n 1
)μˆx)(μˆx(
1ˆ
Similar analysis yields  and  as:
ASU-CSC446 : Pattern Recognition. Prof. Dr. Mostafa Gadal-Haqq slide - 13




nk
k
kx
n 1
1
ˆ
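A minimal multivariate sketch of these two estimates (illustrative only; the toy data, dimension, and true parameters are assumed):

```python
import numpy as np

rng = np.random.default_rng(0)           # assumed toy data in d = 2 dimensions
X = rng.multivariate_normal([1.0, -1.0], [[2.0, 0.5], [0.5, 1.0]], size=500)
n = X.shape[0]

mu_hat = X.mean(axis=0)                  # (1/n) * sum of x_k
diff = X - mu_hat
Sigma_hat = diff.T @ diff / n            # (1/n) * sum of (x_k - mu_hat)(x_k - mu_hat)^t

print(mu_hat)
print(Sigma_hat)                         # both close to the true mu and Sigma
```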
Problem: Find θ = (θ1, θ2, …, θc) that best fits the
parametric function P(D | θ) to the sample data.
Solution: MLE assumes that the data are i.i.d., that is:

$$p(D \mid \theta) = p(x_1, \ldots, x_n \mid \theta) = \prod_{k=1}^{n} p(x_k \mid \theta)$$

Then find the value of θ that maximizes P(D | θ):

$$\max_\theta\, \ln p(D \mid \theta) = \max_\theta\, \ln \prod_{k=1}^{n} p(x_k \mid \theta) = \max_\theta\, \sum_{k=1}^{n} \ln p(x_k \mid \theta)$$

Finally, get θ̂ from:

$$\sum_{k=1}^{n} \nabla_\theta \ln P(x_k \mid \hat{\theta}) = 0$$
MLE: Summary
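When no closed form exists, the same maximization can be carried out numerically. A minimal sketch (assuming SciPy is available; the Gaussian is used here only so that the answer can be checked against the closed-form estimates):

```python
import numpy as np
from scipy.optimize import minimize

x = np.array([10.0, 8.0, 7.0, 11.0, 9.0, 12.0, 6.0])  # assumed sample data

def neg_log_likelihood(theta):
    mu, log_var = theta                  # optimize log-variance so variance stays positive
    var = np.exp(log_var)
    return 0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

res = minimize(neg_log_likelihood, x0=np.array([0.0, 0.0]))
print(res.x[0], np.exp(res.x[1]))        # ~ (9.0, 4.0): the closed-form mu_hat, sigma^2_hat
```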
Given the following samples of the feature values of
two different classes:
C1 = {10, 8, 7, 11, 9, 12, 6}
C2 = {15, 16, 11, 14, 18, 9, 22, 8}
Use Bayesian decision theory to find the class from
which the feature value x = 12 is drawn, given
the priors P(C1) = 0.6 and P(C2) = 0.4.
Assignment 3
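A minimal sketch of the intended workflow (not a worked solution): estimate (μ, σ²) for each class by ML, then compare the unnormalized posteriors p(x | Ci) P(Ci) at x = 12.

```python
import numpy as np

C1 = np.array([10, 8, 7, 11, 9, 12, 6], dtype=float)
C2 = np.array([15, 16, 11, 14, 18, 9, 22, 8], dtype=float)
priors = {"C1": 0.6, "C2": 0.4}
x = 12.0

def gaussian_pdf(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

for name, data in (("C1", C1), ("C2", C2)):
    mu, var = data.mean(), data.var()    # ML estimates (1/n variance)
    score = priors[name] * gaussian_pdf(x, mu, var)   # unnormalized posterior
    print(name, score)
# Decide for the class with the larger score.
```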
Problem of Dimensionality
• Consider problems involving 50 or 100
features. Two issues arise:
– How does classification accuracy depend upon
the dimensionality (and the amount of training
data)?
– What is the computational complexity of
designing a classifier?
Problems of Dimensionality
1- Accuracy, Dimension, and Training Sample Size:
• If the features are statistically independent,
theoretically we could have excellent performance.
• For example: Two-class multivariate normal with
the same covariance
– if the priors are equal, we can show that:
$$P(\text{error}) = \frac{1}{\sqrt{2\pi}} \int_{r/2}^{\infty} e^{-u^2/2}\, du$$

$$\text{where: } r^2 = (\mu_1 - \mu_2)^t\,\Sigma^{-1}(\mu_1 - \mu_2), \qquad \lim_{r \to \infty} P(\text{error}) = 0$$
Problems of Dimensionality
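A minimal numeric sketch of this formula (assuming SciPy; norm.sf is the upper Gaussian tail):

```python
import numpy as np
from scipy.stats import norm

def bayes_error(mu1, mu2, Sigma):
    diff = np.asarray(mu1) - np.asarray(mu2)
    r2 = diff @ np.linalg.solve(Sigma, diff)  # r^2 = (mu1 - mu2)^t Sigma^-1 (mu1 - mu2)
    return norm.sf(np.sqrt(r2) / 2)           # tail integral from r/2 to infinity

print(bayes_error([0, 0], [2, 0], np.eye(2)))  # r = 2 -> P(error) ~ 0.159
```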
1- Accuracy, Dimension, and Training Sample Size:
• If features are independent then:
• Most useful features are the ones for which the
difference between the means is large relative to the
standard deviation
• It has frequently been observed in practice that, beyond
a certain point, the inclusion of additional features leads
to worse rather than better performance
$$\Sigma = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_d^2) \qquad\qquad r^2 = \sum_{i=1}^{d}\left(\frac{\mu_{i1} - \mu_{i2}}{\sigma_i}\right)^2$$
Problems of Dimensionality
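Because r² is a sum of nonnegative per-feature terms, adding any feature whose means differ increases r, and hence decreases the error bound — at least in theory. A minimal sketch with assumed toy means and standard deviations:

```python
import numpy as np

mu1 = np.array([3.0, 1.0, 0.5, 0.1])   # assumed class-1 feature means
mu2 = np.zeros(4)                      # assumed class-2 feature means
sigma = np.ones(4)                     # assumed per-feature standard deviations

terms = ((mu1 - mu2) / sigma) ** 2     # each feature's contribution to r^2
print(np.cumsum(terms))                # r^2 never decreases as features are added
```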
2- Computational Complexity
• In describing the computational complexity of an
algorithm, we are generally interested in the number
of basic mathematical operations it requires, such as
additions, multiplications, and divisions, or in the
time and memory needed on a computer.
Problems of Dimensionality
• For Gaussian densities in d dimensions, a classifier
learned from n training samples for each of c classes costs:
Total = O(d²·n)
• We have to compute the discriminant
function for each of the c categories,
• so the total for c classes = O(c·d²·n) ≈ O(d²·n)
• The cost increases quickly when d and n are large!
Computational Complexity: The MLE
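A minimal sketch of where the O(d²·n) term comes from (toy sizes assumed): estimating the covariance alone takes on the order of n·d² multiply-adds, which is what makes large d and n expensive.

```python
import time
import numpy as np

n = 10000
for d in (10, 100, 1000):
    X = np.random.randn(n, d)          # assumed toy training data
    t0 = time.perf_counter()
    mu = X.mean(axis=0)
    Sigma = (X - mu).T @ (X - mu) / n  # ~ n * d^2 multiply-adds
    dt = time.perf_counter() - t0
    print(f"d={d:5d}  ~{n * d * d:.1e} flops  {dt:.4f} s")
```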
• Goal: Combine features in order to reduce the
dimension of the feature space:
– Linear combinations are simple to compute & tractable.
– Projecting high dimensional data onto a lower
dimensional space.
• Two classical approaches for finding “optimal”
linear transformation:
– PCA (Principal Component Analysis): “the projection that
best represents the data in a least-squares sense”
– MDA (Multiple Discriminant Analysis): “the projection that
best separates the data in a least-squares sense”
Component Analysis and Discriminants
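A minimal PCA sketch (illustrative; it diagonalizes the sample covariance and keeps the top-m eigenvectors — the least-squares-best m-dimensional linear representation):

```python
import numpy as np

def pca_project(X, m):
    mu = X.mean(axis=0)
    cov = np.cov(X - mu, rowvar=False)         # sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    W = eigvecs[:, ::-1][:, :m]                # top-m principal directions
    return (X - mu) @ W                        # m-dimensional representation

X = np.random.default_rng(1).normal(size=(200, 5))  # assumed toy data
print(pca_project(X, 2).shape)                 # (200, 2)
```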
• Fisher Linear Discriminant (Sec. 4.10)
Component Analysis and Discriminants
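A minimal two-class Fisher discriminant sketch (following the standard formulation of DHS Sec. 4.10; the toy data are assumed): w = S_W⁻¹(m1 − m2) maximizes the between-class separation relative to the within-class scatter.

```python
import numpy as np

def fisher_direction(X1, X2):
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = (X1 - m1).T @ (X1 - m1)
    S2 = (X2 - m2).T @ (X2 - m2)
    Sw = S1 + S2                           # within-class scatter matrix
    return np.linalg.solve(Sw, m1 - m2)    # w = Sw^-1 (m1 - m2)

rng = np.random.default_rng(2)
X1 = rng.normal([0.0, 0.0], 1.0, size=(100, 2))  # assumed toy classes
X2 = rng.normal([3.0, 1.0], 1.0, size=(100, 2))
w = fisher_direction(X1, X2)
print((X1 @ w).mean(), (X2 @ w).mean())    # well-separated 1-D projections
```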
• Multiple Discriminant Analysis (Sec. 4.11)
Component Analysis and Discriminants
Assignment 4
• Generate an image that consists of four classes
using the following distributions:
C1 = N( 50, 5) + N(0, 10)
C2 = N(100, 5) + N(0, 20)
C3 = N(150, 5) + N(0, 30)
C4 = N(200, 5) + N(0, 40)
• Then use MLE to find the parameters of the
density functions (µ, σ) for each class in the image,
and use Bayesian decision theory to classify
each pixel in the image into its corresponding class.
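A minimal sketch of one possible pipeline (the image size, the horizontal-band layout, equal priors, and the reading of “N(m, 5) + N(0, s)” as a base level plus added noise are all assumptions, not part of the assignment statement):

```python
import numpy as np

rng = np.random.default_rng(0)
rows, cols = 64, 256
params = [(50, 10), (100, 20), (150, 30), (200, 40)]   # (class level, extra noise)

# One horizontal band per class -> a 256x256 image with known training regions.
bands = [rng.normal(m, 5, (rows, cols)) + rng.normal(0, s, (rows, cols))
         for m, s in params]
image = np.vstack(bands)

# MLE per class, using the known band locations as labeled training data.
mus = np.array([b.mean() for b in bands])
sigmas = np.array([b.std() for b in bands])            # 1/n ML estimates

# Bayesian decision with equal priors: pick the class maximizing ln p(x | Ci).
x = image[..., None]                                   # broadcast pixels over classes
log_lik = -np.log(sigmas) - (x - mus) ** 2 / (2 * sigmas**2)
labels = log_lik.argmax(axis=-1)                       # per-pixel class index (0..3)
print(np.bincount(labels.ravel()))                     # pixels assigned to each class
```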