
Density Estimation

with Gaussian Mixture Models¹


CS 2XX: Mathematics for AI and ML

Chandresh Kumar Maurya

IIT Indore
https://chandreshiit.github.io

November 17, 2024

¹ Slide credit: Yi, Yung
Warm-Up

Please watch this tutorial video by Luis Serrano on Gaussian Mixture Models:

https://www.youtube.com/watch?v=q71Niz856KE



Roadmap

(1) Gaussian Mixture Model


(2) Parameter Learning: MLE
(3) Latent-Variable Perspective for Probabilistic Modeling
(4) EM Algorithm



Density Estimation
• Represent data compactly using a density from a parametric family, e.g., a Gaussian or Beta distribution
• The parameters of those families can be found by MLE or MAP estimation
• However, in many cases a single simple distribution (e.g., one Gaussian) fails to approximate the data well



Mixture Models

• A more expressive family of distributions


• Idea: Let’s mix! A convex combination of K “base” distributions

      p(x) = Σ_{k=1}^{K} πk pk(x),      0 ≤ πk ≤ 1,      Σ_{k=1}^{K} πk = 1

• Multi-modal distributions: Can be used to describe datasets with multiple clusters


• Our focus: Gaussian mixture models
• We want to find the parameters using MLE, but no closed-form solution exists (even for a mixture of Gaussians) → some iterative method is needed



Gaussian Mixture Model

      p(x|θ) = Σ_{k=1}^{K} πk N(x|µk, Σk),      0 ≤ πk ≤ 1,      Σ_{k=1}^{K} πk = 1,

where the parameters are θ := {µk, Σk, πk : k = 1, . . . , K}
• Example. p(x|θ) = 0.5N(x|−2, 1/2) + 0.2N(x|1, 2) + 0.3N(x|4, 1)
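
As an illustration, here is a minimal Python sketch (assuming the second argument of N(x|·, ·) above denotes the variance) that evaluates this example density on a grid and checks that it integrates to roughly 1:

```python
import numpy as np
from scipy.stats import norm

weights = np.array([0.5, 0.2, 0.3])    # mixture weights pi_k
means = np.array([-2.0, 1.0, 4.0])     # component means mu_k
variances = np.array([0.5, 2.0, 1.0])  # component variances (1-D case)

x = np.linspace(-6.0, 8.0, 1000)

# p(x|theta) = sum_k pi_k * N(x | mu_k, Sigma_k)
p = sum(w * norm.pdf(x, loc=m, scale=np.sqrt(v))
        for w, m, v in zip(weights, means, variances))

print(np.trapz(p, x))  # ~1.0, as expected for a density
```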



Roadmap

(1) Gaussian Mixture Model


(2) Parameter Learning: MLE
(3) Latent-Variable Perspective for Probabilistic Modeling
(4) EM Algorithm



Parameter Learning: Maximum Likelihood

• Given an i.i.d. dataset X = {x1, . . . , xN}, the log-likelihood is:


      L(θ) = log p(X|θ) = Σ_{n=1}^{N} log p(xn|θ) = Σ_{n=1}^{N} log Σ_{k=1}^{K} πk N(xn|µk, Σk)

• θML = arg minθ (−L(θ))


• Necessary condition for θML:  dL/dθ |_{θ=θML} = 0
• However, no closed-form solution for θML exists, so we rely on an iterative algorithm (the EM algorithm).
• We present the algorithm first, and then discuss how it is derived.
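
As a numerical aside, here is a minimal sketch of evaluating L(θ) above on a toy 1-D dataset (the data and parameters are illustrative; the log-sum-exp trick is used for numerical stability):

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

X = np.array([-2.3, -1.8, 0.5, 1.2, 3.9, 4.4])  # toy dataset x_1, ..., x_N
pi = np.array([0.5, 0.2, 0.3])                   # mixture weights
mu = np.array([-2.0, 1.0, 4.0])                  # means
var = np.array([0.5, 2.0, 1.0])                  # variances (1-D case)

# log(pi_k * N(x_n | mu_k, Sigma_k)) for every n, k: shape (N, K)
log_terms = np.log(pi) + norm.logpdf(X[:, None], loc=mu, scale=np.sqrt(var))

# L(theta) = sum_n log sum_k pi_k N(x_n | mu_k, Sigma_k)
L = logsumexp(log_terms, axis=1).sum()
print(L)
```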



Responsibilities

• Definition (Responsibilities). Given the n-th data point xn and the parameters (µk, Σk, πk : k = 1, . . . , K),

      rnk = πk N(xn|µk, Σk) / Σ_{j=1}^{K} πj N(xn|µj, Σj)

• How much is each component k responsible for xn, if xn is sampled from the current mixture model?
• rn = (rnk : k = 1, . . . , K) is a probability distribution, so Σ_{k=1}^{K} rnk = 1
• Soft assignment of xn to the K mixture components
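
A minimal sketch of computing the responsibility vector rn for a single data point (2-D components; the parameters and the point xn are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

pis = np.array([0.5, 0.5])                          # mixture weights pi_k
mus = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]  # component means
Sigmas = [np.eye(2), np.eye(2)]                     # component covariances

x_n = np.array([2.0, 2.5])

# Unnormalized responsibilities: pi_k * N(x_n | mu_k, Sigma_k)
unnorm = np.array([pi * multivariate_normal.pdf(x_n, mean=mu, cov=S)
                   for pi, mu, S in zip(pis, mus, Sigmas)])

r_n = unnorm / unnorm.sum()  # normalize so that sum_k r_nk = 1
print(r_n)                   # soft assignment of x_n to the components
```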



EM Algorithm: MLE in Gaussian Mixture Models
EM for MLE in Gaussian Mixture Models
S1. Initialize µk , Σk , πk
S2. E-step: Evaluate responsibilities rnk for every data point xn using the current µk , Σk , πk :
      rnk = πk N(xn|µk, Σk) / Σ_{j=1}^{K} πj N(xn|µj, Σj),      Nk = Σ_{n=1}^{N} rnk

S3. M-step: Reestimate parameters µk , Σk , πk using the current responsibilities rnk :


      µk = (1/Nk) Σ_{n=1}^{N} rnk xn,      Σk = (1/Nk) Σ_{n=1}^{N} rnk (xn − µk)(xn − µk)^T,      πk = Nk / N,

and go to S2.

- The M-step update equations may look mysterious at this point; their derivation is covered later.
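
The following is a minimal NumPy sketch of S1-S3 for a 1-D mixture. The toy data, K = 2, and the fixed number of iterations are illustrative; a practical implementation would also monitor the log-likelihood for convergence and guard against collapsing variances.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Toy 1-D data from two well-separated clusters.
X = np.concatenate([rng.normal(-2.0, 0.7, 150), rng.normal(4.0, 1.0, 100)])
N, K = X.shape[0], 2

# S1. Initialize mu_k, Sigma_k (variances, since the data is 1-D), pi_k.
mu = rng.choice(X, size=K, replace=False)
var = np.full(K, X.var())
pi = np.full(K, 1.0 / K)

for _ in range(50):
    # S2. E-step: responsibilities r_nk (shape N x K) and N_k = sum_n r_nk.
    weighted = np.stack([pi[k] * norm.pdf(X, loc=mu[k], scale=np.sqrt(var[k]))
                         for k in range(K)], axis=1)
    r = weighted / weighted.sum(axis=1, keepdims=True)
    Nk = r.sum(axis=0)

    # S3. M-step: re-estimate the parameters from the responsibilities.
    mu = (r * X[:, None]).sum(axis=0) / Nk
    var = (r * (X[:, None] - mu) ** 2).sum(axis=0) / Nk
    pi = Nk / N

print("means:", mu, "variances:", var, "weights:", pi)
```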



Example: EM Algorithm

(The original slide shows a figure illustrating the EM iterations; it is not reproduced here.)



M-Step: Towards the Zero Gradient
• Given X and the rnk from the E-step, the new values of µk, Σk, πk should satisfy the zero-gradient conditions:

      ∂L/∂µk = 0^T  ⇐⇒  Σ_{n=1}^{N} ∂ log p(xn|θ)/∂µk = 0^T

      ∂L/∂Σk = 0   ⇐⇒  Σ_{n=1}^{N} ∂ log p(xn|θ)/∂Σk = 0

      ∂L/∂πk = 0   ⇐⇒  Σ_{n=1}^{N} ∂ log p(xn|θ)/∂πk = 0
• Nice property: the new values of µk, Σk, πk can all be expressed in terms of the responsibilities [rnk]
• Let’s take a look at them one by one!



M-Step: Update of µk

      µk^new = ( Σ_{n=1}^{N} rnk xn ) / ( Σ_{n=1}^{N} rnk ),      k = 1, . . . , K
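
This is the solution of the zero-gradient condition ∂L/∂µk = 0^T from the previous slide. A sketch of the derivation, using ∂N(xn|µk, Σk)/∂µk = N(xn|µk, Σk)(xn − µk)^T Σk^{-1}:

```latex
\frac{\partial \log p(x_n|\theta)}{\partial \mu_k}
  = \frac{\pi_k \mathcal{N}(x_n|\mu_k,\Sigma_k)}{\sum_{j=1}^{K} \pi_j \mathcal{N}(x_n|\mu_j,\Sigma_j)}
    \,(x_n-\mu_k)^{\top}\Sigma_k^{-1}
  = r_{nk}\,(x_n-\mu_k)^{\top}\Sigma_k^{-1}
```

Setting the sum over n of this gradient to zero and right-multiplying by Σk gives

```latex
\sum_{n=1}^{N} r_{nk}\,(x_n-\mu_k)^{\top} = 0^{\top}
\;\Longrightarrow\;
\mu_k^{\text{new}} = \frac{\sum_{n=1}^{N} r_{nk}\,x_n}{\sum_{n=1}^{N} r_{nk}}
```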



M-Step: Update of Σk

      Σk^new = (1/Nk) Σ_{n=1}^{N} rnk (xn − µk)(xn − µk)^T,      k = 1, . . . , K



M-Step: Update of πk

      πk^new = ( Σ_{n=1}^{N} rnk ) / N = Nk / N,      k = 1, . . . , K



Roadmap

(1) Gaussian Mixture Model


(2) Parameter Learning: MLE
(3) Latent-Variable Perspective for Probabilistic Modeling
(4) EM Algorithm



Latent-Variable Perspective
• Justifies some ad hoc decisions made earlier
• Allows for a concrete interpretation of the responsibilities as posterior distributions
• The iterative algorithm for updating the model parameters can be derived in a principled manner



Generative Process
• Latent variable z: a one-hot random vector z = [z1, . . . , zK]^T consisting of K − 1 zeros and exactly one 1.
• The indicator zk = 1 means that the k-th component was used to generate the data sample x.
• p(x|zk = 1) = N (x|µk , Σk )
• Prior for z, with πk = p(zk = 1):

      p(z) = π = [π1, . . . , πK]^T,      Σ_{k=1}^{K} πk = 1
• Sampling procedure (ancestral sampling; see the sketch below):
  1. Sample which component to use: z^(i) ∼ p(z)
  2. Sample a data point from the chosen Gaussian: x^(i) ∼ p(x|z^(i))
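
A minimal sketch of this two-step sampling procedure for a 1-D mixture, reusing the parameters of the earlier example:

```python
import numpy as np

rng = np.random.default_rng(0)

pi = np.array([0.5, 0.2, 0.3])   # p(z_k = 1)
mu = np.array([-2.0, 1.0, 4.0])  # component means
var = np.array([0.5, 2.0, 1.0])  # component variances (1-D case)

def sample_gmm(n_samples):
    # 1. Sample the component index z ~ p(z).
    z = rng.choice(len(pi), size=n_samples, p=pi)
    # 2. Sample x from the chosen Gaussian p(x|z).
    return rng.normal(loc=mu[z], scale=np.sqrt(var[z]))

samples = sample_gmm(10000)
print(samples.mean())  # close to sum_k pi_k * mu_k = 0.4 for these parameters
```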



Joint Distribution, Likelihood, and Posterior (1)
• Joint distribution

      p(x, z) = [ p(x, z1 = 1), . . . , p(x, zK = 1) ]^T
              = [ p(x|z1 = 1)p(z1 = 1), . . . , p(x|zK = 1)p(zK = 1) ]^T
              = [ π1 N(x|µ1, Σ1), . . . , πK N(x|µK, ΣK) ]^T
• Likelihood for an arbitrary single data point x: by summing out the latent variable z²,

      p(x|θ) = Σ_z p(x|θ, z)p(z|θ) = Σ_{k=1}^{K} p(x|θ, zk = 1)p(zk = 1|θ) = Σ_{k=1}^{K} πk N(x|µk, Σk)
• For all the data samples X, the log-likelihood is:

      log p(X|θ) = Σ_{n=1}^{N} log p(xn|θ) = Σ_{n=1}^{N} log Σ_{k=1}^{K} πk N(xn|µk, Σk)

  (compare with the GMM likelihood defined earlier)

² In probabilistic PCA, z was continuous, so we integrated it out.
Joint Distribution, Likelihood, and Posterior (2)

• Posterior for the k-th indicator zk, given an arbitrary single data point x:


      p(zk = 1|x) = p(zk = 1)p(x|zk = 1) / Σ_{j=1}^{K} p(zj = 1)p(x|zj = 1) = πk N(x|µk, Σk) / Σ_{j=1}^{K} πj N(x|µj, Σj)

• Now, for all data samples X, each data point xn has its own zn = [zn1, . . . , znK]^T, but with the same prior π:

      p(znk = 1|xn) = p(znk = 1)p(xn|znk = 1) / Σ_{j=1}^{K} p(znj = 1)p(xn|znj = 1) = πk N(xn|µk, Σk) / Σ_{j=1}^{K} πj N(xn|µj, Σj) = rnk

• Responsibilities are mathematically interpreted as posterior distributions.



Roadmap

(1) Gaussian Mixture Model


(2) Parameter Learning: MLE
(3) Latent-Variable Perspective for Probabilistic Modeling
(4) EM Algorithm



Revisiting EM Algorithm for MLE

The algorithm:

  S1. Initialize µk, Σk, πk
  S2. E-step:

          rnk = πk N(xn|µk, Σk) / Σ_{j=1}^{K} πj N(xn|µj, Σj)

  S3. M-step: Update µk, Σk, πk using rnk and go to S2.

The general EM view:

  • E-step. Expectation over z|x, θ^(t): given the current θ^(t) = (µk, Σk, πk), calculate the expected log-likelihood

          Q(θ|θ^(t)) = E_{z|x,θ^(t)}[log p(x, z|θ)] = ∫ log p(x, z|θ) p(z|x, θ^(t)) dz

  • M-step. Maximize Q(θ|θ^(t)) over θ to obtain the new model parameters.
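
For the GMM specifically, a sketch of what this expectation looks like, using log p(xn, zn|θ) = Σ_{k} znk (log πk + log N(xn|µk, Σk)) and E[znk | xn, θ^(t)] = rnk; maximizing it over θ reproduces the M-step updates above:

```latex
Q(\theta \mid \theta^{(t)})
  = \sum_{n=1}^{N} \mathbb{E}_{z_n \mid x_n, \theta^{(t)}}
    \Big[ \sum_{k=1}^{K} z_{nk}\,\big(\log \pi_k + \log \mathcal{N}(x_n \mid \mu_k, \Sigma_k)\big) \Big]
  = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk}\,\big(\log \pi_k + \log \mathcal{N}(x_n \mid \mu_k, \Sigma_k)\big)
```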

• Only a local optimum is guaranteed, because the original optimization problem is not necessarily convex. L7(4)



Other Issues
• Model selection for finding a good K , e.g., using nested cross-validation
• Application: Clustering
◦ K-means: treat the means in the GMM as cluster centers and ignore the covariances.
◦ K-means gives a hard assignment, whereas the GMM gives a soft assignment (see the sketch after this list).
• EM algorithm: Highly generic in the sense that it can be used for parameter
learning in general latent-variable models
• The standard criticisms of MLE, such as overfitting, apply here as well. A fully Bayesian approach that places priors on the parameters is also possible, but is not covered in these notes.
• Other density estimation methods
◦ Histogram-based method: non-parametric method
◦ Kernel-density estimation: non-parametric method
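
A minimal sketch of the soft vs. hard assignment contrast using scikit-learn (assuming scikit-learn is available; the toy data and K = 2 are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Toy 2-D data from two overlapping clusters.
X = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(100, 2)),
               rng.normal([3.0, 3.0], 1.0, size=(100, 2))])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(gmm.predict_proba(X[:3]))  # soft assignment: responsibilities r_nk per point
print(kmeans.predict(X[:3]))     # hard assignment: one cluster index per point
```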



Questions?



Review Questions

1)

