Problem Sheet 1
1. (25 points) In this problem, we illustrate the Expectation-Maximization (EM) algorithm using concrete examples. Suppose that the complete dataset consists of Z = (X, Y), where X is observed but Y is unobserved. The log-likelihood for Z is denoted by l(θ; X, Y), where θ denotes the unknown parameter vector. We repeat the E-Step and M-Step below until the sequence of θnew's converges (convergence to a local maximum can be guaranteed in some cases).
E-Step (Expectation Step): We compute the expected value of l(θ; X, Y), using (a) the information gained from the observed data X, and (b) the current parameter estimate θold. More precisely, let
\[
Q(\theta; \theta_{\mathrm{old}}) := E\!\left[\, l(\theta; \mathcal{X}, \mathcal{Y}) \mid \mathcal{X}, \theta_{\mathrm{old}} \right]
= \int l(\theta; \mathcal{X}, y)\, p(y \mid \mathcal{X}, \theta_{\mathrm{old}})\, dy. \tag{1}
\]
M-Step (Maximization Step): We maximize the conditional expectation (1) over θ. We simply set θnew := argmaxθ Q(θ; θold), and afterwards let θold = θnew.
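Schematically, the iteration can be organized as in the Python sketch below. The callables e_step and m_step stand in for the problem-specific construction of Q in (1) and its maximization; the names, default tolerance, and stopping rule are illustrative choices rather than part of the problem statement.

import numpy as np

def em(theta_init, e_step, m_step, tol=1e-8, max_iter=500):
    """Generic EM skeleton (a sketch): alternate E- and M-steps until theta stabilizes.

    e_step(theta_old) should return whatever object represents Q( . ; theta_old),
    and m_step(q) should return argmax_theta of that quantity.
    """
    theta_old = np.asarray(theta_init, dtype=float)
    for _ in range(max_iter):
        q = e_step(theta_old)                            # E-step: build Q( . ; theta_old)
        theta_new = np.asarray(m_step(q), dtype=float)   # M-step: maximize Q over theta
        if np.max(np.abs(theta_new - theta_old)) < tol:  # stop once the estimates stop moving
            break
        theta_old = theta_new
    return theta_new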
(a) We now derive the algorithm above. Let p(· | ·) denote an arbitrary conditional probability density function. Show that
\[
l(\theta; \mathcal{X}) = \ln p(\mathcal{X} \mid \theta)
= \ln \int p(\mathcal{X}, y \mid \theta)\, dy
\;\geq\; Q(\theta; \theta_{\mathrm{old}}) - E\!\left[ \ln p(\mathcal{Y} \mid \mathcal{X}, \theta_{\mathrm{old}}) \mid \mathcal{X}, \theta_{\mathrm{old}} \right]. \tag{2}
\]
(b) Denote the rightmost side of (2) by g (θ | θold ). It is clear that l(θ; X ) ≥ g (θ | θold ). Prove that
we have equality when θ = θold . Why does this imply that the EM algorithm is reasonable for
maximizing likelihood?
(c) Now, consider the multinomial distribution with four classes, Mult(n, πθ), where
\[
\pi_\theta = \left( \tfrac{1}{2} + \tfrac{1}{4}\theta,\; \tfrac{1}{4}(1-\theta),\; \tfrac{1}{4}(1-\theta),\; \tfrac{1}{4}\theta \right).
\]
Let x := (x1, x2, x3, x4) be a sample from this distribution. Write down the likelihood L(θ; x) and the log-likelihood l(θ; x) for the sample x.
(d) We will maximize l(θ; x) over θ using the EM algorithm (other algorithms will receive no marks), as a toy example. To use EM, we assume that the complete data Z is given by y := (y1, y2, y3, y4, y5) and that y has a 5-class Mult(n, πθ∗) distribution, where
\[
\pi_\theta^{*} = \left( \tfrac{1}{2},\; \tfrac{1}{4}\theta,\; \tfrac{1}{4}(1-\theta),\; \tfrac{1}{4}(1-\theta),\; \tfrac{1}{4}\theta \right).
\]
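Comparing πθ with πθ∗ suggests the intended correspondence x1 = y1 + y2, x2 = y3, x3 = y4, x4 = y5, so that only the first observed cell is split by the hidden data. Under that reading, one possible shape of the resulting updates is the Python sketch below; the function name, starting value, and stopping rule are illustrative assumptions, and the sketch is not a substitute for the derivation the problem asks for.

def em_multinomial(x, theta=0.5, tol=1e-10, max_iter=1000):
    """EM for the toy 4-class multinomial, assuming x1 splits into hidden cells
    with probabilities 1/2 and theta/4 (a sketch, not the official solution)."""
    x1, x2, x3, x4 = x
    for _ in range(max_iter):
        # E-step: expected value of the hidden count y2 given x1 and the current theta
        y2 = x1 * (theta / 4) / (1 / 2 + theta / 4)
        # M-step: maximizer of the expected complete-data log-likelihood over theta
        theta_new = (y2 + x4) / (y2 + x2 + x3 + x4)
        if abs(theta_new - theta) < tol:
            return theta_new
        theta = theta_new
    return theta

Note that the θ-dependent part of Q(θ; θold) only requires the expected hidden count E[y2 | x, θold], which is why a single scalar E-step suffices here.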
2. (15 points) As we saw in class, k-means clustering minimizes the average square distance distortion
\[
J_{\mathrm{avg}^2} = \sum_{j=1}^{k} \sum_{x \in C_j} d(x, m_j)^2, \tag{3}
\]
where d(x, x′ ) = ∥x − x′ ∥, and Cj is the set of points belonging to cluster j. Another distortion function
that we mentioned is the intra-cluster sum of squared distances,
\[
J_{\mathrm{IC}} = \sum_{j=1}^{k} \frac{1}{|C_j|} \sum_{x \in C_j} \sum_{x' \in C_j} d(x, x')^2.
\]
(a) Given that in k-means, mj = (1/|Cj|) Σx∈Cj x, show that JIC = 2 Javg2. (A numerical sanity check of this identity is sketched after part (b).)
(b) Let γi ∈ {1, . . . , k} be the cluster assignment of the i-th data point xi, and let n be the total number of data points. Then
\[
J_{\mathrm{avg}^2}(\gamma_1, \ldots, \gamma_n, m_1, \ldots, m_k) = \sum_{i=1}^{n} d\!\left(x_i, m_{\gamma_i}\right)^{2}.
\]
Show that step 1 minimizes Javg2 w.r.t. the assignments (holding {mj } fixed), and step 2 minimizes
Javg2 w.r.t. the centroids (holding the assignments fixed).
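Before proving the identity in part (a), it can be checked numerically. The NumPy sketch below draws arbitrary points, assigns them to three clusters at random, takes each centroid to be the cluster mean as in part (a), and compares the two distortions; all data and names here are purely illustrative.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                # arbitrary 2-D points
labels = rng.integers(0, 3, size=len(X))     # arbitrary assignment into k = 3 clusters

J_avg2 = 0.0
J_IC = 0.0
for j in range(3):
    C = X[labels == j]
    m_j = C.mean(axis=0)                     # centroid = cluster mean, as assumed in part (a)
    J_avg2 += ((C - m_j) ** 2).sum()
    diffs = C[:, None, :] - C[None, :, :]    # all pairwise differences within the cluster
    J_IC += (diffs ** 2).sum() / len(C)

print(J_IC / J_avg2)                         # the ratio should equal 2 up to round-off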
3. (10 points) Implement the k-means algorithm in a language of your choice, initializing the cluster
centers randomly. The algorithm terminates when no further change in cluster assignments or centroids
occurs.
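A minimal NumPy version of the required algorithm might look like the sketch below: the centers are initialized as k distinct randomly chosen data points, and the loop stops once the assignments no longer change. All names are illustrative, and empty clusters are handled only crudely (their centroid is simply left in place).

import numpy as np

def kmeans(X, k, rng=None):
    """Plain k-means (a sketch): k random data points as initial centers,
    stopping when the cluster assignments no longer change."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X, dtype=float)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    assignments = np.full(len(X), -1)
    distortions = []   # J_avg2 after each assignment step, for the distortion-vs-iteration plots below
    while True:
        # Step 1: assign every point to its nearest centroid; squared distances are computed
        # as ||x||^2 - 2 x.c + ||c||^2 to avoid a huge broadcast on high-dimensional data
        d2 = ((X ** 2).sum(axis=1, keepdims=True)
              - 2.0 * X @ centroids.T
              + (centroids ** 2).sum(axis=1))
        new_assignments = d2.argmin(axis=1)
        distortions.append(d2[np.arange(len(X)), new_assignments].sum())
        if np.array_equal(new_assignments, assignments):
            break
        assignments = new_assignments
        # Step 2: move every centroid to the mean of its assigned points
        for j in range(k):
            if np.any(assignments == j):
                centroids[j] = X[assignments == j].mean(axis=0)
    return assignments, centroids, distortions

For part (a) below, one would load the points (for instance with np.loadtxt('toydata.txt')), call kmeans(X, 3) twenty times, and plot the returned distortion curves.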
(a) Use the toy dataset toydata.txt (500 points in R2 , from 3 well-separated clusters). Plot the final
clustering assignments (by color or symbol) and also, on a separate figure, plot the distortion value
vs. iteration for 20 separate runs. Comment on whether you get the “correct” clusters each time,
and on the variability of results across runs.
(b) Implement k-means++ initialization and repeat part (a). Compare convergence (speed and final distortion) to the original random initialization. (A sketch of the seeding step is given after part (c).)
(c) Run both the original and k-means++ algorithms on the MNIST dataset (images are 28 × 28
pixels, i.e. 784-dimensional vectors). Compare how they converge and how results differ for k = 10
vs. k = 16. You can download MNIST via
from torchvision import datasets
mnist_trainset = datasets.MNIST(root='./data', train=True, download=True, transform=None)
mnist_testset = datasets.MNIST(root='./data', train=False, download=True, transform=None)
Explain any differences you observe in speed, distortion, or cluster quality.
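For part (b), the k-means++ seeding could be implemented as in the sketch below and used in place of the random initialization; the function name and interface are illustrative assumptions.

import numpy as np

def kmeanspp_init(X, k, rng=None):
    """k-means++ seeding (a sketch): each new center is drawn with probability
    proportional to its squared distance from the nearest center chosen so far."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X, dtype=float)
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        C = np.array(centers)
        d2 = ((X ** 2).sum(axis=1, keepdims=True)
              - 2.0 * X @ C.T
              + (C ** 2).sum(axis=1)).min(axis=1)
        d2 = np.maximum(d2, 0.0)             # guard against tiny negative values from round-off
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)

For part (c), transform=None leaves the images unprocessed, so one way (an assumption, not the only choice) to obtain the 784-dimensional vectors mentioned above is to flatten the raw tensor:

import numpy as np
from torchvision import datasets

mnist_trainset = datasets.MNIST(root='./data', train=True, download=True, transform=None)
# .data holds the images as a (60000, 28, 28) uint8 tensor; flatten to 784-dim float vectors in [0, 1]
X_mnist = mnist_trainset.data.numpy().reshape(-1, 784).astype(np.float32) / 255.0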
(a) Given an i.i.d. sample (x1, z1), . . . , (xn, zn) from the model, write down the complete-data log-likelihood ℓ(θ), ignoring additive constants that do not affect optimization.
(b) Let pi,j = P (zi = j | xi ). Give an expression for pi,j in terms of the mixture parameters.
(c) Derive the expected complete-data log-likelihood ℓθold(θ), where the expectation is taken with respect to these posterior probabilities pi,j. (The generic form this takes for a mixture model is sketched after part (d).)
(d) Show that maximizing ℓθold(θ) under the constraint Σj πj = 1 gives
\[
\pi_j \leftarrow \frac{1}{n} \sum_{i=1}^{n} p_{i,j}.
\]
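For orientation, assuming a generic mixture model with mixing weights πj and component densities p(x | θj) (the specific component family depends on the model in question), the quantities in parts (b) and (c) take the standard forms
\[
p_{i,j} = \frac{\pi_j\, p(x_i \mid \theta_j)}{\sum_{l} \pi_l\, p(x_i \mid \theta_l)},
\qquad
\ell_{\theta_{\mathrm{old}}}(\theta) = \sum_{i=1}^{n} \sum_{j} p_{i,j}\,\bigl[\ln \pi_j + \ln p(x_i \mid \theta_j)\bigr],
\]
from which the update for πj in part (d) follows by a Lagrange-multiplier argument. This is only a sketch of the general shape, not the requested derivation.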