Variants of the Expectation-Maximization Algorithm
Submitted by,
Praveen M
CB.EN.P2CSE15012
I MTech CSE
Introduction
The expectation-maximization (EM) algorithm, introduced by Dempster et al. in 1977, is a
very general method for solving maximum likelihood estimation problems. In this report,
we review the theory behind EM as well as a number of EM variants, suggesting that beyond
the current state of the art lies an even wider territory still to be explored.
EM background
Let Y be a random variable with probability density function (pdf) p(y|θ), where θ is an unknown
parameter vector. Given an outcome y of Y, we aim at maximizing the likelihood function
L(θ) ≡ p(y|θ) with respect to θ over a given search space Θ. This is the very principle of maximum
likelihood (ML) estimation. Unfortunately, except in simple situations such as estimating the
mean and variance of a Gaussian population, an ML estimation problem generally has no
closed-form solution. Numerical routines are then needed to approximate it.
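As a toy illustration of the ML principle, the sketch below (assuming NumPy and SciPy are available; the data and parameter values are invented for illustration) compares the closed-form Gaussian ML estimates with a generic numerical maximization of the log-likelihood.

```python
# Minimal sketch: closed-form ML estimates for a Gaussian vs. a generic
# numerical maximizer of the log-likelihood (illustrative values only).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.5, size=500)   # observed sample

# Closed-form ML estimates: sample mean and (biased) sample variance.
mu_ml, var_ml = y.mean(), y.var()

# Generic numerical route: minimize the negative log-likelihood directly.
def neg_log_lik(theta):
    mu, log_var = theta                        # variance parameterized on log scale
    var = np.exp(log_var)
    return 0.5 * np.sum(np.log(2 * np.pi * var) + (y - mu) ** 2 / var)

res = minimize(neg_log_lik, x0=np.array([0.0, 0.0]))
print(mu_ml, var_ml)                           # closed-form estimates
print(res.x[0], np.exp(res.x[1]))              # numerical estimates, should agree closely
```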
EM as a likelihood maximizer
The EM algorithm is a class of optimizers specifically tailored to ML problems, which makes
it both general and not so general. Perhaps the most salient feature of EM is that it works
iteratively by maximizing successive local approximations of the likelihood function. Each
iteration therefore consists of two steps: one that performs the approximation (the E-step) and one
that maximizes it (the M-step). But, to be clear, not every two-step iterative scheme is an
EM algorithm. For instance, Newton and quasi-Newton methods work in a similar iterative
fashion but do not have much to do with EM. What essentially defines an EM algorithm is the
philosophy underlying the local approximation scheme, which, in particular, does not rely on
differential calculus.
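To make the two-step structure concrete, here is a minimal EM sketch for a two-component, one-dimensional Gaussian mixture; the model, data, and initial values are illustrative assumptions, not taken from this report.

```python
# Minimal EM sketch for a 1-D, two-component Gaussian mixture.
# Each iteration = E-step (local approximation) followed by M-step (its maximization).
import numpy as np

rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 200)])

# Initial parameter guesses.
pi, mu, sigma = 0.5, np.array([-1.0, 1.0]), np.array([1.0, 1.0])

def normal_pdf(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

for _ in range(100):
    # E-step: posterior responsibility of component 0 for each observation.
    p0 = pi * normal_pdf(y, mu[0], sigma[0])
    p1 = (1 - pi) * normal_pdf(y, mu[1], sigma[1])
    r0 = p0 / (p0 + p1)
    r1 = 1 - r0

    # M-step: maximize the expected complete-data log-likelihood (closed form here).
    pi = r0.mean()
    mu = np.array([np.sum(r0 * y) / r0.sum(), np.sum(r1 * y) / r1.sum()])
    sigma = np.sqrt(np.array([np.sum(r0 * (y - mu[0]) ** 2) / r0.sum(),
                              np.sum(r1 * (y - mu[1]) ** 2) / r1.sum()]))

print(pi, mu, sigma)   # should roughly recover (0.6, [-2, 3], [1, 1])
```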
SAEM
Stochastic Approximation type EM. The SAEM algorithm is a simple hybridization of EM
and SEM that provides pointwise convergence, as opposed to the erratic behavior of SEM.
Given a current estimate θ_n, SAEM performs a standard EM iteration in addition to the SEM
iteration. The parameter is then updated as a weighted mean of both contributions, yielding:
θ_{n+1} = (1 − λ_{n+1}) θ^{EM}_{n+1} + λ_{n+1} θ^{SEM}_{n+1},
where (λ_n) is a sequence of step sizes in [0, 1] that decreases to zero.
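The rough sketch below shows this update on the same illustrative mixture model used above; the step-size schedule λ_n = 1/n, the simulation of labels in the SEM step, and all variable names are assumptions made for the example.

```python
# Rough SAEM sketch: weighted mean of an EM iterate and an SEM iterate,
# with a step size decreasing to zero (illustrative model and schedule).
import numpy as np

rng = np.random.default_rng(2)
y = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 200)])

def normal_pdf(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

def responsibilities(theta):
    pi, mu, sigma = theta
    p0 = pi * normal_pdf(y, mu[0], sigma[0])
    p1 = (1 - pi) * normal_pdf(y, mu[1], sigma[1])
    return p0 / (p0 + p1)

def m_step(w0):
    # Closed-form maximizer given component weights w0 in [0, 1] (soft or hard).
    w1 = 1 - w0
    mu = np.array([np.sum(w0 * y) / w0.sum(), np.sum(w1 * y) / w1.sum()])
    sigma = np.sqrt(np.array([np.sum(w0 * (y - mu[0]) ** 2) / w0.sum(),
                              np.sum(w1 * (y - mu[1]) ** 2) / w1.sum()]))
    return w0.mean(), mu, sigma

theta = (0.5, np.array([-1.0, 1.0]), np.array([1.0, 1.0]))
for n in range(1, 200):
    r0 = responsibilities(theta)
    theta_em = m_step(r0)                        # standard EM iterate (soft weights)
    z0 = rng.random(y.size) < r0                 # SEM: simulate hidden labels
    theta_sem = m_step(z0.astype(float))         # SEM iterate (hard weights)
    lam = 1.0 / n                                # assumed decreasing step size
    # SAEM update: weighted mean of the two contributions.
    theta = tuple((1 - lam) * np.asarray(a) + lam * np.asarray(b)
                  for a, b in zip(theta_em, theta_sem))

print(theta)
```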
MCEM
Monte Carlo EM [1]. At least formally, MCEM turns out to be a generalization of SEM. In
the SEM simulation step, draw m independent samples z_n^{(1)}, z_n^{(2)}, ..., z_n^{(m)} instead of just
one, and then maximize the following function:
Q̂_m(θ, θ_n) = (1/m) Σ_{i=1}^{m} log p(y, z_n^{(i)} | θ),
which, in general, converges almost surely to the standard EM auxiliary function thanks to the
law of large numbers.
Choosing a large value for m is what justifies calling this a Monte Carlo method. In this case,
Q̂ may be seen as an empirical approximation of the standard EM auxiliary function, and the
algorithm is expected to behave similarly to EM. On the other hand, choosing a small value
for m is not forbidden, and may even be advisable (in particular, for computational reasons). We notice that,
for m = 1, MCEM reduces to SEM. A possible strategy consists of progressively increasing
m, yielding a simulated-annealing-like MCEM which is close in spirit to SAEM.
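A minimal MCEM sketch on the same illustrative mixture follows; the number of draws m and the model are assumptions. Because the complete-data log-likelihood is linear in the hidden labels here, maximizing the Monte Carlo average of Q amounts to an M-step on the averaged simulated labels, and setting m = 1 recovers SEM.

```python
# Minimal MCEM sketch: m simulated label sets per iteration, then an M-step
# on the Monte Carlo approximation of Q (illustrative model and tuning).
import numpy as np

rng = np.random.default_rng(3)
y = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 200)])

def normal_pdf(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

pi, mu, sigma = 0.5, np.array([-1.0, 1.0]), np.array([1.0, 1.0])
m = 20                                           # assumed number of Monte Carlo draws

for _ in range(100):
    # Simulation step: responsibilities, then m simulated label sets.
    p0 = pi * normal_pdf(y, mu[0], sigma[0])
    p1 = (1 - pi) * normal_pdf(y, mu[1], sigma[1])
    r0 = p0 / (p0 + p1)
    z = rng.random((m, y.size)) < r0             # m draws of hidden labels
    w0 = z.mean(axis=0)                          # empirical average over the draws

    # M-step: maximize the Monte Carlo approximation of Q (closed form here).
    pi = w0.mean()
    w1 = 1 - w0
    mu = np.array([np.sum(w0 * y) / w0.sum(), np.sum(w1 * y) / w1.sum()])
    sigma = np.sqrt(np.array([np.sum(w0 * (y - mu[0]) ** 2) / w0.sum(),
                              np.sum(w1 * (y - mu[1]) ** 2) / w1.sum()]))

print(pi, mu, sigma)
```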
References:
1. G. Wei and M. A. Tanner. A Monte Carlo implementation of the EM algorithm and the
poor man's data augmentation algorithm. Journal of the American Statistical Association,
85:699-704, 1990.
2. D. A. van Dyk and X. L. Meng. Algorithms based on data augmentation: A graphical
representation and comparison. In K. Berk and M. Pourahmadi, editors, Computing
Science and Statistics: Proceedings of the 31st Symposium on the Interface, pages
230-239. Interface Foundation of North America, 2000.