
PROBABILISTIC SEGMENTATION
Computer Science and Engineering,
Indian Institute of Technology Kharagpur
1 / 36
Mixture Model Image Segmentation
Probability of generating a pixel measurement vector:
p(x) = \sum_l p(x \mid \theta_l)\, \alpha_l
The mixture model has the form:
p(x \mid \Theta) = \sum_{l=1}^{g} \alpha_l\, p_l(x \mid \theta_l)
Component densities:
p_l(x \mid \theta_l) = \frac{1}{(2\pi)^{d/2} \det(\Sigma_l)^{1/2}} \exp\!\left( -\frac{1}{2} (x - \mu_l)^{\top} \Sigma_l^{-1} (x - \mu_l) \right)
2 / 36
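As a concrete illustration of these formulas, the following Python sketch evaluates one Gaussian component and the resulting mixture density for a single pixel measurement vector x. The function names and the parameter layout (lists of weights alphas, means mus, and covariances Sigmas) are illustrative choices, not notation from the slides.

```python
import numpy as np

def gaussian_density(x, mu, Sigma):
    # p_l(x | theta_l) for a d-dimensional Gaussian with mean mu and covariance Sigma
    d = x.shape[0]
    diff = x - mu
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / norm

def mixture_density(x, alphas, mus, Sigmas):
    # p(x | Theta) = sum_l alpha_l * p_l(x | theta_l)
    return sum(a * gaussian_density(x, m, S)
               for a, m, S in zip(alphas, mus, Sigmas))
```

For plain RGB pixels d = 3; appending position or texture features to x only changes d.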
Image Segmentation
Likelihood for all observations (data points):
\prod_{j \in \text{observations}} \left[ \sum_{l=1}^{g} \alpha_l\, p_l\!\left( x_j \mid \theta_l \right) \right]
3 / 36
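In practice this product over observations is evaluated in log space. A minimal sketch, assuming the pixel measurements are stacked into an (n, d) array X and that SciPy is available for the Gaussian densities:

```python
import numpy as np
from scipy.stats import multivariate_normal

def data_log_likelihood(X, alphas, mus, Sigmas):
    # log of prod_j [ sum_l alpha_l p_l(x_j | theta_l) ] for an (n, d) data matrix X
    per_component = np.column_stack(
        [a * multivariate_normal.pdf(X, mean=m, cov=S)
         for a, m, S in zip(alphas, mus, Sigmas)])      # shape (n, g)
    return np.sum(np.log(per_component.sum(axis=1)))
```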
Mixture Model Line Fitting
p(W) = \sum_l p(W \mid a_l)
Likelihood for a set of observations:
\prod_{j \in \text{observations}} \left[ \sum_{l=1}^{g} \alpha_l\, p_l\!\left( W_j \mid a_l \right) \right]
4 / 36
Missing data problems
L_c(x ; u) = \log \left[ \prod_j p_c\!\left( x_j ; u \right) \right] = \sum_j \log \left[ p_c\!\left( x_j ; u \right) \right]
The incomplete data space:
p_i(y ; u) = \int_{\{ x \,\mid\, f(x) = y \}} p_c(x ; u)\, d\mu
where \mu measures volume on the space of x such that f(x) = y
5 / 36
Missing data problems
The incomplete data likelihood:
\prod_{j \in \text{observations}} p_i\!\left( y_j ; u \right)
L_i(y ; u) = \log \left[ \prod_j p_i\!\left( y_j ; u \right) \right] = \sum_j \log \left[ p_i\!\left( y_j ; u \right) \right] = \sum_j \log \left[ \int_{\{ x \,\mid\, f(x) = y_j \}} p_c(x ; u)\, d\mu \right]
6 / 36
EM for mixture models
The complete data is a composition of the incomplete data and the missing data:
x_j = \left( y_j, z_j \right)
Mixture model:
p(y) = \sum_l p(y \mid a_l)
Complete data log likelihood:
\sum_{j \in \text{observations}} \left[ \sum_{l=1}^{g} z_{lj} \log p\!\left( y_j \mid a_l \right) \right]
7 / 36
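If the indicators z_{lj} were actually known, this complete data log likelihood would be a straightforward weighted sum. A hypothetical sketch, assuming Z is a (g, n) 0/1 indicator matrix and log_p a matching array of log p(y_j | a_l) values:

```python
import numpy as np

def complete_data_log_likelihood(Z, log_p):
    # Z: (g, n) indicators with Z[l, j] = 1 iff observation j came from component l
    # log_p: (g, n) array of log p(y_j | a_l)
    return float(np.sum(Z * log_p))
```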
EM
E-step: Compute the expected value for z_j for each j, i.e. compute z_j^{(s)}. This results in x^{s} = [y, z^{s}].
M-step: Maximize the complete data log-likelihood with respect to u:
u^{s+1} = \arg\max_u L_c\!\left( x^{s} ; u \right) = \arg\max_u L_c\!\left( [y, z^{s}] ; u \right)
8 / 36
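The alternation above can be expressed as a small driver loop. This is a sketch of the control flow only: e_step and m_step stand for the problem-specific computations discussed on the following slides, and the convergence test on the log-likelihood is an added practical detail, not part of the slide.

```python
def expectation_maximization(y, u_init, e_step, m_step, n_iters=100, tol=1e-6):
    # Generic EM driver: e_step fills in the missing data given the current parameters,
    # m_step maximizes the complete-data log-likelihood given the filled-in data.
    u = u_init
    prev = None
    for _ in range(n_iters):
        z = e_step(y, u)          # expected values of the missing data, z^(s)
        u, ll = m_step(y, z)      # u^(s+1) and the new complete-data log-likelihood
        if prev is not None and abs(ll - prev) < tol:
            break
        prev = ll
    return u
```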
EM in General Case
Expected value of the complete data log-likelihood:
Q\!\left( u ; u^{(s)} \right) = \int L_c(x ; u)\, p\!\left( x \mid u^{(s)}, y \right) dx
We maximize with respect to u to get:
u^{s+1} = \arg\max_u Q\!\left( u ; u^{(s)} \right)
9 / 36
Image Segmentation
WHAT IS MISSING DATA? An (n \times g) matrix I of indicator variables.
Expectation step:
E(I_{lm}) = \bar{I}_{lm} = 1 \cdot P\!\left( l^{\text{th}} \text{ pixel comes from } m^{\text{th}} \text{ blob} \right) + 0 \cdot P\!\left( l^{\text{th}} \text{ pixel does not come from } m^{\text{th}} \text{ blob} \right) = P\!\left( l^{\text{th}} \text{ pixel comes from } m^{\text{th}} \text{ blob} \right)
We get:
\bar{I}_{lm} = \frac{\alpha_m^{(s)}\, p_m\!\left( x_l \mid \theta_m^{(s)} \right)}{\sum_{k=1}^{K} \alpha_k^{(s)}\, p_k\!\left( x_l \mid \theta_k^{(s)} \right)}
10 / 36
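A vectorized sketch of this expectation step for n pixels and g segments, assuming the current parameters are stored as lists alphas, mus, Sigmas and using SciPy for the component densities (the names are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, alphas, mus, Sigmas):
    # X: (n, d) pixel feature vectors. Returns I_bar of shape (n, g).
    weighted = np.column_stack(
        [a * multivariate_normal.pdf(X, mean=m, cov=S)
         for a, m, S in zip(alphas, mus, Sigmas)])          # alpha_k p_k(x_l | theta_k)
    return weighted / weighted.sum(axis=1, keepdims=True)   # normalize over segments
```

Each row of the returned array sums to 1, matching the normalization over segments in the formula above.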
Image Segmentation
COMPLETE DATA LOG-LIKELIHOOD:
L_c\!\left( \left[ x, \bar{I}_{lm} \right] ; \Theta^{(s)} \right) = \sum_{l \in \text{all pixels}} \left[ \sum_{m=1}^{g} \bar{I}_{lm} \log p\!\left( x_l \mid \theta_m \right) \right]
Maximization step:
\Theta^{(s+1)} = \arg\max_{\Theta} L_c\!\left( \left[ x, \bar{I}_{lm} \right] ; \Theta^{(s)} \right)
11 / 36
Image Segmentation
Maximization step:
\alpha_m^{(s+1)} = \frac{1}{n} \sum_{l=1}^{n} p\!\left( m \mid x_l, \Theta^{(s)} \right)
\mu_m^{(s+1)} = \frac{\sum_{l=1}^{n} x_l\, p\!\left( m \mid x_l, \Theta^{(s)} \right)}{\sum_{l=1}^{n} p\!\left( m \mid x_l, \Theta^{(s)} \right)}
\Sigma_m^{(s+1)} = \frac{\sum_{l=1}^{n} p\!\left( m \mid x_l, \Theta^{(s)} \right) \left( x_l - \mu_m^{(s)} \right) \left( x_l - \mu_m^{(s)} \right)^{\top}}{\sum_{l=1}^{n} p\!\left( m \mid x_l, \Theta^{(s)} \right)}
12 / 36
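These updates are weighted averages over the pixels. A minimal sketch, assuming R is the (n, g) matrix of responsibilities p(m | x_l, Theta^(s)) computed in the E-step; note that this sketch centres the covariance on the freshly updated mean, a common variant, whereas the slide centres it on mu_m^(s).

```python
import numpy as np

def m_step(X, R):
    # X: (n, d) pixel features; R: (n, g) responsibilities from the E-step.
    n, d = X.shape
    Nm = R.sum(axis=0)                         # effective number of pixels per segment
    alphas = Nm / n                            # alpha_m^(s+1)
    mus = (R.T @ X) / Nm[:, None]              # mu_m^(s+1)
    Sigmas = []
    for m in range(R.shape[1]):
        diff = X - mus[m]                      # centred on the updated mean (see note above)
        Sigmas.append((R[:, m, None] * diff).T @ diff / Nm[m])   # Sigma_m^(s+1)
    return alphas, mus, Sigmas
```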
How EM works for Image Segmentation
E-step:
\bar{I}_{lm} = \frac{\alpha_m^{(s)}\, p_m\!\left( x_l \mid \theta_m^{(s)} \right)}{\sum_{k=1}^{K} \alpha_k^{(s)}\, p_k\!\left( x_l \mid \theta_k^{(s)} \right)}
For each pixel, compute the value \alpha_m^{(s)}\, p_m\!\left( x_l \mid \theta_m^{(s)} \right) for each segment m.
For each pixel, compute the sum \sum_{k=1}^{K} \alpha_k^{(s)}\, p_k\!\left( x_l \mid \theta_k^{(s)} \right), i.e. perform the summation over all K segments.
Divide the former by the latter.
M-step:
Compute \alpha_m^{(s+1)}, \mu_m^{(s+1)}, \Sigma_m^{(s+1)}.
13 / 36
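Putting the two steps together gives the segmentation loop below. It reuses the e_step and m_step sketches given earlier (illustrative code, not from the slides) and simply runs a fixed number of iterations.

```python
def em_segmentation(X, alphas, mus, Sigmas, n_iters=50):
    # Alternate the E-step and M-step sketched above for a fixed number of iterations.
    for _ in range(n_iters):
        R = e_step(X, alphas, mus, Sigmas)      # responsibilities I_bar, shape (n, g)
        alphas, mus, Sigmas = m_step(X, R)      # updated alpha_m, mu_m, Sigma_m
    labels = R.argmax(axis=1)                   # hard segment label per pixel
    return labels, (alphas, mus, Sigmas)
```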
Line Fitting Expectation Maximization
WHAT IS MISSING DATA?
An (n \times g) matrix M of indicator variables.
The (k, l)^{\text{th}} entry of M is
m_{k,l} = \begin{cases} 1 & \text{if point } k \text{ is drawn from line } l \\ 0 & \text{otherwise} \end{cases}
\sum_l P\!\left( m_{kl} = 1 \mid \text{point } k, \text{ line } l\text{'s parameters} \right) = 1.
HOW TO FORMULATE LIKELIHOOD?
\exp\!\left( -\frac{(\text{distance from point } k \text{ to line } l)^2}{2\sigma^2} \right)
14 / 36
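A sketch of how these per-point, per-line ownership weights could be computed. The helper point_line_distance, the line parameterization (a, b, c) with ax + by + c = 0, and the array names are illustrative assumptions; the weights are normalized over lines so they sum to 1 for each point, as required above.

```python
import numpy as np

def point_line_distance(p, line):
    # Perpendicular distance from point p = (x, y) to line (a, b, c) with a*x + b*y + c = 0
    a, b, c = line
    return abs(a * p[0] + b * p[1] + c) / np.hypot(a, b)

def line_ownership_weights(points, lines, sigma):
    # E(m_kl): proportional to exp(-d_kl^2 / (2 sigma^2)), normalized over lines l
    d = np.array([[point_line_distance(p, ln) for ln in lines] for p in points])
    w = np.exp(-d ** 2 / (2 * sigma ** 2))
    return w / w.sum(axis=1, keepdims=True)
```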
Motion Segmentation EM
WHAT IS MISSING DATA? It is the motion field to which each pixel belongs. The indicator variable V_{xy,l} is the (xy, l)^{\text{th}} entry of V.
V_{xy,l} = \begin{cases} 1 & \text{if the } xy^{\text{th}} \text{ pixel belongs to the } l^{\text{th}} \text{ motion field} \\ 0 & \text{otherwise} \end{cases}
HOW TO FORMULATE LIKELIHOOD?
L(V, \Theta) = -\sum_{xy,l} V_{xy,l}\, \frac{\left( I_1(x, y) - I_2\!\left( x + m_1(x, y ; \theta_l),\; y + m_2(x, y ; \theta_l) \right) \right)^2}{2\sigma^2}
where \Theta = \left( \theta_1, \theta_2, \ldots, \theta_g \right)
P\!\left( V_{xy,l} = 1 ; I_1, I_2, \Theta \right)
15 / 36
Motion Segmentation EM
HOW TO FORMULATE LIKELIHOOD?
P\!\left( V_{xy,l} = 1 ; I_1, I_2, \Theta \right)
A common choice is the affine motion model:
\begin{pmatrix} m_1 \\ m_2 \end{pmatrix}(x, y ; \theta_l) = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} a_{13} \\ a_{23} \end{pmatrix}
where \theta_l = (a_{11}, a_{12}, \ldots, a_{23})
Layered representation
16 / 36
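The affine model assigns each pixel a flow vector from six parameters per layer. A small sketch; the parameter ordering theta = (a11, a12, a13, a21, a22, a23) is an assumed convention, not fixed by the slide.

```python
import numpy as np

def affine_flow(theta, xs, ys):
    # theta = (a11, a12, a13, a21, a22, a23); xs, ys are pixel coordinate arrays
    a11, a12, a13, a21, a22, a23 = theta
    m1 = a11 * xs + a12 * ys + a13     # horizontal flow m1(x, y; theta_l)
    m2 = a21 * xs + a22 * ys + a23     # vertical flow m2(x, y; theta_l)
    return m1, m2

# Example over a full image grid (width and height are the image dimensions):
# xs, ys = np.meshgrid(np.arange(width), np.arange(height))
# m1, m2 = affine_flow(theta_l, xs, ys)
```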
Identifying Outliers EM
We construct an explicit model of the outliers:
(1 - \lambda)\, P(\text{measurements} \mid \text{model}) + \lambda\, P(\text{outliers})
Here \lambda \in [0, 1] models the frequency with which the outliers occur, and P(\text{outliers}) is the probability model for the outliers.
WHAT IS MISSING DATA?
A variable that indicates which component generated each point.
Complete data likelihood:
\prod_j \left[ (1 - \lambda)\, P\!\left( \text{measurement}_j \mid \text{model} \right) + \lambda\, P\!\left( \text{measurement}_j \mid \text{outliers} \right) \right]
17 / 36
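The E-step for this two-component mixture reduces to the posterior probability that each measurement came from the model rather than from the outlier process. A sketch under the assumption that the outlier density is a constant (for example, uniform over the data range); the names are illustrative.

```python
import numpy as np

def inlier_responsibilities(p_model, lam, outlier_density):
    # p_model: array of P(measurement_j | model); lam: outlier frequency in [0, 1]
    # outlier_density: constant P(measurement_j | outliers), e.g. uniform over the data range
    inlier = (1.0 - lam) * np.asarray(p_model)
    outlier = lam * outlier_density
    return inlier / (inlier + outlier)
```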
Background Subtraction EM
For each pixel we get a series of observations over the successive frames.
The source of these observations is a mixture model with two components: the background and the noise (foreground).
The background can be modeled as a Gaussian.
The noise can come from some uniform source.
Any pixel which belongs to the noise component is not background.
18 / 36
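One way to realize this per-pixel mixture is a Gaussian background component plus a uniform foreground/noise component. The sketch below computes, for each frame, the posterior probability that the pixel shows background; the uniform density value and the prior pi_bg are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def background_posterior(values, mu, sigma, pi_bg, uniform_density=1.0 / 256.0):
    # values: intensity observations of one pixel over successive frames
    # Gaussian background (mu, sigma) with prior pi_bg; uniform noise/foreground otherwise
    p_bg = pi_bg * norm.pdf(values, loc=mu, scale=sigma)
    p_fg = (1.0 - pi_bg) * uniform_density
    return p_bg / (p_bg + p_fg)     # posterior probability of background, per frame

# A frame's observation is labelled foreground when this posterior is low, e.g. below 0.5.
```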
Difficulties with Expectation Maximization
Local minima.
Proper initialization.
Extremely small expected weights.
Parameters converging to the boundaries of the parameter space.
19 / 36
Model Selection
Should we consider minimizing the negative of the log likelihood?
We should have a penalty term which increases as the number of components increases.
An Information Criterion (AIC):
-2 L\!\left( x ; \hat{\Theta} \right) + 2p
where p is the number of free parameters.
Bayesian Information Criterion (BIC):
-L\!\left( D ; \hat{\Theta} \right) + \frac{p}{2} \log N
where p is the number of free parameters.
20 / 36
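Given the fitted log likelihood, scoring candidate numbers of components is direct. The free-parameter count below assumes full-covariance Gaussian components in d dimensions plus g - 1 independent mixing weights, which is one common counting convention; the candidate with the smallest criterion value is kept.

```python
import numpy as np

def num_free_params(g, d):
    # g - 1 mixing weights, g means of dimension d, g symmetric d x d covariances
    return (g - 1) + g * d + g * d * (d + 1) // 2

def aic(log_lik, p):
    return -2.0 * log_lik + 2.0 * p

def bic(log_lik, p, N):
    return -log_lik + 0.5 * p * np.log(N)

# Fit the mixture for several values of g, then keep the g with the smallest criterion value.
```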
Bayesian Information Criterion (BIC)
P(M \mid D) = \frac{P(D \mid M)}{P(D)}\, P(M) = \frac{\int P(D \mid M, \theta)\, P(\theta)\, d\theta}{P(D)}\, P(M)
Maximizing the posterior P(M \mid D) amounts (approximately) to minimizing:
-L\!\left( D ; \hat{\Theta} \right) + \frac{p}{2} \log N
where p is the number of free parameters.
21 / 36
Minimum Description Length (MDL) criterion
It yields a selection criterion which is the same as BIC:
-L\!\left( D ; \hat{\Theta} \right) + \frac{p}{2} \log N
where p is the number of free parameters.
22 / 36