PROBABILISTIC SEGMENTATION
Computer Science and Engineering,
Indian Institute of Technology Kharagpur

Mixture Model Image Segmentation

Probability of generating a pixel measurement vector:
$$p(x) = \sum_{l} \alpha_l \, p(x \mid \theta_l)$$

The mixture model has the form:
$$p(x \mid \Theta) = \sum_{l=1}^{g} \alpha_l \, p_l(x \mid \theta_l)$$

Component densities:
$$p_l(x \mid \theta_l) = \frac{1}{(2\pi)^{d/2} \det(\Sigma_l)^{1/2}} \exp\left( -\frac{1}{2} \left( x - \mu_l \right)^{\mathsf{T}} \Sigma_l^{-1} \left( x - \mu_l \right) \right)$$
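
As a concrete illustration, here is a minimal NumPy sketch of these two formulas (the function and variable names are illustrative, not from the slides):

```python
import numpy as np

def gaussian_density(x, mu, sigma):
    """Component density p_l(x | theta_l): a d-dimensional Gaussian
    with mean mu (shape (d,)) and covariance sigma (shape (d, d))."""
    d = x.shape[0]
    diff = x - mu
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigma))
    # solve(sigma, diff) computes sigma^{-1} diff without an explicit inverse
    return np.exp(-0.5 * diff @ np.linalg.solve(sigma, diff)) / norm

def mixture_density(x, alphas, mus, sigmas):
    """p(x | Theta) = sum_l alpha_l p_l(x | theta_l)."""
    return sum(a * gaussian_density(x, m, s)
               for a, m, s in zip(alphas, mus, sigmas))
```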

Image Segmentation

Likelihood for all observations (data points):
$$\prod_{j \in \text{observations}} \left[ \sum_{l=1}^{g} \alpha_l \, p_l\left( x_j \mid \theta_l \right) \right]$$
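
Since this product over observations underflows quickly, one usually evaluates its logarithm; a direct sketch reusing mixture_density from the previous block (a log-sum-exp formulation would be the numerically safer choice):

```python
import numpy as np

def data_log_likelihood(X, alphas, mus, sigmas):
    """log prod_j [sum_l alpha_l p_l(x_j | theta_l)] for X of shape (n, d)."""
    return sum(np.log(mixture_density(x, alphas, mus, sigmas)) for x in X)
```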

Mixture Model Line Fitting

$$p(W) = \sum_{l} \alpha_l \, p(W \mid a_l)$$

Likelihood for a set of observations:
$$\prod_{j \in \text{observations}} \left[ \sum_{l=1}^{g} \alpha_l \, p_l\left( W_j \mid a_l \right) \right]$$

Missing data problems

The complete data log-likelihood:
$$L_c(x; u) = \log\left( \prod_j p_c\left( x_j; u \right) \right) = \sum_j \log\left( p_c\left( x_j; u \right) \right)$$

The incomplete data space:
$$p_i(y; u) = \int_{\{x \,\mid\, f(x) = y\}} p_c(x; u) \, d\mu$$
where $\mu$ measures volume on the space of $x$ such that $f(x) = y$.

Missing data problems

The incomplete data likelihood:
$$\prod_{j \in \text{observations}} p_i\left( y_j; u \right)$$

$$L_i(y; u) = \log\left( \prod_j p_i\left( y_j; u \right) \right) = \sum_j \log\left( p_i\left( y_j; u \right) \right) = \sum_j \log\left( \int_{\{x \,\mid\, f(x) = y_j\}} p_c(x; u) \, d\mu \right)$$
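
A concrete instance that connects this to the mixture slides: if the complete data is $x = (y, z)$ with $z$ a discrete component label, and $f(x) = y$ simply drops the label, the integral over $\{x \mid f(x) = y\}$ collapses to a sum,
$$p_i(y; u) = \sum_{z} p_c(y, z; u) = \sum_{l=1}^{g} \alpha_l \, p_l(y \mid \theta_l),$$
which is exactly the mixture density of the earlier slides.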

EM for mixture models

The complete data is a composition of the incomplete data and the missing data:
$$x_j = \left( y_j, z_j \right)$$

Mixture model:
$$p(y) = \sum_{l} \alpha_l \, p(y \mid a_l)$$

Complete data log-likelihood:
$$\sum_{j \in \text{observations}} \left[ \sum_{l=1}^{g} z_{lj} \log p\left( y_j \mid a_l \right) \right]$$

EM

E-step: Compute the expected value of $z_j$ for each $j$, i.e. compute $z_j^{(s)}$. This results in $x^s = [y, z^s]$.

M-step: Maximize the complete data log-likelihood with respect to $u$:
$$u^{s+1} = \arg\max_u L_c\left( x^s; u \right) = \arg\max_u L_c\left( [y, z^s]; u \right)$$
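
The alternation can be written as a short generic loop; a sketch assuming the parameters are packed in a NumPy array and the two steps are supplied by the particular model:

```python
import numpy as np

def em(y, u0, e_step, m_step, max_iters=100, tol=1e-6):
    """Generic EM skeleton. e_step(y, u) returns the expected missing
    data z^(s); m_step(y, z) returns u^(s+1), the maximizer of the
    complete-data log-likelihood L_c([y, z^(s)]; u)."""
    u = u0
    for _ in range(max_iters):
        z = e_step(y, u)          # E-step: z^(s)
        u_new = m_step(y, z)      # M-step: u^(s+1)
        if np.max(np.abs(u_new - u)) < tol:  # simple convergence test
            return u_new
        u = u_new
    return u
```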

EM in General Case

Expected value of the complete data log-likelihood:
$$Q\left( u; u^{(s)} \right) = \int L_c(x; u) \, p\left( x \mid u^{(s)}, y \right) dx$$

We maximize with respect to $u$ to get:
$$u^{s+1} = \arg\max_u Q\left( u; u^{(s)} \right)$$

Image Segmentation

WHAT IS MISSING DATA? An $(n \times g)$ matrix $I$ of indicator variables.

Expectation step:
$$E(I_{lm}) = \overline{I}_{lm} = 1 \cdot P\left( l^{th} \text{ pixel comes from } m^{th} \text{ blob} \right) + 0 \cdot P\left( l^{th} \text{ pixel does not come from } m^{th} \text{ blob} \right) = P\left( l^{th} \text{ pixel comes from } m^{th} \text{ blob} \right)$$

We get:
$$\overline{I}_{lm} = \frac{\alpha_m^{(s)} \, p_m\left( x_l \mid \theta_m^{(s)} \right)}{\sum_{k=1}^{K} \alpha_k^{(s)} \, p_k\left( x_l \mid \theta_k^{(s)} \right)}$$
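
A sketch of this E-step for pixel vectors stacked in an (n, d) array, reusing gaussian_density from the earlier block (names are illustrative):

```python
import numpy as np

def e_step(X, alphas, mus, sigmas):
    """Responsibilities <I_lm> of blob m for pixel l, shape (n, g)."""
    n, g = X.shape[0], len(alphas)
    R = np.empty((n, g))
    for l in range(n):
        for m in range(g):
            R[l, m] = alphas[m] * gaussian_density(X[l], mus[m], sigmas[m])
    R /= R.sum(axis=1, keepdims=True)  # divide by the sum over all K segments
    return R
```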

Image Segmentation

COMPLETE DATA LOG-LIKELIHOOD:
$$L_c\left( [x, \overline{I}_{lm}]; \Theta^{(s)} \right) = \sum_{l \in \text{all pixels}} \left[ \sum_{m=1}^{g} \overline{I}_{lm} \log p\left( x_l \mid \theta_m \right) \right]$$

Maximization step:
$$\Theta^{(s+1)} = \arg\max_{\Theta} L_c\left( [x, \overline{I}_{lm}]; \Theta \right)$$

Image Segmentation

Maximization step:
$$\alpha_m^{(s+1)} = \frac{1}{n} \sum_{l=1}^{n} p\left( m \mid x_l, \Theta^{(s)} \right)$$

$$\mu_m^{(s+1)} = \frac{\sum_{l=1}^{n} x_l \, p\left( m \mid x_l, \Theta^{(s)} \right)}{\sum_{l=1}^{n} p\left( m \mid x_l, \Theta^{(s)} \right)}$$

$$\Sigma_m^{(s+1)} = \frac{\sum_{l=1}^{n} p\left( m \mid x_l, \Theta^{(s)} \right) \left( x_l - \mu_m^{(s)} \right) \left( x_l - \mu_m^{(s)} \right)^{\mathsf{T}}}{\sum_{l=1}^{n} p\left( m \mid x_l, \Theta^{(s)} \right)}$$
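
These three updates vectorize once the responsibilities $p(m \mid x_l, \Theta^{(s)})$ are stored as an (n, g) array R. A sketch; note it plugs the freshly updated mean into the covariance update, a common variant of the slide's $\mu_m^{(s)}$:

```python
import numpy as np

def m_step(X, R):
    """Update (alpha_m, mu_m, Sigma_m) from data X (n, d) and
    responsibilities R (n, g)."""
    n, d = X.shape
    Nm = R.sum(axis=0)               # sum_l p(m | x_l, Theta^(s))
    alphas = Nm / n                  # alpha_m^(s+1)
    mus = (R.T @ X) / Nm[:, None]    # mu_m^(s+1)
    sigmas = []
    for m in range(R.shape[1]):
        diff = X - mus[m]            # uses the updated mean (see note above)
        sigmas.append((R[:, m, None] * diff).T @ diff / Nm[m])
    return alphas, mus, np.array(sigmas)  # Sigma_m^(s+1)
```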

How EM works for Image Segmentation

E-step:
$$\overline{I}_{lm} = \frac{\alpha_m^{(s)} \, p_m\left( x_l \mid \theta_m^{(s)} \right)}{\sum_{k=1}^{K} \alpha_k^{(s)} \, p_k\left( x_l \mid \theta_k^{(s)} \right)}$$

For each pixel compute the values $\alpha_m^{(s)} \, p_m\left( x_l \mid \theta_m^{(s)} \right)$ for each segment $m$.
For each pixel compute the sum $\sum_{k=1}^{K} \alpha_k^{(s)} \, p_k\left( x_l \mid \theta_k^{(s)} \right)$, i.e. perform the summation over all $K$ segments.
Divide the former by the latter.

M-step:
Compute $\alpha_m^{(s+1)}$, $\mu_m^{(s+1)}$, $\Sigma_m^{(s+1)}$.

Line Fitting Expectation Maximization

WHAT IS MISSING DATA?
An $(n \times g)$ matrix $M$ of indicator variables, whose $(k, l)^{th}$ entry is
$$m_{kl} = \begin{cases} 1 & \text{if point } k \text{ is drawn from line } l \\ 0 & \text{otherwise} \end{cases}$$
$$\sum_{l} P\left( m_{kl} = 1 \mid \text{point } k,\ \text{line } l\text{'s parameters} \right) = 1$$

HOW TO FORMULATE LIKELIHOOD?
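
The likelihood term on this slide did not survive the source cleanly; a sketch of the resulting E-step under the usual assumption that a point's likelihood under line $l$ falls off as $\exp(-\text{dist}^2 / 2\sigma^2)$, with lines parameterized as $(a, b, c)$, $a^2 + b^2 = 1$, and sigma illustrative:

```python
import numpy as np

def line_responsibilities(points, lines, sigma=1.0):
    """E-step for EM line fitting: expected m_kl, shape (n, g)."""
    n, g = len(points), len(lines)
    M = np.empty((n, g))
    for k, (x, y) in enumerate(points):
        for l, (a, b, c) in enumerate(lines):
            dist = abs(a * x + b * y + c)        # point-to-line distance
            M[k, l] = np.exp(-dist**2 / (2 * sigma**2))
    M /= M.sum(axis=1, keepdims=True)  # enforces sum_l P(m_kl = 1 | ...) = 1
    return M
```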

Motion Segmentation EM

WHAT IS MISSING DATA? It is the motion field to which each pixel belongs. The indicator variable $V_{xy,l}$ is the $(xy, l)^{th}$ entry of $V$:
$$V_{xy,l} = \begin{cases} 1 & \text{if the } xy^{th} \text{ pixel belongs to the } l^{th} \text{ motion field} \\ 0 & \text{otherwise} \end{cases}$$

HOW TO FORMULATE LIKELIHOOD?
$$L(V, \Theta) = \sum_{xy,l} V_{xy,l} \left( -\frac{\left( I_1(x, y) - I_2\left( x + m_1(x, y; \theta_l),\ y + m_2(x, y; \theta_l) \right) \right)^2}{2\sigma^2} \right)$$
where $\Theta = \left( \theta_1, \theta_2, \ldots, \theta_g \right)$. The E-step then requires the posterior $P\left( V_{xy,l} = 1; I_1, I_2, \Theta \right)$.

Motion Segmentation EM

HOW TO FORMULATE LIKELIHOOD?
$$P\left( V_{xy,l} = 1; I_1, I_2, \Theta \right)$$

A common choice is the affine motion model:
$$\begin{pmatrix} m_1 \\ m_2 \end{pmatrix}(x, y; \theta_l) = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} a_{13} \\ a_{23} \end{pmatrix}$$
where $\theta_l = \left( a_{11}, a_{12}, \ldots, a_{23} \right)$.

Layered representation.
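
A small sketch of the affine model and the brightness-constancy residual that enters the likelihood (the parameter packing order and the nearest-pixel lookup are assumptions; real code would interpolate I_2):

```python
import numpy as np

def affine_flow(x, y, theta_l):
    """Affine motion model; theta_l packed as (a11, a12, a21, a22, a13, a23)."""
    a11, a12, a21, a22, a13, a23 = theta_l
    return a11 * x + a12 * y + a13, a21 * x + a22 * y + a23

def residual(I1, I2, x, y, theta_l):
    """I_1(x, y) - I_2(x + m1, y + m2); images indexed as I[row=y, col=x],
    with nearest-pixel lookup instead of interpolation."""
    m1, m2 = affine_flow(x, y, theta_l)
    return I1[y, x] - I2[int(round(y + m2)), int(round(x + m1))]
```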

Identifying Outliers EM

We construct an explicit model of the outliers:
$$(1 - \lambda) \, P(\text{measurements} \mid \text{model}) + \lambda \, P(\text{outliers})$$
Here $\lambda \in [0, 1]$ models the frequency with which the outliers occur, and $P(\text{outliers})$ is the probability model for the outliers.

WHAT IS MISSING DATA?
A variable that indicates which component generated each point.

Complete data likelihood:
$$\prod_{j} \left[ (1 - \lambda) \, P\left( \text{measurement}_j \mid \text{model} \right) + \lambda \, P\left( \text{measurement}_j \mid \text{outliers} \right) \right]$$
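
For this two-component model the E-step reduces to one scalar per point: the posterior probability that it is an outlier. A minimal sketch, with the two densities passed in as numbers (a uniform outlier density is a typical choice):

```python
def outlier_posterior(model_lik, outlier_lik, lam):
    """P(point j is an outlier) under
    (1 - lambda) P(x_j | model) + lambda P(x_j | outliers)."""
    num = lam * outlier_lik
    return num / ((1 - lam) * model_lik + num)
```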

Background Subtraction EM

For each pixel we get a series of observations over the successive frames.
The source of these observations is a mixture model with two components: the background and the noise (foreground).
The background can be modeled as a Gaussian.
The noise can come from some uniform source.
Any pixel which belongs to noise is not background.

Difficulties with Expectation Maximization

Local minima.
Proper initialization.
Extremely small expected weights.
Parameters converging to the boundaries of the parameter space.

Model Selection

Should we consider minimizing the negative of the log likelihood?
We should have a penalty term which increases as the number of components increases.

An Information Criterion (AIC):
$$-2 L\left( x; \Theta^* \right) + 2p$$
where $p$ is the number of free parameters.

Bayesian Information Criterion (BIC):
$$-L\left( D; \Theta^* \right) + \frac{p}{2} \log N$$
where $p$ is the number of free parameters and $N$ is the number of observations.
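
Both criteria are one-liners once the maximized log-likelihood is known; a sketch with log_lik, p, N as defined on the slide (smaller scores are preferred):

```python
import numpy as np

def aic(log_lik, p):
    """AIC = -2 L(x; Theta*) + 2p."""
    return -2.0 * log_lik + 2 * p

def bic(log_lik, p, N):
    """BIC = -L(D; Theta*) + (p/2) log N."""
    return -log_lik + 0.5 * p * np.log(N)
```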

Bayesian Information Criterion (BIC)

$$P(M \mid D) = \frac{P(D \mid M)}{P(D)} P(M) = \frac{\int P(D \mid M, \theta) \, P(\theta) \, d\theta}{P(D)} P(M)$$

Maximizing the posterior $P(M \mid D)$ yields the criterion (to be minimized):
$$-L\left( D; \Theta^* \right) + \frac{p}{2} \log N$$
where $p$ is the number of free parameters.

Minimum Description Length (MDL) criterion

It yields a selection criterion which is the same as BIC:
$$-L\left( D; \Theta^* \right) + \frac{p}{2} \log N$$
where $p$ is the number of free parameters.
,
23 / 36
,
24 / 36
,
25 / 36
,
26 / 36
,
27 / 36
,
28 / 36
,
29 / 36
,
30 / 36
,
31 / 36
,
32 / 36
,
33 / 36
,
34 / 36
,
35 / 36
,
36 / 36