
Hidden Markov Models

Machine Learning 

CSx824/ECEx242
Bert Huang
Virginia Tech
Outline

• Hidden Markov models (HMMs)

• Forward-backward for HMMs

• Baum-Welch learning (expectation maximization)


Hidden State Transitions

[Figure: a submarine moves through a sequence of unobserved locations over time; the question marks indicate the hidden state at each step.]
Hidden Markov Models
p(y_t | x_t)   observation probability (SONAR noisiness)

p(x_t | x_{t-1})   transition probability (submarine locomotion)

p(X, Y) = p(x_1) ∏_{t=1}^{T-1} p(x_{t+1} | x_t) ∏_{t'=1}^{T} p(y_{t'} | x_{t'})

[Figure: chain-structured graphical model with hidden states x_1, ..., x_5 and observations y_1, ..., y_5.]
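For concreteness, here is a minimal Python sketch (not from the slides) of this factorization for a small discrete HMM; the sizes K and M and the parameters pi, A, B are hypothetical placeholders standing in for p(x_1), p(x_{t+1} | x_t), and p(y_t | x_t).

```python
# A minimal sketch: joint probability of a small discrete HMM.
import numpy as np

K, M = 3, 4                                    # hypothetical numbers of states / symbols
rng = np.random.default_rng(0)
pi = np.full(K, 1.0 / K)                       # p(x_1): uniform initial distribution
A = rng.dirichlet(np.ones(K), size=K)          # A[i, j] = p(x_{t+1} = j | x_t = i)
B = rng.dirichlet(np.ones(M), size=K)          # B[i, k] = p(y_t = k | x_t = i)

def joint_prob(x, y):
    """p(X, Y) = p(x_1) * prod_t p(x_{t+1} | x_t) * prod_t p(y_t | x_t)."""
    p = pi[x[0]]
    for t in range(len(x) - 1):
        p *= A[x[t], x[t + 1]]                 # transition factor
    for t in range(len(y)):
        p *= B[x[t], y[t]]                     # observation factor
    return p

x = [0, 1, 1, 2, 0]                            # a hidden state sequence x_1, ..., x_5
y = [3, 0, 2, 2, 1]                            # an observation sequence y_1, ..., y_5
print(joint_prob(x, y))
```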
Hidden State Inference
Inference targets: p(X | Y) and p(x_t | Y)

α_t(x_t) = p(x_t, y_1, ..., y_t)        β_t(x_t) = p(y_{t+1}, ..., y_T | x_t)

α_t(x_t) β_t(x_t) = p(x_t, y_1, ..., y_t) p(y_{t+1}, ..., y_T | x_t) = p(x_t, Y) ∝ p(x_t | Y)

normalize to get the conditional probability

note: not the same as p(x_1, ..., x_T, Y)


Forward Inference
α_t(x_t) = p(x_t, y_1, ..., y_t)

p(x_1, y_1) = p(x_1) p(y_1 | x_1) = α_1(x_1)

p(x_2, y_1, y_2) = Σ_{x_1} p(x_1, y_1) p(x_2 | x_1) p(y_2 | x_2) = α_2(x_2) = Σ_{x_1} α_1(x_1) p(x_2 | x_1) p(y_2 | x_2)

p(x_{t+1}, y_1, ..., y_{t+1}) = α_{t+1}(x_{t+1}) = Σ_{x_t} α_t(x_t) p(x_{t+1} | x_t) p(y_{t+1} | x_{t+1})
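A minimal Python sketch of this forward recursion, assuming the hypothetical pi, A, B arrays from the earlier sketch:

```python
# Forward pass sketch, reusing the hypothetical pi, A, B arrays from above.
def forward(y):
    """alpha[t, i] = p(x_t = i, y_1, ..., y_t), stored as a T x K matrix."""
    T = len(y)
    alpha = np.zeros((T, K))
    alpha[0] = pi * B[:, y[0]]                          # alpha_1(x_1) = p(x_1) p(y_1 | x_1)
    for t in range(T - 1):
        # alpha_{t+1}(x_{t+1}) = sum_{x_t} alpha_t(x_t) p(x_{t+1} | x_t) p(y_{t+1} | x_{t+1})
        alpha[t + 1] = (alpha[t] @ A) * B[:, y[t + 1]]
    return alpha
```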
Backward Inference
β_t(x_t) = p(y_{t+1}, ..., y_T | x_t)

p({} | x_T) = 1 = β_T(x_T)

β_{t-1}(x_{t-1}) = p(y_t, ..., y_T | x_{t-1}) = Σ_{x_t} p(x_t | x_{t-1}) p(y_t, y_{t+1}, ..., y_T | x_t)
                 = Σ_{x_t} p(x_t | x_{t-1}) p(y_t | x_t) p(y_{t+1}, ..., y_T | x_t)
                 = Σ_{x_t} p(x_t | x_{t-1}) p(y_t | x_t) β_t(x_t)
Backward Inference
β_t(x_t) = p(y_{t+1}, ..., y_T | x_t)

p({} | x_T) = 1 = β_T(x_T)

β_{t-1}(x_{t-1}) = p(y_t, ..., y_T | x_{t-1}) = Σ_{x_t} p(x_t | x_{t-1}) p(y_t | x_t) β_t(x_t)
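The corresponding backward recursion, sketched under the same assumptions as the forward pass above:

```python
# Backward pass sketch, under the same assumptions as the forward pass above.
def backward(y):
    """beta[t, i] = p(y_{t+1}, ..., y_T | x_t = i), stored as a T x K matrix."""
    T = len(y)
    beta = np.ones((T, K))                              # beta_T(x_T) = 1
    for t in range(T - 2, -1, -1):
        # beta_t(x_t) = sum_{x_{t+1}} p(x_{t+1} | x_t) p(y_{t+1} | x_{t+1}) beta_{t+1}(x_{t+1})
        beta[t] = A @ (B[:, y[t + 1]] * beta[t + 1])
    return beta
```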
Fusing the Messages
α_t(x_t) = p(x_t, y_1, ..., y_t)        β_t(x_t) = p(y_{t+1}, ..., y_T | x_t)

α_t(x_t) β_t(x_t) = p(x_t, y_1, ..., y_t) p(y_{t+1}, ..., y_T | x_t) = p(x_t, Y) ∝ p(x_t | Y)

p(x_t, x_{t+1} | Y) = p(x_t, x_{t+1}, y_1, ..., y_t, y_{t+1}, y_{t+2}, ..., y_T) / p(Y)
                    = p(x_t, y_1, ..., y_t) p(x_{t+1} | x_t) p(y_{t+2}, ..., y_T | x_{t+1}) p(y_{t+1} | x_{t+1}) / Σ_{x_T} p(x_T, Y)
                    = α_t(x_t) p(x_{t+1} | x_t) β_{t+1}(x_{t+1}) p(y_{t+1} | x_{t+1}) / Σ_{x_T} α_T(x_T)
Forward-Backward Inference
α_1(x_1) = p(x_1) p(y_1 | x_1)        α_{t+1}(x_{t+1}) = Σ_{x_t} α_t(x_t) p(x_{t+1} | x_t) p(y_{t+1} | x_{t+1})

β_T(x_T) = 1        β_{t-1}(x_{t-1}) = Σ_{x_t} p(x_t | x_{t-1}) p(y_t | x_t) β_t(x_t)

p(x_t, Y) = α_t(x_t) β_t(x_t)        p(x_t | Y) = α_t(x_t) β_t(x_t) / Σ_{x_t'} α_t(x_t') β_t(x_t')

p(x_t, x_{t+1} | Y) = α_t(x_t) p(x_{t+1} | x_t) β_{t+1}(x_{t+1}) p(y_{t+1} | x_{t+1}) / Σ_{x_T} α_T(x_T)
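A sketch of how these messages might be fused into the single-state and pairwise posteriors in code, reusing the forward/backward sketches above (gamma and xi are just convenient local names, not slide notation):

```python
# Sketch of fusing the messages into posteriors, reusing forward/backward above.
def posteriors(y):
    alpha, beta = forward(y), backward(y)
    gamma = alpha * beta                                 # p(x_t, Y) = alpha_t(x_t) beta_t(x_t)
    gamma /= gamma.sum(axis=1, keepdims=True)            # normalize over x_t to get p(x_t | Y)
    T = len(y)
    xi = np.zeros((T - 1, K, K))
    for t in range(T - 1):
        # alpha_t(x_t) p(x_{t+1} | x_t) p(y_{t+1} | x_{t+1}) beta_{t+1}(x_{t+1})
        xi[t] = alpha[t][:, None] * A * B[:, y[t + 1]][None, :] * beta[t + 1][None, :]
        xi[t] /= xi[t].sum()                             # normalize to get p(x_t, x_{t+1} | Y)
    return gamma, xi
```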
Normalization
To avoid underflow, re-normalize at each time step

α̃_t(x_t) = α_t(x_t) / Σ_{x_t'} α_t(x_t')        β̃_t(x_t) = β_t(x_t) / Σ_{x_t'} β_t(x_t')

Exercise: why is this okay?
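One possible way to fold the per-step re-normalization into the forward pass (a sketch only; the backward messages can be rescaled analogously):

```python
# Forward pass with per-step re-normalization to avoid underflow.
def forward_scaled(y):
    """alpha_tilde[t] is alpha[t] renormalized to sum to one at each step."""
    T = len(y)
    alpha_tilde = np.zeros((T, K))
    a = pi * B[:, y[0]]
    alpha_tilde[0] = a / a.sum()
    for t in range(T - 1):
        a = (alpha_tilde[t] @ A) * B[:, y[t + 1]]        # one unnormalized forward step
        alpha_tilde[t + 1] = a / a.sum()                 # renormalize at each time step
    return alpha_tilde
```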


Learning
• Parameterize and learn


p(x_{t+1} | x_t): conditional probability table (transition matrix)

p(y_t | x_t): observation model (emission model)
• If fully observed, super easy!

• If x is hidden (most cases), treat it as a latent variable

• E.g., expectation maximization


Baum-Welch Algorithm

EM using forward-backward inference as E-step


Baum-Welch Details
Compute p(x_t | Y) and p(x_t, x_{t+1} | Y) using forward-backward

Maximize weighted (expected) log-likelihood

Initial state:   p(x_1) ← (1/T) Σ_{t'=1}^{T} p(x_{t'} | Y)   or   p(x_1 | Y)

Transitions:   p(x_{t'+1} = i | x_{t'} = j) ← Σ_{t=1}^{T-1} p(x_{t+1} = i, x_t = j | Y) / Σ_{t=1}^{T-1} p(x_t = j | Y)

Emissions (e.g., Gaussian):   μ_x ← Σ_{t=1}^{T} p(x_t = x | Y) y_t / Σ_{t'=1}^{T} p(x_{t'} = x | Y)

Emissions (e.g., multinomial):   p(y | x) ← Σ_{t=1}^{T} p(x_t = x | Y) I(y_t = y) / Σ_{t'=1}^{T} p(x_{t'} = x | Y)
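Putting the pieces together, a rough sketch of one Baum-Welch iteration for the multinomial-emission case, assuming a single observed sequence and reusing the hypothetical parameters and the posteriors sketch above:

```python
# Rough sketch of one Baum-Welch (EM) iteration for a multinomial emission model.
def baum_welch_step(y):
    global pi, A, B
    gamma, xi = posteriors(y)                            # E-step: p(x_t | Y), p(x_t, x_{t+1} | Y)
    y = np.asarray(y)
    pi = gamma[0]                                        # p(x_1) <- p(x_1 | Y)
    A = xi.sum(axis=0)                                   # expected transition counts
    A /= A.sum(axis=1, keepdims=True)                    # normalize rows: p(x_{t+1} = j | x_t = i)
    B_new = np.zeros_like(B)
    for k in range(M):
        B_new[:, k] = gamma[y == k].sum(axis=0)          # sum_t p(x_t = x | Y) I(y_t = k)
    B = B_new / gamma.sum(axis=0)[:, None]               # normalize: p(y = k | x)
```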
Summary
• HMMs represent hidden states

• Transitions between adjacent states

• Observations based on states

• Forward-backward inference to incorporate all evidence

• Expectation maximization to train parameters (Baum-Welch)

• Treat states as latent variables
