LDA Beginner's Tutorial

©2013 LinkedIn Corporation. All Rights Reserved.
Latent Dirichlet Allocation (LDA)
- for ML-IR Discussion Group
Prepared by Wayne Tai Lee, Satpreet Singh

Latent Dirichlet Allocation:
A Bayesian Unsupervised Learning Model
Roadmap
• Unsupervised learning
• Bayesian Statistics
• Mixture Models
• LDA – theory and intuition
• LDA – practice and applications

Unsupervised Learning
Learning patterns with no labels
• Clustering is a form of “Unsupervised learning”
• Classification is known as supervised learning
• Validation is difficult

How would you cluster?

Documents from Wikipedia
Now try these ones!

Bayesian Statistics
A framework to update your beliefs
• Probabilities as beliefs
• Updates your belief as data is observed
• Requires a model that describes the data generation
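
A minimal sketch of "updating a belief as data is observed", assuming a Beta prior on a success probability and pass/fail observations (the conjugate Beta-Binomial model). The prior and the counts below are hypothetical and not taken from the slides.

```python
from fractions import Fraction


def update_beta(alpha, beta, successes, failures):
    """Conjugate update: Beta(alpha, beta) prior plus pass/fail data gives a Beta posterior."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return Fraction(alpha, alpha + beta)


# Hypothetical prior belief: Beta(1, 1), i.e. every success probability is equally plausible.
prior = (1, 1)
# Hypothetical data: 3 positive signals and 1 negative signal.
posterior = update_beta(*prior, successes=3, failures=1)

print("prior mean:", beta_mean(*prior))
print("posterior mean:", beta_mean(*posterior))
```

The model for data generation (here, independent pass/fail signals) is what makes the update rule well defined.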

Example: Evaluating Candidates
[Figure: candidate potential, informed by Schooling, Experience, Interview, Internship]

How to update?!

[Figure: model for candidates; model for data generation]

Mixture Models
A popular statistical model
• An easy way to build hierarchical relationships

Mixture models visualized
Candidate Quality: High or Low

Marginal Distribution of Candidate Performance: ignore quality

Distribution of Candidate Performance:

Mixture Weights
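
A small sketch of a two-component mixture, assuming normally distributed performance within each quality group. The weights, means, and standard deviations are hypothetical numbers chosen for illustration. Ignoring quality gives the marginal distribution: a sum of the component densities weighted by the mixture weights.

```python
import math
import random

# Hypothetical mixture weights and per-group (mean, standard deviation).
WEIGHTS = {"high": 0.3, "low": 0.7}
PARAMS = {"high": (80.0, 5.0), "low": (60.0, 10.0)}


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def marginal_pdf(x):
    """Density of performance when quality is ignored."""
    return sum(WEIGHTS[q] * normal_pdf(x, *PARAMS[q]) for q in WEIGHTS)


def sample_candidate(rng):
    """Draw quality from the mixture weights, then performance given quality."""
    quality = rng.choices(list(WEIGHTS), weights=list(WEIGHTS.values()))[0]
    mean, sd = PARAMS[quality]
    return quality, rng.gauss(mean, sd)


rng = random.Random(0)
print(sample_candidate(rng))
print(marginal_pdf(70.0))
```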


How are words in a document generated?

One possibility:
Each word comes from different topics (bag of words: ignore order)

How are words in a document generated?
Each word comes from different topics:
P(w) = Σ_k θ_k · Multinomial(w | β_k)
- θ_k: mixture weight for topic k
- Multinomial(w | β_k): multinomial distribution over ALL words, based on topic k

Just a mixture model
[Figure: a word drawn from one of K topics (Topic 1, …, Topic K), e.g. Leadership, Big Data, Machine Learning]

1) Pick a topic
2) Pick a word
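
The two steps above can be sketched as follows, assuming a single document with fixed topic weights. The topic names come from the slide; the weights, vocabularies, and word probabilities are hypothetical.

```python
import random

# Hypothetical mixture weights over topics for one document.
TOPIC_WEIGHTS = {"Leadership": 0.2, "Big Data": 0.5, "Machine Learning": 0.3}
# Hypothetical word distribution for each topic.
WORD_PROBS = {
    "Leadership": {"team": 0.5, "strategy": 0.3, "data": 0.2},
    "Big Data": {"data": 0.6, "hadoop": 0.3, "team": 0.1},
    "Machine Learning": {"model": 0.5, "data": 0.3, "training": 0.2},
}


def draw(dist, rng):
    """Draw one key from a dict that maps outcomes to probabilities."""
    keys = list(dist)
    return rng.choices(keys, weights=[dist[k] for k in keys])[0]


def generate_document(n_words, rng):
    """Generate (topic, word) pairs; word order carries no meaning (bag of words)."""
    doc = []
    for _ in range(n_words):
        z = draw(TOPIC_WEIGHTS, rng)   # 1) pick a topic
        w = draw(WORD_PROBS[z], rng)   # 2) pick a word from that topic
        doc.append((z, w))
    return doc


def word_probability(word):
    """Marginal probability of a word: sum over topics of weight times P(word | topic)."""
    return sum(TOPIC_WEIGHTS[k] * WORD_PROBS[k].get(word, 0.0) for k in TOPIC_WEIGHTS)


print(generate_document(5, random.Random(0)))
print(word_probability("data"))
```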

The chosen topic: Z

So we really want to know
1) Z (cluster for the word)
2) θ (document composition)
3) β (key words)

Review!
Z → W

[Figure: plate diagram, Z_{d,n} → W_{d,n}, with plates over k = 1, …, K; n = 1, …, N_d; d = 1, …, D]
K: number of topics
N_d: number of words in document d
D: number of documents

Bayesian: But what about the distributions for the topic weights θ and the topic word distributions β??

Dirichlet priors on both sets of multinomial weights: their parameters (commonly written α and η) control the “sparsity” of the weights for the multinomial.
Implications: a priori we assume
- Topics have few key words
- Documents only have a small subset of topics

Dirichlet Distribution with Different Sparsity Parameters
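
A rough sketch of the sparsity effect, assuming a symmetric Dirichlet sampled by normalizing Gamma draws (a standard construction). The concentration values 0.1 and 10 and the dimension 10 are hypothetical choices.

```python
import random


def sample_dirichlet(alpha, k, rng):
    """One draw from a symmetric Dirichlet(alpha) over k categories."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def mean_largest_weight(alpha, k, n_draws, rng):
    """Average size of the largest weight; values near 1 indicate sparse draws."""
    return sum(max(sample_dirichlet(alpha, k, rng)) for _ in range(n_draws)) / n_draws


rng = random.Random(0)
for alpha in (0.1, 10.0):
    weights = sample_dirichlet(alpha, 10, rng)
    print(alpha, [round(w, 2) for w in weights])

print("small parameter:", mean_largest_weight(0.1, 10, 200, rng))
print("large parameter:", mean_largest_weight(10.0, 10, 200, rng))
```

A small parameter concentrates most of the mass on a few categories, which matches the prior assumptions above; a large parameter spreads the mass almost evenly.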

Latent Dirichlet Allocation!!!
[Figure: the full LDA plate diagram, Z_{d,n} → W_{d,n}, with k = 1, …, K and n = 1, …, N_d]
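
For reference, the generative model in its commonly written form. The symbols θ, β, α, and η follow common LDA notation and are an assumption here, since the slide's own symbols did not survive; Z, W, K, N_d, and D are as defined in the slides.

```latex
\begin{align*}
\beta_k &\sim \mathrm{Dirichlet}(\eta), & k &= 1,\dots,K \\
\theta_d &\sim \mathrm{Dirichlet}(\alpha), & d &= 1,\dots,D \\
Z_{d,n} \mid \theta_d &\sim \mathrm{Multinomial}(1, \theta_d), & n &= 1,\dots,N_d \\
W_{d,n} \mid Z_{d,n}, \beta &\sim \mathrm{Multinomial}(1, \beta_{Z_{d,n}})
\end{align*}
```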

How do we fit this model?
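Want the posterior of all unknowns given the observed words: p(Z, θ, β | W), where θ are the document topic weights and β the topic word distributions
Worst part of Bayesian analysis… personally speaking.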

Two main ways to get the posterior:
- Sampling methods
  - Asymptotically correct
  - Time consuming
  - Lots of black magic in sampling tricks
- Variational methods (practical solution!)
  - An approximation with no guarantees
  - Faster
  - Need math skills
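
As one illustration of the sampling route, here is a sketch of collapsed Gibbs sampling for LDA, a widely used sampler that the slides do not name. The parameter names `alpha` and `eta`, their default values, and the toy corpus are assumptions for illustration.

```python
import random


def gibbs_lda(docs, n_topics, vocab_size, alpha=0.1, eta=0.01, n_iters=50, seed=0):
    """Collapsed Gibbs sampling for LDA; docs are lists of integer word ids."""
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]                # tokens per (document, topic)
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # tokens per (topic, word)
    topic_total = [0] * n_topics                              # tokens per topic
    assignments = []

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(zs)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                # Remove this token's current assignment from the counts.
                z = assignments[d][n]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Unnormalized probability of each topic given all other assignments.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + eta)
                    / (topic_total[k] + vocab_size * eta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][n] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return assignments, doc_topic, topic_word


# Hypothetical toy corpus: word ids 0-1 and 2-3 tend to co-occur.
toy_docs = [[0, 1, 0, 1, 0], [2, 3, 2, 3, 3], [0, 1, 1, 0, 2]]
assignments, doc_topic, topic_word = gibbs_lda(toy_docs, n_topics=2, vocab_size=4)
print(doc_topic)
print(topic_word)
```

Every token is revisited on every sweep, which is one reason sampling is time consuming on large corpora.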

Variational Bayes (specifically mean-field variational Bayes):
What’s crazy?
- Assumes all the latent variables are independent in the approximate posterior
What’s not crazy?
- Finds the “best” model within this crazy class
  - “Best” means closest to the true posterior under KL divergence
Empirically, it has shown promising results!
For “sufficient” details:
“Explaining Variational Approximations” by Ormerod and Wand
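
A toy sketch of "best within the independent class under KL divergence". It assumes a hypothetical posterior over two correlated binary latent variables and uses a brute-force grid search in place of the usual coordinate updates. The divergence minimized is KL(q || p), the direction used by mean-field variational Bayes.

```python
import math
from itertools import product

# Hypothetical "true" posterior over two correlated binary latent variables (a, b).
P = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}


def factorized(qa, qb):
    """Independent approximation q(a, b) = q(a) * q(b), with qa = q(a=1), qb = q(b=1)."""
    return {
        (a, b): (qa if a else 1 - qa) * (qb if b else 1 - qb)
        for a, b in product((0, 1), repeat=2)
    }


def kl(q, p):
    """KL(q || p) for two distributions over the same finite set of states."""
    return sum(q[s] * math.log(q[s] / p[s]) for s in q if q[s] > 0)


# Search a grid of independent approximations for the smallest divergence.
grid = [i / 100 for i in range(1, 100)]
best = min((kl(factorized(qa, qb), P), qa, qb) for qa in grid for qb in grid)
print("KL, q(a=1), q(b=1):", best)
```

The best independent approximation still has a strictly positive divergence here, because no product of marginals can reproduce the correlation in P.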

LDA Take Home
- An intuitively appealing Bayesian unsupervised learning model
- Training is difficult
  - Lots of packages exist; the main issue is scalability
- Validation is difficult
  - Usually cast into a supervised learning framework
- Presentation is difficult
  - Visualization for the Bayesian model is hard
