©2013 LinkedIn Corporation. All Rights Reserved.
Latent Dirichlet Allocation (LDA)
- for ML-IR Discussion Group
Prepared by Wayne Tai Lee, Satpreet Singh
Latent Dirichlet Allocation:
A Bayesian Unsupervised Learning Model
Roadmap
• Unsupervised learning
• Bayesian Statistics
• Mixture Models
• LDA – theory and intuition
• LDA – practice and applications
Unsupervised Learning
Learning patterns with no labels
• Clustering is a form of “Unsupervised learning”
• Classification is known as supervised learning
• Validation is difficult
How would you cluster?
Now try these ones: documents from Wikipedia
Bayesian Statistics
A framework to update your beliefs
• Probabilities as beliefs
• Updates your belief as data is observed
• Requires a model that describes the data generation
Candidate potential
Example: Evaluating Candidates
Schooling
Experience
Interview
Internship
How to update?!
Model for candidates (the prior) and model for data generation (the likelihood)
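The update these slides build toward is Bayes' rule: the posterior belief is proportional to the prior belief times the likelihood of the observed data. As a minimal sketch, assume (hypothetically, none of this is in the slides) that candidate potential is a single success probability with a Beta prior, and that schooling, experience, interview and internship are each recorded as a pass/fail signal.

```python
from fractions import Fraction


def update_beta(alpha, beta, outcomes):
    """Return Beta posterior parameters after observing pass/fail outcomes."""
    outcomes = list(outcomes)
    successes = sum(1 for passed in outcomes if passed)
    failures = len(outcomes) - successes
    # Conjugacy: a Beta prior with Bernoulli data gives a Beta posterior.
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return Fraction(alpha, alpha + beta)


# Hypothetical evidence for one candidate (illustrative values only).
evidence = {
    "schooling": True,
    "experience": True,
    "interview": False,
    "internship": True,
}

prior = (1, 1)  # Beta(1, 1): a flat prior belief about candidate potential
posterior = update_beta(*prior, evidence.values())

print("prior mean:", beta_mean(*prior))
print("posterior:", posterior, "mean:", beta_mean(*posterior))
```

With three passes and one fail, the flat prior Beta(1, 1) becomes Beta(4, 2), so the expected potential moves from 1/2 to 2/3. The point is only that each observation shifts the belief; a real evaluation model would need a richer likelihood.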
Mixture Models
A popular statistical model
• An easy way to build hierarchical relationships
Mixture models visualized
Candidate Quality
High
Low
Marginal Distribution of Candidate Performance: ignore quality
Distribution of Candidate Performance:
Mixture Weights
? ? ? ?
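A two-component mixture of this kind can be sketched in a few lines. The sketch assumes a density of the form weight_high × p(x | high) + weight_low × p(x | low); the Gaussian components, the weights 0.3 and 0.7, and every name below are hypothetical choices for illustration, not values from the slides.

```python
import math
import random

# Hypothetical mixture weights and (mean, std) per quality group.
WEIGHTS = {"high": 0.3, "low": 0.7}
COMPONENTS = {"high": (80.0, 5.0), "low": (60.0, 10.0)}


def normal_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def marginal_density(x):
    """Performance density once candidate quality is ignored (summed out)."""
    return sum(WEIGHTS[q] * normal_pdf(x, *COMPONENTS[q]) for q in WEIGHTS)


def sample_performance(rng):
    """Two-stage draw: pick a quality group first, then a performance score."""
    groups = list(WEIGHTS)
    quality = rng.choices(groups, weights=[WEIGHTS[g] for g in groups])[0]
    mean, std = COMPONENTS[quality]
    return quality, rng.gauss(mean, std)


rng = random.Random(0)
draws = [sample_performance(rng) for _ in range(10_000)]
share_high = sum(1 for quality, _ in draws if quality == "high") / len(draws)
print("share of high-quality draws:", round(share_high, 3))
```

In practice only the scores are observed, so the weights and both component distributions have to be estimated, which is what the question marks on the slide point at.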
How are words in a document generated?
One possibility:
Each word comes from different topics (bag of words: ignore order)
Mixture Weight for Topic k
Multinomial Distribution over ALL words based on topic k
Just a mixture model
Word
Topic 1
Topic K
Leadership
Big Data
Machine Learning
1) Pick a topic
2) Pick a word
The chosen Topic: Z
So we really want to know
1) Z (cluster for the word)
2) The mixture weights over topics (document composition)
3) The word distribution of each topic (key words)
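The two-stage process (pick a topic, then pick a word from that topic) can be sketched as below. Treating Leadership, Big Data and Machine Learning as the topic names is an assumption, and the four-word vocabulary and all probabilities are invented for illustration.

```python
import random

# Hypothetical vocabulary shared by every topic.
VOCAB = ["team", "strategy", "data", "model"]

# Hypothetical mixture weights: how much of the document each topic makes up.
TOPIC_WEIGHTS = {"leadership": 0.5, "big data": 0.3, "machine learning": 0.2}

# Hypothetical word distribution per topic, each over the whole vocabulary.
WORD_PROBS = {
    "leadership": [0.50, 0.40, 0.05, 0.05],
    "big data": [0.05, 0.05, 0.70, 0.20],
    "machine learning": [0.05, 0.05, 0.30, 0.60],
}


def generate_word(rng):
    """Stage 1: pick a topic Z. Stage 2: pick a word W from topic Z."""
    topics = list(TOPIC_WEIGHTS)
    z = rng.choices(topics, weights=[TOPIC_WEIGHTS[t] for t in topics])[0]
    w = rng.choices(VOCAB, weights=WORD_PROBS[z])[0]
    return z, w


def word_probability(word):
    """Probability of a word once the hidden topic is summed out."""
    index = VOCAB.index(word)
    return sum(TOPIC_WEIGHTS[t] * WORD_PROBS[t][index] for t in TOPIC_WEIGHTS)


rng = random.Random(1)
document = [generate_word(rng) for _ in range(8)]
print(document)
print("P(data) =", word_probability("data"))
```

Here the generator sees Z, but a reader of the document sees only the words. Recovering Z, the topic weights and the per-topic word distributions from words alone is the inference problem the slide lists.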
Review!
Z → W
Z_{d,n} → W_{d,n}
k = 1, …, K
n = 1, …, N_d
d = 1, …, D
K: number of topics
N_d: number of words in document d
D: number of documents
Bayesian: But what about the distributions for the document compositions and the topic key words?
The Dirichlet prior parameters control the “sparsity” of the weights for the multinomial.
Implications: a priori we assume
- Topics have few key words
- Documents only have a small subset of topics
Dirichlet Distribution with Different Sparsity Parameters
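The effect of the sparsity parameter can be checked numerically. The sketch below assumes a symmetric Dirichlet (one shared concentration value for every component) and draws from it by normalizing Gamma variates, which is a standard construction; the dimension, the concentration values and the "average largest weight" summary are illustrative choices, not taken from the slides.

```python
import random


def sample_dirichlet(concentration, dim, rng):
    """Draw one weight vector from a symmetric Dirichlet via Gamma variates."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(dim)]
    total = sum(draws)
    return [g / total for g in draws]


def average_max_weight(concentration, dim=10, draws=2000, seed=0):
    """Average size of the largest weight; close to 1 means very sparse."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        total += max(sample_dirichlet(concentration, dim, rng))
    return total / draws


for concentration in (0.1, 1.0, 10.0):
    print(concentration, round(average_max_weight(concentration), 3))
```

Small concentration values put most of the mass on a few components (few key words per topic, few topics per document), while large values spread the weights almost evenly.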
Latent Dirichlet Allocation!!!
Z_{d,n} → W_{d,n}
k = 1, …, K
n = 1, …, N_d
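Putting the pieces together, the generative story of LDA can be sketched as follows: each topic draws a word distribution from a Dirichlet, each document draws its topic weights from a Dirichlet, and every word then follows the two-stage pick. This is a simplified sketch: a fixed document length stands in for N_d, words are integer ids, and all parameter names and values are hypothetical.

```python
import random


def sample_dirichlet(concentration, dim, rng):
    """Draw one weight vector from a symmetric Dirichlet via Gamma variates."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(dim)]
    total = sum(draws)
    return [g / total for g in draws]


def sample_index(probs, rng):
    """Draw an index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs)[0]


def generate_corpus(num_docs, doc_length, num_topics, vocab_size,
                    doc_concentration, topic_concentration, rng):
    """Generate (topic, word) pairs for every position of every document."""
    # One word distribution per topic (the "key words" of each topic).
    topics = [sample_dirichlet(topic_concentration, vocab_size, rng)
              for _ in range(num_topics)]
    corpus = []
    for _ in range(num_docs):
        # Topic weights for this document (the "document composition").
        doc_weights = sample_dirichlet(doc_concentration, num_topics, rng)
        doc = []
        for _ in range(doc_length):
            z = sample_index(doc_weights, rng)  # Z_{d,n}: chosen topic
            w = sample_index(topics[z], rng)    # W_{d,n}: word from topic z
            doc.append((z, w))
        corpus.append(doc)
    return topics, corpus


topics, corpus = generate_corpus(
    num_docs=5, doc_length=20, num_topics=3, vocab_size=12,
    doc_concentration=0.3, topic_concentration=0.1, rng=random.Random(0),
)
print(corpus[0])
```

Fitting LDA runs this story in reverse: only the word ids are observed, and the topics, the document weights and every Z_{d,n} must be inferred.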
How do we fit this model?
Want the posterior: the distribution of everything unobserved (topic assignments, document compositions, topic key words) given the observed words.
The worst part of Bayesian analysis… personally speaking.
Two main ways to get the posterior:
- Sampling methods
  - Asymptotically correct
  - Time consuming
  - Lots of black magic in sampling tricks
- Variational methods (practical solution!)
  - An approximation with no guarantees
  - Faster
  - Need math skills
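As one concrete instance of the sampling route, here is a minimal collapsed Gibbs sampler for LDA. The slides do not say which sampler they have in mind; collapsed Gibbs is swapped in here because it is short. It assumes symmetric Dirichlet priors and integer word ids, and the toy corpus and all parameter values are hypothetical.

```python
import random


def gibbs_lda(docs, num_topics, vocab_size, doc_conc, topic_conc, sweeps, rng):
    """Collapsed Gibbs sampling for LDA on documents given as word-id lists."""
    doc_topic = [[0] * num_topics for _ in docs]                  # per-document topic counts
    topic_word = [[0] * vocab_size for _ in range(num_topics)]    # per-topic word counts
    topic_total = [0] * num_topics                                # words assigned to each topic
    assignments = []

    # Random initial topic for every word position.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(zs)

    for _ in range(sweeps):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                # Remove the current assignment from the counts.
                z = assignments[d][n]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][k] + doc_conc)
                    * (topic_word[k][w] + topic_conc)
                    / (topic_total[k] + vocab_size * topic_conc)
                    for k in range(num_topics)
                ]
                # Resample the topic and restore the counts.
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][n] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return assignments, doc_topic, topic_word


# Toy corpus: word ids 0-1 and 3-4 tend to appear in different documents.
docs = [[0, 1, 0, 1, 2], [3, 4, 3, 4, 2], [0, 1, 1, 0], [4, 3, 3, 4]]
assignments, doc_topic, topic_word = gibbs_lda(
    docs, num_topics=2, vocab_size=5, doc_conc=0.1, topic_conc=0.1,
    sweeps=50, rng=random.Random(0),
)
print(doc_topic)
```

Every sweep touches every word once, which is why sampling is slow on large corpora. The sketch also skips burn-in, averaging over samples and convergence checks, where much of the "black magic" lives.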
Variational Bayes (specifically mean field variational Bayes):
What’s crazy?
- Assumes all the latent variables are independent in the posterior
What’s not crazy?
- Finds the “best” approximation to the posterior within this crazy class
  - “Best” meaning closest under KL divergence
Empirically, it has shown promising results!
For “sufficient” details:
“Explaining Variational Approximations” by Ormerod and Wand
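In symbols, the mean field idea can be written as below. The notation is an assumption made for this sketch: h stands for all latent variables, w for the observed words, and Q for the class of fully factorized distributions.

```latex
\begin{aligned}
q^{*} &= \arg\min_{q \in \mathcal{Q}} \; \mathrm{KL}\big(q(h) \,\|\, p(h \mid w)\big),
\qquad
\mathcal{Q} = \Big\{ q : q(h) = \prod_{i} q_i(h_i) \Big\}, \\
\mathrm{KL}\big(q(h) \,\|\, p(h \mid w)\big)
  &= \mathbb{E}_{q}\big[\log q(h)\big] - \mathbb{E}_{q}\big[\log p(h \mid w)\big], \\
\log p(w)
  &= \underbrace{\mathbb{E}_{q}\big[\log p(h, w)\big] - \mathbb{E}_{q}\big[\log q(h)\big]}_{\text{lower bound on } \log p(w)}
   + \mathrm{KL}\big(q(h) \,\|\, p(h \mid w)\big).
\end{aligned}
```

Since log p(w) does not depend on q, minimizing the KL term is the same as maximizing the lower bound, and the lower bound only needs the joint p(h, w), which is why the approach is tractable even though the posterior itself is not.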
LDA Take Home
- An intuitively appealing Bayesian unsupervised learning model
- Training is difficult
  - Lots of packages exist; the main issue is scalability
- Validation is difficult
  - Usually cast into a supervised learning framework
- Presentation is difficult
  - Visualization for the Bayesian model is hard

LDA Beginner's Tutorial


Editor's Notes

  • #5: Take home: validation is difficult… there is no true answer here.
  • #6: Clustering documents is difficult because many of the same words are repeated across documents. Some documents may be similar to one another on different topics, so we might want to cluster while allowing membership in more than one cluster.
  • #14, #23-#27: 2 stage process
  • #22, #28-#32, #34-#37: Example: the word “professional” is probably used more often in the topic of professional networks than in the topic of social networks.