BigML Education
Topic Models
July 2017
BigML Education Program 2Ensembles
In This Video
• Introduction to Topic Models
• Exploration of Topic Models in the BigML Interface
• Inference of topic distributions using a trained topic
model
• Parameterization of topic models
Data For Topic Models
• Unstructured text data
• Short stories, novels, newspaper articles
• Web pages
• Customer reviews or surveys
• E-mail messages
• Text data is unlike most machine learning data
• Often no fields in each row (i.e., no “columns”)
• Each instance is just the text of the document
Categorizing Instances
• Often, many instances will have words indicating they
are about the same thing (the same topic)
• It may be useful to identify instances corresponding to a
certain topic
• Topic modeling automatically discovers common topics
in the data
• Can assign a score to each instance indicating how
much that instance is “about” a given topic
Generative Modeling
• Decision trees and logistic regression are discriminative
models
• Aggressively model the classification boundary
• Parsimonious: Don’t consider anything you don’t
have to
• Topic models are generative models
• Posit a theory of how the data was generated
• Tweak the theory to fit the data
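The "posit a theory, then tweak it to fit the data" idea can be illustrated with a much simpler generative model than a topic model. The sketch below is an assumption-laden stand-in, not BigML's algorithm: it uses a single unigram model (one term-probability table), a made-up toy corpus, and add-alpha smoothing, and it compares how well an unfitted and a fitted "theory" explain the data.

```python
import math
from collections import Counter

def fit_unigram(documents, vocabulary, alpha=1.0):
    """Fit term probabilities by counting, with add-alpha smoothing."""
    counts = Counter(t for doc in documents for t in doc)
    total = sum(counts[t] for t in vocabulary) + alpha * len(vocabulary)
    return {t: (counts[t] + alpha) / total for t in vocabulary}

def log_likelihood(documents, model):
    """Log-probability that the model generates every term in the data."""
    return sum(math.log(model[t]) for doc in documents for t in doc)

# Hypothetical toy corpus, already split into terms.
docs = [["mars", "quasar", "mars"], ["airplane", "passport", "mars"]]
vocab = sorted({t for d in docs for t in d})

uniform = {t: 1.0 / len(vocab) for t in vocab}  # initial "theory": all terms equally likely
fitted = fit_unigram(docs, vocab)               # theory tweaked to fit the data

print(log_likelihood(docs, uniform), log_likelihood(docs, fitted))
```

The fitted model assigns the corpus a higher log-likelihood than the uniform one, which is the sense in which the theory has been adjusted to fit the data.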
[Figure: a document, consisting of a title and text, shown with the labels "Document" and "Term". Example text: "Be not afraid of greatness: some are born great, some achieve greatness, and some have greatness thrust upon 'em."]
Topics
[Figure: a topic draws terms from a bag of words (cat, shoe, zebra, ball, tree, jump, pen, asteroid, cable, box, step, cabinet, yellow, plate, flashlight, …). Example outputs: "shoe asteroid flashlight pizza…", "plate giraffe purple jump…", and "Be not afraid of greatness: some are born great, some achieve greatness…".]

term          probability
shoe          ϵ
asteroid      ϵ
flashlight    ϵ
pizza         ϵ
…             ϵ
• A topic is a term generator
• Invoke it a bunch of times to get a document
• Most will be nonsense, but eventually you’ll generate
your dataset
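A minimal sketch of "a topic is a term generator" follows. The probability values are invented for illustration (the slide only shows ϵ for each term), and `generate_document` is a hypothetical helper name.

```python
import random

# Hypothetical topic: a probability table over terms (values are made up).
topic = {"shoe": 0.4, "asteroid": 0.3, "flashlight": 0.2, "pizza": 0.1}

def generate_document(topic, length, rng):
    """Invoke the topic `length` times; each invocation emits one term."""
    terms = list(topic)
    weights = [topic[t] for t in terms]
    return rng.choices(terms, weights=weights, k=length)

rng = random.Random(0)  # seeded so the run is repeatable
document = generate_document(topic, 8, rng)
print(" ".join(document))
```

Almost every generated document is nonsense; a real document from the dataset is just one very unlikely outcome of the same process.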
Topic Models
Topic: travel

word        probability
travel      23.55%
airplane    2.33%
mars        0.003%
mantle      ϵ
…           ϵ

Generated terms: airplane passport pizza …

Topic: space

word        probability
space       38.94%
airplane    ϵ
mars        13.43%
mantle      0.05%
…           ϵ

Generated terms: mars quasar lightyear soda

[Figure: both topics draw from the same bag of words (cat, shoe, zebra, ball, tree, jump, pen, asteroid, cable, box, step, cabinet, yellow, plate, flashlight, …); the terms they generate are combined under the label "Generate Document".]
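The two-topic picture can be sketched as a sampling procedure. This is an illustration under assumptions, not BigML's implementation: it assumes each term position first picks a topic and then a term from that topic, uses a hypothetical 70/30 topic mix, and lumps every probability the slide leaves unspecified (including the ϵ entries) into a placeholder term `"<other>"`.

```python
import random

# Term probabilities taken from the travel and space tables (percent -> fraction).
travel = {"travel": 0.2355, "airplane": 0.0233, "mars": 0.00003}
space = {"space": 0.3894, "mars": 0.1343, "mantle": 0.0005}

def with_remainder(table):
    """Return a copy that sums to 1 by adding a hypothetical "<other>" entry."""
    full = dict(table)
    full["<other>"] = 1.0 - sum(table.values())
    return full

topics = {"travel": with_remainder(travel), "space": with_remainder(space)}

def generate_document(topic_mix, topics, length, rng):
    """For each position, pick a topic from the mix, then a term from that topic."""
    names = list(topic_mix)
    mix_weights = [topic_mix[n] for n in names]
    doc = []
    for _ in range(length):
        name = rng.choices(names, weights=mix_weights)[0]
        table = topics[name]
        doc.append(rng.choices(list(table), weights=list(table.values()))[0])
    return doc

rng = random.Random(42)
# Hypothetical mix: the document is 70% "travel" and 30% "space".
doc = generate_document({"travel": 0.7, "space": 0.3}, topics, 10, rng)
print(doc)
```

The topic mix used to generate a document is the quantity a trained topic model tries to recover when it reports a topic distribution for an instance.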
Review
• Topic models are generative models for unstructured
text data
• The BigML interface provides an intuitive way to explore
your topic model
• You can get the topic distribution for an instance by
using the “topic distribution” or “batch topic distribution”
options in the model resource view
• Changing the “number of topics” and specifying
“excluded terms” may give you a much different and
possibly better topic model
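To make "topic distribution" and "excluded terms" concrete, here is a small self-contained sketch. It does not call BigML and is not its inference algorithm: the two topics are hand-written, `tokenize` and `topic_distribution` are hypothetical helper names, and the proportions are estimated with a plain EM loop that holds the topics fixed.

```python
import re
from collections import Counter

def tokenize(text, excluded_terms=()):
    """Lowercase, split into word tokens, and drop any excluded terms."""
    excluded = {t.lower() for t in excluded_terms}
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in excluded]

def topic_distribution(tokens, topics, iterations=50, floor=1e-6):
    """Estimate topic proportions for one document by EM, holding topics fixed."""
    names = list(topics)
    weights = {n: 1.0 / len(names) for n in names}
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return weights  # no evidence: fall back to the uniform distribution
    for _ in range(iterations):
        credited = {n: 0.0 for n in names}
        for term, c in counts.items():
            # E-step: share of this term credited to each topic.
            joint = {n: weights[n] * topics[n].get(term, floor) for n in names}
            z = sum(joint.values())
            for n in names:
                credited[n] += c * joint[n] / z
        # M-step: renormalize the credited counts into proportions.
        weights = {n: credited[n] / total for n in names}
    return weights

# Hypothetical, hand-written topics; a trained model would supply these.
topics = {
    "travel": {"travel": 0.4, "airplane": 0.3, "passport": 0.3},
    "space": {"space": 0.4, "mars": 0.3, "quasar": 0.3},
}
text = "The airplane to Mars needs a passport, said the travel agent"
tokens = tokenize(text, excluded_terms=["the", "to", "a", "said", "needs", "agent"])
dist = topic_distribution(tokens, topics)
print(dist)
```

Three of the four remaining terms belong to the travel topic, so the estimate comes out close to 75% travel and 25% space. Changing the excluded terms changes which tokens reach the model, which is one reason that setting can noticeably alter the result.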
