BigML Fall 2016 Release
Introducing Topic Models
BigML, Inc · Fall Release Webinar - November 2016
Fall 2016 Release
Speaker: CHARLES PARKER (VP Algorithms)
Moderator: ATAKAN CETINSOY (VP Predictive Applications)
Questions: Enter questions into the chat box. We'll answer some via chat and others at the end of the session.
Resources: https://bigml.com/releases
Contact: info@bigml.com
Twitter: @bigmlcom
Topic Modeling
• A method for discovering structure in "unstructured" text.
• Based on Latent Dirichlet Allocation (LDA), introduced by David Blei, Andrew Ng, and Michael I. Jordan in 2003.
• Now "BigML Easy"
BigML Resources
• Data Exploration: Source, Dataset, Correlation, Statistical Test
• Supervised Learning: Model, Ensemble, Logistic Regression, Evaluation
• Unsupervised Learning: Cluster, Anomaly Detector, Association Discovery, Topic Model
• Scoring: Single/Batch Prediction
• Automation: Script, Library, Execution
Unsupervised Learning
[Figure: a table with instances as rows and features as columns]
• Learn from instances
• Each instance has features
• There is no label
• Clustering: find similar instances
• Anomaly Detection: find unusual instances
• Association Discovery: find feature rules
Topic Model
• Unsupervised algorithm
• Learns only from text fields
• Finds hidden topics that model the text
Questions:
• How is this different from the Text Analysis that BigML already offers?
• What does it output, and how do we use it?
• Unsupervised… model?
Text Analysis: Bag of Words
"Be not afraid of greatness: some are born great, some achieve greatness, and some have greatness thrust upon 'em."
great: appears 4 times (once as "great" and three times as "greatness", which stems to "great")
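A minimal Python sketch of the bag-of-words count above. It assumes a deliberately crude, hypothetical suffix-stripping stemmer (`crude_stem`) in place of whatever stemmer BigML actually uses; its only job is to show why "greatness" contributes to the count for "great".

```python
import re
from collections import Counter

def crude_stem(token: str) -> str:
    # Hypothetical toy stemmer: strip one common suffix if enough of the word remains.
    # A real pipeline would use a proper stemmer (e.g. Porter or Snowball).
    for suffix in ("ness", "ing", "ed"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def bag_of_words(text: str) -> Counter:
    # Lowercase, pull out runs of letters/apostrophes, and count stemmed tokens.
    # Word order is discarded: only the counts survive.
    tokens = (t.strip("'") for t in re.findall(r"[a-z']+", text.lower()))
    return Counter(crude_stem(t) for t in tokens if t)

quote = ("Be not afraid of greatness: some are born great, some achieve "
         "greatness, and some have greatness thrust upon 'em.")
counts = bag_of_words(quote)
print(counts["great"], counts["afraid"], counts["born"], counts["achieve"])  # 4 1 1 1
```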
Text Analysis
Token counts (one column per token, one row per document):
          …  great  afraid  born  achieve  …
quote:    …  4      1       1     1        …
…         …  …      …       …     …        …
"Be not afraid of greatness: some are born great, some achieve greatness, and some have greatness thrust upon 'em."
Model:
• The token "great" occurs more than 3 times
• The token "afraid" occurs no more than once
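The two conditions under "Model" read like splits in a decision tree built on token-count columns. The sketch below treats them as a single hypothetical tree path; the vocabulary `VOCAB` and the helper names are illustrative assumptions, not BigML's API.

```python
from collections import Counter

# Hypothetical fixed vocabulary: one numeric feature column per token.
VOCAB = ["great", "afraid", "born", "achieve"]

def to_features(counts: Counter, vocab=VOCAB) -> dict:
    # Tokens outside the vocabulary are dropped; missing tokens count as 0.
    return {token: counts.get(token, 0) for token in vocab}

def rule_matches(features: dict) -> bool:
    # One hypothetical decision-tree path built from the slide's two splits:
    # "great" occurs more than 3 times AND "afraid" occurs no more than once.
    return features["great"] > 3 and features["afraid"] <= 1

quote_row = to_features(Counter({"great": 4, "afraid": 1, "born": 1, "achieve": 1, "some": 3}))
other_row = to_features(Counter({"great": 1, "afraid": 2}))
print(quote_row)  # {'great': 4, 'afraid': 1, 'born': 1, 'achieve': 1}
print(rule_matches(quote_row), rule_matches(other_row))  # True False
```

Each count is a separate column, which is why the model can only reason about tokens one at a time.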
Text Analysis Demo #1
Text Analysis vs Topic Models
Text Analysis:
• Creates thousands of hidden token counts
• Token counts are independently uninteresting
• No semantic importance
• No measure of co-occurrence
Topic Model:
• Creates tens of topics that model the text
• Topics are independently interesting
• Semantic meaning extracted
• Support for bigrams
Generative Modeling
• Decision trees are discriminative models
    • Aggressively model the classification boundary
    • Parsimonious: don't consider anything you don't have to
• Topic models are generative models
    • Come up with a theory of how the data is generated
    • Tweak the theory to fit your data
Generating Documents
[Figure: a word "machine" holding a vocabulary (cat, shoe, zebra, ball, tree, jump, pen, asteroid, cable, box, step, cabinet, yellow, plate, flashlight, …) emits documents such as "shoe asteroid flashlight pizza…", "plate giraffe purple jump…", and "Be not afraid of greatness: some are born great, some achieve greatness…"]
• A "machine" that, with each pull, generates a random word, every word being equally probable.
• Pull it a random number of times to generate a document.
• Every document can be generated this way, but most are nonsense.
word         probability
shoe         ϵ
asteroid     ϵ
flashlight   ϵ
pizza        ϵ
…            ϵ
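The uniform word machine can be sketched in a few lines of Python. The vocabulary and the `max_len` cap on the number of pulls are assumptions for illustration; with these 15 words every word has probability ϵ = 1/15, and with a realistic vocabulary ϵ becomes tiny.

```python
import random

# Example vocabulary from the slide; a real vocabulary has thousands of terms.
VOCAB = ["cat", "shoe", "zebra", "ball", "tree", "jump", "pen", "asteroid",
         "cable", "box", "step", "cabinet", "yellow", "plate", "flashlight"]

def uniform_machine(rng: random.Random, vocab=VOCAB, max_len: int = 8) -> list:
    # Pull the machine a random number of times (1..max_len, an assumed cap) ...
    n_pulls = rng.randint(1, max_len)
    # ... and on each pull emit a word with equal probability 1 / len(vocab).
    return [rng.choice(vocab) for _ in range(n_pulls)]

rng = random.Random(0)
for _ in range(3):
    print(" ".join(uniform_machine(rng)))  # mostly nonsense word salad
```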
Topic Model
• Written documents have meaning; one way to describe that meaning is to assign a topic.
• For our random machine, a topic can be thought of as increasing the probability of certain words.
Intuition:
[Figure: the same word machine, conditioned on a topic. Topic "travel" emits words such as "airplane passport pizza …"; topic "space" emits words such as "mars quasar lightyear soda".]
Topic: travel
word         probability
travel       23.55%
airplane     2.33%
mars         0.003%
mantle       ϵ
…            ϵ
Topic: space
word         probability
space        38.94%
airplane     ϵ
mars         13.43%
mantle       0.05%
…            ϵ
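A topic can be sketched as the same machine with unequal weights. The probabilities below come from the two tables above; the value used for ϵ (`EPS`) and the catch-all `<other>` token that absorbs the remaining probability mass are assumptions made so each distribution sums to 1.

```python
import random

EPS = 1e-6  # assumed stand-in for the slide's "ϵ"
TOPICS = {
    "travel": {"travel": 0.2355, "airplane": 0.0233, "mars": 0.00003, "mantle": EPS},
    "space": {"space": 0.3894, "airplane": EPS, "mars": 0.1343, "mantle": 0.0005},
}

def with_remainder(dist: dict) -> dict:
    # Assign the leftover probability mass to a catch-all token so weights sum to 1.
    return {**dist, "<other>": 1.0 - sum(dist.values())}

def topic_machine(topic: str, n_pulls: int, rng: random.Random) -> list:
    words, weights = zip(*with_remainder(TOPICS[topic]).items())
    # random.choices samples with replacement, proportionally to the weights.
    return rng.choices(words, weights=weights, k=n_pulls)

rng = random.Random(42)
space_doc = topic_machine("space", 10_000, rng)
travel_doc = topic_machine("travel", 10_000, rng)
print(space_doc.count("mars") / len(space_doc), travel_doc.count("mars") / len(travel_doc))
```

With many pulls, the share of "mars" in a space document approaches 13.43%, while in a travel document it stays near 0.003%.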
Topic Model
[Figure: k word machines, Topic "1", Topic "2", …, Topic "k", each with its own word probabilities, generating documents such as "airplane passport pizza …" and "plate giraffe purple jump…"]
Topic "1"
word         probability
travel       23.55%
airplane     2.33%
mars         0.003%
mantle       ϵ
…            ϵ
Topic "2"
word         probability
space        38.94%
airplane     ϵ
mars         13.43%
mantle       0.05%
…            ϵ
Topic "k"
word         probability
shoe         12.12%
coffee       3.39%
telephone    13.43%
paper        4.11%
…            ϵ
• The text fields in each row are concatenated into a single document.
• The documents are analyzed to generate "k" related topics.
• Each topic is represented by a probability distribution over terms.
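The bullets above, combined with the idea that each document mixes several topics, correspond to the generative story usually told for LDA. This Python sketch samples one document from it; the toy topics, the symmetric Dirichlet parameter `alpha`, and joining text fields with a single space in `row_to_document` are assumptions. It only generates documents: fitting a topic model means running this story in reverse, inferring topics and mixtures from observed text, which the sketch does not attempt.

```python
import random

def sample_dirichlet(alpha: list, rng: random.Random) -> list:
    # Standard construction: normalize independent Gamma(alpha_i, 1) draws.
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    if total == 0.0:  # guard against underflow with very small alpha
        return [1.0 / len(alpha)] * len(alpha)
    return [d / total for d in draws]

def row_to_document(row: dict, text_fields: list) -> str:
    # Concatenate the row's text fields into one document (space-joined: an assumption).
    return " ".join(row[f] for f in text_fields if row.get(f))

def generate_document(topics: list, vocab: list, alpha: float, n_words: int, rng: random.Random):
    # 1. Draw the document's topic mixture theta ~ Dirichlet(alpha, ..., alpha).
    theta = sample_dirichlet([alpha] * len(topics), rng)
    words = []
    for _ in range(n_words):
        # 2. For each word, draw a topic z ~ Categorical(theta) ...
        z = rng.choices(range(len(topics)), weights=theta)[0]
        # 3. ... then draw the word from topic z's term distribution.
        words.append(rng.choices(vocab, weights=topics[z])[0])
    return theta, words

# Hypothetical toy model: k = 2 topics over a 5-term vocabulary.
vocab = ["travel", "airplane", "passport", "space", "mars"]
topics = [[0.5, 0.25, 0.2, 0.04, 0.01],   # a "travel"-like topic
          [0.05, 0.01, 0.01, 0.6, 0.33]]  # a "space"-like topic
rng = random.Random(7)
doc = row_to_document({"title": "Trip to Mars", "body": "space travel"}, ["title", "body"])
theta, words = generate_document(topics, vocab, alpha=0.5, n_words=12, rng=rng)
print(doc)
print([round(p, 2) for p in theta], words)
```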
Topic Model Demo #1
Uses
• As a preprocessor for other techniques
• Bootstrapping categories for classification
• Recommendation
• Discovery in large, heterogeneous text datasets
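For the "bootstrapping categories" use, one simple approach is to label each document with its most probable topic and leave ambiguous documents for a person to review. The threshold `min_prob` and the topic names are hypothetical; in practice a person would name each topic after inspecting its top terms.

```python
def bootstrap_labels(distributions: list, topic_names: list, min_prob: float = 0.5) -> list:
    # Label each document with its most probable topic, but only if that topic
    # clearly dominates; otherwise return None so a person can review it.
    labels = []
    for dist in distributions:
        best = max(range(len(dist)), key=lambda i: dist[i])
        labels.append(topic_names[best] if dist[best] >= min_prob else None)
    return labels

# Hypothetical topic distributions for three documents over two topics.
distributions = [[0.11, 0.89], [0.70, 0.30], [0.52, 0.48]]
print(bootstrap_labels(distributions, ["travel", "space"], min_prob=0.6))
# ['space', 'travel', None]
```

The labeled rows can then seed a supervised classifier, while the unlabeled ones are reviewed by hand.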
Topic Distribution
• Any given document is likely a mixture of the modeled topics…
• This mixture can be represented as a probability distribution over topics.
Intuition:
"Will 2020 be the year that humans will embrace space exploration and finally travel to Mars?"
Topic: travel (11%)
word         probability
travel       23.55%
airplane     2.33%
mars         0.003%
mantle       ϵ
…            ϵ
Topic: space (89%)
word         probability
space        38.94%
airplane     ϵ
mars         13.43%
mantle       0.05%
…            ϵ
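A document's topic distribution has to be inferred once the topics are fixed. The sketch below uses a simple EM-style "fold-in" with the topic term probabilities held constant, a stand-in chosen for clarity and not necessarily how BigML computes topic distributions. The term probabilities are taken from the tables above; `EPS`, the smoothing constant `alpha`, and ignoring out-of-vocabulary tokens are assumptions, so its output will not match the slide's illustrative 89%/11% split.

```python
EPS = 1e-6  # assumed stand-in for the slide's "ϵ"
# Term probabilities per topic, taken from the tables above.
PHI = {
    "travel": {"travel": 0.2355, "airplane": 0.0233, "mars": 0.00003, "mantle": EPS},
    "space": {"space": 0.3894, "airplane": EPS, "mars": 0.1343, "mantle": 0.0005},
}

def topic_distribution(tokens: list, phi: dict = PHI, alpha: float = 0.1, iters: int = 50) -> dict:
    topics = list(phi)
    # Ignore tokens outside the model's vocabulary (an assumption of this sketch).
    vocab = set().union(*phi.values())
    words = [t for t in tokens if t in vocab]
    theta = {k: 1.0 / len(topics) for k in topics}  # start from a uniform mixture
    for _ in range(iters):
        expected = dict.fromkeys(topics, 0.0)
        for w in words:
            # E-step: how responsible each topic is for this word, given theta.
            scores = {k: theta[k] * phi[k].get(w, EPS) for k in topics}
            total = sum(scores.values())
            for k in topics:
                expected[k] += scores[k] / total
        # M-step: re-estimate the mixture, smoothed by alpha so no topic hits 0.
        denom = len(words) + alpha * len(topics)
        theta = {k: (expected[k] + alpha) / denom for k in topics}
    return theta

question = ("will 2020 be the year that humans will embrace space "
            "exploration and finally travel to mars").split()
print(topic_distribution(question))
```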
Topic Model Demo #2
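Clustering?
[Figure: two flows over unlabelled data. Clustering: Unlabelled Data → Clustering → Batch Centroid, which adds one centroid label per row. Topic Model: the text fields of Unlabelled Data → Topic Model → Batch Topic Distribution, which adds one probability per topic per row (topic 1 prob, …, topic k prob).]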
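Because a batch topic distribution turns each row into a short vector of topic probabilities, those columns can feed other techniques, such as clustering. The sketch below runs plain k-means on such vectors; the data is made up, and BigML's clustering algorithm is not necessarily plain k-means.

```python
import random

def sq_dist(a: list, b: list) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points: list, k: int, iters: int = 20, seed: int = 0):
    rng = random.Random(seed)
    # Start from k rows chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each row gets the label of its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of the rows assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical batch topic distribution output: one row per document,
# one column per topic (topic 1 prob, topic 2 prob, topic 3 prob).
rows = [[0.90, 0.10, 0.00], [0.85, 0.10, 0.05],
        [0.05, 0.10, 0.85], [0.00, 0.20, 0.80]]
labels, centroids = kmeans(rows, k=2)
print(labels)
```

Clustering on a handful of topic-probability columns, rather than on thousands of raw token counts, keeps each cluster's centroid easy to read as a blend of topics.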
Topic Model Demo #3
Some Tips
• Setting k
    • Much like k-means, the best value is data-specific
    • Too few topics will merge unrelated themes; too many will split highly related ones
    • I tend to find the latter more annoying than the former
• Tuning the model
    • Remove common, uninformative terms
    • Raise the term limit and use bigrams
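The tuning tips can be sketched as a preprocessing step: drop stop words, keep only the most frequent terms, and optionally add bigrams. The stop-word list, `term_limit`, and the rule of discarding any bigram that contains a stop word are assumptions for illustration, not BigML's exact behavior.

```python
import re
from collections import Counter

# Hypothetical stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "are", "be", "have", "of", "some", "that", "the", "to", "will"}

def terms(text: str, use_bigrams: bool = True) -> list:
    raw = re.findall(r"[a-z0-9']+", text.lower())
    # Remove common, uninformative words.
    unigrams = [w for w in raw if w not in STOP_WORDS]
    bigrams = []
    if use_bigrams:
        # Keep only truly adjacent pairs in which neither word is a stop word.
        bigrams = [f"{a} {b}" for a, b in zip(raw, raw[1:])
                   if a not in STOP_WORDS and b not in STOP_WORDS]
    return unigrams + bigrams

def build_vocabulary(docs: list, term_limit: int, use_bigrams: bool = True) -> list:
    # Keep only the term_limit most frequent terms across the whole collection.
    counts = Counter()
    for doc in docs:
        counts.update(terms(doc, use_bigrams))
    return [term for term, _ in counts.most_common(term_limit)]

question = "Will 2020 be the year that humans will embrace space exploration and finally travel to Mars?"
print(terms(question))
print(build_vocabulary(["space exploration", "space travel", "space exploration missions"], term_limit=3))
```

Raising `term_limit` lets rarer but meaningful terms survive, and bigrams such as "space exploration" give the model phrases that single tokens cannot express.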
Questions?
Twitter: @bigmlcom
Mail: info@bigml.com
Docs: https://bigml.com/releases
