Classification with Naïve Bayes: A Deep Dive into Apache Mahout
Today’s speaker – Josh Patterson
josh@cloudera.com / twitter: @jpatanooga
Master’s thesis: self-organizing mesh networks; published in IAAI-09: TinyTermite: A Secure Routing Algorithm
Conceived, built, and led Hadoop integration for the openPDC project at TVA (smart grid work)
Led a small team that designed classification techniques for time series and MapReduce
Open source work at http://openpdc.codeplex.com
Now: Solutions Architect at Cloudera
What is Classification?
Supervised learning: we give the system a set of instances to learn from
The system builds knowledge of some structure and learns “concepts”
The system can then classify new instances
Supervised vs. Unsupervised Learning
Supervised: give the system examples/instances of multiple concepts; the system learns the “concepts”; more “hands on”. Examples: Naïve Bayes, neural nets
Unsupervised: uses unlabeled data; builds a joint density model. Example: k-means clustering
Naïve Bayes
Called Naïve Bayes because it is based on Bayes’ rule and “naively” assumes independence given the label
It is only valid to multiply probabilities when the events are independent, which is a simplistic assumption in real life
Despite the name, Naïve Bayes works well on actual datasets
Naïve Bayes Classifier
A simple probabilistic classifier based on applying Bayes’ theorem (from Bayesian statistics) with strong (naive) independence assumptions. A more descriptive term for the underlying probability model would be “independent feature model”.
Naïve Bayes Classifier (2)
Assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. Example: a fruit may be considered to be an apple if it is red, round, and about 4" in diameter. Even if these features depend on each other or upon the existence of the other features, a naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple.
A Little Bit o’ Theory
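The equations from this slide are not preserved in the transcript. As a reminder of the standard form (not necessarily the exact layout used on the slide), Bayes' rule in the Pr[E|H] notation used in the following slides, combined with the naive independence assumption over the pieces of evidence E_n, is:

\[
\Pr[H \mid E] = \frac{\Pr[E \mid H]\,\Pr[H]}{\Pr[E]},
\qquad
\Pr[E \mid H] \approx \prod_{n} \Pr[E_n \mid H]
\]

The classifier picks the class H that maximizes the numerator; Pr[E] is the same for every class, so it only matters when turning scores into probabilities.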
Condensing Meaning …From the Vapor of That Last Big Equation
To train our system we need:
the total number of input training instances (count)
counts of the tuples {attribute_n, outcome_o, value_m}
the total count of each outcome_o ({outcome-count})
Each Pr[E_n | H] is then calculated as the {attribute_n, outcome_o, value_m} count / {outcome-count} (a small counting sketch follows)
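A minimal sketch of that bookkeeping in plain Java, assuming categorical attributes; the class and method names here are illustrative and are not Mahout code:

import java.util.HashMap;
import java.util.Map;

// Sketch only: count {attribute, value, outcome} tuples and per-outcome totals,
// then estimate Pr[E_n | H] as tupleCount / outcomeCount.
public class CountingSketch {
    private final Map<String, Integer> tupleCounts = new HashMap<>();   // "attr=value|outcome" -> count
    private final Map<String, Integer> outcomeCounts = new HashMap<>(); // outcome -> count

    // Record one training instance: its attribute/value pairs and its outcome.
    void observe(Map<String, String> attributes, String outcome) {
        outcomeCounts.merge(outcome, 1, Integer::sum);
        for (Map.Entry<String, String> a : attributes.entrySet()) {
            tupleCounts.merge(a.getKey() + "=" + a.getValue() + "|" + outcome, 1, Integer::sum);
        }
    }

    // Pr[E_n | H]: how often this attribute/value appeared among instances with this outcome.
    double pr(String attribute, String value, String outcome) {
        int tuple = tupleCounts.getOrDefault(attribute + "=" + value + "|" + outcome, 0);
        int total = outcomeCounts.getOrDefault(outcome, 0);
        return total == 0 ? 0.0 : (double) tuple / total;
    }
}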
A Real Example from Witten et al.
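The worked example itself is not preserved in this transcript. As a stand-in, here is the standard weather-data calculation from Witten et al.'s Data Mining book (14 training days, 9 with play = yes and 5 with play = no), classifying a new day with outlook = sunny, temperature = cool, humidity = high, windy = true:

Pr[yes] * Pr[E|yes] = 9/14 * (2/9 * 3/9 * 3/9 * 3/9) ≈ 0.0053
Pr[no] * Pr[E|no] = 5/14 * (3/5 * 1/5 * 4/5 * 3/5) ≈ 0.0206

Dividing each by their sum (which plays the role of Pr[E]) gives roughly 21% for “yes” and 79% for “no”, so the new day is classified as “no”.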
Enter Apache Mahout
What is it? Apache Mahout is a scalable machine learning library that supports large data sets
What are the major algorithm types? Classification, recommendation, and clustering
http://mahout.apache.org/
Mahout Algorithms
Naïve Bayes and Text
Naive Bayes does not model text well; see “Tackling the Poor Assumptions of Naive Bayes Text Classifiers”: http://people.csail.mit.edu/jrennie/papers/icml03-nb.pdf
Mahout makes some modifications based around TF-IDF scoring (next slide)
It also includes two other pre-processing steps, common for information retrieval but not for Naive Bayes classification
High-Level Algorithm
For each feature (word) in each doc, calculate the “weight-normalized TF-IDF”: for a given feature in a label, this is the TF-IDF calculated using the standard IDF multiplied by the weight-normalized TF
We calculate Sigma_k, the sum of the weight-normalized TF-IDF over all the features in a label, and set alpha_i == 1.0
Weight = log[ ( W-N-TF-IDF + alpha_i ) / ( Sigma_k + N ) ] (a small sketch of this calculation follows)
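A minimal sketch of that weighting step, assuming the weight-normalized TF-IDF value for the feature and Sigma_k for the label have already been computed; all names here are illustrative, not Mahout's classes:

// Sketch only: the per-feature weight from the formula above.
public class WeightSketch {

    /**
     * @param wnTfIdf   weight-normalized TF-IDF of the feature within the label
     * @param sigmaK    sum of weight-normalized TF-IDF over all features in the label
     * @param vocabSize N, the number of distinct features (vocabulary size)
     * @param alphaI    smoothing term, 1.0 in the slides
     */
    static double featureWeight(double wnTfIdf, double sigmaK, double vocabSize, double alphaI) {
        return Math.log((wnTfIdf + alphaI) / (sigmaK + vocabSize));
    }

    public static void main(String[] args) {
        // Toy numbers, purely for illustration.
        System.out.println(featureWeight(3.2, 870.0, 50_000, 1.0));
    }
}

The alpha_i term keeps the weight finite for features never seen with a given label, the same role Laplace-style smoothing plays in plain Naive Bayes.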
BayesDriver Training Workflow
Naïve Bayes training MapReduce workflow in Mahout
Logical Classification Process
Gather, clean, and examine the training data: really get to know your data!
Train the classifier, allowing the system to “learn” the “concepts”, but not “overfit” to this specific training data set
Classify new, unseen instances: with Naïve Bayes we calculate the probability of each class with respect to this instance
How Is Classification Done?
Sequentially or via MapReduce
TestClassifier.java creates a ClassifierContext, then:
For each file in the directory, for each line: break the line into a map of tokens, feed the array of words to the classifier engine for a new classification/label, and collect the classifications as output (a rough sketch of this loop follows)
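A rough sketch of that loop in plain Java; classifyTokens() here is a hypothetical stand-in for the call into the trained model (the ClassifierContext in Mahout's Bayes implementation), not the actual Mahout API:

import java.io.IOException;
import java.nio.file.*;

// Sketch of the per-document classification loop described above.
public class ClassifyDirSketch {

    // Hypothetical stand-in: a real implementation would sum the per-feature
    // log weights for each label and return the highest-scoring label.
    static String classifyTokens(String[] tokens) {
        return "unknown";
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args[0]);
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) continue;
                for (String line : Files.readAllLines(file)) {
                    // Break the line into tokens (a naive whitespace split, for illustration only).
                    String[] tokens = line.toLowerCase().split("\\s+");
                    // Feed the words to the classifier and collect the resulting label.
                    System.out.println(file.getFileName() + "\t" + classifyTokens(tokens));
                }
            }
        }
    }
}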
A Quick Note About Training Data…
Your classifier can only be as good as the training data lets it be…
If you don’t do good data prep, everything will perform poorly
Data collection and pre-processing take the bulk of the time
Enough Math, Run the Code
Download and install Mahout: http://www.apache.org
Run the 20 Newsgroups example: https://cwiki.apache.org/confluence/display/MAHOUT/Twenty+Newsgroups
Uses Naïve Bayes classification
Download and extract 20news-bydate.tar.gz from the 20 Newsgroups dataset
Generate Test and Train Datasets
Training dataset:
mahout org.apache.mahout.classifier.bayes.PrepareTwentyNewsgroups \
  -p examples/bin/work/20news-bydate/20news-bydate-train \
  -o examples/bin/work/20news-bydate/bayes-train-input \
  -a org.apache.mahout.vectorizer.DefaultAnalyzer \
  -c UTF-8
Test dataset:
mahout org.apache.mahout.classifier.bayes.PrepareTwentyNewsgroups \
  -p examples/bin/work/20news-bydate/20news-bydate-test \
  -o examples/bin/work/20news-bydate/bayes-test-input \
  -a org.apache.mahout.vectorizer.DefaultAnalyzer \
  -c UTF-8
Train and Test the Classifier
Train:
$MAHOUT_HOME/bin/mahout trainclassifier \
  -i 20news-input/bayes-train-input \
  -o newsmodel \
  -type bayes \
  -ng 3 \
  -source hdfs
Test:
$MAHOUT_HOME/bin/mahout testclassifier \
  -m newsmodel \
  -d 20news-input \
  -type bayes \
  -ng 3 \
  -source hdfs \
  -method mapreduce
Other Use Cases
Predictive analytics: you’ll hear this term a lot in the field, especially in the context of SAS
General supervised learning classification: we can recognize a lot of things with practice (and lots of tuning!)
Document classification
Sentiment analysis
Questions?
We’re hiring!
Cloudera’s Distro of Apache Hadoop: http://www.cloudera.com
Resources: “Tackling the Poor Assumptions of Naive Bayes Text Classifiers”, http://people.csail.mit.edu/jrennie/papers/icml03-nb.pdf


Editor's Notes

  • #2: https://cwiki.apache.org/MAHOUT/books-tutorials-and-talks.html
  • #6: Contrasts with the “1Rule” method (1Rule uses one attribute). NB allows all attributes to make contributions that are equally important and independent of one another.
  • #7: This classifier produces a probability estimate for each class rather than a prediction. Considered “supervised learning”.
  • #8: A comparison with other classification methods in 2006 showed that Bayes classification is outperformed by more current approaches, such as boosted trees or random forests. An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification.
  • #9: Pr[E|H] -> all evidence for instances with H = “yes”; Pr[H] -> the percentage of instances with this outcome; Pr[E] -> the sum of the values for all outcomes.
  • #10: Book reference: Snow Crash. For each attribute “a” there are multiple values, and given these combinations we need to look at how many times the instances were actually classified as each class. In training we use the term “outcome”; in classification we use the term “class”. Example: say we have 2 attributes in an instance.
  • #11: We don’t take into account some of the other things like “missing values” here
  • #13: Now that we’ve established the case for Naïve Bayes + text, show how it fits in with other classification algorithms.
  • #14: *** Need to sell the case for using another feature-calculation mechanic *** When one class has more training examples than another, Naive Bayes selects poor weights for the decision boundary. To balance the number of training examples used per estimate, they introduced a “complement class” formulation of Naive Bayes. A document is treated as a sequence of words, and it is assumed that each word position is generated independently of every other word.
  • #15: Term frequency = number of occurrences of the considered term t_i in document d_j / size of (words in doc d_j), normalized to protect against bias in larger docs. IDF = log(total number of documents / number of documents containing the term). The normalized frequency for a term (feature) in a document is calculated by dividing the term frequency by the root mean square of term frequencies in that document. The weight-normalized TF for a given feature in a given label = the sum of the normalized frequency of the feature across all the documents in the label.
  • #16: Need to get a better handle on Sigma_k / Sigma_Wij: https://cwiki.apache.org/MAHOUT/bayesian.html
  • #20: https://cwiki.apache.org/confluence/display/MAHOUT/Twenty+Newsgroups
  • #22: Can also test sequentially