The document discusses the naïve Bayes classification model for document classification. It explains that the model calculates the probability that a document belongs to each class (e.g., p(app | words) and p(other | words)) from the words in the document. The model makes the naïve assumption that word probabilities are independent of one another, which greatly simplifies the calculation. Words are not truly independent in real documents, but naïve Bayes often performs reasonably well anyway, because classification depends only on which class probability comes out larger, not on the probabilities being calculated correctly.
Data Smart
Everything You Ever Needed to Know about Spreadsheets but Were Too Afraid to Ask

Cluster Analysis Part I: Using K-Means to Segment Your Customer Base

When You Name a Product Mandrill, You're Going to Get Some Signal and Some Noise

Supervised artificial intelligence models: the naïve Bayes model. In supervised artificial intelligence, you train a model to make predictions using data that has already been classified. The most common use of naïve Bayes is document classification: training data are provided to the training algorithm, and the model can then classify new documents into those categories using what it has learned (p. 77).

The World's Fastest Intro to Probability Theory
High-Level Class Probabilities Are Often Assumed to Be Equal
A Couple More Odds and Ends

Using Bayes Rule to Create an AI Model

Treat each tweet as a bag of words, which means breaking each tweet up into words (often called tokens) at spaces and punctuation. There are two classes of tweets: "app" for the Mandrill.com tweets and "other" for everything else.
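The book does this tokenization in Excel; purely as an illustration of the same idea, here is a minimal Python sketch (the sample tweet and the exact punctuation handling are assumptions, not the book's recipe):

    import re

    def tokenize(text):
        """Break a tweet into lowercase tokens at spaces and punctuation."""
        cleaned = re.sub(r"[^\w\s]", " ", text.lower())  # punctuation -> spaces
        return cleaned.split()                           # split on whitespace

    print(tokenize("Just installed the Mandrill app, loving it!"))
    # -> ['just', 'installed', 'the', 'mandrill', 'app', 'loving', 'it']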
You care about these two probabilities:

    p(app | word1, word2, word3, ...)
    p(other | word1, word2, word3, ...)

These are the probabilities of a tweet being about the app or about something else, given that you see the words word1, word2, word3, etc. The standard implementation of a naïve Bayes model classifies a new document into whichever of these two classes is most likely given the words. The decision rule that picks the most likely class given the words is called the maximum a posteriori (MAP) rule.

Using Bayes Rule, you can rewrite the conditional app probability as follows:

    p(app | word1, word2, ...) = p(app) p(word1, word2, ... | app) / p(word1, word2, ...)

Similarly:

    p(other | word1, word2, ...) = p(other) p(word1, word2, ... | other) / p(word1, word2, ...)

Both right-hand sides share the denominator p(word1, word2, ...), so the MAP decision reduces to asking which is larger:

    p(app) p(word1, word2, ... | app)  or  p(other) p(word1, word2, ... | other)
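In code, the MAP rule is just a comparison of those two products. A sketch with made-up probabilities (none of these numbers come from the book):

    def map_classify(tokens, priors, word_probs, unseen=1e-9):
        """Return the class with the largest p(class) * product of p(word | class)."""
        scores = {}
        for cls, prior in priors.items():
            score = prior
            for w in tokens:
                score *= word_probs[cls].get(w, unseen)  # floor for unseen words (an assumption)
            scores[cls] = score
        return max(scores, key=scores.get)

    # Hypothetical values for illustration only.
    priors = {"app": 0.5, "other": 0.5}
    word_probs = {
        "app":   {"mandrill": 0.05, "email": 0.04, "api": 0.03},
        "other": {"mandrill": 0.01, "monkey": 0.06, "zoo": 0.02},
    }
    print(map_classify(["mandrill", "api"], priors, word_probs))  # -> app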
Now assume that the probabilities of these words being in the document are independent of one another. Then:

    p(app) p(word1, word2, ... | app) = p(app) p(word1 | app) p(word2 | app) p(word3 | app) ...
    p(other) p(word1, word2, ... | other) = p(other) p(word1 | other) p(word2 | other) p(word3 | other) ...

The independence assumption lets you break the joint conditional probability of the bag of words given the class into probabilities of single words given the class. However, words are not independent of one another in a document! The saving grace is that the MAP rule doesn't care whether you calculated your class probabilities correctly; it only cares which incorrectly calculated probability is larger. By assuming independence of words, you inject all sorts of error into the calculation, but at least the sloppiness is applied across the board, so the comparisons used in the MAP rule tend to come out in the same direction they would have had you applied all sorts of fancier linguistic understanding to the model.
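One practical consequence of the product form: multiplying many small per-word probabilities underflows quickly, so summing logarithms is the standard equivalent trick. A minimal sketch with placeholder values (the probabilities are assumptions for illustration):

    import math

    # Hypothetical per-word conditionals p(word | app), for illustration only.
    p_word_given_app = {"word1": 0.02, "word2": 0.01, "word3": 0.005}
    p_app = 0.5

    # Independence assumption: the joint conditional is a product of single-word terms.
    product = p_app
    for p in p_word_given_app.values():
        product *= p

    # The same comparison in log space avoids underflow, even for long documents.
    log_score = math.log(p_app) + sum(math.log(p) for p in p_word_given_app.values())

    print(product)              # 5e-07
    print(math.exp(log_score))  # ~5e-07, the same value up to rounding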
Notes, formulae, problems, solutions; work the steps in Excel (a Python version of the same pipeline is sketched after this list):

Removing Extraneous Punctuation
Splitting on Spaces
Counting Tokens and Calculating Probabilities
And We Have a Model! Let's Use It
Let's Get This Excel Party Started
Wrapping Up
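The book builds these steps in spreadsheet columns; the sketch below mirrors the same pipeline end to end in Python on a made-up two-tweet training set (all tweets, counts, and the add-one smoothing are illustrative assumptions, not the book's worked example):

    import math
    import re
    from collections import Counter

    # Removing extraneous punctuation and splitting on spaces.
    def tokenize(text):
        return re.sub(r"[^\w\s]", " ", text.lower()).split()

    # Hypothetical labeled training tweets, for illustration only.
    training = [
        ("Mandrill API makes transactional email easy", "app"),
        ("Saw a mandrill at the zoo today", "other"),
    ]

    # Counting tokens and calculating probabilities (add-one smoothing assumed).
    counts = {"app": Counter(), "other": Counter()}
    for text, cls in training:
        counts[cls].update(tokenize(text))
    vocab = {w for ctr in counts.values() for w in ctr}

    def log_score(tokens, cls, prior=0.5):
        total = sum(counts[cls].values()) + len(vocab)
        return math.log(prior) + sum(
            math.log((counts[cls][w] + 1) / total) for w in tokens)

    # And we have a model! Classify a new tweet with the MAP rule.
    tweet = tokenize("the Mandrill API is easy")
    print(max(("app", "other"), key=lambda c: log_score(tweet, c)))  # -> app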
Naïve Bayes and the Incredible Lightness of Being an Idiot
Optimisation Modelling: Because That Fresh Squeezed Orange Juice Ain't Gonna Blend Itself
Cluster Analysis Part II: Network Graphs and Community Detection
The Granddaddy of Supervised Artificial Intelligence: Regression
Ensemble Models: A Whole Lot of Bad Pizza
Forecasting: Breathe Easy; You Can't Win
Outlier Detection: Just Because They're Odd Doesn't Mean They're Unimportant
Moving from Spreadsheets into R
Conclusion

Data Smart: Using Data Science to Transform Information into Insight, by John W. Foreman, 2014