Data Smart

These notes cover the naïve Bayes classification model for document classification. The model computes the probability that a document belongs to each class (e.g. p(app | words) and p(other | words)) from the words the document contains. It makes the naïve assumption that word probabilities are independent of one another given the class, which greatly simplifies the calculation. Words in real documents are not truly independent, but naïve Bayes often performs well anyway because classification depends only on which class probability comes out larger.


Everything You Ever Needed to Know about Spreadsheets but Were Too Afraid to Ask
Cluster Analysis Part I: Using K-Means to Segment Your Customer Base
When You Name a Product Mandrill, You're Going to Get Some Signal and Some Noise
Supervised artificial intelligence models: the naïve Bayes model. In supervised artificial intelligence, you train a model to make predictions using data that's already been classified. The most common use of naïve Bayes is for document classification. Training data are provided to the training algorithm, and the resulting model can classify new documents into those categories using what it learned. (p. 77)
The World's Fastest Intro to Probability Theory
High-Level Class Probabilities Are Often Assumed to Be Equal
A Couple More Odds and Ends
Using Bayes Rule to Create an AI Model
Treat each tweet as a bag of words, which means breaking each tweet up into words (often called tokens) at spaces and punctuation. There are two classes of tweets: app for the Mandrill.com tweets and other for everything else.
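A minimal sketch of this bag-of-words step in base R (the chapter itself builds everything in Excel; the tweet text below is made up for illustration):

# A hypothetical tweet; the real data are the hand-labeled Mandrill tweets.
tweet <- "Just set up Mandrill for transactional email. Works great!"

# Lowercase, turn punctuation into spaces, then split on whitespace.
tokens <- tolower(tweet)
tokens <- gsub("[[:punct:]]", " ", tokens)
tokens <- unlist(strsplit(tokens, "\\s+"))
tokens <- tokens[tokens != ""]
# tokens: "just" "set" "up" "mandrill" "for" "transactional" "email" "works" "great"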
You care about these two probabilities: p(app | word1, word2, word3, ...) and p(other | word1, word2, word3, ...).
These are the probabilities of a tweet being either about the app or about something else, given that we see the words word1, word2, word3, etc.
The standard implementation of a naïve Bayes model classifies a new document based on which of these two classes is most likely given the words.
Maximum a posteriori rule (MAP rule): the decision rule that picks the class that's most likely given the words.
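In the notation above, the MAP rule for this problem is simply: classify the tweet as app if p(app | word1, word2, ...) > p(other | word1, word2, ...), and as other otherwise.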
Applying Bayes Rule, you can rewrite the conditional app probability as follows:
p(app | word1, word2, ...) = p(app) p(word1, word2, ... | app) / p(word1, word2, ...)
Similarly: p(other | word1, word2, ...) = p(other) p(word1, word2, ... | other) / p(word1, word2, ...)
The denominator p(word1, word2, ...) is the same for both classes, so you can drop it and compare only the numerators. Which is larger: p(app) p(word1, word2, ... | app) or p(other) p(word1, word2, ... | other)?
Assume that the probabilities of these words appearing in the document are independent of one another.
p(app) p(word1, word2, ... | app) = p(app) p(word1 | app) p(word2 | app) p(word3 | app) ...
p(other) p(word1, word2, ... | other) = p(other) p(word1 | other) p(word2 | other) p(word3 | other) ...
The independence assumption allows you to break that joint conditional probability of the
bag of words given the class into probabilities of single words given the class.
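A minimal sketch of the whole calculation in base R, assuming the tweets have already been tokenized. The toy token vectors and the new tweet are made up; the equal class priors follow the "High-Level Class Probabilities Are Often Assumed to Be Equal" point above, and the add-one smoothing and log-space sums are standard guards against zero counts and numeric underflow rather than anything specific to the book:

# Toy training bags of words for each class; the real ones come from
# the hand-labeled Mandrill tweets.
app_tokens   <- c("mandrill", "api", "email", "transactional", "email")
other_tokens <- c("mandrill", "monkey", "zoo", "band")

# Equal class priors.
p_app   <- 0.5
p_other <- 0.5

# Conditional word probability with add-one smoothing, so a word
# unseen in one class doesn't zero out that class's whole product.
vocab <- unique(c(app_tokens, other_tokens))
word_prob <- function(word, class_tokens) {
  (sum(class_tokens == word) + 1) / (length(class_tokens) + length(vocab))
}

# Score a tokenized tweet in log space: log(prior) plus the sum of
# log word probabilities (sums instead of products avoid underflow).
score <- function(tokens, prior, class_tokens) {
  log(prior) + sum(log(sapply(tokens, word_prob, class_tokens = class_tokens)))
}

# The MAP rule: pick whichever class scores higher.
new_tweet <- c("mandrill", "email", "api")
if (score(new_tweet, p_app, app_tokens) > score(new_tweet, p_other, other_tokens)) {
  "app"
} else {
  "other"
}
# Here the app score wins, since "email" and "api" show up only in app tweets.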
However, words are not independent of one another in a document!
The MAP rule doesn't really care whether you calculated your class probabilities correctly; it just cares about which incorrectly calculated probability is larger.
By assuming independence of words, you're injecting all sorts of error into that calculation, but at least the sloppiness is applied across the board. The comparisons used in the MAP rule tend to come out in the same direction they would have had you applied all sorts of fancier linguistic understanding to the model.
notes
formulae
problems
solutions
do in excel
Removing Extraneous Punctuation
Splitting on Spaces
Counting Tokens and Calculating Probabilities
And We Have a Model! Let's Use It
Let's Get This Excel Party Started
Wrapping Up
Naïve Bayes and the Incredible Lightness of Being an Idiot
Optimization Modeling: Because That Fresh Squeezed Orange Juice Ain't Gonna Blend Itself
Cluster Analysis Part II: Network Graphs and Community Detection
The Granddaddy of Supervised Artificial Intelligence: Regression
Ensemble Models: A Whole Lot of Bad Pizza
Forecasting: Breathe Easy; You Can't Win
Outlier Detection: Just Because They're Odd Doesn't Mean They're Unimportant
Moving from Spreadsheets into R
Conclusion
Data Smart: Using Data Science to Transform Information into Insight, by John W. Foreman, 2014