Naïve Bayes Classifier: April 25, 2006
MAP Hypothesis
$$h_{MAP} = \arg\max_{h \in H} P(h \mid D)
          = \arg\max_{h \in H} \frac{P(D \mid h)\,P(h)}{P(D)}
          = \arg\max_{h \in H} P(D \mid h)\,P(h)$$
H: the set of all hypotheses.
Note that we can drop P(D), since the probability of the data is constant
(and independent of the hypothesis).
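To make the MAP rule concrete, here is a minimal sketch (not from the original slides) of MAP selection over a small discrete hypothesis space; the coin-bias hypotheses, the prior, and the observed flips are all assumed for illustration.

```python
from math import comb

# Hypothesis space: candidate biases of a coin (assumed for illustration).
hypotheses = [0.3, 0.5, 0.7]
# Assumed prior: the fair coin is believed more likely a priori.
prior = {0.3: 0.2, 0.5: 0.6, 0.7: 0.2}

# Observed data D: 7 heads in 10 flips.
heads, flips = 7, 10

def likelihood(h):
    """P(D | h) under a binomial model."""
    return comb(flips, heads) * h**heads * (1 - h)**(flips - heads)

# h_MAP = argmax_h P(D | h) P(h); P(D) is dropped, as noted above.
h_map = max(hypotheses, key=lambda h: likelihood(h) * prior[h])
print(h_map)  # 0.5: the strong prior on the fair coin outweighs the 7/10 heads
```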
Maximum Likelihood
Now assume that all hypotheses are equally probable a priori, i.e., P(h_i) = P(h_j) for all h_i, h_j in H.
This is called assuming a uniform prior. It simplifies computing the posterior:
$$h_{ML} = \arg\max_{h \in H} P(D \mid h)$$
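As a small check of why the uniform prior can be dropped (assumed toy numbers, not from the slides): multiplying every likelihood by the same constant 1/|H| cannot change which hypothesis attains the maximum.

```python
# Assumed toy likelihood values P(D | h) for three hypotheses.
likelihoods = {"h1": 0.02, "h2": 0.10, "h3": 0.05}
uniform_prior = 1 / len(likelihoods)  # P(h) = 1/|H| for every h

# Scaling every likelihood by the same constant preserves the argmax,
# so h_ML coincides with h_MAP under a uniform prior.
h_ml = max(likelihoods, key=likelihoods.get)
h_map = max(likelihoods, key=lambda h: likelihoods[h] * uniform_prior)
assert h_ml == h_map == "h2"
```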
MAP Classification
$$c_{MAP} = \arg\max_{c_j \in C} P(c_j \mid x_1, x_2, \ldots, x_n)
          = \arg\max_{c_j \in C} \frac{P(x_1, x_2, \ldots, x_n \mid c_j)\,P(c_j)}{P(x_1, x_2, \ldots, x_n)}
          = \arg\max_{c_j \in C} P(x_1, x_2, \ldots, x_n \mid c_j)\,P(c_j)$$
Parameter Estimation
P(c_j) can be estimated from the frequency of classes in the training examples.
P(x_1, x_2, ..., x_n | c_j) requires O(|X|^n · |C|) parameters, so it could only be
estimated if a very, very large number of training examples were available.
Independence Assumption: attribute values are conditionally independent given the
target value; this is the naïve Bayes assumption:
$$P(x_1, x_2, \ldots, x_n \mid c_j) = \prod_i P(x_i \mid c_j)$$
which gives the naïve Bayes classifier:
$$c_{NB} = \arg\max_{c_j \in C} P(c_j) \prod_i P(x_i \mid c_j)$$
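The following is a minimal sketch of naïve Bayes training (frequency estimates for P(c_j) and P(x_i | c_j)) and classification with the rule above; the tiny two-attribute dataset at the bottom is assumed for illustration only.

```python
from collections import Counter, defaultdict

def train_nb(examples, labels):
    """Estimate P(c_j) and P(x_i | c_j) from frequencies in the training data."""
    n = len(labels)
    class_counts = Counter(labels)
    priors = {c: class_counts[c] / n for c in class_counts}

    # cond_counts[(i, x_i, c)] = N(X_i = x_i, C = c)
    cond_counts = defaultdict(int)
    for x, c in zip(examples, labels):
        for i, value in enumerate(x):
            cond_counts[(i, value, c)] += 1

    def cond_prob(i, value, c):
        return cond_counts[(i, value, c)] / class_counts[c]

    return priors, cond_prob

def classify_nb(x, priors, cond_prob):
    """c_NB = argmax_c P(c) * prod_i P(x_i | c)."""
    def score(c):
        p = priors[c]
        for i, value in enumerate(x):
            p *= cond_prob(i, value, c)
        return p
    return max(priors, key=score)

# Assumed toy data: attributes (outlook, temperature), class = play?
examples = [("sunny", "hot"), ("sunny", "cool"), ("rain", "cool"), ("rain", "hot")]
labels = ["no", "yes", "yes", "no"]
priors, cond_prob = train_nb(examples, labels)
print(classify_nb(("sunny", "cool"), priors, cond_prob))  # -> "yes"
```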
Properties
Estimating P(x_i | c_j) instead of P(x_1, x_2, ..., x_n | c_j) greatly reduces the
number of parameters (and the data sparseness problem).
The learning step in Naïve Bayes consists of estimating P(x_i | c_j) and P(c_j)
based on the frequencies in the training data.
An unseen instance is classified by computing the class that maximizes the posterior.
When conditional independence is satisfied, Naïve Bayes corresponds to MAP classification.
Question: For the day <sunny, cool, high, strong>, what’s
the play prediction?
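One way to work out the answer, assuming the probability estimates that come from the standard 14-day PlayTennis training set in Mitchell's Machine Learning textbook (the table itself is not reproduced on these slides):

```python
# Assumed estimates from the standard PlayTennis data (9 "yes" days, 5 "no" days).
p_yes, p_no = 9/14, 5/14
cond_yes = {"sunny": 2/9, "cool": 3/9, "high": 3/9, "strong": 3/9}
cond_no  = {"sunny": 3/5, "cool": 1/5, "high": 4/5, "strong": 3/5}

day = ["sunny", "cool", "high", "strong"]
score_yes, score_no = p_yes, p_no
for value in day:
    score_yes *= cond_yes[value]
    score_no *= cond_no[value]

print(round(score_yes, 4), round(score_no, 4))        # ~0.0053 vs ~0.0206
print("play =", "yes" if score_yes > score_no else "no")  # predicts "no"
```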
Underflow Prevention
Multiplying lots of probabilities, which are
between 0 and 1 by definition, can result in
floating-point underflow.
Since log(xy) = log(x) + log(y), it is better to
perform all computations by summing logs of
probabilities rather than multiplying
probabilities.
The class with the highest final unnormalized log-probability score is still
the most probable.
$$c_{NB} = \arg\max_{c_j \in C} \Big[ \log P(c_j) + \sum_{i \in \text{positions}} \log P(x_i \mid c_j) \Big]$$
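A minimal log-space variant of the earlier classification sketch; `priors` and `cond_prob` are assumed to be the (smoothed, strictly positive) estimates produced by a training step like the one sketched above.

```python
from math import log

def classify_nb_log(x, priors, cond_prob):
    """c_NB = argmax_c [ log P(c) + sum_i log P(x_i | c) ], avoiding underflow."""
    def log_score(c):
        s = log(priors[c])
        for i, value in enumerate(x):
            s += log(cond_prob(i, value, c))  # assumes smoothed, non-zero estimates
        return s
    return max(priors, key=log_score)
```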
Smoothing to Avoid Overfitting
$$\hat{P}(x_i \mid c_j) = \frac{N(X_i = x_i,\, C = c_j) + 1}{N(C = c_j) + k}$$
where k is the number of values of X_i (add-one smoothing).
A somewhat more subtle version:
$$\hat{P}(x_{i,k} \mid c_j) = \frac{N(X_i = x_{i,k},\, C = c_j) + m\,p_{i,k}}{N(C = c_j) + m}$$
where p_{i,k} is the overall fraction of the data where X_i = x_{i,k}, and m controls
the extent of "smoothing".
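A minimal sketch of both smoothed estimators computed directly from counts; the function names and the toy counts in the usage lines are assumed for illustration.

```python
def smoothed_cond_prob(count_xi_and_c, count_c, num_values_of_xi):
    """P_hat(x_i | c_j) = (N(X_i = x_i, C = c_j) + 1) / (N(C = c_j) + k)."""
    return (count_xi_and_c + 1) / (count_c + num_values_of_xi)

def m_estimate(count_xik_and_c, count_c, p_ik, m):
    """P_hat(x_{i,k} | c_j) = (N(X_i = x_{i,k}, C = c_j) + m*p_ik) / (N(C = c_j) + m)."""
    return (count_xik_and_c + m * p_ik) / (count_c + m)

# Assumed toy counts: an attribute value never seen with class c_j
# still gets a non-zero (smoothed) probability estimate.
print(smoothed_cond_prob(0, 9, 3))        # 1/12 instead of 0
print(m_estimate(0, 9, p_ik=0.25, m=4))   # 1/13 instead of 0
```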