
25 April 2013

Bayesian Decision
Theory
Dr. Anto Satriyo Nugroho,
M.Eng
Center for Information & Communication Technology
Agency for the Assessment & Application of Technology
URL: http://asnugroho.net Email: [email protected]

Introduction
The sea bass / salmon example
State of nature, prior
State of nature is a random variable
The catch of salmon and sea bass is equiprobable:
P(ω1) = P(ω2) (uniform priors)
P(ω1) + P(ω2) = 1 (exclusivity and exhaustivity)

Decision Rules
Decision rule with only the prior information:
Decide ω1 if P(ω1) > P(ω2), otherwise decide ω2
Use of the class-conditional information:
P(x | ω1) and P(x | ω2) describe the difference in lightness between the populations of sea bass and salmon

Probability Density

Posterior, likelihood, evidence


P(ωj | x) = P(x | ωj) P(ωj) / P(x)

where, in the case of two categories,

P(x) = Σ (j = 1..2) P(x | ωj) P(ωj)

Posterior = (Likelihood × Prior) / Evidence

Error probability
Decision given the posterior probabilities
x is an observation for which:
  if P(ω1 | x) > P(ω2 | x), the true state of nature is ω1
  if P(ω1 | x) < P(ω2 | x), the true state of nature is ω2

Therefore, whenever we observe a particular x, the probability of error is:
P(error | x) = P(ω1 | x) if we decide ω2
P(error | x) = P(ω2 | x) if we decide ω1

Minimizing the probability of error


Decide ω1 if P(ω1 | x) > P(ω2 | x); otherwise decide ω2
Therefore:
P(error | x) = min [ P(ω1 | x), P(ω2 | x) ]
(Bayes decision)
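A minimal sketch of this two-class decision rule in C; the likelihood and prior values below are made-up placeholders, not taken from the sea bass / salmon example:

#include <stdio.h>

int main(void)
{
    double like1 = 0.30, like2 = 0.10;   /* assumed P(x | w1), P(x | w2) */
    double prior1 = 0.50, prior2 = 0.50; /* assumed P(w1), P(w2) */
    double evidence = like1 * prior1 + like2 * prior2;
    double post1 = like1 * prior1 / evidence;   /* P(w1 | x) */
    double post2 = like2 * prior2 / evidence;   /* P(w2 | x) */

    /* Bayes decision: pick the class with the larger posterior;
       the error probability is the smaller posterior. */
    printf("decide %s, P(error | x) = %g\n",
           post1 > post2 ? "w1" : "w2",
           post1 > post2 ? post2 : post1);
    return 0;
}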

Bayesian Classifier
Consider each attribute and class label as random variables
Given a record with attributes (A1, A2, ..., An)
Goal is to predict class C
Specifically, we want to find the value of C that maximizes P(C | A1, A2, ..., An)
Can we estimate P(C | A1, A2, ..., An) directly from the data?

Naive Bayes Classifier


Approach:
compute the posterior probability P(C | A1, A2, ..., An) for all values of C using the Bayes theorem:

P(C | A1 A2 ... An) = P(A1 A2 ... An | C) P(C) / P(A1 A2 ... An)

Choose the value of C that maximizes P(C | A1, A2, ..., An)

Equivalent to choosing the value of C that maximizes

P(A1, A2, ..., An | C) P(C)

How can we estimate the value of P(A1, A2, ..., An | C)?

Naive Bayes Classifier


Assume independence among the attributes Ai when the class is given:
P(A1, A2, ..., An | C) = P(A1 | Cj) P(A2 | Cj) ... P(An | Cj)
We can estimate the value of P(Ai | Cj) for all Ai and Cj.
A new point is classified to Cj if P(Cj) Π P(Ai | Cj) is maximal (see the sketch below).
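A minimal C sketch of this decision rule, assuming the prior and the per-attribute conditional probabilities have already been estimated; the numbers below are hypothetical placeholders:

#include <stdio.h>

#define NCLASS 2
#define NATTR  3

int main(void)
{
    /* Assumed estimates: prior[c] = P(Cc), cond[c][i] = P(Ai = observed value | Cc). */
    double prior[NCLASS]       = { 0.6, 0.4 };
    double cond[NCLASS][NATTR] = { { 0.5, 0.2, 0.7 },
                                   { 0.1, 0.6, 0.3 } };
    int best = 0;
    double best_score = -1.0;

    for (int c = 0; c < NCLASS; c++) {
        double score = prior[c];           /* start with P(Cc) */
        for (int i = 0; i < NATTR; i++)
            score *= cond[c][i];           /* multiply by P(Ai | Cc) */
        printf("class %d: P(C) * prod P(Ai | C) = %g\n", c, score);
        if (score > best_score) { best_score = score; best = c; }
    }
    printf("predicted class: %d\n", best);
    return 0;
}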

How can we estimate the probabilities from the data?
Case 1: Discrete data

Class prior: P(C) = Nc / N
e.g., P(No) = 7/10, P(Yes) = 3/10
For discrete attributes:
P(Ai | Ck) = |Aik| / Nc
where |Aik| is the number of instances having attribute value Ai and belonging to class Ck
Examples:
P(Status=Married | No) = 4/7
P(Refund=Yes | Yes) = 0
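The counting itself is straightforward. A small C sketch, using a made-up training table (the slide's original table is not reproduced here, so the rows below are illustrative only):

#include <stdio.h>
#include <string.h>

struct row { const char *status; const char *cls; };

int main(void)
{
    /* Hypothetical records: one discrete attribute (Status) and the class label. */
    struct row data[] = {
        { "Married", "No" },  { "Single", "No" },   { "Married", "No" },
        { "Married", "No" },  { "Divorced", "No" }, { "Single", "No" },
        { "Married", "No" },
        { "Single", "Yes" },  { "Divorced", "Yes" },{ "Single", "Yes" }
    };
    int n = sizeof data / sizeof data[0];
    int n_no = 0, n_married_no = 0;

    for (int i = 0; i < n; i++) {
        if (strcmp(data[i].cls, "No") == 0) {
            n_no++;
            if (strcmp(data[i].status, "Married") == 0)
                n_married_no++;
        }
    }
    /* P(Status=Married | No) = |Aik| / Nc */
    printf("P(Status=Married | No) = %d/%d = %g\n",
           n_married_no, n_no, (double)n_married_no / n_no);
    return 0;
}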

How can we estimate the probabilities from the data?
Case 2: Continuous data

For continuous attributes:
Discretize the range into bins
  one ordinal attribute per bin
  violates the independence assumption
Two-way split: (A < v) or (A > v)
  choose only one of the two splits as the new attribute
Probability density estimation:
  Assume the attribute follows a normal distribution
  Use the data to estimate the parameters of the distribution (e.g., mean and standard deviation)
  Once the probability distribution is known, it can be used to estimate the conditional probability P(Ai | c)

Normal distribution:

P(Ai | cj) = 1 / (sqrt(2π) σij) · exp( −(Ai − μij)² / (2 σij²) )

One distribution for each (Ai, cj) pair


For (Income, Class=No):
If Class=No:
  sample mean = 110
  sample variance = 2975

P(Income=120 | No) = 1 / (sqrt(2π) · 54.54) · exp( −(120 − 110)² / (2 · 2975) ) ≈ 0.0072
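Working the numbers through (σ = √2975 ≈ 54.54): 1 / (2.5066 × 54.54) ≈ 0.00731 and exp(−100 / 5950) ≈ 0.983, so the product is about 0.0072.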


Example of Naive Bayes Classification

Given a test record:
X = (Refund = No, Married, Income = 120K)

P(X | Class=No) = P(Refund=No | Class=No) × P(Married | Class=No) × P(Income=120K | Class=No)
               = 4/7 × 4/7 × 0.0072 = 0.0024

P(X | Class=Yes) = P(Refund=No | Class=Yes) × P(Married | Class=Yes) × P(Income=120K | Class=Yes)
               = 1 × 0 × 1.2 × 10⁻⁹ = 0

Since P(X | No) P(No) > P(X | Yes) P(Yes),
P(No | X) > P(Yes | X)
=> Class = No
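Concretely, with the priors from the earlier slide (P(No) = 7/10, P(Yes) = 3/10): P(X | No) P(No) = 0.0024 × 0.7 ≈ 0.0017, while P(X | Yes) P(Yes) = 0 × 0.3 = 0, so the record is assigned to class No.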


Characteristics of Naive
Bayes Classifier

Robust to isolated noise points
Handles missing values by ignoring the instance during probability estimate calculations
Robust to irrelevant attributes
The independence assumption may not hold for some attributes
  Use other techniques such as Bayesian Belief Networks (BBN)


Example with Iris Dataset

Training set:
  Iris Setosa (ω1): 25 samples (first half of the original dataset)
  Iris Versicolor (ω2): 25 samples (first half of the original dataset)
  Iris Virginica (ω3): 25 samples (first half of the original dataset)
Testing set:
  Iris Setosa (ω1): 25 samples (second half of the original dataset)
  Iris Versicolor (ω2): 25 samples (second half of the original dataset)
  Iris Virginica (ω3): 25 samples (second half of the original dataset)

Suppose we want to classify a datum from the testing set with the following characteristics (the actual class is Iris Versicolor):
  Sepal length: 5.7
  Sepal width: 2.6
  Petal length: 3.5
  Petal width: 1

Solution:
1. Calculate the prior probability
2. Calculate the mean & variance of each feature
3. Calculate the likelihood
4. Calculate prior probability × likelihood
5. Make the decision based on the posterior probability

POSTERIOR = (PRIOR × LIKELIHOOD) / EVIDENCE
(Step 1 gives the prior, Steps 2 & 3 give the likelihood, and Step 4 gives the numerator.)

Step 1: Prior Probability Calculation

P(ω1) = number of ω1 samples / total samples = 25/75 = 0.33
P(ω2) = number of ω2 samples / total samples = 25/75 = 0.33
P(ω3) = number of ω3 samples / total samples = 25/75 = 0.33


Step 2: Mean & Variance Calculation

Iris has continuous attributes, so to calculate the likelihood we have to calculate the mean (μ) and variance (σ²) of each attribute for each class, as in the sketch below.

Step 3: Likelihood Calculation

Suppose we want to classify the datum from the testing set with the following characteristics (the actual class is Iris Versicolor):
  A1: Sepal length = 5.7
  A2: Sepal width = 2.6
  A3: Petal length = 3.5
  A4: Petal width = 1

ω1: Iris Setosa, ω2: Iris Versicolor, ω3: Iris Virginica

P(Ai | ωj) = 1 / (sqrt(2π) σij) · exp( −(Ai − μij)² / (2 σij²) )


C code for Likelihood Calculation

Compile with:
gcc programfilename.c -o programfilename -lm

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main(void)
{
    float x, m, var;

    while (1) {
        printf("attribute value: ");
        scanf("%f", &x);
        printf("attribute mean: ");
        scanf("%f", &m);
        printf("attribute variance: ");
        scanf("%f", &var);
        /* Gaussian density: 1/sqrt(2*pi*var) * exp(-(x-m)^2 / (2*var)) */
        printf("%g\n", 1.0 / sqrt(2 * M_PI * var) * exp(-(x - m) * (x - m) / (2 * var)));
    }
    return 0;
}


Example of how to compile the program and use it to calculate the likelihood P(sepal length=5.7 | Iris Setosa):

$ gcc calculate_likelihood.c -o calculate_likelihood -lm
$ ./calculate_likelihood
attribute value: 5.7
attribute mean: 5.028
attribute variance: 0.16043333
0.243805
(to exit, press CTRL+C)


Likelihood Calculation Results
P(sepal length=5.7 | Iris Setosa)     = 0.241763
P(sepal width=2.6  | Iris Setosa)     = 0.0625788
P(petal length=3.5 | Iris Setosa)     = 1.7052 × 10⁻²³ ≈ 0
P(petal width=1    | Iris Setosa)     = 2.23877 × 10⁻¹¹
P(sepal length=5.7 | Iris Versicolor) = 0.619097
P(sepal width=2.6  | Iris Versicolor) = 0.998687
P(petal length=3.5 | Iris Versicolor) = 0.16855
P(petal width=1    | Iris Versicolor) = 0.481618
P(sepal length=5.7 | Iris Virginica)  = 0.265044
P(sepal width=2.6  | Iris Virginica)  = 0.731322
P(petal length=3.5 | Iris Virginica)  = 0.00256255
P(petal width=1    | Iris Virginica)  = 0.000360401


Step 4: Prior × Likelihood Calculation

prior(Iris Setosa) × P(sepal length=5.7 | Iris Setosa) × P(sepal width=2.6 | Iris Setosa) × P(petal length=3.5 | Iris Setosa) × P(petal width=1 | Iris Setosa) ≈ 0
prior(Iris Versicolor) × P(sepal length=5.7 | Iris Versicolor) × P(sepal width=2.6 | Iris Versicolor) × P(petal length=3.5 | Iris Versicolor) × P(petal width=1 | Iris Versicolor) = 0.016730091
prior(Iris Virginica) × P(sepal length=5.7 | Iris Virginica) × P(sepal width=2.6 | Iris Virginica) × P(petal length=3.5 | Iris Virginica) × P(petal width=1 | Iris Virginica) = 5.96711 × 10⁻⁸
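These products can be checked with a few lines of C, reusing the likelihoods from Step 3 and the prior 25/75:

#include <stdio.h>

int main(void)
{
    /* Likelihoods from Step 3: rows = Setosa, Versicolor, Virginica;
       columns = sepal length, sepal width, petal length, petal width. */
    double like[3][4] = {
        { 0.241763, 0.0625788, 1.7052e-23, 2.23877e-11 },
        { 0.619097, 0.998687,  0.16855,    0.481618    },
        { 0.265044, 0.731322,  0.00256255, 0.000360401 }
    };
    double prior = 25.0 / 75.0;
    const char *name[3] = { "Setosa", "Versicolor", "Virginica" };

    for (int c = 0; c < 3; c++) {
        double score = prior;
        for (int i = 0; i < 4; i++)
            score *= like[c][i];
        printf("prior x likelihood for Iris %s = %g\n", name[c], score);
    }
    return 0;
}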


Step 5: Decision Based on the Posterior Probability

Posterior(Iris Setosa | Sepal length: 5.7, Sepal width: 2.6, Petal length: 3.5, Petal width: 1) = 0 / evidence
Posterior(Iris Versicolor | Sepal length: 5.7, Sepal width: 2.6, Petal length: 3.5, Petal width: 1) = 0.016730091 / evidence
Posterior(Iris Virginica | Sepal length: 5.7, Sepal width: 2.6, Petal length: 3.5, Petal width: 1) = 5.96711 × 10⁻⁸ / evidence

Of the three posterior values above, the second one is the largest. Thus the class of the datum with sepal length 5.7, sepal width 2.6, petal length 3.5 and petal width 1 is Iris Versicolor.

POSTERIOR = (PRIOR × LIKELIHOOD) / EVIDENCE
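Since the evidence is the same denominator for all three classes, it does not change the ranking; for completeness, evidence = 0 + 0.016730091 + 5.96711 × 10⁻⁸ ≈ 0.01673, giving Posterior(Iris Versicolor) ≈ 0.016730091 / 0.01673 ≈ 0.999996.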

Final Examination 2011

The following dataset is part of a microarray dataset used to design a classifier (thus, it is the training set) for classifying and diagnosing cancers using gene expression. The original dataset consists of the expression of 6567 genes (attributes) and 63 training samples of 4 classes: neuroblastoma (NB), rhabdomyosarcoma (RMS), non-Hodgkin lymphoma (NHL) and the Ewing family of tumors (EWS). For the sake of simplicity, only the expression of four genes is shown.

Determine the class of the following datum:

gene-1    gene-2    gene-3    gene-4
0.4964    0.2509    2.714     0.1805
