
MACHINE LEARNING

UNIT 3
BAYESIAN LEARNING
 Bayesian learning methods are relevant to our study of machine learning for two different reasons.
 First, Bayesian learning algorithms that calculate explicit probabilities for hypotheses, such as the naive Bayes classifier, are among the most practical approaches to certain types of learning problems.
 Second, Bayesian methods provide a useful perspective for understanding many learning algorithms that do not explicitly manipulate probabilities.
Features of Bayesian learning methods

 Each observed training example can incrementally decrease or increase the estimated probability that a hypothesis is correct.
 Prior knowledge can be combined with observed data to determine the final probability of a hypothesis.
 Bayesian methods can accommodate hypotheses that make probabilistic predictions.
 New instances can be classified by combining the predictions of multiple hypotheses, weighted by their probabilities.
NAIVE BAYES CLASSIFIER
 One highly practical Bayesian learning method is the naive Bayes learner, often called the naive Bayes classifier.
 The naive Bayes classifier applies to learning tasks where each instance x is described by a conjunction of attribute values and where the target function f(x) can take on any value from some finite set V.
 A set of training examples of the target function is provided, and a new instance is presented, described by the tuple of attribute values (a1, a2, ..., an). The learner is asked to predict the target value, or classification, for this new instance.
 The naive Bayes classifier outputs

   vNB = argmax vj∈V  P(vj) ∏i P(ai | vj)

 where vNB denotes the target value output by the naive Bayes classifier.
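As an illustration, here is a minimal sketch of this decision rule for categorical attributes. The function names and the tiny toy dataset are invented for this example; a practical implementation would also apply smoothing for attribute values not seen during training.

from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """Estimate P(v) and P(a_i | v) from (attribute_tuple, target_value) pairs."""
    class_counts = Counter(v for _, v in examples)
    # cond_counts[(i, a, v)] = number of class-v examples whose i-th attribute is a
    cond_counts = defaultdict(int)
    for attrs, v in examples:
        for i, a in enumerate(attrs):
            cond_counts[(i, a, v)] += 1
    priors = {v: c / len(examples) for v, c in class_counts.items()}
    return priors, cond_counts, class_counts

def classify(attrs, priors, cond_counts, class_counts):
    """Return v_NB = argmax_v P(v) * prod_i P(a_i | v)."""
    best_v, best_score = None, -1.0
    for v, prior in priors.items():
        score = prior
        for i, a in enumerate(attrs):
            score *= cond_counts[(i, a, v)] / class_counts[v]
        if score > best_score:
            best_v, best_score = v, score
    return best_v

# Toy usage with invented data: two attributes, target values 'yes'/'no'.
data = [(("sunny", "hot"), "no"), (("rainy", "mild"), "yes"),
        (("sunny", "mild"), "yes"), (("rainy", "hot"), "no")]
model = train_naive_bayes(data)
print(classify(("sunny", "mild"), *model))   # -> 'yes' on this toy data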
GIBBS ALGORITHM
 Although the Bayes optimal classifier obtains the best performance that can be achieved from the given training data, it can be quite costly to apply.
 The expense is due to the fact that it computes the posterior probability for every hypothesis in H and then combines the predictions of all hypotheses to classify each new instance.
 An alternative, less optimal method is the Gibbs algorithm (see Opper and Haussler 1991), defined as follows:
 The Gibbs algorithm simply applies a hypothesis drawn at random according to the current posterior probability distribution.
 Surprisingly, it can be shown that under certain conditions the expected misclassification error for the Gibbs algorithm is at most twice the expected error of the Bayes optimal classifier.
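A minimal sketch of the idea, assuming we already have a list of hypotheses together with their posterior probabilities (the hypothesis objects and the toy values below are invented for illustration):

import random

def gibbs_classify(x, hypotheses, posteriors):
    """Draw one hypothesis according to P(h|D) and use it alone to classify x.

    hypotheses: list of callables h(x) -> label
    posteriors: list of posterior probabilities P(h|D), summing to 1
    """
    h = random.choices(hypotheses, weights=posteriors, k=1)[0]
    return h(x)

# Toy usage with three hand-made hypotheses (posteriors 0.4, 0.3, 0.3).
h1 = lambda x: "positive"
h2 = lambda x: "negative"
h3 = lambda x: "negative"
print(gibbs_classify("some instance", [h1, h2, h3], [0.4, 0.3, 0.3]))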
AN EXAMPLE: LEARNING TO CLASSIFY TEXT
K-NEAREST NEIGHBOUR CLASSIFICATION
K-Nearest Neighbours
 K-Nearest Neighbours is one of the most basic yet essential classification algorithms in machine learning.
 It belongs to the supervised learning domain.
 It has applications in pattern recognition, data mining and intrusion detection.
 We are given some prior data (also called training data), which classifies coordinates into groups identified by an attribute.
 Given another set of data points (also called testing data), the task is to allocate these points to a group by analysing the training set.
 Given an unclassified point, we can assign it to a group by observing what group its nearest neighbours belong to (see the sketch after this list).
 This means a point close to a cluster of points classified as 'Red' has a higher probability of getting classified as 'Red'.
 Intuitively, we can see that the first point (2.5, 7) should be classified as 'Green' and the second point (5.5, 4.5) should be classified as 'Red'.
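The following is a minimal sketch of unweighted k-NN classification with Euclidean distance. The toy training points and labels are invented for illustration; they are not the points from the original figure.

import math
from collections import Counter

def knn_classify(query, training_points, labels, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = [
        (math.dist(query, p), label)
        for p, label in zip(training_points, labels)
    ]
    distances.sort(key=lambda d: d[0])
    k_nearest_labels = [label for _, label in distances[:k]]
    return Counter(k_nearest_labels).most_common(1)[0][0]

# Toy usage with invented 2-D points.
points = [(1, 1), (2, 2), (1, 2), (6, 6), (7, 7), (6, 7)]
labels = ["Green", "Green", "Green", "Red", "Red", "Red"]
print(knn_classify((2.0, 1.5), points, labels, k=3))   # -> 'Green' on this toy data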
Weighted K-Nearest Neighbour Example
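The worked example from the original slide is not reproduced in the extracted text. As a stand-in, the sketch below shows one common weighting scheme, distance-weighted voting, where each of the k neighbours contributes a vote weighted by the inverse of its distance (data and function names are invented for illustration):

import math
from collections import defaultdict

def weighted_knn_classify(query, training_points, labels, k=3):
    """Distance-weighted k-NN: closer neighbours get larger votes (1/d weighting)."""
    distances = sorted(
        (math.dist(query, p), label)
        for p, label in zip(training_points, labels)
    )
    votes = defaultdict(float)
    for d, label in distances[:k]:
        if d == 0:                 # query coincides with a training point
            return label
        votes[label] += 1.0 / d    # inverse-distance weight
    return max(votes, key=votes.get)

# Same invented toy data as above.
points = [(1, 1), (2, 2), (1, 2), (6, 6), (7, 7), (6, 7)]
labels = ["Green", "Green", "Green", "Red", "Red", "Red"]
print(weighted_knn_classify((3.0, 3.0), points, labels, k=3))   # -> 'Green'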
Minimum Description Length Principle
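The slide body for this topic did not survive the text extraction. As a reference point, the principle is commonly stated (for example in Mitchell's formulation) as choosing the hypothesis that minimises the total description length of the hypothesis plus the description length of the data given that hypothesis:

% L_{C_1}(h): description length of hypothesis h under encoding C_1
% L_{C_2}(D|h): description length of the training data D given h, under encoding C_2
h_{MDL} = \arg\min_{h \in H} \left[ L_{C_1}(h) + L_{C_2}(D \mid h) \right]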
BAYES OPTIMAL CLASSIFIER
 The Bayes Optimal Classifier is a probabilistic model that predicts the most likely outcome for a new instance.
 Bayes' theorem is a method for calculating a hypothesis's probability based on its prior probability, the probability of observing specific data given the hypothesis, and the observed data itself.
 Maximum a Posteriori (MAP) is a probabilistic framework for determining the most likely hypothesis for a training dataset.
 Take a hypothesis space that has 3 hypotheses h1, h2, and h3.

 The posterior probabilities of the hypotheses are as follows:
   h1 -> 0.4
   h2 -> 0.3
   h3 -> 0.3
 Hence, h1 is the MAP hypothesis (MAP => maximum a posteriori).
 Suppose a new instance x is encountered, which is classified negative by h2 and h3 but positive by h1.
 Taking all hypotheses into account, the probability that x is positive is 0.4 and the probability that it is negative is therefore 0.6.
 The classification generated by the MAP hypothesis thus differs from the most probable classification, which in this case is negative.

 The most probable classification of the new instance is obtained by combining the predictions of all hypotheses, weighted by their posterior probabilities.
 If the new instance's classification can be any value vj from a set V, the probability P(vj|D) that the right classification for the new instance is vj is

   P(vj|D) = Σ hi∈H  P(vj|hi) P(hi|D)

 The denominator is omitted since we are only using this for comparison and all the values of P(vj|D) would have the same denominator.
 The value vj for which P(vj|D) is maximum is the best classification for the new instance.
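To make the arithmetic of the three-hypothesis example concrete, here is a small sketch that combines the hypotheses' predictions weighted by their posteriors. Representing each hypothesis's prediction as a dictionary of per-class probabilities is an assumption made only for this illustration.

def bayes_optimal_classify(posteriors, predictions):
    """Return argmax over v of sum_i P(v | h_i) * P(h_i | D).

    posteriors: list of P(h_i | D)
    predictions: list of dicts mapping class value -> P(v | h_i)
    """
    combined = {}
    for p_h, pred in zip(posteriors, predictions):
        for v, p_v_given_h in pred.items():
            combined[v] = combined.get(v, 0.0) + p_h * p_v_given_h
    return max(combined, key=combined.get), combined

# The example from the slides: P(h1|D)=0.4, P(h2|D)=P(h3|D)=0.3,
# h1 predicts positive, h2 and h3 predict negative (deterministically).
posteriors = [0.4, 0.3, 0.3]
predictions = [{"positive": 1.0, "negative": 0.0},
               {"positive": 0.0, "negative": 1.0},
               {"positive": 0.0, "negative": 1.0}]
print(bayes_optimal_classify(posteriors, predictions))
# Combined probabilities are 0.4 for positive and 0.6 for negative, so 'negative' wins.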
Computational Learning Theory
PAC MODEL (PROBABLY APPROXIMATELY CORRECT)
 PAC (Probably Approximately Correct) learning is a framework used for the mathematical analysis of machine learning. A PAC learner tries to learn a concept (approximately correctly) by selecting, from a set of hypotheses, a hypothesis that has a low generalization error.
 A hypothesis h is approximately correct if its error with respect to the target concept c is small: error(h) = Pr(h ⊕ c) <= ɛ, where 0 < ɛ < 0.5 and h ⊕ c denotes the set of instances on which h and c disagree.
 The learner is probably approximately correct if it outputs such a hypothesis with probability at least 1 - δ, i.e. Pr(error(h) <= ɛ) >= 1 - δ, where δ is the confidence parameter and 0 < δ < 0.5. The formal definition is stated below.
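For reference, the usual formal statement of PAC-learnability (following the standard textbook definition, e.g. Mitchell's) can be written as follows, where X is the instance space, D a distribution over X, n the size of instances, and size(c) the encoding length of the target concept:

% C is PAC-learnable by learner L using hypothesis space H if:
\forall c \in C,\; \forall \text{ distributions } \mathcal{D} \text{ over } X,\;
\forall \varepsilon, \delta \in (0, \tfrac{1}{2}):\quad
\Pr\!\left[ \mathrm{error}_{\mathcal{D}}(h_L) \le \varepsilon \right] \ge 1 - \delta,
% with L running in time polynomial in 1/ε, 1/δ, n, and size(c).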
SAMPLE COMPLEXITY FOR FINITE HYPOTHESIS SPACES
 PAC-learnability is largely determined by the number of training examples required by the learner.
 The growth in the number of required training examples with problem size is called the sample complexity of the learning problem.
 A learner is consistent if it outputs hypotheses that perfectly fit the training data, whenever possible.
 We can derive a bound on the number of training examples required by any consistent learner, independent of the specific algorithm it uses (see the bound quoted below).
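The bound referred to above is the standard sample-complexity bound for consistent learners in a finite hypothesis space (as derived via the version-space argument in Mitchell's treatment): with probability at least 1 - δ, every hypothesis consistent with the training data has true error at most ɛ, provided the number of training examples m satisfies

m \ge \frac{1}{\varepsilon} \left( \ln\lvert H \rvert + \ln\frac{1}{\delta} \right)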
SAMPLE COMPLEXITY FOR INFINITE HYPOTHESIS SPACES
 We consider a second measure of the complexity of H, called the Vapnik-Chervonenkis dimension of H (VC(H)), and state bounds on sample complexity that use VC(H) rather than |H| (one such bound is quoted below).
 The VC dimension measures the complexity of the hypothesis space H, not by the number of distinct hypotheses |H|, but instead by the number of distinct instances from X that can be completely discriminated using H.
 It uses a notion called shattering: a set of instances S is shattered by H if, for every possible dichotomy (labelling) of S, there exists some hypothesis in H consistent with that dichotomy. VC(H) is the size of the largest finite subset of X shattered by H.
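For reference, the analogue of the finite-space bound, stated in terms of VC(H) (the upper bound due to Blumer et al., as quoted in Mitchell's treatment), is

m \ge \frac{1}{\varepsilon} \left( 4 \log_2\frac{2}{\delta} + 8\, VC(H) \log_2\frac{13}{\varepsilon} \right)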
