
Machine Learning Notes, 6th Sem CSE Elective, 2019-20 | Sujata Joshi, Assoc Prof, CSE

UNIT 3- BAYESIAN LEARNING

Bayes Theorem, Concept Learning, Maximum Likelihood, Minimum Description Length Principle,
Bayes Optimal Classifier, Gibbs Algorithm, Naïve Bayes Classifier, Bayesian Belief Network, EM
Algorithm

Introduction
Bayesian reasoning provides a probabilistic approach to inference. It is based on the assumption that
the quantities of interest are governed by probability distributions and that optimal decisions can be
made by reasoning about these probabilities together with observed data. It is important to machine
learning because it provides a quantitative approach to weighing the evidence supporting alternative
hypotheses. Bayesian reasoning provides the basis for learning algorithms that directly manipulate
probabilities, as well as a framework for analyzing the operation of other algorithms that do not
explicitly manipulate probabilities.

Features of Bayesian learning:


• Each observed training example can incrementally decrease or increase the estimated
  probability that a hypothesis is correct. This provides a more flexible approach to learning
  than algorithms that completely eliminate a hypothesis if it is found to be inconsistent with
  any single example.
• Prior knowledge can be combined with observed data to determine the final probability of a
  hypothesis. In Bayesian learning, prior knowledge is provided by asserting (1) a prior
  probability for each candidate hypothesis, and (2) a probability distribution over observed
  data for each possible hypothesis.
• Bayesian methods can accommodate hypotheses that make probabilistic predictions (e.g.,
  hypotheses such as "this pneumonia patient has a 93% chance of complete recovery").
• New instances can be classified by combining the predictions of multiple hypotheses,
  weighted by their probabilities.
• Even in cases where Bayesian methods prove computationally intractable, they can provide a
  standard of optimal decision making against which other practical methods can be measured.

Limitations of Bayesian learning

• One practical difficulty in applying Bayesian methods is that they typically require initial
  knowledge of many probabilities. When these probabilities are not known in advance, they
  are often estimated based on background knowledge, previously available data, and
  assumptions about the form of the underlying distributions.
• A second practical difficulty is the significant computational cost required to determine the
  Bayes optimal hypothesis in the general case.


BAYES THEOREM
In machine learning our objective is to determine the best hypothesis from some space H, given the
observed training data D. One way to specify the best hypothesis is to say that we demand the most
probable hypothesis, given the data D plus any initial knowledge about the prior probabilities of the
various hypotheses in H. Bayes theorem provides a way to calculate the probability of a hypothesis
based on its prior probability, the probabilities of observing various data given the hypothesis, and
the observed data itself.

Notations
P(h) - the initial probability that hypothesis h holds, before we have observed the training data.

P(h) is often called the prior probability of h and may reflect any background knowledge we have
about the chance that h is a correct
hypothesis. If we have no such prior knowledge, then we might simply assign the same prior
probability to each candidate hypothesis.

P(D) - the prior probability that training data D will be observed (i.e., the probability of D given no
knowledge about which hypothesis holds).

P(D|h) - the probability of observing data D given some world in which hypothesis h holds. More
generally, we write P(x|y) to denote the probability of x given y.
In machine learning problems we are interested in the probability P(h|D) that h holds given the
observed training data D.

P(h|D) is called the posterior probability of h, because it reflects our confidence that h holds after
we have seen the training data D.

According to Bayes theorem, we can compute the posterior probability P(h|D) from the prior
probability P(h), together with P(D) and P(D|h), as

P(h|D) = P(D|h) P(h) / P(D)

In many learning scenarios, the learner considers some set of candidate hypotheses H and is
interested in finding the most probable hypothesis h ∈ H given the observed data D. Any such
maximally probable hypothesis is called a maximum a posteriori (MAP) hypothesis. We can
determine the MAP hypotheses by using Bayes theorem to calculate the posterior probability of each
candidate hypothesis.

More precisely, we will say that hMAP is a MAP hypothesis provided

hMAP = argmax_{h in H} P(h|D)
     = argmax_{h in H} P(D|h) P(h) / P(D)
     = argmax_{h in H} P(D|h) P(h)


Notice in the final step above we dropped the term P(D) because it is a constant independent of h. In
some cases, we will assume that every hypothesis in H is equally probable a priori (P(hi) = P(hj) for
all hi and hj in H). In this case we need only consider the term P(D|h) to find the most probable
hypothesis. P(D|h) is often called the likelihood of the data D given h, and any hypothesis that
maximizes P(D|h) is called a maximum likelihood (ML) hypothesis, hML:

hML = argmax_{h in H} P(D|h)
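To make these definitions concrete, the short Python sketch below computes hMAP and hML for a small hypothesis space. The hypothesis names, priors, and likelihoods are invented purely for illustration and are not taken from these notes.

# A minimal sketch of the MAP and ML hypothesis definitions.
# The priors and likelihoods below are made-up illustrative numbers.

priors = {"h1": 0.3, "h2": 0.4, "h3": 0.3}           # P(h)
likelihoods = {"h1": 0.05, "h2": 0.01, "h3": 0.20}   # P(D|h)

# MAP hypothesis: argmax_h P(D|h) * P(h)   (P(D) is a constant and can be dropped)
h_map = max(priors, key=lambda h: likelihoods[h] * priors[h])

# ML hypothesis: argmax_h P(D|h)   (appropriate when all hypotheses are equally probable a priori)
h_ml = max(likelihoods, key=likelihoods.get)

print("hMAP =", h_map)   # h3: 0.20*0.3 = 0.06 beats 0.05*0.3 and 0.01*0.4
print("hML  =", h_ml)    # h3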

An Example
Consider a medical diagnosis problem in which there are two alternative hypotheses:
(1) that the patient has a particular form of cancer.
(2) that the patient does not.

The available data is from a particular laboratory test with two possible outcomes:
+ (positive) and - (negative).
We have prior knowledge that over the entire population of people only .008 have this
disease. The lab test is only an imperfect indicator of the disease. The test returns a correct
positive result in only 98% of the cases in which the disease is actually present and a correct
negative result in only 97% of the cases in which the disease is not present. In other cases,
the test returns the opposite result. The above situation can be summarized by the following
probabilities:

P(cancer) = 0.008        P(¬cancer) = 0.992
P(+|cancer) = 0.98       P(-|cancer) = 0.02
P(+|¬cancer) = 0.03      P(-|¬cancer) = 0.97

Suppose we now observe a new patient for whom the lab test returns a positive result. Should we
diagnose the patient as having cancer or not? The maximum a posteriori hypothesis can be found
using

P(+|cancer) P(cancer) = 0.98 × 0.008 = 0.0078
P(+|¬cancer) P(¬cancer) = 0.03 × 0.992 = 0.0298

Thus, hMAP = ¬cancer.
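The arithmetic in this example can be checked with a few lines of Python; this is only a sketch that re-uses the probabilities stated above.

# Worked cancer-diagnosis example: which hypothesis is MAP given a positive test?
p_cancer, p_no_cancer = 0.008, 0.992            # priors
p_pos_given_cancer = 0.98                       # test sensitivity
p_pos_given_no_cancer = 1 - 0.97                # false-positive rate = 0.03

score_cancer = p_pos_given_cancer * p_cancer           # P(+|cancer) P(cancer)   = 0.00784
score_no_cancer = p_pos_given_no_cancer * p_no_cancer  # P(+|~cancer) P(~cancer) = 0.02976

h_map = "cancer" if score_cancer > score_no_cancer else "not cancer"
print(h_map)                                    # not cancer

# Normalizing gives the posterior probability of cancer given a positive test:
print(score_cancer / (score_cancer + score_no_cancer))   # about 0.21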


BAYES THEOREM AND CONCEPT LEARNING

Brute-Force Bayes Concept Learning


We can design a straightforward concept learning algorithm to output the maximum a posteriori
hypothesis, based on Bayes theorem, as follows:

BRUTE-FORCE MAP LEARNING algorithm
1. For each hypothesis h in H, calculate the posterior probability
   P(h|D) = P(D|h) P(h) / P(D)
2. Output the hypothesis hMAP with the highest posterior probability.

This algorithm may require significant computation, because it applies Bayes theorem to each
hypothesis in H to calculate P(h|D).

In order to specify a learning problem for the BRUTE-FORCE MAP LEARNING algorithm, we must
specify what values are to be used for P(h) and for P(D|h).

The following are the assumptions.


1. The training data D is noise free (i.e., di = c(xi)).
2. The target concept c is contained in the hypothesis space H
3. We have no a priori reason to believe that any hypothesis is more probable than any other.

To specify P(h)
Given no prior knowledge that one hypothesis is more likely than another, it is reasonable to assign
the same prior probability to every hypothesis h in H. Furthermore, because we assume the target
concept is contained in H, we should require that these prior probabilities sum to 1. Together these
constraints imply that we should choose

P(h) = 1/|H|   for all h in H

To specify P(D|h)
P(D|h) is the probability of observing the target values D = (d1, ..., dm) for the fixed set of instances
(x1, ..., xm), given a world in which hypothesis h holds. Since we assume noise-free training data, the
probability of observing classification di given h is just 1 if di = h(xi) and 0 if di ≠ h(xi). Therefore,

P(D|h) = 1   if di = h(xi) for every di in D
P(D|h) = 0   otherwise

In other words, the probability of data D given hypothesis h is 1 if D is consistent with h, and 0
otherwise.

By Bayes theorem, we have

P(h|D) = P(D|h) P(h) / P(D)
First consider the case where h is inconsistent with the training data D. Since P(D|h) is 0 when h is
inconsistent with D, we have

P(h|D) = (0 · P(h)) / P(D) = 0
The posterior probability of a hypothesis inconsistent with D is zero.

Now consider the case where h is consistent with D. Since P(D|h) is 1 when h is consistent with D,
we have

P(h|D) = (1 · 1/|H|) / P(D) = (1/|H|) / (|VS_H,D| / |H|) = 1 / |VS_H,D|

where the total probability of the data under the uniform prior is P(D) = |VS_H,D| / |H|.

To summarize, Bayes theorem implies that the posterior probability P(h|D) under our assumed P(h)
and P(D|h) is

P(h|D) = 1 / |VS_H,D|   if h is consistent with D
P(h|D) = 0              otherwise


where |VS_H,D| is the number of hypotheses from H consistent with D (the size of the version space of H with respect to D).
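The following Python sketch illustrates this result for brute-force MAP learning with noise-free data and a uniform prior. The toy hypothesis space and training examples are invented for illustration; consistent hypotheses end up sharing posterior probability 1/|VS_H,D| and all others get 0.

# Brute-force MAP learning over a toy hypothesis space.
# Each hypothesis maps an instance (an integer here) to a boolean classification;
# the hypotheses and training data below are made up for illustration.

hypotheses = {
    "h_ge_2": lambda x: x >= 2,
    "h_ge_3": lambda x: x >= 3,
    "h_ge_5": lambda x: x >= 5,
    "h_even": lambda x: x % 2 == 0,
}
train = [(4, True), (7, True), (1, False)]   # noise-free (x, c(x)) pairs

def likelihood(h, data):
    # P(D|h) = 1 if h is consistent with every training example, else 0
    return 1.0 if all(h(x) == label for x, label in data) else 0.0

prior = 1.0 / len(hypotheses)                # uniform prior P(h) = 1/|H|
unnorm = {name: likelihood(h, train) * prior for name, h in hypotheses.items()}
p_data = sum(unnorm.values())                # P(D) = |VS_H,D| / |H|
posterior = {name: v / p_data for name, v in unnorm.items()}

print(posterior)   # h_ge_2 and h_ge_3 each get 0.5 (= 1/|VS_H,D|); the rest get 0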

BAYES OPTIMAL CLASSIFIER


The question considered till now is "What is the most probable hypothesis given the training data?"
In fact, the question that is often most significant is "What is the most probable classification of the
new instance given the training data?"

Consider a hypothesis space containing three hypotheses, h1, h2, and h3. Suppose that the posterior
probabilities of these hypotheses given the training data are 0.4, 0.3, and 0.3 respectively. Thus, h1 is
the MAP hypothesis. Suppose a new instance x is encountered, which is classified positive by h1,
but negative by h2 and h3. Taking all hypotheses into account, the probability that x is positive is 0.4
(the probability associated with h1), and the probability that it is negative is therefore 0.6. The most
probable classification (negative) in this case is different from the classification generated by the
MAP hypothesis.

In general, the most probable classification of the new instance is obtained by combining the
predictions of all hypotheses, weighted by their posterior probabilities.
If the possible classification of the new example can take on any value vj from some set V, then the
probability P(vj|D) that the correct classification for the new instance is vj, is

P(vj|D) = Σ_{hi in H} P(vj|hi) P(hi|D)

The optimal classification of the new instance is the value vj for which P(vj|D) is maximum.

To illustrate in terms of the above example, the set of possible classifications of the new instance is
V = {+, -}, and

P(h1|D) = 0.4,   P(+|h1) = 1,   P(-|h1) = 0
P(h2|D) = 0.3,   P(+|h2) = 0,   P(-|h2) = 1
P(h3|D) = 0.3,   P(+|h3) = 0,   P(-|h3) = 1

therefore

Σ_{hi in H} P(+|hi) P(hi|D) = 0.4
Σ_{hi in H} P(-|hi) P(hi|D) = 0.6


Any system that classifies new instances according to

argmax_{vj in V} Σ_{hi in H} P(vj|hi) P(hi|D)

is called a Bayes optimal classifier, or Bayes optimal learner. This method maximizes the
probability that the new instance is classified correctly, given the available data, hypothesis space,
and prior probabilities over the hypotheses.
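The three-hypothesis example above can be reproduced with a short Python sketch; the posteriors and per-hypothesis predictions are taken directly from that example.

# Bayes optimal classification for the h1/h2/h3 example.
posterior = {"h1": 0.4, "h2": 0.3, "h3": 0.3}            # P(h|D)
# P(v|h) for the new instance x: h1 says positive, h2 and h3 say negative
predicts = {"h1": {"+": 1.0, "-": 0.0},
            "h2": {"+": 0.0, "-": 1.0},
            "h3": {"+": 0.0, "-": 1.0}}

def p_class(v):
    # P(v|D) = sum over h of P(v|h) * P(h|D)
    return sum(predicts[h][v] * posterior[h] for h in posterior)

scores = {v: p_class(v) for v in ("+", "-")}
print(scores)                       # {'+': 0.4, '-': 0.6}
print(max(scores, key=scores.get))  # '-' : the Bayes optimal classification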

GIBBS ALGORITHM
Although the Bayes optimal classifier obtains the best performance that can be achieved from the
given training data, it can be quite costly to apply. The expense arises because it computes the posterior
probability for every hypothesis in H and then combines the predictions of all hypotheses to
classify each new instance.

An alternative, less optimal method is the Gibbs algorithm, defined as follows:


1. Choose a hypothesis h from H at random, according to the posterior probability distribution over
H.
2. Use h to predict the classification of the next instance x.

Given a new instance to classify, the Gibbs algorithm simply applies a hypothesis drawn at random
according to the current posterior probability distribution. It can be shown that, under certain
conditions (in particular, when the target concept is itself drawn at random according to the prior
distribution assumed by the learner), the expected misclassification error of the Gibbs algorithm is
at worst twice the expected error of the Bayes optimal classifier.
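A minimal Python sketch of the Gibbs algorithm, re-using the posterior from the earlier three-hypothesis example: one hypothesis is sampled in proportion to its posterior probability and then used to classify the instance.

import random

# Gibbs algorithm sketch: sample one hypothesis according to P(h|D), then classify with it.
posterior = {"h1": 0.4, "h2": 0.3, "h3": 0.3}             # P(h|D), from the earlier example
predicts = {"h1": "+", "h2": "-", "h3": "-"}              # each hypothesis's label for x

names = list(posterior)
h = random.choices(names, weights=[posterior[n] for n in names], k=1)[0]
print("sampled hypothesis:", h, "-> classification:", predicts[h])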


NAIVE BAYES CLASSIFIER


A highly practical Bayesian learning method is the naive Bayes learner, often called the naive
Bayes classifier. In some domains its performance has been shown to be comparable to that of
neural network and decision tree learning.

The naive Bayes classifier applies to learning tasks where each instance x is described by a
conjunction of attribute values and where the target function f(x) can take on any value from some
finite set V. A set of training examples of the target function is provided, and a new instance is
presented, described by the tuple of attribute values (a1, a2, ..., an). The learner is asked to predict the
target value, or classification, for this new instance.

The Bayesian approach to classifying the new instance is to assign the most probable target value,
vMAP, given the attribute values (a1, a2, ..., an) that describe the instance:

vMAP = argmax_{vj in V} P(vj | a1, a2, ..., an)

We can use Bayes theorem to rewrite this expression as

vMAP = argmax_{vj in V} P(a1, a2, ..., an | vj) P(vj) / P(a1, a2, ..., an)
     = argmax_{vj in V} P(a1, a2, ..., an | vj) P(vj)

The naive Bayes classifier is based on the simplifying assumption that the attribute values are
conditionally independent given the target value. In other words, the assumption is that given the
target value of the instance, the probability of observing the conjunction a1, a2, ..., an is just the
product of the probabilities for the individual attributes: P(a1, a2, ..., an | vj) = Π_i P(ai | vj).
Substituting this into the expression above gives the naive Bayes classifier:

vNB = argmax_{vj in V} P(vj) Π_i P(ai | vj)

where vNB denotes the target value output by the naive Bayes classifier.

Estimating probabilities: when the training data are limited, the probabilities P(ai|vj) can be estimated
using the m-estimate

(nc + m·p) / (n + m)

Here, nc is the number of training examples satisfying the condition (having attribute value ai and class vj),
n is the total number of training examples of class vj, p is the prior estimate of the probability we wish to
determine, and m is a constant called the equivalent sample size, which determines how heavily to weight
p relative to the observed data.
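The sketch below shows one possible implementation of a naive Bayes classifier with m-estimate smoothing. The tiny training set, the attribute names, and the choice of p = 1/(number of values of the attribute) are illustrative assumptions, not part of these notes.

from collections import Counter, defaultdict

# Toy training data: each example is ({attribute: value, ...}, class_label).
# The data set is invented purely for illustration.
train = [
    ({"Outlook": "Sunny",    "Wind": "Weak"},   "No"),
    ({"Outlook": "Sunny",    "Wind": "Strong"}, "No"),
    ({"Outlook": "Rain",     "Wind": "Weak"},   "Yes"),
    ({"Outlook": "Overcast", "Wind": "Weak"},   "Yes"),
]
m = 3.0                                                # equivalent sample size

class_counts = Counter(label for _, label in train)
attr_values = defaultdict(set)
for x, _ in train:
    for a, v in x.items():
        attr_values[a].add(v)

def p_attr_given_class(a, v, c):
    # m-estimate: (nc + m*p) / (n + m), with p = 1 / |values of attribute a|
    n = class_counts[c]
    nc = sum(1 for x, label in train if label == c and x[a] == v)
    p = 1.0 / len(attr_values[a])
    return (nc + m * p) / (n + m)

def classify(x):
    scores = {}
    for c, n in class_counts.items():
        score = n / len(train)                         # P(vj)
        for a, v in x.items():
            score *= p_attr_given_class(a, v, c)       # product of P(ai|vj)
        scores[c] = score
    return max(scores, key=scores.get)

print(classify({"Outlook": "Sunny", "Wind": "Weak"}))  # expected: "No"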


Example




2.

i. Estimate the conditional probabilities P(Color|Yes), P(Color|No), P(Type|Yes), P(Type|No),
P(Origin|Yes), P(Origin|No). Predict the class of the example (Red, Domestic, SUV) using Naïve
Bayes.
ii. Estimate the conditional probabilities using the m-estimate. Predict the class of the example
(Red, Domestic, SUV) using Naïve Bayes.


Bayesian Belief Networks

Bayesian Belief Network is a Probabilistic Graphical Model (PGM) that represents a set of variables
and their conditional dependencies using a directed acyclic graph.

Bayesian networks are probabilistic because they are built from probability distributions and use
probability theory for tasks such as prediction and anomaly detection.

Many real-world problems are probabilistic in nature, and a Bayesian network gives us a way to
represent the relationships between multiple events. It can be used for various tasks including
prediction, anomaly detection, diagnostics, automated insight, reasoning, time-series prediction, and
decision making under uncertainty.

A Bayesian network can be used for building models from data and expert opinions, and it consists of
two parts:

o Directed Acyclic Graph


o Table of conditional probabilities.

A Bayesian network graph is made up of nodes and arcs (directed links), where

o Each node corresponds to a random variable, and a variable can be continuous or discrete.
o Arcs (directed arrows) represent causal relationships or conditional dependencies between
  random variables. These directed links connect pairs of nodes in the graph. A link indicates
  that one node directly influences the other; if there is no directed link between two nodes,
  there is no direct dependence between them.
o In the example diagram of the network, A, B, C, and D are random variables represented by
  the nodes of the graph.
o If node B is connected to node A by a directed arrow pointing from A to B, then node A is
  called the parent of node B.


o Node C is independent of node A.


o Each node in the Bayesian network has a conditional probability distribution P(Xi | Parents(Xi)),
  which quantifies the effect of the parents on that node.

Benefits of Bayesian Belief Networks

o Visualization. Provides a direct way to visualize the structure of the model and to motivate
  the design of new models.
o Relationships. Provides insights into the presence and absence of the relationships
between random variables.
o Computations. Provides a way to structure complex probability calculations.

Joint probability distribution:

If we have variables x1, x2, x3, ..., xn, then the probability of a particular combination of values of
x1, x2, x3, ..., xn is given by the joint probability distribution.

P[x1, x2, x3, ..., xn] can be written in terms of conditional probabilities using the chain rule:

P[x1, x2, x3, ..., xn] = P[x1 | x2, x3, ..., xn] P[x2, x3, ..., xn]

= P[x1 | x2, x3, ..., xn] P[x2 | x3, ..., xn] ... P[xn-1 | xn] P[xn]

In general, for each variable Xi in a Bayesian network we can write

P(Xi | Xi-1, ..., X1) = P(Xi | Parents(Xi))

provided the variables are ordered so that each node's parents appear before it.
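This factorization translates directly into code: given each node's parents and its CPT, the joint probability of a complete assignment is the product of P(Xi | Parents(Xi)). The two-node network (Rain -> WetGrass) and its numbers below are made up purely for illustration.

# Sketch: joint probability of a full assignment in a Bayesian network,
# computed as the product of P(Xi | Parents(Xi)).
parents = {"Rain": [], "WetGrass": ["Rain"]}

# CPTs: map (tuple of parent values) -> P(node = True | parents)
cpt = {
    "Rain":     {(): 0.2},
    "WetGrass": {(True,): 0.9, (False,): 0.1},
}

def p_node(node, value, assignment):
    key = tuple(assignment[p] for p in parents[node])
    p_true = cpt[node][key]
    return p_true if value else 1.0 - p_true

def joint(assignment):
    # P(x1, ..., xn) = product over nodes of P(xi | parents(xi))
    prob = 1.0
    for node, value in assignment.items():
        prob *= p_node(node, value, assignment)
    return prob

print(joint({"Rain": True,  "WetGrass": True}))   # 0.2 * 0.9 = 0.18
print(joint({"Rain": False, "WetGrass": True}))   # 0.8 * 0.1 = 0.08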

Example: Harry installed a new burglar alarm at his home to detect burglary. The alarm responds
reliably to a burglary, but it also responds to minor earthquakes. Harry has two neighbors, David and
Sophia, who have taken responsibility to inform Harry at work when they hear the alarm. David
always calls Harry when he hears the alarm, but sometimes he confuses the telephone ringing with
the alarm and calls then as well. On the other hand, Sophia likes to listen to loud music, so sometimes
she fails to hear the alarm. Here we would like to compute the probability of events involving the
burglary, the earthquake, the alarm, and the two calls.

Problem:

Calculate the probability that the alarm has sounded, but neither a burglary nor an earthquake has
occurred, and both David and Sophia have called Harry.


Solution:

o The Bayesian network for the above problem is given below. The network structure shows
  that Burglary and Earthquake are the parent nodes of Alarm and directly affect the
  probability of the alarm going off, whereas David's and Sophia's calls depend only on the
  alarm.
o The network thus represents our assumptions that David and Sophia do not directly perceive
  the burglary, do not notice minor earthquakes, and do not confer before calling.
o The conditional distribution for each node is given as a conditional probability table, or
  CPT.
o Each row in a CPT must sum to 1, because the entries in a row represent an exhaustive set of
  cases for the variable.
o In a CPT, a Boolean variable with k Boolean parents requires 2^k rows, one for each
  combination of parent values. Hence, if there are two parents, the CPT contains 4 rows of
  probability values.

List of all events occurring in this network:

o Burglary (B)
o Earthquake (E)
o Alarm (A)
o David Calls (D)
o Sophia Calls (S)

We can write the events of the problem statement as the joint probability P[D, S, A, B, E], and expand
it using the chain rule together with the conditional independences encoded by the network:

P[D, S, A, B, E] = P[D | S, A, B, E] · P[S, A, B, E]

= P[D | S, A, B, E] · P[S | A, B, E] · P[A, B, E]

= P[D | A] · P[S | A, B, E] · P[A, B, E]

= P[D | A] · P[S | A] · P[A | B, E] · P[B, E]

= P[D | A] · P[S | A] · P[A | B, E] · P[B | E] · P[E]

and since Burglary and Earthquake are independent, P[B | E] = P[B].


Let us take the prior probabilities for the Burglary and Earthquake components:

P(B = True) = 0.002, which is the probability of a burglary.

P(B = False) = 0.998, which is the probability of no burglary.

P(E = True) = 0.001, which is the probability of a minor earthquake.

P(E = False) = 0.999, which is the probability that no earthquake occurred.

We can provide the conditional probabilities as per the below tables:

Conditional probability table for Alarm A:

The conditional probability of Alarm (A) depends on Burglary (B) and Earthquake (E):

B       E       P(A=True)   P(A=False)
True    True    0.94        0.06
True    False   0.95        0.05
False   True    0.31        0.69
False   False   0.001       0.999


Conditional probability table for David Calls:

The conditional probability that David calls depends on the state of the Alarm:

A       P(D=True)   P(D=False)
True    0.91        0.09
False   0.05        0.95

Conditional probability table for Sophia Calls:

The conditional probability that Sophia calls depends on its parent node Alarm:

A       P(S=True)   P(S=False)
True    0.75        0.25
False   0.02        0.98

From the formula for the joint distribution, we can now answer the problem statement:

P(S, D, A, ¬B, ¬E) = P(S|A) · P(D|A) · P(A|¬B ∧ ¬E) · P(¬B) · P(¬E)

= 0.75 × 0.91 × 0.001 × 0.998 × 0.999

≈ 0.00068
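The same calculation can be checked in Python using the CPT values from the tables above (a sketch; the dictionary layout is just one convenient way to store the CPTs):

# Joint probability P(S=T, D=T, A=T, B=F, E=F) for the burglar-alarm network,
# using the CPT values from the tables above.
p_b = {True: 0.002, False: 0.998}                  # P(Burglary)
p_e = {True: 0.001, False: 0.999}                  # P(Earthquake)
p_a = {(True, True): 0.94, (True, False): 0.95,    # P(Alarm=True | B, E)
       (False, True): 0.31, (False, False): 0.001}
p_d = {True: 0.91, False: 0.05}                    # P(David calls=True | Alarm)
p_s = {True: 0.75, False: 0.02}                    # P(Sophia calls=True | Alarm)

b, e, a = False, False, True                       # the query assignment (both calls are True)

# Since D and S are both True in the query, their CPT entries can be used directly.
prob = p_s[a] * p_d[a] * p_a[(b, e)] * p_b[b] * p_e[e]
print(prob)                                        # approximately 0.00068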

