
Massachusetts Institute of Technology

Department of Electrical Engineering and Computer Science


6.437 Inference and Information
Spring 2015

2 Bayesian Hypothesis Testing


In a wide range of applications, one must make decisions based on a set of observa-
tions. Examples include medical diagnosis, voice and face recognition, DNA sequence
analysis, air traffic control, and digital communication. In general, the observations
are noisy, incomplete, or otherwise imperfect, and thus the decisions produced will
not always be correct. However, we would like to use a decision process that is as
good as possible in an appropriate sense.
Addressing such problems is the aim of decision theory, and a natural framework
for setting up such problems is in terms of a hypothesis test. In this framework, each
of the possible scenarios corresponds to a hypothesis. When there are M hypotheses,
we denote the set of possible hypotheses using H = {H0 , H1 , . . . , HM −1 }.1 For each
of the possible hypotheses, there is a different model for the observed data, and this
is what we will exploit to distinguish among the hypotheses.
In our formulation, the observed collection of data is represented as a random
vector y, which may be discrete- or continuous-valued. There are a variety of ways to
model the hypotheses. In this section, we follow what is referred to as the Bayesian
approach, and model the valid hypothesis as a (discrete-valued) random variable, and
thus we denote it using H.
In a Bayesian hypothesis testing problem, the complete model therefore consists
of the a priori probabilities

pH (Hm ), m = 0, 1, . . . , M − 1,

together with a characterization of the observed data under each hypothesis, which
takes the form of the conditional probability distributions2

py|H (·|Hm ), m = 0, 1, . . . , M − 1. (1)

Of course, a complete characterization of our knowledge of the correct hypothesis


based on our observations is the set of a posteriori probabilities

pH|y (Hm |y), m = 0, 1, . . . , M − 1. (2)

The distribution of possible values of H is often referred to as our belief about


the hypothesis. From this perspective, we can view the a priori probabilities as our
prior belief, and view (2) as the revision of our belief based on having observed the
1. Note that H0 is sometimes referred to as the "null" hypothesis, particularly in asymmetric problems where it has special significance.
2. As related terminology, the function py|H(y|·), where y is the actual observed data, is referred to as the likelihood function.
data y. The belief update is, of course, computed from the particular data y based
on the model via Bayes’ Rule:3

pH|y(Hm|y) = py|H(y|Hm) pH(Hm) / Σ_{m′} py|H(y|Hm′) pH(Hm′).

While the belief is a complete characterization of our knowledge of the true hy-
pothesis, in applications one must often go further and make a decision (i.e., an
intelligent guess) based on this information. To make a good decision we need some
measure of goodness, appropriately chosen for the application of interest. In the se-
quel, we develop a framework for such decision-making, restricting our attention to
the binary (M = 2) case to simplify the exposition.
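As a concrete sketch of the belief update via Bayes' Rule, the following computes the a posteriori probabilities (2) from the priors and the likelihood values at an observed y. The numerical values are illustrative assumptions, not taken from the notes.

```python
# Belief update via Bayes' Rule for an M-ary Bayesian hypothesis test.
# The priors and likelihood values below are illustrative assumptions.

def posterior(priors, likelihoods):
    """Return p(H_m | y) for each m, given p(H_m) and p(y | H_m)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)            # the denominator: sum over m' of p(y|H_m') p(H_m')
    return [j / total for j in joint]

priors = [0.5, 0.3, 0.2]          # p_H(H_m), m = 0, 1, 2 (assumed)
likelihoods = [0.1, 0.4, 0.4]     # p_{y|H}(y | H_m) evaluated at the observed y (assumed)
beliefs = posterior(priors, likelihoods)
# The posterior sums to 1; here the data shifts belief away from H_0.
```

Note that the denominator is the same for every m, so it acts purely as a normalization; this is why it cancels in the ratio tests developed below.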

2.1 Binary Hypothesis Testing


Specializing to the binary case, our model consists of two components. One is the set
of prior probabilities
P0 = pH (H0 )
(3)
P1 = pH (H1 ) = 1 − P0 .
The second is the observation model, corresponding to the likelihood functions

H0 : py|H (y|H0 )
(4)
H1 : py|H (y|H1 ).

The development is essentially the same whether the observations are discrete or
continuous. We arbitrarily use the continuous case in our development. The discrete
case differs only in that integrals are replaced by summations.
We begin with a simple example to which we will return later.

Example 1. As a highly simplified scenario, suppose a single bit of information


m ∈ {0, 1} is encoded into a codeword sm and sent over a communication channel,
where s0 and s1 are both deterministic, known quantities. Let’s further suppose that
the channel is noisy; specifically, what is received is

y = sm + w ,

where w is a zero-mean Gaussian random variable with variance σ 2 and independent


of H. From this information, we can readily construct the probability density for the
3. In applications where further data is obtained, beliefs can be further revised, again using Bayes' Rule for the computation. This updating is a simple form of what is referred to as belief propagation.

observation under each of the hypotheses, obtaining:
py|H(y|H0) = N(y; s0, σ²) = (1/√(2πσ²)) e^{−(y−s0)²/(2σ²)}
(5)
py|H(y|H1) = N(y; s1, σ²) = (1/√(2πσ²)) e^{−(y−s1)²/(2σ²)}.
In addition, if 0’s and 1’s are equally likely to be transmitted we would set the a
priori probabilities to
P0 = P1 = 1/2.
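A minimal numerical sketch of Example 1: the two likelihoods in (5) can be evaluated directly at a received value y. The codeword values s0, s1, the noise variance, and the received value are assumed for illustration.

```python
import math

# Sketch of the scalar Gaussian channel of Example 1: y = s_m + w,
# with w ~ N(0, sigma^2). All numerical values are illustrative assumptions.

def gaussian_pdf(y, mean, var):
    """N(y; mean, var), as in (5)."""
    return math.exp(-(y - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

s0, s1, sigma2 = -1.0, 1.0, 0.25   # assumed codewords and noise variance

y = 0.8                            # a hypothetical received value
p_y_given_H0 = gaussian_pdf(y, s0, sigma2)   # p_{y|H}(y | H0)
p_y_given_H1 = gaussian_pdf(y, s1, sigma2)   # p_{y|H}(y | H1)
# Since y is close to s1, the H1 likelihood dominates here.
```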

2.1.1 Optimum Decision Rules: The Likelihood Ratio Test


The solution to a hypothesis test is specified in terms of a decision rule. We focus for
the time being on deterministic decision rules. Mathematically, such a decision rule
is a function Ĥ(·) that uniquely maps every possible observation y ∈ Y to one of the
two hypotheses, i.e., Ĥ : Y → H, where H = {H0 , H1 }. From this perspective, we see
that choosing the function Ĥ(·) is equivalent to partitioning the observation space Y
into two disjoint “decision” regions, corresponding to the values of y for which each
of the two possible decisions are made. Specifically, we use Ym to denote those values
of y ∈ Y for which our rule decides Hm , i.e.,

Y0 = {y ∈ Y : Ĥ(y) = H0 }
(6)
Y1 = {y ∈ Y : Ĥ(y) = H1 }.

These regions are depicted schematically in Fig. 1.


Our goal, then, is to design this bi-valued function (equivalently the associated
decision regions Y0 and Y1 ) in such a way that the best possible performance is
obtained. In order to do this, we need to be able to quantify the notion of “best.” This
requires that we have a well-defined objective function corresponding to a suitable
measure of goodness. In the Bayesian approach, we use an objective function taking
the form of an expected cost function. Specifically, we use

C̃(Hj, Hi) ≜ Cij (7)

to denote the “cost” of deciding that the hypothesis is Ĥ = Hi when the correct
hypothesis is H = Hj . Then the optimum decision rule takes the form

Ĥ(·) = arg min_{f(·)} ϕ(f), (8)

where the average cost, which is referred to as the "Bayes risk," is

ϕ(f) = E[ C̃(H, f(y)) ], (9)

Figure 1: The regions Y0 and Y1 as defined in (6) corresponding to an example decision rule Ĥ(·), where Y is the observation alphabet.

and where the expectation in (9) is over both y and H, and where f (·) is a decision
rule.
Generally, the application dictates an appropriate choice of the costs Cij . For
example, a symmetric cost function of the form Cij = 1 − 1_{i=j}, i.e.,

C00 = C11 = 0
C01 = C10 = 1, (10)
corresponds to seeking a decision rule that minimizes the probability of a decision
error. However, there are many applications for which such symmetric cost functions
are not well-matched. For example, in a medical diagnosis problem where H0 denotes
the hypotheses that the patient does not have a particular disease and H1 that he
does, we would typically want to select cost assignments such that C01 ≫ C10.
Definition 1. A set of costs {Cij} is valid if the cost of a correct decision is lower than the cost of an incorrect decision, i.e., Cjj < Cij whenever i ≠ j.
Theorem 1. Given a priori probabilities P0 , P1 , data y, observation models py|H (·|H0 ),
py|H (·|H1 ), and valid costs C00 , C01 , C10 , C11 , the optimum Bayes’ decision rule takes
the form:
L(y) ≜ py|H(y|H1) / py|H(y|H0)  ≷_{H0}^{H1}  P0(C10 − C00) / (P1(C01 − C11)) ≜ η, (11)

i.e., the decision is H1 when L(y) > η, the decision is H0 when L(y) < η, and the
decision can be made arbitrarily when L(y) = η.

Before establishing this result, we make a few remarks. First, the left-hand side
of (11) is referred to as the likelihood ratio, and thus (11) is typically referred to as
a likelihood ratio test (LRT). Note too that the likelihood ratio—which we denote
using L(y)—is constructed from the observations model and the data. Meanwhile,
the right-hand side of (11)—which we denote using η—is a precomputable threshold
that is determined from the a priori probabilities and costs.
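As a minimal sketch of the test in (11): the threshold η can be precomputed from the priors and costs, while L(y) is formed from the two likelihood values at the observed y. All numerical values below are illustrative assumptions.

```python
# Minimal sketch of the likelihood ratio test (11).
# Priors, costs, and likelihood values are illustrative assumptions.

def lrt(p1, p0, P0, P1, C00, C01, C10, C11):
    """Return 'H1' or 'H0' given likelihood values p1 = p(y|H1), p0 = p(y|H0)."""
    L = p1 / p0                                      # likelihood ratio L(y)
    eta = (P0 * (C10 - C00)) / (P1 * (C01 - C11))    # precomputable threshold
    return "H1" if L > eta else "H0"                 # L == eta: tie broken as H0 here

# Equal priors with the symmetric costs of (10) give eta = 1, so the
# rule reduces to picking the hypothesis with the larger likelihood:
decision = lrt(p1=0.4, p0=0.1, P0=0.5, P1=0.5, C00=0, C01=1, C10=1, C11=0)
```

Note how a skewed prior raises or lowers η without touching L(y): with P0 = 0.9 the same likelihood values can flip the decision to H0.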
Proof. Consider an arbitrary but fixed decision rule f (·). In terms of this generic
f (·), the Bayes risk can be expanded in the form
ϕ(f) = E[ C̃(H, f(y)) ]
     = E[ E[ C̃(H, f(y)) | y = y ] ]
     = ∫ ϕ̃(f(y), y) py(y) dy, (12)

with
ϕ̃(H′, y) = E[ C̃(H, H′) | y = y ], (13)

and where to obtain the second equality in (12) we have used iterated expectation.
Note from (12) that since py (y) is nonnegative, it is clear that we minimize ϕ if
we minimize ϕ̃(f (y), y) for each particular value of y. Hence, we can determine the
optimum decision rule Ĥ(·) on a point-by-point basis, i.e., Ĥ(y) for each y.
Let’s consider a particular (observation) point y = y∗ . For this point, if we choose
the assignment
Ĥ(y∗ ) = H0 ,
then our conditional expectation (13) takes the value

ϕ̃(H0 , y∗ ) = C00 pH|y (H0 |y∗ ) + C01 pH|y (H1 |y∗ ). (14)

Alternatively, if we choose the assignment

Ĥ(y∗ ) = H1 ,

then our conditional expectation (13) takes the value

ϕ̃(H1 , y∗ ) = C10 pH|y (H0 |y∗ ) + C11 pH|y (H1 |y∗ ). (15)

Hence, the optimum assignment for the value y∗ is simply the choice corresponding
to the smaller of (14) and (15). It is convenient to express this optimum decision

rule using the following notation (now replacing our particular observation y∗ with a
generic observation y):

C00 pH|y(H0|y) + C01 pH|y(H1|y)  ≷_{H0}^{H1}  C10 pH|y(H0|y) + C11 pH|y(H1|y). (16)

Note that when the two sides of (16) are equal, then either assignment is equally
good—both have the same effect on the objective function (12).
A minor rearrangement of the terms in (16) results in

(C01 − C11) pH|y(H1|y)  ≷_{H0}^{H1}  (C10 − C00) pH|y(H0|y). (17)

Since for any valid choice of costs the terms in parentheses in (17) are both positive,
we can equivalently write (17) in the form4

pH|y(H1|y) / pH|y(H0|y)  ≷_{H0}^{H1}  (C10 − C00) / (C01 − C11). (18)

When we then substitute (19) into (18) and multiply both sides by P0 /P1 , we
obtain the decision rule in its final form (11), directly in terms of the measurement
densities.
As a final remark, observe that, not surprisingly, the optimum decision produced
by (17) is a particular function of our beliefs, i.e., the a posteriori probabilities

pH|y(Hm|y) = py|H(y|Hm) Pm / ( py|H(y|H0) P0 + py|H(y|H1) P1 ). (19)

2.1.2 Properties of the Likelihood Ratio Test


Several observations lend insight into the optimum decision rule (11). First, note
that the likelihood ratio L(·) is a scalar-valued function, i.e., L : Y → R, regardless
of the dimension or alphabet of the data. In fact, L(y) is an example of what is
referred to as a sufficient statistic for the problem: it summarizes everything we need
to know about the observation vector in order to make a decision. Phrased differently,
in terms of our ability to make the optimum decision (in the Bayesian sense in this
case), knowledge of L(y) is as good as knowledge of the full data vector y itself.
4. Technically, we have to be careful about dividing by zero here if pH|y(H0|y) = 0. To simplify our exposition, however, as we discuss in Section 2.1.2, we will generally restrict our attention to the case where this does not happen.

We will develop the notion of a sufficient statistic more precisely and in greater
generality in a subsequent section of the notes; however, at this point it suffices to
make two observations with respect to our hypothesis testing problem. First, (11)
tells us an explicit construction for a scalar sufficient statistic for the Bayesian binary
hypothesis testing problem. Second, sufficient statistics are not unique. For example,
any invertible function of L(y) is also a sufficient statistic. In fact, for the purposes of
implementation or analysis it is often more convenient to rewrite the likelihood ratio
test in the form
L′(y) = g(L(y))  ≷_{H0}^{H1}  g(η), (20)

where g(·) is some suitably chosen, monotonically increasing function. An important


example is the case corresponding to g(·) = ln(·), which simplifies many tests involving
densities with exponential factors, such as Gaussians.5
It is also important to emphasize that L = L(y) is a random variable—i.e., it takes
on a different value in each experiment. As such, we will frequently be interested in
its probability density function—or at least moments such as its mean and variance—
under each of H0 and H1 . Such densities can be derived using the usual method of
events, and are often used in calculating performance of the decision rule.
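A quick Monte Carlo sketch of this point, using the scalar Gaussian model of Example 1 (with parameter values assumed for illustration): the log-likelihood ratio, computed from randomly generated observations, has a clearly different mean under each hypothesis.

```python
import math
import random

# Sketch: L(y) (here its logarithm, for numerical convenience) is itself a
# random variable whose distribution depends on which hypothesis generated y.
# Parameter values are illustrative assumptions.

random.seed(0)
s0, s1, sigma = 0.0, 2.0, 1.0   # assumed codewords and noise std. deviation

def log_L(y):
    """ln L(y) for the Gaussian model of Example 1."""
    return ((y - s0) ** 2 - (y - s1) ** 2) / (2 * sigma ** 2)

# Draw observations under each hypothesis and look at the statistic's mean.
samples_H0 = [log_L(random.gauss(s0, sigma)) for _ in range(20000)]
samples_H1 = [log_L(random.gauss(s1, sigma)) for _ in range(20000)]

mean_H0 = sum(samples_H0) / len(samples_H0)   # near -(s1 - s0)^2 / (2 sigma^2)
mean_H1 = sum(samples_H1) / len(samples_H1)   # near +(s1 - s0)^2 / (2 sigma^2)
```

For this model ln L(y) is a linear function of y, hence Gaussian under each hypothesis, which is exactly what makes performance calculations tractable.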
It follows immediately from the definition in (11) that the likelihood ratio is a
nonnegative quantity. Furthermore, depending on the problem, some values of y may
lead to L(y) being zero or infinite. In particular, the former occurs when py|H (y|H1 ) =
0 but py|H (y|H0 ) > 0, which is an indication that values in a neighborhood of y
effectively cannot occur under H1 but can under H0 . In this case, there will be values
of y for which we’ll effectively know with certainty that the correct hypothesis is
H0. When the likelihood ratio is infinite, corresponding to a division-by-zero scenario,
an analogous situation exists, but with the roles of H0 and H1 reversed. These cases
where such perfect decisions are possible are referred to as singular decision scenarios.
In some practical problems, these scenarios do in fact occur. However, in other cases
they suggest a potential lack of robustness in the data modeling, i.e., that some source
of inherent uncertainty may be missing from the model. In any event, to simplify our
development for the remainder of the topic we will largely restrict our attention to
the case where 0 < L(y) < ∞ for all y.
While the likelihood ratio focuses the observed data into a single scalar for the
purpose of making an optimum decision, the threshold η for the test plays a com-
plementary role. In particular, from (11) we see that η focuses the relevant features
of the cost function and a priori probabilities into a single scalar. Furthermore, this
information is combined in a manner that is intuitively satisfying. For example, as
(11) also reflects, an increase in P0 means that H0 is more likely, so that η is increased
to appropriately bias the test toward deciding H0 for any particular observation. Sim-
5. We will discuss an important such family of distributions—exponential families—in detail in a subsequent section of the notes.

ilarly, an increase in C10 means that deciding H1 when H0 is true is more costly, so
η is increased to appropriately bias the test toward deciding H0 to offset this risk.
Finally, note that adding a constant to the cost function (i.e., to all Cij ) has, as we
would anticipate, no effect on the threshold. Hence, without loss of generality we
may set at least one of the correct decision costs—i.e., C00 or C11 —to zero.
Finally, it is important to emphasize that the likelihood ratio test (11) indirectly
determines the decision regions (6). In particular, we have

Y0 = {y ∈ Y : Ĥ(y) = H0} = {y ∈ Y : L(y) < η}
Y1 = {y ∈ Y : Ĥ(y) = H1} = {y ∈ Y : L(y) > η}. (21)
As Fig. 1 suggests, while a decision rule expressed in the measurement data space
Y can be complicated,6 (11) tells us that the observations can be transformed into
a one-dimensional space defined via L = L(y) where the decision regions have a
particularly simple form: the decision Ĥ(L) = H0 is made whenever L lies to the left
of some point η on the line, and Ĥ(L) = H1 whenever L lies to the right.

2.1.3 Maximum A Posteriori and Maximum Likelihood Decision Rules


An important cost assignment for many problems is that given by (10), which as we
recall corresponds to a minimum probability-of-error criterion. Indeed, in this case,
we have
ϕ(Ĥ) = P( Ĥ(y) = H0, H = H1 ) + P( Ĥ(y) = H1, H = H0 ).

The corresponding decision rule in this case can be obtained as a special case of
(11).
Corollary 1. The minimum probability-of-error decision rule takes the form

Ĥ(y) = arg max_{H∈{H0,H1}} pH|y(H|y). (22)

The rule (22), in which one chooses the hypothesis for which our belief is largest, is
referred to as the maximum a posteriori (MAP) decision rule.
Proof. Instead of specializing (11), we specialize the equivalent test (17), from which
we obtain a form of the minimum probability-of-error test expressed in terms of the
a posteriori probabilities for the problem, viz.,
pH|y(H1|y)  ≷_{H0}^{H1}  pH|y(H0|y). (23)

From (23) we see that the desired decision rule can be expressed in the form (22).
6. Indeed, neither of the respective sets Y0 and Y1 is even connected in general.

Still further simplification is possible when the hypotheses are equally likely (P0 =
P1 = 1/2). In this case, we have the following.

Corollary 2. When the hypotheses are equally likely, the minimum probability of
error decision rule takes the form

Ĥ(y) = arg max_{H∈{H0,H1}} py|H(y|H). (24)

The rule (24), which is referred to as the maximum likelihood (ML) decision rule,
chooses the hypothesis for which the corresponding likelihood function is largest.
Proof. Specializing (11) we obtain

py|H(y|H1) / py|H(y|H0)  ≷_{H0}^{H1}  1, (25)

or, equivalently,

py|H(y|H1)  ≷_{H0}^{H1}  py|H(y|H0),

whence (24).
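As a minimal sketch of the two rules just derived (with illustrative likelihood and prior values, not from the notes): the MAP rule (22) weighs the likelihoods by the priors, while the ML rule (24) compares the likelihoods alone, so a skewed prior can make the two disagree.

```python
# Sketch of the MAP rule (22) and the ML rule (24) for a binary test.
# Likelihood and prior values are illustrative assumptions.

def map_rule(p0, p1, P0, P1):
    """Pick the larger posterior; the common denominator of (19) cancels."""
    return "H0" if p0 * P0 > p1 * P1 else "H1"

def ml_rule(p0, p1):
    """Pick the hypothesis with the larger likelihood."""
    return "H0" if p0 > p1 else "H1"

p0, p1 = 0.3, 0.4   # p(y|H0), p(y|H1) at the observed y (assumed)

ml_decision = ml_rule(p0, p1)                     # H1: larger likelihood
map_decision = map_rule(p0, p1, P0=0.8, P1=0.2)   # H0: the prior dominates
```

With equal priors (P0 = P1 = 1/2) the two rules always agree, which is exactly the content of Corollary 2.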

Example 2. Continuing with Example 1, we obtain from (5) that the likelihood ratio
test for this problem takes the form
L(y) = [ (2πσ²)^{−1/2} e^{−(y−s1)²/(2σ²)} ] / [ (2πσ²)^{−1/2} e^{−(y−s0)²/(2σ²)} ]  ≷_{H0}^{H1}  η. (26)
As (26) suggests—and as is generally the case in Gaussian problems—the natural
logarithm of the likelihood ratio is a more convenient sufficient statistic to work with
in this example. In this case, taking logarithms of both sides of (26) yields

L′(y) = (1/(2σ²)) [ (y − s0)² − (y − s1)² ]  ≷_{H0}^{H1}  ln η. (27)

Expanding the quadratics and cancelling terms in (27) we obtain the test in its
simplest form, which for s1 > s0 is given by

y  ≷_{H0}^{H1}  (s1 + s0)/2 + σ² ln η / (s1 − s0) ≜ γ. (28)

In this form, the resulting error probability is easily obtained, and is naturally ex-
pressed in terms of Q-function notation.
We also remark that with a minimum probability-of-error criterion, if P0 = P1
then ln η = 0 and we see immediately from (27) that the optimum test takes the form

|y − s0|  ≷_{H0}^{H1}  |y − s1|,

which corresponds to a “minimum-distance” decision rule, i.e.,

Ĥ(y) = Hm̂ ,   m̂ = arg min_{m∈{0,1}} |y − sm|.

This minimum-distance property turns out to hold in multidimensional Gaussian


problems as well, and leads to convenient analysis in terms of Euclidean geometry.
Note too that in this problem the decision regions on the y-axis have a particularly
simple form; for example, for s1 > s0 we obtain

Y0 = {y ∈ R : y < γ}
(29)
Y1 = {y ∈ R : y > γ}.

In other problems—even Gaussian ones—the decision regions can be more compli-


cated, as our next example illustrates.

Example 3. Suppose that a zero-mean Gaussian random variable has one of two
possible variances, σ12 or σ02 , where σ12 > σ02 . Let the costs and prior probabilities be
arbitrary. Then the likelihood ratio test for this problem takes the form
L(y) = [ (2πσ1²)^{−1/2} e^{−y²/(2σ1²)} ] / [ (2πσ0²)^{−1/2} e^{−y²/(2σ0²)} ]  ≷_{H0}^{H1}  η.

In this problem, it is a straightforward exercise to show that the test simplifies to one
of the form

|y|  ≷_{H0}^{H1}  [ (2σ0²σ1² / (σ1² − σ0²)) ln( (σ1/σ0) η ) ]^{1/2} ≜ γ.

Hence, the decision region Y1 is the union of two disconnected regions in this case,
i.e.,
Y1 = {y ∈ R : y > γ} ∪ {y ∈ R : y < −γ}.
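A minimal sketch of Example 3's two-sided test, with variance values and threshold η assumed for illustration: the decision region Y1 consists of the two tails |y| > γ.

```python
import math

# Sketch of the variance-discrimination test of Example 3: decide H1
# when |y| exceeds gamma. Variances and eta are illustrative assumptions.

sigma0_sq, sigma1_sq = 1.0, 4.0   # assumed variances, sigma1^2 > sigma0^2
eta = 1.0                          # assumed threshold from priors and costs
sigma0, sigma1 = math.sqrt(sigma0_sq), math.sqrt(sigma1_sq)

gamma = math.sqrt(
    2 * sigma0_sq * sigma1_sq / (sigma1_sq - sigma0_sq)
    * math.log((sigma1 / sigma0) * eta)
)

def decide(y):
    # Y1 is the union of two disconnected regions: y > gamma or y < -gamma.
    return "H1" if abs(y) > gamma else "H0"
```

Intuitively, large-magnitude observations are better explained by the larger variance σ1², so both tails of the real line are assigned to H1.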

