Bayesian Inference
In this chapter, we would like to discuss a different framework for inference, namely the Bayesian approach.
In the Bayesian framework, we treat the unknown quantity, Θ , as a random variable. More specifically, we
assume that we have some initial guess about the distribution of Θ . This distribution is called the prior
distribution. After observing some data, we update the distribution of Θ (based on the observed data). This
step is usually done using Bayes' Rule. That is why this approach is called the Bayesian approach. The details
of this approach will be clearer as you go through the chapter. Here, to motivate the Bayesian approach, we
will provide two examples of statistical problems that might be solved using the Bayesian approach.
Example 9.1
Suppose that you would like to estimate the portion of voters in your town that plan to vote for Party A in an
upcoming election. To do so, you take a random sample of size n from the likely voters in the town. Since
you have a limited amount of time and resources, your sample is relatively small. Specifically, suppose that
n = 20. After doing your sampling, you find out that 6 people in your sample say they will vote for Party
A.
• Solution
◦ Let θ be the true portion of voters in your town who plan to vote for Party A. You might want to
estimate θ as
$$\hat{\theta} = \frac{6}{20} = 0.3$$
In fact, in the absence of any other data, that seems to be a reasonable estimate. However, you might
feel that n = 20 is too small, so the error in your estimate could be large. While thinking about
this problem, you remember that the data from the previous election
is available to you. You look at that data and find out that, in the previous election, 40% of the
people in your town voted for Party A. How can you use this data to possibly improve your
estimate of θ ? You might argue as follows:
Although the portion of votes for Party A changes from one election to another, the change is not
usually very drastic. Therefore, given that in the previous election 40% of the voters voted for
Party A, you might want to model the portion of votes for Party A in the next election as a
random variable Θ with a probability density function, fΘ (θ) , that is mostly concentrated
around θ = 0.4. For example, you might want to choose the density such that
E[Θ] = 0.4
Figure 9.1 shows an example of such density functions. Such a distribution shows your prior
belief about Θ in the absence of any additional data. That is, before taking your random sample
of size n = 20 , this is your guess about the distribution of Θ .
Therefore, you initially have the prior distribution fΘ (θ) . Then you collect some data, shown by
D. More specifically, here your data is a random sample of size n = 20 voters, 6 of whom are
voting for Party A. As we will discuss in more detail, you can then proceed to find an updated
distribution for Θ , called the posterior distribution, using Bayes' rule:
$$f_{\Theta|D}(\theta|D) = \frac{P(D|\theta)\, f_\Theta(\theta)}{P(D)}.$$
We can now use the posterior density, fΘ|D (θ|D), to further draw inferences about Θ . More
specifically, we might use it to find point or interval estimates of Θ .
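As a concrete numerical sketch of this update, suppose we encode the prior belief with a Beta distribution. This is an assumption for illustration: the text only asks for a density concentrated around θ = 0.4 with E[Θ] = 0.4, and Beta(4, 6), whose mean is 4/(4+6) = 0.4, is one such choice. With a binomial likelihood, Bayes' rule then yields another Beta distribution as the posterior:

```python
# A minimal sketch of the Bayesian update in Example 9.1.
# Assumption: a Beta(4, 6) prior for Theta. The text only requires a density
# concentrated around 0.4 with E[Theta] = 0.4; Beta(4, 6) has mean 0.4 and is
# one such choice, convenient because it is conjugate to the binomial likelihood.
from scipy import stats

a_prior, b_prior = 4, 6      # assumed Beta prior parameters
n, k = 20, 6                 # sample size, number planning to vote for Party A

# Beta prior + binomial likelihood => Beta posterior:
#   f(theta | D) ∝ theta^k (1-theta)^(n-k) * theta^(a-1) (1-theta)^(b-1)
a_post = a_prior + k         # 10
b_post = b_prior + (n - k)   # 20
posterior = stats.beta(a_post, b_post)

print("posterior mean:", posterior.mean())              # 10/30 ≈ 0.3333
print("95% credible interval:", posterior.interval(0.95))
```

Note how the posterior mean, about 0.333, falls between the raw sample estimate 0.3 and the prior mean 0.4: the previous election's data pulls the estimate toward 0.4, which is exactly the effect this example is after.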
Example 9.2
Consider a communication channel as shown in Figure 9.2. We can model the communication over this
channel as follows. At time n, a random variable Xn is generated and is transmitted over the channel.
However, the channel is noisy. Thus, at the receiver, a noisy version of Xn is received. More specifically, the
received signal is
Yn = Xn + Wn ,
where Wn ∼ N(0, σ²) is the noise added to Xn. We assume that the receiver knows the distribution of Xn.
The goal here is to recover (estimate) the value of Xn based on the observed value of Yn .
• Solution
◦ Again, we are dealing with estimating a random variable (Xn). In this case, the prior distribution
is fXn(x). After observing Yn = y, the posterior distribution can be written, using Bayes' rule for
densities, as
$$f_{X_n|Y_n}(x|y) = \frac{f_{Y_n|X_n}(y|x)\, f_{X_n}(x)}{f_{Y_n}(y)}.$$
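As a concrete sketch of this estimation problem, assume the transmitted signal is Gaussian, Xn ∼ N(0, σ_X²); this is an assumption for illustration, since the text only says the receiver knows the distribution of Xn. With a Gaussian prior and Gaussian noise, the posterior is itself Gaussian, and its mean shrinks the observation toward zero:

```python
# A minimal simulation of Example 9.2 under an assumed Gaussian prior.
# Assumption: X_n ~ N(0, sigma_x^2) with sigma_x and sigma chosen arbitrarily;
# the text only states that the receiver knows the distribution of X_n.
# For Gaussian prior + Gaussian noise, the posterior f(x | y) is Gaussian with
# mean (sigma_x^2 / (sigma_x^2 + sigma^2)) * y, which is the MMSE estimate.
import numpy as np

rng = np.random.default_rng(0)
sigma_x, sigma = 1.0, 0.5                    # assumed signal and noise std devs

x = rng.normal(0.0, sigma_x, size=100_000)   # transmitted signal X_n
y = x + rng.normal(0.0, sigma, size=x.size)  # received signal Y_n = X_n + W_n

shrink = sigma_x**2 / (sigma_x**2 + sigma**2)
x_hat = shrink * y                           # posterior mean estimate of X_n

print("empirical MSE:   ", np.mean((x - x_hat) ** 2))
print("theoretical MMSE:", sigma_x**2 * sigma**2 / (sigma_x**2 + sigma**2))
```

The empirical mean squared error should match the theoretical value, σ_X²σ²/(σ_X² + σ²), and it is smaller than the error of using Yn directly as the estimate, which is the payoff from exploiting the prior.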
If you think about Examples 9.1 and 9.2 carefully, you will notice that they have similar structures. Basically,
in both problems, our goal is to draw an inference about the value of an unobserved random variable (Θ or
Xn). We observe some data (D or Yn). We then use Bayes' rule to draw inferences about the unobserved
random variable. This is generally how we approach inference problems in Bayesian statistics.
It is worth noting that Examples 9.1 and 9.2 are conceptually different in the following sense: In Example
9.1, the choice of prior distribution fΘ (θ) is somewhat unclear. That is, different people might use different
prior distributions. In other words, the choice of prior distribution is subjective here. On the other hand, in
Example 9.2, the prior distribution fXn (x) might be determined as a part of the communication system
design. In other words, for this example, the prior distribution might be known without any ambiguity.
Nevertheless, once the prior distribution is determined, one uses similar methods to attack both
problems. For this reason, we study both problems under the umbrella of Bayesian statistics.
The goal is to draw inferences about an unknown variable X by observing a related random variable Y. The
unknown variable is modeled as a random variable X, with prior distribution fX(x).