
An Overview of Bayesian Econometrics

Bayesian Theory

Reading: Chapter 1 of textbook and Appendix B, section B.1.


Begin with general concepts in Bayesian theory before getting to
specific models.
If you know these general concepts you will never get lost.
What does an econometrician do? i) Estimate parameters in a model
(e.g. regression coefficients), ii) Compare different models (e.g.
hypothesis testing), iii) Prediction.
Bayesian econometrics does all three based on a few simple rules of
probability.

Let A and B be two events; p(B|A) is the conditional probability of
B given A. It “summarizes what is known about B given A”.
Bayesians use this rule with A = something known or assumed (e.g.
the data), B = something unknown (e.g. coefficients in a model).
Let y be data, y* be unobserved data (i.e. to be forecast), and M_i for
i = 1, ..., m be a set of models, each of which depends on some
parameters θ_i.
Learning about parameters in a model is based on the posterior
density: p(θ_i|M_i, y)
Model comparison is based on the posterior model probability: p(M_i|y)
Prediction is based on the predictive density p(y*|y).

Bayes Theorem

I expect you know the basics of probability theory from previous studies;
see Appendix B of my textbook if you do not.

Definition: Conditional Probability
The conditional probability of A given B, denoted by Pr(A|B), is the
probability of event A occurring given event B has occurred.

Theorem: Rules of Conditional Probability including Bayes' Theorem
Let A and B denote two events, then

Pr(A|B) = Pr(A,B) / Pr(B)

and

Pr(B|A) = Pr(A,B) / Pr(A).

These two rules can be combined to yield Bayes' Theorem:

Pr(B|A) = Pr(A|B) Pr(B) / Pr(A).

Note: the above is expressed in terms of two events, A and B. However,
it can be interpreted as holding for random variables A and B, with
probability density functions replacing the Pr(·)s in the previous formulae.
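As a quick numerical illustration, here is a minimal Python sketch with hypothetical numbers (not an example from the textbook): suppose an event B has prior probability Pr(B) = 0.01, and a related event A satisfies Pr(A|B) = 0.95 and Pr(A|not B) = 0.10.

```python
# Bayes' Theorem with hypothetical numbers: Pr(B|A) = Pr(A|B) Pr(B) / Pr(A).
p_B = 0.01             # Pr(B): prior probability of the unknown event
p_A_given_B = 0.95     # Pr(A|B)
p_A_given_notB = 0.10  # Pr(A|not B)

# Law of total probability gives the denominator Pr(A)
p_A = p_A_given_B * p_B + p_A_given_notB * (1 - p_B)

p_B_given_A = p_A_given_B * p_B / p_A
print(p_B_given_A)     # approximately 0.0876
```

Even though A is much more likely under B, the low prior Pr(B) keeps the posterior Pr(B|A) small; this prior-data interaction is exactly what Bayes' Theorem formalizes.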

Learning About Parameters in a Given Model (Estimation)

Assume a single model which depends on parameters θ.
Want to figure out the properties of the posterior p(θ|y).
It is convenient to use Bayes' rule to write the posterior in a different
way.
Bayes' rule lies at the heart of Bayesian econometrics:

p(B|A) = p(A|B) p(B) / p(A).

Replace B by θ and A by y to obtain:

p(θ|y) = p(y|θ) p(θ) / p(y).

Bayesians treat p(θ|y) as being of fundamental interest: “Given the
data, what do we know about θ?”.
Treatment of θ as a random variable is controversial among some
econometricians.
The competitor to Bayesian econometrics, called frequentist econometrics,
says that θ is not a random variable.
For estimation we can ignore the term p(y) since it does not involve θ:

p(θ|y) ∝ p(y|θ) p(θ).

p(θ|y) is referred to as the posterior density
p(y|θ) is the likelihood function
p(θ) is the prior density
“posterior is proportional to likelihood times prior”.
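This proportionality is all that is needed in practice. A minimal Python sketch (with a hypothetical prior and data, not taken from the textbook) evaluates likelihood times prior on a grid of θ values and rescales, recovering the posterior without ever computing p(y):

```python
import numpy as np

# Grid sketch of "posterior is proportional to likelihood times prior".
# Hypothetical setup: 10 Bernoulli trials with 7 successes, Beta(2,2) prior;
# any likelihood/prior pair works the same way.
theta = np.linspace(0.001, 0.999, 999)   # grid of θ values
d_theta = theta[1] - theta[0]            # grid spacing

likelihood = theta**7 * (1 - theta)**3   # p(y|θ) with m = 7, T = 10
prior = theta * (1 - theta)              # Beta(2,2) kernel: θ^(2-1) (1-θ)^(2-1)

unnormalized = likelihood * prior        # p(y|θ) p(θ)
posterior = unnormalized / (unnormalized.sum() * d_theta)  # integrates to 1

# p(y) was never computed: the normalization step absorbs it.
print((posterior * d_theta).sum())       # ≈ 1.0
```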

p(θ) does not depend on the data. It contains any non-data
information available about θ.
Prior information is a controversial aspect since it sounds unscientific.
Bayesian answers (to be elaborated on later):
i) Often we do have prior information and, if so, we should include it
(more information is good)
ii) Can work with “noninformative” priors
iii) Can use “empirical Bayes” methods which estimate the prior from the
data
iv) Training sample priors
v) Bayesian estimators often have better frequentist properties than
frequentist estimators (e.g. results due to Stein show the MLE is
inadmissible, but Bayes estimators are admissible)
vi) Prior sensitivity analysis

Prediction in a Single Model

Prediction is based on the predictive density p(y*|y).
Since a marginal density can be obtained from a joint density through
integration:

p(y*|y) = ∫ p(y*, θ|y) dθ.

The term inside the integral can be rewritten so that:

p(y*|y) = ∫ p(y*|y, θ) p(θ|y) dθ.

Prediction involves the posterior and p(y*|y, θ) (more description
provided later)
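This integral can be approximated by averaging p(y*|y, θ) over draws from the posterior. A minimal Python sketch, assuming the Bernoulli/Beta example developed later in these slides (the posterior parameters are hypothetical):

```python
import numpy as np

# Approximate p(y* = 1 | y) = ∫ p(y*=1|y,θ) p(θ|y) dθ by simulation.
# For Bernoulli data, p(y* = 1 | y, θ) = θ, so the predictive probability
# is just the average of θ over posterior draws.
rng = np.random.default_rng(0)
a_bar, d_bar = 8.0, 4.0                             # hypothetical Beta posterior
theta_draws = rng.beta(a_bar, d_bar, size=100_000)  # draws from p(θ|y)

pred_prob_success = theta_draws.mean()
print(pred_prob_success)   # ≈ a_bar / (a_bar + d_bar) ≈ 0.667
```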

Model Comparison (Hypothesis testing)

Models are denoted by M_i for i = 1, ..., m. M_i depends on parameters θ_i.
The posterior model probability is p(M_i|y).
Using Bayes' rule with B = M_i and A = y we obtain:

p(M_i|y) = p(y|M_i) p(M_i) / p(y)

p(M_i) is referred to as the prior model probability.
p(y|M_i) is called the marginal likelihood.

How is the marginal likelihood calculated?
The posterior can be written as:

p(θ_i|y, M_i) = p(y|θ_i, M_i) p(θ_i|M_i) / p(y|M_i)

Integrate both sides with respect to θ_i, use the fact that
∫ p(θ_i|y, M_i) dθ_i = 1, and rearrange:

p(y|M_i) = ∫ p(y|θ_i, M_i) p(θ_i|M_i) dθ_i.

Note: the marginal likelihood depends only on the prior and the likelihood.
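For the Bernoulli likelihood with a Beta prior (the example developed later in these slides), this integral has the closed form p(y|M) = B(α + m, δ + T − m) / B(α, δ), where B(·,·) is the Beta function. A minimal Python sketch, with hypothetical data counts:

```python
import numpy as np
from scipy.special import betaln

# Marginal likelihood for a Bernoulli likelihood with a Beta(alpha, delta)
# prior: p(y|M) = B(alpha + m, delta + T - m) / B(alpha, delta).
# Computed in logs for numerical stability.
def log_marginal_likelihood(m, T, alpha, delta):
    return betaln(alpha + m, delta + T - m) - betaln(alpha, delta)

m, T = 7, 10   # hypothetical data: 7 successes in 10 trials
print(np.exp(log_marginal_likelihood(m, T, alpha=1.0, delta=1.0)))  # ≈ 0.000758
```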

The posterior odds ratio compares two models:

PO_ij = p(M_i|y) / p(M_j|y) = p(y|M_i) p(M_i) / [p(y|M_j) p(M_j)].

Note: p(y) is common to both models, so there is no need to calculate it.
Can use the fact that p(M_1|y) + p(M_2|y) + ... + p(M_m|y) = 1 together
with the PO_ij to calculate the posterior model probabilities.
E.g. if m = 2 models:

p(M_1|y) + p(M_2|y) = 1
PO_12 = p(M_1|y) / p(M_2|y)

imply

p(M_1|y) = PO_12 / (1 + PO_12)
p(M_2|y) = 1 − p(M_1|y).

The Bayes Factor is:

BF_ij = p(y|M_i) / p(y|M_j).
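Putting the last two slides together, a minimal Python sketch (two hypothetical Bernoulli/Beta models that differ only in their priors; with equal prior model probabilities the posterior odds equal the Bayes factor):

```python
import numpy as np
from scipy.special import betaln

# log marginal likelihood for the Bernoulli/Beta model (see previous slide)
def log_ml(m, T, alpha, delta):
    return betaln(alpha + m, delta + T - m) - betaln(alpha, delta)

m, T = 7, 10                                     # hypothetical data
log_ml_1 = log_ml(m, T, alpha=1.0, delta=1.0)    # M1: uniform prior on θ
log_ml_2 = log_ml(m, T, alpha=20.0, delta=20.0)  # M2: prior tight around θ = 0.5

# Equal prior model probabilities assumed, so PO_12 = BF_12
PO_12 = np.exp(log_ml_1 - log_ml_2)
p_M1 = PO_12 / (1 + PO_12)
print(p_M1, 1 - p_M1)                            # posterior model probabilities
```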
Example: Deriving the Posterior When Data Has a Bernoulli Distribution

This is Exercise 1 on Problem Sheet 1 (answer available on the web)

Background:
The experiment is repeated T times
Each time the outcome can be “success” or “failure”
y_t for t = 1, ..., T are random variables, one for each repetition of the
experiment
The realization of y_t can be 1 or 0
The probability of success is θ (hence the probability of failure is 1 − θ)
The goal is to estimate θ

Example (cont.): The Bernoulli Likelihood function

Notation for the setup above: y_t ∈ {0, 1}, 0 ≤ θ ≤ 1 and

p(y_t|θ) = θ if y_t = 1, and 1 − θ if y_t = 0.

Let m be the number of successes in T repetitions of the experiment.
The likelihood function is:

p(y|θ) = ∏_{t=1}^T p(y_t|θ) = θ^m (1 − θ)^(T−m)

Example (cont.): The Beta Prior

Viewed in terms of θ, this likelihood is proportional to the p.d.f. of a Beta
distribution
See the definition in textbook Appendix B or Wikipedia
It is the most common distribution for random variables bounded to lie in the
interval [0, 1]
Commonly used for parameters which are probabilities (like θ)
Bayesians need a prior
Let us also use a Beta distribution for the prior
Prior beliefs concerning θ are represented by

p(θ) ∝ θ^(α−1) (1 − θ)^(δ−1)

Example (cont.): Eliciting a Prior

The researcher chooses prior hyperparameters α > 0 and δ > 0 to
reflect beliefs
This is called prior elicitation
Properties of the Beta distribution imply the prior mean is

E(θ) = α / (α + δ)

Suppose you believe, a priori, that success and failure are equally likely
E(θ) = 1/2 is obtained by setting α = δ
If I look on Wikipedia I see α = δ = 1/2 has mean E(θ) = 1/2 but
spreads probability widely over the interval [0, 1]
So I might be “relatively noninformative” and choose this for my prior
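A minimal Python sketch comparing this prior with the uniform prior of the next slide (the use of scipy here is my own choice of tooling): both have mean 1/2 but spread probability differently.

```python
from scipy.stats import beta

# Compare the "relatively noninformative" Beta(1/2, 1/2) prior with the
# completely noninformative Beta(1, 1) (uniform) prior.
for a, d in [(0.5, 0.5), (1.0, 1.0)]:
    prior = beta(a, d)
    print(f"Beta({a}, {d}): mean = {prior.mean():.3f}, std = {prior.std():.3f}")
```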

Example (cont.): A Noninformative Prior

Or I might set α = δ = 1 and be completely noninformative
Note: α = δ = 1 implies p(θ) ∝ 1
This is the Uniform distribution over the interval [0, 1]
Every value of θ receives the same probability (equally likely) =
noninformative prior

Example (cont.): Deriving the Posterior

To get the posterior, multiply the prior times the likelihood:

p(θ|y) ∝ θ^(α−1) (1 − θ)^(δ−1) θ^m (1 − θ)^(T−m)
       = θ^(ᾱ−1) (1 − θ)^(δ̄−1)

where

ᾱ = α + m
δ̄ = δ + T − m
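A minimal Python sketch of this update on simulated data (the true success probability and sample size are hypothetical):

```python
import numpy as np

# Conjugate Beta-Bernoulli update: a Beta(alpha, delta) prior plus m successes
# in T trials gives a Beta(alpha + m, delta + T - m) posterior.
rng = np.random.default_rng(1)
theta_true = 0.7                          # hypothetical true success probability
y = rng.binomial(1, theta_true, size=50)  # simulated Bernoulli data
T, m = y.size, int(y.sum())

alpha, delta = 0.5, 0.5                   # the "relatively noninformative" prior
alpha_bar = alpha + m                     # prior belief updated with successes
delta_bar = delta + T - m                 # ... and with failures

print(f"posterior: Beta({alpha_bar}, {delta_bar}), "
      f"mean = {alpha_bar / (alpha_bar + delta_bar):.3f}")
```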

Example (cont.): Interpretation and Terminology

The posterior has the same Beta form as the prior (terminology: conjugate)
The posterior has arguments ᾱ and δ̄ instead of α and δ
The arguments have been updated:
Begin with prior beliefs (α and δ), update with data information (m and
T − m)
The posterior combines prior and data information
“Bayesian learning” = learning about θ by combining prior and data
information

Example (cont.): Predictive Density

Derivations of the marginal likelihood and the predictive density are a bit
messier
Exercise 7.1 in Bayesian Econometric Methods shows the predictive
density has a Beta-Binomial distribution
It shows

E(y*|y) = ᾱ / (ᾱ + δ̄)

How do I interpret this?
Question: If I run the experiment again, what is the probability of
getting a success?
Answer: ᾱ / (ᾱ + δ̄)
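This result can be checked by simulation: draw θ from the posterior, then draw y* given θ, and average. A minimal Python sketch (posterior parameters hypothetical):

```python
import numpy as np

# Check E(y*|y) = a_bar/(a_bar + d_bar) by simulating from the predictive:
# first θ ~ p(θ|y), then y* ~ Bernoulli(θ).
rng = np.random.default_rng(2)
a_bar, d_bar = 8.0, 4.0                  # hypothetical posterior parameters
theta = rng.beta(a_bar, d_bar, size=200_000)
y_star = rng.binomial(1, theta)          # one future outcome per posterior draw
print(y_star.mean(), a_bar / (a_bar + d_bar))  # both ≈ 0.667
```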

Summary

These few pages have outlined all the basic theoretical concepts
required for the Bayesian to learn about parameters, compare models
and predict.
This is an enormous advantage: once you accept that unknown
things (i.e. θ, M_i and y*) are random variables, the rest of the Bayesian
approach is non-controversial.
What are we going to do in the rest of this course?
See how these concepts work in some models of interest:
First the regression model
Then time series models of interest for macroeconomics
Bayesian computation.

Bayesian Computation

How do you present results from a Bayesian empirical analysis?
p(θ|y) is a p.d.f. Especially if θ is a vector of many parameters, we
cannot present a graph of it.
We want features analogous to frequentist point estimates and
confidence intervals.
A common point estimate is the mean of the posterior density (the
posterior mean).
Let θ be a vector with k elements, θ = (θ_1, ..., θ_k)′. The posterior
mean of any element of θ is:

E(θ_i|y) = ∫ θ_i p(θ|y) dθ.

Aside, Definition B.8: Expected Value
Let g(·) be a function; then the expected value of g(X), denoted
E[g(X)], is defined by:

E[g(X)] = ∑_{i=1}^N g(x_i) p(x_i)

if X is a discrete random variable with sample space {x_1, x_2, x_3, ..., x_N},
and

E[g(X)] = ∫_{−∞}^{∞} g(x) p(x) dx

if X is a continuous random variable (provided E[g(X)] < ∞).

A common measure of dispersion is the posterior standard deviation
(the square root of the posterior variance)
The posterior variance is:

var(θ_i|y) = E(θ_i²|y) − {E(θ_i|y)}²,

which requires calculating another expected value:

E(θ_i²|y) = ∫ θ_i² p(θ|y) dθ.

Many other features may be of interest. E.g. what is the probability that
a coefficient is positive?

p(θ_i ≥ 0|y) = ∫_0^∞ p(θ_i|y) dθ_i
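For the Beta posterior of the running example, such features can be computed directly. A minimal Python sketch (parameters hypothetical; since θ is a probability, the analogue of a "positive coefficient" event here is θ exceeding one half):

```python
from scipy.stats import beta

# Posterior features for the running Beta example.
posterior = beta(8.0, 4.0)        # p(θ|y) = Beta(a_bar, d_bar), hypothetical

print("posterior mean:", posterior.mean())
print("posterior std :", posterior.std())
print("p(θ >= 0.5 | y):", posterior.sf(0.5))   # survival function = 1 - CDF
```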

All of these posterior features have the form:

E[g(θ)|y] = ∫ g(θ) p(θ|y) dθ,

where g(θ) is a function of interest.
All these features have integrals in them. The marginal likelihood and
predictive density also involve integrals.
Apart from a few simple cases, it is not possible to evaluate these
integrals analytically, and we must turn to the computer.

Posterior Simulation

The integrals involved in Bayesian analysis are usually evaluated using
simulation methods.
We will use several methods later on. Here we provide some intuition.
Frequentist asymptotic theory uses Laws of Large Numbers (LLN)
and Central Limit Theorems (CLT).
A typical LLN: “consider a random sample, Y_1, ..., Y_N; as N goes to
infinity, the average converges to its expectation” (e.g. Ȳ → µ)
Bayesians use an LLN of the form: “consider a random sample from the
posterior, θ^(1), ..., θ^(S); as S goes to infinity, the average of these
converges to E[θ|y]”
Note: Bayesians use asymptotic theory too, but asymptotic in S (which is
under the control of the researcher), not in N

Example: Monte Carlo integration.
Let θ^(s) for s = 1, ..., S be a random sample from p(θ|y) and define

ĝ_S = (1/S) ∑_{s=1}^S g(θ^(s)),

then ĝ_S converges to E[g(θ)|y] as S goes to infinity.
Monte Carlo integration approximates E[g(θ)|y], but only if S were
infinite would the approximation error be zero.
We can choose any value for S (but larger values of S will increase the
computational burden).
To gauge the size of the approximation error, use a CLT to obtain a numerical
standard error.
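A minimal Python sketch of Monte Carlo integration with its numerical standard error (the posterior and the function g are hypothetical, chosen so the exact answer is known):

```python
import numpy as np

# Monte Carlo integration: approximate E[g(θ)|y] by averaging g over S
# posterior draws; the CLT gives a numerical standard error (NSE).
rng = np.random.default_rng(3)
S = 10_000
theta = rng.beta(8.0, 4.0, size=S)   # draws from a hypothetical Beta posterior

g = theta**2                         # example function of interest: g(θ) = θ²
g_hat = g.mean()                     # ĝ_S, the Monte Carlo estimate
nse = g.std(ddof=1) / np.sqrt(S)     # numerical standard error from the CLT

print(f"E[θ²|y] ≈ {g_hat:.4f} (NSE {nse:.4f})")
# Exact value for Beta(8,4): var + mean² = 32/(144·13) + (2/3)² ≈ 0.4615
```

Since the NSE shrinks at rate 1/√S, quadrupling S roughly halves the approximation error.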

Most Bayesians write their own programs (e.g. using Matlab, Julia,
Python, Gauss, R or C++) to do posterior simulation
Bayesian work cannot (easily) be done in standard econometric
packages like Microfit, Eviews or Stata.
Newer versions of Stata have some Bayesian features, but they are limited
(and offer little for macroeconomics)
I have a Matlab website for VARs, TVP-VARs and TVP-FAVARs (see
my website)
Dimitris Korobilis:
https://sites.google.com/site/dimitriskorobilis/matlab
Joshua Chan: http://joshuachan.org/
Haroon Mumtaz: https://sites.google.com/site/hmumtaz77/
For many more using R, see
http://cran.r-project.org/web/views/Bayesian.html

Learning Outside of Lectures

Go through the textbook and readings provided.


In addition to this:
Computational methods are the most important thing for the aspiring
Bayesian econometrician to learn
Thus, I devote all of the tutorial hours in this course to computer
sessions
Four computer sessions based on four question sheets
Computer code will be provided which will “answer” the questions
Work through/adapt/extend the code
Idea is to develop skills so as to produce your own code or adapt
someone else’s for your purposes

Learning Outside of Lectures

What about proofs/derivations of theoretical results?

In lectures (with a few exceptions) I will not do proofs
E.g. I will just state that a particular posterior is Normal, with formulae
given for its mean and variance
To use Bayesian methods in practice, this is usually all that is needed
But if you want to derive the posterior for a new model or obtain a deeper
understanding, you need to learn the necessary tools
These tools are best learned by practicing on your own
I will provide Problem Sheets which give practice problems and ask
for derivations of some key results
Answers are provided, so I will not formally take them up in lectures
or tutorials
Bayesian Econometric Methods by Koop, Poirier and Tobias has
many more practice problems (and answers)
