Quantitative Economics with Julia

September 8, 2020
Chapter 1

Setting Up Your Julia Environment
1.1 Contents
• Overview 1.2
• A Note on Jupyter 1.3
• Desktop Installation of Julia and Jupyter 1.4
• Using Julia on the Web 1.5
• Installing Packages 1.6
1.2 Overview
In this lecture we will cover how to get up and running with Julia.
There are a few different options for using Julia, including a local desktop installation and
Jupyter hosted on the web.
If you have access to a web-based Jupyter and Julia setup, it is typically the most straightforward way to get started.
Like Python and R, and unlike products such as Matlab and Stata, there is a looser connection between Julia as a programming language and Julia as a specific development environment.
While you will eventually use other editors, there are some advantages to starting with the
Jupyter environment while learning Julia.
• The ability to mix formatted text (including mathematical expressions) and code in a
single document.
• Nicely formatted output including tables, figures, animation, video, etc.
• Conversion tools to generate PDF slides, static HTML, etc.
• Online Jupyter may be available, and requires no installation.
We’ll discuss the workflow on these features in the next lecture.
This is called the Julia REPL (Read-Evaluate-Print Loop), which we discuss more later.

• In the Julia REPL, hit ] to enter package mode and then enter add InstantiateFromURL.

Then run

using InstantiateFromURL
github_project("QuantEcon/quantecon-notebooks-julia", version = "0.8.0", instantiate = true)
If you have previously installed Jupyter (e.g., by installing Anaconda Python from https://www.anaconda.com/download/), then ] add IJulia installs everything you need into your existing environment.
Otherwise - or in addition - you can install it directly from the Julia REPL
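For example, from the REPL's package mode (hit ] first; the prompt details may differ by version):

] add IJulia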
Choose the default, y if asked to install Jupyter and then JupyterLab via Conda.
After the installation, a JupyterLab tab should open in your browser.
(Optional) To enable launching JupyterLab from a terminal, add Julia's Jupyter to your path.
Next, let’s install the QuantEcon lecture notes to our machine and run them (for more details
on the tools we’ll use, see our lecture on version control).
1. Install git.
After installing the Git Desktop application, click this link on your desktop computer to automatically install the notebooks.
It should open a window in the GitHub desktop app like this
If you do not wish to install the GitHub Desktop, you can get the notebooks using the Git
command-line tool.
Open a new terminal session and run
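A clone command along these lines should work, using the repository URL given below:

git clone https://github.com/QuantEcon/quantecon-notebooks-julia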
This will download the repository with the notebooks in the working directory.
Then, cd to that location in your Mac, Linux, or Windows PowerShell terminal
cd quantecon-notebooks-julia
Then, either run using IJulia; jupyterlab() from a Julia REPL, or execute jupyter lab within your shell.
And open the Interacting With Julia lecture (the file julia_environment.ipynb in the
list of notebooks in JupyterLab) to continue.
If you have access to an online Julia installation, it is the easiest way to get started.
Eventually, you will want to do a local installation in order to use other tools and editors such
as Atom/Juno, but don’t let the environment get in the way of learning the language.
If you have access to a web-based solution for Jupyter, then that is typically a straightforward
option
• Students: ask your department if these resources are available.
• Universities and workgroups: email [email protected] for help on setting up a shared JupyterHub instance with precompiled packages ready for these lecture notes.
Obtaining Notebooks
Your first step is to get a copy of the notebooks in your JupyterHub environment.
While you can individually download the notebooks from the website, the easiest way to access the notebooks is usually to clone the repository with Git into your JupyterHub environment.
JupyterHub installations have different methods for cloning repositories; whichever method you use, the URL for the notebooks repository is https://github.com/QuantEcon/quantecon-notebooks-julia.
After you have some of the notebooks available, as above, note that these lectures depend on functionality (such as packages for plotting, benchmarking, and statistics) that is not installed with every Jupyter installation on the web.
If your online Jupyter does not come with QuantEcon packages pre-installed, you can install
the InstantiateFromURL package, which is a tool written by the QE team to manage
package dependencies for the lectures.
To add this package, in an online Jupyter notebook run (typically with <Shift-Enter>)

] add InstantiateFromURL

Then, run

using InstantiateFromURL
github_project("QuantEcon/quantecon-notebooks-julia", version = "0.8.0", instantiate = true)
If your online Jupyter environment does not have the packages pre-installed, it may take 15-
20 minutes for your first QuantEcon notebook to run.
After this step, open the downloaded Interacting with Julia notebook to begin writing code.
If the QuantEcon notebooks do not work after this installation step, you may need to speak
to the JupyterHub administrator.
Chapter 2

Interacting with Julia
2.1 Contents
• Overview 2.2
• Using Jupyter 2.3
• Using the REPL 2.4
• (Optional) Adding Jupyter to the Path 2.5
2.2 Overview
In this lecture we’ll start examining different features of the Julia and Jupyter environments.
Recall that the easiest way to get started with these notebooks is to follow the cloning instructions earlier.
To summarize: if on a desktop, you should clone the notebooks repository https://github.com/quantecon/quantecon-notebooks-julia, then in a Julia REPL type

using IJulia; jupyterlab()

Hint: Julia will remember the last commands in the REPL, so you can use up-arrow to restart JupyterLab.
Alternatively, if you are using an online Jupyter, then you can directly open a new notebook.
Finally, if you installed Jupyter separately or have added Jupyter to the path, then cd to the folder location in a terminal, and run
jupyter lab
Regardless, your web browser should open to a page that looks something like this
The notebook displays an active cell, into which you can type Julia commands.
Notice that in the previous figure the cell is surrounded by a blue border.
This means that the cell is selected, and double-clicking will place it in edit mode.
As a result, you can type in Julia code and it will appear in the cell.
When you’re ready to execute these commands, hit Shift-Enter
Modal Editing
The next thing to understand about the Jupyter notebook is that it uses a modal editing sys-
tem.
This means that the effect of typing at the keyboard depends on which mode you are in.
The two modes are

1. Edit mode

2. Command mode
Switching modes
• To switch to command mode from edit mode, hit the Esc key.
• To switch to edit mode from command mode, hit Enter or click in a cell.
The modal behavior of the Jupyter notebook is a little tricky at first but very efficient when
you get used to it.
To run an existing Julia file using the notebook you can copy and paste the contents into a cell in the notebook.

If it's a long file, however, you have the alternative of saving the file in the present working directory and executing include("filename") in a cell.

The present working directory can be found by executing the command pwd().
Plots
Note that if you’re using a JupyterHub setup, you will need to first run
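A minimal plotting cell such as the following (the exact example may differ) produces a figure:

using Plots
gr(fmt = :png) # setting for easier display in jupyter notebooks
plot(sin, -2π, 2π, label = "sin(x)")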
You’ll see something like this (although the style of plot depends on your installation)
Note: The “time-to-first-plot” in Julia takes a while, since it needs to compile many func-
tions - but is almost instantaneous the second time you run the cell.
Let’s go over some more Jupyter notebook features — enough so that we can press ahead
with programming.
Tab Completion
Tab completion in Jupyter makes it easy to find Julia commands and functions available.
For example if you type rep and hit the tab key you’ll get a list of all commands that start
with rep
Getting Help
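To get help on a function or other object, prepend its name with ? in a cell; for example

?typeof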
Other Content
In addition to executing code, the Jupyter notebook allows you to embed text, equations, fig-
ures and even videos in the page.
For example, here we enter a mixture of plain text and LaTeX instead of code.

Next we hit Esc to enter command mode and then type m to indicate that we are writing Markdown, a mark-up language similar to (but simpler than) LaTeX.

(You can also use your mouse to select Markdown from the Code drop-down box just below the list of menu items.)

Now we hit Shift + Enter to produce this
Julia supports the use of unicode characters such as α and β in your code.
Unicode characters can be typed quickly in Jupyter using the tab key.
Try creating a new code cell and typing \alpha, then hitting the tab key on your keyboard.
Shell Commands
You can execute shell commands (system commands) in Jupyter by prepending a semicolon.
For example, ; ls will execute the UNIX style shell command ls, which — at least for
UNIX style operating systems — lists the contents of the current working directory.
These shell commands are handled by your default system shell and hence are platform specific.
Package Operations
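Analogously to shell commands, package-mode operations can be run in Jupyter by prepending a cell with ]; a minimal example, assuming the cell maps directly to REPL package mode:

] st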
Sharing Notebooks

Notebook files are just text files structured in JSON and typically end with .ipynb.
A notebook can easily be saved and shared between users — you just need to pass around the
ipynb file.
To open an existing ipynb file, import it from the dashboard (the first browser page that
opens when you start Jupyter notebook) and run the cells or edit as discussed above.
The Jupyter organization has a site for sharing notebooks called nbviewer which provides static HTML representations of notebooks.
QuantEcon also hosts the QuantEcon Notes website, where you can upload and share your
notebooks with other economists and the QuantEcon community.
If you are using a JupyterHub installation, you can start the REPL in JupyterLab by choosing a Julia console from the launcher.
We examine the REPL and its different modes in more detail in the tools and editors lecture.
If you installed Jupyter using Julia, then you may find it convenient to add it to your system
path in order to launch JupyterLab without running a Julia terminal.
The default location for the Jupyter binaries is relative to the .julia folder (e.g., C:\Users\USERNAME\.julia\conda\3\Scripts on Windows).
You can find the directory from a Julia REPL by executing
] add Conda
using Conda
Conda.SCRIPTDIR
Chapter 3

Introductory Examples
3.1 Contents
• Overview 3.2
• Example: Plotting a White Noise Process 3.3
• Example: Variations on Fixed Points 3.4
• Exercises 3.5
• Solutions 3.6
3.2 Overview
3.2.1 Level
Our approach is aimed at those who already have at least some knowledge of programming —
perhaps experience with Python, MATLAB, Fortran, C or similar.
In particular, we assume you have some familiarity with fundamental programming concepts
such as
• variables
• arrays or vectors
• loops
• conditionals (if/else)
3.2.2 Approach
In this lecture we will write and then pick apart small Julia programs.
At this stage the objective is to introduce you to basic syntax and data structures.
Deeper concepts—how things work—will be covered in later lectures.
Since we are looking for simplicity, the examples are a little contrived.

In this lecture, we will often start with a direct MATLAB/Fortran approach, which is often poor coding style in Julia, but then move towards more elegant code which is tightly connected to the mathematics.
3.2.3 Set Up
We assume that you’ve worked your way through our getting started lecture already.
In particular, the easiest way to install and precompile all the Julia packages used in QuantEcon notes is to type ] add InstantiateFromURL and then work in a Jupyter notebook, as described here.
To begin, let’s suppose that we want to simulate and plot the white noise process
𝜖0 , 𝜖1 , … , 𝜖𝑇 , where each draw 𝜖𝑡 is independent standard normal.
1. add the packages directly into your global installation (e.g. Pkg.add("MyPackage")
or ] add MyPackage)
If you have never run this code on a particular computer, it is likely to take a long time as it
downloads, installs, and compiles all dependent packages.
This code will download and install project files from the lecture repo.
We will discuss it more in Tools and Editors, but these files provide a listing of packages and
versions used by the code.
This ensures that an environment for running code is reproducible, so that anyone can replicate the precise set of packages and versions used in construction.
The careful selection of package versions is crucial for reproducibility, as otherwise your code
can be broken by changes to packages out of your control.
After the installation and activation, using provides a way to say that a particular code or
notebook will use the package.
Some functions are built into base Julia, such as randn, which returns a single draw from a normal distribution with mean 0 and variance 1 if given no parameters.
In [3]: randn()
Out[3]: -0.1428301483114254
Other functions require importing all of the names from an external library

using Plots
gr(fmt = :png) # setting for easier display in jupyter notebooks

n = 100
ϵ = randn(n)
plot(1:n, ϵ)
Out[4]:
3.3.3 Arrays
As a language intended for mathematical and scientific computing, Julia has strong support
for using unicode characters.
In the above case, the ϵ and many other symbols can be typed in most Julia editors by providing the LaTeX name followed by <TAB>, i.e. \epsilon<TAB>.
The return type is one of the most fundamental Julia data types: an array
In [5]: typeof(ϵ)
Out[5]: Array{Float64,1}
In [6]: ϵ[1:5]
The information from typeof() tells us that ϵ is an array of 64 bit floating point values, of
dimension 1.
In Julia, one-dimensional arrays are interpreted as column vectors for purposes of linear algebra.
The ϵ[1:5] returns an array of the first 5 elements of ϵ.
Notice from the above that
• array indices start at 1 (like MATLAB and Fortran, but unlike Python and C)
• array elements are referenced using square brackets (unlike MATLAB and Fortran)
To get help and examples in Jupyter or other Julia editors, use the ? before a function name or syntax.
?typeof
Examples
julia> a = 1//2;
julia> typeof(a)
Rational{Int64}
julia> M = [1.0 2.0; 3.0 4.0];

julia> typeof(M)
Array{Float64,2}
Although there’s no need in terms of what we wanted to achieve with our program, for the
sake of learning syntax let’s rewrite our program to use a for loop for generating the data.
Note
In Julia v0.7 and up, the rules for variables accessed in for and while loops can
be sensitive to how they are used (and variables can sometimes require a global
as part of the declaration). We strongly advise you to avoid top level (i.e. in the
REPL or outside of functions) for and while loops outside of Jupyter note-
books. This issue does not apply when used within functions.
Starting with the most direct version, and pretending we are in a world where randn can
only return a single value
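Under that single-draw assumption, the cell might look like:

n = 100
ϵ = zeros(n)
for i in 1:n
    ϵ[i] = randn() # fill in each draw separately
end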
Here we first declared ϵ to be a vector of n numbers, initialized by the floating point 0.0.
The for loop then populates this array by successive calls to randn().
Like all code blocks in Julia, the end of the for loop code block (which is just one line here)
is indicated by the keyword end.
The word in from the for loop can be replaced by either ∈ or =.
The index variable is looped over for all integers from 1:n – but this does not actually create
a vector of those indices.
Instead, it creates an iterator that is looped over – in this case the range of integers from 1
to n.
While this example successfully fills in ϵ with the correct values, it is very indirect as the connection between the index i and the ϵ vector is unclear.
To fix this, use eachindex
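That is, something along these lines:

for i in eachindex(ϵ)
    ϵ[i] = randn()
end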
Out[9]: 0.1361239072181655
Out[10]: true
In these examples, note the use of ≈ to test equality, rather than ==, which is appropriate for
integers and other types.
Approximately equal, typed with \approx<TAB>, is the appropriate way to compare any
floating point numbers due to the standard issues of floating point math.
For the sake of the exercise, let’s go back to the for loop but restructure our program so
that generation of random variables takes place within a user-defined function.
To make things more interesting, instead of directly plotting the draws from the distribution,
let’s plot the squares of these draws
data = generatedata(10)
plot(data)
Out[11]:
Here
• function is a Julia keyword that indicates the start of a function definition
• generatedata is an arbitrary name for the function
• return is a keyword indicating the return value, and is often unnecessary
Let us make this example slightly better by "remembering" that randn can return a vector.

function generatedata(n)
    ϵ = randn(n) # use built-in function
    for i in eachindex(ϵ)
        ϵ[i] = ϵ[i]^2 # squaring the result
    end
    return ϵ
end
data = generatedata(5)
While better, the loop over the i index to square the results is difficult to read.

Instead of looping, we can broadcast the ^2 square operation over a vector using a ., as sketched below.
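One way to write this, keeping the same function name:

generatedata(n) = randn(n).^2 # broadcast the square over the vector
data = generatedata(5)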
To be clear, unlike Python, R, and MATLAB (to a lesser extent), the reason to drop the for is not for performance reasons, but rather for code clarity.

Loops of this sort are at least as efficient as the vectorized approach in compiled languages like Julia, so use a for loop if you think it makes the code more clear.
Finally, we can broadcast any function, where squaring is only a special case.

As a final – abstract – approach, we can make the generatedata function generically apply any function passed to it, as sketched below.
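For example, under the same naming:

generatedata(n, gen) = gen.(randn(n)) # broadcasts gen over the draws
f(x) = x^2
data = generatedata(5, f)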
Whether this example is better or worse than the previous version depends on how it is used.
High degrees of abstraction and generality, e.g. passing in a function f in this case, can make
code either clearer or more confusing, but Julia enables you to use these techniques with no
performance overhead.
For this particular case, the clearest and most general solution is probably the simplest.
x = randn(n)
plot(f.(x), label="x^2")
plot!(x, label="x") # layer on the same plot
Out[17]:
using Distributions, Plots

function plothistogram(distribution, n)
    ϵ = rand(distribution, n) # n draws from distribution
    histogram(ϵ)
end

lp = Laplace()
plothistogram(lp, 500)
Out[18]:
Let’s have a casual discussion of how all this works while leaving technical details for later in
the lectures.
First, lp = Laplace() creates an instance of a data type defined in the Distributions
module that represents the Laplace distribution.
The name lp is bound to this value.
When we make the function call plothistogram(lp, 500) the code in the body of the
function plothistogram is run with
• the name distribution bound to the same value as lp
• the name n bound to the integer 500
A Mystery
In [19]: rand(3)
On the other hand, distribution points to a data type representing the Laplace distribution that has been defined in a third party package.
So how can it be that rand() is able to take this kind of value as an argument and return
the output that we want?
The answer in a nutshell is multiple dispatch, which Julia uses to implement generic programming.
This refers to the idea that functions in Julia can have different behavior depending on the
particular arguments that they’re passed.
Hence in Julia we can take an existing function and give it a new behavior by defining how it
acts on a new type of value.
The compiler knows which function definition to apply in a given setting by looking at the types of the values the function is called on.
In Julia these alternative versions of a function are called methods.
Consider the simple equation, where the scalars 𝑝, 𝛽 are given, and 𝑣 is the scalar we wish to
solve for
𝑣 = 𝑝 + 𝛽𝑣
Of course, in this simple example, with parameter restrictions this can be solved as $v = p/(1 - \beta)$.
Rearrange the equation in terms of a map 𝑓(𝑥) ∶ ℝ → ℝ
𝑣 = 𝑓(𝑣) (1)
where
𝑓(𝑣) ∶= 𝑝 + 𝛽𝑣
One approach to finding a fixed point of (1) is to start with an initial value, and iterate the map

$$v_{n+1} = f(v_n) \tag{2}$$
For this exact f function, we can see the convergence to $v = p/(1-\beta)$ when $|\beta| < 1$ by iterating backwards and taking $n \to \infty$

$$v_{n+1} = p + \beta v_n = p + \beta p + \beta^2 v_{n-1} = \cdots = p \sum_{i=0}^{n} \beta^i + \beta^{n+1} v_0$$
To implement the iteration in (2), we start by solving this problem with a while loop.
The syntax for the while loop contains no surprises, and looks nearly identical to a MATLAB
implementation.
The while loop, like the for loop should only be used directly in Jupyter or the inside of a
function.
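A sketch of that while-loop cell (intended for a Jupyter notebook, per the note above; parameter values match the setup used below):

using LinearAlgebra # for norm()

p = 1.0
β = 0.9
maxiter = 1000
tolerance = 1.0E-7
v_old = 0.8 # initial condition
normdiff = Inf
iter = 1
while normdiff > tolerance && iter <= maxiter
    v_new = p + β * v_old # the f(v) map
    normdiff = norm(v_new - v_old)
    v_old = v_new # replace and continue
    iter = iter + 1
end
println("Fixed point = $v_old, and |f(x) - x| = $normdiff in $iter iterations")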
Here, we have used the norm function (from the LinearAlgebra base library) to compare
the values.
The other new function is the println with the string interpolation, which splices the value
of an expression or variable prefixed by $ into a string.
An alternative approach is to use a for loop, and check for convergence in each iteration.
# reusing p, β, maxiter, and tolerance from above
v_old = 0.8 # initial condition
normdiff = Inf
iter = maxiter
for i in 1:maxiter
    v_new = p + β * v_old # the f(v) map
    normdiff = norm(v_new - v_old)
    if normdiff < tolerance # check convergence
        iter = i
        break # converged, exit loop
    end
    # replace and continue
    v_old = v_new
end
println("Fixed point = $v_old, and |f(x) - x| = $normdiff in $iter iterations")
The new feature here is break, which leaves a for or while loop.
The first problem with this setup is that it depends on being sequentially run – which can be
easily remedied with a function.
# some values
p = 1.0 # note 1.0 rather than 1
β = 0.9
maxiter = 1000
tolerance = 1.0E-7
v_initial = 0.8 # initial condition
The chief issue is that the algorithm (finding a fixed point) is reusable and generic, while the map we iterate, p + β * v, is specific to our problem.

A key feature of languages like Julia is the ability to efficiently handle functions passed to other functions.
maxiter = 1000
tolerance = 1.0E-7
v_initial = 0.8 # initial condition
Much closer, but there are still hidden bugs if the user orders the settings or return types wrong.

To fix this, Julia has two features: named function parameters, and named tuples
function fixedpointmap(f; iv, tolerance=1E-7, maxiter=1000)
    # setup the algorithm
    x_old = iv
    normdiff = Inf
    iter = 1
    while normdiff > tolerance && iter <= maxiter
        x_new = f(x_old) # use the passed in map
        normdiff = norm(x_new - x_old)
        x_old = x_new
        iter = iter + 1
    end
    return (value = x_old, normdiff=normdiff, iter=iter) # A named tuple
end
" iterations")
In this example, all function parameters after the ; in the list, must be called by name.
Furthermore, a default value may be enabled – so the named parameter iv is required while
tolerance and maxiter have default values.
The return type of the function also has named fields, value, normdiff, and iter – all
accessed intuitively using ..
To show the flexibilty of this code, we can use it to find a fixed point of the non-linear logistic
equation, 𝑥 = 𝑓(𝑥) where 𝑓(𝑥) ∶= 𝑟𝑥(1 − 𝑥).
In [25]: r = 2.0
f(x) = r * x * (1 - x)

sol = fixedpointmap(f; iv = 0.8) # reconstructed; the original lines were truncated
println("Fixed point = $(sol.value), and |f(x) - x| = $(sol.normdiff) in $(sol.iter)" *
        " iterations")
using NLsolve

p = 1.0
β = 0.9
f(v) = p .+ β * v # broadcast the +
sol = fixedpoint(f, [0.8])
println("Fixed point = $(sol.zero), and |f(x) - x| = $(norm(f(sol.zero) - sol.zero)) in " *
        "$(sol.iterations) iterations")
The fixedpoint function from the NLsolve.jl library implements the simple fixed point
iteration scheme above.
Since the NLsolve library only accepts vector based inputs, we needed to make the f(v)
function broadcast on the + sign, and pass in the initial condition as a vector of length 1 with
[0.8].
While a key benefit of using a package is that the code is clearer, and the implementation is
tested, by using an orthogonal library we also enable performance improvements.
" *
"$(sol.iterations) iterations")
Note that this completes in 3 iterations vs 177 for the naive fixed point iteration algorithm.
Since Anderson iteration is doing more calculations in an iteration, whether it is faster or not
would depend on the complexity of the f function.
But this demonstrates the value of keeping the math separate from the algorithm, since by
decoupling the mathematical definition of the fixed point from the implementation in (2), we
were able to exploit new algorithms for finding a fixed point.
The only other change in this function is the move from directly defining f(v) to using an anonymous function.
Similar to anonymous functions in MATLAB, and lambda functions in Python, Julia enables
the creation of small functions without any names.
The code v -> p .+ β * v defines a function of a dummy argument, v with the same
body as our f(x).
A key benefit of using Julia is that you can compose various packages, types, and techniques,
without making changes to your underlying source.
As an example, consider if we want to solve the model with higher precision, as floating points cannot be distinguished beyond the machine epsilon for that type (recall that computers approximate real numbers to the nearest binary of a given precision; the machine epsilon is the smallest nonzero magnitude).
In Julia, this number can be calculated as
In [28]: eps()
Out[28]: 2.220446049250313e-16
For many cases, this is sufficient precision – but consider that in iterative algorithms applied
millions of times, those small differences can add up.
The only change we will need to our model in order to use a different floating point type is to
call the function with an arbitrary precision floating point, BigFloat, for the initial value.
iv = [BigFloat(0.8)] # higher precision initial condition
# otherwise identical
sol = fixedpoint(v -> p .+ β * v, iv)
println("Fixed point = $(sol.zero), and |f(x) - x| = $(norm(f(sol.zero) - sol.zero)) in " *
        "$(sol.iterations) iterations")
Here, the literal BigFloat(0.8) takes the number 0.8 and changes it to an arbitrary pre-
cision number.
The result is that the residual is now exactly 0.0 since it is able to use arbitrary precision in the calculations, and the fixed point has an exact representation with those parameters.
The above example can be extended to multivariate maps without any modifications to the
fixed point iteration code.
Using our own, homegrown iteration and simply passing in a bivariate map:

p = [1.0, 2.0] # example parameters; the original cell was truncated
β = 0.9
iv = [0.8, 2.0]
f(v) = p .+ β * v # note that p and β are used in the function!

sol = fixedpointmap(f; iv = iv, tolerance = 1.0E-8)
println("Fixed point = $(sol.value), and |f(x) - x| = $(sol.normdiff) in $(sol.iter) " *
        "iterations")
This also works without any modifications with the fixedpoint library function.
" *
"$(sol.iterations) iterations")
" *
"$(sol.iterations) iterations")
The @SVector in front of the [1.0, 2.0, 0.1] is a macro for turning a vector literal into
a static vector.
All macros in Julia are prefixed by @ in the name, and manipulate the code prior to compila-
tion.
We will see a variety of macros, and discuss the “metaprogramming” behind them in a later
lecture.
3.5 Exercises
3.5.1 Exercise 1
3.5.2 Exercise 2
3.5.3 Exercise 3
3.5.4 Exercise 4
Write a program that prints one realization of the following random device:
• Flip an unbiased coin 10 times.
• If 3 consecutive heads occur one or more times within this sequence, pay one dollar.
• If not, pay nothing.
Once again use only rand() as your random number generator.
3.5.5 Exercise 5
3.5.6 Exercise 6
Plot three simulated time series, one for each of the cases 𝛼 = 0, 𝛼 = 0.8 and 𝛼 = 0.98.
(The figure will illustrate how time series with the same one-step-ahead conditional volatili-
ties, as these three processes have, can have very different unconditional volatilities)
3.5.7 Exercise 7
1. calculate the first-passage time, 𝑇0 , for 100 simulated random walks – to a 𝑡max = 200
and plot a histogram
2. plot the sample mean of 𝑇0 from the simulation for 𝛼 ∈ {0.8, 1.0, 1.2}
3.5.8 Exercise 8

1. Implement Newton's method for finding a root of a function 𝑓, iterating on

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$$
2. Test it with 𝑓(𝑥) = (𝑥 − 1)3 and another function of your choice where you can analyti-
cally find the derivative.
For those impatient to use more advanced features of Julia, implement a version of Exercise 8(a) where f_prime is calculated with auto-differentiation.

using ForwardDiff

# operator to get the derivative of a function using automatic differentiation
D(f) = x -> ForwardDiff.derivative(f, x)

# example usage: create a function which is the derivative of f
f(x) = x^2
f_prime = D(f)

f(0.1), f_prime(0.1)
1. Using the D(f) operator definition above, implement a version of Newton’s method
that does not require the user to provide an analytical derivative.
3.6 Solutions
3.6.1 Exercise 1
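One possible solution is a loop-based factorial, consistent with the calls below:

function factorial2(n)
    k = 1 # holds the running product
    for i in 1:n
        k *= i
    end
    return k
end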
factorial2(4)
Out[34]: 24
factorial2(4) == factorial(4) # compare with the built-in factorial

Out[35]: true
3.6.2 Exercise 2
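A sketch of a binomial draw using only rand(), consistent with the loop below:

function binomial_rv(n, p)
    count = 0
    for i in 1:n
        if rand() < p
            count += 1
        end
    end
    return count
end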
for j in 1:25
b = binomial_rv(10, 0.5)
print("$b, ")
end
6, 5, 5, 3, 8, 5, 2, 6, 6, 6, 5, 3, 8, 8, 5, 7, 8, 5, 3, 4, 8, 8, 8, 4, 5,
3.6.3 Exercise 3
In [37]: n = 1000000
count = 0
for i in 1:n
u, v = rand(2)
d = sqrt((u - 0.5)^2 + (v - 0.5)^2) # distance from middle of square
if d < 0.5
count += 1
end
end
area_estimate = count / n
println(area_estimate * 4) # the circle has radius 0.5, so its area is π/4

3.144936
3.6.4 Exercise 4
In [38]: payoff = 0
count = 0
print("Count = ")
for i in 1:10
U = rand()
if U < 0.5
count += 1
else
count = 0
end
print(count)
if count == 3
payoff = 1
end
end
println("\npayoff = $payoff")
Count = 1200123012
payoff = 1
We can simplify this somewhat using the ternary operator. Here are some examples
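For instance, cells along these lines produce the outputs below:

1 < 2 ? "foo" : "bar" # evaluates to "foo"
1 > 2 ? "foo" : "bar" # evaluates to "bar"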
Out[39]: "foo"
Out[40]: "bar"
print("Count = ")
for i in 1:10
U = rand()
count = U < 0.5 ? count + 1 : 0
print(count)
if count == 3
payoff = 1
end
end
println("\npayoff = $payoff")
Count = 0100101230
payoff = 1
3.6.5 Exercise 5
α = 0.9 # example values; the exercise's exact settings may differ
n = 200
x = zeros(n + 1)

for t in 1:n
    x[t+1] = α * x[t] + randn()
end
plot(x)
Out[42]:
3.6.6 Exercise 6
αs = [0.0, 0.8, 0.98]
n = 200 # example length
p = plot() # naming a plot to add to

for α in αs
x = zeros(n + 1)
x[1] = 0.0
for t in 1:n
x[t+1] = α * x[t] + randn()
end
plot!(p, x, label = "alpha = $α") # add to plot p
end
p # display plot
Out[43]:
As a hint, notice the following pattern for finding the number of draws of a uniform random
number until it is below a given threshold
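A sketch of that pattern (the function name is illustrative):

function drawsuntil(threshold)
    count = 1
    while rand() > threshold # keep drawing until one falls below threshold
        count += 1
    end
    return count
end

drawsuntil(0.5)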
Out[44]: 2
Additionally, it is sometimes convenient to just push numbers onto an array without indexing it directly
vals = Float64[] # an empty array to push onto
for i in 1:100
val = rand()
if val < 0.5
push!(vals, val)
end
end
println("There were $(length(vals)) below 0.5")
Chapter 4

Julia Essentials
4.1 Contents
• Overview 4.2
• Common Data Types 4.3
• Iterating 4.4
• Comparisons and Logical Operators 4.5
• User-Defined Functions 4.6
• Broadcasting 4.7
• Scoping and Closures 4.8
• Exercises 4.9
• Solutions 4.10
Having covered a few examples, let’s now turn to a more systematic exposition of the essen-
tial features of the language.
4.2 Overview
Topics:
• Common data types
• Iteration
• More on user-defined functions
• Comparisons and logic
4.2.1 Setup
4.3 Common Data Types

Like most languages, Julia defines and provides functions for operating on standard data types such as
• integers
• floats
• strings
• arrays, etc…
Let’s learn a bit more about them.
4.3.1 Primitive Data Types

A particularly simple data type is a Boolean value, which can be either true or false.
In [3]: x = true
Out[3]: true
In [4]: typeof(x)
Out[4]: Bool
Out[5]: false
The two most common data types used to represent numbers are integers and floats.
(Computers distinguish between floats and integers because arithmetic is handled in a differ-
ent way)
In [6]: typeof(1.0)
Out[6]: Float64
In [7]: typeof(1)
Out[7]: Int64
If you’re running a 32 bit system you’ll still see Float64, but you will see Int32 instead of
Int64 (see the section on Integer types from the Julia manual).
Arithmetic operations are fairly standard.
In [8]: x = 2; y = 1.0;
The ; can be used to suppress output from a line of code, or to combine two lines of code
together (as above), but is otherwise not necessary.
In [9]: x * y
Out[9]: 2.0
In [10]: x^2
Out[10]: 4
In [11]: y / x
Out[11]: 0.5
Also, the * can be omitted for multiplication between a numeric literal and a variable.
In [12]: 2x - 3y
Out[12]: 1.0
A useful tool for displaying both expressions and code is to use the @show macro, which dis-
plays the text and the results.
In [13]: @show 2x - 3y
@show x + y;
2x - 3y = 1.0
x + y = 3.0
Here we have used ; to suppress the output on the last line, which otherwise returns the re-
sults of x + y.
Complex numbers are another primitive data type, with the imaginary part being specified by
im.
In [14]: x = 1 + 2im
Out[14]: 1 + 2im
In [15]: y = 1 - 2im
Out[15]: 1 - 2im
In [16]: x * y

Out[16]: 5 + 0im
There are several more primitive data types that we’ll introduce as necessary.
4.3.2 Strings
In [17]: x = "foobar"
Out[17]: "foobar"
In [18]: typeof(x)
Out[18]: String
In [19]: x = 10; y = 20
Out[19]: 20
With parentheses, you can splice the results of expressions into strings as well.
Out[22]: "foobar"
In [24]: split(s)
Out[27]: "foobar"
Julia can also find and replace using regular expressions (see regular expressions documenta-
tion for more info).
4.3.3 Containers
Out[29]: ("foo", 2)
In [31]: x = "foo", 1
Out[31]: ("foo", 1)
Out[32]: ("foo", 1)
In [33]: x = ("foo", 1)
Out[33]: ("foo", 1)
Tuples can be created with a hanging , – this is useful to create a tuple with one element.
Referencing Items
The last element of a sequence type can be accessed with the keyword end.
In [37]: x[end]
Out[37]: 40
In [38]: x[end-1]
Out[38]: 30
To access multiple elements of an array or tuple, you can use slice notation.
In [39]: x[1:3]
In [40]: x[2:end]
In [41]: "foobar"[3:end]
Out[41]: "obar"
Dictionaries
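A dictionary maps keys to values; for example (the values here are illustrative, consistent with the lookup below):

d = Dict("name" => "Frodo", "age" => 33)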
In [43]: d["age"]
Out[43]: 33
4.4 Iterating
One of the most important tasks in computing is stepping through a sequence of data and
performing a given action.
Julia provides neat and flexible tools for iteration as we now discuss.
4.4.1 Iterables
An iterable is something you can put on the right hand side of for and loop over.
These include sequence data types like arrays.
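For example, looping over an array:

actions = [1, 2, 3] # illustrative data
for x in actions
    print(x)
end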
123
In [47]: keys(d)
This makes sense, since the most common thing you want to do with keys is loop over them.
The benefit of providing an iterator rather than an array, say, is that the former is more
memory efficient.
Should you need to transform an iterator into an array you can always use collect().
In [48]: collect(keys(d))
You can loop over sequences without explicit indexing, which often leads to neater code.
For example compare
Out[49]: 1:5
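The two loops below, with and without explicit indexing, both print the squares (a sketch; the original cells were not preserved):

x_values = 1:5

# poor style
for i in 1:length(x_values)
    println(x_values[i] * x_values[i])
end

# better style
for x in x_values
    println(x * x)
end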
1
4
9
16
25
1
4
9
16
25
Julia provides some functional-style helper functions (similar to Python and R) to facilitate
looping without indices.
One is zip(), which is used for stepping through pairs from two sequences.
For example, try running the following code
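countries = ("Japan", "Korea", "China") # illustrative data
cities = ("Tokyo", "Seoul", "Beijing")
for (country, city) in zip(countries, cities)
    println("The capital of $country is $city")
end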
If we happen to need the index as well as the value, one option is to use enumerate().
The following snippet will give you the idea
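# continuing the example above
for (i, country) in enumerate(countries)
    city = cities[i]
    println("The capital of country $i is $city")
end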
4.4.3 Comprehensions
For example, a comprehension can create a multi-dimensional array:

[i + j + k for i in 1:3, j in 4:6, k in 7:9] # consistent with the 3-D output below

[:, :, 1] =
12 13 14
13 14 15
14 15 16

[:, :, 2] =
13 14 15
14 15 16
15 16 17

[:, :, 3] =
14 15 16
15 16 17
16 17 18
4.4.4 Generators
In [61]: xs = 1:10000
f(x) = x^2
f_x = f.(xs)
sum(f_x)
Out[61]: 333383335000
We could have created the temporary using a comprehension, or even done the comprehen-
sion within the sum function, but these all create temporary arrays.
sum(f_x2) = 333383335000
sum([f(x) for x = xs]) = 333383335000
Note that if you were to hand-code this, you would be able to calculate the sum by simply iterating to 10000, applying f to each number, and accumulating the results. No temporary vectors would be necessary.
A generator can emulate this behavior, leading to clear (and sometimes more efficient) code when used with any function that accepts iterators. All you need to do is drop the ] brackets.
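For example (assuming xs and f as above):

sum(f(x) for x in xs) # generator: no temporary array is allocated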
Out[63]: 333383335000
Notice that the first two cases are nearly identical, and allocate a temporary array, while the
final case using generators has no allocations.
In this example you may see a speedup of over 1000x. Whether using generators leads to code that is faster or slower depends on the circumstances, and you should (1) always profile rather than guess; and (2) worry about code clarity first, and performance second, if ever.
4.5.1 Comparisons
In [65]: x = 1
Out[65]: 1
In [66]: x == 2
Out[66]: false
In [67]: x != 3
Out[67]: true
In [68]: 1 + 1E-8 ≈ 1
Out[68]: true
Be careful when using this, however, as there are subtleties involving the scales of the quantities compared.
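For example, comparisons against 0 can surprise, since the default tolerance is relative (these cells are illustrative):

1E-300 ≈ 0 # false: a relative comparison fails at zero
isapprox(1E-300, 0, atol = 1E-250) # true: an explicit absolute tolerance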
Out[69]: false
Out[70]: true
Remember
• P && Q is true if both are true, otherwise it’s false.
• P || Q is false if both are false, otherwise it’s true.
In Julia, the return statement is optional, so that the following functions have identical be-
havior
function f1(a, b)
    return a * b
end

function f2(a, b)
    a * b
end
When no return statement is present, the last value obtained when executing the code block
is returned.
Although some prefer the second option, we often favor the former on the basis that explicit
is better than implicit.
A function can have arbitrarily many return statements, with execution terminating when
the first return is hit.
You can see this in action when experimenting with the following function
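A sketch of such a function:

function foo(x)
    if x > 0
        return "positive"
    end
    return "nonpositive" # only reached if the first return was not hit
end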
For short function definitions Julia offers some attractive simplified syntax.
First, when the function body is a simple expression, it can be defined without the
function keyword or end.
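For example:

f(x) = sin(1 / x)
f(1 / pi) # essentially zero, up to floating point error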
Out[74]: 1.2246467991473532e-16
In [75]: map(x -> sin(1 / x), randn(3)) # apply function to each element
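Julia also allows optional positional and keyword arguments with defaults; definitions consistent with the calls and outputs below (reconstructed, so the exact original may differ):

f(x, a = 1) = exp(cos(a * x)) # `a` is optional, defaulting to 1
f2(x; a = 1) = exp(cos(a * x)) # keyword-argument variant, called as f2(pi, a = 2)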
In [77]: f(pi)
Out[77]: 0.36787944117144233
In [78]: f(pi, 2)
Out[78]: 2.718281828459045
Out[79]: 2.718281828459045
4.7 Broadcasting
More generally, if f is any Julia function, then f. references the broadcasted version.
Conveniently, this applies to user-defined functions as well.
To illustrate, let’s write a function chisq such that chisq(k) returns a chi-squared random
variable with k degrees of freedom when k is an integer.
In doing this we’ll exploit the fact that, if we take k independent standard normals, square
them all and sum, we get a chi-squared with k degrees of freedom.
The macro @assert will check that the next expression evaluates to true, and will stop and
display an error otherwise.
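A sketch of chisq:

function chisq(k)
    @assert k > 0 # require a positive number of degrees of freedom
    z = randn(k) # k independent standard normals
    return sum(z -> z^2, z) # sum of squares
end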
In [83]: chisq(3)
Out[83]: 1.3960489970287002
Note that calls with integers less than 1 will trigger an assertion failure inside the function
body.
In [84]: chisq(-2)
AssertionError: k > 0
Stacktrace:
The broadcasting notation is not simply vectorization, as it is able to “fuse” multiple broad-
casts together to generate efficient code.
In [86]: x = 1.0:1.0:5.0
y = [2.0, 4.0, 5.0, 6.0, 8.0]
z = similar(y)
z .= x .+ y .- sin.(x) # generates efficient code instead of many temporaries
In [87]: @. z = x + y - sin(x)
Since the +, -, = operators are functions, behind the scenes this is broadcasting against
both the x and y vectors.
The compiler will fix anything which is a scalar, and otherwise iterate across every vector
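For example (with illustrative inputs):

f(x, y) = x + y # will be broadcast below
a = [1 2 3]
b = [4 5 6]
@show f.(a, b) # both arrays: applied elementwise
@show f.(a, 2); # the scalar 2 is fixed and reused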
f.(a, b) = [5 7 9]
f.(a, 2) = [3 4 5]
The compiler is only able to detect “scalar” values in this way for a limited number of types
(e.g. integers, floating points, etc) and some packages (e.g. Distributions).
For other types, you will need to wrap any scalars in Ref to fix them, or else it will try to
broadcast the value.
Another place that you may use a Ref is to fix a function parameter you do not want to
broadcast over.
Since global variables are usually a bad idea, we will concentrate on understanding the role of
good local scoping practice.
That said, while many of the variables in these Jupyter notebook are global, we have been
careful to write the code so that the entire code could be copied inside of a function.
When copied inside a function, variables become local and functions become closures.
Warning. For/while loops and global variables in Jupyter vs. the REPL:

• In the current version of Julia, there is a distinction between the use of scope in an interactive Jupyter environment.
• The description here of globals applies to Jupyter notebooks, and may also apply to the REPL and top-level scripts.
• In general, you should be creating functions when working with .jl files, and the distinction generally won't apply.
For more information on using globals outside of Jupyter, (see variable scoping documenta-
tion), though these rules are likely to become consistent in a future version.
4.8.1 Functions
The scope of a variable name determines where it is valid to refer to it, and how clashes be-
tween names can occur.
Think of the scope as a list of all of the name bindings of relevant variables.
Different scopes could contain the same name but be assigned to different things.
An obvious place to start is to notice that functions introduce their own local names.
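For example (a sketch):

f(x) = x^2 # `x` here is a name local to f
x = 5      # an unrelated global `x`
f(x)       # returns 25; the global binding is untouched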
Out[90]: 25
Out[91]: 25
Out[92]: 25
In [93]: f(x; y = 1) = x + y # `x` and `y` are names local to the `f` function
xval = 0.1
yval = 2
f(xval; y = yval)
Out[93]: 2.1
In [94]: f(x; y = 1) = x + y # `x` and `y` are names local to the `f` function
x = 0.1
y = 2
f(x; y = y) # left hand `y` is the local name of the argument in the function
Out[94]: 2.1
Similarly to named arguments, the local scope also works with named tuples.
x = 0.1
y = 2
# create a named tuple with names `x` and `y` local to the tuple, bound to the RHS `x` and `y`
(x = x, y = y)
Out[95]: (x = 0.1, y = 2)
As you use Julia, you will find that scoping is very natural and that there is no reason to
avoid using x and y in both places.
In fact, it frequently leads to clear code closer to the math when you don’t need to specify
intermediaries.
Another example is with broadcasting
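f(x) = x^2 # a sketch
x = 0.0:0.1:1.0
f.(x) # the `x` inside f is each element, unrelated to the global `x`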
4.8.2 Closures
Frequently, you will want to have a function that calculates a value given some fixed parame-
ters.
f(x, a) = a * x^2 # both the value and the parameter are passed in

f(1, 0.2)
Out[97]: 0.2
While the above was convenient, there are other times when you want to simply fix a variable
or refer to something already calculated.
In [98]: a = 0.2
f(x) = a * x^2 # refers to the `a` in the outer scope
f(1) # univariate function
Out[98]: 0.2
When the function f is parsed in Julia, it will look to see if any of the variables are already
defined in the current scope.
In this case, it finds the a since it was defined previously, whereas if the code defines a =
0.2 after the f(x) definition, it would fail.
This also works when embedded in other functions
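function g()
    a = 0.2
    f(x) = a * x^2 # refers to the `a` in the enclosing scope
    return f(1)
end

g()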
Out[99]: 0.2
Comparing the two: the key here is not that a is a global variable, but rather that the f
function is defined to capture a variable from an outer scope.
This is called a closure, and closures are used throughout the lectures.
It is generally bad practice to modify the captured variable in the function, but otherwise the
code becomes very clear.
One place where this can be helpful is in a string of dependent calculations.
For example, if you wanted to calculate a tuple (a, b, c) from 𝑎 = 𝑓(𝑥), 𝑏 = 𝑔(𝑎), 𝑐 = ℎ(𝑎, 𝑏) where 𝑓(𝑥) = 𝑥², 𝑔(𝑎) = 2𝑎, ℎ(𝑎, 𝑏) = 𝑎 + 𝑏
function solvemodel(x)
    a = x^2
    b = 2 * a
    c = a + b
    return (a = a, b = b, c = c) # a named tuple of the dependent calculations
end

solvemodel(0.1)
One of the benefits of working with closures and functions is that you can return them from
other functions.
This leads to some natural programming patterns we have already been using, where we can
use functions of functions and functions returning functions (or closures).
To see a simple example, consider functions that accept other functions (including closures)
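twice(f, x) = f(f(x)) # a sketch: applies the passed-in function two times
twice(x -> x + 1, 1.0) # returns 3.0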
This pattern has already been used extensively in our code and is key to keeping things like
interpolation, numerical integration, and plotting generic.
One example of using this in a library is Expectations.jl, where we can pass a function to the
expectation function.
using Expectations, Distributions

@show d = Exponential(2.0)
f(x) = x^2
@show expectation(f, d); # E(f(x))
d = Exponential(2.0) = Exponential{Float64}(θ=2.0)
expectation(f, d) = 8.00000000000004
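Functions can also return closures; a definition consistent with the usage below:

function multiplyit(a, g)
    return x -> a * g(x) # returns a closure capturing `a` and `g`
end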
f(x) = x^2
h = multiplyit(2.0, f) # use our quadratic, returns a new function which doubles the result
h(2) # returned function is like any other function
Out[103]: 8.0
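Similarly, a closure-returning function consistent with the plot below:

function snapabove(g, a)
    function f(x)
        if x > a # the parameter `a` is captured in the closure f
            return g(x)
        else
            return g(a)
        end
    end
    return f # closure with the embedded a
end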
f(x) = x^2
h = snapabove(f, 2.0)
using Plots
gr(fmt=:png);
plot(h, 0.0:0.1:3.0)
Out[104]:
4.8.4 Loops
The for and while loops also introduce a local scope, and you can roughly reason about
them the same way you would a function/closure.
In particular
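For example (a sketch; each loop's index is local to it):

for i in 1:2
    println(i)
end

for i in 1:2 # a separate local `i`
    println(i)
end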
1
2
1
2
On the other hand just as with closures, if a variable is already defined it will be available in
the inner scope.
Out[106]: 2
val = 1.0 # reconstructed example: halve `val` repeatedly inside a loop
for i in 1:9
    val = val / 2 # `val` was defined outside the loop, so it is visible here
    difference = val - 0.5^i # `difference` is local to the loop
end
@show val;
# @show difference fails, not in scope

val = 0.001953125
While we have argued against global variables as poor practice, you may have noticed that in
Jupyter notebooks we have been using them throughout.
Here, global variables are used in an interactive editor because they are convenient, and not
because they are essential to the design of functions.
A simple test of the difference is to take a segment of code and wrap it in a function, for ex-
ample
In [108]: x = 2.0
f(y) = x + y
z = f(4.0)
for i in 1:3
z += i
end
println("z = $z")
z = 12.0
Here, the x and z are global variables, the function f refers to the global variable x, and the
global variable z is modified in the for loop.
However, you can simply wrap the entire code in a function
function wrapped()
    x = 2.0
    f(y) = x + y
    z = f(4.0)
    for i in 1:3
z += i
end
println("z = $z")
end
wrapped()
z = 12.0
4.9 Exercises
4.9.1 Exercise 1
Part 1: Given two numeric arrays or tuples x_vals and y_vals of equal length, compute
their inner product using zip().
Part 2: Using a comprehension, count the number of even numbers between 0 and 99.
• Hint: iseven returns true for even numbers and false for odds.
Part 3: Using a comprehension, take pairs = ((2, 5), (4, 2), (9, 8), (12,
10)) and count the number of pairs (a, b) such that both a and b are even.
4.9.2 Exercise 2
$$p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n = \sum_{i=0}^{n} a_i x^i \tag{1}$$
Using enumerate() in your loop, write a function p such that p(x, coeff) computes the
value in (1) given a point x and an array of coefficients coeff.
4.9.3 Exercise 3
Write a function that takes a string as an argument and returns the number of capital letters
in the string.
Hint: uppercase("foo") returns "FOO".
4.9.4 Exercise 4
Write a function that takes two sequences seq_a and seq_b as arguments and returns true
if every element in seq_a is also an element of seq_b, else false.
• By “sequence” we mean an array, tuple or string.
4.9.5 Exercise 5
4.9.6 Exercise 6
Out[110]: 167
4.9.7 Exercise 7
1. Pass in a range instead of the a, b, and n. Test with a range such as nodes =
-1.0:0.5:1.0.
2. Instead of the while used in the solution to Exercise 5, find a better way to efficiently
bracket the x in the nodes.
4.10 Solutions
4.10.1 Exercise 1
Part 1 solution:
Here’s one possible solution
Out[111]: 6
Part 2 solution:
One solution is
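sum([iseven(n) ? 1 : 0 for n in 0:99]) # one possibility among several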
Out[112]: 50
Part 3 solution:
Here’s one possibility
In [113]: pairs = ((2, 5), (4, 2), (9, 8), (12, 10))
sum(xy -> all(iseven, xy), pairs)
Out[113]: 2
4.10.2 Exercise 2
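One solution using enumerate(), with a test call matching the output below:

p(x, coeff) = sum(a * x^(i-1) for (i, a) in enumerate(coeff))
p(1, (2, 4)) # 2 * 1^0 + 4 * 1^1 = 6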
Out[115]: 6
4.10.3 Exercise 3
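One approach (the helper name and test string are illustrative):

countcaps(s) = count(isuppercase, s)
countcaps("The Rain in Spain") # 3 capital letters: T, R, S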
Out[116]: 3
4.10.4 Exercise 4
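A one-line version consistent with the test below:

f_ex4(seq_a, seq_b) = all(a in seq_b for a in seq_a)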
# test
println(f_ex4([1, 2], [1, 2, 3]))
println(f_ex4([1, 2, 3], [1, 2]))
true
false
Alternatively, for arrays you can use the built-in issubset

println(issubset([1, 2], [1, 2, 3]))
println(issubset([1, 2, 3], [1, 2]))

true
false
4.10.5 Exercise 5
function linapprox(f, a, b, n, x)
    # evaluates the piecewise linear interpolant of f at x,
    # on the interval [a, b], with n evenly spaced grid points
    # (wrapper reconstructed around the surviving interior lines)
    length_of_interval = b - a
    num_subintervals = n - 1
    step = length_of_interval / num_subintervals
    # find the first grid point strictly greater than x
    point = a
    while point <= x
        point += step
    end
    # x must lie between the grid points (point - step) and point
    u, v = point - step, point
    return f(u) + (x - u) * (f(v) - f(u)) / (v - u)
end
Let’s test it
Out[121]:
4.10.6 Exercise 6
Chapter 5

Arrays, Tuples, Ranges, and Other Fundamental Types

5.1 Contents
• Overview 5.2
• Array Basics 5.3
• Operations on Arrays 5.4
• Ranges 5.5
• Tuples and Named Tuples 5.6
• Nothing, Missing, and Unions 5.7
• Exercises 5.8
• Solutions 5.9
“Let’s be clear: the work of science has nothing whatever to do with consensus.
Consensus is the business of politics. Science, on the contrary, requires only one
investigator who happens to be right, which means that he or she has results that
are verifiable by reference to the real world. In science consensus is irrelevant.
What is relevant is reproducible results.” – Michael Crichton
5.2 Overview
In Julia, arrays and tuples are the most important data type for working with numerical
data.
In this lecture we give more details on
• creating and manipulating Julia arrays
• fundamental array processing operations
• basic matrix algebra
• tuples and named tuples
• ranges
• nothing, missing, and unions
5.2.1 Setup
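For example, creating two arrays (illustrative values):

a = [10, 20, 30]
b = [1.0, 2.0, 3.0]
@show typeof(a)
@show typeof(b);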
The output tells us that the arrays are of types Array{Int64,1} and Array{Float64,1}
respectively.
Here Int64 and Float64 are types for the elements inferred by the compiler.
We’ll talk more about types later.
The 1 in Array{Int64,1} and Array{Any,1} indicates that the array is one dimensional
(i.e., a Vector).
This is the default for many Julia functions that create arrays
In [5]: typeof(randn(100))
Out[5]: Array{Float64,1}
In Julia, one dimensional vectors are best interpreted as column vectors, which we will see
when we take transposes.
We can check the dimensions of a using size() and ndims() functions
In [6]: ndims(a)
Out[6]: 1
In [7]: size(a)
Out[7]: (3,)
The syntax (3,) displays a tuple containing one element – the size along the one dimension
that exists.
In Julia, Vector and Matrix are just aliases for one- and two-dimensional arrays respectively
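That is:

@show Vector{Float64} == Array{Float64, 1}
@show Matrix{Float64} == Array{Float64, 2};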
Out[8]: true
Out[9]: true
We’ve already seen some functions for creating a vector filled with 0.0
In [11]: zeros(3)
In [12]: zeros(2, 2)
In [13]: fill(5.0, 2, 2)
Finally, you can create an empty array using the Array() constructor
In [14]: x = Array{Float64}(undef, 2, 2)
The printed values you see here are just garbage values.
(the existing contents of the allocated memory slots being interpreted as 64 bit floats)
If you need more control over the types, fill with a non-floating point
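For example:

fill(0, 2, 2) # fills a 2x2 array with the integer 0 rather than 0.0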
For the most part, we will avoid directly specifying the types of arrays, and let the compiler
deduce the optimal types on its own.
The reasons for this, discussed in more detail in this lecture, are to ensure both clarity and
generality.
One place this can be inconvenient is when we need to create an array based on an existing
array.
First, note that assignment in Julia binds a name to a value, but does not make a copy of
that type
In [17]: x = [1, 2, 3]
y = x
y[1] = 2
x
In the above, y = x simply creates a new named binding called y which refers to whatever x
currently binds to.
To copy the data, you need to be more explicit
In [18]: x = [1, 2, 3]
y = copy(x)
y[1] = 2
x
However, rather than making a copy of x, you may want to just have a similarly sized array
In [19]: x = [1, 2, 3]
y = similar(x)
y
We can also use similar to pre-allocate a vector with a different size, but the same shape
In [20]: x = [1, 2, 3]
y = similar(x, 4) # make a vector of length 4
In [21]: x = [1, 2, 3]
y = similar(x, 2, 2) # make a 2x2 matrix
As we’ve seen, you can create one dimensional arrays from manually specified data like so
In [24]: ndims(a)
Out[24]: 2
You might then assume that a = [10; 20; 30; 40] creates a two dimensional column
vector but this isn’t the case.
In [27]: ndims(a)
Out[27]: 1
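Instead, a two-dimensional column can be produced by transposing a row matrix; for example (reconstructed):

a = [10 20 30 40]' # 4 x 1, two dimensional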
In [29]: ndims(a)
Out[29]: 2
Out[30]: 30
In [31]: a[1:3]
In [32]: a = randn(2, 2)
a[1, 1]
Out[32]: -0.6728149604129142
In [35]: a = randn(2, 2)

In [36]: b = [true false; false true] # an illustrative boolean array

In [37]: a[b]
In [38]: a = zeros(4)
In [39]: a[2:end] .= 42
In [40]: a
Using the : notation provides a slice of an array, copying the sub-array to a new array with a
similar type.
In [41]: a = [1 2; 3 4]
b = a[:, 2]
@show b
a[:, 2] = [4, 5] # modify a
@show a
@show b;
b = [2, 4]
a = [1 4; 3 5]
b = [2, 4]
In [42]: a = [1 2; 3 4]
@views b = a[:, 2]
@show b
a[:, 2] = [4, 5]
@show a
@show b;
b = [2, 4]
a = [1 4; 3 5]
b = [4, 5]
Note that the only difference is the @views macro, which will replace any slices with views in
the expression.
An alternative is to call the view function directly – though it is generally discouraged since
it is a step away from the math.
Out[43]: true
As with most programming in Julia, it is best to avoid prematurely assuming that @views
will have a significant impact on performance, and stress code clarity above all else.
Another important lesson about @views is that they are not normal, dense arrays.
In [44]: a = [1 2; 3 4]
b_slice = a[:, 2]
@show typeof(b_slice)
@show typeof(a)
@views b = a[:, 2]
@show typeof(b);
typeof(b_slice) = Array{Int64,1}
typeof(a) = Array{Int64,2}
typeof(b) =
SubArray{Int64,1,Array{Int64,2},Tuple{Base.Slice{Base.OneTo{Int64}},Int64},true}
The type of b is a good example of how types are not as they may seem.
Similarly
In [45]: a = [1 2; 3 4]
b = a' # transpose
typeof(b)
Out[45]: Adjoint{Int64,Array{Int64,2}}
In [46]: a = [1 2; 3 4]
b = a' # transpose
c = Matrix(b) # convert to matrix
d = collect(b) # also `collect` works on any iterable
c == d
Out[46]: true
As we saw with transpose, sometimes types that look like matrices are not stored as a
dense array.
As an example, consider creating a diagonal matrix
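using LinearAlgebra

d = [1.0, 2.0] # illustrative values
a = Diagonal(d) # a specialized diagonal type, not a dense array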
In [48]: @show 2a
b = rand(2,2)
@show b * a;
While the implementation of I is a little abstract to go into at this point, a hint is:
In [51]: typeof(I)
Out[51]: UniformScaling{Bool}
This is a UniformScaling type rather than an identity matrix, making it much more pow-
erful and general.
As discussed above, in Julia, the left hand side of an assignment is a “binding” or a label to a
value.
In [52]: x = [1 2 3]
y = x # name `y` binds to whatever value `x` bound to
In [53]: x = [1 2 3]
y = x # name `y` binds to whatever `x` bound to
z = [2 3 4]
y = z # only changes name binding, not value!
@show (x, y, z);
What this means is that if a is an array and we set b = a then a and b point to exactly the
same data.
In the above, suppose you had meant to change the value of x to the values of y, you need to
assign the values rather than the name.
In [54]: x = [1 2 3]
y = x # name `y` binds to whatever `x` bound to
z = [2 3 4]
y .= z # now dispatches the assignment of each element
@show (x, y, z);
f(x) = [1 2; 3 4] * x # an "out-of-place" function returning a new value

val = [1, 2]
f(val)
In general, these “out-of-place” functions are preferred to “in-place” functions, which modify
the arguments.
val = [1, 2]
y = similar(val)
function f!(out, x)
out .= [1 2; 3 4] * x
end
f!(y, val)
y
This demonstrates a key convention in Julia: functions which modify any of the arguments
have the name ending with ! (e.g. push!).
We can also see a common mistake, where instead of modifying the arguments, the name
binding is swapped
val = [1, 2]
y = similar(val)
function f!(out, x)
out = [1 2; 3 4] * x # MISTAKE! Should be .= or [:]
end
f!(y, val)
y
The frequency of making this mistake is one of the reasons to avoid in-place functions, unless
proven to be necessary by benchmarking.
In [58]: y = [1 2]
y .-= 2 # y .= y .- 2, no problem
x = 5
# x .-= 2 # Fails!
x = x - 2 # subtle difference - creates a new value and rebinds the�
↪variable
Out[58]: 3
In particular, there is no way to pass any immutable into a function and have it modified
In [59]: x = 2
function f(x)
    x = 3 # MISTAKE! does not modify x, creates a new value!
end
f(x) # cannot modify the immutable argument
@show x;

x = 2
This is also true for other immutable types such as tuples, as well as some vector types

using StaticArrays
xdynamic = [1, 2]
xstatic = @SVector [1, 2] # immutable

f(x) = 2x
@show f(xdynamic)
@show f(xstatic)
# inplace version
function g(x)
x .= 2x
return "Success!"
end
@show xdynamic
@show g(xdynamic)
@show xdynamic;
f(xdynamic) = [2, 4]
f(xstatic) = [2, 4]
xdynamic = [1, 2]
g(xdynamic) = "Success!"
xdynamic = [2, 4]
Julia provides standard functions for acting on arrays, some of which we’ve already seen
In [61]: using Statistics # for mean, std, and var

a = [-1, 0, 1]
@show length(a)
@show sum(a)
@show mean(a)
@show std(a) # standard deviation
@show var(a) # variance
@show maximum(a)
@show minimum(a)
@show extrema(a) # (mimimum(a), maximum(a))
length(a) = 3
sum(a) = 0
mean(a) = 0.0
std(a) = 1.0
var(a) = 1.0
maximum(a) = 1
minimum(a) = -1
extrema(a) = (-1, 1)
Out[61]: (-1, 1)
To sort an array
In [62]: b = sort(a, rev = true) # returns new array, original not modified

In [63]: b == a # tests if they have the same values
Out[63]: false

In [64]: b = sort!(a, rev = true) # returns *modified original* array
         b == a # now the same values
Out[64]: true

In [65]: b === a # tests if arrays are identical (i.e share same memory)
Out[65]: true
In [66]: a = ones(1, 2)
In [67]: b = ones(2, 2)
In [68]: a * b
In [69]: b * a'
In [70]: A = [1 2; 2 3]
In [71]: B = ones(2, 2)
In [72]: A \ B
In [73]: inv(A) * B
Although the last two operations give the same result, the first one is numerically more stable
and should be preferred in most cases.
Multiplying two one dimensional vectors gives an error – which is reasonable since the mean-
ing is ambiguous.
More precisely, the error is that there isn’t an implementation of * for two one dimensional
vectors.
The output explains this, and lists some other methods of * which Julia thinks are close to
what we want.
Out[75]: 2.0
Alternatively, for an inner product in this setting use dot() or the unicode operator ⋅, typed as \cdot<TAB>
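For instance, a minimal check (the values are illustrative):

using LinearAlgebra
a = [1, 2, 3]
b = [4, 5, 6]
@show dot(a, b)   # 32
@show a ⋅ b;      # identical, with ⋅ typed as \cdot<TAB>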
Matrix multiplication using one dimensional vectors similarly follows from treating them as
column vectors. Post-multiplication requires a transpose
In [77]: b = ones(2, 2)
b * ones(2)
In [78]: ones(2)' * b
Note that the type of the returned value in this case is not Array{Float64,1} but rather Adjoint{Float64,Array{Float64,1}}.
This is because left multiplication by a row vector should also yield a row vector. It also hints that the types in Julia are more complicated than the surface notation suggests, as we will explore further in the introduction to types lecture.
Algebraic Operations
Suppose that we wish to multiply every element of matrix A with the corresponding element
of matrix B.
In that case we need to replace * (matrix multiplication) with .* (elementwise multiplica-
tion).
For example, compare
In [81]: A = -ones(2, 2)
However in practice some operations are mathematically valid without broadcasting, and
hence the . can be omitted.
In [84]: A = ones(2, 2)
In [85]: 2 * A # same as 2 .* A
In fact you can omit the * altogether and just write 2A.
Unlike MATLAB and other languages, scalar addition requires the .+ in order to correctly
broadcast
In [86]: x = [1, 2]
x .+ 1 # not x + 1
x .- 1 # not x - 1
Elementwise Comparisons
In [89]: b .> a
In [90]: a .== b
In [91]: b
In [92]: b .> 1
This is particularly useful for conditional extraction – extracting the elements of an array that
satisfy a condition
In [93]: a = randn(4)
In [94]: a .< 0

In [95]: a[a .< 0]   # extract the elements satisfying the condition
Changing Dimensions
In [97]: b = reshape(a, 2, 2)
In [98]: b
In [99]: b[1, 1] = 100   # mutating b also changes a, since `reshape` shares memory

Out[99]: 100
In [100]: b
In [101]: a
Julia provides standard mathematical functions such as log, exp, sin, etc.
In [104]: log(1.0)
Out[104]: 0.0
In [105]: log.(1:4)
Note that we can get the same result with a comprehension or more explicit loop

In [106]: [log(x) for x in 1:4]
In [107]: A = [1 2; 3 4]
In [108]: det(A)
Out[108]: -2.0
In [109]: tr(A)
Out[109]: 5
In [110]: eigvals(A)
In [111]: rank(A)
Out[111]: 2
5.5 Ranges
Ranges can also be created with floating point numbers using the same notation

In [113]: a = 0.0:0.1:1.0

Out[113]: 0.0:0.1:1.0

But care should be taken if the endpoint is not a multiple of the step size

In [114]: maxval = 1.0
          minval = 0.0
          stepsize = 0.15
          a = minval:stepsize:maxval
          maximum(a) == maxval   # the last element is 0.9, not 1.0

Out[114]: false

To evenly space points where the maximum value is important, i.e., linspace in other languages

In [115]: a = range(minval, maxval, length = 10)
          maximum(a) == maxval

Out[115]: true
Julia also supports named tuples, which extend tuples with names for each argument.
Named tuples are a convenient and high-performance way to manage and unpack sets of parameters
In [119]: function f(parameters)
              α, β = parameters.α, parameters.β   # reconstructed: manual unpacking, error-prone style
              return α + β
          end

          parameters = (α = 0.1, β = 0.2)
          f(parameters)

Out[119]: 0.30000000000000004
This functionality is aided by the Parameters.jl package and the @unpack macro

In [120]: using Parameters

          function f(parameters)
              @unpack α, β = parameters   # good style, less sensitive to errors
              return α + β
          end
          f(parameters)

Out[120]: 0.30000000000000004
An alternative approach, defining a new type using struct tends to be more prone to acci-
dental misuse, and leads to a great deal of boilerplate code.
For that, and other reasons of generality, we will use named tuples for collections of parame-
ters where possible.
Sometimes a variable, return type from a function, or value in an array needs to represent the
absence of a value rather than a particular value.
There are two distinct use cases for this
1. nothing (the “software engineer's null”): used where no value makes sense in a particular context due to a failure in the code, a function parameter not passed in, etc.
2. missing (the “data scientist's null”): used when a value would make conceptual sense, but it isn't available.
In [122]: typeof(nothing)
Out[122]: Nothing
@show f(1.0)
@show f(-1.0);
x = 1.0
f(1.0) = 1.0
x was not set
f(-1.0) = nothing
While in general you want to keep a variable name bound to a single type in Julia, this is a
notable exception.
Similarly, if needed, you can return a nothing from a function to indicate that it did not
calculate as expected.
end
end
x1 = 1.0
x2 = -1.0
y1 = f(x1)
y2 = f(x2)
f(-1.0) failed
As an aside, an equivalent way to write the above function is to use the ternary operator,
which gives a compact if/then/else structure
In [125]: f(x) = x > 0.0 ? sqrt(x) : nothing   # reconstructed ternary version
          f(1.0)
Out[125]: 1.0
We will sometimes use this form when it makes the code more clear (and it will occasionally
make the code higher performance).
Regardless of how f(x) is written, the return type is an example of a union, where the result
could be one of an explicit set of types.
In this particular case, the compiler would deduce that the type would be a
Union{Nothing,Float64} – that is, it returns either a floating point or a nothing.
You will see this type directly if you use an array containing both types
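A minimal sketch of such an array (the values are illustrative):

x = [1.0, nothing]
@show typeof(x)   # Array{Union{Nothing, Float64},1}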
When considering error handling, whether you want a function to return nothing or simply
fail depends on whether the code calling f(x) is carefully checking the results.
For example, if you were calling the function on an array of parameters where a priori you were not sure which ones will succeed, then returning nothing and filtering afterwards is a convenient pattern, as in the sketch below.
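A hedged sketch of that pattern (the function and values are hypothetical):

f(x) = x > 0.0 ? sqrt(x) : nothing   # signal failure with `nothing`
xs = [0.1, -1.0, 2.0, -2.0]
ys = f.(xs)                          # some results are `nothing`
valid = [y for y in ys if !isnothing(y)]   # keep only the successes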
On the other hand, if the parameter passed is invalid and you would prefer not to handle a
graceful failure, then using an assertion is more appropriate.
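A sketch of the assertion style (the condition is illustrative):

function f(x)
    @assert x > 0.0   # invalid input fails loudly with an AssertionError
    sqrt(x)
end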
f(1.0)
Out[128]: 1.0
function f(x; z = nothing)   # reconstructed header: `z` is a keyword argument defaulting to `nothing`
    if isnothing(z)
println("No z given with $x")
else
println("z = $z given with $x")
end
end
f(1.0)
f(1.0, z=3.0)
An alternative to nothing, which can be useful and sometimes higher performance, is to use NaN to signal that a value is invalid when returning from a function.

In [130]: f(x) = x > 0.0 ? sqrt(x) : NaN   # reconstructed: NaN signals an invalid value

          f(0.1)
          f(-1.0)
          @show typeof(f(-1.0))
          @show f(-1.0) == NaN # note, this fails!
          @show isnan(f(-1.0)) # check with this
typeof(f(-1.0)) = Float64
f(-1.0) == NaN = false
isnan(f(-1.0)) = true
Out[130]: true
Note that in this case, for Float64 input the return type is Float64 regardless of whether the value is valid.
Keep in mind, though, that this only works if the return type of a function is Float64.
5.7.2 Exceptions
Out[131]: DomainError(-1.0, "sqrt will only return a complex result if called with a complex argument. Try sqrt(Complex(x)).")
Another example you will see is when the compiler cannot convert between types.
If these exceptions are generated from unexpected cases in your code, it may be appropriate simply to let them occur and ensure you can read the error.
Occasionally you will want to catch these errors and try to recover, as we did above in the
try block.
f(0.0)
f(-1.0)
5.7.3 Missing
A key feature of missing is that it propagates through other function calls - unlike
nothing
The purpose of this is to ensure that failures do not silently fail and provide meaningless nu-
merical results.
This even applies to the comparison of values, which returns missing rather than a Boolean
In [136]: x = missing
@show x == missing
@show x === missing # an exception: `===` does not propagate
@show ismissing(x);
x == missing = missing
x === missing = true
ismissing(x) = true
x = [1.0, missing, 2.0, missing, missing, 5.0]   # reconstructed to match the outputs below
@show mean(x)
@show mean(skipmissing(x))
@show coalesce.(x, 0.0); # replace missing with 0.0;
mean(x) = missing
mean(skipmissing(x)) = 2.6666666666666665
coalesce.(x, 0.0) = [1.0, 0.0, 2.0, 0.0, 0.0, 5.0]
As missing is similar to R’s NA type, we will see more of missing when we cover
DataFrames.
5.8 Exercises
5.8.1 Exercise 1
This exercise uses matrix operations that arise in certain problems, including when dealing
with linear stochastic difference equations.
If you aren’t familiar with all the terminology don’t be concerned – you can skim read the
background discussion and focus purely on the matrix exercise.
With that said, consider the stochastic difference equation

    𝑋_{𝑡+1} = 𝐴𝑋_𝑡 + 𝑏 + Σ𝑊_{𝑡+1}     (1)

Here
• 𝑋𝑡 , 𝑏 and 𝑋𝑡+1 are 𝑛 × 1
• 𝐴 is 𝑛 × 𝑛
• Σ is 𝑛 × 𝑘
• 𝑊𝑡 is 𝑘 × 1 and {𝑊𝑡 } is iid with zero mean and variance-covariance matrix equal to the
identity matrix
Letting 𝑆_𝑡 denote the variance of 𝑋_𝑡, it can be shown that, provided all eigenvalues of 𝐴 lie within the unit circle, the sequence {𝑆_𝑡} generated by

    𝑆_{𝑡+1} = 𝐴𝑆_𝑡𝐴′ + ΣΣ′     (2)

converges to a unique limit 𝑆.
This is the unconditional variance or asymptotic variance of the stochastic difference equation.
As an exercise, try writing a simple function that solves for the limit 𝑆 by iterating on (2) given 𝐴 and Σ.
To test your solution, observe that the limit 𝑆 is a solution to the matrix equation

    𝑆 = 𝐴𝑆𝐴′ + ΣΣ′
5.8.2 Exercise 2

Take a stochastic process for {𝑦_𝑡}

    𝑦_{𝑡+1} = 𝛾 + 𝜃𝑦_𝑡 + 𝜎𝑤_{𝑡+1}

where
• 𝑤𝑡+1 is distributed Normal(0,1)
• 𝛾 = 1, 𝜎 = 1, 𝑦0 = 0
• 𝜃 ∈ Θ ≡ {0.8, 0.9, 0.98}
Given these parameters
• Simulate a single 𝑦𝑡 series for each 𝜃 ∈ Θ for 𝑇 = 150. Feel free to experiment with
different 𝑇 .
• Overlay plots of the rolling mean of the process for each 𝜃 ∈ Θ, i.e. for each 1 ≤ 𝜏 ≤ 𝑇
plot
    (1/𝜏) ∑_{𝑡=1}^{𝜏} 𝑦_𝑡
• Simulate 𝑁 = 200 paths of the stochastic process above up to 𝑇, for each 𝜃 ∈ Θ, where we refer to an element of a particular simulation as 𝑦_𝑡^𝑛.
• Overlay histograms of the stationary distribution of the final 𝑦_𝑇^𝑛 for each 𝜃 ∈ Θ.
Hint: pass alpha to a plot to make it transparent (e.g. histogram(vals, alpha =
0.5)) or use stephist(vals) to show just the step function for the histogram.
• Numerically find the mean and variance of this as an ensemble average, i.e.

    (1/𝑁) ∑_{𝑛=1}^{𝑁} 𝑦_𝑇^𝑛   and   (1/𝑁) ∑_{𝑛=1}^{𝑁} (𝑦_𝑇^𝑛)² − ((1/𝑁) ∑_{𝑛=1}^{𝑁} 𝑦_𝑇^𝑛)²
5.8.3 Exercise 3
Consider a model of the form

    𝑦 = 𝑎𝑥_1 + 𝑏𝑥_1² + 𝑐𝑥_2 + 𝑑 + 𝜎𝑤

where 𝑦, 𝑥_1, 𝑥_2 are scalar observables, 𝑎, 𝑏, 𝑐, 𝑑 are parameters to estimate, and 𝑤 is iid normal with mean 0 and variance 1.
First, let’s simulate data we can use to estimate the parameters
• Draw 𝑁 = 50 values for 𝑥1 , 𝑥2 from iid normal distributions.
Then, simulate with different 𝑤
• Draw a 𝑤 vector for the 𝑁 values and then 𝑦 from this simulated data if the parameters were 𝑎 = 0.1, 𝑏 = 0.2, 𝑐 = 0.5, 𝑑 = 1.0, 𝜎 = 0.1.
• Repeat that so you have 𝑀 = 20 different simulations of the 𝑦 for the 𝑁 values.
Finally, calculate ordinary least squares (OLS) manually (i.e., put the observables into matrices and vectors, and directly use the equations for OLS rather than a package).
• For each of the M=20 simulations, calculate the OLS estimates for 𝑎, 𝑏, 𝑐, 𝑑, 𝜎.
• Plot a histogram of these estimates for each variable.
5.8.4 Exercise 4
Redo Exercise 1 using the fixedpoint function from NLsolve (see this lecture).
Compare the number of iterations of NLsolve's Anderson acceleration to the handcoded iteration used in Exercise 1.
Hint: Convert the matrix to a vector to use fixedpoint. e.g. A = [1 2; 3 4] then x =
reshape(A, 4) turns it into a vector. To reverse, reshape(x, 2, 2).
5.9 Solutions
5.9.1 Exercise 1
In [139]: # function header reconstructed from the body below
          function compute_asymptotic_var(A, Σ;
                                          S0 = Σ * Σ',
                                          tolerance = 1e-6,
                                          maxiter = 500)
              V = Σ * Σ'
              S = S0
err = tolerance + 1
i = 1
while err > tolerance && i ≤ maxiter
next_S = A * S * A' + V
err = norm(S - next_S)
S = next_S
i += 1
end
return S
end
In [140]: A = [0.8 -0.2;
               -0.1 0.7]   # reconstructed; eigenvalues are 0.9 and 0.6
          Σ = [0.5 0.4;
               0.4 0.6]
          maximum(abs, eigvals(A))
Out[140]: 0.9
Out[143]: 3.883245447999784e-6
Chapter 6

Introduction to Types and Generic Programming
6.1 Contents
• Overview 6.2
• Finding and Interpreting Types 6.3
• The Type Hierarchy 6.4
• Deducing and Declaring Types 6.5
• Creating New Types 6.6
• Introduction to Multiple Dispatch 6.7
• Exercises 6.8
6.2 Overview
In Julia, arrays and tuples are the most important data types for working with numerical data.
In this lecture we give more details on
• declaring types
• abstract types
• motivation for generic programming
• multiple dispatch
• building user-defined types
6.2.1 Setup
As we have seen in the previous lectures, in Julia all values have a type, which can be queried
using the typeof function
typeof(1) = Int64
typeof(1.0) = Float64
The hard-coded values 1 and 1.0 are called literals in a programming language, and the
compiler deduces their types (Int64 and Float64 respectively in the example above).
You can also query the type of a value
In [4]: x = 1
typeof(x)
Out[4]: Int64
We will learn more details about generic programming later, but the key is to interpret the
curly brackets as swappable parameters for a given type.
For example, Array{Float64, 2} can be read as
1. Array is a parametric type representing a dense array, where the first parameter is the
type stored, and the second is the number of dimensions.
2. Float64 is a concrete type declaring that the data stored will be a particular size of
floating point.
A concrete type is one where values can be created by the compiler (equivalently, one which
can be the result of typeof(x) for some object x).
Values of a parametric type cannot be concretely constructed unless all of the parameters
are given (themselves with concrete types).
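For example, the isconcretetype function makes this distinction explicit (the types here are illustrative):

@show isconcretetype(Array{Float64, 2})   # true: all parameters given
@show isconcretetype(Array{Float64});     # false: the number of dimensions is missing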
In the case of Complex{Float64}

1. Complex is a parametric type representing a complex number.
2. Float64 is a concrete type declaring what the real and imaginary parts of the value should store.
typeof(x) = Tuple{Int64,Float64,String}
Out[6]: Tuple{Int64,Float64,String}
In this case, Tuple is the parametric type, and the three parameters are a list of the types of
each value.
For a named tuple
The parametric NamedTuple type contains two parameters: first a list of names for each
field of the tuple, and second the underlying Tuple type to store the values.
Anytime a value is prefixed by a colon, as in the :a above, the type is Symbol – a special
kind of string used by the compiler.
In [8]: typeof(:a)
Out[8]: Symbol
Remark: Note that, by convention, type names use CamelCase – Array, AbstractArray,
etc.
Since variables and functions are lower case by convention, this can be used to easily identify
types when reading code and output.
After assigning a variable name to a value, we can query the type of the value via the name.
In [9]: x = 42
@show typeof(x);
typeof(x) = Int64
In [10]: x = 42.0
Out[10]: 42.0
In [11]: typeof(x)
Out[11]: Float64
However, beyond a few notable exceptions (e.g. nothing used for error handling), changing
types is usually a symptom of poorly organized code, and makes type inference more difficult
for the compiler.
In the above, both Float64 and Int64 are subtypes of Real, whereas the Complex num-
bers are not.
They are, however, all subtypes of Number
Out[14]: true
In particular, the type tree is organized with Any at the top and the concrete types at the
bottom.
We never actually see instances of abstract types (i.e., typeof(x) never returns an abstract
type).
The point of abstract types is to categorize the concrete types, as well as other abstract types
that sit below them in the hierarchy.
There are some further functions to help you explore the type hierarchy, such as
show_supertypes which walks up the tree of types to Any for a given type.
In [15]: using Base: show_supertypes # import the function from the `Base` package
show_supertypes(Int64)
Int64 <: Signed <: Integer <: Real <: Number <: Any
And the subtypes function gives a list of the available subtypes for any packages or code currently loaded
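For example (the exact list depends on which packages are loaded):

@show subtypes(Number)
@show subtypes(AbstractFloat);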
We will discuss this in detail in generic programming, but much of Julia’s performance gains
and generality of notation comes from its type system.
For example
In [17]: x1 = [1, 2, 3]
x2 = [1.0, 2.0, 3.0]
@show typeof(x1)
@show typeof(x2)
typeof(x1) = Array{Int64,1}
typeof(x2) = Array{Float64,1}
Out[17]: Array{Float64,1}
x = [1, 2, 3]
z = f(x) # call with an integer array - compiler deduces type
In order to keep many of the benefits of Julia, you will sometimes want to ensure the com-
piler can always deduce a single type from any function or expression.
An example of bad practice is to use an array to hold unrelated types
The type of this array is Array{Any,1}, where Any means the compiler has determined
that any valid Julia type can be added to the array.
While occasionally useful, this is to be avoided whenever possible in performance sensitive
code.
The other place this can come up is in the declaration of functions.
As an example, consider a function which returns different types depending on the arguments.
f(x) = x > 0 ? 1.0 : 0   # reconstructed: the branches return different types

@show f(1)
@show f(-1);

f(1) = 1.0
f(-1) = 0
The issue here is relatively subtle: 1.0 is a floating point, while 0 is an integer.
Consequently, given the type of x, the compiler cannot in general determine what type the
function will return.
This issue, called type stability, is at the heart of most Julia performance considerations.
Luckily, trying to ensure that functions return the same types is also generally consistent with
simple, clear code.
It is also in contrast to some of the sample code you will see in other Julia sources, which you
will need to be able to read.
To give an example of the declaration of types, the following are equivalent
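As a sketch of such a pair (the exact functions are illustrative):

function f(x, A)
    b = [5.0, 6.0]
    return A * x .+ b
end

function f2(x::Vector{Float64}, A::Matrix{Float64})::Vector{Float64}
    b::Vector{Float64} = [5.0, 6.0]
    return A * x .+ b
end

val = f([0.1, 2.0], [1.0 2.0; 3.0 4.0])
val2 = f2([0.1, 2.0], [1.0 2.0; 3.0 4.0])   # identical result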
While declaring the types may be verbose, would it ever generate faster code?
The answer is almost never.
Furthermore, it can lead to confusion and inefficiencies since many things that behave like
vectors and matrices are not Matrix{Float64} and Vector{Float64}.
Here, the first line works and the second line fails
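Continuing the sketch above (the Adjoint example is illustrative):

x = [0.1, 2.0]
A = [1.0 2.0; 3.0 4.0]
f(x, A')     # works: `A'` is an Adjoint that behaves like a matrix
# f2(x, A')  # fails: an Adjoint is not a Matrix{Float64}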
Up until now, we have used NamedTuple to collect sets of parameters for our models and
examples.
These are useful for maintaining values for model parameters, but you will eventually need to
be able to use code that creates its own types.
In either case, the compiler generates a function to create new values of the data type, called a “constructor”.
It has the same name as the data type but uses function call notation

struct Foo
    a::Float64
    b::Int64
    c::Vector{Float64}
end

foo = Foo(2.0, 3, [1.0, 2.0, 3.0])   # calls the constructor

@show typeof(foo)
@show foo.a # get the value for a field
@show foo.b
@show foo.c;
typeof(foo) = Foo
foo.a = 2.0
foo.b = 3
foo.c = [1.0, 2.0, 3.0]
You will notice two differences above for the creation of a struct compared to our use of
NamedTuple.
• Types are declared for the fields, rather than inferred by the compiler.
• The construction of a new instance has no named parameters to prevent accidental mis-
use if the wrong order is chosen.
Was it necessary to manually declare the types a::Float64 in the above struct?
The answer, in practice, is usually yes.
Without a declaration of the type, the compiler is unable to generate efficient code, and the
use of a struct declared without types could drop performance by orders of magnitude.
Moreover, it is very easy to use the wrong type, or unnecessarily constrain the types.
The first example, which is usually just as low-performance as no declaration of types at all,
is to accidentally declare it with an abstract type
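A sketch of that anti-pattern (the type name is hypothetical):

struct Foo2
    a::Real   # BAD! `Real` is abstract, so the field cannot be stored efficiently
end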
The second issue is that by choosing a type (as in the Foo above), you may be unnecessarily
constraining what is allowed
f(foo) = 11.0
f(foo_nt) = 11.0
Motivated by the above, we can create a type which can adapt to holding fields of different types

In [29]: struct Foo3{T1, T2, T3}
             a::T1   # the types are deduced from the constructor call
             b::T2
             c::T3
         end

         # works fine
         a = 2
         b = 3
         c = [1.0, 2.0, 3.0]'   # transpose is not a `Vector` but `f()` would work
         foo = Foo3(a, b, c)

         @show typeof(foo)
         f(foo)
@show typeof(foo)
f(foo)
typeof(foo) = Foo3{Int64,Int64,Adjoint{Float64,Array{Float64,1}}}
Out[29]: 11.0
Of course, this is probably too flexible, and the f function might not work on an arbitrary set
of a, b, c.
You could constrain the types based on the abstract parent type using the <: operator
typeof(foo) = Foo4{Int64,Int64,Adjoint{Float64,Array{Float64,1}}}
Out[30]: 11.0
There is no way to avoid learning parametric types to achieve high performance code.
However, the other issue, that constructor arguments are error-prone, can be remedied with the Parameters.jl library.
In [31]: using Parameters

         @with_kw struct Foo5
             a::Float64 = 2.0   # adds a default value
             b::Int64
             c::Vector{Float64}
         end

         foo = Foo5(a = 0.1, b = 2, c = [1.0, 2.0, 3.0])   # keyword construction
         foo2 = Foo5(b = 2, c = [1.0, 2.0, 3.0])           # omitting `a` uses its default

         @show foo
         @show foo2

         function f(x)
             @unpack a, b, c = x   # can use `@unpack` on any struct
             return a + b + sum(c)
         end
         f(foo)
foo = Foo5
a: Float64 0.1
b: Int64 2
c: Array{Float64}((3,)) [1.0, 2.0, 3.0]
foo2 = Foo5
a: Float64 2.0
b: Int64 2
c: Array{Float64}((3,)) [1.0, 2.0, 3.0]
Out[31]: 8.1
As discussed in the previous sections, there is a major advantage to never declaring a type unless it is absolutely necessary.
The main place where it is necessary is designing code around multiple dispatch.
If you are careful to write code that doesn’t unnecessarily assume types, you will both achieve
higher performance and allow seamless use of a number of powerful libraries such as auto-
differentiation, static arrays, GPUs, interval arithmetic and root finding, arbitrary precision
numbers, and many more packages – including ones that have not even been written yet.
A few simple programming patterns ensure that this is possible
• Do not declare types when declaring variables or functions unless necessary.
In [32]: # BAD
         using LinearAlgebra

         x = [5.0, 6.0, 2.1]

         # reconstructed sketch of the non-generic version
         function g(x::Array{Float64, 1})   # not generic!
             y = zeros(length(x))           # not generic, hidden float!
             z = Diagonal(ones(length(x)))  # not generic, hidden float!
             q = ones(length(x))
             y .= z * x + q
             return y
         end

         g(x)
# GOOD
function g2(x) # or `x::AbstractVector`
y = similar(x)
z = I
q = ones(eltype(x), length(x)) # or `fill(one(x), length(x))`
y .= z * x + q
return y
end
g2(x)
• Preallocate related vectors with similar where possible, and use eltype or typeof. This is important when using multiple dispatch, given the different input types the function can be called with
g([BigInt(1), BigInt(2)])   # the generic version also works for BigInt

typeof(ones(3)) = Array{Float64,1}
typeof(ones(Int64, 3)) = Array{Int64,1}
typeof(zeros(3)) = Array{Float64,1}
typeof(zeros(Int64, 3)) = Array{Int64,1}

Similarly, zero and one create additive and multiplicative identities of a matching type

In [36]: @show typeof(1)
         @show typeof(1.0)
         @show typeof(BigFloat(1.0))
         @show typeof(one(BigFloat))
         @show typeof(zero(BigFloat))

         x = BigFloat(2)
         @show typeof(one(x))
         @show typeof(zero(x));

typeof(1) = Int64
typeof(1.0) = Float64
typeof(BigFloat(1.0)) = BigFloat
typeof(one(BigFloat)) = BigFloat
typeof(zero(BigFloat)) = BigFloat
typeof(one(x)) = BigFloat
typeof(zero(x)) = BigFloat
In [37]: # ACCEPTABLE
         function g(x::AbstractFloat)
             return x + 1.0   # assumes `1.0` can be converted to something compatible with `typeof(x)`
         end

         x = BigFloat(1.0)
         @show typeof(g(x));

typeof(g(x)) = BigFloat
In [38]: # BAD
         function g2(x::AbstractFloat)
             if x > 0.0            # reconstructed branch
                 return x + 1.0
             else
                 return 0          # BAD! Returns an `Int64`
             end
         end

         x = BigFloat(1.0)
         x2 = BigFloat(-1.0)
         @show typeof(g2(x))
         @show typeof(g2(x2))   # type unstable

         # GOOD
         function g3(x)
             if x > zero(x)          # any type with an additive identity
                 return x + one(x)   # more general but less important of a change
             else
                 return zero(x)
             end
         end
@show typeof(g3(x))
@show typeof(g3(x2)); # type stable
typeof(g2(x)) = BigFloat
typeof(g2(x2)) = Int64
typeof(g3(x)) = BigFloat
typeof(g3(x2)) = BigFloat
These patterns are relatively straightforward, but generic programming can be thought of
as a Leontief production function: if any of the functions you write or call are not precise
enough, then it may break the chain.
This is all the more reason to exploit carefully designed packages rather than “do-it-yourself”.
The previous section helps to establish some of the reasoning behind the style choices in these
lectures: “be aware of types, but avoid declaring them”.
The purpose of this is threefold:
• Provide easy to read code with minimal “syntactic noise” and a clear correspondence to
the math.
• Ensure that code is sufficiently generic to exploit other packages and types.
• Avoid common mistakes and unnecessary performance degradations.
This is just one of many decisions and patterns to ensure that your code is consistent and
clear.
The best resource is to carefully read other peoples code, but a few sources to review are
• Julia Style Guide.
• Invenia Blue Style Guide.
Commenting Code
One common mistake people make when trying to apply these goals is to add in a large num-
ber of comments.
Over the years, developers have found that excess comments in code (and especially big com-
ment headers used before every function declaration) can make code harder to read.
The issue is one of syntactic noise: if most of the comments are redundant given clear vari-
able and function names, then the comments make it more difficult to mentally parse and
read the code.
If you examine Julia code in packages and the core language, you will see a great amount of
care taken in function and variable names, and comments are only added where helpful.
For creating packages that you intend others to use, instead of a comment header, you should
use docstrings.
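A minimal sketch of a docstring (the function is hypothetical):

"""
    solvemodel(p)

Solve the model for parameters `p`, returning the sum of the fields.
"""
solvemodel(p) = p.a + p.b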
One of the defining features of Julia is multiple dispatch, whereby the same function name
can do different things depending on the underlying types.
Without realizing it, in nearly every function call within packages or the standard library you
have used this feature.
To see this in action, consider the absolute value function abs
abs(-1) = 1
abs(-1.0) = 1.0
In all of these cases, the abs function has specialized code depending on the type passed in.
To do this, a function specifies different methods which operate on a particular set of types.
Unlike most cases we have seen before, this requires a type annotation.
To rewrite the abs function

# reconstructed `Real` method to match the outputs below
function ourabs(x::Real)
    if x > zero(x)
        return x
    else
        return -x
    end
end

function ourabs(x::Complex)
    sqrt(real(x)^2 + imag(x)^2)
end

@show ourabs(-1)
@show ourabs(-1.0)
@show ourabs(1.0 - 2.0im);
ourabs(-1) = 1
ourabs(-1.0) = 1.0
ourabs(1.0 - 2.0im) = 2.23606797749979
Note that in the above, x works for any type of Real, including Int64, Float64, and ones you may not have realized exist

x = -2 // 3   # a Rational, reconstructed to match the outputs
@show typeof(x)
@show ourabs(x);

typeof(x) = Rational{Int64}
ourabs(x) = 2//3
You will also note that we used an abstract type, Real, and an incomplete parametric type,
Complex, when defining the above functions.
Unlike the creation of struct fields, there is no penalty in using abstract types when you
define function parameters, as they are used purely to determine which version of a function
to use.
If you want an algorithm to have specialized versions when given different input types, you
need to declare the types for the function inputs.
As an example where this could come up, assume that we have some grid x of values, the re-
sults of a function f applied at those values, and want to calculate an approximate derivative
using forward differences.
In that case, given 𝑥𝑛 , 𝑥𝑛+1 , 𝑓(𝑥𝑛 ) and 𝑓(𝑥𝑛+1 ), the forward-difference approximation of the
derivative is
    𝑓′(𝑥_𝑛) ≈ (𝑓(𝑥_{𝑛+1}) − 𝑓(𝑥_𝑛)) / (𝑥_{𝑛+1} − 𝑥_𝑛)
To implement this calculation for a vector of inputs, we notice that there is a specialized im-
plementation if the grid is uniform.
The uniform grid can be implemented using an AbstractRange, which we can analyze with
typeof, supertype and show_supertypes.
In [42]: x = range(0.0, 1.0, length = 20)   # reconstructed grids matching the outputs
         x_2 = 1:1:20
         @show typeof(x)
         @show typeof(x_2)
         @show supertype(typeof(x))
typeof(x) =
StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}
typeof(x_2) = StepRange{Int64,Int64}
supertype(typeof(x)) = AbstractRange{Float64}
Out[42]: AbstractRange{Float64}
StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}} <: AbstractRange{Float64} <: AbstractArray{Float64,1} <: Any
In [44]: show_supertypes(typeof(x_2))
The types of the range objects can be very complicated, but are both subtypes of
AbstractRange.
While you may not know the exact concrete type, any AbstractRange has an informal set
of operations that are available.
minimum(x) = 0.0
maximum(x) = 1.0
length(x) = 20
step(x) = 0.05263157894736842
Similarly, there are a number of operations available for any AbstractVector, such as
length.
@show typeof(f_x)
@show supertype(typeof(f_x))
@show supertype(supertype(typeof(f_x))) # walk up tree again!
@show length(f_x); # and many more
typeof(f_x) = Array{Float64,1}
supertype(typeof(f_x)) = DenseArray{Float64,1}
supertype(supertype(typeof(f_x))) = AbstractArray{Float64,1}
length(f_x) = 20
In [48]: show_supertypes(typeof(f_x))
There are also many functions that can use any AbstractArray, such as diff.
?diff
# if A is a matrix, specify the dimension over which to operate with the dims keyword
diff(A::AbstractMatrix; dims::Integer)
In the final example, we see that it is able to use specialized implementations over both the f
and the x arguments.
This is the “multiple” in multiple dispatch.
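A sketch of how such specialization might look with our own forward-difference function (the names and implementations are illustrative, not the lecture's code):

fdiff(f_x, x::AbstractRange) = diff(f_x) ./ step(x)   # uniform grids reuse the constant step
fdiff(f_x, x::AbstractVector) = diff(f_x) ./ diff(x)  # fallback for irregular grids

f(x) = x^2
x = range(0.0, 1.0, length = 20)
fdiff(f.(x), x)          # dispatches to the specialized AbstractRange method
x_irr = [0.0, 0.1, 0.3, 1.0]
fdiff(f.(x_irr), x_irr)  # dispatches to the AbstractVector fallback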
6.8 Exercises
6.8.1 Exercise 1
N = 3
A = rand(N, N)
x = rand(N)
6.8.2 Exercise 2
A key step in the calculation of the Kalman Filter is calculation of the Kalman gain, as can
be seen with the following example using dense matrices from the Kalman lecture.
Using what you learned from Exercise 1, benchmark this using Static Arrays
6.8.3 Exercise 3
@show p
p′ = derivative(p) # gives the derivative of p, another polynomial
@show p(0.1), p′(0.1) # call like a function
@show roots(p); # find roots such that p(x) = 0
6.8.4 Exercise 4
Use your solution to Exercise 8(a/b) in Introductory Examples to create a specialized version
of Newton’s method for Polynomials using the derivative function.
The signature of the function should be newtonsmethod(p::Polynomial, x_0;
tolerance = 1E-7, maxiter = 100), where p::Polynomial ensures that this ver-
sion of the function will be used anytime a polynomial is passed (i.e. dispatch).
Compare the results of this function to the built-in roots(p) function.
    ∫_{𝑥}^{𝑥̄} 𝑓(𝑥) 𝑑𝑥 ≈ ∑_{𝑛=1}^{𝑁} Δ𝑥_𝑛 (𝑓(𝑥_{𝑛−1}) + 𝑓(𝑥_𝑛)) / 2
f(x) = x^2
value, accuracy = quadgk(f, 0.0, 1.0)
    (𝑑/𝑑𝑥̄) ∫_{𝑥}^{𝑥̄} 𝑓(𝑥) 𝑑𝑥
Hint: See the following code for the general pattern, and be careful to follow the rules for
generic programming.
Df(3.0)
Out[57]: 0.5
Part II
Chapter 7
Generic Programming
7.1 Contents
• Overview 7.2
• Exploring Type Trees 7.3
• Distributions 7.4
• Numbers and Algebraic Structures 7.5
• Reals and Algebraic Structures 7.6
• Functions, and Function-Like Types 7.7
• Limitations of Dispatching on Abstract Types 7.8
• Exercises 7.9
7.2 Overview
In this lecture we delve more deeply into the structure of Julia, and in particular into
• abstract and concrete types
• the type tree
• designing and using generic interfaces
• the role of generic interfaces in Julia performance
Understanding them will help you
• form a “mental model” of the Julia language
• design code that matches the “white-board” mathematics
• create code that can use (and be used by) a variety of other packages
• write “well organized” Julia code that’s easy to read, modify, maintain and debug
• improve the speed at which your code runs
(Special thank you to Jeffrey Sarnoff)
7.2.2 Setup
The connection between data structures and the algorithms which operate on them is handled
by the type system.
Concrete types (i.e., Float64 or Array{Float64, 2}) are the data structures we
apply an algorithm to, and the abstract types (e.g. the corresponding Number and
AbstractArray) provide the mapping between a set of related data structures and algo-
rithms.
Beyond the typeof and supertype functions, a few other useful tools for analyzing the
tree of types are discussed in the introduction to types lecture
In [4]: using Base: show_supertypes # import the function from the `Base` package
show_supertypes(Int64)
Int64 <: Signed <: Integer <: Real <: Number <: Any
In [5]: subtypes(Integer)
Using the subtypes function, we can write an algorithm to traverse the type tree below any type t – with the confidence that all types support subtypes

In [6]: # a reconstructed sketch of the traversal
        function subtypetree(t, level = 1, indent = 4)
            println(" "^((level - 1) * indent), t)
            for s in subtypes(t)
                subtypetree(s, level + 1, indent)   # recurse, indenting each level
            end
        end

        subtypetree(Number)
Number
    Complex
    Real
        AbstractFloat
            BigFloat
            Float16
            Float32
            Float64
        AbstractIrrational
            Irrational
        FixedPointNumbers.FixedPoint
            FixedPointNumbers.Fixed
            FixedPointNumbers.Normed
        Integer
            Bool
            GeometryTypes.OffsetInteger
            Signed
                BigInt
                Int128
                Int16
                Int32
                Int64
                Int8
            Unsigned
                UInt128
                UInt16
                UInt32
                UInt64
                UInt8
        Rational
        Ratios.SimpleRatio
        StatsBase.TestStat
For the most part, all of the “leaves” will be concrete types.
7.3.1 Any
struct MyType
    a::Float64
end

myval = MyType(2.0)
@show myval
@show typeof(myval)
@show supertype(typeof(myval))
@show typeof(myval) <: Any;
myval = MyType(2.0)
typeof(myval) = MyType
supertype(typeof(myval)) = Any
typeof(myval) <: Any = true
Here we see another example of generic programming: every type <: Any supports the
@show macro, which in turn, relies on the show function.
The @show macro (1) prints the expression as a string; (2) evaluates the expression; and (3)
calls the show function on the returned values.
To see this with built-in types
In [9]: x = [1, 2]
show(x)
[1, 2]
The Any type is useful, because it provides a fall-back implementation for a variety of func-
tions.
Hence, calling show on our custom type dispatches to the fallback function
MyType(2.0)
function show(io::IO, x)
str = string(x)
print(io, str)
end
To implement a specialized implementation of the show function for our type, rather than using this fallback, define (reconstructed to match the output)

import Base.show
show(io::IO, x::MyType) = print(io, "(MyType.a = $(x.a))")
show(myval)

(MyType.a = 2.0)
At that point, we can use the @show macro, which in turn calls show
Here we see another example of generic programming: any type with a show function works
with @show.
Layering of functions (e.g. @show calling show) with a “fallback” implementation makes it
possible for new types to be designed and only specialized where necessary.
mutable struct MyModel
    a::Float64
    b::Float64
    algorithmcalculation::Float64

    MyModel(a, b) = new(a, b)   # reconstructed inner constructor; the cache field starts uninitialized
end

function myalgorithm!(m::MyModel, x)
m.algorithmcalculation = m.a + m.b + x # some algorithm
end
function set_a!(m::MyModel, a)
m.a = a
end
m = MyModel(2.0, 3.0)
x = 0.1
set_a!(m, 4.1)
myalgorithm!(m, x)
@show m.algorithmcalculation;
m.algorithmcalculation = 7.199999999999999
You may think to yourself that the above code is similar to OO, except that you
• reverse the first argument, i.e., myalgorithm!(m, x) instead of the object-oriented m.myalgorithm!(x)
• cannot control encapsulation of the fields a, b, but you can add getter/setters like set_a
• do not have concrete inheritance
While this sort of programming is possible, it is (verbosely) missing the point of Julia and the
power of generic programming.
When programming in Julia
• there is no encapsulation and most custom types you create will be immutable.
At its essence, the design of generic software is that you will start by creating algorithms which are largely orthogonal to concrete types.
In the process, you will discover commonality which leads to abstract types with informally
defined functions operating on them.
Given the abstract types and commonality, you then refine the algorithms as they are more
limited or more general than you initially thought.
This approach is in direct contrast to object-oriented design and analysis (OOAD), where you specify a taxonomy of types, add operations to those types, and then move down to various levels of specialization (where algorithms are embedded at points within the taxonomy, and potentially specialized with inheritance).
In the examples that follow, we will show for exposition the hierarchy of types and the algorithms operating on them, but the reality is that the algorithms are often designed first, and the abstract types come later.
7.4 Distributions
d1 = Normal{Float64}(μ=1.0, σ=2.0)
Normal{Float64} <: Distribution{Univariate,Continuous} <:
Sampleable{Univariate,Continuous} <: Any
rand(d1) = 2.6549944520827173
The purpose of that abstract type is to provide an interface for drawing from a variety of dis-
tributions, some of which may not have a well-defined predefined pdf.
If you were writing a function to simulate a stochastic process with arbitrary iid shocks,
where you did not need to assume an existing pdf etc., this is a natural candidate.
For example, to simulate 𝑥𝑡+1 = 𝑎𝑥𝑡 + 𝑏𝜖𝑡+1 where 𝜖 ∼ 𝐷 for some 𝐷, which allows drawing
random values.
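A minimal sketch of such a simulation (the function name and parameters are illustrative):

using Distributions

function simulate_process(d; a = 0.9, b = 1.0, x0 = 0.0, T = 100)
    x = zeros(T + 1)
    x[1] = x0
    for t in 1:T
        x[t + 1] = a * x[t] + b * rand(d)   # only requires that `d` supports `rand`
    end
    return x
end

simulate_process(Normal(1.0, 2.0))
simulate_process(Exponential(0.1));   # any distribution supporting `rand` works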
d1 = Normal{Float64}(μ=1.0, σ=2.0)
d2 = Exponential{Float64}(θ=0.1)
supertype(typeof(d1)) = Distribution{Univariate,Continuous}
supertype(typeof(d2)) = Distribution{Univariate,Continuous}
pdf(d1, 0.1) = 0.18026348123082397
pdf(d2, 0.1) = 3.6787944117144233
cdf(d1, 0.1) = 0.32635522028792
cdf(d2, 0.1) = 0.6321205588285577
support(d1) = RealInterval(-Inf, Inf)
support(d2) = RealInterval(0.0, Inf)
minimum(d1) = -Inf
minimum(d2) = 0.0
maximum(d1) = Inf
maximum(d2) = Inf
In [19]: struct OurTruncatedExponential <: Distribution{Univariate,Continuous}
             α::Float64
             xmax::Float64
         end

         Distributions.minimum(d::OurTruncatedExponential) = 0
         Distributions.maximum(d::OurTruncatedExponential) = d.xmax
         # ... more to have a complete type
To demonstrate this
In [20]: d = OurTruncatedExponential(1.0,2.0)
@show minimum(d), maximum(d)
@show support(d) # why does this work?
Curiously, you will note that the support function works, even though we did not provide
one.
This is another example of the power of multiple dispatch and generic programming.
In the background, the Distributions.jl package has something like the following imple-
mented
support(d::Distribution) = RealInterval(minimum(d), maximum(d))   # a sketch of the fallback
typeof(d) = Truncated{Exponential{Float64},Continuous,Float64}
This is the power of generic programming in general, and Julia in particular: you can com-
bine and compose completely separate packages and code, as long as there is an agreement on
abstract types and functions.
Define two binary functions, + and ⋅, called addition and multiplication – although the opera-
tors can be applied to data structures much more abstract than a Real.
In mathematics, a ring is a set with associated additive and multiplicative operators where
• addition is associative and commutative, with an additive identity (0) and additive inverses
• multiplication is associative, with a multiplicative identity (1)
• multiplication distributes over addition
• Remark: We use the term “motivation” because they are not formally connected and the mapping is imperfect.
• The main difficulty when dealing with numbers that can be concretely created on a
computer is that the requirement that the operators are closed in the set are difficult
to ensure (e.g. floating points have finite numbers of bits of information).
Let typeof(a) = typeof(b) = T <: Number; then under an informal definition of the generic interface for Number, the following must be defined

In [23]: a = 1.0 + 1.0im   # reconstructed values matching the outputs
         b = 0.0 + 2.0im
         @show typeof(a)
         @show typeof(a) <: Number
         @show a + b
         @show a * b
         @show -a
         @show a - b
         @show zero(a)
         @show one(a);
typeof(a) = Complex{Float64}
typeof(a) <: Number = true
a + b = 1.0 + 3.0im
a * b = -2.0 + 2.0im
-a = -1.0 - 1.0im
a - b = 1.0 - 1.0im
zero(a) = 0.0 + 0.0im
one(a) = 1.0 + 0.0im
And for an arbitrary precision integer where BigInt <: Number (i.e., a different type than
the Int64 you have worked with, but nevertheless a Number)
In [24]: a = BigInt(10)
b = BigInt(4)
@show typeof(a)
@show typeof(a) <: Number
@show a + b
@show a * b
@show -a
@show a - b
@show zero(a)
@show one(a);
typeof(a) = BigInt
typeof(a) <: Number = true
a + b = 14
a * b = 40
-a = -10
a - b = 6
zero(a) = 0
one(a) = 1
This allows us to showcase further how different generic packages compose – even if they are
only loosely coupled through agreement on common generic interfaces.
The Complex numbers require some sort of storage for their underlying real and imaginary
parts, which is itself left generic.
This data structure is defined to work with any type <: Number, and is parameterized (e.g.
Complex{Float64} is a complex number storing the imaginary and real parts in Float64)
The implementation of the Complex numbers uses the underlying operations of the storage type, so as long as +, * etc. are defined – as they should be for any Number – the complex operations can be defined
real(z) and imag(z) returns the associated components of the complex number in the
underlying storage type (e.g. Float64 or BigFloat).
The rest of the function has been carefully written to use functions defined for any Number
(e.g. + but not <, since it is not part of the generic number interface).
To follow another example, look at the implementation of abs specialized for complex numbers.
The source is

    abs(z::Complex) = hypot(real(z), imag(z))
In this case, if you look at the generic function to get the hypotenuse, hypot, you will see
that it has the function signature hypot(x::T, y::T) where T<:Number, and hence
works for any Number.
That function, in turn, relies on the underlying abs for the type of real(z).
This would dispatch to the appropriate abs for the type
With implementations such as

    abs(x::Real) = ifelse(signbit(x), -x, x)
    abs(x::Float64) = abs_float(x)
For a Real number (which we will discuss in the next section) the fallback implementation
calls a function signbit to determine if it should flip the sign of the number.
The specialized version for Float64 <: Real calls a function called abs_float – which
turns out to be a specialized implementation at the compiler level.
While we have not completely dissected the tree of function calls, at the bottom of the tree
you will end at the most optimized version of the function for the underlying datatype.
Hopefully this showcases the power of generic programming: with a well-designed set of ab-
stract types and functions, the code can both be highly general and composable and still use
the most efficient implementation possible.
Thinking back to the mathematical motivation, a field is a ring with a few additional properties, among them
• the existence of a multiplicative inverse 𝑎⁻¹ for every nonzero element
• and hence a division operation /
This gives some motivation for the operations and properties of the Real type.
Of course, Complex{Float64} <: Number but not Real – since the ordering is not de-
fined for complex numbers in mathematics.
These operations are implemented in any subtype of Real through functions such as inv, /, and <
In [30]: a = 1 // 10
b = 4 // 6
@show typeof(a)
@show typeof(a) <: Number
@show typeof(a) <: Real
@show inv(a)
@show a / b
@show a < b;
typeof(a) = Rational{Int64}
typeof(a) <: Number = true
typeof(a) <: Real = true
inv(a) = 10//1
a / b = 3//20
a < b = true
Remark: Here we see where and how the precise connection to the mathematics for number
types breaks down for practical reasons, in particular
• However, it is necessary in practice for integer division to be defined, and to return a member of the Reals.
• This is called type promotion, where a type can be converted to another to ensure an
operation is possible by direct conversion between types (i.e., it can be independent of
the type hierarchy).
Do not think of the break in the connection between the underlying algebraic structures and
the code as a failure of the language or design.
Rather, the underlying algorithms for use on a computer do not perfectly fit the algebraic
structures in this instance.
Moving further down the tree of types provides more operations more directly tied to the
computational implementation than abstract algebra.
For example, floating point numbers have a machine precision, below which numbers become
indistinguishable due to lack of sufficient “bits” of information
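For example (an illustrative aside):

@show eps(Float64)                  # machine epsilon for Float64
@show 1.0 + eps(Float64)/2 == 1.0;  # differences below machine precision are lost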
As we saw previously, the Real data type is an abstract type, and encompasses both floats
and integers.
If we go to the provided link in the source, we see the entirety of the function is

    >(x, y) = y < x

That is, for any values where typeof(x) <: Real and typeof(y) <: Real, the definition relies on <.
We know that < is defined for the types because it is part of the informal interface for the
Real abstract type.
Note that this is not defined for Number because not all Number types have the < ordering
operator defined (e.g. Complex).
In order to generate fast code, the implementation details may define specialized versions of
these operations.
Note that the reason Float64 <: Real calls this implementation rather than the one given
above, is that Float64 <: Real, and Julia chooses the most specialized implementation for
each function.
The specialized implementations are often more subtle than you may realize due to floating
point arithmetic, underflow, etc.
Another common example of the separation between data structures and algorithms is the
use of functions.
Syntactically, a univariate “function” is any f that can call an argument x as f(x).
For example, we can use a standard function
f(x) = x^2   # reconstructed example function

function plotfunctions(f)
intf(x) = quadgk(f, 0.0, x)[1] # int_0^x f(x) dx
x = 0:0.1:1.0
f_x = f.(x)
plot(x, f_x, label="f")
plot!(x, intf.(x), label="int_f")
end
plotfunctions(f) # call with our f
@show p
@show p(1.0) # call like a function
f_int(1.0) = 1.0
Note that the same generic plotfunctions could use any variable passed to it that “looks”
like a function, i.e., can call f(x).
This approach to design with types – generic, but without any specific type declarations – is
called duck typing.
If you need to make an existing type callable, see Function Like Objects.
You will notice that types in Julia represent a tree with Any at the root.
The tree structure has worked well for the above examples, but it doesn’t allow us to asso-
ciate multiple categorizations of types.
For example, a semi-group type would be useful for writing generic code (e.g. continuous-time solutions for ODEs and matrix-free methods), but cannot be implemented rigorously, since the Matrix type is a semi-group as well as an AbstractArray, but not all semi-groups are AbstractArrays.
The main way to implement this in a generic language is with a design approach called
“traits”.
• See the original discussion and an example of a package to facilitate the pattern.
• A complete description of the traits pattern as the natural evolution of Multiple Dis-
patch is given in this blog post.
7.9 Exercises
7.9.1 Exercise 1a
    ∫_{𝑥}^{𝑥̄} 𝑓(𝑥) 𝑑𝑥 ≈ 𝜔 ⋅ 𝑓⃗

where

    𝜔 ≡ Δ [1/2, 1, …, 1, 1/2] ∈ ℝ^𝑁
Out[37]: 0.3333503384008434
However, in this case the creation of the ω temporary is inefficient, as there is no reason to allocate an entire vector just to iterate through it with the dot product. Instead, create an iterable by following the interface definition for Iteration, and implement the modified trap_weights and integration.
Hint: create a type such as

    struct UniformTrapezoidal   # a sketch; the exact fields are an assumption
        count::Int
        Δ::Float64
    end

7.9.2 Exercise 1b

Make the UniformTrapezoidal type operate as an array with the interface definition for AbstractArray. With this, you should be able to call ω[2] or length(ω) to access the quadrature weights.
7.9.3 Exercise 2

Implement the same features as Exercise 1a and 1b, but for the non-uniform trapezoidal rule.
Chapter 8

General Purpose Packages
8.1 Contents
• Overview 8.2
• Numerical Integration 8.3
• Interpolation 8.4
• Linear Algebra 8.5
• General Tools 8.6
8.2 Overview
Julia has both a large number of useful, well-written libraries and many incomplete, poorly maintained proofs of concept.
A major advantage of Julia libraries is that, because Julia itself is sufficiently fast, there is
less need to mix in low level languages like C and Fortran.
As a result, most Julia libraries are written exclusively in Julia.
Not only does this make the libraries more portable, it makes them much easier to dive into,
read, learn from and modify.
In this lecture we introduce a few of the Julia libraries that we’ve found particularly useful for
quantitative work in economics.
Also see data and statistical packages and optimization, solver, and related packages for more
domain specific packages.
8.2.1 Setup
Many applications require directly calculating a numerical derivative and calculating expecta-
tions.
The workhorse for one-dimensional numerical integration is quadgk from QuadGK.jl, an adaptive Gauss-Kronrod integration technique that's relatively accurate for smooth functions.
However, its adaptive implementation makes it slow and not well suited to inner loops.
However, its adaptive implementation makes it slow and not well suited to inner loops.
Alternatively, many integrals can be done efficiently with (non-adaptive) Gaussian quadra-
ture.
For example, using FastGaussQuadrature.jl (a reconstructed call matching the output below; the number of nodes is an assumption)

using FastGaussQuadrature, LinearAlgebra
x, w = gausslegendre(65)   # Gauss-Legendre nodes and weights on [-1, 1]
f(x) = x^2
@show w ⋅ f.(x);           # ∫_{-1}^{1} x² dx = 2/3

w ⋅ f.(x) = 0.6666666666666667
The only problem with the FastGaussQuadrature package is that you will need to deal
with affine transformations to the non-default domains yourself.
Alternatively, QuantEcon.jl has routines for Gaussian quadrature that translate the do-
mains.
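A sketch of the QuantEcon.jl version (the qnwlege call and the domain are assumptions consistent with the output below):

using QuantEcon, LinearAlgebra
x, w = qnwlege(65, -2π, 2π)   # nodes and weights translated to [-2π, 2π]
@show w ⋅ cos.(x);            # integrating cos over full periods gives ≈ 0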
w ⋅ cos.(x) = -3.0064051806277455e-15
8.3.3 Expectations
If the calculations of the numerical integral is simply for calculating mathematical expecta-
tions of a particular distribution, then Expectations.jl provides a convenient interface.
Under the hood, it is finding the appropriate Gaussian quadrature scheme for the distribution
using FastGaussQuadrature.
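A minimal sketch of the interface (the distribution and function are illustrative):

using Expectations, Distributions
d = Normal()
E = expectation(d)   # builds a quadrature-based expectation operator for d
f(x) = x
@show E(f);          # E[x] for x ~ N(0, 1), approximately 0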
E(f) = -6.991310601309959e-18
Out[6]: true
8.4 Interpolation
In economics we often wish to interpolate discrete data (i.e., build continuous functions that
join discrete sequences of points).
The package we usually turn to for this purpose is Interpolations.jl.
There are a variety of options, but we will only demonstrate the convenient notations.
In [8]: li = LinearInterpolation(x, y)
li_spline = CubicSplineInterpolation(x, y)
li(0.3) = 0.25244129544236954
In the above, the LinearInterpolation function uses a specialized function for regular
grids since x is a Range type.
For an arbitrary, irregular grid
interp = LinearInterpolation(x, y)
At this point, Interpolations.jl does not have support for cubic splines with irregu-
lar grids, but there are plenty of other packages that do (e.g. Dierckx.jl and GridInterpola-
tions.jl).
In [10]: xs = 1:0.2:5   # reconstructed 2D grids; the exact spacing is an assumption
         ys = 1:0.1:5
         A = [log(x + y) for x in xs, y in ys]

         # linear interpolation
         interp_linear = LinearInterpolation((xs, ys), A)
         @show interp_linear(3, 2)     # exactly log(3 + 2)
         @show interp_linear(3.1, 2.1) # approximately log(3.1 + 2.1)

         # cubic spline interpolation
         interp_cubic = CubicSplineInterpolation((xs, ys), A)
         @show interp_cubic(3, 2)      # exactly log(3 + 2)
         @show interp_cubic(3.1, 2.1); # approximately log(3.1 + 2.1)
interp_linear(3, 2) = 1.6094379124341003
interp_linear(3.1, 2.1) = 1.6484736801441782
interp_cubic(3, 2) = 1.6094379124341
interp_cubic(3.1, 2.1) = 1.6486586594237707
The standard library contains many useful routines for linear algebra, in addition to standard
functions such as det(), inv(), factorize(), etc.
Routines are available for
• Cholesky factorization
• LU decomposition
• Singular value decomposition,
• Schur factorization, etc.
See here for further details.
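For example (an illustrative sketch; the matrix is arbitrary but positive definite so that the Cholesky factorization exists):

using LinearAlgebra
A = [4.0 1.0; 1.0 3.0]
@show cholesky(A).U   # upper-triangular Cholesky factor
@show lu(A).L         # lower-triangular factor of the LU decomposition
@show svd(A).S        # singular values
@show schur(A).T;     # (quasi) upper-triangular Schur factor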
8.6.1 LaTeXStrings.jl
When you need to properly escape latex code (e.g. for equation labels), use LaTeXStrings.jl.
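A sketch matching the output below:

using LaTeXStrings
L"an equation: $1 + \alpha^2$"   # the L prefix avoids escaping the backslashes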
Out[11]: an equation: 1 + 𝛼²
8.6.2 ProgressMeter.jl

For long-running calculations, ProgressMeter.jl can display a progress bar, e.g. by placing its @showprogress macro in front of a for loop.

Chapter 9

Data and Statistics Packages

9.1 Contents
• Overview 9.2
• DataFrames 9.3
• Statistics and Econometrics 9.4
9.2 Overview
This lecture explores some of the key packages for working with data and doing statistics in
Julia.
In particular, we will examine the DataFrame object in detail (i.e., construction, manipula-
tion, querying, visualization, and nuances like missing data).
While Julia is not an ideal language for pure cookie-cutter statistical analysis, it has many
useful packages to provide those tools as part of a more general solution.
This list is not exhaustive, and others can be found in organizations such as JuliaStats, Julia-
Data, and QueryVerse.
9.2.1 Setup
using GLM
9.3 DataFrames
The most important data type provided is a DataFrame, a two dimensional array for storing
heterogeneous data.
Although data can be heterogeneous within a DataFrame, the contents of the columns must
be homogeneous (of the same type).
This is analogous to a data.frame in R, a DataFrame in Pandas (Python) or, more
loosely, a spreadsheet in Excel.
There are a few different ways to create a DataFrame.

In [3]: df = DataFrame(commod = ["crude", "gas", "gold", "silver"],
                       price = [4.2, 11.3, 12.1, missing])

Out[3]:
   commod   price
   String   Float64?
 1 crude    4.2
 2 gas      11.3
 3 gold     12.1
 4 silver   missing
In [4]: df.price
Note that the type of this array has values Union{Missing, Float64} since it was cre-
ated with a missing value.
In [5]: df.commod
In [6]: DataFrames.describe(df)
Out[6]:
   variable   mean   min     median   max      nunique   nmissing   eltype
 1 commod            crude            silver   4                    String
 2 price      9.2    4.2     11.3     12.1               1          Union{Missing, Float64}
While often data will be generated all at once, or read from a file, you can add to a DataFrame by providing the key parameters

In [7]: push!(df, (commod = "nickel", price = 5.1))

Out[7]:
   commod   price
   String   Float64?
 1 crude    4.2
 2 gas      11.3
 3 gold     12.1
 4 silver   missing
 5 nickel   5.1
Named tuples can also be used to construct a DataFrame, and have it properly deduce all types

In [8]: nt = (t = 1, col1 = 3.0)   # reconstructed construction matching the output
        df2 = DataFrame([nt])
        push!(df2, (t = 2, col1 = 4.0))

Out[8]:
   t       col1
   Int64   Float64
 1 1       3.0
 2 2       4.0
In order to modify a column, access the mutating version by the symbol df[!, :col], e.g. df[!, :price] displays the full column

  4.2
 11.3
 12.1
   missing
  5.1
As discussed in the next section, note that, as with the fundamental types, missing is propagated, i.e. missing * 2 === missing.
As we discussed in fundamental types, the semantics of missing are that mathematical op-
erations will not silently ignore it.
In order to allow missing in a column, you can create/load the DataFrame from a source
with missing’s, or call allowmissing! on a column.
In [11]: allowmissing!(df2, :col1)   # reconstructed: allow missing in the column, then add rows
         push!(df2, (t = 3, col1 = missing))
         push!(df2, (t = 4, col1 = 5.1))

Out[11]:
   t       col1
   Int64   Float64?
 1 1       3.0
 2 2       4.0
 3 3       missing
 4 4       5.1
We can see the propagation of missing to caller functions, as well as a way to efficiently calculate with non-missing data

In [12]: @show mean(df2.col1)
         @show mean(skipmissing(df2.col1))

mean(df2.col1) = missing
mean(skipmissing(df2.col1)) = 4.033333333333333

Out[12]: 4.033333333333333
Out[14]:
   t       col1       col2
   Int64   Float64?   Float64
 1 1       3.0        9.0
 2 2       4.0        16.0
 3 3       0.0        0.0
 4 4       5.1        26.01
Out[15]:
   id      y
   Int64   Cat…
 1 1       old
 2 2       young
 3 3       young
 4 4       old
In [16]: levels(df.y)
The DataFrame (and similar types that fulfill a standard generic interface) can fit into a va-
riety of packages.
One set of them is the QueryVerse.
Note: The QueryVerse, in the same spirit as R’s tidyverse, makes heavy use of the pipeline
syntax |>.
In [17]: x = 3.0
f(x) = x^2
g(x) = log(x)
@show g(f(x))
@show x |> f |> g; # pipes nest function calls
g(f(x)) = 2.1972245773362196
(x |> f) |> g = 2.1972245773362196
To give an example directly from the source of the LINQ inspired Query.jl

In [18]: df = DataFrame(name = ["John", "Sally", "Kirk"],
                        age = [23.0, 42.0, 59.0],
                        children = [3, 5, 2])   # reconstructed from the Query.jl example

         x = @from i in df begin
@where i.age>50
@select {i.name, i.children}
@collect DataFrame
end
Out[18]:
   name     children
   String   Int64
 1 Kirk     2
While it is possible to just use the Plots.jl library, there may be better options for dis-
playing tabular data – such as VegaLite.jl.
In [19]: # reconstructed setup for the truncated call, following the VegaLite.jl examples
         using VegaLite, RDatasets
         iris = dataset("datasets", "iris")

         iris |> @vlplot(
             :point,
             x=:PetalLength,
             y=:PetalWidth,
             color=:Species
         )
While Julia is not intended as a replacement for R, Stata, and similar specialty languages, it
has a growing number of packages aimed at statistics and econometrics.
Many of the packages live in the JuliaStats organization.
A few to point out
• StatsBase has basic statistical functions such as geometric and harmonic means, auto-
correlations, robust statistics, etc.
• StatsFuns has a variety of mathematical functions and constants such as pdf and cdf of
many distributions, softmax, etc.
To run linear regressions and similar statistics, use the GLM package.
x = randn(100)
y = 0.9 .* x + 0.5 * rand(100)
df = DataFrame(x=x, y=y)
ols = lm(@formula(y ~ x), df) # R-style notation
Out[20]: StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}
y ~ 1 + x
Coefficients:
──────────────────────────────────────────────────────────────────────────
Estimate Std. Error t value Pr(>|t|) Lower 95% Upper 95%
──────────────────────────────────────────────────────────────────────────
(Intercept) 0.23899 0.0144942 16.4887 <1e-29 0.210226 0.267753
x 0.908299 0.0130152 69.7878 <1e-84 0.882471 0.934128
──────────────────────────────────────────────────────────────────────────
To display the results in a useful tables for LaTeX and the REPL, use RegressionTables for
output similar to the Stata package esttab and the R package stargazer.
----------------------
y
--------
(1)
----------------------
(Intercept) 0.239***
(0.014)
x 0.908***
(0.013)
----------------------
Estimator OLS
----------------------
N 100
R2 0.980
----------------------
While Julia may be overkill for estimating a simple linear regression, fixed-effects estimation with dummies for multiple variables is much more computationally intensive.
For a 2-way fixed-effect, taking the example directly from the documentation using cigarette
consumption data
In [21]: # reconstructed from the FixedEffectModels.jl documentation example
         using FixedEffectModels, RDatasets
         cigar = dataset("plm", "Cigar")
         cigar.StateCategorical = categorical(cigar.State)
         cigar.YearCategorical = categorical(cigar.Year)

         fixedeffectresults = reg(cigar,
                                  @formula(Sales ~ NDI + fe(StateCategorical) +
                                           fe(YearCategorical)),
                                  weights = :Pop, Vcov.cluster(:State))
         regtable(fixedeffectresults)
----------------------------
Sales
---------
(1)
----------------------------
NDI -0.005***
(0.001)
----------------------------
StateCategorical Yes
YearCategorical Yes
----------------------------
Estimator OLS
----------------------------
N 1,380
R2 0.803
----------------------------
Chapter 10

Solvers, Optimizers, and Automatic Differentiation
10.1 Contents
• Overview 10.2
• Introduction to Differentiable Programming 10.3
• Optimization 10.4
• Systems of Equations and Least Squares 10.5
• LeastSquaresOptim.jl 10.6
• Additional Notes 10.7
• Exercises 10.8
10.2 Overview
In this lecture we introduce a few of the Julia libraries that we’ve found particularly useful for
quantitative work in economics.
10.2.1 Setup
using LeastSquaresOptim
using Optim: converged, maximum, maximizer, minimizer, iterations # some extra functions
The promise of differentiable programming is that we can move towards taking the derivatives
of almost arbitrarily complicated computer programs, rather than simply thinking about the
derivatives of mathematical functions. Differentiable programming is the natural evolution of
automatic differentiation (AD, sometimes called algorithmic differentiation).
Stepping back, there are three ways to calculate the gradient or Jacobian
• Analytic derivatives / Symbolic differentiation
– You can sometimes calculate the derivative on pen-and-paper, and potentially sim-
plify the expression.
– In effect, repeated applications of the chain rule, product rule, etc.
– It is sometimes, though not always, the most accurate and fastest option if there
are algebraic simplifications.
– Sometimes symbolic differentiation on the computer is a good solution, if the package
can handle your functions. Doing algebra by hand is tedious and error-prone, but
is sometimes invaluable.
• Finite differences
– Evaluate the function at least 𝑁 + 1 times to get the gradient – Jacobians are even
worse.
– A large Δ is numerically stable but inaccurate, while too small a Δ is numerically
unstable but more accurate.
– Choosing the Δ is hard, so use packages such as DiffEqDiffTools.jl.
– If a function is 𝑅𝑁 → 𝑅 for a large 𝑁 , this requires 𝑂(𝑁 ) function evaluations.
$$
\partial_{x_i} f(x_1, \ldots, x_N) \approx \frac{f(x_1, \ldots, x_i + \Delta, \ldots, x_N) - f(x_1, \ldots, x_i, \ldots, x_N)}{\Delta}
$$
• Automatic Differentiation
– The same as analytic/symbolic differentiation, but where the chain rule is calcu-
lated numerically rather than symbolically.
– Just as with analytic derivatives, rules can be established for the derivatives of individual
functions (e.g. $d(\sin(x)) = \cos(x)\,dx$) as intrinsic derivatives.
AD has two basic approaches, which are variations on the order of evaluating the chain rule:
reverse and forward mode (although mixed mode is possible).
We will explore two types of automatic differentiation in Julia (and discuss a few packages
which implement them). For both, remember the chain rule
$$
\frac{dy}{dx} = \frac{dy}{dw} \cdot \frac{dw}{dx}
$$
Forward-mode starts the calculation from the left with $\frac{dy}{dw}$ first, which then calculates the
product with $\frac{dw}{dx}$. On the other hand, reverse mode starts on the right hand side with $\frac{dw}{dx}$ and
works backwards.
Take as an example a function with fundamental operations and known analytical derivatives

$$
f(x_1, x_2) = x_1 x_2 + \sin(x_1)
$$
And rewrite this as a function which contains a sequence of simple operations and tempo-
raries.
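A sketch of such a decomposition (the original code cell did not survive extraction; this mirrors the table below):

function f(x_1, x_2)
    w_1 = x_1
    w_2 = x_2
    w_3 = w_1 * w_2   # multiplication
    w_4 = sin(w_1)    # intrinsic function
    w_5 = w_3 + w_4   # addition
    return w_5
end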
Here we can identify all of the underlying functions (*, sin, +), and see if each has an in-
trinsic derivative. While these are obvious, with Julia we could come up with all sorts of dif-
ferentiation rules for arbitrarily complicated combinations and compositions of intrinsic oper-
ations. In fact, there is even a package for registering more.
In forward-mode AD, you first fix the variable you are interested in (called “seeding”), and
then evaluate the chain rule in left-to-right order.
For example, with our $f(x_1, x_2)$ example above, if we wanted to calculate the derivative with
respect to $x_1$, then we can seed the setup accordingly: $\frac{\partial w_1}{\partial x_1} = 1$, since we are taking the
derivative with respect to it, while $\frac{\partial w_2}{\partial x_1} = 0$.
Following through with these, redo all of the calculations for the derivative in parallel with
the function itself.
$$
\begin{array}{ll}
f(x_1, x_2) & \frac{\partial f(x_1, x_2)}{\partial x_1} \\
w_1 = x_1 & \frac{\partial w_1}{\partial x_1} = 1 \text{ (seed)} \\
w_2 = x_2 & \frac{\partial w_2}{\partial x_1} = 0 \text{ (seed)} \\
w_3 = w_1 \cdot w_2 & \frac{\partial w_3}{\partial x_1} = w_2 \cdot \frac{\partial w_1}{\partial x_1} + w_1 \cdot \frac{\partial w_2}{\partial x_1} \\
w_4 = \sin w_1 & \frac{\partial w_4}{\partial x_1} = \cos w_1 \cdot \frac{\partial w_1}{\partial x_1} \\
w_5 = w_3 + w_4 & \frac{\partial w_5}{\partial x_1} = \frac{\partial w_3}{\partial x_1} + \frac{\partial w_4}{\partial x_1}
\end{array}
$$
Since these two could be done at the same time, we say there is “one pass” required for this
calculation.
Generalizing a little, if the function is vector-valued, then a single pass yields an entire row of
the Jacobian. Hence, for an $R^N \to R^M$ function, forward-mode AD requires $N$ passes to obtain
a dense Jacobian.
How can you implement forward-mode AD? With a language that supports generic programming,
it turns out to be fairly easy to build a simple example (though the devil is in the details of a
high-performance implementation).
The key building block is the dual number: a pair $(x, x')$, written $x + x'\epsilon$, where the symbol
$\epsilon$ satisfies $\epsilon^2 = 0$. Note that if we keep track of the constant in front of the $\epsilon$ terms (e.g. an $x'$ and $y'$)

$$
\begin{aligned}
(x + x'\epsilon) + (y + y'\epsilon) &= (x + y) + (x' + y')\epsilon \\
(x + x'\epsilon) \times (y + y'\epsilon) &= (xy) + (x'y + y'x)\epsilon \\
\exp(x + x'\epsilon) &= \exp(x) + (x' \exp(x))\epsilon
\end{aligned}
$$

then the coefficient on $\epsilon$ carries the derivative through every operation.

Using generic programming in Julia, it is easy to define a new dual number type which
can encapsulate the pair $(x, x')$ and provide definitions for all of the basic operations. Each
definition then has the chain rule built into it.

With this approach, the “seed” process is simply the creation of the $\epsilon$ for the underlying variable.
So if we have the function $f(x_1, x_2)$ and we want to find the derivative $\partial_{x_1} f(3.8, 6.9)$, then
we would seed them with the dual numbers $x_1 \to (3.8, 1)$ and $x_2 \to (6.9, 0)$.
If you then follow all of the same scalar operations above with a seeded dual number, it will
calculate both the function value and the derivative in a single “sweep” and without modify-
ing any of your (generic) code.
10.3.3 ForwardDiff.jl
Dual-numbers are at the heart of one of the AD packages we have already seen.
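The input cells did not survive; a sketch that reproduces the two outputs below, using an iterative square root written generically (so that it works on ForwardDiff's dual numbers):

using ForwardDiff

function squareroot(x) # Newton's method, generic in the type of x
    z = x # initial guess
    while abs(z * z - x) > 1e-13
        z = z - (z * z - x) / (2z)
    end
    return z
end

squareroot(2.0)
ForwardDiff.derivative(squareroot, 2.0) # d√x/dx at x = 2 is 1/(2√2)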
Out[5]: 1.4142135623730951
Out[6]: 0.35355339059327373
10.3.4 Zygote.jl
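The introductory cell is missing; a sketch of the basic interface:

using Zygote

gradient(x -> x^2, 3.0) # returns the tuple (6.0,)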
Here we see that Zygote provides a gradient function as its interface, which returns a tuple.

You could create this as an operator if you wanted to.
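The definition of D is missing; a sketch consistent with its usage below:

D(f) = x -> gradient(f, x)[1] # an operator returning the derivative function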
D_sin = D(sin)
D_sin(4.0)
Out[8]: -0.6536436208636119
For functions of one (Julia) variable, we can find the derivative by simply using the ' after a function
name
Or, using the complicated iterative function we defined for the squareroot,
In [10]: squareroot'(2.0)
Out[10]: 0.3535533905932737
Zygote supports combinations of vectors and scalars as function parameters.
The gradients can be very high-dimensional. For example, a simple nonlinear optimization
problem with 1 million dimensions can be solved in a few seconds.
using Optim, Zygote
N = 1_000_000                  # (assumed) dimension; the defining lines were lost
obj(x) = sum((x .- 1.0).^2)    # (assumed) objective with minimum at x = 1
x_iv = rand(N)
function g!(G, x)
    G .= obj'(x)               # reverse-mode gradient written into G in-place
end
results = optimize(obj, g!, x_iv, LBFGS())
Caution: while Zygote is the most exciting reverse-mode AD implementation in Julia, it has
many rough edges.
• If you write a function, take its gradient, and then modify the function, you need to call
Zygote.refresh() or else the gradient will be out of sync. This may not apply for
Julia 1.3+.
• It provides no features for getting Jacobians, so you would have to ask for each row of
the Jacobian separately. That said, you probably want to use ForwardDiff.jl for
Jacobians if the dimension of the output is similar to the dimension of the input.
• You cannot, in the current release, use mutating functions (e.g. modify a value in an
array/etc.) although that feature is in progress.
• Compiling can be very slow for complicated functions.
10.4 Optimization
There are a large number of packages intended to be used for optimization in Julia.
Part of the reason for the diversity of options is that Julia makes it possible to efficiently im-
plement a large number of variations on optimization routines.
The other reason is that different types of optimization problems require different algorithms.
10.4.1 Optim.jl
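The first example, univariate minimization over a bracketing interval, is missing its input cell; a sketch consistent with the results below:

using Optim

result = optimize(x -> x^2, -2.0, 1.0) # minimize x² over the interval [-2.0, 1.0]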
xmin = result.minimizer
result.minimum
Out[14]: 0.0
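For maximization, the setup cell is also missing; presumably something like

f(x) = -x^2
result = maximize(f, -2.0, 1.0) # maximize over [-2.0, 1.0]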
xmin = maximizer(result)
fmax = maximum(result)
Out[15]: -0.0
Note: Notice that we call optimize results using result.minimizer, and maximize
results using maximizer(result).
* Candidate solution
Minimizer: [1.00e+00, 1.00e+00]
Minimum: 3.525527e-09
* Found with
Algorithm: Nelder-Mead
Initial Point: [0.00e+00, 0.00e+00]
 * Convergence measures
    √(Σ(yᵢ-ȳ)²)/n ≤ 1.0e-08
* Work counters
Seconds run: 0 (vs limit Inf)
Iterations: 60
f(x) calls: 118
The default algorithm is NelderMead, which is derivative-free and hence requires many
function evaluations.
To change the algorithm type to L-BFGS

f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2 # the Rosenbrock objective from above (reconstructed)
x_iv = [0.0, 0.0]
results = optimize(f, x_iv, LBFGS(), autodiff = :forward) # gradient via automatic differentiation
println("minimum = $(results.minimum) with argmin = $(results.minimizer) in " *
        "$(results.iterations) iterations")
Note that we did not need to use ForwardDiff.jl directly, as long as our f(x) function
was written to be generic (see the generic programming lecture).
Alternatively, with an analytical gradient
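The cell is missing; a sketch of an in-place analytical gradient for the Rosenbrock objective above:

function g!(G, x)
    G[1] = -2.0 * (1.0 - x[1]) - 400.0 * (x[2] - x[1]^2) * x[1]
    G[2] = 200.0 * (x[2] - x[1]^2)
end
results = optimize(f, g!, x_iv, LBFGS())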
For derivative-free methods, you can change the algorithm – and have no need to provide a
gradient
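The call producing the output below is missing; presumably something like

results = optimize(f, x_iv, SimulatedAnnealing())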
* Candidate solution
Minimizer: [0.00e+00, 0.00e+00]
Minimum: 1.000000e+00
* Found with
Algorithm: Simulated Annealing
Initial Point: [0.00e+00, 0.00e+00]
 * Convergence measures
    |x - x'|               = NaN ≰ 0.0e+00
    |x - x'|/|x'|          = NaN ≰ 0.0e+00
    |f(x) - f(x')|         = NaN ≰ 0.0e+00
    |f(x) - f(x')|/|f(x')| = NaN ≰ 0.0e+00
    |g(x)|                 = NaN ≰ 1.0e-08
* Work counters
Seconds run: 0 (vs limit Inf)
Iterations: 1000
f(x) calls: 1001
However, you will note that this did not converge, as stochastic methods typically require
many more iterations as a tradeoff for their global-convergence properties.
See the maximum likelihood example and the accompanying Jupyter notebook.
10.4.2 JuMP.jl
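The input for the first example is missing; consistent with the optimal objective of √2 ≈ 1.41421 reported below, it presumably maximized x + y subject to x² + y² ≤ 1. A sketch (the start values are assumptions):

using JuMP, Ipopt

m = Model(with_optimizer(Ipopt.Optimizer))
@variable(m, x, start = 0.5)
@variable(m, y, start = 0.5)
@objective(m, Max, x + y)
@NLconstraint(m, x^2 + y^2 <= 1.0)
JuMP.optimize!(m)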
******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
Ipopt is released as open source code under the Eclipse Public License (EPL).
For more information visit http://projects.coin-or.org/Ipopt
******************************************************************************
Number of Iterations…: 5
(scaled) (unscaled)
Objective…: -1.4142135740093271e+00 -1.4142135740093271e+00
Dual infeasibility…: 8.2280586788385790e-09 8.2280586788385790e-09
Constraint violation…: 0.0000000000000000e+00 0.0000000000000000e+00
Complementarity…: 2.5059035815063646e-09 2.5059035815063646e-09
Overall NLP error…: 8.2280586788385790e-09 8.2280586788385790e-09
In [22]: # solve
# min (1-x)^2 + 100(y-x^2)^2
# st x + y >= 10
using JuMP, Ipopt
m = Model(with_optimizer(Ipopt.Optimizer)) # settings for the solver
@variable(m, x, start = 0.0)
@variable(m, y, start = 0.0)
@NLobjective(m, Min, (1 - x)^2 + 100(y - x^2)^2) # reconstructed; the original line is missing
JuMP.optimize!(m)
println("x = ", value(x), " y = ", value(y))
Number of Iterations…: 14
(scaled) (unscaled)
Objective…: 1.3288608467480825e-28 1.3288608467480825e-28
Dual infeasibility…: 2.0183854587685121e-13 2.0183854587685121e-13
Constraint violation…: 0.0000000000000000e+00 0.0000000000000000e+00
Complementarity…: 0.0000000000000000e+00 0.0000000000000000e+00
Overall NLP error…: 2.0183854587685121e-13 2.0183854587685121e-13
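The cell that produced the next solve is missing; given the comment in the code above, it presumably added the constraint and re-solved (a sketch):

@constraint(m, x + y >= 10.0)
JuMP.optimize!(m)
println("x = ", value(x), " y = ", value(y))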
Number of Iterations…: 9
(scaled) (unscaled)
Objective…: 2.8946075504894599e+00 2.8946075504894599e+00
Dual infeasibility…: 5.9130478291535837e-13 5.9130478291535837e-13
Constraint violation…: 0.0000000000000000e+00 0.0000000000000000e+00
Complementarity…: 0.0000000000000000e+00 0.0000000000000000e+00
Overall NLP error…: 5.9130478291535837e-13 5.9130478291535837e-13
10.4.3 BlackBoxOptim.jl
function rosenbrock2d(x)
return (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
end
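The optimizer call itself is missing; following the BlackBoxOptim.jl documentation, it is presumably

using BlackBoxOptim

results = bboptimize(rosenbrock2d; SearchRange = (-5.0, 5.0), NumDimensions = 2)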
Fitness: 0.000000000

10.5 Systems of Equations and Least Squares
10.5.1 Roots.jl
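The input cell is missing; the output below matches the standard Roots.jl example of finding a root on a bracketing interval (a sketch):

using Roots
f(x) = sin(4 * (x - 1/4)) + x + x^20 - 1
fzero(f, 0, 1) # a root on the bracketing interval [0, 1]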
Out[24]: 0.40829350427936706
10.5.2 NLsolve.jl
The NLsolve.jl package provides functions to solve for multivariate systems of equations and
fixed points.
From the documentation, to solve for a system of equations without providing a Jacobian
f(x) = [(x[1]+3)*(x[2]^3-7)+18
sin(x[2]*exp(x[1])-1)] # returns an array
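The solver call is missing; following the NLsolve documentation, presumably

using NLsolve
results = nlsolve(f, [0.1; 1.2]) # [0.1; 1.2] is the documentation's initial condition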
In the above case, the algorithm used finite differences to calculate the Jacobian.
Alternatively, if f(x) is written generically, you can use auto-differentiation with a single
setting.
results = nlsolve(f, [0.1; 1.2], autodiff = :forward) # reconstructed; the call itself was lost
println("converged=$(NLsolve.converged(results)) at root=$(results.zero) in " *
        "$(results.iterations) iterations and $(results.f_calls) function calls")
Providing a function which operates inplace (i.e., modifies an argument) may help perfor-
mance for large systems of equations (and hurt it for small ones).
function f!(F, x) # modifies the first argument rather than returning an array
    F[1] = (x[1]+3)*(x[2]^3-7)+18
    F[2] = sin(x[2]*exp(x[1])-1)
end
results = nlsolve(f!, [0.1; 1.2], autodiff = :forward)
println("converged=$(NLsolve.converged(results)) at root=$(results.zero) in " *
        "$(results.iterations) iterations and $(results.f_calls) function calls")
10.6 LeastSquaresOptim.jl
Many optimization problems can be solved using linear or nonlinear least squares.
Let $x \in R^N$ and $F(x) : R^N \to R^M$ with $M \geq N$; then the nonlinear least squares problem is

$$
\min_x F(x)^T F(x)
$$

While $F(x)^T F(x) \in R$, and hence this problem could technically use any nonlinear optimizer,
it is useful to exploit the structure of the problem.
In particular, the Jacobian of 𝐹 (𝑥), can be used to approximate the Hessian of the objective.
As with most nonlinear optimization problems, the benefits will typically become evident only
when analytical or automatic differentiation is possible.
If $M = N$ and we know a root $F(x^*) = 0$ of the system of equations exists, then NLS is the
de facto method for solving large systems of equations.
An implementation of NLS is given in LeastSquaresOptim.jl.
From the documentation
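A sketch based on the package README (the exact cell is missing):

using LeastSquaresOptim
function rosenbrock(x)
    [1 - x[1], 100 * (x[2] - x[1]^2)]
end
LeastSquaresOptim.optimize(rosenbrock, zeros(2), Dogleg())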
Note: Because there is a name clash between Optim.jl and this package, to use both we
need to qualify the use of the optimize function (i.e. LeastSquaresOptim.optimize).
Here, by default it will use AD with ForwardDiff.jl to calculate the Jacobian, but you
could also provide your own calculation of the Jacobian (analytical or using finite differences)
and/or calculate the function inplace.
10.8 Exercises
10.8.1 Exercise 1
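The statement and type definition did not survive extraction; from the surrounding text, the type was presumably

import Base: + # operators to extend for DualNumber

struct DualNumber{T} <: Real
    val::T # the value
    ϵ::T   # the derivative coefficient
end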
Here we have made it a subtype of Real so that it can pass through functions expecting Re-
als.
We can add on a variety of chain rule definitions by importing in the appropriate functions
and adding DualNumber versions. For example
+(x::DualNumber, y::DualNumber) = DualNumber(x.val + y.val, x.ϵ + y.ϵ) # dual addition
+(x::DualNumber, a::Number) = DualNumber(x.val + a, x.ϵ) # i.e. scalar addition, not dual
+(a::Number, x::DualNumber) = DualNumber(x.val + a, x.ϵ) # i.e. scalar addition, not dual
With that, we can seed a dual number and find simple derivatives,
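A sketch using only the addition rules above (names are illustrative):

x = DualNumber(2.0, 1.0) # seed: value 2.0 and ∂x/∂x = 1
f = x + 3.0              # the scalar-addition rule propagates the derivative
f.val, f.ϵ               # (5.0, 1.0), i.e. f(x) = x + 3 with f'(x) = 1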
2. Come up with some examples of univariate and multivariate functions combining those
operations and use your AD implementation to find the derivatives.
Chapter 11

Julia Tools and Editors
11.1 Contents
Hitting ; brings you into shell mode, which lets you run bash commands (PowerShell on
Windows)
In [1]: ; pwd
/home/ubuntu/repos/lecture-source-jl/_build/jupyterpdf/executed/more_julia
In [2]: x = 2
Out[2]: 2
In [3]: ; echo $x
In [4]: ] ?
Synopsis
Commands
On some operating systems (such as OSX) REPL pasting may not work for package mode,
and you will need to access it in the standard way (i.e., hit ] first and then run your com-
mands).
Note that objects must be loaded for Julia to return their documentation, e.g.

? @test

will fail until the relevant package is loaded, after which

using Test
? @test

will succeed.
11.4 Atom
As discussed previously, eventually you will want to use a fully fledged text editor.
The most feature-rich one for Julia development is Atom, with the Juno package.
There are several reasons to use a text editor like Atom, including
• Git integration (more on this in the next lecture).
• Painless inspection of variables and data.
• Easily run code blocks, and drop in custom snippets of code.
• Integration with Julia documentation and plots.
Installing Atom
Installing Juno
3. Type uber-juno into the search box and then click Install on the package that ap-
pears.
5. When it asks you whether or not to use the standard layout, click yes.
At that point, you should see a built-in REPL at the bottom of the screen and be able to
start using Julia and Atom.
Troubleshooting
Sometimes, Juno will fail to find the Julia executable (say, if it’s installed somewhere non-
standard, or you have multiple).
To do this:

1. Use Ctrl-, to get the Settings pane, and select the Packages tab.

2. Type in julia-client and choose Settings.

3. Find the Julia Path, and fill it in with the location of the Julia binary.

• To find the binary, you could run Sys.BINDIR in the REPL, then append an additional /julia to the end of that path.

• e.g. C:\Users\YOURUSERNAME\AppData\Local\Julia-1.0.1\bin\julia.exe on Windows, or /Applications/Julia-1.0.app/Contents/Resources/julia/bin/julia on OSX.
See the setup instructions for Juno if you have further issues.
If you upgrade Atom and it breaks Juno, run the following in a terminal.
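The commands themselves are missing; the standard fix (an assumption, based on Juno's troubleshooting documentation) reinstalls the Juno packages:

apm uninstall ink julia-client
apm install ink julia-client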
If you aren’t able to install apm in your PATH, you can do the above by running the follow-
ing in PowerShell:
cd $ENV:LOCALAPPDATA/atom/bin
Upgrading Julia
To get a new release working with Jupyter, run (in the new version’s REPL)
] add IJulia
] build IJulia
If you follow the instructions, you should see something like this when you open a new file.
If you don’t, simply go to the command palette and type “Julia standard layout”
The bottom pane is a standard REPL, which supports the different modes above.
The “workspace” pane is a snapshot of currently-defined objects.
For example, if we define an object in the REPL
In [5]: x = 2
Out[5]: 2
The ans variable simply captures the result of the last computation.
The Plots pane captures Julia plots output (the code is as follows)
using Plots
gr(fmt = :png);
data = rand(10, 10)
h = heatmap(data)
Note: The plots feature is not perfectly reliable across all plotting backends, see the Basic
Usage page.
11.5 Package Environments
Julia’s package manager lets you set up Python-style “virtualenvs,” or subsets of packages
that draw from an underlying pool of assets on the machine.
This way, you can work with (and specify) the dependencies (i.e., required packages) for one
project without worrying about impacts on other projects.
• An environment is a set of packages specified by a Project.toml (and optionally, a
Manifest.toml).
• A registry is a git repository corresponding to a list of (typically) registered pack-
ages, from which Julia can pull (for more on git repositories, see version control).
• A depot is a directory, like ~/.julia, which contains assets (compile caches, reg-
istries, package source directories, etc.).
Essentially, an environment is a dependency tree for a project, or a “frame of mind” for Ju-
lia’s package manager.
• We can see the default (v1.1) environment as such
In [6]: ] st
Status `~/repos/lecture-source-
jl/_build/jupyterpdf/executed/more_julia/Project.toml`
[2169fc97] AlgebraicMultigrid v0.2.2
[28f2ccd6] ApproxFun v0.11.13
[7d9fca2a] Arpack v0.4.0
[aae01518] BandedMatrices v0.15.7
[6e4b80f9] BenchmarkTools v0.5.0
[a134a8b2] BlackBoxOptim v0.5.0
[ffab5731] BlockBandedMatrices v0.8.4
[324d7699] CategoricalArrays v0.8.0
[34da2185] Compat v2.2.0
[a93c6f00] DataFrames v0.21.0
[1313f7d8] DataFramesMeta v0.5.1
[39dd38d3] Dierckx v0.4.1
[9fdde737] DiffEqOperators v4.10.0
[31c24e10] Distributions v0.23.2
[2fe49d83] Expectations v1.1.1
[a1e7a1ef] Expokit v0.2.0
[d4d017d3] ExponentialUtilities v1.6.0
[442a2c76] FastGaussQuadrature v0.4.2
[1a297f60] FillArrays v0.8.9
[9d5cd8c9] FixedEffectModels v0.10.7
[c8885935] FixedEffects v0.7.3
[587475ba] Flux v0.10.4
[f6369f11] ForwardDiff v0.10.10
[38e38edf] GLM v1.3.9
[28b8d3ca] GR v0.49.1
[40713840] IncompleteLU v0.1.1
[43edad99] InstantiateFromURL v0.5.0
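• We can create a new environment (this cell is missing; presumably)

] generate ExampleEnvironment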
• And go to it
In [8]: ; cd ExampleEnvironment
/home/ubuntu/repos/lecture-source-
jl/_build/jupyterpdf/executed/more_julia/ExampleEnvironment
In [9]: ] activate .
Updating git-repo
`https://github.com/JuliaRegistries/General.git`
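The package-add cell is missing; given the Project.toml shown below, it was presumably

] add Expectations Parameters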
[2a0f44e3] + Base64
[ade2ca70] + Dates
[8bb1440f] + DelimitedFiles
[8ba89e20] + Distributed
[b77e0a4c] + InteractiveUtils
[76f85450] + LibGit2
[8f399da3] + Libdl
[37e2e46d] + LinearAlgebra
[56ddb016] + Logging
[d6f4376e] + Markdown
[a63ad114] + Mmap
[44cfe95a] + Pkg
[de0858da] + Printf
[3fa0cd96] + REPL
[9a3f8284] + Random
[ea8e919c] + SHA
[9e88b42a] + Serialization
[1a1011a3] + SharedArrays
[6462fe0b] + Sockets
[2f01184e] + SparseArrays
[10745b16] + Statistics
[4607b0f0] + SuiteSparse
[8dfed614] + Test
[cf7118a7] + UUIDs
[4ec0a83e] + Unicode
name = "ExampleEnvironment"
uuid = "14d3e79e-e2e5-11e8-28b9-19823016c34c"
authors = ["QuantEcon User <[email protected]>"]
version = "0.1.0"
[deps]
Expectations = "2fe49d83-0758-5602-8f54-1f90ad0d522b"
Parameters = "d96e819e-fc66-5662-9728-84c9c7592b0a"
We can also
In [11]: ] precompile
Precompiling project…
┌ Info: Precompiling Expectations [2fe49d83-0758-5602-8f54-1f90ad0d522b]
└ @ Base loading.jl:1260
┌ Warning: Package Expectations does not have LinearAlgebra in its dependencies:
│ - If you have Expectations checked out for development and have
│ added LinearAlgebra as a dependency but haven't updated your primary
│ environment's manifest file, try `Pkg.resolve()`.
│ - Otherwise you may need to report an issue with Expectations
└ Loading LinearAlgebra into Expectations from project dependency, future warnings�
↪for
Expectations are suppressed.
Note The TOML files are independent of the actual assets (which live in
~/.julia/packages, ~/.julia/dev, and ~/.julia/compiled).
You can think of the TOML as specifying demands for resources, which are supplied by the
~/.julia user depot.
• To return to the default Julia environment, simply
In [12]: ] activate
In [13]: ; cd ..
/home/ubuntu/repos/lecture-source-jl/_build/jupyterpdf/executed/more_julia
11.5.1 InstantiateFromURL
With this knowledge, we can explain the operation of the setup block
What this github_project function does is activate (and if necessary, download, instanti-
ate and precompile) a particular Julia environment.
Chapter 12

Git, GitHub, and Version Control
12.1 Contents
• Setup 12.2
• Basic Objects 12.3
• Individual Workflow 12.4
• Collaborative Work 12.5
• Collaboration via Pull Request 12.6
• Additional Resources and Troubleshooting 12.7
• Exercises 12.8
Co-authored with Arnav Sood
An essential part of modern software engineering is using version control.
We use version control because
• Not all iterations on a file are perfect, and you may want to revert changes.
• We want to be able to see who has changed what and how.
• We want a uniform version scheme to do this between people and machines.
• Concurrent editing on code is necessary for collaboration.
• Version control is an essential part of creating reproducible research.
In this lecture, we’ll discuss how to use Git and GitHub.
12.2 Setup

1. Make sure you create an account on GitHub.com.

• If you are a student, be sure to use the GitHub Student Developer Pack.
• Otherwise, see if you qualify for a free Non-Profit/Academic Plan.
• These come with things like unlimited private repositories, testing support, etc.
2. Install git.
12.3.1 Repositories
The fundamental object in GitHub is a repository (or “repo”) – this is the master directory
for a project.
One example of a repo is the QuantEcon Expectations.jl package.
On the machine, a repo is a normal directory, along with a subdirectory called .git which
contains the history of changes.
12.3.2 Commits
In addition, each GitHub repository typically comes with a few standard text files
• A .gitignore file, which lists files/extensions/directories that GitHub shouldn’t try
to track (e.g., LaTeX compilation byproducts).
• A README.md file, which is a Markdown file which GitHub puts on the repository web-
site.
• A LICENSE.txt file, which describes the terms under which the repository’s contents
are made available.
For an example of all three, see the Expectations.jl repo.
Of these, the README.md is the most important, as GitHub will display it as Markdown
when accessing the repository online.
In this section, we’ll describe how to use GitHub to version your own projects.
Much of this will carry over to the collaborative section.
In general, we will always want to create repos for new projects using the following dropdown
Now that we have the repository, we can start working with it.
For example, let’s say that we’ve amended the README.md (using our editor of choice), and
also added a new file economics.jl which we’re still working on.
Returning to GitHub Desktop, we should see something like
To select individual files for commit, we can use the check boxes to the left of each file.
Let’s say you select only the README to commit. Going to the history tab should show you
our change
The small “1↑” to the right of the text indicates we have one commit to upload.
As mentioned, one of the key features of GitHub is the ability to scan through history.
By clicking the “commits” tab on the repo front page, we see this page (as an example).
Clicking an individual commit gives us the difference view, (e.g., example commit).
Sometimes, however, we want to not only inspect what happened before, but reverse the com-
mit.
• If you haven’t made the commit yet, just right-click the file in the “changes” tab and hit
“discard changes” to reset the file to the last known commit.
• If you have made the commit but haven’t pushed to the server yet, go to the “history”
tab as above, right click the commit and click “revert this commit.” This will create the
inverse commit, shown below.
Generally, you want to work on the same project but across multiple machines (e.g., a home
laptop and a lab workstation).
The key is to push changes from one machine, and then to pull changes from the other ma-
chine.
Pushing can be done as above.
To pull, simply click pull under the “repository” dropdown at the top of the screen
GitHub’s website also comes with project management tools to coordinate work between peo-
ple.
The main one is an issue, which we can create from the issues tab.
You should see something like this
Any project management tool needs to figure out how to reconcile conflicting changes be-
tween people.
In GitHub, this event is called a “merge conflict,” and occurs whenever people make conflict-
ing changes to the same line of code.
Note that this means that two people touching the same file is OK, so long as the differences
are compatible.
A common use case is when we try to push changes to the server, but someone else has
pushed conflicting changes.
GitHub will give us the following window
• The warning symbol next to the file indicates the existence of a merge conflict.
• The viewer tries to show us the discrepancy (I changed the word repository to repo, but
someone else tried to change it to “repo” with quotes).
To fix the conflict, we can go into a text editor (such as Atom)
Let’s say we click the first “use me” (to indicate that my changes should win out), and then
save the file.
Returning to GitHub Desktop gives us a pre-formed commit to accept
Clicking “commit to master” will let us push and pull from the server as normal.
One of the defining features of GitHub is that it is the dominant platform for open source
code, which anyone can access and use.
However, while anyone can make a copy of the source code, not everyone has access to modify
the particular version stored on GitHub.
A maintainer (i.e. someone with “write” access to directly modify a repository) might con-
sider different contributions and “merge” the changes into the main repository if the changes
meet their criteria.
A pull request (“PR”) allows any outsiders to suggest changes to open source repositories.
A PR requests the project maintainer to merge (“pull”) changes you’ve worked on into their
repository.
There are a few different workflows for creating and handling PRs, which we’ll walk through
below.
Note: If the changes are for a Julia Package, you will need to follow a different workflow –
described in the testing lecture.
GitHub’s website provides an online editor for quick and dirty changes, such as fixing typos in
documentation.
To use it, open a file in GitHub and click the small pencil to the upper right
Here, we’re trying to add the QuantEcon link to the Julia project’s README file.
After making our changes, we can then describe and propose them for review by maintainers.
But what if we want to make more in-depth changes?
A common problem arises when we don’t have write access to (i.e., we can’t directly modify) the
repo in question.
In that case, click the “Fork” button that lives in the top-right of every repo’s main page
This will copy the repo into your own GitHub account.
For example, this repo is a fork of our original git setup.
Clone this fork to our desktop and work with it in exactly the same way as we would a repo
we own (as the fork is in your account, you now have write access).
That is, click the “clone” button on our fork
You’ll see a new repo with the same name but different URL in your GitHub Desktop repo
list, along with a special icon to indicate that it’s a fork
Commit some changes by selecting the files and writing a commit message
Below, for example, we’ve committed and pushed some changes to the fork that we want to
upstream into the main repo
We should make sure that these changes are on the server (which we can get to by going to
the fork and clicking “commits”)
Next, go to the pull requests menu and click “New Pull Request”.
You’ll see something like this
This gives us a quick overview of the commits we want to merge in, as well as the overall dif-
ferences.
Hit create and then click through the following form.
That is, creating a pull request is not like bundling up your changes and delivering them, but
rather like opening an ongoing connection between two repositories, that is only severed when
the PR is closed or merged.
As you become more familiar with GitHub, and work on larger projects, you will find yourself
making PRs even when it isn’t strictly required.
If you are a maintainer of the repo (e.g. you created it or are a collaborator) then you don’t
need to create a fork, but will rather work with a git branch.
Branches in git represent parallel development streams (i.e., sequences of commits) that the
PR is trying to merge.
First, load the repo in GitHub Desktop and use the branch dropdown
Click “New Branch” and choose an instructive name (make sure there are no spaces or special
characters).
This will “check out” a new branch with the same history as the old one (but new commits
will be added only to this branch).
We can see the active branch in the top dropdown
For example, let’s say we add some stuff to the Julia code file and commit it
To put this branch (with changes) on the server, we simply need to click “Publish Branch”.
Navigating to the repo page, we will see a suggestion about a new branch
One special case is when the repo in question is actually a Julia project or package.
We cover that (along with package workflow in general) in the testing lecture.
You may want to go beyond the scope of this tutorial when working with GitHub.
For example, perhaps you run into a bug, or you’re working with a setup that doesn’t have
GitHub Desktop installed.
Here are some resources to help
• Kate Hudson’s excellent git flight rules, which is a near-exhaustive list of situations you
could encounter, and command-line fixes.
• The GitHub Learning Lab, an interactive sandbox environment for git.
• The docs for forking on GitHub Desktop and the GitHub Website.
From here, you can get the latest files on the server by cd-ing into the directory and running
git pull.
When you pull from the server, it will never overwrite your modified files, so it is impossible
to lose local changes.
Instead, to do a hard reset of all files and overwrite any of your local changes, you can run
git reset --hard origin/master.
12.8 Exercises
12.8.1 Exercise 1a
Follow the instructions to create a new repository for one of your GitHub accounts. In this
repository
• Take the code from one of your previous assignments, such as Newton’s method in In-
troductory Examples (either as a .jl file or a Jupyter notebook).
• Put in a README.md with some text.
• Put in a .gitignore file, ignoring the Jupyter files .ipynb_checkpoints and the
project files, .projects.
12.8.2 Exercise 1b
Pair-up with another student who has done Exercise 1a and find out their GitHub ID, and
each do the following
• Add the GitHub ID as a collaborator on your repository.
12.8.3 Exercise 1c
Pairing up as in Exercise 1b, examine a merge conflict by editing the README.md
file of the repository you have both set up as collaborators.
Start by ensuring there are multiple lines in the file so that some changes may have conflicts,
and some may not.
• Clone the repository to your local desktops.
• Modify different lines of code in the file and both commit and push to the server (prior
to pulling from each other)–and see how it merges things “automatically”.
• Modify the same line of code in the file, and deal with the merge conflict.
12.8.4 Exercise 2a
Just using GitHub’s web interface, submit a Pull Request for a simple change of documenta-
tion to a public repository.
The easiest may be to submit a PR for a typo in the source repository for these notes, i.e.
https://github.com/QuantEcon/lecture-source-jl.
Note: The source for that repository is in .rst files, but you should be able to find spelling
mistakes/etc. without much effort.
12.8.5 Exercise 2b
Following the instructions for forking and cloning a public repository to your local desktop,
submit a Pull Request to a public repository.
Again, you could submit it for a typo in the source repository for these notes, i.e.
https://github.com/QuantEcon/lecture-source-jl, but you are also encouraged
to instead look for a small change that could help the documentation in another repository.
If you are ambitious, then go to the Exercise Solutions for one of the Exercises in these lec-
ture notes and submit a PR for your own modified version (if you think it is an improve-
ment!).
Chapter 13

Packages, Testing, and Continuous Integration
13.1 Contents
A complex system that works is invariably found to have evolved from a simple
system that worked. The inverse proposition also appears to be true: A complex
system designed from scratch never works and cannot be made to work. You have
to start over, beginning with a working simple system – Gall’s Law.
Travis CI
As we’ll see later, Travis is a service that automatically tests your project on the GitHub
server.
First, we need to make sure that your GitHub account is set up with Travis CI and Codecov.
As a reminder, make sure you signed up for the GitHub Student Developer Pack or Academic
Plan if eligible.
Navigate to the travis-ci.com website and click “sign up with GitHub” – supply your creden-
tials.
If you get stuck, see the Travis tutorial.
Codecov
Codecov is a service that tells you how comprehensive your tests are (i.e., how much of your
code is actually tested).
To sign up, visit the Codecov website, and click “sign up”
Next, click “add a repository” and enable private scope (this allows Codecov to service your
private projects).
The result should be
Note: Before these steps, make sure that you’ve either completed the version control lecture
or run.
Note: Throughout this lecture, important points and sequential workflow steps are listed as
bullets.
To set up a project on Julia:
• Load the PkgTemplates package.
using PkgTemplates
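• Create a template (this cell is missing; a sketch using the TravisCI and Codecov plugins discussed in this lecture)

ourTemplate = Template(; user = "quanteconuser", plugins = [TravisCI(), Codecov()])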
Note: Make sure you replace the quanteconuser with your GitHub ID.
• Create a specific project based off this template
generate("ExamplePackage.jl", ourTemplate)
If we navigate to the package directory, we should see something like the following.
In particular
• The repo you create should have the same name as the project we added.
• We should leave the boxes unchecked for the README.md, LICENSE, and
.gitignore, since these are handled by PkgTemplates.
Then,
• Drag and drop your folder from your ~/.julia/dev directory to GitHub Desktop.
• Click the “publish branch” button to upload your files to GitHub.
If you navigate to your git repo (ours is here), you should see something like
Note: Be sure that you don’t separately clone the repo you just added to another location
(i.e., to your desktop).
A key note is that you have some set of files on your local machine (here in
~/.julia/dev/ExamplePackage.jl) and git is plugged into those files.
For convenience, you might want to create a shortcut to that location somewhere accessible.
] activate
to get into the main Julia environment (more on environments in the second half of this lec-
ture).
• And run
] dev .
] st
For more on the package mode, see the tools and editors lecture.
using ExamplePackage
pathof(ExamplePackage) # returns path to src/ExamplePackage.jl
• The src directory should have only one file (src/ExamplePackage.jl), which reads

module ExamplePackage

greet() = print("Hello World!") # the stub generated by PkgTemplates

end # module
• Likewise, the test directory should have only one file (runtests.jl), which reads
using ExamplePackage
using Test
Environments
As before, the .toml files define an environment for our project, or a set of files which repre-
sent the dependency information.
The actual files are written in the TOML language, which is a lightweight format to specify
configuration options.
This information is the name of every package we depend on, along with the exact versions of
those packages.
This information (in practice, the result of package operations we execute) will be reflected in
our ExamplePackage.jl directory’s TOML, once that environment is activated (selected).
This allows us to share the project with others, who can exactly reproduce the state used to
build and test it.
See the Pkg3 docs for more information.
Pkg Operations
] activate ExamplePackage
This tells Julia to write the results of package operations to ExamplePackage’s TOML,
and use the versions of packages specified there.
Note that the base environment isn’t special, except that it’s what’s loaded by a freshly-
started REPL or Jupyter notebook.
• Add a package
] add Expectations
] activate
] activate ExamplePackage
] add Distributions
and edit the source (paste this into the file itself) to read as follows

module ExamplePackage
greet() = print("Hello World!")
using Expectations, Distributions
# a sketch: foo's original body was lost; e.g. an expectation of sin under a Normal
foo(μ = 1.0, σ = 2.0) = expectation(Normal(μ, σ))(x -> sin(x))
export foo
end # module
] activate
using ExamplePackage
ExamplePackage.greet()
Note: You may need to quit your REPL and load the package again.
From here, we can use our package’s functions as we would functions from other packages.
This lets us produce neat output documents, without pasting the whole codebase.
We can also run package operations inside the notebook
name = "ExamplePackage"
uuid = "f85830d0-e1f0-11e8-2fad-8762162ab251"
authors = ["QuantEcon User <[email protected]>"]
version = "0.1.0"
[deps]
Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
Expectations = "2fe49d83-0758-5602-8f54-1f90ad0d522b"
Parameters = "d96e819e-fc66-5662-9728-84c9c7592b0a"
[extras]
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
[targets]
test = ["Test"]
] dev https://github.com/quanteconuser/ExamplePackage.jl.git
In [2]: DEPOT_PATH[1]
Out[2]: "/home/ubuntu/.julia"
] activate ExamplePackage
] instantiate
Julia provides testing features through a built-in package called Test, which we get by
using Test.
The basic object is the macro @test
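For example (a minimal sketch):

using Test
@test 1 == 1 # Test Passed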
In [4]: @test_broken 1 == 2
This way, we still have access to information about the test, instead of just deleting it or
commenting it out.
There are other test macros, that check for things like error handling and type-stability.
Advanced users can check the Julia docs.
13.5.2 Example
Let’s add some unit tests for the foo() function we defined earlier.
Our tests/runtests.jl file should look like this.
As before, this should be pasted into the file directly
using ExamplePackage
using Test

# the original @test lines were lost; e.g. a minimal check of foo at its defaults
@test foo() isa Real
And run it by typing ] test into an activated REPL (i.e., a REPL where you’ve run ]
activate ExamplePackage).
There are a few different ways to run the tests for your package.
• Run the actual runtests.jl, say by hitting shift-enter on it in Atom.
• From a fresh (v1.1) REPL, run ] test ExamplePackage.
• From an activated (ExamplePackage) REPL, simply run ] test (recall that you can
activate with ] activate ExamplePackage).
13.6.1 Setup
By default, Travis should have access to all your repositories and deploy automatically.
This includes private repos if you’re on a student developer pack or an academic plan (Travis
detects this automatically).
To change this, go to “settings” under your GitHub profile
Click “Applications,” then “Travis CI,” then “Configure,” and choose the repos you want to
be tracked.
By default, Travis will compile and test your project (i.e., “build” it) for new commits and
PRs for every tracked repo with a .travis.yml file.
We can see ours by opening it in Atom
# Documentation: http://docs.travis-ci.com/user/languages/julia/
language: julia
os:
- linux
- osx
julia:
- 1.1
- nightly
matrix:
allow_failures:
- julia: nightly
fast_finish: true
notifications:
email: false
after_success:
- julia -e 'using Pkg; Pkg.add("Coverage"); using Coverage; Codecov.submit(process_folder())'
This is telling Travis to build the project in Julia, on OSX and Linux, using Julia v1.1 and
the latest development build (“nightly”).
It also says that if the nightly version doesn’t work, that shouldn’t register as a failure.
Note You won’t need OSX unless you’re building something Mac-specific, like iOS or Swift.
You can delete those lines to speed up the build, likewise for the nightly Julia version.
As above, builds are triggered whenever we push changes or open a pull request.
For example, if we push our changes to the server and then click the Travis badge (the one
which says “build”) on the README, we should see something like
Note that you may need to wait a bit and/or refresh your browser.
This gives us an overview of all the builds running for that commit.
To inspect a build more closely (say, if it fails), we can click on it and expand the log options
Note that the build times here aren’t informative, because we can’t generally control the
hardware to which our job is allocated.
We can also cancel specific jobs, either from their specific pages or by clicking the grey “x”
button on the dashboard.
Lastly, we can trigger builds manually (without a new commit or PR) from the Travis
overview
To commit without triggering a build, simply add “[ci skip]” somewhere inside the commit
message.
One key feature of Travis is the ability to see at-a-glance whether PRs pass tests before merg-
ing them.
This happens automatically when Travis is enabled on a repository.
For an example of this feature, see this PR in the Games.jl repository.
Beyond the success or failure of our test suite, we also want to know how much of our code
the tests cover.
The tool we use to do this is called Codecov.
13.7.1 Setup
You’ll find that Codecov is automatically enabled for public repos with Travis.
For private ones, you’ll need to first get an access token.
Add private scope in the Codecov website, just like we did for Travis.
Navigate to the repo settings page (i.e., https://codecov.io/gh/quanteconuser/ExamplePackage.
for our repo) and copy the token.
Next, go to your Travis settings and add an environment variable as below
Click the Codecov badge to see the build page for your project.
This shows us that our tests cover 50% of our functions in src//.
Note: To get a more detailed view, we can click the src// and the resultant filename.
Note: Codecov may take a few minutes to run for the first time
This shows us precisely which methods (and parts of methods) are untested.

13.8 Pull Requests to External Julia Projects
As mentioned in version control, sometimes we’ll want to work on external repos that are also
Julia projects.
• ] dev the git URL (or package name, if the project is a registered Julia package),
which will both clone the git repo to ~/.julia/dev and sync it with the Julia pack-
age manager.
For example, running
] dev Expectations
In [5]: DEPOT_PATH[1]
Out[5]: "/home/ubuntu/.julia"
The ] dev command will also add the target to the package manager, so that whenever we
run using Expectations, Julia will load our cloned copy from that location
using Expectations
pathof(Expectations) # points to our git clone
• Edit the settings in GitHub Desktop (from the “Repository” dropdown) to reflect the
new URL.
Here, we’d change the highlighted text to read quanteconuser, or whatever our
GitHub ID is.
• If you make some changes in a text editor and return to GitHub Desktop, you’ll see
something like.
Note: As before, we’re editing the files directly in ~/.julia/dev, as opposed to cloning
the repo again.
Committing the changes (after choosing for each file whether it’s to be committed) and then
pushing (e.g., hitting “push” under the “Repository” dropdown) will add the committed changes to your account.
To confirm this, we can check the history on our account here; for more on working with git
repositories, see the version control lecture.
The green check mark indicates that Travis tests passed for this commit.
• Clicking “new pull request” from the pull requests tab will show us a snapshot of the
changes, and let us create a pull request for project maintainers to review and approve.
For more on PRs, see the relevant section of the version control lecture.
For more on forking, see the docs on GitHub Desktop and the GitHub Website.
If you have write access to the repo, we can skip the preceding steps about forking and
changing the URL.
You can use ] dev on a package name or the URL of the package
] dev Expectations
using Expectations
pathof(Expectations) # points to our git clone
From here, we can edit this package just like we created it ourselves and use GitHub Desk-
top to track versions of our package files (say, after ] up, or editing source code, ] add
Package, etc.).
To “un-dev” a Julia package (say, if we want to use our old Expectations.jl), you can
simply run
] free Expectations
] rm Expectations
13.9 Benchmarking
Another goal of testing is to make sure that code doesn’t slow down significantly from one
version to the next.
We can do this using tools provided by the BenchmarkTools.jl package.
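For example (a minimal sketch; the $ interpolates the variable so that global-variable access is not part of the timing):

using BenchmarkTools
x = rand(1000)
@btime sum($x)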
See the need for speed lecture for more details.
• The JuliaCI organization provides more Julia utilities for continuous integration and
testing.
13.11 Review
To review the workflow for creating, versioning, and testing a new project end-to-end.
2. Add that package to the Julia package manager, by opening a Julia REPL in the
~/.julia/dev/ExamplePackage.jl, making sure the active environment is the
default one (v1.1), and hitting ] dev ..
4. Create an empty repository with the same name on the GitHub server.
7. Make changes, test, iterate on it, etc. As a rule, functions should live in the src/
directory once they’re stable, and you should export them from that file with export
func1, func2. This will export all methods of func1, func2, etc.
8. Commit them in GitHub Desktop as you go (i.e., you can and should use version con-
trol to track intermediate states).
9. Push to the server, and see the Travis and Codecov results (note that these may take a
few minutes the first time).
13.12 Exercises
13.12.1 Exercise 1
Following the instructions for a new project, create a new package on your github account
called NewtonsMethod.jl.
In this package, you should create a simple package to do Newton’s Method using the code
you did in the Newton’s method exercise in Introductory Examples.
In particular, within your package you should have two functions
• newtonroot(f, f′; x₀, tol = 1E-7, maxiter = 1000)
• newtonroot(f; x₀, tol = 1E-7, maxiter = 1000)
Where the second function uses Automatic Differentiation to call the first.
The package should include
• implementations of those functions in the /src directory
• comprehensive set of tests
• project and manifest files to replicate your development environment
• automated running of the tests with Travis CI in GitHub
For the tests, you should have at the very minimum
• a way to handle non-convergence (e.g. return nothing, as discussed in error handling)
• several @test for the root of a known function, given the f and analytical f' deriva-
tives
• tests of those roots using the automatic differentiation version of the function
• test of finding those roots with a BigFloat and not just a Float64
• test of non-convergence for a function without a root (e.g. $f(x) = 2 + x^2$)
• test to ensure that maxiter is working (e.g. what happens if you call maxiter = 5)
• test to ensure that tol is working
And anything else you can think of. You should be able to run ] test for the project to
check that the test-suite is running, and then ensure that it is running automatically on
Travis CI.
Push a commit to the repository which breaks one of the tests and see what the Travis CI
reports after running the build.
13.12.2 Exercise 2
Watch the YouTube video Developing Julia Packages from Chris Rackauckas. The demonstration
goes through many of the same concepts as this lecture, but with more background in
test-driven development and more details for open-source projects.
Chapter 14

The Need for Speed
14.1 Contents
• Overview 14.2
• Understanding Multiple Dispatch in Julia 14.3
• Foundations 14.4
• JIT Compilation in Julia 14.5
• Fast and Slow Julia Code 14.6
• Further Comments 14.7
14.2 Overview
Computer scientists often classify programming languages according to the following two cat-
egories.
High level languages aim to maximize productivity by
• being easy to read, write and debug
• automating standard tasks (e.g., memory management)
• being interactive, etc.
Low level languages aim for speed and control, which they achieve by
• being closer to the metal (direct access to CPU, memory, etc.)
• requiring a relatively large amount of information from the user (e.g., all data types
must be specified)
Traditionally we understand this as a trade off
• high productivity or high performance
• optimized for humans or optimized for machines
One of the great strengths of Julia is that it pushes out the curve, achieving both high pro-
ductivity and high performance with relatively little fuss.
The word “relatively” is important here, however…
In simple programs, excellent performance is often trivial to achieve.
For longer, more sophisticated programs, you need to be aware of potential stumbling blocks.
This lecture covers the key points.
14.2.1 Requirements
You should read our earlier lecture on types, methods and multiple dispatch before this one.
14.2.2 Setup

14.3 Understanding Multiple Dispatch in Julia

This section provides more background on how methods, functions, and types are connected.
The precise data type is important, for reasons of both efficiency and mathematical correct-
ness.
For example consider 1 + 1 vs. 1.0 + 1.0 or [1 0] + [0 1].
On a CPU, integer and floating point addition are different things, using a different set of in-
structions.
Julia handles this problem by storing multiple, specialized versions of functions like addition,
one for each data type or set of data types.
These individual specialized versions are called methods.
When an operation like addition is requested, the Julia compiler inspects the type of data to
be acted on and hands it out to the appropriate method.
This process is called multiple dispatch.
Like all “infix” operators, 1 + 1 has the alternative syntax +(1, 1)
In [3]: +(1, 1)
Out[3]: 2
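The cell inspecting the floating point method is missing; presumably something like

x, y = 1.0, 1.0
@which +(x, y) # +(x::Float64, y::Float64) in Base at float.jl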
We see that the operation is sent to the + method that specializes in adding floating point
numbers.
Here’s the integer case
In [5]: x, y = 1, 1
@which +(x, y)
Out[5]: +(x::T, y::T) where T<:Union{Int128, Int16, Int32, Int64, Int8, UInt128, UInt16, UInt32, UInt64, UInt8}
This output says that the call has been dispatched to the + method responsible for handling
integer values.
(We’ll learn more about the details of this syntax below)
Here’s another example, with complex numbers
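Presumably something like (the cell is missing)

x, y = 1.0 + 1.0im, 1.0 + 1.0im
@which +(x, y) # +(z::Complex, w::Complex) in Base at complex.jl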
Again, the call has been dispatched to a + method specifically designed for handling the
given data type.
Adding Methods
We can now be a little bit clearer about what happens when you call a function on given
types.
Suppose we execute the function call f(a, b) where a and b are of concrete types S and T
respectively.
The Julia interpreter first queries the types of a and b to obtain the tuple (S, T).
It then parses the list of methods belonging to f, searching for a match.
If it finds a method matching (S, T) it calls that method.
If not, it looks to see whether the pair (S, T) matches any method defined for immediate
parent types.
For example, if S is Float64 and T is ComplexF32 then the immediate parents are
AbstractFloat and Number respectively
In [8]: supertype(Float64)
Out[8]: AbstractFloat
In [9]: supertype(ComplexF32)
Out[9]: Number
Hence the interpreter looks next for a method of the form f(x::AbstractFloat,
y::Number).
If the interpreter can’t find a match in immediate parents (supertypes) it proceeds up the
tree, looking at the parents of the last type it checked at each iteration.
• If it eventually finds a matching method, it invokes that method.
• If not, we get an error.
This is the process that leads to the following error (since we only added the + for adding
Integer and String above)
Stacktrace:
Because the dispatch procedure starts from concrete types and works upwards, dispatch al-
ways invokes the most specific method available.
For example, if you have methods for function f that handle

1. (Float64, Int64) pairs

2. (Number, Number) pairs

and you call f with f(0.5, 1), then the first method will be invoked.
This makes sense because (hopefully) the first method is optimized for exactly this kind of
data.
The second method is probably more of a “catch all” method that handles other data in a
less optimal way.
Here’s another simple example, involving a user-defined function
function q(x::Number)
println("Number method invoked")
end
function q(x::Integer)
println("Integer method invoked")
end
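The discussion below also refers to a method operating on Any; its definition is missing, but was presumably something like

function q(x) # x::Any, the catch-all
    println("Default method invoked")
end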
Let’s now run this and see how it relates to our discussion of method dispatch above
In [12]: q(3)
In [13]: q(3.0)
In [14]: q("foo")
Since typeof(3) <: Int64 <: Integer <: Number, the call q(3) proceeds up the
tree to Integer and invokes q(x::Integer).
On the other hand, 3.0 is a Float64, which is not a subtype of Integer.
Hence the call q(3.0) continues up to q(x::Number).
Finally, q("foo") is handled by the function operating on Any, since String is not a sub-
type of Number or Integer.
For the most part, time spent “optimizing” Julia code to run faster is about ensuring the
compiler can correctly deduce types for all functions.
The macro @code_warntype gives us a hint
In [15]: x = [1, 2, 3]
f(x) = 2x
@code_warntype f(x)
Variables
#self#::Core.Compiler.Const(f, false)
x::Array{Int64,1}
Body::Array{Int64,1}
1 ─ %1 = (2 * x)::Array{Int64,1}
└── return %1
The @code_warntype macro compiles f(x) using the type of x as an example – i.e., the
[1, 2, 3] is used as a prototype for analyzing the compilation, rather than simply calcu-
lating the value.
Here, the Body::Array{Int64,1} tells us the type of the return value of the function,
when called with types like [1, 2, 3], is always a vector of integers.
In contrast, consider a function potentially returning nothing, as in this lecture
Variables
#self#::Core.Compiler.Const(f, false)
x::Int64
Body::Union{Nothing, Int64}
1 ─ %1 = (x > 0.0)::Bool
└── goto #3 if not %1
2 ─ return x
3 ─ return Main.nothing
This states that, when the function is called with an integer (like 1), the compiler determines
the return type could be one of two different types, Body::Union{Nothing, Int64}.
A final example is a variation on the above, which returns the maximum of x and 0.
Variables
#self#::Core.Compiler.Const(f, false)
x::Int64
Body::Union{Float64, Int64}
1 ─ %1 = (x > 0.0)::Bool
└── goto #3 if not %1
2 ─ return x
3 ─ return 0.0
Which shows that, when called with an integer, the type could be that integer or the floating
point 0.0.
On the other hand, if we change the function to return 0 if x <= 0, it is type-unstable with
floating point.
Variables
#self#::Core.Compiler.Const(f, false)
x::Float64
Body::Union{Float64, Int64}
1 ─ %1 = (x > 0.0)::Bool
└── goto #3 if not %1
2 ─ return x
3 ─ return 0
The solution is to use the zero(x) function, which returns the additive identity element of
the type of x.
zero(2.3) = 0.0
zero(4) = 0
zero(2.0 + 3im) = 0.0 + 0.0im
Variables
#self#::Core.Compiler.Const(f, false)
x::Float64
Body::Float64
1 ─ %1 = (x > 0.0)::Bool
└── goto #3 if not %1
2 ─ return x
3 ─ %4 = Main.zero(x)::Core.Compiler.Const(0.0, false)
└── return %4
14.4 Foundations

Let’s think about how quickly code runs, taking as given

• hardware configuration

• algorithm (i.e., set of instructions to be executed)
We’ll start by discussing the kinds of instructions that machines understand.
pushq %rbp
movq %rsp, %rbp
addq %rdi, %rdi
leaq (%rdi,%rsi,8), %rax
popq %rbp
retq
nopl (%rax)
Note that this code is specific to one particular piece of hardware that we use — different ma-
chines require different machine code.
If you ever feel tempted to start rewriting your economic model in assembly, please restrain
yourself.
It’s far more sensible to give these instructions in a language like Julia, where they can be
easily written and understood.
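For instance, the assembly above is consistent with a function computing 2a + 8b for integers a and b; in Julia this is just (a hypothetical reconstruction):

f(a, b) = 2a + 8b   # the addq doubles a; the leaq adds 8b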
The same computation is just as easy to express in Python, or even in C.
In any of these languages we end up with code that is much easier for humans to write, read,
share and debug.
We leave it up to the machine itself to turn our code into machine code.
How exactly does this happen?
The process for turning high level code into machine code differs across languages.
Let’s look at some of the options and how they differ from one another.
Compiled Languages
Traditional compiled languages like Fortran, C and C++ are a reasonable option for writing
fast code.
Indeed, the standard benchmark for performance is still well-written C or Fortran.
These languages compile down to efficient machine code because users are forced to provide a
lot of detail on data types and how the code will execute.
The compiler therefore has ample information for building the corresponding machine code
ahead of time (AOT) in a way that
• organizes the data optimally in memory and
• implements efficient operations as required for the task in hand
At the same time, the syntax and semantics of C and Fortran are verbose and unwieldy when
compared to something like Julia.
Moreover, these low level languages lack the interactivity that’s so crucial for scientific work.
Interpreted Languages
Interpreted languages like Python generate machine code “on the fly”, during program execu-
tion.
This allows them to be flexible and interactive.
Moreover, programmers can leave many tedious details to the runtime environment, such as
• specifying variable types
• memory allocation/deallocation, etc.
But all this convenience and flexibility comes at a cost: it’s hard to turn instructions written
in these languages into efficient machine code.
For example, consider what happens when Python adds a long list of numbers together.
Typically the runtime environment has to check the type of these objects one by one before it
figures out how to add them.
This involves substantial overheads.
There are also significant overheads associated with accessing the data values themselves,
which might not be stored contiguously in memory.
The resulting machine code is often complex and slow.
Just-in-time compilation
Just-in-time (JIT) compilation is an alternative approach that marries some of the advan-
tages of AOT compilation and interpreted languages.
The basic idea is that functions for specific tasks are compiled as requested.
As long as the compiler has enough information about what the function does, it can in prin-
ciple generate efficient machine code.
In some instances, all the information is supplied by the programmer.
In other cases, the compiler will attempt to infer missing information on the fly based on us-
age.
Through this approach, computing environments built around JIT compilers aim to
• provide all the benefits of high level languages discussed above and, at the same time,
• produce efficient instruction sets when functions are compiled down to machine code
14.5.1 An Example
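The example function itself was lost in extraction; a sketch consistent with the line references (In[21]:2 and In[21]:3) and the arithmetic in the machine code below is:

function f(a, b)
    y = (a + 8b)^2   # In[21]:2 — the leaq/imulq instructions
    return 7y        # In[21]:3 — multiplication by the literal 7
end

f(1, 2)              # integer arguments trigger an integer method
@code_native f(1, 2)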
.text
; ┌ @ In[21]:2 within `f'
; │┌ @ In[21]:2 within `+'
leaq (%rdi,%rsi,8), %rcx
; │└
; │┌ @ intfuncs.jl:261 within `literal_pow'
; ││┌ @ int.jl:54 within `*'
imulq %rcx, %rcx
; │└└
; │ @ In[21]:3 within `f'
; │┌ @ int.jl:54 within `*'
leaq (,%rcx,8), %rax
subq %rcx, %rax
; │└
retq
nopw %cs:(%rax,%rax)
nop
; └
If we now call f again, but this time with floating point arguments, the JIT compiler will
once more infer types for the other variables inside the function.
• e.g., y will also be a float
It then compiles a new version to handle this type of argument.
.text
; ┌ @ In[21]:2 within `f'
movabsq $140082528229400, %rax # imm = 0x7F6781559818
; │┌ @ promotion.jl:312 within `*' @ float.jl:405
vmulsd (%rax), %xmm1, %xmm1
; │└
; │┌ @ float.jl:401 within `+'
vaddsd %xmm0, %xmm1, %xmm0
; │└
; │┌ @ intfuncs.jl:261 within `literal_pow'
; ││┌ @ float.jl:405 within `*'
vmulsd %xmm0, %xmm0, %xmm0
movabsq $140082528229408, %rax # imm = 0x7F6781559820
; │└└
; │ @ In[21]:3 within `f'
; │┌ @ promotion.jl:312 within `*' @ float.jl:405
vmulsd (%rax), %xmm0, %xmm0
; │└
retq
nopw %cs:(%rax,%rax)
nop
; └
Subsequent calls using either floats or integers are now routed to the appropriate compiled
code.
To summarize what we’ve learned so far, Julia provides a platform for generating highly effi-
cient machine code with relatively little effort by combining
1. JIT compilation
2. Optional type declarations and type inference to pin down the types of variables and
hence compile efficient code
3. Multiple dispatch to facilitate specialization and optimization of compiled code for dif-
ferent data types
14.6.1 BenchmarkTools
Global variables are names assigned to values outside of any function or type definition.
They are convenient, and novice programmers typically use them with abandon.
But global variables are also dangerous, especially in medium to large size programs, since
• they can affect what happens in any part of your program
• they can be changed by any function
This makes it much harder to be certain about what some small part of a given piece of code actually does.
Here’s a useful discussion on the topic.
When it comes to JIT compilation, global variables create further problems.
The reason is that the compiler can never be sure of the type of the global variable, or even
that the type will stay constant while a given function runs.
To illustrate, consider this code, where b is global
In [24]: b = 1.0
         function g(a)
             global b
             for i in 1:1_000_000
                 tmp = a + b
             end
         end
The code executes relatively slowly and uses a huge amount of memory.
@btime g(1.0)
If you look at the corresponding machine code you will see that it’s a mess.
.text
; ┌ @ In[24]:3 within `g'
pushq %rbp
movq %rsp, %rbp
pushq %r15
pushq %r14
pushq %r13
pushq %r12
pushq %rbx
andq $-32, %rsp
subq $128, %rsp
vmovsd %xmm0, 24(%rsp)
vxorps %xmm0, %xmm0, %xmm0
vmovaps %ymm0, 32(%rsp)
movq %fs:0, %rax
movq $8, 32(%rsp)
movq -15712(%rax), %rcx
movq %rcx, 40(%rsp)
leaq 32(%rsp), %rcx
movq %rcx, -15712(%rax)
leaq -15712(%rax), %r12
movl $1000000, %ebx # imm = 0xF4240
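The fix — replacing the global with a function argument — was lost in extraction; a sketch of it is:

function g(a, b)   # b is now a local argument with a known type
    for i in 1:1_000_000
        tmp = a + b
    end
end

@btime g(1.0, 1.0)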
Note that the second run was dramatically faster than the first.
That's because the first call included the time for JIT compilation.
Notice also how small the memory footprint of the execution is.
Also, the machine code is simple and clean
.text
; ┌ @ In[27]:2 within `g'
retq
nopw %cs:(%rax,%rax)
nopl (%rax,%rax)
; └
Now the compiler is certain of types throughout execution of the function and hence can opti-
mize accordingly.
Another way to stabilize the code above is to maintain the global variable but prepend it
with const
Another scenario that trips up the JIT compiler is when composite types have fields with ab-
stract types.
We met this issue earlier, when we discussed AR(1) models.
Let’s experiment, using, respectively,
• an untyped field
• a field with abstract type, and
• parametric typing
As we’ll see, the last of these options gives us the best performance, while still maintaining
significant flexibility.
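The struct definitions themselves were dropped in extraction; definitions consistent with the constructors used below are:

struct Foo_generic       # untyped field
    x
end

struct Foo_abstract      # field with abstract type
    x::Real
end

struct Foo_concrete{T <: Real}   # parametric (concrete at instantiation)
    x::T
end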
Here’s the untyped case
In [34]: fg = Foo_generic(1.0)
fa = Foo_abstract(1.0)
fc = Foo_concrete(1.0)
Out[34]: Foo_concrete{Float64}(1.0)
In the last case, concrete type information for the fields is embedded in the object
In [35]: typeof(fc)
Out[35]: Foo_concrete{Float64}
Timing
Let’s try timing our code, starting with the generic case:
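The timing function was also lost; a version consistent with the machine code below — a loop that repeatedly reads the field x — is:

function f(foo)
    for i in 1:1_000_000
        xx = i + foo.x   # type of foo.x is unknown in the generic case
    end
end

@btime f($fg)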
.text
; ┌ @ In[36]:2 within `f'
pushq %rbp
pushq %r15
pushq %r14
pushq %r13
pushq %r12
pushq %rbx
subq $56, %rsp
vxorps %xmm0, %xmm0, %xmm0
vmovaps %xmm0, (%rsp)
movq $0, 16(%rsp)
movq %rsi, 48(%rsp)
movq %fs:0, %rax
movq $4, (%rsp)
movq -15712(%rax), %rcx
movq %rcx, 8(%rsp)
movq %rsp, %rcx
movq %rcx, -15712(%rax)
leaq -15712(%rax), %rax
movq %rax, 24(%rsp)
movq (%rsi), %r13
movl $1, %ebx
movabsq $jl_apply_generic, %r12
movabsq $jl_system_image_data, %r14
leaq 32(%rsp), %r15
nopl (%rax)
; │ @ In[36]:3 within `f'
; │┌ @ Base.jl:33 within `getproperty'
L128:
movq (%r13), %rbp
; │└
movq %rbx, %rdi
movabsq $jl_box_int64, %rax
callq *%rax
movq %rax, 16(%rsp)
movq %rax, 32(%rsp)
movq %rbp, 40(%rsp)
movq %r14, %rdi
movq %r15, %rsi
movl $2, %edx
callq *%r12
; │┌ @ range.jl:597 within `iterate'
addq $1, %rbx
; ││┌ @ promotion.jl:398 within `=='
cmpq $1000001, %rbx # imm = 0xF4241
; │└└
jne L128
movq 8(%rsp), %rax
movq 24(%rsp), %rcx
movq %rax, (%rcx)
movabsq $jl_system_image_data, %rax
; │ @ In[36]:3 within `f'
addq $56, %rsp
popq %rbx
popq %r12
popq %r13
popq %r14
popq %r15
popq %rbp
retq
nopw %cs:(%rax,%rax)
nopl (%rax)
; └
Some of this time is JIT compilation; a second execution is dramatically faster.
Here’s the corresponding machine code
.text
; ┌ @ In[36]:2 within `f'
retq
nopw %cs:(%rax,%rax)
nopl (%rax,%rax)
; └
Much nicer…
Another way we can run into trouble is with abstract container types.
Consider the following function, which essentially does the same job as Julia's sum() function but acts only on floating point data
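The function and its test data were dropped in extraction; a sketch consistent with the outputs below is:

function sum_float_array(x::AbstractVector{Float64})
    sum = 0.0
    for xi in x
        sum += xi
    end
    return sum
end

x = collect(range(0, 1, length = 1_000_000))
typeof(x)           # Array{Float64,1}, as in Out[43]
sum_float_array(x)  # ≈ 499999.9999999796, as in Out[44]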
Out[43]: Array{Float64,1}
Out[44]: 499999.9999999796
When Julia compiles this function, it knows that the data passed in as x will be an array of
64 bit floats.
Hence it’s known to the compiler that the relevant method for + is always addition of floating
point numbers.
Moreover, the data can be arranged into contiguous 64 bit blocks of memory to simplify memory access.
Finally, data types are stable — for example, the local variable sum starts off as a float and
remains a float throughout.
Type Inference
Here’s the same function minus the type annotation in the function signature
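A version without the annotation might look like this (again, a reconstruction consistent with the output below):

function sum_array(x)
    sum = 0.0
    for xi in x
        sum += xi
    end
    return sum
end

sum_array(x)   # ≈ 499999.9999999796, matching Out[46]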
When we run it with the same array of floating point numbers it executes at a similar speed
as the function with type information.
Out[46]: 499999.9999999796
The reason is that when sum_array() is first called on a vector of a given data type, a
newly compiled version of the function is produced to handle that type.
In this case, since we’re calling the function on a vector of floats, we get a compiled version of
the function with essentially the same internal representation as sum_float_array().
An Abstract Container
Things get tougher for the interpreter when the data type within the array is imprecise.
For example, the following snippet creates an array where the element type is Any
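One snippet consistent with the outputs below — note that the value in Out[49] matches the harmonic-type sum of 1/i for i up to one million — is:

x = Any[1 / i for i in 1:1_000_000]   # element type forced to Any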
In [48]: eltype(x)
Out[48]: Any
Out[49]: 14.392726722864989
Writing fast Julia code amounts to writing Julia from which the compiler can generate effi-
cient machine code.
For this, Julia needs to know about the type of data it’s processing as early as possible.
We could hard code the type of all variables and function arguments but this comes at a cost.
Our code becomes more cumbersome and less generic.
We are starting to lose the advantages that drew us to Julia in the first place.
Moreover, explicitly typing everything is not necessary for optimal performance.
The Julia compiler is smart and can often infer types perfectly well, without any performance
cost.
What we really want to do is
• keep our code simple, elegant and generic
• help the compiler out in situations where it’s liable to get tripped up
A good next stop for further reading is the relevant part of the Julia documentation.
Part III
Chapter 15
Linear Algebra
15.1 Contents
• Overview 15.2
• Vectors 15.3
• Matrices 15.4
• Solving Systems of Equations 15.5
• Eigenvalues and Eigenvectors 15.6
• Further Topics 15.7
• Exercises 15.8
• Solutions 15.9
15.2 Overview
Linear algebra is one of the most useful branches of applied mathematics for economists to
invest in.
For example, many applied problems in economics and finance require the solution of a linear
system of equations, such as
$$y_1 = a x_1 + b x_2$$
$$y_2 = c x_1 + d x_2$$

More generally, we might have 𝑛 equations in 𝑘 unknowns, with coefficients 𝑎11, … , 𝑎𝑛𝑘. The objective is to solve for the “unknowns” 𝑥1, … , 𝑥𝑘 given the coefficients and 𝑦1, … , 𝑦𝑛.
When considering such problems, it is essential that we first consider at least some of the fol-
lowing questions
• Does a solution actually exist?
• Are there in fact many solutions, and if so how should we interpret them?
• If no solution exists, is there a best “approximate” solution?
15.3 Vectors
15.3.1 Setup
The two most common operators for vectors are addition and scalar multiplication, which we
now describe.
As a matter of definition, when we add two vectors, we add them element by element
$$x + y =
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} +
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} :=
\begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{bmatrix}$$
Scalar multiplication is an operation that takes a number 𝛾 and a vector 𝑥 and produces
$$\gamma x := \begin{bmatrix} \gamma x_1 \\ \gamma x_2 \\ \vdots \\ \gamma x_n \end{bmatrix}$$
x = [2]
scalars = [-2 1 2]
vals = [0 0 0; x * scalars]
labels = [(-3.6, -4.2, "-2x"), (2.4, 1.8, "x"), (4.4, 3.8, "2x")]
In [5]: x = ones(3)
In [6]: y = [2, 4, 6]
In [7]: x + y
The inner product of vectors 𝑥, 𝑦 ∈ ℝ𝑛 is defined as

$$x' y := \sum_{i=1}^{n} x_i y_i$$

The norm of a vector 𝑥 represents its “length” (i.e., its distance from the zero vector) and is defined as

$$\| x \| := \sqrt{x' x} := \left( \sum_{i=1}^{n} x_i^2 \right)^{1/2}$$
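The input cells were lost in extraction; with x and y as defined above, computations consistent with Out[10]–Out[13] are:

x' * y        # inner product (Out[10])
sum(x .* y)   # equivalent elementwise computation (Out[11])
sqrt(x' * x)  # norm of x (Out[12])
norm(x)       # the same via the norm function (Out[13])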
Out[10]: 12.0
Out[11]: 12.0
Out[12]: 1.7320508075688772
Out[13]: 1.7320508075688772
15.3.4 Span
Given a set of vectors 𝐴 ∶= {𝑎1 , … , 𝑎𝑘 } in ℝ𝑛 , it’s natural to think about the new vectors we
can create by performing linear operations.
New vectors created in this manner are called linear combinations of 𝐴.
In particular, 𝑦 ∈ ℝ𝑛 is a linear combination of 𝐴 ∶= {𝑎1 , … , 𝑎𝑘 } if
In this context, the values 𝛽1 , … , 𝛽𝑘 are called the coefficients of the linear combination.
The set of linear combinations of 𝐴 is called the span of 𝐴.
The next figure shows the span of 𝐴 = {𝑎1 , 𝑎2 } in ℝ3 .
The span is a 2 dimensional plane passing through these two points and the origin.
# lines to vectors
x_vec = [0 0; 3 3]
y_vec = [0 0; 4 -4]
z_vec = [0 0; f(3, 4) f(3, -4)]
Examples
If 𝐴 contains only one vector 𝑎1 ∈ ℝ2 , then its span is just the scalar multiples of 𝑎1 , which is
the unique line passing through both 𝑎1 and the origin.
If 𝐴 = {𝑒1 , 𝑒2 , 𝑒3 } consists of the canonical basis vectors of ℝ3 , that is
$$e_1 := \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \qquad
e_2 := \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \qquad
e_3 := \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$$
then the span of 𝐴 is all of ℝ3 , because, for any 𝑥 = (𝑥1 , 𝑥2 , 𝑥3 ) ∈ ℝ3 , we can write
𝑥 = 𝑥 1 𝑒1 + 𝑥 2 𝑒2 + 𝑥 3 𝑒3
As we’ll see, it’s often desirable to find families of vectors with relatively large span, so that
many vectors can be described by linear operators on a few vectors.
The condition we need for a set of vectors to have a large span is what’s called linear inde-
pendence.
In particular, a collection of vectors 𝐴 ∶= {𝑎1 , … , 𝑎𝑘 } in ℝ𝑛 is said to be
• linearly dependent if some strict subset of 𝐴 has the same span as 𝐴
• linearly independent if it is not linearly dependent
Another nice thing about sets of linearly independent vectors is that each element in the span
has a unique representation as a linear combination of these vectors.
In other words, if 𝐴 ∶= {𝑎1 , … , 𝑎𝑘 } ⊂ ℝ𝑛 is linearly independent and
𝑦 = 𝛽 1 𝑎1 + ⋯ 𝛽 𝑘 𝑎𝑘
15.4 Matrices
Matrices are a neat way of organizing data for use in linear operations.
An 𝑛 × 𝑘 matrix is a rectangular array 𝐴 of numbers with 𝑛 rows and 𝑘 columns:
Often, the numbers in the matrix represent coefficients in a system of linear equations, as dis-
cussed at the start of this lecture.
For obvious reasons, the matrix 𝐴 is also called a vector if either 𝑛 = 1 or 𝑘 = 1.
In the former case, 𝐴 is called a row vector, while in the latter it is called a column vector.
If 𝑛 = 𝑘, then 𝐴 is called square.
The matrix formed by replacing 𝑎𝑖𝑗 by 𝑎𝑗𝑖 for every 𝑖 and 𝑗 is called the transpose of 𝐴, and
denoted 𝐴′ or 𝐴⊤ .
If 𝐴 = 𝐴′ , then 𝐴 is called symmetric.
For a square matrix 𝐴, the 𝑛 elements of the form 𝑎𝑖𝑖 for 𝑖 = 1, … , 𝑛 are called the principal diagonal.
𝐴 is called diagonal if the only nonzero entries are on the principal diagonal.
If, in addition to being diagonal, each element along the principal diagonal is equal to 1, then
𝐴 is called the identity matrix, and denoted by 𝐼.
Just as was the case for vectors, a number of algebraic operations are defined for matrices.
Scalar multiplication and addition are immediate generalizations of the vector case: both are performed elementwise.
In the case of addition, the matrices must have the same shape in order for the definition to make sense.
We also have a convention for multiplying two matrices.
The rule for matrix multiplication generalizes the idea of inner products discussed above, and
is designed to make multiplication play well with basic linear operations.
If 𝐴 and 𝐵 are two matrices, then their product 𝐴𝐵 is formed by taking as its 𝑖, 𝑗-th element
the inner product of the 𝑖-th row of 𝐴 and the 𝑗-th column of 𝐵.
There are many tutorials to help you visualize this operation, such as this one, or the discus-
sion on the Wikipedia page.
If 𝐴 is 𝑛 × 𝑘 and 𝐵 is 𝑗 × 𝑚, then to multiply 𝐴 and 𝐵 we require 𝑘 = 𝑗, and the resulting
matrix 𝐴𝐵 is 𝑛 × 𝑚.
As perhaps the most important special case, consider multiplying 𝑛 × 𝑘 matrix 𝐴 and 𝑘 × 1
column vector 𝑥.
According to the preceding rule, this gives us an 𝑛 × 1 column vector
Note
𝐴𝐵 and 𝐵𝐴 are not generally the same thing.
Julia arrays are also used as matrices, and have fast, efficient functions and methods for all
the standard matrix operations.
You can create them as follows
In [15]: A = [1 2
3 4]
In [16]: typeof(A)
Out[16]: Array{Int64,2}
In [17]: size(A)
Out[17]: (2, 2)
The size function returns a tuple giving the number of rows and columns.
To get the transpose of A, use transpose(A) or, more simply, A'.
There are many convenient functions for creating common matrices (matrices of zeros, ones,
etc.) — see here.
Since operations are performed elementwise by default, scalar multiplication and addition
have very natural syntax
In [18]: A = ones(3, 3)
In [19]: 2I
Out[19]: UniformScaling{Int64}
2*I
In [20]: A + I
Each 𝑛 × 𝑘 matrix 𝐴 can be identified with a function 𝑓(𝑥) = 𝐴𝑥 that maps 𝑥 ∈ ℝ𝑘 into
𝑦 = 𝐴𝑥 ∈ ℝ𝑛 .
These kinds of functions have a special property: they are linear.
A function 𝑓 ∶ ℝ𝑘 → ℝ𝑛 is called linear if, for all 𝑥, 𝑦 ∈ ℝ𝑘 and all scalars 𝛼, 𝛽, we have
You can check that this holds for the function 𝑓(𝑥) = 𝐴𝑥 + 𝑏 when 𝑏 is the zero vector, and
fails when 𝑏 is nonzero.
In fact, it’s known that 𝑓 is linear if and only if there exists a matrix 𝐴 such that 𝑓(𝑥) = 𝐴𝑥
for all 𝑥.
15.5 Solving Systems of Equations
Consider the system of equations

$$y = A x \qquad (3)$$
The problem we face is to determine a vector 𝑥 ∈ ℝ𝑘 that solves (3), taking 𝑦 and 𝐴 as given.
This is a special case of a more general problem: Find an 𝑥 such that 𝑦 = 𝑓(𝑥).
Given an arbitrary function 𝑓 and a 𝑦, is there always an 𝑥 such that 𝑦 = 𝑓(𝑥)?
If so, is it always unique?
The answer to both these questions is negative, as the next figure shows
In the first plot there are multiple solutions, as the function is not one-to-one, while in the
second there are no solutions, since 𝑦 lies outside the range of 𝑓.
Can we impose conditions on 𝐴 in (3) that rule out these problems?
In this context, the most important thing to recognize about the expression 𝐴𝑥 is that it cor-
responds to a linear combination of the columns of 𝐴.
In particular, if 𝑎1 , … , 𝑎𝑘 are the columns of 𝐴, then
𝐴𝑥 = 𝑥1 𝑎1 + ⋯ + 𝑥𝑘 𝑎𝑘
As you might recall, the condition that we want for the span to be large is linear indepen-
dence.
A happy fact is that linear independence of the columns of 𝐴 also gives us uniqueness.
Indeed, it follows from our earlier discussion that if {𝑎1 , … , 𝑎𝑘 } are linearly independent and
𝑦 = 𝐴𝑥 = 𝑥1 𝑎1 + ⋯ + 𝑥𝑘 𝑎𝑘 , then no 𝑧 ≠ 𝑥 satisfies 𝑦 = 𝐴𝑧.
Let’s discuss some more details, starting with the case where 𝐴 is 𝑛 × 𝑛.
This is the familiar case where the number of unknowns equals the number of equations.
For arbitrary 𝑦 ∈ ℝ𝑛 , we hope to find a unique 𝑥 ∈ ℝ𝑛 such that 𝑦 = 𝐴𝑥.
In view of the observations immediately above, if the columns of 𝐴 are linearly independent,
then their span, and hence the range of 𝑓(𝑥) = 𝐴𝑥, is all of ℝ𝑛 .
Hence there always exists an 𝑥 such that 𝑦 = 𝐴𝑥.
Moreover, the solution is unique.
In particular, the following are equivalent
1. The columns of 𝐴 are linearly independent.
2. For any 𝑦 ∈ ℝ𝑛, the equation 𝑦 = 𝐴𝑥 has a unique solution.
The property of having linearly independent columns is sometimes expressed as having full
column rank.
Inverse Matrices
If 𝑦 = 𝐴𝑥 has a unique solution for every 𝑦, that solution can be written as 𝑥 = 𝐴−1𝑦, where 𝐴−1 is the inverse matrix satisfying 𝐴𝐴−1 = 𝐴−1𝐴 = 𝐼.
Determinants
Another quick comment about square matrices is that to every such matrix we assign a
unique number called the determinant of the matrix — you can find the expression for it
here.
If the determinant of 𝐴 is not zero, then we say that 𝐴 is nonsingular.
Perhaps the most important fact about determinants is that 𝐴 is nonsingular if and only if 𝐴
is of full column rank.
This gives us a useful one-number summary of whether or not a square matrix can be in-
verted.
This is the 𝑛 × 𝑘 case with 𝑛 < 𝑘, so there are fewer equations than unknowns.
In this case there are either no solutions or infinitely many — in other words, uniqueness
never holds.
For example, consider the case where 𝑘 = 3 and 𝑛 = 2.
Thus, the columns of 𝐴 consist of 3 vectors in ℝ2.
This set can never be linearly independent, since two vectors already suffice to span ℝ2.
(For example, use the canonical basis vectors)
It follows that one column is a linear combination of the other two.
For example, let’s say that 𝑎1 = 𝛼𝑎2 + 𝛽𝑎3 .
Then if 𝑦 = 𝐴𝑥 = 𝑥1𝑎1 + 𝑥2𝑎2 + 𝑥3𝑎3, we can also write

$$y = x_1(\alpha a_2 + \beta a_3) + x_2 a_2 + x_3 a_3 = (x_1 \alpha + x_2) a_2 + (x_1 \beta + x_3) a_3$$

In other words, uniqueness fails.
Here’s an illustration of how to solve linear equations with Julia’s built-in linear algebra facili-
ties
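The matrix and vector used in the cells below were dropped in extraction; a setup consistent with the determinant reported in Out[24] is:

A = [1.0 2.0; 3.0 4.0]   # det(A) = -2.0
y = ones(2)

x = inv(A) * y   # solve via the inverse
x = A \ y        # preferred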
In [24]: det(A)
Out[24]: -2.0
Observe how we can solve for 𝑥 = 𝐴−1𝑦 either via inv(A) * y or using A \ y.
The latter method is preferred because it automatically selects the best algorithm for the
problem based on the types of A and y.
If A is not square then A \ y returns the least squares solution 𝑥̂ = (𝐴′ 𝐴)−1 𝐴′ 𝑦.
15.6 Eigenvalues and Eigenvectors
Let 𝐴 be an 𝑛 × 𝑛 square matrix. If 𝜆 is a scalar and 𝑣 is a nonzero vector in ℝ𝑛 such that

$$A v = \lambda v$$

then we say that 𝜆 is an eigenvalue of 𝐴, and 𝑣 is an eigenvector.
In [29]: A = [1 2
2 1]
evals, evecs = eigen(A)
a1, a2 = evals
eig_1 = [0 0; evecs[:,1]']
eig_2 = [0 0; evecs[:,2]']
x = range(-5, 5, length = 10)
y = -x
plot(eig_1[:, 2], a1 * eig_2[:, 2], arrow = true, color = :red,
     legend = :none, xlims = (-3, 3), ylims = (-3, 3),
     framestyle = :origin)   # opening plot(...) call reconstructed
plot!(a2 * eig_1[:, 2], a2 * eig_2[:, 2], arrow = true, color = :red)
plot!(eig_1, eig_2, arrow = true, color = :blue)
plot!(x, y, color = :blue, lw = 0.4, alpha = 0.6)
plot!(x, x, color = :blue, lw = 0.4, alpha = 0.6)
The eigenvalue equation is equivalent to (𝐴 − 𝜆𝐼)𝑣 = 0, and this has a nonzero solution 𝑣 only
when the columns of 𝐴 − 𝜆𝐼 are linearly dependent.
This in turn is equivalent to stating that the determinant is zero.
Hence to find all eigenvalues, we can look for 𝜆 such that the determinant of 𝐴 − 𝜆𝐼 is zero.
This problem can be expressed as one of solving for the roots of a polynomial in 𝜆 of degree
𝑛.
This in turn implies the existence of 𝑛 solutions in the complex plane, although some might
be repeated.
Some nice facts about the eigenvalues of a square matrix 𝐴 are as follows
1. The determinant of 𝐴 equals the product of the eigenvalues.
2. The trace of 𝐴 (the sum of the elements on the principal diagonal) equals the sum of the eigenvalues.
3. If 𝐴 is symmetric, then all of its eigenvalues are real.
4. If 𝐴 is invertible and 𝜆1, … , 𝜆𝑛 are its eigenvalues, then the eigenvalues of 𝐴−1 are 1/𝜆1, … , 1/𝜆𝑛.
A corollary of the first statement is that a matrix is invertible if and only if all its eigenvalues
are nonzero.
Using Julia, we can solve for the eigenvalues and eigenvectors of a matrix as follows
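The computing cell was lost; a call consistent with the matrix used in the plot above is:

A = [1.0 2.0
     2.0 1.0]
evals, evecs = eigen(A)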
In [32]: evals
In [33]: evecs
It is sometimes useful to consider the generalized eigenvalue problem, which, for given matri-
ces 𝐴 and 𝐵, seeks generalized eigenvalues 𝜆 and eigenvectors 𝑣 such that
𝐴𝑣 = 𝜆𝐵𝑣
15.7 Further Topics
We round out our discussion by briefly mentioning several other important topics.
Recall the usual summation formula for a geometric progression, which states that if |𝑎| < 1, then $\sum_{k=0}^{\infty} a^k = (1 - a)^{-1}$.
A generalization of this idea exists in the matrix setting.
Matrix Norms
Let 𝐴 be a square matrix, and let

$$\| A \| := \max_{\| x \| = 1} \| A x \|$$

The norms on the right-hand side are ordinary vector norms, while the norm on the left-hand side is a matrix norm — in this case, the so-called spectral norm.
For example, for a square matrix 𝑆, the condition ‖𝑆‖ < 1 means that 𝑆 is contractive, in the sense that it pulls all vectors towards the origin [1].
Neumann's Theorem
A result known as Neumann's theorem states the following: if ‖𝐴𝑘‖ < 1 for some 𝑘 ∈ ℕ, then 𝐼 − 𝐴 is invertible and

$$(I - A)^{-1} = \sum_{k=0}^{\infty} A^k \qquad (4)$$
Spectral Radius
A result known as Gelfand's formula tells us that, for any square matrix 𝐴,

$$\rho(A) = \lim_{k \to \infty} \| A^k \|^{1/k}$$

Here 𝜌(𝐴) is the spectral radius, defined as max𝑖 |𝜆𝑖|, where {𝜆𝑖}𝑖 is the set of eigenvalues of 𝐴.
As a consequence of Gelfand’s formula, if all eigenvalues are strictly less than one in modulus,
there exists a 𝑘 with ‖𝐴𝑘 ‖ < 1.
In which case (4) is valid.
Positive Definite Matrices
Let 𝐴 be a symmetric 𝑛 × 𝑛 matrix. We say that 𝐴 is
• positive definite if 𝑥′𝐴𝑥 > 0 for every nonzero 𝑥 ∈ ℝ𝑛
• positive semi-definite if 𝑥′𝐴𝑥 ≥ 0 for every 𝑥 ∈ ℝ𝑛
Analogous definitions exist for negative definite and negative semi-definite matrices.
It is notable that if 𝐴 is positive definite, then all of its eigenvalues are strictly positive, and hence 𝐴 is invertible (with positive definite inverse).
Differentiating Linear and Quadratic Forms
The following formulas are useful in many economic contexts. Let 𝑧, 𝑥 and 𝑎 all be 𝑛 × 1 vectors, 𝐴 be an 𝑛 × 𝑛 matrix, 𝐵 be an 𝑚 × 𝑛 matrix, and 𝑦 be an 𝑚 × 1 vector. Then

1. $\frac{\partial a'x}{\partial x} = a$
2. $\frac{\partial Ax}{\partial x} = A'$
3. $\frac{\partial x'Ax}{\partial x} = (A + A')x$
4. $\frac{\partial y'Bz}{\partial y} = Bz$
5. $\frac{\partial y'Bz}{\partial B} = yz'$
The documentation of the linear algebra features built into Julia can be found here.
Chapters 2 and 3 of the Econometric Theory text contain a discussion of linear algebra along the same lines as above, with solved exercises.
If you don’t mind a slightly abstract approach, a nice intermediate-level text on linear algebra
is [57].
15.8 Exercises
15.8.1 Exercise 1
Let 𝑥 be a given 𝑛 × 1 vector and consider the problem

$$v(x) = \max_{y, u} \left\{ -y' P y - u' Q u \right\}$$

subject to the linear constraint

$$y = A x + B u$$
Here
• 𝑃 is an 𝑛 × 𝑛 matrix and 𝑄 is an 𝑚 × 𝑚 matrix
• 𝐴 is an 𝑛 × 𝑛 matrix and 𝐵 is an 𝑛 × 𝑚 matrix
• both 𝑃 and 𝑄 are symmetric and positive semidefinite
(What must the dimensions of 𝑦 and 𝑢 be to make this a well-posed problem?)
One way to solve the problem is to form the Lagrangian
ℒ = −𝑦′ 𝑃 𝑦 − 𝑢′ 𝑄𝑢 + 𝜆′ [𝐴𝑥 + 𝐵𝑢 − 𝑦]
Try applying the formulas above for differentiating quadratic and linear forms, and show that the first-order conditions imply
1. 𝜆 = −2𝑃 𝑦
2. The optimizing choice of 𝑢 satisfies 𝑢 = −(𝑄 + 𝐵′ 𝑃 𝐵)−1 𝐵′ 𝑃 𝐴𝑥
3. The function 𝑣 satisfies 𝑣(𝑥) = −𝑥′ 𝑃 ̃ 𝑥 where 𝑃 ̃ = 𝐴′ 𝑃 𝐴 − 𝐴′ 𝑃 𝐵(𝑄 + 𝐵′ 𝑃 𝐵)−1 𝐵′ 𝑃 𝐴
As we will see, in economic contexts Lagrange multipliers often are shadow prices
Note
If we don’t care about the Lagrange multipliers, we can substitute the constraint
into the objective function, and then just maximize −(𝐴𝑥+𝐵𝑢)′ 𝑃 (𝐴𝑥+𝐵𝑢)−𝑢′ 𝑄𝑢
with respect to 𝑢. You can verify that this leads to the same maximizer.
15.9 Solutions
Thanks to Willem Hekman and Guanlong Ren for providing this solution.
15.9.1 Exercise 1
We have the optimization problem

$$v(x) = \max_{y, u} \left\{ -y' P y - u' Q u \right\}$$

s.t.

$$y = A x + B u$$
with primitives
• 𝑃 be a symmetric and positive semidefinite 𝑛 × 𝑛 matrix.
• 𝑄 be a symmetric and positive semidefinite 𝑚 × 𝑚 matrix.
• 𝐴 an 𝑛 × 𝑛 matrix.
• 𝐵 an 𝑛 × 𝑚 matrix.
The associated Lagrangian is :
𝐿 = −𝑦′ 𝑃 𝑦 − 𝑢′ 𝑄𝑢 + 𝜆′ [𝐴𝑥 + 𝐵𝑢 − 𝑦]
1.
Differentiating Lagrangian equation w.r.t y and setting its derivative equal to zero yields
$$\frac{\partial L}{\partial y} = -(P + P')y - \lambda = -2Py - \lambda = 0,$$
since P is symmetric.
Accordingly, the first-order condition for maximizing L w.r.t. y implies
𝜆 = −2𝑃 𝑦 .
2.
Differentiating Lagrangian equation w.r.t. u and setting its derivative equal to zero yields
$$\frac{\partial L}{\partial u} = -(Q + Q')u + B'\lambda = -2Qu + B'\lambda = 0.$$
Substituting 𝜆 = −2𝑃 𝑦 gives
𝑄𝑢 + 𝐵′ 𝑃 𝑦 = 0 .
𝑄𝑢 + 𝐵′ 𝑃 (𝐴𝑥 + 𝐵𝑢) = 0
(𝑄 + 𝐵′ 𝑃 𝐵)𝑢 + 𝐵′ 𝑃 𝐴𝑥 = 0
𝑢 = −(𝑄 + 𝐵′ 𝑃 𝐵)−1 𝐵′ 𝑃 𝐴𝑥 ,
which follows from the definition of the first-order conditions for the Lagrangian equation.
3.
Rewriting our problem by substituting the constraint into the objective function, we get

$$v(x) = \max_{u} \left\{ -(Ax + Bu)' P (Ax + Bu) - u' Q u \right\}$$

Expanding, and using the symmetry of 𝑃, this equals

$$v(x) = \max_{u} \left\{ -x'A'PAx - 2u'B'PAx - u'(B'PB + Q)u \right\}$$
Since we know the optimal choice of 𝑢 satisfies 𝑢 = −𝑆𝑥 with 𝑆 ∶= (𝑄 + 𝐵′𝑃𝐵)−1𝐵′𝑃𝐴, the second term becomes

$$-2u'B'PAx = 2x'S'B'PAx = 2x'A'PB(Q + B'PB)^{-1}B'PAx$$

Notice that the term (𝑄 + 𝐵′𝑃𝐵)−1 is symmetric, as both P and Q are symmetric.
Regarding the third term −𝑢′(𝑄 + 𝐵′𝑃𝐵)𝑢, substituting 𝑢 = −𝑆𝑥 gives

$$-u'(Q + B'PB)u = -x'A'PB(Q + B'PB)^{-1}B'PAx$$

Hence, the summation of the second and third terms is 𝑥′𝐴′𝑃𝐵(𝑄 + 𝐵′𝑃𝐵)−1𝐵′𝑃𝐴𝑥.
This implies that

$$v(x) = -x'A'PAx + x'A'PB(Q + B'PB)^{-1}B'PAx = -x'\left( A'PA - A'PB(Q + B'PB)^{-1}B'PA \right)x$$

Therefore, the solution to the optimization problem is 𝑣(𝑥) = −𝑥′𝑃̃𝑥, where 𝑃̃ ∶= 𝐴′𝑃𝐴 − 𝐴′𝑃𝐵(𝑄 + 𝐵′𝑃𝐵)−1𝐵′𝑃𝐴.
Footnotes
[1] Suppose that ‖𝑆‖ < 1. Take any nonzero vector 𝑥, and let 𝑟 ∶= ‖𝑥‖. We have ‖𝑆𝑥‖ =
𝑟‖𝑆(𝑥/𝑟)‖ ≤ 𝑟‖𝑆‖ < 𝑟 = ‖𝑥‖. Hence every point is pulled towards the origin.
Chapter 16

Orthogonal Projections and Their Applications
16.1 Contents
• Overview 16.2
• Key Definitions 16.3
• The Orthogonal Projection Theorem 16.4
• Orthonormal Basis 16.5
• Projection Using Matrix Algebra 16.6
• Least Squares Regression 16.7
• Orthogonalization and Decomposition 16.8
• Exercises 16.9
• Solutions 16.10
16.2 Overview
Orthogonal projection is a cornerstone of vector space methods, with many diverse applica-
tions.
These include, but are not limited to,
• Least squares projection, also known as linear regression
• Conditional expectations for multivariate normal (Gaussian) distributions
• Gram–Schmidt orthogonalization
• QR decomposition
• Orthogonal polynomials
• etc
In this lecture we focus on
• key ideas
• least squares regression
For background and foundational concepts, see our lecture on linear algebra.
For more proofs and greater theoretical detail, see A Primer in Econometric Theory.
For a complete set of proofs in a general setting, see, for example, [90].
For an advanced treatment of projection in the context of least squares prediction, see this
book chapter.
16.3 Key Definitions
Assume 𝑥, 𝑧 ∈ ℝ𝑛.
Define ⟨𝑥, 𝑧⟩ = ∑𝑖 𝑥𝑖 𝑧𝑖 .
Recall ‖𝑥‖2 = ⟨𝑥, 𝑥⟩.
The law of cosines states that ⟨𝑥, 𝑧⟩ = ‖𝑥‖‖𝑧‖ cos(𝜃) where 𝜃 is the angle between the vectors
𝑥 and 𝑧.
When ⟨𝑥, 𝑧⟩ = 0, then cos(𝜃) = 0 and 𝑥 and 𝑧 are said to be orthogonal and we write 𝑥 ⟂ 𝑧
Given 𝑆 ⊂ ℝ𝑛, define 𝑆⟂ ∶= {𝑥 ∈ ℝ𝑛 ∶ 𝑥 ⟂ 𝑧 for all 𝑧 ∈ 𝑆}.
𝑆⟂ is a linear subspace of ℝ𝑛
• To see this, fix 𝑥, 𝑦 ∈ 𝑆⟂ and 𝛼, 𝛽 ∈ ℝ.
• Observe that if 𝑧 ∈ 𝑆, then ⟨𝛼𝑥 + 𝛽𝑦, 𝑧⟩ = 𝛼⟨𝑥, 𝑧⟩ + 𝛽⟨𝑦, 𝑧⟩ = 0
• Hence 𝛼𝑥 + 𝛽𝑦 ∈ 𝑆⟂, as was to be shown
16.4 The Orthogonal Projection Theorem
Given 𝑦 ∈ ℝ𝑛 and linear subspace 𝑆 ⊂ ℝ𝑛, there exists a unique solution to the minimization problem

$$\hat{y} := \mathop{\arg\min}_{z \in S} \| y - z \|$$
Hence $\| y - z \| \geq \| y - \hat{y} \|$, which completes the proof.
For a linear space 𝑌 and a fixed linear subspace 𝑆, we have a functional relationship 𝑦 ↦ 𝑃𝑦, where 𝑃𝑦 is the closest approximation to 𝑦 within 𝑆. The mapping 𝑃 is called the orthogonal projection onto 𝑆, and for every 𝑦 ∈ 𝑌 it satisfies
1. 𝑃𝑦 ∈ 𝑆 and
2. 𝑦 − 𝑃𝑦 ⟂ 𝑆
From these properties we can deduce, for example, that ‖𝑦‖2 = ‖𝑃𝑦‖2 + ‖𝑦 − 𝑃𝑦‖2: observe that 𝑦 = 𝑃𝑦 + (𝑦 − 𝑃𝑦) and apply the Pythagorean law.
Orthogonal Complement
Let 𝑆 ⊂ ℝ𝑛 .
The orthogonal complement of 𝑆 is the linear subspace 𝑆 ⟂ that satisfies 𝑥1 ⟂ 𝑥2 for every
𝑥1 ∈ 𝑆 and 𝑥2 ∈ 𝑆 ⟂ .
Let 𝑌 be a linear space with linear subspace 𝑆 and its orthogonal complement 𝑆 ⟂ .
We write
𝑌 = 𝑆 ⊕ 𝑆⟂
to indicate that for every 𝑦 ∈ 𝑌 there is unique 𝑥1 ∈ 𝑆 and a unique 𝑥2 ∈ 𝑆 ⟂ such that
𝑦 = 𝑥 1 + 𝑥2 .
16.5 Orthonormal Basis
If an orthonormal set {𝑢1, … , 𝑢𝑘} spans the linear subspace 𝑆, then it is called an orthonormal basis of 𝑆, and

$$x = \sum_{i=1}^{k} \langle x, u_i \rangle u_i \quad \text{for all} \quad x \in S$$
To see this, observe that since 𝑥 ∈ span{𝑢1 , … , 𝑢𝑘 }, we can find scalars 𝛼1 , … , 𝛼𝑘 that verify
$$x = \sum_{j=1}^{k} \alpha_j u_j \qquad (1)$$

Taking the inner product with 𝑢𝑖 and using orthonormality gives

$$\langle x, u_i \rangle = \sum_{j=1}^{k} \alpha_j \langle u_j, u_i \rangle = \alpha_i$$

Combining this result with (1) verifies the claim.
When the subspace onto which we are projecting is orthonormal, computing the projection simplifies:
Theorem If {𝑢1, … , 𝑢𝑘} is an orthonormal basis for 𝑆, then

$$P y = \sum_{i=1}^{k} \langle y, u_i \rangle u_i, \quad \forall \; y \in \mathbb{R}^n \qquad (2)$$
Proof: It suffices to show that 𝑦 − 𝑃𝑦 is orthogonal to each basis vector 𝑢𝑗, which follows from

$$\left\langle y - \sum_{i=1}^{k} \langle y, u_i \rangle u_i, \; u_j \right\rangle = \langle y, u_j \rangle - \sum_{i=1}^{k} \langle y, u_i \rangle \langle u_i, u_j \rangle = 0$$
16.6 Projection Using Matrix Algebra
Let 𝑆 be the column span of an 𝑛 × 𝑘 matrix 𝑋 with linearly independent columns. In this case the projection satisfies 𝐸̂𝑆 𝑦 = 𝑃𝑦 with

$$P = X (X' X)^{-1} X'$$

To verify that this matrix gives the orthogonal projection, one checks the two defining properties:
1. 𝑃𝑦 ∈ 𝑆, and
2. 𝑦 − 𝑃𝑦 ⟂ 𝑆
If 𝑈 is 𝑛 × 𝑘 with orthonormal columns, so that 𝑈′𝑈 = 𝐼, the formula specializes to

$$P y = U (U' U)^{-1} U' y = U U' y = \sum_{i=1}^{k} \langle u_i, y \rangle u_i$$
We have recovered our earlier result about projecting onto the span of an orthonormal basis.
Define

$$\hat{\beta} := (X' X)^{-1} X' y$$

Then

$$X \hat{\beta} = X (X' X)^{-1} X' y = P y$$

Because 𝑋𝑏 ∈ span(𝑋) for every 𝑏, the vector 𝑋𝛽̂ = 𝑃𝑦 is the best approximation to 𝑦 within span(𝑋).
16.7 Least Squares Regression
Consider choosing a function 𝑓 from some class ℱ to minimize the risk 𝔼[(𝑦 − 𝑓(𝑥))2].
If probabilities and hence 𝔼 are unknown, we cannot solve this problem directly.
However, if a sample is available, we can estimate the risk with the empirical risk:
$$\min_{f \in \mathcal{F}} \frac{1}{N} \sum_{n=1}^{N} (y_n - f(x_n))^2$$

When ℱ is the set of linear functions 𝑓(𝑥) = 𝑏′𝑥, the problem becomes

$$\min_{b \in \mathbb{R}^K} \sum_{n=1}^{N} (y_n - b' x_n)^2$$
16.7.2 Solution
$$y := \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}, \qquad
x_n := \begin{pmatrix} x_{n1} \\ x_{n2} \\ \vdots \\ x_{nK} \end{pmatrix} = \text{$n$-th observation on all regressors}$$
and let 𝑋 be the 𝑁 × 𝐾 matrix whose 𝑛-th row is 𝑥′𝑛. Then

$$\mathop{\arg\min}_{b \in \mathbb{R}^K} \sum_{n=1}^{N} (y_n - b' x_n)^2 = \mathop{\arg\min}_{b \in \mathbb{R}^K} \| y - X b \|$$
By our results on overdetermined systems, the solution, fitted values and residuals are

$$\hat{\beta} := (X' X)^{-1} X' y, \qquad \hat{y} := X \hat{\beta} = P y, \qquad \hat{u} := y - \hat{y} = y - P y = M y$$

where 𝑀 ∶= 𝐼 − 𝑃 (see Exercise 2 below).
16.8 Orthogonalization and Decomposition
Let's return to the connection between linear independence and orthogonality touched on
above.
A result of much interest is a famous algorithm for constructing orthonormal sets from lin-
early independent sets.
The next section gives details.
16.8.1 Gram–Schmidt Orthogonalization
Theorem For each linearly independent set {𝑥1, … , 𝑥𝑘} ⊂ ℝ𝑛, there exists an orthonormal set {𝑢1, … , 𝑢𝑘} with

$$\mathrm{span}\{x_1, \ldots, x_i\} = \mathrm{span}\{u_1, \ldots, u_i\} \quad \text{for} \quad i = 1, \ldots, k$$
16.8.2 QR Decomposition
The following result uses the preceding algorithm to produce a useful decomposition.
Theorem If 𝑋 is 𝑛 × 𝑘 with linearly independent columns, then there exists a factorization
𝑋 = 𝑄𝑅 where
• 𝑅 is 𝑘 × 𝑘, upper triangular, and nonsingular
• 𝑄 is 𝑛 × 𝑘 with orthonormal columns
Proof sketch: Let
• 𝑥𝑗 ∶= col𝑗 (𝑋)
• {𝑢1 , … , 𝑢𝑘 } be orthonormal with same span as {𝑥1 , … , 𝑥𝑘 } (to be constructed using
Gram–Schmidt)
• 𝑄 be formed from cols 𝑢𝑖
Since 𝑥𝑗 ∈ span{𝑢1 , … , 𝑢𝑗 }, we have
$$x_j = \sum_{i=1}^{j} \langle u_i, x_j \rangle u_i \quad \text{for} \quad j = 1, \ldots, k$$

Defining 𝑅𝑖𝑗 ∶= ⟨𝑢𝑖, 𝑥𝑗⟩ for 𝑖 ≤ 𝑗 (and zero otherwise) then gives 𝑋 = 𝑄𝑅, as claimed.
For matrices 𝑋 and 𝑦 that overdetermine 𝛽 in the linear equation system 𝑦 = 𝑋𝛽, we found the least squares approximator 𝛽̂ = (𝑋′𝑋)−1𝑋′𝑦.
Using the QR decomposition 𝑋 = 𝑄𝑅 gives
$$\hat{\beta} = (R' Q' Q R)^{-1} R' Q' y = (R' R)^{-1} R' Q' y = R^{-1} (R')^{-1} R' Q' y = R^{-1} Q' y$$
Numerical routines would in this case use the alternative form 𝑅𝛽 ̂ = 𝑄′ 𝑦 and back substitu-
tion.
16.9 Exercises
16.9.1 Exercise 1
16.9.2 Exercise 2
Let 𝑃 = 𝑋(𝑋 ′ 𝑋)−1 𝑋 ′ and let 𝑀 = 𝐼 − 𝑃 . Show that 𝑃 and 𝑀 are both idempotent and
symmetric. Can you give any intuition as to why they should be idempotent?
16.9.3 Exercise 3
Let

$$y := \begin{pmatrix} 1 \\ 3 \\ -3 \end{pmatrix}$$

and

$$X := \begin{pmatrix} 1 & 0 \\ 0 & -6 \\ 2 & 2 \end{pmatrix}$$

Project 𝑦 onto the column space of 𝑋 using the ordinary projection matrix, an orthonormal basis produced by Gram–Schmidt, and an orthonormal basis obtained via QR decomposition, and check that the answers agree.
16.10 Solutions
16.10.1 Exercise 1
16.10.2 Exercise 2
Symmetry and idempotence of 𝑀 and 𝑃 can be established using standard rules for matrix
algebra. The intuition behind idempotence of 𝑀 and 𝑃 is that both are orthogonal projec-
tions. After a point is projected into a given subspace, applying the projection again makes
no difference. (A point inside the subspace is not shifted by orthogonal projection onto that
space because it is already the closest point in the subspace to itself).
16.10.3 Exercise 3
Here’s a function that computes the orthonormal vectors using the GS algorithm given in the
lecture.
16.10.4 Setup
function gram_schmidt(X)
    U = similar(X, Float64)
    function normalized_orthogonal_projection(b, Z)
        # project onto the orthogonal complement of the col span of Z
        orthogonal = I - Z * inv(Z'Z) * Z'
        projection = orthogonal * b
        # normalize
        return projection / norm(projection)
    end
    # reconstructed loop: orthogonalize each column against its predecessors
    for col in 1:size(X, 2)
        b = X[:, col]        # vector to be orthogonalized
        Z = X[:, 1:col - 1]  # columns already processed
        U[:, col] = normalized_orthogonal_projection(b, Z)
    end
    return U
end
First let’s do ordinary projection of 𝑦 onto the basis spanned by the columns of 𝑋.
In [6]: U = gram_schmidt(X)
Now we can project using the orthonormal basis and see if we get the same thing:
The result is the same. To complete the exercise, we get an orthonormal basis by QR decom-
position and project once more.
In [8]: Q, R = qr(X)
Q = Matrix(Q)
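The projection cells themselves were lost in extraction; with 𝑦 and 𝑋 as in the exercise, sketches consistent with the surrounding text are:

y = [1.0, 3.0, -3.0]
X = [1.0 0.0
     0.0 -6.0
     2.0 2.0]

Py1 = X * inv(X'X) * X'y   # ordinary projection
U = gram_schmidt(X)
Py2 = U * U'y              # via the Gram–Schmidt orthonormal basis
Q = Matrix(qr(X).Q)
Py3 = Q * Q'y              # via the QR orthonormal basis

All three give the same projection of 𝑦 onto the column space of 𝑋.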
Chapter 17

LLN and CLT

17.1 Contents
• Overview 17.2
• Relationships 17.3
• LLN 17.4
• CLT 17.5
• Exercises 17.6
• Solutions 17.7
17.2 Overview
This lecture illustrates two of the most important theorems of probability and statistics: The
law of large numbers (LLN) and the central limit theorem (CLT).
These beautiful theorems lie behind many of the most fundamental results in econometrics
and quantitative economic modeling.
The lecture is based around simulations that show the LLN and CLT in action.
We also demonstrate how the LLN and CLT break down when the assumptions they are
based on do not hold.
In addition, we examine several useful extensions of the classical theorems, such as
• The delta method, for smooth functions of random variables
• The multivariate case
Some of these extensions are presented as exercises.
17.3 Relationships
The CLT refines the LLN.
The LLN gives conditions under which sample moments converge to population moments as sample size increases.
The CLT provides information about the rate at which sample moments converge to population moments as sample size increases.
17.4 LLN
We begin with the law of large numbers, which tells us when sample averages will converge to
their population means.
The classical law of large numbers concerns independent and identically distributed (IID)
random variables.
Here is the strongest version of the classical LLN, known as Kolmogorov’s strong law.
Let 𝑋1 , … , 𝑋𝑛 be independent and identically distributed scalar random variables, with com-
mon distribution 𝐹 .
When it exists, let 𝜇 denote the common mean of this sample:
𝜇 ∶= 𝔼𝑋 = ∫ 𝑥𝐹 (𝑑𝑥)
In addition, let

$$\bar{X}_n := \frac{1}{n} \sum_{i=1}^{n} X_i$$

Kolmogorov's strong law states that, if 𝔼|𝑋| is finite, then

$$\mathbb{P}\left\{ \bar{X}_n \to \mu \text{ as } n \to \infty \right\} = 1 \qquad (1)$$
17.4.2 Proof
The proof of Kolmogorov’s strong law is nontrivial – see, for example, theorem 8.3.5 of [26].
On the other hand, we can prove a weaker version of the LLN very easily and still get most of
the intuition.
The version we prove is as follows: If 𝑋1, … , 𝑋𝑛 is IID with 𝔼𝑋𝑖2 < ∞, then, for any 𝜖 > 0, we have

$$\mathbb{P}\left\{ |\bar{X}_n - \mu| \geq \epsilon \right\} \to 0 \quad \text{as} \quad n \to \infty \qquad (2)$$

(This version is weaker because we claim only convergence in probability rather than almost sure convergence, and assume a finite second moment)
To see that this is so, fix 𝜖 > 0, and let 𝜎2 be the variance of each 𝑋𝑖.
To see that this is so, fix 𝜖 > 0, and let 𝜎2 be the variance of each 𝑋𝑖 .
Recall the Chebyshev inequality, which tells us that

$$\mathbb{P}\left\{ |\bar{X}_n - \mu| \geq \epsilon \right\} \leq \frac{\mathbb{E}[(\bar{X}_n - \mu)^2]}{\epsilon^2} \qquad (3)$$
Now observe that

$$
\mathbb{E}[(\bar{X}_n - \mu)^2]
= \mathbb{E}\left\{ \left[ \frac{1}{n} \sum_{i=1}^{n} (X_i - \mu) \right]^2 \right\}
= \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} \mathbb{E}(X_i - \mu)(X_j - \mu)
= \frac{1}{n^2} \sum_{i=1}^{n} \mathbb{E}(X_i - \mu)^2
= \frac{\sigma^2}{n}
$$
Here the crucial step is at the third equality, which follows from independence.
Independence means that if 𝑖 ≠ 𝑗, then the covariance term 𝔼(𝑋𝑖 − 𝜇)(𝑋𝑗 − 𝜇) drops out.
As a result, 𝑛2 − 𝑛 terms vanish, leading us to a final expression that goes to zero in 𝑛.
Combining our last result with (3), we come to the estimate

$$\mathbb{P}\left\{ |\bar{X}_n - \mu| \geq \epsilon \right\} \leq \frac{\sigma^2}{n \epsilon^2} \qquad (4)$$

The claim in (2) is now clear.
17.4.3 Illustration
Let’s now illustrate the classical IID law of large numbers using simulation.
In particular, we aim to generate some sequences of IID random variables and plot the evolu-
tion of 𝑋̄ 𝑛 as 𝑛 increases.
Below is a figure that does just this (as usual, you can click on it to expand it).
It shows IID observations from three different distributions and plots 𝑋̄ 𝑛 against 𝑛 in each
case.
The dots represent the underlying observations 𝑋𝑖 for 𝑖 = 1, … , 100.
In each of the three cases, convergence of 𝑋̄ 𝑛 to 𝜇 occurs as predicted.
17.4.4 Setup
Out[2]: Plots.GRBackend()
# reconstruction: the opening lines of this function were lost in extraction
function ksl(distribution, n = 100)
    title = nameof(typeof(distribution))
    observations = rand(distribution, n)
    sample_means = cumsum(observations) ./ (1:n)
    μ = mean(distribution)
    plot(1:n, observations, color = :grey, markershape = :circle,
         alpha = 0.5, linewidth = 0, label = "")
    if !isnan(μ)   # e.g. the Cauchy distribution has no mean
        hline!([μ], color = :black, linestyle = :dash, grid = false,
               label = ["Mean"])
    end
    plot!(1:n, sample_means, linewidth = 3, alpha = 0.6, color = :green,
          label = "Sample mean")
    return plot!(title = title)
end
# the head of this list was lost in extraction; a hypothetical reconstruction is
distributions = [Poisson(4.0), LogNormal(0.5, 1.0), Exponential(1)]
Poisson{Float64}(λ=4.0)
LogNormal{Float64}(μ=0.5, σ=1.0)
Exponential{Float64}(θ=1.0)
In [5]: ksl(Normal())
What happens if the condition 𝔼|𝑋| < ∞ in the statement of the LLN is not satisfied?
This might be the case if the underlying distribution is heavy tailed — the best known exam-
ple is the Cauchy distribution, which has density
$$f(x) = \frac{1}{\pi (1 + x^2)} \qquad (x \in \mathbb{R})$$
The next figure shows 100 independent draws from this distribution
In [9]: ksl(Cauchy())
Notice how extreme observations are far more prevalent here than the previous figure.
Let’s now have a look at the behavior of the sample mean
# reconstruction: the opening of this function was lost in extraction
function plot_means(n = 1000)
    sample_mean = cumsum(rand(Cauchy(), n)) ./ (1:n)
    plot(1:n, sample_mean, color = :red, alpha = 0.6, label = "Sample mean",
         linewidth = 3)
    return hline!([0], color = :black, linestyle = :dash, label = "",
                  grid = false)
end

plot_means()
Here we’ve increased 𝑛 to 1000, but the sequence still shows no sign of converging.
Will convergence become visible if we take 𝑛 even larger?
The answer is no.
To see this, recall that the characteristic function of the Cauchy distribution is

$$\phi(t) = \mathbb{E} e^{i t X} = e^{-|t|} \qquad (5)$$

Using independence, the characteristic function of the sample mean becomes

$$
\mathbb{E} e^{i t \bar{X}_n}
= \mathbb{E} \exp\left\{ i \frac{t}{n} \sum_{j=1}^{n} X_j \right\}
= \mathbb{E} \prod_{j=1}^{n} \exp\left\{ i \frac{t}{n} X_j \right\}
= \prod_{j=1}^{n} \mathbb{E} \exp\left\{ i \frac{t}{n} X_j \right\}
= [\phi(t/n)]^n
$$

In view of (5), this is just $e^{-|t|}$. Thus, the sample mean itself has the very same Cauchy distribution, regardless of 𝑛, and in particular the sequence 𝑋̄𝑛 does not converge to any point.
17.5 CLT
Next we turn to the central limit theorem, which tells us about the distribution of the devia-
tion between sample averages and population means.
The central limit theorem is one of the most remarkable results in all of mathematics.
In the classical IID setting, it tells us the following:
If the sequence 𝑋1 , … , 𝑋𝑛 is IID, with common mean 𝜇 and common variance 𝜎2 ∈ (0, ∞),
then
$$\sqrt{n} \, (\bar{X}_n - \mu) \stackrel{d}{\to} N(0, \sigma^2) \quad \text{as} \quad n \to \infty \qquad (6)$$

Here $\stackrel{d}{\to} N(0, \sigma^2)$ indicates convergence in distribution to a centered (i.e., zero mean) normal with standard deviation 𝜎.
17.5.2 Intuition
The striking implication of the CLT is that for any distribution with finite second moment,
the simple operation of adding independent copies always leads to a Gaussian curve.
A relatively simple proof of the central limit theorem can be obtained by working with char-
acteristic functions (see, e.g., theorem 9.5.6 of [26]).
The proof is elegant but almost anticlimactic, and it provides surprisingly little intuition.
In fact all of the proofs of the CLT that we know are similar in this respect.
Why does adding independent copies produce a bell-shaped distribution?
Part of the answer can be obtained by investigating addition of independent Bernoulli ran-
dom variables.
In particular, let 𝑋𝑖 be binary, with ℙ{𝑋𝑖 = 0} = ℙ{𝑋𝑖 = 1} = 0.5, and let 𝑋1 , … , 𝑋𝑛 be
independent.
Think of 𝑋𝑖 = 1 as a “success”, so that $Y_n = \sum_{i=1}^{n} X_i$ is the number of successes in 𝑛 trials.
The next figure plots the probability mass function of 𝑌𝑛 for 𝑛 = 1, 2, 4, 8
In [12]: binomial_pdf(n) =
bar(0:n, pdf.(Binomial(n), 0:n),
xticks = 0:10, ylim = (0, 1), yticks = 0:0.1:1,
label = "Binomial($n, 0.5)", legend = :topleft)
In [13]: plot(binomial_pdf.((1,2,4,8))...)
When 𝑛 = 1, the distribution is flat — one success or no successes have the same probability.
When 𝑛 = 2 we can either have 0, 1 or 2 successes.
Notice the peak in probability mass at the mid-point 𝑘 = 1.
The reason is that there are more ways to get 1 success (“fail then succeed” or “succeed then
fail”) than to get zero or two successes.
Moreover, the two trials are independent, so the outcomes “fail then succeed” and “succeed
then fail” are just as likely as the outcomes “fail then fail” and “succeed then succeed”.
(If there was positive correlation, say, then “succeed then fail” would be less likely than “suc-
ceed then succeed”)
Here, already we have the essence of the CLT: addition under independence leads probability
mass to pile up in the middle and thin out at the tails.
For 𝑛 = 4 and 𝑛 = 8 we again get a peak at the “middle” value (halfway between the mini-
mum and the maximum possible value).
The intuition is the same — there are simply more ways to get these middle outcomes.
If we continue, the bell-shaped curve becomes ever more pronounced.
We are witnessing the binomial approximation of the normal distribution.
17.5.3 Simulation 1
Since the CLT seems almost magical, running simulations that verify its implications is one
good way to build intuition.
To this end, we now perform the following simulation
1. Choose an arbitrary distribution 𝐹 for the underlying observations 𝑋𝑖.
2. Generate independent draws of $Y_n := \sqrt{n}(\bar{X}_n - \mu)$.
3. Use these draws to compute some measure of their distribution — such as a histogram.
4. Compare the latter to 𝑁(0, 𝜎2).
Here’s some code that does exactly this for the exponential distribution 𝐹 (𝑥) = 1 − 𝑒−𝜆𝑥 .
(Please experiment with other choices of 𝐹 , but remember that, to conform with the condi-
tions of the CLT, the distribution must have finite second moment)
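The definition of simulation1 was lost in extraction; a minimal sketch, assuming the Distributions/StatsPlots setup used elsewhere in these lectures, is:

function simulation1(distribution; n = 250, k = 10_000)
    σ = std(distribution)
    y = rand(distribution, n, k)   # k independent samples of size n
    y = sqrt(n) .* (vec(mean(y, dims = 1)) .- mean(distribution))
    histogram(y, normalize = true, bins = 50, label = "Scaled sample means")
    return plot!(Normal(0, σ), linestyle = :dash, color = :black,
                 label = "N(0, σ²)", grid = false)
end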
In [15]: simulation1(Exponential(0.5))
The fit to the normal density is already tight, and can be further improved by increasing n.
You can also experiment with other specifications of 𝐹 .
17.5.4 Simulation 2
Our next simulation is somewhat like the first, except that we aim to track the distribution of $Y_n := \sqrt{n}(\bar{X}_n - \mu)$ as 𝑛 increases.
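The definition of simulation2 was also lost; a sketch consistent with the plotting loop below — returning a matrix whose i-th column holds draws of Y𝑛 for n = i — is:

function simulation2(distribution = Exponential(0.5); ns = 5, k = 5_000)
    μ = mean(distribution)
    ys = zeros(k, ns)
    for n in 1:ns, j in 1:k
        ys[j, n] = sqrt(n) * (mean(rand(distribution, n)) - μ)
    end
    return ys
end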
In [17]: ys = simulation2()
plots = [] # would preallocate in optimized code
for i in 1:size(ys, 2)
p = density(ys[:, i], linealpha = i, title = "n = $i")
push!(plots, p)
end
The law of large numbers and central limit theorem work just as nicely in multidimensional
settings.
To state the results, let’s recall some elementary facts about random vectors.
A random vector X is just a sequence of 𝑘 random variables (𝑋1 , … , 𝑋𝑘 ).
Each realization of X is an element of ℝ𝑘 .
A collection of random vectors X1, … , X𝑛 is called independent if, given any 𝑛 vectors x1, … , x𝑛 in ℝ𝑘, we have

$$\mathbb{P}\{\mathbf{X}_1 \leq \mathbf{x}_1, \ldots, \mathbf{X}_n \leq \mathbf{x}_n\} = \mathbb{P}\{\mathbf{X}_1 \leq \mathbf{x}_1\} \times \cdots \times \mathbb{P}\{\mathbf{X}_n \leq \mathbf{x}_n\}$$
The vector of means of X is

$$\mathbb{E}[\mathbf{X}] := \begin{pmatrix} \mathbb{E}[X_1] \\ \mathbb{E}[X_2] \\ \vdots \\ \mathbb{E}[X_k] \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_k \end{pmatrix} =: \mu$$
Letting

$$\bar{\mathbf{X}}_n := \frac{1}{n} \sum_{i=1}^{n} \mathbf{X}_i$$

the LLN in this setting tells us that

$$\mathbb{P}\left\{ \bar{\mathbf{X}}_n \to \mu \text{ as } n \to \infty \right\} = 1 \qquad (7)$$

The CLT tells us that, provided the variance-covariance matrix Σ of each X𝑖 is finite,

$$\sqrt{n} \, (\bar{\mathbf{X}}_n - \mu) \stackrel{d}{\to} N(0, \Sigma) \quad \text{as} \quad n \to \infty \qquad (8)$$
17.6 Exercises
17.6.1 Exercise 1
One consequence of the CLT is the so-called delta method: in the setting of the classical IID CLT, if 𝑔 is differentiable at 𝜇 with 𝑔′(𝜇) ≠ 0, then

$$\sqrt{n}\{g(\bar{X}_n) - g(\mu)\} \stackrel{d}{\to} N(0, g'(\mu)^2 \sigma^2) \quad \text{as} \quad n \to \infty \qquad (9)$$
This theorem is used frequently in statistics to obtain the asymptotic distribution of estima-
tors — many of which can be expressed as functions of sample means.
(These kinds of results are often said to use the “delta method”)
The proof is based on a Taylor expansion of 𝑔 around the point 𝜇.
Taking the result as given, let the distribution 𝐹 of each 𝑋𝑖 be uniform on [0, 𝜋/2] and let
𝑔(𝑥) = sin(𝑥).
√
Derive the asymptotic distribution of 𝑛{𝑔(𝑋̄ 𝑛 ) − 𝑔(𝜇)} and illustrate convergence in the
same spirit as the program illustrate_clt.jl discussed above.
What happens when you replace [0, 𝜋/2] with [0, 𝜋]?
What is the source of the problem?
17.6.2 Exercise 2
Here’s a result that’s often used in developing statistical tests, and is connected to the multi-
variate central limit theorem.
If you study econometric theory, you will see this result used again and again.
Assume the setting of the multivariate CLT discussed above, so that
1. X1, … , X𝑛 is a sequence of IID random vectors taking values in ℝ𝑘
2. 𝜇 ∶= 𝔼[X𝑖], and Σ is the variance-covariance matrix of X𝑖
3. The convergence

$$\sqrt{n} \, (\bar{\mathbf{X}}_n - \mu) \stackrel{d}{\to} N(0, \Sigma) \qquad (10)$$

is valid.
In a statistical setting, one often wants the right hand side to be standard normal, so that
confidence intervals are easily computed.
This normalization can be achieved on the basis of three observations.
First, if X is a random vector in ℝ𝑘 and A is constant and 𝑘 × 𝑘, then
Var[AX] = A Var[X]A′
Second, by the continuous mapping theorem, if $\mathbf{Z}_n \stackrel{d}{\to} \mathbf{Z}$ in ℝ𝑘 and A is constant and 𝑘 × 𝑘, then

$$\mathbf{A} \mathbf{Z}_n \stackrel{d}{\to} \mathbf{A} \mathbf{Z}$$
Third, if S is a 𝑘×𝑘 symmetric positive definite matrix, then there exists a symmetric positive
definite matrix Q, called the inverse square root of S, such that
QSQ′ = I
Putting these facts together, your first exercise is to show that if Q is the inverse square root of Σ, then

$$\mathbf{Z}_n := \sqrt{n} \, \mathbf{Q} (\bar{\mathbf{X}}_n - \mu) \stackrel{d}{\to} \mathbf{Z} \sim N(0, \mathbf{I})$$
Applying the continuous mapping theorem one more time tells us that

$$\| \mathbf{Z}_n \|^2 \stackrel{d}{\to} \| \mathbf{Z} \|^2$$

Since the squared norm of a standard normal vector has a chi-squared distribution, we conclude that

$$n \| \mathbf{Q} (\bar{\mathbf{X}}_n - \mu) \|^2 \stackrel{d}{\to} \chi^2(k) \qquad (11)$$

where 𝜒2(𝑘) denotes the chi-squared distribution with 𝑘 degrees of freedom.
Your second exercise is to illustrate the convergence in (11) by simulation, using the random vector

$$\mathbf{X}_i := \begin{pmatrix} W_i \\ U_i + W_i \end{pmatrix}$$
where
• each 𝑊𝑖 is an IID draw from the uniform distribution on [−1, 1]
• each 𝑈𝑖 is an IID draw from the uniform distribution on [−2, 2]
• 𝑈𝑖 and 𝑊𝑖 are independent of each other
Hints: sqrt(Σ) computes a symmetric square root of Σ, so that inv(sqrt(Σ)) gives the inverse square root Q.
17.7 Solutions
17.7.1 Exercise 1
In [18]: function exercise1(distribution = Uniform(0, π / 2); n = 250,
                            k = 10_000, g = sin, g′ = cos)  # reconstructed signature
μ, σ = mean(distribution), std(distribution)
y = rand(distribution, n, k)
y = mean(y, dims = 1)
y = vec(y)
error_obs = sqrt(n) .* (g.(y) .- g.(μ))
density(error_obs, label = "Empirical Density")
    return plot!(Normal(0, g′(μ) .* σ), linestyle = :dash,
                 label = "Asymptotic", color = :black)
end
exercise1()
What happens when you replace [0, 𝜋/2] with [0, 𝜋]?
In this case, the mean 𝜇 of this distribution is 𝜋/2, and since 𝑔′ = cos, we have 𝑔′ (𝜇) = 0.
Hence the conditions of the delta theorem are not satisfied.
17.7.2 Exercise 2
First we want to verify the claim that

$$\sqrt{n} \, \mathbf{Q} (\bar{\mathbf{X}}_n - \mu) \stackrel{d}{\to} N(0, \mathbf{I})$$

This is straightforward given the facts presented in the exercise. Let

$$\mathbf{Y}_n := \sqrt{n} \, (\bar{\mathbf{X}}_n - \mu) \quad \text{and} \quad \mathbf{Y} \sim N(0, \Sigma)$$

By the multivariate CLT and the continuous mapping theorem, we have

$$\mathbf{Q} \mathbf{Y}_n \stackrel{d}{\to} \mathbf{Q} \mathbf{Y}$$
Since linear combinations of normal random variables are normal, the vector QY is also normal.
Its mean is clearly 0, and its variance-covariance matrix is

$$\mathrm{Var}[\mathbf{Q} \mathbf{Y}] = \mathbf{Q} \, \mathrm{Var}[\mathbf{Y}] \, \mathbf{Q}' = \mathbf{Q} \Sigma \mathbf{Q}' = \mathbf{I}$$
In conclusion, $\mathbf{Q}\mathbf{Y}_n \stackrel{d}{\to} \mathbf{Q}\mathbf{Y} \sim N(0, \mathbf{I})$, which is what we aimed to show.
Now we turn to the simulation exercise.
Our solution is as follows
In [19]: function exercise2(; n = 250, k = 50_000,
                            dw = Uniform(-1, 1), du = Uniform(-2, 2))
             # reconstructed signature; the distributions match the exercise
             vw = var(dw)
vu = var(du)
Σ = [vw vw
vw vw + vu]
Q = inv(sqrt(Σ))
function generate_data(dw, du, n)
dw = rand(dw, n)
X = [dw dw + rand(du, n)]
return sqrt(n) * mean(X, dims = 1)
end
X = mapreduce(x -> generate_data(dw, du, n), vcat, 1:k)
X = Q * X'
X = sum(abs2, X, dims = 1)
X = vec(X)
density(X, label = "", xlim = (0, 10))
    return plot!(Chisq(2), color = :black, linestyle = :dash,
                 label = "Chi-squared with 2 degrees of freedom",
                 grid = false)
end
exercise2()
Chapter 18

Linear State Space Models
18.1 Contents
• Overview 18.2
• The Linear State Space Model 18.3
• Distributions and Moments 18.4
• Stationarity and Ergodicity 18.5
• Noisy Observations 18.6
• Prediction 18.7
• Code 18.8
• Exercises 18.9
• Solutions 18.10
“We may regard the present state of the universe as the effect of its past and the
cause of its future” – Marquis de Laplace
18.2 Overview
18.2.1 Setup
18.3 The Linear State Space Model
The linear state space system takes the form

$$x_{t+1} = A x_t + C w_{t+1}, \qquad y_t = G x_t \qquad (1)$$

where {𝑤𝑡} is IID with 𝑤𝑡 ∼ 𝑁(0, 𝐼).

18.3.1 Primitives
The primitives of the model are
1. the matrices 𝐴, 𝐶, 𝐺
2. the shock distribution, which we have specialized to 𝑁(0, 𝐼)
3. the distribution of the initial condition 𝑥0, which we have set to 𝑁(𝜇0, Σ0)
Given 𝐴, 𝐶, 𝐺 and draws of 𝑥0 and 𝑤1 , 𝑤2 , …, the model (1) pins down the values of the se-
quences {𝑥𝑡 } and {𝑦𝑡 }.
Even without these draws, the primitives 1–3 pin down the probability distributions of {𝑥𝑡 }
and {𝑦𝑡 }.
Later we’ll see how to compute these distributions and their moments.
We’ve made the common assumption that the shocks are independent standardized normal
vectors.
But some of what we say will be valid under the assumption that {𝑤𝑡+1 } is a martingale
difference sequence.
A martingale difference sequence is a sequence that is zero mean when conditioned on past
information.
In the present case, since {𝑥𝑡} is our state sequence, this means that it satisfies

$$\mathbb{E}[w_{t+1} \mid x_t, x_{t-1}, \ldots] = 0$$

This is a weaker condition than that {𝑤𝑡} is IID with 𝑤𝑡+1 ∼ 𝑁(0, 𝐼).
18.3.2 Examples
By appropriate choice of the primitives, a variety of dynamics can be represented. Consider, for example, the deterministic second-order difference equation

$$y_{t+1} = \phi_0 + \phi_1 y_t + \phi_2 y_{t-1} \qquad (2)$$

To map (2) into our state space system (1), we set

$$x_t = \begin{bmatrix} 1 \\ y_t \\ y_{t-1} \end{bmatrix}, \quad
A = \begin{bmatrix} 1 & 0 & 0 \\ \phi_0 & \phi_1 & \phi_2 \\ 0 & 1 & 0 \end{bmatrix}, \quad
C = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}, \quad
G = \begin{bmatrix} 0 & 1 & 0 \end{bmatrix}$$
You can confirm that under these definitions, (1) and (2) agree.
The next figure shows dynamics of this process when 𝜙0 = 1.1, 𝜙1 = 0.8, 𝜙2 = −0.8, 𝑦0 =
𝑦−1 = 1
Univariate Autoregressive Processes
To represent the model 𝑦𝑡+1 = 𝜙1𝑦𝑡 + 𝜙2𝑦𝑡−1 + 𝜙3𝑦𝑡−2 + 𝜙4𝑦𝑡−3 + 𝜎𝑤𝑡+1, take $x_t = [y_t \;\; y_{t-1} \;\; y_{t-2} \;\; y_{t-3}]'$ and

$$A = \begin{bmatrix} \phi_1 & \phi_2 & \phi_3 & \phi_4 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}, \quad
C = \begin{bmatrix} \sigma \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad
G = \begin{bmatrix} 1 & 0 & 0 & 0 \end{bmatrix}$$
The matrix 𝐴 has the form of the companion matrix to the vector [𝜙1 𝜙2 𝜙3 𝜙4 ].
The next figure shows the dynamics of this process for one such parameterization.
Vector Autoregressions
Now suppose that 𝑦𝑡 is a vector and the coefficients 𝜙𝑗 are matrices; the analogous state space representation is

$$x_t = \begin{bmatrix} y_t \\ y_{t-1} \\ y_{t-2} \\ y_{t-3} \end{bmatrix}, \quad
A = \begin{bmatrix} \phi_1 & \phi_2 & \phi_3 & \phi_4 \\ I & 0 & 0 & 0 \\ 0 & I & 0 & 0 \\ 0 & 0 & I & 0 \end{bmatrix}, \quad
C = \begin{bmatrix} \sigma \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad
G = \begin{bmatrix} I & 0 & 0 & 0 \end{bmatrix}$$
Seasonals
We can use (1) to represent the deterministic seasonal 𝑦𝑡 = 𝑦𝑡−4 by taking

$$A = \begin{bmatrix} 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}$$

It is easy to check that 𝐴4 = 𝐼, which implies that 𝑥𝑡 is strictly periodic with period 4:

$$x_{t+4} = x_t$$
Such an 𝑥𝑡 process can be used to model deterministic seasonals in quarterly time series.
The indeterministic seasonal produces recurrent, but aperiodic, seasonal fluctuations.
Time Trends
The model 𝑦𝑡 = 𝑎𝑡 + 𝑏 is known as a linear time trend. We can represent this model in the linear state space form by taking

$$A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \quad
C = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \quad
G = \begin{bmatrix} a & b \end{bmatrix} \qquad (4)$$

and starting at initial condition $x_0 = [0 \;\; 1]'$.
In fact it’s possible to use the state-space system to represent polynomial trends of any order.
For instance, let
$$x_0 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}, \quad
A = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}, \quad
C = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$
It follows that

$$A^t = \begin{bmatrix} 1 & t & t(t-1)/2 \\ 0 & 1 & t \\ 0 & 0 & 1 \end{bmatrix}$$
Then 𝑥′𝑡 = [𝑡(𝑡 − 1)/2 𝑡 1], so that 𝑥𝑡 contains linear and quadratic time trends.
Iterating backwards on the law of motion in (1) gives the moving average representation

$$
x_t = A x_{t-1} + C w_t
= A^2 x_{t-2} + A C w_{t-1} + C w_t
= \cdots
= \sum_{j=0}^{t-1} A^j C w_{t-j} + A^t x_0 \qquad (5)
$$
As an example, consider the system with

$$A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \qquad C = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$$

You will be able to show that $A^t = \begin{bmatrix} 1 & t \\ 0 & 1 \end{bmatrix}$ and $A^j C = [1 \;\; 0]'$.

Substituting into the moving average representation (5), we obtain

$$x_{1t} = \sum_{j=0}^{t-1} w_{t-j} + [1 \;\; t] \, x_0$$

where 𝑥1𝑡 is the first entry of 𝑥𝑡.
18.4 Distributions and Moments
18.4.1 Unconditional Moments
Using (1), it's easy to obtain expressions for the (unconditional) means of 𝑥𝑡 and 𝑦𝑡.
We'll explain what unconditional and conditional mean soon.
Letting 𝜇𝑡 ∶= 𝔼[𝑥𝑡] and using linearity of expectations, we find that

$$\mu_{t+1} = A \mu_t \quad \text{with} \quad \mu_0 \text{ given} \qquad (6)$$

Similarly, letting Σ𝑡 denote the variance-covariance matrix of 𝑥𝑡, one can show that

$$\Sigma_{t+1} = A \Sigma_t A' + C C' \quad \text{with} \quad \Sigma_0 \text{ given} \qquad (7)$$
18.4.2 Distributions
In general, knowing the mean and variance-covariance matrix of a random vector is not quite
as good as knowing the full distribution.
However, there are some situations where these moments alone tell us all we need to know.
These are situations in which the mean vector and covariance matrix are sufficient statis-
tics for the population distribution.
(Sufficient statistics form a list of objects that characterize a population distribution)
One such situation is when the vector in question is Gaussian (i.e., normally distributed).
This is the case here, given our Gaussian assumptions on the primitives.
In particular, since (1) is linear and normality is preserved under linear operations, both 𝑥𝑡 and 𝑦𝑡 are Gaussian for all 𝑡 ≥ 0.
Since 𝑥𝑡 is Gaussian, to find the distribution, all we need to do is find its mean and variance-
covariance matrix.
But in fact we’ve already done this, in (6) and (7).
Letting 𝜇𝑡 and Σ𝑡 be as defined by these equations, we have
𝑥𝑡 ∼ 𝑁 (𝜇𝑡 , Σ𝑡 ) (11)
In the right-hand figure, these values are converted into a rotated histogram that shows rela-
tive frequencies from our sample of 20 𝑦𝑇 ’s.
(The parameters and source code for the figures can be found in the file linear_models/paths_and_hist.jl)
Here is another figure, this time with 100 observations
Let’s now try with 500,000 observations, showing only the histogram (without rotation)
Ensemble means
Just as the histogram approximates the population distribution, the ensemble or cross-
sectional average
$$\bar{y}_T := \frac{1}{I} \sum_{i=1}^{I} y_T^i$$
approximates the expectation 𝔼[𝑦𝑇 ] = 𝐺𝜇𝑇 (as implied by the law of large numbers).
Here’s a simulation comparing the ensemble averages and population means at time points
𝑡 = 0, … , 50.
The parameters are the same as for the preceding figures, and the sample size is relatively
small (𝐼 = 20).
$$\bar{x}_T := \frac{1}{I} \sum_{i=1}^{I} x_T^i \to \mu_T \qquad (I \to \infty)$$
$$\frac{1}{I} \sum_{i=1}^{I} (x_T^i - \bar{x}_T)(x_T^i - \bar{x}_T)' \to \Sigma_T \qquad (I \to \infty)$$
The joint distribution of the sequence 𝑥0, 𝑥1, … , 𝑥𝑇 factors as

$$p(x_0, x_1, \ldots, x_T) = p(x_0) \prod_{t=0}^{T-1} p(x_{t+1} \mid x_t)$$

where the conditional densities are Gaussian:

$$p(x_{t+1} \mid x_t) = N(A x_t, \; C C')$$
Autocovariance functions
An important object related to the joint distribution is the autocovariance function Σ𝑡+𝑗,𝑡 ∶= 𝔼[(𝑥𝑡+𝑗 − 𝜇𝑡+𝑗)(𝑥𝑡 − 𝜇𝑡)′]. Elementary calculations show that

$$\Sigma_{t+j, t} = A^j \Sigma_t \qquad (14)$$
Notice that Σ𝑡+𝑗,𝑡 in general depends on both 𝑗, the gap between the two dates, and 𝑡, the
earlier date.
18.5 Stationarity and Ergodicity
Stationarity and ergodicity are two properties that, when they hold, greatly aid analysis of linear state space models.
Let’s start with the intuition.
Let’s look at some more time series from the same model that we analyzed above.
This picture shows cross-sectional distributions for 𝑦 at times 𝑇 , 𝑇 ′ , 𝑇 ″
Note how the time series “settle down” in the sense that the distributions at 𝑇 ′ and 𝑇 ″ are
relatively similar to each other — but unlike the distribution at 𝑇 .
Apparently, the distributions of 𝑦𝑡 converge to a fixed long-run distribution as 𝑡 → ∞.
When such a distribution exists it is called a stationary distribution.
Since the process is Gaussian, the stationary distribution, when it exists, has the form

$$\psi_\infty = N(\mu_\infty, \Sigma_\infty)$$

where 𝜇∞ and Σ∞ are fixed points of (6) and (7) respectively.
Let’s see what happens to the preceding figure if we start 𝑥0 at the stationary distribution.
Now the differences in the observed distributions at 𝑇 , 𝑇 ′ and 𝑇 ″ come entirely from random
fluctuations due to the finite sample size.
By
• our choosing 𝑥0 ∼ 𝑁 (𝜇∞ , Σ∞ )
• the definitions of 𝜇∞ and Σ∞ as fixed points of (6) and (7) respectively
we’ve ensured that
Moreover, in view of (14), the autocovariance function takes the form Σ𝑡+𝑗,𝑡 = 𝐴𝑗 Σ∞ , which
depends on 𝑗 but not on 𝑡.
This motivates the following definition.
A process {𝑥𝑡 } is said to be covariance stationary if
• both 𝜇𝑡 and Σ𝑡 are constant in 𝑡
• Σ𝑡+𝑗,𝑡 depends on the time gap 𝑗 but not on time 𝑡
In our setting, {𝑥𝑡 } will be covariance stationary if 𝜇0 , Σ0 , 𝐴, 𝐶 assume values that imply that
none of 𝜇𝑡 , Σ𝑡 , Σ𝑡+𝑗,𝑡 depends on 𝑡.
The difference equation 𝜇𝑡+1 = 𝐴𝜇𝑡 is known to have unique fixed point 𝜇∞ = 0 if all eigen-
values of 𝐴 have moduli strictly less than unity.
That is, if all(abs(eigvals(A)) .< 1) == true.
The difference equation (7) also has a unique fixed point in this case, and, moreover
𝜇𝑡 → 𝜇∞ = 0 and Σ𝑡 → Σ∞ as 𝑡→∞
Processes with a Constant State Component
To investigate such processes, suppose that 𝐴 and 𝐶 take the form

$$A = \begin{bmatrix} A_1 & a \\ 0 & 1 \end{bmatrix}, \qquad C = \begin{bmatrix} C_1 \\ 0 \end{bmatrix}$$
where
• 𝐴1 is an (𝑛 − 1) × (𝑛 − 1) matrix
• 𝑎 is an (𝑛 − 1) × 1 column vector
Let $x_t = [x_{1t}' \;\; 1]'$ where 𝑥1𝑡 is (𝑛 − 1) × 1.
It follows that

$$x_{1,t+1} = A_1 x_{1t} + a + C_1 w_{t+1}$$

Let 𝜇1𝑡 = 𝔼[𝑥1𝑡] and take expectations on both sides of this expression to get

$$\mu_{1,t+1} = A_1 \mu_{1t} + a \qquad (15)$$
Assume now that the moduli of the eigenvalues of 𝐴1 are all strictly less than one.
Then (15) has a unique stationary solution, namely,
𝜇1∞ = (𝐼 − 𝐴1 )−1 𝑎
The stationary value of 𝜇𝑡 itself is then $\mu_\infty := [\mu_{1\infty}' \;\; 1]'$.
The stationary values of Σ𝑡 and Σ𝑡+𝑗,𝑡 satisfy

$$\Sigma_\infty = A \Sigma_\infty A' + C C', \qquad \Sigma_{t+j,t} = A^j \Sigma_\infty \qquad (16)$$
Notice that here Σ𝑡+𝑗,𝑡 depends on the time gap 𝑗 but not on calendar time 𝑡.
In conclusion, if
• 𝑥0 ∼ 𝑁 (𝜇∞ , Σ∞ ) and
• the moduli of the eigenvalues of 𝐴1 are all strictly less than unity
then the {𝑥𝑡 } process is covariance stationary, with constant state component
Note
If the eigenvalues of 𝐴1 are less than unity in modulus, then (a) starting from any
initial value, the mean and variance-covariance matrix both converge to their sta-
tionary values; and (b) iterations on (7) converge to the fixed point of the discrete
Lyapunov equation in the first line of (16).
18.5.5 Ergodicity
Ensemble averages across simulations are interesting theoretically, but in real life we usually
observe only a single realization {𝑥𝑡 , 𝑦𝑡 }𝑇𝑡=0 .
So now let’s take a single realization and form the time series averages
1 𝑇 1 𝑇
𝑥̄ ∶= ∑𝑥 and 𝑦 ̄ ∶= ∑𝑦
𝑇 𝑡=1 𝑡 𝑇 𝑡=1 𝑡
Do these time series averages converge to something interpretable in terms of our basic state-
space representation?
The answer depends on something called ergodicity.
Ergodicity is the property that time series and ensemble averages coincide.
More formally, ergodicity implies that time series sample averages converge to their expecta-
tion under the stationary distribution.
In particular,
• $\frac{1}{T} \sum_{t=1}^{T} x_t \to \mu_\infty$
• $\frac{1}{T} \sum_{t=1}^{T} (x_t - \bar{x}_T)(x_t - \bar{x}_T)' \to \Sigma_\infty$
• $\frac{1}{T} \sum_{t=1}^{T} (x_{t+j} - \bar{x}_T)(x_t - \bar{x}_T)' \to A^j \Sigma_\infty$
In our linear Gaussian setting, any covariance stationary process is also ergodic.

18.6 Noisy Observations
In some settings the observation equation 𝑦𝑡 = 𝐺𝑥𝑡 is modified to include an error term.
Often this error term represents the idea that the true state can only be observed imperfectly.
To include an error term in the observation we introduce
• An iid sequence of ℓ × 1 random vectors 𝑣𝑡 ∼ 𝑁 (0, 𝐼)
• A 𝑘 × ℓ matrix 𝐻
and extend the linear state-space system to

$$x_{t+1} = A x_t + C w_{t+1}, \qquad y_t = G x_t + H v_t$$

As before, 𝑥𝑡 is Gaussian, and now so is 𝑦𝑡, with

$$y_t \sim N(G \mu_t, \; G \Sigma_t G' + H H')$$
18.7 Prediction
The theory of prediction for linear state space systems is elegant and simple.
The natural way to predict variables is to use conditional distributions; the optimal forecast of 𝑥𝑡+1 given information known at time 𝑡 is

$$\mathbb{E}_t[x_{t+1}] := \mathbb{E}[x_{t+1} \mid x_t, x_{t-1}, \ldots, x_0] = A x_t$$

The right-hand side follows from 𝑥𝑡+1 = 𝐴𝑥𝑡 + 𝐶𝑤𝑡+1 and the fact that 𝑤𝑡+1 is zero mean and independent of 𝑥𝑡, 𝑥𝑡−1, … , 𝑥0.
More generally, we’d like to compute the 𝑗-step ahead forecasts 𝔼𝑡 [𝑥𝑡+𝑗 ] and 𝔼𝑡 [𝑦𝑡+𝑗 ].
With a bit of algebra we obtain
$$x_{t+j} = A^j x_t + A^{j-1} C w_{t+1} + A^{j-2} C w_{t+2} + \cdots + A C w_{t+j-1} + C w_{t+j}$$
In view of the iid property, current and past state values provide no information about future
values of the shock.
Hence 𝔼𝑡 [𝑤𝑡+𝑘 ] = 𝔼[𝑤𝑡+𝑘 ] = 0.
It now follows from linearity of expectations that the 𝑗-step ahead forecast of 𝑥 is
$$\mathbb{E}_t[x_{t+j}] = A^j x_t$$
Similarly, since 𝑦𝑡+𝑗 = 𝐺𝑥𝑡+𝑗, the 𝑗-step ahead forecast of 𝑦 is $\mathbb{E}_t[y_{t+j}] = G A^j x_t$.
It is useful to obtain the covariance matrix of the vector of 𝑗-step-ahead prediction errors
$$x_{t+j} - \mathbb{E}_t[x_{t+j}] = \sum_{s=0}^{j-1} A^s C w_{t-s+j} \tag{20}$$
Evidently,
$$V_j := \mathbb{E}_t\left[(x_{t+j} - \mathbb{E}_t[x_{t+j}])(x_{t+j} - \mathbb{E}_t[x_{t+j}])'\right] = \sum_{k=0}^{j-1} A^k C C' (A^k)' \tag{21}$$
𝑉𝑗 is the conditional covariance matrix of the errors in forecasting 𝑥𝑡+𝑗 , conditioned on time 𝑡
information 𝑥𝑡 .
Under particular conditions (for example, when all eigenvalues of 𝐴 are strictly less than one in modulus), 𝑉𝑗 converges to
$$V_\infty = CC' + A V_\infty A' \tag{23}$$
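The sum in (21) can be built up recursively, since 𝑉1 = 𝐶𝐶′ and 𝑉𝑗+1 = 𝐶𝐶′ + 𝐴𝑉𝑗𝐴′. A minimal sketch, with hypothetical matrices chosen only for illustration:

using LinearAlgebra
# V_j from (21) via the recursion V_1 = CC', V_{j+1} = CC' + A V_j A'
function forecast_error_cov(A, C, j)
    V = C * C'
    for _ in 2:j
        V = C * C' + A * V * A'
    end
    return V
end
A = [0.9 0.0; 0.1 0.6] # hypothetical stable matrix
C = [0.2 0.0; 0.0 0.1] # hypothetical shock loading
V_50 = forecast_error_cov(A, C, 50) # for large j, close to the fixed point V_∞ of (23)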
In several contexts, we want to compute forecasts of geometric sums of future random vari-
ables governed by the linear state-space system (1).
We want the following objects
• Forecast of a geometric sum of future 𝑥’s, or $\mathbb{E}_t\left[\sum_{j=0}^{\infty} \beta^j x_{t+j}\right]$.
• Forecast of a geometric sum of future 𝑦’s, or $\mathbb{E}_t\left[\sum_{j=0}^{\infty} \beta^j y_{t+j}\right]$.
These objects are important components of some famous and interesting dynamic models.
For example,
• if {𝑦𝑡} is a stream of dividends, then $\mathbb{E}\left[\sum_{j=0}^{\infty} \beta^j y_{t+j} \mid x_t\right]$ is a model of a stock price
• if {𝑦𝑡} is the money supply, then $\mathbb{E}\left[\sum_{j=0}^{\infty} \beta^j y_{t+j} \mid x_t\right]$ is a model of the price level
Formulas
Provided that the eigenvalues of 𝛽𝐴 are all strictly less than one in modulus (so that the geometric series converges), the forecasts are
$$\mathbb{E}_t\left[\sum_{j=0}^{\infty} \beta^j x_{t+j}\right] = [I + \beta A + \beta^2 A^2 + \cdots]\, x_t = [I - \beta A]^{-1} x_t$$
$$\mathbb{E}_t\left[\sum_{j=0}^{\infty} \beta^j y_{t+j}\right] = G[I + \beta A + \beta^2 A^2 + \cdots]\, x_t = G[I - \beta A]^{-1} x_t$$
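In code, these forecasts reduce to a single linear solve. A minimal sketch (the matrices and state vector below are hypothetical, with the spectral radius of 𝛽𝐴 below one):

using LinearAlgebra
β = 0.96
A = [0.8 0.1; 0.0 0.5] # hypothetical transition matrix
G = [1.0 0.0]          # hypothetical observation matrix
x_t = [1.0, 0.5]       # hypothetical current state
forecast_x = (I - β * A) \ x_t # E_t Σ_j β^j x_{t+j}
forecast_y = G * forecast_x    # E_t Σ_j β^j y_{t+j}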
18.8 Code
Our preceding simulations and calculations are based on code in the file lss.jl from the QuantEcon.jl package.
The code implements a type for linear state space models that can be operated on directly through specific methods (for simulations, calculating moments, etc.).
Examples of usage are given in the solutions to the exercises.
18.9 Exercises
18.9.1 Exercise 1
18.9.2 Exercise 2
18.9.3 Exercise 3
18.9.4 Exercise 4
18.10 Solutions
18.10.1 Exercise 1
ϕ0, ϕ1, ϕ2 = 1.1, 0.8, -0.8 # assumed parameter values; the original definitions were lost in extraction
A = [1.0 0.0 0
     ϕ0 ϕ1 ϕ2
     0.0 1.0 0.0]
C = zeros(3, 1)
G = [0.0 1.0 0.0]
μ_0 = ones(3)
lss = LSS(A, C, G; mu_0 = μ_0) # construct the model simulated below
x, y = simulate(lss, 50)
plot(dropdims(y, dims = 1), color = :blue, linewidth = 2, alpha = 0.7)
plot!(xlabel="time", ylabel = "y_t", legend = :none)
Out[4]:
18.10.2 Exercise 2
ϕ1, ϕ2, ϕ3, ϕ4 = 0.5, -0.2, 0, 0.5 # assumed parameter values; definitions lost in extraction
σ = 0.1
A = [ϕ1 ϕ2 ϕ3 ϕ4
     1.0 0.0 0.0 0.0
     0.0 1.0 0.0 0.0
     0.0 0.0 1.0 0.0]
C = [σ
     0.0
     0.0
     0.0]
G = [1.0 0.0 0.0 0.0]
ar = LSS(A, C, G; mu_0 = ones(4)) # reconstruction; the remaining cell contents were lost
Out[5]:
18.10.3 Exercise 3
ϕ1, ϕ2, ϕ3, ϕ4 = 0.5, -0.2, 0, 0.5 # assumed parameter values; definitions lost in extraction
σ = 0.1
A = [ϕ1 ϕ2 ϕ3 ϕ4
1.0 0.0 0.0 0.0
0.0 1.0 0.0 0.0
0.0 0.0 1.0 0.0]
C = [σ
0.0
0.0
0.0]
G = [1.0 0.0 0.0 0.0]
I = 20
T = 50
ar = LSS(A, C, G; mu_0 = ones(4))
ymin, ymax = -0.5, 1.15
ensemble_mean = zeros(T)
ys = []
for i in 1:I
x, y = simulate(ar, T)
y = dropdims(y, dims = 1)
push!(ys, y)
ensemble_mean .+= y
end
ensemble_mean = ensemble_mean ./ I
plot(ys, color = :blue, alpha = 0.2, linewidth = 0.8, label = "")
plot!(ensemble_mean, color = :blue, linewidth = 2, label = "y_t_bar")
m = moment_sequence(ar)
pop_means = zeros(0)
# accumulate the population mean of y_t (reconstruction; the original loop was lost in extraction)
for (t, moments) in enumerate(m)
    (μ_x, μ_y, Σ_x, Σ_y) = moments
    push!(pop_means, μ_y[1])
    t == T && break
end
plot!(pop_means, color = :green, linewidth = 2, label = "G mu_t")
Out[6]:
18.10.4 Exercise 4
ϕ1, ϕ2, ϕ3, ϕ4 = 0.5, -0.2, 0, 0.5 # assumed parameter values; definitions lost in extraction
σ = 0.1
A = [ϕ1 ϕ2 ϕ3 ϕ4
1.0 0.0 0.0 0.0
0.0 1.0 0.0 0.0
0.0 0.0 1.0 0.0]
C = [σ
     0.0
     0.0
     0.0]
G = [1.0 0.0 0.0 0.0]
T0 = 10
T1 = 50
T2 = 75
T4 = 100
ar = LSS(A, C, G; mu_0 = ones(4)) # reconstruction; the original construction was lost in extraction
colors = [:blue, :green, :red] # assumed palette; the original definition was lost
ys = []
x_scatter = []
y_scatter = []
for i in 1:80
rcolor = colors[rand(1:3)]
x, y = simulate(ar, T4)
y = dropdims(y, dims = 1)
push!(ys, y)
x_scatter = [x_scatter; T0; T1; T2]
y_scatter = [y_scatter; y[T0]; y[T1]; y[T2]]
end
Out[7]:
Footnotes
[1] The eigenvalues of 𝐴 are (1, −1, 𝑖, −𝑖).
[2] The correct way to argue this is by induction. Suppose that 𝑥𝑡 is Gaussian. Then (1) and
(10) imply that 𝑥𝑡+1 is Gaussian. Since 𝑥0 is assumed to be Gaussian, it follows that every 𝑥𝑡
is Gaussian. Evidently this implies that each 𝑦𝑡 is Gaussian.
Chapter 19

Finite Markov Chains
19.1 Contents
• Overview 19.2
• Definitions 19.3
• Simulation 19.4
• Marginal Distributions 19.5
• Irreducibility and Aperiodicity 19.6
• Stationary Distributions 19.7
• Ergodicity 19.8
• Computing Expectations 19.9
• Exercises 19.10
• Solutions 19.11
19.2 Overview
Markov chains are one of the most useful classes of stochastic processes, being
• simple, flexible and supported by many elegant theoretical results
• valuable for building intuition about random dynamic models
• central to quantitative modeling in their own right
You will find them in many of the workhorse models of economics and finance.
In this lecture we review some of the theory of Markov chains.
We will also introduce some of the high quality routines for working with Markov chains
available in QuantEcon.jl.
Prerequisite knowledge is basic probability and linear algebra.
19.2.1 Setup
19.3 Definitions
A stochastic matrix (or Markov matrix) is an 𝑛 × 𝑛 square matrix 𝑃 such that each element is nonnegative and each row sums to one.
Each row of 𝑃 can be regarded as a probability mass function over 𝑛 possible outcomes.
It is not too difficult to check that if 𝑃 is a stochastic matrix, then so is the 𝑘-th power 𝑃^𝑘 for all 𝑘 ∈ ℕ.
A Markov chain {𝑋𝑡} on 𝑆 is a sequence of random variables on 𝑆 with the Markov property: the conditional distribution of 𝑋𝑡+1 given the whole history depends only on 𝑋𝑡.
In other words, knowing the current state is enough to know probabilities for future states.
In particular, the dynamics of a Markov chain are fully determined by the set of values
$$P(x, y) := \mathbb{P}\{X_{t+1} = y \mid X_t = x\} \qquad (x, y \in S)$$
By construction,
• 𝑃 (𝑥, 𝑦) is the probability of going from 𝑥 to 𝑦 in one unit of time (one step)
• 𝑃 (𝑥, ⋅) is the conditional distribution of 𝑋𝑡+1 given 𝑋𝑡 = 𝑥
We can view 𝑃 as a stochastic matrix where
𝑃𝑖𝑗 = 𝑃 (𝑥𝑖 , 𝑥𝑗 ) 1 ≤ 𝑖, 𝑗 ≤ 𝑛
Going the other way, if we take a stochastic matrix 𝑃 , we can generate a Markov chain {𝑋𝑡 }
as follows:
1. draw 𝑋0 from a marginal distribution 𝜓
2. for each 𝑡 = 0, 1, …, draw 𝑋𝑡+1 from 𝑃(𝑋𝑡, ⋅)
19.3.3 Example 1
Consider a worker who, at any given time 𝑡, is either unemployed (state 1) or employed (state
2).
Suppose that, over a one month period,
1. An unemployed worker finds a job with probability 𝛼 ∈ (0, 1).
2. An employed worker loses her job and becomes unemployed with probability 𝛽 ∈ (0, 1).
In terms of a Markov chain, we have 𝑆 = {1, 2} and
$$P = \begin{pmatrix} 1 - \alpha & \alpha \\ \beta & 1 - \beta \end{pmatrix}$$
Once we have the values 𝛼 and 𝛽, we can address a range of questions, such as
• What is the average duration of unemployment?
• Over the long-run, what fraction of time does a worker find herself unemployed?
• Conditional on employment, what is the probability of becoming unemployed at least
once over the next 12 months?
We’ll cover such applications below.
19.3.4 Example 2
Consider a Markov chain for the state of the business cycle, with stochastic matrix
$$P = \begin{pmatrix} 0.971 & 0.029 & 0 \\ 0.145 & 0.778 & 0.077 \\ 0 & 0.508 & 0.492 \end{pmatrix}$$
where
• the frequency is monthly
• the first state represents “normal growth”
• the second state represents “mild recession”
• the third state represents “severe recession”
For example, the matrix tells us that when the state is normal growth, the state will again be normal growth next month with probability 0.971.
In general, large values on the main diagonal indicate persistence in the process {𝑋𝑡 }.
This Markov process can also be represented as a directed graph, with edges labeled by tran-
sition probabilities
19.4 Simulation
One natural way to answer questions about Markov chains is to simulate them.
(To approximate the probability of event 𝐸, we can simulate many times and count the frac-
tion of times that 𝐸 occurs)
Nice functionality for simulating Markov chains exists in QuantEcon.jl.
• Efficient, bundled with lots of other useful routines for handling Markov chains.
However, it’s also a good exercise to roll our own routines — let’s do that first and then come
back to the methods in QuantEcon.jl.
In these exercises we’ll take the state space to be 𝑆 = 1, … , 𝑛.
To simulate a Markov chain, we need its stochastic matrix 𝑃 and either an initial state or a
probability distribution 𝜓 for initial state to be drawn from.
The Markov chain is then constructed as discussed above. To repeat:
1. At time 𝑡 = 0, the initial state 𝑋0 is either set to init or drawn from 𝜓.
2. At each subsequent time 𝑡, the new state 𝑋𝑡+1 is drawn from 𝑃(𝑋𝑡, ⋅).
In order to implement this simulation procedure, we need a method for generating draws from a discrete distribution.
For this task we’ll use a Categorical random variable (i.e. a discrete random variable with as-
signed probabilities)
d = Categorical([0.5, 0.3, 0.2]) # reconstruction; the cell producing the output below was lost in extraction
@show rand(d, 5)
@show supertype(typeof(d))
@show pdf(d, 1)
@show support(d)
@show pdf.(d, support(d));

rand(d, 5) = [1, 2, 1, 1, 1]
supertype(typeof(d)) = Distribution{Univariate,Discrete}
pdf(d, 1) = 0.5
support(d) = Base.OneTo(3)
pdf.(d, support(d)) = [0.5, 0.3, 0.2]
We’ll write our code as a function that takes the following three arguments
• A stochastic matrix P
• An initial state init
• A positive integer sample_size representing the length of the time series the function
should return
function mc_sample_path(P; init = 1, sample_size = 1000)
    # reconstruction; parts of this function were lost in extraction
    N = size(P)[1] # number of states (P is assumed square)
    # create a discrete random variable for each row of P
    dists = [Categorical(P[i, :]) for i in 1:N]
    X = fill(0, sample_size) # allocate memory
    X[1] = init # set the initial state
    for t in 2:sample_size
        dist = dists[X[t-1]] # get discrete RV from last state's transition distribution
        X[t] = rand(dist) # draw the new state
    end
    return X
end
Let's test the function using the stochastic matrix
$$P := \begin{pmatrix} 0.4 & 0.6 \\ 0.2 & 0.8 \end{pmatrix} \tag{3}$$
As we’ll see later, for a long series drawn from P, the fraction of the sample that takes value 1
will be about 0.25.
If you run the following code you should get roughly that answer

P = [0.4 0.6; 0.2 0.8]
X = mc_sample_path(P, sample_size = 100_000) # reconstruction; the original cell was lost in extraction
μ_1 = count(X .== 1) / length(X) # fraction of the sample taking value 1

Out[5]: 0.24701
As discussed above, QuantEcon.jl has routines for handling Markov chains, including simula-
tion.
Here’s an illustration using the same 𝑃 as the preceding example

mc = MarkovChain(P) # reconstruction; the original cell was lost in extraction
X = simulate(mc, 100_000)
μ_1 = count(X .== 1) / length(X)

Out[6]: 0.25143
In [10]: simulate_indices(mc, 4)
19.5 Marginal Distributions

Suppose that
1. {𝑋𝑡} is a Markov chain with stochastic matrix 𝑃
2. the distribution of 𝑋𝑡 is known to be 𝜓𝑡
What then is the distribution of 𝑋𝑡+1, or, more generally, of 𝑋𝑡+𝑚?

19.5.1 Solution

By the law of total probability, the probability of being at 𝑦 tomorrow is
$$\psi_{t+1}(y) = \sum_{x \in S} P(x, y)\, \psi_t(x)$$
In words, to get the probability of being at 𝑦 tomorrow, we account for all ways this can happen and sum their probabilities.
There are 𝑛 such equations, one for each 𝑦 ∈ 𝑆. Rewriting them in matrix notation, with 𝜓𝑡+1 and 𝜓𝑡 viewed as row vectors, gives
𝜓𝑡+1 = 𝜓𝑡 𝑃 (4)
In other words, to move the distribution forward one unit of time, we postmultiply by 𝑃 .
By repeating this 𝑚 times we move forward 𝑚 steps into the future.
Hence, iterating on (4), the expression 𝜓𝑡+𝑚 = 𝜓𝑡 𝑃 𝑚 is also valid — here 𝑃 𝑚 is the 𝑚-th
power of 𝑃 .
As a special case, we see that if 𝜓0 is the initial distribution from which 𝑋0 is drawn, then
𝜓0 𝑃 𝑚 is the distribution of 𝑋𝑚 .
This is very important, so let’s repeat it
𝑋0 ∼ 𝜓 0 ⟹ 𝑋𝑚 ∼ 𝜓0 𝑃 𝑚 (5)
𝑋𝑡 ∼ 𝜓𝑡 ⟹ 𝑋𝑡+𝑚 ∼ 𝜓𝑡 𝑃 𝑚 (6)
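In code, pushing a distribution forward is a single matrix product. A minimal sketch, with a hypothetical stochastic matrix and initial distribution:

P = [0.9 0.1; 0.4 0.6] # hypothetical stochastic matrix
ψ_0 = [0.5 0.5]         # initial distribution as a row vector
m = 10
ψ_m = ψ_0 * P^m         # distribution of X_m, by (5)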
We know that the probability of transitioning from 𝑥 to 𝑦 in one step is 𝑃 (𝑥, 𝑦).
It turns out that the probability of transitioning from 𝑥 to 𝑦 in 𝑚 steps is 𝑃 𝑚 (𝑥, 𝑦), the
(𝑥, 𝑦)-th element of the 𝑚-th power of 𝑃 .
To see why, consider again (6), but now with 𝜓𝑡 putting all probability on state 𝑥.
• 1 in the 𝑥-th position and zero elsewhere.
Inserting this into (6), we see that, conditional on 𝑋𝑡 = 𝑥, the distribution of 𝑋𝑡+𝑚 is the
𝑥-th row of 𝑃 𝑚 .
In particular
$$P^m(x, y) = \mathbb{P}\{X_{t+m} = y \mid X_t = x\}$$

19.5.2 Example: Probability of a Recession
Recall the stochastic matrix 𝑃 for recession and growth considered above.
Suppose that the current state is unknown — perhaps statistics are available only at the end
of the current month.
We estimate the probability that the economy is in state 𝑥 to be 𝜓(𝑥).
The probability of being in recession (either mild or severe) in 6 months time is given by the
inner product
$$(\psi P^6) \cdot \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}$$
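This inner product is a one-liner in code. A minimal sketch, in which the current estimate 𝜓 is hypothetical:

P = [0.971 0.029 0.0
     0.145 0.778 0.077
     0.0   0.508 0.492]
ψ = [0.2 0.4 0.4] # hypothetical current estimate of the state distribution
recession_prob = first(ψ * P^6 * [0.0, 1.0, 1.0]) # probability of mild or severe recession in 6 months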
19.5.3 Example 2: Cross-Sectional Distributions

The marginal distributions we have been studying can be viewed either as probabilities or as
cross-sectional frequencies in large samples.
To illustrate, recall our model of employment / unemployment dynamics for a given worker
discussed above.
Consider a large (i.e., tending to infinite) population of workers, each of whose lifetime expe-
riences are described by the specified dynamics, independently of one another.
Let 𝜓 be the current cross-sectional distribution over {1, 2}.
• For example, 𝜓(1) is the unemployment rate.
The cross-sectional distribution records the fractions of workers employed and unemployed at
a given moment.
The same distribution also describes the fractions of a particular worker’s career spent being
employed and unemployed, respectively.
Irreducibility and aperiodicity are central concepts of modern Markov chain theory.
Let’s see what they’re about.
19.6.1 Irreducibility
Consider, for example, the dynamics of a household's wealth, which moves between three states (poor, middle, rich) according to a directed graph, with edges labeled by transition probabilities.
We can translate this into a stochastic matrix, putting zeros where there’s no edge between nodes
0.9 0.1 0
𝑃 ∶= ⎛
⎜ 0.4 0.4 0.2 ⎞
⎟
⎝ 0.1 0.1 0.8 ⎠
It’s clear from the graph that this stochastic matrix is irreducible: we can reach any state
from any other state eventually.
We can also test this using QuantEcon.jl’s MarkovChain class
In [11]: P = [0.9 0.1 0.0; 0.4 0.4 0.2; 0.1 0.1 0.8];
mc = MarkovChain(P)
is_irreducible(mc)
Out[11]: true
Here’s a more pessimistic scenario, where the poor are poor forever
This stochastic matrix is not irreducible, since, for example, rich is not accessible from poor.
Let’s confirm this
In [12]: P = [1.0 0.0 0.0; 0.1 0.8 0.1; 0.0 0.2 0.8];
mc = MarkovChain(P);
is_irreducible(mc)
Out[12]: false
We can also determine the “communication classes,” or the sets of communicating states
(where communication refers to a nonzero probability of moving in each direction).
In [13]: communication_classes(mc)
It might be clear to you already that irreducibility is going to be important in terms of long
run outcomes.
For example, poverty is a life sentence in the second graph but not the first.
We’ll come back to this a bit later.
19.6.2 Aperiodicity
Loosely speaking, a Markov chain is called periodic if it cycles in a predictable way, and aperiodic otherwise.
Here’s a trivial example with three states
In [14]: P = [0 1 0; 0 0 1; 1 0 0];
mc = MarkovChain(P);
period(mc)
Out[14]: 3
More formally, the period of a state 𝑥 is the greatest common divisor of the set of integers
$$D(x) := \{ j \geq 1 : P^j(x, x) > 0 \}$$
In the last example, 𝐷(𝑥) = {3, 6, 9, …} for every state 𝑥, so the period is 3.
A stochastic matrix is called aperiodic if the period of every state is 1, and periodic other-
wise.
For example, the stochastic matrix associated with the transition probabilities below is peri-
odic because, for example, state 𝑎 has period 2
In [15]: P = [0.0 1.0 0.0 0.0 # reconstruction; the original cell was lost in extraction
              0.5 0.0 0.5 0.0
              0.0 0.5 0.0 0.5
              0.0 0.0 1.0 0.0];
mc = MarkovChain(P);
period(mc)

Out[15]: 2
In [16]: is_aperiodic(mc)
Out[16]: false
19.7 Stationary Distributions

As seen in (4), we can shift probabilities forward one unit of time via postmultiplication by 𝑃.
Some distributions are invariant under this updating process; such a distribution 𝜓∗ is called stationary for 𝑃, meaning 𝜓∗ = 𝜓∗𝑃.
A fundamental theorem states that if 𝑃 is both aperiodic and irreducible, then (1) 𝑃 has exactly one stationary distribution 𝜓∗, and (2) for any initial distribution 𝜓0, we have 𝜓0𝑃^𝑡 → 𝜓∗ as 𝑡 → ∞.
A stochastic matrix satisfying the conditions of the theorem is sometimes called uniformly ergodic.
ergodic.
One easy sufficient condition for aperiodicity and irreducibility is that every element of 𝑃 is
strictly positive
• Try to convince yourself of this
19.7.1 Example
Recall our model of employment / unemployment dynamics for a given worker discussed
above.
Assuming 𝛼 ∈ (0, 1) and 𝛽 ∈ (0, 1), the uniform ergodicity condition is satisfied.
Let 𝜓∗ = (𝑝, 1 − 𝑝) be the stationary distribution, so that 𝑝 corresponds to unemployment
(state 1).
Using 𝜓∗ = 𝜓∗ 𝑃 and a bit of algebra yields
$$p = \frac{\beta}{\alpha + \beta}$$
This is, in some sense, a steady state probability of unemployment — more on interpretation
below.
Not surprisingly it tends to zero as 𝛽 → 0, and to one as 𝛼 → 0.
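A quick numerical check using QuantEcon.jl's built-in routine, with hypothetical values of 𝛼 and 𝛽:

using QuantEcon
α, β = 0.1, 0.05 # hypothetical job-finding and separation rates
mc = MarkovChain([1 - α α; β 1 - β])
ψ_star = stationary_distributions(mc)[1]
@assert ψ_star[1] ≈ β / (α + β) # matches the formula for p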
As discussed above, a given Markov matrix 𝑃 can have many stationary distributions.
That is, there can be many row vectors 𝜓 such that 𝜓 = 𝜓𝑃 .
In fact if 𝑃 has two distinct stationary distributions 𝜓1 , 𝜓2 then it has infinitely many, since
in this case, as you can verify,
$$\psi_3 := \lambda \psi_1 + (1 - \lambda)\, \psi_2$$
is also a stationary distribution for 𝑃 for every 𝜆 ∈ [0, 1].
Part 2 of the Markov chain convergence theorem stated above tells us that the distribution of
𝑋𝑡 converges to the stationary distribution regardless of where we start off.
This adds considerable weight to our interpretation of 𝜓∗ as a stochastic steady state.
The convergence in the theorem is illustrated in the next figure
P = [0.971 0.029 0.000
     0.145 0.778 0.077
     0.000 0.508 0.492] # the business cycle matrix from above (reconstruction; definition lost in extraction)
ψ = [0.0 0.2 0.8] # an assumed initial distribution
t = 20 # path length
x_vals = zeros(t)
y_vals = similar(x_vals)
z_vals = similar(x_vals)
colors = [repeat([:red], 20); :black] # for plotting
for i in 1:t
x_vals[i] = ψ[1]
y_vals[i] = ψ[2]
z_vals[i] = ψ[3]
ψ = ψ * P # update distribution
end
mc = MarkovChain(P)
ψ_star = stationary_distributions(mc)[1]
x_star, y_star, z_star = ψ_star # unpack the stationary dist
plt = scatter([x_vals; x_star], [y_vals; y_star], [z_vals; z_star], color = colors, legend = :none)
Out[19]:
Here
• 𝑃 is the stochastic matrix for recession and growth considered above
• The highest red dot is an arbitrarily chosen initial probability distribution 𝜓, repre-
sented as a vector in ℝ3
• The other red dots are the distributions 𝜓𝑃 𝑡 for 𝑡 = 1, 2, …
• The black dot is 𝜓∗
The code for the figure can be found here — you might like to try experimenting with differ-
ent initial conditions.
19.8 Ergodicity
Under irreducibility, the fraction of time spent in each state converges: for all 𝑥 ∈ 𝑆,
$$\frac{1}{m} \sum_{t=1}^{m} \mathbb{1}\{X_t = x\} \to \psi^*(x) \quad \text{as } m \to \infty \tag{7}$$
Here
• 1{𝑋𝑡 = 𝑥} = 1 if 𝑋𝑡 = 𝑥 and zero otherwise
• convergence is with probability one
• the result does not depend on the distribution (or value) of 𝑋0
The result tells us that the fraction of time the chain spends at state 𝑥 converges to 𝜓∗ (𝑥) as
time goes to infinity.
This gives us another way to interpret the stationary distribution — provided that the con-
vergence result in (7) is valid.
The convergence in (7) is a special case of a law of large numbers result for Markov chains —
see EDTC, section 4.3.4 for some additional information.
19.8.1 Example
Recall our employment / unemployment model discussed above.
With 𝛼 ∈ (0, 1) and 𝛽 ∈ (0, 1), irreducibility and aperiodicity both hold, and the stationary probability of unemployment is
$$p = \frac{\beta}{\alpha + \beta}$$
By the ergodicity result in (7), 𝑝 is also the long-run fraction of time that a given worker spends unemployed.
19.9 Computing Expectations

We are interested in computing expectations of the form
$$\mathbb{E}[h(X_t)] \tag{8}$$
and conditional expectations such as
$$\mathbb{E}[h(X_{t+k}) \mid X_t = x] \tag{9}$$
where
• {𝑋𝑡 } is a Markov chain generated by 𝑛 × 𝑛 stochastic matrix 𝑃
• ℎ is a given function, which, in expressions involving matrix algebra, we’ll think of as
the column vector
$$h = \begin{pmatrix} h(x_1) \\ \vdots \\ h(x_n) \end{pmatrix}$$
The unconditional expectation (8) is easy: we just sum over the distribution of 𝑋𝑡 to get
$$\mathbb{E}[h(X_t)] = (\psi P^t)\, h$$
where 𝜓 is the distribution of 𝑋0.
For the conditional expectation (9), we need to sum over the conditional distribution of 𝑋𝑡+𝑘
given 𝑋𝑡 = 𝑥.
We already know that this is 𝑃^𝑘(𝑥, ⋅), so
$$\mathbb{E}[h(X_{t+k}) \mid X_t = x] = (P^k h)(x)$$
Sometimes we also want to compute the expectation of a discounted geometric sum.
In view of the preceding discussion, this is
$$\mathbb{E}\left[\sum_{j=0}^{\infty} \beta^j h(X_{t+j}) \mid X_t = x\right] = [(I - \beta P)^{-1} h](x)$$
where
$$(I - \beta P)^{-1} = I + \beta P + \beta^2 P^2 + \cdots$$
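In code, this again reduces to a linear solve. A minimal sketch with hypothetical values:

using LinearAlgebra
P = [0.9 0.1; 0.4 0.6] # hypothetical stochastic matrix
h = [1.0, 2.0]          # h evaluated at each state, as a column vector
β = 0.95
expected_sum = (I - β * P) \ h # entry x gives E[Σ_j β^j h(X_{t+j}) | X_t = x]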
19.10 Exercises
19.10.1 Exercise 1
According to the discussion above, if a worker’s employment dynamics obey the stochastic
matrix
$$P = \begin{pmatrix} 1 - \alpha & \alpha \\ \beta & 1 - \beta \end{pmatrix}$$
with 𝛼 ∈ (0, 1) and 𝛽 ∈ (0, 1), then, in the long-run, the fraction of time spent unemployed
will be
$$p := \frac{\beta}{\alpha + \beta}$$
In other words, if {𝑋𝑡 } represents the Markov chain for employment, then 𝑋̄ 𝑚 → 𝑝 as 𝑚 →
∞, where
$$\bar{X}_m := \frac{1}{m} \sum_{t=1}^{m} \mathbb{1}\{X_t = 1\}$$
Your task is to illustrate this convergence: generate a long simulated series and plot 𝑋̄𝑚 − 𝑝 against 𝑚.
(You don’t need to add the fancy touches to the graph — see the solution if you’re interested)
19.10.2 Exercise 2
A web search engine needs to
1. determine which pages are relevant to a given query, and
2. have the matches returned in order, where the order corresponds to some measure of “importance”
One way to measure importance is Google’s PageRank, under which the ranking 𝑟𝑗 of page 𝑗 is defined by
$$r_j = \sum_{i \in L_j} \frac{r_i}{\ell_i}$$
where
• ℓ𝑖 is the total number of outbound links from 𝑖
• 𝐿𝑗 is the set of all pages 𝑖 such that 𝑖 has a link to 𝑗
This is a measure of the number of inbound links, weighted by their own ranking (and nor-
malized by 1/ℓ𝑖 ).
There is, however, another interpretation, and it brings us back to Markov chains.
Let 𝑃 be the matrix given by 𝑃 (𝑖, 𝑗) = 1{𝑖 → 𝑗}/ℓ𝑖 where 1{𝑖 → 𝑗} = 1 if 𝑖 has a link to 𝑗
and zero otherwise.
The matrix 𝑃 is a stochastic matrix provided that each page has at least one link.
With this definition of 𝑃 we have
$$r_j = \sum_{i \in L_j} \frac{r_i}{\ell_i} = \sum_{\text{all } i} \mathbb{1}\{i \to j\}\, \frac{r_i}{\ell_i} = \sum_{\text{all } i} P(i, j)\, r_i$$
Writing 𝑟 for the row vector of rankings, this becomes 𝑟 = 𝑟𝑃, so 𝑟 is a stationary distribution of the stochastic matrix 𝑃.
19.10.3 Exercise 3
Consider the AR(1) process 𝑦𝑡+1 = 𝜌𝑦𝑡 + 𝑢𝑡+1, where 𝜌 ∈ (−1, 1) and {𝑢𝑡} is iid 𝑁(0, 𝜎𝑢²).
The variance of its stationary distribution is
$$\sigma_y^2 := \frac{\sigma_u^2}{1 - \rho^2}$$
Tauchen’s method [104] is the most common method for approximating this continuous state
process with a finite state Markov chain.
A routine for this already exists in QuantEcon.jl but let’s write our own version as an exer-
cise.
As a first step we choose
• 𝑛, the number of states for the discrete approximation
• 𝑚, an integer that parameterizes the width of the state space
We then set 𝑥0 = −𝑚𝜎𝑦, set 𝑥𝑛−1 = 𝑚𝜎𝑦, take the remaining states as equally spaced points in between, and let 𝑠 = 𝑥𝑖+1 − 𝑥𝑖 be the gap between successive grid points.
With 𝐹 denoting the cdf of 𝑁(0, 𝜎𝑢²), the transition probabilities 𝑃(𝑥𝑖, 𝑥𝑗) are computed as follows:
1. If 𝑗 = 0, then set 𝑃(𝑥𝑖, 𝑥𝑗) = 𝐹(𝑥0 − 𝜌𝑥𝑖 + 𝑠/2)
2. If 𝑗 = 𝑛 − 1, then set 𝑃(𝑥𝑖, 𝑥𝑗) = 1 − 𝐹(𝑥𝑛−1 − 𝜌𝑥𝑖 − 𝑠/2)
3. Otherwise, set 𝑃(𝑥𝑖, 𝑥𝑗) = 𝐹(𝑥𝑗 − 𝜌𝑥𝑖 + 𝑠/2) − 𝐹(𝑥𝑗 − 𝜌𝑥𝑖 − 𝑠/2)
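For comparison with your own implementation, QuantEcon.jl's built-in tauchen routine can be called as in this minimal sketch (the parameter values are hypothetical):

using QuantEcon
ρ, σ_u, n = 0.9, 1.0, 7
mc = tauchen(n, ρ, σ_u) # returns a MarkovChain over an n-point grid
mc.p                     # the n × n transition matrix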
19.11 Solutions
19.11.1 Exercise 1
Compute the fraction of time that the worker spends unemployed, and compare it to the sta-
tionary probability.
α = β = 0.1 # assumed parameter values; the original setup was lost in extraction
N = 10_000
p = β / (α + β)
mc = MarkovChain([1 - α α; β 1 - β])
y_vals = Vector{Vector{Float64}}(undef, 2)
for x0 in 1:2
    X = simulate_indices(mc, N; init = x0) # generate the sample path
    X̄ = cumsum(X .== 1) ./ (1:N) # compute state fraction. ./ required for precedence
    y_vals[x0] = X̄ .- p # divergence from the steady state
end
Out[20]:
19.11.2 Exercise 2
Q = fill(false, n, n)
for (node, edges) in enumerate(values(web_graph_data))
    Q[node, nodes .∈ Ref(edges)] .= true
end
P = Q ./ sum(Q, dims = 2) # normalize rows to obtain a stochastic matrix (reconstruction; this line was lost in extraction)
mc = MarkovChain(P)
r = stationary_distributions(mc)[1] # stationary distribution
ranked_pages = Dict(zip(keys(web_graph_data), r)) # results holder
# print solution
println("Rankings\n ***")
sort(collect(ranked_pages), by = x -> x[2], rev = true) # print sorted
Rankings
***
19.11.3 Exercise 3
Chapter 20

Continuous State Markov Chains

20.1 Contents
• Overview 20.2
• The Density Case 20.3
• Beyond Densities 20.4
• Stability 20.5
• Exercises 20.6
• Solutions 20.7
• Appendix 20.8
20.2 Overview
In a previous lecture we learned about finite Markov chains, a relatively elementary class of
stochastic dynamic models.
The present lecture extends this analysis to continuous (i.e., uncountable) state Markov
chains.
Most stochastic dynamic models studied by economists either fit directly into this class or can
be represented as continuous state Markov chains after minor modifications.
In this lecture, our focus will be on continuous Markov models that
• evolve in discrete time
• are often nonlinear
The fact that we accommodate nonlinear models here is significant, because linear stochastic
models have their own highly developed tool set, as we’ll see later on.
The question that interests us most is: Given a particular stochastic dynamic model, how will
the state of the system evolve over time?
In particular,
• What happens to the distribution of the state variables?
• Is there anything we can say about the “average behavior” of these variables?
• Is there a notion of “steady state” or “long run equilibrium” that’s applicable to the
model?
– If so, how can we compute it?
Answering these questions will lead us to revisit many of the topics that occupied us in the
finite state case, such as simulation, distribution dynamics, stability, ergodicity, etc.
Note
For some people, the term “Markov chain” always refers to a process with a finite
or discrete state space. We follow the mainstream mathematical literature (e.g.,
[77]) in using the term to refer to any discrete time Markov process.
20.2.1 Setup
You are probably aware that some distributions can be represented by densities and some
cannot.
(For example, distributions on the real numbers ℝ that put positive probability on individual
points have no density representation)
We are going to start our analysis by looking at Markov chains where the one step transition
probabilities have density representations.
The benefit is that the density case offers a very direct parallel to the finite case in terms of
notation and intuition.
Once we’ve built some intuition we’ll cover the general case.
In our lecture on finite Markov chains, we studied discrete time Markov chains that evolve on
a finite state space 𝑆.
In this setting, the dynamics of the model are described by a stochastic matrix — a nonnega-
tive square matrix 𝑃 = 𝑃 [𝑖, 𝑗] such that each row 𝑃 [𝑖, ⋅] sums to one.
The interpretation of 𝑃 is that 𝑃 [𝑖, 𝑗] represents the probability of transitioning from state 𝑖
to state 𝑗 in one unit of time.
In symbols,
ℙ{𝑋𝑡+1 = 𝑗 | 𝑋𝑡 = 𝑖} = 𝑃 [𝑖, 𝑗]
Equivalently,
• 𝑃 can be thought of as a family of distributions 𝑃 [𝑖, ⋅], one for each 𝑖 ∈ 𝑆
20.3. THE DENSITY CASE 381
For example, consider the stochastic kernel 𝑝𝑤 defined by
$$p_w(x, y) := \frac{1}{\sqrt{2\pi}} \exp\left\{ -\frac{(y - x)^2}{2} \right\} \tag{1}$$
This kernel corresponds to the random walk
$$X_{t+1} = X_t + \xi_{t+1}, \quad \text{where } \{\xi_t\} \overset{\text{IID}}{\sim} N(0, 1) \tag{2}$$
In the previous section, we made the connection between stochastic difference equation (2)
and stochastic kernel (1).
In economics and time series analysis we meet stochastic difference equations of all different
shapes and sizes.
It will be useful for us if we have some systematic methods for converting stochastic difference
equations into stochastic kernels.
To this end, consider the generic (scalar) stochastic difference equation given by
$$X_{t+1} = \mu(X_t) + \sigma(X_t)\, \xi_{t+1} \tag{3}$$
Here {𝜉𝑡} is iid with density 𝜙, and 𝜇 and 𝜎 are given functions.
Example 2: The ARCH model $X_{t+1} = \alpha X_t + \sigma_t \xi_{t+1}$ with $\sigma_t^2 = \beta + \gamma X_t^2$ is a special case of (3) with 𝜇(𝑥) = 𝛼𝑥 and 𝜎(𝑥) = (𝛽 + 𝛾𝑥²)^{1/2}.
Example 3: With stochastic production and a constant savings rate, the one-sector neoclassical growth model leads to a law of motion for capital per worker such as
$$k_{t+1} = s A_{t+1} f(k_t) + (1 - \delta) k_t \tag{5}$$
Here
• 𝑠 is the rate of savings
• 𝐴𝑡+1 is a production shock
– The 𝑡 + 1 subscript indicates that 𝐴𝑡+1 is not visible at time 𝑡
• 𝛿 is a depreciation rate
• 𝑓 ∶ ℝ+ → ℝ+ is a production function satisfying 𝑓(𝑘) > 0 whenever 𝑘 > 0
(The fixed savings rate can be rationalized as the optimal policy for a particular set of tech-
nologies and preferences (see [68], section 3.1.2), although we omit the details here)
Equation (5) is a special case of (3) with 𝜇(𝑥) = (1 − 𝛿)𝑥 and 𝜎(𝑥) = 𝑠𝑓(𝑥).
Now let’s obtain the stochastic kernel corresponding to the generic model (3).
To find it, note first that if 𝑈 is a random variable with density 𝑓𝑈 , and 𝑉 = 𝑎 + 𝑏𝑈 for some
constants 𝑎, 𝑏 with 𝑏 > 0, then the density of 𝑉 is given by
$$f_V(v) = \frac{1}{b}\, f_U\!\left( \frac{v - a}{b} \right) \tag{6}$$
(The proof is below. For a multidimensional version see EDTC, theorem 8.1.3)
Taking (6) as given for the moment, we can obtain the stochastic kernel 𝑝 for (3) by recalling
that 𝑝(𝑥, ⋅) is the conditional density of 𝑋𝑡+1 given 𝑋𝑡 = 𝑥.
In the present case, this is equivalent to stating that 𝑝(𝑥, ⋅) is the density of 𝑌 ∶= 𝜇(𝑥) +
𝜎(𝑥) 𝜉𝑡+1 when 𝜉𝑡+1 ∼ 𝜙.
Hence, by (6),
$$p(x, y) = \frac{1}{\sigma(x)}\, \phi\!\left( \frac{y - \mu(x)}{\sigma(x)} \right) \tag{7}$$
For example, for the growth model in (5), the stochastic kernel is
$$p(x, y) = \frac{1}{s f(x)}\, \phi\!\left( \frac{y - (1 - \delta) x}{s f(x)} \right) \tag{8}$$
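Implementing a kernel like (7) in code is straightforward. A minimal sketch, in which the drift and volatility functions are hypothetical:

using Distributions
ϕ = Normal()                # density of the shock ξ
μ(x) = 0.5 * x              # hypothetical drift function
σ(x) = sqrt(1 + 0.2 * x^2)  # hypothetical volatility function
p(x, y) = pdf(ϕ, (y - μ(x)) / σ(x)) / σ(x) # stochastic kernel, as in (7)
p(1.0, 0.3) # conditional density of X_{t+1} = 0.3 given X_t = 1.0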
In this section of our lecture on finite Markov chains, we asked: if {𝑋𝑡} is a Markov chain with stochastic matrix 𝑃 and the distribution of 𝑋𝑡 is 𝜓𝑡, what is the distribution of 𝑋𝑡+1?
The answer was $\psi_{t+1}(j) = \sum_i P(i, j)\, \psi_t(i)$.
This intuitive equality states that the probability of being at 𝑗 tomorrow is the probability of visiting 𝑖 today and then going on to 𝑗, summed over all possible 𝑖.
In the density case, we just replace the sum with an integral and probability mass functions with densities, yielding
$$\psi_{t+1}(y) = \int p(x, y)\, \psi_t(x)\, dx, \qquad \forall y \in S \tag{9}$$
It is convenient to define the corresponding Markov operator 𝑃 by
$$(\psi P)(y) = \int p(x, y)\, \psi(x)\, dx, \qquad \forall y \in S \tag{10}$$
Note
Unlike most operators, we write 𝑃 to the right of its argument, instead of to the
left (i.e., 𝜓𝑃 instead of 𝑃 𝜓). This is a common convention, with the intention be-
ing to maintain the parallel with the finite case — see here.
With this notation, we can write (9) more succinctly as 𝜓𝑡+1 (𝑦) = (𝜓𝑡 𝑃 )(𝑦) for all 𝑦, or, drop-
ping the 𝑦 and letting “=” indicate equality of functions,
𝜓𝑡+1 = 𝜓𝑡 𝑃 (11)
Equation (11) tells us that if we specify a distribution for 𝜓0 , then the entire sequence of fu-
ture distributions can be obtained by iterating with 𝑃 .
It’s interesting to note that (11) is a deterministic difference equation.
Thus, by converting a stochastic difference equation such as (3) into a stochastic kernel 𝑝 and
hence an operator 𝑃 , we convert a stochastic difference equation into a deterministic one (al-
beit in a much higher dimensional space).
Note
Some people might be aware that discrete Markov chains are in fact a special case
of the continuous Markov chains we have just described. The reason is that proba-
bility mass functions are densities with respect to the counting measure.
20.3.4 Computation
To learn about the dynamics of a given process, it’s useful to compute and study the se-
quences of densities generated by the model.
One way to do this is to try to implement the iteration described by (10) and (11) using nu-
merical integration.
However, to produce 𝜓𝑃 from 𝜓 via (10), you would need to integrate at every 𝑦, and there is
a continuum of such 𝑦.
Another possibility is to discretize the model, but this introduces errors of unknown size.
A nicer alternative in the present setting is to combine simulation with an elegant estimator
called the look ahead estimator.
Let’s go over the ideas with reference to the growth model discussed above, the dynamics of which we repeat here for convenience:
$$k_{t+1} = s A_{t+1} f(k_t) + (1 - \delta) k_t \tag{12}$$
Our aim is to compute the sequence {𝜓𝑡 } associated with this model and fixed initial condi-
tion 𝜓0 .
To approximate 𝜓𝑡 by simulation, recall that, by definition, 𝜓𝑡 is the density of 𝑘𝑡 given 𝑘0 ∼
𝜓0 .
If we wish to generate observations of this random variable, all we need to do is
1. draw 𝑘0 a total of 𝑛 times from the initial distribution 𝜓0
2. shift each draw forward 𝑡 periods using the dynamics (12)
This gives 𝑛 independent observations of 𝑘𝑡, from which we can estimate its density 𝜓𝑡.
A naive approach would be to use a histogram, or perhaps a smoothed histogram using the
kde function from KernelDensity.jl.
However, in the present setting there is a much better way to do this, based on the look-
ahead estimator.
With this estimator, to construct an estimate of 𝜓𝑡 , we actually generate 𝑛 observations of
𝑘𝑡−1 , rather than 𝑘𝑡 .
Now we take these 𝑛 observations $k_{t-1}^1, \ldots, k_{t-1}^n$ and form the estimate
$$\psi_t^n(y) = \frac{1}{n} \sum_{i=1}^{n} p(k_{t-1}^i, y) \tag{13}$$
By the law of large numbers,
$$\frac{1}{n} \sum_{i=1}^{n} p(k_{t-1}^i, y) \to \mathbb{E}\, p(k_{t-1}^i, y) = \int p(x, y)\, \psi_{t-1}(x)\, dx = \psi_t(y)$$
20.3.5 Implementation
A function which calls an LAE type for estimating densities by this technique can be found in
lae.jl.
This function returns the right-hand side of (13) using
• an object of type LAE that stores the stochastic kernel and the observations
• the value 𝑦 as its second argument
The function is vectorized, in the sense that if psi is such an instance and y is an array, then
the call psi(y) acts elementwise.
(This is the reason that we reshaped X and y inside the type — to make vectorization work)
20.3.6 Example
The following code is example of usage for the stochastic growth model described above
s = 0.2
δ = 0.1
a_σ = 0.4 # A = exp(B) where B ~ N(0, a_σ)
α = 0.4 # We set f(k) = k**α
ψ_0 = Beta(5.0, 5.0) # Initial distribution
ϕ = LogNormal(0.0, a_σ)
function p(x, y)
    # Stochastic kernel for the growth model with Cobb-Douglas production.
    # Both x and y must be strictly positive.
    d = s * x.^α
    pdf_arg = clamp.((y .- (1 - δ) .* x) ./ d, eps(), Inf) # reconstruction; the rest of this cell was lost in extraction
    return pdf.(ϕ, pdf_arg) ./ d
end
n = 10_000 # number of observations at each date t (assumed)
T = 30 # number of dates (assumed)
k = zeros(n, T) # t-th column holds n observations of k_t
A = rand!(ϕ, zeros(n, T))
k[:, 1] = rand(ψ_0, n) ./ 2 # draw k_1 from the initial distribution
for t in 1:T-1
    k[:, t+1] = s * A[:, t] .* k[:, t].^α + (1 - δ) .* k[:, t]
end
# Generate T instances of LAE using this data, one for each date t
laes = [LAE(p, k[:, t]) for t in T:-1:1]
# Plot
ygrid = range(0.01, 4, length = 200)
laes_plot = []
colors = []
for i in 1:T
ψ = laes[i]
push!(laes_plot, lae_est(ψ, ygrid))
push!(colors, RGBA(0, 0, 0, 1 - (i - 1)/T))
end
plot(ygrid, laes_plot, color = reshape(colors, 1, length(colors)), lw = 2,
xlabel = "capital", legend = :none)
t = "Density of k_1 (lighter) to k_T (darker) for T=$T"
plot!(title = t)
Out[3]:
The figure shows part of the density sequence {𝜓𝑡 }, with each density computed via the look
ahead estimator.
Notice that the sequence of densities shown in the figure seems to be converging — more on
this in just a moment.
Another quick comment is that each of these distributions could be interpreted as a cross sec-
tional distribution (recall this discussion).
Up until now, we have focused exclusively on continuous state Markov chains where all condi-
tional distributions 𝑝(𝑥, ⋅) are densities.
As discussed above, not all distributions can be represented as densities.
If the conditional distribution of 𝑋𝑡+1 given 𝑋𝑡 = 𝑥 cannot be represented as a density for
some 𝑥 ∈ 𝑆, then we need a slightly different theory.
The ultimate option is to switch from densities to probability measures, but not all readers
will be familiar with measure theory.
We can, however, construct a fairly general theory using distribution functions.
To illustrate the issues, recall that Hopenhayn and Rogerson [55] study a model of firm dy-
namics where individual firm productivity follows the exogenous process
$$X_{t+1} = a + \rho X_t + \xi_{t+1}, \quad \text{where } \{\xi_t\} \overset{\text{IID}}{\sim} N(0, \sigma^2)$$
As written, this fits the density case treated above; however, the authors wanted productivity to take values in [0, 1], so the process is censored at the boundaries (values above 1 are set to 1, and values below 0 are set to 0).
If you think about it, you will see that for any given 𝑥 ∈ [0, 1], the conditional distribution of 𝑋𝑡+1 given 𝑋𝑡 = 𝑥 then puts positive probability mass on 0 and 1.
Hence it cannot be represented as a density.
What we can do instead is use cumulative distribution functions (cdfs).
To this end, set
$$G(x, y) := \mathbb{P}\{ X_{t+1} \leq y \mid X_t = x \}$$
This family of cdfs 𝐺(𝑥, ⋅) plays a role analogous to the stochastic kernel in the density case.
The distribution dynamics in (9) are then replaced by
$$F_{t+1}(y) = \int G(x, y)\, dF_t(x) \tag{14}$$
Here 𝐹𝑡 and 𝐹𝑡+1 are cdfs representing the distribution of the current state and next period
state.
The intuition behind (14) is essentially the same as for (9).
20.4.2 Computation
If you wish to compute these cdfs, you cannot use the look-ahead estimator as before.
Indeed, you should not use any density estimator, since the objects you are estimating/com-
puting are not densities.
One good option is simulation as before, combined with the empirical distribution function.
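A minimal sketch of this approach using StatsBase.jl's ecdf; the simulated data below is a stand-in for draws of 𝑋𝑡 produced by your model:

using StatsBase
sample = randn(1_000)  # stand-in for simulated draws of X_t
F̂ = ecdf(sample)       # empirical cdf, returned as a callable
F̂(0.5)                 # estimate of P(X_t ≤ 0.5)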
20.5 Stability
In our lecture on finite Markov chains we also studied stationarity, stability and ergodicity.
Here we will cover the same topics for the continuous case.
We will, however, treat only the density case (as in this section), where the stochastic kernel
is a family of densities.
The general case is relatively similar — references are given below.
Analogous to the finite case, given a stochastic kernel 𝑝 and corresponding Markov operator
as defined in (10), a density 𝜓∗ on 𝑆 is called stationary for 𝑃 if it is a fixed point of the op-
erator 𝑃 .
In other words,
$$\psi^*(y) = \int p(x, y)\, \psi^*(x)\, dx, \qquad \forall y \in S \tag{15}$$
As with the finite case, if 𝜓∗ is stationary for 𝑃 , and the distribution of 𝑋0 is 𝜓∗ , then, in
view of (11), 𝑋𝑡 will have this same distribution for all 𝑡.
Hence 𝜓∗ is the stochastic equivalent of a steady state.
In the finite case, we learned that at least one stationary distribution exists, although there
may be many.
When the state space is infinite, the situation is more complicated.
Even existence can fail very easily.
For example, the random walk model has no stationary density (see, e.g., EDTC, p. 210).
However, there are well-known conditions under which a stationary density 𝜓∗ exists.
With additional conditions, we can also get a unique stationary density (𝜓 ∈ 𝒟 and 𝜓 = 𝜓𝑃 ⟹ 𝜓 = 𝜓∗, where 𝒟 denotes the set of densities on 𝑆), and also global convergence in the sense that
∀ 𝜓 ∈ 𝒟, 𝜓𝑃 𝑡 → 𝜓∗ as 𝑡 → ∞ (16)
This combination of existence, uniqueness and global convergence in the sense of (16) is often
referred to as global stability.
Under very similar conditions, we get ergodicity, which means that
$$\frac{1}{n} \sum_{t=1}^{n} h(X_t) \to \int h(x)\, \psi^*(x)\, dx \quad \text{as } n \to \infty \tag{17}$$
for any (measurable) function ℎ ∶ 𝑆 → ℝ such that the right-hand side is finite.
Note that the convergence in (17) does not depend on the distribution (or value) of 𝑋0 .
This is actually very important for simulation — it means we can learn about 𝜓∗ (i.e., ap-
proximate the right hand side of (17) via the left hand side) without requiring any special
knowledge about what to do with 𝑋0 .
So what are these conditions we require to get global stability and ergodicity?
In essence, it must be the case that
1. Probability mass does not drift off to the “edges” of the state space
2. Sufficient “mixing” obtains
As stated above, the growth model treated here is stable under mild conditions on the primi-
tives.
• See EDTC, section 11.3.4 for more details.
We can see this stability in action — in particular, the convergence in (16) — by simulating
the path of densities from various initial conditions.
Here is such a figure
All sequences are converging towards the same limit, regardless of their initial condition.
The details regarding initial conditions and so on are given in this exercise, where you are
asked to replicate the figure.
In the preceding figure, each sequence of densities is converging towards the unique stationary
density 𝜓∗ .
Even from this figure we can get a fair idea what 𝜓∗ looks like, and where its mass is located.
However, there is a much more direct way to estimate the stationary density, and it involves
only a slight modification of the look ahead estimator.
Let’s say that we have a model of the form (3) that is stable and ergodic.
Let 𝑝 be the corresponding stochastic kernel, as given in (7).
To approximate the stationary density 𝜓∗ , we can simply generate a long time series
𝑋0 , 𝑋1 , … , 𝑋𝑛 and estimate 𝜓∗ via
$$\psi_n^*(y) = \frac{1}{n} \sum_{t=1}^{n} p(X_t, y) \tag{18}$$
This is essentially the same as the look ahead estimator (13), except that now the observa-
tions we generate are a single time series, rather than a cross section.
The justification for (18) is that, with probability one as 𝑛 → ∞,
$$\frac{1}{n} \sum_{t=1}^{n} p(X_t, y) \to \int p(x, y)\, \psi^*(x)\, dx = \psi^*(y)$$
where the convergence is by (17) and the equality on the right is by (15).
The right hand side is exactly what we want to compute.
On top of this asymptotic result, it turns out that the rate of convergence for the look ahead
estimator is very good.
The first exercise helps illustrate this point.
20.6 Exercises
20.6.1 Exercise 1
Consider the threshold autoregressive (TAR) model
$$X_{t+1} = \theta |X_t| + (1 - \theta^2)^{1/2}\, \xi_{t+1}, \quad \text{where } \{\xi_t\} \overset{\text{IID}}{\sim} N(0, 1) \tag{19}$$
This is one of those rare nonlinear stochastic models where an analytical expression for the
stationary density is available.
In particular, provided that |𝜃| < 1, there is a unique stationary density 𝜓∗ given by
$$\psi^*(y) = 2\, \phi(y)\, \Phi\!\left[ \frac{\theta y}{(1 - \theta^2)^{1/2}} \right] \tag{20}$$
Here 𝜙 is the standard normal density and Φ is the standard normal cdf.
As an exercise, compute the look ahead estimate of 𝜓∗ , as defined in (18), and compare it
with 𝜓∗ in (20) to see whether they are indeed close for large 𝑛.
In doing so, set 𝜃 = 0.8 and 𝑛 = 500.
The next figure shows the result of such a computation
The additional density (black line) is a nonparametric kernel density estimate, added to the
solution for illustration.
(You can try to replicate it before looking at the solution if you want to)
As you can see, the look ahead estimator is a much tighter fit than the kernel density estima-
tor.
If you repeat the simulation you will see that this is consistently the case.
20.6.2 Exercise 2
for i in 1:4
# .... some code
rand_draws = (rand(ψ_0, n) .+ 2.5i) ./ 2
20.6.3 Exercise 3
In [4]: n = 500
x = randn(n) # N(0, 1)
x = exp.(x) # Map x to lognormal
y = randn(n) .+ 2.0 # N(2, 1)
z = randn(n) .+ 4.0 # N(4, 1)
data = vcat(x, y, z)
l = ["X" "Y" "Z"]
xlabels = reshape(repeat(l, n), 3n, 1)
Out[4]:
{𝑋1 , … , 𝑋𝑛 } ∼ 𝐿𝑁 (0, 1), {𝑌1 , … , 𝑌𝑛 } ∼ 𝑁 (2, 1), and {𝑍1 , … , 𝑍𝑛 } ∼ 𝑁 (4, 1),
2. Create a boxplot representing 𝑛 distributions, where the 𝑡-th distribution shows the 𝑘
observations of 𝑋𝑡 .
20.7 Solutions
20.7.1 Exercise 1
Look ahead estimation of a TAR stationary density, where the TAR model is
$$X_{t+1} = \theta |X_t| + (1 - \theta^2)^{1/2}\, \xi_{t+1}$$
and 𝜉𝑡 ∼ 𝑁(0, 1). Try running at n = 10, 100, 1000, 10000 to get an idea of the speed of convergence.
In [6]: ϕ = Normal()
n = 500
θ = 0.8
d = sqrt(1.0 - θ^2)
δ = θ / d
# true stationary density, from (20)
ψ_star(y) = 2 .* pdf.(ϕ, y) .* cdf.(ϕ, δ .* y)
# stochastic kernel for the TAR model (reconstruction; parts of this cell were lost in extraction)
p_TAR(x, y) = pdf.(ϕ, (y .- θ .* abs.(x)) ./ d) ./ d
Z = rand(ϕ, n)
X = zeros(n)
for t in 1:n-1
    X[t+1] = θ * abs(X[t]) + d * Z[t]
end
ψ_est(a) = lae_est(LAE(p_TAR, X), a) # look ahead estimate of ψ*
Out[6]:
20.7.2 Exercise 2
In [7]: s = 0.2
δ = 0.1
a_σ = 0.4 # A = exp(B) where B ~ N(0, a_σ)
α = 0.4 # We set f(k) = k**α
ψ_0 = Beta(5.0, 5.0) # Initial distribution
ϕ = LogNormal(0.0, a_σ)
function p_growth(x, y)
    # Stochastic kernel for the growth model with Cobb-Douglas production.
    # Both x and y must be strictly positive.
    d = s * x.^α
    pdf_arg = clamp.((y .- (1 - δ) .* x) ./ d, eps(), Inf) # reconstruction; the rest of this function was lost
    return pdf.(ϕ, pdf_arg) ./ d
end
n = 1000 # number of observations at each date (assumed; definition lost in extraction)
T = 30 # number of dates (assumed)
xmax = 6.5
ygrid = range(0.01, xmax, length = 150)
laes_plot = zeros(length(ygrid), 4T)
colors = []
for i in 1:4
    k = zeros(n, T)
    A = rand!(ϕ, zeros(n, T))
    # draw k_1 from a shifted initial distribution, as in the exercise statement (reconstruction)
    k[:, 1] = (rand(ψ_0, n) .+ 2.5i) ./ 2
    for t in 1:T-1
        k[:, t+1] = s * A[:, t] .* k[:, t].^α + (1 - δ) .* k[:, t]
    end
    # Generate T instances of LAE using this data, one for each date t
    laes = [LAE(p_growth, k[:, t]) for t in T:-1:1]
ind = i
for j in 1:T
ψ = laes[j]
laes_plot[:, ind] = lae_est(ψ, ygrid)
ind = ind + 4
push!(colors, RGBA(0, 0, 0, 1 - (j - 1) / T))
end
end
Out[7]:
20.7.3 Exercise 3
In [8]: n = 20
k = 5000
J = 6
θ = 0.9
d = sqrt(1 - θ^2)
δ = θ / d
Z = randn(k, n, J)
titles = []
data = []
x_labels = []
for j in 1:J
title = "time series from t = $(initial_conditions[j])"
push!(titles, title)
X = zeros(k, n)
X[:, 1] .= initial_conditions[j]
labels = []
labels = vcat(labels, ones(k, 1))
for t in 2:n
X[:, t] = θ .* abs.(X[:, t-1]) .+ d .* Z[:, t, j]
labels = vcat(labels, t*ones(k, 1))
end
X = reshape(X, n*k, 1)
push!(data, X)
push!(x_labels, labels)
end
In [9]: plots = []
for i in 1:J
push!(plots, boxplot(vec(x_labels[i]), vec(data[i]), title = titles[i]))
end
plot(plots..., layout = (J, 1), legend = :none, size = (800, 2000))
Out[9]:
20.8 Appendix
Chapter 21

A First Look at the Kalman Filter

21.1 Contents
• Overview 21.2
• The Basic Idea 21.3
• Convergence 21.4
• Implementation 21.5
• Exercises 21.6
• Solutions 21.7
21.2 Overview
This lecture provides a simple and intuitive introduction to the Kalman filter, for those who
either
• have heard of the Kalman filter but don’t know how it works, or
• know the Kalman filter equations, but don’t know where they come from
For additional (more advanced) reading on the Kalman filter, see
• [68], section 2.7.
• [3]
The second reference presents a comprehensive treatment of the Kalman filter.
Required knowledge: Familiarity with matrix manipulations, multivariate normal distribu-
tions, covariance matrices, etc.
21.2.1 Setup
The Kalman filter has many applications in economics, but for now let’s pretend that we are
rocket scientists.
A missile has been launched from country Y and our mission is to track it.
Let 𝑥 ∈ ℝ2 denote the current location of the missile—a pair indicating latitude-longitude
coordinates on a map.
At the present moment in time, the precise location 𝑥 is unknown, but we do have some be-
liefs about 𝑥.
One way to summarize our knowledge is a point prediction 𝑥̂
• But what if the President wants to know the probability that the missile is currently
over the Sea of Japan?
• Then it is better to summarize our initial beliefs with a bivariate probability density 𝑝.
– ∫𝐸 𝑝(𝑥)𝑑𝑥 indicates the probability that we attach to the missile being in region 𝐸
The density 𝑝 is called our prior for the random variable 𝑥.
To keep things tractable in our example, we assume that our prior is Gaussian. In particular,
we take
$$p = N(\hat{x}, \Sigma) \tag{1}$$
where 𝑥̂ is the mean of the distribution and Σ is a 2 × 2 covariance matrix. In our simulations, we will suppose that
$$\hat{x} = \begin{pmatrix} 0.2 \\ -0.2 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} 0.4 & 0.3 \\ 0.3 & 0.45 \end{pmatrix} \tag{2}$$
This density 𝑝(𝑥) is shown below as a contour map, with the center of the red ellipse being
equal to 𝑥̂
Σ = [0.4 0.3
     0.3 0.45] # prior covariance, from (2) (reconstruction; this definition was lost in extraction)
x̂ = [0.2, -0.2] # prior mean
R = 0.5 .* Σ
# define A and Q
A = [1.2 0
0 -0.2]
Q = 0.3Σ
y = [2.3, -1.9]
# plotting objects
x_grid = range(-1.5, 2.9, length = 100)
y_grid = range(-3.1, 1.7, length = 100)
# generate distribution
dist = MvNormal(x̂, Σ)
two_args_to_pdf(dist) = (x, y) -> pdf(dist, [x, y]) # returns a function to be plotted
# plot
contour(x_grid, y_grid, two_args_to_pdf(dist), fill = false,
color = :lighttest, cbar = false)
contour!(x_grid, y_grid, two_args_to_pdf(dist), fill = false, lw=1,
color = :grays, cbar = false)
Out[4]:
We are now presented with some good news and some bad news.
The good news is that the missile has been located by our sensors, which report that the cur-
rent location is 𝑦 = (2.3, −1.9).
The next figure shows the original prior 𝑝(𝑥) and the new reported location 𝑦
Out[5]:
The bad news is that our sensors are imprecise: the reported location is
$$y = G x + v, \quad \text{where } v \sim N(0, R) \tag{3}$$
Here 𝐺 and 𝑅 are 2 × 2 matrices with 𝑅 positive definite. Both are assumed known, and the noise term 𝑣 is assumed to be independent of 𝑥.
How then should we combine our prior 𝑝(𝑥) = 𝑁 (𝑥,̂ Σ) and this new information 𝑦 to improve
our understanding of the location of the missile?
As you may have guessed, the answer is to use Bayes’ theorem, which tells us to update our
prior 𝑝(𝑥) to 𝑝(𝑥 | 𝑦) via
$$p(x \mid y) = \frac{p(y \mid x)\, p(x)}{p(y)}$$
Because all the distributions involved are Gaussian, the posterior is also Gaussian:
$$p(x \mid y) = N(\hat{x}^F, \Sigma^F)$$
where
$$\hat{x}^F := \hat{x} + \Sigma G' (G \Sigma G' + R)^{-1} (y - G\hat{x}) \quad \text{and} \quad \Sigma^F := \Sigma - \Sigma G' (G \Sigma G' + R)^{-1} G \Sigma \tag{4}$$
Here Σ𝐺′ (𝐺Σ𝐺′ + 𝑅)−1 is the matrix of population regression coefficients of the hidden object
𝑥 − 𝑥̂ on the surprise 𝑦 − 𝐺𝑥.̂
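The update in (4) is easy to compute directly. A minimal sketch, using the prior from (2) and the 𝐺, 𝑅, and 𝑦 values of this example:

using LinearAlgebra
x̂ = [0.2, -0.2]; Σ = [0.4 0.3; 0.3 0.45]
G = I(2); R = 0.5 * Σ
y = [2.3, -1.9]
M = Σ * G' * inv(G * Σ * G' + R) # population regression coefficients
x̂_F = x̂ + M * (y - G * x̂)       # filtered mean, from (4)
Σ_F = Σ - M * G * Σ               # filtered covariance, from (4)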
This new density 𝑝(𝑥 | 𝑦) = 𝑁 (𝑥𝐹̂ , Σ𝐹 ) is shown in the next figure via contour lines and the
color map.
The original density is left in as contour lines for comparison
Out[6]:
Our new density twists the prior 𝑝(𝑥) in a direction determined by the new information 𝑦 −
𝐺𝑥.̂
In generating the figure, we set 𝐺 to the identity matrix and 𝑅 = 0.5Σ for Σ defined in (2).
The missile, however, is moving. Suppose that its law of motion is
$$x_{t+1} = A x_t + w_{t+1}, \quad \text{where } w_t \overset{\text{IID}}{\sim} N(0, Q) \tag{5}$$
Our aim is to combine this law of motion and our current distribution 𝑝(𝑥 | 𝑦) = 𝑁(𝑥̂𝐹, Σ𝐹) to come up with a new predictive distribution for the location in one unit of time.
In view of (5), all we have to do is introduce a random vector 𝑥𝐹 ∼ 𝑁 (𝑥𝐹̂ , Σ𝐹 ) and work out
the distribution of 𝐴𝑥𝐹 + 𝑤 where 𝑤 is independent of 𝑥𝐹 and has distribution 𝑁 (0, 𝑄).
Since linear combinations of Gaussians are Gaussian, 𝐴𝑥𝐹 + 𝑤 is Gaussian.
Elementary calculations and the expressions in (4) tell us that
$$\mathbb{E}[A x_F + w] = A \hat{x}^F = A\hat{x} + A \Sigma G' (G \Sigma G' + R)^{-1} (y - G\hat{x})$$
and
$$\operatorname{Var}[A x_F + w] = A \Sigma^F A' + Q = A \Sigma A' - A \Sigma G' (G \Sigma G' + R)^{-1} G \Sigma A' + Q$$
The matrix 𝐴Σ𝐺′ (𝐺Σ𝐺′ + 𝑅)−1 is often written as 𝐾Σ and called the Kalman gain.
• The subscript Σ has been added to remind us that 𝐾Σ depends on Σ, but not 𝑦 or 𝑥.̂
Using this notation, we can summarize our results as follows.
Our updated prediction is the density $N(\hat{x}_{new}, \Sigma_{new})$ where
$$\hat{x}_{new} := A\hat{x} + K_\Sigma (y - G\hat{x}), \qquad \Sigma_{new} := A \Sigma A' - K_\Sigma G \Sigma A' + Q \tag{6}$$
• The density $p_{new}(x) = N(\hat{x}_{new}, \Sigma_{new})$ is called the predictive distribution.
The predictive distribution is the new density shown in the following figure, where the update
has used parameters
$$A = \begin{pmatrix} 1.2 & 0.0 \\ 0.0 & -0.2 \end{pmatrix}, \qquad Q = 0.3\, \Sigma$$
# plot Density 3
contour(x_grid, y_grid, two_args_to_pdf(predictdist), fill = false, lw = 1,
color = :lighttest, cbar = false)
contour!(x_grid, y_grid, two_args_to_pdf(dist),
color = :grays, cbar = false)
contour!(x_grid, y_grid, two_args_to_pdf(newdist), fill = false, levels = 7,
color = :grays, cbar = false)
annotate!(y[1], y[2], "y", color = :black)
Out[7]:
Swapping notation 𝑝𝑡(𝑥) for 𝑝(𝑥) and 𝑝𝑡+1(𝑥) for 𝑝𝑛𝑒𝑤(𝑥), the full recursive procedure is:
$$\hat{x}_{t+1} = A \hat{x}_t + K_{\Sigma_t} (y_t - G \hat{x}_t), \qquad \Sigma_{t+1} = A \Sigma_t A' - K_{\Sigma_t} G \Sigma_t A' + Q \tag{7}$$
These are the standard dynamic equations for the Kalman filter (see, for example, [68], page
58).
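A minimal sketch of iterating (7) over a sequence of observations; kalman_iterate is a hypothetical helper, not part of any library:

# iterate the Kalman recursions (7) over a sequence of observations
function kalman_iterate(A, G, Q, R, x̂₀, Σ₀, y_seq)
    x̂, Σ = x̂₀, Σ₀
    for y in y_seq
        K = A * Σ * G' * inv(G * Σ * G' + R) # Kalman gain K_Σt
        x̂ = A * x̂ + K * (y - G * x̂)          # update the predicted mean
        Σ = A * Σ * A' - K * G * Σ * A' + Q   # update the predicted covariance
    end
    return x̂, Σ
end
# scalar usage example with hypothetical values
kalman_iterate(1.0, 1.0, 0.0, 1.0, 8.0, 1.0, 10 .+ randn(5))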
21.4 Convergence

The matrix Σ𝑡 in (7) measures the uncertainty of our prediction 𝑥̂𝑡, and it is natural to ask whether it converges as 𝑡 → ∞.
Expanding the Kalman gain, the recursion for Σ𝑡 is
$$\Sigma_{t+1} = A \Sigma_t A' - A \Sigma_t G' (G \Sigma_t G' + R)^{-1} G \Sigma_t A' + Q \tag{8}$$
and a fixed point of (8), when it exists, solves the matrix Riccati equation
$$\Sigma = A \Sigma A' - A \Sigma G' (G \Sigma G' + R)^{-1} G \Sigma A' + Q \tag{9}$$
A sufficient (but not necessary) condition for convergence is that all the eigenvalues 𝜆𝑖 of 𝐴 satisfy |𝜆𝑖| < 1 (cf. e.g., [3], p. 77).
(This strong condition assures that the unconditional distribution of 𝑥𝑡 converges as 𝑡 → +∞)
In this case, for any initial choice of Σ0 that is both nonnegative and symmetric, the sequence
{Σ𝑡 } in (8) converges to a nonnegative symmetric matrix Σ that solves (9).
21.5 Implementation
The QuantEcon.jl package is able to implement the Kalman filter by using methods for the
type Kalman
• Instance data consists of:
– The parameters 𝐴, 𝐺, 𝑄, 𝑅 of a given model
– the moments (𝑥𝑡̂ , Σ𝑡 ) of the current prior
• The type Kalman from the QuantEcon.jl package has a number of methods, some that
we will wait to use until we study more advanced applications in subsequent lectures.
• Methods pertinent for this lecture are:
– prior_to_filtered, which updates (𝑥̂𝑡, Σ𝑡) to (𝑥̂𝐹𝑡, Σ𝐹𝑡)
– filtered_to_forecast, which updates the filtering distribution to the predictive distribution, which becomes the new prior (𝑥̂𝑡+1, Σ𝑡+1)
– update, which combines the last two methods
– stationary_values, which computes the solution to (9) and the corresponding (stationary) Kalman gain
You can view the program on GitHub.
21.6 Exercises
21.6.1 Exercise 1
Consider the following simple application of the Kalman filter, loosely based on [68], section
2.9.2.
Suppose that
• all variables are scalars
• the hidden state {𝑥𝑡 } is in fact constant, equal to some 𝜃 ∈ ℝ unknown to the modeler
State dynamics are therefore given by (5) with 𝐴 = 1, 𝑄 = 0 and 𝑥0 = 𝜃.
The measurement equation is 𝑦𝑡 = 𝜃 + 𝑣𝑡 where 𝑣𝑡 is 𝑁 (0, 1) and iid.
The task of this exercise to simulate the model and, using the code from kalman.jl, plot
the first five predictive densities 𝑝𝑡 (𝑥) = 𝑁 (𝑥𝑡̂ , Σ𝑡 ).
As shown in [68], sections 2.9.1–2.9.2, these distributions asymptotically put all mass on the
unknown value 𝜃.
In the simulation, take 𝜃 = 10, 𝑥0̂ = 8 and Σ0 = 1.
Your figure should – modulo randomness – look something like this
21.6.2 Exercise 2
The preceding figure gives some support to the idea that probability mass converges to 𝜃.
To get a better idea, choose a small 𝜖 > 0 and calculate
$$z_t := 1 - \int_{\theta - \epsilon}^{\theta + \epsilon} p_t(x)\, dx$$
for 𝑡 = 0, 1, 2, … , 𝑇 .
Plot 𝑧𝑡 against 𝑇 , setting 𝜖 = 0.1 and 𝑇 = 600.
Your figure should show error erratically declining something like this
21.6.3 Exercise 3
As discussed above, if the shock sequence {𝑤𝑡 } is not degenerate, then it is not in general
possible to predict 𝑥𝑡 without error at time 𝑡 − 1 (and this would be the case even if we could
observe 𝑥𝑡−1 ).
Let’s now compare the prediction 𝑥𝑡̂ made by the Kalman filter against a competitor who is
allowed to observe 𝑥𝑡−1 .
This competitor will use the conditional expectation 𝔼[𝑥𝑡 | 𝑥𝑡−1 ], which in this case is 𝐴𝑥𝑡−1 .
The conditional expectation is known to be the optimal prediction method in terms of mini-
mizing mean squared error.
(More precisely, the minimizer of 𝔼 ‖𝑥𝑡 − 𝑔(𝑥𝑡−1 )‖2 with respect to 𝑔 is 𝑔∗ (𝑥𝑡−1 ) ∶= 𝔼[𝑥𝑡 | 𝑥𝑡−1 ])
Thus we are comparing the Kalman filter against a competitor who has more information (in
the sense of being able to observe the latent state) and behaves optimally in terms of mini-
mizing squared error.
Our horse race will be assessed in terms of squared error.
In particular, your task is to generate a graph plotting observations of both ‖𝑥𝑡 − 𝐴𝑥𝑡−1 ‖2 and
‖𝑥𝑡 − 𝑥𝑡̂ ‖2 against 𝑡 for 𝑡 = 1, … , 50.
For the parameters, set 𝐺 = 𝐼, 𝑅 = 0.5𝐼 and 𝑄 = 0.3𝐼, where 𝐼 is the 2 × 2 identity.
Set
$$A = \begin{pmatrix} 0.5 & 0.4 \\ 0.6 & 0.3 \end{pmatrix}$$
$$\Sigma_0 = \begin{pmatrix} 0.9 & 0.3 \\ 0.3 & 0.9 \end{pmatrix}$$
Observe how, after an initial learning period, the Kalman filter performs quite well, even rela-
tive to the competitor who predicts optimally with knowledge of the latent state.
21.6.4 Exercise 4
21.7 Solutions
21.7.1 Exercise 1
In [9]: # parameters
θ = 10
A, G, Q, R = 1.0, 1.0, 0.0, 1.0 # reconstruction; the remaining setup was lost in extraction
x̂_0, Σ_0 = 8.0, 1.0
# initialize the Kalman filter and a grid for plotting
kalman = Kalman(A, G, Q, R)
set_state!(kalman, x̂_0, Σ_0)
xgrid = range(θ - 5, θ + 2, length = 200)
densities = zeros(200, 5) # one column per predictive density
for i in 1:5
    # record the current predicted mean and variance
    m, v = kalman.cur_x_hat, kalman.cur_sigma
    densities[:, i] = pdf.(Normal(m, sqrt(v)), xgrid)
    update!(kalman, θ + randn()) # generate the noisy signal y_t = θ + v_t and update
end
Out[9]:
21.7.2 Exercise 2
ϵ = 0.1
kalman = Kalman(A, G, Q, R)
set_state!(kalman, x̂_0, Σ_0)
nodes, weights = qnwlege(21, θ - ϵ, θ + ϵ) # quadrature nodes on [θ-ϵ, θ+ϵ]
T = 600
z = zeros(T)
for t in 1:T
    # record the current predicted mean and variance
    m, v = kalman.cur_x_hat, kalman.cur_sigma
    dist = Truncated(Normal(m, sqrt(v)), θ - 30ϵ, θ + 30ϵ) # define on a compact interval
    integral = dot(pdf.(dist, nodes), weights) # reconstruction; the remainder of this loop was lost in extraction
    z[t] = 1.0 - integral
    update!(kalman, θ + randn())
end
Out[10]:
21.7.3 Exercise 3
In [11]: # define A, Q, G, R
G = I + zeros(2, 2)
R = 0.5 .* G
A = [0.5 0.4
0.6 0.3]
Q = 0.3 .* G
kn = Kalman(A, G, Q, R) # reconstruction; the construction of kn was lost in extraction
# print eigenvalues of A
println("Eigenvalues of A:\n$(eigvals(A))")
# print stationary Σ
S, K = stationary_values(kn)
println("Stationary prediction error variance:\n$S")
Eigenvalues of A:
[-0.10000000000000003, 0.8999999999999999]
Stationary prediction error variance:
[0.4032910794778669 0.10507180275061759; 0.1050718027506176 0.41061709375220456]
Out[11]:
Footnotes
[1] See, for example, page 93 of [14]. To get from his expressions to the ones used above, you
will also need to apply the Woodbury matrix identity.
Chapter 22

Numerical Linear Algebra and Factorizations
22.1 Contents
• Overview 22.2
• Factorizations 22.3
• Continuous-Time Markov Chains (CTMCs) 22.4
• Banded Matrices 22.5
• Implementation Details and Performance 22.6
• Exercises 22.7
22.2 Overview
In this lecture, we examine the structure of matrices and linear operators (e.g., dense, sparse,
symmetric, tridiagonal, banded) and discuss how the structure can be exploited to radically
increase the performance of solving large problems.
We build on applications discussed in previous lectures: linear algebra, orthogonal projec-
tions, and Markov chains.
The methods in this section are called direct methods, and they are qualitatively similar to
performing Gaussian elimination to factor matrices and solve systems of equations. In itera-
tive methods and sparsity we examine a different approach, using iterative algorithms, where
we can think of more general linear operators.
The list of specialized packages for these tasks is enormous and growing, but some of the important organizations to look at are JuliaMatrices, JuliaSparse, and JuliaMath.
NOTE: As this section uses advanced Julia techniques, you may wish to review multiple-
dispatch and generic programming in introduction to types, and consider further study on
generic programming.
The theme of this lecture, and numerical linear algebra in general, comes down to three prin-
ciples:
1. Identify structure (e.g., symmetric, sparse, diagonal) matrices in order to use spe-
cialized algorithms.
2. Do not lose structure by applying the wrong numerical linear algebra operations at the wrong times (e.g., letting a sparse matrix become dense).
3. Understand the computational complexity of each algorithm, given the structure of the inputs.
22.2.1 Setup
Ask yourself whether the following is a computationally expensive operation as the matrix
size increases
• Multiplying two matrices?
– Answer: It depends. Multiplying two diagonal matrices is trivial.
• Solving a linear system of equations?
– Answer: It depends. If the matrix is the identity, the solution is the vector itself.
• Finding the eigenvalues of a matrix?
– Answer: It depends. The eigenvalues of a triangular matrix are the diagonal ele-
ments.
As the goal of this section is to move toward numerical methods with large systems, we need
to understand how well algorithms scale with the size of matrices, vectors, etc. This is known
as computational complexity. As we saw in the answer to the questions above, the algorithm
- and hence the computational complexity - changes based on matrix structure.
While this notion of complexity can work at various levels, such as the number of significant
digits for basic mathematical operations, the amount of memory and storage required, or the
amount of time, we will typically focus on the time complexity.
For time complexity, the size 𝑁 is usually the dimensionality of the problem, although occa-
sionally the key will be the number of non-zeros in the matrix or the width of bands. For our
applications, time complexity is best thought of as the number of floating point operations
(e.g., addition, multiplication) required.
Notation: complexity is written in “big O” notation, as in 𝑓(𝑁) = 𝑂(𝑔(𝑁)).
The interpretation is that there exist some constants 𝑀 and 𝑁0 such that
$$|f(N)| \leq M\, |g(N)| \quad \text{for all } N > N_0$$
You will sometimes need to think through how combining algorithms changes complexity. For
example, if you use
1. an 𝑂(𝑁 3 ) operation 𝑃 times, then it simply changes the constant. The complexity re-
mains 𝑂(𝑁 3 ).
2. one 𝑂(𝑁 3 ) operation and one 𝑂(𝑁 2 ) operation, then you take the max. The complexity
remains 𝑂(𝑁 3 ).
3. a repetition of an 𝑂(𝑁 ) operation that itself uses an 𝑂(𝑁 ) operation, you take the
product. The complexity becomes 𝑂(𝑁 2 ).
In [3]: A = sprand(10, 10, 0.45) # random sparse 10x10, 45 percent filled with non-zeros
@show nnz(A) # count the non-zeros
invA = sparse(inv(Array(A))) # Julia will not invert a sparse matrix directly, so convert to dense with Array. (reconstruction; parts of this cell were lost in extraction)
@show nnz(invA);

nnz(A) = 47
nnz(invA) = 100
This increase from less than 50 to 100 percent dense demonstrates that significant sparsity
can be lost when computing an inverse.
The results can be even more extreme. Consider a tridiagonal matrix of size 𝑁 × 𝑁 that
might come out of a Markov chain or a discretization of a diffusion process,
In [4]: N = 5
A = Tridiagonal([fill(0.1, N-2); 0.2], fill(0.8, N), [0.2; fill(0.1, N-2);])
The number of non-zeros here is approximately 3𝑁 , linear, which scales well for huge matri-
ces into the millions or billions
But consider the inverse
In [5]: inv(A)
Sparsity is also lost through matrix products. For example, with a random 20 × 21 sparse matrix (reconstruction; the original cell was lost in extraction):

A = sprand(20, 21, 0.3)
@show nnz(A) / 20 ^ 2
@show nnz(A' * A) / 21 ^ 2;

nnz(A) / 20 ^ 2 = 0.2825
nnz(A' * A) / 21 ^ 2 = 0.800453514739229
We see that a 30 percent dense matrix becomes almost full dense after the product is taken.
Sparsity/Structure is not just for storage: Matrix size can sometimes become important (e.g.,
a 1 million by 1 million tridiagonal matrix needs to store 3 million numbers (i.e., about 6MB
of memory), where a dense one requires 1 trillion (i.e., about 1TB of memory)).
But, as we will see, the main purpose of considering sparsity and matrix structure is that it
enables specialized algorithms, which typically have a lower computational order than un-
structured dense, or even unstructured sparse, operations.
First, create a convenient function for benchmarking linear solvers (a reconstruction; the original cell was lost in extraction, and the name benchmark_solve is assumed)

using BenchmarkTools
function benchmark_solve(A, b)
    println("A\\b for typeof(A) = $(string(typeof(A)))")
    @btime $A \ $b
end
In [8]: N = 1000
b = rand(N)
A = Tridiagonal([fill(0.1, N-2); 0.2], fill(0.8, N), [0.2; fill(0.1, N-2);])
A_sparse = sparse(A) # sparse but losing tridiagonal structure
A_dense = Array(A) # dropping the sparsity structure, dense 1000x1000
This example shows what is at stake: using a structured tridiagonal matrix may be 10-20
times faster than using a sparse matrix, which is 100 times faster than using a dense matrix.
In fact, the difference becomes more extreme as the matrices grow. Solving a tridiagonal sys-
tem is 𝑂(𝑁 ), while that of a dense matrix without any structure is 𝑂(𝑁 3 ). The complexity of
a sparse solution is more complicated, and scales in part by the nnz(N), i.e., the number of
nonzeros.
While we write matrix multiplications in our algebra with abundance, in practice the compu-
tational operation scales very poorly without any matrix structure.
In [9]: N = 5
U = UpperTriangular(rand(N,N))
In [10]: L = U'
But the product is fully dense (e.g., think of a Cholesky multiplied by itself to produce a co-
variance matrix)
In [11]: L * U
On the other hand, a tridiagonal matrix times a diagonal matrix is still tridiagonal - and can
use specialized 𝑂(𝑁 ) algorithms.
A = Tridiagonal([fill(0.1, N-2); 0.2], fill(0.8, N), [0.2; fill(0.1, N-2);]) # reconstruction; this definition was lost in extraction
D = Diagonal(rand(N))
D * A
22.3 Factorizations
When you tell a numerical analyst you are solving a linear system using direct methods, their
first question is “which factorization?”.
Just as you can factor a number (e.g., 6 = 3 × 2) you can factor a matrix as the product
of other, more convenient matrices (e.g., 𝐴 = 𝐿𝑈 or 𝐴 = 𝑄𝑅, where 𝐿, 𝑈 , 𝑄, and 𝑅 have
properties such as being triangular, orthogonal, etc.).
On paper, since the Invertible Matrix Theorem tells us that a unique solution is equivalent to
𝐴 being invertible, we often write the solution to 𝐴𝑥 = 𝑏 as
𝑥 = 𝐴−1 𝑏
In [13]: N = 4
A = rand(N,N)
b = rand(N)
In [14]: x = inv(A) * b
As we will see throughout, inverting matrices should be used for theory, not for code. The
classic advice that you should never invert a matrix may be slightly exaggerated, but is gen-
erally good advice.
Solving a system by inverting a matrix is always a little slower, is potentially less accurate,
and will sometimes lose crucial sparsity compared to using factorizations. Moreover, the
methods used by libraries to invert matrices are frequently the same factorizations used for
computing a system of equations.
Even if you need to solve a system with the same matrix multiple times, you are better off
factoring the matrix and using the solver rather than calculating an inverse.
In [15]: N = 100
A = rand(N,N)
M = 30
B = rand(N,M)
function solve_inverting(A, B)
A_inv = inv(A)
X = similar(B)
for i in 1:size(B,2)
X[:,i] = A_inv * B[:,i]
end
return X
end
function solve_factoring(A, B)
X = similar(B)
A = factorize(A)
for i in 1:size(B,2)
X[:,i] = A \ B[:,i]
end
return X
end
Some matrices are already in a convenient form and require no further factoring.
For example, consider solving a system with an UpperTriangular matrix,
In [17]: b = rand(5)  # assumption: conformable with the 5×5 U defined earlier
U \ b
A LowerTriangular matrix has similar properties and can be solved with forward substitu-
tion.
The computational order of back substitution and forward substitution is 𝑂(𝑁 2 ) for dense
matrices. Those fast algorithms are a key reason that factorizations target triangular struc-
tures.
22.3.3 LU Decomposition
The 𝐿𝑈 decomposition finds a lower triangular matrix 𝐿 and an upper triangular matrix 𝑈
such that 𝐿𝑈 = 𝐴.
For a general dense matrix without any other structure (i.e., not known to be symmetric,
tridiagonal, etc.) this is the standard approach to solve a system and exploit the speed of
back and forward substitution using the factorization.
The computational order of LU decomposition itself for a dense matrix is 𝑂(𝑁 3 ) - the same
as Gaussian elimination - but it tends to have a better constant term than others (e.g., half
the number of operations of the QR decomposition). For structured or sparse matrices, that
order drops.
We can see which algorithm Julia will use for the \ operator by looking at the factorize
function for a given matrix.
In [18]: N = 4
A = rand(N,N)
b = rand(N)
Af = factorize(A)  # store the factorization object, used below as Af \ b
Out[18]: LU{Float64,Array{Float64,2}}
L factor:
4×4 Array{Float64,2}:
1.0 0.0 0.0 0.0
0.563082 1.0 0.0 0.0
0.730109 0.912509 1.0 0.0
0.114765 0.227879 0.115228 1.0
U factor:
4×4 Array{Float64,2}:
0.79794 0.28972 0.765939 0.496278
0.0 0.82524 0.23962 -0.130989
0.0 0.0 -0.447888 0.374303
0.0 0.0 0.0 0.725264
In [19]: Af \ b
In [20]: b2 = rand(N)
Af \ b2
In practice, the decomposition also includes a 𝑃 which is a permutation matrix such that
𝑃 𝐴 = 𝐿𝑈 .
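A minimal check of this property, assuming the Af = factorize(A) from above:

In [21]: Af.P * A ≈ Af.L * Af.U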
Out[21]: true
We can also directly calculate an LU decomposition with lu, but without the pivoting permutation matrices,
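A sketch of the call (on the Julia version of this vintage, Val(false) disables pivoting):

In [22]: L, U = lu(A, Val(false))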
Out[22]: LU{Float64,Array{Float64,2}}
L factor:
4×4 Array{Float64,2}:
1.0 0.0 0.0 0.0
0.730109 1.0 0.0 0.0
0.563082 1.09588 1.0 0.0
0.114765 0.249728 0.122733 1.0
U factor:
4×4 Array{Float64,2}:
0.79794 0.28972 0.765939 0.496278
0.0 0.753039 -0.229233 0.254774
0.0 0.0 0.490832 -0.410191
0.0 0.0 0.0 0.725264
In [23]: A ≈ L * U
Out[23]: true
To see roughly how the solver works, note that we can write the problem 𝐴𝑥 = 𝑏 as 𝐿𝑈 𝑥 = 𝑏.
Let 𝑈 𝑥 = 𝑦, which breaks the problem into two sub-problems.
𝐿𝑦 = 𝑏
𝑈𝑥 = 𝑦
As we saw above, this is the solution to two triangular systems, which can be efficiently done
with back or forward substitution in 𝑂(𝑁 2 ) operations.
To demonstrate this, first solve for 𝑦 using the 𝐿 factor from above,
In [24]: y = L \ b
In [25]: x = U \ y
x ≈ A \ b # Check identical
Out[25]: true
The LU decomposition also has specialized algorithms for structured matrices, such as a
Tridiagonal
In [26]: N = 1000
b = rand(N)
A = Tridiagonal([fill(0.1, N-2); 0.2], fill(0.8, N), [0.2; fill(0.1, N-2);])
factorize(A)  # assumption: the cell displayed the factorization type
Out[26]: LU{Float64,Tridiagonal{Float64,Array{Float64,1}}}
This factorization is the key to the performance of A \ b in this case. For Tridiagonal matrices, the LU decomposition is 𝑂(𝑁).
Finally, just as a dense matrix without any structure uses an LU decomposition to solve a system, so do the sparse solvers

In [27]: factorize(A_sparse)  # assumption: factorize the sparse representation
Out[27]: SuiteSparse.UMFPACK.UmfpackLU{Float64,Int64}
In [28]: benchmark_solve(A, b)
benchmark_solve(A_sparse, b);
With sparsity, the computational order is related to the number of non-zeros rather than the
size of the matrix itself.
22.3.4 Cholesky Decomposition

For real, symmetric, positive definite matrices, a Cholesky decomposition is a specialized factorization 𝐴 = 𝐿𝐿′, where 𝐿 is lower triangular.

In [29]: N = 500
B = rand(N,N)
A_dense = B' * B  # an easy way to generate a symmetric positive semi-definite matrix
A = Symmetric(A_dense)  # flags the matrix as symmetric
factorize(A)  # assumption: display which factorization Julia chooses
Out[29]: BunchKaufman{Float64,Array{Float64,2}}
Here, the factorization of 𝐴 is Bunch-Kaufman rather than Cholesky, because Julia doesn't know that the matrix is positive definite. We can manually factorize with a Cholesky,

In [30]: cholesky(A)  # assumption: a direct call on the symmetric matrix
Out[30]: Cholesky{Float64,Array{Float64,2}}
Benchmarking,
In [31]: b = rand(N)
cholesky(A) \ b # use the factorization to solve
benchmark_solve(A, b)
benchmark_solve(A_dense, b)
@btime cholesky($A, check=false) \ $b;
22.3.5 QR Decomposition
Previously, we learned about applications of the QR decomposition to solving the linear least-squares problem.
While in principle the solution to the least-squares problem, min𝑥 ‖𝐴𝑥 − 𝑏‖², is 𝑥 = (𝐴′𝐴)⁻¹𝐴′𝑏, in practice 𝐴′𝐴 becomes dense and ill-conditioned, and calculating the inverse is rarely a good idea.
The QR decomposition is a decomposition 𝐴 = 𝑄𝑅 where 𝑄 is an orthogonal matrix (i.e.,
𝑄′ 𝑄 = 𝑄𝑄′ = 𝐼) and 𝑅 is an upper triangular matrix.
Given the previous derivation, we showed that we can write the least-squares problem as the
solution to
𝑅𝑥 = 𝑄′ 𝑏
where, as discussed above, the upper-triangular structure of 𝑅 can be solved easily with back
substitution.
The \ operator solves the linear least-squares problem whenever the given A is rectangular
In [32]: N = 10
M = 3
x_true = rand(3)
A = rand(N,M) .+ randn(N)
b = rand(N)
x = A \ b
In [33]: Af = qr(A)
Q = Af.Q
R = [Af.R; zeros(N - M, M)] # Stack with zeros
@show Q * R ≈ A
x = R \ Q'*b # simplified QR solution for least squares
Q * R ≈ A = true
This stacks the R with zeros, but the more specialized algorithm would not multiply directly
in that way.
In some cases, if an LU is not available for a particular matrix structure, the QR factorization
can also be used to solve systems of equations (i.e., not just LLS). This tends to be about 2
times slower than the LU but is of the same computational order.
Deriving the approach, where we can now use the inverse since the system is square and we
assumed 𝐴 was non-singular,
𝐴𝑥 = 𝑏
𝑄𝑅𝑥 = 𝑏
𝑄⁻¹𝑄𝑅𝑥 = 𝑄⁻¹𝑏
𝑅𝑥 = 𝑄′𝑏
where the last step uses the fact that 𝑄−1 = 𝑄′ for an orthogonal matrix.
Given the decomposition, the solution for dense matrices is of computational order 𝑂(𝑁 2 ).
To see this, look at the order of each operation.
• Since 𝑅 is an upper-triangular matrix, it can be solved quickly through back substitu-
tion with computational order 𝑂(𝑁 2 )
• A transpose operation is of order 𝑂(𝑁 2 )
• A matrix-vector product is also 𝑂(𝑁 2 )
In all cases, the order would drop depending on the sparsity pattern of the matrix (and cor-
responding decomposition). A key benefit of a QR decomposition is that it tends to maintain
sparsity.
Without implementing the full process, you can form a QR factorization with qr and then
use it to solve a system
In [34]: N = 5
A = rand(N,N)
b = rand(N)
@show A \ b
@show qr(A) \ b;
22.3.6 Spectral Decomposition

Recall the spectral decomposition of a square matrix 𝐴 with a full set of eigenvectors,

𝐴 = 𝑄Λ𝑄⁻¹

where Λ is a diagonal matrix of eigenvalues and 𝑄 is the matrix of the corresponding eigenvectors.
In Julia, whenever you ask for a full set of eigenvectors and eigenvalues, it decomposes using
an algorithm appropriate for the matrix type. For example, symmetric, Hermitian, and tridi-
agonal matrices have specialized algorithms.
To see this,

In [35]: A_eig = eigen(A)
Λ = Diagonal(A_eig.values)
Q = A_eig.vectors
norm(Q * Λ * inv(Q) - A)
Out[35]: 2.803627108839096e-15
Keep in mind that a real matrix may have complex eigenvalues and eigenvectors, so if you
attempt to check Q * Λ * inv(Q) - A - even for a positive-definite matrix - it may not
be a real number due to numerical inaccuracy.
22.4 Continuous-Time Markov Chains (CTMCs)

In the previous lecture on discrete-time Markov chains, we saw that the transition probability between state 𝑥 and state 𝑦 was summarized by the matrix 𝑃(𝑥, 𝑦) ∶= ℙ{𝑋𝑡+1 = 𝑦 | 𝑋𝑡 = 𝑥}.
As a brief introduction to continuous time processes, consider the same state space as in the
discrete case: 𝑆 is a finite set with 𝑛 elements {𝑥1 , … , 𝑥𝑛 }.
A Markov chain {𝑋𝑡 } on 𝑆 is a sequence of random variables on 𝑆 that have the Markov
property.
In continuous time, the Markov Property is more complicated, but intuitively is the same as
the discrete-time case.
That is, knowing the current state is enough to know probabilities for future states. Or, for realizations 𝑥(𝜏) ∈ 𝑆, 𝜏 ≤ 𝑡,

ℙ{𝑋(𝑡 + Δ) = 𝑦 | 𝑋(𝑡) = 𝑥, 𝑋(𝜏) = 𝑥(𝜏) for all 𝜏 ≤ 𝑡} = ℙ{𝑋(𝑡 + Δ) = 𝑦 | 𝑋(𝑡) = 𝑥}

Heuristically, consider a time period 𝑡 and a small step forward, Δ. Then the probability to transition from state 𝑖 to state 𝑗 is

ℙ{𝑋(𝑡 + Δ) = 𝑗 | 𝑋(𝑡) = 𝑖} = 𝑞𝑖𝑗 Δ + 𝑜(Δ), for 𝑖 ≠ 𝑗
where the 𝑞𝑖𝑗 are “intensity” parameters governing the transition rate, and 𝑜(Δ) is little-o
notation. That is, lim∆→0 𝑜(Δ)/Δ = 0.
Just as in the discrete case, we can summarize these parameters by an 𝑁 × 𝑁 matrix, 𝑄 ∈
𝑅𝑁×𝑁 .
Recall that in the discrete case every element is weakly positive and every row must sum to
one. With continuous time, however, the rows of 𝑄 sum to zero, where the diagonal contains
the negative value of jumping out of the current state. That is,
• 𝑞𝑖𝑗 ≥ 0 for 𝑖 ≠ 𝑗
• 𝑞𝑖𝑖 ≤ 0
• ∑𝑗 𝑞𝑖𝑗 = 0
The 𝑄 matrix is called the intensity matrix, or the infinitesimal generator of the Markov
chain. For example,
$$Q = \begin{bmatrix} -0.1 & 0.1 & 0 & 0 & 0 & 0 \\ 0.1 & -0.2 & 0.1 & 0 & 0 & 0 \\ 0 & 0.1 & -0.2 & 0.1 & 0 & 0 \\ 0 & 0 & 0.1 & -0.2 & 0.1 & 0 \\ 0 & 0 & 0 & 0.1 & -0.2 & 0.1 \\ 0 & 0 & 0 & 0 & 0.1 & -0.1 \end{bmatrix}$$
In the above example, transitions occur only between adjacent states with the same intensity
(except for a “bouncing back” of the bottom and top states).
Implementing the 𝑄 using its tridiagonal structure
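The construction cell was lost in extraction; a minimal sketch, mirroring the tridiagonal constructors used elsewhere in this lecture:

α = 0.1
N = 6
Q = Tridiagonal(fill(α, N-1), [-α; fill(-2α, N-2); -α], fill(α, N-1))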
Now consider valuing a payoff vector 𝑟 with discount rate 𝜌 > 0, where the value 𝑣 satisfies the Bellman equation

𝜌𝑣 = 𝑟 + 𝑄𝑣

or, rearranging,

(𝜌𝐼 − 𝑄)𝑣 = 𝑟
In [37]: r = range(0.0, 10.0, length = N)  # assumption: a payoff increasing in the state
ρ = 0.05
A = ρ * I - Q
Note that this 𝐴 matrix is maintaining the tridiagonal structure of the problem, which leads
to an efficient solution to the linear problem.
In [38]: v = A \ r
The 𝑄 is also used to calculate the evolution of the Markov chain, in direct analogy to the
𝜓𝑡+𝑘 = 𝜓𝑡 𝑃 𝑘 evolution with the transition matrix 𝑃 of the discrete case.
In the continuous case, this becomes the system of linear differential equations
𝜓̇(𝑡) = 𝑄(𝑡)ᵀ𝜓(𝑡)
given the initial condition 𝜓(0) and where the 𝑄(𝑡) intensity matrix is allowed to vary with
time. In the simplest case of a constant 𝑄 matrix, this is a simple constant-coefficient system
of linear ODEs with coefficients 𝑄𝑇 .
If a stationary equilibrium exists, note that 𝜓̇(𝑡) = 0, and the stationary solution 𝜓∗ needs to satisfy
0 = 𝑄𝑇 𝜓 ∗
Notice that this is of the form 0𝜓∗ = 𝑄𝑇 𝜓∗ and hence is equivalent to finding the eigenvector
associated with the 𝜆 = 0 eigenvalue of 𝑄𝑇 .
With our example, we can calculate all of the eigenvalues and eigenvectors

In [39]: eigen(Matrix(Q'))  # assumption: dense eigendecomposition of the adjoint
Out[39]: Eigen{Float64,Float64,Array{Float64,2},Array{Float64,1}}
values:
6-element Array{Float64,1}:
-0.3732050807568874
-0.29999999999999993
-0.19999999999999998
-0.09999999999999995
-0.026794919243112274
0.0
vectors:
6×6 Array{Float64,2}:
-0.149429 -0.288675 0.408248 0.5 -0.557678 0.408248
⋮ (remaining rows of the output omitted)
Indeed, there is a 𝜆 = 0 eigenvalue, which is associated with the last column of the eigenvector matrix. To turn that eigenvector into a probability distribution, we need to normalize it so that it sums to one.
A frequent case in discretized models is dealing with Markov chains with multiple “spatial”
dimensions (e.g., wealth and income).
After discretizing a process to create a Markov chain, you can always take the Cartesian
product of the set of states in order to enumerate it as a single state variable.
To see this, consider states 𝑖 and 𝑗 governed by infinitesimal generators 𝑄 and 𝐴.
function markov_chain_product(Q, A)  # reconstruction: Kronecker sum of the two generators
    M = size(Q, 1)
    N = size(A, 1)
    Qs = blockdiag(fill(sparse(Q), N)...)  # a diagonal block of Q for each state of A
    As = kron(A, sparse(I(M)))
    return As + Qs
end
α = 0.1
N = 4
Q = Tridiagonal(fill(α, N-1), [-α; fill(-2α, N-2); -α], fill(α, N-1))
A = sparse([-0.1  0.1
             0.2 -0.2])
M = size(A,1)
L = markov_chain_product(Q, A)
L |> Matrix # display as a dense matrix
This provides the combined Markov chain for the (𝑖, 𝑗) process. To see the sparsity pattern,
Out[42]: (figure: spy plot of the sparsity pattern of L)
To calculate a simple dynamic valuation, suppose that the payoff of being in state (𝑖, 𝑗) is 𝑟𝑖𝑗 = 𝑖 + 2𝑗 (constructed below).
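The cell constructing that payoff vector was lost; a minimal sketch, enumerating (𝑖, 𝑗) in the same order as the product above (the i index varying fastest):

r = [i + 2.0 * j for i in 1:N, j in 1:M]
r = vec(r)  # flatten to a vector, i varying fastest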
In [44]: ρ = 0.05
v = (ρ * I - L) \ r
reshape(v, N, M)
The stationary distribution is again the eigenvector of the adjoint 𝐿′ associated with the 𝜆 = 0 eigenvalue, normalized to sum to one

L_eig = eigen(Matrix(L'))  # assumption: dense eigendecomposition, as in the one-dimensional case
ψ = L_eig.vectors[:,end]
ψ = ψ / sum(ψ)
22.4.2 Irreducibility
As with the discrete-time Markov chains, a key question is whether CTMCs are reducible,
i.e., whether states communicate. The problem is isomorphic to determining whether the di-
rected graph of the Markov chain is strongly connected.
We can verify that it is possible to move between every pair of states in a finite number of steps by checking that the directed graph of the chain is strongly connected

Q_graph = DiGraph(Q)  # assumption: LightGraphs.jl, with an edge wherever an intensity is nonzero
@show is_strongly_connected(Q_graph);  # i.e., we can travel between every state

is_strongly_connected(Q_graph) = true
Alternatively, consider a reducible Markov chain, where states 1 and 2 cannot jump to state 3:
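A hypothetical generator with that property (no intensities from states 1 or 2 into state 3):

Q_reducible = sparse([-0.1  0.1  0.0
                       0.2 -0.2  0.0
                       0.0  0.2 -0.2])
Q_graph = DiGraph(Q_reducible)
@show is_strongly_connected(Q_graph);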
is_strongly_connected(Q_graph) = false
22.5 Banded Matrices

A tridiagonal matrix has 3 non-zero diagonals: the main diagonal, the first sub-diagonal (i.e., below the main diagonal), and the first super-diagonal (i.e., above the main diagonal).
This is a special case of a more general type called a banded matrix, where the number of
sub- and super-diagonals can be greater than 1. The total width of main-, sub-, and super-
diagonals is called the bandwidth. For example, a tridiagonal matrix has a bandwidth of 3.
An 𝑁 × 𝑁 banded matrix with bandwidth 𝑃 has about 𝑁 𝑃 nonzeros in its sparsity pattern.
These can be created directly as a dense matrix with diagm. For example, with a bandwidth
of three and a zero diagonal,
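A sketch of the call (diagm takes offset => diagonal pairs; the values here are illustrative):

diagm(1 => [1, 2, 3], -1 => [4, 5, 6])  # 4×4 dense matrix with a zero main diagonal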
Or as a sparse matrix,
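spdiagm has the same interface but stores only the nonzeros:

spdiagm(1 => [1, 2, 3], -1 => [4, 5, 6])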
And, of course, specialized algorithms will be used to exploit the structure when solving linear systems. In particular, the complexity of a solve is 𝑂(𝑁 𝑃𝐿 𝑃𝑈) for lower and upper bandwidths 𝑃𝐿 and 𝑃𝑈.
22.6 Implementation Details and Performance

Recall the famous quote from Knuth: “97% of the time, premature optimization is the root
of all evil. Yet we should not pass up our opportunities in that critical 3%.” The most com-
mon example of premature optimization is trying to use your own mental model of a com-
piler while writing your code, worried about the efficiency of code, and (usually incorrectly)
second-guessing the compiler.
Concretely, the lessons in this section are:
1. Don’t worry about optimizing your code unless you need to. Code clarity is your first-
order concern.
2. If you use other people’s packages, they can worry about performance and you don’t
need to.
3. If you absolutely need that “critical 3%,” your intuition about performance is usually
wrong on modern CPUs and GPUs, so let the compiler do its job.
4. Benchmarking (e.g., @btime) and profiling are the tools to figure out performance bot-
tlenecks. If 99% of computing time is spent in one small function, then there is no point
in optimizing anything else.
5. If you benchmark to show that a particular part of the code is an issue, and you can’t
find another library that does a better job, then you can worry about performance.
Numerical analysts sometimes refer to the lowest level of code for basic operations (e.g., a dot
product, matrix-matrix product, convolutions) as kernels.
That sort of code is difficult to write, and performance depends on the characteristics of the
underlying hardware, such as the instruction set available on the particular CPU, the size of
the CPU cache, and the layout of arrays in memory.
Typically, these operations are written in a BLAS library, organized into different levels. The levels roughly correspond to the computational order of the operations: BLAS Level 1 are 𝑂(𝑁) operations such as dot products and vector addition, Level 2 are 𝑂(𝑁²) operations such as matrix-vector products, and Level 3 are roughly 𝑂(𝑁³) operations, such as general matrix-matrix products.
An example of a BLAS library is OpenBLAS, which is used by default in Julia, or the Intel
MKL, which is used in Matlab (and in Julia if the MKL.jl package is installed).
On top of BLAS are LAPACK operations, which are higher-level kernels, such as matrix fac-
torizations and eigenvalue algorithms, and are often in the same libraries (e.g., MKL has both
BLAS and LAPACK functionality).
The details of these packages are not especially relevant, but if you are talking about perfor-
mance, people will inevitably start discussing these different packages and kernels. There are
a few important things to keep in mind:
1. Leave writing kernels to the experts. Even simple-sounding algorithms can be very com-
plicated to implement with high performance.
2. Your intuition about performance of code is probably going to be wrong. If you use
high quality libraries rather than writing your own kernels, you don’t need to use your
intuition.
3. Don’t get distracted by the jargon or acronyms above if you are reading about perfor-
mance.
There is a practical performance issue which may influence your code. Since memory in a CPU is linear, dense matrices need to be stored by either stacking columns (called column-major order) or stacking rows (called row-major order).
The reason this matters is that compilers can generate better performance if they work in
contiguous chunks of memory, and this becomes especially important with large matrices due
to the interaction with the CPU cache. Choosing the wrong order when there is no benefit
in code clarity is an example of premature pessimization. The performance difference can be
orders of magnitude in some cases, and nothing in others.
One option is to use functions that let Julia choose the most efficient way to traverse memory: eachindex and enumerate will pick the appropriate iteration order for a given array. If you need to choose the looping order yourself, you might want to experiment with going through columns first versus going through rows first.
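For instance, a sketch of a column-major traversal, with the inner loop running down each column so that memory access is contiguous:

function col_major_sum(A)
    s = 0.0
    for j in axes(A, 2)      # columns in the outer loop
        for i in axes(A, 1)  # rows in the inner loop: contiguous in memory
            s += A[i, j]
        end
    end
    return s
end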
Julia, Fortran, and Matlab all use column-major order, while C/C++ and Python use row-
major order. This means that if you find an algorithm written for C/C++/Python, you will
sometimes need to make small changes if performance is an issue.
While we have usually not considered optimizing code for performance (and have focused
on the choice of algorithms instead), when matrices and vectors become large we need to be
more careful.
The most important thing to avoid is excess allocations, which usually occur due to the use of temporary vectors and matrices when they are not necessary. Sometimes those extra temporary values can cause enormous degradations in performance.

However, caution is suggested, since excess allocations are never relevant for scalar values, and allocations can even produce faster code for smaller matrices/vectors, since they can lead to better cache locality.
To see this, a convenient tool is the @btime macro from the BenchmarkTools.jl package.
The ! on the f! is an informal way to say that the function is mutating, and the first argu-
ment (C here) is by convention the modified variable.
In the f! function, notice that the D is a temporary variable which is created, and then mod-
ified afterwards. But notice that since C is modified directly, there is no need to create the
temporary D matrix.
This is an example of where an in-place version of the matrix multiplication can help avoid
the allocation.
Note that in the output of the benchmarking, the f2! is non-allocating and is using the pre-
allocated C variable directly.
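The cells defining these functions did not survive extraction; a minimal sketch, assuming BenchmarkTools.jl and the names f! and f2! referenced above:

using BenchmarkTools, LinearAlgebra
A, B = rand(10, 10), rand(10, 10)
C = similar(A)
function f!(C, A, B)
    D = A * B     # allocates a temporary matrix D
    C .= D .+ 1
end
function f2!(C, A, B)
    mul!(C, A, B) # in-place multiplication into the pre-allocated C
    C .+= 1
end
@btime f!($C, $A, $B);
@btime f2!($C, $A, $B);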
Another example of this is solutions to linear equations, where for large solutions you may
pre-allocate and reuse the solution vector.
In [57]: A = rand(10,10)
y = rand(10)
z = A \ y # creates temporary
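A sketch of the pre-allocated alternative (ldiv! requires a factorization object):

Af = factorize(A)
z2 = similar(z)   # pre-allocate and reuse the solution vector
ldiv!(z2, Af, y)  # in-place left-divide, writing the solution into z2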
However, if you benchmark carefully, you will see that this is sometimes slower. Avoiding al-
locations is not always a good idea - and worrying about it prior to benchmarking is prema-
ture optimization.
There are a variety of other non-allocating versions of functions. For example,
In [58]: A = rand(10,10)
B = similar(A)
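One such function is transpose!, which fills a pre-allocated output:

transpose!(B, A)  # a non-allocating version of B = transpose(A)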
Finally, a common source of unnecessary allocations is taking slices or portions of matrices. For example, the following allocates a new vector B and copies the values.

In [59]: A = rand(5,5)
B = A[2,:]  # extract a vector (allocates a copy)
A[2, 1] = 100.0
@show A[2,1]
@show B[1];

A[2, 1] = 100.0
B[1] = 0.07265755245781103

Notice that the change to A does not affect the copied B.
Instead of allocating a new vector, you can take a view of the matrix with the @view macro, which provides an appropriate AbstractArray type that doesn't allocate new memory.
In [61]: A = rand(5,5)
B = @view A[2,:] # does not copy the data
A[2,1] = 100.0
@show A[2,1]
@show B[1];
A[2, 1] = 100.0
B[1] = 100.0
But again, you will often find that doing @view leads to slower code. Benchmark instead,
and generally rely on it for large matrices and for contiguous chunks of memory (e.g.,
columns rather than rows).
22.7 Exercises
22.7.1 Exercise 1
This exercise is for practice on writing low-level routines (i.e., “kernels”), and to hopefully
convince you to leave low-level code to the experts.
The formula for matrix multiplication is deceptively simple. For example, with the product of
square matrices 𝐶 = 𝐴𝐵 of size 𝑁 × 𝑁 , the 𝑖, 𝑗 element of 𝐶 is
$$C_{ij} = \sum_{k=1}^{N} A_{ik} B_{kj}$$
Alternatively, you can take a row 𝐴𝑖,∶ and column 𝐵∶,𝑗 and use an inner product, 𝐶𝑖𝑗 = 𝐴𝑖,∶ ⋅ 𝐵∶,𝑗. Note that the inner product in a discrete space is simply a sum, and hence has the same complexity as the sum (i.e., 𝑂(𝑁) operations).
For a dense matrix without any structure and using a naive multiplication algorithm, this
also makes it clear why the complexity is 𝑂(𝑁 3 ): You need to evaluate it for 𝑁 2 elements in
the matrix and do an 𝑂(𝑁 ) operation each time.
For this exercise, implement matrix multiplication yourself and compare performance in a few
permutations.
1. Use the built-in function in Julia (i.e., C = A * B, or, for a better comparison, the in-
place version mul!(C, A, B), which works with pre-allocated data).
2. Loop over each 𝐶𝑖𝑗 by the row first (i.e., the i index) and use a for loop for the inner
product.
3. Loop over each 𝐶𝑖𝑗 by the column first (i.e., the j index) and use a for loop for the
inner product.
4. Do the same but use the dot product instead of the sum.
5. Choose your best implementation of these, and then for matrices of a few different sizes
(N=10, N=1000, etc.), and compare the ratio of performance of your best implementa-
tion to the built-in BLAS library.
22.7.2 Exercise 2a
Here we will calculate the evolution of the pdf of a discrete-time Markov chain, 𝜓𝑡 , given the
initial condition 𝜓0 .
1. Start with a simple symmetric tridiagonal matrix
In [62]: N = 100
A = Tridiagonal([fill(0.1, N-2); 0.2], fill(0.8, N), [0.2; fill(0.1, N-2)])
A_adjoint = A';
2. Write code to calculate 𝜓𝑡 up to some 𝑇 by iterating the map for each 𝑡, i.e.,
𝜓𝑡+1 = 𝐴′ 𝜓𝑡
1. What is the computational order of calculating 𝜓𝑇 using this iteration approach, for 𝑇 < 𝑁?
2. What is the computational order of (𝐴′ )𝑇 = (𝐴′ … 𝐴′ ) and then 𝜓𝑇 = (𝐴′ )𝑇 𝜓0 for
𝑇 < 𝑁?
3. Benchmark calculating 𝜓𝑇 with the iterative calculation above as well as the direct 𝜓𝑇 = (𝐴′)𝑇𝜓0 to see which is faster. You can take the matrix power with just A_adjoint^T, which uses specialized algorithms that are faster and more accurate than repeated matrix multiplication (but with the same computational order).
Note: The algorithm used in Julia to take matrix powers depends on the matrix structure,
as always. In the symmetric case, it can use an eigendecomposition, whereas with a general
dense matrix it uses squaring and scaling.
22.7.3 Exercise 2b
With the same setup as in Exercise 2a, do an eigendecomposition of A_adjoint. That is, use eigen to factor the adjoint 𝐴′ = 𝑄Λ𝑄⁻¹, where 𝑄 is the matrix of eigenvectors and Λ is the diagonal matrix of eigenvalues. Calculate 𝑄⁻¹ from the results.
Use the factored matrix to calculate the sequence of 𝜓𝑡 = (𝐴′ )𝑡 𝜓0 using the relationship
𝜓𝑡 = 𝑄Λ𝑡 𝑄−1 𝜓0
where matrix powers of diagonal matrices are simply the element-wise power of each element.
Benchmark the speed of calculating the sequence of 𝜓𝑡 up to T = 2N using this method. In
principle, the factorization and easy calculation of the power should give you benefits, com-
pared to simply iterating the map as we did in Exercise 2a. Explain why it does or does not,
using computational order of each approach.
Chapter 23

Krylov Methods and Matrix Conditioning
23.1 Contents
• Overview 23.2
• Ill-Conditioned Matrices 23.3
• Stationary Iterative Algorithms for Linear Systems 23.4
• Krylov Methods 23.5
• Iterative Methods for Linear Least Squares 23.6
• Iterative Methods for Eigensystems 23.7
• Krylov Methods for Markov-Chain Dynamics 23.8
23.2 Overview
This lecture takes the structure of numerical methods for linear algebra and builds further
toward working with large, sparse matrices. In the process, we will examine foundational nu-
merical analysis such as ill-conditioned matrices.
23.2.1 Setup
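The setup cell was lost in extraction; a hypothetical version, following the conventions of the earlier lectures:

using LinearAlgebra, Statistics, SparseArrays, Random
using BenchmarkTools, IterativeSolvers, LinearMaps, Arpack
Random.seed!(42)  # make the random draws below reproducible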
23.2.2 Applications
1. Solving a linear system for a square 𝐴, where we will maintain throughout that there is a unique solution to

𝐴𝑥 = 𝑏

2. Linear least-squares solutions, for a rectangular 𝐴
From theory, we know that if 𝐴 has linearly independent columns, then the solution of the normal equations is

𝑥 = (𝐴′𝐴)⁻¹𝐴′𝑏
3. In the case of a square matrix 𝐴, the eigenvalue problem is that of finding 𝑥 and 𝜆 such that
𝐴𝑥 = 𝜆𝑥
For eigenvalue problems, keep in mind that you do not always require all of the 𝜆, and some-
times the largest (or smallest) would be enough. For example, calculating the spectral radius
requires only the eigenvalue with maximum absolute value.
Moreover, the convergence speed of many iterative methods is based on the spectral properties of the matrices (e.g., the basis formed by the eigenvectors), and hence ill-conditioned systems can converge slowly.
23.3 Ill-Conditioned Matrices

A key tool for this analysis is the condition number of a matrix,

𝜅(𝐴) ≡ ‖𝐴‖‖𝐴⁻¹‖
where you can use the submultiplicative property of the norm (i.e., ‖𝐴𝐵‖ ≤ ‖𝐴‖‖𝐵‖ with ‖𝐼‖ = 1) to show that 𝜅(𝐴) ≥ 1. While the condition number can be calculated with any norm, we will focus on the 2-norm.
First, a warning on calculations: Calculating the condition number for a matrix can be an
expensive operation (as would calculating a determinant) and should be thought of as roughly
equivalent to doing an eigendecomposition. So use it for detective work judiciously.
Let’s look at the condition number of a few matrices using the cond function (which allows a
choice of the norm, but we’ll stick with the default 2-norm).
In [3]: A = I(2)
cond(A)
Out[3]: 1.0
Here we see an example of the best-conditioned matrix, the identity matrix with its com-
pletely orthonormal basis, which has a condition number of 1.
On the other hand, notice that
In [4]: ϵ = 1E-6
A = [1.0 0.0
1.0 ϵ]
cond(A)
Out[4]: 2.0000000000005004e6
has a condition number of order 1E6 - and hence (taking the base-10 log) you should expect to lose about 6 significant digits of precision if you are not careful. For example, note that the inverse contains both extremely large positive and negative numbers
In [5]: inv(A)
Since we know that the determinant of nearly collinear matrices is close to zero, this shows
another symptom of poor conditioning
In [6]: det(A)
Out[6]: 1.0e-6
However, be careful since the determinant has a scale, while the condition number is dimen-
sionless. That is,
det(1000A) = 1.0
cond(1000A) = 2.0000000000005001e6
In that case, the determinant of 1000 𝐴 is 1, while the condition number is unchanged. This example also provides some intuition that ill-conditioned matrices typically occur when a matrix has radically different scales (e.g., contains both 1 and 1E-6, or 1000 and 1E-3). This can occur frequently with both function approximation and linear least squares.
Multiplying a matrix by a constant does not change the condition number. What about other
operations?
For this example, we see that the inverse has the same condition number (though this will not
always be the case).
cond(A) = 2.0000000000005004e6
cond(inv(A)) = 2.0000000002463197e6
The condition number of the product of two matrices can change radically, and can make the result even more ill-conditioned.
This comes up frequently when calculating the product of a matrix and its transpose (e.g.,
forming the covariance matrix). A classic example is the Läuchli matrix.
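The lauchli helper used below is not part of a standard library; a minimal sketch (written so that, as the later cell expects, lauchli(N, ϵ)' is the usual (𝑁+1) × 𝑁 Läuchli form of a ones row stacked on ϵ𝐼):

lauchli(N, ϵ) = [ones(N)'; ϵ * I(N)]'  # hypothetical reconstruction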
cond(L) = 1.732050807568878e8
cond(L' * L) = 5.345191558726545e32
You can show that the analytic eigenvalues of 𝐿′𝐿 are {3 + 𝜖², 𝜖², 𝜖²}, but the poor conditioning means it is difficult to distinguish the small ones from 0.
This comes up when conducting Principal Component Analysis, which requires calculations of
the eigenvalues of the covariance matrix
Note that these are significantly different than the known analytic solution and, in particular,
are difficult to distinguish from 0.
Alternatively, we could calculate these by taking the square of the singular values of 𝐿 itself,
which is much more accurate and lets us clearly distinguish from zero
Similarly, we are better off calculating least squares directly rather than forming the normal
equation (i.e., 𝐴′ 𝐴𝑥 = 𝐴′ 𝑏) ourselves
In [14]: N = 3
A = lauchli(N, 1E-7)' |> Matrix
b = rand(N+1)
x_sol_1 = A \ b # using a least-squares solver
x_sol_2 = (A' * A) \ (A' * b) # forming the normal equation ourselves
norm(x_sol_1 - x_sol_2)
Out[14]: 2502.05373776057
The same problem arises with polynomial interpolation using a monomial basis. Consider fitting the degree-𝑁 polynomial

$$P(x) = \sum_{i=0}^{N} c_i x^i$$

through 𝑁 + 1 data points (𝑥0, 𝑦0), … , (𝑥𝑁, 𝑦𝑁).
To solve for the coefficients, we note that this is a simple system of equations

$$\begin{aligned} y_0 &= c_0 + c_1 x_0 + \ldots + c_N x_0^N \\ &\;\;\vdots \\ y_N &= c_0 + c_1 x_N + \ldots + c_N x_N^N \end{aligned}$$

Or, stacking the 𝑐𝑖 into a vector 𝑐 and the 𝑦𝑖 into a vector 𝑦, with

$$A = \begin{bmatrix} 1 & x_0 & x_0^2 & \ldots & x_0^N \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_N & x_N^2 & \ldots & x_N^N \end{bmatrix}$$

the system becomes

𝐴𝑐 = 𝑦
In [15]: N = 5
f(x) = exp(x)
x = range(0.0, 10.0, length = N+1)
y = f.(x)  # generate some data to interpolate
A = [x_i^n for x_i in x, n in 0:N]  # the Vandermonde matrix
c = inv(A) * y  # solve using the inverse (deliberately; see the discussion below)
norm(A * c - f.(x), Inf)
Out[15]: 1.356966095045209e-9
The final step just checks the interpolation vs. the analytic function at the nodes. Keep in
mind that this should be very close to zero since we are interpolating the function precisely
at those nodes. In our example, the Inf-norm (i.e., maximum difference) of the interpolation
errors at the nodes is around 1E-9, which is reasonable for many problems.
But note that with 𝑁 = 5 the condition number is already of order 1E6.
In [16]: cond(A)
Out[16]: 564652.3214053963
What if we increase the degree of the polynomial with the hope of increasing the precision of
the interpolation?
In [17]: N = 10
f(x) = exp(x)
x = range(0.0, 10.0, length = N+1)
y = f.(x)  # generate some data to interpolate
A = [x_i^n for x_i in x, n in 0:N]
c = inv(A) * y
norm(A * c - f.(x), Inf)
Out[17]: 8.61171429278329e-7
Here, we see that hoping to increase the precision between points by adding extra polynomial terms is backfiring. By going to a 10th-order polynomial, we have introduced an error of about 1E-6, even at the interpolation points themselves.
This blows up quickly
In [18]: N = 20
f(x) = exp(x)
x = range(0.0, 10.0, length = N+1)
y = f.(x)  # generate some data to interpolate
A = [x_i^n for x_i in x, n in 0:N]
c = inv(A) * y
norm(A * c - f.(x), Inf)
Out[18]: 19978.410967681375
To see the source of the problem, note that the condition number is astronomical.
In [19]: cond(A)
Out[19]: 2.0386741019186427e24
At this point, you should be suspicious of the use of inv(A), since we have considered solv-
ing linear systems by taking the inverse as verboten. Indeed, this made things much worse.
The error drops dramatically if we solve it as a linear system
In [20]: c = A \ y
norm(A * c - f.(x), Inf)
Out[20]: 1.864464138634503e-10
But an error of 1E-10 at the interpolating nodes themselves can be a problem in many ap-
plications, and if you increase N then the error will become non-trivial eventually - even with-
out taking the inverse.
The heart of the issue is that the monomial basis leads to a Vandermonde matrix, which is
especially ill-conditioned.
The monomial basis is also a good opportunity to look at a separate type of error due to
Runge’s Phenomenon. It is an important issue in approximation theory, albeit not one driven
by numerical approximation errors.
It turns out that using a uniform grid of points is, in general, the worst possible choice of interpolation nodes for a polynomial approximation. This phenomenon can be seen with the interpolation of the seemingly innocuous Runge's function, 𝑔(𝑥) = 1/(1 + 25𝑥²).
Let's calculate the interpolation with a monomial basis to find the 𝑐𝑖 such that

$$\frac{1}{1 + 25 x^2} \approx \sum_{i=0}^{N} c_i x^i, \quad \text{for } -1 \leq x \leq 1$$
First, interpolate with 𝑁 = 5. As long as we avoid taking an inverse, the numerical errors from the ill-conditioned matrix are manageable.

In [21]: g(x) = 1 / (1 + 25x^2)  # Runge's function

# interpolation
N = 5
x = range(-1.0, 1.0, length = N+1)
y = g.(x)
A_5 = [x_i^n for x_i in x, n in 0:N]
c_5 = A_5 \ y
Out[21]: (figure: 𝑔(𝑥) and its 5th-order polynomial interpolation 𝑃5(𝑥))
Note that while the function, 𝑔(𝑥), and the approximation with a 5th-order polynomial,
𝑃5 (𝑥), coincide at the 6 nodes, the approximation has a great deal of error everywhere else.
The oscillations near the boundaries are the hallmarks of Runge’s Phenomenon. You might
guess that increasing the number of grid points and the order of the polynomial will lead to
better approximations:
In [22]: N = 9
x = range(-1.0, 1.0, length = N+1)
y = g.(x)
A_9 = [x_i^n for x_i in x, n in 0:N]
c_9 = A_9 \ y
Out[22]: (figure: 𝑔(𝑥) and its 9th-order polynomial interpolation 𝑃9(𝑥))
While the approximation is better near x=0, the oscillations near the boundaries have become
worse. Adding on extra polynomial terms will not globally increase the quality of the approxi-
mation.
We can minimize the numerical problems of an ill-conditioned basis matrix by choosing a dif-
ferent basis for the polynomials.
For example, Chebyshev polynomials form an orthonormal basis under an appropriate inner
product, and we can form precise high-order approximations, with very little numerical error
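The original cell was lost; a sketch of the idea, interpolating at Chebyshev nodes with the basis 𝑇𝑛(𝑥) = cos(𝑛 acos 𝑥):

N = 9
x = [cos((2k + 1) * π / (2N + 2)) for k in 0:N]  # Chebyshev nodes in (-1, 1)
A_cheb = [cos(n * acos(x_i)) for x_i in x, n in 0:N]  # Chebyshev basis matrix
c_cheb = A_cheb \ g.(x)
cond(A_cheb)  # orders of magnitude smaller than the Vandermonde condition numbers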
Out[23]: (figure: Chebyshev interpolation of Runge's function)

Besides the use of a different polynomial basis, we are approximating at different nodes (i.e., Chebyshev nodes). Interpolation with Chebyshev polynomials at the Chebyshev nodes ends up minimizing (but not eliminating) Runge's Phenomenon.
To summarize:
1. Check the condition number on systems you suspect might be ill-conditioned (based on
intuition of collinearity).
2. If you are working with ill-conditioned matrices, be especially careful not to take the
inverse or multiply by the transpose.
3. Avoid a monomial polynomial basis. Instead, use polynomials (e.g., Chebyshev or La-
grange) orthogonal under an appropriate inner product, or use a non-global basis such
as cubic splines.
4. If possible, avoid using a uniform grid for interpolation and approximation, and choose
nodes appropriate for the basis.
However, sometimes you can’t avoid ill-conditioned matrices. This is especially common with
discretization of PDEs and with linear least squares.
23.4 Stationary Iterative Algorithms for Linear Systems

As before, the baseline problem is the linear system

𝐴𝑥 = 𝑏
We will now focus on cases where 𝐴 is both massive (e.g., potentially millions of equations)
and sparse, and sometimes ill-conditioned - but where there is always a unique solution.
While this may seem excessive, it occurs in practice due to the curse of dimensionality, dis-
cretizations of PDEs, and when working with big data.
The methods in the previous lectures (e.g., factorization and approaches similar to Gaussian
elimination) are called direct methods, and are able in theory to converge to the exact solu-
tion in a finite number of steps while directly working with the matrix in memory.
Instead, iterative solutions start with a guess on a solution and iterate until convergence. The
benefit will be that each iteration uses a lower-order operation (e.g., an 𝑂(𝑁 2 ) matrix-vector
product) which will make it possible to
1. tackle much larger systems, even if the solutions are less precise.

2. define linear operators in terms of the matrix-vector products, rather than storing them as a matrix.

3. get approximate solutions in progress prior to the completion of all algorithm steps, unlike the direct methods, which provide a solution only at the end.
Of course, there is no free lunch, and the computational order of the iterations themselves
would be comparable to the direct methods for a given level of tolerance (e.g., 𝑂(𝑁 3 ) opera-
tions may be required to solve a dense unstructured system).
There are two types of iterative methods we will consider. The first type is stationary meth-
ods, which iterate on a map in a way that’s similar to fixed-point problems, and the second
type is Krylov methods, which iteratively solve using left-multiplications of the linear opera-
tor.
For our main examples, we will use the valuation of the continuous-time Markov chain from
the numerical methods for linear algebra lecture. That is, given a payoff vector 𝑟, a discount
rate 𝜌, and the infinitesimal generator of the Markov chain 𝑄, solve the equation
𝜌𝑣 = 𝑟 + 𝑄𝑣
With the sizes and types of matrices here, iterative methods are inappropriate in practice,
but they will help us understand the characteristics of convergence and how they relate to
matrix conditioning.
First, we will solve with a direct method, which will give the solution to machine precision.

In [24]: α = 0.1
N = 100
Q = Tridiagonal(fill(α, N-1), [-α; fill(-2α, N-2); -α], fill(α, N-1))
r = range(0.0, 10.0, length = N)  # assumption: the payoff vector, increasing in the state
ρ = 0.05

A = ρ * I - Q
v_direct = A \ r
mean(v_direct)
Out[24]: 100.00000000000004
Without proof, consider that given the discount rate of 𝜌 > 0, this problem could be set up
as a contraction for solving the Bellman equation through methods such as value-function
iteration.
The condition we will examine here is called diagonal dominance:

$$|A_{ii}| \geq \sum_{j \neq i} |A_{ij}| \quad \text{for all } i$$

That is, in every row, the diagonal element is weakly greater in absolute value than the sum of the absolute values of all the other elements in the row. In cases where it is strictly greater, we say that the matrix is strictly diagonally dominant.
With our example, given that 𝑄 is the infinitesimal generator of a Markov chain, we know
that each row sums to 0, and hence it is weakly diagonally dominant.
However, notice that when 𝜌 > 0, and since the diagonal of 𝑄 is negative, 𝐴 = 𝜌𝐼 − 𝑄 makes
the matrix strictly diagonally dominant.
For matrices that are strictly diagonally dominant, you can prove that a simple decompo-
sition and iteration procedure will converge.
To solve a system 𝐴𝑥 = 𝑏, split the matrix 𝐴 into its diagonal and off-diagonal elements.
That is,
𝐴=𝐷+𝑅
where

$$D = \begin{bmatrix} A_{11} & 0 & \ldots & 0 \\ 0 & A_{22} & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & A_{NN} \end{bmatrix}$$

and

$$R = \begin{bmatrix} 0 & A_{12} & \ldots & A_{1N} \\ A_{21} & 0 & \ldots & A_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ A_{N1} & A_{N2} & \ldots & 0 \end{bmatrix}$$
Substituting into 𝐴𝑥 = 𝑏 and rearranging,

𝐷𝑥 = 𝑏 − 𝑅𝑥
𝑥 = 𝐷⁻¹(𝑏 − 𝑅𝑥)
where, since 𝐷 is diagonal, its inverse is trivial to calculate with 𝑂(𝑁) complexity.

To solve, take an iterate 𝑥𝑘, starting from 𝑥0, and form a new guess with

𝑥𝑘+1 = 𝐷⁻¹(𝑏 − 𝑅𝑥𝑘)

The complexity here is 𝑂(𝑁²) for the matrix-vector product, and 𝑂(𝑁) for the vector subtraction and division.
The IterativeSolvers.jl package implements this method.
For our example, we start with a guess and solve for the value function and iterate
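A sketch of the missing cell, using the package's jacobi!:

In [25]: v = zeros(N)
jacobi!(v, A, r, maxiter = 40)
norm(v - v_direct, Inf)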
Out[25]: 0.022858373200932647
With this, after 40 iterations we see that the error is in the order of 1E-2
In practice, there are many methods that are better than Jacobi iteration. One example is Gauss-Seidel, which splits the matrix as 𝐴 = 𝐿 + 𝑈, where 𝐿 is lower triangular (including the diagonal) and 𝑈 is strictly upper triangular.
The iteration becomes
𝐿𝑥𝑘+1 = 𝑏 − 𝑈 𝑥𝑘
In that case, since the 𝐿 matrix is triangular, the system can be solved in 𝑂(𝑁 2 ) operations
after 𝑏 − 𝑈 𝑥𝑘 is formed
In [26]: v = zeros(N)
gauss_seidel!(v, A, r, maxiter = 40)
@show norm(v - v_direct, Inf);
The accuracy increases substantially. After 40 iterations, we see that the error is of the order
of 1E-5
Another example is Successive Over-relaxation (SOR), which takes a relaxation parameter 𝜔 > 1 and decomposes the matrix as 𝐴 = 𝐿 + 𝐷 + 𝑈, where 𝐿 and 𝑈 are strictly lower- and upper-triangular matrices and 𝐷 is diagonal.
Decompose the 𝐴 matrix, multiply the system by 𝜔, and rearrange to find

(𝐷 + 𝜔𝐿)𝑥𝑘+1 = 𝜔𝑏 − (𝜔𝑈 + (𝜔 − 1)𝐷)𝑥𝑘
In that case, 𝐷 + 𝜔𝐿 is a triangular matrix, and hence the linear solution is 𝑂(𝑁 2 ).
In [27]: v = zeros(N)
sor!(v, A, r, 1.1, maxiter = 40)
@show norm(v - v_direct, Inf);
The accuracy is now 1E-7. If we change the parameter to 𝜔 = 1.2, the accuracy further in-
creases to 1E-9.
This technique is common with iterative methods: Frequently, adding a damping or a relax-
ation parameter will counterintuitively speed up the convergence process.
Note: The stationary iterative methods are not always used directly, but are sometimes used
as a “smoothing” step (e.g., running 5-10 times) prior to using other Krylov methods.
23.5 Krylov Methods

A more commonly used set of iterative methods is based on Krylov subspaces, which involve iterating matrix-vector products 𝐴ᵏ𝑥 and orthogonalizing, to ensure that the resulting iterations are not too collinear.

The prototypical Krylov method is the Conjugate Gradient method, which requires the 𝐴 matrix to be symmetric and positive definite.
Solving an example:
In [28]: N = 100
A = sprand(100, 100, 0.1) # 10 percent non-zeros
A = A * A' # easy way to generate a symmetric positive-definite matrix
@show isposdef(A)
b = rand(N)
x_direct = A \ b # sparse direct solver more appropriate here
cond(Matrix(A * A'))
isposdef(A) = true
Out[28]: 3.5791585364800934e10
Notice that the condition numbers tend to be large for large random matrices.
Solving this system with the conjugate gradient method:
In [29]: x = zeros(N)
sol = cg!(x, A, b, log=true, maxiter = 1000)
sol[end]
If you tell a numerical analyst that you are using direct methods, their first question may be,
“which factorization?” But if you tell them you are using an iterative method, they may ask
“which preconditioner?”.
As discussed at the beginning of the lecture, the spectral properties of matrices determine the
rate of convergence of iterative methods. In particular, ill-conditioned matrices can converge
slowly with iterative methods, for the same reasons that naive value-function iteration will
converge slowly if the discount rate is close to 1.
Preconditioning solves this problem by adjusting the spectral properties of the matrix, at the
cost of some extra computational operations.
To see an example of a right-preconditioner, consider a matrix 𝑃 which has a convenient and
numerically stable inverse. Then
𝐴𝑥 = 𝑏
𝐴𝑃⁻¹𝑃𝑥 = 𝑏
𝐴𝑃⁻¹𝑦 = 𝑏
𝑃𝑥 = 𝑦
In [30]: AP = A * inv(Diagonal(A))
@show cond(Matrix(A))
@show cond(Matrix(AP));
cond(Matrix(A)) = 189186.6473381337
cond(Matrix(AP)) = 175174.59095330362
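Here the diagonal preconditioner barely moves the condition number. A more aggressive choice is an algebraic multigrid preconditioner; the AMGPreconditioner{RugeStuben} below is assumed to come from the Preconditioners.jl package.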
In [33]: x = zeros(N)
P = AMGPreconditioner{RugeStuben}(A)
sol = cg!(x, A, b, Pl = P, log=true, maxiter = 1000)
sol[end]
Note: Preconditioning is also available for stationary, iterative methods (see this example),
but is frequently not implemented since such methods are not often used for the complete
solution.
There are many algorithms which exploit matrix structure (e.g., the conjugate gradient
method for positive-definite matrices, and MINRES for matrices that are only symmet-
ric/Hermitian).
On the other hand, if there is no structure to a sparse matrix, then GMRES is a good ap-
proach.
To experiment with these methods, we will use our ill-conditioned interpolation problem with
a monomial basis.
In [34]: N = 10
f(x) = exp(x)
x = range(0.0, 10.0, length = N+1)
y = f.(x) # generate some data to interpolate
A = sparse([x_i^n for x_i in x, n in 0:N])
c = zeros(N+1) # initial guess required for iterative solutions
results = gmres!(c, A, y, log=true, maxiter = 1000)
println("cond(A) = $(cond(Matrix(A))), $(results[end]) Norm error $(norm(A*c - y, Inf))")
In [35]: N = 10
f(x) = exp(x)
x = range(0.0, 10.0, length = N+1)
y = f.(x) # generate some data to interpolate
A = [x_i^n for x_i in x, n in 0:N]
P = ilu(sparse(A), τ = 0.1)
c = zeros(N+1) # initial guess required for iterative solutions
results = gmres!(c, A, y, Pl = P,log=true, maxiter = 1000)
println("$(results[end]) Norm error $(norm(A*c - y, Inf))")
23.5.3 Matrix-Free Methods

First, let's use a Krylov method to solve our simple valuation problem
In [36]: α = 0.1
N = 100
Q = Tridiagonal(fill(α, N-1), [-α; fill(-2α, N-2); -α], fill(α, N-1))
A = ρ * I - Q
v = zeros(N)
results = gmres!(v, A, r, log=true)
v_sol = results[1]
println("$(results[end])")
While direct methods require the A matrix to be stored in memory, Krylov methods such as GMRES are built on matrix-vector products, i.e., 𝐴𝑥 for iterations on the 𝑥.
This product can be written directly for a given 𝑥,

$$A x = \begin{bmatrix} (\rho + \alpha) x_1 - \alpha x_2 \\ -\alpha x_1 + (\rho + 2\alpha) x_2 - \alpha x_3 \\ \vdots \\ -\alpha x_{N-2} + (\rho + 2\alpha) x_{N-1} - \alpha x_N \\ -\alpha x_{N-1} + (\rho + \alpha) x_N \end{bmatrix}$$
In [37]: A_mul(x) = [(ρ + α) * x[1] - α * x[2];  # reconstructed from the formula above
                     [-α * x[i-1] + (ρ + 2α) * x[i] - α * x[i+1] for i in 2:N-1];
                     -α * x[end-1] + (ρ + α) * x[end]]

x = rand(N)
@show norm(A * x - A_mul(x));  # compare to the matrix version
The final line verifies that the A_mul function provides the same result as the matrix multi-
plication with our original A for a random vector.
In abstract mathematics, a finite-dimensional linear operator is a mapping 𝐴 ∶ 𝑅𝑁 → 𝑅𝑁 that
satisfies a number of criteria such as 𝐴(𝑐1 𝑥1 + 𝑐2 𝑥2 ) = 𝑐1 𝐴𝑥1 + 𝑐2 𝐴𝑥2 for scalars 𝑐𝑖 and vectors
𝑥𝑖 .
Moving from abstract mathematics to generic programming, we can think of a linear operator
as a map that satisfies a number of requirements (e.g., it has a left-multiply to apply the map
*, an in-place left-multiply mul!, an associated size). A Julia matrix is just one possible
implementation of the abstract concept of a linear operator.
Convenience wrappers can provide some of the boilerplate which turns the A_mul function into something that behaves like a matrix. One package is LinearMaps.jl and another is LinearOperators.jl.

In [38]: A_map = LinearMap(A_mul, N, ishermitian=false, isposdef=false)  # a sketch: wrap the matrix-free product
Now, with the A_map object, we can fulfill many of the operations we would expect from a
matrix
In [39]: x = rand(N)
@show norm(A_map * x - A * x)
y = similar(x)
mul!(y, A_map, x) # in-place multiplication
@show norm(y - A * x)
@show size(A_map)
@show norm(Matrix(A_map) - A)
@show nnz(sparse(A_map));
norm(A_map * x - A * x) = 0.0
norm(y - A * x) = 0.0
size(A_map) = (100, 100)
norm(Matrix(A_map) - A) = 0.0
nnz(sparse(A_map)) = 298
Note: In the case of sparse(A_map) and Matrix(A_map), the code is using the left-
multiplication operator with N standard basis vectors to construct the full matrix. This
should be used only for testing purposes.
But notice that since the linear operator does not have indexing operations, it is not an array or a matrix.

In [40]: A_map isa AbstractArray  # a hypothetical version of the check
Out[40]: false
As long as algorithms using linear operators are written generically (e.g., using the matrix-vector * or mul! functions), and the argument types are not unnecessarily constrained to Matrix or AbstractArray when that isn't strictly necessary, the A_map type can work in places which would otherwise require a matrix.
For example, the Krylov methods in IterativeSolvers.jl are written for generic left-
multiplication
In [41]: results = gmres(A_map, r, log = true)  # Krylov method using the matrix-free type
println("$(results[end])")
These methods are typically not competitive with sparse, direct methods unless the problems
become very large. In that case, we often want to work with pre-allocated vectors. Instead of
using y = A * x for matrix-vector products, we would use the in-place mul!(y, A, x)
function. The wrappers for linear operators all support in-place non-allocating versions for
this purpose.
In [42]: A_mul!(y, x) = y .= A_mul(x)  # a sketch: in-place wrapper around A_mul
A_map_2 = LinearMap(A_mul!, N, ismutating = true)

v = zeros(N)
@show norm(A_map_2 * v - A * v)  # can still call with * and have it allocate
results = gmres!(v, A_map_2, r, log = true)
println("$(results[end])")

norm(A_map_2 * v - A * v) = 0.0
Converged after 20 iterations.
Finally, keep in mind that the linear operators can compose, so that 𝐴(𝑐1 𝑥) + 𝐵(𝑐2 𝑥) + 𝑥 =
(𝑐1 𝐴 + 𝑐2 𝐵 + 𝐼)𝑥 is well defined for any linear operators - just as it would be for matrices 𝐴, 𝐵
and scalars 𝑐1 , 𝑐2 .
For example, take 2𝐴𝑥 + 𝑥 = (2𝐴 + 𝐼)𝑥 ≡ 𝐵𝑥 as a new linear map,

In [43]: B = 2 * A_map + I  # a sketch; the composition itself performs no calculation
typeof(B)

Out[43]: LinearMaps.LinearCombination{Float64,Tuple{LinearMaps.CompositeMap{Float64,Tuple{LinearMaps.FunctionMap{Float64,typeof(A_mul),Nothing},LinearMaps.UniformScalingMap{Float64}}},LinearMaps.UniformScalingMap{Bool}}}
The wrappers, such as LinearMap wrappers, make this composition possible by keeping the
composition graph of the expression (i.e., LinearCombination) and implementing the left-
multiply recursively using the rules of linearity.
Another example is to solve the 𝜌𝑣 = 𝑟 + 𝑄𝑣 equation for 𝑣 by composing matrix-free methods for 𝑄, rather than creating the full 𝐴 = 𝜌𝐼 − 𝑄 operator, which we implemented as A_mul

In [44]: Q_mul(x) = [-α * x[1] + α * x[2];  # reconstructed from the tridiagonal Q above
                     [α * x[i-1] - 2α * x[i] + α * x[i+1] for i in 2:N-1];
                     α * x[end-1] - α * x[end];]
Q_map = LinearMap(Q_mul, N)
A_composed = ρ * I - Q_map # map composition, performs no calculations
@show norm(A - sparse(A_composed)) # test produces the same matrix
gmres(A_composed, r, log=true)[2]
In this example, the left-multiply of the A_composed used by gmres uses the left-multiply
of Q_map and I with the rules of linearity. The A_composed = ρ * I - Q_map opera-
tion simply creates the LinearMaps.LinearCombination type, and doesn’t perform any
calculations on its own.
23.6 Iterative Methods for Linear Least Squares

In theory, the solution to the least-squares problem, min𝑥 ‖𝐴𝑥 − 𝑏‖², is simply the solution to the normal equations (𝐴′𝐴)𝑥 = 𝐴′𝑏.
We saw, however, that in practice, direct methods use a QR decomposition - in part because
an ill-conditioned matrix 𝐴 becomes even worse when 𝐴′ 𝐴 is formed.
For large problems, we can also consider Krylov methods for solving the linear least-squares problem. One formulation is the LSMR algorithm, which solves the regularized problem

$$\min_x \|Ax - b\|^2 + \lambda^2 \|x\|^2$$
The purpose of the 𝜆 ≥ 0 parameter is to dampen the iteration process and/or regularize the solution. This isn't required, but can help convergence for ill-conditioned matrices 𝐴. With the damping parameter, the normal equations become (𝐴′𝐴 + 𝜆²𝐼)𝑥 = 𝐴′𝑏.
We can compare solving the least-squares problem with LSMR and direct methods
In [45]: M = 1000
N = 10000
σ = 0.1
β = rand(M)
# simulate data
X = sprand(N, M, 0.1)
y = X * β + σ * randn(N)
β_direct = X \ y
results = lsmr(X, y, log = true)
β_lsmr = results[1]
@show norm(β_direct - β_lsmr)
println("$(results[end])")
Note that rather than forming this version of the normal equations, the LSMR algorithm uses
the 𝐴𝑥 and 𝐴′ 𝑦 (i.e., the matrix-vector product and the matrix-transpose vector product) to
implement an iterative solution. Unlike the previous versions, the left-multiply is insufficient
since the least squares also deals with the transpose of the operator. For this reason, in order
to use matrix-free methods, we need to define the A * x and transpose(A) * y functions
separately.
23.7 Iterative Methods for Eigensystems

When you use eigen on a dense matrix, it calculates an eigendecomposition and provides all the eigenvalues and eigenvectors.
While this is sometimes necessary, a spectral decomposition of a dense, unstructured matrix
is one of the costliest 𝑂(𝑁 3 ) operations (i.e., it has one of the largest constants). For large
matrices, it is often infeasible.
Luckily, we frequently need only a few eigenvectors/eigenvalues (in some cases just one),
which enables a different set of algorithms.
For example, in the case of a discrete-time Markov chain, in order to find the stationary dis-
tribution, we are looking for the eigenvector associated with the eigenvalue 1. As usual, a lit-
tle linear algebra goes a long way.
From the Perron-Frobenius theorem, the largest eigenvalue of an irreducible stochastic matrix
is 1 - the same eigenvalue we are looking for.
Iterative methods for solving eigensystems allow targeting the smallest magnitude, the largest
magnitude, and many others. The easiest library to use is Arpack.jl.
As an example,

In [47]: N = 1000  # reconstruction; N is consistent with mean(ϕ) = 1/N below
A = Tridiagonal([fill(0.1, N-2); 0.2], fill(0.8, N), [0.2; fill(0.1, N-2)])
A_adjoint = A'
λ, ϕ = eigs(A_adjoint, nev = 1, which = :LM, maxiter = 1000)  # assuming Arpack.jl
ϕ = real(ϕ) ./ sum(real(ϕ))
@show λ
@show mean(ϕ);
λ = Complex{Float64}[1.0000000000000189 + 0.0im]
mean(ϕ) = 0.0010000000000000002
Indeed, the λ is equal to 1. If we choose nev = 2, it will provide the eigenpairs with the two
eigenvalues of largest absolute value.
Hint: If you get errors using Arpack, increase the maxiter parameter for your problems.
Iterative methods for eigensystems rely on matrix-vector products rather than decompositions, and are amenable to matrix-free approaches. For example, take the Markov chain for a simple counting process, where:

1. The count is a state 𝑛 ∈ {1, … , 𝑁}.
2. For an interior count 1 < 𝑛 < 𝑁, the count increases by one with probability 𝜃 and decreases by one with probability 𝜁.
3. If the count is at 1, then the only transition is to add a count with probability 𝜃.
4. If the current count is 𝑁, then the only transition is to lose the count with probability 𝜁.
First, finding the transition matrix 𝑃 and its adjoint directly as a check
In [48]: θ = 0.1
ζ = 0.05
N = 5
P = Tridiagonal(fill(ζ, N-1), [1-θ; fill(1-θ-ζ, N-2); 1-ζ], fill(θ, N-1))
P'
Implementing the adjoint-vector product directly, and verifying that it gives the same matrix
as the adjoint
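The implementation did not survive extraction; a sketch, reading the coefficients off the columns of the 𝑃 above:

function P_T_mul!(dψ, ψ)
    dψ[1] = (1 - θ) * ψ[1] + ζ * ψ[2]
    for i in 2:N-1
        dψ[i] = θ * ψ[i-1] + (1 - θ - ζ) * ψ[i] + ζ * ψ[i+1]
    end
    dψ[N] = θ * ψ[N-1] + (1 - ζ) * ψ[N]
    return dψ
end
P_T = LinearMap(P_T_mul!, N, ismutating = true)
norm(sparse(P_T) - sparse(P'))  # the matrix-free product matches the adjoint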
Out[49]: 0.0
Finally, solving for the stationary distribution using the matrix-free method (which could be
verified against the decomposition approach of 𝑃 ′ )
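A sketch of that calculation, targeting the largest-magnitude eigenvalue with Arpack.jl as before:

λ, ϕ = eigs(P_T, nev = 1, which = :LM, maxiter = 1000)
ϕ = real(ϕ) ./ sum(real(ϕ))  # normalize into a probability distribution
@show λ
@show ϕ;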
λ = Complex{Float64}[1.0 + 0.0im]
ϕ = [0.03225806451612657; 0.06451612903225695; 0.1290322580645172;
0.25806451612903425; 0.516129032258065]
Of course, for a problem this simple, the direct eigendecomposition will be significantly faster.
Use matrix-free iterative methods only for large systems where you do not need all of the
eigenvalues.
23.8 Krylov Methods for Markov-Chain Dynamics

This example applies the methods in this lecture to a large continuous-time Markov chain, and provides some practice working with arrays of arbitrary dimensions.
Consider a version of the Markov-chain dynamics in [85], where a firm has a discrete number
of customers of different types. To keep things as simple as possible, assume that there are
𝑚 = 1, … 𝑀 types of customers and that the firm may have 𝑛 = 1, … 𝑁 customers of each
type.
To set the notation, let 𝑛𝑚 ∈ {1, … , 𝑁} be the number of customers of type 𝑚, so that the state of a firm is (𝑛1, … , 𝑛𝑚, … , 𝑛𝑀). The cardinality of the set of possible states is then 𝒩 ≡ 𝑁^𝑀, which can blow up quickly as the number of types increases.
The stochastic process is a simple counting/forgetting process, as follows:
1. For every 1 ≤ 𝑛𝑚 (𝑡) < 𝑁 , there is a 𝜃 intensity of arrival of a new customer, so that
𝑛𝑚 (𝑡 + Δ) = 𝑛𝑚 (𝑡) + 1.
2. For every 1 < 𝑛𝑚 (𝑡) ≤ 𝑁 , there is a 𝜁 intensity of losing a customer, so that 𝑛𝑚 (𝑡+Δ) =
𝑛𝑚 (𝑡) − 1.
The 𝑄 operator applied to a vector 𝑣 on this state space is then

$$\begin{aligned} Q_{(n_1,\ldots,n_M)} \cdot v ={}& \theta \sum_{m=1}^{M} \mathbb{1}(n_m < N)\, v(n_1, \ldots, n_m + 1, \ldots, n_M) \\ &+ \zeta \sum_{m=1}^{M} \mathbb{1}(1 < n_m)\, v(n_1, \ldots, n_m - 1, \ldots, n_M) \\ &- \left(\theta\, \mathrm{Count}(n_m < N) + \zeta\, \mathrm{Count}(n_m > 1)\right) v(n_1, \ldots, n_M) \end{aligned}$$
Here:
• the first term includes all of the arrivals of new customers into the various 𝑚
• the second term is the loss of a customer for the various 𝑚
• the last term is the intensity of all exits from this state (i.e., counting the intensity of all
other transitions, to ensure that the row will sum to 0)
In practice, rather than working with the 𝑓 as a multidimensional type, we will need to enu-
merate the discrete states linearly, so that we can iterate 𝑓 between 1 and N. An especially
convenient approach is to enumerate them in the same order as the 𝐾-dimensional Cartesian
product of the 𝑁 states in the multi-dimensional array above.
This can be done with the CartesianIndices function, which is used internally in Julia
for the eachindex function. For example,
In [51]: N = 2
M = 3
shape = Tuple(fill(N, M))
v = rand(shape...)
@show typeof(v)
for ind in CartesianIndices(v)
    println("v$(ind.I) = $(v[ind])")  # reconstructed loop body, matching the output below
end
typeof(v) = Array{Float64,3}
v(1, 1, 1) = 0.639089412234831
v(2, 1, 1) = 0.4302368488000152
v(1, 2, 1) = 0.21490768283644002
v(2, 2, 1) = 0.7542051014748841
v(1, 1, 2) = 0.4330861190374067
v(2, 1, 2) = 0.07556766967902084
v(1, 2, 2) = 0.2143739072351467
v(2, 2, 2) = 0.43231874437572815
The added benefit of this approach is that it will be the most efficient way to iterate through
vectors in the implementation.
For the counting process with arbitrary dimensions, we will frequently be incrementing or decrementing the 𝑚 unit vectors of the CartesianIndex type, as sketched below,
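A sketch, mirroring the e_m field of the named tuple defined further below:

e_m = [CartesianIndex((1:M .== i) * 1...) for i in 1:M]  # the M unit CartesianIndex vectors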
and then use the vector to increment. For example, if the current count is (1, 2, 2) and
we want to add a count of 1 to the first index and remove a count of 1 from the third index,
then
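For instance (a hypothetical illustration),

ind = CartesianIndex(1, 2, 2)
@show ind + e_m[1] - e_m[3];  # CartesianIndex(2, 2, 1)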
This works, of course, because the CartesianIndex type is written to support efficient addition and subtraction. Finally, to implement the operator, we need to count the indices in a state where increments and decrements can occur.
ind = CartesianIndex(1, 2, 2)
count(ind.I .> 1) = 2
count(ind.I .< N) = 1
With this, we are now able to write the 𝑄 operator on the 𝑓 vector, which is enumerated by the Cartesian indices. First, collect the parameters in a named tuple generator (a reconstruction, assuming the @with_kw macro from the Parameters.jl package; the defaults match the million-state example below):

using Parameters
default_params = @with_kw (θ = 0.1, ζ = 0.05, ρ = 0.05, N = 10, M = 6,
                           shape = Tuple(fill(N, M)),  # dimensions of the state array
                           e_m = [CartesianIndex((1:M .== i) * 1...) for i in 1:M])  # unit index vectors
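The Q_mul! implementation benchmarked below was also lost; a sketch consistent with the operator defined above (it accepts either shaped arrays or flat vectors):

function Q_mul!(dv, v, p)
    @unpack θ, ζ, N, M, shape, e_m = p
    v = reshape(v, shape)   # reinterpret flat vectors in the Cartesian enumeration
    dv = reshape(dv, shape)
    @inbounds for ind in CartesianIndices(v)
        val = -(θ * count(ind.I .< N) + ζ * count(ind.I .> 1)) * v[ind]  # exit intensities
        for m in 1:M
            if ind.I[m] < N
                val += θ * v[ind + e_m[m]]  # arrival of a type-m customer
            end
            if ind.I[m] > 1
                val += ζ * v[ind - e_m[m]]  # loss of a type-m customer
            end
        end
        dv[ind] = val
    end
    return dv
end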
p = default_params()
v = zeros(p.shape)
dv = similar(v)
@btime Q_mul!($dv, $v, $p)
From the output of the benchmarking, note that the implementation of the left-multiplication
takes less than 100 milliseconds, and allocates little or no memory, even though the Markov
chain has a million possible states (i.e., 𝑁 𝑀 = 106 ).
As before, we could use this Markov chain to solve a Bellman equation. Assume that the firm
discounts at rate 𝜌 > 0 and gets a flow payoff of a different 𝑧𝑚 per customer of type 𝑚. For
example, if the state of the firm is (𝑛1 , 𝑛2 , 𝑛3 ) = (2, 3, 2), then it gets [2 3 2] ⋅ [𝑧1 𝑧2 𝑧3 ]
in flow profits.
Given this profit function, we can write the simple Bellman equation in our standard form of 𝜌𝑣 = 𝑟 + 𝑄𝑣, defining the appropriate payoff 𝑟. For example, if 𝑧𝑚 = 𝑚², then (see the sketch below)
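A hypothetical sketch of such an r_vec, enumerating states in the same Cartesian order:

r_vec(p) = vec([float(sum(ind.I .* (1:p.M).^2)) for ind in CartesianIndices(p.shape)])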
typeof(r_vec(p)) = Array{Float64,1}
Out[57]: 250.25
Note that the returned 𝑟 is a vector, enumerated in the same order as the 𝑛𝑚 states.
Since the ordering of 𝑟 is consistent with that of 𝑄, we can solve (𝜌 − 𝑄)𝑣 = 𝑟 as a linear
system.
Below, we create a linear operator and compare a few different iterative methods (GMRES, BiCGStab(l), IDR(s), etc.) on a small problem of only 10,000 possible states.
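A sketch of this comparison, assuming the LinearMaps.jl and IterativeSolvers.jl interfaces used in this chapter (the benchmark harness and the choice of N and M yielding 10⁴ states are illustrative):

using LinearMaps, IterativeSolvers, SparseArrays, LinearAlgebra, BenchmarkTools

p = default_params(N = 10, M = 4)        # 10^4 possible states
Q = LinearMap((dv, v) -> Q_mul!(dv, v, p), p.N^p.M, ismutating = true)
A = p.ρ * LinearMap(I, p.N^p.M) - Q      # matrix-free (ρ - Q) operator
r = r_vec(p)
@btime gmres($A, $r)                     # GMRES
@btime bicgstabl($A, $r)                 # BiCGStab(l)
@btime idrs($A, $r)                      # IDR(s)
A_sparse = sparse(A)                     # materialize only for the direct solve
@btime $A_sparse \ $r                    # sparse LU/QR, for comparison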
Here, we see that even if the 𝐴 matrix has been created, the direct sparse solver (which uses a
sparse LU or QR) is at least an order of magnitude slower and allocates over an order of mag-
nitude more memory. This is in addition to the allocation for the A_sparse matrix itself,
which is not needed for iterative methods.
The different iterative methods have tradeoffs when it comes to accuracy, speed, convergence rate, memory requirements, and usefulness of preconditioning. Going much above 10⁴ possible states, the direct methods quickly become infeasible.
Putting everything together, we can solve much larger systems with GMRES as our linear solver.
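A hedged sketch of the full-scale solve (with the default parameters, 𝑁^𝑀 = 10⁶ states):

p = default_params()                      # N = 10, M = 6: a million states
Q = LinearMap((dv, v) -> Q_mul!(dv, v, p), p.N^p.M, ismutating = true)
A = p.ρ * LinearMap(I, p.N^p.M) - Q
@time v = gmres(A, r_vec(p))              # matrix-free GMRES solve of (ρ - Q) v = r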
This solves a value function with a Markov chain of a million states in a little over a second!
This general approach seems to scale roughly linearly. For example, try 𝑁 = 10, 𝑀 = 8 to
solve an equation with a Markov chain with 100 million possible states, which can be solved
in about 3-4 minutes. Above that order of magnitude, you may need to tinker with the lin-
ear solver parameters to ensure that you are not memory limited (e.g., change the restart
parameter of GMRES).
Recall that given an 𝑁-dimensional intensity matrix 𝑄 of a CTMC, the evolution of the pdf from an initial condition 𝜓(0) is the system of linear differential equations

$$\dot{\psi}(t) = Q^T \psi(t)$$
If 𝑄 is a matrix, we could just take its transpose to find the adjoint. However, with matrix-free methods, we need to implement the adjoint-vector product directly.
The logic for the adjoint is that, for a given 𝑛 = (𝑛1, … , 𝑛𝑚, … , 𝑛𝑀), the row of 𝑄^𝑇 corresponding to 𝑛 picks up terms from states that flow into 𝑛:
1. 1 < 𝑛𝑚 ≤ 𝑁 , entering into the identical 𝑛 except with one less customer in the 𝑚
position
2. 1 ≤ 𝑛𝑚 < 𝑁 , entering into the identical 𝑛 except with one more customer in the 𝑚
position
$$\begin{aligned} Q^T_{(n_1,\ldots,n_M)} \cdot \psi ={}& \theta \sum_{m=1}^{M} \mathbb{1}(n_m > 1)\, \psi(n_1, \ldots, n_m - 1, \ldots, n_M) \\ &+ \zeta \sum_{m=1}^{M} \mathbb{1}(n_m < N)\, \psi(n_1, \ldots, n_m + 1, \ldots, n_M) \\ &- \big(\theta \operatorname{Count}(n_m < N) + \zeta \operatorname{Count}(n_m > 1)\big)\, \psi(n_1, \ldots, n_M) \end{aligned}$$
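A sketch of the adjoint product, mirroring Q_mul! with the roles of the increment and decrement terms swapped (hedged: names and layout are assumptions, consistent with the calls below):

using Parameters

# sketch: matrix-free application of Q^T to ψ, writing the result into dψ
function Q_T_mul!(dψ, ψ, p)
    @unpack θ, ζ, N, M, shape, e_m = p
    ψ = reshape(ψ, shape)
    dψ = reshape(dψ, shape)
    @inbounds for ind in CartesianIndices(ψ)
        dψ[ind] = 0.0
        for m in 1:M
            n_m = ind[m]
            n_m > 1 && (dψ[ind] += θ * ψ[ind - e_m[m]])  # inflow: one fewer customer
            n_m < N && (dψ[ind] += ζ * ψ[ind + e_m[m]])  # inflow: one more customer
        end
        dψ[ind] -= (θ * count(ind.I .< N) + ζ * count(ind.I .> 1)) * ψ[ind]
    end
    return nothing
end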
The sparse function for the operator is useful for testing that the function is correct and that it is indeed the adjoint of our Q operator.
In [61]: p = default_params(N=5, M=4) # sparse is too slow for the full matrix
Q = LinearMap((df, f) -> Q_mul!(df, f, p), p.N^p.M, ismutating = true)
Q_T = LinearMap((dψ, ψ) -> Q_T_mul!(dψ, ψ, p), p.N^p.M, ismutating = true)
@show norm(sparse(Q)' - sparse(Q_T)); # reminder: use sparse only for testing!
As discussed previously, the steady state can be found as the eigenvector associated with the
zero eigenvalue (i.e., the one that solves 𝑄𝑇 𝜓 = 0𝜓). We could do this with a dense eigen-
value solution for relatively small matrices
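A hedged sketch of the dense computation, reusing the Q_T operator from the cell above (the normalization step is an assumption; direct_ψ is referenced again below):

using LinearAlgebra

Q_T_dense = Matrix(Q_T)              # materialize the full matrix (small N^M only)
λ, vecs = eigen(Q_T_dense)
_, index = findmin(abs.(λ))          # locate the eigenvalue closest to zero
direct_ψ = real(vecs[:, index])
direct_ψ = direct_ψ / sum(direct_ψ) # normalize to a probability distribution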
This approach relies on a full factorization of the underlying matrix, delivering the entire
spectrum. For our purposes, this is not necessary.
Instead, we could use the Arpack.jl package to target the eigenvalue of smallest absolute
value, which relies on an iterative method.
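A sketch of that approach (hedged; the starting vector mirrors the benchmarks below):

using Arpack, SparseArrays

Q_T_sparse = sparse(Q_T)   # note: building the sparse matrix is itself costly
λ, ϕ = eigs(Q_T_sparse, nev = 1, which = :SM, v0 = fill(1/(p.N^p.M), p.N^p.M))
ψ_arpack = real(ϕ[:, 1]) ./ sum(real(ϕ[:, 1]))  # normalize to a distribution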
A final approach in this case is to notice that, when the Markov chain is irreducible, the intensity matrix has rank one less than its dimension. The stationary solution is then a vector in the 1-dimensional nullspace of the matrix.
Using Krylov methods to solve a linear system with the right-hand side all 0 values will con-
verge to a point in the nullspace. That is, min𝑥 ||𝐴𝑥 − 0||2 solved iteratively from a non-zero
initial condition will converge to a point in the nullspace.
We can use various Krylov methods for this trick (e.g., if the matrix is symmetric and posi-
tive definite, we could use Conjugate Gradient) but in our case we will use GMRES since we
do not have any structure.
In [63]: p = default_params(N=5, M=4) # sparse is too slow for the full matrix
Q_T = LinearMap((dψ, ψ) -> Q_T_mul!(dψ, ψ, p), p.N^p.M, ismutating = true)
ψ = fill(1/(p.N^p.M), p.N^p.M) # can't use 0 as initial guess
sol = gmres!(ψ, Q_T, zeros(p.N^p.M)) # i.e., solve Ax = 0 iteratively
ψ = ψ / sum(ψ)
@show norm(ψ - direct_ψ);
The speed and memory differences between these methods can be orders of magnitude.
In [64]: p = default_params(N=4, M=4) # dense and sparse matrices are too slow for the full dataset
Q_T = LinearMap((dψ, ψ) -> Q_T_mul!(dψ, ψ, p), p.N^p.M, ismutating = true)
Q_T_dense = Matrix(Q_T)
Q_T_sparse = sparse(Q_T)
b = zeros(p.N^p.M)
@btime eigen($Q_T_dense)
@btime eigs($Q_T_sparse, nev=1, which=:SM, v0 = iv) setup = (iv = fill(1/(p.N^p.M), p.N^p.M))
@btime gmres!(iv, $Q_T, $b) setup = (iv = fill(1/(p.N^p.M), p.N^p.M));
The differences become even more stark as the matrix grows. With default_params(N=5,
M=5), the gmres solution is at least 3 orders of magnitude faster, and uses close to 3 orders
of magnitude less memory than the dense solver. In addition, the gmres solution is about an
order of magnitude faster than the iterative sparse eigenvalue solver.
The algorithm can solve for the steady state of 10⁵ states in a few seconds.
As a final demonstration, consider calculating the full evolution of the 𝜓(𝑡) Markov chain. For the constant 𝑄^𝑇 matrix, the solution to this system of equations is $\psi(t) = e^{Q^T t}\, \psi(0)$.
Matrix-free Krylov methods using a technique called exponential integration can solve this for
high-dimensional problems.
For this, we can set up a MatrixFreeOperator for our Q_T_mul! function (equivalent
to the LinearMap, but with some additional requirements for the ODE solver) and use the
LinearExponential time-stepping method.
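A sketch of this construction, assuming the MatrixFreeOperator type from DiffEqOperators.jl and the LinearExponential stepper from OrdinaryDiffEq.jl as used at the time these lectures were written (the opnorm bound and Krylov settings here are illustrative):

using OrdinaryDiffEq, DiffEqOperators, Parameters

function solve_transition_dynamics(p, t)
    @unpack N, M = p
    ψ_0 = [1.0; fill(0.0, N^M - 1)]  # all mass on the state with n_m = 1 for all m
    O! = MatrixFreeOperator((dψ, ψ, p, t) -> Q_T_mul!(dψ, ψ, p), (p, 0.0),
                            size = (N^M, N^M), opnorm = (p) -> 1.25)
    # solve dψ/dt = Q^T ψ via a Krylov approximation of the matrix exponential
    prob = ODEProblem(O!, ψ_0, (0.0, t[end]), p)
    solve(prob, LinearExponential(krylov = :simple, m = 30), tstops = t)
end

t = 0.0:20.0:100.0   # illustrative grid of dates
sol = solve_transition_dynamics(default_params(), t)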
Out[66]: (figure: the path of the expected valuation during the transition)
The above plot (1) calculates the full dynamics of the Markov chain from the 𝑛𝑚 = 1 for
all 𝑚 initial condition; (2) solves the dynamics of a system of a million ODEs; and (3) uses
the calculation of the Bellman equation to find the expected valuation during that transition.
The entire process takes less than 30 seconds.
Part IV
Dynamic Programming
Chapter 24
Shortest Paths
24.1 Contents
• Overview 24.2
• Outline of the Problem 24.3
• Finding Least-Cost Paths 24.4
• Solving for 𝐽 24.5
• Exercises 24.6
• Solutions 24.7
24.2 Overview
The shortest path problem is a classic problem in mathematics and computer science with
applications in
• Economics (sequential decision making, analysis of social networks, etc.)
• Operations research and transportation
• Robotics and artificial intelligence
• Telecommunication network design and routing
• etc., etc.
Variations of the methods we discuss in this lecture are used millions of times every day, in
applications such as
• Google Maps
• routing packets on the internet
For us, the shortest path problem also provides a nice introduction to the logic of dynamic
programming.
Dynamic programming is an extremely powerful optimization technique that we apply in
many lectures on this site.
The shortest path problem is one of finding how to traverse a graph from one specified node
to another at minimum cost.
(Figure: a small example graph with edge costs.) For this graph, one least-cost path is

• A, D, F, G at cost 8
Letting 𝐽(𝑣) denote the minimum cost-to-go from node 𝑣, the best path from 𝑣 is found by moving to any node 𝑤 that attains

$$\min_{w \in F_v} \{ c(v, w) + J(w) \}$$

where

• 𝐹𝑣 is the set of nodes that can be reached from 𝑣 in one step
• 𝑐(𝑣, 𝑤) is the cost of traveling from 𝑣 to 𝑤

Hence, if we know the function 𝐽, then finding the best path is almost trivial.

But how to find 𝐽?

Some thought will convince you that, for every node 𝑣, the function 𝐽 satisfies

$$J(v) = \min_{w \in F_v} \{ c(v, w) + J(w) \}$$

This is known as the Bellman equation, after the mathematician Richard Bellman.
The standard algorithm for computing 𝐽 is to iterate on the Bellman equation:

1. Set 𝑛 = 0 and 𝐽𝑛(𝑣) = 𝑀 for some large 𝑀 at every node 𝑣, with 𝐽𝑛(destination) = 0.
2. Set $J_{n+1}(v) = \min_{w \in F_v} \{ c(v, w) + J_n(w) \}$ for every 𝑣.
3. If 𝐽𝑛₊₁ = 𝐽𝑛, stop; otherwise, increment 𝑛 and return to step 2.
24.6 Exercises
24.6.1 Exercise 1
Use the algorithm given above to find the optimal path (and its cost) for the following graph.
24.6.2 Setup
In [3]: graph = Dict(zip(0:99, [[(14, 72.21), (8, 11.11), (1, 0.04)],
                                [(13, 64.94), (6, 20.59), …],
                                …]))
        # each entry maps node => [(destination node, cost), …];
        # the remaining adjacency data is truncated here
24.7 Solutions
24.7.1 Exercise 1
In [4]: function update_J!(J, graph)
            next_J = Dict()
            for node in keys(graph)
                if node == 99 # the destination
                    next_J[node] = 0
                else
                    next_J[node] = minimum(cost + J[dest] for (dest, cost) in graph[node])
                end
            end
            return next_J
        end

        function print_best_path(J, graph)
            sum_costs = 0.0
            current_location = 0
            while current_location != 99
                println("node $current_location")
                running_min = 1e10
                minimizer_dest = 0
                minimizer_cost = 1e10
                for (dest, cost) in graph[current_location]
                    if cost + J[dest] < running_min
                        running_min = cost + J[dest]
                        minimizer_cost = cost
                        minimizer_dest = dest
                    end
                end
                current_location = minimizer_dest
                sum_costs += minimizer_cost
            end
            println("node 99\nCost: $(round(sum_costs, digits = 2))")
        end

        J = Dict((node => Inf) for node in keys(graph))
        while true
            next_J = update_J!(J, graph)
            if next_J == J
                break
            else
                J = next_J
            end
        end
        print_best_path(J, graph)
node 0
node 8
node 11
node 18
node 23
node 33
node 41
node 53
node 56
node 57
node 60
node 67
node 70
node 73
node 76
node 85
node 87
node 88
node 93
node 94
node 96
node 97
node 98
node 99
Cost: 160.55
Chapter 25

Job Search I: The McCall Search Model
25.1 Contents
• Overview 25.2
• The McCall Model 25.3
• Computing the Optimal Policy: Take 1 25.4
• Computing the Optimal Policy: Take 2 25.5
• Exercises 25.6
• Solutions 25.7
25.2 Overview
The McCall search model [76] helped transform economists’ way of thinking about labor mar-
kets.
To clarify vague notions such as “involuntary” unemployment, McCall modeled the decision
problem of unemployed agents directly, in terms of factors such as
• current and likely future wages
• impatience
• unemployment compensation
To solve the decision problem he used dynamic programming.
Here we set up McCall’s model and adopt the same solution method.
As we’ll see, McCall’s model is not only interesting in its own right but also an excellent vehi-
cle for learning dynamic programming.
25.3 The McCall Model

An unemployed worker receives in each period a permanent job offer at wage 𝑊𝑡. At time 𝑡, our worker either

1. Accept the offer and work permanently at constant wage 𝑊𝑡, or
2. Reject the offer, receive unemployment compensation 𝑐, and reconsider next period.
The wage sequence {𝑊𝑡 } is assumed to be iid with probability mass function 𝑝1 , … , 𝑝𝑛 .
Here 𝑝𝑖 is the probability of observing wage offer 𝑊𝑡 = 𝑤𝑖 in the set 𝑤1 , … , 𝑤𝑛 .
The worker is infinitely lived and aims to maximize the expected discounted sum of earnings.
$$\mathbb{E} \sum_{t=0}^{\infty} \beta^t Y_t$$
In order to optimally trade off current and future rewards, we need to think about two things:

1. the current payoffs we get from different choices
2. the different states that those choices will lead to next period (in this case, either employment or unemployment)
To weigh these two aspects of the decision problem, we need to assign values to states.
To this end, let 𝑉 (𝑤) be the total lifetime value accruing to an unemployed worker who en-
ters the current period unemployed but with wage offer 𝑤 in hand.
More precisely, 𝑉 (𝑤) denotes the value of the objective function (1) when an agent in this
situation makes optimal decisions now and at all future points in time.
Of course 𝑉 (𝑤) is not trivial to calculate because we don’t yet know what decisions are opti-
mal and what aren’t!
But think of 𝑉 as a function that assigns to each possible wage 𝑤 the maximal lifetime value
that can be obtained with that offer in hand.
A crucial observation is that this function 𝑉 must satisfy the recursion
$$V(w) = \max \left\{ \frac{w}{1 - \beta},\; c + \beta \sum_{i=1}^{n} V(w_i) p_i \right\} \tag{1}$$
Here:

• the first term inside the max operation is the lifetime payoff from accepting the current offer 𝑤, since

$$w + \beta w + \beta^2 w + \cdots = \frac{w}{1 - \beta}$$
• the second term inside the max operation is the continuation value, which is the life-
time payoff from rejecting the current offer and then behaving optimally in all subse-
quent periods
If we optimize and pick the best of these two options, we obtain maximal lifetime value from
today, given current offer 𝑤.
But this is precisely 𝑉 (𝑤), which is the l.h.s. of (1).
Suppose for now that we are able to solve (1) for the unknown function 𝑉 .
Once we have this function in hand we can behave optimally (i.e., make the right choice be-
tween accept and reject).
All we have to do is select the maximal choice on the r.h.s. of (1).
The optimal action is best thought of as a policy, which is, in general, a map from states to
actions.
In our case, the state is the current wage offer 𝑤.
Given any 𝑤, we can read off the corresponding best choice (accept or reject) by picking the
max on the r.h.s. of (1).
Thus, we have a map from ℝ to {0, 1}, with 1 meaning accept and zero meaning reject.
We can write this policy as

$$\sigma(w) := \mathbb{1} \left\{ \frac{w}{1 - \beta} \geq c + \beta \sum_{i=1}^{n} V(w_i) p_i \right\}$$

which can in turn be expressed as

$$\sigma(w) := \mathbb{1} \{ w \geq \bar{w} \}$$
where
$$\bar{w} := (1 - \beta) \left\{ c + \beta \sum_{i=1}^{n} V(w_i) p_i \right\} \tag{2}$$
Here 𝑤̄ is a constant depending on 𝛽, 𝑐 and the wage distribution, called the reservation wage.
The agent should accept if and only if the current wage offer exceeds the reservation wage.
Clearly, we can compute this reservation wage if we can compute the value function.
To put the above ideas into action, we need to compute the value function at points
𝑤1 , … , 𝑤 𝑛 .
In doing so, we can identify these values with the vector 𝑣 = (𝑣𝑖 ) where 𝑣𝑖 ∶= 𝑉 (𝑤𝑖 ).
In view of (1), this vector satisfies the nonlinear system of equations
$$v_i = \max \left\{ \frac{w_i}{1 - \beta},\; c + \beta \sum_{j=1}^{n} v_j p_j \right\} \quad \text{for } i = 1, \ldots, n \tag{3}$$
It turns out that there is exactly one vector 𝑣 ∶= (𝑣𝑖 )𝑛𝑖=1 in ℝ𝑛 that satisfies this equation.
To compute it, we can use successive approximations:

Step 1: pick an arbitrary initial guess 𝑣 ∈ ℝⁿ.

Step 2: compute a new vector 𝑣′ ∈ ℝⁿ via

$$v_i' = \max \left\{ \frac{w_i}{1 - \beta},\; c + \beta \sum_{j=1}^{n} v_j p_j \right\} \quad \text{for } i = 1, \ldots, n \tag{4}$$
Step 3: calculate a measure of the deviation between 𝑣 and 𝑣′ , such as max𝑖 |𝑣𝑖 − 𝑣𝑖′ |.
Step 4: if the deviation is larger than some fixed tolerance, set 𝑣 = 𝑣′ and go to step 2, else
continue.
Step 5: return 𝑣.
This algorithm returns an arbitrarily good approximation to the true solution to (3), which
represents the value function.
(Arbitrarily good means here that the approximation converges to the true solution as the
tolerance goes to zero)
The algorithm above can be expressed via an operator 𝑇 mapping 𝑣 ∈ ℝⁿ into 𝑇𝑣 ∈ ℝⁿ, where

$$Tv_i = \max \left\{ \frac{w_i}{1 - \beta},\; c + \beta \sum_{j=1}^{n} v_j p_j \right\} \quad \text{for } i = 1, \ldots, n \tag{5}$$
(A new vector 𝑇 𝑣 is obtained from given vector 𝑣 by evaluating the r.h.s. at each 𝑖)
One can show that the conditions of the Banach contraction mapping theorem are satisfied by
𝑇 as a self-mapping on ℝ𝑛 .
One implication is that 𝑇 has a unique fixed point in ℝ𝑛 .
Moreover, it’s immediate from the definition of 𝑇 that this fixed point is precisely the value
function.
The iterative algorithm presented above corresponds to iterating with 𝑇 from some initial
guess 𝑣.
The Banach contraction mapping theorem tells us that this iterative process generates a se-
quence that converges to the fixed point.
25.4.3 Implementation
25.4.4 Setup
In [4]: n = 50
dist = BetaBinomial(n, 200, 100) # probability distribution
@show support(dist)
w = range(10.0, 60.0, length = n+1) # linearly space wages
using StatsPlots
plt = plot(w, pdf.(dist, support(dist)), xlabel = "wages", ylabel =�
↪"probabilities",
legend = false)
support(dist) = 0:50
Out[4]:
To implement our algorithm, let’s have a look at the sequence of approximate value functions
that this fixed point algorithm generates.
In [6]: c = 25
        β = 0.99
        num_plots = 6
        E = expectation(dist) # expectation operator over the wage distribution

        # Operator
        T(v) = max.(w/(1 - β), c + β * E*v) # (5) broadcasts over the w, fixes the v
        # alternatively, T(v) = [max(wval/(1 - β), c + β * E*v) for wval in w]

        # fill in matrix of vs
        vs = zeros(n + 1, 6) # data to fill
        vs[:, 1] .= w / (1-β) # initial guess of "accept all"
        for col in 2:num_plots
            vs[:, col] .= T(vs[:, col - 1]) # apply the operator to the previous iterate
        end
        plot(vs)
Out[6]:
One approach to solving the model is to directly implement this sort of iteration, continuing until the measured deviation between successive iterates is below tol.

In [7]: function compute_reservation_wage_direct(params; v_iv = collect(w ./ (1 - β)),
                                                 max_iter = 500, tol = 1e-6)
            @unpack c, β, w = params
            T(v) = max.(w/(1 - β), c + β * E*v) # (5) fixing the parameter values
            v = copy(v_iv) # copy to prevent modifying the v_iv argument
            v_next = similar(v)
            i = 0
            error = Inf
            while i < max_iter && error > tol
                v_next .= T(v) # (5)
                error = norm(v_next - v)
                i += 1
                v .= v_next # copy contents into v
            end
            # now compute the reservation wage
            return (1 - β) * (c + β * E*v) # (2)
        end
• If we had gone v = v_iv instead, then it would have simply created a new
name v which binds to whatever is located at v_iv.
• Since we use v .= v_next later in the algorithm, the values in it would be modified.
• Hence, we would be modifying the v_iv vector we were passed in, which may not be
what the caller of the function wanted.
• The big issue this creates is “side-effects”, where you can call a function and strange things can happen outside of the function that you didn’t expect.
• If you intended for the modification to potentially occur, then the Julia style guide says
that we should call the function compute_reservation_wage_direct! to make
the possible side-effects clear.
As usual, we are better off using a package, which may provide a better algorithm and is likely to be less error prone.
In this case, we can use the fixedpoint algorithm discussed in our Julia by Example lec-
ture to find the fixed point of the 𝑇 operator.
In [9]: function compute_reservation_wage(params; v_iv = collect(w ./ (1 - β)), iterations = 500,
                                          ftol = 1e-6, m = 6)
            @unpack c, β, w = params
            T(v) = max.(w/(1 - β), c + β * E*v) # (5) fixing the parameter values
            v_star = fixedpoint(T, v_iv, iterations = iterations, ftol = ftol,
                                m = 0).zero # (5)
            return (1 - β) * (c + β * E*v_star) # (2)
        end
        compute_reservation_wage((c = c, β = β, w = w)) # pack the parameters defined above
Out[9]: 47.31649970147162
Now we know how to compute the reservation wage, let’s see how it varies with parameters.
In particular, let’s look at what happens when we change 𝛽 and 𝑐.
In [10]: grid_size = 25
         R = rand(grid_size, grid_size) # preallocate; overwritten below
         c_vals = range(10.0, 40.0, length = grid_size)  # illustrative ranges
         β_vals = range(0.9, 0.99, length = grid_size)
         for (i, c) in enumerate(c_vals)
             for (j, β) in enumerate(β_vals)
                 R[i, j] = compute_reservation_wage((c = c, β = β, w = w))
             end
         end
Out[11]:
As expected, the reservation wage increases both with patience and with unemployment com-
pensation.
The approach to dynamic programming just described is very standard and broadly applica-
ble.
For this particular problem, there’s also an easier way, which circumvents the need to com-
pute the value function.
Let 𝜓 denote the value of not accepting a job in this period but then behaving optimally in
all subsequent periods.
That is,
$$\psi = c + \beta \sum_{i=1}^{n} V(w_i) p_i \tag{6}$$

From (1) we then have

$$V(w_i) = \max \left\{ \frac{w_i}{1 - \beta},\; \psi \right\}$$

Substituting this back into (6) gives a scalar equation in 𝜓 alone:

$$\psi = c + \beta \sum_{i=1}^{n} \max \left\{ \frac{w_i}{1 - \beta},\; \psi \right\} p_i \tag{7}$$
Equation (7) can be solved by iterating on a guess: given 𝜓, compute the update

$$\psi' = c + \beta \sum_{i=1}^{n} \max \left\{ \frac{w_i}{1 - \beta},\; \psi \right\} p_i \tag{8}$$
In [12]: function compute_reservation_wage_ψ(c, β; ψ_iv = E * w ./ (1 - β), max_iter = 500,
                                             tol = 1e-5)
             T_ψ(ψ) = [c + β * E*max.((w ./ (1 - β)), ψ[1])] # (7)
             # using vectors since fixedpoint doesn't support scalar
             ψ_star = fixedpoint(T_ψ, [ψ_iv]).zero[1]
             return (1 - β) * ψ_star # (2)
         end
         compute_reservation_wage_ψ(c, β)
Out[12]: 47.31649976654629
In [13]: function compute_reservation_wage_ψ2(c, β; ψ_iv = E * w ./ (1 - β), max_iter = 500,
                                              tol = 1e-5)
             root_ψ(ψ) = c + β * E*max.((w ./ (1 - β)), ψ) - ψ # (7)
             ψ_star = find_zero(root_ψ, ψ_iv)
             return (1 - β) * ψ_star # (2)
         end
         compute_reservation_wage_ψ2(c, β)
Out[13]: 47.316499766546194
25.6 Exercises
25.6.1 Exercise 1
Compute the average duration of unemployment when 𝛽 = 0.99 and 𝑐 takes the values in c_vals = range(10, 40, length = 25).
That is, start the agent off as unemployed, compute their reservation wage given the parameters, and then simulate to see how long it takes to accept.
Repeat a large number of times and take the average.
Plot mean unemployment duration as a function of 𝑐 in c_vals.
25.7 Solutions
25.7.1 Exercise 1
In [14]: function compute_stopping_time(w̄; seed=1234)
             Random.seed!(seed)
             stopping_time = 0
             t = 1
             # make sure the constraint is sometimes binding
             @assert length(w) - 1 ∈ support(dist) && w̄ <= w[end]
             while true
                 # Generate a wage draw
                 w_val = w[rand(dist)] # the wage dist set up earlier
                 if w_val ≥ w̄
                     stopping_time = t
                     break
                 else
                     t += 1
                 end
             end
             return stopping_time
         end
         compute_mean_stopping_time(w̄, num_reps=10000) = mean(i ->
                                 compute_stopping_time(w̄, seed = i), 1:num_reps)
         c_vals = range(10, 40, length = 25)
         stop_times = similar(c_vals)
         beta = 0.99
         for (i, c) in enumerate(c_vals)
             w̄ = compute_reservation_wage_ψ(c, beta)
             stop_times[i] = compute_mean_stopping_time(w̄)
         end
Out[14]:
Chapter 26

Job Search II: Search and Separation
26.1 Contents
• Overview 26.2
• The Model 26.3
• Solving the Model using Dynamic Programming 26.4
• Implementation 26.5
• The Reservation Wage 26.6
• Exercises 26.7
• Solutions 26.8
26.2 Overview
Previously we looked at the McCall job search model [76] as a way of understanding unem-
ployment and worker decisions.
One unrealistic feature of the model is that every job is permanent.
In this lecture we extend the McCall model by introducing job separation.
Once separation enters the picture, the agent comes to view
• the loss of a job as a capital loss, and
• a spell of unemployment as an investment in searching for an acceptable job
26.2.1 Setup
In this model, the worker maximizes the expected discounted sum of utility from income

$$\mathbb{E} \sum_{t=0}^{\infty} \beta^t u(Y_t) \tag{1}$$
The only difference from the baseline model is that we’ve added some flexibility over prefer-
ences by introducing a utility function 𝑢.
It satisfies 𝑢′ > 0 and 𝑢″ < 0.
Here’s what happens at the start of a given period in our model with search and separation.
If currently employed, the worker consumes his wage 𝑤, receiving utility 𝑢(𝑤).
If currently unemployed, he
• receives and consumes unemployment compensation 𝑐
• receives an offer to start work next period at a wage 𝑤′ drawn from a known distribu-
tion 𝑝1 , … , 𝑝𝑛
He can either accept or reject the offer.
If he accepts the offer, he enters next period employed with wage 𝑤′ .
If he rejects the offer, he enters next period unemployed.
When employed, the agent faces a constant probability 𝛼 of becoming unemployed at the end
of the period.
(Note: we do not allow for job search while employed—this topic is taken up in a later lec-
ture)
Let
• 𝑉 (𝑤) be the total lifetime value accruing to a worker who enters the current period em-
ployed with wage 𝑤.
• 𝑈 be the total lifetime value accruing to a worker who is unemployed this period.
Here value means the value of the objective function (1) when the worker makes optimal deci-
sions at all future points in time.
Suppose for now that the worker can calculate the function 𝑉 and the constant 𝑈 and use
them in his decision making.
Then 𝑉 and 𝑈 should satisfy

$$V(w) = u(w) + \beta \left[ (1 - \alpha) V(w) + \alpha U \right] \tag{2}$$

and

$$U = u(c) + \beta \sum_{i} \max \{ U,\, V(w_i) \} \, p_i \tag{3}$$
Let’s interpret these two equations in light of the fact that today’s tomorrow is tomorrow’s
today.
• The left hand sides of equations (2) and (3) are the values of a worker in a particular
situation today.
• The right hand sides of the equations are the discounted (by 𝛽) expected values of the
possible situations that worker can be in tomorrow.
• But tomorrow the worker can be in only one of the situations whose values today are on
the left sides of our two equations.
Equation (3) incorporates the fact that a currently unemployed worker will maximize his own
welfare.
In particular, if his next period wage offer is 𝑤′ , he will choose to remain unemployed unless
𝑈 < 𝑉 (𝑤′ ).
Equations (2) and (3) are the Bellman equations for this model.
Equations (2) and (3) provide enough information to solve out for both 𝑉 and 𝑈 .
Before discussing this, however, let’s make a small extension to the model.
Let’s suppose now that unemployed workers don’t always receive job offers.
Instead, let’s suppose that unemployed workers only receive an offer with probability 𝛾.
If our worker does receive an offer, the wage offer is drawn from 𝑝 as before.
He either accepts or rejects the offer.
Otherwise the model is the same.
With some thought, you will be able to convince yourself that 𝑉 and 𝑈 should now satisfy

$$V(w) = u(w) + \beta \left[ (1 - \alpha) V(w) + \alpha U \right] \tag{4}$$

and

$$U = u(c) + \beta (1 - \gamma) U + \beta \gamma \sum_{i} \max \{ U,\, V(w_i) \} \, p_i \tag{5}$$
We’ll use the same iterative approach to solving the Bellman equations that we adopted in
the first job search lecture.
Here this amounts to

1. make guesses for the value functions 𝑉 and 𝑈
2. plug these guesses into the right hand sides of (4) and (5)
3. update the left hand sides from this rule and then repeat

Writing the updating rule as a single operator 𝑇 on the candidate solution, the true value function 𝑉∗ is the fixed point satisfying

$$TV^* = V^* \tag{9}$$
As before, the system always converges to the true solutions—in this case, the 𝑉 and 𝑈 that
solve (4) and (5).
A proof can be obtained via the Banach contraction mapping theorem.
26.5 Implementation
# parameter validation
# necessary objects
u_w = u.(w, σ)
u_c = u(c, σ)
E = expectation(dist) # expectation operator for wage distribution
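A hedged sketch of the core of the solver that these objects feed into: stack 𝑉 and 𝑈 into a single vector and iterate the two Bellman equations jointly (fixedpoint is from NLsolve, as in the previous lecture; the names V_iv and U_iv for the initial guesses are assumptions):

using NLsolve

# joint Bellman operator over x = [V; U], using u_w, u_c, and E from above
function T(x)
    V = x[1:end-1]
    U = x[end]
    [u_w + β * ((1 - α) * V .+ α * U);                  # employed values, eq. (4)
     u_c + β * (1 - γ) * U + β * γ * (E * max.(U, V))]  # unemployed value, eq. (5)
end

x_star = fixedpoint(T, [V_iv; U_iv], m = 0).zero        # value function iteration
V = x_star[1:end-1]
U = x_star[end]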
The approach is to iterate until successive iterates are closer together than some small toler-
ance level.
We then return the current iterate as an approximate solution.
Let’s plot the approximate solutions 𝑈 and 𝑉 to see what they look like.
We’ll use the default parameterizations found in the code above.
# model constructor
McCallModel = @with_kw (α = 0.2,
β = 0.98, # discount rate
γ = 0.7,
c = 6.0, # unemployment compensation
σ = 2.0,
u = u, # utility function
w = range(10, 20, length = 60), # wage values
dist = BetaBinomial(59, 600, 400)) # distribution over wage values
mcm = McCallModel()
@unpack V, U = solve_mccall_model(mcm)
U_vec = fill(U, length(mcm.w))
Out[5]:
The value 𝑉 is increasing because higher 𝑤 generates a higher wage flow conditional on stay-
ing employed.
At this point, it’s natural to ask how the model would respond if we perturbed the parame-
ters.
These calculations, called comparative statics, are performed in the next section.
Once 𝑉 and 𝑈 are known, the agent can use them to make decisions in the face of a given
wage offer.
If 𝑉 (𝑤) > 𝑈 , then working at wage 𝑤 is preferred to unemployment.
If 𝑉 (𝑤) < 𝑈 , then remaining unemployed will generate greater lifetime value.
Suppose in particular that 𝑉 crosses 𝑈 (as it does in the preceding figure).
Then, since 𝑉 is increasing, there is a unique smallest 𝑤 in the set of possible wages such that
𝑉 (𝑤) ≥ 𝑈 .
We denote this wage 𝑤̄ and call it the reservation wage.
Optimal behavior for the worker is characterized by 𝑤̄
• if the wage offer 𝑤 in hand is greater than or equal to 𝑤,̄ then the worker accepts
• if the wage offer 𝑤 in hand is less than 𝑤,̄ then the worker rejects
If 𝑉 (𝑤) < 𝑈 for all 𝑤, then the function returns Inf.
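A hedged sketch of such a reservation wage computation (the searchsortedfirst approach relies on 𝑉 − 𝑈 being increasing in 𝑤, as argued above; the function name is an assumption):

# smallest wage on the grid with V(w) ≥ U; Inf if acceptance is never optimal
function reservation_wage(w, V, U)
    idx = searchsortedfirst(V .- U, 0.0)
    return idx > length(w) ? Inf : w[idx]
end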
Let’s use it to look at how the reservation wage varies with parameters.
In each instance below we’ll show you a figure and then ask you to reproduce it in the exer-
cises.
As expected, higher unemployment compensation causes the worker to hold out for higher
wages.
In effect, the cost of continuing job search is reduced.
The next figure plots the reservation wage associated with different values of 𝛽
Again, the results are intuitive: More patient workers will hold out for higher wages.
Finally, let’s look at how 𝑤̄ varies with the job separation rate 𝛼.
Higher 𝛼 translates to a greater chance that a worker will face termination in each period
once employed.
26.7 Exercises
26.7.1 Exercise 1
26.7.2 Exercise 2
Out[6]: 0.05:0.0375:0.95
26.8 Solutions
26.8.1 Exercise 1
Using the solve_mccall_model function mentioned earlier in the lecture, we can create an
array for reservation wages for different values of 𝑐, 𝛽 and 𝛼 and plot the results like so
plot(c_vals,
     w̄_vals,
     lw = 2,
     α = 0.7,
     xlabel = "unemployment compensation",
     ylabel = "reservation wage",
     label = "w̄ as a function of c")
Out[7]:
Note that we could’ve done the above in one pass (which would be important if, for example,
the parameter space was quite large).
In [8]: w̄_vals = [solve_mccall_model(McCallModel(c = cval)).w̄ for cval in c_vals];
        # doesn't allocate new arrays for models and solutions
26.8.2 Exercise 2
plot(γ_vals, w̄_vals, lw = 2, α = 0.7, xlabel = "job offer rate",
     ylabel = "reservation wage", label = "w̄ as a function of gamma")
Out[9]:
Chapter 27

A Problem that Stumped Milton Friedman
27.1 Contents
• Overview 27.2
• Origin of the problem 27.3
• A dynamic programming approach 27.4
• Implementation 27.5
• Comparison with Neyman-Pearson formulation 27.6
Co-authored with Chase Coleman
27.2 Overview
This lecture describes a statistical decision problem encountered by Milton Friedman and W.
Allen Wallis during World War II when they were analysts at the U.S. Government’s Statisti-
cal Research Group at Columbia University.
This problem led Abraham Wald [106] to formulate sequential analysis, an approach to
statistical decision problems intimately related to dynamic programming.
In this lecture, we apply dynamic programming algorithms to Friedman and Wallis and
Wald’s problem.
Key ideas in play will be:
• Bayes’ Law
• Dynamic programming
• Type I and type II statistical errors
– a type I error occurs when you reject a null hypothesis that is true
– a type II error is when you accept a null hypothesis that is false
• Abraham Wald’s sequential probability ratio test
• The power of a statistical test
• The critical region of a statistical test
• A uniformly most powerful test
On pages 137-139 of his 1998 book Two Lucky People with Rose Friedman [33], Milton Fried-
man described a problem presented to him and Allen Wallis during World War II, when they
worked at the US Government’s Statistical Research Group at Columbia University.
Let’s listen to Milton Friedman tell us what happened.
“In order to understand the story, it is necessary to have an idea of a simple statistical prob-
lem, and of the standard procedure for dealing with it. The actual problem out of which se-
quential analysis grew will serve. The Navy has two alternative designs (say A and B) for a
projectile. It wants to determine which is superior. To do so it undertakes a series of paired
firings. On each round it assigns the value 1 or 0 to A accordingly as its performance is supe-
rior or inferior to that of B and conversely 0 or 1 to B. The Navy asks the statistician how to
conduct the test and how to analyze the results.
“The standard statistical answer was to specify a number of firings (say 1,000) and a pair of
percentages (e.g., 53% and 47%) and tell the client that if A receives a 1 in more than 53%
of the firings, it can be regarded as superior; if it receives a 1 in fewer than 47%, B can be
regarded as superior; if the percentage is between 47% and 53%, neither can be so regarded.
“When Allen Wallis was discussing such a problem with (Navy) Captain Garret L. Schyler,
the captain objected that such a test, to quote from Allen’s account, may prove wasteful. If
a wise and seasoned ordnance officer like Schyler were on the premises, he would see after the
first few thousand or even few hundred [rounds] that the experiment need not be completed
either because the new method is obviously inferior or because it is obviously superior beyond
what was hoped for … ‘’
Friedman and Wallis struggled with the problem but, after realizing that they were not able
to solve it, described the problem to Abraham Wald.
That started Wald on the path that led him to Sequential Analysis [106].
We’ll formulate the problem using dynamic programming.
The following presentation of the problem closely follows Dmitri Berskekas’s treatment in
Dynamic Programming and Stochastic Control [11].
A decision maker observes iid draws of a random variable 𝑧.
He (or she) wants to know which of two probability distributions 𝑓0 or 𝑓1 governs 𝑧.
After a number of draws, also to be determined, he makes a decision as to which of the distri-
butions is generating the draws he observers.
To help formalize the problem, let 𝑥 ∈ {𝑥0 , 𝑥1 } be a hidden state that indexes the two distri-
butions:
$$\mathbb{P}\{z = v \mid x\} = \begin{cases} f_0(v) & \text{if } x = x_0 \\ f_1(v) & \text{if } x = x_1 \end{cases}$$
Before observing any outcomes, the decision maker believes that the probability that 𝑥 = 𝑥0 is 𝑝₋₁ ∈ (0, 1). After observing 𝑘 + 1 draws, he updates this to

$$p_k = \mathbb{P}\{x = x_0 \mid z_k, z_{k-1}, \ldots, z_0\}$$

which is calculated recursively by applying Bayes' law:

$$p_{k+1} = \frac{p_k f_0(z_{k+1})}{p_k f_0(z_{k+1}) + (1 - p_k) f_1(z_{k+1})}, \qquad k = -1, 0, 1, \ldots$$
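A small sketch of this recursion in code (the Beta densities here are illustrative, not the lecture's parameterization):

using Distributions

f0, f1 = Beta(1, 1), Beta(3, 3)      # illustrative candidate densities
update(p, z) = p * pdf(f0, z) / (p * pdf(f0, z) + (1 - p) * pdf(f1, z))

p = 0.5                              # prior P{x = x0}
z = rand(f0)                         # one draw, here generated under x = x0
p = update(p, z)                     # posterior belief after observing z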
After observing 𝑧𝑘, 𝑧𝑘−1, … , 𝑧0, the decision maker believes that 𝑧𝑘+1 has probability distribution

$$f(v) = p_k f_0(v) + (1 - p_k) f_1(v)$$

This is a mixture of distributions 𝑓0 and 𝑓1, with the weight on 𝑓0 being the posterior probability that 𝑥 = 𝑥0 [1].
To help illustrate this kind of distribution, let’s inspect some mixtures of beta distributions.
The density of a beta probability distribution with parameters 𝑎 and 𝑏 is

$$f(z; a, b) = \frac{\Gamma(a + b)\, z^{a-1} (1 - z)^{b-1}}{\Gamma(a)\Gamma(b)} \quad \text{where} \quad \Gamma(t) := \int_0^\infty x^{t-1} e^{-x} \, dx$$
27.4.1 Setup
begin
base_dist = [Beta(1, 1), Beta(3, 3)]
mixed_dist = MixtureModel.(Ref(base_dist), (p -> [p, one(p) - p]).(0.25:0.25:0.75))
Out[3]:
After observing 𝑧𝑘 , 𝑧𝑘−1 , … , 𝑧0 , the decision maker chooses among three distinct actions:
• He decides that 𝑥 = 𝑥0 and draws no more 𝑧’s.
• He decides that 𝑥 = 𝑥1 and draws no more 𝑧’s.
• He postpones deciding now and instead chooses to draw a 𝑧𝑘+1 .
Associated with these three actions, the decision maker can suffer three kinds of losses:
• A loss 𝐿0 if he decides 𝑥 = 𝑥0 when actually 𝑥 = 𝑥1 .
• A loss 𝐿1 if he decides 𝑥 = 𝑥1 when actually 𝑥 = 𝑥0 .
• A cost 𝑐 if he postpones deciding and chooses instead to draw another 𝑧.
27.4.4 Intuition
Let’s try to guess what an optimal decision rule might look like before we go further.
Suppose at some given point in time that 𝑝 is close to 1.
Then our prior beliefs and the evidence so far point strongly to 𝑥 = 𝑥0 .
If, on the other hand, 𝑝 is close to 0, then 𝑥 = 𝑥1 is strongly favored.
Finally, if 𝑝 is in the middle of the interval [0, 1], then we have little information in either di-
rection.
This reasoning suggests a decision rule such as the one shown in the figure
As we’ll see, this is indeed the correct form of the decision rule.
The key problem is to determine the threshold values 𝛼, 𝛽, which will depend on the parame-
ters listed above.
You might like to pause at this point and try to predict the impact of a parameter such as 𝑐
or 𝐿0 on 𝛼 or 𝛽.
Let 𝐽(𝑝) be the total loss for a decision maker with current belief 𝑝 who chooses optimally.

With some thought, you will agree that 𝐽 should satisfy the Bellman equation

$$J(p) = \min \left\{ (1 - p) L_0,\; p L_1,\; c + \mathbb{E}[J(p')] \right\} \tag{1}$$

where 𝑝′ is the random variable defined by

$$p' = \frac{p f_0(z)}{p f_0(z) + (1 - p) f_1(z)} \tag{2}$$

when 𝑝 is fixed and 𝑧 is drawn from the current best guess, which is the distribution 𝑓 defined by

$$f(v) = p f_0(v) + (1 - p) f_1(v)$$
At each belief 𝑝, the decision maker therefore compares three options:

1. accept 𝑥0 (decide 𝑥 = 𝑥0)
2. accept 𝑥1 (decide 𝑥 = 𝑥1)
3. postpone and draw another 𝑧

The optimal rule turns out to be characterized by two numbers 𝛼, 𝛽 ∈ (0, 1) with 𝛽 < 𝛼:

accept 𝑥 = 𝑥0 if 𝑝 ≥ 𝛼
accept 𝑥 = 𝑥1 if 𝑝 ≤ 𝛽 (8)
draw another 𝑧 if 𝛽 ≤ 𝑝 ≤ 𝛼
Our aim is to compute the value function 𝐽 , and from it the associated cutoffs 𝛼 and 𝛽.
One sensible approach is to write the three components of 𝐽 that appear on the right side of
the Bellman equation as separate functions.
Later, doing this will help us obey the don’t repeat yourself (DRY) golden rule of cod-
ing.
27.5 Implementation
First, consider the cost associated to accepting either distribution and compare the minimum
of the two to the expected benefit of drawing again.
Drawing again will only be worthwhile if the expected marginal benefit of learning from an
additional draw is greater than the explicit cost.
For every belief 𝑝, we can compute the difference between accepting a distribution and choos-
ing to draw again.
The solution 𝛼, 𝛽 occurs at indifference points.
Define the cost function to be the minimum of the pairwise differences in cost among the choices. Then we can find the indifference points where the cost function is zero. We can use any root-finding algorithm to solve for the solutions in the interval [0, 1].
Lastly, verify which indifference points correspond to the definition of a permanent transition
between the accept and reject space for each choice.
Here’s the code. It uses two helper functions for the expected terminal losses, accept_x0(p, L0) = (1 - p) * L0 and accept_x1(p, L1) = p * L1, together with a bayes_update(p, d0, d1) function implementing (2):
function draw_again(p, d0, d1, L0, L1, c, target)
candidate = 0.0
cost = 0.0
while candidate < target
p = bayes_update(p, d0, d1)
cost += c
candidate = min(accept_x0(p, L0), accept_x1(p, L1)) + cost
if candidate >= target
break
end
target = candidate
end
return candidate
end
function choice(p, d0, d1, L0, L1, c)
if isone(p)
output = (1, 0)
elseif iszero(p)
output = (2, 0)
elseif zero(p) < p < one(p)
target, option = findmin([accept_x0(p, L0), accept_x1(p, L1)])
candidate = draw_again(p, d0, d1, L0, L1, c, target)
if candidate < target
target, option = (candidate, 3)
end
output = (option, target)
else
throw(ArgumentError("p must be ∈ [0, 1]"))
end
return output
end
Next we solve a problem by finding the α, β values for the decision rule
if β < α
@printf("Accept x1 if p ≤ %.2f\nContinue to draw if %.2f ≤ p ≤ %.2f
\nAccept x0 if p ≥ %.2f", β, β, α, α)
else
x0 = accept_x0(β, L0)
x1 = accept_x1(β, L1)
draw = draw_again(β, d0, d1, L0, L1, c, min(x0, x1))
if draw == min(x0, x1, draw)
@printf("Accept x1 if p ≤ %.2f\nContinue to draw if %.2f ≤ p ≤�
↪%.2f
\nAccept x0 if p ≥ %.2f", β, β, α, α)
else
@printf("Accept x1 if p ≤ %.2f\nAccept x0 if p ≥ %.2f", β, α)
end
end
return (α, β)
end
We can simulate an agent facing a problem and the outcome with the following function
outcomes = fill(false, n)
costs = fill(0.0, n)
trials = fill(0, n)
for trial in 1:n
# Nature chooses
truth = rand(1:2)
# The true distribution and loss are defined based on the truth
d = (d0, d1)[truth]
l = (L0, L1)[truth]
t = 0
choice = 0
while iszero(choice)
t += 1
outcome = rand(d)
p = bayes_update(p, d0, d1)
if p <= β
choice = 1
elseif p >= α
choice = 2
end
end
correct = choice == truth
cost = t * c + (correct ? 0 : l)
outcomes[trial] = correct
costs[trial] = cost
trials[trial] = t
end
@printf("\nCorrect: %.2f\nAverage Cost: %.2f\nAverage number of trials:
↪ %.2f",
mean(outcomes), mean(costs), mean(trials))
return return_output ? (α, β, outcomes, costs, trials) : nothing
end
In [7]: Random.seed!(0);
simulation(Problem());
Accept x1 if p ≤ 0.35
Continue to draw if 0.35 ≤ p ≤ 0.57
Accept x0 if p ≥ 0.57
Correct: 0.43
Average Cost: 1.42
Average number of trials: 1.40
In [8]: Random.seed!(0);
simulation(Problem(c = 0.4));
Accept x1 if p ≤ 0.41
Continue to draw if 0.41 ≤ p ≤ 0.54
Accept x0 if p ≥ 0.54
Correct: 0.45
Average Cost: 1.59
Average number of trials: 1.22
For several reasons, it is useful to describe the theory underlying the test that Navy Captain
G. S. Schuyler had been told to use and that led him to approach Milton Friedman and Allan
Wallis to convey his conjecture that superior practical procedures existed.
Evidently, the Navy had told Captain Schuyler to use what it knew to be a state-of-the-art Neyman-Pearson test.
We’ll rely on Abraham Wald’s [106] elegant summary of Neyman-Pearson theory.
For our purposes, watch for these features of the setup:
• the assumption of a fixed sample size 𝑛
• the application of laws of large numbers, conditioned on alternative probability models,
to interpret the probabilities 𝛼 and 𝛽 defined in the Neyman-Pearson theory
Recall that in the sequential analytic formulation above:
• The sample size 𝑛 is not fixed but rather an object to be chosen; technically 𝑛 is a ran-
dom variable.
• The parameters 𝛽 and 𝛼 characterize cut-off rules used to determine 𝑛 as a random
variable.
• Laws of large numbers make no appearances in the sequential construction.
In chapter 1 of Sequential Analysis [106] Abraham Wald summarizes the Neyman-Pearson
approach to hypothesis testing.
Wald frames the problem as making a decision about a probability distribution that is par-
tially known.
(You have to assume that something is already known in order to state a well posed problem.
Usually, something means a lot.)
By limiting what is unknown, Wald uses the following simple structure to illustrate the main
ideas.
• A decision maker wants to decide which of two distributions 𝑓0 , 𝑓1 govern an i.i.d. ran-
dom variable 𝑧.
• The null hypothesis 𝐻0 is the statement that 𝑓0 governs the data.
• The alternative hypothesis 𝐻1 is the statement that 𝑓1 governs the data.
• The problem is to devise and analyze a test of hypothesis 𝐻0 against the alternative
hypothesis 𝐻1 on the basis of a sample of a fixed number 𝑛 independent observations
𝑧1 , 𝑧2 , … , 𝑧𝑛 of the random variable 𝑧.
To quote Abraham Wald,
• A test procedure leading to the acceptance or rejection of the hypothesis in question
is simply a rule specifying, for each possible sample of size 𝑛, whether the hypothesis
should be accepted or rejected on the basis of the sample. This may also be expressed
as follows: A test procedure is simply a subdivision of the totality of all possible sam-
ples of size 𝑛 into two mutually exclusive parts, say part 1 and part 2, together with the
application of the rule that the hypothesis be accepted if the observed sample is con-
tained in part 2. Part 1 is also called the critical region. Since part 2 is the totality of
all samples of size 𝑛 which are not included in part 1, part 2 is uniquely determined by
part 1. Thus, choosing a test procedure is equivalent to determining a critical region.
Let’s listen to Wald longer:
• As a basis for choosing among critical regions the following considerations have been ad-
vanced by Neyman and Pearson: In accepting or rejecting 𝐻0 we may commit errors of
two kinds. We commit an error of the first kind if we reject 𝐻0 when it is true; we com-
mit an error of the second kind if we accept 𝐻0 when 𝐻1 is true. After a particular crit-
ical region 𝑊 has been chosen, the probability of committing an error of the first kind,
as well as the probability of committing an error of the second kind is uniquely deter-
mined. The probability of committing an error of the first kind is equal to the proba-
bility, determined by the assumption that 𝐻0 is true, that the observed sample will be
included in the critical region 𝑊 . The probability of committing an error of the second
kind is equal to the probability, determined on the assumption that 𝐻1 is true, that the
probability will fall outside the critical region 𝑊 . For any given critical region 𝑊 we
shall denote the probability of an error of the first kind by 𝛼 and the probability of an
error of the second kind by 𝛽.
Let’s listen carefully to how Wald applies a law of large numbers to interpret 𝛼 and 𝛽:
• The probabilities 𝛼 and 𝛽 have the following important practical interpretation: Sup-
pose that we draw a large number of samples of size 𝑛. Let 𝑀 be the number of such
samples drawn. Suppose that for each of these 𝑀 samples we reject 𝐻0 if the sam-
ple is included in 𝑊 and accept 𝐻0 if the sample lies outside 𝑊 . In this way we make
𝑀 statements of rejection or acceptance. Some of these statements will in general be
wrong. If 𝐻0 is true and if 𝑀 is large, the probability is nearly 1 (i.e., it is practically
certain) that the proportion of wrong statements (i.e., the number of wrong statements
divided by 𝑀 ) will be approximately 𝛼. If 𝐻1 is true, the probability is nearly 1 that
the proportion of wrong statements will be approximately 𝛽. Thus, we can say that in
the long run [ here Wald applies a law of large numbers by driving 𝑀 → ∞ (our com-
ment, not Wald’s) ] the proportion of wrong statements will be 𝛼 if 𝐻0 is true and 𝛽 if
𝐻1 is true.
The quantity 𝛼 is called the size of the critical region, and the quantity 1 − 𝛽 is called the power of the critical region. Wald then states Neyman and Pearson's key result: the region of samples satisfying the likelihood-ratio inequality

$$\frac{f_1(z_1) \cdots f_1(z_n)}{f_0(z_1) \cdots f_0(z_n)} \geq k$$

is a most powerful critical region for testing the hypothesis 𝐻0 against the alternative hypothesis 𝐻1. The term 𝑘 on the right side is a constant chosen so that the region will have the required size 𝛼.
Wald goes on to discuss Neyman and Pearson’s concept of uniformly most powerful test.
Here is how Wald introduces the notion of a sequential test
• A rule is given for making one of the following three decisions at any stage of the exper-
iment (at the m th trial for each integral value of m ): (1) to accept the hypothesis H ,
(2) to reject the hypothesis H , (3) to continue the experiment by making an additional
observation. Thus, such a test procedure is carried out sequentially. On the basis of the
first observation one of the aforementioned decisions is made. If the first or second de-
cision is made, the process is terminated. If the third decision is made, a second trial is
performed. Again, on the basis of the first two observations one of the three decisions is
made. If the third decision is made, a third trial is performed, and so on. The process
is continued until either the first or the second decisions is made. The number n of ob-
servations required by such a test procedure is a random variable, since the value of n
depends on the outcome of the observations.
Footnotes
[1] Because the decision maker believes that 𝑧𝑘+1 is drawn from a mixture of two i.i.d. distri-
butions, he does not believe that the sequence [𝑧𝑘+1 , 𝑧𝑘+2 , …] is i.i.d. Instead, he believes that
it is exchangeable. See [62] chapter 11, for a discussion of exchangeability.
Chapter 28

Job Search III: Search with Learning
28.1 Contents
• Overview 28.2
• Model 28.3
• Take 1: Solution by VFI 28.4
• Take 2: A More Efficient Method 28.5
• Exercises 28.6
• Solutions 28.7
28.2 Overview
In this lecture we consider an extension of the previously studied job search model of McCall
[76].
In the McCall model, an unemployed worker decides when to accept a permanent position at
a specified wage, given
• his or her discount rate
• the level of unemployment compensation
• the distribution from which wage offers are drawn
In the version considered below, the wage distribution is unknown and must be learned.
• The following is based on the presentation in [68], section 6.6.
• Infinite horizon dynamic programming with two states and one binary control.
• Bayesian updating to learn the unknown distribution.
28.2.2 Setup
28.3 Model
Let’s first review the basic McCall model [76] and then add the variation we want to consider.
Recall that, in the baseline model, an unemployed worker is presented in each period with a permanent job offer at wage 𝑊𝑡.

At time 𝑡, our worker either

1. accepts the offer and works permanently at constant wage 𝑊𝑡, or
2. rejects the offer, receives unemployment compensation 𝑐 and reconsiders next period
The wage sequence {𝑊𝑡 } is iid and generated from known density ℎ.
The worker aims to maximize the expected discounted sum of earnings $\mathbb{E} \sum_{t=0}^{\infty} \beta^t y_t$.

The function 𝑉 satisfies the recursion

$$V(w) = \max \left\{ \frac{w}{1 - \beta},\; c + \beta \int V(w') h(w') \, dw' \right\} \tag{1}$$
Now let’s extend the model by considering the variation presented in [68], section 6.6.
The model is as above, apart from the fact that
• the density ℎ is unknown
• the worker learns about ℎ by starting with a prior and updating based on wage offers
that he/she observes
The worker knows there are two possible distributions 𝐹 and 𝐺 — with densities 𝑓 and 𝑔.
At the start of time, “nature” selects ℎ to be either 𝑓 or 𝑔 — the wage distribution from
which the entire sequence {𝑊𝑡 } will be drawn.
This choice is not observed by the worker, who puts prior probability 𝜋0 on 𝑓 being chosen.
Update rule: the worker's time 𝑡 estimate of the distribution is 𝜋𝑡 𝑓 + (1 − 𝜋𝑡)𝑔, where 𝜋𝑡 updates via

$$\pi_{t+1} = \frac{\pi_t f(w_{t+1})}{\pi_t f(w_{t+1}) + (1 - \pi_t) g(w_{t+1})} \tag{2}$$
This last expression follows from Bayes' rule, which tells us that

$$\mathbb{P}\{h = f \mid W = w\} = \frac{\mathbb{P}\{W = w \mid h = f\}\, \mathbb{P}\{h = f\}}{\mathbb{P}\{W = w\}} \quad \text{and} \quad \mathbb{P}\{W = w\} = \sum_{\psi \in \{f, g\}} \mathbb{P}\{W = w \mid h = \psi\}\, \mathbb{P}\{h = \psi\}$$
The fact that (2) is recursive allows us to progress to a recursive solution method.
Letting

$$h_\pi(w) := \pi f(w) + (1 - \pi) g(w) \quad \text{and} \quad q(w, \pi) := \frac{\pi f(w)}{\pi f(w) + (1 - \pi) g(w)}$$

we can express the value function for the unemployed worker recursively as follows

$$V(w, \pi) = \max \left\{ \frac{w}{1 - \beta},\; c + \beta \int V(w', \pi') \, h_\pi(w') \, dw' \right\} \quad \text{where } \pi' = q(w', \pi) \tag{3}$$
Notice that the current guess 𝜋 is a state variable, since it affects the worker’s perception of
probabilities for future rewards.
28.3.3 Parameterization
gr(fmt=:png);
w_max = 2
x = range(0, w_max, length = 200)
G = Beta(3, 1.6)
F = Beta(1, 1)
plot(x, pdf.(G, x/w_max)/w_max, label="g")
plot!(x, pdf.(F, x/w_max)/w_max, label="f")
Out[2]:
What kind of optimal policy might result from (3) and the parameterization specified above?
Intuitively, if we accept at 𝑤𝑎 and 𝑤𝑎 ≤ 𝑤𝑏 , then — all other things being given — we should
also accept at 𝑤𝑏 .
This suggests a policy of accepting whenever 𝑤 exceeds some threshold value 𝑤.̄
But 𝑤̄ should depend on 𝜋 — in fact it should be decreasing in 𝜋 because
• 𝑓 is a less attractive offer distribution than 𝑔
• larger 𝜋 means more weight on 𝑓 and less on 𝑔
Thus larger 𝜋 depresses the worker’s assessment of her future prospects, and relatively low
current offers become more attractive.
Summary: We conjecture that the optimal policy is of the form 𝟙{𝑤 ≥ 𝑤̄(𝜋)} for some decreasing function 𝑤̄.

Let's set about solving the model and see how our results match with our intuition.
Let’s set about solving the model and see how our results match with our intuition.
We begin by solving via value function iteration (VFI), which is natural but ultimately turns
out to be second best.
The code is as follows.
F = Beta(F_a, F_b)
G = Beta(G_a, G_b)
# scaled pdfs
f(x) = pdf.(F, x/w_max)/w_max
g(x) = pdf.(G, x/w_max)/w_max
return (β = β, c = c, F = F, G = G, f = f,
g = g, n_w = w_grid_size, w_max = w_max,
w_grid = w_grid, n_π = π_grid_size, π_min = π_min,
π_max = π_max, π_grid = π_grid, quad_nodes = nodes,
quad_weights = weights)
end
vf = extrapolate(interpolate((sp.w_grid, sp.π_grid), v,
Gridded(Linear())), Flat())
end
return out
end
function T(sp, v;
ret_policy = false)
out_type = ret_policy ? Bool : Float64
out = zeros(out_type, sp.n_w, sp.n_π)
T!(sp, v, out, ret_policy=ret_policy)
end
function res_wage_operator(sp, ϕ)
out = similar(ϕ)
res_wage_operator!(sp, ϕ, out)
return out
end
The type SearchProblem is used to store parameters and methods needed to compute opti-
mal actions.
The Bellman operator is implemented as the method T(), while get_greedy() computes
an approximate optimal policy from a guess v of the value function.
We will omit a detailed discussion of the code because there is a more efficient solution
method.
These ideas are implemented in the .res_wage_operator() method.
Before explaining it let’s look at solutions computed from value function iteration.
Here’s the value function:
sp = SearchProblem(;w_grid_size=100, π_grid_size=100)
v_init = fill(sp.c / (1 - sp.β), sp.n_w, sp.n_π)
f(x) = T(sp, x)
v = compute_fixed_point(f, v_init)
policy = get_greedy(sp, v)
plot_value_function()
Out[4]:
plot_policy_function()
Out[5]:
The reservation wage 𝑤̄(𝜋), the wage at which the unemployed worker is indifferent between accepting and rejecting, satisfies

$$\frac{\bar{w}(\pi)}{1 - \beta} = c + \beta \int V(w', \pi') \, h_\pi(w') \, dw' \tag{4}$$

Together, (3) and (4) give

$$V(w, \pi) = \max \left\{ \frac{w}{1 - \beta},\; \frac{\bar{w}(\pi)}{1 - \beta} \right\} \tag{5}$$

Combining (4) and (5) and substituting 𝜋′ = 𝑞(𝑤′, 𝜋), we get

$$\frac{\bar{w}(\pi)}{1 - \beta} = c + \beta \int \max \left\{ \frac{w'}{1 - \beta},\; \frac{\bar{w}(q(w', \pi))}{1 - \beta} \right\} h_\pi(w') \, dw'$$

Multiplying by 1 − 𝛽 and rearranging yields

$$\bar{w}(\pi) = (1 - \beta) c + \beta \int \max \{ w',\; \bar{w} \circ q(w', \pi) \} \, h_\pi(w') \, dw' \tag{6}$$
Equation (6) can be understood as a functional equation, where 𝑤̄ is the unknown function.
• Let’s call it the reservation wage functional equation (RWFE).
• The solution 𝑤̄ to the RWFE is the object that we wish to compute.
To solve the RWFE, we will first show that its solution is the fixed point of a contraction
mapping.
To this end, let
• 𝑏[0, 1] be the bounded real-valued functions on [0, 1]
• ‖𝜓‖ ∶= sup𝑥∈[0,1] |𝜓(𝑥)|
Consider the operator 𝑄 mapping 𝜓 ∈ 𝑏[0, 1] into 𝑄𝜓 ∈ 𝑏[0, 1] via

$$(Q\psi)(\pi) = (1 - \beta) c + \beta \int \max \{ w',\; \psi \circ q(w', \pi) \} \, h_\pi(w') \, dw' \tag{7}$$
Comparing (6) and (7), we see that the set of fixed points of 𝑄 exactly coincides with the set
of solutions to the RWFE.
• If 𝑄𝑤̄ = 𝑤̄ then 𝑤̄ solves (6) and vice versa.
Moreover, for any 𝜓, 𝜙 ∈ 𝑏[0, 1], basic algebra and the triangle inequality for integrals tells us
that
$$|(Q\psi)(\pi) - (Q\phi)(\pi)| \leq \beta \int \left| \max\{w', \psi \circ q(w', \pi)\} - \max\{w', \phi \circ q(w', \pi)\} \right| h_\pi(w') \, dw' \tag{8}$$
Working case by case, it is easy to check that for real numbers 𝑎, 𝑏, 𝑐 we always have

$$|\max\{a, b\} - \max\{a, c\}| \leq |b - c|$$

Combining this with (8) gives $\|Q\psi - Q\phi\| \leq \beta \|\psi - \phi\|$. In other words, 𝑄 is a contraction of modulus 𝛽 on the complete metric space (𝑏[0, 1], ‖ ⋅ ‖). Hence
Hence
• A unique solution 𝑤̄ to the RWFE exists in 𝑏[0, 1].
• 𝑄𝑘 𝜓 → 𝑤̄ uniformly as 𝑘 → ∞, for any 𝜓 ∈ 𝑏[0, 1].
Implementation
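A hedged sketch of the operator 𝑄 from (7) on the belief grid, using the field names of the SearchProblem tuple above (quadgk stands in here for the lecture's quadrature nodes and weights):

using Interpolations, QuadGK, Parameters

q(w, π, f, g) = π * f(w) / (π * f(w) + (1 - π) * g(w))  # belief update (2)

function res_wage_operator!(sp, ϕ, out)
    @unpack f, g, β, c, w_max, π_grid = sp
    # interpolate the current guess ϕ over the π grid, flat beyond the endpoints
    ϕ_f = extrapolate(interpolate((π_grid,), ϕ, Gridded(Linear())), Flat())
    for (i, π) in enumerate(π_grid)
        integrand(w) = max(w, ϕ_f(q(w, π, f, g))) * (π * f(w) + (1 - π) * g(w))
        integral, _ = quadgk(integrand, 0.0, w_max)     # ∫ ⋯ h_π(w′) dw′ in (7)
        out[i] = (1 - β) * c + β * integral
    end
    return out
end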
28.6 Exercises
28.6.1 Exercise 1
Use the default parameters and the .res_wage_operator() method to compute an opti-
mal policy.
Your result should coincide closely with the figure for the optimal policy shown above.
Try experimenting with different parameters, and confirm that the change in the optimal pol-
icy coincides with your intuition.
28.7 Solutions
28.7.1 Exercise 1
This code solves the “Offer Distribution Unknown” model by iterating on a guess of the reser-
vation wage function. You should find that the run time is much shorter than that of the
value function approach in examples/odu_vfi_plots.jl.
In [6]: ϕ_init = ones(sp.n_π)
        f_ex1(x) = res_wage_operator(sp, x)
        w̄ = compute_fixed_point(f_ex1, ϕ_init)

        plot(sp.π_grid, w̄, linewidth = 2, color=:black,
             fillrange = 0, fillalpha = 0.15, fillcolor = :blue)
        plot!(sp.π_grid, 2 * ones(length(w̄)), linewidth = 0, fillrange = w̄,
              fillalpha = 0.12, fillcolor = :green, legend = :none)
        plot!(ylims = (0, 2), annotations = [(0.42, 1.2, "reject"),
              (0.7, 1.8, "accept")])
Out[6]:
The next piece of code is not one of the exercises from QuantEcon – it’s just a fun simulation to see the effect of a change in the underlying distribution on the unemployment rate.
At a point in the simulation, the distribution becomes significantly worse. It takes a while for
agents to learn this, and in the meantime they are too optimistic, and turn down too many
jobs. As a result, the unemployment rate spikes.
The code takes a few minutes to run.
In [7]: # Set up model and compute the function w̄
        sp = SearchProblem(π_grid_size = 50, F_a = 1, F_b = 1)
        ϕ_init = ones(sp.n_π)
        g(x) = res_wage_operator(sp, x)
        w̄_vals = compute_fixed_point(g, ϕ_init)
        w̄ = extrapolate(interpolate((sp.π_grid, ), w̄_vals,
                        Gridded(Linear())), Flat())

        # agent type; fields inferred from the updates below
        mutable struct Agent
            _π
            employed
        end
        Agent(_π = 1e-3) = Agent(_π, 1)

        function update!(ag, H)
            if ag.employed == 0
                w = rand(H) * 2 # account for scale in julia
                if w ≥ w̄(ag._π)
                    ag.employed = 1
                else
                    ag._π = 1.0 ./ (1 .+ ((1 - ag._π) .* sp.g(w)) ./ (ag._π * sp.f(w)))
                end
            end
            nothing
        end
num_agents = 5000
separation_rate = 0.025 # Fraction of jobs that end in each period
separation_num = round(Int, num_agents * separation_rate)
agent_indices = collect(1:num_agents)
agents = [Agent() for i=1:num_agents]
sim_length = 600
H = sp.G # Start with distribution G
change_date = 200 # Change to F after this many periods
unempl_rate = zeros(sim_length)
for i in 1:sim_length
if i % 20 == 0
println("date = $i")
end
if i == change_date
H = sp.F
end
# update agents
for agent in agents
update!(agent, H)
end
employed = Int[agent.employed for agent in agents]
unempl_rate[i] = 1.0 - mean(employed)
end
Out[7]:
Chapter 29

Job Search IV: Modeling Career Choice
29.1 Contents
• Overview 29.2
• Model 29.3
• Exercises 29.4
• Solutions 29.5
29.2 Overview
• Career and job within career both chosen to maximize expected discounted wage flow.
• Infinite horizon dynamic programming with two state variables.
29.2.2 Setup
29.3 Model
Wages are given by 𝑤𝑡 = 𝜃𝑡 + 𝜖𝑡, where 𝜃𝑡 is the career-specific component and 𝜖𝑡 is the job-specific component, and the worker maximizes

$$\mathbb{E} \sum_{t=0}^{\infty} \beta^t w_t \tag{1}$$

The value function 𝑉(𝜃, 𝜖) satisfies 𝑉(𝜃, 𝜖) = max{𝐼, 𝐼𝐼, 𝐼𝐼𝐼}, where

$$\begin{aligned} I &= \theta + \epsilon + \beta V(\theta, \epsilon) \\ II &= \theta + \int \epsilon' \, G(d\epsilon') + \beta \int V(\theta, \epsilon') \, G(d\epsilon') \\ III &= \int \theta' \, F(d\theta') + \int \epsilon' \, G(d\epsilon') + \beta \int \int V(\theta', \epsilon') \, G(d\epsilon') \, F(d\theta') \end{aligned}$$

Evidently 𝐼, 𝐼𝐼 and 𝐼𝐼𝐼 correspond to “stay put”, “new job” and “new life”, respectively.
29.3.1 Parameterization
As in [68], section 6.5, we will focus on a discrete version of the model, parameterized as fol-
lows:
• both 𝜃 and 𝜖 take values in the set range(0, B, length = N) — an even grid of 𝑁 points between 0 and 𝐵 inclusive
• 𝑁 = 50
• 𝐵=5
• 𝛽 = 0.95
The distributions 𝐹 and 𝐺 are discrete distributions generating draws from the grid points range(0, B, length = N).
A very useful family of discrete distributions is the Beta-binomial family, with probability
mass function
$$p(k \mid n, a, b) = \binom{n}{k} \frac{B(k + a, n - k + b)}{B(a, b)}, \qquad k = 0, \ldots, n$$
Interpretation:
• draw 𝑞 from a Beta distribution with shape parameters (𝑎, 𝑏)
• run 𝑛 independent binary trials, each with success probability 𝑞
• 𝑝(𝑘 | 𝑛, 𝑎, 𝑏) is the probability of 𝑘 successes in these 𝑛 trials
Nice properties:
• very flexible class of distributions, including uniform, symmetric unimodal, etc.
• only three parameters
Here’s a figure showing the effect of different shape parameters when 𝑛 = 50.
n = 50
a_vals = [0.5, 1, 100]
b_vals = [0.5, 1, 100]
plt = plot()
for (a, b) in zip(a_vals, b_vals)
ab_label = "a = $a, b = $b"
dist = BetaBinomial(n, a, b)
plot!(plt, 0:n, pdf.(dist, support(dist)), label = ab_label)
end
plt
Out[3]:
Implementation:
The code for solving the DP problem described above is found below:
In [4]: function update_bellman!(cp, v, out; ret_policy = false)
            # new life: depends only on the distributions, so compute it once
            # (cp.F_mean is assumed to be defined analogously to cp.G_mean)
            v3 = (cp.G_mean + cp.F_mean + cp.β .*
                  cp.F_probs' * v * cp.G_probs)[1]
            for j in 1:cp.N
                for i in 1:cp.N
                    # stay put
                    v1 = cp.θ[i] + cp.ϵ[j] + cp.β * v[i, j]
                    # new job
                    v2 = (cp.θ[i] .+ cp.G_mean .+ cp.β .*
                          v[i, :]' * cp.G_probs)[1] # do not need a single element array
if ret_policy
if v1 > max(v2, v3)
action = 1
elseif v2 > max(v1, v3)
action = 2
else
action = 3
end
out[i, j] = action
else
out[i, j] = max(v1, v2, v3)
end
end
end
end
function update_bellman(cp, v; ret_policy = false)
    out = similar(v) # allocate-and-fill wrapper, mirroring res_wage_operator above
    update_bellman!(cp, v, out, ret_policy = ret_policy)
    return out
end

function get_greedy(cp, v)
    update_bellman(cp, v, ret_policy = true)
end
In [5]: wp = CareerWorkerProblem()
v_init = fill(100.0, wp.N, wp.N)
func(x) = update_bellman(wp, x)
v = compute_fixed_point(func, v_init, max_iter = 500, verbose = false)
Out[5]:
The optimal policy can be represented as follows (see Exercise 3 for code).
Interpretation:
• If both job and career are poor or mediocre, the worker will experiment with new job
and new career.
• If career is sufficiently good, the worker will hold it and experiment with new jobs until
a sufficiently good one is found.
• If both job and career are good, the worker will stay put.
Notice that the worker will always hold on to a sufficiently good career, but not necessarily
hold on to even the best paying job.
The reason is that high lifetime wages require both variables to be large, and the worker can-
not change careers without changing jobs.
• Sometimes a good job must be sacrificed in order to change to a better career.
29.4 Exercises
29.4.1 Exercise 1
Using the default parameterization in the CareerWorkerProblem, generate and plot typi-
cal sample paths for 𝜃 and 𝜖 when the worker follows the optimal policy.
In particular, modulo randomness, reproduce the following figure (where the horizontal axis
represents time)
Hint: To generate the draws from the distributions 𝐹 and 𝐺, use the type DiscreteRV.
29.4.2 Exercise 2
Let’s now consider how long it takes for the worker to settle down to a permanent job, given
a starting point of (𝜃, 𝜖) = (0, 0).
In other words, we want to study the distribution of the random variable
𝑇 ∗ ∶= the first point in time from which the worker’s job no longer changes
Evidently, the worker’s job becomes permanent if and only if (𝜃𝑡 , 𝜖𝑡 ) enters the “stay put”
region of (𝜃, 𝜖) space.
Letting 𝑆 denote this region, 𝑇 ∗ can be expressed as the first passage time to 𝑆 under the
optimal policy:
𝑇 ∗ ∶= inf{𝑡 ≥ 0 | (𝜃𝑡 , 𝜖𝑡 ) ∈ 𝑆}
Collect 25,000 draws of this random variable and compute the median (which should be
about 7).
Repeat the exercise with 𝛽 = 0.99 and interpret the change.
29.4.3 Exercise 3
As best you can, reproduce the figure showing the optimal policy.
Hint: The get_greedy() method returns a representation of the optimal policy where val-
ues 1, 2 and 3 correspond to “stay put”, “new job” and “new life” respectively. Use this and
the plots functions (e.g., contour, contour!) to produce the different shadings.
Now set G_a = G_b = 100 and generate a new figure with these parameters. Interpret.
29.5 Solutions
29.5.1 Exercise 1
In [6]: wp = CareerWorkerProblem()
function solve_wp(wp)
v_init = fill(100.0, wp.N, wp.N)
func(x) = update_bellman(wp, x)
v = compute_fixed_point(func, v_init, max_iter = 500, verbose = false)
optimal_policy = get_greedy(wp, v)
return v, optimal_policy
end
v, optimal_policy = solve_wp(wp)
F = DiscreteRV(wp.F_probs)
G = DiscreteRV(wp.G_probs)
function gen_path(T = 20)
    i = j = 1   # start at the lowest (θ, ϵ) grid point
    θ_ind = Int[]
    ϵ_ind = Int[]
    for t = 1:T
# do nothing if stay put
if optimal_policy[i, j] == 2 # new job
j = rand(G)[1]
elseif optimal_policy[i, j] == 3 # new life
i, j = rand(F)[1], rand(G)[1]
end
push!(θ_ind, i)
push!(ϵ_ind, j)
end
return wp.θ[θ_ind], wp.ϵ[ϵ_ind]
end
plot_array = Any[]
for i in 1:2
θ_path, ϵ_path = gen_path()
plt = plot(ϵ_path, label="epsilon")
plot!(plt, θ_path, label="theta")
plot!(plt, legend=:bottomright)
push!(plot_array, plt)
end
plot(plot_array..., layout = (2,1))
Out[6]:
29.5.2 Exercise 2
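The function gen_first_passage_time used in this solution did not survive extraction. Here is a sketch consistent with the policy coding above; it assumes the DiscreteRV objects F and G from Exercise 1 are in scope:

function gen_first_passage_time(optimal_policy)
    t = 0
    i = j = 1   # start from (θ, ϵ) = (0, 0)
    while true
        if optimal_policy[i, j] == 1       # stay put: the job is now permanent
            return t
        elseif optimal_policy[i, j] == 2   # new job
            j = rand(G)[1]
        else                               # new life
            i, j = rand(F)[1], rand(G)[1]
        end
        t += 1
    end
end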
M = 25000
samples = zeros(M)
for i in 1:M
samples[i] = gen_first_passage_time(optimal_policy)
end
print(median(samples))
7.0
To compute the median with 𝛽 = 0.99 instead of the default value 𝛽 = 0.95, replace
wp=CareerWorkerProblem() with wp=CareerWorkerProblem(β=0.99).
The medians are subject to randomness, but should be about 7 and 14 respectively. Not sur-
prisingly, more patient workers will wait longer to settle down to their final job.
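For completeness, the re-solve step that produces optimal_policy2 used below looks like this (a sketch; the keyword name β follows the constructor sketched earlier):

wp2 = CareerWorkerProblem(β = 0.99)
v2, optimal_policy2 = solve_wp(wp2)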
samples2 = zeros(M)
for i in 1:M
samples2[i] = gen_first_passage_time(optimal_policy2)
end
print(median(samples2))
14.0
29.5.3 Exercise 3
In [9]: wp = CareerWorkerProblem();
v, optimal_policy = solve_wp(wp)
Out[9]:
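The plotting code for the policy figure was lost in extraction. Something along the following lines reproduces it (the level cutoffs and plot attributes here are assumptions):

lvls = [0.5, 1.5, 2.5, 3.5]
contour(wp.θ, wp.ϵ, optimal_policy', fill = true, levels = lvls,
        color = :Blues, xlabel = "theta", ylabel = "epsilon", legend = false)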
Out[10]:
You will see that the region for which the worker will stay put has grown because the distri-
bution for 𝜖 has become more concentrated around the mean, making high-paying jobs less
realistic.
Chapter 30

Job Search V: On-the-Job Search
30.1 Contents
• Overview 30.2
• Model 30.3
• Implementation 30.4
• Solving for Policies 30.5
• Exercises 30.6
• Solutions 30.7
30.2 Overview
30.2.2 Setup
30.3 Model
Let
• 𝑥𝑡 denote the time-𝑡 job-specific human capital of a worker employed at a given firm
• 𝑤𝑡 denote current wages
Let 𝑤𝑡 = 𝑥𝑡 (1 − 𝑠𝑡 − 𝜙𝑡 ), where
• 𝜙𝑡 is investment in job-specific human capital for the current role
• 𝑠𝑡 is search effort, devoted to obtaining new offers from other firms
For as long as the worker remains in the current job, evolution of {𝑥𝑡 } is given by 𝑥𝑡+1 =
𝐺(𝑥𝑡 , 𝜙𝑡 ).
When search effort at 𝑡 is 𝑠𝑡 , the worker receives a new job offer with probability 𝜋(𝑠𝑡 ) ∈
[0, 1].
Value of offer is 𝑈𝑡+1 , where {𝑈𝑡 } is iid with common distribution 𝐹 .
Worker has the right to reject the current offer and continue with existing job.
In particular, 𝑥𝑡+1 = 𝑈𝑡+1 if accepts and 𝑥𝑡+1 = 𝐺(𝑥𝑡 , 𝜙𝑡 ) if rejects.
Letting 𝑏_{𝑡+1} ∈ {0, 1} be binary with 𝑏_{𝑡+1} = 1 indicating an offer, we can write

𝑥_{𝑡+1} = (1 − 𝑏_{𝑡+1}) 𝐺(𝑥_𝑡, 𝜙_𝑡) + 𝑏_{𝑡+1} max{𝐺(𝑥_𝑡, 𝜙_𝑡), 𝑈_{𝑡+1}}   (1)
Agent’s objective: maximize expected discounted sum of wages via controls {𝑠𝑡 } and {𝜙𝑡 }.
Taking the expectation of 𝑉 (𝑥𝑡+1 ) and using (1), the Bellman equation for this problem can
be written as
𝑉 (𝑥) = max_{𝑠+𝜙≤1} {𝑥(1 − 𝑠 − 𝜙) + 𝛽(1 − 𝜋(𝑠))𝑉 [𝐺(𝑥, 𝜙)] + 𝛽𝜋(𝑠) ∫ 𝑉 [𝐺(𝑥, 𝜙) ∨ 𝑢] 𝐹 (𝑑𝑢)}   (2)
30.3.1 Parameterization
𝐺(𝑥, 𝜙) = 𝐴(𝑥𝜙)^𝛼,   𝜋(𝑠) = √𝑠   and   𝐹 = Beta(2, 2)
Before we solve the model, let’s make some quick calculations that provide intuition on what
the solution should look like.
To begin, observe that the worker has two instruments to build capital and hence wages:

1. invest in capital specific to the current job via 𝜙
2. search for a new job with better job-specific capital match via 𝑠
Since wages are 𝑥(1 − 𝑠 − 𝜙), marginal cost of investment via either 𝜙 or 𝑠 is identical.
Our risk neutral worker should focus on whatever instrument has the highest expected return.
The relative expected return will depend on 𝑥.
For example, suppose first that 𝑥 = 0.05
• If 𝑠 = 1 and 𝜙 = 0, then since 𝐺(𝑥, 𝜙) = 0, taking expectations of (1) gives expected
next period capital equal to 𝜋(𝑠)𝔼𝑈 = 𝔼𝑈 = 0.5.
• If 𝑠 = 0 and 𝜙 = 1, then next period capital is 𝐺(𝑥, 𝜙) = 𝐺(0.05, 1) ≈ 0.23.
Both rates of return are good, but the return from search is better.
Next suppose that 𝑥 = 0.4
• If 𝑠 = 1 and 𝜙 = 0, then expected next period capital is again 0.5
• If 𝑠 = 0 and 𝜙 = 1, then 𝐺(𝑥, 𝜙) = 𝐺(0.4, 1) ≈ 0.8
Return from investment via 𝜙 dominates expected return from search.
Combining these observations gives us two informal predictions:

1. At any given state 𝑥, the two controls 𝜙 and 𝑠 will function primarily as substitutes —
worker will focus on whichever instrument has the higher expected return.

2. For sufficiently small 𝑥, search will be preferable to investment in job-specific capital.
For larger 𝑥, the reverse will be true.
Now let’s turn to implementation, and see if we can match our predictions.
30.4 Implementation
# model object
function JvWorker(;A = 1.4,
                   α = 0.6,
                   β = 0.96,
                   grid_size = 50,
                   ϵ = 1e-4)
    G(x, ϕ) = A .* (x .* ϕ).^α
    π_func = sqrt
    F = Beta(2, 2)
    # expectation operator
    E = expectation(F)
    # grid max: the larger of a high quantile of F and the fixed point y = G(y, 1)
    grid_max = max(A^(1.0 / (1.0 - α)), quantile(F, 1 - ϵ))
    x_grid = range(ϵ, grid_max, length = grid_size)
    return (A = A, α = α, β = β, x_grid = x_grid, G = G,
            π_func = π_func, F = F, E = E, ϵ = ϵ)
end
function T!(jv,
            V,
            new_V::AbstractVector)
    # simplify notation
    @unpack G, π_func, F, β, E, ϵ = jv
    # value function interpolant and search grid
    Vf = LinearInterpolation(jv.x_grid, V, extrapolation_bc = Line())
    search_grid = range(ϵ, 1.0, length = 15)
    for (i, x) in enumerate(jv.x_grid)
        function w(z)
            s, ϕ = z
            h(u) = Vf(max(G(x, ϕ), u))
            integral = E(h)
            q = π_func(s) * integral + (1.0 - π_func(s)) * Vf(G(x, ϕ))
            return - x * (1.0 - ϕ - s) - β * q
        end
        max_val, max_s, max_ϕ = -1.0, 1.0, 1.0
        for s in search_grid
            for ϕ in search_grid
                cur_val = ifelse(s + ϕ <= 1.0, -w((s, ϕ)), -1.0)
                if cur_val > max_val
                    max_val, max_s, max_ϕ = cur_val, s, ϕ
                end
            end
        end
        new_V[i] = max_val
    end
end
function T!(jv,
            V,
            out::Tuple{AbstractVector, AbstractVector})
    # simplify notation
    @unpack G, π_func, F, β, E, ϵ = jv
    # instantiate variables
    s_policy, ϕ_policy = out[1], out[2]
    Vf = LinearInterpolation(jv.x_grid, V, extrapolation_bc = Line())
    search_grid = range(ϵ, 1.0, length = 15)
    for (i, x) in enumerate(jv.x_grid)
        function w(z)
            s, ϕ = z
            h(u) = Vf(max(G(x, ϕ), u))
            integral = E(h)
            q = π_func(s) * integral + (1.0 - π_func(s)) * Vf(G(x, ϕ))
            return - x * (1.0 - ϕ - s) - β * q
        end
        max_val, max_s, max_ϕ = -1.0, 1.0, 1.0
        for s in search_grid
            for ϕ in search_grid
                cur_val = ifelse(s + ϕ <= 1.0, -w((s, ϕ)), -1.0)
                if cur_val > max_val
                    max_val, max_s, max_ϕ = cur_val, s, ϕ
                end
            end
        end
        s_policy[i], ϕ_policy[i] = max_s, max_ϕ
    end
end

function T(jv, V; ret_policy = false)
    out = ret_policy ? (similar(V), similar(V)) : similar(V)
    T!(jv, V, out)
    return out
end
where
𝑤(𝑠, 𝜙) ∶= − {𝑥(1 − 𝑠 − 𝜙) + 𝛽(1 − 𝜋(𝑠))𝑉 [𝐺(𝑥, 𝜙)] + 𝛽𝜋(𝑠) ∫ 𝑉 [𝐺(𝑥, 𝜙) ∨ 𝑢]𝐹 (𝑑𝑢)} (3)
Let’s plot the optimal policies and see what they look like.
The code is as follows
In [4]: wp = JvWorker(grid_size=25)
v_init = collect(wp.x_grid) .* 0.5
f(x) = T(wp, x)
V = fixedpoint(f, v_init)
sol_V = V.zero
# compute the optimal policies given the converged value function
s_policy, ϕ_policy = T(wp, sol_V, ret_policy = true)
# plot solution
p = plot(wp.x_grid, [ϕ_policy s_policy sol_V],
title = ["phi policy" "s policy" "value function"],
color = [:orange :blue :green],
xaxis = ("x", (0.0, maximum(wp.x_grid))),
yaxis = ((-0.1, 1.1)), size = (800, 800),
30.6. EXERCISES 561
Out[4]:
The horizontal axis is the state 𝑥, while the vertical axis gives 𝑠(𝑥) and 𝜙(𝑥).
Overall, the policies match well with our predictions from the previous section.
• Worker switches from one investment strategy to the other depending on relative re-
turn.
• For low values of 𝑥, the best option is to search for a new job.
• Once 𝑥 is larger, worker does better by investing in human capital specific to the cur-
rent position.
30.6 Exercises
30.6.1 Exercise 1
Let’s look at the dynamics for the state process {𝑥𝑡 } associated with these policies.
The dynamics are given by (1) when 𝜙𝑡 and 𝑠𝑡 are chosen according to the optimal policies,
and ℙ{𝑏𝑡+1 = 1} = 𝜋(𝑠𝑡 ).
Since the dynamics are random, analysis is a bit subtle.
One way to do it is to plot, for each 𝑥 in a relatively fine grid called plot_grid, a large
number 𝐾 of realizations of 𝑥𝑡+1 given 𝑥𝑡 = 𝑥. Plot this with one dot for each realization,
in the form of a 45 degree diagram. Set
K = 50
plot_grid_max, plot_grid_size = 1.2, 100
plot_grid = range(0, plot_grid_max, length = plot_grid_size)
plot(plot_grid, plot_grid, color = :black, linestyle = :dash,
lims = (0, plot_grid_max), legend = :none)
By examining the plot, argue that under the optimal policies, the state 𝑥𝑡 will converge to a
constant value 𝑥̄ close to unity.
Argue that at the steady state, 𝑠𝑡 ≈ 0 and 𝜙𝑡 ≈ 0.6.
30.6.2 Exercise 2
In the preceding exercise we found that 𝑠𝑡 converges to zero and 𝜙𝑡 converges to about 0.6.
Since these results were calculated at a value of 𝛽 close to one, let’s compare them to the best
choice for an infinitely patient worker.
Intuitively, an infinitely patient worker would like to maximize steady state wages, which are
a function of steady state capital.
You can take it as given—it’s certainly true—that the infinitely patient worker does not
search in the long run (i.e., 𝑠𝑡 = 0 for large 𝑡).
Thus, given 𝜙, steady state capital is the positive fixed point 𝑥∗ (𝜙) of the map 𝑥 ↦ 𝐺(𝑥, 𝜙).
Steady state wages can be written as 𝑤∗ (𝜙) = 𝑥∗ (𝜙)(1 − 𝜙).
Graph 𝑤∗ (𝜙) with respect to 𝜙, and examine the best choice of 𝜙.
Can you give a rough interpretation for the value that you see?
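Before turning to the full solution, note that the fixed point of 𝑥 ↦ 𝐺(𝑥, 𝜙) = 𝐴(𝑥𝜙)^𝛼 has a closed form, which makes 𝑤^∗(𝜙) easy to graph. A minimal sketch, using JvWorker's default 𝐴 and 𝛼 and assuming Plots is loaded:

A, α = 1.4, 0.6
x_bar(ϕ) = (A * ϕ^α)^(1 / (1 - α))   # positive fixed point x*(ϕ) of x ↦ G(x, ϕ)
w_star(ϕ) = x_bar(ϕ) * (1 - ϕ)       # steady state wages given ϕ

ϕ_grid = range(0.01, 0.99, length = 100)
plot(ϕ_grid, w_star.(ϕ_grid), xlabel = "phi", label = "w*(phi)")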
30.7 Solutions
30.7.1 Exercise 1
In [5]: wp = JvWorker(grid_size=25)
# simplify notation
@unpack G, π_func, F = wp
# recompute the value function and policies, as in the previous section
v_init = collect(wp.x_grid) .* 0.5
f(x) = T(wp, x)
sol_V = fixedpoint(f, v_init).zero
s_policy, ϕ_policy = T(wp, sol_V, ret_policy = true)
# interpolate the policy arrays so they can be evaluated off-grid
s = LinearInterpolation(wp.x_grid, s_policy, extrapolation_bc = Line())
ϕ = LinearInterpolation(wp.x_grid, ϕ_policy, extrapolation_bc = Line())
# law of motion for the state, following equation (1)
h_func(x, b, U) = (1 - b) * G(x, ϕ(x)) + b * max(G(x, ϕ(x)), U)
xs = []
ys = []
for x in plot_grid
for i=1:K
b = rand() < π_func(s(x)) ? 1 : 0
U = rand(wp.F)
y = h_func(x, b, U)
push!(xs, x)
push!(ys, y)
end
end
Out[6]:
30.7.2 Exercise 2
In [7]: wp = JvWorker(grid_size=25)
Out[7]:
Chapter 31

Optimal Growth I: The Stochastic Optimal Growth Model
31.1 Contents
• Overview 31.2
• The Model 31.3
• Computation 31.4
• Exercises 31.5
• Solutions 31.6
31.2 Overview
In this lecture we’re going to study a simple optimal growth model with one agent.
The model is a version of the standard one sector infinite horizon growth model studied in
• [100], chapter 2
• [68], section 3.1
• EDTC, chapter 1
• [102], chapter 12
The technique we use to solve the model is dynamic programming.
Our treatment of dynamic programming follows on from earlier treatments in our lectures on
shortest paths and job search.
We’ll discuss some of the technical details of dynamic programming as we go along.
Production is stochastic, in that it also depends on a shock 𝜉𝑡+1 realized at the end of the
current period.
Next period output is

𝑦_{𝑡+1} ∶= 𝑓(𝑘_{𝑡+1}) 𝜉_{𝑡+1}

where 𝑓 ∶ ℝ_+ → ℝ_+ is the production function.

The resource constraint is

𝑘_{𝑡+1} + 𝑐_𝑡 ≤ 𝑦_𝑡   (1)

where 𝑐_𝑡 is current consumption.
In what follows,
• The sequence {𝜉𝑡 } is assumed to be IID.
• The common distribution of each 𝜉𝑡 will be denoted 𝜙.
• The production function 𝑓 is assumed to be increasing and continuous.
• Depreciation of capital is not made explicit but can be incorporated into the production
function.
While many other treatments of the stochastic growth model use 𝑘𝑡 as the state variable, we
will use 𝑦𝑡 .
This will allow us to treat a stochastic model while maintaining only one state variable.
We consider alternative states and timing specifications in some of our other lectures.
31.3.2 Optimization
The agent wishes to maximize

𝔼 [∑_{𝑡=0}^∞ 𝛽^𝑡 𝑢(𝑐_𝑡)]   (2)

subject to

𝑦_{𝑡+1} = 𝑓(𝑦_𝑡 − 𝑐_𝑡) 𝜉_{𝑡+1}   and   0 ≤ 𝑐_𝑡 ≤ 𝑦_𝑡   for all 𝑡   (3)

where
• 𝑢 is a bounded, continuous and strictly increasing utility function and
• 𝛽 ∈ (0, 1) is a discount factor
In (3) we are assuming that the resource constraint (1) holds with equality — which is rea-
sonable because 𝑢 is strictly increasing and no output will be wasted at the optimum.
In summary, the agent’s aim is to select a path 𝑐0 , 𝑐1 , 𝑐2 , … for consumption that is
1. nonnegative,

2. feasible in the sense that the resource constraint (1) is satisfied at all times,
3. optimal, in the sense that it maximizes (2) relative to all other feasible consumption
sequences, and
4. adapted, in the sense that the action 𝑐𝑡 depends only on observable outcomes, not future
outcomes such as 𝜉𝑡+1
One way to think about solving this problem is to look for the best policy function.
A policy function is a map from past and present observables into current action.
We’ll be particularly interested in Markov policies, which are maps from the current state
𝑦𝑡 into a current action 𝑐𝑡 .
For dynamic programming problems such as this one (in fact for any Markov decision pro-
cess), the optimal policy is always a Markov policy.
In other words, the current state 𝑦𝑡 provides a sufficient statistic for the history in terms of
making an optimal decision today.
This is quite intuitive but if you wish you can find proofs in texts such as [100] (section 4.1).
Hereafter we focus on finding the best Markov policy.
In our context, a Markov policy is a function 𝜎 ∶ ℝ+ → ℝ+ , with the understanding that states
are mapped to actions via

𝑐_𝑡 = 𝜎(𝑦_𝑡)   for all 𝑡   (4)

In what follows, we say that 𝜎 is a feasible consumption policy if it satisfies

0 ≤ 𝜎(𝑦) ≤ 𝑦   for all 𝑦 ∈ ℝ_+   (5)

In other words, a feasible consumption policy is a Markov policy that respects the resource constraint.
The set of all feasible consumption policies will be denoted by Σ.
Each 𝜎 ∈ Σ determines a continuous state Markov process {𝑦_𝑡} for output via

𝑦_{𝑡+1} = 𝑓(𝑦_𝑡 − 𝜎(𝑦_𝑡)) 𝜉_{𝑡+1},   𝑦_0 given
This is the time path for output when we choose and stick with the policy 𝜎.
We insert this process into the objective function to get

𝔼 [∑_{𝑡=0}^∞ 𝛽^𝑡 𝑢(𝑐_𝑡)] = 𝔼 [∑_{𝑡=0}^∞ 𝛽^𝑡 𝑢(𝜎(𝑦_𝑡))]   (6)
This is the total expected present value of following policy 𝜎 forever, given initial income 𝑦0 .
The aim is to select a policy that makes this number as large as possible.
The next section covers these ideas more formally.
31.3.4 Optimality
The policy value function 𝑣𝜎 associated with a given policy 𝜎 is the mapping defined by
𝑣_𝜎(𝑦) = 𝔼 [∑_{𝑡=0}^∞ 𝛽^𝑡 𝑢(𝜎(𝑦_𝑡))]   (7)
The value function is defined as

𝑣^∗(𝑦) ∶= sup_{𝜎 ∈ Σ} 𝑣_𝜎(𝑦)   (8)

The value function gives the maximal value that can be obtained from state 𝑦, after considering all feasible policies.
A policy 𝜎 ∈ Σ is called optimal if it attains the supremum in (8) for all 𝑦 ∈ ℝ+ .
With our assumptions on utility and production function, the value function as defined in (8)
also satisfies a Bellman equation.
For this problem, the Bellman equation takes the form

𝑣^∗(𝑦) = max_{0≤𝑐≤𝑦} {𝑢(𝑐) + 𝛽 ∫ 𝑣^∗(𝑓(𝑦 − 𝑐)𝑧) 𝜙(𝑑𝑧)}   for all 𝑦 ∈ ℝ_+   (9)
The primary importance of the value function is that we can use it to compute optimal poli-
cies.
The details are as follows.
Given a continuous function 𝑤 on ℝ_+, we say that 𝜎 ∈ Σ is 𝑤-greedy if 𝜎(𝑦) is a solution to

max_{0≤𝑐≤𝑦} {𝑢(𝑐) + 𝛽 ∫ 𝑤(𝑓(𝑦 − 𝑐)𝑧) 𝜙(𝑑𝑧)}   (10)

for every 𝑦 ∈ ℝ_+.
In other words, 𝜎 ∈ Σ is 𝑤-greedy if it optimally trades off current and future rewards when
𝑤 is taken to be the value function.
In our setting, we have the following key result: a feasible consumption policy is optimal if and only if it is 𝑣^∗-greedy.
The intuition is similar to the intuition for the Bellman equation, which was provided after
(9).
See, for example, theorem 10.1.11 of EDTC.
Hence, once we have a good approximation to 𝑣∗ , we can compute the (approximately) opti-
mal policy by computing the corresponding greedy policy.
The advantage is that we are now solving a much lower dimensional optimization problem.
The value function itself can be computed by iterating the Bellman operator, denoted by 𝑇 and defined by

𝑇 𝑤(𝑦) ∶= max_{0≤𝑐≤𝑦} {𝑢(𝑐) + 𝛽 ∫ 𝑤(𝑓(𝑦 − 𝑐)𝑧) 𝜙(𝑑𝑧)}   for all 𝑦 ∈ ℝ_+   (11)

In other words, 𝑇 sends the function 𝑤 into the new function 𝑇 𝑤 defined by (11).
By construction, the set of solutions to the Bellman equation (9) exactly coincides with the
set of fixed points of 𝑇 .
For example, if 𝑇 𝑤 = 𝑤, then, for any 𝑦 ≥ 0,

𝑤(𝑦) = 𝑇 𝑤(𝑦) = max_{0≤𝑐≤𝑦} {𝑢(𝑐) + 𝛽 ∫ 𝑤(𝑓(𝑦 − 𝑐)𝑧) 𝜙(𝑑𝑧)}

which says precisely that 𝑤 solves the Bellman equation.
One can also show that 𝑇 is a contraction mapping on the set of continuous bounded functions on ℝ_+ under the supremum distance

𝜌(𝑔, ℎ) = sup_{𝑦≥0} |𝑔(𝑦) − ℎ(𝑦)|
The results stated above assume that the utility function is bounded.
In practice economists often work with unbounded utility functions — and so will we.
In the unbounded setting, various optimality theories exist.
Unfortunately, they tend to be case specific, as opposed to valid for a large range of applica-
tions.
Nevertheless, their main conclusions are usually in line with those stated for the bounded case
just above (as long as we drop the word “bounded”).
Consult, for example, section 12.2 of EDTC, [61] or [74].
31.4 Computation
Let’s now look at computing the value function and the optimal policy.
The first step is to compute the value function by value function iteration.
In theory, the algorithm is as follows
1. Begin with an array of values {𝑤1 , … , 𝑤𝐼 } representing the values of some initial func-
tion 𝑤 on the grid points {𝑦1 , … , 𝑦𝐼 }.
2. Build a function 𝑤̂ on the state space ℝ+ by interpolation or approximation, based on
these data points.
3. Obtain and record the value 𝑇 𝑤̂(𝑦_𝑖) on each grid point 𝑦_𝑖 by repeatedly solving (11).

4. Unless some stopping condition is satisfied, set {𝑤_1, …, 𝑤_𝐼} = {𝑇 𝑤̂(𝑦_1), …, 𝑇 𝑤̂(𝑦_𝐼)} and go to step 2.
31.4.2 Setup
f(x) = 2 .* cos.(6x) .+ sin.(14x) .+ 2.5  # an arbitrary function to approximate
c_grid = 0:0.2:1; f_grid = range(0, 1, length = 150)
Af = LinearInterpolation(c_grid, f(c_grid))
plt = plot(f_grid, f.(f_grid), color = :blue, lw = 2, alpha = 0.8,
           label = "true function")
plot!(plt, f_grid, Af.(f_grid), color = :green, lw = 2, alpha = 0.5,
      label = "linear approximation")
plot!(plt, legend = :top)
Out[3]:
Another advantage of piecewise linear interpolation is that it preserves useful shape properties
such as monotonicity and concavity / convexity.
Here’s a function that implements the Bellman operator using linear interpolation
function T(w, grid, β, u, f, shocks; compute_policy = false)
    w_func = LinearInterpolation(grid, w)
    # objective for each grid point
    objectives = (c -> u(c) + β * mean(w_func.(f(y - c) .* shocks)) for y in grid)
    results = maximize.(objectives, 1e-10, grid)  # solver result for each grid point
    Tw = Optim.maximum.(results)
    if compute_policy
        σ = Optim.maximizer.(results)
        return Tw, σ
    end
    return Tw
end
Notice that the expectation in (11) is computed via Monte Carlo, using the approximation
∫ 𝑤(𝑓(𝑦 − 𝑐)𝑧) 𝜙(𝑑𝑧) ≈ (1/𝑛) ∑_{𝑖=1}^𝑛 𝑤(𝑓(𝑦 − 𝑐) 𝜉_𝑖)
31.4.4 An Example
For the special case of log utility and Cobb-Douglas production with lognormal shocks, an exact analytical solution is known, with optimal consumption policy

𝜎^∗(𝑦) = (1 − 𝛼𝛽)𝑦
In [5]: α = 0.4
β = 0.96
μ = 0
s = 0.1
c1 = log(1 - α * β) / (1 - β)
c2 = (μ + α * log(α * β)) / (1 - α)
c3 = 1 / (1 - β)
c4 = 1 / (1 - α * β)
         # Utility
         u(c) = log(c)
         ∂u∂c(c) = 1 / c
         # Production (Cobb-Douglas)
         f(k) = k^α
         f′(k) = α * k^(α - 1)
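Since we will compare against the exact solution below, it helps to code it up as well. These closed-form expressions for the log/Cobb-Douglas case use the constants c1 to c4 defined above (a sketch; the names v_star and c_star are our choice):

v_star(y) = c1 + c2 * (c3 - c4) + c4 * log(y)   # exact value function
c_star(y) = (1 - α * β) * y                      # exact optimal policy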
To test our code, we want to see if we can replicate the analytical solution numerically, using
fitted value function iteration.
We need a grid and some shock draws for Monte Carlo integration.
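One possible choice is the following (the particular bounds and sizes here are assumptions, not requirements):

grid_max = 4          # largest grid point
grid_size = 200       # number of grid points
shock_size = 250      # number of Monte Carlo shock draws

grid_y = range(1e-5, grid_max, length = grid_size)
shocks = exp.(μ .+ s * randn(shock_size))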
Out[7]:
The two functions are essentially indistinguishable, so we are off to a good start.
Now let’s have a look at iterating with the Bellman operator, starting off from an arbitrary
initial condition.
The initial condition we’ll start with is 𝑤(𝑦) = 5 ln(𝑦)
label = "")
end
Out[8]:
The figure shows

1. the first 36 functions generated by the fitted value function iteration algorithm, with hotter colors given to higher iterates

2. the true value function drawn in black
end
Out[10]:
To compute an approximate optimal policy, we take the approximate value function we just
calculated and then compute the corresponding greedy policy.
The next figure compares the result to the exact solution, which, as mentioned above, is
𝜎(𝑦) = (1 − 𝛼𝛽)𝑦.
Out[11]:
The figure shows that we’ve done a good job in this instance of approximating the true pol-
icy.
31.5 Exercises
31.5.1 Exercise 1
In [12]: s = 0.05
shocks = exp.(μ .+ s * randn(shock_size))
Otherwise, the parameters and primitives are the same as the log linear model discussed ear-
lier in the lecture.
Notice that more patient agents typically have higher wealth.
Replicate the figure modulo randomness.
31.6 Solutions
31.6.1 Exercise 1
Here’s one solution (assuming as usual that you’ve executed everything above)
plt = plot()
Out[13]:
Chapter 32

Optimal Growth II: Time Iteration
32.1 Contents
• Overview 32.2
• The Euler Equation 32.3
• Comparison with Value Function Iteration 32.4
• Implementation 32.5
• Exercises 32.6
• Solutions 32.7
32.2 Overview
In this lecture we’ll continue our earlier study of the stochastic optimal growth model.
In that lecture we solved the associated discounted dynamic programming problem using
value function iteration.
The beauty of this technique is its broad applicability.
With numerical problems, however, we can often attain higher efficiency in specific applica-
tions by deriving methods that are carefully tailored to the application at hand.
The stochastic optimal growth model has plenty of structure to exploit for this purpose, espe-
cially when we adopt some concavity and smoothness assumptions over primitives.
We’ll use this structure to obtain an Euler equation based method that’s more efficient
than value function iteration for this and some other closely related applications.
In a subsequent lecture we’ll see that the numerical implementation part of the Euler equa-
tion method can be further adjusted to obtain even more efficiency.
Let’s take the model set out in the stochastic growth model lecture and add the assumptions
that
1. 𝑢 and 𝑓 are continuously differentiable and strictly concave
2. 𝑓(0) = 0
3. lim𝑐→0 𝑢′ (𝑐) = ∞ and lim𝑐→∞ 𝑢′ (𝑐) = 0
4. lim𝑘→0 𝑓 ′ (𝑘) = ∞ and lim𝑘→∞ 𝑓 ′ (𝑘) = 0
Under these assumptions, the value function 𝑣^∗ is differentiable on the interior of its domain and satisfies

(𝑣^∗)′(𝑦) = 𝑢′(𝑐^∗(𝑦))   (2)

where 𝑐^∗ is the optimal consumption policy.

The last result is called the envelope condition due to its relationship with the envelope theorem.

To see why (2) might be valid, write the Bellman equation in the equivalent form

𝑣^∗(𝑦) = max_{0≤𝑘≤𝑦} {𝑢(𝑦 − 𝑘) + 𝛽 ∫ 𝑣^∗(𝑓(𝑘)𝑧) 𝜙(𝑑𝑧)}

differentiate with respect to 𝑦, and then evaluate at the optimum.

Differentiability of the value function and interiority of the optimal policy also imply the first-order condition

𝑢′(𝑐^∗(𝑦)) = 𝛽 ∫ (𝑣^∗)′(𝑓(𝑦 − 𝑐^∗(𝑦))𝑧) 𝑓′(𝑦 − 𝑐^∗(𝑦)) 𝑧 𝜙(𝑑𝑧)   (3)

Combining (2) and the first-order condition (3) gives the famous Euler equation

(𝑢′ ∘ 𝑐^∗)(𝑦) = 𝛽 ∫ (𝑢′ ∘ 𝑐^∗)(𝑓(𝑦 − 𝑐^∗(𝑦))𝑧) 𝑓′(𝑦 − 𝑐^∗(𝑦)) 𝑧 𝜙(𝑑𝑧)   (4)

which we can view as a functional equation

(𝑢′ ∘ 𝜎)(𝑦) = 𝛽 ∫ (𝑢′ ∘ 𝜎)(𝑓(𝑦 − 𝜎(𝑦))𝑧) 𝑓′(𝑦 − 𝜎(𝑦)) 𝑧 𝜙(𝑑𝑧)   (5)

over interior consumption policies 𝜎, one solution of which is the optimal policy 𝑐^∗.
Our aim is to solve the functional equation (5) and hence obtain 𝑐∗ .
Just as we introduced the Bellman operator to solve the Bellman equation, we will now intro-
duce an operator over policies to help us solve the Euler equation.
This operator 𝐾 will act on the set of all 𝜎 ∈ Σ that are continuous, strictly increasing and
interior (i.e., 0 < 𝜎(𝑦) < 𝑦 for all strictly positive 𝑦).
Henceforth we denote this set of policies by 𝒫.

The operator 𝐾:

1. takes as its argument a 𝜎 ∈ 𝒫 and

2. returns a new function 𝐾𝜎, where 𝐾𝜎(𝑦) is the 𝑐 ∈ (0, 𝑦) that solves

𝑢′(𝑐) = 𝛽 ∫ (𝑢′ ∘ 𝜎)(𝑓(𝑦 − 𝑐)𝑧) 𝑓′(𝑦 − 𝑐) 𝑧 𝜙(𝑑𝑧)   (6)
We call this operator the Coleman operator to acknowledge the work of [17] (although
many people have studied this and other closely related iterative techniques).
In essence, 𝐾𝜎 is the consumption policy that the Euler equation tells you to choose today
when your future consumption policy is 𝜎.
The important thing to note about 𝐾 is that, by construction, its fixed points coincide with
solutions to the functional equation (5).
In particular, the optimal policy 𝑐∗ is a fixed point.
Indeed, for fixed 𝑦, the value 𝐾𝑐^∗(𝑦) is the 𝑐 that solves

𝑢′(𝑐) = 𝛽 ∫ (𝑢′ ∘ 𝑐^∗)(𝑓(𝑦 − 𝑐)𝑧) 𝑓′(𝑦 − 𝑐) 𝑧 𝜙(𝑑𝑧)

and, in view of the Euler equation (4), this is exactly 𝑐^∗(𝑦).
Sketching these curves and using the information above will convince you that they cross ex-
actly once as 𝑐 ranges over (0, 𝑦).
With a bit more analysis, one can show in addition that 𝐾𝜎 ∈ 𝒫 whenever 𝜎 ∈ 𝒫.
How does Euler equation time iteration compare with value function iteration?
Both can be used to compute the optimal policy, but is one faster or more accurate?
There are two parts to this story.
First, on a theoretical level, the two methods are essentially isomorphic.
In particular, they converge at the same rate.
We’ll prove this in just a moment.
The other side to the story is the speed of the numerical implementation.
It turns out that, once we actually implement these two routines, time iteration is faster and
more accurate than value function iteration.
More on this below.
• the two functions commute under 𝜏 , which is to say that 𝜏 (𝑔(𝑥)) = ℎ(𝜏 (𝑥)) for all
𝑥∈𝑋
The last statement can be written more simply as
𝜏 ∘𝑔 =ℎ∘𝜏
𝑔 = 𝜏 −1 ∘ ℎ ∘ 𝜏 (8)
Here’s a similar figure that traces out the action of the maps on a point 𝑥 ∈ 𝑋
In fact, if you like proofs by induction, you won’t have trouble showing that
𝑔𝑛 = 𝜏 −1 ∘ ℎ𝑛 ∘ 𝜏
A Bijection
Let 𝒱 be all strictly concave, continuously differentiable functions 𝑣 mapping ℝ+ to itself and
satisfying 𝑣(0) = 0 and 𝑣′ (𝑦) > 𝑢′ (𝑦) for all positive 𝑦.
For 𝑣 ∈ 𝒱 let

𝑀𝑣 ∶= ℎ ∘ 𝑣′   where   ℎ ∶= (𝑢′)^{−1}

Although we omit the details, it can be shown that 𝑀 is a bijection from 𝒱 to 𝒫.
Commutative Operators
It is an additional solved exercise (see below) to show that 𝑇 and 𝐾 commute under 𝑀 , in
the sense that
𝑀 ∘𝑇 =𝐾 ∘𝑀 (9)
𝑇 𝑛 = 𝑀 −1 ∘ 𝐾 𝑛 ∘ 𝑀
32.5 Implementation
We’ve just shown that the operators 𝑇 and 𝐾 have the same rate of convergence.
However, it turns out that, once numerical approximation is taken into account, significant differences arise.
In particular, the image of policy functions under 𝐾 can be calculated faster and with greater
accuracy than the image of value functions under 𝑇 .
Our intuition for this result is that
• the Coleman operator exploits more information because it uses first order and envelope
conditions
• policy functions generally have less curvature than value functions, and hence admit
more accurate approximations based on grid point information
32.5.2 Setup
gr(fmt = :png);
# The following function does NOT require the container of the output�
↪ value as argument
K(g, grid, β, ∂u∂c, f, f′, shocks) =
K!(similar(g), g, grid, β, ∂u∂c, f, f′, shocks)
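The in-place method K! called by this wrapper is not shown above. Here is a sketch of one implementation, assuming LinearInterpolation (Interpolations), find_zero (Roots) and mean (Statistics) are available:

function K!(Kg, g, grid, β, ∂u∂c, f, f′, shocks)
    # interpolate the current guess g so it can be evaluated off-grid
    g_func = LinearInterpolation(grid, g, extrapolation_bc = Line())
    for (i, y) in enumerate(grid)
        # Euler equation residual at consumption level c, given income y
        function h(c)
            vals = ∂u∂c.(g_func.(f(y - c) * shocks)) .* f′(y - c) .* shocks
            return ∂u∂c(c) - β * mean(vals)
        end
        # solve for the interior consumption level that zeros the residual
        Kg[i] = find_zero(h, (1e-10, y - 1e-10))
    end
    return Kg
end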
It has some similarities to the code for the Bellman operator in our optimal growth lecture.
For example, it evaluates integrals by Monte Carlo and approximates functions using linear
interpolation.
Here’s that Bellman operator code again, which needs to be executed because we’ll use it in
some tests below
function T(w, grid, β, u, f, shocks; compute_policy = false)
    w_func = LinearInterpolation(grid, w, extrapolation_bc = Line())
    Tw = similar(w)
    if compute_policy
        σ = similar(w)
    end
    # set Tw[i] = max_{0 <= c <= y_i} { u(c) + β E w(f(y - c) z) }
    for (i, y) in enumerate(grid)
        objective(c) = u(c) + β * mean(w_func.(f(y - c) .* shocks))
        res = maximize(objective, 1e-10, y)
        if compute_policy
            σ[i] = Optim.maximizer(res)
        end
        Tw[i] = Optim.maximum(res)
    end
if compute_policy
return Tw, σ
else
return Tw
end
end
As we did for value function iteration, let’s start by testing our method in the presence of a
model that does have an analytical solution.
Here’s an object containing data from the log-linear growth model we used in the value func-
tion iteration lecture
In [6]: isoelastic(c, γ) = isone(γ) ? log(c) : (c^(1 - γ) - 1) / (1 - γ)
        Model = @with_kw (α = 0.65,                          # productivity parameter
                          β = 0.95,                          # discount factor
                          γ = 1.0,                           # risk aversion
                          μ = 0.0,                           # first parameter in lognorm(μ, σ)
                          s = 0.1,                           # second parameter in lognorm(μ, σ)
                          grid = range(1e-6, 4, length = 200), # grid
                          grid_min = 1e-6,                   # smallest grid point
                          grid_max = 4.0,                    # largest grid point
                          grid_size = 200,                   # number of grid points
                          u = (c, γ = γ) -> isoelastic(c, γ), # utility function
                          ∂u∂c = c -> c^(-γ),                # u′
                          f = k -> k^α,                      # production function
                          f′ = k -> α * k^(α - 1)            # f′
                          )
In [7]: m = Model();
Out[10]:
We can’t really distinguish the two plots, so we are looking good, at least for this test.
Next let’s try iterating from an arbitrary initial condition and see if we converge towards 𝑐∗ .
The initial condition we’ll use is the one that eats the whole pie: 𝑐(𝑦) = 𝑦
for i in 1:n_iter
new_g = K(g, grid, β, ∂u∂c, f, f′, shocks)
g = new_g
plot!(grid, g, lw = 2, alpha = 0.6, label = "")
end
plot!(grid, c_star, color = :black, lw = 2, alpha = 0.8,
label = "true policy function c_star")
plot!(legend = :topleft)
end
Out[12]:
We see that the policy has converged nicely, in only a few steps.
Now let’s compare the accuracy of iteration using the Coleman and Bellman operators.
We’ll generate
1. 𝐾 𝑛 𝑐 where 𝑐(𝑦) = 𝑦
2. (𝑀 ∘ 𝑇 𝑛 ∘ 𝑀 −1 )𝑐 where 𝑐(𝑦) = 𝑦
pf_error = c_star - g
vf_error = c_star - vf_g
Out[14]:
As you can see, time iteration is much more accurate for a given number of iterations.
32.6 Exercises
32.6.1 Exercise 1
32.6.2 Exercise 2
32.6.3 Exercise 3
Consider the same model as above but with the CRRA utility function
𝑢(𝑐) = (𝑐^{1−𝛾} − 1) / (1 − 𝛾)
Iterate 20 times with Bellman iteration and Euler equation time iteration
• start time iteration from 𝑐(𝑦) = 𝑦
• start value function iteration from 𝑣(𝑦) = 𝑢(𝑦)
• set 𝛾 = 1.5
Compare the resulting policies and check that they are close.
32.6.4 Exercise 4
Do the same exercise, but now, rather than plotting results, benchmark both approaches with
20 iterations.
32.7 Solutions
Here’s the code, which will execute if you’ve run all the code above
Out[17]:
Out[19]: BenchmarkTools.Trial:
memory estimate: 155.94 MiB
allocs estimate: 90741
--------------
minimum time: 367.074 ms (4.34% GC)
median time: 375.230 ms (4.24% GC)
mean time: 375.616 ms (4.53% GC)
maximum time: 383.385 ms (4.15% GC)
--------------
samples: 14
evals/sample: 1
Out[20]: BenchmarkTools.Trial:
memory estimate: 155.94 MiB
allocs estimate: 90741
--------------
minimum time: 374.601 ms (4.40% GC)
median time: 383.946 ms (4.32% GC)
mean time: 384.606 ms (4.61% GC)
maximum time: 395.835 ms (5.72% GC)
--------------
samples: 13
evals/sample: 1
Chapter 33
33.1 Contents
• Overview 33.2
• Key Idea 33.3
• Implementation 33.4
• Speed 33.5
33.2 Overview
Let’s start by reminding ourselves of the theory and then see how the numerics fit in.
33.3.1 Theory
Take the model set out in the time iteration lecture, following the same terminology and no-
tation.
The Euler equation is

(𝑢′ ∘ 𝑐^∗)(𝑦) = 𝛽 ∫ (𝑢′ ∘ 𝑐^∗)(𝑓(𝑦 − 𝑐^∗(𝑦))𝑧) 𝑓′(𝑦 − 𝑐^∗(𝑦)) 𝑧 𝜙(𝑑𝑧)
The method discussed above requires a root finding routine to find the 𝑐𝑖 corresponding to a
given income value 𝑦𝑖 .
Root finding is costly because it typically involves a significant number of function evalua-
tions.
As pointed out by Carroll [16], we can avoid this if 𝑦𝑖 is chosen endogenously.
The only assumption required is that 𝑢′ is invertible on (0, ∞).
The idea is this:
First we fix an exogenous grid {𝑘𝑖 } for capital (𝑘 = 𝑦 − 𝑐).
Then we obtain 𝑐_𝑖 via

𝑐_𝑖 = (𝑢′)^{−1} {𝛽 ∫ (𝑢′ ∘ 𝜎)(𝑓(𝑘_𝑖)𝑧) 𝑓′(𝑘_𝑖) 𝑧 𝜙(𝑑𝑧)}

where 𝜎 is the current guess of the policy, and each 𝑐_𝑖 is paired with the endogenous income level 𝑦_𝑖 = 𝑘_𝑖 + 𝑐_𝑖.
33.4 Implementation
Let’s implement this version of the Coleman operator and see how it performs.
33.4.2 Setup
gr(fmt = :png);
# The following function does NOT require the container of the output�
↪ value as argument
K(g, grid, β, u′, f, f′, shocks) =
K!(similar(g), g, grid, β, u′, f, f′, shocks)
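The function coleman_egm itself, used repeatedly below, is not shown above. Here is a sketch of an EGM implementation of the Coleman operator consistent with the call signature in this section (assuming Interpolations and Statistics are loaded):

function coleman_egm(g, k_grid, β, u′, u′_inv, f, f′, shocks)
    # allocate memory for consumption on the endogenous grid points
    c = similar(k_grid)
    # solve for updated consumption at each exogenous capital point
    for (i, k) in enumerate(k_grid)
        vals = u′.(g.(f(k) * shocks)) .* f′(k) .* shocks
        c[i] = u′_inv(β * mean(vals))   # no root finding required
    end
    # determine the endogenous income grid y_i = k_i + c_i
    y = k_grid + c
    # update the policy function by interpolating the (y, c) pairs
    Kg = LinearInterpolation(y, c, extrapolation_bc = Line())
    return x -> Kg(x)
end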
Let’s test out the code above on some example parameterizations, after the following imports.
As we did for value function iteration and time iteration, let’s start by testing our method
with the log-linear benchmark.
The first step is to bring in the model that we used in the Coleman policy function iteration
In [5]: # model
plt = plot()
plot!(plt, k_grid, c_star.(k_grid), lw = 2, label = "optimal policy c*")
plot!(plt, k_grid, c_star_new.(k_grid), lw = 2, label = "Kc*")
plot!(plt, legend = :topleft)
end
Out[10]:
Out[11]: 1.3322676295501878e-15
Next let’s try iterating from an arbitrary initial condition and see if we converge towards 𝑐∗ .
Let’s start from the consumption policy that eats the whole pie: 𝑐(𝑦) = 𝑦
In [12]: n = 15
function check_convergence(m, shocks, c_star, g_init, n_iter)
k_grid = m.grid
g = g_init
plt = plot()
             plot!(plt, m.grid, g.(m.grid),
                   color = RGBA(0, 0, 0, 1), lw = 2, alpha = 0.6,
                   label = "initial condition c(y) = y")
             for i in 1:n_iter
                 # for log utility, u′(c) = 1/c is its own inverse, so m.u′ also serves as u′_inv
                 new_g = coleman_egm(g, k_grid, m.β, m.u′, m.u′, m.f, m.f′, shocks)
                 g = new_g
                 plot!(plt, k_grid, new_g.(k_grid), alpha = 0.6,
                       color = RGBA(0, 0, i / n_iter, 1), lw = 2, label = "")
             end
Out[13]:
We see that the policy has converged nicely, in only a few steps.
33.5 Speed
Now let’s compare the clock times per iteration for the standard Coleman operator (with ex-
ogenous grid) and the EGM version.
We’ll do so using the CRRA model adopted in the exercises of the Euler equation time itera-
tion lecture.
Here’s the model and some convenient functions
In [15]: crra_coleman(g, m, shocks) = K(g, m.grid, m.β, m.u′, m.f, m.f′, shocks)
crra_coleman_egm(g, m, shocks) = coleman_egm(g, m.grid, m.β, m.u′,
u′_inv, m.f, m.f′, shocks)
function coleman(m = m, shocks = shocks; sim_length = 20)
g = m.grid
for i in 1:sim_length
g = crra_coleman(g, m, shocks)
end
return g
end
function egm(m, g = identity, shocks = shocks; sim_length = 20)
for i in 1:sim_length
g = crra_coleman_egm(g, m, shocks)
end
return g.(m.grid)
end
Out[16]: BenchmarkTools.Trial:
memory estimate: 1.03 GiB
allocs estimate: 615012
--------------
minimum time: 7.876 s (1.43% GC)
median time: 7.876 s (1.43% GC)
mean time: 7.876 s (1.43% GC)
maximum time: 7.876 s (1.43% GC)
--------------
samples: 1
evals/sample: 1
Out[17]: BenchmarkTools.Trial:
memory estimate: 18.50 MiB
allocs estimate: 76226
--------------
minimum time: 171.206 ms (0.00% GC)
median time: 177.875 ms (0.00% GC)
mean time: 177.749 ms (1.21% GC)
maximum time: 184.593 ms (3.09% GC)
--------------
samples: 29
evals/sample: 1
Chapter 34

LQ Dynamic Programming Problems
34.1 Contents
• Overview 34.2
• Introduction 34.3
• Optimality – Finite Horizon 34.4
• Implementation 34.5
• Extensions and Comments 34.6
• Further Applications 34.7
• Exercises 34.8
• Solutions 34.9
34.2 Overview
Linear quadratic (LQ) control refers to a class of dynamic optimization problems that have
found applications in almost every scientific field.
This lecture provides an introduction to LQ control and its economic applications.
As we will see, LQ systems have a simple structure that makes them an excellent workhorse
for a wide variety of economic problems.
Moreover, while the linear-quadratic structure is restrictive, it is in fact far more flexible than
it may appear initially.
These themes appear repeatedly below.
Mathematically, LQ control problems are closely related to the Kalman filter.
• Recursive formulations of linear-quadratic control problems and Kalman filtering prob-
lems both involve matrix Riccati equations.
• Classical formulations of linear control and linear filtering problems make use of similar
matrix decompositions (see for example this lecture and this lecture).
In reading what follows, it will be useful to have some familiarity with
• matrix manipulations
• vectors of random variables
• dynamic programming and the Bellman equation (see for example this lecture and this
lecture)
For additional reading on LQ control, see, for example,
• [68], chapter 5
• [39], chapter 4
• [53], section 3.5
In order to focus on computation, we leave longer proofs to these sources (while trying to pro-
vide as much intuition as possible).
34.3 Introduction
The “linear” part of LQ is a linear law of motion for the state, while the “quadratic” part
refers to preferences.
Let’s begin with the former, move on to the latter, and then put them together into an opti-
mization problem.
The law of motion for the state is

𝑥_{𝑡+1} = 𝐴𝑥_𝑡 + 𝐵𝑢_𝑡 + 𝐶𝑤_{𝑡+1},   𝑡 = 0, 1, 2, …   (1)

Here
• 𝑢𝑡 is a “control” vector, incorporating choices available to a decision maker confronting
the current state 𝑥𝑡 .
• {𝑤𝑡 } is an uncorrelated zero mean shock process satisfying 𝔼𝑤𝑡 𝑤𝑡′ = 𝐼, where the right-
hand side is the identity matrix.
Regarding the dimensions
• 𝑥𝑡 is 𝑛 × 1, 𝐴 is 𝑛 × 𝑛
• 𝑢𝑡 is 𝑘 × 1, 𝐵 is 𝑛 × 𝑘
• 𝑤𝑡 is 𝑗 × 1, 𝐶 is 𝑛 × 𝑗
Example 1

Consider a household budget constraint given by

𝑎_{𝑡+1} + 𝑐_𝑡 = (1 + 𝑟)𝑎_𝑡 + 𝑦_𝑡
Here 𝑎𝑡 is assets, 𝑟 is a fixed interest rate, 𝑐𝑡 is current consumption, and 𝑦𝑡 is current non-
financial income.
If we suppose that {𝑦𝑡 } is serially uncorrelated and 𝑁 (0, 𝜎2 ), then, taking {𝑤𝑡 } to be stan-
dard normal, we can write the system as

𝑎_{𝑡+1} = (1 + 𝑟)𝑎_𝑡 − 𝑐_𝑡 + 𝜎𝑤_{𝑡+1}
This is clearly a special case of (1), with assets being the state and consumption being the
control.
Example 2
One unrealistic feature of the previous model is that non-financial income has a zero mean
and is often negative.
This can easily be overcome by adding a sufficiently large mean.
Hence in this example we take 𝑦𝑡 = 𝜎𝑤𝑡+1 + 𝜇 for some positive real number 𝜇.
Another alteration that’s useful to introduce (we’ll see why soon) is to change the control
variable from consumption to the deviation of consumption from some “ideal” quantity 𝑐.̄
(Most parameterizations will be such that 𝑐 ̄ is large relative to the amount of consumption
that is attainable in each period, and hence the household wants to increase consumption)
For this reason, we now take our control to be 𝑢𝑡 ∶= 𝑐𝑡 − 𝑐.̄
In terms of these variables, the budget constraint 𝑎_{𝑡+1} = (1 + 𝑟)𝑎_𝑡 − 𝑐_𝑡 + 𝑦_𝑡 becomes

𝑎_{𝑡+1} = (1 + 𝑟)𝑎_𝑡 − 𝑢_𝑡 − 𝑐̄ + 𝜎𝑤_{𝑡+1} + 𝜇   (2)

How can we write this new system in the form of equation (1)?
If, as in the previous example, we take 𝑎𝑡 as the state, then we run into a problem: the law of
motion contains some constant terms on the right-hand side.
This means that we are dealing with an affine function, not a linear one (recall this discus-
sion).
Fortunately, we can easily circumvent this problem by adding an extra state variable.
In particular, if we write

[𝑎_{𝑡+1}; 1] = [1 + 𝑟   −𝑐̄ + 𝜇; 0   1] [𝑎_𝑡; 1] + [−1; 0] 𝑢_𝑡 + [𝜎; 0] 𝑤_{𝑡+1}   (3)

(rows separated by semicolons), then the first row is equivalent to (2). Moreover, the model is now in the form (1), provided we take

𝑥_𝑡 ∶= [𝑎_𝑡; 1],   𝐴 ∶= [1 + 𝑟   −𝑐̄ + 𝜇; 0   1],   𝐵 ∶= [−1; 0],   𝐶 ∶= [𝜎; 0]   (4)
34.3.2 Preferences
In the LQ model, the aim is to minimize a flow of losses, where time-𝑡 loss is given by the quadratic expression

𝑥_𝑡′𝑅𝑥_𝑡 + 𝑢_𝑡′𝑄𝑢_𝑡   (5)
Here
• 𝑅 is assumed to be 𝑛 × 𝑛, symmetric and nonnegative definite
• 𝑄 is assumed to be 𝑘 × 𝑘, symmetric and positive definite
Note
In fact, for many economic problems, the definiteness conditions on 𝑅 and 𝑄 can
be relaxed. It is sufficient that certain submatrices of 𝑅 and 𝑄 be nonnegative
definite. See [39] for details
Example 1
A very simple example that satisfies these assumptions is to take 𝑅 and 𝑄 to be identity matrices, so that current loss is

𝑥_𝑡′𝑥_𝑡 + 𝑢_𝑡′𝑢_𝑡 = ‖𝑥_𝑡‖² + ‖𝑢_𝑡‖²

Thus, for both the state and the control, loss is measured as squared distance from the origin.
(In fact the general case (5) can also be understood in this way, but with 𝑅 and 𝑄 identifying
other – non-Euclidean – notions of “distance” from the zero vector).
Intuitively, we can often think of the state 𝑥𝑡 as representing deviation from a target, such as
• deviation of inflation from some target level
• deviation of a firm’s capital stock from some desired quantity
The aim is to put the state close to the target, while using controls parsimoniously.
Example 2

In the household problem studied above, setting 𝑅 = 0 and 𝑄 = 1 yields time-𝑡 loss 𝑢_𝑡² = (𝑐_𝑡 − 𝑐̄)².

Under this specification, the household’s current loss is the squared deviation of consumption from the ideal level 𝑐̄.
Let’s now be precise about the optimization problem we wish to consider, and look at how to
solve it.
We will begin with the finite horizon case, with terminal time 𝑇 ∈ ℕ.
In this case, the aim is to choose a sequence of controls {𝑢0 , … , 𝑢𝑇 −1 } to minimize the objec-
tive
𝔼 {∑_{𝑡=0}^{𝑇−1} 𝛽^𝑡 (𝑥_𝑡′𝑅𝑥_𝑡 + 𝑢_𝑡′𝑄𝑢_𝑡) + 𝛽^𝑇 𝑥_𝑇′𝑅_𝑓𝑥_𝑇}   (6)

subject to the law of motion (1) and initial state 𝑥_0, where 𝑅_𝑓 is a terminal loss matrix with the same properties as 𝑅.
34.4.2 Information
There’s one constraint we’ve neglected to mention so far, which is that the decision maker
who solves this LQ problem knows only the present and the past, not the future.
To clarify this point, consider the sequence of controls {𝑢0 , … , 𝑢𝑇 −1 }.
When choosing these controls, the decision maker is permitted to take into account the effects
of the shocks {𝑤1 , … , 𝑤𝑇 } on the system.
However, it is typically assumed — and will be assumed here — that the time-𝑡 control 𝑢𝑡
can be made with knowledge of past and present shocks only.
The fancy measure-theoretic way of saying this is that 𝑢𝑡 must be measurable with respect to
the 𝜎-algebra generated by 𝑥0 , 𝑤1 , 𝑤2 , … , 𝑤𝑡 .
This is in fact equivalent to stating that 𝑢𝑡 can be written in the form 𝑢𝑡 =
𝑔𝑡 (𝑥0 , 𝑤1 , 𝑤2 , … , 𝑤𝑡 ) for some Borel measurable function 𝑔𝑡 .
(Just about every function that’s useful for applications is Borel measurable, so, for the pur-
poses of intuition, you can read that last phrase as “for some function 𝑔𝑡 ”).
Now note that 𝑥𝑡 will ultimately depend on the realizations of 𝑥0 , 𝑤1 , 𝑤2 , … , 𝑤𝑡 .
In fact it turns out that 𝑥𝑡 summarizes all the information about these historical shocks that
the decision maker needs to set controls optimally.
More precisely, it can be shown that any optimal control 𝑢𝑡 can always be written as a func-
tion of the current state alone.
Hence in what follows we restrict attention to control policies (i.e., functions) of the form
𝑢𝑡 = 𝑔𝑡 (𝑥𝑡 ).
Actually, the preceding discussion applies to all standard dynamic programming problems.
What’s special about the LQ case is that – as we shall soon see — the optimal 𝑢𝑡 turns out
to be a linear function of 𝑥𝑡 .
34.4.3 Solution
To solve the finite horizon LQ problem we can use a dynamic programming strategy based on
backwards induction that is conceptually similar to the approach adopted in this lecture.
For reasons that will soon become clear, we first introduce the notation 𝐽𝑇 (𝑥) = 𝑥′ 𝑅𝑓 𝑥.
Now consider the problem of the decision maker in the second to last period.
In particular, let the time be 𝑇 − 1, and suppose that the state is 𝑥𝑇 −1 .
The decision maker must trade off current and (discounted) final losses, and hence solves

min_𝑢 {𝑥_{𝑇−1}′𝑅𝑥_{𝑇−1} + 𝑢′𝑄𝑢 + 𝛽 𝔼 𝐽_𝑇(𝐴𝑥_{𝑇−1} + 𝐵𝑢 + 𝐶𝑤_𝑇)}
The function 𝐽𝑇 −1 will be called the 𝑇 − 1 value function, and 𝐽𝑇 −1 (𝑥) can be thought of as
representing total “loss-to-go” from state 𝑥 at time 𝑇 − 1 when the decision maker behaves
optimally.
Now let’s step back to 𝑇 − 2.
For a decision maker at 𝑇 − 2, the value 𝐽𝑇 −1 (𝑥) plays a role analogous to that played by the
terminal loss 𝐽𝑇 (𝑥) = 𝑥′ 𝑅𝑓 𝑥 for the decision maker at 𝑇 − 1.
That is, 𝐽𝑇 −1 (𝑥) summarizes the future loss associated with moving to state 𝑥.
The decision maker chooses her control 𝑢 to trade off current loss against future loss, where
• the next period state is 𝑥𝑇 −1 = 𝐴𝑥𝑇 −2 + 𝐵𝑢 + 𝐶𝑤𝑇 −1 , and hence depends on the choice
of current control
• the “cost” of landing in state 𝑥𝑇 −1 is 𝐽𝑇 −1 (𝑥𝑇 −1 )
Her problem is therefore

𝐽_{𝑇−2}(𝑥) = min_𝑢 {𝑥′𝑅𝑥 + 𝑢′𝑄𝑢 + 𝛽 𝔼 𝐽_{𝑇−1}(𝐴𝑥 + 𝐵𝑢 + 𝐶𝑤_{𝑇−1})}

Letting

𝐽_𝑡(𝑥) ∶= min_𝑢 {𝑥′𝑅𝑥 + 𝑢′𝑄𝑢 + 𝛽 𝔼 𝐽_{𝑡+1}(𝐴𝑥 + 𝐵𝑢 + 𝐶𝑤_{𝑡+1})},  with 𝐽_𝑇(𝑥) = 𝑥′𝑅_𝑓𝑥   (7)

we can work backwards through time to recover every value function.

Equation (7) is the Bellman equation from dynamic programming theory specialized to the finite horizon LQ problem.
Now that we have {𝐽0 , … , 𝐽𝑇 }, we can obtain the optimal controls.
As a first step, let’s find out what the value functions look like.
It turns out that every 𝐽𝑡 has the form 𝐽𝑡 (𝑥) = 𝑥′ 𝑃𝑡 𝑥 + 𝑑𝑡 where 𝑃𝑡 is a 𝑛 × 𝑛 matrix and 𝑑𝑡
is a constant.
We can show this by induction, starting from 𝑃𝑇 ∶= 𝑅𝑓 and 𝑑𝑇 = 0.
Using this notation, (7) becomes

𝐽_{𝑇−1}(𝑥) = min_𝑢 {𝑥′𝑅𝑥 + 𝑢′𝑄𝑢 + 𝛽 𝔼 (𝐴𝑥 + 𝐵𝑢 + 𝐶𝑤_𝑇)′𝑃_𝑇(𝐴𝑥 + 𝐵𝑢 + 𝐶𝑤_𝑇)}   (8)
To obtain the minimizer, we can take the derivative of the r.h.s. with respect to 𝑢 and set it equal to zero.

Applying the relevant rules of matrix calculus, this gives

𝑢 = −(𝑄 + 𝛽𝐵′𝑃_𝑇𝐵)^{−1} 𝛽𝐵′𝑃_𝑇𝐴 𝑥   (9)

Plugging this back into (8) and rearranging yields

𝐽_{𝑇−1}(𝑥) = 𝑥′𝑃_{𝑇−1}𝑥 + 𝑑_{𝑇−1}

where

𝑃_{𝑇−1} = 𝑅 − 𝛽²𝐴′𝑃_𝑇𝐵(𝑄 + 𝛽𝐵′𝑃_𝑇𝐵)^{−1}𝐵′𝑃_𝑇𝐴 + 𝛽𝐴′𝑃_𝑇𝐴   (10)

and

𝑑_{𝑇−1} ∶= 𝛽 trace(𝐶′𝑃_𝑇𝐶)   (11)

Iterating backwards in this way shows that 𝐽_𝑡(𝑥) = 𝑥′𝑃_𝑡𝑥 + 𝑑_𝑡 for every 𝑡, where the matrices and constants obey the recursions

𝑃_{𝑡−1} = 𝑅 − 𝛽²𝐴′𝑃_𝑡𝐵(𝑄 + 𝛽𝐵′𝑃_𝑡𝐵)^{−1}𝐵′𝑃_𝑡𝐴 + 𝛽𝐴′𝑃_𝑡𝐴   with 𝑃_𝑇 = 𝑅_𝑓   (12)

and

𝑑_{𝑡−1} = 𝛽(𝑑_𝑡 + trace(𝐶′𝑃_𝑡𝐶))   with 𝑑_𝑇 = 0   (13)

The optimal controls are then given by 𝑢_𝑡 = −𝐹_𝑡𝑥_𝑡, where

𝐹_𝑡 ∶= (𝑄 + 𝛽𝐵′𝑃_{𝑡+1}𝐵)^{−1}𝛽𝐵′𝑃_{𝑡+1}𝐴   (14)

and the state evolves according to

𝑥_{𝑡+1} = (𝐴 − 𝐵𝐹_𝑡)𝑥_𝑡 + 𝐶𝑤_{𝑡+1}   (15)
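To make the backward induction concrete, here is a minimal sketch of recursions (12) to (14) in Julia. It is purely illustrative (the implementation below relies on QuantEcon.jl instead) and it assumes all inputs are matrices:

using LinearAlgebra  # for tr

function backward_riccati(A, B, C, Q, R, Rf, β, T)
    P, d = Rf, 0.0
    Fs = Matrix{Float64}[]                            # will hold F_{T-1}, ..., F_0
    for _ in 1:T
        F = (Q + β * B' * P * B) \ (β * B' * P * A)   # equation (14)
        push!(Fs, F)
        d = β * (d + tr(C' * P * C))                  # equation (13)
        P = R - β * A' * P * B * F + β * A' * P * A   # equation (12), using B*F
    end
    return reverse(Fs), P, d
end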
34.5 Implementation
We will use code from lqcontrol.jl in QuantEcon.jl to solve finite and infinite horizon linear
quadratic control problems.
In the module, the various updating, simulation and fixed point methods act on a type called
LQ, which includes
• Instance data:
– The required parameters 𝑄, 𝑅, 𝐴, 𝐵 and optional parameters C, β, T, R_f, N spec-
ifying a given LQ model
* set 𝑇 and 𝑅𝑓 to None in the infinite horizon case
* set C = None (or zero) in the deterministic case
– the value function and policy data
* 𝑑𝑡 , 𝑃𝑡 , 𝐹𝑡 in the finite horizon case
* 𝑑, 𝑃 , 𝐹 in the infinite horizon case
• Methods:
– update_values — shifts 𝑑𝑡 , 𝑃𝑡 , 𝐹𝑡 to their 𝑡 − 1 values via (12), (13) and (14)
– stationary_values — computes 𝑃 , 𝑑, 𝐹 in the infinite horizon case
– compute_sequence —- simulates the dynamics of 𝑥𝑡 , 𝑢𝑡 , 𝑤𝑡 given 𝑥0 and assum-
ing standard normal shocks
34.5.1 An Application
Early Keynesian models assumed that households have a constant marginal propensity to
consume from current income.
Data contradicted the constancy of the marginal propensity to consume.
In response, Milton Friedman, Franco Modigliani and others built models based on a con-
sumer’s preference for an intertemporally smooth consumption stream.
(See, for example, [32] or [79]).
One property of those models is that households purchase and sell financial assets to make
consumption streams smoother than income streams.
The household savings problem outlined above captures these ideas.
The optimization problem for the household is to choose a consumption sequence in order to
minimize
𝔼 {∑_{𝑡=0}^{𝑇−1} 𝛽^𝑡 (𝑐_𝑡 − 𝑐̄)² + 𝛽^𝑇 𝑞𝑎_𝑇²}   (16)

where 𝑞 is a large positive constant, the role of which is to induce the consumer to end her life with zero debt.
(Without such a constraint, the optimal choice is to choose 𝑐𝑡 = 𝑐 ̄ in each period, letting as-
sets adjust accordingly).
As before we set 𝑦𝑡 = 𝜎𝑤𝑡+1 + 𝜇 and 𝑢𝑡 ∶= 𝑐𝑡 − 𝑐,̄ after which the constraint can be written as
in (2).
We saw how this constraint could be manipulated into the LQ formulation 𝑥𝑡+1 = 𝐴𝑥𝑡 +𝐵𝑢𝑡 +
𝐶𝑤𝑡+1 by setting 𝑥𝑡 = (𝑎𝑡 1)′ and using the definitions in (4).
To match with this state and control, the objective function (16) can be written in the form
of (6) by choosing
𝑄 ∶= 1,   𝑅 ∶= [0 0; 0 0],   and   𝑅_𝑓 ∶= [𝑞 0; 0 0]
Now that the problem is expressed in LQ form, we can proceed to the solution by applying
(12) and (14).
After generating shocks 𝑤1 , … , 𝑤𝑇 , the dynamics for assets and consumption can be simu-
lated via (15).
The following figure was computed using 𝑟 = 0.05, 𝛽 = 1/(1 + 𝑟), 𝑐 ̄ = 2, 𝜇 = 1, 𝜎 = 0.25, 𝑇 = 45
and 𝑞 = 106 .
The shocks {𝑤𝑡 } were taken to be iid and standard normal.
34.5.2 Setup
# model parameters (values from the text)
r = 0.05; β = 1 / (1 + r); T = 45
c̄ = 2.0; σ = 0.25; μ = 1.0; q = 1e6

# formulate as an LQ problem
Q = 1.0
R = zeros(2, 2)
Rf = zeros(2, 2); Rf[1, 1] = q
A = [1 + r -c̄ + μ; 0 1]
B = [-1.0, 0]
C = [σ, 0]
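The solve-and-simulate step that produces assets, c and income in the plotting code below was lost in extraction. A sketch using QuantEcon.jl's API (the keyword names bet, capT and rf are our reading of the LQ constructor):

lq = QuantEcon.LQ(Q, R, A, B, C; bet = β, capT = T, rf = Rf)
x0 = [0.0, 1.0]                        # start with zero assets
xp, up, wp = compute_sequence(lq, x0)
# convert back to assets, consumption and income
assets = vec(xp[1, :])                 # a_t
c = vec(up .+ c̄)                       # c_t = u_t + c̄
income = vec(σ * wp[1, 2:end] .+ μ)    # y_t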
# plot results
p = plot([assets, c, zeros(T + 1), income, cumsum(income .- μ)],
lab = ["assets" "consumption" "" "non-financial income" "cumulative
unanticipated income"],
color = [:blue :green :black :orange :red],
xaxis = "Time", layout = (2, 1),
bottom_margin = 20mm, size = (600, 600))
Out[3]:
The top panel shows the time path of consumption 𝑐𝑡 and income 𝑦𝑡 in the simulation.
The bottom panel shows assets together with cumulative unanticipated income, defined as

𝑧_𝑡 ∶= ∑_{𝑗=0}^{𝑡} 𝜎𝑤_𝑗
A key message is that unanticipated windfall gains are saved rather than consumed, while
unanticipated negative shocks are met by reducing assets.
(Again, this relationship breaks down towards the end of life due to the zero final asset re-
quirement).
These results are relatively robust to changes in parameters.
For example, let’s increase 𝛽 from 1/(1 + 𝑟) ≈ 0.952 to 0.96 while keeping other parameters
fixed.
This consumer is slightly more patient than the last one, and hence puts relatively more
weight on later consumption values
# plot results
p = plot([assets, c, zeros(T + 1), income, cumsum(income .- μ)],
lab = ["assets" "consumption" "" "non-financial income" "cumulative
unanticipated income"],
color = [:blue :green :black :orange :red],
xaxis = "Time", layout = (2, 1),
bottom_margin = 20mm, size = (600, 600))
Out[4]:
We now have a slowly rising consumption stream and a hump-shaped build up of assets in the
middle periods to fund rising consumption.
However, the essential features are the same: consumption is smooth relative to income, and
assets are strongly positively correlated with cumulative unanticipated income.
Let’s now consider a number of standard extensions to the LQ problem treated above.
The flexibility of the LQ setup comes in part from the ability to expand the state vector with additional variables.

One illustration is given below.
For further examples and a more systematic treatment, see [40], section 2.4.
In some LQ problems, preferences include a cross-product term 𝑢′𝑡 𝑁 𝑥𝑡 , so that the objective
function becomes
𝔼 {∑_{𝑡=0}^{𝑇−1} 𝛽^𝑡 (𝑥_𝑡′𝑅𝑥_𝑡 + 𝑢_𝑡′𝑄𝑢_𝑡 + 2𝑢_𝑡′𝑁𝑥_𝑡) + 𝛽^𝑇𝑥_𝑇′𝑅_𝑓𝑥_𝑇}   (17)
Finally, we consider the infinite horizon case, with cross-product term, unchanged dynamics
and objective function given by
𝔼 {∑_{𝑡=0}^∞ 𝛽^𝑡 (𝑥_𝑡′𝑅𝑥_𝑡 + 𝑢_𝑡′𝑄𝑢_𝑡 + 2𝑢_𝑡′𝑁𝑥_𝑡)}   (20)
In the infinite horizon case, optimal policies can depend on time only if time itself is a compo-
nent of the state vector 𝑥𝑡 .
In other words, there exists a fixed matrix 𝐹 such that 𝑢𝑡 = −𝐹 𝑥𝑡 for all 𝑡.
That decision rules are constant over time is intuitive — after all, the decision maker faces
the same infinite horizon at every stage, with only the current state changing.
Not surprisingly, 𝑃 and 𝑑 are also constant.
The stationary matrix 𝑃 is the solution to the discrete time algebraic Riccati equation

𝑃 = 𝑅 − (𝛽𝐵′𝑃𝐴 + 𝑁)′(𝑄 + 𝛽𝐵′𝑃𝐵)^{−1}(𝛽𝐵′𝑃𝐴 + 𝑁) + 𝛽𝐴′𝑃𝐴   (21)

Equation (21) is also called the LQ Bellman equation, and the map that sends a given 𝑃 into the right-hand side of (21) is called the LQ Bellman operator.

The stationary optimal policy for this model is

𝑢 = −𝐹𝑥   where   𝐹 = (𝑄 + 𝛽𝐵′𝑃𝐵)^{−1}(𝛽𝐵′𝑃𝐴 + 𝑁)   (22)

The sequence {𝑑_𝑡} from (13) is replaced by the constant value

𝑑 ∶= (𝛽 / (1 − 𝛽)) trace(𝐶′𝑃𝐶)   (23)
The state evolves according to the time-homogeneous process 𝑥𝑡+1 = (𝐴 − 𝐵𝐹 )𝑥𝑡 + 𝐶𝑤𝑡+1 .
An example infinite horizon problem is treated below.
Linear quadratic control problems of the class discussed above have the property of certainty
equivalence.
By this we mean that the optimal policy 𝐹 is not affected by the parameters in 𝐶, which
specify the shock process.
This can be confirmed by inspecting (22) or (19).
It follows that we can ignore uncertainty when solving for optimal behavior, and plug it back
in when examining optimal state dynamics.
𝔼 {∑_{𝑡=0}^{𝑇−1} 𝛽^𝑡 (𝑐_𝑡 − 𝑐̄)² + 𝛽^𝑇𝑞𝑎_𝑇²}   (24)
The coefficients 𝑚0 , 𝑚1 , 𝑚2 are chosen such that 𝑝(0) = 0, 𝑝(𝑇 /2) = 𝜇, and 𝑝(𝑇 ) = 0.
You can confirm that the specification 𝑚0 = 0, 𝑚1 = 𝑇 𝜇/(𝑇 /2)2 , 𝑚2 = −𝜇/(𝑇 /2)2 satisfies
these constraints.
To put this into an LQ setting, consider the budget constraint, which becomes

𝑎_{𝑡+1} = (1 + 𝑟)𝑎_𝑡 − 𝑐̄ − 𝑢_𝑡 + 𝑚_1𝑡 + 𝑚_2𝑡² + 𝜎𝑤_{𝑡+1}   (25)
The fact that 𝑎𝑡+1 is a linear function of (𝑎𝑡 , 1, 𝑡, 𝑡2 ) suggests taking these four variables as
the state vector 𝑥𝑡 .
Once a good choice of state and control (recall 𝑢𝑡 = 𝑐𝑡 − 𝑐)̄ has been made, the remaining
specifications fall into place relatively easily.
Thus, for the dynamics we set
𝑥_𝑡 ∶= [𝑎_𝑡; 1; 𝑡; 𝑡²],   𝐴 ∶= [1 + 𝑟   −𝑐̄   𝑚_1   𝑚_2; 0 1 0 0; 0 1 1 0; 0 1 2 1],
𝐵 ∶= [−1; 0; 0; 0],   𝐶 ∶= [𝜎; 0; 0; 0]   (26)

(rows separated by semicolons)
If you expand the expression 𝑥𝑡+1 = 𝐴𝑥𝑡 + 𝐵𝑢𝑡 + 𝐶𝑤𝑡+1 using this specification, you will
find that assets follow (25) as desired, and that the other state variables also update appropri-
ately.
To implement preference specification (24) we take
𝑄 ∶= 1,   𝑅 ∶= the 4 × 4 zero matrix,   and   𝑅_𝑓 ∶= [𝑞 0 0 0; 0 0 0 0; 0 0 0 0; 0 0 0 0]   (27)
The next figure shows a simulation of consumption and assets computed by solving this LQ problem and simulating via compute_sequence, starting from zero assets.
In the previous application, we generated income dynamics with an inverted U shape using
polynomials, and placed them in an LQ framework.
It is arguably the case that this income process still contains unrealistic features.
A more common earning profile is where
1. income grows over working life, fluctuating around an increasing trend, with growth
flattening off in later years
2. retirement follows, with lower but relatively stable (non-financial) income
Letting 𝐾 be the retirement date, we can express these income dynamics by

𝑦_𝑡 = 𝑝(𝑡) + 𝜎𝑤_{𝑡+1}   if 𝑡 ≤ 𝐾,   and   𝑦_𝑡 = 𝑠   otherwise   (28)
Here
• 𝑝(𝑡) ∶= 𝑚1 𝑡 + 𝑚2 𝑡2 with the coefficients 𝑚1 , 𝑚2 chosen such that 𝑝(𝐾) = 𝜇 and 𝑝(0) =
𝑝(2𝐾) = 0
• 𝑠 is retirement income
We suppose that preferences are unchanged and given by (16).
The budget constraint is also unchanged and given by 𝑎𝑡+1 = (1 + 𝑟)𝑎𝑡 − 𝑐𝑡 + 𝑦𝑡 .
Our aim is to solve this problem and simulate paths using the LQ techniques described in this
lecture.
In fact this is a nontrivial problem, as the kink in the dynamics (28) at 𝐾 makes it very diffi-
cult to express the law of motion as a fixed-coefficient linear system.
However, we can still use our LQ methods here by suitably linking two component LQ prob-
lems.
These two LQ problems describe the consumer’s behavior during her working life
(lq_working) and retirement (lq_retired).
(This is possible because in the two separate periods of life, the respective income processes
[polynomial trend and constant] each fit the LQ framework)
The basic idea is that although the whole problem is not a single time-invariant LQ problem,
it is still a dynamic programming problem, and hence we can use appropriate Bellman equa-
tions at every stage.
Based on this logic, we can
1. solve lq_retired by the usual backwards induction procedure, iterating back to the
start of retirement
2. take the start-of-retirement value function generated by this process, and use it as the
terminal condition 𝑅𝑓 to feed into the lq_working specification
3. solve lq_working by backwards induction from this choice of 𝑅𝑓 , iterating back to the
start of working life
This process gives the entire life-time sequence of value functions and optimal policies.
The full set of parameters used in the simulation is discussed in Exercise 2, where you are
asked to replicate the figure.
Once again, the dominant feature observable in the simulation is consumption smoothing.
The asset path fits well with standard life cycle theory, with dissaving early in life followed by
later saving.
Assets peak at retirement and subsequently decline.
𝑝𝑡 = 𝑎0 − 𝑎1 𝑞𝑡 + 𝑑𝑡
𝔼 {∑_{𝑡=0}^∞ 𝛽^𝑡 𝜋_𝑡}   where   𝜋_𝑡 ∶= 𝑝_𝑡𝑞_𝑡 − 𝑐𝑞_𝑡 − 𝛾(𝑞_{𝑡+1} − 𝑞_𝑡)²   (29)
Here
• 𝛾(𝑞𝑡+1 − 𝑞𝑡 )2 represents adjustment costs
• 𝑐 is average cost of production
This can be formulated as an LQ problem and then solved and simulated, but first let’s study
the problem and try to get some intuition.
One way to start thinking about the problem is to consider what would happen if 𝛾 = 0.
Without adjustment costs there is no intertemporal trade-off, so the monopolist will choose
output to maximize current profit in each period.
It’s not difficult to show that profit-maximizing output is
𝑞̄_𝑡 ∶= (𝑎_0 − 𝑐 + 𝑑_𝑡) / (2𝑎_1)
This makes no difference to the solution, since 𝑎1 𝑞𝑡2̄ does not depend on the controls.
(In fact we are just adding a constant term to (29), and optimizers are not affected by con-
stant terms)
The reason for making this substitution is that, as you will be able to verify, 𝜋𝑡̂ reduces to the
simple quadratic
min 𝔼 ∑_{𝑡=0}^∞ 𝛽^𝑡 {𝑎_1(𝑞_𝑡 − 𝑞̄_𝑡)² + 𝛾𝑢_𝑡²}   (30)
It’s now relatively straightforward to find 𝑅 and 𝑄 such that (30) can be written as (20).
Furthermore, the matrices 𝐴, 𝐵 and 𝐶 from (1) can be found by writing down the dynamics
of each element of the state.
Exercise 3 asks you to complete this process, and reproduce the preceding figures.
34.8 Exercises
34.8.1 Exercise 1
34.8.2 Exercise 2
With some careful footwork, the simulation can be generated by patching together the simu-
lations from these two separate models.
34.8.3 Exercise 3
34.9 Solutions
34.9.1 Exercise 1
𝑦𝑡 = 𝑚1 𝑡 + 𝑚2 𝑡2 + 𝜎𝑤𝑡+1
where {𝑤𝑡 } is iid 𝑁 (0, 1) and the coefficients 𝑚1 and 𝑚2 are chosen so that 𝑝(𝑡) = 𝑚1 𝑡 + 𝑚2 𝑡2
has an inverted U shape with
• 𝑝(0) = 0, 𝑝(𝑇 /2) = 𝜇, and
• 𝑝(𝑇 ) = 0.
# formulate as an LQ problem
Q = 1.0
R = zeros(4, 4)
Rf = zeros(4, 4); Rf[1, 1] = q
A = [1 + r -c̄ m1 m2;
0 1 0 0;
0 1 1 0;
0 1 2 1]
B = [-1.0, 0.0, 0.0, 0.0]
C = [σ, 0.0, 0.0, 0.0]
# plot results
p1 = plot(Vector[income, ap, c, zeros(T + 1)],
lab = ["non-financial income" "assets" "consumption" ""],
color = [:orange :blue :green :black],
xaxis = "Time", layout = (2,1),
bottom_margin = 20mm, size = (600, 600))
Out[5]:
34.9.2 Exercise 2
This is a permanent income / life-cycle model with polynomial growth in income over work-
ing life followed by a fixed retirement income. The model is solved by combining two LQ pro-
x0 = [0.0, 1, 0, 0]
xp_w, up_w, wp_w = compute_sequence(lq_working, x0)
up = [up_w up_r]
c = vec(up .+ c̄)  # consumption
time = 1:K
income_w = σ * vec(wp_w[1, 2:K+1]) + m1 .* time + m2 .* time.^2 # income
income_r = ones(T - K) * s
income = [income_w; income_r]
# plot results
p2 = plot([income, assets, c, zeros(T + 1)],
lab = ["non-financial income" "assets" "consumption" ""],
color = [:orange :blue :green :black],
xaxis = "Time", layout = (2, 1),
bottom_margin = 20mm, size = (600, 600))
Out[6]:
34.9.3 Exercise 3
The first task is to find the matrices 𝐴, 𝐵, 𝐶, 𝑄, 𝑅 that define the LQ problem.
Recall that 𝑥𝑡 = (𝑞𝑡̄ 𝑞𝑡 1)′ , while 𝑢𝑡 = 𝑞𝑡+1 − 𝑞𝑡 .
Letting 𝑚0 ∶= (𝑎0 − 𝑐)/2𝑎1 and 𝑚1 ∶= 1/2𝑎1 , we can write 𝑞𝑡̄ = 𝑚0 + 𝑚1 𝑑𝑡 , and then, with
some manipulation
𝑞̄_{𝑡+1} = 𝑚_0(1 − 𝜌) + 𝜌𝑞̄_𝑡 + 𝑚_1𝜎𝑤_{𝑡+1}
min 𝔼 {∑_{𝑡=0}^∞ 𝛽^𝑡 [𝑎_1(𝑞_𝑡 − 𝑞̄_𝑡)² + 𝛾𝑢_𝑡²]}
# useful constants
m0 = (a0 - c) / (2 * a1)
m1 = 1 / (2 * a1)
# formulate LQ problem
Q = γ
R = [a1 -a1 0; -a1 a1 0; 0 0 0]
A = [ρ 0 m0 * (1 - ρ); 0 1 0; 0 0 1]
B = [0.0, 1, 0]
C = [m1 * σ, 0, 0]
lq = QuantEcon.LQ(Q, R, A, B, C; bet = β)
Out[7]:
Chapter 35

Optimal Savings I: The Permanent Income Model
35.1 Contents
• Overview 35.2
• The Savings Problem 35.3
• Alternative Representations 35.4
• Two Classic Examples 35.5
• Further Reading 35.6
• Appendix: the Euler Equation 35.7
35.2 Overview
This lecture describes a rational expectations version of the famous permanent income model
of Milton Friedman [32].
Robert Hall cast Friedman’s model within a linear-quadratic setting [36].
Like Hall, we formulate an infinite-horizon linear-quadratic savings problem.
We use the model as a vehicle for illustrating
• alternative formulations of the state of a dynamic system
• the idea of cointegration
• impulse response functions
• the idea that changes in consumption are useful as predictors of movements in income
Background readings on the linear-quadratic-Gaussian permanent income model are Hall’s
[36] and chapter 2 of [68].
35.2.1 Setup
In this section we state and solve the savings and consumption problem faced by the con-
sumer.
35.3.1 Preliminaries
A stochastic process {𝑋_𝑡} is said to be a martingale (with respect to a given information set) if

𝔼_𝑡[𝑋_{𝑡+1}] = 𝑋_𝑡,   𝑡 = 0, 1, 2, …

The simplest example is the random walk

𝑋_{𝑡+1} = 𝑋_𝑡 + 𝑤_{𝑡+1}

where {𝑤_𝑡} is iid with zero mean, in which case

𝑋_𝑡 = ∑_{𝑗=1}^{𝑡} 𝑤_𝑗 + 𝑋_0
Not every martingale arises as a random walk (see, for example, Wald’s martingale).
A consumer has preferences over consumption streams that are ordered by the utility functional

$$\mathbb{E}_0 \left[ \sum_{t=0}^{\infty} \beta^t u(c_t) \right] \qquad (1)$$

where $\beta \in (0, 1)$ is a discount factor and $u$ is a utility function specified below.

The consumer maximizes (1) by choosing a consumption, borrowing plan $\{c_t, b_{t+1}\}_{t=0}^{\infty}$ subject to the sequence of budget constraints

$$c_t + b_t = \frac{1}{1+r} \, b_{t+1} + y_t, \qquad t \geq 0 \qquad (2)$$
Here
• 𝑦𝑡 is an exogenous endowment process
• 𝑟 > 0 is a time-invariant risk-free net interest rate
• 𝑏𝑡 is one-period risk-free debt maturing at 𝑡
The consumer also faces initial conditions 𝑏0 and 𝑦0 , which can be fixed or random.
35.3.3 Assumptions
For the remainder of this lecture, we follow Friedman and Hall in assuming that $(1+r)^{-1} = \beta$.

Regarding the endowment process, we assume it has the state-space representation

$$z_{t+1} = A z_t + C w_{t+1}, \qquad y_t = U z_t \qquad (3)$$
where
• {𝑤𝑡 } is an iid vector process with 𝔼𝑤𝑡 = 0 and 𝔼𝑤𝑡 𝑤𝑡′ = 𝐼
• the spectral radius of 𝐴 satisfies 𝜌(𝐴) < √1/𝛽
• 𝑈 is a selection vector that pins down 𝑦𝑡 as a particular linear combination of compo-
nents of 𝑧𝑡 .
The restriction on $\rho(A)$ prevents income from growing so fast that discounted geometric sums of certain quadratic forms, described below, become infinite.
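A quick numerical check of this restriction is straightforward; here is a minimal sketch, where the particular $A$ and $\beta$ are assumed purely for illustration:

using LinearAlgebra

β = 0.95
A = [0.9 0.1;
     0.0 1.0]                            # hypothetical endowment transition matrix
spectral_radius = maximum(abs.(eigvals(A)))
@assert spectral_radius < 1 / sqrt(β)    # ρ(A) < √(1/β)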
Regarding preferences, we assume the quadratic utility function

$$u(c_t) = -(c_t - \gamma)^2$$

where $\gamma$ is a bliss level of consumption.
Note
Along with this quadratic utility specification, we allow consumption to be nega-
tive. However, by choosing parameters appropriately, we can make the probability
that the model generates negative consumption paths over finite time horizons as
low as desired.
We also impose the no-Ponzi condition

$$\mathbb{E}_0 \left[ \sum_{t=0}^{\infty} \beta^t b_t^2 \right] < \infty \qquad (4)$$
This condition rules out an always-borrow scheme that would allow the consumer to enjoy
bliss consumption forever.
First-order conditions for maximizing (1) subject to (2) are

$$\mathbb{E}_t \left[ u'(c_{t+1}) \right] = u'(c_t), \qquad t = 0, 1, \ldots \qquad (5)$$

With our quadratic utility specification, (5) implies

$$\mathbb{E}_t [c_{t+1}] = c_t \qquad (6)$$

so that consumption follows a martingale.

(In fact quadratic preferences are necessary for this conclusion.)
One way to interpret (6) is that consumption will change only when “new information” about
permanent income is revealed.
These ideas will be clarified below.
Note
One way to solve the consumer’s problem is to apply dynamic programming as
in this lecture. We do this later. But first we use an alternative approach that is
revealing and shows the work that dynamic programming does for us behind the
scenes.
We want to solve the system formed by (2) and (6) subject to the boundary condition (4). To accomplish this, observe first that (4) implies $\lim_{t \to \infty} \beta^{t/2} b_{t+1} = 0$.

Using this restriction on the debt path and solving (2) forward yields
$$b_t = \sum_{j=0}^{\infty} \beta^j (y_{t+j} - c_{t+j}) \qquad (7)$$
Take conditional expectations on both sides of (7) and use the martingale property of con-
sumption and the law of iterated expectations to deduce
$$b_t = \sum_{j=0}^{\infty} \beta^j \, \mathbb{E}_t[y_{t+j}] - \frac{c_t}{1 - \beta} \qquad (8)$$
Solving (8) for $c_t$ gives

$$c_t = (1 - \beta) \left[ \sum_{j=0}^{\infty} \beta^j \, \mathbb{E}_t[y_{t+j}] - b_t \right] = \frac{r}{1+r} \left[ \sum_{j=0}^{\infty} \beta^j \, \mathbb{E}_t[y_{t+j}] - b_t \right] \qquad (9)$$
Drawing on our discussion of forecasting geometric sums in the lecture on linear state space models, we have

$$\sum_{j=0}^{\infty} \beta^j \, \mathbb{E}_t[y_{t+j}] = \mathbb{E}_t \left[ \sum_{j=0}^{\infty} \beta^j y_{t+j} \right] = U (I - \beta A)^{-1} z_t$$

Combining this with (9) gives

$$c_t = \frac{r}{1+r} \left[ U (I - \beta A)^{-1} z_t - b_t \right] \qquad (10)$$
Using (10) together with the budget constraint (2), the dynamics of debt are

$$\begin{aligned}
b_{t+1} &= (1+r)(b_t + c_t - y_t) \\
&= (1+r) b_t + r \left[ U (I - \beta A)^{-1} z_t - b_t \right] - (1+r) U z_t \\
&= b_t + U \left[ r (I - \beta A)^{-1} - (1+r) I \right] z_t \\
&= b_t + U (I - \beta A)^{-1} (A - I) z_t
\end{aligned} \qquad (11)$$
To get from the second last to the last expression in this chain of equalities is not trivial.
A key step is to use the fact that $(1+r)\beta = 1$ and $(I - \beta A)^{-1} = \sum_{j=0}^{\infty} \beta^j A^j$.
We’ve now successfully written 𝑐𝑡 and 𝑏𝑡+1 as functions of 𝑏𝑡 and 𝑧𝑡 .
A State-Space Representation
We can summarize our dynamics in the form of a linear state-space system governing con-
sumption, debt and income:
$$x_t = \begin{bmatrix} z_t \\ b_t \end{bmatrix}, \qquad \tilde A = \begin{bmatrix} A & 0 \\ U (I - \beta A)^{-1} (A - I) & 1 \end{bmatrix}, \qquad \tilde C = \begin{bmatrix} C \\ 0 \end{bmatrix}$$

and

$$\tilde U = \begin{bmatrix} U & 0 \\ (1 - \beta) U (I - \beta A)^{-1} & -(1 - \beta) \end{bmatrix}, \qquad \tilde y_t = \begin{bmatrix} y_t \\ c_t \end{bmatrix}$$

With these definitions, the dynamics are

$$x_{t+1} = \tilde A x_t + \tilde C w_{t+1}, \qquad \tilde y_t = \tilde U x_t \qquad (12)$$
We can use the following formulas from linear state space models to compute the population mean $\mu_t = \mathbb{E} x_t$ and covariance $\Sigma_t := \mathbb{E}[(x_t - \mu_t)(x_t - \mu_t)']$:

$$\mu_{t+1} = \tilde A \mu_t \quad \text{with } \mu_0 \text{ given} \qquad (13)$$

$$\Sigma_{t+1} = \tilde A \Sigma_t \tilde A' + \tilde C \tilde C' \quad \text{with } \Sigma_0 \text{ given} \qquad (14)$$

$$\mu_{y,t} = \tilde U \mu_t, \qquad \Sigma_{y,t} = \tilde U \Sigma_t \tilde U' \qquad (15)$$
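A minimal sketch of iterating on (13)–(14) and reading off (15); the system matrices below are placeholders, and in the lecture's application they would be the $\tilde A$, $\tilde C$, $\tilde U$ built above:

function moments(Ã, C̃, Ũ, μ0, Σ0, T)
    μ, Σ = μ0, Σ0
    for t in 1:T
        μ = Ã * μ                    # (13)
        Σ = Ã * Σ * Ã' + C̃ * C̃'     # (14)
    end
    return Ũ * μ, Ũ * Σ * Ũ'        # (15)
end

μ_y, Σ_y = moments([0.9 0.0; 0.1 1.0], [0.1, 0.0], [1.0 0.0],
                   [1.0, 0.0], zeros(2, 2), 50)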
To gain some preliminary intuition on the implications of (11), let’s look at a highly stylized
example where income is just iid.
(Later examples will investigate more realistic income streams)
In particular, let $\{w_t\}_{t=1}^{\infty}$ be iid and scalar standard normal, and let

$$z_t = \begin{bmatrix} z_t^1 \\ 1 \end{bmatrix}, \qquad A = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}, \qquad U = \begin{bmatrix} 1 & \mu \end{bmatrix}, \qquad C = \begin{bmatrix} \sigma \\ 0 \end{bmatrix}$$

so that $z_{t+1}^1 = \sigma w_{t+1}$ and $y_t = U z_t = \mu + \sigma w_t$.
Applying the formulas obtained above to this example yields

$$b_t = -\sigma \sum_{j=1}^{t-1} w_j, \qquad c_t = \mu + (1 - \beta) \sigma \sum_{j=1}^{t} w_j$$
Thus income is iid and debt and consumption are both Gaussian random walks.
Defining assets as −𝑏𝑡 , we see that assets are just the cumulative sum of unanticipated in-
comes prior to the present date.
The next figure shows a typical realization with 𝑟 = 0.05, 𝜇 = 1, and 𝜎 = 0.15
using Plots, Random   # imports needed by this and the following cells

Random.seed!(42)
const r = 0.05
const β = 1.0 / (1.0 + r)
const T = 60
const σ = 0.15
const μ = 1.0
function time_path2()
w = randn(T+1)
w[1] = 0.0
b = zeros(T+1)
for t=2:T+1
b[t] = sum(w[1:t])
end
b .*= -σ
c = μ .+ (1.0 - β) .* (σ .* w .- b)
return w, b, c
end
w, b, c = time_path2()
p = plot(0:T, μ .+ σ .* w, color = :green, label = "non-financial income")
plot!(c, color = :black, label = "consumption")
In [4]: time_paths = []
n = 250
for i in 1:n
push!(time_paths, time_path2()[3])
end
35.4 Alternative Representations
In this section we shed more light on the evolution of savings, debt and consumption by rep-
resenting their dynamics in several different ways.
Hall [36] suggested an insightful way to summarize the implications of LQ permanent income
theory.
First, to represent the solution for 𝑏𝑡 , shift (9) forward one period and eliminate 𝑏𝑡+1 by using
(2) to obtain
$$c_{t+1} = (1 - \beta) \sum_{j=0}^{\infty} \beta^j \, \mathbb{E}_{t+1}[y_{t+j+1}] - (1 - \beta) \left[ \beta^{-1} (c_t + b_t - y_t) \right]$$

If we add and subtract $\beta^{-1}(1 - \beta) \sum_{j=0}^{\infty} \beta^j \, \mathbb{E}_t[y_{t+j}]$ from the right side of the preceding equation and rearrange, we obtain

$$c_{t+1} - c_t = (1 - \beta) \sum_{j=0}^{\infty} \beta^j \left\{ \mathbb{E}_{t+1}[y_{t+j+1}] - \mathbb{E}_t[y_{t+j+1}] \right\} \qquad (16)$$
The right side is the time 𝑡 + 1 innovation to the expected present value of the endowment
process {𝑦𝑡 }.
We can represent the optimal decision rule for (𝑐𝑡 , 𝑏𝑡+1 ) in the form of (16) and (8), which we
repeat:
$$b_t = \sum_{j=0}^{\infty} \beta^j \, \mathbb{E}_t[y_{t+j}] - \frac{1}{1 - \beta} \, c_t \qquad (17)$$
Equation (17) asserts that the consumer’s debt due at 𝑡 equals the expected present value of
its endowment minus the expected present value of its consumption stream.
A high debt thus indicates a large expected present value of surpluses 𝑦𝑡 − 𝑐𝑡 .
Recalling again our discussion on forecasting geometric sums, we have
$$\mathbb{E}_t \sum_{j=0}^{\infty} \beta^j y_{t+j} = U (I - \beta A)^{-1} z_t$$

$$\mathbb{E}_{t+1} \sum_{j=0}^{\infty} \beta^j y_{t+j+1} = U (I - \beta A)^{-1} z_{t+1}$$

$$\mathbb{E}_t \sum_{j=0}^{\infty} \beta^j y_{t+j+1} = U (I - \beta A)^{-1} A z_t$$
Using these formulas together with (3) and substituting into (16) and (17) gives the following representation for the consumer's optimum decision rule:

$$c_{t+1} = c_t + (1 - \beta) U (I - \beta A)^{-1} C w_{t+1}$$
$$b_t = U (I - \beta A)^{-1} z_t - \frac{1}{1 - \beta} \, c_t \qquad (18)$$
35.4.2 Cointegration
Representation (18) reveals that the joint process {𝑐𝑡 , 𝑏𝑡 } possesses the property that Engle
and Granger [27] called cointegration.
Cointegration is a tool that allows us to apply powerful results from the theory of stationary
stochastic processes to (certain transformations of) nonstationary models.
To apply cointegration in the present context, suppose that $z_t$ is asymptotically stationary.
Despite this, both 𝑐𝑡 and 𝑏𝑡 will be non-stationary because they have unit roots (see (11) for
𝑏𝑡 ).
Nevertheless, there is a linear combination of 𝑐𝑡 , 𝑏𝑡 that is asymptotically stationary.
In particular, from the second equality in (18) we have

$$(1 - \beta) b_t + c_t = (1 - \beta) \, \mathbb{E}_t \sum_{j=0}^{\infty} \beta^j y_{t+j} \qquad (20)$$
Equation (20) asserts that the cointegrating residual on the left side equals the conditional expectation of the geometric sum of future incomes on the right.
Consider again (18), this time in light of our discussion of distribution dynamics in the lec-
ture on linear systems.
The dynamics of $c_t$ are given by

$$c_{t+1} = c_t + (1 - \beta) U (I - \beta A)^{-1} C w_{t+1}$$

or

$$c_t = c_0 + \sum_{j=1}^{t} \hat w_j \quad \text{for} \quad \hat w_{t+1} := (1 - \beta) U (I - \beta A)^{-1} C w_{t+1}$$
The unit root affecting 𝑐𝑡 causes the time 𝑡 variance of 𝑐𝑡 to grow linearly with 𝑡.
In particular, since $\{\hat w_t\}$ is iid, the time $t$ variance of $c_t$ is $t \hat\sigma^2$, where

$$\hat\sigma^2 := (1 - \beta)^2 \, U (I - \beta A)^{-1} C C' (I - \beta A')^{-1} U'$$
Impulse response functions measure responses to various impulses (i.e., temporary shocks).
The impulse response function of {𝑐𝑡 } to the innovation {𝑤𝑡 } is a box.
In particular, the response of $c_{t+j}$ to a unit increase in the innovation $w_{t+1}$ is $(1 - \beta) U (I - \beta A)^{-1} C$ for all $j \geq 1$.
It’s useful to express the innovation to the expected present value of the endowment process
in terms of a moving average representation for income 𝑦𝑡 .
The endowment process defined by (3) has the moving average representation

$$y_t = d(L) w_t$$

where

• $d(L) = \sum_{j=0}^{\infty} d_j L^j$ for some sequence $d_j$, where $L$ is the lag operator
• at time $t$, the consumer has an information set $w^t = [w_t, w_{t-1}, \ldots]$
Notice that

$$y_{t+j} - \mathbb{E}_t[y_{t+j}] = d_0 w_{t+j} + d_1 w_{t+j-1} + \cdots + d_{j-1} w_{t+1}$$

It follows that

$$\mathbb{E}_{t+1}[y_{t+j}] - \mathbb{E}_t[y_{t+j}] = d_{j-1} w_{t+1}$$

Using this in (16) gives

$$c_{t+1} - c_t = (1 - \beta) d(\beta) w_{t+1}$$
The object $d(\beta)$ is the present value of the moving average coefficients in the representation for the endowment process $y_t$.
35.5 Two Classic Examples

We illustrate some of the preceding ideas with two examples. In both, the endowment follows the process $y_t = z_{1t} + z_{2t}$ where

$$\begin{bmatrix} z_{1,t+1} \\ z_{2,t+1} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} z_{1t} \\ z_{2t} \end{bmatrix} + \begin{bmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{bmatrix} \begin{bmatrix} w_{1,t+1} \\ w_{2,t+1} \end{bmatrix}$$
Here
• 𝑤𝑡+1 is an iid 2 × 1 process distributed as 𝑁 (0, 𝐼)
• 𝑧1𝑡 is a permanent component of 𝑦𝑡
• 𝑧2𝑡 is a purely transitory component of 𝑦𝑡
35.5.1 Example 1
Applying the formula for the change in consumption in (18) to this example gives

$$c_{t+1} - c_t = \sigma_1 w_{1,t+1} + (1 - \beta) \sigma_2 w_{2,t+1}$$

This formula shows how an increment $\sigma_1 w_{1,t+1}$ to the permanent component of income $z_{1,t+1}$ leads to
• a permanent one-for-one increase in consumption and
• no increase in savings −𝑏𝑡+1
But the purely transitory component of income 𝜎2 𝑤2𝑡+1 leads to a permanent increment in
consumption by a fraction 1 − 𝛽 of transitory income.
The remaining fraction 𝛽 is saved, leading to a permanent increment in −𝑏𝑡+1 .
Application of the formula for debt in (11) to this example shows that

$$b_{t+1} - b_t = -z_{2t} = -\sigma_2 w_{2t}$$
This confirms that none of 𝜎1 𝑤1𝑡 is saved, while all of 𝜎2 𝑤2𝑡 is saved.
The next figure illustrates these very different reactions to transitory and permanent income
shocks using impulse-response functions
# impulse responses to a one-time shock at date S
# (the first few lines are reconstructed assumptions; the extracted text begins
#  mid-function, and β and σ are reused from the earlier simulation cell)
σ1 = σ2 = σ          # same shock scale for both components
S = 5                # impulse date
T2 = 20

function time_path(permanent)
    w1 = zeros(T2 + 1)     # innovations to the permanent component
    w2 = zeros(T2 + 1)     # innovations to the transitory component
    b = zeros(T2 + 1)
    c = zeros(T2 + 1)
    permanent ? (w1[S + 2] = 1.0) : (w2[S + 2] = 1.0)   # unit impulse
    for t = 2:T2
        b[t+1] = b[t] - σ2 * w2[t]
        c[t+1] = c[t] + σ1 * w1[t+1] + (1 - β) * σ2 * w2[t+1]
    end
    return b, c
end

L = 0.175

b1, c1 = time_path(false)   # transitory income shock
b2, c2 = time_path(true)    # permanent income shock
p = plot(0:T2, [c1 c2 b1 b2], layout = (2, 1),
         color = [:green :green :blue :blue],
         label = ["consumption" "consumption" "debt" "debt"])
t = ["impulse-response, transitory income shock"
     "impulse-response, permanent income shock"]
plot!(title = reshape(t, 1, length(t)), xlabel = "Time", ylims = (-L, L),
      legend = [:topright :bottomright])
vline!([S S], color = :black, layout = (2, 1), label = "")
35.5.2 Example 2
Assume now that at time 𝑡 the consumer observes 𝑦𝑡 , and its history up to 𝑡, but not 𝑧𝑡 .
Under this assumption, it is appropriate to use an innovation representation to form 𝐴, 𝐶, 𝑈
in (18).
The discussion in sections 2.9.1 and 2.11.3 of [68] shows that the pertinent state space repre-
sentation for 𝑦𝑡 is
$$\begin{bmatrix} y_{t+1} \\ a_{t+1} \end{bmatrix} = \begin{bmatrix} 1 & -(1-K) \\ 0 & 0 \end{bmatrix} \begin{bmatrix} y_t \\ a_t \end{bmatrix} + \begin{bmatrix} 1 \\ 1 \end{bmatrix} a_{t+1}$$

$$y_t = \begin{bmatrix} 1 & 0 \end{bmatrix} \begin{bmatrix} y_t \\ a_t \end{bmatrix}$$
where
• $K :=$ the stationary Kalman gain
• $a_t := y_t - E[y_t \,|\, y_{t-1}, \ldots, y_0]$
In the same discussion in [68] it is shown that 𝐾 ∈ [0, 1] and that 𝐾 increases as 𝜎1 /𝜎2 does.
In other words, 𝐾 increases as the ratio of the standard deviation of the permanent shock to
that of the transitory shock increases.
For background, please see our lecture A First Look at the Kalman Filter.
Applying formulas (18) implies

$$c_{t+1} - c_t = \left[ 1 - \beta (1 - K) \right] a_{t+1}$$

where the endowment process can now be represented in terms of the univariate innovation to $y_t$ as

$$y_{t+1} - y_t = a_{t+1} - (1 - K) a_t$$
This indicates how the fraction 𝐾 of the innovation to 𝑦𝑡 that is regarded as permanent influ-
ences the fraction of the innovation that is saved.
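A minimal sketch of this dependence, using the consumption-response formula above (the β value is an assumed illustration):

# response of consumption to a unit innovation a_{t+1} is 1 - β(1 - K)
# K = 0: innovation viewed as purely transitory; K = 1: purely permanent
β = 0.95
for K in (0.0, 0.5, 1.0)
    println("K = $K: consumption response = ", 1 - β * (1 - K))
end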
35.6 Further Reading

The model described above significantly changed how economists think about consumption.
While Hall’s model does a remarkably good job as a first approximation to consumption data,
it’s widely believed that it doesn’t capture important aspects of some consumption/savings
data.
For example, liquidity constraints and precautionary savings appear to be present sometimes.
Further discussion can be found in, e.g., [37], [84], [20], [15].
35.7 Appendix: The Euler Equation

Where does the first-order condition (5) come from? Here we sketch the two-period case, which is representative of the general argument.

With $b_2 = 0$, the two budget constraints imply

$$c_0 = \frac{b_1}{1+r} - b_0 + y_0 \quad \text{and} \quad c_1 = y_1 - b_1$$

Writing $R := 1 + r$, the consumer chooses $b_1$ to solve

$$\max_{b_1} \left\{ u\!\left( \frac{b_1}{R} - b_0 + y_0 \right) + \beta \, \mathbb{E}_0 \left[ u(y_1 - b_1) \right] \right\}$$
Chapter 36

Optimal Savings II: LQ Techniques

36.1 Contents
• Overview 36.2
• Introduction 36.3
• The LQ Approach 36.4
• Implementation 36.5
• Two Example Economies 36.6
Co-authored with Chase Coleman.
36.2 Overview
This lecture continues our analysis of the linear-quadratic (LQ) permanent income model of
savings and consumption.
As we saw in our previous lecture on this topic, Robert Hall [36] used the LQ permanent in-
come model to restrict and interpret intertemporal comovements of nondurable consumption,
nonfinancial income, and financial wealth.
For example, we saw how the model asserts that for any covariance stationary process for
nonfinancial income
• consumption is a random walk
• financial wealth has a unit root and is cointegrated with consumption
Other applications use the same LQ framework.
For example, a model isomorphic to the LQ permanent income model has been used by
Robert Barro [8] to interpret intertemporal comovements of a government’s tax collections,
its expenditures net of debt service, and its public debt.
This isomorphism means that in analyzing the LQ permanent income model, we are in effect
also analyzing the Barro tax smoothing model.
It is just a matter of appropriately relabeling the variables in Hall’s model.
In this lecture, we’ll
• show how the solution to the LQ permanent income model can be obtained using LQ control methods
• apply QuantEcon's LSS type to characterize statistical features of the consumer's optimal consumption and borrowing plans
36.2.1 Setup
36.3 Introduction
Let’s recall the basic features of the model discussed in permanent income model.
Consumer preferences are ordered by
$$E_0 \sum_{t=0}^{\infty} \beta^t u(c_t) \qquad (1)$$

The consumer maximizes (1) subject to the sequence of budget constraints

$$c_t + b_t = \frac{1}{1+r} \, b_{t+1} + y_t, \qquad t \geq 0 \qquad (2)$$
and the no-Ponzi condition
$$E_0 \sum_{t=0}^{\infty} \beta^t b_t^2 < \infty \qquad (3)$$
The interpretation of all variables and parameters are the same as in the previous lecture.
We continue to assume that (1 + 𝑟)𝛽 = 1.
The dynamics of $\{y_t\}$ again follow the linear state space model

$$z_{t+1} = A z_t + C w_{t+1}, \qquad y_t = U z_t \qquad (4)$$
The restrictions on the shock process and parameters are the same as in our previous lecture.
For the purposes of this lecture, let's assume $\{y_t\}$ is a second-order univariate autoregressive process:

$$y_{t+1} = \alpha + \rho_1 y_t + \rho_2 y_{t-1} + \sigma w_{t+1}$$
We can map this into the linear state space framework in (4), as discussed in our lecture on
linear models.
To do so we take
$$z_t = \begin{bmatrix} 1 \\ y_t \\ y_{t-1} \end{bmatrix}, \qquad A = \begin{bmatrix} 1 & 0 & 0 \\ \alpha & \rho_1 & \rho_2 \\ 0 & 1 & 0 \end{bmatrix}, \qquad C = \begin{bmatrix} 0 \\ \sigma \\ 0 \end{bmatrix}, \qquad \text{and} \quad U = \begin{bmatrix} 0 & 1 & 0 \end{bmatrix}$$
36.4 The LQ Approach

Previously we solved the permanent income model by solving a system of linear expectational
difference equations subject to two boundary conditions.
Here we solve the same model using LQ methods based on dynamic programming.
After confirming that answers produced by the two methods agree, we apply QuantEcon’s
LSS type to illustrate features of the model.
Why solve a model in two distinct ways?
Because by doing so we gather insights about the structure of the model.
Our earlier approach based on solving a system of expectational difference equations brought
to the fore the role of the consumer’s expectations about future nonfinancial income.
On the other hand, formulating the model in terms of an LQ dynamic programming problem
reminds us that
• finding the state (of a dynamic programming problem) is an art, and
• iterations on a Bellman equation implicitly jointly solve both a forecasting problem and
a control problem
Recall from our lecture on LQ theory that the optimal linear regulator problem is to choose a
decision rule for 𝑢𝑡 to minimize
$$\mathbb{E} \sum_{t=0}^{\infty} \beta^t \left\{ x_t' R x_t + u_t' Q u_t \right\},$$

subject to the law of motion

$$x_{t+1} = \tilde A x_t + \tilde B u_t + \tilde C w_{t+1}, \qquad t \geq 0, \qquad (5)$$
where 𝑤𝑡+1 is iid with mean vector zero and 𝔼𝑤𝑡 𝑤𝑡′ = 𝐼.
The tildes in 𝐴,̃ 𝐵,̃ 𝐶 ̃ are to avoid clashing with notation in (4).
The value function for this problem is 𝑣(𝑥) = −𝑥′ 𝑃 𝑥 − 𝑑, where
• 𝑃 is the unique positive semidefinite solution of the corresponding matrix Riccati equa-
tion.
• The scalar $d$ is given by $d = \beta (1 - \beta)^{-1} \, \text{trace}(P \tilde C \tilde C')$.
To map our model into the LQ framework, define the control as $u_t := c_t - \gamma$, the deviation of consumption from the bliss level, and take the state to be

$$x_t := \begin{bmatrix} z_t \\ b_t \end{bmatrix} = \begin{bmatrix} 1 \\ y_t \\ y_{t-1} \\ b_t \end{bmatrix}$$

and

$$\tilde A := \begin{bmatrix} A & 0 \\ (1+r)(U_\gamma - U) & 1+r \end{bmatrix}, \qquad \tilde B := \begin{bmatrix} 0 \\ 1+r \end{bmatrix}, \qquad \tilde C := \begin{bmatrix} C \\ 0 \end{bmatrix}$$

where $U_\gamma := [\gamma \;\; 0 \;\; 0]$, so that $U_\gamma z_t = \gamma$.
Please confirm for yourself that, with these definitions, the LQ dynamics (5) match the dy-
namics of 𝑧𝑡 and 𝑏𝑡 described above.
To map utility into the quadratic form 𝑥′𝑡 𝑅𝑥𝑡 + 𝑢′𝑡 𝑄𝑢𝑡 we can set
• 𝑄 ∶= 1 (remember that we are minimizing) and
• 𝑅 ∶= a 4 × 4 matrix of zeros
However, there is one problem remaining.
We have no direct way to capture the non-recursive restriction (3) on the debt sequence {𝑏𝑡 }
from within the LQ framework.
To try to enforce it, we’re going to use a trick: put a small penalty on 𝑏𝑡2 in the criterion func-
tion.
In the present setting, this means adding a small entry 𝜖 > 0 in the (4, 4) position of 𝑅.
That will induce a (hopefully) small approximation error in the decision rule.
We’ll check whether it really is small numerically soon.
36.5 Implementation
# Set parameters
# (parameter values are inferred from the printed output below:
#  β = 0.95, α = 10.0, ρ1 = 0.9, ρ2 = 0.0, σ = 1.0)
α, β, ρ1, ρ2, σ = 10.0, 0.95, 0.9, 0.0, 1.0

R = 1 / β
A = [1.0 0.0 0.0;
     α ρ1 ρ2;
     0.0 1.0 0.0]
C = [0.0; σ; 0.0]''
G = [0.0 1.0 0.0]

QLQ = 1.0
BLQ = [0.0; 0.0; 0.0; R]
CLQ = [0.0; σ; 0.0; 0.0]
β_LQ = β
Out[4]: 0.95

A = [1.0 0.0 0.0 0.0; 10.0 0.9 0.0 0.0; 0.0 1.0 0.0 0.0; 0.0 -1.0526315789473684 0.0 1.0526315789473684]
B = [0.0, 0.0, 0.0, 1.0526315789473684]
R = [0.0 0.0 0.0 0.0; 0.0 0.0 0.0 0.0; 0.0 0.0 0.0 0.0; 0.0 0.0 0.0 1.0e-9]
Q = 1.0
We’ll save the implied optimal policy function soon and compare with what we get by em-
ploying an alternative solution method.
In our first lecture on the infinite horizon permanent income problem we used a different solu-
tion method.
The method was based around
• deducing the Euler equations that are the first-order conditions with respect to con-
sumption and savings
• using the budget constraints and boundary condition to complete a system of expecta-
tional linear difference equations
• solving those equations to obtain the solution
Expressed in state space notation, the solution took the form

$$z_{t+1} = A z_t + C w_{t+1}$$
$$b_{t+1} = b_t + U \left[ (I - \beta A)^{-1} (A - I) \right] z_t$$
$$y_t = U z_t$$
$$c_t = (1 - \beta) \left[ U (I - \beta A)^{-1} z_t - b_t \right]$$
In [8]: # Use the above formulas to create the optimal policies for b_{t+1} and c_t
b_pol = G * (inv(I - β * A)) * (A - I)
c_pol = (1 - β) * (G * inv(I - β * A))
# Use the following values to start everyone off at b=0, initial incomes zero
μ_0 = [1.0, 0.0, 0.0, 0.0]
Σ_0 = zeros(4, 4)
A_LSS calculated as we have here should equal ABF calculated above using the LQ model.
We have verified that the two methods give the same solution.
Now let’s create instances of the LSS type and use it to do some interesting experiments.
To do this, we’ll use the outcomes from our second method.
36.6 Two Example Economies
We generate 25 paths of the exogenous non-financial income process and the associated opti-
mal consumption and debt paths.
In a first set of graphs, darker lines depict a particular sample path, while the lighter lines
describe 24 other paths.
A second graph plots a collection of simulations against the population distribution that we extract from the LSS instance.
Comparing sample paths with population distributions at each date 𝑡 is a useful exercise—see
our discussion of the laws of large numbers.
# simulation/Moment Parameters
moment_generator = moment_sequence(lss)
# bsim, csim, ysim are npaths × T arrays whose allocation is not shown in the extracted text
for i in 1:npaths
sims = simulate(lss,T)
bsim[i, :] = sims[1][end, :]
csim[i, :] = sims[2][2, :]
ysim[i, :] = sims[2][1, :]
end
# get T
T = size(bsim, 2)
# plot debt
plt_2 = plot(bsim[1,: ], label="b", color=:red, lw=2)
plot!(plt_2, bsim', alpha=0.1, color=:red,label="")
plot!(plt_2, xlabel="t", ylabel="debt",legend=:bottomright)
xvals = 1:T
# first fanchart
plt_1=plot(xvals, cons_mean, color=:black, lw=2, label="")
plot!(plt_1, xvals, Array(csim'), color=:black, alpha=0.25, label="")
plot!(xvals, fillrange=[c_perc_95m, c_perc_95p], alpha=0.25, color=:
↪blue, label="")
# second fanchart
plt_2=plot(xvals, debt_mean, color=:black, lw=2,label="")
plot!(plt_2, xvals, Array(bsim'), color=:black, alpha=0.25,label="")
plot!(xvals, fillrange=[d_perc_95m, d_perc_95p], alpha=0.25, color=:
↪blue,label="")
Now let’s create figures with initial conditions of zero for 𝑦0 and 𝑏0
$$(1 - \beta) b_t + c_t = (1 - \beta) E_t \sum_{j=0}^{\infty} \beta^j y_{t+j} \qquad (6)$$
So at time 0 we have
$$c_0 = (1 - \beta) E_0 \sum_{t=0}^{\infty} \beta^t y_t$$
This tells us that consumption starts at the income that would be paid by an annuity whose
value equals the expected discounted value of nonfinancial income at time 𝑡 = 0.
To support that level of consumption, the consumer borrows a lot early and consequently
builds up substantial debt.
In fact, he or she incurs so much debt that eventually, in the stochastic steady state, he con-
sumes less each period than his nonfinancial income.
He uses the gap between consumption and nonfinancial income mostly to service the interest
payments due on his debt.
Thus, when we look at the panel of debt in the accompanying graph, we see that this is a
group of ex ante identical people each of whom starts with zero debt.
All of them accumulate debt in anticipation of rising nonfinancial income.
They expect their nonfinancial income to rise toward the invariant distribution of income, a
consequence of our having started them at 𝑦−1 = 𝑦−2 = 0.
Cointegration residual
The following figure plots realizations of the left side of (6), which, as discussed in our last
lecture, is called the cointegrating residual.
As mentioned above, the right side can be thought of as an annuity payment on the expected present value of future income $E_t \sum_{j=0}^{\infty} \beta^j y_{t+j}$.
Early along a realization, $c_t$ is approximately constant while $(1 - \beta) b_t$ and $(1 - \beta) E_t \sum_{j=0}^{\infty} \beta^j y_{t+j}$ both rise markedly as the household's present value of income and borrowing rise pretty much together.
This example illustrates the following point: the definition of cointegration implies that the
cointegrating residual is asymptotically covariance stationary, not covariance stationary.
The cointegrating residual for the specification with zero income and zero debt initially has a
notable transient component that dominates its behavior early in the sample.
By altering initial conditions, we shall remove this transient in our second example, presented below.
cointegration_figure(bsim0, csim0)
When we set 𝑦−1 = 𝑦−2 = 0 and 𝑏0 = 0 in the preceding exercise, we make debt “head north”
early in the sample.
Average debt in the cross-section rises and approaches an asymptote.
We can regard these as outcomes of a “small open economy” that borrows from abroad at the
fixed gross interest rate 𝑅 = 𝑟 + 1 in anticipation of rising incomes.
So with the economic primitives set as above, the economy converges to a steady state in
which there is an excess aggregate supply of risk-free loans at a gross interest rate of 𝑅.
This excess supply is filled by "foreign lenders" willing to make those loans.
We can use virtually the same code to rig a “poor man’s Bewley [13] model” in the following
way
• as before, we start everyone at 𝑏0 = 0
• But instead of starting everyone at $y_{-1} = y_{-2} = 0$, we draw $\begin{bmatrix} y_{-1} \\ y_{-2} \end{bmatrix}$ from the invariant distribution of the $\{y_t\}$ process
This rigs a closed economy in which people are borrowing and lending with each other at a
gross risk-free interest rate of 𝑅 = 𝛽 −1 .
Across the group of people being analyzed, risk-free loans are in zero excess supply.
We have arranged primitives so that 𝑅 = 𝛽 −1 clears the market for risk-free loans at zero
aggregate excess supply.
So the risk-free loans are being made from one person to another within our closed set of agents.
There is no need for foreigners to lend to our group.
Let’s have a look at the corresponding figures
Chapter 37

Consumption and Tax Smoothing with Complete and Incomplete Markets
37.1 Contents
• Overview 37.2
• Background 37.3
• Model 1 (Complete Markets) 37.4
• Model 2 (One-Period Risk Free Debt Only) 37.5
• Example: Tax Smoothing with Complete Markets 37.6
• Linear State Space Version of Complete Markets Model 37.7
37.2 Overview

This lecture describes two types of consumption-smoothing (and tax-smoothing) models: one in which the consumer trades a complete set of one-period-ahead state-contingent claims, which we use in a "complete markets" model in the style of Lucas and Stokey [72], and one in which the consumer trades only a risk-free bond, in the style of Hall [36] and Barro [8].
While we are equally interested in consumption-smoothing and tax-smoothing models, for the
most part we focus explicitly on consumption-smoothing versions of these models.
But for each version of the consumption-smoothing model there is a natural tax-smoothing
counterpart obtained simply by
• relabeling consumption as tax collections and nonfinancial income as government expen-
ditures
• relabeling the consumer’s debt as the government’s assets
For elaborations on this theme, please see Optimal Savings II: LQ Techniques and later parts
of this lecture.
We’ll consider two closely related alternative assumptions about the consumer’s exogenous
nonfinancial income process (or in the tax-smoothing interpretation, the government’s exoge-
nous expenditure process):
• that it obeys a finite 𝑁 state Markov chain (setting 𝑁 = 2 most of the time)
• that it is described by a linear state space model with a continuous state vector in ℝ𝑛
driven by a Gaussian vector iid shock process
We’ll spend most of this lecture studying the finite-state Markov specification, but will briefly
treat the linear state space specification before concluding.
This lecture can be viewed as a followup to Optimal Savings II: LQ Techniques and a warm
up for a model of tax smoothing described in opt_tax_recur.
Linear-quadratic versions of the Lucas-Stokey tax-smoothing model are described in lqramsey.
The key differences between those lectures and this one are
• Here the decision maker takes all prices as exogenous, meaning that his decisions do not
affect them.
• In lqramsey and opt_tax_recur, the decision maker – the government in the case of
these lectures – recognizes that his decisions affect prices.
So these later lectures are partly about how the government should manipulate prices of gov-
ernment debt.
37.3 Background
In the complete markets version of the model, each period the consumer can buy or sell one-
period ahead state-contingent securities whose payoffs depend on next period’s realization of
the Markov state.
In the two-state Markov chain case, there are two such securities each period.
In an 𝑁 state Markov state version of the model, 𝑁 such securities are traded each period.
These state-contingent securities are commonly called Arrow securities, after Kenneth Arrow
who first theorized about them.
In the incomplete markets version of the model, the consumer can buy and sell only one secu-
rity each period, a risk-free bond with gross return 𝛽 −1 .
37.4 Model 1 (Complete Markets)

In the two-state Markov chain setting, the consumer's nonfinancial income is

$$y_t = \begin{cases} \bar y_1 & \text{if } s_t = \bar s_1 \\ \bar y_2 & \text{if } s_t = \bar s_2 \end{cases}$$
The consumer ranks consumption streams according to

$$\mathbb{E} \left[ \sum_{t=0}^{\infty} \beta^t u(c_t) \right] \quad \text{where} \quad u(c_t) = -(c_t - \gamma)^2 \quad \text{and} \quad 0 < \beta < 1 \qquad (1)$$
We can regard these as Barro [8] tax-smoothing models if we set 𝑐𝑡 = 𝑇𝑡 and 𝐺𝑡 = 𝑦𝑡 , where
𝑇𝑡 is total tax collections and {𝐺𝑡 } is an exogenous government expenditures process.
The two models differ in how effectively the market structure allows the consumer to trans-
fer resources across time and Markov states, there being more transfer opportunities in the
complete markets setting than in the incomplete markets setting.
Watch how these differences in opportunities affect
• how smooth consumption is across time and Markov states
• how the consumer chooses to make his levels of indebtedness behave over time and
across Markov states
In the complete markets version, at each date $t \geq 0$ the consumer faces the budget constraint

$$c_t + b_t = y_t + \sum_j q(\bar s_j \,|\, s_t) \, b_{t+1}(\bar s_j \,|\, s_t)$$

where $b_t$ is the consumer's one-period debt that falls due at time $t$ and $b_{t+1}(\bar s_j \,|\, s_t)$ are the consumer's time $t$ sales of the time $t+1$ consumption good in Markov state $\bar s_j$, a source of time $t$ revenues.
An analogue of Hall’s assumption that the one-period risk-free gross interest rate is 𝛽 −1 is
To understand this, observe that in state $\bar s_i$ it costs $\sum_j q(\bar s_j \,|\, \bar s_i)$ to purchase one unit of consumption next period for sure, i.e., no matter what state of the world occurs at $t + 1$.

Hence the implied price of a risk-free claim on one unit of consumption next period is

$$\sum_j q(\bar s_j \,|\, \bar s_i) = \sum_j \beta P_{ij} = \beta$$
This confirms that (2) is a natural analogue of Hall’s assumption about the risk-free one-
period interest rate.
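As a quick numerical sanity check, here is a minimal sketch using the parameter values that appear in the example later in this lecture:

# each row of Q = βP sums to β, the price of a sure unit of next-period consumption
β = 0.96
P = [0.8 0.2;
     0.4 0.6]
Q = β * P
@assert all(isapprox.(sum(Q, dims = 2), β))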
First-order necessary conditions for maximizing the consumer's expected utility are

$$\beta \, \frac{u'(c_{t+1})}{u'(c_t)} \, \mathbb{P}\{s_{t+1} \,|\, s_t\} = q(s_{t+1} \,|\, s_t)$$

or, substituting our assumption (2) about Arrow securities prices,

$$c_{t+1} = c_t \qquad (3)$$
Thus, our consumer sets 𝑐𝑡 = 𝑐 ̄ for all 𝑡 ≥ 0 for some value 𝑐 ̄ that it is our job now to deter-
mine.
Guess: We’ll make the plausible guess that
so that the amount borrowed today turns out to depend only on tomorrow’s Markov state.
(Why is this a plausible guess?)
To determine 𝑐,̄ we shall pursue the implications of the consumer’s budget constraints in each
Markov state today and our guess (4) about the consumer’s debt level choices.
For $t \geq 1$, these imply

$$\begin{aligned}
\bar c + b(\bar s_1) &= y(\bar s_1) + q(\bar s_1 \,|\, \bar s_1) b(\bar s_1) + q(\bar s_2 \,|\, \bar s_1) b(\bar s_2) \\
\bar c + b(\bar s_2) &= y(\bar s_2) + q(\bar s_1 \,|\, \bar s_2) b(\bar s_1) + q(\bar s_2 \,|\, \bar s_2) b(\bar s_2)
\end{aligned} \qquad (5)$$

while at $t = 0$, assuming the initial Markov state is $\bar s_1$ and initial debt is $b_0$,

$$\bar c + b_0 = y(\bar s_1) + q(\bar s_1 \,|\, \bar s_1) b(\bar s_1) + q(\bar s_2 \,|\, \bar s_1) b(\bar s_2) \qquad (6)$$
If we substitute (6) into the first equation of (5) and rearrange, we discover that
$$b(\bar s_1) = b_0 \qquad (7)$$
We can then use the second equation of (5) to deduce the restriction
$$y(\bar s_1) - y(\bar s_2) + \left[ q(\bar s_1 \,|\, \bar s_1) - q(\bar s_1 \,|\, \bar s_2) - 1 \right] b_0 + \left[ q(\bar s_2 \,|\, \bar s_1) + 1 - q(\bar s_2 \,|\, \bar s_2) \right] b(\bar s_2) = 0 \qquad (8)$$
The preceding calculations indicate that in the complete markets version of our model, we
obtain the following striking results:
• The consumer chooses to make consumption perfectly constant across time and Markov
states.
We computed the constant level of consumption 𝑐 ̄ and indicated how that level depends on
the underlying specifications of preferences, Arrow securities prices, the stochastic process of
exogenous nonfinancial income, and the initial debt level 𝑏0
• The consumer’s debt neither accumulates, nor decumulates, nor drifts. Instead the debt
level each period is an exact function of the Markov state, so in the two-state Markov
case, it switches between two values.
• We have verified guess (4).
We computed how one of those debt levels depends entirely on initial debt – it equals it – and
how the other value depends on virtually all remaining parameters of the model.
37.4.2 Code
Here’s some code that, among other things, contains a function called consump-
tion_complete().
This function computes 𝑏(𝑠1̄ ), 𝑏(𝑠2̄ ), 𝑐 ̄ as outcomes given a set of parameters, under the as-
sumption of complete markets.
37.4.3 Setup
function consumption_complete(cp)
    @unpack β, P, y, b0 = cp        # unpack

    # (the next few lines are reconstructed; the extracted text is incomplete,
    #  and they follow equations (7) and (8) above)
    y1, y2 = y                       # income levels in the two states
    b1 = b0                          # equation (7): b(s̄1) = b0
    Q = β * P                        # Arrow securities prices, equation (2)

    # using equation (8), solve for b2 = b(s̄2)
    b2 = (y2 - y1 - (Q[1, 1] - Q[2, 1] - 1) * b1) / (Q[1, 2] + 1 - Q[2, 2])

    # using equation (5), calculate c
    c = y1 - b0 + ([b1 b2] * Q[1, :])[1]

    return c, b1, b2
end
# (fragment of consumption_incomplete(), discussed below)
@unpack β, P, y, b0 = cp             # unpack

# useful variables
v = inv(I - β * P) * y               # expected discounted income in each state
In [4]: cp = ConsumptionProblem()
c, b1, b2 = consumption_complete(cp)
debt_complete = [b1, b2]
isapprox((c + b2 - cp.y[2] - debt_complete' * (cp.β * cp.P)[2, :])[1], 0)
Out[4]: true
Below, we’ll take the outcomes produced by this code – in particular the implied consumption
and debt paths – and compare them with outcomes from an incomplete markets model in the
spirit of Hall [36] and Barro [8] (and also, for those who love history, Gallatin (1807) [34]).
37.5 Model 2 (One-Period Risk-Free Debt Only)

This is a version of the original models of Hall (1978) and Barro (1979) in which the decision
maker’s ability to substitute intertemporally is constrained by his ability to buy or sell only
one security, a risk-free one-period bond bearing a constant gross interest rate that equals
𝛽 −1 .
Given an initial debt 𝑏0 at time 0, the consumer faces a sequence of budget constraints
$$c_t + b_t = y_t + \beta b_{t+1}, \qquad t \geq 0$$

where $\beta$ is the price at time $t$ of a risk-free claim on one unit of consumption at time $t + 1$.
First-order conditions for the consumer’s problem are
676CHAPTER 37. CONSUMPTION AND TAX SMOOTHING WITH COMPLETE AND INCOMPLETE
$$b_t = \mathbb{E}_t \sum_{j=0}^{\infty} \beta^j y_{t+j} - (1 - \beta)^{-1} c_t \qquad (10)$$

and

$$c_t = (1 - \beta) \left[ \mathbb{E}_t \sum_{j=0}^{\infty} \beta^j y_{t+j} - b_t \right] \qquad (11)$$
Equation (11) expresses $c_t$ as a net interest rate factor $1 - \beta$ times the sum of the expected present value of nonfinancial income $\mathbb{E}_t \sum_{j=0}^{\infty} \beta^j y_{t+j}$ and financial wealth $-b_t$.
Substituting (11) into the one-period budget constraint and rearranging leads to
$$b_{t+1} - b_t = \beta^{-1} \left[ (1 - \beta) \, \mathbb{E}_t \sum_{j=0}^{\infty} \beta^j y_{t+j} - y_t \right] \qquad (12)$$
Now let’s do a useful calculation that will yield a convenient expression for the key term
∞
𝔼𝑡 ∑𝑗=0 𝛽 𝑗 𝑦𝑡+𝑗 in our finite Markov chain setting.
Define
∞
𝑣𝑡 ∶= 𝔼𝑡 ∑ 𝛽 𝑗 𝑦𝑡+𝑗
𝑗=0
In our finite Markov chain setting, 𝑣𝑡 = 𝑣(1) when 𝑠𝑡 = 𝑠1̄ and 𝑣𝑡 = 𝑣(2) when 𝑠𝑡 = 𝑠2̄ .
Therefore, we can write

$$\begin{bmatrix} v(1) \\ v(2) \end{bmatrix} = \begin{bmatrix} y(1) \\ y(2) \end{bmatrix} + \beta P \begin{bmatrix} v(1) \\ v(2) \end{bmatrix}$$

or

$$\vec v = \vec y + \beta P \vec v$$

where $\vec v = \begin{bmatrix} v(1) \\ v(2) \end{bmatrix}$ and $\vec y = \begin{bmatrix} y(1) \\ y(2) \end{bmatrix}$.

We can also write the last expression as

$$\vec v = (I - \beta P)^{-1} \vec y$$
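A minimal sketch of this calculation (parameter values from the example later in this lecture):

using LinearAlgebra

β = 0.96
P = [0.8 0.2;
     0.4 0.6]
y = [1.0, 2.0]           # income in the two Markov states
v = (I - β * P) \ y      # solves (I - βP) v = y, i.e. v = (I - βP)^{-1} y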
In our finite Markov chain setting, from expression (11), consumption at date $t$ when debt is $b_t$ and the Markov state today is $s_t = i$ is evidently

$$c(b_t, i) = (1 - \beta) \left[ v(i) - b_t \right] \qquad (13)$$

and the budget constraint implies that debt evolves according to

$$b_{t+1} = \beta^{-1} (c_t + b_t - y_t) \qquad (14)$$
In contrast to outcomes in the complete markets model, in the incomplete markets model
• consumption drifts over time as a random walk; the level of consumption at time 𝑡 de-
pends on the level of debt that the consumer brings into the period as well as the ex-
pected discounted present value of nonfinancial income at 𝑡
• the consumer’s debt drifts upward over time in response to low realizations of nonfinan-
cial income and drifts downward over time in response to high realizations of nonfinan-
cial income
• the drift over time in the consumer’s debt and the dependence of current consumption
on today’s debt level account for the drift over time in consumption
The code above also contains a function called consumption_incomplete() that uses (13) and
(14) to
• simulate paths of 𝑦𝑡 , 𝑐𝑡 , 𝑏𝑡+1
• plot these against values of $\bar c$, $b(s_1)$, $b(s_2)$ found in a corresponding complete markets economy
economy
Let’s try this, using the same parameters in both complete and incomplete markets economies
In [5]: Random.seed!(42)
N_simul = 150
cp = ConsumptionProblem()
c, b1, b2 = consumption_complete(cp)
debt_complete = [b1, b2]
plot!(plt_debt, legend = :bottomleft)
In the graph on the left, for the same sample path of nonfinancial income 𝑦𝑡 , notice that
• consumption is constant when there are complete markets, but it takes a random walk
in the incomplete markets version of the model
• the consumer’s debt oscillates between two values that are functions of the Markov state
in the complete markets model, while the consumer’s debt drifts in a “unit root” fashion
in the incomplete markets economy
We can simply relabel variables to acquire tax-smoothing interpretations of our two models
hline!(plt_gov, [0], linestyle = :dash, color = :black, lw = 2, label = "")
plot(plt_tax, plt_gov, layout = (1,2), size = (800, 400))
37.6 Example: Tax Smoothing with Complete Markets

It is instructive to focus on a simple tax-smoothing example with complete markets.

• Purchasing insurance protects the government against the need to raise taxes too high or issue too much debt in the high government expenditure event.
We assume that government expenditures move between two values 𝐺1 < 𝐺2 , where Markov
state 1 means “peace” and Markov state 2 means “war”.
The government budget constraint in Markov state $i$ is

$$T_i + b_i = G_i + \sum_j Q_{ij} b_j$$
where
$$Q_{ij} = \beta P_{ij}$$
is the price of one unit of output next period in state 𝑗 when today’s Markov state is 𝑖 and 𝑏𝑖
is the government’s level of assets in Markov state 𝑖.
That is, 𝑏𝑖 is the amount of the one-period loans owned by the government that fall due at
time 𝑡.
As above, we’ll assume that the initial Markov state is state 1.
In addition, to simplify our example, we’ll set the government’s initial asset level to 0, so that
𝑏1 = 0.
Here’s our code to compute a quantitative example with zero debt in peace time:
In [7]: # Parameters
β = .96
y = [1.0, 2.0]
b0 = 0.0
P = [0.8 0.2;
0.4 0.6]
cp = ConsumptionProblem(β, y, b0, P)
Q = β * P
N_simul = 150
c, b1, b2 = consumption_complete(cp)
debt_complete = [b1, b2]
println("P = $P")
println("Q = $Q")
println("Govt expenditures in peace and war = $y")
println("Constant tax collections = $c")
println("Govt assets in two states = $debt_complete")
msg = """
Now let's check the government's budget constraint in peace and war.
Our assumptions imply that the government always purchases 0 units of the
Arrow peace security.
"""
println(msg)
AS1 = Q[1, 2] * b2
println("Spending on Arrow war security in peace = $AS1")
AS2 = Q[2, 2] * b2
println("Spending on Arrow war security in war = $AS2")
println("\n")
println("Government tax collections plus asset levels in peace and war")
TB1 = c + b1
println("T+b in peace = $TB1")
TB2 = c + b2
println("T+b in war = $TB2")
println("\n")
println("Total government spending in peace and war")
G1= y[1] + AS1
G2 = y[2] + AS2
println("total govt spending in peace = $G1")
println("total govt spending in war = $G2")
println("\n")
println("Let's see ex post and ex ante returns on Arrow securities")
Π = 1 ./ Q # reciprocal(Q)
exret = Π
println("Ex post returns to purchase of Arrow securities = $exret")
exant = Π .* P
println("Ex ante returns to purchase of Arrow securities = $exant")
37.6.1 Explanation
In this example, the government always purchases 0 units of the Arrow security that pays off in peace time (Markov state 1).
But it purchases a positive amount of the security that pays off in war time (Markov state 2).
We recommend plugging the quantities computed above into the government budget con-
straints in the two Markov states and staring.
This is an example in which the government purchases insurance against the possibility that
war breaks out or continues
Consider now a variant in which peace (Markov state 1) is absorbing, so that the transition matrix is

$$P = \begin{bmatrix} 1 & 0 \\ 0.2 & 0.8 \end{bmatrix}$$

Also, start the system in Markov state 2 (war) with initial government assets $-10$, so that the government starts the war in debt and $b_2 = -10$.
Now we’ll use a setting like that in first lecture on the permanent income model.
In that model, there were
• incomplete markets: the consumer could trade only a single risk-free one-period bond
bearing gross one-period risk-free interest rate equal to 𝛽 −1
• the consumer’s exogenous nonfinancial income was governed by a linear state space
model driven by Gaussian shocks, the kind of model studied in an earlier lecture about
linear state space models
We’ll write down a complete markets counterpart of that model.
So now we’ll suppose that nonfinancial income is governed by the state space system
where 𝜙(⋅ | 𝜇, Σ) is a multivariate Gaussian distribution with mean vector 𝜇 and covariance
matrix Σ.
Let 𝑏(𝑥𝑡+1 ) be a vector of state-contingent debt due at 𝑡 + 1 as a function of the 𝑡 + 1 state
𝑥𝑡+1 .
Using the pricing function assumed in (15), the value at $t$ of $b(x_{t+1})$ is

$$\beta \int b(x_{t+1}) \, \phi(x_{t+1} \,|\, A x_t, C C') \, dx_{t+1} = \beta \, \mathbb{E}_t \, b_{t+1}$$
In the complete markets setting, the consumer faces a sequence of budget constraints
$$c_t + b_t = y_t + \beta \, \mathbb{E}_t \, b_{t+1}, \qquad t \geq 0$$

which can be solved forward to yield

$$b_t = \mathbb{E}_t \sum_{j=0}^{\infty} \beta^j (y_{t+j} - c_{t+j})$$
We assume as before that the consumer cares about the expected value of
$$\sum_{t=0}^{\infty} \beta^t u(c_t), \qquad 0 < \beta < 1$$
In the incomplete markets version of the model, we assumed that 𝑢(𝑐𝑡 ) = −(𝑐𝑡 − 𝛾)2 , so that
the above utility functional became
$$-\sum_{t=0}^{\infty} \beta^t (c_t - \gamma)^2, \qquad 0 < \beta < 1$$
But in the complete markets version, we can assume a more general form of utility function
that satisfies 𝑢′ > 0 and 𝑢″ < 0.
The first-order condition for the consumer's problem with complete markets and our assumption about Arrow securities prices is

$$\mathbb{E}_t \left[ u'(c_{t+1}) \right] = u'(c_t)$$

which implies that the consumer chooses a constant consumption plan $c_t = \bar c$ for all $t \geq 0$. Inserting this into the forward-solved budget constraint gives

$$b_t = \mathbb{E}_t \sum_{j=0}^{\infty} \beta^j (y_{t+j} - \bar c)$$
or
$$b_t = S_y (I - \beta A)^{-1} x_t - \frac{1}{1 - \beta} \, \bar c \qquad (16)$$

Evaluating this at $t = 0$ pins down $\bar c$:

$$\bar b_0 = S_y (I - \beta A)^{-1} x_0 - \frac{1}{1 - \beta} \, \bar c \qquad (17)$$
where 𝑏̄0 is an initial level of the consumer’s debt, specified as a parameter of the problem.
Thus, in the complete markets version of the consumption-smoothing model, 𝑐𝑡 = 𝑐,̄ ∀𝑡 ≥ 0
is determined by (17) and the consumer’s debt is a fixed function of the state 𝑥𝑡 described by
(16).
Here’s an example that shows how in this setting the availability of insurance against fluctu-
ating nonfinancial income allows the consumer completely to smooth consumption across time
and across states of the world.
# Debt
# (rm is assumed to be the resolvent matrix inv(I - β * A); its definition
#  was not preserved in the extracted text)
x_hist, y_hist = simulate(lss, T)
b_hist = (S_y * rm * x_hist .- cbar[1] / (1.0 - β))
N_simul = 150
# Define parameters
α, ρ1, ρ2 = 10.0, 0.9, 0.0
σ = 1.0
# N_simul = 1
# T = N_simul
A = [1.0 0.0 0.0;
α ρ1 ρ2;
0.0 1.0 0.0]
C = [0.0, σ, 0.0]
S_y = [1.0 1.0 0.0]
β, b0 = 0.95, -10.0
x0 = [1.0, α / (1 - ρ1), α / (1 - ρ1)]
# Consumption plots
plt_cons = plot(title = "Cons and income", xlabel = "Periods", ylim = [-5.0, 110])
# Debt plots
plt_debt = plot(title = "Debt and income", xlabel = "Periods")
plot!(plt_debt, 1:N_simul, b_hist_com, label = "debt", lw = 2)
plot!(plt_debt, 1:N_simul, y_hist_com, label = "Income",
The incomplete markets version of the model with nonfinancial income being governed by a
linear state space system is described in the first lecture on the permanent income model and
the followup lecture on the permanent income model.
In that version, consumption follows a random walk and the consumer’s debt follows a pro-
cess with a unit root.
We leave it to the reader to apply the usual isomorphism to deduce the corresponding impli-
cations for a tax-smoothing model like Barro’s [8].
Chapter 38

Optimal Savings III: Occasionally Binding Constraints

38.1 Contents
• Overview 38.2
• The Optimal Savings Problem 38.3
• Computation 38.4
• Exercises 38.5
• Solutions 38.6
38.2 Overview
Next we study an optimal savings problem for an infinitely lived consumer—the “common
ancestor” described in [68], section 1.3.
This is an essential sub-problem for many representative macroeconomic models
• [1]
• [56]
• etc.
It is related to the decision problem in the stochastic optimal growth model and yet differs in
important ways.
For example, the choice problem for the agent includes an additive income term that leads to
an occasionally binding constraint.
Our presentation of the model will be relatively brief.
• For further details on economic intuition, implication and models, see [68]
• Proofs of all mathematical results stated below can be found in this paper
To solve the model we will use Euler equation based time iteration, similar to this lecture.
This method turns out to be
• Globally convergent under mild assumptions, even when utility is unbounded (both
above and below).
• More efficient numerically than value function iteration.
38.2.1 References
Other useful references include [20], [22], [63], [87], [89] and [95].
Let’s write down the model and then discuss how to solve it.
38.3.1 Set Up
Consider a household that chooses a state-contingent consumption plan $\{c_t\}_{t \geq 0}$ to maximize

$$\mathbb{E} \sum_{t=0}^{\infty} \beta^t u(c_t)$$

subject to

$$c_t + a_{t+1} \leq R a_t + z_t, \qquad c_t \geq 0, \qquad a_t \geq -b, \qquad t = 0, 1, \ldots \qquad (1)$$

Here
• 𝛽 ∈ (0, 1) is the discount factor
• 𝑎𝑡 is asset holdings at time 𝑡, with ad-hoc borrowing constraint 𝑎𝑡 ≥ −𝑏
• 𝑐𝑡 is consumption
• 𝑧𝑡 is non-capital income (wages, unemployment compensation, etc.)
• 𝑅 ∶= 1 + 𝑟, where 𝑟 > 0 is the interest rate on savings
Non-capital income {𝑧𝑡 } is assumed to be a Markov process taking values in 𝑍 ⊂ (0, ∞) with
stochastic kernel Π.
This means that Π(𝑧, 𝐵) is the probability that 𝑧𝑡+1 ∈ 𝐵 given 𝑧𝑡 = 𝑧.
The expectation of $f(z_{t+1})$ given $z_t = z$ is written as

$$\int f(\acute z) \, \Pi(z, d\acute z)$$

We further assume that

1. $r > 0$ and $\beta R < 1$
2. $u$ is smooth, strictly increasing and strictly concave with $\lim_{c \to 0} u'(c) = \infty$ and $\lim_{c \to \infty} u'(c) = 0$
The asset space is [−𝑏, ∞) and the state is the pair (𝑎, 𝑧) ∈ 𝑆 ∶= [−𝑏, ∞) × 𝑍.
A feasible consumption path from $(a, z) \in S$ is a consumption sequence $\{c_t\}$ such that $\{c_t\}$ and its induced asset path $\{a_t\}$ satisfy

1. $(a_0, z_0) = (a, z)$
2. the feasibility constraints in (1), and
3. each $c_t$ is a function of only outcomes observed on or before date $t$

The meaning of the third point is just that consumption at time $t$ can only be a function of outcomes that have already been observed.
$$V(a, z) := \sup \, \mathbb{E} \left\{ \sum_{t=0}^{\infty} \beta^t u(c_t) \right\} \qquad (2)$$
where the supremum is over all feasible consumption paths from (𝑎, 𝑧).
An optimal consumption path from (𝑎, 𝑧) is a feasible consumption path from (𝑎, 𝑧) that at-
tains the supremum in (2).
To pin down such paths we can use a version of the Euler equation, which in the present setting is

$$u'(c_t) \geq \beta R \, \mathbb{E}_t \left[ u'(c_{t+1}) \right] \qquad (3)$$

and

$$u'(c_t) = \beta R \, \mathbb{E}_t \left[ u'(c_{t+1}) \right] \quad \text{whenever} \quad c_t < R a_t + z_t + b \qquad (4)$$
In essence, this says that the natural “arbitrage” relation 𝑢′ (𝑐𝑡 ) = 𝛽𝑅 𝔼𝑡 [𝑢′ (𝑐𝑡+1 )] holds when
the choice of current consumption is interior.
Interiority means that 𝑐𝑡 is strictly less than its upper bound 𝑅𝑎𝑡 + 𝑧𝑡 + 𝑏.
(The lower boundary case 𝑐𝑡 = 0 never arises at the optimum because 𝑢′ (0) = ∞)
When 𝑐𝑡 does hit the upper bound 𝑅𝑎𝑡 + 𝑧𝑡 + 𝑏, the strict inequality 𝑢′ (𝑐𝑡 ) > 𝛽𝑅 𝔼𝑡 [𝑢′ (𝑐𝑡+1 )]
can occur because 𝑐𝑡 cannot increase sufficiently to attain equality.
With some thought and effort, one can show that (3) and (4) are equivalent to

$$u'(c_t) = \max \left\{ \beta R \, \mathbb{E}_t \left[ u'(c_{t+1}) \right], \; u'(R a_t + z_t + b) \right\} \qquad (5)$$
1. For each (𝑎, 𝑧) ∈ 𝑆, a unique optimal consumption path from (𝑎, 𝑧) exists.
2. This path is the unique feasible path from $(a, z)$ satisfying the Euler equality (5) and the transversality condition

$$\lim_{t \to \infty} \beta^t \, \mathbb{E} \left[ u'(c_t) \, a_{t+1} \right] = 0 \qquad (6)$$
Moreover, there exists an optimal consumption function 𝑐∗ ∶ 𝑆 → [0, ∞) such that the path
from (𝑎, 𝑧) generated by
$$(a_0, z_0) = (a, z), \qquad z_{t+1} \sim \Pi(z_t, dy), \qquad c_t = c^*(a_t, z_t) \quad \text{and} \quad a_{t+1} = R a_t + z_t - c_t$$
satisfies both (5) and (6), and hence is the unique optimal path from (𝑎, 𝑧).
In summary, to solve the optimization problem, we need to compute 𝑐∗ .
38.4 Computation
We can rewrite (5) to make it a statement about functions rather than random variables.
In particular, consider the functional equation

$$u' \circ c \, (a, z) = \max \left\{ \gamma \int u' \circ c \, \{ R a + z - c(a, z), \acute z \} \, \Pi(z, d\acute z), \; u'(R a + z + b) \right\} \qquad (7)$$

where $\gamma := \beta R$.
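Here is a minimal sketch of one step of the time iteration (Coleman) operator implied by (7), for log utility $u(c) = \log c$ on a finite asset grid; the use of Roots.jl, the interpolation scheme, and the log-utility choice are all assumptions made for illustration:

using Roots

function coleman_step(c_old, agrid, z_vals, Π, R, β, b)
    c_new = similar(c_old)
    # linear interpolation/extrapolation of the candidate policy over assets
    cpol(a, jz) = begin
        j = clamp(searchsortedlast(agrid, a), 1, length(agrid) - 1)
        w = (a - agrid[j]) / (agrid[j+1] - agrid[j])
        (1 - w) * c_old[j, jz] + w * c_old[j+1, jz]
    end
    for (iz, z) in enumerate(z_vals), (ia, a) in enumerate(agrid)
        cmax = R * a + z + b     # upper bound on current consumption
        # Euler difference with u'(c) = 1/c; positive at cmax ⇒ corner binds
        h(ξ) = 1 / ξ - β * R * sum(Π[iz, jz] / cpol(R * a + z - ξ, jz)
                                   for jz in eachindex(z_vals))
        c_new[ia, iz] = h(cmax) > 0 ? cmax : find_zero(h, (1e-10, cmax))
    end
    return c_new
end

Starting from the initial condition $c(a, z) = Ra + z + b$ and applying coleman_step repeatedly approximates the optimal policy $c^*$.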
We have to be careful with VFI (i.e., iterating with 𝑇 ) in this setting because 𝑢 is not as-
sumed to be bounded
• In fact typically unbounded both above and below — e.g. 𝑢(𝑐) = log 𝑐.
• In which case, the standard DP theory does not apply.
• $T^n v$ is not guaranteed to converge to the value function for arbitrary continuous bounded $v$.
Nonetheless, we can always try the popular strategy “iterate and hope”.
We can then check the outcome by comparing with that produced by TI.
The latter is known to converge, as described above.
38.4.3 Implementation
Here’s the code for a named-tuple constructor called ConsumerProblem that stores primi-
tives, as well as
• a T function, which implements the Bellman operator 𝑇 specified above
• a K function, which implements the Coleman operator 𝐾 specified above
• an initialize function, which generates suitable initial conditions for iteration
38.4.4 Setup
# model
function ConsumerProblem(;r = 0.01,
                         β = 0.96,
                         Π = [0.6 0.4; 0.05 0.95],
                         z_vals = [0.5, 1.0],
                         b = 0.0,
                         grid_max = 16,
                         grid_size = 50)
    R = 1 + r
    asset_grid = range(-b, grid_max, length = grid_size)
    # (return statement reconstructed; the extracted text is cut off here)
    return (;r, R, β, b, Π, z_vals, asset_grid)
end
opt_lb = 1e-8
function obj(c)
        EV = dot(vf.(R * a + z - c, z_idx), Π[i_z, :])   # compute expectation
return u(c) + β * EV
end
res = maximize(obj, opt_lb, R .* a .+ z .+ b)
converged(res) || error("Didn't converge") # important to check
if ret_policy
out[i_a, i_z] = maximizer(res)
else
out[i_a, i_z] = maximum(res)
end
end
end
out
end
get_greedy!(cp, V, out) =
update_bellman!(cp, V, out, ret_policy = true)
get_greedy(cp, V) =
update_bellman(cp, V, ret_policy = true)
function initialize(cp)
# simplify names, set up arrays
@unpack R, β, b, asset_grid, z_vals = cp
shape = length(asset_grid), length(z_vals)
V, c = zeros(shape...), zeros(shape...)
# populate V and c
for (i_z, z) in enumerate(z_vals)
for (i_a, a) in enumerate(asset_grid)
c_max = R * a + z + b
c[i_a, i_z] = c_max
V[i_a, i_z] = u(c_max) / (1 - β)
end
end
return V, c
end
Both T and K use linear interpolation along the asset grid to approximate the value and con-
sumption functions.
The following exercises walk you through several applications where policy functions are com-
puted.
In exercise 1 you will see that while VFI and TI produce similar results, the latter is much
faster.
Intuition behind this fact was provided in a previous lecture on time iteration.
38.5 Exercises
38.5.1 Exercise 1
The first exercise is to replicate the following figure, which compares TI and VFI as solution
methods
In [4]: cp = ConsumerProblem()
v, c, = initialize(cp)
38.5.2 Exercise 2
38.5.3 Exercise 3
Now let’s consider the long run asset levels held by households.
We’ll take r = 0.03 and otherwise use default parameters.
The following figure is a 45 degree diagram showing the law of motion for assets when con-
sumption is optimal
The law of motion plotted there is

$$a' = h(a, z) := R a + z - c^*(a, z)$$
38.5.4 Exercise 4
Following on from exercises 2 and 3, let’s look at how savings and aggregate asset holdings
vary with the interest rate
• Note: [68] section 18.6 can be consulted for more background on the topic treated in
this exercise
For a given parameterization of the model, the mean of the stationary distribution can be in-
terpreted as aggregate capital in an economy with a unit mass of ex-ante identical households
facing idiosyncratic shocks.
Let’s look at how this measure of aggregate capital varies with the interest rate and borrow-
ing constraint.
The next figure plots aggregate capital against the interest rate for b in (1, 3)
38.6 Solutions
38.6.1 Exercise 1
In [8]: cp = ConsumerProblem()
N = 80
V, c = initialize(cp)
println("Starting value function iteration")
for i in 1:N
V = T(cp, V)
end
c1 = T(cp, V, ret_policy=true)
V2, c2 = initialize(cp)
println("Starting policy function iteration")
for i in 1:N
c2 = K(cp, c2)
end
38.6.2 Exercise 2
38.6.3 Exercise 3
# (the function header is a reconstructed assumption; the extracted text begins
#  mid-function, and c is the optimal consumption policy computed beforehand)
function compute_asset_series(cp, T = 250_000)
    @unpack Π, z_vals, R = cp
    cf = interp(cp.asset_grid, c)
    a = zeros(T + 1)
    z_seq = simulate(MarkovChain(Π), T)
    for t in 1:T
        i_z = z_seq[t]
        a[t+1] = R * a[t] + z_vals[i_z] - cf(a[t], i_z)
    end
    return a
end
38.6.4 Exercise 4
In [11]: M = 25
r_vals = range(0, 0.04, length = M)
xs = []
ys = []
legends = []
for b in [1.0, 3.0]
asset_mean = zeros(M)
for (i, r_val) in enumerate(r_vals)
cp = ConsumerProblem(r = r_val, b = b)
the_mean = mean(compute_asset_series(cp, 250_000))
asset_mean[i] = the_mean
end
xs = push!(xs, asset_mean)
ys = push!(ys, r_vals)
legends = push!(legends, "b = $b")
println("Finished iteration b = $b")
end
plot(xs, ys, label = reshape(legends, 1, length(legends)))
plot!(xlabel = "capital", ylabel = "interest rate", yticks = ([0, 0.045]))
plot!(legend = :bottomright)
Chapter 39
Robustness
39.1 Contents
• Overview 39.2
• The Model 39.3
• Constructing More Robust Policies 39.4
• Robustness as Outcome of a Two-Person Zero-Sum Game 39.5
• The Stochastic Case 39.6
• Implementation 39.7
• Application 39.8
• Appendix 39.9
39.2 Overview
This lecture modifies a Bellman equation to express a decision maker’s doubts about transi-
tion dynamics.
His specification doubts make the decision maker want a robust decision rule.
Robust means insensitive to misspecification of transition dynamics.
The decision maker has a single approximating model.
He calls it approximating to acknowledge that he doesn’t completely trust it.
He fears that outcomes will actually be determined by another model that he cannot describe
explicitly.
All that he knows is that the actual data-generating model is in some (uncountable) set of
models that surrounds his approximating model.
He quantifies the discrepancy between his approximating model and the genuine data-
generating model by using a quantity called entropy.
(We’ll explain what entropy means below)
He wants a decision rule that will work well enough no matter which of those other models
actually governs outcomes.
This is what it means for his decision rule to be “robust to misspecification of an approximat-
ing model”.
Note
In reading this lecture, please don’t think that our decision maker is paranoid
when he conducts a worst-case analysis. By designing a rule that works well
against a worst-case, his intention is to construct a rule that will work well across
a set of models.
Our “robust” decision maker wants to know how well a given rule will work when he does not
know a single transition law ….
… he wants to know sets of values that will be attained by a given decision rule 𝐹 under a set
of transition laws.
Ultimately, he wants to design a decision rule 𝐹 that shapes these sets of values in ways that
he prefers.
With this in mind, consider the following graph, which relates to a particular decision prob-
lem to be explained below
Notice that the less robust rule 𝐹𝑟 promises higher values for small misspecifications (small
entropy).
(But it is more fragile in the sense that it is more sensitive to perturbations of the approxi-
mating model)
Below we’ll explain in detail how to construct these sets of values for a given 𝐹 , but for now
….
Here is a hint about the secret weapons we’ll use to construct these sets
If you want to understand more about why one serious quantitative researcher is interested in
this approach, we recommend Lars Peter Hansen’s Nobel lecture.
39.2.4 Setup
39.3 The Model

For simplicity, we present ideas in the context of a class of problems with linear transition laws and quadratic objective functions.
To fit in with our earlier lecture on LQ control, we will treat loss minimization rather than
value maximization.
To begin, recall the infinite horizon LQ problem, where an agent chooses a sequence of controls $\{u_t\}$ to minimize

$$\sum_{t=0}^{\infty} \beta^t \left\{ x_t' R x_t + u_t' Q u_t \right\} \qquad (1)$$

subject to the transition law

$$x_{t+1} = A x_t + B u_t + C w_{t+1}, \qquad t \geq 0 \qquad (2)$$
As before,
• 𝑥𝑡 is 𝑛 × 1, 𝐴 is 𝑛 × 𝑛
• 𝑢𝑡 is 𝑘 × 1, 𝐵 is 𝑛 × 𝑘
• 𝑤𝑡 is 𝑗 × 1, 𝐶 is 𝑛 × 𝑗
• 𝑅 is 𝑛 × 𝑛 and 𝑄 is 𝑘 × 𝑘
Here 𝑥𝑡 is the state, 𝑢𝑡 is the control, and 𝑤𝑡 is a shock vector.
For now we take $\{w_t\} := \{w_t\}_{t=1}^{\infty}$ to be deterministic — a single fixed sequence.
We also allow for model uncertainty on the part of the agent solving this optimization prob-
lem.
In particular, the agent takes 𝑤𝑡 = 0 for all 𝑡 ≥ 0 as a benchmark model, but admits the
possibility that this model might be wrong.
As a consequence, she also considers a set of alternative models expressed in terms of se-
quences {𝑤𝑡 } that are “close” to the zero sequence.
She seeks a policy that will do well enough for a set of alternative models whose members are
pinned down by sequences {𝑤𝑡 }.
Soon we’ll quantify the quality of a model specification in terms of the maximal size of the
∞
expression ∑𝑡=0 𝛽 𝑡+1 𝑤𝑡+1
′
𝑤𝑡+1 .
If our agent takes $\{w_t\}$ as a given deterministic sequence, then, drawing on intuition from earlier lectures on dynamic programming, we can anticipate Bellman equations such as

$$J(x) = \min_u \max_w \left\{ x' R x + u' Q u + \beta \left[ J(A x + B u + C w) - \theta w' w \right] \right\} \qquad (3)$$
The penalty parameter 𝜃 controls how much we penalize the maximizing agent for “harming”
the minimizing agent.
By raising 𝜃 more and more, we more and more limit the ability of maximizing agent to dis-
tort outcomes relative to the approximating model.
So bigger 𝜃 is implicitly associated with smaller distortion sequences {𝑤𝑡 }.
The inner maximization in (3) can be solved in closed form: if $J(x) = x' P x$ for a positive definite matrix $P$, the maximizing $w$ is $w = (\theta I - C' P C)^{-1} C' P (A x + B u)$, and the maximized value can be written using the operator

$$\mathcal{D}(P) := P + P C (\theta I - C' P C)^{-1} C' P$$

where $I$ is a $j \times j$ identity matrix. Substituting this expression for the maximum into (3) yields

$$P = \mathcal{B}(\mathcal{D}(P))$$
The operator ℬ is the standard (i.e., non-robust) LQ Bellman operator, and 𝑃 = ℬ(𝑃 ) is the
standard matrix Riccati equation coming from the Bellman equation — see this discussion.
Under some regularity conditions (see [39]), the operator ℬ ∘ 𝒟 has a unique positive definite
fixed point, which we denote below by 𝑃 ̂ .
A robust policy, indexed by $\theta$, is $u = -\hat F x$ where

$$\hat F := (Q + \beta B' \mathcal{D}(\hat P) B)^{-1} \beta B' \mathcal{D}(\hat P) A \qquad (7)$$

We also define

$$\hat K := (\theta I - C' \hat P C)^{-1} C' \hat P (A - B \hat F) \qquad (8)$$
The interpretation of $\hat K$ is that $w_{t+1} = \hat K x_t$ on the worst-case path of $\{x_t\}$, in the sense that this vector is the maximizer of (4) evaluated at the fixed rule $u = -\hat F x$.
Note that 𝑃 ̂ , 𝐹 ̂ , 𝐾̂ are all determined by the primitives and 𝜃.
Note also that if 𝜃 is very large, then 𝒟 is approximately equal to the identity mapping.
Hence, when 𝜃 is large, 𝑃 ̂ and 𝐹 ̂ are approximately equal to their standard LQ values.
Furthermore, when 𝜃 is large, 𝐾̂ is approximately equal to zero.
Conversely, smaller 𝜃 is associated with greater fear of model misspecification, and greater
concern for robustness.
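A minimal sketch of computing $\hat P$, $\hat F$, $\hat K$ by iterating $P \leftarrow \mathcal{B}(\mathcal{D}(P))$ follows; all parameter values are illustrative assumptions, and in practice one would use a tested implementation such as QuantEcon's RBLQ type:

using LinearAlgebra

A = [1.0 0.1; 0.0 1.0]
B = reshape([0.0, 1.0], 2, 1)
C = reshape([0.05, 0.05], 2, 1)
R = Matrix(1.0I, 2, 2)
Q = fill(1.0, 1, 1)
β, θ = 0.95, 2.0

D(P) = P + P * C * inv(θ * I - C' * P * C) * C' * P             # the 𝒟 operator
ℬ(P) = R + β * A' * P * A -
       β^2 * A' * P * B * inv(Q + β * B' * P * B) * B' * P * A   # standard LQ Bellman operator

P = zeros(2, 2)
for _ in 1:500
    global P = ℬ(D(P))                                           # P ← ℬ(𝒟(P))
end
F̂ = inv(Q + β * B' * D(P) * B) * β * B' * D(P) * A               # equation (7)
K̂ = inv(θ * I - C' * P * C) * C' * P * (A - B * F̂)               # equation (8)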
What we have done above can be interpreted in terms of a two-person zero-sum game in
which 𝐹 ̂ , 𝐾̂ are Nash equilibrium objects.
Agent 1 is our original agent, who seeks to minimize loss in the LQ program while admitting
the possibility of misspecification.
Agent 2 is an imaginary malevolent player.
Agent 2’s malevolence helps the original agent to compute bounds on his value function
across a set of models.
We begin with agent 2’s problem.
Agent 2
1. knows a fixed policy 𝐹 specifying the behavior of agent 1, in the sense that 𝑢𝑡 = −𝐹 𝑥𝑡
for all 𝑡
2. responds by choosing a shock sequence {𝑤𝑡 } from a set of paths sufficiently close to the
benchmark sequence {0, 0, 0, …}
A natural way to say "sufficiently close to the zero sequence" is to restrict the summed inner product $\sum_{t=1}^{\infty} w_t' w_t$ to be small.

However, to obtain a time-invariant recursive formulation, it turns out to be convenient to restrict a discounted inner product

$$\sum_{t=1}^{\infty} \beta^t w_t' w_t \leq \eta \qquad (9)$$
Now let 𝐹 be a fixed policy, and let 𝐽𝐹 (𝑥0 , w) be the present-value cost of that policy given
sequence w ∶= {𝑤𝑡 } and initial condition 𝑥0 ∈ ℝ𝑛 .
Substituting $-F x_t$ for $u_t$ in (1), this value can be written as

$$J_F(x_0, \mathbf{w}) := \sum_{t=0}^{\infty} \beta^t x_t' (R + F' Q F) x_t \qquad (10)$$
where

$$x_{t+1} = (A - B F) x_t + C w_{t+1} \qquad (11)$$

and $x_0$ is a given initial condition.

Agent 2 chooses $\mathbf{w}$ to maximize $J_F(x_0, \mathbf{w})$ subject to (9). Using a Lagrangian formulation, we can express this problem as

$$\max_{\mathbf{w}} \sum_{t=0}^{\infty} \beta^t \left\{ x_t' (R + F' Q F) x_t - \beta \theta (w_{t+1}' w_{t+1} - \eta) \right\}$$

where $\{x_t\}$ satisfies (11) and $\theta > 0$ is a Lagrange multiplier on constraint (9). For the moment, let's take $\theta$ as fixed, allowing us to drop the constant $\beta \theta \eta$ term in the objective and write the problem as

$$\max_{\mathbf{w}} \sum_{t=0}^{\infty} \beta^t \left\{ x_t' (R + F' Q F) x_t - \beta \theta \, w_{t+1}' w_{t+1} \right\}$$

or, equivalently,

$$\min_{\mathbf{w}} \sum_{t=0}^{\infty} \beta^t \left\{ -x_t' (R + F' Q F) x_t + \beta \theta \, w_{t+1}' w_{t+1} \right\} \qquad (12)$$
subject to (11).
What’s striking about this optimization problem is that it is once again an LQ discounted
dynamic programming problem, with w = {𝑤𝑡 } as the sequence of controls.
The expression for the optimal policy can be found by applying the usual LQ formula (see
here).
We denote it by 𝐾(𝐹 , 𝜃), with the interpretation 𝑤𝑡+1 = 𝐾(𝐹 , 𝜃)𝑥𝑡 .
The remaining step for agent 2’s problem is to set 𝜃 to enforce the constraint (9), which can
be done by choosing 𝜃 = 𝜃𝜂 such that
β ∑_{t=0}^∞ β^t x_t′ K(F, θ_η)′ K(F, θ_η) x_t = η    (13)
Here 𝑥𝑡 is given by (11) — which in this case becomes 𝑥𝑡+1 = (𝐴 − 𝐵𝐹 + 𝐶𝐾(𝐹 , 𝜃))𝑥𝑡 .
39.5.2 Using Agent 2’s Problem to Construct Bounds on the Value Sets
Define the minimized object on the right side of problem (12) as 𝑅𝜃 (𝑥0 , 𝐹 ).
Because “minimizers minimize” we have
R_θ(x_0, F) ≤ ∑_{t=0}^∞ β^t {−x_t′(R + F′QF)x_t} + βθ ∑_{t=0}^∞ β^t w_{t+1}′ w_{t+1},

and hence

R_θ(x_0, F) − θ ent ≤ ∑_{t=0}^∞ β^t {−x_t′(R + F′QF)x_t}    (14)
where
ent := β ∑_{t=0}^∞ β^t w_{t+1}′ w_{t+1}
The left side of inequality (14) is a straight line with slope −𝜃.
Technically, it is a “separating hyperplane”.
At a particular value of entropy, the line is tangent to the lower bound of values as a function
of entropy.
In particular, the lower bound on the left side of (14) is attained when

ent = β ∑_{t=0}^∞ β^t x_t′ K(F, θ)′ K(F, θ) x_t    (15)
To construct the lower bound on the set of values associated with all perturbations w satisfying the entropy constraint (9) at a given entropy level, we proceed as follows:

• Compute the minimum value R_θ(x_0, F) and the associated entropy using (15).
• Compute the lower bound on the value function R_θ(x_0, F) − θ ent and plot it against ent.
• Repeat the preceding steps for a range of values of θ to trace out the lower bound.
Note
This procedure sweeps out a set of separating hyperplanes indexed by different
values for the Lagrange multiplier 𝜃.
To construct an upper bound we use a very similar procedure: for θ̃ > 0, define

V_θ̃(x_0, F) = max_w ∑_{t=0}^∞ β^t {−x_t′(R + F′QF)x_t − βθ̃ w_{t+1}′ w_{t+1}}    (16)
Because maximizers maximize, we have

V_θ̃(x_0, F) ≥ ∑_{t=0}^∞ β^t {−x_t′(R + F′QF)x_t} − βθ̃ ∑_{t=0}^∞ β^t w_{t+1}′ w_{t+1}

and hence

V_θ̃(x_0, F) + θ̃ ent ≥ ∑_{t=0}^∞ β^t {−x_t′(R + F′QF)x_t}    (17)
where
ent ≡ β ∑_{t=0}^∞ β^t w_{t+1}′ w_{t+1}
The left side of inequality (17) is a straight line with slope 𝜃.̃
The upper bound on the left side of (17) is attained when
ent = β ∑_{t=0}^∞ β^t x_t′ K(F, θ̃)′ K(F, θ̃) x_t    (18)
To construct the upper bound on the set of values associated with all perturbations w with a given entropy we proceed much as we did for the lower bound.
Now in the interest of reshaping these sets of values by choosing 𝐹 , we turn to agent 1’s prob-
lem.
min_{u_t} ∑_{t=0}^∞ β^t {x_t′Rx_t + u_t′Qu_t − βθ w_{t+1}′ w_{t+1}}    (19)
Substituting w_{t+1} = Kx_t into this expression, agent 1's problem is equivalent to minimizing

∑_{t=0}^∞ β^t {x_t′(R − βθK′K)x_t + u_t′Qu_t}    (20)
subject to

x_{t+1} = (A + CK)x_t + Bu_t
Once again, the expression for the optimal policy can be found here — we denote it by 𝐹 ̃ .
Clearly the 𝐹 ̃ we have obtained depends on 𝐾, which, in agent 2’s problem, depended on an
initial policy 𝐹 .
Holding all other parameters fixed, we can represent this relationship as a mapping Φ, where
𝐹 ̃ = Φ(𝐾(𝐹 , 𝜃))
As you may have already guessed, the robust policy 𝐹 ̂ defined in (7) is a fixed point of the
mapping Φ.
In particular, for any given θ,

1. K(F̂, θ) = K̂
2. Φ(K̂) = F̂
Now we turn to the stochastic case, where the sequence {𝑤𝑡 } is treated as an iid sequence of
random vectors.
In this setting, we suppose that our agent is uncertain about the conditional probability distri-
bution of 𝑤𝑡+1 .
The agent takes the standard normal distribution 𝑁 (0, 𝐼) as the baseline conditional distribu-
tion, while admitting the possibility that other “nearby” distributions prevail.
These alternative conditional distributions of 𝑤𝑡+1 might depend nonlinearly on the history
𝑥𝑠 , 𝑠 ≤ 𝑡.
To implement this idea, we need a notion of what it means for one distribution to be near
another one.
Here we adopt a very useful measure of closeness for distributions known as the relative en-
tropy, or Kullback-Leibler divergence.
For densities 𝑝, 𝑞, the Kullback-Leibler divergence of 𝑞 from 𝑝 is defined as
D_{KL}(p, q) := ∫ ln[p(x)/q(x)] p(x) dx
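As a tiny illustration of the definition, here is its discrete analogue for two densities on a common grid; the function name and sample values below are ours.

# discrete analogue of the divergence above (an illustration, not library code)
D_KL(p, q) = sum(p .* log.(p ./ q))

p = [0.2, 0.5, 0.3]
q = [0.3, 0.4, 0.3]
D_KL(p, q) # ≈ 0.03, small because the two densities are close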
J(x) = min_u max_{ψ∈𝒫} {x′Rx + u′Qu + β [∫ J(Ax + Bu + Cw) ψ(dw) − θ D_{KL}(ψ, φ)]}    (22)
Here 𝒫 represents the set of all densities on ℝⁿ and φ is the benchmark distribution N(0, I).

The distribution ψ is chosen as the least desirable conditional distribution in terms of next period outcomes, while taking into account the penalty term θ D_{KL}(ψ, φ).
This penalty term plays a role analogous to the one played by the deterministic penalty 𝜃𝑤′ 𝑤
in (3), since it discourages large deviations from the benchmark.
The maximization problem in (22) appears highly nontrivial — after all, we are maximizing
over an infinite dimensional space consisting of the entire set of densities.
However, it turns out that the solution is tractable, and in fact also falls within the class of
normal distributions.
First, we note that 𝐽 has the form 𝐽 (𝑥) = 𝑥′ 𝑃 𝑥 + 𝑑 for some positive definite matrix 𝑃 and
constant real number 𝑑.
Moreover, it turns out that if (I − θ^{−1}C′PC)^{−1} is nonsingular, then
where
Substituting the expression for the maximum into Bellman equation (22) and using J(x) = x′Px + d gives

x′Px + d = min_u {x′Rx + u′Qu + β(Ax + Bu)′𝒟(P)(Ax + Bu) + β[d + κ(θ, P)]}    (25)
Since constant terms do not affect minimizers, the solution is the same as (6), leading to
To solve this Bellman equation, we take 𝑃 ̂ to be the positive definite fixed point of ℬ ∘ 𝒟.
In addition, we take d̂ as the real number solving d = β[d + κ(θ, P̂)], which is

d̂ := (β/(1 − β)) κ(θ, P̂)    (26)
The robust policy in this stochastic case is the minimizer in (25), which is once again 𝑢 =
−𝐹 ̂ 𝑥 for 𝐹 ̂ given by (7).
Substituting the robust policy into (24) we obtain the worst case shock distribution:

w_{t+1} ∼ N(K̂x_t, (I − θ^{−1}C′P̂C)^{−1})
Before turning to implementation, we briefly outline how to compute several other quantities
of interest.
One thing we will be interested in doing is holding a policy fixed and computing the dis-
counted loss associated with that policy.
So let 𝐹 be a given policy and let 𝐽𝐹 (𝑥) be the associated loss, which, by analogy with (22),
satisfies
Writing 𝐽𝐹 (𝑥) = 𝑥′ 𝑃𝐹 𝑥 + 𝑑𝐹 and applying the same argument used to derive (23) we get
and
d_F := (β/(1 − β)) κ(θ, P_F) = (β/(1 − β)) θ ln[det(I − θ^{−1}C′P_F C)^{−1}]    (27)
If you skip ahead to the appendix, you will be able to verify that −𝑃𝐹 is the solution to the
Bellman equation in agent 2’s problem discussed above — we use this in our computations.
39.7 Implementation
The QuantEcon.jl package provides a type called RBLQ for implementation of robust LQ opti-
mal control.
The code can be found on GitHub.
Here is a brief description of the methods of the type
• d_operator() and b_operator() implement 𝒟 and ℬ respectively
• robust_rule() and robust_rule_simple() both solve for the triple 𝐹 ̂ , 𝐾,̂ 𝑃 ̂ , as
described in equations (7) – (8) and the surrounding discussion
– robust_rule() is more efficient
– robust_rule_simple() is more transparent and easier to follow
• K_to_F() and F_to_K() solve the decision problems of agent 1 and agent 2 respec-
tively
• compute_deterministic_entropy() computes the left-hand side of (13)
• evaluate_F() computes the loss and entropy associated with a given policy — see
this discussion
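As a minimal sketch of typical usage, assuming the matrices Q, R, A, B, C and the scalars β, θ have been defined as in the application below:

using QuantEcon

rlq = RBLQ(Q, R, A, B, C, β, θ)
F̂, K̂, P̂ = robust_rule(rlq)           # the triple described in (7)-(8)
# F2, K2, P2 = robust_rule_simple(rlq) # slower, but easier to follow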
39.8 Application
Let us consider a monopolist similar to this one, but now facing model uncertainty.
The inverse demand function is 𝑝𝑡 = 𝑎0 − 𝑎1 𝑦𝑡 + 𝑑𝑡
where
d_{t+1} = ρ d_t + σ_d w_{t+1},   {w_t} ∼ iid N(0, 1)

The period return function for the monopolist is

r_t = p_t y_t − γ (y_{t+1} − y_t)²/2 − c y_t
Its objective is to maximize expected discounted profits, or, equivalently, to minimize 𝔼 ∑_{t=0}^∞ β^t (−r_t).
To form a linear regulator problem, we take the state and control to be
x_t = [1, y_t, d_t]′   and   u_t = y_{t+1} − y_t
With these definitions, the LQ matrices are

R = −[0 b 0; b −a₁ 1/2; 0 1/2 0]   and   Q = γ/2,   where b := (a₀ − c)/2
The matrices for the law of motion are

A = [1 0 0; 0 1 0; 0 0 ρ],   B = [0, 1, 0]′,   C = [0, 0, σ_d]′
The standard normal distribution for 𝑤𝑡 is understood as the agent’s baseline, with uncer-
tainty parameterized by 𝜃.
We compute value-entropy correspondences for two policies.
1. The no concern for robustness policy F₀, which is the ordinary LQ loss minimizer.
2. A moderate concern for robustness policy, computed below with θ = 0.002.
The code for producing the graph shown above, with blue being for the robust policy, is as
follows
using QuantEcon # provides RBLQ and evaluate_F

# model parameters
a_0 = 100
a_1 = 0.5
ρ = 0.9
σ_d = 0.05
β = 0.95
c = 2
γ = 50.0
θ = 0.002
ac = (a_0 - c) / 2.0

# Define LQ matrices
R = [0 ac 0;
     ac -a_1 0.5;
     0.0 0.5 0]
R = -R # For minimization
Q = Matrix([γ / 2.0]')
A = [1.0 0.0 0.0;
     0.0 1.0 0.0;
     0.0 0.0 ρ]
B = [0.0 1.0 0.0]'
C = [0.0 0.0 σ_d]'

## Functions
function evaluate_policy(θ, F)
    rlq = RBLQ(Q, R, A, B, C, β, θ)
    K_F, P_F, d_F, O_F, o_F = evaluate_F(rlq, F)
    x0 = [1.0 0.0 0.0]'
    value = -x0' * P_F * x0 .- d_F
    entropy = x0' * O_F * x0 .+ o_F
    return value[1], entropy[1] # return scalars
end

## Main
grid_size = 100 # number of θ values to sweep (value assumed; not shown in the extracted source)
data = zeros(grid_size, 2)
emax = 1.6e6
Can you explain the different shape of the value-entropy correspondence for the robust pol-
icy?
39.9 Appendix
We sketch the proof only of the first claim in this section, which is that, for any given 𝜃,
𝐾(𝐹 ̂ , 𝜃) = 𝐾,̂ where 𝐾̂ is as given in (8).
This is the content of the next lemma.
Lemma. If P̂ is the fixed point of the map ℬ ∘ 𝒟 and F̂ is the robust policy as given in (7), then

K(F̂, θ) = K̂
Proof: As a first step, observe that when 𝐹 = 𝐹 ̂ , the Bellman equation associated with the
LQ problem (11) – (12) is
(revisit this discussion if you don’t know where (29) comes from) and the optimal policy is
Using the definition of 𝒟, we can rewrite the right-hand side more simply as
Although it involves a substantial amount of algebra, it can be shown that the latter is just
𝑃̂ .
(Hint: Use the fact that 𝑃 ̂ = ℬ(𝒟(𝑃 ̂ )))
Chapter 40

Discrete State Dynamic Programming
40.1 Contents
• Overview 40.2
• Discrete DPs 40.3
• Solving Discrete DPs 40.4
• Example: A Growth Model 40.5
• Exercises 40.6
• Solutions 40.7
• Appendix: Algorithms 40.8
40.2 Overview
In this lecture we discuss a family of dynamic programming problems with the following fea-
tures:
2. an infinite horizon
3. discounted rewards
When a given model is not inherently discrete, it is common to replace it with a discretized
version in order to use discrete DP techniques.
This lecture covers
• the theory of dynamic programming in a discrete setting, plus examples and applica-
tions
• a powerful set of routines for solving discrete DPs from the QuantEcon code library
40.2.2 References
For background reading on dynamic programming and additional applications, see, for exam-
ple,
• [68]
• [53], section 3.5
• [86]
• [100]
• [92]
• [78]
• EDTC, chapter 5
40.3 Discrete DPs

Loosely speaking, a discrete DP is a maximization problem with an objective function of the form

𝔼 ∑_{t=0}^∞ β^t r(s_t, a_t)    (1)
where
• 𝑠𝑡 is the state variable
• 𝑎𝑡 is the action
• 𝛽 is a discount factor
• 𝑟(𝑠𝑡 , 𝑎𝑡 ) is interpreted as a current reward when the state is 𝑠𝑡 and the action chosen is
𝑎𝑡
Each pair (𝑠𝑡 , 𝑎𝑡 ) pins down transition probabilities 𝑄(𝑠𝑡 , 𝑎𝑡 , 𝑠𝑡+1 ) for the next period state
𝑠𝑡+1 .
Thus, actions influence not only current rewards but also the future time path of the state.
The essence of dynamic programming problems is to trade off current rewards vs favorable
positioning of the future state (modulo randomness).
Examples:
• consuming today vs saving and accumulating assets
• accepting a job offer today vs seeking a better one in the future
• exercising an option now vs waiting
40.3.1 Policies
The most fruitful way to think about solutions to discrete DP problems is to compare poli-
cies.
In general, a policy is a randomized map from past actions and states to current action.
In the setting formalized below, it suffices to consider so-called stationary Markov policies,
which consider only the current state.
In particular, a stationary Markov policy is a map 𝜎 from states to actions
• 𝑎𝑡 = 𝜎(𝑠𝑡 ) indicates that 𝑎𝑡 is the action to be taken in state 𝑠𝑡
It is known that, for any arbitrary policy, there exists a stationary Markov policy that domi-
nates it at least weakly.
• See section 5.5 of [86] for discussion and proofs.
In what follows, stationary Markov policies are referred to simply as policies.
The aim is to find an optimal policy, in the sense of one that maximizes (1).
Let’s now step through these ideas more carefully.
A discrete dynamic program consists of the following components:

1. A finite set of states S = {0, …, n − 1}

2. A finite set of feasible actions A(s) for each state s ∈ S, and a corresponding set of feasible state-action pairs

SA := {(s, a) ∣ s ∈ S, a ∈ A(s)}

3. A reward function r: SA → ℝ

4. A transition probability function Q: SA → Δ(S), where Δ(S) is the set of probability distributions over S

5. A discount factor β ∈ [0, 1)
We also use the notation 𝐴 ∶= ⋃𝑠∈𝑆 𝐴(𝑠) = {0, … , 𝑚 − 1} and call this set the action space.
A policy is a function 𝜎 ∶ 𝑆 → 𝐴.
Comments
• {𝑠𝑡 } ∼ 𝑄𝜎 means that the state is generated by stochastic matrix 𝑄𝜎
• See this discussion on computing expectations of Markov chains for an explanation of
the expression in (2)
Notice that we’re not really distinguishing between functions from 𝑆 to ℝ and vectors in ℝ𝑛 .
This is natural because they are in one to one correspondence.
Let 𝑣𝜎 (𝑠) denote the discounted sum of expected reward flows from policy 𝜎 when the initial
state is 𝑠.
To calculate this quantity we pass the expectation through the sum in (1) and use (2) to get
v_σ(s) = ∑_{t=0}^∞ β^t (Q_σ^t r_σ)(s)    (s ∈ S)
This function is called the policy value function for the policy 𝜎.
The optimal value function, or simply value function, is the function 𝑣∗ ∶ 𝑆 → ℝ defined by
(We can use max rather than sup here because the domain is a finite set)
A policy 𝜎 ∈ Σ is called optimal if 𝑣𝜎 (𝑠) = 𝑣∗ (𝑠) for all 𝑠 ∈ 𝑆.
Given any 𝑤 ∶ 𝑆 → ℝ, a policy 𝜎 ∈ Σ is called 𝑤-greedy if
As discussed in detail below, optimal policies are precisely those that are 𝑣∗ -greedy.
𝑇𝜎 𝑣 = 𝑟𝜎 + 𝛽𝑄𝜎 𝑣
Now that the theory has been set out, let’s turn to solution methods.
Code for solving discrete DPs is available in ddp.jl from the QuantEcon.jl code library.
It implements the three most important solution methods for discrete dynamic programs,
namely
• value function iteration
• policy function iteration
• modified policy function iteration
Let’s briefly review these algorithms and their implementation.
40.4.1 Value Function Iteration

Perhaps the most familiar method for solving all manner of dynamic programs is value function iteration.
This algorithm uses the fact that the Bellman operator 𝑇 is a contraction mapping with fixed
point 𝑣∗ .
Hence, iterative application of 𝑇 to any initial function 𝑣0 ∶ 𝑆 → ℝ converges to 𝑣∗ .
The details of the algorithm can be found in the appendix.
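As a rough illustration of the idea (our own sketch, not the internals of ddp.jl), here is one application of the Bellman operator for a reward array R (n × m) and a transition array Q (n × m × n) in the format described below:

# one Bellman step: returns the updated values and the greedy policy
function bellman_step(v, R, Q, β)
    n, m = size(R)
    v_new = similar(v)
    σ = zeros(Int, n)
    for s in 1:n
        vals = [R[s, a] + β * sum(Q[s, a, sp] * v[sp] for sp in 1:n) for a in 1:m]
        v_new[s], σ[s] = findmax(vals)
    end
    return v_new, σ
end

Iterating v, σ = bellman_step(v, R, Q, β) until ‖v_new − v‖ is small yields an approximation of v* and a greedy policy.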
40.4.2 Policy Function Iteration

This routine, also known as Howard's policy improvement algorithm, exploits more closely the particular structure of a discrete DP problem.
Each iteration consists of

1. A policy evaluation step that computes the value v_σ of a policy σ by solving the linear equation v = T_σ v.

2. A policy improvement step that computes a v_σ-greedy policy.
In the current setting policy iteration computes an exact optimal policy in finitely many iter-
ations.
• See theorem 10.2.6 of EDTC for a proof
The details of the algorithm can be found in the appendix.
40.4.3 Modified Policy Function Iteration

Modified policy iteration replaces the policy evaluation step in policy iteration with "partial policy evaluation".
The latter computes an approximation to the value of a policy 𝜎 by iterating 𝑇𝜎 for a speci-
fied number of times.
This approach can be useful when the state space is very large and the linear system in the
policy evaluation step of policy iteration is correspondingly difficult to solve.
The details of the algorithm can be found in the appendix.
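A minimal sketch (ours) of the partial evaluation step, using the operator T_σ v = r_σ + βQ_σ v introduced above:

# apply T_σ a fixed number of times instead of solving v = r_σ + β Q_σ v exactly
function partial_policy_eval(v, r_σ, Q_σ, β; k = 20)
    for _ in 1:k
        v = r_σ + β * Q_σ * v
    end
    return v
end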
s′ = a + U   where U ∼ U[0, …, B]

Q(s, a, s′) := { 1/(B + 1)   if a ≤ s′ ≤ a + B
                 0           otherwise              (3)
This information will be used to create an instance of DiscreteDP by passing the following
information
1. An 𝑛 × 𝑚 reward array 𝑅
2. An 𝑛 × 𝑚 × 𝑛 transition probability array 𝑄
3. A discount factor 𝛽
40.5.3 Setup
function transition_matrices(g)
@unpack B, M, α, β = g
u(c) = c^α
n = B + M + 1
m = M + 1
R = zeros(n, m)
Q = zeros(n, m, n)
for a in 0:M
Q[:, a + 1, (a:(a + B)) .+ 1] .= 1 / (B + 1)
for s in 0:(B + M)
R[s + 1, a + 1] = (a≤s ? u(s - a) : -Inf)
end
end
return (Q = Q, R = R)
end
In [5]: g = SimpleOG();
Q, R = transition_matrices(g);
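From here, a DiscreteDP instance can be created and solved; the results shown below come from a step of roughly this form, where PFI denotes policy function iteration and g.β is the discount factor from the SimpleOG named tuple:

ddp = DiscreteDP(R, Q, g.β)
results = solve(ddp, PFI)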
In case the preceding code was too concise, we can see a more verbose form
u(c) = c^α
n = B + M + 1
m = M + 1
R = fill(-Inf, n, m) # arrays are 1-based, but the state s and choice a can be 0;
                     # infeasible choices keep the value -Inf
for a in 0:M
for s in 0:(B + M)
if a <= s #i.e. if feasible
R[s + 1, a + 1] = u(s - a)
end
end
end
In [9]: fieldnames(typeof(results))
The most important attributes are v, the value function, and σ, the optimal policy
In [10]: results.v
In [11]: results.sigma .- 1
Out[11]: (a 16-element vector of optimal actions, one per state; the final entries are 3, 3, 4, 5, 5, 5, 5)
Here 1 is subtracted from results.sigma because we added 1 to each state and action to create
valid indices.
Since we’ve used policy iteration, these results will be exact unless we hit the iteration bound
max_iter.
Let’s make sure this didn’t happen
In [12]: results.num_iter
Out[12]: 3
In [13]: stationary_distributions(results.mc)[1]
std_2 = stationary_distributions(results_2.mc)[1]
In [16]: B = 10
M = 5
α = 0.5
β = 0.9
u(c) = c^α
n = B + M + 1
m = M + 1
s_indices = Int64[]
a_indices = Int64[]
Q = zeros(0, n)
R = zeros(0)
b = 1 / (B + 1)
for s in 0:(M + B)
for a in 0:min(M, s)
s_indices = [s_indices; s + 1]
a_indices = [a_indices; a + 1]
q = zeros(1, n)
q[(a + 1):((a + B) + 1)] .= b
Q = [Q; q]
R = [R; u(s-a)]
end
end
40.6 Exercises
In the stochastic optimal growth dynamic programming lecture, we solved a benchmark model that has an analytical solution, to check that we could replicate it numerically.
The exercise is to replicate this solution using DiscreteDP.
40.7 Solutions
40.7.1 Setup
Details of the model can be found in the lecture. As in the lecture, we let 𝑓(𝑘) = 𝑘𝛼 with
𝛼 = 0.65, 𝑢(𝑐) = log 𝑐, and 𝛽 = 0.95.
In [17]: α = 0.65
f(k) = k.^α
u_log(x) = log(x)
β = 0.95
Out[17]: 0.95
Here we want to solve a finite state version of the continuous state model above. We discretize the state space into a grid of size grid_size = 500, from 10⁻⁶ to grid_max = 2.
In [18]: grid_max = 2
grid_size = 500
grid = range(1e-6, grid_max, length = grid_size)
Out[18]: 1.0e-6:0.004008014028056112:2.0
We choose the action to be the amount of capital to save for the next period (the state is the
capital stock at the beginning of the period). Thus the state indices and the action indices
are both 1, …, grid_size. Action (indexed by) a is feasible at state (indexed by) s if and only if grid[a] < f(grid[s]) (zero consumption is not allowed because of the log utility).
Thus the Bellman equation is:
Out[19]: 118841
In [20]: R = u_log.(C[C.>0]);
for i in 1:L
Q[i, a_indices[i]] = 1
end
Out[23]: 10
Let us compare the solution of the discrete model with the exact solution of the original con-
tinuous model. Here’s the exact solution:
ab = α * β
c1 = (log(1 - α * β) + log(α * β) * α * β / (1 - α * β)) / (1 - β)
c2 = α / (1 - α * β)
v_star(k) = c1 + c2 * log(k)
c_star(k) = (1 - α * β) * k.^α
"continuous"])
Out[25]:
40.7. SOLUTIONS 741
They are barely distinguishable (although you can see the difference if you zoom).
Now let’s look at the discrete and exact policy functions for consumption.
These functions are again close, although some difference is visible and becomes more obvious
as you zoom. Here are some statistics:
Out[27]: 121.49819147053378
This is a big error, but most of the error occurs at the lowest gridpoint. Otherwise the fit is
reasonable:
Out[28]: 0.012681735127500815
Out[29]: true
Let's try different solution methods. The results below show that policy function iteration and modified policy function iteration are much faster than value function iteration.
In [32]: res1.num_iter
Out[32]: 294
In [33]: σ == res1.sigma
Out[33]: true
In [35]: res2.num_iter
Out[35]: 16
In [36]: σ == res2.sigma
Out[36]: true
ws = []
colors = []
w = w_init
for i in 0:n-1
w = bellman_operator(ddp, w)
push!(ws, w)
push!(colors, RGBA(0, 0, 0, i/n))
end
plot(grid,
w_init,
ylims = (-40, -20),
lw = 2,
xlims = extrema(grid),
label = "initial condition")
We next plot the consumption policies along the value iteration. First we write a function to generate and record the policies at given stages of iteration.
Finally, let us work on Exercise 2, where we plot the trajectories of the capital stock for three
different discount factors, 0.9, 0.94, and 0.98, with initial condition 𝑘0 = 0.1.
sample_size = 25
for β in discount_factors
ddp0.beta = β
res0 = solve(ddp0, PFI)
k_path_ind = simulate(res0.mc, sample_size, init=k_init_ind)
k_path = grid[k_path_ind.+1]
push!(k_paths, k_path)
push!(labels, "β = $β")
end
plot(k_paths,
xlabel = "time",
ylabel = "capital",
ylim = (0.1, 0.3),
lw = 2,
markershape = :circle,
label = reshape(labels, 1, length(labels)))
This appendix covers the details of the solution algorithms implemented for DiscreteDP.
We will make use of the following notions of approximate optimality:

• For ε > 0, v is called an ε-approximation of v* if ‖v − v*‖ < ε
• A policy σ ∈ Σ is called ε-optimal if v_σ is an ε-approximation of v*
The DiscreteDP value iteration method implements value function iteration as follows

1. Choose any initial value function v₀ and specify ε > 0; set i = 0.
2. Compute v_{i+1} = T v_i.
3. If ‖v_{i+1} − v_i‖ < [(1 − β)/(2β)]ε, then go to step 4; otherwise, set i = i + 1 and go to step 2.
4. Compute a v_{i+1}-greedy policy σ, and return v_{i+1} and σ.

The DiscreteDP policy iteration method runs as follows

1. Choose any initial policy σ₀; set i = 0.
2. Compute the value v_{σᵢ}.
3. Compute a v_{σᵢ}-greedy policy σ_{i+1}.
4. If σ_{i+1} = σ_i, then return v_{σᵢ} and σ_{i+1}; otherwise, set i = i + 1 and go to step 2.
Given 𝜀 > 0, provided that 𝑣0 is such that 𝑇 𝑣0 ≥ 𝑣0 , the modified policy iteration algorithm
terminates in a finite number of iterations.
It returns an 𝜀/2-approximation of the optimal value function and an 𝜀-optimal policy func-
tion (unless iter_max is reached).
See also the documentation for DiscreteDP.
Part V
Chapter 41

Modeling COVID 19 with Differential Equations
41.1 Contents
• Overview 41.2
• The SEIR Model 41.3
• Implementation 41.4
• Experiments 41.5
• Ending Lockdown 41.6
41.2 Overview
• how long the caseload can be deferred (hopefully until a vaccine arrives)
41.2.1 Setup
In addition, we will be exploring the Ordinary Differential Equations package within the
SciML ecosystem.
In the version of the SEIR model we consider, all individuals in the population are assumed to be in one of a finite number of states.
The states are: susceptible (S), exposed (E), infected (I) and removed (R).
This type of compartmentalized model has many extensions (e.g. SEIRS relaxes lifetime immunity and allows transitions from R → S).
Comments:
• Those in state R have been infected and either recovered or died. Note that in some
variations, R may refer only to recovered agents.
• Those who have recovered, and live, are assumed to have acquired immunity.
• Those in the exposed group are not yet infectious.
Within the SEIR model, the flow across states follows the path 𝑆 → 𝐸 → 𝐼 → 𝑅.
We will ignore birth and non-covid death during our time horizon, and assume a large, con-
stant, number of individuals of size 𝑁 throughout.
With this, the symbols 𝑆, 𝐸, 𝐼, 𝑅 are used for the total number of individuals in each state at
each point in time, and 𝑆(𝑡) + 𝐸(𝑡) + 𝐼(𝑡) + 𝑅(𝑡) = 𝑁 for all 𝑡.
Since we have assumed that 𝑁 is large, we can use a continuum approximation for the num-
ber of individuals in each state.
The transitions between those states are governed by the following rates
• 𝛽(𝑡) is called the transmission rate or effective contact rate (the rate at which individu-
als bump into others and expose them to the virus).
• 𝜎 is called the infection rate (the rate at which those who are exposed become infected)
• 𝛾 is called the recovery rate (the rate at which infected people recover or die)
The rate 𝛽(𝑡) is influenced by both the characteristics of the disease (e.g. the type and length
of prolonged contact required for a transmission) and behavior of the individuals (e.g. social
distancing, hygiene).
The SEIR model can then be written as
dS/dt = −β S I/N

dE/dt = β S I/N − σE

dI/dt = σE − γI    (1)

dR/dt = γI
Here, 𝑑𝑦/𝑑𝑡 represents the time derivative for the particular variable.
The first term of (1), −β S I/N, is the flow of individuals moving from S → E, and highlights the underlying dynamics of the epidemic
• Individuals in the susceptible state (S) have a rate 𝛽(𝑡) of prolonged contacts with other
individuals where transmission would occur if either was infected
• Of these contacts, a fraction I(t)/N will be with infected agents (since we assumed that exposed individuals are not yet infectious)
• Finally, there are 𝑆(𝑡) susceptible individuals.
• The sign indicates that the product of those terms is the outflow from the 𝑆 state, and
an inflow to the 𝐸 state.
If 𝛽 was constant, then we could define 𝑅0 ∶= 𝛽/𝛾. This is the famous basic reproduction
number for the SEIR model. See [51] for more details.
When the transmission rate is time-varying, we will follow notation in [31] and refer to 𝑅0 (𝑡)
as a time-varying version of the basic reproduction number.
Analyzing the system in (1) provides some intuition on the 𝑅0 (𝑡) ∶= 𝛽(𝑡)/𝛾 expression:
• Individual transitions from the infected to removed state occur at a Poisson rate 𝛾, the
expected time in the infected state is 1/𝛾
• Prolonged interactions occur at rate 𝛽, so a new individual entering the infected state
will potentially transmit the virus to an average of 𝑅0 = 𝛽 × 1/𝛾 others
• In more complicated models, see [51] for a formal definition for arbitrary models, and an
analysis on the role of 𝑅0 < 1.
Note that the notation R₀ is standard in the epidemiology literature - though confusing, since R₀ is unrelated to R, the symbol that represents the removed state. For the remainder of the lecture, we will avoid using R for the removed state.
Prior to solving the model directly, we make a few changes to (1)
• Re-parameterize using 𝛽(𝑡) = 𝛾𝑅0 (𝑡)
• Define the proportion of individuals in each state as 𝑠 ∶= 𝑆/𝑁 etc.
• Divide each equation in (1) by 𝑁 , and write the system of ODEs in terms of the pro-
portions
ds/dt = −γ R₀ s i

de/dt = γ R₀ s i − σe

di/dt = σe − γi    (2)

dr/dt = γi
Since the states form a partition, we could reconstruct the “removed” fraction of the popula-
tion as 𝑟 = 1 − 𝑠 − 𝑒 − 𝑖. However, keeping it in the system will make plotting more convenient.
41.3.3 Implementation
We begin by implementing a simple version of this model with a constant 𝑅0 and some base-
line parameter values (which we discuss later).
First, define the system of equations
Given this system, we choose an initial condition and a timespan, and create a ODEProblem
encapsulating the system.
With this, choose an ODE algorithm and solve the initial value problem. A good default algorithm for non-stiff ODEs of this sort might be Tsit5(), which is the Tsitouras 5/4 Runge-Kutta method.
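Putting the three steps together, a minimal sketch consistent with system (2) might look as follows; the function name F_simple and the 350-day horizon are our choices, while the parameter values match the baseline discussed below.

using OrdinaryDiffEq

function F_simple(x, p, t)
    s, e, i, r = x
    σ, γ, R₀ = p
    return [-γ * R₀ * s * i,         # ds/dt
            γ * R₀ * s * i - σ * e,  # de/dt
            σ * e - γ * i,           # di/dt
            γ * i]                   # dr/dt
end

p = (1 / 5.2, 1 / 18, 1.6) # (σ, γ, R₀)
i_0 = 1e-7
e_0 = 4.0 * i_0
x_0 = [1.0 - i_0 - e_0, e_0, i_0, 0.0]

prob = ODEProblem(F_simple, x_0, (0.0, 350.0), p)
sol = solve(prob, Tsit5())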
We did not provide either a set of time steps or a dt time step size to the solve. Most accurate and high-performance ODE solvers appropriate for this problem use adaptive time-stepping, changing the step size based on the degree of curvature in the derivatives.
Or, as an alternative visualization, the proportions in each state over time
In [7]: areaplot(sol.t, sol', labels = ["s" "e" "i" "r"], title = "SEIR Proportions")
While maintaining the core system of ODEs in (𝑠, 𝑒, 𝑖, 𝑟), we will extend the basic model to
enable some policy experiments and calculations of aggregate values.
First, we can consider some additional calculations such as the cumulative caseload (i.e., all those who have or have had the infection) as c = i + r. Differentiating that expression and substituting from the time-derivatives of i(t) and r(t) yields dc/dt = σe.
We will assume that the transmission rate follows a process with a reversion to a value R̄₀(t) which could conceivably be influenced by policy. The intuition is that even if the targeted R̄₀(t) was changed through social distancing etc., lags in behavior and implementation would smooth out the transition, where η governs the speed at which R₀(t) moves towards R̄₀(t).
dR₀/dt = η(R̄₀ − R₀)    (3)
Finally, let 𝛿 be the mortality rate, which we will leave constant. The cumulative deaths can
be integrated through the flow 𝛾𝑖 entering the “Removed” state.
Define the cumulative number of deaths as 𝐷(𝑡) with the proportion 𝑑(𝑡) ∶= 𝐷(𝑡)/𝑁 .
(d/dt) d(t) = δγi    (4)
While we could integrate the deaths given the solution to the model ex-post, it is more convenient to use the integrator built into the ODE solver. That is, we add (d/dt)d(t) to the system of equations rather than calculating d(t) = ∫₀ᵗ δγ i(τ) dτ ex-post.
This is a common trick when solving systems of ODEs. While equivalent in principle to using the appropriate quadrature scheme, this becomes especially convenient when adaptive time-stepping algorithms are used to solve the ODEs (i.e. there is not a regular time grid). Note that when doing so, d(0) = ∫₀⁰ δγ i(τ) dτ = 0 is the initial condition.
The system (2) and the supplemental equations can be written in vector form 𝑥 ∶=
[𝑠, 𝑒, 𝑖, 𝑟, 𝑅0 , 𝑐, 𝑑] with parameter tuple 𝑝 ∶= (𝜎, 𝛾, 𝜂, 𝛿, 𝑅̄ 0 (⋅))
Note that in those parameters, the targeted reproduction number, 𝑅̄ 0 (𝑡), is an exogenous
function.
The model is then dx/dt = F(x, t) where

F(x, t) := [ −γ R₀ s i,
             γ R₀ s i − σe,
             σe − γi,
             γi,
             η(R̄₀(t) − R₀),
             σe,
             δγi ]    (5)
41.3.5 Parameters
The parameters, 𝜎, 𝛿, and 𝛾 should be thought of as parameters determined from biology and
medical technology, and independent of social interactions.
As in Atkeson’s note, we set
• 𝜎 = 1/5.2 to reflect an average incubation period of 5.2 days.
• 𝛾 = 1/18 to match an average illness duration of 18 days.
• 𝑅̄ 0 (𝑡) = 𝑅0 (0) = 1.6 to match a basic reproduction number of 1.6, and initially
time-invariant
• 𝛿 = 0.01 for a one-percent mortality rate
As we will initially consider the case where 𝑅0 (0) = 𝑅̄ 0 (0), the parameter 𝜂 will not influence
the first experiment.
41.4 Implementation
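Here is a sketch of the system function implied by (5); it assumes Parameters.jl's @unpack, and a parameter tuple p carrying σ, γ, η, δ along with the function R̄₀.

using Parameters

function F(x, p, t)
    s, e, i, r, R₀, c, d = x
    @unpack σ, γ, η, δ, R̄₀ = p

    return [-γ * R₀ * s * i,         # ds/dt
            γ * R₀ * s * i - σ * e,  # de/dt
            σ * e - γ * i,           # di/dt
            γ * i,                   # dr/dt
            η * (R̄₀(t, p) - R₀),    # dR₀/dt
            σ * e,                   # dc/dt
            δ * γ * i]               # dd/dt
end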
This function takes the vector x of states in the system and extracts the fixed parameters
passed into the p object.
The only confusing part of the notation is the R̄₀(t, p), which evaluates the p.R̄₀ at this time (and also allows it to depend on the p parameter).
41.4.1 Parameters
The baseline parameters are put into a named tuple generator (see previous lectures using
Parameters.jl) with default values discussed above.
Note that the default R̄₀(t) function always equals R₀ⁿ, a parameterizable natural level of R₀ used only by the R̄₀ function.
Setting initial conditions, we choose a fixed 𝑠, 𝑖, 𝑒, 𝑟, as well as 𝑅0 (0) = 𝑅0𝑛 and 𝑚(0) = 0.01
i_0 = 1E-7
e_0 = 4.0 * i_0
s_0 = 1.0 - i_0 - e_0
The tspan of (0.0, p.T) determines the 𝑡 used by the solver. The time scale needs to be
consistent with the arrival rate of the transition probabilities (i.e. the 𝛾, 𝜎 were chosen based
on daily data, so the unit of 𝑡 is a day).
The time period we investigate will be 550 days, or around 18 months:
41.5 Experiments
length(sol.t) = 45
We see that the adaptive time-stepping used approximately 45 time-steps to solve this prob-
lem to the desired accuracy. Evaluating the solver at points outside of those time-steps uses
an interpolator consistent with the solution to the ODE.
While it may seem that 45 time intervals is extremely small for that range, for much of the
𝑡, the functions are very flat - and hence adaptive time-stepping algorithms can move quickly
and interpolate accurately.
The solution object has built in plotting support.
Here we chose saveat=0.5 to get solutions that were evenly spaced every 0.5.
Changing the saved points is just a question of storage/interpolation, and does not change
the adaptive time-stepping of the solvers.
Let’s plot current cases as a fraction of the population.
Let’s look at a scenario where mitigation (e.g., social distancing) is successively imposed, but
the target (maintaining 𝑅0𝑛 ) is fixed.
To do this, we start with R₀(0) ≠ R₀ⁿ and examine the dynamics using the dR₀/dt = η(R₀ⁿ − R₀) ODE.
In the simple case, where R̄₀(t) = R₀ⁿ is independent of the state, the solution to the ODE given an initial condition is R₀(t) = R₀(0)e^{−ηt} + R₀ⁿ(1 − e^{−ηt}).
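This closed form is easy to code directly; the function name and default values below are ours, chosen to match the experiment that follows.

R₀_path(t; R₀_0 = 3.0, R₀_n = 1.6, η = 1 / 20) =
    R₀_0 * exp(-η * t) + R₀_n * (1 - exp(-η * t))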
We will examine the case where 𝑅0 (0) = 3 and then it falls to 𝑅0𝑛 = 1.6 due to the progres-
sive adoption of stricter mitigation measures.
The parameter η controls the rate, or the speed at which restrictions are imposed.
We consider several different rates:
Let’s calculate the time path of infected people, current cases, and mortality
Now let’s plot the number of infected persons and the cumulative number of infected persons:
The following is inspired by additional results by Andrew Atkeson on the timing of lifting
lockdown.
Consider these two mitigation scenarios:
1. choose 𝑅̄ 0 (𝑡) to target 𝑅0 (𝑡) = 0.5 for 30 days and then 𝑅0 (𝑡) = 2 for the remaining 17
months. This corresponds to lifting lockdown in 30 days.
2. 𝑅0 (𝑡) = 0.5 for 120 days and then 𝑅0 (𝑡) = 2 for the remaining 14 months. This corre-
sponds to lifting lockdown in 4 months.
For both of these, we will choose a large 𝜂 to focus on the case where rapid changes in the
lockdown policy remain feasible.
The parameters considered here start the model with 25,000 active infections and 75,000
agents already exposed to the virus and thus soon to be contagious.
# initial conditions
i_0 = 25000 / p_early.N
e_0 = 75000 / p_early.N
s_0 = 1.0 - i_0 - e_0

x_0 = [s_0, e_0, i_0, 0.0, R₀_L, 0.0, 0.0] # start in lockdown

# create two problems, with rapid movement of R₀(t) towards R̄₀(t)
prob_early = ODEProblem(F, x_0, tspan, p_early)
prob_late = ODEProblem(F, x_0, tspan, p_late)
Unlike the previous examples, the R̄₀(t) functions have discontinuities. We can improve the efficiency of the adaptive time-stepping methods by telling them to include a step exactly at those points by using tstops.
Let’s calculate the paths:
Next we examine the daily deaths, dD(t)/dt = N δγ i(t).
Pushing the peak of the curve further into the future may reduce cumulative deaths if a vaccine is found, or allow health authorities to better smooth the caseload.
41.6.1 Randomness
Despite its richness, the model above is fully deterministic. The policy 𝑅̄ 0 (𝑡) could change
over time, but only in predictable ways.
One way that randomness can lead to aggregate fluctuations is the granularity that comes through the discreteness of individuals. This topic, the connection between SDEs and the Langevin equations typically used in the approximation of chemical reactions in well-mixed media, is explored in further lectures on continuous time Markov chains.
Instead, in the next lecture, we will concentrate on randomness that comes from aggregate
changes in behavior or policy.
Chapter 42

Modeling Shocks in COVID 19 with Stochastic Differential Equations
42.1 Contents
• Overview 42.2
• The Basic SIR/SIRD Model 42.3
• Introduction to SDEs 42.4
• Ending Lockdown 42.5
• Reinfection 42.6
42.2 Overview
1. Wiener processes (also known as Brownian motion), which lead to diffusion equations and are the only continuous-time Lévy processes with continuous paths

2. Poisson processes, with an arrival rate of jumps in the variable

Every other Lévy process can be represented by these building blocks (e.g. a diffusion process such as geometric Brownian motion is a transformation of a Wiener process, a jump diffusion is a diffusion process with a Poisson arrival of jumps, and a continuous-time Markov chain (CTMC) is a Poisson process jumping between a finite number of states).
In this lecture, we will examine shocks driven by transformations of Brownian motion, as the
prototypical Stochastic Differential Equation (SDE).
42.2.2 Setup
In addition, we will be exploring packages within the SciML ecosystem and others covered in
previous lectures
To demonstrate another common compartmentalized model we will change the previous SEIR
model to remove the exposed state, and more carefully manage the death state, D.
The states are now: susceptible (S), infected (I), resistant (R), or dead (D).
Comments:
• Unlike the previous SEIR model, the R state is only for those recovered, alive, and cur-
rently resistant.
• As before, we start by assuming that those who have recovered have acquired immunity.
• Later, we could consider transitions from R to S if resistance is not permanent due to
virus mutation, etc.
See the previous lecture, for a more detailed development of the model.
• 𝛽(𝑡) is called the transmission rate or effective contact rate (the rate at which individu-
als bump into others and expose them to the virus)
• 𝛾 is called the resolution rate (the rate at which infected people recover or die)
• 𝛿(𝑡) ∈ [0, 1] is the death probability
ds = −γ R₀ s i dt

di = (γ R₀ s i − γi) dt    (1)

dr = (1 − δ)γi dt

dd = δγi dt
Note that the notation has changed to heuristically put the 𝑑𝑡 on the right hand side, which
will be used when adding the stochastic shocks.
We start by extending our model to include randomness in 𝑅0 (𝑡) and then the mortality rate
𝛿(𝑡).
The result is a system of Stochastic Differential Equations (SDEs).
As before, we assume that the basic reproduction number, R₀(t), follows a process with a reversion to a value R̄₀(t) which could conceivably be influenced by policy. The intuition is that even if the targeted R̄₀(t) was changed through social distancing etc., lags in behavior and implementation would smooth out the transition, where η governs the speed at which R₀(t) moves towards R̄₀(t).
Beyond changes in policy, randomness in 𝑅0 (𝑡) may come from shocks to the 𝛽(𝑡) process.
For example,
• Misinformation on Facebook spreading non-uniformly.
• Large political rallies, elections, or protests.
• Deviations in the implementation and timing of lockdown policy between demographics,
locations, or businesses within the system.
• Aggregate shocks in opening/closing industries.
To implement these sorts of randomness, we will add on a diffusion term with an instanta-
neous volatility of 𝜎√𝑅0 .
• This equation is used in the Cox-Ingersoll-Ross and Heston models of interest rates and
stochastic volatility.
• The scaling by the √R₀ ensures that the process stays weakly positive. The heuristic explanation is that the variance of the shocks converges to zero as R₀ goes to zero, enabling the upwards drift to dominate.
• See here for a heuristic description of when the process is weakly and strictly positive.
The notation for this SDE is then

dR_{0t} = η(R̄_{0t} − R_{0t})dt + σ√(R_{0t}) dW_t    (2)

where W_t is standard Brownian motion.
Heuristically, if 𝜎 = 0, divide this equation by 𝑑𝑡 and it nests the original ODE used in the
previous lecture.
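To build intuition for this process, here is a simple Euler-Maruyama discretization of the R₀ SDE; this is our own illustration, not the solver used below, and all names and default values are ours.

function simulate_R₀(R₀_init; R̄₀ = 1.6, η = 1 / 20, σ = 0.03, T = 550.0, dt = 0.5)
    ts = 0.0:dt:T
    R = zeros(length(ts))
    R[1] = R₀_init
    for k in 2:length(ts)
        dW = sqrt(dt) * randn()                       # Brownian increment
        drift = η * (R̄₀ - R[k - 1]) * dt
        diffusion = σ * sqrt(max(R[k - 1], 0.0)) * dW # √R₀ scaling keeps R₀ ≥ 0
        R[k] = R[k - 1] + drift + diffusion
    end
    return ts, R
end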
While we do not consider any calibration for the 𝜎 parameter, empirical studies such as Es-
timating and Simulating a SIRD Model of COVID-19 for Many Countries, States, and Cities
(Figure 6) show highly volatile 𝑅0 (𝑡) estimates over time.
Even after lockdowns are first implemented, we see variation between 0.5 and 1.5. Since coun-
tries are made of interconnecting cities with such variable contact rates, a high 𝜎 seems rea-
sonable both intuitively and empirically.
Unlike the previous lecture, we will build up towards mortality rates which change over time.
Imperfect mixing of different demographic groups could lead to aggregate shocks in mortality
(e.g. if a retirement home is afflicted vs. an elementary school). These sorts of relatively small
changes might be best modeled as a continuous path process.
Let 𝛿(𝑡) be the mortality rate and in addition,
• Assume that the base mortality rate is 𝛿,̄ which acts as the mean of the process, revert-
ing at rate 𝜃. In more elaborate models, this could be time-varying.
• The diffusion term has a volatility ξ√(δ(1 − δ)).
• As the process gets closer to either 𝛿 = 1 or 𝛿 = 0, the volatility goes to 0, which acts as
a force to allow the mean reversion to keep the process within the bounds
• Unlike the well-studied Cox-Ingersoll-Ross model, we make no claims on the long-run
behavior of this process, but will be examining the behavior on a small timescale so this
is not an issue.
Given this, the stochastic process for the mortality rate is

dδ_t = θ(δ̄ − δ_t)dt + ξ√(δ_t(1 − δ_t)) dW_t    (3)
The system (1) can be written in vector form x := [s, i, r, d, R₀, δ] with parameter tuple p := (γ, η, σ, θ, ξ, R̄₀(⋅), δ̄).
The general form of the SDE is

dx_t = F(x_t, t; p)dt + G(x_t, t; p) dW_t
F(x, t; p) := [ −γ R₀ s i,
                γ R₀ s i − γi,
                (1 − δ)γi,
                δγi,
                η(R̄₀(t) − R₀),
                θ(δ̄ − δ) ]    (4)
Here, it is convenient but not necessary for 𝑑𝑊 to have the same dimension as 𝑥. If so, then
we can use a square matrix 𝐺(𝑥, 𝑡; 𝑝) to associate the shocks with the appropriate 𝑥 (e.g. di-
agonal noise, or using a covariance matrix).
As the two independent sources of Brownian motion only affect the 𝑑𝑅0 and 𝑑𝛿 terms
(i.e. the 5th and 6th equations), define the covariance matrix as
42.4.4 Implementation
function G(x, p, t)
    s, i, r, d, R₀, δ = x
    @unpack γ, R̄₀, η, σ, ξ, θ, δ_bar = p
    return [0.0, 0.0, 0.0, 0.0, σ * sqrt(R₀), ξ * sqrt(δ * (1 - δ))] # diagonal noise
end
Next create a settings generator, and then define a SDEProblem with Diagonal Noise.
We solve the problem with the SOSRI algorithm (Adaptive strong order 1.5 methods for diag-
onal noise Ito and Stratonovich SDEs).
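A sketch of those two steps, assuming the initial condition x_0 and the parameter tuple p (with horizon p.T) have been constructed as in the text:

using StochasticDiffEq

prob = SDEProblem(F, G, x_0, (0.0, p.T), p)
sol_1 = solve(prob, SOSRI())
sol_2 = solve(prob, SOSRI()) # a second, independent realization of the shocks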
length(sol_1.t) = 478
As in the deterministic case of the previous lecture, we are using an adaptive time-stepping
method. However, since this is an SDE, (1) you will tend to see more timesteps required due
to the greater curvature; and (2) the number of timesteps will change with different shock
realizations.
With stochastic differential equations, a “solution” is akin to a simulation for a particular re-
alization of the noise process.
If we take two solutions and plot the number of infections, we will see differences over time:
The same holds for other variables such as the cumulative deaths, mortality, and 𝑅0 :
"Trajectory 1",
lw = 2, xlabel = "t", ylabel = "d(t)", legend = :topleft)
plot!(plot_1, sol_2, vars=[4], label = "Trajectory 2", lw = 2)
plot_2 = plot(sol_1, vars=[3], title = "Cumulative Recovered Proportion",�
↪label =
"Trajectory 1",
lw = 2, xlabel = "t", ylabel = "d(t)", legend = :topleft)
plot!(plot_2, sol_2, vars=[3], label = "Trajectory 2", lw = 2)
plot_3 = plot(sol_1, vars=[5], title = "R_0 transition from lockdown",�
↪label =
"Trajectory 1",
lw = 2, xlabel = "t", ylabel = "R_0(t)")
plot!(plot_3, sol_2, vars=[5], label = "Trajectory 2", lw = 2)
plot_4 = plot(sol_1, vars=[6], title = "Mortality Rate", label =�
↪"Trajectory 1",
See here for comments on finding the appropriate SDE algorithm given the structure of
𝐹 (𝑥, 𝑡) and 𝐺(𝑥, 𝑡)
• If 𝐺 has diagonal noise (i.e. 𝐺(𝑥, 𝑡) is a diagonal, and possibly a function of the state),
then SOSRI is the typical choice.
• If 𝐺 has additive and diagonal noise (i.e. 𝐺(𝑡) is a diagonal and independent from the
state), then SOSRA is usually the best algorithm for even mildly stiff 𝐹 .
• If adaptivity is not required, then EM (i.e. Euler-Maruyama method typically used by
economists) is flexible in its ability to handle different noise processes.
42.4.5 Ensembles
While individual simulations are useful, you often want to look at an ensemble of trajectories
of the SDE in order to get an accurate picture of how the system evolves.
To do this, use the EnsembleProblem in order to have the solution compute multiple trajectories at once. The returned EnsembleSolution acts like an array of solutions, but is imbued with plot recipes to showcase aggregate quantities.
For example:
Or, more frequently, you may want to run many trajectories and plot quantiles, which can be
automatically run in parallel using multiple threads, processes, or GPUs. Here we showcase
EnsembleSummary which calculates summary information from an ensemble and plots the
mean of the solution along with calculated quantiles of the simulation:
Consider a policy maker who wants to consider the impact of relaxing lockdown at various
speeds.
We will shut down the shocks to the mortality rate (i.e. 𝜉 = 0) to focus on the variation
caused by the volatility in 𝑅0 (𝑡).
Consider 𝜂 = 1/50 and 𝜂 = 1/20, where we start at the same initial condition of 𝑅0 (0) = 0.5.
While the mean of d(t) increases, unsurprisingly, we see that the 95% quantile for later time periods is also much larger - even after the R₀ has converged.
That is, volatile contact rates (and hence 𝑅0 ) can interact to make catastrophic worst-case
scenarios due to the nonlinear dynamics of the system.
1. choose 𝑅̄ 0 (𝑡) to target 𝑅0 = 0.5 for 30 days and then 𝑅0 = 2 for the remaining 17
months. This corresponds to lifting lockdown in 30 days.
2. target 𝑅0 = 0.5 for 120 days and then 𝑅0 = 2 for the remaining 14 months. This corre-
sponds to lifting lockdown in 4 months.
Since empirical estimates of R₀(t) discussed in [31] and other papers show it to have wide variation, we will maintain a fairly large σ.
We start the model with 100,000 active infections.
p_early = p_gen(R̄₀ = R₀_lift_early, η = η_experiment, σ = σ_experiment)
p_late = p_gen(R̄₀ = R₀_lift_late, η = η_experiment, σ = σ_experiment)
# initial conditions
i_0 = 100000 / p_early.N
r_0 = 0.0
d_0 = 0.0
s_0 = 1.0 - i_0 - r_0 - d_0
δ_0 = p_early.δ_bar
Simulating for a single realization of the shocks, we see the results are qualitatively similar to
what we had before
However, note that this masks highly volatile values induced by the variation in R₀, as seen in the ensemble
ensemble_sol_early = solve(EnsembleProblem(prob_early), SOSRI(), EnsembleThreads(),
                           trajectories = trajectories, saveat = saveat)
ensemble_sol_late = solve(EnsembleProblem(prob_late), SOSRI(), EnsembleThreads(),
                          trajectories = trajectories, saveat = saveat)
summ_early = EnsembleSummary(ensemble_sol_early)
summ_late = EnsembleSummary(ensemble_sol_late)
Finally, rather than looking at the ensemble summary, we can use data directly from the en-
semble to do our own analysis.
For example, evaluating at an intermediate (t = 350) and final time step.
In [16]: N = p_early.N
t_1 = 350
t_2 = p_early.T # i.e. the last element
bins_1 = range(0.000, 0.009, length = 30)
bins_2 = 30 # number rather than grid.
This shows that there are significant differences after a year, but by 550 days the graphs
largely coincide.
In the above code, given the return from solve on an EnsembleProblem, e.g.
ensemble_sol = solve(...)
• You can access the i’th simulation as ensemble_sol[i], which then has all of the
standard solution handling features
• You can evaluate at a real time period, t, with ensemble_sol[i](t). Or access the
4th element with ensemble_sol[i](t)[4]
• If the t was not exactly one of the saveat values (if specified) or the adaptive
timesteps (if it was not), then it will use interpolation
• Alternatively, to access the results of the ODE as a grid exactly at the timesteps,
where j is timestep index, use ensemble_sol[i][j] or the 4th element with
ensemble_sol[i][4, j]
• Warning: unless you have chosen a saveat grid, the timesteps will not be
aligned between simulations. That is, ensemble_sol[i_1].t wouldn’t
match ensemble_sol[i_2].t. In that case, use interpolation with
ensemble_sol[i_1](t) etc.
42.6 Reinfection
As a final experiment, consider a model where the immunity is only temporary, and individu-
als become susceptible again.
In particular, assume that at rate ν immunity is lost. For illustration, we will examine the case where the average immunity lasts 12 months (i.e. 1/ν = 360).
The transition modifies the differential equation (1) to become
ds = (−γ R₀ s i + νr) dt

di = (γ R₀ s i − γi) dt    (6)

dr = ((1 − δ)γi − νr) dt

dd = δγi dt
This change modifies the underlying F function and adds a parameter, but otherwise the
model remains the same.
We will redo the “Ending Lockdown” simulation from above, where the only difference is the
new transition.
p_re_early = p_re_gen(R̄₀ = R₀_lift_early, η = η_experiment, σ = σ_experiment)
p_re_late = p_re_gen(R̄₀ = R₀_lift_late, η = η_experiment, σ = σ_experiment)

trajectories = 400
saveat = 1.0
prob_re_early = SDEProblem(F_reinfect, G, x_0, (0, p_re_early.T), p_re_early)
prob_re_late = SDEProblem(F_reinfect, G, x_0, (0, p_re_late.T), p_re_late)
The ensemble simulations for the 𝜈 = 0 and 𝜈 > 0 can be compared to see the impact in the
absence of medical innovations.
Finally, we can examine the same early vs. late lockdown histogram
In this case, there are significant differences between the early and late lockdown deaths, and high variance.

This bleak simulation has assumed that no individuals have long-term immunity and that there will be no medical advancements on that time horizon - both of which are unlikely to be true.

Nevertheless, it suggests that the timing of lifting lockdown has a more profound impact after 18 months if we allow stochastic shocks and imperfect immunity.
Part VI
Chapter 43

Schelling's Segregation Model
43.1 Contents
• Overview 43.2
• The Model 43.3
• Results 43.4
• Exercises 43.5
• Solutions 43.6
43.2 Overview
In 1969, Thomas C. Schelling developed a simple but striking model of racial segregation [97].
His model studies the dynamics of racially mixed neighborhoods.
Like much of Schelling’s work, the model shows how local interactions can lead to surprising
aggregate structure.
In particular, it shows that a relatively mild preference for neighbors of similar race can lead in aggregate to the collapse of mixed neighborhoods, and high levels of segregation.
In recognition of this and other research, Schelling was awarded the 2005 Nobel Prize in Eco-
nomic Sciences (joint with Robert Aumann).
In this lecture we (in fact you) will build and run a version of Schelling’s model.
We will cover a variation of Schelling’s model that is easy to program and captures the main
idea.
Suppose we have two types of people: orange people and green people.
For the purpose of this lecture, we will assume there are 250 of each type.
These agents all live on a single unit square.
The location of an agent is just a point (𝑥, 𝑦), where 0 < 𝑥, 𝑦 < 1.
43.3.1 Preferences
We will say that an agent is happy if half or more of her 10 nearest neighbors are of the same
type.
Here ‘nearest’ is in terms of Euclidean distance.
An agent who is not happy is called unhappy.
An important point here is that agents are not averse to living in mixed areas.
They are perfectly happy if half their neighbors are of the other color.
43.3.2 Behavior

An agent who is unhappy relocates as follows:

1. Draw a new random location in the unit square
2. If the agent is happy at the new location, move there
3. Else, go to step 1
43.4 Results
Let’s have a look at the results we got when we coded and ran this model.
As discussed above, agents are initially mixed randomly together
But after several cycles they become segregated into distinct regions
In this instance, the program terminated after 4 cycles through the set of agents, indicating
that all agents had reached a state of happiness.
What is striking about the pictures is how rapidly racial integration breaks down.
This is despite the fact that people in the model don’t actually mind living mixed with the
other type.
Even with these preferences, the outcome is a high degree of segregation.
43.5 Exercises
43.5.1 Exercise 1

Implement and run this simulation for yourself.
43.6 Solutions
43.6.1 Exercise 1
Here’s one solution that does the job we want. If you feel like a further exercise you can prob-
ably speed up some of the computations and then increase the number of agents.
43.6.2 Setup
function is_happy(a)
    distances = [(get_distance(a, agent), agent) for agent in agents]
    sort!(distances)
    neighbors = [agent for (d, agent) in distances[1:neighborhood_size]]
    share = mean(isequal(a.kind), other.kind for other in neighbors)
    # can also do
    # share = mean(isequal(a.kind),
    #              first(agents[idx]) for idx in
    #              partialsortperm(get_distance.(Ref(a), agents),
    #                              1:neighborhood_size))
    return share >= 0.5 # happy if at least half of the nearest neighbors are the same kind
end
function update!(a)
# If not happy, then randomly choose new locations until happy.
while !is_happy(a)
draw_location!(a)
end
end
function plot_distribution(agents)
    x_vals_0, y_vals_0 = zeros(0), zeros(0)
    x_vals_1, y_vals_1 = zeros(0), zeros(0)
    for agent in agents
        x, y = agent.location # field names assumed, matching a.kind above
        if agent.kind == 0
            push!(x_vals_0, x)
            push!(y_vals_0, y)
        else
            push!(x_vals_1, x)
            push!(y_vals_1, y)
        end
    end
    p = scatter(x_vals_0, y_vals_0, color = :orange, markersize = 5, alpha = 0.8, label = "")
    scatter!(p, x_vals_1, y_vals_1, color = :green, markersize = 5, alpha = 0.8, label = "")
    return p
end
plot_array = Any[]
Chapter 44

A Lake Model of Employment and Unemployment
44.1 Contents
• Overview 44.2
• The Model 44.3
• Implementation 44.4
• Dynamics of an Individual Worker 44.5
• Endogenous Job Finding Rate 44.6
• Exercises 44.7
• Solutions 44.8
44.2 Overview
44.2.1 Prerequisites
Before working through what follows, we recommend you read the lecture on finite Markov
chains.
You will also need some basic linear algebra and probability.
The value 𝑏(𝐸𝑡 + 𝑈𝑡 ) is the mass of new workers entering the labor force unemployed.
The total stock of workers N_t = E_t + U_t evolves as

N_{t+1} = (1 + b − d)N_t = (1 + g)N_t   where g := b − d
Letting X_t := (U_t, E_t)′, the law of motion for X is

X_{t+1} = A X_t   where   A := [ (1 − d)(1 − λ) + b    (1 − d)α + b
                                 (1 − d)λ              (1 − d)(1 − α) ]
This law tells us how total employment and unemployment evolve over time.
Of interest is the behavior of the unemployment and employment rates. Dividing both sides by N_{t+1} gives

(U_{t+1}/N_{t+1}, E_{t+1}/N_{t+1})′ = (1/(1 + g)) A (U_t/N_t, E_t/N_t)′
Letting

x_t := (u_t, e_t)′ = (U_t/N_t, E_t/N_t)′

we can write the dynamics of the rates as

x_{t+1} = Â x_t   where   Â := (1/(1 + g)) A
44.4 Implementation
44.4.1 Setup
function transition_matrices(lm)
@unpack λ, α, b, d = lm
g = b - d
A = [(1 - λ) * (1 - d) + b (1 - d) * α + b
(1 - d) * λ (1 - d) * (1 - α)]
 = A ./ (1 + g)
return (A = A, Â = Â)
end
function rate_steady_state(lm)
    @unpack Â = transition_matrices(lm)
    sol = fixedpoint(x -> Â * x, fill(0.5, 2))
    converged(sol) || error("Failed to converge in $(sol.iterations) iterations")
    return sol.zero
end
In [5]: lm = LakeModel()
A, Â = transition_matrices(lm)
A
In [6]: Â
In [8]: Â
Let’s run a simulation under the default parameters (see above) starting from 𝑋0 = (12, 138)
In [9]: lm = LakeModel()
N_0 = 150 # population
e_0 = 0.92 # initial employment rate
u_0 = 1 - e_0 # initial unemployment rate
T = 50 # simulation length
x1 = X_path[1, :]
x2 = X_path[2, :]
x3 = dropdims(sum(X_path, dims = 1), dims = 1)
plt_unemp = plot(title = "Unemployment", 1:T, x1, color = :blue, lw = 2, grid = true,
                 label = "")
plt_emp = plot(title = "Employment", 1:T, x2, color = :blue, lw = 2, grid = true,
               label = "")
plt_labor = plot(title = "Labor force", 1:T, x3, color = :blue, lw = 2, grid = true,
                 label = "")
plot(plt_unemp, plt_emp, plt_labor, layout = (3, 1)) # combine panels
The aggregates 𝐸𝑡 and 𝑈𝑡 don’t converge because their sum 𝐸𝑡 + 𝑈𝑡 grows at rate 𝑔.
On the other hand, the vector of employment and unemployment rates 𝑥𝑡 can be in a steady
state x̄ if there exists an x̄ such that

• x̄ = Â x̄
• the components satisfy 𝑒 ̄ + 𝑢̄ = 1
This equation tells us that a steady state level 𝑥̄ is an eigenvector of 𝐴 ̂ associated with a unit
eigenvalue.
We also have x_t → x̄ as t → ∞ provided that the remaining eigenvalue of Â has modulus less than 1.
This is the case for our default parameters:
In [10]: lm = LakeModel()
A, Â = transition_matrices(lm)
e, f = eigvals(Â)
abs(e), abs(f)
Let’s look at the convergence of the unemployment and employment rate to steady state lev-
els (dashed red line)
In [11]: lm = LakeModel()
e_0 = 0.92 # initial employment rate
u_0 = 1 - e_0 # initial unemployment rate
T = 50 # simulation length
xbar = rate_steady_state(lm)
x_0 = [u_0; e_0]
x_path = simulate_rate_path(lm, x_0, T)
label = "")
plt_emp = plot(title = "Employment rate", 1:T, x_path[2, :],color = :
↪blue, lw = 2, alpha
= 0.5,
grid = true, label = "")
plot!(plt_emp, [xbar[2]], color=:red, linetype = :hline, linestyle = :
↪dash, lw = 2,
label = "")
plot(plt_unemp, plt_emp, layout = (2, 1), size=(700,500))
An individual worker’s employment dynamics are governed by a finite state Markov process.
The worker can be in one of two states:
• 𝑠𝑡 = 0 means unemployed
• 𝑠𝑡 = 1 means employed
Let’s start off under the assumption that 𝑏 = 𝑑 = 0.
The associated transition matrix is then
P = [ 1 − λ    λ
      α        1 − α ]
Let 𝜓𝑡 denote the marginal distribution over employment / unemployment states for the
worker at time 𝑡.
As usual, we regard it as a row vector.
We know from an earlier discussion that 𝜓𝑡 follows the law of motion
𝜓𝑡+1 = 𝜓𝑡 𝑃
We also know from the lecture on finite Markov chains that if 𝛼 ∈ (0, 1) and 𝜆 ∈ (0, 1), then
𝑃 has a unique stationary distribution, denoted here by 𝜓∗ .
The unique stationary distribution satisfies
ψ*[0] = α/(α + λ)
Not surprisingly, probability mass on the unemployment state increases with the dismissal rate and falls with the job finding rate.
44.5.1 Ergodicity
Let $\bar s_{u,T}$ and $\bar s_{e,T}$ denote the fraction of time that an individual worker spends unemployed and employed, respectively, up to period $T$:

$$\bar s_{u,T} := \frac{1}{T} \sum_{t=1}^{T} \mathbb{1}\{s_t = 0\}$$

and

$$\bar s_{e,T} := \frac{1}{T} \sum_{t=1}^{T} \mathbb{1}\{s_t = 1\}$$
If $\alpha \in (0, 1)$ and $\lambda \in (0, 1)$, the chain is ergodic, so that, with probability one,

$$\lim_{T \to \infty} \bar s_{u,T} = \psi^*[0]
\quad \text{and} \quad
\lim_{T \to \infty} \bar s_{e,T} = \psi^*[1]$$
How long does it take for time series sample averages to converge to cross sectional averages?
We can use QuantEcon.jl’s MarkovChain type to investigate this.
Let’s plot the path of the sample averages over 5,000 periods
In [13]: lm = LakeModel(d = 0, b = 0)
         T = 5000 # simulation length

         @unpack α, λ = lm
         P = [(1 - λ)  λ
               α       (1 - α)]
In [14]: Random.seed!(42)
         mc = MarkovChain(P, [0; 1]) # 0=unemployed, 1=employed
         xbar = rate_steady_state(lm)

         # (the computation of the sample averages was lost in extraction and is
         #  reconstructed here: start off employed, then track running averages)
         s_path = simulate(mc, T; init = 2)
         s_bar_e = cumsum(s_path) ./ (1:T)  # fraction of time employed so far
         s_bars = [1 .- s_bar_e  s_bar_e]   # column 1: unemployed, column 2: employed

         plt_unemp = plot(title = "Percent of time unemployed", 1:T, s_bars[:, 1],
                          color = :blue, lw = 2, alpha = 0.5, label = "", grid = true)
         plot!(plt_unemp, [xbar[1]], linetype = :hline, linestyle = :dash,
               color = :red, lw = 2, label = "")
         plt_emp = plot(title = "Percent of time employed", 1:T, s_bars[:, 2],
                        color = :blue, lw = 2, alpha = 0.5, label = "", grid = true)
         plot!(plt_emp, [xbar[2]], linetype = :hline, linestyle = :dash,
               color = :red, lw = 2, label = "")
         plot(plt_unemp, plt_emp, layout = (2, 1), size = (700, 500))
Out[14]:
44.6 Endogenous Job Finding Rate
The most important thing to remember about the model is that optimal decisions are characterized by a reservation wage $\bar w$.

- If the wage offer $w$ in hand is greater than or equal to $\bar w$, then the worker accepts.
- Otherwise, the worker rejects.
As we saw in our discussion of the model, the reservation wage depends on the wage offer dis-
tribution and the parameters
• 𝛼, the separation rate
• 𝛽, the discount factor
• 𝛾, the offer arrival rate
• 𝑐, unemployment compensation
Suppose that all workers inside a lake model behave according to the McCall search model.
The exogenous probability of leaving employment remains 𝛼.
But their optimal decision rules determine the probability 𝜆 of leaving unemployment.
This is now
$$\lambda = \gamma \, \mathbb{P}\{w_t \geq \bar w\} = \gamma \sum_{w' \geq \bar w} p(w') \tag{1}$$
We can use the McCall search version of the Lake Model to find an optimal level of unem-
ployment insurance.
We assume that the government sets unemployment compensation 𝑐.
The government imposes a lump sum tax 𝜏 sufficient to finance total unemployment pay-
ments.
To attain a balanced budget at a steady state, taxes, the steady state unemployment rate 𝑢,
and the unemployment compensation rate must satisfy
𝜏 = 𝑢𝑐
Since the steady state unemployment rate itself depends on the policy pair $(c, \tau)$ through workers' decisions, we express this condition as

$$\tau = u(c, \tau) c$$

For welfare comparisons, we use the steady state welfare criterion

$$W := e \, \mathbb{E}[V \mid \text{employed}] + u \, U$$
where the notation 𝑉 and 𝑈 is as defined in the McCall search model lecture.
The wage offer distribution will be a discretized version of the lognormal distribution
𝐿𝑁 (log(20), 1), as shown in the next figure
We will make use of (with some tweaks) the code we wrote in the McCall model lecture, em-
bedded below for convenience.
function solve_mccall_model(mcm; U_iv = 1.0, V_iv = ones(length(mcm.w)),
                            tol = 1e-5, iter = 2_000)
    # (parts of this cell were lost in extraction; the function body is
    #  reconstructed around the surviving fragments)
    @unpack α, β, σ, c, γ, w, E, u = mcm

    # necessary objects
    u_w = u.(w, σ)
    u_c = u(c, σ)

    # Bellman operator over the stacked vector x = [V; U]
    T(x) = (V = x[1:end-1]; U = x[end];
            [u_w + β * ((1 - α) * V .+ α * U);
             u_c + β * (1 - γ) * U + β * γ * (E * max.(U, V))])

    xstar = fixedpoint(T, [V_iv; U_iv], iterations = iter, xtol = tol, m = 0).zero
    V = xstar[1:end-1]
    U = xstar[end]

    # reservation wage: smallest wage at which accepting beats rejecting
    w̄ = any(V .>= U) ? w[searchsortedfirst(V .- U, 0)] : Inf
    return (V = V, U = U, w̄ = w̄)
end
# a default utility function (CRRA form; reconstructed, assumed shape)
u(c, σ) = c > 0 ? (c^(1 - σ) - 1) / (1 - σ) : -10e6

# model constructor
McCallModel = @with_kw (α = 0.2,
                        β = 0.98, # discount rate
                        γ = 0.7,
                        c = 6.0,  # unemployment compensation
                        σ = 2.0,
                        u = u,    # utility function
                        w = range(10, 20, length = 60), # wage values
                        E = Expectation(BetaBinomial(59, 600, 400))) # distribution over wage values
Now let’s compute and plot welfare, employment, unemployment, and tax revenue as a func-
tion of the unemployment compensation rate
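The cell below references a wage grid and several structural parameters whose defining lines did not survive extraction. A sketch of the missing setup, with the lognormal mean taken from the text above and the remaining values assumptions for illustration:

# structural parameters (values assumed where not stated in the text)
α = 0.013
α_q = 1 - (1 - α)^3 # quarterly separation rate (assumed convention)
β = 0.98
γ = 1.0
σ = 2.0

log_wage_mean = 20                     # matches LN(log(20), 1) in the text
w_vec = range(1e-3, 170, length = 201) # wage grid (endpoints assumed)

After `pdf_logw` is computed below, an expectation operator is presumably constructed from it, e.g. via `E = expectation(Categorical(pdf_logw ./ sum(pdf_logw)))`.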
logw_dist = Normal(log(log_wage_mean), 1)
cdf_logw = cdf.(logw_dist, log.(w_vec))
pdf_logw = cdf_logw[2:end] - cdf_logw[1:end-1]
function compute_optimal_quantities(c, τ)
    mcm = McCallModel(α = α_q,
                      β = β,
                      γ = γ,
                      c = c - τ,      # post-tax compensation
                      σ = σ,
                      w = w_vec .- τ, # post-tax wages
                      E = E)          # expectation operator

    @unpack V, U, w̄ = solve_mccall_model(mcm)

    indicator = wage -> wage > w̄
    λ = γ * (E * indicator.(w_vec .- τ))

    return w̄, λ, V, U
end
function compute_steady_state_quantities(c, τ)
    w̄, λ_param, V, U = compute_optimal_quantities(c, τ)

    # steady state rates implied by the lake model at this job finding rate
    # (this step was lost in extraction and is reconstructed)
    u_rate, e_rate = rate_steady_state(LakeModel(λ = λ_param, α = α_q))

    # steady state welfare, W = e 𝔼[V | employed] + u U
    accept = w_vec .- τ .> w̄
    w = (E * (V .* accept)) / (E * accept) # mean value among accepted offers
    welfare = e_rate * w + u_rate * U

    return u_rate, e_rate, welfare
end

function find_balanced_budget_tax(c)
    function steady_state_budget(t)
        u_rate, e_rate, w = compute_steady_state_quantities(c, t)
        return t - u_rate * c
    end
    # the balanced-budget tax is the root of the budget condition
    # (the root-finding line was lost; Roots.jl's find_zero is assumed)
    τ = find_zero(steady_state_budget, (0.0, 0.9c))
    return τ
end
# grid of compensation values to consider
# (the original grid definition was lost; the endpoints are assumptions)
Nc = 60
c_vec = range(5, 140, length = Nc)

tax_vec = zeros(Nc)
unempl_vec = similar(tax_vec)
empl_vec = similar(tax_vec)
welfare_vec = similar(tax_vec)

for i in 1:Nc
    t = find_balanced_budget_tax(c_vec[i])
    u_rate, e_rate, welfare = compute_steady_state_quantities(c_vec[i], t)
    tax_vec[i] = t
    unempl_vec[i] = u_rate
    empl_vec[i] = e_rate
    welfare_vec[i] = welfare
end
"",
grid = true)
plt_emp = plot(title = "Employment", c_vec, empl_vec, color = :blue, lw =�
↪2, alpha=0.7,
label = "",
grid = true)
plt_welf = plot(title = "Welfare", c_vec, welfare_vec, color = :blue, lw�
↪= 2, alpha=0.7,
label = "",
grid = true)
Out[17]:
44.7 Exercises
44.7.1 Exercise 1
Consider an economy with initial stock of workers 𝑁0 = 100 at the steady state level of em-
ployment in the baseline parameterization
• 𝛼 = 0.013
• 𝜆 = 0.283
• 𝑏 = 0.0124
• 𝑑 = 0.00822
(The values for 𝛼 and 𝜆 follow [19])
Suppose that in response to new legislation the hiring rate reduces to 𝜆 = 0.2.
Plot the transition dynamics of the unemployment and employment stocks for 50 periods.
Plot the transition dynamics for the rates.
How long does the economy take to converge to its new steady state?
What is the new steady state level of employment?
44.7.2 Exercise 2
Consider an economy with initial stock of workers 𝑁0 = 100 at the steady state level of em-
ployment in the baseline parameterization.
Suppose that for 20 periods the birth rate was temporarily high (𝑏 = 0.0025) and then re-
turned to its original level.
Plot the transition dynamics of the unemployment and employment stocks for 50 periods.
Plot the transition dynamics for the rates.
How long does the economy take to return to its original steady state?
44.8 Solutions
44.8.1 Exercise 1
We begin by constructing an object containing the default parameters and assigning the
steady state values to x0
In [18]: lm = LakeModel()
x0 = rate_steady_state(lm)
println("Initial Steady State: $x0")
In [19]: N0 = 100
T = 50
Out[19]: 50
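The two cells between In [19] and In [22], which apply the new hiring rate and run the simulations, were lost in extraction; a sketch consistent with the surrounding code:

In [20]: lm = LakeModel(λ = 0.2)     # new legislation lowers the hiring rate
         xbar = rate_steady_state(lm) # new steady state

In [21]: X_path = simulate_stock_path(lm, x0 * N0, T) # simulate stocks
         x_path = simulate_rate_path(lm, x0, T)       # simulate rates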
In [22]: x1 = X_path[1, :]
         x2 = X_path[2, :]
         x3 = dropdims(sum(X_path, dims = 1), dims = 1)

         plt_unemp = plot(title = "Unemployment", 1:T, x1, color = :blue, grid = true,
                          label = "", bg_inside = :lightgrey)
         plt_emp = plot(title = "Employment", 1:T, x2, color = :blue, grid = true,
                        label = "", bg_inside = :lightgrey)
         plt_labor = plot(title = "Labor force", 1:T, x3, color = :blue, grid = true,
                          label = "", bg_inside = :lightgrey)
         plot(plt_unemp, plt_emp, plt_labor, layout = (3, 1), size = (800, 600))
Out[22]:
In [23]: plt_unemp = plot(title = "Unemployment rate", 1:T, x_path[1, :],
                          color = :blue, grid = true, label = "",
                          bg_inside = :lightgrey)
         plot!(plt_unemp, [xbar[1]], linetype = :hline, linestyle = :dash,
               color = :red, label = "")
         plt_emp = plot(title = "Employment rate", 1:T, x_path[2, :],
                        color = :blue, grid = true, label = "",
                        bg_inside = :lightgrey)
         plot!(plt_emp, [xbar[2]], linetype = :hline, linestyle = :dash,
               color = :red, label = "")
         plot(plt_unemp, plt_emp, layout = (2, 1), size = (700, 500))
Out[23]:
We see that it takes 20 periods for the economy to converge to its new steady state levels.
44.8.2 Exercise 2
This next exercise has the economy experiencing a boom in entrances to the labor market and
then later returning to the original levels.
For 20 periods the economy has a new entry rate into the labor market.
Let’s start off at the baseline parameterization and record the steady state
In [24]: lm = LakeModel()
         x0 = rate_steady_state(lm)

In [25]: b = 0.003
         T = 20

Out[25]: 20

In [26]: lm = LakeModel(b = b)
         X_path1 = simulate_stock_path(lm, x0 * N0, T) # simulate stocks
         x_path1 = simulate_rate_path(lm, x0, T)       # simulate rates
Now we reset 𝑏 to the original value and then, using the state after 20 periods for the new
initial conditions, we simulate for the additional 30 periods
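The cells that reset the parameters and stitch the two simulations together were lost in extraction; a sketch of the missing step (the period count T2 = 30 is implied by the text):

In [27]: lm = LakeModel() # reset b to its original value
         T2 = 30          # length of the remaining simulation

In [28]: X_path2 = simulate_stock_path(lm, X_path1[:, end], T2) # continue stocks
         x_path2 = simulate_rate_path(lm, x_path1[:, end], T2)  # continue rates
         X_path = hcat(X_path1, X_path2) # combine the two phases
         x_path = hcat(x_path1, x_path2)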
In [29]: x1 = X_path[1, :]
         x2 = X_path[2, :]
         x3 = dropdims(sum(X_path, dims = 1), dims = 1)

         # (plotting lines reconstructed to match the earlier stock plots)
         plt_unemp = plot(title = "Unemployment", x1, color = :blue, grid = true,
                          label = "", bg_inside = :lightgrey, lw = 2)
         plt_emp = plot(title = "Employment", x2, color = :blue, grid = true,
                        label = "", bg_inside = :lightgrey, lw = 2)
         plt_labor = plot(title = "Labor force", x3, color = :blue, grid = true,
                          label = "", bg_inside = :lightgrey, lw = 2)
         plot(plt_unemp, plt_emp, plt_labor, layout = (3, 1), size = (800, 600))
Out[29]:
In [30]: plt_unemp = plot(title = "Unemployment rate", x_path[1, :], color = :blue,
                          grid = true, label = "", bg_inside = :lightgrey, lw = 2)
         plot!(plt_unemp, [x0[1]], linetype = :hline, linestyle = :dash,
               color = :red, label = "", lw = 2)
         plt_emp = plot(title = "Employment rate", x_path[2, :], color = :blue,
                        grid = true, label = "", bg_inside = :lightgrey, lw = 2)
         plot!(plt_emp, [x0[2]], linetype = :hline, linestyle = :dash,
               color = :red, label = "", lw = 2)
         plot(plt_unemp, plt_emp, layout = (2, 1), size = (700, 500))
Out[30]:
Chapter 45

Rational Expectations Equilibrium
45.1 Contents
• Overview 45.2
• Defining Rational Expectations Equilibrium 45.3
• Computation of an Equilibrium 45.4
• Exercises 45.5
• Solutions 45.6
45.2 Overview
This widely used method applies in contexts in which a “representative firm” or agent is a
“price taker” operating within a competitive equilibrium.
We want to impose that
• The representative firm or individual takes aggregate 𝑌 as given when it chooses indi-
vidual 𝑦, but ….
• At the end of the day, 𝑌 = 𝑦, so that the representative firm is indeed representative.
The Big 𝑌 , little 𝑦 trick accomplishes these two goals by
• Taking 𝑌 as beyond control when posing the choice problem of who chooses 𝑦; but ….
• Imposing 𝑌 = 𝑦 after having solved the individual’s optimization problem.
Please watch for how this strategy is applied as the lecture unfolds.
We begin by applying the Big 𝑌 , little 𝑦 trick in a very simple static context.
Consider a static model in which a collection of 𝑛 firms produce a homogeneous good that is
sold in a competitive market.
Each of these 𝑛 firms sells output 𝑦.
The price 𝑝 of the good lies on an inverse demand curve
𝑝 = 𝑎 0 − 𝑎1 𝑌 (1)
where
• 𝑎𝑖 > 0 for 𝑖 = 0, 1
• 𝑌 = 𝑛𝑦 is the market-wide level of output
Each firm has total cost function

$$c(y) = c_1 y + \frac{c_2}{2} y^2, \qquad c_1, c_2 > 0$$

Each firm maximizes profit $(a_0 - a_1 Y) y - c(y)$, taking $Y$ as given. The first-order condition is

$$a_0 - a_1 Y - c_1 - c_2 y = 0 \tag{3}$$
At this point, but not before, we substitute $Y = ny$ into (3) to obtain the following linear equation

$$a_0 - c_1 - (a_1 n + c_2) y = 0 \tag{4}$$

to be solved for the competitive equilibrium market-wide output.
45.2.3 Setup
Our first illustration of a rational expectations equilibrium involves a market with 𝑛 firms,
each of which seeks to maximize the discounted present value of profits in the face of adjust-
ment costs.
The adjustment costs induce the firms to make gradual adjustments, which in turn requires
consideration of future prices.
Individual firms understand that, via the inverse demand curve, the price is determined by
the amounts supplied by other firms.
Hence each firm wants to forecast future total industry supplies.
In our context, a forecast is generated by a belief about the law of motion for the aggregate
state.
Rational expectations equilibrium prevails when this belief coincides with the actual law of
motion generated by production choices induced by this belief.
𝑝𝑡 = 𝑎0 − 𝑎1 𝑌𝑡 (5)
where
• 𝑎𝑖 > 0 for 𝑖 = 0, 1
• 𝑌𝑡 = 𝑛𝑦𝑡 is the market-wide level of output
Each firm seeks to maximize the present value of profits

$$\sum_{t=0}^{\infty} \beta^t r_t \tag{6}$$
where
$$r_t := p_t y_t - \frac{\gamma (y_{t+1} - y_t)^2}{2}, \qquad y_0 \text{ given} \tag{7}$$
Regarding the parameters,
• 𝛽 ∈ (0, 1) is a discount factor
• 𝛾 > 0 measures the cost of adjusting the rate of output
Regarding timing, the firm observes 𝑝𝑡 and 𝑦𝑡 when it chooses 𝑦𝑡+1 at time 𝑡.
To state the firm’s optimization problem completely requires that we specify dynamics for all
state variables.
This includes ones that the firm cares about but does not control like 𝑝𝑡 .
We turn to this problem now.
In view of (5), the firm’s incentive to forecast the market price translates into an incentive to
forecast aggregate output 𝑌𝑡 .
Aggregate output depends on the choices of other firms.
We assume that 𝑛 is such a large number that the output of any single firm has a negligible
effect on aggregate output.
That justifies firms in regarding their forecasts of aggregate output as being unaffected by
their own output decisions.
We suppose the firm believes that market-wide output $Y_t$ follows the law of motion

$$Y_{t+1} = H(Y_t) \tag{8}$$

for some function $H$.
For now let’s fix a particular belief 𝐻 in (8) and investigate the firm’s response to it.
Let 𝑣 be the optimal value function for the firm’s problem given 𝐻.
The value function satisfies the Bellman equation
$$v(y, Y) = \max_{y'} \left\{ a_0 y - a_1 y Y - \frac{\gamma (y' - y)^2}{2} + \beta v(y', H(Y)) \right\} \tag{9}$$
Let $h$ denote the firm's optimal policy function, so that

$$y_{t+1} = h(y_t, Y_t) \tag{10}$$

where
$$h(y, Y) := \arg\max_{y'} \left\{ a_0 y - a_1 y Y - \frac{\gamma (y' - y)^2}{2} + \beta v(y', H(Y)) \right\} \tag{11}$$
First-Order Characterization of ℎ
In what follows it will be helpful to have a second characterization of ℎ, based on first order
conditions.
The first-order necessary condition for choosing $y'$ is

$$-\gamma(y' - y) + \beta v_y(y', H(Y)) = 0 \tag{12}$$

An envelope argument implies that we can differentiate $v$ with respect to $y$ by naively differentiating the right side of (9), giving

$$v_y(y, Y) = a_0 - a_1 Y + \gamma(y' - y)$$

Substituting this into (12) yields the Euler equation

$$-\gamma(y_{t+1} - y_t) + \beta \left[ a_0 - a_1 Y_{t+1} + \gamma(y_{t+2} - y_{t+1}) \right] = 0 \tag{13}$$
The firm optimally sets an output path that satisfies (13), taking (8) as given, and subject to
• the initial conditions for (𝑦0 , 𝑌0 )
• the terminal condition lim𝑡→∞ 𝛽 𝑡 𝑦𝑡 𝑣𝑦 (𝑦𝑡 , 𝑌𝑡 ) = 0
This last condition is called the transversality condition, and acts as a first-order necessary
condition “at infinity”.
The firm’s decision rule solves the difference equation (13) subject to the given initial condi-
tion 𝑦0 and the transversality condition.
Note that solving the Bellman equation (9) for 𝑣 and then ℎ in (11) yields a decision rule that
automatically imposes both the Euler equation (13) and the transversality condition.
Recalling that $Y_t = n y_t$, the decision rule $h$ implies the actual law of motion for market-wide output

$$Y_{t+1} = n h(Y_t/n, Y_t) \tag{14}$$

Thus, when firms believe that the law of motion for market-wide output is (8), their optimizing behavior makes the actual law of motion be (14).
A rational expectations equilibrium or recursive competitive equilibrium of the model with adjustment costs is a decision rule $h$ and an aggregate law of motion $H$ such that

- Given belief $H$, the map $h$ is the firm's optimal policy function.
- The law of motion $H$ satisfies $H(Y) = n h(Y/n, Y)$ for all $Y$.

Thus, a rational expectations equilibrium equates the perceived and actual laws of motion (8) and (14).
As we’ve seen, the firm’s optimum problem induces a mapping Φ from a perceived law of mo-
tion 𝐻 for market-wide output to an actual law of motion Φ(𝐻).
The mapping Φ is the composition of two operations, taking a perceived law of motion into a
decision rule via (9)–(11), and a decision rule into an actual law via (14).
The 𝐻 component of a rational expectations equilibrium is a fixed point of Φ.
Now let’s consider the problem of computing the rational expectations equilibrium.
45.4.1 Misbehavior of Φ
Readers accustomed to dynamic programming arguments might try to address this problem
by choosing some guess 𝐻0 for the aggregate law of motion and then iterating with Φ.
Unfortunately, the mapping Φ is not a contraction.
In particular, there is no guarantee that direct iterations on Φ converge [1].
Fortunately, there is another method that works here.
The method exploits a general connection between equilibrium and Pareto optimality ex-
pressed in the fundamental theorems of welfare economics (see, e.g., [75]).
Lucas and Prescott [71] used this method to construct a rational expectations equilibrium.
The details follow.
Our plan of attack is to match the Euler equations of the market problem with those for a
single-agent choice problem.
As we’ll see, this planning problem can be solved by LQ control (linear regulator).
The optimal quantities from the planning problem are rational expectations equilibrium
quantities.
The rational expectations equilibrium price can be obtained as a shadow price in the planning
problem.
For convenience, in this section we set 𝑛 = 1.
We first compute a sum of consumer and producer surplus at time 𝑡
$$s(Y_t, Y_{t+1}) := \int_0^{Y_t} (a_0 - a_1 x) \, dx - \frac{\gamma (Y_{t+1} - Y_t)^2}{2} \tag{15}$$
The first term is the area under the demand curve, while the second measures the social costs
of changing output.
The planning problem is to choose a production plan {𝑌𝑡 } to maximize
$$\sum_{t=0}^{\infty} \beta^t s(Y_t, Y_{t+1})$$
Evaluating the integral in (15) yields the quadratic form 𝑎0 𝑌𝑡 − 𝑎1 𝑌𝑡2 /2.
The associated Bellman equation is

$$V(Y) = \max_{Y'} \left\{ a_0 Y - \frac{a_1}{2} Y^2 - \frac{\gamma (Y' - Y)^2}{2} + \beta V(Y') \right\} \tag{16}$$
The associated first-order condition is

$$-\gamma(Y' - Y) + \beta V'(Y') = 0 \tag{17}$$
Applying the same envelope argument as before gives

$$V'(Y) = a_0 - a_1 Y + \gamma(Y' - Y)$$
Substituting this into equation (17) and rearranging leads to the Euler equation

$$\beta a_0 + \gamma Y_t - [\beta a_1 + \gamma(1 + \beta)] Y_{t+1} + \gamma \beta Y_{t+2} = 0 \tag{18}$$

Setting $y_t = Y_t/n$, this is precisely the second-order difference equation obtained by

1. finding the Euler equation of the representative firm, and
2. substituting into it the expression $Y_t = n y_t$ that "makes the representative firm be representative".
If it is appropriate to apply the same terminal conditions for these two difference equations,
which it is, then we have verified that a solution of the planning problem is also a rational
expectations equilibrium quantity sequence
It follows that for this example we can compute equilibrium quantities by forming the optimal
linear regulator problem corresponding to the Bellman equation (16).
The optimal policy function for the planning problem is the aggregate law of motion 𝐻 that
the representative firm faces within a rational expectations equilibrium.
As you are asked to show in the exercises, the fact that the planner’s problem is an LQ prob-
lem implies an optimal policy — and hence aggregate law of motion — taking the form
𝑌𝑡+1 = 𝜅0 + 𝜅1 𝑌𝑡 (19)
Correspondingly, the decision rule of the individual firm takes the form

$$y_{t+1} = h_0 + h_1 y_t + h_2 Y_t \tag{20}$$
45.5 Exercises
45.5.1 Exercise 1
Express the solution of the firm’s problem in the form (20) and give the values for each ℎ𝑗 .
If there were 𝑛 identical competitive firms all behaving according to (20), what would (20) imply for the actual law of motion (8) for market supply?
45.5.2 Exercise 2
Consider the following 𝜅0 , 𝜅1 pairs as candidates for the aggregate law of motion component
of a rational expectations equilibrium (see (19)).
Extending the program that you wrote for exercise 1, determine which if any satisfy the defi-
nition of a rational expectations equilibrium
• (94.0886298678, 0.923409232937)
• (93.2119845412, 0.984323478873)
• (95.0818452486, 0.952459076301)
Describe an iterative algorithm that uses the program that you wrote for exercise 1 to com-
pute a rational expectations equilibrium.
(You are not being asked actually to use the algorithm you are suggesting)
45.5.3 Exercise 3
45.5.4 Exercise 4
A monopolist faces the industry demand curve (5) and chooses $\{Y_t\}$ to maximize $\sum_{t=0}^{\infty} \beta^t r_t$ where
$$r_t = p_t Y_t - \frac{\gamma (Y_{t+1} - Y_t)^2}{2}$$
Formulate this problem as an LQ problem.
Compute the optimal policy using the same parameters as the previous exercise.
In particular, solve for the parameters in
𝑌𝑡+1 = 𝑚0 + 𝑚1 𝑌𝑡
45.6 Solutions
45.6.1 Exercise 1
To map a problem into a discounted optimal linear control problem, we need to define
• state vector 𝑥𝑡 and control vector 𝑢𝑡
• matrices 𝐴, 𝐵, 𝑄, 𝑅 that define preferences and the law of motion for the state
For the state and control vectors we choose
$$x_t = \begin{bmatrix} y_t \\ Y_t \\ 1 \end{bmatrix},
\qquad u_t = y_{t+1} - y_t$$
For $A, B, Q, R$ we set

$$A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \kappa_1 & \kappa_0 \\ 0 & 0 & 1 \end{bmatrix},
\quad
B = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix},
\quad
R = \begin{bmatrix} 0 & a_1/2 & -a_0/2 \\ a_1/2 & 0 & 0 \\ -a_0/2 & 0 & 0 \end{bmatrix},
\quad
Q = \gamma/2$$
The optimal decision rule has the form $u_t = -F x_t$, or

$$y_{t+1} - y_t = -F_0 y_t - F_1 Y_t - F_2$$

so that the coefficients in (20) are

$$h_0 = -F_2, \qquad h_1 = 1 - F_0, \qquad h_2 = -F_1$$
# beliefs
κ0 = 95.5
κ1 = 0.95
R = [ 0 a1/2 -a0/2
a1/2 0 0
-a0/2 0 0]
Q = 0.5 * γ
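The remainder of the solution cell (parameter values, the A and B matrices, and the LQ solution step) did not survive extraction. A sketch using QuantEcon.jl's LQ type; the parameter values here are assumptions, chosen to reproduce the κ's reported below:

using QuantEcon, LinearAlgebra

# model parameters (assumed values)
a0, a1, β, γ = 100.0, 0.05, 0.95, 10.0

A = [1 0  0
     0 κ1 κ0
     0 0  1]
B = [1.0, 0.0, 0.0]

lq = QuantEcon.LQ(Q, R, A, B, bet = β)
P, F, d = stationary_values(lq)

h0, h1, h2 = -F[3], 1 - F[1], -F[2]
println("(h0, h1, h2) = ($h0, $h1, $h2)")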
For the case $n > 1$, recall that $Y_t = n y_t$, which, combined with the previous equation, yields

$$Y_{t+1} = n(h_0 + h_1 y_t + h_2 Y_t) = n h_0 + (h_1 + n h_2) Y_t$$
45.6.2 Exercise 2
To determine whether a 𝜅0 , 𝜅1 pair forms the aggregate law of motion component of a ratio-
nal expectations equilibrium, we can proceed as follows:
- Determine the corresponding firm law of motion $y_{t+1} = h_0 + h_1 y_t + h_2 Y_t$.
- Test whether the associated aggregate law $Y_{t+1} = n h(Y_t/n, Y_t)$ evaluates to $Y_{t+1} = \kappa_0 + \kappa_1 Y_t$.

In the second step we can use $Y_t = n y_t = y_t$ (recall that $n = 1$), so that $Y_{t+1} = n h(Y_t/n, Y_t)$ becomes

$$Y_{t+1} = h(Y_t, Y_t) = h_0 + (h_1 + h_2) Y_t$$
The output tells us that the answer is pair (iii), which implies (ℎ0 , ℎ1 , ℎ2 ) =
(95.0819, 1.0000, −.0475).
(Notice we use isapprox to test equality of floating point numbers, since exact equality is
too strict)
Regarding the iterative algorithm, one could loop from a given (𝜅0 , 𝜅1 ) pair to the associated
firm law and then to a new (𝜅0 , 𝜅1 ) pair.
This amounts to implementing the operator Φ described in the lecture.
(There is in general no guarantee that this iterative process will converge to a rational expec-
tations equilibrium)
45.6.3 Exercise 3
For the planner's problem, we take the state and control to be

$$x_t = \begin{bmatrix} Y_t \\ 1 \end{bmatrix},
\qquad u_t = Y_{t+1} - Y_t$$
$$A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix},
\quad
B = \begin{bmatrix} 1 \\ 0 \end{bmatrix},
\quad
R = \begin{bmatrix} a_1/2 & -a_0/2 \\ -a_0/2 & 0 \end{bmatrix},
\quad
Q = \gamma/2$$
Writing the optimal policy $u_t = -F x_t$ as

$$Y_{t+1} - Y_t = -F_0 Y_t - F_1$$

we can obtain the implied aggregate law of motion via $\kappa_0 = -F_1$ and $\kappa_1 = 1 - F_0$.
Q = γ / 2.0
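Only one line of the solution cell survived extraction; a sketch of the full cell, reusing the assumed parameter values from Exercise 1:

A = [1.0 0.0
     0.0 1.0]
B = [1.0, 0.0]
R = [ a1 / 2.0  -a0 / 2.0
     -a0 / 2.0   0.0]
Q = γ / 2.0
lq = QuantEcon.LQ(Q, R, A, B, bet = β)
P, F, d = stationary_values(lq)
κ0, κ1 = -F[2], 1 - F[1]
println("κ0=$κ0 κ1=$κ1")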
κ0=95.08187459215002 κ1=0.9524590627039248
The output yields the same (𝜅0 , 𝜅1 ) pair obtained as an equilibrium from the previous exer-
cise.
45.6.4 Exercise 4
The monopolist’s LQ problem is almost identical to the planner’s problem from the previous
exercise, except that
$$R = \begin{bmatrix} a_1 & -a_0/2 \\ -a_0/2 & 0 \end{bmatrix}$$
R = [ a1 -a0 / 2.0
-a0 / 2.0 0.0]
Q = γ / 2.0
m0=73.47294403502833 m1=0.9265270559649701
We see that the law of motion for the monopolist is approximately 𝑌𝑡+1 = 73.4729 + 0.9265𝑌𝑡 .
In the rational expectations case the law of motion was approximately 𝑌𝑡+1 = 95.0819 +
0.9525𝑌𝑡 .
One way to compare these two laws of motion is by their fixed points, which give long run
equilibrium output in each case.
For laws of the form 𝑌𝑡+1 = 𝑐0 + 𝑐1 𝑌𝑡 , the fixed point is 𝑐0 /(1 − 𝑐1 ).
If you crunch the numbers, you will see that the monopolist adopts a lower long run quantity
than obtained by the competitive market, implying a higher market price.
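For instance, using the numbers above, long run output is $73.4729/(1 - 0.9265) \approx 1000$ for the monopolist, versus $95.0819/(1 - 0.9525) \approx 2000$ under rational expectations.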
This is analogous to the elementary static-case results.
Footnotes
[1] A literature that studies whether models populated with agents who learn can converge
to rational expectations equilibria features iterations on a modification of the mapping Φ that
can be approximated as 𝛾Φ + (1 − 𝛾)𝐼. Here 𝐼 is the identity operator and 𝛾 ∈ (0, 1) is a
relaxation parameter. See [73] and [29] for statements and applications of this approach to
establish conditions under which collections of adaptive agents who use least squares learning
converge to a rational expectations equilibrium.
Chapter 46

Markov Perfect Equilibrium
46.1 Contents
• Overview 46.2
• Background 46.3
• Linear Markov perfect equilibria 46.4
• Application 46.5
• Exercises 46.6
• Solutions 46.7
46.2 Overview
46.2.1 Setup
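The setup cell itself was lost in extraction; it presumably loads the packages used throughout the chapter, e.g.:

using LinearAlgebra, Statistics, QuantEcon, Plots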
46.3 Background
Two firms are the only producers of a good the demand for which is governed by a linear in-
verse demand function
𝑝 = 𝑎0 − 𝑎1 (𝑞1 + 𝑞2 ) (1)
Here 𝑝 = 𝑝𝑡 is the price of the good, 𝑞𝑖 = 𝑞𝑖𝑡 is the output of firm 𝑖 = 1, 2 at time 𝑡 and
𝑎0 > 0, 𝑎1 > 0.
In (1) and what follows,
• the time subscript is suppressed when possible to simplify notation
• 𝑥̂ denotes a next period value of variable 𝑥
Each firm recognizes that its output affects total output and therefore the market price.
The one-period payoff function of firm $i$ is price times quantity minus adjustment costs:

$$\pi_i = p q_i - \gamma (\hat q_i - q_i)^2, \qquad \gamma > 0 \tag{2}$$

Substituting the inverse demand curve (1) into (2) lets us express the one-period payoff as

$$\pi_i(q_i, q_{-i}, \hat q_i) = a_0 q_i - a_1 q_i^2 - a_1 q_i q_{-i} - \gamma (\hat q_i - q_i)^2 \tag{3}$$

where $q_{-i}$ denotes the output of the firm other than $i$. The objective of firm $i$ is to maximize $\sum_{t=0}^{\infty} \beta^t \pi_{it}$.
Firm 𝑖 chooses a decision rule that sets next period quantity 𝑞𝑖̂ as a function 𝑓𝑖 of the current
state (𝑞𝑖 , 𝑞−𝑖 ).
An essential aspect of a Markov perfect equilibrium is that each firm takes the decision rule
of the other firm as known and given.
Given 𝑓−𝑖 , the Bellman equation of firm 𝑖 is
$$v_i(q_i, q_{-i}) = \max_{\hat q_i} \left\{ \pi_i(q_i, q_{-i}, \hat q_i) + \beta v_i(\hat q_i, f_{-i}(q_{-i}, q_i)) \right\} \tag{4}$$
Definition A Markov perfect equilibrium of the duopoly model is a pair of value functions
(𝑣1 , 𝑣2 ) and a pair of policy functions (𝑓1 , 𝑓2 ) such that, for each 𝑖 ∈ {1, 2} and each possible
state,
• The value function 𝑣𝑖 satisfies the Bellman equation (4).
• The maximizer on the right side of (4) is equal to 𝑓𝑖 (𝑞𝑖 , 𝑞−𝑖 ).
The adjective “Markov” denotes that the equilibrium decision rules depend only on the cur-
rent values of the state variables, not other parts of their histories.
“Perfect” means complete, in the sense that the equilibrium is constructed by backward in-
duction and hence builds in optimizing behavior for each firm at all possible future states.
• These include many states that will not be reached when we iterate forward
on the pair of equilibrium strategies 𝑓𝑖 starting from a given initial state.
46.3.2 Computation
One strategy for computing a Markov perfect equilibrium is iterating to convergence on pairs
of Bellman equations and decision rules.
In particular, let 𝑣𝑖𝑗 , 𝑓𝑖𝑗 be the value function and policy function for firm 𝑖 at the 𝑗-th itera-
tion.
Imagine constructing the iterates
$$v_i^{j+1}(q_i, q_{-i}) = \max_{\hat q_i} \left\{ \pi_i(q_i, q_{-i}, \hat q_i) + \beta v_i^j(\hat q_i, f_{-i}(q_{-i}, q_i)) \right\} \tag{5}$$
46.4 Linear Markov Perfect Equilibria

As we saw in the duopoly example, the study of Markov perfect equilibria in games with two players leads us to an interrelated pair of Bellman equations.
In linear quadratic dynamic games, these “stacked Bellman equations” become “stacked Ric-
cati equations” with a tractable mathematical structure.
We’ll lay out that structure in a general setup and then apply it to some simple problems.
In the two-player linear quadratic dynamic game, player $i$ minimizes

$$\sum_{t=t_0}^{t_1 - 1} \beta^{t - t_0} \left\{ x_t' R_i x_t + u_{it}' Q_i u_{it} + u_{-it}' S_i u_{-it} + 2 x_t' W_i u_{it} + 2 u_{-it}' M_i u_{it} \right\} \tag{6}$$

subject to the law of motion

$$x_{t+1} = A x_t + B_1 u_{1t} + B_2 u_{2t} \tag{7}$$
Here
• 𝑥𝑡 is an 𝑛 × 1 state vector and 𝑢𝑖𝑡 is a 𝑘𝑖 × 1 vector of controls for player 𝑖
• 𝑅𝑖 is 𝑛 × 𝑛
• 𝑆𝑖 is 𝑘−𝑖 × 𝑘−𝑖
• 𝑄𝑖 is 𝑘𝑖 × 𝑘𝑖
• 𝑊𝑖 is 𝑛 × 𝑘𝑖
• 𝑀𝑖 is 𝑘−𝑖 × 𝑘𝑖
• 𝐴 is 𝑛 × 𝑛
• 𝐵𝑖 is 𝑛 × 𝑘𝑖
Taking the other player's feedback rule $u_{2t} = -F_{2t} x_t$ as given, player 1's problem is to minimize

$$\sum_{t=t_0}^{t_1 - 1} \beta^{t - t_0} \left\{ x_t' \Pi_{1t} x_t + u_{1t}' Q_1 u_{1t} + 2 u_{1t}' \Gamma_{1t} x_t \right\} \tag{8}$$

subject to

$$x_{t+1} = \Lambda_{1t} x_t + B_1 u_{1t} \tag{9}$$

where
- $\Lambda_{it} := A - B_{-i} F_{-it}$
- $\Pi_{it} := R_i + F_{-it}' S_i F_{-it}$
- $\Gamma_{it} := W_i' - M_i' F_{-it}$
𝐹1𝑡 = (𝑄1 + 𝛽𝐵1′ 𝑃1𝑡+1 𝐵1 )−1 (𝛽𝐵1′ 𝑃1𝑡+1 Λ1𝑡 + Γ1𝑡 ) (10)
𝑃1𝑡 = Π1𝑡 −(𝛽𝐵1′ 𝑃1𝑡+1 Λ1𝑡 +Γ1𝑡 )′ (𝑄1 +𝛽𝐵1′ 𝑃1𝑡+1 𝐵1 )−1 (𝛽𝐵1′ 𝑃1𝑡+1 Λ1𝑡 +Γ1𝑡 )+𝛽Λ′1𝑡 𝑃1𝑡+1 Λ1𝑡 (11)
𝐹2𝑡 = (𝑄2 + 𝛽𝐵2′ 𝑃2𝑡+1 𝐵2 )−1 (𝛽𝐵2′ 𝑃2𝑡+1 Λ2𝑡 + Γ2𝑡 ) (12)
𝑃2𝑡 = Π2𝑡 −(𝛽𝐵2′ 𝑃2𝑡+1 Λ2𝑡 +Γ2𝑡 )′ (𝑄2 +𝛽𝐵2′ 𝑃2𝑡+1 𝐵2 )−1 (𝛽𝐵2′ 𝑃2𝑡+1 Λ2𝑡 +Γ2𝑡 )+𝛽Λ′2𝑡 𝑃2𝑡+1 Λ2𝑡 (13)
Key insight
A key insight is that equations (10) and (12) are linear in 𝐹1𝑡 and 𝐹2𝑡 .
After these equations have been solved, we can take 𝐹𝑖𝑡 and solve for 𝑃𝑖𝑡 in (11) and (13).
Infinite horizon
We often want to compute the solutions of such games for infinite horizons, in the hope that
the decision rules 𝐹𝑖𝑡 settle down to be time invariant as 𝑡1 → +∞.
In practice, we usually fix 𝑡1 and compute the equilibrium of an infinite horizon game by driv-
ing 𝑡0 → −∞.
This is the approach we adopt in the next section.
46.4.3 Implementation
We use the function nnash from QuantEcon.jl that computes a Markov perfect equilibrium of
the infinite horizon linear quadratic dynamic game in the manner described above.
46.5 Application
Let’s use these procedures to treat some applications, starting with the duopoly model.
To map the duopoly model into coupled linear-quadratic dynamic programming problems,
define the state and controls as
$$x_t := \begin{bmatrix} 1 \\ q_{1t} \\ q_{2t} \end{bmatrix}
\quad \text{and} \quad
u_{it} := q_{i,t+1} - q_{it}, \quad i = 1, 2$$
If we write

$$x_t' R_i x_t + u_{it}' Q_i u_{it}$$

for $-\pi_{it}$, where $Q_1 = Q_2 = \gamma$,
$$R_1 := \begin{bmatrix} 0 & -\frac{a_0}{2} & 0 \\ -\frac{a_0}{2} & a_1 & \frac{a_1}{2} \\ 0 & \frac{a_1}{2} & 0 \end{bmatrix}
\quad \text{and} \quad
R_2 := \begin{bmatrix} 0 & 0 & -\frac{a_0}{2} \\ 0 & 0 & \frac{a_1}{2} \\ -\frac{a_0}{2} & \frac{a_1}{2} & a_1 \end{bmatrix}$$
$$A := \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},
\quad
B_1 := \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix},
\quad
B_2 := \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$$
The optimal decision rule of firm $i$ will take the form $u_{it} = -F_i x_t$, inducing the following closed loop system for the evolution of $x$ in the Markov perfect equilibrium:

$$x_{t+1} = (A - B_1 F_1 - B_2 F_2) x_t \tag{14}$$
Consider the previously presented duopoly model with parameter values of:
• 𝑎0 = 10
• 𝑎1 = 2
• 𝛽 = 0.96
• 𝛾 = 12
From these we compute the infinite horizon MPE using the following code
# parameters
a0 = 10.0
a1 = 2.0
β = 0.96
γ = 12.0
# in LQ form
A = I + zeros(3, 3)
B1 = [0.0, 1.0, 0.0]
B2 = [0.0, 0.0, 1.0]

# (the R matrices and the solution step were lost in extraction;
#  they are reconstructed from the definitions above)
R1 = [ 0.0   -a0/2  0.0
      -a0/2   a1    a1/2
       0.0    a1/2  0.0]
R2 = [ 0.0   0.0   -a0/2
       0.0   0.0    a1/2
      -a0/2  a1/2   a1]
Q1 = Q2 = γ
S1 = S2 = W1 = W2 = M1 = M2 = 0.0

# solve using QE's nnash function
F1, F2, P1, P2 = nnash(A, B1, B2, R1, R2, Q1, Q2, S1, S2, W1, W2, M1, M2,
                       beta = β)

# display policies
println("Computed policies for firm 1 and firm 2:")
println("F1 = $F1")
println("F2 = $F2")
This is close enough for rock and roll, as they say in the trade.
Indeed, isapprox agrees with our assessment
Out[5]: true
46.5.3 Dynamics
Let’s now investigate the dynamics of price and output in this simple duopoly model under
the MPE policies.
Given our optimal policies 𝐹 1 and 𝐹 2, the state evolves according to (14).
The following program
• imports 𝐹 1 and 𝐹 2 from the previous program along with all parameters
• computes the evolution of 𝑥𝑡 using (14)
• extracts and plots industry output 𝑞𝑡 = 𝑞1𝑡 + 𝑞2𝑡 and price 𝑝𝑡 = 𝑎0 − 𝑎1 𝑞𝑡
AF = A - B1 * F1 - B2 * F2
n = 20
x = zeros(3, n)
x[:, 1] = [1 1 1]
for t in 1:n-1
x[:, t+1] = AF * x[:, t]
end
q1 = x[2, :]
q2 = x[3, :]
q = q1 + q2 # total output, MPE
p = a0 .- a1 * q # price, MPE
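The plotting lines at the end of this cell were lost in extraction; a sketch:

plt = plot(q, color = :blue, lw = 2, alpha = 0.75, label = "total output")
plot!(plt, p, color = :green, lw = 2, alpha = 0.75, label = "price")
plot!(plt, title = "Output and prices, duopoly MPE")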
Out[6]:
Note that the initial condition has been set to 𝑞10 = 𝑞20 = 1.0.
To gain some perspective we can compare this to what happens in the monopoly case.
The first panel in the next figure compares output of the monopolist and industry output un-
der the MPE, as a function of time.
Here parameters are the same as above for both the MPE and monopoly solutions.
The monopolist initial condition is 𝑞0 = 2.0 to mimic the industry initial condition 𝑞10 =
𝑞20 = 1.0 in the MPE case.
As expected, output is higher and prices are lower under duopoly than monopoly.
46.6 Exercises
46.6.1 Exercise 1
Replicate the pair of figures showing the comparison of output and prices for the monopolist
and duopoly under MPE.
Parameters are as in duopoly_mpe.jl and you can use that code to compute MPE policies
under duopoly.
The optimal policy in the monopolist case can be computed using QuantEcon.jl’s LQ type.
46.6.2 Exercise 2
It takes the form of an infinite horizon linear quadratic game proposed by Judd [59].
Two firms set prices and quantities of two goods interrelated through their demand curves.
Relevant variables are defined as follows:
• 𝐼𝑖𝑡 = inventories of firm 𝑖 at beginning of 𝑡
• 𝑞𝑖𝑡 = production of firm 𝑖 during period 𝑡
• 𝑝𝑖𝑡 = price charged by firm 𝑖 during period 𝑡
• 𝑆𝑖𝑡 = sales made by firm 𝑖 during period 𝑡
• 𝐸𝑖𝑡 = costs of production of firm 𝑖 during period 𝑡
• 𝐶𝑖𝑡 = costs of carrying inventories for firm 𝑖 during 𝑡
The firms’ cost functions are
- $C_{it} = c_{i1} + c_{i2} I_{it} + 0.5 c_{i3} I_{it}^2$
- $E_{it} = e_{i1} + e_{i2} q_{it} + 0.5 e_{i3} q_{it}^2$

where $e_{ij}, c_{ij}$ are positive scalars.
Inventories obey the laws of motion

$$I_{i,t+1} = (1 - \delta)(I_{it} + q_{it} - S_{it})$$

Demand is governed by the linear schedule

$$S_t = D p_t + b$$

where

- $S_t = \begin{bmatrix} S_{1t} & S_{2t} \end{bmatrix}'$
- $D$ is a 2 × 2 negative definite matrix, and
- $b$ is a vector of constants
Firm 𝑖 maximizes the undiscounted sum
$$\lim_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T} \left( p_{it} S_{it} - E_{it} - C_{it} \right)$$
We can convert this into a linear quadratic problem by taking

$$u_{it} = \begin{bmatrix} p_{it} \\ q_{it} \end{bmatrix}
\quad \text{and} \quad
x_t = \begin{bmatrix} I_{1t} \\ I_{2t} \\ 1 \end{bmatrix}$$
Decision rules for price and quantity take the form 𝑢𝑖𝑡 = −𝐹𝑖 𝑥𝑡 .
The Markov perfect equilibrium of Judd’s model can be computed by filling in the matrices
appropriately.
The exercise is to calculate these matrices and compute the following figures.
The first figure shows the dynamics of inventories for each firm when the parameters are
In [7]: δ = 0.02
D = [ -1 0.5;
0.5 -1]
b = [25, 25]
c1 = c2 = [1, -2, 1]
e1 = e2 = [10, 10, 3]
46.7 Solutions
46.7.1 Exercise 1
First let’s compute the duopoly MPE under the stated parameters
In [8]: # parameters
a0 = 10.0
a1 = 2.0
β = 0.96
γ = 12.0
# in LQ form
A = I + zeros(3, 3)
B1 = [0.0, 1.0, 0.0]
B2 = [0.0, 0.0, 1.0]
# (R matrices and solution step reconstructed, as in the lecture text)
R1 = [ 0.0   -a0/2  0.0
      -a0/2   a1    a1/2
       0.0    a1/2  0.0]
R2 = [ 0.0   0.0   -a0/2
       0.0   0.0    a1/2
      -a0/2  a1/2   a1]
Q1 = Q2 = γ
S1 = S2 = W1 = W2 = M1 = M2 = 0.0

F1, F2, P1, P2 = nnash(A, B1, B2, R1, R2, Q1, Q2, S1, S2, W1, W2, M1, M2,
                       beta = β)
Now we evaluate the time path of industry output and prices given initial condition 𝑞10 =
𝑞20 = 1
In [9]: AF = A - B1 * F1 - B2 * F2
n = 20
x = zeros(3, n)
x[:, 1] = [1 1 1]
for t in 1:(n-1)
x[:, t+1] = AF * x[:, t]
end
q1 = x[2, :]
q2 = x[3, :]
q = q1 + q2 # Total output, MPE
p = a0 .- a1 * q # Price, MPE
For the monopolist, setting $\bar q := a_0/(2 a_1)$, the problem maps into a scalar LQ problem with

$$x_t = q_t - \bar q \quad \text{and} \quad u_t = q_{t+1} - q_t$$

$$R = a_1 \quad \text{and} \quad Q = \gamma$$

$$A = B = 1$$
In [10]: R = a1
         Q = γ
         A = B = 1
         lq_alt = QuantEcon.LQ(Q, R, A, B, bet = β)
         P, F, d = stationary_values(lq_alt)

         q̄ = a0 / (2.0 * a1) # renamed from `q` so the MPE output series is kept

         qm = zeros(n)
         qm[1] = 2
         x0 = qm[1] - q̄
         x = x0
         for i in 2:n
             x = A * x - B * F[1] * x
             qm[i] = float(x) + q̄
         end
         pm = a0 .- a1 * qm
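The comparison cell In [11] was lost in extraction; a sketch that produces the two panels described in the text:

In [11]: plt_q = plot(qm, color = :blue, lw = 2, alpha = 0.75,
                      label = "monopolist output", title = "Output")
         plot!(plt_q, q, color = :green, lw = 2, alpha = 0.75,
               label = "MPE total output")
         plt_p = plot(pm, color = :blue, lw = 2, alpha = 0.75,
                      label = "monopolist price", title = "Price")
         plot!(plt_p, p, color = :green, lw = 2, alpha = 0.75,
               label = "MPE price")
         plot(plt_q, plt_p, layout = (2, 1), size = (700, 600))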
Out[11]:
46.7.2 Exercise 2
In [12]: δ = 0.02
D = [-1 0.5;
0.5 -1]
b = [25, 25]
c1 = c2 = [1, -2, 1]
e1 = e2 = [10, 10, 3]
δ_1 = 1-δ
Out[12]: 0.98
Recalling the state and controls

$$u_{it} = \begin{bmatrix} p_{it} \\ q_{it} \end{bmatrix}
\quad \text{and} \quad
x_t = \begin{bmatrix} I_{1t} \\ I_{2t} \\ 1 \end{bmatrix}$$

we set up the matrices as follows.
In [13]: # (the top of this cell was lost in extraction; A, B1, B2 and R1 are
         #  reconstructed to mirror the surviving R2, Q, W, M definitions)
         A = [δ_1  0   -δ_1 * b[1]
              0    δ_1 -δ_1 * b[2]
              0    0    1]
         B1 = δ_1 * [1.0 -D[1, 1]
                     0.0 -D[2, 1]
                     0.0  0.0]
         B2 = δ_1 * [0.0 -D[1, 2]
                     1.0 -D[2, 2]
                     0.0  0.0]
         R1 = -[0.5 * c1[3]  0  0.5 * c1[2]
                0            0  0
                0.5 * c1[2]  0  c1[1]]
         R2 = -[0  0            0
                0  0.5 * c2[3]  0.5 * c2[2]
                0  0.5 * c2[2]  c2[1]]
         Q1 = [-0.5 * e1[3]  0
                0            D[1, 1]]
         Q2 = [-0.5 * e2[3]  0
                0            D[2, 2]]
S1 = zeros(2, 2)
S2 = copy(S1)
W1 = [ 0.0 0.0;
0.0 0.0;
-0.5 * e1[2] b[1] / 2.0]
W2 = [ 0.0 0.0;
0.0 0.0;
-0.5 * e2[2] b[2] / 2.0]
M1 = [0.0 0.0;
0.0 D[1, 2] / 2.0]
M2 = copy(M1)
In [14]: F1, F2, P1, P2 = nnash(A, B1, B2, R1, R2, Q1, Q2, S1, S2, W1, W2, M1, M2)
Now let’s look at the dynamics of inventories, and reproduce the graph corresponding to 𝛿 =
0.02
In [15]: AF = A - B1 * F1 - B2 * F2
n = 25
x = zeros(3, n)
x[:, 1] = [2 0 1]
for t in 1:(n-1)
x[:, t+1] = AF * x[:, t]
end
I1 = x[1, :]
I2 = x[2, :]
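The plotting lines at the end of this cell were lost in extraction; a sketch:

plt = plot(I1, color = :blue, lw = 2, alpha = 0.75, label = "inventories, firm 1")
plot!(plt, I2, color = :green, lw = 2, alpha = 0.75, label = "inventories, firm 2")
plot!(plt, title = "delta = 0.02")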
Out[15]:
Chapter 47

Asset Pricing I: Finite State Models
47.1 Contents
• Overview 47.2
• Pricing Models 47.3
• Prices in the Risk Neutral Case 47.4
• Asset Prices under Risk Aversion 47.5
• Exercises 47.6
• Solutions 47.7
“A little knowledge of geometric series goes a long way” – Robert E. Lucas, Jr.
47.2 Overview
What happens if for some reason traders discount payouts differently depending on the state
of the world?
Michael Harrison and David Kreps [49] and Lars Peter Hansen and Scott Richard [43] showed
that in quite general settings the price of an ex-dividend asset obeys

$$p_t = \mathbb{E}_t \left[ m_{t+1} (d_{t+1} + p_{t+1}) \right] \tag{2}$$

for some stochastic discount factor $m_{t+1}$.
Recall that, from the definition of a conditional covariance $\text{cov}_t(x_{t+1}, y_{t+1})$, we have

$$\mathbb{E}_t(x_{t+1} y_{t+1}) = \text{cov}_t(x_{t+1}, y_{t+1}) + \mathbb{E}_t x_{t+1} \, \mathbb{E}_t y_{t+1} \tag{3}$$

Applying this to the asset pricing equation (2) gives

$$p_t = \mathbb{E}_t m_{t+1} \, \mathbb{E}_t(d_{t+1} + p_{t+1}) + \text{cov}_t(m_{t+1}, d_{t+1} + p_{t+1}) \tag{4}$$

so prices depend on the covariance between the stochastic discount factor and the payout, as well as on their expected values.
Aside from prices, another quantity of interest is the price-dividend ratio 𝑣𝑡 ∶= 𝑝𝑡 /𝑑𝑡 .
Let’s write down an expression that this ratio should satisfy.
We can divide both sides of (2) by 𝑑𝑡 to get
$$v_t = \mathbb{E}_t \left[ m_{t+1} \frac{d_{t+1}}{d_t} (1 + v_{t+1}) \right] \tag{5}$$
What can we say about price dynamics on the basis of the models described above?
The answer to this question depends on

1. the process we specify for dividends, and
2. the stochastic discount factor and how it correlates with dividends.
For now let’s focus on the risk neutral case, where the stochastic discount factor is constant,
and study how prices depend on the dividend process.
The simplest case is risk neutral pricing in the face of a constant, non-random dividend
stream 𝑑𝑡 = 𝑑 > 0.
Removing the expectation from (1) and iterating forward gives
𝑝𝑡 = 𝛽(𝑑 + 𝑝𝑡+1 )
= 𝛽(𝑑 + 𝛽(𝑑 + 𝑝𝑡+2 ))
⋮
= 𝛽(𝑑 + 𝛽𝑑 + 𝛽 2 𝑑 + ⋯ + 𝛽 𝑘−2 𝑑 + 𝛽 𝑘−1 𝑝𝑡+𝑘 )
If $\lim_{k \to \infty} \beta^{k-1} p_{t+k} = 0$, this sequence converges to the constant price

$$\bar p := \frac{\beta d}{1 - \beta} \tag{6}$$
Consider a growing, non-random dividend process 𝑑𝑡+1 = 𝑔𝑑𝑡 where 0 < 𝑔𝛽 < 1.
While prices are not usually constant when dividends grow over time, the price dividend-ratio
might be.
If we guess this, substituting 𝑣𝑡 = 𝑣 into (5) as well as our other assumptions, we get 𝑣 =
𝛽𝑔(1 + 𝑣).
Since 𝛽𝑔 < 1, we have a unique positive solution:
$$v = \frac{\beta g}{1 - \beta g}$$

The price is then

$$p_t = \frac{\beta g}{1 - \beta g} d_t$$
If, in this example, we take $g = 1 + \kappa$ and let $\rho := 1/\beta - 1$, then the price becomes

$$p_t = \frac{1 + \kappa}{\rho - \kappa} d_t$$

This is called the Gordon formula.
Next, we suppose the dividend growth rate is stochastic:

$$g_t = g(X_t), \quad t = 1, 2, \ldots$$

where

1. $\{X_t\}$ is a finite Markov chain with state space $S$ and transition probabilities $P(x, y) := \mathbb{P}\{X_{t+1} = y \mid X_t = x\}$
2. $g$ is a given function on $S$ taking positive values
47.4.4 Setup
In [3]: n = 25
mc = tauchen(n, 0.96, 0.25)
sim_length = 80
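The rest of the cell, which simulates and plots a path of the dividend growth rate, was lost in extraction; a sketch:

x_series = simulate(mc, sim_length; init = round(Int, n / 2))
g_series = exp.(x_series)
plot(g_series, lw = 2, label = "g_t", xlabel = "time")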
Out[3]:
Pricing
To obtain asset prices in this setting, let’s adapt our analysis from the case of deterministic
growth.
In that case we found that 𝑣 is constant.
This encourages us to guess that, in the current case, 𝑣𝑡 is constant given the state 𝑋𝑡 .
In other words, we are looking for a fixed function 𝑣 such that the price-dividend ratio satis-
fies 𝑣𝑡 = 𝑣(𝑋𝑡 ).
We can substitute this guess into (5) to get

$$v(X_t) = \beta \, \mathbb{E}_t \left[ g(X_{t+1}) (1 + v(X_{t+1})) \right] \tag{8}$$

or, letting $K(x, y) := g(y) P(x, y)$ and writing the equation in vector form,

$$v = \beta K (\mathbb{1} + v) \tag{9}$$
Here
• 𝑣 is understood to be the column vector (𝑣(𝑥1 ), … , 𝑣(𝑥𝑛 ))′
• 𝐾 is the matrix (𝐾(𝑥𝑖 , 𝑥𝑗 ))1≤𝑖,𝑗≤𝑛
• 𝟙 is a column vector of ones
When does (9) have a unique solution?
From the Neumann series lemma and Gelfand’s formula, this will be the case if 𝛽𝐾 has spec-
tral radius strictly less than one.
In other words, we require that the eigenvalues of 𝐾 be strictly less than 𝛽 −1 in modulus.
The solution is then

$$v = (I - \beta K)^{-1} \beta K \mathbb{1} \tag{10}$$
47.4.5 Code
In [4]: β = 0.9 # discount factor (the original value was lost in extraction)
        K = mc.p .* exp.(mc.state_values)'
        v = (I - β * K) \ (β * K * ones(n, 1))
plot(mc.state_values,
v,
lw = 2,
ylabel = "price-dividend ratio",
xlabel = "state",
alpha = 0.7,
label = "v")
Out[4]:
Now let’s turn to the case where agents are risk averse.
We’ll price several distinct assets, including
• The price of an endowment stream.
• A consol (a variety of bond issued by the UK government in the 19th century).
• Call options on a consol.
Let’s start with a version of the celebrated asset pricing model of Robert E. Lucas, Jr. [69].
As in [69], suppose that the stochastic discount factor takes the form
$$m_{t+1} = \beta \frac{u'(c_{t+1})}{u'(c_t)} \tag{11}$$
where $u$ is a concave utility function. For now, assume the CRRA specification

$$u(c) = \frac{c^{1-\gamma}}{1-\gamma} \quad \text{with} \quad \gamma > 0 \tag{12}$$

When the asset being priced is a claim on the consumption endowment, so that $c_t = d_t$, the stochastic discount factor becomes

$$m_{t+1} = \beta \left( \frac{c_{t+1}}{c_t} \right)^{-\gamma} = \beta g_{t+1}^{-\gamma} \tag{13}$$
If we let

$$J(x, y) := g(y)^{1-\gamma} P(x, y)$$

then the analogue of the risk neutral pricing equation (9) is

$$v = \beta J (\mathbb{1} + v)$$
Assuming that the spectral radius of 𝐽 is strictly less than 𝛽 −1 , this equation has the unique
solution
𝑣 = (𝐼 − 𝛽𝐽 )−1 𝛽𝐽 𝟙 (14)
We will define a function tree_price to solve for $v$ given parameters stored in AssetPriceModel objects.

In [5]: # (the top of this cell was lost in extraction; the state process and
        #  the model constructor are reconstructed, with default values assumed)
        default_mc = tauchen(25, 0.9, 0.02)

        AssetPriceModel = @with_kw (β = 0.96,
                                    γ = 2.0,
                                    mc = default_mc,
                                    n = size(mc.p)[1],
                                    g = exp)

        function tree_price(ap; γ = ap.γ)
            # Simplify names; build J(x, y) = g(y)^(1-γ) P(x, y)
            @unpack β, mc, g = ap
            J = mc.p .* g.(mc.state_values').^(1 - γ)
            @assert maximum(abs, eigvals(J)) * β < 1 "Spectral radius condition failed"

            # Compute v
            v = (I - β * J) \ sum(β * J, dims = 2)
            return v
        end
Here’s a plot of 𝑣 as a function of the state for several values of 𝛾, with a positively correlated
Markov process and 𝑔(𝑥) = exp(𝑥)
In [6]: ap = AssetPriceModel()
        γs = [1.2, 1.4, 1.6, 1.8, 2.0] # (definitions lost in extraction; values assumed)
        lines = []
labels = []
for γ in γs
v = tree_price(ap, γ = γ)
label = "gamma = $γ"
push!(labels, label)
push!(lines, v)
end
plot(lines,
labels = reshape(labels, 1, length(labels)),
title = "Price-dividend ratio as a function of the state",
ylabel = "price-dividend ratio",
xlabel = "state")
Out[6]:
Special cases

In the special case $\gamma = 1$, we have $J = P$. Recalling that $P^i \mathbb{1} = \mathbb{1}$ for all $i$ and applying the Neumann series lemma, we are led to

$$v = \beta (I - \beta P)^{-1} \mathbb{1} = \beta \sum_{i=0}^{\infty} \beta^i P^i \mathbb{1} = \frac{\beta}{1 - \beta} \mathbb{1}$$
Thus, with log preferences, the price-dividend ratio for a Lucas tree is constant.
Alternatively, if 𝛾 = 0, then 𝐽 = 𝐾 and we recover the risk neutral solution (10).
This is as expected, since 𝛾 = 0 implies 𝑢(𝑐) = 𝑐 (and hence agents are risk neutral).
A risk-free consol promises to pay a constant coupon $\zeta > 0$ each period. Purchasing the consol at the end of period $t$ entitles the owner to

- $\zeta$ in period $t + 1$, plus
- the right to sell the claim for $p_{t+1}$ next period

The price satisfies (2) with $d_t = \zeta$, or
$$p_t = \mathbb{E}_t \left[ m_{t+1} (\zeta + p_{t+1}) \right]$$

With the stochastic discount factor (13), this becomes

$$p_t = \mathbb{E}_t \left[ \beta g_{t+1}^{-\gamma} (\zeta + p_{t+1}) \right] \tag{15}$$
Letting $M(x, y) = P(x, y) g(y)^{-\gamma}$ and rewriting in vector notation yields the solution

$$p = (I - \beta M)^{-1} \beta M \zeta \mathbb{1} \tag{16}$$
function consol_price(ap, ζ)
    @unpack β, γ, mc, g = ap               # (function header reconstructed)
    M = mc.p .* g.(mc.state_values').^(-γ) # M(x, y) = P(x, y) g(y)^(-γ)
    # Compute price
    return (I - β * M) \ sum(β * ζ * M, dims = 2)
end
Let’s now price options of varying maturity that give the right to purchase a consol at a price
𝑝𝑆 .
2. Not to exercise the option now but to retain the right to exercise it later
Thus, the owner either exercises the option now, or chooses not to exercise and wait until
next period.
This is termed an infinite-horizon call option with strike price 𝑝𝑆 .
The owner of the option is entitled to purchase the consol at the price 𝑝𝑆 at the beginning of
any period, after the coupon has been paid to the previous owner of the bond.
The fundamentals of the economy are identical with the one above, including the stochastic
discount factor and the process for consumption.
Let 𝑤(𝑋𝑡 , 𝑝𝑆 ) be the value of the option when the time 𝑡 growth state is known to be 𝑋𝑡 but
before the owner has decided whether or not to exercise the option at time 𝑡 (i.e., today).
Recalling that 𝑝(𝑋𝑡 ) is the value of the consol when the initial growth state is 𝑋𝑡 , the value
of the option satisfies
$$w(X_t, p_S) = \max \left\{ \beta \, \mathbb{E}_t \frac{u'(c_{t+1})}{u'(c_t)} w(X_{t+1}, p_S), \; p(X_t) - p_S \right\}$$
The first term on the right is the value of waiting, while the second is the value of exercising
now.
We can also write this as

$$w(x, p_S) = \max \left\{ \beta \sum_{y \in S} P(x, y) g(y)^{-\gamma} w(y, p_S), \; p(x) - p_S \right\} \tag{17}$$
With $M(x, y) = P(x, y) g(y)^{-\gamma}$ and $w$ as the vector of values $(w(x_i, p_S))_{i=1}^n$, we can express (17) as the nonlinear vector equation

$$w = \max\{\beta M w, \; p - p_S \mathbb{1}\} \tag{18}$$
To solve (18), form the operator $T$ mapping vector $w$ into vector $Tw$ via

$$Tw = \max\{\beta M w, \; p - p_S \mathbb{1}\}$$

Starting from some initial $w$, we iterate with $T$ to convergence.

In [8]: function call_option(ap, ζ, p_s, ϵ = 1e-7)
            # (only the tail of this cell survived extraction; the body is
            #  reconstructed to iterate the operator T to convergence)
            @unpack β, γ, mc, g = ap
            M = mc.p .* g.(mc.state_values').^(-γ)
            p = consol_price(ap, ζ)

            w = zeros(ap.n, 1)
            error = ϵ + 1
            while error > ϵ
                w_new = max.(β * M * w, p .- p_s)
                error = maximum(abs, w_new - w)
                w = w_new
            end
            return w
        end
In [9]: ap = AssetPriceModel(β=0.9)
ζ = 1.0
strike_price = 40.0
x = ap.mc.state_values
p = consol_price(ap, ζ)
w = call_option(ap, ζ, strike_price)
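The plotting lines at the end of the cell were lost in extraction; a sketch:

plot(x, p, color = :blue, lw = 2, label = "consol price")
plot!(x, w, color = :green, lw = 2, label = "value of call option")
plot!(xlabel = "state")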
Out[9]:
As before, the stochastic discount factor is $m_{t+1} = \beta g_{t+1}^{-\gamma}$.

It follows that the reciprocal $R_t^{-1}$ of the gross risk-free interest rate $R_t$ in state $x$ is

$$\mathbb{E}_t m_{t+1} = \beta \sum_{y \in S} P(x, y) g(y)^{-\gamma}$$

or, in vector form,

$$m_1 = \beta M \mathbb{1}$$
where the 𝑖-th element of 𝑚1 is the reciprocal of the one-period gross risk-free interest rate in
state 𝑥𝑖 .
Other terms
Let 𝑚𝑗 be an 𝑛 × 1 vector whose 𝑖 th component is the reciprocal of the 𝑗 -period gross risk-
free interest rate in state 𝑥𝑖 .
Then $m_1 = \beta M \mathbb{1}$, and $m_{j+1} = \beta M m_j$ for $j \geq 1$.
47.6 Exercises
47.6.1 Exercise 1
47.6.2 Exercise 2
In [10]: n = 5
P = fill(0.0125, n, n) + (0.95 - 0.0125)I
s = [1.05, 1.025, 1.0, 0.975, 0.95]
γ = 2.0
β = 0.94
ζ = 1.0
Out[10]: 1.0
47.6.3 Exercise 3
Let’s consider finite horizon call options, which are more common than the infinite horizon
variety.
Finite horizon options obey functional equations closely related to (17).
A 𝑘 period option expires after 𝑘 periods.
If we view today as date zero, a 𝑘 period option gives the owner the right to exercise the op-
tion to purchase the risk-free consol at the strike price 𝑝𝑆 at dates 0, 1, … , 𝑘 − 1.
The option expires at time 𝑘.
Thus, for 𝑘 = 1, 2, …, let 𝑤(𝑥, 𝑘) be the value of a 𝑘-period option.
It obeys

$$w(x, k) = \max \left\{ \beta \sum_{y \in S} P(x, y) g(y)^{-\gamma} w(y, k-1), \; p(x) - p_S \right\}$$

with $w(x, 0) = 0$ for all $x$. The exercise is to compute and plot $w(x, k)$ for $k = 5$ and $k = 25$, using the parameter values of the preceding example.
47.7 Solutions
47.7.1 Exercise 2
In [11]: n = 5
P = fill(0.0125, n, n) + (0.95 - 0.0125)I
s = [0.95, 0.975, 1.0, 1.025, 1.05] # state values
mc = MarkovChain(P, s)
γ = 2.0
β = 0.94
ζ = 1.0
p_s = 150.0
Out[11]: 150.0
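The cell constructing the model object was lost in extraction; since the states s are growth factors directly, presumably something like:

In [12]: ap = AssetPriceModel(β = β, mc = mc, γ = γ, g = x -> x)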
In [13]: v = tree_price(ap)
println("Lucas Tree Prices: $v\n")
47.7.2 Exercise 3
In [16]: function finite_horizon_call_option(ap, ζ, p_s, k)
             # (only the tail of this cell survived; the body applies the option
             #  operator k times, starting from w = 0)
             @unpack β, γ, mc, g = ap
             M = mc.p .* g.(mc.state_values').^(-γ)
             p = consol_price(ap, ζ)
             w = zeros(ap.n, 1)
             for i in 1:k
                 w = max.(β * M * w, p .- p_s)
             end
             return w
         end
In [17]: lines = []
labels = []
for k in [5, 25]
w = finite_horizon_call_option(ap, ζ, p_s, k)
push!(lines, w)
push!(labels, "k = $k")
end
plot(lines, labels = reshape(labels, 1, length(labels)))
Out[17]:
Not surprisingly, the option has greater value with larger 𝑘. This is because the owner has a
longer time horizon over which he or she may exercise the option.
Chapter 48

Asset Pricing II: The Lucas Asset Pricing Model
48.1 Contents
• Overview 48.2
• The Lucas Model 48.3
• Exercises 48.4
• Solutions 48.5
48.2 Overview
48.3 The Lucas Model

Lucas studied a pure exchange economy with a representative consumer (or household), where
• Pure exchange means that all endowments are exogenous.
• Representative consumer means that either
– there is a single consumer (sometimes also referred to as a household), or
– all consumers have identical endowments and preferences
Either way, the assumption of a representative agent means that prices adjust to eradicate
desires to trade.
This makes it very easy to compute competitive equilibrium prices.
Assets
There is a single "productive unit" that costlessly generates a sequence of consumption goods $\{y_t\}_{t=0}^{\infty}$.

We will assume that this endowment is Markovian, following the exogenous process

$$y_{t+1} = G(y_t, z_{t+1})$$

where $\{z_t\}$ is an IID shock sequence with known distribution $\phi$.
Consumers
A representative consumer ranks consumption streams {𝑐𝑡 } according to the time separable
utility functional
$$\mathbb{E} \sum_{t=0}^{\infty} \beta^t u(c_t) \tag{1}$$
Here
• 𝛽 ∈ (0, 1) is a fixed discount factor
• 𝑢 is a strictly increasing, strictly concave, continuously differentiable period utility func-
tion
• 𝔼 is a mathematical expectation
Letting $\pi_t$ denote the consumer's share holding at the start of period $t$ and $p_t$ the (ex-dividend) share price, the budget constraint is

$$c_t + \pi_{t+1} p_t \leq \pi_t y_t + \pi_t p_t$$
$$v(\pi, y) = \max_{c, \pi'} \left\{ u(c) + \beta \int v(\pi', G(y, z)) \phi(dz) \right\}$$

subject to

$$c + \pi' p(y) \leq \pi y + \pi p(y) \tag{2}$$
We can invoke the fact that utility is increasing to claim equality in (2) and hence eliminate
the constraint, obtaining
$$v(\pi, y) = \max_{\pi'} \left\{ u[\pi(y + p(y)) - \pi' p(y)] + \beta \int v(\pi', G(y, z)) \phi(dz) \right\} \tag{3}$$
The solution to this dynamic programming problem is an optimal policy expressing either 𝜋′
or 𝑐 as a function of the state (𝜋, 𝑦).
• Each one determines the other, since 𝑐(𝜋, 𝑦) = 𝜋(𝑦 + 𝑝(𝑦)) − 𝜋′ (𝜋, 𝑦)𝑝(𝑦).
Next steps
1. Solve this two dimensional dynamic programming problem for the optimal policy.
2. Impose the equilibrium constraints.
3. Solve out for the price function $p(y)$ directly.

However, as Lucas showed, there is a related but more straightforward way to do this.
Equilibrium constraints
Since the consumption good is not storable, in equilibrium we must have 𝑐𝑡 = 𝑦𝑡 for all 𝑡.
In addition, since there is one representative consumer (alternatively, since all consumers are
identical), there should be no trade in equilibrium.
In particular, the representative consumer owns the whole tree in every period, so 𝜋𝑡 = 1 for
all 𝑡.
Prices must adjust to satisfy these two constraints.
Now observe that the first order condition for (3) can be written as

$$u'(c) \, p(y) = \beta \int v_1(\pi', G(y, z)) \phi(dz)$$

where $v_1$ is the derivative of $v$ with respect to its first argument. An envelope argument gives $v_1(\pi, y) = u'(c)(y + p(y))$.

Next we impose the equilibrium constraints while combining the last two equations to get
$$p(y) = \beta \int \frac{u'[G(y, z)]}{u'(y)} \left[ G(y, z) + p(G(y, z)) \right] \phi(dz) \tag{4}$$
In sequential rather than functional notation, this is the familiar pricing equation

$$p_t = \mathbb{E}_t \left[ \beta \frac{u'(c_{t+1})}{u'(c_t)} (y_{t+1} + p_{t+1}) \right] \tag{5}$$
Instead of solving for it directly we'll follow Lucas' indirect approach, first setting

$$f(y) := u'(y) p(y) \tag{6}$$

so that (4) becomes

$$f(y) = h(y) + \beta \int f(G(y, z)) \phi(dz) \tag{7}$$

Here $h(y) := \beta \int u'[G(y, z)] G(y, z) \phi(dz)$ is a function that depends only on the primitives.
Equation (7) is a functional equation in 𝑓.
The plan is to solve out for 𝑓 and convert back to 𝑝 via (6).
To solve (7) we’ll use a standard method: convert it to a fixed point problem.
First we introduce the operator $T$ mapping $f$ into $Tf$ as defined by

$$(Tf)(y) = h(y) + \beta \int f(G(y, z)) \phi(dz) \tag{8}$$
The reason we do this is that a solution to (7) now corresponds to a function 𝑓 ∗ satisfying
(𝑇 𝑓 ∗ )(𝑦) = 𝑓 ∗ (𝑦) for all 𝑦.
In other words, a solution is a fixed point of 𝑇 .
This means that we can use fixed point theory to obtain and compute the solution.
It can be shown that

1. $T$ is a contraction mapping of modulus $\beta$ on $cb\mathbb{R}_+$, the space of continuous bounded functions on $\mathbb{R}_+$, and hence
2. $T$ has a unique fixed point $f^*$ in this space, with $T^n f \to f^*$ for any $f \in cb\mathbb{R}_+$.

(Note: If you find the mathematics heavy going you can take 1–2 as given and skip to the next section)
Recall the Banach contraction mapping theorem.
It tells us that the previous statements will be true if we can find an $\alpha < 1$ such that

$$\|Tf - Tg\| \leq \alpha \|f - g\|, \qquad \forall \, f, g \in cb\mathbb{R}_+ \tag{9}$$

where $\|h\| := \sup_{x \in \mathbb{R}_+} |h(x)|$. To see that (9) is valid, pick any $f, g \in cb\mathbb{R}_+$ and any $y \in \mathbb{R}_+$. Since integrals get larger when absolute values are moved inside them,

$$|Tf(y) - Tg(y)|
\leq \beta \int |f(G(y, z)) - g(G(y, z))| \phi(dz)
\leq \beta \int \|f - g\| \phi(dz)
= \beta \|f - g\|$$
Since the right hand side is an upper bound, taking the sup over all 𝑦 on the left hand side
gives (9) with 𝛼 ∶= 𝛽.
The preceding discussion tells us that we can compute $f^*$ by picking any arbitrary $f \in cb\mathbb{R}_+$ and then iterating with $T$.

The equilibrium price function $p^*$ can then be recovered by $p^*(y) = f^*(y)/u'(y)$.
Let’s try this when ln 𝑦𝑡+1 = 𝛼 ln 𝑦𝑡 + 𝜎𝜖𝑡+1 where {𝜖𝑡 } is iid and standard normal.
Utility will take the isoelastic form 𝑢(𝑐) = 𝑐1−𝛾 /(1 − 𝛾), where 𝛾 > 0 is the coefficient of
relative risk aversion.
Some code to implement the iterative computational procedure can be found below:
48.3.5 Setup
In [3]: # model
        function LucasTree(;γ = 2.0,
                           β = 0.95,
                           α = 0.9,
                           σ = 0.1,
                           grid_size = 100)

            ϕ = LogNormal(0.0, σ)
            shocks = rand(ϕ, 500)

            # (the grid construction and the computation of h were lost in
            #  extraction and are reconstructed; G(y, z) = y^α z, z lognormal)
            ssd = σ / sqrt(1 - α^2)
            grid = range(exp(-4ssd), exp(4ssd), length = grid_size)

            # h(y) = β ∫ u'(G(y, z)) G(y, z) ϕ(dz), by Monte Carlo
            h = [β * mean((y^α .* shocks).^(1 - γ)) for y in grid]

            return (γ = γ, β = β, α = α, σ = σ, ϕ = ϕ,
                    grid = grid, shocks = shocks, h = h)
        end

        # approximate Lucas operator, which returns the updated function Tf on the grid
        function lucas_operator(lt, f)
            # unpack input
            @unpack grid, α, β, h = lt
            z = lt.shocks

            Af = LinearInterpolation(grid, f, extrapolation_bc=Line())

            Tf = [h[i] + β * mean(Af.(grid[i]^α .* z)) for i in 1:length(grid)]
            return Tf
        end

        # iterate to the fixed point and return the implied prices
        function solve_lucas_model(lt; tol = 1e-6, max_iter = 500)
            @unpack grid, γ = lt

            i = 0
            f = zero(grid) # Initial guess of f
            error = tol + 1

            while error > tol && i < max_iter
                f_new = lucas_operator(lt, f)
                error = maximum(abs, f_new - f)
                f = f_new
                i += 1
            end

            # p(y) = f(y) * y ^ γ
            price = f .* grid.^γ

            return price
        end

In [4]: lt = LucasTree()
        price_vals = solve_lucas_model(lt)

In [5]: plot(lt.grid, price_vals, lw = 2, label = "p*(y)",
             xlabel = "y", ylabel = "price")
Out[5]:
The price is increasing, even if we remove all serial correlation from the endowment process.
The reason is that a larger current endowment reduces current marginal utility.
The price must therefore rise to induce the household to consume the entire endowment (and
hence satisfy the resource constraint).
What happens with a more patient consumer?
Here the orange line corresponds to the previous parameters and the green line is price when
𝛽 = 0.98.
We see that when consumers are more patient the asset becomes more valuable, and the price
of the Lucas tree shifts up.
Exercise 1 asks you to replicate this figure.
48.4 Exercises
48.4.1 Exercise 1

Replicate the figure above, showing the price of the Lucas tree for $\beta = 0.95$ and $\beta = 0.98$.
48.5 Solutions
In [6]: plot()
for β in (.95, 0.98)
tree = LucasTree(;β = β)
grid = tree.grid
price_vals = solve_lucas_model(tree)
plot!(grid, price_vals, lw = 2, label = "beta = $β")
end
Out[6]:
Chapter 49

Asset Pricing III: Incomplete Markets
49.1 Contents
• Overview 49.2
• Structure of the Model 49.3
• Solving the Model 49.4
• Exercises 49.5
• Solutions 49.6
49.2 Overview
49.2.1 References
Prior to reading the following you might like to review our lectures on
• Markov chains
• Asset pricing with finite state space
49.2.2 Bubbles
The Harrison-Kreps model illustrates the following notion of a bubble that attracts many
economists:

    A component of an asset price can be interpreted as a bubble when all investors agree that the current price of the asset exceeds what they believe the asset's underlying dividend stream justifies.
49.2.3 Setup
The model simplifies by ignoring alterations in the distribution of wealth among investors
having different beliefs about the fundamentals that determine asset payouts.
There is a fixed number 𝐴 of shares of an asset.
Each share entitles its owner to a stream of dividends {𝑑𝑡 } governed by a Markov chain de-
fined on a state space 𝑆 ∈ {0, 1}.
The dividend obeys
0 if 𝑠𝑡 = 0
𝑑𝑡 = {
1 if 𝑠𝑡 = 1
The owner of a share at the beginning of time 𝑡 is entitled to the dividend paid at time 𝑡.
The owner of the share at the beginning of time 𝑡 is also entitled to sell the share to another
investor during time 𝑡.
Two types ℎ = 𝑎, 𝑏 of investors differ only in their beliefs about a Markov transition matrix 𝑃
with typical element
𝑃 (𝑖, 𝑗) = ℙ{𝑠𝑡+1 = 𝑗 ∣ 𝑠𝑡 = 𝑖}
The two types of beliefs are

$$P_a = \begin{bmatrix} \frac{1}{2} & \frac{1}{2} \\ \frac{2}{3} & \frac{1}{3} \end{bmatrix}
\qquad \text{and} \qquad
P_b = \begin{bmatrix} \frac{2}{3} & \frac{1}{3} \\ \frac{1}{4} & \frac{3}{4} \end{bmatrix}$$
The stationary (i.e., invariant) distributions of these two matrices can be calculated as fol-
lows:
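The cell computing them was lost in extraction; using QuantEcon.jl it would look something like:

qa = [1/2 1/2; 2/3 1/3] # type a beliefs
qb = [2/3 1/3; 1/4 3/4] # type b beliefs
mcA, mcB = MarkovChain(qa), MarkovChain(qb)
stationary_distributions(mcA)[1], stationary_distributions(mcB)[1]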
An owner of the asset at the end of time 𝑡 is entitled to the dividend at time 𝑡 + 1 and also
has the right to sell the asset at time 𝑡 + 1.
Both types of investors are risk-neutral and both have the same fixed discount factor 𝛽 ∈
(0, 1).
In our numerical example, we’ll set 𝛽 = .75, just as Harrison and Kreps did.
We’ll eventually study the consequences of two different assumptions about the number of
shares 𝐴 relative to the resources that our two types of investors can invest in the stock.
1. Both types of investors have enough resources (either wealth or the capacity to borrow)
so that they can purchase the entire available stock of the asset [1].
2. No single type of investor has sufficient resources to purchase the entire stock.
The above specifications of the perceived transition matrices 𝑃𝑎 and 𝑃𝑏 , taken directly from
Harrison and Kreps, build in stochastically alternating temporary optimism and pessimism.
Remember that state 1 is the high dividend state.
• In state 0, a type 𝑎 agent is more optimistic about next period’s dividend than a type 𝑏
agent.
• In state 1, a type 𝑏 agent is more optimistic about next period’s dividend.
However, the stationary distributions $\pi_a = [.57 \;\; .43]$ and $\pi_b = [.43 \;\; .57]$ tell us that a type $b$ person is more optimistic about the dividend process in the long run than is a type $a$ person.
Transition matrices for the temporarily optimistic and pessimistic investors are constructed as
follows.
Temporarily optimistic investors (i.e., the investor with the most optimistic beliefs in each
state) believe the transition matrix
$$P_o = \begin{bmatrix} \frac{1}{2} & \frac{1}{2} \\ \frac{1}{4} & \frac{3}{4} \end{bmatrix}$$

Temporarily pessimistic investors (i.e., the investor with the most pessimistic beliefs in each state) believe the transition matrix

$$P_p = \begin{bmatrix} \frac{2}{3} & \frac{1}{3} \\ \frac{2}{3} & \frac{1}{3} \end{bmatrix}$$
49.3.4 Information
Investors know a price function mapping the state 𝑠𝑡 at 𝑡 into the equilibrium price 𝑝(𝑠𝑡 ) that
prevails in that state.
This price function is endogenous and to be determined below.
When investors choose whether to purchase or sell the asset at 𝑡, they also know 𝑠𝑡 .
1. All agents share common beliefs, either $P_a$ or $P_b$ (two homogeneous belief economies).

2. There are two types of agent differentiated only by their beliefs. Each type of agent has sufficient resources to purchase all of the asset (Harrison and Kreps's setting).
3. There are two types of agent with different beliefs, but because of limited wealth and/or
limited leverage, both types of investors hold the asset each period.
The following table gives a summary of the findings obtained in the remainder of the lecture
(you will be asked to recreate the table in an exercise).
It records implications of Harrison and Kreps’s specifications of 𝑃𝑎 , 𝑃𝑏 , 𝛽.
𝑠𝑡 0 1
𝑝𝑎 1.33 1.22
𝑝𝑏 1.45 1.91
𝑝𝑜 1.85 2.08
𝑝𝑝 1 1
𝑝𝑎̂ 1.85 1.69
𝑝𝑏̂ 1.69 2.08
Here
• 𝑝𝑎 is the equilibrium price function under homogeneous beliefs 𝑃𝑎
• 𝑝𝑏 is the equilibrium price function under homogeneous beliefs 𝑃𝑏
• 𝑝𝑜 is the equilibrium price function under heterogeneous beliefs with optimistic marginal
investors
• 𝑝𝑝 is the equilibrium price function under heterogeneous beliefs with pessimistic
marginal investors
• 𝑝𝑎̂ is the amount type 𝑎 investors are willing to pay for the asset
• 𝑝𝑏̂ is the amount type 𝑏 investors are willing to pay for the asset
We’ll explain these values and how they are calculated one row at a time.
Under homogeneous beliefs $P_h$ (for $h = a, b$), equilibrium prices satisfy

$$\begin{bmatrix} p_h(0) \\ p_h(1) \end{bmatrix}
= \beta \left[ I - \beta P_h \right]^{-1} P_h \begin{bmatrix} 0 \\ 1 \end{bmatrix} \tag{1}$$
The first two rows of the table report 𝑝𝑎 (𝑠) and 𝑝𝑏 (𝑠).
Here’s a function that can be used to compute these values
function price_single_beliefs(transition, dividend_payoff; β = 0.75)
    # (function body reconstructed around the surviving `return prices`)
    # compute p = β (I - βP)⁻¹ P d, as in equation (1)
    prices = β * inv(I - β * transition) * transition * dividend_payoff
    return prices
end
These equilibrium prices under homogeneous beliefs are important benchmarks for the subse-
quent analysis.
• 𝑝ℎ (𝑠) tells what investor ℎ thinks is the “fundamental value” of the asset.
• Here “fundamental value” means the expected discounted present value of future divi-
dends.
We will compare these fundamental values of the asset with equilibrium values when traders
have different beliefs.
When both types of investor have deep pockets, the equilibrium price function $\bar p$ satisfies

$$\bar p(s) = \beta \max \left\{ P_a(s, 0) \bar p(0) + P_a(s, 1)(1 + \bar p(1)), \;
P_b(s, 0) \bar p(0) + P_b(s, 1)(1 + \bar p(1)) \right\} \tag{2}$$
for 𝑠 = 0, 1.
The marginal investor who prices the asset in state $s$ is of type $a$ if

$$P_a(s, 0) \bar p(0) + P_a(s, 1)(1 + \bar p(1)) > P_b(s, 0) \bar p(0) + P_b(s, 1)(1 + \bar p(1))$$

and of type $b$ if

$$P_a(s, 0) \bar p(0) + P_a(s, 1)(1 + \bar p(1)) < P_b(s, 0) \bar p(0) + P_b(s, 1)(1 + \bar p(1))$$
𝑝̄𝑗+1 (𝑠) = 𝛽 max {𝑃𝑎 (𝑠, 0)𝑝̄𝑗 (0) + 𝑃𝑎 (𝑠, 1)(1 + 𝑝̄𝑗 (1)), 𝑃𝑏 (𝑠, 0)𝑝̄𝑗 (0) + 𝑃𝑏 (𝑠, 1)(1 + 𝑝̄𝑗 (1))} (3)
for 𝑠 = 0, 1.
The third row of the table reports equilibrium prices that solve the functional equation when
𝛽 = .75.
Here the type that is optimistic about 𝑠𝑡+1 prices the asset in state 𝑠𝑡 .
It is instructive to compare these prices with the equilibrium prices for the homogeneous be-
lief economies that solve under beliefs 𝑃𝑎 and 𝑃𝑏 .
Equilibrium prices 𝑝̄ in the heterogeneous beliefs economy exceed what any prospective in-
vestor regards as the fundamental value of the asset in each possible state.
Nevertheless, the economy recurrently visits a state that makes each investor want to pur-
chase the asset for more than he believes its future dividends are worth.
The reason is that he expects to have the option to sell the asset later to another investor
who will value the asset more highly than he will.
• Investors of type 𝑎 are willing to pay the following price for the asset
$$\hat p_a(s) = \begin{cases} \bar p(0) & \text{if } s_t = 0 \\ \beta \left( P_a(1, 0) \bar p(0) + P_a(1, 1)(1 + \bar p(1)) \right) & \text{if } s_t = 1 \end{cases}$$
- Investors of type $b$ are willing to pay the following price for the asset

$$\hat p_b(s) = \begin{cases} \beta \left( P_b(0, 0) \bar p(0) + P_b(0, 1)(1 + \bar p(1)) \right) & \text{if } s_t = 0 \\ \bar p(1) & \text{if } s_t = 1 \end{cases}$$
Outcomes differ when the more optimistic type of investor has insufficient wealth — or insuf-
ficient ability to borrow enough — to hold the entire stock of the asset.
In this case, the asset price must adjust to attract pessimistic investors.
Instead of equation (2), the equilibrium price satisfies
$$\check p(s) = \beta \min \left\{ P_a(s, 0) \check p(0) + P_a(s, 1)(1 + \check p(1)), \;
P_b(s, 0) \check p(0) + P_b(s, 1)(1 + \check p(1)) \right\} \tag{4}$$
and the marginal investor who prices the asset is always the one that values it less highly
than does the other type.
Now the marginal investor is always the (temporarily) pessimistic type.
Notice from the sixth row of the table that the pessimistic price $\check p$ is lower than the homogeneous belief prices $p_a$ and $p_b$ in both states.
When pessimistic investors price the asset according to (4), optimistic investors think that
the asset is underpriced.
If they could, optimistic investors would willingly borrow at the one-period gross interest rate
𝛽 −1 to purchase more of the asset.
Implicit constraints on leverage prohibit them from doing so.
When optimistic investors price the asset as in equation (2), pessimistic investors think that
the asset is overpriced and would like to sell the asset short.
49.4. SOLVING THE MODEL 883
p_new = β * temp
return p_new
end
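The companion function for the heterogeneous beliefs economy with deep pockets, used in the solutions below, was also lost. A sketch consistent with equation (2) and with the willingness-to-pay formulas above:

function price_optimistic_beliefs(transitions, dividend_payoff;
                                  β = 0.75, max_iter = 50_000, tol = 1e-12)
    qa, qb = transitions
    p_new = [0.0, 0.0]
    p_old = fill(10.0, 2)
    for i in 1:max_iter
        p_old = p_new
        # in each state, the most optimistic valuation prices the asset
        p_new = β * max.(qa * (dividend_payoff + p_old),
                         qb * (dividend_payoff + p_old))
        maximum(abs, p_new - p_old) < tol && break
    end

    # willingness to pay of each type, given the resale option (see text above)
    ptwiddle = β * min.(qa * (dividend_payoff + p_new),
                        qb * (dividend_payoff + p_new))
    phat_a = [p_new[1], ptwiddle[2]]
    phat_b = [ptwiddle[1], p_new[2]]
    return p_new, phat_a, phat_b
end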
49.5 Exercises
49.5.1 Exercise 1
Recreate the summary table using the functions we have built above.
𝑠𝑡 0 1
𝑝𝑎 1.33 1.22
𝑝𝑏 1.45 1.91
𝑝𝑜 1.85 2.08
𝑝𝑝 1 1
𝑝𝑎̂ 1.85 1.69
𝑝𝑏̂ 1.69 2.08
You will first need to define the transition matrices and dividend payoff vector.
49.6 Solutions
49.6.1 Exercise 1
First we will obtain equilibrium price vectors with homogeneous beliefs, including when all
investors are optimistic or pessimistic
In [4]: # (the top of this cell was lost in extraction; the belief matrices
        #  and the loop over them are reconstructed)
        qa = [1/2 1/2; 2/3 1/3]    # type a transition matrix
        qb = [2/3 1/3; 1/4 3/4]    # type b transition matrix
        qopt = [1/2 1/2; 1/4 3/4]  # optimistic investor transition matrix
        qpess = [2/3 1/3; 2/3 1/3] # pessimistic investor transition matrix

        dividendreturn = [0; 1]

        transitions = [qa, qb, qopt, qpess]
        labels = ["p_a", "p_b", "p_optimistic", "p_pessimistic"]

        for (transition, label) in zip(transitions, labels)
            println(label)
            println(repeat("=", 20))
            s0, s1 = round.(price_single_beliefs(transition, dividendreturn),
                            digits = 2)
            println("State 0: $s0")
            println("State 1: $s1")
            println(repeat("-", 20))
        end
p_a
====================
State 0: 1.33
State 1: 1.22
--------------------
p_b
====================
State 0: 1.45
State 1: 1.91
--------------------
p_optimistic
====================
State 0: 1.85
State 1: 2.08
--------------------
p_pessimistic
====================
State 0: 1.0
State 1: 1.0
--------------------
We will use the price_optimistic_beliefs function to find the price under heterogeneous be-
liefs.
p_optimistic
====================
State 0: 1.85
State 1: 2.08
--------------------
p_hat_a
====================
State 0: 1.85
State 1: 1.69
--------------------
p_hat_b
====================
State 0: 1.69
State 1: 2.08
--------------------
Notice that the equilibrium price with heterogeneous beliefs is equal to the price under single
beliefs with optimistic investors - this is due to the marginal investor being the temporarily
optimistic type.
Footnotes
[1] By assuming that both types of agent always have “deep enough pockets” to purchase all
of the asset, the model takes wealth dynamics off the table. The Harrison-Kreps model gener-
ates high trading volume when the state changes either from 0 to 1 or from 1 to 0.
Chapter 50
Uncertainty Traps
50.1 Contents
• Overview 50.2
• The Model 50.3
• Implementation 50.4
• Results 50.5
• Exercises 50.6
• Solutions 50.7
• Exercise 2 50.8
50.2 Overview
The original model described in [30] has many interesting moving parts.
Here we examine a simplified version that nonetheless captures many of the key ideas.
50.3.1 Fundamentals

The evolution of the fundamental process $\{\theta_t\}$ is given by

$$\theta_{t+1} = \rho \theta_t + \sigma_\theta w_{t+1} \tag{1}$$

where
• 𝜎𝜃 > 0 and 0 < 𝜌 < 1
• {𝑤𝑡 } is IID and standard normal
The random variable 𝜃𝑡 is not observable at any time.
50.3.2 Output

There is a total $\bar M$ of risk averse entrepreneurs. Output of the $m$-th entrepreneur, conditional on being active in the market at time $t$, equals

$$x_m = \theta + \epsilon_m \quad \text{where} \quad \epsilon_m \sim N(0, \gamma_x^{-1})$$

Here the time subscript on $x_m$ has been dropped to simplify notation.
Let

- $\mathbb{M} \subset \{1, \ldots, \bar M\}$ denote the set of currently active firms
- $M := |\mathbb{M}|$ denote the number of currently active firms
- $X$ be the average output $\frac{1}{M} \sum_{m \in \mathbb{M}} x_m$ of the active firms
With this notation and primes for next period values, we can write the updating of the mean
and precision via
$$\mu' = \rho \, \frac{\gamma \mu + M \gamma_x X}{\gamma + M \gamma_x} \tag{2}$$

$$\gamma' = \left( \frac{\rho^2}{\gamma + M \gamma_x} + \sigma_\theta^2 \right)^{-1} \tag{3}$$
These are standard Kalman filtering results applied to the current setting.
Exercise 1 provides more details on how (2) and (3) are derived, and then asks you to fill in
remaining steps.
The next figure plots the law of motion for the precision in (3) as a 45 degree diagram, with
one curve for each 𝑀 ∈ {0, … , 6}.
The other parameter values are 𝜌 = 0.99, 𝛾𝑥 = 0.5, 𝜎𝜃 = 0.5
Points where the curves hit the 45 degree lines are long run steady states for precision for different values of $M$.
Thus, if one of these values for 𝑀 remains fixed, a corresponding steady state is the equilib-
rium level of precision
• high values of 𝑀 correspond to greater information about the fundamental, and hence
more precision in steady state
• low values of 𝑀 correspond to less information and more uncertainty in steady state
In practice, as we’ll see, the number of active firms fluctuates stochastically.
50.3.4 Participation
Omitting time subscripts once more, entrepreneurs enter the market in the current period if

𝔼[𝑢(𝑥_𝑚 − 𝐹_𝑚)] > 𝑐   (4)
Here
• the mathematical expectation of 𝑥𝑚 is based on (1) and beliefs 𝑁 (𝜇, 𝛾 −1 ) for 𝜃
• 𝐹𝑚 is a stochastic but previsible fixed cost, independent across time and firms
• 𝑐 is a constant reflecting opportunity costs
The statement that 𝐹𝑚 is previsible means that it is realized at the start of the period and
treated as a constant in (4).
The utility function has the constant absolute risk aversion form
𝑢(𝑥) = (1/𝑎)(1 − exp(−𝑎𝑥))   (5)

Combining (4) and (5), entry occurs when

(1/𝑎){1 − 𝔼[exp(−𝑎(𝜃 + 𝜖_𝑚 − 𝐹_𝑚))]} > 𝑐
Using standard formulas for expectations of lognormal random variables, this is equivalent to
the condition
𝜓(𝜇, 𝛾, 𝐹_𝑚) ∶= (1/𝑎)(1 − exp(−𝑎𝜇 + 𝑎𝐹_𝑚 + 𝑎²(1/𝛾 + 1/𝛾_𝑥)/2)) − 𝑐 > 0   (6)
50.4 Implementation
The updating methods follow the laws of motion for 𝜃, 𝜇 and 𝛾 given above.
The method to evaluate the number of active firms generates 𝐹1 , … , 𝐹𝑀̄ and tests condition
(6) for each firm.
50.4.1 Setup
In the results below we use this code to simulate time series for the major variables.
50.5 Results
Let’s look first at the dynamics of 𝜇, which the agents use to track 𝜃
We see that 𝜇 tracks 𝜃 well when there are sufficient firms in the market.
However, there are times when 𝜇 tracks 𝜃 poorly due to insufficient information.
These are episodes where the uncertainty traps take hold.
During these episodes
• precision is low and uncertainty is high
• few firms are in the market
To get a clearer idea of the dynamics, let’s look at all the main time series at once, for a given
set of shocks
Notice how the traps only take hold after a sequence of bad draws for the fundamental.
Thus, the model gives us a propagation mechanism that maps bad random draws into long
downturns in economic activity.
50.6 Exercises
50.6.1 Exercise 1
Fill in the details behind (2) and (3) based on the following standard result (see, e.g., p. 24 of
[109]).
Fact Let x = (𝑥1 , … , 𝑥𝑀 ) be a vector of IID draws from common distribution 𝑁 (𝜃, 1/𝛾𝑥 ) and
let 𝑥̄ be the sample mean. If 𝛾𝑥 is known and the prior for 𝜃 is 𝑁 (𝜇, 1/𝛾), then the posterior
distribution of 𝜃 given x is

𝑁(𝜇_0, 1/𝛾_0)

where

𝜇_0 = (𝜇𝛾 + 𝑀𝑥̄𝛾_𝑥)/(𝛾 + 𝑀𝛾_𝑥)  and  𝛾_0 = 𝛾 + 𝑀𝛾_𝑥
50.6.2 Exercise 2
50.7 Solutions
50.7.1 Exercise 1
This exercise asked you to validate the laws of motion for 𝛾 and 𝜇 given in the lecture, based
on the stated result about Bayesian updating in a scalar Gaussian setting.
The stated result tells us that after observing average output 𝑋 of the 𝑀 firms, our posterior
beliefs will be
𝑁 (𝜇0 , 1/𝛾0 )
where
𝜇_0 = (𝜇𝛾 + 𝑀𝑋𝛾_𝑥)/(𝛾 + 𝑀𝛾_𝑥)  and  𝛾_0 = 𝛾 + 𝑀𝛾_𝑥
If we take a random variable 𝜃 with this distribution and then evaluate the distribution of
𝜌𝜃 + 𝜎𝜃 𝑤 where 𝑤 is independent and standard normal, we get the expressions for 𝜇′ and 𝛾 ′
given in the lecture.
50.8 Exercise 2
First let’s replicate the plot that illustrates the law of motion for precision, which is
𝛾_{𝑡+1} = ( 𝜌²/(𝛾_𝑡 + 𝑀𝛾_𝑥) + 𝜎_𝜃² )^{−1}
Here 𝑀 is the number of active firms. The next figure plots 𝛾𝑡+1 against 𝛾𝑡 on a 45 degree
diagram for different values of 𝑀
The points where the curves hit the 45 degree lines are the long run steady states correspond-
ing to each 𝑀 , if that value of 𝑀 was to remain fixed. As the number of firms falls, so does
the long run steady state of precision.
Next let’s generate time series for beliefs and the aggregates – that is, the number of active
firms and average output
# aggregate functions
# auxiliary function ψ, implementing condition (6)
function ψ(γ, μ, F)
    temp1 = -a * (μ - F)
    temp2 = 0.5 * a^2 * (1 / γ + 1 / γ_x)
    return (1 - exp(temp1 + temp2)) / a - c
end
# compute X, M
function gen_aggregates(γ, μ, θ)
F_vals = σ_F * randn(num_firms)
    M = sum(ψ.(Ref(γ), Ref(μ), F_vals) .> 0)   # counts number of active firms
    if any(ψ(γ, μ, f) > 0 for f in F_vals)     # ∃ an active firm
x_vals = θ .+ σ_x * randn(M)
X = mean(x_vals)
else
X = 0.0
end
return (X = X, M = M)
end
# initialize dataframe
X_init, M_init = gen_aggregates(γ_init, μ_init, θ_init)
df = DataFrame(γ = γ_init, μ = μ_init, θ = θ_init, X = X_init, M = M_init)
# update dataframe
for t in 2:capT
# unpack old variables
        θ_old, γ_old, μ_old, X_old, M_old = (df.θ[end], df.γ[end], df.μ[end],
                                             df.X[end], df.M[end])
# return
return df
end
In [6]: df = simulate(econ)
for i in 1:4
plot!(plt[i], len, yvals[i], xlabel = "t", ylabel = vars[i], label = "")
end
plot(plt)
Chapter 51
The Aiyagari Model
51.1 Contents
• Overview 51.2
• The Economy 51.3
• Firms 51.4
• Code 51.5
51.2 Overview
In this lecture we describe the structure of a class of models that build on work by Truman
Bewley [12].
We begin by discussing an example of a Bewley model due to Rao Aiyagari.
The model features
• Heterogeneous agents.
• A single exogenous vehicle for borrowing and lending.
• Limits on amounts individual agents may borrow.
The Aiyagari model has been used to investigate many topics, including
• precautionary savings and the effect of liquidity constraints [1]
• risk sharing and asset pricing [50]
• the shape of the wealth distribution [9]
• etc., etc., etc.
51.2.1 References
51.3.1 Households
max 𝔼 ∑_{𝑡=0}^{∞} 𝛽^𝑡 𝑢(𝑐_𝑡)
subject to

𝑎_{𝑡+1} + 𝑐_𝑡 ≤ 𝑤𝑧_𝑡 + (1 + 𝑟)𝑎_𝑡,  𝑐_𝑡 ≥ 0,  and  𝑎_𝑡 ≥ −𝐵
where
• 𝑐𝑡 is current consumption
• 𝑎𝑡 is assets
• 𝑧𝑡 is an exogenous component of labor income capturing stochastic unemployment risk,
etc.
• 𝑤 is a wage rate
• 𝑟 is a net interest rate
• 𝐵 is the maximum amount that the agent is allowed to borrow
The exogenous process {𝑧𝑡 } follows a finite state Markov chain with given stochastic matrix
𝑃.
The wage and interest rate are fixed over time.
In this simple version of the model, households supply labor inelastically because they do not
value leisure.
51.4 Firms
𝑌_𝑡 = 𝐴𝐾_𝑡^𝛼 𝑁^{1−𝛼}
where
• 𝐴 and 𝛼 are parameters with 𝐴 > 0 and 𝛼 ∈ (0, 1)
• 𝐾𝑡 is aggregate capital
• 𝑁 is total labor supply (which is constant in this simple version of the model)
𝑟 = 𝐴𝛼 (𝑁/𝐾)^{1−𝛼} − 𝛿   (1)
Using this expression and the firm’s first-order condition for labor, we can pin down the equilibrium wage rate as a function of 𝑟 as

𝑤(𝑟) = 𝐴(1 − 𝛼)(𝐴𝛼/(𝑟 + 𝛿))^{𝛼/(1−𝛼)}   (2)
51.4.1 Equilibrium
Constructing the equilibrium involves the following steps:
1. fix a proposed level of aggregate capital 𝐾
2. determine corresponding prices, with interest rate 𝑟 determined by (1) and a wage rate 𝑤(𝑟) as given in (2)
3. determine the common optimal savings policy of the households given these prices
4. compute aggregate capital as the mean of steady state capital given this savings policy
51.5 Code
Our first task is the least exciting one: write code that maps parameters for a household
problem into the R and Q matrices needed to generate an instance of DiscreteDP.
Below is a piece of boilerplate code that does just this.
In reading the code, the following information will be helpful
• R needs to be a matrix where R[s, a] is the reward at state s under action a.
• Q needs to be a three dimensional array where Q[s, a, s'] is the probability of tran-
sitioning to state s' when the current state is s and the current action is a.
(For a detailed discussion of DiscreteDP see this lecture)
Here we take the state to be 𝑠𝑡 ∶= (𝑎𝑡 , 𝑧𝑡 ), where 𝑎𝑡 is assets and 𝑧𝑡 is the shock.
The action is the choice of next period asset level 𝑎𝑡+1 .
The object also includes a default set of parameters that we’ll adopt unless otherwise speci-
fied.
51.5.1 Setup
next_a_i = s_i_vals[next_s_i, 1]
if next_a_i == a_i
Q[s_i, a_i, next_s_i] = z_chain.p[z_i, next_z_i]
end
end
end
end
return Q
end
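The corresponding reward-matrix builder is not shown in full in this excerpt. Here is a sketch consistent with the R[s, a] convention described above; the name setup_R! mirrors the call that appears later, but treat the body as illustrative:

function setup_R!(R, a_vals, s_vals, r, w, u)
    fill!(R, -Inf)  # infeasible choices keep reward -Inf
    for new_a_i in 1:size(R, 2)
        a_new = a_vals[new_a_i]
        for s_i in 1:size(R, 1)
            a, z = s_vals[s_i, 1], s_vals[s_i, 2]
            c = w * z + (1 + r) * a - a_new  # consumption implied by the choice
            if c > 0
                R[s_i, new_a_i] = u(c)
            end
        end
    end
    return R
end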
As a first example of what we can do, let’s compute and plot an optimal accumulation policy
at fixed prices
# Simplify names
@unpack z_size, a_size, n, a_vals = am
z_vals = am.z_chain.state_values
The plot shows asset accumulation policies at different values of the exogenous state.
Now we want to calculate the equilibrium.
Let’s do this visually as a first pass.
The following code draws aggregate supply and demand curves.
The intersection gives equilibrium interest rates and capital
function r_to_w(r)
return A * (1 - α) * (A * α / (r + δ)) ^ (α / (1 - α))
end
function rd(K)
return A * α * (N / K) ^ (1 - α) - δ
end
function prices_to_capital_stock(am, r)
# Set up problem
w = r_to_w(r)
@unpack a_vals, s_vals, u = am
    setup_R!(am.R, a_vals, s_vals, r, w, u)
    # (the steps solving the household problem and computing the stationary
    #  distribution stationary_probs are omitted in this excerpt)
    # Return K
    return dot(am.s_vals[:, 1], stationary_probs)
end
Chapter 52
Default Risk and Income Fluctuations
52.1 Contents
• Overview 52.2
• Structure 52.3
• Equilibrium 52.4
• Computation 52.5
• Results 52.6
• Exercises 52.7
• Solutions 52.8
52.2 Overview
incentive to default.
This can lead to
• spikes in interest rates
• temporary losses of access to international credit markets
• large drops in output, consumption, and welfare
• large capital outflows during recessions
Such dynamics are consistent with experiences of many countries.
52.3 Structure
A small open economy is endowed with an exogenous stochastically fluctuating potential out-
put stream {𝑦𝑡 }.
Potential output is realized only in periods in which the government honors its sovereign
debt.
The output good can be traded or consumed.
The sequence {𝑦𝑡 } is described by a Markov process with stochastic density kernel 𝑝(𝑦, 𝑦′ ).
Households within the country are identical and rank stochastic consumption streams accord-
ing to
𝔼 ∑_{𝑡=0}^{∞} 𝛽^𝑡 𝑢(𝑐_𝑡)   (1)
Here
• 0 < 𝛽 < 1 is a time discount factor.
• 𝑢 is an increasing and strictly concave utility function.
Consumption sequences enjoyed by households are affected by the government’s decision to
borrow or lend internationally.
The government is benevolent in the sense that its aim is to maximize (1).
The government is the only domestic actor with access to foreign credit.
Because households are averse to consumption fluctuations, the government will try to smooth
consumption by borrowing from (and lending to) foreign creditors.
The only credit instrument available to the government is a one-period bond traded in inter-
national credit markets.
The bond market has the following features
• The bond matures in one period and is not state contingent.
• A purchase of a bond with face value 𝐵′ is a claim to 𝐵′ units of the consumption good
next period.
• To purchase 𝐵′ next period costs 𝑞𝐵′ now; equivalently, by selling −𝐵′ units of next period goods the seller earns −𝑞𝐵′ of today’s goods
– if 𝐵′ < 0, then −𝑞𝐵′ units of the good are received in the current period, for a
promise to repay −𝐵′ units next period
– there is an equilibrium price function 𝑞(𝐵′ , 𝑦) that makes 𝑞 depend on both 𝐵′ and
𝑦
Earnings on the government portfolio are distributed (or, if negative, taxed) lump sum to
households.
When the government is not excluded from financial markets, the one-period national budget
constraint is

𝑐 = 𝑦 + 𝐵 − 𝑞(𝐵′, 𝑦)𝐵′   (2)
Here and below, a prime denotes a next period value or a claim maturing next period.
To rule out Ponzi schemes, we also require that 𝐵 ≥ −𝑍 in every period.
• 𝑍 is chosen to be sufficiently large that the constraint never binds in equilibrium.
Foreign creditors
• are risk neutral
• know the domestic output stochastic process {𝑦𝑡 } and observe 𝑦𝑡 , 𝑦𝑡−1 , … , at time 𝑡
• can borrow or lend without limit in an international credit market at a constant inter-
national interest rate 𝑟
• receive full payment if the government chooses to pay
• receive zero if the government defaults on its one-period debt due
When a government is expected to default next period with probability 𝛿, the expected value
of a promise to pay one unit of consumption next period is 1 − 𝛿.
Therefore, the discounted expected value of a promise to pay 𝐵 next period is
𝑞 = (1 − 𝛿)/(1 + 𝑟)   (3)
Next we turn to how the government in effect chooses the default probability 𝛿.
In each period, a government that is not currently in default chooses between
1. defaulting
2. meeting its current obligations and purchasing or selling an optimal quantity of one-
period sovereign debt
• it returns to 𝑦 only after the country regains access to international credit markets.
While in a state of default, the economy regains access to foreign credit in each subsequent
period with probability 𝜃.
52.4 Equilibrium
1. The interest rate on the government’s debt includes a risk-premium sufficient to make
foreign creditors expect on average to earn the constant risk-free international interest
rate.
To express these ideas more precisely, consider first the choices of the government, which
1. enters a period with initial assets 𝐵, or what is the same thing, initial debt to be repaid now of −𝐵
2. observes current output 𝑦, and
3. chooses either to default, or to pay −𝐵 and set next period’s debt due to −𝐵′
In a recursive formulation,
𝑣_𝑐(𝐵, 𝑦) = max_{𝐵′ ≥ −𝑍} {𝑢(𝑦 − 𝑞(𝐵′, 𝑦)𝐵′ + 𝐵) + 𝛽 ∫ 𝑣(𝐵′, 𝑦′)𝑝(𝑦, 𝑦′)𝑑𝑦′}
Given zero profits for foreign creditors in equilibrium, we can combine (3) and (4) to pin
down the bond price function:
𝑞(𝐵′, 𝑦) = (1 − 𝛿(𝐵′, 𝑦))/(1 + 𝑟)   (5)
An equilibrium is
• a pricing function 𝑞(𝐵′ , 𝑦),
• a triple of value functions (𝑣𝑐 (𝐵, 𝑦), 𝑣𝑑 (𝑦), 𝑣(𝐵, 𝑦)),
• a decision rule telling the government when to default and when to pay as a function of
the state (𝐵, 𝑦), and
• an asset accumulation rule that, conditional on choosing not to default, maps (𝐵, 𝑦) into
𝐵′
such that
• The three Bellman equations for (𝑣𝑐 (𝐵, 𝑦), 𝑣𝑑 (𝑦), 𝑣(𝐵, 𝑦)) are satisfied.
• Given the price function 𝑞(𝐵′ , 𝑦), the default decision rule and the asset accumulation
decision rule attain the optimal value function 𝑣(𝐵, 𝑦), and
• The price function 𝑞(𝐵′ , 𝑦) satisfies equation (5).
52.5 Computation
1. Update the value function v(B, y), the default rule, the implied ex ante default proba-
bility, and the price function.
52.5.1 Setup
# create grids
Bgrid = collect(range(-.4, .4, length = nB))
mc = tauchen(ny, ρ, η)
Π = mc.p
ygrid = exp.(mc.state_values)
ydefgrid = min.(.969 * mean(ygrid), ygrid)
return (β = β, γ = γ, r = r, ρ = ρ, η = η, θ = θ, ny = ny,
nB = nB, ygrid = ygrid, ydefgrid = ydefgrid,
Bgrid = Bgrid, Π = Π, vf = vf, vd = vd, vc = vc,
policy = policy, q = q, defprob = defprob)
end
function one_step_update!(ae,
EV,
EVd,
EVc)
# unpack stuff
@unpack β, γ, r, ρ, η, θ, ny, nB = ae
@unpack ygrid, ydefgrid, Bgrid, Π, vf, vd, vc, policy, q, defprob = ae
zero_ind = searchsortedfirst(Bgrid, 0.)
for iy in 1:ny
y = ae.ygrid[iy]
ydef = ae.ydefgrid[iy]
for ib in 1:nB
B = ae.Bgrid[ib]
current_max = -1e14
pol_ind = 0
for ib_next=1:nB
c = max(y - ae.q[ib_next, iy]*Bgrid[ib_next] + B, 1e-14)
m = u(ae, c) + β * EV[ib_next, iy]
if m > current_max
current_max = m
pol_ind = ib_next
end
end
function compute_prices!(ae)
# unpack parameters
@unpack β, γ, r, ρ, η, θ, ny, nB = ae
# unpack stuff
@unpack β, γ, r, ρ, η, θ, ny, nB = ae
@unpack ygrid, ydefgrid, Bgrid, Π, vf, vd, vc, policy, q, defprob = ae
Πt = Π'
# Iteration stuff
it = 0
dist = 10.
# update prices
compute_prices!(ae)
if it % 25 == 0
println("Finished iteration $(it) with dist of $(dist)")
end
end
end
function QuantEcon.simulate(ae,
capT = 5000;
y_init = mean(ae.ygrid),
B_init = mean(ae.Bgrid),
)
# create a QE MarkovChain
mc = MarkovChain(ae.Π)
y_sim_indices = simulate(mc, capT + 1; init = y_init_ind)
for t in 1:capT
# get today's indexes
yi, Bi = y_sim_indices[t], B_sim_indices[t]
defstat = default_status[t]
if default_today
# default values
default_status[t] = true
default_status[t + 1] = true
y_sim_val[t] = ae.ydefgrid[y_sim_indices[t]]
B_sim_indices[t + 1] = zero_index
B_sim_val[t+1] = 0.
q_sim_val[t] = ae.q[zero_index, y_sim_indices[t]]
else
default_status[t] = false
y_sim_val[t] = ae.ygrid[y_sim_indices[t]]
B_sim_indices[t + 1] = ae.policy[Bi, yi]
B_sim_val[t + 1] = ae.Bgrid[B_sim_indices[t + 1]]
52.6 Results
We can use the results of the computation to study the default probability 𝛿(𝐵′ , 𝑦) defined in
(4).
The next plot shows these default probabilities over (𝐵′ , 𝑦) as a heat map
As anticipated, the probability that the government chooses to default in the following period
increases with indebtedness and falls with income.
Next let’s run a time series simulation of {𝑦𝑡 }, {𝐵𝑡 } and 𝑞(𝐵𝑡+1 , 𝑦𝑡 ).
The grey vertical bars correspond to periods when the economy is excluded from financial
markets because of a past default
One notable feature of the simulated data is the nonlinear response of interest rates.
Periods of relative stability are followed by sharp spikes in the discount rate on government
debt.
52.7 Exercises
52.7.1 Exercise 1
To the extent that you can, replicate the figures shown above
• Use the parameter values listed as defaults in the function ArellanoEconomy.
• The time series will of course vary depending on the shock draws.
52.8 Solutions
In [6]: # create "Y High" and "Y Low" values as 5% devs from mean
high, low = 1.05 * mean(ae.ygrid), 0.95 * mean(ae.ygrid)
iy_high, iy_low = map(x -> searchsortedfirst(ae.ygrid, x), (high, low))
# generate plot
In [8]: heatmap(ae.Bgrid[1:end-1],
ae.ygrid[2:end],
reshape(clamp.(vec(ae.defprob[1:end - 1, 1:end - 1]), 0, 1), 250, 20)')
plot!(xlabel = "B'", ylabel = "y", title = "Probability of default",
legend = :topleft)
# simulate
T = 250
y_vec, B_vec, q_vec, default_vec = simulate(ae, T)
    # Plot the three variables, and for each variable shade the period(s)
    # of default in grey
for i in 1:3
    plot!(plots[i], 1:T, y_vals[i], title = titles[i], xlabel = "time",
          label = "", lw = 2)
for j in 1:length(def_start)
        plot!(plots[i], [def_start[j], def_end[j]],
              fill(maximum(y_vals[i]), 2), label = "")
end
end
plot(plots)
Chapter 53
Globalization and Cycles
53.1 Contents
• Overview 53.2
• Key Ideas 53.3
• Model 53.4
• Simulation 53.5
• Exercises 53.6
• Solutions 53.7
Co-authored with Chase Coleman
53.2 Overview
In this lecture, we review the paper Globalization and Synchronization of Innovation Cycles
by Kiminori Matsuyama, Laura Gardini and Iryna Sushko.
This model helps us understand several interesting stylized facts about the world economy.
One of these is synchronized business cycles across different countries.
Most existing models that generate synchronized business cycles do so by assumption, since
they tie output in each country to a common shock.
They also fail to explain certain features of the data, such as the fact that the degree of syn-
chronization tends to increase with trade ties.
By contrast, in the model we consider in this lecture, synchronization is both endogenous and
increasing with the extent of trade integration.
In particular, as trade costs fall and international competition increases, innovation incentives
become aligned and countries synchronize their innovation cycles.
53.2.1 Background
The model builds on work by Judd [60], Deneckere and Judd [23] and Helpman and Krugman
[52] by developing a two country model with trade and innovation.
On the technical side, the paper introduces the concept of coupled oscillators to economic modeling.
As we will see, coupled oscillators arise endogenously within the model.
Below we review the model and replicate some of the results on synchronization of innovation
across countries.
53.3 Key Ideas
As discussed above, two countries produce and trade with each other.
In each country, firms innovate, producing new varieties of goods and, in doing so, receiving
temporary monopoly power.
Imitators follow and, after one period of monopoly, what had previously been new varieties
now enter competitive production.
Firms have incentives to innovate and produce new goods when the mass of varieties of goods
currently in production is relatively low.
In addition, there are strategic complementarities in the timing of innovation.
Firms have incentives to innovate in the same period, so as to avoid competing with substi-
tutes that are competitively produced.
This leads to temporal clustering in innovations in each country.
After a burst of innovation, the mass of goods currently in production increases.
However, goods also become obsolete, so that not all survive from period to period.
This mechanism generates a cycle, where the mass of varieties increases through simultaneous
innovation and then falls through obsolescence.
53.3.2 Synchronization
In the absence of trade, the timing of innovation cycles in each country is decoupled.
This will be the case when trade costs are prohibitively high.
If trade costs fall, then goods produced in each country penetrate each other’s markets.
As illustrated below, this leads to synchronization of business cycles across the two countries.
53.4 Model
Output (which equals consumption) in country 𝑘 is given by

𝑌_{𝑘,𝑡} = 𝐶_{𝑘,𝑡} = (𝑋^𝑜_{𝑘,𝑡}/(1 − 𝛼))^{1−𝛼} (𝑋_{𝑘,𝑡}/𝛼)^𝛼
Here 𝑋^𝑜_{𝑘,𝑡} is a homogeneous input which can be produced from labor using a linear, one-for-one technology.
It is freely tradeable, competitively supplied, and homogeneous across countries.
By choosing the price of this good as numeraire and assuming both countries find it optimal
to always produce the homogeneous good, we can set 𝑤1,𝑡 = 𝑤2,𝑡 = 1.
The good 𝑋𝑘,𝑡 is a composite, built from many differentiated goods via
𝑋_{𝑘,𝑡}^{1−1/𝜎} = ∫_{Ω_𝑡} [𝑥_{𝑘,𝑡}(𝜈)]^{1−1/𝜎} 𝑑𝜈
Here 𝑥𝑘,𝑡 (𝜈) is the total amount of a differentiated good 𝜈 ∈ Ω𝑡 that is produced.
The parameter 𝜎 > 1 is the direct partial elasticity of substitution between a pair of varieties
and Ω𝑡 is the set of varieties available in period 𝑡.
We can split the varieties into those which are supplied competitively and those supplied monopolistically; that is, Ω_𝑡 = Ω^𝑐_𝑡 + Ω^𝑚_𝑡.
53.4.1 Prices
Demand for each variety in country 𝑘 takes the form

𝑥_{𝑘,𝑡}(𝜈) = (𝑝_{𝑘,𝑡}(𝜈)/𝑃_{𝑘,𝑡})^{−𝜎} (𝛼𝐿_𝑘/𝑃_{𝑘,𝑡})
Here
• 𝑝𝑘,𝑡 (𝜈) is the price of the variety 𝜈 and
• 𝑃𝑘,𝑡 is the price index for differentiated inputs in 𝑘, defined by
[𝑃_{𝑘,𝑡}]^{1−𝜎} = ∫_{Ω_𝑡} [𝑝_{𝑘,𝑡}(𝜈)]^{1−𝜎} 𝑑𝜈
The price of a variety also depends on the origin, 𝑗, and destination, 𝑘, of the goods because
shipping varieties between countries incurs an iceberg trade cost 𝜏𝑗,𝑘 .
Thus the effective price in country 𝑘 of a variety 𝜈 produced in country 𝑗 becomes 𝑝𝑘,𝑡 (𝜈) =
𝜏𝑗,𝑘 𝑝𝑗,𝑡 (𝜈).
Using these expressions, we can derive the total demand for each variety, which is

𝐷_{𝑗,𝑡}(𝜈) = ∑_𝑘 𝜏_{𝑗,𝑘} 𝑥_{𝑘,𝑡}(𝜈) = 𝛼𝐴_{𝑗,𝑡}(𝑝_{𝑗,𝑡}(𝜈))^{−𝜎}

where

𝐴_{𝑗,𝑡} ∶= ∑_𝑘 𝜌_{𝑗,𝑘}𝐿_𝑘/(𝑃_{𝑘,𝑡})^{1−𝜎}  and  𝜌_{𝑗,𝑘} = (𝜏_{𝑗,𝑘})^{1−𝜎} ≤ 1
It is assumed that 𝜏_{1,1} = 𝜏_{2,2} = 1 and 𝜏_{1,2} = 𝜏_{2,1} = 𝜏 for some 𝜏 > 1, so that

𝜌_{1,2} = 𝜌_{2,1} = 𝜌 ∶= 𝜏^{1−𝜎} < 1

With 𝜓 denoting the marginal cost of producing a variety, competitive firms price at marginal cost, so that, for all 𝜈 ∈ Ω^𝑐,

𝑝^𝑐_{𝑗,𝑡}(𝜈) = 𝑝^𝑐_{𝑗,𝑡} ∶= 𝜓  and  𝐷^𝑐_{𝑗,𝑡} = 𝑦^𝑐_{𝑗,𝑡} ∶= 𝛼𝐴_{𝑗,𝑡}(𝑝^𝑐_{𝑗,𝑡})^{−𝜎}

Monopolists will have the same marked-up price, so, for all 𝜈 ∈ Ω^𝑚,

𝑝^𝑚_{𝑗,𝑡}(𝜈) = 𝑝^𝑚_{𝑗,𝑡} ∶= 𝜓/(1 − 1/𝜎)  and  𝐷^𝑚_{𝑗,𝑡} = 𝑦^𝑚_{𝑗,𝑡} ∶= 𝛼𝐴_{𝑗,𝑡}(𝑝^𝑚_{𝑗,𝑡})^{−𝜎}
Define

𝜃 ∶= (𝑝^𝑐_{𝑗,𝑡} 𝑦^𝑐_{𝑗,𝑡})/(𝑝^𝑚_{𝑗,𝑡} 𝑦^𝑚_{𝑗,𝑡}) = (1 − 1/𝜎)^{1−𝜎}
Using the preceding definitions and some algebra, the price indices can now be rewritten as
(𝑃_{𝑘,𝑡}/𝜓)^{1−𝜎} = 𝑀_{𝑘,𝑡} + 𝜌𝑀_{𝑗,𝑡}  where  𝑀_{𝑗,𝑡} ∶= 𝑁^𝑐_{𝑗,𝑡} + 𝑁^𝑚_{𝑗,𝑡}/𝜃
The symbols 𝑁^𝑐_{𝑗,𝑡} and 𝑁^𝑚_{𝑗,𝑡} will denote the measures of Ω^𝑐 and Ω^𝑚 respectively.
To introduce a new variety, a firm must hire 𝑓 units of labor per variety in each country.
Monopolist profits must be less than or equal to zero in expectation, so
𝑁^𝑚_{𝑗,𝑡} ≥ 0,  𝜋^𝑚_{𝑗,𝑡} ∶= (𝑝^𝑚_{𝑗,𝑡} − 𝜓)𝑦^𝑚_{𝑗,𝑡} − 𝑓 ≤ 0,  and  𝜋^𝑚_{𝑗,𝑡}𝑁^𝑚_{𝑗,𝑡} = 0
𝑁^𝑚_{𝑗,𝑡} = 𝜃(𝑀_{𝑗,𝑡} − 𝑁^𝑐_{𝑗,𝑡}) ≥ 0,  (1/𝜎)[ 𝛼𝐿_𝑗/(𝜃(𝑀_{𝑗,𝑡} + 𝜌𝑀_{𝑘,𝑡})) + 𝛼𝐿_𝑘/(𝜃(𝑀_{𝑗,𝑡} + 𝑀_{𝑘,𝑡}/𝜌)) ] ≤ 𝑓
With 𝛿 as the exogenous probability of a variety becoming obsolete, the dynamic equation for
the measure of firms becomes
𝑁^𝑐_{𝑗,𝑡+1} = 𝛿(𝑁^𝑐_{𝑗,𝑡} + 𝑁^𝑚_{𝑗,𝑡}) = 𝛿(𝑁^𝑐_{𝑗,𝑡} + 𝜃(𝑀_{𝑗,𝑡} − 𝑁^𝑐_{𝑗,𝑡}))
To simplify notation, define

𝑛_{𝑗,𝑡} ∶= 𝜃𝜎𝑓𝑁^𝑐_{𝑗,𝑡}/(𝛼(𝐿_1 + 𝐿_2)),  𝑖_{𝑗,𝑡} ∶= 𝜃𝜎𝑓𝑁^𝑚_{𝑗,𝑡}/(𝛼(𝐿_1 + 𝐿_2)),  𝑚_{𝑗,𝑡} ∶= 𝜃𝜎𝑓𝑀_{𝑗,𝑡}/(𝛼(𝐿_1 + 𝐿_2)) = 𝑛_{𝑗,𝑡} + 𝑖_{𝑗,𝑡}/𝜃
We also use 𝑠_𝑗 ∶= 𝐿_𝑗/(𝐿_1 + 𝐿_2) to denote the share of labor employed in country 𝑗.
We can use these definitions and the preceding expressions to obtain a law of motion for
𝑛𝑡 ∶= (𝑛1,𝑡 , 𝑛2,𝑡 ).
In particular, given an initial condition 𝑛_0 = (𝑛_{1,0}, 𝑛_{2,0}) ∈ ℝ²₊, the equilibrium trajectory {𝑛_𝑡}_{𝑡=0}^∞ = {(𝑛_{1,𝑡}, 𝑛_{2,𝑡})}_{𝑡=0}^∞ is obtained by iterating on 𝑛_{𝑡+1} = 𝐹(𝑛_𝑡), where 𝐹 ∶ ℝ²₊ → ℝ²₊ is given by
Here
while

𝑠_1(𝜌) = 1 − 𝑠_2(𝜌) = min{ (𝑠_1 − 𝜌𝑠_2)/(1 − 𝜌), 1 }

and ℎ_𝑗(𝑛_𝑘) is defined implicitly by

1 = 𝑠_𝑗/(ℎ_𝑗(𝑛_𝑘) + 𝜌𝑛_𝑘) + 𝑠_𝑘/(ℎ_𝑗(𝑛_𝑘) + 𝑛_𝑘/𝜌)
Since we know ℎ_𝑗(𝑛_𝑘) > 0, we can just solve the quadratic equation

ℎ_𝑗(𝑛_𝑘)² + ((𝜌 + 1/𝜌)𝑛_𝑘 − 𝑠_𝑗 − 𝑠_𝑘) ℎ_𝑗(𝑛_𝑘) + (𝑛_𝑘² − 𝑠_𝑗𝑛_𝑘/𝜌 − 𝑠_𝑘𝑛_𝑘𝜌) = 0

and return the positive root.
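A sketch of that computation, with the coefficients read directly off the quadratic above (the function name h_j is ours):

function h_j(nk, sj, sk, ρ)
    # coefficients of h² + b h + c = 0, from the quadratic above
    b = (ρ + 1 / ρ) * nk - sj - sk
    c = nk^2 - sj * nk / ρ - sk * nk * ρ
    # return the positive root
    return (-b + sqrt(b^2 - 4c)) / 2
end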
53.5 Simulation
53.5.1 Setup
nsync = 0
# model
function MSGSync(s1 = 0.5, θ = 2.5, δ = 0.7, ρ = 0.2)
# Store other cutoffs and parameters we use
s2 = 1 - s1
s1_ρ = min((s1 - ρ * s2) / (1 - ρ), 1)
s2_ρ = 1 - s1_ρ
# Allocate space
n1 = zeros(T)
n2 = zeros(T)
return n1, n2
end
function create_attraction_basis(model;
maxiter = 250,
npers = 3,
npts = 50)
# Unpack parameters
@unpack s1, s2, θ, δ, ρ, s1_ρ, s2_ρ = model
We write a short function below that exploits the preceding code and plots two time series.
Each time series gives the dynamics for the two countries.
The time series share parameters but differ in their initial condition.
Here’s the function
# Create figures
data_ns = plot_timeseries(0.15, 0.35)
data_s = plot_timeseries(0.4, 0.3)
In the first case, innovation in the two countries does not synchronize.
In the second case different initial conditions are chosen, and the cycles become synchronized.
Next let’s study the initial conditions that lead to synchronized cycles more systematically.
We generate time series from a large collection of different initial conditions and mark those
conditions with different colors according to whether synchronization occurs or not.
The next display shows exactly this for four different parameterizations (one for each subfig-
ure).
Dark colors indicate synchronization, while light colors indicate failure to synchronize.
53.6 Exercises
53.6.1 Exercise 1
Replicate the figure shown above by coloring initial conditions according to whether or not
synchronization occurs from those conditions.
53.7 Solutions
53.7.1 Exercise 1
Part VII

Time Series Models
Chapter 54
Covariance Stationary Processes
54.1 Contents
• Overview 54.2
• Introduction 54.3
• Spectral Analysis 54.4
• Implementation 54.5
54.2 Overview
In this lecture we study covariance stationary linear stochastic processes, a class of models
routinely used to study economic and financial time series.
This class has the advantage of being
1. simple enough to be described by an elegant and comprehensive theory
2. relatively broad in terms of the kinds of dynamics it can represent
We will focus much of our attention on linear covariance stationary models with a finite num-
ber of parameters.
In particular, we will study stationary ARMA processes, which form a cornerstone of the
standard theory of time series analysis.
Every ARMA process can be represented in linear state space form.
However, ARMA processes have some important structure that makes it valuable to study them separately.
54.2.4 Setup
54.3 Introduction
54.3.1 Definitions
A real-valued stochastic process {𝑋_𝑡} is called covariance stationary if
1. its mean 𝜇 ∶= 𝔼𝑋_𝑡 does not depend on 𝑡, and
2. for all 𝑘 in ℤ, the 𝑘-th autocovariance 𝛾(𝑘) ∶= 𝔼(𝑋_𝑡 − 𝜇)(𝑋_{𝑡+𝑘} − 𝜇) is finite and depends only on 𝑘.
Perhaps the simplest class of covariance stationary processes is the class of white noise processes.
A process {𝜖_𝑡} is called a white noise process if
1. 𝔼𝜖_𝑡 = 0, and
2. 𝛾(𝑘) = 𝜎²𝟙{𝑘 = 0} for some 𝜎 > 0
From the simple building block provided by white noise, we can construct a very flexible fam-
ily of covariance stationary processes — the general linear processes
𝑋_𝑡 = ∑_{𝑗=0}^{∞} 𝜓_𝑗 𝜖_{𝑡−𝑗},  𝑡 ∈ ℤ   (1)
where
• {𝜖𝑡 } is white noise
• {𝜓_𝑡} is a square summable sequence in ℝ (that is, ∑_{𝑡=0}^{∞} 𝜓_𝑡² < ∞)
The sequence {𝜓𝑡 } is often called a linear filter.
Equation (1) is said to present a moving average process or a moving average representa-
tion.
With some manipulations it is possible to confirm that the autocovariance function for (1) is
𝛾(𝑘) = 𝜎² ∑_{𝑗=0}^{∞} 𝜓_𝑗 𝜓_{𝑗+𝑘}   (2)
By the Cauchy-Schwarz inequality, one can show that the sum in (2) is finite.
Evidently, 𝛾(𝑘) does not depend on 𝑡.
Remarkably, the class of general linear processes goes a long way towards describing the en-
tire class of zero-mean covariance stationary processes.
In particular, Wold’s decomposition theorem states that every zero-mean covariance station-
ary process {𝑋𝑡 } can be written as
𝑋_𝑡 = ∑_{𝑗=0}^{∞} 𝜓_𝑗 𝜖_{𝑡−𝑗} + 𝜂_𝑡
where
• {𝜖𝑡 } is white noise
• {𝜓𝑡 } is square summable
• 𝜂𝑡 can be expressed as a linear function of 𝑋𝑡−1 , 𝑋𝑡−2 , … and is perfectly predictable
over arbitrarily long horizons
For intuition and further discussion, see [94], p. 286.
54.3.5 AR and MA
A very simple example of a covariance stationary process is the AR(1) process 𝑋_𝑡 = 𝜙𝑋_{𝑡−1} + 𝜖_𝑡, where |𝜙| < 1 and {𝜖_𝑡} is white noise. Its autocovariance function is

𝛾(𝑘) = 𝜙^𝑘 𝜎²/(1 − 𝜙²),  𝑘 = 0, 1, …   (4)
The next figure plots an example of this function for 𝜙 = 0.8 and 𝜙 = −0.8 with 𝜎 = 1
plt_1=plot()
plt_2=plot()
plots = [plt_1, plt_2]
alpha=0.6, label=label)
plot!(plots[i], legend=:topright, xlabel="time", xlim=(0,15))
plot!(plots[i], seriestype=:hline, [0], linestyle=:dash, alpha=0.5,�
↪lw=2, label="")
end
plot(plots[1], plots[2], layout=(2,1), size=(700,500))
Another very simple process is the MA(1) process (here MA means “moving average”)
𝑋𝑡 = 𝜖𝑡 + 𝜃𝜖𝑡−1
The AR(1) can be generalized to an AR(𝑝) and likewise for the MA(1).
Putting all of this together, we get the
A stochastic process {𝑋𝑡 } is called an autoregressive moving average process, or ARMA(𝑝, 𝑞),
if it can be written as

𝑋_𝑡 = 𝜙_1𝑋_{𝑡−1} + ⋯ + 𝜙_𝑝𝑋_{𝑡−𝑝} + 𝜖_𝑡 + 𝜃_1𝜖_{𝑡−1} + ⋯ + 𝜃_𝑞𝜖_{𝑡−𝑞}   (5)

where {𝜖_𝑡} is white noise. In terms of the lag operator 𝐿, this reads

𝐿⁰𝑋_𝑡 − 𝜙_1𝐿¹𝑋_𝑡 − ⋯ − 𝜙_𝑝𝐿^𝑝𝑋_𝑡 = 𝐿⁰𝜖_𝑡 + 𝜃_1𝐿¹𝜖_𝑡 + ⋯ + 𝜃_𝑞𝐿^𝑞𝜖_𝑡   (6)

If we let 𝜙(𝑧) ∶= 1 − 𝜙_1𝑧 − ⋯ − 𝜙_𝑝𝑧^𝑝 and 𝜃(𝑧) ∶= 1 + 𝜃_1𝑧 + ⋯ + 𝜃_𝑞𝑧^𝑞, then (6) becomes 𝜙(𝐿)𝑋_𝑡 = 𝜃(𝐿)𝜖_𝑡   (7)
In what follows we always assume that the roots of the polynomial 𝜙(𝑧) lie outside the unit
circle in the complex plane.
This condition is sufficient to guarantee that the ARMA(𝑝, 𝑞) process is covariance stationary.
In fact it implies that the process falls within the class of general linear processes described
above.
That is, given an ARMA(𝑝, 𝑞) process {𝑋_𝑡} satisfying the unit circle condition, there exists a square summable sequence {𝜓_𝑡} with 𝑋_𝑡 = ∑_{𝑗=0}^{∞} 𝜓_𝑗 𝜖_{𝑡−𝑗} for all 𝑡.
The sequence {𝜓𝑡 } can be obtained by a recursive procedure outlined on page 79 of [18].
The function 𝑡 ↦ 𝜓𝑡 is often called the impulse response function.
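For the ARMA(1, 1) case 𝑋_𝑡 = 𝜙𝑋_{𝑡−1} + 𝜖_𝑡 + 𝜃𝜖_{𝑡−1}, the recursion is simple enough to write directly. A sketch (the general case is handled by the module discussed below):

# ψ₀ = 1, ψ₁ = ϕ + θ, and ψⱼ = ϕψⱼ₋₁ for j ≥ 2
function arma11_impulse_response(ϕ, θ; J = 20)
    ψ = zeros(J + 1)
    ψ[1] = 1.0
    ψ[2] = ϕ + θ
    for j in 3:J+1
        ψ[j] = ϕ * ψ[j-1]
    end
    return ψ
end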
Autocovariance functions provide a great deal of information about covariance stationary pro-
cesses.
In fact, for zero-mean Gaussian processes, the autocovariance function characterizes the entire
joint distribution.
Even for non-Gaussian processes, it provides a significant amount of information.
It turns out that there is an alternative representation of the autocovariance function of a
covariance stationary process, called the spectral density.
At times, the spectral density is easier to derive, easier to manipulate, and provides additional
intuition.
Before discussing the spectral density, we invite you to recall the main properties of complex
numbers (or skip to the next section).
It can be helpful to remember that, in a formal sense, complex numbers are just points
(𝑥, 𝑦) ∈ ℝ2 endowed with a specific notion of multiplication.
When (𝑥, 𝑦) is regarded as a complex number, 𝑥 is called the real part and 𝑦 is called the
imaginary part.
The modulus or absolute value of a complex number 𝑧 = (𝑥, 𝑦) is just its Euclidean norm in
ℝ2 , but is usually written as |𝑧| instead of ‖𝑧‖.
The product of two complex numbers (𝑥, 𝑦) and (𝑢, 𝑣) is defined to be (𝑥𝑢−𝑣𝑦, 𝑥𝑣+𝑦𝑢), while
addition is standard pointwise vector addition.
When endowed with these notions of multiplication and addition, the set of complex numbers
forms a field — addition and multiplication play well together, just as they do in ℝ.
The complex number (𝑥, 𝑦) is often written as 𝑥+𝑖𝑦, where 𝑖 is called the imaginary unit, and
is understood to obey 𝑖2 = −1.
The 𝑥 + 𝑖𝑦 notation provides an easy way to remember the definition of multiplication given
above, because, proceeding naively,
Converted back to our first notation, this becomes (𝑥𝑢 − 𝑣𝑦, 𝑥𝑣 + 𝑦𝑢) as promised.
Complex numbers can be represented in the polar form 𝑟𝑒^{𝑖𝜔}, where 𝑟 ∶= |𝑧| is the modulus and 𝜔 is the angle that (𝑥, 𝑦) makes with the horizontal axis.
Now let {𝑋_𝑡} be a covariance stationary process with autocovariance function 𝛾. The spectral density 𝑓 of {𝑋_𝑡} is defined as

𝑓(𝜔) ∶= ∑_{𝑘∈ℤ} 𝛾(𝑘)𝑒^{𝑖𝜔𝑘}

(Some authors normalize the expression on the right by constants such as 1/𝜋 — the convention chosen makes little difference provided you are consistent)
Using the fact that 𝛾 is even, in the sense that 𝛾(𝑡) = 𝛾(−𝑡) for all 𝑡, we can show that

𝑓(𝜔) = 𝛾(0) + 2 ∑_{𝑘≥1} 𝛾(𝑘) cos(𝜔𝑘)

It follows that 𝑓 is
• real-valued
• even (𝑓(𝜔) = 𝑓(−𝜔)), and
• 2𝜋-periodic, in the sense that 𝑓(2𝜋 + 𝜔) = 𝑓(𝜔) for all 𝜔
It follows that the values of 𝑓 on [0, 𝜋] determine the values of 𝑓 on all of ℝ — the proof is an
exercise.
For this reason it is standard to plot the spectral density only on the interval [0, 𝜋].
It is an exercise to show that the MA(1) process 𝑋_𝑡 = 𝜃𝜖_{𝑡−1} + 𝜖_𝑡 has spectral density

𝑓(𝜔) = 𝜎²(1 + 2𝜃 cos(𝜔) + 𝜃²)   (10)
With a bit more effort, it is possible to show (see, e.g., p. 261 of [94]) that the spectral den-
sity of the AR(1) process 𝑋𝑡 = 𝜙𝑋𝑡−1 + 𝜖𝑡 is
𝑓(𝜔) = 𝜎²/(1 − 2𝜙 cos(𝜔) + 𝜙²)   (11)
More generally, it can be shown that the spectral density of the ARMA process (5) is
𝑓(𝜔) = ∣𝜃(𝑒^{𝑖𝜔})/𝜙(𝑒^{𝑖𝜔})∣² 𝜎²   (12)
where
• 𝜎 is the standard deviation of the white noise process {𝜖𝑡 }
• the polynomials 𝜙(⋅) and 𝜃(⋅) are as defined in (7)
The derivation of (12) uses the fact that convolutions become products under Fourier trans-
formations.
The proof is elegant and can be found in many places — see, for example, [94], chapter 11,
section 4.
It is a nice exercise to verify that (10) and (11) are indeed special cases of (12).
Plotting (11) reveals the shape of the spectral density for the AR(1) model when 𝜙 takes the
values 0.8 and -0.8 respectively
plt_1=plot()
plt_2=plot()
plots=[plt_1, plt_2]
These spectral densities correspond to the autocovariance functions for the AR(1) process
shown above.
Informally, we think of the spectral density as being large at those 𝜔 ∈ [0, 𝜋] at which the
autocovariance function seems approximately to exhibit big damped cycles.
To see the idea, let’s consider why, in the lower panel of the preceding figure, the spectral
density for the case 𝜙 = −0.8 is large at 𝜔 = 𝜋.
Recall that the spectral density can be expressed as

𝑓(𝜔) = 𝛾(0) + 2 ∑_{𝑘≥1} 𝛾(𝑘) cos(𝜔𝑘)   (13)

When we evaluate this at 𝜔 = 𝜋, we get a large number because cos(𝜋𝑘) is large and positive when (−0.8)^𝑘 is positive, and large in absolute value and negative when (−0.8)^𝑘 is negative.
Hence the product is always large and positive, and hence the sum of the products on the
right-hand side of (13) is large.
These ideas are illustrated in the next figure, which has 𝑘 on the horizontal axis
In [5]: ϕ = -0.8
times = 0:16
y1 = [ϕ.^k ./ (1 - ϕ.^2) for k in times]
y2 = [cos.(π * k) for k in times]
y3 = [a * b for (a, b) in zip(y1, y2)]
# Cycles at frequency π
plt_2 = plot(times, y2, color=:blue, lw=2, marker=:circle, markersize=3,
alpha=0.6, label="cos(pi k)")
plot!(plt_2, seriestype=:hline, [0], linestyle=:dash, alpha=0.5,
lw=2, label="")
plot!(plt_2, legend=:topright, xlim=(0,15), yticks=[-1, 0, 1])
# Product
plt_3 = plot(times, y3, seriestype=:sticks, marker=:circle, markersize=3,
lw=2, label="gamma(k) cos(pi k)")
plot!(plt_3, seriestype=:hline, [0], linestyle=:dash, alpha=0.5,
lw=2, label="")
plot!(plt_3, legend=:topright, xlim=(0,15), ylim=(-3,3), yticks=[-1, 0, 1, 2, 3])
On the other hand, if we evaluate 𝑓(𝜔) at 𝜔 = 𝜋/3, then the cycles are not matched, the
sequence 𝛾(𝑘) cos(𝜔𝑘) contains both positive and negative terms, and hence the sum of these
terms is much smaller
In [6]: ϕ = -0.8
times = 0:16
y1 = [ϕ.^k ./ (1 - ϕ.^2) for k in times]
y2 = [cos.(π * k/3) for k in times]
y3 = [a * b for (a, b) in zip(y1, y2)]
# Cycles at frequency π/3
plt_2 = plot(times, y2, color=:blue, lw=2, marker=:circle, markersize=3,
alpha=0.6, label="cos(pi k/3)")
plot!(plt_2, seriestype=:hline, [0], linestyle=:dash, alpha=0.5,
lw=2, label="")
plot!(plt_2, legend=:topright, xlim=(0,15), yticks=[-1, 0, 1])
# Product
plt_3 = plot(times, y3, seriestype=:sticks, marker=:circle, markersize=3,
lw=2, label="gamma(k) cos(pi k/3)")
plot!(plt_3, seriestype=:hline, [0], linestyle=:dash, alpha=0.5,
lw=2, label="")
plot!(plt_3, legend=:topright, xlim=(0,15), ylim=(-3,3), yticks=[-1, 0, 1, 2, 3])
In summary, the spectral density is large at frequencies 𝜔 where the autocovariance function
exhibits damped cycles.
We have just seen that the spectral density is useful in the sense that it provides a frequency-
based perspective on the autocovariance structure of a covariance stationary process.
Another reason that the spectral density is useful is that it can be “inverted” to recover the
autocovariance function via the inverse Fourier transform.
In particular, for all 𝑘 ∈ ℤ, we have
𝛾(𝑘) = (1/2𝜋) ∫_{−𝜋}^{𝜋} 𝑓(𝜔)𝑒^{𝑖𝜔𝑘} 𝑑𝜔   (14)
This is convenient in situations where the spectral density is easier to calculate and manipu-
late than the autocovariance function.
(For example, the expression (12) for the ARMA spectral density is much easier to work with
than the expression for the ARMA autocovariance)
This section is loosely based on [94], p. 249-253, and included for those who
• would like a bit more insight into spectral densities
• and have at least some background in Hilbert space theory
Others should feel free to skip to the next section — none of this material is necessary to
progress to computation.
Recall that every separable Hilbert space 𝐻 has a countable orthonormal basis {ℎ𝑘 }.
The nice thing about such a basis is that every 𝑓 ∈ 𝐻 satisfies

𝑓 = ∑_𝑘 𝛼_𝑘 ℎ_𝑘,  where the Fourier coefficients are 𝛼_𝑘 ∶= ⟨𝑓, ℎ_𝑘⟩

In our setting, the relevant Hilbert space is 𝐿²[−𝜋, 𝜋], for which a natural orthonormal basis is the set of trigonometric functions

ℎ_𝑘(𝜔) = 𝑒^{𝑖𝜔𝑘}/√(2𝜋),  𝑘 ∈ ℤ,  𝜔 ∈ [−𝜋, 𝜋]
Using the definition of 𝑇 from above and the fact that 𝑓 is even, we now have
𝑇𝛾 = ∑_{𝑘∈ℤ} 𝛾(𝑘) 𝑒^{𝑖𝜔𝑘}/√(2𝜋) = (1/√(2𝜋)) 𝑓(𝜔)   (16)
In other words, apart from a scalar multiple, the spectral density is just a transformation of 𝛾 ∈ ℓ² under a certain linear isometry — a different way to view 𝛾.
In particular, it is an expansion of the autocovariance function with respect to the trigono-
metric basis functions in 𝐿2 .
As discussed above, the Fourier coefficients of 𝑇 𝛾 are given by the sequence 𝛾, and, in partic-
ular, 𝛾(𝑘) = ⟨𝑇 𝛾, ℎ𝑘 ⟩.
Transforming this inner product into its integral expression and using (16) gives (14), justify-
ing our earlier expression for the inverse transform.
54.5 Implementation
Most code for working with covariance stationary models deals with ARMA models.
Julia code for studying ARMA models can be found in the DSP.jl package.
Since this code doesn’t quite cover our needs — particularly vis-a-vis spectral analysis —
we’ve put together the module arma.jl, which is part of QuantEcon.jl package.
The module provides functions for mapping ARMA(𝑝, 𝑞) models into their
1. impulse response function
2. simulated time series
3. autocovariance function
4. spectral density
54.5.1 Application
Let’s use this code to replicate the plots on pages 68–69 of [68].
Here are some functions to generate the plots
# plot functions
function plot_spectral_density(arma, plt)
(w, spect) = spectral_density(arma, two_pi=false)
plot!(plt, w, spect, lw=2, alpha=0.7,label="")
plot!(plt, title="Spectral density", xlim=(0,π),
xlabel="frequency", ylabel="spectrum", yscale=:log)
return plt
end
function plot_spectral_density(arma)
    plt = plot()
    plot_spectral_density(arma, plt)
    return plt
end
function plot_autocovariance(arma)
    plt = plot()
    plot_autocovariance(arma, plt)  # two-argument method omitted in this excerpt
    return plt
end
function plot_impulse_response(arma)
    plt = plot()
    plot_impulse_response(arma, plt)  # two-argument method omitted in this excerpt
    return plt
end
function plot_simulation(arma)
    plt = plot()
    plot_simulation(arma, plt)  # two-argument method omitted in this excerpt
    return plt
end
function quad_plot(arma)
plt_1 = plot()
plt_2 = plot()
plt_3 = plot()
plt_4 = plot()
    plot_functions = [plot_spectral_density,
                      plot_impulse_response,
                      plot_autocovariance,
                      plot_simulation]
    plots = [plt_1, plt_2, plt_3, plt_4]
for (i, plt, plot_func) in zip(1:1:4, plots, plot_functions)
plots[i] = plot_func(arma, plt)
end
    return plot(plots[1], plots[2], plots[3], plots[4], layout = (2, 2),
                size = (800, 800))
end
54.5.2 Explanation
The call

arma = ARMA(ϕ, θ, σ)

creates an instance arma that represents the ARMA(1, 1) model

𝑋_𝑡 = 𝜙𝑋_{𝑡−1} + 𝜖_𝑡 + 𝜃𝜖_{𝑡−1}
The two numerical packages most useful for working with ARMA models are DSP.jl and
the fft routine in Julia.
As discussed above, for ARMA processes the spectral density has a simple representation that
is relatively easy to calculate.
Given this fact, the easiest way to obtain the autocovariance function is to recover it from the
spectral density via the inverse Fourier transform.
Here we use Julia’s Fourier transform routine fft, which wraps a standard C-based package
called FFTW.
A look at the fft documentation shows that the inverse transform ifft takes a given sequence
𝐴0 , 𝐴1 , … , 𝐴𝑛−1 and returns the sequence 𝑎0 , 𝑎1 , … , 𝑎𝑛−1 defined by
𝑎_𝑘 = (1/𝑛) ∑_{𝑡=0}^{𝑛−1} 𝐴_𝑡 𝑒^{𝑖𝑘2𝜋𝑡/𝑛}
Thus, if we set 𝐴𝑡 = 𝑓(𝜔𝑡 ), where 𝑓 is the spectral density and 𝜔𝑡 ∶= 2𝜋𝑡/𝑛, then
𝑎_𝑘 = (1/𝑛) ∑_{𝑡=0}^{𝑛−1} 𝑓(𝜔_𝑡)𝑒^{𝑖𝜔_𝑡𝑘} = (1/2𝜋)(2𝜋/𝑛) ∑_{𝑡=0}^{𝑛−1} 𝑓(𝜔_𝑡)𝑒^{𝑖𝜔_𝑡𝑘},  𝜔_𝑡 ∶= 2𝜋𝑡/𝑛
For 𝑛 sufficiently large, we then have

𝑎_𝑘 ≈ (1/2𝜋) ∫_0^{2𝜋} 𝑓(𝜔)𝑒^{𝑖𝜔𝑘} 𝑑𝜔 = (1/2𝜋) ∫_{−𝜋}^{𝜋} 𝑓(𝜔)𝑒^{𝑖𝜔𝑘} 𝑑𝜔

(You can check the last equality using the 2𝜋-periodicity of 𝑓.) In view of (14), it follows that 𝑎_𝑘 ≈ 𝛾(𝑘), so the autocovariances can be recovered by applying ifft to the spectral density evaluated at the Fourier frequencies.
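A small sketch of this recovery for the AR(1) density (11), assuming the FFTW package provides ifft:

using FFTW

ϕ, σ, n = 0.8, 1.0, 512
ω = 2π .* (0:n-1) ./ n
f = σ^2 ./ (1 .- 2ϕ .* cos.(ω) .+ ϕ^2)  # spectral density (11) on a grid
γ = real(ifft(f))                        # γ[k+1] ≈ γ(k)
γ[1:3]                                   # compare with ϕ^k / (1 - ϕ^2) from (4)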
Chapter 55
Estimation of Spectra
55.1 Contents
• Overview 55.2
• Periodograms 55.3
• Smoothing 55.4
• Exercises 55.5
• Solutions 55.6
55.2 Overview
55.2.1 Setup
55.3 Periodograms
Recall that the spectral density 𝑓 of a covariance stationary process with autocovariance function 𝛾 can be written

𝑓(𝜔) = 𝛾(0) + 2 ∑_{𝑘≥1} 𝛾(𝑘) cos(𝜔𝑘),  𝜔 ∈ ℝ
Now consider the problem of estimating the spectral density of a given time series, when 𝛾 is
unknown.
In particular, let 𝑋0 , … , 𝑋𝑛−1 be 𝑛 consecutive observations of a single time series that is as-
sumed to be covariance stationary.
The most common estimator of the spectral density of this process is the periodogram of
𝑋0 , … , 𝑋𝑛−1 , which is defined as
𝐼(𝜔) ∶= (1/𝑛) ∣∑_{𝑡=0}^{𝑛−1} 𝑋_𝑡 𝑒^{𝑖𝑡𝜔}∣²,  𝜔 ∈ ℝ   (1)
Equivalently, 𝐼(𝜔) can be expressed as

𝐼(𝜔) = (1/𝑛) { [∑_{𝑡=0}^{𝑛−1} 𝑋_𝑡 cos(𝜔𝑡)]² + [∑_{𝑡=0}^{𝑛−1} 𝑋_𝑡 sin(𝜔𝑡)]² }
It is straightforward to show that the function 𝐼 is even and 2𝜋-periodic (i.e., 𝐼(𝜔) = 𝐼(−𝜔)
and 𝐼(𝜔 + 2𝜋) = 𝐼(𝜔) for all 𝜔 ∈ ℝ).
From these two results, you will be able to verify that the values of 𝐼 on [0, 𝜋] determine the
values of 𝐼 on all of ℝ.
The next section helps to explain the connection between the periodogram and the spectral
density.
55.3.1 Interpretation
To interpret the periodogram, it is convenient to focus on its values at the Fourier frequencies
𝜔_𝑗 ∶= 2𝜋𝑗/𝑛,  𝑗 = 0, … , 𝑛 − 1
First observe that, for any 𝑗 ≠ 0,

∑_{𝑡=0}^{𝑛−1} 𝑒^{𝑖𝑡𝜔_𝑗} = ∑_{𝑡=0}^{𝑛−1} exp{𝑖2𝜋𝑗𝑡/𝑛} = 0
Letting 𝑋̄ denote the sample mean 𝑛^{−1} ∑_{𝑡=0}^{𝑛−1} 𝑋_𝑡, we then have

𝑛𝐼(𝜔_𝑗) = ∣∑_{𝑡=0}^{𝑛−1} (𝑋_𝑡 − 𝑋̄)𝑒^{𝑖𝑡𝜔_𝑗}∣² = ∑_{𝑡=0}^{𝑛−1} (𝑋_𝑡 − 𝑋̄)𝑒^{𝑖𝑡𝜔_𝑗} ∑_{𝑟=0}^{𝑛−1} (𝑋_𝑟 − 𝑋̄)𝑒^{−𝑖𝑟𝜔_𝑗}
Now let

𝛾̂(𝑘) ∶= (1/𝑛) ∑_{𝑡=𝑘}^{𝑛−1} (𝑋_𝑡 − 𝑋̄)(𝑋_{𝑡−𝑘} − 𝑋̄),  𝑘 = 0, 1, … , 𝑛 − 1
This is the sample autocovariance function, the natural “plug-in estimator” of the autocovari-
ance function 𝛾.
(“Plug-in estimator” is an informal term for an estimator found by replacing expectations
with sample means)
With this notation, we can now write
𝐼(𝜔_𝑗) = 𝛾̂(0) + 2 ∑_{𝑘=1}^{𝑛−1} 𝛾̂(𝑘) cos(𝜔_𝑗𝑘)
Recalling our expression for 𝑓 given above, we see that 𝐼(𝜔𝑗 ) is just a sample analog of 𝑓(𝜔𝑗 ).
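A direct sketch of the estimator just defined:

# sample autocovariance γ̂(k), following the definition above
function sample_autocov(X, k)
    n = length(X)
    Xbar = sum(X) / n
    return sum((X[t+1] - Xbar) * (X[t-k+1] - Xbar) for t in k:n-1) / n
end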
55.3.2 Calculation
The periodogram is typically computed via the discrete Fourier transform, implemented in Julia by the fft function: given a sequence 𝑎_0, … , 𝑎_{𝑛−1}, it computes

𝐴_𝑗 ∶= ∑_{𝑡=0}^{𝑛−1} 𝑎_𝑡 exp{𝑖2𝜋𝑡𝑗/𝑛},  𝑗 = 0, … , 𝑛 − 1
With 𝑎0 , … , 𝑎𝑛−1 stored in Julia array a, the function call fft(a) returns the values
𝐴0 , … , 𝐴𝑛−1 as a Julia array.
It follows that, when the data 𝑋_0, … , 𝑋_{𝑛−1} are stored in array X, the values 𝐼(𝜔_𝑗) at the Fourier frequencies, which are given by

(1/𝑛) ∣∑_{𝑡=0}^{𝑛−1} 𝑋_𝑡 exp{𝑖2𝜋𝑡𝑗/𝑛}∣²,  𝑗 = 0, … , 𝑛 − 1

can be computed by abs.(fft(X)).^2 ./ length(X).
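Putting the pieces together, a minimal periodogram sketch (assuming FFTW is loaded for fft):

using FFTW

function periodogram_naive(X)
    n = length(X)
    I_vals = abs.(fft(X)).^2 ./ n   # I(ω_j) at the Fourier frequencies
    ω_vals = 2π .* (0:n-1) ./ n     # ω_j = 2πj/n
    return ω_vals, I_vals
end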
The next command sequence generates data from the ARMA process

𝑋_𝑡 = 0.5𝑋_{𝑡−1} + 𝜖_𝑡 − 0.8𝜖_{𝑡−2}   (2)

where {𝜖_𝑡} is white noise with unit variance, and compares the periodogram to the actual spectral density
n = 40 # Data size
ϕ = 0.5 # AR parameter
θ = [0, -0.8] # MA parameter
σ = 1.0
lp = ARMA(ϕ, θ, σ)
X = simulation(lp, ts_length = n)
x, y = periodogram(X)
x_sd, y_sd = spectral_density(lp, two_pi=false, res=120)
This estimate looks rather disappointing, but the data size is only 40, so perhaps it’s not sur-
prising that the estimate is poor.
However, if we try again with n = 1200 the outcome is not much better
The periodogram is far too irregular relative to the underlying spectral density.
This brings us to our next topic.
55.4 Smoothing
Smoothing replaces each periodogram value with a weighted local average of its neighbors:

𝐼_𝑆(𝜔_𝑗) ∶= ∑_{ℓ=−𝑝}^{𝑝} 𝑤(ℓ) 𝐼(𝜔_{𝑗+ℓ})   (3)
where the weights 𝑤(−𝑝), … , 𝑤(𝑝) are a sequence of 2𝑝 + 1 nonnegative values summing to
one.
In general, larger values of 𝑝 indicate more smoothing — more on this below.
The next figure shows the kind of sequence typically used.
Note the smaller weights towards the edges and larger weights in the center, so that more dis-
tant values from 𝐼(𝜔𝑗 ) have less weight than closer ones in the sum (3)
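A sketch of building such a weight sequence and applying (3). The bell shape below is a Hann-style window chosen purely for illustration; indices wrap at the boundary:

p = 8
w = [0.5 * (1 + cos(π * ℓ / (p + 1))) for ℓ in -p:p]  # bell-shaped weights
w ./= sum(w)                                          # normalize to sum to one

smooth_I(I_vals) = [sum(w[ℓ+p+1] * I_vals[mod(j + ℓ - 1, length(I_vals)) + 1]
                        for ℓ in -p:p) for j in 1:length(I_vals)]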
Our next step is to provide code that will not only estimate the periodogram but also provide
smoothing as required.
Such functions have been written in estspec.jl and are available once you’ve installed QuantE-
con.jl.
The GitHub listing displays three functions, smooth(), periodogram(),
ar_periodogram(). We will discuss the first two here and the third one below.
The periodogram() function returns a periodogram, optionally smoothed via the
smooth() function.
Regarding the smooth() function, since smoothing adds a nontrivial amount of computa-
tion, we have applied a fairly terse array-centric method based around conv.
Readers are left either to explore or simply to use this code according to their interests.
The next three figures each show smoothed and unsmoothed periodograms, as well as the
population or “true” spectral density.
(The model is the same as before — see equation (2) — and there are 400 observations)
From top figure to bottom, the window length is varied from small to large.
In looking at the figure, we can see that for this model and data size, the window length cho-
sen in the middle figure provides the best fit.
Relative to this value, the first window length provides insufficient smoothing, while the third
gives too much smoothing.
Of course in real estimation problems the true spectral density is not visible and the choice of
appropriate smoothing will have to be made based on judgement/priors or some other theory.
In the code listing we showed three functions from the file estspec.jl.
The third function in the file (ar_periodogram()) adds a pre-processing step to peri-
odogram smoothing.
First we describe the basic idea, and after that we give the code.
1. Transform the data in order to make estimation of the spectral density more efficient.
2. Compute the periodogram associated with the transformed data.
3. Reverse the effect of the transformation on the periodogram, so that it now estimates the spectral density of the original process.
Let’s examine this idea more carefully in a particular setting — where the data are assumed
to be generated by an AR(1) process.
(More general ARMA settings can be handled using similar techniques to those described be-
low)
Suppose in particular that {𝑋_𝑡} is covariance stationary and AR(1), with

𝑋_{𝑡+1} = 𝜇 + 𝜙𝑋_𝑡 + 𝜖_{𝑡+1}   (4)

where 𝜇 and 𝜙 ∈ (−1, 1) are unknown parameters and {𝜖_𝑡} is white noise.
It follows that if we regress 𝑋𝑡+1 on 𝑋𝑡 and an intercept, the residuals will approximate white
noise.
Let
• 𝑔 be the spectral density of {𝜖𝑡 } — a constant function, as discussed above
• 𝐼0 be the periodogram estimated from the residuals — an estimate of 𝑔
• 𝑓 be the spectral density of {𝑋𝑡 } — the object we are trying to estimate
In view of an earlier result we obtained while discussing ARMA processes, 𝑓 and 𝑔 are related
by
𝑓(𝜔) = ∣1/(1 − 𝜙𝑒^{𝑖𝜔})∣² 𝑔(𝜔)   (5)
This suggests that the recoloring step, which constructs an estimate 𝐼 of 𝑓 from 𝐼0 , should set
𝐼(𝜔) = ∣1/(1 − 𝜙̂𝑒^{𝑖𝜔})∣² 𝐼_0(𝜔)

where 𝜙̂ is the estimate of 𝜙 obtained from the regression.
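A sketch of the three steps for this AR(1) setting (variable names are ours; assumes FFTW for fft):

using FFTW

function ar_periodogram_sketch(X)
    # 1. prewhiten: regress X_{t+1} on X_t and a constant, keep the residuals
    Z = [ones(length(X) - 1) X[1:end-1]]
    coefs = Z \ X[2:end]              # least squares: intercept and ϕ estimate
    ϕ_hat = coefs[2]
    e = X[2:end] - Z * coefs          # approximately white noise
    # 2. periodogram of the residuals
    n = length(e)
    I0 = abs.(fft(e)).^2 ./ n
    ω = 2π .* (0:n-1) ./ n
    # 3. recolor using the estimated AR(1) transfer function
    I_rec = I0 ./ abs.(1 .- ϕ_hat .* exp.(im .* ω)).^2
    return ω, I_rec
end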
The periodograms are calculated from time series drawn from (4) with 𝜇 = 0 and 𝜙 = −0.9.
Each time series is of length 150.
The difference between the three subfigures is just randomness — each one uses a different draw of the time series.
In all cases, periodograms are fit with the “hamming” window and window length of 65.
Overall, the fit of the AR smoothed periodogram is much better, in the sense of being closer
to the true spectral density.
55.5 Exercises
55.5.1 Exercise 1
55.5.2 Exercise 2
55.6 Solutions
55.6.1 Exercise 1
In [5]: n = 400
ϕ = 0.5
θ = [0, -0.8]
σ = 1.0
lp = ARMA(ϕ, θ, 1.0)
X = simulation(lp, ts_length = n)
xs = []
x_sds = []
x_sms = []
ys = []
y_sds = []
y_sms = []
titles = []
55.6.2 Exercise 2
for i in 1:3
X = simulation(lp2,ts_length=150)
plot!(p[i],xlims = (0,pi))
Chapter 56
Additive Functionals
56.1 Contents
• Overview 56.2
• A Particular Additive Functional 56.3
• Dynamics 56.4
• Code 56.5
Co-authored with Chase Coleman and Balint Szoke
56.2 Overview
This lecture focuses on a particular type of additive functional: a scalar process {𝑦_𝑡}_{𝑡=0}^∞ whose increments are driven by a Gaussian vector autoregression.
It is simple to construct, simulate, and analyze.
It is simple to construct, simulate, and analyze.
This additive functional consists of two components, the first of which is a first-order vec-
tor autoregression (VAR)

𝑥_{𝑡+1} = 𝐴𝑥_𝑡 + 𝐵𝑧_{𝑡+1}   (1)

Here
• 𝑥𝑡 is an 𝑛 × 1 vector,
• 𝐴 is an 𝑛 × 𝑛 stable matrix (all eigenvalues lie within the open unit circle),
• 𝑧𝑡+1 ∼ 𝑁 (0, 𝐼) is an 𝑚 × 1 i.i.d. shock,
• 𝐵 is an 𝑛 × 𝑚 matrix, and
• 𝑥0 ∼ 𝑁 (𝜇0 , Σ0 ) is a random initial condition for 𝑥
The second component is an equation that expresses increments of {𝑦_𝑡}_{𝑡=0}^∞ as linear functions of
• a scalar constant 𝜈,
• the vector 𝑥𝑡 , and
• the same Gaussian vector 𝑧𝑡+1 that appears in the VAR (1)
In particular,

𝑦_{𝑡+1} − 𝑦_𝑡 = 𝜈 + 𝐷𝑥_𝑡 + 𝐹𝑧_{𝑡+1}   (2)
One way to represent the overall dynamics is to use a linear state space system.
To do this, we set up state and observation vectors
𝑥̂_𝑡 = [1; 𝑥_𝑡; 𝑦_𝑡]  and  𝑦̂_𝑡 = [𝑥_𝑡; 𝑦_𝑡]
[1; 𝑥_{𝑡+1}; 𝑦_{𝑡+1}] = [1 0 0; 0 𝐴 0; 𝜈 𝐷′ 1] [1; 𝑥_𝑡; 𝑦_𝑡] + [0; 𝐵; 𝐹′] 𝑧_{𝑡+1}

[𝑥_𝑡; 𝑦_𝑡] = [0 𝐼 0; 0 0 1] [1; 𝑥_𝑡; 𝑦_𝑡]
𝑥̂_{𝑡+1} = 𝐴̂𝑥̂_𝑡 + 𝐵̂𝑧_{𝑡+1}
𝑦̂_𝑡 = 𝐷̂𝑥̂_𝑡
56.4 Dynamics
To illustrate, we consider a scalar process 𝑥̃_𝑡 following the AR(4) law of motion

𝑥̃_{𝑡+1} = 𝜙_1𝑥̃_𝑡 + 𝜙_2𝑥̃_{𝑡−1} + 𝜙_3𝑥̃_{𝑡−2} + 𝜙_4𝑥̃_{𝑡−3} + 𝜎𝑧_{𝑡+1}   (3)
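Before wrapping everything in the linear state space framework, here is a direct simulation sketch of (3):

function simulate_ar4(ϕ, σ, T)
    x = zeros(T)
    for t in 4:T-1
        x[t+1] = ϕ[1] * x[t] + ϕ[2] * x[t-1] + ϕ[3] * x[t-2] +
                 ϕ[4] * x[t-3] + σ * randn()
    end
    return x
end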
56.4.1 Simulation
56.4.2 Setup
if B isa AbstractVector
B = reshape(B, length(B), 1)
end
# unpack required elements
nx, nk = size(B)
# set F
if isnothing(F)
F = zeros(nk, 1)
elseif ndims(F) == 1
F = reshape(F, length(F), 1)
end
# set ν
if isnothing(ν)
ν = zeros(nm, 1)
elseif ndims(ν) == 1
ν = reshape(ν, length(ν), 1)
else
throw(ArgumentError("ν must be column vector!"))
end
if size(ν, 1) != size(D, 1)
error("The size of ν is inconsistent with D!")
end
AMF_LSS_VAR(A, B, D) =
AMF_LSS_VAR(A, B, D, nothing, ν=nothing)
AMF_LSS_VAR(A, B, D, F, ν) =
AMF_LSS_VAR(A, B, D, [F], ν=[ν])
function construct_ss(A, B, D, F,
H, g = additive_decomp(A, B, D, F, nx)
# auxiliary blocks with 0's and 1's to fill out the lss matrices
nx0c = zeros(nx, 1)
nx0r = zeros(1, nx)
nx1 = ones(1, nx)
nk0 = zeros(1, nk)
ny0c = zeros(nm, 1)
ny0r = zeros(1, nm)
ny1m = I + zeros(nm, nm)
ny0m = zeros(nm, nm)
nyx0m = similar(D)
return lss
end
return H, g
end
return H, g, ν_tilde
end
function loglikelihood_path(amf, x, y)
@unpack A, B, D, F = amf
k, T = size(y)
FF = F * F'
FFinv = inv(FF)
temp = y[:, 2:end]-y[:, 1:end-1] - D*x[:, 1:end-1]
obs = temp .* FFinv .* temp
obssum = cumsum(obs)
scalar = (logdet(FF) + k * log(2π)) * (1:T)
function loglikelihood(amf, x, y)
llh = loglikelihood_path(amf, x, y)
return llh[end]
end
add_figs = []
for ii in 0:nm-1
li, ui = npaths*(ii), npaths*(ii + 1)
LI, UI = 2ii, 2(ii + 1)
        push!(add_figs,
              plot_given_paths(T, ypath[li+1:ui, :], mpath[li+1:ui, :],
                               spath[li+1:ui, :], tpath[li+1:ui, :],
                               mbounds[LI+1:UI, :], sbounds[LI+1:UI, :],
                               show_trend = show_trend))
end
return add_figs
end
sbounds_mult = zeros(2nm, T)
tpath_mult = zeros(nm * npaths, T)
ypath_mult = zeros(nm * npaths, T)
end
mult_figs = []
for ii in 0:nm-1
li, ui = npaths * ii, npaths * (ii + 1)
LI, UI = 2ii, 2(ii + 1)
        push!(mult_figs,
              plot_given_paths(T, ypath_mult[li+1:ui, :], mpath_mult[li+1:ui, :],
                               spath_mult[li+1:ui, :], tpath_mult[li+1:ui, :],
                               mbounds_mult[LI+1:UI, :], sbounds_mult[LI+1:UI, :],
                               horline = 1.0, show_trend = show_trend))
end
return mult_figs
end
mart_figs = []
for ii in 0:nm-1
li, ui = npaths*(ii), npaths*(ii + 1)
LI, UI = 2ii, 2(ii + 1)
        push!(mart_figs,
              plot_martingale_paths(T, mpath_mult[li+1:ui, :],
                                    mbounds_mult[LI+1:UI, :], horline = 1))
        plot!(mart_figs[ii + 1],
              title = "Martingale components for many paths of y_$(ii + 1)")
end
return mart_figs
end
# allocate space
trange = 1:T
    # allocate transpose for plotting
    mpathT = Matrix(mpath')
# create figure
plots=plot(layout = (2, 2), size = (800, 800))
label = "")
plot!(plots[1], title = "One Path of All Variables", legend=:topleft)
    lb = mbounds[1, :]
    ub = mbounds[2, :]
    plot!(plots[2], ub, fillrange = [lb, ub], alpha = 0.25, color = :magenta,
          label = "")
    plot!(plots[2], seriestype = :hline, [horline], color = :black,
          linestyle = :dash, label = "")
    plot!(plots[2], title = "Martingale Components for Many Paths")
label = "")
plot!(plots[3], title = "Stationary Components for Many Paths")
label = "")
plot!(plots[4], title = "Trend Components for Many Paths")
return plots
end
"")
plot!(plt, trange, Matrix(mpath'), linewidth=0.25, color = :black,�
↪label = "")
return plt
end
For now, we just plot 𝑦𝑡 and 𝑥𝑡 , postponing until later a description of exactly how we com-
pute them.
# A matrix should be n x n
A = [ϕ_1 ϕ_2 ϕ_3 ϕ_4;
1 0 0 0;
0 1 0 0;
0 0 1 0]
# B matrix should be n x k
B = [σ, 0, 0, 0]
D = [1 0 0 0] * A
F = [1, 0, 0, 0] ⋅ vec(B)
amf = AMF_LSS_VAR(A, B, D, F, ν)
T = 150
x, y = simulate(amf.lss, T)
label = "")
plot!(plots[2], title = "Associated path of x_t")
plot(plots)
56.4.3 Decomposition
Hansen and Sargent [46] describe how to construct a decomposition of an additive functional
into four parts:
• a constant inherited from initial values 𝑥0 and 𝑦0
• a linear trend
• a martingale
• an (asymptotically) stationary component
To attain this decomposition for the particular class of additive functionals defined by (1) and
(2), we first construct the matrices
𝐻 ∶= 𝐹 + 𝐵′(𝐼 − 𝐴′)^{−1}𝐷
𝑔 ∶= 𝐷′(𝐼 − 𝐴)^{−1}
Then the decomposition is

𝑦_𝑡 = 𝑡𝜈 + ∑_{𝑗=1}^{𝑡} 𝐻𝑧_𝑗 − 𝑔𝑥_𝑡 + 𝑔𝑥_0 + 𝑦_0

where 𝑡𝜈 is the trend component, ∑_{𝑗=1}^{𝑡} 𝐻𝑧_𝑗 the martingale component, −𝑔𝑥_𝑡 the stationary component, and 𝑔𝑥_0 + 𝑦_0 collects the initial conditions.
At this stage you should pause and verify that 𝑦𝑡+1 − 𝑦𝑡 satisfies (2).
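A quick numerical check of the decomposition for the scalar case (a sketch; the parameter values are arbitrary):

function check_decomposition(A, B, D, F, ν; T = 25)
    H = F + B * D / (1 - A)     # scalar analog of H = F + B′(I − A′)⁻¹D
    g = D / (1 - A)             # scalar analog of g = D′(I − A)⁻¹
    x = y = m = 0.0             # x₀ = y₀ = 0
    for _ in 1:T
        z = randn()
        y += ν + D * x + F * z  # the increment (2)
        m += H * z              # martingale increment Hz
        x = A * x + B * z       # the VAR (1)
    end
    y ≈ ν * T + m - g * x       # trend + martingale + stationary terms
end

check_decomposition(0.8, 1.0, 0.5, 0.2, 0.01)  # returns true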
It is convenient for us to introduce the following notation:
• 𝜏_𝑡 = 𝜈𝑡, a linear, deterministic trend
• 𝑚_𝑡 = ∑_{𝑗=1}^{𝑡} 𝐻𝑧_𝑗, a martingale with time 𝑡 + 1 increment 𝐻𝑧_{𝑡+1}
• 𝑠_𝑡 = 𝑔𝑥_𝑡, an (asymptotically) stationary component

We can track all of these components with a linear state space system. Observe that
[1; 𝑡 + 1; 𝑥_{𝑡+1}; 𝑦_{𝑡+1}; 𝑚_{𝑡+1}] = [1 0 0 0 0; 1 1 0 0 0; 0 0 𝐴 0 0; 𝜈 0 𝐷′ 1 0; 0 0 0 0 1] [1; 𝑡; 𝑥_𝑡; 𝑦_𝑡; 𝑚_𝑡] + [0; 0; 𝐵; 𝐹′; 𝐻′] 𝑧_{𝑡+1}
and
[𝑥_𝑡; 𝑦_𝑡; 𝜏_𝑡; 𝑚_𝑡; 𝑠_𝑡] = [0 0 𝐼 0 0; 0 0 0 1 0; 0 𝜈 0 0 0; 0 0 0 0 1; 0 0 −𝑔 0 0] [1; 𝑡; 𝑥_𝑡; 𝑦_𝑡; 𝑚_𝑡]
With

𝑥̃ ∶= [1; 𝑡; 𝑥_𝑡; 𝑦_𝑡; 𝑚_𝑡]  and  𝑦̃ ∶= [𝑥_𝑡; 𝑦_𝑡; 𝜏_𝑡; 𝑚_𝑡; 𝑠_𝑡]

the overall dynamics can be written as

𝑥̃_{𝑡+1} = 𝐴̃𝑥̃_𝑡 + 𝐵̃𝑧_{𝑡+1}
𝑦̃_𝑡 = 𝐷̃𝑥̃_𝑡
56.5 Code
The type AMF_LSS_VAR mentioned above does all that we want to study our additive
functional.
In fact AMF_LSS_VAR does more, as we shall explain below.
(A hint that it does more is the name of the type – here AMF stands for “additive and multi-
plicative functional” – the code will do things for multiplicative functionals too)
Let’s use this code (embedded above) to explore the example process described above.
If you run the code that first simulated that example again and then the method call you will
generate (modulo randomness) the plot
When we plot multiple realizations of a component in the 2nd, 3rd, and 4th panels, we also
plot population 95% probability coverage sets computed using the LSS type.
We have chosen to simulate many paths, all starting from the same nonrandom initial condi-
tions 𝑥0 , 𝑦0 (you can tell this from the shape of the 95% probability coverage shaded areas).
Notice tell-tale signs of these probability coverage shaded areas
• the purple one for the martingale component 𝑚_𝑡 grows with √𝑡
• the green one for the stationary component 𝑠_𝑡 converges to a constant band
Corresponding to the additive decomposition described above we have the multiplicative de-
composition of the 𝑀𝑡
𝑀_𝑡/𝑀_0 = exp(𝑡𝜈) exp(∑_{𝑗=1}^{𝑡} 𝐻 ⋅ 𝑍_𝑗) exp(𝐷′(𝐼 − 𝐴)^{−1}𝑥_0 − 𝐷′(𝐼 − 𝐴)^{−1}𝑥_𝑡)
or
𝑀_𝑡/𝑀_0 = exp(𝜈̃𝑡) (𝑀̃_𝑡/𝑀̃_0) (𝑒̃(𝑥_0)/𝑒̃(𝑥_𝑡))
where
𝜈̃ = 𝜈 + 𝐻 ⋅ 𝐻/2,  𝑀̃_𝑡 = exp(∑_{𝑗=1}^{𝑡} (𝐻 ⋅ 𝑧_𝑗 − 𝐻 ⋅ 𝐻/2)),  𝑀̃_0 = 1
and
𝑒̃(𝑥) ∶= exp[𝑔(𝑥)] = exp[𝐷′(𝐼 − 𝐴)^{−1}𝑥]
As before, when we plotted multiple realizations of a component in the 2nd, 3rd, and 4th
panels, we also plotted population 95% confidence bands computed using the LSS type.
Comparing this figure and the last also helps show how geometric growth differs from arith-
metic growth.
Hansen and Sargent [46] (ch. 6) note that the martingale component 𝑀̃_𝑡 of the multiplicative decomposition has a peculiar property.
• While 𝐸_0𝑀̃_𝑡 = 1 for all 𝑡 ≥ 0, nevertheless …
• As 𝑡 → +∞, 𝑀̃_𝑡 converges to zero almost surely.
The following simulation of many paths of 𝑀̃_𝑡 illustrates this property
Out[8]:
Chapter 57
Multiplicative Functionals
57.1 Contents
• Overview 57.2
• A Log-Likelihood Process 57.3
• Benefits from Reduced Aggregate Fluctuations 57.4
Co-authored with Chase Coleman and Balint Szoke
57.2 Overview
This lecture uses the special class of multiplicative functionals studied in the previous lecture to create and analyze two examples
• A log likelihood process, an object at the foundation of both frequentist and
Bayesian approaches to statistical inference.
• A version of Robert E. Lucas’s [70] and Thomas Tallarini’s [103] approaches to measur-
ing the benefits of moderating aggregate fluctuations.
Evidently,
𝑥𝑡+1 = (𝐴 − 𝐵𝐹 −1 𝐷) 𝑥𝑡 + 𝐵𝐹 −1 (𝑦𝑡+1 − 𝑦𝑡 ) ,
The distribution of 𝑦𝑡+1 − 𝑦𝑡 conditional on 𝑥𝑡 is normal with mean 𝐷𝑥𝑡 and nonsingular co-
variance matrix 𝐹 𝐹 ′ .
Let 𝜃 denote the vector of free parameters of the model.
These parameters pin down the elements of 𝐴, 𝐵, 𝐷, 𝐹 .
The log likelihood function of $\{y_s\}_{s=1}^t$ is

$$
\log L_t(\theta) = -\frac{1}{2} \sum_{j=1}^t (y_j - y_{j-1} - D x_{j-1})' (FF')^{-1} (y_j - y_{j-1} - D x_{j-1})
- \frac{t}{2} \log \det(FF') - \frac{kt}{2} \log(2\pi)
$$
Let’s consider the case of a scalar process in which 𝐴, 𝐵, 𝐷, 𝐹 are scalars and 𝑧𝑡+1 is a scalar
stochastic process.
We let 𝜃𝑜 denote the “true” values of 𝜃, meaning the values that generate the data.
For the purposes of this exercise, set 𝜃𝑜 = (𝐴, 𝐵, 𝐷, 𝐹 ) = (0.8, 1, 0.5, 0.2).
Set 𝑥0 = 𝑦0 = 0.
We’ll do this by formulating the additive functional as a linear state space model and putting
the LSS struct to work.
57.3.2 Setup
function construct_ss(A, B, D, F, ν)
    H, g = additive_decomp(A, B, D, F)

function additive_decomp(A, B, D, F)
    A_res = 1 / (1 - A)
    g = D * A_res
    H = F + D * A_res * B
    return H, g
end

function multiplicative_decomp(A, B, D, F, ν)
    H, g = additive_decomp(A, B, D, F)
    ν_tilde = ν + 0.5 * H^2
    return ν_tilde, H, g
end

function loglikelihood_path(amf, x, y)
    @unpack A, B, D, F = amf
    T = length(y)
    FF = F^2
    FFinv = inv(FF)
    temp = y[2:end] - y[1:end-1] - D*x[1:end-1]
    obs = temp .* FFinv .* temp
    obssum = cumsum(obs)
    scalar = (log(FF) + log(2pi)) * (1:T-1)
    return -0.5 * (obssum + scalar)
end

function loglikelihood(amf, x, y)
    llh = loglikelihood_path(amf, x, y)
    return llh[end]
end
for i in 1:I
    # Do specific simulation
    x, y = simulate_xy(amf, T)
Now that we have these functions in our toolkit, let's apply them to run some simulations.
In particular, let's use our program to generate $I = 5000$ sample paths of length $T = 150$, labeled $\{x_t^i, y_t^i\}_{t=0}^T$ for $i = 1, \dots, I$.
Here goes
In [5]: F = 0.2
amf = AMF_LSS_VAR(A = 0.8, B = 1.0, D = 0.5, F = F)
T = 150
I = 5000
Out[5]:
We want as inputs to this program the same sample paths $\{x_t^i, y_t^i\}_{t=0}^T$ that we have already computed.
We now want to simulate $I = 5000$ paths of $\{\log L_t^i \mid \theta_o\}_{t=1}^T$.

• For each path, we compute $\log L_T^i / T$.
• We also compute $\frac{1}{I} \sum_{i=1}^I \log L_T^i / T$.

Then we compare these objects.
Below we plot the histogram of $\log L_T^i / T$ for realizations $i = 1, \dots, 5000$
# Allocate space
LLit = zeros(I, T-1)

for i in 1:I
    LLit[i, :] = loglikelihood_path(amf, Xit[i, :], Yit[i, :])
end

return LLit
end
Out[6]:
Notice that the log likelihood is almost always nonnegative, implying that $L_t$ is typically bigger than 1.
Recall that the likelihood function is a pdf (probability density function) and not a probability measure, so it can take values larger than 1.
In the current case, the conditional variance of $\Delta y_{t+1}$, which equals $FF' = 0.04$, is so small that the maximum value of the pdf is 2 (see the figure below).
This implies that approximately 75% of the time (a bit more than one sigma deviation), we
should expect the increment of the log likelihood to be nonnegative.
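As a quick check of the claim about the peak of the density (an illustrative sketch, not part of the lecture code):

using Distributions

# Δy_{t+1} | x_t is normal with standard deviation F = 0.2, so its
# density peaks at 1 / (0.2 * sqrt(2π)) ≈ 2
pdf(Normal(0.0, 0.2), 0.0)   # ≈ 1.9947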
Let’s see this in a simulation
Out[8]:
Now consider alternative parameter vector 𝜃1 = [𝐴, 𝐵, 𝐷, 𝐹 ] = [0.9, 1.0, 0.55, 0.25].
We want to compute {log 𝐿𝑡 ∣ 𝜃1 }𝑇𝑡=1 .
The $x_t$, $y_t$ inputs to this program should be exactly the same sample paths $\{x_t^i, y_t^i\}_{t=0}^T$ that we computed above.
This is because we want to generate data under the $\theta_o$ probability model but evaluate the likelihood under the $\theta_1$ model.
So our task is to use our program to simulate $I = 5000$ paths of $\{\log L_t^i \mid \theta_1\}_{t=1}^T$.

• For each path, compute $\frac{1}{T} \log L_T^i$.
• Then compute $\frac{1}{I} \sum_{i=1}^I \frac{1}{T} \log L_T^i$.
We want to compare these objects with each other and with the analogous objects that we
computed above.
Then we want to interpret outcomes.
A function that we constructed can handle these tasks.
The only innovation is that we must create an alternative model to feed in.
We will creatively call the new model amf2.
We make three graphs
• the first sets the stage by repeating an earlier graph
• the second contains two histograms of values of log likelihoods of the two models over
the period 𝑇
• the third compares likelihoods under the true and alternative models
Here’s the code
θ_0
Out[9]:
Let’s see a histogram of the log-likelihoods under the true and the alternative model (same
sample paths)
In [10]: plot(seriestype = :histogram, LLT, bin = 50, alpha = 0.5, label = "True",
              normed = true)
         plot!(seriestype = :histogram, LLT2, bin = 50, alpha = 0.5, label = "Alternative",
              normed = true)
         vline!([mean(LLT)], color = :black, lw = 2, linestyle = :dash, label = "")
         vline!([mean(LLT2)], color = :black, lw = 2, linestyle = :dash, label = "")
Out[10]:
Now we’ll plot the histogram of the difference in log likelihood ratio
Out[11]:
57.3.5 Interpretation
These histograms of log likelihood ratios illustrate important features of likelihood ratio tests as tools for discriminating between statistical models.

• The log likelihood is higher on average under the true model – obviously a very useful property.
• Nevertheless, for a positive fraction of realizations, the log likelihood is higher for the incorrect than for the true model.
  – In these instances, a likelihood ratio test mistakenly selects the wrong model.
where

Here $\{z_{t+1}\}_{t=0}^\infty$ is an i.i.d. sequence of $N(0, I)$ random vectors.

where

$$
U = \exp(-\delta)\left[I - \exp(-\delta) A'\right]^{-1} D
$$

and

$$
u = \frac{\exp(-\delta)}{1 - \exp(-\delta)}\, \nu + \frac{(1-\gamma)}{2}\, \frac{\exp(-\delta)}{1 - \exp(-\delta)} \left| D'\left[I - \exp(-\delta) A\right]^{-1} B + F \right|^2,
$$
$$
\frac{c_t}{c_0} = \exp(\tilde{\nu} t) \left( \frac{\tilde{M}_t}{\tilde{M}_0} \right) \left( \frac{\tilde{e}(x_0)}{\tilde{e}(x_t)} \right) \tag{2}
$$

where $\bigl(\frac{\tilde{M}_t}{\tilde{M}_0}\bigr)$ is a likelihood ratio process and $\tilde{M}_0 = 1$.
At this point, as an exercise, we ask the reader please to verify the following formulas for $\tilde{\nu}$ and $\tilde{e}(x_t)$ as functions of $A, B, D, F$:

$$
\tilde{\nu} = \nu + \frac{H \cdot H}{2}
$$

and

$$
\tilde{e}(x) = \exp[g(x)] = \exp[D'(I - A)^{-1} x]
$$
In particular, we want to simulate 5000 sample paths of length 𝑇 = 1000 for the case in which
𝑥 is a scalar and [𝐴, 𝐵, 𝐷, 𝐹 ] = [0.8, 0.001, 1.0, 0.01] and 𝜈 = 0.005.
After accomplishing this, we want to display a histogram of 𝑀̃ 𝑇𝑖 for 𝑇 = 1000.
Here is code that accomplishes these tasks
# Allocate space
add_mart_comp = zeros(I, T)
# Build model
amf_2 = AMF_LSS_VAR(A = 0.8, B = 0.001, D = 1.0, F = 0.01, ν = 0.005)
Comments
• The preceding min, mean, and max of the cross-section of the date 𝑇 realizations of the
multiplicative martingale component of 𝑐𝑡 indicate that the sample mean is close to its
population mean of 1.
– This outcome prevails for all values of the horizon 𝑇 .
• The cross-section distribution of the multiplicative martingale component of 𝑐 at date 𝑇
approximates a log normal distribution well.
In [13]: plot(seriestype = :histogram, amcT, bin = 25, normed = true, label = "")
plot!(title = "Histogram of Additive Martingale Component")
Out[13]:
In [14]: plot(seriestype = :histogram, mmcT, bin = 25, normed = true, label = "")
plot!(title = "Histogram of Multiplicative Martingale Component")
Out[14]:
The likelihood ratio process $\{\tilde{M}_t\}_{t=0}^\infty$ can be represented as

$$
\tilde{M}_t = \exp\left(\sum_{j=1}^t \left(H \cdot z_j - \frac{H \cdot H}{2}\right)\right), \qquad \tilde{M}_0 = 1,
$$
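A minimal sketch of simulating one path of $\tilde{M}_t$ directly from this representation (the values of H and T below are hypothetical):

using Random

H = 0.5                                # hypothetical H
T = 1000
z = randn(T)                           # i.i.d. N(0, 1) increments
M̃ = exp.(cumsum(H .* z .- H^2 / 2))    # M̃_t for t = 1, ..., T, with M̃_0 = 1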
    # The distribution
    mdist = LogNormal(-t * H2 / 2, sqrt(t * H2))
    x = range(xmin, xmax, length = npts)
    p = pdf.(mdist, x)

    return x, p
end
    # The distribution
    lmdist = Normal(-t * H2 / 2, sqrt(t * H2))
    x = range(xmin, xmax, length = npts)
    p = pdf.(lmdist, x)

    return x, p
end
times_to_plot]
Out[15]:
These probability density functions illustrate a peculiar property of log likelihood ratio
processes:
• With respect to the true model probabilities, they have mathematical expectations
equal to 1 for all 𝑡 ≥ 0.
• They almost surely converge to zero.
Suppose in the tradition of a strand of macroeconomics (for example Tallarini [103], [70])
we want to estimate the welfare benefits from removing random fluctuations around trend
growth.
We shall compute how much initial consumption 𝑐0 a representative consumer who ranks
consumption streams according to (1) would be willing to sacrifice to enjoy the consumption
stream
$$
\frac{c_t}{c_0} = \exp(\tilde{\nu} t)
$$
resolv = 1 / (1 - exp(-δ) * A)
vect = F + D * resolv * B
U_det = 0
u_det = exp(-δ) / (1 - exp(-δ)) * ν_tilde
# Get coeffs
U_r, u_r, U_d, u_d = Uu(amf_2, δ, γ)
We look for the ratio $\frac{c_0^r - c_0^d}{c_0^r}$ that makes $\log V_0^r - \log V_0^d = 0$
$$
\underbrace{\log V_0^r - \log V_0^d}_{=0} + \log c_0^d - \log c_0^r = (U^r - U^d) x_0 + u^r - u^d
$$

$$
\frac{c_0^d}{c_0^r} = \exp\bigl((U^r - U^d) x_0 + u^r - u^d\bigr)
$$
Hence, the implied percentage reduction in 𝑐0 that the representative consumer would accept
is given by
$$
\frac{c_0^r - c_0^d}{c_0^r} = 1 - \exp\bigl((U^r - U^d) x_0 + u^r - u^d\bigr)
$$
Out[17]: 1.0809878812017448
We find that the consumer would be willing to take a percentage reduction of initial consumption equal to around 1.081.
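Given the coefficients U_r, u_r, U_d, u_d returned by the Uu calls above, this number follows from the formula in one line; a sketch (x0 denotes the assumed initial state):

# sketch: percentage reduction in c_0 implied by the formula above
percent_reduction = 100 * (1 - exp((U_r - U_d) * x0 + u_r - u_d))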
Chapter 58

Classical Control with Linear Algebra
58.1 Contents
• Overview 58.2
• A Control Problem 58.3
• Finite Horizon Theory 58.4
• The Infinite Horizon Limit 58.5
• Undiscounted Problems 58.6
• Implementation 58.7
• Exercises 58.8
58.2 Overview
In an earlier lecture Linear Quadratic Dynamic Programming Problems we have studied how
to solve a special class of dynamic optimization and prediction problems by applying the
method of dynamic programming. In this class of problems
In this lecture and the sequel Classical Filtering with Linear Algebra, we mostly rely on ele-
mentary linear algebra.
The main tool from linear algebra we’ll put to work here is LU decomposition.
We’ll begin with finite horizon problems.
Then we’ll view infinite horizon problems as appropriate limits of these finite horizon prob-
lems.
Later, we will examine the close connection between LQ control and least squares prediction
and filtering problems.
These classes of problems are connected in the sense that to solve each, essentially the same
mathematics is used.
58.2.1 References
58.2.2 Setup
Let 𝐿 be the lag operator, so that, for sequence {𝑥𝑡 } we have 𝐿𝑥𝑡 = 𝑥𝑡−1 .
More generally, let 𝐿𝑘 𝑥𝑡 = 𝑥𝑡−𝑘 with 𝐿0 𝑥𝑡 = 𝑥𝑡 and
𝑑(𝐿) = 𝑑0 + 𝑑1 𝐿 + … + 𝑑𝑚 𝐿𝑚
$$
\max_{\{y_t\}} \lim_{N \to \infty} \sum_{t=0}^N \beta^t \left\{ a_t y_t - \frac{1}{2} h y_t^2 - \frac{1}{2} [d(L) y_t]^2 \right\}, \tag{1}
$$
where
• $h$ is a positive parameter and $\beta \in (0, 1)$ is a discount factor
• $\{a_t\}_{t \geq 0}$ is a sequence of exponential order less than $\beta^{-1/2}$, by which we mean $\lim_{t \to \infty} \beta^{\frac{t}{2}} a_t = 0$
Maximization in (1) is subject to initial conditions for 𝑦−1 , 𝑦−2 … , 𝑦−𝑚 .
Maximization is over infinite sequences {𝑦𝑡 }𝑡≥0 .
58.3.1 Example
The formulation of the LQ problem given above is broad enough to encompass many useful
models.
As a simple illustration, recall that in lqcontrol we consider a monopolist facing stochastic
demand shocks and adjustment costs.
Let’s consider a deterministic version of this problem, where the monopolist maximizes the
discounted sum
$$
\sum_{t=0}^\infty \beta^t \pi_t
$$
and
1. fixing 𝑁 > 𝑚,
$$
J = \sum_{t=0}^N \beta^t [d(L) y_t][d(L) y_t]
= \sum_{t=0}^N \beta^t \left(d_0 y_t + d_1 y_{t-1} + \dots + d_m y_{t-m}\right) \left(d_0 y_t + d_1 y_{t-1} + \dots + d_m y_{t-m}\right)
$$
$$
\begin{aligned}
\frac{\partial J}{\partial y_t} &= 2\beta^t d_0\, d(L) y_t + 2\beta^{t+1} d_1\, d(L) y_{t+1} + \dots + 2\beta^{t+m} d_m\, d(L) y_{t+m} \\
&= 2\beta^t \bigl(d_0 + d_1 \beta L^{-1} + d_2 \beta^2 L^{-2} + \dots + d_m \beta^m L^{-m}\bigr)\, d(L) y_t
\end{aligned}
$$

or

$$
\frac{\partial J}{\partial y_t} = 2\beta^t\, d(\beta L^{-1})\, d(L) y_t \tag{2}
$$
$$
\begin{aligned}
\frac{\partial J}{\partial y_N} &= 2\beta^N d_0\, d(L) y_N \\
\frac{\partial J}{\partial y_{N-1}} &= 2\beta^{N-1} \bigl[d_0 + \beta d_1 L^{-1}\bigr]\, d(L) y_{N-1} \\
&\ \ \vdots \\
\frac{\partial J}{\partial y_{N-m+1}} &= 2\beta^{N-m+1} \bigl[d_0 + \beta L^{-1} d_1 + \dots + \beta^{m-1} L^{-m+1} d_{m-1}\bigr]\, d(L) y_{N-m+1}
\end{aligned} \tag{3}
$$
With these preliminaries under our belts, we are ready to differentiate (1).
Differentiating (1) with respect to 𝑦𝑡 for 𝑡 = 0, … , 𝑁 − 𝑚 gives the Euler equations
The system of equations (4) forms a $2 \times m$ order linear difference equation that must hold for the values of $t$ indicated.
Differentiating (1) with respect to 𝑦𝑡 for 𝑡 = 𝑁 − 𝑚 + 1, … , 𝑁 gives the terminal conditions
That is, for the finite 𝑁 problem, conditions (4) and (5) are necessary and sufficient for a
maximum, by concavity of the objective function.
Next we describe how to obtain the solution using matrix methods.
Let’s look at how linear algebra can be used to tackle and shed light on the finite horizon LQ
control problem.
$$
\begin{aligned}
[h + d(\beta L^{-1})\, d(L)]\, y_t &= a_t, \quad t = 0, 1, \dots, N-1 \\
\beta^N [a_N - h y_N - d_0\, d(L) y_N] &= 0
\end{aligned} \tag{6}
$$
where 𝑑(𝐿) = 𝑑0 + 𝑑1 𝐿.
These equations are to be solved for 𝑦0 , 𝑦1 , … , 𝑦𝑁 as functions of 𝑎0 , 𝑎1 , … , 𝑎𝑁 and 𝑦−1 .
Let
$$
\begin{bmatrix}
(\phi_0 - d_1^2) & \phi_1 & 0 & 0 & \dots & \dots & 0 \\
\beta\phi_1 & \phi_0 & \phi_1 & 0 & \dots & \dots & 0 \\
0 & \beta\phi_1 & \phi_0 & \phi_1 & \dots & \dots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & \dots & \dots & \dots & \beta\phi_1 & \phi_0 & \phi_1 \\
0 & \dots & \dots & \dots & 0 & \beta\phi_1 & \phi_0
\end{bmatrix}
\begin{bmatrix} y_N \\ y_{N-1} \\ y_{N-2} \\ \vdots \\ y_1 \\ y_0 \end{bmatrix}
=
\begin{bmatrix} a_N \\ a_{N-1} \\ a_{N-2} \\ \vdots \\ a_1 \\ a_0 - \phi_1 y_{-1} \end{bmatrix} \tag{7}
$$
or
$$
W \bar{y} = \bar{a} \tag{8}
$$
1. The first element differs from the remaining diagonal elements, reflecting the terminal
condition.
$$
\bar{y} = W^{-1} \bar{a} \tag{9}
$$
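As a toy sketch of (7)–(9), with all parameter values and the forcing sequence made up for illustration:

using LinearAlgebra

β, ϕ0, ϕ1, d1 = 0.95, 2.0, -0.5, 0.4     # hypothetical values
N = 5
W = diagm(0 => fill(ϕ0, N + 1),
          1 => fill(ϕ1, N),
          -1 => fill(β * ϕ1, N))
W[1, 1] -= d1^2                          # terminal-condition adjustment in (7)
ā = ones(N + 1)                          # hypothetical forcing sequence
ȳ = W \ ā                                # (9)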
An Alternative Representation
$$
U \bar{y} = L^{-1} \bar{a} \tag{10}
$$
$$
\begin{bmatrix}
1 & U_{12} & 0 & 0 & \dots & 0 & 0 \\
0 & 1 & U_{23} & 0 & \dots & 0 & 0 \\
0 & 0 & 1 & U_{34} & \dots & 0 & 0 \\
0 & 0 & 0 & 1 & \dots & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \dots & 1 & U_{N,N+1} \\
0 & 0 & 0 & 0 & \dots & 0 & 1
\end{bmatrix}
\begin{bmatrix} y_N \\ y_{N-1} \\ y_{N-2} \\ y_{N-3} \\ \vdots \\ y_1 \\ y_0 \end{bmatrix}
=
\begin{bmatrix}
L^{-1}_{11} & 0 & 0 & \dots & 0 \\
L^{-1}_{21} & L^{-1}_{22} & 0 & \dots & 0 \\
L^{-1}_{31} & L^{-1}_{32} & L^{-1}_{33} & \dots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
L^{-1}_{N,1} & L^{-1}_{N,2} & L^{-1}_{N,3} & \dots & 0 \\
L^{-1}_{N+1,1} & L^{-1}_{N+1,2} & L^{-1}_{N+1,3} & \dots & L^{-1}_{N+1,N+1}
\end{bmatrix}
\begin{bmatrix} a_N \\ a_{N-1} \\ a_{N-2} \\ \vdots \\ a_1 \\ a_0 - \phi_1 y_{-1} \end{bmatrix}
$$

where $L^{-1}_{ij}$ is the $(i, j)$ element of $L^{-1}$ and $U_{ij}$ is the $(i, j)$ element of $U$.
Note how the left side for a given 𝑡 involves 𝑦𝑡 and one lagged value 𝑦𝑡−1 while the right side
involves all future values of the forcing process 𝑎𝑡 , 𝑎𝑡+1 , … , 𝑎𝑁 .
We briefly indicate how this approach extends to the problem with 𝑚 > 1.
Assume that 𝛽 = 1 and let 𝐷𝑚+1 be the (𝑚 + 1) × (𝑚 + 1) symmetric matrix whose elements
are determined from the following formula:
$$
(D_{m+1} + h I_{m+1})
\begin{bmatrix} y_N \\ y_{N-1} \\ \vdots \\ y_{N-m} \end{bmatrix}
=
\begin{bmatrix} a_N \\ a_{N-1} \\ \vdots \\ a_{N-m} \end{bmatrix}
+ M
\begin{bmatrix} y_{N-m-1} \\ y_{N-m-2} \\ \vdots \\ y_{N-2m} \end{bmatrix}
$$
where 𝑀 is (𝑚 + 1) × 𝑚 and
$$
U \bar{y} = L^{-1} \bar{a} \tag{11}
$$
$$
\sum_{j=0}^t U_{-t+N+1,\, -t+N+j+1}\, y_{t-j} = \sum_{j=0}^{N-t} L^{-1}_{-t+N+1,\, -t+N+1-j}\, \bar{a}_{t+j}, \qquad t = 0, 1, \dots, N
$$

where $L^{-1}_{t,s}$ is the element in the $(t, s)$ position of $L^{-1}$, and similarly for $U$.
The left side of equation (11) is the “feedback” part of the optimal control law for 𝑦𝑡 , while
the right-hand side is the “feedforward” part.
We note that there is a different control law for each 𝑡.
Thus, in the finite horizon case, the optimal control law is time dependent.
It is natural to suspect that as 𝑁 → ∞, (11) becomes equivalent to the solution of our infinite
horizon problem, which below we shall show can be expressed as
so that as $N \to \infty$ we expect that for each fixed $t$, $U_{t,t-j} \to c_j$ and $L^{-1}_{t,t+j}$ approaches the coefficient on $L^{-j}$ in the expansion of $c(\beta L^{-1})^{-1}$.
This suspicion is true under general conditions that we shall study later.
For now, we note that by creating the matrix 𝑊 for large 𝑁 and factoring it into the 𝐿𝑈
form, good approximations to 𝑐(𝐿) and 𝑐(𝛽𝐿−1 )−1 can be obtained.
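A sketch of this approximation step, mirroring the implementation given later in the lecture (W as constructed above, for some large N):

using LinearAlgebra

F = lu(W)
L, U = F.L, F.U
D = Diagonal(1.0 ./ diag(U))
U = D * U                          # normalize U to a unit diagonal
L = L * Diagonal(1.0 ./ diag(D))
U[end, :]                          # rows of U ≈ (normalized) coefficients of c(L)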
For the infinite horizon problem, we propose to discover first-order necessary conditions by
taking the limits of (4) and (5) as 𝑁 → ∞.
This approach is valid, and the limits of (4) and (5) as 𝑁 approaches infinity are first-order
necessary conditions for a maximum.
However, for the infinite horizon problem with 𝛽 < 1, the limits of (4) and (5) are, in general,
not sufficient for a maximum.
That is, the limits of (5) do not provide enough information uniquely to determine the solu-
tion of the Euler equation (4) that maximizes (1).
As we shall see below, a side condition on the path of 𝑦𝑡 that together with (4) is sufficient
for an optimum is
$$
\sum_{t=0}^\infty \beta^t h y_t^2 < \infty \tag{12}
$$
All paths that satisfy the Euler equations, except the one that we shall select below, violate
this condition and, therefore, evidently lead to (much) lower values of (1) than does the opti-
mal path selected by the solution procedure below.
Consider the characteristic equation associated with the Euler equation
where 𝑧0 is a constant.
In (14), we substitute $(z - z_j) = -z_j(1 - \frac{1}{z_j} z)$ and $(z - \beta z_j^{-1}) = z(1 - \frac{\beta}{z_j} z^{-1})$ for $j = 1, \dots, m$ to get

$$
h + d(\beta z^{-1})\, d(z) = (-1)^m (z_0 z_1 \cdots z_m) \Bigl(1 - \tfrac{1}{z_1} z\Bigr) \cdots \Bigl(1 - \tfrac{1}{z_m} z\Bigr) \Bigl(1 - \tfrac{1}{z_1} \beta z^{-1}\Bigr) \cdots \Bigl(1 - \tfrac{1}{z_m} \beta z^{-1}\Bigr)
$$

Now define $c(z) = \sum_{j=0}^m c_j z^j$ as

$$
c(z) = \left[(-1)^m z_0 z_1 \cdots z_m\right]^{1/2} \left(1 - \frac{z}{z_1}\right) \left(1 - \frac{z}{z_2}\right) \cdots \left(1 - \frac{z}{z_m}\right) \tag{15}
$$

$$
c(z) = c_0 (1 - \lambda_1 z) \cdots (1 - \lambda_m z) \tag{17}
$$

where

$$
c_0 = \left[(-1)^m z_0 z_1 \cdots z_m\right]^{1/2}; \qquad \lambda_j = \frac{1}{z_j}, \ j = 1, \dots, m
$$

Since $|z_j| > \sqrt{\beta}$ for $j = 1, \dots, m$, it follows that $|\lambda_j| < 1/\sqrt{\beta}$ for $j = 1, \dots, m$.
In sum, we have constructed a factorization (16) of the characteristic polynomial for the Euler
equation in which the zeros of 𝑐(𝑧) exceed 𝛽 1/2 in modulus, and the zeros of 𝑐 (𝛽𝑧 −1 ) are less
than 𝛽 1/2 in modulus.
Using (16), we now write the Euler equation as
$$
c(\beta L^{-1})\, c(L)\, y_t = a_t
$$
The unique solution of the Euler equation that satisfies condition (12) is
$$
(1 - \lambda_1 L) \cdots (1 - \lambda_m L)\, y_t = \frac{c_0^{-2}\, a_t}{(1 - \beta\lambda_1 L^{-1}) \cdots (1 - \beta\lambda_m L^{-1})} \tag{19}
$$

Using partial fractions, we can write the term on the right side of (19) as

$$
\sum_{j=1}^m \frac{A_j}{1 - \lambda_j \beta L^{-1}} \qquad \text{where} \qquad A_j := \frac{c_0^{-2}}{\prod_{i \neq j} \left(1 - \frac{\lambda_i}{\lambda_j}\right)}
$$

Then (19) can be written

$$
(1 - \lambda_1 L) \cdots (1 - \lambda_m L)\, y_t = \sum_{j=1}^m \frac{A_j}{1 - \lambda_j \beta L^{-1}}\, a_t
$$

or

$$
(1 - \lambda_1 L) \cdots (1 - \lambda_m L)\, y_t = \sum_{j=1}^m A_j \sum_{k=0}^\infty (\lambda_j \beta)^k a_{t+k} \tag{20}
$$
Equation (20) expresses the optimum sequence for 𝑦𝑡 in terms of 𝑚 lagged 𝑦’s, and 𝑚
weighted infinite geometric sums of future 𝑎𝑡 ’s.
Furthermore, (20) is the unique solution of the Euler equation that satisfies the initial condi-
tions and condition (12).
In effect, condition (12) compels us to solve the “unstable” roots of ℎ + 𝑑(𝛽𝑧 −1 )𝑑(𝑧) forward
(see [94]).
The step of factoring the polynomial $h + d(\beta z^{-1})\, d(z)$ into $c(\beta z^{-1})\, c(z)$, where the zeros of $c(z)$ all have modulus exceeding $\sqrt{\beta}$, is central to solving the problem.
We note two features of the solution (20)
• Since $|\lambda_j| < 1/\sqrt{\beta}$ for all $j$, it follows that $(\lambda_j \beta) < \sqrt{\beta}$.
• The assumption that $\{a_t\}$ is of exponential order less than $1/\sqrt{\beta}$ is sufficient to guarantee that the geometric sums of future $a_t$'s on the right side of (20) converge.

We immediately see that those sums will converge under the weaker condition that $\{a_t\}$ is of exponential order less than $\phi^{-1}$ where $\phi = \max\{\beta\lambda_i,\ i = 1, \dots, m\}$.

Note that with $a_t$ identically zero, (20) implies that in general $|y_t|$ eventually grows exponentially at a rate given by $\max_i |\lambda_i|$.

The condition $\max_i |\lambda_i| < 1/\sqrt{\beta}$ guarantees that condition (12) is satisfied.

In fact, $\max_i |\lambda_i| < 1/\sqrt{\beta}$ is a necessary condition for (12) to hold.
Were (12) not satisfied, the objective function would diverge to −∞, implying that the 𝑦𝑡
path could not be optimal.
For example, with $a_t = 0$ for all $t$, it is easy to describe a naive (nonoptimal) policy for $\{y_t, t \geq 0\}$ that gives a finite value of (1).
We can simply let 𝑦𝑡 = 0 for 𝑡 ≥ 0.
This policy involves at most 𝑚 nonzero values of ℎ𝑦𝑡2 and [𝑑(𝐿)𝑦𝑡 ]2 , and so yields a finite
value of (1).
Therefore it is easy to dominate a path that violates (12).
It is worthwhile focusing on a special case of the LQ problems above: the undiscounted prob-
lem that emerges when 𝛽 = 1.
In this case, the Euler equation is
$$
\bigl(h + d(L^{-1})\, d(L)\bigr)\, y_t = a_t
$$

The factorization of the characteristic polynomial becomes

$$
\bigl(h + d(z^{-1})\, d(z)\bigr) = c(z^{-1})\, c(z)
$$

where

$$
\begin{aligned}
c(z) &= c_0 (1 - \lambda_1 z) \cdots (1 - \lambda_m z) \\
c_0 &= \left[(-1)^m z_0 z_1 \cdots z_m\right]^{1/2} \\
|\lambda_j| &< 1 \quad \text{for } j = 1, \dots, m \\
\lambda_j &= \frac{1}{z_j} \quad \text{for } j = 1, \dots, m \\
z_0 &= \text{constant}
\end{aligned}
$$

The solution of the problem becomes

$$
(1 - \lambda_1 L) \cdots (1 - \lambda_m L)\, y_t = \sum_{j=1}^m A_j \sum_{k=0}^\infty \lambda_j^k a_{t+k}
$$
Discounted problems can always be converted into undiscounted problems via a simple trans-
formation.
Consider problem (1) with 0 < 𝛽 < 1.
Define the transformed variables

$$
\tilde{a}_t = \beta^{t/2} a_t, \qquad \tilde{y}_t = \beta^{t/2} y_t \tag{21}
$$

Then notice that $\beta^t [d(L) y_t]^2 = [\tilde{d}(L) \tilde{y}_t]^2$ with $\tilde{d}(L) = \sum_{j=0}^m \tilde{d}_j L^j$ and $\tilde{d}_j = \beta^{j/2} d_j$.

Then the original criterion function (1) is equivalent to

$$
\lim_{N \to \infty} \sum_{t=0}^N \left\{ \tilde{a}_t \tilde{y}_t - \frac{1}{2} h \tilde{y}_t^2 - \frac{1}{2} [\tilde{d}(L) \tilde{y}_t]^2 \right\} \tag{22}
$$

$$
(1 - \tilde{\lambda}_1 L) \cdots (1 - \tilde{\lambda}_m L)\, \tilde{y}_t = \sum_{j=1}^m \tilde{A}_j \sum_{k=0}^\infty \tilde{\lambda}_j^k \tilde{a}_{t+k}
$$

or

$$
\tilde{y}_t = \tilde{f}_1 \tilde{y}_{t-1} + \dots + \tilde{f}_m \tilde{y}_{t-m} + \sum_{j=1}^m \tilde{A}_j \sum_{k=0}^\infty \tilde{\lambda}_j^k \tilde{a}_{t+k}, \tag{23}
$$
$$
\left[(-1)^m \tilde{z}_0 \tilde{z}_1 \cdots \tilde{z}_m\right]^{1/2} (1 - \tilde{\lambda}_1 z) \cdots (1 - \tilde{\lambda}_m z) = \tilde{c}(z), \quad \text{where } |\tilde{\lambda}_j| < 1
$$
We leave it to the reader to show that (23) implies the equivalent form of the solution
$$
y_t = f_1 y_{t-1} + \dots + f_m y_{t-m} + \sum_{j=1}^m A_j \sum_{k=0}^\infty (\lambda_j \beta)^k a_{t+k}
$$
where
The transformations (21) and the inverse formulas (24) allow us to solve a discounted prob-
lem by first solving a related undiscounted problem.
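In code the transformation is a one-liner; a sketch with hypothetical values:

β = 0.95
a = randn(20)                          # hypothetical {a_t}
ã = [β^(t/2) * a[t+1] for t in 0:19]   # transformed forcing sequence (21)
# solve the β = 1 problem for ỹ_t, then recover y_t = β^(-t/2) ỹ_t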
58.7 Implementation
Code that computes solutions to the LQ problem using the methods described above can be
found in file control_and_filter.jl.
Here’s how it looks
m = length(d) - 1
m == length(y_m) ||
    throw(ArgumentError("y_m and d must be of same length = $m"))

for i = -k:k
    ϕ_r[k-i+1] = sum(diag(r*r', -i))
end

if !isnothing(h_eps)
    ϕ_r[k+1] = ϕ_r[k+1] + h_eps
end
end
function construct_W_and_Wm(lqf, N)
    @unpack d, m = lqf

    W = zeros(N + 1, N + 1)
    W_m = zeros(N + 1, m)

    # terminal conditions
    D_m1 = zeros(m + 1, m + 1)
    M = zeros(m + 1, m)

    for j in 1:(m+1)
        for k in j:(m+1)
            D_m1[j, k] = dot(d[1:j, 1], d[k-j+1:k, 1])
        end
    end

    for j in 1:m
        for i in (j + 1):(m + 1)
            M[i, j] = D_m1[i-j, m+1]
        end
    end
    M

    for i in 1:m
        W[N - m + i + 1, end-(2m + 1 - i)+1:end] = ϕ[1:end-i]
    end

    for i in 1:m
        W_m[N - i + 2, 1:(m - i)+1] = ϕ[(m + 1 + i):end]
    end

    return W, W_m
end
function roots_of_characteristic(lqf)
    @unpack m, ϕ = lqf

function coeffs_of_c(lqf)
    m = lqf.m
    z_1_to_m, z_0, λ = roots_of_characteristic(lqf)
    c_0 = (z_0 * prod(z_1_to_m) * (-1.0)^m)^(0.5)
    c_coeffs = coeffs(fromroots(z_1_to_m)) * z_0 / c_0  # polynomial with roots z_1, ..., z_m
    return c_coeffs
end
function solution(lqf)
    z_1_to_m, z_0, λ = roots_of_characteristic(lqf)
    c_0 = coeffs_of_c(lqf)[end]
    A = zeros(lqf.m)
    for j in 1:lqf.m
        denom = 1.0 .- λ ./ λ[j]
        A[j] = c_0^(-2) / prod(denom[1:lqf.m .!= j])
    end
    return λ, A
end
function simulate_a(lqf, N)
    V = construct_V(lqf, N + 1)
    d = MVNSampler(zeros(N + 1), V)
    return rand(d)
end
    aux_matrix = zeros(N + 1, N + 1)
    aux_matrix[1:t+1, 1:t+1] .= I + zeros(t+1, t+1)
    L = cholesky(V).U'
    Ea_hist = inv(L) * aux_matrix * L * a_hist
    return Ea_hist
end
    N = length(a_hist) - 1
    W, W_m = construct_W_and_Wm(lqf, N)
    F = lu(W)
    L, U = F.L, F.U
    D = Diagonal(1.0 ./ diag(U))
    U = D * U
    L = L * Diagonal(1.0 ./ diag(D))

        # Reverse the order of ȳ with the matrix J
        J = reverse(I + zeros(N + m + 1, N + m + 1), dims = 2)
        ȳ_hist = J * vcat(ȳ, y_m)   # ȳ_hist : concatenated y_m and ȳ
    end

    return ȳ_hist, L, U, ȳ
end
58.7.1 Example
In [4]: gr(fmt=:png);

                  label = "y_t")
            plot!(plt, xlabel = "Time", grid = true, xlim = (0, maximum(time)),
                  legend = :bottomleft)
        end

        plot_simulation()
Out[4]:
In [5]: plot_simulation(γ=5.0)
Out[5]:
And here’s 𝛾 = 10
In [6]: plot_simulation(γ=10.0)
Out[6]:
58.8 Exercises
58.8.1 Exercise 1
$$
(1 - \tilde{\lambda}_1 L) \cdots (1 - \tilde{\lambda}_m L)\, \tilde{y}_t = \sum_{j=1}^m \tilde{A}_j \sum_{k=0}^\infty \tilde{\lambda}_j^k \tilde{a}_{t+k}
$$
or
$$
\tilde{y}_t = \tilde{f}_1 \tilde{y}_{t-1} + \dots + \tilde{f}_m \tilde{y}_{t-m} + \sum_{j=1}^m \tilde{A}_j \sum_{k=0}^\infty \tilde{\lambda}_j^k \tilde{a}_{t+k} \tag{25}
$$
Here

• $h + \tilde{d}(z^{-1})\, \tilde{d}(z) = \tilde{c}(z^{-1})\, \tilde{c}(z)$
• $\tilde{c}(z) = [(-1)^m \tilde{z}_0 \tilde{z}_1 \cdots \tilde{z}_m]^{1/2} (1 - \tilde{\lambda}_1 z) \cdots (1 - \tilde{\lambda}_m z)$

where the $\tilde{z}_j$ are the zeros of $h + \tilde{d}(z^{-1})\, \tilde{d}(z)$.
Prove that (25) implies that the solution for $y_t$ in feedback form is

$$
y_t = f_1 y_{t-1} + \dots + f_m y_{t-m} + \sum_{j=1}^m A_j \sum_{k=0}^\infty \beta^k \lambda_j^k a_{t+k}
$$
58.8.2 Exercise 2
$$
\sum_{t=0}^2 \left\{ a_t y_t - \frac{1}{2} [(1 - 2L) y_t]^2 \right\}
$$
58.8.3 Exercise 3
$$
\lim_{N \to \infty} \sum_{t=0}^N -\frac{1}{2} [(1 - 2L) y_t]^2,
$$
58.8.4 Exercise 4
$$
\lim_{N \to \infty} \sum_{t=0}^N \left( (.0000001)\, y_t^2 - \frac{1}{2} [(1 - 2L) y_t]^2 \right)
$$
subject to 𝑦−1 given. Prove that the solution 𝑦𝑡 = 2𝑦𝑡−1 violates condition (12), and so is not
optimal.
Prove that the optimal solution is approximately 𝑦𝑡 = .5𝑦𝑡−1 .
Chapter 59

Classical Filtering with Linear Algebra
59.1 Contents
• Overview 59.2
• Infinite Horizon Prediction and Filtering Problems 59.3
• Finite Dimensional Prediction 59.4
• Combined Finite Dimensional Control and Prediction 59.5
• Exercises 59.6
59.2 Overview
This is a sequel to the earlier lecture Classical Control with Linear Algebra.
That lecture used linear algebra – in particular, the LU decomposition – to formulate and
solve a class of linear-quadratic optimal control problems.
In this lecture, we’ll be using a closely related decomposition, the Cholesky decomposition ,
to solve linear prediction and filtering problems.
We exploit the useful fact that there is an intimate connection between two superficially dif-
ferent classes of problems:
• deterministic linear-quadratic (LQ) optimal control problems
• linear least squares prediction and filtering problems
The first class of problems involves no randomness, while the second is all about randomness.
Nevertheless, essentially the same mathematics solves both types of problem.
This connection, which is often termed “duality,” is present whether one uses “classical” or
“recursive” solution procedures.
In fact we saw duality at work earlier when we formulated control and prediction problems
recursively in lectures LQ dynamic programming problems, A first look at the Kalman filter,
and The permanent income model.
A useful consequence of duality is that
• With every LQ control problem there is implicitly affiliated a linear least squares prediction problem.
59.2.1 References
59.2.2 Setup
$$
Y_t = d(L) u_t \tag{1}
$$

where $d(L) = \sum_{j=0}^m d_j L^j$, and $u_t$ is a serially uncorrelated stationary random process satisfying

$$
\mathbb{E} u_t = 0, \qquad
\mathbb{E} u_t u_s = \begin{cases} 1 & \text{if } t = s \\ 0 & \text{otherwise} \end{cases} \tag{2}
$$
𝑋𝑡 = 𝑌𝑡 + 𝜀𝑡 (3)
where 𝜀𝑡 is a serially uncorrelated stationary random process with 𝔼𝜀𝑡 = 0 and 𝔼𝜀𝑡 𝜀𝑠 = 0 for
all distinct 𝑡 and 𝑠.
We also assume that 𝔼𝜀𝑡 𝑢𝑠 = 0 for all 𝑡 and 𝑠.
The linear least squares prediction problem is to find the $L^2$ random variable $\hat{X}_{t+j}$ among linear combinations of $\{X_t, X_{t-1}, \dots\}$ that minimizes $\mathbb{E}(\hat{X}_{t+j} - X_{t+j})^2$.

That is, the problem is to find a $\gamma_j(L) = \sum_{k=0}^\infty \gamma_{jk} L^k$ such that $\sum_{k=0}^\infty |\gamma_{jk}|^2 < \infty$ and $\mathbb{E}[\gamma_j(L) X_t - X_{t+j}]^2$ is minimized.

The linear least squares filtering problem is to find a $b(L) = \sum_{j=0}^\infty b_j L^j$ such that $\sum_{j=0}^\infty |b_j|^2 < \infty$ and $\mathbb{E}[b(L) X_t - Y_t]^2$ is minimized.
Interesting versions of these problems related to the permanent income theory were studied
by [80].
$$
C_X(\tau) = \mathbb{E} X_t X_{t-\tau}, \qquad
C_Y(\tau) = \mathbb{E} Y_t Y_{t-\tau}, \qquad
C_{Y,X}(\tau) = \mathbb{E} Y_t X_{t-\tau}, \qquad \tau = 0, \pm 1, \pm 2, \dots \tag{4}
$$
$$
g_X(z) = \sum_{\tau=-\infty}^\infty C_X(\tau) z^\tau, \qquad
g_Y(z) = \sum_{\tau=-\infty}^\infty C_Y(\tau) z^\tau, \qquad
g_{YX}(z) = \sum_{\tau=-\infty}^\infty C_{YX}(\tau) z^\tau \tag{5}
$$
𝑦𝑡 = 𝐴(𝐿)𝑣1𝑡 + 𝐵(𝐿)𝑣2𝑡
𝑥𝑡 = 𝐶(𝐿)𝑣1𝑡 + 𝐷(𝐿)𝑣2𝑡
$$
\begin{aligned}
g_Y(z) &= d(z)\, d(z^{-1}) \\
g_X(z) &= d(z)\, d(z^{-1}) + h \\
g_{YX}(z) &= d(z)\, d(z^{-1})
\end{aligned} \tag{7}
$$
The key step in obtaining solutions to our problems is to factor the covariance generating
function 𝑔𝑋 (𝑧) of 𝑋.
The solutions of our problems are given by formulas due to Wiener and Kolmogorov.
These formulas utilize the Wold moving average representation of the $X_t$ process,

$$
X_t = c(L)\, \eta_t \tag{8}
$$

where $c(L) = \sum_{j=0}^m c_j L^j$, with

$$
c_0 \eta_t = X_t - \hat{\mathbb{E}}[X_t \mid X_{t-1}, X_{t-2}, \dots] \tag{9}
$$
Therefore, we have already showed constructively how to factor the covariance generating
function 𝑔𝑋 (𝑧) = 𝑑(𝑧) 𝑑 (𝑧 −1 ) + ℎ.
We now introduce the annihilation operator:
$$
\left[ \sum_{j=-\infty}^\infty f_j L^j \right]_+ \equiv \sum_{j=0}^\infty f_j L^j \tag{12}
$$
$$
\gamma_j(L) = \left[ \frac{c(L)}{L^j} \right]_+ c(L)^{-1} \tag{13}
$$

We have defined the solution of the filtering problem as $\hat{\mathbb{E}}[Y_t \mid X_t, X_{t-1}, \dots] = b(L) X_t$.

The Wiener-Kolmogorov formula for $b(L)$ is

$$
b(L) = \left( \frac{g_{YX}(L)}{c(L^{-1})} \right)_+ c(L)^{-1}
$$
or
$$
b(L) = \left[ \frac{d(L)\, d(L^{-1})}{c(L^{-1})} \right]_+ c(L)^{-1} \tag{14}
$$
Formulas (13) and (14) are discussed in detail in [108] and [94].
The interested reader can there find several examples of the use of these formulas in economics. Some classic examples using these formulas are due to [80].
As an example of the usefulness of formula (14), we let 𝑋𝑡 be a stochastic process with Wold
moving average representation
𝑋𝑡 = 𝑐(𝐿)𝜂𝑡
Suppose that at time 𝑡, we wish to predict a geometric sum of future 𝑋’s, namely
$$
y_t \equiv \sum_{j=0}^\infty \delta^j X_{t+j} = \frac{1}{1 - \delta L^{-1}} X_t
$$
$$
b(L) = \left[ \frac{c(L)}{1 - \delta L^{-1}} \right]_+ c(L)^{-1} \tag{15}
$$
In order to evaluate the term in the annihilation operator, we use the following result from
[44].
Proposition Let

• $g(z) = \sum_{j=0}^\infty g_j z^j$ where $\sum_{j=0}^\infty |g_j|^2 < +\infty$
• $h(z^{-1}) = (1 - \delta_1 z^{-1}) \cdots (1 - \delta_n z^{-1})$, where $|\delta_j| < 1$, for $j = 1, \dots, n$
Then

$$
\left[ \frac{g(z)}{h(z^{-1})} \right]_+ = \frac{g(z)}{h(z^{-1})} - \sum_{j=1}^n \frac{\delta_j\, g(\delta_j)}{\prod_{\substack{k=1 \\ k \neq j}}^n (\delta_j - \delta_k)} \left( \frac{1}{z - \delta_j} \right) \tag{16}
$$
and, alternatively,

$$
\left[ \frac{g(z)}{h(z^{-1})} \right]_+ = \sum_{j=1}^n B_j \left( \frac{z\, g(z) - \delta_j\, g(\delta_j)}{z - \delta_j} \right) \tag{17}
$$

where $B_j = 1 / \prod_{\substack{k=1 \\ k \neq j}}^n (1 - \delta_k / \delta_j)$.
Applying formula (17) of the proposition to evaluating (15) with $g(z) = c(z)$ and $h(z^{-1}) = 1 - \delta z^{-1}$ gives

$$
b(L) = \left[ \frac{L\, c(L) - \delta\, c(\delta)}{L - \delta} \right] c(L)^{-1}
$$

or

$$
b(L) = \left[ \frac{1 - \delta\, c(\delta) L^{-1} c(L)^{-1}}{1 - \delta L^{-1}} \right]
$$
Thus, we have

$$
\hat{\mathbb{E}} \left[ \sum_{j=0}^\infty \delta^j X_{t+j} \,\middle|\, X_t, X_{t-1}, \dots \right] = \left[ \frac{1 - \delta\, c(\delta) L^{-1} c(L)^{-1}}{1 - \delta L^{-1}} \right] X_t \tag{18}
$$
This formula is useful in solving stochastic versions of problem 1 of lecture Classical Control
with Linear Algebra in which the randomness emerges because {𝑎𝑡 } is a stochastic process.
The problem is to maximize

$$
\mathbb{E}_0 \lim_{N \to \infty} \sum_{t=0}^N \beta^t \left[ a_t y_t - \frac{1}{2} h y_t^2 - \frac{1}{2} [d(L) y_t]^2 \right] \tag{19}
$$
$$
a_t = c(L)\, \eta_t
$$

where

$$
c(L) = \sum_{j=0}^{\tilde{n}} c_j L^j
$$

and

$$
\eta_t = a_t - \hat{\mathbb{E}}[a_t \mid a_{t-1}, \dots]
$$
The problem is to maximize (19) with respect to a contingency plan expressing 𝑦𝑡 as a func-
tion of information known at 𝑡, which is assumed to be (𝑦𝑡−1 , 𝑦𝑡−2 , … , 𝑎𝑡 , 𝑎𝑡−1 , …).
The solution of this problem can be achieved in two steps.
First, ignoring the uncertainty, we can solve the problem assuming that {𝑎𝑡 } is a known se-
quence.
The solution is, from above,
or
$$
(1 - \lambda_1 L) \cdots (1 - \lambda_m L)\, y_t = \sum_{j=1}^m A_j \sum_{k=0}^\infty (\lambda_j \beta)^k a_{t+k} \tag{20}
$$
Second, the solution of the problem under uncertainty is obtained by replacing the terms on
the right-hand side of the above expressions with their linear least squares predictors.
Using (18) and (20), we have the following solution

$$
(1 - \lambda_1 L) \cdots (1 - \lambda_m L)\, y_t = \sum_{j=1}^m A_j \left[ \frac{1 - \beta\lambda_j\, c(\beta\lambda_j) L^{-1} c(L)^{-1}}{1 - \beta\lambda_j L^{-1}} \right] a_t
$$
$$
V = L^{-1} (L^{-1})'
$$

or

$$
L V L' = I
$$
$$
\begin{aligned}
L_{11} x_1 &= \varepsilon_1 \\
L_{21} x_1 + L_{22} x_2 &= \varepsilon_2 \\
&\ \ \vdots \\
L_{T1} x_1 + \dots + L_{TT} x_T &= \varepsilon_T
\end{aligned} \tag{21}
$$

or

$$
\sum_{j=0}^{t-1} L_{t,t-j}\, x_{t-j} = \varepsilon_t, \qquad t = 1, 2, \dots, T \tag{22}
$$

We also have

$$
x_t = \sum_{j=0}^{t-1} L^{-1}_{t,t-j}\, \varepsilon_{t-j}. \tag{23}
$$

Notice from (23) that $x_t$ is in the space spanned by $\varepsilon_t, \varepsilon_{t-1}, \dots, \varepsilon_1$, and from (22) that $\varepsilon_t$ is in the space spanned by $x_t, x_{t-1}, \dots, x_1$.

Therefore, we have that for $t - 1 \geq m \geq 1$

$$
x_t = \sum_{j=0}^{m-1} L^{-1}_{t,t-j}\, \varepsilon_{t-j} + \sum_{j=m}^{t-1} L^{-1}_{t,t-j}\, \varepsilon_{t-j} \tag{25}
$$
Representation (25) is an orthogonal decomposition of $x_t$ into a part $\sum_{j=m}^{t-1} L^{-1}_{t,t-j}\varepsilon_{t-j}$ that lies in the space spanned by $[x_{t-m}, x_{t-m+1}, \dots, x_1]$, and an orthogonal component not in this space.
59.4.1 Implementation
Code that computes solutions to LQ control and filtering problems using the methods described here and in Classical Control with Linear Algebra can be found in the file control_and_filter.jl.
Here’s how it looks
m = length(d) - 1
m == length(y_m) ||
    throw(ArgumentError("y_m and d must be of same length = $m"))

for i = -k:k
    ϕ_r[k-i+1] = sum(diag(r*r', -i))
end

if !isnothing(h_eps)
    ϕ_r[k+1] = ϕ_r[k+1] + h_eps
end
end
function construct_W_and_Wm(lqf, N)
    d, m = lqf.d, lqf.m

    W = zeros(N + 1, N + 1)
    W_m = zeros(N + 1, m)

    # terminal conditions
    D_m1 = zeros(m + 1, m + 1)
    M = zeros(m + 1, m)
    for j in 1:(m+1)
        for k in j:(m+1)
            D_m1[j, k] = dot(d[1:j, 1], d[k-j+1:k, 1])
        end
    end

    for j in 1:m
        for i in (j + 1):(m + 1)
            M[i, j] = D_m1[i-j, m+1]
        end
    end
    M

    for i in 1:m
        W[N - m + i + 1, end-(2m + 1 - i)+1:end] = ϕ[1:end-i]
    end

    for i in 1:m
        W_m[N - i + 2, 1:(m - i)+1] = ϕ[(m + 1 + i):end]
    end

    return W, W_m
end
function roots_of_characteristic(lqf)
    m, ϕ = lqf.m, lqf.ϕ

function coeffs_of_c(lqf)
    m = lqf.m
    z_1_to_m, z_0, λ = roots_of_characteristic(lqf)
    c_0 = (z_0 * prod(z_1_to_m) * (-1.0)^m)^(0.5)
    c_coeffs = coeffs(fromroots(z_1_to_m)) * z_0 / c_0  # polynomial with roots z_1, ..., z_m
    return c_coeffs
end

function solution(lqf)
    z_1_to_m, z_0, λ = roots_of_characteristic(lqf)
    c_0 = coeffs_of_c(lqf)[end]
    A = zeros(lqf.m)
    for j in 1:lqf.m
        denom = 1.0 .- λ ./ λ[j]
        A[j] = c_0^(-2) / prod(denom[1:lqf.m .!= j])
    end
    return λ, A
end
function simulate_a(lqf, N)
    V = construct_V(lqf, N + 1)
    d = MVNSampler(zeros(N + 1), V)
    return rand(d)
end
    aux_matrix = zeros(N + 1, N + 1)
    aux_matrix[1:t+1, 1:t+1] = Matrix(I, t + 1, t + 1)
    L = cholesky(V).U'
    Ea_hist = inv(L) * aux_matrix * L * a_hist
    return Ea_hist
end
    N = length(a_hist) - 1
    W, W_m = construct_W_and_Wm(lqf, N)
    F = lu(W, Val(true))
    L, U = F
    D = diagm(0 => 1.0 ./ diag(U))
    U = D * U
    L = L * diagm(0 => 1.0 ./ diag(D))

        # Reverse the order of ȳ with the matrix J
        J = reverse(Matrix(I, N + m + 1, N + m + 1), dims = 2)
        ȳ_hist = J * vcat(ȳ, y_m)   # ȳ_hist : concatenated y_m and ȳ
    end

    return ȳ_hist, L, U, ȳ
end
59.4.2 Example 1
𝑥𝑡 = (1 − 2𝐿)𝜀𝑡
where 𝜀𝑡 is a serially uncorrelated random process with mean zero and variance unity.
We want to use the Wiener-Kolmogorov formula (13) to compute the linear least squares fore-
casts 𝔼[𝑥𝑡+𝑗 ∣ 𝑥𝑡 , 𝑥𝑡−1 , …], for 𝑗 = 1, 2.
We can do everything we want by setting 𝑑 = 𝑟, generating an instance of LQFilter, then
invoking pertinent methods of LQFilter
In [4]: m = 1
y_m = zeros(m)
d = [1.0, -2.0]
r = [1.0, -2.0]
h = 0.0
example = LQFilter(d, h, y_m, r=d)
In [5]: coeffs_of_c(example)
In [6]: roots_of_characteristic(example)
Now let's form the covariance matrix of a time series vector of length $N$ and put it in $V$.
Then we'll take a Cholesky decomposition of $V = L^{-1}(L^{-1})' = Li\, Li'$ and use it to form the vector of "moving average representations" $x = Li\, \varepsilon$ and the vector of "autoregressive representations" $L x = \varepsilon$
In [7]: V = construct_V(example,N=5)
Notice how the lower rows of the “moving average representations” are converging to the ap-
propriate infinite history Wold representation
In [8]: F = cholesky(V)
Li = F.L
Notice how the lower rows of the “autoregressive representations” are converging to the ap-
propriate infinite history autoregressive representation
In [9]: L = inv(Li)
Remark Let $\pi(z) = \sum_{j=0}^m \pi_j z^j$ and let $z_1, \dots, z_k$ be the zeros of $\pi(z)$ that are inside the unit circle, $k < m$.
Then define

$$
\theta(z) = \pi(z) \left( \frac{z_1 z - 1}{z - z_1} \right) \left( \frac{z_2 z - 1}{z - z_2} \right) \cdots \left( \frac{z_k z - 1}{z - z_k} \right)
$$

and note that the zeros of $\theta(z)$ are not inside the unit circle.
59.4.3 Example 2
$$
X_t = (1 - \sqrt{2} L^2)\, \varepsilon_t
$$
where 𝜀𝑡 is a serially uncorrelated random process with mean zero and variance unity.
Let’s find a Wold moving average representation for 𝑥𝑡 .
Let's use the Wiener-Kolmogorov formula (13) to compute the linear least squares forecasts $\hat{\mathbb{E}}[X_{t+j} \mid X_t, X_{t-1}, \dots]$ for $j = 1, 2, 3$.
We proceed in the same way as example 1
In [10]: m = 2
y_m = [0.0, 0.0]
d = [1, 0, -sqrt(2)]
r = [1, 0, -sqrt(2)]
h = 0.0
example = LQFilter(d, h, y_m, r = d)
In [11]: coeffs_of_c(example)
In [12]: roots_of_characteristic(example)
-0.8408964152537157])
In [13]: V = construct_V(example, N = 8)
In [14]: F = cholesky(V)
Li = F.L
Li[end-2:end, :]
In [15]: L = inv(Li)
59.4.4 Prediction
It immediately follows from the "orthogonality principle" of least squares (see [6] or [94] [ch. X]) that

$$
\begin{aligned}
\hat{\mathbb{E}}[x_t \mid x_{t-m}, x_{t-m+1}, \dots, x_1] &= \sum_{j=m}^{t-1} L^{-1}_{t,t-j}\, \varepsilon_{t-j} \\
&= \begin{bmatrix} L^{-1}_{t,1} & L^{-1}_{t,2} & \dots & L^{-1}_{t,t-m} & 0 & 0 & \dots & 0 \end{bmatrix} L\, x
\end{aligned} \tag{26}
$$
This formula will be convenient in representing the solution of control problems under uncer-
tainty.
Equation (23) can be recognized as a finite dimensional version of a moving average represen-
tation.
Equation (22) can be viewed as a finite dimension version of an autoregressive representation.
Notice that even if the 𝑥𝑡 process is covariance stationary, so that 𝑉 is such that 𝑉𝑖𝑗 depends
only on |𝑖 − 𝑗|, the coefficients in the moving average representation are time-dependent, there
being a different moving average for each 𝑡.
If 𝑥𝑡 is a covariance stationary process, the last row of 𝐿−1 converges to the coefficients in the
Wold moving average representation for {𝑥𝑡 } as 𝑇 → ∞.
Further, if $x_t$ is covariance stationary, for fixed $k$ and $j > 0$, $L^{-1}_{T,T-j}$ converges to $L^{-1}_{T-k,T-k-j}$ as $T \to \infty$.
That is, the “bottom” rows of 𝐿−1 converge to each other and to the Wold moving average
coefficients as 𝑇 → ∞.
This last observation gives one simple and widely-used practical way of forming a finite 𝑇 ap-
proximation to a Wold moving average representation.
First, form the covariance matrix $\mathbb{E} x x' = V$, then obtain the Cholesky decomposition $L^{-1}(L^{-1})'$ of $V$, which can be accomplished quickly on a computer.
The last row of 𝐿−1 gives the approximate Wold moving average coefficients.
This method can readily be generalized to multivariate systems.
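A compact sketch of this recipe, using the MA(1) process of Example 1 above, $x_t = (1 - 2L)\varepsilon_t$, whose covariances are $C(0) = 5$ and $C(\pm 1) = -2$:

using LinearAlgebra

T = 8
V = diagm(0 => fill(5.0, T),
          1 => fill(-2.0, T - 1),
          -1 => fill(-2.0, T - 1))
Li = cholesky(V).L      # V = Li * Li', so Li plays the role of L⁻¹
Li[end, end-3:end]      # bottom row ≈ Wold moving average coefficients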
$$
\mathbb{E} \sum_{t=0}^N \left\{ a_t y_t - \frac{1}{2} h y_t^2 - \frac{1}{2} [d(L) y_t]^2 \right\}, \qquad h > 0
$$
$$
U \bar{y} = L^{-1} \bar{a} + K \begin{bmatrix} y_{-1} \\ \vdots \\ y_{-m} \end{bmatrix}
$$
$$
\hat{\mathbb{E}}[\bar{a} \mid a_s, a_{s-1}, \dots, a_0] = \tilde{U}^{-1} \begin{bmatrix} 0 & 0 \\ 0 & I_{(s+1)} \end{bmatrix} \tilde{U} \bar{a}
$$
$$
U \bar{y} = L^{-1} \tilde{U}^{-1} \begin{bmatrix} 0 & 0 \\ 0 & I_{(t+1)} \end{bmatrix} \tilde{U} \bar{a} + K \begin{bmatrix} y_{-1} \\ \vdots \\ y_{-m} \end{bmatrix}
$$
59.6 Exercises
59.6.1 Exercise 1
Let 𝑌𝑡 = (1 − 2𝐿)𝑢𝑡 where 𝑢𝑡 is a mean zero white noise with 𝔼𝑢2𝑡 = 1. Let
𝑋𝑡 = 𝑌𝑡 + 𝜀𝑡
where 𝜀𝑡 is a serially uncorrelated white noise with 𝔼𝜀2𝑡 = 9, and 𝔼𝜀𝑡 𝑢𝑠 = 0 for all 𝑡 and 𝑠.
Find the Wold moving average representation for 𝑋𝑡 .
Find a formula for the $A_{1j}$'s in

$$
\hat{\mathbb{E}} X_{t+1} \mid X_t, X_{t-1}, \dots = \sum_{j=0}^\infty A_{1j} X_{t-j}
$$

and a formula for the $A_{2j}$'s in

$$
\hat{\mathbb{E}} X_{t+2} \mid X_t, X_{t-1}, \dots = \sum_{j=0}^\infty A_{2j} X_{t-j}
$$
59.6.2 Exercise 2
$$
Y_t = D(L) U_t
$$

where $D(L) = \sum_{j=0}^m D_j L^j$, $D_j$ an $n \times n$ matrix, $U_t$ an $(n \times 1)$ vector white noise with $\mathbb{E} U_t = 0$ for all $t$, $\mathbb{E} U_t U_s' = 0$ for all $s \neq t$, and $\mathbb{E} U_t U_t' = I$ for all $t$.

Let $\varepsilon_t$ be an $n \times 1$ vector white noise with mean 0 and contemporaneous covariance matrix $H$, where $H$ is a positive definite matrix.
Let 𝑋𝑡 = 𝑌𝑡 + 𝜀𝑡 .
Define the covariograms as $C_X(\tau) = \mathbb{E} X_t X_{t-\tau}'$, $C_Y(\tau) = \mathbb{E} Y_t Y_{t-\tau}'$, $C_{YX}(\tau) = \mathbb{E} Y_t X_{t-\tau}'$.
Then define the matrix covariance generating function, as in (21), only interpret all the ob-
jects in (21) as matrices.
Show that the covariance generating functions are given by
$$
\begin{aligned}
g_Y(z) &= D(z) D(z^{-1})' \\
g_X(z) &= D(z) D(z^{-1})' + H \\
g_{YX}(z) &= D(z) D(z^{-1})'
\end{aligned}
$$

$$
D(z) D(z^{-1})' + H = C(z) C(z^{-1})', \qquad C(z) = \sum_{j=0}^m C_j z^j
$$
where the zeros of |𝐶(𝑧)| do not lie inside the unit circle.
A vector Wold moving average representation of 𝑋𝑡 is then
𝑋𝑡 = 𝐶(𝐿)𝜂𝑡
$$
\hat{\mathbb{E}}[X_{t+j} \mid X_t, X_{t-1}, \dots] = \left[ \frac{C(L)}{L^j} \right]_+ \eta_t
$$
If 𝐶(𝐿) is invertible, i.e., if the zeros of det 𝐶(𝑧) lie strictly outside the unit circle, then this
formula can be written
$$
\hat{\mathbb{E}}[X_{t+j} \mid X_t, X_{t-1}, \dots] = \left[ \frac{C(L)}{L^j} \right]_+ C(L)^{-1} X_t
$$
Part VIII
Chapter 60

Dynamic Stackelberg Problems
60.1 Contents
• Duopoly 60.2
• The Stackelberg Problem 60.3
• Stackelberg Plan 60.4
• Recursive Representation of Stackelberg Plan 60.5
• Computing the Stackelberg Plan 60.6
• Exhibiting Time Inconsistency of Stackelberg Plan 60.7
• Recursive Formulation of the Follower’s Problem 60.8
• Markov Perfect Equilibrium 60.9
• MPE vs. Stackelberg 60.10
This notebook formulates and computes a plan that a Stackelberg leader uses to manip-
ulate forward-looking decisions of a Stackelberg follower that depend on continuation se-
quences of decisions made once and for all by the Stackelberg leader at time 0.
To facilitate computation and interpretation, we formulate things in a context that allows us
to apply linear optimal dynamic programming.
From the beginning we carry along a linear-quadratic model of duopoly in which firms face
adjustment costs that make them want to forecast actions of other firms that influence future
prices.
60.2 Duopoly
𝑝𝑡 = 𝑎0 − 𝑎1 (𝑞1𝑡 + 𝑞2𝑡 )
where 𝑞𝑖𝑡 is output of firm 𝑖 at time 𝑡 and 𝑎0 and 𝑎1 are both positive.
𝑞10 , 𝑞20 are given numbers that serve as initial conditions at time 0.
By incurring a cost of change

$$
\gamma v_{it}^2
$$

firm $i$'s time $t$ profits are

$$
\pi_{it} = p_t q_{it} - \gamma v_{it}^2
$$
$$
\sum_{t=0}^\infty \beta^t \pi_{it}
$$
Knowing that firm 2 has chosen $\{q_{2t+1}\}_{t=0}^\infty$, the follower firm 1 goes second and chooses $\{q_{1t+1}\}_{t=0}^\infty$ once and for all at time 0.
In choosing 𝑞2⃗ , firm 2 takes into account that firm 1 will base its choice of 𝑞1⃗ on firm 2’s
choice of 𝑞2⃗ .
where the appearance behind the semi-colon indicates that 𝑞2⃗ is given.
Firm 1’s problem induces a best response mapping
𝑞1⃗ = 𝐵(𝑞2⃗ )
whose maximizer is a sequence 𝑞2⃗ that depends on the initial conditions 𝑞10 , 𝑞20 and the pa-
rameters of the model 𝑎0 , 𝑎1 , 𝛾.
This formulation captures key features of the model
• Both firms make once-and-for-all choices at time 0.
• This is true even though both firms are choosing sequences of quantities that are in-
dexed by time.
• The Stackelberg leader chooses first within time 0, knowing that the Stackelberg fol-
lower will choose second within time 0.
While our abstract formulation reveals the timing protocol and equilibrium concept well, it
obscures details that must be addressed when we want to compute and interpret a Stackel-
berg plan and the follower’s best response to it.
To gain insights about these things, we study them in more detail.
Firm 2 knows that firm 1 chooses second and takes this into account in choosing $\{q_{2t+1}\}_{t=0}^\infty$.
In the spirit of working backwards, we study firm 1's problem first, taking $\{q_{2t+1}\}_{t=0}^\infty$ as given.
$$
L = \sum_{t=0}^\infty \beta^t \left\{ a_0 q_{1t} - a_1 q_{1t}^2 - a_1 q_{1t} q_{2t} - \gamma v_{1t}^2 + \lambda_t [q_{1t} + v_{1t} - q_{1t+1}] \right\}
$$
$$
\frac{\partial L}{\partial q_{1t}} = a_0 - 2a_1 q_{1t} - a_1 q_{2t} + \lambda_t - \beta^{-1} \lambda_{t-1} = 0, \quad t \geq 1
$$

$$
\frac{\partial L}{\partial v_{1t}} = -2\gamma v_{1t} + \lambda_t = 0, \quad t \geq 0
$$
These first-order conditions and the constraint $q_{1t+1} = q_{1t} + v_{1t}$ can be rearranged to take the form

$$
v_{1t} = \beta v_{1t+1} + \frac{\beta a_0}{2\gamma} - \frac{\beta a_1}{\gamma} q_{1t+1} - \frac{\beta a_1}{2\gamma} q_{2t+1}
$$

$$
q_{1t+1} = q_{1t} + v_{1t}
$$
We can substitute the second equation into the first equation to obtain
This equation can in turn be rearranged to become the second-order difference equation
Equation (1) is a second-order difference equation in the sequence 𝑞1⃗ whose solution we want.
It satisfies two boundary conditions:
• an initial condition for $q_{1,0}$, which is given
• a terminal condition requiring that $\lim_{T \to +\infty} \beta^T q_{1t}^2 < +\infty$
Using the lag operators described in chapter IX of Macroeconomic Theory, Second edition
(1987), difference equation (1) can be written as
$$
\beta \left(1 - \frac{1 + \beta + c_1}{\beta} L + \beta^{-1} L^2 \right) q_{1t+2} = -c_0 + c_2 q_{2t+1}
$$

The polynomial in the lag operator on the left side can be factored as

$$
\left(1 - \frac{1 + \beta + c_1}{\beta} L + \beta^{-1} L^2\right) = (1 - \delta_1 L)(1 - \delta_2 L) \tag{2}
$$
Because $\delta_2 > \frac{1}{\sqrt{\beta}}$, the operator $(1 - \delta_2 L)$ contributes an unstable component if solved backwards but a stable component if solved forwards.

Mechanically, write

$$
\left[-\delta_2 L (1 - \delta_2^{-1} L^{-1})\right]^{-1} = -\delta_2^{-1} (1 - \delta_2^{-1} L^{-1})^{-1} L^{-1}
$$
Operating on both sides of equation (2) with $\beta^{-1}$ times this inverse operator gives the follower's decision rule for setting $q_{1t+1}$ in the feedback-feedforward form

$$
q_{1t+1} = \delta_1 q_{1t} - c_0 \delta_2^{-1} \beta^{-1} \frac{1}{1 - \delta_2^{-1}} + c_2 \delta_2^{-1} \beta^{-1} \sum_{j=0}^\infty \delta_2^{-j} q_{2t+j+1}, \quad t \geq 0 \tag{3}
$$
The problem of the Stackelberg leader firm 2 is to choose the sequence $\{q_{2t+1}\}_{t=0}^\infty$ to maximize its discounted profits

$$
\sum_{t=0}^\infty \beta^t \left\{ (a_0 - a_1(q_{1t} + q_{2t})) q_{2t} - \gamma (q_{2t+1} - q_{2t})^2 \right\}
$$
$$
\begin{aligned}
\tilde{L} = \sum_{t=0}^\infty \beta^t &\left\{ (a_0 - a_1(q_{1t} + q_{2t})) q_{2t} - \gamma (q_{2t+1} - q_{2t})^2 \right\} \\
+ \sum_{t=0}^\infty \beta^t \theta_t &\left\{ \delta_1 q_{1t} - c_0 \delta_2^{-1} \beta^{-1} \frac{1}{1 - \delta_2^{-1}} + c_2 \delta_2^{-1} \beta^{-1} \sum_{j=0}^\infty \delta_2^{-j} q_{2t+j+1} - q_{1t+1} \right\}
\end{aligned} \tag{4}
$$
$$
y_t = \begin{bmatrix} z_t \\ x_t \end{bmatrix}
$$
𝑟(𝑦, 𝑢) = 𝑦′ 𝑅𝑦 + 𝑢′ 𝑄𝑢
Subject to an initial condition for 𝑧0 , but not for 𝑥0 , the Stackelberg leader wants to maxi-
mize
$$
- \sum_{t=0}^\infty \beta^t r(y_t, u_t) \tag{5}
$$
$$
\begin{bmatrix} I & 0 \\ G_{21} & G_{22} \end{bmatrix}
\begin{bmatrix} z_{t+1} \\ x_{t+1} \end{bmatrix}
=
\begin{bmatrix} \hat{A}_{11} & \hat{A}_{12} \\ \hat{A}_{21} & \hat{A}_{22} \end{bmatrix}
\begin{bmatrix} z_t \\ x_t \end{bmatrix}
+ \hat{B} u_t \tag{6}
$$
We assume that the matrix $\begin{bmatrix} I & 0 \\ G_{21} & G_{22} \end{bmatrix}$ on the left side of equation (6) is invertible, so that we can multiply both sides by its inverse to obtain
$$
\begin{bmatrix} z_{t+1} \\ x_{t+1} \end{bmatrix}
=
\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
\begin{bmatrix} z_t \\ x_t \end{bmatrix}
+ B u_t \tag{7}
$$
or

$$
y_{t+1} = A y_t + B u_t \tag{8}
$$
The Stackelberg follower’s best response mapping is summarized by the second block of equa-
tions of (7).
In particular, these equations are the first-order conditions of the Stackelberg follower’s opti-
mization problem (i.e., its Euler equations).
These Euler equations summarize the forward-looking aspect of the follower’s behavior and
express how its time 𝑡 decision depends on the leader’s actions at times 𝑠 ≥ 𝑡.
When combined with a stability condition to be imposed below, the Euler equations summa-
rize the follower’s best response to the sequence of actions by the leader.
The Stackelberg leader maximizes (5) by choosing sequences {𝑢𝑡 , 𝑥𝑡 , 𝑧𝑡+1 }∞
𝑡=0 subject to (8)
and an initial condition for 𝑧0 .
Note that we have an initial condition for 𝑧0 but not for 𝑥0 .
𝑥0 is among the variables to be chosen at time 0 by the Stackelberg leader.
The Stackelberg leader uses its understanding of the responses restricted by (8) to manipulate
the follower’s decisions.
Please remember that the follower’s Euler equation is embedded in the system of dynamic
equations 𝑦𝑡+1 = 𝐴𝑦𝑡 + 𝐵𝑢𝑡 .
Subproblem 1

$$
v(y_0) = \max_{(\vec{y}_1, \vec{u}_0) \in \Omega(y_0)} \left( - \sum_{t=0}^\infty \beta^t r(y_t, u_t) \right)
$$
Subproblem 2
Subproblem 1
$$
v(y) = \max_{u, y^*} \left\{ -r(y, u) + \beta v(y^*) \right\} \tag{9}
$$

$$
y^* = A y + B u
$$
which as in lecture linear regulator gives rise to the algebraic matrix Riccati equation
𝑢𝑡 = −𝐹 𝑦𝑡
Subproblem 2
$$
-2 P_{21} z_0 - 2 P_{22} x_0 = 0,
$$

$$
x_0 = -P_{22}^{-1} P_{21} z_0
$$
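In code, subproblem 2 is a single linear solve. A sketch (P is assumed to be the Riccati solution from subproblem 1, z0 the given initial z):

nz = length(z0)                  # dimension of z
P21 = P[nz+1:end, 1:nz]
P22 = P[nz+1:end, nz+1:end]
x0 = -(P22 \ (P21 * z0))         # x0 = -P22⁻¹ P21 z0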
Now let's map our duopoly model into the above setup.
We'll formulate a state space system

$$
y_t = \begin{bmatrix} z_t \\ x_t \end{bmatrix}
$$
Now we’ll proceed to cast our duopoly model within the framework of the more general
linear-quadratic structure described above.
That will allow us to compute a Stackelberg plan simply by enlisting a Riccati equation to
solve a linear-quadratic dynamic program.
As emphasized above, firm 1 acts as if firm 2's decisions $\{q_{2t+1}, v_{2t}\}_{t=0}^\infty$ are given and beyond its control.
$$
L = \sum_{t=0}^\infty \beta^t \left\{ a_0 q_{1t} - a_1 q_{1t}^2 - a_1 q_{1t} q_{2t} - \gamma v_{1t}^2 + \lambda_t [q_{1t} + v_{1t} - q_{1t+1}] \right\}
$$
$$
\frac{\partial L}{\partial q_{1t}} = a_0 - 2a_1 q_{1t} - a_1 q_{2t} + \lambda_t - \beta^{-1} \lambda_{t-1} = 0, \quad t \geq 1
$$

$$
\frac{\partial L}{\partial v_{1t}} = -2\gamma v_{1t} + \lambda_t = 0, \quad t \geq 0
$$
These first-order conditions and the constraint $q_{1t+1} = q_{1t} + v_{1t}$ can be rearranged to take the form

$$
v_{1t} = \beta v_{1t+1} + \frac{\beta a_0}{2\gamma} - \frac{\beta a_1}{\gamma} q_{1t+1} - \frac{\beta a_1}{2\gamma} q_{2t+1}
$$

$$
q_{1t+1} = q_{1t} + v_{1t}
$$
We use these two equations as components of the following linear system that confronts a
Stackelberg continuation leader at time 𝑡
$$
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
\frac{\beta a_0}{2\gamma} & -\frac{\beta a_1}{2\gamma} & -\frac{\beta a_1}{\gamma} & \beta
\end{bmatrix}
\begin{bmatrix} 1 \\ q_{2t+1} \\ q_{1t+1} \\ v_{1t+1} \end{bmatrix}
=
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 \\
0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} 1 \\ q_{2t} \\ q_{1t} \\ v_{1t} \end{bmatrix}
+
\begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix} v_{2t}
$$
Time $t$ revenues of firm 2 are $\pi_{2t} = a_0 q_{2t} - a_1 q_{2t}^2 - a_1 q_{1t} q_{2t}$, which evidently equal

$$
z_t' R_1 z_t \equiv
\begin{bmatrix} 1 \\ q_{2t} \\ q_{1t} \end{bmatrix}'
\begin{bmatrix}
0 & \frac{a_0}{2} & 0 \\
\frac{a_0}{2} & -a_1 & -\frac{a_1}{2} \\
0 & -\frac{a_1}{2} & 0
\end{bmatrix}
\begin{bmatrix} 1 \\ q_{2t} \\ q_{1t} \end{bmatrix}
$$
$$
y_t' R y_t - Q v_{2t}^2
$$
where

$$
y_t = \begin{bmatrix} z_t \\ x_t \end{bmatrix}
\qquad \text{and} \qquad
R = \begin{bmatrix} R_1 & 0 \\ 0 & 0 \end{bmatrix}
$$
$$
\check{x}_0 = -P_{22}^{-1} P_{21} z_0
$$

$$
u_t = -F \check{y}_t, \quad t \geq 0
$$

$$
\check{y}_{t+1} = (A - BF) \check{y}_t, \quad t \geq 0
$$
From this representation we can deduce the sequence of functions $\sigma = \{\sigma_t(\check{z}_t)\}_{t=0}^\infty$ that comprise a Stackelberg plan.

For convenience, let $\check{A} \equiv A - BF$ and partition $\check{A}$ conformably to the partition $y_t = \begin{bmatrix} \check{z}_t \\ \check{x}_t \end{bmatrix}$ as

$$
\begin{bmatrix} \check{A}_{11} & \check{A}_{12} \\ \check{A}_{21} & \check{A}_{22} \end{bmatrix}
$$
Then iterations on $\check{y}_{t+1} = \check{A} \check{y}_t$ starting from initial condition $\check{y}_0 = \begin{bmatrix} \check{z}_0 \\ H_0^0 \check{z}_0 \end{bmatrix}$ imply that for $t \geq 1$

$$
x_t = \sum_{j=1}^t H_j^t \check{z}_{t-j}
$$
where

$$
\begin{aligned}
H_1^t &= \check{A}_{21} \\
H_2^t &= \check{A}_{22} \check{A}_{21} \\
&\ \ \vdots \\
H_{t-1}^t &= \check{A}_{22}^{t-2} \check{A}_{21} \\
H_t^t &= \check{A}_{22}^{t-1} \bigl(\check{A}_{21} + \check{A}_{22} H_0^0\bigr)
\end{aligned}
$$
$$
u_t = -F \check{y}_t \equiv -\begin{bmatrix} F_z & F_x \end{bmatrix} \begin{bmatrix} \check{z}_t \\ x_t \end{bmatrix}
$$

or

$$
u_t = -F_z \check{z}_t - F_x \sum_{j=1}^t H_j^t \check{z}_{t-j} = \sigma_t(\check{z}_t) \tag{10}
$$
Representation (10) confirms that whenever $F_x \neq 0$, the typical situation, the time $t$ component $\sigma_t$ of a Stackelberg plan is history dependent, meaning that the Stackelberg leader's choice $u_t$ depends not just on $\check{z}_t$ but on components of the history $\check{z}_{t-1}, \dots, \check{z}_0$.
After all, at the end of the day, it will turn out that because we set 𝑧0̌ = 𝑧0 , it will be true
that 𝑧𝑡 = 𝑧𝑡̌ for all 𝑡 ≥ 0.
Then why did we distinguish 𝑧𝑡̌ from 𝑧𝑡 ?
The answer is that if we want to present to the Stackelberg follower a history-dependent
representation of the Stackelberg leader’s sequence 𝑞2⃗ , we must use representation (10) cast
in terms of the history 𝑧𝑡̌ and not a corresponding representation cast in terms of 𝑧𝑡 .
Given the sequence 𝑞2⃗ chosen by the Stackelberg leader in our duopoly model, it turns out
that the Stackelberg follower’s problem is recursive in the natural state variables that con-
front a follower at any time 𝑡 ≥ 0.
This means that the follower’s plan is time consistent.
To verify these claims, we’ll formulate a recursive version of a follower’s problem that builds
on our recursive representation of the Stackelberg leader’s plan and our use of the Big K,
little k idea.
We now use what amounts to another “Big 𝐾, little 𝑘” trick (see rational expectations equi-
librium) to formulate a recursive version of a follower’s problem cast in terms of an ordinary
Bellman equation.
Firm 1, the follower, faces {𝑞2𝑡 }∞
𝑡=0 as a given quantity sequence chosen by the leader and be-
lieves that its output price at 𝑡 satisfies
To do so, recall that under the Stackelberg plan, firm 2 sets output according to the 𝑞2𝑡 com-
ponent of
$$
y_{t+1} = \begin{bmatrix} 1 \\ q_{2t} \\ q_{1t} \\ x_t \end{bmatrix}
$$
which is governed by
𝑦𝑡+1 = (𝐴 − 𝐵𝐹 )𝑦𝑡
$$
\tilde{y}_t = \begin{bmatrix} 1 \\ q_{2t} \\ \tilde{q}_{1t} \\ \tilde{x}_t \end{bmatrix}
$$
$$
\tilde{y}_{t+1} = (A - BF) \tilde{y}_t
$$

subject to the initial condition $\tilde{q}_{10} = q_{10}$ and $\tilde{x}_0 = x_0$, where $x_0 = -P_{22}^{-1} P_{21} z_0$ as stated above.
Firm 1’s state vector is
$$
X_t = \begin{bmatrix} \tilde{y}_t \\ q_{1t} \end{bmatrix}
$$
$$
\begin{bmatrix} \tilde{y}_{t+1} \\ q_{1t+1} \end{bmatrix}
=
\begin{bmatrix} A - BF & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} \tilde{y}_t \\ q_{1t} \end{bmatrix}
+
\begin{bmatrix} 0 \\ 1 \end{bmatrix} x_t \tag{11}
$$
This specification assures that from the point of view of firm 1, $q_{2t}$ is an exogenous process.
Here

• $\tilde{q}_{1t}$, $\tilde{x}_t$ play the role of Big K.
• $q_{1t}$, $x_t$ play the role of little k.
The time 𝑡 component of firm 1’s objective is
$$
X_t' \tilde{R} X_t - \tilde{Q} x_t^2 =
\begin{bmatrix} 1 \\ q_{2t} \\ \tilde{q}_{1t} \\ \tilde{x}_t \\ q_{1t} \end{bmatrix}'
\begin{bmatrix}
0 & 0 & 0 & 0 & \frac{a_0}{2} \\
0 & 0 & 0 & 0 & -\frac{a_1}{2} \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
\frac{a_0}{2} & -\frac{a_1}{2} & 0 & 0 & -a_1
\end{bmatrix}
\begin{bmatrix} 1 \\ q_{2t} \\ \tilde{q}_{1t} \\ \tilde{x}_t \\ q_{1t} \end{bmatrix}
- \gamma x_t^2
$$
𝑥𝑡 = −𝐹 ̃ 𝑋𝑡
𝑋̃ 𝑡+1 = (𝐴 ̃ − 𝐵̃ 𝐹 ̃ )𝑋𝑡
$$
X_0 = \begin{bmatrix} 1 \\ q_{20} \\ q_{10} \\ x_0 \\ q_{10} \end{bmatrix}
$$
we recover
𝑥0 = −𝐹 ̃ 𝑋̃ 0
which will verify that we have properly set up a recursive representation of the follower’s
problem facing the Stackelberg leader’s 𝑞2⃗ .
Since the follower can solve its problem using dynamic programming its problem is recursive
in what for it are the natural state variables, namely
$$
\begin{bmatrix} 1 \\ q_{2t} \\ \tilde{q}_{10} \\ \tilde{x}_0 \end{bmatrix}
$$
Here is our code to compute a Stackelberg plan via a linear-quadratic dynamic program as
outlined above
We define named tuples and default values for the model and solver settings, and instantiate
one copy of each
defaultModel = model();
defaultSettings = settings();
Now we can compute the actual policy using the LQ routine from QuantEcon.jl
# coefficients
Arhs = I + zeros(4, 4);
Arhs[3, 4] = 1.;
Alhsinv = inv(Alhs);

A = Alhsinv * Arhs;
B = Alhsinv * [0, 1, 0, 0];
R = [0 -a0/2 0 0; -a0/2 a1 a1/2 0; 0 a1/2 0 0; 0 0 0 0];
Q = γ;
lq = QuantEcon.LQ(Q, R, A, B, bet=β);
P, F, d = stationary_values(lq)
# simulate forward
π_leader = zeros(n);
z0 = [1, 1, 1];
x0 = H_0_0 * z0;
y0 = vcat(z0, x0);

for t in 1:n
    π_leader[t] = -(yt[:, t]' * π_matrix * yt[:, t]);
end
Out[5]:
Out[7]: true
Out[8]: true
yt_reset = copy(yt)
yt_reset[end, :] = (H_0_0 * yt[1:3, :])

for t in 1:n
    vt_leader[t] = -yt[:, t]' * P * yt[:, t]
    vt_reset_leader[t] = -yt_reset[:, t]' * P * yt_reset[:, t]
end
Out[9]:
We now formulate and compute the recursive version of the follower’s problem.
We check that the recursive Big 𝐾 , little 𝑘 formulation of the follower’s problem produces
the same output path 𝑞1⃗ that we computed when we solved the Stackelberg problem
lq_tilde = QuantEcon.LQ(Q̃, R̃, Ã, B̃, bet=β);
P̃, F̃, d̃ = stationary_values(lq_tilde);
y0_tilde = vcat(y0, y0[3]);
yt_tilde = compute_sequence(lq_tilde, y0_tilde, n)[1];
In [11]: # checks that the recursive formulation of the follower's problem gives
# the same solution as the original Stackelberg problem
plot(1:n+1, [yt_tilde[5, :], yt_tilde[3, :]], labels = ["q_tilde", "q"])
Out[11]:
Note: Variables with _tilde are obtained from solving the follower’s problem – those with-
out are from the Stackelberg problem.
In [12]: # maximum absolute difference in quantities over time between the
         # first and second solution methods
         maximum(abs.(yt_tilde[5, :] - yt_tilde[3, :]))
Out[12]: 0.0
In [13]: # x0 == x0_tilde
yt[:, 1][end] - (yt_tilde[:, 2] - yt_tilde[:, 1])[end] < tol0
Out[13]: true
If we inspect the coefficients in the decision rule −𝐹 ̃ , we can spot the reason that the follower
chooses to set 𝑥𝑡 = 𝑥𝑡̃ when it sets 𝑥𝑡 = −𝐹 ̃ 𝑋𝑡 in the recursive formulation of the follower
problem.
Can you spot what features of 𝐹 ̃ imply this?
Hint: remember the components of 𝑋𝑡
In [14]: F̃ # policy function in the follower's problem

In [16]: P̃ # value function in the follower's problem
Out[17]: true
for i in 1:1000
    P_guess = ((R̃ + F_star' * Q̃ * F_star) +
               β * (Ã - B̃ * F_star)' * P_guess *
               (Ã - B̃ * F_star));
end
Out[19]: 112.65590740578102
Out[20]: 112.65590740578097
for i in 1:100
    # compute P_iter
    dist_vec = similar(P_iter)
    for j in 1:1000
        P_iter = (R̃ + F_iter' * Q̃ * F_iter) + β *
                 (Ã - B̃ * F_iter)' * P_iter *
                 (Ã - B̃ * F_iter);

        # update F_iter
        F_iter = β * inv(Q̃ + β * B̃' * P_iter * B̃) *
                 B̃' * P_iter * Ã;

        dist_vec = P_iter - ((R̃ + F_iter' * Q̃ * F_iter) +
                   β * (Ã - B̃ * F_iter)' * P_iter *
                   (Ã - B̃ * F_iter));
    end
end
iterations")
end
else
println("The policy didn't converge: try increasing the number of�
↪inner loop
iterations")
end
for t in 1:n-1
    yt_tilde_star[t+1, :] = (Ã - B̃ * F_star) * yt_tilde_star[t, :];
end
Out[22]:
Out[23]: 0.0
$$
z_t = \begin{bmatrix} 1 \\ q_{2t} \\ q_{1t} \end{bmatrix}
$$
$$
B_1 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}, \qquad
B_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}
$$
𝑧𝑡+1 = (𝐴 − 𝐵1 𝐹1 − 𝐵2 𝐹2 )𝑧𝑡
In [24]: # in LQ form
A = I + zeros(3, 3);
B1 = [0, 0, 1];
B2 = [0, 1, 0];
R1 = [0 0 -a0/2; 0 0 a1/2; -a0/2 a1/2 a1];
R2 = [0 -a0/2 0; -a0/2 a1 a1/2; 0 a1/2 0];
Q1 = Q2 = γ;
S1 = S2 = W1 = W2 = M1 = M2 = 0.;
# simulate forward
AF = A - B1 * F1 - B2 * F2;
z = zeros(3, n);
z[:, 1] .= 1;
for t in 1:n-1
    z[:, t+1] = AF * z[:, t]
end
Out[25]:
Out[26]: 8.881784197001252e-16
Out[28]: true
for t in 1:n
    vt_MPE[t] = -z[:, t]' * P1 * z[:, t];
    vt_follower[t] = -yt_tilde[:, t]' * P̃ * yt_tilde[:, t];
end
Out[29]:
vt_leader(y0) = 150.03237147548847
vt_follower(y0) = 112.65590740578102
vt_MPE(y0) = 133.3295555721595
Out[31]: -3.9708322630494877
Chapter 61
Optimal Taxation in an LQ Economy
61.1 Contents
• Overview 61.2
• The Ramsey Problem 61.3
• Implementation 61.4
• Examples 61.5
• Exercises 61.6
• Solutions 61.7
61.2 Overview
plan.
Because the Lucas and Stokey model features state-contingent government debt, the govern-
ment debt dynamics differ substantially from those in a model of Robert Barro [8].
The treatment given here closely follows this manuscript, prepared by Thomas J. Sargent and
Francois R. Velde.
We cover only the key features of the problem in this lecture, leaving you to refer to that
source for additional results and intuition.
61.2.2 Setup
We begin by outlining the key assumptions regarding technology, households and the govern-
ment sector.
61.3.1 Technology
61.3.2 Households
Consider a representative household who chooses a path {ℓ𝑡 , 𝑐𝑡 } for labor and consumption to
maximize
$$
-\mathbb{E} \frac{1}{2} \sum_{t=0}^\infty \beta^t \left[ (c_t - b_t)^2 + \ell_t^2 \right] \tag{1}
$$
subject to the budget constraint

$$
\mathbb{E} \sum_{t=0}^\infty \beta^t p_t^0 \left[ d_t + (1 - \tau_t)\ell_t + s_t - c_t \right] = 0 \tag{2}
$$
Here
• 𝛽 is a discount factor in (0, 1)
• 𝑝𝑡0 is a scaled Arrow-Debreu price at time 0 of history contingent goods at time 𝑡 + 𝑗
• 𝑏𝑡 is a stochastic preference parameter
• 𝑑𝑡 is an endowment process
• 𝜏𝑡 is a flat tax rate on labor income
• 𝑠𝑡 is a promised time-𝑡 coupon payment on debt issued by the government
The scaled Arrow-Debreu price 𝑝𝑡0 is related to the unscaled Arrow-Debreu price as follows.
If we let $\pi_t^0(x^t)$ denote the probability (density) of a history $x^t = [x_t, x_{t-1}, \dots, x_0]$ of the state $x_t$, then the Arrow-Debreu time 0 price of a claim on one unit of consumption at date $t$, history $x^t$ would be

$$
\frac{\beta^t p_t^0}{\pi_t^0(x^t)}
$$
Thus, our scaled Arrow-Debreu price is the ordinary Arrow-Debreu price multiplied by the
discount factor 𝛽 𝑡 and divided by an appropriate probability.
The budget constraint (2) requires that the present value of consumption be restricted to
equal the present value of endowments, labor income and coupon payments on bond holdings.
61.3.3 Government
The government imposes a linear tax on labor income, fully committing to a stochastic path
of tax rates at time zero.
The government also issues state-contingent debt.
Given government tax and borrowing plans, we can construct a competitive equilibrium with
distorting government taxes.
Among all such competitive equilibria, the Ramsey plan is the one that maximizes the welfare
of the representative consumer.
Endowments, government expenditure, the preference shock process 𝑏𝑡 , and promised coupon
payments on initial government debt 𝑠𝑡 are all exogenous, and given by
• 𝑑𝑡 = 𝑆𝑑 𝑥𝑡
• 𝑔𝑡 = 𝑆𝑔 𝑥𝑡
• 𝑏𝑡 = 𝑆𝑏 𝑥𝑡
• 𝑠𝑡 = 𝑆𝑠 𝑥𝑡
The matrices 𝑆𝑑 , 𝑆𝑔 , 𝑆𝑏 , 𝑆𝑠 are primitives and {𝑥𝑡 } is an exogenous stochastic process taking
values in ℝ𝑘 .
1. Discrete case: $\{x_t\}$ is a discrete state Markov chain with transition matrix $P$.
2. VAR case: $\{x_t\}$ obeys $x_{t+1} = A x_t + C w_{t+1}$ where $\{w_t\}$ is independent zero-mean Gaussian with identity covariance matrix (a simulation sketch follows below).
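As a concrete illustration of the VAR case, here is a minimal simulation sketch; the matrices A, C and the starting point x0 below are hypothetical placeholders, not values from this lecture:

    # Sketch: simulate x_{t+1} = A x_t + C w_{t+1}, with w_t iid N(0, I)
    function simulate_var(A, C, x0, T)
        k = length(x0)
        x = zeros(k, T)
        x[:, 1] = x0
        for t in 1:T-1
            x[:, t+1] = A * x[:, t] + C * randn(k)  # Gaussian shock, identity covariance
        end
        return x
    end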
61.3.5 Feasibility
The period-by-period feasibility restriction for this economy is
$$c_t + g_t = d_t + \ell_t \tag{3}$$
Where 𝑝𝑡0 is again a scaled Arrow-Debreu price, the time zero government budget constraint
is
$$\mathbb{E} \sum_{t=0}^{\infty} \beta^t p_t^0 (s_t + g_t - \tau_t \ell_t) = 0 \tag{4}$$
61.3.7 Equilibrium
An equilibrium is a feasible allocation {ℓ𝑡 , 𝑐𝑡 }, a sequence of prices {𝑝𝑡0 }, and a tax system
{𝜏𝑡 } such that
1. The allocation $\{\ell_t, c_t\}$ is optimal for the household given $\{p_t^0\}$ and $\{\tau_t\}$.
2. The government's budget constraint (4) is satisfied.
The Ramsey problem is to choose the equilibrium {ℓ𝑡 , 𝑐𝑡 , 𝜏𝑡 , 𝑝𝑡0 } that maximizes the house-
hold’s welfare.
If {ℓ𝑡 , 𝑐𝑡 , 𝜏𝑡 , 𝑝𝑡0 } solves the Ramsey problem, then {𝜏𝑡 } is called the Ramsey plan.
The solution procedure we adopt is
1. Use the first-order conditions from the household problem to pin down prices and allo-
cations given {𝜏𝑡 }.
2. Use these expressions to rewrite the government budget constraint (4) in terms of ex-
ogenous variables and allocations.
3. Maximize the household’s objective function (1) subject to the constraint constructed in
step 2 and the feasibility constraint (3).
The solution to this maximization problem pins down all quantities of interest.
61.3.8 Solution
Step one is to obtain the first-order conditions for the household's problem, taking taxes and prices
as given.
Letting 𝜇 be the Lagrange multiplier on (2), the first-order conditions are 𝑝𝑡0 = (𝑐𝑡 − 𝑏𝑡 )/𝜇 and
ℓ𝑡 = (𝑐𝑡 − 𝑏𝑡 )(1 − 𝜏𝑡 ).
Rearranging and normalizing at 𝜇 = 𝑏0 − 𝑐0 , we can write these conditions as
$$p_t^0 = \frac{b_t - c_t}{b_0 - c_0} \quad \text{and} \quad \tau_t = 1 - \frac{\ell_t}{b_t - c_t} \tag{5}$$
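Equation (5) maps any candidate allocation directly into prices and tax rates; a minimal sketch, assuming hypothetical placeholder sequences b, c, l for the preference shock, consumption, and labor:

    b = [2.135, 2.135, 2.135]        # hypothetical b_t path
    c = [0.65, 0.66, 0.64]           # hypothetical consumption path
    l = [0.72, 0.70, 0.71]           # hypothetical labor path
    p = (b .- c) ./ (b[1] - c[1])    # scaled Arrow-Debreu prices, first part of (5)
    τ = 1 .- l ./ (b .- c)           # flat labor tax rates, second part of (5)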
Substituting these conditions into the government's budget constraint (4) yields
$$\mathbb{E} \sum_{t=0}^{\infty} \beta^t \left[ (b_t - c_t)(s_t + g_t - \ell_t) + \ell_t^2 \right] = 0 \tag{6}$$
The Ramsey problem now amounts to maximizing (1) subject to (6) and (3).
The associated Lagrangian is
$$\mathcal{L} = \mathbb{E} \sum_{t=0}^{\infty} \beta^t \left\{ -\frac{1}{2}\left[(c_t - b_t)^2 + \ell_t^2\right] + \lambda\left[(b_t - c_t)(\ell_t - s_t - g_t) - \ell_t^2\right] + \mu_t\left[d_t + \ell_t - c_t - g_t\right] \right\} \tag{7}$$
The first-order conditions associated with $c_t$ and $\ell_t$ are
$$(b_t - c_t) + \lambda(s_t + g_t - \ell_t) = \mu_t$$
and
$$\ell_t - \lambda[(b_t - c_t) - 2\ell_t] = \mu_t$$
Combining these last two equalities with (3) and working through the algebra, one can show that
$$\ell_t = \bar{\ell}_t - \nu m_t \quad \text{and} \quad c_t = \bar{c}_t - \nu m_t \tag{8}$$
where
• 𝜈 ∶= 𝜆/(1 + 2𝜆)
• ℓ𝑡̄ ∶= (𝑏𝑡 − 𝑑𝑡 + 𝑔𝑡 )/2
• 𝑐𝑡̄ ∶= (𝑏𝑡 + 𝑑𝑡 − 𝑔𝑡 )/2
• 𝑚𝑡 ∶= (𝑏𝑡 − 𝑑𝑡 − 𝑠𝑡 )/2
Apart from 𝜈, all of these quantities are expressed in terms of exogenous variables.
To solve for 𝜈, we can use the government’s budget constraint again.
The term inside the brackets in (6) is (𝑏𝑡 − 𝑐𝑡 )(𝑠𝑡 + 𝑔𝑡 ) − (𝑏𝑡 − 𝑐𝑡 )ℓ𝑡 + ℓ𝑡2 .
Using (8), the definitions above and the fact that ℓ ̄ = 𝑏 − 𝑐,̄ this term can be rewritten as
$$\mathbb{E}\left\{\sum_{t=0}^{\infty} \beta^t (b_t - \bar{c}_t)(g_t + s_t)\right\} + (\nu^2 - \nu)\,\mathbb{E}\left\{\sum_{t=0}^{\infty} \beta^t\, 2 m_t^2\right\} = 0 \tag{9}$$
If we define
$$b_0 := \mathbb{E}\left\{\sum_{t=0}^{\infty} \beta^t (b_t - \bar{c}_t)(g_t + s_t)\right\} \quad \text{and} \quad a_0 := \mathbb{E}\left\{\sum_{t=0}^{\infty} \beta^t\, 2 m_t^2\right\} \tag{10}$$
then (9) becomes the quadratic equation
$$b_0 + a_0(\nu^2 - \nu) = 0$$
which we solve for $\nu$.
Provided that 4𝑏0 < 𝑎0 , there is a unique solution 𝜈 ∈ (0, 1/2), and a unique corresponding
𝜆 > 0.
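The quadratic can be solved in closed form; a minimal sketch, mirroring the compute_ν logic used later in this lecture:

    # Solve b0 + a0*(ν^2 - ν) = 0 for ν ∈ (0, 1/2), assuming a0 > 0 and 4*b0 < a0
    function solve_ν(a0, b0)
        disc = a0^2 - 4 * a0 * b0            # discriminant of a0*ν^2 - a0*ν + b0 = 0
        disc ≥ 0 || error("no Ramsey equilibrium: requires 4*b0 < a0")
        return 0.5 * (a0 - sqrt(disc)) / a0  # smaller root; lies in (0, 1/2) when 0 < 4*b0 < a0
    end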
Let’s work out how to compute mathematical expectations in (10).
For the first one, the random variable (𝑏𝑡 − 𝑐𝑡̄ )(𝑔𝑡 + 𝑠𝑡 ) inside the summation can be expressed
as
$$\frac{1}{2}\, x_t' (S_b - S_d + S_g)'(S_g + S_s)\, x_t$$
For the second expectation in (10), the random variable 2𝑚2𝑡 can be written as
$$\frac{1}{2}\, x_t' (S_b - S_d - S_s)'(S_b - S_d - S_s)\, x_t$$
It follows that both objects of interest are special cases of the expression
$$q(x_0) = \mathbb{E} \sum_{t=0}^{\infty} \beta^t\, x_t' H x_t \tag{11}$$
Next suppose that {𝑥𝑡 } is the discrete Markov process described above.
Suppose further that each 𝑥𝑡 takes values in the state space {𝑥1 , … , 𝑥𝑁 } ⊂ ℝ𝑘 .
Let ℎ ∶ ℝ𝑘 → ℝ be a given function, and suppose that we wish to evaluate
$$q(x_0) = \mathbb{E} \sum_{t=0}^{\infty} \beta^t h(x_t) \quad \text{given} \quad x_0 = x^j$$
In this case, it can be verified that
$$q(x_0) = \sum_{t=0}^{\infty} \beta^t (P^t h)[j] \tag{12}$$
Here
• 𝑃 𝑡 is the 𝑡-th power of the transition matrix 𝑃
• ℎ is, with some abuse of notation, the vector (ℎ(𝑥1 ), … , ℎ(𝑥𝑁 ))
• (𝑃 𝑡 ℎ)[𝑗] indicates the 𝑗-th element of 𝑃 𝑡 ℎ
It can be shown that (12) is in fact equal to the $j$-th element of the vector $(I - \beta P)^{-1} h$.
This last fact is applied in the calculations below.
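In Julia, this reduces to a single linear solve; a minimal sketch with a hypothetical two-state chain (the values of β, P, and h are placeholders):

    using LinearAlgebra
    β = 0.95
    P = [0.9 0.1; 0.2 0.8]      # Markov transition matrix
    h = [1.0, 2.0]              # the vector (h(x^1), ..., h(x^N))
    q = (I - β * P) \ h         # q[j] equals (12) for x_0 = x^j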
We are interested in tracking several other variables besides the ones described above.
To prepare the way for this, we define
$$p_{t+j}^t = \frac{b_{t+j} - c_{t+j}}{b_t - c_t}$$
as the scaled Arrow-Debreu time 𝑡 price of a history contingent claim on one unit of con-
sumption at time 𝑡 + 𝑗.
These are prices that would prevail at time $t$ if markets were reopened at time $t$.
These prices are constituents of the present value of government obligations outstanding at
time 𝑡, which can be expressed as
$$B_t := \mathbb{E}_t \sum_{j=0}^{\infty} \beta^j\, p_{t+j}^t\, (\tau_{t+j}\ell_{t+j} - g_{t+j}) \tag{13}$$
Using our expression for prices and the Ramsey plan, we can also write 𝐵𝑡 as
$$B_t = \mathbb{E}_t \sum_{j=0}^{\infty} \beta^j\, \frac{(b_{t+j} - c_{t+j})(\ell_{t+j} - g_{t+j}) - \ell_{t+j}^2}{b_t - c_t}$$
Using the recursion
$$p_{t+j}^t = p_{t+1}^t\, p_{t+j}^{t+1}$$
we can also write
$$B_t = (\tau_t \ell_t - g_t) + E_t \sum_{j=1}^{\infty} \beta^j\, p_{t+j}^t\, (\tau_{t+j}\ell_{t+j} - g_{t+j})$$
and
$$B_t = (\tau_t \ell_t - g_t) + \beta E_t\, p_{t+1}^t B_{t+1} \tag{14}$$
Define
$$R_t^{-1} := \mathbb{E}_t\, \beta\, p_{t+1}^t \tag{15}$$
61.3.12 A Martingale
We now want to study the following two objects, namely
$$\pi_{t+1} := B_{t+1} - R_t[B_t + g_t - \tau_t \ell_t]$$
and the cumulation of $\pi_t$,
$$\Pi_t := \sum_{s=0}^{t} \pi_s$$
• $R_t[B_t + g_t - \tau_t \ell_t]$ is what the government would have owed at the beginning of period $t+1$ if it had simply borrowed at the one-period risk-free rate rather than selling state-contingent securities.
Thus, 𝜋𝑡+1 is the excess payout on the actual portfolio of state contingent government debt
relative to an alternative portfolio sufficient to finance 𝐵𝑡 + 𝑔𝑡 − 𝜏𝑡 ℓ𝑡 and consisting entirely of
risk-free one-period bonds.
Use expressions (14) and (15) to obtain
$$\pi_{t+1} = B_{t+1} - \frac{1}{\beta E_t\, p_{t+1}^t}\left[\beta E_t\, p_{t+1}^t\, B_{t+1}\right]$$
or
$$\pi_{t+1} = B_{t+1} - \tilde{E}_t\, B_{t+1}$$
where 𝐸𝑡̃ is the conditional mathematical expectation taken with respect to a one-step tran-
sition density that has been formed by multiplying the original transition density with the
likelihood ratio
$$m_{t+1}^t = \frac{p_{t+1}^t}{E_t\, p_{t+1}^t}$$
It follows that $\tilde{E}_t\, \pi_{t+1} = 0$, which asserts that $\{\pi_{t+1}\}$ is a martingale difference sequence under the distorted probability measure, and that $\{\Pi_t\}$ is a martingale under the distorted probability measure.
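To make the distorted expectation concrete, here is a small numerical sketch; the probabilities, prices, and debt values are hypothetical:

    probs  = [0.5, 0.5]                  # Π(s_{t+1} | s_t)
    p_next = [0.9, 1.1]                  # scaled prices p^t_{t+1} in each state
    B_next = [1.2, 0.8]                  # state-contingent debt B_{t+1}
    m = p_next ./ sum(probs .* p_next)   # likelihood ratio m^t_{t+1}
    EB_tilde = sum(probs .* m .* B_next) # distorted expectation Ẽ_t B_{t+1}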
In the tax-smoothing model of Robert Barro [8], government debt is a random walk.
In the current model, government debt {𝐵𝑡 } is not a random walk, but the excess payoff
{Π𝑡 } on it is.
61.4 Implementation
function compute_exog_sequences(econ, x)
    # compute exogenous variable sequences
    Sg, Sd, Sb, Ss = econ.Sg, econ.Sd, econ.Sb, econ.Ss
    g, d, b, s = [dropdims(S * x, dims = 1) for S in (Sg, Sd, Sb, Ss)]
    # selector for m_t = (b_t - d_t - s_t)/2, used below when computing a0
    Sm = Sb - Sd - Ss
    return g, d, b, s, Sm
end
if disc ≥ 0
ν = 0.5 *(a0 - sqrt(disc)) / a0
else
println("There is no Ramsey equilibrium for these parameters.")
error("Government spending (economy.g) too low")
end
if ν * (0.5 - ν) < 0
print("Negative multiplier on the government budget constraint.")
error("Government spending (economy.g) too low")
end
return ν
end
mc = MarkovChain(P)
state = simulate(mc, T, init=1)
x = x_vals[:, state]
# compute a0, b0
ns = size(P, 1)
F = I - β.*P
a0 = (F \ ((Sm * x_vals)'.^2))[1] ./ 2
H = ((Sb - Sd + Sg) * x_vals) .* ((Sg + Ss) * x_vals)  # matches (S_b - S_d + S_g)'(S_g + S_s) in (10)
b0 = (F \ H')[1] ./ 2
# compute π
π, Π = compute_Π(B, R, rvn, g, ξ)
return (g = g, d = d, b = b, s = s, c = c,
l = l, p = p, τ = τ, rvn = rvn, B = B,
R = R, π = π, Π = Π, ξ = ξ)
end
# compute a0 and b0
H = Sm'Sm
a0 = 0.5 * var_quadratic_sum(A, C, H, β, x0)
H = (Sb - Sd + Sg)'*(Sg + Ss)
b0 = 0.5 * var_quadratic_sum(A, C, H, β, x0)
# compute π
π, Π = compute_Π(B, R, rvn, g, ξ)
return (g = g, d = d, b = b, s = s,
c = c, l = l, p = p, τ = τ,
rvn = rvn, B = B, R = R,
61.4. IMPLEMENTATION 1091
π = π, Π = Π, ξ = ξ)
end
function gen_fig_1(path)
T = length(path.c)
function gen_fig_2(path)
T = length(path.c)
61.5 Examples
# parameters
β = 1 / 1.05
ρ, mg = .7, .35
A = [ρ mg*(1 - ρ); 0.0 1.0]
C = [sqrt(1 - ρ^2) * mg / 10 0.0; 0 0]
Sg = [1.0 0.0]
Sd = [0.0 0.0]
Sb = [0 2.135]
Ss = [0.0 0.0]
proc = ContStochProcess(A, C)
econ = Economy(β, Sg, Sd, Sb, Ss, proc)
T = 50
path = compute_paths(econ, T)
gen_fig_1(path)
Out[4]:
In [5]: gen_fig_2(path)
Out[5]:
Our second example adopts a discrete Markov specification for the exogenous process
In [6]: # Parameters
β = 1 / 1.05
P = [0.8 0.2 0.0
0.0 0.5 0.5
0.0 0.0 1.0]
path = compute_paths(econ, T)
gen_fig_1(path)
Out[6]:
In [7]: gen_fig_2(path)
Out[7]:
61.6 Exercises
61.6.1 Exercise 1
61.7 Solutions
In [8]: # parameters
β = 1 / 1.05
ρ, mg = .95, .35
A = [0. 0. 0. ρ mg*(1-ρ);
1. 0. 0. 0. 0.;
0. 1. 0. 0. 0.;
0. 0. 1. 0. 0.;
0. 0. 0. 0. 1.]
C = zeros(5, 5)
C[1, 1] = sqrt(1 - ρ^2) * mg / 8
Sg = [1. 0. 0. 0. 0.]
Sd = [0. 0. 0. 0. 0.]
Sb = [0. 0. 0. 0. 2.135]
Ss = [0. 0. 0. 0. 0.]
proc = ContStochProcess(A, C)
econ = Economy(β, Sg, Sd, Sb, Ss, proc)
T = 50
path = compute_paths(econ, T)
In [9]: gen_fig_1(path)
Out[9]:
In [10]: gen_fig_2(path)
Out[10]:
Chapter 62

Optimal Taxation with State-Contingent Debt
62.1 Contents
• Overview 62.2
• A Competitive Equilibrium with Distorting Taxes 62.3
• Recursive Formulation of the Ramsey problem 62.4
• Examples 62.5
• Further Comments 62.6
62.2 Overview
This lecture describes a celebrated model of optimal fiscal policy by Robert E. Lucas, Jr., and
Nancy Stokey [72].
The model revisits classic issues about how to pay for a war.
Here a war means a more or less temporary surge in an exogenous government expenditure
process.
The model features
• a government that must finance an exogenous stream of government expenditures with
either
– a flat rate tax on labor, or
– purchases and sales from a full array of Arrow state-contingent securities
• a representative household that values consumption and leisure
• a linear production function mapping labor into a single good
• a Ramsey planner who at time 𝑡 = 0 chooses a plan for taxes and trades of Arrow secu-
rities for all 𝑡 ≥ 0
After first presenting the model in a space of sequences, we shall represent it recursively
in terms of two Bellman equations formulated along lines that we encountered in Dynamic
Stackelberg models.
As in Dynamic Stackelberg models, to apply dynamic programming we shall define the state
vector artfully.
62.2.1 Setup
For 𝑡 ≥ 0, a history 𝑠𝑡 = [𝑠𝑡 , 𝑠𝑡−1 , … , 𝑠0 ] of an exogenous state 𝑠𝑡 has joint probability density
𝜋𝑡 (𝑠𝑡 ).
We begin by assuming that government purchases 𝑔𝑡 (𝑠𝑡 ) at time 𝑡 ≥ 0 depend on 𝑠𝑡 .
Let 𝑐𝑡 (𝑠𝑡 ), ℓ𝑡 (𝑠𝑡 ), and 𝑛𝑡 (𝑠𝑡 ) denote consumption, leisure, and labor supply, respectively, at
history 𝑠𝑡 and date 𝑡.
A representative household is endowed with one unit of time that can be divided between leisure $\ell_t$ and labor $n_t$:
$$n_t(s^t) + \ell_t(s^t) = 1 \tag{1}$$
Output equals $n_t(s^t)$ and can be divided between $c_t(s^t)$ and $g_t(s^t)$
$$c_t(s^t) + g_t(s^t) = n_t(s^t) \tag{2}$$
A representative household ranks consumption and leisure streams according to
$$\sum_{t=0}^{\infty} \sum_{s^t} \beta^t \pi_t(s^t)\, u[c_t(s^t), \ell_t(s^t)] \tag{3}$$
where the utility function 𝑢 is increasing, strictly concave, and three times continuously dif-
ferentiable in both arguments.
The technology pins down a pre-tax wage rate to unity for all 𝑡, 𝑠𝑡 .
The government imposes a flat-rate tax 𝜏𝑡 (𝑠𝑡 ) on labor income at time 𝑡, history 𝑠𝑡 .
There are complete markets in one-period Arrow securities.
One unit of an Arrow security issued at time 𝑡 at history 𝑠𝑡 and promising to pay one unit of
time 𝑡 + 1 consumption in state 𝑠𝑡+1 costs 𝑝𝑡+1 (𝑠𝑡+1 |𝑠𝑡 ).
The government issues one-period Arrow securities each period.
The government has a sequence of budget constraints whose time 𝑡 ≥ 0 component is
$$g_t(s^t) = \tau_t(s^t)\, n_t(s^t) + \sum_{s_{t+1}} p_{t+1}(s_{t+1}|s^t)\, b_{t+1}(s_{t+1}|s^t) - b_t(s_t|s^{t-1}) \tag{4}$$
where
• 𝑝𝑡+1 (𝑠𝑡+1 |𝑠𝑡 ) is a competitive equilibrium price of one unit of consumption at date 𝑡 + 1
in state 𝑠𝑡+1 at date 𝑡 and history 𝑠𝑡
• 𝑏𝑡 (𝑠𝑡 |𝑠𝑡−1 ) is government debt falling due at time 𝑡, history 𝑠𝑡 .
Government debt 𝑏0 (𝑠0 ) is an exogenous initial condition.
The representative household has a sequence of budget constraints whose time 𝑡 ≥ 0 compo-
nent is
$$c_t(s^t) + \sum_{s_{t+1}} p_{t+1}(s_{t+1}|s^t)\, b_{t+1}(s_{t+1}|s^t) = \left[1 - \tau_t(s^t)\right] n_t(s^t) + b_t(s_t|s^{t-1}) \quad \forall t \ge 0 \tag{5}$$
The household faces the price system as a price-taker and takes the government policy as
given.
The household chooses $\{c_t(s^t), \ell_t(s^t)\}_{t=0}^{\infty}$ to maximize (3) subject to (5) and (1) for all $t, s^t$.
We find it convenient sometimes to work with the Arrow-Debreu price system that is implied
by a sequence of Arrow securities prices.
Let 𝑞𝑡0 (𝑠𝑡 ) be the price at time 0, measured in time 0 consumption goods, of one unit of con-
sumption at time 𝑡, history 𝑠𝑡 .
The following recursion relates Arrow-Debreu prices $\{q_t^0(s^t)\}_{t=0}^{\infty}$ to Arrow securities prices $\{p_{t+1}(s_{t+1}|s^t)\}_{t=0}^{\infty}$:
$$q_{t+1}^0(s^{t+1}) = p_{t+1}(s_{t+1}|s^t)\, q_t^0(s^t), \qquad q_0^0(s^0) = 1 \tag{6}$$
Arrow-Debreu prices are useful when we want to compress a sequence of budget constraints
into a single intertemporal budget constraint, as we shall find it convenient to do below.
We apply a popular approach to solving a Ramsey problem, called the primal approach.
The idea is to use first-order conditions for household optimization to eliminate taxes and
prices in favor of quantities, then pose an optimization problem cast entirely in terms of
quantities.
After Ramsey quantities have been found, taxes and prices can then be unwound from the
allocation.
The primal approach uses four steps:
1. Obtain first-order conditions of the household’s problem and solve them for
{𝑞𝑡0 (𝑠𝑡 ), 𝜏𝑡 (𝑠𝑡 )}∞ 𝑡 𝑡 ∞
𝑡=0 as functions of the allocation {𝑐𝑡 (𝑠 ), 𝑛𝑡 (𝑠 )}𝑡=0 .
2. Substitute these expressions for taxes and prices in terms of the allocation into the
household’s present-value budget constraint.
3. Find the allocation that maximizes the utility of the representative household (3) sub-
ject to the feasibility constraints (1) and (2) and the implementability condition derived
in step 2.
4. Use the Ramsey allocation together with the formulas from step 1 to find taxes and
prices.
By sequential substitution of one one-period budget constraint (5) into another, we can ob-
tain the household’s present-value budget constraint:
$$\sum_{t=0}^{\infty} \sum_{s^t} q_t^0(s^t)\, c_t(s^t) = \sum_{t=0}^{\infty} \sum_{s^t} q_t^0(s^t)\left[1 - \tau_t(s^t)\right] n_t(s^t) + b_0 \tag{7}$$
The household's first-order conditions imply
$$\left(1 - \tau_t(s^t)\right) = \frac{u_l(s^t)}{u_c(s^t)} \tag{8}$$
and
$$p_{t+1}(s_{t+1}|s^t) = \beta\, \pi(s_{t+1}|s^t)\left(\frac{u_c(s^{t+1})}{u_c(s^t)}\right) \tag{9}$$
so that
$$q_t^0(s^t) = \beta^t\, \pi_t(s^t)\, \frac{u_c(s^t)}{u_c(s^0)} \tag{10}$$
Using the first-order conditions (8) and (9) to eliminate taxes and prices from (7), we derive
the implementability condition
$$\sum_{t=0}^{\infty} \sum_{s^t} \beta^t \pi_t(s^t)\left[u_c(s^t)\, c_t(s^t) - u_\ell(s^t)\, n_t(s^t)\right] - u_c(s^0)\, b_0 = 0. \tag{11}$$
The Ramsey problem is to choose a feasible allocation that maximizes
$$\sum_{t=0}^{\infty} \sum_{s^t} \beta^t \pi_t(s^t)\, u[c_t(s^t), 1 - n_t(s^t)] \tag{12}$$
subject to (11).
To approach the Ramsey problem, define the pseudo utility
$$V\left[c_t(s^t), n_t(s^t), \Phi\right] = u[c_t(s^t), 1 - n_t(s^t)] + \Phi\left[u_c(s^t)\, c_t(s^t) - u_\ell(s^t)\, n_t(s^t)\right] \tag{13}$$
where $\Phi$ is a Lagrange multiplier on the implementability condition (11).
Next form the Lagrangian
$$J = \sum_{t=0}^{\infty} \sum_{s^t} \beta^t \pi_t(s^t) \left\{ V[c_t(s^t), n_t(s^t), \Phi] + \theta_t(s^t)\left[n_t(s^t) - c_t(s^t) - g_t(s^t)\right] \right\} - \Phi\, u_c(0)\, b_0 \tag{14}$$
where $\{\theta_t(s^t); \forall s^t\}_{t \ge 0}$ is a sequence of Lagrange multipliers on the feasibility conditions (2).
Given an initial government debt 𝑏0 , we want to maximize 𝐽 with respect to
{𝑐𝑡 (𝑠𝑡 ), 𝑛𝑡 (𝑠𝑡 ); ∀𝑠𝑡 }𝑡≥0 and to minimize with respect to {𝜃(𝑠𝑡 ); ∀𝑠𝑡 }𝑡≥0 .
The first-order conditions for the Ramsey problem for periods $t \ge 1$ and $t = 0$, respectively, are
$$\begin{aligned}
c_t(s^t)&:\ (1+\Phi)\, u_c(s^t) + \Phi\left[u_{cc}(s^t)\, c_t(s^t) - u_{\ell c}(s^t)\, n_t(s^t)\right] - \theta_t(s^t) = 0, \quad t \ge 1 \\
n_t(s^t)&:\ -(1+\Phi)\, u_\ell(s^t) - \Phi\left[u_{c\ell}(s^t)\, c_t(s^t) - u_{\ell\ell}(s^t)\, n_t(s^t)\right] + \theta_t(s^t) = 0, \quad t \ge 1
\end{aligned} \tag{15}$$
and
$$\begin{aligned}
c_0(s^0, b_0)&:\ (1+\Phi)\, u_c(s^0, b_0) + \Phi\left[u_{cc}(s^0, b_0)\, c_0(s^0, b_0) - u_{\ell c}(s^0, b_0)\, n_0(s^0, b_0)\right] - \theta_0(s^0, b_0) - \Phi\, u_{cc}(s^0, b_0)\, b_0 = 0 \\
n_0(s^0, b_0)&:\ -(1+\Phi)\, u_\ell(s^0, b_0) - \Phi\left[u_{c\ell}(s^0, b_0)\, c_0(s^0, b_0) - u_{\ell\ell}(s^0, b_0)\, n_0(s^0, b_0)\right] + \theta_0(s^0, b_0) + \Phi\, u_{c\ell}(s^0, b_0)\, b_0 = 0
\end{aligned} \tag{16}$$
Please note how these first-order conditions differ between 𝑡 = 0 and 𝑡 ≥ 1.
It is instructive to use first-order conditions (15) for 𝑡 ≥ 1 to eliminate the multipliers 𝜃𝑡 (𝑠𝑡 ).
For convenience, we suppress the time subscript and the index 𝑠𝑡 and obtain
Suppose that two histories $s^t$ and $\tilde{s}^\tau$ are such that
$$g_t(s^t) = g_\tau(\tilde{s}^\tau) = g$$
Then it follows from (17) that the Ramsey choices of consumption and leisure, $(c_t(s^t), \ell_t(s^t))$ and $(c_\tau(\tilde{s}^\tau), \ell_\tau(\tilde{s}^\tau))$, are identical.
The proposition asserts that the optimal allocation is a function of the currently realized
quantity of government purchases 𝑔 only and does not depend on the specific history that
preceded that realization of 𝑔.
Also, assume that government purchases 𝑔 are an exact time-invariant function 𝑔(𝑠) of 𝑠.
We maintain these assumptions throughout the remainder of this lecture.
62.3.7 Determining Φ
We complete the Ramsey plan by computing the Lagrange multiplier Φ on the implementabil-
ity constraint (11).
Government budget balance restricts Φ via the following line of reasoning.
The household’s first-order conditions imply
$$\left(1 - \tau_t(s^t)\right) = \frac{u_l(s^t)}{u_c(s^t)} \tag{19}$$
and
$$p_{t+1}(s_{t+1}|s^t) = \beta\, \Pi(s_{t+1}|s^t)\, \frac{u_c(s^{t+1})}{u_c(s^t)} \tag{20}$$
Substituting from (19), (20), and the feasibility condition (2) into the recursive version (5) of
the household budget constraint gives
$$u_c(s^t)\left[n_t(s^t) - g_t(s^t)\right] + \beta \sum_{s_{t+1}} \Pi(s_{t+1}|s^t)\, u_c(s^{t+1})\, b_{t+1}(s_{t+1}|s^t) = u_l(s^t)\, n_t(s^t) + u_c(s^t)\, b_t(s_t|s^{t-1}) \tag{21}$$
Define $x_t(s^t) := u_c(s^t)\, b_t(s_t|s^{t-1})$.
Notice that $x_t(s^t)$ appears on the right side of (21) while $\beta$ times the conditional expectation of $x_{t+1}(s^{t+1})$ appears on the left side.
Hence the equation shares much of the structure of a simple asset pricing equation with 𝑥𝑡
being analogous to the price of the asset at time 𝑡.
We learned earlier that in a Ramsey allocation, $c_t(s^t)$, $n_t(s^t)$, and $b_t(s_t|s^{t-1})$, and therefore also $x_t(s^t)$, are each functions of $s_t$ only, being independent of the history $s^{t-1}$ for $t \ge 1$.
That means that we can express equation (21) as
$$u_c(s)\left[n(s) - g(s)\right] + \beta \sum_{s'} \Pi(s'|s)\, x'(s') = u_l(s)\, n(s) + x(s) \tag{22}$$
where $s'$ denotes a next period value of $s$ and $x'(s')$ denotes a next period value of $x$.
Equation (22) is easy to solve for $x(s)$ for $s = 1, \ldots, S$.
If we let $\vec{n}, \vec{g}, \vec{x}$ denote $S \times 1$ vectors whose $i$th elements are the respective $n$, $g$, and $x$ values when $s = i$, and let $\Pi$ be the transition matrix for the Markov state $s$, then we can express (22) as the matrix equation
$$\vec{u}_c(\vec{n} - \vec{g}) + \beta \Pi \vec{x} = \vec{u}_l\, \vec{n} + \vec{x} \tag{23}$$
with solution
$$\vec{x} = (I - \beta \Pi)^{-1}\left[\vec{u}_c(\vec{n} - \vec{g}) - \vec{u}_l\, \vec{n}\right] \tag{24}$$
In these equations, by $\vec{u}_c\, \vec{n}$, for example, we mean element-by-element multiplication of the two vectors.
After solving for $\vec{x}$, we can find $b(s_t|s^{t-1})$ in Markov state $s_t = s$ from $b(s) = \frac{x(s)}{u_c(s)}$ or the matrix equation
$$\vec{b} = \frac{\vec{x}}{\vec{u}_c} \tag{25}$$
where division here means element-by-element division of the respective components of the
𝑆 × 1 vectors 𝑥⃗ and 𝑢⃗𝑐 .
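A minimal sketch of (24)–(25) as two array operations; the S = 2 inputs below are hypothetical placeholders standing in for the model objects:

    using LinearAlgebra
    β  = 0.95
    Π  = [0.9 0.1; 0.2 0.8]
    uc = [1.0, 1.2]; ul = [0.8, 0.9]                # marginal utilities by state
    n  = [0.5, 0.6]; g  = [0.1, 0.2]                # labor and spending by state
    x  = (I - β * Π) \ (uc .* (n .- g) .- ul .* n)  # solve (I - βΠ)x⃗ = u⃗_c(n⃗ - g⃗) - u⃗_l n⃗
    b  = x ./ uc                                    # element-by-element division, eq (25)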
Here is a computational algorithm:
1. Start with a guess for the value for $\Phi$, then use the first-order conditions and the feasibility conditions to compute $c(s_t), n(s_t)$ for $s \in [1, \ldots, S]$ and $c_0(s_0, b_0)$ and $n_0(s_0, b_0)$, given $\Phi$
   • these depend on $\Phi$
2. Solve equation (24) for $\vec{x}$, which also depends on $\Phi$.
3. Find a $\Phi$ that satisfies
$$u_{c,0}\, b_0 = u_{c,0}(n_0 - g_0) - u_{l,0}\, n_0 + \beta \sum_{s=1}^{S} \Pi(s|s_0)\, x(s) \tag{26}$$
by gradually raising $\Phi$ if the left side of (26) exceeds the right side and lowering $\Phi$ if the left side is less than the right side.
4. After computing a Ramsey allocation, recover the flat tax rate on labor from (8) and the implied one-period Arrow securities prices from (9).
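Step 3's updating rule can be implemented as a simple bisection; a sketch assuming a hypothetical function budget_gap(Φ) that returns the left side of (26) minus the right side and changes sign on the bracket [Φ_lo, Φ_hi]:

    function find_Φ(budget_gap; Φ_lo = 0.0, Φ_hi = 1.0, tol = 1e-10)
        while Φ_hi - Φ_lo > tol
            Φ_mid = (Φ_lo + Φ_hi) / 2
            # raise Φ when the left side of (26) exceeds the right side
            budget_gap(Φ_mid) > 0 ? (Φ_lo = Φ_mid) : (Φ_hi = Φ_mid)
        end
        return (Φ_lo + Φ_hi) / 2
    end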
In summary, when 𝑔𝑡 is a time invariant function of a Markov state 𝑠𝑡 , a Ramsey plan can be
constructed by solving 3𝑆 + 3 equations in 𝑆 components each of 𝑐,⃗ 𝑛,⃗ and 𝑥⃗ together with
𝑛0 , 𝑐0 , and Φ.
In our calculations below and in a subsequent lecture based on an extension of the Lucas-
Stokey model by Aiyagari, Marcet, Sargent, and Seppälä (2002) [2], we shall modify the one-
period utility function assumed above.
(We adopted the preceding utility specification because it was the one used in the original
[72] paper)
We will modify their specification by instead assuming that the representative agent has util-
ity function
$$u(c, n) = \frac{c^{1-\sigma}}{1-\sigma} - \frac{n^{1+\gamma}}{1+\gamma}$$
We also eliminate leisure from the model, so that the feasibility condition becomes
$$c_t + g_t = n_t$$
With these understandings, equations (17) and (18) simplify in the case of the CRRA utility function.
They become
$$(1+\Phi)\left[u_c(c) + u_n(c+g)\right] + \Phi\left[c\, u_{cc}(c) + (c+g)\, u_{nn}(c+g)\right] = 0 \tag{27}$$
and
$$(1+\Phi)\left[u_c(c_0) + u_n(c_0+g_0)\right] + \Phi\left[c_0\, u_{cc}(c_0) + (c_0+g_0)\, u_{nn}(c_0+g_0)\right] - \Phi\, u_{cc}(c_0)\, b_0 = 0 \tag{28}$$
In equation (27), it is understood that 𝑐 and 𝑔 are each functions of the Markov state 𝑠.
In addition, the time 𝑡 = 0 budget constraint is satisfied at 𝑐0 and initial government debt 𝑏0 :
$$b_0 + g_0 = \tau_0(c_0 + g_0) + \frac{\bar{b}}{R_0} \tag{29}$$
where 𝑅0 is the gross interest rate for the Markov state 𝑠0 that is assumed to prevail at time
𝑡 = 0 and 𝜏0 is the time 𝑡 = 0 tax rate.
In equation (29), it is understood that
$$\tau_0 = 1 - \frac{u_{l,0}}{u_{c,0}}$$
and
$$R_0^{-1} = \beta \sum_{s=1}^{S} \Pi(s|s_0)\, \frac{u_c(s)}{u_{c,0}}$$
function SequentialAllocation(model)
β, Π, G, Θ = model.β, model.Π, model.G, model.Θ
mc = MarkovChain(Π)
S = size(Π, 1) # Number of states
# now find the first best allocation
cFB, nFB, ΞFB, zFB = find_first_best(model, S, 1)
if converged(res) == false
error("Could not find first best")
end
if version == 1
1110 CHAPTER 62. OPTIMAL TAXATION WITH STATE-CONTINGENT DEBT
cFB = res.zero[1:S]
nFB = res.zero[S+1:end]
ΞFB = Uc(cFB, nFB) # Multiplier on the resource constraint
zFB = vcat(cFB, nFB, ΞFB)
return cFB, nFB, ΞFB, zFB
elseif version == 2
cFB = res.zero[1:S]
nFB = res.zero[S+1:end]
IFB = Uc(cFB, nFB) .* cFB + Un(cFB, nFB) .* nFB
xFB = \(I - β * Π, IFB)
zFB = [vcat(cFB[s], xFB[s], xFB) for s in 1:S]
return cFB, nFB, IFB, xFB, zFB
end
end
function time1_allocation(pas::SequentialAllocation, μ)
model, S = pas.model, pas.S
Θ, β, Π, G, Uc, Ucc, Un, Unn =
model.Θ, model.β, model.Π, model.G,
model.Uc, model.Ucc, model.Un, model.Unn
function FOC!(out, z)
c = z[1:S]
n = z[S+1:2S]
Ξ = z[2S+1:end]
        out[1:S] = Uc(c, n) .- μ * (Ucc(c, n) .* c .+ Uc(c, n)) .- Ξ         # FOC c
        out[S+1:2S] = Un(c, n) .- μ * (Unn(c, n) .* n .+ Un(c, n)) + Θ .* Ξ  # FOC n
(Θ .* n .- c - G)[s_0]
)
end
# Find root
res = nlsolve(FOC!, [0.0, pas.cFB[s_0], pas.nFB[s_0], pas.ΞFB[s_0]])
if res.f_converged == false
error("Could not find time 0 LS allocation.")
end
return (res.zero...,)
end
function time1_value(pas::SequentialAllocation, μ)
model = pas.model
c, n, x, Ξ = time1_allocation(pas, μ)
U_val = model.U.(c, n)
V = \(I - model.β*model.Π, U_val)
return c, n, x, V
end
function Τ(model, c, n)
Uc, Un = model.Uc.(c, n), model.Un.(c, n)
return 1 .+ Un ./ (model.Θ .* Uc)
end
function simulate(pas::SequentialAllocation,
                  B_::AbstractFloat, s_0::Integer,
                  T::Integer,
                  sHist::Union{Vector, Nothing}=nothing)
    model = pas.model
Π, β, Uc = model.Π, model.β, model.Uc
if isnothing(sHist)
sHist = QuantEcon.simulate(pas.mc, T, init=s_0)
end
cHist = zeros(T)
nHist = similar(cHist)
Bhist = similar(cHist)
ΤHist = similar(cHist)
μHist = similar(cHist)
RHist = zeros(T-1)
# time 0
μ, cHist[1], nHist[1], _ = time0_allocation(pas, B_, s_0)
ΤHist[1] = Τ(pas.model, cHist[1], nHist[1])[s_0]
Bhist[1] = B_
μHist[1] = μ
# time 1 onward
for t in 2:T
c, n, x, Ξ = time1_allocation(pas,μ)
u_c = Uc(c,n)
s = sHist[t]
ΤHist[t] = Τ(pas.model, c, n)[s]
Eu_c = dot(Π[sHist[t-1],:], u_c)
cHist[t], nHist[t], Bhist[t] = c[s], n[s], x[s] / u_c[s]
RHist[t-1] = Uc(cHist[t-1], nHist[t-1]) / (β * Eu_c)
μHist[t] = μ
end
return cHist, nHist, Bhist, ΤHist, sHist, μHist, RHist
end
init[i] = lb[i]
end
end
(minf, minx, ret) = optimize(opt, init)
T.z0[i_x, s] = vcat(minx[1], minx[1] + G[s], minx[2:end])
return vcat(-minf, T.z0[i_x, s])
end
𝑥𝑡 (𝑠𝑡 ) = 𝑢𝑐 (𝑠𝑡 )𝑏𝑡 (𝑠𝑡 |𝑠𝑡−1 ) in equation (21) appears to be a purely “forward-looking” variable.
But 𝑥𝑡 (𝑠𝑡 ) is also a natural candidate for a state variable in a recursive formulation of the
Ramsey problem.
To express a Ramsey plan recursively, we imagine that a time 0 Ramsey planner is followed
by a sequence of continuation Ramsey planners at times 𝑡 = 1, 2, ….
A “continuation Ramsey planner” has a different objective function and faces different con-
straints than a Ramsey planner.
A key step in representing a Ramsey plan recursively is to regard the marginal utility scaled
government debts 𝑥𝑡 (𝑠𝑡 ) = 𝑢𝑐 (𝑠𝑡 )𝑏𝑡 (𝑠𝑡 |𝑠𝑡−1 ) as predetermined quantities that continuation
Ramsey planners at times 𝑡 ≥ 1 are obligated to attain.
Continuation Ramsey planners do this by choosing continuation policies that induce the rep-
resentative household to make choices that imply that 𝑢𝑐 (𝑠𝑡 )𝑏𝑡 (𝑠𝑡 |𝑠𝑡−1 ) = 𝑥𝑡 (𝑠𝑡 ).
A time 𝑡 ≥ 1 continuation Ramsey planner delivers 𝑥𝑡 by choosing a suitable 𝑛𝑡 , 𝑐𝑡 pair and
a list of 𝑠𝑡+1 -contingent continuation quantities 𝑥𝑡+1 to bequeath to a time 𝑡 + 1 continuation
Ramsey planner.
A time 𝑡 ≥ 1 continuation Ramsey planner faces 𝑥𝑡 , 𝑠𝑡 as state variables.
But the time 0 Ramsey planner faces 𝑏0 , not 𝑥0 , as a state variable.
Furthermore, the Ramsey planner cares about (𝑐0 (𝑠0 ), ℓ0 (𝑠0 )), while continuation Ramsey
planners do not.
The time 0 Ramsey planner hands 𝑥1 as a function of 𝑠1 to a time 1 continuation Ramsey
planner.
These lines of delegated authorities and responsibilities across time express the continuation
Ramsey planners’ obligations to implement their parts of the original Ramsey plan, designed
once-and-for-all at time 0.
After 𝑠𝑡 has been realized at time 𝑡 ≥ 1, the state variables confronting the time 𝑡 continua-
tion Ramsey planner are (𝑥𝑡 , 𝑠𝑡 ).
• Let 𝑉 (𝑥, 𝑠) be the value of a continuation Ramsey plan at 𝑥𝑡 = 𝑥, 𝑠𝑡 = 𝑠 for 𝑡 ≥ 1.
• Let 𝑊 (𝑏, 𝑠) be the value of a Ramsey plan at time 0 at 𝑏0 = 𝑏 and 𝑠0 = 𝑠.
We work backwards by presenting a Bellman equation for $V(x, s)$ first, then a Bellman equation for $W(b, s)$.
The Bellman equation for a time $t \ge 1$ continuation Ramsey planner is
$$V(x, s) = \max_{n, \{x'(s')\}} u(n - g(s), 1 - n) + \beta \sum_{s' \in S} \Pi(s'|s)\, V(x'(s'), s') \tag{30}$$
where maximization over $n$ and the $S$ elements of $x'(s')$ is subject to the single implementability constraint for $t \ge 1$
$$x = u_c(n - g(s)) - u_l\, n + \beta \sum_{s' \in S} \Pi(s'|s)\, x'(s') \tag{31}$$
The continuation Ramsey planner's policy functions take the form
$$n_t = f(x_t, s_t), \quad t \ge 1$$
$$x_{t+1}(s_{t+1}) = h(s_{t+1}; x_t, s_t), \quad s_{t+1} \in S,\ t \ge 1 \tag{32}$$
The Bellman equation for the time $0$ Ramsey planner is
$$W(b_0, s_0) = \max_{n_0, \{x'(s_1)\}} u(n_0 - g_0, 1 - n_0) + \beta \sum_{s_1 \in S} \Pi(s_1|s_0)\, V(x'(s_1), s_1) \tag{33}$$
where maximization over $n_0$ and the $S$ elements of $x'(s_1)$ is subject to the time $0$ implementability constraint
$$u_{c,0}\, b_0 = u_{c,0}(n_0 - g_0) - u_{l,0}\, n_0 + \beta \sum_{s_1 \in S} \Pi(s_1|s_0)\, x'(s_1) \tag{34}$$
coming from restriction (26).
The time $0$ policy functions take the form
$$n_0 = f_0(b_0, s_0)$$
$$x_1(s_1) = h_0(s_1; b_0, s_0) \tag{35}$$
Notice the appearance of state variables (𝑏0 , 𝑠0 ) in the time 0 policy functions for the Ramsey
planner as compared to (𝑥𝑡 , 𝑠𝑡 ) in the policy functions (32) for the time 𝑡 ≥ 1 continuation
Ramsey planners.
The value function $V(x_t, s_t)$ of the time $t$ continuation Ramsey planner equals $E_t \sum_{\tau=t}^{\infty} \beta^{\tau-t}\, u(c_\tau, l_\tau)$, where the consumption and leisure processes are evaluated along the original time $0$ Ramsey plan.
Attach a Lagrange multiplier Φ1 (𝑥, 𝑠) to constraint (31) and a Lagrange multiplier Φ0 to con-
straint (26).
Time $t \ge 1$: the first-order conditions for the time $t \ge 1$ constrained maximization problem on the right side of the continuation Ramsey planner's Bellman equation (30) are
$$\beta\, \Pi(s'|s)\, V_x(x'(s'), s') - \beta\, \Pi(s'|s)\, \Phi_1 = 0 \tag{36}$$
for each $x'(s')$, together with a corresponding first-order condition (37) for $n$.
Given Φ1 , equation (37) is one equation to be solved for 𝑛 as a function of 𝑠 (or of 𝑔(𝑠)).
Equation (36) implies $V_x(x'(s'), s') = \Phi_1$, while an envelope condition is $V_x(x, s) = \Phi_1$, so it follows that
$$V_x(x'(s'), s') = V_x(x, s) \tag{38}$$
Time 𝑡 = 0: For the time 0 problem on the right side of the Ramsey planner’s Bellman equa-
tion (33), first-order conditions are
𝑉𝑥 (𝑥(𝑠1 ), 𝑠1 ) = Φ0 (39)
Notice similarities and differences between the first-order conditions for 𝑡 ≥ 1 and for 𝑡 = 0.
An additional term is present in (40) except in three special cases
• 𝑏0 = 0, or
• 𝑢𝑐 is constant (i.e., preferences are quasi-linear in consumption), or
• initial government assets are sufficiently large to finance all government purchases with
interest earnings from those assets, so that Φ0 = 0
Except in these special cases, the allocation and the labor tax rate as functions of 𝑠𝑡 differ
between dates 𝑡 = 0 and subsequent dates 𝑡 ≥ 1.
Naturally, the first-order conditions in this recursive formulation of the Ramsey problem
agree with the first-order conditions derived when we first formulated the Ramsey plan in the
space of sequences.
Equations (38) and (39) imply that
$$V_x(x_t, s_t) = \Phi_0 \tag{41}$$
for all $t \ge 1$.
When 𝑉 is concave in 𝑥, this implies state-variable degeneracy along a Ramsey plan in the
sense that for 𝑡 ≥ 1, 𝑥𝑡 will be a time-invariant function of 𝑠𝑡 .
Given Φ0 , this function mapping 𝑠𝑡 into 𝑥𝑡 can be expressed as a vector 𝑥⃗ that solves equa-
tion (34) for 𝑛 and 𝑐 as functions of 𝑔 that are associated with Φ = Φ0 .
While the marginal utility adjusted level of government debt 𝑥𝑡 is a key state variable for the
continuation Ramsey planners at 𝑡 ≥ 1, it is not a state variable at time 0.
The time 0 Ramsey planner faces 𝑏0 , not 𝑥0 = 𝑢𝑐,0 𝑏0 , as a state variable.
The discrepancy in state variables faced by the time 0 Ramsey planner and the time 𝑡 ≥ 1
continuation Ramsey planners captures the differing obligations and incentives faced by the
time 0 Ramsey planner and the time 𝑡 ≥ 1 continuation Ramsey planners.
• The time 0 Ramsey planner is obligated to honor government debt 𝑏0 measured in time
0 consumption goods.
• The time 0 Ramsey planner can manipulate the value of government debt as measured
by 𝑢𝑐,0 𝑏0 .
• In contrast, time 𝑡 ≥ 1 continuation Ramsey planners are obligated not to alter values
of debt, as measured by 𝑢𝑐,𝑡 𝑏𝑡 , that they inherit from a preceding Ramsey planner or
continuation Ramsey planner.
When government expenditures 𝑔𝑡 are a time invariant function of a Markov state 𝑠𝑡 , a Ram-
sey plan and associated Ramsey allocation feature marginal utilities of consumption 𝑢𝑐 (𝑠𝑡 )
that, given Φ, for 𝑡 ≥ 1 depend only on 𝑠𝑡 , but that for 𝑡 = 0 depend on 𝑏0 as well.
This means that 𝑢𝑐 (𝑠𝑡 ) will be a time invariant function of 𝑠𝑡 for 𝑡 ≥ 1, but except when 𝑏0 =
0, a different function for 𝑡 = 0.
This in turn means that prices of one period Arrow securities 𝑝𝑡+1 (𝑠𝑡+1 |𝑠𝑡 ) = 𝑝(𝑠𝑡+1 |𝑠𝑡 ) will
be the same time invariant functions of (𝑠𝑡+1 , 𝑠𝑡 ) for 𝑡 ≥ 1, but a different function 𝑝0 (𝑠1 |𝑠0 )
for 𝑡 = 0, except when 𝑏0 = 0.
The differences between these time 0 and time 𝑡 ≥ 1 objects reflect the Ramsey planner’s
incentive to manipulate Arrow security prices and, through them, the value of initial govern-
ment debt 𝑏0 .
else
error("T.time_0 is $(T.time_0), which is invalid")
end
diff = 0.0
for s in 1:S
diff = max(diff, maximum(abs,
(Vf[s].(xgrid)-Vfnew[s].(xgrid))./Vf[s].(xgrid)))
end
print("diff = $diff \n")
Vf = Vfnew
end
# Store value function policies and Bellman Equations
return Vf, policies, T, xgrid
end
end
end
return Vf, [cf, nf, xprimef]
end
s, x = sHist[t], xprime[sHist[t]]
n = nf[s](x)
c = [cf[shat](x) for shat in 1:S]
xprime = [xprimef[s, sprime](x) for sprime in 1:S]
ΤHist[t] = Τ(pab.model, c, n)[s]
u_c = Uc(c, n)
Eu_c = dot(Π[sHist[t-1], :], u_c)
μHist[t] = pab.Vf[s](x)
RHist[t-1] = Uc(cHist[t-1], nHist[t-1]) / (β * Eu_c)
cHist[t], nHist[t], Bhist[t] = c[s], n, x / u_c[s]
end
return cHist, nHist, Bhist, ΤHist, sHist, μHist, RHist
end
62.5 Examples
This example illustrates in a simple setting how a Ramsey planner manages risk.
Government expenditures are known for sure in all periods except one.
• For 𝑡 < 3 and 𝑡 > 3 we assume that 𝑔𝑡 = 𝑔𝑙 = 0.1.
• At 𝑡 = 3 a war occurs with probability 0.5.
– If there is war, 𝑔3 = 𝑔ℎ = 0.2.
– If there is no war 𝑔3 = 𝑔𝑙 = 0.1.
We define the components of the state vector as the following six (𝑡, 𝑔) pairs:
(0, 𝑔𝑙 ), (1, 𝑔𝑙 ), (2, 𝑔𝑙 ), (3, 𝑔𝑙 ), (3, 𝑔ℎ ), (𝑡 ≥ 4, 𝑔𝑙 ).
We think of these 6 states as corresponding to 𝑠 = 1, 2, 3, 4, 5, 6.
The transition matrix is
$$\Pi = \begin{pmatrix}
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.5 & 0.5 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}$$
and government expenditures in the six states are
$$g = \begin{pmatrix} 0.1 \\ 0.1 \\ 0.1 \\ 0.1 \\ 0.2 \\ 0.1 \end{pmatrix}$$
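For reference, these primitives translate directly into Julia arrays (written here as a sketch; the lecture's own model code constructs them analogously):

    Π = [0.0 1.0 0.0 0.0 0.0 0.0;
         0.0 0.0 1.0 0.0 0.0 0.0;
         0.0 0.0 0.0 0.5 0.5 0.0;
         0.0 0.0 0.0 0.0 0.0 1.0;
         0.0 0.0 0.0 0.0 0.0 1.0;
         0.0 0.0 0.0 0.0 0.0 1.0]
    G = [0.1, 0.1, 0.1, 0.1, 0.2, 0.1]   # g in states s = 1, …, 6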
The household's one-period utility function is again
$$u(c, n) = \frac{c^{1-\sigma}}{1-\sigma} - \frac{n^{1+\gamma}}{1+\gamma}$$
sHist_h = [1, 2, 3, 4, 6, 6, 6]
sHist_l = [1, 2, 3, 5, 6, 6, 6]
using Plots
gr(fmt=:png);
titles = hcat("Consumption",
"Labor Supply",
"Government Debt",
"Tax Rate",
"Government Spending",
"Output")
Out[6]:
Tax smoothing
• the tax rate is constant for all 𝑡 ≥ 1
– For 𝑡 ≥ 1, 𝑡 ≠ 3, this is a consequence of 𝑔𝑡 being the same at all those dates
– For 𝑡 = 3, it is a consequence of the special one-period utility function that we
have assumed
– Under other one-period utility functions, the time 𝑡 = 3 tax rate could be either
higher or lower than for dates 𝑡 ≥ 1, 𝑡 ≠ 3
• the tax rate is the same at 𝑡 = 3 for both the high 𝑔𝑡 outcome and the low 𝑔𝑡 outcome
We have assumed that at 𝑡 = 0, the government owes positive debt 𝑏0 .
It sets the time 𝑡 = 0 tax rate partly with an eye to reducing the value 𝑢𝑐,0 𝑏0 of 𝑏0 .
It does this by increasing consumption at time 𝑡 = 0 relative to consumption in later periods.
This has the consequence of raising the time 𝑡 = 0 value of the gross interest rate for risk-free
loans between periods 𝑡 and 𝑡 + 1, which equals
$$R_t = \frac{u_{c,t}}{\beta\, \mathbb{E}_t[u_{c,t+1}]}$$
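A small numerical sketch of this rate for CRRA utility, with hypothetical consumption values:

    σ, β   = 2.0, 1 / 1.05
    c_t    = 0.60                      # time-t consumption
    c_next = [0.58, 0.62]              # s_{t+1}-contingent consumption
    probs  = [0.5, 0.5]                # Π(s_{t+1} | s_t)
    uc(c)  = c^(-σ)                    # marginal utility
    R_t    = uc(c_t) / (β * sum(probs .* uc.(c_next)))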
A tax policy that makes time $t = 0$ consumption higher than time $t = 1$ consumption evidently increases the one-period risk-free interest rate, $R_t$, at $t = 0$.
Raising the time 𝑡 = 0 risk-free interest rate makes time 𝑡 = 0 consumption goods cheaper
relative to consumption goods at later dates, thereby lowering the value 𝑢𝑐,0 𝑏0 of initial gov-
ernment debt 𝑏0 .
We see this in a figure below that plots the time path for the risk free interest rate under
both realizations of the time 𝑡 = 3 government expenditure shock.
The following plot illustrates how the government lowers the interest rate at time 0 by raising
consumption
Out[7]:
At time 𝑡 = 1, the government evidently saves since it has set the tax rate sufficiently high to
allow it to set 𝑏2 < 𝑏1 .
At time 𝑡 = 2 the government trades state-contingent Arrow securities to hedge against war
at 𝑡 = 3.
We have seen that when 𝑏0 > 0, the Ramsey plan sets the time 𝑡 = 0 tax rate partly with an
eye toward raising a risk-free interest rate for one-period loans between times 𝑡 = 0 and 𝑡 = 1.
By raising this interest rate, the plan makes time 𝑡 = 0 goods cheap relative to consumption
goods at later times.
By doing this, it lowers the value of time 𝑡 = 0 debt that it has inherited and must finance.
In the preceding example, the Ramsey tax rate at time 0 differs from its value at time 1.
To explore what is going on here, let’s simplify things by removing the possibility of war at
time 𝑡 = 3.
The Ramsey problem then includes no randomness because 𝑔𝑡 = 𝑔𝑙 for all 𝑡.
The figure below plots the Ramsey tax rates and gross interest rates at time 𝑡 = 0 and time
𝑡 ≥ 1 as functions of the initial government debt (using the sequential allocation solution and
a CRRA utility function defined above)
Out[8]:
The figure indicates that if the government enters with positive debt, it sets a tax rate at 𝑡 =
0 that is less than all later tax rates.
By setting a lower tax rate at 𝑡 = 0, the government raises consumption, which reduces the
value 𝑢𝑐,0 𝑏0 of its initial debt.
It does this by increasing 𝑐0 and thereby lowering 𝑢𝑐,0 .
Conversely, if 𝑏0 < 0, the Ramsey planner sets the tax rate at 𝑡 = 0 higher than in subsequent
periods.
A side effect of lowering time 𝑡 = 0 consumption is that it raises the one-period interest rate
at time 0 above that of subsequent periods.
There are only two values of initial government debt at which the tax rate is constant for all
𝑡 ≥ 0.
The first is 𝑏0 = 0
• Here the government can’t use the 𝑡 = 0 tax rate to alter the value of the
initial debt.
The second occurs when the government enters with sufficiently large assets that the Ramsey
planner can achieve first best and sets 𝜏𝑡 = 0 for all 𝑡.
It is only for these two values of initial government debt that the Ramsey plan is time-
consistent.
Another way of saying this is that, except for these two values of initial government debt, a
continuation of a Ramsey plan is not a Ramsey plan.
To illustrate this, consider a Ramsey planner who starts with an initial government debt 𝑏1
associated with one of the Ramsey plans computed above.
Call $\tau_1^R$ the time $t = 0$ tax rate chosen by the Ramsey planner confronting this value for initial government debt.
The figure below shows both the tax rate at time 1 chosen by our original Ramsey planner
and what a new Ramsey planner would choose for its time 𝑡 = 0 tax rate
Out[9]:
The tax rates in the figure are equal for only two values of initial government debt.
The complete tax smoothing for 𝑡 ≥ 1 in the preceding example is a consequence of our hav-
ing assumed CRRA preferences.
To see what is driving this outcome, we begin by noting that the Ramsey tax rate for 𝑡 ≥ 1
is a time invariant function 𝜏 (Φ, 𝑔) of the Lagrange multiplier on the implementability con-
straint and government expenditures.
For CRRA preferences, we can exploit the relations 𝑈𝑐𝑐 𝑐 = −𝜎𝑈𝑐 and 𝑈𝑛𝑛 𝑛 = 𝛾𝑈𝑛 to derive
$$\frac{(1 + (1 - \sigma)\Phi)\, U_c}{(1 + (1 - \gamma)\Phi)\, U_n} = 1$$
Also suppose that 𝑔𝑡 follows a two state i.i.d. process with equal probabilities attached to 𝑔𝑙
and 𝑔ℎ .
To compute the tax rate, we will use both the sequential and recursive approaches described
above.
The figure below plots a sample path of the Ramsey tax rate
In [11]: M1 = log_utility()
μ_grid = range(-0.6, 0.0, length = 200)
PP_seq = SequentialAllocation(M1) # Solve sequential problem
PP_bel = RecursiveAllocation(M1, μ_grid) # Solve recursive problem
T = 20
sHist = [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1]
62.5. EXAMPLES 1129
# Simulate
sim_seq = simulate(PP_seq, 0.5, 1, T, sHist)
sim_bel = simulate(PP_bel, 0.5, 1, T, sHist)
# Plot policies
sim_seq_plot = [sim_seq[1:4]..., M1.G[sHist], M1.Θ[sHist].*sim_seq[2]]
sim_bel_plot = [sim_bel[1:4]..., M1.G[sHist], M1.Θ[sHist].*sim_bel[2]]
titles = hcat("Consumption",
"Labor Supply",
"Government Debt",
"Tax Rate",
"Government Spending",
"Output")
labels = [["Sequential", "Recursive"], ["",""], ["",""], ["",""], ["",""], ["",""]]
plots=plot(layout=(3,2), size=(850,780))
for i = 1:6
plot!(plots[i], sim_seq_plot[i], color=:black, lw=2, marker=:circle,
markersize=2, label=labels[i][1])
plot!(plots[i], sim_bel_plot[i], color=:blue, lw=2, marker=:xcross,
markersize=2, label=labels[i][2])
plot!(plots[i], title=titles[i], grid=true, legend=:topright)
end
plot(plots)
diff = 0.0003504611535379196
diff = 0.00015763906388123851
diff = 7.124337606645018e-5
diff = 3.2356917242389125e-5
diff = 1.4829540976261937e-5
diff = 6.9104341194283816e-6
diff = 3.323222470321399e-6
diff = 1.6870560419905608e-6
diff = 9.29342141847281e-7
Out[11]:
As should be expected, the recursive and sequential solutions produce almost identical alloca-
tions.
Unlike outcomes with CRRA preferences, the tax rate is not perfectly smoothed.
Instead the government raises the tax rate when 𝑔𝑡 is high.
A related lecture describes an extension of the Lucas-Stokey model by Aiyagari, Marcet, Sar-
gent, and Seppälä (2002) [2].
In the AMSS economy, only a risk-free bond is traded.
That lecture compares the recursive representation of the Lucas-Stokey model presented in
this lecture with one for an AMSS economy.
By comparing these recursive formulations, we shall glean a sense in which the dimension of
the state is lower in the Lucas Stokey model.
Accompanying that difference in dimension will be different dynamics of government debt.
Chapter 63

Optimal Taxation without State-Contingent Debt
63.1 Contents
• Overview 63.2
• Competitive Equilibrium with Distorting Taxes 63.3
• Recursive Version of AMSS Model 63.4
• Examples 63.5
63.2 Overview
In an earlier lecture we described a model of optimal taxation with state-contingent debt due
to Robert E. Lucas, Jr., and Nancy Stokey [72].
Aiyagari, Marcet, Sargent, and Seppälä [2] (hereafter, AMSS) studied optimal taxation in a
model without state-contingent debt.
In this lecture, we
• describe assumptions and equilibrium concepts
• solve the model
• implement the model numerically
• conduct some policy experiments
• compare outcomes with those in a corresponding complete-markets model
We begin with an introduction to the model.
63.2.1 Setup
Many but not all features of the economy are identical to those of the Lucas-Stokey economy.
Let’s start with things that are identical.
For 𝑡 ≥ 0, a history of the state is represented by 𝑠𝑡 = [𝑠𝑡 , 𝑠𝑡−1 , … , 𝑠0 ].
Government purchases 𝑔(𝑠) are an exact time-invariant function of 𝑠.
Let 𝑐𝑡 (𝑠𝑡 ), ℓ𝑡 (𝑠𝑡 ), and 𝑛𝑡 (𝑠𝑡 ) denote consumption, leisure, and labor supply, respectively, at
history 𝑠𝑡 at time 𝑡.
Each period a representative household is endowed with one unit of time that can be divided between leisure $\ell_t$ and labor $n_t$:
$$n_t(s^t) + \ell_t(s^t) = 1 \tag{1}$$
Output equals $n_t(s^t)$ and can be divided between consumption $c_t(s^t)$ and $g(s_t)$
$$c_t(s^t) + g(s_t) = n_t(s^t) \tag{2}$$
A representative household ranks consumption and leisure streams according to
$$\sum_{t=0}^{\infty} \sum_{s^t} \beta^t \pi_t(s^t)\, u[c_t(s^t), \ell_t(s^t)] \tag{3}$$
where
• 𝜋𝑡 (𝑠𝑡 ) is a joint probability distribution over the sequence 𝑠𝑡 , and
• the utility function 𝑢 is increasing, strictly concave, and three times continuously differ-
entiable in both arguments
The government imposes a flat rate tax 𝜏𝑡 (𝑠𝑡 ) on labor income at time 𝑡, history 𝑠𝑡 .
Lucas and Stokey assumed that there are complete markets in one-period Arrow securities;
also see smoothing models.
It is at this point that AMSS [2] modify the Lucas and Stokey economy.
AMSS allow the government to issue only one-period risk-free debt each period.
Ruling out complete markets in this way is a step in the direction of making total tax collec-
tions behave more like that prescribed in [8] than they do in [72].
That 𝑏𝑡+1 (𝑠𝑡 ) is the same for all realizations of 𝑠𝑡+1 captures its risk-free character.
The market value at time 𝑡 of government debt maturing at time 𝑡 + 1 equals 𝑏𝑡+1 (𝑠𝑡 ) divided
by 𝑅𝑡 (𝑠𝑡 ).
The government’s budget constraint in period 𝑡 at history 𝑠𝑡 is
$$b_t(s^{t-1}) = \tau_t^n(s^t)\, n_t(s^t) - g_t(s^t) - T_t(s^t) + \frac{b_{t+1}(s^t)}{R_t(s^t)} \equiv z(s^t) + \frac{b_{t+1}(s^t)}{R_t(s^t)} \tag{4}$$
where $z(s^t)$ is the net-of-interest government surplus and the gross risk-free interest rate $R_t(s^t)$ satisfies the household's Euler equation
$$\frac{1}{R_t(s^t)} = \sum_{s_{t+1}|s^t} \beta\, \pi_{t+1}(s^{t+1}|s^t)\, \frac{u_c(s^{t+1})}{u_c(s^t)}$$
Substituting this expression into the government’s budget constraint (4) yields:
$$b_t(s^{t-1}) = z(s^t) + \beta \sum_{s_{t+1}|s^t} \pi_{t+1}(s^{t+1}|s^t)\, \frac{u_c(s^{t+1})}{u_c(s^t)}\, b_{t+1}(s^t) \tag{5}$$
Components of 𝑧(𝑠𝑡 ) on the right side depend on 𝑠𝑡 , but the left side is required to depend on
𝑠𝑡−1 only.
This is what it means for one-period government debt to be risk-free.
Therefore, the sum on the right side of equation (5) also has to depend only on 𝑠𝑡−1 .
This requirement will give rise to measurability constraints on the Ramsey allocation to
be discussed soon.
If we replace 𝑏𝑡+1 (𝑠𝑡 ) on the right side of equation (5) by the right side of next period’s bud-
get constraint (associated with a particular realization 𝑠𝑡 ) we get
After making similar repeated substitutions for all future occurrences of government indebt-
edness, and by invoking the natural debt limit, we arrive at:
$$b_t(s^{t-1}) = \sum_{j=0}^{\infty} \sum_{s^{t+j}|s^t} \beta^j\, \pi_{t+j}(s^{t+j}|s^t)\, \frac{u_c(s^{t+j})}{u_c(s^t)}\, z(s^{t+j}) \tag{6}$$
Now let’s
• substitute the resource constraint into the net-of-interest government surplus, and
• use the household’s first-order condition 1 − 𝜏𝑡𝑛 (𝑠𝑡 ) = 𝑢ℓ (𝑠𝑡 )/𝑢𝑐 (𝑠𝑡 ) to eliminate the labor
tax rate
so that we can express the net-of-interest government surplus 𝑧(𝑠𝑡 ) as
$$z(s^t) = \left[1 - \frac{u_\ell(s^t)}{u_c(s^t)}\right]\left[c_t(s^t) + g_t(s^t)\right] - g_t(s^t) - T_t(s^t)\,. \tag{7}$$
If we substitute the appropriate versions of right side of (7) for 𝑧(𝑠𝑡+𝑗 ) into equation (6), we
obtain a sequence of implementability constraints on a Ramsey allocation in an AMSS econ-
omy.
Expression (6) at time 𝑡 = 0 and initial state 𝑠0 was also an implementability constraint on a
Ramsey allocation in a Lucas-Stokey economy:
$$b_0(s_{-1}) = \mathbb{E}_0 \sum_{j=0}^{\infty} \beta^j\, \frac{u_c(s^j)}{u_c(s^0)}\, z(s^j) \tag{8}$$
The AMSS measurability constraints require that an analogous condition hold at every date and history:
$$b_t(s^{t-1}) = \mathbb{E}_t \sum_{j=0}^{\infty} \beta^j\, \frac{u_c(s^{t+j})}{u_c(s^t)}\, z(s^{t+j}) \tag{9}$$
The expression on the right side of (9) in the Lucas-Stokey (1983) economy would equal the
present value of a continuation stream of government surpluses evaluated at what would be
competitive equilibrium Arrow-Debreu prices at date 𝑡.
In the Lucas-Stokey economy, that present value is measurable with respect to 𝑠𝑡 .
In the AMSS economy, the restriction that government debt be risk-free imposes that that
same present value must be measurable with respect to 𝑠𝑡−1 .
In a language used in the literature on incomplete markets models, it can be said that the
AMSS model requires that at each (𝑡, 𝑠𝑡 ) what would be the present value of continuation
government surpluses in the Lucas-Stokey model must belong to the marketable subspace
of the AMSS model.
After we have substituted the resource constraint into the utility function, we can express the
Ramsey problem as being to choose an allocation that solves
$$\max_{\{c_t(s^t),\, b_{t+1}(s^t)\}} \mathbb{E}_0 \sum_{t=0}^{\infty} \beta^t\, u\left(c_t(s^t),\, 1 - c_t(s^t) - g_t(s^t)\right)$$
subject to
$$\mathbb{E}_0 \sum_{j=0}^{\infty} \beta^j\, \frac{u_c(s^j)}{u_c(s^0)}\, z(s^j) \ge b_0(s_{-1}) \tag{10}$$
and
$$\mathbb{E}_t \sum_{j=0}^{\infty} \beta^j\, \frac{u_c(s^{t+j})}{u_c(s^t)}\, z(s^{t+j}) = b_t(s^{t-1}) \quad \forall\, s^t \tag{11}$$
given $b_0(s_{-1})$.
Lagrangian Formulation
Attach a Lagrange multiplier $\gamma_0(s^0)$ to constraint (10) and a multiplier $\gamma_t(s^t)$ to each date $t \ge 1$ measurability constraint (11).
Depending on how the constraints bind, the multipliers $\gamma_t(s^t)$, $t \ge 1$, can be positive or negative:
A negative multiplier 𝛾𝑡 (𝑠𝑡 ) < 0 means that if we could relax constraint (11), we would like to
increase the beginning-of-period indebtedness for that particular realization of history 𝑠𝑡 .
That would let us reduce the beginning-of-period indebtedness for some other history.
These features flow from the fact that the government cannot use state-contingent debt and
therefore cannot allocate its indebtedness efficiently across future states.
The Lagrangian for the Ramsey problem can be represented as
$$\begin{aligned}
J &= \mathbb{E}_0 \sum_{t=0}^{\infty} \beta^t \left\{ u\left(c_t(s^t), 1 - c_t(s^t) - g_t(s^t)\right) + \gamma_t(s^t)\left[\mathbb{E}_t \sum_{j=0}^{\infty} \beta^j\, u_c(s^{t+j})\, z(s^{t+j}) - u_c(s^t)\, b_t(s^{t-1})\right] \right\} \\
  &= \mathbb{E}_0 \sum_{t=0}^{\infty} \beta^t \left\{ u\left(c_t(s^t), 1 - c_t(s^t) - g_t(s^t)\right) + \Psi_t(s^t)\, u_c(s^t)\, z(s^t) - \gamma_t(s^t)\, u_c(s^t)\, b_t(s^{t-1}) \right\}
\end{aligned} \tag{12}$$
where
$$\Psi_t(s^t) = \Psi_{t-1}(s^{t-1}) + \gamma_t(s^t) \quad \text{and} \quad \Psi_0(s^0) = \gamma_0(s^0) \tag{13}$$
In (12), the second equality uses the law of iterated expectations and Abel’s summation for-
mula (also called summation by parts, see this page).
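Equation (13) says the multipliers cumulate along a sample path; in code this is just a running sum (γ_path below is a hypothetical sequence of realized multipliers):

    γ_path = [0.05, -0.01, 0.02, 0.0]
    Ψ_path = cumsum(γ_path)            # Ψ_t = Ψ_{t-1} + γ_t with Ψ_0 = γ_0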
First-order conditions with respect to 𝑐𝑡 (𝑠𝑡 ) can be expressed as
$$u_c(s^t) - u_\ell(s^t) + \Psi_t(s^t)\left\{\left[u_{cc}(s^t) - u_{c\ell}(s^t)\right] z(s^t) + u_c(s^t)\, z_c(s^t)\right\} - \gamma_t(s^t)\left[u_{cc}(s^t) - u_{c\ell}(s^t)\right] b_t(s^{t-1}) = 0 \tag{14}$$
If we substitute 𝑧(𝑠𝑡 ) from (7) and its derivative 𝑧𝑐 (𝑠𝑡 ) into first-order condition (14), we find
two differences from the corresponding condition for the optimal allocation in a Lucas-Stokey
economy with state-contingent government debt.
1. The term involving 𝑏𝑡 (𝑠𝑡−1 ) in first-order condition (14) does not appear in the corre-
sponding expression for the Lucas-Stokey economy.
2. The Lagrange multiplier Ψ𝑡 (𝑠𝑡 ) in first-order condition (14) may change over time in re-
sponse to realizations of the state, while the multiplier Φ in the Lucas-Stokey economy
is time invariant.
We need some code from an earlier lecture on optimal taxation with state-contingent debt, namely its sequential allocation implementation:
import QuantEcon.simulate
function SequentialAllocation(model::Model)
β, Π, G, Θ = model.β, model.Π, model.G, model.Θ
mc = MarkovChain(Π)
S = size(Π, 1) # Number of states
# Now find the first best allocation
cFB, nFB, ΞFB, zFB = find_first_best(model, S, 1)
if converged(res) == false
error("Could not find first best")
end
if version == 1
cFB = res.zero[1:S]
nFB = res.zero[S+1:end]
ΞFB = Uc(cFB, nFB) # Multiplier on the resource constraint
zFB = vcat(cFB, nFB, ΞFB)
return cFB, nFB, ΞFB, zFB
elseif version == 2
cFB = res.zero[1:S]
nFB = res.zero[S+1:end]
IFB = Uc(cFB, nFB) .* cFB + Un(cFB, nFB) .* nFB
xFB = \(LinearAlgebra.I - β * Π, IFB)
zFB = [vcat(cFB[s], xFB[s], xFB) for s in 1:S]
return cFB, nFB, IFB, xFB, zFB
end
end
function time0_allocation(pas::SequentialAllocation,
B_::AbstractFloat, s_0::Integer)
model = pas.model
Π, Θ, G, β = model.Π, model.Θ, model.G, model.β
Uc, Ucc, Un, Unn =
model.Uc, model.Ucc, model.Un, model.Unn
# Find root
res = nlsolve(FOC!, [0.0, pas.cFB[s_0], pas.nFB[s_0], pas.ΞFB[s_0]])
if res.f_converged == false
error("Could not find time 0 LS allocation.")
end
return (res.zero...,)
end
function simulate(pas::SequentialAllocation,
B_::AbstractFloat, s_0::Integer,
T::Integer,
sHist::Union{Vector, Nothing}=nothing)
model = pas.model
Π, β, Uc = model.Π, model.β, model.Uc
if isnothing(sHist)
sHist = QuantEcon.simulate(pas.mc, T, init=s_0)
end
cHist = zeros(T)
nHist = zeros(T)
Bhist = zeros(T)
ΤHist = zeros(T)
μHist = zeros(T)
RHist = zeros(T-1)
# time 0
μ, cHist[1], nHist[1], _ = time0_allocation(pas, B_, s_0)
ΤHist[1] = Τ(pas.model, cHist[1], nHist[1])[s_0]
Bhist[1] = B_
μHist[1] = μ
# time 1 onward
for t in 2:T
c, n, x, Ξ = time1_allocation(pas,μ)
u_c = Uc(c,n)
s = sHist[t]
ΤHist[t] = Τ(pas.model, c, n)[s]
Eu_c = dot(Π[sHist[t-1],:], u_c)
cHist[t], nHist[t], Bhist[t] = c[s], n[s], x[s] / u_c[s]
RHist[t-1] = Uc(cHist[t-1], nHist[t-1]) / (β * Eu_c)
μHist[t] = μ
end
return cHist, nHist, Bhist, ΤHist, sHist, μHist, RHist
end
function get_policies_time1(T::BellmanEquation,
i_x::Integer, x::AbstractFloat,
s::Integer, Vf::AbstractArray)
model, S = T.model, T.S
β, Θ, G, Π = model.β, model.Θ, model.G, model.Π
U, Uc, Un = model.U, model.Uc, model.Un
function get_policies_time0(T::BellmanEquation,
B_::AbstractFloat, s0::Integer, Vf::Array)
model, S = T.model, T.S
β, Θ, G, Π = model.β, model.Θ, model.G, model.Π
U, Uc, Un = model.U, model.Uc, model.Un
function objf(z, grad)
c, xprime = z[1], z[2:end]
n = c+G[s0]
Vprime = [Vf[sprime](xprime[sprime]) for sprime in 1:S]
return -(U(c, n) + β * dot(Π[s0, :], Vprime))
end
function cons(z::Vector, grad)
c, xprime = z[1], z[2:end]
n = c + G[s0]
return -Uc(c, n) * (c - B_) - Un(c, n) * n - β * dot(Π[s0, :], xprime)
end
lb = vcat(0, T.xbar[1] * ones(S))
ub = vcat(1-G[s0], T.xbar[2] * ones(S))
opt = Opt(:LN_COBYLA, length(T.zFB[s0])-1)
min_objective!(opt, objf)
equality_constraint!(opt, cons)
lower_bounds!(opt, lb)
upper_bounds!(opt, ub)
maxeval!(opt, 300)
maxtime!(opt, 10)
init = vcat(T.zFB[s0][1], T.zFB[s0][3:end])
for (i, val) in enumerate(init)
if val > ub[i]
init[i] = ub[i]
elseif val < lb[i]
init[i] = lb[i]
end
end
(minf, minx, ret) = optimize(opt, init)
To analyze the AMSS model, we find it useful to adopt a recursive formulation using tech-
niques like those in our lectures on dynamic Stackelberg models and optimal taxation with
state-contingent debt.
Here $R_t(s^t)$ is the gross risk-free rate of interest between $t$ and $t+1$ at history $s^t$ and $T_t(s^t)$ are nonnegative transfers.
Throughout this lecture, we shall set transfers to zero (for some issues about the limiting
behavior of debt, this makes a possibly important difference from AMSS [2], who restricted
transfers to be nonnegative).
In this case, the household faces a sequence of budget constraints
𝑏𝑡 (𝑠𝑡−1 ) + (1 − 𝜏𝑡 (𝑠𝑡 ))𝑛𝑡 (𝑠𝑡 ) = 𝑐𝑡 (𝑠𝑡 ) + 𝑏𝑡+1 (𝑠𝑡 )/𝑅𝑡 (𝑠𝑡 ) (16)
The household’s first-order conditions are 𝑢𝑐,𝑡 = 𝛽𝑅𝑡 𝔼 𝑡 𝑢𝑐,𝑡+1 and (1 − 𝜏𝑡 )𝑢𝑐,𝑡 = 𝑢𝑙,𝑡 .
Using these to eliminate $R_t$ and $\tau_t$ from budget constraint (16) gives
$$b_t(s^{t-1}) + \frac{u_{l,t}(s^t)}{u_{c,t}(s^t)}\, n_t(s^t) = c_t(s^t) + \frac{\beta\,(\mathbb{E}_t\, u_{c,t+1})}{u_{c,t}(s^t)}\, b_{t+1}(s^t) \tag{17}$$
or
$$u_{c,t}(s^t)\, b_t(s^{t-1}) + u_{l,t}(s^t)\, n_t(s^t) = u_{c,t}(s^t)\, c_t(s^t) + \beta\,(\mathbb{E}_t\, u_{c,t+1})\, b_{t+1}(s^t) \tag{18}$$
Now define
$$x_t \equiv \beta\, b_{t+1}(s^t)\, \mathbb{E}_t\, u_{c,t+1} = u_{c,t}(s^t)\, \frac{b_{t+1}(s^t)}{R_t(s^t)} \tag{19}$$
and represent the household's budget constraint at time $t$, history $s^t$, as
$$\frac{u_{c,t}\, x_{t-1}}{\beta\, \mathbb{E}_{t-1}\, u_{c,t}} = u_{c,t}\, c_t - u_{l,t}\, n_t + x_t \tag{20}$$
for $t \ge 1$.
Dividing (18) by $u_{c,t}(s^t)$ and using definition (19) gives
$$b_t(s^{t-1}) = \frac{u_{c,t}(s^t)\, c_t(s^t) - u_{l,t}(s^t)\, n_t(s^t) + x_t(s^t)}{u_{c,t}(s^t)} \tag{21}$$
The right side of equation (21) expresses the time $t$ value of government debt in terms of a linear combination of terms whose individual components are measurable with respect to $s^t$.
The sum of terms on the right side of equation (21) must equal 𝑏𝑡 (𝑠𝑡−1 ).
That implies that it is has to be measurable with respect to 𝑠𝑡−1 .
Equations (21) are the measurablility constraints that the AMSS model adds to the single
time 0 implementation constraint imposed in the Lucas and Stokey model.
Let Π(𝑠|𝑠− ) be a Markov transition matrix whose entries tell probabilities of moving from
state 𝑠− to state 𝑠 in one period.
Let
• 𝑉 (𝑥− , 𝑠− ) be the continuation value of a continuation Ramsey plan at 𝑥𝑡−1 = 𝑥− , 𝑠𝑡−1 =
𝑠− for 𝑡 ≥ 1.
• 𝑊 (𝑏, 𝑠) be the value of the Ramsey plan at time 0 at 𝑏0 = 𝑏 and 𝑠0 = 𝑠.
We distinguish between two types of planners:
For $t \ge 1$, the value function for a continuation Ramsey planner satisfies the Bellman equation
$$V(x_-, s_-) = \max_{\{n(s), x(s)\}} \sum_{s} \Pi(s|s_-)\left[u(n(s) - g(s), 1 - n(s)) + \beta\, V(x(s), s)\right] \tag{22}$$
subject to the following collection of implementability constraints, one for each $s \in S$:
$$\frac{u_c(s)\, x_-}{\beta \sum_{\tilde{s}} \Pi(\tilde{s}|s_-)\, u_c(\tilde{s})} = u_c(s)\left(n(s) - g(s)\right) - u_l(s)\, n(s) + x(s) \tag{23}$$
A continuation Ramsey planner at 𝑡 ≥ 1 takes (𝑥𝑡−1 , 𝑠𝑡−1 ) = (𝑥− , 𝑠− ) as given and before 𝑠 is
realized chooses (𝑛𝑡 (𝑠𝑡 ), 𝑥𝑡 (𝑠𝑡 )) = (𝑛(𝑠), 𝑥(𝑠)) for 𝑠 ∈ 𝑆.
The Ramsey planner takes (𝑏0 , 𝑠0 ) as given and chooses (𝑛0 , 𝑥0 ).
The value function $W(b_0, s_0)$ for the time $t = 0$ Ramsey planner satisfies the Bellman equation
$$W(b_0, s_0) = \max_{n_0, \{x(s_1)\}} u(n_0 - g_0, 1 - n_0) + \beta \sum_{s_1} \Pi(s_1|s_0)\, V(x(s_1), s_1) \tag{24}$$
where maximization is subject to
$$u_{c,0}\, b_0 = u_{c,0}(n_0 - g_0) - u_{l,0}\, n_0 + \beta \sum_{s_1} \Pi(s_1|s_0)\, x(s_1) \tag{25}$$
Let $\mu(s|s_-)$ be a Lagrange multiplier on the implementability constraint (23) for state $s$; an envelope condition for the $t \ge 1$ Bellman equation (22) then implies
$$V_x(x_-, s_-) = \sum_{s} \Pi(s|s_-)\, \mu(s|s_-)\, \frac{u_c(s)}{\beta \sum_{\tilde{s}} \Pi(\tilde{s}|s_-)\, u_c(\tilde{s})} \tag{27}$$
The first-order condition with respect to $x(s)$ implies $\mu(s|s_-) = \beta\, V_x(x(s), s)$, so that (27) becomes
$$V_x(x_-, s_-) = \sum_{s}\left(\Pi(s|s_-)\, \frac{u_c(s)}{\sum_{\tilde{s}} \Pi(\tilde{s}|s_-)\, u_c(\tilde{s})}\right) V_x(x(s), s) \tag{28}$$
Equation (28) states that $V_x(x, s)$ is a martingale under the twisted transition density
$$\check{\Pi}(s|s_-) \equiv \Pi(s|s_-)\, \frac{u_c(s)}{\sum_{\tilde{s}} \Pi(\tilde{s}|s_-)\, u_c(\tilde{s})}$$
̌
Exercise: Please verify that Π(𝑠|𝑠 − ) is a valid Markov transition density, i.e., that its ele-
ments are all nonnegative and that for each 𝑠− , the sum over 𝑠 equals unity.
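A quick numerical version of this exercise, using a hypothetical transition matrix and marginal-utility vector:

    Π  = [0.8 0.2; 0.3 0.7]
    uc = [1.5, 0.9]
    Π_check = Π .* uc' ./ (Π * uc)    # Π̌(s|s₋) = Π(s|s₋)u_c(s) / Σ_s̃ Π(s̃|s₋)u_c(s̃)
    @assert all(Π_check .≥ 0)                               # nonnegative entries
    @assert all(isapprox.(sum(Π_check, dims = 2), 1.0))     # each row sums to one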
Along a Ramsey plan, the state variable 𝑥𝑡 = 𝑥𝑡 (𝑠𝑡 , 𝑏0 ) becomes a function of the history 𝑠𝑡
and initial government debt 𝑏0 .
In Lucas-Stokey model, we found that
• a counterpart to 𝑉𝑥 (𝑥, 𝑠) is time invariant and equal to the Lagrange multiplier on the
Lucas-Stokey implementability constraint
• time invariance of 𝑉𝑥 (𝑥, 𝑠) is the source of a key feature of the Lucas-Stokey model,
namely, state variable degeneracy (i.e., 𝑥𝑡 is an exact function of 𝑠𝑡 )
That 𝑉𝑥 (𝑥, 𝑠) varies over time according to a twisted martingale means that there is no state-
variable degeneracy in the AMSS model.
In the AMSS model, both 𝑥 and 𝑠 are needed to describe the state.
This property of the AMSS model transmits a twisted martingale component to consumption,
employment, and the tax rate.
When $\mu(s|s_-) = \beta V_x(x(s), s)$ converges to zero, in the limit $u_l(s) = 1 = u_c(s)$, so that $\tau(x(s), s) = 0$.
Thus, in the limit, if 𝑔𝑡 is perpetually random, the government accumulates sufficient assets
to finance all expenditures from earnings on those assets, returning any excess revenues to the
household as nonnegative lump sum transfers.
63.4.7 Code
zFB::Vector{Vector{TR}}
end
function incomplete_allocation(PP::SequentialAllocation,
                               μ_::AbstractFloat,
                               s_::Integer)
    c, n, x, V = time1_value(PP, μ_)
    Π = PP.model.Π  # transition matrix, needed for the conditional expectations below
    return c, n, dot(Π[s_, :], x), dot(Π[s_, :], V)
end
cf = Array{Function}(undef, S, S)
nf = Array{Function}(undef, S, S)
xprimef = Array{Function}(undef, S, S)
Vf = Vector{Function}(undef, S)
xgrid = Array{TR}(undef, S, length(μgrid))
for s_ in 1:S
c = Array{TR}(undef, length(μgrid), S)
n = Array{TR}(undef, length(μgrid), S)
x = Array{TR}(undef, length(μgrid))
V = Array{TR}(undef, length(μgrid))
end
splV = Spline1D(x[end:-1:1], V[end:-1:1], k=3)
Vf[s_] = y -> splV(y)
# Vf[s_] = LinInterp(x[end:-1:1], V[end:-1:1])
end
# Create xgrid
xbar = [maximum(minimum(xgrid)), minimum(maximum(xgrid))]
xgrid = range(xbar[1], xbar[2], length = length(μgrid))
diff = 0.0
for s=1:S
diff = max(diff, maximum(abs, (Vf[s].(xgrid) - Vfnew[s].
↪ (xgrid)) ./
Vf[s].(xgrid)))
end
println("diff = $diff")
Vf = copy(Vfnew)
end
function fit_policy_function(T::BellmanEquation_Recursive,
                             PF::Function,
                             xgrid::AbstractVector{TF}) where {TF <: AbstractFloat}
S = T.S
# preallocation
PFvec = Array{TF}(undef, 4S + 1, length(xgrid))
cf = Array{Function}(undef, S, S)
nf = Array{Function}(undef, S, S)
xprimef = Array{Function}(undef, S, S)
TTf = Array{Function}(undef, S, S)
Vf = Vector{Function}(undef, S)
    # fit policy functions
for s_ in 1:S
for (i_x, x) in enumerate(xgrid)
PFvec[:, i_x] = PF(i_x, x, s_)
end
splV = Spline1D(xgrid, PFvec[1,:], k=3)
Vf[s_] = y -> splV(y)
# Vf[s_] = LinInterp(xgrid, PFvec[1, :])
for sprime=1:S
splc = Spline1D(xgrid, PFvec[1 + sprime, :], k=3)
spln = Spline1D(xgrid, PFvec[1 + S + sprime, :], k=3)
splxprime = Spline1D(xgrid, PFvec[1 + 2S + sprime, :], k=3)
splTT = Spline1D(xgrid, PFvec[1 + 3S + sprime, :], k=3)
cf[s_, sprime] = y -> splc(y)
nf[s_, sprime] = y -> spln(y)
xprimef[s_, sprime] = y -> splxprime(y)
TTf[s_, sprime] = y -> splTT(y)
end
end
policies = (cf, nf, xprimef, TTf)
return Vf, policies
end
function Tau(pab::RecursiveAllocation,
c::AbstractArray,
n::AbstractArray)
model = pab.model
Uc, Un = model.Uc(c, n), model.Un(c, n)
return 1. .+ Un ./ (model.Θ .* Uc)
end
function simulate(pab::RecursiveAllocation,
B_::TF, s_0::Integer, T::Integer,
sHist::Vector=simulate(pab.mc, T, init=s_0)) where {TF <:
AbstractFloat}
model, mc, Vf, S = pab.model, pab.mc, pab.Vf, pab.S
Π, Uc = model.Π, model.Uc
cf, nf, xprimef, TTf = pab.policies
cHist = Array{TF}(undef, T)
nHist = Array{TF}(undef, T)
Bhist = Array{TF}(undef, T)
xHist = Array{TF}(undef, T)
TauHist = Array{TF}(undef, T)
THist = Array{TF}(undef, T)
μHist = Array{TF}(undef, T)
#time0
cHist[1], nHist[1], xHist[1], THist[1] = time0_allocation(pab, B_, s_0)
TauHist[1] = Tau(pab, cHist[1], nHist[1])[s_0]
Bhist[1] = B_
μHist[1] = Vf[s_0](xHist[1])
#time 1 onward
for t in 2:T
s_, x, s = sHist[t-1], xHist[t-1], sHist[t]
c = Array{TF}(undef, S)
n = Array{TF}(undef, S)
xprime = Array{TF}(undef, S)
TT = Array{TF}(undef, S)
for sprime=1:S
c[sprime], n[sprime], xprime[sprime], TT[sprime] =
cf[s_, sprime](x), nf[s_, sprime](x),
xprimef[s_, sprime](x), TTf[s_, sprime](x)
end
        μHist[t] = Vf[s](xprime[s])
        # (the rest of the simulation loop and the function's return statement
        #  were lost in extraction)
function BellmanEquation_Recursive(model::Model{TF},
                                   xgrid::AbstractVector{TF},
                                   policies0::Array) where {TF <: AbstractFloat}
    # (the construction of xbar, time_0 and the loops building z0 were lost
    #  in extraction)
            z0[i_x, s] = vcat(cs, ns, xprimes, zeros(S))
        end
    end
    cFB, nFB, IFB, xFB, zFB = find_first_best(model, S, 2)
    return BellmanEquation_Recursive(model, S, xbar, time_0, z0, cFB, nFB,
                                     xFB, zFB)
end
function get_policies_time1(T::BellmanEquation_Recursive,
i_x::Integer,
x::Real,
s_::Integer,
Vf::AbstractArray{Function},
xbar::AbstractVector)
model, S = T.model, T.S
β, Θ, G, Π = model.β, model.Θ, model.G, model.Π
    U, Uc, Un = model.U, model.Uc, model.Un
    # (definitions of S_possible, sprimei_possible, the objective objf and the
    #  constraints cons / cons_no_trans were lost in extraction)
    if model.transfers == true
        lb = vcat(zeros(S_possible), ones(S_possible) * xbar[1],
                  zeros(S_possible))
if model.n_less_than_one == true
ub = vcat(ones(S_possible) - G[sprimei_possible],
ones(S_possible) * xbar[2], ones(S_possible))
else
            ub = vcat(100 * ones(S_possible),
                      ones(S_possible) * xbar[2],
                      100 * ones(S_possible))
end
init = vcat(T.z0[i_x, s_][sprimei_possible],
T.z0[i_x, s_][2S .+ sprimei_possible],
T.z0[i_x, s_][3S .+ sprimei_possible])
opt = Opt(:LN_COBYLA, 3S_possible)
equality_constraint!(opt, cons, zeros(S_possible))
else
lb = vcat(zeros(S_possible), ones(S_possible)*xbar[1])
if model.n_less_than_one == true
            ub = vcat(ones(S_possible) - G[sprimei_possible],
                      ones(S_possible) * xbar[2])
else
ub = vcat(ones(S_possible), ones(S_possible) * xbar[2])
end
init = vcat(T.z0[i_x, s_][sprimei_possible],
T.z0[i_x, s_][2S .+ sprimei_possible])
opt = Opt(:LN_COBYLA, 2S_possible)
equality_constraint!(opt, cons_no_trans, zeros(S_possible))
end
init[init .> ub] = ub[init .> ub]
init[init .< lb] = lb[init .< lb]
min_objective!(opt, objf)
lower_bounds!(opt, lb)
upper_bounds!(opt, ub)
maxeval!(opt, 10000000)
maxtime!(opt, 10)
ftol_rel!(opt, 1e-8)
    ftol_abs!(opt, 1e-8)
    # (presumably (minf, minx, ret) = optimize(opt, init) and the unpacking of
    #  the allocation from minx were lost in extraction)
    if model.transfers == true
        T.z0[i_x, s_][3S .+ sprimei_possible] = minx[2S_possible + 1:3S_possible]
else
T.z0[i_x, s_][3S .+ sprimei_possible] = zeros(S)
    end
    # (the return statement and the closing `end` of get_policies_time1 were
    #  lost in extraction)
function get_policies_time0(T::BellmanEquation_Recursive,
B_::Real,
s0::Integer,
Vf::AbstractArray{Function},
xbar::AbstractVector)
model = T.model
β, Θ, G = model.β, model.Θ, model.G
U, Uc, Un = model.U, model.Uc, model.Un
function cons(z,grad)
c, xprime, TT = z[1], z[2], z[3]
n = (c + G[s0]) / Θ[s0]
return -Uc(c, n) * (c - B_ - TT) - Un(c, n) * n - β * xprime
end
cons_no_trans(z, grad) = cons(vcat(z, 0), grad)
if model.transfers == true
lb = [0.0, xbar[1], 0.0]
if model.n_less_than_one == true
ub = [1 - G[s0], xbar[2], 100]
else
ub = [100.0, xbar[2], 100.0]
end
        init = vcat(T.zFB[s0][1], T.zFB[s0][3], T.zFB[s0][4])
        # hard-coded warm start immediately overrides the first-best initializer
        init = [0.95124922, -1.15926816, 0.0]
opt = Opt(:LN_COBYLA, 3)
equality_constraint!(opt, cons)
else
lb = [0.0, xbar[1]]
if model.n_less_than_one == true
ub = [1-G[s0], xbar[2]]
else
ub = [100, xbar[2]]
end
        init = vcat(T.zFB[s0][1], T.zFB[s0][3])
        init = [0.95124922, -1.15926816]  # hard-coded warm start overrides the line above
opt = Opt(:LN_COBYLA, 2)
equality_constraint!(opt, cons_no_trans)
end
init[init .> ub] = ub[init .> ub]
init[init .< lb] = lb[init .< lb]
min_objective!(opt, objf)
lower_bounds!(opt, lb)
upper_bounds!(opt, ub)
maxeval!(opt, 100000000)
maxtime!(opt, 30)
    # (the ftol settings and the call (minf, minx, ret) = optimize(opt, init)
    #  were lost in extraction)
if model.transfers == true
return -minf, minx[1], minx[1]+G[s0], minx[2], minx[3]
else
return -minf, minx[1], minx[1]+G[s0], minx[2], 0
end
end
63.5 Examples
In our lecture on optimal taxation with state contingent debt we studied how the government
manages uncertainty in a simple setting.
As in that lecture, we assume the one-period utility function
$$u(c, n) = \frac{c^{1-\sigma}}{1 - \sigma} - \frac{n^{1+\gamma}}{1 + \gamma}$$
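A direct transcription of these preferences into Julia (a sketch; the parameter values σ = 2, γ = 2 are illustrative, not taken from this section):

σ, γ = 2.0, 2.0
u(c, n)  = c^(1 - σ) / (1 - σ) - n^(1 + γ) / (1 + γ)
uc(c, n) = c^(-σ)  # ∂u/∂c, marginal utility of consumption
un(c, n) = -n^γ    # ∂u/∂n, marginal disutility of labor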
We consider the same government expenditure process studied in the lecture on optimal taxa-
tion with state contingent debt.
Government expenditures are known for sure in all periods except one
• For 𝑡 < 3 or 𝑡 > 3 we assume that 𝑔𝑡 = 𝑔𝑙 = 0.1.
• At 𝑡 = 3 a war occurs with probability 0.5.
– If there is war, 𝑔3 = 𝑔ℎ = 0.2.
– If there is no war 𝑔3 = 𝑔𝑙 = 0.1.
A useful trick is to define components of the state vector as the following six (t, g) pairs: (0, g_l), (1, g_l), (2, g_l), (3, g_l), (3, g_h), and (t ≥ 4, g_l).
$$P = \begin{pmatrix}
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.5 & 0.5 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}$$

$$g = \begin{pmatrix} 0.1 \\ 0.1 \\ 0.1 \\ 0.1 \\ 0.2 \\ 0.1 \end{pmatrix}$$
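Written out directly in Julia (transcribed from the displayed matrices; the lecture's own code stores these inside its model object, so this standalone version is only illustrative):

Π = [0 1 0 0.0 0.0 0;
     0 0 1 0.0 0.0 0;
     0 0 0 0.5 0.5 0;
     0 0 0 0.0 0.0 1;
     0 0 0 0.0 0.0 1;
     0 0 0 0.0 0.0 1]

g = [0.1, 0.1, 0.1, 0.1, 0.2, 0.1]

all(sum(Π, dims = 2) .≈ 1)   # rows sum to one, so Π is a valid transition matrix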
The following figure plots the Ramsey plan under both complete and incomplete markets for
both possible realizations of the state at time 𝑡 = 3.
Optimal policies when the government has access to state contingent debt are represented by
black lines, while the optimal policies when there is only a risk free bond are in red.
Paths with circles are histories in which there is peace, while those with triangles denote war.
sHist_h = [1, 2, 3, 4, 6, 6, 6]
sHist_l = [1, 2, 3, 5, 6, 6, 6]
using Plots
gr(fmt=:png);
titles = hcat("Consumption", "Labor Supply", "Government Debt",
"Tax Rate", "Government Spending", "Output")
sim_seq_l_plot = hcat(sim_seq_l[1:3]..., sim_seq_l[4],
time_example.G[sHist_l],
time_example.Θ[sHist_l] .* sim_seq_l[2])
sim_bel_l_plot = hcat(sim_bel_l[1:3]..., sim_bel_l[5],
time_example.G[sHist_l],
time_example.Θ[sHist_l] .* sim_bel_l[2])
sim_seq_h_plot = hcat(sim_seq_h[1:3]..., sim_seq_h[4],
time_example.G[sHist_h],
time_example.Θ[sHist_h] .* sim_seq_h[2])
sim_bel_h_plot = hcat(sim_bel_h[1:3]..., sim_bel_h[5],
time_example.G[sHist_h],
time_example.Θ[sHist_h] .* sim_bel_h[2])
p = plot(size = (920, 750), layout =(3, 2),
xaxis=(0:6), grid=false, titlefont=Plots.font("sans-serif", 10))
plot!(p, title = titles)
for i=1:6
    plot!(p[i], 0:6, sim_seq_l_plot[:, i], marker=:circle, color=:black, lab="")
end
diff = 0.01954181139136142
diff = 0.020296606343314445
diff = 0.01825290333882874
diff = 0.01732330130265356
diff = 0.002438290769982289
diff = 0.0017049947758874273
diff = 0.001491200081454862
diff = 0.0008415716013777662
diff = 0.0006551802043226648
diff = 0.0005101110034210979
diff = 0.00045899123671075637
diff = 0.0004130378473633363
diff = 0.00037165554806329687
diff = 0.00033441325852100494
diff = 0.0003008983054583036
diff = 0.0002707536359176724
diff = 0.00024362702609734803
diff = 0.0002192208026612293
diff = 0.00019723701023064875
diff = 0.00017750553015148367
diff = 0.00015972217622243714
diff = 0.00014373496579809245
diff = 0.00012934400797986716
diff = 0.00011639475826865351
diff = 0.00010474335075915266
diff = 9.425940404825291e-5
Out[6]: [Figure: six panels — consumption, labor supply, government debt, tax rate, government spending, output — with complete-markets policies in black and incomplete-markets policies in red; circles mark peace histories, triangles war]
How a Ramsey planner responds to war depends on the structure of the asset market.
If it is able to trade state-contingent debt, then at time 𝑡 = 2
• the government purchases an Arrow security that pays off when 𝑔3 = 𝑔ℎ
• the government sells an Arrow security that pays off when 𝑔3 = 𝑔𝑙
• These purchases are designed in such a way that regardless of whether or not there is a
war at 𝑡 = 3, the government will begin period 𝑡 = 4 with the same government debt.
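As a back-of-the-envelope sketch (not taken from the lecture), write the period-3 government budget constraint as

$$b_3(s_3) + g_3(s_3) = \tau_3(s_3) n_3(s_3) + q_4 b_4$$

where b_3(s_3) is the state-contingent payoff arranged at t = 2 and q_4 is the price of debt maturing at t = 4. If tax revenues are smoothed across the war and peace states, then delivering the same continuation debt b_4 in both states requires the Arrow positions to absorb the whole spending difference, b_3(g_h) − b_3(g_l) = −(g_h − g_l) = −0.1: the government arranges to owe less (equivalently, to receive a payoff) precisely in the war state.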
This pattern facilitates smoothing tax rates across states.
The government without state contingent debt cannot do this.
Instead, it must enter time 𝑡 = 3 with the same level of debt falling due whether there is
peace or war at 𝑡 = 3.
It responds to this constraint by smoothing tax rates across time.
To finance a war it raises taxes and issues more debt.
To service the additional debt burden, it raises taxes in all future periods.
The absence of state contingent debt leads to an important difference in the optimal tax pol-
icy.
When the Ramsey planner has access to state contingent debt, the optimal tax policy is his-
tory independent
• the tax rate is a function of the current level of government spending only, given the
Lagrange multiplier on the implementability constraint.
Without state contingent debt, the optimal tax rate is history dependent.
• A war at time 𝑡 = 3 causes a permanent increase in the tax rate.
History dependence occurs more dramatically in a case in which the government perpetually
faces the prospect of war.
This case was studied in the final example of the lecture on optimal taxation with state-
contingent debt.
There, each period the government faces a constant probability, 0.5, of war.
In addition, this example features log preferences,

$$u(c, n) = \log c + \psi \log(1 - n),$$

implemented (in part) by the following fragment
Ucc(c,n) = -c.^(-2.0)
Un(c,n) = -ψ ./ (1.0 .- n)
Unn(c,n) = -ψ ./ (1.0 .- n).^2.0
n_less_than_one = true
return Model(β, Π, G, Θ, transfers,
U, Uc, Ucc, Un, Unn, n_less_than_one)
end
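The head of this code fragment was evidently lost in extraction. A plausible sketch of the missing lines, consistent with the log preferences above (ψ and the remaining parameters would be set earlier in the constructor):

# hypothetical reconstruction of the omitted head, matching the derivatives
# Ucc, Un, Unn shown above
U(c, n)  = log.(c) .+ ψ .* log.(1.0 .- n)
Uc(c, n) = 1.0 ./ c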
With these preferences, Ramsey tax rates will vary even in the Lucas-Stokey model with
state-contingent debt.
The figure below plots optimal tax policies for both the economy with state contingent debt
(circles) and the economy with only a risk-free bond (triangles)
T = 20
sHist = [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1]
#simulate
sim_seq = simulate(log_sequential, 0.5, 1, T, sHist)
sim_bel = simulate(log_bellman, 0.5, 1, T, sHist)
sim_seq_plot = hcat(sim_seq[1:3]..., sim_seq[4],
                    log_example.G[sHist], log_example.Θ[sHist] .* sim_seq[2])
sim_bel_plot = hcat(sim_bel[1:3]..., sim_bel[5],
                    log_example.G[sHist], log_example.Θ[sHist] .* sim_bel[2])
#plot policies
p = plot(size = (920, 750), layout = grid(3, 2),
xaxis=(0:T), grid=false, titlefont=Plots.font("sans-serif", 10))
labels = fill(("", ""), 6)
labels[3] = ("Complete Market", "Incomplete Market")
plot!(p, title = titles)
for i = vcat(collect(1:4), 6)
    plot!(p[i], sim_seq_plot[:, i], marker=:circle, color=:black,
          lab=labels[i][1], legend=:bottomright)
    # (a companion plot! of sim_bel_plot, presumably with triangle markers,
    #  was lost in extraction)
end
plot!(p[5], sim_seq_plot[:, 5], marker=:circle, color=:blue, lab="")
diff = 0.0007972378476372139
diff = 0.0006423560333504441
diff = 0.0005517441622530832
diff = 0.00048553930013351857
diff = 0.0004226590836939342
diff = 0.00037550672316976404
diff = 0.0003294032122270672
diff = 0.00029337232321718974
diff = 0.00025856795048240623
diff = 0.00023042624865279873
diff = 0.0002031214087191915
diff = 0.00018115282833643646
diff = 0.00016034374751970243
diff = 0.00014294960573402432
diff = 0.0001267581715890033
diff = 0.00011295205489914281
diff = 0.00010030878977062024
diff = 8.934103186095062e-5
Out[8]: [Figure: optimal tax policies for the economy with state-contingent debt (circles) and the economy with only a risk-free bond (triangles)]
When the government experiences a prolonged period of peace, it is able to reduce govern-
ment debt and set permanently lower tax rates.
However, the government finances a long war by borrowing and raising taxes.
This results in a drift away from the policies with state-contingent debt, a drift that depends on the history of shocks.
This is even more evident in the following figure that plots the evolution of the two policies
over 200 periods
# (the 200-period simulation and plotting code was lost in extraction; it
#  presumably mirrors the block above with T = 200)
p
Out[9]: [Figure: evolution of the two policies over 200 periods]
Footnotes
[1] In an allocation that solves the Ramsey problem and that levies distorting taxes on labor,
why would the government ever want to hand revenues back to the private sector? It would
not in an economy with state-contingent debt, since any such allocation could be improved by
lowering distortionary taxes rather than handing out lump-sum transfers. But without state-
contingent debt there can be circumstances when a government would like to make lump-sum
transfers to the private sector.
[2] From the first-order conditions for the Ramsey problem, there exists another realization $\tilde{s}^t$ with the same history up until the previous period, i.e., $\tilde{s}^{t-1} = s^{t-1}$, but where the multiplier on constraint (11) takes a positive value, so $\gamma_t(\tilde{s}^t) > 0$.
Bibliography
[1] S Rao Aiyagari. Uninsured Idiosyncratic Risk and Aggregate Saving. The Quarterly
Journal of Economics, 109(3):659–684, 1994.
[2] S. Rao Aiyagari, Albert Marcet, Thomas J. Sargent, and Juha Seppala. Optimal Tax-
ation without State-Contingent Debt. Journal of Political Economy, 110(6):1220–1254,
December 2002.
[5] Cristina Arellano. Default risk and income fluctuations in emerging economies. The
American Economic Review, pages 690–712, 2008.
[6] Athanasios Papoulis and S Unnikrishna Pillai. Probability, random variables, and stochastic processes. McGraw-Hill, 1991.
[7] Ravi Bansal and Amir Yaron. Risks for the Long Run: A Potential Resolution of Asset
Pricing Puzzles. Journal of Finance, 59(4):1481–1509, 08 2004.
[8] Robert J Barro. On the Determination of the Public Debt. Journal of Political Econ-
omy, 87(5):940–971, 1979.
[9] Jess Benhabib, Alberto Bisin, and Shenghao Zhu. The wealth distribution in Bewley economies with capital income risk. Journal of Economic Theory, 159:489–515, 2015.
[11] Dimitri Bertsekas. Dynamic Programming and Stochastic Control. Academic Press, New York, 1975.
[12] Truman Bewley. The permanent income hypothesis: A theoretical formulation. Journal
of Economic Theory, 16(2):252–292, 1977.
[15] Christopher D Carroll. A Theory of the Consumption Function, with and without Liq-
uidity Constraints. Journal of Economic Perspectives, 15(3):23–45, 2001.
[16] Christopher D Carroll. The method of endogenous gridpoints for solving dynamic
stochastic optimization problems. Economics Letters, 91(3):312–320, 2006.
[17] Wilbur John Coleman. Solving the Stochastic Growth Model by Policy-Function Itera-
tion. Journal of Business & Economic Statistics, 8(1):27–29, 1990.
[18] J. D. Cryer and K-S. Chan. Time Series Analysis. Springer, 2nd edition, 2008.
[19] Steven J Davis, R Jason Faberman, and John Haltiwanger. The flow approach to labor
markets: New data sources, micro-macro links and the recent downturn. Journal of
Economic Perspectives, 2006.
[20] Angus Deaton. Saving and Liquidity Constraints. Econometrica, 59(5):1221–1248, 1991.
[21] Angus Deaton and Christina Paxson. Intertemporal Choice and Inequality. Journal of
Political Economy, 102(3):437–467, 1994.
[22] Wouter J Den Haan. Comparison of solutions to the incomplete markets model with
aggregate uncertainty. Journal of Economic Dynamics and Control, 34(1):4–27, 2010.
[23] Raymond J Deneckere and Kenneth L Judd. Cyclical and chaotic behavior in a dy-
namic equilibrium model, with implications for fiscal policy. Cycles and chaos in eco-
nomic equilibrium, pages 308–329, 1992.
[24] Ulrich Doraszelski and Mark Satterthwaite. Computable Markov-perfect industry dynamics. The RAND Journal of Economics, 41(2):215–243, 2010.
[25] Y E Du, Ehud Lehrer, and A D Y Pauzner. Competitive economy as a ranking device
over networks. submitted, 2013.
[26] R M Dudley. Real Analysis and Probability. Cambridge Studies in Advanced Mathe-
matics. Cambridge University Press, 2002.
[27] Robert F Engle and Clive W J Granger. Co-integration and Error Correction: Repre-
sentation, Estimation, and Testing. Econometrica, 55(2):251–276, 1987.
[28] Richard Ericson and Ariel Pakes. Markov-perfect industry dynamics: A framework for
empirical work. The Review of Economic Studies, 62(1):53–82, 1995.
[31] Jesús Fernández-Villaverde and Charles I Jones. Estimating and simulating a SIRD model of COVID-19 for many countries, states, and cities. Working Paper 27128, National Bureau of Economic Research, May 2020.
[33] Milton Friedman and Rose D Friedman. Two Lucky People. University of Chicago
Press, 1998.
[34] Albert Gallatin. Report on the Finances, November 1807. In Reports of the Secretary of the Treasury of the United States, Vol 1. Government Printing Office, Washington, DC, 1837.
[35] Olle Häggström. Finite Markov chains and algorithmic applications, volume 52. Cam-
bridge University Press, 2002.
[36] Robert E Hall. Stochastic Implications of the Life Cycle-Permanent Income Hypothesis:
Theory and Evidence. Journal of Political Economy, 86(6):971–987, 1978.
[37] Robert E Hall and Frederic S Mishkin. The Sensitivity of Consumption to Transitory
Income: Estimates from Panel Data on Households. National Bureau of Economic Re-
search Working Paper Series, No. 505, 1982.
[38] James D Hamilton. What’s real about the business cycle? Federal Reserve Bank of St.
Louis Review, (July-August):435–452, 2005.
[40] L P Hansen and T J Sargent. Recursive Models of Dynamic Linear Economies. The
Gorman Lectures in Economics. Princeton University Press, 2013.
[41] Lars Peter Hansen. Beliefs, Doubts and Learning: Valuing Macroeconomic Risk. Ameri-
can Economic Review, 97(2):1–30, May 2007.
[42] Lars Peter Hansen, John C. Heaton, and Nan Li. Consumption Strikes Back? Measur-
ing Long-Run Risk. Journal of Political Economy, 116(2):260–302, 04 2008.
[43] Lars Peter Hansen and Scott F Richard. The Role of Conditioning Information in Deducing Testable Restrictions Implied by Dynamic Asset Pricing Models. Econometrica, 55(3):587–613, May 1987.
[44] Lars Peter Hansen and Thomas J Sargent. Formulating and estimating dynamic linear
rational expectations models. Journal of Economic Dynamics and control, 2:7–46, 1980.
[45] Lars Peter Hansen and Thomas J Sargent. Wanting robustness in macroeconomics. Manuscript, Department of Economics, Stanford University, 2000.
[46] Lars Peter Hansen and Thomas J. Sargent. Risk, Uncertainty, and Value. Princeton
University Press, Princeton, New Jersey, 2017.
[47] Lars Peter Hansen and Jose A. Scheinkman. Long-term risk: An operator approach.
Econometrica, 77(1):177–234, 01 2009.
[48] J. Michael Harrison and David M. Kreps. Speculative investor behavior in a stock mar-
ket with heterogeneous expectations. The Quarterly Journal of Economics, 92(2):323–
336, 1978.
[49] J. Michael Harrison and David M. Kreps. Martingales and arbitrage in multiperiod
securities markets. Journal of Economic Theory, 20(3):381–408, June 1979.
[50] John Heaton and Deborah J Lucas. Evaluating the effects of incomplete markets on risk
sharing and asset pricing. Journal of Political Economy, pages 443–487, 1996.
[51] Jane M Heffernan, Robert J Smith, and Lindi M Wahl. Perspectives on the basic repro-
ductive ratio. Journal of the Royal Society Interface, 2(4):281–293, 2005.
[52] Elhanan Helpman and Paul Krugman. Market structure and international trade. MIT
Press Cambridge, 1985.
[54] Hugo A Hopenhayn and Edward C Prescott. Stochastic Monotonicity and Stationary
Distributions for Dynamic Economies. Econometrica, 60(6):1387–1406, 1992.
[55] Hugo A Hopenhayn and Richard Rogerson. Job Turnover and Policy Evaluation: A
General Equilibrium Analysis. Journal of Political Economy, 101(5):915–938, 1993.
[57] K Jänich. Linear Algebra. Springer Undergraduate Texts in Mathematics and Technol-
ogy. Springer, 1994.
[58] John Y. Campbell and Robert J. Shiller. The Dividend-Price Ratio and Expectations of Future Dividends and Discount Factors. Review of Financial Studies, 1(3):195–228, 1988.
[59] K L Judd. Cournot versus bertrand: A dynamic resolution. Technical report, Hoover
Institution, Stanford University, 1990.
[60] Kenneth L Judd. On the performance of patents. Econometrica, pages 567–585, 1985.
[61] Takashi Kamihigashi. Elementary results on solutions to the Bellman equation of dynamic programming: existence, uniqueness, and convergence. Technical report, Kobe University, 2012.
[62] David M. Kreps. Notes on the Theory of Choice. Westview Press, Boulder, Colorado,
1988.
[64] A Lasota and M C Mackey. Chaos, Fractals, and Noise: Stochastic Aspects of Dynamics. Applied Mathematical Sciences. Springer-Verlag, 1994.
[65] Martin Lettau and Sydney Ludvigson. Consumption, Aggregate Wealth, and Expected
Stock Returns. Journal of Finance, 56(3):815–849, 06 2001.
[66] Martin Lettau and Sydney C. Ludvigson. Understanding Trend and Cycle in Asset
Values: Reevaluating the Wealth Effect on Consumption. American Economic Review,
94(1):276–299, March 2004.
[67] David Levhari and Leonard J Mirman. The great fish war: an example using a dynamic
cournot-nash solution. The Bell Journal of Economics, pages 322–334, 1980.
[68] L Ljungqvist and T J Sargent. Recursive Macroeconomic Theory. MIT Press, 4 edition,
2018.
[69] Robert E Lucas, Jr. Asset prices in an exchange economy. Econometrica: Journal of the
Econometric Society, 46(6):1429–1445, 1978.
[70] Robert E Lucas, Jr. Macroeconomic Priorities. American Economic Review, 93(1):1–14,
March 2003.
[71] Robert E Lucas, Jr. and Edward C Prescott. Investment under uncertainty. Economet-
rica: Journal of the Econometric Society, pages 659–681, 1971.
[72] Robert E Lucas, Jr. and Nancy L Stokey. Optimal Fiscal and Monetary Policy in an Economy without Capital. Journal of Monetary Economics, 12(3):55–93, 1983.
[73] Albert Marcet and Thomas J Sargent. Convergence of Least-Squares Learning in En-
vironments with Hidden State Variables and Private Information. Journal of Political
Economy, 97(6):1306–1322, 1989.
[74] V Filipe Martins-da Rocha and Yiannis Vailakis. Existence and Uniqueness of a Fixed
Point for Local Contractions. Econometrica, 78(3):1127–1141, 2010.
[76] J J McCall. Economics of Information and Job Search. The Quarterly Journal of Eco-
nomics, 84(1):113–126, 1970.
[77] S P Meyn and R L Tweedie. Markov Chains and Stochastic Stability. Cambridge Uni-
versity Press, 2009.
[78] Mario J Miranda and P L Fackler. Applied Computational Economics and Finance.
Cambridge: MIT Press, 2002.
[79] F. Modigliani and R. Brumberg. Utility analysis and the consumption function: An interpretation of cross-section data. In K. K. Kurihara, editor, Post-Keynesian Economics. 1954.
[80] John F Muth. Optimal properties of exponentially weighted forecasts. Journal of the
american statistical association, 55(290):299–306, 1960.
[81] Derek Neal. The Complexity of Job Mobility among Young Men. Journal of Labor
Economics, 17(2):237–261, 1999.
[83] Jenő Pál and John Stachurski. Fitted value function iteration with probability one con-
tractions. Journal of Economic Dynamics and Control, 37(1):251–264, 2013.
[85] Jesse Perla. A model of product awareness and industry life cycles. Working paper,
University of British Columbia, 2019.
[86] Martin L Puterman. Markov decision processes: discrete stochastic dynamic program-
ming. John Wiley & Sons, 2005.
[87] Guillaume Rabault. When do borrowing constraints bind? Some new results on the
income fluctuation problem. Journal of Economic Dynamics and Control, 26(2):217–
245, 2002.
[94] Thomas J Sargent. Macroeconomic Theory. Academic Press, New York, 2nd edition,
1987.
[95] Jack Schechtman and Vera L S Escudero. Some results on an income fluctuation prob-
lem. Journal of Economic Theory, 16(2):151–166, 1977.
[96] Jose A. Scheinkman. Speculation, Trading, and Bubbles. Columbia University Press,
New York, 2014.
[98] A N Shiryaev. Probability. Graduate Texts in Mathematics. Springer, 2nd edition, 1995.
[99] Alexander A. Stepanov and Daniel E. Rose. From mathematics to generic programming.
Addison-Wesley, 2014.
[101] Kjetil Storesletten, Christopher I Telmer, and Amir Yaron. Consumption and risk shar-
ing over the life cycle. Journal of Monetary Economics, 51(3):609–633, 2004.
[103] Thomas D Tallarini. Risk-sensitive real business cycles. Journal of Monetary Eco-
nomics, 45(3):507–532, June 2000.
[104] George Tauchen. Finite state Markov-chain approximations to univariate and vector autoregressions. Economics Letters, 20(2):177–181, 1986.
[105] Ngo Van Long. Dynamic games in the economics of natural resources: a survey. Dy-
namic Games and Applications, 1(1):115–148, 2011.
[106] Abraham Wald. Sequential Analysis. John Wiley and Sons, New York, 1947.
[107] Peter Whittle. Prediction and regulation by linear least-square methods. English Univ.
Press, 1963.
[108] Peter Whittle. Prediction and Regulation by Linear Least Squares Methods. University
of Minnesota Press, Minneapolis, Minnesota, 2nd edition, 1983.
[109] G Alastair Young and Richard L Smith. Essentials of statistical inference. Cambridge
University Press, 2005.