Lecture 3 and 4
Dynamic Optimization
We introduce the basic ideas of dynamic optimization and dynamic programming.
Good sources are Ljungqvist and Sargent (chapters 3 and 4 and the appendix), Sargent (chapter 1 and the appendix) and Stokey, Lucas with Prescott (chapters 1 to 6).
A. Sequential optimization
We begin with a deterministic finite horizon problem. Suppose a consumer is deciding on a stream of consumption over $T$ periods $\{c_0, \ldots, c_T\}$ and has preferences over this consumption stream denoted by $U(c_0, \ldots, c_T)$. Quite often we model these preferences as being additively separable,13 so that there is a period utility function $u(c_t)$ and a discount factor, $\beta$, so that $U(c_0, \ldots, c_T) = \sum_{t=0}^{T} \beta^t u(c_t)$. $\beta < 1$ implies that future consumption is less valuable than current consumption.
The agent's problem is then
$$\max_{\{c_t, k_{t+1}\}_{t=0}^{T}} \sum_{t=0}^{T} \beta^t u(c_t)$$
subject to the resource constraint $c_t + k_{t+1} \leq f(k_t)$, $k_{t+1} \geq 0$, and $k_0$ given.
It is clear that, with utility strictly increasing, no consumption will be wasted (i.e. the resource constraint holds with equality) and that in the final period the consumer will consume everything, so $k_{T+1} = 0$.
We solve this problem making use of the Kuhn-Tucker theorem. Without loss of generality, we will assume that the depreciation rate $\delta = 1$, so that $c_t = f(k_t) - k_{t+1}$. The Lagrangian is then
$$\mathcal{L} = \sum_{t=0}^{T} \beta^t \left[ u\left(f(k_t) - k_{t+1}\right) + \lambda_t k_{t+1} \right]$$
13 More generally, we may write preferences depending on lagged consumption, $u(c_t, c_{t-1})$, or on some weighted measure of lagged consumption, $u(c_t, s_t)$ where $s_{t+1} = f(s_t, c_t)$. Things get more complicated when we have preferences depending on future outcomes as in Epstein-Zin (1989) preferences. EZ define preferences recursively over current consumption and a certainty equivalent of tomorrow's utility.
The first order conditions are
$$\frac{\partial \mathcal{L}}{\partial k_{t+1}}: \quad -\beta^t u'(c_t) + \beta^t \lambda_t + \beta^{t+1} u'(c_{t+1}) f'(k_{t+1}) = 0 \quad \text{if } t = 0, 1, 2, \ldots, T-1$$
$$\frac{\partial \mathcal{L}}{\partial k_{T+1}}: \quad -\beta^T u'(c_T) + \beta^T \lambda_T = 0$$
Note that $u'(c_T) > 0$ by assumption then implies that $\lambda_T > 0$, and hence $k_{T+1} = 0$, so that nothing is left over. Additionally, because of our assumptions on the utility function we know that in all the other periods $c_t > 0$ and $k_{t+1} > 0$, so that $\lambda_t = 0$, which yields the Euler equation:
$$u'\left(f(k_t) - k_{t+1}\right) = \beta\, u'\left(f(k_{t+1}) - k_{t+2}\right) f'(k_{t+1}).$$
This is a second order difference equation. We now have a system of $T$ equations with two boundary conditions ($k_0$, $k_{T+1}$). This problem generally has a solution and the solution will be unique under certain conditions: $u(f(k_t) - k_{t+1})$ is concave in $(k_t, k_{t+1})$, which follows from $u$ being concave and increasing and $f$ being concave, and the constraint set is clearly convex.
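To make the boundary-value structure concrete, here is a minimal numerical sketch that stacks the $T$ Euler equations together with the two boundary conditions and solves them as a root-finding problem. It assumes, purely for illustration, log utility, $f(k) = k^\alpha$, and made-up parameter values; none of these choices come from the argument above.

```python
import numpy as np
from scipy.optimize import fsolve

# Illustrative (assumed) primitives: u(c) = ln c, f(k) = k**alpha, made-up parameters.
alpha, beta, T, k0 = 0.3, 0.95, 10, 1.0

def euler_residuals(k_inner):
    # Full capital path: k_0 is given and the terminal condition imposes k_{T+1} = 0.
    k = np.concatenate(([k0], k_inner, [0.0]))
    c = k[:-1] ** alpha - k[1:]                      # c_t = f(k_t) - k_{t+1}, t = 0,...,T
    res = np.empty(T)
    for t in range(T):
        # Euler equation: u'(c_t) = beta * u'(c_{t+1}) * f'(k_{t+1})
        res[t] = 1.0 / c[t] - beta * alpha * k[t + 1] ** (alpha - 1.0) / c[t + 1]
    return res

# Initial guess for k_1,...,k_T chosen so that consumption stays positive along the path
guess = np.linspace(0.5 * k0 ** alpha, 0.05, T)
k_path = fsolve(euler_residuals, guess)
print(np.round(k_path, 4))
```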
What does the Euler equation mean? At the margin, the utility given up today to invest more is equal to the additional increase in utility tomorrow, which depends on the marginal product of capital. It is often useful to rewrite the Euler equation in the following form,
$$\frac{u'(c_t)}{\beta\, u'(c_{t+1})} = f'(k_{t+1}).$$
The left hand side of the equation is the marginal rate of substitution across periods. The right hand side is the marginal rate of transformation. Efficiency dictates we equate these.
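As a concrete illustration, with the log utility and Cobb-Douglas technology used in Example 4 below (a specific parameterization, not part of the general argument), the condition reads
$$\frac{u'(c_t)}{\beta\, u'(c_{t+1})} = \frac{c_{t+1}}{\beta\, c_t} = \alpha k_{t+1}^{\alpha - 1} = f'(k_{t+1}).$$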
Now while this problem is generally solvable it can be challenging, especially as $T \to \infty$. It also becomes more challenging with uncertainty (i.e. subjecting the economy to random shocks to some variable). So, we will take a slightly different approach to the problem by essentially breaking it up into two problems, a decision today and a decision starting tomorrow.
Consider an agent who chooses actions $\{a_t\}$ to maximize
$$E_0 \sum_{t=0}^{T} \beta^t u(x_t, a_t)$$
where $u(x_t, a_t)$ is the instantaneous return function (utility function). We discount the sum because we care more about the present than the future. We can think of the problem as
$$\max_{\{a_t\}_{t=0}^{T}} E_0 \sum_{t=0}^{T} \beta^t u(x_t, a_t).$$
There are several constraints:
$x_0$ given
$a_t \in \Gamma(x_t)$
$x_{t+1} = f(x_t, a_t, \varepsilon_t)$
When you are choosing an action $a_t$ it can depend on the state but not on something we don't know ($\varepsilon_t$). As for the future, given any sequence of controls $\{a_t\}$ we can construct the probability distribution of future states conditional on $x_0$ based on the law of motion and the initial state. The expectation at time $t = 0$ is with respect to this distribution.
The environment is stationary (i.e. $u$, $\beta$, $f$, and $F$ do not depend on $t$). A stationary environment allows us to focus on time-invariant decision rules. $(x_t, a_t)$ contains all of the information available at date $t$ relevant for the probability distribution of future events. Combined with the additive separability of the objective function, this implies that we want to choose $a_t = \pi_t(x_t)$ where $\pi_t$ is the decision rule. We are really choosing a sequence of functions $(\pi_0, \pi_1, \ldots, \pi_T)$.
Define the value of following policy $\pi^T = (\pi_0, \pi_1, \ldots, \pi_T)$ when there are $T$ periods left and the current state is $x_0$ as
$$W_T(x_0; \pi^T) = E_0 \sum_{t=0}^{T} \beta^t u\left[x_t, \pi_t(x_t)\right]$$
$$x_{t+1} = f\left[x_t, \pi_t(x_t), \varepsilon_t\right].$$
The optimal value function
$$V_T(x) = \max_{\pi^T} W_T(x; \pi^T)$$
exists and is bounded and continuous in $x$. {This result is based on the Feller Property.14}
Since decisions at future dates $t \geq 1$ do not affect the instantaneous return at $t = 0$ we can
14 Assuming that $f$ is continuous guarantees that the stochastic structure satisfies the Feller Property, that is,
$$E\left[\varphi(x_{t+1}) \mid x_t = x, a_t = a\right] = \int \varphi\left(f(x, a, \varepsilon)\right) dF(\varepsilon \mid x, a)$$
is bounded and continuous in $(x, a)$ for every bounded and continuous real valued function $\varphi$. Given the Feller property, the existence of an optimal policy and the continuity of the value function, which follows from the Theorem of the Maximum, can be established as in Stokey (1989).
cascade the maximization operator,
$$V_T(x_0) = \max_{a_0 \in \Gamma(x_0)} E_0 \left\{ u(x_0, a_0) + \max_{\pi^{T-1} \in \Pi^{T-1}} E_1 \sum_{t=1}^{T} \beta^t u(x_t, a_t) \right\},$$
where $\Pi^{T-1}$ is the set of feasible policies with $T-1$ periods to go. We can think of $V_{T-1}(x_1)$ as representing a similar problem when there are $T-1$ periods left, so that
$$V_T(x_0) = \max_{a_0 \in \Gamma(x_0)} E_0 \left\{ u(x_0, a_0) + \beta V_{T-1}(x_1) \right\}.$$
Dropping the time subscripts on $\{x, a, \varepsilon\}$, since they are all evaluated at time $t = 0$, we can rewrite the previous problem in the general format depending on the number of periods to go $S \in \{1, 2, \ldots, T\}$. This is called Bellman's equation:
$$V_S(x) = \max_{a \in \Gamma(x)} \left\{ u(x, a) + \beta E\left[ V_{S-1}\left(f(x, a, \varepsilon)\right) \mid x, a \right] \right\}.$$
Bellman’s equation expresses the choice of a sequence of decision rules as a sequence of choices
for the control variable. It leads to the following solution method known as the DP Algorithm
(Bertsekas 1976).
1. Start at the end: with no periods to go, set $V_0(x) = \max_{a \in \Gamma(x)} u(x, a)$.
2. Given $V_{S-1}$, compute $V_S(x) = \max_{a \in \Gamma(x)} \left\{ u(x, a) + \beta E\left[ V_{S-1}\left(f(x, a, \varepsilon)\right) \mid x, a \right] \right\}$ for $S = 1, 2, \ldots, T$.
3. Construct a policy by setting $\pi_S(x) = \arg\max_{a \in \Gamma(x)} \left\{ u(x, a) + \beta E\left[ V_{S-1}\left(f(x, a, \varepsilon)\right) \mid x, a \right] \right\}$.
In the DP algorithm we solve the problem starting at the end of time. We work out the optimal strategy and optimal value at time $t+1$ and use that to figure out what the optimal strategy and optimal value are at time $t$. In this way we recursively solve for the optimal policy. We are using Bellman's principle of optimality: it says that if a strategy is optimal at each point in time, given that an optimal strategy is used thereafter, then the strategy is optimal. Optimal policies with this property are time consistent. This result depends on the recursive structure of the problem and does not generalize fully. Nonetheless, we will mostly focus on problems which are time consistent.
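The following is a minimal sketch of the DP algorithm on a discretized state space. It uses the log utility and Cobb-Douglas technology of Example 4 below, with illustrative parameter values and grid choices that are assumptions for this sketch rather than part of the notes; the last line compares the computed first-period savings choice with the closed-form pattern derived in the example.

```python
import numpy as np

# Backward induction (the DP algorithm) on a grid, for u(c) = ln c, f(k) = k**alpha.
alpha, beta, T = 0.3, 0.95, 10
grid = np.linspace(0.05, 1.0, 200)                  # grid of capital stocks

V = np.zeros((T + 1, grid.size))                    # V[S] = value with S periods to go
policy = np.zeros((T + 1, grid.size), dtype=int)    # index of optimal k' on the grid

V[0] = alpha * np.log(grid)                         # S = 0: consume everything, V_0(k) = alpha*ln k

for S in range(1, T + 1):                           # step backwards through the horizon
    for i, k in enumerate(grid):
        c = k ** alpha - grid                       # consumption implied by each choice of k'
        vals = np.where(c > 0, np.log(np.maximum(c, 1e-12)) + beta * V[S - 1], -np.inf)
        policy[S, i] = int(np.argmax(vals))
        V[S, i] = vals[policy[S, i]]

# With T periods to go the first-period savings choice should be close to the closed-form
# coefficient alpha*beta*(1-(alpha*beta)**T)/(1-(alpha*beta)**(T+1)) times k**alpha.
i0 = int(np.argmin(np.abs(grid - 0.5)))
coeff = alpha * beta * (1 - (alpha * beta) ** T) / (1 - (alpha * beta) ** (T + 1))
print(grid[policy[T, i0]], coeff * grid[i0] ** alpha)
```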
Example 4. One sector growth model with finite horizon
$$V_T(k_0) = \max_{\{k_{t+1}\}_{t=0}^{T}} \sum_{t=0}^{T} \beta^t \ln\left(k_t^\alpha - k_{t+1}\right)$$
$$\text{subject to:} \quad 0 \leq k_{t+1} \leq k_t^\alpha, \quad t = 0, 1, 2, \ldots, T, \qquad k_0 > 0.$$
It is useful if you make sure you are able to map this problem back into the basic dynamic programming set-up. In particular, make sure you understand which variable is the control, $a_t$, which is the state variable, $x_t$, what the law of motion, $f(x_t, a_t)$, is, and what the feasible set, $\Gamma(x)$, is. Starting at time $t = T$ with 0 periods left, the decision on how much to save is quite simple: $k_{T+1} = 0$.
$$V_0(k_T) = \max_{k_{T+1} \in [0, k_T^\alpha]} \ln\left(k_T^\alpha - k_{T+1}\right)$$
$$\Rightarrow k_{T+1} = 0$$
$$\Rightarrow V_0(k_T) = \alpha \ln k_T$$
Now moving back to the penultimate period $t = T-1$, there is one period left, so that
$$V_1(k_{T-1}) = \max_{k_T \in [0, k_{T-1}^\alpha]} \left\{ \ln\left(k_{T-1}^\alpha - k_T\right) + \alpha\beta \ln k_T \right\}$$
$$\frac{dV_1(k_{T-1})}{dk_T}: \quad -\frac{1}{k_{T-1}^\alpha - k_T} + \frac{\alpha\beta}{k_T} = 0$$
$$\Rightarrow \frac{1}{k_{T-1}^\alpha - k_T} = \frac{\alpha\beta}{k_T}$$
$$\Rightarrow k_T = \alpha\beta\left(k_{T-1}^\alpha - k_T\right)$$
$$\Rightarrow k_T = \frac{\alpha\beta}{1 + \alpha\beta}\, k_{T-1}^\alpha$$
Now we can substitute the optimal policy back into the value function to solve for the value function as a function of the state variable $k_{T-1}$, which yields
$$\begin{aligned}
V_1(k_{T-1}) &= \ln\left(k_{T-1}^\alpha - \frac{\alpha\beta}{1+\alpha\beta}\, k_{T-1}^\alpha\right) + \alpha\beta \ln\left(\frac{\alpha\beta}{1+\alpha\beta}\, k_{T-1}^\alpha\right) \\
&= \ln\left(\frac{1}{1+\alpha\beta}\, k_{T-1}^\alpha\right) + \alpha\beta \ln\left(\frac{\alpha\beta}{1+\alpha\beta}\, k_{T-1}^\alpha\right) \\
&= \ln\frac{1}{1+\alpha\beta} + \alpha\beta \ln\frac{\alpha\beta}{1+\alpha\beta} + \alpha \ln k_{T-1} + \alpha^2\beta \ln k_{T-1} \\
&= \ln\frac{1}{1+\alpha\beta} + \alpha\beta \ln\frac{\alpha\beta}{1+\alpha\beta} + \alpha(1+\alpha\beta) \ln k_{T-1}
\end{aligned}$$
Moving back one more period,
$$V_2(k_{T-2}) = \max_{k_{T-1} \in [0, k_{T-2}^\alpha]} \left\{ \ln\left(k_{T-2}^\alpha - k_{T-1}\right) + \beta\left[ \ln\frac{1}{1+\alpha\beta} + \alpha\beta \ln\frac{\alpha\beta}{1+\alpha\beta} + \alpha(1+\alpha\beta) \ln k_{T-1} \right] \right\}$$
$$\frac{dV_2(k_{T-2})}{dk_{T-1}}: \quad -\frac{1}{k_{T-2}^\alpha - k_{T-1}} + \frac{\alpha\beta(1+\alpha\beta)}{k_{T-1}} = 0$$
$$\Rightarrow k_{T-1} = \alpha\beta(1+\alpha\beta)\left(k_{T-2}^\alpha - k_{T-1}\right)$$
$$\Rightarrow k_{T-1} = \frac{\alpha\beta(1+\alpha\beta)}{\alpha\beta(1+\alpha\beta) + 1}\, k_{T-2}^\alpha$$
$$\Rightarrow k_{T-1} = \frac{\alpha\beta(1+\alpha\beta)}{1 + \alpha\beta(1+\alpha\beta)}\, k_{T-2}^\alpha$$
Continuing to iterate backwards suggests that the optimal policy takes the form
$$k_{t+1} = \frac{\alpha\beta\left[1 - (\alpha\beta)^{T-t}\right]}{1 - (\alpha\beta)^{T-t+1}}\, k_t^\alpha \quad \text{for } t = 1, 2, \ldots, T.$$
You should do this as an exercise. Remember to also solve for the value function. Is there a pattern?
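One quick way to check the conjectured pattern is to iterate the coefficient on $\ln k$ in $V_S$ backwards, exactly as in the two steps above, and compare the implied savings rate with the formula. The sketch below does this for illustrative values of $\alpha$ and $\beta$ (these numbers are assumptions, not taken from the notes).

```python
# Numerical check of the conjectured policy pattern under illustrative parameters.
alpha, beta = 0.3, 0.95

D = alpha                                 # coefficient on ln k in V_0(k) = alpha*ln k
for S in range(1, 11):                    # S = number of periods left after today
    saving = beta * D / (1 + beta * D)    # optimal policy: k' = saving * k**alpha
    conjecture = alpha * beta * (1 - (alpha * beta) ** S) / (1 - (alpha * beta) ** (S + 1))
    print(S, round(saving, 6), round(conjecture, 6))
    D = alpha * (1 + beta * D)            # coefficient on ln k in V_S(k)
```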
We now turn to the infinite horizon problem, where we will again solve for the optimal policy and value functions for each period. It will be harder because we have no final period to start from. It turns out that this will not slow us down much.
The structure of the problem is similar to the finite horizon problem except now individuals maximize the discounted sum of the instantaneous return function (utility) over an infinite horizon: $t = 0, 1, 2, \ldots, \infty$ (time is discrete).
$$V(x_0) = \max_{\{a_t\}_{t=0}^{\infty}} E_0 \sum_{t=0}^{\infty} \beta^t u(x_t, a_t)$$
$x_0$ given
$a_t \in \Gamma(x_t)$
$x_{t+1} = f(x_t, a_t, \varepsilon_t)$ (Law of Motion)
$\varepsilon_t$ is a random variable with $F(\varepsilon \mid x, a) = \Pr(\varepsilon_t \leq \varepsilon \mid x_t = x, a_t = a)$
Consider the same problem starting from some later date $s$:
$$V_s(x_s) = \max_{\{a_t\}_{t=s}^{\infty}} E_s \sum_{t=s}^{\infty} \beta^{t-s} u(x_t, a_t)$$
$a_t \in \Gamma(x_t)$
$x_{t+1} = f(x_t, a_t, \varepsilon_t)$
The environment is stationary ($u$, $f$, $F$, $\beta$, etc.) and the problem is the same at each point in time. Consequently, the value function and optimal policy are stationary (if they exist), so we can drop the time subscripts and write the value function and the optimal policy mapping as $V(x)$ and $\pi(x)$, respectively.
Write Bellman's equation as:
$$V(x) = \max_{a \in \Gamma(x)} \left\{ u(x, a) + \beta E\left[ V\left(f(x, a, \varepsilon)\right) \mid x, a \right] \right\} \qquad (3)$$
where the value function, $V(x)$, and decision rule, $\pi(x)$, solve the functional equation (i.e. the solution is a function).
Traditionally there are three methods that people use to solve this problem: 1) guess and check (undetermined coefficients), 2) successive approximations, and 3) policy improvement.
The idea behind guessing is quite simple: conjecture that a specific function $V_0(x)$ is the solution, then substitute it into equation (3) and solve
$$V_1(x) = \max_{a \in \Gamma(x)} \left\{ u(x, a) + \beta E\left[ V_0\left(f(x, a, \varepsilon)\right) \mid x, a \right] \right\}.$$
If $V_1(x) = V_0(x)$ then you have the correct answer and are done. If $V_1(x) \neq V_0(x)$, guess again. Clearly this method seems a bit haphazard, and it led people to the next method: successive approximations.
Consider again the one sector growth model, now with an infinite horizon:15
$$V(k_0) = \max_{\{k_{t+1}\}_{t=0}^{\infty}} \sum_{t=0}^{\infty} \beta^t \ln\left(k_t^\alpha - k_{t+1}\right)$$
$$\text{subject to:} \quad 0 \leq k_{t+1} \leq k_t^\alpha, \quad t = 0, 1, 2, \ldots, \qquad k_0 > 0.$$
Bellman's equation is
$$V(k) = \max_{k' \in [0, k^\alpha]} \left\{ \ln\left(k^\alpha - k'\right) + \beta V(k') \right\}.$$
Guess that the value function takes the form $V_0(k) = A + D \ln k$.
There are three steps:
15 This is the problem studied by Brock and Mirman (1972).
1. Solve the maximization problem given the guess for $V$:
$$V_1(k) = \max_{0 \leq k' \leq k^\alpha} \left\{ \ln\left(k^\alpha - k'\right) + \beta\left(A + D \ln k'\right) \right\}$$
$$\text{FOC:} \quad \frac{1}{k^\alpha - k'} = \frac{\beta D}{k'} \;\Rightarrow\; k' = \frac{\beta D}{1 + \beta D}\, k^\alpha$$
2. Evaluate the RHS at the optimum $k' = \frac{\beta D}{1+\beta D}\, k^\alpha$:
$$\begin{aligned}
RHS &= \ln\left(k^\alpha - k'\right) + \beta\left(A + D \ln k'\right) \\
&= \ln\left(\frac{k^\alpha}{1+\beta D}\right) + \beta A + \beta D \ln\left(\frac{\beta D}{1+\beta D}\, k^\alpha\right) \\
&= -\ln(1+\beta D) + \beta A + \beta D \ln\frac{\beta D}{1+\beta D} + \alpha \ln k + \alpha\beta D \ln k
\end{aligned}$$
3. Equate the RHS to the guess and match coefficients:
$$A + D \ln k = -\ln(1+\beta D) + \beta A + \beta D \ln\frac{\beta D}{1+\beta D} + \alpha \ln k + \alpha\beta D \ln k$$
$$A = -\ln(1+\beta D) + \beta A + \beta D \ln\frac{\beta D}{1+\beta D}$$
$$D = \alpha + \alpha\beta D \;\Rightarrow\; D = \frac{\alpha}{1 - \alpha\beta}$$
We can now take $D$ and plug it back into the optimal policy to solve for the policy function $k' = \pi(k)$:
$$k' = \frac{\beta D}{1+\beta D}\, k^\alpha = \frac{\frac{\alpha\beta}{1-\alpha\beta}}{1 + \frac{\alpha\beta}{1-\alpha\beta}}\, k^\alpha = \alpha\beta\, k^\alpha$$
$$A = \frac{1}{1-\beta}\left[\ln(1-\alpha\beta) + \frac{\alpha\beta}{1-\alpha\beta}\ln(\alpha\beta)\right]$$
We can then use the policy function to completely describe a sequence of capital stocks $\{k_t\}_{t=0}^{\infty}$ starting from $k_0$:
$$k_1 = \pi(k_0) = \alpha\beta\, k_0^\alpha$$
$$k_2 = \pi(k_1) = \alpha\beta\, k_1^\alpha = \alpha\beta\left(\alpha\beta\, k_0^\alpha\right)^\alpha = (\alpha\beta)^{1+\alpha}\, k_0^{\alpha^2}$$
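As a sanity check, one can verify numerically that $V(k) = A + D\ln k$ with the coefficients above does satisfy Bellman's equation. The sketch below evaluates the right hand side on a fine grid of feasible $k'$ for a few illustrative (assumed) parameter values and compares it with $A + D\ln k$.

```python
import numpy as np

# Check that V(k) = A + D*ln(k), with D = alpha/(1-alpha*beta) and A as above,
# satisfies Bellman's equation (illustrative parameter values).
alpha, beta = 0.3, 0.95
D = alpha / (1 - alpha * beta)
A = (np.log(1 - alpha * beta) + alpha * beta / (1 - alpha * beta) * np.log(alpha * beta)) / (1 - beta)

for k in (0.2, 0.5, 1.0):
    kprime = np.linspace(1e-6, k ** alpha - 1e-6, 10000)       # fine grid of feasible savings
    rhs = np.max(np.log(k ** alpha - kprime) + beta * (A + D * np.log(kprime)))
    print(k, round(rhs, 6), round(A + D * np.log(k), 6))        # the two numbers should match
```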
The guess and verify approach works well in this case. But this is because the primitives were rigged to work.16 More generally it does not work, so we often work with the successive approximations approach.
The starting point for the method of successive approximations is the answer from an unsuccessful guess. In particular, the method of successive approximations says to use the new answer $V_1(x)$ as the second guess. This yields a $V_2(x)$. If $V_2(x) = V_1(x)$ then we are done; if not, we use $V_2(x)$ as the new guess. In this way we generate a sequence $\{V_n(x), \pi_n(x)\}$ and the hope is that this sequence eventually reaches a fixed point such that $V_n(x) = V(x)$ and $\pi_n(x) = \pi(x)$, or that the sequence $V_n(x)$ ($\pi_n(x)$) converges to the true $V(x)$ ($\pi(x)$) as $n \to \infty$. This approach essentially introduces greater flexibility into the decision rules by considering a one-period deviation from some rule. This one-shot deviation must offer a weak improvement, so that $V_{n+1} \geq V_n$.
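A minimal sketch of successive approximations (value function iteration) for the growth example is given below, starting from the guess $V_0 = 0$ on a discrete capital grid. The parameter values, grid, and stopping tolerance are illustrative assumptions; the last line compares the computed policy with the closed form $k' = \alpha\beta k^\alpha$ derived earlier.

```python
import numpy as np

# Successive approximations (value function iteration) for the growth example.
alpha, beta, tol = 0.3, 0.95, 1e-8
grid = np.linspace(0.05, 1.0, 200)

V = np.zeros(grid.size)                                   # initial guess V_0 = 0
while True:
    V_new = np.empty_like(V)
    pol = np.empty(grid.size, dtype=int)
    for i, k in enumerate(grid):
        c = k ** alpha - grid                             # consumption for each candidate k'
        vals = np.where(c > 0, np.log(np.maximum(c, 1e-12)) + beta * V, -np.inf)
        pol[i] = int(np.argmax(vals))
        V_new[i] = vals[pol[i]]
    if np.max(np.abs(V_new - V)) < tol:                   # (approximate) fixed point reached
        V = V_new
        break
    V = V_new

i = 100
print(grid[pol[i]], alpha * beta * grid[i] ** alpha)      # compare with k' = alpha*beta*k**alpha
```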
In solving the finite horizon problem we made use of the last period in order to solve the problem. When time is infinite we have no last period to start from and work backwards. Yet under certain circumstances we are able to use the methods from the finite horizon problem to solve the infinite horizon problem. In this case we can think of the solution to the infinite horizon problem as the limit of a sequence of finite problems with the horizon becoming progressively longer ($T \to \infty$). This approach is a specific application of the method of successive approximations when the initial guess is that $V_0 = 0$.
Furthermore, when the period of time is of sufficiently long duration, we can also sometimes approximate the solution to the finite horizon problem with the infinite horizon solution.
16 Guess and verify generally works only with specifications with quadratic preferences and linear constraints, or Cobb-Douglas constraints and log preferences. Some further examples are in Hercowitz & Sampson (1991), Benhabib & Rustichini (1994), and Antony and Maussner (2007).
The third method to solve these problems is to iterate on the policy function. It consists of three steps. First, pick a feasible policy $\pi_0(x)$ and compute the value of following this policy forever,
$$V_0(x) = \sum_{t=0}^{\infty} \beta^t u\left(x_t, \pi_0(x_t)\right) \quad \text{with } x_0 = x, \; x_{t+1} = f\left(x_t, \pi_0(x_t)\right).$$
Second, generate a new policy $a = \pi_1(x)$ that solves the two period problem
$$\max_{a \in \Gamma(x)} \left\{ u(x, a) + \beta V_0\left(f(x, a)\right) \right\},$$
and compute the value of following this new policy,
$$V_1(x) = \sum_{t=0}^{\infty} \beta^t u\left(x_t, \pi_1(x_t)\right) \quad \text{with } x_0 = x, \; x_{t+1} = f\left(x_t, \pi_1(x_t)\right).$$
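A minimal sketch of the policy improvement step on the same discretized growth example is given below. The initial feasible policy, parameter values, and grid are illustrative assumptions; each pass computes the value of following the current policy forever and then performs the one-shot improvement described above.

```python
import numpy as np

# Policy improvement (Howard's algorithm) for the discretized growth example.
alpha, beta = 0.3, 0.95
grid = np.linspace(0.05, 1.0, 200)
n = grid.size

def value_of_policy(pol):
    # Value of following the policy forever: solve V = u + beta * P V, where P
    # moves the state from grid[i] to grid[pol[i]] with probability one.
    u = np.log(grid ** alpha - grid[pol])
    P = np.zeros((n, n))
    P[np.arange(n), pol] = 1.0
    return np.linalg.solve(np.eye(n) - beta * P, u)

pol = np.zeros(n, dtype=int)                       # initial feasible policy: k' = lowest grid point
while True:
    V0 = value_of_policy(pol)
    new_pol = np.empty(n, dtype=int)
    for i, k in enumerate(grid):                   # the "two period" improvement step
        c = k ** alpha - grid
        vals = np.where(c > 0, np.log(np.maximum(c, 1e-12)) + beta * V0, -np.inf)
        new_pol[i] = int(np.argmax(vals))
    if np.array_equal(new_pol, pol):               # policy can no longer be improved
        break
    pol = new_pol

print(grid[pol[100]], alpha * beta * grid[100] ** alpha)   # compare with k' = alpha*beta*k**alpha
```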