Lecture 3 and 4

3. Dynamic Optimization

We introduce the basic ideas of dynamic optimization and dynamic programming. Good sources are Ljungqvist and Sargent (chapters 3 and 4 and appendix), Sargent (chapter 1 and the appendix) and Stokey, Lucas with Prescott (chapters 1 to 6).

A. Sequential optimization
We begin with a deterministic finite horizon problem. Suppose a consumer is deciding on a stream of consumption over T periods \{c_0, ..., c_T\} and has preferences over this consumption stream denoted by U(c_0, ..., c_T). Quite often we model these preferences as being additively separable^{13} so that there is a period utility function u(c_t) and a discount factor, \beta, so that U(c_0, ..., c_T) = \sum_{t=0}^{T} \beta^t u(c_t). \beta < 1 implies that future consumption is less valuable than current consumption.

The agent's problem is then

\max_{\{c_t, k_{t+1}\}_{t=0}^{T}} \sum_{t=0}^{T} \beta^t u(c_t)

subject to:

c_t + k_{t+1} \le F(k_t) + (1 - \delta) k_t = f(k_t),
c_t, k_t \ge 0 for all t, \quad k_0 given,
u' > 0, \ u'' < 0, \ \lim_{c \to 0} u'(c) = \infty \quad (Inada conditions).

It is clear that with utility strictly increasing no consumption will be wasted (i.e. the resource constraint holds with equality) and that in the final period the consumer will consume everything, so k_{T+1} = 0.

We solve this problem making use of the Kuhn-Tucker theorem. Without loss of generality, we will assume that \delta = 1. The Lagrangian is then

L = \sum_{t=0}^{T} \beta^t \left[ u(f(k_t) - k_{t+1}) + \lambda_t k_{t+1} \right]

^{13} More generally, we may write preferences depending on lagged consumption u(c_i, c_{i-1}) or some weighted measure of lagged consumption u(c_i, s_i) where s_{i+1} = f(s_i, c_i). Things get more complicated when we have preferences depending on future outcomes, as in Epstein-Zin (1989) preferences. EZ define preferences recursively over current consumption and a certainty equivalent of tomorrow's utility.

The first order conditions are

\frac{\partial L}{\partial k_{t+1}}: \quad -\beta^t u'(c_t) + \beta^t \lambda_t + \beta^{t+1} u'(c_{t+1}) f'(k_{t+1}) = 0 \quad \text{for } t = 0, 1, 2, ..., T-1

\frac{\partial L}{\partial k_{T+1}}: \quad -\beta^T u'(c_T) + \beta^T \lambda_T = 0

And the complementary slackness conditions

\lambda_t k_{t+1} = 0, \quad k_{t+1} \ge 0, \quad \text{and } \lambda_t \ge 0 \quad \text{for } t = 0, 1, 2, ..., T

Note that u'(c_T) > 0 by assumption then implies (from the last first order condition) that \lambda_T > 0 and hence k_{T+1} = 0, so that nothing is left over. Additionally, because of our assumptions on the utility function we know that in all the other periods c_t > 0 and k_{t+1} > 0, so that \lambda_t = 0, which yields the Euler equation:

u'(f(k_t) - k_{t+1}) = \beta u'(f(k_{t+1}) - k_{t+2}) f'(k_{t+1}), \quad t = 0, ..., T-1

This is a second order difference equation. We now have a system of T equations with two boundary conditions (k_0 given and k_{T+1} = 0). This problem generally has a solution and the solution will be unique under certain conditions: u(f(k_t) - k_{t+1}) is concave in (k_t, k_{t+1}), which follows from u being concave and increasing and f being concave, and the constraint set is clearly convex.
What does the Euler equation mean? At the margin, the utility given up today to
invest more is equal to the additional increase in utility tomorrow which depends on the
marginal product of capital. It is often useful to rewrite the Euler equation in the following
form,

\frac{u'(f(k_t) - k_{t+1})}{\beta u'(f(k_{t+1}) - k_{t+2})} = \frac{u'(c_t)}{\beta u'(c_{t+1})} = f'(k_{t+1}).

The left hand side of the equation is the marginal rate of substitution across periods. The right hand side is the marginal rate of transformation. Efficiency dictates we equate these.
Now while this problem is generally solvable it can be challenging, especially as T \to \infty. It also becomes more challenging with uncertainty (i.e. subjecting the economy to random shocks to some variable). So, we will take a slightly different approach to the problem by essentially breaking it up into two problems: a decision today and a decision starting tomorrow.
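To make the boundary-value structure concrete, here is a minimal numerical sketch (my own illustration, not part of the original notes): it stacks the T Euler equations and solves them jointly, assuming log utility, a Cobb-Douglas technology f(k) = k^\alpha with full depreciation, and illustrative parameter values.

```python
import numpy as np
from scipy.optimize import root

# A minimal sketch (my own illustration, not from the notes): stack the T Euler
# equations u'(f(k_t) - k_{t+1}) = beta * u'(f(k_{t+1}) - k_{t+2}) * f'(k_{t+1})
# for t = 0, ..., T-1 and solve them jointly, imposing the boundary conditions
# k_0 given and k_{T+1} = 0. Assumed functional forms and parameter values:
# u(c) = ln(c), f(k) = k**alpha (full depreciation), alpha = 0.3, beta = 0.95.

alpha, beta, T, k0 = 0.3, 0.95, 10, 0.2

def euler_residuals(k_interior):
    # k_interior holds (k_1, ..., k_T); attach the two boundary values
    k = np.concatenate(([k0], k_interior, [0.0]))
    c = k[:-1] ** alpha - k[1:]                       # c_t = f(k_t) - k_{t+1}
    lhs = 1.0 / c[:-1]                                # u'(c_t), t = 0, ..., T-1
    rhs = beta * (1.0 / c[1:]) * alpha * k[1:-1] ** (alpha - 1.0)
    return lhs - rhs                                  # T residuals for T unknowns

guess = np.full(T, 0.5 * k0 ** alpha)                 # an interior starting path
solution = root(euler_residuals, guess)
print(solution.success, solution.x)                   # the candidate path k_1, ..., k_T
```

Any root of this system, together with the boundary conditions, is a candidate optimal path; the dynamic programming approach below gets at the same object recursively.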

B. Discrete Time Dynamic Programming


Compared to the problem of solving for the optimal sequence of actions, dynamic programming is about solving for optimal functions: decision rules for the actions and value functions for the states. We will discuss dynamic programming building from the finite horizon to the infinite horizon case. We will then consider properties of these functions. We will conclude with some examples. The goal here is to get you up and running using these techniques. For a more rigorous treatment of the subject you should refer to Stokey and Lucas (1989).

The Finite Horizon case


We will consider a more general model than in the last example. As before, time is discrete (t = 0, 1, 2, ..., T < \infty). Individuals maximize the discounted sum of some function

E_0 \sum_{t=0}^{T} \beta^t u(x_t, a_t)

u(x_t, a_t) - instantaneous return function (utility function)

x_t - state variable at each date

a_t - control variable at each date

x_t \in X \subseteq R^m, \ a_t \in A \subseteq R^n for all t, and \beta \in (0, 1)

E_0 - expectation conditional on information at time 0

We discount the sum because we care more about the present than the future. We can think of the problem as

\max E_0 \, U(x_0, x_1, ..., x_T, a_0, a_1, ..., a_T)

There are several constraints

a_t \in \Gamma(x_t) for all t

x_0 given

x_{t+1} = f(x_t, a_t, \varepsilon_t) \quad (Law of Motion), \quad \varepsilon_t is a random variable

\varepsilon_t \sim F(\varepsilon | x_t, a_t), \quad F(\varepsilon | x, a) = \Pr(\varepsilon_t \le \varepsilon \mid x_t = x, a_t = a)

When you are choosing an action a_t it can depend on the state but not on something we don't know (\varepsilon_t). As for the future, given any sequence of controls \{a_t\} we can construct the probability distribution of future states conditional on x_0 based on the law of motion and the initial state. The expectation at time t = 0 is with respect to this distribution.

The environment is stationary (i.e. u, \beta, f, and F do not depend on t). A stationary environment allows us to focus on time-invariant decision rules. (x_t, a_t) contains all of the information available at date t relevant for the probability distribution of future events. Combined with the additive separability of the objective function, this implies that we want to choose a_t = \pi_t(x_t) where \pi_t is the decision rule. We are really choosing a sequence of functions (\pi_0, \pi_1, ..., \pi_T).

Definition 1. \pi^T is a policy (of length T) where \pi^T = (\pi_0, \pi_1, ..., \pi_T) and \pi_t : X \to A for all t.

Definition 2. The set of feasible policies is

\Pi^T = \{ \pi^T = (\pi_0, \pi_1, ..., \pi_T) : \pi_t(x) \in \Gamma(x) \ \forall x, t \}

Definition 3. A policy is stationary if \pi_t(x) \equiv \pi(x).

Any policy generates a stochastic law of motion for the state

x_{t+1} = f[x_t, \pi_t(x_t), \varepsilon_t]

If the policy is stationary then the law of motion is stationary.

Define the value of following policy \pi^T when there are T periods left and the current state is x_0 as

W_T(x_0, \pi^T) = E_0 \sum_{t=0}^{T} \beta^t u[x_t, \pi_t(x_t)] \quad \text{with } x_{t+1} = f[x_t, \pi_t(x_t), \varepsilon_t]

The individual's problem is to choose \pi^T \in \Pi^T to maximize W_T(x_0, \pi^T).

Assumption 1. \Gamma(x) is non-empty, compact and continuous; u(x, a) is continuous and bounded; and f(x, a, \varepsilon) is continuous.

Proposition 1. Given these assumptions there exists an optimal policy \pi^{T*} = (\pi_0^*, \pi_1^*, ..., \pi_T^*) and the optimal value function

V_T(x) = W_T(x, \pi^{T*})

exists and is bounded and continuous in x. {This result is based on the Feller Property.^{14}}

Applying the law of iterated expectations,

E_0(\cdot) = E_0[E_1(\cdot)], \quad \text{where } E_t \text{ is the expectation conditional on information at date } t,

we can write the optimal value function as

V_T(x_0) = \max_{\pi^T \in \Pi^T} E_0 \left\{ u(x_0, a_0) + E_1 \sum_{t=1}^{T} \beta^t u(x_t, a_t) \right\}

Since decisions at future dates t \ge 1 do not affect the instantaneous return at t = 0 we can cascade the maximization operator,

V_T(x_0) = \max_{a_0 \in \Gamma(x_0)} E_0 \left\{ u(x_0, a_0) + \max_{\pi^{T-1} \in \Pi^{T-1}} E_1 \sum_{t=1}^{T} \beta^t u(x_t, a_t) \right\},

where \Pi^{T-1} is the set of feasible policies with T-1 periods to go.

^{14} Assuming that f is continuous guarantees that the stochastic structure satisfies the Feller Property, that is,

E[\varphi(x_{t+1}) \mid x_t = x, a_t = a] = \int \varphi(f(x, a, \varepsilon)) \, dF(\varepsilon | x, a)

is bounded and continuous in (x, a) for every bounded and continuous real valued function \varphi. Given the Feller property, the existence of an optimal policy and the continuity of the value function, which follows from the Theorem of the Maximum, can be established as in Stokey and Lucas (1989).

We can think of V_{T-1}(x_1) as representing a similar problem when there are T-1 periods left, so that

V_T(x_0) = \max_{a_0 \in \Gamma(x_0)} E_0 \left\{ u(x_0, a_0) + \beta E_1 V_{T-1}[f(x_0, a_0, \varepsilon_0)] \right\}

Dropping the time subscripts on \{x, a, \varepsilon\}, since they are all evaluated at time t = 0, we can rewrite the previous problem in a general format depending on the number of periods to go, S \in \{1, 2, ..., T\}. This is called Bellman's equation:

V_S(x) = \max_{a \in \Gamma(x)} \left\{ u(x, a) + \beta E V_{S-1}[f(x, a, \varepsilon)] \right\}

\text{where } E V_{S-1}[f(x, a, \varepsilon)] = \int V_{S-1}[f(x, a, \varepsilon)] \, dF(\varepsilon | x, a)

Bellman's equation expresses the choice of a sequence of decision rules as a sequence of choices for the control variable. It leads to the following solution method, known as the DP Algorithm (Bertsekas 1976).

1. Start at the final period, S = 0 periods to go:

V_0(x) = \max_{a \in \Gamma(x)} u(x, a)

\Rightarrow \hat{\pi}_0(x) = \arg\max_{a \in \Gamma(x)} u(x, a)

\Rightarrow V_0(x) = u(x, \hat{\pi}_0(x))

2. Work backwards, solving for S = 1, ..., T:

V_S(x) = \max_{a \in \Gamma(x)} \left\{ u(x, a) + \beta E V_{S-1}[f(x, a, \varepsilon)] \right\}

\Rightarrow \hat{\pi}_S(x) = \arg\max_{a \in \Gamma(x)} \left\{ u(x, a) + \beta E V_{S-1}[f(x, a, \varepsilon)] \right\}

3. Construct a policy by setting

\pi_t(x) = \hat{\pi}_{T-t}(x) \quad \text{for all } t = 0, 1, ..., T

The policy \pi^T = (\pi_0, \pi_1, ..., \pi_T) is optimal.

In the DP algorithm we solve the problem starting at the end of time. We work out the optimal strategy and optimal value at time t + 1 and use that to figure out what the optimal strategy and optimal value are at time t. In this way we recursively solve for the optimal policy. We are using Bellman's principle of optimality: if a strategy is optimal at each point in time, given that an optimal strategy is used thereafter, then the strategy is optimal. Optimal policies with this property are time consistent. This result depends on the recursive structure of the problem but does not generalize fully. Nonetheless, we will mostly focus on problems which are time consistent.
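The backward recursion is easy to implement once the state is discretized. The following sketch (my own, not from the notes) runs the three steps of the DP algorithm for a deterministic problem; it assumes the growth example studied next, with u = \ln c, f(k) = k^\alpha, and illustrative parameter values.

```python
import numpy as np

# A minimal sketch of the DP algorithm (my own illustration, not from the notes),
# for a deterministic problem on a discretized state grid. The assumed example is
# the growth model: state x = k, control a = k', u(x, a) = ln(k**alpha - k'),
# law of motion x' = a, and Gamma(k) = [0, k**alpha].

alpha, beta, T = 0.3, 0.95, 10
grid = np.linspace(1e-3, 1.0, 200)                 # discretized state space for k

V = np.zeros((T + 1, grid.size))                   # V[S] = value with S periods to go
policy = np.zeros((T + 1, grid.size), dtype=int)   # index of the optimal k' on the grid

def returns(k):
    # utility of every choice k' on the grid; infeasible choices get -inf
    c = k ** alpha - grid
    return np.where(c > 0, np.log(np.maximum(c, 1e-12)), -np.inf)

# Step 1: with S = 0 periods to go, simply maximize the one-period return
V[0] = np.array([returns(k).max() for k in grid])

# Step 2: work backwards for S = 1, ..., T
for S in range(1, T + 1):
    for i, k in enumerate(grid):
        objective = returns(k) + beta * V[S - 1]   # u(x, a) + beta * V_{S-1}(f(x, a))
        policy[S, i] = objective.argmax()
        V[S, i] = objective[policy[S, i]]

# Step 3: the calendar-time decision rule at date t is the rule with T - t periods to go
```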

Example 4. One sector growth model with finite horizon

V_T(k_0) = \max_{\{k_{t+1}\}_{t=0}^{T}} \sum_{t=0}^{T} \beta^t \ln(k_t^\alpha - k_{t+1})

subject to: \quad 0 \le k_{t+1} \le k_t^\alpha, \quad t = 0, 1, 2, ..., T, \qquad k_0 > 0

Now, we can define the value function V_T(k_0) as

V_T(k_0) = \max_{k_1 \in [0, k_0^\alpha]} \left\{ \ln(k_0^\alpha - k_1) + \beta V_{T-1}(k_1) \right\}

It is useful if you make sure you are able to map this problem back into the basic dynamic programming set-up. In particular, make sure you understand which variable is the control, a_t, which is the state variable, x_t, what the law of motion, f(x_t, a_t), is, and what the feasible set, \Gamma(x), is. Starting at time t = T with 0 periods left, the decision on how much to save is quite simple: k_{T+1} = 0.

V_0(k_T) = \max_{k_{T+1} \in [0, k_T^\alpha]} \ln(k_T^\alpha - k_{T+1})

\Rightarrow k_{T+1} = 0

\Rightarrow V_0(k_T) = \alpha \ln k_T

Now moving back to the penultimate period t = T-1, there is one period left, so that

V_1(k_{T-1}) = \max_{k_T \in [0, k_{T-1}^\alpha]} \left\{ \ln(k_{T-1}^\alpha - k_T) + \beta \alpha \ln k_T \right\}

\frac{dV_1(k_{T-1})}{dk_T}: \quad -\frac{1}{k_{T-1}^\alpha - k_T} + \frac{\beta \alpha}{k_T} = 0

\Rightarrow \frac{1}{k_{T-1}^\alpha - k_T} = \frac{\beta \alpha}{k_T}

\Rightarrow k_T = \beta \alpha (k_{T-1}^\alpha - k_T)

\Rightarrow k_T = \frac{\beta \alpha}{1 + \beta \alpha} k_{T-1}^\alpha

Now we can substitute the optimal policy back into the value function to solve for the value function as a function of the state variable k_{T-1}, which yields

V_1(k_{T-1}) = \ln\left( k_{T-1}^\alpha - \frac{\beta \alpha}{1 + \beta \alpha} k_{T-1}^\alpha \right) + \beta \alpha \ln\left( \frac{\beta \alpha}{1 + \beta \alpha} k_{T-1}^\alpha \right)
= \ln\left( \frac{1}{1 + \beta \alpha} k_{T-1}^\alpha \right) + \beta \alpha \ln\left( \frac{\beta \alpha}{1 + \beta \alpha} k_{T-1}^\alpha \right)
= \ln\frac{1}{1 + \beta \alpha} + \beta \alpha \ln\frac{\beta \alpha}{1 + \beta \alpha} + \alpha \ln k_{T-1} + \beta \alpha^2 \ln k_{T-1}
= \ln\frac{1}{1 + \beta \alpha} + \beta \alpha \ln\frac{\beta \alpha}{1 + \beta \alpha} + \alpha (1 + \beta \alpha) \ln k_{T-1}

Now going back one more period we get

V_2(k_{T-2}) = \max_{k_{T-1} \in [0, k_{T-2}^\alpha]} \left\{ \ln(k_{T-2}^\alpha - k_{T-1}) + \beta \left[ \ln\frac{1}{1 + \beta \alpha} + \beta \alpha \ln\frac{\beta \alpha}{1 + \beta \alpha} + \alpha (1 + \beta \alpha) \ln k_{T-1} \right] \right\}

\frac{dV_2(k_{T-2})}{dk_{T-1}}: \quad -\frac{1}{k_{T-2}^\alpha - k_{T-1}} + \frac{\beta \alpha (1 + \beta \alpha)}{k_{T-1}} = 0

\Rightarrow k_{T-1} = \beta \alpha (1 + \beta \alpha) (k_{T-2}^\alpha - k_{T-1})

\Rightarrow k_{T-1} = \frac{\beta \alpha (1 + \beta \alpha)}{1 + \beta \alpha (1 + \beta \alpha)} k_{T-2}^\alpha

If we continue along this path, we will find that

k_{t+1} = \frac{\beta \alpha \left[ 1 - (\beta \alpha)^{T-t} \right]}{1 - (\beta \alpha)^{T-t+1}} k_t^\alpha \quad \text{for } t = 0, 1, 2, ..., T

You should do this as an exercise. Remember to also solve for the value function. Is there a pattern?
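As a check on the pattern, the coefficients of the conjectured form V_S(k) = A_S + D_S \ln k can be generated by a short backward recursion. The sketch below (my own code, with illustrative parameter values) verifies that the implied savings rates match the closed-form expression above and converge to \alpha\beta as the horizon grows.

```python
import numpy as np

# A small numerical check (my own sketch, illustrative parameter values). Guess
# V_S(k) = A_S + D_S * ln(k); then D_0 = alpha, D_S = alpha * (1 + beta * D_{S-1}),
# and the savings rate with S periods to go is s_S = beta * D_{S-1} / (1 + beta * D_{S-1}),
# i.e. k' = s_S * k**alpha. The loop verifies that this matches the closed form above
# and that s_S approaches alpha*beta as the horizon grows.

alpha, beta, T = 0.3, 0.95, 10
ab = alpha * beta

D = alpha                                        # ln(k) coefficient with 0 periods to go
for S in range(1, T + 1):
    s_S = beta * D / (1 + beta * D)              # optimal savings rate, S periods to go
    assert np.isclose(s_S, ab * (1 - ab ** S) / (1 - ab ** (S + 1)))
    D = alpha * (1 + beta * D)                   # update the ln(k) coefficient
print("savings rate with", T, "periods to go:", s_S, "limit:", ab)
```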

Infinite Horizon Dynamic Programming


We now extend the problem so that it goes on forever (there is no final period). In some ways this problem will be easier to solve, and in others it will be harder to solve, than the finite horizon problem. On the one hand it will be easier because we will face the same problem over and over again and will need to solve for a single value function rather than value functions for each period. It will be harder because we have no final period to start from. It turns out that this will not slow us down much.

The structure of the problem is similar to the finite horizon problem except now individuals maximize the discounted sum of the instantaneous return function (utility) over an infinite horizon: t = 0, 1, 2, ..., \infty (time is discrete)

V(x_0) = \max_{\{a_t\}_{t=0}^{\infty}} E_0 \sum_{t=0}^{\infty} \beta^t u(x_t, a_t)

x_0 given
a_t \in \Gamma(x_t)
x_{t+1} = f(x_t, a_t, \varepsilon_t) \quad (Law of Motion)
\varepsilon_t is a random variable
F(\varepsilon | x, a) = \Pr(\varepsilon_t \le \varepsilon \mid x_t = x, a_t = a)

At date t = s the value function is:

V_s(x_s) = \max_{\{a_t\}_{t=s}^{\infty}} E_s \sum_{t=s}^{\infty} \beta^{t-s} u(x_t, a_t)

a_t \in \Gamma(x_t)
x_{t+1} = f(x_t, a_t, \varepsilon_t)

The environment is stationary (u, f, F, \beta, etc.) and the problem is the same at each point in time. Consequently, the value function and optimal policy are stationary (if they exist), so we can drop the time subscripts and write the value function and the optimal policy mapping as V(x) and \pi(x), respectively.

Write Bellman's equation as:

(3) \quad V(x) = \max_{a \in \Gamma(x)} \left\{ u(x, a) + \beta E V[f(x, a, \varepsilon)] \right\}

where the value function, V(x), and decision rule, \pi(x), solve the functional equation (i.e. the solution is a function).
Traditionally there are three methods that people use to solve this problem: 1) guess and check (undetermined coefficients), 2) successive approximations, and 3) policy improvement.
The idea behind guessing is quite simple: conjecture that a specific function V_0(x) is the solution, then substitute it into equation (3) and solve

V_1(x) = \max_{a \in \Gamma(x)} \left\{ u(x, a) + \beta E V_0[f(x, a, \varepsilon)] \right\}

If V_1(x) = V_0(x) then you have the correct answer and are done. If V_1(x) \neq V_0(x), guess again. Clearly this method seems a bit haphazard, which led people to the next method: successive approximations.

Example 5. One sector growth model with infinite horizon^{15}

V(k_0) = \max_{\{k_{t+1}\}_{t=0}^{\infty}} \sum_{t=0}^{\infty} \beta^t \ln(k_t^\alpha - k_{t+1})

subject to: \quad 0 \le k_{t+1} \le k_t^\alpha, \quad t = 0, 1, 2, ..., \qquad k_0 > 0

Let's set up the problem:

V(k) = \max_{k' \in [0, k^\alpha]} \left\{ \ln(k^\alpha - k') + \beta V(k') \right\}

where the prime denotes tomorrow's variable.

Guess

V_0(k) = A + D \ln k

Three steps:

^{15} This is the problem studied by Brock and Mirman (1972).

1. Solve the maximization problem given the guess for V:

V_1(k) = \max_{0 \le k' \le k^\alpha} \left\{ \ln(k^\alpha - k') + \beta (A + D \ln k') \right\}

FOC: \quad \frac{1}{k^\alpha - k'} = \frac{\beta D}{k'} \quad \Rightarrow \quad k' = \frac{\beta D}{1 + \beta D} k^\alpha

2. Evaluate the RHS at the optimum k' = \frac{\beta D}{1 + \beta D} k^\alpha:

RHS = \ln(k^\alpha - k') + \beta (A + D \ln k')
= \ln\left( \frac{k^\alpha}{1 + \beta D} \right) + \beta A + \beta D \ln\left( \frac{\beta D}{1 + \beta D} k^\alpha \right)
= -\ln(1 + \beta D) + \beta A + \beta D \ln\frac{\beta D}{1 + \beta D} + \alpha \ln k + \alpha \beta D \ln k

3. Set LHS = RHS and solve for the coefficients:

A + D \ln k = -\ln(1 + \beta D) + \beta A + \beta D \ln\frac{\beta D}{1 + \beta D} + \alpha \ln k + \alpha \beta D \ln k

A = -\ln(1 + \beta D) + \beta A + \beta D \ln\frac{\beta D}{1 + \beta D}

D = \alpha + \alpha \beta D \quad \Rightarrow \quad D = \frac{\alpha}{1 - \alpha \beta}

We can now take D and plug it back into the optimal policy to solve for the policy function k' = \pi(k):

k' = \frac{\beta D}{1 + \beta D} k^\alpha = \frac{\frac{\alpha \beta}{1 - \alpha \beta}}{1 + \frac{\alpha \beta}{1 - \alpha \beta}} k^\alpha = \alpha \beta k^\alpha

With D in hand, we can then solve for the constant

A = \frac{1}{1 - \beta} \left[ \ln(1 - \alpha \beta) + \frac{\alpha \beta}{1 - \alpha \beta} \ln(\alpha \beta) \right]

We can then use the policy function to completely describe a sequence of capital stocks \{k_t\}_{t=0}^{\infty} starting from k_0:

k_1 = \pi(k_0) = \alpha \beta k_0^\alpha

k_2 = \pi(k_1) = \alpha \beta k_1^\alpha = \alpha \beta (\alpha \beta k_0^\alpha)^\alpha = (\alpha \beta)^{1+\alpha} k_0^{\alpha^2}
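A quick numerical sanity check of the guess-and-verify solution (my own sketch, with illustrative parameter values): plug the closed-form A and D back into the right-hand side of Bellman's equation and confirm that the maximized value reproduces V(k) and the maximizer reproduces k' = \alpha \beta k^\alpha.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# A quick sanity check (my own sketch, illustrative parameters): with the closed
# form V(k) = A + D*ln(k) above, the maximized right-hand side of Bellman's
# equation should reproduce V(k) itself, and the maximizer should be
# k' = alpha*beta*k**alpha.

alpha, beta = 0.3, 0.95
D = alpha / (1 - alpha * beta)
A = (np.log(1 - alpha * beta)
     + alpha * beta / (1 - alpha * beta) * np.log(alpha * beta)) / (1 - beta)
V = lambda k: A + D * np.log(k)

for k in (0.1, 0.5, 1.0):
    res = minimize_scalar(lambda kp: -(np.log(k ** alpha - kp) + beta * V(kp)),
                          bounds=(1e-9, k ** alpha - 1e-9), method="bounded")
    assert np.isclose(-res.fun, V(k), atol=1e-6)                    # LHS = RHS
    assert np.isclose(res.x, alpha * beta * k ** alpha, atol=1e-4)  # optimal policy
print("the closed form satisfies Bellman's equation")
```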

The guess and verify approach works well in this case. But this is because the primitives were rigged to work.^{16} More generally it does not work, so we often work with the successive approximations approach.

The starting point for the method of successive approximations is the answer from an unsuccessful guess. In particular, the method of successive approximations says to use the new answer V_1(x) as the second guess. This yields a V_2(x). If V_2(x) = V_1(x) then we are done; if not, we use V_2(x) as the new guess. In this way we generate a sequence \{V_n(x), \pi_n(x)\} and the hope is that this sequence eventually reaches a fixed point such that V_n(x) = V(x) and \pi_n(x) = \pi(x), or that the sequence V_n(x) (\pi_n(x)) converges to the true V(x) (\pi(x)) as n \to \infty. This approach essentially introduces greater flexibility into the decision rules by considering a one-period deviation from some rule. This one-shot deviation must offer a weak improvement, so that V_{n+1} \ge V_n, and is a way of introducing extra flexibility into the solution.

In solving the finite horizon problem we made use of the last period in order to solve the problem. When time is infinite we have no last period to start from and work backwards. Yet under certain circumstances we are able to use the methods from the finite horizon problem to solve the infinite horizon problem. In this case we can think of the solution to the infinite horizon problem as the limit of a sequence of finite problems with the horizon becoming progressively longer (T \to \infty). This approach is a specific application of the method of successive approximations when the initial guess is V_0 = 0.

Furthermore, when the horizon is sufficiently long, we can also sometimes approximate the solution to the finite horizon problem with the infinite horizon solution.

^{16} Guess and verify generally works only with specifications with quadratic preferences and linear constraints or Cobb-Douglas constraints and log preferences. Some further examples are in Hercowitz & Sampson (1991), Benhabib & Rustichini (1994), and Antony and Maussner (2007).
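Successive approximations is straightforward to implement on a grid. The sketch below (my own illustration of value function iteration for the Brock-Mirman example, with illustrative parameters) starts from the guess V_0 = 0 and iterates on Bellman's equation until the value function stops changing.

```python
import numpy as np

# A minimal sketch of successive approximations / value function iteration
# (my own illustration for the Brock-Mirman example, illustrative parameters).
# Start from the guess V_0 = 0 on a grid of capital stocks and iterate on
# Bellman's equation until the value function stops changing; the resulting
# policy should be close to the closed form k' = alpha*beta*k**alpha.

alpha, beta = 0.3, 0.95
grid = np.linspace(1e-3, 1.0, 300)

c = grid[:, None] ** alpha - grid[None, :]                      # c(k, k') for all pairs
U = np.where(c > 0, np.log(np.maximum(c, 1e-12)), -np.inf)      # one-period return

V = np.zeros(grid.size)                                         # initial guess V_0 = 0
for n in range(2000):
    objective = U + beta * V[None, :]                           # u(k, k') + beta*V_n(k')
    V_new = objective.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:                        # (approximate) fixed point
        break
    V = V_new

policy = grid[objective.argmax(axis=1)]                         # k'(k) on the grid
print(policy[150], "vs closed form", alpha * beta * grid[150] ** alpha)
```

Because the Bellman operator is a contraction with modulus \beta on bounded functions, successive iterates shrink the sup-norm distance geometrically, which is why the simple stopping rule above is sensible.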

The third method to solve these problems is to iterate on the policy function. It consists of three steps. First, pick a feasible policy \pi_0(x) and compute the value of following this policy forever,

V_0(x) = \sum_{t=0}^{\infty} \beta^t u(x_t, \pi_0(x_t)) \quad \text{with } x_{t+1} = f(x_t, \pi_0(x_t)) \text{ and } x_0 = x.

Second, generate a new policy a = \pi_1(x) that solves the two period problem

\max_a \left\{ u(x, a) + \beta V_0(f(x, a)) \right\}

for each x. Then form

V_1(x) = \sum_{t=0}^{\infty} \beta^t u(x_t, \pi_1(x_t)) \quad \text{with } x_{t+1} = f(x_t, \pi_1(x_t)) \text{ and } x_0 = x.

Third, iterate over j until the policy rule converges.
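A matching sketch of policy improvement for the same example (again my own illustration, not from the notes): each pass evaluates the current policy exactly by solving a linear system over grid indices, then improves it with a one-shot deviation, stopping when the policy no longer changes.

```python
import numpy as np

# A minimal policy-iteration sketch (my own illustration, same example and
# parameters as above). Evaluate the current policy exactly by solving the
# linear system V = u_pi + beta * P * V, then improve it with a one-shot
# deviation; stop when the policy no longer changes.

alpha, beta = 0.3, 0.95
grid = np.linspace(1e-3, 1.0, 300)
n = grid.size

c = grid[:, None] ** alpha - grid[None, :]
U = np.where(c > 0, np.log(np.maximum(c, 1e-12)), -np.inf)      # u(k, k')

policy = np.zeros(n, dtype=int)                                 # start: save (almost) nothing
for _ in range(100):
    # policy evaluation: V(k) = u(k, pi(k)) + beta * V(pi(k)) is linear in V
    P = np.zeros((n, n))
    P[np.arange(n), policy] = 1.0
    V = np.linalg.solve(np.eye(n) - beta * P, U[np.arange(n), policy])
    # policy improvement: a one-shot deviation against the evaluated V
    new_policy = (U + beta * V[None, :]).argmax(axis=1)
    if np.array_equal(new_policy, policy):                      # converged
        break
    policy = new_policy

print(grid[policy[150]], "vs closed form", alpha * beta * grid[150] ** alpha)
```

Policy iteration typically converges in far fewer passes than value function iteration because each evaluation step solves for the value of the current policy exactly rather than improving it by one application of the Bellman operator.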
