Dynamic Optimization: A Tool Kit

Manuel Wälti

This draft: September 2002
Contents

1 Introduction
2 Optimal control
  2.1 Discrete time
    2.1.1 Finite horizon
    2.1.2 Infinite horizon
  2.2 Continuous time
    2.2.1 Finite horizon
    2.2.2 Infinite horizon with discounting
  2.3 Digression: Continuous versus discrete time
3 Dynamic programming
  3.1 Discrete time - deterministic setting
    3.1.1 Finite horizon
    3.1.2 Infinite horizon
  3.2 Continuous time
1 Introduction

Consider the following problem.1 A girl is given a cake and has to decide how much of it to eat on each day. Formally, she chooses c_0, c_1, ..., c_T so as to maximize

V = Σ_{t=0}^{T} β^t u(c_t)

subject to

k_{t+1} - k_t = -c_t

where

k_0 given
k_{T+1} ≥ 0
c_t is the amount of cake consumed in period t (in the context of our dynamic optimization problem, c_t represents the control variable). c_t yields instantaneous utility u(c_t), where u(c_t) satisfies the usual assumptions, i.e., u'(c_t) > 0 and u''(c_t) < 0. Future consumption is discounted with the discount factor β, where 0 ≤ β ≤ 1. The present value in period 0 of the whole consumption path equals V. (The time-separable lifetime utility, V, represents the criterion function of our dynamic optimization problem.) T is the last day with consumption. In our story T is 9, as today is 0. The cake size (which represents the state variable in our dynamic optimization problem) is denoted by k. A first constraint governs the evolution of the state variable: the cake size in period t+1 is the previous size less the previous consumption. The original size of the cake, k_0, is given. A final constraint requires that the cake size at the terminal date must be nonnegative.
The girl's problem now is to determine the optimal path of c_t. As it stands, this problem could be solved numerically, e.g. with the help of the solver in Excel. However, it may also be solved analytically. The remainder of this handout provides you with the most relevant tools to do so, namely optimal control and dynamic programming. It is written in the style of a cookbook and explicitly does not deal with the very advanced mathematics behind dynamic optimization.

This handout is mainly based on King [3], Barro and Sala-i-Martin [1], and Leonard and Van Long [2].

1 The following parable is taken from Kurt Schmidheiny and Manuel Wälti, Doing economics with the computer, Session 5.
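To make the remark about numerical solutions concrete, here is a minimal Python sketch that solves the cake eating problem with an off-the-shelf optimizer instead of the Excel solver. The functional form u(c) = ln(c) and the parameter values β = 0.9 and k_0 = 1 are assumptions chosen purely for illustration (T = 9 as in the story above).

# Numerical solution of the cake eating problem.
# Illustrative assumptions: u(c) = ln(c), beta = 0.9, k0 = 1, T = 9.
import numpy as np
from scipy.optimize import minimize

beta, k0, T = 0.9, 1.0, 9

def neg_lifetime_utility(c):
    # V = sum_t beta^t u(c_t); the optimizer minimizes, so return -V.
    return -np.sum(beta ** np.arange(T + 1) * np.log(c))

# Feasibility: total consumption cannot exceed the initial cake, c_t >= 0.
constraints = [{"type": "ineq", "fun": lambda c: k0 - np.sum(c)}]
bounds = [(1e-9, k0)] * (T + 1)

res = minimize(neg_lifetime_utility, x0=np.full(T + 1, k0 / (T + 1)),
               bounds=bounds, constraints=constraints)
print(res.x)  # optimal consumption path c_0, ..., c_T

The numerical path can then be checked against the analytical conditions derived below.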
2 Optimal control

2.1 Discrete time
We all know the standard method of optimization with constraints, the Kuhn-Tucker Theorem.2 To solve a dynamic optimization problem we basically apply the same theorem.
2.1.1 Finite horizon
The typical problem that we want to solve takes the following form. An agent chooses or controls a number of control variables, c_t,3 so as to maximize an objective function subject to some constraints. These constraints are dynamic in that they describe the evolution of the state of the economy, as represented by a set of endogenous state variables which we denote by k_t. The evolution of these endogenous state variables is affected through the economic agent's choice of the control variables; they are also influenced by the variation in some exogenous state variables, x_t.
At the heart of this dynamic system are the equations describing the dynamic behavior of the states; they take the form

k_{t+1} - k_t = g(c_t, k_t, x_t)

We write these so-called accumulation equations as involving changes in the state variables because this eases the conversion to continuous time in our discussion below. We assume that the initial values of the state variables, k_0, are given. Moreover, there are terminal conditions on the state variables, which take the form

k_{T+1} ≥ k̄
The criterion (or objective) function is assumed to be a discounted sequence of flow returns, u(c_t, k_t, x_t), which can represent profits, utilities, and so on. It takes the form

V = Σ_{t=0}^{T} β^t u(c_t, k_t, x_t)

To solve this problem we form the Lagrangian

L = Σ_{t=0}^{T} β^t u(c_t, k_t, x_t) + Σ_{t=0}^{T} β^t λ_t [g(c_t, k_t, x_t) + k_t - k_{t+1}] + β^{T+1} λ_{T+1} [k_{T+1} - k̄]
2 If not: an excellent treatment can be found in the mathematical appendix to Mas-Colell, Whinston, and Green (1995).
3 c_t can be considered as a vector.
where λ_t denotes current valued multipliers (or current valued co-state variables). With the choice of present valued multipliers, μ_t, the Lagrangian takes the form

L = Σ_{t=0}^{T} β^t u(c_t, k_t, x_t) + Σ_{t=0}^{T} μ_t [g(c_t, k_t, x_t) + k_t - k_{t+1}] + μ_{T+1} [k_{T+1} - k̄]

To change from one concept to the other we use the following transformation of the co-states

β^t λ_t = μ_t
β^{T+1} λ_{T+1} = μ_{T+1}
Let's look at the case of current valued multipliers and let's write ∂u(c_t, k_t, x_t)/∂c_t as ∂u_t/∂c_t etc. For the FOCs we differentiate the Lagrangian with respect to the controls, c_t, the states and the co-states:
∂L/∂c_t = 0 = β^t ∂u_t/∂c_t + β^t λ_t ∂g_t/∂c_t   (1)

∂L/∂k_{t+1} = 0 = -β^t λ_t + β^{t+1} λ_{t+1} [∂g_{t+1}/∂k_{t+1} + 1] + β^{t+1} ∂u_{t+1}/∂k_{t+1}   (2)

∂L/∂k_{T+1} = 0 = -β^T λ_T + β^{T+1} λ_{T+1}   (3)

∂L/∂(β^t λ_t) = 0 = g(c_t, k_t, x_t) + k_t - k_{t+1}   (4)
The conditions in (1) and (4) hold for t = 0, 1, ..., T, while those in (2) hold for t = 0, 1, ..., T - 1. Moreover, the complementary slackness conditions have to be met:

∂L/∂(β^{T+1} λ_{T+1}) = k_{T+1} - k̄ ≥ 0

β^{T+1} λ_{T+1} ∂L/∂(β^{T+1} λ_{T+1}) = β^{T+1} λ_{T+1} [k_{T+1} - k̄] = 0

The complementary slackness conditions say that if the value of a given state variable at the terminal date exceeds its required terminal value (i.e., k_{T+1} - k̄ > 0), then its current valued shadow price must be zero. Alternatively, if its current valued shadow price at the terminal date is positive, then the agent must leave k_{T+1} = k̄.
Application: Cake eating problem in discrete time   Consider the following simplified version of the cake eating problem mentioned in Section 1. The girl chooses c_0, c_1, ..., c_T that maximize

V = Σ_{t=0}^{T} u(c_t)

subject to

k_{t+1} - k_t = -c_t
k_0 given
k_{T+1} ≥ 0

Note that in contrast to the story above we assume here that the girl isn't impatient, so that β = 1 (i.e., there is no discounting). There is one control (c_t), one endogenous state (k_t), and no exogenous state variable. Furthermore, u(c_t, k_t, x_t) = u(c_t) and g(c_t, k_t, x_t) = -c_t. Thus, the optimality conditions are given by
∂L/∂c_t = 0 = ∂u_t/∂c_t - λ_t

∂L/∂k_{t+1} = 0 = -λ_t + λ_{t+1}

∂L/∂k_{T+1} = 0 = -λ_T + λ_{T+1}

∂L/∂λ_t = 0 = -c_t + k_t - k_{t+1}

∂L/∂λ_{T+1} = k_{T+1} - k̄ ≥ 0

λ_{T+1} ∂L/∂λ_{T+1} = λ_{T+1} [k_{T+1} - k̄] = 0
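A brief sketch of how these conditions pin down the solution (here k̄ = 0, as the problem requires only k_{T+1} ≥ 0): the condition on k_{t+1} gives λ_t = λ_{t+1}, so the shadow price is a constant λ, and the condition on c_t then implies u'(c_t) = λ for all t, i.e. consumption is constant. Since u' > 0 we have λ > 0, so complementary slackness forces k_{T+1} = 0: the cake is exactly used up. Summing the accumulation equation over t = 0, ..., T yields

(T + 1) c = k_0,   hence   c_t = k_0 / (T + 1)   for all t.

The girl simply divides the cake into T + 1 equal pieces.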
2.1.2 Infinite horizon

Most models considered in economics involve economic agents with infinite planning horizons. The typical problem takes the form

max_{c_t} Σ_{t=0}^{∞} β^t u(c_t, k_t, x_t)

subject to

k_{t+1} - k_t = g(c_t, k_t, x_t)
k_0 given
lim_{t→∞} β^t λ_t k_t = 0
where λ_t, as before, is the current valued multiplier. The terminal condition now says that k_t can be negative and grow forever in magnitude, as long as its rate of growth is smaller than the rate at which β^t λ_t goes to zero; it is called the transversality condition.4 Benveniste and Scheinkman (1979)5 provide conditions under which this transversality condition is in fact necessary for an optimum.

The Lagrangian is given by

L = Σ_{t=0}^{∞} β^t u(c_t, k_t, x_t) + Σ_{t=0}^{∞} β^t λ_t [g(c_t, k_t, x_t) + k_t - k_{t+1}]

and the FOCs, which now hold for all t = 0, 1, 2, ..., are

0 = β^t ∂u_t/∂c_t + β^t λ_t ∂g_t/∂c_t

0 = -β^t λ_t + β^{t+1} λ_{t+1} [∂g_{t+1}/∂k_{t+1} + 1] + β^{t+1} ∂u_{t+1}/∂k_{t+1}

0 = g(c_t, k_t, x_t) + k_t - k_{t+1}
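As a short illustration of how the multiplier can be eliminated, consider the common special case where the accumulation equation has the form k_{t+1} - k_t = f(k_t) - c_t (so that ∂g_t/∂c_t = -1) and momentary utility does not depend on the state, ∂u_t/∂k_t = 0 (both are assumptions used only for this sketch). The first FOC gives λ_t = ∂u_t/∂c_t, and substituting this into the second FOC yields the familiar Euler equation

∂u_t/∂c_t = β ∂u_{t+1}/∂c_{t+1} [1 + f'(k_{t+1})]

which equates marginal utility today to the discounted, return-adjusted marginal utility tomorrow.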
Application: The neoclassical growth model   The benevolent social planner's problem is

max Σ_{t=0}^{∞} β^t u(C_t, 1 - N_t)

subject to

K_{t+1} - K_t = A F(K_t, N_t) - C_t - δ K_t
K_0 given

The Lagrangian is

L = Σ_{t=0}^{∞} β^t u(C_t, 1 - N_t) + Σ_{t=0}^{∞} β^t λ_t [A F(K_t, N_t) - C_t - δ K_t + K_t - K_{t+1}]

The optimality conditions are given by the FOCs (due to the convexity of the problem these are also sufficient)

∂L/∂C_t = 0 = β^t u_1(C_t, 1 - N_t) - β^t λ_t

∂L/∂N_t = 0 = -β^t u_2(C_t, 1 - N_t) + β^t λ_t A ∂F(K_t, N_t)/∂N_t

∂L/∂K_{t+1} = 0 = -β^t λ_t + β^{t+1} λ_{t+1} [A ∂F(K_{t+1}, N_{t+1})/∂K_{t+1} + 1 - δ]

∂L/∂(β^t λ_t) = 0 = A F(K_t, N_t) - C_t + (1 - δ) K_t - K_{t+1}

plus the boundary conditions, which are given by the initial capital stock, K_0, and the transversality condition, lim_{t→∞} β^t λ_t K_{t+1} = 0.
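As a quick sketch of what these conditions deliver, combine the FOC for C_t with the FOC for K_{t+1} to eliminate the multiplier (λ_t = u_1(C_t, 1 - N_t)):

u_1(C_t, 1 - N_t) = β u_1(C_{t+1}, 1 - N_{t+1}) [A ∂F(K_{t+1}, N_{t+1})/∂K_{t+1} + 1 - δ]

In a steady state with constant consumption and capital this reduces to 1 = β [A ∂F(K*, N*)/∂K* + 1 - δ], which pins down the steady state capital intensity.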
2.2 Continuous time

2.2.1 Finite horizon

The typical problem now takes the form

max V = ∫_0^T u[c(t), k(t), x(t)] dt   (5)

s.t.

dk(t)/dt = g[c(t), k(t), x(t)]
k(0) = k_0 > 0 given
k(T) ≥ k̄

As in the discrete time setting, equation (5) is called the criterion (or objective) function. The expression for dk(t)/dt is called the accumulation (or transition) equation. Next we have the initial condition and the final constraint. For simplicity let's assume that there is just one control, one endogenous state, and one exogenous state, although a multitude of these variables could readily be included.
Digression on the discount rate in continuous time   The discount factor in discrete time is β^t, where β = 1/(1 + ρ). ρ denotes the rate of time preference; it expresses the impatience of an economic agent. In continuous time, β needs to be broken down to ρ since time does not jump in units of 1 anymore. This can be done as follows:

β^t = [1/(1 + ρ)]^t = e^{-t ln(1 + ρ)} ≈ e^{-ρt}

since ln(1 + ρ) ≈ ρ for small ρ. Discounting in continuous time is therefore done with the factor e^{-ρt}.

The cookbook procedure for solving the problem stated above is as follows.

1. Set up the Hamiltonian

H = u[c(t), k(t), x(t)] + λ(t) g[c(t), k(t), x(t)]

where λ(t) denotes the co-state variable (the multiplier).

2. Take the derivative of the Hamiltonian w.r.t. the control variable and set it to 0

∂H/∂c(t) = 0   (6)

3. Take the derivative of the Hamiltonian w.r.t. the state variable (the variable that appears in the differential equation above) and set it to equal the negative of the derivative of the multiplier w.r.t. time

∂H/∂k(t) = -dλ(t)/dt   (7)

4. Take the derivative of the Hamiltonian w.r.t. the co-state variable and set it to the derivative of the state variable w.r.t. time

∂H/∂λ(t) = dk(t)/dt   (8)

5. Transversality condition: Set the product of the shadow price and the state variable at the end of the planning horizon8 to 0

λ(T) [k(T) - k̄] = 0

If we combine equations (6) and (7) with equation (8) (which represents nothing else than the transition equation), then we can form a system of two differential equations in the variables λ and k. The final step is to find a solution to this differential equation system. For an illustrative example compare Barro and Sala-i-Martin [1], Appendix on mathematical methods, section 1.3.8.
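The following minimal Python sketch illustrates what solving such a two-equation system can look like numerically, for the simplest possible specification: the cake eating problem from Section 1 with u(c) = ln c, dk/dt = -c, no discounting, k_0 = 1 and T = 10 (all of these are assumptions made only for this illustration). Equation (6) gives c(t) = 1/λ(t), equation (7) gives dλ/dt = 0, and equation (8) gives dk/dt = -1/λ; the initial shadow price λ(0) is then chosen by shooting so that the terminal condition k(T) = 0 holds.

# Shooting on the (k, lambda) system for the continuous time cake eating
# problem, assuming u(c) = ln c, dk/dt = -c, k(0) = 1, T = 10, k(T) = 0.
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import brentq

k0, T = 1.0, 10.0

def terminal_cake(lam0):
    # Integrate dk/dt = -1/lambda, dlambda/dt = 0 from 0 to T.
    def rhs(t, y):
        k, lam = y
        return [-1.0 / lam, 0.0]
    sol = solve_ivp(rhs, (0.0, T), [k0, lam0], rtol=1e-8)
    return sol.y[0, -1]  # k(T) for this guess of lambda(0)

lam_star = brentq(terminal_cake, 1e-3, 1e3)   # choose lambda(0) so that k(T) = 0
print(lam_star, 1.0 / lam_star)               # shadow price and constant c(t)

The shooting result reproduces the analytical solution c(t) = k_0/T derived below; for richer models the same logic applies, only the integrated system is larger.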
When the criterion function involves discounting, i.e. when the flow returns enter as e^{-ρt} u[c(t), k(t), x(t)], it is often more convenient to work with the current value Hamiltonian and the current valued co-state μ(t), which is related to the present valued co-state λ(t) by

λ(t) = μ(t) e^{-ρt}

The cookbook steps are then modified as follows.

1. Set up the current value Hamiltonian

H̃ = u[c(t), k(t), x(t)] + μ(t) g[c(t), k(t), x(t)]

2. Take the derivative of the Hamiltonian w.r.t. the control variable and set it to 0

∂H̃/∂c(t) = 0   (9)

3. Take the derivative of the Hamiltonian w.r.t. the state variable (the variable that appears in the differential equation above) minus ρμ(t) and set the sum of the two terms to equal the negative of the derivative of the multiplier w.r.t. time

∂H̃/∂k(t) - ρμ(t) = -dμ(t)/dt   (10)

4. Take the derivative of the Hamiltonian w.r.t. the co-state variable and set it to the derivative of the state variable w.r.t. time, ∂H̃/∂μ(t) = dk(t)/dt.

5. Transversality condition: Set the product of the current valued shadow price and the state variable at the end of the planning horizon9 to 0, μ(T) [k(T) - k̄] = 0.

8 Generally, the transversality condition implies the following behavior for the co-state:
Case 1: k(T) = k̄, λ(T) free
Case 2: k(T) ≥ k̄, λ(T) ≥ 0; if k(T) > k̄ then λ(T) = 0
Case 3: k(T) free, λ(T) = 0
Application: Cake eating problem in continuous time   Consider the continuous time version of the cake eating problem: the girl chooses c(t) to maximize

V = ∫_0^T u[c(t)] dt

subject to dk(t)/dt = -c(t), k(0) = k_0 given, and k(T) ≥ 0. Note that in contrast to the story in Section 1 we assume here that the household is not impatient and, hence, e^{-ρt} = e^{-0t} = 1. To solve this problem let's make use of the cookbook procedure given above. Step 1 leads to the Hamiltonian

H = u[c(t)] - λ(t) c(t)
9 Generally, the transversality condition implies the following behavior for the co-state:
Case 1: k(T) = k̄, μ(T) free
Case 2: k(T) ≥ k̄, μ(T) ≥ 0; if k(T) > k̄ then μ(T) = 0
Case 3: k(T) free, μ(T) = 0
with identical conditions for k(0).
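A brief sketch of how the remaining cookbook steps play out for this Hamiltonian: step 2 gives ∂H/∂c(t) = u'[c(t)] - λ(t) = 0. Step 3 gives ∂H/∂k(t) = 0 = -dλ(t)/dt, so λ(t) is a constant and, by step 2, consumption c(t) is constant as well. Step 4 recovers the transition equation dk(t)/dt = -c(t). Since u' > 0 implies λ > 0, the transversality condition (step 5) forces k(T) = 0, and integrating the transition equation then gives c(t) = k_0/T: as in the discrete case, the girl spreads the cake evenly over the whole horizon.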
2.2.2 Infinite horizon with discounting

In case of an infinite horizon with discounting we can apply the same procedure as for the finite horizon except that we change the transversality condition to

lim_{t→∞} λ(t) k(t) = 0   resp.   lim_{t→∞} e^{-ρt} μ(t) k(t) = 0

This means, again, that the value of the capital stock must be asymptotically 0, otherwise something valuable would be left over: If the quantity, k(t), remains positive asymptotically, then the price, λ(t), must approach 0 asymptotically. If k(t) grows forever at a positive rate, then the price λ(t) must approach 0 at a faster rate so that the product, λ(t) k(t), goes to 0.
Application: The neoclassical growth model with fixed labor supply   Consider the following continuous time model

max U = ∫_0^∞ e^{-ρt} [C(t)^{1-θ} / (1 - θ)] dt

s.t.

dK(t)/dt = A K(t)^{1-α} N^{α} - C(t) - δ K(t)
K(0) = K_0 > 0 given
To solve this problem let's make use of our cookbook procedure. Step 1 leads to the current value Hamiltonian

H̃ = C(t)^{1-θ} / (1 - θ) + μ(t) [A K(t)^{1-α} N^{α} - C(t) - δ K(t)]

where θ is supposed to be > 0. Step 2 leads to the condition

∂H̃/∂C(t) = C(t)^{-θ} - μ(t) = 0

Step 3 leads to the condition

∂H̃/∂K(t) - ρ μ(t) = μ(t) [(1 - α) A K(t)^{-α} N^{α} - δ] - ρ μ(t) = -dμ(t)/dt
Step 4 leads to the condition

∂H̃/∂μ(t) = A K(t)^{1-α} N^{α} - C(t) - δ K(t) = dK(t)/dt

Finally, the transversality condition (step 5) is given by

lim_{t→∞} e^{-ρt} μ(t) K(t) = 0
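As a sketch of what these conditions imply (a standard manipulation, added here for illustration): differentiating the step 2 condition C(t)^{-θ} = μ(t) with respect to time gives -θ C(t)^{-θ-1} dC(t)/dt = dμ(t)/dt, and dividing by μ(t) = C(t)^{-θ} yields [dμ(t)/dt]/μ(t) = -θ [dC(t)/dt]/C(t). Substituting this into the step 3 condition gives the familiar consumption growth (Keynes-Ramsey) rule

[dC(t)/dt] / C(t) = (1/θ) [(1 - α) A K(t)^{-α} N^{α} - δ - ρ]

Together with the step 4 condition this forms the system of two differential equations in C(t) and K(t) whose qualitative dynamics can be studied with a phase diagram (compare Barro and Sala-i-Martin [1]).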
2.3 Digression: Continuous versus discrete time

Given that the two methods are closely related, it is interesting to ask why they are both used. Continuous and discrete time differ in the following important ways:
Phase planes: when a dynamic model is stated in continuous time one can study the qualitative dynamics (of the resulting system of differential equations) using certain graphical techniques that are not available in discrete time.

Culture: some groups of economists learned one way and others learned another, with persisting differences.

Solutions: whether a particular model is stated in continuous time or discrete time may lead to different solutions, i.e. there may be mathematical differences in the respective solutions.

Closed form solutions: a closed form solution is a solution that can be arrived at by solving an equation or a set of equations, as opposed to the use of numerical methods. There are different forms of closed form solutions in discrete and continuous time. For example, in the basic growth model there is a discrete time closed form for the case with log utility and complete depreciation. In the continuous time model, there is a closed form which has other restrictions.
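To illustrate the discrete time closed form mentioned above (a well-known example, stated here only for illustration): with log utility u(c) = ln c, Cobb-Douglas production and complete depreciation, so that k_{t+1} = A k_t^{α} - c_t, and discount factor β, the optimal policy is

c_t = (1 - αβ) A k_t^{α},   k_{t+1} = αβ A k_t^{α}

i.e. a constant fraction αβ of output is saved every period.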
3 Dynamic programming

Consider again the problem

max Σ_{t=0}^{T} β^t u(c_t, k_t, x_t)   (11)

subject to

k_{t+1} - k_t = g(c_t, k_t, x_t)
k_0 given
k_{T+1} ≥ 0
Dynamic programming exploits two fundamental properties of this type of problem, namely separability and additivity over time periods. More precisely,

for any t, the functions u_t and g_t depend on t and on the current state and control variables, but not on their past or future values;

the maximand V is the sum of the net momentary utilities.

Using these two properties, Bellman (1957) enunciates an important theorem about the nature of any optimal solution of problem (11). This theorem is known as the principle of optimality. Roughly speaking, it says that an optimal policy has the property that at any stage t, the remaining decisions c_t, c_{t+1}, ..., c_T must be optimal with regard to the current state k_t, which results from the initial state k_0 and the earlier decisions c_0, c_1, ..., c_{t-1}. This property is obviously sufficient for optimality since we require it to hold for all t: when we put t = 0, we have the definition of an optimal policy. Furthermore, the property is also necessary, since any deviation from the optimal policy, even in the last period, is clearly suboptimal.

It was left to Bellman's genius to transform this rather trite, nearly tautological observation into an efficient method of solution. We now state the result formally.
3.1 Discrete time - deterministic setting

3.1.1 Finite horizon

The value of the problem at date t, as a function of the endogenous state k_t, the exogenous states θ_t, and the age a_t, satisfies the Bellman equation

V(k_t, θ_t, a_t) = max_{c_t, k_{t+1}} { u(c_t, k_t, x(θ_t)) + β V(k_{t+1}, θ_{t+1}, a_{t+1}) }   (12)

subject to

k_{t+1} - k_t = g(c_t, k_t, x(θ_t))   (13)
a_{t+1} = a_t + 1   (14)
x_t = x(θ_t)   (15)

with θ_{t+1} = m(θ_t). (12) is the Bellman Equation, (13) is the accumulation equation, (14) is the age equation, and (15) gives the law of motion of the exogenous variable x_t as a function of a set of exogenous state variables, θ_t, that evolve according to the - possibly nonlinear - difference equation system θ_{t+1} = m(θ_t). For an example compare the application of stochastic dynamic programming below.

Note that we have converted the many-period optimization problem given above into a two period optimization problem, which involves trading off between the current return u(c_t, k_t, x(θ_t)) and the future value V(k_{t+1}, θ_{t+1}, a_{t+1}).

To solve the problem, we begin at the terminal value and proceed by backward induction. This process is frequently called value iteration, as it involves taking an initial value function V(k_{t+1}, θ_{t+1}, a_{t+1}), finding the optimal level of the right hand side of the Bellman equation at each k_t, θ_t, and thereby constructing a new value function V(k_t, θ_t, a_t). We can also write now
V(k, θ, a) = max_{c, k'} { u(c, k, x(θ)) + β V(k', θ', a') }

s.t.

k' - k = g(c, k, x(θ))
a' = a + 1
θ' = m(θ)
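The backward induction / value iteration procedure described above is straightforward to implement on a grid. The following minimal Python sketch does this for the simple cake eating problem; u(c) = ln c, g = -c, β = 1 and a cake of size 1 are assumptions chosen for illustration, and since there are no exogenous states, θ and a are dropped and the time index itself drives the backward recursion.

# Value iteration (backward induction) for the finite horizon cake eating
# problem on a grid. Illustrative assumptions: u(c) = ln c, beta = 1.
import numpy as np

T, beta = 9, 1.0
grid = np.linspace(1e-6, 1.0, 501)        # grid for the cake size k
V = np.zeros(len(grid))                    # V_{T+1}(k) = 0 (terminal value)
policy = np.zeros((T + 1, len(grid)))

for t in range(T, -1, -1):                 # proceed backwards from t = T
    V_new = np.empty(len(grid))
    for i, k in enumerate(grid):
        kp = grid[grid <= k]               # feasible next period cake sizes
        c = np.maximum(k - kp, 1e-12)      # implied consumption c = k - k'
        values = np.log(c) + beta * np.interp(kp, grid, V)
        j = np.argmax(values)
        V_new[i] = values[j]
        policy[t, i] = k - kp[j]           # optimal consumption at (t, k)
    V = V_new

print(policy[0, -1])                       # c_0 for a full cake

With β = 1 the computed policy approximates the equal-division rule c_t = k_0/(T + 1) derived in Section 2.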
The first FOC then is the derivative of L with respect to the control, the second with respect to the state at time t + 1, and the third with respect to the Lagrange multiplier, yielding the constraint of state accumulation. To get an expression for ∂V(k', θ', a')/∂k' we need the envelope theorem (for a rigorous exposition compare the relevant literature):

∂V(k, θ, a)/∂k = ∂u(c, k, x(θ))/∂k + λ [∂g(c, k, x(θ))/∂k + 1]
Before plugging in, we change the subscripts since we need ∂V(k', θ', a')/∂k' of the subsequent period.

3.1.2 Infinite horizon
In the infinite horizon case the terminal condition is again replaced by a transversality condition. Apart from this new condition, infinite horizon optimization is not different from the finite horizon case. Bellman's principle of optimality at once tells us why. Consider any finite horizon subproblem with the initial and terminal conditions fixed by the larger problem. For the subproblem, the maximum principle conditions apply. But the initial and terminal times of the subproblem could be arbitrary, so the conditions must in fact hold for the entire range (0, ∞).
Application: Non-stochastic dynamic programming   We consider a non-stochastic baseline model of investment where firms face a perfectly elastic supply of capital goods and can adjust their capital stocks costlessly. Suppose that the profits of the firm can be written as
p(θ_t) f(k_t) - q(θ_t) i_t ≡ u(i_t, k_t, θ_t)

where p(θ_t) may be interpreted as an output price or a productivity shock; f(k_t) is a positive, increasing and strictly concave production function (i.e. we abstract from labor within the scope of this model); q(θ_t) is the investment good price; and i_t is the quantity of investment expenditure. The firm's capital accumulation will be described by

k_{t+1} - k_t = i_t - d k_t ≡ g(i_t, k_t)
The Bellman equation for the infinite horizon problem is

V(k_t, θ_t) = max_{i_t, k_{t+1}} { u(i_t, k_t, θ_t) + β V(k_{t+1}, θ_{t+1}) }

where maximization takes place subject to k_{t+1} - k_t = g(i_t, k_t) and the dynamic equations for the exogenous states follow θ_{t+1} = m(θ_t).

Since this is a constrained optimization problem, we may form the Lagrangian

L = { u(i_t, k_t, θ_t) + β V(k_{t+1}, θ_{t+1}) } + λ_t [g(i_t, k_t) - k_{t+1} + k_t]
The FOCs are

∂L/∂i_t = 0 = ∂u(i_t, k_t, θ_t)/∂i_t + λ_t ∂g(i_t, k_t)/∂i_t

∂L/∂k_{t+1} = 0 = -λ_t + β ∂V(k_{t+1}, θ_{t+1})/∂k_{t+1}

∂L/∂λ_t = 0 = g(i_t, k_t) - k_{t+1} + k_t
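As a sketch of what these conditions imply, use the envelope theorem from above, ∂V(k_t, θ_t)/∂k_t = p(θ_t) f'(k_t) + λ_t (1 - d). The FOC for i_t gives λ_t = q(θ_t), and substituting into the FOC for k_{t+1} yields

q(θ_t) = β [p(θ_{t+1}) f'(k_{t+1}) + (1 - d) q(θ_{t+1})]

i.e. the price of the investment good today equals the discounted marginal value product of capital tomorrow plus the discounted resale value of the undepreciated capital.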
Application: Stochastic dynamic programming   Consider a representative household that chooses consumption c_t, leisure l_t, hours worked n_t, and investment i_t so as to maximize

E_0 Σ_{t=0}^{∞} β^t u(c_t, l_t)

subject to

λ_{1,t}:  0 = a_t f(k_t, n_t) - c_t - i_t ≡ g_1[c_t, i_t, n_t, k_t, a(θ_t)]
λ_{2,t}:  0 = 1 - n_t - l_t ≡ g_2[n_t, l_t]
λ_{3,t}:  k_{t+1} - k_t = i_t - d k_t ≡ g_3[i_t, k_t]

where a_t = a(θ_t) is an exogenous productivity state and λ_{1,t}, λ_{2,t}, λ_{3,t} denote the multipliers attached to the respective constraints.
The derivation of the FOCs for the two control variables c_t and l_t raises no problems:

c_t:  0 = ∂u(c_t, l_t)/∂c_t - λ_{1,t}

l_t:  0 = ∂u(c_t, l_t)/∂l_t - λ_{2,t}
The FOC for the control variable n_t is derived as follows. Recall that in general the FOC for a control variable is given by10

∂u(c_t, k_t, x_t)/∂c_t + λ_t ∂g(c_t, k_t, x_t)/∂c_t = 0

In the case at hand, the control variable n_t does not show up in the momentary utility of the representative agent, u(c_t, l_t). Moreover, n_t appears in accumulation equation g_1[c_t, i_t, n_t, k_t, a(θ_t)] and in g_2[n_t, l_t]. It follows that the FOC is given by

0 + λ_{1,t} ∂g_1[c_t, i_t, n_t, k_t, a(θ_t)]/∂n_t + λ_{2,t} ∂g_2[n_t, l_t]/∂n_t = 0

or, more specifically,

n_t:  0 = λ_{1,t} a_t ∂f(k_t, n_t)/∂n_t - λ_{2,t}
A similar logic applies to the control variable i_t, which appears in accumulation equation g_1[c_t, i_t, n_t, k_t, a(θ_t)] and in g_3[i_t, k_t] (but not in the momentary utility function). The FOC is given by

i_t:  0 = -λ_{1,t} + λ_{3,t}
In general the FOC for a state variable is given by11

-λ_t + β E_t [∂V(k_{t+1}, θ_{t+1})/∂k_{t+1}] = 0

To get an expression for ∂V(k_{t+1}, θ_{t+1})/∂k_{t+1} we again use the envelope theorem:

∂V(k_t, θ_t)/∂k_t = ∂u(c_t, k_t, x_t)/∂k_t + λ_t [∂g(c_t, k_t, x_t)/∂k_t + 1]
In the case at hand, the state variable k_t does not show up in the momentary utility function of the representative household, u(c_t, l_t). Moreover, k_t appears in accumulation equation g_1[c_t, i_t, n_t, k_t, a(θ_t)] and in g_3[i_t, k_t]. Thus,

∂V(k_t, θ_t)/∂k_t = 0 + λ_{1,t} ∂g_1[c_t, i_t, n_t, k_t, a(θ_t)]/∂k_t + λ_{3,t} [∂g_3[i_t, k_t]/∂k_t + 1]
10 Be aware of the following point: The general function u(c_t, k_t, x(θ_t)) and the momentary utility function of the problem at hand, u(c_t, l_t), use the same notation. Also, in the general problem c_t stands for control variables and k_t stands for state variables, whereas in the problem at hand c_t denotes consumption (just one of several control variables) and k_t denotes physical capital (the only endogenous state variable).
11 Compare the previous footnote.
Why has the term +1 been skipped in the expression λ_{1,t} ∂g_1[c_t, i_t, n_t, k_t, a(θ_t)]/∂k_t? Well, as you can see above, k_t does not show up on the LHS of accumulation equation 1. Changing the subscripts and substituting yields

k_{t+1}:  0 = -λ_{3,t} + E_t [β λ_{1,t+1} a_{t+1} ∂f(k_{t+1}, n_{t+1})/∂k_{t+1} + β λ_{3,t+1} (1 - d)]
The last three FOCs are

λ_{1,t}:  0 = a_t f(k_t, n_t) - c_t - i_t

λ_{2,t}:  0 = 1 - n_t - l_t

λ_{3,t}:  0 = i_t - d k_t - k_{t+1} + k_t
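Pulling these conditions together (a sketch added for illustration): the FOCs for c_t and i_t give λ_{1,t} = ∂u(c_t, l_t)/∂c_t and λ_{3,t} = λ_{1,t}, so the FOC for k_{t+1} becomes the stochastic Euler equation

∂u(c_t, l_t)/∂c_t = E_t { β ∂u(c_{t+1}, l_{t+1})/∂c_{t+1} [a_{t+1} ∂f(k_{t+1}, n_{t+1})/∂k_{t+1} + 1 - d] }

which is the standard intertemporal optimality condition of the stochastic growth model.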
3.2 Continuous time
To be written.
References

[1] Barro, R.J. and X. Sala-i-Martin (1995), Economic Growth, McGraw-Hill.

[2] Leonard, D. and N. Van Long (1992), Optimal Control Theory and Static Optimization in Economics, Cambridge University Press.

[3] King, R.G. (?), Notes on Dynamic Optimization, handout.