716 MATHEMATICS: RICHARD BELLMAN PROC. N. A. S., VOL. 38, 1952

ON THE THEORY OF DYNAMIC PROGRAMMING


By RICHARD BELLMAN
THE RAND CORPORATION, SANTA MONICA, CALIFORNIA
Communicated by J. von Neumann, June 5, 1952
1. Introduction.-We are interested in a class of mathematical problems
which arise in connection with situations which require that a bounded or
unbounded sequence of operations be performed for the purpose of achieving
a desired result. Particularly important are the cases where each operation
gives rise to a stochastic event, the result of which is applied to the
determination of subsequent operations.
Two fundamental problems encountered in situations of this type, in
some sense duals of each other, are those of maximizing the yield obtained
in a given time, or of minimizing the time or cost required to accomplish a
certain task.
In many cases, the problem of determining an optimal sequence of operations
may be reduced to that of determining an optimal first operation.
The general class of functional equations generated by problems of this
nature has the form
$$f(p) = \begin{Bmatrix} \min \\ \max \end{Bmatrix}_{k} \bigl(T_k(f)\bigr), \tag{1.1}$$
where $T_k$ is an operator. In many cases of interest, the operator has the
form
$$T_k(f) = g_k(p) + h_k(p)\, f(S_k p), \tag{1.2}$$
where $S_k$ is a point transformation.
We shall first present some existence and uniqueness theorems pertaining
to the solutions of (1.1), and then present explicit solutions of some simple
functional equations of the form of (1.1).
As simple examples of problems which give rise to functional equations
of this form, we mention the following:
1. We are given the fact that one of N boxes contains a ball, with probability
$p_k$ that the ball is in the kth box. Let $q_k$ be the probability that on
examining the kth box we are unable to examine its contents, and $t_k$ be the
time consumed in one examination. What procedure minimizes the expected
time required to locate the box containing the ball, and what procedure
minimizes the expected time required to obtain the ball?
2. We are given a quantity $x > 0$ which may be divided into two parts,
$y$ and $x - y$. From $y$ we obtain a return of $g(y)$ and from $x - y$ a return
of $h(x - y)$. In so doing we are left with a new quantity $ay + b(x - y)$,
$0 < a, b < 1$, with which to continue the process. How does one proceed
in order to maximize the total return obtained in a finite, or unbounded,
number of stages?
The theory of dynamic programming is intimately related to the theory
of sequential analysis due to Wald.3 Two papers by Arrow, Blackwell and
Girshick,1 and Arrow, Harris and Marschak2 also treat problems of similar
type.
2. Existence and Uniqueness Theorems.
THEOREM 1. Consider the equation†
$$f(p) = \max_{1 \le k \le n} \bigl(g_k(p) + h_k(p)\, f(S_k p)\bigr), \quad p \in R, \tag{2.1}$$
where we assume that
(a) If $p \in R$, a region of n-dimensional space, then $S_k p \in R$. (2.2)
(b) $|g_k(p)| \le c_1$ for $p \in R$,
(c) $|h_k(p)| \le c_2 < 1$ for $p \in R$,
(d) $g_k(p), h_k(p) \ge 0$ for $p \in R$.
Under these assumptions there is a unique bounded solution to (2.1).
THEOREM 2. Consider the equation
$$f(x) = \max_{R} \bigl[a(x_1, x_2, \ldots, x_N) + f(b(x_1, x_2, \ldots, x_N))\bigr], \tag{2.3}$$
where $R = R(x)$ is defined by $x_k \ge 0$, $\sum_{k=1}^{N} x_k = x$.
If
(a) $a(x_1, x_2, \ldots, x_N)$ is continuous over $R(x)$ for $0 \le x \le x_0$,
non-negative, and $a(0, 0, \ldots, 0) = 0$, (2.4)
(b) $b(x_1, x_2, \ldots, x_N)$ is continuous and non-negative over $R$,
(c) $b(x_1, x_2, \ldots, x_N) \le cx$, $0 < c < 1$, in $R(x)$,
(d) $\sum_{l=0}^{\infty} h(c^l x_0) < \infty$, where $h(x) = \max_{R} a(x_1, x_2, \ldots, x_N)$,
there is a unique continuous solution to (2.3) for which $f(0) = 0$ for
$0 \le x \le x_0$.
THEOREM 3. Consider the equation

f(p) min. + E Pkf(xk), p


=~ 1 + f(S1p)k } $X, (25
f(xo) = 0,
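where $l = 1, 2, \ldots, M$, and
$$p = \begin{pmatrix} p_0 \\ p_1 \\ \vdots \\ p_N \end{pmatrix}, \quad S_l p = \begin{pmatrix} p_0' \\ p_1' \\ \vdots \\ p_N' \end{pmatrix}, \quad x_k = \begin{pmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{pmatrix}, \tag{2.6}$$
the 1 occurring in the kth place. Each $p$ and $S_l p$ is a probability vector,
$p_k \ge 0$, $\sum_{k=0}^{N} p_k = 1$, and $f(p)$ is a scalar function of $p$.
If for each $l$ it is true that
$$\sum_{k=1}^{N} p_k' \le c_1 \sum_{k=1}^{N} p_k, \quad 0 < c_1 < 1, \tag{2.7}$$
there is a unique bounded positive solution to (2.5).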
The proof in all three cases employs the method of successive approximations.
The equation in (2.5) occurs in connection with problems similar
to problem 1 above.
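
As an illustration of the method of successive approximations applied to (2.1), the following is a minimal sketch, assuming that the region R is replaced by a finite set of points indexed 0, 1, 2 and that each S_k is given as a lookup table. The function name `successive_approximations` and all numerical data are hypothetical; they are chosen only so that conditions (b), (c) and (d) of Theorem 1 hold.

```python
def successive_approximations(g, h, S, f0=None, tol=1e-12, max_iter=10_000):
    """Iterate f <- max_k (g_k(p) + h_k(p) * f(S_k p)) on a finite set of points.

    g[k][p] and h[k][p] hold the values g_k(p) and h_k(p);
    S[k][p] holds the index of the point S_k p.
    """
    n_ops, n_pts = len(g), len(g[0])
    f = [0.0] * n_pts if f0 is None else list(f0)
    for _ in range(max_iter):
        f_next = [
            max(g[k][p] + h[k][p] * f[S[k][p]] for k in range(n_ops))
            for p in range(n_pts)
        ]
        # Stop once two successive approximations agree to within tol.
        if max(abs(u - v) for u, v in zip(f_next, f)) < tol:
            return f_next
        f = f_next
    raise RuntimeError("no convergence within max_iter")


# Hypothetical data: 3 points, 2 operations, with 0 <= h_k(p) <= 0.8 < 1.
g = [[1.0, 0.5, 0.0], [0.2, 0.9, 0.4]]
h = [[0.5, 0.3, 0.8], [0.7, 0.1, 0.6]]
S = [[1, 2, 0], [0, 0, 1]]

f = successive_approximations(g, h, S)
# A very different starting guess should lead to the same bounded solution.
f_other_start = successive_approximations(g, h, S, f0=[100.0, -50.0, 7.0])
```

Because every h_k(p) is at most 0.8 in this data, each pass shrinks the distance between two candidate solutions by at least that factor, which is why the two starting guesses are expected to agree in the limit.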
3. Solutions of Some Particular Functional Equations.-In this section
we indicate the solution of some simple cases of the general equations
discussed above.
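THEOREM 4. The solution of
$$f(x, y) = \max \bigl[\, p_1 [r_1 x + f(s_1 x, y)],\; p_2 [r_2 y + f(x, s_2 y)] \,\bigr], \quad x, y \ge 0, \tag{3.1}$$
where $0 < p_1, p_2, s_1, s_2 < 1$, $r_1, r_2 > 0$, is given by
$$f(x, y) = \begin{cases} p_1 [r_1 x + f(s_1 x, y)] & \text{for } \dfrac{p_1 r_1 x}{1 - p_1} > \dfrac{p_2 r_2 y}{1 - p_2}, \\[2ex] p_2 [r_2 y + f(x, s_2 y)] & \text{for } \dfrac{p_1 r_1 x}{1 - p_1} < \dfrac{p_2 r_2 y}{1 - p_2}. \end{cases} \tag{3.2}$$
If $s_1^m = s_2^n$, $f(x, y)$ is piecewise linear.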
This result may be extended in many ways.
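
The decision rule in (3.2) can be compared numerically with a direct evaluation of (3.1). The sketch below assumes that the recursion is truncated after a fixed number of operations, which is not part of the theorem; the function names and parameter values are hypothetical. Since each operation multiplies the remaining weight by p_1 or p_2, the truncated values should differ from the unbounded ones only by a geometrically small tail.

```python
def optimal_return(x, y, n, p1, p2, r1, r2, s1, s2):
    """n-operation truncation of (3.1), evaluated by exhaustive recursion."""
    if n == 0:
        return 0.0
    first = p1 * (r1 * x + optimal_return(s1 * x, y, n - 1, p1, p2, r1, r2, s1, s2))
    second = p2 * (r2 * y + optimal_return(x, s2 * y, n - 1, p1, p2, r1, r2, s1, s2))
    return max(first, second)


def rule_return(x, y, n, p1, p2, r1, r2, s1, s2):
    """Return collected over n operations when every choice follows (3.2)."""
    total, weight = 0.0, 1.0  # weight = product of the p's applied so far
    for _ in range(n):
        # Ties are sent to the first operation; (3.2) leaves equality unspecified.
        if p1 * r1 * x / (1 - p1) >= p2 * r2 * y / (1 - p2):
            weight *= p1
            total += weight * r1 * x
            x *= s1
        else:
            weight *= p2
            total += weight * r2 * y
            y *= s2
    return total


# Hypothetical parameter values satisfying 0 < p, s < 1 and r > 0.
params = dict(p1=0.5, p2=0.4, r1=1.0, r2=2.0, s1=0.6, s2=0.7)
best = optimal_return(1.0, 1.0, 12, **params)
ruled = rule_return(1.0, 1.0, 12, **params)
```

The rule-following value can never exceed the exhaustive one, and with twelve operations the gap is expected to be of the order of the truncation tail.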
THEOREM 5. The solution of
$$f(x) = \max_{0 \le y \le x} \bigl[ g(y) + h(x - y) + f(ay + b(x - y)) \bigr], \tag{3.3}$$
where $0 < a, b < 1$, may be reduced to that of
$$f(x) = \max \bigl[ g(x) + f(ax),\; h(x) + f(bx) \bigr], \tag{3.4}$$
in $0 \le x \le x_0$, if $g$ and $h$ are monotonically increasing functions such that
$g(0) = h(0) = 0$, $g'', h'' > 0$ in $[0, x_0]$.
If $g'', h'' < 0$ the situation is much more complicated, and no simple
result such as (3.4) holds in general. The solution of (3.4) may be obtained
explicitly and is similar in structure to that of (3.1) above. This functional
equation arises from problem 2.
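
The reduction from (3.3) to (3.4) can be checked on a small example. The following is a sketch under stated assumptions: the process is stopped after a fixed number of stages, the maximum over y in (3.3) is taken over a grid that contains both endpoints, and g, h, a, b are hypothetical choices that are increasing and convex with g(0) = h(0) = 0.

```python
def g(y):
    return y * y          # g(0) = 0, increasing and convex for y >= 0


def h(y):
    return 2.0 * y * y    # h(0) = 0, increasing and convex for y >= 0


A, B = 0.5, 0.8           # the constants a, b of (3.3), with 0 < a, b < 1


def full_search(x, stages, grid=10):
    """Stage-limited form of (3.3): maximize over a grid of y in [0, x]."""
    if stages == 0:
        return 0.0
    best = float("-inf")
    for i in range(grid + 1):
        y = (i / grid) * x  # i = 0 gives y = 0, i = grid gives y = x
        value = g(y) + h(x - y) + full_search(A * y + B * (x - y), stages - 1, grid)
        best = max(best, value)
    return best


def endpoint_search(x, stages):
    """Stage-limited form of (3.4): only y = x and y = 0 are compared."""
    if stages == 0:
        return 0.0
    return max(
        g(x) + endpoint_search(A * x, stages - 1),
        h(x) + endpoint_search(B * x, stages - 1),
    )


full_value = full_search(1.0, 3)
endpoint_value = endpoint_search(1.0, 3)
```

For convex g and h the bracket in (3.3) is convex in y at every stage, so its maximum over [0, x] sits at an endpoint and the two values are expected to coincide.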
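THEOREM 6. The solution of
$$f(p_1, p_2, \ldots, p_N) = \min_{k} \left[ \frac{t_k}{1 - q_k} + (1 - p_k)\, f\!\left( \frac{p_1}{1 - p_k}, \ldots, 0, \ldots, \frac{p_N}{1 - p_k} \right) \right], \tag{3.5}$$
the zero occurring in the kth place, where
$$f(0, \ldots, 0, p_k, 0, \ldots, 0) = \frac{t_k}{1 - q_k} \tag{3.6}$$
for $p_k > 0$, $k = 1, 2, \ldots, N$, is given by
$$f(p_1, p_2, \ldots, p_N) = \frac{t_k}{1 - q_k} + (1 - p_k)\, f\!\left( \frac{p_1}{1 - p_k}, \ldots, 0, \ldots, \frac{p_N}{1 - p_k} \right) \tag{3.7}$$
if $k$ is the index for which $p_k(1 - q_k)/t_k$ is a maximum.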


This is the solution to problem 1 above in the case where we wish to obtain
the ball. If we want merely to locate the ball the solution is more complicated.
In this case we either examine the box for which $p_k(1 - q_k)/t_k$
is a maximum first, or we never examine it.
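
For the case where we wish to obtain the ball, the index rule can be compared with exhaustive search on a small instance. This is a sketch, assuming that the recursion of Theorem 6 is unrolled into a fixed order of boxes, each box costing t_k/(1 - q_k) on average and being paid for only if the ball was not in an earlier box. The function names and the data for four boxes are hypothetical.

```python
from itertools import permutations


def expected_time(order, p, q, t):
    """Expected time to obtain the ball when boxes are examined in `order`."""
    remaining, total = 1.0, 0.0  # remaining = probability the ball is not yet found
    for k in order:
        total += remaining * t[k] / (1.0 - q[k])
        remaining -= p[k]
    return total


def index_rule_order(p, q, t):
    """Boxes sorted by decreasing p_k (1 - q_k) / t_k."""
    return sorted(
        range(len(p)),
        key=lambda k: p[k] * (1.0 - q[k]) / t[k],
        reverse=True,
    )


# Hypothetical data for four boxes (the p's sum to 1).
p = [0.1, 0.4, 0.3, 0.2]
q = [0.5, 0.2, 0.6, 0.1]
t = [1.0, 3.0, 2.0, 0.5]

best_order = min(
    permutations(range(4)),
    key=lambda order: expected_time(order, p, q, t),
)
rule_order = index_rule_order(p, q, t)
```

Comparing `best_order` with `rule_order` on data of this kind is a quick way to see the index p_k(1 - q_k)/t_k at work; swapping two adjacent boxes changes the expected time in a direction governed by exactly that index.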
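THEOREM 7. The solution of‡
$$f(x) = 1 + \min \bigl[ x f(1),\; f(ax) \bigr], \quad 1 \ge x > 0, \quad 0 < a < 1, \tag{3.8}$$
$$f(0) = 0,$$
is
$$f(x) = \begin{cases} 1 + x f(1), & x \le x_0, \\ 1 + f(ax), & x \ge x_0, \end{cases}$$
where $x_0 = (1 - a^k)/\bigl((k + 1)(1 - a)\bigr)$, and $k$ is the integer at which
$(y + 1)/(1 - a^y)$ is a minimum for $y = 1, 2, \ldots$.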
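
The closed form in Theorem 7 can be evaluated and substituted back into (3.8) as a check. The sketch below rests on one assumption that the text does not state explicitly: the value f(1) is taken to be (k + 1)/(1 - a^k), which is what results from applying the second branch k times starting at x = 1 and then the first branch once. The function names and the value a = 0.7 are hypothetical.

```python
def closed_form_constants(a, y_max=200):
    """Return (k, f(1), x0) for the solution stated in Theorem 7.

    f(1) = (k + 1) / (1 - a**k) is an inferred value, not quoted from the text.
    """
    k = min(range(1, y_max + 1), key=lambda y: (y + 1) / (1 - a ** y))
    f_at_1 = (k + 1) / (1 - a ** k)
    x0 = (1 - a ** k) / ((k + 1) * (1 - a))
    return k, f_at_1, x0


def stated_solution(x, a, f_at_1, x0):
    """Evaluate the two-branch solution for 0 < x <= 1, with f(0) = 0."""
    if x == 0:
        return 0.0
    if x <= x0:
        return 1 + x * f_at_1
    return 1 + stated_solution(a * x, a, f_at_1, x0)


a = 0.7  # hypothetical value with 0 < a < 1
k, f_at_1, x0 = closed_form_constants(a)

# Largest violation of (3.8) over a grid of x in (0, 1].
worst_gap = max(
    abs(
        stated_solution(x, a, f_at_1, x0)
        - (1 + min(x * f_at_1, stated_solution(a * x, a, f_at_1, x0)))
    )
    for x in (i / 100 for i in range(1, 101))
)
```

A small `worst_gap`, together with `stated_solution(1.0, ...)` reproducing `f_at_1`, indicates that the two branches and the threshold x_0 fit together for this choice of a.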
Detailed proofs and further results will appear in another publication.
† Results on the existence of solutions of (2.1) were obtained by S. Karlin and H. N.
Shapiro in an unpublished paper.
‡ The solution of (3.1) was obtained in conjunction with M. Shiffman, while that of
(3.8) was obtained in conjunction with D. Blackwell.
1 Arrow, K. J., Blackwell, D., and Girshick, M. A., "Bayes and Minimax Solutions of
Sequential Decision Problems," Econometrica, 17, 214-244 (1949).
2 Arrow, K. J., Harris, T. E., and Marschak, J., Optimal Inventory Policy, Cowles
Commission Papers, New Series, No. 44, 1951.
3 Wald, A., Statistical Decision Functions, John Wiley & Sons, New York, 1950.
