Best of the 20th Century: Editors Name Top 10 Algorithms
SIAM News, Volume 33, Number 4
1946: John von Neumann, Stan Ulam, and Nick Metropolis, all at the Los Alamos Scientific Laboratory, cook up the Metropolis
algorithm, also known as the Monte Carlo method.
The Metropolis algorithm aims to obtain approximate solutions to numerical problems with unmanageably many degrees of freedom
and to combinatorial problems of factorial size, by mimicking a random process. Given the digital computer’s reputation for
deterministic calculation, it’s fitting that one of its earliest applications was the generation of random numbers.
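For illustration only, here is a minimal Metropolis sampler in Python, with a toy one-dimensional, unnormalized Gaussian target chosen just for the demo; this is a sketch of the acceptance rule, not the original Los Alamos computation.

import math
import random

def target(x):
    return math.exp(-0.5 * x * x)             # unnormalized density, chosen only for the demo

def metropolis(n_samples, step=1.0, x0=0.0, seed=0):
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_samples):
        proposal = x + rng.uniform(-step, step)          # symmetric random-walk proposal
        if rng.random() < target(proposal) / target(x):  # accept with prob min(1, ratio)
            x = proposal
        samples.append(x)
    return samples

draws = metropolis(100_000)
print(sum(draws) / len(draws))                # near 0 for this symmetric target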
1947: George Dantzig, at the RAND Corporation, creates the simplex method for linear programming.
In terms of widespread application, Dantzig’s algorithm is one of the most successful of all time: Linear
programming dominates the world of industry, where economic survival depends on the ability to optimize
within budgetary and other constraints. (Of course, the “real” problems of industry are often nonlinear; the use
of linear programming is sometimes dictated by the computational budget.) The simplex method is an elegant
way of arriving at optimal answers. Although theoretically susceptible to exponential delays, the algorithm
in practice is highly efficient—which in itself says something interesting about the nature of computation.
1950: Magnus Hestenes, Eduard Stiefel, and Cornelius Lanczos, all from the Institute for Numerical Analysis at the National Bureau of Standards, initiate the development of Krylov subspace iteration methods.
These algorithms address the seemingly simple task of solving equations of the form Ax = b. The catch, of course, is that A is a huge n × n matrix, so that the algebraic answer x = b/A is not so easy to compute. (Indeed, matrix “division” is not a particularly useful concept.) Iterative methods—such as solving equations of the form Kx_{i+1} = Kx_i + b − Ax_i with a simpler matrix K that’s ideally “close” to A—lead to the study of Krylov subspaces. Named for the Russian mathematician Nikolai Krylov, Krylov subspaces are spanned by powers of a matrix applied to an initial “remainder” vector r0 = b − Ax0. Lanczos found a nifty way to generate an orthogonal basis for such a subspace when the matrix
is symmetric. Hestenes and Stiefel proposed an even niftier method, known as the conjugate gradient method, for systems that are
both symmetric and positive definite. Over the last 50 years, numerous researchers have improved and extended these algorithms.
The current suite includes techniques for non-symmetric systems, with acronyms like GMRES and Bi-CGSTAB. (GMRES and
Bi-CGSTAB premiered in SIAM Journal on Scientific and Statistical Computing, in 1986 and 1992,
respectively.)
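As an illustration, a bare-bones conjugate gradient iteration in textbook form, assuming NumPy (production Krylov codes add preconditioning and further safeguards):

import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    x = np.zeros_like(b)
    r = b - A @ x                             # initial residual r0 spans the Krylov subspace
    p = r.copy()
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(A, b), np.linalg.solve(A, b))   # the two should agree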
1951: Alston Householder of Oak Ridge National Laboratory formalizes the decompositional approach
to matrix computations.
The ability to factor matrices into triangular, diagonal, orthogonal, and other special forms has turned
out to be extremely useful. The decompositional approach has enabled software developers to produce
flexible and efficient matrix packages. It also facilitates the analysis of rounding errors, one of the big
bugbears of numerical linear algebra. (In 1961, James Wilkinson of the National Physical Laboratory in
London published a seminal paper in the Journal of the ACM, titled “Error Analysis of Direct Methods
of Matrix Inversion,” based on the LU decomposition of a matrix as a product of lower and upper
triangular factors.)
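A small illustration of the decompositional idea, assuming SciPy is available: factor the matrix once, then reuse the factors for several right-hand sides.

import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[3.0, 1.0, 2.0],
              [6.0, 3.0, 4.0],
              [3.0, 1.0, 5.0]])
lu, piv = lu_factor(A)                        # one O(n^3) factorization (P A = L U)

for rhs in (np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])):
    x = lu_solve((lu, piv), rhs)              # each subsequent solve is only O(n^2)
    print(x, np.allclose(A @ x, rhs))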
1957: John Backus leads a team at IBM in developing the Fortran optimizing compiler.
The creation of Fortran may rank as the single most important event in the history of computer programming: Finally, scientists
(and others) could tell the computer what they wanted it to do, without having to descend into the netherworld of machine code.
Although modest by modern compiler standards—Fortran I consisted of a mere 23,500 assembly-language instructions—the early
compiler was nonetheless capable of surprisingly sophisticated computations. As Backus himself recalls in a recent history of
Fortran I, II, and III, published in 1998 in the IEEE Annals of the History of Computing, the compiler “produced code of such
efficiency that its output would startle the programmers who studied it.”
1959–61: J.G.F. Francis of Ferranti Ltd., London, finds a stable method for computing eigenvalues, known as the QR algorithm.
Eigenvalues are arguably the most important numbers associated with matrices—and they can be the trickiest to compute. It’s
relatively easy to transform a square matrix into a matrix that’s “almost” upper triangular, meaning one with a single extra set of
nonzero entries just below the main diagonal. But chipping away those final nonzeros, without launching an avalanche of error,
is nontrivial. The QR algorithm is just the ticket. Based on the QR decomposition, which writes A as the product of an orthogonal
matrix Q and an upper triangular matrix R, this approach iteratively changes A_i = QR into A_{i+1} = RQ, with a few bells and whistles
for accelerating convergence to upper triangular form. By the mid-1960s, the QR algorithm had turned once-formidable eigenvalue
problems into routine calculations.
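A bare-bones sketch of the iteration, assuming NumPy; practical implementations first reduce the matrix to Hessenberg form and add shifts, so this unshifted version is only meant to show the A = QR, A ← RQ loop.

import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])

Ak = A.copy()
for _ in range(200):
    Q, R = np.linalg.qr(Ak)
    Ak = R @ Q                                # similarity transform: eigenvalues preserved

print(np.sort(np.diag(Ak)))                   # approaches the eigenvalues of A
print(np.sort(np.linalg.eigvalsh(A)))         # reference values for comparison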
1965: James Cooley of the IBM T.J. Watson Research Center and John Tukey of Princeton
University and AT&T Bell Laboratories unveil the fast Fourier transform.
Easily the most far-reaching algorithm in applied mathematics, the FFT revolutionized signal processing. The underlying idea goes back to Gauss (who needed to calculate orbits of asteroids), but it was the Cooley–Tukey paper that made it clear how easily Fourier transforms can be computed. Like Quicksort, the FFT relies on a divide-and-conquer strategy to reduce an ostensibly O(N²) chore to an O(N log N) frolic. But unlike Quicksort, the implementation is (at first sight) nonintuitive and less than straightforward. This in itself gave computer science an impetus to investigate the inherent complexity of computational problems and algorithms.
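To give a feel for the divide-and-conquer split, a minimal recursive radix-2 sketch in Python (power-of-two lengths only; real FFT libraries are far more elaborate):

import cmath

def fft(x):
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])                       # transform of the even-indexed samples
    odd = fft(x[1::2])                        # transform of the odd-indexed samples
    t = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(n // 2)]
    return ([even[k] + t[k] for k in range(n // 2)] +
            [even[k] - t[k] for k in range(n // 2)])

print(fft([1, 2, 3, 4, 0, 0, 0, 0]))          # agrees with numpy.fft.fft on the same input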
1977: Helaman Ferguson and Rodney Forcade of Brigham Young University advance an integer relation detection algorithm.
The problem is an old one: Given a bunch of real numbers, say x1, x2, . . . , xn, are there integers a1, a2, . . . , an (not all 0) for which
a1x1 + a2x2 + . . . + anxn = 0? For n = 2, the venerable Euclidean algorithm does the job, computing terms in the continued-fraction
expansion of x1/x2. If x1/x2 is rational, the expansion terminates and, with proper unraveling, gives the “smallest” integers a1 and a2.
If the Euclidean algorithm doesn’t terminate—or if you simply get tired of computing it—then the unraveling procedure at least
provides lower bounds on the size of the smallest integer relation. Ferguson and Forcade’s generalization, although much more
difficult to implement (and to understand), is also more powerful. Their detection algorithm, for example, has been used to find
the precise coefficients of the polynomials satisfied by the third and fourth bifurcation points, B3 = 3.544090 and B4 = 3.564407,
of the logistic map. (The latter polynomial is of degree 120; its largest coefficient is 257^30.) It has also proved useful in simplifying
calculations with Feynman diagrams in quantum field theory.
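For the n = 2 case just described, a small exact-arithmetic sketch in Python (my own illustration of the continued-fraction unraveling, not the Ferguson–Forcade algorithm):

from fractions import Fraction

def continued_fraction(r):
    """Continued-fraction coefficients of a rational r (the Euclidean quotients)."""
    terms = []
    while True:
        q = r.numerator // r.denominator
        terms.append(q)
        r -= q
        if r == 0:
            return terms
        r = 1 / r

def relation_for_two(x1, x2):
    """Integers (a1, a2), not both zero, with a1*x1 + a2*x2 = 0, for rational x1, x2."""
    # Build the convergents p/q of x1/x2; the last one equals x1/x2 exactly,
    # so q*x1 - p*x2 = 0 and (q, -p) is the desired relation.
    p_prev, p = 0, 1
    q_prev, q = 1, 0
    for a in continued_fraction(Fraction(x1) / Fraction(x2)):
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
    return q, -p

x1, x2 = Fraction(14, 3), Fraction(35, 6)
a1, a2 = relation_for_two(x1, x2)
print(a1, a2, a1 * x1 + a2 * x2)              # -> 5 -4 0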
1987: Leslie Greengard and Vladimir Rokhlin of Yale University invent the fast multipole algorithm.
This algorithm overcomes one of the biggest headaches of N-body simulations: the fact that accurate calculations of the motions
of N particles interacting via gravitational or electrostatic forces (think stars in a galaxy, or atoms in a protein) would seem to require
O(N²) computations—one for each pair of particles. The fast multipole algorithm gets by with O(N) computations. It does so by
using multipole expansions (net charge or mass, dipole moment, quadrupole, and so forth) to approximate the effects of a distant
group of particles on a local group. A hierarchical decomposition of space is used to define ever-larger groups as distances increase.
One of the distinct advantages of the fast multipole algorithm is that it comes equipped with rigorous error estimates, a feature that
many methods lack.
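The full algorithm is too long to sketch here, but its lowest-order ingredient, replacing a distant cluster by its total mass placed at the cluster's center of mass, can be shown in a few lines of Python (an illustration of the principle only, not the fast multipole method):

import random

random.seed(1)
cluster = [(random.uniform(100.0, 101.0), random.uniform(0.0, 1.0)) for _ in range(1000)]
masses = [1.0] * len(cluster)
target = (0.0, 0.0)                                   # evaluation point, far from the cluster

def direct_potential(target, points, masses):
    tx, ty = target
    return sum(m / ((tx - x) ** 2 + (ty - y) ** 2) ** 0.5
               for (x, y), m in zip(points, masses))

total = sum(masses)
cx = sum(m * x for (x, _), m in zip(cluster, masses)) / total   # center of mass
cy = sum(m * y for (_, y), m in zip(cluster, masses)) / total
monopole = total / ((target[0] - cx) ** 2 + (target[1] - cy) ** 2) ** 0.5

print(direct_potential(target, cluster, masses), monopole)      # the two values agree closely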
What new insights and algorithms will the 21st century bring? The complete answer obviously won’t be known for another
hundred years. One thing seems certain, however. As Sullivan writes in the introduction to the top-10 list, “The new century is not
going to be very restful for us, but it is not going to be dull either!”
Solving LP problems [Zionts, 1974]
z = [0  0  0] · [240  180  110]ᵀ = 0   {5}
There does not appear (Dantzig) to be a systematic way of setting all the nonbasic variables simultaneously to their optimal values —hence, an iterative² method. Choose the variable that increases the objective function most per unit (this choice is arbitrary): in the example, x1, because its coefficient (0,56) is the largest.
According to the constraints, x1 can be increased till:
B¹   x1 = 240;   1,5 x1 = 180 → x1 = 120;   x1 = 110   {6}
The third equation (why?) in {2} leads to x1 = 110 and x5 = 0. The variable x1 will be
the entering variable and x5 the leaving variable:
¹ A, B, C identify the iteration, as summarized below.
² Iterative: involving repetition; relating to iteration. Iterate (from Latin iterare): to say or do again (and again). Not to be confused with interactive.
C x1 = 110 − x5 {7}
Substituting for x1 everywhere (except in its own constraint), we have
[max] z = 0,56 (110 − x5) + 0,42 x2
(110 − x5) + 2 x2 + x3 = 240
1,5 (110 − x5) + x2 + x4 = 180
x1 + x5 = 110   {8}
z = [0  0  0] · [110  130  15]ᵀ + 61,6 = 61,6   {12}
C   x2 = 15 − x4 + 1,5 x5   {14}
[x1  x2  x3]ᵀ = [110  15  100]ᵀ   {17}
C   x5 = 50 − (1/2) x3 + x4   {19}
Substituting for x5 everywhere (except its own constraint), we have
[max] z = −0,42 x4 + 0,07 (50 − (1/2) x3 + x4) + 67,9
(1/2) x3 − x4 + x5 = 50
x2 + x4 − 1,5 (50 − (1/2) x3 + x4) = 15
x1 + (50 − (1/2) x3 + x4) = 110   {20}
[x1  x2  x5]ᵀ = [60  90  50]ᵀ   {21}
In sum:
A   In the system of equations, find the identity matrix (immediate solution).
B   Search for an entering variable (or finish).
C   Consequently, find a leaving variable (if wrongly chosen, negative values will appear).
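A minimal sketch of steps B and C for the first iteration above, assuming NumPy (my own illustration, not code from the source):

import numpy as np

# Standard form [A | I] x = b with x = (x1, x2, x3, x4, x5) >= 0
A = np.array([[1.0, 2.0, 1.0, 0.0, 0.0],
              [1.5, 1.0, 0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0, 0.0, 1.0]])
b = np.array([240.0, 180.0, 110.0])
c = np.array([0.56, 0.42, 0.0, 0.0, 0.0])

j = int(np.argmax(c))                              # step B: x1 enters (coefficient 0.56)
ratios = np.where(A[:, j] > 0, b / A[:, j], np.inf)
i = int(np.argmin(ratios))                         # step C: ratios 240, 120, 110 -> third row
print("entering x%d, leaving row %d, limit %g" % (j + 1, i + 1, ratios[i]))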
References:
– ZIONTS, Stanley, 1974, “Linear and integer programming”, Prentice-Hall,
Englewood Cliffs, NJ (USA), p 5. (IST Library.) ISBN 0-13-536763-8.
– See others on the course webpage (http://web.ist.utl.pt/mcasquilho).
Mar-2011   Zionts, “An intuitive algebraic approach for solving LP problems”

[max] z = 0,56 x1 + 0,42 x2
s. to   x1 + 2 x2 ≤ 240
        1,5 x1 + x2 ≤ 180
        x1 ≤ 110

Solver model (spreadsheet check): X = (60, 90), [max] z = 71,4; constraint values 240 ≤ 240, 180 ≤ 180 and 60 ≤ 110, all satisfied.
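As a quick cross-check of the same LP, assuming SciPy is available (linprog minimizes, so the objective is negated):

from scipy.optimize import linprog

res = linprog(c=[-0.56, -0.42],                # negated objective for maximization
              A_ub=[[1, 2], [1.5, 1], [1, 0]],
              b_ub=[240, 180, 110],
              method="highs")
print(res.x, -res.fun)                         # about [60, 90] and 71.4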
*
Another method of solving such problems is the “two-phase method”.
In this example, there is only one equation with an artificial variable; if there were several equations with artificial variables, we would have to subtract accordingly.
[min] z = 4 x1 + x2
s. to   3 x1 + x2 = 3
        4 x1 + 3 x2 ≥ 6
        x1 + 2 x2 ≤ 4   {6}
with x ≥ 0. The augmented standard form is
[min] z = 4 x1 + x2 + 0 x3 + 0 x4 + M a1 + M a2
s. to   3 x1 + x2 + a1 = 3
        4 x1 + 3 x2 − x3 + a2 = 6
        x1 + 2 x2 + x4 = 4   {7}
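A quick numerical cross-check of {6}, assuming SciPy; the solver accepts the equality and “≥” constraints directly, so no big-M terms are needed for this verification.

from scipy.optimize import linprog

res = linprog(c=[4, 1],
              A_ub=[[-4, -3], [1, 2]],         # 4x1 + 3x2 >= 6 rewritten as <=
              b_ub=[-6, 4],
              A_eq=[[3, 1]], b_eq=[3],
              method="highs")
print(res.x, res.fun)                          # about x = (0.4, 1.8), z = 3.4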
Redundant?
[max] x2 − x1
s. to   x1 + x2 ≤ 10
        x1 + x2 ≤ 20   {1}
[max] z = − x1 + x2
s. to   x1 + x2 ≤ 10
        x1 + x2 ≤ 20   {2}
[max] z = − x1 + x2 + 0 x3 + 0 x4
s. to   x1 + x2 + x3 = 10
        x1 + x2 + x4 = 20   {3}
Solve:
Go to http://web.ist.utl.pt/~mcasquilho/compute/or/Fx-lp-revised.php
Supply:
  Opt.:  max
  Coefficients:  −1  1  0  0
  A|B:  1  1  1  0  |  10
        1  1  0  1  |  20
  Artificials:  0
  Initial basis:  3  4
Redundant? No problem.
[max] z = − x1 + x2
s. to   x1 + x2 ≤ 10
        x1 + x2 ≥ 20   {2}
M ≅ +∞
[max] z = − x1 + x2 + 0 x3 + 0 x4 − M x5
s. to   x1 + x2 + x3 = 10
        x1 + x2 − x4 + x5 = 20   {3}
Solve:
Go to http://web.ist.utl.pt/~mcasquilho/compute/or/Fx-lp-revised.php
Supply:
  Opt.:  max
  Coefficients:  −1  1  0  0  0
  A|B:  1  1  1  0  0  |  10
        1  1  0  −1  1  |  20
  Artificials:  5
  Big M:  1+2
  Initial basis:  3  5
Impossible? No problem.
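A quick check of the two variants above, assuming SciPy: the redundant constraint is harmless, while the contradictory pair is reported infeasible.

from scipy.optimize import linprog

ok = linprog(c=[1, -1],                        # maximize -x1 + x2 -> minimize x1 - x2
             A_ub=[[1, 1], [1, 1]], b_ub=[10, 20], method="highs")
print(ok.x, -ok.fun)                           # about [0, 10] and 10.0

bad = linprog(c=[1, -1],
              A_ub=[[1, 1], [-1, -1]],         # x1 + x2 <= 10 together with x1 + x2 >= 20
              b_ub=[10, -20], method="highs")
print(bad.status, bad.message)                 # status 2: the problem is infeasible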
Scientific application (!) of LP
→ Resolution
a) Classical solution
(We will use only points 1, 4 and 7 of Table 1. With all the points, the source cited gives ŷ = 26,81 − 1,55 x, “in the sense of least squares”.)
As is well known, the parameters of the problem are obtained minimizing a
sum of errors (squared, for convenience), of the form
z = Σ_{i=1..n} (yi − yie)²   {1}
with
z – measure (a sum) of the n errors ([z] = ψ², see below)
n – number of experiments
yi – theoretical (or “calculated”) value, y = a x + b, of the measured variable, corresponding to xi (i integer, i = 1..n)
a, b – process parameters (with χ and ψ the dimensions of x and y, respectively, [a] = ψ χ⁻¹ and [b] = ψ)
yie – experimental value (thus a constant) of the measured variable, corresponding to xi
So, z¹ is a function of only a and b, whose minimum is easy to find by differentiation (setting ∂z/∂a = 0 and ∂z/∂b = 0), while the optimum value of z itself is not relevant.

¹ The use of z, as may be concluded, would be more logical, although indifferent from the viewpoint of optimization.
Table 2
  i     xi     yie     xi − x̄    (xi − x̄) yie    (xi − x̄)²
  1      2    24,3       −6         −145,8          36,00
  2      4    19,7
  3      6    17,8
  4      8    14,0        0            0,0           0,00
  5     10    12,3
  6     12     7,2
  7     14     5,5        6           33,0          36,00
 Sum    24    43,8       (0)        −112,8          72,00
Average:  x̄ = 8,  ȳ = 14,6
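From the sums in Table 2, the classical parameters follow from the usual closed-form least-squares formulas (a step not shown in this excerpt); in Python:

xs = [2.0, 8.0, 14.0]
ys = [24.3, 14.0, 5.5]

x_bar = sum(xs) / len(xs)                      # 8
y_bar = sum(ys) / len(ys)                      # 14.6
a = sum((x - x_bar) * y for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
b = y_bar - a * x_bar
print(a, b)                                    # about -1.567 and 27.13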
with
di = yie − yi   {4}
or, since (in this case) y = a x + b,
² The plural in the form “x’s” seems appropriate (better than “xx”).
Now, this problem has the disadvantage, in this form, of not being a linear program because, of course, the “absolute value”³ of a linear function is not a linear function [Ecker et al., 1988]. We can, however, convert it into a linear program through the following elementary fact: |u| ≤ z if and only if u ≤ z and −u ≤ z. So, let us replace each (non-linear) inequality by two linear inequalities, to get a linear program:
[min] z   {9}
subject to
z ≥ + (yie − a xi − b)   i = 1..n   {10a}
z ≥ − (yie − a xi − b)   i = 1..n   {10b}
z ≥ 0;  a, b of free sign
As is known, a and b can be replaced by differences of non-negative variables, say, a′ − a″ and b′ − b″. Incidentally, as we have (possibly good) approximations of the optimum values of a and b from the previous section, we can simply replace a by −a′ (a′ non-negative) —an artifice that must be verified in the end (and which would be under suspicion if we obtained the boundary value a′ = 0).
The problem then becomes:
[min] z {11}
subject to
³ Or “modulus”.
In matrix form, it is
[min] w = cᵀ x
subject to:  A x ≥ b,  x ≥ 0   {15}
with
x = [a′  b  z]ᵀ,   cᵀ = [0  0  1]
and
      |   2   −1   1 |        | −24,3 |
      |   8   −1   1 |        | −14,0 |
      |  14   −1   1 |        |  −5,5 |
A =   |  −2    1   1 |,   b = |  24,3 |   {16}
      |  −8    1   1 |        |  14,0 |
      | −14    1   1 |        |   5,5 |
i) Direct resolution
The problem, as just formulated, has 3 structural variables and 6 constraints. Its manual resolution thus requires the practically infeasible handling of square matrices of order 6, among other difficulties. The computer resolution took 5 iterations and gave (as structural variables and objective function):
[a′  b  z]ᵀ = [1,56667  26,9833  0,45]ᵀ   (z = 0,45)   {17}
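A numerical cross-check of {17}, assuming SciPy, with variables (a′, b, z) ≥ 0 and a = −a′ as in the text:

from scipy.optimize import linprog

xs = [2.0, 8.0, 14.0]
ys = [24.3, 14.0, 5.5]

c = [0.0, 0.0, 1.0]                                    # variables (a', b, z): minimize z
A_ub, b_ub = [], []
for xi, yi in zip(xs, ys):
    A_ub.append([ xi, -1.0, -1.0]); b_ub.append(-yi)   # z >= yi - (-a'*xi + b)
    A_ub.append([-xi,  1.0, -1.0]); b_ub.append( yi)   # z >= (-a'*xi + b) - yi
res = linprog(c, A_ub=A_ub, b_ub=b_ub, method="highs") # a', b, z >= 0 by default
print(res.x)                                           # about [1.5667, 26.983, 0.45]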
(Primal / dual classification table not reproduced here.)
The case under study corresponds to the classification above; in other cases, the
descriptions under the titles primal and dual would be exchanged.
Among other properties, it can be proved that:
– If one of the problems has an optimum vector (solution), then so does the other, and the two optimum objective function values are equal.
– If one of the problems is possible (feasible) but has no finite optimum, then the other is impossible (infeasible).
– Both problems can be impossible.
– The optimum vector of the maximization has as elements the coefficients of the slack variables in the optimum basis of the minimization, and reciprocally.
Therefore, starting from the original problem under study, which has 3 structural variables and 6 constraints (two for each experimental point), its dual can be constructed, having 6 structural variables and only 3 constraints. So, in this case, the dual (a) evolves by much easier iterations (matrices of order 3, not 6), and (b) will be, expectedly, less cumbersome, since it should take about half as many iterations (roughly proportional to 3, not 6). Using the dual would also make it easy to consider all the experimental points, even if they were more numerous, as the number of iterations to the optimum depends essentially on the number of constraints.
The dual would be:
[max] (w =) −24,3 s1 − 14,0 s2 − 5,5 s3 + 24,3 s4 + 14,0 s5 + 5,5 s6   {18}
subject to
      |  2    8   14   −2   −8  −14 |                     | 0 |
      | −1   −1   −1    1    1    1 |  [s1 s2 … s6]ᵀ  ≤   | 0 |   {19}
      |  1    1    1    1    1    1 |                     | 1 |
with s1, …, s6 ≥ 0.
The result, in 4 iterations (instead of 5), is (of course)
w = z = 0,45   {20}
and contains —in its so-called dual variables— the values
∆ = [−1,567  −26,98  −0,45]ᵀ   {21}
Consequently, this vector (always negative —i.e., non-positive— at the optimum of a maximization, of course) has as elements the negatives of the results (a′, b, z) of the primal, already known.
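A quick check of the dual {18}–{19}, assuming SciPy and taking the dual objective as bᵀs with b from {16} (the maximization is written as minimization of the negated objective); its optimum matches the primal value 0,45:

from scipy.optimize import linprog

b_dual = [-24.3, -14.0, -5.5, 24.3, 14.0, 5.5]       # dual objective coefficients b
A_T = [[ 2,  8,  14, -2, -8, -14],                   # A transposed, from {16}
       [-1, -1,  -1,  1,  1,   1],
       [ 1,  1,   1,  1,  1,   1]]
res = linprog(c=[-v for v in b_dual],                # maximize b^T s
              A_ub=A_T, b_ub=[0, 0, 1], method="highs")   # s >= 0 by default
print(res.x, -res.fun)      # s about (0, 0.5, 0, 0.25, 0, 0.25); optimum 0.45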
References
– ECKER, Joseph G., and Michael KUPFERSCHMID, 1988, “Introduction to Operations Research”, John Wiley & Sons, New York, NY (USA). ISBN 0-471-63362-3.
– GUTTMAN, Irwin, Samuel S. WILKS, and J. Stuart HUNTER, 1982, “Introduction to Engineering Statistics”, 3rd ed., John Wiley & Sons, New York, NY (USA). ISBN 0-471-86956-2.
Bronson & Naadimuthu, 1997, pp 56–57