Statistical Methods in Algorithm
Design and Analysis
Bruce W. Weide
Dept. of Computer Science
Carnegie-Mellon University
Pittsburgh, PA 15213
August 1978

Submitted to Carnegie-Mellon University in partial fulfillment of the
requirements for the degree of Doctor of Philosophy

Carnegie-Mellon University
CARNEGIE INSTITUTE OF TECHNOLOGY
AND
MELLON INSTITUTE OF SCIENCE
THESIS
SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF Doctor of Philosophy
TITLE: STATISTICAL METHODS IN ALGORITHM DESIGN AND ANALYSIS
PRESENTED BY: Bruce W. Weide
ACCEPTED BY THE DEPARTMENT OF Computer Science
Michael Shamos
APPROVED BY THE COLLEGE COUNCIL

Abstract
The use of statistical methods in the design and analysis of discrete algorithms is explored. Among the design tools are randomization, ranking, sampling and subsampling, density estimation, and "cell" or "bucket" techniques. The analysis techniques include those based on the design methods as well as the use of stochastic convergence concepts and order statistics.
The introductory chapter contains a literature survey and background material on probability theory. In Chapter 2, probabilistic approximation algorithms are discussed with the goal of exposing and correcting some oversights in previous work. Some advantages of the proposed solution are the introduction of a homogeneous model for dealing with random problems, and a set of methods for analyzing the probabilistic behavior of approximation algorithms which permit consideration of fairly complex algorithms in which there are dependencies among the random variables in question.
Chapter 3 contains many useful design and analysis tools such as those mentioned above, and several examples of the uses of the methods. Algorithms which run in linear expected time for a wide range of probabilistic assumptions about their inputs are given for problems ranging from sorting to finding all nearest neighbors in a point set in k dimensions. Empirical results are presented which indicate that the sorting algorithm, Binsort, is a good alternative to Quicksort under most conditions. There are also new algorithms for some selection and discrete optimization problems.
Finally, Chapter 4 describes the uses of results from order statistics to analyze greedy algorithms and to investigate the behavior of parallel algorithms. Among the results reported here are general theorems regarding the distribution of solution values for optimization problems on weighted graphs. Many recent results in the literature, which apply for certain distributions of edge weights and for specific problems, follow as immediate corollaries from these general theorems.
Summary
This summary of the thesis begins with the motivations for expanding some recent work in probabilistic and randomized algorithms, and a short literature survey. References can be found in the thesis itself. Summaries of the three main chapters follow.
Only within the last few years have serious attempts been made to investigate probabilistic models of algorithm behavior. One of the most revolutionary ideas, at least to computer scientists and mathematicians, is the notion offered by Karp [1976] and Rabin [1976], among others, that an algorithm need not get the exact answer for every input. While it is, of course, most desirable that an algorithm always get the correct solution, this is not necessarily the most cost-effective approach, since it is widely believed that the NP-hard problems require exponential computing times.
There are three options available to help overcome this difficulty. First, it is possible for an algorithm to produce a good approximation all the time, an alternative which has been recognized for quite a while. Karp [1976] credits Graham [1966] with pioneering such algorithms for NP-hard problems. Even producing a guaranteed good approximation is NP-hard for certain problems, though (see Garey and Johnson [1976]). A second possibility is for an algorithm to get the exact answer most of the time (Rabin [1976]). Finally, an algorithm could produce a good approximation to the correct answer most of the time (Karp [1976]). A short survey article by Gimady, Glebov, and Perepelica [1976] cites similar ideas in the Russian mathematical literature.

Unfortunately, there is now considerable confusion regarding definitions of certain key terms being used to describe probabilistic algorithms, such as Karp's [1976] algorithm for the Euclidean traveling salesman problem and Posa's [1976] algorithm for the Hamiltonian circuit problem. That this confusion exists is undeniable, even though it is not mentioned in the literature. The source of the difficulty seems to be that there are at least three non-equivalent probabilistic models (and associated definitions of the phrase "almost everywhere"), but results obtained under one model are commonly cited in papers using a different one. Chapter 2 deals with this problem, showing the relationships among the different models and how some proofs could be revised to take advantage of these relationships.
Although there have been many expected-time analyses of discrete algorithms, most authors make very restrictive assumptions about the distributions of input parameters. Two notable exceptions are Spira [1973], who assumes no particular distribution of edge weights for the shortest path problem, but merely that they are independent; and Bentley and Shamos [1978], who make only a very weak technical assumption about the distribution of the points to prove good expected behavior of their planar convex hull algorithm.
Hoare [1962] proposed that analysis of Quicksort could be freed of the assumption of equally likely input permutations simply by choosing the partitioning element at random. More recently Yuval [1975b], Rabin [1976], and Carter and Wegman [1977] have suggested that this idea be used to design algorithms which perform well under a wide range of probabilistic assumptions. In Chapter 3 we continue this trend, and show how randomization and sampling can affect both the design and analysis of many algorithms.

Chapter 4 includes new probabilistic models for some problems which can be analyzed by the use of order statistics. Borovkov [1962], Weide [1976], Golden [1977], Baudet [1978], and Robinson [1978] have previously made use of such methods for computer science problems. Some of the general theorems in Chapter 4 have as corollaries a number of special results regarding the asymptotic behavior of the solutions to optimization problems on graphs. The previous cases have been analyzed on an ad hoc basis, depending on the particular distribution of edge weights.
Summary of Chapter 2: Stochastic Convergence and
Probabilistic Algorithms
Chapter 2 is a discussion of the analysis of "probabilistic approximation algorithms" using the concepts of stochastic convergence. Such an algorithm usually, but not necessarily always, produces the exact solution to a problem or at least a good approximate answer. We would like to be able to characterize a particular probabilistic approximation algorithm by a statement of the form, "The algorithm produces an answer having relative error at most ε with probability at least p." A good probabilistic approximation algorithm would have a small value of ε and p near one.
Unfortunately, most problems for which probabilistic approximation seems appropriate (such as the NP-hard problems) cannot be solved even in this weak sense by simple algorithms. As a result, the probabilistic analysis of such an algorithm is typically complicated by the fact that certain steps of the algorithm are not independent, and by the fact that the answer produced by the algorithm is not independent of the true answer. The latter correlation is, of course, desirable (otherwise we would be hard pressed to justify the procedure as an algorithm for solving the problem), but it contributes to making probabilistic analysis extremely difficult in general.
As an alternative, we follow Karp [1976] in proposing that the behavior of a probabilistic approximation algorithm be characterized by the stochastic convergence (to zero) of the sequence of errors in the answers it produces to a sequence of random problems. We show how to deal with dependence among the relevant random variables, and introduce the notions of "strong" and "weak" success of algorithms to describe those which have error sequences which converge to zero almost surely and in probability, respectively.
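For reference, the two convergence modes invoked here can be written out explicitly. These are the standard textbook definitions, not formulas copied from the thesis:

```latex
% "Strong" success: the error sequence X_n converges to zero almost surely.
X_n \xrightarrow{\text{a.s.}} 0
  \quad\Longleftrightarrow\quad
  \Pr\Bigl[\lim_{n\to\infty} X_n = 0\Bigr] = 1 .

% "Weak" success: the error sequence converges to zero in probability.
X_n \xrightarrow{P} 0
  \quad\Longleftrightarrow\quad
  \lim_{n\to\infty} \Pr\bigl[\,|X_n| > \epsilon\,\bigr] = 0
  \quad \text{for every } \epsilon > 0 .
```

Almost-sure convergence implies convergence in probability, which is consistent with "strong" success being the stronger of the two criteria.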
It is obvious that the probabilistic model of the problem instances can be an important factor in determining how "strongly" an algorithm succeeds in this sense. If edge weights in a graph are chosen from a uniform distribution, for example, the algorithm might succeed strongly, whereas if they are chosen from a normal distribution the algorithm might not work at all. This possibility is apparently recognized by everyone. It turns out that a much more subtle problem with probabilistic models has gone unnoticed in the literature of the field, and we propose a scheme to correct this deficiency.
The solution involves the distinction between what we call the "incremental problem model", in which the nth problem of the sequence differs only incrementally from the previous one, and the "independent problem model", in which the nth problem of the sequence is totally independent of the previous one. Our main result relates the problem models and modes of stochastic success, and demonstrates that strong success in the independent model is a strictly stronger criterion than strong success in the incremental model, which is strictly stronger than weak success in either model.

Unfortunately, while there are many probabilistic analyses of algorithms which demonstrate strong success, most of these apply only in the incremental model. After giving a review of many of the papers in this area in an attempt to illustrate the confusion which can arise if the distinction between problem models is not made explicitly, we propose that this difference be recognized, and argue for adoption of the independent problem model as the canonical basis for proving stochastic success of probabilistic approximation algorithms.
Finally, we give a detailed analysis of Karp's [1976] algorithm for the Euclidean traveling salesman problem. The main result here suggests that the algorithm succeeds strongly in the independent model, although that conclusion has yet to be proved. A long and rather detailed proof of our theorem is included to demonstrate the use of several techniques which seem to be of universal utility in dealing with such problems.
Chapter 2 is by far the most difficult reading in this thesis, and its importance to subsequent results rests primarily with the definitions of strong and weak success and the identification of the two problem models. The reader who is familiar with these concepts and understands the hierarchy of problem models and stochastic convergence should have no difficulty interpreting later results which refer to Chapter 2.
Summary of Chapter 3: Randomization and Sampling
Chapter 3 contains many practical techniques and results. We begin with a classification of algorithms into the "probabilistic approximation algorithms" of Chapter 2, "randomized algorithms", and all others. A randomized algorithm is non-deterministic in the sense that it may not perform exactly the same computation if given the same inputs. The non-determinism is the result of a randomization or sampling step in the algorithm which is designed to give the algorithm good expected behavior over a wide range of input distributions. A natural correspondence between these algorithm classes and parametric and non-parametric statistics is introduced in order to suggest which statistical techniques may be most useful in designing certain types of algorithms.
We then introduce four general techniques for designing randomized and probabilistic approximation algorithms using ideas from statistics. These include "randomization", which is the process of shuffling the input prior to running an algorithm in an attempt to achieve good expected running time regardless of the input permutation; "approximation in rank", which is useful in problems on totally ordered sets and for discrete optimization problems; "estimation and subsampling" for designing probabilistic approximation algorithms; and the use of "empirical distribution functions" to extend the domain of good behavior of certain algorithms from uniformly distributed inputs to all distributions satisfying certain technical conditions.
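The last of these techniques can be sketched in modern code. The following is a minimal illustration of the empirical-distribution-function idea, not the thesis's own algorithm: a small random subsample estimates the input CDF, each key is mapped through the estimate so the transformed keys are roughly uniform, and the result is bucket-sorted. All names and parameters here are invented for the sketch.

```python
import bisect
import random

def edf_transform_sort(xs, sample_size=64):
    """Illustrative sketch: estimate the input distribution with the
    empirical distribution function (EDF) of a small random subsample,
    map each key through the EDF so the mapped keys are roughly uniform
    on [0, 1], then distribute the keys into n buckets by that value."""
    n = len(xs)
    if n <= 1:
        return list(xs)
    sample = sorted(random.sample(xs, min(sample_size, n)))
    m = len(sample)

    def edf(x):
        # Fraction of sample values <= x; monotone nondecreasing in x.
        return bisect.bisect_right(sample, x) / m

    buckets = [[] for _ in range(n)]
    for x in xs:
        # edf(x) lies in [0, 1]; clamp the index into the last bucket.
        buckets[min(int(edf(x) * n), n - 1)].append(x)
    out = []
    for b in buckets:
        out.extend(sorted(b))  # each bucket is small on average
    return out
```

Because the EDF is monotone, smaller keys never land in later buckets, so concatenating the sorted buckets yields a sorted list; when the EDF tracks the true distribution well, each bucket receives O(1) keys in expectation.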
The thesis contains at least two examples of the use of each technique. An algorithm for sorting real numbers from a wide class of distributions in linear expected time is given. Empirical results show that the algorithm, Binsort, is a practical alternative to Quicksort when more than a few hundred items are to be sorted. We also include new on-line algorithms for selection problems which use very little space, but therefore are necessarily only approximate. Again, empirical results indicate that the approximations are better in practice than can be proved in theory.
The techniques are also shown to be effective in designing and analyzing algorithms for discrete optimization and geometrical problems. They lead to new algorithms for some closest point problems which run in linear expected time for a large class of point distributions. Other geometrical problems, such as finding the convex hull of a set of points in the plane, can be solved in linear expected time by using Binsort to do sorting. The expected running time of an algorithm for which the worst case is dominated by a sorting step can often be improved by this method.
Summary of Chapter 4: Order Statistics
Some intriguing results from the field of order statistics are used in Chapter 4 to analyze the behavior of solutions to graph optimization problems and to compare these with the behavior of greedy algorithms for such problems. The results include a host of previous results in the literature as special cases. In particular, two theorems relate the value of the optimal solution to a problem defined by an objective function on the edge weights to the existence in a random graph of a subgraph satisfying the structural constraints of the problem. For instance, they relate the length of the optimal traveling salesman tour in a randomly weighted complete graph to the existence of a Hamiltonian circuit in a random graph.
There is also a discussion of the rather surprising fact that a randomized algorithm (as defined in Chapter 3) can, in theory at least, possibly be improved simply by starting up several instantiations of the same problem simultaneously and "time-sharing" the computing resources among the different versions. We state a condition on the distribution of running times of a randomized algorithm which, if satisfied, assures that the algorithm does not have optimal expected running time.
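The effect described above is easy to demonstrate numerically. The simulation below uses a hypothetical heavy-tailed running-time distribution chosen purely for illustration (the specific numbers and the thesis's actual condition are not reproduced here): time-sharing k instantiations finishes after k times the minimum of k independent running times, which can beat a single run in expectation when the tail is heavy.

```python
import random

def single_run(rng):
    """One run of a hypothetical randomized algorithm: fast with
    probability 0.9, very slow with probability 0.1 (illustrative
    numbers only, not taken from the thesis)."""
    return 1.0 if rng.random() < 0.9 else 100.0

def time_shared_run(rng, k):
    """Time-share the processor among k independent instantiations,
    each progressing at 1/k speed; the job finishes when the luckiest
    instantiation would have finished, i.e. after k * min(T_1,...,T_k)."""
    return k * min(single_run(rng) for _ in range(k))

rng = random.Random(1)
trials = 200_000
solo = sum(single_run(rng) for _ in range(trials)) / trials
shared = sum(time_shared_run(rng, 2) for _ in range(trials)) / trials
# Analytically: E[T] = 0.9*1 + 0.1*100 = 10.9, while
# 2*E[min(T1, T2)] = 2*(0.99*1 + 0.01*100) = 3.98,
# so time-sharing two copies beats a single copy in expectation.
```

For light-tailed running-time distributions the inequality reverses, which is why a condition on the distribution is needed before time-sharing can be guaranteed to help.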
Finally, a few easy results from order statistics are used in the analysis of a schema for problem decomposition for asynchronous multiprocessors. The results are extended from the case of an ideal multiprocessor to one in which there is overhead associated with the scheduling and dispatching of tasks to the processors.

Contents
Acknowledgements
1. Introduction and Summary
1.1. Previous Work
1.2. Summary of Chapter 2: Stochastic Convergence, Probabilistic Algorithms
1.3. Summary of Chapter 3: Randomization and Sampling
1.4. Summary of Chapter 4: Order Statistics
1.5. Background Material
1.5.1. Notation
1.5.2. Basic Probability Theory
1.5.3. Random Structures
1.6. Conclusions and Further Work
2. Probabilistic Algorithms and Stochastic Convergence
2.1. Stochastic Convergence
2.2. Random Problem Models
2.3. History of Confusion
2.4. Strong Success in the Independent Model
2.5. Example: The Traveling Salesman Problem
2.6. Conclusions
3. Randomization and Sampling
3.1. Classification of Algorithms
3.2. Classification of Statistical Procedures
3.3. Design Principles for Randomized Algorithms
3.3.1. Randomization
3.3.2. Approximation in Rank
3.3.3. Estimation and Subsampling
3.3.4. Empirical Distribution Functions
3.4. Examples
3.4.1. Sorting and Searching
3.4.2. Selection
3.4.3. Discrete Optimization
3.4.4, Geometrical Problems
3.5. Conclusions
4. Order Statistics
4.1. Expected Values and Asymptotic Distributions
4.2. Examples
4.2.1. Greedy Algorithms
4.2.2. Parallelism
4.2.3. Problem Decomposition for Multiprocessors
4.3. Conclusions
5. References
Acknowledgements
The suggestion that there might be something more to statistics than "mere computation" was made by my thesis advisor, Mike Shamos, about three years ago. He proceeded to explore how the algorithm design tools of computer science could be applied to geometrical computations. Meanwhile, I examined the opposite approach: How could statistical tools help in the design and analysis of algorithms for computer science? I am happy to acknowledge that Mike is responsible for my interest in statistics as well as algorithms. It is enough to ask that one's thesis advisor help with technical matters, but for him to translate articles from the Russian originals is more than should be expected. Nevertheless, that is exactly what Mike did. He is also the source of most of the problem ideas and started me thinking about many of the solutions in this thesis, and I am proud to consider him my friend.
Without the patience and assistance of the other members of my thesis committee, however, this work would still be a proposal. Jon Bentley provided many insightful suggestions regarding the algorithmic aspects, and more than he admits regarding the statistical ones. Bill Eddy put up with my constant questions about probability theory and statistics, answering most of them immediately and spending considerable effort leading me in search of answers to the others. Bill offered many particularly good ideas about the material in Chapter 2, and both he and Jon were always available for consultation about technical and non-technical problems alike. Jay Kadane and H.T. Kung also made many important suggestions, without which I would still be trying to prove some of the theorems of Chapters 2 and 4.
Of course, many other people helped with the technical problems and with my writing style. At the risk of overlooking someone, I would especially like to thank Tom Andrews, Gerard Baudet, Kevin Brown, Peter Denning, Diane Detig, Therese Flaherty, Sam Fuller, Paul Hilfinger, David Jefferson, John Lehoczky, Takao Nishizeki, Larry Rafsky, John Robinson, Jim Saxe, Joe Traub, and Jay Wolf. Also, thanks to Lee Cooprider, Reid, and Mark Sapsford for their assistance with CMU's marvelous document production facilities. In keeping with tradition, I suppose that I should assume responsibility for any remaining errors in the manuscript, which I hereby do. However, I am confident that there are not too many left because of the careful perusal of early drafts by several of these people.
Generous financial support during my four years at CMU was provided by the National Science Foundation and by IBM in the form of graduate fellowships, and by my parents in forms too numerous to list here.
Finally, I would like to thank my family, especially my parents, Harley and Betty Jo Weide, and several close friends who helped make this experience a pleasant as well as an educational one. Ann Kolwitz, Jay and Ellen Wolf, Diane Detig, Dave and Moddy McKeown, and Jon and Judy Rosenberg saved me from overwork and boredom on several occasions, as did the members of Turing's Machine, the Arpanets, the Jive Turkeys, and last but certainly not least, SIGLUNCH. To all these people I owe a sincere debt of gratitude for their continuing friendship.

1. Introduction and Summary
Until quite recently, research in the design and analysis of discrete algorithms has been devoted to the "worst-case" question: At worst, how bad is this algorithm? The results produced by this effort are of considerable intrinsic interest, and even of occasional practical value, but the label "pessimist" is often attached to computer scientists who pursue these issues. Investigations of the "typical" behavior of algorithms are actually motivated more by pragmatism than by optimism, but they are frustrated by the difficulty of dealing with general probabilistic models.
The major point of this thesis is that certain parts of probability theory and statistics, which are not really difficult to learn, provide valuable tools for the exploration of some practical issues in algorithm design and analysis. Thus, while many of the results reported here are apparently only of a theoretical nature, others are directly applicable to real-world situations. Although the contributions of this work include these results, they constitute only a minor part of the motivation for it: most are simply demonstrations that results can be produced using the proposed methods.
The most important parts of this thesis are the introduction of a homogeneous model for random problems, which should help prevent the kinds of misinterpretations which have appeared in initial efforts to deal with probabilistic models; promotion of the idea that algorithms need not always get the exact answer in order to be viable; and introduction of long-ignored probabilistic and statistical tools to enable design and analysis of algorithms under very general probabilistic assumptions.

1. The other side of the coin, namely how algorithm analysts can help statisticians, is examined by Shamos [1976].
The introductory chapter begins with a brief review of previous work in the area, although most of the details are left for later chapters. Section 1.2 is a summary of Chapter 2, on stochastic convergence and probabilistic algorithms; Section 1.3 is a summary of Chapter 3, on randomization and sampling; and Section 1.4 is a summary of Chapter 4, on the uses of order statistics. Section 1.5 introduces some basic material which is essential to developing the relationships between computer science and statistics. There is a description of notation and basic probability theory and a unifying concept of "random structures" which will be used throughout the thesis. Finally, in Section 1.6 we mention some open problems and present several points which the reader should keep in mind as he reads the more technical material of later chapters.
1.1. Previous Work
Only within the last few years have serious attempts been made to investigate probabilistic models of algorithm behavior. One of the most revolutionary ideas, at least to computer scientists and mathematicians, is the notion offered by Karp [1976] and Rabin [1976], among others, that an algorithm need not get the exact answer for every input. While it is, of course, most desirable that an algorithm always get the correct solution, this is not necessarily the most cost-effective approach, since it is widely believed that solving NP-hard problems exactly requires exponential computing time.
There are three options available to help overcome this difficulty. First, it is possible for an algorithm to produce a good approximation all the time, an alternative which has been recognized for quite a while. Karp [1976] credits Graham [1966] with pioneering such algorithms for NP-hard problems. Even producing a guaranteed good approximation, however, is NP-hard for certain problems (see Garey and Johnson [1976]). A second possibility is for an algorithm to get the exact answer most of the time (Rabin [1976]). Finally, an algorithm could produce a good approximation to the correct answer most of the time (Karp [1976]). A short survey article by Gimady, Glebov, and Perepelica [1976] cites similar ideas in the Russian mathematical literature.
Unfortunately, there is now considerable confusion regarding definitions of certain key terms being used to describe probabilistic algorithms, such as Karp's [1976] algorithm for the Euclidean traveling salesman problem and Posa's [1976] algorithm for the Hamiltonian circuit problem. That this confusion exists is undeniable, even though it is not mentioned in the literature. The source of the difficulty seems to be that there are at least three non-equivalent probabilistic models (and associated definitions of the phrase "almost everywhere"), but results obtained under one model are commonly cited in papers using a different one. Chapter 2 deals with this problem, showing the relationships among the different models and how some proofs could be revised to take advantage of these relationships. Although this may sound like a serious attack on the authors or their results, it is just the opposite. The fact that some subtle problems have been overlooked in such pioneering efforts is not unusual from a historical viewpoint. An opportunity to clear them up at their inception is too good to miss.
Although there have been many expected-time analyses of discrete algorithms, most authors make very restrictive assumptions about the distributions of input parameters. Two notable exceptions are Spira [1973], who assumes no particular distribution of edge weights for the shortest path problem, but merely that they are independent; and Bentley and Shamos [1978], who make only a very weak technical assumption about the distribution of the points to prove good expected behavior of their planar convex hull algorithm. One of the many open questions in this area is to develop a problem model which allows dependence among probabilistic quantities, and then analyze it, which seems feasible since mild dependence is allowed by a variety of statistical theorems.
Hoare [1962] proposed that analysis of Quicksort could be freed of the assumption of equally likely input permutations simply by choosing the partitioning element at random. More recently Yuval [1975b], Rabin [1976], and Carter and Wegman [1977] have suggested that this idea be used to design algorithms which perform well under a wide range of probabilistic assumptions. In Chapter 3 we continue this trend, and show how randomization and sampling can affect both the design and analysis of many algorithms.
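Hoare's idea above can be sketched in a few lines. This is a generic modern illustration of random pivot selection, not code from the thesis (which predates the language): because the pivot is chosen at random, the O(n log n) expected running time holds for every fixed input permutation, not just for uniformly random ones.

```python
import random

def randomized_quicksort(xs):
    """Quicksort with a randomly chosen partitioning element.
    The expectation is over the algorithm's own coin flips, so no
    assumption about the input permutation is needed."""
    if len(xs) <= 1:
        return list(xs)
    pivot = random.choice(xs)          # the randomization step
    less = [x for x in xs if x < pivot]
    equal = [x for x in xs if x == pivot]
    greater = [x for x in xs if x > pivot]
    return randomized_quicksort(less) + equal + randomized_quicksort(greater)
```

The design choice is exactly the one the paragraph describes: moving the probabilistic assumption out of the input model and into the algorithm itself.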
Chapter 4 includes new probabilistic models for some problems which can be analyzed by the use of order statistics. Borovkov [1962], Weide [1976], Golden [1977], Baudet [1978], and Robinson [1978] have previously made use of such methods for computer science problems. Some of the general theorems in Chapter 4 have as corollaries a number of special results regarding the asymptotic behavior of the solutions to optimization problems on graphs. The previous cases have been analyzed on an ad hoc basis, depending on the particular distribution of edge weights.
1.2. Summary of Chapter 2: Stochastic Convergence and
Probabilistic Algorithms
Chapter 2 is a discussion of the analysis of "probabilistic approximation algorithms" using the concepts of stochastic convergence defined in Section 1.5.2. Such an algorithm usually, but not necessarily always, produces the exact solution to a problem or at least a good approximate answer. We would like to be able to characterize a particular probabilistic approximation algorithm by a statement of the form, "The algorithm produces an answer having relative error at most ε with probability at least p." A good probabilistic approximation algorithm would have a small value of ε and p near one.
Unfortunately, most problems for which probabilistic approximation seems appropriate (such as the NP-hard problems) cannot be solved even in this weak sense by simple algorithms. As a result, the probabilistic analysis of such an algorithm is typically complicated by the fact that certain steps of the algorithm are not independent, and by the fact that the answer produced by the algorithm is not independent of the true answer. The latter correlation is, of course, desirable (otherwise we would be hard pressed to justify the procedure as an algorithm for solving the problem), but it contributes to making probabilistic analysis extremely difficult in general.
As an alternative, we follow Karp [1976] in proposing that the behavior of a probabilistic approximation algorithm be characterized by the stochastic convergence (to zero) of the sequence of errors in the answers it produces to a sequence of random problems. We show how to deal with dependence among the relevant random variables, and introduce the notions of "strong" and "weak" success of algorithms to describe those which have error sequences which converge to zero almost surely and in probability, respectively.
It is obvious that the probabilistic model of the problem instances can be an important factor in determining how "strongly" an algorithm succeeds in this sense. If edge weights in a graph are chosen from a uniform distribution, for example, the algorithm might succeed strongly, whereas if they are chosen from a normal distribution the algorithm might not work at all. This possibility is apparently recognized by everyone. It turns out that a much more subtle problem with probabilistic models has gone unnoticed in the literature of the field, and we propose a scheme to correct this deficiency.
The solution involves the distinction between what we call the "incremental problem model", in which the nth problem of the sequence differs only incrementally from the previous one, and the "independent problem model", in which the nth problem of the sequence is totally independent of the previous one. Theorem 2.8 is our main result relating the problem models and modes of stochastic success, and it demonstrates that strong success in the independent model is a strictly stronger criterion than strong success in the incremental model, which is strictly stronger than weak success in either model.
Unfortunately, of the many probabilistic analyses of algorithms which demonstrate strong success, most apply only in the incremental model. Section 2.3 gives a review of many of the papers in this area in an attempt to illustrate the confusion which can arise if the distinction between problem models is not made explicitly. We therefore propose that this difference be recognized, and argue for adoption of the independent problem model as the canonical basis for proving stochastic success of probabilistic approximation algorithms.
Finally, in Section 2.5 we give a detailed analysis of Karp's [1976] algorithm for the Euclidean traveling salesman problem. Theorem 2.9 suggests that the algorithm succeeds strongly in the independent model, although that conclusion has yet to be proved. The long and rather detailed proof of Theorem 2.9 is included to demonstrate the use of several techniques which seem to be of universal utility in dealing with such problems.
Chapter 2 is by far the most difficult reading in this thesis, and its importance to subsequent results rests primarily with the definitions of strong and weak success and the identification of the two problem models. The reader who is familiar with these concepts and understands the hierarchy described in Theorem 2.8 should have no difficulty interpreting later results which refer to Chapter 2.
1.3. Summary of Chapter 3: Randomization and Sampling
Chapter 3 contains many practical techniques and results. We begin with a classification of algorithms into the "probabilistic approximation algorithms" of Chapter 2, "randomized algorithms", and all others. A randomized algorithm is non-deterministic in the sense that it may not perform exactly the same computation if given the same inputs. The non-determinism is the result of a randomization or sampling step in the algorithm which is designed to give the algorithm good expected behavior over a wide range of input distributions. A natural correspondence between these algorithm classes and parametric and non-parametric statistics is introduced in order to suggest which statistical techniques may be most useful in designing certain types of algorithms.
Section 3.3 is part of the justification for the title of the thesis. We introduce four general techniques for designing randomized and probabilistic approximation algorithms using ideas from statistics. These include "randomization", which includes as a special case the process of shuffling the input prior to running an algorithm in an attempt to achieve good expected running time regardless of the input permutation; "approximation in rank", which is useful in problems on totally ordered sets and for discrete optimization problems; "estimation and subsampling" for designing probabilistic approximation algorithms; and the use of "empirical distribution functions" to extend the domain of good behavior of certain algorithms from uniformly distributed inputs to all distributions satisfying certain technical conditions.
In Section 3.4 we present at least two examples of the use of each technique. An algorithm for sorting real numbers from a large class of distributions in linear expected time is given in Section 3.4.1. Empirical results show that the algorithm, Binsort, is a practical alternative to Quicksort when more than a few hundred items are to be sorted. We also include new on-line algorithms for selection problems which use very little space, but are therefore necessarily only approximate. Again, empirical results indicate that the approximations are better in practice than can be proved in theory.
The techniques are also shown to be effective in designing and analyzing algorithms for discrete optimization problems and for geometrical problems. They lead to new algorithms for some closest point problems which run in linear expected time for a large class of point distributions. Other geometrical problems, such as finding the convex hull of a set of points in the plane, can be solved in linear expected time by using Binsort to do the sorting. The expected running time of any algorithm for which the worst case is dominated by a sorting step can often be improved by this method.
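Binsort itself is described in Chapter 3; as a rough sketch of the bucket idea behind it (the function name, and the assumption that inputs lie in a known range [lo, hi), are ours, not the thesis's):

```python
import random

def binsort(xs, lo=0.0, hi=1.0):
    """Bucket ('bin') sort sketch: O(n) expected time when the inputs
    are spread smoothly over [lo, hi); O(n log n) or worse otherwise."""
    n = len(xs)
    if n <= 1:
        return list(xs)
    bins = [[] for _ in range(n)]
    width = (hi - lo) / n
    for x in xs:
        i = min(int((x - lo) / width), n - 1)  # clamp x == hi into last bin
        bins[i].append(x)
    out = []
    for b in bins:
        out.extend(sorted(b))  # each bin holds O(1) items on average
    return out

data = [random.random() for _ in range(1000)]
assert binsort(data) == sorted(data)
```

The linear expected time comes from the fact that, for well-behaved input distributions, the expected number of items per bin is bounded by a constant.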
1.4. Summary of Chapter 4: Order Statistics
Some intriguing results from the field of order statistics are used in Chapter 4 to analyze the behavior of solutions to graph optimization problems and to compare these with the behavior of greedy algorithms for such problems. The results of Section 4.2.1 include a host of previous results in the literature as special cases. In particular, Theorems 4.8 and 4.9 relate the value of the optimal solution to a problem defined by an objective function on the edge weights to the existence in a random graph of a subgraph satisfying the structural constraints of the problem. For instance, they relate the length of the optimal traveling salesman tour in a randomly weighted complete graph to the existence of a Hamiltonian circuit in a random graph.
There is also a discussion of the rather surprising fact that a randomized algorithm (as defined in Chapter 3) can, in theory at least, possibly be improved simply by starting up several instantiations of the same problem simultaneously and taking the first answer produced. There is a condition on the distribution of running times of a randomized algorithm which, if satisfied, assures that the algorithm does not have optimal expected running time.
Finally, a few easy results from order statistics are used in the analysis of a schema for problem decomposition for asynchronous multiprocessors. The results are extended from the case of an ideal multiprocessor to one in which there is overhead associated with the scheduling and dispatching of tasks to the processors.
1.5. Background Material
This section contains a summary of notation, definitions, and elementary probability theory which will be used throughout the remaining chapters. It is intended only to provide a basis for the models and terminology which we will propose and use later. Most of the new concepts are defined when they arise naturally in later chapters, so only common terms and ideas are reviewed here. While much of the discussion may seem unnecessarily formal, subsequent issues will be much easier to identify with this foundation.
1.5.1. Notation
When dealing with asymptotic behavior of functions, our notation will essentially follow that used by Knuth [1976]. Specific notations are available to describe the relationships between functions f(n) and g(n), all of which are based on the behavior of the ratio f(n)/g(n) for all sufficiently large values of n. We say that

f(n) = o(g(n)) iff f(n)/g(n) → 0.
f(n) = O(g(n)) iff f(n)/g(n) ≤ c for some constant c.
f(n) = Ω(g(n)) iff f(n)/g(n) ≥ c for some constant c > 0.
f(n) = Θ(g(n)) iff f(n) = O(g(n)) and f(n) = Ω(g(n)).

Another possibility, f(n)/g(n) → ∞, is not specifically accounted for by this notation, but it turns out that it would be especially useful to have some way of indicating this behavior. By symmetry, the correct choice would seem to be f(n) = ω(g(n)). Rather than adopt this non-standard notation we will simply say that f(n) grows faster than g(n) if this condition is satisfied.
Other notation is essentially standard. For example, x → A⁺ means that x approaches A from above, and F(x⁻) is the limit of F(z) as z approaches x from below. Most of the other terminology commonly used in analysis of discrete algorithms is also used here; for example, log means logarithm to the base 2. See Weide [1977] for similar conventions and descriptions of most of the problems which will be examined here.
1.5.2. Basic Probability Theory
Perhaps the major problem which plagues attempts to use probability and statistics in diverse application areas is the limited degree to which these ideas are typically developed. One of the goals of this thesis is to define clearly the problem models being analyzed and to attempt to put previous work on a sound footing.
The basis of the probability theory we will need is the probability space (Ω, B, P). The set Ω is called the sample space, and consists of elements ω ∈ Ω called sample points. B is a σ-field (or σ-algebra, or Borel field) of Ω, which means that it is a class of subsets of Ω which is closed under complementation and countable union (and, as a result of these two, also under countable intersection). Finally, P is a probability measure; that is, P satisfies the following axioms:

(1) P(Ω) = 1.
(2) P(E) ≥ 0 for every E ∈ B.
(3) P(∪ᵢ Eᵢ) = Σᵢ P(Eᵢ) whenever Eᵢ ∩ Eⱼ = ∅ (the null set) for every i ≠ j.
Given a probability space (Ω₀, B₀, P₀), we define the infinite product space (Ω, B, P) in which Ω = Ω₀ × Ω₀ × ⋯, and where B is the usual σ-field and P the usual product measure there (see Halmos [1950] for more details). A sample point ω̄ ∈ Ω is an infinite sequence (ω₁, ω₂, …) where ωᵢ ∈ Ω₀. Such a space turns out to be very useful in several respects, and every space (Ω, B, P) will hereafter be assumed to be this infinite product space. This is very important, since some lemmas and theorems will not make sense in a general probability space.
Random variables are measurable real-valued functions on the sample space Ω. A random variable X has a distribution function F(x) = P{ ω̄ : X(ω̄) ≤ x }.² A distribution function is right-continuous (i.e., F(x⁺) = F(x) for all values of x), but may fail to be left-continuous at a countable number of points where it has jump discontinuities. Since F(x) is non-decreasing, it has an inverse F⁻¹(y) = inf{ x : F(x) ≥ y } (see Chung [1974]).
The events E₁, E₂, …, Eₙ are totally independent iff P(∩ᵢ Eᵢ) = ∏ᵢ P(Eᵢ). The sequence of random variables {Xₙ} will be said to be independent if no two of the functions {Xₙ} depend on common components of a sample point ω̄. This is a non-standard definition of independence of random variables, but clearly, whenever {Eₙ} are events involving, respectively, the independent random variables {Xₙ}, the events {Eₙ} are totally independent. The random variables {Xₙ} are identically distributed if the distribution of Xₙ does not depend on n. If {Xₙ} have the same distribution as another random variable X, we write Xₙ ~ X. Random variables which satisfy both these conditions are independent and identically distributed (iid).

2. Even though P is a function, it is customary to omit the parentheses around its argument, which is an event or set and is usually delimited by braces.
The expected value of a function of a random variable X, say g(X), is denoted E(g(X)). It is defined to be ∫ g(x) dF(x) whenever ∫ |g(x)| dF(x) exists, where F is the distribution function of X. The mean, or expected value, of X is simply E(X). The variance of X, denoted D(X), is just E((X − E(X))²) = E(X²) − E(X)².
The normal distribution F(x) = (2πσ²)^{-1/2} ∫_{-∞}^{x} exp(−(t − μ)²/(2σ²)) dt is given the special name N(μ, σ²). A random variable X having this distribution has mean μ and variance σ². By a slight abuse of notation, this fact will be denoted X ~ N(μ, σ²).
Of primary importance in later chapters will be stochastic convergence of a sequence of random variables. The sequence of random variables {Xₙ} converges almost surely (or almost everywhere, or with probability one) to X (written Xₙ →as X) whenever P{ ω̄ : lim Xₙ(ω̄) = X(ω̄) } = 1. Similarly, the sequence {Xₙ} converges in probability (or in measure) to X (written Xₙ →pr X) whenever, for all ε > 0, lim P{ ω̄ : |Xₙ(ω̄) − X(ω̄)| < ε } = 1. Finally, {Xₙ} converges in distribution to X (written Xₙ →d X) whenever Fₙ(x) = P{ ω̄ : Xₙ(ω̄) ≤ x } converges to F(x) = P{ ω̄ : X(ω̄) ≤ x } at all continuity points of F.³
To illustrate these concepts with an example of an infinite product space, we cite perhaps the most useful instance of all. The sample space Ω₀ is the set of reals 0 ≤ ω < 1, B₀ is the usual Borel field on [0,1), and P₀ is the usual Lebesgue measure. In the infinite product space, ω̄ is a sequence of real numbers, each of which is between 0 and 1. Furthermore, P{ ωₙ ≤ x } = x for 0 ≤ x < 1. In more common terms, each ωₙ is uniformly distributed between 0 and 1, and a sequence ω̄ consists of independent, uniformly distributed components. This space is so special that we will call it by its own name, 𝒰.

3. Convergence in distribution is actually a property of the sequence {Fₙ}, so that the random variables {Xₙ} need not be defined on the same space. This technical point is of no real concern to us, since all random variables used here will be defined on the same space 𝒰.
By means of the so-called probability-integral transformation, it is possible to define a random variable X having any given distribution function F (not necessarily continuous) by using the probability space 𝒰 and the functional inverse of F. Briefly, because F is increasing, F(x) = P{ X ≤ x } = P{ F(X) ≤ F(x) }. Letting X = F⁻¹(ωₙ) = inf{ x : F(x) ≥ ωₙ } for some n gives P{ ωₙ ≤ F(x) } = F(x), which is satisfied exactly when ωₙ is uniformly distributed between 0 and 1, as it is in the space 𝒰. This principle is used by simulation systems to generate random numbers from an exponential distribution, for example, by computing a function of random numbers from the uniform distribution.
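A concrete sketch of the transformation (the exponential distribution and its rate parameter are our illustration, not the thesis's):

```python
import math
import random

def exp_inverse_cdf(u, rate=1.0):
    """F^{-1}(u) for the exponential CDF F(x) = 1 - exp(-rate*x):
    if u is uniform on [0,1), then F^{-1}(u) has distribution F."""
    return -math.log(1.0 - u) / rate

# Each component omega_n of a sample point in the space U is uniform
# on [0,1); transforming it produces an exponential observation.
samples = [exp_inverse_cdf(random.random(), rate=2.0) for _ in range(200000)]
print(sum(samples) / len(samples))  # should be near the mean 1/rate = 0.5
```

This is exactly how a simulation system turns uniform random numbers into draws from an arbitrary distribution F, continuous or not.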
One of the benefits of the infinite product space is the elegant manner in which these topics can be treated: we no longer need to talk about balls in urns, or other combinatorial or procedural structures, in order to define or understand such diverse topics as independence and stochastic convergence. Even better, such special models can be defined within the probability-space model in a natural way, a fact which will enable us to see clearly the sources of difficulty in a number of misinterpretations which have recently appeared in the literature.
1.5.3. Random Structures

Since random variables are functions on Ω, they may be defined in arbitrarily complex ways, including functional composition. Specifically, X(ω̄) may be based on an intermediate random structure. The structure is determined by the argument ω̄, and then the final value of X is determined by the structure. In this section, we will examine some random structures based on the space 𝒰 which arise in computer science problems.
The simplest such structure is a natural extension of the idea of a random variable. An ordered list of random variables is a random vector, or random point.⁴ Suppose that we were interested in the distances from the origin of points uniformly distributed in the unit cube. Our final random variable X might be defined as X(ω̄) = (ω₁² + ω₂² + ω₃²)^{1/2}. For this problem, the random vector is "hidden" by the fact that it is supposed to be uniformly distributed in the cube. More generally, if we were interested in points from a given multivariate distribution, then we would use X(ω̄) = (x₁² + x₂² + x₃²)^{1/2}, where (x₁, x₂, x₃) is a random vector determined by transforming some components of ω̄ to produce the desired distribution of points. Random vectors are useful in probabilistic models of problems from geometry, mathematical programming, polynomial arithmetic, etc.

4. This should not be confused with a sample point ω̄.
Another structure which frequently appears in computer science problems is the random permutation. We typically would like a model under which each of the n! possible permutations of n objects is equally likely. A random variable of possible interest is the number of comparisons required to sort n elements of a linearly ordered set, whose expected value we wish to compute under this probability model. It is usually easy to think in procedural terms when generating random structures (see Sedgewick [1977] for more about permutations). In this case, the permutation π defined by ω̄ can simply be that ordering of the integers {1, 2, …, n} for which ω_π(1) < ω_π(2) < ⋯ < ω_π(n).
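This procedural description translates directly into code; a minimal sketch:

```python
import random

def random_permutation(n):
    """The permutation pi defined by a sample point omega_bar: order
    {1,...,n} by increasing value of the uniform components
    omega_1,...,omega_n.  All n! orderings are equally likely, since
    ties among the components occur with probability zero."""
    omega = [random.random() for _ in range(n)]
    return sorted(range(1, n + 1), key=lambda i: omega[i - 1])

print(random_permutation(8))
```

Ranking uniform components is not how one would generate permutations in practice (a swap-based shuffle is cheaper), but it shows how the structure is a deterministic function of ω̄.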
The most complex of the random structures which we will encounter are random graphs (see Erdös and Rényi [1959] and Erdös and Spencer [1974]). A classical problem from random graph theory is the question of connectivity: What is the probability that a random labelled graph with n vertices and m edges is connected? We can define a random variable X of the 0-1 type (0 if the graph is not connected, 1 if it is connected), and find its expected value, assuming that every labelled graph with n vertices and m edges is equally likely.⁵ Again, a procedural description of X is easy. Consider the first n(n−1)/2 components of ω̄ to be numbered ω₁₂, ω₁₃, …, ω₁ₙ, ω₂₃, ω₂₄, …, and so forth, through ω_{n−1,n}. If y is the mth-smallest of these components, then we simply let edge (i,j) be present in the graph iff ωᵢⱼ ≤ y. Now X is 0 if the resulting graph is not connected and 1 if it is.

5. Other definitions of random graphs are possible. See Chapter 4.
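The procedural description of X above can be rendered as a short program; a sketch (the particular values of n and m in the driver are ours):

```python
import random
from itertools import combinations

def random_graph_connected(n, m):
    """Assign a uniform component omega_ij to each of the n(n-1)/2
    vertex pairs and keep the m pairs with the smallest components as
    edges, so that every labelled graph with n vertices and m edges is
    equally likely.  Return X: 1 if the graph is connected, else 0."""
    pairs = list(combinations(range(n), 2))
    omega = {e: random.random() for e in pairs}
    edges = sorted(pairs, key=omega.get)[:m]
    adj = {v: [] for v in range(n)}
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    seen, stack = {0}, [0]        # traverse from vertex 0
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return 1 if len(seen) == n else 0

# Estimate E(X) = P{connected} by averaging over many sample points.
trials = 2000
print(sum(random_graph_connected(8, 16) for _ in range(trials)) / trials)
```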
Slight extensions of ordinary random graphs are random directed graphs and random weighted graphs. The latter are especially useful in modeling certain mathematical programming problems, since the weights may be assigned to vertices, or edges, or both, and may have arbitrary distributions.
Up to this point, we have not made use of the fact that ω̄ is infinite-dimensional, since each random structure depends on only a finite number of components of ω̄. However, it is easy to use ω̄ to define sequences of random structures, and thereby sequences of random variables for which we can explore the properties of stochastic convergence. There are at least two different ways of defining such sequences, one of which "re-uses" the components of ω̄ to determine each structure, and the other of which "discards" the used components.⁶ The structures, and hence the random variables, described by the former method are not independent, whereas those defined by the latter method are independent. Random variables based on these two different sequences of random structures can exhibit different modes of stochastic convergence.

6. This difference, as far as I can determine, has been overlooked by virtually everyone using sequences of random structures, but is very important. Chapter 2 explains the ramifications of the distinction.
1.6. Conclusions and Further Work

Several statistical techniques are shown to be useful in the design and analysis of algorithms for computer science problems. The techniques illustrated here, however, are only the simplest used by statisticians. In the case of sampling, for instance, much more sophisticated schemes than the random sampling used in our algorithms can be devised. It remains to be seen which advanced statistical tools can be profitably applied to algorithm design and analysis.

In addition to this general observation, there are several other questions of varying degrees of importance which could be explored in an extension of this work or which remain as open problems. The following is a partial list which the reader may keep in mind as he continues through Chapters 2, 3, and 4.
(1) There are several problems for which it is known that finding an approximate solution with bounded relative error is NP-complete. Is there any problem for which it can be demonstrated that finding an approximate solution with bounded relative error almost surely is NP-complete, in some reasonable probabilistic model?

(2) Prove the missing companion to Theorem 2.9 and show that the result of Beardwood, et al. [1959] holds almost surely in the independent model. Such a proof would, according to Theorem 2.9(b), assure that Karp's [1976] original algorithm for the TSP succeeds strongly in the independent problem model.
(3) Extend the techniques of Section 3.3.4 from distributions of bounded random variables to a more general class of distributions. This looks somewhat easier than it probably is, because the present form of the algorithm relies on the fact that the expected number of items in any bin is bounded by a constant, and this might not be true if the minimum and maximum elements could wander off arbitrarily far.
(4) Develop a technique for proving lower bounds in a probabilistic setting. (The work of Yao [1977] seems important here.) As a starting point, prove that an on-line selection algorithm displaying the behavior described in Theorem 3.14 must use at least as much space as the procedure "median_es…".

(5) Show how to prove non-trivial lower bounds in a model of computation which includes the floor function.
(6) Perform some experiments to evaluate the technique proposed in Section 2.4.3 for λ-opt heuristics.
(7) Extend the results for geometry problems in the plane to higher dimensions. Some of them generalize very naturally, but others do not. In particular, the algorithm for constructing the Voronoi diagram does not seem to extend naturally to three or more dimensions and continue to run in linear expected time.
(8) Using Lemma 4.4 as a starting point, derive the three possible forms of limit distributions for extreme values of random variables. The classical approach to this problem (see David [1970] for a description) uses a very elegant argument to show what limiting distributions are possible, but does not make use of the result of Lemma 4.4. Because of its simplicity, this lemma seems like an appropriate candidate for the seed of an alternative proof.

(9) Find a companion to Theorem 4.8 which gives an almost sure lower bound on the value of the optimal solution.
(10) Exhibit an uncontrived algorithm for a real problem which can be improved by using the method of Section 4.2.2. In the event that one is found, suggest an extension to an existing programming language which permits the programmer to have control over the scheduling of parallel tasks.

(11) Test the conclusions of Section 4.2.3 for a real problem on a real multiprocessor. Preliminary measurements of an integer programming code running on C.mmp inspired the investigation of this problem in the first place, and tend to confirm the conclusions. However, each problem required such an enormous amount of computer time that no statistically significant results were ever obtained.
(12) Find the variance of the solution time for the randomized algorithms presented here or, even better, the distribution of solution times. The results of Sections 4.2.2 and 4.2.3 argue that knowing the distribution might be useful in ways which are not immediately evident.

(13) Give more examples of the uses of any of the statistical techniques suggested in this thesis.

(14) Suggest other algorithm design and analysis techniques based on statistical concepts.
A sequence of random variables {Xₙ} converges almost surely¹ to the random variable X (Xₙ →as X) iff P{ ω̄ : lim Xₙ(ω̄) = X(ω̄) } = 1. Similarly, the sequence {Xₙ} converges in probability² to X iff, for every ε > 0, lim P{ ω̄ : |Xₙ(ω̄) − X(ω̄)| < ε } = 1.

From both an intuitive and a practical standpoint, it is profitable to view stochastic convergence in terms of individual sample points ω̄. The sequence of random variables {Xₙ} converges almost surely to X if the set of sample points for which the sequence of real numbers {Xₙ(ω̄)} converges to X(ω̄) has probability one. It converges in probability if, for every ε > 0, Xₙ(ω̄) is within ε of X(ω̄) for all ω̄ in sets of sample points whose probabilities approach one as n → ∞.
In the latter case, it is possible that the sequence of reals {Xₙ(ω̄)} does not converge to X(ω̄) for any sample point ω̄. Consider a case where Xₙ are 0-1 random variables, and X is identically 0. If Xₙ →as X, then for each sample point ω̄ in a set of probability one, the sequence {Xₙ(ω̄)} can have only finitely many "ones". For Xₙ →pr X, however, "ones" may continue to appear occasionally (i.e., infinitely often) in every sequence {Xₙ(ω̄)}, although for most such sequences they cannot appear too frequently.
An example of a sequence which converges in probability but not almost surely is provided by Xₙ(ω̄) = 1 if ωₙ < 1/n, and Xₙ = 0 otherwise. It is clear that "ones" can appear infinitely often in this sequence, but it is not obvious that this actually happens for a set of sample points of positive measure. However, Lemma 2.2 shows that convergence of this sequence is not almost sure, so that the probability that the sequence does not converge is strictly positive. On the other hand, the probability of a "one" appearing in position n tends to 0 as n → ∞, so the sequence does converge in probability.

1. The analogous concept in real function theory uses the phrase "almost everywhere". Since we are dealing with probability, though, the terms "almost surely" and "with probability one" are preferable.

2. Again, this terminology is preferred to the phrase "in measure" for our purposes.
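A quick simulation shows both effects at once; this sketch truncates each sample point ω̄ to N components:

```python
import random

def ones_positions(N):
    """Indices n <= N with X_n(omega_bar) = 1, i.e. omega_n < 1/n,
    for one randomly drawn (truncated) sample point omega_bar."""
    return [n for n in range(1, N + 1) if random.random() < 1.0 / n]

# P{X_n = 1} = 1/n -> 0, which gives convergence in probability; yet
# "ones" keep appearing, since the expected number of them up to N is
# the harmonic sum 1 + 1/2 + ... + 1/N, which grows like ln N.
N = 100000
counts = [len(ones_positions(N)) for _ in range(20)]
print(sum(counts) / len(counts))  # typically near ln(N) + 0.577, about 12
```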
It is clear from these intuitive pictures that almost sure convergence implies convergence in probability, and this can be proved rigorously.

LEMMA 2.1 -
If Xₙ →as X then Xₙ →pr X.

Proof - See Chung [1974], page 66. □
It is easier in practice to prove convergence in probability, directly from its definition, than it is to prove almost sure convergence. Fortunately, there is an alternate characterization of almost sure convergence, provided by

LEMMA 2.2 - (Borel-Cantelli Lemma) -
If Σₙ P{ |Xₙ − X| > ε } is finite for every ε > 0, then Xₙ →as X. If Xₙ →as X and the {Xₙ} are independent, then Σₙ P{ |Xₙ − X| > ε } is finite for every ε > 0.

Proof - See Chung [1974], pages 76-78. □

From the definition of convergence in probability and the Borel-Cantelli Lemma, we can prove
LEMMA 2.3 -
Let {Xₙ} be independent.
(a) Xₙ →pr X iff, for every ε > 0, P{ |Xₙ − X| > ε } = o(1).
(b) Let log^(k)n be the kth iterated logarithm of n (that is, let log^(0)n = n and log^(k)n = log log^(k-1)n), and let f_k(n) = (∏_{i=0}^{k} log^(i)n)^{-1}. If Xₙ →as X, then P{ |Xₙ − X| > ε } = o(f_k(n)) for every ε > 0 and every k ≥ 0. If, for every ε > 0, P{ |Xₙ − X| > ε } = O(f_k(n)·(log^(k)n)^{-δ}) for any k ≥ 0 and any δ > 0, then Xₙ →as X.

Proof - Part (a) is just a restatement of the definition of convergence in probability. Part (b) follows from the Borel-Cantelli Lemma, as follows. Note first that Σₙ f_k(n) diverges for all k ≥ 0, since the partial sums satisfy Σ_{m≤n} f_k(m) = Ω(log^(k+1)n), which can be proved by comparing the sum to the integral ∫ f_k(x) dx. Thus, if Σₙ P{ |Xₙ − X| > ε } is to be finite, P{ |Xₙ − X| > ε } must approach zero faster than f_k(n) for every k ≥ 0. On the other hand, Σₙ f_k(n)·(log^(k)n)^{-δ} is finite for every k ≥ 0 and every δ > 0, which can also be proved by considering the corresponding integral (see Hardy [1924]). This demonstrates statement (b) of the lemma. □
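The two growth rates in part (b) can be checked numerically; a sketch for k = 1 and δ = 1, with base-2 logarithms as in the thesis's conventions:

```python
import math

def iter_log(n, k):
    """log^(k) n: the k-th iterated base-2 logarithm; log^(0) n = n."""
    x = float(n)
    for _ in range(k):
        x = math.log2(x)
    return x

def f(n, k):
    """f_k(n) = (n * log n * log log n * ... * log^(k) n)^{-1}."""
    prod = 1.0
    for i in range(k + 1):
        prod *= iter_log(n, i)
    return 1.0 / prod

# Partial sums of f_1(n) = 1/(n log n) keep growing (the series
# diverges like log log N), while damping by (log^(1) n)^{-delta}
# with delta = 1 yields a convergent series.
N = 200000
s_div = sum(f(n, 1) for n in range(4, N))
s_damped = sum(f(n, 1) / iter_log(n, 1) for n in range(4, N))
print(s_div, s_damped)
```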
Lemmas 2.1 and 2.3 clearly illustrate that Xₙ →as X is a (possibly strictly) stronger statement than Xₙ →pr X. Even so, it is not strong enough that we can always draw even the seemingly innocent conclusion that E(Xₙ) → E(X).
LEMMA 2.4 -
If {Xₙ} are uniformly bounded by a constant, then Xₙ →pr X implies E(|Xₙ − X|^p) → 0 for every p > 0. In general, however, even Xₙ →as X does not imply that E(|Xₙ − X|^p) → 0.

Proof - The first part of the lemma follows directly from Theorem 4.1.4 of Chung [1974]. An example of a case where {Xₙ} are not uniformly bounded can be constructed using the space 𝒰. Let Xₙ(ω̄) = 0 if ωₙ > 1/n², and Xₙ(ω̄) = 2ⁿ if ωₙ ≤ 1/n². Applying the Borel-Cantelli Lemma, we see that Xₙ →as 0, but E(|Xₙ|^p) = 2^{np}/n² → ∞ for all p > 0. □
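Both claims in the counterexample can be verified with exact arithmetic; a sketch:

```python
from fractions import Fraction

# Counterexample of Lemma 2.4: X_n = 2^n with probability 1/n^2, else 0.
# Borel-Cantelli: the sum of P{X_n != 0} = sum 1/n^2 is finite, so
# X_n -> 0 almost surely; nevertheless E(|X_n|^p) = 2^(n*p)/n^2 blows up.

def prob_nonzero(n):
    return Fraction(1, n * n)

def moment(n, p=1):
    """E(|X_n|^p) = 2^(n*p) * P{X_n = 2^n}, computed exactly."""
    return Fraction(2 ** (n * p)) * prob_nonzero(n)

partial = sum(prob_nonzero(n) for n in range(1, 100))
print(float(partial))                        # bounded above by pi^2/6 ~ 1.645
print([float(moment(n)) for n in (1, 5, 10, 15)])
```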
Another somewhat non-intuitive aspect of stochastic convergence is identified by the following lemma, which turns on the difference between identical and identically distributed random variables.

LEMMA 2.5 -
Let {Zₙ} be independent, and let h(n) be an integer-valued function which is non-decreasing and unbounded. Define {Xₙ} as a sequence of random variables with the property that Xₙ = Z_{h(n)}. Define {Yₙ} as a sequence of independent random variables with the property that Yₙ ~ Z_{h(n)}.
(a) If Zₙ →as Z then Xₙ →as Z.
(b) If Zₙ →as Z then Yₙ →pr Z, but it is not necessarily true that Yₙ →as Z.
Proof - We may assume without loss of generality that Z = 0 by considering the random variables Zₙ − Z, Xₙ − Z, and Yₙ − Z. To prove part (a), note that for each ω̄, {Zₙ(ω̄)} and {Xₙ(ω̄)} have a common subsequence, call it {Cₙ(ω̄)}. The sequence {Xₙ(ω̄)} contains only the terms {Cₙ(ω̄)}, possibly repeated, depending on h of course. In particular, we have Xₙ(ω̄) = C₁(ω̄) for 1 ≤ n ≤ m₁, where m₁ = min{ n : h(n) > h(1) }; similarly, Xₙ(ω̄) = C₂(ω̄) for m₁ < n ≤ m₂, where m₂ = min{ n : h(n) > h(m₁) }; and so forth.

Now if Zₙ(ω̄) → 0, then Cₙ(ω̄) → 0, since {Cₙ(ω̄)} is a subsequence of {Zₙ(ω̄)}. As we have seen above, Cₙ(ω̄) → 0 implies that Xₙ(ω̄) → 0. Therefore, Xₙ(ω̄) → 0 whenever Zₙ(ω̄) → 0, and the latter occurs for every ω̄ in some set of probability one. Thus, Xₙ →as 0.

Part (b) is somewhat different, since the only relationship between {Yₙ} and {Zₙ} is that P{ Yₙ ≤ x } = P{ Z_{h(n)} ≤ x }. Since {Zₙ} are independent, Lemma 2.3 applies, showing that P{ |Zₙ| > ε } → 0 for every ε > 0. Hence, P{ |Yₙ| > ε } → 0 since h(n) is non-decreasing and unbounded, proving that Yₙ →pr 0.

As a counterexample to the claim that Yₙ →as 0, use 𝒰 to define Zₙ(ω̄) = 0 if ωₙ > 2⁻ⁿ and Zₙ(ω̄) = 1 if ωₙ ≤ 2⁻ⁿ, and let h(n) = ⌊log n⌋ + 1. For all ε > 0 we have P{ |Zₙ| > ε } ≤ 2⁻ⁿ, so by the Borel-Cantelli Lemma, Zₙ →as 0. On the other hand, for 0 < ε < 1, P{ |Yₙ| > ε } ≥ 1/(2n). Again, by the Borel-Cantelli Lemma, {Yₙ} does not converge almost surely. □
In the computer science literature, convergence concepts have been tied to algorithm behavior in the sense of probabilistic approximation. Typically, a sample point ω̄ describes a sequence of random problems (based on random structures such as those introduced earlier), one of each size n for n ≥ 1. The random variable Xₙ is the error produced by the algorithm on a problem of size n; in particular, Xₙ(ω̄) is the error when the algorithm is applied to the problem of size n specified by ω̄. Under these circumstances, we will say that the algorithm succeeds strongly if Xₙ →as 0, and that it succeeds weakly if Xₙ →pr 0. The terms "strong" and "weak" are used by analogy to the strong and weak laws of large numbers.
The following two lemmas are helpful in proving stochastic success. The first is used when Xₙ is the absolute error, and the second when it is the relative error.

LEMMA 2.6 - (Absolute error lemma) -
(a) If Yₙ →pr a and Zₙ →pr b for constants a and b, then Yₙ + Zₙ →pr a + b.
(b) If Yₙ →as a and Zₙ →as b, then Yₙ + Zₙ →as a + b.

Proof - This is a standard result, a stronger version of which is proved in Chung [1974], but the proof is instructive because it demonstrates a common technique for proving such conclusions. For part (a), we must show that, for every fixed ε > 0, P{ |Yₙ + Zₙ − a − b| > ε } → 0. Now

P{ |Yₙ + Zₙ − a − b| > ε } ≤ P{ |Yₙ − a| + |Zₙ − b| > ε }
    ≤ P{ |Yₙ − a| > ε/2 or |Zₙ − b| > ε/2 }
    ≤ P{ |Yₙ − a| > ε/2 } + P{ |Zₙ − b| > ε/2 }

Since Yₙ →pr a and Zₙ →pr b, both terms on the right-hand side of the last inequality tend to zero as n → ∞, which proves part (a). Part (b) is proved in exactly the same way, the only difference being the insertion of summation signs before each of the probabilities. Each of the sums converges because Yₙ →as a and Zₙ →as b. □
LEMMA 2.7 - (Relative error lemma) -
(a) If Yₙ →pr c > 0 and Zₙ →pr c, then (Yₙ − Zₙ) / Yₙ →pr 0.
(b) If Yₙ →as c > 0 and Zₙ →as c, then (Yₙ − Zₙ) / Yₙ →as 0.

Proof - Again, the proof of part (b) follows exactly the same pattern as that for part (a), so we will prove only convergence in probability. We must show that P{ |(Yₙ − Zₙ) / Yₙ| > ε } → 0 for every fixed ε > 0. Now

P{ |(Yₙ − Zₙ) / Yₙ| > ε } = P{ |(Yₙ − c) − (Zₙ − c)| / |Yₙ| > ε }
    ≤ P{ |(Yₙ − c) / Yₙ| + |(Zₙ − c) / Yₙ| > ε }
    ≤ P{ |(Yₙ − c) / Yₙ| > ε/2 or |(Zₙ − c) / Yₙ| > ε/2 }
    ≤ P{ |(Yₙ − c) / Yₙ| > ε/2 } + P{ |(Zₙ − c) / Yₙ| > ε/2 }

To show that the probabilities on the right-hand side of the last inequality both tend to zero, we will concentrate on the first one. The proof for the other is similar. Let 0 < δ < c. Then

P{ |(Yₙ − c) / Yₙ| > ε/2 } = P{ |(Yₙ − c) / Yₙ| > ε/2 | |Yₙ − c| > δ } · P{ |Yₙ − c| > δ }
    + P{ |(Yₙ − c) / Yₙ| > ε/2 | |Yₙ − c| ≤ δ } · P{ |Yₙ − c| ≤ δ }
    ≤ P{ |Yₙ − c| > δ } + P{ |(Yₙ − c) / Yₙ| > ε/2 | |Yₙ − c| ≤ δ }
    ≤ P{ |Yₙ − c| > δ } + P{ |Yₙ − c| > ε(c − δ)/2 }

The last step follows because, given |Yₙ − c| ≤ δ, Yₙ is positive and is, in fact, no smaller than c − δ. Since both of these probabilities approach zero, the lemma is proved. □
As an example of an algorithm which succeeds in each of these modes using absolute error, consider the problem of finding the arithmetic average of n independent observations from N(0,1), a normal distribution with mean 0 and variance 1. Our probabilistic approximation algorithm will simply be to choose the first r(n) observations from the original n and average them. Assume that r(n) = o(n).

Using the space 𝒰, we may think of the components of a sample point ω̄ as being numbered in a triangular array, with n elements in row n. Row n then determines a sample³ of size n through a suitable transformation of the components ω_{n,1}, …, ω_{n,n}. Let Xₙ(ω̄) be the difference between the average of all n observations determined by row n, Yₙ(ω̄), and the average of the first r(n) observations, Zₙ(ω̄). It follows immediately from known properties of the normal distribution that Yₙ ~ N(0, n⁻¹) and Zₙ ~ N(0, r(n)⁻¹).

3. A sample is a group of observations; an observation is a specific value of a random variable. In this case, a problem of size n is a sample consisting of n observations which are to be averaged.
A minor complication arises in determining the distribution of Xₙ = Yₙ − Zₙ, since Yₙ and Zₙ are not independent. We may not conclude, therefore, that Xₙ ~ N(0, n⁻¹ + r(n)⁻¹), which would be the case if Yₙ and Zₙ were independent. This problem will come up again in many of the algorithms we would like to investigate. Fortunately, in this instance Xₙ can be written as a linear combination of Zₙ and the average of the remaining n − r(n) observations, which are independent. This leads to the conclusion that Xₙ ~ N(0, (1 − r(n)/n)²·[1/r(n) + 1/(n − r(n))]).

Thus P{ |Xₙ| > ε } = O(r(n)^{-1/2}·exp(−ε²r(n)/2)), so Xₙ →pr 0 whenever r(n) grows without bound. In this case, the algorithm succeeds weakly. If r(n) grows faster than log n then, by Lemma 2.3, Xₙ →as 0 and the algorithm succeeds strongly. It is in fact the case that these growth conditions are necessary for weak and strong success, respectively. The reason that r(n) must be unbounded for weak success is clear. It must grow faster than log n for strong success because Σₙ r(n)^{-1/2}·exp(−ε²r(n)/2) is finite for all ε > 0 iff that condition is met.
We could have proved the sufficiency of the conditions on r(n) more easily using Lemma 2.6, the absolute error lemma. Knowing the distributions of Yₙ and Zₙ, we can show that the conditions given above are sufficient for Zₙ →pr 0 and Zₙ →as 0, respectively. Since Yₙ →as 0 in any event, we may conclude that Xₙ →pr 0 if r(n) grows without bound, and that Xₙ →as 0 if r(n) grows faster than log n. What we gain by not having to worry about the independence of Yₙ and Zₙ is offset by the fact that we cannot say that the growth condition on r(n) is necessary. Furthermore, we know nothing about the distribution of Xₙ if we take the easy way out. Nevertheless, it is good to have a technique available for proving stochastic success which can easily deal with dependent random variables.
This example illustrates two prime objections to the characterization of probabilistic approximation algorithms by the "strength" of their success. In the first place, even strong success is only of theoretical value. We could let r(n) = 1 for n ≤ C and r(n) = log²n for n > C, and the above algorithm would succeed strongly for any C, even if C were larger than the size of any problem we might encounter in practice. For this reason, the approach does not enable us to conclude that an algorithm is provably practical.
Secondly, neither version of success provides any means of determining, even asymptotically, how well an algorithm works as a function of n. The only guidelines available are those provided by Lemma 2.3, which may be entirely too pessimistic. We therefore have no basis for comparison of two strongly successful algorithms, or even for one strongly successful and one weakly successful algorithm, unless the weakly successful one can be shown not to succeed strongly. Even then the weakly successful algorithm might be preferred in practice, since both characterizations are only asymptotic.
Notice that these are fundamentally different objections to the probabilistic approach than the usual ones: the assumptions here are not only reasonable but may actually be satisfied, yet a strongly successful algorithm can remain impractical. Similarly, the objections are stronger than the typical argument against asymptotic analysis in general, which is that the conclusions are not valid except for very large values of n. In contrast to the "big-O" notation, for example, there is no way to make any meaningful quantitative statement even about asymptotic behavior. These notions of probabilistic success are therefore too weak to allow us to draw any serious conclusions regarding the practical value of an algorithm, although they do permit comparisons among algorithms, as we will see in Section 2.2.
Nevertheless, due in part to the impetus which such analysis has already received from people in the algorithms area, these basically theoretical descriptions of probabilistic approximation algorithms are here to stay. It is important that a firm and comprehensible basis for their study be established while there are not too many different results which depend on one another in a complex fashion. The goal of the next section is to point out certain difficulties which can arise, and to introduce a random problem model which can serve as the foundation for further work. In Chapter 3 we will return to the objections to stochastic convergence concepts as descriptions of success, and present some alternatives which partially overcome these arguments.
2.2. Random Problem Models
There are at least two possible reasons for wanting to deal with random problems rather than with fixed ones. The first, examined in some detail by Vajda [1972] for mathematical programming problems, is that the problem data may be uncertain; that is, the data may be random perturbations of the true values of the parameters (due to measurement errors, for instance). Among the interesting questions in this scenario are such things as how to optimize the expected value of a linear program knowing only the joint distribution of the objective function coefficients, and not their actual values.
On the other hand, it may be that we would like to solve many instances of some type of problem, and have no prior knowledge of what these instances might be other than distributions of certain characteristics. Suppose that XYZ Company has thousands of customers throughout the United States from which it receives requests for service at random times, but has only one serviceman who is sent out every time 100 requests have accumulated. If XYZ wants to solve the resulting 100-city traveling salesman problems, company management might be interested in knowing how much extra travel cost would be incurred, on the average, by using some approximation algorithm rather than a (very costly!) exact algorithm.
Whatever the possible practical applications, random problems are considered by algorithm analysts in order to make probabilistic statements similar to those made by statisticians about statistical procedures. These statements include not only the expected resource consumption of an algorithm and the classical questions of computer science, but more recently have centered on the probability that an algorithm will achieve a specified small error. It is, of course, valuable to keep in mind the possible real-world applications of results, but in large part the models considered for analysis must be quite simple in order to be mathematically tractable. In this section, we examine problem models with respect to their suitability for discussion of stochastic success of algorithms.
A random problem is defined by (1) a random structure, such as a set of random points or a random graph; and (2) some function of the structure, such as the mean distance between the points or the chromatic number of the graph. In some cases, a "solution" to a problem may require not only the value of this function, but other information as well. For example, the solution to a traveling salesman problem demands an actual tour of the points and not just the length of the tour. The function value provides the basis for comparison between the exact solution and an approximation.
For dealing with stochastic success, we require sequences of random problems. A sequence can be generated in many ways from a sample point ω, but we will study only two.
The first possibility, which we call the incremental problem model, operates as follows. If R_n(ω) is the nth problem determined by ω, and R_n is fully specified by k(n) components of a sample point, then R_n(ω) depends on the components ω_1, ω_2, …, ω_{k(n)}. Specifically, R_n(ω) is generated "incrementally" from R_{n−1}(ω) by some process depending on ω_{k(n−1)+1}, …, ω_{k(n)}. If the underlying problem structure is a random vector, R_n(ω) might be an n-vector generated by appending one new coordinate to the (n−1)-vector R_{n−1}(ω). If it is a random graph, the incremental change might be the addition of a new vertex and some edges incident to it, with all the edges of the previous graph unchanged.
A second possibility is the independent problem model. Consider the sequence to be numbered as in a triangular array, with k(n) elements in row n, where k(n) is defined as above. In this case, R_n(ω) depends on the components ω_{n1}, ω_{n2}, …, ω_{n,k(n)}. This means that R_n is totally independent of R_{n−1}, being generated by all new defining components. If the underlying problem structure is a random vector, R_n(ω) is defined by creating all new coordinate values, none of which is necessarily the same as its counterpart in R_{n−1}(ω). Similarly, if it is a random graph, an entirely new graph is created.⁴
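The two generation schemes can be sketched in code. In this illustrative Python fragment (not from the thesis), the problem structure is a random vector with k(n) = n, so R_n gains one coordinate per step in the incremental model but is rebuilt from scratch in the independent model:

```python
import random

def incremental_problems(rng, n_max):
    """Incremental model: R_n extends R_{n-1} by one fresh component;
    all earlier components are reused unchanged."""
    coords = []
    for _ in range(n_max):
        coords.append(rng.random())
        yield list(coords)

def independent_problems(rng, n_max):
    """Independent model: R_n is built from k(n) = n entirely new
    components, as in a triangular array."""
    for n in range(1, n_max + 1):
        yield [rng.random() for _ in range(n)]

rng = random.Random(0)
inc = list(incremental_problems(rng, 5))
ind = list(independent_problems(rng, 5))

# In the incremental model R_5 begins with the four coordinates of R_4;
# in the independent model the two problems share nothing.
print(inc[4][:4] == inc[3], ind[4][:4] == ind[3])
```

The shared prefix in the incremental model is exactly what makes the error sequence {X_n} dependent across n, while the independent model yields independent errors {Y_n}.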
With these two problem models, and two possible modes of stochastic success of an algorithm, there are four cases to consider. The following theorem describes the relationships between them.
THEOREM 2.8 -
(a) If an algorithm succeeds strongly in either model, then it succeeds weakly in that model, but not necessarily vice versa.
(b) If an algorithm succeeds weakly in one model, then it succeeds weakly in the other model.
(c) If an algorithm succeeds strongly in the independent problem model, then it succeeds strongly in the incremental model, but not necessarily vice versa.
Proof - Figure 2.1 shows schematically how the four possibilities are related. Throughout the proof, let X_n(ω) = f(ω_1, ω_2, …, ω_{k(n)}) be the error of the algorithm in the incremental problem model, and let Y_n(ω) = f(ω_{n1}, ω_{n2}, …, ω_{n,k(n)}) be the error in the independent problem model. The difference between X_n and Y_n is that

4. […] between successive problems, but not strictly incremental generation. Such models have apparently not been used in the literature and appear to be of no value in this context.
[Figure: boxes for strong success in the independent model and strong success in the incremental model, with implication arrows to weak success in each model.]

FIGURE 2.1 - Problem models and stochastic convergence.
they are the same function of different components of ω, so that the random variables {X_n} are not independent, whereas {Y_n} are independent.

Part (a) of the theorem is clear, since X_n →_as 0 implies that X_n →_pr 0, and similarly for {Y_n}. It is not the case that X_n →_pr 0 implies that X_n →_as 0, which is illustrated by the following example. Suppose that k(n) = n and that X_n(ω) = 0 if ω_{k(n)} > 1/n, and X_n(ω) = 1 if ω_{k(n)} ≤ 1/n. Although X_n →_pr 0, the Borel-Cantelli Lemma shows that the convergence is not almost sure. The same example suffices for {Y_n}, proving part (a).⁵
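A small simulation makes this example concrete. The Python sketch below (illustrative, not from the thesis; the horizon N and seed are arbitrary) draws a fresh uniform component for each n and records the indices where X_n = 1. The hit probability 1/n tends to zero, yet the probabilities sum to about log N, so hits keep arriving at ever-wider spacings rather than stopping, as the Borel-Cantelli argument predicts:

```python
import random

rng = random.Random(7)
N = 200000

# X_n = 1 iff the fresh component for problem n falls below 1/n.
hits = [n for n in range(1, N + 1) if rng.random() <= 1.0 / n]

# P{X_n = 1} = 1/n -> 0, so X_n ->_pr 0; but sum(1/n) diverges, so for
# independent components X_n = 1 occurs infinitely often almost surely:
# the hit indices thin out but never cease.
print(len(hits), hits[:5])
```

Only on the order of log N ≈ 12 hits appear over 200000 problems, but extending N always produces more.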
The proof of part (b) is a direct consequence of the fact that X_n ~ Y_n, which is

5. The reader may note that it is difficult to imagine a useful algorithm with an error that depends in this way on the problem being solved. Nevertheless, such an error function is possible, which is all that is required.
true because both X_n and Y_n are obtained by applying the same function to arguments which are identically distributed. Thus, although X_n and Y_n are not equal, depending as they do on different components of a sample point, P{ X_n ≤ x } = P{ Y_n ≤ x }. If one of the quantities P{ |X_n| > ε } or P{ |Y_n| > ε } tends to zero, then so does the other one, since they are equal. This demonstrates part (b) of the theorem.
By far the most interesting point of the theorem is part (c), which shows the difference between the independent problem model and the incremental problem model. The implication is an easy consequence of the Borel-Cantelli Lemma and the fact that X_n ~ Y_n. Since {Y_n} are independent and Y_n →_as 0, Σ_n P{ |Y_n| > ε } is finite for every ε > 0. Therefore, Σ_n P{ |X_n| > ε } is finite for all ε > 0, which means that X_n →_as 0, even though {X_n} are not independent.

This argument fails to prove the implication in the other direction, since the lack of independence of {X_n} prevents us from concluding anything about the convergence of Σ_n P{ |X_n| > ε }. However, a very instructive example shows that the reverse implication is sometimes false. This will be the key to exposing the underlying problem which has not been noticed so far in the literature on probabilistic approximation algorithms.
Suppose that X_n(ω) = k(n)⁻¹ Σ_{i=1}^{k(n)} Z_i, where Z_i ~ N(0, 1) is determined by ω_i, and k(n) is non-decreasing and unbounded. Similarly, let Y_n(ω) = k(n)⁻¹ Σ_{i=1}^{k(n)} Z_{ni}, where Z_{ni} ~ N(0, 1) is determined by ω_{ni}.⁶

According to the strong law of large numbers (see, for example, Chung [1974], Theorem 5.4.2), the sequence S_n = n⁻¹ Σ_{i=1}^{n} Z_i →_as 0. Since X_n = S_{k(n)}, by Lemma 2.5, S_n →_as 0 implies that X_n →_as 0. On the other hand, {Y_n} are independent, with Y_n ~ S_{k(n)}. Lemma 2.5 says that {Y_n} may not converge almost surely, which is in fact the case for this particular sequence. Because Y_n ~ N(0, k(n)⁻¹), we know that Σ_n P{ |Y_n| > ε } is finite for all ε > 0 iff k(n) grows faster than log n. The choice k(n) = ⌊log n⌋ + 1, for example, will result in X_n →_as 0, but {Y_n} will not converge almost surely. This example completes the proof of part (c). □
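The contrast in this example can be observed numerically. In the illustrative Python sketch below (not from the thesis; the seed and horizon N are arbitrary), X_n reuses one growing stream of N(0, 1) draws, so it is a partial mean S_{k(n)}, while Y_n averages k(n) = ⌊log n⌋ + 1 brand-new draws at every n:

```python
import math
import random

rng = random.Random(3)

def k(n):
    """Slowly growing row size: k(n) = floor(log n) + 1."""
    return int(math.log(n)) + 1

N = 20000
stream = []   # one shared i.i.d. N(0,1) stream for the incremental model
xs, ys = [], []
for n in range(2, N + 1):
    while len(stream) < k(n):
        stream.append(rng.gauss(0.0, 1.0))
    xs.append(sum(stream[:k(n)]) / k(n))  # X_n = S_{k(n)}, a partial mean
    ys.append(sum(rng.gauss(0.0, 1.0) for _ in range(k(n))) / k(n))

# X_n changes value only when k(n) increments, so it takes very few
# distinct values and inherits the SLLN's almost sure convergence.
# Y_n is resampled from scratch at each n and keeps fluctuating.
print(len(set(xs)), len(set(ys)))
```

Over 20000 problems, X_n takes only about ten distinct values, while Y_n produces thousands of independent fluctuations, which is the mechanism behind the failure of almost sure convergence in the independent model.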
The proof of the last part of Theorem 2.8 makes an important observation about the strong law of large numbers. At first glance, it may seem slightly amazing that under the simple assumption that {Z_i} are i.i.d. and have a finite mean μ, S_n = k(n)⁻¹ Σ_{i=1}^{k(n)} Z_i →_as μ whenever k(n) is non-decreasing and unbounded. Closer inspection reveals that because {S_n} are partial sums, if the terms of a particular sequence {S_n(ω)} ever get close to zero, they are not likely to deviate greatly from zero thereafter. For the usual case k(n) = n, this is true because S_n = ((n−1)/n) S_{n−1} + n⁻¹ Z_n, so that as n → ∞, the contribution of the "incremental" summand Z_n gets rapidly smaller. In this sense, then, it is less of a surprise that S_n →_as 0, because the random variables {S_n} are defined in the incremental model.
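The recurrence S_n = ((n−1)/n) S_{n−1} + n⁻¹ Z_n can be checked directly. The following Python sketch (illustrative) confirms that the incremental update reproduces the batch mean exactly, with each new summand entering at the shrinking weight 1/n:

```python
import random

rng = random.Random(5)
zs = [rng.gauss(0.0, 1.0) for _ in range(1000)]

# Running mean via S_n = ((n - 1)/n) * S_{n-1} + Z_n / n: the new
# summand Z_n enters with weight 1/n, so its influence fades as n grows.
s = 0.0
for n, z in enumerate(zs, start=1):
    s = (n - 1) / n * s + z / n

batch = sum(zs) / len(zs)   # the ordinary batch mean n^{-1} sum Z_i
print(s, batch)
```

The two printed values agree up to floating-point rounding, which is why a partial-mean path that wanders near zero tends to stay there.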
6. John Lehoczky suggested that the proof be simplified by letting Z_i and Z_{ni} be

2. Stochastic Convergence and Probabilistic Algorithms
Three types of stochastic convergence of random variables were defined in Chapter 1. Two of these, almost sure convergence and convergence in probability, are important in describing the (stochastic) success of probabilistic approximation algorithms. By a probabilistic approximation algorithm we mean an algorithm which usually, but not always, produces the exact solution or a good approximate answer to a problem. We might characterize such an algorithm, and our knowledge of it, by dividing the class into three categories: algorithms which usually get the exact answer, those which usually get within some error criterion ε, and those which have a certain error distribution F_n. Most of the algorithms we will consider are of the second type, although it is desirable but only occasionally possible to find the error distribution.

The convergence concepts themselves are first examined in some detail, after which we discuss their applications to specific probabilistic approximation algorithms such as those described by Karp [1976].

2.1. Stochastic Convergence

Although convergence in distribution is used in some proofs we will need, the other two modes of stochastic convergence are of more immediate interest. Recall that
An extension of the proof given here, using the central limit theorem, leads to a version of the strong law of large numbers for the independent problem model. The only difference is that two added conditions (that Z_{ni} have finite variance, and that k(n) grow faster than log n) are used to prove almost sure convergence of k(n)⁻¹ Σ_{i=1}^{k(n)} Z_{ni}.
2.3. History of Confusion
Theorem 2.8, part (c), explains the difference between the incremental problem model and the independent problem model: strong success, or generally almost sure convergence, in the independent model implies the same in the incremental model, but not vice versa. In this section, we will review the use of such phrases as "almost surely" and "almost everywhere" in the relevant papers, and show that there is considerable confusion not only between problem models, but even between almost sure convergence and convergence in probability.

The fact that the difference between the problem models has been overlooked by computer scientists who are applying probability theory is understandable, since it has apparently not been explicitly pointed out in these terms before. Nevertheless, the essence of the difference is recognized by standard probability theory texts in the framework of the Lindeberg-Feller version of the central limit theorem (see Chung [1974], pages 196-214), where "triangular arrays" of random variables are introduced. In weaker versions of the central limit theorem (Chung [1974], page 169) the partial sums S_n of the previous section are used exclusively. While this difference is noted in passing, it is not emphasized.
In a very difficult paper, Beardwood, Halton, and Hammersley [1959] prove that