Algorithms - Rodney R Howell
A Top-Down Approach
Algorithms A Top-Down Approach
RODNEY R HOWELL
Kansas State University, USA
World Scientific
NEW JERSEY • LONDON • SINGAPORE • BEIJING • SHANGHAI • HONG KONG • TAIPEI • CHENNAI • TOKYO
Published by
World Scientific Publishing Co. Pte. Ltd.
5 Toh Tuck Link, Singapore 596224
USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
ALGORITHMS
A Top-Down Approach
Copyright © 2023 by World Scientific Publishing Co. Pte. Ltd.
All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or
mechanical, including photocopying, recording or any information storage and retrieval system now known or to
be invented, without written permission from the publisher.
For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center,
Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from
the publisher.
Printed in Singapore
Preface
Prerequisite Material
This book is intended to be reasonably self-contained. However, a certain
degree of maturity is assumed regarding the audience. Readers are expected
to have enough experience in writing programs so as to be able to understand
algorithms presented in a pseudo language. Experience with basic data
structures, including stacks, queues, lists, and trees, will be helpful, as will
experience in manipulating finite and infinite sums, solving recurrences, and
performing combinatorial analyses. Though calculus and number theory are
used occasionally, background in these fields of study is not assumed.
Organization
The outline of this book is as follows:
Exercises
Each chapter includes a section of exercises intended to reinforce the chapter
material. Some of the exercises are more challenging, and are therefore
marked with “∗” or “∗∗” to indicate their level of difficulty. Exercises marked
“∗” will be challenging for most undergraduates, and exercises marked “∗∗”
will be challenging for most graduate students.
https://www.worldscientific.com/worldscibooks/10.1142/13069
Contents
Preface v
I Fundamentals 1
1. Introduction 3
1.1 Specifications . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Proving Algorithm Correctness . . . . . . . . . . . . . . . . 8
1.4 Algorithm Analysis . . . . . . . . . . . . . . . . . . . . . . 9
1.5 Data Structures . . . . . . . . . . . . . . . . . . . . . . . . 12
1.6 A Case Study: Maximum Subsequence Sum . . . . . . . . . 13
1.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.9 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3. Analyzing Algorithms 55
3.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.2 Big-O Notation . . . . . . . . . . . . . . . . . . . . . . . . 57
3.3 Big-Ω and Big-Θ . . . . . . . . . . . . . . . . . . . . . . . . 61
3.4 Operations on Sets . . . . . . . . . . . . . . . . . . . . . . . 66
3.5 Smooth Functions and Summations . . . . . . . . . . . . . 68
3.6 Analyzing While Loops . . . . . . . . . . . . . . . . . . . . 72
3.7 Analyzing Recursion . . . . . . . . . . . . . . . . . . . . . . 73
3.8 Analyzing Space Usage . . . . . . . . . . . . . . . . . . . . 81
3.9 Multiple Variables . . . . . . . . . . . . . . . . . . . . . . . 82
3.10 Little-o and Little-ω . . . . . . . . . . . . . . . . . . . . . . 89
3.11 * Use of Limits in Asymptotic Analysis . . . . . . . . . . . 92
3.12 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
3.13 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
3.14 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
9. Graphs 309
9.1 Universal Sink Detection . . . . . . . . . . . . . . . . . . . 312
9.2 Topological Sort . . . . . . . . . . . . . . . . . . . . . . . . 313
9.3 Adjacency Matrix Implementation . . . . . . . . . . . . . . 315
16. NP-Completeness 503
16.1 Boolean Satisfiability . . . . . . . . . . . . . . . . . . . . . 503
16.2 The Set P . . . . . . . . . . . . . . . . . . . . . . . . . . . 505
16.3 The Set N P . . . . . . . . . . . . . . . . . . . . . . . . . . 508
16.4 Restricted Satisfiability Problems . . . . . . . . . . . . . . 512
16.5 Vertex Cover and Independent Set . . . . . . . . . . . . . . 518
Bibliography 581
Index 587
List of Symbols
Fundamentals
Chapter 1
Introduction
1.1 Specifications
Before we can design or analyze any software component — including an
algorithm or a data structure — we must first know what it is supposed
to accomplish. A formal statement of what a software component is meant
to accomplish is called a specification. Here, we will discuss specifically the
specification of an algorithm. The specification of a data structure is similar,
but a bit more involved. For this reason, we will wait until Chapter 4 to
discuss the specification of data structures in detail.
Suppose, for example, that we wish to find the kth smallest element
of an array of n numbers. Thus, if k = 1, we are looking for the smallest
element in the array, or if k = n, we are looking for the largest. We will refer
to this problem as the selection problem. Even for such a seemingly simple
problem, there is a potential for ambiguity if we are not careful to state the
problem precisely. For example, what do we expect from the algorithm if k
because there are exactly k distinct values less than or equal to 10. However,
if we were to adopt this definition, there would be no kth smallest element
for k = 6. Because A is sorted, it would be better to conclude that the
kth smallest is A[k] = 9. Note that all elements strictly less than 9 are
in A[1..4]; i.e., there are strictly fewer than k elements less than the kth
smallest. Furthermore, rearranging A would not change this fact. Likewise,
observe that all the elements in A[1..5] are less than or equal to 9; i.e.,
there are at least k elements less than or equal to the kth smallest. Again,
rearranging A would not change this fact.
The above example suggests that the proper definition of the kth smallest
element of an array A[1..n] is the value x such that
• there are fewer than k elements A[i] < x and
• there are at least k elements A[i] ≤ x.
It is possible to show, though we will not do so here, that for any array A[1..n]
and any positive integer k ≤ n, there is exactly one value x satisfying both
of the above conditions. We will therefore adopt this definition of the kth
smallest element.
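To make the two conditions concrete, here is a small Java check of the definition (my own illustration, not part of the text; the method name, 0-based indexing, and int element type are choices of this sketch):

```java
public class KthSmallestCheck {
    // Returns true if x is the kth smallest element of A under the definition
    // above: fewer than k elements are strictly less than x, and at least
    // k elements are less than or equal to x.
    static boolean isKthSmallest(int[] A, int k, int x) {
        int less = 0, lessOrEqual = 0;
        for (int a : A) {
            if (a < x) less++;
            if (a <= x) lessOrEqual++;
        }
        return less < k && lessOrEqual >= k;
    }

    public static void main(String[] args) {
        int[] A = {5, 9, 10, 7, 9, 12};
        System.out.println(isKthSmallest(A, 4, 9));  // true: the 4th smallest is 9
        System.out.println(isKthSmallest(A, 4, 10)); // false
    }
}
```

For any array and any positive k ≤ n, exactly one value makes this check return true, which is what justifies speaking of "the" kth smallest element.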
The complete specification for the selection problem is shown in
Figure 1.1. To express the data types of n and the elements of A, we use Nat
to denote the natural number type and Number to denote the number type.
Note that Nat is a subtype of Number — every Nat is also a Number. In
order to place fewer constraints on the algorithm, we have included in the
postcondition a statement that the elements of A[1..n] may be permuted (i.e.,
rearranged). In order for a specification to be precise, the postcondition must
state when side-effects such as this may occur. In order to keep specifications
from becoming overly wordy, we will adopt the convention that no values
may be changed unless the postcondition explicitly allows the change.
1.2 Algorithms
Once we have a specification, we need to produce an algorithm to implement
that specification. This algorithm is a precise statement of the computational
steps taken to produce the results required by the specification. An algorithm
differs from a program in that it is usually not specified in a programming
language. In this book, we describe algorithms using a notation that is precise
enough to be implemented as a programming language, but which is designed
to be read by humans.
A straightforward approach to solving the selection problem is as follows:
If we already know how to sort, then we have solved our problem; otherwise,
we must come up with a sorting algorithm. By using sorting to solve the
selection problem, we say that we have reduced the selection problem to the
sorting problem.
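As an illustration of this reduction (my own Java sketch; the book's SimpleSelect pseudocode is in Figure 1.2, not reproduced here), we can sort a copy of the array and read off position k. The library sort stands in for the still-unimplemented Sort specification; copying the array is a convenience of the sketch, since the specification itself allows A to be permuted.

```java
import java.util.Arrays;

public class SimpleSelectSketch {
    // Returns the kth smallest element of A (1 <= k <= A.length)
    // by reducing the selection problem to the sorting problem.
    static int select(int[] A, int k) {
        int[] copy = Arrays.copyOf(A, A.length); // leave the caller's array unchanged
        Arrays.sort(copy);                       // stands in for the specified Sort
        return copy[k - 1];                      // kth smallest sits at index k-1 (0-based)
    }

    public static void main(String[] args) {
        System.out.println(select(new int[]{5, 9, 10, 7, 9, 12}, 4)); // 9
    }
}
```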
Solving a problem by reducing it to one or more simpler problems is the
essence of the top-down approach to designing algorithms. One advantage to
this approach is that it allows us to abstract away certain details so that we can
focus on the main steps of the algorithm. In this case, we have a selection
algorithm, but our algorithm requires a sorting algorithm before it can be fully
implemented. This abstraction facilitates understanding of the algorithm at a
high level. Specifically, if we know what is accomplished by sorting — but not
necessarily how it is accomplished — then because the selection algorithm
consists of very little else, we can readily understand what it does.
When we reduce the selection problem to the sorting problem, we need
a specification for sorting as well. For this problem, the precondition will be
that A[1..n] is an array of Numbers, where n ∈ N. Our postcondition will be
that A[1..n] is a permutation of its initial values such that for 1 ≤ i < j ≤ n,
A[i] ≤ A[j] — i.e., that A[1..n] contains its initial values in nondecreasing
order. Our selection algorithm is given in Figure 1.2. Note that Sort is only
specified — its algorithm is not provided.
Let us now refine the SimpleSelect algorithm of Figure 1.2 by designing
a sorting algorithm. We will reduce the sorting problem to the problem of
inserting an element into a sorted array. In order to complete the reduction,
we need to have a sorted array in which to insert. We have thus returned to
our original problem. We can break this circularity, however, by using the
top-down approach in a different way. Specifically, we reduce larger instances
Figure 1.3 An algorithm implementing the specification of Sort, given in Figure 1.2
formally justify the reasoning used above — namely, that we can assume
that a recursive call meets its specification, provided its input is of a smaller
size than that of the original call, where the size is some natural number.
The ability to prove algorithm correctness is quite possibly the most
underrated skill in the entire discipline of computing. First, knowing how
to prove algorithm correctness also helps us in the design of algorithms.
Specifically, once we understand the mechanics of correctness proofs, we can
design the algorithm with a proof of correctness in mind. This approach
makes designing correct algorithms much easier. Second, the exercise of
working through a correctness proof — or even sketching such a proof —
often uncovers subtle errors that would be difficult to find with testing alone.
Third, this ability brings with it a capacity to understand specific algorithms
on a much deeper level. Thus, the ability to prove algorithm correctness is
a powerful tool for designing and understanding algorithms. Because these
activities are closely related to programming, this ability greatly enhances
programming abilities as well.
The proof techniques we will introduce fit nicely with the top-down
approach to algorithm design. As a result, the top-down approach itself
becomes an even more powerful tool for designing and understanding
algorithms.
variable x). Once the condition indicating the base case is true, the loop
terminates, the base case is executed, and the algorithm terminates.
Figure 1.6 shows the result of eliminating the tail recursion from
RecursiveInsert. Because RecursiveInsert is structured in a slightly
different way from what is shown in Figure 1.5, we could not apply this
translation verbatim. The tail recursion occurs, not in the else part, but
in the if part, of the if statement in RecursiveInsert. For this reason,
we did not negate the condition when forming the while loop. The base
case is then the empty else part of the if statement. Because there is no
code in the base case, there is nothing to place after the loop. In order to
avoid changing the value of n, we have copied its value to j, and used j in
place of n throughout the algorithm. Furthermore, because the meaning of
the statement “A[1..j] ← A[1..j − 1]” is not clear, we instead simulated the
recursion by the statement “j ← j − 1”.
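The following Java sketch (my own 0-based rendering, not the book's Figure 1.6) shows the shape of the resulting loop: j takes the role of n, the loop condition is the un-negated recursive case, and the empty base case means that nothing needs to follow the loop.

```java
public class IterativeInsertSketch {
    // Inserts A[n-1] into the already-sorted prefix A[0..n-2] by repeated
    // swaps -- the loop obtained by eliminating the tail recursion.
    static void insert(int[] A, int n) {
        int j = n - 1;                      // j plays the role of n in the recursion
        while (j > 0 && A[j] < A[j - 1]) {  // the (un-negated) recursive case
            int tmp = A[j];                 // swap A[j] and A[j-1]
            A[j] = A[j - 1];
            A[j - 1] = tmp;
            j = j - 1;                      // simulates the recursive call on A[0..j-1]
        }
        // Empty base case: nothing to do after the loop.
    }

    public static void main(String[] args) {
        int[] A = {2, 5, 8, 11, 4};         // A[0..3] sorted, A[4] to be inserted
        insert(A, A.length);
        System.out.println(java.util.Arrays.toString(A)); // [2, 4, 5, 8, 11]
    }
}
```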
Note that when i = j, the sum has a beginning index of i and an ending index of i − 1. By convention, we always write summations so that the index (k in this case) increases from its initial value (i) to its final value (j − 1). As a result of this convention, whenever the final value is less than the initial value, the summation contains no elements. Again by convention, such an empty summation is defined to have a value of 0. (Similar conventions hold for products, except that an empty product is assumed to have a value of 1.) Thus, in the above definition, we are including the empty sequence and assuming
Figure 1.8 The subsequence with maximum sum may begin and end anywhere in
the array, but must be contiguous
its sum is 0. The specification for this problem is given in Figure 1.9. Note
that according to this specification, the values in A[0..n − 1] may not be
modified.
Example 1.1. Suppose A[0..5] = −1, 3, −2, 7, −9, 7. Then the subse-
quence A[1..3] = 3, −2, 7 has a sum of 8. By exhaustively checking all other
contiguous subsequences, we can verify that this is, in fact, the maximum.
For example, the subsequence A[1..5] has a sum of 6.
Example 1.2. Suppose A[0..3] = −3, −4, −1, −5. Then all nonempty
subsequences have negative sums. However, any empty subsequence (e.g.,
A[0..−1]) by definition has a sum of 0. The maximum subsequence sum of
this array is therefore 0.
We can easily obtain an algorithm for this problem by translating the
definition of a maximum subsequence sum directly into an iterative solution.
The result is shown in Figure 1.10. By applying the analysis techniques
of Chapter 3, it can be shown that the running time of this algorithm is
proportional to n^3, where n is the size of the array.
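A direct Java transcription of this idea (my own sketch of what MaxSumIter in Figure 1.10 computes, not the book's pseudocode) tries every pair of endpoints and sums each range from scratch; the three nested loops are the source of the cubic running time.

```java
public class MaxSumIterSketch {
    // Maximum subsequence sum: try every contiguous range A[lo..hi-1],
    // including the empty range, and sum it element by element.
    static int maxSumIter(int[] A) {
        int n = A.length;
        int best = 0;                          // the empty subsequence has sum 0
        for (int lo = 0; lo <= n; lo++)        // start index
            for (int hi = lo; hi <= n; hi++) { // one past the last index
                int sum = 0;
                for (int k = lo; k < hi; k++)  // innermost loop: sum the range
                    sum += A[k];
                best = Math.max(best, sum);
            }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(maxSumIter(new int[]{-1, 3, -2, 7, -9, 7})); // 8
        System.out.println(maxSumIter(new int[]{-3, -4, -1, -5}));      // 0
    }
}
```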
In order to illustrate the practical ramifications of this analysis, we
implemented this algorithm in the JavaTM programming language and ran
it on a personal computer using randomly generated data sets of size 2^k for
Figure 1.10 A simple algorithm implementing the specification given in Figure 1.9
Figure 1.11 An algorithm for maximum subsequence sum (specified in Figure 1.9),
optimized by removing an unnecessary loop from MaxSumIter (Figure 1.10)
In other words, the maximum suffix sum is the maximum sum that we
can obtain by starting at any index i, where 0 ≤ i ≤ n, and adding together
all elements from index i up to index n − 1. (Note that by taking i = n, we
include the empty sequence in this maximum.) We then have a top-down
solution for computing the maximum subsequence sum:
Figure 1.12 The suffix with maximum sum may begin anywhere in the array, but
must end at the end of the array
$$s_n = \begin{cases} 0 & \text{if } n = 0 \\ \max(s_{n-1}, t_n) & \text{if } n > 0, \end{cases} \qquad (1.1)$$
Using (1.1) and (1.2), we obtain the recursive solution given in Figure 1.13.
Note that we have combined the algorithm for MaxSuffixTD with its
specification.
Unfortunately, an analysis of this algorithm shows that it also has a running time proportional to n^2. What is worse, however, is that an analysis of its stack usage reveals that it is linear in n. Indeed, the program implementing this algorithm threw a StackOverflowError on an input of size 2^15.
While these results are disappointing, we at least have some techniques
for improving the stack usage. Note that in both MaxSumTD and
MaxSuffixTD, the recursive calls don’t depend on any of the rest of the
computation; hence, we should be able to implement both algorithms in a
bottom-up fashion in order to remove the recursion. Furthermore, we can
simplify the resulting code with the realization that once ti is computed
(using (1.2)), we can immediately compute si using (1.1). Thus, we can
compute both values within a single loop. The result is shown in Figure 1.14.
Because this algorithm uses no recursion, its stack usage is fixed (i.e., it
does not grow as n increases). Furthermore, an analysis of its running time shows that it is in Θ(n).
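Here is a Java sketch of that single-loop computation (my own rendering of what Figure 1.14 describes). It uses recurrence (1.1) as given above together with (1.2), which is not reproduced in this excerpt; the sketch assumes the usual form t_i = max(0, t_{i−1} + A[i − 1]).

```java
public class MaxSumBUSketch {
    // Bottom-up maximum subsequence sum. After i iterations, msuf is the
    // maximum suffix sum of A[0..i-1] (assumed recurrence (1.2)) and m is the
    // maximum subsequence sum of A[0..i-1] (recurrence (1.1)).
    static int maxSumBU(int[] A) {
        int m = 0, msuf = 0;                     // both are 0 for the empty prefix
        for (int i = 1; i <= A.length; i++) {
            msuf = Math.max(0, msuf + A[i - 1]); // t_i from t_{i-1}
            m = Math.max(m, msuf);               // s_i = max(s_{i-1}, t_i)
        }
        return m;
    }

    public static void main(String[] args) {
        System.out.println(maxSumBU(new int[]{-1, 3, -2, 7, -9, 7})); // 8
    }
}
```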
1.7 Summary
The study of algorithms encompasses several facets. First, before an algo-
rithm or data structure can be considered, a specification of the requirements
must be made. Having a specification, we can then design the algorithm or
1.8 Exercises
Exercise 1.1. We wish to design an algorithm that takes an array A[0..n−1]
of numbers in nondecreasing order and a number x, and returns the location
of the first occurrence of x in A[0..n − 1], or the location at which x could
be inserted without violating the ordering if x does not occur in the array.
Give a formal specification for this problem. The algorithm shown in Figure
1.16 should meet your specification.
Exercise 1.2. Give an iterative algorithm that results from removing the
tail recursion from the algorithm shown in Figure 1.16. Your algorithm
should meet the specification described in Exercise 1.1.
Exercise 1.3. Figure 1.17 gives a recursive algorithm for computing the
dot product of two vectors, represented as arrays. Give a bottom-up
implementation of this algorithm.
(a) Give a formal specification for this problem. Use Char to denote the
data type for a character.
(b) Using the top-down approach, give an algorithm to solve this problem.
Your algorithm should contain a single recursive call.
(c) Give an iterative version of your algorithm from part (b) by either
implementing it bottom-up or eliminating tail recursion, whichever
is appropriate.
(a) Using the top-down approach, give an algorithm for FindMax. Note
that according to the specification, your algorithm may not change
the values in A. Your algorithm should contain exactly one recursive
call.
(b) Give an iterative version of your algorithm from part (a) by either
implementing it bottom-up or eliminating tail recursion, whichever
is appropriate.
* (c) Show how to reduce the sorting problem to FindMax (as specified
in part (a)) and a smaller instance of sorting. Use the technique of
1.9 Notes
The elements of top-down software design were introduced by Dijkstra [28]
and Wirth [122] in the late 1960s and early 1970s. As software systems
grew, however, these notions were found to be insufficient to cope with the
sheer size of large projects. As a result, they were eventually superseded by
object-oriented design and programming. The study of algorithms, however,
does not focus on large software systems, but on small components. Conse-
quently, a top-down approach provides an ideal framework for designing and
understanding algorithms.
The maximum subsequence sum problem, as well as the algorithms
MaxSumIter, MaxSumOpt, and MaxSumBU, was introduced by Bentley
[12]. The sorting algorithm suggested by Exercise 1.6 is selection sort.
Java is a registered trademark of Oracle and/or its affiliates.
Chapter 2
Proving Algorithm Correctness
Sort permutes A (we know this from its postcondition), A′ contains the same collection of values as does A.
Suppose A′[i] < A′[k] for some i, 1 ≤ i ≤ n. Then i < k, for if k < i, A′[k] > A′[i] violates the postcondition of Sort. Hence, there are fewer than
Proof. By induction on n.
Base: n ≤ 1. In this case the algorithm does nothing, but its postcondition
is vacuously satisfied (i.e., there are no i, j such that 1 ≤ i < j ≤ n).
Induction Hypothesis: Assume that for some n > 1, for every k < n,
InsertSort(A[1..k]) satisfies its specification.
Induction Step: We first assume that initially, the precondition for Insert-
Sort(A[1..n]) is satisfied. Then the precondition for InsertSort(A[1..n−1])
is also initially satisfied. By the Induction Hypothesis, we conclude that
InsertSort(A[1..n − 1]) satisfies its specification; hence, its postcondition
holds when it finishes. Let A′ denote the value of A after InsertSort(A[1..n−1]) finishes. Then A′[1..n−1] is a permutation of A[1..n−1] in nondecreasing order, and A′[n] = A[n]. Thus, A′ satisfies the precondition of Insert. Let A′′ denote the value of A after Insert(A[1..n]) is called. By the postcondition of Insert, A′′[1..n] is a permutation of A[1..n] in nondecreasing order. InsertSort therefore satisfies its specification.
Initialization: Before the loop iterates the first time, i has a value of 0.
The maximum subsequence sum of A[0.. − 1] is defined to be 0. m is initially
assigned this value. Likewise, the maximum suffix sum of A[0..− 1] is defined
to be 0, and msuf is initially assigned this value. Therefore, the invariant
initially holds.
$$= t_{i'}.$$

$$\begin{aligned}
m' &= \max(m, \mathit{msuf}')\\
&= \max(s_i, t_{i'})\\
&= \max\left(\max\left\{\sum_{k=l}^{h-1} A[k] \;\middle|\; 0 \le l \le h \le i\right\},\; \max\left\{\sum_{k=l}^{i'-1} A[k] \;\middle|\; 0 \le l \le i'\right\}\right)\\
&= \max\left\{\sum_{k=l}^{h-1} A[k] \;\middle|\; 0 \le l \le h \le i'\right\}\\
&= s_{i'}.
\end{aligned}$$
Therefore, the invariant holds at the end of the iteration.
Termination: Because the loop is a for loop, it clearly terminates. (In this textbook, a for loop always contains a single index variable, which either is incremented by a fixed positive amount each iteration until it exceeds a fixed value or is decremented by a fixed positive amount each iteration until it is less than a fixed value. The index cannot be changed otherwise. Such loops will always terminate.)
Correctness: The loop exits when i = n. Thus, from the invariant, m is the maximum subsequence sum of A[0..n − 1] when the loop terminates.
Thus, if we choose an invariant that is too strong, it may not be true each
time the loop condition is tested. On the other hand, if we choose an invariant
that is too weak, we may not be able to prove the correctness property.
Furthermore, even if the invariant is true on each iteration and is strong
enough to prove the correctness property, it may still be impossible to prove
the maintenance step. We will discuss this issue in more detail shortly.
For while loops, the proof of termination is usually nontrivial and
in some cases quite difficult. An example that is not too difficult is
IterativeInsert in Figure 1.6 (page 11). To prove termination of this
loop, we need to show that each iteration makes progress toward satisfying
the loop exit condition. The exit condition for this loop is that j ≤ 1 or
A[j] ≥ A[j − 1]. Usually, the way in which a loop will make progress toward
meeting such a condition is that each iteration will decrease the difference
between the two sides of an inequality. In this case, j is decreased by each
iteration, and therefore becomes closer to 1. (The other inequality in the
exit condition is not needed to prove termination — if it becomes true, the
loop just terminates that much sooner.) We can therefore prove the following
theorem.
Theorem 2.6. The while loop in IterativeInsert always terminates.
Proof. We first observe that each iteration of the while loop decreases j
by 1. Thus, if the loop continues to iterate, eventually j ≤ 1, and the loop
then terminates.
Initialization: (Outer loop) When the loop begins, i = 1 and the contents
of A[1..n] have not been changed. Because A[1..i − 1] is an empty array, it
is in nondecreasing order.
Initialization: (Inner loop) Because A[1..n] has not been changed since the
beginning of the current iteration of the outer loop, from the outer loop
invariant, A[1..n] is a permutation of its original values. From the outer loop
invariant, A[1..i − 1] is in nondecreasing order; hence, because j = i, we have for 1 ≤ k < k′ ≤ i, where k′ ≠ j, A[k] ≤ A[k′].
Case 1: k′ < j − 1. Then A′[k] = A[k] and A′[k′] = A[k′]. From the invariant, A[k] ≤ A[k′]; hence, A′[k] ≤ A′[k′].
Case 3: k′ > j. Then A′[k′] = A[k′], and A′[k] = A[l], where l is either k, j, or j − 1. In each of these cases, l < k′; hence, from the invariant, A[l] ≤ A[k′]. Thus, A′[k] ≤ A′[k′].
Termination (Inner loop): Each iteration decreases the value of j by 1;
hence, if the loop keeps iterating, j must eventually be no greater than 1.
At this point, the loop will terminate.
Correctness (Inner loop): Let A′[1..n] denote the contents of A[1..n] when the while loop terminates, and let i and j denote their values at this point. From the invariant, A′[1..n] is a permutation of its original values. We must
Case 1: k′ = j. Then j > 1. From the loop exit condition, it follows that A′[j − 1] ≤ A′[j] = A′[k′]. From the invariant, if k ≠ j − 1, then A′[k] ≤ A′[j − 1]; hence, regardless of whether k = j − 1, A′[k] ≤ A′[k′].
This completes the proof for the inner loop, and hence the proof of
maintenance for the outer loop.
Termination (Outer loop): Because the loop is a for loop, it must
terminate.
Correctness (Outer loop): Let A′[1..n] denote its final contents. From the invariant, A′[1..n] is a permutation of its original values. From the loop exit condition (i = n + 1) and the invariant, A′[1..n] is in non-decreasing order.
Therefore, the postcondition is satisfied.
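For reference, the following Java sketch (my own 0-based rendering, not the book's figure) puts the two loops together, with the invariants from the proof recorded as comments at the points where they are claimed to hold.

```java
public class InsertionSortSketch {
    static void insertionSort(int[] A) {
        int n = A.length;
        // Outer invariant: A[0..n-1] is a permutation of its original values
        // and the prefix A[0..i-1] is in nondecreasing order.
        for (int i = 1; i <= n; i++) {
            int j = i - 1;
            // Inner invariant: A[0..n-1] is a permutation of its original
            // values, and A[k] <= A[k'] whenever 0 <= k < k' < i and k' != j.
            while (j > 0 && A[j] < A[j - 1]) {
                int tmp = A[j]; A[j] = A[j - 1]; A[j - 1] = tmp; // swap
                j--;
            }
        }
    }

    public static void main(String[] args) {
        int[] A = {5, 2, 4, 6, 1, 3};
        insertionSort(A);
        System.out.println(java.util.Arrays.toString(A)); // [1, 2, 3, 4, 5, 6]
    }
}
```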
Now that we have shown that InsertionSort is correct, let us consider
how we might have found the invariant for the inner loop. The inner loop
implements a transformation of larger instances of the insertion problem,
specified in Figure 1.3 on page 7, to smaller instances of the same problem.
The loop invariant should therefore be related to the precondition for Insert.
The current instance of the insertion problem is represented by A[1..j].
Therefore, a first choice for an invariant might be that A[1..j] is a
permutation of its original values, and that A[1..j − 1] is sorted. However,
this invariant is not strong enough to prove the correctness property. To see
why, observe that the loop exit condition allows the loop to terminate when
j = 1. In this case, A[1..j] has only one element, A[1..j − 1] is empty, and
the invariant tells us almost nothing.
Clearly, we need to include in our invariant that A[1..n] is a permutation
of its initial values. Furthermore, we need more information about what has
already been sorted. Looking at the invariant for the outer loop, we might
try saying that both A[1..j − 1] and A[j..i] are in nondecreasing order. By
coupling this invariant with the loop exit condition (i.e., either j = 1 or
A[j − 1] ≤ A[j]), we can then show that A[1..i] is sorted. Furthermore, it is
possible to show that this invariant is true every time the loop condition is
tested. However, it still is not sufficient to prove the maintenance step for
this loop. To see why, observe that it tells us nothing about how A[j − 1]
compares with A[j + 1]. Thus, when A[j − 1] is swapped with A[j], we cannot
show that A[j] ≤ A[j + 1].
We need to express in our invariant that when we choose two indices k < k′, where k′ ≠ j, we must have A[k] ≤ A[k′]. The invariant in Figure 1.7
states precisely this fact. Arriving at this invariant, however, required some
degree of effort.
We mentioned in Section 1.4 that starting the for loop with i = 1,
rather than i = 2, simplifies the correctness proof without affecting the
correctness. We can now explain what we meant. Note that if we were to
begin the for loop with i = 2, its invariant would no longer be established
initially if n = 0. Specifically, A[1..i − 1] = A[1..1], and if n = 0, A[1] is not
a valid array location. A more complicated invariant — and consequently
a more complicated proof — would therefore be required to handle this
special case. By instead beginning the loop at 1, we have sacrificed a
very small amount of run-time overhead for the purpose of simplifying the
invariant.
and that a number is red if it is strictly less than some given value p, white
if it is equal to p, or blue if it is strictly greater than p.
The formal specification of this problem is given in Figure 2.3. Note that
we use the type Int to represent an integer. Notice also that because it may
be important to know the number of items of each color, these values are
returned in a 3-element array.
We can then find the kth smallest element in a nonempty array as follows:
1. Let p be the median element of the array.
2. Solve the resulting Dutch national flag problem.
3. If there are at least k red elements, return the kth smallest red element.
4. Otherwise, if there are at least k red and white elements combined,
return p.
5. Otherwise, return the (k − j)th smallest blue element, where j is the
number of red and white elements combined.
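The five steps can be sketched in Java as follows (my own illustration). The median routine is a stand-in computed here by sorting a copy, since Median is specified but deliberately left unimplemented, and this version of the partition uses a scratch array rather than the swap-based approach developed below; the recursive calls copy subarrays instead of re-indexing in place.

```java
import java.util.Arrays;

public class SelectByMedianSketch {
    // Returns the kth smallest element of A (1 <= k <= A.length).
    static int select(int[] A, int k) {
        int p = medianBySorting(A);               // step 1
        int[] counts = partition(A, p);           // step 2: reds | whites | blues
        int r = counts[0], w = counts[1];
        if (k <= r)                               // step 3: answer is among the reds
            return select(Arrays.copyOfRange(A, 0, r), k);
        if (k <= r + w)                           // step 4: answer equals the median
            return p;
        return select(Arrays.copyOfRange(A, r + w, A.length), k - (r + w)); // step 5
    }

    // Rearranges A so that elements < p come first, then elements == p,
    // then elements > p; returns {#red, #white, #blue}.
    static int[] partition(int[] A, int p) {
        int[] out = new int[A.length];
        int r = 0, b = 0;
        for (int x : A) {
            if (x < p) out[r++] = x;
            else if (x > p) out[A.length - 1 - b++] = x;
        }
        int w = A.length - r - b;
        Arrays.fill(out, r, r + w, p);
        System.arraycopy(out, 0, A, 0, A.length);
        return new int[]{r, w, b};
    }

    static int medianBySorting(int[] A) {
        int[] copy = Arrays.copyOf(A, A.length);
        Arrays.sort(copy);
        return copy[(copy.length - 1) / 2];
    }

    public static void main(String[] args) {
        System.out.println(select(new int[]{5, 9, 10, 7, 9, 12}, 4)); // 9
    }
}
```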
Note that after we have solved the Dutch national flag problem, all
elements less than p appear first in the array, followed by all elements equal
to p, followed by all elements greater than p. Furthermore, because steps 3
and 5 apply to portions of the array that do not contain p, these steps solve
strictly smaller problem instances.
In what follows, we will develop a solution to the Dutch national flag
problem. We will then combine that solution with the above reduction
to obtain a solution to the selection problem (we will simply use the
specification for Median). We will then prove that the resulting algorithm
is correct.
In order to conserve resources, we will constrain our solution to the Dutch
national flag problem to rearrange items by swapping them. We will reduce
a large instance of the problem to a smaller instance. We begin by examining
the last item. If it is blue, then we can simply ignore it and solve what is
left. If it is red, we can swap it with the first item and again ignore it and
solve what is left. If it is white, we need to find out where it belongs; hence,
we temporarily ignore it and solve the remaining problem. We then swap it
with the first blue item, or if there are no blue items, we can leave it where
it is. This algorithm is shown in Figure 2.4.
If we were to implement this solution, or to analyze it using the
techniques of Chapter 3, we would soon discover that its stack usage is too
high. Furthermore, none of the recursive calls occur at either the beginning
or the end of the computation; hence, the recursion is not tail recursion, and
we cannot implement it bottom-up.
We can, however, use a technique called generalization that will allow
us to solve the problem using a transformation. We first observe that
the only reason we must wait until after the recursive calls to increment
[Figure: arrays drawn with unknown (?), red (r), white (w), and blue (b) cells, showing the two swaps used by the generalized algorithm — a red item found next to the white block is swapped with the first item of the segment, and a blue item is swapped with the last item of the segment.]
Figure 2.6 Tail recursive solution to a generalization of the Dutch national flag
problem
obtained by ignoring the first item. If it is white, we solve the problem that
results from incrementing the initial number of white items. If it is blue, we
swap it with the last element, and solve the smaller problem obtained by
ignoring the last item. A recursive implementation of this strategy is shown
in Figure 2.6.
The way we handle the case in which an item is white is suspicious in
that the reduced instance is an array with the same number of elements.
However, note that in each case, the number of elements of unknown color is
decreased by the reduction. Thus, if we choose our definition of “size” to be
the number of elements of unknown color, then our reduction does decrease
the size of the problem in each case. Recall that our notion of size is any
natural number which decreases in all “smaller” instances. Our reduction is
therefore valid.
Figure 2.7 An algorithm for solving the selection problem, specified in Figure 1.1,
using the median
Figure 2.7 shows the result of eliminating the tail recursion from Dutch-
FlagTailRec, incorporating it into the selection algorithm described earlier
in this section, and making some minor modifications. First, lo and hi
have been replaced by 1 and n, respectively. Second, the array N has been
removed, and r, w, b are used directly instead. Finally, referring to Figure 2.6,
note that when a recursive call is made, lo is incremented exactly when r is
incremented, and hi is decremented exactly when b is incremented. Because
we are replacing lo with 1, which cannot be changed, and hi with n, which
we would rather not change, we instead use the expressions r + 1 and n − b,
respectively. Thus, for example, instead of having a while loop condition of
w < hi − lo + 1, we replace lo with r + 1 and hi with n − b, rearrange terms,
and obtain r + w + b < n.
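To make the index arithmetic concrete, here is a Java sketch (my own, 0-based, not the book's Figure 2.7) of just the partitioning loop: r, w, and b count the red, white, and blue items placed so far, the item examined in each iteration is the last one whose color is still unknown, and the loop condition is exactly r + w + b < n.

```java
public class DutchFlagLoopSketch {
    // Rearranges A so that elements < p ("red") come first, then elements
    // == p ("white"), then elements > p ("blue"); returns {r, w, b}.
    // Reds occupy A[0..r-1], blues A[n-b..n-1], whites sit just before the
    // blues, and the unknown items lie between the reds and the whites.
    static int[] dutchFlag(int[] A, int p) {
        int n = A.length;
        int r = 0, w = 0, b = 0;
        while (r + w + b < n) {          // some item still has unknown color
            int j = n - b - w - 1;       // index of the last unknown item
            if (A[j] < p) {              // red: swap it to the front of the window
                swap(A, j, r); r++;
            } else if (A[j] == p) {      // white: the white block grows leftward
                w++;
            } else {                     // blue: swap it to the end of the window
                swap(A, j, n - b - 1); b++;
            }
        }
        return new int[]{r, w, b};
    }

    private static void swap(int[] A, int i, int j) {
        int tmp = A[i]; A[i] = A[j]; A[j] = tmp;
    }

    public static void main(String[] args) {
        int[] A = {5, 9, 10, 7, 9, 12};
        System.out.println(java.util.Arrays.toString(dutchFlag(A, 9))); // [2, 2, 2]
        System.out.println(java.util.Arrays.toString(A)); // [7, 5, 9, 9, 10, 12]
    }
}
```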
As we have already observed, the invariant for a loop implementing
a transformation is closely related to the precondition for the problem.
Thus, in order to obtain the loop invariant, we take the precondition for
DutchFlagTailRec, remove “A[lo..hi] is an array of Numbers”, as this
is understood, and replace lo with r + 1 and hi with n − b. This gives us
most of the invariant. However, we must also take into account that the
iterations do not actually change the size of the problem instance; hence,
the invariant must also include a characterization of what has been done
outside of A[r + 1..n − b]. The portion to the left is where red items have
been placed, and the portion to the right is where blue items have been
placed. We need to include these constraints in our invariant.
Note that in Figure 2.7, the last line of SelectByMedian contains a
recursive call in which the first parameter is A[1 + r + w..n]. However, the
specification given in Figure 1.1 (page 5) states that the first parameter
must be of the form A[1..n]. To accommodate such a mismatch, we adopt
a convention that allows for automatic re-indexing of arrays when the
specification requires a parameter to be an array whose beginning index
is a fixed value. Specifically, we think of the sub-array A[1 + r + w..n] as an
array B[1..n − (r + w)]. B is then renamed to A when it is used as the actual
parameter in the recursive call.
Let us now prove the correctness of SelectByMedian. Because
SelectByMedian contains a loop, we must prove this loop’s correctness
using the techniques of Section 2.3. Specifically, we need the following lemma,
whose proof we leave as an exercise.
Lemma 2.8. If the precondition for SelectByMedian is satisfied, then its
while loop always terminates with A[1..n] being a permutation of its original
elements such that every element of A[1..r] is less than p, every element of A[r + 1..r + w] is equal to p, and every element of A[r + w + 1..n] is greater than p.
Furthermore, when the loop terminates, r, w, and b are natural numbers such
that r + w + b = n.
We can then prove the correctness of SelectByMedian using
induction.
Theorem 2.9. SelectByMedian meets the specification of Select given
in Figure 1.1.
Proof. By induction on n.
Case 2: r < k ≤ r + w. In this case, there are fewer than k elements less
than p and at least k elements less than or equal to p. p is therefore the kth
smallest element.
Case 3: r + w < k. In this case, there are fewer than k elements less than
or equal to p. The kth smallest must therefore be greater than p. It must
therefore be in A[r + w + 1..n]. Because every element in A[1..r + w] is
less than the kth smallest, the kth smallest must be the (k − (r + w))th
smallest element in A[r + w + 1..n]. Because p is an element of A[1..n] that
is not in A[r + w + 1..n], r + w + 1 > 1, so that the number of elements in
A[r + w + 1..n] is less than n. Let us refer to A[r + w + 1..n] as B[1..n −
(r + w)]. Then because r + w < k, 1 ≤ k − (r + w), and because k ≤ n,
k − (r + w) ≤ n − (r + w). Therefore, the precondition for Select is satisfied
by the recursive call SelectByMedian(B[1..n − (r + w)], k − (r + w)). By
the Induction Hypothesis, this recursive call returns the (k − (r + w))th
smallest element of B[1..n − (r + w)] = A[r + w + 1..n]. This element is the
kth smallest of A[1..n].
In some cases, a recursive call might occur inside a loop. For such
cases, we would need to use the induction hypothesis when reasoning about
the loop. As a result, it would be impossible to separate the proof into a
lemma dealing with the loop and a theorem whose proof uses induction
and the lemma. We would instead need to prove initialization, maintenance,
termination, and correctness of the loop within the induction step of the
induction proof.
$$\begin{aligned}
&= \max\left\{\sum_{k=l}^{i-1} A[k] \;\middle|\; 0 \le l \le i\right\} + A[i]\\
&= \max\left\{A[i] + \sum_{k=l}^{i-1} A[k] \;\middle|\; 0 \le l \le i\right\}\\
&= \max\left\{\sum_{k=l}^{i} A[k] \;\middle|\; 0 \le l \le i\right\}\\
&= \max\left\{\sum_{k=l}^{i'-1} A[k] \;\middle|\; 0 \le l \le i'-1\right\}.
\end{aligned}$$

However,

$$t_{i'} = \max\left\{\sum_{k=l}^{i'-1} A[k] \;\middle|\; 0 \le l \le i'\right\}.$$
Note that the set on the right-hand side of this last equality has one more element than does the set on the right-hand side of the preceding equality. This element is generated by l = i′, which results in an empty sum having a value of 0. All of the remaining elements are derived from values l ≤ i′ − 1, which result in nonempty sums of elements from A[0..i]. Thus, if A[0..i] contains only negative values, msuf < t_{i′}. It is therefore impossible to prove that these values are equal.
A failure to come up with a proof of correctness does not necessarily
mean the algorithm is incorrect. It may be that we have not been clever
enough to find the proof. Alternatively, it may be that an invariant has
not been stated properly, as discussed in Section 2.3. Such a failure always
reveals, however, that we do not yet understand the algorithm well enough
to prove that it is correct.
2.7 Summary
We have introduced two main techniques for proving algorithm correctness,
depending on whether the algorithm uses recursion or iteration:
2.8 Exercises
Exercise 2.1. Induction can be used to prove solutions for summations.
Use induction to prove each of the following:
(a) The arithmetic series:
$$\sum_{i=1}^{n} i = \frac{n(n+1)}{2}. \qquad (2.1)$$
and
$$F_n^2 + F_{n+1}^2 = F_{2n+1}. \qquad (2.5)$$
$$F_n = \frac{\varphi^n - (-\varphi)^{-n}}{\sqrt{5}}, \qquad (2.6)$$
where φ is the golden ratio:
$$\varphi = \frac{1 + \sqrt{5}}{2}.$$
Exercise 2.4. Prove that RecursiveInsert, shown in Figure 1.4 on page 8, meets its specification, given in Figure 1.3 on page 7.
Exercise 2.5. Prove that MaxSuffixTD and MaxSumTD, given in
Figure 1.13 (page 18), meet their specifications. For MaxSumTD, use the
specification of MaxSum given in Figure 1.9 (page 14).
Exercise 2.6. Prove that DotProduct, shown in Figure 1.17 on page 21,
meets its specification.
Exercise 2.7. Prove that Factorial, shown in Figure 2.9, meets its
specification. n! (pronounced, “n factorial”) denotes the product 1 · 2 · · · n
(0! is defined to be 1).
Exercise 2.8. A minor modification of MaxSumOpt is shown in
Figure 2.10 with its loop invariants. Prove that it meets the specification
of MaxSum, given in Figure 1.9 (page 14).
Exercise 2.16. Figure 2.15 contains an algorithm for reducing the Dutch
national flag problem to the problem solved in Figure 2.14. However, the
algorithm contains several errors. Work through a proof that this algorithm
meets its specification (given in Figure 2.3 on page 38), pointing out each
place at which the proof fails. At each of these places, suggest a small change
that could be made to correct the error. In some cases, the error might be
in the invariant, not the algorithm itself.
* Exercise 2.17. Reduce the sorting problem to the Dutch national flag
problem and one or more smaller instances of itself.
2.9 Notes
The techniques presented here for proving correctness of algorithms are based
on Hoare logic [63]. More complete treatments of techniques for proving
program correctness can be found in Apt and Olderog [6] or Francez [44]. Our
presentation of proofs using invariants is patterned after Cormen, et al. [25].
A discussion of the Dutch national flag problem and the iterative solution
used in SelectByMedian are given by Dijkstra [29]. The Collatz problem
was first posed by Lothar Collatz in 1937. An up-to-date summary of its
history is maintained by Eric Weisstein [119].
Chapter 3
Analyzing Algorithms
In Chapter 1, we saw that different algorithms for the same problem can
have dramatically different performance. In this chapter, we will introduce
techniques for mathematically analyzing the performance of algorithms.
These analyses will enable us to predict, to a certain extent, the performance
of programs using these algorithms.
3.1 Motivation
Perhaps the most common performance measure of a program is its running
time. The running time of a program depends not only on the algorithms it
uses, but also on such factors as the speed of the processor(s), the amount
of main memory available, the speeds of devices accessed, and the impact of
other software utilizing the same resources. Furthermore, the same algorithm
can perform differently when coded in different languages, even when all
other factors remain unchanged. When analyzing the performance of an
algorithm, we would like to learn something about the running time of any
of its implementations, regardless of the impact of these other factors.
Suppose we divide an execution of an algorithm into a sequence of steps,
each of which does some fixed amount of work. For example, a step could be
comparing two values or performing a single arithmetic operation. Assuming
the values used are small enough to fit into a single machine word, we could
reasonably expect that any processor could execute each step in a bounded
amount of time. Some of these steps might be faster than others, but for
any given processor, we should be able to identify both a lower bound l > 0
and an upper bound u ≥ l on the amount of time required for any single
execution step, assuming no other programs are being executed by that
Definition 3.1. Let f : N → R≥0. O(f(n)) (pronounced “big-Oh of f of n”) is defined to be the set of all functions g : N → R≥0 such that there exist c ∈ R>0 and n0 ∈ N for which g(n) ≤ cf(n) whenever n ≥ n0.
statements prior to the loop, including the initialization of the loop index i,
require a fixed number of steps. Their running time is therefore bounded
by some constant a. Likewise, the number of steps required by any single
iteration of the loop (including the loop test and the increment of i) is
bounded by some constant b. Because the loop iterates n times, the total
number of steps required by the loop is at most bn. Finally, the last loop
condition test and the return statement require a number of steps bounded
by some constant c. The running time of the entire algorithm is therefore
bounded by a + bn + c, where a, b, and c are fixed positive constants. The
running time of MaxSumBU is in O(n), because a + bn + c ≤ (a + b + c)n
for all n ≥ 1.
We can simplify the above analysis somewhat using the following
theorem.
Theorem 3.8. Suppose f1 (n) ∈ O(g1 (n)) and f2 (n) ∈ O(g2 (n)). Then
1. f1 (n)f2 (n) ∈ O(g1 (n)g2 (n)); and
2. f1 (n) + f2 (n) ∈ O(max(g1 (n), g2 (n))).
(By f1 (n)f2 (n), we mean the function that maps n to the product of f1 (n)
and f2 (n). Likewise, max(g1 (n), g2 (n)) denotes the function that maps n to
the maximum of g1 (n) and g2 (n).)
Proof. Because f1 (n) ∈ O(g1 (n)) and f2 (n) ∈ O(g2 (n)), there exist
positive real numbers c1 and c2 and natural numbers n1 and n2 such that
and
Figure 3.1 Venn diagram depicting the relationships between the sets O(f (n)),
Ω(f (n)), and Θ(f (n))
In other words, Θ(f (n)) is the set of all functions belonging to both
O(f (n)) and Ω(f (n)) (see Figure 3.1). We can restate this definition by
the following theorem, which characterizes Θ(f (n)) in terms similar to the
definitions of O and Ω.
Theorem 3.13. g(n) ∈ Θ(f(n)) iff there exist positive constants c1 and c2 and a natural number n0 such that
$$c_1 f(n) \le g(n) \le c_2 f(n) \qquad (3.3)$$
whenever n ≥ n0.
⇒: Suppose g(n) ∈ Θ(f (n)). Then g(n) ∈ O(f (n)) and g(n) ∈ Ω(f (n)).
By the definition of Ω, there exist a positive real number c1 and a natural
number n1 such that c1 f (n) ≤ g(n) whenever n ≥ n1 . By the definition of
O, there exist a positive real number c2 and a natural number n2 such that
g(n) ≤ c2 f (n) whenever n ≥ n2 . Let n0 = max(n1 , n2 ). Then (3.3) holds
whenever n ≥ n0 .
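As a quick illustration of how the constants in (3.3) can be chosen (my own example, not one from the text), consider g(n) = 3n² + 5n and f(n) = n²:

```latex
% Choosing c_1 = 3, c_2 = 8, and n_0 = 1 in Theorem 3.13:
\[
  3n^2 \;\le\; 3n^2 + 5n \;\le\; 3n^2 + 5n^2 \;=\; 8n^2
  \qquad \text{whenever } n \ge 1,
\]
% so 3n^2 + 5n \in \Theta(n^2).
```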
Using this lower bound, we conclude that the running time of the inner loop
is in Ω(1). Because the outer loop iterates n times, the running time of the
algorithm is in Ω(n).
Unfortunately, this lower bound does not match our upper bound of
O(n2 ). In some cases, we may not be able to make the upper and lower
bounds match. In most cases, however, if we work hard enough, we can
bring them together.
Clearly, the running time of a single iteration of the inner loop will
require a constant number of steps in the worst case. Let a > 0 denote that
constant. The loop iterates n − i times, so that the total number of steps
required by the inner loop is (n − i)a. An iteration of the outer loop requires
a constant number of steps apart from the inner loop. Let b > 0 denote that
constant. The loop iterates n times. However, because the number of steps
required for the inner loop depends on the value of i, which is different for
each iteration of the outer loop, we must be more careful in computing the
total number of steps required by the outer loop. That number is given by
$$\sum_{i=0}^{n-1}\bigl(b + (n-i)a\bigr) = bn + a\sum_{i=0}^{n-1}(n-i).$$

We can now use (2.1) from page 48 to conclude that the number of steps taken by the outer loop is

$$bn + \frac{an(n+1)}{2}.$$
The above expression is a polynomial in n with degree 2. The following
theorem gives us a way to characterize polynomials using asymptotic
notation.
$$f(n) = p(n) = \sum_{i=0}^{k} a_i n^i \le \sum_{i=0}^{k} A n^i \le A \sum_{i=0}^{k} n^k$$

because n ≥ 1. Thus,

$$f(n) \le A \sum_{i=0}^{k} n^k = A(k+1)n^k.$$

$$f(n) = p(n) = a_k n^k + \sum_{i=0}^{k-1} a_i n^i \ge a_k n^k + \sum_{i=0}^{k-1} A n^i = a_k n^k + A \sum_{i=0}^{k-1} n^i \ge a_k n^k + A \sum_{i=0}^{k-1} n^{k-1}$$
• f ◦ A = {f ◦ g | g ∈ A};
• A ◦ f = {g ◦ f | g ∈ A}; and
• A ◦ B = {g ◦ h | g ∈ A, h ∈ B}.
Example 3.21. n^2 + Θ(n^3) is the set of all functions that can be written n^2 + g(n) for some g(n) ∈ Θ(n^3). This set includes such functions as
• n^2 + 3n^3;
• (n^3 + 1)/2, which can be written n^2 + ((n^3 + 1)/2 − n^2) (note that (n^3 + 1)/2 − n^2 ≥ 0 for all natural numbers n); and
• n^3 + 2n, which can be written n^2 + (n^3 + 2n − n^2).
Example 3.22. O(n^2) + O(n^3) is the set of functions that can be written f(n) + g(n), where f(n) ∈ O(n^2) and g(n) ∈ O(n^3). Functions in this set include:
• 2n^2 + 3n^3;
• 2n, which can be written as n + n; and
• 2n^3, which can be written as 0 + 2n^3.
Because all functions in this set belong to O(n^3), O(n^2) + O(n^3) ⊆ O(n^3).
$$g(n) = \sum_{i=k}^{n} f(i)$$

Example 3.24. $\sum_{i=1}^{n} \Theta(i^2)$
for some h(n) ∈ Θ(n). Note that because h(0) may have any nonnegative
value, so may f (0).
We can use the above definitions to simplify our analysis of the lower
bound for MaxSumOpt. Instead of introducing the constant a to represent
the running time of a single iteration of the inner loop, we can simply use
Ω(1) to represent the lower bound for this running time. We can therefore
conclude that the total running time of the inner loop is in
$$\sum_{k=i}^{n-1} \Omega(1).$$
for some smooth function f . The following theorem, whose proof is outlined
in Exercise 3.13, can then be applied.
Theorem 3.31. Let f : N → R≥0 be a smooth function, g : N → N be an
eventually non-decreasing and unbounded function, and let X denote either
O, Ω, or Θ. Then
$$\sum_{i=1}^{g(n)} X(f(i)) \subseteq X\bigl(g(n)\,f(g(n))\bigr).$$
• f^c(n); and
• f (g(n)), provided g is unbounded.
The proof is left as an exercise. Knowing that f (n) = n is smooth, we
can apply Theorem 3.32 to conclude that any polynomial is smooth. In fact,
such functions as $\sqrt{n}$ and $n^{\sqrt{2}}$ are also smooth. We can extend this idea to logarithms as well. In particular, let lg x denote the base-2 logarithm; i.e.,
$$2^{\lg x} = x \qquad (3.4)$$
for all positive x.
Example 3.33. lg n is smooth. Clearly lg n is eventually non-decreasing and
eventually positive. Furthermore, lg(2n) = 1 + lg n ≤ 2 lg n whenever n ≥ 2.
Thus far, the only example we have seen of a non-smooth function
is 2^n. Indeed, almost any polynomial-bounded, eventually non-decreasing,
eventually positive function we encounter will turn out to be smooth.
However, we can contrive exceptions. For example, we leave it as an exercise
to show that $2^{2^{\lfloor \lg \lg n \rfloor}} \in O(n)$, but is not smooth.
We can now continue the analysis of the lower bound for MaxSumOpt.
As we showed in the previous section, the lower bound on the running time
of the inner loop is in
$$\sum_{k=i}^{n-1} \Omega(1).$$
In order to apply Theorem 3.31, we need to rewrite the sum so that the
summation index begins at 1:
$$\sum_{k=i}^{n-1} \Omega(1) = \sum_{k=1}^{n-i} \Omega(1).$$
Theorem 3.31 still does not quite apply because the upper limit, n − i,
is not a function of one variable. In order to overcome this difficulty, we
can introduce an auxiliary variable N , which we define to be n − i. Because
i < n in the algorithm, this definition makes sense — N is a natural number.
The lower bound can now be expressed as
$$\sum_{k=1}^{N} \Omega(1).$$
We can now apply Theorem 3.31 by letting g(N ) = N and f (k) = 1. Because
N is eventually non-decreasing and unbounded, and because 1 is smooth, we
can conclude that the running time of the inner loop is in Ω(g(N )f (g(N ))) =
Ω(N ). The lower bound for the algorithm is therefore in
$$\sum_{i=0}^{n-1} \Omega(N).$$
Again, Theorem 3.31 does not immediately apply to this summation. First,
the lower limit of the index i is 0, not 1 as required by Theorem 3.31.
Furthermore, the theorem requires the expression inside the asymptotic
notation to be a function of the summation index i, not of N .
In order to take care of the latter problem, we observe that as i ranges
from 0 to n − 1, N (or n − i) takes on each of the integer values from n to 1.
We can therefore write the above sum as:
$$\sum_{N=1}^{n} \Omega(N).$$
This summation does not immediately fit the form of Theorem 3.31, as the
starting value of the summation index J is 0, not 1. We can rewrite this
sum as
$$\sum_{J=0}^{n-i} \Theta(J) = \sum_{J=1}^{n-i+1} \Theta(J-1).$$
What we have done here is simply to shift the range of J upward by 1 (i.e.,
from 0, . . . , n − i to 1, . . . , n − i + 1), and to compensate for this shift by
subtracting 1 from each occurrence of J in the expression being summed.
Now from Theorem 3.19, J − 1 ∈ Θ(J), and from Theorem 3.18, Θ(J − 1) =
Θ(J); hence, the running time of the middle loop is in
$$\sum_{J=1}^{n-i+1} \Theta(J).$$
Applying Theorem 3.31 to the above sum, we find that the running time of
the middle loop is in Θ(N^2). The running time of the outer loop is then in
$$\sum_{i=0}^{n} \Theta(N^2) = \sum_{N=1}^{n+1} \Theta(N^2).$$
Applying Theorem 3.31 to this sum, we find that the running time of this loop is in Θ(n^3).
i − 1 times. Because each iteration runs in Θ(1) time, the while loop runs
in O(i) time in the worst case.
In order to be able to conclude that the loop runs in Θ(i) time in the
worst case, we must determine that for arbitrarily large i, the loop may
iterate until j = 1. This is certainly the case if, prior to the beginning of the
loop, A[i] is strictly less than every element in A[1..i − 1]. Thus, the while
loop runs in Θ(i) time in the worst case.
It is now tempting to use Theorem 3.31 to conclude that the entire
algorithm’s running time is in
$$\Theta(1) + \sum_{i=1}^{n} \Theta(i) \subseteq \Theta(1) + \Theta(n^2) = \Theta(n^2).$$
However, we must be careful, because we have not shown that the while
loop runs in Ω(i) time for every iteration of the for loop; hence the running
time of the for loop might not be in
$$\sum_{i=1}^{n} \Theta(i).$$
We must show that there are inputs of size n, for every sufficiently
large n, such that the while loop iterates i − 1 times for each iteration
of the for loop. It is not hard to show that an array of distinct elements in
decreasing order will produce the desired behavior. Therefore, the algorithm
indeed operates in Θ(n^2) time.
Suppose now that we wish to analyze an algorithm that makes one or more
recursive calls. For example, consider MaxSuffixTD from Figure 1.13 on
page 18. We analyze such an algorithm in exactly the same way. Specifically,
this algorithm has a running time in Θ(1) plus whatever is required by the
recursive call. The difficulty here is in how to determine the running time of
the recursive call without knowing the running time of the algorithm.
The solution to this difficulty is to express the running time as a
recurrence. Specifically, let f (n) denote the worst-case running time of
MaxSuffixTD on an array of size n. Then for n > 0, we have the equation,
f (n) = g(n) + f (n − 1) (3.5)
where g(n) ∈ Θ(1) is the worst-case running time of the body of the function,
excluding the recursive call. Note that f (n − 1) has already been defined to
be the worst-case running time of MaxSuffixTD on an array of size n − 1;
hence, f (n − 1) gives the worst-case running time of the recursive call.
The solution of arbitrary recurrences is beyond the scope of this book.
However, asymptotic solutions are often much simpler to obtain than are
exact solutions. First, we observe that (3.5) can be simplified using set
operations:
f (n) ∈ f (n − 1) + Θ(1) (3.6)
for n > 0.
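Before turning to the general forms, it may help to unroll (3.6) informally (my own sketch, writing c for a constant that eventually bounds the Θ(1) term from above):

```latex
\[
  f(n) \;\le\; f(n-1) + c \;\le\; f(n-2) + 2c \;\le\; \cdots \;\le\; f(0) + cn ,
\]
% and a symmetric argument with a lower-bounding constant shows f(n) \in \Theta(n).
```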
It turns out that most of the recurrences that we derive when analyzing
algorithms fit into a few general forms. With asymptotic solutions to these
general forms, we can analyze recursive algorithms without using a great
deal of detailed mathematics. (3.6) fits one of the most basic of these forms.
The following theorem, whose proof is outlined in Exercise 3.23, gives the
asymptotic solution to this form.
Theorem 3.34. Let
$$f(n) \in af(n-1) + X(b^n g(n))$$
for n > n0, where n0 ∈ N, a ≥ 1 and b ≥ 1 are real numbers, g(n) is a smooth function, and X is either O, Ω, or Θ. Then
$$f(n) \in \begin{cases} X(b^n g(n)) & \text{if } a < b \\ X(n a^n g(n)) & \text{if } a = b \\ X(a^n) & \text{if } a > b. \end{cases}$$
When we apply this theorem to the analysis of algorithms, a in the
recurrence denotes the number of recursive calls. The set X(b^n g(n)) contains the function giving the running time of the algorithm, excluding recursive calls. Note that the expression b^n g(n) is general enough to describe a wide
f (n) ∈ f (n − 1) + Θ(n)
for n > 0. Again, this recurrence fits the form of Theorem 3.34 with a = 1,
b = 1, g(n) = n, and X = Θ. The second case again holds, so that the
running time is in Θ(n^2).
It is no coincidence that both of these analyses fit the second case of
Theorem 3.34. Note that unless a and b are both 1, Theorem 3.34 yields an
exponential result. Thus, efficient algorithms will always fit the second case
if this theorem applies. As a result, we can observe that an algorithm that
makes more than one recursive call of size n−1 will yield an exponential-time
algorithm.
We have included the first and third cases in Theorem 3.34 because they
are useful in deriving a solution for certain other types of recurrences. To
illustrate how these recurrences arise, we consider another solution to the
maximum subsequence sum problem (see Section 1.6).
The technique we will use is called divide-and-conquer. This technique,
which we will examine in detail in Chapter 10, involves reducing the size of
recursive calls to a fixed fraction of the size of the original call. For example,
we may attempt to make recursive calls on arrays of half the original size.
We therefore begin this solution by dividing a large array in half, as
nearly as possible. The subsequence giving us the maximum sum can then
lie in one of three places: entirely in the first half, entirely in the second
half, or partially in both halves, as shown in Figure 3.2. We can find the
maximum subsequence sum of each half by solving the two smaller problem
instances recursively. If we can then find the maximum sum of any sequence
Figure 3.2 When applying divide-and-conquer, the maximum subsequence sum may
not lie entirely in either half
that begins in the first half and ends in the second half, then the maximum
of these three values is the overall maximum subsequence sum.
For example, consider again the array A[0..5] = −1, 3, −2, 7, −9, 7 from
Example 1.1 (page 14). The maximum subsequence sum of the first half,
namely, of A[0..2] = −1, 3, −2, has a value of 3. Likewise, the maximum
subsequence sum of the second half, 7, −9, 7, is 7. In examining the two
halves, we have missed the actual maximum, A[1..3] = 3, −2, 7, which
resides in neither half. However, notice that such a sequence that resides in
neither half can be expressed as a suffix of the first half followed by a prefix
of the last half; e.g., 3, −2, 7 can be expressed as 3, −2 followed by 7.
Let us define the maximum prefix sum analogously to the maximum
suffix sum as follows:
$$\max\left\{\sum_{k=0}^{i-1} A[k] \;\middle|\; 0 \le i \le n\right\}.$$
It is not hard to see that the maximum sum of any sequence crossing
the boundary is simply the maximum suffix sum of the first half plus
the maximum prefix sum of the second half. For example, returning to
Example 1.1, the maximum suffix sum of the first half is 1, obtained from
the suffix 3, −2. Likewise, the maximum prefix sum of the second half is 7,
obtained from the prefix 7. The sum of these two values gives us 8, the
maximum subsequence sum.
Note that when we create smaller instances by splitting the array in half,
one of the two smaller instances — the upper half — does not begin with
index 0. For this reason, let us describe the input array more generally, as
A[lo..hi]. We can then modify the definitions of maximum subsequence sum,
maximum suffix sum, and maximum prefix sum by replacing 0 with lo and
n − 1 with hi. We will discuss the ranges of lo and hi shortly.
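To make the strategy concrete, the following Python sketch computes the maximum subsequence sum of A[lo..hi] in the manner just described. It is only an illustration of the approach, not the book's MaxSumDC pseudocode; in particular, it computes the maximum suffix and prefix sums with simple loops rather than with separate functions, and the function name is ours.

    def max_sum_dc(A, lo, hi):
        # Maximum subsequence sum of A[lo..hi]; the empty subsequence has sum 0.
        if lo > hi:
            return 0
        if lo == hi:
            return max(0, A[lo])
        mid = (lo + hi) // 2
        left = max_sum_dc(A, lo, mid)        # best sum entirely in the first half
        right = max_sum_dc(A, mid + 1, hi)   # best sum entirely in the second half
        # Best sum crossing the boundary: maximum suffix sum of the first half
        # plus maximum prefix sum of the second half.
        suffix = s = 0
        for k in range(mid, lo - 1, -1):
            s += A[k]
            suffix = max(suffix, s)
        prefix = s = 0
        for k in range(mid + 1, hi + 1):
            s += A[k]
            prefix = max(prefix, s)
        return max(left, right, suffix + prefix)

On the array from Example 1.1, max_sum_dc([-1, 3, -2, 7, -9, 7], 0, 5) returns 8, combining the suffix sum 1 of the first half with the prefix sum 7 of the second half.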
The running time of the resulting algorithm, MaxSumDC, is then given by a recurrence of the form

f(n) ∈ f(⌈n/2⌉) + f(⌊n/2⌋) + Θ(n)    (3.7)

for n > 1.
This equation does not fit the form of Theorem 3.34. However, suppose
we focus only on those values of n that are powers of 2; i.e., let n = 2k for
some k > 0, and let g(k) = f (2k ) = f (n). Then
g(k) = f (2k )
∈ 2f (2k−1 ) + Θ(2k )
= 2g(k − 1) + Θ(2k ) (3.8)
for k > 0. Theorem 3.34 applies to (3.8), yielding g(k) ∈ Θ(k 2^k). Because
n = 2^k, we have k = lg n, so that

f(n) ≤ c1 (k + 1) 2^{k+1}
     ≤ c1 d k 2^k
     ≤ c1 d n lg n
     ∈ O(n lg n).

Likewise,

f(n) ≥ c1 k 2^k
     ≥ c1 (k + 1) 2^{k+1} / d
     ≥ c1 n lg n / d
     ∈ Ω(n lg n).
Let us first see that (3.7) fits the form of Theorem 3.35. As we
have already observed, f is eventually non-decreasing (this requirement is
typically met by recurrences obtained in the analysis of algorithms). When
n = 2^k, (3.7) simplifies to
g′(n) = g(2^n) = lg 2^n = n
g′(n) = g(5 · 3^n) = lg²(5 · 3^n) = (lg 5 + n lg 3)²
hence, the stack usage is in Θ(n). We typically would not have a runtime
stack capable of occupying space proportional to an input of, say, a million
elements.
Let us now complete the analysis of MaxSumTD. Ignoring the space
usage of the recursive call, we see that MaxSumTD uses Θ(n) space, due to
the space usage of MaxSuffixTD. However, this does not mean that the
following recurrence describes the total space usage:
f (n) ∈ f (n − 1) + Θ(n),
for n > 0. The reason is that the call made to MaxSuffixTD can
reuse the space used by the recursive call. Furthermore, any calls made to
MaxSuffixTD as a result of the recursive call will be on arrays of fewer than
n elements, so they may reuse the space used by MaxSuffixTD(A[0..n−1]).
Therefore, the total space used by all calls to MaxSuffixTD is in Θ(n).
Ignoring this space, the space used by MaxSumTD is given by
f (n) ∈ f (n − 1) + Θ(1),
for n > 0, so that f (n) ∈ Θ(n). The total space used is therefore in Θ(n) +
Θ(n) = Θ(n).
Now let’s consider MaxSumDC. MaxSuffixBU and MaxPrefixBU
each use Θ(1) space. Because the two recursive calls can reuse the same
space, the total space usage is given by
f(n) ∈ f(⌈n/2⌉) + Θ(1)

for n > 1. Applying Theorem 3.35, we see that f(n) ∈ Θ(lg n). Because lg n is
such a slow-growing function (e.g., lg 106 < 20), we can see that MaxSumDC
is a much more space-efficient algorithm than MaxSumTD. Because the
space used by both algorithms is almost entirely from the runtime stack,
MaxSumDC will not have the stack problems that MaxSumTD has.
g(m, n) ≤ cf (m, n)
and
whenever m ≥ n0 and n ≥ n0 .
Likewise, we can define big-Ω and big-Θ for 2-variable functions.
Definition 3.40. For a function f : N × N → R≥0 , we define Ω(f (m, n)) to
be the set of all functions g : N × N → R≥0 such that there exist c ∈ R>0
and n0 ∈ N so that
g(m, n) ≥ cf (m, n)
and
whenever m ≥ n0 and n ≥ n0 .
Definition 3.41. For a function f : N × N → R≥0 ,
1. O(f (m, n)) is the set of all functions g : N × N → R≥0 such that there
exist c ∈ R>0 and n0 ∈ N such that
g(m, n) ≤ cf (m, n),
whenever m ≥ n0 and n ≥ n0 .
2. Ω(f (m, n)) is the set of all functions g : N × N → R≥0 such that there
exist c ∈ R>0 and n0 ∈ N such that
g(m, n) ≥ cf (m, n),
whenever m ≥ n0 and n ≥ n0 .
Proof. From the definitions, for any function g(m, n) in O(f (m, n)) or
in Ω(f (m, n)), respectively, there are a c ∈ R>0 and an n0 ∈ N such that
whenever m ≥ n0 and n ≥ n0 , the corresponding inequality above is satisfied.
We therefore only need to show that if there are c ∈ R>0 and n0 ∈ N such
that whenever m ≥ n0 and n ≥ n0 , the given inequality is satisfied, then
g(m, n) belongs to O(f(m, n)) or Ω(f(m, n)), respectively.
We first observe that if f is strictly non-decreasing, then
f (m, n) = f (m, n),
for all natural numbers m and n. Furthermore, for any function g : N × N →
R≥0 ,
g(m, n) ≥ g(m, n).
Now suppose c ∈ R>0 and n0 ∈ N such that whenever m ≥ n0 and
n ≥ n0 , g(m, n) ≤ cf (m, n). Then for m ≥ n0 and n ≥ n0 ,
g(m, n) ≤ g(m, n)
≤ cf (m, n)
= cf (m, n).
Hence, g(m, n) ∈ O(f (m, n)).
Likewise, suppose now that c ∈ R>0 and n0 ∈ N such that whenever
m ≥ n0 and n ≥ n0 , g(m, n) ≥ cf (m, n). Then for m ≥ n0 and n ≥ n0 ,
g(m, n) ≥ g(m, n)
≥ cf (m, n)
= cf (m, n).
Therefore, g(m, n) ∈ Ω(f (m, n)).
Proof. We will only show part 1; part 2 will be left as an exercise.
Because f1 (m, n) ∈ O(g1 (m, n)) and f2 (m, n) ∈ O(g2 (m, n)), there exist
positive real numbers c1 and c2 and natural numbers n1 and n2 such that
whenever m ≥ n1 and n ≥ n1 ,
f
1 f2 (m, n) = max{f1 (i, j)f2 (i, j) | 0 ≤ i ≤ m, 0 ≤ j ≤ n}.
f
1 f2 (m, n) ≤ f1 (m, n)f2 (m, n).
f
1 f2 (m, n) ≤ f1 (m, n)f2 (m, n)
Before we can extend Theorem 3.31 to more than one variable, we must
first extend the definition of smoothness. In order to do this, we must first
extend the definitions of eventually non-decreasing and eventually positive.
The following extension to Theorem 3.31 can now be shown — the proof
is left as an exercise.
Σ_{i=1}^{g(m)} X(f(i, n)) ⊆ X(g(m) f(g(m), n)).
Having the above theorems, we can now complete the analysis of Add-
Matrices. Because we are analyzing the algorithm with respect to two
parameters, we view n as the 2-variable function f (m, n) = n, and we view
m as the 2-variable function g(m, n) = m. We can then apply Corollary 3.46
to Θ(m)Θ(n) to obtain a running time in Θ(mn). Alternatively, because n
is smooth, we could apply Theorem 3.50 to obtain
Σ_{i=1}^{m} Θ(n) ⊆ Θ(mn).
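As a concrete illustration (a Python sketch, not the book's AddMatrices pseudocode), element-wise matrix addition performs Θ(1) work in the body of a doubly nested loop, so its running time is in Θ(mn):

    def add_matrices(A, B):
        # A and B are m-by-n matrices, represented as lists of m rows of length n.
        m, n = len(A), len(A[0])
        C = [[0] * n for _ in range(m)]
        for i in range(m):         # m iterations of the outer loop
            for j in range(n):     # n iterations of the inner loop, Theta(1) work each
                C[i][j] = A[i][j] + B[i][j]
        return C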
The results from this section give us the tools we need to analyze iterative
algorithms with two natural parameters. Furthermore, all of these results
can be easily extended to more than two parameters. Recursive algorithms,
however, present a greater challenge. In order to analyze recursive algorithms
using more than one natural parameter, we need to be able to handle
asymptotic recurrences in more than one variable. This topic is beyond the
scope of this book.
Figure 3.6 Venn diagram depicting the relationships between the sets O(f (n)),
Ω(f (n)), Θ(f (n)), o(f (n)), and ω(f (n))
It may seem at this point that the above theorem could be strengthened
to say that o(f (n)) = O(f (n)) \ Θ(f (n)) and ω(f (n)) = Ω(f (n)) \ Θ(f (n)).
Indeed, for functions f and g that we typically encounter in the analysis
of algorithms, it will be the case that if g(n) ∈ O(f (n)) \ Θ(f (n)) then
g(n) ∈ o(f (n)). However, there are exceptions. For example, let f (n) = n,
and let g(n) = 2^{2^⌊lg lg n⌋}. Then g(n) ∈ O(f(n)) because g(n) ≤ f(n) for all
n ∈ N. Furthermore, when n = 2^{2^k} − 1 for k > 0, g(n) = 2^{2^{k−1}} = √(n + 1);
hence, g(n) ∉ Θ(f(n)). Finally, when n = 2^{2^k}, g(n) = n, so g(n) ∉ o(f(n)).
Note that we have the same duality between o and ω as between O and
Ω. We therefore have the following theorem.
Theorem 3.55. Let f : N → R≥0 and g : N → R≥0 . Then g(n) ∈ o(f (n))
iff f (n) ∈ ω(g(n)).
Given the above results, we might expect o and ω to have some properties
similar to those of other forms of asymptotic notation. One example of such
a property is expressed in the following theorem, which is analogous to
Theorems 3.8 and 3.15. Its proof is left as an exercise.
Theorem 3.56. Suppose f1 (n) ∈ o(g1 (n)), f2 (n) ∈ o(g2 (n)), f3 (n) ∈
ω(g3 (n)), and f4 (n) ∈ ω(g4 (n)). Then
Theorem 3.57. Let p, q ∈ R≥0 such that p < q, and suppose f(n) ∈ O(n^p)
and g(n) ∈ Ω(n^q). Then f(n) ∈ o(g(n)).
Proof. Because f(n) ∈ O(n^p), there exist a positive real number c1 and a
natural number n1 such that

f(n) ≤ c1 n^p    (3.11)

whenever n ≥ n1. Likewise, because g(n) ∈ Ω(n^q), there exist a positive real
number c2 and a natural number n2 such that

g(n) ≥ c2 n^q    (3.12)

whenever n ≥ n2.
f(n) ≤ c1 g(n) / (c2 n^{q−p}) < c g(n).
lim_{n→∞} f(n) = u,
if for every positive real number c, there is a natural number n0 such that
|f (n) − u| < c whenever n ≥ n0 . Likewise, for a function g : R≥0 → R, we
say that
lim_{x→∞} g(x) = u,
if for every positive real number c, there is a real number x0 such that
|g(x) − u| < c whenever x ≥ x0 .
whenever the latter limit exists. It is also possible to define infinite limits,
but for our purposes we only need finite limits as defined above. Given this
definition, we can now formally relate limits to asymptotic notation.
Theorem 3.60. Let f : N → R≥0 and g : N → R≥0 . Then
1. g(n) ∈ o(f (n)) iff limn→∞ g(n)/f (n) = 0 and
2. g(n) ∈ Θ(f (n)) if limn→∞ g(n)/f (n) = x > 0.
Note that part 1 is an “if and only if”, whereas part 2 is an “if”.
The reason for this is that there are four possibilities, given arbitrary f and
g:
1. limn→∞ g(n)/f (n) = 0. In this case g(n) ∈ o(f (n)) and f (n) ∈ ω(g(n)).
2. limn→∞ f (n)/g(n) = 0. In this case f (n) ∈ o(g(n)) and g(n) ∈ ω(f (n)).
3. limn→∞ g(n)/f (n) = x > 0. In this case, g(n) ∈ Θ(f (n)) and f (n) ∈
Θ(g(n)). (Note that limn→∞ f (n)/g(n) = 1/x > 0.)
4. Neither limn→∞ g(n)/f (n) nor limn→∞ f (n)/g(n) exists. In this case, we
can only conclude that g(n) ∉ o(f(n)) and f(n) ∉ o(g(n)) — we do not
have enough information to determine whether g(n) ∈ Θ(f(n)).
Because these inequalities hold for every positive real number c, and
because x > 0, we may choose c = x/2, so that both x − c and x + c
are positive. Therefore, g(n) ∈ Θ(f (n)).
A powerful tool for evaluating limits of the form given in Theorem 3.60
is L’Hôpital’s rule, which we present without proof in the following theorem.
= 0.
Hence, lim_{x→∞} lg^p x / x^q = 0. Therefore, lg^p n ∈ o(n^q) and O(lg^p n) ⊆ o(n^q).

2. Because lim_{x→∞} lg^p x / x^q = 0 and 2^x is non-decreasing and unbounded,
it follows that

lim_{x→∞} x^p / 2^{qx} = lim_{x→∞} lg^p(2^x) / (2^x)^q = 0.

Therefore, n^p ∈ o(2^{qn}) and O(n^p) ⊆ o(2^{qn}).
3.12 Summary
Asymptotic notation can be used to express the growth rates of functions
in a way that ignores constant factors and focuses on the behavior as the
function argument increases. We can therefore use asymptotic notation to
analyze performance of algorithms in terms of such measures as worst-case
running time or space usage. O and Ω are used to express upper and lower
bounds, respectively, while Θ is used to express the fact that the upper and
lower bounds are tight. o gives us the ability to abstract away low-order terms
when we don’t want to ignore constant factors. ω provides a dual for o.
Analysis of iterative algorithms typically involves summations. Theo-
rem 3.31 gives us a powerful tool for obtaining asymptotic solutions for
summations. Analysis of recursive algorithms, on the other hand, typically
involves recurrence relations. Theorems 3.34 and 3.35 provide asymptotic
solutions for the most common forms of recurrences.
The analyses of the various algorithms for the maximum subsequence
sum problem illustrate the utility of asymptotic analysis. We saw that the
five algorithms have worst-case running times shown in Figure 3.7. These
results correlate well with the actual running times shown in Figure 1.15.
The results of asymptotic analyses can also be used to predict perfor-
mance degradation. If an algorithm’s running time is in Θ(f (n)), then as n
increases, the running time of an implementation must lie between cf (n) and
df (n) for some positive real numbers c and d. In fact, for most algorithms,
this running time will approach cf (n) for a single positive real number c.
Assuming that this convergence occurs, if we run the algorithm on sufficiently
large input, we can approximate c by dividing the actual running time by
f (n), where n is the size of the input.
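As a rough sketch of this procedure (in Python; the timing harness, function names, and test sizes below are our own illustrative assumptions, not part of the text), we can time one sufficiently large run, divide by f(n) to approximate c, and then use c·f(n) to predict the running time at larger input sizes:

    import time

    def estimate_constant(run, f, n):
        # Approximate c by timing one run on an input of size n.
        start = time.perf_counter()
        run(n)
        return (time.perf_counter() - start) / f(n)

    def predict_seconds(c, f, n):
        # Predicted running time, in seconds, for an input of size n.
        return c * f(n)

    # Hypothetical usage, for an algorithm believed to run in Theta(n**2) time:
    # c = estimate_constant(some_quadratic_algorithm, lambda n: n * n, 10_000)
    # print(predict_seconds(c, lambda n: n * n, 1_000_000))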
or over 1.7 billion years! Even if we could speed up the processor by a factor
of one million, this implementation would still require over 1700 years.
Though this example clearly illustrates the utility of asymptotic analysis,
a word of caution is in order. Asymptotic notation allows us to focus on
growth rates while ignoring constant factors. However, constant factors
can be relevant. For example, two linear-time algorithms will not yield
comparable performance if the hidden constants are very different.
For a more subtle example, consider the functions lg^16 n and √n, shown
in Figure 3.9. From Theorem 3.58, O(lg^16 n) ⊆ o(√n), so that as n increases,
lg^16 n grows much more slowly than does √n. However, consider n = 2^32 =
4,294,967,296. For this value, √n = 2^16 = 65,536, whereas lg^16 n = 32^16 = 2^80,
a vastly larger value.
3.13 Exercises
Exercise 3.1. Prove that if g(n) ∈ O(f (n)), then O(g(n)) ⊆ O(f (n)).
Exercise 3.2. Prove that for any f : N → R≥0 , f (n) ∈ Θ(f (n)).
Exercise 3.3. Prove that if f (n) ∈ O(g(n)) and g(n) ∈ O(h(n)), then
f (n) ∈ O(h(n)).
Exercise 3.4. Suppose f (n) ∈ Θ(g(n)). Prove that for each X ∈ {O, Ω, Θ},
X(f (n)) = X(g(n)).
Exercise 3.5. Prove Theorem 3.15.
Exercise 3.6. Prove Theorem 3.17.
Exercise 3.7. Prove Theorem 3.18. [Hint: You might find Theorem 3.17
useful for showing containment in one direction.]
Exercise 3.8. For each of the following, give functions f (n) ∈ Θ(n) and
g(n) ∈ Θ(n) that satisfy the given property.
Exercise 3.9. Suppose that g1 (n) ∈ Θ(f1 (n)) and g2 (n) ∈ Θ(f2 (n)), where
g2 and f2 are eventually positive. Prove that g1 (n)/g2 (n) ∈ Θ(f1 (n)/f2 (n)).
Exercise 3.10. Show that the result in Exercise 3.9 does not necessarily
hold if we replace Θ by O.
Exercise 3.11. Let f : N → R≥0 and g : N → R≥0 , where g is eventually
positive. Prove that f (n) ∈ O(g(n)) iff there is a positive real number c such
that f (n) ≤ cg(n) whenever g(n) > 0.
* Exercise 3.12. Let f(n) = 2^{2^⌊lg lg n⌋}, where we assume that f(n) = 0 for
n ≤ 1.
a. Show that f (n) ∈ O(n).
b. Show that f (n) is not smooth; i.e., show that for every c ∈ R>0 and
every n0 ∈ N, there is some n ≥ n0 such that f (2n) > cf (n). [Hint:
Consider a sufficiently large value of n having the form 2^{2^k} − 1.]
* Exercise 3.13. The goal of this exercise is to prove Theorem 3.31. Let
f : N → R≥0 be a smooth function, g : N → N be an eventually non-
decreasing and unbounded function, and h : N → R≥0 .
a. Show that if h(n) ∈ O(f (n)), then there exist natural numbers n0 and
n1 , a positive real number c, and a non-negative real number d such that
for every n ≥ n1 ,
Σ_{i=1}^{g(n)} h(i) ≤ d + c Σ_{i=n0}^{g(n)} f(g(n)).
c. Show that if h(n) ∈ Ω(f (n)), then there exist natural numbers n0 and n1
and positive real numbers c and d such that for every n ≥ n0 ,
f (n) ≥ f (2n)/d,
and
g(n) ≥ 2n0
hold.
d. Use part (c) to prove that
Σ_{i=1}^{g(n)} Ω(f(i)) ⊆ Ω(g(n) f(g(n))).
* Exercise 3.14. Prove that for every smooth function f : N → R≥0 and
every eventually non-decreasing and unbounded function g : N → N, and
every X ∈ {O, Ω, Θ},
Σ_{i=1}^{g(n)} X(f(i)) ≠ X(g(n) f(g(n))).
[Hint: First identify a property that every function in the set on the left-hand
side must satisfy, but which functions in the set on the right-hand side need
not satisfy.]
Exercise 3.15. Prove Theorem 3.32.
Exercise 3.16. Analyze the worst-case running time of the following code
fragments, assuming that n represents the problem size. Express your result
as simply as possible using Θ-notation.
a. for i ← 0 to 2n
       for j ← 0 to 3n
           k ← k + i + j

b. for i ← 1 to n²
       for j ← i to i³
           k ← k + 1

* c. i ← n
     while i > 0
         for j ← 1 to i²
             x ← (x + j)/2
         i ← ⌊i/2⌋
Exercise 3.17. Give an asymptotic solution for each of the following recurrences.

b.
f(n) ∈ f(n − 1) + Ω(n lg n),
for n > 0.

c.
f(n) ∈ 4f(n/2) + O(lg² n),
whenever n = 3 · 2^k for a positive integer k.

d.
f(n) ∈ 5f(n/3) + Θ(n²),
whenever n = 3^k for a positive integer k.

e.
f(n) ∈ 3f(n/2) + O(n),
whenever n = 8 · 2^k for a positive integer k.
Exercise 3.18. Analyze the worst-case running time of SelectByMedian,
shown in Figure 2.7, assuming that Median is implemented to run in Θ(n)
time. Express your result as simply as possible using Θ-notation.
Exercise 3.19. Analyze the worst-case running time of the following
functions. Express your result as simply as possible using Θ-notation.
a. SlowSort(A[1..n])
       if n = 2 and A[1] > A[2]
           A[1] ↔ A[2]
       else if n > 2
           SlowSort(A[1..n − 1])
           SlowSort(A[2..n])
           SlowSort(A[1..n − 1])
b. FindMax(A[1..n])
       if n = 0
           error
       else if n = 1
           return A[1]
       else
           return Max(FindMax(A[1..⌊n/2⌋]), FindMax(A[⌊n/2⌋ + 1..n]))
c. FindMin(A[1..n])
       if n = 0
           error
       else if n = 1
           return A[1]
       else
           B ← new Array[1..⌈n/2⌉]
           for i ← 1 to ⌊n/2⌋
               B[i] ← Min(A[2i − 1], A[2i])
           if n mod 2 = 1
               B[⌈n/2⌉] ← A[n]
           return FindMin(B[1..⌈n/2⌉])
Exercise 3.20. Analyze the worst-case space usage of each of the functions
given in Exercise 3.19. Express your result as simply as possible using Θ-
notation.
* Exercise 3.21. Prove that if f : N → R≥0 is smooth and g(n) ∈ Θ(n),
then f (g(n)) ∈ Θ(f (n)).
* Exercise 3.22. Prove that for any smooth function g : N → R≥0 , there
is a natural number k such that g(n) ∈ O(n^k).
* Exercise 3.23. The goal of this exercise is to prove Theorem 3.34. Let
c. Use parts (a) and (b), together with Equation (2.2), to show that if a < b,
then f(n) ∈ X(b^n g(n)).
d. Use parts (a) and (b), together with Theorem 3.31, to show that if a = b,
then f(n) ∈ X(n a^n g(n)).
e. Suppose a > b, and let r = a/b. Show that there is a natural number
n2 ≥ n0 such that for every n ≥ n2, 0 < g(n) ≤ g(n + 1) and

Σ_{i=n2+1}^{n} (b/a)^i g(i) ≤ r/(r − 1).

[Hint: Use the result of Exercise 3.22 and Theorem 3.58 to show that for
sufficiently large i, g(i) ≤ r^i; then apply Equation (2.2).]
f. Use parts (a), (b), and (e) to show that if a > b, then f(n) ∈ X(a^n).
3.14 Notes
Asymptotic notation predates electronic computing by several decades.
Big-O notation was introduced by Bachman [7] in 1894, but with a meaning
slightly different from our definition. In the original definition, O(f (n)) was
used to denote a specific, but unknown, function belonging to the set we have
defined to be O(f (n)). According to the original definition, it was proper to
write,
2n² + 7n − 4 = O(n²),

but it was not considered proper to write

O(n²) = 2n² + 7n − 4.
Thus, the “=” symbol was used to denote not equality, but a relation that
is not even symmetric.
Over the years, many have observed that a set-based definition, as we
have given here, is more sound mathematically. In fact, Brassard [16] claims
that as long ago as 1962, a set-based treatment was taught consistently
in Amsterdam. It was Brassard’s paper [16], however, that in 1985 first
made a strong case for using set-based notation consistently. Though we
are in full agreement with his position, use of the original definition is
still widespread. Alternatively, some authors give set-based definitions, then
abuse the notation by using “=” instead of “∈” or “⊆”. For a justification of
this practice, see Knuth [82] or Cormen, et al. [25]. For more information on
the development of asymptotic notation, including variations not discussed
here, see Brassard [16].
Data Structures
Chapter 4

Basic Techniques for Data Structures
4.1 Stacks
One of the strengths of both top-down design and object-oriented design is
their use of abstraction to express high-level solutions to problems. In fact,
we can apply abstraction to the problems themselves to obtain high-level
solutions to many similar problems. Such high-level solutions are known
as design patterns. For example, consider the “undo” operation in a word
processor. We have some object that is undergoing a series of modifications.
An application of the “undo” operation restores the object to its state prior
to the last modification. Subsequent applications of “undo” restore the object
to successively earlier states in its history.
We have captured the essence of the “undo” operation without specifying
any of the details of the object being modified or the functionality of the
document formatter in which it will appear. In fact, our description is general
enough that it can apply to other applications, such as a spreadsheet or the
search tree viewer on this book’s web site. We have therefore specified a
design pattern for one aspect of functionality of an application.
and perhaps other variables that can be accessed using the representation
variables. It should be true at virtually all times. The only exception is that
we allow it to be temporarily violated while an operation is modifying the
structure, provided that it is true by the time the operation completes. The
structural invariant for our present example will be:
The values of the representation variables, together with all values used
by the interpretation and the structural invariant, comprise the state of the
data structure. Thus, the state of our stack implementation consists of the
value of size, the array elements, and the values stored in elements[1..size].
(We will clarify shortly the distinction between the array and the values
stored in the array.)
We can now complete our implementation by giving algorithms for the
SimpleStack constructor and operations. These algorithms are shown in
Figure 4.4.
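For concreteness, here is a minimal Python sketch in the spirit of this representation (it is not the book's SimpleStack from Figure 4.4, and the capacity, indexing, and error handling are our own assumptions; Python lists are 0-indexed, so the stack contents are elements[0..size−1] rather than elements[1..size]):

    class SimpleStackSketch:
        # Representation: a fixed-size array `elements` and a counter `size`.
        # Interpretation: the stack contains elements[0..size-1], with the top last.
        def __init__(self, capacity=10):
            self.elements = [None] * capacity
            self.size = 0

        def push(self, x):
            if self.size == len(self.elements):
                # Fixed capacity; the expandable-array idea below removes this limit.
                raise RuntimeError("stack is full")
            self.elements[self.size] = x
            self.size += 1

        def pop(self):
            if self.size == 0:
                raise RuntimeError("stack is empty")
            self.size -= 1
            return self.elements[self.size]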
Note that the preconditions and postconditions for the constructor and
operations are stated in terms of the definition of a stack, not in terms of our
chosen representation. For example, the precondition for the Push operation
could have been stated as,
Figure 4.4 The data type SimpleStack, which does not quite implement the Stack
ADT
i.e., with an error condition, then we assume the structure has not been
constructed.)
2. Maintenance: If the structural invariant holds prior to the beginning of
an operation, then it holds following completion of that operation.
3. Security: If the structural invariant holds, then the state can only be
modified by invoking one of this structure’s operations.
4. Termination: Each operation and constructor terminates.
5. Correctness: If the structural invariant and the precondition hold prior
to the beginning of an operation, then the postcondition holds following
the completion of that operation.
These restrictions are severe enough that we will often need to relax
them. In order to relax either of the first two restrictions, we can provide
accessor operations. Because we frequently need to do this, we will adopt
some conventions that allow us to avoid cluttering our algorithms with trivial
code.
calls, some operation x.Op2 is then called (see Figure 4.5). At this point,
the structural invariant is false, and the operation’s correctness cannot be
guaranteed. This scenario is known as a callback. Note that it does not matter
how long the sequence of nested calls is. In particular, the function call made
by x.Op1 may be a direct call to x.Op2, or it may be a recursive call.
Though callbacks are common in software systems, they are more
problematic than beneficial in the design of data structures. Furthermore, as
is the case for mutual recursion, callbacks may be impossible to detect when
we are designing a single data structure, as we may need to call a function
whose implementation we don’t have. For these reasons, we will assume that
if a callback is attempted, a runtime error results. Our correctness proofs
will then rest on the assumption that data structures and algorithms are not
combined in such a way as to result in a callback.
Given this assumption, once initialization, maintenance, and security are
shown, it can be shown by induction that the structural invariant holds
between execution of any operations that construct or alter an instance of
the data structure. (A locking mechanism can be used to cause any callback
to generate a runtime error.) Note that the structural invariant will hold
after the structure is constructed, regardless of whether preconditions to
operations are met. Thus, we can be convinced that a structural invariant
holds even if operations are not invoked properly. Because we can know
that the structural invariant holds, we can then use it in addition to the
precondition in proving the correctness of an individual operation.
push n elements onto such a stack, where n is much larger than the size of
the original array. After the original array is filled, each Push requires Θ(i)
time, where i is the number of elements currently in the stack. It is not hard to
see that the total time for pushing all n elements is in Θ(n2 ), assuming the
size of the original array is a fixed constant.
In order to avoid this bad performance, when we need to allocate a new
array, we should make sure that it is significantly larger than the array we
are replacing. As we will see shortly, we can achieve this goal by doubling
the size of the array. The ExpandableArrayStack implementation shown
in Figure 4.6 implements this idea. In order for this to work, however, the
size of the array must always be non-zero; hence, we will need to include
this restriction in the structural invariant. Note that we have added a
constructor that was not specified in the interface (Figure 4.1). The no-
argument constructor simply invokes this new constructor with a default
value for its argument.
At this point it is tempting to apply the top-down design principle
by defining an ADT for an expandable array. However, when an idea is
simple enough, designing an ADT often becomes more cumbersome than it
is worth. To attain the full functionality of an expandable array, we would
need operations to perform each of the following tasks:
Furthermore, we might wish to redistribute the data in the larger array when
we expand it. It therefore seems best to characterize the expandable array
design pattern as the practice of moving data from one array to a new one
of at least twice the size whenever the current array becomes too full.
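The following Python sketch shows just this design pattern (it is not the book's ExpandableArrayStack from Figure 4.6; the initial capacity and the omission of the book's extra constructor are our own simplifications). When the array is full, Push first copies the contents into a new array of twice the size:

    class ExpandableArrayStackSketch:
        def __init__(self, capacity=10):
            self.elements = [None] * max(1, capacity)   # the array size must be non-zero
            self.size = 0

        def push(self, x):
            if self.size == len(self.elements):
                bigger = [None] * (2 * len(self.elements))  # double the array size
                for i in range(self.size):                  # the Theta(n) copying loop
                    bigger[i] = self.elements[i]
                self.elements = bigger
            self.elements[self.size] = x
            self.size += 1

The copying loop is the source of the occasional Θ(n) Push analyzed below; the other operations run in Θ(1) time.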
Clearly, the worst-case running time of the Push operation shown in
Figure 4.6 is in O(n), where n is the number of elements in the stack.
Furthermore, for any n > 0, if we construct a stack with the constructor call
ExpandableArrayStack(n), then when the (n + 1)-st element is pushed
onto the stack, Ω(n) time is required. Therefore, the worst-case running
time of the Push operation is in Θ(n). All other operations clearly require
Θ(1) time.
The above analysis seems inadequate because in any actual use of
a stack, the Θ(n) behavior will occur for only a few operations. If an
ExpandableArrayStack is used by some algorithm, the slow operations
may be few enough that they do not significantly impact the algorithm’s
overall performance. In such a case, it makes sense to consider the worst-
case performance of an entire sequence of operations, rather than a single
operation. This idea is the basis for amortized analysis.
With amortized analysis, we consider an arbitrary sequence of operations
performed on an initially empty data structure. We then do a form of worst-
case analysis of this sequence of operations. Clearly, longer sequences will
have longer running times. In order to remove the dependence upon the
length of the sequence, we amortize the total running time of the sequence
over the individual operations in the sequence. For the time being, this
amortization will simply be to compute the average running time for an
individual operation in the sequence; later in this chapter we will generalize
this definition. The analysis is still worst-case because the sequence is
arbitrary — we are finding the worst-case amortized time for the operations
on the data structure.
For example, consider any sequence of n operations on an initially empty
stack constructed with ExpandableArrayStack(k). We first analyze the
worst-case running time of such a sequence. We can use the techniques
presented in Chapter 3, but the analysis is easier if we apply a new technique.
We first analyze the running time ignoring all iterations of the for loop in
the Push operation and any loop overhead that results in the execution of
an iteration. Having ignored this code, it is easily seen that each operation
requires Θ(1) time, so that the entire sequence requires Θ(n) time.
We now analyze the running time of all iterations of the for loop
throughout the entire sequence of operations. In order to accomplish this,
we must compute the total number of iterations. The array will be expanded
to size 2k the first time the stack reaches size k + 1. For this first expansion,
the for loop iterates k times. The array will be expanded to size 4k the first
time the stack reaches size 2k + 1. For this expansion, the loop iterates 2k
times. In general, the array will be expanded to size 2^{i+1} k the first time the
stack reaches size 2^i k + 1, and the loop will iterate 2^i k times during this
expansion. Because the sequence contains n operations, the stack can never
exceed size n. Therefore, in order to compute an upper bound on the total
number of iterations, we must sum 2^i k for all i ≥ 0 such that

2^i k + 1 ≤ n,

i.e., 2^i ≤ (n − 1)/k, or i ≤ lg(n − 1) − lg k. Summing this geometric series,
the total number of iterations is therefore less than 2^{lg(n−1)−lg k+1} k = 2(n − 1).
Because each loop iteration requires Θ(1) time, the time required for all
loop iterations is in O(n). Combining this result with the earlier analysis
that ignored the loop iterations, we see that the entire sequence runs in
Θ(n) time.
Now to complete the amortized analysis, we must average the total
running time over the n operations in the sequence. By Exercise 3.9 on page
98, if f (n) ∈ Θ(n), then f (n)/n ∈ Θ(1). Therefore, the worst-case amortized
time for the stack operations is in Θ(1). We conclude that, although an
individual Push operation may be expensive, the expandable array yields
a stack that performs well on any sequence of operations starting from an
initially empty stack.
data item will be reflected in both stacks. However, changes to one of the
stacks will not affect the other. In order to perform a shallow clone of an
ExpandableArrayStack, the array must clearly be copied, so that the
two stacks can be manipulated independently. Copying one array to another
requires Θ(n) time, where n is the number of elements copied.
We might be able to improve on this running time if we can use a data
structure that facilitates non-destructive updates. An update is said to be
non-destructive if it does not change any of the existing structure, but instead
builds a new structure, perhaps using some or all of the existing structure.
If all updates are non-destructive (i.e., the structure is immutable), it is
possible for different structures to share substructures that are common to
both. This sharing can sometimes lead to improved efficiency; for example, to
clone an immutable structure all that we need to copy is the reference to it.
In order to apply this idea to stacks, it is helpful to think of a finite
sequence as nested ordered pairs. In particular, a sequence of length n > 0
is an ordered pair consisting of a sequence of length n − 1 followed by an
element. As a special case, the sequence of length 0 is denoted (). Thus, the
sequence a1 , a2 , a3 can be thought of as the pair ((((), a1 ), a2 ), a3 ). If we
think of this sequence as a Stack S, then we can think of S.Push(a4 ) as
a function returning a new sequence (((((), a1 ), a2 ), a3 ), a4 ). Note that this
new sequence can be constructed simply by pairing S with a4 , leaving S
unchanged.
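The following Python sketch mirrors this view of a stack as nested pairs (it is purely illustrative, not the book's ConsListStack): a stack is either the empty tuple () or a pair (rest, top), Push builds a new pair without modifying its argument, and cloning therefore requires copying only a reference.

    EMPTY = ()

    def push(stack, x):
        # Non-destructive: returns a new pair; `stack` itself is unchanged.
        return (stack, x)

    def pop(stack):
        # Returns (remaining stack, top element); assumes `stack` is nonempty.
        rest, top = stack
        return rest, top

    s = push(push(push(EMPTY, 'a1'), 'a2'), 'a3')   # represents ((((), a1), a2), a3)
    t = push(s, 'a4')     # s is unchanged; s and t share structure
    rest, top = pop(t)    # top == 'a4' and rest is s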
Nested pairs form the basic data structure in the programming language
Lisp and its derivatives. The Lisp function to build an ordered pair is called
cons. Based on this background is the ADT known as a ConsList. It is
useful to think of a nonempty ConsList as a pair (head, tail), where head
is an element and tail is a ConsList. (Note that the two components of the
pair are in the reverse order of that described in the above paragraph.)
More formally, we define a ConsList to be a finite sequence a1, . . . , an,
together with the operations specified in Figure 4.7. (We use Bool to denote
the type whose only values are true and false.) Note that none of these
operations changes the ConsList. We therefore say that a ConsList is an
immutable structure, meaning that though the elements in the sequence may
change their state, the sequence itself will not change.
In what follows, we will show how to implement Stack using a Cons-
List. We will have thus applied top-down design to the task of implementing
Stack, as we will have reduced this problem to the problem of implementing
ConsList. The resulting Stack implementation will support constant-time
Push, Pop, and shallow cloning, which we will support via an additional
Correctness: The only operations simply provide read access, and so are
trivially correct.
Figure 4.12 Example of actual, potential, and amortized costs for gasoline
the sum of amortized costs is the sum of actual costs of gasoline purchases.
Thus, the sum of amortized costs is the sum of actual costs plus the final
potential cost. Because the potential cost can never be negative (the tank
can’t be “overfull”), the sum of the amortized costs will be at least the sum
of the actual costs.
Let us now consider how we might apply this technique to the amortized
analysis of a data structure such as an ExpandableArrayStack. The
potential gasoline cost is essentially a measure of how “bad” the state of
the gas tank is. In a similar way, we could measure how “bad” the state of
an ExpandableArrayStack is by considering how full the array is — the
closer the array is to being filled, the closer we are to an expensive operation.
We can formalize this measure by defining a potential function Φ, which maps
states of a data structure into the nonnegative real numbers, much like the
potential gasoline cost maps “states” of the gas tank into nonnegative real
numbers.
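In symbols, the accounting implicit in this analogy is the standard one for the potential function method: the amortized cost of an operation that takes the structure from state s to state s′ is defined to be its actual cost plus Φ(s′) − Φ(s). Summed over any sequence of operations beginning in an initial state s0 with Φ(s0) = 0, the potential terms telescope, so the total amortized cost equals the total actual cost plus the final potential; because Φ is never negative, the total amortized cost is therefore an upper bound on the total actual cost, exactly as with the gasoline example.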
Using the above criteria, we can divide operations into four categories:
k + (1 − k) = 1.
We can therefore conclude that the amortized running time of the IterBin-
Counter operations is in O(1).
Let us now use this technique to analyze the amortized performance of
ExpandableArrayStack. We first observe that operations which result
in an error run in Θ(1) time and do not change the state of the structure;
hence, we can ignore these operations. As we did in Section 4.3, we will
again amortize the number of loop iterations; i.e., the actual cost of an
operation will be the number of loop iterations performed by that operation.
An operation that does not require expanding the array performs no loop
iterations, and an operation that requires expanding the array performs n
loop iterations, where n is the size of the stack prior to the operation.
We now need an appropriate potential function. We first note that the
Pop operation not only is cheap, but it also improves the state of the stack
by making more array locations available. We therefore don’t need to focus
on this operation when looking for a potential function. Instead, we need
to focus on the Push operation. A Push that does not expand the array is
inexpensive, but degrades the future performance by reducing the number of
available array locations. We therefore want the potential function to increase
by at most a constant in this case. A Push that requires an array expansion
is expensive — requiring n iterations — but improves the performance of
the structure by creating additional array locations. We want the potential
function to decrease by roughly n in this case.
We mentioned earlier that we wanted the potential function to be a
measure of how full the array is. Perhaps the most natural measure is n/k,
where n is the number of elements in the stack and k is the size of the array.
This function is 0 when n is 0 and is always nonnegative. Furthermore,
because n ≤ k, n/k never exceeds 1; hence, no operation can increase this
function by more than 1. However, this also means that no operation causes
it to decrease by more than 1. Therefore, it does not fit the characteristics
we need for a tight amortized analysis.
In order to overcome this problem, let us try multiplying n/k by some
value in order to give it more of a range. Because we need for the function to
decrease by about n when we expand the array, it will need to have grown
by about n after we have done n Pushes; hence, it needs to exhibit at least
linear growth. n/k is bounded by a constant; hence, to cause it to be linear
in n, we would want to multiply it by a function that is linear in n. This
suggests that we might want to try some function of the form an2 /k, where
a is some positive real number to be determined later.
Using this potential function, consider the amortized cost of a Push
operation that expands the array. Prior to the operation, n = k. Therefore,
the change in potential is

a(n + 1)²/(2n) − an²/n = a(1 + 1/(2n)) − an/2,

and the actual cost is n. Choosing a = 2, the amortized cost of an expanding
Push is therefore

n + 2(1 + 1/(2n)) − n = 2 + 1/n ≤ 3.

Now consider a Push that does not expand the array. The change in potential is

2(n + 1)²/k − 2n²/k = 2(2n + 1)/k

< 4
because k must be strictly larger than n, and both are integers. Because no
loop iterations are performed in this case, the actual cost is 0; hence, the
amortized cost is less than 4.
In order to complete the analysis, we must consider the Pop operation.
Because n is initially positive and decreases by 1, and because k remains the
same, the change in potential is

2(n − 1)²/k − 2n²/k = −2(2n − 1)/k < 0.

Because the actual cost of a Pop is 0, its amortized cost is also at most 0. We
conclude that the amortized costs of all of the ExpandableArrayStack
operations are bounded by a constant.
4.6 Summary
We have shown how the top-down design paradigm can be applied to the
design of data structures. In many cases, we can reduce the implementation
of an ADT to the implementation of simpler or lower-level ADTs. In other
cases, we can reduce the implementation to a common design pattern. The
algorithms we have used for implementing the operations of ADTs have been
quite simple. As we examine more advanced data structures in the following
chapters, we will see that the algorithms in the implementations also use the
top-down approach as presented in Chapter 1.
Applying the top-down approach yields clean techniques for proving
that implementations of ADTs meet their specifications. The techniques are
similar to those presented in Chapter 2, but additionally require proving
security of the implementations. Borrowing some ideas from modular and
object-oriented languages, we have supplied a computational model that
facilitates security in a straightforward way. This model also facilitates
the implementation of immutable structures, which in some cases yield
performance benefits by eliminating the need to copy data. However, use
of immutable structures tends to increase the amount of dynamic memory
allocation and requires the presence of an automatic garbage collector.
4.7 Exercises
Exercise 4.1. Complete the proof of Theorem 4.2 by giving proofs of
maintenance and correctness for the two missing cases.
Exercise 4.2. Prove that ConsListStack, shown in Figure 4.8 on
page 128, meets its specification, given in Figure 4.1 on page 110.
* Exercise 4.3. Give an algorithm for Append, specified in Figure 4.15.
Your algorithm should run in O(n) time, where n is the number of elements
in x.
n − f (n) + 1.
This can happen if a large number of elements are pushed onto the stack,
then most are removed. One solution is to modify the Pop operation so that
if the number of elements drops below half the size of the array, then we copy
the elements to a new array of half the size. Give a convincing argument that
this solution would not result in O(1) amortized running time.
Exercise 4.10. An alternative to the solution sketched in the above exercise
is to reduce the size of the array by half whenever it becomes less than 1/4
full, but is still non-empty.
a. Give a modified Pop operation to implement this idea.
* b. Using the technique of Section 4.3, show that the stack operations have
an amortized running time in O(1) when this scheme is used. You may
assume that the array is initially of size 4.
** c. Repeat the above analysis using a potential function. [Hint: Your
potential function will need to increase as the size of the array diverges
from 2n, where n is the number of elements in the stack.]
Exercise 4.11. A queue is similar to a stack, but it provides first in first out
(FIFO) access to the data items. Instead of the operations Push and Pop,
it has operations Enqueue and Dequeue — Enqueue adds an item to the
end of the sequence, and Dequeue removes the item from the beginning of
the sequence.
a. Give an ADT for a queue.
b. Using the linked list design pattern, give an implementation of your ADT
for which all operations run in Θ(1) time.
c. Prove that your implementation meets its specification.
Exercise 4.12. A certain data structure contains operations that each
consists of a sequence of zero or more Pops from a stack, followed by a
single Push. The stack is initially empty, and no Pop is attempted when
the stack is empty.
a. Prove that in any sequence of n operations on an initialized structure,
there are at most 2n stack operations (i.e., Pushes and Pops).
b. Use a simple potential function to show that the amortized number of
stack operations is bounded by a constant.
Note that the least significant bit has the lowest index; hence, it might
be helpful to think of the array with index 0 at the far right, and indices
increasing from right to left.
a. Complete this implementation of BigNum such that
• NumBits runs in Θ(1) time;
• Shift and GetBits run in Θ(n) time, where n is the number of bits
in the result;
• the constructor and the remaining operations run in Θ(n) time, where
n is the number of bits in the largest number involved in the operation.
b. Prove that your implementation meets its specification.
4.8 Notes
The phenomenon that occurs when multiple copies are made of the same
reference is known in the literature as aliasing. The problem is thoroughly
discussed by, e.g., Aho et al. [3] and Muchnick [94].
Use of immutable structures has its roots in functional programming,
though it has carried over to some degree to languages from other paradigms.
Paulson [97] gives a nice introduction to functional programming using ML,
where immutable data types are the norm.
The search tree viewer posted on this textbook’s web site contains
complete Java implementations of ConsList and ConsListStack. Deep
cloning is simulated in this code because only immutable items are placed
on the stacks.
Exercise 4.12 is due to Tarjan [113], who gives an excellent survey of
amortized analysis. He credits D. Sleator for the potential function method
of amortized analysis.
Chapter 5
Priority Queues
the Put operation to the problem of finding the correct location to insert a
given priority p. This location is the index i, 0 ≤ i ≤ size, such that the keys
at locations 0, . . . , i − 1 are all less than p and the keys at locations
i, . . . , size − 1 are all at least p.
Let us now analyze the running time of Find. Clearly, each iteration of
the while loop runs in Θ(1) time, as does the code outside the loop. We
therefore only need to count the number of iterations of the loop.
Let f (n) denote the number of iterations, where n = hi − lo gives the
number of elements in the search range. One iteration reduces the number of
elements in the range to either ⌊n/2⌋ or ⌈n/2⌉ − 1. The former value occurs
whenever the key examined is greater than or equal to p. The worst case
therefore occurs whenever we are looking for a key smaller than any key in
the set. In the worst case, the number of iterations is therefore given by the
following recurrence:
f(n) = f(⌊n/2⌋) + 1
for n > 1. From Theorem 3.35, f (n) ∈ Θ(lg n). Therefore, Find runs in
Θ(lg n) time.
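A Python sketch of such a binary search is given below (it is illustrative only, not the book's Find; it assumes the priorities are stored in a list sorted into non-decreasing order and returns the leftmost position whose key is at least p, which is len(keys) if no such key exists):

    def find(keys, p):
        lo, hi = 0, len(keys)
        while lo < hi:
            mid = (lo + hi) // 2
            if keys[mid] >= p:
                hi = mid          # the sought position is mid or to its left
            else:
                lo = mid + 1      # the sought position is to the right of mid
        return lo

Each iteration discards roughly half of the remaining range, which is exactly the behavior captured by the recurrence above.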
Let us now analyze the running time of Put. Let n be the value of
size. The first statement requires Θ(lg n) time, and based on our analysis
in Section 4.3, the Expand function should take O(n) time in the worst
case. Because we can amortize the time for Expand, let us ignore it for now.
Clearly, everything else outside the for loop and a single iteration of the
loop run in Θ(1) time. Furthermore, in the worst case (which occurs when
the new key has a value less than all other keys in the set), the loop iterates
n times. Thus, the entire algorithm runs in Θ(n) time in the worst case,
regardless of whether we count the time for Expand.
5.2 Heaps
The SortedArrayPriorityQueue has very efficient MaxPriority and
RemoveMax operations, but a rather slow Put operation. We could speed
up the Put operation considerably by dropping our requirement that the
array be sorted. In this case, we could simply add an element at the
end of the array, expanding it if necessary. This operation is essentially
the same as the ExpandableArrayStack.Push operation, which has an
amortized running time in Θ(1). However, we would no longer be able to
take advantage of the ordering of the array in finding the maximum priority.
As a result, we would need to search the entire array. The running times
for the MaxPriority and RemoveMax operations would therefore be in
Θ(n) time, where n is the number of elements in the priority queue.
In order to facilitate efficient implementations of all three operations,
let us try applying the top-down approach to designing an appropriate data
structure. Suppose we have a non-empty set of elements. Because we need
to be able to find and remove the maximum priority quickly, we should
keep track of it. When we remove it, we need to be able to locate the new
maximum quickly. We can therefore organize the remaining elements into two
(possibly empty) priority queues. (As we will see, using two priority queues
for these remaining elements can yield significant performance advantages
Figure 5.3 A heap — each priority is no smaller than any of its children
over a single priority queue.) Assuming for the moment that both of these
priority queues are nonempty, the new overall maximum must be the larger of
the maximum priorities from each of these priority queues. We can therefore
find the new maximum by comparing these two priorities. The cases in which
one or both of the two priority queues are empty are likewise straightforward.
We can implement the above idea by arranging the priorities into a heap,
as shown in Figure 5.3. This structure will be the basis of all of the remaining
PriorityQueue implementations presented in this chapter. In this figure,
integer priorities of several data items are shown inside circles, which we
will call nodes. The structure is referenced by its root node, containing the
priority 89. This value is the maximum of the priorities in the structure.
The remaining priorities are accessed via one of two references, one leading
to the left, and the other leading to the right. Each of these two groups
of priorities forms a priority queue structured in a similar way. Thus, as
we follow any path downward in the heap, the values of the priorities are
non-increasing.
A heap is a special case of a more general structure known as a tree. Let
N be a finite set of nodes, each containing a data item. We define a rooted
tree comprised of N recursively as:
• a special object which we will call the empty tree if N = ∅; or
• a root node x ∈ N , together with a finite sequence T1 , . . . , Tk of children,
where
enforcing as a structural invariant the fact that no two children have nodes
in common. In order for an operation to maintain this invariant when adding
a new node, it would apparently need to examine the entire structure to see if
the new node is already in the tree. As we will see, maintaining this invariant
becomes much easier for specific applications of trees. It therefore seems best
to think of a rooted tree as a mathematical object, and to mimic its structure
in defining a heap implementation of PriorityQueue.
In order to build a heap, we need to be able to implement a single
node. For this purpose, we will define a data type BinaryTreeNode. Its
representation will contain three variables:
• the item stored at the root has the maximum key in the tree and
• both children are heaps.
Our structural invariant will be that elements is a heap whose size is given by
size. We interpret the contents of the nodes comprising this heap as the set of
items stored in the priority queue, together with their associated priorities.
Implementation of MaxPriority is now trivial — we just return the
key of the root. To implement RemoveMax, we must remove the root
(provided the heap is non-empty) and return the data from its contents.
When we remove the root, we are left with the two children, which must
then be combined into one heap. We therefore will define an internal function
Merge, which takes as input two heaps h1 and h2 with no nodes in common
(i.e., the two heaps share no common structure, though they may have keys
in common), and returns a single heap containing all of the nodes from h1
and h2 . Note that we can also use the Merge function to implement Put
if we first construct a single-node heap from the element we wish to insert.
Let us consider how to implement Merge. If either of the two heaps h1
and h2 is nil (i.e., empty), we can simply return the other heap. Otherwise,
the root of the result must be the root of either h1 or h2 , whichever root
contains a Keyed item with larger key (a tie can be broken arbitrarily). Let
L denote the heap whose root contains the maximum key, and let S denote
the other heap. Then we must form a heap whose root is the root of L and
whose two children are heaps containing the nodes in the following three
heaps:
We can form these two children by recursively merging two of these three
heaps.
A simple implementation, which we call SimpleHeap, is shown in
Figure 5.5. Note that we can maintain the structural invariant because we
can ensure that the precondition to Merge is always met (the details are left
as an exercise). Note also that the above discussion leaves some flexibility in
the implementation of Merge. In fact, we will see shortly that this particular
implementation performs rather poorly. As a result, we will need to find a
better way of choosing the two heaps to merge in the recursive call, and/or
a better way to decide which child the resulting heap will be.
Let us now analyze the running time of Merge. Suppose h1 and h2
together have n nodes. Clearly, the running time excluding the recursive
call is in Θ(1). In the recursive call, L.RightChild() has at least one fewer
node than does L; hence the total number of nodes in the two heaps in the
recursive call is at most n − 1. The running time of Merge is therefore given by

f(n) ∈ f(n − 1) + O(1)
⊆ O(n)
by Theorem 3.34.
At first it might seem that the bound of n − 1 on the number of nodes in
the two heaps in the recursive call is overly pessimistic. However, upon close
examination of the algorithm, we see that not only does this describe the
worst case, it actually describes every case. To see this, notice that nowhere
in the algorithm is the left child of a node changed after that node is created.
Because each left child is initially empty, no node ever has a nonempty left
child. Thus, each heap is single path of nodes going to the right.
The SimpleHeap implementation therefore amounts to a linked list in
which the keys are kept in non-increasing order. The Put operation will
therefore require Θ(n) time in the worst case, which occurs when we add a
node whose key is smaller than any in the heap. In the remainder of this
chapter, we will examine various ways of taking advantage of the branching
potential of a heap in order to improve the performance.
Theorem 5.1. For any binary tree T with n nodes, the null path length of
T is at most lg(n + 1).
The proof of this theorem is typical of many proofs of properties of trees.
It proceeds by induction on n using the following general strategy:
• For the base case, prove that the property holds when n = 0 — i.e., for
an empty tree.
• For the induction step, apply the induction hypothesis to one or more of
the children of a nonempty tree.
Proof of Theorem 5.1. By induction on n.
Induction Hypothesis: Assume for some n > 0 that for 0 ≤ i < n, the
null path length of any tree with i nodes is at most lg(i + 1).
Induction Step: Let T be a binary tree with n nodes. Then because the two
children together contain n − 1 nodes, they cannot both contain more than
(n − 1)/2 nodes; hence, one of the two children has no more than (n − 1)/2
nodes. By the induction hypothesis, this child has a null path length of at most
lg((n − 1)/2 + 1). The null path length of T is therefore at most

lg((n − 1)/2 + 1) + 1 = lg((n + 1)/2) + 1 = lg(n + 1).
By the above theorem, if we can always choose the child with smaller null
path length for the recursive call, then the merge will operate in O(lg n)
time, where n is the number of nodes in the larger of the two heaps. (The
term “leftist” refers to the tendency of these structures to be heavier on the
left.) We can develop slightly simpler algorithms if we build our
heaps so that the right-hand child always has the smaller null path length, as
in Figure 5.6(a). We therefore define a leftist tree to be a binary tree which, if
nonempty, has two leftist trees as children, with the right-hand child having
a null path length no larger than that of the left-hand child. A leftist heap
is then a leftist tree that is also a heap.
In order to implement a leftist heap, we will use an implementation of
a leftist tree. The leftist tree implementation will take care of maintaining
the proper shape of the tree. Because we will want to combine leftist trees
to form larger leftist trees, we must be able to handle the case in which
two given leftist trees have nodes in common. The simplest way to handle
this situation is to define the implementation to be an immutable structure.
Because no changes can be made to the structure, we can treat all nodes
as distinct, even if they are represented by the same storage (in which case
they are the roots of identical trees).
In order to facilitate fast computation of null path lengths, we will record
the null path length of a leftist tree in one of its representation variables.
Thus, when forming a new leftist tree from a root and two existing leftist
trees, we can simply compare the null path lengths to decide which tree
should be used as the right child. Furthermore, we can compute the null
path length of the new leftist tree by adding 1 to the null path length of its
right child.
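The following Python sketch illustrates this merging strategy (it is not the book's LeftistTree and LeftistHeap implementations from Figures 5.7 and 5.8; nodes here store bare keys, the class and function names are ours, and the null path length of an empty tree is taken to be 0, consistent with Theorem 5.1). The root with the larger key becomes the root of the result, its right subtree is merged recursively with the other heap, and the child with the smaller null path length is placed on the right:

    def npl(t):
        # Null path length; an empty tree has null path length 0.
        return 0 if t is None else t.npl

    class LeftistNode:
        # A node of a leftist tree storing `key`; `npl` is its null path length.
        def __init__(self, key, left=None, right=None):
            self.key = key
            self.left = left
            self.right = right
            self.npl = 1 + min(npl(left), npl(right))

    def merge(h1, h2):
        # Merge two leftist (max-)heaps that share no nodes.
        if h1 is None:
            return h2
        if h2 is None:
            return h1
        if h1.key < h2.key:
            h1, h2 = h2, h1                 # h1 now holds the larger root key
        merged = merge(h1.right, h2)        # recurse along the right path
        left, right = h1.left, merged
        if npl(right) > npl(left):          # keep the smaller null path length on the right
            left, right = right, left
        return LeftistNode(h1.key, left, right)   # build a new node, as in an immutable structure

Because the recursion follows only right-hand paths, whose lengths are bounded by the null path lengths, the merge runs in O(lg n) time, as discussed below.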
For our representation of LeftistTree, we will therefore use four
variables:
We will allow read access to all variables. Our structural invariant will be
that this structure is a leftist tree such that
Specifically, we will allow the same node to occur more than once in the
structure — each occurrence will be viewed as a copy. Because the structure
is immutable, such sharing is safe. The implementation of LeftistTree is
shown in Figure 5.7. Clearly, each of these constructors runs in Θ(1) time.
We now represent our LeftistHeap implementation of Priori-
tyQueue using two variables:
Our structural invariant is that elements is a leftist heap whose size is given
by size, and whose nodes are Keyed items. We interpret these Keyed
items as the represented set of elements with their associated priorities. The
implementation of LeftistHeap is shown in Figure 5.8.
Based on the discussion above, Merge runs in O(lg n) time, where n is
the number of nodes in the larger of the two leftist heaps. It follows that
Put and RemoveMax operate in O(lg n) time, where n is the number of
items in the priority queue. Though it requires some work, it can be shown
that the lower bound for each of these running times is in Ω(lg n).
It is easy to see that the stack space usage of Merge is proportional to
the depth of recursion, which in turn is proportional to the running time.
Therefore, Merge, and hence Put and RemoveMax, uses Θ(lg n) stack
space in the worst case. The remaining space usage is in Θ(1).
Example 5.2. Consider the leftist heap shown in Figure 5.6(a). Suppose
we were to perform a RemoveMax on this heap. To obtain the resulting
heap, we must merge the two children of the root. The larger of the two keys
is 15; hence, it becomes the new root. We must then merge its right child
with the original right child of 20 (see Figure 5.6(b)). The larger of the two
roots is 13, so it becomes the root of this subtree. The subtree rooted at 7 is
then merged with the empty right child of 13. Figure 5.6(c) shows the result
without considering the null path lengths. We must therefore make sure that
in each subtree that we’ve formed, the null path length of the right child is
no greater than the null path length of the left child. This is the case for the
subtree rooted at 13, but not for the subtree rooted at 15. We therefore must
swap the children of 15, yielding the final result shown in Figure 5.6(d).
We can avoid maintaining null path lengths entirely by modifying Merge to swap the children after the recursive call. We call
this modified structure a skew heap. The Merge function for SkewHeap is
shown in Figure 5.9; the remainder of the implementation of SkewHeap is
the same as for SimpleHeap.
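The SkewHeap Merge of Figure 5.9 is not reproduced here; as a sketch of the one-line change described above (swap the children unconditionally after the recursive call instead of comparing null path lengths), here is a hedged Python version using a plain binary node. The names are mine, not the book's.

class Node:
    """A simple mutable binary heap node."""
    def __init__(self, priority, left=None, right=None):
        self.priority, self.left, self.right = priority, left, right

def skew_merge(a, b):
    if a is None:
        return b
    if b is None:
        return a
    if a.priority < b.priority:      # the larger root becomes the root of the result
        a, b = b, a
    # Merge into the right child, then swap the two children.
    a.right = skew_merge(a.right, b)
    a.left, a.right = a.right, a.left
    return a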
Example 5.3. Consider again the heap shown in Figure 5.6(a), and suppose
it is a skew heap. Performing a RemoveMax on this heap proceeds as shown
in Figure 5.6 through part (c). At this point, however, for each node at which
a recursive Merge was performed, the children of this node are swapped.
These nodes are 13 and 15. The resulting heap is shown in Figure 5.10.
In order to understand why such a simple modification might be advan-
tageous, observe that in Merge, when S is merged with L.RightChild(),
we might expect the resulting heap to have a tendency to be larger than
L.LeftChild(). As we noted at the end of the previous section, good worst-
case behavior can be obtained by ensuring that the left child of each node
has at least as many nodes as the right child. Intuitively, we might be able
to approximate this behavior by swapping the children after every recursive
call. However, this swapping does not always avoid expensive operations.
Suppose, for example, that we start with an empty skew heap, then insert
the sequence of keys 2, 1, 4, 3, . . . , 2i, 2i − 1, 0, for some i ≥ 1. Figure 5.11
Figure 5.10 The result of performing a RemoveMax on the skew heap shown in
Figure 5.6(a)
shows this sequence of insertions for i = 3. Note that each time an even
key is inserted, because it is the largest in the heap, it becomes the new
root and the original heap becomes its left child. Then when the next key
is inserted, because it is smaller than the root, it is merged with the empty
right child, then swapped with the other child. Thus, after each odd key is
inserted, the heap will contain all the even keys in the rightmost path (i.e.,
the path beginning at the root and going to the right until it reaches an
empty subtree), and for i ≥ 1, key 2i will have key 2i − 1 as its left child.
Finally, when key 0 is inserted, because it is the smallest key in the heap,
it will successively be merged with each right child until it is merged with
the empty subtree at the far right. Each of the subtrees on this path to the
right is then swapped with its sibling. Clearly, this last insertion requires
Θ(i) running time, and i is proportional to the number of nodes in the heap.
The bad behavior described above results because a long rightmost
path is constructed. Note, however, that 2i Put operations were needed
to construct this path. Each of these operations required only Θ(1) time.
Furthermore, after the Θ(i) operation, no long rightmost paths exist from
any node in the heap (see Figure 5.11). This suggests that a skew heap might
have good amortized running time.
A good measure of the actual cost of the SkewHeap operations is the
number of calls to Merge, including recursive calls. In order to derive a
bound on the amortized cost, let us try to find a good potential function.
Based upon the above discussion, let us say that a node is good if its left
child has at least as many nodes as its right child; otherwise, it is bad. We
now make two key observations, whose proofs are left as exercises:
• In any binary tree with n nodes, the number of good nodes in the rightmost
path is no more than lg(n + 1).
• In the Merge function, if L is a bad node initially, it will be a good node
in the resulting heap.
$$\sum_{e \in S} P(e) = 1.$$
Thus, by multiplying the value of the random variable for each elementary
event by the probability of that elementary event, we obtain an average value
for that variable. Note that it is possible for an expected value to be infinite.
If the summation converges, however, it converges to a unique value, because
all terms are nonnegative.
Example 5.4. Let T be a binary tree with n nodes, such that all paths from
the root to empty subtrees have the same length. Because the probability of
each path is determined solely by its length, all paths must have the same
probability. Because there are n + 1 paths and the sum of their probabilities
is 1, each path must have probability 1/(n+1). In this case, E[len_T] is simply
$$E[\text{len}_T] = \frac{1}{n+1} \sum_{e \in \text{Path}_T} \text{len}_T(e).$$
Furthermore, because the lengths of all of the paths are the same, E[lenT ]
must be this length, which we will denote by k.
We have defined the probability of a path of length k to be 2^{−k}.
Furthermore, we have seen that all probabilities are 1/(n + 1). We therefore
have
$$2^{-k} = 1/(n + 1).$$
Solving for k, we have
$$k = \lg(n + 1).$$
Thus, E[len_T] = lg(n + 1).
The discrete random variable lenT is always a natural number. When
this is the case, its expected value is often easier to analyze. To show why,
we first need to define an event, which is any subset of the elementary events
in a discrete probability space. The probability of an event A is the sum of
the probabilities of its elementary events; i.e.,
$$P(A) = \sum_{e \in A} P(e).$$
Note that because the sum of the probabilities of all elementary events in
a discrete probability space is 1, the probability of an event is never more
than 1.
The following theorem gives a technique for computing expected values
of discrete random variables that range over the natural numbers. It uses
predicates like “f = i” to describe events; e.g., the predicate “f = i” defines
the event in which f has the value i, and P (f = i) is the probability of this
event.
Theorem 5.5. Let f : S → N be a discrete random variable. Then
$$E[f] = \sum_{i=1}^{\infty} P(f \geq i).$$
In the above sum, the negative portion iP (f ≥ i + 1) of the ith term cancels
most of the positive portion (i+1)P (f ≥ i+1) of the (i+1)st term. The result
of this cancellation is the desired sum. However, in order for this reasoning
to be valid, it must be the case that the “leftover” term, −iP (f ≥ i + 1),
converges to 0 as i approaches infinity if E[f ] is finite. We leave the details
as an exercise.
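As a sketch of the cancellation described above (standard algebra only, not necessarily the book's exact wording), consider the partial sums of the defining series E[f] = Σ_{i≥1} i P(f = i):
$$\begin{aligned}
\sum_{i=1}^{n} i\,P(f = i)
  &= \sum_{i=1}^{n} i\bigl(P(f \geq i) - P(f \geq i+1)\bigr)\\
  &= \sum_{i=1}^{n} i\,P(f \geq i) - \sum_{i=2}^{n+1} (i-1)\,P(f \geq i)\\
  &= \sum_{i=1}^{n} P(f \geq i) - n\,P(f \geq n+1).
\end{aligned}$$
Letting n approach infinity, the leftover term n P(f ≥ n + 1) is exactly the term that must converge to 0 when E[f] is finite, which yields the stated identity.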
Example 5.6. Let T be a binary tree in which each of the n nodes has
an empty left child; i.e., the nodes form a single path going to the right.
Again, the size of PathT is n + 1, but now the probabilities are not all the
same. The length of the path to the rightmost empty subtree is n; hence, its
probability is 2−n . For 1 ≤ i ≤ n, there is exactly one path that goes right
i − 1 times and left once. The probabilities for these paths are given by 2−i .
We therefore have
$$\begin{aligned}
E[\text{len}_T] &= \sum_{e \in \text{Path}_T} \text{len}_T(e)\,P(e)\\
&= n\,2^{-n} + \sum_{i=1}^{n} i\,2^{-i}\\
&= \sum_{i=0}^{n-1} 2^{-i}\\
&= \frac{(1/2)^n - 1}{(1/2) - 1} \qquad \text{(by (2.2))}\\
&= 2 - 2^{1-n}.
\end{aligned}$$
Thus, E[len_T] < 2.
In order to be able to analyze the expected running time of Random-
izedHeap.Merge, we need to know E[lenT ] for a worst-case binary tree T
with n nodes. Examples 5.4 and 5.6 give two extreme cases — a completely
balanced tree and a completely unbalanced tree. We might guess that the
worst case would be one of these extremes. Because lg(n + 1) ≥ 2 − 2^{1−n}
for all n ∈ N, a good guess would be that lg(n + 1) is an upper bound for
the worst case. We can show that this is indeed the case, but we need to
use the following theorem relating the sum of logarithms to the logarithm
of a sum.
In order to isolate lg xy, let us now subtract xy from the fraction in the
above equation. This yields
$$\begin{aligned}
2\lg(x + y) - 2 &= \lg\frac{x^2 + 2xy + y^2}{4}\\
&= \lg\left(xy + \frac{x^2 - 2xy + y^2}{4}\right)\\
&= \lg\left(xy + \frac{(x - y)^2}{4}\right)\\
&\geq \lg xy,
\end{aligned}$$
because (x − y)²/4 is always nonnegative and the lg function is non-decreasing.
We can now show that lg(n + 1) is an upper bound for E[lenT ] when T
is a binary tree with n nodes.
Theorem 5.8. Let T be any binary tree with size n, where n ∈ N. Then
E[lenT ] ≤ lg(n + 1).
Proof. By induction on n.
Base: n = 0. Then only one path to an empty tree exists, and its length
is 0. Hence, E[lenT ] = 0 = lg 1.
because the probability of any path from the root of a child of T to any
empty subtree is twice the probability of the path from the root of T to the
same empty subtree, and its length is one less.
Because the two sums in (5.1) are similar, we will simplify just the first
one. Thus,
$$\begin{aligned}
\sum_{e \in \text{Path}_L} (\text{len}_L(e) + 1)\,\frac{P(e)}{2}
&= \frac{1}{2}\left(\sum_{e \in \text{Path}_L} \text{len}_L(e)\,P(e) + \sum_{e \in \text{Path}_L} P(e)\right)\\
&= \frac{1}{2}\left(\sum_{e \in \text{Path}_L} \text{len}_L(e)\,P(e) + 1\right),
\end{aligned}$$
and
$$E\left[\sum_{i=0}^{\infty} h_i\right] = \sum_{i=0}^{\infty} E[h_i].$$
where |S| and |T | denote the sizes of S and T , respectively. Thus, the
expected running time of Merge is in O(lg n), where n is the total number
of nodes in the two heaps. It follows that the expected running times of Put
and RemoveMax are also in O(lg n).
A close examination of Example 5.4 reveals that the bound of lg(n + 1)
on E[lenT ] is reached when n + 1 is a power of 2. Using the fact that lg is
smooth, we can then show that the expected running time of Merge is in
Ω(lg n); the details are left as an exercise. Thus, the expected running times
of Put and RemoveMax are in Θ(lg n).
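The RandomizedHeap Merge of Figure 5.12 is not reproduced here; as a hedged sketch of the idea the analysis assumes (each recursive call follows a random path toward an empty subtree), here is a Python version in which a fair coin flip decides which child receives the merged heap. The names are mine, not the book's.

import random

class Node:
    def __init__(self, priority, left=None, right=None):
        self.priority, self.left, self.right = priority, left, right

def randomized_merge(a, b):
    if a is None:
        return b
    if b is None:
        return a
    if a.priority < b.priority:          # the larger priority becomes the root
        a, b = b, a
    # Flip a coin to choose which child of a receives the merged heap, so that
    # the recursion follows a random path to an empty subtree.
    if random.random() < 0.5:
        a.left = randomized_merge(a.left, b)
    else:
        a.right = randomized_merge(a.right, b)
    return a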
It is also clear that the stack space usage of Merge is proportional to the
depth of recursion, which is proportional to the running time. As a result, the
expected stack space usage of Merge, and hence of Put and RemoveMax,
is in Θ(lg n). While this result is positive, it might be worthwhile to consider
the worst-case stack space usage, as high stack space usage will cause a
program to terminate abnormally. In the worst case, all nodes in a single
tree can be in the same path, and Merge can follow this path to the end.
Hence, in the worst case, Merge, Put, and RemoveMax can use Θ(n)
stack space. On the other hand, the recursion can be removed in a similar
this key is at least as large as any key in the priority queue, but no larger
than any key in the sorted part, we can extend the sorted part to include
this location (see Figure 5.13(c)).
We therefore need to be able to represent a heap using an array. One way
to accomplish this is to number the nodes left-to-right by levels, as shown
in Figure 5.14. The numbers we have assigned to the nodes can be used
as array indices. In order to avoid ambiguity, there should be no “missing”
nodes; i.e., each level except possibly the last should be completely full, and
all of the nodes in the last level should be as far to the left as possible. This
scheme for storing a heap is known as a binary heap.
Note that a binary heap is very nearly balanced. We saw in Example 5.4
that in a completely balanced binary tree with n nodes, the length of any
path to an empty subtree is lg(n+1). This result holds only for tree sizes that
can be completely balanced. However, it is not hard to show that for any n,
if a binary tree with n nodes is balanced as nearly as possible, the length
of the longest path to an empty subtree is ⌈lg(n + 1)⌉ (or equivalently, the
height is ⌈lg(n + 1)⌉ − 1). We will show that this fact allows us to implement
both Put and RemoveMax for a binary heap in Θ(lg n) time.
Note that each level of a binary heap, except the first and possibly the
last, contains exactly twice as many nodes as the level above it. Thus, if we
were to number the levels starting with 0 for the top level, then each level i
(except possibly the last) contains exactly 2^i nodes. It follows from (2.2)
that levels 0 through i − 1, where i is strictly less than the total number of
levels, have a total of 2^i − 1 nodes. Let x be the jth node on level i. x would
then have index 2^i − 1 + j. Suppose x has a left child, y. In order to compute
its index, we observe that level i has j − 1 nodes to the left of x. Each of
these nodes has two children on level i + 1 to the left of node y. Therefore,
the index of y is
$$(2^{i+1} - 1) + 2(j - 1) + 1 = 2(2^i - 1 + j),$$
or exactly twice the index of its parent. Likewise, if x has a right child, its
index is 1 greater than that of y.
As a result of these relationships, we can use simple calculations to find
either child or the parent of a node at a given location. Specifically, the left
and right children of the element at location i are the elements at locations
2i and 2i + 1, respectively, provided they exist. Furthermore, the parent of
the element at location i > 1 is at location ⌊i/2⌋.
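These index relationships are easy to state in code. The following Python helpers assume the 1-based numbering used above (index 0 will be reserved for the sentinel discussed below); the function names are mine.

def left(i):      # left child of the element at location i, if it exists
    return 2 * i

def right(i):     # right child of the element at location i, if it exists
    return 2 * i + 1

def parent(i):    # parent of the element at location i > 1
    return i // 2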
Let us consider how we can implement a binary heap as a data structure.
We will use two representation variables:
• size ≤ SizeOf(elements);
• elements[0] = sentinel; and
• for 1 ≤ i ≤ size, elements[i].Key() ≤ elements[⌊i/2⌋].Key().
then insert the other into one of the children. We select which child based
on where we need the new leaf.
In this insertion algorithm, unless the tree is empty, there will always
be a recursive call. This recursive call will always be on the child in the
path that leads to the location at which we want to add the new node. Note
that the keys along this path from the root to the leaf are in nonincreasing
order. As long as the key to be inserted is smaller than the key to which it
is compared, it will be the inserted element in the recursive call. When it is
compared with a smaller key, that smaller key is used in the recursive call.
When this happens, the key passed to the recursive call will always be at
least as large as the root of the subtree in which it is being inserted; thus,
it will become the new root, and the old root will be used in the recursive
call. Thus, the entire process results in inserting the new key at the proper
point in the path from the root to the desired insertion location.
For example, suppose we wish to insert the priority 35 into the binary
heap shown in Figure 5.15(a). We first find the path to the next insertion
point. This path is ⟨89, 32, 17⟩. The proper position of 35 in this path is
between 89 and 32. We insert 35 at this point, pushing the following priorities
downward. The result is shown in Figure 5.15(b).
Because we can easily find the parent of a node in a BinaryHeap, we
can implement this algorithm bottom-up by starting at the location of the
new leaf and shifting elements downward one level until we reach a location
where the new element will fit. This is where having a sentinel element is
convenient — we know we will eventually find some element whose key is at
least as large as that of x. The resulting algorithm is shown in Figure 5.16.
We assume that Expand(A) returns an array of twice the size of A, with
the elements of A copied to the first half of the returned array.
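Figure 5.16 gives the book's Put; the following Python sketch illustrates the same bottom-up insertion, shifting elements down one level at a time until the sentinel or a larger key stops the walk. It stores bare priorities rather than Keyed items, and the names and growth policy are mine, not the book's.

import math

class BinaryHeap:
    def __init__(self, capacity=10, sentinel=math.inf):
        # Location 0 holds a sentinel at least as large as any priority.
        self.elements = [sentinel] + [None] * capacity
        self.size = 0

    def put(self, priority):
        self.size += 1
        if self.size >= len(self.elements):           # grow, in the spirit of Expand(A)
            self.elements.extend([None] * len(self.elements))
        i = self.size
        # Walk up the path toward the root, shifting smaller keys down one level.
        while self.elements[i // 2] < priority:       # the sentinel stops the loop
            self.elements[i] = self.elements[i // 2]
            i //= 2
        self.elements[i] = priority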
The RemoveMax operation is a bit more difficult. We need to remove
the root because it contains the element with maximum priority, but in order
to preserve the proper shape of the heap, we need to remove a specific leaf.
We therefore first save the value of the root, then remove the proper leaf. We
need to form a new heap by replacing the root with the removed leaf. In order
to accomplish this, we use the MakeHeap algorithm shown in Figure 5.17.
For ease of presentation, we assume t is formed with BinaryTreeNodes,
rather than with an array. If the key of x is at least as large as the keys of
the roots of all children of t, we can simply replace the root of t with x, and
we are finished. Otherwise, we need to move the root of the child with larger
key to the root of t and make a heap from this child and x. This is just a
smaller instance of the original problem.
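Figure 5.17 gives MakeHeap on BinaryTreeNodes; as a rough Python illustration of the same reduction, specialized to the array representation as the next paragraph suggests, here is a sift-down on a 1-based array of priorities. The names and details are mine, not the book's.

def make_heap(elements, i, n, priority):
    """Place `priority` into the vacant slot at index i of the binary heap stored
    in elements[1..n], moving larger children up until the priority fits."""
    while 2 * i <= n:                        # at least one (left) child exists
        child = 2 * i
        if child + 1 <= n and elements[child + 1] > elements[child]:
            child += 1                       # pick the child with the larger key
        if priority >= elements[child]:
            break                            # the priority fits here
        elements[i] = elements[child]        # move the larger child's root up
        i = child
    elements[i] = priority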
We can simplify MakeHeap somewhat when we use it with a binary
heap. First, we observe that once we have determined that at least one child
is nonempty, we can conclude that the left child must be nonempty. We also
observe that the reduction is a transformation to a smaller instance; i.e.,
A[1..n − 1] into a heap, then to insert A[n]. We can easily implement this
bottom-up. The resulting algorithm does n − 1 insertions into heaps of sizes
ranging from 1 to n − 1. The total running time is therefore in
$$\sum_{i=1}^{n-1} \Theta(\lg i) \subseteq \Theta((n - 1)\lg(n - 1)) = \Theta(n \lg n),$$
where the containment follows from Theorem 3.31.
f(N) ∈ 2f(N/2) + Θ(lg N).
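For contrast with the Θ(n lg n) bound derived above, the chapter notes credit Floyd with a linear-time heap construction. One common way to realize it, sketched here in Python under my own naming and not as the book's Figure 5.20, is to apply the sift-down step (make_heap from the previous sketch) at each internal position from the last one back to the root; a sort in the spirit of HeapSort then follows by repeatedly moving the maximum to the end of the unsorted part.

def build_heap(elements, n):
    """Turn elements[1..n] into a binary heap in place; total work is O(n)."""
    for i in range(n // 2, 0, -1):
        make_heap(elements, i, n, elements[i])

def heap_sort(elements, n):
    """Sort elements[1..n] into nondecreasing order using the heap in place."""
    build_heap(elements, n)
    for last in range(n, 1, -1):
        elements[1], elements[last] = elements[last], elements[1]
        make_heap(elements, 1, last - 1, elements[1])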
5.7 Summary
A heap provides a clean framework for implementing a priority queue.
Although LeftistHeaps yield Θ(lg n) worst-case performance for the
operations Put and RemoveMax, the simpler SkewHeaps and Random-
izedHeaps yield O(lg n) amortized and Θ(lg n) expected costs, respectively,
for these operations. On the other hand, stack space usage for these two
implementations may be problematic unless the algorithms are restructured
to use iteration rather than recursion. BinaryHeaps, while providing no
asymptotic improvements over LeftistHeaps, nevertheless tend to be more
efficient in practice because they require less dynamic memory allocation.
They also provide the basis for HeapSort, a Θ(n lg n) in-place sorting
algorithm that uses Θ(lg n) stack space in the worst case. A summary
of the running times of the PriorityQueue operations for the various
implementations is shown in Figure 5.21.
For the implementations that use a Merge function, it is possible
to provide Merge as an operation. However, this operation is not very
appropriate for the PriorityQueue ADT because we may need to require
the two priority queues to be of the same type. For example, if we added
a Merge operation to LeftistHeap, we would need to require that the
parameter is also a LeftistHeap — Merge(PriorityQueue) would be
insufficient. Furthermore, we would need to be concerned with security
because the resulting heap would share storage with the original heaps.
Using an immutable structure, as we did for LeftistHeap, would take
care of the security issue. With such implementations, the Merge operation
could be done in Θ(lg n) worst-case time for a LeftistHeap, or in Θ(lg n)
expected time for a RandomizedHeap, where n is the sum of the sizes of the
two priority queues. The amortized time for SkewHeap.Merge, however,
is not in O(lg n) unless we restrict the sequences of operations so that after
two priority queues are merged, the original priority queues are not used in
any subsequent operations; otherwise, we can repeatedly perform the same
expensive Merge.

Figure 5.21 Running times for the PriorityQueue operations for various
implementations.
Notes:
• n is the number of elements in the priority queue.
• Unless otherwise noted, all running times are worst-case.
• The constructor and the MaxPriority and Size operations all run in
Θ(1) worst-case time for all implementations.
5.8 Exercises
Exercise 5.1. Complete the implementation of SortedArrayPriority-
Queue shown in Figure 5.2 by adding a constructor and implementations
of the MaxPriority and RemoveMax operations. Prove that your
implementation meets its specification.
Exercise 5.2. Prove that SimpleHeap, shown in Figure 5.5, meets its
specification.
Exercise 5.3. Show the result of first inserting the sequence of priorities
below into a leftist heap, then executing one RemoveMax.
34, 12, 72, 15, 37, 49, 17, 55, 45
Exercise 5.4. Prove that LeftistTree, shown in Figure 5.7, meets its
specification.
Exercise 5.5. Prove that LeftistHeap, shown in Figure 5.8, meets its
specification.
* Exercise 5.6. Prove that for any n ∈ N, if we insert a sequence of n
strictly decreasing priorities into an initially empty leftist heap, we obtain a
leftist heap with null path length lg(n + 1).
Exercise 5.7. Instead of keeping track of the null path lengths of each
node, a variation on LeftistTree keeps track of the number of nodes in
each subtree, and ensures that the left child has as many nodes as the right
child. We call this variation a LeftHeavyTree.
a. Give an implementation of LeftHeavyTree. The structure must be
immutable, and each constructor must require only Θ(1) time.
b. Prove by induction on the number of nodes n in the tree that in any
LeftHeavyTree, the distance from the root to the rightmost empty
subtree is no more than lg(n + 1).
c. Using the result of part (b), show that if we use LeftHeavyTrees
instead of LeftistTrees in the implementation of LeftistHeap, the
running times of the operations are still in O(lg n), where n is the number
of elements in the priority queue.
Exercise 5.8. Repeat Exercise 5.3 using a skew heap instead of a leftist
heap.
Exercise 5.9. Prove that SkewHeap, obtained by replacing the Merge
function in SimpleHeap (Figure 5.5) with the function shown in Figure 5.9,
meets its specification.
* Exercise 5.10. Another way of specifying a priority queue is to define
an interface HasPriority, as shown in Figure 5.22. Rather than supplying
two arguments to the Put operation, we could instead specify that it takes
a single argument of type HasPriority, where the priority of the item is
given by its Priority operation. Discuss the potential security problems for
this approach. How could these problems be avoided if such a specification
were adopted?
Exercise 5.11. The goal of this exercise is to complete the analysis of the
amortized running times of the SkewHeap operations.
a. Compute E[h].
b. Compute E[h²], and show that E[h²] ≠ (E[h])².
c. Compute E[2^h], and show that E[2^h] ≠ 2^{E[h]}.
Exercise 5.19. Use Example 5.4 to show that the expected running time of
RandomizedHeap.Merge, shown in Figure 5.12, is in Ω(lg n) in the worst
case, where n is the number of elements in the two heaps combined.
Exercise 5.20. Complete the implementation of BinaryHeap by adding
a constructor and a MaxPriority operation to the operations shown in
Figures 5.16 and 5.18. Prove that the resulting implementation meets its
specification.
Exercise 5.21. Repeat Exercise 5.3 using a binary heap instead of a leftist
heap. Show the result as both a tree and an array.
Exercise 5.22. Prove that HeapSort, shown in Figure 5.20, meets its
specification.
Exercise 5.23. Prove that the first loop in HeapSort runs in Θ(n) time
in the worst case.
Exercise 5.24. Prove that HeapSort runs in Θ(n lg n) time in the worst
case.
Exercise 5.25. We can easily modify the Sort specification (Figure 1.2 on
page 7) so that instead of sorting numbers, we are sorting Keyed items in
nondecreasing order of their keys. HeapSort can be trivially modified to
meet this specification. Any sorting algorithm meeting this specification is
said to be stable if the resulting sorted array always has elements with equal
keys in the same order as they were initially. Show that HeapSort, when
modified to sort Keyed items, is not stable.
Exercise 5.26. Consider the following scheduling problem. We have a
collection of jobs, each having a natural number ready time ri , a positive
integer execution time ei , and a positive integer deadline di , such that
di ≥ ri + ei . At each natural number time instant t, we wish to schedule
the job with minimum deadline satisfying the following conditions
• t ≥ ri (i.e., the job is ready);
• if the job has already been executed for a < ei time units, then t + ei − a ≤
di (i.e., the job can meet its deadline).
Note that this scheduling strategy may preempt jobs, and that it will discard
jobs that have been delayed so long that they can no longer meet their deadlines.
5.9 Notes
Both heaps and heap sort were introduced by Williams [121]. The linear-
time construction of a binary heap is due to Floyd [41]. Leftist heaps were
introduced by Crane [26]; see also Knuth [84]. Skew heaps were introduced
by Sleator and Tarjan [108]. Randomized heaps were introduced by Gambin
and Malinowski [48].
Other implementations of priority queues have been defined based on the
idea of a heap. For example, binomial queues were introduced by Vuillemin
[116]. Lazy binomial queues and Fibonacci heaps, each of which provide
Put and RemoveMax operations with amortized running times in O(1)
and O(lg n), respectively, were introduced by Fredman and Tarjan [46].
The information on craps in Exercise 5.27 is taken from Silberstang [106].
Chapter 6
Storage/Retrieval I: Ordered Keys
the statement,
d.VisitInOrder(new Printer()).
than any key in the left child of t; hence, moving it to the root of t maintains
the BST structure.
We can find the smallest key in a nonempty BST by first looking at the
left child of the root. If it is empty, then the root contains the smallest key.
Otherwise, the smallest key is in the left child. This is a transformation,
and so can be implemented using a loop. The complete implementation of
Remove is shown in Figure 6.7. Figure 6.8 shows the result of deleting 54
from the BST shown in Figure 6.5. Specifically, because 54 has two children,
it is replaced by the smallest key (64) in its right child, and 64 is replaced
by its right child (71).
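Figure 6.7 contains the book's Remove; purely as a hedged Python sketch of the strategy just described (replace the deleted key by the smallest key of the right child, found by walking left), using my own node representation rather than the book's:

class BSTNode:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def remove(t, key):
    """Return the tree t with key removed, if present."""
    if t is None:
        return None
    if key < t.key:
        t.left = remove(t.left, key)
    elif key > t.key:
        t.right = remove(t.right, key)
    elif t.left is None:
        return t.right
    elif t.right is None:
        return t.left
    else:
        # Two children: move the smallest key of the right child to the root.
        m = t.right
        while m.left is not None:      # the loop implementing the transformation
            m = m.left
        t.key = m.key
        t.right = remove(t.right, m.key)
    return t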
The VisitInOrder operation requires us to apply v.Visit to each data
item in the BST, in order of keys. If the BST is empty, then there is
nothing to do. Otherwise, we must visit all of the data items in the left
child prior to visiting the root, then we must visit all of the data items in
the right child. Because the left and right children are themselves BSTs,
they comprise smaller instances of this problem. Applying the top-down
approach in a straightforward way, we obtain the recursive internal function
TraverseInOrder shown in Figure 6.7.
The above algorithm implemented by TraverseInOrder is known as
an inorder traversal. Inorder traversal applies strictly to binary trees, but two
other traversals apply to rooted trees in general. A preorder traversal visits
the root prior to recursively visiting all of its children, whereas a postorder
traversal visits the root after recursively visiting all of its children.
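A minimal Python sketch of the three traversals just defined, where visit plays the role of v.Visit and the names are mine:

def inorder(t, visit):
    if t is not None:
        inorder(t.left, visit)
        visit(t.key)              # root visited between its children
        inorder(t.right, visit)

def preorder(t, visit):
    if t is not None:
        visit(t.key)              # root visited before its children
        preorder(t.left, visit)
        preorder(t.right, visit)

def postorder(t, visit):
    if t is not None:
        postorder(t.left, visit)
        postorder(t.right, visit)
        visit(t.key)              # root visited after its children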
Let us now analyze the running time of Find. Let n be the number
of data items in the BST. Clearly, the time required outside the loop and
the time for a single iteration of the loop are each in Θ(1). We therefore
need to analyze the worst-case number of iterations of the loop. Initially, t
refers to a BST with n nodes. A single iteration has the effect of resetting t
to refer to one of its children. In the worst case, this child may contain all
nodes except the root. Thus, in the worst case, the loop may iterate n times.
This can happen, for example, if all left children are empty, so that elements
refers to a BST that consists of a single chain of nodes going to the right
(see Figure 6.9). The worst-case running time is therefore in Θ(n).
Example 6.2. Suppose we build a BSTDictionary by inserting n items
with integer keys 1, 2, . . . , n, in that order. As each key is inserted, it is
larger than any key already in the BST. It is therefore inserted to the right
of every key already in the BST. The result is shown in Figure 6.9. It is
easily seen that to insert key i requires Θ(i) time. The total time to build
the BSTDictionary is therefore in Θ(n2 ), by Theorem 3.31.
Figure 6.8 The result of deleting 54 from the BST shown in Figure 6.5 — 54 is
replaced by 64, which in turn is replaced by 71.
nonempty trees. Because each of these calls makes two recursive calls, the
total number of recursive calls is exactly 2n. Including the initial call the
total number of calls made to TraverseInOrder is 2n + 1. Because each
of these calls runs in Θ(1) time (excluding the time taken by v.Visit), the
total time is in Θ(n). Note that we cannot hope to do any better than this
because the specification requires that v.Visit be called n times.
Because TraverseInOrder is recursive, we should also consider the
stack space usage. Again, analyzing the stack space in terms of the number
of nodes is rather difficult, but it turns out to be much easier to do the
analysis in terms of the height of the tree. As with Find, this analysis also
gives us useful information about the performance of TraverseInOrder.
Let f (h) be the worst-case stack space used by TraverseInOrder on
a tree of height h. Then if h > 0, f (h) is in Θ(1) plus the maximum of the
amount of space used by the two recursive calls in the worst case. Because at
least one of the two children has height h − 1, f (h) ≥ f (h − 1); i.e., f (h) is
nondecreasing. Then in the worst case, the maximum space used by one of
the recursive calls is f (h − 1). We therefore have the following recurrence:
f (h) ∈ f (h − 1) + Θ(1),
The balance criterion that we choose is that in any subtree, the heights
of the two children differ by at most 1. For the purpose of this definition, we
consider an empty tree to have height −1, or one less than the height of a tree
containing a single node. A binary search tree obeying this balance criterion
is known as an AVL tree; “AVL” stands for the names of its inventors,
Adel’son-Vel’skiı̆ and Landis.
Figure 6.10 shows an AVL tree of height 4 containing integer keys. Note
that its balance is not perfect – it is not hard to construct a binary tree of
height 3 with even more nodes. Nevertheless, the children of each nonempty
subtree have heights differing by at most 1, so it is an AVL tree.
Before we begin designing an AVL tree implementation of Ordered-
Dictionary, let us first derive an upper bound on the height of an AVL
tree with n nodes. We will not derive this bound directly. Instead, we will
first derive a lower bound on the number of nodes in an AVL tree of height
h. We will then transform this lower bound into our desired upper bound.
Consider an AVL tree with height h having a minimum number of
nodes. By definition, both children of a nonempty AVL tree must also be
AVL trees. By definition of the height of a tree, at least one child must have
height h − 1. By definition of an AVL tree, the other child must have height
at least h − 2. In order to minimize the number of nodes in this child, its
height must be exactly h−2, provided h ≥ 1. Thus, the two children are AVL
trees of heights h − 1 and h − 2, each having a minimum number of nodes.
The above discussion suggests a recurrence giving the minimum number
of nodes in an AVL tree of height h. Let g(h) give this number. Then for
h ≥ 1, the numbers of nodes in the two children are g(h − 1) and g(h − 2).
Then for h ≥ 1,
$$g(h) = g(h - 1) + g(h - 2) + 1,$$
where g(−1) = 0 (the number of nodes in an empty tree) and g(0) = 1 (the
number of nodes in a tree of height 0).
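A quick numeric check of this recurrence, under the definitions just given (the function name is mine):

def min_avl_nodes(h):
    """Minimum number of nodes in an AVL tree of height h, following
    g(h) = g(h-1) + g(h-2) + 1 with g(-1) = 0 and g(0) = 1."""
    if h < 0:
        return 0
    if h == 0:
        return 1
    return min_avl_nodes(h - 1) + min_avl_nodes(h - 2) + 1

# The values grow roughly like 2^(h/2), which is what the argument below exploits.
print([min_avl_nodes(h) for h in range(8)])  # [1, 2, 4, 7, 12, 20, 33, 54]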
$$g_1(h) = 2g_1(h - 2) + 1.$$
Defining g_2(h) = g_1(2h), we have
$$g_2(h) = g_1(2h) = 2g_1(2h - 2) + 1 = 2g_1(2(h - 1)) + 1 = 2g_2(h - 1) + 1.$$
g_2 then fits the form of Theorem 3.34. Applying this theorem, we obtain
$$g_2(h) \in \Theta(2^h).$$
Thus, for sufficiently large h, there is a positive real number c_1 such that
$$g_1(2h) = g_2(h) \geq c_1 2^h.$$
Then for sufficiently large even h,
$$g_1(h) \geq c_1 2^{h/2}.$$
For sufficiently large odd h, we have
$$g_1(h) \geq g_1(h - 1) \geq c_1 2^{(h-1)/2} = \frac{c_1}{\sqrt{2}}\,2^{h/2}$$
(because h − 1 is even), so that for some positive real number c_2 and all
sufficiently large h,
$$g_1(h) \geq c_2 2^{h/2}. \qquad (6.3)$$
Combining (6.3) with (6.2), we obtain
$$c_2 2^{h/2} \leq g(h)$$
for sufficiently large h. Applying lg to both sides and rearranging terms, we
obtain
$$h \leq 2(\lg g(h) - \lg c_2) \in O(\lg g(h)).$$
Because g(h) is the minimum number of nodes in an AVL tree of height h, it
follows that the height of an AVL tree is in O(lg n), where n is the number of
nodes. By a similar argument, it can be shown that the height is in Ω(lg n)
as well. We therefore have the following theorem.
Theorem 6.4. The worst-case height of an AVL tree is in Θ(lg n), where n
is the number of nodes.
By Theorem 6.4, if we can design operations that run in time linear in
the height of an AVL tree, these operations will run in time logarithmic
in the size of the data set. Certainly, adding or deleting a node will change
the heights of some of the subtrees in an AVL tree; hence, these operations
must re-establish balance. Computing the height of a binary tree involves
finding the longest path, which apparently requires examining the entire tree.
However, we can avoid recomputing heights from scratch if we record the
height of each subtree. If the heights of both children are known, computing
the height of the tree is straightforward.
We therefore define the data type AVLNode, which is just like Binary-
TreeNode, except that it has an additional representation variable, height.
This variable is used to record the height of the tree as an integer. As for the
other three variables, we allow read/write access to height. The constructor
for AVLNode is just like the constructor for BinaryTreeNode, except
that it also initializes height to −1.
To represent an OrderedDictionary using an AVL tree, we again
use two variables, elements and size, as we did for BSTDictionary. In
this representation, however, elements will refer to an AVLNode. Our
structural invariant is that elements represents an AVL tree. We interpret
this statement as implying that each height variable gives the height of the
subtree at which it is rooted, or −1 if that subtree is empty.
We can define a Find function for this implementation as we did for
BSTDictionary; in fact, because an AVL tree is a binary search tree, the
same function will work. As we have already shown the running time of
BSTDictionary.Find to be in Θ(h), where h is the height of the tree,
we can conclude that this function has a running time in Θ(lg n). However,
this function is not useful in implementing the Put or Remove operations
because we might need to change the shape of the tree at some other location
in order to maintain the balance criterion.
Let us therefore consider how Put might be implemented. More
generally, let us consider how a data item x might be inserted into an
arbitrary AVL tree t, which may be a subtree of a larger AVL tree. If t is
empty, we can replace it with a single-node AVL tree containing x. Otherwise,
we’ll need to compare keys and insert into the appropriate child. However, we
are not yet finished, because the insertion into the child will have changed its
shape; hence, we need to compare the heights of the two children and restore
balance if necessary. Note that this reduction is not a transformation, due
to the additional work required following the insertion into the child.
In order to complete the insertion function, we need to be able to restore
the balance criterion after an insertion into one of the children. Clearly,
if we insert into one particular child, the other child will be unchanged.
Furthermore, if we specify the insertion function to cause the result to be
an AVL tree, we know that both children will be AVL trees; hence, we only
need to worry about restoring balance at the root. Before we can talk about
how to restore balance at the root, we should consider how much difference
there might be in the heights of the children. It stands to reason that an
Example 6.5. Suppose we were to insert the key 39 into the AVL tree shown
in Figure 6.14(a). Using the ordinary BST insertion algorithm, 39 should be
made the right child of 35, as shown in Figure 6.14(b). To complete the
insertion, we must check the balance along the path to 39, starting at the
bottom. Both 35 and 23 satisfy the balance criterion; however, the left
child of 42 has height 2, whereas the right child has height 0. We therefore
need to perform a rotation at 42. To determine which rotation is appropriate,
we compare the height of the left child of the left child of 42 (i.e., the subtree
rooted at 11) with the right child of 42. Because both of these subtrees have
height 0, a double rotate right is required at 42. To accomplish this rotation,
we promote 35 to the root of the subtree (i.e., where 42 currently is), and
place the nodes 23 and 42, along with the subtrees rooted at 11, 39, and 50,
at the only locations that preserve the order of the BST. The result of this
rotation is shown in Figure 6.14(c). Because the balance criterion is satisfied
at 54, this tree is the final result.
Let us now analyze Insert. Excluding the recursion, this function clearly runs in
Θ(1) time. At most one recursive call is made, and its second parameter has
height strictly less than the height of t; in the worst case, it is 1 less. If h is
the height of t, then the worst-case running time of Insert is given by
f (h) ∈ f (h − 1) + Θ(1),
for h > 0. By Theorem 3.34, f (h) ∈ Θ(h). By Theorem 6.4, Insert therefore
runs in Θ(lg n) time, where n is the size of the data set. Clearly, Put can
be written to operate in Θ(lg n) time.
The same analysis shows that the stack space usage of Put is in Θ(lg n).
We leave as an exercise the design and analysis of an algorithm for Remove.
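Since the book's Insert appears in a figure not reproduced here, the following is a minimal Python sketch of the idea; the names (Node, rotate_right, and so on) are mine, nodes are rebuilt rather than mutated, and this is an illustration under those assumptions, not the book's implementation.

def height(t):
    return -1 if t is None else t.height   # an empty tree has height -1

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.height = 1 + max(height(left), height(right))

def balance(t):
    return height(t.left) - height(t.right)

def rotate_right(t):          # single rotation; its mirror image is rotate_left
    l = t.left
    return Node(l.key, l.left, Node(t.key, l.right, t.right))

def rotate_left(t):
    r = t.right
    return Node(r.key, Node(t.key, t.left, r.left), r.right)

def insert(t, key):
    if t is None:
        return Node(key)
    if key < t.key:
        t = Node(t.key, insert(t.left, key), t.right)
    elif key > t.key:
        t = Node(t.key, t.left, insert(t.right, key))
    else:
        return t                        # key already present
    # Restore the balance criterion at the root, if necessary.
    if balance(t) > 1:                  # left side too tall
        if balance(t.left) < 0:         # "zig-zag" shape: double rotation
            t = Node(t.key, rotate_left(t.left), t.right)
        return rotate_right(t)
    if balance(t) < -1:                 # right side too tall
        if balance(t.right) > 0:
            t = Node(t.key, t.left, rotate_right(t.right))
        return rotate_left(t)
    return t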
the root of the tree by a series of double rotations, each promoting b by two
levels. Now referring to Figure 6.12, note that the distance between the root
and any descendant of d decreases by 1 for each rotation. The number of
rotations is half the distance from the root to d, so each descendant of d ends
up closer to the root by half the original distance between the root and d.
Unfortunately, single rotations are not as effective in improving the
structure. Notice that in Figure 6.11, nodes in subtree c do not get any
closer to the root as a result of the rotation. As a result, we need a new kind
of double rotation that can be applied when the node to be promoted is
not a “zig-zag” from its grandparent. So that we might distinguish between
the various double rotations, we will refer to the rotation of Figure 6.12 as
a zig-zag right, and to its mirror image as a zig-zag left. A zig-zig right is
shown in Figure 6.15. Note that by this rotation, the distance between the
root and any descendant of a is decreased by at least 1.
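As a hedged illustration of the two double rotations just discussed, here they are written out on immutable nodes in Python, along the lines of Figures 6.12 and 6.15; the representation and names are mine, not the book's.

from collections import namedtuple

# A node is (key, left, right); None is the empty tree.
Node = namedtuple("Node", "key left right")

def zig_zag_right(c):
    """Promote b = c.left.right by two levels (cf. Figure 6.12)."""
    a, b = c.left, c.left.right
    return Node(b.key, Node(a.key, a.left, b.left), Node(c.key, b.right, c.right))

def zig_zig_right(c):
    """Promote a = c.left.left by two levels (cf. Figure 6.15)."""
    b, a = c.left, c.left.left
    return Node(a.key, a.left, Node(b.key, a.right, Node(c.key, b.right, c.right)))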
Our representation, interpretation, and structural invariant will be the
same as for BSTDictionary. The only differences will occur in the actual
implementations of the operations. In fact, the implementation of VisitIn-
Order will also be the same as for BSTDictionary.
Let us consider how we can implement a Find function. First, we observe
that no value needs to be returned, because if the key we are looking for
exists, we will bring it to the root of the tree. Hence, after invoking the
Find function, the Get operation only needs to look in the root to see if the
desired key is there. Second, we don’t want to bring a node representing an
empty subtree to the root. For this reason, we will need to verify that a node
is nonempty at some point before rotating it to the root. It therefore seems
reasonable to include as part of the precondition that the tree is nonempty.
We therefore begin by comparing the given key k to the key at the root
of the given tree t. If the keys don’t match, we will need to look in the
appropriate child, after verifying that it is nonempty. However, we want to
do a double rotation whenever possible, so rather than using a recursive call
at this point, we should go ahead and make another comparison. If we find
the key k, or if the appropriate grandchild is empty, we do a single rotation.
Otherwise, we recursively look for k in the appropriate grandchild and do a
double rotation. The algorithm is shown in Figure 6.16.
The insertion algorithm cannot use Find because it must insert a new
data item when an empty subtree is found. However, it can be patterned
after the Find algorithm. The main difference is that because a data item
is inserted into an empty tree, we will always rotate that node to the root.
We therefore do not need to restrict its use to nonempty trees. The details
are left as an exercise.
The deletion algorithm can, however, use Find. Suppose we want to
delete key k. We can use Find(k, elements) to move k to the root if
it is present. If the right child is empty, we can simply make the left
child the new root. Otherwise, we can use another internal function,
FindMin(elements.RightChild()), to move the minimum key m in the
right child to the root of the right child. At this point, the right child has
an empty left child, because there are no keys with values between k and its
right child. The result is shown in Figure 6.18. We can therefore complete
the deletion by making A the left child of m and making m the root (see
Figure 6.18). The algorithm is given in Figure 6.19.
Let us now analyze the amortized running times of Get, Put, and
Remove for SplayDictionary. It is not hard to see that all of the
recursive algorithms have constant running time, excluding recursive calls.
Furthermore, each time a recursive call is made, a rotation is done. It is
therefore sufficient to analyze the total number of rotations. Each rotation,
therefore, will have an actual cost of 1.
In order to amortize the number of rotations, we need to find an
appropriate potential function. Intuitively, an operation involving many
rotations should improve the overall balance of the tree. The potential
function should in some way measure this balance, decreasing as the balance
increases. If the tree is very unbalanced, as in Figure 6.9, many of the subtrees

Figure 6.16 The Find internal function for the SplayDictionary implementation
of OrderedDictionary.
Figure 6.18 The splay tree after the calls to Find and FindMin in Remove
plus the change in the potential function Φ. Noting that |T′b| = |Tc|, we
conclude that the change in Φ is
In order to get a tight bound for this expression in terms of lg |Tc | − lg |Ta |,
we need to be a bit more clever. We would again like to use Theorem 5.7.
Note that |Ta| + |T′c| ≤ |Tc|; however, lg |Ta| + lg |T′c| does not occur in (6.6).
Let us therefore both add and subtract lg |Ta | to (6.6). Adding in the actual
cost, applying Theorem 5.7, and simplifying, we obtain the following bound
on the amortized cost of a zig-zig rotation:
Because each operation will do at most two single rotations (recall that a
deletion can do a single rotation in both the Find and the FindMin), the
“+ 1” in this bound will not cause problems.
We can now analyze the amortized cost of a Find. We first combine
bounds (6.5), (6.7), and (6.8) into a single recurrence defining a function
f (k, t) bounding the amortized cost of Find(k, t). Suppose Find(k, t) makes
a recursive call on a subtree s and performs a double rotation. We can then
combine (6.5) and (6.7) to define:
For the base of the recurrence, suppose that either no rotation or a single
rotate is done. Using (6.8), we can define
where s is the subtree whose root is rotated to the root of t. Clearly, f (k, t) ∈
O(lg n), where n is the number of nodes in t. The amortized cost of Find,
and hence of Get, is therefore in O(lg n).
The analysis of Put is identical to the analysis of Find, except that we
must also account for the change in Φ when the new node is added to the
tree. When the new node is added, prior to any subsequent rotations, it is a
leaf. Let s denote the empty subtree into which the new leaf is inserted. The
insertion causes each of the ancestors of s, including s itself, to increase in
size by 1. Let t be one of these ancestors other than the root, and let t′ be the
same subtree after the new node is inserted. Note that t′ has no more nodes
than does the parent of t. If we think of the insertion as replacing the parent
of t by t′, then this replacement causes no increase in Φ. The only node for
which this argument does not apply is the root. Therefore, the increase in Φ
is no more than lg(n + 1), where n is the number of nodes in the tree prior
to the insertion. The entire amortized cost of Put is therefore in O(lg n).
Finally, let us consider the Remove operation. The Find has an
amortized cost in O(lg n). Furthermore, the amortized analysis of Find also
applies to FindMin, so that it is also in O(lg n). Finally, it is easily seen
that the actual removal of the node does not increase Φ. The amortized cost
of Remove is therefore in O(lg n) as well.
As we observed for skew heaps (see Section 5.4), amortized analysis is
inappropriate for analyzing stack space usage. We leave it as an exercise to
show that the worst-case stack space usage for Put is in Θ(n). To avoid this
large stack space usage, the algorithms can be reformulated to use iteration
with an explicit stack, rather than using the runtime stack for recursive calls.
As a result, the total space usage is still in Θ(n), but the stack space usage
is in Θ(1). We leave the details as an exercise.
We interpret the represented set to be the data items in the linked list
beginning with start and ending with end, using the variables links[1] to
obtain the next element in the list; the data items in start and end are
excluded from the set.
Our structural invariant is:
• Both start and end have a level of M ≥ maxLevel.
• start.key = minKey, which is the smallest possible key.
• end.key = maxKey, which is the largest possible key.
This fact follows from the following theorem, using c = 2 and a = −1.
Theorem 6.7. For any real numbers a and c such that c > 1,
$$\sum_{i=0}^{\infty} c^{a-i} = \frac{c^{a+1}}{c - 1}.$$
Proof.
$$\begin{aligned}
\sum_{i=0}^{\infty} c^{a-i} &= \lim_{n \to \infty} \sum_{i=0}^{n} c^{a-i}\\
&= c^a \lim_{n \to \infty} \sum_{i=0}^{n} (1/c)^i\\
&= c^a \lim_{n \to \infty} \frac{(1/c)^{n+1} - 1}{\frac{1}{c} - 1} \qquad \text{(from (2.2))}\\
&= c^a \lim_{n \to \infty} \frac{c - (1/c)^n}{c - 1}\\
&= \frac{c^{a+1}}{c - 1},
\end{aligned}$$
because 1/c < 1.
We now define the discrete random variable len over Seq such that len(e)
is the length of the sequence of flips. Note that E[len] gives us the expected
number of times the while loop condition is tested, as well as the expected
final value of l. As a result, it also gives us the expected number of iterations
of the for loop, provided k is not already in the set.
Because len(e) is always a natural number, we can apply Theorem 5.5
(page 172) to obtain E[len]. The probability that a sequence has length at
least i is the probability that i − 1 flips all result in heads, or 2^{1−i}. Thus,
$$E[\textit{len}] = \sum_{i=1}^{\infty} 2^{1-i} = 2,$$
from Theorem 6.7. We can therefore expect the while loop in Put to iterate
once, yielding an expected value of 2 for l, on average. Hence, the for loop,
if it executes, iterates twice on average. The expected running times of both
loops are therefore in Θ(1) for a worst-case input.
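A minimal Python sketch of the coin-flipping level generation that this analysis describes; the function name is mine, and the returned value plays the role of l.

import random

def random_level():
    """Flip a fair coin until tails; the number of flips (at least 1) is the level.
    P(level >= i) = 2^(1-i), so the expected level is 2, matching E[len] above."""
    level = 1
    while random.random() < 0.5:   # heads with probability 1/2
        level += 1
    return level

# Quick empirical check that the expected value is 2:
print(sum(random_level() for _ in range(100_000)) / 100_000)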
In order to determine the expected running time of the SkipListNode
constructor, we need to analyze it again, but this time doing an expected-
case analysis using len as its third parameter. Using the same analysis as
we did for the for loop in Put, we see that its expected running time is in
Θ(1).
In order to complete the expected-case analysis of Put, we need to
analyze Find. We will begin by defining an appropriate discrete probability
space. Let Seqn be the set of all n-tuples of elementary events from Seq; i.e.,
each elementary event in Seqn is an n-tuple e1 , . . . , en such that each ei is
a sequence of coin flips containing zero or more heads, followed by exactly
one tails. Such an n-tuple describes the “shape” of a skip list by recording,
for each of the n data elements, the sequence of coin flips which generated
the level of the SkipListNode containing it.
In order to show that Seqn is countable, we can label each n-tuple
e1 , . . . , en ∈ Seqn with the natural number
$$p_1^{\textit{len}(e_1)} p_2^{\textit{len}(e_2)} \cdots p_n^{\textit{len}(e_n)},$$
where pi is the ith prime. Because each elementary event in Seq is uniquely
identified by its length, and because each positive integer has a unique prime
factorization, each tuple has a unique label; hence, Seqn is countable.
We need to define the probabilities of elements in Seqn . In order to do
this properly, we need to extend the definition of independence given in
Section 5.5 to more than two events. We say that a set S of events is
pairwise independent if for every pair of events e_1, e_2 ∈ S, e_1 and e_2 are
independent. If for every subset T ⊆ S containing at least two events,
$$P\left(\bigcap_{e \in T} e\right) = \prod_{e \in T} P(e)$$
(this ∏-notation denotes the product of the probabilities P(e) for all events
e in T), then we say that the events in S are mutually independent.
• tail_2(e) = 2 because there are 2 level-2 nodes following the last node with
level greater than 2; and
• tail_3(e) = 1 because there is 1 level-3 node, and there are no nodes with
level greater than 3.
Suppose e describes some skip list with n elements, and suppose this
skip list’s Find function is called with a key larger than any in the list. The
running time of Find is then proportional to the number of times the while
loop condition is tested. On iteration i of the for loop, the while loop will
iterate exactly tail i (e) times, but will be tested tail i (e) + 1 times, including
the test that causes the loop to terminate. The expected running time of
Find on a worst-case input is therefore proportional to:
$$\begin{aligned}
E\left[\sum_{i=1}^{\max(\textit{maxLevel},\,l)} (\textit{tail}_i + 1)\right]
&= E\left[\left(\sum_{i=1}^{\max(\textit{maxLevel},\,l)} \textit{tail}_i\right) + \max(\textit{maxLevel}, l)\right]\\
&= E\left[\sum_{i=1}^{\max(\textit{maxLevel},\,l)} \textit{tail}_i\right] + E[\max(\textit{maxLevel}, l)]. \qquad (6.9)
\end{aligned}$$
By Theorem 5.5,
$$E[\textit{tail}_i] = \sum_{j=1}^{\infty} P(\textit{tail}_i \geq j).$$
Because tail_i ≥ j requires that the last j nodes with level at least i have
level exactly i, P(tail_i ≥ j) ≤ 2^{−j}, so that
$$E[\textit{tail}_i] \leq \sum_{j=1}^{\infty} 2^{-j} = 1$$
from Theorem 6.7.
This bound seems quite good, perhaps even surprisingly so. It tells us
that on any iteration of the for loop, we can expect the while loop to iterate
no more than once, on average. Still, this bound does not give a finite bound
for (6.10). However, we have already observed that for any e ∈ Seqn , tail i (e)
will be 0 for all but finitely many i. This follows because there are only
finitely many nonempty levels. Consequently, we might want to use the fact
that tail i (e) ≤ numi (e); hence, E[tail i ] ≤ E[numi ].
While this bound would yield a finite bound for (6.10), it unfortunately
is still too loose, as num1 (e) = n for every e ∈ Seqn . We would like to derive
a logarithmic upper bound, if possible. However, we can use a combination of
the two bounds. In particular, the bound of 1 seems to be a good upper bound
as long as it is less than E[numi ]. Once i is large enough that E[numi ] ≤ 1,
E[numi ] would be a better bound. If we can determine the smallest value of
i such that E[numi ] ≤ 1, we should be able to break the infinite sum into
two sums and derive tight bounds for each of them.
In order to analyze E[numi ], we observe that for e ∈ Seqn , numi (e) is a
count of the number of components whose lengths are at least i. Furthermore,
we can express the fact that a component has a length of at least i as an event
in Seq. The standard technique for counting events is to use an indicator
random variable. Specifically, consider the event in Seq that len ≥ i; i.e.,
this event is the set of sequences of coin flips consisting of at least i − 1
heads, followed by exactly one tails. The indicator for this event is then
defined to be
$$I(\textit{len} \geq i)(e_j) = \begin{cases} 1 & \text{if } \textit{len}(e_j) \geq i\\ 0 & \text{otherwise.} \end{cases}$$
We can then express num_i as follows:
$$\textit{num}_i(e_1, \ldots, e_n) = \sum_{j=1}^{n} I(\textit{len} \geq i)(e_j).$$
Taking expected values, we have
$$E[\textit{num}_i] = E\left[\sum_{j=1}^{n} I(\textit{len} \geq i)(e_j)\right]
= n\,E[I(\textit{len} \geq i)]
= n\,P(\textit{len} \geq i)
= n\,2^{1-i}.$$
$$= \lceil \lg n \rceil + n\,2^{1-\lceil \lg n \rceil}
\leq \lceil \lg n \rceil + n\,2^{1-\lg n}
= \lceil \lg n \rceil + 2. \qquad (6.11)$$
For the case in which Find is called from Put, we know that E[l] = 2. We
therefore need to evaluate E[maxLevel]. Note that maxLevel is the number of nonempty levels in the skip list.
We therefore have
$$E[\textit{maxLevel}] = E\left[\sum_{i=1}^{\infty} I(\textit{num}_i > 0)\right] = \sum_{i=1}^{\infty} E[I(\textit{num}_i > 0)]. \qquad (6.13)$$
Clearly, I(num_i > 0)(e) ≤ 1 for all e ∈ Seq_n, so that E[I(num_i > 0)] ≤ 1.
Furthermore, I(num_i > 0)(e) ≤ num_i(e), so that E[I(num_i > 0)] ≤ E[num_i].
We therefore have E[I(num_i > 0)] ≤ min(1, E[num_i]), which is the same
upper bound we showed for E[tail_i]. Therefore, following the derivation of
(6.11), we have
$$E[\textit{maxLevel}] \leq \lceil \lg n \rceil + 2. \qquad (6.14)$$
Now combining (6.9), (6.10), (6.11), (6.12), and (6.14), it follows that
the expected number of tests of the while loop condition is no more than
2(⌈lg n⌉ + 2) + 2 ∈ O(lg n),
for a worst-case input when Find is called by Put. The expected running
time of Find in this context is therefore in O(lg n).
A matching lower bound for the expected running time of Find can
also be shown — the details are outlined in Exercise 6.18. We can therefore
conclude that the expected running time of Find when called from Put on
a worst-case input is in Θ(lg n).
We can now complete the analysis of Put. We have shown that the
expected running times for both loops and the constructor for SkipList-
Node are all in Θ(1). The expected running time of Find(k, l) is in Θ(lg n).
The remainder of the algorithm clearly runs in Θ(1) time. The total time
is therefore expected to be in Θ(lg n) for a worst-case input. We leave as
exercises to design Get and Remove to run in Θ(lg n) expected time,
as well.
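The book's Find appears in a figure not reproduced here; the following Python sketch, with my own names and representation, shows only the loop structure that the analysis above assumes: an outer loop that descends one level at a time, and an inner while loop that moves forward as long as the next key is smaller than the target.

import random

class SkipNode:
    def __init__(self, key, level):
        self.key = key
        self.links = [None] * level        # links[i] is the successor at level i+1

class SkipList:
    def __init__(self, level_cap=32):
        self.head = SkipNode(float("-inf"), level_cap)   # plays the role of start
        self.max_level = 1

    def find_predecessors(self, key):
        """Return, for each level, the last node whose key is less than key."""
        preds = [self.head] * len(self.head.links)
        node = self.head
        for i in reversed(range(self.max_level)):        # top level down
            while node.links[i] is not None and node.links[i].key < key:
                node = node.links[i]                      # the inner while loop
            preds[i] = node
        return preds

    def contains(self, key):
        node = self.find_predecessors(key)[0].links[0]
        return node is not None and node.key == key

    def put(self, key):
        level = 1
        while random.random() < 0.5 and level < len(self.head.links):
            level += 1                                    # coin flips, as in the text
        self.max_level = max(self.max_level, level)
        preds = self.find_predecessors(key)
        node = SkipNode(key, level)
        for i in range(level):
            node.links[i] = preds[i].links[i]
            preds[i].links[i] = node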
Thus, the probability that some element has a level strictly greater than 100
is at most n·2^{−100}. Because 2^{−20} < 10^{−6}, this means that for n ≤ 2^{80} ≈ 10^{24},
the probability that a level higher than 100 is reached is less than one in a
million. Such a small probability of error can safely be considered negligible.
6.5 Summary
A summary of the running times of the operations for the various imple-
mentations of OrderedDictionary is given in Figure 6.26. Θ(lg n)-
time implementations of the Get, Put, and Remove operations for the
OrderedDictionary interface can be achieved in three ways:
The worst-case stack space usage for each of the AVL tree operations
is in Θ(lg n). Because the skip list implementation uses no recursion, its
worst-case stack space usage is in Θ(1). However, unless the splay tree
implementation is revised to remove the recursion (see Exercise 6.14), its
worst-case stack space usage is in Θ(n).
Section 6.4 introduced the use of indicator random variables for analyzing
randomized algorithms. The application of this technique involves converting
the expected value of a random variable to the expected values of indicator
random variables and ultimately to probabilities. Theorems 5.5, 5.9, and 6.9
are useful in performing this conversion. The probabilities are then computed
using the probabilities of the elementary events and the laws of probability.
Notes:
• n is the number of elements in the dictionary.
• The constructor and the Size operation each run in Θ(1) worst-case
time for each implementation.
• The VisitInOrder operation runs in Θ(n) worst-case time for each
implementation, assuming that the Visit operation for the given
Visitor runs in Θ(1) time.
• Unless otherwise noted, all running times are worst-case.
6.6 Exercises
Exercise 6.1. Prove the correctness of BSTDictionary.TraverseInOr-
der, shown in Figure 6.7.
Exercise 6.2. Draw the result of inserting the following keys in the order
given into an initially empty binary search tree:
Exercise 6.3. Draw the result of deleting each of the following keys from
the tree shown in Figure 6.10, assuming that it is an ordinary binary search
tree. The deletions are not cumulative; i.e., each deletion operates on the
original tree.
a. 55
b. 74
c. 34
Exercise 6.5. Repeat Exercise 6.3 assuming the tree is an AVL tree.
Exercise 6.7. Repeat Exercise 6.3 assuming the tree is a splay tree.
Exercise 6.9. The depth of a node in a tree is its distance from the root;
specifically the root has depth 0 and the depth of any other node is 1 plus
the depth of its parent. Prove by induction on the height h of any AVL tree
that every leaf has depth at least h/2.
* Exercise 6.10. Prove that when a node is inserted into an AVL tree, at
most one rotation is performed.
** Exercise 6.11. Prove that if 2^m − 1 keys are inserted into an AVL tree
in increasing order, the result is a perfectly balanced tree. [Hint: You will
need to describe the shape of the tree after n insertions for arbitrary n, and
prove this by induction on n.]
Exercise 6.12. A red-black tree is a binary search tree whose nodes are
colored either red or black such that
• if a node is red, then the roots of its non-empty children are black; and
• from any given node, every path to any empty subtree has the same
number of black nodes.
We call the number of black nodes on a path from a node to an empty subtree
the black-height of that node. In calculating the black-height of a node,
we consider that the node itself is on the path to the empty subtree.
where
$$\binom{n}{j} = \frac{n!}{j!\,(n - j)!}$$
are the binomial coefficients for 0 ≤ j ≤ n. [Hint: Use induction on n.]
c. Using the results of parts (a) and (b), prove that for i ≤ lg n + 1,
d. Using the result of part (c), Exercise 6.17, and (6.13), prove that
Show that the three events are pairwise independent, but not mutually
independent.
* Exercise 6.21. Let len be as defined in Section 6.4. For each of the
following, either find the expected value or show that it diverges (i.e., that
it is infinite).
a. E[2^{len}].
b. E[(√2)^{len}].
6.7 Notes
AVL trees, which comprise the first balanced binary search tree scheme, were
introduced by Adel’son-Vel’skiı̆ and Landis [1]. Splay trees were introduced
by Sleator and Tarjan [107]. Red-black trees, mentioned in Exercise 6.12,
were introduced by Bayer [8] (see also Guibas and Sedgewick [60]). Balance
in red-black trees is maintained using the same rotations as for splay trees.
As a result, keys can be accessed in Θ(lg n) time in the worst case. Because
heights don’t need to be calculated, they tend to perform better than AVL
trees and are widely used in practice. A somewhat simpler version of red-
black trees, known as AA-trees, was introduced by Andersson [5].
All of the above trees can be manipulated by the tree viewer on this
textbook’s web site. The implementations of these trees within this package
are all immutable.
Another important balanced search tree scheme is the B-tree, introduced
by Bayer and McCreight [9]. A B-tree is a data structure designed for
accessing keyed data from an external storage device. B-trees therefore have
high branching factor in order to minimize the number of device accesses
needed. Red-black trees and AA-trees are actually simulations of B-trees
with a maximum branching factor of 4 (called 2-3-4 trees) and 3 (called 2-3
trees), respectively.
Skip lists were introduced by Pugh [100].
Chapter 7
Storage/Retrieval II: Unordered Keys
index:     0  1  2  3  4   5  6  7   8  9
elements:  ?  ?  ?  ?  35  ?  ?  17  ?  ?
used:      7  4  ?  ?  ?   ?  ?  ?   ?  ?
loc:       ?  ?  ?  ?  1   ?  ?  0   ?  ?
num = 2
7.2 Hashing
The technique we will develop over the remainder of this chapter is known
as hashing. The basic idea behind hashing is to convert each key k to an
index h(k) using a hash function h, so that for all k, 0 ≤ h(k) < m for some
positive integer m. h(k) is then used as an index into a hash table, which is
an array T [0..m − 1]. We then store the data item at that index.
Typically, the universe of keys is much larger than m, the size of the hash
table. By choosing our array size m to be close to the number of elements
we need to store, we eliminate the space usage problem discussed in Section
7.1. However, because the number of possible keys will now be greater than
m, we must deal with the problem that h must map more than one potential
key to the same index. When two actual keys map to the same index, it is
known as a collision.
The potential for collisions is not just a theoretical issue unlikely to
occur in practice. Suppose, for example, that we were to randomly and
independently assign indices to n keys, so that for any given key k and
index i, 0 ≤ i < m, the probability that k is assigned i is 1/m. We can
model this scenario with a discrete probability space consisting of the mn
n-tuples of natural numbers less than m. Each tuple is equally likely, and
so has probability m−n . We can then define the random variable coll as the
number of collisions; i.e., coll(i1 , . . . , in ) is the number of ordered pairs
(ij , ik ) such that ij = ik and j < k.
coll can be expressed as the sum of indicator random variables as follows:
$$coll(i_1, \ldots, i_n) = \sum_{j=1}^{n-1} \sum_{k=j+1}^{n} I(i_j = i_k).$$
Therefore,
$$E[coll] = E\!\left[\sum_{j=1}^{n-1} \sum_{k=j+1}^{n} I(i_j = i_k)\right]
          = \sum_{j=1}^{n-1} \sum_{k=j+1}^{n} E[I(i_j = i_k)]
          = \sum_{j=1}^{n-1} \sum_{k=j+1}^{n} P(i_j = i_k).$$
For each choice of j, k, and i_j, the value i_k can take on m possible values, one of which
is i_j. Because the probabilities of all elementary events are equal, it is easily
seen that P(i_j = i_k) = 1/m for j < k. Hence,
$$E[coll] = \sum_{j=1}^{n-1} \sum_{k=j+1}^{n} \frac{1}{m}
          = \frac{1}{m} \sum_{j=1}^{n-1} (n - j)
          = \frac{1}{m} \sum_{j=1}^{n-1} j \quad \text{(reversing the sum)}
          = \frac{n(n-1)}{2m}$$
by (2.1).
For example, if our hash table has 500,000 locations and we have more
than a thousand data elements, we should expect at least one collision, on
average. In general, it requires too much space to make the table large enough
so that we can reasonably expect to have no collisions.
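To put concrete numbers to this bound, the following short sketch (Python is used here and in later sketches in place of this book's pseudo language) simply evaluates n(n − 1)/(2m) for the table size used in the example above; the element counts are illustrative.

```python
# Expected number of collisions E[coll] = n(n-1)/(2m) when n keys are
# assigned independently and uniformly to a table with m locations.
def expected_collisions(n, m):
    return n * (n - 1) / (2 * m)

m = 500_000                          # the table size from the example above
for n in (100, 1000, 1001, 5000):
    print(n, expected_collisions(n, m))
# With n = 1001 the expectation already exceeds 1, matching the observation
# that slightly more than a thousand elements make a collision likely.
```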
Several solutions to the collision problem exist, but the most common is
to use a linked list to store all data elements that are mapped to the same
location. The approach we take here is similar, but we will use a ConsList
instead of a linked list. Using a ConsList results in somewhat simpler code,
and likely would not result in any significant performance degradation. This
approach is illustrated in Figure 7.3.
In the remainder of this section, we will ignore the details of specific
hash functions and instead focus on the other implementation details of a
[Figure 7.3: A chained hash table — the list at location 0 contains 14; the list at location 1 contains 8 and 29; the list at location 4 contains 53, 11, and 32.]
hash table. In order to approach the use of hash functions in a general way,
we use the HashFunction ADT, shown in Figure 7.4. Note that because
there are no operations to change the hash function, the HashFunction
ADT specifies an immutable data type. In remaining sections of this chapter,
we will consider various ways of implementing a HashFunction. As we
will see in the next section, not all hash table sizes are appropriate for
every HashFunction implementation. For this reason, we allow the user to
select an approximate table size, but leave it up to the HashFunction to
determine the exact table size.
Our HashTable representation of Dictionary then consists of three
variables:
• hash: a HashFunction whose associated table size is some positive
integer m;
• table[0..m − 1]: an array of ConsLists; and
• size: a readable Nat.
Our structural invariant is that:
• for 0 ≤ i < hash.Size(), table[i] is a ConsList containing only Keyed
items;
• for each Keyed item x in table[i], 0 ≤ i < m, hash.Index(x.Key()) = i;
and
• the total number of Keyed items in the ConsLists is given by size.
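The following Python sketch mirrors this representation, with ordinary Python lists standing in for the array of ConsLists and a plain function standing in for the HashFunction ADT; the class and method names are illustrative, not those of the book's HashTable.

```python
class ChainedHashTable:
    """A dictionary implemented with chaining: a hash function, an array of
    lists (standing in for ConsLists), and a size counter, as described above."""

    def __init__(self, hash_function, table_size):
        self.hash = hash_function              # maps a key to an index in [0, table_size)
        self.table = [[] for _ in range(table_size)]
        self.size = 0

    def get(self, key):
        # Only the list at the key's index needs to be searched.
        for k, value in self.table[self.hash(key)]:
            if k == key:
                return value
        return None

    def put(self, key, value):
        bucket = self.table[self.hash(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)       # replace an existing item
                return
        bucket.append((key, value))
        self.size += 1

# Example usage with the division method as the hash function.
t = ChainedHashTable(lambda k: k % 13, 13)
t.put(27, "a"); t.put(40, "b")
print(t.get(27), t.get(40), t.get(99))         # -> a b None
```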
Theorem 7.1. Let T be a hash table with m locations, and suppose the
universe U of possible keys contains more than m(n − 1) elements. Then for
any function h mapping U to natural numbers less than m, there is some
natural number i < m such that h maps at least n keys in U to i.
The proof of the above theorem is simply the observation that if it were
not true — i.e., if h maps at most n − 1 elements to each i — then the size
of U could be at most m(n − 1). Though this result looks bad, what it tells
us is that we really want h to produce a random distribution of the keys so
that the list lengths are more evenly distributed throughout the table.
For the remainder of this section, therefore, we will assume that the
key distribution is modeled by a discrete probability space hashDist. The
elementary events in hashDist are the same as those in the probability
distribution defined above: all n-tuples of natural numbers less than m.
Again, the n positions in the tuple correspond to n keys, and their values
give their indices in the hash table. Regarding probabilities, however, we
will make a weaker assumption, namely, the probability that any two given
distinct positions are equal is at most ε, where 0 < ε < 1. Our earlier
probability space satisfies this property for ε = 1/m, but we will see in
Sections 7.4 and 7.5 that other spaces do as well.
In what follows, we will analyze the expected length of the ConsList
searched for an arbitrary key, assuming a distribution modeled by hashDist.
In the next section we will show how to define deterministic hash functions
that approximate this distribution well enough to work very well in practice.
Then in Sections 7.4 and 7.5, we will show how to guarantee this behavior
using randomization.
For a given search in the hash table, suppose there are a total of n keys
in the table together with the key for which we are searching. Thus, if the
given key is in the hash table, there are n keys in the hash table; otherwise,
there are n − 1. We will use hashDist to model the distribution of these n
keys, where the nth key is the one for which we are searching. Let len be the
discrete random variable giving the number of positions equal to position n in
a given element of hashDist. Then if the given key is in the hash table, E[len]
gives the expected length of the ConsList searched; otherwise, E[len] − 1
gives this expected length.
We can express len as the sum of indicator random variables as follows:
$$len = \sum_{j=1}^{n} I(i_j = i_n),$$
so that
$$E[len] = \sum_{j=1}^{n} P(i_j = i_n) \le 1 + \varepsilon(n - 1).$$
The above value is the expected length of the ConsList searched when
the key is found in a table containing n keys. If the key is not in the table,
n − 1 gives the number of keys in the table, and E[len] is one greater than
the expected length of the ConsList. Thus, if we let n denote the number of
keys in the table, the expected length of the ConsList searched is at most 1 + εn.
In either case, the length of the ConsList is linear in n if ε is a fixed
constant. However, ε may depend upon m. Thus, if ε ≤ c/m for some positive
constant c and we use an expandable array for the table, we can keep the
expected length bounded by a constant. Let λ = n/m be known as the load
factor of the hash table. Using the expandable array design pattern, we can
ensure that λ ≤ d, where d is a fixed positive real number of our choosing.
Thus, the expected list length is bounded by
$$1 + \varepsilon n \le 1 + cn/m = 1 + c\lambda \le 1 + cd \in O(1).$$
We will assume that our keys are represented as natural numbers. This
assumption does not result in any loss of generality, because all data types
can be viewed as sequences of bytes, or more generally, as w-bit components.
We can view each component as a natural number less than 2^w. The sequence
k_1, ..., k_l then represents the natural number
$$\sum_{i=1}^{l} k_i\, 2^{w(l-i)};$$
the division method then computes
$$h(k) = k \bmod m,$$
where m is the size of the hash table.
We can therefore compute h(k) bottom-up by starting with the first component
of k and repeatedly multiplying by 2^w, adding the next component,
and taking the result mod m.
The division method is illustrated in Figure 7.7, where an implementa-
tion of HashFunction is presented. The representation of HashFunction
is a Nat size, and the structural invariant is size > 0. We assume the
existence of a function ToArray(x, w), which returns an array of Nats,
each strictly less than 2w , and which together give a representation of x. It
is easily seen that Index runs in time linear in the length of the key.
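A minimal sketch of this bottom-up computation follows, with Python's int.to_bytes standing in for ToArray and 8-bit components assumed; it is an illustration of the division method, not the code of Figure 7.7.

```python
def division_hash(key, m):
    """Division-method hash h(k) = k mod m of a natural number key, computed
    bottom-up over the key's 8-bit components: start with the first component,
    then repeatedly multiply by 2**8, add the next component, and reduce mod m."""
    components = key.to_bytes(max(1, (key.bit_length() + 7) // 8), "big")
    h = 0
    for c in components:
        h = (h * 256 + c) % m
    return h

# The bottom-up computation agrees with reducing the whole key directly.
key = 123456789012345678901234567890
assert division_hash(key, 1000003) == key % 1000003
print(division_hash(key, 1000003))
```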
$$\sum_{i=1}^{l} 256^{l-i} k_i = \sum_{i=1}^{l} (255 + 1)^{l-i} k_i
  = \sum_{i=1}^{l} \sum_{j=0}^{l-i} \binom{l-i}{j} 255^j k_i.$$
Each term of the inner sum such that j > 0 is divisible by 255; hence,
computing the key mod 255 yields:
$$\left(\sum_{i=1}^{l} 256^{l-i} k_i\right) \bmod 255 = \left(\sum_{i=1}^{l} k_i\right) \bmod 255.$$
where w is the number of bits in a machine word, excluding any sign bit. The
final "mod 2^w" describes the effect of overflow in a w-bit unsigned integer.
Thus, if an unsigned integer is used, this operation need not be explicitly
performed.
This gives us a top-down solution that can be applied bottom-up in the same
way as we applied the division method directly to large keys. Specifically,
we start with k1 and repeatedly multiply by r and add the next ki . This
procedure requires one multiplication and one addition for each component
of the key. Furthermore, all computation can be done with single-word
arithmetic.
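The following sketch carries out this procedure on a sequence of byte components, with masking to w bits standing in for the overflow of a w-bit unsigned word; the particular multiplier shown is only a placeholder, chosen according to the guidelines discussed next.

```python
def polynomial_hash(components, r=0x83B58EAD, w=32):
    """Polynomial hashing of a sequence of w-bit components: start with the
    first component and repeatedly multiply by r and add the next one.
    Masking to w bits plays the role of the implicit 'mod 2**w' caused by
    overflow of a w-bit unsigned word.  The default r is only illustrative;
    as discussed below, r should satisfy r mod 8 in {3, 5} and should not be
    too close to 0 or to 2**w."""
    mask = (1 << w) - 1
    h = 0
    for c in components:
        h = (h * r + c) & mask
    return h

print(polynomial_hash(b"hashing"))
print(polynomial_hash(b"hasighn"))   # a permutation of the key no longer collides
```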
In order for this method to work well, r must be chosen properly. We first
note that 256 is a poor choice, because 256^i mod 2^w = 0 for all i ≥ w/8;
thus only the first w/8 components of the key are used in computing the hash
value. More generally, r should never be even, because (c·2^j)^i mod 2^w = 0
for j > 0 and i ≥ w/j. Furthermore, not all odd values work well. For
example, r = 1 yields r^i = 1 for all i, so that the result is simply the
sum of the components, mod 2^w. This has the disadvantage of causing all
permutations of a key to collide.
More generally, if r is odd, r^i mod 2^w will repeat its values in a cyclic
fashion. In other words, for every odd r there is a natural number n such
that r^(n+i) mod 2^w = r^i mod 2^w for all i ∈ N. Fortunately, there are only a few values
of r (like 1) that have short cycles. In order to avoid these short cycles, we
would like to choose r so that this cycle length is as large as possible. It is
beyond the scope of this book to explain why, but it turns out that this cycle
length is maximized whenever r mod 8 is either 3 or 5.
We can run into other problems if r is small and the component size
is smaller than w. Suppose, for example, that r = 3, w = 32, and each
component is one byte. For any key containing fewer than 15 components,
the polynomial-hash value will be less than 2^31. We have therefore reduced
the range of possible results by more than half — much more for shorter
keys. As a result, more collisions than necessary are introduced. A similar
phenomenon occurs if r is very close to 2^w.
If we avoid these problems, polynomial hashing usually works well as
a compression map. To summarize, we should choose r so that r mod 8 is
either 3 or 5, and not too close to either 0 or 2^w. This last condition can
at least lg m^(2^l) = 2^l lg m bits. If, for example, each key is 32 bits and our hash
table size is 256, four gigabytes of storage would be needed just to identify
the hash function.
Instead, we will randomly generate a table location for each of the l bit
positions. Let these locations be t1 , . . . , tl . We will assume that m is a power
of 2 so that each of these locations is encoded using lg m bits. A given key
k will select the subsequence of t1 , . . . , tl such that ti is included iff the
ith bit of k is a 1. Thus, each key selects a unique subsequence of locations.
The hash table location of k is then given by the bitwise exclusive-or of the
locations in the subsequence; in other words, the binary encoding of the hash
location has a 1 in position j iff the number of selected locations having a 1
in position j is odd.
Example 7.1. Suppose our keys contain 4 bits, and we want to use a hash
table with 8 locations. We then randomly generate 4 table locations, one for
each of the 4 bit positions in the keys:
• t1 = 3, or 011 in binary;
• t2 = 6, or 110 in binary;
• t3 = 0, or 000 in binary;
• t4 = 3, or 011 in binary.
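A sketch of the resulting hash function follows, using the four locations of Example 7.1. The convention that the leftmost bit of the key is "bit 1" is an assumption made for the sketch; either convention yields a function of the same form.

```python
def xor_hash(key, locations, l):
    """Hash of an l-bit key: XOR together the randomly chosen table locations
    t_1, ..., t_l that correspond to the 1 bits of the key.  locations[0] is
    taken to correspond to the most significant (leftmost) bit."""
    h = 0
    for i in range(l):
        if key & (1 << (l - 1 - i)):     # is bit i (counting from the left) a 1?
            h ^= locations[i]
    return h

# The four table locations generated in Example 7.1 (t1=3, t2=6, t3=0, t4=3).
t = [3, 6, 0, 3]
# For instance, the key 1010 selects t1 and t3, so it hashes to 3 XOR 0 = 3,
# and the key 0110 selects t2 and t3, so it hashes to 6 XOR 0 = 6.
print(xor_hash(0b1010, t, 4), xor_hash(0b0110, t, 4))   # -> 3 6
```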
$$\frac{p}{2} + \frac{1 - p}{2} = \frac{1}{2}.$$
We now define
$$H^1_{l,m} = \{h_s \mid s \in S_{l,m}\}.$$
We can show that for every distinct k, k′ ∈ U, P(h(k) = h(k′)) = 1/m, so that H^1_{l,m} is universal.
i + 1 through j at that time. Note that neither of these strategies adds any
significant overhead — they simply delay the generation of the bit strings.
We leave the implementation details as an exercise.
Theorem 7.3. Let a, b, and m be natural numbers such that 0 < a < m
and b < m. Then the equation
ai mod m = b
has a unique solution in the range 0 ≤ i < m iff a and m are relatively prime
(i.e., 1 is the greatest common divisor of a and m).
Proof. Because we will only need to use this theorem in one direction, we
will only prove one implication and leave the other as an exercise.
$$ai - q_1 m = aj - q_2 m$$
$$a(i - j) = (q_1 - q_2)m,$$
For our next universal family, we will interpret the keys as natural
numbers and assume that there is some maximum value for a key. Let
p be a prime number strictly larger than this maximum key value. Our
hash functions will consist of two steps. The first step will map each key
to a unique natural number less than p. We will design this part so that,
depending on which hash function is used, a distinct pair of keys will be
mapped with uniform probability to any of the pairs of distinct natural
numbers less than p. The second step will apply the division method to scale
the value to an appropriate range.
For the first step, let
$$h_{p,a,b}(k) = (ak + b) \bmod p$$
for a and b strictly less than p. Consider distinct keys k and k′. We then have
$$(h_{p,a,b}(k) - h_{p,a,b}(k')) \bmod p = ((ak + b) \bmod p - (ak' + b) \bmod p) \bmod p = a(k - k') \bmod p,$$
$$j = a(k - k') \bmod p = (h_{p,a,b}(k) - h_{p,a,b}(k')) \bmod p,$$
Lemma 7.4. Let p be a prime number, and let k and k′ be distinct natural
numbers strictly less than p. If a and b are chosen independently and
uniformly such that 1 ≤ a < p and 0 ≤ b < p, then h_{p,a,b}(k) and h_{p,a,b}(k′)
are any pair of distinct natural numbers less than p with uniform probability.
$$f_m(i) = i \bmod m,$$
$$H^2_{p,m} = \{f_m \circ h_{p,a,b} \mid 0 < a < p,\ 0 \le b < p\},$$
Proof. Let k and k′ be two distinct keys. As we argued above, h_{p,a,b}(k)
and h_{p,a,b}(k′) are distinct natural numbers less than p, and each possible
pair of distinct values can be obtained by exactly one pair of values for a
and b. f_m(h_{p,a,b}(k)) = f_m(h_{p,a,b}(k′)) iff h_{p,a,b}(k) mod m = h_{p,a,b}(k′) mod m
iff h_{p,a,b}(k) − h_{p,a,b}(k′) is divisible by m. For any natural number i < p, there
are strictly fewer than p/m natural numbers j < p (other than i) such that
i − j is divisible by m. Because the number of these values of j is an integer,
it is at most (p − 1)/m. Because there are p possible values of h_{p,a,b}(k) and
p(p − 1) possible pairs of values for h_{p,a,b}(k) and h_{p,a,b}(k′), each of which is
equally likely, the probability that f_m(h_{p,a,b}(k)) = f_m(h_{p,a,b}(k′)) is at most
$$\frac{p \cdot \frac{p-1}{m}}{p(p-1)} = \frac{1}{m}.$$
Note that by the above theorem, H^2_{p,m} is universal for any positive m.
As a result, the size of the hash table does not need to be a particular kind
of number, such as a prime number or a power of 2, in order for this strategy
to yield good expected performance. However, the restriction that p is a
prime number larger than the value of the largest possible key places some
limitations on the effectiveness of this approach. Specifically, if there is no
upper bound on the length of a key, we cannot choose a p that is guaranteed
to work. Furthermore, even if an upper bound is known, unless it is rather
small, the sizes of p, a, and b would make the cost of computing the hash
function too expensive.
Let us therefore treat keys as sequences of natural numbers strictly
smaller than some value p, which we presume to be not too large (e.g.,
small enough to fit in a single machine word). Furthermore, let us choose p
to be a prime number. Let k1 , . . . , kl be a key, and let s = a1 , . . . , al be a
sequence of natural numbers, each of which is strictly less than p. We then
define
$$h_{p,s}(k_1, \ldots, k_l) = \left(\sum_{i=1}^{l} a_i k_i\right) \bmod p.$$
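A sketch of h_{p,s} follows, with the multipliers a_1, ..., a_l drawn at random; reducing mod p after each addition, as discussed below, keeps the intermediate values small. The particular prime and the use of Python's random module are only illustrative.

```python
import random

def make_hps(p, l, rng=random.Random(42)):
    """Randomly choose s = (a_1, ..., a_l) with 0 <= a_i < p and return the
    function h_{p,s}(k_1, ..., k_l) = (sum of a_i * k_i) mod p."""
    s = [rng.randrange(p) for _ in range(l)]
    def h(key):
        total = 0
        for a, k in zip(s, key):
            total = (total + a * k) % p    # reduce after each addition
        return total
    return h

p = 2**31 - 1                # an example prime that fits in a machine word
h = make_hps(p, l=4)
print(h((10, 20, 30, 40)), h((10, 20, 30, 41)))
```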
We first observe that we cannot guarantee that h_{p,s}(k) ≠ h_{p,s}(k′) for
each distinct pair of keys k and k′. The reason for this is that there are
potentially more keys than there are values of h_{p,s}. However, suppose k and
k′ are distinct keys, and let k_i ≠ k′_i, where 1 ≤ i ≤ l. Let us arbitrarily fix
the values of all a_j such that j ≠ i, and let
$$c = \left(\sum_{j=1}^{i-1} a_j k_j + \sum_{j=i+1}^{l} a_j k_j - \sum_{j=1}^{i-1} a_j k'_j - \sum_{j=i+1}^{l} a_j k'_j\right) \bmod p.$$
Then
$$(h_{p,s}(k) - h_{p,s}(k')) \bmod p = \left(\sum_{j=1}^{l} a_j k_j - \sum_{j=1}^{l} a_j k'_j\right) \bmod p$$
We now define
$$H^3_{p,l} = \{h_{p,s} \mid s = a_1, \ldots, a_l,\ 0 \le a_i < p \text{ for } 1 \le i \le l\}. \qquad (7.2)$$
If we know in advance the approximate size of our data set and the
maximum key length, we can select an appropriate prime value for p and
randomly select the appropriate hash function from H^3_{p,l}. Because we can
apply the mod operation after each addition, we are always working with
values having no more than roughly twice the number of bits as p; hence, we
can compute this hash function reasonably quickly for each key. Furthermore,
even if we don’t know the maximum key length, we can generate the
multipliers ai as we need them.
However, if we don’t know in advance the approximate size of the data
set, we may need to use rehashing. For the sake of efficiency, we would like to
avoid the need to apply a new hash function to the entire key. Furthermore,
as we will see in the next section, it would be useful to have a universal family
that is appropriate for large keys and for which the table size is unrestricted.
A straightforward attempt to achieve these goals is to combine H^3_{p,l} with
H^2_{p,m}. Specifically, we define
$$H^4_{p,l,m} = \{h_1 \circ h_2 \mid h_1 \in H^2_{p,m},\ h_2 \in H^3_{p,l}\}.$$
By Lemma 7.6, we can expect the function drawn from H^3_{p,l} to cause two
distinct keys to collide with probability 1/p. When the function from H^2_{p,m} is applied to
equal values, it yields equal values. We must therefore be careful in analyzing
the probability of collisions for H^4_{p,l,m}.
Let us first consider the case in which two distinct keys k and k′ are
mapped to distinct values by h_{p,s} ∈ H^3_{p,l}. From Lemma 7.6, the probability
that this case occurs is
$$1 - \frac{1}{p} = \frac{p-1}{p}.$$
Furthermore, from Lemma 7.4, h_{p,a,b}(h_{p,s}(k)) and h_{p,a,b}(h_{p,s}(k′)) are with
uniform probability any pair of distinct natural numbers less than p, provided
a and b are chosen independently with uniform probability such that 1 ≤
a < p and 0 ≤ b < p. Because there are p(p − 1) pairs of distinct natural
numbers less than p, this probability is
$$\frac{1}{p(p-1)}.$$
Therefore, given any two distinct keys k and k′, and any two distinct natural
numbers i and j strictly less than p, the probability that h_{p,a,b}(h_{p,s}(k)) = i
and h_{p,a,b}(h_{p,s}(k′)) = j is
$$\frac{p-1}{p} \cdot \frac{1}{p(p-1)} = \frac{1}{p^2}.$$
Now consider the case in which h_{p,s}(k) = h_{p,s}(k′). From Lemma 7.6, this
case occurs with probability 1/p. For any value of a, 1 ≤ a < p, and any
value of i, 0 ≤ i < p, there is exactly one value of b such that 0 ≤ b < p and
$$(a\, h_{p,s}(k) + b) \bmod p = i.$$
Thus, each value of i is reached with probability 1/p. Therefore, for each
natural number i < p, the probability that h_{p,a,b}(h_{p,s}(k)) = h_{p,a,b}(h_{p,s}(k′)) =
i is 1/p².
Thus, for a hash function h chosen from H^4_{p,l,m}, h(k) = i mod m and
h(k′) = j mod m, where i and j are natural numbers less than p chosen
independently with uniform probability. Furthermore, i mod m = j mod m
iff i − j is divisible by m. Because p − (p mod m) is divisible by m, for any i,
exactly 1 of every m values j such that 0 ≤ j < p − (p mod m) is such that
i − j is divisible by m. Likewise, for any j, exactly 1 of every m values i
such that 0 ≤ i < p − (p mod m) is such that i − j is divisible by m (see
Figure 7.9). Thus, of the p² − (p mod m)² pairs in which at least one value
is less than p − (p mod m), exactly
$$\frac{p^2 - (p \bmod m)^2}{m}$$
pairs result in collisions. Of the remaining (p mod m)² pairs, only those in
which i = j result in collisions. There are exactly p mod m such pairs. Thus,
the probability of a collision is exactly
$$\frac{\dfrac{p^2 - (p \bmod m)^2}{m} + (p \bmod m)}{p^2}. \qquad (7.4)$$
However, recall from Section 7.2 that Θ(1) amortized expected perfor-
mance can be achieved using rehashing if the probability of collisions is
bounded by c/m for some positive real number c. We therefore define a
family of hash functions to be c-universal if for each pair of distinct keys,
the probability of a collision is at most c/m. In what follows, we will derive
a c such that H^4_{p,l,m} is c-universal whenever 1 < m < p.
Specifically, we need to find a real number c such that whenever p is
prime and 1 < m < p,
or equivalently, to minimize
There are several ways to find the minimum value of a quadratic, but
one way that does not involve calculus is by the technique of completing the
square. A quadratic of the form (ax − b)² is clearly nonnegative for all values
of a, x, and b. Furthermore, it reaches a value of 0 (its minimum) at x = b/a.
We can therefore minimize f (m) by finding a value d such that f (m) − d is
of the form
Thus, −f(m) — and hence the numerator of the second term in the right-hand
side of (7.5) — is never more than p²/8. Furthermore, this value is
achieved (assuming for the moment that m varies over the real numbers)
when
$$m = \frac{\;\dfrac{3p}{2\sqrt{2}}\;}{\sqrt{2}} = \frac{3p}{4}.$$
We conclude that the right-hand side of (7.5) is bounded above by
$$1 + \frac{p^2/8}{p^2} = 9/8.$$
We therefore have the following theorem.
Theorem 7.8. For any prime number p and positive integers l and m such
that 1 < m < p, H^4_{p,l,m} is 9/8-universal.
The upper bound of 9/8 can be reached when m = 3p/4; however, in
order for this equality to be satisfied, p must be a multiple of 4, and hence
cannot be prime. We can, however, come arbitrarily close to this bound by
using a sufficiently large prime number p and setting m to either ⌊3p/4⌋ or
⌈3p/4⌉. Practically speaking, though, such values for m are much too large.
In practice, m would be much smaller than p, and as a result, the actual
probability of a collision would be much closer to 1/m.
By choosing p to be of an appropriate size, we can choose a single h of
the form
$$h(k) = \left(a \sum_{i=1}^{l} a_i k_i + b\right) \bmod p,$$
and store the value h(k) with each key k in the hash
table. We can then compute the new hash values for each k by looking up
h(k) and computing h(k) mod 2m.
If p = 2^31 − 1 and m is a power of 2, then p mod m = m − 1. Substituting
this value into (7.4), we see that the probability of a collision is
$$\frac{1}{m} + \frac{m-1}{mp^2} < \frac{1}{m} + \frac{1}{(2^{31}-1)^2} < \frac{1}{m} + 2^{-61}.$$
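The following sketch puts these pieces together for this specific choice of p: a single h of the form above is fixed once, its value is stored with each key, and the table index is obtained by a final mod, so that doubling the table only changes that last step. The names and the use of Python's random module are illustrative, not the book's.

```python
import random

P = 2**31 - 1       # the prime suggested above; p mod m = m - 1 when m is a power of 2

def make_hash(l, rng=random.Random(0)):
    """Choose a, b, and the multipliers a_1, ..., a_l once, and return a
    function computing h(k) = (a * (sum of a_i k_i) + b) mod P."""
    a = rng.randrange(1, P)
    b = rng.randrange(P)
    mult = [rng.randrange(P) for _ in range(l)]
    def h(key):
        total = 0
        for ai, ki in zip(mult, key):
            total = (total + ai * ki) % P
        return (a * total + b) % P
    return h

h = make_hash(l=3)
m = 1024                               # current table size (a power of 2)
key = (17, 255, 42)
stored = h(key)                        # store this value alongside the key
print(stored % m, stored % (2 * m))    # index now, and after the table doubles
```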
can then guarantee that accesses will be fast. To achieve this goal, we use a
technique called perfect hashing.
One of the drawbacks to hashing is that we can't guarantee that there
will be no collisions. In fact, we can't even guarantee that the keys don't all
hash to the same location. Universal hashing gives us an expectation that the
resulting hash table will not have too many collisions. Thus, even though we
might be unlucky and choose a hash function that yields poor performance
on our data set, if we randomly select several different hash functions, we
can expect to find one that yields a small number of collisions.
With perfect hashing, our goal is to produce a hash table with no
collisions. Unfortunately, as we saw in Section 7.2, unless the size of the
hash table is much larger than the number of keys, we can expect to have
at least one collision. With a reasonable table size, we would probably need
to try many different hash functions before we found one that yielded no
collisions.
We can avoid this difficulty, however, by employing a two-level approach
(see Figure 7.11). Instead of using a ConsList to store all of the elements
that hash to a certain location, we use a secondary hash table with its own
hash function. The secondary hash tables that store more than one element
are much larger than the number of elements they store. As a result, we
will be able to find a hash function for each secondary hash table such that
no collisions occur. Furthermore, we will see that the sizes of the secondary
hash tables can be chosen so that the total number of locations in all of the
hash tables combined is linear in the number of elements stored.
Let us first determine an appropriate size m for a secondary hash table
in which we need to store n distinct keys. We saw in Section 7.2 that in order
for the expected number of collisions to be less than 1, if the probability that
two keys collide is 1/m, then m must be nearly n². We will therefore assume
that m ≥ n².
Let Hm be a c-universal family of hash functions. We wish to determine
an upper bound on the number of hash functions we would need to select
from Hm before we can expect to find one that produces no collisions among
the given keys. Let coll be the discrete random variable giving the total
number of collisions, as defined in Section 7.2, produced by a hash function
h ∈ H_m on distinct keys k_1, ..., k_n. As we showed in Section 7.2,
$$E[coll] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} P(h(k_i) = h(k_j)).$$
[Figure 7.11: A two-level perfect hash table. The primary table maps keys such as 12, 57, 51, 15, 64, 27, 24, 36, 16, and 83 to secondary tables, each of which uses its own hash function chosen so that no collisions occur.]
Because the probability that any two distinct keys collide is no more than
c/m ≤ c/n², we have
$$E[coll] \le \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{c}{n^2}
           = \frac{c}{n^2} \sum_{i=1}^{n-1} (n - i)
           = \frac{c}{n^2} \sum_{i=1}^{n-1} i \quad \text{(reversing the sum)}
           = \frac{cn(n-1)}{2n^2} \quad \text{(by (2.1))}
           < c/2.$$
From Markov’s Inequality (5.3) on page 194, the probability that there is at
least one collision is therefore less than c/2.
Suppose, for example, that c = 1, as for a universal hash family. Then the
probability that a randomly chosen hash function results in no collisions is
greater than 1/2. If c = 9/8, as for H^4_{p,l,m}, then the probability is greater than
7/16. Suppose we repeatedly select hash functions and try storing the keys
in the table. Because the probability that there are no collisions is positive
whenever c < 2, we will eventually find a hash function that produces no
collisions.
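The following sketch illustrates this trial-and-error process for one secondary table: it repeatedly draws a hash function until the given keys (imagined to have collided at one primary location) are mapped into a table of size n² with no collision. The family ((ak + b) mod p) mod m used here is a stand-in for the universal families of Sections 7.4 and 7.5, and the prime is only an example.

```python
import random

def find_perfect_secondary(keys, rng=random.Random(1)):
    """Repeatedly choose a hash function for a secondary table of size
    m = n*n until it maps the given distinct keys with no collisions."""
    n = len(keys)
    m = max(1, n * n)
    p = 2**61 - 1                       # a prime larger than any key we expect
    attempts = 0
    while True:
        attempts += 1
        a = rng.randrange(1, p)
        b = rng.randrange(p)
        h = lambda k, a=a, b=b: ((a * k + b) % p) % m
        if len({h(k) for k in keys}) == n:      # all indices distinct
            return h, m, attempts

keys = [12, 57, 51, 15, 64, 27, 24, 36, 16, 83]
h, m, attempts = find_perfect_secondary(keys)
print(m, attempts, sorted(h(k) for k in keys))
```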
Let us now determine how many hash functions we would expect to
try before finding one that results in no collisions. Let reps be the discrete
random variable giving this number. For a given positive integer i, P(reps ≥
i) is the probability that i − 1 successive hash functions fail; i.e.,
$$P(reps \ge i) < (c/2)^{i-1}.$$
Suppose c < 2. Then we can re-index the sum to begin at 0 and apply
Theorem 6.7, yielding
$$E[reps] < \sum_{i=0}^{\infty} (2/c)^{-i} = \frac{2/c}{(2/c) - 1} = \frac{2}{2 - c}.$$
Note that the above value is a fixed constant for fixed c < 2. Thus,
the expected number of attempts at finding an appropriate secondary hash
function is bounded by a fixed constant. For example, with c = 1, the value
of this constant is less than 2, or with c = 9/8, the value is less than 16/7.
As a result, we would expect that the number of times a secondary hash
function is applied to any key during the process of placing keys in secondary
hash tables is bounded by a constant.
We must now ensure that the total space used by the primary and
secondary hash tables (and hence the time needed to initialize them) is
linear in n, the total number of keys. Suppose the primary hash table has
Let sumsq be a discrete random variable denoting the above sum. The
expected space usage of the secondary hash tables is then linear in E[sumsq].
In order to analyze E[sumsq], we first observe that n_i² is closely related to
the number of collisions at index i. The number of collisions at index i is
n_i(n_i − 1)/2, so that
$$E[coll] = E\!\left[\sum_{i=0}^{m-1} \frac{n_i(n_i - 1)}{2}\right]
          = \frac{1}{2}\left(E\!\left[\sum_{i=0}^{m-1} n_i^2\right] - E\!\left[\sum_{i=0}^{m-1} n_i\right]\right)
          = (E[sumsq] - E[n])/2
          = (E[sumsq] - n)/2.$$
Thus,
$$E[sumsq] = 2E[coll] + n.$$
Because the hash family is c-universal,
$$E[coll] \le \frac{cn(n-1)}{2m}.$$
Hence,
$$E[sumsq] = 2E[coll] + n \le \frac{cn(n-1)}{m} + n. \qquad (7.6)$$
• if table[i] ≠ nil, then the array stored there is indexed 0..s − 1, where s is
the size of functions[i];
• if an element with key k is stored at table[i][j], then hash.Index(k) = i
and functions[i].Index(k) = j; and
• size = n, the total number of keys stored.
$$\sum_{i=0}^{m-1} \Theta(n_i^2 + n_i f(l))
  = \sum_{i=0}^{m-1} \Theta(n_i^2) + \sum_{i=0}^{m-1} \Theta(n_i f(l))
  = \Theta\!\left(\sum_{i=0}^{m-1} n_i^2\right) + \Theta\!\left(f(l) \sum_{i=0}^{m-1} n_i\right)
  = \Theta((c+1)n) + \Theta(n f(l))
  = \Theta(n f(l)).$$
The total expected running time of the constructor is therefore in Θ(nf (l)).
Thus, for H^2_{p,m}, the constructor runs in Θ(n) expected time, and for
H^4_{p,l,m}, the constructor runs in Θ(nl) expected time. It is not hard to show
that the constructor runs in Θ(nl) expected time for H^1_{l,m} as well; the details
are left as an exercise.

function with size less than 2n, and the constructors for the families H^2_{p,m}
and H^4_{p,l,m} both return hash functions with size n. Furthermore, if we were
to fix a specific c-universal family of hash functions, we could reduce the
bound on the first repeat loop to 2(c + 1)n.
Combining the above results, we see that the worst-case total number of
array locations can be reduced to:
• 10n for H^1_{l,m};
• 5n for H^2_{p,m}; or
• 21n/4 for H^4_{p,l,m}.
Finally, we observe that because E[sumsq] < (c + 1)n, the expected total
number of array locations is no more than
• 6n for H^1_{l,m};
• 3n for H^2_{p,m}; or
• 25n/8 for H^4_{p,l,m}.
These last bounds hold regardless of whether we change the bound on the
first repeat loop.
The Get operation is shown in Figure 7.13. It clearly runs in Θ(f (l))
time, where f (l) is the time needed to compute the hash function on a key
of length l.
7.7 Summary
If keys are natural numbers, we can implement Dictionary using a VArray
and thus achieve constant-time accesses in the worst case. However, the space
usage of a VArray makes it impractical. For this reason, hash tables are
the preferred implementation in practice. Furthermore, hashing can be done
for arbitrary types of keys.
7.8 Exercises
Exercise 7.1. Prove that VArray, shown in Figure 7.2, meets its
specification.
Exercise 7.2. Give an algorithm that takes as input an array A[1..n] of
natural numbers and returns an array B[1..n] such that for 1 ≤ i ≤ n, B[i]
gives the last location in A that contains A[i]. Your algorithm must run in
O(n) time in the worst case, and you may make no assumptions about how
large the elements in A are. Prove the correctness and time complexity of
your algorithm. [Hint: Use a VArray.]
Exercise 7.3. Complete the implementation of HashTable shown in
Figures 7.5 (p. 252) and 7.6 (p. 255) by adding a Remove operation as
specified in Figure 6.2 (p. 204).
Exercise 7.4. Prove that if the cost of rehashing, as implemented in
Figure 7.6 (p. 255), is amortized over all Put and Remove operations,
the amortized cost of rehashing is proportional to the cost of computing the
index for a single key.
Exercise 7.5. Prove the following for all integers x and y and all positive
integers m:
a. (x + (y mod m)) mod m = (x + y) mod m.
b. (x(y mod m)) mod m = (xy) mod m.
c. (−(x mod m)) mod m = (−x) mod m.
Exercise 7.6. Show the hash table that results from inserting the following
keys in the order listed, assuming the division method is used with a table
of size 13:
27, 36, 14, 40, 42, 15, 25, 2.
You may assume that no rehashing is done. How does the number of
collisions, as defined by the random variable coll in Section 7.2, compare with
the expected number, assuming that distinct keys collide with probability
1/13?
assume the variable p contains a prime number larger than any key. You
may also assume that all values will fit into integer variables.
Exercise 7.13. Implement HashFunction to provide H^3_{p,l}. You may
assume the variable p contains a prime number larger than w bits, where w
is another variable. You may also assume that if a, b, and c are all natural
numbers less than p, then ab + c will fit in an integer variable; however,
you may not assume that arbitrarily many of these values added together
will fit.
that for each ai , 1 ≤ ai < p. Show that for every l ≥ 2 and prime number
p, the resulting family of hash functions is not universal. Specifically, show
that there are two distinct keys that collide with probability strictly greater
than 1/p. [Hint: First consider l = 2, then generalize.]
Exercise 7.15. Implement HashFunction to provide H^4_{p,l,m} using the
same assumptions as for Exercise 7.13.
$$h(k_1, \ldots, k_l) = \left(\left(a \sum_{i=1}^{l} a_i k_i + b\right) \bmod p\right) \bmod m$$
$$\frac{cn(n-1)}{m} + n + m,$$
m
Exercise 7.18. Prove that the constructor for PerfectHash runs in Θ(nl)
expected time if H^1_{l,m} is used as the universal hash family, where m is a power
of 2.
7.9 Notes
Virtual initialization was suggested by Aho et al. [2, Exercise 2.12].
The first description of hashing in the literature was by Dumey [33],
who also introduced the division method. However, the concept appears
to have been discovered a few years earlier at IBM by H. P. Luhn and
independently by Gene M. Amdahl, Elaine M. Boehme, N. Rochester, and
Arthur L. Samuel. Knuth [84] gives a detailed treatment of deterministic
hashing.
Universal hashing was introduced by Carter and Wegman [19]. They
presented the universal families H^1_{l,m} and H^2_{p,m}. The notion of a c-universal
family is closely related to the notion of an ε-universal family defined by
Cormen et al. [25].
The perfect hashing strategy given in Section 7.6 is due to Fredman
et al. [45].
Chapter 8
Disjoint Sets
In order to motivate the topic of this chapter, let us consider the following
problem. We want to design an algorithm to schedule a set of jobs on a
single server. Each job requires one unit of execution time and has its own
deadline. We must assign a job with deadline d to some time slot t, where
1 ≤ t ≤ d. Furthermore, no two jobs can be assigned to the same time slot. If
we can’t find a time slot for some jobs, we simply won’t schedule them. One
way to construct such a schedule is to assign each job in turn to the latest
available time slot prior to its deadline, provided there is such a time slot.
The challenge here is to find an efficient way of locating the latest available
time slot prior to the deadline.
One way to think about this problem is to partition the time slots into
disjoint sets — i.e., a collection of sets such that no two sets have any element
in common. In this case, each set will contain a non-empty range of time slots
such that the first has not been assigned to a job, but all the rest have been
assigned to jobs. In order to be able to handle the case in which time slot 1
has been assigned a job, we will also include a time slot 0, which we will
consider to be always available.
Suppose, for example, that we have scheduled jobs in time slots 1, 2, 5,
7, and 8. Each set must have a single available time slot, which must be the
smallest time slot in that set; thus, the elements 0, 3, 4, 6, and all elements
greater than 8 must be in different sets and must each be the smallest element
of its set. If 10 is the latest deadline, our disjoint sets will therefore be
{0, 1, 2}, {3}, {4, 5}, {6, 7, 8}, {9}, and {10}. If we then wish to schedule
a job with deadline 8, we need to find the latest available time slot prior
to 8. This is simply the first time slot in the set containing 8 — namely, 6.
Thus, in order to find this time slot, we need to be able to determine which
set contains the deadline 8, and what is the first time slot in that set.
When we then schedule the job at time slot 6, the set {6, 7, 8} no longer
contains an available time slot. We therefore need to merge the set {6, 7, 8}
with the set containing 5, namely, {4, 5}.
The operations of finding the set containing a given element and merging
two sets are typical of many algorithms that manipulate disjoint sets. The
operation of finding the smallest element of a given set is not as commonly
needed, so we will ignore this operation for now; however, as we will see
shortly, it is not hard to use an array to keep track of this information.
Furthermore, we often need to manipulate objects other than Nats; however,
we can always store these objects in an array and use their indices as the
elements of the disjoint sets. For this reason, we will simplify matters by
assuming that the elements of the disjoint sets are the Nats 0..n − 1. In
general, the individual sets will be allowed to contain non-consecutive integers.
The DisjointSets ADT, shown in Figure 8.1, specifies the data
structure we need. Each of the sets contains an element that is distinguished
as its representative. The Find operation simply returns that representative.
Thus, if two calls to Find return the same result, we know that both elements
belong to the same set. The Merge operation takes two representatives,
combines the sets identified by these elements, and returns the resulting
set’s representative. In this chapter, we will consider how the DisjointSets
ADT can be implemented efficiently. Before we do this, however, let us take
a closer look at how the DisjointSets ADT can be used to implement the
scheduling algorithm outlined above.
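Before examining implementations, the following Python sketch shows the ADT and the scheduling application in miniature. The union-by-rank and path-compression refinements it uses are developed later in this chapter, and the earliest array is the auxiliary array suggested above for tracking the smallest (available) time slot in each set; none of the names are the book's.

```python
class DisjointSets:
    """Disjoint sets over the elements 0..n-1, with Find and Merge.
    This sketch uses union by rank and path compression, refinements
    developed later in this chapter; a plain tree version would also work."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]   # path compression
            i = self.parent[i]
        return i

    def merge(self, i, j):
        # i and j must be representatives; returns the new representative.
        if self.rank[i] < self.rank[j]:
            i, j = j, i
        self.parent[j] = i
        if self.rank[i] == self.rank[j]:
            self.rank[i] += 1
        return i

def schedule(deadlines, max_deadline):
    """Assign each unit-time job to the latest free slot no later than its
    deadline, with slot 0 as the always-available sentinel described above."""
    sets = DisjointSets(max_deadline + 1)
    earliest = list(range(max_deadline + 1))    # smallest slot in each set
    assignment = {}
    for job, d in enumerate(deadlines):
        rep = sets.find(d)
        slot = earliest[rep]
        if slot > 0:                             # a free slot exists
            assignment[job] = slot
            other = sets.find(slot - 1)          # set holding the next lower slot
            new_rep = sets.merge(other, rep)
            earliest[new_rep] = earliest[other]  # that slot remains the free one
        # otherwise no slot <= d is free; the job is left unscheduled
    return assignment

print(schedule([2, 2, 1, 3, 3], 10))   # -> {0: 2, 1: 1, 3: 3}
```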
$$f(h) = 2f(h - 1).$$
$$h = \lg f(h) \le \lg k.$$
Let rank_s and parent_s denote the values of the rank and parent arrays
in state s. We will define our potential function based on these values. Let
s₀ denote the initial state. Thus, for 0 ≤ i < n, rank_{s₀}[i] = 0. In order for Φ to
be a valid potential function, we need Φ(s₀) = 0. To accomplish this, we let
φ_s(i) = 0 if rank_s[i] = 0, for 0 ≤ i < n and any state s. Note that a node can
only obtain a non-zero rank when a Merge makes it the parent of another
node; thus, rank_s[i] = 0 iff i is a leaf.
We have two operations we need to consider as we define φs (i) for non-
leaf i. Merge is a cheap operation, having an actual cost of 2, whereas Find
is more expensive in the worst case. We therefore need to amortize the cost
of an expensive Find over preceding Merges. This means that we need
we must attain a value of at least A_{f(s,i)+1}(rank_s[i]). Then if we define f(s, i)
to be the maximum k such that rank_s[parent_s[i]] ≥ A_k(rank_s[i]), we can
never have rank_s[parent_s[i]] ≥ A_{f(s,i)+1}(rank_s[i]).
We still need to define the functions Ak . In order to facilitate this
definition, we first define the iteration operator for functions. Let F : N → N.
We then define
$$F^{(0)}(n) = n, \qquad F^{(k)}(n) = F(F^{(k-1)}(n)) \text{ for } k > 0.$$
For example, if F(n) = 2n, then F^{(2)}(n) = 4n and F^{(3)}(n) = 8n; more
generally, F^{(k)}(n) = 2^k n.
We now define:
$$A_k(n) = \begin{cases} n + 1 & \text{if } k = 0 \\ A_{k-1}^{(n+1)}(n) & \text{if } k \ge 1. \end{cases}$$
We can then define, for each node i that is neither a leaf nor a root,
$$f(s, i) = \max\{k \mid rank_s[parent_s[i]] \ge A_k(rank_s[i])\}$$
and
$$g(s, i) = \max\{k \mid rank_s[parent_s[i]] \ge A_{f(s,i)}^{(k)}(rank_s[i])\}.$$
Finally, we need f(s, i) < α(n) whenever i is neither a leaf nor a root.
Thus, we need to ensure that whenever i is neither a leaf nor a root, we have
$$A_{\alpha(n)}(rank_s[i]) > rank_s[parent_s[i]].$$
We have shown that without path compression, the height of a tree never
exceeds lg n; hence, with path compression, the rank of a node never exceeds
lg n. It therefore suffices to define
$$\alpha(n) = \min\{k \mid A_k(1) > \lg n\}.$$
As the subscript k increases, A_k(1) increases very rapidly. We leave it as
an exercise to show that
$$A_4(1) \ge 2^{2^{\cdot^{\cdot^{\cdot^{2}}}}},$$
where there are 2051 2s on the right-hand side. It is hard to comprehend
just how large this value is, for if the right-hand side contained only six 2s,
the number of bits required to store it would be 2^65536 + 1. By contrast, the
number of elementary particles in the universe is currently estimated to be
no more than about 2^300. Hence, there is not nearly enough matter in the
universe to store A_4(1) in binary. Because α(n) ≤ 4 for all n < 2^{A_4(1)}, we
can see that α grows very slowly.
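The following sketch evaluates the functions A_k and α as defined above for small arguments. It is only a numerical illustration; as the comments note, it must never be asked to evaluate A_4.

```python
def A(k, n):
    """A_0(n) = n + 1, and A_k(n) = A_{k-1} iterated n+1 times starting
    from n, for k >= 1.  Do not call with k >= 4: A_4(1) is astronomically
    large and the computation would never finish."""
    if k == 0:
        return n + 1
    x = n
    for _ in range(n + 1):        # apply A_{k-1} a total of n+1 times
        x = A(k - 1, x)
    return x

def alpha(n):
    """alpha(n) = min{k | A_k(1) > lg n}.  Since A_3(1) = 2047, the loop
    stops at k <= 3 for every n < 2**2047."""
    lg = n.bit_length() - 1       # floor of lg n
    k = 0
    while A(k, 1) <= lg:
        k += 1
    return k

print([A(k, 1) for k in range(4)])     # -> [2, 3, 7, 2047]
print(alpha(10**9), alpha(2**2046))    # -> 3 3
```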
To summarize, we define our potential function Φ so that
$$\Phi(s) = \sum_{i=0}^{n-1} \phi_s(i),$$
where
$$\phi_s(i) = \begin{cases} 0 & \text{if } rank_s[i] = 0 \\ \alpha(n)\, rank_s[i] & \text{if } parent_s[i] = i \\ (\alpha(n) - f(s, i))\, rank_s[i] - g(s, i) & \text{otherwise,} \end{cases}$$
for α, f , and g as defined above. Before we can complete the amortized
analysis, we need to show that both f and g satisfy the properties outlined
in the discussion above.
Lemma 8.1. Let s be a state of a CompressedDisjointSets of size n,
and let 0 ≤ i < n be such that parent_s[i] ≠ i and rank_s[i] > 0. Then
Proof. First, because the rank of the parent of i is strictly larger than that
of i, we have
We are now ready to show that the amortized costs of Merge and Find
are in O(α(n)).
Theorem 8.3. With respect to Φ, the amortized cost of Merge on a
CompressedDisjointSets of size n is in O(α(n)).
Proof. Suppose we do Merge(i, j) in state s, yielding state s′. Without
loss of generality, assume j is made the parent of i. Then i is the only node
whose parent changes, and j is the only node whose rank may change; hence,
the potentials for all other nodes remain unchanged. The change in potential
for node i is given by
$$\phi_{s'}(i) - \phi_s(i) = (\alpha(n) - f(s', i))\, rank_{s'}[i] - g(s', i) - \alpha(n)\, rank_s[i]
  < \alpha(n)(rank_{s'}[i] - rank_s[i]) = 0.$$
Because f(s′, i) = k, g(s′, i) > g(s, i), so that φ_{s′}(i) < φ_s(i).
The above theorems show that the amortized running times of Merge
and Find are in O(α(n)). However, α appears to be a somewhat contrived
function. We have argued intuitively that α increases very slowly, but we
have not formally compared it with any better-known slow-growing function
like lg or lg lg. We address this issue more formally in the Exercises. For now,
we will simply state that the collection of functions Ak form a variation of
Ackermann’s function, and that α is one way of defining its inverse. There
have actually been several different 2- or 3-variable functions that have been
called Ackermann’s function, and all grow at roughly the same rapid rate.
8.5 Summary
Tree-based implementations of disjoint sets provide very efficient Merge
and Find operations, particularly when path compression is used. The worst-
case running times for these operations are in Θ(1) and Θ(lg n), respectively,
for both ShortDisjointSets and CompressedDisjointSets. The latter
implementation yields nearly constant amortized running time. A summary
of the running times of the operations for the different implementations is
shown in Figure 8.7. As we will see in later chapters, these structures are
very useful in the design of efficient algorithms.
Figure 8.7 Comparison of running times of the DisjointSets operations for various
implementations
8.6 Exercises
Exercise 8.1. Draw the trees that result from the following sequence of
operations:
t ← new TreeDisjointSets(8)
t.Merge(0, 1)
t.Merge(t.Find(1), 2)
t.Merge(3, 4)
t.Merge(5, 6)
t.Merge(t.Find(3), t.Find(6))
t.Merge(t.Find(3), t.Find(0))
8.7 Notes
The TreeDisjointSets implementation of DisjointSets is due to Galler
and Fischer [47]. The improvement of Section 8.3 is presented by Hopcroft
and Ullman [66], who credit McIlroy and Morris with having implemented it.
The improvement using path compression is credited to Tritter by Knuth
[82]. The amortized analysis of this structure yielding results similar to those
presented here was done by Tarjan [111,112]. The analysis given here is based
on the presentation by Cormen et al. [25], which is based on a proof due to
Kozen [87].
Exercise 8.12 is from Brassard and Bratley [17].
Chapter 9
Graphs
indicate the directions of the edges. Conventionally, we draw the edge (u, v)
as an arrow from u to v (see Figure 9.2). For a directed edge (u, v) we say
that v is adjacent to u, but not vice versa (unless (v, u) is also an edge in
the graph).
We usually want to associate some additional information with the
vertices and/or the edges. For example, if the graph is used to represent
distances between points on a map, we would want to associate a distance
with each edge. In addition, we might want to associate the name of a city
with each vertex. In order to simplify our presentation, we will focus our
attention on the edges of a graph and any information associated with them.
Specifically, as we did for disjoint sets in the previous chapter, we will adopt
the convention that the vertices of a graph will be designated by natural
numbers 0, . . . , n − 1. If additional information needs to be associated with
vertices, it can be stored in an array indexed by the numbers designating the
vertices. While some applications might require more flexibility, this scheme
is sufficient for our purposes.
We can therefore solve the original universal sink detection problem for a
nonempty graph by first finding a candidate vertex i as described above. We
know that if there is a universal sink, it must be i. We then check whether
i is a universal sink by verifying that for every j ≠ i, (j, i) is an edge but
(i, j) is not. The resulting algorithm is shown in Figure 9.4.
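The two-phase approach just described can be sketched as follows (Figure 9.4 gives the book's own pseudocode). The sketch assumes only a constant-time edge test, as an adjacency matrix provides; the names are illustrative.

```python
def universal_sink(n, edge):
    """Return the universal sink of a directed graph on vertices 0..n-1
    (a vertex with an incoming edge from every other vertex and no outgoing
    edges), or None if there is none.  edge(u, v) must be a constant-time
    test, as the adjacency-matrix Get operation provides.  The loop narrows
    the candidates to a single vertex, which is then verified directly."""
    i, j = 0, n - 1
    while i < j:
        if edge(i, j):
            i += 1        # i has an outgoing edge, so it cannot be a sink
        else:
            j -= 1        # j lacks an incoming edge from i, so j is ruled out
    for k in range(n):
        if k != i and (not edge(k, i) or edge(i, k)):
            return None
    return i

edges = {(0, 2), (1, 2), (3, 2)}
print(universal_sink(4, lambda u, v: (u, v) in edges))   # -> 2
```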
can be modeled by the directed acyclic graph shown in Figure 9.5. We need
to find an ordering of the vertices such that for every edge (u, v), u precedes
v in the ordering. Such an ordering is called a topological sort of the graph.
Examples of topological sorts of the graph in Figure 9.5 are B, A, C, D
and B, D, A, C. In this section, we will present an algorithm for finding a
topological sort of a given directed acyclic graph. First, we will show that
every directed acyclic graph has a topological sort.
Lemma 9.1. Every nonempty directed acyclic graph has at least one vertex
with no incoming edges.
Proof. By contradiction. Suppose every vertex in some nonempty directed
acyclic graph G has incoming edges. Then starting from any vertex, we may
always traverse an incoming edge backwards to its source. Because G has
finitely many vertices, if we trace a path in this fashion, we must eventually
repeat a vertex. We will have then found a cycle — a contradiction.
for loop therefore runs in Θ(n) time. Because it iterates n times, its running
time is in Θ(n²).
The first and third for loops clearly run in Θ(n) time. Furthermore, the
analysis of the fourth for loop is similar to that of the second. Therefore,
the entire algorithm runs in Θ(n²) time.
Note that the second and fourth for loops in TopSort each contain a
nested while loop. Each iteration of this while loop processes one of the
edges. Furthermore, each edge is processed at most once by each while
loop. The total number of iterations of each of the while loops is therefore
the number of edges in the graph. While this number can be as large as
n(n − 1) ∈ Θ(n²), it can also be much smaller.
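For concreteness, the following sketch has the same overall structure — one pass over all edges to count incoming edges, then a loop that repeatedly emits a vertex with no unprocessed incoming edges — though it is not the code of Figure 9.6. Each edge is handled a constant number of times, which is the property exploited in the analysis that follows.

```python
from collections import deque

def topological_sort(n, adj):
    """Topological sort of a directed acyclic graph on vertices 0..n-1,
    given as adjacency lists.  First count incoming edges (touching each
    edge once), then repeatedly remove a vertex whose remaining in-degree
    is zero (touching each edge once more)."""
    in_degree = [0] * n
    for u in range(n):
        for v in adj[u]:
            in_degree[v] += 1
    ready = deque(u for u in range(n) if in_degree[u] == 0)
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in adj[u]:
            in_degree[v] -= 1
            if in_degree[v] == 0:
                ready.append(v)
    assert len(order) == n, "the graph contains a cycle"
    return order

# A small DAG consistent with the orderings quoted earlier (B before A and D,
# and A before C); vertices A, B, C, D are numbered 0-3.
adj = [[2], [0, 3], [], []]
print(topological_sort(4, adj))   # -> [1, 0, 3, 2], i.e., B, A, D, C
```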
The number of edges does not affect the asymptotic running time,
however, because MatrixGraph.AllFrom runs in Θ(n) time, regardless of
how many edges it retrieves. If we can make this operation more efficient, we
might be able to improve the running time for TopSort on graphs with few
edges. In the next section, we will examine an alternative implementation
that accomplishes this.
runs in Θ(m) time. Note that Θ(m) ⊆ O(n). The space usage of ListGraph
is easily seen to be in Θ(n + a), where a is the number of edges in the graph.
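The two representations can be sketched as follows. The interface (put, get, all_from) loosely mirrors the Graph operations discussed in this chapter, but the code is only an illustration, not the book's MatrixGraph or ListGraph.

```python
class MatrixGraph:
    """Adjacency-matrix representation: edge data (or None) in an n x n array.
    get and put take constant time; all_from must scan a whole row."""
    def __init__(self, n):
        self.n = n
        self.edges = [[None] * n for _ in range(n)]
    def put(self, u, v, data=True):
        self.edges[u][v] = data
    def get(self, u, v):
        return self.edges[u][v]
    def all_from(self, u):
        return [(v, d) for v, d in enumerate(self.edges[u]) if d is not None]

class ListGraph:
    """Adjacency-list representation: for each vertex, a list of outgoing
    edges.  all_from takes constant time; get and put may scan the list."""
    def __init__(self, n):
        self.n = n
        self.adj = [[] for _ in range(n)]
    def put(self, u, v, data=True):
        for i, (w, _) in enumerate(self.adj[u]):
            if w == v:
                self.adj[u][i] = (v, data)   # avoid creating a parallel edge
                return
        self.adj[u].append((v, data))
    def get(self, u, v):
        for w, d in self.adj[u]:
            if w == v:
                return d
        return None
    def all_from(self, u):
        return self.adj[u]

g = ListGraph(4)
g.put(1, 0); g.put(1, 2); g.put(3, 1)
print(g.all_from(1), g.get(3, 1), g.get(0, 3))
```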
Let us now revisit the analysis of the running time of TopSort (Figure
9.6), this time assuming that G is a ListGraph. Consider the second for
loop. Note that the running time of the nested while loop does not depend on
the implementation of G; hence, we can still conclude that it runs in O(n)
time. We can therefore conclude that the running time of the second for
loop is in O(n²). However, because we have reduced the running time of
AllFrom from Θ(n) to Θ(1), it is no longer clear that the running time of
this loop is in Ω(n²). Indeed, if there are no edges in the graph, then the
nested while loop will not iterate. In this case, the running time is in Θ(n).
We therefore need to analyze the running time of the nested while
loop more carefully. Note that over the course of the for loop, each edge
is processed by the inner while loop exactly once. Therefore, the body of
the inner loop is executed exactly a times over the course of the entire outer
loop, where a is the number of edges in G. Because the remainder of the
outer loop is executed exactly n times, the running time of the outer loop is
in Θ(n + a).
We now observe that the fourth loop can be analyzed in exactly the same
way as the second loop; hence, the fourth loop also runs in Θ(n + a) time.
In fact, because the structure of these two loops is quite common for graph
algorithms, this method of calculating the running time is often needed for
analyzing algorithms that operate on ListGraphs.
To complete the analysis of TopSort, we observe that the first and
third loops do not depend on how G is implemented; hence, they both run
in Θ(n) time. The total running time of TopSort is therefore in Θ(n + a).
For graphs in which a ∈ o(n²), this is an improvement over the Θ(n²) running
time when G is implemented as a MatrixGraph.
Let us now consider the impact of the ListGraph implementation on
the analysis of UniversalSink (Figure 9.4). Due to the increased running
time of Get, the body of the while loop runs in Θ(m) time, where m is the
number of vertices adjacent to i. This number cannot be more than n − 1,
nor can it be more than a. Because this loop iterates Θ(n) times, we obtain
an upper bound of O(n min(n, a)). Likewise, it is easily seen that the for
loop runs in O(n min(n, a)) time.
To see that this bound is tight for the while loop, let us first consider
the case in which a ≤ n(n − 1) − n/2 . Suppose that from vertex 0 there is
an edge to each of the vertices 1, . . . , min(a, (n − 1)/2 ), but no edge to any
other vertex. From vertices other than 0 we may have edges to any of the
other vertices. Note that with these constraints, we can have up to n(n −
1) − n/2 edges. For such a graph, the first n/2 iterations of the while
loop will have i = 0, while j ranges from n − 1 down to (n − 1)/2 + 1. For
each of these iterations, Get(i, j) runs in Θ(min(a, n)) time, because there
are Θ(min(a, n)) vertices adjacent to 0, but j is not adjacent to 0. Because
the number of these iterations is in Θ(n), the total time is in Θ(n min(n, a)).
Now let us consider the case in which a > n(n − 1) − n/2 . In this case,
we make sure that from each of the vertices 0, . . . , n/2 − 1, there is an
edge to every other vertex. Furthermore, we make sure that in each of the
ConsLists of edges from these first n/2 vertices, the edge to vertex n − 1
occurs last. From the remaining vertices we may have any edges listed in any
order. For such a graph, the first n/2 iterations of the while loop will have
j = n − 1, while i ranges from 0 to n/2 − 1. For each of these iterations,
Get(i, j) runs in Θ(n) time, because there are Θ(n) vertices adjacent to i,
and n − 1 is the last of these. Because the total number of iterations is
in Θ(n), the total time is in Θ(n²). Because a ≥ n, this is the same as
Θ(n min(a, n)).
Based on the analyses of the two algorithms, we can see that neither
implementation is necessarily better than the other. If an algorithm relies
more heavily on Get than on AllFrom, it is better to use MatrixGraph.
If an algorithm relies more heavily on AllFrom, it is probably better to use
ListGraph, particularly if there is a reasonable expectation that the graph
will be sparse — i.e., that it will have relatively few edges. Note also that
for sparse graphs, a ListGraph will use considerably less space.
9.5 Multigraphs
Let us briefly consider the building of a ListGraph. We must first construct
a graph with no edges, then add edges one by one using the Put operation.
The constructor runs in Θ(n) time. The Put operation runs in Θ(m)
time, where m is the number of vertices adjacent to the source of the
edge. It is easily seen that the time required to build the graph is in
O(n + a min(n, a)), where a is the number of edges. It is not hard to match
this upper bound using graphs in which the number of vertices with outgoing
edges is minimized for the given number of edges. An example of a sparse
graph (specifically, with a ≤ n) that gives this behavior is a graph whose
edge set is
{(0, j) | 1 ≤ j < a}.
temporary space in the worst case, where n and a are the number of vertices
and unique edges, respectively, in the given ListMultigraph.
9.6 Summary
Graphs are useful for representing relationships between data items. Various
algorithms can then be designed for manipulating graphs. As a result, we
can often use the same algorithm in a variety of different applications.
Graphs may be either directed or undirected, but we can treat undirected
graphs as directed graphs in which for every edge (u, v), there is a reverse
edge (v, u). We then have two implementations of graphs. The adjacency
matrix implementation has Get and Put operations that run in Θ(1) time,
but its AllFrom operation runs in Θ(n) time, where n is the number of
vertices in the graph. Its space usage is in Θ(n²). On the other hand, the
adjacency list implementation has an AllFrom operation that runs in Θ(1)
time, but its Get and Put operations run in Θ(m) time in the worst case,
where m is the number of vertices adjacent to the given source vertex. Its
space usage is in Θ(n + a) where n is the number of vertices and a is the
number of edges.
In order to improve the running time of the Put operation — and
hence of building a graph — when using an adjacency list, we can relax
our definition to allow parallel edges. The resulting structure is known
as a multigraph. We can always use a multigraph whenever a graph is
required, though it might be useful to maintain an invariant that no
parallel edges exist. Furthermore, we can construct a ListGraph from a
ListMultigraph with no parallel edges in Θ(n + a) time and Θ(n) space,
where n is the number of vertices and a is the number of edges. Figure 9.11
shows a summary of the running times of these operations for each of the
implementations of Graph, as well as for ListMultigraph.
9.7 Exercises
Exercise 9.1. Prove that UniversalSink, shown in Figure 9.4, meets its
specification.
Exercise 9.2. Prove that TopSort, shown in Figure 9.6, meets its
specification.
Exercise 9.3. Give an algorithm that takes as input a directed graph G =
(V, E) and returns a directed graph G′ = (V, E′), where E′ = {(v, u) | (u, v) ∈ E}.
Figure 9.11 Comparison of running times for two implementations of Graph, along
with ListMultigraph
Notes:
Thus, G′ contains the same edges as does G, except that they are reversed
in G′. Express the running time of your algorithm as simply as possible using
Θ-notation in terms of the number of vertices n and the number of edges a,
assuming the graphs are implemented using
a. MatrixGraph
b. ListGraph
c. ListMultigraph.
Exercise 9.4. Give an algorithm to compute the number of edges in a given
graph. Express the running time of your algorithm as simply as possible using
Θ-notation in terms of the number of vertices n and the number of edges a,
assuming the graph is implemented using
a. MatrixGraph
b. ListGraph
Exercise 9.5. A directed graph is said to be transitively closed if whenever
(u, v) and (v, w) are edges, then (u, w) is also an edge. Give an O(n³)
9.8 Notes
The study of graph theory began in 1736 with Leonhard Euler’s famous
study of the Königsberg Bridge Problem [38], which is simply the problem
of finding an Euler path in a connected undirected graph (see Exercise 9.10).
Good references on graph theory and graph algorithms include Even [39],
Kocay and Kreher [85], and Tarjan [112]. In the early days of electronic
computing, graphs were typically implemented using adjacency matrices.
Hopcroft and Tarjan [65] first proposed using adjacency lists for sparse
graphs. The topological sort algorithm of Section 9.2 is due to Knuth [82].
Part III

Chapter 10

Divide and Conquer
In Part I of this text, we introduced several techniques for applying the top-
down approach to algorithm design. We will now take a closer look at some
of these techniques. In this chapter, we will look at the divide-and-conquer
technique.
As we stated in Chapter 3, the divide-and-conquer technique involves
reducing a large instance of a problem to one or more instances having a
fixed fraction of the size of the original instance. For example, recall that
the algorithm MaxSumDC, shown in Figure 3.3 on page 78, reduces large
instances of the maximum subsequence sum problem to two smaller instances
of roughly half the size.
Though we can sometimes convert divide-and-conquer algorithms to
iterative algorithms, it is usually better to implement them using recursion.
One reason is that typical divide-and-conquer algorithms implemented using
recursion require very little stack space to support the recursion. If we divide
an instance of size n into instances of size n/b whenever n is divisible by b,
we can express the total stack usage due to recursion with the recurrence
f (n) ∈ f (n/b) + Θ(1).
Applying Theorem 3.35 to this recurrence, we see that f (n) ∈ Θ(lg n). The
other reason for retaining the recursion is that when a large instance is
reduced to more than one smaller instance, removing the recursion can be
difficult and usually requires the use of a stack to simulate at least one
recursive call.
Because divide-and-conquer algorithms are typically expressed using
recursion, the analysis of their running times usually involves the asymptotic
solution of a recurrence. Theorem 3.35 almost always applies to this
recurrence. Not only does this give us a tool for analyzing running times,
it also can give us some insight into what must be done to make an algorithm
more efficient. We will explain this concept further as we illustrate the
technique by applying it to several problems.
where
    p0(x) = Σ_{i=0}^{m−1} ai x^i,
    p1(x) = Σ_{i=0}^{n−m−1} a_{m+i} x^i,
    q0(x) = Σ_{i=0}^{m−1} bi x^i,
    q1(x) = Σ_{i=0}^{n−m−1} b_{m+i} x^i.
If we set m = n/2, then each of the smaller polynomials has roughly n/2
terms.
The product polynomial is now
    pq(x) = p0(x)q0(x) + x^m (p0(x)q1(x) + p1(x)q0(x)) + x^{2m} p1(x)q1(x).   (10.1)
To obtain the coefficients of pq, we can first compute the four products
of the smaller polynomials. We can then obtain any given coefficient of pq
by performing at most two additions. We can therefore obtain all 2n − 1
coefficients in Θ(n) time after the four smaller products are computed.
Setting m = n/2, we can describe the running time of this divide-and-
conquer algorithm with the following recurrence:
Note that all four of the terms in the right-hand-side above appear in the
product pq (see (10.1)). In order to make this fact useful, however, we need
to be able to separate out the first and last terms. We can do this by
computing the products p0 (x)q0 (x) and p1 (x)q1 (x), then subtracting. Thus,
we can compute the product pq using the following three products:
degrees are the same if n is odd. Therefore, we need to be careful to note the
degrees of each polynomial we construct. By choosing m = ⌈n/2⌉, we ensure
that m ≥ n − m. Thus, we can add the two halves of a polynomial by first
recording the low-order half, then adding in the high-order half, yielding a
polynomial of degree m − 1. After the recursive multiplications, P1 and P2
will both have degree 2(m − 1), but P3 will have degree 2(n − m − 1). To
construct P , we can first copy P1 and P3 to the proper locations, and fill
in 0 for the coefficient of x2m−1 . We can then add P2 [i] − P1 [i] − P3 [i] to
the coefficient of xm+i ; however, because P3 has a different degree than P1
and P2 , we use a separate loop to subtract this polynomial.
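The construction just described can be summarized in running code. The following Python sketch is only an illustration of the three-product scheme (it is not the book's PolyMult); a polynomial is represented as a list of coefficients, with index i holding the coefficient of x^i, and m is taken to be ⌈n/2⌉.

```python
def poly_add(p, q):
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]

def poly_sub(p, q):
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) - (q[i] if i < len(q) else 0) for i in range(n)]

def poly_mult(p, q):
    """Multiply two coefficient lists using three recursive products instead of four."""
    n = max(len(p), len(q))
    if n <= 1:
        return [p[0] * q[0]] if p and q else []
    p = p + [0] * (n - len(p))                 # pad both operands to length n
    q = q + [0] * (n - len(q))
    m = (n + 1) // 2                           # m = ceil(n/2), so m >= n - m
    p0, p1, q0, q1 = p[:m], p[m:], q[:m], q[m:]
    r1 = poly_mult(p0, q0)                     # p0 * q0
    r3 = poly_mult(p1, q1)                     # p1 * q1
    r2 = poly_mult(poly_add(p0, p1), poly_add(q0, q1))
    mid = poly_sub(poly_sub(r2, r1), r3)       # p0*q1 + p1*q0
    result = [0] * (2 * n - 1)
    for i, c in enumerate(r1):
        result[i] += c
    for i, c in enumerate(mid):
        result[m + i] += c
    for i, c in enumerate(r3):
        result[2 * m + i] += c
    return result

# (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2
print(poly_mult([1, 2], [3, 4]))               # [3, 10, 8]
```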
From Figure 10.1 and Exercise 3.26 (page 103), it is evident that a total
of Θ(n) time is needed apart from the recursive calls. Thus, we can describe
the running time with the following recurrence:
• If two elements in the same input array have equal keys, they remain in
the same order in the output array.
• If an element x from the first input array has a key equal to some element
y in the second input array, then x must precede y in the output array.
Suppose we are given two sorted arrays. If either is empty, we can simply
use the other. Otherwise, the element with minimum key in the two arrays
needs to be first in the sorted result. The element with minimum key in each
array is the first element in the array. We can therefore determine the overall
minimum by comparing the keys of the first elements of the two arrays. If
the keys are equal, in order to ensure stability, we must take the element
from the first array. To obtain the remainder of the result, we merge the
remainder of the two input arrays. We have therefore transformed a large
instance of merging to a smaller instance.
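A minimal Python sketch of such a merge follows. It is an iterative rendering of the reduction just described and is only an illustration: it works on plain lists and takes a key function, rather than the Keyed items used in the text.

```python
def merge(a, b, key=lambda x: x):
    """Stably merge two sorted lists; on equal keys, elements of a precede those of b."""
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        # <= (rather than <) takes the element from the first list on equal keys.
        if key(a[i]) <= key(b[j]):
            result.append(a[i]); i += 1
        else:
            result.append(b[j]); j += 1
    result.extend(a[i:])   # at most one of these two tails is nonempty
    result.extend(b[j:])
    return result
```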
of the two subproblems is at most half the size of the original problem, we
can bound the running time of this sorting algorithm with the recurrence
f (n) ∈ f (n − 1) + Θ(n).
From Theorem 3.34, f(n) ∈ Θ(n^2), so that the running time for this
algorithm is in Ω(n^2) in the worst case. Observing that each element is chosen
as a pivot at most once, we can easily see that O(n^2) is an upper bound on
the running time, so that the algorithm runs in Θ(n^2) time in the worst
case. Because of this bad worst case, the most common implementations of
this algorithm combine it with a Θ(n lg n) algorithm, usually heap sort, in
order to achieve Θ(n lg n) performance in the worst case (see Exercise 10.10).
For the remainder of this section, we will focus on improving the quick sort
algorithm without combining it with another sorting algorithm.
Choosing the first element (or the last element) as the pivot is a bad
idea, because an already-sorted array yields the worst-case performance.
Furthermore, the performance is nearly as bad on a nearly-sorted array.
To make matters worse, it is not hard to see that when the running time
is in Θ(n^2), the stack usage is in Θ(n). Because we often need to sort a
nearly-sorted array, we don’t want an algorithm that performs badly in such
cases.
The above analyses illustrate that it is better for the pivot element to be
chosen to be near the median than to be near the smallest (or equivalently,
the largest) element. More generally, it illustrates why divide-and-conquer is
often an effective algorithm design strategy: when a problem is reduced to
multiple subproblems, it is best if these subproblems are the same size. For
quick sort, we need a way to choose the pivot element quickly in such a way
that it tends to be near the median.
One way to accomplish this is to choose the pivot element randomly.
This algorithm is shown in Figure 10.3. In order to make the presentation
easier to follow, we have specified the algorithm so that the array is indexed
with arbitrary endpoints.
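The following Python sketch illustrates quick sort with a randomly chosen pivot. It is not a literal transcription of Figure 10.3; in particular, it uses a three-way partition so that items equal to the pivot are excluded from both recursive calls.

```python
import random

def quick_sort(a, lo=0, hi=None):
    """Sort the list a[lo..hi] in place, choosing each pivot at random."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    pivot = a[random.randint(lo, hi)]
    # Partition so that a[lo..lt-1] < pivot, a[lt..gt] == pivot, a[gt+1..hi] > pivot.
    lt, i, gt = lo, lo, hi
    while i <= gt:
        if a[i] < pivot:
            a[lt], a[i] = a[i], a[lt]; lt += 1; i += 1
        elif a[i] > pivot:
            a[i], a[gt] = a[gt], a[i]; gt -= 1
        else:
            i += 1
    quick_sort(a, lo, lt - 1)
    quick_sort(a, gt + 1, hi)
```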
Let us now analyze the expected running time of QuickSort on an
array of size n. We first observe that for any call in which lo < hi, the loop
will execute at least once. Furthermore, by an easy induction on n, we can
show that at most n + 1 calls have lo ≥ hi. Because each of these calls
requires Θ(1) time, a total of at most O(n) time is used in processing the
base cases. Otherwise, the running time is proportional to the number of
times the loop executes over the course of the algorithm.
Each iteration of the loop involves comparing one pair of elements. For a
given call to QuickSort, the pivot is compared to all elements currently in
the array, then is excluded from the subsequent recursive calls. Thus, once a
pair of elements is compared, they are never compared again on subsequent
loop iterations (though they may be compared twice in the same iteration —
once in each if statement). The total running time is therefore proportional
to the number of pairs of elements that are compared. We will only concern
ourselves with pairs of distinct elements, as this will only exclude O(n) pairs.
Let F [1..n] be the final sorted array, and let comp be a discrete random
variable giving the number of pairs (i, j) such that 1 ≤ i < j ≤ n and F [i] is
compared with F [j]. We wish to compute E[comp]. Let cij denote the event
that F [i] is compared with F [j]. Then
    E[comp] = E[Σ_{i=1}^{n} Σ_{j=i+1}^{n} I(cij)]
            = Σ_{i=1}^{n} Σ_{j=i+1}^{n} E[I(cij)]
            = Σ_{i=1}^{n} Σ_{j=i+1}^{n} P(cij).
We therefore have

    E[comp] = Σ_{i=1}^{n} Σ_{j=i+1}^{n} P(cij)
            ≤ Σ_{i=1}^{n} Σ_{j=i+1}^{n} 2/(j − i + 1)
            = 2 Σ_{i=1}^{n} Σ_{j=2}^{n−i+1} 1/j.   (10.3)
Tight bounds for Hn are given by the following theorem, whose proof is left
as an exercise.
Theorem 10.1. For all n ≥ 1:

    ln(n + 1) ≤ Hn ≤ 1 + ln n.
Applying Theorem 10.1 to inequality (10.3), we have
    E[comp] ≤ 2 Σ_{i=1}^{n} Σ_{j=2}^{n−i+1} 1/j
            = 2 Σ_{i=1}^{n} (H_{n−i+1} − 1)
            ≤ 2 Σ_{i=1}^{n} ln(n − i + 1)
            = 2 Σ_{i=1}^{n} ln i
            ∈ O(n lg n),
from Theorem 3.31. For an array of distinct elements, a similar analysis shows
that E[comp] ∈ Ω(n lg n); hence, the expected running time of QuickSort
on any array of n elements is in Θ(n lg n).
The expected-case analysis of QuickSort suggests that it would work
well in practice, and indeed, there are versions that outperform both heap
sort and merge sort. The most widely-used versions, however, are not
10.4 Selection
In Section 1.1, we introduced the selection problem. Recall that this problem
is to find the kth smallest element of an array of n elements. We showed
that it can be reduced to sorting. Using either heap sort or merge sort, we
therefore have an algorithm for this problem with a running time in Θ(n lg n).
In this section, we will improve upon this running time.
Section 2.4 shows that the selection problem can be reduced to the Dutch
National Flag problem and a smaller instance of itself. This reduction is
very similar to the reduction upon which quick sort is based. Specifically, we
choose a pivot element p and solve the resulting Dutch national flag problem
as we did for the quick sort reduction. Let r and w denote the numbers of
red items and white items, respectively. We then have three cases:
Due to the similarity of this algorithm to quick sort, some of the same
problems arise in choosing the pivot element appropriately. For example, if
we always use the first element as the pivot, then selecting the nth smallest
element in a sorted array of n distinct elements always results in a recursive
call with all but one of the original elements. As we saw in Section 10.3,
this yields a running time in Θ(n2 ). On the other hand, it is possible to
show that selecting the pivot at random yields an expected running time in
Θ(n) — the details are left as an exercise.
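A hedged Python sketch of this randomized selection follows. For brevity it builds new lists for the red, white, and blue items instead of partitioning in place, which is a convenience of the sketch rather than the approach of the text.

```python
import random

def select(a, k):
    """Return the kth smallest element of the nonempty list a (1 <= k <= len(a)).
    The expected running time is linear; the worst case is quadratic."""
    pivot = random.choice(a)
    red   = [x for x in a if x < pivot]     # items smaller than the pivot
    white = [x for x in a if x == pivot]    # items equal to the pivot
    blue  = [x for x in a if x > pivot]     # items larger than the pivot
    if k <= len(red):
        return select(red, k)
    if k <= len(red) + len(white):
        return pivot
    return select(blue, k - len(red) - len(white))
```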
Base: We must still show the claim for 0 < n < n_2. In other words, we need
b ≥ f(n)/n for 0 < n < n_2. We can satisfy this constraint and the one above
if b = max{a/(1 − c), f(n)/n | 0 < n < n_2} (note that because this set is
finite and nonempty, it must have a maximum element).
Returning to recurrence 10.4, we see that Theorem 10.2 applies if M > 4.
Thus, if we set M = 5, we have f (n) ∈ O(n). The entire algorithm is shown
in Figure 10.5.
Now that we have described the algorithm precisely, let us analyze its
running time more carefully to be sure that it is, in fact, in Θ(n). It is easily
seen that the running time is in Ω(n). We need for the recurrence

    f(n) ∈ f(⌈n/5⌉) + f(⌊3n/4⌋) + O(n)   (10.5)
to give an upper bound on the running time of the algorithm for sufficiently
large n. Clearly, the number of elements in the first recursive call is ⌈n/5⌉.
Furthermore, if we ignore the recursive calls, the time needed is in O(n) (note
that the time needed to sort 5 elements is bounded by a constant because
the number of elements is constant). However, the number of elements in the
second recursive call is not always bounded above by 3n/4. Consider, for
example, the array A[1..13] with A[i] = i for 1 ≤ i ≤ 13. The values 3 and
8 will be placed in T [1] and T [2], respectively. The value assigned to p will
therefore be 3. If k > 3, ten elements will be passed to the second recursive
call, but 3 · 13/4 = 9.
The largest such n is therefore 10 + 4 = 14. Then for all n ≥ 15, recurrence
(10.5) gives an upper bound on the running time of LinearSelect. From
Theorem 10.2, the running time is in O(n), and hence in Θ(n).
Various performance improvements can be made to LinearSelect. For
example, if n = 5, there is no reason to apply the Dutch national flag
algorithm after sorting the array — we can simply return A[k]. In other
words, it would be better if the base case included n = 5, and perhaps some
larger values as well. Furthermore, sorting is not the most efficient way to
solve the selection problem for small n. We explore some alternatives in the
exercises.
Even with these performance improvements, however, LinearSelect
does not perform nearly as well as the randomized algorithm outlined at
the beginning of this section. Better still is using a quick approximation
of the median, such as finding the median of the first, middle, and last
elements, as the value of p. This approach yields an algorithm whose worst-
case running time is in Θ(n^2), but which typically performs better than even
the randomized algorithm.
Figure 10.6 Multiplication and division functions for use with the BigNum ADT
defined in Figure 4.18 on page 146
at the top of the main loop. Before the while loop is executed, the value
of rem is multiplied by 2^{n/2}, and next is added. Because next contains at
most n/2 significant bits, next < 2^{n/2}. Thus, when the while loop executes,
rem < v × 2^{n/2}. Because v contains n significant bits, rem contains at most
3n/2 significant bits. Likewise, it is not hard to show that at the beginning
of the while loop, approx contains at most n/2 + 1 significant bits, and that
prod contains at most 3n/2 + 1 significant bits. The body of the while loop
therefore runs in Θ(n) time in the worst case.
In order to get a tight bound on the number of iterations of the while
loop, we need a tighter bound on approx. In particular, we need to know
how close approx is to ⌊rem/v⌋. Let r = ⌊rem × 2^{−n/2}⌋. We first observe that

    ⌊r/(vFirst + 1)⌋ = ⌊(r × 2^{n/2})/((vFirst + 1) × 2^{n/2})⌋
                     = ⌊rem/((vFirst + 1) × 2^{n/2})⌋,

because the n/2 low-order bits of the numerator do not affect the value of
the expression. Furthermore, the right-hand side above is no larger than
⌊rem/v⌋. Thus,

    approx − ⌊rem/v⌋ ≤ ⌊r/vFirst⌋ − ⌊r/(vFirst + 1)⌋
                     ≤ r/vFirst − (r − vFirst)/(vFirst + 1)
                     = (r(vFirst + 1) − vFirst(r − vFirst))/(vFirst(vFirst + 1))
                     = (r + vFirst^2)/(vFirst(vFirst + 1)).
Now because rem < v × 2^{n/2}, it follows that r < vFirst × 2^{n/2}. We therefore
have

    approx − ⌊rem/v⌋ ≤ (r + vFirst^2)/(vFirst(vFirst + 1))
                     < (vFirst × 2^{n/2} + vFirst^2)/(vFirst(vFirst + 1))
                     = (2^{n/2} + vFirst)/(vFirst + 1)
                     = 2^{n/2}/(vFirst + 1) + vFirst/(vFirst + 1).
Because vFirst contains n/2 significant bits, its value must be at least
2^{n/2−1}. The value of the first term on the right-hand side above is therefore
strictly less than 2. Clearly, the value of the second term is strictly less than
1, so that the right-hand side is strictly less than 3. Because the left-hand
side is an integer, its value must therefore be at most 2. It follows from
the while loop invariant that the loop terminates when approx = ⌊rem/v⌋.
Because this loop decrements approx by 1 each iteration, it must iterate at
most twice. Its running time is therefore in Θ(n).
It is now easily seen that, excluding the recursive call, the running time
of the body of the main loop is dominated by the running time of the
multiplication. Because the result of the multiplication contains at most
3n/2 + 1 significant bits, this multiplication can be done in Θ(n^{lg 3}) time
using the multiplication algorithm suggested at the beginning of this section.
If m ≥ n, the number of iterations of the main loop is easily seen to be

    numDig − 1 = m/digLen − 1
               = m/(n/2) − 1
               = 2m/n − 1.
Thus, the running time of the main loop, excluding the recursive call, is in
We now observe that for even n, there are in the worst case 2m/n − 1
recursive calls. For odd n, the worst-case number of recursive calls is
2(m + 1)/(n + 1) − 1. The resulting recurrence is therefore quite complicated.
However, consider the parameters of the recursive call. We have already
shown that rem < v × 2^{n/2}, and vFirst = ⌊v × 2^{−n/2}⌋. This recursive call
therefore divides a value strictly less than v by ⌊v × 2^{−n/2}⌋. Thus, in any of
these calls, the dividend is less than the divisor plus 1, multiplied by 2^n,
where n is the number of bits in the divisor. In addition, it is easily seen
that the dividend is never less than the divisor. Furthermore, if these
relationships initially hold for odd n, they hold for the recursive call in
this case as well. We therefore will first restrict our attention to this
special case.
Let n, the number of significant bits in v, be even. If

    v ≤ u < (v + 1) × 2^n,

then u contains at most 2n significant bits, so that m ≤ 2n, and

    2m/n − 1 ≤ 4n/n − 1 = 3.
Because each iteration may contain a recursive call, this suggests that there
are a total of at most 3 recursive calls. However, note that whenever a
recursive call is made, the dividend is no less than the divisor, so that a
nonzero digit results in the quotient. Suppose the first of the three digits of
the quotient is nonzero. Because the first n bits of u are at most v, the only
possible nonzero result for the first digit is 1. The remainder of the quotient
is then formed by dividing a value strictly less than 2^n by v, which is at
least 2^{n−1}. This result is also at most 1, so that the second digit must be 0.
We conclude that no more than two recursive calls are ever made. In each
of these recursive calls, the divisor has n/2 bits.
If n is odd and greater than 1, we increase the number of bits in v by 1.
The above reasoning then applies to n + 1, where n denotes the original
number of bits in v. We can therefore express the overall running time in
terms of n via the recurrence
v ≤ u < (v + 1) × 2^n,
becomes
Then if vq ≤ u−v, we know that the actual quotient is q+1. If vq > u,
we know that the quotient is q − 1. Otherwise, the quotient is q.
Suppose an error of ε is introduced in approximating the reciprocal. Then
we need

    |u × 2^{−n} (1/(v × 2^{−n}) + ε) − u/v| ≤ 1
    |u/v + ε × u × 2^{−n} − u/v| ≤ 1
    |ε| ≤ 2^n/u.
Suppose u consists of m bits, so that u < 2^m. Then we can ensure that
the approximation of u/v differs from the actual value of u/v by no more
than 1 if our approximation of the reciprocal of v × 2^{−n} differs from the
actual reciprocal by at most 2^{n−m}.
The resulting algorithm is shown in Figure 10.8. We handle the case
in which u < v separately in order to ensure that the precondition for
u.Subtract(v) is met. As in the previous section, we use constants zero and
one, which refer to BigNum representing 0 and 1, respectively. It is easily
seen that the running time is simply the time for Reciprocal plus the time
to do the two multiplications. The time to do the first multiplication depends
on the size of the value returned by Reciprocal. Because the accuracy of
the approximation is 2^{n−m}, we would expect the value to have not much more
than m − n significant bits.
In the remainder of this section, we will consider how to implement the
Reciprocal function specified in Figure 10.8. The technique we apply is
Newton’s method for approximating a root of a function. Let I be some
interval of the real numbers, and suppose f : I → R has at least one root
— a value x ∈ I such that f (x) = 0. For example, if y is a fixed positive
real number, the function f (x) = 1/x − y over R>0 has exactly one root,
namely, x = 1/y. Newton’s method is an iterative approach to finding an
approximation of a root of f .
Newton’s method begins with an initial estimate x0 of the root. If f (x0 )
is not sufficiently close to 0, a better approximation is found using the
derivative of f , which we will denote by f . Recall that f (x0 ) gives the
slope of the line tangent to f at x0 (see Figure 10.9). We can easily find
the intersection x1 of this line with the x-axis, and for many functions,
this intersection will be a better approximation to the root than the initial
estimate. We then apply Newton’s method using x1 as the initial estimate.
For many functions, this approach is guaranteed to approach a root very
quickly. As we will see, the function f (x) = 1/x − y is such a function.
The line tangent to f at x0 has slope f′(x0) and includes the point
(x0, f(x0)). To find its x-intercept, we need to go to the left of x0 a distance
of f(x0)/f′(x0) (or if this value is negative, we go to the right a distance of
−f(x0)/f′(x0)). The new estimate x1 is therefore given by

    x1 = x0 − (1/x0 − y)/(−x0^{−2})
       = x0 + x0 − y·x0^2
       = 2x0 − y·x0^2.
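To see how quickly this iteration converges, here is a small Python sketch using ordinary floating-point arithmetic. The algorithm developed below works with exact BigNum values, so this is only an illustration of the behavior of x1 = 2x0 − y·x0^2.

```python
def reciprocal_newton(y, iterations=6, x0=1.0):
    """Approximate 1/y for 1/2 <= y < 1 by Newton's method on f(x) = 1/x - y."""
    x = x0
    for _ in range(iterations):
        x = 2 * x - y * x * x     # one Newtonian iteration; the error roughly squares
    return x

print(reciprocal_newton(0.7))     # rapidly approaches 1.4285714...
```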
the same size as y. However, note that successive iterations give successively
better approximations. For the earlier approximations, which will probably
not be very accurate anyway, we need not use all of the bits of y in the
computation.
This suggests the following approach. Suppose we need an approximation
that differs from the actual reciprocal by no more than 2^{−k}. We will use k as
the size of this problem instance. If k is not too small, we first solve a smaller
instance in order to obtain a less-accurate approximation. The accuracy that
we require of this approximation needs to be such that a single application
of the Newtonian iteration will yield an accuracy of within 2^{−k}. In applying
this iteration, we only use as many bits of y as we need in order to ensure
the required accuracy. Finally, in order to keep the number of bits in the
approximation from growing too rapidly, we return only as many bits as we
need to ensure the required accuracy.
Let α ∈ R denote the absolute error of some estimate; i.e., our estimate is
1/y + α. Let β ∈ R≥0 denote the absolute error introduced by truncating y,
so that the value we use for y in the iteration is y − β. Finally, let γ ∈ R≥0
denote the absolute error introduced by truncating the result. The value
computed by the iteration is therefore
    2(1/y + α) − (y − β)(1/y + α)^2 − γ = 1/y + β/y^2 + 2αβ/y + α^2·β − y·α^2 − γ.

We need for this value to differ from 1/y by at most 2^{−k}; i.e., we need

    |β/y^2 + 2αβ/y + α^2·β − y·α^2 − γ| ≤ 2^{−k}.   (10.2)
Note that because y > 0, β ≥ 0, and γ ≥ 0, all terms except the second
are always non-negative. In order to ensure that the inequality holds when
the value inside the absolute value bars is non-negative, we can therefore
ignore the last two terms. We therefore need
    β/y^2 + 2αβ/y + α^2·β ≤ 2^{−k}.
If we replace α by |α| in the above inequality, the left-hand side does not
decrease. For fixed α and β, the resulting left-hand side is maximized when
y is minimized. Setting y to its minimum possible value of 1/2, it therefore
suffices to ensure that
    4β + 4|α|β + α^2·β ≤ 2^{−k}.
In order to keep the first term sufficiently small, we need β < 2^{−k−2}. In
order to leave room for the other two terms, let us take β ≤ 2^{−k−3}. In other
words, we will use the first k + 3 bits of y in applying the iteration. Then as
long as |α| ≤ 1/2, we have

    4β + 4|α|β + α^2·β ≤ 2^{−k−1} + 2^{−k−2} + 2^{−k−5}
                       ≤ 2^{−k}.
Let us now consider the case in which the value inside the absolute value
bars in (10.2) is negative. We can now ignore the first and third terms. We
therefore need

    y·α^2 + γ − 2αβ/y ≤ 2^{−k}.

Here, we can safely replace α by −|α|. For fixed α, β, and γ in the resulting
inequality, the first term is maximized when y is maximized, but the third
term is maximized when y is minimized. It therefore suffices to ensure that

    α^2 + γ + 4|α|β ≤ 2^{−k}.

Again taking β ≤ 2^{−k−3}, we only need |α| ≤ 2^{−⌈(k+1)/2⌉} and γ ≤ 2^{−k−2}. We
then have

    α^2 + γ + 4|α|β ≤ 2^{−k−1} + 2^{−k−2} + 2^{−k−1−⌈(k+1)/2⌉}
                    ≤ 2^{−k},
provided k ≥ 1.
We can satisfy the constraints on α and γ by finding an approximation
within 2^{−⌈(k+1)/2⌉}, and returning k + 3 bits of the result of applying the
iteration (recall that the result has one bit to the left of the radix point).
Note that if we take k as the size of the problem instance, we are reducing
the problem to an instance roughly half the original size. We therefore have
a divide-and-conquer algorithm.
In order to complete the algorithm, we need to handle the base cases.
Because ⌈(k + 1)/2⌉ < k only when k > 2, these cases occur for k ≤ 2. It
turns out that these cases are important for ensuring that the approximation
is at least 1 and strictly less than 2. From (10.1), the result of the iteration
is never more than 1/y (here y denotes the portion we are actually using
in computing the iteration). Thus, if y > 1/2, the estimate is less than 2.
Furthermore, if y = 1/2, an initial estimate less than 2 will ensure that some
error remains, so that the result is still strictly less than 2. Finally, provided
the error of the estimate is less than 1, the result is always closer to 1/y
than the initial estimate. Thus, if we
make sure that our base case gives a value that is less than 2 and no worse
an estimate than 1 would be, the approximation will always be in the proper
range.
We leave it as an exercise to show that the estimate

    (11 − ⌊8y⌋)/4

satisfies the specification and the requirements discussed above for k ≤ 2.
⌊8y⌋ is simply the first 3 bits of y. Because 1/2 ≤ y < 1, 4 ≤ ⌊8y⌋ < 8. The
numerator is therefore always a 3-bit natural number. The final division by
4 simply puts the radix point in the proper place.
The algorithm is shown in Figure 10.10. We use the variable len to store
the value k + 3, which, except in the base case, is both the number of bits
we use from y and the number of bits we return. We assume the existence
of a constant eleven referring to a BigNum with value 11. Before we do the
subtraction, we must make sure the radix points in the operands line up.
The approximation x0 has one bit to the left of the implicit radix point. The
multiplication of x0 by 2 simply moves the radix point to the right one place.
As a result, the implicit radix point in 2x0 is x0.NumBits() − 2 from the right
in x0. The implicit radix point in the product y·x0^2 is len + 2(x0.NumBits() − 1)
bits from the right. In order for the radix points to line up, we therefore
need to pad the value stored in x0 with len + x0 .NumBits() zeros prior to
subtracting.
Let us now analyze the running time of RecipNewton. Suppose we use
a multiplication algorithm that runs in O(M (n)) time, where n is the number
of bits in the product. For now, we will assume that M (n) is a smooth
function in Ω(n), but we will strengthen this assumption as the analysis
proceeds. It is easily seen that for k ≥ 3, the number of bits returned by
RecipNewton is k + 3. Therefore, for k ≥ 4, the worst-case number of bits
in the first product is 2k + 6. Because we use k + 3 bits of y, the worst-case
number of bits in the second product is 3k + 9. Because M is smooth, from
Exercise 3.21, the time required for the two multiplications is in O(M (k)).
Because the remainder of the operations, excluding the recursive call,
run in linear time, the total time excluding the recursive call is in O(M (k)).
The total running time is therefore given by the recurrence
    f(k) ∈ f(⌈(k + 1)/2⌉) + O(M(k)),

for k ≥ 4. We can simplify this recurrence by defining f1(k) = f(k + 1).
Thus, for k ≥ 4,

    f1(k) = f(k + 1)
          ∈ f(⌈(k + 2)/2⌉) + O(M(k + 1))
          = f(⌈k/2⌉ + 1) + O(M(k + 1))
          = f1(⌈k/2⌉) + O(M(k + 1))
          = f1(⌈k/2⌉) + O(M(k)),
because M is smooth.
In order to be able to apply Theorem 3.35 to f1 , we need additional
assumptions on M. We therefore assume that M(k) = k^q·g(k), where q ≥ 1
and g1(k) = g(2^{k+2}) is smooth. (Note that the functions k^{lg 3} and k lg k lg lg k
both satisfy these assumptions on M.) Then from Theorem 3.35, f1(k) ∈
O(M(k)). Because M is smooth, f(k) = f1(k − 1) ∈ O(M(k)).
We can now analyze the running time of DivideRecip. If m < n, the
running time is clearly in Θ(1). Suppose m ≥ n. Then the value r returned by
Reciprocal(v, m−n) has m−n+3 bits in the worst case. Hence, the result
of the first multiplication has 2m − n + 3 bits in the worst case. The worst-
case running time of this multiplication is therefore in O(M (m)). q then has
m − n + 1 bits in the worst case. The result of the second multiplication
therefore has m + 1 bits in the worst case, and hence runs in O(M (m))
time. Because Reciprocal runs in O(M (m − n)) time, and the remaining
operations run in O(m) time, the overall running time is in O(M (m)). The
10.7 Summary
The divide-and-conquer technique involves reducing large instances of a
problem to one or more smaller instances, each of which is a fraction of
the size of the original problem. The running time of the resulting algorithm
can typically be analyzed by deriving a recurrence to which Theorem 3.35
applies. Theorem 3.35 can also suggest how to improve a divide-and-conquer
algorithm.
Some variations of the divide-and-conquer technique don’t completely fit
the above description. For example, quick sort does not necessarily produce
subproblems whose sizes are a fraction of the size of the original array. As
a result, Theorem 3.35 does not apply. However, we still consider quick
sort to be a divide-and-conquer algorithm because its goal is to partition
an array into two arrays of approximately half the size of the input array,
and to sort these arrays recursively. Likewise, in LinearSelect, the sizes
of the two recursive calls are very different, but because they are both
fractions of the original size, the analysis ends up being related to that of a
more standard divide-and-conquer algorithm. Finally, DivideDC does not
divide the problem into a bounded number of subproblems; however, all of
the recursive calls in turn yield at most two recursive calls, so we can analyze
these calls using standard divide-and-conquer techniques.
10.8 Exercises
Exercise 10.1. Prove that PolyMult, shown in Figure 10.1, meets its
specification.
Exercise 10.2. PolyMult is not particularly efficient when one polyno-
mial has a degree much larger than that of the other. For example, if p
has degree n and q has degree 1, a straightforward implementation of the
definition of the product yields Θ(n) running time. Devise an algorithm that
runs in Θ(m·n^{lg 3 − 1}) time on polynomials of degree m and n with m ≥ n. Your
algorithm may use PolyMult. Analyze the running time of your algorithm.
[Hint: If m > n, divide the larger polynomial into polynomials of degree at
most n.]
* Exercise 10.3. Construct a divide-and-conquer polynomial multiplica-
tion algorithm that performs 5 recursive calls on polynomials of 1/3 the size
of the original polynomials. Show that your algorithm has a running time in
Θ(n^{log_3 5}). (Note that log_3 5 < lg 3.)
** Exercise 10.4. Generalize Exercise 10.3 by showing that for sufficiently
large n and any k ≥ 2, the product of two degree-(n − 1) polynomials can be
computed from the products of 2k − 1 polynomials of degree approximately
(n/k) − 1. Using this result, show that for any ε ∈ R>0, there is an algorithm
to multiply two degree-(n − 1) polynomials in O(n^{1+ε}) time.
Exercise 10.5. Adapt PolyMult to implement Multiply, as specified in
Figure 10.6, in Θ(n^{lg 3}) time, where n is the number of bits in the product.
Exercise 10.6. Prove that MergeSort, shown in Figure 10.2, meets its
specification.
Exercise 10.7. Suppose we are given a tape containing a large number of
Keyed items to be sorted. The number of items is too large to fit into main
memory, but we have three additional tapes we can use, and we can rewrite
the input tape. Give a bottom-up version of merge sort that produces the
sorted output on one of the tapes. You may not assume that data items on
the tapes can be accessed “randomly” — they must be accessed in sequence.
Your algorithm must make at most O(lg n) passes through each tape.
Exercise 10.8. Prove that QuickSort, shown in Figure 10.3, meets its
specification.
Exercise 10.9. Notice that one of the recursive calls in QuickSort is
tail recursion. Taking advantage of this fact, convert one of the recursive
calls to iteration. Notice that the calls can be made in either order, and so
either may be converted to iteration. Make the proper choice so that the
resulting algorithm uses Θ(lg n) stack space in the worst case on an array of
n elements.
Exercise 10.10. Suppose we modify QuickSort by introducing a second
parameter d giving the depth of recursion and a third parameter giving the
length N of the entire array (not just the portion currently being sorted).
Then prior to selecting the pivot element, if d ≥ 2 lg N , instead of sorting
using the given algorithm, sort A[lo..hi] using a Θ(n lg n) algorithm such as
heap sort or merge sort. Show that this modification results in an algorithm
that runs in Θ(n lg n) in the worst case, even if A[lo] is always used as the
pivot element.
* Exercise 10.27. Given two natural numbers u and v which are not both
0, the greatest common divisor of u and v (or gcd(u, v)) is the largest integer
that evenly divides both u and v.
a. Prove that for any positive integers u and v, gcd(u, v) = gcd(v, u mod v).
b. Design a divide-and-conquer algorithm that takes as input two positive
integers u and v and returns gcd(u, v). Your algorithm should run in
O(lg max(u, v)) time.
* Exercise 10.28. Given two positive integers u and m such that u < m,
a multiplicative inverse of u mod m is any positive integer v such that 1 ≤
v < m and (uv) mod m = 1.
a. Prove that for any positive integers u and v, there exist integers a and b
such that au + bv = gcd(u, v).
algorithm to compute the Manhattan skyline, and show that your algorithm
runs in Θ(n lg n) time.
        0  1  2  3  4
    0   1  0  5  4  3
    1   2  4  0  5  1
    2   3  2  1  0  5
    3   4  5  3  2  0
    4   5  3  4  1  2
Your algorithm should return the minimum distance separating any two
distinct points.
10.9 Notes
The PolyMult algorithm is based on a Θ(n^{lg 3}) large-integer multiplication
algorithm by Karatsuba and Ofman [76]. The DivideDC algorithm is due
to Burnikel and Ziegler [18]. The RecipNewton algorithm is a top-down
adaptation of an algorithm given by Knuth [83]; he credits the idea to Cook.
Solutions to Exercises 10.3, 10.4, 10.25, and 10.26, can be found in Knuth
[83].
Merge sort was one of the earliest algorithms developed for electronic
computers, being developed by von Neumann in 1945 [80,114]. Exercise 10.7
is based on work by Eckert and Mauchly [34]. Quick sort was developed
by Hoare [62]. Introsort was developed by Musser [95]. Pattern-defeating
quicksort, or pdqsort, is an important variation of introsort developed by
Peters [98].
Algorithm LinearSelect is due to Blum, et al. [14]. The solution to
Exercise 10.20 is due to Aigner [4].
The solution to Exercise 10.32 is due to Bentley [11]. The solution to
Exercise 10.34 is due to Strassen [109].
Chapter 11

Optimization I: Greedy Algorithms
The spanning tree will initially contain only the vertex 0; hence, it is
unnecessary to include the index 0 for the arrays best and bestCost. We
can then initialize each best[k] to 0 and each bestCost[k] to the cost of edge
{0, k}, or to ∞ if there is no such edge. In order to find an edge to add
to the spanning tree we can find the minimum bestCost[k] such that k is
not in the spanning tree. If we denote this index by next, then the edge
{best[next], next} is the next edge to be added, thus connecting next to the
spanning tree. For each k that is still not in the spanning tree, we must then
update bestCost[k] by comparing it to the cost of {next, k}, and update
best[k] accordingly. The algorithm is shown in Figure 11.2.
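A minimal Python sketch of this procedure follows. It assumes the graph is supplied as a symmetric n × n cost matrix with None marking absent edges; this interface is an assumption of the sketch, not the book's Graph ADT.

```python
import math

def prim(cost):
    """Return a list of minimum spanning tree edges for a connected undirected graph
    on vertices 0..n-1, given as a cost matrix (cost[u][v] is None if {u, v} is absent)."""
    n = len(cost)
    in_tree = [False] * n
    in_tree[0] = True
    best = [0] * n                    # best[k]: tree vertex giving the cheapest edge to k
    best_cost = [math.inf if cost[0][k] is None else cost[0][k] for k in range(n)]
    edges = []
    for _ in range(n - 1):
        # Select the cheapest edge {best[next_v], next_v} leaving the tree.
        next_v = min((k for k in range(n) if not in_tree[k]), key=lambda k: best_cost[k])
        edges.append((best[next_v], next_v))
        in_tree[next_v] = True
        # Update best and best_cost for vertices still outside the tree.
        for k in range(n):
            c = cost[next_v][k]
            if not in_tree[k] and c is not None and c < best_cost[k]:
                best_cost[k] = c
                best[k] = next_v
    return edges
```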
It is easily seen that if G is a MatrixGraph, the running time
is in Θ(n^2). This is an improvement over Kruskal's algorithm when a
MatrixGraph is used. If a ListGraph is used, however, the running time
is still in Ω(n^2), and can be as bad as Θ(n^3) for dense graphs. Thus, Kruskal's
and an edge (w, x), where w is a vertex in T , so that the path from u to x
in the resulting tree is a shortest path in G from u to x.
For each vertex w in T , let dw give the length of the path from u to
w in T . For each edge (x, y) in G, let len(x, y) give the length of (x, y). Let
(w, x) be an edge in G such that
• w is in T ;
• x is not in T ; and
• dw + len(w, x) is minimized (a sketch of the resulting algorithm follows).
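This selection rule is the core of the single-source shortest paths algorithm attributed to Dijkstra in the chapter notes. The following Python sketch is an illustration only; it assumes the graph is supplied as an n × n matrix of edge lengths with None marking absent edges, which is not the book's Graph ADT.

```python
import math

def shortest_path_lengths(length, u):
    """Return d, where d[x] is the length of a shortest path from u to x
    (math.inf if x is unreachable); length[w][x] is None if (w, x) is not an edge."""
    n = len(length)
    d = [math.inf] * n
    d[u] = 0
    done = [False] * n
    for _ in range(n):
        # Choose the unfinished vertex x minimizing d[x]; at this point d[x] is final.
        x = min((v for v in range(n) if not done[v]), key=lambda v: d[v])
        if d[x] == math.inf:
            break                    # every remaining vertex is unreachable
        done[x] = True
        for y in range(n):
            if not done[y] and length[x][y] is not None:
                d[y] = min(d[y], d[x] + length[x][y])
    return d
```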
characters require three bytes. Various symbols and emoticons even require
four bytes.
Two improvements to this approach can be made to reduce the length of
a specific document. First, we can choose the encoding based on the actual
character frequencies within that document. Second, we can use a variable
number of bits, rather than a variable number of bytes. If we can make these
improvements, the characters occurring most frequently in the document are
likely to be encoded using fewer than eight bits.
The difficulty with variable-width encodings is choosing the encoding so
that it is clear where one character ends and the next begins. For example,
if we encode “n” with 11 and “o” with 111, then the encoding 11111 would
be ambiguous — it could encode either “no” or “on”. To overcome this
difficulty, we arrange the characters as the leaves of a binary tree in which
each non-leaf has two non-empty children (see Figure 11.3). The encoding
of a character is determined by the path from the root to the leaf containing
that character: each left child on the path denotes a 0 in the encoding, and
each right child on the path denotes a 1 in the encoding. Thus, in Figure 11.3,
“M” is encoded as 100. Because no path from the root to a leaf is a proper
prefix of any other path from the root to a leaf, no ambiguity results.
Example 11.1. For example, we can use the tree in Figure 11.3 to
encode “Mississippi” as 100011110111101011010. We parse this encoding by
traversing the tree according to the paths specified by the encoding. Starting
at the root, we go right-left-left, arriving at the leaf “M”. Starting at the
root again, we go left, arriving at the leaf “i”. Continuing in this manner, we
see that the bit-string decodes into “Mississippi”. Note that because there
are four distinct characters, a fixed-width encoding would require at least
two bits per character, yielding a bit string of length 22. However, the bit
string produced by the given encoding has length 21.
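A minimal Python sketch of constructing such a tree greedily and reading off the resulting codes follows. It uses a heap-based construction and is an illustration only, not the book's HuffmanTree of Figure 11.4.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Return a dict mapping each character of text to its bit string."""
    freq = Counter(text)
    # A heap entry is (weight, tie_breaker, tree); a tree is a character (leaf)
    # or a pair (left, right).
    heap = [(w, i, ch) for i, (ch, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)       # the two lowest-weight subtrees
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, tie, (t1, t2)))
        tie += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")       # a left child contributes a 0
            walk(tree[1], prefix + "1")       # a right child contributes a 1
        else:
            codes[tree] = prefix or "0"       # a one-character text gets the code "0"
    if heap:
        walk(heap[0][2], "")
    return codes

print(huffman_codes("Mississippi"))
```

The codes produced may differ from those of Figure 11.3, because ties among weights can be broken in more than one way; the total encoded length, however, is the same for any Huffman tree.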
Theorem 11.3. Let T be a Huffman tree for a frequency table F , and let
t1 , . . . , tn be subtrees of T such that n > 1 and each leaf of T occurs in exactly
Case 1: The path from x to t1 is no longer than the path from x to t2 . Let
t be the sibling of t2 in T . Without loss of generality, assume t is the left
child and t2 is the right child (otherwise, we can swap them). Clearly, t can
be neither t1 nor t2 . Furthermore, it cannot be a proper subtree of any of
t1 , . . . , tn , because then t2 would also be a proper subtree of the same tree.
Finally, t cannot contain t1 as a proper subtree, because then the path from
x to t1 would be longer than the path from x to t2 . We conclude that t must
contain one or more of t3 , . . . , tn . We can therefore swap t1 with t, letting
the result be T′.
Because t contains one or more of t3 , . . . , tn , weight(t1 ) ≤ weight(t);
hence, weight(t) − weight(t1 ) ≥ 0. The swap then causes the weights of all
nodes except x on the path from x to the parent of t1 in T to increase
by weight(t) − weight(t1 ). Furthermore, it causes the weights of all nodes
except x on the path from x to the parent of t2 in T to decrease by weight(t)−
weight(t1 ). No other nodes change weight. Because there are at least as many
nodes on the path from x to t2 in T as on the path from x to t1 in T , the
swap cannot increase the cost of the tree. Therefore T′ is a Huffman tree.
11.5 Summary
Greedy algorithms provide an efficient mechanism for solving certain opti-
mization problems. The major steps involved in the construction of a greedy
algorithm are:
Priority queues are often useful in facilitating quick access to the best
extension, as determined by the selection criterion. In many cases, the
extension involves joining pieces of a partial solution in a way that can be
modeled effectively using a DisjointSets structure.
Proving that the incremental extension can be extended to an optimal
solution is essential, because it is not true for all selection criteria. In fact,
there are optimization problems for which there is no greedy solution. In
the next chapter, we will examine a more general, though typically more
expensive, technique for solving optimization problems.
11.6 Exercises
Exercise 11.1. Prove that Kruskal, shown in Figure 11.1, meets its
specification.
Exercise 11.2. Prove that Prim, shown in Figure 11.2, meets its
specification.
Exercise 11.3. Instead of using the arrays best and bestCost, Prim’s
algorithm could use a priority queue to store all of the edges from vertices in
the spanning tree. As vertices are added to the spanning tree, all edges from
these vertices would be added to the priority queue. As edges are removed
from the priority queue, they would need to be checked to see if they connect
a vertex in the spanning tree with one that is not in the spanning tree.
Implement this algorithm and analyze its running time assuming the graph
is implemented as a ListGraph.
that pred[i] gives the parent of i in the shortest paths tree; pred[u] should
be −1.
Exercise 11.5. Modify your algorithm from Exercise 11.4 to use a priority
queue as suggested in Exercise 11.3. Analyze its running time assuming the
graph is implemented as a ListGraph.
Exercise 11.6. Suppose we wish to solve the single-source shortest path
problem for a graph with unweighted edges; i.e., each edge is understood to
have a length of 1. Prove that the algorithm for Exercise 11.5 can be modified
by replacing the priority queue with a queue (see Exercise 4.11, page 144) to
yield an algorithm for the unweighted single-source shortest path problem.
Analyze the running time of the resulting algorithm, assuming the graph
is implemented as a ListGraph. (This algorithm is known as breadth-first
search.)
Exercise 11.7. Construct a Huffman tree for the string, “banana split”,
and give its resulting encoding in binary. Don’t forget the blank character.
Exercise 11.8. Prove that HuffmanTree, shown in Figure 11.4, meets its
specification.
Exercise 11.9. Suppose we have a set of jobs, each having a positive integer
execution time. We must schedule all of the jobs on a single server so that at
most one job occupies the server at any given time and each job occupies the
server for a length of time equal to its execution time. Our goal is to minimize
the sum of the finish times of all of the jobs. Design a greedy algorithm to
accomplish this and prove that it is optimal. Your algorithm should run in
O(n lg n) time, where n is the number of jobs.
Exercise 11.10. Extend the above exercise to k servers, so that each job is
scheduled on one of the servers.
Exercise 11.11. Suppose we are given a set of events, each having a start
time and a finish time. Each event requires a single room. We wish to assign
events to rooms using as few rooms as possible so that no two events in the
same room overlap (they may, however, be scheduled “back-to-back” with
no break in between). Give a greedy algorithm to accomplish this and prove
that it is optimal. Your algorithm should run in O(n lg n) time.
Exercise 11.12. Repeat the above exercise with the constraint that only
one room is available. The goal is to schedule as many events as possible.
Exercise 11.13. We wish to plan a trip across country in a car that can go d
miles on a full tank of gasoline. We have identified all of the gas stations along
the proposed route. We wish to plan the trip so as to make as few stops for
gasoline as possible. Design a greedy algorithm that gives an optimal set of
stops when given d and an array dist[1..n] such that dist[i] gives the distance
from the starting point to the ith gas station. Your algorithm should operate
in O(n) time.
* Exercise 11.14. The fractional knapsack problem is as follows. We are
given a set of n items, each having a positive weight wi ∈ N and a positive
value vi ∈ N. We are also given a weight bound W ∈ N. We wish to carry
some of these items in a knapsack without exceeding the weight bound. Our
goal is to maximize the total value of the items we carry. Furthermore, the
items are such that we can take a fraction of the item if we wish. Thus, we
wish to maximize
    Σ_{i=1}^{n} ai·vi,

where 0 ≤ ai ≤ 1 gives the fraction of item i that we carry, subject to the
constraint that Σ_{i=1}^{n} ai·wi ≤ W.
a. Give a greedy algorithm to find an optimal packing, and prove that your
algorithm is correct. Your algorithm should run in O(n lg n) time.
b. Show using a specific example that this greedy algorithm does not always
give an optimal solution if we require that each ai be either 0 or 1.
c. Using techniques from Chapter 10, improve the running time of your
algorithm to O(n).
11.7 Notes
Greedy algorithms were first identified in 1971 by Edmonds [36], though
they actually existed long before then. The theory that underlies greedy
algorithms — matroid theory — was developed by Whitney [120] in the
1930s. See, e.g., Lawler [89] or Papadimitriou and Steiglitz [96] for more
information on greedy algorithms and matroid theory.
The first MST algorithm was given by Borůvka [15] in 1926. What is
now known as Prim's algorithm was first discovered by Jarník [70], and
over 25 years later rediscovered independently by Prim [99] and Dijkstra
[27]; the latter paper also includes the single-source shortest paths algorithm
outlined in Section 11.3. Kruskal’s algorithm was given by Kruskal [88].
Other MST algorithms have been given by Yao [124], Cheriton and Tarjan
[21], Tarjan [112], Karger [77], and Chazelle [20]. Other improvements for
single-source shortest paths have been given by Johnson [74, 75], Tarjan
[112], and Fredman and Tarjan [46].
Huffman coding was developed by Huffman [68]. See Lelewer and
Hirschberg [90] and Sayood [103] for surveys of compression algorithms. On
the website for this textbook is a tool for constructing and displaying a
Huffman tree for a given text.
Chapter 12

Optimization II: Dynamic Programming
In the last chapter, we saw that greedy algorithms are efficient solutions to
certain optimization problems. However, there are optimization problems for
which no greedy algorithm exists. In this chapter, we will examine a more
general technique, known as dynamic programming, for solving optimization
problems.
Dynamic programming is a technique of implementing a top-down
solution using bottom-up computation. We have already seen several exam-
ples of how top-down solutions can be implemented bottom-up. Dynamic
programming extends this idea by saving the results of many subproblems
in order to solve the desired problem. As a result, dynamic programming
algorithms tend to be more costly, in terms of both time and space, than
greedy algorithms. On the other hand, they are often much more efficient
than straightforward recursive implementations of the top-down solution.
Thus, when greedy algorithms are not possible, dynamic programming
algorithms are often the most appropriate.
first takes 25. At this point, the only denomination that does not cause the
total to exceed n is 1. The greedy strategy therefore gives a total of six coins:
one 25 and five 1s. This solution is not optimal, however, as we can produce
30 with three 10s.
Let us consider a more direct top-down solution. If k = 1, then dk = 1,
so the only solution contains n coins. Otherwise, if dk > n, we can reduce
the size of the problem by removing dk from the set of denominations, and
the solution to the resulting problem is the solution to the original problem.
Finally, suppose dk ≤ n. There are now two possibilities: the optimal solution
either contains dk or it does not. In what follows, we consider these two cases
separately.
Let us first consider the case in which the optimal solution does not
contain dk. (These two possibilities are not exclusive; there could be one
optimal solution that contains dk and another that does not.) In this case,
we do not change the optimal solution if we remove dk from the set of
denominations. We therefore have reduced the
    n − k ≥ k^2 − k = k(k − 1) ≥ (k − 1)^2,
Figure 12.1 Algorithm for computing the minimum number of coins needed to
achieve a given value
Precondition: d[1..k] is an array of Ints such that 1 = d[1] < d[2] < · · · <
d[k], and n is a Nat.
Postcondition: Returns an array A[1..k] such that A[i] gives the number
of coins of denomination d[i] in a minimum-sized collection of coins with
value n.
Change(d[1..k], n)
C ← new Array[0..n, 1..k]; A ← new Array[1..k]
for i ← 0 to n
C[i, 1] ← i
for i ← 0 to n
for j ← 2 to k
if i < d[j]
C[i, j] ← C[i, j − 1]
else
C[i, j] ← Min(C[i, j − 1], C[i − d[j], j] + 1)
for j ← 1 to k
A[j] ← 0
i ← n; j ← k
// Invariant: Σ_{l=1}^{k} A[l]·d[l] = n − i, and there is an optimal solution
// that includes all of the coins in A[1..k], but no additional coins from
// d[j + 1..k].
while j > 1
if i < d[j] or C[i, j − 1] < C[i − d[j], j] + 1
j ←j−1
else
A[j] ← A[j] + 1; i ← i − d[j]
A[1] ← i
return A[1..k]
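For comparison, here is a hedged Python rendering of the same idea. It follows the same table and trace-back structure as the figure but is an illustration rather than a literal translation.

```python
def change(d, n):
    """d: strictly increasing denominations with d[0] == 1; n: the target value.
    Returns a list A with A[j] coins of denomination d[j] in a minimum-sized solution."""
    k = len(d)
    # C[i][j]: minimum number of coins totalling i using only denominations d[0..j].
    C = [[0] * k for _ in range(n + 1)]
    for i in range(n + 1):
        C[i][0] = i                             # only 1-unit coins are available
        for j in range(1, k):
            if i < d[j]:
                C[i][j] = C[i][j - 1]
            else:
                C[i][j] = min(C[i][j - 1], C[i - d[j]][j] + 1)
    # Trace back which coins an optimal solution uses.
    A = [0] * k
    i, j = n, k - 1
    while j > 0:
        if i < d[j] or C[i][j - 1] < C[i - d[j]][j] + 1:
            j -= 1
        else:
            A[j] += 1
            i -= d[j]
    A[0] = i
    return A

print(change([1, 10, 25], 30))                  # [0, 3, 0]: three 10s
```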
M1 M2 · · · Mn ,
2 · 3 · 4 + 2 · 4 · 1 = 32.
3 · 4 · 1 + 2 · 3 · 1 = 18.
Thus, the way in which the matrices are parenthesized can affect
the number of scalar multiplications performed in computing the matrix
product. This fact motivates an optimization problem: Given a sequence of
positive integer dimensions d0 , . . . , dn , determine the minimum number of
scalar multiplications needed to compute the product M1 . . . Mn , assuming
Mi is a di−1 × di matrix for 1 ≤ i ≤ n, and that the number of scalar
multiplications required to multiply two matrices is as described above.
Various greedy strategies might be applied to this problem, but none
can guarantee an optimal solution. Let us therefore look for a direct
top-down solution to the problem of finding the minimum number of
scalar multiplications for a product Mi . . . Mj . Let us focus on finding the
last matrix multiplication. This multiplication will involve the products
Mi . . . Mk and Mk+1 . . . Mj for some k, 1 ≤ k < n. The sizes of these two
matrices are di−1 × dk and dk × dj . Therefore, once these two matrices are
computed, an additional di−1 dk dj scalar multiplications must be performed.
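A hedged Python sketch of the resulting dynamic program follows. The list dims holds d0, ..., dn, and the function returns only the minimum number of scalar multiplications, not the parenthesization itself.

```python
def matrix_chain(dims):
    """M_i has size dims[i-1] x dims[i] for 1 <= i <= n, where n = len(dims) - 1.
    Returns the minimum number of scalar multiplications to compute M_1 ... M_n."""
    n = len(dims) - 1
    # m[i][j]: minimum cost of computing the product M_i ... M_j (0 when i == j).
    m = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(2, n + 1):               # length of the subchain
        for i in range(1, n - length + 2):
            j = i + length - 1
            m[i][j] = min(m[i][k] + m[k + 1][j] + dims[i - 1] * dims[k] * dims[j]
                          for k in range(i, j))
    return m[1][n]

print(matrix_chain([2, 3, 4, 1]))                # 18, achieved by M1 (M2 M3)
```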
The principle of optimality clearly holds for this problem, as a better way
We wish to find, for each ordered pair (u, v) ∈ V 2 , the length of the shortest
path from u to v; if there is no such path, we define the length to be ∞.
Note that we have simplified the problem so that instead of finding the actual
paths, we will only be finding their lengths.
This optimization problem is somewhat nonstandard in that the objec-
tive function is not a numeric-valued function. Instead, its range can be
thought of as a matrix of values. However, the optimum is well-defined, as
it occurs when all values are simultaneously minimized, and this is always
possible.
Let p be a shortest path from i to j, and consider any vertex k other
than i or j. Then either k is in p or it isn’t. If k is not in p, then p remains
the shortest path from i to j if we remove k from the graph. Otherwise, we
can break p into a path from i to k and a path from k to j. Clearly, each
of these paths are shortest paths between their endpoints. Thus, if we can
find the shortest path from i to k and the shortest path from k to j, we can
determine the shortest path from i to j.
A shortcoming to this approach is that we haven’t actually reduced
the size of the problem, as the shortest paths from i to k and k to j are
with respect to the original graph. One way to avoid this shortcoming is to
generalize the problem so that a set of possible intermediate vertices is given
as additional input. The problem is then to find, for each ordered pair (i, j)
of vertices, the length of the shortest path from i to j such that all vertices
other than i and j on this path belong to the given set. If the given set is
V, then the result is the solution to the all-pairs shortest paths problem.
In order to keep the number of subproblems from being too large, we can
restrict the sets we allow as input. Specifically, our additional input can be a
natural number k, which denotes the set of all natural numbers strictly less
than k.
Let Lk (i, j) denote the length of the shortest path from i to j with
intermediate vertices strictly less than k, where 0 ≤ i < n, 0 ≤ j < n, and
0 ≤ k ≤ n. Using the above reasoning, we have the following recurrence for
Lk (i, j):
    Lk(i, j) = len(i, j)                                               if k = 0
             = min(Lk−1(i, j), Lk−1(i, k − 1) + Lk−1(k − 1, j))        if k > 0.    (12.3)
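A minimal Python sketch that evaluates recurrence (12.3) layer by layer follows; the final layer Ln holds the all-pairs shortest path lengths. The sketch assumes the edge lengths are supplied as a matrix with math.inf for missing edges and 0 on the diagonal, and it keeps only the current and previous layers.

```python
import math

def all_pairs_shortest_paths(edge_len):
    """edge_len[i][j]: length of edge (i, j), math.inf if absent, 0 when i == j.
    Returns a matrix L with L[i][j] the length of a shortest path from i to j."""
    n = len(edge_len)
    prev = [row[:] for row in edge_len]              # the layer L_0
    for k in range(1, n + 1):
        cur = [[min(prev[i][j], prev[i][k - 1] + prev[k - 1][j])   # recurrence (12.3)
                for j in range(n)]
               for i in range(n)]
        prev = cur                                   # L_k becomes the previous layer
    return prev
```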
We can then implement a dynamic programming algorithm to compute
all Lk (i, j) using a 3D array. However, we can save a great deal of space by
making some observations. Note that in order to compute an entry Lk (i, j),
for k > 0, we only use entries Lk−1 (i, j), Lk−1 (i, k−1), and Lk−1 (k−1, j). We
claim that Lk−1 (i, k −1) = Lk (i, k −1) and that Lk−1 (k −1, j) = Lk (k −1, j).
To see this, note that
and
0 ≤ j ≤ W:

    Vi(j) = 0                                          if i = 0
          = Vi−1(j)                                    if i > 0, j < wi         (12.4)
          = max(Vi−1(j), Vi−1(j − wi) + vi)            otherwise.
    V = Σ_{i=1}^{n} vi.
Let us then compute the minimum weight required to achieve each possible
value v ≤ V . The largest value v yielding a minimum weight no larger than
W is then our optimal value.
Taking this approach, we observe that item n is either in the set of items
for which value v can be achieved with minimum weight, or it isn’t. If it
is, then the minimum weight can be computed by removing item n and
finding the minimum weight needed to achieve a value of v − vn . Otherwise,
the minimum weight can be computed by removing item n. The following
recurrence therefore gives the minimum weight Wi (j) needed to achieve a
value of exactly j from the first i items, for 0 ≤ i ≤ n, 0 ≤ j ≤ V :
    Wi(j) = 0                                          if j = 0
          = ∞                                          if i = 0, j > 0
          = Wi−1(j)                                    if i > 0, 0 < j < vi      (12.5)
          = min(Wi−1(j), Wi−1(j − vi) + wi)            otherwise.
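A hedged Python sketch of recurrence (12.5) follows. It keeps only one row of the table at a time and reports the largest value whose minimum achievable weight does not exceed W.

```python
import math

def knapsack_by_value(weights, values, W):
    """0-1 knapsack: weights[i] and values[i] describe item i + 1; W is the weight bound.
    Returns the maximum total value of a subset of items with total weight at most W."""
    V = sum(values)
    # row[j]: minimum weight needed to achieve value exactly j from the items seen so far.
    row = [0] + [math.inf] * V                       # the row W_0
    for i in range(len(weights)):
        new_row = row[:]                             # item i + 1 unused by default
        for j in range(values[i], V + 1):
            new_row[j] = min(row[j], row[j - values[i]] + weights[i])
        row = new_row
    return max(j for j in range(V + 1) if row[j] <= W)

print(knapsack_by_value([2, 3, 4], [3, 4, 6], 6))    # 9: take the items with weights 2 and 4
```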
12.5 Summary
Dynamic programming algorithms provide more power for solving optimiza-
tion problems than do greedy algorithms. Efficient dynamic programming
algorithms can be found when the following conditions apply:
12.6 Exercises
Exercise 12.1. Prove by induction on n + k that C(n, k), as defined in
recurrence (12.1), gives the minimum number of coins needed to give a value
of exactly n if the denominations are d1 < d2 < · · · < dk and d1 = 1.
Exercise 12.2. Prove that Change, shown in Figure 12.1, meets its
specification. You do not need to focus on the first half of the algorithm;
i.e., you can assume that C(i, j), as defined in recurrence (12.1), is assigned
to C[i, j]. Furthermore, you may use the result of Exercise 12.1 in your proof.
a. Give a recurrence for L(i), the length of the longest increasing subse-
quence of A[1..i] that ends with i, where 1 ≤ i ≤ n.
b. Give a dynamic programming algorithm that prints the indices of a
longest increasing subsequence of A[1..n]. Your algorithm should operate
in O(n^2) time.
Exercise 12.10. Let A[1..m] and B[1..n] be two arrays. An array C[1..k]
is a common subsequence of A and B if there are two sequences of indices
i1 , . . . , ik and j1 , . . . , jk such that
a. Give a recurrence for L(i, j), the length of the longest common subse-
quence of A[1..i] and B[1..j].
b. Give a dynamic programming algorithm that returns the longest common
subsequence of A[1..m] and B[1..n]. Your algorithm should operate in
O(mn) time.
Exercise 12.11. A palindrome is a string that reads the same from right
to left as it does from left to right (“abcba”, for example). Give a dynamic
programming algorithm that takes a String (see Figure 4.17 on page 145) s
as input, and returns a longest palindrome contained as a substring within s.
Your algorithm should operate in O(n^2) time, where n is the length of s. You
may use the results of Exercise 4.13 (page 144) in analyzing your algorithm.
[Hint: For each pair of indices i ≤ j, determine whether the substring from
i to j is a palindrome.]
    (x1 − x2)^2 + (y1 − y2)^2.
Figure 12.5 An example assignment of a chain of processes to two processors, with
B = 20: Processor 1 has weight 8, Processor 2 has weight 17, and the communication
cost of the assignment is 2.
* Exercise 12.15. A chain is a rooted tree with exactly one leaf. We are
given a chain representing a sequence of n pipelined processes. Each node i
in the chain represents a process and has a positive execution time ei ∈ N.
Each edge (i, j) has a positive communication cost cij ∈ N. For edge (i, j),
if processes i and j are executed on separate processors, the time needed to
send data from process i to process j is cij ; if the processes are executed on
the same processor, this time is 0. We wish to assign processes to processors
such that each processor has total weight no more than a given value B ∈ N.
The weight of a processor is given by the sum of the execution times of the
processes assigned to that processor, plus the sum of the communication
costs of edges between tasks on that processor and tasks on other processors
(see Figure 12.5). The communication cost of an assignment is the sum of
the communication costs of edges that connect nodes assigned to different
processors.
Give a dynamic programming algorithm that finds the minimum com-
munication cost of any assignment of processes to processors such that each
processor has weight no more than B. Note that we place no restriction on
the number of processors used. Your algorithm should run in O(n²) time.
Prove that your algorithm is correct.
Exercise 12.16. Given two strings x and y, we define the edit distance from
x to y as the minimum number of operations required to transform x into y,
where the operations are chosen from the following:
• insert a character;
• delete a character; or
• change a character.
We say that a binary search tree containing these keys is optimal if the
expected cost of a look-up in this tree is minimum over the set of all binary
search trees containing these keys.
a. Let us extend the definition of the cost of a look-up to pertain to a
specific subtree, so that the cost with respect to subtree T is the number
of nodes in T examined during that look-up. For i ≤ j, let Sij be the
set of all binary search trees with keys k1 , . . . , kn such that there is
a subtree containing exactly the keys ki , . . . , kj . Let Cij denote the
minimum over Sij of the expected cost of a look-up with respect to the
subtree containing keys ki , . . . , kj . Prove that
    Cij = pi                                                      if i = j
    Cij = min_{i≤k≤j} (Ci,k−1 + Ck+1,j) + Σ_{k=i}^{j} pk          if i < j
that we don’t need the values of the keys in order to compute this
value.) Your algorithm should run in O(n³) time and O(n²) space.
**c. Suppose rij is the root of an optimal binary search tree containing the keys
ki, . . . , kj, where i ≤ j. Prove that ri,j−1 ≤ rij ≤ ri+1,j for 1 ≤ i < j ≤ n.
*d. Using the above result, improve your algorithm to run in O(n²) time.
Exercise 12.18. Give a dynamic programming algorithm that takes as
input two natural numbers k ≤ n and returns the probability that flipping a
fair coin n times yields at least k heads. Your algorithm should run in O(n)
time. Prove that your algorithm is correct.
* Exercise 12.19. Give a dynamic programming algorithm that takes as
input a natural number n and returns the number of different orderings of n
elements using < and/or =. For example, for n = 3, there are 13 orderings:
x<y<z x<z<y y<x<z y<z<x
z<x<y z<y<x x=y<z z<x=y
x=z<y y<x=z y=z<x x<y=z
x = y = z.
Your algorithm should run in O(n²) time and use O(n) space. Prove that
your algorithm is correct.
* Exercise 12.20. Suppose we have a mathematical structure containing
three elements, a, b, and c, and a multiplication operation given by the
following table:
        a   b   c
    a   a   c   a
    b   c   b   b
    c   a   c   b
Note that this multiplication operation is neither commutative nor associa-
tive. Give a dynamic programming algorithm that takes as input a string
over a, b, and c, and returns a boolean indicating whether it is possible
to parenthesize the string so that the result is a. (For example, if we
parenthesize abca as (a(bc))a, we get a result of a.) Your algorithm should
run in O(n³) time, where n is the length of the input string. Prove that your
algorithm is correct.
* Exercise 12.21. Suppose we are given an array L[1..n] of positive integers
representing the lengths of successive words in a paragraph. We wish to
format the paragraph so that each line contains no more than m characters,
including a single blank character between adjacent words on the same line.
Furthermore, we wish to minimize a “sloppiness” criterion. Specifically, we
wish to minimize the following objective function:
    Σ_{i=1}^{k−1} f(m − ci),
(x1 − x2)² + (y1 − y2)².
12.7 Notes
The mathematical foundation for dynamic programming was given by
Bellman [10]. The Change algorithm in Figure 12.1 is due to Wright [123].
The ChainedMatrixMult algorithm in Figure 12.2 is due to Godbole
[56]. Floyd’s algorithm (Figure 12.3) is due to Floyd [40], but is based on
a theorem due to Warshall [118] for computing the transitive closure of a
boolean matrix. Because a boolean matrix can be viewed as an adjacency
matrix for a directed graph, this is the same as finding the transitive closure
of a directed graph (Exercise 12.13).
The algorithm suggested by Exercise 12.3 is due to Kozen and Zaks [86].
Exercise 12.10 is solved by Chvatal et al. [22]. Wagner and Fischer [117]
solved Exercise 12.16 and provided an alternative solution to Exercise 12.10.
Exercise 12.17 is solved by Gilbert and Moore [55] and Knuth [81], but a
more elegant solution is given by Yao [125]. Exercises 12.19 and 12.20 are
from Brassard and Bratley [17].
Part IV
Depth-First Search
We interpret the size of the structure to be the size of num, and we interpret
the value of num[i] as the value associated with i. Clearly, the constructor
runs in Θ(n) time, and the operations all run in Θ(1) time.
The algorithm shown in Figure 13.2 combines the preorder and postorder
traversals of the tree T . We use a (directed) Graph to represent T . pre is
a VisitCounter that records the order in which nodes are visited in the
Precondition: n is a Nat.
Postcondition: Constructs a VisitCounter of size n, all of whose
values are 0.
VisitCounter(n)
count ← 0; num ← new Array[0..n − 1]
for i ← 0 to n − 1
num[i] ← 0
Precondition: true.
Postcondition: Returns the size of this VisitCounter.
VisitCounter.Size()
return SizeOf(num)
rooted at next has strictly fewer nodes than does T , from the Induction
Hypothesis, the recursive call satisfies the postcondition with S denoting
the set of descendants of next. Let R be the set of descendants of next. Let
S′ denote the value of S at the end of the iteration; i.e., S′ = S ∪ R. We
must show that the invariant holds for S′ at the end of the iteration.
Let us first determine the values in pre and post that have changed
from their initial values by the time the iteration completes. From the
invariant, only pre.Num(i), pre.Num(j), and post.Num(j) such that j ∈ S
have changed prior to the beginning of the iteration. From the Induction
Hypothesis, the recursive call only changes the values of pre.Num(j) and
post.Num(j) for j ∈ R. Thus, the only values to have changed from
their initial values are pre.Num(i), pre.Num(j), and post.Num(j) such
that j ∈ S′. Furthermore, because only values for j ∈ R are changed by
the iteration, it is still the case that pre.Num(i) > pre.Num(k) for all
k ∉ S′ ∪ {i}.
Let j ∈ S′ and k ∉ S′. If j ∉ R, then pre.Num(j), post.Num(j),
pre.Num(k) and post.Num(k) are unchanged by the iteration. Therefore,
because j ∈ S, from the invariant, it is still the case that pre.Num(j) >
pre.Num(k) and post.Num(j) > post.Num(k). On the other hand, suppose
j ∈ R. Because k ∉ R, by the Induction Hypothesis, pre.Num(j) >
pre.Num(k) and post.Num(j) > post.Num(k) at the end of the iteration.
Now let j, k ∈ S′. We must show that j is a proper ancestor of k iff
pre.Num(j) < pre.Num(k) and post.Num(j) > post.Num(k).
Correctness: Assume the invariant holds and that L is empty when the
loop terminates. We need to show that the postcondition holds when the
algorithm finishes. Let S denote the set of descendants of i.
Let us first consider which values in pre and post have been changed
by the algorithm. From the invariant, only pre.Num(i), pre.Num(j), and
post.Num(j), where j ∈ S \{i}, have changed by the time the loop
terminates. The final call to post.Visit(i) changes post.Num(i). Therefore,
the only values to have been changed by the algorithm are pre.Num(j) and
post.Num(j) such that j ∈ S.
Let j ∈ S and k ∉ S. If j ≠ i, then from the invariant, pre.Num(j) >
pre.Num(k) and post.Num(j) > post.Num(k). If j = i, then from the
invariant, pre.Num(j) > pre.Num(k). Furthermore, the call to post.Visit(i)
makes post.Num(j) > post.Num(k).
Now let j, k ∈ S. We must show that j is a proper ancestor of k iff
pre.Num(j) < pre.Num(k), and post.Num(j) > post.Num(k).
Figure 13.3 Algorithm for testing ancestry for multiple pairs of nodes in a rooted
tree
The algorithm for testing ancestry for multiple pairs of nodes is given
in Figure 13.3. The initialization prior to the call to PrePostTraverse
clearly runs in Θ(n) time, as does the call to PrePostTraverse. The
body of the loop runs in Θ(1) time. Because the loop iterates m times, the
entire algorithm runs in Θ(n + m) time.
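The following Python sketch — a rendering of the same idea rather than the book's pseudocode — assigns preorder and postorder numbers in one traversal (a single shared counter is used for brevity) and then answers each ancestry query in constant time.

def pre_post_numbers(tree, root):
    # tree[v] is the list of children of v; returns (pre, post) dictionaries.
    pre, post, counter = {}, {}, [0]

    def traverse(v):
        counter[0] += 1
        pre[v] = counter[0]              # numbered when v is first reached
        for child in tree.get(v, []):
            traverse(child)
        counter[0] += 1
        post[v] = counter[0]             # numbered when v is finished

    traverse(root)
    return pre, post

def is_proper_ancestor(pre, post, j, k):
    # j is a proper ancestor of k iff pre[j] < pre[k] and post[j] > post[k]
    return pre[j] < pre[k] and post[j] > post[k]

tree = {0: [1, 2], 1: [3, 4], 2: []}
pre, post = pre_post_numbers(tree, 0)
print(is_proper_ancestor(pre, post, 0, 4))   # True
print(is_proper_ancestor(pre, post, 1, 2))   # False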
Precondition: n is a Nat.
Postcondition: Constructs a Selector of size n, all of whose elements
are selected.
Selector(n)
Precondition: true.
Postcondition: Selects all elements.
Selector.SelectAll()
Precondition: true.
Postcondition: Unselects all elements.
Selector.UnselectAll()
Precondition: i is a Nat less than the number of elements.
Postcondition: Selects element i.
Selector.Select(i)
Precondition: i is a Nat less than the number of elements.
Postcondition: Unselects element i.
Selector.Unselect(i)
Precondition: i is a Nat less than the number of elements.
Postcondition: Returns true if element i is selected, or false otherwise.
Selector.IsSelected(i)
We can now traverse the graph using almost the same algorithm as
PrePostTraverse — the only differences are that pre and post are not
needed, and we must check that a vertex has not already been visited before
we traverse it. We call this traversal a depth-first search (DFS). The entire
algorithm is shown in Figure 13.5. We retain pre and post in order to
maintain a close relationship between ReachDFS and PrePostTraverse.
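For comparison, here is a minimal Python sketch of such a search from a single vertex; a set of still-selected vertices plays the role of the Selector, and the adjacency-list representation and function name are assumptions of the sketch.

def reach_dfs(adj, i, selected, pre_order=None, post_order=None):
    # Visit every vertex reachable from i, recording preorder and postorder.
    if pre_order is None:
        pre_order, post_order = [], []
    selected.discard(i)                # mark i as visited ("unselect" it)
    pre_order.append(i)                # preorder processing of i
    for k in adj[i]:
        if k in selected:              # (i, k) is a tree edge
            reach_dfs(adj, k, selected, pre_order, post_order)
        # otherwise (i, k) is a non-tree edge; nothing to do in this sketch
    post_order.append(i)               # postorder processing of i
    return pre_order, post_order

adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
print(reach_dfs(adj, 0, set(adj)))     # ([0, 1, 3, 2], [3, 1, 2, 0])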
Let G be an undirected Graph, and let i ∈ N such that i < G.Size().
Further let sel be a Selector of size G.Size() in which all elements are
selected, and let pre and post be VisitCounters of size G.Size() in which
all values are 0. Suppose we invoke ReachDFS(G, i, sel, pre, post). We define
a directed graph G′ as follows, based on the behavior of this invocation:
We therefore have the ADT specified in Figure 13.6. The generic depth-first
search is shown in Figure 13.7.
Let us now consider the useful properties of depth-first spanning trees.
These properties concern the non-tree edges. First, we show the following
theorem regarding undirected graphs.
Precondition: n is a Nat.
Postcondition: Constructs a new Searcher of size n.
Searcher(n)
Precondition: i is a Nat less than the size of this Searcher.
Postcondition: true.
Searcher.PreProc(i)
Precondition: i is a Nat less than the size of this Searcher.
Postcondition: true.
Searcher.PostProc(i)
Precondition: e is an Edge whose vertices are less than the size of this
Searcher.
Postcondition: true.
Searcher.TreePreProc(e)
Precondition: e is an Edge whose vertices are less than the size of this
Searcher.
Postcondition: true.
Searcher.TreePostProc(e)
Precondition: e is an Edge whose vertices are less than the size of this
Searcher.
Postcondition: true.
Searcher.OtherEdgeProc(e)
The above theorem gives the property of depth-first spanning trees that
makes depth-first search so useful for connected undirected graphs. Given
a connected undirected graph G and a depth-first spanning tree T of G,
let us refer to edges of G that correspond to edges in T as tree edges. We
will call all other edges back edges. By definition, tree edges connect parents
with children. Theorem 13.2 tells us that back edges connect ancestors with
descendants.
The point at which the proof of Theorem 13.2 fails for directed graphs
is the initial assumption that j is unselected before k is. For an undirected
graph, one of the endpoints of the edge will be unselected first, and it doesn’t
matter which endpoint we call j. However, with a directed edge, either the
source or the destination may be unselected first, and we must consider both
cases. Given the assumption that the source is unselected first, the remainder
of the proof follows. We therefore have the following theorem.
Theorem 13.3. Let G be a directed graph with n vertices such that all
vertices are reachable from i, and let sel be a Selector of size n in which
all elements are selected. Suppose we call Dfs(G, i, sel, s), where s is a
Searcher of size n. Then for every edge (j, k) processed as a non-tree edge,
if j is unselected before k is, then j is an ancestor of k.
• edges from ancestors to descendants (we call these forward edges if they
are not in the tree);
• edges from descendants to ancestors (we call these back edges); and
• edges from right to left (we call these cross edges).
Theorem 13.3 gives us the property we need to make use of depth-first search
with directed graphs.
As a final observation, we note that back edges in directed graphs always
form cycles, because there is always a path along the tree edges from a vertex
to any of its descendants. Hence, a directed acyclic graph cannot have back
edges.
In the next three sections, we will show how to use depth-first search to
design algorithms for connected undirected graphs, directed acyclic graphs,
and directed graphs.
We can now build a Searcher s so that Dfs(G, 0, sel, s) will find the
articulation points of G, where sel is an appropriate Selector. (Note that
it doesn’t matter which node is used as the root of the depth-first search,
so we will arbitrarily use 0.) Let n be the number of vertices in G. We
need as representation variables a VisitCounter pre of size n, an array
highest[0..n − 1], a readable array artPoints[0..n − 1] of booleans to store the
results, and a natural number rootChildren to record the number of children
of the root. Note that making artPoints readable makes this data structure
insecure, because code that can read the reference to the array can change
values in the array. We will discuss this issue in more detail shortly.
To implement the Searcher operations, we only need to determine
when the various calculations need to be done. Initialization should go in the
constructor; however, because the elements of the arrays are not needed until
ArtSearcher(n)
artPoints ← new Array[0..n − 1]; highest ← new Array[0..n − 1]
pre ← new VisitCounter(n); rootChildren ← 0
ArtSearcher.PreProc(i)
pre.Visit(i); artPoints[i] ← false; highest[i] ← ∞
ArtSearcher.TreePostProc(e)
i ← e.Source(); j ← e.Dest(); highest[i] ← Min(highest[i], highest[j])
if i = 0
rootChildren ← rootChildren + 1
else if highest[j] = pre.Num(i)
artPoints[i] ← true
ArtSearcher.OtherEdgeProc(e)
i ← e.Source(); k ← e.Dest()
highest[i] ← Min(highest[i], pre.Num(k))
ArtSearcher.PostProc(i)
if i = 0 and rootChildren > 1
artPoints[i] ← true
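For comparison, the following Python sketch performs the same computation outside the Searcher framework; the recursion plays the role of the depth-first search, the edge back to the parent is skipped explicitly, and the names are choices of the sketch rather than the book's.

def find_articulation_points(adj, root=0):
    # adj[v] lists the neighbours of v in a connected undirected graph.
    n = len(adj)
    pre = [0] * n              # preorder numbers; 0 means "not yet visited"
    low = [0] * n              # smallest preorder number reachable from the subtree
    art = set()
    counter = [0]

    def dfs(v, parent):
        counter[0] += 1
        pre[v] = low[v] = counter[0]
        children = 0
        for w in adj[v]:
            if pre[w] == 0:                      # tree edge
                children += 1
                dfs(w, v)
                low[v] = min(low[v], low[w])
                if v != root and low[w] >= pre[v]:
                    art.add(v)                   # subtree of w cannot bypass v
            elif w != parent:                    # back edge
                low[v] = min(low[v], pre[w])
        if v == root and children > 1:
            art.add(v)                           # root with two or more DFS children

    dfs(root, -1)
    return art

adj = [[1], [0, 2, 3], [1], [1]]
print(find_articulation_points(adj))             # {1}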
Figure 13.11 Algorithm for processing an entire graph with depth-first search
at a child of the root n. For this reason, we call the collection of trees
traversed by DfsAll a depth-first spanning forest. In particular, note that
either Theorem 13.2 or Theorem 13.3, depending on whether G is undirected
or directed, can be extended to apply to this forest.
Let G be a ListGraph with n vertices and a edges. A graph G′
constructed by adding a new vertex and n − 1 new edges to G then has
n + a − 1 edges. Therefore, the running time of DfsAll(G, s) is easily seen
to be in Θ(n+a), provided each of the vertex and edge processing operations
in s runs in Θ(1) time.
Now consider the depth-first spanning forest for a directed acyclic graph.
Because there are no cycles, the spanning forest can have no back edges. This
leaves only tree edges, forward edges and cross edges. Furthermore, for each
of these types of edge (i, j), j is postorder processed before i. This property
suggests a straightforward algorithm for topological sort, namely, to order
the vertices in the reverse of the order in which they are postorder processed
by a depth-first search.
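A minimal Python sketch of this reverse-postorder idea (the names and the dictionary representation are assumptions of the sketch, not the book's):

def topological_sort(adj):
    # adj[v] lists the successors of v in a directed acyclic graph.
    visited = set()
    order = []                      # filled in postorder

    def dfs(v):
        visited.add(v)
        for w in adj[v]:
            if w not in visited:
                dfs(w)
        order.append(v)             # postorder processing of v

    for v in adj:                   # search the entire graph, not just one root
        if v not in visited:
            dfs(v)
    return list(reversed(order))    # reverse postorder is a topological order

adj = {0: [1, 2], 1: [3], 2: [3], 3: []}
print(topological_sort(adj))        # [0, 2, 1, 3]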
The Searcher for this algorithm needs as representation variables a
readable array order[0..n − 1] for storing the listing of vertices in topological
order and a natural number loc for storing the location in order of the last
vertex to be inserted. Only the constructor and the PostProc operation are
nonempty; these are shown in Figure 13.12. The topological sort algorithm is
shown in Figure 13.13. If G is implemented as a ListGraph, the algorithm’s
TopSortSearcher(n)
order ← new Array[0..n − 1]; loc ← n
TopSortSearcher.PostProc(i)
loc ← loc − 1; order[loc] ← i
running time is clearly in Θ(n + a), where n is the number of vertices and a
is the number of edges in G. We leave the proof of correctness as an exercise.
Induction Hypothesis: Let n > 0, and assume that for every m < n, if
there is a path of length m from k to i in G′, then k is a descendant of i.
Case 2: (j, k) is either a forward edge or a tree edge. Then i and j are both
ancestors of k. Because j is in G′, it can be postorder processed no later
than i. Therefore, j cannot be a proper ancestor of i. j must therefore be a
descendant of i.
RevSearcher(n)
reverse ← new ListMultigraph(n)
order ← new Array[0..n − 1]; loc ← n
RevSearcher.TreePreProc(e)
reverse.Put(e.Dest(), e.Source(), e.Data())
RevSearcher.OtherEdgeProc(e)
reverse.Put(e.Dest(), e.Source(), e.Data())
RevSearcher.PostProc(i)
loc ← loc − 1; order[loc] ← i
SccSearcher(n)
components ← new Array[0..n − 1]; count ← 0
SccSearcher.PreProc(i)
components[i] ← count
SccSearcher.NextComp()
count ← count + 1
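Putting the two Searchers together, the overall computation amounts to the following two-pass Python sketch: the reverse graph is built and a depth-first search of G records a postorder, and a second search of the reverse graph in reverse postorder assigns component numbers. The sketch is a compact rendering, not the book's StronglyConnComp.

def strongly_connected_components(adj):
    # adj[v] lists the successors of v (vertices 0..n-1).
    # Returns comp, where comp[v] is the component number of v.
    n = len(adj)
    reverse = [[] for _ in range(n)]
    for v in range(n):
        for w in adj[v]:
            reverse[w].append(v)                 # build the reverse graph

    def dfs(graph, v, seen, visit):
        seen[v] = True
        for w in graph[v]:
            if not seen[w]:
                dfs(graph, w, seen, visit)
        visit(v)                                 # postorder processing

    # Pass 1: DFS the original graph, recording vertices in postorder.
    order, seen = [], [False] * n
    for v in range(n):
        if not seen[v]:
            dfs(adj, v, seen, order.append)

    # Pass 2: DFS the reverse graph in reverse postorder; each tree found
    # is one strongly connected component.
    comp, seen, count = [-1] * n, [False] * n, 0
    for v in reversed(order):
        if not seen[v]:
            members = []
            dfs(reverse, v, seen, members.append)
            for w in members:
                comp[w] = count
            count += 1
    return comp

adj = [[1], [2], [0, 3], [3]]
print(strongly_connected_components(adj))        # [0, 0, 0, 1]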
13.7 Summary
Many graph problems can be reduced to depth-first search. In performing the
reduction, we focus on a depth-first spanning tree or a depth-first spanning
forest. Because a rooted tree is more amenable to the top-down approach
than is a graph, algorithmic design is made easier. Furthermore, depth-first
spanning trees have structural properties that are often useful in designing
graph algorithms.
13.8 Exercises
Exercise 13.1. Analyze the worst-case running time of the algorithm Pre-
PostTraverse, shown in Figure 13.2, assuming the tree T is implemented
as a MatrixGraph.
Exercise 13.2. Prove that DfsTopSort, shown in Figures 13.12 and 13.13,
meets its specification.
Exercise 13.3. Show that StronglyConnComp, shown in Figures 13.14–
13.16, runs in Θ(n + a) time, where n is the number of vertices and a is the
number of edges in the given graph, assuming the graph is implemented as
a ListGraph.
Exercise 13.4. Prove that StronglyConnComp, shown in Figures 13.14–
13.16, meets its specification.
Exercise 13.5. Give an algorithm that decides whether a given directed
graph G contains a cycle. Your algorithm should return a boolean value
that is true iff G has a cycle. Assuming G is implemented as a ListGraph,
your algorithm should run in O(n+a) time, where n is the number of vertices
and a is the number of edges in G.
Exercise 13.6. A bridge in a connected undirected graph is an edge whose
removal disconnects the graph. Give an algorithm that returns a ConsList
containing all bridges of a given connected undirected graph. Your algorithm
should run in O(a) time in the worst case, where a is the number of edges
in the graph, assuming the graph is implemented as a ListGraph.
Exercise 13.7. A connected undirected graph is said to be biconnected if
it is impossible to disconnect the graph by removing a single vertex; i.e., it
is biconnected iff it has no articulation points. A biconnected component of
a connected undirected graph G is a maximal biconnected subgraph G′ of
G (by "maximal", we mean that there is no biconnected subgraph of G that
contains all of G′ plus other vertices and/or edges).
a. Prove that each edge in a connected undirected graph G belongs to
exactly one biconnected component of G.
13.9 Notes
The depth-first search technique was developed in the nineteenth century
by Trémaux, as reported by Lucas [92]. Its properties were studied by
Tarjan [110], who presented an algorithm he credits to Hopcroft for finding
articulation points and biconnected components (Exercise 13.7); see also
Hopcroft and Tarjan [65]. The algorithm given in Section 13.6 for finding
strongly connected components is due to Sharir [105].
Chapter 14
Figure 14.1 shows a flow network with source 0, sink 5, intermediate vertices 1–4, and all edge capacities equal to 1.
Thus, the flow on each edge is no more than that edge’s capacity, and the
total flow into a vertex other than the source or the sink is the same as the
total flow out of that vertex. An example of a flow on the network shown
in Figure 14.1 would have a flow of 1 on every edge except (4, 1); this edge
would have a flow of 0.
We leave it as an exercise to show that for any flow F of a flow network
(G, u, v, C),
    Σ_{e∈u→} F(e) − Σ_{e∈u←} F(e) = Σ_{e∈v←} F(e) − Σ_{e∈v→} F(e).        (14.1)
Case 1: (x, y) ∈ P. Then C′((x, y)) = C((x, y)) − m. The sum of the two
flows on (x, y) is therefore at most C((x, y)).
Case 3: (x, y) ∉ P and (y, x) ∉ P. Then C′((x, y)) = C((x, y)). In the
combination of F1 with F2, the flow on (x, y) is simply its flow in F2, which
can be no more than C((x, y)).
Finally, it is clear that for each vertex w in G other than u and v, the
total flow into w must equal the total flow out of w, and that the total flow
out of u is k + m.
Using the above Lemma, we can prove the theorem below. Combined
with Lemma 14.1, this theorem ensures that the reduction yields a maximum
flow for the given network.
Theorem 14.2. Let (G, u, v, C) be a flow network with maximum flow k,
and let P be the set of edges in some augmenting path. Let m be the minimum
in the worst case. Furthermore, the analysis of the last section still applies,
so that the running time is in O(min(M, na)(n + a)), where M is the value
of the maximum flow. If we assume that every vertex is reachable from the
source, we can simplify this to O(min(Ma, na²)).
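The following Python sketch illustrates the Edmonds–Karp strategy in isolation — breadth-first search for a shortest augmenting path over a residual-capacity matrix — and is a generic rendering rather than the book's NetworkFlow algorithm; the adjacency-matrix representation is chosen only for brevity.

from collections import deque

def max_flow(cap, source, sink):
    # cap[i][j] is the capacity of edge (i, j), or 0 if there is no such edge.
    n = len(cap)
    residual = [row[:] for row in cap]          # residual capacities
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = [-1] * n
        parent[source] = source
        queue = deque([source])
        while queue and parent[sink] == -1:
            i = queue.popleft()
            for j in range(n):
                if residual[i][j] > 0 and parent[j] == -1:
                    parent[j] = i
                    queue.append(j)
        if parent[sink] == -1:
            return flow                         # no augmenting path remains
        # Find the bottleneck m along the path, then augment by m.
        m = float("inf")
        j = sink
        while j != source:
            m = min(m, residual[parent[j]][j])
            j = parent[j]
        j = sink
        while j != source:
            residual[parent[j]][j] -= m         # use up forward capacity
            residual[j][parent[j]] += m         # add reverse (residual) capacity
            j = parent[j]
        flow += m

cap = [[0, 3, 2, 0],
       [0, 0, 1, 2],
       [0, 0, 0, 3],
       [0, 0, 0, 0]]
print(max_flow(cap, 0, 3))                      # 5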
Figure 14.6 The flow network constructed from the bipartite graph shown in
Figure 14.5
(The source is the new vertex 9, the sink is the new vertex 10, and all edge capacities are 1.)
edges, where n and a are the number of vertices and edges, respectively, in
the bipartite graph. The Edmonds–Karp algorithm will therefore solve the
constructed network flow instance in O(M (n + a)) time, where M is the
number of edges in the maximum-sized matching. Because M can be no
more than n/2, if we assume that each vertex is incident on at least one
edge, the running time is in O(na). Furthermore, it is not hard to construct
the flow network in O(n + a) time, so the bipartite matching problem can
be solved in O(na) time.
Rather than presenting the code for the reduction, let us first examine
the reduction more carefully to see if we can optimize the bipartite matching
algorithm. For example, the addition of new vertices and edges is only
needed to form a flow network. We could instead adapt one of the network
flow algorithms to operate without the source and/or the sink explicitly
represented.
We also note that as flow is added, the edges containing the flow — which
are the edges of a matching — have their direction reversed. Rather than
explicitly reversing the direction of the edges, we could keep track of which
edges have been included in the matching in some other way. For example,
we could use an array matching[0..n − 1] such that matching[i] gives the
vertex to which i is matched, or is −1 if i is unmatched. Because a matching
has at most one edge incident on any vertex, this may end up being a more
efficient way of keeping track of the vertices adjacent (in the flow network)
to vertices in V2 . The maximum-sized matching could also be returned via
this array.
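The following Python sketch captures this specialized algorithm: matching[k] records the V1-vertex matched to the V2-vertex k (or −1), and each vertex of V1 is added, if possible, by a search for an augmenting path. The representation and names are assumptions of the sketch rather than the book's MatchingGraph.

def maximum_bipartite_matching(adj, n1, n2):
    # adj[i] lists the V2-vertices (0..n2-1) adjacent to V1-vertex i (0..n1-1).
    matching = [-1] * n2

    def try_to_add(i, available):
        # Search for an augmenting path starting at V1-vertex i.
        for k in adj[i]:
            if k in available:
                available.discard(k)        # visit each V2-vertex at most once
                # use k if it is free, or if its current partner can be re-matched
                if matching[k] == -1 or try_to_add(matching[k], available):
                    matching[k] = i
                    return True
        return False

    for i in range(n1):                     # once i cannot be added, it never can be
        try_to_add(i, set(range(n2)))
    return matching

adj = [[0, 1], [0], [1, 2]]                 # V1 = {0, 1, 2}, V2 = {0, 1, 2}
print(maximum_bipartite_matching(adj, 3, 3))    # [1, 0, 2]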
As we observed at the end of Section 14.1, once flow is added to any
edge from the source or to any edge to the sink, that flow is never removed.
To put this in terms of the matching algorithm, once a vertex is matched, it
remains matched, although the vertex to which it is matched may change.
Furthermore, we claim that if we ever attempt to add a vertex w ∈ V1 to the
current matching M and are unable to do so (i.e., there is no path from w
to an unmatched vertex in V2 ), then we will never be able to add w to the
matching.
To see why this is true, notice that if there were a maximum-sized
matching containing all currently matched vertices and w, then there is
a matching M′ containing no other vertices from V1. If we delete all
vertices from V1 that are unmatched in M′, then M′ is clearly a maximum-
sized matching for the resulting graph. The Ford–Fulkerson algorithm must
therefore be able to find a path that yields M′ from M.
Figure 14.7 The MatchingGraph for the bipartite graph shown in Figure 14.5
with matching {{0, 5}, {3, 7}}
Its structural invariant will be that for 0 ≤ i < n, if matching[i] ≠ −1, then
matching[matching[i]] = i.
A partial implementation is shown in Figure 14.8 — we only include
implementations of those operations we will actually be using. These oper-
ations include an additional operation for adding an edge to the matching,
while removing any edges that might be incident on either endpoint. We
also include a constructor that constructs a MatchingGraph from a given
bipartite graph with an empty matching. We use the data variable of an
Edge to store the intermediate vertex between the two edges of the bipartite
graph represented by that Edge.
Note that this implementation is not secure, because its constructor
allows an outside reference to bipartite, and because matching is readable.
We could easily modify the implementation so that the constructor stores
a copy of its input graph and the Matching operation returns a copy of
matching; however, if we write our matching algorithm so that it doesn’t
MatchingGraph.Size()
return bipartite.Size() + 1
MatchingGraph.AllFrom(i)
n ← bipartite.Size(); L ← new ConsList()
if i < n
foundUnmatched ← false; adj ← bipartite.AllFrom(i)
while not adj.IsEmpty()
e ← adj.Head(); adj ← adj.Tail()
k ← e.Dest(); j ← matching[k]
if j = −1 and not foundUnmatched
L ← new ConsList(new Edge(i, n, k), L)
foundUnmatched ← true
else if j ≠ −1
L ← new ConsList(new Edge(i, j, k), L)
return L
PathSearcher(n)
incoming ← new Array[0..n − 1]
PathSearcher.TreePreProc(e)
incoming[e.Dest()] ← e
The number of iterations of the inner loop is at most the current size of the
matching, so its running time is in O(n) ⊆ O(a). The call to SelectAll
also runs in O(n) ⊆ O(a) time. We therefore conclude that a single iteration
of the for loop runs in O(a) time, so that the entire algorithm runs in O(na)
time.
To show that the running time of the algorithm is in Ω(na), we will first
construct a graph with 4k vertices and 4k − 1 edges for k ∈ N. We will
show that the algorithm runs in Ω(k²) time for these graphs. We will then
generalize the construction to an arbitrary number n of vertices and a edges
such that n − 1 ≤ a < n(n + 20)/32. We will show that the algorithm runs
in Ω(na) time for these graphs.
We begin by setting V = {i | 0 ≤ i < 4k} (refer to Figure 14.11 for the
case in which k = 4). We then add the following edges:
(In the construction for k = 4, vertices 0–7 form one partition and vertices 8–15 the other.)
We arrange the edges so that when we try to add vertex 2i for 0 ≤ i < k,
we first encounter the edge {2i, 2k + i}. Because 2k + i is not in the
matching, it is added. It will then be impossible to add vertex 2i + 1,
but each node 2k + j, for 0 ≤ j < i, will be reached in the search for
an augmenting path. (For example, consider the search when trying to add
5 to the matching {{0, 8}, {2, 9}, {4, 10}} in Figure 14.11.) Constructing this
matching therefore uses Ω(k²) time.
We can now generalize the above construction to arbitrary n by adding
or removing a few vertices adjacent to 2k −1. Furthermore, we can add edges
{2i, 2k + j} for 0 ≤ i < k and 0 ≤ j < i − 1 without increasing the size of
the maximum-sized matching. However, these additional edges must all be
traversed when we try to add vertex 2i+1 to the matching. This construction
therefore forces the algorithm to use Ω(na) time. Furthermore, the number
of edges added can be as many as
k−1
k(k − 1)
(i − 1) = −k
2
i=0
k2 − 3k
=
2
n2 − 12n
= .
32
Including the n − 1 original edges, the total number of edges a is in the range
n(n + 20)
n−1≤a< .
32
The above construction is more general than we really need, but its
generality shows that some simple modifications to the algorithm won’t
improve its asymptotic running time. For example, the graph is connected,
so processing connected components separately won’t help. Also, the two
partitions are the same size, so processing the smaller (or larger) partition
first won’t help either. Furthermore, using breadth-first search won’t help
because it will process just as many edges when no augmenting path exists.
On the other hand, this algorithm is not the most efficient one known for
this problem. In the exercises, we explore how it might be improved.
Although the optimizations we made over a direct reduction to network
flow did not improve the asymptotic running time of the algorithm, the
resulting algorithm may have other advantages. For example, suppose we are
trying to match jobs with job applicants. Each applicant may be qualified
for several jobs. We wish to fill as many jobs as possible, but still assign
jobs so that priority is given to those who applied earlier. If we process the
applicants in the order in which they applied, we will obey this priority.
14.4 Summary
The network flow problem is a general combinatorial optimization problem
to which many other problems can be reduced. Although the Ford–Fulkerson
algorithm can behave poorly when the maximum flow is large in comparison
to the size of the graph, its flexibility makes it useful for those cases in which
the maximum flow is known to be small. For cases in which the maximum
flow may be large, the Edmonds–Karp algorithm, which is simply the Ford–
Fulkerson algorithm using breadth-first search to find augmenting paths,
performs adequately.
The bipartite matching problem is an example of a problem which occurs
quite often in practice and which can be reduced to network flow to yield a
reasonably efficient algorithm. Furthermore, a careful study of the reduction
yields insight into the problem that leads to a more general algorithm.
14.5 Exercises
Exercise 14.1. Prove Equation (14.1) on page 442. [Hint: Show by
induction that the net flow out of any set of vertices including the source
but not the sink is equal to the left-hand side.]
e. Give an O(a√n) algorithm to find a maximum-sized matching in G.
Exercise 14.9. Suppose we modify the network flow problem so that the
input includes an array cap[0..n − 1] of integers such that for each vertex i,
cap[i] gives an upper bound on the flow we allow to go to and from vertex
i. Show how to reduce this problem to the ordinary network flow problem.
Your reduction must run in O(n + a) time, where n is the number of vertices
and a is the number of edges in the graph.
Exercise 14.10. We define an n × n grid to be an undirected graph (V, E)
where V = {(i, j) | 1 ≤ i ≤ n, 1 ≤ j ≤ n}, and two vertices (i, j) and (i′, j′)
are adjacent iff either i = i′ and j = j′ ± 1, or j = j′ and i = i′ ± 1. Thus, each
vertex in a grid has at most 4 neighbors. We call the vertices with fewer than
4 neighbors boundary vertices (i.e., these are vertices (1, j), (n, j), (i, 1), or
(i, n)). Give an O(mn²) algorithm which takes a value n ∈ N and m ≤ n²
starting vertices (i, j) ∈ [1..n] × [1..n] and determines whether there exists a
set of m vertex-disjoint paths in the n × n grid, each connecting a starting
node with a boundary node. You may assume you have an algorithm for the
problem stated in Exercise 14.9.
* Exercise 14.11. A path cover of a directed graph is a set of paths such
that every vertex is included in exactly one path. The size of a path cover is
the number of paths in the set. Show how to reduce the problem of finding
a minimum-sized path cover in a directed acyclic graph to the problem of
finding a maximum-sized matching in a bipartite graph. The total running
time of the algorithm should be in O(na).
* Exercise 14.12. We are given two arrays of integers, R[1..m] and C[1..n]
such that
    Σ_{i=1}^{m} R[i] = Σ_{i=1}^{n} C[i] = k.
your running time analysis, you may assume the graph is represented as a
ListGraph. Prove the correctness of your algorithm.
14.6 Notes
The NetworkFlow algorithm is due to Ford and Fulkerson [42]. The
running-time analysis of the use of breadth-first search in the Ford–Fulkerson
algorithm is due to Edmonds and Karp [37] and Dinic [30]. Asymptotically
faster algorithms exist — to date, the fastest known is due to Goldberg and
Rao [57]. Their algorithm has a running time in
O(min(n^{2/3}, a^{1/2}) a lg(n²/a + 2) lg C),
where C is the maximum capacity of any edge.
The technique of finding a maximum-sized matching using augmenting
paths is due to Berge [13]. He showed that in an arbitrary undirected graph,
a matching is of maximum size iff no augmenting path exists. Finding
augmenting paths in arbitrary undirected graphs is more challenging,
however, because we must avoid returning to the same vertex from which we
started. The first efficient algorithm for finding an augmenting path in an
arbitrary undirected graph is due to Edmonds [35]. The algorithm suggested
by Exercise 14.8 is due to Hopcroft and Karp [64], and is the asymptotically
fastest known algorithm for finding a maximum-sized matching in a bipartite
graph. The structure of this exercise is based on a problem in Cormen, et al.
[25]. An O(a√n) algorithm for arbitrary undirected graphs was later given
by Micali and Vazirani [93].
A solution to Exercise 14.6 is given by Ford and Fulkerson [43].
Chapter 15
15.1 Convolutions
Let a = a0 , . . . , am−1 and b = b0 , . . . , bn−1 be two vectors. We define the
convolution of a and b as the vector c = c0 , . . . , cm+n−2 , where
    cj = Σ_{i=max(0, j−n+1)}^{min(j, m−1)} ai bj−i.
in fact the hidden constant becomes quite large as ε approaches 0). We wish
to improve on these algorithms.
It is a well-known fact that a polynomial of degree n − 1 is uniquely
determined by its values at any n distinct points. Therefore, one way to
multiply two polynomials p(x) and q(x) whose product has degree n − 1 is
as follows:
1. Evaluate p(x) and q(x) at each of n distinct points.
2. Multiply the two resulting values at each of these points.
3. Interpolate to find the polynomial of degree less than n having the resulting products as its values at these points.
Note that step 2 can be done in Θ(n) time, assuming each multiplication
can be done in Θ(1) time. We need to show how steps 1 and 3 can be done
efficiently.
The evaluation of a polynomial of degree less than n at n distinct points
can be viewed as a linear transformation — i.e., a multiplication of a 1 × n
vector by an n × n matrix. Specifically, let p be the 1 × n vector representing
the coefficients of a polynomial p(x) as described above (if the degree is less
than n − 1, we can use coefficients of 0 for the high-order terms). Let A
be the n × n matrix such that Aij = x_j^i for 0 ≤ i < n, 0 ≤ j < n, where
x0, . . . , xn−1 are distinct values. Then the product pA yields the 1 × n vector
v = v0, . . . , vn−1 such that

    vj = Σ_{i=0}^{n−1} pi Aij
       = Σ_{i=0}^{n−1} pi x_j^i
       = p(xj).
    vA^{−1} = pAA^{−1}
            = p.

    (pA · qA)A^{−1},
where “·” denotes the component-wise product of two vectors of the same
size.
The main problem with this approach is that the multiplications of a 1×n
vector with an n×n array would appear to require Ω(n²) time. However, this
running time can be improved if we choose the points x0 , . . . , xn−1 cleverly.
In order to do this, we need to allow them to be chosen from the set of
complex numbers, C. We also need to define, for any n ≥ 1, a principal nth
root of unity as any value ω ∈ C such that
• ω^n = 1; and
• for 1 ≤ j < n,

    Σ_{i=0}^{n−1} ω^{ij} = 0.
We will show how to find such values in C. First, however, let us consider
why having a principal nth root of unity might be helpful. Given a principal
nth root of unity ω, let A be the n × n matrix such that Aij = ω^{ij}. Given a
1 × n vector p, the product pA is said to be the discrete Fourier transform of
p with respect to ω. Note that if p is the coefficient vector for a polynomial
p(x), then pA gives the values of p(ω^j) for 0 ≤ j < n.
In what follows, we will develop a divide-and-conquer algorithm for
computing a DFT. To simplify matters, let’s assume that n is a power of 2.
The following theorem shows an important property of principal nth roots
of unity when n is a power of 2. We will use this property in designing our
divide-and-conquer algorithm.
    Σ_{i=0}^{n−1} ω^{i(2j)} = 0.
    = Σ_{i=0}^{n/2−1} ω^{2ij} + Σ_{i=0}^{n/2−1} ω^{(2(i+n/2)−n)j}
    = 2 Σ_{i=0}^{n/2−1} ω^{2ij}.

    Σ_{i=0}^{n/2−1} (ω²)^{ij} = 0.
Note that each sum on the right-hand side is the jth component of the
DFT with respect to ω² of a 1 × n/2 vector. Specifically, let d′ and d′′ be the
DFTs of p′ and p′′, respectively, with respect to ω², and let d be the DFT of
p with respect to ω. Then for 0 ≤ j < n/2, we have

    dj = d′j + ω^j d′′j.    (15.1)
Furthermore,

    = 1 + ω^{n/2}.

Rearranging terms, we have ω^{n/2} = −1.
    = Σ_{i=0}^{n/2−1} ω^{ij} + Σ_{i=0}^{n/2−1} ω^{(i+n/2)j}
    = Σ_{i=0}^{n/2−1} ω^{ij} + Σ_{i=0}^{n/2−1} ω^{ij} (ω^{n/2})^j
    = Σ_{i=0}^{n/2−1} ω^{ij} + Σ_{i=0}^{n/2−1} ω^{ij} (−1)^j
    = 0.
    = Σ_{i=n/2}^{n−1} (ω²)^{ij/2},

    = Σ_{i=0}^{n/2−1} (ω²)^{(i+n/2)j/2}
    = (ω²)^{(n/2)j/2} Σ_{i=0}^{n/2−1} (ω²)^{ij/2}
    = Σ_{i=0}^{n/2−1} (ω²)^{ij/2}
    = 0.
We conclude that ω is a principal nth root of unity.
Using the fact that ω^{n/2} = −1, we can now rewrite (15.2) for 0 ≤ j <
n/2 as

    d_{j+n/2} = d′j − ω^j d′′j.    (15.3)
We therefore have the divide-and-conquer algorithm, known as the Fast
Fourier Transform, shown in Figure 15.1. Note that we use the type
Complex to represent a complex number.
Because Fft should only be called with a vector whose size n is a power
of 2, n is not a good measure of the size of the problem instance for the
purpose of analyzing the algorithm. Instead, we will use k = lg n. Assuming
each arithmetic operation on complex numbers can be performed in Θ(1)
time, it is easily seen that the running time excluding the recursive calls is
in Θ(2^k). The worst-case running time is therefore given by the recurrence

    f(k) ∈ 2f(k − 1) + Θ(2^k).

From Theorem 3.34, f(k) ∈ Θ(k·2^k).
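Because the pseudocode of Figure 15.1 is not reproduced here, the following Python sketch shows the same divide-and-conquer structure over C, using equations (15.1) and (15.3); it assumes the length of the input vector is a power of 2, and the names are choices of the sketch.

import cmath

def fft(p, omega):
    # Discrete Fourier transform of p with respect to the principal root of
    # unity omega, where len(p) is a power of 2.
    n = len(p)
    if n == 1:
        return list(p)
    d1 = fft(p[0::2], omega * omega)        # DFT of the even-indexed entries
    d2 = fft(p[1::2], omega * omega)        # DFT of the odd-indexed entries
    d = [0] * n
    w = 1
    for j in range(n // 2):
        d[j] = d1[j] + w * d2[j]            # equation (15.1)
        d[j + n // 2] = d1[j] - w * d2[j]   # equation (15.3)
        w *= omega
    return d

n = 4
omega = cmath.exp(2j * cmath.pi / n)        # a principal nth root of unity in C
print([round(abs(x)) for x in fft([1, 1, 1, 1], omega)])   # [4, 0, 0, 0]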
In order to use Fft to compute a convolution, we need to be able to
compute the inverse transform. Let A be the n × n matrix defining a DFT.
In order to compute the inverse transform, we need to know that A−1 exists,
and we need an efficient way to multiply a given 1 × n vector on the right
by A−1 . The following theorem gives A−1 .
Theorem 15.3. Let A be the n × n matrix such that for 0 ≤ i < n and
0 ≤ j < n, Aij = ω^{ij}, where ω is a principal nth root of unity. Then A^{−1} is
the matrix B, where Bij = ω^{−ij}/n.
Proof. We must show that AB = I, where

    Iij = 1 if i = j, and Iij = 0 otherwise,
Case 1: i = j. Then

    Cij = (1/n) Σ_{k=0}^{n−1} ω^{k(i−j)}
        = (1/n) Σ_{k=0}^{n−1} ω^0
        = 1.
= 0.
these values. The theorem actually holds for all positive n, but the proof is
simpler when n is a power of 2.
Theorem 15.5. Let n be a power of 2. Then

    cos(2π/n) + i sin(2π/n)

is a principal nth root of unity.
Proof. We first observe that if n = 1, then

    cos(2π/n) + i sin(2π/n) = 1 + 0i = 1.

We now show by induction that, for every power of 2 n ≥ 2, (cos(2π/n) + i sin(2π/n))^{n/2} = −1.
Base: n = 2. Then

    (cos(2π/n) + i sin(2π/n))^{n/2} = cos π + i sin π
                                    = −1 + 0i
                                    = −1.

Induction Step:

    (cos(2π/n) + i sin(2π/n))^{n/2} = ((cos(2π/n) + i sin(2π/n))²)^{n/4}
                                    = (cos²(2π/n) − sin²(2π/n) + 2i cos(2π/n) sin(2π/n))^{n/4}.

By the double-angle identities, cos²(2π/n) − sin²(2π/n) = cos(4π/n) and
2 cos(2π/n) sin(2π/n) = sin(4π/n). We therefore have

    (cos(2π/n) + i sin(2π/n))^{n/2} = (cos(4π/n) + i sin(4π/n))^{n/4}
                                    = (cos(2π/(n/2)) + i sin(2π/(n/2)))^{(n/2)/2}
                                    = −1
    Σ_{i=0}^{n−1} pi Aij = Σ_{i=0}^{n−1} pi ω^{ij}.

    (Σ_{i=0}^{n−1} pi ω^{ij}) (Σ_{k=0}^{n−1} qk ω^{kj}) = Σ_{i=0}^{n−1} Σ_{k=0}^{n−1} pi qk ω^{(i+k)j}.
Figure 15.2 Algorithm for computing a positive wrapped convolution over C using
the Fast Fourier Transform
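A hedged Python sketch of the overall convolution computation — transform both vectors, multiply component-wise, and apply the inverse transform of Theorem 15.3 — reusing the fft function sketched above; it is an illustration of the idea, not the algorithm of Figure 15.2.

import cmath

def convolution(a, b):
    # Return the (non-wrapped) convolution of real vectors a and b.
    m = len(a) + len(b) - 1                 # length of the convolution
    size = 1
    while size < m:                         # pad to a power of 2
        size *= 2
    a = list(a) + [0.0] * (size - len(a))
    b = list(b) + [0.0] * (size - len(b))
    omega = cmath.exp(2j * cmath.pi / size)
    pointwise = [x * y for x, y in zip(fft(a, omega), fft(b, omega))]
    # Inverse transform: multiply by the matrix with entries omega^(-ij)/n,
    # i.e., transform with omega^(-1) and divide by the length.
    inverse = fft(pointwise, 1 / omega)
    return [round((x / size).real, 10) for x in inverse][:m]

# (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2
print(convolution([1, 2], [3, 4]))          # [3.0, 10.0, 8.0]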
Example 15.1. Z, +, the set of integers with addition, is an abelian group.
Example 15.2. N, +, the set of natural numbers with addition, is not a
group because only 0 has an inverse.
Example 15.3. For a positive integer m, let Zm denote the set of natural
numbers strictly less than m, and let + denote addition mod m. It is not
hard to see that Zm , + is an abelian group, with 0 being the identity and
m − i being the inverse of i.
but

    [ 1 0 ] [ 1 1 ]   [ 1 1 ]
    [ 1 1 ] [ 0 1 ] = [ 1 2 ].
In what follows, we will show that the results of the previous section
extend to an arbitrary commutative ring R = S, +, · with unit element 1.
For convenience, we will typically abbreviate x · y as xy. We will also
abbreviate x + (−y) as x − y.
Hence, the definition of a principal nth root of unity makes sense for R.
Furthermore, the definition of a discrete Fourier transform also makes sense
over this ring. The following theorem states that some familiar properties of
exponentiation must hold for any ring with unit element; its proof is left as
an exercise.
Theorem 15.6. Let R be any ring with unit element. Then the following
properties hold for any x in R and any m, n ∈ N :
a. x^m x^n = x^{m+n}.
b. (x^m)^n = x^{mn}.
Theorem 15.1 can be shown using only the properties given in the
definition of a ring, together with Theorem 15.6. It therefore applies to
R. The derivations of Equations (15.1) and (15.2) use the properties of a
ring, together with commutativity, so that they also hold for R. The proof
of Theorem 15.2 applies for arbitrary rings with unit elements, so equation
(15.3) holds for R. The algorithm Fft therefore can be used to compute a
DFT over R, provided ω is a principal nth root of unity for that ring, and
that addition and multiplication on elements of the ring are the + and ·
operations from R.
In order to extend Theorem 15.3 to R, we must consider what it would
mean to divide by n in that ring. First of all the ring might not contain n as
an element. However, we can always embed the integers into a ring with unit
element as follows. First, if the ring has a unit element 1, it also contains
−1 (the additive inverse of 1) and 0 (the additive identity). For n > 1, if
n − 1 is in the ring, we can give the element (n − 1) + 1 the name n, and we
can give the element −(n − 1) − 1 the name −n. Thus, each integer refers to
some element of the ring. Note that a particular element of the ring might
not correspond to any integer, or it might correspond to more than one. If
it does correspond to more than one integer, it is not hard to show that it
corresponds to infinitely many integers.
Now that we have identified n with some element in the ring, we can
define division by n as multiplication by n^{−1}, provided n has a multiplicative
inverse. We note that if ω is a principal nth root of unity, then ω·ω^{n−1} = 1,
so that ω^{−1} = ω^{n−1}. Then the proof of Theorem 15.3 can easily be seen to
Theorem 15.7. Let k and n be powers of 2 such that 1 ≤ n ≤ 2k, and let
m = 2^k + 1. In the ring Zm, +, ·:
Note that if k and n are both powers of 2 such that 1 ≤ n ≤ 2k, both
2^{2k/n} and 2^{2k}/n are also powers of 2. This fact is advantageous because
MultFft(u, v)
n ← Max(1, u.NumBits() + v.NumBits()); k ← 2^⌈lg n⌉
return ModMult(u, v, k)
ModMult(u, v, k)
    u = Σ_{i=0}^{b−1} ui 2^{il}

and

    v = Σ_{i=0}^{b−1} vi 2^{il}.
Note that the last term in the above sum (i.e., for j = 2b−1) is 0. We include
it in order to simplify the derivation that follows.
Because k = bl, 2^{bl} = −1 in the ring Zm, +, ·, where m = 2^k + 1. We
can therefore write the product uv in this ring as

    uv = (Σ_{j=0}^{b−1} Σ_{i=0}^{j} ui vj−i 2^{jl}) − (Σ_{j=b}^{2b−1} Σ_{i=j−b+1}^{b−1} ui vj−i 2^{(j−b)l})
       = (Σ_{j=0}^{b−1} Σ_{i=0}^{j} ui vj−i 2^{jl}) − (Σ_{j=0}^{b−1} Σ_{i=j+1}^{b−1} ui vj−i+b 2^{jl})
       = Σ_{j=0}^{b−1} 2^{jl} (Σ_{i=0}^{j} ui vj−i − Σ_{i=j+1}^{b−1} ui vj−i+b).
Theorem 15.8. Let R be a commutative ring with unit element, and suppose
ψ is a principal (2n)th root of unity in R. Let p and q be 1 × n vectors over
R, and let Ψ and Ψ′ be 1 × n vectors such that Ψj = ψ^j and Ψ′j = ψ^{2n−j} for
0 ≤ j < n. Then the negative wrapped convolution of p and q is given by

    = ψ^{2n} Σ_{i=0}^{j} pi qj−i + ψ^{3n} Σ_{i=j+1}^{n−1} pi qj−i+n
    = Σ_{i=0}^{j} pi qj−i + ψ^n Σ_{i=j+1}^{n−1} pi qj−i+n
    = Σ_{i=0}^{j} pi qj−i − Σ_{i=j+1}^{n−1} pi qj−i+n.
uniquely determines

    Σ_{i=0}^{j} ui vj−i − Σ_{i=j+1}^{b−1} ui vj−i+b.
ModMultFft(u, v, k)
if k < 16
return ToRing(MultiplyAdHoc(u, v), k)
else
if (lg k) mod 2 = 0
b ← 2√k; l ← √k/2
else
b ← √(2k); l ← √(k/2)
uarray ← new Array[0..b − 1]; varray ← new Array[0..b − 1]
for j ← 0 to b − 1
uarray[j] ← new BigNum(u.GetBits(jl, l))
varray[j] ← new BigNum(v.GetBits(jl, l))
conv ← NegConv(uarray, varray, 4l)
return Eval(conv, k, l)
must be careful, however, when subtracting ω^i d′′[i] from d′[i] in order to
obtain d[i + mid], because ω^i d′′[i] may be greater than d′[i]. In order to
satisfy the precondition of BigNum.Subtract (Figure 4.18 on page 146),
we first subtract ω^i d′′[i] from m, then add the result, mod m, to d′[i]. In
Figure 15.7 The Fast Fourier Transform algorithm over a modular ring
ToRing(x, k)
numDig ← ⌈x.NumBits()/k⌉; m ← one.Shift(k).Add(one)
rem ← x.GetBits(k(numDig − 1), k)
// Invariant:
// rem = x.GetBits((i + 1)k, x.NumBits() − (i + 1)k) mod m
for i ← numDig − 2 to 0 by −1
next ← x.GetBits(ik, k)
if rem.CompareTo(next) > 0
next ← next.Add(m)
rem ← next.Subtract(rem)
return rem
Eval(v[0..n − 1], k, l)
m ← one.Shift(k).Add(one); m ← one.Shift(4l).Add(one)
half ← m .Shift(−1)
pos ← new Array[0..nl − 1]; neg ← new Array[0..nl − 1]
posCarry ← zero; negCarry ← zero
for j ← 0 to n − 1
if v[j].CompareTo(half ) > 0
negCarry ← negCarry.Add(m .Subtract(v[j]))
else
posCarry ← posCarry.Add(v[j])
negBits ← negCarry.GetBits(0, l); negCarry ← negCarry.Shift(l)
posBits ← posCarry.GetBits(0, l); posCarry ← posCarry.Shift(l)
Copy(negBits[0..l − 1], neg[jl..(j + 1)l − 1])
Copy(posBits[0..l − 1], pos[jl..(j + 1)l − 1])
posNum ← posCarry.Shift(nl).Add(new BigNum(pos))
negNum ← negCarry.Shift(nl).Add(new BigNum(neg))
return ToRing(posNum.Add(m.Subtract(ToRing(negNum, k))), k)
    Θ(bl) = Θ(2^{(K+1)/2 + (K−1)/2})
          = Θ(2^K).
O(2^K K lg K), and the running time of the resulting multiplication algorithm
would be in O(n lg n lg lg n). Thus, in order to improve the running time of
ModMult, it suffices to reduce the size of the ring we use from 2^{4l} + 1 =
2^{2·2l} + 1 to 2^{2l} + 1.
The difficulty with such an approach is that we have already shown that
lg(b·2^{2l+1} − 1) bits are required so that the elements of the negative wrapped
convolution over the given ring uniquely determine the negative wrapped
convolution over the integers. We need an additional result that will
allow us to extract the elements of the negative wrapped convolution over
the integers from their values over a modular ring. This result is the Chinese
Remainder Theorem.
Theorem 15.9 (Chinese Remainder Theorem). Let a1 , a2 , m1 , and
m2 be natural numbers such that a1 < m1 , a2 < m2 , where m1 and m2 are
relatively prime. Then there is a unique natural number i < m1 m2 such that
i mod m1 = a1 and i mod m2 = a2 .
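A small Python sketch of the constructive argument for two moduli (essentially the formula of Exercise 15.3(a)); the function name is a choice of the sketch.

def crt(a1, m1, a2, m2):
    # Return the unique i < m1*m2 with i % m1 == a1 and i % m2 == a2,
    # assuming m1 and m2 are relatively prime.
    c1 = pow(m1, -1, m2)                    # inverse of m1 modulo m2
    c2 = pow(m2, -1, m1)                    # inverse of m2 modulo m1
    return (m2 * c2 * a1 + m1 * c1 * a2) % (m1 * m2)

# With l = 2, the ring moduli 2^(2l) + 1 = 17 and 2b = 8 are relatively prime.
print(crt(13, 17, 5, 8))                    # 13, since 13 mod 17 = 13 and 13 mod 8 = 5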
Before we prove this theorem, let's see why it might be useful. We need to
compute the negative wrapped convolution of two vectors u and v, each of
size b and consisting of natural numbers less than 2^l. Let wj denote the jth
component of the negative wrapped convolution. As we have already shown,
−b·2^{2l} < wj < b·2^{2l}. Suppose we were to compute the negative wrapped
convolution over two separate rings Zmi, +, ·, where m1 = 2^{2l} + 1 and
m2 = 2b, as shown in Figure 15.10. (As we will see, it is possible to compute
the second convolution with relatively little overhead.) Then the results of
these convolutions give us
and
for 0 ≤ j < b.
Because 2b is a power of 2 and 2^{2l} + 1 is odd, they are relatively prime.
Theorem 15.9 therefore guarantees that if wj ≥ 0, then it is the only natural
number less than 2b(2^{2l} + 1) that satisfies (15.2) and (15.3). Furthermore, it
is not hard to see that wj + 2b(2^{2l} + 1) also satisfies these constraints. Thus,
Theorem 15.9 guarantees that if wj < 0, then wj + 2b(2^{2l} + 1) is the only
natural number less than 2b(2^{2l} + 1) that satisfies these constraints. The proof
of Theorem 15.9 will be constructive, so that we will be able to compute the
value that it guarantees. Finally, because wj < b(2^{2l} + 1) < wj + 2b(2^{2l} + 1),
ModMultSS(u, v, k)
if k < 8
return ToRing(MultiplyAdHoc(u, v), k)
else
if (lg k) mod 2 = 0
b ← √k; l ← √k
else
b ← √(2k); l ← √(k/2)
uarray ← new Array[0..b − 1]; varray ← new Array[0..b − 1]
uarray′ ← new Array[0..b − 1]; varray′ ← new Array[0..b − 1]
for j ← 0 to b − 1
uarray[j] ← new BigNum(u.GetBits(jl, l))
varray[j] ← new BigNum(v.GetBits(jl, l))
uarray′[j] ← new BigNum(u.GetBits(jl, lg b + 1))
varray′[j] ← new BigNum(v.GetBits(jl, lg b + 1))
conv ← NegConv(uarray, varray, 2l)
conv′ ← NegConvSS(uarray′, varray′, lg b + 1)
return EvalSS(conv, conv′, k, l)
We can multiply by 2^{2l} + 1 using a bit shift and an addition. We can then
determine wj by comparing the above value with b(2^{2l} + 1) and subtracting
2b(2^{2l} + 1) if necessary. The algorithm is shown in Figure 15.11.
In order to implement NegConvSS, we must be able to compute a
negative wrapped convolution over a ring Zm , +, ·, where m is a power
of 2. However, because the values of the vectors are much smaller than those
used in the other convolution, we don’t need to be quite as careful regarding
the efficiency of this algorithm. Specifically, we don’t need to use the FFT.
Instead, we can first compute a non-wrapped convolution mod 2^k. Let us
when K ≥ 3.
The above recurrence fits the form of (15.7) with d = 1; hence, as we
showed at the beginning of this section, the running time of the Schönhage–
Strassen algorithm is in O(n lg n lg lg n), where n is the number of bits in the
product.
15.5 Summary
The Fast Fourier Transform is an efficient algorithm for computing a con-
volution, a problem which arises in a variety of applications. For numerical
applications, applying the FFT over C, +, · is appropriate; however, for
number-theoretic applications like arbitrary-precision integer multiplication,
other algebraic structures are more appropriate. The algorithm extends to
any commutative ring containing a principal nth root of unity, and over
which n has a multiplicative inverse, where n is a power of 2 giving the
number of elements in the vectors.
Some rings that are particularly useful for number-theoretic applications
are rings of the form Zm, +, ·, where m is of the form 2^k + 1. The properties
of these rings contribute in several ways to the efficiency of the Schönhage–
Strassen integer multiplication algorithm. First, we can compute n mod
(2^k + 1) efficiently. Second, the principal nth roots of unity in these rings
are powers of 2, so that we can use bit shifting to multiply by these roots.
Third, when n is a power of 2, it has a multiplicative inverse that is also a
power of 2. Fourth, we can compute a product in this ring with a negative
wrapped convolution of vectors with half as many elements as would be
needed to compute a non-wrapped convolution. Finally, because any power
of 2 is relatively prime to 2^k + 1, we can reduce by half the number of
15.6 Exercises
Exercise 15.1. Prove Theorem 15.6. [Hint: Use induction on either m or n.]
Exercise 15.2. Suppose that in multiplying two BigNums mod 2^k − 1,
where k is a power of 2, instead of making b and 4l as nearly equal as
possible (as in Section 15.3), we were to make b as small as possible. Analyze
the running time of the algorithm that results if we set b to 8 and l to k/8.
Exercise 15.3.
a. Prove Theorem 15.9 by showing that for any a1 ∈ Zm1 and any a2 ∈
Zm2 , if i = (m2 c2 a1 + m1 c1 a2 ) mod m1 m2 , where (m1 c1 ) mod m2 = 1
and (m2 c2 ) mod m1 = 1, then i mod m1 = a1 and i mod m2 = a2 .
* b. Extend the above idea to prove the following. Let m1 , . . . , mn be
positive integers that are all relatively prime to each other, and let
    M = ∏_{j=1}^{n} mj.
Thus, if c is a principal nth root of unity, then the chirp transform with
respect to c is a DFT. Show how to reduce the problem of computing a chirp
transform for arbitrary c ∈ C to the problem of computing a convolution.
Using this reduction, give an O(n lg n) algorithm for evaluating a chirp
transform.
15.7 Notes
Heideman, Johnson, and Burrus [61] credit Gauss with the discovery of the
fast Fourier transform in 1805. Its importance to computation was shown by
Cooley and Tukey [24]. The multiplication algorithm of Section 15.4 is due
to Schönhage and Strassen [104].
Though we have referred to Theorem 15.9 as the Chinese Remainder
Theorem, it is usually stated in the more general form suggested by Exercise
15.3. The process of solving so-called simultaneous congruences in this
way dates back to the third or fourth century AD, when the Chinese
mathematician Sun Zi (or Sun Tsŭ) showed how to solve a specific instance of
simultaneous congruences. The technique was published as a general theorem
by Qin Jiushao (or Chhin Chiu-Shao) in 1247.
Part V
Intractable Problems
Chapter 16
N P-Completeness
(In the figure, ∨ denotes "or", ∧ denotes "and", and ¬ denotes "not"; the variables x and y are represented by the integers 1 and 2.)
such that
• Y ∈ P;
• for each x ∈ I, x ∈ X iff there is a proof φ ∈ B such that (x, φ) ∈ Y ; and
• for each x ∈ X, there is a proof φ ∈ B such that (x, φ) ∈ Y and |φ| ≤ p(|x|).
From our earlier discussion, it follows that Sat ∈ N P. (We use |x| to denote the length of the encoding of x.) We can clearly consider any array A[1..n] of
computing f (x) and deciding whether f (x) ∈ Y. The notation may seem
confusing at first, because when we use the word “reduce”, we usually think
of decreasing the size. As a result, denoting a reduction from X to Y by
X ≤pm Y seems backwards. The proper way to understand the notation is
to realize that when there is a polynomial many-one reduction from X to Y,
then in some sense, X is no harder than Y. This idea is formalized by the
following theorem.
Note that Theorem 16.2 does not say that if Y can be decided in O(f (n))
time, then X can be decided in O(f (n)) time. Indeed, in the proof of the
theorem, the bound on the time to decide X can be much larger than the
time to decide Y. Thus, if we interpret X ≤pm Y as indicating that X is no
harder than Y, we must understand “no harder than” in a very loose sense —
simply that if Y ∈ P, then X ∈ P.
We will often utilize Theorem 16.2 in the following equivalent form.
The idea of the proof of Cook’s Theorem is to give a method for constructing,
from an arbitrary X ∈ N P, a polynomial-time algorithm that takes as input
an instance x of X and produces as output a boolean expression F such
that F is satisfiable iff x ∈ X. In constructing this algorithm, we can use the
polynomial p(n) bounding the size of a proof φ and the algorithm for deciding
whether φ proves that x ∈ X. In order to complete the construction, we must
carefully define the computational model so that the boolean formula can
encode the algorithm. Due to the large amount of work involved, we will
delay the proof of Cook’s Theorem until Section 16.8.
Fortunately, once we have one N P-complete problem, the task of
showing other problems to be N P-complete becomes much easier. The
reason for this is that polynomial many-one reducibility is transitive, as
we show in the following theorem. Its proof is similar to the proof of
Theorem 16.2, and is therefore left as an exercise. Its corollary then gives
are boolean expressions for which the shortest equivalent CNF expression
has size exponential in the size of the original expression. As a result, any
such conversion algorithm must require at least exponential time in the
worst case.
Fortunately, our reduction doesn’t need to construct an equivalent
expression, but only one that is satisfiable iff the given expression is
satisfiable. In fact, the constructed expression isn’t even required to contain
the same variables. We will use this flexibility in designing our reduction.
For the first step of our reduction, we will construct an equivalent
formula in which negations are applied only to variables. Because of this
restriction, we can simplify our representation for this kind of expression
by allowing leaves to contain either positive or negative integers, as in our
representation of CNF formulas. Using this representation, we no longer need
nodes representing the ¬ operation. We will refer to this representation as a
normalized expression tree.
Fortunately, there is a polynomial-time algorithm for normalizing a bool-
ean expression tree. The algorithm uses DeMorgan’s laws:
• ¬(x ∨ y) = ¬x ∧ ¬y; and
• ¬(x ∧ y) = ¬x ∨ ¬y.
The algorithm is shown in Figure 16.3. This algorithm solves a slightly more
general problem for which the input includes a boolean neg, which indicates
whether the normalized expression should be equivalent to F or ¬F. It is
easily seen that its running time is proportional to the number of nodes in
the tree, which is in O(m), where m is the number of operators in F.
As the second step in our reduction, we need to find the largest integer
used to represent a variable in a normalized expression tree. We need this
value in order to be able to introduce new variables. Such an algorithm is
shown in Figure 16.4. Clearly, its running time is in O(|F|).
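As a concrete illustration, the following Python sketch gives one way to carry out the normalization of Figure 16.3 and the maximum-variable computation of Figure 16.4; the tuple-based tree representation and the function names are assumptions of the sketch, not the text's notation.

    # Expression trees are represented here as nonzero integers (a negative
    # integer denotes a negated variable) or as tuples ("and", left, right)
    # and ("or", left, right); un-normalized trees may also contain
    # ("not", subtree) nodes.

    def normalize(f, neg=False):
        # Push negations down to the leaves using DeMorgan's laws.
        # neg indicates whether the result should be equivalent to f or to ¬f.
        if isinstance(f, int):
            return -f if neg else f
        if f[0] == "not":
            return normalize(f[1], not neg)
        op, left, right = f
        if neg:                          # DeMorgan: negation swaps ∧ and ∨
            op = "and" if op == "or" else "or"
        return (op, normalize(left, neg), normalize(right, neg))

    def max_variable(f):
        # Largest integer used to represent a variable in a normalized tree.
        if isinstance(f, int):
            return abs(f)
        return max(max_variable(f[1]), max_variable(f[2]))

Each function visits every node once, so each runs in time proportional to the size of the tree.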
As the third step in our reduction, we will construct from a normalized expression tree F and a value larger than any integer representing a variable in F, a CNF expression F′ having the following properties:
P1 : F′ contains all of the variables in F;
P2 : for any satisfying assignment A for F, there is a satisfying assignment A′ for F′ in which all the variables in F have the same values as in A; and
P3 : for any satisfying assignment A′ for F′, the assignment A for F in which each variable in F is assigned its value from A′ satisfies F.
Figure 16.5 Algorithm for constructing a CNF formula from a normalized expression
tree
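The following Python sketch gives one standard construction satisfying P1–P3, a Tseitin-style transformation that introduces a fresh variable for each internal node. It illustrates the general technique and is not necessarily the exact algorithm of Figure 16.5; the clause representation (lists of nonzero integers) is an assumption of the sketch.

    def tree_to_cnf(f, max_var):
        # f is a normalized expression tree; max_var is at least as large as
        # any integer representing a variable in f.  Returns a list of
        # clauses (each a list of nonzero integers) satisfying P1-P3.
        counter = [max_var]

        def fresh():
            counter[0] += 1
            return counter[0]

        def encode(g):
            # Returns (literal, clauses) where the clauses force the literal
            # to take the value of the subtree g.
            if isinstance(g, int):
                return g, []
            op, left, right = g
            a, ca = encode(left)
            b, cb = encode(right)
            y = fresh()                      # variable standing for this node
            if op == "and":                  # y <-> (a and b)
                extra = [[-y, a], [-y, b], [y, -a, -b]]
            else:                            # y <-> (a or b)
                extra = [[-y, a, b], [y, -a], [y, -b]]
            return y, ca + cb + extra

        root, clauses = encode(f)
        return clauses + [[root]]            # assert that the root is true

The construction produces a constant number of clauses per operator, so it runs in time proportional to the size of F.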
The fact that 3-Sat ∈ N P follows immediately from the fact that
CSat ∈ N P, as 3-Sat is the same problem as CSat, only with more
restrictions placed on the input. Thus, the proof that CSat ∈ N P also
proves that 3-Sat ∈ N P.
In order to show that 3-Sat is N P-hard, we have two choices: we can
reduce either Sat or CSat to 3-Sat. Reducing CSat to 3-Sat would appear
to be less work, as instances of CSat are already in CNF. All that remains
is to ensure that the number of literals in each clause is no more than 3. We
will therefore show that CSat ≤pm 3-Sat.
As in the previous reduction, we will not produce an equivalent formula.
Instead, we will again introduce new variables. In addition, we will break up
clauses that are too long into clauses containing only 3 literals.
Suppose our formula contains a clause C = α1 ∨ · · · ∨ αm , where m > 3.
We first introduce m − 3 new variables, u1 , . . . , um−3 . We then construct the
following clauses to replace C:
• α1 ∨ α2 ∨ u1 ;
• ¬ui ∨ αi+2 ∨ ui+1 for 1 ≤ i ≤ m − 4; and
• ¬um−3 ∨ αm−1 ∨ αm .
We first claim that any assignment of boolean values that satisfies C can
be extended to an assignment that satisfies each of the new clauses. To see
why, first observe that if C is satisfied, then αi must be true for some i. We
can then set u1 , . . . , ui−2 to true and ui−1 , . . . , um−3 to false. Then each of
the first i − 2 clauses is satisfied because u1 , . . . , ui−2 are true. The (i − 1)st
clause, ¬ui−2 ∨ αi ∨ ui−1 is satisfied because αi is true. Finally, the remaining
clauses are satisfied because ¬ui−1 , . . . , ¬um−3 are true.
We now claim that any assignment that satisfies the new clauses will also
satisfy C. Suppose to the contrary that all the new clauses are satisfied, but
that C is not satisfied — i.e., that α1 , . . . , αm are all false. Then in order for
the first clause to be satisfied, u1 must be true. Likewise, it is easily shown
by induction on i that each ui must be true. Then the last clause is not
satisfied — a contradiction.
If we apply the above transformation to each clause having more than
3 literals in a CNF formula F and retain those clauses with no more than
3 literals, then the resulting 3-CNF formula is satisfiable iff F is satisfiable.
Furthermore, it is not hard to implement this reduction in O(|F|) time —
the details are left as an exercise. Hence, CSat ≤pm 3-Sat. We therefore
conclude that 3-Sat is N P-complete.
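Although the details are left as an exercise, the following Python sketch (with an assumed fresh-variable helper) shows how a single long clause can be split in the manner just described.

    def split_clause(clause, fresh):
        # clause is a list of nonzero integers (literals); fresh() returns a
        # previously unused positive integer.  Returns an equisatisfiable
        # list of clauses, each with at most 3 literals.
        m = len(clause)
        if m <= 3:
            return [list(clause)]
        u = [fresh() for _ in range(m - 3)]          # u1, ..., u_{m-3}
        out = [[clause[0], clause[1], u[0]]]         # alpha1 or alpha2 or u1
        for i in range(1, m - 3):                    # not u_i or alpha_{i+2} or u_{i+1}
            out.append([-u[i - 1], clause[i + 1], u[i]])
        out.append([-u[m - 4], clause[m - 2], clause[m - 1]])
        return out

Applying split_clause to every clause of a CNF formula, reusing the clauses that already have at most 3 literals, yields a 3-CNF formula that is satisfiable iff the original formula is.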
Figure 16.7 The graph constructed from (x1 ∨ ¬x2 ∨ x3 ) ∧ (¬x1 ∨ x3 ) in the reduction
from 3-Sat to VC
Figure 16.8 Triples for setting boolean values in the reduction from 3-Sat to 3DM,
with n = 4
Let x1 , . . . , xn denote all the copies of the literal x, and let ¬x1 , . . . , ¬xn
denote all the copies of the literal ¬x. We then introduce the following triples
(see Figure 16.8):
It is not too hard to see that in order to match all of the axi s and bxi s, a
matching must include either those triples containing the xi s or those triples
containing the ¬xi s.
We can now use the construction described earlier for building triples
from clauses, except that for clause i, we include the ith copy of each literal
in its triple. Thus, in any matching, there must be for each clause at least one
triple containing a copy of a literal. However, there still may be unmatched
copies of literals. We need to introduce more triples in order to match the
remaining copies.
Suppose our 3-CNF formula F has n clauses and m variables. Then our
construction so far contains:
we include ¬xi , axi , bxi for 1 ≤ i ≤ n. Thus, each axi and bxi is included
exactly once. Then for clause i, because A is a satisfying assignment there
is at least one literal αij that is true in A. Because αij has not yet been
included in M , we can include the triple αij , ci , di in M . Thus, M includes
each ci and di exactly once.
At this point M includes no item more than once, but does not include
any of the ei s or fi s. Furthermore, because exactly mn+n of the xi s and ¬xi s
have been included, (m−1)n have not yet been included. Let β1 , . . . , β(m−1)n
denote the xi s and ¬xi s that have not yet been included. We complete M
by including βi , ei , fi for 1 ≤ i ≤ (m − 1)n. It is now easily seen that M is
a matching.
• X = {x0 , . . . , xm−1 };
• Y = {y0 , . . . , ym−1 };
• Z = {z0 , . . . , zm−1 }; and
• W = {w0 , . . . , wn−1 } such that each wi ∈ X × Y × Z.
We will construct a weight for each triple, plus two additional weights.
Suppose ⟨xi , yj , zk ⟩ ∈ W . The weight we construct for this triple will be
and
B = C + M.
we don’t expect them to be very long. For example, about 300 bits are
sufficient to encode in binary the estimated number of elementary particles
in the universe. Thus, because there is an algorithm for Part whose running
time is a low-order polynomial in the length of the input and the values
encoded in the input, it seems unreasonable to consider this problem to be
intractable.
In order to accommodate numbers in the input, we say that an algorithm
is pseudopolynomial if its running time is bounded by some polynomial
in the length of the input and the largest integer encoded in the input.
Thus, the O(nW ) algorithm for 0-1 knapsack (and hence partition) is
pseudopolynomial. Whenever the numbers in a decision problem’s input
refer to physical quantities, we consider the problem to be tractable if
it has a pseudopolynomial algorithm. However, if the numbers are purely
mathematical entities (as, for example, in cryptographic applications), we
consider the problem to be tractable only if it belongs to P.
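As a concrete illustration, here is a sketch (in Python) of the O(nW) dynamic program for 0-1 knapsack referred to above; the array-based formulation is one of several possibilities.

    def knapsack_best_value(values, weights, W):
        # best[c] is the largest total value achievable with capacity c
        # using the items considered so far.
        best = [0] * (W + 1)
        for v, w in zip(values, weights):
            for c in range(W, w - 1, -1):   # downward, so each item is used at most once
                best[c] = max(best[c], best[c - w] + v)
        return best[W]

The running time is Θ(nW), which is polynomial in n and in the value W, but may be exponential in the length of W's binary encoding; this is exactly what the definition of a pseudopolynomial algorithm captures.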
We would also like to extend the notion of N P-hardness to account for
numbers in the input. To this end, we first define a way to restrict a decision
problem so that no integer in an instance is too large. Specifically, for a
decision problem X and a function f : N → R≥0 , we define Xf to be the
restriction of X to instances x in which no integer has a value larger than
f (|x|). We then say that X is N P-hard in the strong sense if there is a
polynomial p such that Xp is N P-hard. If, in addition, X ∈ N P, we say
that X is N P-complete in the strong sense.
Suppose we were to find a pseudopolynomial algorithm for a strongly
N P-hard problem. When we restrict the problem so that its instances
have integers bounded by some polynomial, the pseudopolynomial algorithm
becomes truly polynomial, so that the restricted problem would be in P.
Furthermore, this restricted problem is still N P-hard in the ordinary sense.
Thus, by Theorem 16.4, we would have shown that P = N P. It therefore
seems highly unlikely that there is a pseudopolynomial algorithm for any
strongly N P-hard problem.
In order to show a problem to be N P-hard in the strong sense, we
must ensure that the reduction produces numbers whose values are bounded
above by some polynomial in the length of the instance we construct. The
proof of Cook’s Theorem in Section 16.8 does not construct large integers;
hence, Sat is N P-complete in the strong sense. Furthermore, of all of the N P-hardness proofs we have presented so far, only the proof that Part is N P-hard constructs integers whose values are not bounded by some polynomial in the length of the input. As a result, CSat, 3-Sat, VC, IS,
and 3DM are all N P-complete in the strong sense. However, these results
are rather uninteresting because none of their instances contain numbers
that can become large in comparison to the length of the input without
rendering the problem trivial.
In what follows, we will show a problem with potentially large numbers
to be N P-complete in the strong sense. We will use a restricted form of
polynomial many-one reduction motivated by the following theorem.
Theorem 16.16. Let f be a polynomial many-one reduction from problem
X to problem Y, where X is N P-hard in the strong sense. Suppose that f
satisfies the following properties:
(1) there is a polynomial p1 such that p1 (|f (x)|) ≥ |x| for every instance x
of X; and
(2) there is a two-variable polynomial p2 such that each integer constructed
has a value no greater than p2 (|x|, μ(x)), where μ(x) denotes the
maximum value of any integer in x.
Then Y is N P-hard in the strong sense.
Proof. Because X is N P-hard in the strong sense, there is some poly-
nomial p such that Xp is N P-hard. If the reduction is then applied to
Xp , all numbers constructed will have values bounded by p2 (|x|, p(|x|)), by
Property 2. Furthermore, by Property 1, these values are no more than
p2 (p1 (|f (x)|), p(p1 (|f (x)|))), which is a polynomial in the length of the
instance constructed. The reduction from Xp to Y therefore shows that Y
is N P-hard in the strong sense.
We will show that 3DM ≤ppm 4-Part. The reduction will be somewhat
similar to the reduction from 3DM to Part, but we must be careful that
the weights we construct are not too large. Let us describe an instance of
3DM using the same notation as we did for the earlier reduction. We will
assume that each element occurs in at least one triple. Otherwise, there is
no matching, and we can create an instance with seven items having weight
6 and one item having weight 8, so that the total weight is 50, and B = 25.
Clearly, 25/5 < 6 < 8 < 25/3; hence, this is a valid instance, but there is
clearly no way to form a subset with weight 25.
We will construct, for each triple ⟨xi , yj , zk ⟩ ∈ W , four weights: one
weight for each of xi , yj , and zk , plus one weight for the triple itself. Because
each element of X ∪ Y ∪ Z can occur in several triples, we may construct
several items for each element. Exactly one of these will be a matching
item. All non-matching items constructed from the same element will have
the same weight, which will be different from that of the matching item
constructed from that element. We will construct the weights so that in any
4-partition, the item constructed from a triple must be grouped with either
the matching items constructed from the elements of the triple, or three
non-matching items — one corresponding to each element of the triple. In
this way, a 4-partition will exist iff W contains a matching.
As in the previous reduction, it will be convenient to view the weights
in a particular radix r, which we will specify later. In this case, however,
the weights will contain only a few radix-r digits. We will choose r to
be large enough that when we add any four of the weights we construct,
each column of digits will have a sum strictly less than r; hence, we will be
able to deal with each digit position independently in order to satisfy the
various constraints. Note that if we construct the weights so that for every
triple, the sum of the four weights constructed is the same, then this sum
will be B.
We will use the three low-order digits to enforce the constraint that the
four items within any partition must be derived from some triple and its
three components. To this end, we make the following assignments:
The items constructed for xi , yj , and zk receive weights of the form

    r⁴ + r³ + (i + 1),    r⁴ + r³ + (j + 1)r,    r⁴ + r³ + (k + 1)r²,

or of the form

    r⁴ + (i + 1),    r⁴ + (j + 1)r,    r⁴ + 3r³ + (k + 1)r².

One of these two forms is used for matching items and the other for non-matching items; in either case the three weights contribute the same total to each digit position. Furthermore, every weight is at least r⁴, while B, the sum of the four weights constructed for a triple, is less than 4(r⁴ + 4r³) < 5r⁴ when r is sufficiently large, so that every weight is larger than B/5. Furthermore, each weight is less than

    r⁴ + 4r³ < r⁴ + r⁴/3 = 4r⁴/3 < B/3.
Because we will be using this model only for representing an algorithm for
deciding whether (x, φ) ∈ Y, we need exactly two input streams, one for x
and one for φ. Furthermore, we can represent a “yes” output by setting the
output bit to 1.
We will assume that each memory location is addressed by a unique
natural number. Each machine will then have the following instruction set:
• Input(i, l): Stores the next bit from input stream i, where i is either 0
or 1, in memory location l. If all of the input has already been read, the
value 2 is stored.
• Load(n, l): Stores the natural number n at memory location l.
• Copy(l1 , l2 ): Copies the value stored at location l1 into location l2 .
• Goto(p): Changes the value of the program counter to p.
• IfLeq(l1 , l2 , p): If the value at location l1 is less than or equal to the value
at location l2 , changes the value of the program counter to p.
• Add(l1 , l2 ): Adds the value in location l1 to the value in location l2 , saving
the result in location l2 .
• Subtract(l1 , l2 ): Subtracts the value in location l1 from the value in
location l2 , saving the result in location l2 .
• Shift(l): Replaces the value n stored in location l with ⌊n/2⌋.
• Halt(b): Terminates the program with output b, which must be either 0
or 1.
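To make the model concrete, the following Python sketch interprets programs over this instruction set. It handles direct addressing only (the indirect "∗l" operands used elsewhere are not modeled), and the treatment of uninitialized locations as 0 and the truncation of Subtract at 0 are assumptions of the sketch, not specified above.

    def run(program, stream0, stream1):
        # program: list of (opcode, operands) pairs; stream0, stream1: lists of bits.
        mem = {}                                   # memory: address -> natural number
        streams = [list(stream0), list(stream1)]
        pos = [0, 0]                               # next unread position in each stream
        pc = 0                                     # program counter

        def val(l):
            return mem.get(l, 0)                   # assumed: unset locations hold 0

        while True:
            op, args = program[pc]
            pc += 1
            if op == "Input":
                i, l = args
                mem[l] = streams[i][pos[i]] if pos[i] < len(streams[i]) else 2
                pos[i] += 1
            elif op == "Load":
                n, l = args
                mem[l] = n
            elif op == "Copy":
                l1, l2 = args
                mem[l2] = val(l1)
            elif op == "Goto":
                pc = args[0]
            elif op == "IfLeq":
                l1, l2, p = args
                if val(l1) <= val(l2):
                    pc = p
            elif op == "Add":
                l1, l2 = args
                mem[l2] = val(l2) + val(l1)
            elif op == "Subtract":
                l1, l2 = args
                mem[l2] = max(0, val(l2) - val(l1))   # assumed: natural subtraction
            elif op == "Shift":
                mem[args[0]] = val(args[0]) // 2      # floor of n/2
            elif op == "Halt":
                return args[0]                        # output bit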
Add(0, ∗1)
such that
If(y, z) = ¬y ∨ z.
This abbreviation specifies that if y is true, then z must also be true. However,
if y is false, then no constraint is placed upon z. Note that such an expression
can be constructed in O(1) time.
We can extend the above abbreviation to specify an if-then-else
construct:
This specifies that if y is true, then z1 is true, but if not, then z2 is true.
Clearly it can be constructed in O(1) time.
Eq(y, true) = y
and
Because aj , l, and vij are arrays of p(n) elements, this expression can be constructed in O(p²(n)) time.
the value of the program counter at each execution step. We specify this
constraint with the sub-formula,
    F1 = ⋀_{i=0}^{p(n)} ⋀_{j=0}^{P−2} ⋀_{j′=j+1}^{P−1} If(pij , ¬pij′ ).
Because the size of A depends only on the problem X, this sub-formula can
be constructed in O(p(n)) time.
To complete the formula, we need constraints specifying the correct
behavior of M . To this end, we will construct one sub-formula for each
instruction in the program of M . These sub-formulas will depend on the
particular instruction. Let 0 ≤ q < P , where P is the number of instructions
in the program. In what follows, we will describe how the sub-formula Fq is
constructed depending on the instruction at program location q.
Regardless of the specific instruction, the sub-formula will have the same
general form. In each case, Fq must specify that some particular behavior
occurs whenever the program counter has a value of q. Fq will therefore have
the following form:
    Fq = ⋀_{i=1}^{p(n)} If(pi−1,q , ψq (i)),        (16.1)
There are some instances of the above predicates that occur for more
than one type of instruction.
• If the instruction at location q is not an Input instruction, then
Iq (i) = Eq(xi , xi−1 ) ∧ Eq(φi , φi−1 ). (16.3)
• If this instruction is neither a Goto, an IfLeq, nor a Halt, then
Pq (i) = pi,q+1 . (16.4)
• If this instruction is either a Goto or a Halt, then
    Uq (i) = ⋀_{j=1}^{4p(n)} Eq(vij , vi−1,j ),        (16.5)
and
Eq (i) = true. (16.6)
In what follows, we will define the remaining predicates for several of the
possible instructions. We leave the remaining cases as exercises.
Let us first consider an instruction Load(n, l). Because l is the only
memory location that is accessed, we can define
    Eq (i) = ⋁_{j=1}^{4p(n)} Eq(aj , l)

and

    Uq (i) = ⋀_{j=1}^{4p(n)} If(Eq(aj , l), Eq(vij , n), Eq(vij , vi−1,j )).
Note that the above expression specifies that every vij such that aj = l has
its value changed to n.
Let us now compute the time needed to construct the resulting sub-
formula Fq . Because the arrays aj and l each contain p(n) elements, Eq (i, j)
can be constructed in O(p²(n)) time. It is not hard to verify that Uq (i) can
    Eq (i) = ⋁_{j=1}^{4p(n)} Ind(i − 1, j, l),

and

    Uq (i) = ⋀_{j=1}^{4p(n)} If(Ind(i − 1, j, l), Eq(vij , n), Eq(vij , vi−1,j )).
In this case, Eq (i) and Uq (i) can be constructed in O(p³(n)) time, so that Fq can be constructed in O(p⁴(n)) time.
Let us now consider an instruction IfLeq(l1 , l2 , q′ ). Because the memory locations l1 and l2 are referenced, we define

    Eq (i) = ⋁_{j=1}^{4p(n)} Eq(aj , l1 ) ∧ ⋁_{j′=1}^{4p(n)} Eq(aj′ , l2 ).

Because the memory is left unchanged, Uq (i) is as in (16.5):

    Uq (i) = ⋀_{j=1}^{4p(n)} Eq(vij , vi−1,j ).

Pq (i) is a conjunction, over all 1 ≤ j, j′ ≤ 4p(n), of sub-formulas of the form

    If(Eq(aj , l1 ) ∧ Eq(aj′ , l2 ), . . .),

each of which selects piq′ or pi,q+1 according to whether the value stored at l1 is less than or equal to the value stored at l2 . Eq (i) can be constructed in O(p²(n)) time, and both Uq (i) and Pq (i) can be constructed in O(p³(n)) time. Furthermore, Iq (i) as given in (16.3) can be constructed in O(p(n)) time. The total time needed to construct Fq is therefore in O(p⁴(n)).
Finally, let us consider a Halt instruction. For a Halt instruction, we
have already defined Iq (i) (16.3), Uq (i) (16.5), and Eq (i) (16.6). To define
Pq (i), we need to specify that for all i′ > i, each pi′ j is false:

    Pq (i) = ⋀_{i′=i+1}^{p(n)} ⋀_{j=0}^{P−1} ¬pi′ j .
16.9 Summary
The N P-complete problems comprise a large class of decision problems
for which no polynomial-time algorithms are known. Furthermore, if a
polynomial time algorithm were found for any one of these problems, we
would be able to construct polynomial-time algorithms for all of them. For
this reason, along with many others that are beyond the scope of this book,
we tend to believe that none of these problems can be solved in polynomial
time. Note, however, that this conjecture has not been proven. Indeed, this
question — whether P = N P — is the most famous open question in
theoretical computer science.
Proofs of N P-completeness consist of two parts: membership in N P
and N P-hardness. Without knowledge of any N P-complete problems, it is
quite tedious to prove a problem to be N P-hard. However, given one or
more N P-complete problems, the task of proving additional problems to be
N P-hard is greatly eased using polynomial-time many-one reductions.
Some general guidelines for finding a reduction from a known
N P-complete problem to a problem known to be in N P are as follows:
• Look for a known N P-complete problem that has similarities with the
problem in question.
• If all else fails, try reducing from 3-Sat.
16.10 Exercises
Exercise 16.1. Prove that if X, Y, and Z are decision problems such that
X ≤pm Y and Y ≤pm Z, then X ≤pm Z.
* Exercise 16.38. Define the predicates Iq (i), Eq (i), and Uq (i) for the case
in which the instruction at location q is Input(1, ∗l). Show that the resulting
sub-formula Fq can be constructed in O(p⁵(n)) time.
16.11 Notes
N P-completeness was introduced by Cook [23], who proved that Sat and
CSat are N P-complete. Karp [78] then demonstrated the importance of
this topic by proving N P-completeness of 21 problems, including VC, 3DM,
Part, and the problems described in Exercises 16.7, 16.9, 16.10, 16.13, 16.16,
16.18, and 16.23. The original definition of N P was somewhat different from
the one given here — it was based on non-deterministic Turing machines,
rather than on algorithms or RAMs. The definition given in Section 16.1
is based on a definition given by Brassard and Bratley [17]. All of these
definitions are equivalent.
Sat is an example of an N P-complete problem for which practical
algorithms exist. Even though each of these algorithms requires exponential
time in the worst case, they have been used to solve large instances arising
in fields such as software verification and scheduling. For a survey of Sat-
solvers, see Gong and Zhou [58].
The notion of strong N P-completeness was introduced by Garey and
Johnson [53]. They provided the definitions of strong N P-completeness,
pseudopolynomial algorithms, and pseudopolynomial reductions. They had
earlier given N P-completeness proofs for k-Part for k ≥ 3 [51] and for the
problem described in Exercise 16.29 [52]. As it turned out, their reductions
were pseudopolynomial. Their book on N P-completeness [54] is an excellent
resource.
Exercise 16.20 is solved by Lovasz [91]. Exercise 16.21 is solved by
Kirkpatrick and Hell [79]. Exercise 16.22 is solved by van Leeuwen [115].
The solution to Exercise 16.30 is attributed to Perl and Zaks by Garey and
Johnson [54].
Axis and AlliesTM (mentioned in Exercise 16.25) is a registered trade-
mark of Hasbro, Inc.
Chapter 17
Approximation Algorithms
• the time required to obtain a solution for x, excluding any time needed to
solve instances of Y , is bounded above by p(|x|); and
• the values of all variables are bounded above by p(|x|).
17.2 Knapsack
The first problem we will examine is the 0-1 knapsack problem, as defined
in Section 12.4. As is suggested by Exercise 16.18, the associated decision
problem is N P-complete; hence, the optimization problem is N P-hard.
Consider the following greedy strategy for filling the knapsack. Suppose
we take an item whose ratio of value to weight is maximum. If this item won’t
fit, we discard it and solve the remaining problem. Otherwise, we include it in
the knapsack and solve the problem that results from removing this item and
decreasing the capacity by its weight. We have thus reduced the problem to a
smaller instance of itself. Clearly, this strategy results in a set of items whose
total weight does not exceed the weight bound. Furthermore, it is not hard
to implement this strategy in O(n lg n) time, where n is the number of items.
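A Python sketch of this greedy strategy follows; the interface is illustrative, and positive weights are assumed.

    def greedy_knapsack(values, weights, W):
        # Consider items in non-increasing order of value-to-weight ratio,
        # taking each item that still fits in the remaining capacity.
        order = sorted(range(len(values)),
                       key=lambda i: values[i] / weights[i], reverse=True)
        chosen, remaining = [], W
        for i in order:
            if weights[i] <= remaining:
                chosen.append(i)
                remaining -= weights[i]
        return chosen

The sort dominates the running time, giving O(n lg n).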
Because the problem is N P-hard, we would not expect this greedy
strategy to yield an optimal solution in all cases. What we need is a way
to measure how good an approximation to an optimal solution it provides.
In order to motivate an analysis, let us consider a simple example. Consider
the following instance consisting of two items:
The value-to-weight ratios of the two items are 2 and 1, respectively. The
greedy algorithm therefore takes the first item first. Because the second item
will no longer fit, the solution provided by the greedy algorithm consists of
the first item by itself. The value of this solution is 2. However, it is easily
seen that the optimal solution is the second item by itself. This solution has
a value of 10.
A common way of measuring the quality of an approximation is to form
a ratio with the actual value. Specifically, for a maximization problem, we
define the approximation ratio of a given approximation to be the ratio of
the optimal value to the approximation. Thus, the approximation ratio for
the above example is 5. For a minimization problem, we use the reciprocal
of this ratio, so that the approximation ratio is always at least 1. As
the approximation ratio approaches 1, the approximation approaches the
optimal value.
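Expressed as code, the definition is simply the following (a sketch; opt and approx denote the optimal and approximate objective values):

    def approximation_ratio(opt, approx, maximization=True):
        # For a maximization problem the ratio is opt/approx; for a
        # minimization problem it is approx/opt, so it is always at least 1.
        return opt / approx if maximization else approx / opt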
Note that for a minimization problem, the approximation ratio cannot
take a finite value if the optimal value is 0. For this reason, we will restrict
our attention to optimization problems whose optimal solutions always make
S = {i | 1 ≤ i ≤ n, A[i] = true},
then
    ∑_{i∈S} w[i] ≤ W.
Proof. We begin by showing the lower bound. Let ε ∈ R>0 , and without loss of generality, assume ε < 1. We first define the weight bound as

    W = 2⌈4/ε⌉.
The optimal solution clearly consists of the second and third items. This solution has value W . Each iteration of the outer loop of KnapsackApprox yields a solution containing the first item and one of the other two. The solution returned by this algorithm therefore has a value of W/2 + 2. The approximation ratio is therefore

    W/(W/2 + 2) = 2W/(W + 4)
                = 2 − 8/(W + 4)
                = 2 − 8/(2⌈4/ε⌉ + 4)
                ≥ 2 − 8/(8/ε)
                = 2 − ε.
each packing using the greedy algorithm. (If there are fewer than k items,
we simply do an exhaustive search and return the optimal solution.) The
proof is a straightforward generalization of the proof of Theorem 17.3 — the
details are left as an exercise.
It is not hard to see that the algorithm outlined above can be
implemented to return a solution in Θ(nᵏ⁺¹) time. If k is a fixed constant,
the running time is polynomial. We therefore have an infinite sequence of
algorithms, each of which is polynomial, such that if an approximation ratio
of 1 + ε is needed (for some positive ε), then one of these algorithms will
provide such an approximation. Such a sequence of algorithms is called a
polynomial approximation scheme.
Although each of the algorithms in the above sequence is polynomial
in the length of the input, it is somewhat unsatisfying that to achieve an
approximation ratio of 1 + 1/k, a running time in Θ(nᵏ⁺¹) is required. We would
be more satisfied with a running time that is polynomial in both n and k.
More generally, suppose we have an approximation algorithm that takes as
an extra input a natural number k such that for any fixed k, the algorithm
yields an approximation ratio of no more than 1 + 1/k. Suppose further that
this algorithm runs in a time polynomial in k and the length of its input.
We call such an algorithm a fully polynomial approximation scheme.
We can obtain a fully polynomial approximation scheme for the 0-1
knapsack problem using one of the dynamic programming algorithms
suggested in Section 12.4. The algorithm based on recurrence (12.5) on page
399 runs in Θ(nV ) time, where n is the number of items and V is the sum
of their values. We can make V as small as we wish by replacing each value
v by ⌊v/d⌋ for some positive integer d. If some of the values become 0,
we remove these items. Observe that because we don’t change any weights
or the weight bound, any packing for the new instance is a packing for the
original. However, because we take the floor of each v/d, the optimal packing
for the new instance might not be optimal for the original. The smaller we
make d, the better our approximation, but the less efficient our dynamic
programming algorithm.
In order to determine an appropriate value for d, we need to analyze the
approximation ratio of this approximation algorithm. Let S be some optimal
set of items. The optimal value is then
    V ∗ = ∑_{i∈S} vi .
We can clearly compute the scaled values in O(n) time. If v ≥ 2(k + 1)n, the sum of the scaled values is no more than

    nv/d = nv/⌊v/((k + 1)n)⌋
         ≤ nv/((v − (k + 1)n)/((k + 1)n))
         = (k + 1)n²v/(v − (k + 1)n)
         ≤ (k + 1)n²v/(v/2)
         = 2(k + 1)n².
In this case, the dynamic programming algorithm runs in O(kn³) time.
If v < 2(k + 1)n, then d = 1, so that we use the original values. In this case, the sum of the values is no more than

    nv < 2(k + 1)n²,

so that again, the dynamic programming algorithm runs in O(kn³) time. Thus, the total running time of the approximation algorithm is in O(kn³).
Because this running time is polynomial in k and n, and because the
approximation ratio is no more than 1 + 1/k, this algorithm is a fully polynomial
approximation scheme.
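The value-scaling step can be sketched as follows; the formula for d is inferred from the analysis above, so treat it as an assumption of the sketch.

    def scale_values(values, k):
        # Returns scaled values and the divisor d.  Items whose scaled value
        # becomes 0 would then be removed before running the dynamic program
        # on the scaled instance.
        n = len(values)
        v = max(values)                      # the largest item value
        d = v // ((k + 1) * n) if v >= 2 * (k + 1) * n else 1
        return [val // d for val in values], d

With this choice of d, the sum of the scaled values is O(kn²), so the dynamic programming algorithm runs in O(kn³) time, as computed above.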
Theorem 17.4. Let p(x, y) be an integer-valued polynomial, and let X be an optimization problem whose optimal value on any input x is a natural number bounded above by p(|x|, μ(x)). (Recall that |x| denotes the number of bits in the encoding of x and μ(x) denotes the maximum value of any integer encoded within x.) If there is a fully polynomial approximation scheme for X, then there is a pseudopolynomial algorithm for obtaining an optimal solution for X.
Proof. The pseudopolynomial algorithm operates as follows. Given an
input x, it first computes k = p(|x|, μ(x)). It then uses the fully polynomial
approximation scheme to approximate a solution with an approximation
ratio bounded by 1 + k1 . Let V be the value of the approximation, and let
V ∗ be the value of an optimal solution. If the problem is a minimization
problem, we have
    V /V ∗ ≤ 1 + 1/k,
    V ≤ V ∗ + V ∗ /k,
    V − V ∗ ≤ V ∗ /k < 1.
Because both V and V ∗ are natural numbers and V ≥ V ∗ , we conclude that
V = V ∗ . Furthermore, because the fully polynomial approximation scheme
runs in time polynomial in |x| and p(|x|, μ(x)), it is a pseudopolynomial
algorithm.
An analogous argument applies to maximization problems.
Because the minimum number of bins needed is clearly no more than
the length of the input to the bin packing problem, Theorem 17.4 applies to
this problem. Indeed, the condition that the optimal solution is bounded by
a polynomial in the length of the input and the largest integer in the input
holds for most optimization problems. In these cases, if the given problem
is strongly N P-hard (as is bin packing), there can be no fully polynomial
approximation scheme unless P = N P.
If we cannot obtain a fully polynomial approximation scheme for bin
packing, we might still hope to find a polynomial approximation scheme.
However, the theory of N P-hardness tells us that this is also unlikely. In
particular, for a fixed positive integer k, let k-BP denote the problem of
deciding whether, for a given instance of bin packing, there is a solution
using at most k bins. It is easily seen that Part ≤pm 2-BP, so that 2-BP is
N P-hard. Now for a fixed positive real number ε, let ε-ApproxBP be the problem of approximating a solution to a given instance of bin packing with an approximation ratio of no more than 1 + ε. We will now show that for any ε < 1/2, 2-BP ≤pT ε-ApproxBP, so that ε-ApproxBP is N P-hard. As a result, there can be no polynomial approximation scheme for bin packing unless P = N P.
Theorem 17.5. For 0 < ε < 1/2, ε-ApproxBP is N P-hard.
Proof. As we noted above, we will show that 2-BP ≤pT ε-ApproxBP. Given an instance of 2-BP, we first find an approximate solution with approximation ratio at most 1 + ε. If the approximate solution uses no more than 2 bins, then we can answer “yes”. If the approximate solution uses 3 or more bins, then the optimal solution uses at least

    3/(1 + ε) > 3/(3/2) = 2

bins. We can therefore answer “no”.
Ignoring the time needed to compute the approximation, this algorithm runs in Θ(1) time. Therefore, 2-BP ≤pT ε-ApproxBP, and ε-ApproxBP is N P-hard.
    ∑_{j∈S} wj ≤ W.
BinPackingFF(W , w[1..n])
  // Consider the items in the given order, placing each one in the first
  // bin with enough remaining capacity (slack); open a new bin if none fits.
  B ← new Array[1..n]; slack ← new Array[1..n]; numBins ← 0
  for i ← 1 to n
    j ← 1
    while j ≤ numBins and w[i] > slack[j]
      j ← j + 1
    if j > numBins
      numBins ← numBins + 1; B[j] ← new ConsList(); slack[j] ← W
    B[j] ← new ConsList(i, B[j]); slack[j] ← slack[j] − w[i]
  return B[1..numBins]
is placed, it cannot increase the number of bins that are no more than half
full. Suppose w[i] ≤ W/2. Then if there is a bin that is no more than half
full, w[i] will fit into this bin. Thus, the only case in which the number of
bins that are no more than half full increases is if there are no bins that are
no more than half full. In this case, the number cannot be increased to more
than one.
We conclude that the packing returned by this algorithm has at most
one bin that is no more than half full. Suppose this packing consists of k
bins. The total weight must therefore be strictly larger than (k − 1)W/2.
The optimal packing must therefore contain more than (k − 1)/2 bins. Thus,
the number of bins in the optimal packing is at least
    ⌊(k − 1)/2⌋ + 1 = ⌊(k + 1)/2⌋ ≥ k/2.
Proof. Let ε > 0, and let HC be the problem of deciding whether a given undirected graph G contains a Hamiltonian cycle. By Exercise 16.9, HC is N P-complete. Since there are no integers in the problem instance, it is strongly N P-complete. We will show that HC ≤ppT ε-ApproxTSP, where ≤ppT denotes a pseudopolynomial Turing reduction. It will then follow that ε-ApproxTSP is N P-hard in the strong sense.
Let G = (V, E) be an undirected graph. We first construct a complete undirected graph G′ = (V, E′ ). Let k = ⌈ε⌉ + 2. We define the weight of an edge e ∈ E′ as follows:
• If e ∈ E, then the weight of e is 1.
• If e ∉ E, then the weight of e is nk, where n is the size of V.
Note that because k is a fixed constant, the weights are bounded by a
polynomial in the size of G.
We now show how we can use an approximation of a minimum-weight Hamiltonian cycle in G′ to decide whether G has a Hamiltonian cycle. Suppose we can obtain an approximation with an approximation ratio of no more than 1 + ε. If the weight of this approximation is n, then the
corresponding Hamiltonian cycle must contain only edges with weight 1;
hence, it is a Hamiltonian cycle in G, so we can answer “yes”. Otherwise,
the approximation contains at least one edge with weight nk, and n > 0.
The weight of the approximation is therefore at least nk + n − 1. Because the
approximation ratio is no more than 1 + ε, the minimum-weight Hamiltonian cycle has a weight of at least

    (nk + n − 1)/(1 + ε) ≥ (n(ε + 2) + n − 1)/(1 + ε) > n(1 + ε)/(1 + ε) = n.
Hence, there is no Hamiltonian cycle whose edge weights are all 1. Because
this implies that G contains no Hamiltonian cycle, we can answer “no”.
The running time for this algorithm, excluding any time needed to
compute the approximation, is linear in the size of G. Furthermore, all
integers constructed have values polynomial in the size of G. We therefore
conclude that ε-ApproxTSP is N P-hard in the strong sense.
MetricTspSearcher(n)
pre ← new VisitCounter(n); order ← new Array[0..n − 1]
MetricTspSearcher.PreProc(i)
pre.Visit(i); order[pre.Num(i)] ← i
Figure 17.5 Approximation algorithm for the metric traveling salesperson problem
Proof. For a given vertex i, let Wi denote the sum of the weights of all
edges {i, j} such that 0 ≤ j < i. At the end of iteration i, the value of the
cut increases by Wi − clusterInc[m], where clusterInc[m] is the sum of the
weights of the edges from i to other vertices in partition m. m is chosen so
that clusterInc[m] is minimized; hence, for each partition other than m, the
sum of the weights of the edges from i to vertices in that partition is at least
clusterInc[m]. We therefore have
clusterInc[m] ≤ Wi /k.
The value of the cut therefore increases by at least Wi (k − 1)/k on
iteration i. Because the value of the cut is initially 0, the final value of the
cut is at least
    ∑_{i=0}^{n−1} Wi (k − 1)/k = ((k − 1)/k) ∑_{i=0}^{n−1} Wi = ((k − 1)/k)W,
where W is the sum of all edge weights in G. Clearly, the maximum cut can be no more than W . The approximation ratio is therefore bounded above by

    W/((k − 1)W/k) = k/(k − 1) = 1 + 1/(k − 1).
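A Python sketch of this greedy algorithm follows; the weight interface is an assumption of the sketch.

    def greedy_max_k_cut(n, k, weight):
        # weight(i, j) returns the weight of edge {i, j}.  Each vertex is
        # placed in the cluster to which its edges back to already-placed
        # vertices have the smallest total weight, as in the argument above.
        cluster = [0] * n
        for i in range(n):
            inc = [0] * k                      # clusterInc[m] for this vertex
            for j in range(i):
                inc[cluster[j]] += weight(i, j)
            cluster[i] = min(range(k), key=lambda m: inc[m])
        return cluster

For a complete graph this runs in Θ(n²) time (assuming k ≤ n).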
(Figure: a four-vertex graph with edges of weight 1 and weight x, comparing a maximum cut with the cut produced by the approximation algorithm.)
vertices in the same cluster via edges whose weights are all x. Thus, the
approximation ratio is x, which can be chosen to be arbitrarily large. We
can therefore see that even though the maximum cut and minimum cluster
optimization problems are essentially the same, the MaxCut algorithm
yields vastly different approximation ratios relative to the two problems.
To carry this idea a step further, we will now show that the minimum
cluster problem has no approximation algorithm with a bounded approxi-
mation ratio unless P = N P. For a given ε ∈ R>0 and integer k ≥ 2, let the ε-Approx-k-Cluster problem be the problem of finding, for a given complete undirected graph G with positive integer edge weights, a k-cut whose sum of cluster weights is at most W ∗ (1 + ε), where W ∗ is the minimum sum of cluster weights. Likewise, let ε-Approx-Cluster be the corresponding problem with k provided as an input. We will show that for every positive ε and every integer k ≥ 3, the ε-Approx-k-Cluster problem is N P-hard in the strong sense. Because ε-Approx-3-Cluster ≤ppT ε-Approx-Cluster, it will then follow that this latter problem is also N P-hard in the strong sense. Whether the result extends to ε-Approx-2-Cluster is unknown at the time of this writing.
Theorem 17.9. For every ε ∈ R>0 and every k ≥ 3, ε-Approx-k-Cluster is N P-hard in the strong sense.
Proof. As is suggested by Exercises 16.23 and 16.24, the problem of
deciding whether a given undirected graph is k-colorable is N P-complete for
each k ≥ 3. Let us refer to this problem as k-Col. Because k-Col contains
no large integers, it is N P-complete in the strong sense. We will now show
that k-Col ≤ppT ε-Approx-k-Cluster, so that ε-Approx-k-Cluster is
N P-hard in the strong sense for k ≥ 3.
Let G = (V, E) be a given undirected graph. Let n be the number
of vertices in G. We can assume without loss of generality that n > k,
for otherwise G is clearly k-colorable. We construct G′ = (V, E′ ) to be the complete graph on V. We assign an edge weight of n²⌈1 + ε⌉ to edge {u, v} ∈ E′ if {u, v} ∈ E; otherwise, we assign it a weight of 1. Clearly, this construction can be completed in time polynomial in n. Furthermore, because ε is a fixed constant, all integers have values polynomial in n.
Suppose we have a k-cut of G′ such that the ratio of its cluster weight to the minimum cluster weight is at most 1 + ε. If the given cluster weight is less than n²⌈1 + ε⌉, then all edges connecting vertices in the same cluster must have weight less than n²⌈1 + ε⌉; hence none of them belong to E.
17.6 Summary
Using Turing reducibility, we can extend the definition of N P-hardness from
Chapter 16 to apply to problems other than decision problems in a natural
way. We can then identify certain optimization problems as being N P-hard,
either in the strong sense or the ordinary sense. One way of coping with
N P-hard optimization problems is by using approximation algorithms.
For some N P-hard optimization problems we can find polynomial
approximation schemes, which take as input an instance x of the problem and a positive real number ε and return, in time polynomial in |x|, an approximate solution with approximation ratio no more than 1 + ε. If this algorithm runs in time polynomial in |x| and 1/ε, it is called a fully polynomial approximation scheme.
However, Theorem 17.4 tells us that for most optimization problems, if
the problem admits a fully polynomial approximation scheme, then there
is a pseudopolynomial algorithm to solve the problem exactly. As a result,
we can use strong N P-hardness to show for a number of problems that
unless P = N P, that problem cannot have a fully polynomial approximation
scheme. Furthermore, by showing N P-hardness of certain approximation
problems, we can show that unless P = N P, the corresponding optimiza-
tion problem has no approximation algorithm with approximation ratio
bounded by some — or in some cases any — given value.
Finally, there are some pairs of optimization problems, such as the maxi-
mum cut problem and the minimum cluster problem, that are essentially the
same problem, but which yield vastly different results concerning approximation algorithms. For example, the maximum k-cut can be approximated in Θ(n²) time with an approximation ratio of no more than 1 + 1/(k − 1); however, unless P = N P, there is no polynomial-time algorithm with any bounded approximation ratio for finding the minimum weight of clusters formed by a k-cut if k ≥ 3.
17.7 Exercises
Exercise 17.1. Give an approximation algorithm that takes an instance of
the knapsack problem and a positive integer k and returns a packing with
an approximation ratio of no more than 1 + 1/k. Your algorithm must run in O(nᵏ⁺¹) time. Prove that both of these bounds (approximation ratio and
running time) are met by your algorithm.
Exercise 17.3. The best-fit algorithm for bin packing considers the items
in the given order, always choosing the largest-weight bin in which the item
will fit. Show that the approximation ratio for this algorithm is no more
than 2.
* Exercise 17.5.
* Exercise 17.7.
17.8 Notes
The concept of a polynomial-time approximation algorithm was first formal-
ized by Garey et al. [50] and Johnson [73]. In fact, much of the foundational
work in this area is due to Garey and Johnson — see their text [54] for a
summary of the early work. For example, they proved Theorem 17.4 [53]. A
detailed analysis of bin packing, including an (11/9)B∗ + 4 upper bound on the approximation ratio for first-fit decreasing, is given by Johnson [72]. This bound was later improved to (11/9)B∗ + 2/3 by Dósa [31], who showed this bound to be tight. The (17/10)B∗ upper bound for the first-fit algorithm is due to Garey et al. [49], and a close relationship between best-fit and first-fit was established by Johnson et al. [71]. Dósa and Sgall [32] later showed that the upper bound for best-fit is also (17/10)B∗.
The polynomial approximation scheme suggested by Exercise 17.1 for the
knapsack problem is due to Sahni [102]. The fully polynomial approximation
scheme of Section 17.2 is due to Ibarra and Kim [69]. Theorem 17.7 was
shown by Sahni and Gonzalez [101].
Bibliography
[88] J. B. Kruskal, Jr. On the shortest spanning subtree of a graph and the traveling
salesman problem. Proceedings of the American Mathematical Society, 7:48–50, 1956.
[89] E. L. Lawler. Combinatorial Optimization: Networks and Matroids. Holt, Rinehart,
and Winston, 1976.
[90] D. A. Lelewer and D. S. Hirschberg. Data compression. ACM Computing Surveys,
19:261–296, 1987.
[91] L. Lovasz. Coverings and colorings of hypergraphs. In Proceedings of the 4th
Southeastern Conference on Combinatorics, Graph Theory, and Computing, pp. 3–
12. Utilitas Mathematica Publishing, 1973.
[92] E. Lucas. Récréations Mathématiques, volume 1. Gauthier-Villars, 1883.
[93] S. Micali and V. V. Vazirani. An O(√|V | · |E|) algorithm for finding maximal
matchings in general graphs. In Proceedings of the 21st Annual IEEE Symposium
on Foundations of Computer Science, pp. 17–27, 1980.
[94] S. S. Muchnick. Advanced Compiler Design & Implementation. Morgan Kaufmann,
1997.
[95] D. R. Musser. Introspective sorting and selection algorithms. Software: Practice and
Experience, 27:983–993, 1997.
[96] C. H. Papadimitriou and K. Steiglitz. Combinatorial Optimization: Algorithms
and Complexity. Prentice-Hall, 1982.
[97] L. C. Paulson. ML for the Working Programmer, 2nd edition. Cambridge University
Press, 1996.
[98] O. R. L. Peters. Pattern-defeating quicksort. CoRR, 2021. https://arxiv.org/abs/
2106.05123.
[99] R. C. Prim. Shortest connection networks and some generalizations. Bell System
Technical Journal, 36:1389–1401, 1957.
[100] W. Pugh. Skip lists: A probabilistic alternative to balanced trees. Communications
of the ACM, 33:668–676, 1990.
[101] S. Sahni and T. Gonzalez. P-complete approximation problems. Journal of the
Association for Computing Machinery, 23:555–565, 1976.
[102] S. Sahni. Approximate algorithms for the 0/1 knapsack problem. Journal of the
Association for Computing Machinery, 22:115–124, 1975.
[103] K. Sayood. Introduction to Data Compression, 5th edition. Morgan Kaufmann
Publishers, 2018.
[104] A. Schönhage and V. Strassen. Schnelle multiplikation grosser zahlen. Computing,
7:281–292, 1971.
[105] M. Sharir. A strong-connectivity algorithm and its applications in data flow analysis.
Computers and Mathematics with Applications, 7:67–72, 1981.
[106] E. Silberstang. The Winner’s Guide to Casino Gambling, 4th edition. Henry Holt
and Company, LLC, 2005.
[107] D. D. Sleator and R. E. Tarjan. A data structure for dynamic trees. Journal of
Computer and System Sciences, 26:362–391, 1983.
[108] D. D. Sleator and R. E. Tarjan. Self-adjusting heaps. SIAM Journal on Computing,
15:52–69, 1986.
[109] V. Strassen. Gaussian elimination is not optimal. Numerische Mathematik, 13:354–
356, 1969.
[110] R. E. Tarjan. Depth-first search and linear graph algorithms. SIAM Journal on
Computing, 1:146–160, 1972.
[111] R. E. Tarjan. On the efficiency of a good but not linear set merging algorithm.
Journal of the ACM, 22:215–225, 1975.
[112] R. E. Tarjan. Data Structures and Network Algorithms. Society for Industrial and
Applied Mathematics, 1983.
[113] R. E. Tarjan. Amortized computational complexity. SIAM Journal on Algebraic and
Discrete Methods, 6:306–318, 1985.
[114] A. H. Taub (ed). John Von Neumann: Collected Works, volume 5, Pergamon Press,
1961.
[115] J. van Leeuwen. Having a Grundy-numbering is NP-complete. Report 207,
Pennsylvania State University, University Park, PA, 1976.
[116] J. Vuillemin. A data structure for manipulating priority queues. Communications
of the ACM, 21:309–315, 1978.
[117] R. A. Wagner and M. J. Fischer. The string-to-string correction problem. Journal
of the ACM, 21:168–173, 1974.
[118] S. Warshall. A theorem on Boolean matrices. Journal of the ACM, 9:11–12, 1962.
[119] E. W. Weisstein. Collatz problem. From MathWorld — A Wolfram Web Resource.
http://mathworld.wolfram.com/CollatzProblem.html.
[120] H. Whitney. On the abstract properties of linear dependence. American Journal of
Mathematics, 57:509–533, 1935.
[121] J. W. J. Williams. Algorithm 232: Heapsort. Communications of the ACM, 7:347–
348, 1964.
[122] N. Wirth. Program development by stepwise refinement. Communications of the
ACM, 14:221–227, 1971.
[123] J. W. Wright. The change-making problem. Journal of the ACM, 22:125–128, 1975.
[124] A. C.-C. Yao. An O(|E| log log |V |) algorithm for finding minimum spanning trees.
Information Processing Letters, 4:21–23, 1975.
[125] F. F. Yao. Efficient dynamic programming using quadrangle inequalities. In
Proceedings of the 12th Annual ACM Symposium on Theory of Computing, pp. 429–
435, 1980.
Index
L
L'Hôpital's rule, 93
Landis, E. M., 208, 244
Lawler, E. L., 387
Lelewer, D. A., 388
lg, 70
limit, 92
linearity of expectation, 178
linked list, 131
little-oh, 89
little-omega, 89
load factor, see hashing, load factor, 254
logarithm
  base e, 94
  base 2, 70
  natural, 94
Lovasz, L., 553
Lucas, E., 439
Luhn, H. P., 288

M
Malinowski, A., 195
Markov's Inequality, 194
matching
  bipartite, 451–460
matrix multiplication
  chained, 393–395
Mauchly, J. W., 370
maximum cut problem, 572–577

N
natural number (Nat), 4–5
network flow, 441–451, 460
  augmenting path, 443
  residual network, 444
Newton's method, 356–363
nondestructive updates, 126
N P-complete problem, 511
  strong sense, 529
N P-hard problem, 511
  non-decision problem, 556
  strong sense, 529
null path length, 159
Number, 5

O
object-oriented design, 24
object-oriented programming, 24
Ofman, Y., 370
Olderog, E.-R., 54
optimization problem, 371

P
palindrome, 22, 402
Papadimitriou, C. H., 387
partition problem, 526–533
Patashnik, O., 105
Paulson, L. C., 147
Perl, Y., 553
Peters, O. R. L., 370