Lecture4 2024
http://algs4.cs.princeton.edu
1.4 ANALYSIS OF ALGORITHMS
‣ introduction
‣ observations
‣ mathematical models
‣ order-of-growth classifications
‣ theory of algorithms
‣ memory
Running time
Analytic Engine
Cast of characters
Theoretician wants to understand.
Reasons to analyze algorithms
Predict performance.
Provide guarantees.
Some algorithmic successes
N-body simulation.
・Simulate gravitational interactions among N bodies.
・Brute force: $N^2$ steps.
The challenge
Scientific method.
・Observe some feature of the natural world.
・Hypothesize a model that is consistent with the observations.
・Predict events using the hypothesis.
・Verify the predictions by making further observations.
・Validate by repeating until the hypothesis and observations agree.
Principles.
・Experiments must be reproducible.
・Hypotheses must be falsifiable.
3-SUM. Given N distinct integers, how many triples sum to exactly zero?
Input file 8ints.txt:
8
30 -40 -20 -10 40 0 10 5

% java ThreeSum 8ints.txt
4

     a[i]  a[j]  a[k]  sum
1     30   -40    10     0
2     30   -20   -10     0
3    -40    40     0     0
4    -10     0    10     0
Measuring the running time
Run the program for various input sizes and measure running time.
Empirical analysis
Run the program for various input sizes and measure running time.
N time (seconds) †
250 0
500 0
1,000 0.1
2,000 0.8
4,000 6.4
8,000 51.1
16,000 ?
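One way to collect measurements like those in the table is to wrap the brute-force count in a wall-clock timer. The sketch below is only an illustration, not the course's own harness: the class name `ThreeSumTimer` and the helper `timeSeconds` are hypothetical, the inputs are pseudo-random (and not guaranteed distinct), and the times it prints will differ from machine to machine.

```java
import java.util.Random;

// Hypothetical timing harness: names are illustrative, not from the lecture.
class ThreeSumTimer {

    // Brute-force 3-SUM: count triples i < j < k with a[i] + a[j] + a[k] == 0.
    static int count(int[] a) {
        int n = a.length;
        int count = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                for (int k = j + 1; k < n; k++)
                    if (a[i] + a[j] + a[k] == 0)
                        count++;
        return count;
    }

    // Elapsed wall-clock time, in seconds, of one call to count(a).
    static double timeSeconds(int[] a) {
        long start = System.nanoTime();
        count(a);
        return (System.nanoTime() - start) / 1e9;
    }

    public static void main(String[] args) {
        Random random = new Random(42);  // fixed seed so runs are repeatable
        for (int n = 250; n <= 2000; n += n) {  // double the input size each round
            int[] a = new int[n];
            for (int i = 0; i < n; i++)
                a[i] = random.nextInt(2_000_000) - 1_000_000;
            System.out.printf("%6d %8.3f%n", n, timeSeconds(a));
        }
    }
}
```

Doubling N on each round keeps the experiment short while still exposing how quickly the running time grows.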
Data analysis
Log-log plot. Plot running time $T(N)$ vs. input size $N$ using log-log scale.
$\lg T(N) = b \lg N + c$, with $b = 2.999$ and $c = -33.2103$
$T(N) = a N^b$, where $a = 2^c$ (power law)
Observations.
N time (seconds) †
8,000 51.1
8,000 51
8,000 51.1
16,000 410.8
validates hypothesis!
Doubling hypothesis
N        time (seconds)   ratio   lg ratio
250      0                –
8,000    51.1             8       3
N        time (seconds) †
8,000    51.1
8,000    51
8,000    51.1
$51.1 = a \times 8000^3$
$\Rightarrow a = 0.998 \times 10^{-10}$
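The two steps of the doubling hypothesis (estimate b from the ratio of running times, then solve for a) can be sketched in a few lines. The class and method names below are hypothetical; the inputs are the measurements 6.4 s at N = 4,000 and 51.1 s at N = 8,000 from the table above.

```java
// Hypothetical helper (names are illustrative): fit T(N) = a * N^b from two
// measurements taken at N and 2N, as in the doubling hypothesis.
class PowerLawFit {

    // Exponent b = lg(T(2N) / T(N)).
    static double exponent(double timeN, double time2N) {
        return Math.log(time2N / timeN) / Math.log(2);
    }

    // Coefficient a = T(N) / N^b.
    static double coefficient(double n, double timeN, double b) {
        return timeN / Math.pow(n, b);
    }

    public static void main(String[] args) {
        // Measurements from the table: 6.4 s at N = 4,000 and 51.1 s at N = 8,000.
        double b = exponent(6.4, 51.1);
        double a = coefficient(8000, 51.1, 3.0);   // round b to 3 before solving for a
        System.out.printf("b = %.3f, a = %.3e%n", b, a);
        System.out.printf("predicted T(16000) = %.1f s%n", a * Math.pow(16000, 3.0));
    }
}
```

With b = 3, doubling N multiplies the running time by 8, so the prediction for N = 16,000 is 51.1 × 8 = 408.8 seconds, close to the observed 410.8.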
Empirical analysis – What could be T(N)?
Run the program for various input sizes and measure running time.

public static long play(int N) {
    long sum = 0L;
    for (int i = 1; i <= N; i++) {
        for (int j = 1; j <= N; j++)
            sum++;
    }
    return sum;
}

N          time (seconds) †
4,000      0.016
8,000      0.062
16,000     0.185
32,000     0.733
64,000     2.955
75,000     3.974
100,000    ?
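One possible reading of this table, offered as a sketch rather than as the lecture's stated answer, applies the doubling hypothesis. Most doubling ratios are close to 4 (the 8,000 to 16,000 step, about 3.0, is an outlier at these very short running times), which suggests b = 2, consistent with the doubly nested loop:

```latex
\[
\frac{T(8000)}{T(4000)} = \frac{0.062}{0.016} \approx 3.9, \qquad
\frac{T(32000)}{T(16000)} = \frac{0.733}{0.185} \approx 4.0, \qquad
\frac{T(64000)}{T(32000)} = \frac{2.955}{0.733} \approx 4.0
\]
\[
b \approx \lg 4 = 2, \qquad
a \approx \frac{2.955}{64000^{2}} \approx 7.2 \times 10^{-10}, \qquad
T(100000) \approx a \cdot 100000^{2} \approx 7.2 \text{ seconds}
\]
```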
Mathematical models for running time
Donald Knuth
1974 Turing Award
Cost of basic operations
assignment statement    a = b    $c_2$
int count = 0;
for (int i = 0; i < N; i++)
    if (a[i] == 0)          // N array accesses
        count++;

operation               frequency
variable declaration    2
assignment statement    2
equal to compare        N
array access            N
increment               N to 2 N
Example: 2-SUM
int count = 0;
for (int i = 0; i < N; i++)
    for (int j = i+1; j < N; j++)
        if (a[i] + a[j] == 0)
            count++;
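Pf. [N even] Proof by picture: the number of pairs is half of the square minus half of the diagonal, that is, $\tfrac{1}{2} N^2 - \tfrac{1}{2} N = \tfrac{1}{2} N (N - 1)$.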
String theory infinite sum
http://www.nytimes.com/2014/02/04/science/in-the-end-it-all-adds-up-to.html
Example: 2-SUM
int count = 0;
for (int i = 0; i < N; i++)
    for (int j = i+1; j < N; j++)
        if (a[i] + a[j] == 0)
            count++;

operation           frequency
equal to compare    ½ N (N − 1)
array access        N (N − 1)
increment           ½ N (N − 1) to N (N − 1)
(tedious to count exactly)
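The frequencies in the table can be checked empirically by instrumenting the loop. This is a sketch for illustration only: the class `TwoSumCounts` and its counter fields are hypothetical additions, not part of the lecture's code.

```java
// Hypothetical instrumented version of the 2-SUM loop: the counters are
// added for illustration and are not part of the lecture's code.
class TwoSumCounts {
    long compares;       // number of "equal to" compares
    long arrayAccesses;  // number of array reads

    int count(int[] a) {
        int n = a.length;
        int count = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++) {
                arrayAccesses += 2;   // a[i] and a[j]
                compares++;           // a[i] + a[j] == 0
                if (a[i] + a[j] == 0)
                    count++;
            }
        return count;
    }

    public static void main(String[] args) {
        int n = 1000;
        TwoSumCounts counter = new TwoSumCounts();
        counter.count(new int[n]);    // contents do not affect these two counters
        System.out.println(counter.compares == (long) n * (n - 1) / 2);   // true
        System.out.println(counter.arrayAccesses == (long) n * (n - 1));  // true
    }
}
```

The number of increments of `count` is the one entry that depends on the input, which is why the table gives it as a range.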
Simplifying the calculations
Simplification 1: cost model
Cost model. Use some basic operation as a proxy for running time.
int count = 0;
for (int i = 0; i < N; i++)
    for (int j = i+1; j < N; j++)
        if (a[i] + a[j] == 0)
            count++;

operation           frequency
equal to compare    ½ N (N − 1)
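Simplification 2: tilde notation
Ex 1. $\tfrac{1}{6} N^3 + 20 N + 16 \sim \tfrac{1}{6} N^3$
Ex 2. $\tfrac{1}{6} N^3 + 100 N^{4/3} + 56 \sim \tfrac{1}{6} N^3$
Ex 3. $\tfrac{1}{6} N^3 - \tfrac{1}{2} N^2 + \tfrac{1}{3} N \sim \tfrac{1}{6} N^3$

operation    frequency                   tilde notation
increment    ½ N (N − 1) to N (N − 1)    $\sim \tfrac{1}{2} N^2$ to $\sim N^2$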
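For reference, the standard technical definition behind the tilde, together with a check of Ex 1, can be written as follows. The definition is stated here as background and does not appear in the text above.

```latex
\[
f(N) \sim g(N) \quad\text{means}\quad \lim_{N \to \infty} \frac{f(N)}{g(N)} = 1
\]
\[
\frac{\tfrac{1}{6} N^{3} + 20N + 16}{\tfrac{1}{6} N^{3}}
  = 1 + \frac{120}{N^{2}} + \frac{96}{N^{3}} \;\longrightarrow\; 1
\]
```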
Example: 2-SUM
int count = 0;
for (int i = 0; i < N; i++)
    for (int j = i+1; j < N; j++)      // "inner loop"
        if (a[i] + a[j] == 0)
            count++;
A. $\sim N^2$ array accesses.
Bottom line. Use cost model and tilde notation to simplify counts.
Example: 3-SUM
int count = 0;
for (int i = 0; i < N; i++)
    for (int j = i+1; j < N; j++)
        for (int k = j+1; k < N; k++)      // "inner loop"
            if (a[i] + a[j] + a[k] == 0)
                count++;
A. $\sim \tfrac{1}{2} N^3$ array accesses.
Bottom line. Use cost model and tilde notation to simplify counts.
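The answer follows from counting triples: the inner `if` runs once per triple i < j < k and reads three array entries each time. A short derivation, as a sketch:

```latex
\[
\binom{N}{3} = \frac{N(N-1)(N-2)}{6} \sim \tfrac{1}{6} N^{3}
\quad\text{triples}, \qquad
3 \cdot \tfrac{1}{6} N^{3} = \tfrac{1}{2} N^{3}
\quad\text{array accesses}
\]
```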
Diversion: estimating a discrete sum
Ex 1. $1 + 2 + \dots + N$.
Ex 2. $1^k + 2^k + \dots + N^k$.
Estimating a discrete sum
Ex 4. 1 + ½ + ¼ + ⅛ + …
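A common way to estimate such sums, sketched here on the assumption that this is the intended technique, is to replace the sum with an integral (Ex 1 and Ex 2, for fixed k ≥ 0). Ex 4 is a geometric series and can be summed exactly:

```latex
\[
\sum_{i=1}^{N} i \;\sim\; \int_{1}^{N} x \, dx \;\sim\; \tfrac{1}{2} N^{2}
\]
\[
\sum_{i=1}^{N} i^{k} \;\sim\; \int_{1}^{N} x^{k} \, dx \;\sim\; \frac{1}{k+1} N^{k+1}
\]
\[
\sum_{i=0}^{\infty} \left(\tfrac{1}{2}\right)^{i} = \frac{1}{1 - \tfrac{1}{2}} = 2
\]
```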
wolframalpha.com
$\dfrac{N (N - 1) (N - 2)}{6}$
Mathematical models for running time
In practice,
・Formulas can be complicated.
・Advanced mathematics might be required.
・Exact models best left for experts.
$T_N = c_1 A + c_2 B + c_3 C + c_4 D + c_5 E$
costs $c_1, \dots, c_5$ (depend on machine, compiler)
frequencies $A, \dots, E$ (depend on algorithm, input):
A = array access
B = integer add
C = integer compare
D = increment
E = variable assignment
Definition. If $f(N) \sim c \, g(N)$ for some constant $c > 0$, then the order of growth of $f(N)$ is $g(N)$.
・Ignores leading coefficient.
・Ignores lower-order terms.
Ex. The order of growth of the running time of this code is $N^3$.
int count = 0;
for (int i = 0; i < N; i++)
    for (int j = i+1; j < N; j++)
        for (int k = j+1; k < N; k++)
            if (a[i] + a[j] + a[k] == 0)
                count++;
Common order-of-growth classifications
order of growth | name | typical code framework | description | example | T(2N) / T(N)
1 | constant | a = b + c; | statement | add two numbers | 1
log N | logarithmic | while (N > 1) { N = N / 2; ... } | divide in half | binary search | ~1
N log N | linearithmic | [see mergesort lecture] | divide and conquer | mergesort | ~2
Binary search demo
Goal. Given a sorted array and a key, find the index of the key in the array.
value   6  13  14  25  33  43  51  53  64  72  84  93  95  96  97
index   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14
        p                                                       r
Binary search: Java implementation
Trivial to implement?
・First binary search published in 1946.
・First bug-free one in 1962.
・Bug in Java's Arrays.binarySearch() discovered in 2006.
public static int binarySearch(int[] a, int key)
{
    int p = 0, r = a.length-1;
    while (p <= r)
    {
        int mid = p + (r - p) / 2;
        if      (key < a[mid]) r = mid - 1;    // one "3-way compare"
        else if (key > a[mid]) p = mid + 1;
        else return mid;                       // key found
    }
    return -1;                                 // key not in the array
}
Binary search recurrence. $T(N) \le T(N/2) + 1$ for $N > 1$, with $T(1) = 1$.
Unrolling the recurrence (for $N$ a power of 2) gives $T(N) \le 1 + \lg N$.
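The bound of 1 + lg N compares can be checked on the 15-element array from the demo. The sketch below is illustrative only: the class name and the `compares` counter are hypothetical, and the search follows the p/r formulation used above.

```java
// Hypothetical instrumented binary search: the compare counter is added for
// illustration; the search itself follows the lecture's p/r formulation.
class BinarySearchCompares {
    static int compares;  // number of 3-way compares in the most recent search

    static int binarySearch(int[] a, int key) {
        compares = 0;
        int p = 0, r = a.length - 1;
        while (p <= r) {
            int mid = p + (r - p) / 2;
            compares++;                       // one 3-way compare per iteration
            if      (key < a[mid]) r = mid - 1;
            else if (key > a[mid]) p = mid + 1;
            else return mid;
        }
        return -1;                            // key not found
    }

    public static void main(String[] args) {
        int[] a = {6, 13, 14, 25, 33, 43, 51, 53, 64, 72, 84, 93, 95, 96, 97};
        // 1 + floor(lg N), computed with integer arithmetic
        int bound = 1 + (31 - Integer.numberOfLeadingZeros(a.length));
        int worst = 0;
        for (int key = 0; key <= 100; key++) {   // successful and unsuccessful searches
            binarySearch(a, key);
            worst = Math.max(worst, compares);
        }
        System.out.println("worst case: " + worst + ", bound: " + bound);
    }
}
```

For N = 15 the bound is 1 + 3 = 4, and no search in this experiment uses more than 4 three-way compares.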
An $N^2 \log N$ algorithm for 3-SUM
Algorithm.
・Step 1: Sort the N (distinct) numbers.
・Step 2: For each pair of numbers a[i] and a[j], binary search for -(a[i] + a[j]).
Analysis. Order of growth is $N^2 \log N$.
・Step 1: $N^2$ with insertion sort.
・Step 2: $N^2 \log N$ with binary search (one search of about $\lg N$ compares for each of the $\sim \tfrac{1}{2} N^2$ pairs).
Remark. Can achieve $N^2$ by modifying binary search step.

input           30 -40 -20 -10 40 0 10 5
sort            -40 -20 -10 0 5 10 30 40
binary search   (pair, value searched for)
(-40, -20)    60
(-40, -10)    50
(-40,   0)    40
(-40,  10)    30
    ⋮          ⋮
(-20, -10)    30
    ⋮          ⋮
(-10,   0)    10
    ⋮          ⋮
( 10,  30)   -40
( 10,  40)   -50
( 30,  40)   -70
Only count if a[i] < a[j] < a[k], to avoid double counting.
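The two steps can be sketched as follows. This is an illustration under stated assumptions, not the lecture's ThreeSumDeluxe.java: the class name is hypothetical, and the library calls `Arrays.sort` and `Arrays.binarySearch` stand in for the insertion sort and the hand-written binary search described above.

```java
import java.util.Arrays;

// Sketch of the sort-then-binary-search approach. The class name is
// hypothetical, and Arrays.sort stands in for the insertion sort of Step 1.
class ThreeSumBinarySearch {

    // Count triples that sum to zero; assumes the integers are distinct.
    static int count(int[] input) {
        int[] a = input.clone();   // leave the caller's array unchanged
        Arrays.sort(a);            // Step 1: sort
        int n = a.length;
        int count = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++) {
                // Step 2: binary search for the value that completes the triple.
                int k = Arrays.binarySearch(a, -(a[i] + a[j]));
                if (k > j)         // a[i] < a[j] < a[k] avoids double counting
                    count++;
            }
        return count;
    }

    public static void main(String[] args) {
        int[] a = {30, -40, -20, -10, 40, 0, 10, 5};
        System.out.println(count(a));   // prints 4
    }
}
```

`Arrays.binarySearch` returns a negative value when the key is absent, so the test `k > j` rejects both missing values and triples that were already counted in a different order.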
Comparing programs
ThreeSumDeluxe.java
N         time (seconds)
32,000    14.88
64,000    59.16
Goals.
・Establish “difficulty” of a problem.
・Develop “optimal” algorithms.
Approach.
・Suppress details in analysis: analyze “to within a constant factor.”
・Eliminate variability in input model: focus on the worst case.
Upper bound. Performance guarantee of algorithm for any input.
Lower bound. Proof that no algorithm can do better.
Optimal algorithm. Lower bound = upper bound (to within a constant factor).
Commonly-used notations in the theory of algorithms
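notation | provides | example | shorthand for | used to
Big Theta | asymptotic order of growth | $\Theta(N^2)$ | $\tfrac{1}{2} N^2$, $10 N^2$, $5 N^2 + 22 N \log N + 3N$, ⋮ | classify algorithms
Big Oh | $\Theta(N^2)$ and smaller | $O(N^2)$ | $10 N^2$, $100 N$, $22 N \log N + 3N$, ⋮ | develop upper bounds
Big Omega | $\Theta(N^2)$ and larger | $\Omega(N^2)$ | $\tfrac{1}{2} N^2$, $N^5$, $N^3 + 22 N \log N + 3N$, ⋮ | develop lower bounds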
Theory of algorithms: example 1
Goals.
・Establish “difficulty” of a problem and develop “optimal” algorithms.
・Ex. 1-SUM = “Is there a 0 in the array? ”
Upper bound. A specific algorithm.
・Ex. Brute-force algorithm for 1-SUM: Look at every array entry.
・Running time of the optimal algorithm for 1-SUM is O(N).
Lower bound. Proof that no algorithm can do better.
・Ex. Have to examine all N entries (any unexamined one might be 0).
・Running time of the optimal algorithm for 1-SUM is Ω(N).
Optimal algorithm.
・Lower bound equals upper bound (to within a constant factor).
・Ex. Brute-force algorithm for 1-SUM is optimal: its running time is Θ(N).
Theory of algorithms: example 2
Goals.
・Establish “difficulty” of a problem and develop “optimal” algorithms.
・Ex. 3-SUM.
Upper bound. A specific algorithm.
・Ex. Brute-force algorithm for 3-SUM.
・Running time of the optimal algorithm for 3-SUM is $O(N^3)$.
・Ex. Improved algorithm for 3-SUM.
・Running time of the optimal algorithm for 3-SUM is $O(N^2 \log N)$.
Start.
・Develop an algorithm.
・Prove a lower bound.
Gap?
・Lower the upper bound (discover a new algorithm).
・Raise the lower bound (more difficult).
Golden Age of Algorithm Design.
・1970s onward.
・Steadily decreasing upper bounds for many important problems.
・Many known optimal algorithms.
Caveats.
・Overly pessimistic to focus on worst case?
・Need better than “to within a constant factor” to predict performance.
Commonly-used notations in the theory of algorithms
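notation | provides | example | shorthand for | used to
Tilde | leading term | $\sim 10 N^2$ | $10 N^2$, $10 N^2 + 22 N \log N$, $10 N^2 + 2N + 37$ | provide approximate model
Big Theta | asymptotic order of growth | $\Theta(N^2)$ | $\tfrac{1}{2} N^2$, $10 N^2$, $5 N^2 + 22 N \log N + 3N$ | classify algorithms
Big Oh | $\Theta(N^2)$ and smaller | $O(N^2)$ | $10 N^2$, $100 N$, $22 N \log N + 3N$ | develop upper bounds
Big Omega | $\Theta(N^2)$ and larger | $\Omega(N^2)$ | $\tfrac{1}{2} N^2$, $N^5$, $N^3 + 22 N \log N + 3N$ | develop lower bounds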
Basics
Typical memory usage for primitive types and arrays
primitive types
type       bytes
boolean    1
byte       1
char       2
short      2
int        4
float      4
long       8
double     8

one-dimensional arrays
type        bytes
char[]      2 N + 24
int[]       4 N + 24
double[]    8 N + 24

two-dimensional arrays
type          bytes
char[][]      ~ 2 M N
int[][]       ~ 4 M N
double[][]    ~ 8 M N
Typical memory usage for objects in Java
Reference 8 bytes (the value used in the WeightedQuickUnionUF example; 4 bytes on JVMs that compress references)
Typical memory usage summary
Example
public class WeightedQuickUnionUF           // 16 bytes (object overhead)
{
    private int[] id;                       // 8 + (4N + 24) bytes (reference + int[] array)
    private int[] sz;                       // 8 + (4N + 24) bytes (reference + int[] array)
    private int count;                      // 4 bytes (int)
                                            // 4 bytes (padding)
    public WeightedQuickUnionUF(int N)
    {
        id = new int[N];
        sz = new int[N];
        for (int i = 0; i < N; i++) id[i] = i;
        for (int i = 0; i < N; i++) sz[i] = 1;
    }
    ...
}
A. $8N + 88 \sim 8N$ bytes.
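The total is the sum of the per-field annotations; written out as a worked sum:

```latex
\[
\underbrace{16}_{\text{object overhead}}
+ \underbrace{2\,(8 + 4N + 24)}_{\texttt{id} \text{ and } \texttt{sz}}
+ \underbrace{4}_{\texttt{count}}
+ \underbrace{4}_{\text{padding}}
= 8N + 88 \sim 8N \text{ bytes}
\]
```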
Turning the crank: summary
Empirical analysis.
・Execute program to perform experiments.
・Assume power law and formulate a hypothesis for running time.
・Model enables us to make predictions.
Mathematical analysis.
・Analyze algorithm to count frequency of operations.
・Use tilde notation to simplify analysis.
・Model enables us to explain behavior.
Scientific method.
・Mathematical model is independent of a particular system; applies to machines not yet built.
・Empirical analysis is necessary to validate mathematical models and to make predictions.