CS 332: Algorithms
Quicksort
Review: Analyzing Quicksort
What will be the worst case for the algorithm?
Partition is always unbalanced
What will be the best case for the algorithm?
Partition is balanced
Which is more likely?
The latter, by far, except...
Will any particular input elicit the worst case?
Yes: Already-sorted input
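To see why already-sorted input hurts, here is a minimal sketch that counts comparisons. The slides do not say which partition is in use, so this assumes a partition that always picks the last element as pivot; `partition_last` and `quicksort_comparisons` are hypothetical names used only for illustration. An explicit stack replaces recursion so that the deep, lopsided case does not hit Python's recursion limit.

```python
import random

def partition_last(a, lo, hi):
    """Partition a[lo..hi] around a[hi] (assumed pivot rule); return (pivot index, comparisons)."""
    pivot = a[hi]
    i = lo - 1
    comparisons = 0
    for j in range(lo, hi):
        comparisons += 1
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[hi] = a[hi], a[i + 1]
    return i + 1, comparisons

def quicksort_comparisons(items):
    """Sort a copy of items; return (sorted list, total comparisons)."""
    a = list(items)
    total = 0
    stack = [(0, len(a) - 1)]          # pending subarrays, instead of recursion
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        p, c = partition_last(a, lo, hi)
        total += c
        stack.append((lo, p - 1))
        stack.append((p + 1, hi))
    return a, total

n = 500
_, worst = quicksort_comparisons(range(n))      # already-sorted input
shuffled = list(range(n))
random.Random(0).shuffle(shuffled)
_, typical = quicksort_comparisons(shuffled)    # same values, shuffled
print(worst, typical)
```

On sorted input every partition peels off only the pivot, so the comparison count is the sum 1 + 2 + ... + (n - 1).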
Review: Analyzing Quicksort
In the worst case:
T(1) = Θ(1)
T(n) = T(n - 1) + Θ(n)
Works out to
T(n) = Θ(n^2)
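Unrolling the recurrence shows where the quadratic comes from. A sketch, writing the linear term as cn for an assumed constant c > 0:

```latex
\begin{align*}
T(n) &= T(n-1) + cn \\
     &= T(n-2) + c(n-1) + cn \\
     &\;\;\vdots \\
     &= T(1) + c \sum_{k=2}^{n} k \\
     &= \Theta(1) + c \left( \frac{n(n+1)}{2} - 1 \right) = \Theta(n^2)
\end{align*}
```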
Review: Analyzing Quicksort
In the best case:
T(n) = 2T(n/2) + Θ(n)
What does this work out to?
T(n) = Θ(n lg n)
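A quick way to convince yourself is to evaluate the recurrence directly. This sketch assumes the linear term is exactly n, T(1) = 1, and n is a power of two; under those assumptions the recurrence works out to exactly n lg n + n.

```python
import math

def best_case_cost(n):
    """T(n) = 2 T(n/2) + n with T(1) = 1, for n a power of two (assumed cost model)."""
    if n == 1:
        return 1
    return 2 * best_case_cost(n // 2) + n

for exp in (4, 8, 12):
    n = 2 ** exp
    # Second and third columns agree: the recurrence value and n lg n + n.
    print(n, best_case_cost(n), n * math.log2(n) + n)
```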
Review: Analyzing Quicksort
(Average Case)
Intuitively, a real-life run of quicksort will
produce a mix of bad and good splits
Randomly distributed among the recursion tree
Pretend for intuition that they alternate between
best-case (n/2 : n/2) and worst-case (n-1 : 1)
What happens if we bad-split root node, then
good-split the resulting size (n-1) node?
We end up with three subarrays, size 1, (n-1)/2, (n-1)/2
Combined cost of splits = n + (n - 1) = 2n - 1 = O(n)
No worse than if we had good-split the root node!
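The alternating picture can be turned into a small cost model. This is a sketch under assumptions that go beyond the slide: a split of a size-n node costs exactly n, subarrays of size 0 or 1 cost nothing, and a "good" split of m elements gives halves of size floor(m/2) and ceil(m/2). The function names are hypothetical.

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def bad_then_good(n):
    """Cost when a size-n node is bad-split (n-1 : 1) and the big child is good-split."""
    if n <= 1:
        return 0
    return n + good_then_bad(n - 1)      # the size-1 side costs nothing

@lru_cache(maxsize=None)
def good_then_bad(n):
    """Cost when a size-n node is split in half and both halves are bad-split."""
    if n <= 1:
        return 0
    half = n // 2
    return n + bad_then_good(half) + bad_then_good(n - half)

for n in (16, 256, 4096, 65536):
    # Ratio to n lg n stays bounded as n grows.
    print(n, bad_then_good(n) / (n * math.log2(n)))
```

Each pair of levels costs about 2n instead of n, so the constant roughly doubles while the n lg n growth rate is unchanged.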
Review: Analyzing Quicksort
(Average Case)
Intuitively, the O(n) cost of a bad split
(or 2 or 3 bad splits) can be absorbed
into the O(n) cost of each good split
Thus running time of alternating bad and good
splits is still O(n lg n), with slightly higher
constants
How can we be more rigorous?
Analyzing Quicksort: Average Case
For simplicity, assume:
All inputs distinct (no repeats)
Slightly different partition() procedure
partition around a random element, which is not included
in subarrays
all splits (0:n-1, 1:n-2, 2:n-3, …, n-1:0) equally likely
What is the probability of a particular split
happening?
Answer: 1/n
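The modified partition step can be sketched as follows. This is an illustrative version, not necessarily the procedure the course uses: it picks the pivot uniformly at random, leaves it out of both subarrays, and assumes distinct inputs. `random_partition` is a hypothetical name. The loop then estimates how often each split occurs.

```python
import random
from collections import Counter

def random_partition(items, rng):
    """Split distinct items around a uniformly random pivot, excluded from both sides."""
    pivot = rng.choice(items)
    left = [x for x in items if x < pivot]
    right = [x for x in items if x > pivot]
    return left, pivot, right

rng = random.Random(332)     # fixed seed so the run is repeatable
n = 5
trials = 10_000
split_counts = Counter()
for _ in range(trials):
    left, pivot, right = random_partition(list(range(n)), rng)
    split_counts[(len(left), len(right))] += 1

for split in sorted(split_counts):
    # Each observed frequency should be close to 1/n.
    print(split, split_counts[split] / trials)
```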
Analyzing Quicksort: Average Case
So partition generates splits
(0:n-1, 1:n-2, 2:n-3, …, n-2:1, n-1:0)
each with probability 1/n
If T(n) is the expected running time,
\[ T(n) = \frac{1}{n} \sum_{k=0}^{n-1} \bigl[ T(k) + T(n-1-k) \bigr] + \Theta(n) \]
What is each term under the summation for?
What is the Θ(n) term for?
Analyzing Quicksort: Average Case
So
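\begin{align*}
T(n) &= \frac{1}{n} \sum_{k=0}^{n-1} \bigl[ T(k) + T(n-1-k) \bigr] + \Theta(n) \\
     &= \frac{2}{n} \sum_{k=0}^{n-1} T(k) + \Theta(n)
\end{align*}
Write it on the board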
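Before solving the recurrence by hand, it can simply be tabulated. A minimal sketch, assuming the Θ(n) term is exactly n and T(0) = 0 (both are choices made for illustration; `expected_cost_table` is a hypothetical helper name):

```python
import math

def expected_cost_table(n_max):
    """Tabulate T(n) = (2/n) * sum_{k=0}^{n-1} T(k) + n, with T(0) = 0 (assumed cost model)."""
    t = [0.0] * (n_max + 1)
    prefix = 0.0                  # running sum T(0) + ... + T(n-1)
    for n in range(1, n_max + 1):
        prefix += t[n - 1]
        t[n] = 2.0 * prefix / n + n
    return t

t = expected_cost_table(1000)
for n in (10, 100, 1000):
    # Ratio of T(n) to n lg n: it should level off rather than grow.
    print(n, t[n], t[n] / (n * math.log2(n)))
```

Keeping a running prefix sum makes the whole table linear time instead of quadratic.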
Analyzing Quicksort: Average Case
We can solve this recurrence using the dreaded
substitution method
Guess the answer
T(n) = O(n lg n)
Assume that the inductive hypothesis holds
T(n) ≤ an lg n + b for some constants a and b
Substitute it in for some value < n
The value k in the recurrence
Prove that it follows for n
Grind through it
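A numeric sanity check of the guess is cheap, though it proves nothing. This sketch assumes the Θ(n) term is exactly n and T(0) = 0, and tries hypothetical constants: a = 2, b = 1 survives the check, while a = 1, b = 1 eventually fails. `holds_up_to` is a name invented for this example.

```python
import math

def holds_up_to(n_max, a, b):
    """Check T(n) <= a*n*lg(n) + b for 1 <= n <= n_max, where
    T(n) = (2/n) * sum_{k<n} T(k) + n and T(0) = 0 (assumed cost model)."""
    prefix = 0.0     # running sum T(0) + ... + T(n-1)
    t_prev = 0.0     # T(n-1)
    for n in range(1, n_max + 1):
        prefix += t_prev
        t_n = 2.0 * prefix / n + n
        if t_n > a * n * math.log2(n) + b:
            return False
        t_prev = t_n
    return True

print(holds_up_to(5000, 2, 1))   # candidate constants for the inductive hypothesis
print(holds_up_to(5000, 1, 1))   # a too small: the bound breaks for moderate n
```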
Analyzing Quicksort: Average Case
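\begin{align*}
T(n) &= \frac{2}{n} \sum_{k=0}^{n-1} T(k) + \Theta(n) && \text{The recurrence to be solved} \\
     &\le \frac{2}{n} \sum_{k=0}^{n-1} (ak \lg k + b) + \Theta(n) && \text{Plug in inductive hypothesis} \\
     &= \frac{2}{n} \Bigl[ b + \sum_{k=1}^{n-1} (ak \lg k + b) \Bigr] + \Theta(n) && \text{Expand out the $k = 0$ case} \\
     &= \frac{2}{n} \sum_{k=1}^{n-1} (ak \lg k + b) + \frac{2b}{n} + \Theta(n) && \text{$2b/n$ is just a constant, so fold it into $\Theta(n)$} \\
     &= \frac{2}{n} \sum_{k=1}^{n-1} (ak \lg k + b) + \Theta(n) && \text{Note: leaving the same recurrence as the book}
\end{align*}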
Analyzing Quicksort: Average Case
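\begin{align*}
T(n) &\le \frac{2}{n} \sum_{k=1}^{n-1} (ak \lg k + b) + \Theta(n) && \text{The recurrence to be solved} \\
     &= \frac{2}{n} \sum_{k=1}^{n-1} ak \lg k + \frac{2}{n} \sum_{k=1}^{n-1} b + \Theta(n) && \text{Distribute the summation} \\
     &= \frac{2a}{n} \sum_{k=1}^{n-1} k \lg k + \frac{2b}{n} (n-1) + \Theta(n) && \text{Evaluate the summation: $b + b + \cdots + b = b(n-1)$} \\
     &\le \frac{2a}{n} \sum_{k=1}^{n-1} k \lg k + 2b + \Theta(n) && \text{Since $n - 1 < n$, $2b(n-1)/n < 2b$}
\end{align*}
This summation gets its own set of slides later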
Analyzing Quicksort: Average Case
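\begin{align*}
T(n) &\le \frac{2a}{n} \sum_{k=1}^{n-1} k \lg k + 2b + \Theta(n) && \text{The recurrence to be solved} \\
     &\le \frac{2a}{n} \Bigl( \frac{1}{2} n^2 \lg n - \frac{1}{8} n^2 \Bigr) + 2b + \Theta(n) && \text{We'll prove this later} \\
     &= an \lg n - \frac{a}{4} n + 2b + \Theta(n) && \text{Distribute the $(2a/n)$ term} \\
     &= an \lg n + b + \Bigl( \Theta(n) + b - \frac{a}{4} n \Bigr) && \text{Remember, our goal is to get $T(n) \le an \lg n + b$} \\
     &\le an \lg n + b && \text{Pick $a$ large enough that $an/4$ dominates $\Theta(n) + b$}
\end{align*}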
Analyzing Quicksort: Average Case
So T(n) ≤ an lg n + b for certain a and b
Thus the induction holds
Thus T(n) = O(n lg n)
Thus quicksort runs in O(n lg n) time on average
(phew!)
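The result is easy to check empirically. Below is a minimal sketch of a randomized quicksort that counts comparisons, assuming distinct inputs and charging n - 1 comparisons per partition of n elements. It builds new lists rather than partitioning in place, which is a simplification for readability and not the partition procedure from the slides; `randomized_quicksort` is a hypothetical name.

```python
import math
import random

def randomized_quicksort(items, rng):
    """Return (sorted copy, number of element-vs-pivot comparisons). Assumes distinct items."""
    if len(items) <= 1:
        return list(items), 0
    pivot = rng.choice(items)                       # random pivot, excluded from subarrays
    left = [x for x in items if x < pivot]
    right = [x for x in items if x > pivot]
    sorted_left, c_left = randomized_quicksort(left, rng)
    sorted_right, c_right = randomized_quicksort(right, rng)
    comparisons = len(items) - 1 + c_left + c_right
    return sorted_left + [pivot] + sorted_right, comparisons

rng = random.Random(332)      # fixed seed so the run is repeatable
n = 2000
data = list(range(n))         # already sorted: bad for a fixed pivot rule, harmless here
trials = 20
average = sum(randomized_quicksort(data, rng)[1] for _ in range(trials)) / trials
print(average / (n * math.log2(n)))   # ratio of average comparisons to n lg n
```

With a random pivot the sorted input is no longer special: the average comparison count stays within a small constant factor of n lg n.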
Oh yeah, the summation
Tightly Bounding
The Key Summation
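\begin{align*}
\sum_{k=1}^{n-1} k \lg k &= \sum_{k=1}^{\lceil n/2 \rceil - 1} k \lg k + \sum_{k=\lceil n/2 \rceil}^{n-1} k \lg k && \text{Split the summation for a tighter bound} \\
&\le \sum_{k=1}^{\lceil n/2 \rceil - 1} k \lg k + \sum_{k=\lceil n/2 \rceil}^{n-1} k \lg n && \text{The $\lg k$ in the second term is bounded by $\lg n$} \\
&= \sum_{k=1}^{\lceil n/2 \rceil - 1} k \lg k + \lg n \sum_{k=\lceil n/2 \rceil}^{n-1} k && \text{Move the $\lg n$ outside the summation}
\end{align*}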
Tightly Bounding
The Key Summation
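\begin{align*}
\sum_{k=1}^{n-1} k \lg k &\le \sum_{k=1}^{\lceil n/2 \rceil - 1} k \lg k + \lg n \sum_{k=\lceil n/2 \rceil}^{n-1} k && \text{The summation bound so far} \\
&\le \sum_{k=1}^{\lceil n/2 \rceil - 1} k \lg (n/2) + \lg n \sum_{k=\lceil n/2 \rceil}^{n-1} k && \text{The $\lg k$ in the first term is bounded by $\lg (n/2)$} \\
&= \sum_{k=1}^{\lceil n/2 \rceil - 1} k (\lg n - 1) + \lg n \sum_{k=\lceil n/2 \rceil}^{n-1} k && \text{$\lg (n/2) = \lg n - 1$} \\
&= (\lg n - 1) \sum_{k=1}^{\lceil n/2 \rceil - 1} k + \lg n \sum_{k=\lceil n/2 \rceil}^{n-1} k && \text{Move $(\lg n - 1)$ outside the summation}
\end{align*}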
Tightly Bounding
The Key Summation
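\begin{align*}
\sum_{k=1}^{n-1} k \lg k &\le (\lg n - 1) \sum_{k=1}^{\lceil n/2 \rceil - 1} k + \lg n \sum_{k=\lceil n/2 \rceil}^{n-1} k && \text{The summation bound so far} \\
&= \lg n \sum_{k=1}^{\lceil n/2 \rceil - 1} k - \sum_{k=1}^{\lceil n/2 \rceil - 1} k + \lg n \sum_{k=\lceil n/2 \rceil}^{n-1} k && \text{Distribute the $(\lg n - 1)$} \\
&= \lg n \sum_{k=1}^{n-1} k - \sum_{k=1}^{\lceil n/2 \rceil - 1} k && \text{The two $\lg n$ summations cover adjacent ranges; combine them} \\
&= \lg n \Bigl( \frac{(n-1)n}{2} \Bigr) - \sum_{k=1}^{\lceil n/2 \rceil - 1} k && \text{The Gaussian series}
\end{align*}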
Tightly Bounding
The Key Summation
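\begin{align*}
\sum_{k=1}^{n-1} k \lg k &\le \Bigl( \frac{(n-1)n}{2} \Bigr) \lg n - \sum_{k=1}^{\lceil n/2 \rceil - 1} k && \text{The summation bound so far} \\
&\le \frac{1}{2} \bigl[ n(n-1) \bigr] \lg n - \sum_{k=1}^{n/2 - 1} k && \text{Rearrange first term, place upper bound on second} \\
&\le \frac{1}{2} \bigl[ n(n-1) \bigr] \lg n - \frac{1}{2} \Bigl( \frac{n}{2} \Bigr) \Bigl( \frac{n}{2} - 1 \Bigr) && \text{Gaussian series} \\
&= \frac{1}{2} n^2 \lg n - \frac{1}{2} n \lg n - \frac{1}{8} n^2 + \frac{n}{4} && \text{Multiply it all out}
\end{align*}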
Tightly Bounding
The Key Summation
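\begin{align*}
\sum_{k=1}^{n-1} k \lg k &\le \frac{1}{2} n^2 \lg n - \frac{1}{2} n \lg n - \frac{1}{8} n^2 + \frac{n}{4} \\
&\le \frac{1}{2} n^2 \lg n - \frac{1}{8} n^2 \quad \text{when } n \ge 2
\end{align*}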
Done!!!
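The final bound is easy to spot-check numerically. A small sketch (the helper names are made up for this example) that compares the sum of k lg k for k from 1 to n - 1 against (1/2) n^2 lg n - (1/8) n^2:

```python
import math

def k_lg_k_sum(n):
    """Sum of k * lg k for k = 1 .. n-1."""
    return sum(k * math.log2(k) for k in range(1, n))

def claimed_bound(n):
    """The bound from the slides: (1/2) n^2 lg n - (1/8) n^2."""
    return 0.5 * n * n * math.log2(n) - n * n / 8

for n in (2, 10, 100, 1000):
    # The sum (middle column) should never exceed the bound (last column).
    print(n, k_lg_k_sum(n), claimed_bound(n))
```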