Lecture 7
Slides mainly originate from Dr. Steven Skiena’s course on Analysis of Algorithms.
Topic: Introduction to Sorting
Importance of Sorting
Why don’t CS profs ever stop talking about sorting?
• Computers spend a lot of time sorting, historically 25%
on mainframes.
• Sorting is the best studied problem in computer science,
with many different algorithms known.
• Most of the interesting ideas we will encounter in the
course can be taught in the context of sorting, such as
divide-and-conquer, randomized algorithms, and lower
bounds.
You should have seen most of the algorithms, so we will
concentrate on the analysis.
Efficiency of Sorting
Sorting is important because once a set of items is sorted,
many other problems become easy.
Further, using O(n log n) sorting algorithms leads naturally
to sub-quadratic algorithms for all these problems.
n          n²/4              n lg n
10         25                33
100        2,500             664
1,000      250,000           9,965
10,000     25,000,000        132,877
100,000    2,500,000,000     1,660,960
1,000,000  250,000,000,000   13,815,551
Large-scale data processing is impossible with Ω(n²) sorting.
Pragmatics of Sorting: Comparison Functions
Alphabetizing is the sorting of text strings.
Libraries have very complete and complicated rules con-
cerning the relative collating sequence of characters and
punctuation.
• Is Skiena the same key as skiena?
• Is Brown-Williams before or after Brown America? Before
or after Brown, John?
Explicitly controlling the order of keys is the job of the
comparison function we apply to each pair of elements,
including the question of increasing or decreasing order.
Pragmatics of Sorting: Equal Elements
Elements with equal keys will all bunch together in any total
order, but sometimes the relative order among these keys
matters.
Often there are secondary keys (like first names) to test after
the primary keys. This is a job for the comparison function.
Certain algorithms (like quicksort) require special care to run
efficiently with large numbers of equal elements.
Pragmatics of Sorting: Library Functions
Any reasonable programming language has a built-in sort
routine as a library function.
You are almost always better off using the system sort than
writing your own routine.
For example, the standard library for C contains the function
qsort for sorting:
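A minimal usage example (intcmp is our own illustrative comparator name; the qsort prototype itself is declared in <stdlib.h>):

#include <stdio.h>
#include <stdlib.h>

/* comparison function: negative, zero, or positive as a < b, a == b, a > b */
int intcmp(const void *a, const void *b) {
    int x = *(const int *) a;
    int y = *(const int *) b;
    return (x > y) - (x < y);    /* avoids the overflow risk of returning x - y */
}

int main(void) {
    int a[] = {1963, 1776, 1492, 1804, 1918};
    int i, n = sizeof(a) / sizeof(a[0]);
    qsort(a, n, sizeof(int), intcmp);    /* base, element count, element size, comparator */
    for (i = 0; i < n; i++) printf("%d ", a[i]);
    printf("\n");
    return 0;
}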
Questions?
Topic: Applications of Sorting
Application of Sorting: Searching
Binary search lets you test whether an item is in a dictionary
in O(lg n) time.
Search preprocessing is perhaps the single most important
application of sorting.
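A sketch of the search itself on a sorted array of ints (the interface here is illustrative, not from the slides):

/* return the index of key in the sorted array a[0..n-1], or -1 if it is absent */
int binary_search(const int a[], int n, int key) {
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;   /* midpoint without (low + high) overflow */
        if (a[mid] == key) return mid;
        if (a[mid] < key) low = mid + 1;    /* key can only be in the right half */
        else high = mid - 1;                /* key can only be in the left half */
    }
    return -1;
}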
Application of Sorting: Closest pair
Given n numbers, find the pair which are closest to each other.
Once the numbers are sorted, the closest pair will be next to
each other in sorted order, so an O(n) linear scan completes
the job.
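A sketch of this sort-then-scan approach for integers (the function and comparator names are illustrative):

#include <stdlib.h>
#include <limits.h>

static int intcmp(const void *a, const void *b) {
    int x = *(const int *) a, y = *(const int *) b;
    return (x > y) - (x < y);
}

/* gap between the closest pair among n >= 2 numbers; sorts a[] as a side effect */
int closest_pair_gap(int a[], int n) {
    int i, best = INT_MAX;
    qsort(a, n, sizeof(int), intcmp);   /* O(n log n) sort */
    for (i = 1; i < n; i++)             /* O(n) scan: the closest pair is now adjacent */
        if (a[i] - a[i - 1] < best)
            best = a[i] - a[i - 1];
    return best;
}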
Application of Sorting: Element Uniqueness
Given a set of n items, are they all unique or are there any
duplicates?
Sort them and do a linear scan to check all adjacent pairs.
This is a special case of closest pair above.
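A minimal sketch of the scan step, assuming the array has already been sorted (the function name is illustrative):

#include <stdbool.h>

/* true if the sorted array a[0..n-1] contains no duplicates */
bool all_unique(const int a[], int n) {
    int i;
    for (i = 1; i < n; i++)
        if (a[i] == a[i - 1]) return false;   /* after sorting, duplicates are adjacent */
    return true;
}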
Application of Sorting: Mode
Given a set of n items, which element occurs the largest
number of times? More generally, compute the frequency
distribution.
Sort them and do a linear scan to measure the length of all
adjacent runs.
The number of instances of k in a sorted array can be found in
O(log n) time by using binary search to look for the positions
of both k − ϵ and k + ϵ.
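The two boundary searches amount to lower-bound and upper-bound binary searches; a sketch for a sorted int array, with illustrative names:

/* index of the first element >= key in the sorted array a[0..n-1], or n if none */
static int lower_bound(const int a[], int n, int key) {
    int low = 0, high = n;
    while (low < high) {
        int mid = low + (high - low) / 2;
        if (a[mid] < key) low = mid + 1;
        else high = mid;
    }
    return low;
}

/* index of the first element > key, or n if none */
static int upper_bound(const int a[], int n, int key) {
    int low = 0, high = n;
    while (low < high) {
        int mid = low + (high - low) / 2;
        if (a[mid] <= key) low = mid + 1;
        else high = mid;
    }
    return low;
}

/* number of occurrences of key in the sorted array, in O(log n) time */
int count_key(const int a[], int n, int key) {
    return upper_bound(a, n, key) - lower_bound(a, n, key);
}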
Application of Sorting: Median and Selection
What is the kth largest item in the set?
Once the keys are placed in sorted order in an array, the kth
largest can be found in constant time by simply looking in the
kth position of the array.
There is a linear time algorithm for this problem, but the idea
comes from partial sorting.
Application of Sorting: Convex hulls
Given n points in two dimensions, find the smallest area
polygon which contains them all.
Topic: Selection Sort / Heapsort
Selection Sort
Selection sort scans through the entire array, repeatedly
finding the smallest remaining element.
For i = 1 to n
A: Find the smallest of the first n − i + 1 items.
B: Pull it out of the array and append it to the sorted output.
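A standard in-place array version of this idea (a sketch; it swaps the minimum to the front of the unsorted part rather than literally pulling it out of the array):

/* selection sort: O(n^2) comparisons, sorts a[0..n-1] in place */
void selection_sort(int a[], int n) {
    int i, j, min, tmp;
    for (i = 0; i < n; i++) {
        min = i;                    /* operation A: find the smallest remaining item */
        for (j = i + 1; j < n; j++)
            if (a[j] < a[min]) min = j;
        tmp = a[i];                 /* operation B: move it to the front of the unsorted part */
        a[i] = a[min];
        a[min] = tmp;
    }
}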
The Data Structure Matters
Using arrays or unsorted linked lists as the data structure,
operation A takes O(n) time and operation B takes O(1),
for an O(n²) selection sort.
Using balanced search trees or heaps, both of these operations
can be done within O(lg n) time, for an O(n log n)
selection sort called heapsort.
Balancing the work between the operations achieves a better
tradeoff.
Key question: “Can we use a different data structure?”
Questions?
Topic: Priority Queues with Applications
Priority Queues
Priority queues are data structures which provide extra
flexibility over sorting.
This is important because jobs often enter a system at
arbitrary intervals. It is more cost-effective to insert a new
job into a priority queue than to re-sort everything on each
new arrival.
Priority Queue Operations
The basic priority queue supports three primary operations:
• Insert(Q,x): Given an item x with key k, insert it into the
priority queue Q.
• Find-Minimum(Q) or Find-Maximum(Q): Return a
pointer to the item whose key is smaller (larger) than
any other key in the priority queue Q.
• Delete-Minimum(Q) or Delete-Maximum(Q): Remove the
item from the priority queue Q whose key is minimum
(maximum).
Each of these operations can be easily supported using heaps
or balanced binary trees in O(log n).
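One way to declare such a heap-based priority queue in C (a sketch consistent with the q->q[ ] and q->n fields used by the code later in these slides; the PQ_SIZE value and the choice of item_type are assumptions):

#define PQ_SIZE 1000            /* assumed capacity */
typedef int item_type;          /* assumed key type */

typedef struct {
    item_type q[PQ_SIZE + 1];   /* body of the queue; slot 0 is unused so the root sits in q[1] */
    int n;                      /* number of elements currently in the queue */
} priority_queue;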
Questions?
Applications of Priority Queues: Dating
What data structure should be used to suggest who to ask out
next for a date?
It needs to support retrieval by desirability, not name.
Desirability changes (up or down), so you can re-insert the
max with the new score after each date.
New people you meet get inserted with your observed
desirability level.
There is no reason to delete anyone until they rise to the top.
Applications of Priority Queues: Discrete Event
Simulations
In simulations of airports, parking lots, and jai-alai – priority
queues can be used to maintain who goes next.
The stack and queue orders are just special cases of orderings.
In real life, certain people cut in line, and this can be modeled
with a priority queue.
Topic: Heaps
Heap Definition
A binary heap is defined to be a binary tree with a key in each
node such that:
1. All leaves are on, at most, two adjacent levels.
2. All leaves on the lowest level occur to the left, and all
levels except the lowest one are completely filled.
3. The key in the root is ≤ the keys of its children, and the left
and right subtrees are again binary heaps.
Conditions 1 and 2 specify the shape of the tree, and condition 3
the labeling of the tree.
Binary Heaps
Heaps maintain a partial order on the set of elements which
is weaker than the sorted order (so it can be efficient to
maintain) yet stronger than random order (so the minimum
element can be quickly identified).
Example: a min-heap of the years 1492, 1783, 1776, 1804, 1865,
1945, 1963, 1918, 2001, 1941, with 1492 at the root, 1783 and
1776 as its children, and so on level by level. In array form:
Position: 1     2     3     4     5     6     7     8     9     10
Key:      1492  1783  1776  1804  1865  1945  1963  1918  2001  1941
Array-Based Heaps
The most natural representation of this binary tree would
involve storing each key in a node with pointers to its two
children.
However, we can store a tree as an array of keys, using
the position of the keys to implicitly satisfy the role of the
pointers.
The left child of k sits in position 2k and the right child
in 2k + 1.
The parent of k is in position ⌊k/2⌋.
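These index calculations can be wrapped in small helpers (a sketch; pq_young_child matches the name used by the bubble_down code later, while pq_parent is an assumed companion):

/* position of the parent of the node in position k; the root (k = 1) has no parent */
int pq_parent(int k) {
    if (k == 1) return -1;
    return k / 2;               /* integer division gives the floor */
}

/* position of the left ("younger") child of the node in position k; the right child is this + 1 */
int pq_young_child(int k) {
    return 2 * k;
}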
Can we Implicitly Represent Any Binary Tree?
The implicit representation is inefficient if the tree is
sparse, meaning that the number of nodes n < 2^h for height h.
All missing internal nodes still take up space in our structure.
This is why we insist on heaps as being as balanced/full at
each level as possible.
The array-based representation is also not as flexible to
arbitrary modifications as a pointer-based tree.
Constructing Heaps
Heaps can be constructed incrementally, by inserting new
elements into the left-most open spot in the array.
If the new element is smaller than its parent, swap their
positions and recur.
Since all but the last level is always filled, the height h of an
n-element heap is bounded because
2^0 + 2^1 + ... + 2^h = 2^(h+1) − 1 ≥ n,
so h = ⌊lg n⌋.
Doing n such insertions really takes Θ(n log n), because
the last n/ 2 insertions require O(log n) time each.
Heap Insertion
Bubble Up
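A sketch of insertion with bubble-up, following the procedure described above and reusing the priority_queue declaration and pq_parent helper sketched earlier; the function bodies are reconstructions, not the slides' own code:

#include <stdio.h>   /* for the overflow warning */

/* exchange the keys in positions i and j of the heap */
void pq_swap(priority_queue *q, int i, int j) {
    item_type tmp = q->q[i];
    q->q[i] = q->q[j];
    q->q[j] = tmp;
}

/* restore the min-heap property by swapping position p with its parent as needed */
void bubble_up(priority_queue *q, int p) {
    if (pq_parent(p) == -1) return;           /* p is the root */
    if (q->q[pq_parent(p)] > q->q[p]) {       /* the smaller key must rise */
        pq_swap(q, p, pq_parent(p));
        bubble_up(q, pq_parent(p));
    }
}

/* place x in the left-most open slot, then bubble it up: O(log n) per insertion */
void pq_insert(priority_queue *q, item_type x) {
    if (q->n >= PQ_SIZE) {
        printf("Warning: priority queue overflow!\n");
    } else {
        q->n = q->n + 1;
        q->q[q->n] = x;
        bubble_up(q, q->n);
    }
}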
Extracting the Minimum Element
item_type extract_min(priority_queue *q) {   /* assumed signature: the slide shows only the body */
    item_type min = -1;                      /* minimum key; -1 signals an empty queue */

    if (q->n <= 0) {
        printf("Warning: empty priority queue.\n");
    } else {
        min = q->q[1];                       /* the root holds the minimum key */
        q->q[1] = q->q[q->n];                /* move the last element to the root */
        q->n = q->n - 1;
        bubble_down(q, 1);                   /* restore the heap property */
    }
    return(min);
}
Bubble Down Implementation
void bubble_down(priority_queue *q, int p) {   /* assumed header: the slide shows only a fragment */
    int c, i, min_index;                       /* child index, counter, index of the lightest child */
    c = pq_young_child(p);                     /* leftmost child of p */
    min_index = p;
    for (i = 0; i <= 1; i++)                   /* compare p with each child that exists */
        if ((c + i) <= q->n && q->q[min_index] > q->q[c + i])
            min_index = c + i;
    if (min_index != p) {
        pq_swap(q, p, min_index);
        bubble_down(q, min_index);             /* continue pushing the key down */
    }
}
Heapsort
Heapify can be used to construct a heap, using the observation
that an isolated element forms a heap of size 1.
void heapsort_(item_type s[], int n) {
    int i;                        /* counter */
    priority_queue q;             /* heap for heapsort */
    make_heap(&q, s, n);          /* build the heap from the n input keys */
    for (i = 0; i < n; i++)       /* assumed completion: the slide's fragment ends at make_heap */
        s[i] = extract_min(&q);   /* pull keys back out in increasing order */
}