Sorting
CH. 7
SORTING
Motivation
Motivation: Sequential Search
Search the WHOLE list in left-to-right or
right-to-left order until we find the first
occurrence of the record with the target
key.
template <class E, class K>
int SeqSearch(E *a, const int n, const K& k)
{ // Search a[1:n] from left to right. Return the least i such
  // that the key of a[i] equals k. If there is no such i,
  // return 0.
  int i;
  for (i = 1; i <= n && a[i] != k; i++);
  if (i > n) return 0;
  return i;
}
Time complexity = 𝑂(𝑛)
Motivation: Improvement
How do we improve the performance of
searching a record?
Sort the list in a specific order before you
do the search!
For example, given an ordered numeric
list, binary search achieves an
improved performance of 𝑂(log 𝑛).
Recursive Binary Search
int BinarySearch(int *A, const int x, const int left,
                 const int right)
{ // Search the sorted range A[left],..,A[right] for x
if (left <= right) { // more integers to check
int middle = (left+right)/2;
if (x < A[middle])
return BinarySearch(A, x, left, middle-1);
else if (x > A[middle])
return BinarySearch(A, x, middle+1, right);
return middle;
} // end of if
return -1; // not found
}
Binary Search Example
Search for 𝑥 = 9 in array A[0]…A[7]:
index  A[0] A[1] A[2] A[3] A[4] A[5] A[6] A[7]
value    1    3    5    8    9   17   32   50
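The search for 𝑥 = 9 can be traced with a small sketch. This is an iterative variant of the recursive routine, not the slide's code; the name BinarySearchTrace and the out-parameter probes are assumptions added only to record which positions are examined.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Iterative binary search over a sorted vector (0-based).
// `probes` (hypothetical, for illustration) collects every index examined.
int BinarySearchTrace(const std::vector<int>& A, int x, std::vector<int>& probes)
{
    assert(std::is_sorted(A.begin(), A.end())); // binary search needs sorted input
    int left = 0, right = static_cast<int>(A.size()) - 1;
    while (left <= right) {
        int middle = left + (right - left) / 2; // same value as (left+right)/2
        probes.push_back(middle);
        if (x < A[middle]) right = middle - 1;
        else if (x > A[middle]) left = middle + 1;
        else return middle;
    }
    return -1; // not found
}
```

For the array above, the probed positions are A[3] = 8, then A[5] = 17, then A[4] = 9: three comparisons rounds instead of the five a left-to-right scan would need.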
To improve the
search performance!
Two Categories
Internal sort:
◦ The entire sort can be done in main memory
◦ Suitable for lists of small size (e.g. 1 MB)
◦ Insertion sort, merge sort, heap sort, radix sort
External sort:
◦ Data I/O is necessary during sorting
◦ Suitable for lists of large size (e.g. 1 TB)
◦ Merge sort
Stable Sort
A sorting algorithm is called “stable” iff, whenever
𝑟𝑖 = 𝑟𝑗 and 𝑟𝑖 precedes 𝑟𝑗 in the input list,
𝑟𝑖 also precedes 𝑟𝑗 in the sorted list.
[Figure: an unsorted list next to its stable-sorted result]
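A minimal sketch of what stability means in practice, using the standard std::stable_sort. The Record type and its string tags are assumptions made up for this illustration; the tags only mark the input order of records with equal keys.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Each record is (key, tag); the tag is a hypothetical label for input order.
using Record = std::pair<int, std::string>;

// Sort on the key only. std::stable_sort keeps records with equal keys
// in their original relative order.
std::vector<Record> StableSortByKey(std::vector<Record> records)
{
    auto byKey = [](const Record& x, const Record& y) { return x.first < y.first; };
    std::stable_sort(records.begin(), records.end(), byKey);
    assert(std::is_sorted(records.begin(), records.end(), byKey));
    return records;
}
```

With input (3,a) (1,b) (3,c) (1,d), the result is (1,b) (1,d) (3,a) (3,c): b stays ahead of d, and a stays ahead of c.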
Insertion Sort
A Running Example
44 55 12 42 94 18 6 67
44 55 12 42 94 18 6 67
44 55 12 42 94 18 6 67
12 44 55 42 94 18 6 67
12 42 44 55 94 18 6 67
…
Insertion Sort (codes)
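template <class T>
void Insert(const T& e, T *a, int i)
{ // Insert e into the ordered sequence a[1:i] so that the
  // resulting sequence a[1:i+1] is also ordered.
  // a[0] serves as a sentinel that stops the scan.
  a[0] = e;
  while (e < a[i]) {
    a[i + 1] = a[i];
    i--;
  }
  a[i + 1] = e;
}
template <class T>
void InsertionSort(T *a, const int n)
{ // Sort a[1:n] into nondecreasing order.
  for (int j = 2; j <= n; j++) {
    T temp = a[j];
    Insert(temp, a, j - 1);
  }
}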
Complexity
Worst case running time
◦ Outer loop: 𝑂(𝑛)
◦ Inner loop: 𝑂(𝑗)
$\sum_{j=1}^{n} j = O(n^2)$
Stable sort
7.3 Quick Sort
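26 5 37 1 61 11 59 15 48 19    pivot 26; i stops at 37, j stops at 19; swap
26 5 19 1 61 11 59 15 48 37    i stops at 61, j stops at 15; swap
26 5 19 1 15 11 59 61 48 37    i stops at 59, j stops at 11; 𝑖 > 𝑗 → stop; swap pivot with a[j]
11 5 19 1 15 26 59 61 48 37
Sublist 1: 11 5 19 1 15    Sublist 2: 59 61 48 37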
Quick Sort (code)
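template <class T>
void QuickSort(T *a, const int left, const int right)
{ // Sort a[left:right] into nondecreasing order. a[left] is the pivot.
  // Assumes a[right+1] holds a key >= every key in a[left:right], so
  // the scan with i stops without an explicit bounds check.
  if (left < right) {
    int i = left, j = right + 1;
    T pivot = a[left]; // the pivot has the record type T
    do {
      do i++; while (a[i] < pivot);
      do j--; while (a[j] > pivot);
      if (i < j) swap(a[i], a[j]);
    } while (i < j);
    swap(a[left], a[j]);
    QuickSort(a, left, j - 1);
    QuickSort(a, j + 1, right);
  }
}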
Time complexity
If the splitting record is in the middle:
◦ Depth of recursion: 𝑂(log 𝑛)
◦ Finding the position of the splitting record: 𝑂(𝑛)
◦ Total running time: 𝑂(𝑛 log 𝑛)
Worst case running time: 𝑂(𝑛²), when every split leaves
one sublist of size 𝑛−1, then 𝑛−2, and so on.
7.4 How Fast Can We Sort
Decision Tree for Insertion Sort
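[Figure: decision tree for insertion sort on three keys. The root, labeled [1,2,3], tests K1≤K2; every internal node has a Yes and a No branch.]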
Time Complexity
Given a list of 𝒏 records.
There are 𝒏! possible permutations, so a decision tree
has at least 𝒏! leaf nodes.
For a decision tree (binary tree) with 𝒏!
leaves, the height (depth) of the tree is
at least log 𝒏! = 𝛀(𝒏 log 𝒏).
◦ $n! \ge (n/2)^{n/2}$, since the largest $n/2$ factors of $n!$ are each at least $n/2$
◦ $\log n! \ge (n/2) \log (n/2) = \Omega(n \log n)$
Therefore the longest root-to-leaf path, and also the
average one, is 𝛀(𝒏 log 𝒏).
7.5 Merge Sort
Merge Illustration
The sorted FirstPart and the sorted SecondPart of A are merged into one sorted list A.
A Running Example
L: 2 3 7 8    R: 1 4 5 6
k is the next free position in A; i and j are the next unread positions in L and R.
k   i   j   A[0..k-1]
0   0   0   (empty)
1   0   1   1
2   1   1   1 2
3   2   1   1 2 3
4   2   2   1 2 3 4
5   2   3   1 2 3 4 5
6   2   4   1 2 3 4 5 6
7   3   4   1 2 3 4 5 6 7
8   4   4   1 2 3 4 5 6 7 8
Merge Sort (codes)
template <class T>
void Merge(T *initList, T *mergedList, const int l, const int m,
           const int n)
{ // initList[l:m] and initList[m+1:n] are sorted lists. They are
  // merged to obtain the sorted list mergedList[l:n].
  int i1, iResult, i2; // declared outside the loop: used after it
  for (i1 = l, iResult = l, i2 = m + 1; i1 <= m && i2 <= n;
       iResult++)
    if (initList[i1] <= initList[i2]) {
      mergedList[iResult] = initList[i1];
      i1++;
    } else {
      mergedList[iResult] = initList[i2];
      i2++;
    }
  // copy the remaining records, if any, of the 1st list
  copy(initList + i1, initList + m + 1, mergedList + iResult);
  // copy the remaining records, if any, of the 2nd list
  copy(initList + i2, initList + n + 1, mergedList + iResult);
}
[Figure: i1 scans initList[l:m], i2 scans initList[m+1:n], and iResult scans mergedList[l:n]]
7.5.2 Iterative Merge Sort
Interpret the list as composed of 𝒏
sorted sublists, each of size 1.
1𝑠𝑡 merge pass: 𝒏 sublists are merged by
pairs to obtain 𝒏/𝟐 sublists.
2𝑛𝑑 merge pass: 𝒏/𝟐 sublists are merged
by pairs to obtain 𝒏/𝟒 sublists.
…
The process repeats until only one sublist
exists.
Example
26 | 5 | 77 | 1 | 61 | 11 | 59 | 15 | 48 | 19
5 26 | 1 77 | 11 61 | 15 59 | 19 48
1 5 26 77 | 11 15 59 61 | 19 48
1 5 11 15 26 59 61 77 | 19 48
1 5 11 15 19 26 48 59 61 77
Iterative Merge Sort (code)
template <class T>
void MergePass(T *initList, T *resultList, const int n, const int s)
{ // Adjacent pairs of sublists of size s are merged from
  // initList to resultList. n is the number of records in initList.
  int i; // declared outside the loop: used after it
  for (i = 1; // i is the 1st position in the 1st sublist
       i <= n - 2 * s + 1; // enough records for two sublists?
       i += 2 * s)
    Merge(initList, resultList, i, i + s - 1, i + 2 * s - 1);
  // merge remaining list of size < 2 * s
  if ((i + s - 1) < n)
    Merge(initList, resultList, i, i + s - 1, n);
  else
    copy(initList + i, initList + n + 1, resultList + i);
}
[Figure: initList divided into segments of size 2s, each merged from positions i, i+s-1, i+2s-1, followed by a final segment of size < 2s ending at n]
Iterative Merge Sort (code)
template <class T>
void MergeSort(T *a, const int n)
{
T *tempList = new T[n+1];
// l is the length of the sublist currently being merged
for (int l =1; l < n; l*= 2){
MergePass(a, tempList, n, l);
l*=2;
MergePass(tempList, a, n, l); // switch role of a and
// tempList
}
delete [] tempList;
}
Properties
Time complexity
◦ Number of merge passes: 𝑂(log 𝑛)
◦ Time complexity of each merge pass: 𝑂(𝑛)
◦ Time complexity = 𝑂(𝑛 log 𝑛)
Requires additional storage to hold the
merged result during the process.
Stable sort
7.5.3 Recursive Merge Sort
Divide the list to be sorted into two
roughly equal parts called left and right
sublists.
Recursively sort the two sublists.
Merge the sorted sublists
Recursive Merge Sort Example
26 5 77 1 61 11 59 15 48 19
26 5 77 1 61 11 59 15 48 19
26 5 77 1 61 11 59 15 48 19
26 5 77 1 61 11 59 15 48 19
5 26 77 1 61 11 59 15 19 48
5 26 77 1 61 11 15 59 19 48
1 5 26 61 77 11 15 19 48 59
1 5 11 15 19 26 48 59 61 77
Recursive Merge Sort (code)
A “link” array represents the
index order of the sorted list.
template <class T>
int rMergeSort(T* a, int* link, const int left, const int right)
{// Sort a[left:right]. link[i] is initialized to 0 for all i.
 // rMergeSort returns the index of the 1st element in the sorted list.
  if (left >= right) return left;
  int mid = (left + right) / 2;
  return ListMerge(a, link,
      rMergeSort(a, link, left, mid),       // sort left sublist.
      rMergeSort(a, link, mid + 1, right)); // sort right sublist.
}
template <class T>
int ListMerge(T* a, int* link, const int start1, const int start2)
{// Merge two sorted chains, starting from start1 and start2.
 // link[0] is a temporary head; it stores the head of the merged chain.
 // iResult records the last element of the currently merged chain.
  int iResult = 0;
  int i1, i2; // declared outside the loop: used after it
  for (i1 = start1, i2 = start2; i1 && i2; ) {
    if (a[i1] <= a[i2]) {
      link[iResult] = i1; iResult = i1; i1 = link[i1]; }
    else {
      link[iResult] = i2; iResult = i2; i2 = link[i2]; }
  }
  // attach the remaining list to the resultant list.
  if (i1 == 0) link[iResult] = i2;
  else link[iResult] = i1;
  return link[0];
}
index   0   1   2   3   4   5   6   7   8   9  10
data       26   5  77   1  61  11  59  15  48  19
link    4   9   6   0   2   3   8   5  10   7   1
7.6 Heap Sort
[Figure: two example binary trees, each with root 14]
Max Heap: Representation
Since the heap is a complete binary tree, we
can adopt the “Array Representation”
mentioned before!
Let node 𝑖 be in position 𝑖 (array[0] is empty)
◦ 𝑷𝒂𝒓𝒆𝒏𝒕(𝒊) = ⌊𝒊/𝟐⌋ if 𝑖 ≠ 1. If 𝑖 = 1, 𝑖 is the root
and has no parent.
◦ 𝒍𝒆𝒇𝒕𝑪𝒉𝒊𝒍𝒅(𝒊) = 𝟐𝒊 if 2𝑖 ≤ 𝑛. If 2𝑖 > 𝑛, then 𝑖
has no left child.
◦ 𝒓𝒊𝒈𝒉𝒕𝑪𝒉𝒊𝒍𝒅(𝒊) = 𝟐𝒊 + 𝟏 if 2𝑖 + 1 ≤ 𝑛. If 2𝑖 + 1 > 𝑛,
then 𝑖 has no right child.
Max Heap: Insert
Make sure it is a complete binary tree
Insert the new node at the last position
Check if the new node is greater than its
parent
If so, swap the two nodes and repeat the check
[Figure: inserting a new node into a max heap with root 20]
Max Heap: Delete
1. Always delete the root
2. Move the last element to the root (to maintain a
complete binary tree)
3. Swap it with its larger child (if that child is larger)
4. Repeat step 3 until the max heap property is
restored (trickle down)
[Figure: deleting the root 20 from a max heap]
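The insert and delete steps can be sketched on the array representation as follows. This is a minimal sketch for int keys; the struct MaxHeap and its members Push, Pop, and size are names assumed for this illustration, not the course's class.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Max heap stored in positions 1..n of a vector; position 0 is unused,
// so Parent(i) = i/2, leftChild(i) = 2i, rightChild(i) = 2i+1.
struct MaxHeap {
    std::vector<int> a{0}; // a[0] is a placeholder

    int size() const { return static_cast<int>(a.size()) - 1; }

    void Push(int x)
    {
        a.push_back(x);                    // keeps the tree complete
        int i = size();
        while (i > 1 && a[i / 2] < a[i]) { // bubble up past smaller parents
            std::swap(a[i / 2], a[i]);
            i /= 2;
        }
    }

    int Pop()
    {
        assert(size() > 0);
        int top = a[1];
        a[1] = a.back();                   // move the last element to the root
        a.pop_back();
        int n = size(), i = 1;
        while (2 * i <= n) {               // trickle down
            int child = 2 * i;
            if (child + 1 <= n && a[child + 1] > a[child]) child++; // larger child
            if (a[i] >= a[child]) break;
            std::swap(a[i], a[child]);
            i = child;
        }
        return top;
    }
};
```

Both loops follow one root-to-leaf path, so each operation touches at most one node per level of the complete binary tree.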
7.6 Heap Sort
Utilize the max-heap structure
Insertion and deletion can each be done
in 𝑂(log 𝑛)
Build a max-heap from the 𝑛 records by inserting
each record one by one (𝑂(𝑛 log 𝑛))
Iteratively remove the largest record (the
root) from the max-heap (𝑂(𝑛 log 𝑛))
Not a stable sort
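A minimal in-place sketch of heap sort for int keys stored in a[1..n]. Note one deliberate difference from the outline above: the heap is built bottom-up with a helper instead of inserting records one by one. The names Adjust and HeapSort are assumptions for this illustration.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Restore the max-heap property for the subtree rooted at `root`,
// considering only positions 1..n of a (a[0] is unused).
void Adjust(std::vector<int>& a, int root, int n)
{
    int e = a[root];
    int j = 2 * root;
    for (; j <= n; j *= 2) {
        if (j < n && a[j] < a[j + 1]) j++; // j is the larger child
        if (e >= a[j]) break;
        a[j / 2] = a[j];                   // move the child up
    }
    a[j / 2] = e;
}

// Sort a[1..n] into nondecreasing order.
void HeapSort(std::vector<int>& a)
{
    assert(!a.empty()); // a[0] must exist as the unused slot
    int n = static_cast<int>(a.size()) - 1;
    for (int i = n / 2; i >= 1; i--) Adjust(a, i, n); // build the heap bottom-up
    for (int i = n - 1; i >= 1; i--) {
        std::swap(a[1], a[i + 1]); // move the current maximum to its final place
        Adjust(a, 1, i);           // re-heapify the remaining i records
    }
}
```

Each removal swaps the root with the last record of the shrinking heap, so the sorted records accumulate at the right end of the same array and no extra storage is needed.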
Heap Sort Example
Input: 26 5 77 1 61 11 59 15 48 19
Each row lists positions [1] to [10]; records right of | are in their final place.
Initial max heap:  77 61 59 48 19 11 26 15 1 5
After removal 1:   61 48 59 15 19 11 26 5 1 | 77
After removal 2:   59 48 26 15 19 11 1 5 | 61 77
After removal 3:   48 19 26 15 5 11 1 | 59 61 77
After removal 4:   26 19 11 15 5 1 | 48 59 61 77
After removal 5:   19 15 11 1 5 | 26 48 59 61 77
After removal 6:   15 5 11 1 | 19 26 48 59 61 77
After removal 7:   11 5 1 | 15 19 26 48 59 61 77
After removal 8:   5 1 | 11 15 19 26 48 59 61 77
After removal 9:   1 | 5 11 15 19 26 48 59 61 77
Sorting on Several Keys
$(x_1, \ldots, x_r) \le (y_1, \ldots, y_r)$
iff either $x_k = y_k$ for $1 \le k \le n$ and
$x_{n+1} < y_{n+1}$ for some $n < r$,
or $x_k = y_k$ for $1 \le k \le r$.
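The ordering on key tuples can be sketched as a comparison function. This is a minimal sketch assuming integer keys held in vectors of equal length r; the name TupleLessEq is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true iff x <= y in the lexicographic order on key tuples.
// Both tuples are assumed to have the same length r.
bool TupleLessEq(const std::vector<int>& x, const std::vector<int>& y)
{
    assert(x.size() == y.size());
    for (std::size_t k = 0; k < x.size(); k++) {
        if (x[k] < y[k]) return true;  // first differing key decides
        if (x[k] > y[k]) return false;
    }
    return true; // all keys are equal
}
```

For example, (1, 2, 3) ≤ (1, 3, 0) because the tuples agree on the first key and 2 < 3 on the second; the third key is never examined.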
Sorting a Deck of Cards
Each card has two keys
◦ 𝐾1 (Suits): ♣ < ♦ < ♥ < ♠
◦ 𝐾2 (Face values): 2 < 3 < 4 …<J < Q < K < A
◦ The sorted list is: 2 ♣, …, A♣, …, 2 ♠, …,
A♠
Most-significant-digit (MSD) sort
◦ Sort using 𝐾1 to obtain 4 “piles” of records.
◦ Sort each pile on 𝐾2.
◦ Merge the piles by placing them on top of each
other.
Sorting a Deck of Cards (cont’d)
Least-significant-digit (LSD) sort
◦ Sort using 𝐾2 to obtain 13 “piles” of records.
◦ Place 3’s on top of 2’s, …, Aces on top of Kings.
2 < 3 < 4 < … < J < Q < K < A
◦ Use a stable sort with respect to 𝐾1 to
obtain 4 “piles”.
◦ Merge the piles by placing them on top of each
other.
Bin Sort (Bucket Sort)
Assume the records in a list to be sorted
come from a set of size 𝒎, say {1,2, … , 𝑚}.
Create 𝒎 buckets.
Scan the sequence 𝑎[1] … 𝑎[𝑛], and put
element 𝑎[𝑖] into the 𝒂[𝒊]-th bucket.
Concatenate all buckets to get the sorted
list.
Suitable for a set with small 𝒎.
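The three steps (create buckets, scan, concatenate) can be sketched as follows. This is a minimal sketch assuming int keys in {1, …, m}; the function name BinSort and the use of one vector per bucket are assumptions for this illustration.

```cpp
#include <cassert>
#include <vector>

// Bin sort for keys drawn from {1, 2, ..., m}.
std::vector<int> BinSort(const std::vector<int>& a, int m)
{
    std::vector<std::vector<int>> bucket(m + 1); // bucket[0] is unused
    for (int key : a) {
        assert(1 <= key && key <= m);
        bucket[key].push_back(key); // put a[i] into the a[i]-th bucket
    }
    std::vector<int> sorted;
    sorted.reserve(a.size());
    for (int b = 1; b <= m; b++)    // concatenate the buckets in order
        sorted.insert(sorted.end(), bucket[b].begin(), bucket[b].end());
    return sorted;
}
```

The running time is 𝑂(𝑛 + 𝑚), which is why the method only pays off when 𝑚 is small relative to 𝑛.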
Radix Sort
Decompose the key (number) into
subkeys using some radix 𝒓
◦ For 𝑟 = 10 and 𝐾 = 123: 𝐾1 = 1, 𝐾2 = 2,
and 𝐾3 = 3.
Create 𝒓 buckets (0 ~ 𝒓−𝟏).
Apply bin sort in MSD or LSD order.
Suitable for sorting numbers with a large value
range.
Radix Sort Example (Pass 1)
Input: 179 208 306 93 859 984 55 9 271 33
Bins on the ones digit:
f[1]: 271
f[3]: 93 33
f[4]: 984
f[5]: 55
f[6]: 306
f[8]: 208
f[9]: 179 859 9
Radix Sort Example (Pass 2)
Input: 271 93 33 984 55 306 208 179 859 9
Bins on the tens digit:
f[0]: 306 208 9
f[3]: 33
f[5]: 55 859
f[7]: 271 179
f[8]: 984
f[9]: 93
Radix Sort Example (Pass 3)
Input: 306 208 9 33 55 859 271 179 984 93
Bins on the hundreds digit:
f[0]: 9 33 55 93
f[1]: 179
f[2]: 208 271
f[3]: 306
f[8]: 859
f[9]: 984
Result: 9 33 55 93 179 208 271 306 859 984
LSD Radix Sort (code) 2/2
  // do radix sorting…
  for (i = d-1; i >= 0; i--) { // sort in LSD order
    fill(f, f+r, 0); // initialize the bins
    for (int current = first; current; current = link[current])
    { // put the element with key k into bin[k]
      int k = digit(a[current], i, r);
      if (f[k] == 0) f[k] = current;
      else link[e[k]] = current;
      e[k] = current;
    }
    for (j = 0; !f[j]; j++); // find the 1st non-empty bin
    first = f[j];
    int last = e[j];
    for (int k = j + 1; k < r; k++) { // link the rest of the bins
      if (f[k]) {
        link[last] = f[k];
        last = e[k]; }
    }
    link[last] = 0;
  }
  return first;
}
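Since only the second half of the linked version is shown, here is a self-contained sketch of LSD radix sort. It swaps the link chains for one vector per bin, which is simpler but uses more memory. The helper Digit and its four-parameter signature are assumptions for this illustration and need not match the digit() used in the fragment above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: the i-th digit of key in radix r, where i = 0 is the
// most significant of d digits and i = d-1 is the least significant.
int Digit(int key, int i, int d, int r)
{
    for (int p = d - 1; p > i; p--) key /= r; // drop the lower-order digits
    return key % r;
}

// LSD radix sort of nonnegative keys with at most d digits in radix r.
void RadixSort(std::vector<int>& a, int d, int r)
{
    for (int i = d - 1; i >= 0; i--) {               // least significant digit first
        std::vector<std::vector<int>> bin(r);
        for (int key : a) {
            assert(key >= 0);
            bin[Digit(key, i, d, r)].push_back(key); // stable: keeps arrival order
        }
        a.clear();
        for (int k = 0; k < r; k++)                  // concatenate bins 0 .. r-1
            a.insert(a.end(), bin[k].begin(), bin[k].end());
    }
}
```

Called with d = 3 and r = 10 on the ten keys of the example, the three passes reproduce the bins shown in Pass 1 to Pass 3. Each pass must be stable, otherwise the order established by earlier digits would be lost.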
7.9 Summary of Internal Sorting
Actual Runtime Comparison
n      Insert   Heap    Merge   Quick
0      0.000    0.000   0.000   0.000
50     0.004    0.009   0.008   0.006
100    0.011    0.019   0.017   0.013
200    0.033    0.042   0.037   0.029
300    0.067    0.066   0.059   0.045
400    0.117    0.090   0.079   0.061
500    0.179    0.116   0.100   0.079
1000   0.662    0.245   0.213   0.169
2000   2.439    0.519   0.459   0.358
3000   5.390    0.809   0.721   0.560
4000   9.530    1.105   0.972   0.761
5000   15.935   1.410   1.271   0.970
[Figure: runtime (0 to 5) versus n (0 to 5000) for Insertion Sort, Heap Sort, Merge Sort, and Quick Sort]
Design Guidelines
Insertion sort is good for small 𝑛 and
when the list is partially sorted.
Merge sort is slightly faster than heap
sort, but it requires additional storage.
Quick sort performs best on average.
Combine insertion sort with quick sort
to obtain better performance.
C++’s Sort Methods
Designed to optimize the average performance.
std::sort()
◦ Modified quick sort.
◦ Switches to heap sort when the number of subdivisions exceeds 𝑐 log 𝑛.
◦ Switches to insertion sort when the segment size becomes small.
std::stable_sort()
◦ Merge sort.
◦ Switches to insertion sort when the segment size becomes small.
std::partial_sort()
◦ Heap sort.
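A short usage sketch of the three standard algorithms. The wrapper functions SortAll, StableByOnesDigit, and SmallestK are hypothetical names introduced only to give each call a testable shape; the std:: calls themselves are the standard library's.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Sorts the whole vector; the relative order of equal elements is unspecified.
std::vector<int> SortAll(std::vector<int> v)
{
    std::sort(v.begin(), v.end());
    return v;
}

// Sorts on the ones digit only; keys with equal digits keep their input order.
std::vector<int> StableByOnesDigit(std::vector<int> v)
{
    std::stable_sort(v.begin(), v.end(),
                     [](int x, int y) { return x % 10 < y % 10; });
    return v;
}

// Returns the k smallest values of v in nondecreasing order.
std::vector<int> SmallestK(std::vector<int> v, int k)
{
    assert(0 <= k && k <= static_cast<int>(v.size()));
    std::partial_sort(v.begin(), v.begin() + k, v.end()); // only the first k are ordered
    v.resize(k);
    return v;
}
```

StableByOnesDigit applied to the radix sort input gives the same sequence as Pass 1 of that example, since one radix pass is exactly a stable sort on one digit.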
7.10 External Sorting
Runs & Merge Tree
[Figure: run 1 to run 6 at the leaves of a merge tree]
Example: Problem
Internal memory: 750 records.
List to be sorted: 4500 records.
Block size: 250 records.
[Figure: list on disk, R1 … R18, and internal memory]
Example: Merge Pass 1
To merge 𝑅𝑖 and 𝑅𝑖+1 :
◦ The blocks of 𝑅𝑖 and 𝑅𝑖+1 are read into input buffers
◦ The merged data is written to output buffer
◦ Output buffer full ⇒ write onto disk
◦ Input buffer empty ⇒ read the next block
[Figure: list on disk and buffers in internal memory]
Example: Merge Pass 2
To merge 𝑅𝑖 and 𝑅𝑗 :
◦ The blocks of 𝑅𝑖 and 𝑅𝑗 are read into input buffers
◦ The merged data is written to output buffer
◦ Output buffer full ⇒ write onto disk
◦ Input buffer empty ⇒ read the next block
[Figure: list on disk and buffers in internal memory]
7.10.5 Optimal Merging of Runs
Runs may have different sizes.
Different merge sequences may result in
different runtimes.
[Figure: two merge trees over runs of sizes 2, 4, 5, and 15. External nodes are runs labeled with their sizes; internal nodes are merges.]
Runtime Evaluation
Merge tree A:
Cost = (2 + 4) + (2 + 4 + 5) + (2 + 4 + 5 + 15)
     = 2 ∗ 3 + 4 ∗ 3 + 5 ∗ 2 + 15 ∗ 1
     = 43
Merge tree B:
Cost = 2 ∗ 2 + 4 ∗ 2 + 5 ∗ 2 + 15 ∗ 2
     = 52
[Figure: merge trees A and B over runs of sizes 2, 4, 5, and 15]
Weighted External Path Length
The total number of merge steps is equal
to:
$\sum_{i=1}^{n} s_i d_i$
where $s_i$ is the size of run $i$ and $d_i$ is
the distance from its node to the root.
How to build a merge tree such that
the total cost is minimized?
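The weighted external path length is a plain sum, sketched below. The function name and the representation of a merge tree as two parallel vectors (run sizes s and leaf depths d) are assumptions for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Weighted external path length: the sum of s[i] * d[i] over all runs,
// where s[i] is the size of run i and d[i] is the depth of its leaf.
int WeightedExternalPathLength(const std::vector<int>& s, const std::vector<int>& d)
{
    assert(s.size() == d.size());
    int total = 0;
    for (std::size_t i = 0; i < s.size(); i++) total += s[i] * d[i];
    return total;
}
```

For run sizes 2, 4, 5, 15, the depths 3, 3, 2, 1 of merge tree A give 43, and the depths 2, 2, 2, 2 of merge tree B give 52, matching the costs computed for the two trees.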
Sort by Block Size
Sort the runs by size.
2 4 5 15
[Figure: the merge tree is built step by step from the sorted runs 2, 4, 5, 15]
Similar to Message Encoding
Given a set of messages {𝑀1, 𝑀2, …, 𝑀𝑛}
How do we encode each 𝑀𝑖 using a
binary code such that the total number of
message bits is minimum?
      Encode 1   Encode 2   Encode 3
𝑀1    0          0001       0001
𝑀2    1          0010       1
𝑀3    10         0100       01
𝑀4    11         1000       001
7.10.5, Figure 7.28: Huffman Code
A binary tree, called a decode tree,
is used to encode messages.
Decoding Cost
The cost of decoding a code word is proportional to
the number of bits in the word.
◦ Decoding a text that contains 2 ∗ 𝑀1 (3 bits each) and 1 ∗ 𝑀4 (1 bit)
requires processing 2 ∗ 3 + 1 = 7 bits.
Assume message 𝑀𝑖 has encoded bit length
𝑑𝑖 and occurrence frequency 𝑠𝑖; then the total
decoding cost is:
$\sum_{i=1}^{n} s_i d_i$
How do we construct a decode tree such
that the decoding cost is minimized?
Optimal Merge Tree
Follow the Huffman code method
Sort the messages according to 𝑠𝑖
𝑀1 𝑀3 𝑀2 𝑀4
2  4  5  15
Take the two messages with the least 𝒔𝒊 and
combine them into a tree (a new message)
Repeat the process until we obtain one
tree.
[Figure: the tree grows from (𝑀1, 𝑀3) to ((𝑀1, 𝑀3), 𝑀2) to (((𝑀1, 𝑀3), 𝑀2), 𝑀4)]
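The Huffman rule applied to run sizes can be sketched with a min-priority queue. This is a minimal sketch that returns only the total merge cost, not the tree itself; the function name OptimalMergeCost is an assumption for this illustration.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Total cost of merging runs of the given sizes two at a time, always
// merging the two smallest runs first (the Huffman rule).
int OptimalMergeCost(const std::vector<int>& sizes)
{
    assert(!sizes.empty());
    std::priority_queue<int, std::vector<int>, std::greater<int>>
        pq(sizes.begin(), sizes.end()); // min-heap on run size
    int cost = 0;
    while (pq.size() > 1) {
        int a = pq.top(); pq.pop();
        int b = pq.top(); pq.pop();
        cost += a + b;  // merging two runs processes every record in both
        pq.push(a + b); // the merged run becomes a new candidate
    }
    return cost;
}
```

For sizes 2, 4, 5, 15 the merges cost 6, 11, and 26, for a total of 43, which equals the weighted external path length of merge tree A.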
Self-Study Topics
7.8 List and Table Sorts