Sorting Algorithms
Motivation
• Sorting means arranging data in a particular order (ascending,
descending, alphabetical, etc.)
• It helps organize data for easy access and analysis
• Everyday examples:
– Contacts listed alphabetically
– Products sorted by price
– Emails sorted by date
• Sorting = Structure + Efficiency
Motivation
• Once data is sorted, we can use faster search algorithms
– Linear Search → O(n)
– Binary Search → O(log n)
• Sorting is often a preprocessing step in databases, file systems,
and applications.
• Well-sorted data means well-optimized systems
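The difference between the two searches can be sketched as follows. This is a minimal illustration, not part of the original slides: the function names `linear_contains` and `sorted_contains` are made up for this example, while `std::binary_search` is the standard library routine, which assumes its input range is already sorted.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Linear search: works on any data, examines entries one by one, O(n).
bool linear_contains( std::vector<int> const &data, int value ) {
    for ( std::size_t i = 0; i < data.size(); ++i ) {
        if ( data[i] == value ) {
            return true;
        }
    }
    return false;
}

// Binary search: only valid if the data is sorted, O(log n) comparisons.
bool sorted_contains( std::vector<int> const &sorted_data, int value ) {
    return std::binary_search( sorted_data.begin(), sorted_data.end(), value );
}
```

Calling `sorted_contains` on unsorted data gives unreliable answers, which is why sorting is done first as a preprocessing step.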
Definition
Sorting is the process of:
– Taking a list of objects that can be placed in a linear order, e.g., numbers,
$(a_0, a_1, \ldots, a_{n-1})$
and returning a reordering
$(a'_0, a'_1, \ldots, a'_{n-1})$
such that
$a'_0 \le a'_1 \le \cdots \le a'_{n-1}$
The conversion of an Abstract List into an Abstract Sorted List
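The definition has two parts: the output must be a reordering of the input, and it must be in non-decreasing order. A small checker, assuming integer entries, might look like this. The name `is_sorted_version` is hypothetical; `std::is_permutation` and `std::is_sorted` are standard library algorithms.

```cpp
#include <algorithm>
#include <vector>

// Returns true if 'output' is a valid sorted version of 'input':
// the same entries (a reordering), arranged in non-decreasing order.
bool is_sorted_version( std::vector<int> const &input,
                        std::vector<int> const &output ) {
    return input.size() == output.size()
        && std::is_permutation( input.begin(), input.end(), output.begin() )
        && std::is_sorted( output.begin(), output.end() );
}
```

A check like this is handy for testing any of the sorting routines in these slides, since it does not depend on how the sorting was done.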
Definition
Seldom will we sort isolated values
– Usually we will sort a number of records containing a number of fields
based on a key:
19991532 Stevenson Monica 3 Glendridge Ave.
19990253 Redpath Ruth 53 Belton Blvd.
19985832 Kilji Islam 37 Masterson Ave.
20003541 Groskurth Ken 12 Marsdale Ave.
19981932 Carol Ann 81 Oakridge Ave.
20003287 Redpath David 5 Glendale Ave.
Numerically by ID number:
19981932 Carol Ann 81 Oakridge Ave.
19985832 Kilji Islam 37 Masterson Ave.
19990253 Redpath Ruth 53 Belton Blvd.
19991532 Stevenson Monica 3 Glendridge Ave.
20003287 Redpath David 5 Glendale Ave.
20003541 Groskurth Ken 12 Marsdale Ave.
Lexicographically by surname, then given name:
19981932 Carol Ann 81 Oakridge Ave.
20003541 Groskurth Ken 12 Marsdale Ave.
19985832 Kilji Islam 37 Masterson Ave.
20003287 Redpath David 5 Glendale Ave.
19990253 Redpath Ruth 53 Belton Blvd.
19991532 Stevenson Monica 3 Glendridge Ave.
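Sorting records by a key can be sketched with a comparison function that looks only at the key fields. The `Student` struct and its field names are assumptions made for this example, as are the function names; `std::sort` and `std::tie` are standard library facilities.

```cpp
#include <algorithm>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical record layout for the student table above.
struct Student {
    int id;
    std::string surname;
    std::string given_name;
    std::string address;
};

// Key: the ID number, compared numerically.
void sort_by_id( std::vector<Student> &records ) {
    std::sort( records.begin(), records.end(),
        []( Student const &a, Student const &b ) { return a.id < b.id; } );
}

// Key: surname first, then given name to break ties (lexicographic order).
void sort_by_name( std::vector<Student> &records ) {
    std::sort( records.begin(), records.end(),
        []( Student const &a, Student const &b ) {
            return std::tie( a.surname, a.given_name )
                 < std::tie( b.surname, b.given_name );
        } );
}
```

Only the comparison changes between the two orderings; the sorting algorithm itself is unchanged, which is why the record case can be treated as an implementation detail.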
Definition
In these topics, we will assume that:
– Arrays are to be used for both input and output,
– We will focus on sorting objects and leave the more general
case of sorting records based on one or more fields as an
implementation detail
In-place Sorting
Sorting algorithms may be performed in-place, that is, with the
allocation of at most Θ(1) additional memory (e.g., a fixed number of
local variables)
Other sorting algorithms require the allocation of a second array of
equal size
– Requires Θ(n) additional memory
We will prefer in-place sorting algorithms
Classifications
Sorting algorithms are classified by the main action they
perform:
– Insertion
– Exchanging
– Selection
– Merging
– Distribution
Run-time
The run times of sorting algorithms fall into one of three categories:
Θ(n) Θ(n ln(n)) O(n²)
We will examine average- and worst-case scenarios for each
algorithm
Run-time
We will review the more traditional O(n²) sorting algorithms:
– Insertion sort, Selection sort, Bubble sort
Some of the faster Θ(n ln(n)) sorting algorithms:
– Merge sort, Quicksort, and Heap sort
Insertion sort
Background
For example, consider this array, in which the first eight entries
are sorted
5 7 12 19 21 26 33 40 14 9 18 21 2
Suppose we want to insert 14 into this array leaving the resulting
array sorted
Background
Starting at the back, if the number is greater than 14, copy it to the
right
– Once an entry less than 14 is found, insert 14 into the resulting vacancy
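The single insertion step described above can be sketched as follows. The function name `insert_into_sorted_prefix` and its parameter `k` (the index of the entry to insert) are assumptions for this illustration; the function assumes `array[0..k-1]` is already sorted.

```cpp
// Inserts array[k] into the sorted prefix array[0..k-1],
// leaving array[0..k] sorted.
void insert_into_sorted_prefix( int *array, int k ) {
    int value = array[k];   // the entry to insert (14 in the example)
    int j = k;

    // Starting at the back, copy each larger entry one place to the right.
    while ( j > 0 && array[j - 1] > value ) {
        array[j] = array[j - 1];
        --j;
    }

    // array[j] is now the vacancy where the value belongs.
    array[j] = value;
}
```

With the array above and k = 8, the entries 40, 33, 26, 21 and 19 each move one place right and 14 lands between 12 and 19; the entries after position 8 are untouched.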
The Algorithm
For any unsorted list:
– Treat the first element as a sorted list of size 1
Then, given a sorted list of size k – 1
– Insert the kth item in the sorted list
– The sorted list is now of size k
The Algorithm
Recall the five sorting techniques:
– Insertion
– Exchange
– Selection
– Merging
– Distribution
Clearly insertion sort falls into the first category
Implementation
void insertion_sort( int* array, int n ) {
for ( int k = 1; k < n; ++k ) {
for ( int j = k; j > 0; --j ) {
if ( array[j - 1] > array[j] ) {
swap( array[j - 1], array[j] );
} else {
// no need to swap,
// the (k+1)th is in the correct location
break;
}
}
}
}
Implementation and Analysis
Let’s do a run-time analysis of this code
void insertion_sort( int* array, int n ) {
for ( int k = 1; k < n; ++k ) {
for ( int j = k; j > 0; --j ) {
if ( array[j - 1] > array[j] ) {
swap( array[j - 1], array[j] );
} else {
// As soon as we don't need to swap,
// the (k + 1)st is in the correct location
break;
}
}
}
}
Implementation and Analysis
The initialization of the outer for-loop is executed once – O(1)
Implementation and Analysis
The outer loop condition k < n will be tested n times, failing on the last test
Implementation and Analysis
Thus, the inner for-loop will be executed a total of n – 1 times
Implementation and Analysis
In the worst case, the body of the inner for-loop runs k times for each value of k
Implementation and Analysis
The body of the inner for-loop takes O(1) time whether or not it swaps
Thus, the worst-case run time is
$$\sum_{k=1}^{n-1} k = \frac{n(n-1)}{2} = O(n^2)$$
Note:
– The algorithm is easy to implement
– Best case: elements are already sorted – Ω(n)
– Worst case: O(n²)
– Average case: Θ(n²)
– If the array is almost sorted, insertion sort is a good choice:
few entries need to move far, so the run time is close to linear.
– If the given array contains n elements, insertion sort makes n – 1
comparisons in the best case.
– Insertion sort works well for small arrays.
– Insertion sort is an in-place sorting algorithm.
– Insertion sort is a stable sorting algorithm.
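The best-case and worst-case counts in these notes can be checked with an instrumented copy of the swap-based insertion sort. This is a sketch for experimentation only: the name `insertion_sort_count` and the returned counter are additions for this example, and `std::swap` from `<utility>` stands in for the slides' `swap`.

```cpp
#include <utility>

// Swap-based insertion sort that also returns the number of
// comparisons made between array entries.
long insertion_sort_count( int *array, int n ) {
    long comparisons = 0;

    for ( int k = 1; k < n; ++k ) {
        for ( int j = k; j > 0; --j ) {
            ++comparisons;

            if ( array[j - 1] > array[j] ) {
                std::swap( array[j - 1], array[j] );
            } else {
                break;  // the entry has reached its correct location
            }
        }
    }

    return comparisons;
}
```

For n = 8, an already sorted array gives n – 1 = 7 comparisons, and a reverse-sorted array gives n(n – 1)/2 = 28.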
The Algorithm
Swapping is expensive (each swap takes three assignments), so instead we can hold the new entry in a temporary variable and shift the larger entries to the right
Implementation
void insertion( int *array, int n ) {
for ( int k = 1; k < n; ++k ) {
int tmp = array[k];
for ( int j = k; j > 0; --j ) {
if ( array[j - 1] > tmp ) {
array[j] = array[j - 1];
} else {
array[j] = tmp;
break;
}
}
if (array[0]>tmp)
array[0] = tmp; // only executed if tmp < array[0]
}
}
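To see how much the shifting version saves, the two approaches can be compared by counting writes into the array. The sketch below is an illustration under stated assumptions: both function names are hypothetical, the loops are written in a compact form rather than copied from the slides, and only assignments to array entries are counted (a swap is counted as two array writes).

```cpp
#include <utility>

// Swap-based insertion sort; returns the number of writes to array entries.
long writes_with_swaps( int *array, int n ) {
    long writes = 0;
    for ( int k = 1; k < n; ++k ) {
        for ( int j = k; j > 0 && array[j - 1] > array[j]; --j ) {
            std::swap( array[j - 1], array[j] );
            writes += 2;  // each swap writes two array entries
        }
    }
    return writes;
}

// Shift-based insertion sort; returns the number of writes to array entries.
long writes_with_shifts( int *array, int n ) {
    long writes = 0;
    for ( int k = 1; k < n; ++k ) {
        int tmp = array[k];
        int j = k;
        for ( ; j > 0 && array[j - 1] > tmp; --j ) {
            array[j] = array[j - 1];  // shift one entry to the right
            ++writes;
        }
        array[j] = tmp;               // place the saved entry once
        ++writes;
    }
    return writes;
}
```

On a reverse-sorted array of 8 entries, the swap version makes 56 array writes and the shift version 35; the gap widens as n grows because each inversion costs one write instead of two.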
Selection sort
Selection Sort (min at first)
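General situation:
– x[0 … k-1] holds the smallest elements, sorted
– x[k … size-1] holds the remainder, unsorted
Steps:
• Find the smallest element, mval, in x[k…size-1]
• Swap the smallest element with x[k], then increase k.
[Figure: array x with positions 0, k, mval and size-1 marked; mval is swapped with x[k]]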
Selection Sort - Example
x: 3 12 -5 6 142 21 -17 45
x: -17 12 -5 6 142 21 3 45
x: -17 -5 12 6 142 21 3 45
x: -17 -5 3 6 142 21 12 45
x: -17 -5 3 6 142 21 12 45
x: -17 -5 3 6 12 21 142 45
x: -17 -5 3 6 12 21 142 45
x: -17 -5 3 6 12 21 45 142
x: -17 -5 3 6 12 21 45 142
Selection Sort
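/* The main sorting function */
/* Sort x[0..size-1] in non-decreasing order */
int findMinLoc (int x[], int k, int size); /* prototype; defined separately */
void selectionSort (int x[], int size)
{ int k, m, temp;
for (k=0; k<size-1; k++)
{
m = findMinLoc(x, k, size); /* location of smallest in x[k..size-1] */
temp = x[k]; /* swap x[k] and x[m] */
x[k] = x[m];
x[m] = temp;
}
}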
Selection Sort
/* Identify location of smallest element in
x[k .. size-1];*/
int findMinLoc (int x[ ], int k, int size)
{
int j, pos; /* x[pos] is the smallest
element found so far */
pos = k;
for (j=k+1; j<size; j++)
if (x[j] < x[pos])
pos = j;
return pos;
}
Note:
• Time complexity: Θ(n²) in all cases (best, average and worst)
• Comparisons: (n-1) + (n-2) + … + 3 + 2 + 1 = n(n-1)/2 = O(n²)
• Maximum number of swaps in selection sort: n – 1
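These counts can be confirmed with an instrumented selection sort. This is a sketch, not the slides' code: the struct `SelectionCounts` and the function `selection_sort_count` are hypothetical names, the minimum search is inlined, and, unlike the version above, the swap is skipped when the smallest element is already at x[k], so the swap count can be lower than n – 1.

```cpp
#include <utility>

// Hypothetical pair of counters returned by the instrumented sort.
struct SelectionCounts {
    long comparisons;
    long swaps;
};

// Selection sort (minimum first) that counts comparisons and swaps.
SelectionCounts selection_sort_count( int x[], int size ) {
    SelectionCounts counts{ 0, 0 };

    for ( int k = 0; k < size - 1; ++k ) {
        int pos = k;  // x[pos] is the smallest element found so far

        for ( int j = k + 1; j < size; ++j ) {
            ++counts.comparisons;
            if ( x[j] < x[pos] ) {
                pos = j;
            }
        }

        // Assumption: skip the swap when x[k] is already the minimum.
        if ( pos != k ) {
            std::swap( x[k], x[pos] );
            ++counts.swaps;
        }
    }

    return counts;
}
```

For the 8-element example array, the comparison count is 8·7/2 = 28 regardless of the initial order, while the number of swaps depends on the input and never exceeds 7.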
Bubble sort
Description
Suppose we have an array of data which is unsorted:
– Starting at the front, traverse the array, find the largest item, and
move (or bubble) it to the top
– With each subsequent iteration, find the next largest item and
bubble it up towards the top of the array
Implementation
Starting with the first item, assume that it is the largest
Compare it with the second item:
– If the first is larger, swap the two,
– Otherwise, assume that the second item is the largest
Continue up the array, either swapping or redefining the largest item
Implementation
After one pass, the largest item must be the last in the list
Start at the front again:
– the second pass will bring the second-largest element into the
second-last position
Repeat n – 1 times, after which, all entries will be in place
Example
Consider the unsorted array to the right
We start with the element in the first location,
and move forward:
– if the current and next items are in order,
continue with the next item, otherwise
– swap the two entries
Example
After one loop, the largest element is in the
last location
– Repeat the procedure
Example
Now the two largest elements are at the end
– Repeat again
Example
With this loop, 5 and 7 are swapped
Example
At this point, we have a sorted array
Implementation
The default algorithm:
void bubble( int *array, int n ) {
for ( int i = n - 1; i > 0; --i ) {
for ( int j = 0; j < i; ++j ) {
if ( array[j] > array[j + 1] ) {
swap( array[j], array[j + 1] );
}
}
}
}
The Basic Algorithm
Here we have two nested loops
Thus, calculating the run time is straightforward: the inner loop body runs (n – 1) + (n – 2) + … + 1 = n(n – 1)/2 times, which is Θ(n²)
Implementations and Improvements
The next few slides show some implementations of bubble sort together
with a few improvements:
– reduce the number of swaps
– halting if the list is sorted
First Improvement
We could avoid so many swaps...
void bubble( int *array, int n ) {
for ( int i = n - 1; i > 0; --i ) {
int max = array[0]; // assume a[0] is the max
for ( int j = 1; j <= i; ++j ) {
if ( array[j] < max ) {
array[j - 1] = array[j]; // move
} else {
array[j - 1] = max; // store the old max
max = array[j]; // get the new max
}
}
array[i] = max; // store the max
}
}
Second Improvement: Flagged Bubble Sort
One useful modification would be to check if no swaps occur:
– If no swaps occur, the list is sorted
– In this example, no swaps occurred
during the 5th pass
Use a Boolean flag to check if no
swaps occurred
Flagged Bubble Sort
Check if the list is sorted (no swaps)
void bubble( int *array, int n ) {
for ( int i = n - 1; i > 0; --i ) {
int max = array[0];
bool sorted = true;
for ( int j = 1; j <= i; ++j ) {
if ( array[j] < max ) {
array[j - 1] = array[j];
sorted = false;
} else {
array[j - 1] = max;
max = array[j];
}
}
array[i] = max;
if ( sorted ) {
break;
}
}
}
If the array is already sorted, the flag stops the algorithm after the first pass, so the best-case run time is Θ(n)
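The effect of the flag can be observed by counting passes. The sketch below uses the simpler swap-based form of bubble sort rather than the max-carrying version above, and the name `flagged_bubble_passes` and its return value are assumptions made for this example.

```cpp
#include <utility>

// Flagged bubble sort (swap-based variant) that returns the number
// of passes made over the array before stopping.
int flagged_bubble_passes( int *array, int n ) {
    int passes = 0;

    for ( int i = n - 1; i > 0; --i ) {
        bool sorted = true;
        ++passes;

        for ( int j = 0; j < i; ++j ) {
            if ( array[j] > array[j + 1] ) {
                std::swap( array[j], array[j + 1] );
                sorted = false;  // a swap means the list was not yet sorted
            }
        }

        if ( sorted ) {
            break;  // no swaps in this pass: everything is in place
        }
    }

    return passes;
}
```

An already sorted array of 8 entries needs a single pass of 7 comparisons, while a reverse-sorted one needs all 7 passes.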