Chapter 3 Simple Searching and Sorting
– Sequential searching
– Binary searching
Sequential Searching
• Is also called linear search or serial search
• Does not require the elements to be sorted
• Is the most natural way of searching for an element in a collection of elements
• The algorithm is easy to understand and implement
Sequential Search of a student database
– Loop through the list starting at the first element until the value of target matches
one of the list elements.
– If a match is not found, return –1.
– That is, it steps through the list from the beginning, one item at a time, looking for the
desired item
– The search stops when the item (that is, the key) is found or when the search has
examined each item without success (that is, when the end of the list is reached)
• The technique is often used because it is easy to write and is applicable to many
situations
• The following is the sequential search algorithm written in the C++
programming language
if(!found)
index=-1;
return index;
}
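Only the closing lines of the listing appear above. A complete function that ends in those same lines might look like the following sketch. The function name sequentialSearch, its parameters, and the while loop driven by the found flag are assumptions made for illustration, not the original listing.

```cpp
// Sequential (linear) search: returns the index of key in list[0..n-1],
// or -1 if key is not present. Name and signature are illustrative.
int sequentialSearch(const int list[], int n, int key) {
    int index = 0;
    bool found = false;
    while (index < n && !found) {
        if (list[index] == key)
            found = true;      // match: stop the loop
        else
            index++;           // no match: move on to the next element
    }
    if (!found)
        index = -1;
    return index;
}
```

Calling sequentialSearch(ids, count, target) on an unsorted array of student IDs returns the position of target, or -1 when every element has been examined without a match.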
• Of course, the exact number of steps depends on the input data
• For the sequential search algorithm, the number of steps depends on whether the target is in
the list and, if so, where in the list, as well as on the length of the list
• For searching algorithms, the main steps are comparisons of list values with the target
value
• Count these to compute the complexity of sequential search for data models representing
the best case, worst case and average case
The main task of the searching algorithm is comparison of the key with data elements
Time requirements for sequential search
• Best case (minimum amount of work):
– Is defined as the smallest of all the running times on inputs of a particular size
– As its name suggests, it takes the most optimistic view possible
– We assume the target (the element that we want) is the first element in the list
– Then there will be only one comparison
– Thus, for a list (an array) of n elements, the best-case complexity of
sequential search is just one comparison (or array access)
That is, Tbest(n) = O(1)
• Worst case (maximum amount of work ):
– We consider the hardest input (a target that requires the algorithm to compare the
largest number of elements in the list)
– For serial search, the worst-case running time occurs either when the target is not in the
list or when the target is the last element of the list
– For a list of n elements, the worst case of sequential search requires
n comparisons
– Tworst(n) = O(n)
• Put simply, there is one comparison in the loop body, and this is multiplied
by the number of iterations
• Average case (expected amount of work):
– One way of developing an expression for the average running time of serial
search is based on all the targets that are actually in the list
– Suppose (to be concrete) the list has ten elements
– So, there are ten possible targets
– A search for the target that occurs at the first location requires just one
comparison (or array access)
– A search for the target that occurs at the second location requires two
comparisons (or array accesses)
– And so on, through the final target, which requires ten comparisons (or array
accesses)
– In all, there are ten possible targets, which require 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
comparisons (or array accesses)
– The average of all these searches is:
(1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10) / 10 = 55 / 10 = 5.5
– Generalization: for a list of n elements, with each element equally likely to be the target,
(1 + 2 + ... + n) / n = (n + 1) / 2, so Tavg(n) = O(n)
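As a quick check of the averaging argument, the sum can be computed directly. This is a minimal sketch under the same assumption as the text (every element of the list is searched for exactly once); the function name averageComparisons is hypothetical.

```cpp
// Average number of comparisons made by sequential search over a list of
// n elements when each element is the target exactly once.
// The target at 0-based position p costs p + 1 comparisons.
double averageComparisons(int n) {
    long long total = 0;
    for (int position = 0; position < n; position++)
        total += position + 1;
    return static_cast<double>(total) / n;   // assumes n > 0
}
```

For n = 10 the function returns 5.5, which agrees with (n + 1) / 2.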
Summary
– Both the worst-case and average-case running times for sequential search are
O(n); nevertheless, the average case takes about half the time of
the worst case
• The principal strength of linear search is that it doesn't require the elements of the list
to be in any particular order, only that we can step through the list.
• It is easy to implement
• It is easy to analyze
• It is fine to use if you are searching a small list (or array) a few times
However, if a search algorithm will be used over and over, it is worthwhile to find a faster
algorithm
– One dramatically faster search algorithm is binary search
– Exercise: Write a small C++ program that demonstrates sequential search. The program should
read a list from the user, prompt for a search target, and then use sequential search to
look for the target in the list. The program should include extra output statements to
display the steps in the search. The list (which is an array) is partially filled. Since the
integers in the list are meant to represent student IDs, it is reasonable to assume that they
are all positive. Thus, you can use a negative sentinel entry (for example, –1) to mark the
end of the list
Binary search
• When the values of a list are in sorted order, there are better searches than sequential
search for determining whether a particular value is in the list
• For example, when you look up a name on the phone book, you do not start in the
beginning and scan through until you find the name
• You use the fact that names are listed in sorted order and use some intelligence to jump
quickly to the right page and then start scanning
• Binary search is one such search: it is better than sequential search for determining
whether a particular value is in a sorted list
• Works only if the data is sorted
• It uses divide and conquer strategy (approach)
• How it works
• A flag called found is set to false, meaning NOT found.
• middle is calculated
• The value at the middle index position in the array is checked to see if it matches
the value the user was searching for
• If a match is found, the found flag is set to true so that the loop condition fails,
preventing unnecessary iterations.
• If the value is not found...
• Since the array is sorted, we can check to see which half of the array the value is
in
• If the value is in the lower half, we change last so the new array goes from 0 to
middle - 1.
• This has the effect of discarding the larger half of the array
• We then test to see if the value stored in the (new) middle position is the value we
were searching for
• If it is not, the process of dividing the array continues until either the value is
found or the loop terminates
• A similar procedure applies if the value is in the larger half of the array, except that
the variable first is changed to the position just after middle (middle + 1) and
the search range ends at last
• If the value does not exist, the loop terminates once the array has been
subdivided so many times that the last position is a smaller integer than the first
position, which cannot describe a valid range of the array
• That is, the search ends when the target item is found or the values of first
and last cross over, so that last < first, indicating that no list items are left to
check
Pseudocode for Binary Search
if(!found)
index=-1;
else
index=mid;
return index;
}
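Only the closing lines of the listing appear above. The steps described earlier (a found flag, a middle index, and first/last bounds that move toward each other) can be sketched as the complete function below. The function name binarySearch and its signature are assumptions, and the array is assumed to be sorted in ascending order.

```cpp
// Binary search over a sorted array list[0..n-1] (ascending order).
// Returns the index of key, or -1 if key is not present.
// Name and signature are illustrative.
int binarySearch(const int list[], int n, int key) {
    int first = 0;
    int last = n - 1;
    int mid = 0;
    int index;
    bool found = false;
    while (first <= last && !found) {
        mid = first + (last - first) / 2;   // same as (first + last) / 2, without overflow
        if (list[mid] == key)
            found = true;
        else if (key < list[mid])
            last = mid - 1;                 // continue in the lower half
        else
            first = mid + 1;                // continue in the upper half
    }
    if (!found)
        index = -1;
    else
        index = mid;
    return index;
}
```

The list itself is never modified; only first and last change, which is how the search range shrinks by half on each iteration.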
• Remark
• It might appear that this algorithm changes the list because each time through the
while loop half of the list is “discarded”, so that at the end the list may contain
only one element
• There are two reasons to avoid actually changing the list
• The list may be needed again, to search for other targets, for example
• It creates extra work to delete half of the items in a list, or make a copy of
the list
• Rather than changing the list, the algorithm changes the part of the list to search
• Binary search uses the result of each comparison to eliminate half of the list
from further searching
• Analysis of Binary Search
• What is the complexity (or maximum number of comparisons made) of binary search?
• Is binary search more efficient than sequential search?
• If so, how much more efficient is it?
• To evaluate binary search, count the number of comparisons in the best case and worst
case (why?)
• Complexity analysis for the average case is a bit more difficult and hence will be omitted
• Best case
• Occurs if the middle item happens to be the target
• Then, only one comparison is needed to find the target
• Thus, Tbest(n) = O(1)
• Remark
• The best case analysis does not reveal much
• Worst case
• When does the worst case occur?
• If the target is not in the list, then the process of dividing the list in half
continues until there is only one item left to check
• Here is the pattern of the number of comparisons after each division, given the
simplifying assumptions of an initial list length that is an exact power of 2 (1024)
and exact division in half on each iteration
• Each comparison reduces the number of items left to check by half
• For a list of size 1024, there are 10 comparisons to reach a list of size one
(1024, 512, 256, 128, 64, 32, 16, 8, 4, 2, 1), given that there is one comparison for each
division, and each division splits the list size in half
• Generalization
• Tworst(n) = O(log n) (that is, binary search efficiency in the worst case is a
logarithmic function of list size)
• Here is a table comparing the maximum number of comparisons for sequential and
binary searches
List size    Sequential search    Binary search
100,000      100,000              16
200,000      200,000              17
400,000      400,000              18
800,000      800,000              19
1,600,000    1,600,000            20
• Note
– The worst case number of comparisons is just 16 for a list with 100,000 items,
versus 100,000 for sequential search
– Furthermore, if the list were doubled in size to 200,000, the maximum number of
comparisons for binary search would only increase by 1 to 17
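The binary search figures quoted above (16 for 100,000 items, 17 for 200,000) can be reproduced by counting how many times a list can be halved before a single item remains. This is a sketch under the text's simplifying assumption of one comparison per division; the function name halvingsToOne is hypothetical.

```cpp
// Number of times a list of n items can be halved (integer division)
// before only one item is left, that is, floor(log2(n)) for n >= 1.
int halvingsToOne(long n) {
    int count = 0;
    while (n > 1) {
        n /= 2;      // one comparison discards half of the remaining items
        count++;
    }
    return count;
}
```

Doubling the list size adds exactly one to the result, which is why the count grows so slowly compared with the n comparisons of sequential search.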
• In general, log n << n for large n, so O(log n) grows far more slowly than O(n)
• Plotting O(n) against O(log n) on a graph shows which one performs better
– For a small number of data items, sequential search can be more efficient than binary
search
– For a large number of sorted data items, binary search is more efficient
Sorting Algorithms
External sorting:
Cannot be performed in main memory because of the large input size; that is, the input is much
too large to fit into main memory
Sorting is done on disk or tape.
It is more device dependent than internal sorting
Some of the external sorting algorithms are:
– Simple algorithm: uses the merge routine from merge sort
– Multiway merge
– Polyphase merge
– Replacement selection
The rest of this chapter will discuss and analyze various internal sorting algorithms.
− There are several algorithms that are easy and are in the order of O(n^2).
− There is also an algorithm (ShellSort) that is very simple to implement, runs in O(n^2)
time in the worst case, and is efficient in practice.
− There are slightly more complicated O(n log n) sorting algorithms.
− Any general-purpose comparison-based sorting algorithm requires Ω(n log n) comparisons.
Assumptions:
− Our sorting is comparison based.
− Each algorithm will be passed an array containing N elements.
− N is the number of elements passed to our sorting algorithm
− The operators that will be used are “<”, “>”, and “==”
– Insertion Sort
– Selection Sort
– Bubble Sort
Bubble Sort
Bubble sort is one of the simplest sorting algorithms to implement and one of the slowest on
very large inputs.
The basic idea is: loop through the list, compare each adjacent pair of elements, and whenever
two adjacent elements are out of order with respect to each other, interchange them.
Target: pull the least element up to its final position, like a bubble.
− The process of sequentially traversing all or part of the list is known as a pass.
− Implementation: This algorithm needs two nested loops. The outer loop controls the number
of passes through the list and the inner loop controls the number of adjacent comparisons.
Elements 7 20 15 3 72 13 11 32 9
Index 0 1 2 3 4 5 6 7 8
Index  Elements  Pass 1  Pass 2  Pass 3  Pass 4  Pass 5  Pass 6  Pass 7
7      7         72      72      72      72      72      72      72
6      20        7       20      20      20      20      20      20
5      15        20      7       15      15      15      15      15
4      3         15      15      7       13      13      13      13
3      72        3       13      13      7       11      11      11
2      13        13      3       11      11      7       9       9
1      11        11      11      3       9       9       7       7
0      9         9       9       9       3       3       3       3
Implementation:
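void bubble_sort(int list[], int n)
{
    int i, j, temp;
    for (i = 0; i < n; i++) {              // one pass per iteration
        for (j = n - 1; j > i; j--) {      // compare adjacent pairs, from the end down to i
            if (list[j] < list[j-1]) {     // out of order: interchange
                temp = list[j];
                list[j] = list[j-1];
                list[j-1] = temp;
            }
        } // end of inner loop
    } // end of outer loop
} // end of bubble_sort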
How many comparisons?
(n-1)+(n-2)+…+1 = O(n^2)
How many swaps?
(n-1)+(n-2)+…+1 = O(n^2)
Space?
In-place algorithm.
If no elements are interchanged during the ith pass, the list is already sorted, so there is
no need to continue with the remaining passes.
Exploiting this observation, one can develop a modified bubble sort algorithm that has better
performance.
Modified bubble sort:
void mBubbleSort(int a[],int n)
{
int i=1;
int swapped =1;
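The listing above breaks off after its first two declarations. One way to complete the idea, stopping as soon as a full pass makes no interchange, is sketched below. The loop structure and the name modifiedBubbleSort are assumptions; only the variables i and swapped come from the fragment.

```cpp
// Bubble sort with early exit: stops as soon as a full pass makes no swap.
// Sorts a[0..n-1] into ascending order.
void modifiedBubbleSort(int a[], int n) {
    int i = 1;
    int swapped = 1;                      // 1 while the previous pass swapped something
    while (swapped && i < n) {
        swapped = 0;
        for (int j = n - 1; j >= i; j--) {
            if (a[j] < a[j - 1]) {        // adjacent pair out of order
                int temp = a[j];
                a[j] = a[j - 1];
                a[j - 1] = temp;
                swapped = 1;
            }
        }
        i++;                              // a[0..i-2] is now in its final position
    }
}
```

On an already sorted array the first pass makes no swap, so the function returns after n - 1 comparisons instead of running every pass.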
Selection Sort
• The selection sort algorithm has a similar motivation to bubble sort. However, selection
sort attempts to avoid the multitude of interchanges of adjacent entries to
which bubble sort is prone.
• Selection sort iterates over the list, placing one element in its correct place each time.
• To do this, on the ith pass through the array it determines the position of the
smallest/largest entry among the unsorted elements; this element is then swapped into place.
• This algorithm uses its inner loop to find the largest/smallest entry.
• One version of selection sort finds the smallest element in the list and swaps it with
whatever is first, and then repeats on the remainder of the list.
• Note that this also uses linear search to find the smallest element each time.
• The xth pass of the selection sort selects the xth smallest key in the array,
placing that record into position x.
Basic Idea:
Implementation of selection sort in C++
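void selection_sort(int list[], int n)
{
    int i, j, smallest, temp;
    for (i = 0; i < n; i++) {
        smallest = i;                      // position of the smallest entry seen so far
        for (j = i + 1; j < n; j++) {
            if (list[j] < list[smallest])
                smallest = j;
        } // end of inner loop
        temp = list[smallest];             // one swap per pass
        list[smallest] = list[i];
        list[i] = temp;
    } // end of outer loop
} // end of selection_sort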
ANALYSIS of Selection Sort
• Selection sort is like bubble sort (see above), except that rather than repeatedly swapping
adjacent elements we remember the position of the element to be selected and do the
swap at the end of the inner loop.
How many comparisons?
(n-1)+(n-2)+…+1 = O(n^2)
How many swaps?
n = O(n)
Space?
In-place algorithm
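The two counts in this analysis can be observed directly by instrumenting the sort. The following is a sketch only: the SortCounts struct and the function name countedSelectionSort are hypothetical, and the loop structure follows the selection sort implementation given earlier.

```cpp
// Selection sort instrumented to count comparisons and swaps.
struct SortCounts {
    long comparisons;
    long swaps;
};

SortCounts countedSelectionSort(int list[], int n) {
    SortCounts counts = {0, 0};
    for (int i = 0; i < n; i++) {
        int smallest = i;
        for (int j = i + 1; j < n; j++) {
            counts.comparisons++;          // one comparison per inner iteration
            if (list[j] < list[smallest])
                smallest = j;
        }
        int temp = list[smallest];         // exactly one swap per pass
        list[smallest] = list[i];
        list[i] = temp;
        counts.swaps++;
    }
    return counts;
}
```

For a list of 8 elements the function reports 7 + 6 + … + 1 = 28 comparisons and 8 swaps, whatever the initial order of the input.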
Insertion Sort
The insertion sort works just like its name suggests - it inserts each item into its proper place in
the final list. The simplest implementation of this requires two list structures - the source list and
the list into which sorted items are inserted. To save memory, most implementations use an in-
place sort that works by moving the current item past the already sorted items and repeatedly
swapping it with the preceding item until it is in place.
It's the most instinctive type of sorting algorithm. The approach is the same approach that you
use for sorting a set of cards in your hand. While playing cards, you pick up a card, start at the
beginning of your hand and find the place to insert the new card, insert it and move all the others
up one place.
Basic Idea:
Find the location for an element and move all others up, and insert the element.
1. The left most value can be said to be sorted relative to itself. Thus, we don’t need to do
anything.
2. Check to see if the second value is smaller than the first one. If it is, swap these two
values. The first two values are now relatively sorted.
3. Next, we need to insert the third value in to the relatively sorted portion so that after
insertion, the portion will still be relatively sorted.
4. Remove the third value first. Slide the second value to make room for insertion. Insert the
value in the appropriate position.
5. Now the first three are relatively sorted.
6. Do the same for the remaining items in the list.
Implementation
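void insertion_sort(int list[], int n)
{
    int temp;
    for (int i = 1; i < n; i++) {
        temp = list[i];                    // the item to insert
        // swap the item leftwards past every larger item in the sorted part
        for (int j = i; j > 0 && temp < list[j-1]; j--) {
            list[j] = list[j-1];
            list[j-1] = temp;
        }
    } // end of outer loop
} // end of insertion_sort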
Analysis
How many comparisons?
1+2+3+…+(n-1) = O(n^2)
How many swaps?
1+2+3+…+(n-1) = O(n^2)
Space?
In-place algorithm