DSA - Searching PDF
Strictly for internal circulation (within KIIT) and reference only. Not for outside circulation without permission
Computer systems are often used to store large amounts of data from which
individual records must be retrieved according to some search criterion. Thus
the efficient storage of data to facilitate fast searching is an important issue.
Linear Search
Binary Search
Hashing
Linear search is a very simple search algorithm. It makes a sequential search over all
items, one by one. Every item is checked; if a match is found, that item is returned,
otherwise the search continues until the end of the data collection.
The run-time complexity is O(n).
How linear search works?
It sequentially checks each element of the list for the target value until a match is found or
until all the elements have been searched.
Algorithm
LinearSearch (Array A, Value x)
Step 1: Start
Step 2: Set i to 1
Step 3: Set n to length of A
Step 4: if i > n then go to step 9
Step 5: if A[i] = x then go to step 8
Step 6: Set i to i + 1
Step 7: Go to Step 4
Step 8: Print Element x found at position i and go to step 10
Step 9: Print element not found
Step 10: Stop
School of Computer Engineering
Linear Search C code
#include <stdio.h>
int main()
{
    int array[100], search, c, n;
    printf("Enter the number of elements in array\n");
    scanf("%d", &n);
    printf("Enter %d integer(s)\n", n);
    for (c = 0; c < n; c++)
        scanf("%d", &array[c]);
    printf("Enter the number to search\n");
    scanf("%d", &search);
    /* Compare each element with the search value, left to right */
    for (c = 0; c < n; c++)
    {
        if (array[c] == search)
        {
            printf("%d is present at location %d.\n", search, c+1);
            break;
        }
    }
    /* The loop ran to the end without a match */
    if (c == n)
        printf("%d is not present in array.\n", search);
    return 0;
}
#include <stdio.h>
/* Recursive function to search x in arr[l..r] */
int recSearch(int arr[], int l, int r, int x)
{
    if (r < l)
        return -1;
    if (arr[l] == x)
        return l;
    return recSearch(arr, l+1, r, x);
}
int main()
{
    int arr[] = {12, 34, 54, 2, 3};
    int n = sizeof(arr)/sizeof(arr[0]);
    int x = 3; // x is the element to be searched for
    int index = recSearch(arr, 0, n-1, x);
    if (index != -1)
        printf("Element %d is present at index %d", x, index);
    else
        printf("Element %d is not present", x);
    return 0;
}
Traversal (number of comparisons)
Case               Best Case   Worst Case   Average Case
Item is present    1           n            n/2
Item not present   n           n            n
Class Work
Your CR (Class Representative) went for a walk in a garden. There are many trees in the
garden, and each tree has a letter of the English alphabet on it. While walking, the CR
noticed that all trees with a vowel on them are not in a good state and decided to take
care of them. So the CR asked you to count such trees in the garden.
Note : The following letters are vowels: 'A', 'E', 'I', 'O', 'U', 'a', 'e', 'i', 'o' and 'u'.
Input : "nBBZLaosnm"    Output : 2
Input : "JHkIsnZtTL"    Output : 1
Explanation: the number of vowels in the 1st input is 2 and in the 2nd input is 1.
In the divide and conquer approach, the problem at hand is divided into smaller sub-problems,
and each sub-problem is then solved independently. If we keep dividing the sub-problems
into even smaller sub-problems, we eventually reach a stage where no further division is
possible. Those "atomic", smallest possible sub-problems are solved. The solutions of all
sub-problems are finally merged to obtain the solution of the original problem.
Divide/Break: This step breaks the problem into smaller sub-problems. Each sub-problem
should represent a part of the original problem. This step generally takes a recursive
approach, dividing the problem until no sub-problem can be divided further. At this
stage, the sub-problems are atomic in nature but still represent some part of the
actual problem.
Conquer/Solve: This step receives a lot of smaller sub-problems to be solved. Generally,
at this level, the problems are considered 'solved' on their own.
Merge/Combine: When the smaller sub-problems are solved, this stage recursively
combines them until they form the solution of the original problem.
This algorithmic approach works recursively, and the conquer and merge steps work so
closely together that they appear as one.
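The three steps above can be illustrated with a small sketch that finds the largest element of an array. The function name `dc_max` and the choice of problem are assumptions made for illustration; they do not come from the slides.

```c
#include <assert.h>

/* Returns the largest value in a[lo..hi] (inclusive). Requires lo <= hi. */
int dc_max(const int a[], int lo, int hi)
{
    int mid, left, right;
    assert(lo <= hi);
    /* Conquer: a one-element range is "atomic" and is its own answer. */
    if (lo == hi)
        return a[lo];
    /* Divide: split the range into two halves and solve each recursively. */
    mid = lo + (hi - lo) / 2;
    left = dc_max(a, lo, mid);
    right = dc_max(a, mid + 1, hi);
    /* Combine: the answer is the larger of the two partial answers. */
    return left > right ? left : right;
}
```

For the array {12, 34, 54, 2, 3}, calling `dc_max(a, 0, 4)` returns 54. Note how the conquer and combine steps are only a line or two each, which is why they tend to appear as one.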
Binary Search
Binary search is a fast search algorithm with run-time complexity of O(log n). It works on the
principle of divide and conquer. For the algorithm to work, the data collection must be sorted
(ascending order is assumed here). It looks for an item by comparing it with the middle item of
the collection. If they match, the index of the middle item is returned. If the middle item is greater
than the item, the search continues in the sub-array to the left of the middle item; otherwise it
continues in the sub-array to the right of the middle item. This process repeats on the sub-array
until the item is found or the size of the sub-array reduces to zero.
How binary search works?
Before the search starts, bottom is initialized to 0 and top is initialized to n - 1, i.e. 9 for a sorted
array of 10 elements. The value being searched for is 31.
First, we determine the middle of the array using the formula mid = (top + bottom) / 2. Here it
is (9 + 0) / 2 = 4 (the integer part of 4.5). So 4 is the mid of the array.
Now we compare the value stored at location 4 with the value being searched for, i.e. 31. The value
at location 4 is 27, which is not a match. Because 31 is greater than 27 and the array is sorted,
the target value must be in the upper portion of the array. So we set bottom = mid + 1,
i.e. 4 + 1 = 5.
At this point, bottom is 5 and top is 9. Second, we find the new mid value: mid =
(bottom + top) / 2 = (5 + 9) / 2 = 14 / 2 = 7. So 7 is the new mid.
Now we compare the value stored at location 7 with 31. The value at location 7 is 35, which is
not a match. Because 31 is less than 35 and the array is sorted, the target value must be in the
lower portion of the array. So we set top = mid - 1, i.e. 7 - 1 = 6.
At this point, bottom is 5 and top is 6. Third, we find the new mid value: mid =
(bottom + top) / 2 = (5 + 6) / 2 = 11 / 2 = 5 (the integer part of 5.5). The value stored at location 5
is a match, so we conclude that the target value 31 is stored at location 5.
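The walkthrough above can be checked with a short function that also counts how many middle elements it examines. The walkthrough only states three array values (27, 31 and 35 at locations 4, 5 and 7); the other values used in the note below, and the names `binary_search` and `probes`, are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Iterative binary search over an array sorted in ascending order.
 * Returns the index of item, or -1 if it is absent. If probes is not
 * NULL, *probes receives the number of middle elements examined. */
int binary_search(const int a[], int n, int item, int *probes)
{
    int bottom = 0, top = n - 1, count = 0, found = -1;
    assert(n >= 0);
    while (bottom <= top) {
        /* Same value as (bottom + top) / 2, but cannot overflow. */
        int mid = bottom + (top - bottom) / 2;
        count++;
        if (a[mid] == item) {
            found = mid;
            break;
        }
        if (item > a[mid])
            bottom = mid + 1;   /* continue in the upper portion */
        else
            top = mid - 1;      /* continue in the lower portion */
    }
    if (probes != NULL)
        *probes = count;
    return found;
}
```

With the assumed array {10, 14, 19, 26, 27, 31, 33, 35, 42, 44}, searching for 31 visits locations 4, 7 and 5 in that order and returns 5 after three probes, matching the steps described above.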
#include <stdio.h>
int main()
{
    int n, a[30], item, i, mid, top, bottom;
    printf("Enter # of elements :\n");
    scanf("%d", &n);
    printf("Enter elements in ascending order\n");
    for (i = 0; i < n; i++)
    {
        scanf("%d", &a[i]);
    }
    printf("\nEnter the item to search\n");
    scanf("%d", &item);
    bottom = 0;
    top = n - 1;
    /* Halve the search range until the item is found or the range is empty */
    do
    {
        mid = (bottom + top) / 2;
        if (item < a[mid])
            top = mid - 1;
        else if (item > a[mid])
            bottom = mid + 1;
    } while (item != a[mid] && bottom <= top);
    if (item == a[mid])
        printf("Binary search successful!!\n");
    else
        printf("\n Search failed");
    return 0;
}
Traversal (number of comparisons)
Case               Best Case   Worst Case   Average Case
Item is present    1           log2(n)      log2(n)
Item not present   log2(n)     log2(n)      log2(n)
Class Work
It's been a few days since John started acting weird, and you (his best friend) finally came to
know that it's because his proposal has been rejected.
He is trying hard to solve a problem, but because of the rejection he can't really
focus. Can you help him? The question is: given a number n, find whether n can be represented
as the sum of 2 desperate numbers (not necessarily different), where desperate numbers
are those which can be written in the form (a*(a+1))/2 with a > 0.
Input : The first input line contains an integer n (1 ≤ n ≤ 10^9).
Output : Print "YES" if n can be represented as a sum of two desperate numbers,
otherwise print "NO".
Hashing
Application
Compilers use hash tables to implement the symbol table (a data structure to
keep track of declared variables)
Game programs use hash tables to keep track of positions they have encountered
(transposition table)
Online spelling checker
Substring pattern matching
Document comparison
Searching
Basic Operation
Search − Searches an element in a hash table.
Insert − inserts an element in a hash table.
Delete − Deletes an element from a hash table.
Hash Function
The mapping between a key and the slot where that key belongs in the hash table is called the
hash function. The hash function takes any key in the collection and returns an integer in the
range of slot names, between 0 and m-1. Assume that we have the set of integer keys 54, 26, 93, 17,
77, and 31. The hash function sometimes referred to as the "remainder method" simply takes a
key and divides it by the table size, returning the remainder as its hash value: h(key) = key % 11.
The table below gives the hash values for the example keys.
Key   Hash Index
54    54 % 11 = 10
26    26 % 11 = 4
93    93 % 11 = 5
17    17 % 11 = 6
77    77 % 11 = 0
31    31 % 11 = 9
Once the hash values have been computed, we can insert each key into the hash table at the
designated slot. Note that 6 of the 11 slots are now occupied. This ratio is referred to as the
load factor, and is commonly denoted by λ = number of items / table size. For this example, λ = 6/11.
Now when we want to search for a key, we simply use the hash function to compute the slot name for the key and then check the
hash table to see if it is present. This searching operation is O(1), since a constant amount of time is required to compute the hash
value and then index the hash table at that location. If everything is where it should be, we have found a constant time search
algorithm.
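The remainder method and the constant-time lookup described above can be sketched as follows. The names `hash`, `table_init`, `table_insert` and `table_contains`, and the use of -1 as an "empty" marker (which assumes non-negative keys), are illustrative assumptions. Collisions are only reported here, not resolved.

```c
#include <assert.h>
#include <stdbool.h>

#define TABLE_SIZE 11
#define EMPTY (-1)   /* assumed sentinel: keys are non-negative */

/* Remainder method: h(key) = key % TABLE_SIZE. */
int hash(int key)
{
    assert(key >= 0);
    return key % TABLE_SIZE;
}

/* Marks every slot as empty. */
void table_init(int table[TABLE_SIZE])
{
    for (int i = 0; i < TABLE_SIZE; i++)
        table[i] = EMPTY;
}

/* Stores key in its designated slot. Returns false when the slot
 * already holds a different key (a collision), leaving it unchanged. */
bool table_insert(int table[TABLE_SIZE], int key)
{
    int slot = hash(key);
    if (table[slot] != EMPTY && table[slot] != key)
        return false;
    table[slot] = key;
    return true;
}

/* O(1) search: compute the slot, then look only at that slot. */
bool table_contains(const int table[TABLE_SIZE], int key)
{
    return table[hash(key)] == key;
}
```

Inserting 54, 26, 93, 17, 77 and 31 fills slots 10, 4, 5, 6, 0 and 9 respectively, and each later lookup touches exactly one slot.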
However, the hash function does not always yield distinct values. With h(key) = key % 11,
the keys 54 and 65 both hash to 10, and the keys 26 and 37 both hash to 4. This situation is
called a collision (also called a clash), and some method must be used to resolve it. So how
do we handle a collision?
Search for an empty location in the hash table
Use a second/third/fourth/fifth hash function
Use the array location as the header of a linked list of values that hash to this location
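The first option in the list above, searching for an empty location, can be sketched as linear probing: on a collision, try the following slots one by one, wrapping around at the end of the table. The names `probe_insert` and `probe_search` and the -1 sentinel are assumptions for illustration, and this is only one of several ways to pick the next location.

```c
#include <assert.h>

#define M 11          /* table size, as in the remainder-method example */
#define EMPTY (-1)    /* assumed sentinel: keys are non-negative */

/* Inserts key using linear probing: try h(key), h(key)+1, ... (mod M).
 * Returns the number of slots examined, or 0 if the table is full. */
int probe_insert(int table[M], int key)
{
    assert(key >= 0);
    for (int i = 0; i < M; i++) {
        int slot = (key % M + i) % M;
        if (table[slot] == EMPTY) {
            table[slot] = key;
            return i + 1;
        }
    }
    return 0;
}

/* Follows the same probe sequence as probe_insert.
 * Returns the slot holding key, or -1 if key is absent. */
int probe_search(const int table[M], int key)
{
    assert(key >= 0);
    for (int i = 0; i < M; i++) {
        int slot = (key % M + i) % M;
        if (table[slot] == EMPTY)
            return -1;          /* an empty slot ends the probe sequence */
        if (table[slot] == key)
            return slot;
    }
    return -1;
}
```

Starting from an empty table and inserting 54, 26, 93, 17, 77, 31, then 65 and 37, the first six keys each take one probe. Key 65 collides at slots 10 and 0 and lands in slot 1 after three probes; key 37 collides at slots 4, 5 and 6 and lands in slot 7 after four probes.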
Given a collection of keys, a hash function that maps each item into a unique slot is
referred to as a perfect hash function. If we know the keys and the collection will never
change, then it is possible to construct a perfect hash function. Unfortunately, given an
arbitrary collection of keys, there is no systematic way to construct a perfect hash
function.
One way to always have a perfect hash function is to increase the size of the hash table so
that each possible value in the item range can be accommodated. This guarantees that
each item will have a unique slot. Although this is practical for small numbers of items, it
is not feasible when the number of possible items is large. For example, if the keys were
nine-digit Social Security Numbers (SSN), this method would require almost one billion
slots. If we only want to store data for a group of 25 citizens, we will be wasting an
enormous amount of memory.
So the goal is to create a hash function that minimizes the number of collisions, is easy to
compute, and evenly distributes the items in the hash table. There are a number of
common ways to extend the simple remainder method.
Folding Method
The folding method for constructing hash functions begins by dividing the key into
equal-size pieces (the last piece may not be of equal size). These pieces are then added
together to give the resulting hash value.
Example -
If our key was the phone number 436-555-4601, we would take the digits and divide
them into groups of 2 (43, 65, 55, 46, 01). After the addition, 43+65+55+46+01, we get
210. If we assume our hash table has 11 slots, then we need to perform the extra step of
dividing by 11 and keeping the remainder. In this case 210 % 11 is 1, so the phone
number 436-555-4601 hashes to slot 1. Sometimes, for extra milling, the even-numbered
parts are each reversed before the addition. The groups of 2 (43, 65, 55, 46, 01) then
become (43, 56, 55, 64, 01). After the addition, 43+56+55+64+01, we get 219, and
219 % 11 is 10.
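The folding example above can be reproduced with a short sketch that works on the key as a string of digits, so that separators such as '-' can be skipped. The function name `fold_hash`, its parameters, and the fixed group size of two digits are assumptions for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Folding method: collect the digits of key, add them up in groups of
 * two, and reduce the sum modulo table_size. If reverse_even is nonzero,
 * every even-numbered group (2nd, 4th, ...) has its digits swapped first. */
int fold_hash(const char *key, int table_size, int reverse_even)
{
    char digits[32];
    size_t n = 0;
    int sum = 0, group = 0;
    assert(table_size > 0);
    /* Keep only the digit characters, e.g. drop the '-' separators. */
    for (const char *p = key; *p != '\0' && n < sizeof digits; p++)
        if (isdigit((unsigned char)*p))
            digits[n++] = *p;
    for (size_t i = 0; i < n; i += 2) {
        int hi = digits[i] - '0';
        group++;
        if (i + 1 < n) {
            int lo = digits[i + 1] - '0';
            if (reverse_even && group % 2 == 0) {
                int t = hi;
                hi = lo;
                lo = t;
            }
            sum += 10 * hi + lo;
        } else {
            sum += hi;          /* the last piece may be shorter */
        }
    }
    return sum % table_size;
}
```

For the key "436-555-4601" and a table of 11 slots, plain folding gives slot 1, and folding with the even-numbered groups reversed gives slot 10.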
Mid-Square Method
We first square the key, and then extract some portion of the resulting digits. For
example, if the key was 44, we would first compute 44^2 = 1936. By extracting the middle
two digits, 93, and performing the remainder step, we get 5 (93 % 11). Below table
shows items under the mid-square method.
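A minimal sketch of the mid-square computation follows. The name `midsquare_hash` is hypothetical, and the rule for picking the "middle two digits" when the square has an odd number of digits (lean left) is an assumption, since the text only shows a four-digit square.

```c
#include <assert.h>
#include <stdio.h>

/* Mid-square method: square the key, take the middle two decimal digits
 * of the square, and reduce them modulo table_size. */
int midsquare_hash(unsigned int key, int table_size)
{
    char buf[32];
    unsigned long long sq = (unsigned long long)key * key;
    int len, start, mid;
    assert(table_size > 0);
    len = snprintf(buf, sizeof buf, "%llu", sq);
    /* A one-digit square has no two-digit middle; use it directly. */
    if (len < 2)
        return (int)(sq % (unsigned long long)table_size);
    /* Assumption: for an odd digit count the pair leans to the left. */
    start = (len - 2) / 2;
    mid = (buf[start] - '0') * 10 + (buf[start + 1] - '0');
    return mid % table_size;
}
```

For key 44 and 11 slots, the square is 1936, the middle digits are 93, and the result is 5, as in the example above.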
The keys are not consecutive and don’t start from 1. In such cases, we subtract a number
from the item to determine the address.
Example
A company has 100 employees and employee numbers start from 1000
Digits are extracted from the key and used as its address, i.e. specific digits are selected
from the key k and used as an address.
Example 1
Suppose we want to hash a 6 digit employee number say 123456 to a three digit
address, we could select the first, third and fourth digits from left and use them as
address, so the address will be 134
Example 2
Suppose the roll number of a student is 160252 and we want to hash it to a 3 digit
address by selecting the first, third and fourth digits from the right, so the address will be 220
Example 3
Suppose the roll number of a student is 160252 and we want to hash it to a 3 digit
address by selecting the first, third and fourth digits from the left, so the address will be 102
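The three examples above can be checked with a small sketch that selects digits by position. The function name `extract_digits` and its parameters are assumptions for illustration; positions are 1-based, as in the examples.

```c
#include <assert.h>
#include <stdio.h>

/* Builds an address from selected digits of key.
 * positions[] holds 1-based digit positions, counted from the left
 * (from_right == 0) or from the right (from_right != 0).
 * Returns -1 if any position is out of range. */
int extract_digits(unsigned long key, const int positions[], int count,
                   int from_right)
{
    char buf[32];
    int len, address = 0;
    assert(count >= 0);
    len = snprintf(buf, sizeof buf, "%lu", key);
    for (int i = 0; i < count; i++) {
        int pos = positions[i];
        int idx;
        if (pos < 1 || pos > len)
            return -1;
        /* Translate the 1-based position into an index into buf. */
        idx = from_right ? len - pos : pos - 1;
        address = address * 10 + (buf[idx] - '0');
    }
    return address;
}
```

With positions {1, 3, 4}, key 123456 read from the left gives 134, key 160252 read from the right gives 220, and key 160252 read from the left gives 102.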
This method is useful when keys are assigned serially, as in the case of serial numbers.
This method is generally not used by itself, but is used in combination with other hashing
methods.
Example
The difference between open and closed hashing has to do with whether colliding records
are stored outside of the hash table (open hashing) or at another slot in the hash table
(closed hashing).
λ = load factor
As the load factor increases, the number of collisions increases, causing increased
search time.
To maintain efficiency, it is important to prevent the hash table from filling up.
Linear Probing Algorithm
[Figure: hash table contents during linear probing. # of probes: 1, 1, 2, 3, 3]
Quadratic Probing Algorithm
Double hashing works on a similar idea to linear and quadratic probing: use a big table and hash
into it, and whenever a collision occurs, choose another slot in the table for the value. The
difference is that instead of choosing the next opening, a second hash function is used to determine
the location of the next slot. For example, given hash functions hash1 and hash2 and a key, do the
following:
Check location hash1(key). If it is empty, put the record in it.
If it is not empty, calculate hash2(key).
Check whether hash1(key) + hash2(key) is open. If it is, put the record in it.
Otherwise repeat with hash1(key) + 2*hash2(key), hash1(key) + 3*hash2(key) and so on, until an
opening is found.
Let hash1(k) = k % 20 and hash2(k) = k % 6 + 1, and let the length of the hash table (a circular
array) be 20. Initially all slots, 0 to 19, are empty (None).
Let’s insert 34, 55, 12, 8, 45, 37, 88, 98, 54 and 32. The table after the insertions:
Slot   Key   Probes
1      32    4
5      45    1
8      8     1
12     12    1
13     88    2
14     34    1
15     55    1
16     54    3
17     37    1
18     98    1
All other slots (0, 2 to 4, 6, 7, 9 to 11 and 19) remain None.
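The example above can be replayed with a short sketch that uses the same two hash functions and table length. The name `dh_insert`, the -1 sentinel for an empty slot, and the cap of 20 attempts per insertion are assumptions for illustration.

```c
#include <assert.h>

#define M 20          /* table length from the example */
#define EMPTY (-1)    /* assumed sentinel: keys are non-negative */

static int hash1(int k) { return k % M; }
static int hash2(int k) { return k % 6 + 1; }

/* Inserts key with double hashing: probe hash1(key) + j * hash2(key),
 * wrapped around the circular table, for j = 0, 1, 2, ...
 * Returns the number of probes used, or 0 if no free slot was found
 * within M attempts. */
int dh_insert(int table[M], int key)
{
    assert(key >= 0);
    for (int j = 0; j < M; j++) {
        int slot = (hash1(key) + j * hash2(key)) % M;
        if (table[slot] == EMPTY) {
            table[slot] = key;
            return j + 1;
        }
    }
    return 0;
}
```

Inserting the ten keys in the given order into an empty table reproduces the probe counts listed above: 88 needs 2 probes (slots 8, 13), 54 needs 3 (slots 14, 15, 16), 32 needs 4 (slots 12, 15, 18, 1), and every other key needs 1.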
Double Hashing Algorithm
INT HF1(INT KEY) //Hash Function 1
BEGIN
    RETURN KEY MOD M
END

INT HF2(INT KEY) //Hash Function 2
BEGIN
    SET K <- 6 // Can be any number
    RETURN (KEY MOD K + 1)
END

VOID INSERT(DataItem DI)
BEGIN
    IF (LOADFACTOR() == 1) THEN
        DISPLAY "Hash Table Full"
        EXIT()
    END IF
    SET H1 <- HF1(DI.Key) // NO COLLISION
    IF (HT[H1] == ∅) THEN
        HT[H1] <- DI
        EXIT()
    END IF
    SET J <- 1 // COLLISION CASE
    SET H2 <- HF2(DI.Key)
    WHILE (TRUE)
        SET I <- H1 + J * H2
        SET K <- I MOD M
        IF (HT[K] == ∅) THEN
            HT[K] <- DI
            EXIT()
        ELSE
            J <- J+1
        END IF
    END WHILE
END // End of algorithm
Quadratic and Double Hashing Analysis
λ=load factor
Chaining allows each slot to hold a reference to a collection (or chain) of items, so many items
can exist at the same location in the hash table. When collisions happen, the item is still placed
in the proper slot of the hash table. As more and more items hash to the same location, the
difficulty of searching for an item in the collection increases.
When we want to search for an item, we use the hash function to generate the slot where it should
reside. Since each slot holds a collection, we use a searching technique to decide whether the item is
present. The advantage is that, on average, there are likely to be far fewer items in each slot, so
the search is perhaps more efficient.
λ=load factor
It is the most efficient collision resolution scheme
Requires more storage (needs storage for pointers)
Deletion is easy to perform; it is more difficult in open addressing
/* update the head of the list and no of nodes in the current bucket */
hashTable[hashIndex].head = newnode;
hashTable[hashIndex].count++;
}
Closed Hashing Implementation cont…
void searchInHash(int key)
{
    int hashIndex = key % slotCount, flag = 0;
    struct node *myNode;
    /* Start at the head of the chain stored in the key's bucket */
    myNode = hashTable[hashIndex].head;
    if (!myNode)
    {
        printf("Search element unavailable in hash table\n");
        return;
    }
    /* Walk the chain until the key is found or the chain ends */
    while (myNode != NULL)
    {
        if (myNode->info == key)
        {
            printf("Element is : %d\n", myNode->info);
            flag = 1;
            break;
        }
        myNode = myNode->next;
    }
    if (!flag)
        printf("Search element unavailable in hash table\n");
}
Deletion from Hash Table
We have N (a very large number of) sales records. Each record consists of the
id number of the customer and the price. There are k customers, where k is
still large, but not nearly as large as N. We want to create a list of customers
together with the total amount spent by each customer. That is, for each
customer id, we want to know the sum of all the prices in the sales records with
that id. Design a sensible algorithm for doing this.
What are the average and worst case time complexities of the insertion, deletion and
access operations for a hash table?
Suppose an unsorted linked list is in memory. Write a procedure
SEARCH(INFO, LINK, START, ITEM, LOC) which
Finds the location LOC of ITEM in the list or sets LOC = NULL for an
unsuccessful search
When the search is successful, interchanges ITEM with the element in
front of it.
Mathematically compute the worst case time complexity of binary search.