Chapter 8: Searching and Hashing
Contents
○ Binary Search
● Hashing
○ Introduction to Hashing
1. Sequential search
2. Binary search
Sequential search (aka linear search)
● Is used in an unordered list
Steps:
1. Start from the leftmost element of the list and one by one compare the target
with each element of the list
2. If the target matches with an element, return the index of the element
3. Otherwise, return -1 indicating that the target is not present in the list
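The three steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation; the function name `sequential_search` and the sample data are chosen here for demonstration.

```python
def sequential_search(items, target):
    """Return the index of the first element equal to target, or -1."""
    # Step 1: walk the list from the leftmost element.
    for index, value in enumerate(items):
        # Step 2: on a match, return the index of the element.
        if value == target:
            return index
    # Step 3: the target is not present in the list.
    return -1


data = [26, 5, 37, 1, 61, 11, 59, 15, 48, 19]
print(sequential_search(data, 1))   # 3
print(sequential_search(data, 42))  # -1
```

The list does not need to be sorted, which is why this method suits unordered data.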
Example: Search for 1 in this unsorted list.
Index: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9]
Input: 26 5 37 1 61 11 59 15 48 19
Target: 1
Index 0: input[0] == 1 ? No
Index 1: input[1] == 1 ? No
Index 2: input[2] == 1 ? No
Index 3: input[3] == 1 ? Yes, return 3
Sequential search performance
Best case, i.e. when the target is the first element in the list:
O(1)
Worst case, i.e. when the target is not present in the list or is the last element of the
list:
O(n)
Average case:
O(n)
Binary search
In sequential search, if there are 1000 elements, 1000 comparisons will be made in
the worst case.
If the list is sorted, we can use a more efficient algorithm called the binary search.
In general, we should use a binary search whenever the list starts to become large
(e.g., when the list has more than 16 elements).
Algorithm: binarySearch(a, target)
Input: A sorted list, a, of n elements, and the element to be searched, target
Output: Index of the target, if present, otherwise -1
Steps:
1. min = 0
2. max = n - 1
3. while max ≥ min
4.     mid = ⌊(min + max) / 2⌋ # average of max and min
5.     if a[mid] == target
6.         return mid # target found
7.     else if a[mid] < target
8.         min = mid + 1
9.     else
10.        max = mid - 1
11.    end if
12. end while
13. if max < min, then return -1 # target is not present
14. end if
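The pseudocode above translates almost line for line into Python. In this sketch the variables are named `low` and `high` instead of `min` and `max` only to avoid shadowing Python built-ins; the function name is likewise an illustrative choice.

```python
def binary_search(a, target):
    """Return the index of target in the sorted list a, or -1 if absent."""
    low, high = 0, len(a) - 1
    while high >= low:
        mid = (low + high) // 2      # floor of the average of low and high
        if a[mid] == target:
            return mid               # target found
        elif a[mid] < target:
            low = mid + 1            # discard the left half
        else:
            high = mid - 1           # discard the right half
    return -1                        # target is not present


data = [1, 5, 11, 15, 19, 26, 37, 48, 59, 61]
print(binary_search(data, 26))  # 5
print(binary_search(data, 2))   # -1
```

Each pass through the loop halves the remaining range, which is where the efficiency gain over sequential search comes from.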
Example: Search for 26 in this list.
Index: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9]
Input: 1 5 11 15 19 26 37 48 59 61
Target: 26

Step  min  max  mid  Comparison
1     0    9    4    input[4] = 19 < 26, so min = mid + 1 = 5
2     5    9    7    input[7] = 48 > 26, so max = mid - 1 = 6
3     5    6    5    input[5] = 26, target found!
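Binary search performance
Best case, i.e. when the target is the middle element of the list: O(1)
Worst case: O(log n)
Average case: O(log n)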
In a hashed search, the key determines the location of the data through an
algorithmic function called a hash function
Main idea:
For an array, the address can be the index that contains the data
A hash function is a function which when given a key, generates an address in the
table
Hashing
Use a hash function to determine where to insert the record.
When a record needs to be searched, use the same hash function to locate the record.
[Figure: a record (ID = 1, Name = John, Group = A) is passed through the hash function on its key and is stored at, and later found at, address [4] of the table.]
Hashing
Hash function efficiency
Measure of how efficiently the hash function produces hash values for elements
within a set of data.
Prime area: The memory that contains all of the home addresses
If the data contain two or more synonyms (keys that hash to the same home address),
we can have collisions. A collision occurs when a hashing algorithm produces an
address for an insertion key and that address is already occupied.
Hashing terminologies
Collision resolution
When two keys collide at a home address, we must resolve the collision by placing
one of the keys and its data in another location.
Hashing methods
● Direct hashing
● Subtraction
● Modulo-division / division remainder
● Digit-extraction
● Midsquare hashing
● Folding
● Rotation hashing
● Pseudorandom hashing
Subtraction method
Limitation:
Modulo-division method
Divide the key by the array size and use the remainder for the address
Example:
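A minimal sketch of the modulo-division method in Python, using three keys and the array size of 19 from the linear probing exercise later in this chapter. The function name `modulo_division` is an illustrative choice.

```python
def modulo_division(key, table_size):
    """Home address = remainder of key divided by the array size."""
    return key % table_size


for key in (224562, 137456, 214562):
    print(key, "->", modulo_division(key, 19))
# 224562 -> 1
# 137456 -> 10
# 214562 -> 14
```

The result always lies between 0 and table_size - 1, so it can be used directly as an array index.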
Digit-extraction
Extract selected digits from the key and use them as the address
Example:
379452 → 394
121267 → 112
378845 → 388
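The three examples above are consistent with extracting the first, third, and fourth digits of a six-digit key. The sketch below assumes that choice of positions (it is inferred from the examples, not stated in the text); the function name and the `positions` parameter are illustrative.

```python
def digit_extraction(key, positions=(0, 2, 3)):
    """Build an address from selected digits of the key.

    positions are 0-based digit indexes counted from the left;
    (0, 2, 3) means the first, third, and fourth digits (an assumption).
    """
    digits = str(key)
    return int("".join(digits[p] for p in positions))


for key in (379452, 121267, 378845):
    print(key, "->", digit_extraction(key))
# 379452 -> 394
# 121267 -> 112
# 378845 -> 388
```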
Midsquare hashing
Square the key and select the address from the middle of the squared number
Example:
If key = 9452, the address can be taken as 3403 because 9452² = 89340304
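A short Python sketch of midsquare hashing that reproduces the example for key 9452. It assumes the address is the middle four digits of the squared key; the function name and the `address_digits` parameter are illustrative.

```python
def midsquare(key, address_digits=4):
    """Square the key and take address_digits digits from the middle."""
    squared = str(key * key)
    # Index of the first digit of the middle slice.
    start = (len(squared) - address_digits) // 2
    return int(squared[start:start + address_digits])


print(9452 * 9452)      # 89340304
print(midsquare(9452))  # 3403
```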
Pseudorandom hashing
The key is used as the seed in a pseudorandom number generator (PRNG), and
the resulting random number is then scaled into the possible address range using
the modulo-division method
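One common way to realize this idea is a single step of a linear congruential generator, y = a * key + c, followed by modulo-division. The sketch below assumes that generator; the coefficients a = 17 and c = 7 and the array size of 19 are hypothetical values picked for illustration, not taken from the text.

```python
def pseudorandom_hash(key, table_size, a=17, c=7):
    """Seed a one-step linear congruential generator with the key,
    then scale the result into the address range with modulo-division.

    a and c are hypothetical coefficients chosen for this example.
    """
    random_number = a * key + c
    return random_number % table_size


print(pseudorandom_hash(224562, 19))  # 5
```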
Most hash table designs employ an imperfect hash function, which might generate
the same index for more than one key, causing collisions
When a collision occurs, the prime area addresses are searched for an open or
unoccupied element where the new data can be placed. Each calculation of an
address and test for success is called a probe.
● Linear probing
● Quadratic probing
● Double hashing (rehashing)
● Random probing
Collision resolution
Linear probing
When inserting a new pair whose key is k, we search the hash table addresses in
the order ( h(k) + i ) % b, 0 ≤ i ≤ b - 1, where h is the hash function, and b is the size of
the hash table (or the array).
Example: Using modulo-division method and linear probing, store the keys shown
below in an array with 19 elements.
224562, 137456, 214562, 140145, 214576, 162145, 144467, 199645, 234534
Solution
Key     Home address (key % 19)  Stored at
224562  1                        [1]
137456  10                       [10]
214562  14                       [14]
140145  1                        [2] (collision at [1])
214576  9                        [9]
162145  18                       [18]
144467  10                       [11] (collision at [10])
199645  12                       [12]
234534  17                       [17]
Hash. Map key to integer i between 0 and N-1, where N is the array size.
Insert. Put at table index i if free; if not try i+1, i+2, etc.
Search. Search table index i; if occupied but no match, try i+1, i+2, etc.
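The hash, insert, and search operations just described can be sketched in Python as below. The sketch assumes modulo-division as the hash function, uses None to mark a free slot, and does not support deletion; the function names are illustrative.

```python
def insert(table, key):
    """Place key at its home address, or at the next free slot after it."""
    b = len(table)
    home = key % b                      # hash: modulo-division (assumed)
    for i in range(b):
        address = (home + i) % b        # probe order (h(k) + i) % b
        if table[address] is None:
            table[address] = key
            return address
    raise OverflowError("hash table is full")


def search(table, key):
    """Return the address holding key, or -1 if key is not in the table."""
    b = len(table)
    home = key % b
    for i in range(b):
        address = (home + i) % b
        if table[address] is None:      # free slot: key was never inserted
            return -1
        if table[address] == key:
            return address
    return -1


table = [None] * 19
for key in (224562, 137456, 214562, 140145, 214576,
            162145, 144467, 199645, 234534):
    insert(table, key)

print(search(table, 140145))  # 2  (home address 1 was occupied)
print(search(table, 144467))  # 11 (home address 10 was occupied)
```

Search must follow the same probe order as insert, and it can stop at the first free slot only because nothing is ever deleted in this sketch.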
Advantages:
● Simple to implement
● Data tend to remain near their home address
Disadvantages:
Example:
Quadratic probing
Limitation:
It is not possible to generate a new address for every element in the list.
Solution:
Use a list size that is a prime number. In this case, at least half of the list is
reachable.
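The limitation and the prime-size remedy can be checked numerically. The sketch below assumes the common probe sequence (h(k) + i²) % b, which is not spelled out in the text, and counts how many distinct addresses it can reach; the function name is illustrative.

```python
def reachable_addresses(home, b):
    """Distinct addresses visited by the probes (home + i*i) % b, i = 0..b-1."""
    return {(home + i * i) % b for i in range(b)}


# Prime list size: 10 of the 19 addresses are reachable (at least half).
print(len(reachable_addresses(0, 19)))  # 10
# Non-prime list size: only 4 of the 16 addresses are reachable.
print(len(reachable_addresses(0, 16)))  # 4
```

With a non-prime size such as 16, an insertion can fail even though most of the table is still empty.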
Double hashing
Rehashing: Use a series of hash functions h1, h2, … , hn.
Chaining
● Uses a separate area to store collisions and chains all synonyms together in a
linked list
● Uses two storage areas: the prime area and the overflow area
● Each element in the prime area contains a link head pointer to a linked list of
overflow data in the overflow area
● When a collision occurs, one element is stored in the prime area and chained to
its corresponding linked list in the overflow area
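A minimal Python sketch of chaining with a prime area and an overflow area. It assumes modulo-division as the hash function and stores bare keys instead of full records; the class and attribute names are illustrative.

```python
class Node:
    """One element of an overflow linked list."""
    def __init__(self, key, next_node=None):
        self.key = key
        self.next = next_node


class ChainedHashTable:
    def __init__(self, size):
        self.size = size
        self.prime = [None] * size     # prime area: one key per home address
        self.overflow = [None] * size  # link head pointer for each prime element

    def insert(self, key):
        home = key % self.size         # hash: modulo-division (assumed)
        if self.prime[home] is None:
            self.prime[home] = key     # first key stays in the prime area
        else:
            # Collision: chain the synonym at the head of the overflow list.
            self.overflow[home] = Node(key, self.overflow[home])

    def search(self, key):
        home = key % self.size
        if self.prime[home] == key:
            return True
        node = self.overflow[home]
        while node is not None:        # walk the chain of synonyms
            if node.key == key:
                return True
            node = node.next
        return False


table = ChainedHashTable(19)
for key in (224562, 140145, 137456):
    table.insert(key)

print(table.prime[1])         # 224562
print(table.overflow[1].key)  # 140145 (synonym of 224562)
print(table.search(140145))   # True
```

Unlike the probing methods, a collision here never consumes another key's home address.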