Chapter 8_ Searching and Hashing

Chapter 8 covers searching and hashing techniques, detailing basic search methods such as sequential and binary search, along with their performance metrics. It introduces hashing as a method for efficient data retrieval using hash functions and discusses collision resolution techniques. The chapter also outlines various hashing methods and terminologies, emphasizing the importance of hash tables for quick data access.

Uploaded by

nabinsharmagairipipli


Chapter 8

Searching and Hashing

Contents
● Basic Search Techniques
○ Sequential Search
○ Binary Search
● Hashing
○ Introduction to Hashing
○ Hash function and hash tables
○ Collision resolution techniques

Searching
● A table or a file is a group of elements, each of which is called a record
● A key is used to differentiate among different records
● Searching is the process of finding a record (with the target key) among a list
of records
● Searching is one of the most common and time-consuming operations
● A search algorithm may return the entire record or, more commonly, it may
return a pointer to that record
● If the record is not found, then it is called an unsuccessful search
● A successful search is often called a retrieval
Basic searching techniques
The algorithm used to search a list depends to a large extent on the structure of the
list.

Two basic searches for arrays are:

1. Sequential search
2. Binary search
Sequential search (aka linear search)
● Is used in an unordered list

Steps:

1. Start from the leftmost element of the list and one by one compare the target
with each element of the list
2. If the target matches with an element, return the index of the element
3. Otherwise, return -1 indicating that the target is not present in the list
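
The three steps above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation; the function name sequential_search is an assumption, and the sample list is the one used in this chapter's worked example.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def sequential_search(items: Sequence[T], target: T) -> int:
    """Return the index of the first element equal to target, or -1."""
    # Compare the target with each element, starting from the leftmost one.
    for index, value in enumerate(items):
        if value == target:
            return index  # target matches: report its position
    return -1  # every element was checked without a match


data = [26, 5, 37, 1, 61, 11, 59, 15, 48, 19]
print(sequential_search(data, 1))    # 3
print(sequential_search(data, 100))  # -1 (unsuccessful search)
```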
Sequential search
Example: Search for 1 in this unsorted list.

Index: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9]
Input:  26   5  37   1  61  11  59  15  48  19

Target: 1

Step 1: input[0] == 1 ? No
Step 2: input[1] == 1 ? No
Step 3: input[2] == 1 ? No
Step 4: input[3] == 1 ? Yes

The target is found at index 3.
Sequential search performance
Best case, i.e. when the target is the first element in the list:

O(1)

Worst case, i.e. when the target is not present in the list or is the last element of the
list:

O(n)

Average case:
O(n)
Binary search
In sequential search, if there are 1000 elements, 1000 comparisons will be made in
the worst case.

If the list is sorted, we can use a more efficient algorithm called the binary search.

In general, we should use a binary search whenever the list starts to become large
(e.g., when the list has more than 16 elements).
Algorithm: binarySearch(a, target)
Input: A sorted list, a, and the element to be searched, target
Output: Index of the target, if present, otherwise -1

Steps:
1. min = 0
2. max = n - 1
3. while max ≥ min
4.     mid = ⌊(min + max) / 2⌋    # average of max and min
5.     if a[mid] == target
6.         return mid             # target found
7.     else if a[mid] < target
8.         min = mid + 1
9.     else
10.        max = mid - 1
11.    end if
12. end while
13. if max < min, then return -1  # target is not present
14. end if
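
As a rough sketch, the binarySearch steps can be written in Python like this. The names low and high stand in for min and max (which are built-in function names in Python), and the sample list is the sorted list from this chapter's example; treat it as an illustration under those assumptions.

```python
from typing import Sequence


def binary_search(a: Sequence[int], target: int) -> int:
    """Return the index of target in the sorted sequence a, or -1."""
    low, high = 0, len(a) - 1
    while low <= high:
        mid = (low + high) // 2  # floor of the average of low and high
        if a[mid] == target:
            return mid           # target found
        elif a[mid] < target:
            low = mid + 1        # target can only be in the right half
        else:
            high = mid - 1       # target can only be in the left half
    return -1                    # low passed high: target is not present


data = [1, 5, 11, 15, 19, 26, 37, 48, 59, 61]
print(binary_search(data, 26))  # 5
print(binary_search(data, 2))   # -1
```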
Binary search
Example: Search for 26 in this list.

Index: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9]
Input:   1   5  11  15  19  26  37  48  59  61

Target: 26

Pass  min  max  mid  Comparison             Result
1     0    9    4    input[4] < 26 ? Yes    min = mid + 1 = 5
2     5    9    7    input[7] > 26 ? Yes    max = mid - 1 = 6
3     5    6    5    input[5] == 26 ? Yes   Target found!
Binary search performance
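Best case: O(1)

Worst case: O(log₂ n)

Average case: O(log₂ n)

For example, a sorted list of 1000 elements needs at most ⌊log₂ 1000⌋ + 1 = 10 passes of the loop, compared with up to 1000 comparisons for sequential search.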

Hashing
Goal of hashing: To find the data with only one test, i.e., expected complexity =
O(1). (Also, to insert and delete in O(1) expected time.)

In a hashed search, the key determines the location of the data through an
algorithmic function called a hash function

Main idea:

1. Use a hash function to determine where to insert the record

2. When a record needs to be searched, use the same hash function to locate the
record
Hashing
Hashing is a key-to-address mapping process

For an array, the address can be the index that contains the data

A hash function is a function which when given a key, generates an address in the
table
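
The two-step idea (insert with a hash function, search with the same hash function) might look like the sketch below. The table size of 6, the hash function key % TABLE_SIZE, and the helper names are all assumptions made for illustration; collisions are deliberately ignored here because they are covered later in the chapter.

```python
from typing import List, Optional, Tuple

Record = Tuple[int, str, str]  # (ID, Name, Group)

TABLE_SIZE = 6  # assumed table size, for illustration only
table: List[Optional[Record]] = [None] * TABLE_SIZE


def hash_address(key: int) -> int:
    """Map a key to an address (array index); an assumed, simple hash function."""
    return key % TABLE_SIZE


def insert(record: Record) -> None:
    """Store the record at the address computed from its key (no collision handling)."""
    table[hash_address(record[0])] = record


def search(key: int) -> Optional[Record]:
    """Use the same hash function to go directly to the record's address."""
    record = table[hash_address(key)]
    if record is not None and record[0] == key:
        return record
    return None  # unsuccessful search


insert((1, "John", "A"))
print(search(1))  # (1, 'John', 'A')
print(search(2))  # None
```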
Hashing
Use a hash function to determine where to insert the record.

[Figure: the key (ID) of the record (ID = 1, Name = John, Group = A) is passed to the hash function, which produces an address in a table with slots [0] to [5].]

When a record needs to be searched, use the same hash function to locate the record.

[Figure: the same key is passed to the hash function again, and the record (1, John, A) is found at slot [4].]
Hashing
Hash function efficiency
Measure of how efficiently the hash function produces hash values for elements
within a set of data.

A hash function should be a quick, stable and deterministic operation.

Hash table
A data structure for quickly looking things up

Implements an associative array (map, symbol table, or dictionary) abstract data type

A dictionary is a structure that is composed of a collection of (key, value) pairs (maps keys to values)

The main operation supported by a dictionary is searching by key
Hash table
A hash table uses a hash function to compute an index into an array of buckets or
slots, from which the desired value can be found
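
To make the dictionary abstraction concrete, here is a small sketch using Python's built-in dict, which CPython implements as a hash table. The second record (Mary, group B) is hypothetical and only added so there is something to delete.

```python
# A dictionary maps keys to values; here the key is a record ID.
records = {}
records[1] = ("John", "A")   # insert a (key, value) pair
records[2] = ("Mary", "B")   # hypothetical second record

print(records[1])            # search by key: ('John', 'A')
print(records.get(3))        # key not present: None

del records[2]               # delete by key
print(2 in records)          # False
```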
Hashing terminologies
Home address: The address produced by the hashing algorithm

Prime area: The memory that contains all of the home addresses

Synonyms: The set of keys that hash to the same location

If the data contain two or more synonyms, we can have collisions. A collision
occurs when a hashing algorithm produces an address for an insertion key and that
address is already occupied.
Hashing terminologies
Collision resolution

When two keys collide at a home address, we must resolve the collision by placing
one of the keys and its data in another location.
Hashing methods
● Direct hashing
● Subtraction
● Modulo-division / division remainder
● Digit-extraction
● Midsquare hashing
● Folding
● Rotation hashing
● Pseudorandom hashing
Hashing methods
Subtraction method

Subtract a fixed value from the key to determine the address

Address = key - constant

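Limitation: Suitable only for small lists whose keys are consecutive or nearly so, because the table must span the whole range of keys.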
Hashing methods
Modulo-division method

Divide the key by the array size and use the remainder for the address

Address = key % listSize

Example: With listSize = 19, the key 224562 has the address 224562 % 19 = 1.
Hashing methods
Digit-extraction

Extract selected digits from the key and use them as the address

Example (using the first, third, and fourth digits of each key):

379452 → 394

121267 → 112

378845 → 388
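
The three examples are consistent with taking the first, third, and fourth digits of each key. A minimal sketch under that assumption follows; the function name and the positions parameter are illustrative.

```python
from typing import Tuple


def digit_extraction(key: int, positions: Tuple[int, ...] = (1, 3, 4)) -> int:
    """Build an address from the digits of key at the given 1-based positions."""
    digits = str(key)
    return int("".join(digits[p - 1] for p in positions))


for key in (379452, 121267, 378845):
    print(key, "->", digit_extraction(key))
# 379452 -> 394
# 121267 -> 112
# 378845 -> 388
```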
Hashing methods
Midsquare hashing

Square the key and select the address from the middle of the squared number

Example:

If key = 9452, the address can be taken as 3403 because 9452² = 89340304
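
A possible sketch of midsquare hashing is shown below. The choice of a four-digit address and the rule for locating the middle (integer division, so the window sits slightly left of center when it cannot be centered exactly) are assumptions.

```python
def midsquare(key: int, width: int = 4) -> int:
    """Square the key and return the middle `width` digits as the address."""
    squared = str(key * key)
    start = (len(squared) - width) // 2  # left edge of the middle window
    return int(squared[start:start + width])


print(midsquare(9452))  # 9452 * 9452 = 89340304, middle digits: 3403
```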
Hashing methods
Pseudorandom hashing

The key is used as the seed in a pseudorandom number generator (PRNG)*, and the resulting random number is then scaled into the possible address range using the modulo-division method

* The PRNG-generated sequence is not truly random, because it is completely determined by an initial value, called the PRNG's seed. Example PRNG: y = ax + c, where x is the key and a and c are constants
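
As an illustration of y = ax + c followed by modulo-division, consider the sketch below. The constants a = 17 and c = 7, the list size 307, and the sample key are assumptions chosen for the example; the chapter does not fix any of them.

```python
def pseudorandom_hash(key: int, list_size: int, a: int = 17, c: int = 7) -> int:
    """Compute y = a*key + c, then scale y into [0, list_size) with modulo-division."""
    y = a * key + c
    return y % list_size


print(pseudorandom_hash(121267, 307))  # (17 * 121267 + 7) % 307 = 41
```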
Collision resolution
A perfect hash function will assign each key to a unique bucket (i.e., no collision)

Most hash table designs employ an imperfect hash function, which might generate
the same index for more than one key, causing collisions

When two keys collide at a home address, we must resolve the collision by placing
one of the keys and its data in another location.

Two general approaches to handling collision resolution:

1. Open addressing: resolves collisions in the prime area

2. Chaining: resolves collisions by placing the data in a separate overflow area
Collision resolution
Open addressing

When a collision occurs, the prime area addresses are searched for an open or
unoccupied element where the new data can be placed. Each calculation of an
address and test for success is called a probe.

● Linear probing
● Quadratic probing
● Double hashing (rehashing)
● Random probing
Collision resolution
Linear probing

When inserting a new pair whose key is k, we search the hash table addresses in
the order ( h(k) + i ) % b, 0 ≤ i ≤ b - 1, where h is the hash function, and b is the size of
the hash table (or the array).

The search terminates when we find the first unfilled address.

Example: Using modulo-division method and linear probing, store the keys shown
below in an array with 19 elements.

224562, 137456, 214562, 140145, 214576, 162145, 144467, 199645, 234534

Linear probing
Solution

Here, b = 19 and the hash function is h(k) = k % b.

Key      h(k) = k % 19   Result
224562   1               Stored at [1]
137456   10              Stored at [10]
214562   14              Stored at [14]
140145   1               Collision at [1]; stored at [2]
214576   9               Stored at [9]
162145   18              Stored at [18]
144467   10              Collision at [10]; stored at [11]
199645   12              Stored at [12]
234534   17              Stored at [17]

For 140145, index 1 is already occupied, so we probe sequentially until we find an unoccupied index. The next address is (h(140145) + 1) % b = (1 + 1) % 19 = 2, which is unoccupied. So, 140145 is inserted at index 2 of the array.

For 144467, a collision occurs at index 10. The next address is (h(144467) + 1) % b = (10 + 1) % 19 = 11, which is unoccupied. So, 144467 is inserted at index 11 of the array.

Final table (all other addresses are empty):

[1] 224562
[2] 140145
[9] 214576
[10] 137456
[11] 144467
[12] 199645
[14] 214562
[17] 234534
[18] 162145

Linear probing
Summary

Hash. Map key to integer i between 0 and N-1, where N is the array size.

Insert. Put at table index i if free; if not try i+1, i+2, etc.

Search. Search table index i; if occupied but no match, try i+1, i+2, etc.

Note. Array size N must be greater than number of key-value pairs.
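
The insert and search rules in this summary can be sketched as follows, using the keys and the 19-element array from the worked example. The function names are assumptions, and deletion is not handled: removing a key would need a marker for deleted slots so that later searches do not stop too early.

```python
from typing import Dict, List, Optional


def lp_insert(table: List[Optional[int]], key: int) -> int:
    """Insert key with linear probing; return the address where it was stored."""
    b = len(table)
    home = key % b  # home address from modulo-division
    for i in range(b):
        address = (home + i) % b  # probe home, home + 1, ... with wraparound
        if table[address] is None:
            table[address] = key
            return address
    raise OverflowError("hash table is full")


def lp_search(table: List[Optional[int]], key: int) -> int:
    """Return the address of key, or -1 if it is not in the table."""
    b = len(table)
    home = key % b
    for i in range(b):
        address = (home + i) % b
        if table[address] is None:
            return -1  # reached an unfilled address: key is absent
        if table[address] == key:
            return address
    return -1


keys = [224562, 137456, 214562, 140145, 214576, 162145, 144467, 199645, 234534]
table: List[Optional[int]] = [None] * 19
placed: Dict[int, int] = {key: lp_insert(table, key) for key in keys}

print(placed[140145])            # 2  (home address 1 was occupied)
print(placed[144467])            # 11 (home address 10 was occupied)
print(lp_search(table, 199645))  # 12
```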

Linear probing
Advantages:

● Simple to implement
● Data tend to remain near their home address

Disadvantages:

● Linear probes tend to produce primary clustering (clustering of data around a home address)
● Tend to make the search algorithm more complex, especially after data have been deleted
Quadratic probing
A quadratic function of i is used as the increment, i.e., we examine the addresses
(h(k) + i²) % b

Example:
Quadratic probing
Limitation:

It is not possible to generate a new address for every element in the list.

Solution:

Use a list size that is a prime number. In this case, at least half of the list is
reachable.
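
The limitation can be checked numerically with a short sketch. It counts the distinct addresses produced by (h(k) + i²) % b for i = 0 to b - 1; the home address 1 and the non-prime comparison size 16 are arbitrary choices for illustration.

```python
from typing import Set


def quadratic_probe_addresses(home: int, b: int) -> Set[int]:
    """Return the distinct addresses (home + i*i) % b for i = 0 .. b - 1."""
    return {(home + i * i) % b for i in range(b)}


prime_size = len(quadratic_probe_addresses(1, 19))
non_prime_size = len(quadratic_probe_addresses(1, 16))

print(prime_size)      # 10 of 19 addresses: at least half of the list
print(non_prime_size)  # 4 of 16 addresses: far fewer than half
```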
Double hashing
Rehashing: Use a series of hash functions h1, h2, … , hn.

Double hashing: Use two hash functions, h1, and h2.

● First probe the location h1(key) % N, where N is the array size.

● If the location is occupied, we probe the location (h1(key) + h2(key)) % N, then
(h1(key) + 2*h2(key)) % N, and so on.
Double hashing example
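
The chapter does not specify the two hash functions, so the following is only a sketch under assumed choices: h1(key) = key % N and h2(key) = 1 + key % (N - 1). Adding 1 keeps the step nonzero, and with a prime N every address is eventually probed. The keys and N = 19 are reused from the linear probing example.

```python
from typing import Dict, List, Optional


def h1(key: int, n: int) -> int:
    """First hash function (assumed): the home address."""
    return key % n


def h2(key: int, n: int) -> int:
    """Second hash function (assumed): a step size between 1 and n - 1."""
    return 1 + key % (n - 1)


def dh_insert(table: List[Optional[int]], key: int) -> int:
    """Insert key with double hashing; return the address where it was stored."""
    n = len(table)
    start, step = h1(key, n), h2(key, n)
    for i in range(n):
        address = (start + i * step) % n  # h1, h1 + h2, h1 + 2*h2, ...
        if table[address] is None:
            table[address] = key
            return address
    raise OverflowError("no free address found")


keys = [224562, 137456, 214562, 140145, 214576, 162145, 144467, 199645, 234534]
table: List[Optional[int]] = [None] * 19
placed: Dict[int, int] = {key: dh_insert(table, key) for key in keys}

print(placed[140145])  # 17: home 1 is occupied, step is 16, (1 + 16) % 19 = 17
print(placed[144467])  # 8: home 10 and then 9 are occupied, (10 + 2 * 18) % 19 = 8
```

Unlike linear probing, colliding keys with different h2 values follow different probe sequences, which tends to reduce clustering.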
(Pseudo)Random probing
Uses a pseudorandom number to resolve the
collision
Chaining
A major disadvantage to open addressing is that each collision resolution increases
the probability of future collisions. Also, the search for a key involves comparison
with keys that have different hash values.

This disadvantage is eliminated in chaining

● Uses a separate area to store collisions and chains all synonyms together in a
linked list
● Uses two storage areas: the prime area and the overflow area
● Each element in the prime area contains a link head pointer to a linked list of
overflow data in the overflow area
Chaining
When a collision occurs, one
element is stored in the prime
area and chained to its
corresponding linked list in the
overflow area
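
A simplified sketch of chaining is given below. It swaps the linked lists of the overflow area for Python lists (one per home address) and stores every synonym in the chain, so it illustrates the idea rather than the exact prime-area/overflow-area layout described above. The class name and the reuse of the earlier keys with 19 addresses are assumptions.

```python
from typing import List


class ChainedHashTable:
    """Hash table that keeps all synonyms of a home address in one chain."""

    def __init__(self, size: int = 19) -> None:
        # One chain per home address; a Python list stands in for a linked list.
        self.buckets: List[List[int]] = [[] for _ in range(size)]

    def _home(self, key: int) -> int:
        return key % len(self.buckets)  # modulo-division hash

    def insert(self, key: int) -> None:
        chain = self.buckets[self._home(key)]
        if key not in chain:
            chain.append(key)  # a collision simply extends the chain

    def search(self, key: int) -> bool:
        # Only keys with the same home address are compared.
        return key in self.buckets[self._home(key)]


chained = ChainedHashTable(19)
for key in [224562, 137456, 214562, 140145, 214576, 162145, 144467, 199645, 234534]:
    chained.insert(key)

print(chained.buckets[1])      # [224562, 140145]
print(chained.buckets[10])     # [137456, 144467]
print(chained.search(199645))  # True
```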
