Lecture 13 - Hashing

The document discusses hashing as a method for identifying duplicate integers in a list using various data structures and functions. It explains the process of hashing, the importance of hash functions, and common methods for generating hash values, including the Division Method, Folding Method, and Mid-Square Method. Additionally, it addresses the concept of collisions and the characteristics of effective hash functions.

CSE 2105: Data Structures and Algorithms

Hashing
Md Mehrab Hossain Opi
Motivational Problem
• Let’s start with a very simple problem.
• Given a list of integers, you need to find the numbers that have
occurred more than once.
• How will you solve it?

CSE 2105: Data Structures and Algorithms 08/15/2025


Problem Solutions
• Can you propose a solution that does not require extra memory?
• We can simply sort the array and check adjacent elements.
• What will be the time complexity?
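The sort-and-scan idea above can be sketched as follows. This is a minimal illustration, not the lecture's own code: the function name `find_duplicates_sorted` is hypothetical, and the input vector is sorted in place so that no separate counting structure is needed. Sorting costs O(n log n) and the scan over adjacent elements costs O(n), so the total is O(n log n).

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns each value that occurs more than once,
// reported once, in ascending order. The caller's vector is sorted in place.
std::vector<int> find_duplicates_sorted(std::vector<int>& numbers) {
    std::sort(numbers.begin(), numbers.end());  // O(n log n)
    std::vector<int> duplicates;
    for (std::size_t i = 1; i < numbers.size(); ++i) {
        // Equal neighbours mean a repeated value; skip it if already reported.
        if (numbers[i] == numbers[i - 1] &&
            (duplicates.empty() || duplicates.back() != numbers[i])) {
            duplicates.push_back(numbers[i]);
        }
    }
    return duplicates;
}
```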

Problem Solutions
• Let’s use extra memory now.
• We can use a data structure to keep the count of each number.
• Which data structure will you use?
• Array?
• Yes, but what if the numbers are too large?
• Map?
• Let’s pretend we don’t know STL today.
• What will you do now?

Problem Solution using Array
• We will try to use the idea of a map and implement it using an array.
• If we can somehow convert the numbers into smaller values, we can
easily use the array.
• For example, we could convert 1232345 to 45, or 3246346 to 20.
• Then we can count the occurrences using a small amount of memory.
• But how do we convert the numbers?
• We will make a function that will convert any number x.
• Let it be h(x).

Function to Convert Numbers
• Let’s define h(x) now.
• The function is supposed to make any number smaller.
• How small should it be?
• Suppose we declared an array of size 1000 to keep the count of each
number.
• So, the converted numbers should be in the range of 0 to 999.
• How will you ensure that h(x) returns a number between 0 and 999?

Function Definition
• We can use the modulo operation.
• If we take the remainder after dividing any non-negative number by n, it will
always be between 0 and n-1.
• With an array of size 1000, we can define our function as h(x) = x % 1000.
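A minimal sketch of this modulo-based function is given below. The name `hash_mod` and its signature are assumptions made for illustration. One detail goes beyond the slide: in C++ the `%` operator can yield a negative remainder for a negative key, so the sketch shifts such results back into the range 0 to n-1.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical division-style hash: maps any integer key into [0, table_size - 1].
// table_size must be positive.
std::size_t hash_mod(long long key, std::size_t table_size) {
    assert(table_size > 0);
    const long long n = static_cast<long long>(table_size);
    const long long r = key % n;  // lies strictly between -n and n
    // A negative remainder is moved back into range before use as an index.
    return static_cast<std::size_t>(r < 0 ? r + n : r);
}
```

With a table of size 1000, `hash_mod(1232345, 1000)` gives 345, which is always a valid index into the count array.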

Hashing
• The process we have just seen is called hashing.
• Formally,

Hashing refers to the process of generating a fixed-size output from an input of
variable size using mathematical formulas known as hash functions.

Collisions
• The function we used to convert numbers was h(x) = x % 1000.
• What will be the output for two different multiples of 1000, such as 1000 and 2000?
• Both of them will be 0.
• This situation is called a collision.
• A collision refers to the situation in hashing where two distinct inputs
produce the same hash value.
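The effect of a collision on the counting approach can be seen in a short sketch. The function name `count_by_slot` is hypothetical, and non-negative keys are assumed. Keys 1000 and 2000 are distinct, yet both are counted in slot 0, so a count of 2 there does not prove that any number was repeated.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: counts keys per slot using h(x) = x % table_size.
// Assumes non-negative keys and a positive table_size.
std::vector<int> count_by_slot(const std::vector<unsigned long long>& keys,
                               std::size_t table_size) {
    std::vector<int> counts(table_size, 0);
    for (unsigned long long key : keys) {
        ++counts[key % table_size];  // colliding keys share one counter
    }
    return counts;
}
```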

Components of Hashing
• Input Data
• Hash Function
• Hash Value
• Hash Table (Optional)
• Collision Handling (Optional)

Input Data
• Data that needs to be hashed.
• Also called keys.
• In our case, the keys were the integers themselves.
• Can be any kind of data.
• Number, string, password, file, etc.

Hash Function
• Algorithm used to compute the hash value from the input data.
• Previously we used h(x) = x % 1000.
• The function takes an input and produces a fixed-size output.
• There can be many types of hash functions.

Characteristics of Good Hash Functions
• Minimize collision.
• Easy and quick to compute.
• Distribute key values evenly.
• Allow a high load factor (number of stored keys divided by the table size) without causing many collisions.

Hash Value
• Also known as the hash code or digest.
• Output produced by the hash function.
• A fixed-size representation of the input data; because of collisions, it is not guaranteed to be unique.
• Ideally, two different inputs should produce different hash values.

Hash Table
• We can say it’s a generalization of the array.
• Without hashing, we were keeping a count array, storing the count of value k at index k.
• This is also called direct addressing.
• But it takes a lot of space when the keys are large.
• Hence, we use a hash table, also called a hash map.

Hash Table
• A data structure that stores keys and their associated values.
• It uses a hash function to map each key to the slot that holds its associated value.
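One possible shape for such a table is sketched below, mapping integer keys to their occurrence counts. The slides list collision handling as a component but do not pick a technique, so this sketch assumes separate chaining: each slot keeps a small list of (key, count) pairs. The class name `CountTable` and its methods are hypothetical, and non-negative keys with a positive slot count are assumed.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical minimal hash table from integer key to occurrence count.
// Assumption: collisions are handled by separate chaining, so distinct keys
// that hash to the same slot are still stored and counted separately.
class CountTable {
public:
    explicit CountTable(std::size_t slots) : buckets_(slots) {}

    // Adds one occurrence of key and returns its updated count.
    int add(unsigned long long key) {
        auto& bucket = buckets_[key % buckets_.size()];
        for (auto& entry : bucket) {
            if (entry.first == key) return ++entry.second;
        }
        bucket.emplace_back(key, 1);
        return 1;
    }

    // Returns how many times key was added (0 if it was never added).
    int count(unsigned long long key) const {
        const auto& bucket = buckets_[key % buckets_.size()];
        for (const auto& entry : bucket) {
            if (entry.first == key) return entry.second;
        }
        return 0;
    }

private:
    std::vector<std::vector<std::pair<unsigned long long, int>>> buckets_;
};
```

For the motivational problem, a number is a duplicate exactly when `add` returns 2 for it, even if another key such as 2000 shares the slot of 1000.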

Common Hash Functions
• Let’s discuss some common hash functions now.
• Simple hash functions include:
• Division Method
• Folding Method
• Mid-Square Method
• Well-known hash functions include:
• MD5
• SHA-256
• CRC32
Division Method
• It is the simplest one and we have already used it.
• We just take the remainder of dividing the key by a constant, which is
often the size of the hash table.

Folding Method
• The key is divided into several parts.
• These parts are combined or folded together and transformed to create
the target address.
• There are two types of folding:
• Shift Folding
• Boundary Folding

Shift Folding
• The number is first divided into several parts.
• The parts are simply added.
• The summed-up value is further adjusted to ensure it falls within the range.
• Example:
• Suppose the number is 233251948124 and the table size is 1000.
• We first divide the number into parts.
• 233, 251, 948, 124.
• Then we add these values.
• 233 + 251 + 948 + 124 = 1556.
• Finally, we take the remainder of the sum.
• 1556 % 1000 = 556.
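The steps of this example can be sketched in code. The function name `shift_fold` is hypothetical, and the sketch assumes the key is supplied as a string of decimal digits that is cut from the left into parts of `part_len` digits each (the last part may be shorter). Both `part_len` and `table_size` must be positive.

```cpp
#include <cstddef>
#include <string>

// Hypothetical shift-folding hash on a decimal key given as a digit string.
// Assumption: parts are cut from the left, part_len digits at a time.
unsigned long long shift_fold(const std::string& digits, std::size_t part_len,
                              unsigned long long table_size) {
    unsigned long long sum = 0;
    for (std::size_t i = 0; i < digits.size(); i += part_len) {
        sum += std::stoull(digits.substr(i, part_len));  // add each part as-is
    }
    return sum % table_size;  // bring the sum into the table's range
}
```

Calling `shift_fold("233251948124", 3, 1000)` reproduces the value 556 from the example.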

Boundary Folding
• In this case, the key is seen as being written on a piece of paper that is
folded on the borders between the different parts of the key.
• So, every other part is put in reverse order.
• Let’s consider the same number, 233251948124.
• At first, we divide it into parts.
• 233, 251, 948, 124.
• Then we reverse the digits of every second part (here, the 2nd and the 4th).
• 233, 152, 948, 421.
• Then we add the parts and take the remainder again.
• (233 + 152 + 948 + 421) % 1000 = 1754 % 1000 = 754.
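A sketch of the same computation in code follows. The name `boundary_fold` is hypothetical. As an assumption, the key is a string of decimal digits cut from the left into parts of `part_len` digits, the first part is kept as written, and the digits of every second part are reversed before the parts are added.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical boundary-folding hash on a decimal key given as a digit string.
// Assumption: the 2nd, 4th, ... parts are reversed; part_len and table_size
// must be positive.
unsigned long long boundary_fold(const std::string& digits, std::size_t part_len,
                                 unsigned long long table_size) {
    unsigned long long sum = 0;
    bool reverse_part = false;  // the first part is kept as written
    for (std::size_t i = 0; i < digits.size(); i += part_len) {
        std::string part = digits.substr(i, part_len);
        if (reverse_part) std::reverse(part.begin(), part.end());
        sum += std::stoull(part);
        reverse_part = !reverse_part;  // alternate between kept and reversed
    }
    return sum % table_size;
}
```

Calling `boundary_fold("233251948124", 3, 1000)` reproduces the value 754 from the example.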

Mid-Square Method
• This method works in two steps.
• Square the key k.
• Extract the middle r digits of the square as the hash value.
• Suppose the number is 3121.
• Then 3121^2 = 9740641.
• If the table size is 1000, then r = 3 and h(3121) = 406, as it’s the middle part of the number.
• Doing the same on the binary representation (extracting the middle bits) changes
the result, but it is more efficient because the bits can be extracted with shifts and masks.
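A decimal version of the two steps is sketched below. The name `mid_square` is hypothetical. Two assumptions are made: when the middle cannot be centred exactly, the window starts at position (length - r) / 2 rounded down, and when the square has r digits or fewer it is returned unchanged. The square can overflow a 64-bit integer for keys above roughly 4.29 billion, and r must be at least 1.

```cpp
#include <cstddef>
#include <string>

// Hypothetical mid-square hash on decimal digits:
// square the key, then keep the middle r digits of the square.
unsigned long long mid_square(unsigned long long key, std::size_t r) {
    const unsigned long long square = key * key;  // may overflow for large keys
    const std::string sq = std::to_string(square);
    if (sq.size() <= r) return square;            // too short to have a "middle"
    const std::size_t start = (sq.size() - r) / 2;  // assumed rounding: down
    return std::stoull(sq.substr(start, r));
}
```

For the key 3121 and r = 3, the square is 9740641 and the function returns 406.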

Thank You.
