CSE 2105: Data Structures and Algorithms
Hashing
Md Mehrab Hossain Opi
Motivational Problem
• Let’s start with a very simple problem.
• Given a list of integers, you need to find the numbers that occur more than once.
• How will you solve it?
CSE 2105: Data Structures and Algorithms 08/15/2025
Problem Solutions
• Can you propose a solution that does not require extra memory?
• We can simply sort the array and check adjacent elements.
• What will be the time complexity?
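Below is a rough C++ sketch of the sorting idea. The function name `findDuplicatesBySorting` is our own, and the sketch assumes the caller allows the input to be reordered in place. After sorting, equal values sit next to each other, so one linear scan finds them. Sorting costs O(n log n) and the scan costs O(n), so the whole approach is O(n log n) and needs no extra counting array.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Returns each value that occurs more than once, reported exactly once.
// Sorts `a` in place (no extra counting array), then scans adjacent pairs.
std::vector<int> findDuplicatesBySorting(std::vector<int>& a) {
    std::sort(a.begin(), a.end());                 // O(n log n)
    std::vector<int> dups;
    for (std::size_t i = 1; i < a.size(); ++i) {   // O(n) scan
        // Equal values are now adjacent. Record a value only the first time
        // it repeats; sorted order guarantees it never reappears later.
        if (a[i] == a[i - 1] && (dups.empty() || dups.back() != a[i])) {
            dups.push_back(a[i]);
        }
    }
    return dups;
}
```

For the input {5, 1, 3, 5, 1, 1}, the sorted array is {1, 1, 1, 3, 5, 5} and the function returns {1, 5}.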
Problem Solutions
• Let’s use extra memory now.
• We can use a data structure to keep the count of each number.
• Which data structure will you use?
• Array?
• Yes, but what if the numbers are too large?
• Map?
• Let’s pretend we don’t know the STL today.
• What will you do now?
Problem Solution using Array
• We will try to use the idea of a map and implement it using an array.
• If we can somehow convert the numbers into smaller values, we can
easily use the array.
• For example, we might convert 1232345 to 45, or 3246346 to 20.
• Then we can count the occurrences using a small amount of memory.
• But how do we convert the numbers?
• We will write a function that converts any number x into a smaller one.
• Let it be $h(x)$.
Function to Convert Numbers
• Let’s define $h(x)$ now.
• The function is supposed to make any number smaller.
• How small should it be?
• Suppose we declared an array of size 1000 to keep the count of each
number.
• So, the converted numbers should be in the range of 0 to 999.
• How will you ensure that $h(x)$ returns a number between 0 and 999?
Function Definition
• We can use the modulo operation.
• The remainder of any non-negative number divided by n is always between 0 and n-1.
• We can define our function as $h(x) = x \bmod 1000$.
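Here is a minimal sketch of $h(x) = x \bmod 1000$ together with a 1000-slot count array. It assumes non-negative keys, because C++'s `%` can return a negative remainder. The names `h`, `kTableSize`, and `countByHash` are illustrative only. The counter is indexed by h(x), so two different keys with the same remainder share one slot. That is the collision problem discussed under Collisions.

```cpp
#include <array>
#include <vector>

constexpr int kTableSize = 1000;  // size of the count array

// h(x) = x mod 1000; assumes x >= 0 so the result is in 0..999.
int h(long long x) {
    return static_cast<int>(x % kTableSize);
}

// Counts how many keys land in each of the 1000 slots.
// Different keys with the same h(x) share (and mix up) one counter.
std::array<int, kTableSize> countByHash(const std::vector<long long>& keys) {
    std::array<int, kTableSize> count{};  // value-initialised to all zeros
    for (long long x : keys) {
        ++count[h(x)];
    }
    return count;
}
```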
Hashing
• The process we have just seen is called hashing.
• Formally,
Hashing refers to the process of generating a fixed-size
output from an input of variable size using the mathematical
formulas known as hash functions.
Collisions
• The function we used to convert numbers was
• $h(x) = x \bmod 1000$.
• What will be the output for two different numbers that are both multiples of 1000?
• Both of them will be 0.
• This condition is called a collision.
• A collision refers to the situation in hashing where two distinct inputs
produce the same hash value.
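To make the definition concrete, here is a small C++ check that uses 1000 and 2000 as our own example keys. The helper name `collide` is hypothetical. It reports a collision only when the keys are distinct but their hash values, under $h(x) = x \bmod 1000$, are equal.

```cpp
// h(x) = x mod 1000, assuming non-negative keys.
int h(long long x) {
    return static_cast<int>(x % 1000);
}

// True when two *distinct* keys produce the same hash value (a collision).
bool collide(long long a, long long b) {
    return a != b && h(a) == h(b);
}
```

For example, `collide(1000, 2000)` is true because both keys hash to 0. `collide(1000, 1000)` is false because a key never collides with itself.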
Components of Hashing
• Input Data
• Hash Function
• Hash Value
• Hash Table (Optional)
• Collision Handling (Optional)
Input Data
• Data that needs to be hashed.
• Also called keys.
• In our example, the keys were the integers themselves.
• Can be any kind of data.
• Number, string, password, file, etc.
Hash Function
• Algorithm used to compute the hash value from the input data.
• Previously, we used $h(x) = x \bmod 1000$.
• The function takes an input and produces a fixed-size output.
• There can be many types of hash functions.
Characteristics of Good Hash Functions
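• Minimize collisions.
• Be easy and quick to compute.
• Distribute key values evenly across the table.
• Keep the load factor (number of stored keys divided by the table size) low; strictly, this is a property of the table rather than of the function.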
Hash Value
• Also known as the hash code or digest.
• Output produced by the hash function.
• A fixed-size representation of the input data (not always unique, since collisions are possible).
• Ideally, two different inputs should produce different hash values.
Hash Table
• We can view it as a generalization of an ordinary array.
• Without hashing, we kept a count array,
• storing the count of value k at index k.
• This is also called direct addressing.
• But it takes a lot of space when the values are large.
• Hence, we use a hash table or hash map.
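For comparison, here is a sketch of direct addressing. It assumes every key lies in [0, maxKey]; `countDirect` and `maxKey` are illustrative names. The vector always has maxKey + 1 entries, however few keys there are. That is why this approach wastes space when the values are large.

```cpp
#include <vector>

// Direct addressing: the count of value k is stored at index k.
// Assumes 0 <= k <= maxKey for every key; memory grows with maxKey,
// not with the number of keys.
std::vector<int> countDirect(const std::vector<int>& keys, int maxKey) {
    std::vector<int> count(maxKey + 1, 0);
    for (int k : keys) {
        ++count[k];
    }
    return count;
}
```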
Hash Table
• A data structure that stores keys and their associated values.
• It uses a hash function to map each key to the slot that holds its associated value.
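Here is a minimal counting hash table in C++. It assumes separate chaining as the collision-handling technique; this is one common option, not one the slides prescribe. Each slot keeps a short list of (key, count) pairs, and keys are compared exactly. As a result, colliding keys such as 1000 and 2000 keep separate counts. The class `CountTable` and its methods are hypothetical names.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Counting hash table with separate chaining (assumed technique).
// Each slot holds the (key, count) pairs whose keys hash to that slot.
class CountTable {
public:
    explicit CountTable(std::size_t size) : slots_(size) {}  // size must be > 0

    // Increment the count stored for key, inserting it if absent.
    void add(long long key) {
        auto& chain = slots_[index(key)];
        for (auto& entry : chain) {
            if (entry.first == key) {  // same key: just bump its count
                ++entry.second;
                return;
            }
        }
        chain.push_back({key, 1});     // new key, possibly sharing the slot
    }

    // Return the stored count for key, or 0 if it was never added.
    int count(long long key) const {
        for (const auto& entry : slots_[index(key)]) {
            if (entry.first == key) return entry.second;
        }
        return 0;
    }

private:
    // Division method; adding m before the second % keeps negative keys in range.
    std::size_t index(long long key) const {
        long long m = static_cast<long long>(slots_.size());
        return static_cast<std::size_t>(((key % m) + m) % m);
    }

    std::vector<std::vector<std::pair<long long, int>>> slots_;
};
```

With 1000 slots, 1000 and 2000 land in the same chain, yet `count(1000)` and `count(2000)` are reported independently. Any key with a count above 1 answers the motivational problem.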
Common Hash Functions
• Let’s discuss some common hash functions now.
• Simpler hash functions include
• Division Method
• Folding Method
• Mid-square Method
• Well-known hash functions include
• MD5
• SHA-256
• CRC32
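Of the well-known functions listed, CRC32 is short enough to sketch. Below is a bit-by-bit C++ version of the common CRC-32 variant (the one used by zlib and Ethernet): reflected polynomial 0xEDB88320, initial value 0xFFFFFFFF, and a final XOR with 0xFFFFFFFF. The function name `crc32` is our own. Production code usually uses a table-driven or hardware version, and MD5 and SHA-256 are normally taken from a cryptographic library rather than written by hand.

```cpp
#include <cstdint>
#include <string>

// Bit-by-bit CRC-32 (reflected polynomial 0xEDB88320).
// Short but slow; real implementations use lookup tables.
std::uint32_t crc32(const std::string& data) {
    std::uint32_t crc = 0xFFFFFFFFu;           // initial value
    for (unsigned char byte : data) {
        crc ^= byte;
        for (int bit = 0; bit < 8; ++bit) {
            // Shift right; if the bit shifted out was 1, XOR in the polynomial.
            if (crc & 1u) crc = (crc >> 1) ^ 0xEDB88320u;
            else          crc >>= 1;
        }
    }
    return crc ^ 0xFFFFFFFFu;                  // final XOR
}
```

The standard check value is `crc32("123456789") == 0xCBF43926`. To use it for a hash table, reduce it to a slot with `crc32(key) % tableSize`.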
Division Method
• It is the simplest one, and we have already used it.
• We take the remainder of dividing the key by a constant m, which is often the size of the hash table: $h(k) = k \bmod m$.
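Here is a one-function C++ sketch of the division method. The name `divisionHash` is illustrative. The `+ m` step is our addition: it keeps the result in [0, m-1] for negative keys, because C++'s `%` takes the sign of the dividend.

```cpp
#include <cstdint>

// Division method: h(k) = k mod m, with m usually the table size (m > 0).
// ((k % m) + m) % m maps negative keys into [0, m-1] as well.
std::int64_t divisionHash(std::int64_t k, std::int64_t m) {
    return ((k % m) + m) % m;
}
```

A commonly recommended choice for m is a prime not too close to a power of two, so that the hash depends on all digits of the key rather than only the last few.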
Folding Method
• The key is divided into several parts.
• These parts are combined or folded together and transformed to create
the target address.
• Two types of folding are available
• Shift Folding
• Boundary Folding
Shift Folding
• The number is first divided into several parts.
• The parts are simply added.
• The summed-up value is further adjusted to ensure it falls within the range.
• Example:
• Suppose the number is 233251948124 and the table size is 1000.
• We first divide the number into 3-digit parts.
• 233, 251, 948, 124.
• Then we add these values.
• 233+251+948+124 = 1556
• Finally, we take the remainder modulo the table size.
• 1556 % 1000 = 556.
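The shift-folding steps can be sketched in C++ as follows. The sketch assumes the key is given as a string of decimal digits (so keys longer than a 64-bit integer still work) and is cut from the left into parts of `partLen` digits; the last part may be shorter. The name `shiftFold` and its parameters are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Shift folding: split the digits into parts of partLen digits (from the
// left), add the parts, then reduce modulo the table size.
long long shiftFold(const std::string& digits, std::size_t partLen,
                    long long tableSize) {
    long long sum = 0;
    for (std::size_t i = 0; i < digits.size(); i += partLen) {
        sum += std::stoll(digits.substr(i, partLen));  // add each part
    }
    return sum % tableSize;
}
```

`shiftFold("233251948124", 3, 1000)` adds 233, 251, 948, and 124 to get 1556, and returns 556, matching the example.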
Boundary Folding
• Here, the key is viewed as written on a strip of paper
• that is folded at the borders between the different parts of the key.
• As a result, every other part ends up in reverse order.
• Let’s consider the same number 233251948124.
• At first, we divide it.
• 233, 251, 948, 124.
• Then we reverse every second part (the 2nd and the 4th).
• 233, 152, 948, 421.
• Then we add the parts and take the remainder again.
• (233+152+948+421)%1000 = (1754)%1000 = 754
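Boundary folding differs from shift folding only in reversing every second part before adding. This C++ sketch makes the same assumptions as a digit-string key split from the left into `partLen`-digit parts. The name `boundaryFold` is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Boundary folding: like shift folding, but the 2nd, 4th, ... parts are
// reversed (as if the paper were folded) before they are added.
long long boundaryFold(const std::string& digits, std::size_t partLen,
                       long long tableSize) {
    long long sum = 0;
    std::size_t partIndex = 0;
    for (std::size_t i = 0; i < digits.size(); i += partLen, ++partIndex) {
        std::string part = digits.substr(i, partLen);
        if (partIndex % 2 == 1) {                 // every second part
            std::reverse(part.begin(), part.end());
        }
        sum += std::stoll(part);
    }
    return sum % tableSize;
}
```

`boundaryFold("233251948124", 3, 1000)` adds 233, 152, 948, and 421 to get 1754, and returns 754, matching the example.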
Mid-Square Method
• This method works in two steps.
• Square the key k.
• Extract the middle r digits of $k^2$ as the hash value (r = 3 for a table of size 1000).
• Suppose the key is 3121.
• Then $3121^2 = 9740641$.
• If the table size is 1000, then $h(3121) = 406$, as 406 is the middle part of 9740641.
• Working on the binary representation instead (with a table size that is a power of 2, taking the middle bits of the square) changes the result, but it is faster because the bits can be extracted with shifts and masks.
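Here are C++ sketches of both variants. `midSquareDecimal` keeps the middle r decimal digits of k². `midSquareBits` keeps the middle r bits for a table of size 2^r. Both names are hypothetical. "Middle" is measured on the actual length of the square, and when the extra digits or bits cannot be split evenly, the sketch drops one more on the right (low) side. These are our own conventions.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Decimal mid-square: square k and keep the middle r digits.
// Assumes k * k fits in a signed 64-bit integer.
std::int64_t midSquareDecimal(std::int64_t k, std::size_t r) {
    std::string s = std::to_string(k * k);
    if (s.size() <= r) return std::stoll(s);        // too short: use it all
    std::size_t start = (s.size() - r) / 2;         // digits dropped on the left
    return std::stoll(s.substr(start, r));
}

// Binary mid-square for a table of size 2^r (1 <= r < 64):
// keep the middle r bits of k * k. Assumes k < 2^32 so the square fits.
std::uint64_t midSquareBits(std::uint64_t k, unsigned r) {
    std::uint64_t sq = k * k;
    unsigned len = 0;                               // bit length of sq
    for (std::uint64_t t = sq; t != 0; t >>= 1) ++len;
    if (len <= r) return sq;                        // already below 2^r
    unsigned shift = (len - r + 1) / 2;             // low bits to drop
    return (sq >> shift) & ((std::uint64_t{1} << r) - 1);  // keep r bits
}
```

`midSquareDecimal(3121, 3)` returns 406, as in the example. `midSquareBits(3121, 10)` works on the 24-bit square 9740641: it drops the 7 low bits and masks 10 bits, giving 322. This is a different value, as the slide notes, but it needs only a shift and a mask.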
Thank You.