0% found this document useful (0 votes)
15 views

DSA_M5

Uploaded by

debifiw316
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
15 views

DSA_M5

Uploaded by

debifiw316
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 38

Hashing

https://www.geeksforgeeks.org/hashing-data-structure/

Hashing is a technique used in data structures that efficiently

stores and retrieves data in a way that allows for quick access.

 Hashing involves mapping data to a specific index in a

hash table (an array of items) using a hash function that

enables fast retrieval of information based on its key.

 The great thing about hashing is, we can achieve all

three operations (search, insert and delete) in O(1) time

on average.

 Hashing is mainly used to implement a set of distinct

items and dictionaries (key value pairs).


Basics of Hashing in Data Structure

 Introduction

 Applications

 Index Mapping (or Trivial Hashing)

 Separate Chaining for Collision Handling

 Open Addressing for Collision Handling

 Double Hashing

 Load Factor and Rehashing

What is Hashing?

Hashing in Data Structures refers to the process of transforming

a given key to another value. It involves mapping data to a

specific index in a hash table using a hash function that enables

fast retrieval of information based on its key. The

transformation of a key to the corresponding value is done using

a Hash Function and the value obtained from the hash function

is called Hash Code .

Need for Hash data structure


Every day, the data on the internet is increasing multifold and

it is always a struggle to store this data efficiently. In day-to-

day programming, this amount of data might not be that big,

but still, it needs to be stored, accessed, and processed easily

and efficiently. A very common data structure that is used for

such a purpose is the Array data structure.

Now the question arises if Array was already there, what was

the need for a new data structure! The answer to this is in the

word ” efficiency “. Though storing in Array takes O(1) time,

searching in it takes at least O(log n) time. This time appears to

be small, but for a large data set, it can cause a lot of problems

and this, in turn, makes the Array data structure inefficient.

So now we are looking for a data structure that can store the

data and search in it in constant time, i.e. in O(1) time. This is

how Hashing data structure came into play. With the

introduction of the Hash data structure, it is now possible to

easily store data in constant time and retrieve them in constant

time as well.

Components of Hashing
There are majorly three components of hashing:

1. Key: A Key can be anything string or integer which is

fed as input in the hash function the technique that

determines an index or location for storage of an item

in a data structure.

2. Hash Function: The hash function receives the input

key and returns the index of an element in an array

called a hash table. The index is known as the hash

index .

3. Hash Table: Hash table is a data structure that maps

keys to values using a special function called a hash

function. Hash stores the data in an associative manner

in an array where each data value has its own unique

index.
How does Hashing work?

Suppose we have a set of strings {“ab”, “cd”, “efg”} and we

would like to store it in a table.

Our main objective here is to search or update the values stored

in the table quickly in O(1) time and we are not concerned

about the ordering of strings in the table. So the given set of

strings can act as a key and the string itself will act as the value

of the string but how to store the value corresponding to the

key?
 Step 1: We know that hash functions (which is some

mathematical formula) are used to calculate the hash

value which acts as the index of the data structure

where the value will be stored.

 Step 2: So, let’s assign

o “a” = 1,

o “b”=2, .. etc, to all alphabetical characters.

 Step 3: Therefore, the numerical value by summation of

all characters of the string:

 “ab” = 1 + 2 = 3,

 “cd” = 3 + 4 = 7 ,

 “efg” = 5 + 6 + 7 = 18

 Step 4: Now, assume that we have a table of size 7 to

store these strings. The hash function that is used here is

the sum of the characters in key mod Table size . We

can compute the location of the string in the array by

taking the sum(string) mod 7 .

 Step 5: So we will then store

o “ab” in 3 mod 7 = 3,
o “cd” in 7 mod 7 = 0, and

o “efg” in 18 mod 7 = 4.

The above technique enables us to calculate the location of a

given string by using a simple hash function and rapidly find the

value that is stored in that location. Therefore the idea of

hashing seems like a great way to store (key, value) pairs of the

data in a table.

What is a Hash function?

The hash function creates a mapping between key and value,

this is done through the use of mathematical formulas known as

hash functions. The result of the hash function is referred to as

a hash value or hash. The hash value is a representation of the

original string of characters but usually smaller than the

original.
For example: Consider an array as a Map where the key is the

index and the value is the value at that index. So for an array A

if we have index i which will be treated as the key then we can

find the value by simply looking at the value at A[i].

Types of Hash functions:

https://www.geeksforgeeks.org/hash-functions-and-list-

types-of-hash-functions/

This article focuses on discussing different hash functions:

1. Division Method.

2. Multiplication Method

3. Mid-Square Method

4. Folding Method

5. Cryptographic Hash Functions

6. Universal Hashing

7. Perfect Hashing

Let’s begin discussing these methods in detail.

1. Division Method
The division method involves dividing the key by a prime

number and using the remainder as the hash value.

h(k)=k mod m

Where k is the key and 𝑚m is a prime number.

Advantages:

 Simple to implement.

 Works well when 𝑚m is a prime number.

Disadvantages:

 Poor distribution if 𝑚m is not chosen wisely.

2. Multiplication Method

In the multiplication method, a constant 𝐴A (0 < A < 1) is used

to multiply the key. The fractional part of the product is then

multiplied by 𝑚m to get the hash value.

h(k)=⌊m(kAmod1)⌋

Where ⌊ ⌋ denotes the floor function.

Advantages:

 Less sensitive to the choice of 𝑚m.

Disadvantages:

 More complex than the division method.


3. Mid-Square Method

In the mid-square method, the key is squared, and the middle

digits of the result are taken as the hash value.

Steps:

1. Square the key.

2. Extract the middle digits of the squared value.

Advantages:

 Produces a good distribution of hash values.

Disadvantages:

 May require more computational effort.

4. Folding Method

The folding method involves dividing the key into equal parts,

summing the parts, and then taking the modulo with respect to

𝑚m .

Steps:

1. Divide the key into parts.

2. Sum the parts.

3. Take the modulo 𝑚m of the sum.

Advantages:
 Simple and easy to implement.

Disadvantages:

 Depends on the choice of partitioning scheme.

5. Cryptographic Hash Functions

Cryptographic hash functions are designed to be secure and are

used in cryptography. Examples include MD5, SHA-1, and

SHA-256.

Characteristics:

 Pre-image resistance.

 Second pre-image resistance.

 Collision resistance.

Advantages:

 High security.

Disadvantages:

 Computationally intensive.

6. Universal Hashing

Universal hashing uses a family of hash functions to minimize

the chance of collision for any given set of inputs.

h(k)=((a⋅k+b)modp)modm
Where a and b are randomly chosen constants, p is a prime

number greater than m, and k is the key.

Advantages:

 Reduces the probability of collisions.

Disadvantages:

 Requires more computation and storage.

7. Perfect Hashing

Perfect hashing aims to create a collision-free hash function for

a static set of keys. It guarantees that no two keys will hash to

the same value.

Types:

 Minimal Perfect Hashing: Ensures that the range of the

hash function is equal to the number of keys.

 Non-minimal Perfect Hashing: The range may be larger

than the number of keys.

Advantages:

 No collisions.

Disadvantages:

 Complex to construct.
Conclusion

In conclusion, hash functions are very important tools that help

store and find data quickly. Knowing the different types of hash

functions and how to use them correctly is key to making

software work better and more securely. By choosing the right

hash function for the job, developers can greatly improve the

efficiency and reliability of their systems.

Properties of a Good hash function

A hash function that maps every item into its own unique slot is

known as a perfect hash function. We can construct a perfect

hash function if we know the items and the collection will never

change but the problem is that there is no systematic way to

construct a perfect hash function given an arbitrary collection of

items. Fortunately, we will still gain performance efficiency even

if the hash function isn’t perfect. We can achieve a perfect hash

function by increasing the size of the hash table so that every

possible value can be accommodated. As a result, each item will


have a unique slot. Although this approach is feasible for a small

number of items, it is not practical when the number of

possibilities is large.

So, We can construct our hash function to do the same but the

things that we must be careful about while constructing our own

hash function.

A good hash function should have the following properties:

1. Efficiently computable.

2. Should uniformly distribute the keys (Each table position

is equally likely for each.

3. Should minimize collisions.

4. Should have a low load factor(number of items in the

table divided by the size of the table).

Complexity of calculating hash value using the hash

function

 Time complexity: O(n)

 Space complexity: O(1)

Problem with Hashing


If we consider the above example, the hash function we used is

the sum of the letters, but if we examined the hash function

closely then the problem can be easily visualized that for

different strings same hash value is being generated by the hash

function.

For example: {“ab”, “ba”} both have the same hash value, and

string {“cd”,”be”} also generate the same hash value, etc. This is

known as collision and it creates problem in searching, insertion,

deletion, and updating of value.

What is Collision?

Collision in Hashing occurs when two different keys map to the

same hash value. Hash collisions can be intentionally created for

many hash algorithms. The probability of a hash collision

depends on the size of the algorithm, the distribution of hash

values and the efficiency of Hash function.

The hashing process generates a small number for a big key, so

there is a possibility that two keys could produce the same

value. The situation where the newly inserted key maps to an


already occupied, and it must be handled using some collision

handling technology.

How to handle Collisions?

There are mainly two methods to handle collision:

1. Separate Chaining

2. Open Addressing
1) Separate Chaining

The idea is to make each cell of the hash table point to a linked

list of records that have the same hash function value. Chaining

is simple but requires additional memory outside the table.

Example: We have given a hash function and we have to insert

some elements in the hash table using a separate chaining

method for collision resolution technique.

Hash function = key % 5,


Elements = 12, 15, 22, 25 and 37.
Let’s see step by step approach to how to solve the above

problem:
2) Open Addressing

In open addressing, all elements are stored in the hash table

itself. Each table entry contains either a record or NIL. When

searching for an element, we examine the table slots one by one

until the desired element is found or it is clear that the element

is not in the table.

2.a) Linear Probing

In linear probing, the hash table is searched sequentially that

starts from the original location of the hash. If in case the


location that we get is already occupied, then we check for the

next location.

Algorithm:

1. Calculate the hash key. i.e. key = data % size

2. Check, if hashTable[key] is empty

 store the value directly by hashTable[key] = data

3. If the hash index already has some value then

 check for next index using key = (key+1) % size

4. Check, if the next index is available hashTable[key] then

store the value. Otherwise try for next index.

5. Do the above process till we find the space.

Example: Let us consider a simple hash function as “key mod 5”

and a sequence of keys that are to be inserted are 50, 70, 76,

85, 93.
2.b) Quadratic Probing

Quadratic probing is an open addressing scheme in computer

programming for resolving hash collisions in hash tables.

Quadratic probing operates by taking the original hash index

and adding successive values of an arbitrary quadratic

polynomial until an open slot is found.

An example sequence using quadratic probing is:

H+1 2 ,H +2 2 ,H +3 2 ,H +4 2 …………………. H + k 2

This method is also known as the mid-square method because in

this method we look for i 2 ‘th probe (slot) in i’th iteration and

the value of i = 0, 1, . . . n – 1. We always start from the


original hash location. If only the location is occupied then we

check the other slots.

Let hash(x) be the slot index computed using the hash function

and n be the size of the hash table.

If the slot hash(x) % n is full, then we try (hash(x) + 1 2 ) % n.

If (hash(x) + 1 2 ) % n is also full, then we try (hash(x) + 2 2 )%

n.

If (hash(x) + 2 2 ) % n is also full, then we try (hash(x) + 3 2 )%

n.

This process will be repeated for all the values of i until an

empty slot is found

Example: Let us consider table Size = 7, hash function as

Hash(x) = x % 7 and collision resolution strategy to be f(i) = i 2 .

Insert = 22, 30, and 50

https://www.geeksforgeeks.org/introduction-to-hashing-2/

Applications of Hash Data structure

 Hash is used in databases for indexing.


 Hash is used in disk-based data structures.

 In some programming languages like Python, JavaScript

hash is used to implement objects.

Real-Time Applications of Hash Data structure

 Hash is used for cache mapping for fast access to the

data.

 Hash can be used for password verification.

 Hash is used in cryptography as a message digest.

 Rabin-Karp algorithm for pattern matching in a string.

 Calculating the number of different substrings of a

string.

Advantages of Hash Data structure

 Hash provides better synchronization than other data

structures.

 Hash tables are more efficient than search trees or other

data structures

 Hash provides constant time for searching, insertion,

and deletion operations on average.


Disadvantages of Hash Data structure

 Hash is inefficient when there are many collisions.

 Hash collisions are practically not avoided for a large set

of possible keys.

 Hash does not allow null values.

Hashing is a computation technique in which hashing functions take variable-


length data as input and issue a shortened fixed-length data as output. The
output data is often called a "Hash Code", "Key", or simply "Hash". The data
on which hashing works is called a "Data Bucket".

Characteristics of Hashing Technique

Hashing techniques come with the following characteristics −

 The first characteristic is, hashing technique is deterministic. Means, whatever


number of times you invoke the function on the same test variable, it delivers the
same fixed-length result.
 The second characteristic is its unidirectional action. There is no way you can use the
Key to retrieve the original data. Hashing is irreversible.
What are Hash Functions?

Hash functions are mathematical functions that are executed to generate


addresses of data records. Hash functions use memory locations that store
data, called ‘Data Buckets’.

Hash functions are used in cryptographic signatures, securing privacy of


vulnerable data, and verifying correctness of the received files and texts. In
computation, hashing is used in data processing to locate a single string of
data in an array, or to calculate direct addresses of records on the disk by
requesting its Hash Code or Key.

Explore our latest online courses and learn new skills at your own pace. Enroll and become
a certified expert to boost your career.

Applications of Hashing

Hashing is applicable in the following area −

 Password verification
 Associating filename with their paths in operating systems
 Data Structures, where a key-value pair is created in which the key is a unique value,
whereas the value associated with the keys can be either same or different for
different keys.
 Board games such as Chess, tic-tac-toe, etc.
 Graphics processing, where a large amount of data needs to be matched and fetched.

Database Management Systems where phenomenal records are required to be


searched, queried, and matched for retrieval. For example, DBMS used in
banking or large public transport reservation software.

Read through this article to find out more about Hashing and specifically the
difference between two important hashing techniques − static
hashing and dynamic hashing.

What is Static Hashing?

It is a hashing technique that enables users to lookup a definite data set.


Meaning, the data in the directory is not changing, it is "Static" or fixed. In this
hashing technique, the resulting number of data buckets in memory remains
constant.

Operations Provided by Static Hashing

Static hashing provides the following operations −

 Delete − Search a record address and delete a record at the same address or delete a
chunk of records from records for that address in memory.
 Insertion − While entering a new record using static hashing, the hash function (h)
calculates bucket address "h(K)" for the search key (k), where the record is going to
be stored.
 Search − A record can be obtained using a hash function by locating the address of
the bucket where the data is stored.
 Update − It supports updating a record once it is traced in the data bucket.

Advantages of Static Hashing


Static hashing is advantageous in the following ways −

 Offers unparalleled performance for small-size databases.


 Allows Primary Key value to be used as a Hash Key.

Disadvantages of Static Hashing

Static hashing comes with the following disadvantages −

 It cannot work efficiently with the databases that can be scaled.


 It is not a good option for large-size databases.
 Bucket overflow issue occurs if there is more data and less memory.

What is Dynamic Hashing?

It is a hashing technique that enables users to lookup a dynamic data set.


Means, the data set is modified by adding data to or removing the data from,
on demand hence the name ‘Dynamic’ hashing. Thus, the resulting data bucket
keeps increasing or decreasing depending on the number of records.

In this hashing technique, the resulting number of data buckets in memory is


ever-changing.

Operations Provided by Dynamic Hashing

Dynamic hashing provides the following operations −

 Delete − Locate the desired location and support deleting data (or a chunk of data) at
that location.
 Insertion − Support inserting new data into the data bucket if there is a space
available in the data bucket.
 Query − Perform querying to compute the bucket address.
 Update − Perform a query to update the data.

Advantages of Dynamic Hashing


Dynamic hashing is advantageous in the following ways −

 It works well with scalable data.


 It can handle addressing large amount of memory in which data size is always
changing.
 Bucket overflow issue comes rarely or very late.

Disadvantages of Dynamic Hashing

Dynamic hashing comes with the following disadvantage −

 The location of the data in memory keeps changing according to the bucket size. Hence
if there is a phenomenal increase in data, then maintaining the bucket address table
becomes a challenge.

Differences between Static and Dynamic Hashing

Here are some prominent differences by which Static Hashing is different than
Dynamic Hashing −

Key Factor Static Hashing Dynamic Hashing

Form of Variable-size, changing


Fixed-size, non-changing data.
Data data.

The resulting Data Bucket is of The resulting Data Bucket


Result
fixed-length. is of variable-length.

Challenge of Bucket overflow Bucket overflow can occur


Bucket
can arise often depending upon very late or doesn’t occur
Overflow
memory size. at all.

Complexity Simple Complex


Conclusion

Hashing is a computation technique that uses mathematical functions called


Hash Functions to calculate the location (address) of the data in the memory.
We learnt that there are two different hashing functions namely, Static hashing
and Dynamic hashing.

Each hashing technique is different in terms of whether they work on fixed-


length data bucket or a variable-length data bucket. Selecting a proper hashing
technique is required by considering the amount of data needed to be handled,
and the intended speed of the application.

Static Hashing in DBMS




Static hashing refers to a hashing technique that allows the user to search over
a pre-processed dictionary (all elements present in the dictionary are final and
unmodified). In this article, we will take an in-depth look at static hashing in a
DBMS.

What is Static Hashing?

When a search key is specified in a static hash, the hashing algorithm always
returns the same address. For example, if you take the mod-4 hash function,
only 5 values will be produced. For this to work, the output address must
always be the same. The number of buckets at any given time is constant.

The data bucket address obtained by static hashing will always be the same. So,
if we use the mod(5) hash function to get the address of EmpId = 103, we always
get the same data bucket address 3. In this case, the data bucket position
remains unchanged. Therefore, all existing data buckets in memory remain
unchanged while the entire hashing process remains the same. In this case,
there are five partitions in the memory used to hold data.

What are the Operations in Static Hashing?

Searching the data: When data is needed, the same hash function

is used to get the address of the packet where the data is stored.

Inserting Data: When new data is entered into the table, the hash

key is used to create the address of the new data and place the

data there.

Deleting a Record: To delete a record, we must first provide the

record to be destroyed. The data of this address will be deleted

from memory.
Updating a Record: To update the info, we will first find it using

the hash function and then change the profile info. If we want to

add data to the file, but the address of the dataset created by

the hash function is not empty or there is data in the address,

we cannot add information. Bucket overflow is the term used to

describe this situation in static hashing.

There are many ways to solve bucket overflow problem. Here are

some of the most common uses:

Open Hashing

When the hash function returns an address that already contains

information, the next packet is assigned to it through a process

called linear probing. For example, if the new address needed

after input is R4, the hash function will produce 112 as the

address of R4. However, the residence is completely finished; so

the system selects 113 as the next available packet and assigns it

R4.

Close Hashing

When the data group is full, a new bucket will be created for the

same hash result and added after the old group. This method is
called overflow chaining. For example, if the new address to be

added to the file is R4, the hash function will give it address 112.

But the bucket is too full to hold any more information. In this

case, a new bucket will be placed and tied to the end of the 112

bucket.

Advantages of Static Hashing

The advantages of using static hashes in a DBMS are:

 Performance is very good for small data sets.

 Help with storage management.

 Hash keys facilitate access to address information.

 The primary key value can be used by the hash value.

Disadvantages of Static Hashing

Disadvantages of using static hashing techniques in DBMS are:

 Static hashing is not a good choice for big data.

 This process takes longer than usual because the hash

function has to go through all memory locations in the

DBMS system.

 Does not work with extensible databases.


 The sorting technique is not good compared to other

hashing techniques.

Conclusion

Static hashing is one of many other hashing techniques used to

show that data is not stored sequentially and also provides the

correct memory address for each value in the data. Unlike other

hashes, static hash methods can be used for static results without

changing the values of objects, objects, and relational data in

the DBMS.

Frequently Asked Questions on Static Hashing -

FAQs

What is hash collision?

A hash collision occurs when the hash values of two or more files

in the dataset are not assigned to the same location in the hash

table.

What to do with hash conflicts?

There are two methods that can be used to avoid hash collisions

which includes rehashing in which calls the association hash


function, which is used repeatedly until space appears. Second

method is the chaining method creates a linked list of objects

whose keys have the same value. This method should have an

extension for each address.

What is the time complexity of static hashing?

The Modulo Hash Function calculates the hash of key data and

performs a modulo N operation to find the array index (node

identifier) to store or retrieve a key. The time complexity of

finding an identity in the static hash part is constant O(1).

https://quescol.com/data-structure/linear-probing

https://databasetown.com/types-of-hashing-in-dbms-static-

dynamic-hashing/

https://pediaa.com/what-is-the-difference-between-static-

and-dynamic-hashing/

You might also like