22CS302 LM21

The document provides an overview of hash tables, explaining how hashing is used to uniquely identify objects and facilitate quick data retrieval. It discusses the importance of a good hash function, which should be easy to compute, provide uniform distribution, and minimize collisions. Additionally, it illustrates the application of hashing in counting character frequencies in a string, demonstrating improved efficiency compared to traditional methods.

Uploaded by

poojask1636
22XX302 DATA STRUCTURES I

UNIT IV

&
HASH TABLE

1. Basics of Hash Tables

Hashing is a technique used to uniquely identify a specific object within a group of similar
objects. Some examples of how hashing is used in everyday life:
In universities, each student is assigned a unique roll number that can be used to retrieve
information about them.
In libraries, each book is assigned a unique number that can be used to find information
about the book, such as its exact position in the library or the users it has been issued to.
In both these examples, the students and books are hashed to a unique number.
Assume that you have an object and you want to assign a key to it to make searching easy. To
store the key/value pair, you can use a simple array-like data structure where keys (integers)
can be used directly as an index to store values. However, in cases where the keys are large
and cannot be used directly as an index, you should use hashing.
In hashing, large keys are converted into small keys by using hash functions. The values are
then stored in a data structure called a hash table. The idea of hashing is to distribute
entries (key/value pairs) uniformly across an array. Each element is assigned a key (the
converted key). By using that key you can access the element in O(1) time on average. Using
the key, the hash function computes an index that suggests where an entry can be found or
inserted.
Hashing is implemented in two steps:
First, the element is converted into an integer by using a hash function. This integer can be
used as an index at which to store the original element in the hash table.
Second, the element is stored in the hash table, where it can be quickly retrieved using the
hashed key.
hash = hashfunc(key)
index = hash % array_size
In this method, the hash is independent of the array size and it is then reduced to an index (a
number between 0 and array_size − 1) by using the modulo operator (%).
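
The two steps above can be sketched in C++ as follows. This is a minimal illustration that assumes string keys and uses the standard library's std::hash as a stand-in for hashfunc; the helper name indexFor is hypothetical and not part of the original notes.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Step 1: turn the key into a large integer, independent of the table size.
// std::hash is used here as an assumption; its exact values are
// implementation-defined.
std::size_t hashfunc(const std::string& key) {
    return std::hash<std::string>{}(key);
}

// Step 2: reduce the hash to a valid index in [0, array_size - 1].
std::size_t indexFor(const std::string& key, std::size_t array_size) {
    assert(array_size > 0);  // modulo by zero is undefined
    return hashfunc(key) % array_size;
}
```

Calling indexFor twice with the same key and array size always yields the same index, which is what makes retrieval possible.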

2. Hash function

A hash function is any function that can be used to map data of arbitrary size to values of a
fixed size; in a hash table, these values determine where an entry falls in the table. The
values returned by a hash function are called hash values, hash codes, hash sums, or simply
hashes.
To achieve a good hashing mechanism, it is important to have a good hash function with the
following basic requirements:
Easy to compute: It should be easy to compute and must not become an algorithm in itself.
Uniform distribution: It should provide a uniform distribution across the hash table and
should not result in clustering.
Fewer collisions: Collisions occur when two or more elements are mapped to the same hash
value. These should be kept to a minimum.
Note: Irrespective of how good a hash function is, collisions are bound to occur. Therefore, to
maintain the performance of a hash table, it is important to manage collisions through various
collision resolution techniques.

3. Need for a good hash function

Let us understand the need for a good hash function. Assume that you have to store the strings
{“abcdef”, “bcdefa”, “cdefab”, “defabc”} in a hash table.
To compute the index for storing the strings, use a hash function that states the following:
The index for a specific string will be equal to the sum of the ASCII values of the characters
modulo 599.
As 599 is a prime number, it reduces the possibility of different strings being mapped to the
same index (collisions). It is recommended that you use prime numbers as the modulus. The ASCII
values of a, b, c, d, e, and f are 97, 98, 99, 100, 101, and 102 respectively. Since all the
strings contain the same characters in different permutations, the sum is 597 for each of them,
and 597 % 599 = 597.
The hash function therefore computes the same index for all the strings. As the index of all
the strings is the same, you can create a list at that index and insert all the strings in
that list.
Here, it will take O(n) time (where n is the number of strings) to access a specific string. This
shows that the hash function is not a good hash function.
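
The collision can be checked directly. The sketch below implements the sum-of-ASCII-values hash described above; the function name sumHash is an illustrative choice. Because 97 + 98 + 99 + 100 + 101 + 102 = 597 regardless of the order of the characters, every permutation of "abcdef" lands at the same index.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Sum of the character codes of s, reduced modulo table_size.
// Any two strings made of the same characters collide under this hash.
std::size_t sumHash(const std::string& s, std::size_t table_size) {
    assert(table_size > 0);
    std::size_t sum = 0;
    for (unsigned char c : s)
        sum += c;
    return sum % table_size;
}
```

With table_size = 599, all four example strings map to index 597.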
Let’s try a different hash function. The index for a specific string will be equal to the sum
of the ASCII values of its characters, each multiplied by its position in the string (starting
from 1), with the total then taken modulo 2069 (a prime number).
String    Hash function                                            Index
abcdef    (97*1 + 98*2 + 99*3 + 100*4 + 101*5 + 102*6) % 2069      38
bcdefa    (98*1 + 99*2 + 100*3 + 101*4 + 102*5 + 97*6) % 2069      23
cdefab    (99*1 + 100*2 + 101*3 + 102*4 + 97*5 + 98*6) % 2069      14
defabc    (100*1 + 101*2 + 102*3 + 97*4 + 98*5 + 99*6) % 2069      11
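
A short C++ sketch of this position-weighted hash is given below. The name weightedHash is hypothetical; the arithmetic follows the description above, with each character code multiplied by its 1-based position in the string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Sum of (character code * 1-based position), reduced modulo table_size.
// Unlike a plain character sum, this depends on the order of the characters.
std::size_t weightedHash(const std::string& s, std::size_t table_size) {
    assert(table_size > 0);
    std::size_t sum = 0;
    for (std::size_t i = 0; i < s.size(); ++i)
        sum += static_cast<unsigned char>(s[i]) * (i + 1);
    return sum % table_size;
}
```

With table_size = 2069, the four strings map to the distinct indices 38, 23, 14, and 11, so each can be found without scanning a list.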

4. Hash table

A hash table is a data structure that is used to store key/value pairs. It uses a hash
function to compute an index into an array at which an element will be inserted or searched
for. With a good hash function, hashing works well: under reasonable assumptions, the average
time required to search for an element in a hash table is O(1).
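
To make the description concrete, here is a minimal sketch of a hash table with string keys and int values. It is an illustration under stated assumptions, not a definitive implementation: the class name ChainedHashTable and its methods are hypothetical, the hash is the position-weighted character sum from the previous section, and collisions are handled by keeping a list of entries at each index.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

class ChainedHashTable {
public:
    // The default of 2069 slots mirrors the prime used earlier.
    explicit ChainedHashTable(std::size_t slots = 2069) : buckets_(slots) {
        assert(slots > 0);
    }

    // Insert a new key, or overwrite the value of an existing key.
    void put(const std::string& key, int value) {
        auto& bucket = buckets_[indexFor(key)];
        for (auto& entry : bucket) {
            if (entry.first == key) { entry.second = value; return; }
        }
        bucket.emplace_back(key, value);
    }

    // Return a pointer to the stored value, or nullptr if the key is absent.
    const int* find(const std::string& key) const {
        for (const auto& entry : buckets_[indexFor(key)]) {
            if (entry.first == key) return &entry.second;
        }
        return nullptr;
    }

private:
    // Position-weighted character sum, reduced to a slot index.
    std::size_t indexFor(const std::string& key) const {
        std::size_t sum = 0;
        for (std::size_t i = 0; i < key.size(); ++i)
            sum += static_cast<unsigned char>(key[i]) * (i + 1);
        return sum % buckets_.size();
    }

    std::vector<std::list<std::pair<std::string, int>>> buckets_;
};
```

A lookup only scans the list at one index, so as long as the hash function spreads keys evenly and the lists stay short, put and find take O(1) time on average.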
Let us consider a string S. You are required to count the frequency of all the characters in
this string.
string S = "ababcd";
The simplest way to do this is to iterate over all the possible characters and count the
frequency of each one in turn. The time complexity of this approach is O(26*N), where N is the
size of the string and there are 26 possible characters.
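#include <iostream>
#include <string>
using namespace std;

// Brute force: for each of the 26 letters, scan the whole string. O(26*N).
void countFre(string S)
{
    for(char c = 'a'; c <= 'z'; ++c)
    {
        int frequency = 0;
        for(size_t i = 0; i < S.length(); ++i)
            if(S[i] == c)
                frequency++;
        cout << c << ' ' << frequency << endl;
    }
}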
Output
a 2
b 2
c 1
d 1
e 0
f 0
...
z 0
Let us apply hashing to this problem. Take an array Frequency of size 26 and use a hash
function to map each of the 26 characters to an index of the array. Then, iterate over the
string and increment the value in Frequency at the corresponding index for each character. The
complexity of this approach is O(N), where N is the size of the string.
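#include <iostream>
#include <string>
using namespace std;

int Frequency[26];  // global array, so every count starts at zero

// Maps 'a'..'z' to 0..25; assumes lowercase English letters only.
int hashFunc(char c)
{
    return (c - 'a');
}

void countFre(string S)
{
    // One pass over the string: O(N).
    for(size_t i = 0; i < S.length(); ++i)
    {
        int index = hashFunc(S[i]);
        Frequency[index]++;
    }
    for(int i = 0; i < 26; ++i)
        cout << (char)(i + 'a') << ' ' << Frequency[i] << endl;
}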
Output
a 2
b 2
c 1
d 1
e 0
f 0
...
z 0
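
The fixed array works because the keys are known to be the 26 lowercase letters. As a hedged variation that is not part of the original notes, the same count can be kept in the C++ standard library's hash table, std::unordered_map, which accepts any character; the function name countFrequencies is illustrative.

```cpp
#include <string>
#include <unordered_map>

// Count how often each character occurs, for any character set.
// Average cost is O(N) for a string of length N.
std::unordered_map<char, int> countFrequencies(const std::string& s) {
    std::unordered_map<char, int> frequency;
    for (char c : s)
        ++frequency[c];  // operator[] inserts a zero count for unseen keys
    return frequency;
}
```

Unlike the array version, only characters that actually occur in the string get an entry, so letters with a count of zero are simply absent from the map.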
