DS - Unit 5 - Notes
In universities, each student is assigned a unique roll number that can be used to retrieve
information about them.
In libraries, each book is assigned a unique number that can be used to determine information
about the book, such as its exact position in the library or the users it has been issued to etc.
In both these examples, the students and books are hashed to a unique number. Assume
that you have an object and you want to assign a key to it to make searching easy. To store the
key/value pair, you can use a simple array-like data structure in which the keys (integers) are used
directly as indexes to store the values. However, when the keys are large and cannot be
used directly as indexes, you should use hashing.
In hashing, large keys are converted into small keys by using hash functions. The values
are then stored in a data structure called a hash table. The idea of hashing is to distribute entries
(key/value pairs) uniformly across an array. Each element is assigned a key (the converted key), and
by using that key you can access the element in O(1) time on average. From the key, the hash
function computes an index that indicates where an entry can be found or inserted.
Hashing is implemented in two steps:
1. An element is converted into an integer by using a hash function. This integer can be used
as an index to store the original element in the hash table.
2. The element is stored in the hash table, where it can be quickly retrieved using the hashed key.
hash = hashfunc(key)
index = hash % array_size
In this method, the hash is independent of the array size and it is then reduced to an index (a
number between 0 and array_size − 1) by using the modulo operator (%).
The efficiency of mapping depends on the efficiency of the hash function used.
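The two steps above can be sketched in C as follows. This is a minimal illustration, not a prescribed implementation: the body of hashfunc (a multiplicative mix with a commonly used constant) and the helper name index_for are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Step 1 (assumed example): turn an integer key into a large hash value.
   The multiplier is a commonly used constant, chosen only for illustration. */
unsigned long hashfunc(unsigned long key)
{
    return key * 2654435761UL;   /* unsigned arithmetic wraps around safely */
}

/* Step 2: reduce the hash to an index between 0 and array_size - 1. */
size_t index_for(unsigned long key, size_t array_size)
{
    assert(array_size > 0);      /* modulo by zero is undefined */
    unsigned long hash = hashfunc(key);
    return (size_t)(hash % array_size);
}
```

Because the hash is computed independently of the array size, the same hashfunc can be reused when the array is resized; only the modulo step changes.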
Hash Table: A hash table is a widely used, efficient data structure that stores data which
can be searched in constant time, O(1), on average. It is also referred to as a hash map. This data
structure is implemented over an array and maps keys to values. Hence, a hash map can be seen as
a set of key/value pairs. Each key is mapped to an index in the range 0 to array size − 1 by
a “hash function”. A good example of a hash table is a phone book. A phone book has names and
phone numbers. In this case, the names are the keys and the phone numbers are the values.
There are four key components in a hash table:
a hash function: maps data to an integer index
collision handling: how to handle two data items that map to the same index
put: how to add data to the hash table in O(1) time
get: how to retrieve data from the hash table in O(1) time
DrAA Unit-5 1
II B.Tech – I Sem Data Structures Dept. of AI
2. Hash function:
A hash function is any function that can be used to map data of arbitrary size to
data of a fixed size, which is then used as an index into the hash table. The values returned by a
hash function are called hash values, hash codes, hash sums, or simply hashes.
Let us understand the need for a good hash function. Assume that you have to store the strings
{“abcdef”, “bcdefa”, “cdefab”, “defabc”} in the hash table by using the hashing technique.
To compute the index for storing the strings, use a hash function that states the following:
The index for a specific string is equal to the sum of the ASCII values of its characters
modulo 599. As 599 is a prime number, it reduces the possibility of different strings mapping to
the same index (collisions). It is recommended that you use a prime number as the modulus. The
ASCII values of a, b, c, d, e, and f are 97, 98, 99, 100, 101, and 102 respectively. Since all the
strings contain the same characters in different permutations, the sum is always 597, and
597 % 599 = 597.
The hash function therefore computes the same index for all the strings, and the strings are
stored at that single index of the hash table. As the index of all the strings is the same, you can
create a list at that index and insert all the strings into that list.
Here, it takes O(n) time (where n is the number of strings) to access a specific string. This
shows that the hash function is not a good hash function. Let's try a different hash function: the
index for a specific string is equal to the sum of the ASCII values of its characters, each
multiplied by its position in the string, taken modulo 2069 (a prime number).
String    Hash function                                           Index
abcdef    (97*1 + 98*2 + 99*3 + 100*4 + 101*5 + 102*6) % 2069     38
bcdefa    (98*1 + 99*2 + 100*3 + 101*4 + 102*5 + 97*6) % 2069     23
cdefab    (99*1 + 100*2 + 101*3 + 102*4 + 97*5 + 98*6) % 2069     14
defabc    (100*1 + 101*2 + 102*3 + 97*4 + 98*5 + 99*6) % 2069     11
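The two hash functions compared in this section can be sketched in C as follows. The function names sum_hash and weighted_hash are made up for this example, and the code assumes an ASCII execution character set.

```c
#include <assert.h>
#include <stddef.h>

/* First attempt: plain sum of the character codes, modulo 599. */
unsigned sum_hash(const char *s)
{
    assert(s != NULL);
    unsigned sum = 0;
    for (size_t i = 0; s[i] != '\0'; i++)
        sum += (unsigned char)s[i];
    return sum % 599;
}

/* Second attempt: each character code is multiplied by its 1-based
   position in the string before summing, then reduced modulo 2069. */
unsigned weighted_hash(const char *s)
{
    assert(s != NULL);
    unsigned sum = 0;
    for (size_t i = 0; s[i] != '\0'; i++)
        sum += (unsigned char)s[i] * (unsigned)(i + 1);
    return sum % 2069;
}
```

For the four strings above, sum_hash returns 597 every time, while weighted_hash returns 38, 23, 14 and 11, so the position-sensitive function spreads the permutations over different indexes.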
A hash function is used to generate the index (hash code) of the key in the array. Ideally,
each index a hash function generates should be unique, but in practice this is extremely difficult
to achieve. For example, a default hash code may be generated from the object's address. If we need
to compare other characteristics of two objects, such as two people's names and ages, we need to
write a new hash function that generates the hash code from the name and age.
Properties:
A hash function always returns a number.
Two equal objects (as determined by the equal() method) always have the same hash code.
Two different objects do not necessarily have different hash codes; when they share one, a collision occurs.
Here is the procedure for storing objects in the array using a hash function:
Here we use a hash function that generates indexes from the remainder of the key after division
by the table size. So, 15 is stored at index 1 and 93 at index 2. But if there is also a 79, which
has remainder 2 as well, a collision happens.
Hash Table:
Index    0      1      2      3      4      5      6
Value    63     15     93     73     101    40     55
                       79
To perform a search: we use the hash function to determine which list to traverse, and
then search that linked list.
To perform an insert: depending on whether duplicates are accepted, we first check the target list
to see whether the element we want to insert already exists. We then insert the element at the
front of the list because, first, it is convenient and, second, recently inserted elements are more
likely to be needed in the near future.
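The search and insert steps described above can be sketched in C with one linked list per table slot. This is only an illustration: the table size of 7, the names chain_table, chain_search and chain_insert, and the decision to reject duplicates are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define CHAIN_SIZE 7   /* assumed table size for this sketch */

struct chain_node {
    int key;
    struct chain_node *next;
};

/* One list head per slot; static storage starts out as all NULL (empty). */
static struct chain_node *chain_table[CHAIN_SIZE];

/* Hash function: remainder of the key after division by the table size. */
static size_t chain_slot(int key)
{
    assert(key >= 0);   /* this sketch handles non-negative keys only */
    return (size_t)key % CHAIN_SIZE;
}

/* Search: pick the list with the hash function, then walk that list. */
bool chain_search(int key)
{
    for (struct chain_node *n = chain_table[chain_slot(key)]; n != NULL; n = n->next)
        if (n->key == key)
            return true;
    return false;
}

/* Insert at the front of the target list; duplicates are rejected here. */
bool chain_insert(int key)
{
    if (chain_search(key))
        return false;
    struct chain_node *n = malloc(sizeof *n);
    if (n == NULL)
        return false;
    size_t slot = chain_slot(key);
    n->key = key;
    n->next = chain_table[slot];   /* new node points at the old front */
    chain_table[slot] = n;         /* new node becomes the front */
    return true;
}
```

Inserting 93 and then 79 places both in the list at index 2, with 79 at the front because it was inserted last.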
b. Quadratic Probing
Quadratic probing is an open addressing scheme for resolving hash collisions in hash
tables. Quadratic probing operates by taking the original hash index and adding successive
values of an arbitrary quadratic polynomial until an open slot is found.
How it works: it is a collision handling method that eliminates the primary clustering problem.
As the name suggests, this method uses a quadratic collision function. A common choice is
f(i) = i^2, where i is the number of the probe attempt (i = 1, 2, 3, ...).
In this process, secondary clustering can happen: keys that hash to the same position follow the
same probe sequence, so the slots along that sequence fill up. For example, if the primary hash
index is x, the probing goes x+1, x+4, x+9 and so on. Secondary clustering is less harmful to
performance than primary clustering.
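A minimal C sketch of insertion with quadratic probing, using f(i) = i^2. The table size of 11, the QP_EMPTY marker and the function names are assumptions made for this example, and keys are assumed to be non-negative integers hashed with key % table size.

```c
#include <assert.h>
#include <stddef.h>

#define QP_SIZE 11      /* assumed table size (a prime) */
#define QP_EMPTY (-1)   /* assumed marker for an unused slot */

/* Mark every slot as unused. */
void qp_init(int slots[QP_SIZE])
{
    for (size_t i = 0; i < QP_SIZE; i++)
        slots[i] = QP_EMPTY;
}

/* Probe x, x+1, x+4, x+9, ... (all modulo QP_SIZE), where x = key % QP_SIZE.
   Returns the slot that was used, or -1 if no free slot was found. */
int qp_insert(int slots[QP_SIZE], int key)
{
    assert(key >= 0);
    size_t x = (size_t)key % QP_SIZE;
    for (size_t i = 0; i < QP_SIZE; i++) {
        size_t pos = (x + i * i) % QP_SIZE;
        if (slots[pos] == QP_EMPTY) {
            slots[pos] = key;
            return (int)pos;
        }
    }
    return -1;
}
```

With this table, the keys 5, 16, 27 and 38 all hash to index 5 and end up in slots 5, 6, 9 and 3 respectively. Quadratic probing does not visit every slot, so an insert can fail before the table is completely full; a prime table size kept at most half full avoids this.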
5. Double Hashing
How it works: this method requires a second hash function to resolve collisions. When a collision
happens, this method uses the second hash function to generate a new hash code for the object,
which determines how far to move from the original index on each probe. A common choice is
hash2(key) = R - (key % R), where R is a prime smaller than the table size.
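A minimal C sketch of insertion with double hashing. It assumes the common formulation in which the i-th probe goes to (hash1(key) + i * hash2(key)) % table size, with hash1(key) = key % table size and the second hash R - (key % R) from the text. The table size 11, R = 7, the DH_EMPTY marker and the function names are chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define DH_SIZE 11      /* assumed table size (a prime) */
#define DH_R 7          /* assumed prime smaller than the table size */
#define DH_EMPTY (-1)   /* assumed marker for an unused slot */

/* Mark every slot as unused. */
void dh_init(int slots[DH_SIZE])
{
    for (size_t i = 0; i < DH_SIZE; i++)
        slots[i] = DH_EMPTY;
}

/* Second hash function: always between 1 and R, so the step is never 0. */
size_t dh_step(int key)
{
    assert(key >= 0);
    return DH_R - ((size_t)key % DH_R);
}

/* Probe start, start+step, start+2*step, ... (all modulo DH_SIZE).
   Returns the slot that was used, or -1 if no free slot was found. */
int dh_insert(int slots[DH_SIZE], int key)
{
    assert(key >= 0);
    size_t start = (size_t)key % DH_SIZE;
    size_t step = dh_step(key);
    for (size_t i = 0; i < DH_SIZE; i++) {
        size_t pos = (start + i * step) % DH_SIZE;
        if (slots[pos] == DH_EMPTY) {
            slots[pos] = key;
            return (int)pos;
        }
    }
    return -1;
}
```

The keys 5, 16 and 27 all start at index 5, but their step sizes differ (2, 5 and 1), so 16 lands in slot 10 and 27 in slot 6 instead of following one shared probe sequence.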
6. Rehashing
Rehashing involves increasing the size of the underlying data structure and mapping the existing
items to new bucket locations.
How it works: when an insert causes the number of entries in a hash table to exceed the
product of the load factor and the current capacity, the hash table needs to be rehashed.
When should we rehash?
When the table is half full.
When an insertion fails.
When the load reaches a certain level.
Rehashing example:
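As a rough sketch in C, rehashing can be written for an open-addressing table with linear probing. The load factor of 0.5, the choice to double the capacity (implementations often pick the next prime instead), the RH_EMPTY marker and all names are assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define RH_EMPTY (-1)        /* assumed marker for an unused slot */
#define RH_LOAD_FACTOR 0.5   /* assumed load factor */

struct rh_table {
    int *slots;
    size_t capacity;   /* number of slots */
    size_t count;      /* number of stored keys */
};

/* Allocate a table with every slot marked empty. */
bool rh_init(struct rh_table *t, size_t capacity)
{
    assert(t != NULL && capacity > 0);
    t->slots = malloc(capacity * sizeof *t->slots);
    if (t->slots == NULL)
        return false;
    for (size_t i = 0; i < capacity; i++)
        t->slots[i] = RH_EMPTY;
    t->capacity = capacity;
    t->count = 0;
    return true;
}

/* Store a key using linear probing; the caller guarantees a free slot. */
static void rh_place(struct rh_table *t, int key)
{
    size_t pos = (size_t)key % t->capacity;
    while (t->slots[pos] != RH_EMPTY)
        pos = (pos + 1) % t->capacity;
    t->slots[pos] = key;
    t->count++;
}

/* Double the capacity and map every existing key to its new location. */
static bool rh_rehash(struct rh_table *t)
{
    struct rh_table bigger;
    if (!rh_init(&bigger, t->capacity * 2))
        return false;
    for (size_t i = 0; i < t->capacity; i++)
        if (t->slots[i] != RH_EMPTY)
            rh_place(&bigger, t->slots[i]);
    free(t->slots);
    *t = bigger;
    return true;
}

/* Insert a key; rehash first if the new count would exceed
   load factor * current capacity. */
bool rh_insert(struct rh_table *t, int key)
{
    assert(t != NULL && key >= 0);
    if ((double)(t->count + 1) > RH_LOAD_FACTOR * (double)t->capacity)
        if (!rh_rehash(t))
            return false;
    rh_place(t, key);
    return true;
}
```

Starting from a capacity of 4, inserting 1 and 5 fits under the threshold of 2 entries. Inserting 9 would exceed it, so the table grows to 8 slots first; 1 and 5 move to slots 1 and 5, and 9 is then placed in slot 2.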
7. Extendible Hashing
Extendible hashing is a dynamic hashing method in which directories and buckets are used
to hash data. It is a highly flexible method in which the hash function also changes
dynamically.
Main features of extendible hashing:
Directories: The directories store the addresses of the buckets as pointers. An id is assigned to
each directory, and it may change each time directory expansion takes place.
Buckets: The buckets store the actual hashed data.
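A partial C sketch of the directory side of extendible hashing. It assumes the usual convention that the directory id is formed from the low-order bits of the hash value, with the number of bits called the global depth; that term, the bucket capacity of 4 and all names are assumptions not taken from the text. Bucket splitting is left out.

```c
#include <assert.h>
#include <stdlib.h>

struct eh_bucket {
    int keys[4];   /* assumed bucket capacity of 4 keys */
    int count;
};

struct eh_directory {
    unsigned global_depth;        /* number of hash bits used as the directory id */
    struct eh_bucket **entries;   /* 2^global_depth pointers to buckets */
};

/* Directory id for a hash value: its low-order global_depth bits. */
size_t eh_dir_id(const struct eh_directory *d, unsigned hash)
{
    assert(d != NULL && d->global_depth < 16);
    return hash & ((1u << d->global_depth) - 1u);
}

/* Directory expansion: double the directory. Each new entry starts out
   pointing at the same bucket as the old entry that shares its low bits.
   Returns 1 on success, 0 if memory could not be allocated. */
int eh_expand(struct eh_directory *d)
{
    assert(d != NULL && d->global_depth < 15);
    size_t old_n = (size_t)1 << d->global_depth;
    struct eh_bucket **grown = realloc(d->entries, 2 * old_n * sizeof *grown);
    if (grown == NULL)
        return 0;
    for (size_t i = 0; i < old_n; i++)
        grown[old_n + i] = grown[i];
    d->entries = grown;
    d->global_depth++;
    return 1;
}
```

After an expansion the same hash value can map to a different directory id, because one more bit is used; for example, the hash value 6 has id 0 at global depth 1 and id 2 at global depth 2.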
8. Implementation of Dictionaries
A dictionary is a general-purpose data structure for storing a group of objects. A dictionary
is an ordered or unordered list of elements. Each element is a pair of key and value. A value is
associated with the corresponding key. When presented with a key, the dictionary will simply
return the associated value. A dictionary is also known as an associative array or a map.
A dictionary holds a set of {key, value} pairs
Example 1:
Key          Value
FirstName    Mahesh
LastName     Babu
Address      Hyderabad
Age          45
Example 2: The results of a classroom test could be represented as a dictionary with students'
names as keys and their scores as the values:
results = { 'Sachin' : 65, 'Dhoni' : 70, 'Kohili' : 55, 'Irfan' : 50, 'Raina' : 40 }
Example: Consider an empty unordered dictionary and the following set of operations:
Operation          Dictionary                                   Output
insertItem(5,A)    {(5,A)}
insertItem(7,B)    {(5,A), (7,B)}
insertItem(2,C)    {(5,A), (7,B), (2,C)}
insertItem(8,D)    {(5,A), (7,B), (2,C), (8,D)}
insertItem(2,E)    {(5,A), (7,B), (2,C), (8,D), (2,E)}
findItem(7)        {(5,A), (7,B), (2,C), (8,D), (2,E)}          B
findItem(4)        {(5,A), (7,B), (2,C), (8,D), (2,E)}          NO_SUCH_KEY
findItem(2)        {(5,A), (7,B), (2,C), (8,D), (2,E)}          C
findAllItems(2)    {(5,A), (7,B), (2,C), (8,D), (2,E)}          C, E
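The operations in this example can be sketched in C with an unordered, array-backed dictionary. The fixed capacity of 16, the use of single characters as values, the '\0' sentinel for NO_SUCH_KEY and the exact function signatures are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define DICT_CAP 16        /* assumed fixed capacity for this sketch */
#define NO_SUCH_KEY '\0'   /* assumed sentinel returned when a key is absent */

struct item {
    int key;
    char value;
};

struct dict {
    struct item items[DICT_CAP];
    size_t size;           /* number of items currently stored */
};

/* Append a pair at the end; duplicate keys are allowed.
   Returns 1 on success, 0 if the dictionary is full. */
int insertItem(struct dict *d, int key, char value)
{
    assert(d != NULL);
    if (d->size == DICT_CAP)
        return 0;
    d->items[d->size].key = key;
    d->items[d->size].value = value;
    d->size++;
    return 1;
}

/* Return the value of the first item with this key, or NO_SUCH_KEY. */
char findItem(const struct dict *d, int key)
{
    assert(d != NULL);
    for (size_t i = 0; i < d->size; i++)
        if (d->items[i].key == key)
            return d->items[i].value;
    return NO_SUCH_KEY;
}

/* Copy the values of all items with this key into out (at most out_cap
   of them) and return how many were copied. */
size_t findAllItems(const struct dict *d, int key, char *out, size_t out_cap)
{
    assert(d != NULL && out != NULL);
    size_t found = 0;
    for (size_t i = 0; i < d->size && found < out_cap; i++)
        if (d->items[i].key == key)
            out[found++] = d->items[i].value;
    return found;
}
```

Because the items are unordered, every lookup scans the array from the start, which is why findItem(2) returns the first match, C, while findAllItems(2) collects both C and E.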
[1 | 50] -> [2 | 60] -> [4 | 70] -> [6 | 80] -> NULL
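The pseudo code of a linear list for representing a dictionary:
#include <stddef.h>

struct node
{
    int key;              /* search key */
    int value;            /* value associated with the key */
    struct node *next;    /* address of the next node, or NULL at the end */
} *head = NULL;           /* an empty dictionary has a NULL head */

void insert(int key, int value);   /* add a (key, value) pair */
void delete(int key);              /* remove the node with the given key */
void search(int key);              /* look up the value stored for a key */
void display(void);                /* print every (key, value) pair */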
Insertion: Assume the dictionary is initially empty, which means that head is NULL. We
create a new node with key = 1 and value = 50.
Key value nextAddress
HEAD -> [1 | 50 | NULL]
To insert another record, such as key = 2 and value = 60, we create a new node and link it
after the first one:
Key value nextAddress
HEAD -> [1 | 50] -> [2 | 60 | NULL]
Note: Head(Key) < New(Key), so the new node is placed after the head and the list stays in
ascending key order.
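A minimal C sketch of this insertion, assuming the list is kept in ascending key order as the note suggests. The function name insert_sorted and its pointer-to-head parameter are assumptions made for the example; the struct repeats the node layout from the pseudo code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct node {
    int key;
    int value;
    struct node *next;
};

/* Insert (key, value) so that the keys stay in ascending order.
   head points at the list's head pointer, which is NULL for an empty list.
   Returns false only if memory could not be allocated. */
bool insert_sorted(struct node **head, int key, int value)
{
    assert(head != NULL);
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return false;
    n->key = key;
    n->value = value;

    /* Walk the chain of next pointers until the first node whose key
       is not smaller than the new key (or the end of the list). */
    struct node **link = head;
    while (*link != NULL && (*link)->key < key)
        link = &(*link)->next;

    n->next = *link;   /* new node points at the rest of the list */
    *link = n;         /* previous pointer now points at the new node */
    return true;
}
```

Starting from an empty list, inserting (1, 50), (2, 60), (6, 80) and then (4, 70) produces the order 1, 2, 4, 6, because the last node is linked in between keys 2 and 6.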
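Deletion: Assume the dictionary is not empty, which means that head is not NULL. We
delete an existing node with a given key.
Key value nextAddress
[1 | 50] -> [2 | 60] -> [4 | 70] -> [6 | 80] -> NULL
Suppose we want to delete the middle node with key 4 from the list. The next address of the
node with key 2 is changed to point to the node with key 6, and the node with key 4 is freed:
[1 | 50] -> [2 | 60] -> [6 | 80] -> NULL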
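A minimal C sketch of deleting a node by key from the linked list. The names delete_key and push_front are assumptions made for the example, and push_front exists only to build a small list to delete from.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct node {
    int key;
    int value;
    struct node *next;
};

/* Helper for building a list: add a node at the front. */
bool push_front(struct node **head, int key, int value)
{
    assert(head != NULL);
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return false;
    n->key = key;
    n->value = value;
    n->next = *head;
    *head = n;
    return true;
}

/* Unlink and free the first node with the given key.
   Returns false if no node has that key. */
bool delete_key(struct node **head, int key)
{
    assert(head != NULL);
    struct node **link = head;
    while (*link != NULL && (*link)->key != key)
        link = &(*link)->next;
    if (*link == NULL)
        return false;              /* key not found */
    struct node *victim = *link;
    *link = victim->next;          /* bypass the node being deleted */
    free(victim);
    return true;
}
```

Working through a pointer to the link rather than a pointer to the node means that deleting the head, a middle node or the last node all take the same code path.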