
DS - Unit 5 - Notes

This document discusses hashing and hash tables. It begins by defining hashing as a technique that maps keys to values in a hash table using a hash function. A good hash function distributes entries uniformly in the hash table and allows elements to be accessed in O(1) time on average. The document then discusses hash tables, which are data structures that implement key-value pairs using a hash function to map keys to array indexes. It notes that the components of a hash table include the hash function, collision handling, insertion, and retrieval operations. The document explains different hash functions and collision handling methods, such as separate chaining using linked lists and open addressing techniques like linear and quadratic probing.

Uploaded by Manikyaraju
Copyright © All Rights Reserved

II B.Tech – I Sem Data Structures Dept. of AI

Unit 5: Hashing and Dictionaries

1. Hashing and Hash Table:


Hashing: Hashing is the technique of mapping keys (and their values) into a hash table by using a hash function. In other words, hashing converts a range of key values into a range of array indexes. It is done for faster access to elements.
Hashing is a technique that is used to uniquely identify a specific object from a group of
similar objects. Some examples of how hashing is used in our lives include:

• In universities, each student is assigned a unique roll number that can be used to retrieve information about them.
• In libraries, each book is assigned a unique number that can be used to determine information about the book, such as its exact position in the library or the users it has been issued to.
In both these examples, the students and books were hashed to a unique number. Assume that you have an object and you want to assign a key to it to make searching easy. To store the key/value pair, you can use a simple array-like data structure in which the keys (integers) are used directly as indexes to store values. However, when the keys are large and cannot be used directly as indexes, you should use hashing.
In hashing, large keys are converted into small keys by using hash functions. The values are then stored in a data structure called a hash table. The idea of hashing is to distribute entries (key/value pairs) uniformly across an array. Each element is assigned a key (the converted key), and by using that key you can access the element in O(1) time on average. Using the key, the algorithm (hash function) computes an index that suggests where an entry can be found or inserted.
Hashing is implemented in two steps:
• An element is converted into an integer by using a hash function. This integer can be used as an index to store the original element in the hash table.
• The element is stored in the hash table, where it can be quickly retrieved using the hashed key.

hash = hashfunc(key)
index = hash % array_size
In this method, the hash is independent of the array size and it is then reduced to an index (a
number between 0 and array_size − 1) by using the modulo operator (%).
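The two steps above can be sketched in C. The string hash here is djb2 (a well-known function attributed to Dan Bernstein), used purely as an example of hashfunc; the names `hashfunc`, `hash_index` and the table size `ARRAY_SIZE` are hypothetical, and any other hash function could be substituted.

```c
#include <stddef.h>

#define ARRAY_SIZE 11  /* hypothetical table size; a prime is a common choice */

/* Step 1: convert the key into a (large) integer hash.
   djb2: start at 5381, then hash = hash * 33 + c for each character. */
unsigned long hashfunc(const char *key)
{
    unsigned long hash = 5381;
    for (; *key != '\0'; key++)
        hash = hash * 33 + (unsigned char)*key;
    return hash;
}

/* Step 2: reduce the hash to an index in [0, ARRAY_SIZE - 1] with modulo */
size_t hash_index(const char *key)
{
    return hashfunc(key) % ARRAY_SIZE;
}
```

Because the hash is computed only from the key, `hash_index("apple")` always yields the same index between 0 and 10, which is what lets a later lookup find the slot used by an earlier insert.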

The efficiency of mapping depends on the efficiency of the hash function used.
Hash Table: A hash table is a widely used, efficient data structure for storing data that can be searched in constant time, O(1), on average. It is also referred to as a hash map or hash set. This data structure is implemented over an array and maps keys to values; hence, a hash map can be seen as a set of key-value pairs. Each key is mapped to an index in the range 0 to array size - 1 by a “hash function”. A good example of a hash table is a phone book. A phone book has names and phone numbers; in this case, the names are the keys and the phone numbers are the values.
There are four key components in a hash table:
• a hash function: to map data to an integer index
• collision handling: how to handle two data items that hash to the same index
• put: how to add data to a hash table in O(1) time
• get: how to retrieve data from a hash table in O(1) time

DrAA Unit-5 1

2. Hash function:
A hash function is any function that can be used to map a data set of an arbitrary size to a
data set of a fixed size, which falls into the hash table. The values returned by a hash function are
called hash values, hash codes, hash sums, or simply hashes.

Let us understand the need for a good hash function. Assume that you have to store the strings {“abcdef”, “bcdefa”, “cdefab”, “defabc”} in a hash table by using hashing.

To compute the index for storing the strings, use a hash function that states the following:

The index for a specific string will be equal to the sum of the ASCII values of its characters modulo 599. As 599 is a prime number, it reduces the possibility of different strings being mapped to the same index (collisions). It is recommended that you use prime numbers as the modulus. The ASCII values of a, b, c, d, e, and f are 97, 98, 99, 100, 101, and 102 respectively. Since all the strings contain the same characters in different permutations, the sum is 597 for every string, so each one gets the index 597 % 599 = 597.

The hash function will compute the same index for all the strings, so all of them are placed at the same position in the hash table. As the index of all the strings is the same, you can create a list at that index and insert all the strings in that list.

Here, it will take O(n) time (where n is the number of strings) to access a specific string. This shows that the hash function is not a good hash function. Let's try a different hash function. The index for a specific string will be equal to the sum of the ASCII values of its characters, each multiplied by its position in the string (starting from 1), taken modulo 2069 (a prime number).
String   Hash function                                           Index
abcdef   (97×1 + 98×2 + 99×3 + 100×4 + 101×5 + 102×6) % 2069     38
bcdefa   (98×1 + 99×2 + 100×3 + 101×4 + 102×5 + 97×6) % 2069     23
cdefab   (99×1 + 100×2 + 101×3 + 102×4 + 97×5 + 98×6) % 2069     14
defabc   (100×1 + 101×2 + 102×3 + 97×4 + 98×5 + 99×6) % 2069     11
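A short C sketch can reproduce both hash functions from this example and confirm the indexes in the table above. The function names are hypothetical, and the strings are assumed to contain plain ASCII characters.

```c
#include <stddef.h>

/* First hash function: sum of ASCII values, modulo 599 */
unsigned ascii_sum_index(const char *s)
{
    unsigned sum = 0;
    for (; *s != '\0'; s++)
        sum += (unsigned char)*s;
    return sum % 599;
}

/* Second hash function: each ASCII value times its 1-based position, modulo 2069 */
unsigned weighted_sum_index(const char *s)
{
    unsigned sum = 0;
    for (size_t i = 0; s[i] != '\0'; i++)
        sum += (unsigned char)s[i] * (unsigned)(i + 1);
    return sum % 2069;
}
```

With the first function all four strings give index 597; with the second they give 38, 23, 14 and 11, so each string gets its own slot.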


A hash function is used to generate the index (hash code) of the key in the array. Ideally, each index a hash function generates should be unique, but in practice this is extremely difficult to achieve. For example, a default hash code may be generated from the object's address. If we need to compare objects by other characteristics, such as two people's name and age, we need to write a new hash function that generates the hash code from the name and age.
Properties:
• A hash function always returns a number.
• Two equal objects (based on the equal() method) always have the same hash code.
• Two different objects may or may not have the same hash code; when they do, a collision occurs.
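To illustrate the name-and-age case, here is a minimal C sketch of an equality check and a matching hash function. The `struct person` type, the function names, the starting value 17 and the multiplier 31 are illustrative assumptions (31 is a common choice, not a requirement). The key point is that the hash uses exactly the fields the equality check compares, so equal objects always get equal hash codes.

```c
#include <string.h>

struct person {
    const char *name;
    int age;
};

/* Two persons are considered equal when both name and age match */
int person_equals(const struct person *a, const struct person *b)
{
    return strcmp(a->name, b->name) == 0 && a->age == b->age;
}

/* Hash code built only from the fields used by person_equals.
   Unsigned arithmetic wraps around on overflow, which is fine for hashing. */
unsigned person_hash(const struct person *p)
{
    unsigned h = 17;
    for (const char *c = p->name; *c != '\0'; c++)
        h = 31 * h + (unsigned char)*c;
    h = 31 * h + (unsigned)p->age;
    return h;
}
```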
Here is the procedure of storing objects in the array using a hash function:

Here is a hash table example:

Data List x = { 15, 40, 55, 63, 73, 93, 101 }


Hash function h(x) = [ x % 7 ]
i.e. 15 % 7 = 1; 40 % 7 = 5; 55 % 7 = 6; 63 % 7 = 0; 73 % 7 = 3; 93 % 7 = 2; 101 % 7 = 4;
Hash Table:
Index:  0    1    2    3    4    5    6
Key:    63   15   93   73   101  40   55

Here we use a hash function that generates indexes from the remainder. So 15 goes to index 1 and 93 goes to index 2. But if we also had to insert 79, which also leaves remainder 2 (79 % 7 = 2), a collision would happen.


Hash Table:
Index:  0    1    2    3    4    5    6
Key:    63   15   93   73   101  40   55
                  79

3. Separate Chaining (Collision Resolution: Linked List Method)


A collision happens when different objects are mapped to the same location in the hash table.
How it works: put all the objects that have the same hash code into a linked list (the hash table is then an array of lists).

To perform a search: use the hash function to determine which list to traverse, and then search that linked list.
To perform an insert: depending on whether duplicates are allowed, we first check the target list to see if the element we want to insert already exists. We then insert the element at the front of the list because, first, it is convenient (no traversal is needed) and, second, recently inserted elements are often more likely to be needed soon.
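A minimal separate chaining sketch in C, assuming non-negative integer keys, a table of 7 lists and h(key) = key % 7 as in the earlier example. Duplicates are rejected here, which is one of the two options mentioned above. All names are hypothetical.

```c
#include <stdlib.h>

#define TABLE_SIZE 7

struct chain_node {
    int key;
    struct chain_node *next;
};

/* The hash table is an array of linked lists (all NULL initially) */
struct chain_node *table[TABLE_SIZE];

static int hash(int key) { return key % TABLE_SIZE; }  /* key >= 0 assumed */

/* Search: the hash picks the list, then that list is walked */
struct chain_node *chain_search(int key)
{
    for (struct chain_node *n = table[hash(key)]; n != NULL; n = n->next)
        if (n->key == key)
            return n;
    return NULL;
}

/* Insert at the front of the list; returns 0 for a duplicate or on allocation failure */
int chain_insert(int key)
{
    if (chain_search(key) != NULL)
        return 0;
    struct chain_node *n = malloc(sizeof *n);
    if (n == NULL)
        return 0;
    n->key = key;
    n->next = table[hash(key)];
    table[hash(key)] = n;
    return 1;
}
```

After inserting 93 and then 79, index 2 holds the list 79 -> 93, because each new key goes to the front.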

4. Open Addressing (A Non-Linked-List Method)


a) Linear Probing
Linear probing is a scheme in computer programming for resolving collisions in hash
tables, data structures for maintaining a collection of key–value pairs and looking up the value
associated with a given key.
How it works: when there is a collision, this method checks the following slots one at a time (wrapping around to the beginning of the table) until it finds an available one, and inserts the object there. For example, instead of putting 47 and 40 in a linked list, linear probing finds the next available space to insert object 47.
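A minimal linear probing sketch in C, assuming a table of size 7, h(key) = key % 7, non-negative keys and -1 as the empty marker; the names are hypothetical. Deletion is omitted: removing a key from an open-addressing table normally requires a special "deleted" marker so that later searches do not stop too early.

```c
#define TABLE_SIZE 7
#define EMPTY -1   /* marks a free slot; keys are assumed non-negative */

/* Try slots h, h+1, h+2, ... (wrapping around).
   Returns the index used, or -1 if the table is full. */
int lp_insert(int table[], int key)
{
    int home = key % TABLE_SIZE;
    for (int i = 0; i < TABLE_SIZE; i++) {
        int idx = (home + i) % TABLE_SIZE;
        if (table[idx] == EMPTY) {
            table[idx] = key;
            return idx;
        }
    }
    return -1;
}

/* Search follows the same probe sequence and stops at the first empty slot */
int lp_search(const int table[], int key)
{
    int home = key % TABLE_SIZE;
    for (int i = 0; i < TABLE_SIZE; i++) {
        int idx = (home + i) % TABLE_SIZE;
        if (table[idx] == EMPTY)
            return -1;
        if (table[idx] == key)
            return idx;
    }
    return -1;
}
```

With 40 already at index 5, `lp_insert(table, 47)` probes index 5, finds it taken, and stores 47 at index 6.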


b) Quadratic Probing
Quadratic probing is an open addressing scheme for resolving hash collisions in hash tables. Quadratic probing operates by taking the original hash index and adding successive values of an arbitrary quadratic polynomial until an open slot is found.
How it works: it is a collision handling method that eliminates the primary clustering problem of linear probing. As the name suggests, this method uses a quadratic collision function. A common choice is f(i) = i², where i is the probe number (1, 2, 3, ...), so the i-th probe checks index (hash + i²) % table_size.

In this process, secondary clustering can happen: keys that hash to the same initial index follow exactly the same probe sequence, so they still pile up along that sequence. For example, if the primary hash index is x, the probing will go x+1, x+4, x+9 and so on. Secondary clustering is less harmful to performance than primary clustering.
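A quadratic probing insert in C, assuming a prime table size of 11, h(key) = key % 11 and non-negative keys; the names are hypothetical. Quadratic probing is not guaranteed to visit every slot, so it can report failure even when the table still has free space; keeping the table size prime and the table at most half full is the usual way to avoid this.

```c
#define TABLE_SIZE 11   /* assumed prime table size */
#define EMPTY -1        /* marks a free slot; keys are assumed non-negative */

/* Probe slots (h + i*i) % TABLE_SIZE for i = 0, 1, 2, ...
   Returns the index used, or -1 if no free slot was found. */
int qp_insert(int table[], int key)
{
    int home = key % TABLE_SIZE;
    for (int i = 0; i < TABLE_SIZE; i++) {
        int idx = (home + i * i) % TABLE_SIZE;
        if (table[idx] == EMPTY) {
            table[idx] = key;
            return idx;
        }
    }
    return -1;
}
```

Inserting 3, 14, 25 and 36 (all of which hash to 3) places them at indexes 3, 4, 7 and 1, following the x, x+1, x+4, x+9 pattern modulo 11.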

5. Double Hashing
How it works: this method requires a second hash function to resolve collisions. When a collision happens, the second hash function computes a step size for the key, and the table is probed at (hash1(key) + i × hash2(key)) % table_size for i = 1, 2, 3, and so on. A common choice for the second hash function is hash2(key) = R - (key % R), where R is a prime smaller than the table size.

Double Hashing Example
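A double hashing insert in C, using the second hash function given above with R = 7 and an assumed prime table size of 11; h1(key) = key % 11, and all names are hypothetical. Because h2 never returns 0 and the table size is prime, each probe sequence eventually visits every slot.

```c
#define TABLE_SIZE 11   /* assumed prime table size */
#define R 7             /* prime smaller than TABLE_SIZE */
#define EMPTY -1        /* marks a free slot; keys are assumed non-negative */

static int h1(int key) { return key % TABLE_SIZE; }
static int h2(int key) { return R - (key % R); }   /* always in 1..R, never 0 */

/* Probe slots (h1 + i*h2) % TABLE_SIZE for i = 0, 1, 2, ...
   Returns the index used, or -1 if the table is full. */
int dh_insert(int table[], int key)
{
    for (int i = 0; i < TABLE_SIZE; i++) {
        int idx = (h1(key) + i * h2(key)) % TABLE_SIZE;
        if (table[idx] == EMPTY) {
            table[idx] = key;
            return idx;
        }
    }
    return -1;
}
```

If 3 is stored first, the keys 14 and 25 both hash to the occupied index 3; their step sizes are 7 and 3, so they land at indexes 10 and 6 respectively.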


6. Rehashing
Rehashing includes increasing the size of the underlying data structure and mapping existing
items to new bucket locations.
How it works: When an insert is made such that the number of entries in a hash table exceeds the
product of the load factor and the current capacity then the hash table will need to be rehashed.
When should we rehash?
• When the table is half full.
• When an insertion fails.
• When the load factor (number of entries / capacity) reaches a chosen threshold.
Rehashing example:
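A minimal rehashing sketch in C. It assumes linear probing, h(key) = key % capacity, non-negative keys, and the "half full" rule (MAX_LOAD = 0.5). The new capacity 2 × old + 1 is an arbitrary choice for illustration; real implementations often grow to the next prime. Every key has to be re-inserted because its index depends on the capacity. All names are hypothetical.

```c
#include <stdlib.h>

#define EMPTY -1       /* marks a free slot; keys are assumed non-negative */
#define MAX_LOAD 0.5   /* rehash when the table would become more than half full */

struct table {
    int *slots;
    int capacity;
    int count;
};

/* Linear-probing insert; the load limit guarantees a free slot exists */
static void raw_insert(struct table *t, int key)
{
    int idx = key % t->capacity;
    while (t->slots[idx] != EMPTY)
        idx = (idx + 1) % t->capacity;
    t->slots[idx] = key;
}

int table_init(struct table *t, int capacity)
{
    t->slots = malloc(capacity * sizeof *t->slots);
    if (t->slots == NULL)
        return 0;
    for (int i = 0; i < capacity; i++)
        t->slots[i] = EMPTY;
    t->capacity = capacity;
    t->count = 0;
    return 1;
}

/* Allocate a larger array and re-insert every existing key */
static int rehash(struct table *t)
{
    struct table bigger;
    if (!table_init(&bigger, 2 * t->capacity + 1))
        return 0;
    for (int i = 0; i < t->capacity; i++)
        if (t->slots[i] != EMPTY)
            raw_insert(&bigger, t->slots[i]);
    bigger.count = t->count;
    free(t->slots);
    *t = bigger;
    return 1;
}

int table_insert(struct table *t, int key)
{
    if ((double)(t->count + 1) / t->capacity > MAX_LOAD && !rehash(t))
        return 0;
    raw_insert(t, key);
    t->count++;
    return 1;
}
```

Starting from capacity 7, the fourth insertion would push the load above 0.5, so the table is rebuilt with capacity 15 before the key is added.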

7. Extendible Hashing
Extendible Hashing is a dynamic hashing method in which a directory and buckets are used to hash data. It is a highly flexible method in which the hash function itself changes dynamically as the directory grows.
Main features of Extendible Hashing:
• Directories: The directory stores pointers to the buckets. Each directory entry is assigned an id, which may change each time Directory Expansion takes place.
• Buckets: The buckets store the actual hashed data.

Extendible Hashing Example
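A simplified extendible hashing sketch in C. Assumptions, all illustrative: the key itself is used as its hash value, the directory id is taken from the key's lowest global_depth bits (some textbooks use the highest bits instead), each bucket holds 2 keys, the directory is capped at 2^8 entries, and deletion is not supported. When a full bucket has a local depth equal to the global depth, the directory is doubled (Directory Expansion); otherwise only the bucket is split.

```c
#include <stdlib.h>

#define BUCKET_CAP 2   /* assumed bucket capacity */
#define MAX_DEPTH 8    /* directory holds at most 2^8 entries */

struct bucket {
    int local_depth;
    int count;
    unsigned keys[BUCKET_CAP];
};

struct ext_hash {
    int global_depth;
    struct bucket *dir[1 << MAX_DEPTH];
};

/* Directory id of a key: its lowest 'depth' bits */
static unsigned dir_index(unsigned key, int depth)
{
    return key & ((1u << depth) - 1u);
}

int eh_init(struct ext_hash *h)
{
    h->global_depth = 0;
    h->dir[0] = calloc(1, sizeof(struct bucket));
    return h->dir[0] != NULL;
}

int eh_search(const struct ext_hash *h, unsigned key)
{
    const struct bucket *b = h->dir[dir_index(key, h->global_depth)];
    for (int i = 0; i < b->count; i++)
        if (b->keys[i] == key)
            return 1;
    return 0;
}

int eh_insert(struct ext_hash *h, unsigned key)
{
    for (;;) {
        struct bucket *b = h->dir[dir_index(key, h->global_depth)];
        if (b->count < BUCKET_CAP) {
            b->keys[b->count++] = key;
            return 1;
        }
        if (b->local_depth == h->global_depth) {
            /* Directory Expansion: double the directory by copying pointers */
            if (h->global_depth == MAX_DEPTH)
                return 0;
            int old_size = 1 << h->global_depth;
            for (int i = 0; i < old_size; i++)
                h->dir[old_size + i] = h->dir[i];
            h->global_depth++;
        }
        /* Split the full bucket: keys whose next bit is 1 move to a new bucket */
        struct bucket *nb = calloc(1, sizeof *nb);
        if (nb == NULL)
            return 0;
        unsigned bit = 1u << b->local_depth;
        b->local_depth++;
        nb->local_depth = b->local_depth;
        unsigned old_keys[BUCKET_CAP];
        int n = b->count;
        for (int i = 0; i < n; i++)
            old_keys[i] = b->keys[i];
        b->count = 0;
        for (int i = 0; i < n; i++) {
            if (old_keys[i] & bit)
                nb->keys[nb->count++] = old_keys[i];
            else
                b->keys[b->count++] = old_keys[i];
        }
        /* Repoint the directory entries that now belong to the new bucket */
        for (unsigned i = 0; i < (1u << h->global_depth); i++)
            if (h->dir[i] == b && (i & bit))
                h->dir[i] = nb;
        /* Loop again: the key may now fit, or another split may be needed */
    }
}
```

Inserting the keys 0 to 7 in order ends with a global depth of 2: four buckets, each holding the two keys that share the same two lowest bits.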


8. Implementation of Dictionaries
A dictionary is a general-purpose data structure for storing a group of objects. A dictionary
is an ordered or unordered list of elements. Each element is a pair of key and value. A value is
associated with the corresponding key. When presented with a key, the dictionary will simply
return the associated value. A dictionary is also known as an associative array or a map.
A dictionary holds a set of {key, value} pairs
Example 1:
Key Value
FirstName Mahesh
LastName Babu
Address Hyderabad
Age 45
Example 2: The results of a classroom test could be represented as a dictionary with student's
names as keys and their scores as the values:
results = { 'Sachin': 65, 'Dhoni': 70, 'Kohili': 55, 'Irfan': 50, 'Raina': 40 }

Basic Dictionary Operations


The dictionary ADT provides operations for inserting, deleting, and searching records in the collection. Dictionaries typically support operations such as:
Insert(x, D) -> inserts element x (key and value) into dictionary D.
    Insert(key, value), e.g., Insert(age, 40)
Delete(x, D) -> deletes element x (key and value) from dictionary D.
    delete(key), e.g., delete(age)
Search(x, D) -> searches dictionary D for the value associated with the key of element x.
    search(key) returns the value, e.g., search(age) returns 40
Member(x, D) -> returns “true” if x belongs to D, else returns “false”.
size(D) -> returns the total number of elements in dictionary D.
Max(D) -> returns the maximum element in dictionary D.
Min(D) -> returns the minimum element in dictionary D.

Example: Consider an empty unordered dictionary and the following set of operations:
Operation          Dictionary                                 Output
insertItem(5,A)    {(5,A)}
insertItem(7,B)    {(5,A), (7,B)}
insertItem(2,C)    {(5,A), (7,B), (2,C)}
insertItem(8,D)    {(5,A), (7,B), (2,C), (8,D)}
insertItem(2,E)    {(5,A), (7,B), (2,C), (8,D), (2,E)}
findItem(7)        {(5,A), (7,B), (2,C), (8,D), (2,E)}        B
findItem(4)        {(5,A), (7,B), (2,C), (8,D), (2,E)}        NO_SUCH_KEY
findItem(2)        {(5,A), (7,B), (2,C), (8,D), (2,E)}        C
findAllItems(2)    {(5,A), (7,B), (2,C), (8,D), (2,E)}        C, E
size()             {(5,A), (7,B), (2,C), (8,D), (2,E)}        5
removeItem(5)      {(7,B), (2,C), (8,D), (2,E)}               A
removeAllItems(2)  {(7,B), (8,D)}                             C, E
findItem(4)        {(7,B), (8,D)}                             NO_SUCH_KEY
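The trace above can be reproduced with a small array-based unordered dictionary in C. This is a sketch under stated assumptions: the capacity is fixed at 100, values are single characters, and NO_SUCH_KEY is represented by the null character. The function names are hypothetical snake_case versions of the operations in the trace.

```c
#define MAX_ITEMS 100      /* fixed capacity, assumed for simplicity */
#define NO_SUCH_KEY '\0'   /* returned when a key is absent */

struct item {
    int key;
    char value;
};

struct dictionary {
    struct item items[MAX_ITEMS];
    int size;
};

/* insertItem: append at the end (unordered, duplicate keys allowed) */
int insert_item(struct dictionary *d, int key, char value)
{
    if (d->size == MAX_ITEMS)
        return 0;
    d->items[d->size].key = key;
    d->items[d->size].value = value;
    d->size++;
    return 1;
}

/* findItem: value of the first item with this key */
char find_item(const struct dictionary *d, int key)
{
    for (int i = 0; i < d->size; i++)
        if (d->items[i].key == key)
            return d->items[i].value;
    return NO_SUCH_KEY;
}

/* findAllItems: copies the values of all items with this key into out[]; returns the count */
int find_all_items(const struct dictionary *d, int key, char out[])
{
    int n = 0;
    for (int i = 0; i < d->size; i++)
        if (d->items[i].key == key)
            out[n++] = d->items[i].value;
    return n;
}

/* removeItem: deletes the first item with this key, keeping the order of the rest */
char remove_item(struct dictionary *d, int key)
{
    for (int i = 0; i < d->size; i++) {
        if (d->items[i].key == key) {
            char value = d->items[i].value;
            for (int j = i; j < d->size - 1; j++)
                d->items[j] = d->items[j + 1];
            d->size--;
            return value;
        }
    }
    return NO_SUCH_KEY;
}

/* removeAllItems: deletes every item with this key; returns how many were removed */
int remove_all_items(struct dictionary *d, int key)
{
    int kept = 0;
    for (int i = 0; i < d->size; i++)
        if (d->items[i].key != key)
            d->items[kept++] = d->items[i];
    int removed = d->size - kept;
    d->size = kept;
    return removed;
}
```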

Dictionary Linear List Representation


The dictionary can be represented as a linear list, which is a collection of (key, value) pairs. There are two methods of representing a dictionary as a linear list:
• Sorted Array (Sorted list)
• Sorted Chain (Skip list)
The contents of the dictionary are always kept in sorted order by key.
Each node stores: Key | Value | nextAddress

[1 | 50] --> [2 | 60] --> [4 | 70] --> [6 | 80] --> NULL
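The C declarations for the linear-list representation of a dictionary:
struct node
{
    int key;
    int value;
    struct node *next;
};
struct node *head = NULL;          /* empty dictionary: head is NULL */
void insert(int key, int value);
void delete(int key);
int search(int key, int *value);   /* returns 1 and stores the value if the key is found */
void display(void);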
Insertion: Initially the dictionary is empty, which means head is NULL. We create a new node with key = 1 and value = 50 and make head point to it.

head --> [1 | 50] --> NULL

To insert another record, such as key = 2 and value = 60, we create a new node and link it after the first one:

head --> [1 | 50] --> [2 | 60] --> NULL

Note: the new node goes after the head node because Head(key) = 1 < New(key) = 2; keys are kept in ascending order.
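Deletion: Now suppose the dictionary is not empty, which means head is not NULL. We delete an existing node by its key.

head --> [1 | 50] --> [2 | 60] --> [4 | 70] --> [6 | 80] --> NULL

Suppose we want to delete the middle node with key 4. The next pointer of its predecessor (key 2) is changed to point to the node with key 6, and node 4 is freed:

head --> [1 | 50] --> [2 | 60] --> [6 | 80] --> NULL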
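A minimal C sketch of the sorted linear list, following the struct node and the four operations declared above. Two behaviours are assumptions, not part of the notes: inserting an existing key overwrites its value, and the code walks the list through a pointer to the previous node's next field so that the head needs no special case.

```c
#include <stdio.h>
#include <stdlib.h>

struct node {
    int key;
    int value;
    struct node *next;
};

struct node *head = NULL;   /* empty dictionary */

/* Insert keeping keys in ascending order; an existing key gets its value updated */
void insert(int key, int value)
{
    struct node **link = &head;
    while (*link != NULL && (*link)->key < key)
        link = &(*link)->next;
    if (*link != NULL && (*link)->key == key) {
        (*link)->value = value;
        return;
    }
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return;
    n->key = key;
    n->value = value;
    n->next = *link;
    *link = n;
}

/* Delete by linking the predecessor directly to the successor, then freeing the node */
void delete(int key)
{
    struct node **link = &head;
    while (*link != NULL && (*link)->key < key)
        link = &(*link)->next;
    if (*link != NULL && (*link)->key == key) {
        struct node *victim = *link;
        *link = victim->next;
        free(victim);
    }
}

/* Search stops early once keys exceed the target, since the list is sorted */
int search(int key, int *value)
{
    for (struct node *n = head; n != NULL && n->key <= key; n = n->next) {
        if (n->key == key) {
            *value = n->value;
            return 1;
        }
    }
    return 0;
}

void display(void)
{
    for (struct node *n = head; n != NULL; n = n->next)
        printf("(%d, %d) -> ", n->key, n->value);
    printf("NULL\n");
}
```

Inserting (1, 50), (2, 60), (4, 70) and (6, 80) and then calling `delete(4)` leaves the list 1 -> 2 -> 6, as in the deletion example.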
