DSA LABTASK 12

Lab Task # 14: Hash Table I (Open Addressing)

Objectives: To let students explore the hash table and the associated concepts of hashing, collisions,
and the open addressing techniques used to resolve collisions.

Hash table

A hash table is a data structure that stores key/value pairs. It uses a hash function to compute
an index into an array at which an element is inserted or searched for. Performance depends on the
hash function: one that spreads keys evenly across the array keeps collisions rare. Under reasonable
assumptions, the average time required to search for an element in a hash table is O(1).

Hashing

Hashing is a technique that is used to uniquely identify a specific object from a group of similar objects.
Some examples of how hashing is used in our lives include:

1. In universities, each student is assigned a unique roll number that can be used to retrieve
information about them.
2. In libraries, each book is assigned a unique number that can be used to determine information
about the book, such as its exact position in the library or the users it has been issued to etc.

In both these examples, the students and books are hashed to a unique number.

Assume that you have an object and you want to assign a key to it to make searching easy. To store the
key/value pair, you can use a simple array-like data structure in which the keys (integers) are used
directly as indices to store values. However, when the keys are large and cannot be used directly as
indices, you should use hashing.

In hashing, large keys are converted into small keys by using hash functions. The values are then stored
in a data structure called a hash table. The idea of hashing is to distribute entries (key/value pairs)
uniformly across an array. Each element is assigned a converted (hashed) key, and by using that key you
can access the element in O(1) time on average. Given the key, the hash function computes an index that
suggests where an entry can be found or inserted.

Hashing is implemented in two steps:

1. An element is converted into an integer by using a hash function. This integer can be used as an
index at which to store the original element in the hash table.
2. The element is stored in the hash table, where it can be quickly retrieved using the hashed key.

hash = hashfunc(key)

index = hash % array_size

In this method, the hash is independent of the array size and it is then reduced to an index (a number
between 0 and array_size − 1) by using the modulo operator (%).
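
The two lines above can be tried out with a short Python sketch. The hashfunc shown here is a hypothetical stand-in (a plain sum of character codes), and array_size = 10 is an arbitrary choice; neither is prescribed by this handout.

```python
def hashfunc(key: str) -> int:
    """Hypothetical hash for illustration: sum of the character codes."""
    return sum(ord(ch) for ch in key)


array_size = 10  # assumed table size for this sketch


def index_for(key: str) -> int:
    hash_value = hashfunc(key)      # key -> integer, independent of the table size
    return hash_value % array_size  # reduce to the range 0 .. array_size - 1


print(index_for("abc"))  # 97 + 98 + 99 = 294, and 294 % 10 = 4
```

Because the hash is computed without reference to array_size, the same hashfunc could be reused unchanged if the array were resized; only the modulo step would change.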

Hash function

A hash function is any function that maps data of arbitrary size to values of a fixed size, which are
then used as indices into the hash table. The values returned by a hash function are called hash values,
hash codes, hash sums, or simply hashes. To achieve a good hashing mechanism, it is important to have
a good hash function with the following basic requirements:

• Easy to compute: It should be cheap to compute and must not become an algorithm in itself.
• Uniform distribution: It should distribute keys uniformly across the hash table and should
not result in clustering.
• Fewer collisions: A collision occurs when two or more elements are mapped to the same hash value.
Collisions should be kept to a minimum.

Irrespective of how good a hash function is, collisions are bound to occur. Therefore, to maintain the
performance of a hash table, it is important to manage collisions through various collision resolution
techniques.

Open Addressing as a solution to collisions in Hash Tables

Linear Probing

In open addressing, all entry records are stored in the array itself rather than in linked lists. When a
new entry has to be inserted, its hash index is computed and the array is examined, starting at that
index. If the slot at the hashed index is unoccupied, the entry record is inserted there; otherwise, the
search proceeds along some probe sequence until an unoccupied slot is found.

The probe sequence is the order in which slots are examined. Different probe sequences use different
intervals between successive probes.

When searching for an entry, the array is scanned in the same sequence until either the target element
is found or an unused slot is reached; an unused slot indicates that there is no such key in the table.
The name "open addressing" refers to the fact that the location or address of an item is not determined
solely by its hash value.

In linear probing, the interval between successive probes is fixed (usually at 1). Assume that the
hashed index for a particular entry is index. Each probe is an offset from this original index, so the
probing sequence for linear probing is:

index = index % hashTableSize

index = (index + 1) % hashTableSize

index = (index + 2) % hashTableSize

index = (index + 3) % hashTableSize

and so on…
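
As an illustration of insertion and search with linear probing, here is a minimal Python sketch. The class name LinearProbingTable, the table size of 7, and the sum-of-character-codes hash are all assumptions made for this example; deletion is deliberately left out.

```python
class LinearProbingTable:
    """Minimal open-addressing table of strings using linear probing (no deletion)."""

    def __init__(self, size: int = 7):
        self.size = size
        self.slots = [None] * size  # None marks an unoccupied slot

    def _hash(self, key: str) -> int:
        # Hypothetical hash for illustration: sum of the character codes.
        return sum(ord(ch) for ch in key)

    def insert(self, key: str) -> int:
        """Store key and return the slot used; raise if every slot is taken."""
        start = self._hash(key) % self.size
        for i in range(self.size):
            index = (start + i) % self.size  # probe interval fixed at 1
            if self.slots[index] is None or self.slots[index] == key:
                self.slots[index] = key
                return index
        raise RuntimeError("hash table is full")

    def search(self, key: str) -> int:
        """Return the slot holding key, or -1 if the key is absent."""
        start = self._hash(key) % self.size
        for i in range(self.size):
            index = (start + i) % self.size
            if self.slots[index] is None:  # unused slot: key cannot be further along
                return -1
            if self.slots[index] == key:
                return index
        return -1


table = LinearProbingTable(size=7)
print(table.insert("ab"))  # 97 + 98 = 195, and 195 % 7 = 6
print(table.insert("ba"))  # same hash; slot 6 is taken, so the probe wraps to 0
print(table.search("ba"))  # 0
print(table.search("zz"))  # -1 (probes 6, 0, then reaches the empty slot 1)
```

Search follows exactly the same probe order as insert, which is why it can stop at the first unused slot.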

Quadratic Probing

Quadratic probing is similar to linear probing; the only difference is the interval between successive
probes or entry slots. When the slot at the hashed index for an entry record is already occupied, you
probe further slots until you find an unoccupied one. The probed slots are computed by adding successive
values of a quadratic polynomial (the squares of the probe number) to the original hashed index.

Let us assume that the hashed index for an entry is index and at index there is an occupied slot. The
probe sequence will be as follows:

index = index % hashTableSize

index = (index + 1²) % hashTableSize

index = (index + 2²) % hashTableSize

index = (index + 3²) % hashTableSize

and so on…
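
To see which slots such a sequence visits, the following Python sketch lists the first few probes. The function name quadratic_probes and the example values (hashed index 3, table size 11) are assumptions chosen for illustration; probe 0 is the hashed index itself.

```python
def quadratic_probes(index: int, table_size: int, attempts: int) -> list:
    """Return the first `attempts` slots visited: (index + i*i) % table_size."""
    return [(index + i * i) % table_size for i in range(attempts)]


print(quadratic_probes(3, 11, 5))  # [3, 4, 7, 1, 8]
```

Unlike linear probing, the gaps between visited slots grow with each attempt, which spreads colliding entries further apart. The sequence is not guaranteed to visit every slot of the table.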

Double Hashing

Double hashing is similar to linear probing; the only difference is the interval between successive
probes. Here, the interval is computed by a second hash function.

Let us say that the hashed index for an entry record is index, computed by one hash function, and the
slot at that index is already occupied. You must then traverse a specific probing sequence to look for
an unoccupied slot. The probing sequence will be:

index = (index + 1 * indexH) % hashTableSize;

index = (index + 2 * indexH) % hashTableSize;

and so on…

Here, indexH is the hash value computed by another hash function. It must never be zero, otherwise
every probe would land on the same occupied slot.
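
A small Python sketch of this sequence for integer keys follows. Both hash functions are assumptions made for the example: the first is key % table_size, and the second is 1 + key % (table_size - 1), a common choice because it can never be zero. The list starts with the original index (probe 0) before the offsets are applied.

```python
def double_hash_probes(key: int, table_size: int, attempts: int) -> list:
    """Return the first `attempts` slots visited under double hashing."""
    index = key % table_size              # first hash (assumed for this sketch)
    index_h = 1 + key % (table_size - 1)  # second hash (assumed); always >= 1
    return [(index + i * index_h) % table_size for i in range(attempts)]


# key 25, table size 11: index = 3, index_h = 1 + 25 % 10 = 6
print(double_hash_probes(25, 11, 4))  # [3, 9, 4, 10]
```

Two keys that share the same first index will usually have different step sizes, so their probe sequences diverge after the first collision.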

Lab Tasks:

1. Create a class that properly accommodates the given insert and search functions. You will
also have to define the hashTable array (an array of strings, assuming that no string exceeds
6 lowercase characters) with an appropriate size.
2. Code the hashFunction in such a way that each string is hashed into a number ranging from zero
to one less than the size of the hashTable array.
Hint: The index/key for a specific string is the sum of the ASCII values of its characters, each
multiplied by its position in the string (starting from 1), taken modulo 2069 (a large prime
number).
String   Hash function                                           Index/Key
abcdef   (97*1 + 98*2 + 99*3 + 100*4 + 101*5 + 102*6) % 2069     38
bcdefa   (98*1 + 99*2 + 100*3 + 101*4 + 102*5 + 97*6) % 2069     23
3. Execute the code and generate outputs showing the three collision resolution techniques in
open addressing, i.e. linear probing, quadratic probing, and double hashing.
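
The hint in task 2 can be checked against the two example rows with a short Python sketch. The function name hash_string is hypothetical, and the sketch assumes the hashTable array has 2069 slots so that the modulus and the table size coincide; it is one possible reading of the hint, not a complete solution to the lab.

```python
def hash_string(text: str, table_size: int = 2069) -> int:
    """Position-weighted sum of ASCII codes, reduced modulo the table size."""
    total = sum(ord(ch) * position for position, ch in enumerate(text, start=1))
    return total % table_size


print(hash_string("abcdef"))  # 38
print(hash_string("bcdefa"))  # 23
```

Weighting each character by its position means that strings made of the same letters in a different order, such as abcdef and bcdefa, generally hash to different indices.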
