13.hashing
USING A HASH FUNCTION
The HandyParts company makes no more than 100 different parts. The parts all have four-digit numbers, which range from 0000 to 0099.
We can directly access any part record through the array index, i.e. there is a one-to-one correspondence between part number and index.
values
[0] 0000
[1] 0001
[2] 0002
[3] 0003
[4] 0004
...
[97] 0097
[98] 0098
[99] 0099
Now suppose another company makes no more than 100 different parts, but the parts all have four-digit numbers with no restriction on range. What to do?
This hash function can be used to store and retrieve parts in an array:
Hash(key) = partNum % 100
values
[0] Empty
[1] 4501
[2] Empty
[3] 7803
[4] Empty
...
[97] Empty
[98] 2298
[99] 3699
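As a minimal sketch of the hash function above, the following Python stores the four part numbers shown in the array. The names `hash_part`, `TABLE_SIZE` and the use of `None` for "Empty" are illustrative assumptions, not part of the slides.

```python
TABLE_SIZE = 100  # the array has 100 locations, indices 0..99

def hash_part(part_num: int) -> int:
    """Hash(key) = partNum % 100: map a part number to an array index."""
    return part_num % TABLE_SIZE

values = [None] * TABLE_SIZE            # None stands for "Empty"
for part in (4501, 7803, 2298, 3699):   # these four keys do not collide
    values[hash_part(part)] = part

print(hash_part(4501), hash_part(7803), hash_part(2298), hash_part(3699))
# prints: 1 3 98 99
```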
PLACING ELEMENTS IN THE ARRAY
Use the hash function to place the part with part number 5500 in the array:
5500 % 100 = 0
values
[0] 5500
...
[97] Empty
[98] 2298
[99] 3699
Next place part number 6702 in the array.
6702 % 100 = 2
But values[2] is already occupied.
COLLISION OCCURS
Collision: the condition resulting when two or more keys produce the same hash location.
values
[0] 5500
[1] 4501
[2] 5502
[3] 7803
[4] Empty
...
[97] Empty
[98] 2298
[99] 3699
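The collision check can be sketched in a few lines of Python. The helper name `collides` is hypothetical, and the table contents are taken from the array on this slide and the later ones.

```python
values = [None] * 100                    # None stands for "Empty"
for part in (5500, 4501, 5502, 7803, 2298, 3699):
    values[part % 100] = part            # these keys hash to distinct locations

def collides(values, part_num):
    """True if part_num's hash location already holds a different key."""
    slot = values[part_num % len(values)]
    return slot is not None and slot != part_num

print(collides(values, 6702))  # True: 6702 % 100 = 2, and values[2] holds 5502
print(collides(values, 1234))  # False: values[34] is empty
```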
HOW TO RESOLVE THE COLLISION?
One way is by linear probing. This uses the rehash function repeatedly until an empty location is found for part number 6702.
Linear probing: resolving a hash collision by sequentially searching a hash table, beginning at the location returned by the hash function.
RESOLVING THE COLLISION
Still looking for a place for 6702 using the rehash function:
(6702 + 1) % 100 = 3
But values[3] is also occupied, so the search continues.
COLLISION RESOLVED
The next probe is (6702 + 2) % 100 = 4, and values[4] is empty. Part 6702 can be placed at the location with index 4.
Part 6702 is placed at the location with index 4.
values
[0] 5500
[1] 4501
[2] 5502
[3] 7803
[4] 6702
[5] Empty
...
[97] Empty
[98] 2298
[99] 3699
Where would the part with number 4598 be placed using linear probing?
4598 will be stored at index 5 (treating the list as circular): 4598 % 100 = 98, but locations 98, 99, 0, 1, 2, 3 and 4 are all occupied, so the search wraps around and stops at index 5.
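The walk-through above can be reproduced with a short Python sketch of insertion with linear probing. The function name `insert_linear_probing` and the full-table error are assumptions made for illustration; the table contents match the slides.

```python
TABLE_SIZE = 100
EMPTY = None

def insert_linear_probing(values, part_num):
    """Insert part_num, probing forward from its hash location.

    The table is treated as circular. Returns the index that was used.
    """
    index = part_num % TABLE_SIZE
    for _ in range(TABLE_SIZE):
        if values[index] is EMPTY:
            values[index] = part_num
            return index
        index = (index + 1) % TABLE_SIZE   # rehash: next location, wrapping at the end
    raise RuntimeError("hash table is full")

values = [EMPTY] * TABLE_SIZE
for part in (5500, 4501, 5502, 7803, 2298, 3699):
    insert_linear_probing(values, part)

print(insert_linear_probing(values, 6702))  # 4 (locations 2 and 3 are occupied)
print(insert_linear_probing(values, 4598))  # 5 (wraps around from 98 and 99)
```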
BUCKETS & CHAINING
Another alternative for handling
collisions is to allow multiple element
keys to hash to the same location.
Bucket
A collection of elements associated with a
particular hash location
Suppose we have a bucket of size 3, so 3 elements can share a location.
Before the insertions:
[00] Empty Empty Empty
[01] 14001 72101 Empty
[02] 9872 Empty Empty
...
Insert 5462: 5462 % 10 = 2
Insert 5460: 5460 % 10 = 0
Insert 9462: 9462 % 10 = 2
Insert 71462: 71462 % 10 = 2
After the first three insertions:
[00] 5460 Empty Empty
[01] 14001 72101 Empty
[02] 9872 5462 9462
...
Bucket [02] is now full, so 71462 cannot be stored in it.
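A minimal Python sketch of the bucket example follows. It assumes the hash is key % 10, which is what the bucket indices in the example imply (5462 lands in bucket 2 and 5460 in bucket 0). The names `insert_into_bucket`, `BUCKET_SIZE` and `NUM_BUCKETS` are illustrative.

```python
BUCKET_SIZE = 3    # each hash location can hold up to 3 elements
NUM_BUCKETS = 10   # assumption: hash(key) = key % 10

def insert_into_bucket(table, key):
    """Return True if key fit in its bucket, False if the bucket was full."""
    bucket = table[key % NUM_BUCKETS]
    if len(bucket) >= BUCKET_SIZE:
        return False
    bucket.append(key)
    return True

table = [[] for _ in range(NUM_BUCKETS)]
table[1] = [14001, 72101]   # contents before the insertions
table[2] = [9872]

results = [insert_into_bucket(table, k) for k in (5462, 5460, 9462, 71462)]
print(results)    # [True, True, True, False]
print(table[2])   # [9872, 5462, 9462]
```

The last insertion fails because bucket 2 already holds three elements; how such an overflow is handled is a separate design decision.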
[Figure: chaining. A table with D locations, indexed 0 to D-1, each heading a chain of elements.]
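Chaining can be sketched as a table of D chains, one per hash location. This is a simplified illustration: Python lists stand in for linked lists, and the class name `ChainedTable` and the key % D hash are assumptions.

```python
class ChainedTable:
    """A hash table with d locations, each holding a chain of keys."""

    def __init__(self, d):
        self.chains = [[] for _ in range(d)]   # locations 0 .. d-1

    def _chain(self, key):
        return self.chains[key % len(self.chains)]

    def insert(self, key):
        chain = self._chain(key)
        if key not in chain:       # ignore duplicates
            chain.append(key)

    def contains(self, key):
        return key in self._chain(key)

table = ChainedTable(10)
for key in (9872, 5462, 9462, 71462):   # all four hash to location 2
    table.insert(key)

print(table.chains[2])        # [9872, 5462, 9462, 71462]
print(table.contains(9462))   # True
```

Unlike a fixed-size bucket, a chain can grow, so the fourth key that hashes to location 2 still fits.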
HASH TABLES
There are two types of hash tables: open-addressed hash tables and separate-chained hash tables.
Both support the following operations:
· Insertion
· Searching
· Deletion
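The three operations can be sketched for an open-addressed table that uses linear probing. The class name and the "tombstone" marker used for deletion are assumptions of this sketch; the slides do not say how deletion is handled.

```python
EMPTY = object()     # location never used
DELETED = object()   # location whose key was removed (a "tombstone")

class OpenAddressedTable:
    def __init__(self, size=100):
        self.size = size
        self.slots = [EMPTY] * size

    def _probe(self, key):
        """Yield indices in linear-probing order, starting at key % size."""
        start = key % self.size
        for step in range(self.size):
            yield (start + step) % self.size

    def insert(self, key):
        first_free = None
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is EMPTY:
                if first_free is None:
                    first_free = i
                break                       # key cannot be further along
            if slot is DELETED:
                if first_free is None:
                    first_free = i          # remember reusable location
            elif slot == key:
                return i                    # already present
        if first_free is None:
            raise RuntimeError("hash table is full")
        self.slots[first_free] = key
        return first_free

    def search(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is EMPTY:
                return None                 # a never-used location ends the search
            if slot is not DELETED and slot == key:
                return i
        return None

    def delete(self, key):
        i = self.search(key)
        if i is None:
            return False
        self.slots[i] = DELETED             # keep later keys reachable
        return True

t = OpenAddressedTable()
t.insert(5502)            # stored at index 2
t.insert(6702)            # collides at 2, stored at index 3
t.delete(5502)
print(t.search(6702))     # 3
```

Marking a deleted location instead of emptying it matters here: if index 2 were reset to empty, the search for 6702 would stop there and miss it.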
TYPES OF HASHING
There are two types of hashing:
1. Static hashing: In static hashing, the hash function maps
search-key values to a fixed set of locations.
The load factor of a hash table is the ratio of the number of keys in the table
to the size of the hash table.
Note: The higher the load factor, the slower the retrieval.
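The load factor is a direct ratio, as this small sketch shows. The function name `load_factor` is illustrative, and the example counts assume a 100-location table holding the seven part numbers 5500, 4501, 5502, 7803, 6702, 2298 and 3699.

```python
def load_factor(num_keys: int, table_size: int) -> float:
    """Ratio of the number of keys in the table to the size of the table."""
    return num_keys / table_size

print(load_factor(7, 100))    # 0.07
print(load_factor(75, 100))   # 0.75
```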
· Minimize collisions.
Digit extraction: the more evenly distributed digit positions of the key are extracted and used for hashing purposes.
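A minimal sketch of extracting selected digit positions from a key follows. The key, its width and the chosen positions are hypothetical; in practice the positions would be the ones whose digits are most evenly distributed across the key set.

```python
def digit_extraction_hash(key: int, positions, width: int) -> int:
    """Build a hash from the digits of key at the given 0-based positions.

    The key is left-padded with zeros to `width` digits first.
    """
    digits = str(key).zfill(width)
    return int("".join(digits[p] for p in positions))

# Hypothetical choice: the 2nd, 4th and 6th digits of a six-digit key.
print(digit_extraction_hash(134582, positions=(1, 3, 5), width=6))  # 352
```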
Folding: the key is split into two or more parts, and the parts are then combined to form the hash address.
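One common way to combine the parts is to add them. The sketch below assumes addition, a fixed part length and a final reduction modulo the table size; none of these choices is specified in the slides.

```python
def folding_hash(key: int, part_len: int, table_size: int) -> int:
    """Split the decimal digits of key into parts of part_len digits, then add."""
    digits = str(key)
    parts = [int(digits[i:i + part_len]) for i in range(0, len(digits), part_len)]
    return sum(parts) % table_size

# 123 + 456 + 789 = 1368, and 1368 % 1000 = 368
print(folding_hash(123456789, part_len=3, table_size=1000))  # 368
```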
Radix transformation: the key is transformed into another number base to obtain the hash value. Typically a number base other than base 10 and base 2 is used to calculate the hash addresses.
To map the key 55354 in the range 0 to 9999 using base 11 we have:
55354 (base 10) = 38652 (base 11)
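The base conversion can be checked with a short Python sketch. The final step, reducing 38652 into the range 0 to 9999 by taking it modulo 10000 (which drops the high-order digit), is an assumption of this sketch; the function names are illustrative.

```python
def to_base_digits(n: int, base: int):
    """Return the digits of n in the given base, most significant first."""
    digits = []
    while n > 0:
        digits.append(n % base)
        n //= base
    return digits[::-1] or [0]

def radix_hash(key: int, base: int, table_size: int) -> int:
    """Rewrite key in another base, read the digits back as a decimal number,
    then reduce into the table range (the reduction step is an assumption)."""
    value = 0
    for d in to_base_digits(key, base):
        value = value * 10 + d
    return value % table_size

print(to_base_digits(55354, 11))      # [3, 8, 6, 5, 2]
print(radix_hash(55354, 11, 10000))   # 8652
```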
Mid-square: the key is squared and the middle part of the result is taken as the hash value.
To map the key 3121 into a hash table of size 1000, we square it, 3121^2 = 9740641, and extract 406 as the hash value.
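The example above can be reproduced with a small Python sketch. The function name `mid_square_hash` and the rule for picking the middle digits when they cannot be centered exactly are assumptions.

```python
def mid_square_hash(key: int, num_digits: int) -> int:
    """Square the key and return the middle num_digits digits of the result."""
    squared = str(key * key)
    start = (len(squared) - num_digits) // 2   # leans left if not exactly centered
    return int(squared[start:start + num_digits])

# 3121 * 3121 = 9740641; the middle three digits are 406
print(mid_square_hash(3121, 3))  # 406
```

Three digits are extracted because a table of size 1000 has addresses 0 to 999.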
Symbol tables: The tables used by compilers to maintain information about symbols
from a program. Compilers access information about symbols frequently. Therefore, it
is important that symbol tables be implemented very efficiently.
Data dictionaries: Data structures that support adding, deleting, and searching for
data. Although the operations of a hash table and a data dictionary are similar, other
data structures may be used to implement data dictionaries. Using a hash table is
particularly efficient.