6 Hash-Based Indexing
Hash-based indexes are excellent for equality selections.
The basic idea is to use a hashing function, which maps
values in a search field into a range of bucket numbers to
find the page on which a desired data entry belongs.
Hashing techniques:
Static Hashing: suffers from the problem of long overflow
chains, which can affect performance.
Extendible Hashing: uses a directory to support inserts and
deletes efficiently without any overflow pages.
Linear Hashing: uses a clever policy for creating new
buckets and supports inserts and deletes efficiently
without the use of a directory.
STATIC HASHING
The pages containing the data can be viewed as a
collection of buckets, with one primary page and
possibly additional overflow pages per bucket.
A file consists of buckets 0 through N − 1, with one
primary page per bucket initially.
Buckets contain data entries.
[Figure: Static Hashing]
Operations on Static Hashing
Search: apply the hash function h to the search key to identify the
bucket to which the data entry belongs, and then search this bucket.
Insert:
Use the hash function to identify the correct bucket and
put the data entry there.
If there is no space for this data entry, allocate a new overflow page, put
the data entry on this page, and add the page to the overflow chain of
the bucket.
Delete:
Use the hash function to identify the correct bucket,
locate the data entry by searching the bucket, and then remove it.
If this data entry is the last in an overflow page, the overflow page is
removed from the overflow chain of the bucket and added to a list of
free pages.
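The three operations above can be sketched in a few lines of Python. This is a minimal illustration, not a disk-based implementation: the class name StaticHashFile, the page capacity of 2, the use of h(key) = key mod N, and the free-page counter are all assumptions made for the example.

```python
class StaticHashFile:
    """Static hashing sketch: N fixed buckets, each a chain of pages."""

    def __init__(self, num_buckets=4, page_capacity=2):
        self.n = num_buckets
        self.capacity = page_capacity
        # Each bucket is a list of pages; page 0 is the primary page,
        # any further pages form the overflow chain.
        self.buckets = [[[]] for _ in range(num_buckets)]
        self.free_pages = 0  # overflow pages returned to the free list

    def _chain(self, key):
        # Assumed hash function: h(key) = key mod N.
        return self.buckets[key % self.n]

    def search(self, key):
        return any(key in page for page in self._chain(key))

    def insert(self, key):
        chain = self._chain(key)
        for page in chain:
            if len(page) < self.capacity:
                page.append(key)
                return
        # No space in any page of this bucket: allocate an overflow page.
        chain.append([key])

    def delete(self, key):
        chain = self._chain(key)
        for i, page in enumerate(chain):
            if key in page:
                page.remove(key)
                # An emptied overflow page leaves the chain;
                # the primary page (i == 0) always stays.
                if not page and i > 0:
                    chain.pop(i)
                    self.free_pages += 1
                return True
        return False
```

Inserting 0, 4 and 8 into a file with four buckets sends all three keys to bucket 0, so the third key lands on an overflow page; deleting 8 afterwards releases that page again.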
Hash Function
A hash function must distribute values in the domain of the
search field uniformly over the collection of buckets.
With N buckets, numbered 0 through N − 1, a simple hash function h
has the form:
h(value) = (a * value + b)
The bucket identified is h(value) mod N.
The constants a and b can be chosen to 'tune' the hash
function.
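As a small illustration of the formula above, the following Python sketch builds such a function. The helper name make_hash and the constants a = 3, b = 1, N = 4 are arbitrary choices for the example, not recommended values.

```python
def make_hash(a, b, num_buckets):
    """Return a function mapping an integer value to a bucket number."""
    def bucket_of(value):
        h = a * value + b          # h(value) = a * value + b
        return h % num_buckets     # bucket = h(value) mod N
    return bucket_of


bucket_of = make_hash(a=3, b=1, num_buckets=4)
for value in range(8):
    print(value, "->", bucket_of(value))
```

With these constants, consecutive values are spread over all four buckets rather than clustering in one.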
EXTENDIBLE HASHING
In Static Hashing, doubling the number of buckets and
redistributing the entries across the new set of buckets
has a high cost.
The Extendible Hashing scheme uses a directory to
support inserts and deletes efficiently without any
overflow pages.
Use a directory of pointers to buckets, and double
the number of buckets by doubling just the
directory and splitting only the bucket that
overflowed.
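The directory-doubling idea can be sketched as follows. This is a simplified in-memory illustration under several assumptions: the class names Bucket and ExtendibleHash are made up for the example, hash values are inserted directly (no separate hash function), each bucket holds four entries, each bucket tracks a local depth, and all hash values are distinct (with many identical values the repeated splitting below would not terminate).

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth  # number of bits this bucket distinguishes
        self.entries = []


class ExtendibleHash:
    def __init__(self, capacity=4, global_depth=2):
        self.capacity = capacity
        self.global_depth = global_depth
        self.directory = [Bucket(global_depth) for _ in range(2 ** global_depth)]

    def _index(self, h):
        # The last global_depth bits of h are the offset into the directory.
        return h & ((1 << self.global_depth) - 1)

    def search(self, h):
        return h in self.directory[self._index(h)].entries

    def insert(self, h):
        bucket = self.directory[self._index(h)]
        if len(bucket.entries) < self.capacity:
            bucket.entries.append(h)
            return
        # The bucket overflowed. Double the directory only if this bucket
        # already uses every directory bit; otherwise just split the bucket.
        if bucket.local_depth == self.global_depth:
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.local_depth += 1
        image = Bucket(bucket.local_depth)
        new_bit = 1 << (bucket.local_depth - 1)
        # Directory slots whose newly significant bit is 1 now point to the image.
        for i, b in enumerate(self.directory):
            if b is bucket and i & new_bit:
                self.directory[i] = image
        # Redistribute only the entries of the bucket that overflowed.
        old_entries, bucket.entries = bucket.entries, []
        for e in old_entries:
            self.directory[self._index(e)].entries.append(e)
        self.insert(h)  # retry; may split again if all entries stayed together
```

Note that only the overflowing bucket's entries are touched during a split; every other bucket is simply pointed to by two directory slots after the directory doubles.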
How do we locate a data entry with hash value 5 (denoted 5*)? What about 13*?
[Figure: Example of an Extendible Hashed File]
[Figure: After Inserting Entry r with h(r)=13]
[Figure: While Inserting Entry r with h(r)=20]
[Figure: After Inserting Entry r with h(r)=20]
[Figure: After Inserting Entry r with h(r)=9]
Hash Function
The basic technique used in Extendible Hashing is to
treat the result of applying a hash function h as a
binary number and to interpret the last d bits, where
d depends on the size of the directory, as an offset
into the directory.
In the example, d = 2 (because there are four buckets).
After the split, d = 3 (because there are eight buckets).
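The "last d bits" rule amounts to a bit mask. The short Python sketch below prints the directory offset for a few hash values at d = 2 and d = 3; the function name directory_offset is an assumption made for the example.

```python
def directory_offset(hash_value, d):
    """Interpret the last d bits of hash_value as a directory offset."""
    return hash_value & ((1 << d) - 1)


for h in (5, 13, 20, 9):
    print(
        f"h = {h:2d} = {h:05b}  "
        f"d=2 -> {directory_offset(h, 2)}  "
        f"d=3 -> {directory_offset(h, 3)}"
    )
```

For instance, 5 (binary 101), 13 (binary 1101) and 9 (binary 1001) all end in 01, so they share directory offset 1 when d = 2. With d = 3, 5 and 13 both map to offset 5 (binary 101) while 9 stays at offset 1 (binary 001).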
LINEAR HASHING
Linear Hashing is a dynamic hashing technique that does not
require a directory.
The scheme utilizes a family of hash functions h_0, h_1, h_2, ...,
with the property that each function's range is twice that
of its predecessor.
If h_i maps a data entry into one of M buckets, h_{i+1} maps a
data entry into one of 2M buckets.
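One common way to obtain such a family, assumed here for illustration, is h_i(key) = h(key) mod (2^i * N), where N is the initial number of buckets. The sketch below uses the integer key itself as h(key); the function name h_i and the default N = 4 are choices made for the example.

```python
def h_i(i, key, n_initial=4):
    """Hash function number i of the family: range is 0 .. 2**i * N - 1."""
    return key % (n_initial * 2 ** i)


for key in (43, 44, 37):
    print(key, [h_i(i, key) for i in range(3)])
```

A useful consequence of this choice is that h_{i+1}(key) is always either h_i(key) or h_i(key) + M, where M is the range of h_i. This is why splitting a bucket only ever moves entries to one specific new bucket.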
The idea is best understood in terms of rounds of
splitting.
During round number Level, only hash functions h_Level and
h_{Level+1} are in use.
The buckets in the file at the beginning of the round are
split, one by one, from the first to the last bucket.
At any given point within the round, there are:
Buckets that have been split
Buckets that are yet to be split
Buckets created by splits in this round.
[Figure: Buckets during a Round in Linear Hashing]
Search for a data entry with a given search key value:
Apply hash function h_Level:
If this leads to one of the unsplit buckets, look there.
If it leads to one of the split buckets, the entry may be there
or it may have been moved to the new bucket created
earlier in this round by splitting this bucket.
To determine which of these two buckets contains the entry,
apply h_{Level+1}.
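The search rule above can be written as a short address computation. This sketch assumes the hash family h_i(key) = key mod (N * 2^i) and assumes that the buckets already split in the current round are exactly those numbered below a pointer called next_bucket; the function and parameter names are hypothetical.

```python
def bucket_for(key, level, next_bucket, n_initial=4):
    """Return the number of the bucket that holds (or would hold) key."""
    n_level = n_initial * 2 ** level   # buckets at the start of this round
    b = key % n_level                  # apply h_Level
    if b < next_bucket:
        # This bucket has already been split in this round, so the entry
        # is either here or in the split image: apply h_{Level+1}.
        b = key % (2 * n_level)
    return b
```

For example, with level 0, four initial buckets and bucket 0 already split (next_bucket = 1), key 32 stays in bucket 0 while key 44 is found in bucket 4, the split image of bucket 0. Key 9 maps to the unsplit bucket 1, so only h_Level is needed.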
An overflow page is added to store the newly
inserted data entry (which triggered the split).
A counter Level indicates the current round number
and is initialized to 0.
The bucket to split is denoted by Next and is initially
bucket 0 (the first bucket).
Denote the number of buckets in the file at the
beginning of round Level by N_Level: N_Level = N * 2^Level.
For example:
Let the number of buckets at the beginning of round 0,
denoted by N_0, be N.
Each bucket can hold four data entries, and the file initially
contains four buckets, as shown in the figure.
[Figure: Example of a Linear Hashed File]
A bucket can be split whenever a new overflow page is
added.
A split is “triggered” when inserting a new data entry
causes the creation of an overflow page.
Whenever a split is triggered, the Next bucket is split
and hash function h_{Level+1} redistributes entries between
this bucket and its split image.
After splitting a bucket, the value of Next is incremented by 1.
For example, inserting data entry 43* triggers a split.
[Figure: After Inserting Record r with h(r)=43]
Not all insertions trigger a split.
For example, inserting 37* does not.
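The insert-and-split policy can be sketched as a small Python class. Several details are assumptions made for the illustration: the class name LinearHashFile, the hash family h_i(key) = key mod (N * 2^i), and the representation of each bucket as one Python list in which entries beyond the page capacity stand in for overflow pages. The rule for ending a round (increment Level and reset Next to 0 once Next reaches N_Level) is the usual convention and is also an assumption here, since the slides do not spell it out.

```python
class LinearHashFile:
    def __init__(self, n_initial=4, capacity=4):
        self.n_initial = n_initial
        self.capacity = capacity      # entries per page
        self.level = 0                # current round number
        self.next = 0                 # next bucket to split
        self.buckets = [[] for _ in range(n_initial)]

    def _n_level(self):
        return self.n_initial * 2 ** self.level   # N_Level = N * 2^Level

    def _address(self, key):
        b = key % self._n_level()                 # h_Level
        if b < self.next:                         # already split this round
            b = key % (2 * self._n_level())       # h_{Level+1}
        return b

    def search(self, key):
        return key in self.buckets[self._address(key)]

    def insert(self, key):
        bucket = self.buckets[self._address(key)]
        # A new overflow page is needed when all existing pages are full.
        new_overflow_page = (
            len(bucket) >= self.capacity and len(bucket) % self.capacity == 0
        )
        bucket.append(key)
        if new_overflow_page:
            self._split()             # splits bucket Next, not necessarily this one

    def _split(self):
        n_level = self._n_level()
        old = self.buckets[self.next]
        # h_{Level+1} redistributes entries between the bucket and its image.
        self.buckets[self.next] = [k for k in old if k % (2 * n_level) == self.next]
        self.buckets.append([k for k in old if k % (2 * n_level) != self.next])
        self.next += 1
        if self.next == n_level:      # assumed end-of-round rule
            self.level += 1
            self.next = 0
```

Note that the bucket that is split is always the Next bucket, which need not be the bucket that received the new entry; the new entry simply sits on an overflow page until its own bucket's turn comes.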