EXTENDIBLE HASHING
B. KOHILA
I M.Sc. (IT)

OUTLINE
• Extendible hashing
• Expandable and dynamic hashing
• Virtual hashing
• Summary

Hash Functions for Extendible Hashing
• Standard hashing works with a fixed file size.
• What if many keys are added or deleted, so that the file size changes significantly?
• Then separate techniques are needed. They come in two types:
- Directory schemes
- Directory-less schemes

Extendible Hashing
• Keys are stored in buckets.
• Each bucket can hold only a fixed number of items.
• The index is an extendible table: h(x) hashes a key value x to a bit string, and only a portion of that bit string is used to build the directory.
Example: add a key kn with h(kn) = 11011.
[Figure: a directory (table) of depth 2 with entries 00, 01, 10, 11 and buckets of size 3. Before: b00 = {00011, 00110, 00101}, b01 = {01100, 01011}, and b1 = {10011, 11110, 11111}, shared by entries 10 and 11. After adding kn: b1 splits into b10 = {10011} and b11 = {11011, 11110, 11111}.]
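
The leftmost-bits rule can be checked against the example above. The Python sketch below groups the figure's keys by their two leftmost bits and shows where kn lands; the helper `directory_index` and the 5-bit pseudokey width are illustrative assumptions read off the figure, not part of the original slides.

```python
# Sketch: route 5-bit pseudokeys to directory entries by their leftmost bits.
# Assumptions (hypothetical names): KEY_BITS = 5 as in the figure, depth i = 2.

KEY_BITS = 5  # width of every pseudokey in the example


def directory_index(pseudokey: int, depth: int, key_bits: int = KEY_BITS) -> int:
    """Return the number formed by the `depth` leftmost bits of `pseudokey`."""
    return pseudokey >> (key_bits - depth)


keys = ["00011", "00110", "00101", "01100", "01011", "10011", "11110", "11111"]
depth = 2
groups = {}  # directory entry (as a bit string) -> keys that map to it
for k in keys:
    entry = format(directory_index(int(k, 2), depth), f"0{depth}b")
    groups.setdefault(entry, []).append(k)

for entry in sorted(groups):
    print(entry, groups[entry])

kn = "11011"
print("kn ->", format(directory_index(int(kn, 2), depth), f"0{depth}b"))
```

The last line printed is `kn -> 11`. In the figure, entries 10 and 11 share bucket b1, which already holds three keys, so adding kn overflows that bucket and forces the split.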

Hash Functions for Extendible Hashing
• Directory schemes
- Extendible hashing (Fagin et al. 1979)
- Expandable hashing (Knott 1971)
- Dynamic hashing (Larson 1978)
• Directory-less schemes
- Virtual hashing (Litwin 1978)

Extendible Hashing
• Size of a bucket = maximum number of pseudokeys it can hold (3 in our example).
• Once a bucket is full, it is split into two. Two situations are possible:
- The directory keeps the same size: only the pointers to the buckets are adjusted.
- The directory doubles, growing from 2^k to 2^(k+1) entries, i.e. the directory size can be 1, 2, 4, 8, 16, etc. (8 is shown in the figure).
The number of buckets does not double along with the directory: the split adds just one bucket, so several directory entries point to the same bucket.
• Finally, the bit string is used only to build the index; the actual key is what is stored in the bucket.
[Figure: a directory of depth 3 with entries 000, 001, 010, 011, 100, 101, 110, 111.]
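
The split rules above can be sketched in Python. This is a minimal illustration, not the textbook's code: `Bucket`, `ExtendibleHash`, the per-bucket local depth and the 5-bit pseudokeys are assumptions chosen to match the running example (bucket size 3). If the full bucket's depth is below the directory depth, only pointers are adjusted; otherwise the directory doubles first.

```python
# Minimal extendible hashing sketch. Assumptions (not from the slides):
# pseudokeys are KEY_BITS-bit integers, the directory is indexed by their
# leftmost `global_depth` bits, and each bucket records a local depth.

KEY_BITS = 5      # pseudokey width used in the running example
BUCKET_SIZE = 3   # bucket capacity used in the running example


class Bucket:
    def __init__(self, depth: int):
        self.depth = depth   # leftmost bits shared by all keys in this bucket
        self.keys = []


class ExtendibleHash:
    def __init__(self):
        self.global_depth = 0
        self.directory = [Bucket(0)]   # always 2 ** global_depth entries

    def _index(self, pk: int) -> int:
        return pk >> (KEY_BITS - self.global_depth)

    def find(self, pk: int) -> bool:
        return pk in self.directory[self._index(pk)].keys

    def insert(self, pk: int) -> None:
        bucket = self.directory[self._index(pk)]
        if len(bucket.keys) < BUCKET_SIZE:
            bucket.keys.append(pk)
            return
        if bucket.depth == self.global_depth:
            # Directory grows from 2^k to 2^(k+1) entries; entry j of the new
            # directory points where entry j >> 1 of the old one pointed.
            if self.global_depth == KEY_BITS:
                raise OverflowError("no pseudokey bits left to split on")
            self.directory = [self.directory[j >> 1]
                              for j in range(2 * len(self.directory))]
            self.global_depth += 1
        # Split the full bucket; the directory only has pointers adjusted.
        self._split(bucket)
        self.insert(pk)   # retry: may split again if every key went one way

    def _split(self, bucket: Bucket) -> None:
        bucket.depth += 1
        sibling = Bucket(bucket.depth)
        shift = KEY_BITS - bucket.depth   # position of the newly used key bit
        old_keys = bucket.keys
        bucket.keys = [k for k in old_keys if not (k >> shift) & 1]
        sibling.keys = [k for k in old_keys if (k >> shift) & 1]
        dir_shift = self.global_depth - bucket.depth
        for j, b in enumerate(self.directory):
            if b is bucket and (j >> dir_shift) & 1:
                self.directory[j] = sibling


if __name__ == "__main__":
    table = ExtendibleHash()
    for bits in ["00011", "00110", "00101", "01100", "01011",
                 "10011", "11110", "11111", "11011"]:
        table.insert(int(bits, 2))
    for j, b in enumerate(table.directory):
        print(format(j, f"0{table.global_depth}b"),
              sorted(format(k, f"0{KEY_BITS}b") for k in b.keys))
```

Inserting the example keys and then kn (11011) prints `00 ['00011', '00101', '00110']`, `01 ['01011', '01100']`, `10 ['10011']` and `11 ['11011', '11110', '11111']`: the directory stays at four entries and only b1 is split, the "same size" case. When the directory does double, only one new bucket is created and the other buckets simply gain a second pointer.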

Extendible Hashing
1. Use as much space as needed.
2. Input the file name and the number of words to insert. Use a bucket size of 128.
3. Use any function h(k) that returns a string of up to 32 bits (an integer type can be used).
4. Bucket: a char array.
5. Main idea: only the FIRST (leftmost) bits of h(k), selected by a mask, are used for the search.
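
For steps 3 and 5, one possible h(k) and mask are sketched below. FNV-1a is our assumption (the notes allow any function that returns up to 32 bits), and `first_bits` is a hypothetical helper that applies a mask for the first `depth` bits. The 128-key bucket and the char-array layout are not modeled here.

```python
# Sketch of the hashing step. Assumptions: h(k) is 32-bit FNV-1a (any 32-bit
# function would do), and words are hashed as their UTF-8 bytes.

HASH_BITS = 32
FNV_OFFSET = 2166136261   # standard 32-bit FNV offset basis
FNV_PRIME = 16777619      # standard 32-bit FNV prime


def h(word: str) -> int:
    """Return a 32-bit pseudokey for `word` (FNV-1a)."""
    value = FNV_OFFSET
    for byte in word.encode("utf-8"):
        value ^= byte
        value = (value * FNV_PRIME) & 0xFFFFFFFF   # keep 32 bits
    return value


def first_bits(value: int, depth: int) -> int:
    """Keep only the `depth` leftmost bits of a 32-bit value (0 <= depth <= 32)."""
    mask = ((1 << depth) - 1) << (HASH_BITS - depth)
    return (value & mask) >> (HASH_BITS - depth)


for word in ["hash", "bucket", "directory"]:
    pk = h(word)
    print(f"{word:10s} {pk:032b} -> directory entry {first_bits(pk, 3)}")
```

Because only the leftmost bits are read, the same 32-bit h(k) keeps working as the directory grows; a larger depth simply reads more bits.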

Extendible Hashing
Assume that a hashing technique is applied to a dynamically changing file composed of buckets, and each bucket can hold only a fixed number of items. Extendible hashing accesses the data stored in buckets indirectly through an index that is dynamically adjusted to reflect changes in the file. The characteristic feature of extendible hashing is the organization of the index, which is an expandable table.

Extendible Hashing
• A hash function applied to a key indicates a position in the index, not in the file (or table of keys). Values returned by such a hash function are called pseudokeys.
• The file requires no reorganization when data are added to or deleted from it, since these changes are reflected in the index. Only one hash function h is used, but depending on the size of the index, only a portion of the address h(K) is utilized.
• A simple way to achieve this is to look at the address as a string of bits, of which only the i leftmost bits are used. The number i is the depth of the directory. In Figure 1(a) (next slide), the depth is two.
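
Taking the i leftmost bits is an integer division. Assuming pseudokeys of n bits (n = 5 in the earlier example), the worked derivation below also shows why doubling the directory needs no file reorganization.

```latex
% Directory index of key K for an n-bit pseudokey h(K) and directory depth i
\[
  \operatorname{index}_i(K) = \left\lfloor \frac{h(K)}{2^{\,n-i}} \right\rfloor,
  \qquad 0 \le \operatorname{index}_i(K) < 2^{i}.
\]
% Example: n = 5, i = 2, h(k_n) = 11011_2 = 27
\[
  \operatorname{index}_2(k_n) = \left\lfloor \frac{27}{2^{3}} \right\rfloor = 3 = 11_2 .
\]
% Doubling the directory (depth i -> i+1), then halving the new index:
\[
  \left\lfloor \frac{\operatorname{index}_{i+1}(K)}{2} \right\rfloor
  = \left\lfloor \frac{\lfloor h(K)/2^{\,n-i-1} \rfloor}{2} \right\rfloor
  = \left\lfloor \frac{h(K)}{2^{\,n-i}} \right\rfloor
  = \operatorname{index}_i(K).
\]
```

The last step uses the identity floor(floor(x/a)/b) = floor(x/(ab)) for positive integers a and b. Entry j of the doubled directory can therefore point to the bucket of entry floor(j/2) in the old one, and no key has to move.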

Extendible Hashing
[Figure 1. An example of extendible hashing (Drozdek textbook)]

Expandable & Dynamic Hashing
Expandable Hashing
• Similar idea to extendible hashing, but a binary tree is used to store the index on the buckets.
Dynamic Hashing
• Multiple binary trees are used.
Outcome:
- A shorter search.
- The key determines which tree to search.
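
A binary-tree index on the buckets can be sketched as a trie that branches on successive leftmost bits of h(x), with buckets at the leaves. This is an illustrative assumption, not Knott's or Larson's exact structure; `Node`, `TreeIndex`, the 5-bit keys and the bucket size of 3 are hypothetical. Dynamic hashing would keep several such trees instead of one.

```python
# Sketch of a binary-tree index over buckets. Assumptions: branch on successive
# leftmost bits of a KEY_BITS-bit pseudokey; every leaf is a bucket.

KEY_BITS = 5
BUCKET_SIZE = 3


class Node:
    def __init__(self, level: int):
        self.level = level     # number of leading bits on the path to this node
        self.keys = []         # bucket contents while the node is a leaf
        self.children = None   # [child for bit 0, child for bit 1] after a split


class TreeIndex:
    def __init__(self):
        self.root = Node(0)

    def _bit(self, pk: int, level: int) -> int:
        return (pk >> (KEY_BITS - 1 - level)) & 1   # (level+1)-th leftmost bit

    def _leaf(self, pk: int) -> Node:
        node = self.root
        while node.children is not None:
            node = node.children[self._bit(pk, node.level)]
        return node

    def find(self, pk: int) -> bool:
        return pk in self._leaf(pk).keys

    def insert(self, pk: int) -> None:
        leaf = self._leaf(pk)
        if len(leaf.keys) < BUCKET_SIZE:
            leaf.keys.append(pk)
            return
        if leaf.level == KEY_BITS:
            raise OverflowError("no pseudokey bits left to split on")
        # Split: the full bucket becomes an internal node with two child buckets.
        leaf.children = [Node(leaf.level + 1), Node(leaf.level + 1)]
        for k in leaf.keys:
            leaf.children[self._bit(k, leaf.level)].keys.append(k)
        leaf.keys = []
        self.insert(pk)   # retry; may split again if all keys went one way


if __name__ == "__main__":
    tree = TreeIndex()
    for bits in ["00011", "00110", "00101", "01100", "01011",
                 "10011", "11110", "11111", "11011"]:
        tree.insert(int(bits, 2))
    print(tree.find(0b11011), tree.find(0b00000))
```

The demo prints `True False`. A full bucket turns into an internal node with two child buckets, so the tree deepens only where keys cluster, and a lookup costs one bit test per level.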

Dynamic Hashing
• Larson's method
• The index is simplified to a set of binary trees.
• The height of each tree is limited.
• h(x) is searched in ALL trees.
Time: with m trees and at most k keys in each, the overall cost is m * lg k.
Advantage: shorter search time in the index.

Virtual Hashing
Litwin's Virtual Hashing
• Buckets are expanded in a linear fashion.
• They are stored contiguously in memory.
• No table is needed, and the procedure is simple.
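
Litwin's later linear hashing (1980) follows the same directory-less idea. The sketch below uses its addressing rule as a stand-in for virtual hashing, which is an assumption, since the slide does not give the exact algorithm. Buckets sit in one contiguous list, a split pointer moves through them in order, and no index table is kept. `LinearHash`, `INITIAL_BUCKETS` and the bucket size are hypothetical.

```python
# Sketch in the style of Litwin's linear hashing (assumption: used here to
# illustrate the directory-less idea). Keys are non-negative integers that
# already act as hash values.

BUCKET_SIZE = 3
INITIAL_BUCKETS = 2


class LinearHash:
    def __init__(self):
        self.level = 0   # completed doubling rounds
        self.split = 0   # next bucket to split; advances linearly
        # Buckets live in one contiguous list: no directory or table.
        self.buckets = [[] for _ in range(INITIAL_BUCKETS)]

    def _round_size(self) -> int:
        return INITIAL_BUCKETS * 2 ** self.level

    def _address(self, key: int) -> int:
        n = self._round_size()
        addr = key % n
        if addr < self.split:   # that bucket was already split this round
            addr = key % (2 * n)
        return addr

    def find(self, key: int) -> bool:
        return key in self.buckets[self._address(key)]

    def insert(self, key: int) -> None:
        target = self.buckets[self._address(key)]
        target.append(key)             # keys beyond BUCKET_SIZE act as overflow
        if len(target) > BUCKET_SIZE:  # any overflow triggers one linear split
            self._split_next()

    def _split_next(self) -> None:
        n = self._round_size()
        old_keys = self.buckets[self.split]
        self.buckets[self.split] = []
        self.buckets.append([])        # new bucket gets address split + n
        for k in old_keys:
            self.buckets[k % (2 * n)].append(k)   # stays at split or moves to split + n
        self.split += 1
        if self.split == n:            # every bucket of this round has been split
            self.level += 1
            self.split = 0


if __name__ == "__main__":
    table = LinearHash()
    for key in range(20):
        table.insert(key)
    print(len(table.buckets), table.level, table.split)
```

A design point worth noting: an overflow anywhere triggers a split of the bucket at the split pointer, not necessarily of the bucket that overflowed. This is what lets the buckets grow one at a time, in order, without any table.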

Summary
• Extendible hashing advantages:
- The initially allocated space can grow indefinitely as the file grows.
- Locating the bucket where a key belongs requires only a very fast bit comparison.
- Very flexible in choosing the bucket size, and buckets can be stored on disk or accessed in remote memory.
• Extendible hashing disadvantages:
- Increased algorithm complexity.
- Extra memory overhead for storing index information inside the buckets.
