Hashing
Gaurav Trivedi
EE 693
Algorithms and Data Structures
Contents
▪ Static Hashing
  • File Organization
  • Properties of the Hash Function
  • Bucket Overflow
  • Indices
▪ Dynamic Hashing
  • Underlying Data Structure
  • Querying and Updating
▪ Comparisons
  • Other types of hashing
  • Ordered Indexing vs. Hashing
Static Hashing
▪ Hashing provides a means for accessing data without the use of an index structure.
▪ Instead, data is addressed on disk by computing a function on a search key.
File Organization
▪ A bucket in a hash file is a unit of storage (typically a disk block) that can hold one or more records.
▪ The hash function, h, is a function from the set of all search-keys, K, to the set of all bucket addresses, B.
▪ Insertion, deletion, and lookup take expected constant time, provided the hash function spreads records evenly and buckets do not overflow.
Querying and Updates
▪ To insert a record with search key K_i, compute the hash value h(K_i) and place the record in the bucket at the address returned.
▪ For lookup, compute the hash value as above and search every record in that bucket for the one with the matching key, since different keys may hash to the same bucket.
▪ To delete, look the record up and remove it from its bucket.
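The three operations above can be sketched in Python as follows. This is a minimal in-memory model, not a disk-based implementation: each bucket is a Python list standing in for a disk block, the number of buckets is fixed, and the hash function (CRC-32 from the standard `zlib` module, reduced modulo the number of buckets) is an illustrative assumption. The class name `StaticHashFile` is hypothetical.

```python
import zlib


class StaticHashFile:
    """Hypothetical static hash file: a fixed number of buckets, each a list of records."""

    def __init__(self, num_buckets):
        self.num_buckets = num_buckets
        self.buckets = [[] for _ in range(num_buckets)]

    def h(self, key):
        # Assumed hash function h: K -> B, CRC-32 of the key reduced mod the bucket count.
        return zlib.crc32(str(key).encode("utf-8")) % self.num_buckets

    def insert(self, key, record):
        # Place the record in the bucket whose address is h(key).
        self.buckets[self.h(key)].append((key, record))

    def lookup(self, key):
        # Only bucket h(key) is searched; other keys may share it, so compare keys.
        return [rec for k, rec in self.buckets[self.h(key)] if k == key]

    def delete(self, key):
        # Locate the bucket and drop every record with this key.
        b = self.h(key)
        self.buckets[b] = [(k, rec) for k, rec in self.buckets[b] if k != key]


f = StaticHashFile(num_buckets=8)
f.insert("Perryridge", 650)
f.insert("Mianus", 700)
print(f.lookup("Perryridge"))  # [650]
f.delete("Perryridge")
print(f.lookup("Perryridge"))  # []
```

Each operation touches a single bucket, which is why the cost stays constant as long as buckets remain small.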
Properties of the Hash Function
▪ The distribution should be uniform.
  • An ideal hash function assigns each bucket the same number of search-key values from the set of all possible values.
▪ The distribution should be random.
  • Regardless of the actual search-key values, each bucket receives the same number of records on average.
  • Hash values should not be correlated with any ordering of the search-keys (for example, alphabetic order).
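To see why these properties matter, the sketch below compares two hypothetical hash functions on 1,000 made-up keys that all begin with the same letter. `first_letter_hash` depends only on the alphabetic position of the first character, so it is correlated with key ordering and sends every key to one bucket; `crc_hash` (CRC-32 from `zlib`, an assumed stand-in for a good hash function) lets every byte of the key influence the result.

```python
import zlib
from collections import Counter


def first_letter_hash(key, num_buckets):
    # Poor choice: correlated with alphabetic order, so similar keys collide.
    return (ord(key[0].lower()) - ord("a")) % num_buckets


def crc_hash(key, num_buckets):
    # Better-behaved choice: every byte of the key influences the result.
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def bucket_counts(keys, hash_fn, num_buckets):
    # Number of keys that land in each bucket, in bucket-address order.
    counts = Counter(hash_fn(k, num_buckets) for k in keys)
    return [counts.get(b, 0) for b in range(num_buckets)]


keys = [f"student{n}" for n in range(1000)]  # made-up search keys
bad = bucket_counts(keys, first_letter_hash, 10)
good = bucket_counts(keys, crc_hash, 10)
print(bad)  # [0, 0, 0, 0, 0, 0, 0, 0, 1000, 0]  (every key starts with "s")
print(good)
```

With `first_letter_hash`, one bucket holds every record while the others stay empty, which is the skew that leads to bucket overflow.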
Bucket Overflow
▪ How does bucket overflow occur?
  • There are not enough buckets to handle the data.
  • A few buckets have considerably more records than others. This is referred to as skew, and it arises when:
    - Multiple records have the same hash value (for example, because they share a search key).
    - The hash function's distribution is non-uniform.
Solutions
▪ Provide more buckets than are needed.
▪ Overflow chaining
  • If a bucket is full, link an overflow bucket to it. Repeat as necessary.
  • Queries and updates must then also check the overflow buckets. This is known as closed hashing.
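Overflow chaining can be sketched by giving each fixed-capacity bucket a link to an overflow bucket. Assumptions: buckets are in-memory objects rather than disk blocks, `capacity` plays the role of the block size, and CRC-32 is again the stand-in hash function. `Bucket`, `ChainedHashFile`, and `chain_length` are hypothetical names.

```python
import zlib


class Bucket:
    """A fixed-capacity bucket (stand-in for a disk block) with an optional overflow link."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.records = []
        self.overflow = None  # next bucket in this bucket's overflow chain


class ChainedHashFile:
    def __init__(self, num_buckets, capacity):
        self.capacity = capacity
        self.buckets = [Bucket(capacity) for _ in range(num_buckets)]

    def h(self, key):
        return zlib.crc32(str(key).encode("utf-8")) % len(self.buckets)

    def insert(self, key, record):
        b = self.buckets[self.h(key)]
        # Walk the chain to the first bucket with free space, linking a new one if needed.
        while len(b.records) >= b.capacity:
            if b.overflow is None:
                b.overflow = Bucket(self.capacity)
            b = b.overflow
        b.records.append((key, record))

    def lookup(self, key):
        # The primary bucket and every overflow bucket chained to it must be scanned.
        result, b = [], self.buckets[self.h(key)]
        while b is not None:
            result.extend(rec for k, rec in b.records if k == key)
            b = b.overflow
        return result

    def chain_length(self, key):
        # Number of buckets (primary plus overflow) in the chain for h(key).
        n, b = 0, self.buckets[self.h(key)]
        while b is not None:
            n, b = n + 1, b.overflow
        return n


f = ChainedHashFile(num_buckets=4, capacity=2)
for n in range(5):
    f.insert("Perryridge", n)
print(f.lookup("Perryridge"))        # [0, 1, 2, 3, 4]
print(f.chain_length("Perryridge"))  # 3
```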
Alternatives
▪ Open hashing
  • The number of buckets is fixed and there are no overflow chains.
  • Overflow is handled by placing the record in the next bucket, in cyclic order, that has space.
    - This is known as linear probing.
  • Other policies compute further hash functions instead.
Note: Closed hashing is preferred in database systems, since deletion under open hashing is troublesome. (These are the database-textbook names; many algorithms texts use the opposite terms, calling chaining "open hashing" and probing "closed hashing" or "open addressing".)
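Linear probing (open hashing in the sense used here) can be sketched with a fixed array of slots and no overflow chains. For simplicity this assumes each bucket holds a single record, so "next bucket with space" becomes "next empty slot"; the class name and the CRC-32 hash function are illustrative. Deletion is deliberately omitted: emptying a slot would cut short the probe sequence of any key placed past it, which is the difficulty that makes this scheme awkward for databases.

```python
import zlib


class LinearProbingTable:
    """Hypothetical open-hashing table: a fixed number of one-record slots."""

    def __init__(self, num_slots):
        self.slots = [None] * num_slots  # each slot is None or a (key, record) pair

    def h(self, key):
        return zlib.crc32(str(key).encode("utf-8")) % len(self.slots)

    def insert(self, key, record):
        start = self.h(key)
        for step in range(len(self.slots)):
            i = (start + step) % len(self.slots)  # next slot in cyclic order
            if self.slots[i] is None:
                self.slots[i] = (key, record)
                return i
        raise OverflowError("every slot is occupied")

    def lookup(self, key):
        start = self.h(key)
        for step in range(len(self.slots)):
            i = (start + step) % len(self.slots)
            if self.slots[i] is None:
                return None  # an empty slot ends the probe sequence
            if self.slots[i][0] == key:
                return self.slots[i][1]
        return None


t = LinearProbingTable(5)
for n, name in enumerate(["Brighton", "Downtown", "Mianus"]):
    t.insert(name, n)
print(t.lookup("Mianus"))   # 2
print(t.lookup("Redwood"))  # None
```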
Indices
▪ A hash index organizes the search keys, with their associated record pointers, into a hash file.
▪ Hash indices are always secondary indices, even though they provide direct access: a file that is itself organized by hashing needs no separate hash index on the same key.
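A secondary hash index can be sketched as buckets of (search key, record pointer) pairs built over a separate data file. Here the data file is a Python list of made-up account records and a "pointer" is simply a list position; the account numbers, balances, and function names are hypothetical, and CRC-32 stands in for the hash function.

```python
import zlib

# Made-up data file in insertion order; a record "pointer" is its list position.
records = [
    ("A-217", "Brighton", 750),
    ("A-101", "Downtown", 500),
    ("A-110", "Downtown", 600),
    ("A-215", "Mianus", 700),
]


def build_hash_index(records, num_buckets, key_field=0):
    """Build buckets of (search key, record pointer) pairs on one field of the records."""
    buckets = [[] for _ in range(num_buckets)]
    for ptr, rec in enumerate(records):
        key = rec[key_field]
        b = zlib.crc32(key.encode("utf-8")) % num_buckets
        buckets[b].append((key, ptr))
    return buckets


def index_lookup(index, records, key):
    # Hash to one index bucket, then follow each matching pointer into the data file.
    b = zlib.crc32(key.encode("utf-8")) % len(index)
    return [records[ptr] for k, ptr in index[b] if k == key]


by_account = build_hash_index(records, 7)
print(index_lookup(by_account, records, "A-110"))  # [('A-110', 'Downtown', 600)]
```

Building a second index with `key_field=1` on the branch name shows that a hash index can also serve a non-unique search key.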
Example of Hash Index
Dynamic Hashing
▪ More effective than static hashing when the database grows or shrinks.
▪ Extendable hashing splits and coalesces buckets as the database size changes.
  • That is, buckets are added and deleted on demand.
The Hash Function
▪ Typically produces values over a large range (for example, b-bit integers with b = 32), distributed uniformly and randomly.
▪ Only part of the value, a prefix of i bits, is used; i grows and shrinks with the size of the database.
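Using only part of the hash value amounts to taking its first i bits. Assuming b = 32-bit hash values (CRC-32 from `zlib` serves as an illustrative hash function), the prefix is a right shift by b - i bits; `hash32` and `prefix` are hypothetical helper names.

```python
import zlib

B = 32  # b: hash values are B-bit unsigned integers


def hash32(key):
    # Assumed hash function: CRC-32, which is always unsigned in Python 3.
    return zlib.crc32(key.encode("utf-8"))


def prefix(hv, i):
    """First (most significant) i bits of a B-bit hash value, for 0 <= i <= B."""
    return hv >> (B - i)


hv = 0b1011 << 28      # a B-bit value whose first four bits are 1011
print(prefix(hv, 1))   # 1  (bit string 1)
print(prefix(hv, 3))   # 5  (bit string 101)
print(prefix(hv, 0))   # 0  (i = 0: the table has a single entry)
print(0 <= prefix(hash32("Perryridge"), 4) < 2 ** 4)  # True
```

As i grows by one, each prefix value p splits into 2p and 2p + 1, which is why the bucket address table doubles in size.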
Data Structure
▪ The bucket address table is indexed by a prefix of i bits of the hash value.
▪ More than one consecutive table entry can point to the same bucket.
  • Such entries share a common hash prefix whose length, i_j for bucket j, can be shorter than the table's prefix length i.
General Extendable Hash Structure
In this structure, i_2 = i_3 = i, whereas i_1 = i − 1.
Queries and Updates
▪ Lookup
  • Take the first i bits of the hash value.
  • Follow the corresponding entry in the bucket address table.
  • Search that bucket for the record.
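The lookup steps can be sketched on a hand-built structure shaped like the general extendable hash structure: i = 2, entries 00 and 01 share one bucket with i_1 = 1, and entries 10 and 11 point to buckets with i_2 = i_3 = 2. The hash values are fixed by hand so their leading bits are easy to read; `Bucket`, `lookup`, and `HASHES` are hypothetical names, and 32-bit hash values are assumed.

```python
from dataclasses import dataclass, field

B = 32  # bits per hash value (assumed)


@dataclass
class Bucket:
    depth: int                                   # i_j, the bucket's prefix length
    records: list = field(default_factory=list)  # (key, record) pairs


def lookup(table, i, key, h):
    """Take the first i bits of h(key), follow that table entry, scan the bucket."""
    entry = h(key) >> (B - i)
    return [rec for k, rec in table[entry].records if k == key]


# Hypothetical hash values chosen by hand for their leading two bits.
HASHES = {"x": 0b00 << 30, "y": 0b01 << 30, "z": 0b11 << 30}

b1, b2, b3 = Bucket(1), Bucket(2), Bucket(2)
table = [b1, b1, b2, b3]  # entries 00 and 01 both point to b1
b1.records += [("x", "rec-x"), ("y", "rec-y")]
b3.records.append(("z", "rec-z"))

print(lookup(table, 2, "y", HASHES.__getitem__))  # ['rec-y']
print(lookup(table, 2, "z", HASHES.__getitem__))  # ['rec-z']
```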
Queries and Updates (Cont’d)
▪ Insertion
  • Follow the lookup procedure to find bucket j.
  • If the bucket has space, add the record.
  • If not, the bucket must be split; there are two cases.
Insertion (Cont’d)
▪ Case 1: i = i_j (only one table entry points to bucket j)
  • Use an additional bit of the hash value (increment i).
    - This doubles the size of the bucket address table.
    - Each entry is replaced by two, so two entries now point to the full bucket.
  • Allocate a new bucket, z.
    - Set i_j and i_z to the new value of i.
    - Point the second of the two entries to the new bucket.
    - Rehash the records of the old bucket.
  • Reattempt the insertion.
Insertion (Cont’d)
▪ Case 2: i > i_j (several table entries point to bucket j)
  • Allocate a new bucket, z.
  • Add 1 to i_j, and set both i_j and i_z to this new value.
  • Of the table entries that pointed to bucket j, leave the first half pointing to j and redirect the second half to z.
  • Rehash the records in bucket j.
  • Reattempt the insertion.
Insertion (Finally)
▪ If all the records in the bucket have the same search-key value (and hence the same hash value), no amount of splitting separates them; simply use overflow buckets, as in static hashing.
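Putting the insertion cases together gives the following sketch of an extendable hash structure. It is a minimal in-memory model under several assumptions: CRC-32 is the hash function, hash values have b = 32 bits, each bucket is a Python list with a nominal size, and overflow buckets are modeled by letting a bucket exceed that size when all of its records share a hash value. Deletion and bucket coalescing are not shown, and the class and method names are hypothetical.

```python
import zlib

B = 32  # bits in each hash value


def h(key):
    return zlib.crc32(str(key).encode("utf-8"))  # assumed hash function


class Bucket:
    def __init__(self, depth):
        self.depth = depth  # i_j
        self.records = []   # (key, record) pairs


class ExtendableHash:
    def __init__(self, bucket_size):
        self.bucket_size = bucket_size
        self.i = 0                # global prefix length
        self.table = [Bucket(0)]  # bucket address table with 2**i entries

    def _bucket(self, key):
        return self.table[h(key) >> (B - self.i)]  # first i bits select the entry

    def lookup(self, key):
        return [r for k, r in self._bucket(key).records if k == key]

    def insert(self, key, record):
        while True:
            j = self._bucket(key)
            if len(j.records) < self.bucket_size:
                j.records.append((key, record))
                return
            # Splitting cannot separate identical hash values: use "overflow" instead.
            if all(h(k) == h(key) for k, _ in j.records) or j.depth == B:
                j.records.append((key, record))
                return
            if self.i == j.depth:
                # Case 1: double the table; entry p becomes entries 2p and 2p + 1.
                self.table = [b for b in self.table for _ in (0, 1)]
                self.i += 1
            # Now i > i_j (Case 2, or Case 1 after doubling): split j into j and z.
            j.depth += 1
            z = Bucket(j.depth)
            # Entries for j whose bit number i_j (from the left) is 1 move to z.
            for idx in range(len(self.table)):
                if self.table[idx] is j and (idx >> (self.i - j.depth)) & 1:
                    self.table[idx] = z
            # Rehash j's records between j and z, then loop to retry the insertion.
            old, j.records = j.records, []
            for k, r in old:
                self._bucket(k).records.append((k, r))


eh = ExtendableHash(bucket_size=2)
for n in range(20):
    eh.insert(f"k{n}", n)
print(eh.lookup("k7"))               # [7]
print(len(eh.table) == 2 ** eh.i)    # True
```

Case 1 reduces to Case 2 once the table has been doubled, so the sketch doubles the table when i = i_j and then performs the same split in both cases.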
Use of Extendable Hash Structure: Example
Initial hash structure, bucket size = 2
Example (Cont.)
Hash structure after insertion of one Brighton and two Downtown records
Example (Cont.)
Hash structure after insertion of Mianus record
Example (Cont.)
Hash structure after insertion of three Perryridge records
Example (Cont.)
Hash structure after insertion of Redwood and Round Hill records
Comparison to Other Hashing Methods
▪ Advantage: performance does not degrade as the database grows.
  • Space is conserved by adding and removing buckets as necessary.
▪ Disadvantages
  • An additional level of indirection (the bucket address table) for every operation.
  • A more complex implementation.
Ordered Indexing vs. Hashing
▪ Hashing is less efficient when queries ask for ranges of values rather than specific values, because the hash function scatters neighboring keys across buckets.
▪ When range queries are infrequent, hashing provides faster insertion, deletion, and lookup than ordered indexing.
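The difference can be sketched with Python's built-in `dict` (a hash table) and a sorted key list searched with the standard `bisect` module, standing in for an ordered index. The account data is made up. The dict answers an equality query directly, but a range query over it has to examine every entry, whereas the sorted list needs only a binary search followed by a scan of the matching run.

```python
import bisect

# Made-up account data: account number -> balance.
accounts = {"A-217": 750, "A-101": 500, "A-215": 700, "A-110": 600, "A-102": 400}

# Hash-based access: a dict is a hash table, so an equality lookup is direct.
hash_index = dict(accounts)

# Ordered-index stand-in: the same keys kept in sorted order.
ordered_keys = sorted(accounts)


def range_query(lo, hi):
    # Binary search for the first key >= lo and just past the last key <= hi, then scan.
    start = bisect.bisect_left(ordered_keys, lo)
    end = bisect.bisect_right(ordered_keys, hi)
    return [(k, accounts[k]) for k in ordered_keys[start:end]]


def hash_range_query(lo, hi):
    # A hash table keeps no key order, so every entry must be examined.
    return sorted((k, v) for k, v in hash_index.items() if lo <= k <= hi)


print(hash_index["A-215"])            # 700
print(range_query("A-101", "A-110"))  # [('A-101', 500), ('A-102', 400), ('A-110', 600)]
```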

