Hash Maps
Tutorial 3
Ahmed Fahmy
ECE 250 @uWaterloo
Motivation
• Look-ups for key-value pairs
• For example:
  • Does an item (key) exist (value) in the data structure?
  • Given a student name (key), would they pass ECE250 (value)?
• Complexity
  • Linked lists
  • Trees
  • Vectors...?

Key (name)   Value (pass)
Ahmed        1
Lulu         0
Maaz         0
John         1
Gloria       1
ADT: Dictionary
Motivation
• Using vectors for lookups
• If keys are integers
  • Space complexity depends on the key range!
  • A quick solution would be mapping:
  • Problem: collisions!
    • Later...
• If keys are objects:
  • Strings
  • User-defined class
  • Solution: transform the object into some integer
[Figure: a vector with cells indexed 0 to 4]
Hashing
• Hash: give each object a different unsigned int (hash) value.
• Requirements:
• Fast
• An object will always have the same hash value
•
• Uniform distribution: every hash value should be equally likely, so collisions stay rare (very important)
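As an illustration of these requirements, here is a minimal sketch of 32-bit FNV-1a, one of the string hashes compared in [1]. The function name fnv1a is chosen here for illustration and is not from the slides. It is fast (one XOR and one multiply per byte) and deterministic (the same string always yields the same value).

```cpp
#include <cstdint>
#include <string>

// 32-bit FNV-1a: XOR each byte into the state, then multiply by the FNV prime.
std::uint32_t fnv1a(const std::string& s) {
    std::uint32_t h = 2166136261u;   // FNV offset basis
    for (unsigned char c : s) {
        h ^= c;
        h *= 16777619u;              // FNV prime; unsigned overflow wraps modulo 2^32
    }
    return h;
}
```

The result still has to be mapped to a cell index of the table, which is the topic of the next slide.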
Mapping
• We can use
• A bit slow operation!
• Solution: make
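The formulas on this slide are not recoverable. The sketch below assumes the slow operation is the modulo operator and that the fix is a table size of 2^m, which is consistent with the bit mask used in the code on the collision slides. The names map_mod and map_mask are hypothetical.

```cpp
// General mapping: works for any table size, but needs an integer division.
unsigned int map_mod(unsigned int h, unsigned int size) {
    return h % size;
}

// Power-of-two mapping: for size == 1 << m (with m < 32), keeping the
// low m bits of h gives the same index as h % size.
unsigned int map_mask(unsigned int h, unsigned int m) {
    return h & ((1u << m) - 1u);
}
```

With m = 3 (8 cells), both functions send 33 to cell 1 and 10 to cell 2.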
Collision
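// size must be a power of two (size == 1 << m), so the mask keeps the low m bits
template <typename T>
unsigned int hash(const T& obj, unsigned int size) {
    return obj.hash() & (size - 1);
}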
• Insert
• 4, 10, 33, 2
• Chaining
• Open-addressing
Index: 0   1   2   3   4   5   6   7
Value: -   33  10  -   4   -   -   -
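Chaining can be sketched as a vector of linked lists, one list per cell. This is a minimal sketch, assuming integer keys that hash to themselves and a table of 2^m cells; the struct name ChainedSet and its members are hypothetical.

```cpp
#include <algorithm>
#include <list>
#include <vector>

struct ChainedSet {
    std::vector<std::list<int>> buckets;   // one chain per cell

    explicit ChainedSet(unsigned int m) : buckets(1u << m) {}

    // Assumption: the key is its own hash; the mask keeps the low m bits.
    unsigned int index(int key) const {
        return static_cast<unsigned int>(key) &
               static_cast<unsigned int>(buckets.size() - 1);
    }

    bool contains(int key) const {
        const std::list<int>& chain = buckets[index(key)];
        return std::find(chain.begin(), chain.end(), key) != chain.end();
    }

    // Colliding keys are appended to the chain of their cell.
    void insert(int key) {
        if (!contains(key))
            buckets[index(key)].push_back(key);
    }
};
```

Inserting 4, 10, 33, 2 into 8 cells puts 33 in cell 1, 4 in cell 4, and both 10 and 2 in the chain of cell 2.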
Linear-Probing
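// size must be a power of two (size == 1 << m), so the mask keeps the low m bits
template <typename T>
unsigned int hash(const T& obj, unsigned int size) {
    return obj.hash() & (size - 1);
}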
• Insert
  • 4, 10, 33, 2
  • On a collision, check the next location
• Search
  • Stop at an empty cell, or once every cell has been probed (table full)
Index: 0   1   2   3   4   5   6   7
Value: -   33  10  -   4   -   -   -
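The insert and search rules for linear probing can be sketched as follows. This is a minimal sketch, assuming integer keys that hash to themselves, a table of 2^m cells, and no deletions; the struct name ProbingSet and its members are hypothetical.

```cpp
#include <vector>

struct ProbingSet {
    std::vector<int> keys;    // stored key per cell
    std::vector<bool> used;   // true if the cell is occupied

    explicit ProbingSet(unsigned int m) : keys(1u << m, 0), used(1u << m, false) {}

    // Cell visited on the i-th probe: home cell plus i, wrapping around.
    unsigned int slot(int key, unsigned int i) const {
        unsigned int mask = static_cast<unsigned int>(keys.size()) - 1u;
        return (static_cast<unsigned int>(key) + i) & mask;
    }

    // Returns false only when the table is full.
    bool insert(int key) {
        for (unsigned int i = 0; i < keys.size(); ++i) {
            unsigned int pos = slot(key, i);
            if (used[pos] && keys[pos] == key) return true;   // already stored
            if (!used[pos]) { keys[pos] = key; used[pos] = true; return true; }
        }
        return false;
    }

    // Stops at the first empty cell or after probing every cell.
    bool contains(int key) const {
        for (unsigned int i = 0; i < keys.size(); ++i) {
            unsigned int pos = slot(key, i);
            if (!used[pos]) return false;
            if (keys[pos] == key) return true;
        }
        return false;
    }
};
```

Inserting 4, 10, 33, 2 into 8 cells places 2 in cell 3, because its home cell 2 is already taken by 10. Supporting deletion would need extra bookkeeping (for example, marking removed cells), which this sketch leaves out.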
Double Hashing
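• Usually needs the fewest probes of the open-addressing schemes, because keys that collide can follow different probe sequences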
• Hash again to get the next cell index:
• Different hash functions for the initial value and jump
Quality of Hashing
• How can we assess the quality of a hash function?
• Load factor: expected number of keys to have the same hash value
• Another way to define it:
• How many times we probe “on average” to find an item?
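Both measures can be computed directly for a chained table. This is a minimal sketch, assuming the usual definition of load factor as the number of stored keys divided by the number of cells; the function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Load factor: stored keys divided by number of cells.
double load_factor(std::size_t keys, std::size_t cells) {
    return static_cast<double>(keys) / static_cast<double>(cells);
}

// Average number of probes to find a stored key in a chained table.
// A chain of length L costs 1 + 2 + ... + L = L * (L + 1) / 2 probes in total.
double average_probes(const std::vector<std::size_t>& chain_lengths) {
    std::size_t keys = 0, probes = 0;
    for (std::size_t len : chain_lengths) {
        keys += len;
        probes += len * (len + 1) / 2;
    }
    return keys == 0 ? 0.0 : static_cast<double>(probes) / static_cast<double>(keys);
}
```

For the keys 4, 10, 33, 2 in 8 cells, the chain lengths are 0, 1, 2, 0, 1, 0, 0, 0, which gives a load factor of 0.5 and 1.25 probes on average.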
Quality of Hashing
• Let us have an experiment:
• Pick a hash function
• Insert random numeric strings into a hash map
• Draw the hash map as a picture:
• Each pixel is a cell
• Colored if cell is occupied
• White if cell is empty
SDBM [1]
DJB2a [1]
FNV1 [1]
FNV1-A [1]
Murmur2 [1]
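A text version of this experiment can be sketched as follows, using SDBM as the hash function and drawing '#' for an occupied cell and '.' for an empty one. The key range, the seed parameter, and the function names occupancy and render are assumptions made for illustration; the original pictures come from [1].

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <string>
#include <vector>

// SDBM string hash, one of the functions compared in [1].
std::uint32_t sdbm(const std::string& s) {
    std::uint32_t h = 0;
    for (unsigned char c : s)
        h = c + (h << 6) + (h << 16) - h;
    return h;
}

// Hash `count` random numeric strings into 2^m cells and record which cells are hit.
std::vector<bool> occupancy(unsigned int count, unsigned int m, unsigned int seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<unsigned int> pick(0, 999999);
    std::vector<bool> used(1u << m, false);
    for (unsigned int i = 0; i < count; ++i)
        used[sdbm(std::to_string(pick(rng))) & ((1u << m) - 1u)] = true;
    return used;
}

// One character per cell, `width` cells per row.
std::string render(const std::vector<bool>& used, std::size_t width) {
    std::string out;
    for (std::size_t i = 0; i < used.size(); ++i) {
        out += used[i] ? '#' : '.';
        if ((i + 1) % width == 0) out += '\n';
    }
    return out;
}
```

Printing render(occupancy(100, 8, 1), 16) gives a 16 by 16 grid; a good hash function should scatter the '#' cells without visible stripes or clumps.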
Problem Solving
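• Remove Duplicates from (unsorted) vector
• Complexity:

#include <unordered_set>
#include <vector>
using namespace std;

// Works on sorted or unsorted input; the output order is unspecified.
void removeDubFast(vector<int>& v) {
    unordered_set<int> m;      // keeps one copy of each value
    for (int i : v)
        m.insert(i);
    v.clear();
    for (int i : m)
        v.push_back(i);
}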
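Copying back out of an unordered_set loses the original order of the elements. If the order matters, a variant that keeps the first occurrence of each value can be sketched as follows; the name removeDupStable is hypothetical and this is not the version from the slides.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Removes duplicates in place, keeping the first occurrence of each value.
void removeDupStable(std::vector<int>& v) {
    std::unordered_set<int> seen;
    std::size_t out = 0;               // next position to write a kept value
    for (int x : v)
        if (seen.insert(x).second)     // true only the first time x is inserted
            v[out++] = x;
    v.resize(out);                     // drop the leftover tail
}
```

For example, the vector {3, 1, 3, 2, 1} becomes {3, 1, 2}.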
Real Performance
[Figure: performance plot; horizontal axis ticks from 0 to 966000, vertical axis ticks from 0 to 450000000]
Thank You
References
• [1] https://softwareengineering.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed