Hash Tables
● Hashing is based on the idea of distributing keys
among a one-dimensional array T[0..m-1] called a hash table.
● Hash table:
■ Given a table T and a record x, we need to support:
○ Insert (T, x)
○ Delete (T, x)
○ Search(T, x)
■ We want these to be fast, but don’t care about sorting
the records
■ In this discussion we consider all keys to be natural
numbers
04/30/2025
Direct Addressing
● Suppose:
■ The range of keys is 0..m-1
■ Keys are distinct
● The idea:
■ Set up an array T[0..m-1] in which
○ T[i] = x if x ∈ T and key[x] = i
○ T[i] = NULL otherwise
■ This is called a direct-address table
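A direct-address table can be sketched in a few lines. This is a minimal illustration, assuming distinct integer keys in 0..m-1; the class name `DirectAddressTable` and its method names are hypothetical, and `None` stands in for NULL.

```python
class DirectAddressTable:
    """Direct-address table for distinct integer keys in 0..m-1."""

    def __init__(self, m):
        # One slot per possible key; None plays the role of NULL.
        self.slots = [None] * m

    def insert(self, key, record):
        self.slots[key] = record      # T[key[x]] = x

    def delete(self, key):
        self.slots[key] = None        # T[key[x]] = NULL

    def search(self, key):
        return self.slots[key]        # the record, or None if absent


table = DirectAddressTable(10)
table.insert(3, "record with key 3")
print(table.search(3))   # record with key 3
print(table.search(4))   # None
```

Each operation is a single array access, so all three run in O(1) time.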
The Problem With Direct Addressing
● Direct addressing works well when the range m of
keys is relatively small
● But what if the keys are 32-bit integers?
■ Problem 1: direct-address table will have 2^32 entries,
more than 4 billion
■ Problem 2: even if memory is not an issue, the time to
initialize the elements to NULL may be prohibitive
● Solution: map keys to smaller range 0..m-1
● This mapping is called a hash function
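To put Problem 1 in numbers, the short calculation below estimates the size of a direct-address table for 32-bit keys. The 8-byte slot size is an assumption (one pointer per slot on a typical 64-bit machine); the slides do not state an entry size.

```python
KEY_BITS = 32
POINTER_BYTES = 8   # assumption: one 8-byte pointer per slot

entries = 2 ** KEY_BITS
table_bytes = entries * POINTER_BYTES

print(entries)                  # 4294967296
print(table_bytes // 2 ** 30)   # 32 (GiB)
```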
Hash Function
A hash function is said to be good if each key is
equally likely to hash to any of the m slots of T
independently of where any other key has hashed to.
Unfortunately, we typically have no way to check this,
because the distribution from which keys are drawn is rarely known.
Hash Functions
● Problem: collisions (two distinct keys hash to the same slot)
[Figure: a hash function h maps the actual keys K = {k1, ..., k5},
a subset of the universe of keys U, to slots 0..m-1 of table T.
Keys k2 and k5 collide: h(k2) = h(k5).]
Resolving Collisions
● How can we solve the problem of collisions?
● Solution 1: chaining (Open Hashing)
● Solution 2: open addressing (Closed Hashing)
Chaining
● Chaining puts elements that hash to the same slot in
a linked list
[Figure: keys k1..k8 from the actual keys K (a subset of the universe U)
hashed into table T; keys that collide share one linked list:
k1 → k4, k5 → k2 → k7, k3, k8 → k6. Unused slots hold NULL.]
Chaining
● How do we insert an element?
Chaining
● How do we delete an element?
■ Do we need a doubly-linked list for efficient delete?
Chaining
● How do we search for an element with a given key?
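The three chaining operations asked about on these slides can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: the class name `ChainedHashTable` is hypothetical, the hash is the division method k mod m, and a Python list stands in for each linked list.

```python
class ChainedHashTable:
    """Hash table with chaining; each slot holds a list acting as its chain."""

    def __init__(self, m):
        self.m = m
        self.slots = [[] for _ in range(m)]

    def _hash(self, key):
        return key % self.m           # assumption: division-method hash

    def insert(self, key, record):
        chain = self.slots[self._hash(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:              # key already present: overwrite it
                chain[i] = (key, record)
                return
        chain.append((key, record))

    def search(self, key):
        # Walk only the chain of the slot the key hashes to.
        for k, record in self.slots[self._hash(key)]:
            if k == key:
                return record
        return None

    def delete(self, key):
        chain = self.slots[self._hash(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return True
        return False


t = ChainedHashTable(7)
t.insert(3, "a")
t.insert(10, "b")    # 10 mod 7 = 3, so it joins the chain of key 3
print(t.search(10))  # b
t.delete(3)
print(t.search(3))   # None
```

Search and delete by key cost time proportional to the length of one chain. When deletion is instead given a pointer to the list node itself, a doubly linked list allows the node to be unlinked in O(1) time without walking the chain.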
Choosing A Hash Function
● Clearly choosing the hash function well is crucial
■ What will a worst-case hash function do?
■ What will be the time to search in this case?
● What are desirable features of the hash function?
■ Should distribute keys uniformly into slots
■ Should not depend on patterns in the data
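A worst-case hash function sends every key to the same slot, so with chaining all n elements end up in one list and a search degenerates to a linear scan taking Θ(n) time. The sketch below compares the longest chain under such a function with the division method; the helper names `bad_hash`, `division_hash`, and `chain_lengths` are hypothetical.

```python
def bad_hash(key, m):
    return 0                 # worst case: every key lands in slot 0


def division_hash(key, m):
    return key % m


def chain_lengths(keys, m, h):
    """Count how many keys fall into each of the m slots under hash h."""
    lengths = [0] * m
    for k in keys:
        lengths[h(k, m)] += 1
    return lengths


keys = range(100)
print(max(chain_lengths(keys, 13, bad_hash)))       # 100
print(max(chain_lengths(keys, 13, division_hash)))  # 8
```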
Hash Functions:
The Division Method
● h(k) = k mod m
■ In words: hash k into a table with m slots using the slot
given by the remainder of k divided by m
● What happens to elements with adjacent values of k?
● What happens if m is a power of 2 (say 2^p)?
● What if m is a power of 10?
● Upshot: pick table size m = prime number not too
close to a power of 2 (or 10)
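The questions above can be explored with a few sample keys. The keys below are chosen for illustration only; they are not from the slides.

```python
def h(k, m):
    return k % m


# Adjacent keys hash to adjacent slots.
print([h(k, 13) for k in range(26, 30)])   # [0, 1, 2, 3]

# m = 2**4: the slot is just the low 4 bits of k, so keys that
# differ only in their higher bits all collide.
binary_keys = [0x10, 0x20, 0x30, 0x40]
print([h(k, 16) for k in binary_keys])     # [0, 0, 0, 0]

# m = 10**2: the slot is just the last two decimal digits of k.
decimal_keys = [1234, 5634, 9934]
print([h(k, 100) for k in decimal_keys])   # [34, 34, 34]

# m = 13 (prime): the same keys are spread over different slots.
print([h(k, 13) for k in binary_keys])     # [3, 6, 9, 12]
print([h(k, 13) for k in decimal_keys])    # [12, 5, 2]
```

With m a power of 2 or 10, the hash ignores all but the lowest bits or digits of the key, so any pattern in those positions translates directly into collisions.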
Hash Functions:
Example
Consider the following list of words:
A, FOOL, AND, HIS, MONEY, ARE, SOON,
PARTED
The hash function adds the positions of a word’s letters
in the alphabet and takes the remainder of the sum after
division by 13 (the size of the array).
For example, h(A) = 1 mod 13 = 1, h(FOOL) = (6 + 15 + 15 + 12) mod
13 = 9, and so on.
Hash Functions:
Example
Note a collision of the keys ARE and SOON because
h(ARE) = (1 + 18 + 5) mod 13 = 11 and h(SOON) =
(19 + 15 + 15 + 14) mod 13 = 11
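Resulting table for slots 0..12, with colliding keys chained
in the same slot (slots not listed are empty):
1: A
6: AND
7: MONEY
9: FOOL
10: HIS
11: ARE → SOON
12: PARTED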
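The example can be reproduced with a short script. This is a sketch: the function name `word_hash` is hypothetical, and Python lists stand in for the chains.

```python
def word_hash(word, m=13):
    # Sum the alphabet positions of the letters (A=1, ..., Z=26), then mod m.
    return sum(ord(c) - ord("A") + 1 for c in word.upper()) % m


words = ["A", "FOOL", "AND", "HIS", "MONEY", "ARE", "SOON", "PARTED"]
table = [[] for _ in range(13)]
for w in words:
    table[word_hash(w)].append(w)   # colliding words share a chain

for slot, chain in enumerate(table):
    if chain:
        print(slot, chain)
# 1 ['A']
# 6 ['AND']
# 7 ['MONEY']
# 9 ['FOOL']
# 10 ['HIS']
# 11 ['ARE', 'SOON']
# 12 ['PARTED']
```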
Open Addressing
● Basic idea:
■ To insert: if slot is full, try another slot, …, until an open
slot is found (linear probing)
■ To search, follow same sequence of probes as would be
used when inserting the element
○ If reach element with correct key, return it
○ If reach a NULL pointer, element is not in table
● Good for fixed sets (adding but no deletion)
■ Example: spell checking
● Table needn’t be much bigger than n
Open Addressing
The advantage of this approach is that it avoids the
use of pointers. The memory saved by not storing
pointers can be spent on a larger hash table, which
potentially leads to fewer collisions and therefore
faster DICTIONARY ADT operations.
Open Addressing
Initially all hash table locations store the empty value;
however, if an element is stored in the table and later
deleted, we will mark the vacated slot using the
deleted symbol rather than the empty symbol.
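Advantage of using deleted rather than empty?
If the deleted symbol is used, then a search can terminate
whenever an empty value is encountered, because we then
know that the element being searched for is not in the hash
table. If vacated slots were marked empty instead, a search
could stop too early and miss an element that was inserted
past that slot.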
Open Addressing
Searching for (or deleting) an element involves
probing the hash table until the desired key is found
or an empty slot shows that the key is absent.
Note that the same sequence of probes used to insert
an element must also be used when searching for (or
deleting) it.
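The insert, search, and delete rules for open addressing can be sketched with linear probing and a deleted marker. This is an illustration under stated assumptions: the class name `LinearProbingTable` and the sentinels `EMPTY` and `DELETED` are hypothetical, keys are natural numbers, and the hash is the division method k mod m.

```python
EMPTY = object()     # slot has never held an element
DELETED = object()   # slot held an element that was later removed


class LinearProbingTable:
    def __init__(self, m):
        self.m = m
        self.slots = [EMPTY] * m

    def _probe(self, key):
        # Probe sequence h(k), h(k)+1, ..., wrapping around the table.
        start = key % self.m          # assumption: division-method hash
        for i in range(self.m):
            yield (start + i) % self.m

    def search(self, key):
        """Return the slot index holding key, or None if it is absent."""
        for j in self._probe(key):
            if self.slots[j] is EMPTY:
                return None           # empty slot: key is not in the table
            if self.slots[j] is not DELETED and self.slots[j] == key:
                return j
        return None

    def insert(self, key):
        if self.search(key) is not None:
            return                    # already present
        for j in self._probe(key):
            if self.slots[j] is EMPTY or self.slots[j] is DELETED:
                self.slots[j] = key   # first open slot in the probe sequence
                return
        raise OverflowError("hash table is full")

    def delete(self, key):
        j = self.search(key)
        if j is not None:
            self.slots[j] = DELETED   # not EMPTY, so later searches keep probing


t = LinearProbingTable(13)
t.insert(11)
t.insert(24)         # 24 mod 13 = 11 is taken, so 24 goes to slot 12
t.delete(11)
print(t.search(24))  # 12
```

After key 11 is deleted, slot 11 holds the deleted marker, so the search for 24 probes past it and still finds 24 in slot 12.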
Open Addressing
[Figure: open-addressing hash table with slots 0..12; only the first
entry, A, survives in the extracted text.]