Introduction to Algorithms: 6.006
Massachusetts Institute of Technology
Instructors: Erik Demaine, Jason Ku, and Justin Solomon
Lecture 4: Hashing
Review
Operations, worst case O(·)

Data Structure | build(X) | find(k) | insert(x) / delete(x) | find_min() / find_max() | find_prev(k) / find_next(k)
Array          | n        | n       | n                     | n                       | n
Sorted Array   | n log n  | log n   | n                     | 1                       | log n
• Idea! Want faster search and dynamic operations. Can we find(k) faster than Θ(log n)?
• Answer is no (lower bound)! (But actually, yes...!?)
Comparison Model
• In this model, assume algorithm can only differentiate items via comparisons
• Comparable items: black boxes only supporting comparisons between pairs
• Comparisons are <, ≤, =, ≥, >, ≠; outputs are binary: True or False
• Goal: Store a set of n comparable items, support find(k) operation
• Running time is lower bounded by # comparisons performed, so count comparisons!
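To make "count comparisons" concrete, here is a small illustrative sketch (not from the notes): binary search over a sorted array, counting every comparison made against stored items. The function name and counting scheme are assumptions for illustration.

```python
# Illustrative sketch: binary search on a sorted array, counting every
# comparison made against stored items (up to two per loop iteration).
def find_count_comparisons(A, k):
    comparisons = 0
    lo, hi = 0, len(A) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if A[mid] == k:          # comparison 1: equality test with a stored item
            return mid, comparisons
        comparisons += 1
        if A[mid] < k:           # comparison 2: order test on a miss
            lo = mid + 1
        else:
            hi = mid - 1
    return None, comparisons     # worst case: about 2 lg n comparisons

# Example: 15 sorted items, key absent -> roughly 2 * lg(16) = 8 comparisons.
print(find_count_comparisons(list(range(0, 30, 2)), 13))
```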
Decision Tree
• Any algorithm can be viewed as a decision tree of operations performed
• An internal node represents a binary comparison, branching either True or False
• For a comparison algorithm, the decision tree is binary (draw example)
• A leaf represents algorithm termination, resulting in an algorithm output
• A root-to-leaf path represents an execution of the algorithm on some input
• Need at least one leaf for each algorithm output, so search requires ≥ n + 1 leaves
Comparison Search Lower Bound
• What is worst-case running time of a comparison search algorithm?
• running time ≥ # comparisons ≥ max length of any root-to-leaf path ≥ height of tree
• What is minimum height of any binary tree with ≥ n + 1 leaves?
• Minimum height when binary tree is complete (all rows full except last)
• Height ≥ ⌈lg(n + 1)⌉ − 1 = Ω(log n), so running time of any comparison search is Ω(log n)
• Sorted arrays achieve this bound! Yay!
• More generally, height of tree with Θ(n) leaves and max branching factor b is Ω(log_b n)
• To get faster, need an operation that allows super-constant ω(1) branching factor. How??
Direct Access Array
• Exploit Word-RAM O(1) time random access indexing! Linear branching factor!
• Idea! Give item unique integer key k in {0,...,u − 1}, store item in an array at index k
• Associate a meaning with each index of array
• If keys fit in a machine word, i.e. u ≤ 2^w, worst-case O(1) find/dynamic operations! Yay!
• 6.006: assume input numbers/strings fit in a word, unless length explicitly parameterized
• Anything in computer memory is a binary integer, or use (static) 64-bit address in memory
• But space Θ(u), so really bad if n ≪ u! :(
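A minimal sketch of a direct access array in Python (illustrative only; a Python list is not a raw machine array, but the O(1) indexing idea is the same, and the class/method names are assumptions):

```python
# Illustrative sketch: direct access array storing the item with integer
# key k in {0, ..., u - 1} at index k of a length-u array.
class DirectAccessArray:
    def __init__(self, u):
        self.A = [None] * u      # Theta(u) space, no matter how few items stored

    def find(self, k):           # worst-case O(1): pure index arithmetic
        return self.A[k]

    def insert(self, k, v):      # worst-case O(1)
        self.A[k] = v

    def delete(self, k):         # worst-case O(1)
        self.A[k] = None

D = DirectAccessArray(100)       # keys must come from {0, ..., 99}
D.insert(42, 'item')
print(D.find(42))                # 'item'
```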
Hashing
• Idea! If n ≪ u, map keys to a smaller range {0,...,m − 1} with m = Θ(n) via a hash function h(k): {0,...,u − 1} → {0,...,m − 1}, and store items in a direct access array of size m (a hash table)
• Since m < u, no hash function is injective: some keys k₁ ≠ k₂ must map to the same index, h(k₁) = h(k₂) → Collision! :(
• Can't store both items at same index, so where to store? Either:
  – store somewhere else in the array (open addressing)
    ∗ complicated analysis, but common and practical
  – store in another data structure supporting dynamic set interface (chaining)
Chaining
• Idea! Store collisions in another data structure (a chain)
• If keys roughly evenly distributed over indices, chain size is n/m = n/Θ(n) = O(1)!
• If chains have O(1) size, all operations take O(1) time! Yay!
• If not, many items may map to same location, e.g. h(k) = constant, chain size is Θ(n) :(
• Need good hash function! So what's a good hash function?
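Before answering that, here is a minimal sketch of chaining itself (illustrative, not the notes' implementation: it uses the division hash from the next section as a placeholder, Python lists as chains, and assumed class/method names):

```python
# Illustrative sketch: hash table with chaining; each of the m slots holds
# a Python list (the "chain") of (key, value) pairs that hash there.
class ChainedHashTable:
    def __init__(self, m):
        self.m = m
        self.chains = [[] for _ in range(m)]

    def _h(self, k):             # placeholder division hash (see next section)
        return k % self.m

    def find(self, k):           # O(length of k's chain)
        for key, val in self.chains[self._h(k)]:
            if key == k:
                return val
        return None

    def insert(self, k, v):
        chain = self.chains[self._h(k)]
        for i, (key, _) in enumerate(chain):
            if key == k:         # key already present: overwrite
                chain[i] = (k, v)
                return
        chain.append((k, v))

    def delete(self, k):
        h = self._h(k)
        self.chains[h] = [(key, v) for key, v in self.chains[h] if key != k]

T = ChainedHashTable(8)
T.insert(5, 'a'); T.insert(13, 'b')   # 5 and 13 collide: both hash to slot 5
print(T.find(13))                     # 'b', found by scanning the chain
```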
Hash Functions
Division (bad): h(k) = (k mod m)
• Heuristic, good when keys are uniformly distributed!
• m should avoid symmetries of the stored keys
• Large primes far from powers of 2 and 10 can be reasonable
• Python uses a version of this with some additional mixing
• If u ≫ n, every hash function will have some input set that will create a Θ(n) size chain
• Idea! Don't use a fixed hash function! Choose one randomly (but carefully)!
Universal (good, theoretically): h_ab(k) = (((a·k + b) mod p) mod m)
• Hash family H(p, m) = {h_ab | a, b ∈ {0,...,p − 1} and a ≠ 0}
• Parameterized by a fixed prime p > u, with a and b chosen from range {0,...,p − 1}
• H is a universal family: Pr_{h∈H}{h(k_i) = h(k_j)} ≤ 1/m, ∀ k_i ≠ k_j ∈ {0,...,u − 1}
• Why is universality useful? Implies short chain lengths! (in expectation)
• X_ij: indicator random variable over h ∈ H: X_ij = 1 iff h(k_i) = h(k_j), X_ij = 0 otherwise
• Size of chain at index h(k_i) is random variable X_i = Σ_j X_ij
• Expected size of chain at index h(k_i):

  E_{h∈H}{X_i} = E{Σ_j X_ij} = Σ_j E{X_ij}
              = 1 + Σ_{j≠i} ((1)·Pr{h(k_i) = h(k_j)} + (0)·Pr{h(k_i) ≠ h(k_j)})
              ≤ 1 + Σ_{j≠i} 1/m = 1 + (n − 1)/m
• Since m = Ω(n), load factor α = n/m = O(1), so O(1) in expectation!
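A small sketch of drawing a random h_ab from H(p, m) (illustrative; the choice of the Mersenne prime 2^61 − 1 and the function name are assumptions, valid whenever p > u):

```python
# Illustrative sketch: sample h_ab(k) = ((a*k + b) mod p) mod m uniformly
# from the universal family H(p, m), for a fixed prime p > u.
import random

def random_universal_hash(u, m, p=2**61 - 1):   # 2^61 - 1 is a Mersenne prime
    assert p > u, "need a prime p larger than the key universe size u"
    a = random.randrange(1, p)   # a in {1, ..., p - 1}, i.e. a != 0
    b = random.randrange(0, p)   # b in {0, ..., p - 1}
    return lambda k: ((a * k + b) % p) % m

h = random_universal_hash(u=2**32, m=100)
print(h(12345), h(54321))        # any fixed pair collides w.p. <= 1/m over h
```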
Dynamic
• If n/m far from 1, rebuild with new randomly chosen hash function for new size m
• Same analysis as dynamic arrays, cost can be amortized over many dynamic operations
• So a hash table can implement dynamic set operations in expected amortized O(1) time! :)
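A sketch of that rebuild rule (illustrative: the thresholds and the table.items()/table.reset() helpers are hypothetical; any constants keeping n/m = Θ(1) give the same amortized bound):

```python
# Illustrative sketch: keep load factor n/m = Theta(1) by rebuilding with a
# freshly sampled hash function whenever n/m drifts too far from 1.
def maybe_rebuild(table):
    n, m = table.n, table.m
    if n > m or 4 * n < m:           # hypothetical grow/shrink thresholds
        new_m = max(2 * n, 8)        # new size proportional to current n
        items = table.items()        # hypothetical: collect all (key, value) pairs
        table.reset(new_m)           # hypothetical: new array + new random h from H(p, new_m)
        for k, v in items:
            table.insert(k, v)       # O(n) rebuild, amortized over many operations
```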
Operations, worst case O(·)

Data Structure      | build(X) | find(k) | insert(x) / delete(x) | find_min() / find_max() | find_prev(k) / find_next(k)
Array               | n        | n       | n                     | n                       | n
Sorted Array        | n log n  | log n   | n                     | 1                       | log n
Direct Access Array | u        | 1       | 1                     | u                       | u
Hash Table          | n (e)    | 1 (e)   | 1 (a)(e)              | n                       | n

(e) = expected, (a) = amortized

MIT OpenCourseWare
https://ocw.mit.edu
6.006 Introduction to Algorithms
Spring 2020
For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms