Introduction to Algorithms: 6.006
Massachusetts Institute of Technology
Instructors: Erik Demaine, Jason Ku, and Justin Solomon

Lecture 4: Hashing

Review

                                Operations O(·)
    Data Structure      Container   Static    Dynamic     Order
                        build(X)    find(k)   insert(x)   find_min()   find_prev(k)
                                              delete(x)   find_max()   find_next(k)
    Array               n           n         n           n            n
    Sorted Array        n log n     log n     n           1            log n

• Idea! Want faster search and dynamic operations. Can we find(k) faster than Θ(log n)?
• Answer is no (lower bound)! (But actually, yes...!?)

Comparison Model

• In this model, assume algorithm can only differentiate items via comparisons
• Comparable items: black boxes only supporting comparisons between pairs
• Comparisons are <, ≤, >, ≥, =, ≠; outputs are binary: True or False
• Goal: Store a set of n comparable items, support find(k) operation
• Running time is lower bounded by # comparisons performed, so count comparisons!

Decision Tree

• Any algorithm can be viewed as a decision tree of operations performed
• An internal node represents a binary comparison, branching either True or False
• For a comparison algorithm, the decision tree is binary (draw example)
• A leaf represents algorithm termination, resulting in an algorithm output
• A root-to-leaf path represents an execution of the algorithm on some input
• Need at least one leaf for each algorithm output, so search requires ≥ n + 1 leaves
  (one output per stored item, plus one for "not found")

Comparison Search Lower Bound

• What is worst-case running time of a comparison search algorithm?
• running time ≥ # comparisons ≥ max length of any root-to-leaf path ≥ height of tree
• What is minimum height of any binary tree on ≥ n nodes?
• Minimum height when binary tree is complete (all rows full except last)
• Height ≥ ⌈lg(n + 1)⌉ − 1 = Ω(log n), so running time of any comparison search is Ω(log n)
• Sorted arrays achieve this bound! Yay!
• More generally, height of tree with Θ(n) leaves and max branching factor b is Ω(log_b n)
• To get faster, need an operation that allows super-constant ω(1) branching factor. How??

Direct Access Array

• Exploit Word-RAM O(1) time random access indexing! Linear branching factor!
• Idea! Give item unique integer key k in {0, ..., u − 1}, store item in an array at index k
• Associate a meaning with each index of array
• If keys fit in a machine word, i.e. u ≤ 2^w, worst-case O(1) find/dynamic operations! Yay!
• 6.006: assume input numbers/strings fit in a word, unless length explicitly parameterized
• Anything in computer memory is a binary integer, or use (static) 64-bit address in memory
• But space O(u), so really bad if n ≪ u... :(

Hashing

• Idea! If n ≪ u, map keys to a smaller range m = Θ(n) and use a smaller direct access
  array (a hash table), via a hash function h(k): {0, ..., u − 1} → {0, ..., m − 1}
• If m ≪ u, no hash function is injective (pigeonhole): there always exist keys k_i ≠ k_j
  with h(k_i) = h(k_j). Collision! :(
• Can't store both items at same index, so where to store? Either:
  – store somewhere else in the array (open addressing)
    ∗ complicated analysis, but common and practical
  – store in another data structure supporting dynamic set interface (chaining)

Chaining

• Idea! Store collisions in another data structure (a chain)
• If keys roughly evenly distributed over indices, chain size is n/m = n/Ω(n) = O(1)!
• If chain has O(1) size, all operations take O(1) time! Yay!
• If not, many items may map to same location, e.g. h(k) = constant, chain size is Θ(n) :(
• Need good hash function! So what's a good hash function?

Hash Functions

Division (bad): h(k) = (k mod m)

• Heuristic, good when keys are uniformly distributed!
• m should avoid symmetries of the stored keys
• Large primes far from powers of 2 and 10 can be reasonable
• Python uses a version of this with some additional mixing
• If u ≫ n, every fixed hash function has some input set that creates a Ω(n) size chain
• Idea! Don't use a fixed hash function! Choose one randomly (but carefully)!
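Before moving to randomly chosen hash functions, a concrete illustration may help. The sketch below is not part of the original notes; the class and method names are invented for this example. It implements chaining on top of a size-m array, with each chain as a Python list, using the (bad) division hash:

    class ChainedHashTable:
        """Minimal sketch of hashing with chaining (illustrative only)."""

        def __init__(self, m=8):
            self.m = m
            self.chains = [[] for _ in range(m)]  # one chain per array index

        def _hash(self, k):
            return k % self.m  # division hash h(k) = (k mod m): a heuristic

        def insert(self, k, v):
            chain = self.chains[self._hash(k)]
            for i, (key, _) in enumerate(chain):
                if key == k:
                    chain[i] = (k, v)   # key already present: overwrite value
                    return
            chain.append((k, v))        # collision with other keys: extend chain

        def find(self, k):
            for key, v in self.chains[self._hash(k)]:
                if key == k:
                    return v
            return None                 # the extra "not found" output

    table = ChainedHashTable(m=8)
    table.insert(61, 'a')
    table.insert(37, 'b')               # 61 mod 8 == 37 mod 8 == 5: shared chain
    assert table.find(61) == 'a' and table.find(37) == 'b'

Each operation hashes the key and scans a single chain, so it takes O(1) time while chains have O(1) size; but with adversarial keys (e.g. all multiples of m) every item lands in one chain and find degrades to Θ(n), which is exactly why a fixed hash function is not enough.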
Universal (good, theoretically): h_ab(k) = (((a·k + b) mod p) mod m)

• Hash Family: H(p, m) = {h_ab | a, b ∈ {0, ..., p − 1} and a ≠ 0}
• Parameterized by a fixed prime p > u, with a and b chosen from range {0, ..., p − 1}
• H is a universal family: Pr_{h ∈ H} {h(k_i) = h(k_j)} ≤ 1/m for all k_i ≠ k_j ∈ {0, ..., u − 1}
• Why is universality useful? Implies short chain lengths! (in expectation)
• X_ij: indicator random variable over h ∈ H: X_ij = 1 if h(k_i) = h(k_j), X_ij = 0 otherwise
• Size of chain at index h(k_i) is random variable X_i = Σ_j X_ij
• Expected size of chain at index h(k_i):

      E{X_i} = E{Σ_j X_ij} = Σ_j E{X_ij} = 1 + Σ_{j ≠ i} E{X_ij}
             = 1 + Σ_{j ≠ i} (1 · Pr{h(k_i) = h(k_j)} + 0 · Pr{h(k_i) ≠ h(k_j)})
             ≤ 1 + Σ_{j ≠ i} 1/m = 1 + (n − 1)/m

• Since m = Ω(n), load factor α = n/m = O(1), so O(1) in expectation!

Dynamic

• If n/m far from 1, rebuild with new randomly chosen hash function for new size m
• Same analysis as dynamic arrays, cost can be amortized over many dynamic operations
• So a hash table can implement dynamic set operations in expected amortized O(1) time! :)

                                Operations O(·)
    Data Structure        Container   Static    Dynamic     Order
                          build(X)    find(k)   insert(x)   find_min()   find_prev(k)
                                                delete(x)   find_max()   find_next(k)
    Array                 n           n         n           n            n
    Sorted Array          n log n     log n     n           1            log n
    Direct Access Array   u           1         1           u            u
    Hash Table            n (e)       1 (e)     1 (a)(e)    n            n

    ((e) = expected, (a) = amortized)
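As a hedged illustration of the universal family above (not from the original notes; the function name and the hard-coded prime are assumptions for this sketch), one way to sample h_ab in Python:

    import random

    def make_universal_hash(u, m):
        """Sample h(k) = ((a*k + b) mod p) mod m from the family H(p, m).

        Sketch under the notes' assumptions: p is a fixed prime > u, and
        a, b are drawn from {0, ..., p - 1} with a != 0. The prime below
        is hard-coded for illustration rather than derived from u.
        """
        p = 2305843009213693951        # the Mersenne prime 2^61 - 1; needs u < p
        assert u < p
        a = random.randrange(1, p)     # a in {1, ..., p - 1}
        b = random.randrange(0, p)     # b in {0, ..., p - 1}
        return lambda k: ((a * k + b) % p) % m

    # Distinct keys collide with probability <= 1/m over the random choice
    # of h, so the expected chain length is 1 + (n - 1)/m = O(1) when m = Ω(n).
    m = 2 ** 14
    h = make_universal_hash(u=2 ** 32, m=m)
    chain_sizes = [0] * m
    for k in random.sample(range(2 ** 32), 10 ** 4):
        chain_sizes[h(k)] += 1
    print(max(chain_sizes))            # stays small for any fixed key set

Because a and b are drawn at random after the keys are fixed, no single input set can force long chains in expectation, in contrast to the division hash.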

MIT OpenCourseWare
https://ocw.mit.edu

6.006 Introduction to Algorithms
Spring 2020

For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms