princeton univ. F’14 cos 521: Advanced Algorithm Design
Lecture 1: Course Intro and Hashing
Lecturer: Sanjeev Arora Scribe: Sanjeev
Algorithms are integral to computer science and every computer scientist (even as an
undergrad) has designed several algorithms. So has many a physicist, electrical engineer,
mathematician etc. This course is meant to be your one-stop shop to learn how to design
a variety of algorithms. The operative word is “variety.” In other words, you will avoid the
blinders that one often sees in domain experts. A Bayesian needs to see priors on the data
before he can begin designing algorithms; an optimization expert needs to cast all problems
as convex optimization; a systems designer has never seen any problem that cannot be
solved by hashing. (OK, mostly kidding but there is some truth in these stereotypes.)
These and more domain-specific ideas make an appearance in our course, but we will learn
to not be wedded to any single approach.
The primary skill you will learn in this course is how to analyse algorithms: prove their
correctness, bound their running time, and establish any other relevant properties. Learning to analyse a
variety of algorithms (designed by others) will let you design better algorithms later in life.
I will try to fill the course with beautiful algorithms. Be prepared for frequent rose-smelling
stops, in other words.
The changing graph. In undergrad algorithms the graph is given and arbitrary (worst-
case). In grad algorithms we are willing to look at where the graph came from (social
network, computer vision etc.) since those properties may be germane to designing a good
algorithm. (This is not a radical idea of course but we will see that formulating good graph
models is not easy. This is why you see a lot of heuristic work in practice, without any
mathematical proofs of correctness.)
Changing data structures: In undergrad algorithms the data structures were simple
and often designed to hold data generated by other algorithms. A stack allows you to hold
vertices during depth-first search traversal of a graph, or instances of a recursive call to a
procedure. A heap is useful for sorting and searching.
But in the newer applications, data often comes from sources we don’t control. Thus it
may be noisy, or inexact, or both. It may be high dimensional. Thus something like heaps
will not work, and we need more advanced data structures.
We will encounter the “curse of dimensionality,” which constrains algorithm design for
high-dimensional data.
Type of analysis: In undergrad algorithms the algorithms were often exact and work on
all (i.e., worst-case) inputs. In grad algorithms we are willing to relax these requirements.
2 Hashing: Preliminaries
Now we briefly study hashing, both because it is such a basic data structure, and because
it is a good setting to develop some fluency in probability calculations.
Hashing can be thought of as a way to rename an address space. For instance, a router
at the internet backbone may wish to have a searchable database of destination IP addresses
of packets that are whizzing by. An IP address is 128 bits, so the number of possible IP
addresses is 2^128, which is too large to let us have a table indexed by IP addresses. Hashing
allows us to rename each IP address by fewer bits. Furthermore, this renaming is done
probabilistically, and the renaming scheme is decided in advance before we have seen the
actual addresses. In other words, the scheme is oblivious to the actual addresses.
Formally, we want to store a subset S of a large universe U (where |U| = 2^128 in the
above example). And |S| = m is a relatively small subset. For each x ∈ U, we want to
support 3 operations:
• insert(x): insert x into the data structure.
• delete(x): delete x from the data structure.
• query(x): check whether x is in the data structure.
[Figure: the hash function h maps the universe U into a table with n slots.]
A hash table can support all these 3 operations. We design a hash function
h : U −→ {0, 1, . . . , n − 1} (1)
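To make the setup concrete, here is a minimal sketch of a chained hash table supporting these three operations. It is illustrative only: the class name and the multiplier-based stand-in for h are placeholder choices, not a construction from these notes.

import random

class ChainedHashTable:
    """Illustrative hash table with chaining; supports insert, delete, and lookup."""

    def __init__(self, n):
        self.n = n                               # number of slots
        self.slots = [[] for _ in range(n)]      # one chain (here: Python list) per slot
        self.a = random.randrange(1, 1 << 61)    # random multiplier standing in for h

    def _h(self, x):
        # Maps a key from the large universe U into {0, ..., n-1}.
        return (self.a * hash(x)) % self.n

    def insert(self, x):
        bucket = self.slots[self._h(x)]
        if x not in bucket:
            bucket.append(x)

    def delete(self, x):
        bucket = self.slots[self._h(x)]
        if x in bucket:
            bucket.remove(x)

    def lookup(self, x):
        return x in self.slots[self._h(x)]

With n somewhat larger than the number of stored keys, the chains stay short in expectation, which is exactly what the analysis below quantifies.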
3 Hash Functions
Say we have a family of hash functions H, and for each h ∈ H, h : U −→ [n].¹ What do
we mean if we say these functions are random?
For any x1, x2, . . . , xm ∈ S (xi ≠ xj when i ≠ j), and any a1, a2, . . . , am ∈ [n], ideally a
random H should satisfy:
¹ We use [n] to denote the set {0, 1, . . . , n − 1}.
• Pr_{h∈H}[h(x1) = a1] = 1/n.
• Pr_{h∈H}[h(x1) = a1 ∧ h(x2) = a2] = 1/n^2. Pairwise independence.
• Pr_{h∈H}[h(x1) = a1 ∧ h(x2) = a2 ∧ · · · ∧ h(xk) = ak] = 1/n^k. k-wise independence.
• Pr_{h∈H}[h(x1) = a1 ∧ h(x2) = a2 ∧ · · · ∧ h(xm) = am] = 1/n^m. Full independence (here
one takes m = |U|).
Generally speaking, we encounter a tradeoff. The more random H is, the greater the
number of random bits needed to generate a function h from this class, and the higher the
cost of computing h.
For example, if H is a fully random family, there are n^m possible h, since each of the
m elements of S has n possible locations it can hash to. So we need log |H| = m log n
bits to represent each hash function. Since m is usually very large, this is not practical.
But the advantage of a random hash function is that it ensures very few collisions with
high probability. Let Lx be the length of the linked list containing x; this is just the number
of elements with the same hash value as x. Define the indicator random variable

    I_y = 1 if h(y) = h(x), and 0 otherwise.        (2)
So L_x = 1 + Σ_{y∈S, y≠x} I_y, and

    E[L_x] = 1 + Σ_{y∈S, y≠x} E[I_y] = 1 + (m − 1)/n.        (3)
Usually we choose n > m, so this expected length is less than 2. Later we will analyse
this in more detail, asking how likely is Lx to exceed say 100.
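As a quick sanity check of equation (3), the following sketch (with arbitrary illustrative parameter values) simulates a fully random hash function and measures the chain length of a fixed key x.

import random

def average_chain_length(m=1000, n=2048, trials=2000):
    """Estimate E[L_x] for a fixed key x when m keys are hashed into n slots."""
    total = 0
    for _ in range(trials):
        hx = random.randrange(n)                          # slot of the fixed key x
        collisions = sum(1 for _ in range(m - 1)
                         if random.randrange(n) == hx)    # other keys landing in that slot
        total += 1 + collisions
    return total / trials

print(average_chain_length())   # should be close to 1 + (m-1)/n, i.e. about 1.49 here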
The expectation calculation above doesn’t need full independence; pairwise indepen-
dence would actually suffice. This motivates the next idea.
Fix a prime p ≥ |U|. For a ∈ {1, . . . , p − 1} and b ∈ {0, . . . , p − 1}, let f_{a,b}(x) = ax + b mod p,
and let

    h_{a,b}(x) = f_{a,b}(x) mod n.        (6)
Lemma 1
For any x1 ≠ x2 and s ≠ t, the following system has exactly one solution (a, b):

    a·x1 + b ≡ s (mod p)
    a·x2 + b ≡ t (mod p)

Since [p] constitutes a finite field, we have that a = (x1 − x2)^{−1}(s − t) and b = s − a·x1.
Since we have p(p − 1) different hash functions in H in this case,

    Pr_{h∈H}[h(x1) = s ∧ h(x2) = t] = 1/(p(p − 1)).        (9)
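Here is a small sketch of this family in code. The particular prime is an arbitrary illustrative choice; any prime p at least as large as the universe works, and the sample keys are placeholders.

import random

P = (1 << 61) - 1                      # an illustrative prime, comfortably larger than 32-bit keys

def sample_hash(n):
    """Draw h_{a,b}(x) = ((a*x + b) mod p) mod n uniformly from the family H."""
    a = random.randrange(1, P)         # a in {1, ..., p-1}
    b = random.randrange(0, P)         # b in {0, ..., p-1}
    return lambda x: ((a * x + b) % P) % n

h = sample_hash(1024)
print(h(0x7F000001), h(0x0A000002))    # two IP-like keys mapped to slots in [1024]

Note that sampling a and b needs only about 2 log p random bits, in contrast with the m log n bits required for a fully random function.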
[Figure: two-level hashing — the s_i elements in bucket i are rehashed into a second-level table with s_i^2 locations.]
5 Load Balance
Now we think a bit about how large the linked lists (i.e., the number of collisions) can get. Let
us think for simplicity about hashing n keys in a hash table of size n. This is the famous
balls-and-bins calculation, also called the load balance problem. We have n balls and n bins,
and we randomly put the balls into bins. Then for a given i,

    Pr[bin_i gets more than k elements] ≤ (n choose k) · (1/n)^k ≤ 1/k!.        (19)
By Stirling’s formula,

    k! ∼ √(2πk) · (k/e)^k.        (20)

If we choose k = O(log n / log log n), we can make 1/k! ≤ 1/n^2. Then

    Pr[∃ a bin with ≥ k balls] ≤ n · (1/n^2) = 1/n.        (21)
So with probability larger than 1 − 1/n,²

    max load ≤ O(log n / log log n).        (22)

² This can easily be improved to 1 − 1/n^c for any constant c.
Aside: The above load balancing is not bad; no more than O(log n / log log n) balls in a bin with
high probability. Can we modify the method of throwing balls into bins to improve the load
balancing? We use an idea that you use at the supermarket checkout: instead of going to
a random checkout counter you try to go to the counter with the shortest queue. In the
load balancing case this is computationally too expensive: one has to check all n queues.
A much simpler version is the following: when the ball comes in, pick 2 random bins, and
place the ball in the one that has fewer balls. It turns out this modified rule ensures that the
maximal load drops to O(log log n), which is a huge improvement. This is called the power of
two choices.
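A short simulation makes the contrast visible. This is only a sketch with an arbitrary choice of n; the function name and parameters are illustrative.

import random

def max_load(n, choices):
    """Throw n balls into n bins; each ball goes to the least-loaded of `choices` random bins."""
    bins = [0] * n
    for _ in range(n):
        candidates = [random.randrange(n) for _ in range(choices)]
        best = min(candidates, key=lambda b: bins[b])
        bins[best] += 1
    return max(bins)

n = 100_000
print("one choice :", max_load(n, 1))   # grows roughly like log n / log log n
print("two choices:", max_load(n, 2))   # grows roughly like log log n, so much smaller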
princeton univ. F’13 cos 521: Advanced Algorithm Design
Lecture 2: Karger’s Min Cut Algorithm
Lecturer: Sanjeev Arora Scribe: Sanjeev
Today’s topic is simple but gorgeous: Karger’s min cut algorithm and its extension. It
is a simple randomized algorithm for finding the minimum cut in a graph: a subset of
vertices S in which the set of edges leaving S, denoted E(S, S̄), has minimum size among
all subsets. You may have seen an algorithm for this problem in your undergrad class that
uses maximum flow. Karger’s algorithm is elementary and a great introduction to
randomized algorithms.
The algorithm is this: Pick a random edge, and merge its endpoints into a single “su-
pernode.” Repeat until the graph has only two supernodes, whose cut is output as our guess for
the min-cut. (As you continue, the supernodes may develop parallel edges; these are allowed.
Self-loops are ignored.)
Note that if you pick a random edge, it is more likely to come from parts of the graph
that contain more edges in the first place. Thus this algorithm looks like a great heuristic
to try on all kinds of real-life graphs, where one wants to cluster the nodes into “tightly-
knit” portions. For example, social networks may cluster into communities; graphs capturing
similarity of pixels may cluster to give different portions of the image (sky, grass, road etc.).
Thus instead of continuing Karger’s algorithm until you have two supernodes left, you could
stop it when there are k supernodes and try to understand whether these correspond to a
reasonable clustering.
Today we will first see that the above version of the algorithm yields the optimum min
cut with probability at least 2/n^2. Thus we can repeat it say 20n^2 times, and output the
smallest cut seen in any iteration. The probability that the optimum cut is not seen in any
repetition is at most (1 − 2/n^2)^{20n^2} < 0.01.
Unfortunately, this simple version has running time about n^4, which is not great.
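As a concrete (and unoptimized) sketch, here is one way to implement the contraction step and the 20n^2 repetitions. It assumes a connected graph given as a list of edges over vertices 0, . . . , n−1; the function names are just for illustration.

import random

def karger_one_run(edges, n):
    """One run of the contraction algorithm; returns the size of the cut it finds."""
    parent = list(range(n))

    def find(v):                                  # union-find root = current supernode
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    alive = list(edges)                           # edges whose endpoints lie in distinct supernodes
    supernodes = n
    while supernodes > 2:
        u, v = random.choice(alive)               # pick a random surviving edge
        parent[find(u)] = find(v)                 # merge its endpoints into one supernode
        supernodes -= 1
        alive = [e for e in alive if find(e[0]) != find(e[1])]   # drop edges that became self-loops
    return len(alive)                             # edges crossing the two remaining supernodes

def karger_min_cut(edges, n):
    # Repeating ~20 n^2 times drives the failure probability below (1 - 2/n^2)^(20 n^2) < 0.01.
    return min(karger_one_run(edges, n) for _ in range(20 * n * n))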
So then we see a better version with a simple tweak that brings the running time down
to closer to n^2. The idea is roughly that repetition ensures fault tolerance. The real-life
advice of making two backups of your hard drive is related to this: the probability that both
fail is much smaller than the probability that one does. In the case of Karger’s algorithm, the
overall probability of success is too low. But if you run it part of the way, until the graph has
n/√2 supernodes, the chance that the mincut hasn’t changed is at least 1/2. So you make two
independent runs that go down to n/√2 supernodes, and recursively solve both of these. Thus the
expected number of instances that will yield the correct mincut is 2 × 1/2 = 1. (Unwrapping
the recursion, you see that each instance of size n/√2 will generate two instances of size
n/2, and so on.) Simple induction shows that this 2-wise repetition is enough to bring the
probability of success above 1/ log n.
As you might suspect, this is not the end of the story but improvements beyond this
get more hairy. If anybody is interested I can give more pointers.
Also this algorithm forms the basis of other algorithms for other tasks. Again, talk to
me for pointers.
Chapter 3
Today’s topic is deviation bounds: what is the probability that a random variable deviates
from its mean by a lot? Recall that a random variable X is a mapping from a probability
space to R. The expectation or mean is denoted E[X] or sometimes as µ.
In many settings we have a set of n random variables X1 , X2 , X3 , . . . , Xn defined on
the same probability space. To give an example, the probability space could be that of all
possible outcomes of n tosses of a fair coin, and Xi is the random variable that is 1 if the
ith toss is a head, and is 0 otherwise, which means E[Xi ] = 1/2.
The first observation we make is that of the Linearity of Expectation, viz.
    E[Σ_i X_i] = Σ_i E[X_i].

It is important to realize that linearity holds regardless of whether or not the random
variables are independent.
Can we say something about E[X1 X2]? In general, nothing much, but if X1, X2 are
independent (formally, this means that for all a, b: Pr[X1 = a, X2 = b] = Pr[X1 =
a] Pr[X2 = b]), then E[X1 X2] = E[X1] E[X2].
Note that if the X_i’s are pairwise independent (i.e., each pair is mutually independent),
then var[Σ_i X_i] = Σ_i var[X_i].
Note that this is just another way to write the trivial observation that E[X] ≥ k · Pr[X ≥ k].
Can we give any meaningful upper bound on Pr[X < c · E[X]] where c < 1, in other
words the probability that X is a lot less than its expectation? In general we cannot.
However, if we know an upper bound on X then we can. For example, if X ∈ [0, 1] and
E[X] = µ then for any c < 1 we have (simple exercise)

    Pr[X ≤ cµ] ≤ (1 − µ)/(1 − cµ).

Sometimes this is also called an averaging argument.
Example 1 Suppose you took a lot of exams, each scored from 1 to 100. If your average
score was 90 then in at least half the exams you scored at least 80.
and so,

    Pr[|X − µ|^2 ≥ k^2 σ^2] ≤ 1/k^2.
Here is a simple fact that’s used a lot: If Y1, Y2, . . . , Yt are iid (which is jargon for inde-
pendent and identically distributed) then the variance of their average (1/t) Σ_i Y_i is exactly 1/t
times the variance of one of them. Using Chebyshev’s inequality, this already implies that
the average of iid variables converges sort-of strongly to the mean.
(Here X = Σ_i Y_i counts the balls landing in a fixed bin when m balls are thrown randomly
into n bins, with Y_i the indicator that ball i lands there, so E[X] = m/n.) Now for independent
random variables E[Y_i Y_j] = E[Y_i] E[Y_j], so E[X^2] = m/n + m(m − 1)/n^2.
Hence the variance is very close to m/n, and thus Chebyshev implies that
Pr[X > 2m/n] < n/m. When m > 3n, say, this is stronger than Markov.
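For a concrete feel (with arbitrary illustrative numbers), compare the two bounds when m is much larger than n:

# Tail bounds on X, the load of one fixed bin, with m balls and n bins (E[X] = m/n).
m, n = 10_000, 100
markov    = 1 / 2        # Markov:    Pr[X > 2*m/n] <= E[X] / (2*m/n) = 1/2
chebyshev = n / m        # Chebyshev: Pr[X > 2*m/n] <  n/m = 0.01
print(markov, chebyshev)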
The Chernoff bound states that if X = Σ_i X_i is a sum of independent random variables,
where X_i has mean µ_i and standard deviation σ_i, then

    Pr[|X − µ| > kσ] ≤ 2 exp(−k^2/4),

where µ = Σ_i µ_i and σ^2 = Σ_i σ_i^2. Also, k ≤ σ/2 (say).
Instead of proving the above we prove a simpler theorem for binary valued variables
which showcases the basic idea.
Theorem 3
Let X1, X2, . . . , Xn be independent 0/1-valued random variables and let p_i = E[X_i], where
0 < p_i < 1. Then the sum X = Σ_{i=1}^n X_i, which has mean µ = Σ_{i=1}^n p_i, satisfies

    Pr[X ≥ (1 + δ)µ] ≤ (c_δ)^µ,

where c_δ is shorthand for e^δ/(1 + δ)^{(1+δ)}.
Remark: There is an analogous inequality that bounds the probability of deviation below
the mean, whereby δ becomes negative, the ≥ in the probability becomes ≤, and the c_δ
is very similar.
Proof: Surprisingly, this inequality also is proved using the Markov inequality, albeit
applied to a different random variable.
We introduce a positive constant t (which we will specify later) and consider the random
variable exp(tX): when X is a, this variable is exp(ta). The advantage of this variable is
that

    E[exp(tX)] = E[exp(t Σ_i X_i)] = E[Π_i exp(tX_i)] = Π_i E[exp(tX_i)],        (3.1)

where the last equality holds because the X_i r.v.s are independent, which implies that
the exp(tX_i)’s are also independent. Now,

    E[exp(tX_i)] = (1 − p_i) + p_i e^t,
therefore,

    Π_i E[exp(tX_i)] = Π_i [1 + p_i(e^t − 1)] ≤ Π_i exp(p_i(e^t − 1))
                     = exp(Σ_i p_i(e^t − 1)) = exp(µ(e^t − 1)).        (3.2)

Applying Markov’s inequality to the random variable exp(tX),

    Pr[X ≥ (1 + δ)µ] = Pr[exp(tX) ≥ exp(t(1 + δ)µ)] ≤ E[exp(tX)] / exp(t(1 + δ)µ)
                     ≤ exp(µ(e^t − 1)) / exp(t(1 + δ)µ),

using lines (3.1) and (3.2) and the fact that t is positive. Since t is a dummy variable, we can
choose any positive value we like for it. The right hand side is minimized if t = ln(1 + δ)—just
differentiate—and this leads to the theorem statement. □
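To spell out that last step (a routine check, in the notation of Theorem 3): substituting t = ln(1 + δ)
gives e^t − 1 = δ and exp(t(1 + δ)µ) = (1 + δ)^{(1+δ)µ}, so the bound above becomes

    exp(µ(e^t − 1)) / exp(t(1 + δ)µ) = e^{δµ} / (1 + δ)^{(1+δ)µ} = (e^δ / (1 + δ)^{(1+δ)})^µ = (c_δ)^µ.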
The following is the more general inequality for variables that do not necessarily lie in [−1, 1]. It is
proved similarly to the Chernoff bound.
Theorem 4 (Hoeffding)
Suppose X1, X2, . . . , Xn are independent r.v.’s, with a_i ≤ X_i ≤ b_i. If X = Σ_i X_i and
µ = E[X] then

    Pr[X − µ > t] ≤ exp(−t^2 / Σ_i (b_i − a_i)^2).
Using only memory equivalent to 5 lines of printed text, you can estimate with a
typical accuracy of 5 per cent and in a single pass the total vocabulary of Shake-
speare. This wonderfully simple algorithm has applications in data mining, esti-
mating characteristics of huge data flows in routers, etc. It can be implemented
by a novice, can be fully parallelized with optimal speed-up and only needs minimal
hardware requirements. There’s even a bit of math in the middle!
Opening lines of a paper by Durand and Flajolet, 2003.
Strictly speaking, one cannot hash to a real number since computers lack infinite preci-
sion. Instead, one hashes to rational numbers in [0, 1]. For instance, hash IP addresses to
the set [p] as before, and then think of the number “i mod p” as the rational number i/p. This
works OK so long as our method doesn’t use too many bits of precision in the real-valued
hash.
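A tiny sketch of this renaming (the prime and the keys below are placeholders; the distinct-count estimators alluded to in the quote above then work with these [0, 1) values, e.g. by tracking their minimum, but that part is not reproduced here):

import random

P = (1 << 31) - 1                          # an illustrative prime; any p >= |U| works

a = random.randrange(1, P)
b = random.randrange(0, P)

def hash_to_unit_interval(x):
    """Hash x into [p] as before, then read the result i as the rational number i/p."""
    return ((a * x + b) % P) / P

print([hash_to_unit_interval(x) for x in (17, 42, 17)])   # equal keys get equal values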