Linearity of Expectation: Unraveling Black Magic
Tony Lu
October 13, 2022
Contents
1 Introduction
  1.1 Behind this Handout
  1.2 Conceptual Introduction
2 The Magic
  2.1 Our First Tour
  2.2 Our Second Tour
6 Practice Problems
§1 Introduction
§1.1 Behind this Handout
Problems concerning linearity of expectation in computational contests are often viewed
with a kind of “sacred fog”: fairly routine expected value problems are often touted as
“black magic” or “beautifully constructed.” As the old saying goes, ignorance leads to
veneration; in this handout, I will attempt to unravel the magic of expectation and offer
the reader a comprehensive way to understand and solve these “magical” problems.
I was inspired to write this handout after reading and working through multiple resources and problem sets on linearity of expectation and, more generally, double counting ideas, and feeling dissatisfied by the depth of discussion given in them. It pains me that many students can only vaguely understand and intuitively apply linearity (correctly in many contest situations) but fail to see what is really acting behind the scenes.
This handout will not go into the details of the probabilistic method in an attempt to
focus on more computational ideas; however, there will be a significant portion dedicated
to using similar ideas in existence proofs on olympiad problems.
Definition 1.1. We call an experiment any sequence of actions that can be replicated
and has a well-defined set of possible outcomes, which we call the sample space. Such
an experiment is deterministic if it has a unique outcome, and random if it has more
than one possible outcome. An event is a subset of the sample space of the experiment.
Exercise 1.2. Two distinguishable fair coins are tossed and a die is rolled simultaneously.
(a) Describe the sample space of the experiment. How many outcomes are in it?
(c) Let A be the event that the total number of heads matches the number on the die
in parity. Find the number of outcomes in A, and find the probability of A.
Exercise 1.5. Use the definition of expected value to compute the following:
(a) The expected value of the sum of the two outcomes when two dice are rolled.
(b) The expected value of the product of the two outcomes when two dice are rolled.
(c) The expected value of your gain or loss in a lottery with a ticket costing $25 and a $20 million jackpot that you have a 0.0001% chance of winning.
Now that we have the basic ideas down, throughout the rest of the handout I will
show how ultimately the theory of expected value is the foundation of the Pigeonhole
Principle, helps us prove many combinatorial and algebraic identities, and is essentially
equivalent to the enigmatic
Σ_{a∈A} Σ_{b∈B} f(a, b) = Σ_{b∈B} Σ_{a∈A} f(a, b).
§2 The Magic
§2.1 Our First Tour
Example 2.1
At a party, the name tags of n participants are shuffled and then redistributed
randomly to the same n people. What is the expected number of people who have
their own name tag?
The generic value of n makes computing the expected value using the definition rather
cumbersome; it’s difficult to find the probability that a random permutation of the name
tags has exactly k fixed points, for instance. Let’s begin by writing out our expected
value sum:
n! · E[X] = (# permutations with 1 fixed point) · 1 + (# permutations with 2 fixed points) · 2 + · · · + (# permutations with n fixed points) · n.
Now, we can easily see that the RHS is, in reality, the total number of fixed points across
all permutations of the name tags. This confirms our intuition that the expected value
is an average: more specifically,
Expected # of times something appears = (Total # of times it appears across all permutations) / (# of permutations).
By now, we should have confidently established the following proposition:
Proposition 2.2
For any random variables X1, X2, . . . , Xn,
E[X1 + X2 + · · · + Xn] = E[X1] + E[X2] + · · · + E[Xn].

This should be fairly easy to wrap your head around: the average of a sum is the sum of the averages. We defer a formal proof until Section 4, where we will prove linearity of expectation in general.
To finish the first problem, we present the first interpretation of linearity: using a
table.
Say n = 3, and let the three participants be A, B, and C. Let's list out every permutation and find how many fixed points it has, marking each fixed point (bolded in the original table) with a star:

        A    B    C    # Fixed Points
   1    A*   B*   C*   3
   2    A*   C    B    1
   3    B    A    C*   1
   4    B    C    A    0
   5    C    A    B    0
   6    C    B*   A    1

Our goal is to find the total number of starred entries in the table. Notice that in this table, there are two starred entries per column; why is this?

Exercise 2.3. Before you move on, explain to yourself why there are two starred entries per column.

To do so, we can zero in on any given column, say column B. The number of permutations with a starred B in that column is 2! = 2, as there are two permutations with a B in the second position. In general, there are (n − 1)! permutations with a B in the second position if there are n people total.

We can apply the same argument to show that there are (n − 1)! permutations with any given element in its fixed position, which is precisely the number of starred entries in each column. As there are n columns, the total number of starred entries is n · (n − 1)! = n!. Now, we can finish by computing the expected value:

E[X] = n!/n! = 1.
Magical, isn’t it? We’ll come back to this problem and show how to narrow down and
simplify this idea later.
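For a quick computational sanity check, we can brute-force small cases of this example; the script below (a sketch, with helper names of our own choosing) enumerates all n! permutations and computes the exact average number of fixed points:

```python
import math
from fractions import Fraction
from itertools import permutations

def average_fixed_points(n):
    """Exact average number of fixed points over all n! permutations of 0..n-1."""
    total = sum(
        sum(1 for i, v in enumerate(perm) if i == v)  # fixed points of this permutation
        for perm in permutations(range(n))
    )
    return Fraction(total, math.factorial(n))

# The average is exactly n!/n! = 1 for every n.
for n in range(1, 8):
    assert average_fixed_points(n) == 1
```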
Example 2.4
Every second, an event happens with probability p if it has not happened before. Find the expected number of seconds until the event first happens.

Intuitively, the answer should be 1/p: if you have a 1/2 probability of flipping heads, you would expect to flip the coin twice before flipping heads. But why? Let's go back to the definition again.
The event happens in the first second with probability p. The probability that it first happens in the second second is (1 − p) · p, and in general, the probability that it first happens in the kth second is (1 − p)^(k−1) · p. Thus, by definition, our expected value is

E[X] = p + 2p(1 − p) + 3p(1 − p)^2 + · · · + k(1 − p)^(k−1) p + · · · .
This is an arithmetico-geometric series, so we could sum it routinely by multiplying by
1 − p and subtracting. However, let’s inspect what this sum is really doing a little bit
more. We can break the sum up into geometric series:
E[X] = Σ_{k≥1} k p(1 − p)^(k−1) = Σ_{j≥1} Σ_{k≥j} p(1 − p)^(k−1) = Σ_{j≥1} (1 − p)^(j−1) = 1/p,

as we had expected.
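As a quick numerical check, both the arithmetico-geometric series and its regrouping into geometric (tail) sums converge to 1/p; the sketch below truncates the infinite sums, which is harmless since the tails vanish geometrically:

```python
def series_form(p, K=20_000):
    # E[X] = sum_{k>=1} k * p * (1-p)^(k-1), truncated at K terms
    return sum(k * p * (1 - p) ** (k - 1) for k in range(1, K + 1))

def tail_form(p, K=20_000):
    # Same sum regrouped into geometric series: sum_{j>=1} (1-p)^(j-1)
    return sum((1 - p) ** (j - 1) for j in range(1, K + 1))

for p in (0.5, 0.25, 0.1):
    assert abs(series_form(p) - 1 / p) < 1e-9
    assert abs(tail_form(p) - 1 / p) < 1e-9
```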
Sum of rows; sum of columns; what’s going on?
This is essentially the entire point of the Pigeonhole Principle, and why all of us consider it to really be common sense: some element must be at least the average, and some element must be at most the average.

In other words, if we are able to compute the average, we are able to obtain good bounds on the largest possible minimum and the smallest possible maximum: we are able to prove that the minimum has to be sufficiently small and the maximum has to be sufficiently large. This is why the Pigeonhole Principle is useful in existence proofs: if we prove that the maximum is sufficiently large, then we know that there will be an element that is sufficiently large, despite not really knowing which element it is, and vice versa.

As we will see, one of the difficult parts of problems that invoke Pigeonhole is actually computing what the average is. Average ... have we seen that before? Nonetheless, sometimes what we apply Pigeonhole to really matters; we will give a few examples.
Example 3.3
Let S be a subset of {1, 2, 3, · · · , 100} such that there do not exist a, b ∈ S with a | b.
Find the maximum possible value of |S|.
We can find that the set {51, 52, 53, · · · , 100} works, for an answer of 50: any two of its elements have ratio less than 2, so neither can divide the other. To prove that this is the maximum, we can partition {1, 2, · · · , 100} into disjoint smaller sets in which any two elements from one set would violate the condition. Then, we will be able to show that the number of chosen elements is at most the number of sets by Pigeonhole.
There are many ways to construct such sets, but the easiest way is to consider, for each odd m ≤ 100, the set

A_m = {2^k · m | k ≥ 0}.

Any two elements in one of these sets have a quotient which is a power of two, so one divides the other, and there are precisely 50 such sets. Thus, by Pigeonhole, if we had at least 51 elements, two must fall in the same A_m, and thus they would violate the condition.
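Both halves of this argument are easy to confirm by machine; the short sketch below (with a helper name of our own choosing) checks that {51, . . . , 100} contains no divisor pair and that the classes A_m number exactly 50:

```python
# No element of {51, ..., 100} divides another: a | b would force b >= 2a > 100.
S = set(range(51, 101))
assert not any(a != b and b % a == 0 for a in S for b in S)

def odd_part(x):
    """The odd m with x = 2^k * m, i.e., the label of the class A_m containing x."""
    while x % 2 == 0:
        x //= 2
    return x

# Exactly 50 classes A_m cover {1, ..., 100}, so 51 chosen numbers must collide.
assert len({odd_part(x) for x in range(1, 101)}) == 50
```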
Exercise 3.4. What is the maximum number of positive integers one can choose between
1 and 100 such that no two chosen integers differ by a factor of less than 1.5?
The above exercise can essentially be solved with the same mindset. Try it out!
Though Pigeonhole in its most raw form can only establish bounds, we can also modify
it to show exact results. Here’s an example:
Example 3.5
In a circle of 51 boys and 49 girls, show that there exist two boys such that there
are exactly 19 children strictly between the two boys.
The notion of existence is very closely related to Pigeonhole; in this problem, we can
apply Pigeonhole to the sets of chairs that differ by exactly 20 indices.
Let A_i denote the set consisting of chair i and chair i + 20, indices taken modulo 100. Each boy's chair falls in exactly two of the A_i, and there are exactly 100 such sets. By Pigeonhole, two of the 51 · 2 = 102 placements, counting multiplicity, must be in the same set; since the two chairs in any A_i are distinct, these are two different boys whose chair indices differ by exactly 20, leaving exactly 19 children strictly between them. This solves the problem.
It turns out that counting the pigeons with multiplicity can be a helpful tool. (We’ll
see how this is related to what we discussed before later!)
It is difficult to write insightful and difficult Pigeonhole problems without the tool we
will develop next, so we leave the bulk of Pigeonhole concepts and examples to the fifth
section.
Example 3.6
A soda company inscribes one of the letters A, B, C, D, E, F, G, H on the bottlecaps
of each one of its bottles uniformly at random. What is the expected number of
sodas Andy must buy to collect at least one bottlecap labelled with each of the
letters?
The idea behind states is that there is one or multiple “intermediate steps” that are
necessary between the given outcome and the desired outcome. It is helpful to split one
big task into smaller ones that are on their own necessary to be completed.
For example, in this problem, in order to obtain all eight varieties of bottlecaps, we
must first obtain one variety of bottlecap, then two varieties, then three, and so on. It
should be easy to conclude that

E[# sodas to collect all 8 letters] = E[# sodas to go from 0 to 1 letter] + E[# sodas to go from 1 to 2 letters] + · · · + E[# sodas to go from 7 to 8 letters]

(simply use Proposition 2.2!). Now, we can narrow our view to one specific term, say getting from k types to k + 1 types.
Here, we know that every time we buy a bottle, we get a new letter with probability (8 − k)/8, as there are 8 − k remaining letters. Now, we can apply Example 2.4 to obtain that the expected value equals 8/(8 − k). Thus, the answer is

Σ_{k=0}^{7} 8/(8 − k) = 761/35.
This is fundamentally what finite states are doing: breaking a more complicated problem
into smaller bits using linearity of expectation and then analyzing the expected values of
the smaller bits themselves.
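If you would like to verify the final sum, it can be carried out exactly with rational arithmetic:

```python
from fractions import Fraction

# Expected number of sodas: sum over stages k -> k+1 of 8/(8-k), computed exactly.
expected = sum(Fraction(8, 8 - k) for k in range(8))
assert expected == Fraction(761, 35)  # about 21.74 sodas
```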
Now we take a look at infinite states, and where they originate from. First, let’s
revisit Example 2.4.
Example 3.7
Every second, an event happens with probability p if it has not happened before.
Find the expected number of seconds until the event first happens.
This time, consider the problem in the following way: suppose we begin at an initial
state (having no seconds passed), with a counter for the expected number of seconds
until the event will happen at that state, which we will call µ. Then, the event happens
with probability p at the initial state, in which case we are done in 1 second. Otherwise,
the entire process essentially repeats itself: the outcome of the event starting from the
second second is independent of the outcome during the first second. Thus, the expected number of additional seconds is still µ after the first second has passed, for an expected value of µ + 1 in this case.
Thus, by definition of expected value, we may write
µ = p · 1 + (1 − p)(µ + 1) =⇒ µ = 1/p,
as before.
The work above should not be surprising; however, what happens when we combine
this solution with our previous result? If we plug in µ = 1/p, the state equation becomes 1/p = p + (1 − p)(1/p + 1),
which is obviously an identity! Notice that the equation we derived earlier to solve for µ is essentially identical to how we evaluate an arithmetico-geometric series using S − rS. In general, any result that involves infinite states can be expanded as an infinite series; writing a one-variable state equation is essentially equivalent to writing out the arithmetico-geometric series.
So, in essence, it is true that every computational problem that can be "solved with states" has a methodical, albeit long-winded and difficult-to-comprehend, solution using the definition of expected value. Let's look at a more complicated example with a similar flavor.
Example 3.8
Alex, Barry, and Charles take turns rolling a fair die, in that order; the first player to roll a 6 wins. Find the probability that Alex wins.

First, let's look at the vanilla states solution to this problem. Let a be the probability that Alex wins given it is Alex's turn, b be the probability that Alex wins given that it is Barry's turn, and c the probability that Alex wins given that it is Charles' turn. Then, analyzing each move of the game, we can write the following equations:
a = 1/6 + (5/6) b
b = (5/6) c
c = (5/6) a.
Before we proceed, consider the following question:
Exercise 3.9. Where in these state equations have we encoded the condition that Alex
goes first? How would we write these equations if we were given that Barry went first?
To answer this question, let’s get rid of the intermediate states b and c, and instead
have only one equation
a = 1/6 + (125/216) a.
This is telling us that during Alex's turn, he has a probability 1/6 of winning and can otherwise return to his turn with probability 125/216. Thus, we can write the corresponding
geometric series for the probability:

a = (1/6) · Σ_{k=0}^{∞} (125/216)^k = (1/6) · (216/91) = 36/91.
This geometric series helps us understand why our equations, and ultimately the value of a that we solve for, are correct.
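Both routes, solving the one-variable state equation and summing the geometric series, can be checked exactly with rational arithmetic; the sketch below truncates the series at 200 terms, which already agrees with 36/91 to well beyond any practical precision:

```python
from fractions import Fraction

r = Fraction(125, 216)          # probability the game returns to Alex's turn
a = Fraction(1, 6) / (1 - r)    # solve a = 1/6 + r*a exactly
assert a == Fraction(36, 91)

# Truncated geometric series (1/6) * sum_k r^k approaches the same value.
partial = Fraction(1, 6) * sum(r ** k for k in range(200))
assert 0 < a - partial < Fraction(1, 10 ** 30)
```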
Now, let’s consider this question from a different perspective. Every possible game can
be categorized as some chain of actions, say
A’s turn → B’s turn → C’s turn → A’s turn → · · · → A’s turn → A wins!
or
A’s turn → B’s turn → C’s turn → · · · → A’s turn → B’s turn → B wins :(
We can trace each process by looking at the system of equations, with the variable we are on serving as an indicator of the position we are currently at. For example, the first sequence happens with probability

(5/6) · (5/6) · (5/6) · · · (5/6) · (1/6).
Each factor of 5/6 is appended as the probability that we actually move to the next turn, which is reflected in the state equations. By moving to a different state, we multiply our current coefficient by another factor of 5/6. In other words, we should be thinking of b and c as intermediate indicator variables: the same desired end scenario, that Alex wins, can happen while we are on one of these intermediate states.
Finally, let's analyze one more question. What happens when we algebraically cancel variables? Essentially, we are getting rid of intermediate states. By cancelling the variable b, we turn the first sequence into

A's turn → C's turn → A's turn → · · · → A wins!,

with the probability of the first arrow being (5/6) · (5/6) = 25/36, the algebraic coefficient of c in the equation for a once we have substituted in our expression for b.
Thus, writing state equations and solving them algebraically through substitution is fundamentally a way to compress every possible chain of events, adjusting the probabilities between states, until we finally reach a single state, which we can solve using a simple geometric series or the equivalent state equation.
Let’s look at one last more complicated example where our analogy still holds
up.
The state equations in this problem are not difficult to write or simplify; however, I will
focus on analyzing why they work, as before. For example, we can write the equation
p3 = (3/10) p2 + (7/10) p4,
where p2, p3, p4 are the probabilities the frog escapes given it is on lily pads 2, 3, and 4, respectively. Now, we can substitute this linear combination of p2 and p4 for every occurrence of p3 elsewhere. This is equivalent to turning a chain

p2 → p3 → p4 into simply p2 → p4,

and a chain

p2 → p3 → p2 → p3 → · · · → p2 into p2 → p2 → p2 → · · · .
After every variable except for p1 has been eliminated, all our chains will be of the form
p1 → p1 → p1 → · · · → eaten
or
p1 → p1 → p1 → · · · → win.
Then, our final state equation will be of the form
p1 = ap1 + b,
where a is the probability we return to p1 and b the probability we win without returning
to p1 . This can still be written as an infinite geometric series based on how many times
we return to p1 or can be solved directly.
As an exercise, solve the following problem by writing out an infinite geometric series;
then, solve it using state equations, and consider how your two approaches are related.
Exercise 3.11 (1995 AIME/15). Let p be the probability that, in the process of
repeatedly flipping a fair coin, one will encounter a run of 5 heads before one encounters a
run of 2 tails. Given that p can be written in the form m/n where m and n are relatively
prime positive integers, find m + n.
Example 4.1
Let τ (n) denote the number of divisors of n. Show that
τ(1) + τ(2) + τ(3) + · · · + τ(n) = ⌊n/1⌋ + ⌊n/2⌋ + · · · + ⌊n/n⌋.
Before we present the solution, we ask the reader to try and write a formal argument
for this problem. Though the result should be intuitively quite obvious, a large part of
making this connection is how we present the solution.
Consider the following table, where we place a 1 if the row number is a multiple of the
column number:
          1       2       3      · · ·   n     | Total
   1      1       0       0      · · ·   0     | τ(1)
   2      1       1       0      · · ·   0     | τ(2)
   3      1       0       1      · · ·   0     | τ(3)
   ⋮      ⋮       ⋮       ⋮       ⋱      ⋮     | ⋮
  Total  ⌊n/1⌋   ⌊n/2⌋   ⌊n/3⌋   · · ·  ⌊n/n⌋  |
Both sides count the number of 1’s in the table, so they are equal. Have we seen this
before?
We offer another way of formalizing a proof without writing out a table, in a very
similar style.
Consider the number of pairs (x, y) such that 1 ≤ x, y ≤ n and x divides y. If we fix x, there are ⌊n/x⌋ values of y such that x divides y, so there are Σ_{x=1}^{n} ⌊n/x⌋ such pairs. If we instead fix y, there are τ(y) values of x dividing y, so there are Σ_{y=1}^{n} τ(y) such pairs; the two counts must be equal.
To prove the previous example problem, we can use indicator variables. The trick is to define a variable

1_{x,y} = 1 if x | y, and 1_{x,y} = 0 otherwise.
Then, via swapping the summation,

Σ_{x=1}^{n} ⌊n/x⌋ = Σ_{x=1}^{n} Σ_{y=1}^{n} 1_{x,y} = Σ_{y=1}^{n} Σ_{x=1}^{n} 1_{x,y} = Σ_{y=1}^{n} τ(y).
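A brute-force check of the identity for the first 150 values of n takes only a few lines (the helper name is our own choice):

```python
def tau(m):
    """Number of divisors of m."""
    return sum(1 for d in range(1, m + 1) if m % d == 0)

# tau(1) + ... + tau(n) counts pairs (x, y) with x | y row by row;
# floor(n/1) + ... + floor(n/n) counts the same pairs column by column.
for n in range(1, 151):
    assert sum(tau(m) for m in range(1, n + 1)) == sum(n // x for x in range(1, n + 1))
```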
Example 4.4
At a party, the name tags of n participants are shuffled and then redistributed
randomly to the same n people. What is the expected number of people who have
their own name tag?
Our earlier solution to this problem involved summing over a table, so we suspect that
we can write a viable double counting solution as well. We encourage the reader to try
and come up with the proof themselves before reading the solution.
The idea is to double count pairs (σ, k) where student k receives his own name tag
in permutation σ. The expected value is essentially the average of the number of pairs
containing σ over all permutations σ; in other words, it is the total number of pairs
divided by the total number of permutations. However, we can count the total number
of pairs by fixing k and counting the total number of permutations σ where k is a fixed
point. In particular, for each k, there are (n − 1)! permutations σ such that (σ, k) is a
pair. Thus, there are n! such pairs, and the desired average is simply 1.
Finally, we are able to introduce Linearity of Expectation, the synthesis of all our previous ideas and a universal perspective for analyzing expected value problems.

Theorem (Linearity of Expectation)
For any random variables X1, X2, . . . , Xn, not necessarily independent,
E[X1 + X2 + · · · + Xn] = E[X1] + E[X2] + · · · + E[Xn].

The idea behind this proof is to use swapping the summation to prove the n = 2 case.
Proof. It suffices to prove the case n = 2. (Why?) Then, using the definition of expected
value,
E[X1 + X2] = Σ_{x1+x2} (x1 + x2) · P[X1 + X2 = x1 + x2]
           = Σ_{x1} Σ_{x2} (x1 + x2) · P[X1 = x1, X2 = x2].
The second step is known as splitting the summation: convince yourself that every term in
the first summation can be decomposed into terms of the form in the second summation.
This can be written as
Σ_{x1} Σ_{x2} x1 · P[X1 = x1, X2 = x2] + Σ_{x1} Σ_{x2} x2 · P[X1 = x1, X2 = x2].
However, notice that in the first summation, when we fix x1 and sum over all x2, we have

Σ_{x2} P[X1 = x1, X2 = x2] = P[X1 = x1],

so the first summation collapses to Σ_{x1} x1 · P[X1 = x1].
The key is that we can do something similar for the second summation: swapping the
order of summation, we can fix x2 and vary over all x1 instead. Thus, we similarly have
Σ_{x1} Σ_{x2} x2 · P[X1 = x1, X2 = x2] = Σ_{x2} Σ_{x1} x2 · P[X1 = x1, X2 = x2] = Σ_{x2} x2 · P[X2 = x2].
However, these two quantities are simply E[X1] and E[X2], respectively! Thus, we have established

E[X1 + X2] = E[X1] + E[X2].
This theorem is powerful, and we can use it many times as a “shortcut” to a canonical
double counting solution. For example, in Example 4.4, we can define Xi = 1 if person i
receives their own name tag and 0 otherwise. Then,
E[X1 + X2 + · · · + Xn] = E[X1] + E[X2] + · · · + E[Xn] = 1/n + 1/n + · · · + 1/n = 1,

as E[Xi] is simply 1/n by definition.
(One might wonder whether the product analogue, E[XY] = E[X]E[Y], holds as well; we will return to this point later in the section.)
These Xi are known as indicator variables. (Have we seen this term before?)
They are helpful because they essentially help us turn expected value computations into
probability computations. If we define
X = 1 if event E occurs, and X = 0 if event E does not occur,
then
E[X] = 1 · P[E] + 0 · P[not E] = P[E].
See how this is helpful?
In summary, we can use linearity of expectation to split larger expected values into sums of smaller ones, where the smaller random variables are indicator variables whose expectations can be computed by the above property. Let's look at a concrete example of this.
Example 4.8
A fair coin is flipped 25 times. What is the expected number of pairs of consecutive flips that both come up heads?

E[Total HH pairs] = E[HH in flips 1, 2] + E[HH in flips 2, 3] + · · · + E[HH in flips 24, 25].
Though these smaller events are not independent, we can still apply linearity by the previous theorem. Moreover, each of these smaller expected values is the expectation of an indicator variable. In particular,
E[HH in flips 1, 2] = P[HH in flips 1, 2] = 1/4.

Thus, the answer is 24 · (1/4) = 6 expected HH pairs.
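For small numbers of flips, the same linearity argument gives (f − 1)/4 expected HH pairs for f flips, and complete enumeration confirms it exactly (sketch with helper names of our own choosing):

```python
from fractions import Fraction
from itertools import product

def expected_hh(f):
    """Exact expected number of adjacent HH pairs in f fair coin flips."""
    total = sum(
        sum(1 for i in range(f - 1) if s[i] == 'H' and s[i + 1] == 'H')
        for s in product('HT', repeat=f)
    )
    return Fraction(total, 2 ** f)

# Each of the f-1 adjacent pairs contributes 1/4; for f = 25 this gives 24/4 = 6.
for f in range(2, 13):
    assert expected_hh(f) == Fraction(f - 1, 4)
```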
Here is another very similar and simple exercise to check your understanding:
Exercise 4.9 (1989 AHSME/30). Suppose that 7 boys and 13 girls line up in a row.
Let S be the number of places in the row where a boy and a girl are standing next to
each other. For example, for the row GBBGGGBGBGGGBGBGGBGG we have that
S = 12. Find the expected value of S.
A more difficult exercise, but nonetheless a similar idea:
Exercise 4.10 (2013 HMMT C6). Values a1 , a2 , a3 , · · · , a2013 are chosen independently
and at random from the set {1, 2, 3, · · · , 2013}. What is the expected number of distinct
values in the set {a1 , a2 , · · · , a2013 }?
To conclude this subsection, we can combine our earlier discussion of Pigeonhole with
double counting to solve more difficult existence problems.
By Pigeonhole, we simply want to show that the average number (or expected number)
of rays that have four points of the same color is greater than 12. Thus, the problem
turns into an expected value computation!
To do this, we can count the total number of monochromatic rays across all rotations. Let us consider any one of the 100 rays and assume WLOG that it contains a black point on the first circle. First, we can fix the orientation of the first circle. Then, there are 50^3 ways to rotate the other three circles such that the ray contains four black points. Thus, across all 100 rays, there are 50^3 · 100 monochromatic rays in total, across 100^3 rotations. Then, the expected value

E[Monochromatic Rays] = (50^3 · 100)/100^3 = 12.5 > 12,
and thus we are done.
Exercise 4.12. The numbers 1 through 10 are placed around a circle.
(a) Show that there exist three consecutive numbers on this circle with sum at least 17.
(b) Show that there exist three consecutive numbers on the circle with sum at least 18.
There is a very clean but intuitive global solution to the second part; see if you can
find it!
This type of problem, where we compute an expected value in some form to prove existence, is very common in olympiads. Drawing a table, per how we motivated our methods of double counting, will become a powerful method of visualizing and conceptualizing such problems; many similar problems will be covered in the next section.
Claim — For an arbitrary noodle of length ℓ, there exists a constant c such that the expected number of intersections is c · ℓ.

Proof. The main idea is to partition the noodle into tiny segments of length ε, each of which has the same probability p (depending only on ε) of intersecting a floorboard line. Thus, by linearity of expectation, the expected number of intersections is

p · (ℓ/ε) = c · ℓ

for some fixed constant c.
Now, we simply want to extract the constant c. If we let the noodle be a circle with diameter 1, then it crosses a floorboard line with certainty, at exactly two points; since its length is π, we get c · π = 2, and thus c = 2/π. As a result, the expected number of intersections for a noodle of length 1 is 2/π, which is also the crossing probability for a straight needle of length 1, as such a needle almost surely crosses at most one line.
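A seeded Monte Carlo sketch agrees with the constant; we assume the standard Buffon setup (unit-length needle, floorboard lines spaced one unit apart), which matches the value 2/π above:

```python
import math
import random

# Distance from the needle's center to the nearest line is uniform on [0, 1/2];
# the acute angle to the lines is uniform on [0, pi/2].
# The needle crosses a line iff d <= (1/2) * sin(theta), which happens with
# probability 2/pi ~ 0.6366.
rng = random.Random(2022)
trials = 200_000
crossings = sum(
    rng.uniform(0, 0.5) <= 0.5 * math.sin(rng.uniform(0, math.pi / 2))
    for _ in range(trials)
)
estimate = crossings / trials
assert abs(estimate - 2 / math.pi) < 0.01
```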
Example 4.14
n points are chosen uniformly at random on the circumference of a circle. Find the
probability that all n points lie on a semicircle.
We utilize a similar idea presented in the previous problem. Instead of computing the
probability directly, we compute the expected number of semicircles with one end at one
of the n points which contain all n points. This becomes much easier: we can compute
the individual probability that all other n − 1 points lie on the semicircle, which occurs
1
with probability 2n−1 .
This means that

E[# of such semicircles] = n/2^(n−1)

by linearity. However, with probability 1 there is at most one such semicircle, so this expected value equals the probability that such a semicircle exists. Thus, the two quantities are equivalent, and the probability that all n points lie on a semicircle is simply n/2^(n−1).
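A seeded simulation agrees with n/2^(n−1); the sketch below uses the standard criterion that all points fit in a semicircle exactly when some circular gap between consecutive points is at least π:

```python
import math
import random

def on_semicircle(angles):
    """True iff all points (given by their angles) fit in some semicircle."""
    a = sorted(angles)
    gaps = [a[i + 1] - a[i] for i in range(len(a) - 1)]
    gaps.append(2 * math.pi - (a[-1] - a[0]))  # wrap-around gap
    return max(gaps) >= math.pi

rng = random.Random(13)
n, trials = 4, 100_000
hits = sum(
    on_semicircle([rng.uniform(0, 2 * math.pi) for _ in range(n)])
    for _ in range(trials)
)
# n / 2^(n-1) = 0.5 for n = 4
assert abs(hits / trials - n / 2 ** (n - 1)) < 0.01
```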
Problems that use linearity of expectation in the reverse direction are quite rare, but
they are nonetheless quite straightforward once one correctly harnesses the power of
linearity. A few guidelines when using linearity of expectation:
• Make sure that your random variables actually sum to the random variable you are splitting up, and that they do not overlap! It is easy to hand-wave and confuse dependent events with overlapping quantities in the random variables.

• Though E[X + Y] = E[X] + E[Y] holds even if X and Y are dependent, other properties, such as E[XY] = E[X]E[Y], only hold if X and Y are independent.
Usually when we perform double counting, we double count incidences: pairs (a, b). This
is how we can utilize summing in two ways: we can fix a and count over all b, or fix b
and count over all a.
This problem is no different. Because we want to show that there exists a row or column ℓ with at least √n distinct symbols, we can double count pairs (ℓ, s) where symbol s appears in line ℓ. As we want to show there exist at least √n pairs for some ℓ, we can instead sum over all s to count the total number of pairs. In other words, we want to show that the total number of pairs is at least 2n√n, because there are 2n possible choices of ℓ.

Thus, let's fix the symbol s; we want to show that there are at least 2√n lines ℓ with s ∈ ℓ, because there are n different symbols. Say s appears in a columns and b rows. Then, we know that all occurrences of s must lie in one of the ab squares that constitute the intersections between one of these columns and one of these rows. In other words,

a + b ≥ 2√(ab) ≥ 2√n

as ab ≥ n, so the total number of rows and columns s appears in is at least 2√n, as needed.
The next problem illustrates an idea that essentially embodies a more generalized
version of double counting.
As seen from the above problem, in graph-theoretic and geometric formulations, the objects frequently double counted are angles between edges at vertices: they are easy to count from multiple perspectives. The double counting may not be as obvious as counting pairs (a, b) in many problems, either. Next, we have a double counting problem that strongly illustrates the table-based viewpoint behind our theory.
Once one finds the correct double counting interpretation behind this problem, there is
really not much to it. Thus, we encourage the reader to attempt to construct such an
argument before we present a bare-bones solution.
Proof. Consider the following partition of the numbers in the sequence {b_n}:

b1 = x1
b2 = 2x1 + x2
b3 = 4x1 + 2x2 + x3
...
bm = 2^(m−1) x1 + 2^(m−2) x2 + · · · + xm.
Choosing an m-tuple of nonnegative integers (note that the integers can equal zero) corresponds bijectively to choosing the bi. Now, sum the columns vertically; let the
first nonzero sum be a constant times a1 , second nonzero sum to be a constant times
a2 , and so on. Every such sequence of ai ’s that are one less than a power of two can be
broken into such components; furthermore, by using the sum of a geometric series and
factoring out each xi term from the column sum, each array of xi ’s will produce a unique
sequence {an }. Thus, both sequences biject to the sequence {xi }, so A(n) = B(n) by the
bijection.
The key in the above problem was to decompose the structure of the ai and the bi into
something common: in this case, consecutive powers of two. There is no real formula or
method to find such a connection between seemingly independent conditions, so fluency
can only come through practice and conscious reflection.
The next example demonstrates how we can work backwards from the assertion given
by the problem to discover our double counting argument.
Example 5.5
Let A1 , A2 , · · · , An+1 be subsets of a set S with n elements. Show that
Σ_{1 ≤ i < j ≤ n+1} |Ai ∩ Aj| / (|Ai| · |Aj|) ≥ 1.
Let’s look at what exactly each of our fractions count. The denominator, |Ai | · |Aj |, is
simply the number of ways to choose one element from each of two subsets Ai , Aj , and
|Ai ∩ Aj | is the number of ways to choose one element from the intersection of Ai and Aj .
Because the numerator counts single elements while the denominator counts ordered pairs, we consider the prospect of identical elements: the numerator essentially counts the number of ways to choose a pair of identical elements (a, a) with a lying in both Ai and Aj. In other words, the fraction is the probability that two elements chosen at random from Ai and Aj, respectively, are the same element. (The intersection condition can be dropped, because it is implied!)
Finally, as we are computing a sum of these fractions, we can rewrite these probabilities as expectations of indicator variables. In other words, each fraction is the expected number of coincidences when we choose one element from Ai and one from Aj. Summing over all possible pairs (i, j), we are looking for the expected number of coinciding pairs when we choose one element from each of the Ai.

But now this is easy! Because we choose n + 1 elements from an n-element set, two of them must be the same by Pigeonhole. This implies that the number of coinciding pairs must be at least 1 in all cases, which correspondingly means that the expected value must be at least 1. This proves the result.
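Since the inequality holds for every choice of nonempty subsets, a randomized stress test with exact fractions is a reassuring (though of course non-rigorous) check:

```python
import random
from fractions import Fraction

# For random nonempty subsets A_1, ..., A_{n+1} of {1, ..., n}, the sum of
# |Ai ∩ Aj| / (|Ai| |Aj|) over i < j should always be at least 1.
rng = random.Random(5)
for _ in range(300):
    n = rng.randint(2, 8)
    sets = []
    for _ in range(n + 1):
        s = {x for x in range(1, n + 1) if rng.random() < 0.5}
        sets.append(s or {rng.randint(1, n)})  # guarantee nonempty
    total = sum(
        Fraction(len(sets[i] & sets[j]), len(sets[i]) * len(sets[j]))
        for i in range(n + 1)
        for j in range(i + 1, n + 1)
    )
    assert total >= 1
```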
The next problem is one of many similarly flavored problems, where we use the “forming
a table” idea behind our original formulation of double counting to understand the problem.
See if you can recognize similar flavors in the practice problems!
This is one of many problems where the double counting interpretation is fairly obvious. (The problem: 1600 delegates form 16000 committees of 80 delegates each; show that some two committees share at least four delegates.) We can double count pairs $(C_i, a_k)$ where $a_k$ is a delegate on committee $C_i$. To begin with, why is the following solution wrong?
Fake Solution. Number the committees $C_1, C_2, \cdots, C_{16000}$ in some order, and call the delegates $a_1, a_2, \cdots, a_{1600}$. The probability that any given delegate is on a given committee is $\frac{80}{1600} = \frac{1}{20}$. As a result, given two committees $C_i, C_j$, a certain delegate $a_k$ lies in both $C_i$ and $C_j$ with probability $\left(\frac{1}{20}\right)^2$. Thus, we expect to see
$$\left(\frac{1}{20}\right)^2 \cdot 1600 = 4$$
delegates in the intersection of the two committees. Obviously, there must then exist two committees sharing at least four delegates.
To figure out the mistake in this solution, let's write it using our linearity of expectation jargon. We wish to compute the expected number of members in the intersection between two randomly chosen committees. In other words,
$$\mathbb{E}[\text{Number of common delegates}] = \sum_{i=1}^{1600} \mathbb{P}[\text{Delegate } i \text{ is a common delegate}]$$
by using each of the terms as an indicator variable. So far, so good. However, what really is the probability that delegate $i$ is a common delegate? If we fix a committee and choose a delegate uniformly at random, it is true that the delegate has a $\frac{1}{20}$ probability of appearing in that committee. However, that is not what we are doing here: we are fixing a delegate (in this case, delegate $i$) and claiming that this delegate appears in an arbitrary pair of committees with probability $\left(\frac{1}{20}\right)^2$.
However, this is obviously false! If a delegate belonged to no committees at all (which is totally possible), then there would be a probability of 0 that any randomly chosen committee would contain this delegate. If instead a delegate belonged to all 16000 committees, then the probability would be 1! We are claiming that it is fixed at $\frac{1}{20}$, which we have not warranted.
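The failure is easy to see in a toy example. In the hypothetical universe below (the numbers are my own, chosen for illustration), one delegate sits on every committee and another on none, so the probability that a fixed delegate lies on a random committee is far from the uniform $\frac{\text{committee size}}{\text{number of delegates}}$ that the fake solution assumes:

```python
# Toy universe (hypothetical numbers): 8 delegates, 6 committees of size 2.
# Delegate 0 is on every committee; delegate 7 is on none.
committees = [{0, d} for d in range(1, 7)]

def prob_on_random_committee(delegate):
    """P[a given delegate lies on a uniformly random committee]."""
    return sum(delegate in c for c in committees) / len(committees)

# The naive "fixed" probability would be 2/8 = 0.25 for everyone,
# but the actual probability depends heavily on which delegate we fixed:
assert prob_on_random_committee(0) == 1.0
assert prob_on_random_committee(7) == 0.0
```

Fixing a row (a committee) and averaging over a column (delegates) gives $\frac{1}{20}$; fixing a column and averaging over rows gives a delegate-dependent quantity, exactly the distinction drawn below.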
Thus, the main idea presented here is that while linearity of expectation allows us to sum across both rows and columns, the two directions are still different! When dealing with probability, fixing a row and looking along a column is totally different from fixing a column and looking along a row. This is a common trap: while using linearity of expectation, we may confuse our indicator variables and probabilities regarding what is being fixed and what is being averaged over.
With that in mind, we now present a correct solution. Notice how we avoid the subtlety
that rendered the previous solution incorrect.
Proof. Because there are $16000 \cdot 80$ committee memberships shared among $1600$ delegates, the average delegate sits on $800$ committees. Count triples consisting of a delegate together with a pair of committees containing that delegate: by convexity (Jensen), there are at least $\binom{800}{2} \cdot 1600$ such triples. As there are $\binom{16000}{2}$ pairs of committees, the average size of the intersection of two committees is at least
$$\frac{\binom{800}{2} \cdot 1600}{\binom{16000}{2}} = \frac{63920}{15999} > 3.$$
As a result, there must exist two committees whose intersection contains four or more members.
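The arithmetic above can be double-checked with exact rational arithmetic; this is purely a verification of the computation, using the equality case where every delegate sits on exactly 800 committees:

```python
from fractions import Fraction
from math import comb

# total (delegate, committee-pair) triples, assuming each of the 1600
# delegates sits on exactly 800 committees (the equality case of Jensen)
triples = 1600 * comb(800, 2)
committee_pairs = comb(16000, 2)

avg_intersection = Fraction(triples, committee_pairs)
assert avg_intersection == Fraction(63920, 15999)
assert avg_intersection > 3  # so some intersection has >= 4 delegates
```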
Walkthrough. For this problem specifically, we ask the reader to follow along with the following steps to solve the problem; we will discuss the inherent ideas behind the problem later.
(a) Show that there exist two 28-gons, each with all vertices of the same color, that are congruent.
(b) Show that there exist three octagons, each with all vertices of the same color, that are congruent.
Magic?
The idea in this problem is that the bound is not tight enough. Summing in the obvious way yields a bound of $\frac{108^4}{432^3} - \varepsilon < 2$ expected intersections between all four points. Upon inspection, however, this is a pretty loose bound: there are many cases where things don't intersect (i.e. the EV is 0) but they are still counted in the numerator. Replacing $432^3$ with $431 \cdot 430 \cdot 429$ doesn't do much, either.
The piecemeal Pigeonhole application simply encodes this emptiness as much as possible. Notice that at every step, the difference between 432 and 431 makes a critical difference: otherwise, we would only have been able to construct a 27-gon instead of a 28-gon, a 7-gon instead of an 8-gon, and so on. Ignoring these slight off-by-one irregularities turns this solution into essentially the global argument (with only segments found), but taking advantage of the small alterations forms the crux of the solution.
To conclude our examples, we examine a few pigeonhole and double counting problems
phrased in a more pure geometric or number theoretic context.
Example 5.8
Several circles with circumferences summing to 10 lie strictly inside a square with
side length 1. Prove that there exists a line that intersects at least four of the circles.
This problem feels somewhat loose: we are not even given any constraints on the number of circles; there could very well be a huge number of them. The main technique we will use (and it is quite common, as you will see in the practice problems) is projection. The key is the following lemma:
Lemma
Suppose we have segments with lengths summing to more than $n$, all contained in a segment of length 1. Then there exists a point of the segment that lies on at least $n + 1$ of the given segments.
It should be quite intuitive why this lemma is true by Pigeonhole; it is difficult to prove
this rigorously with our current definition of length, so we leave a formal proof up to the
curious reader.
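Algorithmically, the lemma corresponds to a standard sweep over interval endpoints; the sketch below (illustrative, with names of my own choosing) finds the maximum number of segments covering a single point:

```python
def max_coverage(intervals):
    """Return the largest number of intervals covering a single point."""
    events = []
    for lo, hi in intervals:
        events.append((lo, 0))   # interval opens
        events.append((hi, 1))   # interval closes
    # at equal coordinates, openings (0) sort before closings (1), so
    # closed intervals that merely touch still count as overlapping
    events.sort()
    best = cur = 0
    for _, kind in events:
        if kind == 0:
            cur += 1
            best = max(best, cur)
        else:
            cur -= 1
    return best

# segments inside [0, 1] with total length > 3: some point lies on >= 4
segments = [(0.0, 0.9), (0.05, 0.95), (0.1, 1.0), (0.2, 0.7), (0.3, 0.8)]
assert sum(hi - lo for lo, hi in segments) > 3
assert max_coverage(segments) >= 4
```

The sweep also suggests why the lemma is true: the coverage count, integrated over $[0,1]$, equals the total length of the segments, so it must exceed $n$ somewhere.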
However, this lemma now offers us a valuable way to solve the problem. Project all the circles onto a side of the square: a circle of circumference $c$ projects to a segment of length $\frac{c}{\pi}$ (its diameter), so the projections have total length $\frac{10}{\pi}$. Since $\frac{10}{\pi} > 3$, by the lemma there exists a point that lies on at least four of the segments. Taking the vertical line through this point guarantees that it intersects the four circles whose diameter projections contain the projected point, as required.
Example 5.9
$n$ points lie in general position in the plane. Show that there exists some constant $c$ such that the number of pairs of points at distance exactly 1 from each other is at most $cn\sqrt{n}$.
First, we have to encode the constraint in a useful way: for instance, in its current formulation, there's nothing preventing us from having all the points at distance 1 from each other! To do so, we use the fact that a triangle has a unique circumcenter; in other words, there cannot exist two distinct points that are both equidistant from three given points.
Thus, we can double count pairs $(P, T)$ where $P$ is a point and $T$ is a pair of points such that $P$ is at distance 1 from both points of $T$. Let $S$ denote the set of all the points, and define $f(P)$ to be the number of points at distance 1 from $P$. If we fix $T$, there are at most two valid points $P$ for this $T$, which implies that the total number of pairs is at most $2\binom{n}{2}$. On the other hand, the desired quantity, the number of unit-distance pairs, is half the sum of $f(P)$ over all choices of $P$.
Now, to find a condition on $f(P)$, notice that the number of pairs $(P, T)$ for a fixed $P$ is $\binom{f(P)}{2}$. Thus,
$$2\binom{n}{2} \ge \sum_{P \in S} \binom{f(P)}{2} \ge n\binom{\frac{1}{n}\sum_{P \in S} f(P)}{2}$$
by Jensen. Let $x = \sum_{P \in S} f(P)$ be twice the desired number of pairs. Then,
$$n \cdot \frac{x}{n} \cdot \left(\frac{x}{n} - 1\right) \le 2n(n-1).$$
This yields
$$x \le \frac{n(1 + \sqrt{8n - 7})}{2} \sim n\sqrt{2n},$$
so the desired number of pairs is about $\frac{n\sqrt{n}}{\sqrt{2}}$, which solves the problem.
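As a sanity check on the bound, the sketch below counts unit-distance pairs on a small patch of the triangular lattice (a configuration with many unit distances) and compares $x = \sum_P f(P)$ against $\frac{n(1+\sqrt{8n-7})}{2}$. This is only an illustration of the inequality, not a proof:

```python
import math
from itertools import combinations

def unit_distance_pairs(points, eps=1e-9):
    """Count pairs of points at distance exactly 1 (up to tolerance)."""
    return sum(
        1 for p, q in combinations(points, 2)
        if abs(math.dist(p, q) - 1.0) < eps
    )

# a 6 x 6 patch of the unit triangular lattice
pts = [(i + j / 2, j * math.sqrt(3) / 2) for i in range(6) for j in range(6)]
n = len(pts)

x = 2 * unit_distance_pairs(pts)          # x = sum of f(P) over all P
bound = n * (1 + math.sqrt(8 * n - 7)) / 2
assert x <= bound
```

Here $n = 36$ gives $170$ incidences against a bound of roughly $320$, consistent with the $n\sqrt{2n}$ asymptotic.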
Exercise 5.10. Notice that we could also have counted pairs (P, T ) where P is a point
and T is a triangle, but this yields a weaker bound. Do so.
For the interested reader, there is the following graph theoretic generalization, for which the proof follows along basically the same lines but is more involved asymptotically.
Theorem 5.11
There exists a constant $c > 0$ such that the maximum number of edges in a graph with $n$ vertices without $K_{s,t}$ is at most $c \cdot n^{2 - \frac{1}{s}}$.
Remark 5.12. An important inequality that we will use throughout many of these problems is Jensen's inequality applied to the binomial coefficient $\binom{x}{k}$, which is convex for $x \ge k$. In other words, in general, we have
$$\sum_{i=1}^n \binom{a_i}{k} \ge n\binom{\frac{1}{n}\sum_{i=1}^n a_i}{k}.$$
This is useful as we have to choose out of a subset when we fix one element during our
double counting; for instance, it was used in the previous example.
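Since the $a_i$ in our applications are integers but their mean need not be, a numerical check must use the polynomial extension $\binom{x}{k} = \frac{x(x-1)\cdots(x-k+1)}{k!}$ for real $x$. The sketch below (my own illustrative code) verifies the inequality on random data:

```python
import random
from math import factorial

def binom(x, k):
    """Generalized binomial coefficient C(x, k) for real x."""
    prod = 1.0
    for i in range(k):
        prod *= x - i
    return prod / factorial(k)

random.seed(1)
k = 3
for _ in range(1000):
    a = [random.randint(k, 40) for _ in range(10)]
    lhs = sum(binom(ai, k) for ai in a)
    rhs = len(a) * binom(sum(a) / len(a), k)
    assert lhs >= rhs - 1e-6   # Jensen: C(x, k) is convex for x >= k
```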
Example 5.13
Let $f(n)$ denote the largest prime factor of a positive integer $n$. Let $\{a_n\}$ be a strictly increasing sequence of positive integers. Show that for every positive integer $M$, there exist $i \ne j$ such that $f(a_i + a_j) > M$.
This is a fairly difficult problem when presented standing alone; the main idea is to use the property that the sequence is unbounded (as it is strictly increasing). If only finitely many primes divided sums of the form $a_i + a_j$, we would eventually run into size issues with respect to the $a_i$. The formal proof below makes that intuition precise.
Proof. For the sake of contradiction assume that $p_1, p_2, \cdots, p_k$ are the only prime factors dividing $a_i + a_j$ over all $i \ne j$. As the sequence is unbounded, we may fix an index $i > k + 1$ with $a_i > (p_1 p_2 \cdots p_k)^N$ for some large integer $N$, and consider the $k + 1$ numbers $a_i + a_j$ for $1 \le j \le k + 1$. By assumption every $a_i + a_j$ is of the form $p_1^{\alpha_1} p_2^{\alpha_2} \cdots p_k^{\alpha_k}$, and since $a_i + a_j > (p_1 p_2 \cdots p_k)^N$, we may conclude that
$$\max(\alpha_1, \alpha_2, \cdots, \alpha_k) > N.$$
Thus, for every $j$, there exists a $p \in \{p_1, \cdots, p_k\}$ such that $p^N \mid a_i + a_j$; but there are only $k$ primes $p$ and $k + 1$ indices $j$, so by Pigeonhole two such sums, say for indices $j_1 < j_2$, correspond to the same prime $p$. Subtracting,
$$p^N \mid a_{j_2} - a_{j_1} \implies p^N \mid \prod_{1 \le j_1 < j_2 \le k+1} (a_{j_2} - a_{j_1}).$$
But the right-hand side is a fixed nonzero integer, while $N$ can be arbitrarily large by selecting a large enough $i$ (as the sequence $\{a_n\}$ increases), a contradiction.
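To see the statement in action on the simplest sequence $a_n = n$, the sketch below finds, for a given bound $M$, a pair of distinct terms whose sum has a prime factor exceeding $M$ (the helper names are my own):

```python
def largest_prime_factor(n):
    """Largest prime factor of n >= 2, by trial division."""
    best, p = 1, 2
    while p * p <= n:
        while n % p == 0:
            best, n = p, n // p
        p += 1
    return max(best, n) if n > 1 else best

# the sequence a_n = n, and a target bound M
a = list(range(1, 50))
M = 50

# search for distinct terms whose sum has a prime factor exceeding M
witness = next(
    (a[i], a[j])
    for i in range(len(a)) for j in range(i + 1, len(a))
    if largest_prime_factor(a[i] + a[j]) > M
)
assert largest_prime_factor(sum(witness)) > M
```

For instance $4 + 49 = 53$ is itself a prime larger than $50$; the theorem guarantees such a witness exists for every $M$ once enough terms are taken.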
§6 Practice Problems
Mostly olympiad problems, though a few computational problems are mixed in. They are not strictly in increasing order of difficulty, though the reader will (hopefully) find some of the earlier problems simpler.
Problem 6.1. Find the maximum possible size of a subset S of {1, 2, 3, · · · , 2017}, such
that there do not exist distinct a, b, c ∈ S with ab = c.
Problem 6.2 (2002 Putnam). Given any five points on a sphere, show that some four
of them must lie on a closed hemisphere.
Problem 6.3 (2002 IMC). 200 students participated in a math contest. They had 6
problems to solve. Each problem was correctly solved by at least 120 participants. Prove
that there must be 2 participants such that every problem was solved by at least one of
these two students.
Problem 6.4 (2004 BAMO). Suppose one is given n real numbers, not all zero, but
such that their sum is zero. Prove that one can label these numbers a1 , a2 , ..., an in such
a manner that
a1 a2 + a2 a3 + ... + an−1 an + an a1 < 0.
Problem 6.5 (2001 IMO/4). Let $n$ be an odd integer greater than 1 and let $c_1, c_2, \ldots, c_n$ be integers. For each permutation $a = (a_1, a_2, \ldots, a_n)$ of $\{1, 2, \ldots, n\}$, define $S(a) = \sum_{i=1}^n c_i a_i$. Prove that there exist permutations $a \ne b$ of $\{1, 2, \ldots, n\}$ such that $n!$ is a divisor of $S(a) - S(b)$.
Problem 6.6 (1987 IMO/1). Let $p_n(k)$ be the number of permutations of the set $\{1, 2, 3, \ldots, n\}$ which have exactly $k$ fixed points. Prove that $\sum_{k=0}^n k\, p_n(k) = n!$.
Problem 6.7 (AoPS). On a 7 × 7 square piece of graph paper, the centers of k of the
49 squares are chosen. No four of the chosen points are the vertices of a rectangle whose
sides are parallel to those of the paper. What is the largest k for which this is possible?
Problem 6.8 (1985 AIME/14). In a tournament each player played exactly one game
against each of the other players. In each game the winner was awarded 1 point, the loser
got 0 points, and each of the two players earned 1/2 point if the game was a tie. After
the completion of the tournament, it was found that exactly half of the points earned
by each player were earned against the ten players with the least number of points. (In
particular, each of the ten lowest scoring players earned half of her/his points against the
other nine of the ten). What was the total number of players in the tournament?
Problem 6.9 (2022 HMMT C5). Five cards labeled 1, 3, 5, 7, 9 are laid in a row in that
order, forming the five-digit number 13579 when read from left to right. A swap consists
of picking two distinct cards, and then swapping them. After three swaps, the cards form
a new five-digit number n when read from left to right. Compute the expected value of n.
Problem 6.10 (2000 Putnam). Let $n \ge 3$ be an integer, and let $B$ be a set of more than $\frac{2^{n+1}}{n}$ distinct points in $n$-dimensional space with coordinates of the form $(\pm 1, \pm 1, \ldots, \pm 1)$. Show that there are three distinct points $P, Q, R$ in $B$ such that $PQR$ is an equilateral triangle.
Problem 6.11. Let S = [n]. Suppose that some subsets A1 , A2 , · · · , Ak of S satisfy the
following:
Problem 6.12 (1998 IMO/2). In a contest, there are $m$ candidates and $n$ judges, where $n \ge 3$ is an odd integer. Each candidate is evaluated by each judge as either pass or fail. Suppose that each pair of judges agrees on at most $k$ candidates. Prove that
$$\frac{k}{m} \ge \frac{n-1}{2n}.$$
Problem 6.13 (Shortlist 2006 C3). Let S be a finite set of points in the plane such that
no three of them are on a line. For each convex polygon P whose vertices are in S, let
a(P ) be the number of vertices of P , and let b(P ) be the number of points of S which are
outside P . A line segment, a point, and the empty set are considered as convex polygons
of 2, 1, and 0 vertices respectively. Prove that for every real number $x$,
$$\sum_P x^{a(P)} (1 - x)^{b(P)} = 1,$$
where the sum is taken over all convex polygons with vertices in S.
Problem 6.14 (2000 Putnam B1). Let $a_j, b_j, c_j$ be integers for $1 \le j \le N$. Assume for each $j$, at least one of $a_j, b_j, c_j$ is odd. Show that there exist integers $r, s, t$ such that $ra_j + sb_j + tc_j$ is odd for at least $\frac{4N}{7}$ values of $j$, $1 \le j \le N$.
Problem 6.15 (2016 AIME I/13). Freddy the frog is jumping around the coordinate
plane searching for a river, which lies on the horizontal line y = 24. A fence is located at
the horizontal line y = 0. On each jump Freddy randomly chooses a direction parallel
to one of the coordinate axes and moves one unit in that direction. When he is at a
point where y = 0, with equal likelihoods he chooses one of three directions where he
either jumps parallel to the fence or jumps away from the fence, but he never chooses
the direction that would have him cross over the fence to where y < 0. Freddy starts his
search at the point (0, 21) and will stop once he reaches a point on the river. Find the
expected number of jumps it will take Freddy to reach the river.
Problem 6.16 (1985 IMO/4). Given a set M of 1985 distinct positive integers, none of
which has a prime divisor greater than 23, prove that M contains a subset of 4 elements
whose product is the 4th power of an integer.
Problem 6.18 (2018 AIME II/13). Misha rolls a standard, fair six-sided die until she rolls 1-2-3 in that order on three consecutive rolls. The probability that she will roll the die an odd number of times is $\frac{m}{n}$, where $m$ and $n$ are relatively prime positive integers. Find $m + n$.
Problem 6.19. Suppose that $S$ is a set of $n$ points in the plane in general position. Suppose that for every point $P \in S$, there exist at least $k$ points in $S$ that are equidistant from $P$. Show that $k < \frac{1}{2} + \sqrt{2n}$.
Problem 6.20 (Iran). A school has $n$ students, each of whom is enrolled in at least two classes. We know that if two classes share at least two common students, then the two classes have different numbers of students. Show that there are at most $(n-1)^2$ classes.
Problem 6.21 (Shortlist 2004 C1). There are 10001 students at a university. Some students join together to form several clubs (a student may belong to different clubs). Some clubs join together to form several societies (a club may belong to different societies). There are a total of $k$ societies. Suppose that the following conditions hold:
1. Each pair of students is in exactly one club.
2. For each student and each society, the student is in exactly one club of the society.
3. Each club has an odd number of students. In addition, a club with $2m + 1$ students ($m$ is a positive integer) is in exactly $m$ societies.
Find all possible values of $k$.
Problem 6.23 (2006 HMMT C10). Somewhere in the universe, n students are taking
a 10-question math competition. Their collective performance is called laughable if, for
some pair of questions, there exist 57 students such that either all of them answered both
questions correctly or none of them answered both questions correctly. Compute the
smallest n such that the performance is necessarily laughable.
Problem 6.25 (Korea 2005). 11 students take a test. For any two questions on the test, there are at least 6 students who solved exactly one of those two questions. Prove that there are no more than 12 questions on this test.
Problem 6.26 (Russia 1999). In a class, each boy knows at least one girl. Show that
one can choose a group of more than half of the students such that every boy in the
group knows an odd number of girls in the group.
Problem 6.27. A series of line segments with sum of lengths greater than 1000 lie in a
square with side length 1. Show that there exists a line that intersects at least 501 of
these segments.
Problem 6.28 (1993 Putnam). Let x1 , x2 , . . . , x19 be positive integers less than or
equal to 93. Let y1 , y2 , . . . , y93 be positive integers less than or equal to 19. Prove that
there exists a (non-empty) sum of some xi equal to a sum of some yi .
Problem 6.29 (China 1993). Ten people went to a bookstore. It is known that each
person bought exactly 3 books, and for every pair of people, there exists at least one
book that they both bought. Find the minimum possible number of people who bought
the most bought book.
Problem 6.30. We choose at least n + 1 numbers among 1, 2, · · · , 2n. Prove that among
the chosen numbers, there are two different numbers whose sum is a prime number.
Problem 6.31 (2010 HMMT C9). Rosencrantz and Guildenstern are playing a game
where they repeatedly flip coins. Rosencrantz wins if 1 heads followed by 2009 tails
appears. Guildenstern wins if 2010 heads come in a row. They will flip coins until
someone wins. What is the probability that Rosencrantz wins?