Linearity of Expectation: Unraveling Black Magic
Tony Lu
October 13, 2022
Contents
1 Introduction
  1.1 Behind this Handout
  1.2 Conceptual Introduction
2 The Magic
  2.1 Our First Tour
  2.2 Our Second Tour
6 Practice Problems
§1 Introduction
§1.1 Behind this Handout
Problems concerning linearity of expectation in computational contests are often viewed
with a kind of “sacred fog”: fairly routine expected value problems are often touted as
“black magic” or “beautifully constructed.” As the old saying goes, ignorance leads to
veneration; in this handout, I will attempt to unravel the magic of expectation and offer
the reader a comprehensive way to understand and solve these “magical” problems.
I was inspired to write this handout after reading and working through multiple resources and problem sets on linearity of expectation and, more generally, double counting ideas, and feeling dissatisfied by the depth of discussion given in them. It pains me that many students can only vaguely understand and intuitively apply linearity (correctly in many contest situations) but fail to see what is really acting behind the scenes.
This handout will not go into the details of the probabilistic method in an attempt to
focus on more computational ideas; however, there will be a significant portion dedicated
to using similar ideas in existence proofs on olympiad problems.
Definition 1.1. We call an experiment any sequence of actions that can be replicated
and has a well-defined set of possible outcomes, which we call the sample space. Such
an experiment is deterministic if it has a unique outcome, and random if it has more
than one possible outcome. An event is a subset of the sample space of the experiment.
Exercise 1.2. Two distinguishable fair coins are tossed and a die is rolled simultaneously.
(a) Describe the sample space of the experiment. How many outcomes are in it?
(c) Let A be the event that the total number of heads matches the number on the die
in parity. Find the number of outcomes in A, and find the probability of A.
Exercise 1.5. Use the definition of expected value to compute the following:
(a) The expected value of the sum of the two outcomes when two dice are rolled.
(b) The expected value of the product of the two outcomes when two dice are rolled.
(c) The expected value of your gain or loss in a lottery with a ticket costing $25 and a $20 million jackpot that you have a 0.0001% chance of winning.
Now that we have the basic ideas down, throughout the rest of the handout I will
show how ultimately the theory of expected value is the foundation of the Pigeonhole
Principle, helps us prove many combinatorial and algebraic identities, and is essentially
equivalent to the enigmatic
Σ_{a∈A} Σ_{b∈B} f(a, b) = Σ_{b∈B} Σ_{a∈A} f(a, b).
§2 The Magic
§2.1 Our First Tour
Example 2.1
At a party, the name tags of n participants are shuffled and then redistributed
randomly to the same n people. What is the expected number of people who have
their own name tag?
The generic value of n makes computing the expected value using the definition rather
cumbersome; it’s difficult to find the probability that a random permutation of the name
tags has exactly k fixed points, for instance. Let’s begin by writing out our expected
value sum:
n! · E[X] = (# permutations with 1 fixed point) · 1 + (# permutations with 2 fixed points) · 2 + · · · + (# permutations with n fixed points) · n.
Now, we can easily see that the RHS is, in reality, the total number of fixed points across
all permutations of the name tags. This confirms our intuition that the expected value
is an average: more specifically,
Expected # of times something appears = (Total # of times it appears across all permutations) / (# of permutations).
By now, we should have confidently established the following proposition:
Proposition 2.2
For any random variables X1, X2, . . . , Xn,
E[X1 + X2 + · · · + Xn] = E[X1] + E[X2] + · · · + E[Xn].

This should be fairly easy to wrap your head around: the average of a sum is the sum of the averages. We defer a formal proof until Section 4, where we will prove linearity of expectation in general.
To finish the first problem, we present the first interpretation of linearity: using a
table.
Say n = 3, and let the three participants be A, B, and C. Let's list out every permutation and find how many fixed points it has, marking each fixed point (bolded in the original table) with a star:

        A    B    C    # Fixed Points
   1    A*   B*   C*   3
   2    A*   C    B    1
   3    B    A    C*   1
   4    B    C    A    0
   5    C    A    B    0
   6    C    B*   A    1

Our goal is to find the total number of starred entries in the table. Notice that in this table, there are two starred entries per column; why is this?

Exercise 2.3. Before you move on, explain to yourself why there are two starred entries per column.

To do so, we can zero in on any given column, say column B. The number of permutations with a starred B in that column is 2! = 2, as there are two permutations with a B in the second position. In general, there are (n − 1)! permutations with a B in the second position if there are n people total.

We can apply the same argument to show that there are (n − 1)! permutations with any given element in its fixed position, which is precisely the number of starred entries in each column. As there are n columns, the total number of starred entries is n · (n − 1)! = n!. Now, we can finish by computing the expected value:

E[X] = n!/n! = 1.
Magical, isn’t it? We’ll come back to this problem and show how to narrow down and
simplify this idea later.
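For a quick computational sanity check, we can brute-force small cases of this example; the script below (a sketch, with helper names of our own choosing) enumerates all n! permutations and computes the exact average number of fixed points:

```python
import math
from fractions import Fraction
from itertools import permutations

def average_fixed_points(n):
    """Exact average number of fixed points over all n! permutations of 0..n-1."""
    total = sum(
        sum(1 for i, v in enumerate(perm) if i == v)  # fixed points of this permutation
        for perm in permutations(range(n))
    )
    return Fraction(total, math.factorial(n))

# The average is exactly n!/n! = 1 for every n.
for n in range(1, 8):
    assert average_fixed_points(n) == 1
```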
Example 2.4
Every second, an event happens with probability p if it has not happened before. Find the expected number of seconds until the event first happens.

Intuitively, the answer should be 1/p: if you have a 1/2 probability of flipping heads, you would expect to flip the coin twice before flipping heads. But why? Let's go back to the definition again.
The event happens in the first second with probability p. The probability that it first happens in the second second is (1 − p) · p, and in general, the probability that it first happens in the kth second is (1 − p)^(k−1) · p. Thus, by definition, our expected value is

E[X] = p + 2p(1 − p) + 3p(1 − p)^2 + · · · + k(1 − p)^(k−1) p + · · · .
This is an arithmetico-geometric series, so we could sum it routinely by multiplying by
1 − p and subtracting. However, let’s inspect what this sum is really doing a little bit
more. We can break the sum up into geometric series:
E[X] = Σ_{k≥1} k p(1 − p)^(k−1) = Σ_{j≥1} Σ_{k≥j} p(1 − p)^(k−1) = Σ_{j≥1} (1 − p)^(j−1) = 1/p,

as we had expected.
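As a quick numerical check, both the arithmetico-geometric series and its regrouping into geometric (tail) sums converge to 1/p; the sketch below truncates the infinite sums, which is harmless since the tails vanish geometrically:

```python
def series_form(p, K=20_000):
    # E[X] = sum_{k>=1} k * p * (1-p)^(k-1), truncated at K terms
    return sum(k * p * (1 - p) ** (k - 1) for k in range(1, K + 1))

def tail_form(p, K=20_000):
    # Same sum regrouped into geometric series: sum_{j>=1} (1-p)^(j-1)
    return sum((1 - p) ** (j - 1) for j in range(1, K + 1))

for p in (0.5, 0.25, 0.1):
    assert abs(series_form(p) - 1 / p) < 1e-9
    assert abs(tail_form(p) - 1 / p) < 1e-9
```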
Sum of rows; sum of columns; what’s going on?
This is essentially the entire point of the Pigeonhole Principle, and why all of us consider it to really be common sense: some element must be at least the average, and some element must be at most the average.

In other words, if we are able to compute the average, we are able to obtain good bounds on the largest possible minimum and the smallest possible maximum: we are able to prove that the minimum has to be sufficiently small and the maximum has to be sufficiently large. This is why the Pigeonhole Principle is useful in existence proofs: if we prove that the maximum is sufficiently large, then we know that there will be an element that is sufficiently large, despite not really knowing which element it is, and vice versa.

As we will see, one of the difficult parts of problems that invoke Pigeonhole is actually computing what the average is. Average ... have we seen that before? Nonetheless, sometimes what we apply Pigeonhole to really matters; we will give a few examples.
Example 3.3
Let S be a subset of {1, 2, 3, · · · , 100} such that there do not exist a, b ∈ S with a | b.
Find the maximum possible value of |S|.
We can find that the set {51, 52, 53, · · · , 100} works, for an answer of 50: any two of its elements have ratio less than 2, so neither can divide the other. To prove that this is the maximum, we can partition {1, 2, · · · , 100} into disjoint smaller sets in which any two elements from one set would violate the condition. Then, we will be able to show that the number of chosen elements is at most the number of sets by Pigeonhole.
There are many ways to construct such sets, but the easiest way is to consider, for each odd m ≤ 100, the set

A_m = {2^k · m | k ≥ 0}.

Any two elements in one of these sets have a quotient which is a power of two, so one divides the other, and there are precisely 50 such sets. Thus, by Pigeonhole, if we had at least 51 elements, two must fall in the same A_m, and thus they would violate the condition.
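Both halves of this argument are easy to confirm by machine; the short sketch below (with a helper name of our own choosing) checks that {51, . . . , 100} contains no divisor pair and that the classes A_m number exactly 50:

```python
# No element of {51, ..., 100} divides another: a | b would force b >= 2a > 100.
S = set(range(51, 101))
assert not any(a != b and b % a == 0 for a in S for b in S)

def odd_part(x):
    """The odd m with x = 2^k * m, i.e., the label of the class A_m containing x."""
    while x % 2 == 0:
        x //= 2
    return x

# Exactly 50 classes A_m cover {1, ..., 100}, so 51 chosen numbers must collide.
assert len({odd_part(x) for x in range(1, 101)}) == 50
```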
Exercise 3.4. What is the maximum number of positive integers one can choose between
1 and 100 such that no two chosen integers differ by a factor of less than 1.5?
The above exercise can essentially be solved with the same mindset. Try it out!
Though Pigeonhole in its most raw form can only establish bounds, we can also modify
it to show exact results. Here’s an example:
Example 3.5
In a circle of 51 boys and 49 girls, show that there exist two boys such that there
are exactly 19 children strictly between the two boys.
The notion of existence is very closely related to Pigeonhole; in this problem, we can
apply Pigeonhole to the sets of chairs that differ by exactly 20 indices.
Let A_i denote the set consisting of chair i and chair i + 20, indices taken modulo 100. Each boy's chair falls in exactly two of the A_i, and there are exactly 100 such sets. By Pigeonhole, two of the 51 · 2 = 102 placements, counting multiplicity, must be in the same set; since the two chairs in any A_i are distinct, these are two different boys whose chair indices differ by exactly 20, leaving exactly 19 children strictly between them. This solves the problem.
It turns out that counting the pigeons with multiplicity can be a helpful tool. (We’ll
see how this is related to what we discussed before later!)
It is difficult to write insightful and difficult Pigeonhole problems without the tool we
will develop next, so we leave the bulk of Pigeonhole concepts and examples to the fifth
section.
Example 3.6
A soda company inscribes one of the letters A, B, C, D, E, F, G, H on the bottlecaps
of each one of its bottles uniformly at random. What is the expected number of
sodas Andy must buy to collect at least one bottlecap labelled with each of the
letters?
The idea behind states is that there is one or multiple “intermediate steps” that are
necessary between the given outcome and the desired outcome. It is helpful to split one
big task into smaller ones that are on their own necessary to be completed.
For example, in this problem, in order to obtain all eight varieties of bottlecaps, we
must first obtain one variety of bottlecap, then two varieties, then three, and so on. It
should be easy to conclude that

E[# sodas to collect all 8 letters] = E[# sodas to go from 0 to 1 letter] + E[# sodas to go from 1 to 2 letters] + · · · + E[# sodas to go from 7 to 8 letters]

(simply use Proposition 2.2!). Now, we can narrow our view to one specific term, say getting from k types to k + 1 types.
Here, we know that every time we buy a bottle, we get a new letter with probability (8 − k)/8, as there are 8 − k remaining letters. Now, we can apply Example 2.4 to obtain that the expected value equals 8/(8 − k). Thus, the answer is

Σ_{k=0}^{7} 8/(8 − k) = 761/35.
This is fundamentally what finite states are doing: breaking a more complicated problem
into smaller bits using linearity of expectation and then analyzing the expected values of
the smaller bits themselves.
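If you would like to verify the final sum, it can be carried out exactly with rational arithmetic:

```python
from fractions import Fraction

# Expected number of sodas: sum over stages k -> k+1 of 8/(8-k), computed exactly.
expected = sum(Fraction(8, 8 - k) for k in range(8))
assert expected == Fraction(761, 35)  # about 21.74 sodas
```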
Now we take a look at infinite states, and where they originate from. First, let’s
revisit Example 2.4.
Example 3.7
Every second, an event happens with probability p if it has not happened before.
Find the expected number of seconds until the event first happens.
This time, consider the problem in the following way: suppose we begin at an initial
state (having no seconds passed), with a counter for the expected number of seconds
until the event will happen at that state, which we will call µ. Then, the event happens
with probability p at the initial state, in which case we are done in 1 second. Otherwise,
the entire process essentially repeats itself: the outcome of the event starting from the
second second is independent of the outcome during the first second. Thus, the expected number of additional seconds is still µ after the first second has passed, for an expected value of µ + 1 in this case.
Thus, by definition of expected value, we may write
µ = p · 1 + (1 − p)(µ + 1) =⇒ µ = 1/p,
as before.
The work above should not be surprising; however, what happens when we combine
this solution with our previous result? If we plug in µ = 1/p, the state equation becomes 1/p = p + (1 − p)(1/p + 1),
which is obviously an identity! Notice that the equation we derived earlier to solve for µ is essentially identical to how we evaluate an arithmetico-geometric series using S − rS. In general, any result that involves infinite states can be expanded as an infinite series; writing a one-variable state equation is essentially equivalent to writing out the arithmetico-geometric series.
So, in essence, it is true that every computational problem that can be "solved with states" has a methodical, albeit long-winded and difficult-to-comprehend, solution using the definition of expected value. Let's look at a more complicated example with a similar flavor.
Example 3.8
Alex, Barry, and Charles take turns rolling a fair die, in that order; the first player to roll a 6 wins. Find the probability that Alex wins.

First, let's look at the vanilla states solution to this problem. Let a be the probability that Alex wins given it is Alex's turn, b be the probability that Alex wins given that it is Barry's turn, and c the probability that Alex wins given that it is Charles' turn. Then, analyzing each move of the game, we can write the following equations:
a = 1/6 + (5/6) b
b = (5/6) c
c = (5/6) a.
Before we proceed, consider the following question:
Exercise 3.9. Where in these state equations have we encoded the condition that Alex
goes first? How would we write these equations if we were given that Barry went first?
To answer this question, let’s get rid of the intermediate states b and c, and instead
have only one equation
a = 1/6 + (125/216) a.
This is telling us that during Alex's turn, he has a probability 1/6 of winning and can otherwise return to his turn with probability 125/216. Thus, we can write the corresponding
geometric series for the probability:

a = (1/6) · Σ_{k=0}^{∞} (125/216)^k = (1/6) · (216/91) = 36/91.
This geometric series helps us understand why our equations, and ultimately the value of a that we solve for, are correct.
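Both routes, solving the one-variable state equation and summing the geometric series, can be checked exactly with rational arithmetic; the sketch below truncates the series at 200 terms, which already agrees with 36/91 to well beyond any practical precision:

```python
from fractions import Fraction

r = Fraction(125, 216)          # probability the game returns to Alex's turn
a = Fraction(1, 6) / (1 - r)    # solve a = 1/6 + r*a exactly
assert a == Fraction(36, 91)

# Truncated geometric series (1/6) * sum_k r^k approaches the same value.
partial = Fraction(1, 6) * sum(r ** k for k in range(200))
assert 0 < a - partial < Fraction(1, 10 ** 30)
```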
Now, let’s consider this question from a different perspective. Every possible game can
be categorized as some chain of actions, say
A’s turn → B’s turn → C’s turn → A’s turn → · · · → A’s turn → A wins!
or
A’s turn → B’s turn → C’s turn → · · · → A’s turn → B’s turn → B wins :(
We can trace each process by looking at the system of equations, with the variable we are on serving as an indicator of the position we are currently at. For example, the first sequence happens with probability

(5/6) · (5/6) · (5/6) · · · (5/6) · (1/6).
Each factor of 5/6 is appended as the probability that we actually move to the next turn, which is reflected in the state equations. By moving to a different state, we multiply our current coefficient by another factor of 5/6. In other words, we should be thinking of b and c as intermediate indicator variables: the same desired end scenario, that Alex wins, can happen while we are on one of these intermediate states.
Finally, let's analyze one more question. What happens when we algebraically cancel variables? Essentially, we are getting rid of intermediate states. By cancelling the variable b, we turn the first sequence into

A's turn → C's turn → A's turn → · · · → A wins!,

with the probability of the first arrow being (5/6) · (5/6) = 25/36, the algebraic coefficient of c in the equation for a once we have substituted in our expression for b.
Thus, writing state equations and solving them algebraically through substitution is fundamentally a way to compress every possible chain of events, adjusting the probabilities between states, until we finally reach a single state, which we can solve using a simple geometric series or the equivalent state equation.
Let’s look at one last more complicated example where our analogy still holds
up.
The state equations in this problem are not difficult to write or simplify; however, I will
focus on analyzing why they work, as before. For example, we can write the equation
p3 = (3/10) p2 + (7/10) p4,
where p2, p3, p4 are the probabilities the frog escapes given it is on lily pads 2, 3, and 4, respectively. Now, we can substitute this linear combination of p2 and p4 for every occurrence of p3 elsewhere. This is equivalent to turning a chain

p2 → p3 → p4 into simply p2 → p4,

and a chain

p2 → p3 → p2 → p3 → · · · → p2 into p2 → p2 → p2 → · · · .
After every variable except for p1 has been eliminated, all our chains will be of the form
p1 → p1 → p1 → · · · → eaten
or
p1 → p1 → p1 → · · · → win.
Then, our final state equation will be of the form
p1 = ap1 + b,
where a is the probability we return to p1 and b the probability we win without returning
to p1 . This can still be written as an infinite geometric series based on how many times
we return to p1 or can be solved directly.
As an exercise, solve the following problem by writing out an infinite geometric series;
then, solve it using state equations, and consider how your two approaches are related.
Exercise 3.11 (1995 AIME/15). Let p be the probability that, in the process of
repeatedly flipping a fair coin, one will encounter a run of 5 heads before one encounters a
run of 2 tails. Given that p can be written in the form m/n where m and n are relatively
prime positive integers, find m + n.
Example 4.1
Let τ (n) denote the number of divisors of n. Show that
τ(1) + τ(2) + τ(3) + · · · + τ(n) = ⌊n/1⌋ + ⌊n/2⌋ + · · · + ⌊n/n⌋.
Before we present the solution, we ask the reader to try and write a formal argument
for this problem. Though the result should be intuitively quite obvious, a large part of
making this connection is how we present the solution.
Consider the following table, where we place a 1 if the row number is a multiple of the
column number:
          1       2       3      · · ·   n     | Total
   1      1       0       0      · · ·   0     | τ(1)
   2      1       1       0      · · ·   0     | τ(2)
   3      1       0       1      · · ·   0     | τ(3)
   ⋮      ⋮       ⋮       ⋮       ⋱      ⋮     | ⋮
  Total  ⌊n/1⌋   ⌊n/2⌋   ⌊n/3⌋   · · ·  ⌊n/n⌋  |
Both sides count the number of 1’s in the table, so they are equal. Have we seen this
before?
We offer another way of formalizing a proof without writing out a table, in a very
similar style.
Consider the number of pairs (x, y) such that 1 ≤ x, y ≤ n and x divides y. If we fix x, there are ⌊n/x⌋ values of y such that x divides y, so there are Σ_{x=1}^{n} ⌊n/x⌋ such pairs. If we instead fix y, there are τ(y) values of x dividing y, so there are Σ_{y=1}^{n} τ(y) such pairs; the two counts must be equal.
To prove the previous example problem, we can use indicator variables. The trick is to define a variable

1_{x,y} = 1 if x | y, and 1_{x,y} = 0 otherwise.
Then, via swapping the summation,

Σ_{x=1}^{n} ⌊n/x⌋ = Σ_{x=1}^{n} Σ_{y=1}^{n} 1_{x,y} = Σ_{y=1}^{n} Σ_{x=1}^{n} 1_{x,y} = Σ_{y=1}^{n} τ(y).
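A brute-force check of the identity for the first 150 values of n takes only a few lines (the helper name is our own choice):

```python
def tau(m):
    """Number of divisors of m."""
    return sum(1 for d in range(1, m + 1) if m % d == 0)

# tau(1) + ... + tau(n) counts pairs (x, y) with x | y row by row;
# floor(n/1) + ... + floor(n/n) counts the same pairs column by column.
for n in range(1, 151):
    assert sum(tau(m) for m in range(1, n + 1)) == sum(n // x for x in range(1, n + 1))
```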
Example 4.4
At a party, the name tags of n participants are shuffled and then redistributed
randomly to the same n people. What is the expected number of people who have
their own name tag?
Our earlier solution to this problem involved summing over a table, so we suspect that
we can write a viable double counting solution as well. We encourage the reader to try
and come up with the proof themselves before reading the solution.
The idea is to double count pairs (σ, k) where student k receives his own name tag
in permutation σ. The expected value is essentially the average of the number of pairs
containing σ over all permutations σ; in other words, it is the total number of pairs
divided by the total number of permutations. However, we can count the total number
of pairs by fixing k and counting the total number of permutations σ where k is a fixed
point. In particular, for each k, there are (n − 1)! permutations σ such that (σ, k) is a
pair. Thus, there are n! such pairs, and the desired average is simply 1.
Finally, we are able to introduce Linearity of Expectation, the synthesis of all our previous ideas and a universal perspective for analyzing expected value problems.

Theorem (Linearity of Expectation)
For any random variables X1, X2, . . . , Xn, not necessarily independent,
E[X1 + X2 + · · · + Xn] = E[X1] + E[X2] + · · · + E[Xn].

The idea behind this proof is to use swapping the summation to prove the n = 2 case.
Proof. It suffices to prove the case n = 2. (Why?) Then, using the definition of expected
value,
E[X1 + X2] = Σ_{x1+x2} (x1 + x2) · P[X1 + X2 = x1 + x2]
           = Σ_{x1} Σ_{x2} (x1 + x2) · P[X1 = x1, X2 = x2].
The second step is known as splitting the summation: convince yourself that every term in
the first summation can be decomposed into terms of the form in the second summation.
This can be written as
Σ_{x1} Σ_{x2} x1 · P[X1 = x1, X2 = x2] + Σ_{x1} Σ_{x2} x2 · P[X1 = x1, X2 = x2].
However, notice that in the first summation, when we fix x1 and sum over all x2, we have

Σ_{x2} P[X1 = x1, X2 = x2] = P[X1 = x1],

so the first summation collapses to Σ_{x1} x1 · P[X1 = x1].
The key is that we can do something similar for the second summation: swapping the
order of summation, we can fix x2 and vary over all x1 instead. Thus, we similarly have
Σ_{x1} Σ_{x2} x2 · P[X1 = x1, X2 = x2] = Σ_{x2} Σ_{x1} x2 · P[X1 = x1, X2 = x2] = Σ_{x2} x2 · P[X2 = x2].
However, these two quantities are simply E[X1] and E[X2], respectively! Thus, we have established

E[X1 + X2] = E[X1] + E[X2].
This theorem is powerful, and we can use it many times as a “shortcut” to a canonical
double counting solution. For example, in Example 4.4, we can define Xi = 1 if person i
receives their own name tag and 0 otherwise. Then,
E[X1 + X2 + · · · + Xn] = E[X1] + E[X2] + · · · + E[Xn] = 1/n + 1/n + · · · + 1/n = 1,

as E[Xi] is simply 1/n by definition.
(One might wonder whether the product analogue, E[XY] = E[X]E[Y], holds as well; we will return to this point later in the section.)
These Xi are known as indicator variables. (Have we seen this term before?)
They are helpful because they essentially help us turn expected value computations into
probability computations. If we define
X = 1 if event E occurs, and X = 0 if event E does not occur,
then
E[X] = 1 · P[E] + 0 · P[not E] = P[E].
See how this is helpful?
In summary, we can use linearity of expectation to split larger expected values into sums of smaller ones, where the smaller random variables are indicator variables whose expectations can be computed by the above property. Let's look at a concrete example of this.
Example 4.8
A fair coin is flipped 25 times. What is the expected number of pairs of consecutive flips that both come up heads?

E[Total HH pairs] = E[HH in flips 1, 2] + E[HH in flips 2, 3] + · · · + E[HH in flips 24, 25].
Though these smaller events are not independent, we can still apply linearity by the previous theorem. Moreover, each of these smaller expected values is the expectation of an indicator variable. In particular,
E[HH in flips 1, 2] = P[HH in flips 1, 2] = 1/4.

Thus, the answer is 24 · (1/4) = 6 expected HH pairs.
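For small numbers of flips, the same linearity argument gives (f − 1)/4 expected HH pairs for f flips, and complete enumeration confirms it exactly (sketch with helper names of our own choosing):

```python
from fractions import Fraction
from itertools import product

def expected_hh(f):
    """Exact expected number of adjacent HH pairs in f fair coin flips."""
    total = sum(
        sum(1 for i in range(f - 1) if s[i] == 'H' and s[i + 1] == 'H')
        for s in product('HT', repeat=f)
    )
    return Fraction(total, 2 ** f)

# Each of the f-1 adjacent pairs contributes 1/4; for f = 25 this gives 24/4 = 6.
for f in range(2, 13):
    assert expected_hh(f) == Fraction(f - 1, 4)
```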
Here is another very similar and simple exercise to check your understanding:
Exercise 4.9 (1989 AHSME/30). Suppose that 7 boys and 13 girls line up in a row.
Let S be the number of places in the row where a boy and a girl are standing next to
each other. For example, for the row GBBGGGBGBGGGBGBGGBGG we have that
S = 12. Find the expected value of S.
A more difficult exercise, but nonetheless a similar idea:
Exercise 4.10 (2013 HMMT C6). Values a1 , a2 , a3 , · · · , a2013 are chosen independently
and at random from the set {1, 2, 3, · · · , 2013}. What is the expected number of distinct
values in the set {a1 , a2 , · · · , a2013 }?
To conclude this subsection, we can combine our earlier discussion of Pigeonhole with
double counting to solve more difficult existence problems.
By Pigeonhole, we simply want to show that the average number (or expected number)
of rays that have four points of the same color is greater than 12. Thus, the problem
turns into an expected value computation!
To do this, we can count the total number of monochromatic rays across all rotations. Let us consider any one of the 100 rays and assume WLOG that it contains a black point on the first circle. First, we can fix the orientation of the first circle. Then, there are 50^3 ways to rotate the other three circles such that the ray contains four black points. Thus, across all 100 rays, there are 50^3 · 100 monochromatic rays in total, across 100^3 rotations. Then, the expected value

E[Monochromatic Rays] = (50^3 · 100)/100^3 = 12.5 > 12,
and thus we are done.
Exercise 4.12. The numbers 1 through 10 are placed around a circle.
(a) Show that there exist three consecutive numbers on this circle with sum at least 17.
(b) Show that there exist three consecutive numbers on the circle with sum at least 18.
There is a very clean but intuitive global solution to the second part; see if you can
find it!
This type of problem, where we compute an expected value in some form to prove existence, is very common in olympiads. Drawing a table, per how we motivated our methods of double counting, will become a powerful method of visualizing and conceptualizing such problems; many similar problems will be covered in the next section.
Claim — For an arbitrary noodle of length ℓ, there exists a constant c such that the expected number of intersections is c · ℓ.

Proof. The main idea is to partition the noodle into tiny segments of length ε, each of which has the same probability p (depending only on ε) of intersecting a floorboard line. Thus, by linearity of expectation, the expected number of intersections is

p · (ℓ/ε) = c · ℓ

for some fixed constant c.
Now, we simply want to extract the constant c. If we let the noodle be a circle with diameter 1, then it crosses a floorboard line with certainty, at exactly two points; since its length is π, we get c · π = 2, and thus c = 2/π. As a result, the expected number of intersections for a noodle of length 1 is 2/π, which is also the crossing probability for a straight needle of length 1, as such a needle almost surely crosses at most one line.
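A seeded Monte Carlo sketch agrees with the constant; we assume the standard Buffon setup (unit-length needle, floorboard lines spaced one unit apart), which matches the value 2/π above:

```python
import math
import random

# Distance from the needle's center to the nearest line is uniform on [0, 1/2];
# the acute angle to the lines is uniform on [0, pi/2].
# The needle crosses a line iff d <= (1/2) * sin(theta), which happens with
# probability 2/pi ~ 0.6366.
rng = random.Random(2022)
trials = 200_000
crossings = sum(
    rng.uniform(0, 0.5) <= 0.5 * math.sin(rng.uniform(0, math.pi / 2))
    for _ in range(trials)
)
estimate = crossings / trials
assert abs(estimate - 2 / math.pi) < 0.01
```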
Example 4.14
n points are chosen uniformly at random on the circumference of a circle. Find the
probability that all n points lie on a semicircle.
We utilize a similar idea presented in the previous problem. Instead of computing the
probability directly, we compute the expected number of semicircles with one end at one
of the n points which contain all n points. This becomes much easier: we can compute
the individual probability that all other n − 1 points lie on the semicircle, which occurs
1
with probability 2n−1 .
This means that

E[# of such semicircles] = n/2^(n−1)

by linearity. However, with probability 1 there is at most one such semicircle, so this expected value equals the probability that such a semicircle exists. Thus, the two quantities are equivalent, and the probability that all n points lie on a semicircle is simply n/2^(n−1).
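A seeded simulation agrees with n/2^(n−1); the sketch below uses the standard criterion that all points fit in a semicircle exactly when some circular gap between consecutive points is at least π:

```python
import math
import random

def on_semicircle(angles):
    """True iff all points (given by their angles) fit in some semicircle."""
    a = sorted(angles)
    gaps = [a[i + 1] - a[i] for i in range(len(a) - 1)]
    gaps.append(2 * math.pi - (a[-1] - a[0]))  # wrap-around gap
    return max(gaps) >= math.pi

rng = random.Random(13)
n, trials = 4, 100_000
hits = sum(
    on_semicircle([rng.uniform(0, 2 * math.pi) for _ in range(n)])
    for _ in range(trials)
)
# n / 2^(n-1) = 0.5 for n = 4
assert abs(hits / trials - n / 2 ** (n - 1)) < 0.01
```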
Problems that use linearity of expectation in the reverse direction are quite rare, but
they are nonetheless quite straightforward once one correctly harnesses the power of
linearity. A few guidelines when using linearity of expectation:
• Make sure that your random variables actually sum to the random variable you are splitting up, and that they do not overlap! It is easy to hand-wave and confuse dependent events with overlapping quantities in the random variables.

• Though E[X + Y] = E[X] + E[Y] holds even if X and Y are dependent, other properties, such as E[XY] = E[X]E[Y], only hold if X and Y are independent.
Usually when we perform double counting, we double count incidences: pairs (a, b). This
is how we can utilize summing in two ways: we can fix a and count over all b, or fix b
and count over all a.
This problem is no different. Because we want to show that there exists a row or column ℓ with at least √n distinct symbols, we can double count pairs (ℓ, s) where symbol s appears in line ℓ. As we want to show there exist at least √n pairs for some ℓ, we can instead sum over all s to count the total number of pairs. In other words, we want to show that the total number of pairs is at least 2n√n, because there are 2n possible choices of ℓ.

Thus, let's fix the symbol s; we want to show that there are at least 2√n lines ℓ with s ∈ ℓ, because there are n different symbols. Say s appears in a columns and b rows. Then, we know that all occurrences of s must lie in one of the ab squares that constitute the intersections between one of these columns and one of these rows. In other words,

a + b ≥ 2√(ab) ≥ 2√n

as ab ≥ n, so the total number of rows and columns s appears in is at least 2√n, as needed.
The next problem illustrates an idea that essentially embodies a more generalized
version of double counting.
As seen from the above problem, in graph-theoretic and geometric formulations, the objects frequently double counted are angles between edges at vertices: they are easy to count from multiple perspectives. The double counting may not be as obvious as counting pairs (a, b) in many problems, either. Next, we have a double counting problem that strongly illustrates the table-based viewpoint behind our theory.
Once one finds the correct double counting interpretation behind this problem, there is
really not much to it. Thus, we encourage the reader to attempt to construct such an
argument before we present a bare-bones solution.
Proof. Consider the following partition of the numbers in the sequence {b_n}:

b1 = x1
b2 = 2x1 + x2
b3 = 4x1 + 2x2 + x3
...
bm = 2^(m−1) x1 + 2^(m−2) x2 + · · · + xm.
Choosing an m-tuple of nonnegative integers (note that the integers can equal zero) corresponds bijectively to choosing the bi. Now, sum the columns vertically; let the
first nonzero sum be a constant times a1 , second nonzero sum to be a constant times
a2 , and so on. Every such sequence of ai ’s that are one less than a power of two can be
broken into such components; furthermore, by using the sum of a geometric series and
factoring out each xi term from the column sum, each array of xi ’s will produce a unique
sequence {an }. Thus, both sequences biject to the sequence {xi }, so A(n) = B(n) by the
bijection.
The key in the above problem was to decompose the structure of the ai and the bi into
something common: in this case, consecutive powers of two. There is no real formula or
method to find such a connection between seemingly independent conditions, so fluency
can only come through practice and conscious reflection.
The next example demonstrates how we can work backwards from the assertion given
by the problem to discover our double counting argument.
Example 5.5
Let A1 , A2 , · · · , An+1 be subsets of a set S with n elements. Show that
Σ_{1 ≤ i < j ≤ n+1} |Ai ∩ Aj| / (|Ai| · |Aj|) ≥ 1.
Let’s look at what exactly each of our fractions count. The denominator, |Ai | · |Aj |, is
simply the number of ways to choose one element from each of two subsets Ai , Aj , and
|Ai ∩ Aj | is the number of ways to choose one element from the intersection of Ai and Aj .
Because the numerator counts single elements while the denominator counts ordered pairs, we consider the prospect of identical elements: the numerator essentially counts the number of ways to choose a pair of identical elements (a, a) with a lying in both Ai and Aj. In other words, the fraction is the probability that two elements chosen at random from Ai and Aj, respectively, are the same element. (The intersection condition can be dropped, because it is implied!)
Finally, as we are computing a sum of these fractions, we can rewrite these probabilities as expectations of indicator variables. In other words, each fraction is the expected number of coincidences when we choose one element from Ai and one from Aj. Summing over all possible pairs (i, j), we are looking for the expected number of coinciding pairs when we choose one element from each of the Ai.

But now this is easy! Because we choose n + 1 elements from an n-element set, two of them must be the same by Pigeonhole. This implies that the number of coinciding pairs must be at least 1 in all cases, which correspondingly means that the expected value must be at least 1. This proves the result.
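Since the inequality holds for every choice of nonempty subsets, a randomized stress test with exact fractions is a reassuring (though of course non-rigorous) check:

```python
import random
from fractions import Fraction

# For random nonempty subsets A_1, ..., A_{n+1} of {1, ..., n}, the sum of
# |Ai ∩ Aj| / (|Ai| |Aj|) over i < j should always be at least 1.
rng = random.Random(5)
for _ in range(300):
    n = rng.randint(2, 8)
    sets = []
    for _ in range(n + 1):
        s = {x for x in range(1, n + 1) if rng.random() < 0.5}
        sets.append(s or {rng.randint(1, n)})  # guarantee nonempty
    total = sum(
        Fraction(len(sets[i] & sets[j]), len(sets[i]) * len(sets[j]))
        for i in range(n + 1)
        for j in range(i + 1, n + 1)
    )
    assert total >= 1
```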
The next problem is one of many similarly flavored problems, where we use the “forming
a table” idea behind our original formulation of double counting to understand the problem.
See if you can recognize similar flavors in the practice problems!
This is one of many problems where the double counting interpretation is fairly obvious. (The problem: 1600 delegates form 16000 committees of 80 delegates each; show that some two committees share at least four delegates.) We can double count pairs $(C_i, a_k)$ where $a_k$ is a delegate on committee $C_i$. To begin with, why is the following solution wrong?
Fake Solution. Number the committees $C_1, C_2, \cdots, C_{16000}$ in some order, and call the delegates $a_1, a_2, \cdots, a_{1600}$. The probability that any given delegate is on a given committee is $\frac{80}{1600} = \frac{1}{20}$. As a result, given two committees $C_i, C_j$, a certain delegate $a_k$ lies in both $C_i$ and $C_j$ with probability $\left(\frac{1}{20}\right)^2$. Thus, we expect to see
$$\left(\frac{1}{20}\right)^2 \cdot 1600 = 4$$
delegates in the intersection of the two committees. Obviously, there must then exist two committees sharing at least four delegates.
To figure out the mistake in this solution, let's write it using our linearity of expectation jargon. We wish to compute the expected number of members in the intersection between two randomly chosen committees. In other words,
$$\mathbb{E}[\text{Number of common delegates}] = \sum_{i=1}^{1600} \mathbb{P}[\text{Delegate } i \text{ is a common delegate}]$$
by using each of the terms as an indicator variable. So far, so good. However, what really is the probability that delegate $i$ is a common delegate? If we fix a committee and choose a delegate uniformly at random, it is true that the delegate has a $\frac{1}{20}$ probability of appearing in that committee. However, that is not what we are doing here: we are fixing a delegate (in this case, delegate $i$) and claiming that this delegate appears in an arbitrary pair of committees with probability $\left(\frac{1}{20}\right)^2$.
However, this is obviously false! If a delegate belonged to no committees at all (which is totally possible), then there would be a probability of 0 that any randomly chosen committee would contain this delegate. If instead a delegate belonged to all 16000 committees, then the probability would be 1! We are claiming that it is fixed at $\frac{1}{20}$, which we have not warranted.
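The failure is easy to see in a toy example. In the hypothetical universe below (the numbers are my own, chosen for illustration), one delegate sits on every committee and another on none, so the probability that a fixed delegate lies on a random committee is far from the uniform $\frac{\text{committee size}}{\text{number of delegates}}$ that the fake solution assumes:

```python
# Toy universe (hypothetical numbers): 8 delegates, 6 committees of size 2.
# Delegate 0 is on every committee; delegate 7 is on none.
committees = [{0, d} for d in range(1, 7)]

def prob_on_random_committee(delegate):
    """P[a given delegate lies on a uniformly random committee]."""
    return sum(delegate in c for c in committees) / len(committees)

# The naive "fixed" probability would be 2/8 = 0.25 for everyone,
# but the actual probability depends heavily on which delegate we fixed:
assert prob_on_random_committee(0) == 1.0
assert prob_on_random_committee(7) == 0.0
```

Fixing a row (a committee) and averaging over a column (delegates) gives $\frac{1}{20}$; fixing a column and averaging over rows gives a delegate-dependent quantity, exactly the distinction drawn below.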
Thus, the main idea presented here is that while linearity of expectation allows us to sum across both rows and columns, the two directions are still different! When dealing with probability, fixing a row and looking along a column is totally different from fixing a column and looking along a row. This is a common trap: while using linearity of expectation, we may confuse our indicator variables and probabilities regarding what is being fixed and what is being averaged over.
With that in mind, we now present a correct solution. Notice how we avoid the subtlety
that rendered the previous solution incorrect.
Proof. Because there are $16000 \cdot 80$ committee memberships shared among $1600$ delegates, the average delegate sits on $800$ committees. Count triples consisting of a delegate together with a pair of committees containing that delegate: by convexity (Jensen), there are at least $\binom{800}{2} \cdot 1600$ such triples. As there are $\binom{16000}{2}$ pairs of committees, the average size of the intersection of two committees is at least
$$\frac{\binom{800}{2} \cdot 1600}{\binom{16000}{2}} = \frac{63920}{15999} > 3.$$
As a result, there must exist two committees whose intersection contains four or more members.
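The arithmetic above can be double-checked with exact rational arithmetic; this is purely a verification of the computation, using the equality case where every delegate sits on exactly 800 committees:

```python
from fractions import Fraction
from math import comb

# total (delegate, committee-pair) triples, assuming each of the 1600
# delegates sits on exactly 800 committees (the equality case of Jensen)
triples = 1600 * comb(800, 2)
committee_pairs = comb(16000, 2)

avg_intersection = Fraction(triples, committee_pairs)
assert avg_intersection == Fraction(63920, 15999)
assert avg_intersection > 3  # so some intersection has >= 4 delegates
```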
Walkthrough. For this problem specifically, we ask the reader to follow along with the following steps to solve the problem; we will discuss the inherent ideas behind the problem later.
(a) Show that there exist two 28-gons, each with all vertices of the same color, that are congruent.
(b) Show that there exist three octagons, each with all vertices of the same color, that are congruent.
Magic?
The idea in this problem is that the bound is not tight enough. Summing in the obvious way yields a bound of $\frac{108^4}{432^3} - \varepsilon < 2$ expected intersections between all four points. Upon inspection, however, this is a pretty loose bound: there are many cases where things don't intersect (i.e. the EV is 0) but they are still counted in the numerator. Replacing $432^3$ with $431 \cdot 430 \cdot 429$ doesn't do much, either.
The piecemeal Pigeonhole application simply encodes this emptiness as much as possible. Notice that at every step, the difference between 432 and 431 makes a critical difference: otherwise, we would only have been able to construct a 27-gon instead of a 28-gon, a 7-gon instead of an 8-gon, and so on. Ignoring these slight off-by-one irregularities turns this solution into essentially the global argument (with only segments found), but taking advantage of the small alterations forms the crux of the solution.
To conclude our examples, we examine a few pigeonhole and double counting problems
phrased in a more pure geometric or number theoretic context.
Example 5.8
Several circles with circumferences summing to 10 lie strictly inside a square with
side length 1. Prove that there exists a line that intersects at least four of the circles.
This problem feels somewhat loose: we are not even given any constraints on the number of circles; there could very well be a huge number of them. The main technique we will use (and it is quite common, as you will see in the practice problems) is projection. The key is the following lemma:
Lemma
Suppose we have segments with lengths summing to more than $n$, all contained in a segment of length 1. Then there exists a point of the segment that lies on at least $n + 1$ of the given segments.
It should be quite intuitive why this lemma is true by Pigeonhole; it is difficult to prove
this rigorously with our current definition of length, so we leave a formal proof up to the
curious reader.
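Algorithmically, the lemma corresponds to a standard sweep over interval endpoints; the sketch below (illustrative, with names of my own choosing) finds the maximum number of segments covering a single point:

```python
def max_coverage(intervals):
    """Return the largest number of intervals covering a single point."""
    events = []
    for lo, hi in intervals:
        events.append((lo, 0))   # interval opens
        events.append((hi, 1))   # interval closes
    # at equal coordinates, openings (0) sort before closings (1), so
    # closed intervals that merely touch still count as overlapping
    events.sort()
    best = cur = 0
    for _, kind in events:
        if kind == 0:
            cur += 1
            best = max(best, cur)
        else:
            cur -= 1
    return best

# segments inside [0, 1] with total length > 3: some point lies on >= 4
segments = [(0.0, 0.9), (0.05, 0.95), (0.1, 1.0), (0.2, 0.7), (0.3, 0.8)]
assert sum(hi - lo for lo, hi in segments) > 3
assert max_coverage(segments) >= 4
```

The sweep also suggests why the lemma is true: the coverage count, integrated over $[0,1]$, equals the total length of the segments, so it must exceed $n$ somewhere.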
However, this lemma now offers us a valuable way to solve the problem. Project all the circles onto a side of the square: a circle of circumference $c$ projects to a segment of length $\frac{c}{\pi}$ (its diameter), so the projections have total length $\frac{10}{\pi}$. Since $\frac{10}{\pi} > 3$, by the lemma there exists a point that lies on at least four of the segments. Taking the vertical line through this point guarantees that it intersects the four circles whose diameter projections contain the projected point, as required.
Example 5.9
$n$ points lie in general position in the plane. Show that there exists some constant $c$ such that the number of pairs of points at distance exactly 1 from each other is at most $cn\sqrt{n}$.
First, we have to encode the constraint in a useful way: for instance, in its current formulation, there's nothing preventing us from having all the points at distance 1 from each other! To do so, we use the fact that a triangle has a unique circumcenter; in other words, there cannot exist two distinct points that are both equidistant from three given points.
Thus, we can double count pairs $(P, T)$ where $P$ is a point and $T$ is a pair of points such that $P$ is at distance 1 from both points of $T$. Let $S$ denote the set of all the points, and define $f(P)$ to be the number of points at distance 1 from $P$. If we fix $T$, there are at most two valid points $P$ for this $T$, which implies that the total number of pairs is at most $2\binom{n}{2}$. On the other hand, the desired quantity, the number of unit-distance pairs, is half the sum of $f(P)$ over all choices of $P$.
Now, to find a condition on $f(P)$, notice that the number of pairs $(P, T)$ for a fixed $P$ is $\binom{f(P)}{2}$. Thus,
$$2\binom{n}{2} \ge \sum_{P \in S} \binom{f(P)}{2} \ge n\binom{\frac{1}{n}\sum_{P \in S} f(P)}{2}$$
by Jensen. Let $x = \sum_{P \in S} f(P)$ be twice the desired number of pairs. Then,
$$n \cdot \frac{x}{n} \cdot \left(\frac{x}{n} - 1\right) \le 2n(n-1).$$
This yields
$$x \le \frac{n(1 + \sqrt{8n - 7})}{2} \sim n\sqrt{2n},$$
so the desired number of pairs is about $\frac{n\sqrt{n}}{\sqrt{2}}$, which solves the problem.
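As a sanity check on the bound, the sketch below counts unit-distance pairs on a small patch of the triangular lattice (a configuration with many unit distances) and compares $x = \sum_P f(P)$ against $\frac{n(1+\sqrt{8n-7})}{2}$. This is only an illustration of the inequality, not a proof:

```python
import math
from itertools import combinations

def unit_distance_pairs(points, eps=1e-9):
    """Count pairs of points at distance exactly 1 (up to tolerance)."""
    return sum(
        1 for p, q in combinations(points, 2)
        if abs(math.dist(p, q) - 1.0) < eps
    )

# a 6 x 6 patch of the unit triangular lattice
pts = [(i + j / 2, j * math.sqrt(3) / 2) for i in range(6) for j in range(6)]
n = len(pts)

x = 2 * unit_distance_pairs(pts)          # x = sum of f(P) over all P
bound = n * (1 + math.sqrt(8 * n - 7)) / 2
assert x <= bound
```

Here $n = 36$ gives $170$ incidences against a bound of roughly $320$, consistent with the $n\sqrt{2n}$ asymptotic.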
Exercise 5.10. Notice that we could also have counted pairs (P, T ) where P is a point
and T is a triangle, but this yields a weaker bound. Do so.
For the interested reader, there is the following graph theoretic generalization, for which the proof follows along basically the same lines but is more involved asymptotically.
Theorem 5.11
There exists a constant $c > 0$ such that the maximum number of edges in a graph with $n$ vertices without $K_{s,t}$ is at most $c \cdot n^{2 - \frac{1}{s}}$.
Remark 5.12. An important inequality that we will use throughout many of these problems is Jensen's inequality applied to the binomial coefficient $\binom{x}{k}$, which is convex for $x \ge k$. In other words, in general, we have
$$\sum_{i=1}^n \binom{a_i}{k} \ge n\binom{\frac{1}{n}\sum_{i=1}^n a_i}{k}.$$
This is useful as we have to choose out of a subset when we fix one element during our
double counting; for instance, it was used in the previous example.
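Since the $a_i$ in our applications are integers but their mean need not be, a numerical check must use the polynomial extension $\binom{x}{k} = \frac{x(x-1)\cdots(x-k+1)}{k!}$ for real $x$. The sketch below (my own illustrative code) verifies the inequality on random data:

```python
import random
from math import factorial

def binom(x, k):
    """Generalized binomial coefficient C(x, k) for real x."""
    prod = 1.0
    for i in range(k):
        prod *= x - i
    return prod / factorial(k)

random.seed(1)
k = 3
for _ in range(1000):
    a = [random.randint(k, 40) for _ in range(10)]
    lhs = sum(binom(ai, k) for ai in a)
    rhs = len(a) * binom(sum(a) / len(a), k)
    assert lhs >= rhs - 1e-6   # Jensen: C(x, k) is convex for x >= k
```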
Example 5.13
Let $f(n)$ denote the largest prime factor of a positive integer $n$. Let $\{a_n\}$ be a strictly increasing sequence of positive integers. Show that for every positive integer $M$, there exist $i \ne j$ such that $f(a_i + a_j) > M$.
This is a fairly difficult problem when presented standing alone; the main idea is to use the property that the sequence is unbounded (as it is strictly increasing). If only finitely many primes divided sums of the form $a_i + a_j$, we would eventually run into size issues with respect to the $a_i$. The formal proof below makes that intuition precise.
Proof. For the sake of contradiction assume that $p_1, p_2, \cdots, p_k$ are the only prime factors dividing $a_i + a_j$ over all $i \ne j$. As the sequence is unbounded, we may fix an index $i > k + 1$ with $a_i > (p_1 p_2 \cdots p_k)^N$ for some large integer $N$, and consider the $k + 1$ numbers $a_i + a_j$ for $1 \le j \le k + 1$. By assumption every $a_i + a_j$ is of the form $p_1^{\alpha_1} p_2^{\alpha_2} \cdots p_k^{\alpha_k}$, and since $a_i + a_j > (p_1 p_2 \cdots p_k)^N$, we may conclude that
$$\max(\alpha_1, \alpha_2, \cdots, \alpha_k) > N.$$
Thus, for every $j$, there exists a $p \in \{p_1, \cdots, p_k\}$ such that $p^N \mid a_i + a_j$; but there are only $k$ primes $p$ and $k + 1$ indices $j$, so by Pigeonhole two such sums, say for indices $j_1 < j_2$, correspond to the same prime $p$. Subtracting,
$$p^N \mid a_{j_2} - a_{j_1} \implies p^N \mid \prod_{1 \le j_1 < j_2 \le k+1} (a_{j_2} - a_{j_1}).$$
But the right-hand side is a fixed nonzero integer, while $N$ can be arbitrarily large by selecting a large enough $i$ (as the sequence $\{a_n\}$ increases), a contradiction.
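To see the statement in action on the simplest sequence $a_n = n$, the sketch below finds, for a given bound $M$, a pair of distinct terms whose sum has a prime factor exceeding $M$ (the helper names are my own):

```python
def largest_prime_factor(n):
    """Largest prime factor of n >= 2, by trial division."""
    best, p = 1, 2
    while p * p <= n:
        while n % p == 0:
            best, n = p, n // p
        p += 1
    return max(best, n) if n > 1 else best

# the sequence a_n = n, and a target bound M
a = list(range(1, 50))
M = 50

# search for distinct terms whose sum has a prime factor exceeding M
witness = next(
    (a[i], a[j])
    for i in range(len(a)) for j in range(i + 1, len(a))
    if largest_prime_factor(a[i] + a[j]) > M
)
assert largest_prime_factor(sum(witness)) > M
```

For instance $4 + 49 = 53$ is itself a prime larger than $50$; the theorem guarantees such a witness exists for every $M$ once enough terms are taken.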
§6 Practice Problems
Mostly olympiad problems, though a few computational problems are mixed in. They are not strictly in increasing order of difficulty, though the reader will (hopefully) find some of the earlier problems simpler.
Problem 6.1. Find the maximum possible size of a subset S of {1, 2, 3, · · · , 2017}, such
that there do not exist distinct a, b, c ∈ S with ab = c.
Problem 6.2 (2002 Putnam). Given any five points on a sphere, show that some four
of them must lie on a closed hemisphere.
Problem 6.3 (2002 IMC). 200 students participated in a math contest. They had 6
problems to solve. Each problem was correctly solved by at least 120 participants. Prove
that there must be 2 participants such that every problem was solved by at least one of
these two students.
Problem 6.4 (2004 BAMO). Suppose one is given n real numbers, not all zero, but
such that their sum is zero. Prove that one can label these numbers a1 , a2 , ..., an in such
a manner that
a1 a2 + a2 a3 + ... + an−1 an + an a1 < 0.
Problem 6.5 (2001 IMO/4). Let $n$ be an odd integer greater than 1 and let $c_1, c_2, \ldots, c_n$ be integers. For each permutation $a = (a_1, a_2, \ldots, a_n)$ of $\{1, 2, \ldots, n\}$, define $S(a) = \sum_{i=1}^n c_i a_i$. Prove that there exist permutations $a \ne b$ of $\{1, 2, \ldots, n\}$ such that $n!$ is a divisor of $S(a) - S(b)$.
Problem 6.6 (1987 IMO/1). Let $p_n(k)$ be the number of permutations of the set $\{1, 2, 3, \ldots, n\}$ which have exactly $k$ fixed points. Prove that $\sum_{k=0}^n k\, p_n(k) = n!$.
Problem 6.7 (AoPS). On a 7 × 7 square piece of graph paper, the centers of k of the
49 squares are chosen. No four of the chosen points are the vertices of a rectangle whose
sides are parallel to those of the paper. What is the largest k for which this is possible?
Problem 6.8 (1985 AIME/14). In a tournament each player played exactly one game
against each of the other players. In each game the winner was awarded 1 point, the loser
got 0 points, and each of the two players earned 1/2 point if the game was a tie. After
the completion of the tournament, it was found that exactly half of the points earned
by each player were earned against the ten players with the least number of points. (In
particular, each of the ten lowest scoring players earned half of her/his points against the
other nine of the ten). What was the total number of players in the tournament?
Problem 6.9 (2022 HMMT C5). Five cards labeled 1, 3, 5, 7, 9 are laid in a row in that
order, forming the five-digit number 13579 when read from left to right. A swap consists
of picking two distinct cards, and then swapping them. After three swaps, the cards form
a new five-digit number n when read from left to right. Compute the expected value of n.
Problem 6.10 (2000 Putnam). Let $n \ge 3$ be an integer, and let $B$ be a set of more than $\frac{2^{n+1}}{n}$ distinct points in $n$-dimensional space with coordinates of the form $(\pm 1, \pm 1, \ldots, \pm 1)$. Show that there are three distinct points $P, Q, R$ in $B$ such that $PQR$ is an equilateral triangle.
Problem 6.11. Let S = [n]. Suppose that some subsets A1 , A2 , · · · , Ak of S satisfy the
following:
Problem 6.12 (1998 IMO/2). In a contest, there are $m$ candidates and $n$ judges, where $n \ge 3$ is an odd integer. Each candidate is evaluated by each judge as either pass or fail. Suppose that each pair of judges agrees on at most $k$ candidates. Prove that
$$\frac{k}{m} \ge \frac{n-1}{2n}.$$
Problem 6.13 (Shortlist 2006 C3). Let S be a finite set of points in the plane such that
no three of them are on a line. For each convex polygon P whose vertices are in S, let
a(P ) be the number of vertices of P , and let b(P ) be the number of points of S which are
outside P . A line segment, a point, and the empty set are considered as convex polygons
of 2, 1, and 0 vertices respectively. Prove that for every real number $x$,
$$\sum_P x^{a(P)} (1 - x)^{b(P)} = 1,$$
where the sum is taken over all convex polygons with vertices in S.
Problem 6.14 (2000 Putnam B1). Let $a_j, b_j, c_j$ be integers for $1 \le j \le N$. Assume for each $j$, at least one of $a_j, b_j, c_j$ is odd. Show that there exist integers $r, s, t$ such that $ra_j + sb_j + tc_j$ is odd for at least $\frac{4N}{7}$ values of $j$, $1 \le j \le N$.
Problem 6.15 (2016 AIME I/13). Freddy the frog is jumping around the coordinate
plane searching for a river, which lies on the horizontal line y = 24. A fence is located at
the horizontal line y = 0. On each jump Freddy randomly chooses a direction parallel
to one of the coordinate axes and moves one unit in that direction. When he is at a
point where y = 0, with equal likelihoods he chooses one of three directions where he
either jumps parallel to the fence or jumps away from the fence, but he never chooses
the direction that would have him cross over the fence to where y < 0. Freddy starts his
search at the point (0, 21) and will stop once he reaches a point on the river. Find the
expected number of jumps it will take Freddy to reach the river.
Problem 6.16 (1985 IMO/4). Given a set M of 1985 distinct positive integers, none of
which has a prime divisor greater than 23, prove that M contains a subset of 4 elements
whose product is the 4th power of an integer.
Problem 6.18 (2018 AIME II/13). Misha rolls a standard, fair six-sided die until she rolls 1-2-3 in that order on three consecutive rolls. The probability that she will roll the die an odd number of times is $\frac{m}{n}$, where $m$ and $n$ are relatively prime positive integers. Find $m + n$.
Problem 6.19. Suppose that $S$ is a set of $n$ points in the plane in general position. Suppose that for every point $P \in S$, there exist at least $k$ points in $S$ that are equidistant from $P$. Show that $k < \frac{1}{2} + \sqrt{2n}$.
Problem 6.20 (Iran). A school has $n$ students, each of whom is enrolled in at least two classes. We know that if two classes share at least two common students, then the two classes have different numbers of students. Show that there are at most $(n-1)^2$ classes.
Problem 6.21 (Shortlist 2004 C1). There are 10001 students at a university. Some students join together to form several clubs (a student may belong to different clubs). Some clubs join together to form several societies (a club may belong to different societies). There are a total of $k$ societies. Suppose that the following conditions hold:
1. Each pair of students is in exactly one club.
2. For each student and each society, the student is in exactly one club of the society.
3. Each club has an odd number of students. In addition, a club with $2m + 1$ students ($m$ is a positive integer) is in exactly $m$ societies.
Find all possible values of $k$.
Problem 6.23 (2006 HMMT C10). Somewhere in the universe, n students are taking
a 10-question math competition. Their collective performance is called laughable if, for
some pair of questions, there exist 57 students such that either all of them answered both
questions correctly or none of them answered both questions correctly. Compute the
smallest n such that the performance is necessarily laughable.
Problem 6.25 (Korea 2005). 11 students take a test. For any two questions on the test, there are at least 6 students who solved exactly one of those two questions. Prove that there are no more than 12 questions on this test.
Problem 6.26 (Russia 1999). In a class, each boy knows at least one girl. Show that
one can choose a group of more than half of the students such that every boy in the
group knows an odd number of girls in the group.
Problem 6.27. A series of line segments with sum of lengths greater than 1000 lie in a
square with side length 1. Show that there exists a line that intersects at least 501 of
these segments.
Problem 6.28 (1993 Putnam). Let x1 , x2 , . . . , x19 be positive integers less than or
equal to 93. Let y1 , y2 , . . . , y93 be positive integers less than or equal to 19. Prove that
there exists a (non-empty) sum of some xi equal to a sum of some yi .
Problem 6.29 (China 1993). Ten people went to a bookstore. It is known that each
person bought exactly 3 books, and for every pair of people, there exists at least one
book that they both bought. Find the minimum possible number of people who bought
the most bought book.
Problem 6.30. We choose at least n + 1 numbers among 1, 2, · · · , 2n. Prove that among
the chosen numbers, there are two different numbers whose sum is a prime number.
Problem 6.31 (2010 HMMT C9). Rosencrantz and Guildenstern are playing a game
where they repeatedly flip coins. Rosencrantz wins if 1 heads followed by 2009 tails
appears. Guildenstern wins if 2010 heads come in a row. They will flip coins until
someone wins. What is the probability that Rosencrantz wins?