lectures_week4
Covariance
We now introduce the covariance, a concept that is closely related to the variance. Rather
than being associated with a single random variable X, the covariance is associated with a pair of
random variables X and Y. Roughly speaking, the covariance of X and Y measures the degree
to which these random variables tend to vary together.
Definition. Let X and Y be two random variables. The covariance of X and Y is defined
as
Cov(X, Y ) = E [(X − E[X]) (Y − E[Y ])] ,
whenever this expectation exists.
Before we look at examples, let us list a few first properties of the covariance. First, the
covariance is symmetric, that is,
Cov(X, Y ) = Cov(Y, X).
Second, we clearly have
Cov(X, X) = Var(X).
Third, in the same way that there are two formulas to compute the variance (namely,
E[(X − E[X])²] and E[X²] − (E[X])²), there is also an alternate formula for the covariance:
Proposition. Let X and Y be two random variables. Then,
Cov(X, Y) = E[XY] − E[X]E[Y].
Example. Let X and Y be discrete random variables with joint probability mass function
pX,Y(−2, −2) = 1/6,   pX,Y(−2, 2) = 1/3,   pX,Y(3, 2) = 1/2.
Let us find the covariance of X and Y. We need to compute E[XY], E[X] and E[Y], which
we now do:
E[XY] = (−2) · (−2) · pX,Y(−2, −2) + (−2) · 2 · pX,Y(−2, 2) + 3 · 2 · pX,Y(3, 2) = 7/3,
E[X] = (−2) · pX,Y(−2, −2) + (−2) · pX,Y(−2, 2) + 3 · pX,Y(3, 2) = 1/2,
E[Y] = (−2) · pX,Y(−2, −2) + 2 · pX,Y(−2, 2) + 2 · pX,Y(3, 2) = 4/3.
Hence,
Cov(X, Y) = E[XY] − E[X]E[Y] = 7/3 − (1/2) · (4/3) = 5/3.
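The following short Python sketch is not part of the notes; it simply re-does the computation above with exact fractions, using the alternate formula Cov(X, Y) = E[XY] − E[X]E[Y].

# Sketch: computing Cov(X, Y) directly from the joint pmf of the example above.
from fractions import Fraction as F

pmf = {(-2, -2): F(1, 6), (-2, 2): F(1, 3), (3, 2): F(1, 2)}   # joint pmf p_{X,Y}

E_XY = sum(x * y * p for (x, y), p in pmf.items())
E_X  = sum(x * p for (x, y), p in pmf.items())
E_Y  = sum(y * p for (x, y), p in pmf.items())

print(E_XY - E_X * E_Y)   # prints 5/3, matching the computation above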
Example. In each of the items below, a set A ⊆ R² is equal to the union of the red squares.
[Figure: three panels, one for each of items (a), (b) and (c), each showing the set A as a union of red squares in the (x, y)-plane.]
In each item, consider a pair of jointly continuous random variables X and Y whose joint
probability density function is given by fX,Y (x, y) = C if (x, y) ∈ A (and 0 otherwise). Find C
and Cov(X, Y ) in each case.
Solution.
(a) Since
∫_{−∞}^{∞} ∫_{−∞}^{∞} fX,Y(x, y) dx dy = C ∫_{−3}^{−1} ∫_{−3}^{−1} dx dy + C ∫_{1}^{3} ∫_{1}^{3} dx dy = 8C
and this double integral should equal 1, we obtain C = 1/8. It is easy to see that the
distributions of X and Y are both symmetric about 0, so E[X] = E[Y] = 0. We compute
E[XY] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} xy · fX,Y(x, y) dx dy
= (1/8) ∫_{−3}^{−1} ∫_{−3}^{−1} xy dx dy + (1/8) ∫_{1}^{3} ∫_{1}^{3} xy dx dy = 4.
We then have
Cov(X, Y) = E[XY] − E[X]E[Y] = 4.
(b) Similarly to above, we have C = 1/8, and again by symmetry, E[X] = E[Y] = 0. Next,
E[XY] = (1/8) ∫_{1}^{3} ∫_{−3}^{−1} xy dx dy + (1/8) ∫_{−3}^{−1} ∫_{1}^{3} xy dx dy = −4.
Hence,
Cov(X, Y) = −4
in this case.
(c) Here we get C = 1/16, and simple symmetry considerations show that E[XY] = E[X] =
E[Y] = 0, so
Cov(X, Y) = 0.
In fact, in this last case it can be shown that X and Y are independent, which, as we
will see next, always implies that the covariance between them is zero.
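As a complement to part (b), here is a short Monte Carlo sketch (not part of the notes). It assumes, based on the integration limits written in part (b), that A is the union of the squares [1, 3] × [−3, −1] and [−3, −1] × [1, 3], samples (X, Y) uniformly from A, and estimates the covariance.

# Sketch: Monte Carlo check of case (b).
import random

def sample():
    # Each square has the same area, so pick one of the two with probability 1/2,
    # then sample uniformly inside it.
    if random.random() < 0.5:
        return random.uniform(1, 3), random.uniform(-3, -1)
    return random.uniform(-3, -1), random.uniform(1, 3)

n = 200_000
pts = [sample() for _ in range(n)]
ex = sum(x for x, _ in pts) / n
ey = sum(y for _, y in pts) / n
cov = sum(x * y for x, y in pts) / n - ex * ey
print(cov)   # should be close to -4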
Proposition. Let X and Y be two independent random variables. Then,
Cov(X, Y ) = 0.
Proof. Recall that if X and Y are independent, then E[XY ] = E[X]E[Y ], so
Cov(X, Y ) = E[XY ] − E[X]E[Y ] = 0.
The following example shows that the converse to the above proposition is in general not true.
Example (Zero covariance does not imply independence). Let X be a discrete random
variable with probability mass function
pX(−2) = pX(−1) = pX(1) = pX(2) = 1/4.
Let Y = X². Then,
Cov(X, Y) = E[XY] − E[X]E[Y] = E[X³] − E[X]E[X²].
Note that the distribution of X is symmetric about 0, and the same holds for X³, so E[X] =
E[X³] = 0; by the above we get Cov(X, Y) = 0. However, X and Y are not independent;
this can for example be shown by noting that
P(X = 1, Y = 1) = 1/4, whereas P(X = 1) · P(Y = 1) = (1/4) · (1/2) = 1/8.
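The example above can also be checked by a direct computation; the following Python sketch (not part of the notes) verifies exactly that Cov(X, X²) = 0 while the joint and product probabilities differ.

# Sketch: zero covariance without independence, checked with exact fractions.
from fractions import Fraction as F

pX = {-2: F(1, 4), -1: F(1, 4), 1: F(1, 4), 2: F(1, 4)}

E_X  = sum(x * p for x, p in pX.items())
E_Y  = sum(x**2 * p for x, p in pX.items())          # Y = X^2
E_XY = sum(x * x**2 * p for x, p in pX.items())      # E[X * X^2] = E[X^3]

print(E_XY - E_X * E_Y)                               # 0: the covariance vanishes

# ... yet X and Y are dependent: P(X = 1, Y = 1) differs from P(X = 1) P(Y = 1)
print(pX[1], pX[1] * (pX[1] + pX[-1]))                # 1/4 versus 1/8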
We will now see that the covariance is linear in each of its two arguments.
Proposition. (Covariance of sums) For random variables X, Y and Z and real num-
bers a, b, we have
Cov(aX + bY, Z) = a · Cov(X, Z) + b · Cov(Y, Z).
We leave the verification of this property as an exercise. Note that linearity can be used
repeatedly to give
Cov(Σ_{i=1}^{m} ai Xi, Σ_{j=1}^{n} bj Yj) = Σ_{i=1}^{m} Σ_{j=1}^{n} ai bj · Cov(Xi, Yj).
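The linearity property can also be illustrated numerically. The sketch below (not part of the notes) uses arbitrarily chosen dependent variables X, Y, Z and constants a, b, and compares the two sides of Cov(aX + bY, Z) = a Cov(X, Z) + b Cov(Y, Z) on simulated data.

# Sketch: empirical check of linearity of the covariance.
import random

def cov(us, vs):
    # empirical covariance of two equally long samples
    n = len(us)
    mu, mv = sum(us) / n, sum(vs) / n
    return sum((u - mu) * (v - mv) for u, v in zip(us, vs)) / n

random.seed(0)
n = 100_000
X = [random.gauss(0, 1) for _ in range(n)]
Y = [random.gauss(0, 1) for _ in range(n)]
Z = [x + 2 * y + random.gauss(0, 1) for x, y in zip(X, Y)]   # Z depends on X and Y
a, b = 3.0, -1.5

lhs = cov([a * x + b * y for x, y in zip(X, Y)], Z)
rhs = a * cov(X, Z) + b * cov(Y, Z)
print(lhs, rhs)   # the two values coincide up to floating-point error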
Variance of sums
Let us recall that, if X and Y are independent random variables, then Var(X + Y ) = Var(X) +
Var(Y ). This equality need not be true when X and Y are not independent. We now see a
formula that holds in all cases.
Proposition. (Variance of sums) For random variables X and Y, we have
Var(X + Y) = Var(X) + Var(Y) + 2 · Cov(X, Y),
and more generally, for random variables X1, . . . , Xn,
Var(Σ_{i=1}^{n} Xi) = Σ_{i=1}^{n} Var(Xi) + 2 Σ_{1≤i<j≤n} Cov(Xi, Xj).
Proof. We prove the second formula, since it is more general:
Var(Σ_{i=1}^{n} Xi) = Cov(Σ_{i=1}^{n} Xi, Σ_{j=1}^{n} Xj) = Σ_{i=1}^{n} Σ_{j=1}^{n} Cov(Xi, Xj)
= Σ_{i=1}^{n} Cov(Xi, Xi) + Σ_{1≤i,j≤n, i≠j} Cov(Xi, Xj)
= Σ_{i=1}^{n} Var(Xi) + 2 Σ_{1≤i<j≤n} Cov(Xi, Xj).
Note that, in the above proposition, if the Xi's are independent, then Cov(Xi, Xj) = 0 for
all i ≠ j, so we re-obtain that the variance of the sum is equal to the sum of the variances.
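The variance-of-sums formula is easy to check on simulated data as well. The sketch below (not part of the notes) uses a pair of correlated variables of arbitrary choice and compares Var(X + Y) with Var(X) + Var(Y) + 2 Cov(X, Y).

# Sketch: empirical check of the variance-of-sums formula.
import random

def cov(us, vs):
    # empirical covariance; cov(us, us) is the empirical variance
    n = len(us)
    mu, mv = sum(us) / n, sum(vs) / n
    return sum((u - mu) * (v - mv) for u, v in zip(us, vs)) / n

random.seed(1)
n = 100_000
X = [random.gauss(0, 1) for _ in range(n)]
Y = [0.5 * x + random.gauss(0, 1) for x in X]   # Y is correlated with X
S = [x + y for x, y in zip(X, Y)]

print(cov(S, S), cov(X, X) + cov(Y, Y) + 2 * cov(X, Y))   # the two values coincide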
Example. A group of N men, all of whom have hats, throw their hats to the floor of a room
and shuffle them, and then each man takes a hat from the floor at random (assume that the
shuffling is perfect, in the sense that at the end of the procedure, all the possible allocations
of hats to the men are equally likely). Let X denote the number of men who recover their
own hats. Find the expectation and variance of X.
Solution. We enumerate the men from 1 to N, and for i = 1, . . . , N, define the random
variable
Xi = 1 if man i recovers his own hat, and Xi = 0 otherwise.
We have X = Σ_{i=1}^{N} Xi, so we can compute the expectation and variance of X using the
formulas
E[X] = Σ_{i=1}^{N} E[Xi],   Var(X) = Σ_{i=1}^{N} Var(Xi) + Σ_{i≠j} Cov(Xi, Xj).
In order to apply these formulas, we need E[Xi] and Var(Xi) for each i and Cov(Xi, Xj) for
each i ≠ j. Since man i is equally likely to recover any of the N hats, we have
P(Xi = 1) = 1/N.
Hence, Xi ∼ Ber(1/N) and we have
E[Xi] = 1/N,   Var(Xi) = (1/N) · ((N − 1)/N) = (N − 1)/N².
Also, for i ≠ j we note that Xi Xj is also a Bernoulli random variable (it only attains the
values 0 and 1) and
E[Xi Xj] = P(Xi Xj = 1) = P(Xi = 1, Xj = 1) = (N − 2)!/N! = 1/(N(N − 1))
(the third equality follows from the fact that there are (N − 2)! outcomes where men i and j
recover their own hats, and N! outcomes in total).
Then,
Cov(Xi, Xj) = E[Xi Xj] − E[Xi]E[Xj] = 1/(N(N − 1)) − 1/N² = 1/(N²(N − 1)),
and
E[X] = Σ_{i=1}^{N} E[Xi] = N · (1/N) = 1,
Var(X) = Σ_{i=1}^{N} (N − 1)/N² + Σ_{i≠j} 1/(N²(N − 1))
= (N − 1)/N + N(N − 1) · 1/(N²(N − 1)) = 1.
So the expectation and the variance of X are both equal to 1, for every N.
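The hat-matching example is also easy to simulate. The sketch below (not part of the notes) generates random allocations with N = 10 men, counts the matches, and checks that the empirical mean and variance are both close to 1.

# Sketch: simulating the hat-matching experiment.
import random

def matches(N):
    hats = list(range(N))
    random.shuffle(hats)                          # uniformly random allocation of hats
    return sum(1 for i, h in enumerate(hats) if i == h)

N, trials = 10, 100_000
xs = [matches(N) for _ in range(trials)]
mean = sum(xs) / trials
var = sum((x - mean) ** 2 for x in xs) / trials
print(mean, var)                                  # both should be close to 1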
Cauchy-Schwarz inequality
Next, we look at an important inequality.
Theorem. (Cauchy-Schwarz inequality) Given two random variables X and Y , we have
|Cov(X, Y)| ≤ √(Var(X) · Var(Y)).
Proof. We give a classical proof of this inequality, which is often seen in linear algebra. Define
ϕ(t) := Var(X − t · Y),   t ∈ R.
We have
ϕ(t) = Var(Y) · t² − 2 · Cov(X, Y) · t + Var(X).
Note that ϕ(t) ≥ 0 for all t, since the variance is always non-negative. Now, when a quadratic
function t ↦ at² + bt + c is non-negative for all t ∈ R, the discriminant b² − 4ac must be less
than or equal to zero. This gives
(2 · Cov(X, Y))² − 4 · Var(Y) · Var(X) ≤ 0,
that is, Cov(X, Y)² ≤ Var(X) · Var(Y), and taking square roots yields the desired inequality.
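As a quick illustration (not part of the proof), the sketch below checks the inequality on the joint pmf from the first example of this section, computing Cov, Var(X) and Var(Y) exactly and comparing them.

# Sketch: Cauchy-Schwarz checked on the earlier discrete example.
from fractions import Fraction as F
from math import sqrt

pmf = {(-2, -2): F(1, 6), (-2, 2): F(1, 3), (3, 2): F(1, 2)}

E_X  = sum(x * p for (x, y), p in pmf.items())
E_Y  = sum(y * p for (x, y), p in pmf.items())
cov  = sum((x - E_X) * (y - E_Y) * p for (x, y), p in pmf.items())
varX = sum((x - E_X) ** 2 * p for (x, y), p in pmf.items())
varY = sum((y - E_Y) ** 2 * p for (x, y), p in pmf.items())

print(abs(cov) <= sqrt(varX * varY))   # True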
Correlation coefficient
The covariance is a useful quantity that describes how two random variables vary together.
However, it has one disadvantage: it is not scale invariant. To explain what this means,
suppose that X and Y are two random variables, both measuring lengths in meters. Assume
that U and V give the same measurements as X and Y , respectively, but in centimeters, that
is, U = 100X and V = 100Y. Then,
Cov(U, V) = Cov(100X, 100Y) = 100² · Cov(X, Y) = 10000 · Cov(X, Y).
This means that changing the scale also changes the covariance. To obtain a scale-invariant
quantity, we make the following definition.
Definition. The correlation coefficient between two random variables X and Y (both with
positive variance) is defined as
ρ(X, Y) = Cov(X, Y) / √(Var(X) · Var(Y)).
Proposition. For any two random variables X and Y for which ρ is defined, we have
−1 ≤ ρ(X, Y) ≤ 1.
Moreover, for any real numbers a, b, c, d with ac > 0,
ρ(aX + b, cY + d) = ρ(X, Y).
Proof. The first statement is an immediate consequence of the Cauchy-Schwarz inequality. For
the second statement, first note that, by the linearity of the covariance in each argument,
Cov(aX + b, cY + d) = ac · Cov(X, Y) + a · Cov(X, d) + c · Cov(b, Y) + Cov(b, d).
We now note that the covariance between a constant and any other random variable is equal
to zero. For instance,
Cov(b, Y) = E[bY] − E[b]E[Y] = b · E[Y] − b · E[Y] = 0.
Hence,
Cov(aX + b, cY + d) = ac · Cov(X, Y).
Then,
ρ(aX + b, cY + d) = Cov(aX + b, cY + d) / √(Var(aX + b) · Var(cY + d))
= ac · Cov(X, Y) / √(a² · Var(X) · c² · Var(Y))
= ac · Cov(X, Y) / (|a| · |c| · √(Var(X) · Var(Y)))
= Cov(X, Y) / √(Var(X) · Var(Y)) = ρ(X, Y),
where in the last step we used that ac = |a| · |c|, since ac > 0.
It is useful to observe that ρ(X, X) = 1 and ρ(X, −X) = −1. Two random variables X and Y
are called uncorrelated if ρ(X, Y ) = 0. Note that if X and Y are independent, then they are
uncorrelated, but the converse is not in general true.
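The scale invariance just proved can be seen directly on data. The sketch below (not part of the notes) picks an arbitrary correlated pair (X, Y) and checks that the empirical correlation is unchanged after the rescaling U = 100X + 3, V = 5Y − 7 (here a, c > 0).

# Sketch: empirical check that the correlation coefficient is scale invariant.
import random
from math import sqrt

def corr(us, vs):
    # empirical correlation coefficient of two equally long samples
    n = len(us)
    mu, mv = sum(us) / n, sum(vs) / n
    c  = sum((u - mu) * (v - mv) for u, v in zip(us, vs)) / n
    vu = sum((u - mu) ** 2 for u in us) / n
    vv = sum((v - mv) ** 2 for v in vs) / n
    return c / sqrt(vu * vv)

random.seed(2)
n = 100_000
X = [random.gauss(0, 1) for _ in range(n)]
Y = [x + random.gauss(0, 1) for x in X]

print(corr(X, Y), corr([100 * x + 3 for x in X], [5 * y - 7 for y in Y]))  # equal values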
Example. Let X1 and X2 be two independent random variables with expectation 0 and
variance 1.
1. Find ρ(X1 , X2 ).
2. Let Y1 := X1 and Y2 := c · X1 + √(1 − c²) · X2, where c ∈ [−1, 1]. Determine E[Y2], E[Y2²]
and ρ(Y1, Y2).
Solution.
1. Since X1 and X2 are independent, we have Cov(X1, X2) = 0, and therefore ρ(X1, X2) = 0.
2. We start with
E[Y2] = E[c · X1 + √(1 − c²) · X2] = c · E[X1] + √(1 − c²) · E[X2] = 0.
Next,
E[Y2²] = E[(c · X1 + √(1 − c²) · X2)²] = E[c² · X1² + (1 − c²) · X2² + 2c · √(1 − c²) · X1 · X2]
= c² · E[X1²] + (1 − c²) · E[X2²] + 2c · √(1 − c²) · E[X1] · E[X2]
= c² · E[X1²] + (1 − c²) · E[X2²] = c² + (1 − c²) = 1,
since E[Xi²] = Var(Xi) + (E[Xi])² = 1 for i = 1, 2.
Noting that Var(Y1) = Var(X1) = 1, Var(Y2) = E[Y2²] − (E[Y2])² = 1 and
Cov(Y1, Y2) = Cov(X1, c · X1 + √(1 − c²) · X2) = c · Var(X1) + √(1 − c²) · Cov(X1, X2) = c,
we conclude that
ρ(Y1, Y2) = Cov(Y1, Y2) / √(Var(Y1) · Var(Y2)) = c.
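Finally, a simulation sketch (not part of the notes) of the construction above. The example only fixes the mean and variance of X1 and X2; here standard normal variables are used as one possible choice, with c = 0.6, and the empirical correlation of (Y1, Y2) is checked against c.

# Sketch: simulating Y1 = X1, Y2 = c X1 + sqrt(1 - c^2) X2 and estimating their correlation.
import random
from math import sqrt

def corr(us, vs):
    # empirical correlation coefficient
    n = len(us)
    mu, mv = sum(us) / n, sum(vs) / n
    c  = sum((u - mu) * (v - mv) for u, v in zip(us, vs)) / n
    vu = sum((u - mu) ** 2 for u in us) / n
    vv = sum((v - mv) ** 2 for v in vs) / n
    return c / sqrt(vu * vv)

random.seed(3)
c, n = 0.6, 200_000
X1 = [random.gauss(0, 1) for _ in range(n)]
X2 = [random.gauss(0, 1) for _ in range(n)]
Y1 = X1
Y2 = [c * x1 + sqrt(1 - c ** 2) * x2 for x1, x2 in zip(X1, X2)]

print(corr(Y1, Y2))   # should be close to c = 0.6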