A pair of random variables
◮ Let X, Y be random variables on the same probability
space (Ω, F, P )
◮ Each of X, Y maps Ω to ℜ.
◮ We can think of the pair of random variables as a vector-valued function that maps Ω to ℜ2 .
(Figure: the pair (X, Y ), viewed as a vector-valued map from the sample space Ω to ℜ2 .)
P S Sastry, IISc, E1 222 Aug 2021 1/248
◮ Just as in the case of a single rv, we can think of the
induced probability space for the case of a pair of rv’s too.
◮ That is, by defining the pair of random variables, we
essentially create a new probability space with sample
space being ℜ2 .
◮ The events now would be the Borel subsets of ℜ2 .
◮ Recall that ℜ2 is cartesian product of ℜ with itself.
◮ So, we can create Borel subsets of ℜ2 by cartesian
product of Borel subsets of ℜ.
B 2 = σ ({B1 × B2 : B1 , B2 ∈ B})
where B is the Borel σ-algebra we considered earlier, and
B 2 is the set of Borel sets of ℜ2 .
P S Sastry, IISc, E1 222 Aug 2021 2/248
◮ Recall that B is the smallest σ-algebra containing all
intervals.
◮ Let I1 , I2 ⊂ ℜ be intervals. Then I1 × I2 ⊂ ℜ2 is known
as a cylindrical set.
(Figure: the rectangle [a, b] × [c, d] in the plane.)
◮ B 2 is the smallest σ-algebra containing all cylindrical sets.
◮ We saw that B is also the smallest σ-algebra containing
all intervals of the form (−∞, x].
◮ Similarly B 2 is the smallest σ-algebra containing
cylindrical sets of the form (−∞, x] × (−∞, y].
P S Sastry, IISc, E1 222 Aug 2021 3/248
◮ Let X, Y be random variables on the probability space
(Ω, F, P )
◮ This gives rise to a new probability space (ℜ2 , B 2 , PXY )
with PXY given by
PXY (B) = P [(X, Y ) ∈ B], ∀B ∈ B 2
= P ({ω : (X(ω), Y (ω)) ∈ B})
(Here, B ⊂ ℜ2 )
◮ Recall that for a single rv, the resulting probability space
is (ℜ, B, PX ) with
PX (B) = P [X ∈ B] = P ({ω : X(ω) ∈ B})
(Here, B ⊂ ℜ)
P S Sastry, IISc, E1 222 Aug 2021 4/248
◮ In the case of a single rv, we define a distribution function, FX , which essentially assigns probability to all intervals of the form (−∞, x].
◮ This FX uniquely determines PX (B) for all Borel sets, B.
◮ In a similar manner we define a joint distribution function FXY for a pair of random variables.
◮ FXY (x, y) would be PXY ((−∞, x] × (−∞, y]).
◮ FXY fixes the probability of all cylindrical sets of the form
(−∞, x] × (−∞, y] and hence uniquely determines the
probability of all Borel sets of ℜ2 .
P S Sastry, IISc, E1 222 Aug 2021 5/248
Joint distribution of a pair of random variables
◮ Let X, Y be random variables on the same probability
space (Ω, F, P )
◮ The joint distribution function of X, Y is FXY : ℜ2 → ℜ,
defined by
FXY (x, y) = P [X ≤ x, Y ≤ y]
= P ({ω : X(ω) ≤ x} ∩ {ω : Y (ω) ≤ y})
◮ The joint distribution function is the probability of the
intersection of the events [X ≤ x] and [Y ≤ y].
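As an aside (not part of the original slides), here is a minimal Python sketch of what FXY means operationally: estimate P [X ≤ x, Y ≤ y] by simulation. The particular pair (X, Y ), the sample size and the seed are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary example pair on the same sample space: U ~ U(0,1), X = U, Y = U**2.
n = 100_000
u = rng.uniform(size=n)
x_samples, y_samples = u, u**2

def joint_df(x, y):
    """Empirical estimate of F_XY(x, y) = P[X <= x, Y <= y]."""
    return np.mean((x_samples <= x) & (y_samples <= y))

print(joint_df(0.5, 0.3))   # estimate of P[X <= 0.5, Y <= 0.3]
```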
P S Sastry, IISc, E1 222 Aug 2021 6/248
Properties of Joint Distribution Function
◮ Joint distribution function:
FXY (x, y) = P [X ≤ x, Y ≤ y]
◮ FXY (−∞, y) = FXY (x, −∞) = 0, ∀x, y;
FXY (∞, ∞) = 1
(These are actually limits: limx→−∞ FXY (x, y) = 0, ∀y)
◮ FXY is non-decreasing in each of its arguments
◮ FXY is right continuous and has left-hand limits in each
of its arguments
◮ These are straight-forward extensions of single rv case
◮ But there is another crucial property satisfied by FXY .
P S Sastry, IISc, E1 222 Aug 2021 7/248
◮ Recall that, for the case of a single rv, given x1 < x2 , we
have
P [x1 < X ≤ x2 ] = FX (x2 ) − FX (x1 )
◮ The LHS above is a probability.
Hence the RHS should be non-negative
The RHS is non-negative because FX is non-decreasing.
◮ We will now derive a similar expression in the case of two
random variables.
◮ Here, the probability we want is that of the pair of rv’s
being in a cylindrical set.
P S Sastry, IISc, E1 222 Aug 2021 8/248
◮ Let x1 < x2 and y1 < y2 . We want
P [x1 < X ≤ x2 , y1 < Y ≤ y2 ].
◮ Consider the Borel set B = (−∞, x2 ] × (−∞, y2 ].
(Figure: the region B = (−∞, x2 ] × (−∞, y2 ] decomposed into the rectangle B1 and the regions B2 , B3 .)
B ≜ (−∞, x2 ] × (−∞, y2 ] = B1 + (B2 ∪ B3 )
B1 = (x1 , x2 ] × (y1 , y2 ]
B2 = (−∞, x2 ] × (−∞, y1 ]
B3 = (−∞, x1 ] × (−∞, y2 ]
B2 ∩ B3 = (−∞, x1 ] × (−∞, y1 ]
P S Sastry, IISc, E1 222 Aug 2021 9/248
FXY (x2 , y2 ) = P [X ≤ x2 , Y ≤ y2 ] = P [(X, Y ) ∈ B]
= P [(X, Y ) ∈ B1 + (B2 ∪ B3 )]
= P [(X, Y ) ∈ B1 ] + P [(X, Y ) ∈ (B2 ∪ B3 )]
P [(X, Y ) ∈ B2 ] = P [X ≤ x2 , Y ≤ y1 ] = FXY (x2 , y1 )
P [(X, Y ) ∈ B3 ] = P [X ≤ x1 , Y ≤ y2 ] = FXY (x1 , y2 )
P [(X, Y ) ∈ B2 ∩ B3 ] = P [X ≤ x1 , Y ≤ y1 ] = FXY (x1 , y1 )
P [(X, Y ) ∈ B1 ] = FXY (x2 , y2 ) − P [(X, Y ) ∈ (B2 ∪ B3 )]
= FXY (x2 , y2 ) − FXY (x2 , y1 ) − FXY (x1 , y2 ) + FXY (x1 , y1 )
P S Sastry, IISc, E1 222 Aug 2021 10/248
◮ What we showed is the following.
◮ For x1 < x2 and y1 < y2
P [x1 < X ≤ x2 , y1 < Y ≤ y2 ] = FXY (x2 , y2 ) − FXY (x2 , y1 )
−FXY (x1 , y2 ) + FXY (x1 , y1 )
◮ This means FXY should satisfy
FXY (x2 , y2 )−FXY (x2 , y1 )−FXY (x1 , y2 )+FXY (x1 , y1 ) ≥ 0
for all x1 < x2 and y1 < y2
◮ This is an additional condition that a function has to
satisfy to be the joint distribution function of a pair of
random variables
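As a quick numerical illustration (not in the original slides), the rectangle identity can be checked for independent U (0, 1) random variables, for which FXY (x, y) = xy on the unit square; the chosen corner points are arbitrary.

```python
import numpy as np

# For independent U(0,1) rv's, F_XY(x, y) = x*y on [0,1]^2.
def F(x, y):
    return min(max(x, 0.0), 1.0) * min(max(y, 0.0), 1.0)

x1, x2, y1, y2 = 0.2, 0.7, 0.1, 0.6
rect = F(x2, y2) - F(x2, y1) - F(x1, y2) + F(x1, y1)
print(rect)                      # 0.25, and it is non-negative
print((x2 - x1) * (y2 - y1))     # P[x1 < X <= x2, y1 < Y <= y2] = 0.25
```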
P S Sastry, IISc, E1 222 Aug 2021 11/248
Properties of Joint Distribution Function
◮ Joint distribution function: FXY : ℜ2 → ℜ
FXY (x, y) = P [X ≤ x, Y ≤ y]
◮ It satisfies
1. FXY (−∞, y) = FXY (x, −∞) = 0, ∀x, y;
FXY (∞, ∞) = 1
2. FXY is non-decreasing in each of its arguments
3. FXY is right continuous and has left-hand limits in each
of its arguments
4. For all x1 < x2 and y1 < y2
FXY (x2 , y2 )−FXY (x2 , y1 )−FXY (x1 , y2 )+FXY (x1 , y1 ) ≥ 0
◮ Any F : ℜ2 → ℜ satisfying the above would be a joint
distribution function.
P S Sastry, IISc, E1 222 Aug 2021 12/248
◮ Let X, Y be two discrete random variables (defined on
the same probability space).
◮ Let X ∈ {x1 , · · · xn } and Y ∈ {y1 , · · · , ym }.
◮ We define the joint probability mass function of X and Y
as
fXY (xi , yj ) = P [X = xi , Y = yj ]
(fXY (x, y) is zero for all other values of x, y)
◮ The fXY would satisfy
◮ fXY (x, y) ≥ 0, ∀x, y, and Σi Σj fXY (xi , yj ) = 1
◮ This is a straight-forward extension of the pmf of a single
discrete rv.
P S Sastry, IISc, E1 222 Aug 2021 13/248
Example
◮ Let Ω = (0, 1) with the ‘usual’ probability.
◮ So, each ω is a real number between 0 and 1
◮ Let X(ω) be the digit in the first decimal place in ω and
let Y (ω) be the digit in the second decimal place.
◮ If ω = 0.2576 then X(ω) = 2 and Y (ω) = 5
◮ Easy to see that X, Y ∈ {0, 1, · · · , 9}.
◮ We want to calculate the joint pmf of X and Y
P S Sastry, IISc, E1 222 Aug 2021 14/248
Example
◮ What is the event [X = 4]?
[X = 4] = {ω : X(ω) = 4} = [0.4, 0.5)
◮ What is the event [Y = 3]?
[Y = 3] = [0.03, 0.04) ∪ [0.13, 0.14) ∪ · · · ∪ [0.93, 0.94)
◮ What is the event [X = 4, Y = 3]?
It is the intersection of the above
[X = 4, Y = 3] = [0.43, 0.44)
◮ Hence the joint pmf of X and Y is
fXY (x, y) = P [X = x, Y = y] = 0.01, x, y ∈ {0, 1, · · · , 9}
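A short simulation (not from the original slides) agrees with this: draw ω uniformly on (0, 1), read off the first two decimal digits, and tabulate the empirical joint pmf; the sample size and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
omega = rng.uniform(size=1_000_000)     # omega ~ the 'usual' probability on (0,1)

X = (omega * 10).astype(int)            # first decimal digit
Y = (omega * 100).astype(int) % 10      # second decimal digit

# Empirical joint pmf; each of the 100 (x, y) pairs should be close to 0.01.
counts = np.zeros((10, 10))
np.add.at(counts, (X, Y), 1)
print((counts / len(omega)).round(3))
```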
P S Sastry, IISc, E1 222 Aug 2021 15/248
Example
◮ Consider the random experiment of rolling two dice.
Ω = {(ω1 , ω2 ) : ω1 , ω2 ∈ {1, 2, · · · , 6}}
◮ Let X be the maximum of the two numbers and let Y be
the sum of the two numbers.
◮ Easy to see X ∈ {1, 2, · · · , 6} and Y ∈ {2, 3, · · · , 12}
◮ What is the event [X = m, Y = n]? (We assume m, n
are in the correct range)
[X = m, Y = n] = {(ω1 , ω2 ) ∈ Ω : max(ω1 , ω2 ) = m, ω1 +ω2 = n}
◮ For this to be a non-empty set, we must have
m < n ≤ 2m
◮ Then [X = m, Y = n] = {(m, n − m), (n − m, m)}
◮ Is this always true? No! What if n = 2m?
[X = 3, Y = 6] = {(3, 3)},
[X = 4, Y = 6] = {(4, 2), (2, 4)}
◮ So, P [X = m, Y = n] is either 2/36 or 1/36 (assuming m, n satisfy the other requirements)
P S Sastry, IISc, E1 222 Aug 2021 16/248
Example
◮ We can now write the joint pmf.
◮ Assume 1 ≤ m ≤ 6 and 2 ≤ n ≤ 12. Then
fXY (m, n) = 2/36, if m < n < 2m;   = 1/36, if n = 2m
(fXY (m, n) is zero in all other cases)
◮ Does this satisfy requirements of joint pmf?
Σ_{m,n} fXY (m, n) = Σ_{m=1}^{6} Σ_{n=m+1}^{2m−1} 2/36 + Σ_{m=1}^{6} 1/36
= (2/36) Σ_{m=1}^{6} (m − 1) + 6 · (1/36)
= (2/36)(21 − 6) + 6/36 = 1
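The same check can be done by brute force (not in the original slides): enumerate the 36 equally likely outcomes and tabulate (max, sum). The exact fractions are used to avoid rounding.

```python
from collections import Counter
from fractions import Fraction

# Enumerate all 36 equally likely outcomes and tabulate (max, sum).
pmf = Counter()
for w1 in range(1, 7):
    for w2 in range(1, 7):
        pmf[(max(w1, w2), w1 + w2)] += Fraction(1, 36)

print(pmf[(3, 6)], pmf[(4, 6)])      # 1/36 and 2/36 (n = 2m vs m < n < 2m)
print(sum(pmf.values()))             # 1, so it is a valid joint pmf
```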
P S Sastry, IISc, E1 222 Aug 2021 17/248
Joint Probability mass function
◮ Let X ∈ {x1 , x2 , · · · } and Y ∈ {y1 , y2 , · · · } be discrete
random variables.
◮ The joint pmf: fXY (x, y) = P [X = x, Y = y].
◮ The joint pmf satisfies:
◮ fXY (x, y) ≥ 0, ∀x, y, and
◮ Σi Σj fXY (xi , yj ) = 1
◮ Given the joint pmf, we can get the joint df as
FXY (x, y) = Σ_{i: xi ≤x} Σ_{j: yj ≤y} fXY (xi , yj )
P S Sastry, IISc, E1 222 Aug 2021 18/248
◮ Consider countable sets {x1 , x2 , · · · } and {y1 , y2 , · · · }.
◮ Suppose fXY : ℜ2 → [0, 1] is such that
◮ fXY (x, y) = 0 unless x = xi for some i and y = yj for some j, and
◮ Σi Σj fXY (xi , yj ) = 1
◮ Then fXY is a joint pmf.
◮ This is because, if we define
FXY (x, y) = Σ_{i: xi ≤x} Σ_{j: yj ≤y} fXY (xi , yj )
then FXY satisfies all properties of a df.
◮ We normally specify a pair of discrete random variables by
giving the joint pmf
P S Sastry, IISc, E1 222 Aug 2021 19/248
◮ Given the joint pmf, we can (in principle) compute the
probability of any event involving the two discrete random
variables.
P [(X, Y ) ∈ B] = Σ_{i,j: (xi ,yj )∈B} fXY (xi , yj )
◮ Now, events can be specified in terms of relations
between the two rv’s too
[X < Y + 2] = {ω : X(ω) < Y (ω) + 2}
◮ Thus,
P [X < Y + 2] = Σ_{i,j: xi <yj +2} fXY (xi , yj )
P S Sastry, IISc, E1 222 Aug 2021 20/248
◮ Take the example: 2 dice, X is max and Y is sum
◮ fXY (m, n) = 0 unless m = 1, · · · , 6 and n = 2, · · · , 12.
For this range
fXY (m, n) = 2/36, if m < n < 2m;   = 1/36, if n = 2m
◮ Suppose we want P [Y = X + 2].
P [Y = X + 2] = Σ_{m,n: n=m+2} fXY (m, n) = Σ_{m=1}^{6} fXY (m, m + 2)
= Σ_{m=2}^{6} fXY (m, m + 2)   (since we need m + 2 ≤ 2m)
= 1/36 + 4 · (2/36) = 9/36
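A direct check of this computation (not from the original slides), using the pmf formula above as a Python function:

```python
from fractions import Fraction

def f_XY(m, n):
    """Joint pmf of (max, sum) for two fair dice, using the slide's formula."""
    if 1 <= m <= 6 and m < n < 2 * m:
        return Fraction(2, 36)
    if 1 <= m <= 6 and n == 2 * m:
        return Fraction(1, 36)
    return Fraction(0)

# P[Y = X + 2] = sum_m f_XY(m, m + 2)
print(sum(f_XY(m, m + 2) for m in range(1, 7)))   # 9/36 = 1/4
```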
P S Sastry, IISc, E1 222 Aug 2021 21/248
Joint density function
◮ Let X, Y be two continuous rv’s with df FXY .
◮ If there exists a function fXY that satisfies
FXY (x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} fXY (x′ , y′ ) dy′ dx′ ,   ∀x, y
then we say that X, Y have a joint probability density
function which is fXY
◮ Please note the difference in the definition of joint pmf
and joint pdf.
◮ When X, Y are discrete we defined a joint pmf
◮ We are not saying that if X, Y are continuous rv’s then a
joint density exists.
P S Sastry, IISc, E1 222 Aug 2021 22/248
properties of joint density
◮ The joint density (or joint pdf) of X, Y is fXY that
satisfies
FXY (x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} fXY (x′ , y′ ) dy′ dx′ ,   ∀x, y
◮ Since FXY is non-decreasing in each argument, we must
have fXY (x, y) ≥ 0.
◮ ∫_{−∞}^{∞} ∫_{−∞}^{∞} fXY (x′ , y′ ) dy′ dx′ = 1 is needed to ensure FXY (∞, ∞) = 1.
P S Sastry, IISc, E1 222 Aug 2021 23/248
properties of joint density
◮ The joint density fXY satisfies the following
1. fXY (x, y) ≥ 0, ∀x, y
2. ∫_{−∞}^{∞} ∫_{−∞}^{∞} fXY (x′ , y′ ) dy′ dx′ = 1
◮ These are very similar to the properties of the density of a
single rv
P S Sastry, IISc, E1 222 Aug 2021 24/248
Example: Joint Density
◮ Consider the function
f (x, y) = 2, 0 < x < y < 1 (f (x, y) = 0, otherwise)
◮ Let us show this is a density
∫_{−∞}^{∞} ∫_{−∞}^{∞} f (x, y) dx dy = ∫_{0}^{1} ∫_{0}^{y} 2 dx dy = ∫_{0}^{1} 2 x|_{0}^{y} dy = ∫_{0}^{1} 2y dy = 1
◮ We can say this density is uniform over the region {(x, y) : 0 < x < y < 1}.
(Figure: the triangular region 0 < x < y < 1 inside the unit square. The figure is not a plot of the density function!!)
P S Sastry, IISc, E1 222 Aug 2021 25/248
properties of joint density
◮ The joint density fXY satisfies the following
1. fXY (x, y) ≥ 0, ∀x, y
2. ∫_{−∞}^{∞} ∫_{−∞}^{∞} fXY (x′ , y′ ) dy′ dx′ = 1
◮ Any function fXY : ℜ2 → ℜ satisfying the above two is a
joint density function.
◮ Given fXY satisfying the above, define
FXY (x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} fXY (x′ , y′ ) dy′ dx′ ,   ∀x, y
◮ Then we can show FXY is a joint distribution.
P S Sastry, IISc, E1 222 Aug 2021 26/248
◮ fXY (x, y) ≥ 0 and ∫_{−∞}^{∞} ∫_{−∞}^{∞} fXY (x′ , y′ ) dy′ dx′ = 1
◮ Define
FXY (x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} fXY (x′ , y′ ) dy′ dx′ ,   ∀x, y
◮ Then, FXY (−∞, y) = FXY (x, −∞) = 0, ∀x, y and
FXY (∞, ∞) = 1
◮ Since fXY (x, y) ≥ 0, FXY is non-decreasing in each
argument.
◮ Since it is given as an integral, the above also shows that
FXY is continuous in each argument.
◮ The only property left is the special property of FXY we
mentioned earlier.
P S Sastry, IISc, E1 222 Aug 2021 27/248
∆ ≜ FXY (x2 , y2 ) − FXY (x1 , y2 ) − FXY (x2 , y1 ) + FXY (x1 , y1 ).
◮ We need to show ∆ ≥ 0 if x1 < x2 and y1 < y2 .
◮ We have
∆ = ∫_{−∞}^{x2} ∫_{−∞}^{y2} fXY dy dx − ∫_{−∞}^{x1} ∫_{−∞}^{y2} fXY dy dx − ∫_{−∞}^{x2} ∫_{−∞}^{y1} fXY dy dx + ∫_{−∞}^{x1} ∫_{−∞}^{y1} fXY dy dx
= ∫_{−∞}^{x2} ( ∫_{−∞}^{y2} fXY dy − ∫_{−∞}^{y1} fXY dy ) dx − ∫_{−∞}^{x1} ( ∫_{−∞}^{y2} fXY dy − ∫_{−∞}^{y1} fXY dy ) dx
P S Sastry, IISc, E1 222 Aug 2021 28/248
◮ Thus we have
∆ = ∫_{−∞}^{x2} ( ∫_{−∞}^{y2} fXY dy − ∫_{−∞}^{y1} fXY dy ) dx − ∫_{−∞}^{x1} ( ∫_{−∞}^{y2} fXY dy − ∫_{−∞}^{y1} fXY dy ) dx
= ∫_{−∞}^{x2} ∫_{y1}^{y2} fXY dy dx − ∫_{−∞}^{x1} ∫_{y1}^{y2} fXY dy dx
= ∫_{x1}^{x2} ∫_{y1}^{y2} fXY dy dx ≥ 0
◮ This actually shows
P [x1 ≤ X ≤ x2 , y1 ≤ Y ≤ y2 ] = ∫_{x1}^{x2} ∫_{y1}^{y2} fXY dy dx
P S Sastry, IISc, E1 222 Aug 2021 29/248
◮ What we showed is the following
◮ Any function fXY : ℜ2 → ℜ that satisfies
◮ fXY (x, y) ≥ 0, ∀x, y
◮ ∫_{−∞}^{∞} ∫_{−∞}^{∞} fXY (x, y) dx dy = 1
is a joint density function.
◮ This is because now
FXY (x, y) = ∫_{−∞}^{y} ∫_{−∞}^{x} fXY (x, y) dx dy
would satisfy all conditions for a df.
◮ Convenient to specify joint density (when it exists)
◮ We also showed
P [x1 ≤ X ≤ x2 , y1 ≤ Y ≤ y2 ] = ∫_{x1}^{x2} ∫_{y1}^{y2} fXY dy dx
◮ In general,
P [(X, Y ) ∈ B] = ∫_B fXY (x, y) dx dy,   ∀B ∈ B2
P S Sastry, IISc, E1 222 Aug 2021 30/248
◮ Let us consider the example
f (x, y) = 2, 0 < x < y < 1
◮ Suppose we want the probability of [Y > X + 0.5]
P [Y > X + 0.5] = P [(X, Y ) ∈ {(x, y) : y > x + 0.5}]
= ∫∫_{{(x,y): y>x+0.5}} fXY (x, y) dx dy
= ∫_{0.5}^{1} ∫_{0}^{y−0.5} 2 dx dy
= ∫_{0.5}^{1} 2(y − 0.5) dy
= 2 (y²/2) |_{0.5}^{1} − y |_{0.5}^{1} = 1 − 0.25 − 1 + 0.5 = 0.25
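A Monte Carlo sanity check (not in the original slides): the pair (min, max) of two iid U (0, 1) draws has exactly the joint density 2 on {0 < x < y < 1} (this fact reappears later in the slides), so it can be used to sample from this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# (X, Y) = (min, max) of two iid U(0,1) draws has joint density 2 on {0 < x < y < 1}.
u = rng.uniform(size=(1_000_000, 2))
x, y = u.min(axis=1), u.max(axis=1)

print(np.mean(y > x + 0.5))    # close to the exact value 0.25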
P S Sastry, IISc, E1 222 Aug 2021 31/248
◮ We can look at it geometrically
(Figure: the triangle 0 < x < y < 1, with the smaller triangle where y > x + 0.5 shaded.)
◮ The probability of the event we want is the area of the
small triangle divided by that of the big triangle.
P S Sastry, IISc, E1 222 Aug 2021 32/248
Marginal Distributions
◮ Let X, Y be random variables with joint distribution
function FXY .
◮ We know FXY (x, y) = P [X ≤ x, Y ≤ y].
◮ Hence
FXY (x, ∞) = P [X ≤ x, Y ≤ ∞] = P [X ≤ x] = FX (x)
◮ We define the marginal distribution functions of X, Y by
FX (x) = FXY (x, ∞); FY (y) = FXY (∞, y)
◮ These are simply distribution functions of X and Y
obtained from the joint distribution function.
P S Sastry, IISc, E1 222 Aug 2021 33/248
Marginal mass functions
◮ Let X ∈ {x1 , x2 , · · · } and Y ∈ {y1 , y2 , · · · }
◮ Let fXY be their joint mass function.
◮ Then
P [X = xi ] = Σj P [X = xi , Y = yj ] = Σj fXY (xi , yj )
(This is because the events [Y = yj ], j = 1, · · · , form a partition, and P (A) = Σi P (ABi ) when {Bi } is a partition)
◮ We define the marginal mass functions of X and Y as
fX (xi ) = Σj fXY (xi , yj );   fY (yj ) = Σi fXY (xi , yj )
◮ These are mass functions of X and Y obtained from the
joint mass function
P S Sastry, IISc, E1 222 Aug 2021 34/248
marginal density functions
◮ Let X, Y be continuous rv’s with joint density fXY .
◮ Then we know FXY (x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} fXY (x′ , y′ ) dy′ dx′
◮ Hence, we have
FX (x) = FXY (x, ∞) = ∫_{−∞}^{x} ∫_{−∞}^{∞} fXY (x′ , y′ ) dy′ dx′
= ∫_{−∞}^{x} ( ∫_{−∞}^{∞} fXY (x′ , y′ ) dy′ ) dx′
◮ Since X is a continuous rv, this means
fX (x) = ∫_{−∞}^{∞} fXY (x, y) dy
We call this the marginal density of X.
◮ Similarly, marginal density of Y is
fY (y) = ∫_{−∞}^{∞} fXY (x, y) dx
◮ These are pdf’s of X and Y obtained from the joint density.
P S Sastry, IISc, E1 222 Aug 2021 35/248
Example
◮ Rolling two dice, X is max, Y is sum
◮ We had, for 1 ≤ m ≤ 6 and 2 ≤ n ≤ 12,
fXY (m, n) = 2/36, if m < n < 2m;   = 1/36, if n = 2m
◮ We know fX (m) = Σn fXY (m, n), m = 1, · · · , 6.
◮ Given m, for what values of n, fXY (m, n) > 0 ?
We can only have n = m + 1, · · · , 2m.
◮ Hence we get
fX (m) = Σ_{n=m+1}^{2m} fXY (m, n) = Σ_{n=m+1}^{2m−1} (2/36) + 1/36 = (2/36)(m − 1) + 1/36 = (2m − 1)/36
P S Sastry, IISc, E1 222 Aug 2021 36/248
Example
◮ Consider the joint density
fXY (x, y) = 2, 0 < x < y < 1
◮ The marginal density of X is: for 0 < x < 1,
fX (x) = ∫_{−∞}^{∞} fXY (x, y) dy = ∫_{x}^{1} 2 dy = 2(1 − x)
Thus, fX (x) = 2(1 − x), 0 < x < 1
◮ We can easily verify this is a density
∫_{−∞}^{∞} fX (x) dx = ∫_{0}^{1} 2(1 − x) dx = (2x − x²) |_{0}^{1} = 1
P S Sastry, IISc, E1 222 Aug 2021 37/248
We have: fXY (x, y) = 2, 0 < x < y < 1
◮ We can similarly find density of Y .
◮ For 0 < y < 1,
fY (y) = ∫_{−∞}^{∞} fXY (x, y) dx = ∫_{0}^{y} 2 dx = 2y
◮ Thus, fY (y) = 2y, 0 < y < 1 and
∫_{0}^{1} 2y dy = 2 (y²/2) |_{0}^{1} = 1
P S Sastry, IISc, E1 222 Aug 2021 38/248
◮ If we are given the joint df or joint pmf/joint density of
X, Y , then the individual df or pmf/pdf are uniquely
determined.
◮ However, given individual pdf of X and Y , we cannot
determine the joint density. (same is true of pmf or df)
◮ There can be many different joint density functions all
having the same marginals
P S Sastry, IISc, E1 222 Aug 2021 39/248
Conditional distributions
◮ Let X, Y be rv’s on the same probability space
◮ We define the conditional distribution of X given Y by
FX|Y (x|y) = P [X ≤ x|Y = y]
(For now ignore the case of P [Y = y] = 0).
◮ Note that FX|Y : ℜ2 → ℜ
◮ FX|Y (x|y) is a notation. We could write FX|Y (x, y).
P S Sastry, IISc, E1 222 Aug 2021 40/248
◮ Conditional distribution of X given Y is
FX|Y (x|y) = P [X ≤ x|Y = y]
It is the conditional probability of [X ≤ x] given (or
conditioned on) [Y = y].
◮ Consider example: rolling 2 dice, X is max, Y is sum
P [X ≤ 4|Y = 3] = 1; P [X ≤ 4|Y = 9] = 0
◮ This is what conditional distribution captures.
◮ For every value of y, FX|Y (x|y) is a distribution function
in the variable x.
◮ It defines a new distribution for X based on knowing the
value of Y .
P S Sastry, IISc, E1 222 Aug 2021 41/248
◮ Let: X ∈ {x1 , x2 , · · · } and Y ∈ {y1 , y2 , · · · }. Then
FX|Y (x|yj ) = P [X ≤ x|Y = yj ] = P [X ≤ x, Y = yj ] / P [Y = yj ]
(We define FX|Y (x|y) only when y = yj for some j).
◮ For each yj , FX|Y (x|yj ) is a df of a discrete rv in x.
◮ Since X is a discrete rv, we can write the above as
FX|Y (x|yj ) = P [X ≤ x, Y = yj ] / P [Y = yj ] = ( Σ_{i: xi ≤x} P [X = xi , Y = yj ] ) / P [Y = yj ]
= Σ_{i: xi ≤x} P [X = xi , Y = yj ] / P [Y = yj ]
= Σ_{i: xi ≤x} fXY (xi , yj ) / fY (yj )
P S Sastry, IISc, E1 222 Aug 2021 42/248
Conditional mass function
◮ We got
FX|Y (x|yj ) = Σ_{i: xi ≤x} fXY (xi , yj ) / fY (yj )
◮ We define the conditional mass function of X given Y as
fX|Y (xi |yj ) = fXY (xi , yj ) / fY (yj ) = P [X = xi |Y = yj ]
◮ Note that
Σi fX|Y (xi |yj ) = 1, ∀yj ;   and   FX|Y (x|yj ) = Σ_{i: xi ≤x} fX|Y (xi |yj )
P S Sastry, IISc, E1 222 Aug 2021 43/248
Example: Conditional pmf
◮ Consider the random experiment of tossing a coin (with probability p of heads) n times.
◮ Let X denote the number of heads and let Y denote the
toss number on which the first head comes.
◮ For 1 ≤ k ≤ n,
fY |X (k|1) = P [Y = k|X = 1] = P [Y = k, X = 1] / P [X = 1]
= p(1 − p)^{n−1} / ( nC1 p(1 − p)^{n−1} )
= 1/n
◮ Given there is only one head, it is equally likely to occur
on any toss.
P S Sastry, IISc, E1 222 Aug 2021 44/248
◮ The conditional mass function is
fX|Y (xi |yj ) = P [X = xi |Y = yj ] = fXY (xi , yj ) / fY (yj )
◮ This gives us the useful identity
fXY (xi , yj ) = fX|Y (xi |yj )fY (yj )
( P [X = xi , Y = yj ] = P [X = xi |Y = yj ]P [Y = yj ])
◮ This gives us the total probability rule for discrete rv’s:
fX (xi ) = Σj fXY (xi , yj ) = Σj fX|Y (xi |yj )fY (yj )
◮ This is the same as
P [X = xi ] = Σj P [X = xi |Y = yj ]P [Y = yj ]
(P (A) = Σj P (A|Bj )P (Bj ) when B1 , · · · form a partition)
P S Sastry, IISc, E1 222 Aug 2021 45/248
Bayes Rule for discrete Random Variable
◮ We have
fXY (xi , yj ) = fX|Y (xi |yj )fY (yj ) = fY |X (yj |xi )fX (xi )
◮ This gives us Bayes rule for discrete rv’s
fX|Y (xi |yj ) = fY |X (yj |xi )fX (xi ) / fY (yj )
= fY |X (yj |xi )fX (xi ) / ( Σi fXY (xi , yj ) )
= fY |X (yj |xi )fX (xi ) / ( Σi fY |X (yj |xi )fX (xi ) )
P S Sastry, IISc, E1 222 Aug 2021 46/248
◮ Let X, Y be continuous rv’s with joint density, fXY .
◮ We once again want to define conditional df
FX|Y (x|y) = P [X ≤ x|Y = y]
◮ But the conditioning event, [Y = y] has zero probability.
◮ Hence we define conditional df as follows
FX|Y (x|y) = lim P [X ≤ x|Y ∈ [y, y + δ]]
δ↓0
◮ This is well defined if the limit exists.
◮ The limit exists for all y where fY (y) > 0 (and for all x)
P S Sastry, IISc, E1 222 Aug 2021 47/248
◮ The conditional df is given by (assuming fY (y) > 0)
FX|Y (x|y) = lim_{δ↓0} P [X ≤ x|Y ∈ [y, y + δ]]
= lim_{δ↓0} P [X ≤ x, Y ∈ [y, y + δ]] / P [Y ∈ [y, y + δ]]
= lim_{δ↓0} ( ∫_{−∞}^{x} ∫_{y}^{y+δ} fXY (x′ , y′ ) dy′ dx′ ) / ( ∫_{y}^{y+δ} fY (y′ ) dy′ )
= lim_{δ↓0} ( ∫_{−∞}^{x} fXY (x′ , y) δ dx′ + o(δ) ) / ( fY (y) δ + o(δ) )
= ∫_{−∞}^{x} ( fXY (x′ , y) / fY (y) ) dx′
◮ We define the conditional density of X given Y as
fX|Y (x|y) = fXY (x, y) / fY (y)
P S Sastry, IISc, E1 222 Aug 2021 48/248
◮ Let X, Y have joint density fXY .
◮ The conditional df of X given Y is
FX|Y (x|y) = lim_{δ↓0} P [X ≤ x|Y ∈ [y, y + δ]]
◮ This exists if fY (y) > 0 and then it has a density:
FX|Y (x|y) = ∫_{−∞}^{x} ( fXY (x′ , y) / fY (y) ) dx′ = ∫_{−∞}^{x} fX|Y (x′ |y) dx′
◮ This conditional density is given by
fX|Y (x|y) = fXY (x, y) / fY (y)
◮ We (once again) have the useful identity
fXY (x, y) = fX|Y (x|y) fY (y) = fY |X (y|x)fX (x)
P S Sastry, IISc, E1 222 Aug 2021 49/248
Example
fXY (x, y) = 2, 0 < x < y < 1
◮ We saw that the marginal densities are
fX (x) = 2(1 − x), 0 < x < 1; fY (y) = 2y, 0 < y < 1
◮ Hence the conditional densities are given by
fX|Y (x|y) = fXY (x, y) / fY (y) = 1/y,   0 < x < y < 1
fY |X (y|x) = fXY (x, y) / fX (x) = 1/(1 − x),   0 < x < y < 1
◮ We can see this intuitively
Conditioned on Y = y, X is uniform over (0, y).
Conditioned on X = x, Y is uniform over (x, 1).
P S Sastry, IISc, E1 222 Aug 2021 50/248
◮ The identity fXY (x, y) = fX|Y (x|y)fY (y) can be used to
specify the joint density of two continuous rv’s
◮ We can specify the marginal density of one and the
conditional density of the other given the first.
◮ This may actually be the model of how the rv’s are generated.
P S Sastry, IISc, E1 222 Aug 2021 51/248
Example
◮ Let X be uniform over (0, 1) and, given X, let Y be uniform over (0, X). Find the density of Y .
◮ What we are given is
fX (x) = 1, 0 < x < 1;   fY |X (y|x) = 1/x, 0 < y < x < 1
◮ Hence the joint density is: fXY (x, y) = 1/x, 0 < y < x < 1.
◮ Hence the density of Y is
fY (y) = ∫_{−∞}^{∞} fXY (x, y) dx = ∫_{y}^{1} (1/x) dx = − ln(y), 0 < y < 1
◮ We can verify it to be a density
∫_{0}^{1} − ln(y) dy = −y ln(y) |_{0}^{1} + ∫_{0}^{1} y (1/y) dy = 1
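A simulation of this two-stage model (not in the original slides) reproduces the − ln(y) density; the sample size, bin count and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Generate (X, Y) exactly as modelled: X ~ U(0,1), then Y | X=x ~ U(0, x).
n = 1_000_000
x = rng.uniform(size=n)
y = rng.uniform(size=n) * x

# Histogram-based density estimate of Y versus the derived f_Y(y) = -ln(y).
hist, edges = np.histogram(y, bins=50, range=(0, 1), density=True)
mids = 0.5 * (edges[:-1] + edges[1:])
for i in [2, 10, 25, 40]:
    print(round(mids[i], 2), round(hist[i], 3), round(-np.log(mids[i]), 3))
```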
P S Sastry, IISc, E1 222 Aug 2021 52/248
◮ We have the identity
fXY (x, y) = fX|Y (x|y) fY (y)
◮ By integrating both sides
fX (x) = ∫_{−∞}^{∞} fXY (x, y) dy = ∫_{−∞}^{∞} fX|Y (x|y) fY (y) dy
◮ This is a continuous analogue of total probability rule.
◮ But note that, since X is continuous rv, fX (x) is NOT
P [X = x]
◮ In case of discrete rv, the mass function value fX (x) is
equal to P [X = x] and we had
fX (x) = Σy fX|Y (x|y)fY (y)
◮ It is as if one can simply replace pmf by pdf and
summation by integration!!
◮ While often that gives the right result, one needs to be
very careful
P S Sastry, IISc, E1 222 Aug 2021 53/248
◮ We have the identity
fXY (x, y) = fX|Y (x|y) fY (y) = fY |X (y|x)fX (x)
◮ This gives rise to Bayes rule for continuous rv
fX|Y (x|y) = fY |X (y|x)fX (x) / fY (y)
= fY |X (y|x)fX (x) / ( ∫_{−∞}^{∞} fY |X (y|x)fX (x) dx )
◮ This is essentially identical to Bayes rule for discrete rv’s.
We have essentially put the pdf wherever there was pmf
P S Sastry, IISc, E1 222 Aug 2021 54/248
◮ To recap, we started by defining conditional distribution
function.
FX|Y (x|y) = P [X ≤ x|Y = y]
◮ When X, Y are discrete, we define this only for y = yj .
That is, we define it only for all values that Y can take.
◮ When X, Y have joint density, we defined it by
FX|Y (x|y) = lim P [X ≤ x|Y ∈ [y, y + δ]]
δ↓0
This limit exists and FX|Y is well defined if fY (y) > 0.
That is, essentially again for all values that Y can take.
◮ In the discrete case, we define fX|Y as the pmf
corresponding to FX|Y . This conditional pmf can also be
defined as a conditional probability
◮ In the continuous case fX|Y is the density corresponding
to FX|Y .
◮ In both cases we have: fXY (x, y) = fX|Y (x|y)fY (y)
◮ This gives the total probability rule and Bayes rule for random variables.
P S Sastry, IISc, E1 222 Aug 2021 55/248
◮ Now, let X be a continuous rv and let Y be discrete rv.
◮ We can define FX|Y as
FX|Y (x|y) = P [X ≤ x|Y = y]
This is well defined for all values that y takes. (We
consider only those y)
◮ Since X is continuous rv, this df would have a density
FX|Y (x|y) = ∫_{−∞}^{x} fX|Y (x′ |y) dx′
◮ Hence we can write
P [X ≤ x, Y = y] = FX|Y (x|y)P [Y = y] = ∫_{−∞}^{x} fX|Y (x′ |y) fY (y) dx′
P S Sastry, IISc, E1 222 Aug 2021 56/248
◮ We now get
FX (x) = P [X ≤ x] = Σy P [X ≤ x, Y = y]
= Σy ∫_{−∞}^{x} fX|Y (x′ |y) fY (y) dx′
= ∫_{−∞}^{x} ( Σy fX|Y (x′ |y) fY (y) ) dx′
◮ This gives us
fX (x) = Σy fX|Y (x|y)fY (y)
◮ This is another version of total probability rule.
◮ Earlier we derived this when X, Y are discrete.
◮ The formula is true even when X is continuous
Only difference is we need to take fX as the density of X.
P S Sastry, IISc, E1 222 Aug 2021 57/248
◮ When X, Y are discrete we have
fX (x) = Σy fX|Y (x|y)fY (y)
◮ When X is continuous and Y is discrete, we defined
fX|Y (x|y) to be the density corresponding to
FX|Y (x|y) = P [X ≤ x|Y = y]
◮ Then we once again get
fX (x) = Σy fX|Y (x|y)fY (y)
However, now, fX is density (and not a mass function).
fX|Y is also a density now.
◮ Suppose Y ∈ {1, 2, 3} and fY (i) = λi .
Let fX|Y (x|i) = fi (x). Then
fX (x) = λ1 f1 (x) + λ2 f2 (x) + λ3 f3 (x)
Called a mixture density model
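The sketch below (not from the original slides) shows the standard way such a mixture is sampled: draw the discrete label first, then the continuous value given the label. The specific weights λi and the Gaussian component densities fi are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical instance: Y in {1,2,3} with pmf (0.2, 0.5, 0.3), and
# f_{X|Y}(x|i) taken to be N(mu_i, 1); f_X is then the 3-component mixture.
lam = np.array([0.2, 0.5, 0.3])
mu = np.array([-2.0, 0.0, 3.0])

n = 200_000
y = rng.choice([1, 2, 3], size=n, p=lam)     # draw the discrete label first
x = rng.normal(loc=mu[y - 1], scale=1.0)     # then X | Y=i ~ N(mu_i, 1)

# x now has density lam[0]*f1(x) + lam[1]*f2(x) + lam[2]*f3(x)
print(x.mean(), lam @ mu)                    # both approximate the mixture mean
```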
P S Sastry, IISc, E1 222 Aug 2021 58/248
◮ Continuing with X continuous rv and Y discrete. We
have
FX|Y (x|y) = P [X ≤ x|Y = y] = ∫_{−∞}^{x} fX|Y (x′ |y) dx′
◮ We also have
P [X ≤ x, Y = y] = ∫_{−∞}^{x} fX|Y (x′ |y) fY (y) dx′
◮ Hence we can define a ‘joint density’
fXY (x, y) = fX|Y (x|y)fY (y)
◮ This is a kind of mixed density and mass function.
◮ We will not be using such ‘joint densities’ here
P S Sastry, IISc, E1 222 Aug 2021 59/248
◮ Continuing with X continuous rv and Y discrete
◮ Can we define fY |X (y|x)?
◮ Since Y is discrete, this (conditional) mass function is
fY |X (y|x) = P [Y = y|X = x]
But the conditioning event has zero prob
We now know how to handle it
fY |X (y|x) = lim_{δ↓0} P [Y = y|X ∈ [x, x + δ]]
◮ For simplifying this we note the following:
P [X ≤ x, Y = y] = ∫_{−∞}^{x} fX|Y (x′ |y) fY (y) dx′
⇒ P [X ∈ [x, x + δ], Y = y] = ∫_{x}^{x+δ} fX|Y (x′ |y) fY (y) dx′
P S Sastry, IISc, E1 222 Aug 2021 60/248
◮ We have
fY |X (y|x) = lim_{δ↓0} P [Y = y|X ∈ [x, x + δ]]
= lim_{δ↓0} P [Y = y, X ∈ [x, x + δ]] / P [X ∈ [x, x + δ]]
= lim_{δ↓0} ( ∫_{x}^{x+δ} fX|Y (x′ |y) fY (y) dx′ ) / ( ∫_{x}^{x+δ} fX (x′ ) dx′ )
= fX|Y (x|y) fY (y) / fX (x)
⇒ fY |X (y|x)fX (x) = fX|Y (x|y) fY (y)
◮ This gives us further versions of total probability rule and
Bayes rule.
P S Sastry, IISc, E1 222 Aug 2021 61/248
◮ First let us look at the total probability rule possibilities
◮ When X is continuous rv and Y is discrete rv, we derived
fY |X (y|x)fX (x) = fX|Y (x|y) fY (y)
Note that fY is mass fn, fX is density and so on.
◮ Since fX|Y is a density (corresponding to FX|Y ),
∫_{−∞}^{∞} fX|Y (x|y) dx = 1
◮ Hence we get
fY (y) = ∫_{−∞}^{∞} fY |X (y|x)fX (x) dx
◮ Earlier we derived the same formula when X, Y have a
joint density.
P S Sastry, IISc, E1 222 Aug 2021 62/248
◮ Let us review all the total probability formulas
1. fX (x) = Σy fX|Y (x|y)fY (y)
◮ We first derived this when X, Y are discrete.
◮ But now we proved this holds when Y is discrete
If X is continuous the fX , fX|Y are densities; If X is also
discrete they are mass functions
2. fY (y) = ∫_{−∞}^{∞} fY |X (y|x)fX (x) dx
◮ We first proved it when X, Y have a joint density
We now know it holds also when X is cont and Y is
discrete. In that case fY is a mass function
P S Sastry, IISc, E1 222 Aug 2021 63/248
◮ When X is continuous rv and Y is discrete rv, we derived
fY |X (y|x)fX (x) = fX|Y (x|y) fY (y)
◮ This once again gives rise to Bayes rule:
fY |X (y|x) = fX|Y (x|y) fY (y) / fX (x);   fX|Y (x|y) = fY |X (y|x)fX (x) / fY (y)
◮ Earlier we showed this holds when X, Y are both discrete or both continuous.
◮ Thus Bayes rule holds in all four possible scenarios
◮ Only difference is we need to interpret fX or fX|Y as
mass functions when X is discrete and as densities when
X is a continuous rv
◮ In general, one refers to these always as densities since
the actual meaning would be clear from context.
P S Sastry, IISc, E1 222 Aug 2021 64/248
Example
◮ Consider a communication system. The transmitter puts out 0 or 5 volts for the bits 0 and 1, respectively, and the voltage measured by the receiver is the sent voltage plus noise added by the channel.
◮ We assume noise has Gaussian density with mean zero
and variance σ 2 .
◮ We want the probability that the sent bit is 1 when
measured voltage at the receiver is x (to decide what is
sent).
◮ Let X be the measured voltage and let Y be sent bit.
◮ We want to calculate fY |X (1|x).
◮ We want to use the Bayes rule to calculate this
P S Sastry, IISc, E1 222 Aug 2021 65/248
◮ We need fX|Y . What does our model say?
◮ fX|Y (x|1) is Gaussian with mean 5 and variance σ 2 and
fX|Y (x|0) is Gaussian with mean zero and variance σ 2
P [Y = 1|X = x] = fY |X (1|x) = fX|Y (x|1) fY (1) / fX (x)
◮ We need fY (1), fY (0). Let us take them to be same.
◮ In practice we only want to know whether
fY |X (1|x) > fY |X (0|x)
◮ Then we do not need to calculate fX (x).
We only need ratio of fY |X (1|x) and fY |X (0|x).
P S Sastry, IISc, E1 222 Aug 2021 66/248
◮ The ratio of the two probabilities is
fY |X (1|x) / fY |X (0|x) = ( fX|Y (x|1) fY (1) ) / ( fX|Y (x|0) fY (0) )
= ( (1/(σ√2π)) e^{−(x−5)²/(2σ²)} ) / ( (1/(σ√2π)) e^{−(x−0)²/(2σ²)} )
= e^{−0.5 σ^{−2} (x² − 10x + 25 − x²)}
= e^{0.5 σ^{−2} (10x − 25)}
◮ We are only interested in whether the above is greater
than 1 or not.
◮ The ratio is greater than 1 if 10x > 25 or x > 2.5
◮ So, if X > 2.5 we will conclude bit 1 is sent. Intuitively
obvious!
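A small Python sketch of this decision rule (not in the original slides): compute the posterior ratio via Bayes rule with equal priors, as on the slide; σ = 1 and the test points are arbitrary.

```python
import numpy as np

sigma = 1.0
p1 = p0 = 0.5          # prior fY(1) = fY(0), as assumed on the slide

def posterior_ratio(x):
    """fY|X(1|x) / fY|X(0|x) via Bayes rule; fX(x) cancels in the ratio."""
    like1 = np.exp(-(x - 5.0) ** 2 / (2 * sigma**2))
    like0 = np.exp(-(x - 0.0) ** 2 / (2 * sigma**2))
    return (like1 * p1) / (like0 * p0)

for x in [1.0, 2.4, 2.5, 2.6, 4.0]:
    print(x, posterior_ratio(x) > 1)   # decide '1' exactly when x > 2.5
```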
P S Sastry, IISc, E1 222 Aug 2021 67/248
◮ We did not calculate fX (x) in the above.
◮ We can calculate it if we want.
◮ Using total probability rule
fX (x) = Σy fX|Y (x|y)fY (y)
= fX|Y (x|1)fY (1) + fX|Y (x|0)fY (0)
= (1/2) (1/(σ√2π)) e^{−(x−5)²/(2σ²)} + (1/2) (1/(σ√2π)) e^{−x²/(2σ²)}
◮ It is a mixture density
P S Sastry, IISc, E1 222 Aug 2021 68/248
◮ As we saw, given the joint distribution we can calculate
all the marginals.
◮ However, there can be many joint distributions with the
same marginals.
◮ Let F1 , F2 be one dimensional df’s of continuous rv’s with
f1 , f2 being the corresponding densities.
Define a function f : ℜ2 → ℜ by
f (x, y) = f1 (x)f2 (y) [1 + α(2F1 (x) − 1)(2F2 (y) − 1)]
where α ∈ (−1, 1).
◮ First note that f (x, y) ≥ 0, ∀α ∈ (−1, 1).
For different α we get different functions.
◮ We first show that f (x, y) is a joint density.
◮ For this, we note the following
∫_{−∞}^{∞} f1 (x) F1 (x) dx = (F1 (x))²/2 |_{−∞}^{∞} = 1/2
P S Sastry, IISc, E1 222 Aug 2021 69/248
f (x, y) = f1 (x)f2 (y) [1 + α(2F1 (x) − 1)(2F2 (y) − 1)]
∫_{−∞}^{∞} ∫_{−∞}^{∞} f (x, y) dx dy = ∫_{−∞}^{∞} f1 (x) dx ∫_{−∞}^{∞} f2 (y) dy
+ α ∫_{−∞}^{∞} (2f1 (x)F1 (x) − f1 (x)) dx ∫_{−∞}^{∞} (2f2 (y)F2 (y) − f2 (y)) dy
= 1
because 2 ∫_{−∞}^{∞} f1 (x) F1 (x) dx = 1. This also shows
∫_{−∞}^{∞} f (x, y) dx = f2 (y);   ∫_{−∞}^{∞} f (x, y) dy = f1 (x)
P S Sastry, IISc, E1 222 Aug 2021 70/248
◮ Thus infinitely many joint distributions can all have the
same marginals.
◮ So, in general, the marginals cannot determine the joint
distribution.
◮ An important special case where this is possible is that of
independent random variables
P S Sastry, IISc, E1 222 Aug 2021 71/248
Independent Random Variables
◮ Two random variables X, Y are said to be independent if for all Borel sets B1 , B2 , the events [X ∈ B1 ] and [Y ∈ B2 ] are independent.
◮ If X, Y are independent then
P [X ∈ B1 , Y ∈ B2 ] = P [X ∈ B1 ] P [Y ∈ B2 ], ∀B1 , B2 ∈ B
◮ In particular
FXY (x, y) = P [X ≤ x, Y ≤ y] = P [X ≤ x]P [Y ≤ y] = FX (x) FY (y)
◮ Theorem: X, Y are independent if and only if
FXY (x, y) = FX (x)FY (y).
P S Sastry, IISc, E1 222 Aug 2021 72/248
◮ Suppose X, Y are independent discrete rv’s
fXY (x, y) = P [X = x, Y = y] = P [X = x]P [Y = y] = fX (x)fY (y)
The joint mass function is a product of marginals.
◮ Suppose fXY (x, y) = fX (x)fY (y). Then
FXY (x, y) = Σ_{xi ≤x, yj ≤y} fXY (xi , yj ) = Σ_{xi ≤x, yj ≤y} fX (xi )fY (yj )
= Σ_{xi ≤x} fX (xi ) Σ_{yj ≤y} fY (yj ) = FX (x)FY (y)
◮ So, X, Y are independent if and only if
fXY (x, y) = fX (x)fY (y)
P S Sastry, IISc, E1 222 Aug 2021 73/248
◮ Let X, Y be independent continuous rv
FXY (x, y) = FX (x)FY (y) = ∫_{−∞}^{x} fX (x′ ) dx′ ∫_{−∞}^{y} fY (y′ ) dy′
= ∫_{−∞}^{y} ∫_{−∞}^{x} ( fX (x′ )fY (y′ ) ) dx′ dy′
◮ This implies the joint density is the product of the marginals.
◮ Now, suppose fXY (x, y) = fX (x)fY (y). Then
FXY (x, y) = ∫_{−∞}^{y} ∫_{−∞}^{x} fXY (x′ , y′ ) dx′ dy′
= ∫_{−∞}^{y} ∫_{−∞}^{x} fX (x′ )fY (y′ ) dx′ dy′
= ∫_{−∞}^{x} fX (x′ ) dx′ ∫_{−∞}^{y} fY (y′ ) dy′ = FX (x)FY (y)
−∞ −∞
◮ So, X, Y are independent if and only if
fXY (x, y) = fX (x)fY (y)
P S Sastry, IISc, E1 222 Aug 2021 74/248
◮ Let X, Y be independent.
◮ Then P [X ∈ B1 |Y ∈ B2 ] = P [X ∈ B1 ].
◮ Hence, we get FX|Y (x|y) = FX (x).
◮ This also implies fX|Y (x|y) = fX (x).
◮ This is true for all the four possibilities of X, Y being
continuous/discrete.
P S Sastry, IISc, E1 222 Aug 2021 75/248
More than two rv
◮ Everything we have done so far is easily extended to
multiple random variables.
◮ Let X, Y, Z be rv on the same probability space.
◮ We define joint distribution function by
FXY Z (x, y, z) = P [X ≤ x, Y ≤ y, Z ≤ z]
◮ If all three are discrete then the joint mass function is
fXY Z (x, y, z) = P [X = x, Y = y, Z = z]
◮ If they are continuous, they have a joint density fXY Z if
FXY Z (x, y, z) = ∫_{−∞}^{z} ∫_{−∞}^{y} ∫_{−∞}^{x} fXY Z (x′ , y′ , z′ ) dx′ dy′ dz′
P S Sastry, IISc, E1 222 Aug 2021 76/248
◮ Easy to see that the joint mass function satisfies
1. fXY Z (x, y, z) ≥ 0 and is non-zero only for countably many tuples.
2. Σ_{x,y,z} fXY Z (x, y, z) = 1
◮ Similarly the joint density satisfies
1. fXY Z (x, y, z) ≥ 0
2. ∫_{−∞}^{∞} ∫_{−∞}^{∞} ∫_{−∞}^{∞} fXY Z (x, y, z) dx dy dz = 1
◮ These are straight-forward generalizations
◮ The properties of joint distribution function such as it
being non-decreasing in each argument etc are easily seen
to hold here too.
◮ Generalizing the special property of the df (relating to
probability of cylindrical sets) is a little more complicated.
◮ We specify multiple random variables either through joint
mass function or joint density function.
P S Sastry, IISc, E1 222 Aug 2021 77/248
◮ Now we get many different marginals:
FXY (x, y) = FXY Z (x, y, ∞); FZ (z) = FXY Z (∞, ∞, z) and so on
◮ Similarly we get
fY Z (y, z) = ∫_{−∞}^{∞} fXY Z (x, y, z) dx;   fX (x) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} fXY Z (x, y, z) dy dz
◮ Any marginal is a joint density of a subset of these rv’s
and we obtain it by integrating the (full) joint density
with respect to the remaining variables.
◮ We obtain the marginal mass functions for a subset of the
rv’s also similarly where we sum over the remaining
variables.
P S Sastry, IISc, E1 222 Aug 2021 78/248
◮ We have to be a little careful in dealing with these when
some random variables are discrete and others are
continuous.
◮ Suppose X is continuous and Y, Z are discrete. We do
not have any joint density or mass function as such.
◮ However, the joint df is always well defined.
◮ Suppose we want marginal joint distribution of X, Y . We
know how to get FXY by marginalization.
◮ Then we can get fX (a density), fY (a mass fn), fX|Y (a conditional density) and fY |X (a conditional mass fn)
◮ With these we can generally calculate most quantities of
interest.
P S Sastry, IISc, E1 222 Aug 2021 79/248
◮ Like in case of marginals, there are different types of
conditional distributions now.
◮ We can always define conditional distribution functions
like
FXY |Z (x, y|z) = P [X ≤ x, Y ≤ y|Z = z]
FX|Y Z (x|y, z) = P [X ≤ x|Y = y, Z = z]
◮ In all such cases, if the conditioning random variables are
continuous, we define the above as a limit.
◮ For example when Z is continuous
FXY |Z (x, y|z) = lim P [X ≤ x, Y ≤ y|Z ∈ [z, z + δ]]
δ↓0
P S Sastry, IISc, E1 222 Aug 2021 80/248
◮ If X, Y, Z are all discrete then, all conditional mass
functions are defined by appropriate conditional
probabilities. For example,
fX|Y Z (x|y, z) = P [X = x|Y = y, Z = z]
◮ Thus the following are obvious:
fXY |Z (x, y|z) = fXY Z (x, y, z) / fZ (z)
fX|Y Z (x|y, z) = fXY Z (x, y, z) / fY Z (y, z)
fXY Z (x, y, z) = fZ|Y X (z|y, x)fY |X (y|x)fX (x)
◮ For example, the first one above follows from
P [X = x, Y = y|Z = z] = P [X = x, Y = y, Z = z] / P [Z = z]
P S Sastry, IISc, E1 222 Aug 2021 81/248
◮ When X, Y, Z have joint density, all such relations hold
for the appropriate (conditional) densities. For example,
FZ|XY (z|x, y) = lim_{δ↓0} P [Z ≤ z, X ∈ [x, x + δ], Y ∈ [y, y + δ]] / P [X ∈ [x, x + δ], Y ∈ [y, y + δ]]
= lim_{δ↓0} ( ∫_{−∞}^{z} ∫_{x}^{x+δ} ∫_{y}^{y+δ} fXY Z (x′ , y′ , z′ ) dy′ dx′ dz′ ) / ( ∫_{x}^{x+δ} ∫_{y}^{y+δ} fXY (x′ , y′ ) dy′ dx′ )
= ∫_{−∞}^{z} ( fXY Z (x, y, z′ ) / fXY (x, y) ) dz′ = ∫_{−∞}^{z} fZ|XY (z′ |x, y) dz′
◮ Thus we get
fXY Z (x, y, z) = fZ|XY (z|x, y)fXY (x, y) = fZ|XY (z|x, y)fY |X (y|x)fX (x)
P S Sastry, IISc, E1 222 Aug 2021 82/248
◮ We can similarly talk about the joint distribution of any
finite number of rv’s
◮ Let X1 , X2 , · · · , Xn be rv’s on the same probability space.
◮ We denote it as a vector X or X. We can think of it as a
mapping, X : Ω → ℜn .
◮ We can write the joint distribution as
FX (x) = P [X ≤ x] = P [Xi ≤ xi , i = 1, · · · , n]
◮ We represent by fX (x) the joint density or mass function.
Sometimes we also write it as fX1 ···Xn (x1 , · · · , xn )
◮ We use similar notation for marginal and conditional
distributions
P S Sastry, IISc, E1 222 Aug 2021 83/248
Independence of multiple random variables
◮ Random variables X1 , X2 , · · · , Xn are said to be independent if the events [Xi ∈ Bi ], i = 1, · · · , n, are independent for all Borel sets Bi .
(Recall definition of independence of a set of events)
◮ Independence implies that the marginals would determine
the joint distribution.
◮ If X, Y, Z are independent then
fXY Z (x, y, z) = fX (x)fY (y)fZ (z)
◮ For independent random variables, the joint mass
function (or density function) is product of individual
mass functions (or density functions)
P S Sastry, IISc, E1 222 Aug 2021 84/248
Example
◮ Let a joint density be given by
fXY Z (x, y, z) = K, 0<z<y<x<1
First let us determine K.
∫_{−∞}^{∞} ∫_{−∞}^{∞} ∫_{−∞}^{∞} fXY Z (x, y, z) dz dy dx = ∫_{0}^{1} ∫_{0}^{x} ∫_{0}^{y} K dz dy dx
= K ∫_{0}^{1} ∫_{0}^{x} y dy dx
= K ∫_{0}^{1} (x²/2) dx
= K/6   ⇒ K = 6
P S Sastry, IISc, E1 222 Aug 2021 85/248
fXY Z (x, y, z) = 6, 0<z<y<x<1
◮ Suppose we want to find the (marginal) joint distribution
of X and Z.
fXZ (x, z) = ∫_{−∞}^{∞} fXY Z (x, y, z) dy = ∫_{z}^{x} 6 dy,   0 < z < x < 1
= 6(x − z),   0 < z < x < 1
P S Sastry, IISc, E1 222 Aug 2021 86/248
◮ We got the joint density as
fXZ (x, z) = 6(x − z), 0<z<x<1
◮ We can verify this is a joint density:
∫_{−∞}^{∞} ∫_{−∞}^{∞} fXZ (x, z) dz dx = ∫_{0}^{1} ∫_{0}^{x} 6(x − z) dz dx
= ∫_{0}^{1} ( 6x z|_{0}^{x} − 6 (z²/2)|_{0}^{x} ) dx
= ∫_{0}^{1} ( 6x² − 6 x²/2 ) dx
= 3 (x³/3) |_{0}^{1} = 1
P S Sastry, IISc, E1 222 Aug 2021 87/248
◮ The joint density of X, Y, Z is
fXY Z (x, y, z) = 6, 0<z<y<x<1
◮ The joint density of X, Z is
fXZ (x, z) = 6(x − z), 0<z<x<1
◮ Hence,
fY |XZ (y|x, z) = fXY Z (x, y, z) / fXZ (x, z) = 1/(x − z),   0 < z < y < x < 1
P S Sastry, IISc, E1 222 Aug 2021 88/248
Functions of multiple random variables
◮ Let X, Y be random variables on the same probability
space.
◮ Let g : ℜ2 → ℜ.
◮ Let Z = g(X, Y ). Then Z is a rv
◮ This is analogous to functions of a single rv
(Figure: the pair (X, Y ) maps the sample space Ω to ℜ2 and g maps ℜ2 to ℜ; Z = g(X, Y ) is the composition, with B ⊂ ℜ pulled back to B′ ⊂ ℜ2 .)
P S Sastry, IISc, E1 222 Aug 2021 89/248
◮ let Z = g(X, Y )
◮ We can determine distribution of Z from the joint
distribution of X, Y
FZ (z) = P [Z ≤ z] = P [g(X, Y ) ≤ z]
◮ For example, if X, Y are discrete, then
fZ (z) = P [Z = z] = P [g(X, Y ) = z] = Σ_{xi ,yj : g(xi ,yj )=z} fXY (xi , yj )
P S Sastry, IISc, E1 222 Aug 2021 90/248
◮ Let X, Y be discrete rv’s. Let Z = min(X, Y ).
fZ (z) = P [min(X, Y ) = z]
= P [X = z, Y > z] + P [Y = z, X > z] + P [X = Y = z]
= Σ_{y>z} P [X = z, Y = y] + Σ_{x>z} P [X = x, Y = z] + P [X = z, Y = z]
= Σ_{y>z} fXY (z, y) + Σ_{x>z} fXY (x, z) + fXY (z, z)
◮ Now suppose X, Y are independent and both of them
have geometric distribution with the same parameter, p.
◮ Such random variables are called independent and
identically distributed or iid random variables.
P S Sastry, IISc, E1 222 Aug 2021 91/248
◮ Now we can get the pmf of Z as (note Z ∈ {1, 2, · · · })
fZ (z) = P [X = z, Y > z] + P [Y = z, X > z] + P [X = Y = z]
= P [X = z]P [Y > z] + P [Y = z]P [X > z] + P [X = z]P [Y = z]
= 2 p(1 − p)^{z−1} (1 − p)^{z} + ( p(1 − p)^{z−1} )²
= 2p(1 − p)^{2z−1} + p²(1 − p)^{2z−2}
= p(1 − p)^{2z−2} (2(1 − p) + p)
= (2 − p)p(1 − p)^{2z−2}
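A quick simulation check of this pmf (not in the original slides); the value of p, the sample size and the seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 0.3, 1_000_000

# X, Y iid geometric on {1,2,...} with parameter p; Z = min(X, Y).
x = rng.geometric(p, size=n)
y = rng.geometric(p, size=n)
z = np.minimum(x, y)

for k in [1, 2, 3]:
    empirical = np.mean(z == k)
    formula = (2 - p) * p * (1 - p) ** (2 * k - 2)
    print(k, round(empirical, 4), round(formula, 4))
```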
P S Sastry, IISc, E1 222 Aug 2021 92/248
◮ We can show this is a pmf:
Σ_{z=1}^{∞} fZ (z) = Σ_{z=1}^{∞} (2 − p)p(1 − p)^{2z−2}
= (2 − p)p Σ_{z=1}^{∞} (1 − p)^{2z−2}
= (2 − p)p · 1/(1 − (1 − p)²)
= (2 − p)p · 1/(2p − p²) = 1
P S Sastry, IISc, E1 222 Aug 2021 93/248
◮ Let us consider the max and min functions, in general.
◮ Let Z = max(X, Y ). Then we have
FZ (z) = P [Z ≤ z] = P [max(X, Y ) ≤ z]
= P [X ≤ z, Y ≤ z]
= FXY (z, z)
= FX (z)FY (z), if X, Y are independent
= (FX (z))2 , if they are iid
◮ This is true of all random variables.
◮ Suppose X, Y are iid continuous rv. Then density of Z is
fZ (z) = 2FX (z)fX (z)
P S Sastry, IISc, E1 222 Aug 2021 94/248
◮ Suppose X, Y are iid uniform over (0, 1)
◮ Then we get df and pdf of Z = max(X, Y ) as
FZ (z) = z 2 , 0 < z < 1; and fZ (z) = 2z, 0 < z < 1
FZ (z) = 0 for z ≤ 0 and FZ (z) = 1 for z ≥ 1 and
fZ (z) = 0 outside (0, 1)
P S Sastry, IISc, E1 222 Aug 2021 95/248
◮ This is easily generalized to n random variables.
◮ Let Z = max(X1 , · · · , Xn )
FZ (z) = P [Z ≤ z] = P [max(X1 , X2 , · · · , Xn ) ≤ z]
= P [X1 ≤ z, X2 ≤ z, · · · , Xn ≤ z]
= FX1 ···Xn (z, · · · , z)
= FX1 (z) · · · FXn (z), if they are independent
= (FX (z))n , if they are iid
where we take FX as the common df
◮ For example if all Xi are uniform over (0, 1) and ind, then
FZ (z) = z n , 0 < z < 1
P S Sastry, IISc, E1 222 Aug 2021 96/248
◮ Consider Z = min(X, Y ) and X, Y independent
FZ (z) = P [Z ≤ z] = P [min(X, Y ) ≤ z]
◮ It is difficult to write this in terms of joint df of X, Y .
◮ So, we consider the following
P [Z > z] = P [min(X, Y ) > z]
= P [X > z, Y > z]
= P [X > z]P [Y > z], using independence
= (1 − FX (z))(1 − FY (z))
= (1 − FX (z))2 , if they are iid
Hence, FZ (z) = 1 − (1 − FX (z))(1 − FY (z))
◮ We can once again find density of Z if X, Y are
continuous
P S Sastry, IISc, E1 222 Aug 2021 97/248
◮ Suppose X, Y are iid uniform (0, 1).
◮ Z = min(X, Y )
FZ (z) = 1 − (1 − FX (z))2 = 1 − (1 − z)2 , 0 < z < 1
◮ We get the density of Z as
fZ (z) = 2(1 − z), 0 < z < 1
P S Sastry, IISc, E1 222 Aug 2021 98/248
◮ min fn is also easily generalized to n random variables
◮ Let Z = min(X1 , X2 , · · · , Xn )
P [Z > z] = P [min(X1 , X2 , · · · , Xn ) > z]
= P [X1 > z, · · · , Xn > z]
= P [X1 > z] · · · P [Xn > z], using independence
= (1 − FX1 (z)) · · · (1 − FXn (z))
= (1 − FX (z))n , if they are iid
◮ Hence, when Xi are iid, the df of Z is
FZ (z) = 1 − (1 − FX (z))n
where FX is the common df
P S Sastry, IISc, E1 222 Aug 2021 99/248
Joint distribution of max and min
◮ X, Y iid with df F and density f
Z = max(X, Y ) and W = min(X, Y ).
◮ We want joint distribution function of Z and W .
◮ We can use the following
P [Z ≤ z] = P [Z ≤ z, W ≤ w] + P [Z ≤ z, W > w]
P [Z ≤ z, W > w] = P [w < X, Y ≤ z] = (F (z) − F (w))2
P [Z ≤ z] = P [X ≤ z, Y ≤ z] = (F (z))2
◮ So, we get FZW as
FZW (z, w) = P [Z ≤ z, W ≤ w]
= P [Z ≤ z] − P [Z ≤ z, W > w]
= (F (z))2 − (F (z) − F (w))2
◮ Is this correct for all values of z, w?
P S Sastry, IISc, E1 222 Aug 2021 100/248
◮ We have P [w < X, Y ≤ z] = (F (z) − F (w))2 only when
w ≤ z.
◮ Otherwise it is zero.
◮ Hence we get FZW as
FZW (z, w) = (F (z))²,   if w > z
           = (F (z))² − (F (z) − F (w))²,   if w ≤ z
◮ We can get the joint density of Z, W as
fZW (z, w) = ∂²FZW (z, w) / ∂z ∂w = 2f (z)f (w),   w ≤ z
P S Sastry, IISc, E1 222 Aug 2021 101/248
◮ Let X, Y be iid uniform over (0, 1).
◮ Define Z = max(X, Y ) and W = min(X, Y ).
◮ Then the joint density of Z, W is
fZW (z, w) = 2f (z)f (w), w ≤ z
= 2, 0 < w ≤ z < 1
P S Sastry, IISc, E1 222 Aug 2021 102/248
Order Statistics
◮ Let X1 , · · · , Xn be iid with density f .
◮ Let X(k) denote the k th smallest of these.
◮ That is, X(k) = gk (X1 , · · · , Xn ) where gk : ℜn → ℜ and
the value of gk (x1 , · · · , xn ) is the k th smallest of the
numbers x1 , · · · , xn .
◮ X(1) = min(X1 , · · · , Xn ), X(n) = max(X1 , · · · , Xn )
◮ The joint distribution of X(1) , · · · X(n) is called the order
statistics.
◮ Earlier, we calculated the order statistics for the case
n = 2.
◮ It can be shown that
fX(1) ···X(n) (x1 , · · · , xn ) = n! Π_{i=1}^{n} f (xi ),   x1 < x2 < · · · < xn
P S Sastry, IISc, E1 222 Aug 2021 103/248
Marginal distributions of X(k)
◮ Let X1 , · · · , Xn be iid with df F and density f .
◮ Let X(k) denote the k th smallest of these.
◮ We want the distribution of X(k) .
◮ The event [X(k) ≤ y] is:
“at least k of these are less than or equal to y”
◮ We want probability of this event.
P S Sastry, IISc, E1 222 Aug 2021 104/248
Marginal distributions of X(k)
◮ X1 , · · · , Xn iid with df F and density f .
◮ P [Xi ≤ y] = F (y) for any i and y.
◮ Since they are independent, we have, e.g.,
P [X1 ≤ y, X2 > y, X3 ≤ y] = (F (y))2 (1 − F (y))
◮ Hence, probability that exactly k of these n random
variables are less than or equal to y is
nCk (F (y))^k (1 − F (y))^{n−k}
◮ Hence we get
FX(k) (y) = Σ_{j=k}^{n} nCj (F (y))^j (1 − F (y))^{n−j}
We can get the density by differentiating this.
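A numerical check of this formula (not in the original slides) for iid U (0, 1) samples, where F (y) = y; the values of n, k, y and the sample size are arbitrary choices, and scipy is used only for the binomial pmf terms.

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(0)
n, k, y = 5, 3, 0.4          # n iid U(0,1) rv's, k-th smallest, threshold y

# Empirical df of X_(k) at y ...
samples = np.sort(rng.uniform(size=(200_000, n)), axis=1)
empirical = np.mean(samples[:, k - 1] <= y)

# ... versus the formula: sum_{j>=k} C(n,j) F(y)^j (1-F(y))^(n-j), with F(y) = y.
formula = sum(binom.pmf(j, n, y) for j in range(k, n + 1))
print(empirical, formula)
```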
P S Sastry, IISc, E1 222 Aug 2021 105/248
Sum of two discrete rv’s
◮ Let X, Y ∈ {0, 1, · · · }
◮ Let Z = X + Y . Then we have
fZ (z) = P [X + Y = z] = Σ_{x,y: x+y=z} P [X = x, Y = y]
= Σ_{k=0}^{z} P [X = k, Y = z − k]
= Σ_{k=0}^{z} fXY (k, z − k)
◮ Now suppose X, Y are independent. Then
fZ (z) = Σ_{k=0}^{z} fX (k)fY (z − k)
P S Sastry, IISc, E1 222 Aug 2021 106/248
◮ Now suppose X, Y are independent Poisson with
parameters λ1 , λ2 . And, Z = X + Y .
fZ (z) = Σ_{k=0}^{z} fX (k)fY (z − k)
= Σ_{k=0}^{z} ( λ1^k e^{−λ1} / k! ) ( λ2^{z−k} e^{−λ2} / (z − k)! )
= e^{−(λ1 +λ2 )} (1/z!) Σ_{k=0}^{z} ( z! / (k! (z − k)!) ) λ1^k λ2^{z−k}
= e^{−(λ1 +λ2 )} (λ1 + λ2 )^z / z!
◮ Z is Poisson with parameter λ1 + λ2
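A simulation sanity check (not in the original slides): the empirical pmf of X + Y matches Poisson(λ1 + λ2 ); the parameter values, sample size and seed are arbitrary.

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(0)
lam1, lam2, n = 2.0, 3.0, 1_000_000

z = rng.poisson(lam1, n) + rng.poisson(lam2, n)   # X + Y with X, Y independent

# Compare a few empirical pmf values with Poisson(lam1 + lam2).
for k in [0, 3, 5, 8]:
    print(k, round(np.mean(z == k), 4), round(poisson.pmf(k, lam1 + lam2), 4))
```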
P S Sastry, IISc, E1 222 Aug 2021 107/248
Sum of two continuous rv
◮ Let X, Y have a joint density fXY . Let Z = X + Y
FZ (z) = P [Z ≤ z] = P [X + Y ≤ z]
= ∫∫_{{(x,y): x+y≤z}} fXY (x, y) dy dx
= ∫_{x=−∞}^{∞} ∫_{y=−∞}^{z−x} fXY (x, y) dy dx
(change variable y to t: t = x + y, dt = dy; y = z − x ⇒ t = z)
= ∫_{x=−∞}^{∞} ∫_{t=−∞}^{z} fXY (x, t − x) dt dx
= ∫_{−∞}^{z} ( ∫_{−∞}^{∞} fXY (x, t − x) dx ) dt
◮ This gives us the density of Z.
P S Sastry, IISc, E1 222 Aug 2021 108/248
◮ X, Y have joint density fXY and Z = X + Y . Then
fZ (z) = ∫_{−∞}^{∞} fXY (x, z − x) dx
◮ Now suppose X and Y are independent. Then
fZ (z) = ∫_{−∞}^{∞} fX (x) fY (z − x) dx
The density of a sum of independent random variables is the convolution of their densities:
fX+Y = fX ∗ fY (convolution)
P S Sastry, IISc, E1 222 Aug 2021 109/248
Distribution of sum of iid uniform rv’s
◮ Suppose X, Y are iid uniform over (−1, 1).
◮ Let Z = X + Y . We want fZ .
◮ The common density of X and Y is fX (x) = 0.5 for −1 < x < 1 (and zero outside).
◮ fZ is the convolution of this density with itself.
P S Sastry, IISc, E1 222 Aug 2021 110/248
◮ fX (x) = 0.5, −1 < x < 1; fY is the same.
◮ Note that Z takes values in [−2, 2].
fZ (z) = ∫_{−∞}^{∞} fX (t) fY (z − t) dt
◮ For the integrand to be non-zero we need:
◮ −1 < t < 1 ⇒ t < 1, t > −1
◮ −1 < z − t < 1 ⇒ t < z + 1, t > z − 1
◮ Hence we need: max(−1, z − 1) < t < min(1, z + 1)
◮ Hence, for z < 0 we need −1 < t < z + 1, and for z ≥ 0 we need z − 1 < t < 1
◮ Thus we get
fZ (z) = ∫_{−1}^{z+1} (1/4) dt = (z + 2)/4,   if −2 ≤ z < 0
      = ∫_{z−1}^{1} (1/4) dt = (2 − z)/4,   if 0 ≤ z ≤ 2
P S Sastry, IISc, E1 222 Aug 2021 111/248
◮ Thus, the density of the sum of two independent rv’s that are uniform over (−1, 1) is
fZ (z) = (z + 2)/4, if −2 < z < 0;   = (2 − z)/4, if 0 < z < 2
◮ This is a triangle with vertices (−2, 0), (0, 0.5), (2, 0).
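The convolution can also be done numerically (not part of the original slides): discretize the uniform density and convolve it with itself; the grid spacing is an arbitrary choice.

```python
import numpy as np

# Discretize f_X = f_Y = 0.5 on (-1, 1) and convolve numerically.
dx = 0.001
x = np.arange(-1, 1, dx)
f = np.full_like(x, 0.5)

fz = np.convolve(f, f) * dx                    # density of Z = X + Y
z = np.arange(len(fz)) * dx - 2                # support of Z is (-2, 2)

# Compare with the triangle (2 - |z|) / 4 at a few points.
for zz in [-1.0, 0.0, 0.5, 1.5]:
    i = np.argmin(np.abs(z - zz))
    print(zz, round(fz[i], 3), round((2 - abs(zz)) / 4, 3))
```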
P S Sastry, IISc, E1 222 Aug 2021 112/248
Independence of functions of random variable
◮ Suppose X and Y are independent.
◮ Then g(X) and h(Y ) are independent
◮ This is because [g(X) ∈ B1 ] = [X ∈ B̃1 ] for some Borel
set, B̃1 and similarly [h(Y ) ∈ B2 ] = [Y ∈ B̃2 ]
◮ Hence, [g(X) ∈ B1 ] and [h(Y ) ∈ B2 ] are independent.
P S Sastry, IISc, E1 222 Aug 2021 113/248
Independence of functions of random variable
◮ This is easily generalized to functions of multiple random
variables.
◮ If X, Y are vector random variables (or random vectors),
independence implies [X ∈ B1 ] is independent of
[Y ∈ B2 ] for all borel sets B1 , B2 (in appropriate spaces).
◮ Then g(X) would be independent of h(Y).
◮ That is, suppose X1 , · · · , Xm , Y1 , · · · , Yn are
independent.
◮ Then, g(X1 , · · · , Xm ) is independent of h(Y1 , · · · , Yn ).
P S Sastry, IISc, E1 222 Aug 2021 114/248
◮ Let X1 , X2 , X3 be independent continuous rv
◮ Z = X1 + X2 + X3 .
◮ Can we find density of Z?
◮ Let W = X1 + X2 . We know how to find its density
◮ Then Z = W + X3 and W and X3 are independent.
◮ So, density of Z is the convolution of the densities of W
and X3 .
P S Sastry, IISc, E1 222 Aug 2021 115/248
◮ Suppose X, Y are iid exponential rv’s:
fX (x) = λ e^{−λx} ,   x > 0
◮ Let Z = X + Y . Then the density of Z is
fZ (z) = ∫_{−∞}^{∞} fX (x) fY (z − x) dx
= ∫_{0}^{z} λ e^{−λx} λ e^{−λ(z−x)} dx
= λ² e^{−λz} ∫_{0}^{z} dx = λ² z e^{−λz}
◮ Thus, the sum of independent exponential random variables has a gamma distribution:
fZ (z) = λz λ e^{−λz} ,   z > 0
P S Sastry, IISc, E1 222 Aug 2021 116/248
Sum of independent gamma rv
◮ The gamma density with parameters α > 0 and λ > 0 is given by
f (x) = (1/Γ(α)) λ^α x^{α−1} e^{−λx} ,   x > 0
We will call this Gamma(α, λ).
◮ The α is called the shape parameter and λ is called the
rate parameter.
◮ For α = 1 this is the exponential density.
◮ Let X ∼ Gamma(α1 , λ), Y ∼ Gamma(α2 , λ).
Suppose X, Y are independent.
◮ Let Z = X + Y . Then Z ∼ Gamma(α1 + α2 , λ).
P S Sastry, IISc, E1 222 Aug 2021 117/248
fZ (z) = ∫_{−∞}^{∞} fX (x) fY (z − x) dx
= ∫_{0}^{z} (1/Γ(α1 )) λ^{α1} x^{α1 −1} e^{−λx} · (1/Γ(α2 )) λ^{α2} (z − x)^{α2 −1} e^{−λ(z−x)} dx
= ( λ^{α1 +α2} e^{−λz} / (Γ(α1 )Γ(α2 )) ) ∫_{0}^{z} z^{α1 −1} (x/z)^{α1 −1} z^{α2 −1} (1 − x/z)^{α2 −1} dx
(change the variable: t = x/z ⇒ z^{−1} dx = dt)
= ( λ^{α1 +α2} e^{−λz} / (Γ(α1 )Γ(α2 )) ) z^{α1 +α2 −1} ∫_{0}^{1} t^{α1 −1} (1 − t)^{α2 −1} dt
= (1/Γ(α1 + α2 )) λ^{α1 +α2} z^{α1 +α2 −1} e^{−λz}
Because
∫_{0}^{1} t^{α1 −1} (1 − t)^{α2 −1} dt = Γ(α1 )Γ(α2 ) / Γ(α1 + α2 )
P S Sastry, IISc, E1 222 Aug 2021 118/248
◮ If X, Y are independent gamma random variables then
X + Y also has gamma distribution.
◮ If X ∼ Gamma(α1 , λ), and Y ∼ Gamma(α2 , λ), then
X + Y ∼ Gamma(α1 + α2 , λ).
P S Sastry, IISc, E1 222 Aug 2021 119/248
Sum of independent Gaussians
◮ The sum of independent Gaussian random variables is a Gaussian rv.
◮ If X ∼ N (µ1 , σ12 ) and Y ∼ N (µ2 , σ22 ) and X, Y are
independent, then
X + Y ∼ N (µ1 + µ2 , σ12 + σ22 )
◮ We can show this.
◮ The algebra is a little involved.
◮ There is a calculation trick that is often useful with
Gaussian density
P S Sastry, IISc, E1 222 Aug 2021 120/248
A Calculation Trick
I = ∫_{−∞}^{∞} exp( −(x² − 2bx + c)/(2K) ) dx
= ∫_{−∞}^{∞} exp( −((x − b)² + c − b²)/(2K) ) dx
= ∫_{−∞}^{∞} exp( −(x − b)²/(2K) ) exp( −(c − b²)/(2K) ) dx
= exp( −(c − b²)/(2K) ) √(2πK)
because
(1/√(2πK)) ∫_{−∞}^{∞} exp( −(x − b)²/(2K) ) dx = 1
P S Sastry, IISc, E1 222 Aug 2021 121/248
◮ We next look at a general theorem that is quite useful in
dealing with functions of multiple random variables.
◮ This result is only for continuous random variables.
P S Sastry, IISc, E1 222 Aug 2021 122/248
◮ Let X1 , · · · , Xn be continuous random variables with
joint density fX1 ···Xn . We define Y1 , · · · Yn by
Y1 = g1 (X1 , · · · , Xn ) ··· Yn = gn (X1 , · · · , Xn )
We think of gi as components of g : ℜn → ℜn .
◮ We assume g is continuous with continuous first partials
and is invertible.
◮ Let h be the inverse of g. That is
X1 = h1 (Y1 , · · · , Yn ) ··· Xn = hn (Y1 , · · · , Yn )
◮ Each of gi , hi are ℜn → ℜ functions and we can write
them as
yi = gi (x1 , · · · , xn ); ··· xi = hi (y1 , · · · , yn )
We denote the partial derivatives of these functions by ∂xi /∂yj etc.
P S Sastry, IISc, E1 222 Aug 2021 123/248
◮ The Jacobian of the inverse transformation is
J = ∂(x1 , · · · , xn )/∂(y1 , · · · , yn ) = det [ ∂xi /∂yj ],
the determinant of the n × n matrix whose (i, j)-th entry is ∂xi /∂yj .
◮ We assume that J is non-zero in the range of the transformation.
◮ Theorem: Under the above conditions, we have
fY1 ···Yn (y1 , · · · , yn ) = |J| fX1 ···Xn (h1 (y1 , · · · , yn ), · · · , hn (y1 , · · · , yn ))
Or, more compactly, fY (y) = |J| fX (h(y))
P S Sastry, IISc, E1 222 Aug 2021 124/248
◮ Let X1 , X2 have a joint density, fX . Consider
Y1 = g1 (X1 , X2 ) = X1 + X2   (g1 (a, b) = a + b)
Y2 = g2 (X1 , X2 ) = X1 − X2   (g2 (a, b) = a − b)
This transformation is invertible:
X1 = h1 (Y1 , Y2 ) = (Y1 + Y2 )/2   (h1 (a, b) = (a + b)/2)
X2 = h2 (Y1 , Y2 ) = (Y1 − Y2 )/2   (h2 (a, b) = (a − b)/2)
The Jacobian is: det [ 0.5  0.5 ; 0.5  −0.5 ] = −0.5.
◮ This gives: fY1 Y2 (y1 , y2 ) = 0.5 fX1 X2 ( (y1 + y2 )/2 , (y1 − y2 )/2 )
P S Sastry, IISc, E1 222 Aug 2021 125/248
Proof of Theorem
◮ Let B = (−∞, y1 ] × · · · × (−∞, yn ] ⊂ ℜn . Then
FY (y) = FY1 ···Yn (y1 , · · · , yn ) = P [Yi ≤ yi , i = 1, · · · , n]
= ∫_B fY1 ···Yn (y1′ , · · · , yn′ ) dy1′ · · · dyn′
◮ Define
g^{−1}(B) = {(x1 , · · · , xn ) ∈ ℜn : g(x1 , · · · , xn ) ∈ B}
= {(x1 , · · · , xn ) ∈ ℜn : gi (x1 , · · · , xn ) ≤ yi , i = 1, · · · , n}
◮ Then we have
FY1 ···Yn (y1 , · · · , yn ) = P [gi (X1 , · · · , Xn ) ≤ yi , i = 1, · · · , n]
= ∫_{g^{−1}(B)} fX1 ···Xn (x1′ , · · · , xn′ ) dx1′ · · · dxn′
P S Sastry, IISc, E1 222 Aug 2021 126/248
Proof of Theorem
◮ B = (−∞, y1 ] × · · · × (−∞, yn ].
◮ g^{−1}(B) = {(x1 , · · · , xn ) ∈ ℜn : g(x1 , · · · , xn ) ∈ B}
FY (y1 , · · · , yn ) = P [gi (X1 , · · · , Xn ) ≤ yi , i = 1, · · · , n]
= ∫_{g^{−1}(B)} fX1 ···Xn (x1′ , · · · , xn′ ) dx1′ · · · dxn′
change variables: yi′ = gi (x1′ , · · · , xn′ ), i = 1, · · · , n
(x1′ , · · · , xn′ ) ∈ g^{−1}(B) ⇒ (y1′ , · · · , yn′ ) ∈ B
xi′ = hi (y1′ , · · · , yn′ ),   dx1′ · · · dxn′ = |J| dy1′ · · · dyn′
FY (y1 , · · · , yn ) = ∫_B fX1 ···Xn (h1 (y′ ), · · · , hn (y′ )) |J| dy1′ · · · dyn′
⇒ fY1 ···Yn (y1 , · · · , yn ) = fX1 ···Xn (h1 (y), · · · , hn (y)) |J|
P S Sastry, IISc, E1 222 Aug 2021 127/248
◮ X1 , · · · Xn are continuous rv with joint density
Y1 = g1 (X1 , · · · , Xn ) ··· Yn = gn (X1 , · · · , Xn )
◮ The transformation is continuous with continuous first
partials and is invertible and
X1 = h1 (Y1 , · · · , Yn ) ··· Xn = hn (Y1 , · · · , Yn )
◮ We assume the Jacobian of the inverse transform, J, is
non-zero
◮ Then the density of Y is
fY1 ···Yn (y1 , · · · , yn ) = |J|fX1 ···Xn (h1 (y1 , · · · , yn ), · · · , hn (y1 , · · · , yn ))
◮ Called multidimensional change of variable formula
P S Sastry, IISc, E1 222 Aug 2021 128/248
◮ Let X, Y have joint density fXY . Let Z = X + Y .
◮ We want to find fZ using the theorem.
◮ To use the theorem, we need an invertible transformation
of ℜ2 onto ℜ2 of which one component is x + y.
◮ Take Z = X + Y and W = X − Y . This is invertible.
◮ X = (Z + W )/2 and Y = (Z − W )/2. The Jacobian is
J = det [ 1/2  1/2 ; 1/2  −1/2 ] = −1/2
◮ Hence we get
fZW (z, w) = (1/2) fXY ( (z + w)/2 , (z − w)/2 )
◮ Now we get the density of Z as
fZ (z) = ∫_{−∞}^{∞} (1/2) fXY ( (z + w)/2 , (z − w)/2 ) dw
P S Sastry, IISc, E1 222 Aug 2021 129/248
◮ Let Z = X + Y and W = X − Y . Then
fZ (z) = ∫_{−∞}^{∞} (1/2) fXY ( (z + w)/2 , (z − w)/2 ) dw
(change the variable: t = (z + w)/2 ⇒ dt = (1/2) dw ⇒ w = 2t − z ⇒ z − w = 2z − 2t)
fZ (z) = ∫_{−∞}^{∞} fXY (t, z − t) dt = ∫_{−∞}^{∞} fXY (z − s, s) ds
◮ We get the same result as earlier. If X, Y are independent,
fZ (z) = ∫_{−∞}^{∞} fX (t) fY (z − t) dt
P S Sastry, IISc, E1 222 Aug 2021 130/248
◮ Let Z = X + Y and W = X − Y . We got
fZW (z, w) = (1/2) fXY ( (z + w)/2 , (z − w)/2 )
◮ Now we can calculate fW also:
fW (w) = ∫_{−∞}^{∞} (1/2) fXY ( (z + w)/2 , (z − w)/2 ) dz
(change the variable: t = (z + w)/2 ⇒ dt = (1/2) dz ⇒ z = 2t − w ⇒ z − w = 2t − 2w)
fW (w) = ∫_{−∞}^{∞} fXY (t, t − w) dt = ∫_{−∞}^{∞} fXY (s + w, s) ds
P S Sastry, IISc, E1 222 Aug 2021 131/248
Example
◮ Let X, Y be iid U [0, 1]. Let Z = X − Y .
fZ (z) = ∫_{−∞}^{∞} fX (t) fY (t − z) dt
◮ For the integrand to be non-zero:
◮ 0 ≤ t ≤ 1 ⇒ t ≥ 0, t ≤ 1
◮ 0 ≤ t − z ≤ 1 ⇒ t ≥ z, t ≤ 1 + z
◮ ⇒ max(0, z) ≤ t ≤ min(1, 1 + z)
◮ Thus, we get the density as (note Z ∈ (−1, 1))
fZ (z) = ∫_{0}^{1+z} 1 dt = 1 + z,   if −1 ≤ z ≤ 0
      = ∫_{z}^{1} 1 dt = 1 − z,   if 0 ≤ z ≤ 1
◮ Thus, when X, Y ∼ U (0, 1) iid,
fX−Y (z) = 1 − |z|,   −1 < z < 1
P S Sastry, IISc, E1 222 Aug 2021 132/248
◮ We showed that
fX+Y (z) = ∫_{−∞}^{∞} fXY (t, z − t) dt = ∫_{−∞}^{∞} fXY (z − t, t) dt
fX−Y (w) = ∫_{−∞}^{∞} fXY (t, t − w) dt = ∫_{−∞}^{∞} fXY (t + w, t) dt
◮ Suppose X, Y are discrete. Then we have
fX+Y (z) = P [X + Y = z] = Σk P [X = k, Y = z − k] = Σk fXY (k, z − k)
fX−Y (w) = P [X − Y = w] = Σk P [X = k, Y = k − w] = Σk fXY (k, k − w)
P S Sastry, IISc, E1 222 Aug 2021 133/248
Distribution of product of random variables
◮ We want the density of Z = XY .
◮ We need one more function to make an invertible transformation.
◮ A possible choice: Z = XY , W = Y
◮ This is invertible: X = Z/W , Y = W
J = det [ 1/w  −z/w² ; 0  1 ] = 1/w
◮ Hence we get
fZW (z, w) = (1/|w|) fXY (z/w, w)
◮ Thus we get the density of the product as
fZ (z) = ∫_{−∞}^{∞} (1/|w|) fXY (z/w, w) dw
P S Sastry, IISc, E1 222 Aug 2021 134/248
Density of XY
◮ Let X, Y have joint density fXY . Let Z = XY .
◮ We can find density of XY directly also (but it is more
complicated)
◮ Let Az = {(x, y) ∈ ℜ2 : xy ≤ z} ⊂ ℜ2 .
FZ (z) = P [XY ≤ z] = P [(X, Y ) ∈ Az ] = ∫∫_{Az} fXY (x, y) dy dx
◮ We need to find the limits for integrating over Az .
◮ If x > 0, then xy ≤ z ⇒ y ≤ z/x; if x < 0, then xy ≤ z ⇒ y ≥ z/x. Hence
FZ (z) = ∫_{−∞}^{0} ∫_{z/x}^{∞} fXY (x, y) dy dx + ∫_{0}^{∞} ∫_{−∞}^{z/x} fXY (x, y) dy dx
P S Sastry, IISc, E1 222 Aug 2021 135/248
Z 0 Z ∞ Z ∞ Z z/x
FZ (z) = fXY (x, y) dy dx + fXY (x, y) dy dx
−∞ z/x 0 −∞
◮ Change variable from y to t using t = xy
y = t/x; dy = x1 dt; y = z/x ⇒ t = z
Z 0 Z −∞ Z ∞Z z
1 t 1 t
FZ (z) = fXY (x, ) dt dx + fXY (x, ) dt d
−∞ z x x 0 −∞ x x
Z 0 Z z Z ∞Z z
1 t 1 t
= fXY (x, ) dt dx + fXY (x, )
−∞ x x −∞ x x
Z−∞
∞ Z z 0
1 t
= fXY x, dt dx
−∞ −∞ x x
Z z Z ∞
1 t
= fXY x, dx dt
−∞ −∞ x x
R∞
This shows: fZ (z) = −∞ x1 fXY x, xz dx
P S Sastry, IISc, E1 222 Aug 2021 136/248
example
◮ Let X, Y be iid U (0, 1). Let Z = XY .
  fZ (z) = ∫_{−∞}^{∞} (1/|w|) fX (z/w) fY (w) dw
◮ We need 0 < w < 1 and 0 < z/w < 1, i.e., z < w. Hence
  fZ (z) = ∫_z^{1} (1/w) dw = − ln(z),   0 < z < 1
P S Sastry, IISc, E1 222 Aug 2021 137/248
◮ X, Y have joint density and Z = XY . Then
fZ (z) = ∫_{−∞}^{∞} (1/|w|) fXY ( z/w , w ) dw
Suppose X, Y are discrete and Z = XY
  fZ (0) = P [X = 0 or Y = 0] = Σ_x fXY (x, 0) + Σ_y fXY (0, y) − fXY (0, 0)
  fZ (k) = Σ_{y≠0} P [X = k/y, Y = y] = Σ_{y≠0} fXY (k/y, y),   k ≠ 0
◮ Note: formulas for densities cannot simply be converted into formulas for mass functions by replacing integrals with sums!
P S Sastry, IISc, E1 222 Aug 2021 138/248
◮ We wanted density of Z = XY .
◮ We used: Z = XY and W = Y .
◮ We could have used: Z = XY and W = X.
◮ This is invertible: X = W and Y = Z/W .
    J = |  0      1    |
        | 1/w   −z/w²  |  = −1/w
◮ This gives (using |J| = 1/|w|)
  fZW (z, w) = (1/|w|) fXY ( w , z/w )
  fZ (z) = ∫_{−∞}^{∞} (1/|w|) fXY ( w , z/w ) dw
◮ The density fZ should be the same in both cases.
P S Sastry, IISc, E1 222 Aug 2021 139/248
Distributions of quotients
◮ X, Y have joint density and Z = X/Y .
◮ We can take: Z = X/Y W =Y
◮ This is invertible: X = ZW Y = W
    J = | w   z |
        | 0   1 |  = w
◮ Hence we get
fZW (z, w) = |w| fXY (zw, w)
◮ Thus we get the density of quotient as
fZ (z) = ∫_{−∞}^{∞} |w| fXY (zw, w) dw
P S Sastry, IISc, E1 222 Aug 2021 140/248
example
◮ Let X, Y be iid U (0, 1). Let Z = X/Y .
Note Z ∈ (0, ∞)
fZ (z) = ∫_{−∞}^{∞} |w| fX (zw) fY (w) dw
◮ We need 0 < w < 1 and 0 < zw < 1 ⇒ w < 1/z.
◮ So, when z ≤ 1, w goes from 0 to 1; when z > 1, w goes
from 0 to 1/z.
◮ Hence we get the density as
  fZ (z) = { ∫_0^{1} w dw = 1/2,           if 0 < z ≤ 1
           { ∫_0^{1/z} w dw = 1/(2z²),     if 1 < z < ∞
P S Sastry, IISc, E1 222 Aug 2021 141/248
◮ X, Y have joint density and Z = X/Y
  fZ (z) = ∫_{−∞}^{∞} |w| fXY (zw, w) dw
◮ Suppose X, Y are discrete and Z = X/Y
  fZ (z) = P [Z = z] = P [X/Y = z] = Σ_y P [X = yz, Y = y] = Σ_y fXY (yz, y)
P S Sastry, IISc, E1 222 Aug 2021 142/248
◮ We chose: Z = X/Y and W = Y .
◮ We could have taken: Z = X/Y and W = X
◮ The inverse is: X = W and Y = W/Z
    J = |   0       1   |
        | −w/z²    1/z  |  = w/z²
◮ Thus, using |J| = |w|/z², we get the density of the quotient as
  fZ (z) = ∫_{−∞}^{∞} (|w|/z²) fXY ( w , w/z ) dw
  Put t = w/z ⇒ dt = dw/z, w = tz:
         = ∫_{−∞}^{∞} |t| fXY (tz, t) dt
◮ We can show that the density of the quotient is the same in both these approaches.
P S Sastry, IISc, E1 222 Aug 2021 143/248
Summary: Densities of standard functions of rv’s
◮ We derived densities of sum, difference, product and
quotient of random variables.
  fX+Y (z) = ∫_{−∞}^{∞} fXY (t, z − t) dt = ∫_{−∞}^{∞} fXY (z − t, t) dt
  fX−Y (z) = ∫_{−∞}^{∞} fXY (t, t − z) dt = ∫_{−∞}^{∞} fXY (t + z, t) dt
  fX∗Y (z) = ∫_{−∞}^{∞} (1/|t|) fXY ( z/t , t ) dt = ∫_{−∞}^{∞} (1/|t|) fXY ( t , z/t ) dt
  f(X/Y) (z) = ∫_{−∞}^{∞} |t| fXY (zt, t) dt = ∫_{−∞}^{∞} (|t|/z²) fXY ( t , t/z ) dt
P S Sastry, IISc, E1 222 Aug 2021 144/248
Exchangeable Random Variables
◮ X1 , X2 , · · · , Xn are said to be exchangeable if their joint
distribution is same as that of any permutation of them.
◮ Let (i1 , · · · , in ) be a permutation of (1, 2, · · · , n). Then the joint df of (Xi1 , · · · , Xin ) should be the same as that of (X1 , · · · , Xn )
◮ Take n = 3. Suppose FX1 X2 X3 (a, b, c) = g(a, b, c). If they
are exchangeable, then
FX2 X3 X1 (a, b, c) = P [X2 ≤ a, X3 ≤ b, X1 ≤ c]
= P [X1 ≤ c, X2 ≤ a, X3 ≤ b]
= g(c, a, b) = g(a, b, c)
◮ The df or density should be “symmetric” in its variables if
the random variables are exchangeable.
P S Sastry, IISc, E1 222 Aug 2021 145/248
◮ Consider the density of three random variables
f (x, y, z) = (2/3)(x + y + z),   0 < x, y, z < 1
◮ They are exchangeable (because f is symmetric in its arguments: f (x, y, z) = f (y, x, z) = f (x, z, y) = · · · )
◮ If random variables are exchangeable then they are
identically distributed.
FXY Z (a, ∞, ∞) = FXY Z (∞, ∞, a) ⇒ FX (a) = FZ (a)
◮ The above example shows that exchangeable random
variables need not be independent. The joint density is
not factorizable.
  ∫_0^{1} ∫_0^{1} (2/3)(x + y + z) dy dz = 2(x + 1)/3
◮ So, the joint density is not the product of marginals
P S Sastry, IISc, E1 222 Aug 2021 146/248
Expectation of functions of multiple rv
◮ Theorem: Let Z = g(X1 , · · · Xn ) = g(X). Then
  E[Z] = ∫_{ℜ^n} g(x) dFX (x)
◮ That is, if they have a joint density, then
  E[Z] = ∫_{ℜ^n} g(x) fX (x) dx
◮ Similarly, if all Xi are discrete,
  E[Z] = Σ_x g(x) fX (x)
P S Sastry, IISc, E1 222 Aug 2021 147/248
◮ Let Z = X + Y . Let X, Y have joint density fXY
  E[X + Y ] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (x + y) fXY (x, y) dx dy
            = ∫_{−∞}^{∞} x [ ∫_{−∞}^{∞} fXY (x, y) dy ] dx + ∫_{−∞}^{∞} y [ ∫_{−∞}^{∞} fXY (x, y) dx ] dy
            = ∫_{−∞}^{∞} x fX (x) dx + ∫_{−∞}^{∞} y fY (y) dy
            = E[X] + E[Y ]
◮ Expectation is a linear operator.
◮ This is true for all random variables.
P S Sastry, IISc, E1 222 Aug 2021 148/248
◮ We saw E[X + Y ] = E[X] + E[Y ].
◮ Let us calculate Var(X + Y ).
  Var(X + Y ) = E[ ((X + Y ) − E[X + Y ])² ]
              = E[ ((X − EX) + (Y − EY ))² ]
              = E[ (X − EX)² ] + E[ (Y − EY )² ] + 2 E[ (X − EX)(Y − EY ) ]
              = Var(X) + Var(Y ) + 2 Cov(X, Y )
where we define covariance between X, Y as
Cov(X, Y ) = E [(X − EX)(Y − EY )]
P S Sastry, IISc, E1 222 Aug 2021 149/248
◮ We define covariance between X and Y by
Cov(X, Y ) = E [(X − EX)(Y − EY )]
= E [XY − X(EY ) − Y (EX) + EX EY ]
= E[XY ] − EX EY
◮ Note that Cov(X, Y ) can be positive or negative
◮ X and Y are said to be uncorrelated if Cov(X, Y ) = 0
◮ If X and Y are uncorrelated then
Var(X + Y ) = Var(X) + Var(Y )
◮ Note that E[X + Y ] = E[X] + E[Y ] for all random
variables.
P S Sastry, IISc, E1 222 Aug 2021 150/248
Example
◮ Consider the joint density
fXY (x, y) = 2, 0 < x < y < 1
◮ We want to calculate Cov(X, Y )
  EX = ∫_0^{1} ∫_x^{1} x · 2 dy dx = 2 ∫_0^{1} x (1 − x) dx = 1/3
  EY = ∫_0^{1} ∫_0^{y} y · 2 dx dy = 2 ∫_0^{1} y² dy = 2/3
  E[XY ] = ∫_0^{1} ∫_0^{y} x y · 2 dx dy = 2 ∫_0^{1} y (y²/2) dy = 1/4
◮ Hence, Cov(X, Y ) = E[XY ] − EX EY = 1/4 − 2/9 = 1/36
P S Sastry, IISc, E1 222 Aug 2021 151/248
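These moments can be cross-checked by simulation. The sketch below relies on an auxiliary fact not stated in the slides: for two iid U(0, 1) variables, the pair (min, max) has exactly this joint density 2 on 0 < x < y < 1.

    # Monte Carlo check of EX = 1/3, EY = 2/3, E[XY] = 1/4, Cov(X,Y) = 1/36
    # for the density f(x,y) = 2 on 0 < x < y < 1.
    import numpy as np

    rng = np.random.default_rng(1)
    u = rng.uniform(0.0, 1.0, size=(1_000_000, 2))
    x = u.min(axis=1)   # (min, max) of two iid uniforms has joint density 2 on x < y
    y = u.max(axis=1)

    print("EX    ~", x.mean(), " (exact 1/3)")
    print("EY    ~", y.mean(), " (exact 2/3)")
    print("E[XY] ~", (x * y).mean(), " (exact 1/4)")
    print("Cov   ~", np.cov(x, y)[0, 1], " (exact 1/36 ≈ 0.0278)")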
Independent random variables are uncorrelated
◮ Suppose X, Y are independent. Then
  E[XY ] = ∫∫ x y fXY (x, y) dx dy
         = ∫∫ x y fX (x) fY (y) dx dy
         = [ ∫ x fX (x) dx ] [ ∫ y fY (y) dy ] = EX EY
◮ Then, Cov(X, Y ) = E[XY ] − EX EY = 0.
◮ X, Y independent ⇒ X, Y uncorrelated
P S Sastry, IISc, E1 222 Aug 2021 152/248
Uncorrelated random variables may not be
independent
◮ Suppose X ∼ N (0, 1). Then EX = E[X³] = 0
◮ Let Y = X². Then
  E[XY ] = E[X³] = 0 = EX EY
◮ Thus X, Y are uncorrelated.
◮ Are they independent? No.
  e.g., P [X > 2 | Y < 1] = 0 ≠ P [X > 2]
◮ That X, Y are uncorrelated does not imply they are independent.
P S Sastry, IISc, E1 222 Aug 2021 153/248
◮ We define the correlation coefficient of X, Y by
  ρXY = Cov(X, Y ) / √( Var(X) Var(Y ) )
◮ If X, Y are uncorrelated then ρXY = 0.
◮ We will show that |ρXY | ≤ 1
◮ Hence −1 ≤ ρXY ≤ 1, ∀X, Y
P S Sastry, IISc, E1 222 Aug 2021 154/248
◮ We have E [(αX + βY )2 ] ≥ 0, ∀α, β ∈ ℜ
α2 E[X 2 ] + β 2 E[Y 2 ] + 2αβE[XY ] ≥ 0, ∀α, β ∈ ℜ
  Take α = − E[XY ] / E[X²]:
  (E[XY ])² / E[X²] + β² E[Y²] − 2β (E[XY ])² / E[X²] ≥ 0, ∀β ∈ ℜ
  aβ² + bβ + c ≥ 0, ∀β ⇒ b² − 4ac ≤ 0
  ⇒ 4 ( (E[XY ])² / E[X²] )² − 4 E[Y²] (E[XY ])² / E[X²] ≤ 0
  ⇒ (E[XY ])⁴ / (E[X²])² ≤ E[Y²] (E[XY ])² / E[X²]
  ⇒ (E[XY ])² ≤ E[X²] E[Y²]
P S Sastry, IISc, E1 222 Aug 2021 155/248
◮ We showed that
(E[XY ])2 ≤ E[X 2 ]E[Y 2 ]
◮ Take X − EX in place of X and Y − EY in place of Y
in the above algebra.
◮ This gives us
(E[(X − EX)(Y − EY )])2 ≤ E[(X−EX)2 ]E[(Y −EY )2 ]
⇒ (Cov(X, Y ))2 ≤ Var(X)Var(Y )
◮ Hence we get
  ρXY² = ( Cov(X, Y ) / √( Var(X) Var(Y ) ) )² ≤ 1
◮ Equality holds here only if E[ (α(X − EX) + β(Y − EY ))² ] = 0 for some α, β (not both zero).
  Thus, |ρXY | = 1 only if α(X − EX) + β(Y − EY ) = 0
◮ The correlation coefficient of X, Y is ±1 only when Y is an affine (‘linear’) function of X
P S Sastry, IISc, E1 222 Aug 2021 156/248
Linear Least Squares Estimation
◮ Suppose we want to approximate Y as an affine function
of X.
◮ We want a, b to minimize E [(Y − (aX + b))2 ]
◮ For a fixed a, what is the b that minimizes
E [((Y − aX) − b)2 ] ?
◮ We know the best b here is:
b = E[Y − aX] = EY − aEX.
◮ So, we want to find the best a to minimize
J(a) = E [(Y − aX − (EY − aEX))2 ]
P S Sastry, IISc, E1 222 Aug 2021 157/248
◮ We want to find a to minimize
  J(a) = E[ (Y − aX − (EY − aEX))² ]
       = E[ ((Y − EY ) − a(X − EX))² ]
       = E[ (Y − EY )² ] + a² E[ (X − EX)² ] − 2a E[ (Y − EY )(X − EX) ]
       = Var(Y ) + a² Var(X) − 2a Cov(X, Y )
◮ So, setting dJ/da = 0, the optimal a satisfies
  2a Var(X) − 2 Cov(X, Y ) = 0 ⇒ a = Cov(X, Y ) / Var(X)
P S Sastry, IISc, E1 222 Aug 2021 158/248
◮ The final mean square error, say, J ∗ is
  J ∗ = Var(Y ) + a² Var(X) − 2a Cov(X, Y )
      = Var(Y ) + (Cov(X, Y )/Var(X))² Var(X) − 2 (Cov(X, Y )/Var(X)) Cov(X, Y )
      = Var(Y ) − (Cov(X, Y ))² / Var(X)
      = Var(Y ) [ 1 − (Cov(X, Y ))² / (Var(Y ) Var(X)) ]
      = Var(Y ) (1 − ρXY²)
P S Sastry, IISc, E1 222 Aug 2021 159/248
◮ The best mean-square approximation of Y as a ‘linear’
function of X is
  (Cov(X, Y )/Var(X)) X + ( EY − (Cov(X, Y )/Var(X)) EX )
◮ Called the line of regression of Y on X.
◮ If Cov(X, Y ) = 0 then this reduces to approximating Y by a constant, EY .
◮ The final mean square error is Var(Y )(1 − ρXY²)
◮ If ρXY = ±1 then the error is zero
◮ If ρXY = 0 the final error is Var(Y )
P S Sastry, IISc, E1 222 Aug 2021 160/248
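A minimal sketch of this recipe on synthetic data (the data-generating model and sample size are arbitrary illustrative choices, not from the slides): estimate a = Cov(X, Y)/Var(X) and b = EY − a EX from samples and compare with an off-the-shelf least-squares fit and with the predicted error Var(Y)(1 − ρ²).

    # Sketch: the minimum-MSE line a X + b has a = Cov(X,Y)/Var(X), b = EY - a EX.
    import numpy as np

    rng = np.random.default_rng(2)
    n = 200_000
    x = rng.normal(0.0, 2.0, n)
    y = 1.5 * x + 3.0 + rng.normal(0.0, 1.0, n)   # Y = 1.5 X + 3 + noise (illustrative model)

    a = np.cov(x, y, bias=True)[0, 1] / np.var(x)
    b = y.mean() - a * x.mean()
    print("a, b from Cov/Var formulas:", a, b)               # close to 1.5 and 3.0
    print("np.polyfit (least squares):", np.polyfit(x, y, 1))  # essentially the same line

    rho = np.corrcoef(x, y)[0, 1]
    print("residual MSE:", np.mean((y - (a * x + b))**2),
          "  Var(Y)(1 - rho^2):", np.var(y) * (1 - rho**2))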
◮ The covariance of X, Y is
Cov(X, Y ) = E[(X−EX) (Y −EY )] = E[XY ]−EX EY
Note that Cov(X, X) = Var(X)
◮ X, Y are called uncorrelated if Cov(X, Y ) = 0.
◮ X, Y independent ⇒ X, Y uncorrelated.
◮ Uncorrelated random variables need not necessarily be
independent
◮ Covariance plays an important role in linear least squares
estimation.
◮ Informally, covariance captures the ‘linear dependence’
between the two random variables.
P S Sastry, IISc, E1 222 Aug 2021 161/248
Covariance Matrix
◮ Let X1 , · · · , Xn be random variables (on the same
probability space)
◮ We represent them as a vector X.
◮ As a notation, all vectors are column vectors:
X = (X1 , · · · , Xn )T
◮ We denote E[X] = (EX1 , · · · , EXn )T
◮ The n × n matrix whose (i, j)th element is Cov(Xi , Xj ) is
called the covariance matrix (or variance-covariance
matrix) of X. It is denoted by ΣX .
         [ Cov(X1 , X1 )  Cov(X1 , X2 )  · · ·  Cov(X1 , Xn ) ]
  ΣX =   [ Cov(X2 , X1 )  Cov(X2 , X2 )  · · ·  Cov(X2 , Xn ) ]
         [      ⋮              ⋮          ⋱          ⋮        ]
         [ Cov(Xn , X1 )  Cov(Xn , X2 )  · · ·  Cov(Xn , Xn ) ]
P S Sastry, IISc, E1 222 Aug 2021 162/248
Covariance matrix
◮ If a = (a1 , · · · , an )T then a aT is an n × n matrix whose (i, j)th element is ai aj .
◮ Hence we get
  ΣX = E[ (X − EX) (X − EX)T ]
◮ This is because
  ( (X − EX) (X − EX)T )ij = (Xi − EXi )(Xj − EXj )
  and (ΣX )ij = E[(Xi − EXi )(Xj − EXj )]
P S Sastry, IISc, E1 222 Aug 2021 163/248
◮ Recall the following about vectors and matrices
◮ Let a, b ∈ ℜn be column vectors. Then
  (aT b)² = (aT b)(aT b) = (bT a)(aT b) = bT (a aT ) b
◮ Let A be an n × n matrix with elements aij . Then
  bT A b = Σ_{i,j=1}^{n} bi bj aij
where b = (b1 , · · · , bn )T
◮ A is said to be positive semidefinite if bT Ab ≥ 0, ∀b
P S Sastry, IISc, E1 222 Aug 2021 164/248
◮ ΣX is a real symmetric matrix
◮ It is positive semidefinite.
◮ Let a ∈ ℜn and let Y = aT X.
◮ Then, EY = aT EX. We get variance of Y as
  Var(Y ) = E[(Y − EY )²] = E[ (aT X − aT EX)² ]
          = E[ (aT (X − EX))² ]
          = E[ aT (X − EX) (X − EX)T a ]
          = aT E[ (X − EX) (X − EX)T ] a
          = aT ΣX a
◮ This gives aT ΣX a ≥ 0, ∀a
◮ This shows ΣX is positive semidefinite
P S Sastry, IISc, E1 222 Aug 2021 165/248
◮ Y = aT X = Σ_i ai Xi – a linear combination of the Xi ’s.
◮ We know how to find its mean and variance:
  EY = aT EX = Σ_i ai EXi ;     Var(Y ) = aT ΣX a = Σ_{i,j} ai aj Cov(Xi , Xj )
◮ Specifically, by taking all components of a to be 1, we get
  Var( Σ_{i=1}^{n} Xi ) = Σ_{i,j=1}^{n} Cov(Xi , Xj ) = Σ_{i=1}^{n} Var(Xi ) + Σ_{i=1}^{n} Σ_{j≠i} Cov(Xi , Xj )
◮ If Xi are independent, variance of sum is sum of
variances.
P S Sastry, IISc, E1 222 Aug 2021 166/248
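A small numerical illustration of Var(aᵀX) = aᵀΣX a and of the variance-of-sum identity, using an arbitrary positive definite Σ and an arbitrary vector a as a test case:

    # Sketch: Var(a^T X) = a^T Sigma a, and Var(sum X_i) = sum of all Cov(X_i, X_j).
    import numpy as np

    rng = np.random.default_rng(3)
    Sigma = np.array([[2.0, 0.8, 0.3],
                      [0.8, 1.0, 0.5],
                      [0.3, 0.5, 1.5]])
    X = rng.multivariate_normal(mean=np.zeros(3), cov=Sigma, size=500_000)

    a = np.array([1.0, -2.0, 0.5])
    print("Var(a^T X) empirically:", np.var(X @ a))
    print("a^T Sigma a           :", a @ Sigma @ a)

    ones = np.ones(3)
    print("Var(sum X_i) empirically:", np.var(X.sum(axis=1)))
    print("1^T Sigma 1             :", ones @ Sigma @ ones, "=", Sigma.sum())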
◮ The covariance matrix ΣX is positive semidefinite because
  aT ΣX a = Var(aT X) ≥ 0
◮ ΣX would be positive definite if aT ΣX a > 0, ∀a 6= 0
◮ It would fail to be positive definite if Var(aT X) = 0 for
some nonzero a.
◮ Var(Z) = E[(Z − EZ)2 ] = 0 implies Z = EZ, a
constant.
◮ Hence, ΣX fails to be positive definite only if there is a
non-zero linear combination of Xi ’s that is a constant.
P S Sastry, IISc, E1 222 Aug 2021 167/248
◮ Covariance matrix is a real symmetric positive
semidefinite matrix
◮ It has real and non-negative eigenvalues.
◮ It would have n linearly independent eigenvectors.
◮ These also have some interesting roles.
◮ We consider one simple example.
P S Sastry, IISc, E1 222 Aug 2021 168/248
◮ Let Y = aT X and assume ||a|| = 1
◮ Y is projection of X along the direction a.
◮ Suppose we want to find a direction along which variance
is maximized
◮ We want to maximize aT ΣX a subject to aT a = 1
◮ The Lagrangian is aT ΣX a + η(1 − aT a)
◮ Equating the gradient to zero, we get
  ΣX a = η a
◮ So, a should be an eigenvector (with eigenvalue η).
◮ Then the variance would be aT ΣX a = η aT a = η
◮ Hence the direction of maximum variance is the eigenvector corresponding to the largest eigenvalue.
P S Sastry, IISc, E1 222 Aug 2021 169/248
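The sketch below illustrates this with numpy's eigendecomposition on an arbitrary example covariance matrix: the variance along the top eigenvector equals the largest eigenvalue, and randomly sampled unit directions do no better.

    # Sketch: among unit vectors a, a^T Sigma a is maximized by the eigenvector
    # of Sigma with the largest eigenvalue. Sigma below is an arbitrary example.
    import numpy as np

    rng = np.random.default_rng(4)
    Sigma = np.array([[3.0, 1.0, 0.0],
                      [1.0, 2.0, 0.5],
                      [0.0, 0.5, 1.0]])

    eigvals, eigvecs = np.linalg.eigh(Sigma)   # eigenvalues in ascending order
    a_star = eigvecs[:, -1]                    # direction of maximum variance
    print("largest eigenvalue:", eigvals[-1])
    print("a*^T Sigma a*     :", a_star @ Sigma @ a_star)   # equals that eigenvalue

    # No random unit direction does better (up to the sampling of directions).
    dirs = rng.normal(size=(10_000, 3))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    print("max over random unit a:", np.max(np.einsum('ij,jk,ik->i', dirs, Sigma, dirs)))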
Joint moments
◮ Given two random variables, X, Y
◮ The joint moment of order (i, j) is defined by
mij = E[X i Y j ]
m10 = EX, m01 = EY , m11 = E[XY ] and so on
◮ Similarly joint central moments of order (i, j) are defined
by
sij = E[ (X − EX)^i (Y − EY )^j ]
s10 = s01 = 0, s11 = Cov(X, Y ), s20 = Var(X) and so on
◮ We can similarly define joint moments of multiple random
variables
P S Sastry, IISc, E1 222 Aug 2021 170/248
◮ We can define the moment generating function of X, Y by
  MXY (s, t) = E[ e^{sX+tY} ],   s, t ∈ ℜ
◮ This is easily generalized to n random variables:
  MX (s) = E[ e^{sT X} ],   s ∈ ℜn
◮ Once again, we can get all the moments by differentiating the moment generating function:
  ∂MX (s)/∂si |_{s=0} = EXi
◮ More generally,
  ∂^{m+n} MX (s) / (∂si^n ∂sj^m) |_{s=0} = E[ Xi^n Xj^m ]
P S Sastry, IISc, E1 222 Aug 2021 171/248
Conditional Expectation
◮ Suppose X, Y have a joint density fXY
◮ Consider the conditional density fX|Y (x|y). This is a
density in x for every value of y.
◮ Since it is a density, we can use it in an expectation integral: ∫ g(x) fX|Y (x|y) dx
◮ This is like expectation of g(X) since fX|Y (x|y) is a
density in x.
◮ However, its value would be a function of y.
◮ That is, this is a kind of expectation that is a function of
Y (and hence is a random variable)
◮ It is called conditional expectation.
P S Sastry, IISc, E1 222 Aug 2021 172/248
◮ Let X, Y be discrete random variables (on the same
probability space).
◮ The conditional expectation of h(X) conditioned on Y is a function of Y , and is defined by E[h(X)|Y ] = g(Y ) where
  E[h(X)|Y = y] = g(y) = Σ_x h(x) fX|Y (x|y)
◮ Thus
  E[h(X)|Y = y] = Σ_x h(x) fX|Y (x|y) = Σ_x h(x) P [X = x|Y = y]
◮ Note that, E[h(X)|Y ] is a random variable
P S Sastry, IISc, E1 222 Aug 2021 173/248
◮ Let X, Y have joint density fXY .
◮ The conditional expectation of h(X) conditioned on Y is a function of Y , and its value for any y is defined by
  E[h(X)|Y = y] = ∫_{−∞}^{∞} h(x) fX|Y (x|y) dx
◮ Once again, what this means is that E[h(X)|Y ] = g(Y ) where
  g(y) = ∫_{−∞}^{∞} h(x) fX|Y (x|y) dx
P S Sastry, IISc, E1 222 Aug 2021 174/248
A simple example
◮ Consider the joint density
fXY (x, y) = 2, 0 < x < y < 1
◮ We calculated the conditional densities earlier
  fX|Y (x|y) = 1/y ,   fY |X (y|x) = 1/(1 − x),    0 < x < y < 1
◮ Now we can calculate the conditional expectation
  E[X|Y = y] = ∫_{−∞}^{∞} x fX|Y (x|y) dx = ∫_0^{y} x (1/y) dx = (1/y)(x²/2) |_0^{y} = y/2
◮ This gives: E[X|Y ] = Y /2
◮ We can show E[Y |X] = (1 + X)/2
P S Sastry, IISc, E1 222 Aug 2021 175/248
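One can check E[X|Y = y] = y/2 empirically by conditioning on a narrow band of Y values. As in the earlier covariance check, the sampler below uses the (assumed, standard) fact that (min, max) of two iid U(0, 1) variables has exactly this joint density; the band width is an arbitrary choice.

    # Sketch: check E[X | Y = y] = y/2 for f(x,y) = 2 on 0 < x < y < 1.
    import numpy as np

    rng = np.random.default_rng(5)
    u = rng.uniform(0.0, 1.0, size=(2_000_000, 2))
    x, y = u.min(axis=1), u.max(axis=1)

    for y0 in (0.2, 0.5, 0.9):
        sel = np.abs(y - y0) < 0.01            # condition on Y being near y0
        print(f"E[X | Y ~ {y0}] ~ {x[sel].mean():.4f}   (y0/2 = {y0/2:.4f})")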
◮ The conditional expectation is defined by
  E[h(X)|Y = y] = Σ_x h(x) fX|Y (x|y),    when X, Y are discrete
  E[h(X)|Y = y] = ∫_{−∞}^{∞} h(x) fX|Y (x|y) dx,    when X, Y have a joint density
◮ We can actually define E[h(X, Y )|Y ] also as above.
That is,
  E[h(X, Y )|Y = y] = ∫_{−∞}^{∞} h(x, y) fX|Y (x|y) dx
◮ It has all the properties of expectation:
1. E[a|Y ] = a where a is a constant
2. E[ah1 (X) + bh2 (X)|Y ] = aE[h1 (X)|Y ] + bE[h2 (X)|Y ]
3. h1 (X) ≥ h2 (X) ⇒ E[h1 (X)|Y ] ≥ E[h2 (X)|Y ]
P S Sastry, IISc, E1 222 Aug 2021 176/248
◮ Conditional expectation also has some extra properties
which are very important
◮ E [E[h(X)|Y ]] = E[h(X)]
◮ E[h1 (X)h2 (Y )|Y ] = h2 (Y )E[h1 (X)|Y ]
◮ E[h(X, Y )|Y = y] = E[h(X, y)|Y = y]
◮ We will justify each of these.
◮ The last property above follows directly from the
definition.
P S Sastry, IISc, E1 222 Aug 2021 177/248
◮ Expectation of a conditional expectation is the
unconditional expectation
E [ E[h(X)|Y ] ] = E[h(X)]
In the above, LHS is expectation of a function of Y .
◮ Let us denote g(Y ) = E[h(X)|Y ]. Then
  E[ E[h(X)|Y ] ] = E[g(Y )]
                  = ∫_{−∞}^{∞} g(y) fY (y) dy
                  = ∫_{−∞}^{∞} [ ∫_{−∞}^{∞} h(x) fX|Y (x|y) dx ] fY (y) dy
                  = ∫_{−∞}^{∞} ∫_{−∞}^{∞} h(x) fXY (x, y) dy dx
                  = ∫_{−∞}^{∞} h(x) fX (x) dx
                  = E[h(X)]
P S Sastry, IISc, E1 222 Aug 2021 178/248
◮ Any factor that depends only on the conditioning variable
behaves like a constant inside a conditional expectation
E[h1 (X) h2 (Y )|Y ] = h2 (Y )E[h1 (X)|Y ]
◮ Let us denote g(Y ) = E[h1 (X) h2 (Y )|Y ]
  g(y) = E[h1 (X) h2 (Y )|Y = y]
       = ∫_{−∞}^{∞} h1 (x) h2 (y) fX|Y (x|y) dx
       = h2 (y) ∫_{−∞}^{∞} h1 (x) fX|Y (x|y) dx
       = h2 (y) E[h1 (X)|Y = y]
  ⇒ E[h1 (X) h2 (Y )|Y ] = g(Y ) = h2 (Y ) E[h1 (X)|Y ]
P S Sastry, IISc, E1 222 Aug 2021 179/248
◮ A very useful property of conditional expectation is
E[ E[X|Y ] ] = E[X] (Assuming all expectations exist)
◮ We can see this in our earlier example.
fXY (x, y) = 2, 0 < x < y < 1
◮ We easily get: EX = 1/3 and EY = 2/3
◮ We also showed E[X|Y ] = Y /2 and E[Y |X] = (1 + X)/2
  E[ E[X|Y ] ] = E[Y /2] = (1/2)(2/3) = 1/3 = E[X]
◮ Similarly
  E[ E[Y |X] ] = E[(1 + X)/2] = (1 + 1/3)/2 = 2/3 = E[Y ]
P S Sastry, IISc, E1 222 Aug 2021 180/248
Example
◮ Let X, Y be random variables with joint density given by
fXY (x, y) = e−y , 0 < x < y < ∞
◮ The marginal densities are:
  fX (x) = ∫_{−∞}^{∞} fXY (x, y) dy = ∫_x^{∞} e^{−y} dy = e^{−x},   x > 0
  fY (y) = ∫_{−∞}^{∞} fXY (x, y) dx = ∫_0^{y} e^{−y} dx = y e^{−y},   y > 0
Thus, X is exponential and Y is gamma.
◮ Hence we have
EX = 1; Var(X) = 1; EY = 2; Var(Y ) = 2
P S Sastry, IISc, E1 222 Aug 2021 181/248
fXY (x, y) = e−y , 0 < x < y < ∞
◮ Let us calculate covariance of X and Y
  E[XY ] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x y fXY (x, y) dx dy = ∫_0^{∞} ∫_0^{y} x y e^{−y} dx dy = ∫_0^{∞} (1/2) y³ e^{−y} dy = 3
◮ Hence, Cov(X, Y ) = E[XY ] − EX EY = 3 − 2 = 1.
◮ ρXY = 1/√2
P S Sastry, IISc, E1 222 Aug 2021 182/248
◮ Recall the joint and marginal densities
fXY (x, y) = e−y , 0 < x < y < ∞
fX (x) = e−x , x > 0; fY (y) = ye−y , y > 0
◮ The conditional densities will be
  fX|Y (x|y) = fXY (x, y)/fY (y) = e^{−y}/(y e^{−y}) = 1/y,    0 < x < y < ∞
  fY |X (y|x) = fXY (x, y)/fX (x) = e^{−y}/e^{−x} = e^{−(y−x)},    0 < x < y < ∞
P S Sastry, IISc, E1 222 Aug 2021 183/248
◮ The conditional densities are
  fX|Y (x|y) = 1/y ;    fY |X (y|x) = e^{−(y−x)},    0 < x < y < ∞
◮ We can now calculate the conditional expectation
  E[X|Y = y] = ∫_0^{y} x fX|Y (x|y) dx = (1/y) ∫_0^{y} x dx = y/2
  Thus E[X|Y ] = Y /2
  E[Y |X = x] = ∫ y fY |X (y|x) dy = ∫_x^{∞} y e^{−(y−x)} dy
              = e^{x} [ (−y e^{−y}) |_x^{∞} + ∫_x^{∞} e^{−y} dy ]
              = e^{x} ( x e^{−x} + e^{−x} ) = 1 + x
  Thus, E[Y |X] = 1 + X
P S Sastry, IISc, E1 222 Aug 2021 184/248
◮ We got
  E[X|Y ] = Y /2 ;    E[Y |X] = 1 + X
◮ Using this we can verify:
  E[ E[X|Y ] ] = E[Y /2] = EY /2 = 2/2 = 1 = EX
  E[ E[Y |X] ] = E[1 + X] = 1 + 1 = 2 = EY
P S Sastry, IISc, E1 222 Aug 2021 185/248
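A simulation sketch for this example: since fY|X(y|x) = e^{−(y−x)}, one can sample X ∼ Exp(1) and then set Y = X plus an independent Exp(1) increment, and check the moments and conditional expectations derived above. Sample sizes and the conditioning bandwidth are arbitrary choices.

    # Sketch: sample from f(x,y) = e^{-y}, 0 < x < y, via X ~ Exp(1), Y = X + Exp(1),
    # and check EX = 1, EY = 2, Cov(X,Y) = 1 and E[Y | X = x] = 1 + x.
    import numpy as np

    rng = np.random.default_rng(6)
    n = 1_000_000
    x = rng.exponential(1.0, n)
    y = x + rng.exponential(1.0, n)    # f_{Y|X}(y|x) = e^{-(y-x)} for y > x

    print("EX  ~", x.mean(), "  EY ~", y.mean())
    print("Cov ~", np.cov(x, y, bias=True)[0, 1], "  (exact 1)")
    print("rho ~", np.corrcoef(x, y)[0, 1], "  (exact 1/sqrt(2) ≈ 0.707)")

    for x0 in (0.5, 2.0):
        sel = np.abs(x - x0) < 0.02
        print(f"E[Y | X ~ {x0}] ~ {y[sel].mean():.3f}   (1 + x0 = {1 + x0})")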
◮ A property of conditional expectation is
E[ E[X|Y ] ] = E[X]
◮ We assume that all three expectations exist.
◮ Very useful in calculating expectations:
  EX = E[ E[X|Y ] ] = Σ_y E[X|Y = y] fY (y)   or   ∫ E[X|Y = y] fY (y) dy
◮ Can be used to calculate probabilities of events too
P (A) = E[IA ] = E [ E [IA |Y ] ]
P S Sastry, IISc, E1 222 Aug 2021 186/248
◮ Let X be geometric and we want EX.
◮ X is number of tosses needed to get head
◮ Let Y ∈ {0, 1} be outcome of first toss. (1 for head)
E[X] = E[ E[X|Y ] ]
= E[X|Y = 1] P [Y = 1] + E[X|Y = 0] P [Y = 0]
= E[X|Y = 1] p + E[X|Y = 0] (1 − p)
         = 1 · p + (1 + EX)(1 − p)
  ⇒ EX (1 − (1 − p)) = p + (1 − p)
  ⇒ EX p = 1
  ⇒ EX = 1/p
P S Sastry, IISc, E1 222 Aug 2021 187/248
◮ P [X = k|Y = 1] = 1 if k = 1 (otherwise it is zero) and
hence E[X|Y = 1] = 1
  P [X = k|Y = 0] = { 0,                           if k = 1
                    { (1 − p)^{k−1} p / (1 − p),    if k ≥ 2
  Hence
  E[X|Y = 0] = Σ_{k=2}^{∞} k (1 − p)^{k−2} p
             = Σ_{k=2}^{∞} (k − 1) (1 − p)^{k−2} p + Σ_{k=2}^{∞} (1 − p)^{k−2} p
             = Σ_{k′=1}^{∞} k′ (1 − p)^{k′−1} p + Σ_{k′=1}^{∞} (1 − p)^{k′−1} p
             = EX + 1
P S Sastry, IISc, E1 222 Aug 2021 188/248
Another example
◮ Example: multiple rounds of the party game
◮ Let Rn denote number of rounds when you start with n
people.
◮ We want R̄n = E [Rn ].
◮ We want to use E [Rn ] = E[ E [Rn |Xn ] ]
◮ We need to think of a useful Xn .
◮ Let Xn be the number of people who got their own hat in
the first round with n people.
P S Sastry, IISc, E1 222 Aug 2021 189/248
◮ Rn – number of rounds when you start with n people.
◮ Xn – number of people who got their own hat in the first
round
  E[Rn ] = E[ E[Rn |Xn ] ]
         = Σ_{i=0}^{n} E[Rn |Xn = i] P [Xn = i]
         = Σ_{i=0}^{n} (1 + E[Rn−i ]) P [Xn = i]
         = Σ_{i=0}^{n} P [Xn = i] + Σ_{i=0}^{n} E[Rn−i ] P [Xn = i]
◮ If we can guess the value of E[Rn ] then we can prove it using mathematical induction.
P S Sastry, IISc, E1 222 Aug 2021 190/248
◮ What would be E[Xn ]?
◮ Let Yi ∈ {0, 1} denote whether or not ith person got his
own hat.
◮ We know
  E[Yi ] = P [Yi = 1] = (n − 1)!/n! = 1/n
◮ Now, Xn = Σ_{i=1}^{n} Yi and hence EXn = Σ_{i=1}^{n} E[Yi ] = 1
◮ Hence a good guess is E[Rn ] = n.
◮ We verify it using mathematical induction. We know
E[R1 ] = 1
P S Sastry, IISc, E1 222 Aug 2021 191/248
◮ Assume: E[Rk ] = k, 1 ≤ k ≤ n − 1
  E[Rn ] = Σ_{i=0}^{n} P [Xn = i] + Σ_{i=0}^{n} E[Rn−i ] P [Xn = i]
         = 1 + E[Rn ] P [Xn = 0] + Σ_{i=1}^{n} E[Rn−i ] P [Xn = i]
         = 1 + E[Rn ] P [Xn = 0] + Σ_{i=1}^{n} (n − i) P [Xn = i]
  E[Rn ] (1 − P [Xn = 0]) = 1 + n (1 − P [Xn = 0]) − Σ_{i=1}^{n} i P [Xn = i]
                          = 1 + n (1 − P [Xn = 0]) − E[Xn ]
                          = 1 + n (1 − P [Xn = 0]) − 1
  ⇒ E[Rn ] = n
P S Sastry, IISc, E1 222 Aug 2021 192/248
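The conclusion E[Rn] = n can also be checked by directly simulating the game. A minimal sketch (the number of simulation runs is an arbitrary choice):

    # Sketch of the party game: in each round the remaining people receive a random
    # permutation of their own hats; those who get their own hat leave. E[R_n] should be n.
    import numpy as np

    rng = np.random.default_rng(7)

    def rounds(n):
        remaining, r = n, 0
        while remaining > 0:
            r += 1
            perm = rng.permutation(remaining)
            fixed = np.count_nonzero(perm == np.arange(remaining))  # own-hat matches
            remaining -= fixed
        return r

    for n in (3, 5, 10):
        est = np.mean([rounds(n) for _ in range(20_000)])
        print(f"n = {n}: average rounds ~ {est:.3f}   (claim: {n})")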
Analysis of Quicksort
◮ Given n numbers we want to sort them. Many algorithms.
◮ Complexity – order of the number of comparisons needed
◮ Quicksort: Choose a pivot. Separate the numbers into two parts – less than and greater than the pivot – and recurse on each part.
◮ Separating into two parts takes n − 1 comparisons.
◮ Suppose the two parts contain m and n − m − 1 numbers. The comparisons needed to separate each of them into two parts depend on m.
◮ So, final number of comparisons depends on the ‘number
of rounds’
P S Sastry, IISc, E1 222 Aug 2021 193/248
quicksort details
◮ Given {x1 , · · · , xn }.
◮ Choose first as pivot
{xj1 , xj2 , · · · , xjm }x1 {xk1 , xk2 , · · · , xkn−1−m }
◮ Suppose rn is the number of comparisons. If we get (roughly) equal parts, then
  rn ≈ n + 2 r_{n/2} = n + 2 (n/2 + 2 r_{n/4}) = n + n + 4 r_{n/4} = · · · ≈ n log2 (n)
◮ If all the rest go into one part, then
  rn = n + rn−1 = n + (n − 1) + rn−2 = · · · = n(n + 1)/2
◮ If you are lucky, O(n log(n)) comparisons.
◮ If unlucky, in the worst case, O(n2 ) comparisons
◮ Question: ‘on the average’ how many comparisons?
P S Sastry, IISc, E1 222 Aug 2021 194/248
Average case complexity of quicksort
◮ Assume pivot is equally likely to be the smallest or second
smallest or mth smallest.
◮ Mn – number of comparisons.
◮ Define: X = j if pivot is j th smallest
◮ Given X = j we know Mn = (n − 1) + Mj−1 + Mn−j .
  E[Mn ] = E[ E[Mn |X] ] = Σ_{j=1}^{n} E[Mn |X = j] P [X = j]
         = Σ_{j=1}^{n} E[(n − 1) + Mj−1 + Mn−j ] (1/n)
         = (n − 1) + (2/n) Σ_{k=1}^{n−1} E[Mk ],    (taking M0 = 0)
◮ This is a recurrence relation. (A little complicated to
solve)
P S Sastry, IISc, E1 222 Aug 2021 195/248
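The recurrence is easy to evaluate numerically, and a direct simulation of quicksort's comparison count (with the pivot taken as the first element of a uniformly random arrangement, as assumed above) gives a cross-check. A sketch:

    # Sketch: evaluate E[M_n] = (n-1) + (2/n) * sum_{k=1}^{n-1} E[M_k] and compare
    # with a direct simulation of quicksort comparison counts.
    import random

    def expected_comparisons(nmax):
        m = [0.0] * (nmax + 1)           # m[0] = 0
        running_sum = 0.0                # sum of m[0..n-1]
        for n in range(1, nmax + 1):
            m[n] = (n - 1) + 2.0 * running_sum / n
            running_sum += m[n]
        return m

    def quicksort_comparisons(a):
        if len(a) <= 1:
            return 0
        pivot = a[0]
        left = [v for v in a[1:] if v < pivot]
        right = [v for v in a[1:] if v >= pivot]
        return (len(a) - 1) + quicksort_comparisons(left) + quicksort_comparisons(right)

    n = 100
    m = expected_comparisons(n)
    sim = sum(quicksort_comparisons(random.sample(range(10_000), n))
              for _ in range(2000)) / 2000
    print("recurrence E[M_100] =", round(m[n], 2), "  simulation ~", round(sim, 2))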
Least squares estimation
◮ We want to estimate Y as a function of X.
◮ We want an estimate with minimum mean square error.
◮ We want to solve (the min is over all functions g)
  min_g E[ (Y − g(X))² ]
◮ Earlier we considered only linear functions:
g(X) = aX + b
◮ Now we want the ‘best’ function (linear or nonlinear)
◮ The solution now turns out to be
g ∗ (X) = E[Y |X]
◮ Let us prove this.
P S Sastry, IISc, E1 222 Aug 2021 196/248
◮ We want to show that for all g
E[ (E[Y | X] − Y )² ] ≤ E[ (g(X) − Y )² ]
◮ We have
  (g(X) − Y )² = ( (g(X) − E[Y | X]) + (E[Y | X] − Y ) )²
               = (g(X) − E[Y | X])² + (E[Y | X] − Y )²
                 + 2 (g(X) − E[Y | X]) (E[Y | X] − Y )
◮ Now we can take expectation on both sides.
◮ We first show that expectation of last term on RHS
above is zero.
P S Sastry, IISc, E1 222 Aug 2021 197/248
First consider the last term:
  E[ (g(X) − E[Y | X]) (E[Y | X] − Y ) ]
   = E[ E[ (g(X) − E[Y | X]) (E[Y | X] − Y ) | X ] ]
       (because E[Z] = E[ E[Z|X] ])
   = E[ (g(X) − E[Y | X]) E[ (E[Y | X] − Y ) | X ] ]
       (because E[h1 (X) h2 (Z)|X] = h1 (X) E[h2 (Z)|X])
   = E[ (g(X) − E[Y | X]) ( E[Y | X] − E[Y | X] ) ]
   = 0
P S Sastry, IISc, E1 222 Aug 2021 198/248
◮ We earlier got
  (g(X) − Y )² = (g(X) − E[Y | X])² + (E[Y | X] − Y )² + 2 (g(X) − E[Y | X]) (E[Y | X] − Y )
◮ Hence we get
  E[ (g(X) − Y )² ] = E[ (g(X) − E[Y | X])² ] + E[ (E[Y | X] − Y )² ]
                    ≥ E[ (E[Y | X] − Y )² ]
◮ Since the above is true for all functions g, we get
g ∗ (X) = E [Y | X]
P S Sastry, IISc, E1 222 Aug 2021 199/248
Sum of random number of random variables
◮ Let X1 , X2 , · · · be iid rv on the same probability space.
Suppose EXi = µ < ∞, ∀i.
◮ Let N be a positive integer valued rv that is independent
of all Xi (EN < ∞)
◮ Let S = Σ_{i=1}^{N} Xi .
◮ We want to calculate ES.
◮ We can use
E[S] = E[ E[S|N ] ]
P S Sastry, IISc, E1 222 Aug 2021 200/248
◮ We have
" N
#
X
E[S|N = n] = E Xi | N = n
" i=1
n
#
X
= E Xi | N = n
i=1
since E[h(X, Y )|Y = y] = E[h(X, y)|Y = y]
n
X Xn
= E[Xi | N = n] = E[Xi ] = nµ
i=1 i=1
◮ Hence we get
E[S|N ] = N µ ⇒ E[S] = E[N ]E[X1 ]
P S Sastry, IISc, E1 222 Aug 2021 201/248
Wald’s formula
◮ We took S = Σ_{i=1}^{N} Xi with N independent of all Xi .
◮ With iid Xi , the formula ES = EN EX1 is valid even under some dependence between N and Xi .
◮ Here is one version of the assumptions needed:
  A1. E[|X1 |] < ∞ and EN < ∞ (Xi iid)
  A2. E[ Xn I[N ≥n] ] = E[Xn ] P [N ≥ n], ∀n
◮ Let SN = Σ_{i=1}^{N} Xi .
◮ Then, ESN = EX1 EN
◮ Suppose the event [N ≤ n − 1] depends only on
X1 , · · · , Xn−1 .
◮ Such an N is called a stopping time.
◮ Then the event [N ≤ n − 1] and hence its complement
[N ≥ n] is independent of Xn and hence A2 holds.
P S Sastry, IISc, E1 222 Aug 2021 202/248
Wald’s formula
◮ In the general case, we do not need Xi to be iid.
◮ Here is one version of this Wald’s formula. We assume
  1. E[|Xi |] < ∞, ∀i, and EN < ∞
  2. E[ Xn I[N ≥n] ] = E[Xn ] P [N ≥ n], ∀n
◮ Let SN = Σ_{i=1}^{N} Xi and let TN = Σ_{i=1}^{N} E[Xi ].
◮ Then, ESN = ETN .
If E[Xi ] is same for all i, ESN = EX1 EN .
P S Sastry, IISc, E1 222 Aug 2021 203/248
Variance of random sum
◮ S = Σ_{i=1}^{N} Xi , Xi iid, independent of N . Want Var(S).
  E[S²] = E[ ( Σ_{i=1}^{N} Xi )² ] = E[ E[ ( Σ_{i=1}^{N} Xi )² | N ] ]
◮ As earlier, we have
  E[ ( Σ_{i=1}^{N} Xi )² | N = n ] = E[ ( Σ_{i=1}^{n} Xi )² | N = n ] = E[ ( Σ_{i=1}^{n} Xi )² ]
P S Sastry, IISc, E1 222 Aug 2021 204/248
◮ Let Y = Σ_{i=1}^{n} Xi , with Xi iid.
◮ Then, Var(Y ) = n Var(X1 )
◮ Hence we have
E[Y 2 ] = Var(Y ) + (EY )2 = n Var(X1 ) + (nEX1 )2
◮ Using this
  E[ ( Σ_{i=1}^{N} Xi )² | N = n ] = E[ ( Σ_{i=1}^{n} Xi )² ] = n Var(X1 ) + (n EX1 )²
◮ Hence
  E[ ( Σ_{i=1}^{N} Xi )² | N ] = N Var(X1 ) + N² (EX1 )²
P S Sastry, IISc, E1 222 Aug 2021 205/248
◮ S = Σ_{i=1}^{N} Xi (Xi iid). We got
  E[S²] = E[ E[S²|N ] ] = EN Var(X1 ) + E[N²](EX1 )²
◮ Now we can calculate variance of S as
  Var(S) = E[S²] − (ES)²
         = EN Var(X1 ) + E[N²](EX1 )² − (EN EX1 )²
         = EN Var(X1 ) + (EX1 )² ( E[N²] − (EN )² )
         = EN Var(X1 ) + Var(N ) (EX1 )²
P S Sastry, IISc, E1 222 Aug 2021 206/248
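A quick check of both formulas, E[S] = EN EX1 and Var(S) = EN Var(X1) + Var(N)(EX1)², on an arbitrary illustrative choice of N ∼ Poisson(4) and Xi ∼ Exp with mean 2:

    # Sketch: random sum S = X_1 + ... + X_N with X_i iid, N independent of the X_i.
    import numpy as np

    rng = np.random.default_rng(8)
    trials = 200_000
    lam, mean_x = 4.0, 2.0               # EN = Var(N) = 4;  EX = 2, Var(X) = 4

    N = rng.poisson(lam, trials)
    S = np.array([rng.exponential(mean_x, n).sum() for n in N])

    print("E[S]   ~", S.mean(), "   formula:", lam * mean_x)                    # 8
    print("Var(S) ~", S.var(),  "   formula:", lam * 4.0 + lam * mean_x**2)     # 32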
Another Example
◮ We toss a (biased) coin till we get k consecutive heads.
Let Nk denote the number of tosses needed.
◮ N1 would be geometric.
◮ We want E[Nk ]. What rv should we condition on?
◮ Useful rv here is Nk−1
E[Nk | Nk−1 = n] = (n + 1)p + (1 − p)(n + 1 + E[Nk ])
◮ Thus we get the recurrence relation
E[Nk ] = E[ E[Nk | Nk−1 ] ]
= E [(Nk−1 + 1)p + (1 − p)(Nk−1 + 1 + E[Nk ])]
P S Sastry, IISc, E1 222 Aug 2021 207/248
◮ We have
E[Nk ] = E [(Nk−1 + 1)p + (1 − p)(Nk−1 + 1 + E[Nk ])]
◮ Denoting Mk = E[Nk ], we get
  Mk = p Mk−1 + p + (1 − p) Mk−1 + (1 − p) + (1 − p) Mk
  p Mk = Mk−1 + 1
  Mk = (1/p) Mk−1 + 1/p
     = (1/p) ( (1/p) Mk−2 + 1/p ) + 1/p = (1/p)² Mk−2 + (1/p)² + 1/p
     = (1/p)^{k−1} M1 + Σ_{j=1}^{k−1} (1/p)^j
     = (1 − p^k) / ( (1 − p) p^k )     (taking M1 = 1/p)
P S Sastry, IISc, E1 222 Aug 2021 208/248
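A simulation sketch comparing the formula Mk = (1 − p^k)/((1 − p)p^k) with the empirical average number of tosses (the values of p, k and the number of runs are arbitrary choices):

    # Sketch: simulate the number of tosses needed to get k consecutive heads.
    import numpy as np

    rng = np.random.default_rng(9)

    def tosses_until_k_heads(k, p):
        run, t = 0, 0
        while run < k:
            t += 1
            run = run + 1 if rng.random() < p else 0
        return t

    p, k = 0.5, 3
    est = np.mean([tosses_until_k_heads(k, p) for _ in range(100_000)])
    exact = (1 - p**k) / ((1 - p) * p**k)
    print(f"simulated E[N_{k}] ~ {est:.2f}   formula: {exact:.2f}")   # 14 for p=0.5, k=3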
◮ As mentioned earlier, we can use the conditional
expectation to calculate probabilities of events also.
P (A) = E[IA ] = E [ E [IA |Y ] ]
E[IA |Y = y] = P [IA = 1|Y = y] = P (A|Y = y)
◮ Thus, we get
  P (A) = E[IA ] = E[ E[IA |Y ] ]
        = Σ_y P (A|Y = y) P [Y = y],    when Y is discrete
        = ∫ P (A|Y = y) fY (y) dy,    when Y is continuous
P S Sastry, IISc, E1 222 Aug 2021 209/248
Example
◮ Let X, Y be independent continuous rv
◮ We want to calculate P [X ≤ Y ]
◮ We can calculate it by integrating joint density over
A = {(x, y) : x ≤ y}
  P [X ≤ Y ] = ∫∫_A fX (x) fY (y) dx dy
             = ∫_{−∞}^{∞} fY (y) [ ∫_{−∞}^{y} fX (x) dx ] dy
             = ∫_{−∞}^{∞} FX (y) fY (y) dy
◮ If X, Y are iid then P [X < Y ] = 0.5
P S Sastry, IISc, E1 222 Aug 2021 210/248
◮ We can also use the conditional expectation method here
  P [X ≤ Y ] = ∫_{−∞}^{∞} P [X ≤ Y | Y = y] fY (y) dy
             = ∫_{−∞}^{∞} P [X ≤ y | Y = y] fY (y) dy
             = ∫_{−∞}^{∞} P [X ≤ y] fY (y) dy
             = ∫_{−∞}^{∞} FX (y) fY (y) dy
P S Sastry, IISc, E1 222 Aug 2021 211/248
Another Example
◮ Consider a sequence of Bernoulli trials where p, the probability of success, is random.
◮ We first choose p uniformly over (0, 1) and then perform
n tosses.
◮ Let X be the number of heads.
◮ Conditioned on knowledge of p, we know distribution of
X
P [X = k | p] = nCk p^k (1 − p)^{n−k}
◮ Now we can calculate P [X = k] using the conditioning
argument.
P S Sastry, IISc, E1 222 Aug 2021 212/248
◮ Assuming p is chosen uniformly from (0, 1), we get
  P [X = k] = ∫ P [X = k | p] f (p) dp
            = ∫_0^{1} nCk p^k (1 − p)^{n−k} · 1 dp
            = nCk k!(n − k)!/(n + 1)!
              ( because ∫_0^{1} p^k (1 − p)^{n−k} dp = Γ(k + 1)Γ(n − k + 1)/Γ(n + 2) )
            = 1/(n + 1)
◮ So, we get: P [X = k] = 1/(n + 1),   k = 0, 1, · · · , n
P S Sastry, IISc, E1 222 Aug 2021 213/248
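This somewhat surprising uniform answer is easy to check by simulation: draw p uniformly, then draw a Binomial(n, p) count, and tabulate the outcomes. A sketch with n = 10 (an arbitrary choice):

    # Sketch: p ~ U(0,1), then X ~ Binomial(n, p); claim: P[X = k] = 1/(n+1).
    import numpy as np

    rng = np.random.default_rng(10)
    n, trials = 10, 1_000_000
    p = rng.uniform(0.0, 1.0, trials)
    x = rng.binomial(n, p)                 # one binomial draw per sampled p

    counts = np.bincount(x, minlength=n + 1) / trials
    print("empirical P[X = k]:", np.round(counts, 4))
    print("claimed value 1/(n+1) =", 1 / (n + 1))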
Tower property of Conditional Expectation
◮ Conditional expectation satisfies
E[ E[h(X)|Y, Z] | Y ] = E[h(X)|Y ]
Note that all these can be random vectors.
◮ Let
g1 (Y, Z) = E[h(X)|Y, Z]
g2 (Y ) = E[g1 (Y, Z)|Y ]
We want to show g2 (Y ) = E[h(X)|Y ]
P S Sastry, IISc, E1 222 Aug 2021 214/248
◮ Recall: g1 (Y, Z) = E[h(X)|Y, Z], g2 (Y ) = E[g1 (Y, Z)|Y ]
  g2 (y) = ∫ g1 (y, z) fZ|Y (z|y) dz
         = ∫ [ ∫ h(x) fX|Y Z (x|y, z) dx ] fZ|Y (z|y) dz
         = ∫ h(x) [ ∫ fX|Y Z (x|y, z) fZ|Y (z|y) dz ] dx
         = ∫ h(x) [ ∫ fXZ|Y (x, z|y) dz ] dx
         = ∫ h(x) fX|Y (x|y) dx
◮ Thus we get
E[ E[h(X)|Y, Z] | Y ] = E[h(X)|Y ]
P S Sastry, IISc, E1 222 Aug 2021 215/248
Gaussian or Normal distribution
◮ The Gaussian or normal density is given by
  f (x) = (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)},   −∞ < x < ∞
◮ If X has this density, we denote it as X ∼ N (µ, σ 2 ).
We showed EX = µ and Var(X) = σ 2
◮ The density is a ‘bell-shaped’ curve
P S Sastry, IISc, E1 222 Aug 2021 216/248
◮ Standard Normal rv — X ∼ N (0, 1)
◮ The distribution function of standard normal is
  Φ(x) = ∫_{−∞}^{x} (1/√(2π)) e^{−t²/2} dt
◮ Suppose X ∼ N (µ, σ 2 )
  P [a ≤ X ≤ b] = ∫_a^{b} (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)} dx
    take y = (x − µ)/σ ⇒ dy = dx/σ
                = ∫_{(a−µ)/σ}^{(b−µ)/σ} (1/√(2π)) e^{−y²/2} dy
                = Φ( (b − µ)/σ ) − Φ( (a − µ)/σ )
◮ We can express probabilities of events involving any Normal rv in terms of Φ.
P S Sastry, IISc, E1 222 Aug 2021 217/248
◮ X ∼ N (0, 1). Then its mgf is
  MX (t) = E[ e^{tX} ] = ∫_{−∞}^{∞} e^{tx} (1/√(2π)) e^{−x²/2} dx
         = (1/√(2π)) ∫_{−∞}^{∞} e^{−(x² − 2tx)/2} dx
         = (1/√(2π)) ∫_{−∞}^{∞} e^{−((x−t)² − t²)/2} dx
         = e^{t²/2} (1/√(2π)) ∫_{−∞}^{∞} e^{−(x−t)²/2} dx
         = e^{t²/2}
◮ Now let Y = σX + µ. Then Y ∼ N (µ, σ²).
  The mgf of Y is
  MY (t) = E[ e^{t(σX+µ)} ] = e^{tµ} E[ e^{(tσ)X} ] = e^{tµ} MX (tσ) = e^{µt + σ²t²/2}
P S Sastry, IISc, E1 222 Aug 2021 218/248
Multi-dimensional Gaussian Distribution
◮ The n-dimensional Gaussian density is given by
  fX (x) = (1/( (2π)^{n/2} |Σ|^{1/2} )) e^{−(1/2)(x−µ)T Σ^{−1} (x−µ)},   x ∈ ℜn
◮ µ ∈ ℜn and Σ ∈ ℜn×n are parameters of the density and
Σ is symmetric and positive definite.
◮ If X1 , · · · , Xn have the above joint density, they are said
to be jointly Gaussian.
◮ We denote this by X ∼ N (µ, Σ)
◮ We will now show that this is a joint density function.
P S Sastry, IISc, E1 222 Aug 2021 219/248
◮ We begin by showing the following is a density (when M
is symmetric +ve definite)
  fY (y) = C e^{−(1/2) yT M y}
◮ Let I = ∫_{ℜ^n} C e^{−(1/2) yT M y} dy
◮ Since M is real symmetric, there exists an orthogonal
transform, L with L−1 = LT , |L| = 1 and LT M L is
diagonal
◮ Let LT M L = diag(m1 , · · · , mn ).
◮ Then for any z ∈ ℜn ,
  zT LT M L z = Σ_i mi zi²
P S Sastry, IISc, E1 222 Aug 2021 220/248
◮ We now get
  I = C ∫_{ℜ^n} e^{−(1/2) yT M y} dy
  change variable: z = L^{−1} y = LT y ⇒ y = Lz
    = C ∫_{ℜ^n} e^{−(1/2) zT LT M L z} dz    (note that |L| = 1)
    = C ∫_{ℜ^n} e^{−(1/2) Σ_i mi zi²} dz
    = C Π_{i=1}^{n} ∫_ℜ e^{−(1/2) mi zi²} dzi
    = C Π_{i=1}^{n} √(2π/mi )
P S Sastry, IISc, E1 222 Aug 2021 221/248
◮ We will first relate m1 · · · mn to the matrix M .
◮ By definition, LT M L = diag(m1 , · · · , mn ). Hence
  diag(1/m1 , · · · , 1/mn ) = (LT M L)^{−1} = L^{−1} M^{−1} (LT )^{−1} = LT M^{−1} L
◮ Since |L| = 1, we get
  |LT M^{−1} L| = |M^{−1}| = 1/(m1 · · · mn )
◮ Putting all this together,
  ∫_{ℜ^n} C e^{−(1/2) yT M y} dy = C Π_{i=1}^{n} √(2π/mi ) = C (2π)^{n/2} |M^{−1}|^{1/2}
  ⇒ ∫_{ℜ^n} (1/( (2π)^{n/2} |M^{−1}|^{1/2} )) e^{−(1/2) yT M y} dy = 1
P S Sastry, IISc, E1 222 Aug 2021 222/248
◮ We showed the following is a density (taking M −1 = Σ)
  fY (y) = (1/( (2π)^{n/2} |Σ|^{1/2} )) e^{−(1/2) yT Σ^{−1} y},   y ∈ ℜn
◮ Let X = Y + µ. Then
  fX (x) = fY (x − µ) = (1/( (2π)^{n/2} |Σ|^{1/2} )) e^{−(1/2)(x−µ)T Σ^{−1} (x−µ)}
◮ This is the multidimensional Gaussian distribution
P S Sastry, IISc, E1 222 Aug 2021 223/248
◮ Consider Y with joint density
  fY (y) = (1/( (2π)^{n/2} |Σ|^{1/2} )) e^{−(1/2) yT Σ^{−1} y},   y ∈ ℜn
◮ As earlier let M = Σ^{−1}. Let LT M L = diag(m1 , · · · , mn )
◮ Define Z = (Z1 , · · · , Zn )T = LT Y. Then Y = LZ.
◮ Recall |L| = 1, |M^{−1}| = (m1 · · · mn )^{−1}
◮ Then the density of Z is
  fZ (z) = (1/( (2π)^{n/2} |M^{−1}|^{1/2} )) e^{−(1/2) zT LT M L z}
         = (1/(2π)^{n/2}) (m1 · · · mn )^{1/2} e^{−(1/2) Σ_i mi zi²}
         = Π_{i=1}^{n} (1/√(2π (1/mi ))) e^{−zi²/(2 (1/mi ))}
  This shows that Zi ∼ N (0, 1/mi ) and the Zi are independent.
P S Sastry, IISc, E1 222 Aug 2021 224/248
◮ If Y has density fY and Z = LT Y then Zi ∼ N (0, 1/mi ) and the Zi are independent. Hence,
  ΣZ = diag(1/m1 , · · · , 1/mn ) = LT M^{−1} L
◮ Also, since E[Zi ] = 0, ΣZ = E[ZZT ].
◮ Since Y = LZ, E[Y] = 0 and
  ΣY = E[YYT ] = E[LZZT LT ] = L E[ZZT ] LT = L(LT M^{−1} L)LT = M^{−1}
◮ Thus, if Y has density
  fY (y) = (1/( (2π)^{n/2} |Σ|^{1/2} )) e^{−(1/2) yT Σ^{−1} y},   y ∈ ℜn
  then EY = 0 and ΣY = M^{−1} = Σ
P S Sastry, IISc, E1 222 Aug 2021 225/248
◮ Let Y have density
  fY (y) = (1/( (2π)^{n/2} |Σ|^{1/2} )) e^{−(1/2) yT Σ^{−1} y},   y ∈ ℜn
◮ Let X = Y + µ. Then
  fX (x) = (1/( (2π)^{n/2} |Σ|^{1/2} )) e^{−(1/2)(x−µ)T Σ^{−1} (x−µ)}
◮ We have
  EX = E[Y + µ] = µ
  ΣX = E[(X − µ)(X − µ)T ] = E[YYT ] = Σ
P S Sastry, IISc, E1 222 Aug 2021 226/248
Multi-dimensional Gaussian density
◮ X = (X1 , · · · , Xn )T are said to be jointly Gaussian if
  fX (x) = (1/( (2π)^{n/2} |Σ|^{1/2} )) e^{−(1/2)(x−µ)T Σ^{−1} (x−µ)}
◮ EX = µ and ΣX = Σ.
◮ Suppose Cov(Xi , Xj ) = 0, ∀i ≠ j ⇒ Σij = 0, ∀i ≠ j.
◮ Then Σ is diagonal. Let Σ = diag(σ1², · · · , σn²).
  fX (x) = (1/( (2π)^{n/2} σ1 · · · σn )) e^{−(1/2) Σ_{i=1}^{n} ((xi −µi )/σi )²} = Π_{i=1}^{n} (1/(σi √(2π))) e^{−(1/2)((xi −µi )/σi )²}
◮ This implies the Xi are independent.
◮ If X1 , · · · , Xn are jointly Gaussian then uncorrelatedness
implies independence.
P S Sastry, IISc, E1 222 Aug 2021 227/248
◮ Let X = (X1 , · · · , Xn )T be jointly Gaussian:
  fX (x) = (1/( (2π)^{n/2} |Σ|^{1/2} )) e^{−(1/2)(x−µ)T Σ^{−1} (x−µ)}
◮ Let Y = X − µ.
◮ Let M = Σ−1 and L be such that
LT M L = diag(m1 , · · · , mn )
◮ Let Z = (Z1 , · · · , Zn )T = LT Y .
◮ Then we saw that Zi ∼ N (0, 1/mi ) and Zi are independent.
◮ If X1 , · · · , Xn are jointly Gaussian then there is a ‘linear’
transform that transforms them into independent random
variables.
P S Sastry, IISc, E1 222 Aug 2021 228/248
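A numerical illustration: diagonalizing ΣX with an orthogonal L (equivalently, diagonalizing M = ΣX^{−1}, since the same L works for both) and applying Z = Lᵀ(X − µ) to samples yields an (empirically) diagonal covariance. The particular µ and Σ below are arbitrary choices.

    # Sketch: Z = L^T (X - mu), with L the orthogonal eigenvector matrix of Sigma,
    # has (near-)diagonal covariance, i.e. the components are decorrelated.
    import numpy as np

    rng = np.random.default_rng(11)
    mu = np.array([1.0, -2.0, 0.5])
    Sigma = np.array([[2.0, 0.9, 0.4],
                      [0.9, 1.5, 0.3],
                      [0.4, 0.3, 1.0]])

    X = rng.multivariate_normal(mu, Sigma, size=400_000)
    eigvals, L = np.linalg.eigh(Sigma)        # Sigma = L diag(eigvals) L^T, L orthogonal
    Z = (X - mu) @ L                          # row-wise version of Z = L^T (X - mu)

    print("covariance of Z (should be ~diagonal):")
    print(np.round(np.cov(Z, rowvar=False), 3))
    print("eigenvalues of Sigma:", np.round(eigvals, 3))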
Moment generating function
◮ Let X = (X1 , · · · , Xn )T be jointly Gaussian
◮ Let Y = X − µ and Z = (Z1 , · · · , Zn )T = LT Y as earlier
◮ The moment generating function of X is given by
  MX (s) = E[ e^{sT X} ]
         = E[ e^{sT (Y+µ)} ] = e^{sT µ} E[ e^{sT Y} ]
         = e^{sT µ} E[ e^{sT LZ} ]
         = e^{sT µ} E[ e^{uT Z} ]    where u = LT s
         = e^{sT µ} MZ (u)
P S Sastry, IISc, E1 222 Aug 2021 229/248
◮ Since Zi are independent, easy to get MZ .
◮ We know Zi ∼ N (0, 1/mi ). Hence
  MZi (ui ) = e^{ui²/(2mi )}
  MZ (u) = E[ e^{uT Z} ] = Π_{i=1}^{n} E[ e^{ui Zi} ] = Π_{i=1}^{n} e^{ui²/(2mi )} = e^{Σ_i ui²/(2mi )}
◮ We derived earlier
  MX (s) = e^{sT µ} MZ (u),   where u = LT s
P S Sastry, IISc, E1 222 Aug 2021 230/248
◮ We got
  MX (s) = e^{sT µ} MZ (u);   u = LT s;   MZ (u) = e^{Σ_i ui²/(2mi )}
◮ Earlier we have shown LT M^{−1} L = diag(1/m1 , · · · , 1/mn ) where M^{−1} = Σ. Now we get
  (1/2) Σ_i ui²/mi = (1/2) uT (LT M^{−1} L) u = (1/2) sT M^{−1} s = (1/2) sT Σ s
◮ Hence we get
  MX (s) = e^{sT µ + (1/2) sT Σ s}
◮ This is the moment generating function of the multi-dimensional Normal density
P S Sastry, IISc, E1 222 Aug 2021 231/248
◮ Let X, Y be jointly Gaussian. For simplicity let EX = EY = 0.
◮ Let Var(X) = σx², Var(Y ) = σy²; let ρXY = ρ ⇒ Cov(X, Y ) = ρ σx σy .
◮ Now, the covariance matrix and its inverse are given by
  Σ = [ σx²       ρ σx σy ]      Σ^{−1} = (1/(σx² σy² (1 − ρ²))) [ σy²        −ρ σx σy ]
      [ ρ σx σy   σy²     ]                                       [ −ρ σx σy    σx²    ]
◮ The joint density of X, Y is given by
  fXY (x, y) = (1/(2π σx σy √(1 − ρ²))) exp( −(1/(2(1 − ρ²))) [ x²/σx² + y²/σy² − 2ρxy/(σx σy ) ] )
◮ This is the bivariate Gaussian density
P S Sastry, IISc, E1 222 Aug 2021 232/248
◮ Suppose X, Y are jointly Gaussian (with the density
above)
◮ Then, all the marginals and conditionals would be
Gaussian.
◮ X ∼ N (0, σx²), and Y ∼ N (0, σy²)
◮ fX|Y (x|y) would be a Gaussian density with mean ρ(σx /σy ) y and variance σx²(1 − ρ²).
P S Sastry, IISc, E1 222 Aug 2021 233/248
◮ Let X = (X1 , · · · , Xn )T be jointly Gaussian.
◮ Then we call X as a Gaussian vector.
◮ It is possible that Xi , i = 1, · · · , n are individually
Gaussian but X is not a Gaussian vector.
◮ For example, X, Y may be individually Gaussian but their
joint density is not the bivariate normal density.
◮ Gaussian vectors have some special properties. (E.g.,
uncorrelated implies independence)
◮ Important to note that ‘individually Gaussian’ does not
mean ‘jointly Gaussian’
P S Sastry, IISc, E1 222 Aug 2021 234/248
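One standard construction of such a pair (an illustration chosen here, not taken from the slides): let X ∼ N(0, 1) and Y = SX where S is an independent random sign. Each of X, Y is N(0, 1) and they are uncorrelated, but (X, Y) is not jointly Gaussian, since X + Y takes the value 0 with probability 1/2, and they are clearly not independent.

    # Sketch: X ~ N(0,1), Y = S*X with S = +/-1 a fair sign independent of X.
    import numpy as np

    rng = np.random.default_rng(12)
    n = 1_000_000
    x = rng.standard_normal(n)
    s = rng.choice([-1.0, 1.0], size=n)
    y = s * x

    print("mean/var of Y:", y.mean(), y.var())            # ~0 and ~1, so Y is N(0,1)
    print("Cov(X, Y) ~", np.cov(x, y, bias=True)[0, 1])   # ~0: uncorrelated
    print("P[X + Y = 0] ~", np.mean(x + y == 0.0))        # ~0.5: X + Y is not Gaussian
    print("P[|Y| = |X|] =", np.mean(np.abs(y) == np.abs(x)))  # 1: not independent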
◮ The multi-dimensional Gaussian density has some
important properties.
◮ We have seen some of them earlier.
◮ If X1 , · · · , Xn are jointly Gaussian then they are
independent if they are uncorrelated.
◮ Suppose X1 , · · · , Xn are jointly Gaussian and have zero means. Then there is an orthogonal transform Y = AX such that Y1 , · · · , Yn are jointly Gaussian and independent.
◮ Another important property is the following
◮ X1 , · · · , Xn are jointly Gaussian if and only if tT X is Gaussian for all non-zero t ∈ ℜn .
◮ We will prove this using moment generating functions
P S Sastry, IISc, E1 222 Aug 2021 235/248
◮ Suppose X = (X1 , · · · , Xn )T is jointly Gaussian and let W = tT X.
◮ Let µX and ΣX denote the mean vector and covariance matrix of X. Then
  µw ≜ EW = tT µX ;    σw² ≜ Var(W ) = tT ΣX t
◮ The mgf of W is given by
  MW (u) = E[ e^{uW} ] = E[ e^{u tT X} ]
         = MX (ut) = e^{u tT µX + (1/2) u² tT ΣX t}
         = e^{u µw + (1/2) u² σw²}
  showing that W is Gaussian.
◮ Shows density of Xi is Gaussian for each i. For example,
if we take t = (1, 0, 0, · · · , 0)T then tT X would be X1 .
P S Sastry, IISc, E1 222 Aug 2021 236/248
◮ Now suppose W = tT X is Gaussian for all t ≠ 0.
  MW (u) = e^{u µw + (1/2) u² σw²} = e^{u tT µX + (1/2) u² tT ΣX t}
◮ This implies
  E[ e^{u tT X} ] = e^{u tT µX + (1/2) u² tT ΣX t},   ∀u ∈ ℜ, ∀t ∈ ℜn , t ≠ 0
  E[ e^{tT X} ] = e^{tT µX + (1/2) tT ΣX t},   ∀t
  This implies X is jointly Gaussian.
◮ This is a defining property of multidimensional Gaussian
density
P S Sastry, IISc, E1 222 Aug 2021 237/248
◮ Let X = (X1 , · · · , Xn )T be jointly Gaussian.
◮ Let A be a k × n matrix with rank k.
◮ Then Y = AX is jointly Gaussian.
◮ We will once again show this using the moment
generating function.
◮ Let µx and Σx denote mean vector and covariance matrix
of X. Similarly µy and Σy for Y
◮ We have µy = Aµx and
  Σy = E[ (Y − µy )(Y − µy )T ]
     = E[ (A(X − µx ))(A(X − µx ))T ]
     = E[ A(X − µx )(X − µx )T AT ]
     = A E[ (X − µx )(X − µx )T ] AT = A Σx AT
P S Sastry, IISc, E1 222 Aug 2021 238/248
◮ The mgf of Y is
  MY (s) = E[ e^{sT Y} ]    (s ∈ ℜk )
         = E[ e^{sT AX} ]
         = MX (AT s)
           (Recall MX (t) = e^{tT µx + (1/2) tT Σx t})
         = e^{sT Aµx + (1/2) sT A Σx AT s}
         = e^{sT µy + (1/2) sT Σy s}
  This shows Y is jointly Gaussian
P S Sastry, IISc, E1 222 Aug 2021 239/248
◮ X is jointly Gaussian and A is a k × n matrix with rank k.
◮ Then Y = AX is jointly Gaussian.
◮ This shows all marginals of X are Gaussian
◮ For example, if you take A to be
  A = [ 1 0 0 · · · 0 ]
      [ 0 1 0 · · · 0 ]
  then Y = (X1 , X2 )T
P S Sastry, IISc, E1 222 Aug 2021 240/248
◮ Finding the distribution of a rv by calculating its mgf is
useful in many situations.
◮ Let X1 , X2 , · · · be iid with mgf MX (t).
◮ Let SN = Σ_{i=1}^{N} Xi where N is a positive integer valued rv which is independent of all Xi .
◮ We want to find out the distribution of SN .
◮ We can calculate mgf of SN in terms of MX and
distribution of N .
◮ We can use properties of conditional expectation for this
P S Sastry, IISc, E1 222 Aug 2021 241/248
◮ The mgf of SN is MSN (t) = E[ e^{tSN} ]
  E[ e^{tSN} | N = n ] = E[ e^{t Σ_{i=1}^{N} Xi} | N = n ]
                       = E[ e^{t Σ_{i=1}^{n} Xi} | N = n ]
                       = E[ e^{t Σ_{i=1}^{n} Xi} ] = Π_{i=1}^{n} E[ e^{tXi} ] = (MX (t))^n
◮ Hence we get
  E[ e^{tSN} | N ] = (MX (t))^N
P S Sastry, IISc, E1 222 Aug 2021 242/248
◮ We can now find mgf of SN as
  MSN (t) = E[ e^{tSN} ]
          = E[ E[ e^{tSN} | N ] ]
          = E[ (MX (t))^N ]
          = Σ_{n=1}^{∞} (MX (t))^n fN (n)
          = GN ( MX (t) )
  where GN (s) = E[s^N ] is the generating function of N
◮ This method is useful for finding distribution of SN when
we can recognize the distribution from its mgf
P S Sastry, IISc, E1 222 Aug 2021 243/248
◮ We can also find distribution function of SN directly using
the technique of conditional expectations.
◮ FSN (s) = P [SN ≤ s] and we know how to find
probabilities of events using conditional expectation.
" N # ∞
" N #
X X X
P Xi ≤ s = P Xi ≤ s | N = n P [N = n]
i=1 n=1
∞
" i=1
n
#
X X
= P Xi ≤ s P [N = n]
n=1 i=1
P S Sastry, IISc, E1 222 Aug 2021 244/248
Jensen’s Inequality
◮ Let g : ℜ → ℜ be a convex function. Then
g(EX) ≤ E[g(X)]
◮ For example, (EX)2 ≤ E [X 2 ]
◮ Function g is convex if (see figure on left)
g(αx+(1−α)y) ≤ αg(x)+(1−α)g(y), ∀x, y, ∀0 ≤ α ≤ 1
◮ If g is convex, then, given any x0 , there exists λ(x0 ) such that (see figure on right)
  g(x) ≥ g(x0 ) + λ(x0 )(x − x0 ), ∀x
P S Sastry, IISc, E1 222 Aug 2021 245/248
Jensen’s Inequality: Proof
◮ We have: ∀x0 , ∃λ(x0 ) such that
g(x) ≥ g(x0 ) + λ(x0 )(x − x0 ), ∀x
◮ Take x0 = EX and x = X(ω). Then
g(X(ω)) ≥ g(EX) + λ(EX)(X(ω) − EX), ∀ω
◮ Y (ω) ≥ Z(ω), ∀ω ⇒ Y ≥ Z ⇒ EY ≥ EZ
Hence we get
g(X) ≥ g(EX) + λ(EX)(X − EX)
⇒ E[g(X)] ≥ g(EX) + λ(EX) E[X − EX] = g(EX)
◮ This completes the proof
P S Sastry, IISc, E1 222 Aug 2021 246/248
◮ Consider the set of all mean-zero random variables.
◮ It is closed under addition and scalar (real number)
multiplication.
◮ Cov(X, Y ) = E[XY ] satisfies
1. Cov(X, Y ) = Cov(Y, X)
2. Cov(X, X) = Var(X) ≥ 0 and is zero only if X = 0
3. Cov(aX, Y ) = aCov(X, Y )
4. Cov(X1 + X2 , Y ) = Cov(X1 , Y ) + Cov(X2 , Y )
◮ Thus Cov(X, Y ) is an inner product here.
◮ The Cauchy-Schwartz inequality (|xT y| ≤ ||x|| ||y||)
gives
  |Cov(X, Y )| ≤ √(Cov(X, X)) √(Cov(Y, Y )) = √( Var(X) Var(Y ) )
◮ This is same as |ρXY | ≤ 1
◮ A generalization of Cauchy-Schwartz inequality is Holder
inequality
P S Sastry, IISc, E1 222 Aug 2021 247/248
Holder Inequality
◮ For all p, q with p, q > 1 and 1/p + 1/q = 1,
  E[|XY |] ≤ (E|X|^p )^{1/p} (E|Y |^q )^{1/q}
  (We assume all the expectations are finite)
◮ If we take p = q = 2,
  E[|XY |] ≤ √( E[X²] E[Y²] )
◮ This is the same as the Cauchy-Schwartz inequality. This implies |ρXY | ≤ 1:
  |Cov(X, Y )| = | E[(X − EX)(Y − EY )] |
               ≤ E[ |X − EX| |Y − EY | ]
               ≤ √( E[(X − EX)²] E[(Y − EY )²] )
               = √( Var(X) Var(Y ) )
P S Sastry, IISc, E1 222 Aug 2021 248/248