CL202: Introduction to Data Analysis
Mani Bhushan, Sachin Patwardhan (MB+SCP)
Department of Chemical Engineering,
Indian Institute of Technology Bombay
Mumbai, India - 400076
mbhushan, [email protected]
Spring 2016
This handout
Multiple Random Variables
Joint, marginal, conditional distribution and density functions
Independence
Extension of Ideas:
Multiple (Multivariate) Random Variables: Jointly distributed random variables.
An event occurs in sample space $S$. Associate many random variables $X_1, X_2, \ldots, X_n$ with it, collected into the random vector
$$\mathbf{X} = [X_1, X_2, \ldots, X_n]^T$$
Each random variable is a valid mapping from $S$ to $\mathbb{R}$.
Bivariate Random Variables
For simplicity of notation, consider two random variables: $X$, $Y$.
Special case of multiple random variables.
Examples:
- Average number of cigarettes smoked daily and the age at which an individual gets cancer,
- Height and weight of an individual,
- Height and IQ of an individual,
- Flow-rate and pressure drop of a liquid flowing through a pipe.
Jointly distributed random variables
Often interested in answering questions on $X$, $Y$ taking values in a specified region $D$ in $\mathbb{R}^2$ (the $xy$ plane).
The distribution functions $F_X(x)$ and $F_Y(y)$ of $X$ and $Y$ determine their individual probabilities but not their joint probabilities. In particular, the probability of the event
$$\{X \le x\} \cap \{Y \le y\} = \{X \le x, Y \le y\}$$
cannot be expressed in terms of $F_X(x)$ and $F_Y(y)$.
Joint probabilities of $X$, $Y$ are completely determined if the probability of the above event is known for every $x$ and $y$.
Joint Probability Distribution Function or Joint Cumulative Distribution Function
For random variables (discrete or continuous) $X$, $Y$, the joint (bivariate) probability distribution function is:
$$F_{X,Y}(x, y) = P\{X \le x, Y \le y\}$$
where $x$, $y$ are two arbitrary real numbers.
Often, the subscript $X, Y$ is omitted.
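Since $F_{X,Y}(x, y)$ is just the probability of the event $\{X \le x, Y \le y\}$, it can be estimated from data by counting. Below is a minimal Python sketch (not from the slides; the sampling distribution is an arbitrary illustrative choice):

```python
# Minimal sketch: empirical joint CDF from samples (illustrative distribution).
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x_s = rng.normal(1.0, 1.0, n)   # samples of X
y_s = rng.normal(1.0, 0.5, n)   # samples of Y (drawn independently here)

def joint_cdf(x, y):
    """Empirical F(x, y): fraction of samples with X <= x and Y <= y."""
    return np.mean((x_s <= x) & (y_s <= y))

print(joint_cdf(1.0, 1.0))  # ~0.25: both medians are at 1 and X, Y independent
```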
Properties of Joint Probability Distribution Function
(Papoulis and Pillai, 2002)
$$F(-\infty, y) = F(x, -\infty) = 0, \qquad F(\infty, \infty) = 1$$
$$P(x_1 < X \le x_2, Y \le y) = F(x_2, y) - F(x_1, y)$$
$$P(X \le x, y_1 < Y \le y_2) = F(x, y_2) - F(x, y_1)$$
$$P(x_1 < X \le x_2, y_1 < Y \le y_2) = F(x_2, y_2) - F(x_1, y_2) - F(x_2, y_1) + F(x_1, y_1)$$
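The rectangle property above can be checked numerically. A sketch using scipy's bivariate normal CDF; the mean, covariance, and rectangle are illustrative choices, not from the slides:

```python
# Sketch: check P(x1 < X <= x2, y1 < Y <= y2)
#         = F(x2,y2) - F(x1,y2) - F(x2,y1) + F(x1,y1)
from scipy.stats import multivariate_normal

mvn = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, 0.5], [0.5, 1.0]])
F = lambda x, y: mvn.cdf([x, y])   # joint CDF F(x, y)

x1, x2, y1, y2 = -1.0, 1.0, -0.5, 0.5
rect = F(x2, y2) - F(x1, y2) - F(x2, y1) + F(x1, y1)
print(rect)   # probability mass in the rectangle, a number in (0, 1)
```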
Joint Density Function
The joint density of $X$ and $Y$ is by definition the function
$$f(x, y) = \frac{\partial^2 F(x, y)}{\partial x \, \partial y}$$
It follows that
$$F(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f(\alpha, \beta) \, d\beta \, d\alpha$$
$$P((X, Y) \in D) = \iint_D f(x, y) \, dx \, dy$$
In particular, as $\Delta x \to 0$ and $\Delta y \to 0$,
$$P(x < X \le x + \Delta x,\; y < Y \le y + \Delta y) \approx f(x, y) \, \Delta x \, \Delta y$$
$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y) \, dx \, dy = 1; \qquad f(x, y) \ge 0 \;\; \forall x, y \in \mathbb{R}$$
Joint Density Example: Bivariate Gaussian Random Variable
$$f(x, y) = \kappa \exp\left(-\tfrac{1}{2}(\mathbf{v} - \boldsymbol{\mu})^T P^{-1} (\mathbf{v} - \boldsymbol{\mu})\right)$$
with
$$\mathbf{v} = \begin{bmatrix} x \\ y \end{bmatrix}, \quad \boldsymbol{\mu} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad P = \begin{bmatrix} 0.9 & 0.4 \\ 0.4 & 0.3 \end{bmatrix}, \quad \kappa = \frac{1}{2\pi\sqrt{|P|}}$$
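A sketch evaluating this density at a point, both from the formula above and with scipy.stats.multivariate_normal; the normalization $\kappa = 1/(2\pi\sqrt{|P|})$ is assumed (the constant was garbled on the original slide):

```python
# Sketch: bivariate Gaussian density, manual formula vs scipy.
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([1.0, 1.0])
P = np.array([[0.9, 0.4], [0.4, 0.3]])   # |P| = 0.11 > 0, valid covariance

def f_manual(x, y):
    v = np.array([x, y]) - mu
    kappa = 1.0 / (2 * np.pi * np.sqrt(np.linalg.det(P)))
    return kappa * np.exp(-0.5 * v @ np.linalg.inv(P) @ v)

print(f_manual(1.0, 1.0))                          # peak value at the mean
print(multivariate_normal(mu, P).pdf([1.0, 1.0]))  # should match
```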
Joint Density Visualization
[Figure: surface/contour plot of the bivariate Gaussian joint density]
Joint Distribution Visualization
[Figure: surface plot of the corresponding joint distribution function]
Marginal Distribution or Density Functions of Individual Random Variables
Marginal Probability Distribution Functions $F_X(x)$, $F_Y(y)$:
- Extract $F_X(x)$ from $F(x, y)$ as:
  $$F_X(x) = P(X \le x) = P(X \le x, Y < \infty) = F(x, \infty)$$
- Similarly, extract $F_Y(y)$ as:
  $$F_Y(y) = P(Y \le y) = P(X < \infty, Y \le y) = F(\infty, y)$$
Marginal Probability Density Functions $f_X(x)$, $f_Y(y)$:
- Extract these from $f(x, y)$ as:
  $$f_X(x) = \int_{-\infty}^{\infty} f(x, y) \, dy, \qquad f_Y(y) = \int_{-\infty}^{\infty} f(x, y) \, dx$$
Marginal Probability Density
$$f_X(x) = \int_{-\infty}^{\infty} f(x, y) \, dy$$
Makes sense, since
$$P(X \in A) = P(X \in A, Y \in (-\infty, \infty)) = \int_A \int_{-\infty}^{\infty} f(x, y) \, dy \, dx = \int_A f_X(x) \, dx$$
where $f_X(x)$ is as defined above.
Similarly,
$$f_Y(y) = \int_{-\infty}^{\infty} f(x, y) \, dx$$
Example 4.3c from Ross
$$f(x, y) = \begin{cases} 2e^{-x}e^{-2y}, & 0 < x < \infty,\; 0 < y < \infty \\ 0, & \text{otherwise} \end{cases}$$
Compute: (a) $P(X > 1, Y < 1)$, (b) $P(X < Y)$, (c) $P(X < a)$.
$$P(X > 1, Y < 1) = \int_0^1 \int_1^{\infty} 2e^{-x}e^{-2y} \, dx \, dy = \int_0^1 2e^{-2y}\left(-e^{-x}\big|_1^{\infty}\right) dy = e^{-1}\int_0^1 2e^{-2y} \, dy = e^{-1}(1 - e^{-2})$$
$$P(X < Y) = \int_0^{\infty} \int_0^{y} 2e^{-x}e^{-2y} \, dx \, dy = 1/3$$
$$P(X < a) = \int_0^{a} \int_0^{\infty} 2e^{-x}e^{-2y} \, dy \, dx = 1 - e^{-a}$$
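The three answers can be cross-checked by numerical integration. A sketch with scipy.integrate.dblquad; note that dblquad integrates func(inner, outer), with the outer variable's limits given first:

```python
# Sketch: numerical checks of Example 4.3c.
import numpy as np
from scipy.integrate import dblquad

f = lambda x, y: 2 * np.exp(-x) * np.exp(-2 * y)

# (a) P(X > 1, Y < 1): outer y in [0, 1], inner x in [1, inf)
pa, _ = dblquad(lambda x, y: f(x, y), 0, 1, 1, np.inf)
print(pa, np.exp(-1) * (1 - np.exp(-2)))   # both ~0.3181

# (b) P(X < Y): outer y in [0, inf), inner x in [0, y]
pb, _ = dblquad(lambda x, y: f(x, y), 0, np.inf, 0, lambda y: y)
print(pb)                                   # ~1/3

# (c) P(X < a), a = 2: outer x in [0, a], inner y in [0, inf)
a = 2.0
pc, _ = dblquad(lambda y, x: f(x, y), 0, a, 0, np.inf)
print(pc, 1 - np.exp(-a))                   # both ~0.8647
```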
Joint Density Visualization: Exponential
[Figure: surface plot of the exponential joint density of Example 4.3c]
Joint Distribution Visualization: Exponential
[Figure: surface plot of the corresponding joint distribution function]
Joint Probability Mass Function (PMF)
Given two discrete random variables $X$ and $Y$ in the same experiment, the joint PMF of $X$ and $Y$ is
$$p(x_i, y_j) = P(X = x_i, Y = y_j)$$
for all pairs of $(x_i, y_j)$ values that $X$ and $Y$ can take.
$p(x_i, y_j)$ is also denoted as $p_{X,Y}(x_i, y_j)$.
The marginal probability mass functions for $X$ and $Y$ are
$$p_X(x) = P(X = x) = \sum_y p_{X,Y}(x, y), \qquad p_Y(y) = P(Y = y) = \sum_x p_{X,Y}(x, y)$$
Computation of Marginal PMF from Joint PMF
Formally:
$$\{X = x_i\} = \bigcup_j \{X = x_i, Y = y_j\}$$
All events on the RHS are mutually exclusive. Thus,
$$p_X(x_i) = P(X = x_i) = \sum_j P(X = x_i, Y = y_j) = \sum_j p(x_i, y_j)$$
Similarly,
$$p_Y(y_j) = P(Y = y_j) = \sum_i p(x_i, y_j)$$
Note: $P(X = x_i, Y = y_j)$ cannot, in general, be constructed from knowledge of $P(X = x_i)$ and $P(Y = y_j)$ alone.
Example: 4.3a, Ross
3 batteries are randomly chosen from a group of 3 new, 4 used but still working,
and 5 defective batteries. Let X , Y denote the number of new, and used but
working batteries that are chosen, respectively. Find
p(xi , yj ) = P(X = xi , Y = yj ).
Solution: Let $T = \binom{12}{3} = 220$.
$p(0,0) = \binom{5}{3}/T$, $\quad p(0,1) = \binom{4}{1}\binom{5}{2}/T$, $\quad p(0,2) = \binom{4}{2}\binom{5}{1}/T$, $\quad p(0,3) = \binom{4}{3}/T$
$p(1,0) = \binom{3}{1}\binom{5}{2}/T$, $\quad p(1,1) = \binom{3}{1}\binom{4}{1}\binom{5}{1}/T$, $\quad p(1,2) = \ldots$
$p(2,0) = \ldots$, $\quad p(2,1) = \ldots$, $\quad p(3,0) = \ldots$
Tabular Form

| $i \backslash j$ | 0 | 1 | 2 | 3 | Row sum $P(X = i)$ |
|---|---|---|---|---|---|
| 0 | 10/220 | 40/220 | 30/220 | 4/220 | 84/220 |
| 1 | 30/220 | 60/220 | 18/220 | 0 | 108/220 |
| 2 | 15/220 | 12/220 | 0 | 0 | 27/220 |
| 3 | 1/220 | 0 | 0 | 0 | 1/220 |
| Col sum $P(Y = j)$ | 56/220 | 112/220 | 48/220 | 4/220 | 1 |

$i$ indexes the rows (values of $X$) and $j$ the columns (values of $Y$).
Both the row sums and the column sums add up to 1.
Marginal probabilities appear in the margins of the table.
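The whole table can be generated programmatically from the combinatorial formulas on the previous slide. A sketch with math.comb:

```python
# Sketch: joint PMF table of Example 4.3a (entries shown as numerators of /220).
from math import comb

T = comb(12, 3)   # 220: ways to choose 3 batteries out of 12

def p(i, j):
    """P(X = i new, Y = j used-but-working); the other 3-i-j are defective."""
    k = 3 - i - j
    if k < 0:
        return 0.0
    return comb(3, i) * comb(4, j) * comb(5, k) / T

for i in range(4):
    row = [round(p(i, j) * T) for j in range(4)]
    print(row, "row sum:", sum(row))   # rows: 84, 108, 27, 1 (each over 220)
```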
n Random Variables
Joint cumulative probability distribution function $F(x_1, x_2, \ldots, x_n)$ of $n$ random variables $X_1, X_2, \ldots, X_n$ is defined as:
$$F(x_1, x_2, \ldots, x_n) = P(X_1 \le x_1, X_2 \le x_2, \ldots, X_n \le x_n)$$
If the random variables are discrete: joint probability mass function
$$p(x_1, x_2, \ldots, x_n) = P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n)$$
If the random variables are continuous: joint probability density function $f(x_1, x_2, \ldots, x_n)$ such that for any set $C$ in $n$-dimensional space
$$P((X_1, X_2, \ldots, X_n) \in C) = \int \cdots \int_{(x_1, \ldots, x_n) \in C} f(x_1, x_2, \ldots, x_n) \, dx_1 \, dx_2 \ldots dx_n$$
where
$$f(x_1, x_2, \ldots, x_n) = \frac{\partial^n F(x_1, x_2, \ldots, x_n)}{\partial x_1 \, \partial x_2 \ldots \partial x_n}$$
Obtaining Marginals
$$F_{X_1}(x_1) = F(x_1, \infty, \infty, \ldots, \infty)$$
$$f_{X_1}(x_1) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, x_2, \ldots, x_n) \, dx_2 \, dx_3 \ldots dx_n$$
$$p_{X_1}(x_1) = \sum_{x_2} \sum_{x_3} \cdots \sum_{x_n} p(x_1, x_2, \ldots, x_n)$$
Independence of Random Variables
Random variables $X$ and $Y$ are independent if for any two sets of real numbers $A$ and $B$:
$$P(X \in A, Y \in B) = P(X \in A) P(Y \in B)$$
i.e., the events $E_A = \{X \in A\}$ and $E_B = \{Y \in B\}$ are independent.
Example: height and IQ of an individual.
In particular, $P(X \le a, Y \le b) = P(X \le a) P(Y \le b)$, or, in terms of the joint cumulative distribution function $F$ of $X$ and $Y$:
$$F(a, b) = F_X(a) F_Y(b) \quad \forall a, b \in \mathbb{R}$$
Random variables that are not independent are dependent.
Independence: Probability Mass and Density Functions
Random variables $X$, $Y$ are independent if:
Discrete random variables (probability mass function):
$$p(x_i, y_j) = p_X(x_i) \, p_Y(y_j) \quad \text{for all } x_i, y_j$$
Continuous random variables (probability density function):
$$f(x, y) = f_X(x) \, f_Y(y) \quad \text{for all } x, y$$
Independence: Equivalent Statements
1) $P(X \in A, Y \in B) = P(X \in A) P(Y \in B)$ for all sets $A$, $B$ in $\mathbb{R}$
2) $F(x, y) = F_X(x) F_Y(y)$ for all $x$, $y$
3) $f(x, y) = f_X(x) f_Y(y)$ for all $x$, $y$ (continuous RVs)
4) $p(x_i, y_j) = p_X(x_i) p_Y(y_j)$ for all $x_i$, $y_j$ (discrete RVs)
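A simulation sketch of statement 4): for two independent dice (an illustrative setup, not from the slides), the empirical joint PMF matches the product of its marginals up to sampling noise:

```python
# Sketch: empirical check that independence makes the joint PMF factorize.
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
x = rng.integers(1, 7, n)   # die X
y = rng.integers(1, 7, n)   # die Y, drawn independently of X

# Empirical joint PMF on the 6x6 outcome grid
joint, _, _ = np.histogram2d(x, y, bins=6, range=[[0.5, 6.5], [0.5, 6.5]])
joint /= n

px = joint.sum(axis=1)   # empirical marginal of X
py = joint.sum(axis=0)   # empirical marginal of Y
print(np.abs(joint - np.outer(px, py)).max())   # small: factorization holds
```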
Example 5.2 (Ogunnaike, 2009)
The reliability of the temperature control system for a commercial, highly
exothermic polymer reactor is known to depend on the lifetimes (in years) of the
control hardware electronics, X1 , and of the control valve on the cooling water
line, X2 . If one component fails, the entire control system fails. The random
phenomenon in question is characterized by the two-dimensional random variable
(X1 , X2 ) whose joint probability distribution is given as:
$$f(x_1, x_2) = \begin{cases} \frac{1}{50} e^{-(0.2 x_1 + 0.1 x_2)}, & 0 < x_1 < \infty,\; 0 < x_2 < \infty \\ 0, & \text{otherwise} \end{cases}$$
1. Establish that the above is a legitimate joint probability density function.
To show: $\int_0^{\infty} \int_0^{\infty} f(x_1, x_2) \, dx_1 \, dx_2 = 1$.
$$\int_0^{\infty} \int_0^{\infty} \frac{1}{50} e^{-(0.2 x_1 + 0.1 x_2)} \, dx_1 \, dx_2 = \frac{1}{50} \left(-5 e^{-0.2 x_1}\big|_0^{\infty}\right) \left(-10 e^{-0.1 x_2}\big|_0^{\infty}\right) = \frac{1}{50}(5)(10) = 1$$
Example (Continued)
What is the probability of the system lasting more than 2 years?
To find:
$$P(X_1 > 2, X_2 > 2) = \int_2^{\infty} \int_2^{\infty} \frac{1}{50} e^{-(0.2 x_1 + 0.1 x_2)} \, dx_1 \, dx_2 = 0.549$$
Find the marginal density function of $X_1$:
$$f_{X_1}(x_1) = \int_0^{\infty} \frac{1}{50} e^{-(0.2 x_1 + 0.1 x_2)} \, dx_2 = \frac{1}{5} e^{-0.2 x_1}$$
Find the marginal density function of $X_2$:
$$f_{X_2}(x_2) = \int_0^{\infty} \frac{1}{50} e^{-(0.2 x_1 + 0.1 x_2)} \, dx_1 = \frac{1}{10} e^{-0.1 x_2}$$
Are $X_1$, $X_2$ independent? Yes, since $f(x_1, x_2) = f_{X_1}(x_1) f_{X_2}(x_2)$ (cross-checked numerically in the sketch below).
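A numerical cross-check of this example with scipy (normalization, the survival probability, and the product-of-marginals structure):

```python
# Sketch: checks for Example 5.2.
import numpy as np
from scipy.integrate import dblquad

f = lambda x1, x2: (1 / 50) * np.exp(-(0.2 * x1 + 0.1 * x2))

total, _ = dblquad(f, 0, np.inf, 0, np.inf)
print(total)   # ~1.0: a legitimate joint density

p22, _ = dblquad(f, 2, np.inf, 2, np.inf)
print(p22)     # ~0.549: P(X1 > 2, X2 > 2)

# Independence: the marginal tail probabilities multiply
print(np.exp(-0.4) * np.exp(-0.2))   # P(X1 > 2) * P(X2 > 2) = e^{-0.6} ~ 0.549
```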
Independence of n Random Variables
Random variables $X_1, X_2, \ldots, X_n$ are said to be independent if, for all sets of real numbers $A_1, A_2, \ldots, A_n$:
$$P(X_1 \in A_1, X_2 \in A_2, \ldots, X_n \in A_n) = \prod_{i=1}^{n} P(X_i \in A_i)$$
In particular, for all $a_1, a_2, \ldots, a_n \in \mathbb{R}$:
$$F(a_1, a_2, \ldots, a_n) = P(X_1 \le a_1, X_2 \le a_2, \ldots, X_n \le a_n) = \prod_{i=1}^{n} P(X_i \le a_i) = \prod_{i=1}^{n} F_{X_i}(a_i)$$
For discrete random variables, the probability mass function factorizes:
$$p(x_1, x_2, \ldots, x_n) = p_{X_1}(x_1) \, p_{X_2}(x_2) \cdots p_{X_n}(x_n)$$
For continuous random variables, the probability density function factorizes:
$$f(x_1, x_2, \ldots, x_n) = f_{X_1}(x_1) \, f_{X_2}(x_2) \cdots f_{X_n}(x_n)$$
Independent, Repeated Trials
In statistics, one usually does not consider just a single experiment; rather, the same experiment is performed several times.
Associate a separate random variable with each of those experimental outcomes.
If the experiments are independent of each other, then we get a set of independent random variables.
Example: Tossing a coin $n$ times. Random variable $X_i$ is the outcome (0 or 1) of the $i$-th toss.
Independent and Identically Distributed (IID) Variables
A collection of random variables is said to be IID if:
- The variables are independent,
- The variables have the same probability distribution.
Example 1: Tossing a coin $n$ times. The probability of obtaining a head in a single toss does not vary, and all the tosses are independent (see the simulation sketch below).
- Each toss leads to a random variable with the same probability distribution function. The random variables are also independent. Thus, IID.
Example 2: Measuring the temperature of a beaker at $n$ time instances in the day. The true water temperature changes throughout the day. The sensor is noisy.
- Each sensor reading leads to a random variable.
- The variables are independent but not identically distributed.
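A minimal simulation sketch of Example 1, generating $n$ IID Bernoulli(1/2) tosses with numpy:

```python
# Sketch: n IID coin tosses.
import numpy as np

rng = np.random.default_rng(42)
n = 10
tosses = rng.integers(0, 2, n)   # X_1, ..., X_n: each 0 (tail) or 1 (head)
print(tosses, "fraction of heads:", tosses.mean())
```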
Conditional Distributions
Remember for two events A and B: conditional probability of A given B is:
$$P(A \mid B) = \frac{P(A, B)}{P(B)}, \quad \text{for } P(B) > 0$$
Conditional Probability Mass Function
For $X$, $Y$ discrete random variables, define the conditional probability mass function of $X$ given $Y = y$ by
$$p_{X|Y}(x \mid y) = P(X = x \mid Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)} = \frac{p(x, y)}{p_Y(y)}$$
for $p_Y(y) > 0$.
Examples 4.3b,f from Ross
Question: In a community, 15% families have no children, 20% have 1, 35% have
2 and 30% have 3 children. Each child is equally likely to be a boy or girl. We
choose a family at random. Given that the chosen family has one girl, compute
the probability mass function of the number of boys in the family.
G: number of girls, B: number of boys, C: number of children
To find: $P(B = i \mid G = 1)$, $i = 0, 1, 2, 3$.
$$P(B = i \mid G = 1) = \frac{P(B = i, G = 1)}{P(G = 1)}, \quad i = 0, 1, 2, 3$$
First find $P(G = 1)$:
$$\{G = 1\} = \{G = 1\} \cap (\{C = 0\} \cup \{C = 1\} \cup \{C = 2\} \cup \{C = 3\})$$
$$P(G = 1) = P(G = 1, C = 0) + P(G = 1, C = 1) + P(G = 1, C = 2) + P(G = 1, C = 3)$$
since $C = 0$, $C = 1$, $C = 2$, $C = 3$ are mutually exclusive events whose union is $S$. Then,
$$P(G = 1) = P(G = 1 \mid C = 0)P(C = 0) + P(G = 1 \mid C = 1)P(C = 1) + \ldots = 0 + (1/2)(0.2) + \ldots = 0.3875$$
Example continued
Then,
$$P(B = 0 \mid G = 1) = \frac{P(B = 0, G = 1)}{P(G = 1)}$$
Numerator $= P(G = 1, C = 1) = P(G = 1 \mid C = 1) P(C = 1) = (1/2)(0.2) = 0.1$. Then,
$$P(B = 0 \mid G = 1) = 0.1/0.3875 = 8/31$$
Similarly:
$$P(B = 1 \mid G = 1) = 14/31, \quad P(B = 2 \mid G = 1) = 9/31, \quad P(B = 3 \mid G = 1) = 0$$
Check: Sum of conditional probabilities is 1.
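The full conditional PMF can be computed directly from the total-probability decomposition used above. A sketch:

```python
# Sketch: conditional PMF of the number of boys given exactly one girl.
from math import comb

p_C = {0: 0.15, 1: 0.20, 2: 0.35, 3: 0.30}   # family-size PMF from the problem

def p_G1_and_B(b):
    """P(G = 1, B = b): needs a family with c = b + 1 children."""
    c = b + 1
    if c not in p_C:
        return 0.0
    return comb(c, 1) * 0.5**c * p_C[c]   # binomial P(G=1 | C=c) times P(C=c)

p_G1 = sum(p_G1_and_B(b) for b in range(4))
print(p_G1)                         # 0.3875
for b in range(4):
    print(b, p_G1_and_B(b) / p_G1)  # 8/31, 14/31, 9/31, 0 -- sums to 1
```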
Conditional Probability Density Function
For random variables $X$, $Y$, the conditional probability density of $X$ given that $Y = y$ is defined as:
$$f_{X|Y}(x \mid y) = \frac{f(x, y)}{f_Y(y)}$$
for $f_Y(y) > 0$.
Hence, we can make statements on the probability of $X$ taking values in some set $A$ given the value obtained by $Y$:
$$P(X \in A \mid Y = y) = \int_A f_{X|Y}(x \mid y) \, dx$$
Independence and Conditional Probabilities
If $X$, $Y$ are independent, then
$$p_{X|Y}(x \mid y) = p_X(x), \qquad f_{X|Y}(x \mid y) = f_X(x)$$
Temperature Control Example (Continued): Example 5.2 (Ogunnaike, 2009) from earlier
Find the conditional density function $f_{X_1|X_2}(x_1 \mid x_2)$:
$$f_{X_1|X_2}(x_1 \mid x_2) = f(x_1, x_2)/f_{X_2}(x_2) = \frac{1}{5} e^{-0.2 x_1}$$
which is the same as $f_{X_1}(x_1)$ in this example.
Similarly, $f_{X_2|X_1}(x_2 \mid x_1) = f_{X_2}(x_2)$ in this example.
Generic Question: If $f_{X_1|X_2}(x_1 \mid x_2) = f_{X_1}(x_1)$, is $f_{X_2|X_1}(x_2 \mid x_1) = f_{X_2}(x_2)$?
Answer: Yes.
Example 5.5 (Ogunnaike, 2009)
$$f_{X_1, X_2}(x_1, x_2) = \begin{cases} x_1 - x_2, & 1 < x_1 < 2,\; 0 < x_2 < 1 \\ 0, & \text{otherwise} \end{cases}$$
Find: the conditional probability densities.
Answer: Compute the marginals
$$f_{X_1}(x_1) = \begin{cases} x_1 - 0.5, & 1 < x_1 < 2 \\ 0, & \text{otherwise} \end{cases} \qquad f_{X_2}(x_2) = \begin{cases} 1.5 - x_2, & 0 < x_2 < 1 \\ 0, & \text{otherwise} \end{cases}$$
Then compute the conditionals
$$f_{X_1|X_2}(x_1 \mid x_2) = \frac{x_1 - x_2}{1.5 - x_2}, \;\; 1 < x_1 < 2 \qquad f_{X_2|X_1}(x_2 \mid x_1) = \frac{x_1 - x_2}{x_1 - 0.5}, \;\; 0 < x_2 < 1$$
The random variables X1 , X2 are not independent.
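A symbolic cross-check of the marginals and conditionals with sympy:

```python
# Sketch: Example 5.5 verified symbolically.
import sympy as sp

x1, x2 = sp.symbols("x1 x2")
f = x1 - x2   # joint density on 1 < x1 < 2, 0 < x2 < 1

f_x1 = sp.integrate(f, (x2, 0, 1))   # -> x1 - 1/2
f_x2 = sp.integrate(f, (x1, 1, 2))   # -> 3/2 - x2
print(f_x1, "|", f_x2)

print(sp.simplify(f / f_x2))   # f_{X1|X2} = (x1 - x2)/(3/2 - x2)
print(sp.simplify(f / f_x1))   # f_{X2|X1} = (x1 - x2)/(x1 - 1/2)
```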
Plots
[Figure: plots of the densities for Example 5.5]
Independence of Transformations
If random variables $X$, $Y$ are independent, then the random variables
$$Z = g(X), \quad U = h(Y)$$
are also independent.
Proof: Let $A_z$ denote the set of points on the $x$-axis such that $g(x) \le z$, and $B_u$ the set of points on the $y$-axis such that $h(y) \le u$. Then,
$$\{Z \le z\} = \{X \in A_z\}; \qquad \{U \le u\} = \{Y \in B_u\}$$
Thus, the events $\{Z \le z\}$ and $\{U \le u\}$ are independent because the events $\{X \in A_z\}$ and $\{Y \in B_u\}$ are independent.
Expected Value
By analogy with the transformation of a single RV, the expected value of a transformation of multiple RVs can be defined as:
$$E[g(X, Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y) \, f(x, y) \, dx \, dy$$
For discrete RVs, the above becomes
$$E[g(X, Y)] = \sum_x \sum_y g(x, y) \, p(x, y)$$
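A sketch computing $E[g(X, Y)]$ for a discrete case, reusing the battery-example joint PMF (Example 4.3a) with $g(x, y) = x + y$; it also previews the $E[X + Y] = E[X] + E[Y]$ special case on the next slide:

```python
# Sketch: E[g(X, Y)] = sum_x sum_y g(x, y) p(x, y) for the battery PMF.
p = {(0, 0): 10, (0, 1): 40, (0, 2): 30, (0, 3): 4,
     (1, 0): 30, (1, 1): 60, (1, 2): 18,
     (2, 0): 15, (2, 1): 12,
     (3, 0): 1}   # numerators over 220

E_sum = sum((x + y) * w for (x, y), w in p.items()) / 220
E_X = sum(x * w for (x, y), w in p.items()) / 220
E_Y = sum(y * w for (x, y), w in p.items()) / 220
print(E_sum, E_X + E_Y)   # both 1.75: E[X + Y] = E[X] + E[Y]
```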
Special Cases
$g(X, Y) = X + Y$. Then,
$$E[g(X, Y)] = E[X] + E[Y]$$
$g(X, Y) = (X - E[X])(Y - E[Y])$: covariance of $X$, $Y$; labeled $\mathrm{Cov}(X, Y)$.
Correlation coefficient:
$$\rho = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \, \sigma_Y}$$
Property: dimensionless, $-1 \le \rho \le 1$.
Independence versus Covariance
If $X$, $Y$ are independent, then
$$\mathrm{Cov}(X, Y) = 0$$
Independence $\Rightarrow$ covariance $= 0$ (variables uncorrelated).
Covariance $= 0$ $\not\Rightarrow$ independence.
Example: $(X, Y)$ takes the values $(0, 1)$, $(1, 0)$, $(0, -1)$, $(-1, 0)$ with equal probability $(1/4)$.
$\mathrm{Cov}(X, Y) = 0$, but $X$, $Y$ are not independent (verified in the sketch below).
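A direct check of the counterexample:

```python
# Sketch: Cov(X, Y) = 0 yet X, Y dependent, for the four equally likely points.
pts = [(0, 1), (1, 0), (0, -1), (-1, 0)]

EX = sum(x for x, y in pts) / 4        # 0
EY = sum(y for x, y in pts) / 4        # 0
EXY = sum(x * y for x, y in pts) / 4   # 0: xy = 0 at every support point
print("Cov =", EXY - EX * EY)          # 0

# Dependence: P(X = 0, Y = 0) = 0, but P(X = 0) * P(Y = 0) = (1/2)(1/2) = 1/4
pX0 = sum(1 for x, y in pts if x == 0) / 4
pY0 = sum(1 for x, y in pts if y == 0) / 4
print(0.0, "vs", pX0 * pY0)
```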
Independence Implications
$g(X, Y) = XY$:
- If $X$, $Y$ are independent, $E[XY] = E[X] \, E[Y]$.
$g(X, Y) = h(X) \, l(Y)$:
- If $X$, $Y$ are independent, $E[h(X) \, l(Y)] = E[h(X)] \, E[l(Y)]$.
THANK YOU