Lecture 2
Chen Tong
▶ Joint Distributions
▶ In general, the success probability P(X = 1) can be any number between zero and one. Call this number θ:
P(X = 1) = θ, P(X = 0) = 1 − θ
2. Joint Distributions
fX ,Y (x, y ) = P(X = x, Y = y )
▶ Properties:
f_{X,Y}(x, y) ≥ 0 and ∑_x ∑_y f_{X,Y}(x, y) = 1 in the discrete case
f_{X,Y}(x, y) ≥ 0 and ∫_x ∫_y f_{X,Y}(x, y) dy dx = 1 in the continuous case
▶ The joint cdf is
F(z, w) = P(X ≤ z, Y ≤ w)
and in the discrete case
F(z, w) = ∑_{x ≤ z} ∑_{y ≤ w} f_{X,Y}(x, y)
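A minimal sketch of these definitions in Python, using a small hypothetical 2×2 joint pmf (the probability values are illustrative assumptions, not from the lecture):

```python
# A hypothetical discrete joint pmf f_{X,Y} on {0,1} x {0,1}
pmf = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

# Property check: non-negative and summing to one
assert all(p >= 0 for p in pmf.values())
assert abs(sum(pmf.values()) - 1.0) < 1e-12

def joint_cdf(z, w):
    """F(z, w) = P(X <= z, Y <= w): sum the pmf over x <= z, y <= w."""
    return sum(p for (x, y), p in pmf.items() if x <= z and y <= w)

print(joint_cdf(0, 1))  # P(X <= 0, Y <= 1) = 0.1 + 0.2 = 0.3
```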
▶ X and Y are independent if
f_{X,Y}(x, y) = f_X(x) f_Y(y),
and the marginal density of Y is
f_Y(y) = ∑_x f_{X,Y}(x, y) in the discrete case
f_Y(y) = ∫ f_{X,Y}(s, y) ds in the continuous case
▶ Example above: with f_{X,Y}(x, y) = (21/4) x² y for x² ≤ y ≤ 1,
f_X(x) = ∫_{x²}^{1} (21/4) x² y dy = (21/8) x² y² |_{y=x²}^{1} = (21/8) x² (1 − x⁴)
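The marginal from this example can be sanity-checked numerically: f_X(x) = (21/8) x² (1 − x⁴) should integrate to one over x ∈ [−1, 1]. A minimal sketch with a hand-rolled trapezoid rule:

```python
import numpy as np

# Marginal from the example: f_X(x) = (21/8) x^2 (1 - x^4) on [-1, 1]
x = np.linspace(-1.0, 1.0, 200001)
fx = (21 / 8) * x**2 * (1 - x**4)

# Trapezoid rule; the exact integral is (21/8) * (2/3 - 2/7) = 1
total = float(np.sum((fx[:-1] + fx[1:]) / 2 * np.diff(x)))
print(total)  # close to 1.0
```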
▶ The conditional density of Y given X = x is
f_{Y|X}(y | x) = f_{X,Y}(x, y) / f_X(x), with f_X(x) > 0
▶ Properties: f_{Y|X}(y | x) ≥ 0 and
∑_y f_{Y|X}(y | x) = 1 in the discrete case
∫_{−∞}^{∞} f_{Y|X}(y | x) dy = 1 in the continuous case
f_{Y,X}(y, x) = f_{Y|X}(y | x) f_X(x)
▶ Example above:
f_{Y|X}(y | x) = f_{X,Y}(x, y) / f_X(x) = ((21/4) x² y) / ((21/8) x² (1 − x⁴)) = 2y / (1 − x⁴)
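A quick numerical check of this conditional density: for any fixed x, f_{Y|X}(y | x) = 2y/(1 − x⁴) should integrate to one over its support y ∈ [x², 1]. A minimal sketch (the evaluation point x₀ = 0.5 is an arbitrary illustrative choice):

```python
import numpy as np

# Conditional density from the example: f_{Y|X}(y|x) = 2y / (1 - x^4) on x^2 <= y <= 1
def f_cond(y, x):
    return 2 * y / (1 - x**4)

x0 = 0.5  # arbitrary point in (-1, 1), chosen for illustration
y = np.linspace(x0**2, 1.0, 100001)

# Trapezoid rule over the support [x0^2, 1]
total = float(np.sum((f_cond(y[:-1], x0) + f_cond(y[1:], x0)) / 2 * np.diff(y)))
print(total)  # close to 1.0
```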
or, expressed in log form,
log f(x, y) = log f(x) + log f(y),
where f(x) and f(y) are the marginal pdfs, and F(x) and F(y) are the marginal cdfs.
▶ "Measures of variability":
▶ Variance (expected squared deviation of X from its expected value)
▶ Standard deviation (square root of the variance)
▶ Calculation rules:
E (a) = a,
E (X + Y ) = E (X ) + E (Y ) = µX + µY
E (aX + b) = aE (X ) + b = aµX + b
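These linearity rules can be illustrated by simulation. A minimal sketch, where the distributions and the constants a, b are illustrative assumptions (µ_X = 2, µ_Y = 3):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(2.0, 1.0, 1_000_000)   # mu_X = 2 (illustrative choice)
Y = rng.exponential(3.0, 1_000_000)   # mu_Y = 3 (illustrative choice)
a, b = 5.0, -1.0

# E(X + Y) = E(X) + E(Y) and E(aX + b) = a E(X) + b, up to Monte Carlo error
print((X + Y).mean())      # near mu_X + mu_Y = 5
print((a * X + b).mean())  # near a * mu_X + b = 9
```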
▶ We want to know not only the expected value but also the spread of a distribution
⇒ How far, on average, is a random variable X from its expected value?
▶ Calculation rule:
Var(aX + b) = a² Var(X)
and for the standardized variable Z = (X − µ_X)/σ_X (i.e. a = 1/σ_X, b = −µ_X/σ_X):
Var(Z) = Var(aX + b) = a² Var(X) = σ²/σ² = 1
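A short simulation of the standardization rule above; the parameters µ = 4, σ = 2 are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 4.0, 2.0                  # illustrative parameters
X = rng.normal(mu, sigma, 1_000_000)

# Z = (X - mu)/sigma is the case a = 1/sigma, b = -mu/sigma of Var(aX + b) = a^2 Var(X)
Z = (X - mu) / sigma
print(X.var())  # near sigma^2 = 4
print(Z.var())  # near 1
```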
▶ The first central moment is E[(X − µ_X)¹] = E(X) − µ_X = 0
▶ The second central moment is the variance: E[(X − µ_X)²]
▶ Covariance:
Cov(X, Y) = E[(X − µ_X)(Y − µ_Y)]
= E[XY − µ_X Y − µ_Y X + µ_X µ_Y]
= E(XY) − µ_X µ_Y
▶ If X and Y are independent, then
E(XY) = E(X)E(Y) = µ_X µ_Y ⇒ Cov(X, Y) = 0
Var(Y | X = x) = E[(Y − E(Y | X = x))² | X = x]
= E(Y² | X = x) − [E(Y | X = x)]²
▶ Let Y = r(X), where X has pdf f_X
▶ The cdf of Y is
G(y) = Pr(Y ⩽ y) = Pr(r(X) ⩽ y) = ∫_{{x : r(x) ⩽ y}} f(x) dx
▶ If r is one-to-one with inverse s = r⁻¹, the pdf of Y is
g(y) = f(s(y)) |ds(y)/dy|
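The change-of-variables step can be illustrated by simulation. A minimal sketch, assuming X ∼ Exponential(1) and r(x) = x² (both illustrative choices): then s(y) = √y, so g(y) = e^{−√y}/(2√y) and G(y) = 1 − e^{−√y}, which we compare against the empirical cdf:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.exponential(1.0, 1_000_000)   # f_X(x) = e^{-x}, x >= 0 (illustrative choice)
Y = X**2                              # r(x) = x^2, monotone on [0, inf)

# Inverse s(y) = sqrt(y) gives g(y) = f(s(y)) |ds/dy| = e^{-sqrt(y)} / (2 sqrt(y)),
# hence the cdf is G(y) = 1 - e^{-sqrt(y)}
y0 = 2.0
print((Y <= y0).mean())          # empirical G(y0)
print(1 - np.exp(-np.sqrt(y0)))  # theoretical G(y0)
```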
▶ The t-Distribution
▶ The F-Distribution
▶ The normal distribution N(µ, σ²) has pdf
f(x) = (1/(σ√(2π))) exp(−(x − µ)²/(2σ²)), −∞ < x < ∞
▶ The standard normal distribution N(0, 1) has pdf
ϕ(z) = (1/√(2π)) exp(−z²/2), −∞ < z < ∞
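The two pdfs are linked by standardization: f(x) = ϕ((x − µ)/σ)/σ. A minimal sketch of both formulas (the evaluation point and parameters are arbitrary illustrative values):

```python
import math

def phi(z):
    """Standard normal pdf: exp(-z^2/2) / sqrt(2*pi)."""
    return math.exp(-z**2 / 2) / math.sqrt(2 * math.pi)

def normal_pdf(x, mu, sigma):
    """General normal pdf; equals phi((x - mu)/sigma) / sigma."""
    return math.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

x, mu, sigma = 1.3, 0.5, 2.0  # arbitrary illustrative values
print(normal_pdf(x, mu, sigma))
print(phi((x - mu) / sigma) / sigma)  # same value
```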
▶ tn has heavier tails and the amount of probability mass in the tails is
controlled by the parameter n. For n = 1 the t distribution tn
becomes the standard Cauchy distribution, whereas for n → ∞ it
becomes the standard normal distribution N(0, 1).
▶ The standardized t random variable is
S_T = T / √(n/(n − 2)), T ∼ t_n,
so its variance is 1, and its excess kurtosis is 6/(n − 4) if n > 4.
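The variance claims above can be checked by simulation, since Var(T) = n/(n − 2) for T ∼ t_n with n > 2. A minimal sketch with n = 10 (an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10                                   # degrees of freedom (illustrative)
T = rng.standard_t(n, size=2_000_000)    # T ~ t_n, so Var(T) = n/(n-2)

S = T / np.sqrt(n / (n - 2))             # standardized t
print(T.var())  # near n/(n-2) = 1.25
print(S.var())  # near 1
```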
▶ The F-distribution arises as the ratio
F = (X_1/k_1) / (X_2/k_2),
where X_1 ∼ χ²_{k_1} and X_2 ∼ χ²_{k_2} are independent.
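This construction can be sketched by simulating the ratio directly; the degrees of freedom k₁ = 5, k₂ = 12 are illustrative assumptions, and the mean of an F_{k₁,k₂} variable is k₂/(k₂ − 2) for k₂ > 2:

```python
import numpy as np

rng = np.random.default_rng(5)
k1, k2 = 5, 12                  # degrees of freedom (illustrative)
N = 1_000_000
X1 = rng.chisquare(k1, N)       # X1 ~ chi^2_{k1}
X2 = rng.chisquare(k2, N)       # X2 ~ chi^2_{k2}, independent of X1

F = (X1 / k1) / (X2 / k2)       # F ~ F_{k1, k2}
print(F.mean())  # near k2/(k2 - 2) = 1.2
```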