Imp Formula (stats 2)
Uploaded by mishkatchougule
Stats 2 Formula Sheet - Summary
Programming and Data Science (Indian Institute of Technology Madras)


Statistics for Data Science - 2


Formula file

Discrete random variables:

Uniform(A), A = {a, a + 1, ..., b}, n = b − a + 1
  PMF: fX(k) = 1/n, k = a, a + 1, ..., b
  CDF: FX(x) = 0 for x < a; (k − a + 1)/n for k ≤ x < k + 1, k = a, ..., b − 1; 1 for x ≥ b
  E[X] = (a + b)/2, Var(X) = (n² − 1)/12

Bernoulli(p)
  PMF: fX(1) = p, fX(0) = 1 − p
  CDF: FX(x) = 0 for x < 0; 1 − p for 0 ≤ x < 1; 1 for x ≥ 1
  E[X] = p, Var(X) = p(1 − p)

Binomial(n, p)
  PMF: fX(k) = nCk p^k (1 − p)^(n−k), k = 0, 1, ..., n
  CDF: FX(x) = 0 for x < 0; Σ_{i=0}^{k} nCi p^i (1 − p)^(n−i) for k ≤ x < k + 1, k = 0, ..., n − 1; 1 for x ≥ n
  E[X] = np, Var(X) = np(1 − p)

Geometric(p)
  PMF: fX(k) = (1 − p)^(k−1) p, k = 1, 2, ...
  CDF: FX(x) = 0 for x < 1; 1 − (1 − p)^k for k ≤ x < k + 1, k = 1, 2, ...
  E[X] = 1/p, Var(X) = (1 − p)/p²

Poisson(λ)
  PMF: fX(k) = e^(−λ) λ^k / k!, k = 0, 1, ...
  CDF: FX(x) = 0 for x < 0; e^(−λ) Σ_{i=0}^{k} λ^i / i! for k ≤ x < k + 1, k = 0, 1, ...
  E[X] = λ, Var(X) = λ

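As a quick sanity check of the Binomial row, the PMF can be evaluated directly with the standard library; n = 10 and p = 0.3 are arbitrary example values, not part of the formula sheet.

```python
# Sanity check of the Binomial(n, p) formulas: the PMF sums to 1,
# the mean equals n*p and the variance equals n*p*(1-p).
# n = 10, p = 0.3 are arbitrary example values.
from math import comb

n, p = 10, 0.3
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

total = sum(pmf)
mean = sum(k * f for k, f in enumerate(pmf))
var = sum((k - mean)**2 * f for k, f in enumerate(pmf))
print(round(total, 6), round(mean, 6), round(var, 6))  # 1.0 3.0 2.1
```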
Downloaded by Mishkat Chougule ([email protected])


lOMoARcPSD|31573472

Continuous random variables:

Uniform[a, b]
  PDF: fX(x) = 1/(b − a), a ≤ x ≤ b
  CDF: FX(x) = 0 for x ≤ a; (x − a)/(b − a) for a < x < b; 1 for x ≥ b
  E[X] = (a + b)/2, Var(X) = (b − a)²/12

Exp(λ)
  PDF: fX(x) = λ e^(−λx), x > 0
  CDF: FX(x) = 0 for x ≤ 0; 1 − e^(−λx) for x > 0
  E[X] = 1/λ, Var(X) = 1/λ²

Normal(µ, σ²)
  PDF: fX(x) = (1/(σ√(2π))) exp(−(x − µ)²/(2σ²)), −∞ < x < ∞
  CDF: no closed form
  E[X] = µ, Var(X) = σ²

Gamma(α, β)
  PDF: fX(x) = (β^α/Γ(α)) x^(α−1) e^(−βx), x > 0
  E[X] = α/β, Var(X) = α/β²

Beta(α, β)
  PDF: fX(x) = (Γ(α + β)/(Γ(α)Γ(β))) x^(α−1) (1 − x)^(β−1), 0 < x < 1
  E[X] = α/(α + β), Var(X) = αβ/((α + β)²(α + β + 1))
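The Exp(λ) row can be checked numerically: integrating the PDF over [0, x] should reproduce the closed-form CDF. A minimal standard-library sketch; λ = 2 and x = 1.5 are arbitrary example values.

```python
# Numeric check of the Exp(lambda) CDF: the trapezoidal integral of the
# PDF over [0, x] should match 1 - e^(-lambda*x).
from math import exp

lam = 2.0
pdf = lambda t: lam * exp(-lam * t)

def cdf_numeric(x, steps=100_000):
    h = x / steps  # trapezoidal rule on [0, x]
    return h * (0.5 * pdf(0.0) + sum(pdf(i * h) for i in range(1, steps)) + 0.5 * pdf(x))

x = 1.5
closed_form = 1 - exp(-lam * x)
print(abs(cdf_numeric(x) - closed_form) < 1e-6)  # True
```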

1. Markov's inequality: Let X be a discrete random variable taking non-negative values with a finite mean µ. Then,

   P(X ≥ c) ≤ µ/c
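For instance, with X ~ Poisson(3) and c = 7 (arbitrary example values), the exact tail probability sits well below the bound; a standard-library sketch:

```python
# Markov's inequality check: for X ~ Poisson(mu), P(X >= c) <= mu/c.
from math import exp, factorial

mu, c = 3.0, 7
tail = 1 - sum(exp(-mu) * mu**k / factorial(k) for k in range(c))  # P(X >= c)
bound = mu / c
print(round(tail, 4), "<=", round(bound, 4))  # the exact tail is far below 3/7
```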
2. Chebyshev's inequality: Let X be a discrete random variable with a finite mean µ and a finite variance σ². Then,

   P(|X − µ| ≥ kσ) ≤ 1/k²
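The same Poisson(3) example (µ = 3, σ² = 3; values are illustrative) confirms the Chebyshev bound for k = 2:

```python
# Chebyshev check for X ~ Poisson(3) with k = 2:
# P(|X - mu| >= k*sigma) must be at most 1/k^2 = 0.25.
from math import exp, factorial, sqrt

mu = 3.0
sigma = sqrt(mu)  # Poisson variance equals its mean
k = 2.0
lo, hi = mu - k * sigma, mu + k * sigma
pmf = lambda i: exp(-mu) * mu**i / factorial(i)
# sum the PMF over integers outside (mu - k*sigma, mu + k*sigma);
# terms beyond i = 60 are negligible for mu = 3
prob = sum(pmf(i) for i in range(60) if i <= lo or i >= hi)
print(prob <= 1 / k**2)  # True
```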

3. Weak Law of Large Numbers: Let X1, X2, ..., Xn ~ iid X with E[X] = µ and Var(X) = σ². Define the sample mean X̄ = (X1 + X2 + ... + Xn)/n. Then,

   P(|X̄ − µ| > δ) ≤ σ²/(nδ²)
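The bound can be watched in a small simulation with Bernoulli(0.5) samples (so µ = 0.5, σ² = 0.25); n, δ, the seed, and the trial count are arbitrary choices for illustration.

```python
# WLLN simulation: the empirical frequency of |Xbar - mu| > delta
# should not exceed sigma^2 / (n * delta^2).
import random

random.seed(0)
n, delta, trials = 1000, 0.05, 2000
mu, var = 0.5, 0.25
exceed = 0
for _ in range(trials):
    xbar = sum(random.random() < 0.5 for _ in range(n)) / n
    if abs(xbar - mu) > delta:
        exceed += 1
empirical = exceed / trials
bound = var / (n * delta**2)  # 0.1
print(empirical <= bound)  # True
```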

4. Using CLT to approximate probability: Let X1, X2, ..., Xn ~ iid X with E[X] = µ, Var(X) = σ². Define Y = X1 + X2 + ... + Xn. Then,

   (Y − nµ)/(σ√n) ≈ Normal(0, 1).
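As an example of the approximation (all numbers are illustrative), P(Y ≤ 520) for a sum of 1000 Bernoulli(0.5) draws can be read off the standard normal CDF and compared with a simulation:

```python
# CLT approximation of P(Y <= 520) for Y = sum of 1000 Bernoulli(0.5) draws,
# using the standardized normal CDF, checked against a simulation.
import random
from math import erf, sqrt

random.seed(1)
n, p = 1000, 0.5
mu, sigma = p, sqrt(p * (1 - p))

def normal_cdf(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

approx = normal_cdf((520 - n * mu) / (sigma * sqrt(n)))
sim = sum(sum(random.random() < p for _ in range(n)) <= 520
          for _ in range(2000)) / 2000
print(abs(approx - sim) < 0.05)  # True
```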


5. Bias of an estimator: Bias(θ̂, θ) = E[θ̂] − θ.
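A standard illustration (not from the formula file): the plug-in variance estimator that divides by n, rather than n − 1, has expectation (n − 1)/n · σ², so its bias is −σ²/n. A small simulation sketch, with arbitrary n and seed:

```python
# Bias illustration: the plug-in variance estimator (1/n) * sum (Xi - Xbar)^2
# underestimates sigma^2, with E[theta_hat] = (n-1)/n * sigma^2.
import random

random.seed(2)
n, trials = 5, 20_000
avg = 0.0
for _ in range(trials):
    xs = [random.gauss(0, 1) for _ in range(n)]  # true sigma^2 = 1
    xbar = sum(xs) / n
    avg += sum((x - xbar)**2 for x in xs) / n
avg /= trials
print(round(avg, 2))  # close to (n-1)/n = 0.8, not 1.0
```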


6. Method of moments: Sample moments: Mk(X1, X2, ..., Xn) = (1/n) Σ_{i=1}^{n} Xi^k
   Procedure for one parameter θ:

   • Sample moment: m1
   • Distribution moment: E[X] = f(θ)
   • Solve f(θ) = m1 for θ in terms of m1.
   • θ̂: replace m1 by M1 in the above solution.
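The procedure above can be sketched for Exp(λ), where E[X] = 1/λ, so solving 1/λ = m1 gives λ̂ = 1/M1. The true λ and sample size below are arbitrary example values.

```python
# Method-of-moments estimate for Exp(lambda): lambda_hat = 1 / M1.
import random

random.seed(3)
true_lam = 2.0
xs = [random.expovariate(true_lam) for _ in range(50_000)]
m1 = sum(xs) / len(xs)   # first sample moment
lam_hat = 1 / m1
print(round(lam_hat, 1))  # close to 2.0
```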

7. Likelihood of i.i.d. samples: The likelihood of a sample x1, x2, ..., xn is

   L(x1, ..., xn) = Π_{i=1}^{n} fX(xi; θ1, θ2, ...)

8. Maximum likelihood (ML) estimation:

   (θ1*, θ2*, ...) = arg max_{θ1, θ2, ...} Π_{i=1}^{n} fX(xi; θ1, θ2, ...)
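For Bernoulli(p), the log-likelihood Σᵢ [xi log p + (1 − xi) log(1 − p)] is maximized at the sample mean. A sketch that locates the argmax on a grid and recovers the closed form; the sample is an arbitrary example.

```python
# ML estimation for Bernoulli(p): the grid argmax of the log-likelihood
# coincides with the closed-form MLE, the sample mean.
from math import log

xs = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]  # example sample with mean 0.7

def loglik(p):
    return sum(x * log(p) + (1 - x) * log(1 - p) for x in xs)

grid = [i / 1000 for i in range(1, 1000)]
p_star = max(grid, key=loglik)
print(p_star)  # 0.7
```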

9. Bayesian estimation: Let X1, ..., Xn ~ i.i.d. X with parameter Θ.

   Prior distribution of Θ: Θ ~ fΘ(θ).
   Samples S: (X1 = x1, ..., Xn = xn)
   Posterior: Θ | (X1 = x1, ..., Xn = xn)
   Bayes' rule: Posterior ∝ Prior × Likelihood
   Posterior density ∝ fΘ(θ) × P(X1 = x1, ..., Xn = xn | Θ = θ)
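A concrete conjugate case (the prior and data below are example values): a Beta(a, b) prior on a Bernoulli parameter combined with k ones in n samples yields a Beta(a + k, b + n − k) posterior.

```python
# Bayesian update sketch: Beta prior x Bernoulli likelihood -> Beta posterior.
a, b = 2, 2                      # example Beta prior
xs = [1, 1, 0, 1, 0, 1, 1, 1]    # example Bernoulli sample
n, k = len(xs), sum(xs)
post_a, post_b = a + k, b + n - k
post_mean = post_a / (post_a + post_b)  # mean of Beta(post_a, post_b)
print(post_a, post_b, round(post_mean, 4))  # 8 4 0.6667
```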

10. Normal samples with unknown mean and known variance:
    X1, ..., Xn ~ i.i.d. Normal(M, σ²), with prior M ~ Normal(µ0, σ0²).

    Posterior mean: µ̂ = (nσ0²/(nσ0² + σ²)) X̄ + (σ²/(nσ0² + σ²)) µ0
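The posterior mean is a weighted average of the sample mean and the prior mean; a numeric check with illustrative data, prior, and known variance:

```python
# Numeric check of the posterior-mean formula above.
xs = [4.9, 5.2, 5.1, 4.8, 5.0]   # data, known variance sigma^2 = 0.25
mu0, sigma0_sq = 4.0, 1.0        # prior Normal(mu0, sigma0^2)
sigma_sq = 0.25
n = len(xs)
xbar = sum(xs) / n
w = n * sigma0_sq / (n * sigma0_sq + sigma_sq)
mu_hat = w * xbar + (1 - w) * mu0
print(round(mu_hat, 4))  # 4.9524, pulled slightly from xbar toward the prior
```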


11. Hypothesis Testing

    • Test for mean

    Case (1): When the population variance σ² is known (z-test). Test statistic: Z = (X̄ − µ0)/(σ/√n)

      Test          H0       HA       Rejection region
      right-tailed  µ = µ0   µ > µ0   X̄ > c
      left-tailed   µ = µ0   µ < µ0   X̄ < c
      two-tailed    µ = µ0   µ ≠ µ0   |X̄ − µ0| > c

    Case (2): When the population variance σ² is unknown (t-test). Test statistic: t_{n−1} = (X̄ − µ0)/(S/√n), where S is the sample standard deviation

      Test          H0       HA       Rejection region
      right-tailed  µ = µ0   µ > µ0   X̄ > c
      left-tailed   µ = µ0   µ < µ0   X̄ < c
      two-tailed    µ = µ0   µ ≠ µ0   |X̄ − µ0| > c
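The t statistic can be computed by hand; the sample below and H0: µ = 5 are illustrative values, not from the formula sheet.

```python
# One-sample t statistic (Xbar - mu0) / (S / sqrt(n)) for an example sample.
from math import sqrt

xs = [5.3, 5.1, 4.8, 5.6, 5.4, 5.2]
mu0 = 5.0
n = len(xs)
xbar = sum(xs) / n
s = sqrt(sum((x - xbar)**2 for x in xs) / (n - 1))  # sample standard deviation
t = (xbar - mu0) / (s / sqrt(n))
print(round(t, 2))  # about 2.09, with n - 1 = 5 degrees of freedom
```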


    • χ²-test for variance: Test statistic: T = (n − 1)S²/σ0² ~ χ²_{n−1}

      Test          H0       HA       Rejection region
      right-tailed  σ = σ0   σ > σ0   S² > c²
      left-tailed   σ = σ0   σ < σ0   S² < c²
      two-tailed    σ = σ0   σ ≠ σ0   S² > c_R² where α/2 = P(S² > c_R²), or S² < c_L² where α/2 = P(S² < c_L²)
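The statistic reduces to Σ(xi − x̄)²/σ0²; a hand computation on an example sample, testing H0: σ = 1 (so σ0² = 1, an arbitrary choice):

```python
# Chi-square variance-test statistic (n - 1) * S^2 / sigma0^2.
xs = [1.2, -0.4, 2.1, 0.3, -1.5, 0.8, -0.9, 1.7]  # example sample
sigma0_sq = 1.0
n = len(xs)
xbar = sum(xs) / n
s_sq = sum((x - xbar)**2 for x in xs) / (n - 1)   # sample variance
T = (n - 1) * s_sq / sigma0_sq
print(round(T, 3))  # compare against chi-square quantiles with n - 1 = 7 df
```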

    • Two-sample z-test for means: if H0 is true, X̄ − Ȳ ~ Normal(0, σ1²/n1 + σ2²/n2)

      Test          H0        HA        Statistic    Rejection region
      right-tailed  µ1 = µ2   µ1 > µ2   T = X̄ − Ȳ    X̄ − Ȳ > c
      left-tailed   µ1 = µ2   µ1 < µ2   T = Ȳ − X̄    Ȳ − X̄ > c
      two-tailed    µ1 = µ2   µ1 ≠ µ2   T = X̄ − Ȳ    |X̄ − Ȳ| > c
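Standardizing T by its null standard deviation gives a z value; a sketch on example samples with (assumed) known variances:

```python
# Two-sample z statistic for means with known variances:
# under H0, Xbar - Ybar ~ Normal(0, sigma1^2/n1 + sigma2^2/n2).
from math import sqrt

x = [5.1, 4.9, 5.3, 5.0]        # sample 1, known sigma1^2 = 0.04 (example)
y = [4.6, 4.8, 4.7, 4.5, 4.9]   # sample 2, known sigma2^2 = 0.09 (example)
sigma1_sq, sigma2_sq = 0.04, 0.09
xbar, ybar = sum(x) / len(x), sum(y) / len(y)
z = (xbar - ybar) / sqrt(sigma1_sq / len(x) + sigma2_sq / len(y))
print(round(z, 2))  # about 2.24
```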

    • Two-sample F-test for variances: Test statistic: T = S1²/S2² ~ F(n1−1, n2−1)

      Test                H0        HA        Rejection region
      one-tailed (right)  σ1 = σ2   σ1 > σ2   S1²/S2² > 1 + c
      one-tailed (left)   σ1 = σ2   σ1 < σ2   S1²/S2² < 1 − c
      two-tailed          σ1 = σ2   σ1 ≠ σ2   S1²/S2² > 1 + cR where α/2 = P(T > 1 + cR), or S1²/S2² < 1 − cL where α/2 = P(T < 1 − cL)
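The F statistic is just the ratio of the two sample variances; a hand computation on example samples:

```python
# Two-sample F statistic S1^2 / S2^2; under H0 it follows
# F with (n1 - 1, n2 - 1) degrees of freedom.
def sample_var(xs):
    xbar = sum(xs) / len(xs)
    return sum((x - xbar)**2 for x in xs) / (len(xs) - 1)

x = [2.1, 1.8, 2.5, 2.0, 1.6]        # example sample 1
y = [3.0, 2.2, 2.6, 2.9, 2.4, 2.7]   # example sample 2
T = sample_var(x) / sample_var(y)
print(round(T, 3))  # compare against F(4, 5) quantiles
```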
