Summations

• Geometric series: $\sum_{k=L}^{U} q^k = \frac{q^L - q^{U+1}}{1-q}$ (holds for any $q \neq 1$), $\sum_{k=L}^{\infty} q^k = \frac{q^L}{1-q}$ if $|q| < 1$
• Polynomial series: $\sum_{k=L}^{U} c = (U - L + 1)c$, $\sum_{k=1}^{n} k = \frac{n(n+1)}{2}$, $\sum_{k=1}^{n} k^2 = \frac{n(n+1)(2n+1)}{6}$
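A quick numerical check of these closed forms (a minimal sketch, assuming NumPy is available; the values of q, L, U, n are arbitrary):

```python
import numpy as np

q, L, U, n = 0.7, 2, 9, 10
k = np.arange(L, U + 1)

# Finite geometric series vs. closed form
assert np.isclose((q**k).sum(), (q**L - q**(U + 1)) / (1 - q))

# Infinite geometric series (|q| < 1), truncated at a large index
kk = np.arange(L, 2000)
assert np.isclose((q**kk).sum(), q**L / (1 - q))

# Polynomial series
m = np.arange(1, n + 1)
assert m.sum() == n * (n + 1) // 2
assert (m**2).sum() == n * (n + 1) * (2 * n + 1) // 6
```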
Basic Probability

• The sample space S is the set of all possible outcomes
• An event A is a subset of S
• $A \cap B$ is the intersection: the set of elements in both A and B
• $A \cup B$ is the union: the set of elements in either A or B
• Events A, B are disjoint or mutually exclusive if $A \cap B = \emptyset$, where $\emptyset = \{\}$ is the empty set
• Probability axioms: (i) $P[A] \geq 0$, (ii) $P[S] = 1$, (iii) if $A_1, A_2, \ldots$ are mutually exclusive, then $P[A_1 \cup A_2 \cup \cdots] = P[A_1] + P[A_2] + \cdots$
• Consequences of the probability axioms: $P[A^c] = 1 - P[A]$, $P[\emptyset] = 0$, $0 \leq P[A] \leq 1$, $P[A \cup B] \leq P[A] + P[B]$, $P[A \cup B] = P[A] + P[B] - P[A \cap B]$, if $A \subset B$ then $P[A] \leq P[B]$, $P[\{s_1, \ldots, s_m\}] = \sum_{i=1}^{m} P[s_i]$
• Conditional probability: $P[A|B] = \frac{P[A,B]}{P[B]}$
• Law of total probability: if $B_1, \ldots, B_m$ is a partition, then $P[A] = \sum_{i=1}^{m} P[B_i] P[A|B_i]$
• Bayes' law: $P[B|A] = \frac{P[A|B]\,P[B]}{P[A]}$
• Independence: A and B are independent events if $P[A,B] = P[A]P[B]$, or equivalently $P[A|B] = P[A]$
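A small worked sketch of the total-probability/Bayes pair; the prevalence, sensitivity, and false-positive numbers below are hypothetical, chosen only for illustration:

```python
# Hypothetical: a condition with 1% prevalence, a test with
# P[pos|sick] = 0.95 and P[pos|healthy] = 0.10.
p_sick = 0.01
p_pos_given_sick = 0.95
p_pos_given_healthy = 0.10

# Law of total probability over the partition {sick, healthy}
p_pos = p_pos_given_sick * p_sick + p_pos_given_healthy * (1 - p_sick)

# Bayes' law: P[sick|pos] = P[pos|sick] P[sick] / P[pos]
p_sick_given_pos = p_pos_given_sick * p_sick / p_pos
print(p_sick_given_pos)  # ~0.088: a positive test is far from conclusive
```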
Counting Methods

• If outcomes are equally likely, then $P[A] = \frac{|A|}{|S|}$ where $|A|$ is the number of elements in A
• Number of ways to choose k out of n objects:
  – Sampling without replacement, order matters (permutations): $(n)_k = \frac{n!}{(n-k)!}$
  – Sampling without replacement, order doesn't matter (combinations): $\binom{n}{k} = \frac{n!}{k!(n-k)!}$
  – Sampling with replacement: $n^k$
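These counts match Python's standard library directly; a minimal check using `math.perm` and `math.comb` (available in Python 3.8+):

```python
from math import comb, factorial, perm

n, k = 10, 3
assert perm(n, k) == factorial(n) // factorial(n - k)                   # (n)_k = 720
assert comb(n, k) == factorial(n) // (factorial(k) * factorial(n - k))  # C(10,3) = 120
assert n**k == 1000  # ordered sampling with replacement
```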
Distribution Functions for Random Variables

Cumulative distribution function (CDF)
• Definition: $F_X(x) = P[X \leq x]$
• Properties: $F_X(-\infty) = 0$, $F_X(\infty) = 1$, $F_X(x_1) \leq F_X(x_2)$ if $x_1 \leq x_2$, $F_X(x_2) - F_X(x_1) = P[x_1 < X \leq x_2]$
• Joint: $F_{X,Y}(x, y) = P[X \leq x, Y \leq y]$
• Marginal: $F_X(x) = F_{X,Y}(x, \infty)$

Probability mass function (PMF, discrete)
• Definition: $P_X(x) = P[X = x]$
• Properties: $P_X(x) \geq 0$, $\sum_x P_X(x) = 1$, $P[X \in A] = \sum_{x \in A} P_X(x)$
• Joint: $P_{X,Y}(x, y) = P[X = x, Y = y]$
• Marginal: $P_X(x) = \sum_y P_{X,Y}(x, y)$
• Conditional on an event: $P_{X|X \in A}(x) = \frac{P_X(x)}{P[X \in A]}$ for $x \in A$, 0 otherwise
• Conditional on a RV: $P_{Y|X}(y|x) = \frac{P_{X,Y}(x, y)}{P_X(x)}$

Probability density function (PDF, continuous)
• Definition: $f_X(x) = \frac{dF_X(x)}{dx}$
• Properties: $f_X(x) \geq 0$, $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$, $P[X \in A] = \int_A f_X(x)\,dx$
• Joint: $f_{X,Y}(x, y) = \frac{\partial^2 F_{X,Y}(x, y)}{\partial x\,\partial y}$
• Marginal: $f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy$
• Conditional on an event: $f_{X|X \in A}(x) = \frac{f_X(x)}{P[X \in A]}$ for $x \in A$, 0 otherwise
• Conditional on a RV: $f_{Y|X}(y|x) = \frac{f_{X,Y}(x, y)}{f_X(x)}$
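A small sketch of the marginal and conditional PMF relations, using a made-up joint PMF on a 2x3 grid (NumPy assumed):

```python
import numpy as np

# Made-up joint PMF P_{X,Y}(x, y): rows index x, columns index y
P_XY = np.array([[0.10, 0.20, 0.10],
                 [0.25, 0.15, 0.20]])
assert np.isclose(P_XY.sum(), 1.0)

P_X = P_XY.sum(axis=1)           # marginal of X: sum over y
P_Y_given_X0 = P_XY[0] / P_X[0]  # conditional PMF of Y given X = x_0
assert np.isclose(P_Y_given_X0.sum(), 1.0)
```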
Expectation, Variance, etc.

Single random variable:
• Expectation: (discrete) $E[g(X)] = \sum_x g(x) P_X(x)$, (continuous) $E[g(X)] = \int_{-\infty}^{\infty} g(x) f_X(x)\,dx$
  – $\mu_X = E[X]$
  – $E[a\,g(X) + b] = a\,E[g(X)] + b$
• Variance: $\sigma_X^2 = \mathrm{Var}[X] = E[(X - \mu_X)^2] = E[X^2] - \mu_X^2$
  – $\mathrm{Var}[aX + b] = a^2\,\mathrm{Var}[X]$
• Moment generating function (MGF): $\varphi_X(s) = E[e^{sX}]$
  – $E[X^n] = \left.\frac{d^n \varphi_X(s)}{ds^n}\right|_{s=0}$

Two random variables:
• $E[g(X, Y)] = \sum_x \sum_y g(x, y) P_{X,Y}(x, y)$ or $\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} g(x, y) f_{X,Y}(x, y)\,dx\,dy$
• $E[g(Y)|X = x] = \sum_y g(y) P_{Y|X}(y|x)$ or $\int_{-\infty}^{\infty} g(y) f_{Y|X}(y|x)\,dy$
• Covariance: $\mathrm{Cov}[X, Y] = E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - \mu_X \mu_Y$
• $\mathrm{Var}[X + Y] = \mathrm{Var}[X] + 2\,\mathrm{Cov}[X, Y] + \mathrm{Var}[Y]$
• X and Y are uncorrelated if $\mathrm{Cov}[X, Y] = 0$
• Correlation coefficient: $\rho_{X,Y} = \frac{\mathrm{Cov}[X,Y]}{\sqrt{\mathrm{Var}[X]\,\mathrm{Var}[Y]}}$, $-1 \leq \rho_{X,Y} \leq 1$
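A Monte Carlo sketch of the covariance identities (NumPy assumed; the distributions of X and Y are arbitrary choices, and sample estimates match the identities only approximately):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(1.0, 2.0, size=200_000)
y = 0.5 * x + rng.normal(0.0, 1.0, size=200_000)  # correlated with x

cov = np.mean(x * y) - x.mean() * y.mean()         # Cov = E[XY] - mu_X mu_Y
# Var[X + Y] = Var[X] + 2 Cov[X, Y] + Var[Y]
assert np.isclose(np.var(x + y), np.var(x) + 2 * cov + np.var(y), rtol=1e-2)

rho = cov / np.sqrt(np.var(x) * np.var(y))
assert -1 <= rho <= 1
```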
Independent Random Variables

• X and Y are independent if any of the following hold:
  $F_{X,Y}(x, y) = F_X(x) F_Y(y)$, $P_{X,Y}(x, y) = P_X(x) P_Y(y)$, $f_{X,Y}(x, y) = f_X(x) f_Y(y)$,
  $P_{Y|X}(y|x) = P_Y(y)$, $f_{Y|X}(y|x) = f_Y(y)$
• If X and Y are independent, then $E[g(X)h(Y)] = E[g(X)]\,E[h(Y)]$
• Independent implies uncorrelated, but not the other way around
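The classic counterexample for the last bullet: X standard normal and Y = X^2 are uncorrelated, since Cov[X, X^2] = E[X^3] = 0, yet Y is a deterministic function of X. A quick numerical sketch (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(500_000)
y = x**2  # fully determined by x, hence dependent

cov = np.mean(x * y) - x.mean() * y.mean()
print(cov)  # ~0: uncorrelated despite the dependence
```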
Derived Random Variables

• Discrete: if $W = g(X, Y)$ then $P_W(w) = P[g(X, Y) = w]$
• Continuous: if $W = g(X, Y)$ then $F_W(w) = P[g(X, Y) \leq w]$; differentiate to find the PDF
• If $W = \max\{X, Y\}$, then $F_W(w) = F_{X,Y}(w, w)$
• If $W = X + Y$, then $f_W(w) = \int_{-\infty}^{\infty} f_{X,Y}(x, w - x)\,dx = \int_{-\infty}^{\infty} f_{X,Y}(w - y, y)\,dy$ (a convolution if X and Y are independent)

Random Vectors

• $\mathbf{X} = [X_1\ X_2\ \cdots\ X_n]^T$
• $\mu_X = E[\mathbf{X}] = [E[X_1]\ E[X_2]\ \cdots\ E[X_n]]^T$
• Covariance matrix: $C_X = E[(\mathbf{X} - \mu_X)(\mathbf{X} - \mu_X)^T]$ contains the variances along the diagonal and all covariances off the diagonal
• If $\mathbf{Y} = A\mathbf{X} + \mathbf{b}$, then $\mu_Y = A\mu_X + \mathbf{b}$ and $C_Y = A C_X A^T$
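A sketch of the last identity, estimating the mean and covariance of Y = AX + b from samples (NumPy assumed; A, b, mu_X, C_X are arbitrary values chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
mu_X = np.array([1.0, -2.0])
C_X = np.array([[2.0, 0.5],
                [0.5, 1.0]])
A = np.array([[1.0, 1.0],
              [0.0, 3.0]])
b = np.array([4.0, -1.0])

X = rng.multivariate_normal(mu_X, C_X, size=300_000)
Y = X @ A.T + b

print(Y.mean(axis=0))  # ~ A @ mu_X + b
print(np.cov(Y.T))     # ~ A @ C_X @ A.T
```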
Distribution Families

(For each family: the distribution with "0 otherwise" suppressed, then the expectation, variance, and MGF $E[e^{sX}]$.)

• Bernoulli(p): $P_X(x) = 1 - p$ for $x = 0$, $p$ for $x = 1$;  $E[X] = p$;  $\mathrm{Var}[X] = p(1-p)$;  MGF $1 - p + pe^s$
• Binomial(n, p): $P_X(x) = \binom{n}{x} p^x (1-p)^{n-x}$, $x = 0, 1, \ldots, n$;  $E[X] = np$;  $\mathrm{Var}[X] = np(1-p)$;  MGF $(1 - p + pe^s)^n$
• Geometric(p): $P_X(x) = (1-p)^{x-1} p$, $x = 1, 2, \ldots$;  $E[X] = \frac{1}{p}$;  $\mathrm{Var}[X] = \frac{1-p}{p^2}$;  MGF $\frac{pe^s}{1 - (1-p)e^s}$
• Disc. Uniform(k, ℓ): $P_X(x) = \frac{1}{\ell - k + 1}$, $x = k, k+1, \ldots, \ell$;  $E[X] = \frac{k + \ell}{2}$;  $\mathrm{Var}[X] = \frac{(\ell - k)(\ell - k + 2)}{12}$;  MGF $\frac{e^{sk} - e^{s(\ell+1)}}{(\ell - k + 1)(1 - e^s)}$
• Poisson(α): $P_X(x) = \frac{\alpha^x e^{-\alpha}}{x!}$, $x = 0, 1, \ldots$;  $E[X] = \alpha$;  $\mathrm{Var}[X] = \alpha$;  MGF $e^{\alpha(e^s - 1)}$
• Pascal(k, p): $P_X(x) = \binom{x-1}{k-1} p^k (1-p)^{x-k}$, $x = k, k+1, \ldots$;  $E[X] = \frac{k}{p}$;  $\mathrm{Var}[X] = \frac{k(1-p)}{p^2}$;  MGF $\left(\frac{pe^s}{1 - (1-p)e^s}\right)^k$
• Gaussian(µ, σ): $f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x-\mu)^2/(2\sigma^2)}$;  $E[X] = \mu$;  $\mathrm{Var}[X] = \sigma^2$;  MGF $e^{s\mu + s^2\sigma^2/2}$
• Cont. Uniform(a, b): $f_X(x) = \frac{1}{b-a}$, $a < x < b$;  $E[X] = \frac{a+b}{2}$;  $\mathrm{Var}[X] = \frac{(b-a)^2}{12}$;  MGF $\frac{e^{bs} - e^{as}}{s(b-a)}$
• Exponential(λ): $f_X(x) = \lambda e^{-\lambda x}$, $x > 0$;  $E[X] = \frac{1}{\lambda}$;  $\mathrm{Var}[X] = \frac{1}{\lambda^2}$;  MGF $\frac{\lambda}{\lambda - s}$
• Erlang(n, λ): $f_X(x) = \frac{\lambda^n x^{n-1} e^{-\lambda x}}{(n-1)!}$, $x > 0$;  $E[X] = \frac{n}{\lambda}$;  $\mathrm{Var}[X] = \frac{n}{\lambda^2}$;  MGF $\left(\frac{\lambda}{\lambda - s}\right)^n$
• Rayleigh(a): $f_X(x) = a^2 x e^{-a^2 x^2/2}$, $x > 0$;  $E[X] = \sqrt{\frac{\pi}{2a^2}}$;  $\mathrm{Var}[X] = \frac{2 - \pi/2}{a^2}$;  MGF complicated
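A simulation spot check of two rows (a sketch, NumPy assumed; note that NumPy's generator uses the same x = 1, 2, ... convention for the geometric and parameterizes the exponential by scale = 1/λ):

```python
import numpy as np

rng = np.random.default_rng(3)

p = 0.3
g = rng.geometric(p, size=500_000)
print(g.mean(), g.var())  # ~ 1/p = 3.33 and (1-p)/p^2 = 7.78

lam = 2.0
e = rng.exponential(1 / lam, size=500_000)
print(e.mean(), e.var())  # ~ 1/lambda = 0.5 and 1/lambda^2 = 0.25
```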
Gaussian Random Variables/Vectors

• CDF: for a Gaussian(µ, σ), $F_X(x) = \Phi\left(\frac{x - \mu}{\sigma}\right)$ where $\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} e^{-u^2/2}\,du$
• Bivariate Gaussian: given $\mu_X, \mu_Y, \sigma_X, \sigma_Y, \rho$,
  $f_{X,Y}(x, y) = \frac{1}{2\pi \sigma_X \sigma_Y \sqrt{1 - \rho^2}} \exp\left(-\frac{\left(\frac{x - \mu_X}{\sigma_X}\right)^2 - \frac{2\rho(x - \mu_X)(y - \mu_Y)}{\sigma_X \sigma_Y} + \left(\frac{y - \mu_Y}{\sigma_Y}\right)^2}{2(1 - \rho^2)}\right)$
• Gaussian random vector: given $\mu_X, C_X$,
  $f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{(2\pi)^{n/2} (\det C_X)^{1/2}} \exp\left(-\frac{1}{2} (\mathbf{x} - \mu_X)^T C_X^{-1} (\mathbf{x} - \mu_X)\right)$
• $E[Y|X] = \mu_Y + \rho \frac{\sigma_Y}{\sigma_X}(X - \mu_X)$
• $\mathrm{Var}[Y|X] = \sigma_Y^2 (1 - \rho^2)$
• Any linear function, marginal distribution, or conditional distribution of Gaussians is still Gaussian
• Gaussian variables are independent if and only if they are uncorrelated

Sums of Random Variables

Let $W_n = X_1 + \cdots + X_n$ and $M_n = W_n / n$.
• $E[W_n] = E[X_1] + \cdots + E[X_n]$
• $\mathrm{Var}[W_n] = \sum_{i=1}^{n} \mathrm{Var}[X_i] + 2 \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \mathrm{Cov}[X_i, X_j]$
• If the $X_i$ are uncorrelated (or independent), then $\mathrm{Var}[W_n] = \sum_{i=1}^{n} \mathrm{Var}[X_i]$
• Law of large numbers: if the $X_i$ are independent and identically distributed (iid), then $\lim_{n \to \infty} P\left[|M_n - \mu_X| \geq c\right] = 0$ for any $c > 0$
• Central limit theorem: if the $X_i$ are iid, then $W_n$ is approximately Gaussian for large n: $F_{W_n}(w) \approx \Phi\left(\frac{w - n\mu_X}{\sqrt{n\sigma_X^2}}\right)$
• If the $X_i$ are independent, then the MGFs satisfy $\varphi_{W_n}(s) = \varphi_{X_1}(s)\varphi_{X_2}(s) \cdots \varphi_{X_n}(s)$
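A CLT sketch: the empirical CDF of a sum of n iid Uniform(0, 1) variables (µ = 1/2, σ² = 1/12) against the Gaussian approximation above; NumPy assumed, with Φ built from `math.erf`:

```python
import numpy as np
from math import erf, sqrt

def phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

rng = np.random.default_rng(4)
n = 30
W = rng.uniform(0, 1, size=(200_000, n)).sum(axis=1)  # samples of W_n

w = 16.0
empirical = (W <= w).mean()                 # F_{W_n}(w) from simulation
approx = phi((w - n * 0.5) / sqrt(n / 12))  # CLT approximation
print(empirical, approx)                    # close even for moderate n
```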
Probability Bounds

• Markov inequality: if $P[X < 0] = 0$, then $P[X \geq c] \leq \mu_X / c$
• Chebyshev inequality: $P\left[|X - \mu_X| \geq c\right] \leq \sigma_X^2 / c^2$
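A quick numerical illustration that both bounds hold (though they are often loose), using an Exponential(1) variable, so µ = 1 and σ² = 1 (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.exponential(1.0, size=500_000)
c = 3.0

print((x >= c).mean(), 1.0 / c)                   # Markov:    ~0.050 <= 0.333
print((np.abs(x - 1.0) >= c).mean(), 1.0 / c**2)  # Chebyshev: ~0.018 <= 0.111
```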