Probability
Lecture 6
Centre for Data Science, ITER
Siksha ‘O’ Anusandhan (Deemed to be University), Bhubaneswar, Odisha, India.
Contents
1 Introduction
2 Dependence and independence of events
3 Conditional Probability
4 Bayes’s Theorem
5 Random Variables
6 Continuous Distributions
7 Probability Density Function
8 Cumulative Distribution Function
9 The Normal Distribution
10 The Central Limit Theorem
Introduction
Probability is a way of quantifying the uncertainty associated with
events chosen from some universe of events.
Notationally, we write P(E) to mean “the probability of the event E.”
Dependence and independence of events
Mathematically, we say that two events E and F are independent if
the probability that they both happen is the product of the
probabilities that each one happens:
P(E, F) = P(E)P(F)
For instance, if we flip a fair coin twice, knowing whether the first
flip is heads gives us no information about whether the second flip
is heads. These events are independent.
On the other hand, knowing whether the first flip is heads certainly
gives us information about whether both flips are tails. (If the first
flip is heads, then definitely it’s not the case that both flips are
tails.) These two events are dependent.
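As a quick numerical illustration (a simulation sketch of our own, not from the source), we can check that for two fair coin flips the probability of two heads is close to the product of the individual probabilities:

import random

# Simulate pairs of independent fair coin flips (True = heads).
trials = [(random.random() < 0.5, random.random() < 0.5)
          for _ in range(100_000)]
p_first = sum(f for f, _ in trials) / len(trials)        # ≈ P(first heads) = 0.5
p_both = sum(f and s for f, s in trials) / len(trials)   # ≈ P(both heads)
print(p_first * 0.5, p_both)   # both should be close to 0.25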
Conditional Probability
If two events E and F are not necessarily independent (and if the
probability of F is not zero), then we define the probability of E
“conditional on F” as:
P(E | F) = P(E, F) / P(F)
We can say that this is the probability that E happens, given that
we know that F happens.
We often rewrite this as:
P(E, F) = P(E | F) P(F)
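To make the definition concrete, here is a small simulation sketch (our own illustration) estimating P(both flips heads | first flip heads) as the ratio P(E, F) / P(F):

import random

first = both = 0
for _ in range(100_000):
    f1 = random.random() < 0.5   # first flip heads?
    f2 = random.random() < 0.5   # second flip heads?
    first += f1
    both += f1 and f2

# P(both heads | first heads) = P(both, first) / P(first) ≈ 0.5
print(both / first)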
Conditional Probability (Contd.)
When E and F are independent, P(E, F) = P(E)P(F), so the definition gives:
P(E | F) = P(E)P(F) / P(F) = P(E)
which is the mathematical way of expressing that knowing F
occurred gives us no additional information about whether E
occurred.
Bayes’s Theorem
Bayes’s theorem is a way of “reversing” conditional probabilities.
Let’s say we need to know the probability of some event E
conditional on some other event F occurring. But we only have
information about the probability of F conditional on E occurring.
Using the definition of conditional probability twice tells us that:
P(E | F) = P(E, F) / P(F) = P(F | E) P(E) / P(F)
Bayes’s Theorem
The event F can be split into the two mutually exclusive events “F
and E” and “F and ¬E.” If we write ¬E for “not E” (i.e., “E doesn’t
happen”), then:
P(F) = P(F, E) + P(F, ¬E)
so that:
P(E | F) = P(F | E)P(E) / [P(F | E)P(E) + P(F | ¬E)P(¬E)]
which is how Bayes’s theorem is often stated.
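The statement above translates directly into code. Below is a minimal sketch (the function name and the numbers in the example are our own, purely illustrative):

def bayes_posterior(p_f_given_e: float, p_e: float,
                    p_f_given_not_e: float) -> float:
    """P(E | F) computed from P(F | E), P(E), and P(F | ¬E)."""
    numerator = p_f_given_e * p_e
    return numerator / (numerator + p_f_given_not_e * (1 - p_e))

# Illustrative numbers: a condition with P(E) = 0.0001, a test with
# P(F | E) = 0.99 and P(F | ¬E) = 0.01. A positive test still leaves
# the condition unlikely:
print(bayes_posterior(0.99, 0.0001, 0.01))   # ≈ 0.0098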
Random Variables
A random variable is a variable whose possible values have an
associated probability distribution.
Eg: A very simple random variable equals 1 if a coin flip turns up
heads and 0 if the flip turns up tails.
The expected value of a random variable is the average of its
values weighted by their probabilities.
Eg: The coin flip variable has an expected value of
1/2 (= 0 * 1/2 + 1 * 1/2),
and a variable equal to random.randrange(10) (a uniformly random
integer from 0 through 9) has an expected value of 4.5.
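Both numbers follow directly from the definition; here is a small helper of our own (expected_value is not from the source) that checks them:

def expected_value(outcomes, probabilities) -> float:
    """Average of the outcomes weighted by their probabilities."""
    return sum(x * p for x, p in zip(outcomes, probabilities))

print(expected_value([0, 1], [0.5, 0.5]))      # 0.5, the coin flip
print(expected_value(range(10), [0.1] * 10))   # 4.5, random.randrange(10)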
Continuous Distributions
A coin flip corresponds to a discrete distribution—one that
associates positive probability with discrete outcomes.
A continuous distribution describes the probabilities of the
possible values of a continuous random variable, i.e., a random
variable whose set of possible values is infinite and uncountable.
Eg: The uniform distribution puts equal weight on all the
numbers between 0 and 1.
Probability Density Function
Because there are infinitely many numbers between 0 and 1, the
weight the uniform distribution assigns to any individual point
must necessarily be zero.
For this reason, we represent a continuous distribution with a
probability density function (PDF) such that the probability of
seeing a value in a certain interval equals the integral of the
density function over the interval.
The density function for the uniform distribution is just:
def uniform_pdf(x: float) -> float:
    return 1 if 0 <= x < 1 else 0   # constant density 1 on [0, 1), 0 elsewhere
Cumulative Distribution Function
We will often be more interested in the cumulative distribution
function (CDF), which gives the probability that a random variable
is less than or equal to a certain value.
The CDF for the uniform distribution is:
def uniform_cdf(x: float) -> float:
    if x < 0: return 0    # uniform is never less than 0
    elif x < 1: return x  # e.g., P(X <= 0.4) = 0.4
    else: return 1        # uniform is always less than 1
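Combining the PDF and the CDF: the probability of landing in an interval is the integral of the density, which equals the difference of CDF values. A quick check:

# P(0.2 < X <= 0.5) for the uniform distribution
print(uniform_cdf(0.5) - uniform_cdf(0.2))   # 0.3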
The Normal Distribution
The normal distribution is the classic bell curve–shaped
distribution and is completely determined by two parameters: its
mean µ (mu) and its standard deviation σ (sigma).
The mean indicates where the bell is centered, and the standard
deviation how “wide” it is.
It has the PDF:
f(x | µ, σ) = (1 / (√(2π) · σ)) · e^(−(x − µ)² / (2σ²))
Normal Distribution (Contd.)
It can be implemented as:
import math

SQRT_TWO_PI = math.sqrt(2 * math.pi)

def normal_pdf(x: float, mu: float = 0, sigma: float = 1) -> float:
    return math.exp(-(x - mu) ** 2 / 2 / sigma ** 2) / (SQRT_TWO_PI * sigma)
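A quick sanity check (our own): at x = µ the density should equal 1/(√(2π) · σ), about 0.3989 for the standard normal:

print(normal_pdf(0))      # 0.39894...
print(1 / SQRT_TWO_PI)    # the same value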
When µ = 0 and σ = 1, it’s called the standard normal distribution.
If Z is a standard normal random variable, then it turns out that:
X = σZ + µ is also normal but with mean µ and standard deviation σ.
Conversely, if X is a normal random variable with mean µ and standard
deviation σ, then
Z = (X − µ)/σ is a standard normal variable.
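A simulation sketch (our own, using random.gauss from the standard library) confirming the rescaling:

import random

mu, sigma = 5.0, 2.0
xs = [sigma * random.gauss(0, 1) + mu for _ in range(100_000)]

mean = sum(xs) / len(xs)
std = (sum((x - mean) ** 2 for x in xs) / len(xs)) ** 0.5
print(mean, std)   # should be close to 5.0 and 2.0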
Normal Distribution (Contd.)
The CDF for the normal distribution cannot be written in an
“elementary” manner, but we can write it in Python using
math.erf, the error function:
def normal_cdf(x: float, mu: float = 0, sigma: float = 1) -> float:
    return (1 + math.erf((x - mu) / math.sqrt(2) / sigma)) / 2
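For example, the familiar fact that roughly 68% of a normal distribution’s mass lies within one standard deviation of the mean drops out immediately:

print(normal_cdf(1) - normal_cdf(-1))   # ≈ 0.6827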
The Central Limit Theorem
If x1, ..., xn are independent and identically distributed random
variables, each with mean µ and standard deviation σ, and if n is
large, then:
(1/n)(x1 + x2 + ... + xn)
is approximately normally distributed with mean µ and standard
deviation σ/√n.
Equivalently (but often more usefully),
[(x1 + x2 + ... + xn) − µn] / (σ√n)
is approximately normally distributed with mean 0 and standard
deviation 1.
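A simulation sketch of our own for the first statement, averaging uniform draws (which have mean 1/2 and standard deviation 1/√12):

import math, random

n = 1_000
means = [sum(random.random() for _ in range(n)) / n
         for _ in range(10_000)]
avg = sum(means) / len(means)
spread = (sum((m - avg) ** 2 for m in means) / len(means)) ** 0.5
print(avg)                                        # ≈ 0.5
print(spread, math.sqrt(1 / 12) / math.sqrt(n))   # spread ≈ σ/√n ≈ 0.0091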
Central Limit Theorem (Contd.)
A Binomial(n, p) random variable is simply the sum of n
independent Bernoulli(p) random variables, each of which equals
1 with probability p and 0 with probability 1 − p:
import random

def bernoulli_trial(p: float) -> int:
    return 1 if random.random() < p else 0   # 1 with probability p

def binomial(n: int, p: float) -> int:
    return sum(bernoulli_trial(p) for _ in range(n))   # sum of n Bernoulli trials
Central Limit Theorem (Contd.)
The mean of a Bernoulli(p) variable is p, and its standard deviation
is √(p(1 − p)).
The central limit theorem says that as n gets large, a Binomial(n, p)
variable is approximately a normal random variable with mean
µ = np and standard deviation σ = √(np(1 − p)).
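A hedged numerical check (our own sketch, reusing binomial and normal_cdf from the earlier slides):

import math

n, p = 100, 0.5
mu, sigma = n * p, math.sqrt(n * p * (1 - p))   # 50 and 5

draws = [binomial(n, p) for _ in range(10_000)]
empirical = sum(x <= 55 for x in draws) / len(draws)   # simulated P(X <= 55)
approx = normal_cdf(55.5, mu, sigma)   # normal approximation (continuity-corrected)
print(empirical, approx)               # the two should be close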
References
[1] Joel Grus, Data Science from Scratch: First Principles with Python, O’Reilly Media.
Thank You
Any Questions?