Lecture 01

The document outlines the foundational concepts of probability and statistics relevant to data science, focusing on random variables, their types, and properties. It covers discrete random variables, probability mass functions, cumulative distribution functions, expectations, and variances, providing examples to illustrate these concepts. The lecture serves as an introduction to essential statistical tools necessary for data analysis.

Uploaded by

kht07144

SEHH2311

FOUNDATIONS OF DATA SCIENCE

LECTURE 1
Revision of Essentials of Probability and
Statistics
Topics
1. Discrete Random Variables
2. Continuous Random Variables
3. Expectations of Random Variables
4. Variances of Random Variables
5. Distributions of Functions of Random Variables

SEHH2311 Foundations of Data Science Page 2

Random Variable
• A random variable is a variable whose value is
determined by random events
• A random variable can be discrete or continuous
• Example
– The height of a randomly selected student from HKCC is a
continuous random variable (X)
– The number of family members in a randomly selected
household from an estate is a discrete random variable (Y)
• We typically use a capital letter to denote a random
variable and the corresponding small letter to denote the "observed"
data values of that random variable

Discrete Random Variable
• The behavior of a discrete random variable can be fully
described by a probability mass function (p.m.f.)
• The probability mass function \(f(x)\) of a discrete random
variable \(X\) is the probability \(P(X = x)\)
• The sample space \(S\) is the set of possible values of \(X\)
• An event regarding the random variable \(X\) corresponds
to a subset of \(S\), for example, \(A \subset S\)

Discrete Random Variable
Properties of p.m.f.
Properties of a pmf \(f(x)\)
1. \(f(x) > 0\) for \(x \in S\)
2. \(\sum_{x \in S} f(x) = 1\)
3. \(P(X \in A) = \sum_{x \in A} f(x)\), where \(A \subset S\)

Example 1
Let \(X\) be the number showing up when rolling a fair die.

Sample space: \(S = \{1, 2, 3, 4, 5, 6\}\)
pmf: \(f(x) = 1/6\) for \(x \in S\),
i.e. \(f(1) = f(2) = \cdots = f(6) = 1/6\)

Suppose we are interested in the event that the number showing up
is even. Then the set \(A\) is \(\{2, 4, 6\}\) and
\[P(\text{even}) = \sum_{x \in A} f(x) = f(2) + f(4) + f(6) = 1/2\]

Example 2
Suppose we have two $2 coins, three $5 coins and five $10 coins in
a box. Let \(X\) be the "value" of a coin drawn randomly from the box.

Sample space: \(S = \{\$2, \$5, \$10\}\)

pmf: \(f(2) = 2/10 = 0.2\), \(f(5) = 3/10 = 0.3\), \(f(10) = 5/10 = 0.5\)

Suppose we are interested in the event of drawing a coin of value
greater than $2, i.e. \(A = \{\$5, \$10\}\). Then
\[P(X > 2) = \sum_{x \in A} f(x) = f(5) + f(10) = 0.8\]
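
The pmf properties and the event probabilities in Examples 1 and 2 can be checked with a few lines of Python. This is only an illustrative sketch: the helper names `check_pmf` and `event_probability` are made up for this example, and a pmf is assumed to be stored as a dictionary mapping each value to its probability.

```python
from fractions import Fraction

def check_pmf(pmf):
    """Properties 1 and 2: every mass is positive and the masses sum to 1."""
    return all(p > 0 for p in pmf.values()) and sum(pmf.values()) == 1

def event_probability(pmf, event):
    """Property 3: P(X in A) is the sum of f(x) over the values x in A."""
    return sum(p for x, p in pmf.items() if event(x))

# Example 1: a fair die.
die = {x: Fraction(1, 6) for x in range(1, 7)}
# Example 2: two $2 coins, three $5 coins and five $10 coins.
coins = {2: Fraction(2, 10), 5: Fraction(3, 10), 10: Fraction(5, 10)}

p_even = event_probability(die, lambda x: x % 2 == 0)
p_over_2 = event_probability(coins, lambda x: x > 2)
print(p_even, p_over_2)  # 1/2 4/5
```

Exact fractions are used instead of floats so that the "sums to 1" check is not affected by rounding.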

Example 3
Suppose \(X\) is the number of Heads showing up when
flipping a fair coin 3 times. Find the sample space \(S\) and
the probability mass function of \(X\).
Solution:
Sample space: \(S = \{0, 1, 2, 3\}\)

Probability Mass Function:

\(f(0) = P(X = 0) = 1/8\)
\(f(1) = P(X = 1) = 3/8\)
\(f(2) = P(X = 2) = 3/8\)
\(f(3) = P(X = 3) = 1/8\)
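
One way to arrive at the pmf in Example 3 is to list all equally likely outcomes and count the Heads in each. The sketch below does this by brute force; the variable names are illustrative and not part of the lecture.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2**3 = 8 equally likely sequences of three fair-coin flips.
outcomes = list(product("HT", repeat=3))

# Count how many sequences give each number of Heads.
counts = Counter(seq.count("H") for seq in outcomes)

# pmf: f(x) = (number of sequences with x Heads) / 8
pmf = {x: Fraction(counts[x], len(outcomes)) for x in sorted(counts)}

for x, p in pmf.items():
    print(x, p)
```

The keys of `pmf` form the sample space \(\{0, 1, 2, 3\}\), and the values are 1/8, 3/8, 3/8 and 1/8.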

Discrete Random Variable
Cumulative Distribution Function (CDF)
The cumulative distribution function \(F(x)\) is defined as
\[F(x) = P(X \le x)\]

The CDF is the cumulative sum of the pmf up to \(x\).

Properties:
1. \(0 \le F(x) \le 1\)
2. \(F(x)\) is a non-decreasing function, i.e. \(F(x_2) \ge F(x_1)\)
if \(x_2 > x_1\)

Example 4
Suppose \(X\) is the number showing up when rolling a fair
die.

pmf: \(f(x) = 1/6\), for \(x = 1, 2, \ldots, 6\)

CDF: \(F(x) = P(X \le x) = x/6\), for \(x = 1, 2, \ldots, 6\)

\(F(1) = P(X \le 1) = f(1) = 1/6\)
\(F(2) = P(X \le 2) = f(1) + f(2) = 2/6\)
…
\(F(6) = P(X \le 6) = f(1) + f(2) + \cdots + f(6) = 6/6\)

Example 4
[Figure: visualization of the pmf and CDF of \(X\)]

Example 5
Suppose the pmf of the random variable \(X\) is defined as follows.

x      0     2     5     9     10
f(x)   0.2   0.2   0.0   0.1   0.5

The corresponding CDF will be

x      0     2     5     9     10
F(x)   0.2   0.4   0.4   0.5   1.0
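
The CDF row in Example 5 is a running total of the pmf row. A minimal sketch of that computation follows; the function name `cdf` and the rounding to 10 decimal places (used only to hide floating-point noise) are choices made for this illustration.

```python
from itertools import accumulate

xs = [0, 2, 5, 9, 10]
fs = [0.2, 0.2, 0.0, 0.1, 0.5]

# Running total of the pmf gives F at the tabulated values.
Fs = [round(total, 10) for total in accumulate(fs)]

def cdf(x):
    """F(x) = P(X <= x) for any real x, not only the tabulated values."""
    return round(sum(f for xi, f in zip(xs, fs) if xi <= x), 10)

print(list(zip(xs, Fs)))
print(cdf(3), cdf(9.5))
```

Between tabulated values the CDF stays flat, so `cdf(3)` equals \(F(2)\) and `cdf(9.5)` equals \(F(9)\).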

Example 6
Suppose 3 cards are drawn from an ordinary deck of cards with replacement. Let
\(X\) be the number of Spades (♠) out of the 3 cards drawn. Complete the tables
below and sketch the pmf and CDF of \(X\).

Solution: each draw is a Spade with probability 13/52 = 1/4, independently of the
other draws, so \(f(x) = \binom{3}{x} (1/4)^x (3/4)^{3-x}\). Rounded to 3 decimal places:

pmf
x      0       1       2       3
f(x)   0.422   0.422   0.141   0.016

CDF
x      0       1       2       3
F(x)   0.422   0.844   0.984   1
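
The table in Example 6 can be reproduced by treating each draw as an independent trial with success probability 13/52, which gives a binomial pmf. The sketch below assumes that model and uses exact fractions; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import accumulate
from math import comb

n = 3                 # number of cards drawn with replacement
p = Fraction(13, 52)  # probability that a single card is a Spade

# Binomial pmf: choose which x of the n draws are Spades.
pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]
cdf = list(accumulate(pmf))

for x in range(n + 1):
    print(x, pmf[x], float(pmf[x]), float(cdf[x]))
```

The exact pmf values are 27/64, 27/64, 9/64 and 1/64, that is 0.421875, 0.421875, 0.140625 and 0.015625 before rounding.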

Expectation of Discrete Random
Variables
The expectation of a random variable \(X\) is its average value when you
observe the random variable a large number of times.
The expectation of \(X\) is denoted by \(E(X)\) or \(\mu\) (the population mean of
\(X\)).
\[\mu = E(X) = \sum_{x \in S} x \cdot f(x)\]

If the random variable \(X\) only has non-zero probability for \(n\) distinct
values \(x_1, x_2, \ldots, x_n\), the expected value can be written as
\[\mu = E(X) = \sum_{i=1}^{n} x_i \cdot f(x_i)\]
(\(n\) can be infinity for some random variables)

Example 7
Suppose \(X\) is the number showing up when rolling a fair die. Find
the expected value of \(X\).
Solution:

x_i      1     2     3     4     5     6
f(x_i)   1/6   1/6   1/6   1/6   1/6   1/6

\[E(X) = \sum_{i=1}^{6} x_i \cdot f(x_i)
= 1 \cdot \tfrac{1}{6} + 2 \cdot \tfrac{1}{6} + 3 \cdot \tfrac{1}{6} + 4 \cdot \tfrac{1}{6} + 5 \cdot \tfrac{1}{6} + 6 \cdot \tfrac{1}{6}
= 3.5\]
That means the average of the numbers showing up when rolling a
fair die a large number of times is close to 3.5.

Example 8
Consider the random variable in Example 2: we have two $2 coins,
three $5 coins and five $10 coins in a box. Let \(X\) be the "value" of a
coin drawn randomly from the box.
Find the expected value of the coin drawn.
Solution:
\[E(X) = \sum_{i=1}^{3} x_i \cdot f(x_i) = 2(0.2) + 5(0.3) + 10(0.5) = 6.9\]
i.e. the expected value is $6.9.
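
The sums in Examples 7 and 8 follow the same pattern, so a single small function covers both. This is a sketch with a hypothetical helper name, `expectation`, and with each pmf stored as a dictionary from value to probability.

```python
from fractions import Fraction

def expectation(pmf):
    """E(X) = sum of x * f(x) over the values x with non-zero probability."""
    return sum(x * p for x, p in pmf.items())

# Example 7: a fair die.
die = {x: Fraction(1, 6) for x in range(1, 7)}
# Example 8: two $2 coins, three $5 coins and five $10 coins.
coins = {2: Fraction(2, 10), 5: Fraction(3, 10), 10: Fraction(5, 10)}

mean_die = expectation(die)
mean_coins = expectation(coins)
print(float(mean_die), float(mean_coins))  # 3.5 6.9
```

Note that neither 3.5 nor 6.9 is a value the random variable can actually take; the expectation is a long-run average, not a typical outcome.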

Variance & Standard Deviation of
Discrete Random Variable
Variance measures the "spread" of a random variable when you
observe it a large number of times. It is defined as
\[\sigma^2 = \mathrm{Var}(X) = \sum_{i=1}^{n} (x_i - \mu)^2 \cdot f(x_i)\]
or equivalently
\[\sigma^2 = \mathrm{Var}(X) = \sum_{i=1}^{n} x_i^2 \cdot f(x_i) - \mu^2\]

Since variance does not have the same unit as the random variable
\(X\), we define the standard deviation as the square root of the variance.
\[\sigma = \sqrt{\mathrm{Var}(X)} = \sqrt{\sigma^2}\]
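
The slide states the two variance formulas as equivalent without showing why. A short derivation, using only \(\sum_i x_i f(x_i) = \mu\) and \(\sum_i f(x_i) = 1\), could read as follows:

```latex
\begin{align*}
\sum_{i=1}^{n} (x_i - \mu)^2 f(x_i)
  &= \sum_{i=1}^{n} x_i^2 f(x_i) - 2\mu \sum_{i=1}^{n} x_i f(x_i) + \mu^2 \sum_{i=1}^{n} f(x_i) \\
  &= \sum_{i=1}^{n} x_i^2 f(x_i) - 2\mu \cdot \mu + \mu^2 \cdot 1 \\
  &= \sum_{i=1}^{n} x_i^2 f(x_i) - \mu^2
\end{align*}
```

The second form is usually quicker for hand calculation because the mean is subtracted only once, at the end.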

Variance & Standard Deviation of
Discrete Random Variable
Interpretation of variance:
• \((x_i - \mu)^2\) is the squared distance of \(x_i\) from the mean

• If you define \(Y = (X - \mu)^2\), \(\mathrm{Var}(X)\) is equal to the
expectation of \(Y\), i.e. \(\mathrm{Var}(X) = E(Y) = E[(X - \mu)^2]\)

• If most of the \(x_i\)'s are far from the mean \(\mu\), we will have
a large variance.

Example 9
Suppose \(X\) is the number showing up when rolling a fair die. Find
the variance and standard deviation of \(X\).
Solution:
\[\mu = \sum_{i=1}^{6} x_i \cdot f(x_i) = 1 \cdot \tfrac{1}{6} + 2 \cdot \tfrac{1}{6} + \cdots + 6 \cdot \tfrac{1}{6} = 3.5\]
\[\sigma^2 = \mathrm{Var}(X) = \sum_{i=1}^{6} x_i^2 \cdot f(x_i) - \mu^2
= 1^2 \cdot \tfrac{1}{6} + 2^2 \cdot \tfrac{1}{6} + \cdots + 6^2 \cdot \tfrac{1}{6} - 3.5^2 \approx 2.917\]
\[\sigma = \sqrt{\sigma^2} \approx 1.708\]

Example 10
Consider the random variable in Example 2: we have two $2 coins,
three $5 coins and five $10 coins in a box. Let \(X\) be the "value" of a
coin drawn randomly from the box.
Find the variance and standard deviation of \(X\).
Solution:
\[\mu = \sum_{i=1}^{3} x_i \cdot f(x_i) = 2(0.2) + 5(0.3) + 10(0.5) = 6.9\]
\[\sigma^2 = \mathrm{Var}(X) = \sum_{i=1}^{3} x_i^2 \cdot f(x_i) - \mu^2 = 2^2(0.2) + 5^2(0.3) + 10^2(0.5) - 6.9^2 = 10.69\]
\[\sigma = \sqrt{\sigma^2} \approx 3.270\]
i.e. \(\mu = \$6.9\) and \(\sigma \approx \$3.270\).
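
Both variance formulas can be evaluated side by side for the die and the coin box. The sketch below uses illustrative helper names (`mean`, `variance_by_definition`, `variance_by_shortcut`) and exact fractions, so the two formulas can be compared with `==`.

```python
from fractions import Fraction
from math import sqrt

def mean(pmf):
    """mu = sum of x * f(x)."""
    return sum(x * p for x, p in pmf.items())

def variance_by_definition(pmf):
    """Var(X) = sum of (x - mu)^2 * f(x)."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def variance_by_shortcut(pmf):
    """Var(X) = sum of x^2 * f(x), minus mu^2."""
    mu = mean(pmf)
    return sum(x ** 2 * p for x, p in pmf.items()) - mu ** 2

die = {x: Fraction(1, 6) for x in range(1, 7)}
coins = {2: Fraction(2, 10), 5: Fraction(3, 10), 10: Fraction(5, 10)}

for name, pmf in (("die", die), ("coins", coins)):
    var = variance_by_definition(pmf)
    assert var == variance_by_shortcut(pmf)  # the two formulas agree
    print(name, round(float(var), 3), round(sqrt(var), 3))
# die 2.917 1.708
# coins 10.69 3.27
```

For the die the exact variance is 35/12, and for the coin box it is 1069/100.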

Continuous Random Variable
• Examples of continuous random variables
– Height of a randomly selected student from a school
– Speed of a randomly selected car on a highway
• The behavior of a continuous random variable can be fully
described by its probability density function (pdf) \(f(x)\)
• The sample space (or domain) of a continuous random variable
typically consists of intervals of values instead of a set of distinct
values
• As with a discrete random variable, a continuous random variable also has a cumulative
distribution function (CDF) \(F(x)\).

Continuous Random Variable
Properties of pdf
Properties of a pdf \(f(x)\) for the random variable \(X\)
1. \(f(x) \ge 0\) for all \(x\).
2. \(\int_{-\infty}^{\infty} f(x)\,dx = 1\).
3. \(P(x_1 < X < x_2) = \int_{x_1}^{x_2} f(x)\,dx\) for any \(x_1\) less than \(x_2\).

Continuous Random Variable
Cumulative Distribution Function (CDF)
The cumulative distribution function \(F(x)\) is defined as
\[F(x) = P(X \le x) = \int_{-\infty}^{x} f(u)\,du\]

The CDF is the cumulative probability up to \(x\).

Properties:
1. \(0 \le F(x) \le 1\)
2. \(F(x)\) is a non-decreasing function, i.e. \(F(x_2) \ge F(x_1)\)
if \(x_2 > x_1\)

Uniform Distribution
A random variable \(X\) with a uniform distribution \(U(a, b)\) has a
constant pdf on a given interval \([a, b]\). The pdf \(f(x)\) and the CDF \(F(x)\) have the
following forms
\[f(x) = \begin{cases} \dfrac{1}{b - a} & \text{for } a \le x \le b \\ 0 & \text{for } x < a \text{ or } x > b \end{cases}\]
\[F(x) = \begin{cases} 0 & \text{for } x < a \\ \dfrac{x - a}{b - a} & \text{for } a \le x \le b \\ 1 & \text{for } x > b \end{cases}\]

Uniform Distribution
Suppose \(X\) has the distribution \(U(1, 4)\), i.e. \(a = 1\) and \(b = 4\).
\[f(x) = \begin{cases} \dfrac{1}{3} & \text{for } 1 \le x \le 4 \\ 0 & \text{for } x < 1 \text{ or } x > 4 \end{cases}\]
\[F(x) = \begin{cases} 0 & \text{for } x < 1 \\ \dfrac{x - 1}{3} & \text{for } 1 \le x \le 4 \\ 1 & \text{for } x > 4 \end{cases}\]

Uniform Distribution
Suppose \(X\) has the distribution \(U(1, 4)\). Find the following probabilities.
1. \(P(1.5 < X < 2)\)
2. \(P(-1 < X < 3)\)
3. \(P(X = 3)\)
Solution:
1. \(P(1.5 < X < 2) = \int_{1.5}^{2} \tfrac{1}{3}\,du = \tfrac{1}{3}(2 - 1.5) \approx 0.167\)
2. \(P(-1 < X < 3) = \int_{-1}^{3} f(u)\,du = \int_{-1}^{1} 0\,du + \int_{1}^{3} \tfrac{1}{3}\,du = 0 + \tfrac{1}{3}(3 - 1) \approx 0.667\)
3. \(P(X = 3) = \int_{3}^{3} \tfrac{1}{3}\,du = \tfrac{1}{3}(3 - 3) = 0\)
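
The three probabilities for \(U(1, 4)\) can also be obtained from the CDF, since \(P(x_1 < X < x_2) = F(x_2) - F(x_1)\). The sketch below implements the piecewise CDF directly; `uniform_cdf` and `uniform_prob` are names chosen for this illustration.

```python
def uniform_cdf(x, a, b):
    """F(x) for U(a, b): 0 below a, (x - a) / (b - a) on [a, b], 1 above b."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

def uniform_prob(x1, x2, a, b):
    """P(x1 < X < x2) = F(x2) - F(x1); single points carry zero probability."""
    return uniform_cdf(x2, a, b) - uniform_cdf(x1, a, b)

print(round(uniform_prob(1.5, 2, 1, 4), 3))  # 0.167
print(round(uniform_prob(-1, 3, 1, 4), 3))   # 0.667
print(uniform_prob(3, 3, 1, 4))              # 0.0
```

The second call shows that the part of the interval lying outside \([1, 4]\) contributes nothing, exactly as in the split integral above.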

Example 11
Suppose a random variable \(X\) has a pdf as shown below. Write
down the pdf and CDF of \(X\).
Solution:

\[f(x) = \begin{cases} 0 & \text{for } x < -1 \\ 1 + x & \text{for } -1 \le x \le 0 \\ 1 - x & \text{for } 0 < x \le 1 \\ 0 & \text{for } x > 1 \end{cases}\]
\[F(x) = \begin{cases} 0 & \text{for } x < -1 \\ x + x^2/2 + 1/2 & \text{for } -1 \le x \le 0 \\ x - x^2/2 + 1/2 & \text{for } 0 < x \le 1 \\ 1 & \text{for } x > 1 \end{cases}\]

Example 12
For the random variable described in Example 11, find the following
probabilities.
Solution:

\[P(X \le -0.5) = F(-0.5) = -0.5 + \frac{(-0.5)^2}{2} + \frac{1}{2} = 0.125\]
\[P(X \le 0.2) = F(0.2) = 0.2 - \frac{0.2^2}{2} + \frac{1}{2} = 0.68\]
\[P(-0.5 < X \le 0.2) = F(0.2) - F(-0.5) = 0.68 - 0.125 = 0.555\]
\[P(0 < X < 0.8) = F(0.8) - F(0) = 0.98 - 0.5 = 0.48\]
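
The four probabilities in Example 12 all come from evaluating the piecewise CDF of Example 11. A minimal sketch, with the illustrative names `tri_cdf` and `interval_prob`:

```python
def tri_cdf(x):
    """CDF of the triangular pdf on [-1, 1] from Example 11."""
    if x < -1:
        return 0.0
    if x <= 0:
        return x + x**2 / 2 + 0.5
    if x <= 1:
        return x - x**2 / 2 + 0.5
    return 1.0

def interval_prob(x1, x2):
    """P(x1 < X <= x2) = F(x2) - F(x1)."""
    return tri_cdf(x2) - tri_cdf(x1)

print(round(tri_cdf(-0.5), 3))            # 0.125
print(round(tri_cdf(0.2), 3))             # 0.68
print(round(interval_prob(-0.5, 0.2), 3)) # 0.555
print(round(interval_prob(0, 0.8), 3))    # 0.48
```

Because \(X\) is continuous, \(P(0 < X < 0.8)\) and \(P(0 < X \le 0.8)\) are equal, so the same helper serves for both strict and non-strict inequalities.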

Expectation of Continuous Random
Variable
As with a discrete random variable, the expectation of a continuous
random variable \(X\) is its average value when
you observe it a large number of times. The expectation of a continuous
random variable is defined as follows
\[\mu = E(X) = \int_{-\infty}^{\infty} x \cdot f(x)\,dx\]

Variance and Standard Deviation of
Continuous Random Variable
As with a discrete random variable, we can use the expectation of the
squared distance from the mean to define the variance of a continuous
random variable.
\[\sigma^2 = \mathrm{Var}(X) = E[(X - \mu)^2] = \int_{-\infty}^{\infty} (x - \mu)^2 \cdot f(x)\,dx\]
or equivalently, \(\sigma^2 = E(X^2) - \mu^2\)

Analogously, we can define the standard deviation as below
\[\sigma = \sqrt{\mathrm{Var}(X)} = \sqrt{\sigma^2}\]

Example 13
Suppose \(X\) has a uniform distribution \(U(2, 4)\). Find the expected value and
variance of \(X\).
Solution:
\[E(X) = \int_{2}^{4} x \cdot f(x)\,dx = \int_{2}^{4} x \cdot \frac{1}{4 - 2}\,dx = \left[\frac{x^2}{4}\right]_{2}^{4} = \frac{4^2}{4} - \frac{2^2}{4} = 3\]
\[\mathrm{Var}(X) = E(X^2) - \mu^2 = \int_{2}^{4} x^2 \cdot f(x)\,dx - \mu^2 = \int_{2}^{4} x^2 \cdot \frac{1}{4 - 2}\,dx - \mu^2 = \left[\frac{x^3}{6}\right]_{2}^{4} - 3^2 \approx 0.333\]
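
The integrals in Example 13 can be cross-checked numerically. The sketch below uses a simple midpoint rule rather than any particular library; the function `integrate` and the step count are assumptions made for this illustration only.

```python
def integrate(g, lo, hi, steps=10_000):
    """Approximate the integral of g over [lo, hi] with the midpoint rule."""
    width = (hi - lo) / steps
    return sum(g(lo + (i + 0.5) * width) for i in range(steps)) * width

a, b = 2.0, 4.0

def pdf(x):
    """Density of U(2, 4) on the interval [2, 4]."""
    return 1.0 / (b - a)

mean = integrate(lambda x: x * pdf(x), a, b)
second_moment = integrate(lambda x: x**2 * pdf(x), a, b)
variance = second_moment - mean**2

print(round(mean, 4), round(variance, 4))  # 3.0 0.3333
```

The integration limits are 2 and 4 rather than \(\pm\infty\) because the pdf is zero outside \([2, 4]\).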

Functions of Random Variables
In some situations, we need to work with functions of several
random variables. For example, suppose \(X\), \(Y\) and \(Z\) are the Chinese, English
and Mathematics scores of a randomly selected student from a
school. It is then of interest to know the expectation and variance
of the average score of the three subjects.

\[E\left(\frac{X + Y + Z}{3}\right) = \,?\]
\[\mathrm{Var}\left(\frac{X + Y + Z}{3}\right) = \,?\]

Useful Rules About Expectations
Suppose \(X\) and \(Y\) are two random variables, and \(a\) and \(b\) are
two scalars. Then
\[E(aX + bY) = aE(X) + bE(Y)\]

If \(X\) and \(Y\) are independent, then
\[E(aXY) = aE(X)E(Y)\]

However, \(E\left(\dfrac{X}{Y}\right) \ne \dfrac{E(X)}{E(Y)}\) in general!
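
These rules can be verified exactly on a small case. The sketch below assumes \(X\) and \(Y\) are two independent fair dice (a setup chosen here for illustration, not taken from the slide) and computes each expectation by summing over all 36 equally likely pairs.

```python
from fractions import Fraction
from itertools import product

values = range(1, 7)
# Independence: each pair (x, y) of two fair dice has probability 1/36.
joint = {(x, y): Fraction(1, 36) for x, y in product(values, values)}

def expect(g):
    """E[g(X, Y)] under the joint pmf."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

a, b = 2, 3
EX = expect(lambda x, y: x)
EY = expect(lambda x, y: y)

linear = expect(lambda x, y: a * x + b * y)
scaled_product = expect(lambda x, y: a * x * y)
ratio = expect(lambda x, y: Fraction(x, y))

print(linear == a * EX + b * EY)       # True
print(scaled_product == a * EX * EY)   # True
print(ratio, EX / EY)                  # 343/240 1
```

Here \(E(X/Y) = 343/240 \approx 1.43\) while \(E(X)/E(Y) = 1\), which illustrates the warning about ratios even when \(X\) and \(Y\) are independent.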

Useful Rules About Variances
Suppose \(X\) and \(Y\) are two independent random variables, and
\(a\) and \(b\) are two scalars. Then
\[\mathrm{Var}(aX + bY) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y)\]

Important: The above relationship does not hold in general if \(X\) and
\(Y\) are dependent.
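
The role of independence in the variance rule can be seen by comparing two exact calculations. In this sketch the independent case is two separate fair dice, and the dependent case is the extreme one where \(Y\) always equals \(X\); both setups and the helper names `var` and `compare` are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

def var(joint, g):
    """Var[g(X, Y)] under a joint pmf given as {(x, y): probability}."""
    mean = sum(g(x, y) * p for (x, y), p in joint.items())
    return sum((g(x, y) - mean) ** 2 * p for (x, y), p in joint.items())

def compare(joint, a, b):
    """Return (Var(aX + bY), a^2 Var(X) + b^2 Var(Y)) for the given joint pmf."""
    lhs = var(joint, lambda x, y: a * x + b * y)
    rhs = a**2 * var(joint, lambda x, y: x) + b**2 * var(joint, lambda x, y: y)
    return lhs, rhs

values = range(1, 7)
# Independent case: two separate fair dice.
independent = {(x, y): Fraction(1, 36) for x, y in product(values, values)}
# Dependent case (extreme): Y is always equal to X.
dependent = {(x, x): Fraction(1, 6) for x in values}

print(compare(independent, 2, 3))  # both sides equal 455/12
print(compare(dependent, 2, 3))    # 875/12 on the left, 455/12 on the right
```

In the dependent case \(2X + 3Y = 5X\), so the variance is \(25\,\mathrm{Var}(X)\) rather than \(13\,\mathrm{Var}(X)\).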
