Lecture2 - Descriptive Statistics - 0909
Yunduan Lin
Assistant Professor
Department of Decisions, Operations and Technology
CUHK Business School
Agenda
Project
o Small project. Groups of 1-3 students are allowed. Recall the 'mini-lecture' I mentioned in the survey.
o However, I will expect more from larger groups.
Results for Pre-Course Survey
69 responses
What do you expect to gain from this course? Topics to cover for mini-lecture:
https://docs.google.com/forms/d/e/1FAIpQLSfsEgnMFLypI_KW6GF7j_FXtVY5E4Jrmf2P_BDwaG8GXWDc0A/viewform?usp=sf_link
Three Numerical Representations
Central tendency: one number to represent the whole dataset
o Mean: the middle in terms of distance; easily affected by outliers
o Median: the middle in terms of position; not affected by outliers
o Mode: the value with the highest frequency
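To make the three measures concrete, here is a minimal Python sketch using only the standard library. The lists are illustrative; `votes` in particular is a made-up list chosen so that a mode exists.

```python
from statistics import mean, median, mode

clean = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 50]  # same data, last value replaced by an outlier

# The mean is pulled toward the outlier; the median does not move.
mean_clean, mean_out = mean(clean), mean(with_outlier)
median_clean, median_out = median(clean), median(with_outlier)

# The mode needs repeated values; this list is a hypothetical illustration.
votes = [2, 3, 3, 5, 3, 2]
most_frequent = mode(votes)

print("mean:  ", mean_clean, "->", mean_out)
print("median:", median_clean, "->", median_out)
print("mode:  ", most_frequent)
```

Replacing a single value changes the mean substantially while the median stays the same, which is the sense in which the median is robust to outliers.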
o Third quartile:
o Mild / Suspected Outliers: any data points larger than Q3+1.5IQR or smaller than Q1-1.5IQR.
o Serious / Extreme Outliers: any data points larger than Q3+3IQR or smaller than Q1-3IQR.
Example: Outlier
Dataset 1: 1, 2, 3, 4, 5     Q1=2, Q3=4, IQR=2
Dataset 2: 1, 2, 3, 4, 50    Q1=2, Q3=4, IQR=2
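The fence rules can be checked on the two example datasets with a short Python sketch. The function name `classify_outliers` is hypothetical, and the `"inclusive"` quartile method is an assumption chosen because it gives Q1=2 and Q3=4 for these data; other quartile conventions can differ on small samples.

```python
from statistics import quantiles

def classify_outliers(data):
    """Return (mild, extreme) outliers using the 1.5*IQR and 3*IQR fences."""
    # Assumption: "inclusive" quartiles; this matches Q1=2, Q3=4 in the example.
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    extreme = [x for x in data if x > q3 + 3 * iqr or x < q1 - 3 * iqr]
    # Points beyond the 3*IQR fence are reported only as extreme, not as mild.
    mild = [x for x in data
            if (x > q3 + 1.5 * iqr or x < q1 - 1.5 * iqr) and x not in extreme]
    return mild, extreme

print(classify_outliers([1, 2, 3, 4, 5]))
print(classify_outliers([1, 2, 3, 4, 50]))
```

With IQR=2 the mild fences sit at -1 and 7 and the extreme fences at -4 and 10, so 50 falls outside the extreme fence while nothing in the first dataset is flagged.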
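Dataset 1: 1, 2, 3, 4, 5 (mean = 3)
x    x - mean    (x - mean)^2    (x - mean)^3
1    -2          4               -8
2    -1          1               -1
3     0          0                0
4     1          1                1
5     2          4                8

Dataset 2: 1, 2, 3, 4, 10 (mean = 4)
x    x - mean    (x - mean)^2    (x - mean)^3
1    -3          9               -27
2    -2          4               -8
3    -1          1               -1
4     0          0                0
10    6          36               216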
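The cubed deviations in the last column are the ingredients of a skewness measure. Assuming that is what the table illustrates, the sketch below computes a moment-ratio (population) skewness for both datasets. The helper names are hypothetical, and Excel's SKEW function applies a sample adjustment, so its numbers will differ somewhat.

```python
def central_moment(data, k):
    """k-th central moment: the average of (x - mean) ** k."""
    m = sum(data) / len(data)
    return sum((x - m) ** k for x in data) / len(data)

def skewness(data):
    """Moment-ratio skewness m3 / m2**1.5 (population form, an assumption)."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

symmetric = [1, 2, 3, 4, 5]    # cubed deviations cancel out
right_tail = [1, 2, 3, 4, 10]  # the 216 term dominates the sum

print(skewness(symmetric))
print(skewness(right_tail))
```

Cubing keeps the sign of each deviation, so a long right tail produces a positive value and a symmetric dataset produces zero.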
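Dataset 1: 1, 2, 3, 4, 5 (mean = 3)
x    x - mean    (x - mean)^2    (x - mean)^4
1    -2          4               16
2    -1          1               1
3     0          0               0
4     1          1               1
5     2          4               16

Dataset 2: 1, 2, 3, 4, 15 (mean = 5)
x    x - mean    (x - mean)^2    (x - mean)^4
1    -4          16              256
2    -3          9               81
3    -2          4               16
4    -1          1               1
15    10         100             10000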
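The fourth powers in the last column suggest a kurtosis-style calculation. Under that assumption, this sketch computes the moment-ratio m4 / m2^2 for both datasets. The function names are hypothetical, and Excel's KURT reports a sample-adjusted excess kurtosis, so it will not match these values exactly.

```python
def central_moment(data, k):
    """k-th central moment: the average of (x - mean) ** k."""
    m = sum(data) / len(data)
    return sum((x - m) ** k for x in data) / len(data)

def kurtosis(data):
    """Moment-ratio kurtosis m4 / m2**2 (population form, not excess kurtosis)."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2

no_outlier = [1, 2, 3, 4, 5]
one_outlier = [1, 2, 3, 4, 15]  # a single far value contributes 10000 of 10354

print(kurtosis(no_outlier))
print(kurtosis(one_outlier))
```

Raising deviations to the fourth power makes every term non-negative and lets far-away points dominate, which is why the dataset with one extreme value scores higher.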
o Covariance
o Correlation coefficient
Covariance - Definition
Covariance
Do the two sequences deviate from their means in the same direction at the same time?
Population covariance
Sample covariance
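For reference, the standard textbook definitions are written out below. The notation is an assumption: N and n for population and sample size, \mu for population means, and bars for sample means.

```latex
% Population covariance
\sigma_{XY} = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu_X)(y_i - \mu_Y)

% Sample covariance
s_{XY} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
```

The two differ only in the divisor: N for a full population, n - 1 for a sample.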
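Dataset 1 (x): 1, 2, 3, 4, 5 (mean = 3)
Dataset 2 (y): 4, 7, 6, 4, 9 (mean = 6)
x    x - mean    (x - mean)^2    y    y - mean    (y - mean)^2    (x - mean)(y - mean)
1    -2          4               4    -2          4                4
2    -1          1               7     1          1               -1
3     0          0               6     0          0                0
4     1          1               4    -2          4               -2
5     2          4               9     3          9                6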
Covariance:
Correlation:
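The table columns can be turned into the two statistics with a few lines of Python. This is a sketch that reports both the population and the sample covariance, since it is not stated here which one is intended; the variable names are illustrative.

```python
from math import sqrt

x = [1, 2, 3, 4, 5]
y = [4, 7, 6, 4, 9]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Column sums from the table: products, squared x deviations, squared y deviations.
sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sum_xx = sum((a - mean_x) ** 2 for a in x)
sum_yy = sum((b - mean_y) ** 2 for b in y)

pop_cov = sum_xy / n           # divide by N
sample_cov = sum_xy / (n - 1)  # divide by n - 1
corr = sum_xy / sqrt(sum_xx * sum_yy)  # the divisor cancels, so it is the same either way

print(pop_cov, sample_cov, round(corr, 4))
```

The correlation coefficient rescales the covariance by both standard deviations, so it always lies between -1 and 1 and does not depend on the units of x or y.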
Covariance & Correlation - Importance
Why do we care whether they move together?
o In finance, negatively correlated assets reduce total risk exposure (diversification).
o In marketing, a manager may want to know whether consumers who buy one good would also be
interested in buying another. If so, the manager may offer a discount on the bundle
to boost sales.
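The diversification point can be illustrated with the standard two-asset portfolio variance formula. The function name, the 50/50 weights, and the 20% standard deviations are all hypothetical numbers chosen for illustration.

```python
from math import sqrt

def portfolio_std(w, sd_a, sd_b, rho):
    """Standard deviation of a portfolio with weight w in A and 1 - w in B."""
    var = ((w * sd_a) ** 2
           + ((1 - w) * sd_b) ** 2
           + 2 * w * (1 - w) * rho * sd_a * sd_b)  # covariance term
    return sqrt(var)

# Hypothetical assets: equal weights, each with a 20% standard deviation.
for rho in (1.0, 0.0, -0.8):
    print(rho, round(portfolio_std(0.5, 0.20, 0.20, rho), 4))
```

Only the covariance term changes with the correlation, so the lower the correlation between the two assets, the lower the risk of the combined portfolio.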
Application
Summary statistics
Uploaded as: 1 – Descriptive Statistics.xlsx
Excel – Cell Reference
Use Excel to Store Data
Range of Cells
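o Absolute address: $A$1:$B$2 (stays fixed when the formula is copied to another cell)
o Relative address: A1:B2 (shifts along with the formula when it is copied)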
Excel – Bar Chart
Use Excel to Generate a Bar Chart
o Select the data (e.g., first three columns)
o Click Insert > Insert Column or Bar Chart icon
[Figure: bar chart with series "2010" and "2020 Est." for the categories Systems Analysts, Software App Developers, Programmers, Network / System Admins, CIS Managers, Info Security Analysts, Database Administrators]
Excel – Scatterplot
Use Excel to Generate a Scatterplot
o Select the data (e.g., last two columns)
o Click Insert > Insert Scatterplot icon
o Click on the scatter points
o Right click > Add Trendline
o Click on the trendline
o Right click > add trendline equation
Excel – Scatterplot VS Bar Chart
Which one to choose?

Scatterplot: continuous variable (Temperature ℃)
      Temperature ℃    Ice Cream Revenue
1     14.2             $215
2     16.4             $325
3     11.9             $185
4     15.2             $332
5     18.5             $506
6     22.1             $522
7     19.4             $412
8     25.1             $614
9     23.4             $544
10    18.1             $421
11    22.6             $445
12    17.2             $408

Bar chart: discrete / categorical variable (Ice Cream Brand)
Ice Cream Brand    Ice Cream Revenue
Pixabay            $185
Ben&Jerry's        $215
Haagen-Dazs        $332
Dreyer's           $325
Blue Bell          $408
Skinny Cow         $421
Halo Top           $406
[Figure: bar chart "Ice Cream Revenue" by brand, showing the distribution of revenue]
Excel – Scatterplot VS Line Chart
Which one to choose?
Data: Temperature ℃ and Ice Cream Revenue for observations 1-12 (the same table as on the previous slide).

Scatterplot: shows correlation
[Figure: scatterplot "Ice Cream Revenue VS Temperature", x-axis 0 to 30, y-axis $0 to $700, with linear trendline y = 29.989x - 149.29]

Line Chart: shows trend
[Figure: line chart "Ice Cream Revenue" over observations 1 to 12, y-axis $0 to $800]
o Select the data (e.g., the last column)
o Click Insert > Insert Line Chart icon
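The linear trendline that Excel draws on the scatterplot is an ordinary least-squares fit. As a cross-check, this minimal sketch refits the line from the 12 temperature and revenue pairs in plain Python; `fit_line` is a hypothetical helper, not an Excel function.

```python
temperature = [14.2, 16.4, 11.9, 15.2, 18.5, 22.1,
               19.4, 25.1, 23.4, 18.1, 22.6, 17.2]
revenue = [215, 325, 185, 332, 506, 522,
           412, 614, 544, 421, 445, 408]

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the line passes through the means
    return slope, intercept

slope, intercept = fit_line(temperature, revenue)
print(f"y = {slope:.3f}x {intercept:+.2f}")
```

The slope is the sum of deviation products divided by the sum of squared x deviations, which ties the trendline directly to the covariance calculation earlier in the lecture.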
Excel – Wrong Examples
Charts can become meaningless
Data: the same Temperature ℃ and Ice Cream Revenue table (observations 1-12).
[Figure: "A Wrong Line Chart" plotting Temperature ℃ and Ice Cream Revenue as two lines over observations 1 to 12, y-axis 0 to 700]
o There is no need to plot these two lines in one chart. The two columns are measured in different units (℃ and $), so comparing them on one axis is not meaningful.
Excel – Wrong Examples
Charts can become meaningless