
Lecture2 - Descriptive Statistics - 0909


DOTE 2011 | Fall 2024

@ CUHK Business School

Statistical Analysis for Business Decisions


Descriptive Statistics and Excel Intro

Yunduan Lin
Assistant Professor
Department of Decisions, Operations and Technology
CUHK Business School
Agenda


01 Relationship Between Data


o Covariance
o Positive / negative correlation

02 Excel for Basic Statistics


o Basic functions
o Tool for calculation
o Generate different kinds of charts
Supplement Logistics
Grading
Main message:
You can secure a reasonable score as long as you put in some effort (e.g., participation, homework).
Exam
o Open everything except the Internet.
o I can arrange one make-up exam if there is a time conflict, but the time may not be ideal for all of you because it depends on the availability of the TA and me.
o You can choose to skip one of the exams and move all of its weight to the other. BUT the 'skip' decision must be made before the midterm and cannot be changed later.
Quiz
o Quizzes count toward 'participation', not 'exam'. I will mainly give points based on whether you submit, rather than on correctness. Some questions may even be open-ended.

Project
o Small project, in groups of 1-3 students. Recall the 'mini-lecture' I mentioned in the survey.
o However, I would expect more from larger groups.
Results for Pre-Course Survey

69 responses

Results for Pre-Course Survey
What do you expect to gain from this course?
o Data processing
o Math knowledge
o How to view and understand data in an easier way and how to generate idea from it
o Know about all the statistics terminology
o Quickly transfer the data into a clearer picture
o Knowing how to classify and group the data that is applicable in workplace
o Excel
o The mechanism and application of ANOVA
o Convince people with statistics
o Something actually useful
o learn something better for my career
o A good grade
o No
o Exam skills(?)

Topics to cover for mini-lecture:
o Statistics analysis on gaming
o Statistics base on the financial crisis like what could be the major cause of financial crisis no matter their seriousness
o How to not get fooled by biased statistics
o Python
o Vba
o Use statistics to analyze a stock
o Fun fact about stat in ancient time
o I would like to cover the costs of a firm fulfilling the responsibility to the environment or sustainability
o Investments
o News
o Fashion or music
A Feedback Form for the Entire Term

https://docs.google.com/forms/d/e/1FAIpQLSfsEgnMFLypI_KW6GF7j_FXtVY5E4Jrmf2P_BDwaG8GXWDc0A/viewform?usp=sf_link
Three Numerical Representations
Central tendency: one number to represent the whole dataset
o Mean: middle in terms of distance; easily affected by outliers
o Median: middle in terms of position; not affected by outliers
o Mode: the most frequent value

Dispersion: how data points are spread out
o Range: difference between max and min; easily affected by outliers
o Interquartile Range (IQR): difference between Q3 and Q1; not affected by outliers
o Variance (VAR): average squared distance to the mean
o Standard Deviation (SD): square root of VAR
o Coefficient of Variation (CV): SD rescaled by the mean

Shape: symmetric? Thin or fat tail?
o Skewness: symmetry; uses cubed deviations, so sensitive to extreme values
o Kurtosis: shape of the tail; uses fourth-power deviations, so sensitive to extreme values
Revisit - IQR
Quartiles
Quartiles divide the data into four (quart) parts (tiles):

o First quartile: Q1 (25th percentile)
o Second quartile: Q2 (median, 50th percentile)
o Third quartile: Q3 (75th percentile)

Interquartile range (IQR)

Difference between the third and first quartile: IQR = Q3 - Q1
Less affected by outliers
Revisit – IQR - Extension
Rule of thumb
Outlier Detection
IQR can be considered as the range of the middle portion of the data, so this is not sensitive to
the outliers. In other words, we can use IQR to roughly define outliers:

o Mild / Suspected Outliers: any data points larger than Q3+1.5IQR or smaller than Q1-1.5IQR.

o Serious / Extreme Outliers: any data points larger than Q3+3IQR or smaller than Q1-3IQR.

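Example: Outlier
Example 1. Data: 1, 2, 3, 4, 5
Q1=2, Q3=4, IQR=2
Q3+1.5IQR=7, Q1-1.5IQR=-1
Q3+3IQR=10, Q1-3IQR=-4
All points lie between -1 and 7, so there are no outliers.

Example 2. Data: 1, 2, 3, 4, 50
Q1=2, Q3=4, IQR=2
Q3+1.5IQR=7, Q1-1.5IQR=-1
Q3+3IQR=10, Q1-3IQR=-4
50 is larger than Q3+3IQR=10, so it is a serious / extreme outlier.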
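The rule of thumb can be sketched in a few lines of Python. This is a minimal illustration, not the course's Excel workflow: the function name `iqr_outliers` is made up here, and it assumes the "inclusive" quartile convention of `statistics.quantiles`, which reproduces Q1=2 and Q3=4 for the two example datasets. Other quartile conventions can give slightly different fences.

```python
import statistics

def iqr_outliers(data):
    """Flag mild and extreme outliers using the 1.5*IQR and 3*IQR fences."""
    # Assumption: the "inclusive" method matches the quartiles shown in the example.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    mild_fences = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    extreme_fences = (q1 - 3 * iqr, q3 + 3 * iqr)
    return {
        "mild_fences": mild_fences,
        "extreme_fences": extreme_fences,
        # Points beyond the 1.5*IQR fences (this list includes any extreme ones too).
        "mild": [x for x in data if x < mild_fences[0] or x > mild_fences[1]],
        # Points beyond the 3*IQR fences.
        "extreme": [x for x in data if x < extreme_fences[0] or x > extreme_fences[1]],
    }

print(iqr_outliers([1, 2, 3, 4, 5]))
print(iqr_outliers([1, 2, 3, 4, 50]))
```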
Revisit - Skewness
Skewness
Average cubed distance from the mean, scaled by the cubed SD:

$\text{skew} = \dfrac{\frac{1}{N}\sum_{i=1}^{N}(x_i-\mu)^3}{\sigma^3}$

o skew=0: symmetric
o skew>0: right-skewed
o skew<0: left-skewed


Revisit - Skewness
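Example:
Example 1. Data: 1, 2, 3, 4, 5 (mean = 3)

x     x-mean   (x-mean)^2   (x-mean)^3
1     -2       4            -8
2     -1       1            -1
3     0        0            0
4     1        1            1
5     2        4            8

Population skewness: 0 (the cubed deviations sum to 0)

Example 2. Data: 1, 2, 3, 4, 10 (mean = 4)

x     x-mean   (x-mean)^2   (x-mean)^3
1     -3       9            -27
2     -2       4            -8
3     -1       1            -1
4     0        0            0
10    6        36           216

Population skewness: (180/5) / (50/5)^1.5 = 36 / 31.62 ≈ 1.14 (right-skewed)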
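The skewness tables can be checked with a short Python sketch. The function name `population_skewness` is an illustrative choice; it implements the population version described above (average cubed deviation divided by the cubed population SD), which may differ from the sample-adjusted value some spreadsheet functions report.

```python
def population_skewness(data):
    """Average cubed deviation from the mean, divided by the cubed population SD."""
    n = len(data)
    mean = sum(data) / n
    variance = sum((x - mean) ** 2 for x in data) / n     # population variance
    third_moment = sum((x - mean) ** 3 for x in data) / n
    return third_moment / variance ** 1.5                 # variance**1.5 equals SD cubed

print(population_skewness([1, 2, 3, 4, 5]))    # symmetric data
print(population_skewness([1, 2, 3, 4, 10]))   # one large value on the right
```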


Revisit - Kurtosis
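Kurtosis
Thickness of the tail: average fourth-power distance from the mean, scaled by the fourth power of the SD. Excess kurtosis subtracts 3, the kurtosis of the normal distribution:

$\text{kurt} = \dfrac{\frac{1}{N}\sum_{i=1}^{N}(x_i-\mu)^4}{\sigma^4}, \qquad \text{excess kurtosis} = \text{kurt} - 3$

o excess kurtosis = 0: mesokurtic, as in the normal distribution
o excess kurtosis > 0: leptokurtic, fat tail (slender peak)
o excess kurtosis < 0: platykurtic, thin tail (broad peak)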


Revisit - Kurtosis
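Example:
Example 1. Data: 1, 2, 3, 4, 5 (mean = 3)

x     x-mean   (x-mean)^2   (x-mean)^4
1     -2       4            16
2     -1       1            1
3     0        0            0
4     1        1            1
5     2        4            16

Population kurtosis: (34/5) / (10/5)^2 = 6.8 / 4 = 1.7
Population excess kurtosis: -1.3 (platykurtic)

Example 2. Data: 1, 2, 3, 4, 15 (mean = 5)

x     x-mean   (x-mean)^2   (x-mean)^4
1     -4       16           256
2     -3       9            81
3     -2       4            16
4     -1       1            1
15    10       100          10000

Population kurtosis: (10354/5) / (130/5)^2 = 2070.8 / 676 ≈ 3.063
Population excess kurtosis: 0.063 (leptokurtic)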
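The two kurtosis examples can be reproduced with a small Python sketch. The function name `population_excess_kurtosis` is illustrative; it follows the population definition used in the tables (average fourth-power deviation over the squared population variance, minus 3), not a sample-corrected estimator.

```python
def population_excess_kurtosis(data):
    """Average fourth-power deviation over squared population variance, minus 3."""
    n = len(data)
    mean = sum(data) / n
    variance = sum((x - mean) ** 2 for x in data) / n      # population variance
    fourth_moment = sum((x - mean) ** 4 for x in data) / n
    kurtosis = fourth_moment / variance ** 2               # variance**2 equals SD**4
    return kurtosis - 3                                    # 3 is the normal benchmark

print(population_excess_kurtosis([1, 2, 3, 4, 5]))    # negative: platykurtic
print(population_excess_kurtosis([1, 2, 3, 4, 15]))   # slightly positive: leptokurtic
```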
Relationship Between Paired Observations
Previously, we had only a single measure on each individual: $x_1, x_2, \dots, x_N$

Suppose we now have two measures on the same individual: $(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)$

For example, x can be height and y can be weight.

We then consider the relationship between x and y.

o Covariance
o Correlation coefficient
Covariance - Definition
Covariance
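Do the two sequences deviate from their means in the same direction?

Population covariance: $\sigma_{xy} = \frac{1}{N}\sum_{i=1}^{N}(x_i-\mu_x)(y_i-\mu_y)$

Sample covariance: $s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})$

Cov>0: positive correlation, x and y move together;

Cov<0: negative correlation, x and y move in opposite directions.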
Correlation Coefficient - Definition
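Correlation coefficient
Rescale the covariance by the standard deviations

Population correlation coefficient: $\rho = \dfrac{\sigma_{xy}}{\sigma_x \sigma_y}$

Sample correlation coefficient: $r = \dfrac{s_{xy}}{s_x s_y}$

On the same data the two coincide, because the 1/N (or 1/(n-1)) factors cancel between numerator and denominator.

The correlation coefficient is always between –1 and 1.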


Covariance - Illustration
[Figure: five scatterplots showing perfect positive correlation, positive correlation, zero correlation, negative correlation, and perfect negative correlation]


Covariance & Correlation - Example
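Example:
Data: x = 1, 2, 3, 4, 5 (mean = 3); y = 4, 7, 6, 4, 9 (mean = 6)

x    x-mean   (x-mean)^2   y    y-mean   (y-mean)^2   (x-mean)(y-mean)
1    -2       4            4    -2       4            4
2    -1       1            7    1        1            -1
3    0        0            6    0        0            0
4    1        1            4    -2       4            -2
5    2        4            9    3        9            6

Covariance: the products sum to 7, so the population covariance is 7/5 = 1.4 (sample covariance: 7/4 = 1.75)

Correlation: 7 / sqrt(10 × 18) ≈ 0.52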
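The worked table can be checked with a short Python sketch. The function names are illustrative, and the `sample` flag is an assumption added here to switch between dividing by N and by n-1; the correlation is computed from sums of squares so that it does not depend on that choice.

```python
import math

def covariance(xs, ys, sample=False):
    """Population covariance by default; divide by n-1 when sample=True."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1 if sample else n)

def correlation(xs, ys):
    """Covariance rescaled by both standard deviations (the 1/N factors cancel)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y = [4, 7, 6, 4, 9]
print(covariance(x, y), covariance(x, y, sample=True), correlation(x, y))
```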
Covariance & Correlation - Importance
Why do we care whether they move together?

o In finance, negatively correlated assets reduce total risk exposure (diversification).

o In marketing, a manager may want to know whether consumers who buy one good would also be interested in buying another. If so, the manager may offer a discount on the bundle to boost sales.
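The diversification point can be made concrete with the standard identity for the variance of a weighted sum of two returns, Var(wX + (1-w)Y) = w²σx² + (1-w)²σy² + 2w(1-w)ρσxσy. The sketch below is only an illustration: the function name `portfolio_variance`, the 20% standard deviations, and the equal weights are hypothetical numbers chosen for this example, not figures from the lecture.

```python
def portfolio_variance(w, sd_x, sd_y, rho):
    """Variance of a two-asset portfolio with weight w on X and 1-w on Y."""
    cov_xy = rho * sd_x * sd_y   # covariance recovered from the correlation
    return (w ** 2) * sd_x ** 2 + ((1 - w) ** 2) * sd_y ** 2 + 2 * w * (1 - w) * cov_xy

# Hypothetical inputs: both assets have SD 0.20 and the portfolio is split 50/50.
for rho in (1.0, 0.5, 0.0, -0.5, -1.0):
    print(rho, portfolio_variance(0.5, 0.20, 0.20, rho))
```

With these inputs the portfolio variance shrinks as the correlation falls, and it reaches (numerically) zero at a correlation of -1.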
Application
Summary statistics

Characteristics of a stock's return

o High mean / median: high return
o Low variance / standard deviation: less risky
o Positive skewness: likely to have a positive surprise
o Low kurtosis: less risk of extreme returns

Characteristics of a good portfolio

o Zero or negative correlation: lower total risk
Excel – Basic Functions
Use Excel as a Calculator
Official Documentation:
https://support.microsoft.com/en-us/office/excel-functions-alphabetical-b3944572-255d-4efb-bb96-c6d90033e188

Uploaded as
1 – Descriptive Statistics.xlsx
Excel – Cell Reference
Use Excel to Store Data

Instead of entering data directly, store the data in the spreadsheet:

Single Cell Reference


o Absolute address: $A$1 – stays fixed when the formula is copied and pasted
o Relative address: A1 – shifts with the formula when it is copied and pasted

Range of Cells
o Absolute address: $A$1:$B$2
o Relative address: A1:B2
Excel – Bar Chart
Use Excel to Generate a Bar Chart

Computer-Related Jobs

Job Names                  2010      2020 Est.   Median Pay
Systems Analysts           544,400   664,800     $77,740
Software App Developers    520,800   664,500     $90,530
Programmers                363,100   406,800     $71,380
Network / System Admins    347,200   443,800     $69,160
CIS Managers               307,900   363,700     $115,780
Infor Security Analysts    302,300   367,900     $75,660
Database Administrators    110,800   144,800     $73,490

o Select the data (e.g., first three columns)
o Click Insert > Insert Column or Bar Chart icon

[Chart: clustered column chart "Number of Works" with one pair of columns (2010, 2020 Est.) per job name and a vertical axis from 0 to 700,000; callouts label the Title, Vertical Axis, Horizontal Axis, and Legend]


Excel – Scatterplot
Use Excel to Generate a Scatterplot

Ice Cream Revenue

      Temperature ℃   Ice Cream Revenue
1     14.2            $215
2     16.4            $325
3     11.9            $185
4     15.2            $332
5     18.5            $506
6     22.1            $522
7     19.4            $412
8     25.1            $614
9     23.4            $544
10    18.1            $421
11    22.6            $445
12    17.2            $408

[Chart: scatterplot "Ice Cream Revenue VS Temperature", revenue ($0 to $700) against temperature (0 to 30 ℃), with a linear trendline and the fitted equation y = 29.989x - 149.29]

o Select the data (e.g., last two columns)
o Click Insert > Insert Scatterplot icon
o Click on the scatter points
o Right click > add trendline
o Click on the trendline
o Right click > add trendline equation
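For readers who want to see where the fitted equation comes from, here is a minimal Python sketch of an ordinary least-squares line fitted to the 12 temperature and revenue pairs. It assumes the linear trendline is a least-squares fit, which is consistent with the coefficients shown on the chart; the function name `fit_line` is an illustrative choice.

```python
def fit_line(xs, ys):
    """Ordinary least squares: slope = Sxy / Sxx, intercept = mean_y - slope * mean_x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

temperature = [14.2, 16.4, 11.9, 15.2, 18.5, 22.1, 19.4, 25.1, 23.4, 18.1, 22.6, 17.2]
revenue = [215, 325, 185, 332, 506, 522, 412, 614, 544, 421, 445, 408]

slope, intercept = fit_line(temperature, revenue)
print(f"y = {slope:.3f}x + ({intercept:.2f})")
```

The slope is the covariance of the two columns divided by the variance of temperature, which ties the trendline back to the covariance slides.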
Excel – Scatterplot VS Bar Chart
Which one to choose?

o Scatterplot: for a continuous variable, e.g., Ice Cream Revenue against Temperature ℃ (the 12 observations used in the scatterplot example)
o Bar chart: for a discrete / categorical variable, e.g., Ice Cream Revenue by Ice Cream Brand:

Ice Cream Brand   Ice Cream Revenue
Pixabay           $185
Ben&Jerry's       $215
Haagen-Dazs       $332
Dreyer's          $325
Blue Bell         $408
Skinny Cow        $421
Halo Top          $406


Excel – Bar Chart VS Histogram
Which one to choose?

Data: Ice Cream Revenue by Ice Cream Brand (Pixabay $185, Ben&Jerry's $215, Haagen-Dazs $332, Dreyer's $325, Blue Bell $408, Skinny Cow $421, Halo Top $406)

o Bar Chart: illustration by brand. One bar per brand, with height equal to that brand's Ice Cream Revenue.
o Histogram: distribution of revenue. Each bar counts how many revenue values fall into a bin.
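To make the difference concrete, the sketch below computes what a histogram of the seven brand revenues would display: counts per bin rather than one bar per brand. The $100 bin width and the function name `histogram_counts` are assumptions made for this illustration; Excel chooses its own bin width unless one is set.

```python
from collections import Counter

def histogram_counts(values, bin_width):
    """Count how many values fall into each bin [start, start + bin_width)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Ice Cream Revenue by brand, in dollars (brand names are dropped in a histogram).
revenues = [185, 215, 332, 325, 408, 421, 406]

# Assumed bin width of $100; the keys are the lower edges of the bins.
bins = histogram_counts(revenues, 100)
for start, count in bins.items():
    print(f"${start}-${start + 99}: {count}")
```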
Excel – Scatterplot VS Line Chart
Which one to choose?

Data: Ice Cream Revenue and Temperature ℃ (the 12 observations used in the scatterplot example)

o Scatterplot: shows correlation, e.g., "Ice Cream Revenue VS Temperature" with the fitted trendline y = 29.989x - 149.29
o Line Chart: shows trend, e.g., Ice Cream Revenue plotted in order for observations 1 to 12

To generate the line chart:
o Select the data (e.g., the last column)
o Click Insert > Insert Line Chart icon
Excel – Wrong Examples
Charts can become meaningless

Data: Ice Cream Revenue and Temperature ℃ (the 12 observations used in the scatterplot example)

[Chart: "A Wrong Line Chart", plotting Temperature ℃ and Ice Cream Revenue as two lines against observation number 1 to 12 on a shared vertical axis from 0 to 700]

o There is no need to plot these two lines in one chart. The two columns measure different things, so comparing them on the same axis has no meaning.
Excel – Wrong Examples
Charts can become meaningless

Data: Ice Cream Revenue by Ice Cream Brand (Pixabay $185, Ben&Jerry's $215, Haagen-Dazs $332, Dreyer's $325, Blue Bell $408, Skinny Cow $421, Halo Top $406)

[Chart: "Another Wrong Line Chart", a line connecting Ice Cream Revenue across the brands, vertical axis from $0 to $450]

o Brand is neither an ordinal categorical variable nor a numerical variable, so the trend suggested by the line is meaningless.
