Business Statistics: Session 2

This document discusses descriptive statistics and measures used to summarize datasets. It introduces key concepts like measures of central tendency (mean, median, mode), measures of variability (range, standard deviation, variance), and how to compute them. The document provides examples and formulas for calculating common descriptive statistics like mean, median, mode, quartiles, and standard deviation on both ungrouped and grouped data. It also discusses concepts like the empirical rule and Chebyshev's theorem.
Business statistics

Session 2
Descriptive statistics

 Descriptive statistics provide an organization and summary of a dataset
 A small number of summary measures replaces the entirety of a dataset
 We’ll briefly talk about some simple descriptive summary measures
Learning Objectives

 Distinguish between measures of central tendency, measures of variability, and measures of shape.
 Understand the meanings of mean, median, mode, quartile and range.
 Compute mean, median, mode, quartile, range, variance, standard deviation, and mean absolute deviation on ungrouped data.
 Differentiate between sample and population variance and standard deviation.
Learning Objectives -- Continued

 Understand the meaning of standard deviation as it is applied by using the empirical rule and Chebyshev’s theorem.
 Compute the mean, median, standard deviation, and variance on grouped data.
 Understand box and whisker plots, skewness, and kurtosis.
Measures of Central Tendency:
Ungrouped Data
 Measures of central tendency yield
information about “particular places or
locations in a group of numbers.”
 Common Measures of Location
 Mode
 Median
 Mean
 Quartiles
Mode

 Mode - the most frequently occurring value in a data set
 Applicable to all levels of data measurement (nominal, ordinal, interval, and ratio)
 Can be used to determine what categories occur most frequently
 Bimodal – In a tie for the most frequently occurring value, two modes are listed
 Multimodal – Data sets that contain more than two modes
Median

 Median - middle value in an ordered array of numbers.
 For an array with an odd number of terms, the median is the middle number
 For an array with an even number of terms, the median is the average of the middle two numbers
Arithmetic Mean

 Mean is the average of a group of numbers


 Applicable for interval and ratio data
 Not applicable for nominal or ordinal data
 Affected by each value in the data set,
including extreme values
 Computed by summing all values in the data
set and dividing the sum by the number of
values in the data set
Demonstration Problem

 The number of U.S. cars in service by top car rental companies in a recent year, according to Auto Rental News, follows.

Company           Number of Cars in Service
Enterprise        643,000
Hertz             327,000
National/Alamo    233,000
Avis              204,000
Dollar/Thrifty    167,000
Budget            144,000
Advantage          20,000
U-Save             12,000
Payless            10,000
ACE                 9,000
Fox                 9,000
Rent-A-Wreck        7,000
Triangle            6,000

 Compute the mode, the median, and the mean.

Demonstration Problem

Solution:
 Mode: 9,000
 Median: With 13 different companies in this
group, N = 13. The median is located at the (13
+1)/2 = 7th position. Because the data are
already ordered, the 7th term is 20,000, which is
the median.

 Mean: The total number of cars in service is


1,791,000 = ∑x

μ = ∑x/N = (1,791,000/13) = 137,769.23
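The solution above can be checked with a few lines of code. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative and do not come from the slides.

```python
from statistics import mean, median, multimode

# Cars in service, in the order listed in the problem (largest to smallest).
cars = [643_000, 327_000, 233_000, 204_000, 167_000, 144_000,
        20_000, 12_000, 10_000, 9_000, 9_000, 7_000, 6_000]

modes = multimode(cars)   # every value tied for the highest count
middle = median(cars)     # sorts internally, then takes the 7th of 13 values
average = mean(cars)      # sum of the values divided by N

print(modes, middle, round(average, 2))
```

`multimode` returns a list, so a bimodal or multimodal data set would show all of its modes rather than only one.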


Quartiles

 Quartile - measures of central tendency that


divide a group of data into four subgroups
 Q1: 25% of the data set is below the first
quartile
 Q2: 50% of the data set is below the second
quartile
 Q3: 75% of the data set is below the third
quartile
Q1, Q2, and Q3 divide the ordered data into four parts of 25% each.
Measures of Variability:
Ungrouped Data
 Measures of Variability - tools that describe the
spread or the dispersion of a set of data.
 Provides more meaningful data when used

with measures of central tendency


Measures of Variability:
Ungrouped Data
 Common Measures of Variability
 Range
 Inter-quartile Range
 Mean Absolute Deviation
 Variance
 Standard Deviation
 Z scores
 Coefficient of Variation
Range

 The difference between the largest and the


smallest values in a set of data
 Advantage – easy to compute
 Disadvantage – is affected by extreme values
Interquartile Range

 Interquartile Range - range of values between


the first and third quartiles
 Range of the “middle half”; middle 50%
 Useful when researchers are interested in the

middle 50%, and not the extremes


 Interquartile Range – used in the construction
of box and whisker plots
Interquartile Range = Q3 − Q1
Mean Absolute Deviation, Variance,
and Standard Deviation
 These measures are not meaningful unless the data are at least interval level data
 One way for researchers to look at the spread of data is to subtract the mean from each data value
 Subtracting the mean from each data value

gives the deviation from the mean (X - µ)


Mean Absolute Deviation, Variance,
and Standard Deviation
 An examination of deviation from the mean can
reveal information about the variability of the
data
 Deviations are used mostly as a tool to compute
other measures of variability
 The Sum of Deviation from the arithmetic mean
is always zero
Sum (X - µ) = 0
Mean Absolute Deviation, Variance,
and Standard Deviation
 An obvious way to force the sum of deviations to
have a non zero total is to take the absolute
value of each deviation around the mean

 Allows one to solve for the Mean Absolute


Deviation
Mean Absolute Deviation (MAD)

 Mean Absolute Deviation - average of the


absolute deviations from the mean

X        X − µ    |X − µ|
5         −8         8
9         −4         4
16        +3         3
17        +4         4
18        +5         5
Total      0        24

M.A.D. = ∑|X − µ| / N = 24/5 = 4.8, where µ = 65/5 = 13
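The MAD calculation for the five values on this slide can be sketched as follows. The variable names are illustrative only.

```python
# Data from the MAD slide.
values = [5, 9, 16, 17, 18]

mu = sum(values) / len(values)                 # population mean
deviations = [x - mu for x in values]          # signed deviations; they sum to zero
abs_deviations = [abs(d) for d in deviations]  # drop the signs
mad = sum(abs_deviations) / len(values)        # mean absolute deviation

print(mu, sum(deviations), sum(abs_deviations), mad)
```

The signed deviations cancel exactly, which is why the absolute values are needed before averaging.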
Population Variance

 Variance - average of the squared deviations


from the arithmetic mean

 Population variance is denoted by σ²

 Sum of Squared Deviations (SSD) about the


mean of a set of values (called Sum of Squares
of X) .
Population Variance

 Variance = average of the squared deviations


from the arithmetic mean
 Population variance is denoted by σ²

X        X − µ    (X − µ)²
5         −8        64
9         −4        16
16        +3         9
17        +4        16
18        +5        25
Total      0       130

σ² = ∑(X − µ)² / N = 130/5 = 26.0
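As a sketch, the same population variance can be computed by hand and cross-checked against `statistics.pvariance` from the Python standard library. The names `ssd` and `sigma_sq` are illustrative.

```python
from statistics import pvariance

values = [5, 9, 16, 17, 18]
mu = sum(values) / len(values)

ssd = sum((x - mu) ** 2 for x in values)  # sum of squared deviations about the mean
sigma_sq = ssd / len(values)              # divide by N for a population

# Cross-check against the library's population variance.
assert abs(sigma_sq - pvariance(values)) < 1e-9
print(ssd, sigma_sq)
```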
Sample Variance

 Sample Variance - sum of the squared deviations from the sample mean, divided by n − 1 rather than n
 Sample Variance – denoted by S²

X         X − X̄    (X − X̄)²
2,398      625     390,625
1,844       71       5,041
1,539     −234      54,756
1,311     −462     213,444
Total 7,092   0    663,866

S² = ∑(X − X̄)² / (n − 1) = 663,866/3 = 221,288.67

where X̄ = 7,092/4 = 1,773
Sample Standard Deviation

 Sample Std Dev is the square root of the sample


variance.

X         X − X̄    (X − X̄)²
2,398      625     390,625
1,844       71       5,041
1,539     −234      54,756
1,311     −462     213,444
Total 7,092   0    663,866

S² = ∑(X − X̄)² / (n − 1) = 663,866/3 = 221,288.67

S = √S² = √221,288.67 = 470.41
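The sample variance and sample standard deviation for these four values can be reproduced with the standard library. This is a minimal sketch; `variance` and `stdev` in the `statistics` module both use the n − 1 divisor.

```python
from statistics import mean, stdev, variance

sample = [2398, 1844, 1539, 1311]

x_bar = mean(sample)      # sample mean
s_sq = variance(sample)   # sum of squared deviations divided by n - 1
s = stdev(sample)         # square root of the sample variance

print(x_bar, round(s_sq, 2), round(s, 2))
```

Using `pvariance` and `pstdev` instead would divide by n and give the population versions.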
Empirical Rule

 Empirical Rule – used to state the approximate percentage of values that lie within a given number of standard deviations of the mean, if the data are normally distributed
 Empirical rule is used only for three numbers of standard deviations: 1σ, 2σ, and 3σ
µ ± 1σ contains about 68% of the data;
µ ± 2σ contains about 95% of the data; and
µ ± 3σ contains about 99.7% of the data
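The empirical-rule percentages are rounded areas under the normal curve. As a hedged illustration, they can be recovered from the standard normal distribution with `statistics.NormalDist`; the helper name `share_within` is an assumption made for this sketch.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def share_within(k: float) -> float:
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k) * 100, 2))
```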
Chebyshev’s Theorem

 Empirical rule – applies when data are


approximately normally distributed
 Chebyshev’s Theorem – applies to all
distributions, and can be used whenever the data
distribution shape is unknown or non-normal

P(µ − kσ < X < µ + kσ) ≥ 1 − 1/k², for k > 1
Chebyshev’s Theorem

 Chebyshev’s Theorem - states that at least a proportion (1 – 1/k²) of the values fall within ±k standard deviations of the mean, regardless of the shape of the distribution
 Example: At least 75% of all values are within ±2σ of the mean regardless of the shape of a distribution
when k = 2, then (1 – 1/k²) = .75
when k = 3, then (1 – 1/k²) = .89
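The bound can be compared with an observed proportion on a strongly skewed data set. The sketch below reuses the car-rental fleet sizes from the earlier demonstration problem; the helper names `chebyshev_bound` and `share_within` are assumptions made for this illustration.

```python
from statistics import mean, pstdev

def chebyshev_bound(k: float) -> float:
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2

def share_within(data, k):
    """Observed proportion of values strictly within k population std devs of the mean."""
    mu, sigma = mean(data), pstdev(data)
    return sum(abs(x - mu) < k * sigma for x in data) / len(data)

cars = [643_000, 327_000, 233_000, 204_000, 167_000, 144_000,
        20_000, 12_000, 10_000, 9_000, 9_000, 7_000, 6_000]

for k in (2, 3):
    print(k, round(chebyshev_bound(k), 2), round(share_within(cars, k), 2))
```

The observed share can never fall below the bound when the mean and population standard deviation are computed from the same data, whatever the shape of the distribution.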
Demonstration Problem
The effectiveness of district attorneys can be measured by several variables, including the number of convictions per month, the number of cases handled per month, and the total number of years of conviction per month. A researcher uses a sample of five district attorneys in a city and determines the total number of years of conviction that each attorney won against defendants during the past month, as reported in the first column in the following tabulations. Compute the mean absolute deviation, the variance, and the standard deviation for these figures.
Demonstration Problem

Solution
The researcher computes the mean absolute deviation,
the variance, and the standard deviation for these
data in the following manner.
x       |x − x̄|    (x − x̄)²
55        41       1,681
100        4          16
125       29         841
140       44       1,936
60        36       1,296
∑x = 480   ∑|x − x̄| = 154   ∑(x − x̄)² = 5,770
x̄ = 96
Demonstration Problem

 The computational formulas can also be used to solve for S² and S, and the results compared.
 S² = 5,770/4 = 1,442.5 and S = √1,442.5 = 37.98
 MAD = 154/5 = 30.8
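The three answers can be verified with a short script. This is a sketch using the standard `statistics` module, treating the five attorneys as a sample as the problem states; the variable names are illustrative.

```python
from statistics import mean, stdev, variance

years = [55, 100, 125, 140, 60]  # years of conviction won by each attorney

x_bar = mean(years)
mad = sum(abs(x - x_bar) for x in years) / len(years)  # mean absolute deviation
s_sq = variance(years)                                 # sample variance (n - 1)
s = stdev(years)                                       # sample standard deviation

print(x_bar, mad, s_sq, round(s, 2))
```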


Z Scores

 Z score – represents the number of Std Dev a


value (x) is above or below the mean of a set
of numbers when the data are normally
distributed
 Z score allows translation of a value’s raw
distance from the mean into units of std dev.
 Z = (x-µ)/σ
Z Scores

 If Z is negative, the raw value (x) is below the


mean
 If Z is positive, the raw value (x) is above the
mean
 For normally distributed data, between
Z = ±1 lie approximately 68% of the values
Z = ±2 lie approximately 95% of the values
Z = ±3 lie approximately 99.7% of the values
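A minimal sketch of the Z score formula Z = (x − µ)/σ, applied to the five values used in the MAD and variance slides (µ = 13, σ² = 26). The function name `z_score` is illustrative.

```python
from statistics import mean, pstdev

def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

values = [5, 9, 16, 17, 18]
mu, sigma = mean(values), pstdev(values)  # population mean and standard deviation

for x in values:
    print(x, round(z_score(x, mu, sigma), 2))
```

Values below the mean produce negative Z scores and values above it produce positive ones.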
Coefficient of Variation

 Coefficient of Variation (CV) - ratio of the standard


deviation to the mean, expressed as a percentage
 useful when comparing Std Dev computed from data
with different means
 Measurement of relative dispersion


C.V. = (σ/µ)(100)
Coefficient of Variation

µ1 = 29, σ1 = 4.6          µ2 = 84, σ2 = 10

C.V.1 = (σ1/µ1)(100) = (4.6/29)(100) = 15.86
C.V.2 = (σ2/µ2)(100) = (10/84)(100) = 11.90
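The comparison on this slide can be sketched in a few lines. The function name is illustrative.

```python
def coefficient_of_variation(sigma: float, mu: float) -> float:
    """Standard deviation as a percentage of the mean."""
    return sigma / mu * 100

cv1 = coefficient_of_variation(4.6, 29)  # first data set
cv2 = coefficient_of_variation(10, 84)   # second data set

print(round(cv1, 2), round(cv2, 2))
```

The second data set has the larger standard deviation but the smaller relative dispersion, which is the point of comparing C.V. values rather than raw standard deviations.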
Measures of Central Tendency
and Variability: Grouped Data
 Measures of Central Tendency
 Mean

 Median

 Mode

 Measures of Variability
 Variance

 Standard Deviation
Measures of Central Tendency
and Variability: Grouped Data
 Mean – The midpoint of each class interval is used
to represent all the values in a class interval
 Midpoint is weighted by the frequency of values in

the
class interval
 Mean is computed by summing the products of
class midpoint, and the class frequency for each
class and
dividing that sum by the total number of
frequencies
Measures of Central Tendency
and Variability: Grouped Data
 Median – The middle value in an ordered array
of numbers
 Mode – the mode for grouped data is the class
midpoint of the modal class
 The modal class is the class interval with the greatest frequency
Calculation of Grouped Mean

Class Interval   Frequency (f)   Class Midpoint (M)    fM
20-under 30            6                25            150
30-under 40           18                35            630
40-under 50           11                45            495
50-under 60           11                55            605
60-under 70            3                65            195
70-under 80            1                75             75
Total                 50                             2150

µ = ∑fM / ∑f = 2150/50 = 43.0
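The grouped-mean calculation can be sketched as below. The `(lower limit, upper limit, frequency)` tuple layout and the function name are assumptions made for this illustration.

```python
# (lower limit, upper limit, frequency) for each class interval on the slide.
classes = [(20, 30, 6), (30, 40, 18), (40, 50, 11),
           (50, 60, 11), (60, 70, 3), (70, 80, 1)]

def grouped_mean(classes):
    """Frequency-weighted mean of the class midpoints."""
    total_f = sum(f for _, _, f in classes)
    total_fm = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    return total_fm / total_f

print(grouped_mean(classes))
```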
Median of Grouped Data - Example

Class Interval   Freq   Cum. Freq
20-under 30        6        6
30-under 40       18       24
40-under 50       11       35
50-under 60       11       46
60-under 70        3       49
70-under 80        1       50
N = 50

Md = L + ((N/2 − cfp) / fmed) × W
   = 40 + ((50/2 − 24) / 11) × 10
   = 40.909

Where,
Md = median value
L = lower limit of median class
cfp = cumulative frequency of the class preceding the median class
fmed = frequency of the median class
W = width of class interval
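A sketch of the grouped-median formula: walk through the classes until the cumulative frequency reaches N/2, then interpolate inside that class. The tuple layout and function name are assumptions for this illustration.

```python
def grouped_median(classes):
    """Median of grouped data; classes are (lower, upper, frequency) in ascending order."""
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("classes must contain at least one observation")
    cfp = 0  # cumulative frequency of the classes preceding the current one
    for lower, upper, f in classes:
        if cfp + f >= n / 2:  # first class whose cumulative frequency reaches N/2
            return lower + (n / 2 - cfp) / f * (upper - lower)
        cfp += f

classes = [(20, 30, 6), (30, 40, 18), (40, 50, 11),
           (50, 60, 11), (60, 70, 3), (70, 80, 1)]

print(round(grouped_median(classes), 3))
```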
Mode of Grouped Data

Class Interval   Frequency
20-under 30          6
30-under 40         18
40-under 50         11
50-under 60         11
60-under 70          3
70-under 80          1

The modal class is 30-under 40, so Mode = (30 + 40)/2 = 35
Mode of Grouped Data contd…
Mode is given by:

Mode = L + (d1 / (d1 + d2)) × W,   where d1 = f1 − f0 and d2 = f1 − f2

Where,
L = lower limit of modal class
W = width of class
f1 = freq of modal class
f2 = freq of class after the modal class
f0 = freq of class preceding modal class

Class   Freq.
0-1       1
1-2       4
2-3       8
3-4       7
4-5       3
5-6       2
Total    25

Here, L = 2, W = 1, d1 = 8 − 4 = 4, d2 = 8 − 7 = 1
So mode = 2 + (4/5) × 1 = 2.8
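The interpolation formula for the mode can be sketched as follows. Treating a missing neighbouring class as having frequency 0 is an assumption of this sketch, not something the slides state, and the function name is illustrative.

```python
def grouped_mode(classes):
    """Interpolated mode; classes are (lower, upper, frequency) in ascending order."""
    freqs = [f for _, _, f in classes]
    i = freqs.index(max(freqs))                     # first class with the highest frequency
    lower, upper, f1 = classes[i]
    f0 = freqs[i - 1] if i > 0 else 0               # assumption: no preceding class counts as 0
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0  # assumption: no following class counts as 0
    d1, d2 = f1 - f0, f1 - f2
    return lower + d1 / (d1 + d2) * (upper - lower)

classes = [(0, 1, 1), (1, 2, 4), (2, 3, 8), (3, 4, 7), (4, 5, 3), (5, 6, 2)]

print(grouped_mode(classes))
```

If both neighbouring classes have the same frequency as the modal class, d1 + d2 is zero and the formula is undefined; the sketch does not handle that case.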
Variance and Standard Deviation
of Grouped Data
Population:  σ² = ∑f(M − µ)² / N          σ = √σ²
Sample:      S² = ∑f(M − X̄)² / (n − 1)    S = √S²
Population Variance and Standard
Deviation of Grouped Data

Class Interval    f     M     fM    M − µ   (M − µ)²   f(M − µ)²
20-under 30       6    25    150     −18       324        1944
30-under 40      18    35    630      −8        64        1152
40-under 50      11    45    495       2         4          44
50-under 60      11    55    605      12       144        1584
60-under 70       3    65    195      22       484        1452
70-under 80       1    75     75      32      1024        1024
Total            50          2150                          7200

µ = 43

σ² = ∑f(M − µ)² / N = 7200/50 = 144
σ = √144 = 12
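The table above can be reproduced with a short script. This is a sketch under the same tuple layout assumed for the other grouped-data examples; it also shows the sample version, which divides by n − 1.

```python
from math import sqrt

# (lower limit, upper limit, frequency) for each class interval.
classes = [(20, 30, 6), (30, 40, 18), (40, 50, 11),
           (50, 60, 11), (60, 70, 3), (70, 80, 1)]

n = sum(f for _, _, f in classes)
mu = sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n

# Sum of f * (M - mu)^2 over all classes, with M the class midpoint.
ss = sum(f * ((lo + hi) / 2 - mu) ** 2 for lo, hi, f in classes)

pop_var = ss / n            # population variance
pop_sd = sqrt(pop_var)      # population standard deviation
sample_var = ss / (n - 1)   # sample variance, if the 50 values were a sample

print(ss, pop_var, pop_sd, round(sample_var, 2))
```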
Further Measures of the Distribution

• While measures of dispersion are useful for helping


us describe the width of the distribution, they tell us
nothing about the shape of the distribution
Further Measures of the Distribution

There are further statistics that describe the


shape of the distribution, using formulae that
are similar to those of the mean and variance
 Mean (describes central value)
 Variance (describes dispersion)
 Skewness (describes asymmetry)
 Kurtosis (describes peakedness)
Measures of Shape :Skewness and
Kurtosis
 A fundamental task in many statistical analyses is to
characterize the location and variability of a data set
(Measures of central tendency vs. measures of
dispersion)
 Both measures tell us nothing about the shape of the
distribution
 A further characterization of the data includes
skewness and kurtosis
 The histogram is an effective graphical technique for
showing both the skewness and kurtosis of a data
set
Skewness-Measure of asymmetry

 Symmetrical – the right half is a mirror image of


the left half
 Skewness – shows that the distribution lacks
symmetry; used to denote the data is sparse at
one end, and piled at the other end
 Absence of symmetry

 Extreme values in one side of a distribution


Types of Skewness
Skewness-Measure of asymmetry

 If skewness equals zero, the histogram is symmetric about the


mean
 Positive skewness
 There are more observations below the mean than above it
 When the mean is greater than the median
 Negative skewness
 There are a small number of low observations and a large
number of high ones
 When the median is greater than the mean
Coefficient of Skewness

 Coefficient of Skewness (Sk) - compares the mean and median in light of the magnitude of the standard deviation; Md is the median; Sk is the coefficient of skewness; σ is the Std Dev

Sk = 3(µ − Md) / σ

Coefficient of Skewness

 Summary measure for skewness


Sk = 3(µ − Md) / σ
 If Sk < 0, the distribution is negatively skewed


(skewed to the left).
 If Sk = 0, the distribution is symmetric (not
skewed).
 If Sk > 0, the distribution is positively skewed
(skewed to the right).
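As an illustration, the coefficient of skewness can be computed for the car-rental data from the earlier demonstration problem, where the mean (137,769.23) is far above the median (20,000). The function name is an assumption, and the population standard deviation is used for σ.

```python
from statistics import mean, median, pstdev

def pearson_skewness(data) -> float:
    """Coefficient of skewness Sk = 3 * (mean - median) / standard deviation."""
    return 3 * (mean(data) - median(data)) / pstdev(data)

cars = [643_000, 327_000, 233_000, 204_000, 167_000, 144_000,
        20_000, 12_000, 10_000, 9_000, 9_000, 7_000, 6_000]
symmetric = [1, 2, 3, 4, 5]  # mean equals median, so Sk is zero

print(round(pearson_skewness(cars), 2), pearson_skewness(symmetric))
```

A few very large fleets pull the mean above the median, so the car data come out positively skewed.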
Further Moments – Kurtosis

 The kurtosis of a normal distribution, measured as excess kurtosis relative to the normal curve, is 0 (Mesokurtic)
 Kurtosis characterizes the relative peakedness
or flatness of a distribution compared to the
normal distribution
Further Moments – Kurtosis

 Platykurtic – When the kurtosis < 0, the frequencies throughout the curve are closer to being equal (i.e., the curve is flatter and wider)
 Thus, negative kurtosis indicates a relatively flat
distribution
 Leptokurtic– When the kurtosis > 0, there are high
frequencies in only a small part of the curve (i.e, the
curve is more peaked)
 Thus, positive kurtosis indicates a relatively peaked
distribution
Low vs high Kurtosis
leptokurtic
Platykurtic

Kurtosis is based on the size of a distribution's tails.


Negative kurtosis (platykurtic) – distributions with
short tails
Positive kurtosis (leptokurtic) – distributions with
relatively long tails
Why Do We Need Kurtosis?

 These two distributions have the same variance,


approximately the same skew, but differ markedly in
kurtosis.
How to Graphically Summarize Data?

 Histograms

 Box plots
Functions of a Histogram

 The function of a histogram is to graphically


summarize the distribution of a data set
 The histogram graphically shows the following:
1. Center (i.e., the location) of the data
2. Spread (i.e., the scale) of the data
3. Skewness of the data
4. Kurtosis of the data
5. Presence of outliers
6. Presence of multiple modes in the data.
Functions of a Histogram contd..

 The histogram can be used to answer the


following questions:
1. What kind of population distribution do the
data come from?
2. Where are the data located?
3. How spread out are the data?
4. Are the data symmetric or skewed?
5. Are there outliers in the data?
Box Plots

 We can also use a box plot to graphically summarize a data set


 A box plot represents a graphical summary of what is sometimes called a “five-number summary” of the distribution
 Minimum
 25th percentile (Q1)
 Median
 75th percentile (Q3)
 Maximum
 The length of the box is the Interquartile Range (IQR)

Box plot
Box Plots

Example – Consider first 9 stock prices (in $000)


6.0, 6.7, 3.8, 7.0, 5.8, 9.975, 10.5, 5.99, 20.0
Arrange these in order of magnitude
3.8, 5.8, 5.99, 6.0, 6.7, 7.0, 9.975, 10.5, 20.0
The median is Q2 = 6.7 (there are 4 values on either
side)
Q1 = 5.9 (median of the 4 smallest values)
Q3 = 10.2 (median of the 4 largest values)
IQR = Q3 – Q1 = 10.2 - 5.9 = 4.3
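The quartile method used in this example (median of the lower half and median of the upper half, leaving out the middle value when the count is odd) can be sketched as follows. The function name is illustrative, and other quartile conventions can give slightly different values.

```python
from statistics import median

def quartiles(data):
    """Q1, Q2, Q3 using the median-of-halves method from the example."""
    if len(data) < 2:
        raise ValueError("need at least two values")
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]    # values below the median position
    upper = ordered[-half:]   # values above the median position
    return median(lower), median(ordered), median(upper)

prices = [6.0, 6.7, 3.8, 7.0, 5.8, 9.975, 10.5, 5.99, 20.0]
q1, q2, q3 = quartiles(prices)
iqr = q3 - q1

print(round(q1, 1), q2, round(q3, 1), round(iqr, 1))
```

Before rounding, Q1 is 5.895 and Q3 is 10.2375, which the slide reports as 5.9 and 10.2.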
THANK YOU
