
Week_4

This document discusses measures of dispersion, which are essential for understanding the variability in data sets beyond just central tendency. Key measures include range, variance, standard deviation, and interquartile range, along with visual tools like box-and-whisker plots. The document also covers the empirical rule and Chebyshev's theorem for estimating the distribution of data points around the mean.

Uploaded by

noborongpotrika


Measures of Dispersion

The various measures of central tendency (averages) give us a single value that
represents the entire data set. But an average alone cannot adequately describe a set
of observations, because it gives us no idea of how the values are spread around
that central value. For this reason, it is necessary to study the dispersion
(variability) along with the average when describing a data set.

Example: Populations or samples can have similar means but very different
amounts of variation.
Consider two datasets:

20, 22, 24 and 0, 22, 44

Both have the same mean, 22, but the variation in the second dataset is much larger.
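A quick check with Python's standard `statistics` module makes the point concrete: the two means agree while the range and the sample standard deviation (both defined later in these notes) differ by an order of magnitude. This is only an illustration; the variable names are hypothetical.

```python
from statistics import mean, stdev

# The two datasets from the example above
a = [20, 22, 24]
b = [0, 22, 44]

for name, data in (("first", a), ("second", b)):
    spread = max(data) - min(data)  # range: largest minus smallest
    print(f"{name}: mean = {mean(data):g}, range = {spread}, "
          f"sample SD = {stdev(data):.2f}")
# first: mean = 22, range = 4, sample SD = 2.00
# second: mean = 22, range = 44, sample SD = 22.00
```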

Measure of Dispersion

A measure of the scatter of the values of a data set among themselves is
called a measure of dispersion or measure of variation. A measure of dispersion
conveys information about the amount of variability present in a set of data. If
all the values are the same, there is no dispersion; if they are not all the same,
dispersion is present in the data. The amount of dispersion may be small when the
values, though different, are close together. If the values are widely scattered, the
dispersion is greater. Other terms used synonymously with dispersion include
variation, spread, and scatter.

Figure 2.5.1 shows the frequency polygons for two populations that have equal
means but different amounts of variability. Population B, which is more variable
than population A, is more spread out.

The two commonly used measures of dispersion are:

(a) Range
(b) Variance or Standard Deviation

The Range:

The range of a set of data values is the difference between the highest and the
lowest values in the set. If $x_l$ and $x_s$ are, respectively, the largest and smallest
values in a set, the range, denoted by R, is defined as
$R = x_l - x_s$.

Example: Calculate the sample range for the following data:
2, 4, 5, 8

Solution: The range is 𝑅 = 8 − 2 = 6.
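The definition translates directly into code. A minimal sketch; the function name `data_range` is hypothetical:

```python
def data_range(values):
    """Range R = largest value - smallest value (same units as the data)."""
    if not values:
        raise ValueError("the range of an empty data set is undefined")
    return max(values) - min(values)

print(data_range([2, 4, 5, 8]))   # 6
print(data_range([2, 4, 5, 80]))  # 78: one extreme value changes R completely
```

The second call illustrates why the range is a weak measure of variation: it depends on only the two most extreme observations.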

Note:
1. The value of range is always non-negative.
2. The unit of range is same as the unit of data.
3. The range is a poor measure of variation because it takes into
account only two of the values; however, it plays a significant role
in some applications.

Variance:

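The variance is expressed in squared units and, therefore, is not an appropriate
measure of dispersion when we wish to express variability in the original units
of the data. For that purpose we use the standard deviation, the positive square
root of the variance, which has the same units as the data.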
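The variance formulas from the original slides are not reproduced in this text, so the sketch below assumes the standard definitions: population variance σ² = Σ(x − μ)²/N and sample variance s² = Σ(x − x̄)²/(n − 1). It computes both by hand for the data 2, 4, 5, 8 from the range example and cross-checks them against Python's `statistics.variance` and `statistics.pvariance`. The helper names are hypothetical.

```python
import math
from statistics import pvariance, variance

def sample_variance(values):
    """s^2 = sum((x - x_bar)^2) / (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

def population_variance(values):
    """sigma^2 = sum((x - mu)^2) / N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

data = [2, 4, 5, 8]
s2 = sample_variance(data)   # squared units
s = math.sqrt(s2)            # standard deviation: back in the original units
print(s2, s)                 # 6.25 2.5

# Cross-check against the standard library
assert math.isclose(s2, variance(data))
assert math.isclose(population_variance(data), pvariance(data))
```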

Example:

Interquartile Range: A measure that reflects the variability among the middle 50
percent of the observations in a data set is the interquartile range. Interquartile
range, denoted by IQR, is defined as
IQR = Q3 − Q1,
where Q1 and Q3 are the first and third quartiles, respectively. A large IQR
indicates a large amount of variability among the middle 50 percent of the relevant
observations, and a small IQR indicates a small amount of variability among the
relevant observations.
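The notes do not state which quartile convention they use, but the box-plot example later in this section (Q1 = 7.5, Q2 = 8, Q3 = 9 for the 20 satisfaction scores) matches taking Q1 and Q3 as the medians of the lower and upper halves of the sorted data. The sketch assumes that rule; other conventions (for example `statistics.quantiles` or `numpy.percentile` with their default methods) can give slightly different quartiles for the same data.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the 'median of each half' rule.

    For odd n, the overall median is left out of both halves.
    """
    xs = sorted(values)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return median(lower), median(xs), median(upper)

scores = [1, 3, 5, 5, 7, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 10, 10, 10, 10]
q1, q2, q3 = quartiles(scores)
print(q1, q2, q3, q3 - q1)  # 7.5 8.0 9.0 1.5
```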

Box-and-Whisker Plot
A useful visual device for communicating the information contained in a data set is
the box-and-whisker plot. The construction of a box-and-whisker plot (sometimes
called, simply, a box plot) makes use of the quartiles of a data set. It
• depicts the central tendency and variability of the data
• shows whether there are any potential outliers (extreme observations)
The steps in drawing a box-and-whisker plot for a given data set are:
1. Compute Q1, Q2 (the median), and Q3.
2. Calculate inner fences and outer fences:
Inner fences: Q1 – 1.5(IQR)
Q3 + 1.5(IQR)
Outer fences: Q1 – 3(IQR)
Q3 + 3(IQR)
3. Identify the smallest observation, a, and the largest observation, b,
that lie between the inner fences.
4. Draw a box that extends from Q1 to Q3. Draw a vertical line
through the box at the median.
5. Draw whiskers as lines that extend below Q1 and above Q3. Draw
one whisker from Q1 to a and the other whisker from Q3 to b.

6. Measurements that are located between inner and outer fences are
called mild outliers. Plot them using the symbol *.
7. Measurements that are located outside the outer fences are called
extreme outliers. Plot them using the symbol o.

Outlier: Any observation with x > Q3 + 1.5(IQR) or x < Q1 − 1.5(IQR) is called an outlier.

Example:
Draw a box-and-whisker plot for the following 20 customer satisfaction scores:
1 3 5 5 7 8 8 8 8 8 8 9 9 9 9 9 10 10 10 10

1. Calculate Q1, Q2 (Md), Q3 and IQR

Q1 = 7.5, Q2 = 8, Q3 = 9, IQR = 9 − 7.5 = 1.5
(Q1 and Q3 are the medians of the lower ten and the upper ten scores.)
2. Calculate inner fences and outer fences:

Inner fences: Q1 – 1.5(IQR) = 7.5 – 1.5(1.5) = 5.25

Q3 + 1.5(IQR) = 9 + 1.5(1.5) = 11.25

Outer fences: Q1 – 3(IQR) = 7.5 – 3(1.5) = 3.0

Q3 + 3(IQR) = 9 + 3(1.5) = 13.5

3. Identify the smallest (a) and largest (b) observations that are between the
inner fences: a = 7 and b = 10.

4. Draw a box that extends from Q1 to Q3. Draw a vertical line through the box
at Md

5. Draw whiskers as lines that extend below Q1 and above Q3. Draw one
whisker from Q1 to (a) and the other whisker from Q3 to (b).
6. Measurements that are located between inner and outer fences are called
mild outliers. Plot them using the symbol *
7. Measurements that are located outside the outer fences are called extreme
outliers. Plot them using the symbol o

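• There are two mild-outlier values, 3 and 5 (5 occurs twice, so three observations
in all), lying between the inner and outer fences; the value 3 sits exactly on the
lower outer fence, not outside it. The value 1 lies beyond the lower outer fence and
is the one extreme outlier.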
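Steps 2, 3, 6 and 7 of the example can be sketched as one function that takes the quartiles as input. It assumes that a value exactly on an outer fence (3 here) counts as a mild outlier, because it is not outside that fence, and that a value exactly on an inner fence is not an outlier, matching the strict inequalities in the outlier definition. The function name and dictionary keys are hypothetical.

```python
def box_plot_summary(values, q1, q3):
    """Fences, whisker ends (a, b) and outliers for a box-and-whisker plot."""
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3 * iqr, q3 + 3 * iqr)
    # Whisker ends: smallest (a) and largest (b) observations within the inner fences
    inside = [x for x in values if inner[0] <= x <= inner[1]]
    # Mild: between the inner and outer fences (outer fence itself included)
    mild = [x for x in values
            if outer[0] <= x < inner[0] or inner[1] < x <= outer[1]]
    # Extreme: strictly outside the outer fences
    extreme = [x for x in values if x < outer[0] or x > outer[1]]
    return {
        "inner_fences": inner,
        "outer_fences": outer,
        "whiskers": (min(inside), max(inside)),
        "mild_outliers": mild,
        "extreme_outliers": extreme,
    }

scores = [1, 3, 5, 5, 7, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 10, 10, 10, 10]
summary = box_plot_summary(scores, q1=7.5, q3=9)
for key, value in summary.items():
    print(f"{key}: {value}")
# inner_fences: (5.25, 11.25)
# outer_fences: (3.0, 13.5)
# whiskers: (7, 10)
# mild_outliers: [3, 5, 5]
# extreme_outliers: [1]
```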

Detecting Shape of Distribution from Box-and-Whisker Plot

• If the distance between Q1 and Q2 and the distance between Q2 and Q3 are
approximately equal, the distribution is symmetric.
• If Q2 is closer to Q1 than to Q3, the distribution is positively skewed.
• If Q2 is closer to Q3 than to Q1, the distribution is negatively skewed.
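The three rules above can be written as a small classifier. The notes do not say how close "approximately equal" must be, so the tolerance below (the two gaps may differ by at most 10% of the IQR) is a hypothetical choice for illustration only.

```python
def shape_from_quartiles(q1, q2, q3, tol=0.1):
    """Classify distribution shape from its quartiles.

    'Approximately equal' is taken (hypothetically) to mean the two gaps
    differ by at most tol * IQR.
    """
    lower_gap = q2 - q1   # distance from Q1 to the median
    upper_gap = q3 - q2   # distance from the median to Q3
    if abs(lower_gap - upper_gap) <= tol * (q3 - q1):
        return "symmetric"
    if lower_gap < upper_gap:
        return "positively skewed"   # median closer to Q1
    return "negatively skewed"       # median closer to Q3

print(shape_from_quartiles(10, 15, 20))  # symmetric
print(shape_from_quartiles(10, 12, 20))  # positively skewed
print(shape_from_quartiles(10, 18, 20))  # negatively skewed
```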

Empirical Rule (Rule 68-95-99.7)
If a distribution appears bell-shaped and symmetric about μ, we expect approximately
• 68% of the observations to fall in the interval (μ − σ, μ + σ)
(within one standard deviation of the mean)
• 95% of the observations to fall in the interval (μ − 2σ, μ + 2σ)
(within two standard deviations of the mean)
• 99.7% of the observations to fall in the interval (μ − 3σ, μ + 3σ)
(within three standard deviations of the mean)
μ: population mean; σ: population standard deviation
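A minimal sketch that lists the three empirical-rule intervals for given μ and σ; the usage reproduces Example 1 below (μ = 100, σ = 15), and the same helper applies to Example 2. The names are hypothetical, and the percentages are the rule's approximations, not exact probabilities.

```python
# Approximate coverage of mu ± k*sigma for a bell-shaped, symmetric distribution
EMPIRICAL_COVERAGE = {1: 68.0, 2: 95.0, 3: 99.7}

def empirical_rule_intervals(mu, sigma):
    """Map k -> (lower, upper, approx. percent inside) for k = 1, 2, 3."""
    return {k: (mu - k * sigma, mu + k * sigma, pct)
            for k, pct in EMPIRICAL_COVERAGE.items()}

# Example 1: IQ scores with mu = 100, sigma = 15
for k, (low, high, pct) in empirical_rule_intervals(100, 15).items():
    print(f"within {k} SD: ({low}, {high}) -> about {pct}%")
# within 1 SD: (85, 115) -> about 68.0%
# within 2 SD: (70, 130) -> about 95.0%
# within 3 SD: (55, 145) -> about 99.7%
```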

Example 1: A study was performed on the IQ scores of the employees of a
private firm. The scores were found to be normally distributed (bell-shaped and
symmetric), with mean 100 and standard deviation 15. Estimate the percentage of
the scores that fall between 70 and 130.

Solution: Under the empirical rule, we check whether 70 and 130 match one of the
intervals (μ − σ, μ + σ), (μ − 2σ, μ + 2σ), or (μ − 3σ, μ + 3σ). Here, μ = 100 and σ = 15.
70 = 100 − 30 = 100 − 2(15)
130 = 100 + 30 = 100 + 2(15)
Thus, 70 and 130 lie 2 standard deviations to the left and to the right of the mean,
respectively. Therefore, by the empirical rule, about 95% of the IQ scores fall
between 70 and 130.

Example 2: The scores on an entrance test taken by high school graduates in a
particular year were bell-shaped and symmetric, with mean 490 and standard
deviation 100.

(a) What percentage of students scored between 390 and 590 on this test?
(b) A student scored 795. What can you say about this performance
compared with the rest of the scores?

Solution:
a) Since 590 = 490 + 100 = μ + σ
and 390 = 490 − 100 = μ − σ,

approximately 68% of the students scored between 390 and 590 on this test.

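b) Since 490 + 3 × 100 = 790 = μ + 3σ
and 490 − 3 × 100 = 190 = μ − 3σ,
approximately 99.7% of the test scores lie between 190 and 790. A score of 795 lies
above μ + 3σ, so it is among the very highest scores: by symmetry, only about
(100 − 99.7)/2 = 0.15% of scores are expected to exceed 790.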
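Part (b) amounts to asking how many standard deviations 795 lies from the mean. A short sketch (the helper name is hypothetical) computes that distance and the upper-tail share implied by the empirical rule, assuming the remaining 0.3% splits evenly between the two tails:

```python
def sds_from_mean(x, mu, sigma):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

mu, sigma = 490, 100          # Example 2
k = sds_from_mean(795, mu, sigma)
print(f"795 is {k:.2f} SDs above the mean")  # 795 is 3.05 SDs above the mean

# About 99.7% lie within 3 SDs; by symmetry, half of the remaining 0.3%
# is expected above mu + 3*sigma.
upper_tail_pct = (100 - 99.7) / 2
print(f"about {upper_tail_pct:.2f}% of scores exceed {mu + 3 * sigma}")
# about 0.15% of scores exceed 790
```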

Problem: If the average age of retirement for the entire population in a country is
64 years and the distribution is normal (bell shaped symmetric) with a standard
deviation of 3.5 years, what is the approximate age range in which 95% of people
retire?

Empirical Rule for a Given Large Sample:

If a distribution (histogram, box plot) appears bell-shaped and symmetric, we expect
approximately

• 68% of the observations to fall in the interval (x̄ − s, x̄ + s)
(within one standard deviation of the sample mean)
• 95% of the observations to fall in the interval (x̄ − 2s, x̄ + 2s)
(within two standard deviations of the sample mean)
• 99.7% of the observations to fall in the interval (x̄ − 3s, x̄ + 3s)
(within three standard deviations of the sample mean)

x̄: sample mean; s: sample standard deviation

Example: Exam scores collected from 50 students:

78 64 76 82 68 69 67 79 69 74 83 76 71 84 72 75 72 71 83 76 68 75 73 69 73 76
77 68 71 72 70 77 71 74 75 75 75 71 74 71 70 76 64 65 76 78 70 69 82 77
x̄ = 73.42, s = 4.82

According to the empirical rule, approximately

• 68% of the students scored between 73.42 − 4.82 = 68.60 and 73.42 + 4.82 =
78.24
• 95% of the students scored between 63.77 and 83.07
• 99.7% of the students scored between 58.95 and 87.89

In fact, the following scores lie between 68.60 and 78.24:
69 69 69 69 70 70 70 71 71 71 71 71 71 72 72 72 73 73 74 74 74 75 75 75 75 75
76 76 76 76 76 76 77 77 77 78 78
That is 37 of the 50 scores, so the actual percentage is 74%, compared with the
68% estimated by the empirical rule.
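The same comparison can be repeated for all three intervals. The sketch below recomputes x̄ and s (sample standard deviation, n − 1 in the denominator) from the 50 scores and counts how many fall inside each interval, using unrounded x̄ and s for the endpoints as the notes do:

```python
from statistics import mean, stdev

scores = [78, 64, 76, 82, 68, 69, 67, 79, 69, 74, 83, 76, 71, 84, 72, 75, 72,
          71, 83, 76, 68, 75, 73, 69, 73, 76, 77, 68, 71, 72, 70, 77, 71, 74,
          75, 75, 75, 71, 74, 71, 70, 76, 64, 65, 76, 78, 70, 69, 82, 77]

x_bar = mean(scores)
s = stdev(scores)                              # sample SD (divides by n - 1)
print(f"x_bar = {x_bar:.2f}, s = {s:.2f}")     # x_bar = 73.42, s = 4.82

actual = {}  # k -> number of scores inside (x_bar - k*s, x_bar + k*s)
for k, predicted in ((1, 68), (2, 95), (3, 99.7)):
    low, high = x_bar - k * s, x_bar + k * s
    actual[k] = sum(low <= x <= high for x in scores)
    pct = 100 * actual[k] / len(scores)
    print(f"k = {k}: ({low:.2f}, {high:.2f}) predicted ~{predicted}%, actual {pct:.0f}%")
```

Only the one-standard-deviation count (37 of 50) is discussed in the notes; the loop simply applies the same check to the wider intervals.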

Chebyshev’s Theorem

• This is similar to the empirical rule, but it can be applied even when the
distribution is not bell-shaped and symmetric.
• For any value k > 1, at least $100\left(1 - \frac{1}{k^2}\right)\%$ of the population
measurements lie in the interval (μ − kσ, μ + kσ).
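A minimal sketch of the Chebyshev lower bound for a few values of k (the function name is hypothetical). For k = 2 the guaranteed share is 75%, well below the roughly 95% that the empirical rule gives for bell-shaped data; that gap is the price of making no assumption about shape.

```python
def chebyshev_min_fraction(k):
    """Lower bound 1 - 1/k^2 on the fraction of data within k SDs of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem is informative only for k > 1")
    return 1 - 1 / k ** 2

for k in (1.5, 2, 3, 4):
    print(f"k = {k}: at least {100 * chebyshev_min_fraction(k):.1f}%")
# k = 1.5: at least 55.6%
# k = 2: at least 75.0%
# k = 3: at least 88.9%
# k = 4: at least 93.8%
```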

Problem: A population data set of size N = 500 has mean μ = 5.2 and standard
deviation σ = 1.1. Find the minimum number of observations in the data set that
must lie:

(a) between 3 and 7.4;

(b) between 1.9 and 8.5.
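One way to approach this problem is sketched below, under two stated assumptions: the interval must be centred on μ (otherwise the theorem does not apply directly), and because a count of observations is a whole number, the guaranteed minimum is N(1 − 1/k²) rounded up. The helper name is hypothetical; k is rounded to guard against floating-point noise, and `fractions.Fraction` keeps 1 − 1/k² exact before rounding up.

```python
import math
from fractions import Fraction

def chebyshev_min_count(n, mu, sigma, low, high):
    """Minimum number of the n observations guaranteed to lie in (low, high)."""
    # Chebyshev's theorem is stated for intervals centred on the mean.
    if not math.isclose(mu - low, high - mu):
        raise ValueError("interval must be centred on the mean")
    k = round((high - mu) / sigma, 9)  # e.g. 2.0000000000000004 -> 2.0
    if k <= 1:
        raise ValueError("Chebyshev's theorem needs k > 1")
    bound = 1 - Fraction(1) / Fraction(k) ** 2  # exact 1 - 1/k^2
    return k, math.ceil(n * bound)              # a count must be a whole number

for low, high in ((3, 7.4), (1.9, 8.5)):
    k, count = chebyshev_min_count(500, 5.2, 1.1, low, high)
    print(f"({low}, {high}): k = {k:g}, at least {count} observations")
```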

Problem: Over the last decade, Amazon.com has sold the following number of
books (in millions):
103 106 114 177 111 162 148 119 120 144

a) Calculate the sample mean and sample standard deviation.

b) Use Chebyshev’s Theorem to find an interval centered about the mean
in which you would expect 75% of the years to fall.
c) Use Chebyshev’s Theorem to find an interval centered about the mean
in which you would expect 93.8% of the years to fall.
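A sketch for this problem: compute x̄ and the sample standard deviation s, then solve 1 − 1/k² = p for k and report x̄ ± ks. It assumes that 93.8% is 1 − 1/4² = 93.75% rounded, i.e. k = 4; plugging in 0.938 literally would give k ≈ 4.02. The helper name is hypothetical.

```python
import math
from statistics import mean, stdev

books = [103, 106, 114, 177, 111, 162, 148, 119, 120, 144]  # millions

x_bar = mean(books)
s = stdev(books)   # sample SD (n - 1 in the denominator)

def chebyshev_k(proportion):
    """Solve 1 - 1/k^2 = proportion for k (requires 0 < proportion < 1)."""
    if not 0 < proportion < 1:
        raise ValueError("proportion must be strictly between 0 and 1")
    return 1 / math.sqrt(1 - proportion)

print(f"x_bar = {x_bar:.2f}, s = {s:.2f}")  # x_bar = 130.40, s = 25.59
for proportion in (0.75, 0.9375):
    k = chebyshev_k(proportion)
    print(f"at least {proportion:.2%} of years in "
          f"({x_bar - k * s:.2f}, {x_bar + k * s:.2f}), k = {k:g}")
```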
