Measures of Dispersion
Dr. Seema Gupta Bhol
Dispersion in statistics is a way to describe how spread out or
scattered the data is around an average value.
It helps to understand if the data points are close together or
far apart.
Measures of dispersion quantify this scattering of the data. They
tell us how the values are distributed in the data set.
Variance
Variance is a statistical measurement of the spread between
numbers in a data set. It is the average of the squared differences
between each number and the mean (average), so it also reflects how
far the numbers lie from one another.
• Variance is often denoted by the symbol σ².
Example: Find the variance of the numbers
3, 8, 6, 10, 12, 9, 11, 10, 12, 7.
Solution:
Step 1: Compute the mean of the 10 values given.
Mean = (3+8+6+10+12+9+11+10+12+7) / 10 = 88 / 10 = 8.8
Step 2: Make a table with three columns, one for the X values, the
second for the deviations and the third for squared deviations.
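The example stops after describing the Step 2 table. The Python sketch below is one way to finish it, assuming the variance divides by the number of values n (the population variance, which is the convention Question 2 and Example 3 use). Under that assumption the squared deviations sum to 73.6 and the variance is 73.6 / 10 = 7.36.

```python
# Finish the first variance example using the three-column table from Step 2:
# X value, deviation from the mean, squared deviation.
# Assumption: population variance (divide by n), as in the other examples here.
data = [3, 8, 6, 10, 12, 9, 11, 10, 12, 7]

n = len(data)
mean = sum(data) / n  # Step 1: 88 / 10 = 8.8

print(f"{'X':>4} {'X - mean':>10} {'(X - mean)^2':>14}")
squared_devs = []
for x in data:
    dev = x - mean
    squared_devs.append(dev ** 2)
    print(f"{x:>4} {dev:>10.1f} {dev ** 2:>14.2f}")

variance = sum(squared_devs) / n
print(f"Sum of squared deviations = {sum(squared_devs):.2f}")  # 73.60
print(f"Variance = {variance:.2f}")                             # 7.36
```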
Question 2: Calculate the variance for the following data
xi fi
10 1
4 3
6 5
8 1
Mean (x̄) = ∑(fi xi)/∑(fi)
n = ∑(fi) = 1+3+5+1 = 10
∑(fi xi) = 10×1 + 4×3 + 6×5 + 8×1 = 10 + 12 + 30 + 8 = 60
Mean (x̄) = 60/10 = 6
xi      fi    fi xi    (xi – x̄)    (xi – x̄)²    fi(xi – x̄)²
10      1     10       4           16           16
4       3     12       -2          4            12
6       5     30       0           0            0
8       1     8        2           4            4
Total   10    60                                32
$$\sigma^2 = \frac{\sum_i f_i (x_i - \bar{x})^2}{n}$$
= [(16 + 12 + 0 + 4)/10]
= 32/10 = 3.2
Variance (σ²) = 3.2
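A minimal Python sketch of the frequency-table calculation in Question 2, assuming (as in the formula above) that n is the total frequency ∑fi. The variable names are illustrative only. Note that the last row contributes fi(xi – x̄)² = 1 × 4 = 4.

```python
# Variance of a discrete frequency distribution:
# sigma^2 = sum(f_i * (x_i - mean)^2) / n, where n = sum(f_i).
values = [10, 4, 6, 8]       # x_i
frequencies = [1, 3, 5, 1]   # f_i

n = sum(frequencies)                                          # 10
mean = sum(f * x for x, f in zip(values, frequencies)) / n    # 60 / 10 = 6.0
weighted_sq = [f * (x - mean) ** 2 for x, f in zip(values, frequencies)]
variance = sum(weighted_sq) / n                               # 32 / 10

print(weighted_sq)   # [16.0, 12.0, 0.0, 4.0]
print(variance)      # 3.2
```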
Example 3: Find the variance of the following data table
Class Frequency
0-10 3
10-20 6
20-30 4
30-40 2
40-50 1
Class    Xi (midpoint)    fi    fi×Xi    Xi – μ    (Xi – μ)²    fi×(Xi – μ)²
0-10     5                3     15       -15       225          675
10-20    15               6     90       -5        25           150
20-30    25               4     100      5         25           100
30-40    35               2     70       15        225          450
40-50    45               1     45       25        625          625
Total                     16    320                             2000
Mean (μ) = ∑(fi Xi)/∑(fi)
= 320/16 = 20
$$\sigma^2 = \frac{\sum_i f_i (X_i - \mu)^2}{n}, \qquad n = \sum_i f_i = 16$$
= 2000/16
= 125
The variance of the given data set is 125.
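For grouped data each class is represented by its midpoint, as in the Xi column. One way to check Example 3 in Python, sketched here under that midpoint assumption, is to repeat each midpoint fi times and pass the result to `pvariance` from the standard-library `statistics` module, which divides by n. Expanding the data like this is only a convenience for small tables.

```python
from statistics import mean, pvariance

# Grouped data from Example 3: (lower, upper) class limits and frequencies.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
frequencies = [3, 6, 4, 2, 1]

# Represent each class by its midpoint X_i = (lower + upper) / 2.
midpoints = [(low + high) / 2 for low, high in classes]   # [5.0, 15.0, 25.0, 35.0, 45.0]

# Repeat each midpoint f_i times, then take the population variance (divide by n).
expanded = [x for x, f in zip(midpoints, frequencies) for _ in range(f)]

print(len(expanded))        # 16
print(mean(expanded))       # 20.0
print(pvariance(expanded))  # 125.0
```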
Standard Deviation
• The square root of the variance is the standard
deviation (SD or σ).
• Shows variation about the mean.
• Formula:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$
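Because the standard deviation is the square root of the variance, Python's standard library can compute it either as `math.sqrt` of `pvariance` or directly with `pstdev`. Both divide by N, like the formula above; the sample versions (`variance`, `stdev`) divide by N − 1 instead and would give different numbers. A short sketch using data from the earlier examples:

```python
import math
from statistics import pstdev, pvariance

data = [3, 8, 6, 10, 12, 9, 11, 10, 12, 7]   # values from the first variance example

variance = pvariance(data)    # population variance: divides by N
sd = math.sqrt(variance)      # standard deviation = square root of the variance

print(round(variance, 2))      # 7.36
print(round(sd, 2))            # 2.71
print(round(pstdev(data), 2))  # 2.71 (computed directly)

# Grouped data from Example 3 has variance 125, so SD = sqrt(125).
sd_grouped = math.sqrt(125)
print(round(sd_grouped, 2))    # 11.18
```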
Standard deviation & Variance
Standard deviation is calculated from variance, and both
are measures of spread in a data set.
Variance is harder to interpret intuitively because it is expressed
in the squared units of the data (for test scores, "points squared"),
so its value is on a different scale from the data themselves.
Standard deviation is expressed in the same units as
the data, making it easier to understand. For example, if
you're dealing with test scores, standard deviation will be
in points, just like the scores themselves.
Both variance and standard deviation are useful, and you
can use whichever is more appropriate in a given situation.
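To see the unit difference concretely, the sketch below reuses City A's temperatures from the Range example and converts them from Celsius to Fahrenheit (F = 1.8C + 32, chosen only for illustration). Adding 32 does not change the spread; the scale factor 1.8 multiplies the standard deviation by 1.8 but the variance by 1.8² = 3.24, because variance is measured in squared units.

```python
from statistics import pstdev, pvariance

# City A daily highs from the Range example, in degrees Celsius.
celsius = [23, 25, 28, 28, 32, 33, 35]
# The same temperatures in degrees Fahrenheit: F = 1.8 * C + 32.
fahrenheit = [1.8 * c + 32 for c in celsius]

# The SD scales by the unit factor; the variance scales by its square.
sd_ratio = pstdev(fahrenheit) / pstdev(celsius)
var_ratio = pvariance(fahrenheit) / pvariance(celsius)

print(round(sd_ratio, 2))   # 1.8
print(round(var_ratio, 2))  # 3.24
```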
What Is Variance Used for?
• Variance measures the degree of spread in a data set from its mean
value. It shows the amount of variation that exists among the data
points.
• Visually, the larger the variance, the "fatter" a probability
distribution will be.
• Variance measures variability or how far numbers in a data set
diverge from the mean.
• It is used by various professionals including data analysts,
mathematicians, scientists, statisticians, and investors. Investors
use variance to help decide whether to buy, sell, or hold
securities. For example, an investment with a greater variance
could be considered more volatile and risky.
Coefficient of Variation
The coefficient of variation represents the ratio of the standard
deviation to the mean, and it is a useful statistic for comparing the
degree of variation from one data series to another.
• Measure of relative variation
• Usually expressed as a percentage
• Shows variation relative to the mean
• Used to compare two or more groups
• Formula:
$$\text{CV} = \frac{\text{SD}}{\bar{X}} \times 100\%$$
Comparing Coefficient of Variation
Stock A: Average Price last year = $50
Standard Deviation = $5
Stock B: Average Price last year = $100
Standard Deviation = $5
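Coefficient of Variation:
Stock A: CV = ($5 / $50) × 100% = 10%
Stock B: CV = ($5 / $100) × 100% = 5%
Both stocks have the same standard deviation, but relative to its
average price, Stock B's price varied only half as much as Stock A's.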
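A small Python sketch of the CV formula applied to the two stocks. The helper name `coefficient_of_variation` is hypothetical, and it assumes a non-zero mean, since the CV is undefined when the mean is zero.

```python
def coefficient_of_variation(sd: float, mean: float) -> float:
    """Return CV = SD / mean * 100, as a percentage (hypothetical helper)."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return sd / mean * 100

# (standard deviation, average price) in dollars, from the example above.
stocks = {"Stock A": (5, 50), "Stock B": (5, 100)}
for name, (sd, mean) in stocks.items():
    print(f"{name}: CV = {coefficient_of_variation(sd, mean):.0f}%")
# Stock A: CV = 10%
# Stock B: CV = 5%
```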
Quartiles
Quartiles segment any distribution that’s ordered from low to
high into four equal parts.
Quartiles are the three values (Q1, Q2 and Q3) that divide the
ordered data points into four groups containing equal numbers of values.
The first quartile (Q1) is defined as the middle number
between the smallest number and the median of the data set.
The second quartile (Q2) is the median of the given data set.
The third quartile (Q3) is the middle number between the
median and the largest value of the data set.
Q1 is the value below which 25 percent of the distribution lies,
while Q3 is the value below which 75 percent of the distribution
lies.
You can think of Q1 as the median of the first half and Q3 as the
median of the second half of the distribution.
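The median-of-halves rule just described can be sketched as a hypothetical `quartiles` helper in Python. Assumption: when the number of values is odd, the median is left out of both halves, which is what the worked examples in these notes do. Other conventions exist (for example, `statistics.quantiles` interpolates between values), so software may report slightly different quartiles.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) by the median-of-halves method (hypothetical helper).

    Assumption: with an odd number of values the median is excluded from
    both halves, as in the worked examples in these notes.
    """
    s = sorted(data)
    if len(s) < 2:
        raise ValueError("need at least two values")
    half = len(s) // 2
    lower = s[:half]                                   # values below the median
    upper = s[half + 1:] if len(s) % 2 else s[half:]   # values above the median
    return median(lower), median(s), median(upper)

print(quartiles([1, 2, 3, 4, 5, 6, 7, 8]))   # (2.5, 4.5, 6.5)
```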
Range
Range is a measure of the "spread" in a data set.
It is the difference between the largest value and the
smallest value in the given data set.
The range gives the spread of the whole data set,
while the interquartile range gives the range of the
middle half of a data set.
The daily high temperatures were recorded for two different cities
in a recent week, in degrees Celsius. The temperatures for each city
are shown below.
City A: [23,25,28,28,32,33,35]
City B: [16,24,26,26,26,27,28]
THE RANGE OF THE TEMPERATURES IN CITY A:
RANGE=MAX-MIN
=35-23=12
THE RANGE OF THE TEMPERATURES IN CITY B:
RANGE=MAX-MIN
=28-16=12
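A quick Python check of the two ranges; `data_range` is just an illustrative name.

```python
# Daily high temperatures (degrees Celsius) for the two cities.
city_a = [23, 25, 28, 28, 32, 33, 35]
city_b = [16, 24, 26, 26, 26, 27, 28]

def data_range(values):
    """Range = maximum value - minimum value."""
    return max(values) - min(values)

print(data_range(city_a))  # 12
print(data_range(city_b))  # 12
```

Both cities have the same range even though six of City B's seven values lie between 24 and 28. The range depends only on the two extreme values, which is why the interquartile range is often reported alongside it.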
Interquartile range (IQR)
The interquartile range (IQR) covers the middle half of the data set:
the values between the first quartile (Q1) and the third quartile (Q3).
Interquartile range = Upper Quartile – Lower Quartile = Q3 – Q1
where Q1 is the first quartile and Q3 is the third quartile of the series.
Question:
Determine the interquartile range value for the first ten prime numbers.
Solution:
Given: The first ten prime numbers are:
2, 3, 5, 7, 11, 13, 17, 19, 23, 29
This is already in increasing order.
Number of data items = 10
10 is an even number, so the median is the mean of the 5th and 6th data items,
i.e. 11 and 13.
That is, Q2 = (11 + 13)/2 = 24/2 = 12.
Now split the data into two halves: the lower half to find Q1 and the upper half
to find Q3.
Q1 part: 2, 3, 5, 7, 11
Here the number of values = 5.
5 is an odd number, so the middle value is the (5 + 1)/2 = 3rd value, that is, Q1 = 5.
Q3 part: 13, 17, 19, 23, 29
Here the number of values = 5.
5 is an odd number, so the middle value is the 3rd value, that is, Q3 = 19.
Subtracting Q1 from Q3 gives 19 – 5 = 14.
Therefore, 14 is the interquartile range value.
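The steps above can be reproduced with `statistics.median` from Python's standard library. With 10 values the data split evenly into two halves of five, so no value is left out of either half.

```python
from statistics import median

primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]   # first ten primes, already sorted

half = len(primes) // 2                  # 10 values: two halves of 5
lower, upper = primes[:half], primes[half:]

q1 = median(lower)      # 5
q2 = median(primes)     # (11 + 13) / 2 = 12.0
q3 = median(upper)      # 19
print(q1, q2, q3, q3 - q1)   # 5 12.0 19 14
```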
Question: Consider the following dataset of exam scores for a
Class 10 group of students:
77, 85, 92, 64, 78, 95, 82
Find the Interquartile Range of the above data
Solution:
First, arrange the scores in ascending order: 64, 77, 78, 82, 85, 92, 95.
There are 7 values. Since the count is odd, the median is the
middle (4th) value, 82.
Next, divide the data into two halves, leaving out the median: lower half 64, 77, 78 and upper half 85, 92, 95.
The median of the lower half is Q1 and the median of the upper half is Q3.
Now, Q1 = 77 and Q3 = 92
⇒ IQR = Q3 - Q1 = 92 - 77 = 15
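The same steps in Python for an odd number of values, where the median (82) is excluded from both halves as in the solution above:

```python
from statistics import median

scores = sorted([77, 85, 92, 64, 78, 95, 82])   # [64, 77, 78, 82, 85, 92, 95]
mid = len(scores) // 2                          # index 3 holds the median (82)

lower = scores[:mid]        # [64, 77, 78]  (median excluded)
upper = scores[mid + 1:]    # [85, 92, 95]

q1, q3 = median(lower), median(upper)
print(q1, q3, q3 - q1)      # 77 92 15
```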
Box and Whisker Plot
It is a graphical method that displays the
variation of the data in a dataset.
It is also called simply a box plot.
To draw one, outline a box from the first quartile to the third
quartile. A vertical line through the box marks the median.
The whiskers (small lines) extend from each quartile to the
minimum or maximum value.
Elements of a Box and Whisker Plot
The elements required to construct a box and whisker plot are:
Minimum value (Q0 or 0th percentile)
First quartile (Q1 or 25th percentile)
Median (Q2 or 50th percentile)
Third quartile (Q3 or 75th percentile)
Maximum value (Q4 or 100th percentile)
Interquartile range
Example: Draw the box plot for the given set of data: {3, 7, 8, 5, 12, 14, 21, 13, 18}.
Solution:
Firstly, write the given data in increasing order.
3, 5, 7, 8, 12, 13, 14, 18, 21
Range = Maximum value – Minimum value
Range = 21 – 3 = 18
Now, Median = middle (5th) value of the 9 ordered values
Median = 12
Now, we need to find the quartiles.
First quartile = Q1 = Median of data values present at the left side of Median
Q1 = Median of (3, 5, 7, 8)
Q1 = (5+7)/2 = 12/2 = 6
Third quartile = Q3 = Median of data values present at the right side of Median
Q3 = Median of (13, 14, 18, 21)
Q3 = (14+18)/2 = 32/2 = 16
Therefore, the interquartile range = Q3 – Q1 = 16 – 6 = 10
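A minimal sketch that collects the five values a box plot needs and draws a crude horizontal text plot. The helper names `five_number_summary` and `text_box_plot` and the character layout are illustrative assumptions, not a standard plotting API; a real chart would normally come from a plotting library. Quartiles follow the same median-of-halves rule as the solution above.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max); odd-length data exclude the median from both halves."""
    s = sorted(data)
    half = len(s) // 2
    lower = s[:half]
    upper = s[half + 1:] if len(s) % 2 else s[half:]
    return min(s), median(lower), median(s), median(upper), max(s)

def text_box_plot(summary, width=37):
    """Draw |---[===M===]---| scaled so min maps to column 0 and max to width - 1."""
    low, q1, q2, q3, high = summary
    if high == low:
        raise ValueError("need max > min to scale the plot")
    scale = (width - 1) / (high - low)

    def pos(value):
        # Map a data value to a character column.
        return round((value - low) * scale)

    line = [" "] * width
    for i in range(pos(low), pos(high) + 1):
        line[i] = "-"                         # whiskers span min..max
    for i in range(pos(q1), pos(q3) + 1):
        line[i] = "="                         # box spans Q1..Q3
    line[pos(low)] = line[pos(high)] = "|"    # end caps at min and max
    line[pos(q1)], line[pos(q3)] = "[", "]"   # box edges
    line[pos(q2)] = "M"                       # median
    return "".join(line)

data = [3, 7, 8, 5, 12, 14, 21, 13, 18]
summary = five_number_summary(data)
print(summary)   # (3, 6.0, 12, 16.0, 21)
print(text_box_plot(summary))
```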