Statistics Week 5
MEASURES OF DISPERSION
MEANING OF DISPERSION
The absolute measures of dispersion are called absolute in the sense that they are expressed in the same statistical unit as the original data, such as dollar, Taka, meter or kilogram.
When two or more data sets are expressed in different units, however, the absolute measures are not comparable. In that case it is necessary to use other measures that express the absolute deviation in relative form. These are referred to as relative measures. They are usually expressed as coefficients and are pure numbers, independent of the unit of measurement. The relative measures are:
(i) Coefficient of range
(ii) Coefficient of quartile deviation
(iii) Coefficient of mean deviation
(iv) Coefficient of variation
ABSOLUTE MEASURES OF DISPERSION
The Range
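The range R is the difference between the largest observation L and the smallest observation S:
$$R = L - S$$
For the set of observations 90, 110, 20, 51, 210 and 190, the smallest value is 20 and the largest value is 210, so that R = 210 − 20 = 190.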
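A minimal Python sketch of R = L − S, applied to the six observations above. The function name `data_range` is our own, chosen only for illustration.

```python
def data_range(values):
    """Range R = L - S: largest observation minus smallest observation."""
    if not values:
        raise ValueError("the range is undefined for an empty data set")
    largest = max(values)   # L
    smallest = min(values)  # S
    return largest - smallest

observations = [90, 110, 20, 51, 210, 190]
print(data_range(observations))  # 190
```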
MEAN DEVIATION
For data clustered near the central value, the differences of the individual observations from their typical value will tend to be small. Accordingly, to obtain a measure of dispersion, it is natural to average these differences. The resulting average is called the mean deviation; it is also called the average deviation.
In practice, the mean deviation is computed as the arithmetic mean of the absolute values of the deviations from a typical value of a distribution. The typical value may be the arithmetic mean, median, mode or any other arbitrary value. The median is sometimes preferred as the typical value, because the sum of the absolute deviations from the median is smaller than the sum of the absolute deviations from any other value.
If $x_1, x_2, \ldots, x_n$ form a sample of observations, the formula for computing the mean (or average) deviation about an arbitrary value a is
$$M_d(a) = \frac{\sum |x_i - a|}{n} \tag{4.4}$$
where $|\cdot|$ means that the signs of the deviations, whether positive or negative, are ignored. For a grouped frequency distribution with $\sum f_i = n$, the mean deviation about the arbitrary value a is
$$M_d(a) = \frac{\sum f_i |x_i - a|}{n} \tag{4.5}$$
If we replace a by $\bar{x}$, the resulting measure is called the mean deviation about the mean:
$$M_d(\bar{x}) = \frac{\sum |x_i - \bar{x}|}{n} \tag{4.5a}$$
For a grouped frequency distribution, with $n = \sum f_i$,
$$M_d(\bar{x}) = \frac{\sum f_i |x_i - \bar{x}|}{n} \tag{4.5b}$$
When the deviations are taken from the median, we substitute $\tilde{m}$ for a in (4.5), and the resulting formula for computing the mean deviation about the median is
$$M_d(\tilde{m}) = \frac{\sum f_i |x_i - \tilde{m}|}{n} \tag{4.5c}$$
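Formulas (4.4) and (4.5) differ only in whether each value carries a frequency. A minimal Python sketch that covers both, assuming ungrouped data are treated as grouped data with every frequency equal to 1. The function name `mean_deviation` and its `freqs` parameter are our own, used only for illustration.

```python
def mean_deviation(values, a, freqs=None):
    """Mean deviation about a: sum of f_i * |x_i - a| divided by n = sum of f_i.

    With freqs=None every value has frequency 1, which gives (4.4);
    otherwise values are the x_i of a frequency distribution, as in (4.5).
    """
    if freqs is None:
        freqs = [1] * len(values)
    if len(freqs) != len(values) or not values:
        raise ValueError("values and freqs must be non-empty and of equal length")
    n = sum(freqs)
    return sum(f * abs(x - a) for x, f in zip(values, freqs)) / n

# The frequency distribution of Example 4.6, whose mean is 60 / 10 = 6.
x = [3, 5, 7, 8, 9]
f = [2, 3, 2, 2, 1]
print(mean_deviation(x, a=6, freqs=f))  # 1.8, i.e. (4.5b) about the mean
```

Substituting the median for `a` gives (4.5c) in the same way.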
The following examples demonstrate how the mean deviation is computed.
To compute the mean deviation about the mean for the given data, the following steps are involved:
a) Compute the arithmetic mean of the data.
b) Obtain the absolute deviation of each value in column (2) of Table 4.1 from the computed mean. These deviations are shown in column (3).
c) Obtain the sum of column (3) and divide the resulting sum by the total number of observations (n = 10).
d) The result obtained in (c) is the mean deviation about the mean.
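The step-by-step procedure above (compute the mean, take the absolute deviations, sum them and divide by n) can be sketched in Python as follows. Table 4.1 is not reproduced here, so, as an assumption for illustration, the sketch is run on the ten weights used in the example that follows. The function name is hypothetical.

```python
def mean_deviation_about_mean(values):
    """Return the mean deviation about the mean and the list of absolute
    deviations (the values that would fill column (3))."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one observation")
    # First compute the arithmetic mean.
    x_bar = sum(values) / n
    # (b) Absolute deviation of each value from the mean.
    abs_devs = [abs(x - x_bar) for x in values]
    # (c) Sum the absolute deviations and divide by n.
    md = sum(abs_devs) / n
    # (d) This quotient is the mean deviation about the mean.
    return md, abs_devs

weights = [110, 125, 125, 147, 117, 125, 136, 157, 124, 110]
md, column_3 = mean_deviation_about_mean(weights)
print(round(md, 2))  # 11.44
```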
Example: Ten persons of varying ages were weighed and the following weights
in kg were recorded:
110, 125, 125, 147, 117, 125, 136, 157, 124, 110.
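Compute the mean deviation about the mean, the median and an arbitrary value 120.
Solution: The ten weights sum to 1276, so $\bar{x} = 1276/10 = 127.6$ kg. Arranged in ascending order, the 5th and 6th values are both 125, so the median is $\tilde{m} = 125$ kg. The mean deviations about the mean, the median and the arbitrary value 120 are, respectively,
$$M_d(\bar{x}) = \frac{\sum |x_i - \bar{x}|}{n} = \frac{\sum |x_i - 127.6|}{10} = \frac{114.4}{10} = 11.44$$
$$M_d(\tilde{m}) = \frac{\sum |x_i - \tilde{m}|}{n} = \frac{\sum |x_i - 125|}{10} = \frac{104.0}{10} = 10.40$$
and
$$M_d(a) = \frac{\sum |x_i - a|}{n} = \frac{\sum |x_i - 120|}{10} = \frac{122.0}{10} = 12.20$$
Note that, among the three mean deviations, the mean deviation about the median is the smallest.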
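The three results, and the claim that the median minimises the mean deviation, can be checked with a short Python sketch. It uses the standard `statistics` module for the mean and median; the helper name `md_about` and the scan over integer values of a from 100 to 160 are our own choices for illustration.

```python
from statistics import mean, median

weights = [110, 125, 125, 147, 117, 125, 136, 157, 124, 110]

def md_about(values, a):
    """Mean deviation about an arbitrary value a, formula (4.4)."""
    return sum(abs(x - a) for x in values) / len(values)

x_bar = mean(weights)      # 127.6
m_tilde = median(weights)  # 125.0
for label, a in [("mean", x_bar), ("median", m_tilde), ("a = 120", 120)]:
    print(f"MD about {label}: {md_about(weights, a):.2f}")
# MD about mean: 11.44
# MD about median: 10.40
# MD about a = 120: 12.20

# Scan whole-number values of a: the smallest mean deviation occurs at the median.
best_a = min(range(100, 161), key=lambda a: md_about(weights, a))
print("integer a minimising MD:", best_a)  # 125
```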
VARIANCE AND STANDARD DEVIATION
Instead of ignoring the signs of the deviations from the mean, as in the computation of the mean deviation, we may square each deviation and add the results. The sum of squares can be regarded as a measure of the total dispersion of the distribution. Dividing this sum by the total number of observations gives the average of the squared deviations, a measure called the variance of the distribution. If the observations make up an entire population, the resulting variance is referred to as the population variance. As a formula, the variance of the population observations $x_1, x_2, \ldots, x_N$, commonly designated $\sigma^2$, is
$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N} \tag{4.8}$$
where μ is the mean of all the observations in the population and N is the total number of observations in the population. Because of the squaring, the variance is expressed in squared units (e.g. km², Taka²) rather than in the original units (e.g. km, Taka). It is therefore necessary to take the positive square root to restore the original unit. The measure of dispersion thus obtained is called the population standard deviation and is usually denoted by σ. Thus
$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}} \tag{4.9}$$
Thus, by definition, the standard deviation is the positive square root of the mean of the squared deviations of the observations from their arithmetic mean.
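A minimal Python sketch of (4.8) and (4.9). The population of eight values is hypothetical, chosen only because its variance and standard deviation come out as whole numbers; the function names are our own.

```python
import math
import statistics

def population_variance(values):
    """sigma^2 = sum((x_i - mu)^2) / N, formula (4.8)."""
    N = len(values)
    mu = sum(values) / N
    return sum((x - mu) ** 2 for x in values) / N

def population_sd(values):
    """sigma = positive square root of the population variance, formula (4.9)."""
    return math.sqrt(population_variance(values))

# Hypothetical population of N = 8 observations (not from the slides).
population = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(population), population_sd(population))  # 4.0 2.0
```

Python's `statistics.pvariance` and `statistics.pstdev` compute the same population quantities with divisor N.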
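When the observations are a sample rather than the whole population, dividing the sum of squared deviations by n tends to underestimate the population variance. It can be shown that if the sum of the squared deviations in the sample is divided by n − 1 instead of n, the resulting sample variance is an unbiased estimate of the population variance. For this reason, the sample variance is defined as
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \tag{4.12}$$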
The variance, and hence the standard deviation, is simple to compute for ungrouped data. Suppose a data set consists of n values $x_1, x_2, \ldots, x_n$. First, compute the arithmetic mean $\bar{x}$ of the data set. Then subtract this mean from each value to obtain the deviations $(x_1 - \bar{x}), (x_2 - \bar{x}), \ldots, (x_n - \bar{x})$. Square these deviations, sum them and divide the resulting sum by n − 1, as in (4.12); dividing by n instead gives the population variance (4.8) when the data make up the whole population. Let us illustrate the computation of variance from raw data by an example.
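The four steps for ungrouped data translate directly into code. A minimal Python sketch of the sample variance (4.12), assuming a hypothetical sample for illustration; `statistics.variance` from the standard library, which also divides by n − 1, serves as a cross-check.

```python
import statistics

def sample_variance(values):
    """s^2 = sum((x_i - x_bar)^2) / (n - 1), formula (4.12), computed step by step."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two observations")
    x_bar = sum(values) / n                   # step 1: arithmetic mean
    deviations = [x - x_bar for x in values]  # step 2: deviations from the mean
    sum_sq = sum(d * d for d in deviations)   # step 3: square and sum
    return sum_sq / (n - 1)                   # step 4: divide by n - 1

sample = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample, for illustration only
print(sample_variance(sample))
print(statistics.variance(sample))  # standard-library check, same divisor n - 1
```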
COMPUTING VARIANCE FOR UNGROUPED DATA
Example: Compute the variance and standard deviation from the data on the weights of ten children in Example 4.1.
Solution: The data were as follows: 20, 13, 17, 17, 13, 18, 14, 17, 16 and 15. The mean of this set is 160/10 = 16. Following the steps outlined above, the accompanying table is constructed to illustrate the computation of the variance.
Table 4.3: Computation of variance and standard deviation
Child   x_i   x_i − x̄   (x_i − x̄)²   x_i²
1 20 4 16 400
2 13 –3 9 169
3 17 1 1 289
4 17 1 1 289
5 13 –3 9 169
6 18 2 4 324
7 14 –2 4 196
8 17 1 1 289
9 16 0 0 256
10 15 –1 1 225
Total 160 0 46 2606
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{46}{9} = 5.11 \text{ kg}^2$$
Taking the square root of the variance, we obtain the standard deviation:
$$s = \sqrt{5.11 \text{ kg}^2} = 2.26 \text{ kg}$$
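As a check on Table 4.3, the same figures can be reproduced with Python's standard `statistics` module (`mean`, `variance` and `stdev`, the latter two using divisor n − 1). This is a verification sketch, not part of the original method.

```python
import math
import statistics

# Weights (kg) of the ten children in Table 4.3.
weights = [20, 13, 17, 17, 13, 18, 14, 17, 16, 15]

x_bar = statistics.mean(weights)
sum_sq = sum((x - x_bar) ** 2 for x in weights)  # column total of (x_i - x_bar)^2
s2 = statistics.variance(weights)                # divisor n - 1, as in (4.12)
s = statistics.stdev(weights)                    # square root of s2
assert math.isclose(s, math.sqrt(s2))

print(f"mean = {x_bar:g}, sum of squares = {sum_sq:g}")  # mean = 16, sum of squares = 46
print(f"s^2 = {s2:.2f} kg^2, s = {s:.2f} kg")            # s^2 = 5.11 kg^2, s = 2.26 kg
```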
COMPUTING VARIANCE FOR FREQUENCY DISTRIBUTION
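For a frequency distribution, with $n = \sum f_i$, the formula for the variance presented above can be rewritten in the following compact form:
$$s^2 = \frac{1}{n-1}\left[\sum f_i x_i^2 - \frac{\left(\sum f_i x_i\right)^2}{n}\right] = \frac{n\sum f_i x_i^2 - \left(\sum f_i x_i\right)^2}{n(n-1)} \tag{4.15}$$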
In many textbooks, the divisor n is used in place of n − 1. The discrepancy in the value of s² resulting from the use of n instead of n − 1 is, however, not substantial when n is large.
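The two divisors always differ by the factor n/(n − 1), which approaches 1 as n grows. A small Python sketch makes this concrete, using `statistics.variance` (divisor n − 1) and `statistics.pvariance` (divisor n) on hypothetical data sets 1, 2, ..., n; the helper name `divisor_ratio` is our own.

```python
import statistics

def divisor_ratio(data):
    """Ratio of the variance with divisor n - 1 to the variance with divisor n."""
    return statistics.variance(data) / statistics.pvariance(data)

# Hypothetical data sets 1, 2, ..., n, used only to compare the two divisors.
for n in (5, 10, 100, 1000):
    data = list(range(1, n + 1))
    print(n, round(divisor_ratio(data), 4))  # the ratio equals n / (n - 1)
```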
Example 4.6: Compute the variance and standard deviation for the
following frequency distribution:
x: 3 5 7 8 9
f: 2 3 2 2 1
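Solution: Here $n = \sum f_i = 10$, $\sum f_i x_i = 60$ and $\sum f_i x_i^2 = 400$. Using (4.15),
$$s^2 = \frac{n\sum f_i x_i^2 - \left(\sum f_i x_i\right)^2}{n(n-1)} = \frac{10(400) - 60^2}{10(10 - 1)} = \frac{400}{90} = 4.44$$
and the standard deviation is $s = \sqrt{4.44} = 2.11$.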
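A minimal Python sketch of the compact formula (4.15) applied to Example 4.6, cross-checked against the definitional form Σ f_i (x_i − x̄)² / (n − 1). Both function names are our own, used only for illustration.

```python
import math

def grouped_variance_shortcut(x, f):
    """Formula (4.15): [n * sum(f x^2) - (sum(f x))^2] / [n (n - 1)], n = sum(f)."""
    n = sum(f)
    sum_fx = sum(fi * xi for xi, fi in zip(x, f))
    sum_fx2 = sum(fi * xi ** 2 for xi, fi in zip(x, f))
    return (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))

def grouped_variance_definition(x, f):
    """Same quantity from the definition: sum(f (x - x_bar)^2) / (n - 1)."""
    n = sum(f)
    x_bar = sum(fi * xi for xi, fi in zip(x, f)) / n
    return sum(fi * (xi - x_bar) ** 2 for xi, fi in zip(x, f)) / (n - 1)

x = [3, 5, 7, 8, 9]
f = [2, 3, 2, 2, 1]
s2 = grouped_variance_shortcut(x, f)
print(round(s2, 2), round(math.sqrt(s2), 2))  # 4.44 2.11
```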
Example 4.7: The lengths of 32 leaves were measured correct to the
nearest mm. Find the mean, variance and hence the standard deviation of
the lengths.
$$\bar{x} = \frac{\sum f_i x_i}{n} = \frac{867}{32} = 27.1 \text{ mm}$$
Using (4.15) the variance is
COVARIANCE
$$\operatorname{Cov}(x, y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n}$$
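The covariance formula Cov(x, y) above divides by n. A minimal Python sketch of that formula, with hypothetical paired observations; the function name is our own. Note that Python's `statistics.covariance` (Python 3.10+) returns the sample covariance with divisor n − 1, so its result differs from this one by the factor n/(n − 1).

```python
def covariance(x, y):
    """Cov(x, y) = sum((x_i - x_bar) * (y_i - y_bar)) / n, divisor n as in the formula."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n

# Hypothetical paired observations, for illustration only.
print(covariance([1, 2, 3], [2, 4, 6]))  # 4/3, about 1.333
```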
Coefficient of Variation
The coefficient of variation (CV) expresses the standard deviation as a percentage of the mean:
$$CV = \frac{s}{\bar{x}} \times 100\%$$
A CV of 33 percent, for example, implies that the standard deviation of the sample values is 33 percent of the mean of the same distribution. As an illustration of the use of the CV as a descriptive statistic, consider the following example:
RELATIVE MEASURES OF DISPERSION
Example 4.17: The average weekly wage in a factory increased from Tk.8000 to Tk.12000 as a result of negotiation between the employees and the employer. Alongside, the standard deviation increased from Tk.100 to Tk.150. Can we conclude that after negotiation the wage has become higher and more uniform?
Solution: As the standard deviation after the settlement is higher than before, one might be tempted to conclude that the disparity in wages has increased considerably. But the average wage also differs considerably before and after the settlement, so it is not safe to base our decision on the standard deviation alone. The coefficient of variation is the appropriate tool in this instance. Thus
$$CV_{\text{before}} = \frac{100}{8000} \times 100 = 1.25\%$$
$$CV_{\text{after}} = \frac{150}{12000} \times 100 = 1.25\%$$
The CV is the same before and after the settlement, so the variability, and hence the disparity in the distribution of wages, remained as before, although the average wage increased from Tk.8000 to Tk.12000. The wage has become higher, but not more uniform.
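A minimal Python sketch of the CV, using the figures that appear in the CV calculation of Example 4.17 (a standard deviation of Tk.100 on a mean of Tk.8000, and Tk.150 on a mean of Tk.12000). The function name is our own.

```python
def coefficient_of_variation(sd, mean):
    """CV = (standard deviation / mean) x 100, expressed in percent."""
    if mean == 0:
        raise ValueError("the CV is undefined when the mean is zero")
    return sd / mean * 100

before = coefficient_of_variation(100, 8000)   # before the settlement
after = coefficient_of_variation(150, 12000)   # after the settlement
print(f"{before:.2f}% {after:.2f}%")  # 1.25% 1.25%
```

Because the CV is a pure number, the same function also compares data sets recorded in different units, which is the motivation for relative measures given at the start of the section.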