
STATISTICS

MEASURES OF DISPERSION

WEEK-5
MEANING OF DISPERSION

The essential purpose of statistical averages discussed in the preceding chapter is to summarize a large mass of data. These
averages serve to locate the ‘center’ of a distribution but they do
not reveal how the items or the observations are spread out or
scattered around their mean. This latter characteristic of a
distribution is variously known as the dispersion, ‘scatter’, or
‘variation’. If the dispersion is small, it indicates high uniformity
of the observations in the distribution. Absence of dispersion in
the data indicates perfect uniformity. This situation arises when
all observations in the distribution are identical.
MEASURES OF DISPERSION

The measures of dispersion can broadly be classified into two categories: absolute measures and relative measures. The first category includes:
i. The range
ii. The quartile deviation
iii. The mean (or average) deviation
iv. The variance
v. The standard deviation

The measures are absolute in the sense that they are expressed in the
same statistical unit in which the original data are presented, such as
dollar, taka, meter, kilogram, etc.
MEASURES OF DISPERSION

When two or more data sets are expressed in different units, however, the absolute measures are not comparable. In such cases it is necessary to consider other measures that express the dispersion in a relative form. These measures are referred to as relative measures. They are usually expressed as coefficients and are pure numbers, independent of the unit of measurement. The relative measures are:
(i) Coefficient of range
(ii) Coefficient of quartile deviation
(iii) Coefficient of mean deviation
(iv) Coefficient of variation
ABSOLUTE MEASURES OF DISPERSION

The Range

The simplest and the crudest measure of dispersion is the range (R). This is defined as the difference between the largest (L) and the smallest (S) values in the distribution.

R=L–S

For a set of observations 90, 110, 20, 51, 210 and 190, say, the smallest value is 20 and the largest value is 210, so that R = 210 − 20 = 190.
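As a quick illustration, the range can be computed with a few lines of Python (a minimal sketch using only the six observations listed above):

```python
# Range R = L - S for the observations quoted above.
data = [90, 110, 20, 51, 210, 190]

L = max(data)   # largest value, 210
S = min(data)   # smallest value, 20
R = L - S       # 210 - 20 = 190
print(R)
```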
MEAN DEVIATION

For data clustered near the central value, the differences of the individual observations from their typical value will tend to be small. Accordingly, to obtain a measure of the total variation in the data, it is appropriate to find an average of these differences. The resulting average is called the mean deviation. It is also known as the average deviation.

In practice, the mean deviation is computed as the arithmetic mean of the absolute values of the deviations from a typical value of the distribution. The typical value may be the arithmetic mean, the median, the mode or any other arbitrary value. The median is sometimes preferred as the typical value, because the sum of the absolute values of the deviations from the median is smaller than that from any other value. In practice, however, the arithmetic mean is generally used.


MEAN DEVIATION

If x1, x2, …, xn form a sample of observations, the formula for computing the average or mean deviation about any arbitrary value 'a' is

Md(a) = Σ|xi − a| / n … (4.4)

where | | means that the signs of the deviations, whether positive or negative, are ignored. For a grouped frequency distribution with Σfi = n, the mean deviation about the arbitrary value 'a' is

Md(a) = Σ fi |xi − a| / n … (4.5)

If we replace 'a' by x̄, the resulting mean deviation will be called the mean deviation about the mean:

Md(x̄) = Σ|xi − x̄| / n … (4.5a)

For a grouped frequency distribution, with n = Σfi,

Md(x̄) = Σ fi |xi − x̄| / n … (4.5b)

When the deviations are taken from the median, we substitute m̃ for 'a' in (4.5), and the resulting formula for computing the mean deviation about the median is

Md(m̃) = Σ fi |xi − m̃| / n … (4.5c)
The following examples demonstrate how the mean deviation is computed.
MEAN DEVIATION

Steps to compute mean deviation:

To compute the mean deviation about mean for the given data, the following
steps are involved:

a) Compute the arithmetic mean. This is 127.6 in the present instance.

b) Obtain the absolute deviation of each value in column (2) of Table 4.1
from the computed mean. These deviations are shown in column (3).

c) Obtain the sum of column (3) and divide the resulting sum by the total
number of observations (n=10).

d) The result obtained in (c) above is the mean deviation about the mean.
MEAN DEVIATION

Example: Ten persons of varying ages were weighed and the following weights
in kg were recorded:

110, 125, 125, 147, 117, 125, 136, 157, 124, 110.

Compute mean deviation about the mean, median and an arbitrary value 120.

Solution: Repeat the procedure outlined above to compute the mean deviations about the median (which is 125 for this data set) and the arbitrary value 120, i.e. a = 120. The corresponding deviations are shown in the last two columns of Table 4.1.
MEAN DEVIATION
Table 4.1: Computation of mean deviations
Serial no.   Weight (xi)   |xi − x̄|   |xi − m̃|   |xi − a|
(1)          (2)           (3)        (4)        (5)
1            110           17.6       15.0       10.0
2            125            2.6        0.0        5.0
3            125            2.6        0.0        5.0
4            147           19.4       22.0       27.0
5            117           10.6        8.0        3.0
6            125            2.6        0.0        5.0
7            136            8.4       11.0       16.0
8            157           29.4       32.0       37.0
9            124            3.6        1.0        4.0
10           110           17.6       15.0       10.0
Total        1276          114.4      104.0      122.0

The mean deviations about the mean, the median and the arbitrary value 120 are, respectively,

Md(x̄) = Σ|xi − x̄| / n = Σ|xi − 127.6| / 10 = 114.4 / 10 = 11.44

Md(m̃) = Σ|xi − m̃| / n = Σ|xi − 125| / 10 = 104.0 / 10 = 10.40

and

Md(a) = Σ|xi − a| / n = Σ|xi − 120| / 10 = 122.0 / 10 = 12.20
Note that among the three mean deviations, mean deviation about the median
is the smallest.
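The same computations can be reproduced with a short Python sketch (illustrative only; the helper name mean_deviation is our own, and the data are the ten weights of Table 4.1):

```python
def mean_deviation(xs, a):
    """Mean deviation of the observations xs about the reference value a."""
    return sum(abs(x - a) for x in xs) / len(xs)

weights = [110, 125, 125, 147, 117, 125, 136, 157, 124, 110]

mean = sum(weights) / len(weights)       # 127.6
median = sum(sorted(weights)[4:6]) / 2   # 125.0, average of the two middle values

print(mean_deviation(weights, mean))     # 11.44
print(mean_deviation(weights, median))   # 10.40
print(mean_deviation(weights, 120))      # 12.20
```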
VARIANCE AND STANDARD DEVIATION

Instead of ignoring the signs of deviations from the mean as in the computation
of an average deviation, they may each be squared and then the results are
added. The sum of squares can be regarded as a measure of the total
dispersion of the distribution. By dividing the sum by n (the total number of
observations), we obtain the average of the squares of deviations, a measure,
called variance, of the distribution. If the observations are all from a population,
the resulting variance is referred to as the population variance. As a formula,
the variance of the population observations x1, x2, …, xN, commonly designated σ², is

σ² = Σ(xi − μ)² / N … (4.8)
VARIANCE AND STANDARD DEVIATION

where μ is the mean of all the observations in the population and N is the total number of observations in the population. Because of the operation of squaring, the variance is expressed in squared units (e.g. km², taka², etc.) rather than in the original unit (e.g. km, taka, etc.). It is therefore necessary to extract the positive square root to restore the original unit. The measure of dispersion thus obtained is called the population standard deviation and is usually denoted by σ. Thus

σ = √[ Σ(xi − μ)² / N ] … (4.9)

Thus, by definition, the standard deviation is the positive square root of the
mean-square deviations of the observations from their arithmetic mean.
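As a small sketch (assuming plain Python and our own helper names), the population variance and standard deviation of formulas (4.8) and (4.9) could be coded as:

```python
import math

def population_variance(xs):
    """Sigma squared: mean of squared deviations about the population mean (divisor N)."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

def population_sd(xs):
    """Positive square root of the population variance."""
    return math.sqrt(population_variance(xs))

# Hypothetical population of five values, for illustration only.
print(population_variance([2, 4, 4, 4, 6]), population_sd([2, 4, 4, 4, 6]))
```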
VARIANCE AND STANDARD DEVIATION

Fortunately, it can be shown that if the sum of the squared deviations in the
sample is divided by n–1, and not by n, then the resulting sample variance will
provide an unbiased estimate of the population variance. For this reason, the
sample variance is defined as follows:

s² = Σ(xi − x̄)² / (n − 1) … (4.12)

Such an estimate will show no systematic tendency to be either greater than or less than the population variance σ². The division by n − 1 instead of n makes the average squared deviation consistent with many similar measures used in statistics.
COMPUTING VARIANCE FOR UNGROUPED DATA

The variance and hence the standard deviation are simple to compute
for ungrouped data. Suppose a data set consists of n values x1, x2,

…, xn. As a first step, compute the arithmetic mean x̄ for this data set. Then subtract this mean from each of the values and obtain the set of deviations (x1 − x̄), (x2 − x̄), …, (xn − x̄). Then square these deviations, sum them and divide the resulting sum by n − 1, as in (4.12). This gives the sample variance of the given values x1, x2, …, xn. Let us illustrate the computation of variance from raw data with an example.
COMPUTING VARIANCE FOR UNGROUPED DATA

Example: Compute the variance and standard deviation from the data on weight of ten
children in Example 4.1.

Solution: The data were as follows: 20, 13, 17, 17, 13, 18, 14, 17, 16, and 15. The mean
of this set is 16. Following the steps outlined above, the accompanying table is
constructed to illustrate the computation of variance.
Table 4.3: Computation of variance and standard deviation
Child    xi    xi − x̄   (xi − x̄)²   xi²
1        20       4        16        400
2        13      −3         9        169
3        17       1         1        289
4        17       1         1        289
5        13      −3         9        169
6        18       2         4        324
7        14      −2         4        196
8        17       1         1        289
9        16       0         0        256
10       15      −1         1        225
Total   160       0        46       2606

The variance is thus

s² = Σ(xi − x̄)² / (n − 1) = 46 / 9 = 5.11 kg²

Taking the square root of the variance, we obtain the standard deviation:

s = √(5.11 kg²) = 2.26 kg
COMPUTING VARIANCE FOR FREQUENCY DISTRIBUTION

The formula for the computation of the variance and standard deviation of a frequency distribution should be modified to take into account the values of x and their corresponding frequencies. Thus if the variable values x1, x2, …, xk occur with frequencies f1, f2, …, fk respectively, then


s² = Σ fi (xi − x̄)² / (n − 1) … (4.14)

For grouped data, xi is the mid-value of the i-th class.

The formula for the computation of the variance presented above can be rewritten in a compact form as follows:

s² = [ Σ fi xi² − (Σ fi xi)² / n ] / (n − 1) = [ n Σ fi xi² − (Σ fi xi)² ] / [ n(n − 1) ] … (4.15)

In many textbooks, the divisor n is used in place of n − 1. The discrepancy in the value of s² resulting from the use of n instead of n − 1 is, however, not substantial when n is large.
COMPUTING VARIANCE FOR FREQUENCY DISTRIBUTION

Example 4.6: Compute the variance and standard deviation for the
following frequency distribution:

x: 3 5 7 8 9
f: 2 3 2 2 1

Solution: The following table illustrates the computation of variance from the above distribution.
xi      fi     fi·xi    fi·xi²
3        2        6        18
5        3       15        75
7        2       14        98
8        2       16       128
9        1        9        81
Total   10       60       400

Using (4.15), the variance is

s² = [ n Σ fi xi² − (Σ fi xi)² ] / [ n(n − 1) ] = [ 10(400) − 60² ] / [ 10(10 − 1) ] = 400 / 90 = 4.44

so that the standard deviation is s = √4.44 ≈ 2.11.
COMPUTING VARIANCE FOR FREQUENCY DISTRIBUTION
Example 4.7: The lengths of 32 leaves were measured correct to the
nearest mm. Find the mean, variance and hence the standard deviation of
the lengths.

Length:    20–22   23–25   26–28   29–31   32–34
Frequency:     3       6      12       9       2

Solution: In order to compute the required measures, we construct the following table:

Length    xi     xi²    fi    fi·xi    fi·xi²
20–22     21     441     3      63      1323
23–25     24     576     6     144      3456
26–28     27     729    12     324      8748
29–31     30     900     9     270      8100
32–34     33    1089     2      66      2178
Total      –       –    32     867     23805

The mean length is

x̄ = Σ fi xi / n = 867 / 32 = 27.1

Using (4.15), the variance is

s² = [ n Σ fi xi² − (Σ fi xi)² ] / [ n(n − 1) ] = [ 32(23805) − 867² ] / [ 32 × 31 ] = 10.15

so that the standard deviation is s = √10.15 = 3.19


Hence the mean, variance and standard deviation are 27.1, 10.15 and 3.19
respectively.
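For grouped data with class intervals, the only extra step is replacing each class by its mid-value. A minimal Python sketch (class limits and frequencies taken from Example 4.7, variable names our own):

```python
import math

classes = [(20, 22), (23, 25), (26, 28), (29, 31), (32, 34)]
freq = [3, 6, 12, 9, 2]

mids = [(lo + hi) / 2 for lo, hi in classes]            # 21, 24, 27, 30, 33
n = sum(freq)                                           # 32
sum_fx = sum(f * x for f, x in zip(freq, mids))         # 867
sum_fx2 = sum(f * x ** 2 for f, x in zip(freq, mids))   # 23805

mean = sum_fx / n                                       # about 27.1
s2 = (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))        # about 10.15
s = math.sqrt(s2)                                       # about 3.19
print(round(mean, 1), round(s2, 2), round(s, 2))
```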
PROPERTIES OF VARIANCE

4.1: The variance is independent of origin but dependent on the scale of measurement.
4.2: If u = x + y, then su² = sx² + sy² + 2 Cov(x, y), where su² is the variance of u and Cov(x, y) is the covariance between x and y as defined below (a numerical check of this property is sketched after this list):

Cov(x, y) = Σ(xi − x̄)(yi − ȳ) / n

4.3: The variance is minimized when the deviations are computed from the arithmetic mean rather than from any other value.
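Property 4.2 can be verified numerically. Here is a minimal sketch with two hypothetical variables, using the divisor n throughout so that the variance and covariance definitions match the formula above:

```python
x = [2, 4, 6, 8]
y = [1, 3, 2, 6]
u = [xi + yi for xi, yi in zip(x, y)]   # u = x + y

def var_n(v):
    """Mean of squared deviations about the mean (divisor n)."""
    m = sum(v) / len(v)
    return sum((vi - m) ** 2 for vi in v) / len(v)

def cov_n(a, b):
    """Covariance with divisor n, as defined in property 4.2."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

print(var_n(u))                                 # 15.5
print(var_n(x) + var_n(y) + 2 * cov_n(x, y))    # 15.5, the same value
```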
USES OF STANDARD DEVIATION

A thorough understanding of the use of the standard deviation is difficult for us at this stage, unless we acquire some knowledge of certain theoretical distributions in statistics. Nevertheless, we shall try to introduce the idea of its use through a few simple illustrative examples. The standard deviation of a population (σ) is a
measure of the dispersion in the population, while the standard deviation of
sample observations (s) is a measure of the dispersion in the distribution
constructed from the sample. In both the cases, the standard deviation (like the
mean deviation) represents the average variability in a distribution. The greater
this variability around the mean of a distribution, the larger the standard
deviation. Thus s = 4.5, for example, indicates greater variability than s = 2.5.
RELATIVE MEASURES OF DISPERSION

Coefficient of Variation

The coefficient of variation (CV) is one of the important measures of dispersion; it measures the variability in the data relative to the mean. When the mean values of two or more data sets differ considerably, we do not get an accurate picture of their relative variability just by comparing the standard deviations. The coefficient of variation overcomes this difficulty: it represents the spread of a distribution relative to the mean of the same distribution.
RELATIVE MEASURES OF DISPERSION

The coefficient of variation is computed as the ratio of the standard deviation of a distribution to the mean of the same distribution. Expressed in percentage form, the coefficient is

CV = (sx / x̄) × 100 … (4.20)

Clearly, if the mean of a data set is zero, CV cannot be computed. The measure is a pure number and independent of units.
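As a rough sketch of how the CV is used in practice (the two data sets below are hypothetical and chosen only so that their means differ considerably):

```python
import statistics

daily_wages = [95, 105, 100, 110, 90]            # hypothetical set with a small mean
monthly_wages = [2900, 3150, 3000, 3300, 2650]   # hypothetical set with a large mean

def cv(xs):
    """Coefficient of variation in percent: (sample sd / mean) * 100."""
    return statistics.stdev(xs) / statistics.mean(xs) * 100

# The standard deviations are not directly comparable, but the CVs are.
print(round(cv(daily_wages), 1), round(cv(monthly_wages), 1))
```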

A CV of 33 percent, for example, implies that the standard deviation of the sample values is 33 percent of the mean of the same distribution. As an illustration of the use of the CV as a descriptive statistic, let us look at the following example:
RELATIVE MEASURES OF DISPERSION

Example 4.17: The average weekly wage in a factory increased from Tk.8000 to Tk.12000 as a result of negotiation between the employees and the employer. Alongside, the standard deviation of wages increased from Tk.100 to Tk.150. Can we conclude that after the negotiation the wages, although higher, have become less uniform?

Solution: As the standard deviation after the settlement is higher than before, one might be tempted to conclude that the disparity in wages has increased. But the average wage also differs considerably before and after the settlement, so it is not safe to base our conclusion on the standard deviation alone. The coefficient of variation is the appropriate tool in this instance. Thus
RELATIVE MEASURES OF DISPERSION

CV (before settlement) = (100 / 8000) × 100 = 1.25%

CV (after settlement) = (150 / 12000) × 100 = 1.25%

The relative variability, and hence the disparity, in the distribution of wages remained the same, as shown by the equal CVs, although the average wage increased from Tk.8000 to Tk.12000.
THANK YOU
