CHAPTER FOUR
4. MEASURES OF VARIATION
4.1 Introduction
In addition to locating the center of the observed values of the variable in the data, another
important aspect of a descriptive study of the variable is numerically measuring the extent of
variation around the center. Two data sets of the same variable may exhibit similar positions of
center but may be remarkably different with respect to variability.
Just as there are several different measures of center, there are also several different measures of
variation. In this section, we will examine three of the most frequently used measures of variation:
the sample range, the sample interquartile range, and the sample standard deviation. Measures of
variation are used mainly for quantitative variables.
4.2 Objectives of Measuring Variation
The general objective of measuring dispersion is to obtain a single summary figure which adequately
shows whether the distribution is compact or spread out. Specifically, measures of dispersion are used:
To judge the reliability of measures of central tendency
To control variability itself.
To compare two or more groups of numbers in terms of their variability.
To make further statistical analysis.
4.3 Absolute and Relative Measures of Dispersion
The measures of dispersion which are expressed in terms of the original unit of a series are termed
as absolute measures. Such measures are not suitable for comparing the variability of two
distributions which are expressed in different units of measurement and different average size.
Relative measures of dispersions are a ratio or percentage of a measure of absolute dispersion to an
appropriate measure of central tendency and are thus pure numbers independent of the units of
measurement. For comparing the variability of two distributions (even if they are not measured in
the same unit), we compute the relative measure of dispersion instead of absolute measures of
dispersion.
Absolute measures remain useful for comparing variation in two or more distributions whose units of
measurement are the same. Various measures of dispersion are in use. The most commonly used measures of
dispersion are:
1. Range and Relative Range
2. Quartile Deviation and Coefficient of Quartile Deviation
3. Mean Deviation and Coefficient of Mean Deviation
4. Standard Deviation and Coefficient of Variation.
4.3.1 The Range and Relative Range
The Range (R): The range is the largest score minus the smallest score. It is a quick and dirty
measure of variability, although when a test is given back to students they very often wish to know
the range of scores. Because the range is greatly affected by extreme scores, it may give a distorted
picture of the scores. For a grouped frequency distribution, the range is the upper class boundary of the
last class interval minus the lower class boundary of the first class interval, i.e., R = UCB of the last class − LCB of the first class.
BY: Habtamu W.(MSc in Biostatistics) Page 1
Basic Statistics Lecture Note 2024/2025
The following two distributions have the same range, 13, yet appear to differ greatly in the amount
of variability.
Distribution 1: 32 35 36 36 37 38 40 42 42 43 43 45
Distribution 2: 32 32 33 33 33 34 34 34 34 34 35 45
For this reason, among others, the range is not the most important measure of variability.
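The point above can be checked numerically. The following Python sketch computes both ranges and, for contrast, the sample standard deviations (introduced later in this chapter):

```python
import statistics

# The two distributions from the text: same range, different variability
dist1 = [32, 35, 36, 36, 37, 38, 40, 42, 42, 43, 43, 45]
dist2 = [32, 32, 33, 33, 33, 34, 34, 34, 34, 34, 35, 45]

range1 = max(dist1) - min(dist1)  # 13
range2 = max(dist2) - min(dist2)  # 13

# The sample standard deviations do tell the two sets apart
sd1 = statistics.stdev(dist1)
sd2 = statistics.stdev(dist2)
```

Both ranges equal 13, yet the standard deviation of distribution 1 is noticeably larger, confirming that the range hides the difference in spread.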
Range for grouped data:
If data are given in the shape of a continuous frequency distribution, the range is computed as:
R = UCLk − LCL1, where UCLk is the upper class limit of the last class and LCL1 is the lower class limit of the first class.
This is sometimes expressed as:
R = Xk − X1, where Xk is the class mark of the last class and X1 is the class mark of the first class.
Merits and Demerits of range
Merits:
It is rigidly defined.
It is easy to calculate and simple to understand.
Demerits:
It is not based on all observations.
It is highly affected by extreme observations.
It is affected by fluctuation in sampling.
It cannot be computed in the case of open end distribution.
It is very sensitive to the size of the sample.
Relative Range (RR): It is also sometimes called the coefficient of range and is given by:
RR = (largest value − smallest value)/(largest value + smallest value) = (L − S)/(L + S)
Example:
1. Find the relative range of the above two distribution. (Exercise!)
2. If the range and relative range of a series are 4 and 0.25 respectively. Then what is the value of:
a. Smallest observation
b. Largest observation
Solution (2):
R = 4 ⟹ L − S = 4 .......... (1)
RR = 0.25 ⟹ (L − S)/(L + S) = 0.25 ⟹ L + S = 16 .......... (2)
Solving (1) and (2) simultaneously gives L = 10 and S = 6.
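The same pair of equations can be solved programmatically; a minimal sketch:

```python
# Given the range R = L - S and relative range RR = (L - S)/(L + S),
# recover the largest (L) and smallest (S) observations.
R, RR = 4, 0.25

total = R / RR          # L + S = R / RR = 16
L = (total + R) / 2     # L = 10
S = (total - R) / 2     # S = 6
```

This mirrors the algebra above: adding and subtracting equations (1) and (2) isolates L and S.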
4.3.2 The Quartile Deviation and Coefficient of Quartile Deviation
The Quartile Deviation (Q.D): also called the semi-interquartile range, it is half the distance between the first and third quartiles:
Q.D = (Q3 − Q1)/2
Coefficient of Quartile Deviation (C.Q.D):
C.Q.D = Q.D/((Q3 + Q1)/2) = (Q3 − Q1)/(Q3 + Q1)
Remark: Q.D or C.Q.D includes only the middle 50% of the observations.
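As an illustration, the quartile deviation can be computed with Python's statistics module. Note that quartile conventions differ between textbooks and software, so Q1 and Q3 may vary slightly by method; the data below are hypothetical:

```python
import statistics

data = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70]  # hypothetical sample

# quantiles() with n=4 returns [Q1, Q2, Q3];
# method="inclusive" matches the common textbook convention
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

qd = (q3 - q1) / 2             # quartile deviation (semi-interquartile range)
cqd = (q3 - q1) / (q3 + q1)    # coefficient of quartile deviation
```

Here Q1 = 32.5 and Q3 = 57.5, giving Q.D = 12.5, which spans only the middle 50% of the sample.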
4.3.3 The Mean Deviation and Coefficient of Mean Deviation
The Mean Deviation (M.D): The mean deviation of a set of items is defined as the arithmetic mean
of the values of the absolute deviations from a given average. Depending up on the type of averages
used we have different mean deviations.
a) Mean Deviation about the mean
MD = Σ|Xi − X̄|/n.
For the case of a frequency distribution data where the values X1, X2, X3, …, Xm occur f1, f2, f3, …,
fm times respectively, then mean deviation is obtained by:
MD = Σ fi|Xi − X̄|/Σ fi.
For grouped data, that is, if the data are given in the form of a frequency distribution of K classes in
which mi and fi are the class mark and frequency of the ith class respectively, then the mean deviation is obtained by:
MD = Σ fi|mi − X̄|/Σ fi.
Example: Calculate the mean deviation about the mean for the following frequency distribution.
Solution: First find the mean: X̄ = Σ fiXi/Σ fi = (10*8 + 8*9 + … + 6*3)/(8 + 9 + … + 3) = 329/39 ≈ 8.4, then
Xi            10     8     9     7     6
fi             8     9    13     6     3
|Xi − X̄|    1.6   0.4   0.6   1.4   2.4
fi|Xi − X̄| 12.8   3.6   7.8   8.4   7.2
MD = Σ fi|Xi − X̄|/Σ fi = 39.8/39 ≈ 1.02.
The Coefficient of Mean Deviation (C.M.D) is the mean deviation divided by the average about which it is taken: C.M.D = MD/X̄.
Exercise: find the coefficient of mean deviation about the mean for the above example.
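The computation in the table can be reproduced without the intermediate rounding of the mean to 8.4; a sketch:

```python
xs = [10, 8, 9, 7, 6]
fs = [8, 9, 13, 6, 3]

n = sum(fs)                                           # 39 observations
mean = sum(f * x for x, f in zip(xs, fs)) / n         # 329/39, about 8.44
md = sum(f * abs(x - mean) for x, f in zip(xs, fs)) / n
cmd = md / mean                                       # coefficient of mean deviation
```

Using the unrounded mean gives MD ≈ 1.02, agreeing with the table's value to two decimals.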
4.3.4 The Variance, Standard Deviation and the Coefficient of Variation
The Variance: the variance is the "average squared deviation from the mean"; it measures the average of the
squared deviations of each observation from the mean.
Suppose we have population of N observations, say X1, X2, X3, …, XN, then we define the
population variance as:
σ² = Σ(Xi − μ)²/N, where μ is the population mean.
But most of the time we have sample of n observations, say X1, X2, X3, …, Xn from the population
of N, then we define the sample variance as:
S² = Σ(Xi − X̄)²/(n − 1).
This measure of variation is universally used to show the scatter of the individual measurements
around the mean of all the measurements in a given distribution. But the disadvantage is that the
units of variance are the square of the units of the original observations. The easiest way around this
difficulty is to use the square root of the variance as the measure of variability, called the standard
deviation.
The population and the sample standard deviations denoted by σ and S respectively are defined as:
σ = √(Σ(Xi − μ)²/N) and S = √S² = √(Σ(Xi − X̄)²/(n − 1)).
For the case of frequency distribution data the population and sample variance are given as:
σ² = Σ fi(Xi − μ)²/N and S² = Σ fi(Xi − X̄)²/(n − 1), where N = Σ fi for the population and n = Σ fi for the sample,
and the square roots of these will give the corresponding standard deviations.
Variance and Standard Deviation for Grouped Data
To obtain the variance and standard deviation of data presented in a grouped frequency distribution,
we make the same assumptions as were made in the calculation of the mean for grouped data: each value
falling into a class is represented by the class mark of that class. The calculation is the same as for
data given in a frequency distribution, except that Xi is substituted by the midpoint (class mark) of
each class and m by k.
The following steps are used to calculate the sample variance:
1. Find the arithmetic mean.
2. Find the difference between each observation and the mean.
3. Square these differences.
4. Sum the squared differences.
5. Since the data is a sample, divide the number (from step 4 above) by the number of
observations minus one, (i.e., n-1), where n is the number of observations in the data set.
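The five steps translate directly into code; a minimal sketch:

```python
def sample_variance(data):
    """Sample variance following the five steps in the text."""
    n = len(data)
    mean = sum(data) / n                      # step 1: arithmetic mean
    deviations = [x - mean for x in data]     # step 2: differences from the mean
    squares = [d * d for d in deviations]     # step 3: square the differences
    total = sum(squares)                      # step 4: sum the squared differences
    return total / (n - 1)                    # step 5: divide by n - 1

def sample_sd(data):
    return sample_variance(data) ** 0.5
```

For instance, sample_variance([5, 10, 12, 17]) returns 74/3 ≈ 24.67.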
Example: Areas of spray able surfaces with DDT from a sample of 15 houses are as follows (m2):
101, 105, 110, 114, 115, 124, 125, 125, 130, 133, 135, 136, 137, 140, 145. Find the variance and
standard deviation of the above distribution.
Solution: The mean of the sample is X̄ = 1875/15 = 125 m². Then S² = Σ(Xi − 125)²/(n − 1) = 2502/14 ≈ 178.71 and S = √178.71 ≈ 13.37 m².
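The DDT example can be verified numerically:

```python
# Areas of sprayable surfaces (m^2) from the sample of 15 houses
areas = [101, 105, 110, 114, 115, 124, 125, 125, 130, 133,
         135, 136, 137, 140, 145]

n = len(areas)
mean = sum(areas) / n                                     # 125.0
variance = sum((x - mean) ** 2 for x in areas) / (n - 1)  # about 178.71
sd = variance ** 0.5                                      # about 13.37
```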
Exercise: Calculate the variance and standard deviation of the following grouped frequency distribution.
Class Frequency
40-44 7
45-49 10
50-54 22
55-59 15
60-64 12
65-69 6
70-74 3
Solutions: a) For the sample 5, 10, 12, 17, the mean is X̄ = (5 + 10 + 12 + 17)/4 = 11
Xi           5   10   12   17   Total
(Xi − X̄)²  36    1    1   36      74
so S² = 74/(4 − 1) ≈ 24.67 and S ≈ 4.97.
a) 38 and 62 are at equal distance from the mean, 50, and this distance is 12, so
kS = 12 ⟹ k = 12/6 = 2 (using S = 6)
(1 − 1/k²) × 100% = (1 − 1/4) × 100% = 75%
Applying Chebyshev's theorem, at least (1 − 1/k²) = 75% of the numbers lie between
38 and 62.
b) Similarly done.
c) It is just the complement of a), i.e., at most (1/k²) × 100% = (1/2²) × 100% = 25% of the numbers lie less than 38 or
more than 62.
d) Similarly done.
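The Chebyshev-type calculation in part a) can be sketched as code; the theorem guarantees the bound for any data set with this mean and standard deviation:

```python
mean, s = 50, 6
lower, upper = 38, 62

k = (upper - mean) / s             # number of standard deviations: 2.0
at_least_inside = 1 - 1 / k**2     # at least 75% of values lie in (38, 62)
at_most_outside = 1 / k**2         # at most 25% lie below 38 or above 62
```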
3. Consider a sample X1, ….., Xn, which will be referred to as the original sample. To create a
translated sample X1+C, add a constant C to each data point. Let Yi = Xi+C, i = 1, …., n.
Suppose we want to compute the standard deviation of the translated sample, we can show that
the following relationship holds: If Yi = Xi + C, i = 1, …., n, then Sy = Sx. Therefore, the
standard deviation of Y will be the same as the standard deviation of X.
4. What happens to the standard deviation if the units or scales being worked with are changed? A
re-scaled sample can be created: if Yi = CXi, i = 1, …, n, then Sy = |C|Sx and S²y = C²S²x.
Therefore, to find the variance and standard deviation of the Y's, compute the variance and
standard deviation of the X's and multiply them by C² and |C|, respectively.
Example: If we have a sample of temperature in °C with a standard deviation of 1.8, then
what is the standard deviation of a sample temperature in °F?
Solution: Let Yi denote the °F temperature that corresponds to a °C temperature of Xi. The required
transformation to convert the data to °F is Yi = (9/5)Xi + 32, i = 1, 2, 3, …, n. Since adding the
constant 32 does not affect the spread, the standard deviation in °F is Sy = (9/5)(1.8) = 3.24 °F.
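Both properties (shift invariance and scaling) can be demonstrated on a small hypothetical sample of °C readings:

```python
import statistics

celsius = [18.0, 19.5, 20.2, 21.0, 22.3]          # hypothetical readings
fahrenheit = [9 / 5 * c + 32 for c in celsius]    # Yi = (9/5)Xi + 32

sx = statistics.stdev(celsius)
sy = statistics.stdev(fahrenheit)
# The shift by 32 leaves the spread unchanged;
# only the 9/5 scale factor multiplies the standard deviation.
```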
5. On the other hand, where several standard deviations for a variable are available and we need
a combined standard deviation, the pooled standard deviation (Sp) of the entire group
consisting of all the samples may be computed as:
Sp = √[((n1 − 1)S1² + (n2 − 1)S2² + …)/(n1 + n2 + … − k)], where k is the number of samples.
Example: The standard deviation of systolic blood pressure was found to be 10.6 and 15.2 mm Hg,
respectively, for two groups of 12 and 15 men. What is the standard deviation of systolic pressure of
all the 27 men?
Solution: Given: Group 1: S1 = 10.6 and n1 = 12; Group 2: S2 = 15.2 and n2 = 15. Then Sp = √[(11 × 10.6² + 14 × 15.2²)/(12 + 15 − 2)] = √(4470.52/25) ≈ 13.37 mm Hg.
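A small helper makes the pooled-standard-deviation formula concrete. The name `pooled_sd` is illustrative; the sketch weights each sample variance by its degrees of freedom (n − 1):

```python
import math

def pooled_sd(sds, ns):
    """Pooled standard deviation of several samples,
    weighting each variance by its degrees of freedom (n - 1)."""
    numerator = sum((n - 1) * s**2 for s, n in zip(sds, ns))
    denominator = sum(n - 1 for n in ns)
    return math.sqrt(numerator / denominator)

# The blood-pressure example: two groups of 12 and 15 men
sp = pooled_sd([10.6, 15.2], [12, 15])   # about 13.37 mm Hg
```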
Coefficient of Variation (CV): The coefficient of variation (CV) is defined by CV = (S/X̄) × 100%. The
coefficient of variation is most useful in comparing the variability of several different samples, each
with different means. This is because a higher variability is usually expected when the mean
increases, and the CV is a measure that accounts for this variability.
The coefficient of variation is also useful for comparing the reproducibility of different variables.
CV is a relative measure free from unit of measurement.
Examples: An analysis of the monthly wages paid (in Birr) to workers in two firms A and B
belonging to the same industry gives the following results.
Value Firm A Firm B
Mean wage 52.5 47.5
Median wage 50.5 45.5
Variance 100 121
In which firm, A or B, is there greater variability in individual wages?
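A quick way to compare the two firms is to compute both coefficients of variation from the table:

```python
import math

mean_a, var_a = 52.5, 100   # Firm A: mean wage and variance
mean_b, var_b = 47.5, 121   # Firm B: mean wage and variance

cv_a = math.sqrt(var_a) / mean_a * 100   # about 19.0%
cv_b = math.sqrt(var_b) / mean_b * 100   # about 23.2%
# Firm B shows the greater relative variability in wages
```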
The Z-score is the number of standard deviations that a given value X lies below or above the mean,
defined as Z = (X − X̄)/S (for sample data sets) and Z = (X − μ)/σ (for population data sets).
Values above the mean have positive Z-scores and values below the mean have negative Z-scores.
Because the numerical value of the Z-score reflects how far a value lies from the mean in
standard-deviation units, the Z-score is also referred to as a measure of relative standing.
Scores are generally meaningless by themselves unless they are
compared to the distribution of scores from some reference group. In addition to comparing data sets,
the Z-score is useful for transforming a given data set into a new distribution whose mean is zero and
whose variance is one, the standard normal distribution (we will see it in the chapters on hypothesis
testing).
Note: A Z-score less than −2 or greater than 2 is considered an unusual value, while a Z-score between
−2 and 2 is considered ordinary.
Examples 1. Two sections were given introduction to statistics examinations. The following
information was given.
Value Section 1 Section 2
Mean 78 90
Standard deviation 6 5
Student A from section 1 scored 90 and student B from section 2 scored 95. Relatively speaking
who performed better?
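Relative standing is settled by converting both scores to Z-scores:

```python
# Student A, section 1: mean 78, standard deviation 6, score 90
z_a = (90 - 78) / 6   # 2.0 standard deviations above the mean

# Student B, section 2: mean 90, standard deviation 5, score 95
z_b = (95 - 90) / 5   # 1.0 standard deviation above the mean

# Student A stands further above their section's mean, so A performed
# relatively better even though B's raw score is higher.
```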
Moments are statistical measures used to describe the characteristics of a distribution, and we can
have moments about any number A and/or about the mean (called central moments).
The rth moment of the distribution about the mean is:
Mr = Σ(Xi − X̄)^r / n for an ungrouped data set, and
Mr = Σ fi(Xi − X̄)^r / Σ fi for a grouped data set.
The rth moment of the distribution about A is:
M'r = Σ(Xi − A)^r / n for an ungrouped data set, and
M'r = Σ fi(Xi − A)^r / Σ fi for a grouped data set.
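A minimal sketch of the moment formulas for ungrouped data; the function name `moment_about` is illustrative:

```python
def moment_about(data, r, a=None):
    """rth moment of `data` about the point `a`; when `a` is omitted,
    the moment is taken about the sample mean (a central moment)."""
    if a is None:
        a = sum(data) / len(data)
    return sum((x - a) ** r for x in data) / len(data)

data = [2, 4, 6, 8]
m1 = moment_about(data, 1)   # first central moment is always 0
m2 = moment_about(data, 2)   # second central moment (population variance)
```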
Skewness
If the distribution of the data is not symmetrical, it is called asymmetrical or skewed. Skewness
characterizes the degree of asymmetry of a distribution around its mean.
The direction of the skewness depends upon the location of the extreme values. If the extreme
values are the larger observations, the mean is the measure of location most distorted toward the
upward direction. Since the mean then exceeds the median and the mode, such a distribution is
said to be positively or right-skewed; the tail of its distribution extends to the right.
On the other hand, if the extreme values are the smaller observations, the mean is the measure
of location most greatly reduced. Since the mean is then exceeded by the median and the mode, such
a distribution is said to be negatively or left-skewed; the tail of its distribution extends to the left.