Measures of Dispersion
DR. VANMALA BUCHKE
PROFESSOR
HUMAN DEVELOPMENT (DEPARTMENT OF
HOME SCIENCE)
S.N.G.G.P.G.(AUTONOMOUS) COLLEGE ,
BHOPAL M.P.
Introduction
• So far we have looked at ways of summarising data by showing some sort of average (central tendency).
• But it is often useful to show how much the individual figures differ from that average.
• A measure of this spread is called dispersion.
Measures of Variability / Dispersion
Values of central tendency represent the central position of a series. These representative numbers merely give us an idea of the general achievement of the group as a whole. They do not show how the individual values are spread or what the composition of the series is. For example:
Set A : 1, 4, 4, 4, 7
Set B : 4, 4, 4, 4, 4
These sets have much in common. They have the same mean, median, mode and mid-range; in other words, they are similar in central tendency. However, the first set of observations is more variable than the second.
We notice that the values in the first set vary from a low of 1 to a high of 7, whereas the second set has 4 as both its lowest and highest value.
The first set is composed of values that differ widely, whereas the second set shows no differences at all. Thus there is a great need to pay attention to the variability or dispersion of values if we want to describe and compare them.
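The comparison of Set A and Set B can be checked with a short Python sketch. The helper name `summarise` is made up for this illustration; it uses only the standard `statistics` module and reports the mid-range and range alongside the three averages.

```python
from statistics import mean, median, mode

set_a = [1, 4, 4, 4, 7]
set_b = [4, 4, 4, 4, 4]

def summarise(values):
    """Return the central-tendency figures and the range for a list of numbers."""
    return {
        "mean": mean(values),
        "median": median(values),
        "mode": mode(values),
        "mid_range": (min(values) + max(values)) / 2,
        "range": max(values) - min(values),
    }

summary_a = summarise(set_a)
summary_b = summarise(set_b)

# The averages agree for both sets; only the range differs.
print("Set A:", summary_a)
print("Set B:", summary_b)
```

Both sets report 4 for every average, while the range is 6 for Set A and 0 for Set B, which is exactly the difference the averages hide.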
The above discussion leads us to conclude that data tend to be dispersed, scattered, or variable around the average.
The tendency of the attributes of a group to deviate from the average or central value is known as dispersion or variability. Dispersion is an average of the second order: it is based on the differences of all items of the series from an average of those items.
Averages of the first order say nothing about spread, and this weakness is rectified by averages of the second order.
Measures of Dispersion
Which of the distributions of scores has the larger dispersion?
[Figure: two distributions of scores, one above the other; horizontal axis 1 to 10, vertical axis 0 to 125]
The upper distribution has more dispersion because the scores are more spread out. That is, they are less similar to each other.
Definition
Measures of dispersion are descriptive statistics that describe how similar a set of scores are to each other
• The more similar the scores are to each other, the lower the measure of dispersion will be
• The less similar the scores are to each other, the higher the measure of dispersion will be
• In general, the more spread out a distribution is, the larger the measure of dispersion will be
Definition
• According to L.A. CONNOR, “Dispersion is a measure of the extent to which the individual items vary.”
• D. C. BROOKS and W. F. L. DICS: “Dispersion or spread is the degree of the scatter or variation of variables about a central value.”
• SPRIEGAL has defined it as, “the degree to which numerical data tend to spread about an average value is called the variation or dispersion of the data.”
PURPOSE / OBJECTIVES
• Testing the reliability of the average (homogeneous or heterogeneous data)
• Useful for higher statistical analysis: calculation of correlation, regression, skewness, kurtosis, etc.
• Control over variation
• Comparison of two or more series
Essentials of a good measure of variability
• Simplicity in calculation.
• Easy to understand.
• Rigidly defined.
• Precise value.
• Based on all observations.
• Unaffected by fluctuations in sampling.
• Usable for further statistical calculations.
Types Of Variability
Absolute Variability
Absolute variability is measured in the same unit as the data, e.g. Rs., cm, etc.
Relative Variability
Relative variability is expressed as a ratio, percentage, or proportion of the absolute measure. This is the coefficient of dispersion.
Types of dispersion
• There are five ways of showing dispersion:
  • Range
  • Inter-quartile range
  • Mean deviation
  • Standard deviation
  • Coefficient of variation
The Range
• The range is defined as the difference between the largest score in the set of data and the smallest score in the set of data, $X_L - X_S$
• What is the range of the following data: 4 8 1 6 6 2 9 3 6 9
• The largest score ($X_L$) is 9; the smallest score ($X_S$) is 1; the range is $X_L - X_S = 9 - 1 = 8$
The Range
• The range is the difference between the maximum and minimum values.
• On its own the range is of limited use: it tells us only whether the overall spread is large or small, not where most of the values lie.
10 – 25 – 45 – 47 – 49 – 51 – 52 – 52 – 54 – 56 – 57 – 58 – 60 – 62 – 66 – 68 – 90
Range: 10 to 90 (90 - 10 = 80), but most results were between 45 and 68.
When To Use the Range
• The range is used when
  • you have ordinal data or
  • you are presenting your results to people with little or no knowledge of statistics
• The range is rarely used in scientific work as it is fairly insensitive
  • It depends on only two scores in the set of data, $X_L$ and $X_S$
  • Two very different sets of data can have the same range:
1 1 1 1 9 vs 1 3 5 7 9
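As a small illustration of why the range is insensitive, the sketch below computes it for the ten scores used earlier and for the two five-value sets. The function name `data_range` is a made-up helper, not something from the slides.

```python
def data_range(values):
    """Range = largest value minus smallest value (X_L - X_S)."""
    return max(values) - min(values)

scores = [4, 8, 1, 6, 6, 2, 9, 3, 6, 9]
clustered = [1, 1, 1, 1, 9]   # four identical values and one extreme
even_spread = [1, 3, 5, 7, 9]  # values spaced evenly

print(data_range(scores))       # largest 9, smallest 1
print(data_range(clustered))
print(data_range(even_spread))
```

All three calls give 8, even though the three data sets are arranged very differently between their extremes.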
ADVANTAGES :-
1. Simplicity: it is very easy to compute and understand.
2. Reflects Picture Of Data: the range provides a broad picture of the data at a glance.
DISADVANTAGES :-
1. Based On Extreme Values Only: the range takes into consideration the largest and smallest values only. As such it is not based upon every item of the series.
2. Useless On Open-end Distribution: the range cannot be computed for an open-end distribution.
3. No Knowledge About The Distribution Of Items: it does not tell us anything about how the items are distributed between the extremes.
$R = L - S$, where $L$ is the largest value and $S$ is the smallest value.
The Inter-Quartile Range
• The inter-quartile range is the range of the middle half of the values.
• It is a better measurement to use than the range because it only refers to the middle half of the results.
• Basically, the extremes are omitted and cannot affect the answer.
• To calculate the inter-quartile range we must first find the quartiles.
• There are three quartiles, called Q1, Q2 & Q3. We do not need to worry about Q2 (this is just the median).
• Q1 is simply the middle value of the bottom half of the data and Q3 is the middle value of the top half of the data.
• We calculate the inter-quartile range by taking Q1 away from Q3 (Q3 – Q1).
10 – 25 – 45 – 47 – 49 – 51 – 52 – 52 – 54 – 56 – 57 – 58 – 60 – 62 – 66 – 68 – 70 - 90
Remember: the data must be placed in order.
Because there is an even number of values (18), we can split them into two groups of 9.
Q1 is the middle (5th) value of the lower group, 49; Q3 is the middle (5th) value of the upper group, 62.
IR = Q3 – Q1 = 62 – 49 = 13
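The "middle value of each half" method can be sketched in Python as follows. The function name `quartiles_by_halves` is hypothetical. Library routines such as `statistics.quantiles` use interpolation rules that may give slightly different quartiles, so this sketch follows the slide's method directly rather than calling one of them.

```python
from statistics import median

def quartiles_by_halves(values):
    """Q1 and Q3 as the medians of the lower and upper halves of the sorted data.

    Assumes at least two values. For an odd number of values the middle
    value is left out of both halves.
    """
    ordered = sorted(values)          # the data must be placed in order
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[-half:]
    return median(lower), median(upper)

results = [10, 25, 45, 47, 49, 51, 52, 52, 54,
           56, 57, 58, 60, 62, 66, 68, 70, 90]

q1, q3 = quartiles_by_halves(results)
iqr = q3 - q1
print(q1, q3, iqr)
```

For the 18 results this reproduces Q1 = 49, Q3 = 62 and an inter-quartile range of 13.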
The Semi-Interquartile Range OR QUARTILE DEVIATION
• The semi-inter-quartile range (or SIR), also called the quartile deviation (QD), is defined as half of the difference between the first and third quartiles
• The first quartile is the 25th percentile
• The third quartile is the 75th percentile
• SIR or QD = $(Q_3 - Q_1) / 2$
• Coefficient of Quartile Deviation = $\dfrac{Q_3 - Q_1}{Q_3 + Q_1}$
SIR Example
What is the SIR for the following data?
2, 4, 6, 8, 10, 12, 14, 20, 30, 60
25% of the scores are below 5, so 5 is the first quartile (25th percentile).
25% of the scores are above 25, so 25 is the third quartile (75th percentile).
SIR = $(Q_3 - Q_1) / 2 = (25 - 5) / 2 = 10$
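Once the quartiles are known, the semi-interquartile range and the coefficient of quartile deviation are one-line calculations. The sketch below takes Q1 = 5 and Q3 = 25 directly from the example rather than recomputing them. The function names are made up for this illustration.

```python
def semi_interquartile_range(q1, q3):
    """SIR (quartile deviation) = (Q3 - Q1) / 2."""
    return (q3 - q1) / 2

def coefficient_of_quartile_deviation(q1, q3):
    """Relative measure: (Q3 - Q1) / (Q3 + Q1)."""
    return (q3 - q1) / (q3 + q1)

q1_example, q3_example = 5, 25  # quartiles as stated in the SIR example

sir = semi_interquartile_range(q1_example, q3_example)
coeff_qd = coefficient_of_quartile_deviation(q1_example, q3_example)
print(sir, round(coeff_qd, 3))
```

The SIR keeps the units of the data, while the coefficient is a unit-free ratio (here 20/30), which is what makes it usable for comparing different series.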
The mean deviation
• Measures the ‘average’ distance of each observation away from the mean of the data
• Gives an equal weight to each observation
• Generally more sensitive than the range or interquartile range, since a change in any value will affect it
Actual and absolute deviations from mean
A set of x values has a mean of $\bar{x}$
• The residual of a particular x-value is: Residual or deviation = $x - \bar{x}$
• The absolute deviation is: $|x - \bar{x}|$
Mean deviation
• The mean of the absolute deviations
$\text{Mean deviation} = \dfrac{\sum |x - \bar{x}|}{n}$
To calculate mean deviation
1. Calculate the mean of the data: find $\bar{x}$
2. Subtract the mean from each observation and record the differences: for each x, find $x - \bar{x}$
3. Record the absolute value of each residual: find $|x - \bar{x}|$ for each x
4. Calculate the mean of the absolute values: add up the absolute values and divide by n, $\text{Mean deviation} = \dfrac{\sum |x - \bar{x}|}{n}$
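The four steps translate almost line for line into Python. This is a minimal sketch that reuses Set A (1, 4, 4, 4, 7) from the start of the deck as sample data. The function name `mean_deviation` is an assumption of this example.

```python
def mean_deviation(values):
    """Mean of the absolute deviations from the arithmetic mean."""
    n = len(values)
    x_bar = sum(values) / n                      # step 1: mean of the data
    residuals = [x - x_bar for x in values]      # step 2: x - x_bar
    absolute = [abs(r) for r in residuals]       # step 3: |x - x_bar|
    return sum(absolute) / n                     # step 4: mean of the absolute values

set_a = [1, 4, 4, 4, 7]
md_a = mean_deviation(set_a)
print(md_a)
```

For Set A the mean is 4, the absolute deviations are 3, 0, 0, 0, 3, and the mean deviation is 6 / 5 = 1.2.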
Standard Deviation
• The standard deviation is one of the most important measures of dispersion. It is usually more informative than the range or inter-quartile range because it takes every value into account.
• Because the deviations are squared, it is still influenced by extreme values, but it does not depend on them alone as the range does.
Standard Deviation
• It is a measure of the dispersion of a collection of numbers.
• It indicates how widely spread the values in a dataset are with respect to their mean.
[Figure: a data set with a mean of 50 (shown in blue) and a standard deviation (σ) of 20.]
What does it measure?
• It measures the dispersion (or spread) of figures around the mean.
• A large number for the standard deviation means there is a wide spread of values around the mean, whereas a small number for the standard deviation implies that the values are grouped close together around the mean.
The formula
You may need to sit down for this!
$\sigma = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}}$
$\sigma$ is the symbol for the standard deviation.
Semi-worked example
• We are going to find the standard deviation of the minimum temperatures of 10 weather stations in Britain on a winter's day.
The temperatures are:
5, 9, 3, 2, 7, 9, 8, 2, 2, 3 (°C)
To calculate the standard deviation we construct a table with four columns: $x$, $\bar{x}$, $(x - \bar{x})$ and $(x - \bar{x})^2$. Leave one row for each value; there are 10 temperatures, so leave 10 rows.
x = temperature --- $\bar{x}$ = mean temperature --- √ = square root
1. Write the values (temperatures) in column x (they can be in any order).
2. Add them up: $\sum x = 50$.
3. Calculate the mean: $\bar{x} = \sum x / n = 50/10 = 5$.
4. Write the mean temperature ($\bar{x}$) in every row in the second column.
5. Subtract the mean from each value (temperature) to get $(x - \bar{x})$. It does not matter if you obtain a negative number.
6. Square all of the figures you obtained in column 3 to get rid of the negative numbers.
7. Add up all of the figures that you calculated in column 4 to get $\sum (x - \bar{x})^2 = 80$.
8. Divide $\sum (x - \bar{x})^2$ by the total number of values (in this case 10 weather stations): $80/10 = 8$.
9. Take the square root of that figure to obtain the standard deviation, rounded to one decimal place: $\sqrt{8} \approx 2.8$.

x     $\bar{x}$    $(x - \bar{x})$    $(x - \bar{x})^2$
5     5            0                  0
9     5            4                  16
3     5            -2                 4
2     5            -3                 9
7     5            2                  4
9     5            4                  16
8     5            3                  9
2     5            -3                 9
2     5            -3                 9
3     5            -2                 4

$\sum x = 50$
$\bar{x} = \sum x / n = 50/10 = 5$
$\sum (x - \bar{x})^2 = 80$
$\sum (x - \bar{x})^2 / n = 8$
$\sqrt{\sum (x - \bar{x})^2 / n} = \sqrt{8} \approx 2.8$

Answer
2.8°C
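The same table can be reproduced in a few lines of Python as a check on the arithmetic. This is a sketch, with variable names chosen for this example. `statistics.pstdev` is the standard-library function that divides by n, matching the formula used in the worked example.

```python
from math import sqrt
from statistics import pstdev

temps = [5, 9, 3, 2, 7, 9, 8, 2, 2, 3]  # minimum temperatures in °C

n = len(temps)
mean_temp = sum(temps) / n                         # column 2: the mean, 5
deviations = [x - mean_temp for x in temps]        # column 3: x - mean
squared = [d ** 2 for d in deviations]             # column 4: (x - mean)^2
sum_of_squares = sum(squared)                      # 80
variance = sum_of_squares / n                      # 8
sigma = sqrt(variance)                             # square root of 8

print(round(sigma, 1))          # 2.8
print(round(pstdev(temps), 1))  # same result from the library routine
```

Writing the columns out as lists mirrors the hand calculation, while the `pstdev` call confirms the result independently.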
Standard Deviation
• It is calculated by determining the square root of the variance.
$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
This version divides by $n - 1$ rather than $n$, the form used when the data are a sample.

No.   $x$    $\bar{x}$    $x - \bar{x}$    $(x - \bar{x})^2$
1     60     50           10               100
2     34     50           -16              256
3     74     50           24               576
4     10     50           -40              1600
5     86     50           36               1296
6     59     50           9                81
7     34     50           -16              256
8     50     50           0                0
9     43     50           -7               49
10    59     50           9                81
11    68     50           18               324
12    35     50           -15              225
13    53     50           3                9
14    28     50           -22              484
15    82     50           32               1024
16    47     50           -3               9
17    60     50           10               100
18    40     50           -10              100
19    19     50           -31              961
20    59     50           9                81

$\sum (x - \bar{x})^2 = 7{,}612$
$s = \sqrt{\dfrac{7{,}612}{19}} \approx 20$
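A short Python check of the 20-score table, offered as a sketch: `statistics.stdev` divides by n - 1, so it corresponds to the formula on this slide, whereas `statistics.pstdev` divides by n as in the temperature example.

```python
from math import sqrt
from statistics import stdev

scores = [60, 34, 74, 10, 86, 59, 34, 50, 43, 59,
          68, 35, 53, 28, 82, 47, 60, 40, 19, 59]

n = len(scores)
mean_score = sum(scores) / n                                   # 50
sum_sq = sum((x - mean_score) ** 2 for x in scores)            # 7,612
s = sqrt(sum_sq / (n - 1))                                     # divide by n - 1 = 19

print(mean_score, sum_sq)
print(round(s, 2), round(stdev(scores), 2))  # manual and library values agree
```

The result is just over 20, which matches the rounded value shown in the table.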
Why?
• Standard deviation is much more useful than the other measures.
• For example, if the temperatures are roughly normally distributed, our 2.8 means that there is about a 68% chance of the temperature falling within ± 2.8°C of the mean temperature of 5°C.
• That is one standard deviation away from the mean. Values are usually described as lying within one, two or three standard deviations of the mean.
Where did the 68% come from?
This is a normal distribution curve. It is a bell-shaped curve in which most of the data cluster around the mean value, and the frequency gradually declines the further you get from the mean until very few data points appear at the extremes.
For Example – people's height
Most people are near average height. Some are short and some are tall, but few are very short and few are very tall.
If you look at the graph you can see that most of the data (about 68%) is located within 1 standard deviation on either side of the mean, even more (about 95%) is located within 2 standard deviations on either side of the mean, and almost all (about 99.7%) of the data is located within 3 standard deviations on either side of the mean.
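These percentages are properties of the normal curve itself and can be recovered from its cumulative distribution function. The sketch below uses `statistics.NormalDist` from the Python standard library (available from Python 3.8). The helper name `share_within` is made up for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(100 * share_within(k), 1))  # roughly 68.3, 95.4 and 99.7 percent
```

The figures apply only when the data are approximately normally distributed; for strongly skewed data the shares within one, two or three standard deviations can be quite different.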
The coefficient of variation
(This will seem easy compared to the standard deviation!)
Coefficient of variation
• The coefficient of variation expresses the spread of values around the mean as a percentage of the mean.
Things you need to know
• The higher the Coefficient of Variation the more widely spread the values are around the mean.
• The purpose of the Coefficient of Variation is to let us compare the spread of values between different data sets.
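The slides do not state the formula, so the sketch below assumes the standard definition, coefficient of variation = (standard deviation / mean) × 100. It compares the two data sets used earlier in this deck: the ten temperatures and the twenty scores. The function name and the choice of the divide-by-n standard deviation for both sets are assumptions of this example.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """Standard deviation expressed as a percentage of the mean (assumes a non-zero mean)."""
    return pstdev(values) / mean(values) * 100

temps = [5, 9, 3, 2, 7, 9, 8, 2, 2, 3]
scores = [60, 34, 74, 10, 86, 59, 34, 50, 43, 59,
          68, 35, 53, 28, 82, 47, 60, 40, 19, 59]

cv_temps = coefficient_of_variation(temps)
cv_scores = coefficient_of_variation(scores)
print(round(cv_temps, 1), round(cv_scores, 1))
```

Although the scores have a much larger standard deviation in absolute terms, the temperatures come out as more widely spread relative to their mean. This is the kind of comparison across different data sets that the coefficient is meant for.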
