
STATISTICS:

PROBLEMS AND SOLUTIONS


A Complete Course in Statistics

by

J. Murdoch BSc, ARTC, AMIProdE


and
J. A. Barnes BSc, ARCS

STATISTICS: PROBLEMS AND SOLUTIONS

BASIC STATISTICS: LABORATORY INSTRUCTION MANUAL
STATISTICAL TABLES FOR SCIENCE, ENGINEERING,
BUSINESS STUDIES AND MANAGEMENT
STATISTICS:
PROBLEMS AND SOLUTIONS

J. Murdoch, BSc, ARTC, AMIProdE


Head of Statistics and Operational Research Section,
Cranfield Institute of Technology

and

J. A. Barnes, BSc, ARCS


Lecturer in Statistics and Operational Research,
Cranfield Institute of Technology

Palgrave Macmillan
© J. Murdoch and J. A. Barnes 1973

All rights reserved. No part of this publication may be reproduced or transmitted, in any form or by any means, without permission.

First published 1973

Published by
THE MACMILLAN PRESS LTD
London and Basingstoke
Associated companies in New York Toronto
Melbourne Dublin Johannesburg and Madras
SBN 333 12017 5

ISBN 978-0-333-12017-0 ISBN 978-1-349-01063-9 (eBook)


DOI 10.1007/978-1-349-01063-9
Preface

Statistics is often regarded as a boring, and therefore difficult, subject, particularly by those whose previous experience has not produced any real need to understand variation and to make appropriate allowances for it. The subject can certainly be presented in a boring way and in much advanced work can be conceptually and mathematically very difficult indeed.
However, for most people a simple but informed approach to the collection,
analysis and interpretation of numerical information is of tremendous benefit
to them in reducing some of the uncertainties involved in decision making. It is
a pity that many formal courses of statistics appear to frighten people away
from achieving this basic attitude, usually through failing to relate the theory to
practical applications.
This book, whose chapters each contain a brief summary of the main
concepts and methods, is intended to show, through worked examples, some of
the practical applications of simple statistical methods and so to stimulate
interest. In order to establish firmly the basic concepts, a more detailed
treatment of the theory is given in chapters 1 and 2. Some examples of a more
academic nature are also given to illustrate the way of thinking about problems.
Each chapter contains problems for the reader to attempt, the solutions to
these being discussed in some detail, particularly in relation to the inferences
that can validly be drawn even in those cases where the numbers have been put
into the correct 'textbook formula' for the situation.
This book will not only greatly assist students to gain a better appreciation
of the basic concepts and use of the theory, but will also be of interest to
personnel in industry and commerce, enabling them to see the range of
application of basic statistical concepts.
For the application of basic statistics, it is essential that statistical tables
are used to reduce the computation to a minimum. The tables used are those
by the authors, Statistical Tables, a companion volume in this series of
publications on statistics. The third book, Basic Statistics: Laboratory
Instruction Manual, designed to be used with the Cranfield Statistical Teaching
Aids is referred to here and, in addition, some experiments are suggested for
the reader to perform to help him understand the concepts involved. In the
chapters of this book references to Statistical Tables for Science, Engineering
and Management are followed by an asterisk to distinguish them from
references to tables in this book.
The problems and examples given represent work by the authors over many
years and every attempt has been made to select a representative range to
illustrate the basic concepts and application of the techniques. The authors
would like to apologise if inadvertently examples which they have used have
previously been published. It is extremely difficult in collating a problem book
such as this to avoid some cases of duplication.
It is hoped that this new book, together with its two companion books, will
form the basis of an effective approach to the teaching of statistics, and
certainly the results from its trials at Cranfield have proved very stimulating.
J. Murdoch
J. A. Barnes
Cranfield
Contents

List of symbols xi

1 Probability theory 1
1.2.1 Introduction 1
1.2.2 Measurement of probability
1.2.3 Experimental Measurement of Probability 2
1.2.4 Basic laws of probability 2
1.2.5 Conditional probability 6
1.2.6 Theory of groups 10
1.2.7 Mathematical expectation 11
1.2.8 Geometric probability 12
1.2.9 Introduction to the hypergeometric law 13
1.2.10 Introduction to the binomial law 14
1.2.11 Management decision theory 15
1.3 Problems 17
1.4 Worked solutions 19
1.5 Practical experiments 25
Appendix I-specimen experimental results 27

2 Theory of distributions 32
2.2.1 Introduction 32
2.2.2 Frequency distributions 33
2.2.3 Probability distributions 35
2.2.4 Populations 35
2.2.5 Moments of distribution 37
2.2.6 Summary of terms 38
2.2.7 Types of distribution 40
2.2.8 Computation of moments 42
2.2.9 Sheppard's correction 45
2.3 Problems 45
2.4 Worked solutions 48

2.5 Practical experiments 60


2.5.1 The drinking straw experiment 60
2.5.2 The shove halfpenny experiment 61
2.5.3 The Quincunx 61

3 Hypergeometric, binomial and Poisson distributions 63


3.2.1 Hypergeometric law 63
3.2.2 Binomial law 63
3.2.3 Poisson law 63
3.2.4 Examples of the use of the distributions 65
3.2.5 Examples of the Poisson distribution 68
3.3 Problems 72
3.4 Worked solutions 73
3.5 Practical experiments 76
Appendix I-binomial experiment with specimen results 77

4 Normal distribution 80
4.2.1 Introduction 80
4.2.2 Equation of normal curve 80
4.2.3 Standardised variate 81
4.2.4 Area under normal curve 81
4.2.5 Percentage points of the normal distribution 82
4.2.6 Ordinates of the normal curve 82
4.2.7 Fitting a normal distribution to data 82
4.2.8 Arithmetic probability paper 82
4.2.9 Worked examples 83
4.3 Problems 89
4.4 Worked solutions 92
4.5 Practical experiments 102
Appendix I-Experiment 10 of Laboratory Manual* 102
Appendix 2-Experiment 11 of Laboratory Manual* 105

5 Relationship between the basic distributions 109


5.2 Resume of theory 109
5.2.1 Hypergeometric, binomial and Poisson approximations 111
5.2.2 Normal approximation to Poisson 112
5.2.3 Examples of use of approximations 113
5.2.4 Examples of special interest 115
5.3 Problems 118
5.4 Worked solutions 119
Appendix I-Experiment 8 of Laboratory Manual* 122

6 Distribution of linear functions of variables 124


6.2.1 Linear combination of variates 124
6.2.2 Sum of n variates 127
6.2.3 Distribution of sample mean 127
6.2.4 Central limit theorem 128
6.2.5 Sum of two means 129
6.3 Problems 130
6.4 Worked solutions 133
Appendix I-Experiment 12 of Laboratory Manual* 141

7 Estimation and significance testing (I)-'large sample' methods 145


7.2.1 Point estimators 145
7.2.2 Confidence intervals 145
7.2.3 Hypothesis testing 146
7.2.4 Errors involved in hypothesis testing 146
7.2.5 Hypothesis (significance) testing 147
7.2.6 Sample size 147
7.2.7 Tests for means and proportions 147
7.2.8 Practical significance 149
7.2.9 Exact and approximate tests 149
7.2.10 Interpretation of significant results 150
7.2.11 Worked examples 150
7.3 Problems 162
7.4 Worked solutions 163

8 Sampling theory and significance testing (II) - 't', 'F' and χ² tests 170
8.2.1 Unbiased estimate of population variance 170
8.2.2 Degrees of freedom 171
8.2.3 The 'u'-test with small samples 171
8.2.4 The 't'-test of significance 172
8.2.5 The 'F'-test of significance 174
8.2.6 The 'χ²'-test of significance 175
8.2.7 One- and two-tailed tests 177
8.2.8 Worked examples 177
8.3 Problems 182
8.4 Worked solutions 184
8.5 Practical experiments 192
Appendix I-Experiment 14 of Laboratory Manual* 193

9 Linear regression theory 197


9.2.1 Basic concepts 197
9.2.2 Assumptions 198

9.2.3 Basic theory 198


9.2.4 Significance testing 199
9.2.5 Confidence limits for the regression line 200
9.2.6 Prediction limits 200
9.2.7 Correlation coefficient 201
9.2.8 Transformations 203
9.2.9 Worked example 203
9.3 Problems 208
9.4 Worked solutions 211
List of symbols

a constant term in linear regression


b regression coefficient
C scaling factor used in calculating mean and variance
Ei expected frequency for χ² goodness-of-fit test
Eij expected frequency in cell ij of contingency table
fi frequency of ith class
F variance ratio
H0 null hypothesis
H1 alternative hypothesis
m mean of Poisson distribution
μ'r rth moment of a distribution about the origin
μr rth moment of a distribution about its mean
n number of observations and/or number of trials in probability theory
Oi observed frequency for χ² goodness-of-fit test
Oij observed frequency in ijth cell of contingency table
P probability
P(A) probability of an event A
Pⁿₓ number of permutations of n objects taken x at a time
P(A/B) conditional probability of A on assumption that B has occurred
E[x] expected value of variate, x
r sample correlation coefficient
s standard deviation of a sample
s² unbiased sample estimator of population variance
t Student's 't'
u coded variable used in calculating mean and variance of sample†
u standardised normal variate†
xi value of variate
yi value of dependent variable corresponding to xi in regression
ŷi estimated value of dependent variable using the regression line

† Little confusion should arise here on the use of the same symbol in two different ways. Their use in both these areas is too standardised for the authors to suggest a change.

Greek Symbols

μ population mean
σ² population variance
χ² sum of the squares of standardised normal deviates
ν number of degrees of freedom
α magnitude of risk of 1st kind or significance level
β magnitude of risk of 2nd kind; (1 - β) is the power of the test
π proportion of a population having a given attribute
ε standard error

Note: α and β are also used as parameters of the population regression line η = α + β(xi - x̄) but again no confusion should arise.

Mathematical Symbols
Σⁿᵢ₌₁ summation from i = 1 to n

e exponential 'e', the base of natural logarithms
≈ approximately equal to
b ≈ β the sample statistic b is an estimate of population parameter β
x > y x is greater than y
x ≥ y x is greater than or equal to y
x < y x is less than y
x ≤ y x is less than or equal to y
Cⁿₓ number of different combinations of size x from group of size n
x! factorial x = x(x-1)(x-2) ... 3 × 2 × 1
Pⁿₓ number of permutations of n objects taken x at a time

Note: The authors use (ⁿₓ) for combinations but, in order to avoid any confusion, both forms are given in the definitions.
1 Probability theory

1.1 Syllabus Covered


Definition and measurement of probability; addition and multiplication laws;
conditional probability; permutations and combinations; mathematical
expectation; geometric probability; introduction to hypergeometric and
binomial laws.

1.2 Resume of Theory and Basic Concepts

1.2.1 Introduction
Probability or chance is a concept which enters all activities. We speak of the
chance of it raining today, the chance of winning the football pools, the chance
of getting on the bus in the mornings when the queues are of varying size, the
chance of a stock item going out of stock, etc. However, in most of these uses of
probability, it is very seldom that we attempt to measure or quantify the
statements. Most of our ideas about probability are intuitive and in fact
probability is a quantity rather like length or time and therefore not amenable
to simple definition. However, probability (like length or time) can be measured
and various laws set up to govern its use.
The following sections outline the measurement of probability and the rules
used for combining probabilities.

1.2.2 Measurement of Probability


Probability is measured on a scale ranging from 0 to 1 and can take any value
inside this range. This is illustrated in figure 1.1.
The probability p that an event (A) will occur is written
P(A) = p where 0 ≤ p ≤ 1

1            Probability that you will die one day (absolute certainty)
1/2 or 0.5   Probability that an unbiased coin shows 'heads' after one toss
1/6 or 0.167 Probability that a die shows 'six' on one roll
0            Probability that you will live forever (absolute impossibility)

Figure 1.1. Probability scale.

It will be seen that on this continuous scale, only the two end points are
concerned with deductive logic (although even here, there are certain logical
difficulties with the particular example quoted).
On this scale absolute certainty is represented by p = 1 and an impossible
event has probability of zero. However, it is between these two extremes that
the majority of practical problems lie. For instance, what is the chance that a
machine will produce defective items? What is the probability that a machine
will find the overhead service crane available when required? What is the
probability of running out of stock of any item? Or again, in insurance, what
is the chance that a person of a given age will survive for a further year?

1.2.3 Experimental Measurement of Probability


In practice there are many problems where the only method of estimating the
probability is the following

probability of event A, P(A) = (total occurrences of the event A)/(total number of trials)

For example, what is the probability of an item's going out of stock in a given
period?
Measurement showed that 189 items ran out in the period out of a total
number of stock items of 2000, therefore the estimate of probability of a stock
running out is
P(A) = 189/2000 = 0.0945
Again, if out of a random sample of 1000 men, 85 were found to be over
1.80 m tall, then
estimate of probability of a man being over 1.80 m tall = 85/1000 = 0.085.
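Both relative-frequency estimates are simple divisions; the following Python sketch (the helper name is illustrative, not from the text) reproduces the two figures:

```python
def estimate_probability(occurrences, trials):
    # Relative-frequency estimate: P(A) = occurrences of A / number of trials
    return occurrences / trials

# 189 of 2000 stock items ran out in the period
p_stockout = estimate_probability(189, 2000)   # 0.0945

# 85 of 1000 men were over 1.80 m tall
p_tall = estimate_probability(85, 1000)        # 0.085
```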

1.2.4 Basic Laws of Probability


1. Addition Law of Probability
This law states that if A and B are mutually exclusive events, then the probability
that either A or B occurs in a given trial is equal to the sum of the separate
probabilities of A and B occurring.

In symbolic terms this law can be shown as


P(A or B) = P(A) + P(B)

This law can be extended by repeated application to cover the case of more
than two mutually exclusive events.
Thus P(A or B or C or ...) = P(A) + P(B) + P(C) + ...
The events of this law are mutually exclusive events, which simply means
that the occurrence of one of the events excludes the possibility of the occurrence
of any of the others on the same trial.
For example, if in a football match, the probability that a team will score
0 goals is 0.50, 1 goal is 0.30, 2 goals is 0.15 and 3 or more goals is 0.05, then
the probability of the team scoring either 0 or 1 goals in the match is

P(0 or 1) = P(0) + P(1) = 0.50 + 0.30 = 0.80
Also, the probability that the team will score at least one goal is
P(at least one goal) = P(1) + P(2) + P(3 or more) = 0.30 + 0.15 + 0.05 = 0.50
Any event either occurs or does not occur on a given occasion. From the
definition of probability and the addition law, the probabilities of these two
alternatives must sum to unity. Thus the probability that an event does not
occur is equal to
1 - (probability that the event does occur)
In many examples, this relationship is very useful since it is often easier to
find the probability of the complementary event first.
For example, the probability of a team's scoring at least one goal in a
football match can be obtained as
P(at least 1 goal) = 1 - P(0 goals) = 1 - 0.50 = 0.50
as before.
As a further example, suppose that the probabilities of a man dying from
heart disease, cancer or tuberculosis are 0.51, 0.16 and 0.20 respectively. The
probability that a man will die from heart disease or cancer is 0.51 + 0.16 = 0.67.
The probability that he will die from some cause other than the three mentioned
is 1 - (0.51 + 0.16 + 0.20) = 0.13; i.e., 13% of men can be expected to die from
some other cause.
However, consider the following example. Suppose that of all new cars sold,
40% are blue and 30% have two doors. Then it cannot be said that the probability
of a person's owning either a blue car or a two-door car is 0.70 (= 0.40 + 0.30)
since the events (blue cars and two-door cars) are not mutually exclusive, i.e., a
car can be both blue and have two doors. To deal with this case, a more general
version of the addition law is necessary. This may be stated as

P(A or B or both) = P(A) + P(B) - P(A and B)

The additional term, the probability of events A and B both occurring


together on a trial is obtained using the multiplication law of probabilities.
2. Multiplication Law of Probability
This law states that the probability of the combined occurrence of two events
A and B is the product of the probability of A and the conditional probability
of B on the assumption that A has occurred.
Thus P(A and B) = P(AB) = P(A) × P(B/A)

where P(B/A) is the conditional probability of event B on the assumption that
A occurs at the same time (see list of symbols, page xi).

P(AB) is also given by P(B) × P(A/B)


While this law is usually defined as above for two events, it can be extended
to any number of events.
3. Independent Events
Events are defined as independent if the probability of the occurrence of either
is not affected by the occurrence or not of the other. Thus if A and B are
independent events, then the law states that the probability of the combined
occurrence of the events A and B is the product of their individual
probabilities. That is
P(AB) = P(A) × P(B)
Many people meeting the ideas of probability for the first time find difficulty
in deciding whether to add or multiply probabilities. If the problem (or part of
it) is concerned with either event A or event B occurring, then add probabilities;
if A and B must both occur (at the same time or one after the other), then
multiply probabilities. Consider the following example with the throwing of
two dice illustrating the use of the two basic laws of probability.

Examples
1. In the throw of two dice, what is the probability of obtaining two sixes?
One of the dice must show a six and the other must also show a six. Thus the
required probability (independent events) is

p = 1/6 × 1/6 = 1/36
2. In the throw of two dice, what is the probability of a score of 9 points?
Here we must consider the number of mutually exclusive ways in which the
score 9 can occur. These ways are listed below

Dice A:   3        4        5        6
         and      and      and      and
Dice B:   6   or   5   or   4   or   3

The probability of any of these four possible arrangements occurring is equal,
as before, to 1/6 × 1/6 = 1/36.

Thus the probability that two dice show a total score of 9 is equal to

1/36 + 1/36 + 1/36 + 1/36 = 4/36 = 1/9
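Both dice results can be confirmed by enumerating the 36 equally likely outcomes, for example in Python:

```python
from itertools import product

# All 36 equally likely outcomes for two fair dice
outcomes = list(product(range(1, 7), repeat=2))

p_two_sixes = sum(1 for a, b in outcomes if a == b == 6) / 36  # 1/36
p_score_9 = sum(1 for a, b in outcomes if a + b == 9) / 36     # 4/36 = 1/9
```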
3. In marketing a product, records show that on average one call in 10 results
in making a sale to a potential customer. What is the probability that a salesman
will make two sales from any two given calls?
Assuming the events (sales to different customers) to be independent, use of
the multiplication law gives the probability of making two sales in two calls as
0.1 x 0.1 = 0.01.
As an extension of this example, what is the probability of making at least
one sale in five calls? The easiest way to calculate this probability is to note that
the event complementary to making one or more sales is not to make any sales.
Using the multiplication law gives the probability of making no sales in five calls
as
0.9 × 0.9 × 0.9 × 0.9 × 0.9 = 0.9⁵ = 0.5905

P(at least one sale in five calls) = 1 - 0.5905 = 0.4095
These basic laws for combining probability may be used to answer such
questions as how many calls must be planned so that there is a high probability,
say 95% or 99%, of making at least one sale or of making at least two sales,
etc. Or, again, what is the probability that it will need more than, say eight
calls to make two sales?
As an example, suppose that the probability of making at least one sale in n
calls is to be at least 0.95. What is the smallest value of n which will achieve this?
Turning the problem round gives that the probability of making no sales in
n calls is to be at most 0.05 and thus
0.9ⁿ ≤ 0.05
The smallest value of n which satisfies this requirement is 29. This means that
if the salesman schedules 29 customer calls every day, he will make at least one
sale on just over 95% of days in the long run. Conversely on just under one day
in 20 he will receive no orders as a result of any of his 29 visits. The average daily
number of sales made will be 2.9.
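The smallest such n can be found by simple trial, as in the Python sketch below; the closed form n = ⌈log 0.05 / log 0.9⌉ gives the same answer:

```python
import math

# Smallest n with P(no sales in n calls) = 0.9**n at most 0.05,
# i.e. P(at least one sale) at least 0.95
n = 1
while 0.9 ** n > 0.05:
    n += 1

# Closed form: n = ceil(log 0.05 / log 0.9)
n_closed = math.ceil(math.log(0.05) / math.log(0.9))
```

Both routes give n = 29, in agreement with the text.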
The example of the addition law where the events (a car being blue and a
car having two doors) were not mutually exclusive can now be completed.
The probability that a randomly chosen car is either blue or has two doors
or is a two-door blue car is given by
P(blue) + P(2 doors) - P(blue and 2 doors)
= 0.4 + 0.3 - (0.4 × 0.3)
= 0.70 - 0.12 = 0.58

Figure 1.2. A unit square divided in the proportions Blue 0.4 : Not blue 0.6 on one side and 2 doors 0.3 : Not 2 doors 0.7 on the other; the shaded rectangles are the cars that are blue, two-door, or both.

This result is valid on the assumption that the number of doors that a car has
is not dependent on its colour.
Figure 1.2 illustrates the situation. The areas of the four rectangles within the
square represent the proportion of all cars having the given combination of
colour and number of doors. The total area of the three shaded rectangles is
equal to 0.58, the proportion of cars that are either blue or have two doors or
are two-door blue cars.

1.2.5 Conditional Probability


In many problems and situations, however, the events are neither independent
nor mutually exclusive, and the general theory for conditional probability will
be outlined here.
Before considering conditional probability in algebraic terms, some simple
numerical examples will be given.
If one card is drawn at random from a full pack of 52 playing cards, the
probability that it is red is 26/52. Random selection of a card means that each
of the 52 cards is as likely as any of the others to be the sampled one.
If a second card is selected at random from the pack (without replacing the
first), the probability that it is red depends on the colour of the first card drawn.
There are only 51 cards that could be selected as the second card, all of them
having an equal chance. If the first had been black, there are 26 red cards
available and the probability that the second card is red is therefore 26/51 (i.e.,
conditional upon the first card being black).
Similarly, if the first card is red, the probability that the second is also
red is 25/51.
The process can be continued; the probability that a third card is red being
26/50,25/50 or 24/50 depending on whether the previous two cards drawn were
both black, one of each colour (drawn in either order) or both red.

Most practical sampling problems are of this 'sampling without replacement'


type and conditional probabilities need to be taken into account. There are,
however, suitable approximations which can often be used in practice instead
of working with the exact conditional values. (These methods are referred to
in chapter 5.)
Using the multiplication law and the conditional probabilities discussed
above, the probability that two cards, taken randomly from a full pack (52 cards),
will both be red is given by
P(first card red) × P(second card red given that first was red) = 26/52 × 25/51 = 25/102
This result applies whether the two cards are taken one after the other or
both at the same time.
As another example, suppose that two of the bulbs in a set of 12 coloured
lights are burnt out. What is the probability of finding
(a) both burnt-out bulbs in the first two tested?
(b) one of the burnt-out bulbs in the first two tested?
(c) at least one of the burnt-out bulbs in the first two tested?

The solutions are


(a) P(first bulb tested is burnt out) = 2/12
P(second bulb tested is also burnt out) = 1/11
P(finding both burnt-out bulbs in first two tests) = 2/12 × 1/11 = 1/66
(b) P(first bulb tested is burnt out) = 2/12
and P(second bulb tested is not burnt out) = 10/11
The product of these probabilities = 2/12 × 10/11 = 20/132 = 5/33
The same result can be obtained if a burnt-out bulb is found on the second
test, the first bulb being good. The two situations are mutually exclusive.
Then P(first bulb good) = 10/12
P(second bulb burnt out) = 2/11
Thus P(first bulb good and second burnt out) = 10/12 × 2/11 = 5/33

Either of these two situations satisfies (b) and the probability of at least one
of the burnt-out bulbs being in the first two tested is given by their sum

5/33 + 5/33 = 10/33
(c) The probability of at least one burnt-out bulb being found in two tests
is equal to the sum of the answers to parts (a) and (b), namely

10/33 + 1/66 = 21/66 = 7/22

As a check on this result, the only other possibility is that neither of the
faulty bulbs will be picked out for the first two tests. The probability of this,
using the multiplication law with the appropriate conditional probability, is

10/12 × 9/11 = 15/22

The situation in part (c) therefore has probability of 1 - 15/22 = 7/22, as given by
direct calculation.
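The bulb probabilities can also be verified exhaustively over all C(12,2) = 66 equally likely pairs of first-two bulbs; a Python sketch (the bulb labels are arbitrary):

```python
from fractions import Fraction
from itertools import combinations

burnt = {0, 1}                               # the two burnt-out bulbs among twelve
pairs = list(combinations(range(12), 2))     # 66 equally likely 'first two tested' pairs

p_both = Fraction(sum(1 for p in pairs if set(p) == burnt), len(pairs))
p_one = Fraction(sum(1 for p in pairs if len(burnt & set(p)) == 1), len(pairs))
p_at_least_one = p_both + p_one
```

This reproduces 1/66, 10/33 and 7/22 exactly.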
Consider a box containing r red balls and w white balls. A random sample of
two balls is drawn. What is the probability of the sample containing two red
balls?

r = red balls
w = white balls

If the first ball is red (event A), probability of this event occurring
P(A) = r/(r+w)
The probability of the second ball being red (event B) given the first was red
is thus

P(B/A) = (r-1)/(r+w-1)

since there are now only (r-1) red balls in the box containing (r+w-1) balls.
∴ Probability of the sample containing two red balls

= r/(r+w) × (r-1)/(r+w-1)
In similar manner, probability of the sample containing two white balls

= w/(r+w) × (w-1)/(r+w-1)
Also consider the probability of the sample containing one red and one
white ball. This event can happen in two mutually exclusive ways, first ball red,
second ball white or first ball white, second ball red.
Thus, the probability of the sample containing one white and one red
ball is

= r/(r+w) × w/(r+w-1) + w/(r+w) × r/(r+w-1) = 2wr/[(r+w)(r+w-1)]
Note: Readers might like to verify that the sum of these three probabilities = 1.
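The verification suggested in the note is easy to automate with exact rational arithmetic; a Python sketch (the function name is illustrative):

```python
from fractions import Fraction

def two_ball_probs(r, w):
    # Exact probabilities of (two red, two white, one of each) when drawing
    # two balls without replacement from r red and w white balls
    n = r + w
    p_rr = Fraction(r, n) * Fraction(r - 1, n - 1)
    p_ww = Fraction(w, n) * Fraction(w - 1, n - 1)
    p_rw = 2 * Fraction(r, n) * Fraction(w, n - 1)
    return p_rr, p_ww, p_rw

# The three cases are exhaustive, so the probabilities sum to 1
for r, w in [(3, 2), (5, 7), (1, 9)]:
    assert sum(two_ball_probs(r, w)) == 1
```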

Examples
1. In a group of ten people where six are male and four are female, what is the
chance that a committee of four, formed from the group with random selection,
comprises (a) four females, or (b) three females and one male?
(a) Probability of committee with four females
= 4/10 × 3/9 × 2/8 × 1/7 = 0.0048
(b) Committee comprising three females and one male. This committee can
be formed in the following mutually exclusive ways

1st member   M        F        F        F
2nd member   F   or   M   or   F   or   F
3rd member   F        F        M        F
4th member   F        F        F        M
The probability of the first arrangement is
6/10 × 4/9 × 3/8 × 2/7 = 0.0286
The probability for the second arrangement is

4/10 × 6/9 × 3/8 × 2/7 = 0.0286
and similarly for the third and fourth columns, the position of the numbers in
the numerator being different in each of the four cases. The required probability
is thus 4 x 0.0286 = 0.114.
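Both committee probabilities agree with direct counting of committees; a Python cross-check using exact fractions:

```python
from fractions import Fraction
from math import comb

# (a) four females: sequential conditional probabilities
p_a = Fraction(4, 10) * Fraction(3, 9) * Fraction(2, 8) * Fraction(1, 7)

# (b) three females and one male: four equally likely orderings
p_b = 4 * Fraction(6, 10) * Fraction(4, 9) * Fraction(3, 8) * Fraction(2, 7)

# Cross-check by counting committees rather than orderings
assert p_a == Fraction(comb(4, 4) * comb(6, 0), comb(10, 4))     # 1/210, about 0.0048
assert p_b == Fraction(comb(4, 3) * comb(6, 1), comb(10, 4))     # 4/35, about 0.114
```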

2. From a consignment containing 100 items of which 10% are defective, a


random sample of 10 is drawn. What is the probability of (a) the sample
containing no defective items, or (b) the sample containing exactly one
defective item?

(a) Probability of no defective items in the sample only arises in one way.
Probability of no defective items

P(0) = 90/100 × 89/99 × 88/98 × ... × 81/91 = 0.33


(b) Exactly one defective item in the sample can arise in 10 mutually exclusive
ways as shown below

Position:  1  2  3  4  5  6  7  8  9  10
           D  G  G  G  G  G  G  G  G  G      D = defective item
           G  D  G  G  G  G  G  G  G  G      G = good item
           .  .  .
           G  G  G  G  G  G  G  G  G  D

Thus the probability of one defective in 10 items sampled is


P(1) = 10 × 10/100 × 90/99 × 89/98 × ... × 82/91 = 0.41
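Both sampling probabilities can be reproduced exactly with rational arithmetic, and checked against the combinatorial form used in section 1.2.9; a Python sketch:

```python
from fractions import Fraction
from math import comb

# P(0): ten good items in a row, sampling without replacement
p0 = Fraction(1)
for i in range(10):
    p0 *= Fraction(90 - i, 100 - i)

# P(1): one defective (10 possible positions), then nine good items
p1 = 10 * Fraction(10, 100)
for i in range(9):
    p1 *= Fraction(90 - i, 99 - i)

# Same values from direct counting
assert p0 == Fraction(comb(90, 10), comb(100, 10))               # about 0.33
assert p1 == Fraction(comb(10, 1) * comb(90, 9), comb(100, 10))  # about 0.41
```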

1.2.6 Theory of Groups


There are two group theories which can assist in the solution and/or computation
involved in probability theory (rather than the long methods used in examples
in section 1.2.5).

Permutations
Groups form different permutations if they differ in any or all of the following
aspects.
(1) Total number of items in the group.
(2) Number of items of any one type in the group.
(3) Sequence.
Thus
ABB, BAB are different permutations because of (3); AA, BAA are different permutations
because of (1) and (2); CAB, CAAB are different permutations because of (1) and (2); BAABA, BABBA
are different permutations because of (2).
Thus distinct arrangements differing in (1) and/or (2) and/or (3) form different
permutations.

Group Theory No. 1


If there are n objects, each distinct, then the number of permutations of the
objects taken x at a time is

Pⁿₓ = n!/(n-x)!

An example is the number of ways of arranging two different letters out of


the word girl.
Here n = 4, x = 2

P⁴₂ = 4!/2! = 12

The arrangements are
gi, gr, gl, ig, ir, il, rg, ri, rl, lg, li, lr
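The count of 12 can be confirmed by generating the arrangements, for example in Python:

```python
from itertools import permutations
from math import factorial

# All ordered pairs of distinct letters from 'girl'
pairs = [''.join(p) for p in permutations('girl', 2)]

assert len(pairs) == factorial(4) // factorial(4 - 2)   # P(4,2) = 12
assert 'gi' in pairs and 'ig' in pairs                  # sequence matters
```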
Combinations
Groups form different combinations when they differ in
(1) Total number of objects in the group.
(2) Number of objects of any one type in the group.
(Note: Sequence does not matter.)

Thus ABB, BAB, BBA are not different combinations.

Group Theory No.2


If there are n objects, each distinct, then the number of different combinations
of size x is given by

Cⁿₓ or (ⁿₓ) = n!/(x!(n-x)!)

As an example, a committee of three is to be formed from five department
heads. How many different committees can be formed?

C⁵₃ = 5!/(3! × 2!) = 10
1.2.7 Mathematical Expectation


In statistics, the term expected value refers to the average value that a variable
takes. It is often used in the context of gambling but its use is appropriate
whenever we are concerned with average values.
The expected value is thus the mean of a distribution (see chapter 2),
i.e., the average sample value which will be obtained when the sample size tends
to infinity.
Suppose player A receives an amount of money M₁ if event E₁ happens, an
amount M₂ if E₂ happens, ... and amount Mₙ if Eₙ happens, where
E₁, E₂, ..., Eₙ are mutually exclusive and exhaustive events; P₁, P₂, ..., Pₙ
are the respective probabilities of these events. Then A's mathematical
expectation of gain is defined as

E(M) = M₁P₁ + M₂P₂ + ... + MₙPₙ
In gambling, for the game to be fair the expectation should equal the charge for
playing the game. This concept is also used in insurance plans, etc. Use of this
concept of expected value is illustrated in the following example.

Example
The probability that a man aged 55 will live for another year is 0.99. How large
a premium should he pay for a £2000 life insurance policy for one year?
(Ignore insurance company charges for administration, profit, etc.)
Let s = premium to be paid
Expected return = 0 × 0.99 + £2000 × 0.01 = £20
∴ Premium s = £20 (should equal expected return)
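The fair-premium calculation generalises to any set of mutually exclusive outcomes; a Python sketch (the function name is illustrative):

```python
def fair_premium(payouts_and_probs):
    # Fair premium = mathematical expectation of the payout
    return sum(m * p for m, p in payouts_and_probs)

# £2000 paid on death within the year (probability 0.01), nothing otherwise
premium = fair_premium([(2000, 0.01), (0, 0.99)])   # £20
```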

1.2.8 Geometric Probability


Many problems lend themselves to solutions only by using the concept of
geometric probability and this will be illustrated in this section.

Example-A Fairground Problem


At a fair, customers roll a coin onto a board made up to the pattern shown in
figure 1.3. If the coin finishes in a square (not touching any lines), the number
of coins the customer will win is shown in that square, but otherwise the coin
is lost. If at least half of the coin is outside the board, it is returned to the
player.

2   1   2

1   4   1

2   1   2

Figure 1.3

Given that the lines are 1 mm thick, the sides of the squares are 60 mm and
the diameter of the coin is 20 mm, what is
(a) the chance of getting the coin in the 4 square?
(b) the chance of getting the coin in a 2 square?
(c) the expected return per trial, if returns are made in accordance with the
numbers in the squares?

Figure 1.4 [one 61 mm × 61 mm cell of the board, showing the central 40 mm × 40 mm region within which the coin's centre must fall for the coin to clear the lines]
Probability Theory 13

Considering one square (figure 1.4), total possible area (ignoring small
edge-effects of line thickness) = 61^2 = 3721 mm^2
For one square, the probability that the coin does not touch a line is
1600/3721 = 0.43

Thus if the coin falls at random on the board


(a) the chance that it falls completely within the 4 square = 1/9 × 0.43 = 0.048
(b) the chance that it falls completely within a 2 square = 4/9 × 0.43 = 0.191
(c) the expected payout per trial is (4 × 0.048) + (2 × 0.191) + (1 × 0.191)
= 0.76
Since it costs one coin to play, the player will lose 0.24 of a coin per turn in the
long run.
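The whole calculation can be reproduced in a short script (a minimal sketch; the variable names are our own):

```python
# Fairground board: 3 x 3 squares of side 60 mm separated by 1 mm lines;
# the coin has diameter 20 mm, so its centre must stay one radius (10 mm)
# clear of every line, leaving a (60 - 20)^2 winning region per 61 x 61 cell.
side, line_width, coin_diam = 60, 1, 20

cell_area = (side + line_width) ** 2     # 3721 mm^2 per cell
win_area = (side - coin_diam) ** 2       # 1600 mm^2 winning region
p_clear = win_area / cell_area           # ~0.43: coin inside some square

p4 = (1 / 9) * p_clear                   # the single "4" square
p2 = (4 / 9) * p_clear                   # the four "2" squares
p1 = (4 / 9) * p_clear                   # the four "1" squares

expected_payout = 4 * p4 + 2 * p2 + 1 * p1
print(round(expected_payout, 2))         # 0.76 coins per trial
```

Changing `coin_diam` shows how sensitive the operator's margin is to the coin size: a larger coin shrinks the winning region quadratically.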

1.2.9 Introduction to the Hypergeometric Law


The hypergeometric law gives an efficient way of solving problems where the
probabilities involved are conditional.
In general form, it can be defined as follows.

Definition of Hypergeometric Law


If a group contains N items of which M are of one type and the remainder, N − M, are of another type, then the probability of getting exactly x of the first type in a random sample of size n is

P(x) = C(M, x) × C(N − M, n − x) / C(N, n)

where C(a, b) = a!/[b!(a − b)!] is the number of combinations of a items taken b at a time.
To illustrate the use of the hypergeometric law, consider example (2), page 9, again.
Here N = 100
M = 10, the number of defective items in the batch
N − M = 90, the number of good items in the batch
Sample size n = 10


∴ For (a) x = 0,

P(0) = C(10, 0) × C(90, 10) / C(100, 10) = [10!/(0! 10!)] × [90!/(10! 80!)] ÷ [100!/(10! 90!)] ≈ 0.330

for (b) x = 1,

P(1) = C(10, 1) × C(90, 9) / C(100, 10) ≈ 0.408

Both results are the same as before but are obtained more easily.
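The hypergeometric probabilities above are easy to evaluate with Python's standard `math.comb` (a sketch; `hypergeom_pmf` is our own wrapper name):

```python
from math import comb

def hypergeom_pmf(x, N, M, n):
    """P(exactly x of the M 'first type' items appear in a random
    sample of n drawn without replacement from N items)."""
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

# Example (2): batch of N = 100 containing M = 10 defectives, sample n = 10.
p0 = hypergeom_pmf(0, N=100, M=10, n=10)
p1 = hypergeom_pmf(1, N=100, M=10, n=10)
print(round(p0, 3), round(p1, 3))  # 0.33 0.408
```

The same function evaluated for x = 0, 1, ..., 10 sums to 1, which is a useful check on the formula.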

1.2.10 Introduction to the Binomial Law


Although this law will be dealt with more fully in chapter 3, it is useful to
introduce it here in the chapter on probability since knowledge of the law
helps in the understanding of probability.

Definition of Binomial Law


If the probability of success in an individual trial is p, and p is constant over all
trials, then the probability of x successes in n independent trials is

P(x) = C(n, x) p^x (1 − p)^(n−x)

To illustrate the use of the binomial law consider the following example. A
firm has 10 lorries in service distributing its goods. Given that each lorry spends
10% of its time in the repair depot, what is the probability of (a) no lorry in the
depot for repair, and (b) more than one in for repair?
(a) Probability of success (i.e., lorry under repair), p = 0.10
Number of trials n = 10 (lorries)

Probability of no lorries being in for repair

P(0) = C(10, 0) × 0.10^0 × 0.90^10 = 0.3487

(cf. the result obtained from first principles)


(b) The probability of more than one lorry being in for repair, P(>1), can best
be obtained by:
P(>1) = 1 − P(0) − P(1)
Probability of exactly one lorry being in for repair

P(1) = C(10, 1) × 0.10^1 × 0.90^(10−1) = 0.3874

Probability of more than one lorry being in for repair


P(>1) = 1 − 0.3487 − 0.3874 = 0.2639
Thus this binomial law gives a neat and quick way of computing the
probabilities in simple cases like this.
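The lorry example translates directly into code (again a sketch using only the standard library; `binomial_pmf` is our own naming):

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(x successes in n independent trials, success probability p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

p0 = binomial_pmf(0, n=10, p=0.10)      # no lorry under repair
p1 = binomial_pmf(1, n=10, p=0.10)      # exactly one under repair
p_more_than_1 = 1 - p0 - p1             # more than one under repair
print(round(p0, 4), round(p1, 4), round(p_more_than_1, 4))
# 0.3487 0.3874 0.2639
```

Summing the pmf over x = 0 to 10 returns 1, a quick check on the function.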

1.2.11 Management Decision Theory


What has become known as decision theory is in simple terms just the application
of probability theory and in the authors' opinion should be considered primarily
as just this. This point of view will be illustrated in some examples below and
in the problems given later in the chapter. It must be appreciated that in decision
theory the probabilities assigned to the decisions are themselves subject to
errors and, whilst better than nothing, the analysis should not be used unless a
sensitivity analysis is also carried out. Also, when using decision theory (or
probability theory) for decisions involving capital investment, discounted cash
flow (D.C.F.) techniques are required. However, in order not to confuse readers,
since this is a text on probability, D.C.F. has not been used in the examples or
problems.
Note: Although it is the criterion used here by way of general introduction,
the use of expected values is just one measure in a decision process. In too many
books it appears to be the sole basis on which a decision is made.

Examples

1. Consider, as a simplification of the practical case, that a person wishing to


sell his car has the following alternatives: (a) to go to a dealer with complete
certainty of selling for £780, (b) to advertise in the press at a cost of £50, in
order to sell the car for £850.
Under alternative (b), he estimates that the probability of selling the car for
£850 is 0.60. If he does not sell through the advertisement for £850, he will take
it to the dealer and sell for £780. (Note that a more realistic solution would
allow for different selling prices each with their associated probability of
occurrence.) Should he try for a private sale?
If he advertises the car there is a chance of 0.6 of obtaining £850 and
therefore a chance of 0.4 of having to go to the dealer and accept £780.
The expected return on the sale
= £850 x 0.6 + £780 x 0.4 = £822
For an advertising expenditure of £50, he has only increased his expected
return by £(822- 780) or £42.
On the basis of expected return therefore, he should not advertise but go
direct to the dealer and accept £780.
This method of reaching his decision is based on what would happen on
average if he had a large number of cars to sell each under the same conditions
as above. By advertising each of them, he would in the long run receive £8 per
car less than if he sold direct to the dealer without advertising. Such a long

run criterion may not be relevant to his once only decision. Compared with the
guaranteed price, by advertising, he will either lose £50 or be £20 in pocket with
probabilities of 0.4 and 0.6 respectively. He would probably make his decision
by assessment of the risk of 40% of losing money. In practice, he could probably
increase the chances of a private sale by bargaining and allowing the price to drop
as low as £830 before being out of pocket.
As a further note, the validity of the estimate (usually subjective) of a 60%
chance of selling privately at the price asked should be carefully examined as
well as the sensitivity of any solution to errors in the magnitude of the
probability estimate.

2. A firm is facing the chance of a strike occurring at one of its main plants.
Considering only two points (normally more would be used), management
assesses the following:
(a) An offer of 5% pay increase has only a 10% chance of being accepted
outright. If a strike occurs:

chance of a strike lasting 1 month = 0.20


chance of a strike lasting 2 months = 0.50
chance of a strike lasting 3 months = 0.30
chance of a strike lasting longer than 3 months = 0.0
(b) An offer of 10% pay increase has a 90% chance of being accepted
outright. If a strike occurs:
chance of strike lasting 1 month = 0.98
chance of strike lasting 2 months = 0.02
chance of strike lasting longer than 2 months = 0.0
Given that the increase in wage bill per 1% pay increase is £10 000 per month
and that any agreement will last only 5 years, and also that the estimated cost
of a strike is £1 000 000 per month, made up of lost production, lost orders,
goodwill, etc., which offer should management make?
(a) Considering expected costs for the offer of 5%. Expected loss due to strike
= 0.90[(0.20 × 1) + (0.50 × 2) + (0.30 × 3)] × £1 000 000 = £1 890 000
Increase in wage bill over 5 years
= £10 000 × 12 × 5 × 5 = £3 000 000
Total (expected) cost of decision = £4 890 000
(b) For the offer of 10%, expected loss due to strike

= 0.10[(0.98 × 1) + (0.02 × 2)] × £1 000 000 = £102 000



Increase in wage bill over 5 years

= £10 000 × 12 × 5 × 10 = £6 000 000

Total (expected) cost of decision = £6 102 000


Thus, management should clearly go for the lower offer and the possible
strike with its consequences, although many other factors would be considered
in practice before a final decision was made.
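The expected-cost comparison can be sketched as follows (all figures are those of the example; the function and constant names are our own):

```python
# Expected total cost of each pay offer. Figures from the problem:
# strike cost £1 000 000 per month, wage cost £10 000 per month per
# 1% increase, and a 5-year (60-month) agreement.
STRIKE_COST_PER_MONTH = 1_000_000
WAGE_COST_PER_PCT_MONTH = 10_000
AGREEMENT_MONTHS = 12 * 5

def expected_cost(pct_increase, p_accepted, strike_length_probs):
    """strike_length_probs maps a strike length in months to its
    probability given that a strike occurs at all."""
    mean_strike_months = sum(m * p for m, p in strike_length_probs.items())
    expected_strike_loss = ((1 - p_accepted) * mean_strike_months
                            * STRIKE_COST_PER_MONTH)
    wage_bill = WAGE_COST_PER_PCT_MONTH * AGREEMENT_MONTHS * pct_increase
    return expected_strike_loss + wage_bill

offer_5 = expected_cost(5, 0.10, {1: 0.20, 2: 0.50, 3: 0.30})
offer_10 = expected_cost(10, 0.90, {1: 0.98, 2: 0.02})
print(round(offer_5), round(offer_10))  # 4890000 6102000
```

Re-running the comparison with perturbed probabilities (for instance a 20% chance of outright acceptance of the 5% offer) is exactly the kind of sensitivity analysis recommended in section 1.2.11.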

1.3 Problems for Solution


1. Four playing cards are drawn from a well-shuffled pack of 52 cards.
(a) What is the probability that the cards drawn will be the four aces?
(b) What is the probability that the cards will be the four aces drawn in
order Spade, Heart, Diamond, Club?

2. Four machines-a drill, a lathe, a miller, and a grinder-operate independently


of each other. Their utilisations are: drill 50%, lathe 40%, miller 70%, grinder
80%.
(a) What is the chance of both drill and lathe not being used at any instant
of time?
(b) What is the chance of all machines being in use?
(c) What is the chance of all machines being idle?

3. A man fires shots at a target, the probability of each shot scoring a hit being
1/4 independently of the results of previous shots. What is the probability that
in three successive shots
(a) he will fail to hit the target?
(b) he will hit the target at least twice?

4. Five per cent of the components in a large batch are defective. If five are
taken at random and tested
(a) What is the probability that no defective components will appear?
(b) What is the probability that the test sample will contain one defective
component?
(c) What is the probability that the test sample will contain two defective
components?

5. A piece of equipment will only function if three components, A, B and C,


are all working. The probability of A's failure during one year is 5%, that of
B's failure is 15%, and that of C's failure is 10%. What is the probability that
the equipment will fail before the end of one year?

6. A certain type of seed has a 90% germination rate. If six seeds are planted,
what is the chance that
(a) exactly five seeds will germinate?
(b) at least five seeds will germinate?

7. A bag contains 7 white, 3 red, and 5 black balls. Three are drawn at random
without replacement. Find the probabilities that (a) no ball is red, (b) exactly
one is red, (c) at least one is red, (d) all are of the same colour, (e) no two are
of the same colour.

8. If the chance of an aircraft failing to return from any single operational


flight is 5%
(a) what is the chance that it will survive 10 operational flights?
(b) if such an aircraft does survive 10 flights, what is the chance that it will
survive a further 10 flights?
(c) if five similar aircraft fly on a mission, what is the chance that exactly
two will return?

9. If the probability that any person 30 years old will be dead within a year is
0.01, find the probability that out of a group of eight such persons, (a) none,
(b) exactly one, (c) not more than one, (d) at least one will be dead within a
year.

10. A and B arrange to meet between 3 p.m. and 4 p.m., but that each should
wait no longer than 5 min for the other. Assuming all arrival times between
3 o'clock and 4 o'clock to be equally likely, find the probability that they meet.

11. A manufacturer has to decide whether or not to produce and market a new
Christmas novelty toy. If he decides to manufacture he will have to purchase a
special plant and scrap it at the end of the year. If a machine costing £10 000
is bought, the cost of manufacture will be £1 per unit; if he buys a
machine costing £20 000 the cost will be 50p per unit. The selling
price will be £4.50 per unit.
Given the following probabilities of sales:
Sales (units) 2000 5000 10 000
Probability 0.40 0.30 0.30
What is the decision with the best pay-off?

12. Three men arrange to meet one evening at the 'Swan Inn' in a certain town.

There are, however, three inns called 'The Swan' in the town. Assuming that each
man is equally likely to go to anyone of these inns
(a) what is the chance that none of the men meet?
(b) what is the chance that all the men meet?

13. An assembly operator is supplied continuously with components x, y, and z


which are stored in three bins on the assembly bench. The quality levels of the
components are: (1) x, 10% defective; (2) y, 2% defective; (3) z, 5% defective.

Figure 1.5 [schematic of the assembly bench: bins of components x, y and z (two of x, one of y, two of z) feeding the assembly unit]

An assembly consists of two components of x, one component of y and two


components of z. If components are selected randomly, what proportion of
assemblies will contain
(a) no defective components?
(b) only one defective component?

14. A marketing director has just launched four new products onto the market.
A market research survey showed that the chance of any given retailer adopting
the products was
Product A 0.95 Product C 0.80
ProductB 0.50 ProductD 0.30
What proportion of retailers will (a) take all four new products, (b) take
A, Band C but not D?

1.4 Solutions to Problems


1. (a) Probability of the 1st card being an ace = 4/52
If the first card is an ace,
the probability of the 2nd card being an ace = 3/51
If the first two cards are aces,
the probability of the 3rd card being an ace = 2/50
If the first three cards are aces,
the probability of the 4th card being an ace = 1/49
By the multiplication law,
the probability of all four being aces = 4/52 × 3/51 × 2/50 × 1/49
= 0.000 0037

(b) Probability of the 1st card being the Ace of Spades = 1/52
If the first card is the Ace of Spades,
the probability of the 2nd card being the Ace of Hearts = 1/51
If the first two cards are the Aces of Spades and
Hearts, the probability of the 3rd card being the Ace of Diamonds = 1/50
If the first three cards are the Aces of Spades, Hearts
and Diamonds, the probability of the 4th card being the
Ace of Clubs = 1/49
By the multiplication law, the probability of drawing
four aces in the order Spades, Hearts, Diamonds, Clubs = 1/52 × 1/51 × 1/50 × 1/49
= 0.000 000 15
2. The utilisations can be expressed in probabilities as follows:
Probability of being used Probability of being idle
Drill 0.50 0.50
Lathe 0.40 0.60
Miller 0.70 0.30
Grinder 0.80 0.20
(a) By the multiplication law, the probability of drill and lathe being
idle = 0.50 × 0.60 = 0.30
(b) By the multiplication law, the probability of all machines
being busy = 0.50 × 0.40 × 0.70 × 0.80 = 0.112
(c) Probability of all machines being idle at any
instant = 0.5 × 0.6 × 0.3 × 0.2 = 0.018

3. (a) P(all three shots miss target) = 3/4 × 3/4 × 3/4 = 27/64 = 0.42

(b) P(hits target once) = (1/4 × 3/4 × 3/4) + (3/4 × 1/4 × 3/4) + (3/4 × 3/4 × 1/4) = 27/64 = 0.42
P(hits target at least twice) = 1 − (0.42 + 0.42) = 0.16
(This result can be checked by direct evaluation of the probabilities of
two hits and three hits.)
4. This problem is solved from first principles, although the binomial law can be
applied.
(a) Probability of selecting a good item from the large batch = 0.95

By the multiplication law, probability of selecting five good items from the
large batch = 0.95 x 0.95 x 0.95 x 0.95 x 0.95 = 0.77
(b) In a sample of five, one defective item can arise in the following five
ways (D = defective part, A = acceptable part):
D A A A A
A D A A A
A A D A A
A A A D A
A A A A D
The probability of each one of these mutually exclusive ways occurring
= 0.05 × 0.95 × 0.95 × 0.95 × 0.95 = 0.0407

The probability that a sample of five will contain one defective item
= 5 × 0.0407 = 0.2035
(c) In a sample of five, two defective items can occur in the following ways:
D D A A A   D A D A A   D A A D A   D A A A D   A D D A A
A D A D A   A D A A D   A A D D A   A A D A D   A A A D D
or in 10 ways.
Probability of each separate way = 0.05^2 × 0.95^3 = 0.00214
Probability that the sample will contain two defectives
= 10 × 0.00214 = 0.0214
It will be seen that permutations increase rapidly and the use of basic laws
is limited. The binomial law is of course the quicker method of solving this
problem, particularly if binomial tables are used.
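The numbers of arrangements counted above are binomial coefficients, C(5, 1) = 5 and C(5, 2) = 10, which is why the binomial law reproduces the first-principles answers. A quick check (our own sketch; the last digits differ slightly from the values above because those were built from the rounded intermediate 0.0407):

```python
from math import comb
from itertools import permutations

# The 10 arrangements of two defectives among five items really are C(5, 2):
assert comb(5, 1) == 5
assert len(set(permutations("DDAAA"))) == comb(5, 2) == 10

def binomial_pmf(x, n, p):
    """P(x successes in n independent trials, success probability p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

for x in (0, 1, 2):
    print(x, round(binomial_pmf(x, 5, 0.05), 4))
# 0 0.7738
# 1 0.2036
# 2 0.0214
```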

5. The equipment would fail either if A, or B, or C were to fail, or if any


combination of these three were to fail.
Thus the probability of the equipment failing for any reason = I-probability
that the equipment operates for the whole year.
Probability that A does not fail = 0.95
Probability that B does not fail = 0.85
Probability that C does not fail = 0.90
Probability that the equipment does not fail = 0.95 x 0.85 x 0.90 = 0.7268
Probability that the equipment will fail = 1-0.7268 = 0.2732

6. (a) P(5 seeds germinating) = 6 × 0.9^5 × 0.1 = 0.3543

(b) P(at least 5 seeds germinating) = P(5 or more) = P(5 or 6)
= P(5) + P(6) = 0.3543 + 0.9^6 = 0.8858

7. Conditional probability:
(a) Probability that no ball is red = 12/15 × 11/14 × 10/13 = 0.4835
(b) Probability that 1 ball is red = 3 × (3/15 × 12/14 × 11/13) = 0.4352
(c) Probability that at least 1 is red = 1 − 0.4835 = 0.5165
(d) Probability that all are the same colour
= P(all white) + P(all red) + P(all black)
= (7/15 × 6/14 × 5/13) + (3/15 × 2/14 × 1/13) + (5/15 × 4/14 × 3/13) = 0.1011
(e) Probability that no two are of the same colour = 6 × 7/15 × 3/14 × 5/13 = 0.231

8. (a) P(aircraft survives 1 flight) = 0.95

P(aircraft survives 10 flights) = 0.95^10 = 0.5987
(b) P(aircraft survives a further 10 flights, having survived ten)
= 0.95^10 = 0.5987, since the outcomes of the earlier flights do not affect the later ones
(c) P(exactly 2 of the 5 return) = C(5, 2) × 0.95^2 × 0.05^3 = 10 × 0.95^2 × 0.05^3 = 0.0011

9. (a) Probability that any 1 will be alive = 0.99

By the multiplication law, probability that all 8 will be alive = 0.99^8 = 0.92
∴ Probability that none will be dead = 0.92
(b) By the multiplication law, probability of any particular arrangement of
7 alive and 1 dead = 0.99^7 × 0.01. The number of ways this can happen is the
number of permutations of 8 items, of which 7 are of one kind and 1 of another.
Number of ways = 8!/(7! × 1!) = 8
By the addition law, probability that 7 will be alive and 1 dead
= 8 × 0.99^7 × 0.01 = 0.075
(c) By the addition law, probability of none or one being dead
= 0.92 + 0.075 = 0.995
Probability of not more than one being dead = 0.995

(d) Probability of none being dead = 0.92

Probability of 1 or more being dead = 1 − 0.92
Probability of at least 1 being dead = 0.08

10. At the present stage this is best done geometrically, as in figure 1.6.
A and B will meet if the point representing their two arrival times is in the
shaded area.
P(meet) = 1 − P(point in unshaded area) = 1 − (11/12)^2 = 23/144

Figure 1.6 [unit square of A's and B's arrival times between 3 and 4 o'clock; the shaded diagonal band contains the points where the two arrival times differ by no more than 5 min]
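The geometric answer, 23/144 ≈ 0.16, can also be checked by simulation (a sketch; the trial count and seed are arbitrary choices):

```python
import random

# A and B each arrive uniformly at random in a 60-minute window and
# meet if their arrival times differ by no more than 5 minutes.
random.seed(1)
trials = 200_000
meets = sum(
    abs(random.uniform(0, 60) - random.uniform(0, 60)) <= 5
    for _ in range(trials)
)
estimate = meets / trials
print(round(estimate, 2))   # close to 23/144 = 0.1597
```

With 200 000 trials the estimate is typically within about 0.002 of the exact value.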

11. There are three possibilities: (a) to produce the toys on the machine costing
£10 000; (b) to produce the toys on the machine costing £20 000; (c) not to
produce the toys at all.
The solution is obtained by calculating the expected profits for each
possibility.

(a) Profit on sales of 2000 = £[4.50 − (1 + 10 000/2000)] per unit
= −£1.50, or a loss of £1.50 per unit

Profit on sales of 5000 = £[4.50 − (1 + 10 000/5000)] per unit
= £(4.50 − 3) = +£1.50
or a profit of £1.50 per unit

Profit on sales of 10 000 = £[4.50 − (1 + 10 000/10 000)] = +£2.50
or a profit of £2.50 per unit
Expected profit = −£1.50 × 0.40 + £1.50 × 0.30 + £2.50 × 0.30
= £(−0.60 + 0.45 + 0.75) = +£0.60 per unit

(b) As before: profit on sales of 2000 = £[4.50 − (0.50 + 20 000/2000)]
= −£6.00 (i.e., a loss of £6.00 per unit)
Similarly, profit on sales of 5000 = £0, or break even
Profit on sales of 10 000 = +£2.00 per unit
Expected profit = −£6.00 × 0.4 + £0 × 0.3 + £2.00 × 0.3
= £(−2.40 + 0.60) = −£1.80 per unit

(c) Expected profit = 0

The solution is to install machine (a).
Note: If machine (a) had also given a loss, then the solution would have been not to
produce at all.
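The pay-off computation for problem 11 can be sketched as follows (the names are our own; per-unit profit is the selling price minus the sum of the unit cost and the machine cost spread over the units sold):

```python
# Expected per-unit profit for each machine alternative in problem 11.
PRICE = 4.50
SALES = (2000, 5000, 10_000)       # possible sales levels (units)
PROBS = (0.40, 0.30, 0.30)         # their probabilities

def expected_unit_profit(machine_cost, unit_cost):
    return sum(p * (PRICE - (unit_cost + machine_cost / s))
               for s, p in zip(SALES, PROBS))

profit_a = expected_unit_profit(10_000, 1.00)   # machine (a)
profit_b = expected_unit_profit(20_000, 0.50)   # machine (b)
print(round(profit_a, 2), round(profit_b, 2))   # 0.6 -1.8
```

The break-even structure is visible immediately: machine (b) only pays off at the highest sales level, and its heavy loss at 2000 units dominates the expectation.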

12. (a) P(the men do not meet) = P(all go to different inns)
= P(1st goes to any) × P(2nd goes to one of the other two)
× P(3rd goes to the last inn)
= 1 × 2/3 × 1/3 = 2/9

(b) P(all three men meet) = P(1st goes to any inn)
× P(2nd goes to the same inn)
× P(3rd goes to the same inn)
= 1 × 1/3 × 1/3 = 1/9

13. (a) There will be no defective components in the assembly if all five
components selected are acceptable ones. The chance of such an occurrence is
given by the product of the individual probabilities and is
0.90 x 0.90 x 0.98 x 0.95 x 0.95 = 0.7164

(b) If the assembly contains one defective component, anyone (but only
one) of the five components could be the defective. There are thus five mutually
exclusive ways of getting the required result, each of these ways having its
probability determined by multiplying the appropriate individual probabilities
together.
1st x component:  D   A   A   A   A
2nd x component:  A   D   A   A   A
y component:      A   A   D   A   A
1st z component:  A   A   A   D   A
2nd z component:  A   A   A   A   D
(A = acceptable part, D = defective part; each column is one of the five mutually exclusive ways.)
The probability of there being just one defective component in the assembly
is given by
2 × (0.10 × 0.90 × 0.98 × 0.95 × 0.95) + (0.90 × 0.90 × 0.02 × 0.95 × 0.95) +
+ 2 × (0.90 × 0.90 × 0.98 × 0.05 × 0.95) = 0.1592 + 0.0146 + 0.0754 = 0.2492

14. Assume the products to be independent of each other. Then


(a) Probability of taking all four new products
= 0.95 x 0.50 x 0.80 x 0.30 = 0.1140
(b) Probability of taking only
A, B, and C = 0.95 x 0.50 x 0.80 x (1-0.30) = 0.2660

1.5 Practical Laboratory Experiments and Demonstrations


Experience has shown that when students are being introduced to statistics, the
effectiveness of the course is greatly improved by augmenting it with a practical
laboratory course of experiments and demonstrations, irrespective of the
mathematical background of the students.
The three experiments described here are experiments 1,2, and 3 from the
authors' Laboratory Manual in Basic Statistics, which contains full details,
analysis and summary sheets.
Appendix 1 gives full details of experiment 1 together with the analysis and
summary sheets.
The following notes are for guidance on the experiments.

1.5.1 Experiment 1
This experiment, being the most comprehensive of the experiments in the
book, is unfortunately also the longest as far as data collection goes. However,
as will be seen from the points made, the results more than justify the time.
Should time be critical it is possible to omit experiment 1 and carry out
experiments 2 and 3, which are much speedier. In experiment 1 the data
collection time is relatively long since the three dice have to be thrown 100
times (this cannot be reduced without drastically affecting the results).

Appendix 1 contains full details of the analysis of eight groups' results for
the first experiment, and the following points should be observed in summarising
the experiment:
(1) Note the variation between the frequency distributions of the number of
ones (or the number of sixes) obtained by the different groups, and that the
distributions based on the total data (the sum of all groups) are closer to the
theoretical situation.
(2) The comparison of the distribution of the score of the coloured die with
that of the total score of the three dice shows clearly that the total score
distribution tends to a bell-shaped curve.

1.5.2 Experiment 2
This gives a speedy demonstration of Bernoulli's law. As n, the number of
trials, increases, the estimate of the probability p gets closer to the true
population value. For n = 1 the estimate is either p = 1 or p = 0 and, as n increases,
the estimates tend to get closer to p = 0.5. Figure 1.7 shows a typical result.
Figure 1.7 [typical plot of the estimated probability of a head against the number of tosses n: early estimates swing widely between 0 and 1, then settle towards p = 0.5 as n increases]

1.5.3 Experiment 3
Again this is a simple demonstration of probability laws and sampling errors.
Four coins are tossed 50 times and in each toss the number of heads is
recorded. See table 6 of the laboratory manual.
Note
It is advisable to use the specially designed shakers or something similar.
Otherwise the coins will roll or bias in the tossing will occur. The results
of this experiment are summarised in table 8 of the laboratory manual and the
variation in groups' results are stressed as is the fact that the results based on all
groups' readings are closer to the theoretical than those for one group only.

1.5.4 Summary of Experiments 1, 2, and 3


The carrying out of these experiments will have given students a feel for the
basic concepts of statistics. While in all other sciences they expect their results

to obey the theoretical law exactly, they will have been shown that in statistics
all samples vary, but an underlying pattern emerges. The larger the samples
used the closer this pattern tends to be to results predicted by theory. The
basic laws-those of addition and multiplication-and other concepts of
probability theory, have been illustrated.
Other experiments with decimal dice can be designed.†

Appendix 1 - Experiment 1 and Sample Results

Probability Theory
Number of persons: 2 or 3.

Object
The experiment is designed to illustrate
(a) the basic laws of probability
(b) that the relative frequency measure of probability becomes more
reliable the greater the number of observations on which it is based, that is,
Bernoulli's theorem.

Method
Throw three dice (2 white, 1 coloured) a hundred times. For each throw,
record in table 1
(a) the number of ones
(b) the number of sixes
(c) the score of the coloured die
(d) the total score of the three dice.
Draw up these results, together with those of other groups, into tables
(2,3, and 4).

Analysis
1. For each set of 100 results and for the combined figures of all groups,
calculate the probabilities that, in a throw of three dice:
(a) no face shows a one
(b) two or more sixes occur
(c) a total score of more than 13 is obtained

† Details from: Technical Prototypes (Sales) Limited, 1A West Holme Street, Leicester.

~o. No. Col- Total No. No. Col- Total No. No. Col- Total No. No. Col- Total
of oured score of of loured scare of of oured scare of of oured scare

.
Iof
1'5 6'5 die 1'5 6'5 die 1'5 6'5 die 1'5 6'5 die
0 I:'
"
\ ~ I b 0 \ ~ 0 0 :l... 8 0 0

I 0 2- f 2- 0 I I.f- 0 0 4- \ \ 0 0 :2- 1 0

0 0 Lt- 'Lt- 0 0 ~ 'Lt- 0 I b 12- 1 \ \ "I


1 0 ~ 9 0 0 ::2- 7 1 1 \ 10 0 1 2- \ \

..
..... c;-
1-
0
0
I
4-
(,
~

\b
\

I
0

1 I
, 0

1';2.
1

0 0
0 \ \
g
\
0
0

\ b
\
"
12-

0 I f I~ 0 0 4- I .. 0 0 Z- 'I 0 0 ." 7
0 0 ~ ~ 0 1 Lt \~ 0 0 :2- b \ \ \ 12-
1. 0 I ~ I 0 I b 0 I I~ 0 0 ~ 1:\
,
~

0 0 '3 12. I 0 I 10 0 1 4- 12- z- \ 'l


2- 0 I ~ 0 1 b 12- 0 1 b IS"" 0 0 ~ , 2

1
0
::z..
0

0
1
::.
';
:z.
7
14
0

0
2-

\
0
G.

b
I
,..
1 b

b
0

0
1
2-

0
0
"\

S""
I"
11
1-:3
0
0
0

, ..,
I
2-

4- \~
'I

\ 3
4- \ 0

0 2. ::. I~ 0 1 b \ 2- 0 0 2- 'i 0 , S"" 1 6

0 0 2- 7 \ 0 S"" 1 I 1 0 , 10 0 1 ~ ''"l.
0 b
" 0 I .. I 12.
I 12- I 1 10 0 ~ 0 4-
q ::.
,
1. 0 2- l.!- I 0 ~ 0 0 \ 1 0 1 4- 14-

2-
1
0
"
1
12-

7
0

0
0
0
3
..-
9
\ 0
0

0 I
4-
S-
11-
I;'
0

0
0

0
S""
4-
I,.

10

0 0 ~ 1 \ 0 S- 9 1 :2- I I ~ 0 0 S- \ 0

I
2..
0
0
1
I
b
Lt-
I
2- 0
I b
1
\ 0

7
0

I
:2-

I "
'!:.
14-
, 0
1
:z... 1
1
,
\ 12-
f/
1 I I 10 \ 0 I 7 1 \ I 12- 0 I b lIf-
O 0 '3 7 0 0 u- 1'2. 0 I b I~ 2- 0 \ 4-

Table 2 [for each of the eight groups, tally marks and the frequency of each total score (3 to 18), the total frequency over all groups, and the experimental probability of each score alongside the theoretical probability; the closing rows record the number of throws and each group's estimated probability of a total score of more than 13. The theoretical probabilities for the total score of three dice are:]

Total score:        3      4      5      6      7      8      9      10
Theoretical prob: 0.0046 0.0138 0.0278 0.0463 0.0694 0.0972 0.1157 0.1250

Total score:       11     12     13     14     15     16     17     18
Theoretical prob: 0.1250 0.1157 0.0972 0.0694 0.0463 0.0278 0.0138 0.0046

2. Compare these results with those expected from theory and comment on
the agreement both for individual groups and for the combined observations.

3. Draw probability histograms both for the score of the coloured die and for
the total score of the three dice, on page 27. Do this for your own group's
readings and for the combined results of all groups.

Comment on the agreement with the theoretical distributions.


Note: The theoretical probability distribution for the total score of three
dice is shown in table 2.
Table 3

[Table 3 records, for each group's 100 throws, the number of throws in which 0, 1, 2 or 3 ONES occurred and the resulting estimate of the probability that, in one throw, no face shows a one, together with the corresponding counts for SIXES and the estimate of the probability that two or more sixes appear. The combined experimental probabilities over all groups lie close to the theoretical values P(no face shows a one) = (5/6)^3 = 0.579 and P(two or more sixes) = 16/216 = 0.074.]
Table 4
~
II>
C
No. of No. of throws in which given score appears on the Probability of odd q
Group throws coloured die number

I 2 3 4 5 6

I \~ kO ::LLt- \:t.. \~ I \ 0'~7

~ \00 :2..S" I~ It.t- l~ 19 lb o· S" 'iJ


3 19 :;"'0 ~"1 ILr IL+ IS O· '51
4 IS- Is \~ I/..t- "J...D ::1..\ O' S 0
5 :1..0 I b :J-:L Ib 17 q o· ~~

6 \S \ <l \("" \ b .2...\ \ . O' S-l

7 l'fl \ '\ \3 I <J 17 I~ O· 5"" '9


8 \ \ \7 ::z..b 14- \ '9 17 O· S" ")..

Totals $]00 \:!.7 I";:,~ 11.t-" I \"7 11.4'1.+- I \ "'-


Experimental
[probability 0'171 0'\7~ O·\q\ O·\l.+b 0,\",0 O'I~'"
Theoretical
Iprobability 1:).("7 0'\1>7 0'11.7 0'1"7 0'1"7 0'\"7 0·50

IN

-
2 Theory of distributions

2.1 Syllabus Covered


Summary of data; frequency and probability distributions; histograms; samples
and populations; distribution types; moments and their calculation; suggested
experiments and demonstrations.

2.2 Resume of Basic Theory and Concepts


2.2.1 Introduction
The understanding of the concepts of distributions and their laws is fundamental
to the science of statistics. Variation occurs almost without exception in all
our activities and processes. For example nature cannot produce two of her
products alike-two 'identical' twins are never exactly alike; a description of
similarity is the saying 'as alike as two peas', yet study two peas from the same
pod and differences in size or colour or shape will become apparent. Consider
for example the heights of men. Heights between 1.70 m and 1.83 m are quite
common and heights outside this range are by no means rare.
Although it is not so obvious, man-made articles are also subject to the same
kind of variability. The manufacturer of washers realises that some washers
will have a greater thickness than others. The resistance of electrical filaments
made at the same time will not be exactly alike. The running cost of a department
in a company will not be exactly the same each week, although, off hand, there
is no reason for the difference. The tensile strength of a steel bar is not the same
at two different parts of the same bar. The ash content of coal in a truck is
different when a number of samples from the truck are tested. Differences in
the diameter of components being produced on the same lathe will be found.
The time taken to do a given job will vary from occasion to occasion.
In present-day manufacture, the aim is usually to make things as alike as
possible. Or, alternatively, the amount of variability is controlled by specification
so that variation between certain limits is permitted.
It is interesting to note that even with the greatest precision of manufacture,
Theory of Distributions 33

variability will still exist, providing the measuring equipment is sensitive


enough to pick it up.
Thus, variation will be seen to be present in all processes, to a greater or
lesser extent, and the use of distributions and their related theorems is
necessary to analyse such situations.

2.2.2 Basic Theory of Distributions


The basic concepts of distributions can be illustrated by considering any
collection of data such as the 95 values of the output time for an open hearth
furnace given in table 2.1. The output time is the overall time from starting to
charge to finishing tapping the furnace.

7.8 8.0 8.6 8.1 7.9 8.2 8.1 7.9 8.2 8.1
8.4 8.2 7.8 8.0 7.5 7.4 8.0 7.3 7.6 7.8
7.7 7.8 7.5 7.9 7.8 8.3 7.9 8.0 8.2 7.4
7.1 7.5 7.9 8.2 8.5 7.9 7.5 7.8 8.4 8.1
8.2 7.9 8.7 7.7 7.8 8.0 8.1 8.2 7.9 7.3
8.0 8.1 7.8 8.1 7.6 7.8 7.9 8.5 7.8
8.3 7.9 8.1 7.6 7.9 8.3 7.4 7.9 8.7
7.6 8.0 8.0 8.2 8.2 7.9 8.1 8.4 7.6
7.9 7.7 7.9 7.8 7.8 7.7 7.5 7.7 8.1
8.1 8.0 8.1 7.7 8.0 8.0 8.0 8.1 7.7

Table 2.1 Furnace output time (h)

Referring to these data, it will be seen that the figures vary one from the
other; the first is 7.8 h, the next 8.4 h and so on; there is one as low as 7.1 h and
one as high as 8.7 h.
In statistics the basic logic is inductive, and the data must be looked at as a
whole and not as a collection of individual readings.
It is often surprising to the non-statistician or deterministic scientist how
often regularities appear in these statistical counts.
The process of grouping data consists of two steps usually carried out
together.

Step 1. The data are placed in order of magnitude.


Step 2. The data are then summarised into groups or class intervals.
This process is carried out as follows:
(1) The range of the data is found, i.e.
largest reading - smallest reading = 8.7 - 7.1 = 1.6 h
(2) The range is then sub-divided into a series of steps called class intervals.

These class intervals are usually of equal size, although in certain cases unequal
class intervals are used. For usual sample sizes, the number of class intervals is
chosen to be between 8 and 20, although this should be regarded as a general
rule only. For table 2.1, class intervals of size 0.2 h were chosen, i.e.,
7.1-7.3, 7.3-7.5, ..., 8.7-8.9
(3) More precise definition of the boundaries of the class intervals is however
required, otherwise readings which fall, say, at 7.3 could be placed in either of two
class intervals.
Since in practice the reading recorded as 7.3 h could have any value between
7.25 h and 7.35 h (normal technique of rounding off), the class boundaries will
now be taken as:
7.05-7.25, 7.25-7.45, ..., 8.45-8.65, 8.65-8.85
Note: Since an extra digit is used there is no possibility of any reading's falling
on the boundary of a class.
The summarising of the data in table 2.1 into a distribution is shown in
table 2.2. For each observation in table 2.1 a stroke is put opposite the sub-range
into which the reading falls. The strokes are made in groups of five for easy
summation.

Value of variable   Tally                               Frequency distribution   Probability distribution

7.05-7.25   |                                             1      0.01
7.25-7.45   |||||                                         5      0.05
7.45-7.65   ||||| |||||                                  10      0.11
7.65-7.85   ||||| ||||| ||||| ||||                       19      0.20
7.85-8.05   ||||| ||||| ||||| ||||| ||||| ||             27      0.28
8.05-8.25   ||||| ||||| ||||| ||||| ||                   22      0.23
8.25-8.45   ||||| |                                       6      0.06
8.45-8.65   |||                                           3      0.03
8.65-8.85   ||                                            2      0.02
                                                 Total = 95      Total = 1.00

Table 2.2
The last operation is to total the strokes and enter the totals in the next to
last column in table 2.2, obtaining what is called a frequency distribution. There
is, for example, one reading in class interval 7.05-7.25, five readings in the
next, ten in the next, and so on. Such a table is called a frequency distribution
since it shows how the individuals are distributed between the groups or class
intervals. Diagrams are more easily assimilated so it is normal to plot the
frequency distribution, and this frequency histogram is shown in figure 2.1.

Figure 2.1. Frequency histogram (output data).


In plotting a histogram, a rectangle is erected on each class interval, the area
of the rectangle being proportional to the class frequency.
Note: Where class intervals are all of equal length, the height of the rectangle
is also proportional to the class frequency. Other examples of frequency
distributions and histograms are given in section 2.3.

2.2.3 Probability Distributions


The frequency distribution is often transformed into a probability distribution
by calculating the relative frequency or probability of a reading falling in each
class interval.
For example, the probability of a reading falling in the interval 7.45-7.65

    = number of readings in the class / total number of readings = 10/95 = 0.11

(See chapter 1 on measurement of probability.)
Probability distributions have a distinct advantage when comparing two or
more sets of data since the area under the curve has been standardised in all
cases to unity.
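The grouping and standardising steps above can be sketched in code (the class boundaries and frequencies are those of tables 2.1 and 2.2; the variable names and loop are illustrative only):

```python
# Class boundaries carry one extra digit (7.05, 7.25, ...) so that no
# reading can fall on a boundary; dividing each class frequency by the
# total number of readings gives the probability distribution.
boundaries = [round(7.05 + 0.2 * k, 2) for k in range(10)]   # 7.05 ... 8.85
frequencies = [1, 5, 10, 19, 27, 22, 6, 3, 2]                # from table 2.2
n = sum(frequencies)                                         # 95 readings

probabilities = [f / n for f in frequencies]                 # relative frequencies

for lo, hi, f, p in zip(boundaries, boundaries[1:], frequencies, probabilities):
    print(f"{lo:.2f}-{hi:.2f}  f = {f:2d}  p = {p:.2f}")
```

Before rounding for display the probabilities sum to exactly 1, which is what standardises the area under the histogram to unity.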

2.2.4 Concept of a Population


All the data of table 2.1 are summarised by means of the frequency distribution
shown in figure 2.1. The distribution was obtained from a sample of 95
observations. However, in statistics the analyst likes to think in terms of
thousands of observations; in fact he thinks in terms of millions or more and
thus he conceives an infinite population. Normally millions of observations
cannot be obtained, only hundreds at the most being available, and so the
statistician is forced to work with a finite number of readings. These readings
are thought of as a sample taken from an infinite population and in some way
representative of this population. Statisticians take this infinite population as a
smooth curve. This is substantiated by studying what happens to the shape of
the distribution as the sample size increases. Figure 2.2 illustrates this, the data

Figure 2.2. The effect of the sample size on the histogram shape.

here being taken from an experiment in a laboratory. A sample size of 100 gives
an irregular shape similar to those obtained from the data of output times.
However, with increasing sample size, narrower class intervals can be used and
the frequency distribution becomes more uniform in shape until with a sample of
10 000 it is almost smooth. The limit as the sample size becomes infinite is also
shown. Thus with small samples, irregularities are to be expected in the frequency
distributions, even when the population gives a smooth curve.
It is the assumption that the population from which the data were obtained
has a smooth curve (although the samples drawn from it may not) that enables
the statistician to use the mathematics of statistics.
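The effect shown in figure 2.2 is easy to reproduce by simulation. In the sketch below the population, bin width and sample sizes are illustrative choices, not the book's laboratory data: samples are drawn from a smooth symmetric population, and the departure of each histogram from symmetry shrinks as the sample grows.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

def relative_freqs(n, n_bins=12, lo=-3.0, hi=3.0):
    """Draw n values from a standard normal population and return the
    relative frequency in each of n_bins equal class intervals on [lo, hi)."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for _ in range(n):
        x = random.gauss(0.0, 1.0)
        if lo <= x < hi:
            counts[min(int((x - lo) / width), n_bins - 1)] += 1
    return [c / n for c in counts]

small = relative_freqs(100)       # irregular, like the small-sample histograms
large = relative_freqs(100_000)   # close to the smooth population curve

# Departure from symmetry: compare each class with its mirror image.
asym_small = sum(abs(a - b) for a, b in zip(small, reversed(small)))
asym_large = sum(abs(a - b) for a, b in zip(large, reversed(large)))
print(asym_small, asym_large)
```

The second figure printed is far smaller than the first: the large sample's histogram has almost settled onto the smooth, symmetric population curve.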

2.2.5 Moments of Distribution


The summarising of data into a distribution always forms the first stage in
statistical analysis. However, this summarising process must usually be taken
further since a shape is not easy to deal with.
The statistician, in the final summary stage, calculates measures from the
distribution, these measures being used to represent the distribution and thus
the original data.
Each measure is called a statistic. In calculating these measures or statistics,
the concept of moments is borrowed from mechanics. The distribution in
probability form is considered as lying on the x axis and the readings in each
interval as having the value of the mid-point of that interval, i.e., x1, x2, ..., xN.
If the probabilities associated with these variable values are p1, p2, ..., pN
(figure 2.3 shows this diagrammatically), then p1 + p2 + ... + pN = 1.

Figure 2.3

Consider now the 1st moment of the distribution about the origin:

    1st moment = Σ (i = 1 to N) pi xi = x̄ (the arithmetical average)

Thus the 1st statistic or measure is the arithmetical average x̄. Higher moments
are now taken about this arithmetical average rather than the origin.
Thus, the 2nd moment about the arithmetical average

    = Σ (i = 1 to N) pi (xi - x̄)²

This 2nd moment is called the variance in statistics, and its square root is called
the standard deviation.
Thus the standard deviation of the distribution

    = √[Σ pi (xi - x̄)²]

The higher moments are as follows:

    3rd moment about the average = Σ pi (xi - x̄)³

    4th moment about the average = Σ pi (xi - x̄)⁴

or, in general, the kth moment about the average = Σ pi (xi - x̄)^k

The first two moments, the mean and the variance, are by far the most
important.
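The definitions above translate directly into code. In this sketch the function name and layout are ours; the mid-point values and exact probabilities f/95 of table 2.2 are used, so the first two moments reproduce the mean and variance calculated later in section 2.2.8.

```python
def moment(x, p, k, about=0.0):
    """k-th moment of a discrete probability distribution about `about`."""
    return sum(pi * (xi - about) ** k for xi, pi in zip(x, p))

# Mid-points and exact probabilities f/95 for the furnace data of table 2.2.
x = [7.15, 7.35, 7.55, 7.75, 7.95, 8.15, 8.35, 8.55, 8.75]
p = [f / 95 for f in (1, 5, 10, 19, 27, 22, 6, 3, 2)]

mean = moment(x, p, 1)              # 1st moment about the origin: about 7.94 h
variance = moment(x, p, 2, mean)    # 2nd moment about the mean: about 0.095
sd = variance ** 0.5                # standard deviation: about 0.31 h
skew = moment(x, p, 3, mean)        # 3rd moment about the mean
```

Taking the higher moments about the mean rather than the origin is exactly the `about=mean` argument here.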

2.2.6 Resume of Statistical Terms used in Distribution Theory


Sample
A sample is any selection of data under study, e.g., readings of heights of men,
readings from repeated time studies.

Random Sample
A random sample is a sample selected without bias, i.e., one for which every
member of the population has an equal chance of being included in the sample.

Population or Universe
This is the total number of possible observations. This concept of a population
is fundamental to statistics. All data studied are in sample form and the
statistician's sample is regarded as having been drawn from the population of all
possible events. A population may be finite or infinite. In practice, many finite
populations are so large they can be conveniently considered as infinite in size.

Grouping or Classification of Numerical Data


The results are sub-divided into groups so that no regard is paid to variations
within the groups. The following example illustrates this.

Groupings Number of results

3.95-4.95 8
4.95-5.95 7
5.95-6.95 5

The class boundaries shown in this example are suitable for measurements
recorded to the nearest 0.1 of a unit. The boundaries chosen are convenient for
easy summary of the raw data since the first class shown contains all
measurements whose integer part is 4, the next class all measurements starting
with 5 and so on.
It would have been valid but less convenient to choose the classes as, say,
3.25-4.25, 4.25-5.25, ...
In grouping, any group is called a class and the number of values falling in
the class is the class frequency. The magnitude of the range of the group is
called the class interval, i.e., for 3.95-4.95 it is 1.
Number of Groups
For simplicity of calculation, the number of intervals chosen should not be too
large, preferably not more than twenty. Again, in order that the results obtained
may be sufficiently accurate, the number must not be too small, preferably
not less than eight.

Types of Variable
Continuous. A continuous variable is one in which the variable can take every
value between certain limits a and b, say.
Discrete. A discrete variable is one which takes certain values only, frequently
part or all of the set of positive integers. For example, each member of a
sample may or may not possess a certain attribute and the observation recorded
(the value of the variable) might be the number of sample members which possess
the given attribute.

Frequency Histogram
A frequency distribution shows the number of sample values falling into each class
interval when a sample is grouped according to the magnitude of the values. If the
class frequency is plotted as a rectangular block on the class interval, the
diagram is called a frequency histogram. Note: Area is proportional to frequency.

Probability Histograms
A probability histogram is the graphical picture obtained when the grouped
sample data are plotted, the class probability being erected as a rectangular
block on the class interval. The area above any class interval is equal to the
probability of an observation being in that class since the total area under the
histogram is equal to one.

Limiting form of Histogram


The larger the sample, the closer the properties of histograms and probability
curves become to those of the populations from which they were drawn, i.e.,
the limiting form.

Variate
A variate is a variable which possesses a probability distribution.

2.2.7 Types of Distribution


While there is much discussion as to the value of classifying distributions into
types, there is no doubt in the authors' minds that classification does help the
student to get a better appreciation of the patterns of variation met in practice.
Figure 2.4 gives the usually accepted classifications.

Type 1: Unimodal
Examples of this variation pattern are: intelligence quotients of children,
heights (and/or weights) of people, nearly all man-made objects when produced
under controlled conditions (length of bolts mass-produced on capstans, etc.).
A simple example of this type of distribution can be illustrated if one
assumes that the aim is to make each item or product alike but that there
exists a very large number of small independent forces deflecting the aim, and
under such conditions, a unimodal distribution arises. For example, consider
a machine tool mass-producing screws. The setter sets the machine up as
correctly as he can and then passes it over to the operator, and the screws
produced form a pattern of variation of type 1. The machine is set to produce
each screw exactly the same but, because of the large number of deflecting
forces present, such as small particles of grit in the cooling oil, vibrations in
the machine and slight variations in the metal, manufacturing conditions are not
constant; hence there is variation in the final product. (See simple quincunx
unit on page 61.)

Type 2: Positive Skew


Examples of this type of distribution are the human reaction time and other
types of variable where there is a lower limit to the values, i.e., distribution of
number of packages bought at a supermarket, etc.
If this type of distribution is met when a symmetrical type should be
expected it is indicative of the process being out of control.

Figure 2.4. Types of distribution: symmetrical, positive skew, negative skew,
bimodal, J-shaped and U-shaped.

Type 3: Negative Skew


True examples of this type are difficult to find in practice but can arise when
there is some physical or other upper constraint on the process.

Type 4: Bimodal
This type cannot be classified as a separate form unless more evidence of
measures conforming to this pattern of variation is discovered. In most cases
this type arises from the combination of two distributions of type 1 (see
figure 2.5).

Figure 2.5. Bimodal distribution arising from two type-1 distributions with
different means m1 and m2.

Type 5: J-Shaped or Negative Exponential


Examples of this type include the flow of water down a river, most service-time
distributions and the time intervals between accidents or breakdowns.

Type 6: U-Shaped
This type is fascinating in that its pattern is the opposite of type 1. A variable
where the least probable values are those around the average would not be
expected intuitively, and it occurs only rarely in practice. One example,
however, is the degree of cloudiness of the sky: at certain times of the year
the sky is more likely to be completely clear or completely cloudy than anything
in between.

2.2.8 Computation of Moments of Distribution


Dependent on the type of data and their range, the data may or may not be
grouped into class intervals. The calculation of moments is the same for either
grouped or ungrouped data, but in the case of grouped data, all the readings in
the class interval are regarded as lying at the centre of the interval. The method
used in this text and throughout all the examples makes use of a simple
transformation of the variate and is usually carried out on the frequency
distributions rather than on the probability distribution. This use of frequency
distributions is common to most text books and will be used here although
there is often advantage in using the probability distribution.

Let xi = value of the variate in the ith class interval
    fi = frequency of readings in the ith class interval
    pi = probability of a value in the ith class interval
    Σ fi = n, the total number of readings

The 1st moment (arithmetic average) = Σ fi xi / Σ fi = x̄
or = Σ pi xi

The 2nd moment (variance) = Σ fi (xi - x̄)² / Σ fi
or = Σ pi (xi - x̄)²

For computing purposes the formula for variance is usually modified to
reduce the effect of rounding errors. These errors can arise through use of the
calculated average x̄, which is generally a rounded number. If insufficient
significant figures are retained in x̄, each of the deviations (xi - x̄) will be in
error and the sum of their squares, Σ fi (xi - x̄)², will tend to be inaccurate.

Computation of Moments using Frequency Distributions


The variate xi is transformed to

    ui = (xi - x0)/c    or    xi = c ui + x0

where x0 = any value of x taken as an arbitrary average, c = class interval width.
It can easily be shown that

    1st moment: x̄ = x0 + c (Σ fi ui / Σ fi)

    2nd moment: (s')² = c² [Σ fi ui² / Σ fi - (Σ fi ui / Σ fi)²]

Example

The values given in table 2.3 have been calculated using the data from table 2.2.

Class interval   Mid-point (xi) (h)   Frequency fi    ui    fi ui   fi ui²

7.05-7.25        7.15                  1              -4     -4      16
7.25-7.45        7.35                  5              -3    -15      45
7.45-7.65        7.55                 10              -2    -20      40
7.65-7.85        7.75                 19              -1    -19      19
7.85-8.05        7.95                 27               0      0       0
8.05-8.25        8.15                 22              +1    +22      22
8.25-8.45        8.35                  6              +2    +12      24
8.45-8.65        8.55                  3              +3     +9      27
8.65-8.85        8.75                  2              +4     +8      32
                            Σfi = 95      Σ fi ui = -7      Σ fi ui² = 225

Table 2.3

Let x0 = 7.95, c = 0.20 h

    arithmetic average x̄ = 7.95 + 0.20 × (-7/95) = 7.94 h

    variance (s')² = (0.2)² [225/95 - (-7/95)²] = 0.095
Computation using Probability Distributions

Class interval   Mid-point (xi)   Probability (pi)    ui    ui pi   ui² pi

7.05-7.25        7.15             0.01                -4    -0.04   0.16
7.25-7.45        7.35             0.05                -3    -0.15   0.45
7.45-7.65        7.55             0.11                -2    -0.22   0.44
7.65-7.85        7.75             0.20                -1    -0.20   0.20
7.85-8.05        7.95             0.28                 0     0      0
8.05-8.25        8.15             0.23                +1    +0.23   0.23
8.25-8.45        8.35             0.06                +2    +0.12   0.24
8.45-8.65        8.55             0.03                +3    +0.09   0.27
8.65-8.85        8.75             0.02                +4    +0.08   0.32

Table 2.4

Let x0 = 7.95 and c = 0.2; then Σ pi ui = -0.09 and Σ pi ui² = 2.31.

The formulae for the moments are

    arithmetic average x̄ = x0 + c Σ pi ui = 7.95 + (-0.018) = 7.93 h

    variance (s')² = c² [Σ pi ui² - (Σ pi ui)²] = 0.2² (2.31 - 0.09²) = 0.092

which compares favourably with the results achieved using the frequency
distribution, in view of the rounding of the probabilities to the second
decimal place.
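The coded calculation of table 2.3 can be checked mechanically. The sketch below (variable names are ours) uses the frequencies of table 2.2 with x0 = 7.95 h and c = 0.2 h:

```python
f = [1, 5, 10, 19, 27, 22, 6, 3, 2]      # class frequencies (table 2.2)
u = list(range(-4, 5))                    # coded mid-points u = (x - x0)/c
x0, c = 7.95, 0.2

n = sum(f)                                               # 95 readings
sum_fu = sum(fi * ui for fi, ui in zip(f, u))            # -7
sum_fu2 = sum(fi * ui * ui for fi, ui in zip(f, u))      # 225

mean = x0 + c * sum_fu / n                               # 7.94 h after rounding
variance = c ** 2 * (sum_fu2 / n - (sum_fu / n) ** 2)    # about 0.095
print(round(mean, 2), round(variance, 3))                # 7.94 0.095
```

The coding costs nothing in accuracy: because u is an exact linear transform of x, the decoded mean and variance are identical to those computed from the raw mid-points.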

2.2.9 Sheppard's Correction


When calculating the moments of grouped distributions, the assumption that
the readings are all located at the centre of the class interval leads to minor
errors in these moments. It must be stressed that the authors do not consider
that these corrections, known as Sheppard's corrections, are of sufficient
magnitude in most problems, to be used.
However, it is only correct that they should be given:
Correction to 1st moment, x̄ = 0

Correction to 2nd moment = -c²/12

Thus the 1st moment calculation is unbiased, while the answer given for the
2nd moment should be reduced by c²/12.
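As a numerical illustration (a sketch using the furnace-data values of section 2.2.8; as noted above, the correction is rarely worth applying in practice):

```python
c = 0.2                      # class interval width (h)
grouped_variance = 0.0945    # 2nd moment from the grouped calculation
corrected = grouped_variance - c ** 2 / 12   # Sheppard's correction
print(round(corrected, 4))   # 0.0912: a change of under 4 per cent
```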

2.3 Problems for Solution


In the following problems, students are required to
(1) summarise data into distributions
(2) draw the frequency histogram
(3) calculate the mean and standard deviation of data.
While there is a large collection of problems given, tutors should select those
examples most relevant to their students' courses. Worked solutions are
given for all questions in section 2.4, but in the authors' opinion the answering
of two or three problems should be adequate.
Note: Students' answers may differ slightly from the given answers, depending
on the class intervals selected.
The distributions illustrate that with limited samples of 30 to 100 observations
the shapes of the distributions can tend in some cases to be relatively irregular.

1. In a work study investigation of an operator glueing labels onto square


biscuit tins, the following readings, in basic minutes, were obtained for the time
of each operation:

0.09 0.09 0.11 0.09 0.09 0.11 0.09 0.07 0.09 0.06
0.09 0.09 0.09 0.11 0.09 0.07 0.09 0.06 0.10 0.07
0.09 0.10 0.06 0.10 0.08 0.06 0.09 0.08 0.08 0.08
0.08 0.10 0.08 0.07 0.09 0.08 0.09 0.11 0.09 0.09
0.08 0.10 0.09 0.08 0.10 0.08 0.08 0.09 0.09 0.09
0.08 0.06 0.08 0.08 0.10 0.09 0.09 0.10 0.10 0.11

2. In the assembly of a Hoover agitator, time element number 2 consists of:


pick up two spring washers, one in each hand and place on spindle, pick up two
bearings and place on spindle, pick up two felt washers and place on spindle,
pick up two end caps and screw onto spindle.
The following data, in basic minutes, were obtained from 93 studies for the
time element number 2:

0.26 0.28 0.31 0.22 0.25 0.28 0.28 0.26 0.29 0.25
0.24 0.29 0.26 0.28 0.24 0.26 0.29 0.23 0.26 0.26
0.25 0.30 0.25 0.29 0.17 0.26 0.33 0.24 0.18 0.34
0.26 0.31 0.23 0.29 0.22 0.26 0.29 0.25 0.24 0.28
0.27 0.32 0.23 0.26 0.25 0.28 0.36 0.42 0.24 0.21
0.23 0.27 0.46 0.23 0.28 0.31 0.29 0.31 0.25
0.24 0.28 0.33 0.24 0.29 0.36 0.32 0.27 0.24
0.25 0.29 0.33 0.25 0.35 0.24 0.33 0.28 0.26
0.26 0.20 0.24 0.26 0.34 0.30 0.30 0.29
0.18 0.22 0.25 0.27 0.33 0.30 0.30 0.23

3. The time interval, in minutes, between the arrival of successive customers


at a cash desk of a self-service store was measured over 56 customers and the
results are given below:

1.05 1.68 0.78 1.10 0.32 1.61 0.10 0.43 3.70 0.09
0.21 2.71 2.12 2.81 3.30 0.15 0.54 3.12 0.80 1.76
1.14 0.16 0.31 0.91 0.18 0.04 1.16 2.16 1.48 0.63
0.57 0.65 4.60 1.72 0.52 2.32 0.08 0.62 3.80 1.21
1.16 0.58 0.57 0.04 1.19 0.11 0.05 2.68 2.08 0.01
0.15 0.42 0.25 0.05 1.88 3.90

4. The number of defects per shift from a large indexing machine are given
below for the last 52 shifts:
2 6 4 5 1 3 2 1 4 2 1 4 6
3 4 3 2 4 5 4 3 6 3 0 7 4
7 3 5 4 3 2 0 5 2 5 3 2 9
5 3 2 1 0 3 3 4 3 2

5. The crane handling times, in minutes, for a sample of 100 jobs lifted and
moved by an outside yard mobile crane are given below:
5 6 21 8 7 8 11 5 10 21
13 15 17 7 27 6 6 11 9 4
7 4 9 192 10 15 31 15 11 38
16 52 87 20 18 22 11 7 9 8
6 10 10 17 37 32 10 26 14 15
28 182 17 27 4 9 19 10 44 20
15 5 20 8 25 14 23 13 12 7
9 92 33 22 19 151 171 21 4 6
31 13 7 45 6 7 17 7 19 42
9 6 55 61 52 7 5 102 8 23

6. The lifetime, in hours, of a sample of 100 electric light bulbs is given


below:
1067 919 1196 785 1126 936 918 1156 920 1192
855 1092 1162 1170 929 950 905 972 1035 922
1022 978 832 1009 1157 1151 1009 765 958 1039
923 1333 811 1217 1085 896 958 1311 1037 1083
999 932 1035 944 1049 940 1122 1115 1026 1040
901 1324 818 1250 1203 1078 890 1303 1147 1289
1187 1067 1118 1037 958 760 1101 949 883 699
824 643 980 935 878 934 910 1058 867 1083
844 814 1103 1000 788 1143 935 1069 990 880
1037 1151 863 990 1035 1112 931 970 1258 1029

7. The number of goals scored in 57 English and Scottish league matches for
Saturday 23rd September, 1969, was:
0 2 3 3 5 2 2 1 4 4
2 3 3 3 2 2 0 4 6 2 5
6 1 4 4 3 4 2 7 6 2
3 6 4 2 4 3 3 3 6 8 3
5 3 3 2 3 1
1 3 5

8. The intelligence quotients of 106 children are given below:†


75 112 100 116 99 111 85 82 108 85
94 91 118 103 102 133 98 106 92 102
115 109 100 57 108 77 94 121 100 107
104 67 111 88 87 97 102 98 101 88
90 93 85 107 80 106 120 91 101 103
109 100 127 107 112 98 83 98 89 106
79 117 85 94 119 93 100 90 102 87
95 117 142 94 93 72 98 105 122 104
104 79 102 104 107 97 100 109 103 107
106 96 83 107 102 110 102 76 98 88

9. The sales value for the last 30 periods of a non-seasonal product are given
below in units of £100:
43 41 74 61 79 60 71 69 63 77
70 66 64 71 71 74 56 74 41 71
63 57 57 68 64 62 59 52 40 76

10. The records of the total score of three dice in 100 throws are given below:
16 4 9 12 11 8 15 13 12 13
8 7 6 13 10 11 16 14 7 12
14 14 4 13 9 12 8 10 12 14
8 4 10 6 9 10 13 12 13 13
16 7 13 12 9 8 10 11 12 10
15 12 4 16 10 9 13 10 9 12
9 4 14 13 7 6 11 9 15 8
5 12 7 6 7 13 13 11 13 14
12 7 10 12 12 12 13 9 16 4

2.4 Solutions to Problems


1. Range = 0.11 - 0.06 = 0.05 min.
Since only two significant figures are given in the data, there is no choice
regarding the class interval width.
Size of class interval = 0.01 min, giving only six class intervals (below the
preferred minimum of eight).

† These data were taken from Facts from Figures by M. J. Moroney, Pelican.
Figure 2.6. Glueing labels onto biscuit tins.

Class interval   Mid-point (x)   Frequency (f)    u     uf    u²f

0.055-0.065      0.06             5              -3    -15    45
0.065-0.075      0.07             4              -2     -8    16
0.075-0.085      0.08            14              -1    -14    14
0.085-0.095      0.09            23               0      0     0
0.095-0.105      0.10             9              +1     +9     9
0.105-0.115      0.11             5              +2    +10    20
                          Σf = 60      Σuf = -18      Σu²f = 104

Table 2.5

Transforming x = x0 + cu, let x0 = 0.09, c = 0.01.
1st moment about the origin = arithmetical mean,

    x̄ = x0 + c Σuf/Σf = 0.09 + 0.01 × (-18/60) = 0.087 min

Variance

    (s')² = c² [Σu²f/Σf - (Σuf/Σf)²] = 0.01² [104/60 - (-18/60)²]
          = 0.01² × 1.643 = 1.64 × 10⁻⁴

Standard deviation
s' = √(1.64 × 10⁻⁴) = 0.013 min
The histogram is shown in figure 2.6.
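As a cross-check on this solution (a sketch, not part of the original solution), the mean and standard deviation can also be computed directly from the 60 raw readings. Because every reading already lies at a class mid-point, the ungrouped results agree with the grouped ones:

```python
times = [  # the 60 basic-minute readings of problem 1
    0.09, 0.09, 0.11, 0.09, 0.09, 0.11, 0.09, 0.07, 0.09, 0.06,
    0.09, 0.09, 0.09, 0.11, 0.09, 0.07, 0.09, 0.06, 0.10, 0.07,
    0.09, 0.10, 0.06, 0.10, 0.08, 0.06, 0.09, 0.08, 0.08, 0.08,
    0.08, 0.10, 0.08, 0.07, 0.09, 0.08, 0.09, 0.11, 0.09, 0.09,
    0.08, 0.10, 0.09, 0.08, 0.10, 0.08, 0.08, 0.09, 0.09, 0.09,
    0.08, 0.06, 0.08, 0.08, 0.10, 0.09, 0.09, 0.10, 0.10, 0.11,
]
n = len(times)
mean = sum(times) / n
variance = sum((t - mean) ** 2 for t in times) / n   # 2nd moment about the mean
sd = variance ** 0.5
print(round(mean, 3), round(sd, 3))                  # 0.087 0.013
```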

2. Range = 0.46 - 0.17 = 0.29 min; size of class interval = 0.03 min, giving
9-10 class intervals.

Class interval   Mid-point (x)   Frequency (f)    u     uf    u²f

0.165-0.195      0.18             3              -3     -9    27
0.195-0.225      0.21             5              -2    -10    20
0.225-0.255      0.24            25              -1    -25    25
0.255-0.285      0.27            26               0      0     0
0.285-0.315      0.30            19              +1    +19    19
0.315-0.345      0.33            10              +2    +20    40
0.345-0.375      0.36             3              +3     +9    27
0.375-0.405      0.39             0              +4      0     0
0.405-0.435      0.42             1              +5     +5    25
0.435-0.465      0.45             1              +6     +6    36
                          Σf = 93      Σuf = +15      Σu²f = 219

Table 2.6
(For histogram, see figure 2.7.)
(For histogram see figure 2.7.)

Calculation of the Mean and Standard Deviation


Transform x = x0 + cu; let x0 = 0.27, c = 0.03.

Average time

    x̄ = x0 + c Σuf/Σf = 0.27 + 0.03 × (15/93) = 0.275 min

Variance of sample

    (s')² = c² [Σu²f/Σf - (Σuf/Σf)²] = 0.03² [219/93 - (15/93)²]
          = 0.03² × 2.33 = 0.0021

Standard deviation s' = 0.046 min

Figure 2.7. Time taken to assemble Hoover agitator.

3. Range = 4.60 - 0.01 = 4.59 min; width of class interval = 0.5 min.

Class interval   Frequency (f)    u     uf    u²f

0-0.499          19              -2    -38    76
0.50-0.999       11              -1    -11    11
1.00-1.499        7               0      0     0
1.50-1.999        6              +1     +6     6
2.00-2.499        4              +2     +8    16
2.50-2.999        3              +3     +9    27
3.00-3.499        2              +4     +8    32
3.50-3.999        3              +5    +15    75
4.00-4.499        0              +6      0     0
4.50-4.999        1              +7     +7    49
          Σf = 56      Σuf = +4      Σu²f = 292

Table 2.7
(For histogram, see figure 2.8.)
Transform
x =xo + cu
Let
xo = 1.25, c = 0.50
1st moment about the origin = arithmetic average,
__
x - xo + c
~uf _
~f -
±_ .
1.25 + 0.50 x 56 - 1.29 mm
Figure 2.8. Interval between arrival of customers.

Variance of the sample

    (s')² = c² [Σu²f/Σf - (Σuf/Σf)²] = 0.5² [292/56 - (4/56)²] = 0.25 × 5.21
          = 1.30

Standard deviation of sample s' = 1.14 min

4. Range = 9 - 0 = 9 defectives; width of class interval = 1 defective.

Number of defectives   Number of shifts (f)    u     uf    u²f

0                       3                     -3     -9    27
1                       7                     -2    -14    28
2                       9                     -1     -9     9
3                      12                      0      0     0
4                       9                     +1     +9     9
5                       6                     +2    +12    24
6                       3                     +3     +9    27
7                       2                     +4     +8    32
8                       0                     +5      0     0
9                       1                     +6     +6    36
                Σf = 52      Σuf = +12      Σu²f = 192

Table 2.8
(For histogram see figure 2.9.)
Figure 2.9. Number of defects in indexing machine.

Transform x = x0 + cu; let x0 = 3, c = 1 defective.

Average number of defectives per shift = 3 + 1 × (12/52) = 3.2 per shift

Variance of the sample

    (s')² = 1² [192/52 - (12/52)²] = 3.64

Standard deviation = 1.9

5. Range = 192 - 4 = 188 min.


In this case, if equal class interval widths were chosen, then a width of
perhaps 20 min would be suitable. However, as can be checked, in the case of
the J-shaped distribution unequal class intervals give a better summary.

Class interval   Mid-point (x)   Frequency (f)     u        uf       u²f

0- 9.99           5              35               -3      -105      315
10- 19.99        15              30               -2       -60      120
20- 29.99        25              15               -1       -15       15
30- 39.99        35               6                0         0        0
40- 49.99        45               3               +1        +3        3
50- 69.99        60               4               +2.5     +10       25
70- 99.99        85               2               +5       +10       50
100-139.99      120               3               +8.5     +25.5    216.75
140-199.99      170               2              +13.5     +27      364.5
               Σf = 100      Σuf = -104.5      Σu²f = 1109.25

Table 2.9
(For histogram, see figure 2.10.)
Figure 2.10. Crane handling times.

Transform x = x0 + cu; let c = 10 min, x0 = 35.

Arithmetic average = 35 - 10 × (104.5/100) = 24.6 min

Variance of the sample

    (s')² = 10² [1109.25/100 - (-104.5/100)²] = 10² × 10.0 = 1000

Standard deviation s' = 31.6 min


6. Range = 1333 - 643 = 690 h; class interval chosen as 100.

Variate (x)       Frequency (f)    u     uf    u²f

549.5- 649.5        1             -4     -4    16
649.5- 749.5        1             -3     -3     9
749.5- 849.5       10             -2    -20    40
849.5- 949.5       26             -1    -26    26
949.5-1049.5       26              0      0     0
1049.5-1149.5      18             +1    +18    18
1149.5-1249.5      11             +2    +22    44
1249.5-1349.5       7             +3    +21    63
            Σf = 100      Σuf = +8      Σu²f = 216

Table 2.10
(For histogram, see figure 2.11.)

Transforming x = x0 + uc, where c = 100 h, x0 = 1000 h.

Average life of bulbs,

    x̄ = 1000 + 100 × (8/100) = 1008 h

Variance of the sample

    (s')² = 100² [216/100 - (8/100)²] = 21 536

Standard deviation s' = 146.6 h

Figure 2.11. Lifetime of electric light bulbs.

7. Range = 0-8 goals.


Discrete distribution
Width of class interval = 1 goal

Number of goals/match   Frequency (f)    u     uf    u²f

0                         2             -4     -8    32
1                         9             -3    -27    81
2                        11             -2    -22    44
3                        15             -1    -15    15
4                         8              0      0     0
5                         5             +1     +5     5
6                         5             +2    +10    20
7                         1             +3     +3     9
8                         1             +4     +4    16
              Σf = 57      Σuf = -50      Σu²f = 222

Table 2.11
(For histogram, see figure 2.12.)
x0 = 4, c = 1

Average goals/match

    x̄ = 4 + 1 × (-50/57) = 3.12

Variance of sample

    (s')² = 1² [222/57 - (50/57)²] = 3.13

Standard deviation of sample = 1.8

Figure 2.12. Number of goals scored in soccer matches.

8. Range = 142 - 57 = 85.

Suitable class intervals could be either 10 or 15. In this case, as in
Facts from Figures, the class interval is chosen as 10.

Class interval   Frequency (f)    u     uf    u²f

54.5- 64.5         1             -4     -4    16
64.5- 74.5         2             -3     -6    18
74.5- 84.5         9             -2    -18    36
84.5- 94.5        22             -1    -22    22
94.5-104.5        33              0      0     0
104.5-114.5       22             +1    +22    22
114.5-124.5        8             +2    +16    32
124.5-134.5        2             +3     +6    18
134.5-144.5        1             +4     +4    16
            Σf = 100      Σuf = -2      Σu²f = 180

Table 2.12
Transforming x = x0 + cu (for histogram, see figure 2.13), where x0 = 99.5,
c = 10.

Average intelligence quotient

    x̄ = 99.5 + 10 × (-2/100) = 99.3

Variance of sample

    (s')² = 10² [180/100 - (-2/100)²] = 180.0

Standard deviation of sample s' = 13.4
Figure 2.13. Intelligence quotients of children.

9. Range = 79 - 40 = 39; class interval width = 4.

Variate (x)   Frequency (f)    u     uf    u²f

39.5-43.5       4             -5    -20   100
43.5-47.5       0             -4      0     0
47.5-51.5       0             -3      0     0
51.5-55.5       1             -2     -2     4
55.5-59.5       4             -1     -4     4
59.5-63.5       5              0      0     0
63.5-67.5       3             +1     +3     3
67.5-71.5       7             +2    +14    28
71.5-75.5       4             +3    +12    36
75.5-79.5       2             +4     +8    32
        Σf = 30      Σuf = +11      Σu²f = 207

Table 2.13
(For histogram, see figure 2.14.)

where x0 = 61.5, c = 4.

Average sales/period

    x̄ = 61.5 + 4 × (11/30) = 63

Variance of sample

    (s')² = 4² [207/30 - (11/30)²] = 108.3

Standard deviation of sample s' = 10.4


Figure 2.14. Sales value of a product over 30 time periods.

10. Range = 16 - 4 = 12; use class interval of 2 units.

Variate (x)   Frequency (f)    u     uf    u²f

3.5- 5.5        7             -3    -21    63
5.5- 7.5       13             -2    -26    52
7.5- 9.5       17             -1    -17    17
9.5-11.5       18              0      0     0
11.5-13.5      29             +1    +29    29
13.5-15.5      11             +2    +22    44
15.5-17.5       5             +3    +15    45
        Σf = 100      Σuf = +2      Σu²f = 250

Table 2.14

(For histogram, see figure 2.15.)

where x0 = 10.5, c = 2.

Average score

    x̄ = 10.5 + 2 × (2/100) = 10.54

Variance of scores

    (s')² = 2² [250/100 - (2/100)²] = 10

Standard deviation s' = 3.16

Figure 2.15. Total score of three dice.

2.5 Practical Laboratory Experiments and Demonstrations


The three experiments described are experiments 4, 5 and 6 from the authors'
Laboratory Manual in Basic Statistics, pages 20-32.
As explained on page 20 of the manual, the authors leave the selection of
the populations to be used to the individual instructor: use whatever is
most suitable.
The objects of these experiments are firstly to show the basic concepts involved
and secondly to give experience in computing means and standard deviations.
Thus data collection should be as quick as possible and the following points
noted:

(1) How accurately should students measure? Obviously the unit of


measurement must be small enough to give approximately eight to twenty class
intervals and since the sample size of 50 is relatively small, the best number of
class intervals is at the bottom end of the range.
(2) Selection of class intervals: The tables for computing mean and
standard deviations are set out fully in the manual. However, with the kit, one
obvious and quick experiment is to measure 50 rods from either the red or
yellow population using the measuring rules. Again one of the experiments
designed by the authors is described below. This 'straw' experiment is perhaps
one of the best and most famous distribution experiments because of the various
points which can be made. Also described is the shove-halfpenny experiment.

2.5.1 The Drinking Straw Experiment


The simplicity and speed of this experiment illustrate the main requirements of
good design.
Here students (in groups of two or three) are given 50 or more ordinary
drinking straws (usual size 180-250 mm) and one of the standard measuring
rules from the kit.
One student acts as cutter for the whole experiment and cuts, with the use
of the rule, one straw to exactly 130 mm. With this straw laid on the bench as
the guide and holding the other straws 0.5-1 m away he then attempts to cut
50 straws to the 130 mm standard. As the straws are cut they are passed to
others in the group for measuring and the results are entered in the table in the
manual.
Students have to decide (or be guided) on the unit of measurement, i.e. at
least 6 to 8 class intervals. (Note: no feedback must take place in this experiment
and measurers should not let the cutter see results.) This experiment ends
usually in distributions whose shapes are either (a), (b) or (c) as illustrated in
figure 2.16.

(a) (b) (c)

Figure 2.16

For the case of
(a) The cutter has held the standard and produced a bell-shaped curve.
(b) Here either consciously or not, the standard has been changing.
(c) Here the negative skew distribution has arisen by the cutter again either
consciously or not, placing control on the short end of the straw.

2.5.2 The Shove Halfpenny Experiment


Number of persons: groups of 2 or 3.

Laboratory Equipment
Shove-halfpenny board or specially designed board (available from Technical
Prototypes (Sales) Ltd).

Method
After one trial, carry out 50 further trials, measuring the distance travelled each
time, the object being to send the disc the same distance at each trial.

Analysis
Summarise the data into a distribution, draw a histogram and calculate the mean
and standard deviation.

2.5.3 The Quincunx


This use of a quincunx developed by the authors is outlined below and gives an
effective simple demonstration of the basic concepts of variation.
This simple model, the principle of which was originally devised by Galton
to give a mechanical representation of the binomial distribution, is an
effective means of demonstrating to students the basic concepts of distributions.
The quincunx supplied with the statistical kit has ten rows of pins and seed
is fed in a stream through the pattern of pins. The setting is such that each pin
has a 50% chance of throwing any one seed to the right or to the left and thus,
a symmetrical distribution is obtained. With this large array of pins and the
speed of the stream, a simple analogy of the basic concept of distributions can
be demonstrated speedily and effectively.

Distributions can be regarded as arising under conditions where the aim is to


produce items as alike as possible (the stream in the model), but due to a large
number of deflecting forces (the pins) each one independent, the final product
varies and this variation pattern forms a distribution. For example, if one
considers an automatic lathe, mass producing small components, then material
and machine settings are controlled to produce products alike. However, due to
a very large number of small deflecting forces (no one of which has an appreciable
effect, otherwise it would be possible to correct for it), such as vibration,
particles of dirt in the cooling oil, small random variation in the material, the
final components give rise to a distribution similar to that generated by the
quincunx.
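The quincunx analogy can also be shown numerically. The simulation below is our sketch, not part of the kit: each seed is deflected left or right with probability 1/2 at each of the ten rows of pins, so the slot it lands in is its number of rightward deflections and the slot counts build up the symmetrical binomial(10, 0.5) pattern.

```python
# A rough software quincunx: 10 rows of pins, each deflecting a seed
# right (counted 1) or left (counted 0) with equal chance.
import random
from collections import Counter

random.seed(1)                      # fixed seed so the run is repeatable
ROWS, SEEDS = 10, 10_000

# A seed's final slot is its total number of rightward deflections.
slots = Counter(sum(random.random() < 0.5 for _ in range(ROWS))
                for _ in range(SEEDS))

for k in range(ROWS + 1):
    print(k, slots.get(k, 0))
```

The printed counts pile up around slot 5 and fall away symmetrically, mimicking the seed heap under the pins.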
3 Hypergeometric, binomial
and Poisson distributions

3.1 Syllabus Covered


Use of hypergeometric distribution; binomial distribution and its application;
Poisson distribution and its application; fitting of distributions to given data.

3.2 Resume of Theory and Concepts


Here only a brief resume of theory is given since this is already easily available
in a wide range of textbooks. However, this resume gives not only the basic
laws but also the basic concepts which are necessary to give a fuller under-
standing of the laws.

3.2.1 The Hypergeometric Law


If a group contains N items of which M are of one type and the remainder,
N - M, are of another type, then the probability of getting exactly x of the
first type in a random sample of size n is

    P(x) = C(M, x) C(N - M, n - x) / C(N, n)

where C(a, b) denotes the number of combinations of b items chosen from a.
3.2.2 The Binomial Law


If the probability of success of an event in a single trial is p and p is constant for
all trials, then the probability of x successes in n independent trials is

    P(x) = C(n, x) p^x (1 - p)^(n-x)

3.2.3 The Poisson Law


If the chance of an event occurring at any instant is constant in a continuum of
time, and if the average number of successes in time t is m, then the probability
of x successes in time t is

    P(x) = m^x e^(-m) / x!

where m = expected (or average) number of successes
      e = exponential base, e ≈ 2.718
Here, the event's happening has a definite meaning but no meaning can be
attached to its not happening. For example, the number of times lightning strikes
can be counted and have a meaning but this is not true of the number of times
lightning does not strike.
Again the Poisson law can be derived as the limit of the binomial under
conditions where p → 0 and n → ∞ but such that np remains finite and equal to m.
Then the probability of x successes is

    P(x) = m^x e^(-m) / x!

as before.
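This limit can be seen numerically. The sketch below (ours, not from the book) takes a binomial with large n and small p and compares its probabilities with the Poisson having m = np.

```python
# Binomial(n=500, p=0.004) against Poisson(m=2): the two columns of
# probabilities should agree closely, illustrating the limiting law.
from math import comb, exp, factorial

n, p = 500, 0.004
m = n * p                                   # m = 2

for x in range(6):
    binom = comb(n, x) * p**x * (1 - p)**(n - x)
    poisson = m**x * exp(-m) / factorial(x)
    print(x, round(binom, 4), round(poisson, 4))
```

For every x the two values differ only in the third or fourth decimal place; taking n larger (with np held at 2) shrinks the gap further.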

3.2.4 The essential requirement is for students to be able to decide which
distribution is applicable to a given problem; for this reason the problems are
all given under general headings because, for example, a statement that the
problems relate to the Poisson law almost defeats the purpose of setting them.

Tutors must stress the relationship between these distributions so that students
can understand the type to use for any given situation.
Tutors can introduce students to the use of the binomial distribution in place of
the hypergeometric distribution in sampling theory when n/N ≤ 0.10.
Students should be introduced to the use of statistical tables at this stage. For
all examples and problems, the complementary set of tables, namely Statistical
Tables by Murdoch and Barnes, published by Macmillan, has been used. As
mentioned in the preface, references to these tables will be followed by an
asterisk.

Note: The first and second moments of the binomial and Poisson distributions
are given below.

                                      Binomial     Poisson
1st moment (mean) μ                      np            m
2nd moment about the
mean (variance) σ²                   np(1 - p)         m

3.2.5 Examples on Use of Distributions


1. Assuming randomness in shuffling, what is the distribution of the number of
diamonds in a 13-card hand? What is the probability of getting exactly five
diamonds in the hand?
This is the hypergeometric distribution.
Type I-diamonds 13 cards
Type 2-not diamonds 39 cards
Probability of x diamonds in 13-card hand is

    P(x) = C(13, x) C(39, 13 - x) / C(52, 13)

Probability of exactly five diamonds in the hand

    P(5) = C(13, 5) C(39, 8) / C(52, 13)
         = [13!/(5! 8!)] × [39!/(8! 31!)] ÷ [52!/(13! 39!)] = 0.1247
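The factorial arithmetic is easily checked by machine. This sketch (ours) evaluates the hypergeometric probability directly with `math.comb`.

```python
# Hypergeometric check of worked example 1: exactly five diamonds
# in a random 13-card hand from a 52-card pack.
from math import comb

def hypergeom(N, M, n, x):
    """P(x successes) when sampling n items without replacement
    from N items of which M are successes."""
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

p5 = hypergeom(N=52, M=13, n=13, x=5)
print(round(p5, 4))   # 0.1247
```

The same function answers any of the hypergeometric problems in this chapter by changing N, M, n and x.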

2. A distribution firm has 50 lorries in service delivering its goods; given that
lorries break down randomly and that each lorry utilisation is 90%, what
proportion of the time will
(a) exactly three lorries be broken down?
(b) more than five lorries be broken down?
(c) less than three lorries be broken down?

This is the binomial distribution since the probability of success, i.e. the
probability of a lorry breaking down, is p = 0.10 and this probability is constant.
The number of trials n = 50.
(a) Probability of exactly three lorries being broken down

    P(3) = C(50, 3) × 0.10^3 × (1 - 0.10)^47


from table 1* of statistical tables
∴ P(3) = 0.8883 - 0.7497 = 0.1386
(b) Probability of more than five lorries being broken down

    P(x > 5) = Σ from x = 6 to 50 of C(50, x) × 0.10^x × (1 - 0.10)^(50-x)

from table 1*
P(x > 5) = 0.3839
(c) Probability of less than three lorries being broken down

    P(x < 3) = 1 - Σ from x = 3 to 50 of C(50, x) × 0.10^x × (1 - 0.10)^(50-x)
             = 1 - 0.8883 = 0.1117
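Instead of reading the cumulative binomial from table 1*, the three answers can be computed directly. The sketch below (ours) sums the binomial terms.

```python
# Worked example 2 by direct summation: binomial with n = 50, p = 0.10.
from math import comb

def binom_pmf(n, p, x):
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 50, 0.10
p3 = binom_pmf(n, p, 3)                               # exactly three
p_more5 = sum(binom_pmf(n, p, x) for x in range(6, n + 1))
p_less3 = sum(binom_pmf(n, p, x) for x in range(3))

print(round(p3, 4), round(p_more5, 4), round(p_less3, 4))
```

The printed values reproduce the table answers 0.1386, 0.384 (to table rounding) and 0.1117.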

3. How many times must a die be rolled in order that the probability of a 5
occurring at least once is at least 0.75?
This can be solved using the binomial distribution. The probability of
success, i.e. a 5 occurring, is p = 1/6.
Let k be the unknown number of rolls required; then the probability of x
5's in k rolls is

    P(x) = C(k, x) (1/6)^x (5/6)^(k-x)

The requirement is

    Σ from x = 1 to k of C(k, x) (1/6)^x (5/6)^(k-x) ≥ 0.75

Since

    1 - Σ from x = 1 to k = probability of not getting a 5 in k throws = (5/6)^k

we need

    (5/6)^k ≤ 1 - 0.75 = 0.25

    k = log 0.25 / log 0.8333 = (-0.60206)/(-0.07918) = 7.6

Number of throws required = 8
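The logarithmic solution can be confirmed by trial. This sketch (ours) simply increases k until the probability of at least one 5 reaches 0.75.

```python
# Smallest k with P(at least one 5 in k rolls) = 1 - (5/6)**k >= 0.75.
k = 1
while 1 - (5/6)**k < 0.75:
    k += 1
print(k)   # 8
```

At k = 7 the probability is 0.721, still short of 0.75; at k = 8 it is 0.767, so eight throws are needed.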

4. A firm receives very large consignments of nuts from its supplier. A random
sample of 20 is taken from each consignment. If the consignment is in fact 30%
defective, what is
(a) probability of finding no defective nuts in the sample?
(b) probability of finding five or more defective nuts in the sample?

This is strictly a hypergeometric problem but it can be solved by using the
binomial distribution since the probability of success, i.e. of obtaining a defective,
is p = 0.30, which can be assumed constant. The consignment is large enough to
ignore the very slight change in p as the sample is taken.


(a) From table 1*, p = 0.30, n = 20

    P(0) = C(20, 0) × 0.30^0 × (1 - 0.30)^20 = 1 - 0.9992 = 0.0008

(b) The probability of finding five or more defectives

    P(x ≥ 5) = Σ from x = 5 to 20 of C(20, x) × 0.30^x × (1 - 0.30)^(20-x)

from table 1* = 0.7625

5. The average usage of a spare part is one per month. Assuming that all machines
using the part are independent and that breakdowns occur at random, what is
(a) the probability of using three spares in any month?
(b) the level of spares which must be carried at the beginning of each month
so that the probability of running out of stock in any month is at most 1 in
100?
This is the Poisson distribution.
The expected usage m = 1.0
(a) ∴ Probability of using three spares in any month

    P(3) = 1.0^3 e^(-1.0) / 3!

from table 2*, P(3) = 0.0803 - 0.0190 = 0.0613

(b) This question is equivalent to: what demand in a month has a probability
of at most 0.01 of being equalled or exceeded?
Note: Runout if stock falls to zero.

From table 2*
Stocking four spares, probability of four or more = 0.0190
Stocking five spares, probability of five or more = 0.0037
:. Stock five, the probability of runout being 0.0037
(Note: It is usual to go to a probability of 1/100 or less.)
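The stock-setting step can be automated. This sketch (ours) searches for the smallest stock level whose run-out probability P(X ≥ s) is at most 1 in 100, with run-out meaning the stock falls to zero.

```python
# Worked example 5: Poisson demand with mean m = 1 per month.
from math import exp, factorial

def poisson_tail(m, s):
    """P(X >= s) for a Poisson variate with mean m."""
    return 1 - sum(m**x * exp(-m) / factorial(x) for x in range(s))

m = 1.0
s = 0
while poisson_tail(m, s) > 0.01:
    s += 1
print(s, round(poisson_tail(m, s), 4))   # 5 0.0037
```

Four spares leave a run-out probability of 0.0190, still above 0.01, so the search stops at five spares with probability 0.0037, as found from table 2*.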

6. The average number of breakdowns due to failure of a bearing on a large


automatic indexing machine is two per six months. Assume the failures are

random, and calculate and draw the probability distribution of the number of
failures per six months per machine over 100 machines.
Calculate the average and the standard deviation of the distribution.
This is the Poisson distribution.
Expected number of failures per machine per six months, m = 2.

Number of    Probability    Expected number
failures         Pi         of failures fi       u      u fi     u² fi

    0          0.1353           13.5            -2     -27.0      54.0
    1          0.2707           27.1            -1     -27.1      27.1
    2          0.2707           27.1             0       0         0
    3          0.1804           18.0            +1     +18.0      18.0
    4          0.0902            9.0            +2     +18.0      36.0
    5          0.0361            3.6            +3     +10.8      32.4
    6          0.0121            1.2            +4      +4.8      19.2
7 or over      0.0045            0.5            +5      +2.5      12.5
               1.0000        Σfi = 100               Σu fi = 0   Σu² fi = 199.2
Table 3.1. The values have been calculated from table 2 * of statistical tables.

Transform x = uc + x₀
where x₀ = 2
      c = 1
The arithmetical average

    x̄ = 2 + (0/100) = 2.0

Variance

    (s')² = [199.2 - (0)²/100]/100 = 1.99 ≈ 2.0

both agreeing with the Poisson values, mean = variance = m = 2.

3.2.6 Special Examples of the Poisson Distribution of General


Interest
The following examples have been chosen to show the use of the Poisson
distribution and to illustrate clearly the tremendous potential of statistics,
that is, the logic of inference.

Students will be introduced here to some of the logic used later so that they
can see, even at this introductory stage, something of the overall analysis using
statistical methods.

1. Goals Scored in Soccer


Problem 7 in chapter 2 (page 47) gives the data to illustrate this example. The
distribution of actual goals scored in the 57 matches is given in table 3.2. The
mean of this distribution is easily calculated, as in chapter 2, as
average number of goals/match m = 3.l

Number of goals/match    0   1   2   3   4   5   6   7   8   Total

Frequency                2   9  11  15   8   5   5   1   1    57

Table 3.2

If we set up the hypothesis that goals occur at random at a constant average
rate, i.e. that it does not matter which team is playing, then the Poisson
distribution should fit these data. Using table 2* of statistical tables the
probabilities are given in table 3.3, together with the Poisson frequencies.
should fit these data. Using table 2* of statistical tables the probabilities are
given in table 3.3, together with the Poisson frequencies.

Number of goals/match     0     1     2     3     4     5     6     7     8    Total

Poisson probability     0.045 0.140 0.217 0.223 0.173 0.107 0.056 0.024 0.014  1.000
distribution
Poisson frequency        2.6  7.98  12.4  12.7   9.9   6.1   3.2   1.4   0.8    57
distribution
Actual frequency           2     9    11    15     8     5     5     1     1    57
distribution

Table 3.3

The agreement will be seen to be fairly close and when tested (see chapter 8), is
a good fit. It is interesting to see that the greater part of the variation is due to this
basic law of variation. However, larger samples tend to show that the Poisson
does not give a correct fit in this particular context.
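The expected frequencies of table 3.3 can be generated directly from the Poisson law. This sketch (ours) uses the observed mean m = 3.1 and the 57 matches, lumping 8 goals or more into the last class so the probabilities sum to one.

```python
# Expected Poisson frequencies for the goals data of tables 3.2/3.3.
from math import exp, factorial

m, matches = 3.1, 57
observed = [2, 9, 11, 15, 8, 5, 5, 1, 1]    # goals 0..8, from table 3.2

probs = [m**x * exp(-m) / factorial(x) for x in range(8)]
probs.append(1 - sum(probs))                # class '8 or more'

expected = [matches * p for p in probs]
for x, (o, e) in enumerate(zip(observed, expected)):
    print(x, o, round(e, 1))
```

The printed expected column matches the Poisson frequency row of table 3.3 to within rounding, and the same few lines fit a Poisson to any of the later examples by changing m, the class count and the observed frequencies.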

2. Deaths due to Horsekicks


The following example due to Von Bortkewitz gives the records for 10 army
corps over 20 years, or 200 readings of the number of deaths of cavalrymen due
to horsekicks. The frequency distribution of number of deaths per corps per
year is shown in table 3.4.

Number of deaths/corps/year     0     1     2     3     4
Frequency                     109    65    22     3     1

Table 3.4
From this table the average number of deaths/corps/year, m = 0.61
Setting up the null hypothesis, namely, that the probability of a death has been
constant over the years and is the same for each corps, is equivalent to
postulating that this pattern of variation follows the Poisson law. Fitting a
Poisson distribution to these data and comparing the fit, gives a method of
testing this hypothesis. Using table 2* of statistical tables, and without
interpolating, i.e. use m = 0.60, gives the results shown in table 3.5

Number of deaths         0        1        2        3     4 or more   Total

Poisson probability   0.5488   0.3293   0.0988   0.0197    0.0034     1.0000
Poisson frequency      109.8     65.9     19.8      3.9      0.68      200
Actual frequency        109       65       22        3        1        200

Table 3.5

Comparison of the actual pattern of variation shows how closely it follows


the basic Poisson law, indicating that the observed differences between the
corps are entirely due to chance or a basic law of nature.

3. Outbreaks of War
The data in table 3.6 (from Mathematical Statistics by J. F. Ractliffe, O.U.P.) give
the number of outbreaks of war each year between the years 1500 and 1931
inclusive.

Number of outbreaks of war     0     1    2    3    4    5   Total

Frequency                    223   142   48   15    4    0    432

Table 3.6

Setting up a hypothesis that war was equally likely to break out at any instant
of time during this 432-year period would give rise to a Poisson distribution. The
fitting of this Poisson distribution to the data gives a method of testing this
hypothesis.
The average number of outbreaks/year = 0.69 ≈ 0.70
Using table 2* of statistical tables, table 3.7 gives a comparison of the actual
variation with that of the Poisson. Again comparison shows the staggering fact
that life has closely followed this basic law of variation.

Number of
outbreaks of war        0        1        2        3        4     5 or more   Total

Poisson probability   0.4966   0.3476   0.1217   0.0283   0.0050    0.0008    1.00
Poisson frequency      214.5    150.2     52.6     12.2      2.0      0.3      432
Actual frequency        223      142       48       15        4        0       432

Table 3.7

4. Demand for Spare Parts for B-47 Aircraft Airframe


                                          Units           Number of weeks
Item                                    demanded
                                        per week      Observed     Poisson^a

Seal: $3 each (lAFE 15-24548-501)           0             48           46
                                            1             12           16
                                            2              2            3^b
                                            3              2
                                           50^c            1
Mean demand per week 0.3^c

Dome assembly: $610 each                    0             33           26
(lAFE4-2608-826)                            1             17           24
                                            2              7           11
                                            3              5            3
                                            4              2            1^d
                                            6              1
Mean demand per week 0.9

Boost assembly-elevator control             0             20           17
$800 each (lAFE 15-24377-27, and            1             22           23
substitute -20, -504)                       2             13           15
                                            3              5            7
                                            4              3            2
                                            5              2            1^e
Mean demand per week 1.3

Table 3.8. Observed frequencies of demand compared with derived Poisson
distributions
a Computed by assuming that the observed mean demand per week is the mean of the
Poisson distribution.
b Two units or more.
c A demand of 50 units by a single aircraft was recorded on 23 December 1953. The
mean used to fit the Poisson distribution (0.3) was obtained omitting this demand.
d Four units or more.
e Five units or more.

The actual demands at the MacDill Airforce Base per week for three spares
for B-47 airframe over a period of 65 weeks are given in table 3.8.
The Poisson frequencies are obtained by using the statistical tables and
table 3.8 gives a comparison of the actual usage distribution with that of the
Poisson distribution.
The theoretical elements assuming the Poisson distribution are shown in the
table also. It will be seen that these distributions agree fairly well with actual
demands.

5. Spontaneous Ignitions in an Explosives Factory


The distribution of the number of spontaneous ignitions per day in an explosives
factory is shown in table 3.9 and covers a period of 250 days. The Poisson
frequencies, using the same mean number of explosions per day, have been
calculated and the fit found to be good. This implies that the explosions occur
at random, thus making it very unlikely that there is any systematic cause of
ignition.

Number of ignitions   Observed number of days   Poisson number of days

       0                        75                       74.2
       1                        90                       90.1
       2                        54                       54.8
       3                        22                       22.2
       4                         6                        6.8
       5                         2                        1.6
   6 or more                     1                        0.4

Table 3.9. Mean number of ignitions per day = 1.216.

Authors' Special Note


In all the foregoing examples, the (actual) observed distribution is compared
with the (theoretical) expected distribution assuming the null hypothesis to be
true. It should be stressed here that the degree of agreement between the
observed and theoretical distributions can only be assessed by special tests,
called significance tests. These tests will be carried out in chapter 8 later in the
book.

3.3 Problems for Solution


1. A book of 600 pages contains 600 misprints distributed at random. What is
the chance that a page contains at least two misprints?

2. If the chance that anyone of ten telephone lines is busy at any instant is 0.2,
what is the chance that five of the lines are busy?

3. A sampling inspection scheme is set up so that a sample of ten components is


taken from each batch supplied and if one or more defectives are found the
batch is rejected. If the suppliers' batches are (a) 10% defective and
(b) 20% defective, what percentage of the batches will be rejected?

4. In a group of five machines, which run independently of each other, the


chance of a breakdown on each machine is 0.20. What is the probability of
breakdown of 0, 1, 2, 3, 4, 5 machines? What is the expected number of
breakdowns?

5. In a quality control scheme, samples of five are taken from the production at
regular intervals of time.
What number of defectives in the samples will be exceeded 1 in 20 times if the
process average defective rate is (a) 10%, (b) 20%, (c) 30%?

6. In a process running at 20% defective, how often would you expect in a


sample of 20 that the rejects would exceed four?

7. From a group of eight male operators and five female operators a committee
of five is to be formed. What is the chance of
(a) all five being male?
(b) all five being female?
(c) how many ways can the committee be formed if there is exactly one
female on it?
8. In 1000 readings of the results of trials for an event of small probability, the
frequencies fi and the numbers xi of successes were:

xi      0     1     2    3    4   5   6   7
fi    305   365   210   80   28   9   2   1

Show that the expected number of successes is 1.2 and calculate the expected
frequencies assuming a Poisson distribution.
Calculate the variance of the distribution.

3.4 Solutions to the Problems


1. Assuming an average of one misprint per page, use of Poisson table 2* gives

    P(2 or more misprints) = 0.2642

2. P(5 lines busy) = C(10, 5) × 0.2^5 × 0.8^5 = 0.0264 from table 1* in statistical tables.

3. (a) Sample size n = 10
Probability of a defective p = 0.10
Reject on one or more defectives in sample of 10
From table 1*
∴ Probability of finding one or more defectives in 10 = 0.6513
∴ Percentage of batches rejected = 65.13
(b) Sample size n = 10
Probability of a defective p = 0.20
Reject on one or more defectives in sample of 10
From table 1*
Probability of finding one or more defectives in 10 = 0.8926
∴ Percentage of batches rejected = 89.26

4. n = 5, p = 0.20. From statistical tables the probabilities of 0, 1, 2, 3, 4, 5
machines breaking down have been calculated and are given in table 3.10.

Number of machines      Probability of
broken down             this number

       0                    0.33
       1                    0.41
       2                    0.20
       3                    0.05
       4                    0.01
       5                    0

Table 3.10 (probabilities approximate)

Expected number of breakdowns = np = 5 × 0.20 = 1

5. (a) n = 5
p = 0.10
From table 1*
∴ Probability of exceeding 1 = 0.0815
∴ Probability of exceeding 2 = 0.0086
1 in 20 times is a probability of 0.05
∴ The number of defectives exceeded 1 in 20 times is greater than 1 but less
than 2.

(b) n = 5
p = 0.20
From table 1*
Probability of more than 2 = 0.0579
∴ Number of defectives exceeded 1 in 20 times (approximately) is 2

(c) n = 5
p = 0.30
From table 1*
Probability of more than 3 = 0.0308
∴ Number of defectives exceeded 1 in 20 times is nearly 3

6. n = 20
p = 0.20
From table 1*
Probability of more than four rejects = 0.3704
:. Four will be exceeded 37 times in 100

7. (a) M = 8, N - M = 5, N = 13, n = 5

x = 5

Probability

    = C(8, 5) C(5, 0) / C(13, 5)
    = (8 × 7 × 6 × 5 × 4)/(13 × 12 × 11 × 10 × 9) = 0.044

(b) M = 8, N = 13, N - M = 5, n = 5
x = 0

Probability

    = C(8, 0) C(5, 5) / C(13, 5)
    = (5 × 4 × 3 × 2 × 1)/(13 × 12 × 11 × 10 × 9) = 0.00078

(c) Number of ways one female can be chosen from five

    = C(5, 1) = 5

Number of ways four males can be chosen from eight

    = C(8, 4) = 70

Total number of ways

    = 5 × C(8, 4) = 5 × (8 × 7 × 6 × 5)/(4 × 3 × 2 × 1) = 350
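The three committee answers can be verified with `math.comb`. This check is ours, not part of the book.

```python
# Problem 7: a committee of 5 drawn from 8 male and 5 female operators.
from math import comb

p_all_male = comb(8, 5) / comb(13, 5)          # (a) all five male
p_all_female = comb(5, 5) / comb(13, 5)        # (b) all five female
ways_one_female = comb(5, 1) * comb(8, 4)      # (c) exactly one female

print(round(p_all_male, 3), round(p_all_female, 5), ways_one_female)
# 0.044 0.00078 350
```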
8.
       x       f       u      uf     u²f
Assumed
mean → 0     305      -1    -305     305
       1     365       0       0       0
       2     210      +1     210     210
       3      80      +2     160     320
       4      28      +3      84     252
       5       9      +4      36     144
       6       2      +5      10      50
       7       1      +6       6      36
           Σf = 1000       Σuf = 201  Σu²f = 1317

Table 3.11

Expected number = x̄ = 1 + Σuf/Σf = 1 + 201/1000 = 1.201

Variance = [Σu²f - (Σuf)²/Σf]/Σf = (1317 - 40.4)/1000 = 1.277

3.5 Practical Laboratory Experiments and Demonstrations


The authors feel that of the three distributions the binomial lends itself best to
demonstration by laboratory experiments. Attempts to demonstrate a true
hypergeometric or Poisson distribution tend to be either very tedious and/or
relatively expensive.
However, with the use of the binomial sampling boxes† the basic concepts
and mechanics of the binomial can be speedily and effectively demonstrated.
The use of 6-sided dice and/or the special decimal dice also gives a simple
method for carrying out binomial distribution experiments.
Appendix 1 contains the full details of the experiment, together with a
sample set of results.
† Available in two sizes from Technical Prototypes Ltd, 1A, Westholme Street,
Leicester.

Appendix I-Experiment 7 and Sample Results

Binomial Distribution
Number of persons: 2 or 3.

Object
The experiment is designed to demonstrate the basic properties of the binomial
law.

Method
Using the binomial sampling box, take 50 samples of size 10 from the population,
recording in table 18 the number of coloured balls found in each sample.
(Note: Proportion of coloured (i.e. other than white) balls is 0.15.)

Analysis
1. Group the data of table 18 into the frequency distribution, using the top
part of table 19.
2. Obtain the experimental probability distribution of the number of coloured
balls found per sample and compare it with the theoretical probability
distribution.
3. Combine the frequencies for all groups, using the lower part of table 19,
and obtain the experimental probability distribution for these combined results.
Again, compare the observed and theoretical probability distributions.
4. Enter, in table 20, the total frequencies obtained by combining individual
groups' results. Calculate the mean and standard deviation of this distribution
and compare them with the theoretical values given by np and √[np(1 - p)]
respectively where, in the present case, n = 10 and p = 0.15.

Sample Results

 1-10     3  2  1  3  4  3  2  0  0  1
11-20     2  3  2  3  3  1  2  1  1  0
21-30     0  0  1  1  2  1  3  1  1  3
31-40     4  4  1  1  4  1  2  3  2  2
41-50     1  1  1  1  2  1  0  0  1  2

Table 3.12 (Table 18 of the laboratory manual)

Summarise these data in table 19.



                            Number of coloured balls in sample
                                                                        Total
                      0      1      2      3      4      5      6     frequency

Group 1
  experimental
  frequency           7     19     11      9      4                       50
  experimental
  probability       0.14   0.38   0.22   0.18   0.08                      1.0
  theoretical
  probability       0.197  0.347  0.276  0.130  0.040  0.008  0.001       1.0

Group results   1     7     19     11      9      4                       50
                2    12     20      8      8      1      1                50
                3    10     16     16      5      3                       50
                4     8     16     17      5      2      2                50
                5     7      5     17     13      7      1                50
                6     5     21     11     10      2             1         50
                7    12     16     13      8      1                       50
                8    13     17     11      7      2                       50
Total frequency
(all groups)         74    130    104     65     22      4      1        400
Experimental
probability         0.185  0.325  0.260  0.163  0.055  0.010  0.003       1.0

Table 3.13 (Table 19 of the laboratory manual)

Number of
coloured balls     Frequency
per sample x           f           fx       fx²

      0               74            0         0
      1              130          130       130
      2              104          208       416
      3               65          195       585
      4               22           88       352
      5                4           20       100
      6                1            6        36
      7
      8
      9
     10
Totals            Σf = 400     Σfx = 647   Σfx² = 1619

Table 3.14 (Table 20 of the laboratory manual)

For the distribution of number of coloured balls per sample of 10

    observed mean = Σfx/Σf = 647/400 = 1.618

Observed standard deviation

    = √[Σfx²/Σf - (Σfx/Σf)²] = √[1619/400 - (647/400)²] = 1.19

Theoretical mean = np = 10 × 0.15 = 1.5

Theoretical standard deviation = √[np(1 - p)] = √(10 × 0.15 × 0.85)
                               = √1.275 = 1.13
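Where the sampling boxes are not available, the experiment can be imitated in software. This simulation is our sketch, not part of the manual: it draws 400 samples of size 10 from a population with proportion 0.15 coloured and compares the sample statistics with np and √[np(1 - p)].

```python
# Simulated version of experiment 7: 400 samples of 10 balls,
# P(coloured) = 0.15 for each ball drawn.
import random
from math import sqrt

random.seed(2)                      # fixed seed for a repeatable run
n, p, samples = 10, 0.15, 400

counts = [sum(random.random() < p for _ in range(n)) for _ in range(samples)]

mean = sum(counts) / samples
sd = sqrt(sum(c**2 for c in counts) / samples - mean**2)

print(round(mean, 2), round(sd, 2))                       # observed
print(n * p, round(sqrt(n * p * (1 - p)), 2))             # 1.5 and 1.13
```

As in the physical experiment, the observed mean and standard deviation wander a little about the theoretical 1.5 and 1.13 from run to run.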
4 Normal distribution

4.1 Syllabus Covered


Equation of the normal curve; area under the normal curve; ordinates of the
normal curve; standardised normal variate; use of tables of area; fitting of
normal distribution to data; normal probability paper.

4.2 Resume of Theory


4.2.1 Introduction
The normal, or gaussian, distribution occupies a central place in the theory of
statistics. It is an adequate, and often very good, approximation to other
distributions which occur; examples of this are given in chapters 5 and 6. Many
of the advanced methods of statistics require the assumption that the basic
variables being used are normally distributed; the purpose of this is usually to
allow standard tests of significance to be applied to the results.
It often happens, however, that data summarised into a frequency
distribution (see chapter 2) are more or less normally distributed; that is, some
central value of the variable has the highest frequency of occurrence and the
class frequencies diminish near enough symmetrically on either side of the
central value. In such cases, it is very convenient to use the properties of the
normal distribution to describe the population. This chapter deals with the
main properties of the normal distribution.

4.2.2 Equation of the Normal Curve


Chapter 2 mentioned that, the greater the number of readings that are taken,
the more the outline of the plotted histogram tends to a smooth curve. If the
population is actually normal then this limiting shape of the histogram will be
similar to that in figure 4.1.
The curve can be described in terms of an equation so that the height of the
curve, y, can be expressed in terms of the value of the measured variable, x.

This equation is

    y = [1/(σ√(2π))] e^(-(x - μ)²/(2σ²))

where μ is the mean of the variable x
      σ is the standard deviation of x
      e is the well-known mathematical constant (= 2.718 approximately)
      π is another well-known mathematical constant (= 3.142 approximately)

This equation can be used to derive various properties of the normal distribution.
A useful one is the relation between area under the curve and deviation from the
mean, but before looking at this we need to refer to a standardised variable.

Figure 4.1

4.2.3 Standardised Variate


Any random variable, x, having mean, μ, and standard deviation, σ, can be
expressed in standardised form, i.e. x is measured from μ in multiples of σ. The
standardised variable is therefore given by (x - μ)/σ and is dimensionless.
In particular, if x is a normal variate then

    u = (x - μ)/σ

is a standardised normal variate.
Tables 3, 4 and 5* in statistical tables are tabulated in terms of this
standardised normal variate, u, and therefore they apply to any normal variate.

4.2.4 Area under Normal Curve


The total area under the normal curve is unity (as is the case for any probability
density function) and the area under the curve between two values of x, say a
and b (shown shaded in figure 4.2) gives the proportion of the population having

Figure 4.2. The shaded area represents Prob(a < x ≤ b).

values between a and b. This is equal to the probability that a single random
value of x will be bigger than a but less than b.
By standardising the variable and using the symmetry of the distribution,
table 3* can be used to find this probability as well as the unshaded areas in each
tail.

4.2.5 Percentage Points of the Normal Distribution


Table 4* gives percentage points (this is the common name although it is
actually 'proportion points' which are tabulated) of the normal distribution;
the α-proportion point or 100α percentage point is the value of u, denoted
by u_α, which is exceeded with probability α. Negative values of u_α,
corresponding to α greater than 0.50, can be found by symmetry.

4.2.6 Ordinates of the Normal Curve


Table 5* gives the height of the normal curve for values of u and, by plotting
a selection of points, the outline of a normal distribution with any required
mean and standard deviation can be drawn.

4.2.7 Fitting a Normal Distribution to a Set of Data


Observed data will often be presented in the form of a frequency distribution
together with a histogram. A normal distribution can be fitted to such a
summary. The continuous curve outlining the shape of the normal distribution
with the same (or any other) mean and standard deviation can be superimposed
on the histogram using the ordinates in table 5*.
However, a more usual approach is to find the expected frequencies in each
class interval of the observed data assuming that the population is normal with
some given mean and standard deviation. This is best done using table 3* of
areas and gives a basis for testing whether the assumption of normality is
reasonable for the observed data (see chapter 8 for an example).

4.2.8 Arithmetic Probability Paper


This is graph paper with a special scale which makes the normal distribution,
when plotted cumulatively, appear as a straight line. One axis has a linear scale
and on this one convenient values of the variate are plotted. The other scale is
usually marked in percentages which represent the probability that the variate
takes on a value less than or equal to each of the plotted values.
Any observed data can be plotted on this paper, the straighter the line the
more nearly normal is the distribution. Unfortunately the straightness of the
line is rather a subjective judgement.
If a variate is obviously not normal, a suitable transformation can sometimes
be found which is distributed approximately normally; that is, the logarithm, say,
(or the square root or the reciprocal, etc.), of each observation is used as the
variable. By plotting these transformed values on probability paper it can be seen
whether any of the transformations gives a straight line.

4.2.9 Worked Examples


1. What is the chance that a random standardised normal variate
(a) will exceed 1.0?
(b) will be less than 2.0?
(c) will be less than -2.0?
(d) will be between -1.5 and +0.5?
Table 3* can be used to find these probabilities and it is useful to draw a
diagram to ensure that the appropriate areas are found. In figure 4.3 the
shaded areas represent the required answer. Remember that table 3* gives the
probability of exceeding the specified value of u for positive values of u only.

Figure 4.3

(a) u = 1.0
Area = 0.1587
(b) u = 2.0
Area in right tail = 0.02275
Thus shaded area = 1 - 0.02275 = 0.97725
(c) By symmetry, the area to the left of u = -2 is the same as the area to the right
of u = +2.
Thus the shaded area = 0.02275
(d) Area above u = +0.5 is 0.3085
Area below u = -1.5 is 0.0668
Total unshaded area = 0.3753
∴ shaded area = 0.6247
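These four answers can be checked with a minimal sketch in Python, using the standard library's statistics.NormalDist in place of table 3* (the variable names are ours):

```python
# Checking the four answers for a standard normal variate.
from statistics import NormalDist

u = NormalDist()  # standard normal: mean 0, standard deviation 1

p_a = 1 - u.cdf(1.0)            # P(U > 1.0)
p_b = u.cdf(2.0)                # P(U < 2.0)
p_c = u.cdf(-2.0)               # P(U < -2.0)
p_d = u.cdf(0.5) - u.cdf(-1.5)  # P(-1.5 < U < 0.5)

print(round(p_a, 4), round(p_b, 5), round(p_c, 5), round(p_d, 4))
# -> 0.1587 0.97725 0.02275 0.6247
```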

2. Jam is packed in tins of nominal net weight 1 kg. The actual weight of jam
delivered to a tin by the filling machine is normally distributed about the set
weight with standard deviation of 12 g.
(a) If the set, or average, filling of jam is 1 kg what proportion of tins
contain
(i) less than 985 g?
(ii) more than 1030 g?
(iii) between 985 and 1030 g?

(b) If not more than one tin in 100 is to contain less than the advertised
net weight, what must be the minimum setting of the filling machine in order to
achieve this requirement?

(a) In solving such problems as these, it is always useful to draw a sketch


(figure 4.4) to ensure that the appropriate area under the curve is found from
tables. * In each case the shaded area is the required solution.

Figure 4.4 (i) 985, 1000 (ii) 1000, 1030 (iii) 1000, 1030

(i) u = (985 - 1000)/12 = -1.25

Using table 3* and the symmetry of the curve, the required proportion is 0.1056

(ii) u = (1030 - 1000)/12 = 30/12 = 2.5

This corresponds to a right-hand tail area of 0.00621


(iii) To find a shaded area as in this case, the tail areas are found directly
from tables* and then subtracted from the total curve area (unity).

The lower and upper tail areas have already been found in (i) and (ii) and
thus the solution is
1-(0.1056 + 0.00621) = 1-0.1118 = 0.8882
(b) In this case, the area in the tail is fixed and in order to find the value of
the mean corresponding to this area, the cut-off point (1000 g) must be
expressed in terms of the number of standard deviations that it lies from the
mean.

Figure 4.5 (tail area 0.01 below 1000 g; σ = 12 g)

From table 4* (or table 3* working from the body of the table outwards),
1% of a normal distribution is cut off beyond 2.33 standard deviations from the
mean.
The required minimum value for the mean is thus
1000 + 2.33 x 12 = 1028 g = 1.028 kg
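The same calculations can be sketched in Python; NormalDist.inv_cdf plays the role of table 4* here (the names below are ours):

```python
# Reproducing worked example 2 numerically.
from statistics import NormalDist

fill = NormalDist(mu=1000, sigma=12)    # grams

p_under = fill.cdf(985)                 # (i)   P(X < 985)  about 0.1056
p_over = 1 - fill.cdf(1030)             # (ii)  P(X > 1030) about 0.0062
p_between = 1 - (p_under + p_over)      # (iii)             about 0.888

# (b) smallest mean so that at most 1 tin in 100 is below 1000 g:
u_99 = NormalDist().inv_cdf(0.99)       # 2.326..., the 1% point (table gives 2.33)
min_setting = 1000 + u_99 * 12          # about 1028 g
```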
3. The data from problem 1, chapter 2 (page 46), can be used to show the
fitting of a normal distribution. The observed and fitted distributions are also
shown plotted on arithmetic probability paper.
The mean of the distribution was 0.087 min and the standard deviation
0.013 min. The method of finding the proportion falling in each class of a
normal distribution with these parameters is shown in table 4.1. The expected
class frequencies are found by multiplying each class proportion by the total
observed frequency. Notice that the total of the expected normal frequencies is
not 60. The reason is that about ¼% of the fitted distribution lies outside the
range (0.045 to 0.125) that has been considered.
Table 4.2 shows the observed and expected normal class frequencies in
cumulative form as a percentage of the total frequency. Figure 4.6 shows these
two sets of data superimposed on the same piece of normal (or arithmetic)
probability paper.
The dots in figure 4.6 represent the observed points and the crosses represent
the fitted normal frequencies. Note that the plot of the cumulative normal
percentage frequencies does not quite give a straight line. The reason for this
is that the 1/16% of the normal distribution having values less than 0.045 has not
been included. If this 1/16% were added to each of the cumulative percentages in
the right-hand column of table 4.2 then a straight-line plot would be obtained.
Class         Upper class  Standardised  Tail area of       Area in     Expected   Observed
              boundary     boundary 'u'  N.D. above 'u'     each class  normal     frequency
                                                                        frequency

0.035-0.045
              0.045        -3.23         1-0.0006 = 0.9994
0.045-0.055                                                 0.0063       0.4        0
              0.055        -2.46         1-0.0069 = 0.9931
0.055-0.065                                                 0.0386       2.3        5
              0.065        -1.69         1-0.0455 = 0.9545
0.065-0.075                                                 0.1333       8.0        4
              0.075        -0.92         1-0.1788 = 0.8212
0.075-0.085                                                 0.2616      15.7       14
              0.085        -0.15         1-0.4404 = 0.5596
0.085-0.095                                                 0.2920      17.5       23
              0.095         0.62         0.2676
0.095-0.105                                                 0.1838      11.0        9
              0.105         1.38         0.0838
0.105-0.115                                                 0.0680       4.1        5
              0.115         2.15         0.0158
0.115-0.125                                                 0.0140       0.8        0
              0.125         2.92         0.0018
Totals                                                      0.9976      59.8       60

Table 4.1
                        Observed                             Fitted normal
Class          Frequency  Cumulative  Cumulative %  Frequency  Cumulative  Cumulative %
                          frequency   frequency                frequency   frequency

0.045-0.055       0           0           0           0.4         0.4         0.7
0.055-0.065       5           5           8.3         2.3         2.7         4.5
0.065-0.075       4           9          15.0         8.0        10.7        17.8
0.075-0.085      14          23          38.3        15.7        26.4        44.0
0.085-0.095      23          46          76.7        17.5        43.9        73.2
0.095-0.105       9          55          91.7        11.0        54.9        91.5
0.105-0.115       5          60         100.0         4.1        59.0        98.3
0.115-0.125       0          60         100.0         0.8        59.8        99.7

Table 4.2
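The expected frequencies of table 4.1 can be reproduced by differencing the cumulative normal areas at successive class boundaries; a short sketch (the figures agree with the table to about the rounding used there):

```python
# Expected normal class frequencies for the fitted distribution.
from statistics import NormalDist

mean, sd, n = 0.087, 0.013, 60
boundaries = [0.045, 0.055, 0.065, 0.075, 0.085, 0.095, 0.105, 0.115, 0.125]

dist = NormalDist(mean, sd)
# Proportion of the fitted normal in each class, scaled by the total frequency.
for lo, hi in zip(boundaries, boundaries[1:]):
    proportion = dist.cdf(hi) - dist.cdf(lo)
    print(f"{lo:.3f}-{hi:.3f}  {proportion:.4f}  {proportion * n:5.1f}")
```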


Figure 4.6 Cumulative % frequency (probability scale, 0.5% to 99.9%) plotted against
upper class boundaries; dots show the observed points, crosses the fitted normal frequencies

A further point to note is that the cumulative frequencies are plotted against
the upper class boundaries (not the mid point of the class) since those are the
values below which lie the appropriate cumulative frequencies.
In addition, if the plotted points fall near enough on a straight line, which
implies approximate normality of the distribution, the mean and standard
deviation can be estimated graphically from the plot. To do this the best
straight line is drawn through the points (by eye is good enough). This straight
line will intersect the 16%,50% and 84% lines on the frequency scale at three
points on the scale of the variable.
The value of the variable corresponding to the 50% point gives an estimate
of the median, which is the same as the mean if the distribution being plotted
is approximately symmetrical.
The horizontal separation between the 84% and 16% intercepts is equal to

2σ for a straight-line (normal) plot and so half of this distance gives an estimate
of the standard deviation.
Applying this to the fitted normal points, the mean is estimated as 0.087
and the standard deviation comes out as 0.5 x (0.100 - 0.074) = 0.013, the
figures used to derive the fitted frequencies in the first place. The small bias
referred to earlier caused by omitting the bottom 1/16% of the distribution in the
plot has had very little influence on the estimate in this case.
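The graphical estimation amounts to a two-line calculation; the intercepts below are the values read by eye from figure 4.6:

```python
# Estimate mean and standard deviation from probability-paper intercepts.
# x16, x50, x84 are the variable values at the 16%, 50% and 84% lines.
x16, x50, x84 = 0.074, 0.087, 0.100

mean_est = x50                 # median = mean for a symmetric distribution
sd_est = (x84 - x16) / 2       # the 16%-84% span covers two standard deviations
print(mean_est, round(sd_est, 3))   # -> 0.087 0.013
```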

4.3 Problems for Solution


(** denotes more difficult problems)
1. For any normal distribution, what proportion of it is

(a) more than twice the standard deviation above the mean?
(b) further than half the standard deviation below the mean?
(c) within one and a half standard deviations of the mean?

2. A normal distribution has a mean of 56 and a standard deviation of 10. What


proportion of it

(a) exceeds 68?


(b) is less than 40?
(c) is contained between 56 and 65?
(d) is contained between 60 and 65?
(e) is contained between 52 and 65?

3. Problem 8 of chapter 2 (page 48) gives the intelligence quotients of a sample


of 100 children. The mean and standard deviation of these numbers are 99.3
and 13.4, respectively, and the histogram indicates that normality is a good
assumption for the distribution of intelligence quotient (I.Q.).
(a) What proportion of all children can be expected to have I.Q's
(i) greater than 120?
(ii) less than 90?
(iii) between 70 and 130?
(b) What I.Q. will be exceeded by
(i) 1% of children?
(ii) 0.1% of children?
(iii) 90% of children?
(c) Between what limits will 95% of children's I.Q. values lie?

What assumptions have been made in obtaining these answers?



4. A process of knitting stockings should give a mean part-finished stocking
length of 1.45 m with a standard deviation of 0.013 m. Assuming that the
distribution of length is normal,
(a) if a tolerance of 1.45 m ± 0.020 m is fixed, what total percentage of
oversize and undersize stockings can be expected?
(b) what tolerance can be worked to if not more than a total of 5% of
stockings undersized or oversized can be accepted?
(c) if the mean part-finished length is actually 1.46 m, what proportion of
the output are undersized or oversized stockings, allowing a tolerance of
1.45 m ± 0.025 m?

5. The door frames used in an industrialised building system are of one


standard size. If the heights of adults are normally distributed, men with a mean
of 1.73 m and standard deviation of 0.064 m and women with a mean of 1.67 m
and standard deviation of 0.050 m,
(a) what proportion of men will be taller than the door frames if the standard
frame height is 1.83 m?
(b) what proportion of women will be taller than the standard frame height of
1.83 m?
(c) what proportion of men will have a clearance of at least 13 cm on a
frame height of 1.83 m?
(d) what should the minimum frame height be such that at most one man in
a thousand will be taller than the frame height?
(e) if women outnumber men (e.g. in a large department store) in the ratio
19 : 1, for what proportion of people would a frame height of 1.83 m be too
low?

6. The data summarised in table 4.3 come from the analysis of 53 samples of
rock taken every few feet during a tin-mining operation. The original data for
each sample were obtained in terms of pounds of tin per ton of host rock but
since the distribution of such a measurement from point to point is quite skew,
the data were transformed by taking the ordinary logarithms of each sample
value and summarising the 53 numbers so obtained into the given frequency
distribution.
Fit a normal distribution to the data.

**7. The individual links used in making chains have a normal distribution of
strength with mean of 1000 kg and standard deviation of 50 kg.
If chains are made up of 20 randomly chosen links
(a) what is the probability that such a chain will fail to support a load of
900 kg?

Logarithm of ore Frequency of given


grade ore grade

0.6-0.799
0.8-0.999 3
1.0-1.199 6
1.2-1.399 8
1.4-1.599 12
1.6-1.799 11
1.8-1.999 6
2.0-2.199 4
2.2-2.399 2
53

Table 4.3

(b) what should the minimum mean link strength be for 99.9% of all chains
to support a load of 900 kg?
(c) what is the median strength of a chain?

**8. The standardised normal variate, u, having mean of 0 and variance of 1, has
probability density function

φ(u) = [1/√(2π)] e^(-u²/2)

If this distribution is truncated at the point u_α (i.e. the shaded portion, α,
of the distribution above u_α is removed; see figure 4.7), obtain an expression in
terms of α and u_α showing the amount by which the mean of the truncated
distribution is displaced from u = 0.

Figure 4.7

9. In a bottle-filling process, the volume of liquid delivered to a bottle is
normally distributed with mean and standard deviation of 1 litre and 5 ml
respectively. If all bottles containing less than 991 ml are removed and emptied,
and the contents used again in the filling process, what will be the average volume
of liquid in bottles offered for sale?

4.4 Solutions to Problems


1. Use table 3* of statistical tables.
(a) The proportion more than two standard deviations above the mean is 0.02275 (from the table).
(b) From the symmetry of the normal distribution, 0.3085 of the area is
further than 0.5 standard deviations below the mean.
(c) 0.0668 of the distribution is beyond one and a half standard deviations
from the mean in each tail. Thus the proportion within 1.5 standard deviations is
1-(0.0668 +0.0668) = 0.8664
2. (a) u = (68 - 56)/10 = 12/10 = 1.2

Thus, 0.1151 of the area exceeds 68

Figure 4.8 56 68

(b) u = (40 - 56)/10 = -1.6
Thus, 0.0548 of the distribution takes values less than 40.

Figure 4.9 40 56

(c) For 65, u = (65 - 56)/10 = 0.9

Area in upper tail above 65 = 0.1841
For 56, u = 0
∴ Required shaded area = 0.5000 - 0.1841 = 0.3159

Figure 4.10

(d) For 60, u = (60 - 56)/10 = 0.4

Thus, area above 60 is 0.3446. Area above 65 is found in (c) to be 0.1841.


Thus, proportion between 60 and 65 is 0.3446-0.1841 = 0.1605.

Figure 4.11

(e) For 52, u = (52 - 56)/10 = -0.4

From symmetry, area below 52 = 0.3446.


From (c) area above 65 = 0.1841. Thus, proportion between 52 and
65 = 1 -(0.3446 + 0.1841) = 1-0.5287 = 0.4713

Figure 4.12

3. (a) (i) For I.Q. = 120, u = (120 - 99.3)/13.4 = 1.54


Proportion greater than 120 is 0.0618, say, 0.06

Figure 4.13 99.3 120

(ii) I.Q. = 90, u = (90 - 99.3)/13.4 = -0.69

By symmetry, proportion less than 90 = 0.2451, say 0.24, since u is more nearly
-0.694

Figure 4.14 90 99.3



(iii) I.Q. = 130, u = (130 - 99.3)/13.4 = 2.29
Area above u = 2.29 is 0.0110.
I.Q. = 70, u = (70 - 99.3)/13.4 = -2.19
Area below u = -2.19 is 0.0143
Proportion of children with LQ. values between 70 and 130 is
1 - (0.011 0 + 0.0143) = 1 - 0.0253 = 0.975

Figure 4.15 70 99.3 130

(b) (i) For all normal distributions, 1% in the tail occurs at a point 2.33 standard
deviations from the mean. (See table 4* or use table 3* in reverse.)
Thus, 1% of all children will have an LQ. value greater than
99.3 + 2.33 x 13.4 = 99.3 + 31.2 = 130.5

Figure 4.16 99.3 ?

(ii) For a = 0.001 (0.1%), the corresponding u-value is 3.09.


Thus one child in 1000 will have an I.Q. value greater than

99.3 + 3.09 x 13.4 = 99.3 + 41.5 = 140.8

Figure 4.17 99.3 ?

(iii) Ten per cent of children will have I.Q. values less than the value which
90% exceed.
The u-value corresponding to this point is -1.28 and converting this into the
scale of I.Q. gives
99.3 -1.28 x 13.4 = 99.3 -17.2 = 82.1

Figure 4.18
~ ? 99.3

(c) We need to find the lower and upper limits such that the shaded area is 95%
of the total. There are a number of ways of doing this, depending on how the
remaining 5% is split between the two tails of the distribution. It is usual to
divide them equally. On this basis, each tail will contain 0.025 of the total area
and here the required limits will be 1.96 standard deviations below and above
the mean respectively.
Thus, 95% of children will have I.Q. values between
99.3 - 1.96 x 13.4 and 99.3 + 1.96 x 13.4 i.e.
99.3 - 26.2 and 99.3 + 26.2

73.1 and 125.5

Figure 4.19 ? 99.3 ?

We have assumed that the original sample of 100 children was taken randomly
and representatively from the whole population of children about whom the
above probability statements have been made. This kind of assumption should
always be carefully checked for validity in practice.
In addition, the mean and standard deviation of the sample were used as
though they were the corresponding values for the population. In general, they
will not be numerically equal, even for samples as large as 100, and this will
introduce errors into the statements made. However, the answers will be of the
right order of magnitude which is mostly all that is required in practice.
The assumption of normality of the population has already been mentioned.
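Solution 3 can also be verified numerically under the same assumptions; a sketch using the sample mean and standard deviation as if they were the population values:

```python
# Numerical check of solution 3.
from statistics import NormalDist

iq = NormalDist(99.3, 13.4)

p_over_120 = 1 - iq.cdf(120)           # about 0.061
p_under_90 = iq.cdf(90)                # about 0.244
iq_top_1pct = iq.inv_cdf(0.99)         # about 130.5
# central 95%: about 73.0 to 125.6
lo, hi = iq.inv_cdf(0.025), iq.inv_cdf(0.975)
```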

4. (a) If the mean length is 1.45 m then the maximum deviation allowed for a
stocking to be acceptable is 0.020/0.013 standard deviations, i.e. u = ±1.54.

The percentage of unacceptable output is represented by the two shaded
areas in figure 4.20 and is 2 x 0.0618 x 100 = 12.36%.

Figure 4.20 1.43 1.45 1.47

(b) This time the two shaded areas are each specified to be 0.025 (2½%).
Therefore the tolerance that can be worked to corresponds to u = ±1.96,
i.e. to ±1.96 x 0.013 = ±0.025 m, or ±25 mm.

Figure 4.21
~~~ ? 1.45 ?

(c) The lower and upper lengths allowed are 1.425 m and 1.475 m respectively.
The shaded area gives the proportion of stockings that do not meet the standard
when the process mean length is 1.46 m.

1.475 m: u = (1.475 - 1.460)/0.013 = 1.15; area = 0.1251

1.425 m: u = (1.425 - 1.460)/0.013 = -2.69; area = 0.0036

Total shaded area = 0.1287

Thus nearly 13% of output will not meet the standard.

Figure 4.22 1.425 1.46 1.475
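All three parts of solution 4 can be checked numerically; a sketch (with our own variable names):

```python
from statistics import NormalDist

sd = 0.013
# (a) mean on target at 1.45 m, tolerance +/- 0.020 m
on_target = NormalDist(1.45, sd)
p_out = on_target.cdf(1.43) + (1 - on_target.cdf(1.47))        # about 0.124 (12.4%)

# (b) two-sided tolerance leaving 2.5% in each tail
tol = NormalDist().inv_cdf(0.975) * sd                         # about 0.025 m

# (c) mean drifted to 1.46 m, tolerance 1.45 +/- 0.025 m
drifted = NormalDist(1.46, sd)
p_out_drifted = drifted.cdf(1.425) + (1 - drifted.cdf(1.475))  # about 0.128
```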

5. (a) For 1.83 m, u = (1.83 - 1.73)/0.064 = 1.56

Required proportion = 0.0594

Figure 4.23 1.73 1.83

(b) For 1.83 m, u = (1.83 - 1.67)/0.050 = 3.2

Required proportion = 0.00069

Figure 4.24 1.67 1.83

(c) Men shorter than 1.83 - 0.13 = 1.70 m will have a clearance of at least
0.13 m.

Corresponding u = (1.70 - 1.73)/0.064 = -0.47

From symmetry, proportion of men with at least 13 cm to spare is 0.3192.

Figure 4.25 1.70 1.73

(d) The frame height which is exceeded by one man in a thousand will be
3.09 standard deviations above the mean height of men, i.e. at

1.73 + 3.09 x 0.064 = 1.93 m

Figure 4.26 1.73 ? (tail area 0.001)

(e) For women, 1.83 m corresponds to u = (1.83 - 1.67)/0.050 = 3.2
Proportion of women taller than 1.83 m = 0.00069

For men, 1.83 m corresponds to u = (1.83 - 1.73)/0.064 = 1.56
Proportion of men taller than 1.83 m = 0.0594
∴ Expected proportion of people for whom 1.83 m is too low is
0.00069 x 0.95 + 0.0594 x 0.05 = 0.004, i.e.
4 people in a 1000.

The problem can be extended by allowing some people to wear hats as well
as shoes with different heights of heel.
This problem was intended to give practice in using normal tables of area. Any
practical consideration of the setting of standard frame heights would need to
take account of the physiological and psychological needs of human door users,
of economics and of the requirements of the rest of the building system.
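The weighted-proportion calculation of part (e) can be sketched as:

```python
from statistics import NormalDist

men = NormalDist(1.73, 0.064)
women = NormalDist(1.67, 0.050)
frame = 1.83

p_men = 1 - men.cdf(frame)                  # about 0.059
p_women = 1 - women.cdf(frame)              # about 0.0007
# 19 women for every man:
p_too_low = 0.95 * p_women + 0.05 * p_men   # about 0.0036, i.e. 4 in 1000
```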

6. It is quite possible to use a normal distribution having an arbitrary mean and


standard deviation, but it would make more sense in this case to use the mean
and standard deviation of the observed data. The reason for this is that we are
mainly concerned with testing the assumption of normality without wishing to
specify the parameters.
First the mean and standard deviation are found.

x            f    Coded            fu     fu²
                  variable (u)

0.6-0.799    1    -4               -4     16
0.8-0.999    3    -3               -9     27
1.0-1.199    6    -2              -12     24
1.2-1.399    8    -1               -8      8
1.4-1.599   12     0                0      0
1.6-1.799   11     1               11     11
1.8-1.999    6     2               12     24
2.0-2.199    4     3               12     36
2.2-2.399    2     4                8     32
            53            +ve:    +43    178
                          -ve:    -33
                          net:     10

Table 4.4

Mean = 1.5 + 0.2 x 10/53 = 1.538

Standard deviation = 0.2 √{[178 - (10)²/53]/53} = 0.2 √(176.1/53) = 0.364
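The coded-variable arithmetic can be checked directly from the totals of table 4.4:

```python
# Mean and standard deviation from the coded (u) totals,
# with working origin x0 = 1.5 and class width c = 0.2.
from math import sqrt

n, sum_fu, sum_fu2 = 53, 10, 178   # Σf, net Σfu and Σfu² from table 4.4
x0, c = 1.5, 0.2

mean = x0 + c * sum_fu / n                       # 1.538
sd = c * sqrt((sum_fu2 - sum_fu ** 2 / n) / n)   # about 0.3646 (0.364 in the text)
```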

Using these two values, the areas under the fitted normal curve falling in each
class are found using table 3* of the statistical tables. This operation is carried
out in table 4.5. Note that the symbol u in the table refers to the standardised
normal variate corresponding to the class boundary, whereas in table 4.4 it
represents the coded variable (formed for ease of computation) obtained by

subtracting 1.5 from each class midpoint and dividing the result by 0.2, the
class width.

Class          Class      u        Area above u         Area in      Expected
boundaries                                              each class   normal
                                                                     frequency

               0.6       -2.58     1-0.0049 = 0.9951
0.6-0.8                                                 0.0163        0.86
               0.8       -2.03     1-0.0212 = 0.9788
0.8-1.0                                                 0.0482        2.55
               1.0       -1.48     1-0.0694 = 0.9306
1.0-1.2                                                 0.1068        5.66
               1.2       -0.93     1-0.1762 = 0.8238
1.2-1.4                                                 0.1758        9.32
               1.4       -0.38     1-0.3520 = 0.6480
1.4-1.6                                                 0.2155       11.42
               1.6        0.17     0.4325
1.6-1.8                                                 0.1967       10.43
               1.8        0.72     0.2358
1.8-2.0                                                 0.1338        7.09
               2.0        1.27     0.1020
2.0-2.2                                                 0.0676        3.58
               2.2        1.82     0.0344
2.2-2.4                                                 0.0255        1.35
               2.4        2.37     0.0089

Table 4.5

7. (a) Since a chain is as strong as its weakest link, the chain will fail to support
a load of 900 kg if one or more of its links is weaker than 900 kg.

The probability that a single link is weaker than 900 kg is given by the area
in the tail of the normal curve below

u = (900 - 1000)/50 = -2, i.e. 0.02275

∴ The probability that a single link does not fail at 900 kg = 0.97725 and the
probability that none of the links fails = (0.97725)^20. Thus the probability that
a chain of 20 links will not support a load of 900 kg is

1 - (0.97725)^20 = 1 - 0.631 = 0.37

900 1000
Figure 4.27 Single link strength

(b) In this case, the probability of a chain supporting a load of 900 kg is


required to be 0.999.

Let p be the probability that an individual link is stronger than 900 kg.
Then we have that

p^20 = 0.999
p = 0.99995

Figure 4.28 Single link strength

It follows that the probability of an individual link's being weaker than
900 kg must be at most 0.00005.
Thus 900 kg corresponds to u = -3.9 approximately and the mean link
strength must be at least

900 + 3.9 x 50 = 1095 kg
(c) In the long run, one chain out of every two will be stronger than the
median chain strength.
Let p be the probability that an individual link exceeds the median chain
strength.
Then from p^20 = 0.5,
p = 0.96594 (using 5-figure logarithms)
and the probability that an individual link is less than the median chain strength
is (1 - p) = 0.0341.

Figure 4.29 Single link strength (tail area 0.0341 below u = -1.82)

Such a tail area corresponds approximately to u = -1.82 and the median
strength of a chain is therefore given by

1000 - (1.82 x 50) = 909 kg
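Parts (a) and (c) can be verified numerically; the weakest-link argument translates directly into code:

```python
from statistics import NormalDist

link = NormalDist(1000, 50)

# (a) the chain fails at 900 kg if any of its 20 links is weaker than 900 kg
p_link_ok = 1 - link.cdf(900)               # 0.97725
p_chain_fails = 1 - p_link_ok ** 20         # about 0.37

# (c) median chain strength s satisfies P(link > s)^20 = 0.5
p = 0.5 ** (1 / 20)                         # 0.96594
median_strength = link.inv_cdf(1 - p)       # about 909 kg
```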

8. The density function is

φ(u) = [1/√(2π)] e^(-u²/2)

Figure 4.30

The mean of the truncated distribution is given by

[1/(1 - α)] ∫ u φ(u) du (from -∞ to u_α) = -φ(u_α)/(1 - α)

Since the mean was previously at u = 0 (i.e. when α = 0), the above
expression also represents the shift in mean.
φ(u_α) is the ordinate (from table 5* of statistical tables) of the normal
distribution corresponding to u = u_α.
The result just obtained can be used to solve the numerical part of the
problem.
The bottle contents are distributed normally but if the segregation process
operates perfectly (which it will not do in practice), the distribution of bottle
contents offered for sale will correspond to the unshaded part of figure 4.31.

991 1000
Figure 4.31 Bottle contents (ml)

The cut-off volume of 991 ml corresponds to

u = (991 - 1000)/5 = -1.8

The amount of truncation is therefore α = 0.0359. The increase in mean
volume of despatched bottles is therefore

[1/(1 - 0.0359)] x φ(-1.8) x σ = (0.0790/0.9641) x 5 = 0.41 ml

Note: The change in mean is positive since the truncation occurs in the lower
tail instead of the upper tail.
The mean volume of bottle contents is therefore 1000 + 0.41 = 1000.41 ml.
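The truncation result can be checked numerically; NormalDist.pdf supplies the ordinate φ(u) otherwise read from table 5*:

```python
# Numerical check of the bottle-filling answer.
from statistics import NormalDist

mu, sigma, cutoff = 1000, 5, 991
std = NormalDist()

u = (cutoff - mu) / sigma                  # -1.8
alpha = std.cdf(u)                         # truncated fraction, 0.0359
shift = std.pdf(u) / (1 - alpha) * sigma   # 0.41 ml
print(round(mu + shift, 2))                # -> 1000.41
```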

4.5 Practical Laboratory Experiments and Demonstrations


The following experiments are reproduced from Basic Statistics, Laboratory
Instruction Manual

Appendix I-Experiment 10

Normal Distribution
Number of persons: 2 or 3.

Object
To give practice in fitting a normal distribution to an observed frequency
distribu tion.

Method
The frequency distribution of total score of three dice obtained by combining
all groups' results in table 2, experiment 1, should be re-listed in table 26
(table 4.6).

Analysis
1. In table 26, calculate the mean and standard deviation of the observed
frequency distribution.
2. Using table 27, fit a normal distribution, having the same mean and standard
deviation as the data, to the observed distribution.
3. Draw the observed and normal frequency histograms on page 46 and comment
on the agreement.

Notes
1. It is not implied in this experiment, that the distribution of the total score
of three dice should be normal in form.
2. The total score of three dice is a discrete variable, but the method of fitting
a normal distribution is exactly the same for this case as for a frequency
distribution of grouped values of a continuous variable.

Class width = unity.

If c = width of class interval, choose x₀ to be the midpoint of a class which,
by inspection, is somewhere near the mean of the distribution.
Obtain the class values u from the relation

u = (x - x₀)/c

The values of u will be positive or negative integers.

Class        Mid        Frequency   Class
interval     point x    f           u        fu      fu²

2.5-3.5        3
3.5-4.5        4
4.5-5.5        5
5.5-6.5        6
6.5-7.5        7
7.5-8.5        8
8.5-9.5        9
9.5-10.5      10
10.5-11.5     11
11.5-12.5     12
12.5-13.5     13
13.5-14.5     14
14.5-15.5     15
15.5-16.5     16
16.5-17.5     17
17.5-18.5     18

Totals +ve
Totals -ve
Net totals

The mean, x̄, of the sample is given by

x̄ = x₀ + Σfu/Σf =

The variance, (s')², of the sample is

(s')² = [Σfu² - (Σfu)²/Σf]/Σf =

The standard deviation, s', of the sample is given by

s' = √(variance) =

Table 4.6 (Table 26 of the laboratory manual)


Total      Class        u for        Area under       Area for    Expected    Observed
score of   boundaries   class        normal curve     each        normal      frequency
3 dice                  boundaries   from each u      class       frequency
                                     to ∞

           2.5
3
           3.5
4
           4.5
5
           5.5
6
           6.5
7
           7.5
8
           8.5
9
           9.5
10
           10.5
11
           11.5
12
           12.5
13
           13.5
14
           14.5
15
           15.5
16
           16.5
17
           17.5
18
           18.5

Table 4.7 (Table 27 of the laboratory manual)

Notes
1. u is the deviation from the mean, of the class boundary, expressed as a
multiple of the standard deviation (with appropriate sign),

i.e. u = (class boundary - x̄)/s

2. The area under the normal curve above each class boundary may be found
from the table of area under the normal curve at the end of the book.
The normal curve area or probability for each class is obtained by differencing
the cumulative probabilities in the previous column.
3. Other tables which cumulate the area under the normal curve in a different
way may be used, but some of the column headings will require modification
and the probabilities subtracted or summed as appropriate.
4. In order to obtain equality of expected and observed total frequencies, the
two extreme classes should be treated as open-ended, i.e. with class boundaries
of -∞ and +∞ instead of 2.5 and 18.5 respectively.
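Note 4 can be sketched in a few lines of Python; the mean, standard deviation and total frequency below are illustrative placeholders only, not values from any actual experiment:

```python
# Expected frequencies with open-ended extreme classes (note 4).
from statistics import NormalDist

xbar, s, total_f = 10.5, 2.96, 180            # assumed sample values
boundaries = [b + 0.5 for b in range(3, 18)]  # 3.5, 4.5, ..., 17.5

dist = NormalDist(xbar, s)
cum = [0.0] + [dist.cdf(b) for b in boundaries] + [1.0]  # -inf and +inf ends
expected = [(hi - lo) * total_f for lo, hi in zip(cum, cum[1:])]

# Open-ended first and last classes make the expected total exactly total_f.
assert abs(sum(expected) - total_f) < 1e-9
```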

Appendix 2-Experiment 11

Normal Distribution
Number of persons: 2 or 3.

Object
To calculate the mean and standard deviation of a sample from a normal
population and to demonstrate the effect of random sampling fluctuations.

Method
From the red rod population M6/1 (Normally distributed with a mean of 6.0
and standard deviation of 0.2) take a random sample of 50 rods and measure
their lengths to the nearest tenth of a unit using the scale provided. The rods
should be selected one at a time and replaced after measurement, before the
next one is drawn.
Record the measurements in table 28.
Care should be taken to ensure good mixing in order that the sample is
random. The rod population should be placed in a box and stirred up well
during sampling.

Analysis
1. Summarise the observations into a frequency distribution using table 29.
2. Calculate the mean and standard deviation of the sample data using table 30.
3. Compare, in table 31, the sample estimates of mean and standard deviation
obtained by each group. Observe how the estimates vary about the actual
population parameters.
4. Summarise the observed frequencies of all groups in table 32. On page 51,
draw, to the same scale, the probability histograms for your own results and
for the combined results of all groups. Observe the shapes of the histograms
and comment.

1-10
11-20
21-30
31-40
41-50
Table 4.8 (Table 28 of the laboratory manual)
Summarise these observations into class intervals of width 0.1 unit with the
measured lengths at the mid points using the 'tally-mark' method and table 29.

Class
interval Class 'Tally-marks' Frequency
(units) mid point

5.35-5.45 5.4
5.45-5.55 5.5
5.55-5.65 5.6
5.65-5.75 5.7
5.75-5.85 5.8
5.85-5.95 5.9
5.95-6.05 6.0
6.05-6.15 6.1
6.15-6.25 6.2
6.25-6.35 6.3
6.35-6.45 6.4
6.45-6.55 6.5
6.55-6.65 6.6
Total frequency

Table 4.9 (Table 29 of the laboratory manual)

Width of class interval 0.1 unit.


If c is width of class interval, choose Xo to be the mid point of a class which,
by inspection, is somewhere near the mean of the distribution.
Obtain the class values u from the relation

u = (x - x₀)/c

The values of u will be positive or negative integers.
The mean x̄ of the sample is

x̄ = x₀ + 0.1 x Σfu/Σf

=

Class        Mid       Frequency   Class
interval,    point
units        x         f           u        fu      fu²

5.35-5.45    5.4
5.45-5.55    5.5
5.55-5.65    5.6
5.65-5.75    5.7
5.75-5.85    5.8
5.85-5.95    5.9
5.95-6.05    6.0
6.05-6.15    6.1
6.15-6.25    6.2
6.25-6.35    6.3
6.35-6.45    6.4
6.45-6.55    6.5
6.55-6.65    6.6

Totals +ve
Totals -ve
Net totals

Table 4.10 (Table 30 of the laboratory manual)

The variance (s')² of the sample is

(s')² = (0.1)² x [Σfu² - (Σfu)²/Σf]/Σf =

The standard deviation s' of the sample is given by

s' = √(variance) =

Sample

Sample Standard
Group Mean
size deviation
1

2
3
4
5
6

8
Population parameters 6.00 0.2

Table 4.11 (Table 31 of the laboratory manual-summary of data)

Frequency of rod lengths

5.4 5.5 5.6 5.7 5.8 5.9 6.0 6.1 6.2 6.3 6.4 6.5 6.6
1
2

7
8
Total
frequencies
(all groups)

Table 4.12 (Table 32 of the laboratory manual)


5 Relationship between the basic distributions

5.1 Syllabus Covered


The relationships between hypergeometric, binomial, Poisson and normal
distributions; use of binomial as approximation to hypergeometric in sampling;
Poisson as approximation to binomial; normal as an approximation to binomial;
normal as an approximation to Poisson.

5.2 Resume of Theory


The following points should be revised and stressed.
(1) The basic laws of the distributions in chapters 3 and 4.
(2) Their interrelationships and conditions for using approximate distributions.
Tables 5.1 and 5.2 summarise the interrelationships together with rules for use of
the approximations.
Note: In practice use the Poisson and normal distributions as approximations
to the hypergeometric and binomial whenever possible.
(3) (a) Binomial approximation to hypergeometric.
(b) Poisson approximation to binomial.
(c) Normal approximation to binomial.
(d) Normal approximation to Poisson.
(4) Whenever the normal distribution is used to approximate the
hypergeometric, binomial or Poisson distributions, care should be taken to
remember that a continuous distribution is being used to approximate a discrete
one, and a continuity correction must be included when calculating probabilities.
For example, take the case of using the normal approximation to the following problem.
What is the chance, in a group of 100, of more than 20 persons dying before
65 years of age, given that the chance of any one person's dying is 0.20?
Here, since p = 0.20 and np = 20 > 5, the normal approximation can be used.
However, in calculating the probability, figure 5.1 illustrates that the value 20.5
and not 20 must be used.
(1) Hypergeometric distribution
    General term: P(x) = (M choose x)(N-M choose n-x)/(N choose n)
    Mean: n(M/N)    Variance: n(M/N)(1 - M/N)(N - n)/(N - 1)
    Notes: Computation of probabilities is usually excessive and the distribution
    formulae need only be used in practice where the usual approximations (i.e.
    distribution (2) and its approximations) do not give the required accuracy.

(2) Binomial distribution
    General term: P(x) = (n choose x) p^x (1 - p)^(n-x)
    Mean: np    Variance: np(1 - p)
    Notes: Computation will usually be tedious even if it is practicable. If tables
    of probabilities are not available, the appropriate one of distributions (3) and
    (4) can generally be used as a good approximation.

(3) Poisson distribution
    General term: P(x) = e^(-m) m^x / x!
    Mean: m    Variance: m
    Notes: Direct computation is easier than for (1) or (2). Tables of Poisson
    probabilities are readily available.

(4) Normal distribution
    Density function: f(x) = [1/(σ√(2π))] e^(-(x-μ)²/(2σ²))
    Mean: μ    Variance: σ²
    Notes: Probabilities can easily be obtained from a table of areas under the
    normal curve. It is necessary to express the variable in standardised form,
    i.e. in terms of u = (x - μ)/σ.

Table 5.1 Relationship between distributions



Use (2) as an approximation for (1), putting p = M/N, if n/N < 0.10

Use (3) as an approximation for (2), putting m = np, if p < 0.10

Use (4) as an approximation for (2), putting μ = np and σ² = np(1 - p),
if 0.10 ≤ p ≤ 0.90 and np > 5

Use (4) as an approximation for (3), putting μ = m and σ² = m,
if m ≥ 15 (and preferably m > 30)

Table 5.2 Approximations and a guide to their use in practice

The suggested approximations will usually be satisfactory for practical
purposes. However, for values of the parameters near to the limiting conditions
given above, care should be taken when determining probabilities in the tails of
a distribution, as the errors of approximation may be considerably greater than
allowable.

Figure 5.1 Number of deaths (normal curve: for the probability of exceeding 20,
the value 20.5 must be used)

5.2.1 Hypergeometric, Binomial and Poisson Approximations


Tables 5.3 and 5.4 give details of the accuracy of the approximations at the
limiting conditions; obviously the further the parameters are from these
conditions the more accurate the approximation.
Batch size N = 100, of which 10 are defective.
Sample size n = 10, p = 0.10
thus n/N = 0.10, np = 1

Table 5.3 gives a full comparison of the probabilities of finding x defects in the
sample.

No. of defects    Hypergeometric   Binomial       Poisson
in sample (x)     distribution     distribution   distribution

0                 0.3305           0.3487         0.3679
1                 0.4080           0.3874         0.3679
2                 0.2015           0.1937         0.1839
3                 0.0518           0.0574         0.0613
4                 0.0076           0.0112         0.0153
5                 0.0006           0.0015         0.0031
6 or over         0.0000           0.0001         0.0006

Table 5.3
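The figures in table 5.3 are easy to reproduce directly from the three probability laws. The following Python sketch (an added check, not part of the original text, using only the standard library) evaluates all three for N = 100, M = 10, n = 10:

```python
from math import comb, exp, factorial

N, M, n = 100, 10, 10        # batch size, defectives in batch, sample size
p = M / N                    # proportion defective, 0.10
m = n * p                    # Poisson mean, np = 1

def hypergeom(x):
    # P(x defects) = C(M, x) C(N - M, n - x) / C(N, n)
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

def binom(x):
    # P(x defects) = C(n, x) p^x (1 - p)^(n - x)
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson(x):
    # P(x defects) = e^(-m) m^x / x!
    return exp(-m) * m**x / factorial(x)

for x in range(6):
    print(x, round(hypergeom(x), 4), round(binom(x), 4), round(poisson(x), 4))
```

Running this reproduces the columns of table 5.3, for example 0.3305, 0.3487 and 0.3679 for x = 0.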

5.2.2 Normal Distribution as an Approximation to Poisson


Table 5.4 gives a full comparison of the probability of x successes where m = 15
using the Poisson and its normal approximation.

No. of successes (x)   Poisson distribution   Normal approximation   No. of successes (x)   Poisson distribution   Normal approximation
0 0.0000 0.0001 16 0.0960 0.0993


1 0.0000 0.0001 17 0.0848 0.0899
2 0.0000 0.0004 18 0.0706 0.0762
3 0.0002 0.0009 19 0.0557 0.0603
4 0.0007 0.0019 20 0.0418 0.0452
5 0.0019 0.0037 21 0.0299 0.0313
6 0.0048 0.0072 22 0.0204 0.0200
7 0.0104 0.0122 23 0.0132 0.0122
8 0.0194 0.0200 24 0.0083 0.0072
9 0.0325 0.0313 25 0.0050 0.0037
10 0.0486 0.0452 26 0.0029 0.0019
11 0.0663 0.0603 27 0.0016 0.0009
12 0.0828 0.0762 28 0.0008 0.0004
13 0.0956 0.0899 29 0.0005 0.0001
14 0.1025 0.0993 30 0.0002 0.0001
15 0.1024 0.1026 31 0.0001 0.0000

Table 5.4

The two distributions converge rapidly as m increases.


While most statisticians accept the use of the normal approximation for
m > 15, it will be seen that there is quite an appreciable divergence in the tails
of the distributions. The authors recommend that whenever possible normal
approximation is used when m > 30.
The statistical tables* have been amended to give Poisson probabilities up to
m=40.
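The convergence can be checked numerically. This Python sketch (an addition for checking, standard library only) evaluates the Poisson term and the corresponding normal area, with the continuity correction, for m = 15:

```python
from math import exp, factorial, erf, sqrt

m = 15.0
mu, sigma = m, sqrt(m)       # normal approximation parameters

def phi(z):
    # standard normal cumulative distribution function
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def poisson(x):
    return exp(-m) * m**x / factorial(x)

def normal_approx(x):
    # P(x - 0.5 < X < x + 0.5): continuity correction for a discrete variable
    return phi((x + 0.5 - mu) / sigma) - phi((x - 0.5 - mu) / sigma)

for x in (5, 10, 15, 20, 25):
    print(x, round(poisson(x), 4), round(normal_approx(x), 4))
```

The central values agree closely (e.g. 0.1024 against about 0.1026 at x = 15), while the tails diverge, exactly as table 5.4 shows.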

5.2.3 Examples on the Use of Theory


1. In sampling from batches of 5000 components, a sample of 50 is taken and if
one or more defects is found the batch is rejected. What is the probability of
accepting batches containing 2% defects?
The theoretically correct distribution is the hypergeometric, but since

n/N, i.e. 50/5000 = 0.01,

is less than 10%, the binomial can be used.


However, computation is still difficult and since p < 0.10 the Poisson
distribution can be used.
Solution by binomial approximation from table 1*
Probability of accepting batches with 2% defectives = 0.3642
Solution by Poisson approximation
Expected number of defects in sample = np = 50 x 0.02 = 1
From table 2*
Probability of accepting batches with 2% defectives = 0.3679
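Both answers can be confirmed in a few lines of Python (an added check, standard library only): the batch is accepted only when the sample contains no defectives.

```python
from math import exp

n, p = 50, 0.02
# Binomial: P(accept) = P(0 defectives in the sample) = (1 - p)^n
p_accept_binomial = (1 - p) ** n
# Poisson approximation with m = np = 1: P(0) = e^(-m)
m = n * p
p_accept_poisson = exp(-m)

print(round(p_accept_binomial, 4))   # 0.3642
print(round(p_accept_poisson, 4))    # 0.3679
```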
2. In 50 tosses of an unbiased coin, what is the probability of more than 30
heads occurring?
This requires the binomial distribution, which gives

P(>30 heads) = Σ (x = 31 to 50) C(50, x) (½)^x (½)^(50-x)

which from table 1* = 0.0595

Using the normal approximation, since 0.10 ≤ p ≤ 0.90 and np > 5,

mean of distribution = np = 25
variance of distribution = np(1 - p) = 50 × ½ × ½ = 12.5
standard deviation = 3.54

Figure 5.2 (normal curve, mean 25; tail area beyond 30.5)

u = (30.5 - 25)/3.54 = 5.5/3.54 = 1.55

which from table 3* leads to a probability of 0.0606.
Note: Since a continuous distribution is being used to approximate to a
discrete distribution, the value 30.5 and not 30 must be used in calculating the
u value.
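The exact binomial tail and the normal approximation can both be computed directly. This Python sketch (an added check, standard library only) reproduces the comparison:

```python
from math import comb, erf, sqrt

n, p = 50, 0.5
# Exact binomial: P(more than 30 heads) = sum of terms from 31 to 50
exact = sum(comb(n, x) * 0.5**n for x in range(31, n + 1))

# Normal approximation with continuity correction: area beyond 30.5
mu, sigma = n * p, sqrt(n * p * (1 - p))
u = (30.5 - mu) / sigma
approx = 0.5 * (1.0 - erf(u / sqrt(2.0)))    # upper-tail area beyond u

print(round(exact, 4), round(approx, 4))
```

The exact value is 0.0595; the approximation gives about 0.060, matching the text (which rounds u to 1.55 and reads 0.0606 from tables).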
3. A machine produces screws 10% of which have defects. What is the probability
that, in a sample of 500
(a) more than 35 defects are found?
(b) between 30 and 35 (inclusive) defects are found?
The binomial law applies, assuming a sample of 500 from a batch of at least 5000.
The normal approximation can be used since 0.10 ≤ p ≤ 0.90 and np = 50 > 5.

μ = np = 500 × 0.10 = 50
σ = √(500 × 0.10 × 0.90) = √45 = 6.7

(a) u = (35.5 - 50)/6.7 = -14.5/6.7 = -2.16
Probability of more than 35 defects, from tables* = 1 - 0.01539 = 0.9846
(b) Probability of between 30 and 35 defects: use limits 29.5 and 35.5.

Figure 5.3 (normal curve, μ = 50, σ = 6.7, showing limits 29.5 and 35.5)

u = (29.5 - 50)/6.7 = -20.5/6.7 = -3.06

Probability of more than 29 defects = 1 - 0.0011 = 0.9989

Probability of between 30 and 35 inclusive = 0.9989 - 0.9846 = 0.0143

4. The average number of breakdowns per period of an assembly moulding


line is 30. If the breakdowns occur at random what is the probability of more
than 40 breakdowns occurring per period?
Here the theoretically correct distribution is the Poisson. However, since
m > 15, use the normal approximation.
Solution by Poisson, table 2*

P(>40) = Σ (x = 41 to ∞) e^(-30) 30^x / x! = 0.0323

Using the normal approximation

μ = 30
σ = √30 = 5.48

Figure 5.4 (normal curve, μ = 30, σ = 5.48; tail beyond 40.5)

Again, to include 41 breakdowns but exclude 40,

u = (40.5 - 30)/5.48 = 1.92

Probability of exceeding 40, P(>40) = 0.0274 from statistical table 3*.
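Both routes can be verified directly. This Python sketch (an added check, standard library only) sums the exact Poisson tail and evaluates the normal tail beyond 40.5:

```python
from math import exp, factorial, erf, sqrt

m = 30.0
# Exact Poisson: P(more than 40 breakdowns) = 1 - P(X <= 40)
exact = 1.0 - sum(exp(-m) * m**x / factorial(x) for x in range(41))

# Normal approximation: area beyond 40.5 with mu = 30, sigma = sqrt(30)
u = (40.5 - m) / sqrt(m)
approx = 0.5 * (1.0 - erf(u / sqrt(2.0)))

print(round(exact, 4), round(approx, 4))
```

The exact tail is 0.0323 against roughly 0.0276 for the approximation, illustrating the warning above about tail probabilities near the limiting conditions.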

5.2.4 Examples of Special Interest


1. Bomb Attack on London
During the last war, it was asked whether the bombs dropped on London were
aimed or whether they fell at random. The term 'aimed' is of course very loose,
since obviously the Germans could point the bombs towards Britain, but aim in
this problem is defined as pinpointing targets inside a given area.
To determine the solution, part of London was divided into 576 equal areas
(¼ km² each) and the number of areas with 0, 1, 2, ... hits was tabulated from
the results of 537 bombs which fell on the area. These data in distribution form
are shown in table 5.5.

Number of hits j                 0    1    2    3    4    5
Number of areas with j hits    229  211   93   35    7    1

Table 5.5

In statistical logic, as will be seen later, an essential step in testing in the logic
is the setting up of what is called the null hypothesis.
Here the null hypothesis is that the bombs are falling randomly, or that there
is no ability to aim at targets of the order of ¼ km² in area.
Then if the hypothesis is true, the probability of any given bomb falling in
any one given area = 1/576.
Probability of x hits in any area

p(x) = C(537, x) (1/576)^x (575/576)^(537-x)

from the binomial law.
However, since the probability of success is very small and the number of
attempts is relatively large, the Poisson law can be used as an approximation to
the binomial, thus greatly reducing the computation involved.
Thus, for the Poisson calculation

average number of successes m = np = 537 × (1/576) = 0.93

The results obtained by reference to statistical tables by interpolation for the


chance of various numbers of hits are given in table 5.6.

Number of hits j          0      1      2      3      4      5
Probability of j hits   0.395  0.367  0.170  0.053  0.012  0.002

Table 5.6

Table 5.7 shows the results obtained by comparing the actual frequency
distribution of number of hits per area with the Poisson expected frequencies if
the hypothesis is true.

Number of hits j                                0    1    2    3    4    5
Actual number of areas with j hits            229  211   93   35    7    1
Expected number of areas with j hits
(Poisson)                                     227  211   98   31    7    1

Table 5.7
Table 5.7

The agreement is certainly good enough (without significance testing) to state


that the null hypothesis is true; namely, that the bombs fell at random, so that
the area into which the bomb could be aimed must have been much larger than
the area of London.
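The expected frequencies of table 5.7 follow directly from the Poisson law with m = 537/576. This Python sketch (an added check, standard library only) reproduces them:

```python
from math import exp, factorial

areas, bombs = 576, 537
m = bombs / areas                       # mean hits per area, about 0.93
observed = [229, 211, 93, 35, 7, 1]     # areas with 0, 1, 2, 3, 4, 5 hits

# expected number of areas with j hits = 576 * e^(-m) m^j / j!
expected = [areas * exp(-m) * m**j / factorial(j) for j in range(6)]
for j in range(6):
    print(j, observed[j], round(expected[j]))
```

The printed expected values (227, 211, 98, 31, 7, ...) agree with table 5.7.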

2. Defective Rate of a Production Process


The number of defects per shift produced by a certain process over the last 52
shifts is given in table 5.8. Is the process in control, i.e. has the defective rate
remained constant over the period? The total production per shift is 600 units.

Number of defects/shift    0   1   2   3   4   5   6   7   8   9   10   Total
Frequency                  2   6   9  11   8   6   4   3   2   1    0      52

Average defects/shift = 3.6


Table 5.8

This problem gives an excellent introduction to the basic principles of


quality control.
The process is assumed to be in control. If this hypothesis is true then

the probability of any one component being defective = 3.6/600 = 0.006

Thus, by the binomial law,

probability of x defects in a shift p(x) = C(600, x) (0.006)^x (0.994)^(600-x)


Number of       Number of    Poisson    Calculated number
defects (x)     shifts       p(x)       of shifts: 52 p(x)

0                 2          0.0273      1.4
1                 6          0.0984      5
2                 9          0.1771      9
3                11          0.2125     11
4                 8          0.1912     10
5                 6          0.1377      7
6                 4          0.0826      4.5
7                 3          0.0425      2
8                 2          0.0191      1
9                 1          0.0076      0.5
10                0          0.0040      0.2

Total            52          1.0000     51.2

Table 5.9

However, here again the Poisson law gives an excellent approximation to the
binomial, reducing the computation considerably.
It should be noted that in most attribute quality control tables this Poisson
approximation is used.
Using m = 3.6, table 5.9 gives the comparison of the actual pattern of variation
with the Poisson.
Reference to the table indicates that the defects in the period of 52 shifts did
not show any 'abnormal' deviations from the expected number.
Thus, this comparison gives the basis for determining whether or not a
process is in control, the basic first step in any quality control investigation.
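The Poisson comparison of table 5.9 can be generated in a few lines. This Python sketch (an added check, standard library only; the garbled frequency for 9 defects is read as 1 so that the total is 52) computes the expected number of shifts:

```python
from math import exp, factorial

m = 3.6                                          # average defects per shift
observed = [2, 6, 9, 11, 8, 6, 4, 3, 2, 1, 0]    # shifts with 0..10 defects
shifts = sum(observed)                           # 52

# expected shifts with x defects = 52 * e^(-m) m^x / x!
expected = [shifts * exp(-m) * m**x / factorial(x) for x in range(11)]
for x in range(11):
    print(x, observed[x], round(expected[x], 1))
```

The expected counts (1.4, 5.1, 9.2, 11.0, ...) match the calculated column of table 5.9.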

5.3 Problems for Solution


1. In a machine shop with 250 machines, the utilisation of each machine is
80%, i.e. 20% of the time the machine is not working. What is the probability
of having
(a) more than 60 machines idle at anyone time?
(b) between 60 and 65 machines idle?
(c) less than 32 machines idle?
2. In a sampling scheme, a random sample of 500 is taken from each batch of
components received. If one or more defects are found the batch is rejected.
What is the probability of rejecting batches containing
(a) 1% defectives?
(b) 0.1% defectives?
3. Assuming equal chance of birth of a boy or girl, what is the probability that
in a class of 50 students, less than 30% will be boys?
4. The average number of customers entering a supermarket in 1 h is 30.
Assuming that all customers arrive independently of each other, what is the
probability of more than 40 customers arriving in 1 h?
5. In a hotel, the five public telephones in the lobby are utilised 48% of the time
between 6 p.m. and 7 p.m. in the evening. What is the probability of
(a) all telephones being in use?
(b) four telephones being in use?
6. A city corporation has 24 dustcarts for collection of rubbish in the city. Given
that the dustcarts are 80% utilised or 20% of time broken down, what proportion
of the time will there be more than three dustcarts broken down?
7. A batch of 20 special resistors are delivered to a factory. Four resistors are

defective. Four resistors are selected at random and installed in a control panel.
What is the probability that no defective resistor is installed?

5.4 Worked Solutions to the Problems


1. This is the binomial distribution.
Since p = 0.20 and np = 250 × 0.20 = 50 > 5, the normal approximation can be
used.
μ = 50
σ² = 250 × 0.20 × 0.80 = 40
σ = 6.3

Figure 5.5 (normal curve, μ = 50, σ = 6.3, with limits 31.5, 59.5, 60.5 and 65.5)

(a) u = (60.5 - 50)/6.3 = 1.67

Probability of more than 60 machines idle P(>60) = 0.0475

(b) u = (65.5 - 50)/6.3 = 2.46

Probability of more than 65 machines idle P(>65) = 0.0069

Also u = (59.5 - 50)/6.3 = 1.51

Probability of more than 59 machines idle P(>59) = 0.0655

Probability of between 60 and 65 machines idle (inclusive) = 0.0655 - 0.0069
= 0.0586

(c) u = (31.5 - 50)/6.3 = -2.94

Probability of less than 32 machines idle P(<32) = 0.0016

2. This is the hypergeometric distribution in theory, but if it is assumed that n,
the sample size, is less than 10% of the batch, the binomial can be used.

For 1% defectives, since p < 0.10, and for 0.1% defectives, since p < 0.10,
the Poisson approximation can be used.

(a) m = np = 500 × 0.01 = 5

Probability of rejecting batches with 1% defectives
P(>0) = 0.9933

(b) m = np = 500 × 0.001 = 0.5

Probability of rejecting batches with 0.1% defectives
P(>0) = 0.3935
3. Probability of birth of a boy p = ½, sample size n = 50.
This is the binomial distribution, but since 0.10 ≤ p ≤ 0.90 and np > 5 the
normal approximation can be used. Thus μ = np = 50 × 0.5 = 25
σ = √[np(1 - p)] = √(50 × 0.5 × 0.5) = √12.5 = 3.54

To calculate the probability of there being less than 15 boys (30% of 50):

u = (14.5 - 25)/3.54 = -10.5/3.54 = -2.97

Figure 5.6 (normal curve, μ = 25, σ = 3.54; tail below 14.5)

From tables*,
Probability of a class of 50 having less than 15 boys = 0.0015
Compare this with the correct answer from binomial tables of 0.0013.

4. This by definition is the Poisson law. However, since m > 15, the normal
approximation can be used. Here μ = 30, σ = √30 = 5.48

u = (40.5 - 30)/5.48 = 1.92

Probability of more than 40 customers arriving in 1 h = 0.0274

(Compare this with the theoretically correct result from the Poisson of 0.0323.)

Figure 5.7 (normal curve, μ = 30, σ = 5.48)

5. Probability of a telephone booth being busy p = 0.48,
number of booths n = 5.
This is the binomial distribution.
This example is included to demonstrate clearly that cases will arise in
practice where the approximations given will not apply. Here we cannot use the
Poisson approximation since p > 0.10. Also we cannot use the normal
approximation since np is not greater than 5. Thus the problem must be solved
by computing the binomial distribution or referring to comprehensive binomial
tables.
Thus

probability of all telephones being in use = 0.48^5 = 0.0255

probability of four telephones being in use = C(5, 4) × 0.48^4 × 0.52 = 0.1380
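When neither approximation applies, the binomial terms are computed directly, as here. This Python sketch (an added check, standard library only) does so:

```python
from math import comb

n, p = 5, 0.48

def binom(x):
    # exact binomial term: C(n, x) p^x (1 - p)^(n - x)
    return comb(n, x) * p**x * (1 - p)**(n - x)

print(round(binom(5), 4))   # all five telephones in use: 0.0255
print(round(binom(4), 4))   # exactly four in use: 0.1380
```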

6. Here, n = 24.
Probability of a dustcart's being broken down p = 0.20. This is the binomial
distribution. Here the normal distribution can be used as an approximation.
Mean μ = np = 24 × 0.20 = 4.8
Variance σ² = np(1 - p) = 24 × 0.20 × 0.80 = 3.84
Standard deviation = 1.96

Figure 5.8 (normal curve, μ = 4.8, σ = 1.96)

Table 3* gives the probability of three or less dustcarts being out of service
as 0.2546.
Probability of more than three dustcarts being out of service
P(>3) = 1 - 0.2546 = 0.7454, or 74.5%
7. This is the hypergeometric distribution and, since the sample size (4) is
greater than 10% of the population size (20), no approximation can be made.
Thus the hypergeometric distribution must be used.
Probability of 0 defectives

P(0) = C(4, 0) C(16, 4) / C(20, 4) = (16 × 15 × 14 × 13)/(20 × 19 × 18 × 17)
     = 0.3756
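The hypergeometric term is a one-line computation. This Python sketch (an added check, standard library only) confirms the result:

```python
from math import comb

N, M, n = 20, 4, 4     # batch size, defectives in batch, sample size
# Hypergeometric: P(0 defectives) = C(4, 0) C(16, 4) / C(20, 4)
p0 = comb(M, 0) * comb(N - M, n) / comb(N, n)
print(round(p0, 4))    # 0.3756
```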

5.5 Practical Laboratory Experiment and Demonstrations


Using the binomial sampling box, experiment 8 in the laboratory manual
demonstrates how the Poisson distribution can be used to approximate the
binomial.
The laboratory instructions are given together with recording, analysis and
summary sheets in pages 37-39 of the manual.
The laboratory instruction sheet for experiment 8 is reproduced in
Appendix 1.

Appendix I-Instruction Sheet for Experiment 8

Number of persons: 2 or 3.

Object
To demonstrate that the Poisson law may be used as an approximation to the
binomial law for suitable values of n (sample size) and p (proportion of the
population having a given attribute), and that, for a given sample size n, the
approximation improves as p becomes smaller. (Note: for a given value of p, the
approximation also improves as n increases.)

Method
Using the binomial sampling box, take 100 samples of size 10, recording, in
table 21, the number of red balls in each sample. (proportion of red balls in the
population = 0.02.)

Analysis
1. Summarise the data into a frequency distribution of number of red balls per
sample in table 22 and compare the experimental probability distribution with
the theoretical binomial (given) and Poisson probability distributions.
Draw both the theoretical Poisson (mean =0.2) and the experimental
probability histograms on figure 1 below table 22.
2. Using the data of experiment 7 and table 23, compare the observed
probability distribution with the binomial and Poisson (mean = 1.5) probability
distributions.
Also, draw both the theoretical Poisson (mean = 1.5) and the experimental
probability histograms on figure 2 below table 23.
Note: Use different colours for drawing the histograms in order that comparison
may be made more easily.
6 Distribution of linear functions of variables

6.1 Syllabus Covered


Variance of linear combinations of variates; distribution of sample mean;
central limit theorem.

6.2 Resume of Theory and Basic Concepts


6.2.1 Linear Combination of Variates
Consider the following independent variates x, y, z, ... with means x̄, ȳ, z̄, ...
and variances σx², σy², σz², ...
Let wr = a·xr + b·yr + c·zr + ... where a, b, c, ... are constants.
Then w is distributed with mean w̄ = a·x̄ + b·ȳ + c·z̄ + ... and variance
σw² = a²σx² + b²σy² + c²σz² + ...

Special Case 1 - Variance of Sum of Two Variates

Here a = +1, b = +1 and c = 0, as for all other constants; then
wr = xr + yr
w̄ = x̄ + ȳ
σw² = σx² + σy²
or the variance of the sum of two independent variates is equal to the sum of
their variances.

Special Case 2 - Variance of Difference of Two Variates

Here a = +1, b = -1 and all other constants = 0.
wr = xr - yr
w̄ = x̄ - ȳ
σw² = σx² + σy²
or the variance of the difference of two variates is the sum of their variances.
Note: It should be noted that while this theorem places no restraint on the
form of distribution of variates the following conditions are of prime importance:
(1) If variates x, y, Z, . .. are normally distributed then w is also normally
distributed.
(2) If variates x, y, Z are Poisson distributed then w is also distributed as
Poisson.

Examples
1. In fitting a shaft into the bore of a housing, the shafts have a mean diameter of
50 mm and standard deviation of 0.12 mm. The bores have a mean diameter of
51 mm and standard deviation of 0.25 mm. What is the distribution of the
clearance of the fit?
The mean clearance = 51 - 50 = 1 mm
Variance of clearance = 0.12² + 0.25² = 0.0769 mm²
Standard deviation of clearance = √0.0769 = 0.277 mm
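The rule "variances add for a difference" makes this a two-line computation. This Python sketch (an added check, standard library only) reproduces the result:

```python
from math import sqrt

shaft_mean, shaft_sd = 50.0, 0.12   # mm
bore_mean, bore_sd = 51.0, 0.25     # mm

clearance_mean = bore_mean - shaft_mean
# variances add for the difference of two independent variates
clearance_sd = sqrt(shaft_sd**2 + bore_sd**2)

print(clearance_mean, round(clearance_sd, 3))
```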

2. A machine producing spacers works to a nominal dimension of 5 mm and


standard deviation of 0.25 mm. Five of these spacers are fitted on to a bolt
manufactured to a nominal shaft dimension of 38 mm and standard deviation
0.50mm.
What is the mean and variance of the clearance on the end of the shaft of
the bolt?

Figure 6.1 (bolt of nominal length 38 mm with five spacers fitted)

Here average clearance = 38 - 5 × 5 = 13 mm

Variance of clearance = 1 × 0.5² + 5 × 0.25² = 0.5625 mm²
Standard deviation = 0.75 mm

3. (a) The time taken to prepare a certain type of component before assembly
is normally distributed with mean 4 min and standard deviation of 0.5 min. The
time taken for its subsequent assembly to another component is independent of
preparation time and again normally distributed with mean 9 min and standard
deviation of 1.0 min.

What is the distribution of total preparation and assembly time and what
proportion of assemblies will take longer than 15 min to prepare and assemble?
Let wr = total preparation and assembly time for the rth unit.
w̄ = 4 + 9 = 13 min
σw² = 1² × 0.5² + 1² × 1.0² = 1.25
or standard deviation of w, σw = √1.25 = 1.12 min

Figure 6.2 Distribution of preparation and assembly time, w (mean μw = 13,
σw = 1.12)

u = (15 - 13)/1.12 = 2/1.12 = 1.78

Reference to table 3* gives that the probability of total assembly and preparation
time exceeding 15 min is 0.0375, or 3.75% of units.

(b) In order to show clearly the use of constants a, b, c, . .. , consider the


previous example, but suppose now that each unit must be left to stand for
twice as long as its actual preparation time before assembly is commenced.
What is the distribution of total operation time now?
Here
wr = 3xr + yr
where
xr = preparation time
yr = assembly time
w̄ = (3 × 4) + 9 = 21 min
σw² = 3² × 0.5² + 1² × 1² = 3.25
Standard deviation of w = 1.8 min.

(c) To further clarify the use of constants, consider now example 3(a). Here
the unit has to be sent back through the preparation phase twice before passing
on to assembly.
Assuming that the individual preparation times are independent, what is the
distribution of the total operation time now?
Here
wr = x1r + x2r + x3r + yr
or
w̄ = (4 + 4 + 4) + 9 = 21 min, as before

however, variance
σw² = 3 × 0.5² + 1² × 1² = 1.75
Standard deviation
σw = 1.32 min
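The contrast between cases (b) and (c) is the key point: scaling one variate by 3 multiplies its variance by 9, whereas summing three independent copies multiplies it by 3. This Python sketch (an added illustration, standard library only, using the example's figures) makes the difference explicit:

```python
from math import sqrt

prep_mean, prep_sd = 4.0, 0.5   # preparation time, min
asm_mean, asm_sd = 9.0, 1.0     # assembly time, min

# (b) w = 3x + y: one preparation time scaled by 3
mean_b = 3 * prep_mean + asm_mean
sd_b = sqrt(3**2 * prep_sd**2 + asm_sd**2)

# (c) w = x1 + x2 + x3 + y: three independent preparation times
mean_c = 3 * prep_mean + asm_mean
sd_c = sqrt(3 * prep_sd**2 + asm_sd**2)

print(mean_b, round(sd_b, 2))   # same mean, 21 min, in both cases
print(mean_c, round(sd_c, 2))
```

Both totals have mean 21 min, but the standard deviations differ: 1.8 min for (b) against 1.32 min for (c).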
6.2.2 Distribution of Sum of n Variates
The sum of n equally distributed variates has a distribution whose average and
variance are equal to n times the average and variance of the individual variates.
This follows directly from the general theorem in section 6.2.1.
Let
x̄ = ȳ = z̄ = ... and a = b = c = ... = 1
then
w̄ = x̄ + x̄ + ... + x̄ = n·x̄
σw² = (1² × σx²) + (1² × σx²) + ... + (1² × σx²) = n·σx²

Example
Five resistors from a population whose mean resistance is 2.6 kΩ and standard
deviation is 0.1 kΩ are connected in series. What is the mean and standard
deviation of such random assemblies?
Average resistance = 5 × 2.6 = 13 kΩ
Variance of assembly = 5 × 0.1² = 0.05
Standard deviation = 0.224 kΩ

6.2.3 Distribution of Sample Mean


The distribution of means of samples of size n from a distribution with mean μ
and variance σ² has a mean of μ and variance σ²/n.

Population: mean μ, variance σ²

Consider the ith sample of size n from this population.
Let xr = rth member of this sample;
then the mean of the ith sample

x̄i = (1/n)(x1 + x2 + x3 + ... + xr + ... + xn)
   = (1/n)x1 + (1/n)x2 + (1/n)x3 + ... + (1/n)xr + ... + (1/n)xn

Since mean of x1 = mean of x2 = ... = mean of xr = μ,

average of distribution of means of samples of size n = (1/n)(μ + μ + ... + μ) = μ

Variance of distribution of means of samples of size n
= (1/n)²σ² + (1/n)²σ² + ... + (1/n)²σ² = σ²/n

Standard deviation of means of samples of size n = σ/√n
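The σ/√n result is easy to demonstrate by simulation. This Python sketch (an added illustration, standard library only; the population values μ = 6.0, σ = 0.2 are borrowed from the rod-length experiment earlier in the book) draws repeated samples of size 4 and summarises their means:

```python
import random
from math import sqrt

random.seed(1)
mu, sigma, n = 6.0, 0.2, 4      # population mean/sd and sample size
trials = 20000

means = []
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    means.append(sum(sample) / n)

grand_mean = sum(means) / trials
sd_of_means = sqrt(sum((x - grand_mean) ** 2 for x in means) / trials)

print(round(grand_mean, 2))     # close to mu = 6.0
print(round(sd_of_means, 3))    # close to sigma / sqrt(n) = 0.1
```

The standard deviation of the sample means comes out near 0.1, i.e. σ/√n = 0.2/√4, as the theorem requires.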


6.2.4 Central Limit Theorem
(Associated with theorem of distribution of sample mean in section 6.2.3 or
with distribution of sum of variates in section 6.2.2.)
The distribution of sample mean (or the sum of n variates) has a distribution
that is more normal than the distribution of individual variables.
This theorem explains the prominence of the normal distribution in the
theory of statistics and the approximation to normality obviously depends upon
the shape of the distribution of the variate and the size of n. As n increases the

Figure 6.3 Probability distribution of (a) the score of 1 die, (b) the score of
3 dice.

sampling distribution of means gets closer to normality and similarly the closer
the original distribution to normal the quicker the approach to true normal form.
However the rapidity of the approach is shown in figure 6.3 which shows the
distribution of the total score of three 6-sided dice thrown 50 times. This is
equivalent to sampling three times from a rectangular population and it will
be seen that the distribution of the sum of the variates has already gone a long
way towards normality.

6.2.5 Distribution of the Sum (or Difference) of Two Means


If μx and μy are the means of the distributions of x and y, and σx², σy² their
respective variances, then if a sample of nx is taken from the x population and
a sample of ny from the y population, the distribution of the sum (or difference)
between the averages of the samples has mean

μx + μy (or μx - μy)

and variance

σx²/nx + σy²/ny

In the special case where two samples of size n1 and n2 are taken from the
same population with mean μ and variance σ², the moments of the distribution
of the sum (or difference) of sample averages are: mean 2μ for the sum (and 0
for the difference) and variance

σ²(1/n1 + 1/n2)

This theorem is most used for testing the difference between two populations,
but this is left until chapter 7.

Example
A firm calculates each period the total value of sales orders received, in pounds
and pence. The average value of an order received is approximately £400, and
the average number of orders per period is 100.
What likely maximum error in estimation will be made if in totalling the
orders, they are rounded off to the nearest pound?
Assuming that each fraction of £1 is equally likely (the problem can,
however, be solved without this restriction) the probability distribution of the
error on each order is rectangular as in figure 6.4, showing that each rounding
off error is equally likely.
Consider the total error involved in adding up 100 orders each rounded off.
Statistically this is equivalent to finding the sum of a sample of 100 taken
from the distribution in figure 6.4.

Figure 6.4 Probability distribution of error/order (rectangular from -50p to +50p).

From theorem 3 the distribution of this sum will be approximately normal and
its mean and variance are given below.
Average error = 0
Variance of sum = 100σ², where σ² = variance of the distribution of individual
errors

For a rectangular distribution, σ² = h²/12 where h = range of base.

h in this problem = 100 pence = £1.00

Variance of sum = 100 × (1/12) = 100/12 = 8.33
Standard deviation = √8.33 = £2.89
Here the likely maximum error will be interpreted as the error which will be
exceeded only once in 1000 times.
Since the error can be both positive and negative,

maximum likely error = ±3.29 × £2.89, or ±£9.50
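The rounding-error argument can be checked by simulation. This Python sketch (an added illustration, standard library only) generates many totals of 100 uniform rounding errors and summarises them:

```python
import random
from math import sqrt

random.seed(1)
orders, trials = 100, 10000

# each rounding-off error is uniform on (-50p, +50p) = (-0.5, +0.5) pounds
totals = [sum(random.uniform(-0.5, 0.5) for _ in range(orders))
          for _ in range(trials)]

mean_total = sum(totals) / trials
sd_total = sqrt(sum((t - mean_total) ** 2 for t in totals) / trials)

print(round(mean_total, 2))   # close to 0
print(round(sd_total, 2))     # close to sqrt(100/12), about 2.89
```

The simulated standard deviation comes out near £2.89, confirming the h²/12 calculation above.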

6.3 Problems for Solution


1. A manufacturer markets butter in ½ kg packages. His packing process has a
standard deviation of 10 g. What must his process average be set at to ensure that
the chance of any individual package's being 5% under the nominal weight of
½ kg is only 5% (or 1 in 20)?
If the manufacturer now decides to market in super packages containing
four ½-kg packages, what proportion of his product can be saved by this
marketing method if he still has to satisfy the condition that super packages
must only have a 5% chance of being 5% under the nominal weight of 2 kg?
2. The maximum payload of a light aircraft is 350 kg. If the weight of an adult
is normally distributed (N.D.) with mean and standard deviation of 75 and 15 kg
respectively, and the weight of a child is normally distributed with mean and
standard deviation of 23 and 7 kg respectively, what is the probability that the
plane can take off safely with
(a) four adult passengers?
(b) four adult passengers and one child?

In each case, what is the probability that the plane can take off if 40 kg of
baggage is carried?
3. Two spacer pieces are placed on a bolt to take up some of the slack before
a spring washer and nut are added. The bolt (b) is pushed through a plate (p)
and then two spacers (s) added, as in figure 6.5.

Figure 6.5 (bolt (b) pushed through plate (p) with two spacers (s) added;
clearance at the end of the bolt)
Given the following data on the production of the components
plate: mean thickness 12 mm, standard deviation of thickness 0.05 mm,
normal distribution
bolt: mean length 25 mm, standard deviation of length 0.025 mm, normal
distribution
spacer: mean thickness 3 mm, standard deviation of thickness 0.05 mm,
normal distribution
what is the probability of the clearance being less than 7.2 mm?
4. In a machine fitting caps to bottles, the force (torque) applied is distributed
normally with mean 8 units and standard deviation 1.2 units. The breaking
strength of the caps has a normal distribution with mean 12 units and standard
deviation 1.6 units. What percentage of caps are likely to break on being fitted?

5. Four rods of nominal length 25 mm are placed end to end. If the standard
deviation of each rod is 0.05 mm and they are normally distributed, find the
99% tolerance of the assembled rods.
6. The heights of the men in a certain country have a mean of 1.65 m and
standard deviation of 76 mm.
(a) What proportion will be 1.80 m or over?

(b) How likely is it that a sample of 100 men will have a mean height as
great as 1.68 m. If the sample does have a mean of 1.68 m, to what extent does
it confirm or discredit the initial statement?
7. A bar is assembled in two parts, one 66 mm ± 0.3 mm and the other
44 mm ± 0.3 mm. These are the 99% tolerances. Assuming normal distributions,
find the 99% tolerance of the assembled bar.
8. Plugs are to be machined to go into circular holes of mean diameter 35 mm
and standard deviation of 0.100 mm. The standard deviation of plug diameter is
0.075 mm.
The clearance (difference between diameters) of the fit is required to be at
least 0.05 mm. If plugs and holes are assembled randomly:
(a) Show that, for 95% of assemblies to satisfy the minimum clearance
condition, the mean plug diameter must be 34.74 mm.
(b) Find the mean plug diameter such that 60% of assemblies will have the
required clearance.
In each case find the percentage of plugs that would fit too loosely (clearance
greater than 0.375 mm).
9. Tests show that the individual maximum temperature that a certain type of
capacitor can stand is distributed normally with mean of 130°C and standard
deviation of 3°C. These capacitors are incorporated into units (one capacitor per
unit), each unit being subjected to a maximum temperature which is distributed
normally with a mean of 118°C and standard deviation of 5°C.
What percentage of units will fail due to capacitor failure?
10. It is known that the area covered by 5 litres of a certain type of paint is
normally distributed with a mean of 88 m² and a standard deviation of 3 m². An
area of 3500 m² is to be painted and the painters are supplied with 40 5-litre tins
of paint. Assuming that they do not adjust their application of paint according
to the area still to be painted, find the probability that they will not have
sufficient paint to complete the job.
11. A salesman has to make 15 calls a day. Including journey time, his time
spent per customer is 30 min on average with a standard deviation of 6 min.
(a) If his working day is of 8 h, what is the chance that he will have to work
overtime on any given day?
(b) In any 5-day week, between what limits is his 'free' time likely to be?
12. A van driver is allowed to work for a maximum of 10 h per day. His
journey time per delivery is 30 min on average with a standard deviation of
8 min.
In order to ensure that he has only a small chance (1 in 1000) of exceeding
the 10 h maximum, how many deliveries should he be scheduled for each day?

6.4 Solutions to Problems


1. At least 95% of individual packets must weigh more than 0.475 kg. Thus the
process average weight must be set above 0.475 kg by 1.645 times the standard
deviation (see figure 6.6; 5% of the tail of a normal distribution is cut off
beyond 1.645 standard deviations), i.e. at
0.475 + 1.645 x 0.010 = 0.475 + 0.0164 = 0.491 kg

Figure 6.6 Individual packets (σ = 0.010)    Figure 6.7 Weight of 4 packs

If individual packages are packed four at a time, the distribution of total net
weight and the probability requirements are shown in figure 6.7.
The mean weight of 4 packages must be
1.9 + 1.645 x 0.01√4 = 1.9 + 0.033 = 1.933 kg
Thus the process setting must be 1.933/4 = 0.483 kg
The long run proportional saving of butter per nominal ½-kg package is

(0.491 - 0.483)/0.491 = 0.008/0.491 = 0.0163, or 1.63%
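The two settings and the saving can be checked numerically; the sketch below uses Python's standard-library NormalDist (variable names are ours). With unrounded intermediates the saving comes out near 1.67% rather than the 1.63% given by the rounded figures.

```python
from statistics import NormalDist

sd = 0.010
u95 = NormalDist().inv_cdf(0.95)             # ~1.645, cuts off the upper 5%
single = 0.475 + u95 * sd                    # setting for single packets
four = (1.9 + u95 * sd * 4 ** 0.5) / 4       # setting per packet, packs of 4
saving = (single - four) / single
print(round(single, 3), round(four, 3), round(saving, 4))
```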

2. (a) The weight of four adult passengers will be normally distributed with
mean of 4 x 75 (= 300) kg and standard deviation of √4 x 15 (= 30) kg. The
shaded area in figure 6.9 gives the probability that the plane is within its
maximum payload.
The standardised normal variate,
u = (350 - 300)/30 = 50/30 = 1.67

Figure 6.8 Child weight    Figure 6.9 Adult weight


134 Statistics: Problems and Solutions

Figure 6.10

Table 3* gives the unshaded area as 0.0475. The required answer is


1- 0.0475 =0.9525, say 0.95
With 40 kg of baggage, the mean weight is increased to 300 + 40 = 340 kg.

u = (350 - 340)/30 = 10/30 = 0.33
The probability of safe take-off now becomes 1 - 0.3707 = 0.63
(b) For four adults and one child the weight distribution is shown in
figure 6.11.
As before,
u = (350 - 323)/30.8 = 27/30.8 = 0.88

Thus probability of safe take-off = 1 - 0.1894 = 0.81

Figure 6.11 Weight of four adults and one child: mean (4 x 75) + 23

With 40 kg of baggage in addition,
u = (350 - 363)/30.8 = -13/30.8 = -0.42
Probability of safe take-off =0.3372, say 0.34

Figure 6.12 With 40 kg baggage: mean 300 + 23 + 40 = 363
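All four take-off probabilities can be reproduced with Python's standard-library NormalDist (a sketch, with our own variable names). The child's standard deviation is not stated in the problem; about 7 kg is assumed here so that the combined figure matches the 30.8 used above.

```python
from math import sqrt
from statistics import NormalDist

adults = NormalDist(4 * 75, sqrt(4) * 15)    # total weight of four adults, kg
p_a = adults.cdf(350)                        # (a) no baggage, ~0.95
p_a_bag = adults.cdf(350 - 40)               # with 40 kg baggage, ~0.63

# (b) child sd assumed ~7 kg, giving the combined sd of 30.8 used in the text
family = NormalDist(4 * 75 + 23, 30.8)
p_b = family.cdf(350)                        # ~0.81
p_b_bag = family.cdf(350 - 40)               # ~0.34
print(round(p_a, 2), round(p_a_bag, 2), round(p_b, 2), round(p_b_bag, 2))
```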



3. The mean clearance will be 25 - 12 - 3 - 3 = 7 mm


The variance of the clearance will be 0.025² + 0.050² + 0.050² + 0.050² = 0.008125
and the standard deviation is 0.090 mm.
The distribution of clearance is shown in figure 6.13, the shaded area being
the required answer.

u = (7.2 - 7.0)/0.09 = 0.2/0.09 = 2.22
thus the probability that the clearance is less than 7.2 mm is 1 - 0.0132 = 0.987

Figure 6.13 Clearance (σ = 0.090)
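A quick numerical check of this solution (standard-library Python; variable names are ours):

```python
from math import sqrt
from statistics import NormalDist

mean_clear = 25 - 12 - 3 - 3                    # 7 mm
sd_clear = sqrt(0.025 ** 2 + 3 * 0.050 ** 2)    # ~0.090 mm
p = NormalDist(mean_clear, sd_clear).cdf(7.2)   # clearance below 7.2 mm
print(round(sd_clear, 3), round(p, 3))
```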

4. A cap will break if the applied force is greater than its breaking strength.
The mean excess of breaking strength is 12 - 8 = 4 units while the standard
deviation of the excess of breaking strength is √(1.6² + 1.2²) = √4.00 = 2.0.
When the excess of cap strength is less than zero the cap will break and the
proportion of caps doing so will be equal to the shaded area of figure 6.14, i.e.
the area below
u = (0 - 4)/2 = -2, or 0.0228,
about 2.3% of caps.

Figure 6.14 Excess of breaking strength
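The same breakage proportion follows in two lines of standard-library Python (our own sketch):

```python
from math import sqrt
from statistics import NormalDist

# excess of cap strength over applied force
excess = NormalDist(12 - 8, sqrt(1.6 ** 2 + 1.2 ** 2))
p_break = excess.cdf(0)        # cap breaks when the excess is negative, ~0.023
print(round(p_break, 4))
```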

5. The distribution of the total length of four rods will be normal with a mean
of 4 x 25 = 100 mm and standard deviation of √4 x 0.05 = 0.10 mm.
Ninety-nine per cent of all assemblies of four rods will have their overall
length within the range
100 ± 2.58 x 0.10 mm i.e. 100 ± 0.26 mm
6. (a) Assuming that heights can be measured to very small fractions of a
metre, the required answer is equal to the shaded area in figure 6.15.

u = (1.80 - 1.65)/0.076 = 1.97
and area = 0.0244

Figure 6.15 Height of individuals (σ = 0.076)

If heights, say, can only be measured to the nearest 5 mm, it would be


reasonable to say that any height actually greater than 1.795 m would be
recorded as 1.80 m or more. In this case u becomes

u = (1.795 - 1.65)/0.076 = 0.145/0.076 = 1.91

Proportion over 1.80 m tall = 0.0281.


(b) Average heights of 100 men at a time (selected randomly) will be
distributed normally with a mean of 1.65 m and standard deviation of
76/√100 = 7.6 mm.
Probability of a mean height of 1.68 m or more equals the shaded area in
figure 6.16.

u = (1.68 - 1.65)/0.0076 = 3.95
The shaded area is about 0.00004.

Figure 6.16 Mean of 100 heights

Possible alternative conclusions are that this particular sample is a very unusual
one or that the assumed mean height of 1.65 m is wrong (being an underestimate)
or that the standard deviation is actually higher than the assumed value of
76 mm.
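All three of these tail probabilities can be verified numerically (a sketch in standard-library Python; names are ours):

```python
from statistics import NormalDist

heights = NormalDist(1.65, 0.076)
p_tall = 1 - heights.cdf(1.80)               # exact measurement, ~0.024
p_tall_rounded = 1 - heights.cdf(1.795)      # heights rounded to 5 mm, ~0.028

means = NormalDist(1.65, 0.076 / 100 ** 0.5)  # means of samples of 100
p_mean = 1 - means.cdf(1.68)                  # ~0.00004
print(round(p_tall, 4), round(p_tall_rounded, 4), f"{p_mean:.5f}")
```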
7. The standard deviation of each component part is 0.3/2.58. The standard
deviation of an assembly of the two parts will be 0.3√2/2.58 about a mean of

66 + 44 = 110 mm.
Ninety-nine per cent of assemblies will lie within 110 ± 0.3√2 mm, i.e. within
110 ± 0.42 mm.

8. The distribution of clearance will have standard deviation of
√(0.100² + 0.075²) = 0.125 mm
(a) For 95% of assemblies to have clearance greater than 0.05 mm and
assuming normality of the distribution, the mean must be
0.05 + 1.645 x 0.125 = 0.256 mm
and thus the mean plug diameter must be less than 35 mm by this amount, i.e.
34.74 mm.

Figure 6.17 Clearance

A clearance of 0.375 mm is equivalent to
u = (0.375 - 0.256)/0.125 = 0.95

Table 3* shows that the probability of exceeding a standardised normal variate


of 0.95 is 0.1711, i.e. approximately 17% of plugs would be too loose a fit.
(b) For 60% of assemblies to have clearance greater than 0.05 mm, the mean
clearance must be
0.05 + 0.253 x 0.125 = 0.082 mm
and the mean plug diameter must be less than 35 mm by this amount, i.e.
34.92 mm.

Figure 6.18 Clearance

For a clearance of 0.375 mm,
u = (0.375 - 0.082)/0.125 = 2.34

corresponding to an upper tail area of 0.0096, i.e. less than 1% of clearances
would be too great.
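Both cases of this problem can be run through the same few lines of standard-library Python (our own sketch):

```python
from math import sqrt
from statistics import NormalDist

sd_clear = sqrt(0.100 ** 2 + 0.075 ** 2)    # 0.125 mm
std = NormalDist()
results = []
for p_fit in (0.95, 0.60):                  # cases (a) and (b)
    mean_clear = 0.05 + std.inv_cdf(p_fit) * sd_clear
    too_loose = 1 - NormalDist(mean_clear, sd_clear).cdf(0.375)
    results.append((round(35 - mean_clear, 2), round(too_loose, 3)))
print(results)    # plug means ~34.74 and ~34.92
```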

9. A capacitor will fail if the maximum temperature to which it is subjected is
greater than its own threshold temperature.
The proportion of units for which the maximum applied temperature is
greater than the temperature the capacitor can resist is given by the shaded area
in figure 6.21.

Figure 6.19 Max. applied temperature    Figure 6.20 Capacitor max. temperature

Figure 6.21 Distribution of (x - y); σ(x-y) = √34

For the distribution of excess of capacitor threshold temperature over
applied temperature,
the mean = 130 - 118 = 12 and variance = 3² + 5² = 34.
Since this distribution will be normal and any negative excess corresponds to
failure of the capacitor, we require the area below O. The u-value equivalent to
this is
u = (0 - 12)/√34 = -2.06
giving a proportion of 0.0197.
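The failure proportion is reproduced directly by standard-library Python (a sketch; names are ours):

```python
from math import sqrt
from statistics import NormalDist

# excess of capacitor threshold over applied temperature
excess = NormalDist(130 - 118, sqrt(3 ** 2 + 5 ** 2))
p_fail = excess.cdf(0)        # unit fails when the excess is negative, ~0.02
print(round(p_fail, 4))
```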
10. The area covered by 5 litres of paint is normally distributed with mean and
standard deviation of 88 m² and 3 m², respectively. Thus the area covered by

Figure 6.22 Area covered by 40 x 5 litres of paint

40 x 5 litres of paint will also be normally distributed with mean of
40 x 88 (= 3520) m² and standard deviation of √40 x 3 (= 19.0) m².
The probability of covering less than 3500 m 2 is the probability of having
insufficient paint for the job (shaded area of figure 6.22).
To find the shaded area

u = (3500 - 3520)/19.0 = -1.05
giving an answer of about 14.7%.
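A numerical check (standard-library Python, our own sketch) confirms the shortfall probability:

```python
from statistics import NormalDist

total = NormalDist(40 * 88, (40 * 3 ** 2) ** 0.5)  # area from 40 tins, m^2
p_short = total.cdf(3500)                           # paint runs out, ~0.146
print(round(p_short, 3))
```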
11. (a) The distribution of time spent on a total of 15 calls will be approximately
normal (by the central limit theorem) with a mean of 15 x 30(=450) min and a
standard deviation of √15 x 6 (= 23.3) min.

Figure 6.23 Time for 15 visits

The probability that 15 calls take longer than 8 h is represented by the shaded
area in figure 6.23.
480 min (8 h) corresponds to

u = (480 - 450)/(√15 x 6) = 30/23.3 = 1.29
The required probability is 0.0985.
(b) There may be differing interpretations about what is meant by 'free' time
in a week. 'Free' time for the salesman occurs on days when he works less than
8 h. The total of such time is found for five consecutive days, no account being
taken of any 'overtime' that has to be worked. The solution of such a problem is
quite difficult.
In this case, we shall consider 'free' time as the net amount by which his
actual working time is less than his scheduled working time.

Figure 6.24 Working time



In 5 days, the number of calls to be made is 75. The distribution of total
time to make these calls is approximately normal (by the central limit theorem)
with a mean of 75 x 0.5 (= 37.5) h and standard deviation of √75 x 0.1
(= 0.866) h.
The salesman's total working time will lie within 2.58 standard deviations of
the expected time for 75 calls with a probability of 99%, i.e. within
37.5 ± 2.58 x 0.1√75 = 37.5 ± 2.24 = 35.26 to 39.74 h
There is thus only a small chance (1 %) that his 'free' time in one week lies
outside the range
0.25 to 4.75 h
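Both parts can be verified numerically; the sketch below (standard-library Python, our names) uses the exact 0.995 point rather than the rounded 2.58, so the 'free' time limits come out very slightly inside the 0.25 to 4.75 h quoted above.

```python
from statistics import NormalDist

day = NormalDist(15 * 30, (15 * 6 ** 2) ** 0.5)    # one day's 15 calls, min
p_overtime = 1 - day.cdf(480)                       # longer than 8 h, ~0.098

week = NormalDist(75 * 30, (75 * 6 ** 2) ** 0.5)   # 75 calls in 5 days, min
u995 = NormalDist().inv_cdf(0.995)                  # ~2.58
free_lo = (5 * 480 - (week.mean + u995 * week.stdev)) / 60   # hours
free_hi = (5 * 480 - (week.mean - u995 * week.stdev)) / 60
print(round(p_overtime, 3), round(free_lo, 2), round(free_hi, 2))
```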
12. Let the required number of deliveries be n. The time required for n
deliveries will be approximately normally distributed (central limit theorem)
with mean time of 30n min and standard deviation of 8√n min.

Figure 6.25 Time per delivery    Figure 6.26 Time for n deliveries (σ = 8√n)

In order that there is only 1 chance in 1000 that n journeys take longer than
10 h (600 min), n must be such that
30n + 3.09 x 8√n ≤ 600
The largest value of n that satisfies the inequality can be found by systematic
trial and error. However, a more general approach is to solve the equality as a
quadratic in yn, taking the integral part of the admissible solution as the number
of deliveries to be scheduled.
Thus
30n + 24.72√n - 600 = 0

√n = [-24.72 ± √(24.72² + 72 000)]/60 = 4.0788 or -4.9028

(discard negative term as inadmissible)


n = 16.64 or 24.04
n = 24.04, corresponding to the negative root of the quadratic, is clearly

inadmissible since the average total journey time would be 12 h, violating the
probability condition.
The number of deliveries to be scheduled is therefore 16.
If 16 deliveries were scheduled, the probability of exceeding 10 h would
actually be less than 0.001, in fact about 1 in 10 000.
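The quadratic solution can be coded directly (standard-library Python, our own sketch; the exact 0.999 point replaces the rounded 3.09):

```python
from math import sqrt
from statistics import NormalDist

u = NormalDist().inv_cdf(0.999)       # ~3.09
a, b, c = 30, 8 * u, -600             # 30n + u*8*sqrt(n) - 600 = 0
root = (-b + sqrt(b * b - 4 * a * c)) / (2 * a)   # admissible sqrt(n)
n = int(root ** 2)                    # integral part: deliveries to schedule
print(n)                              # 16
```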

6.5 Practical Laboratory Experiments and Demonstrations


The following experiment from Basic Statistics Laboratory Instruction Manual
demonstrates the basic concepts of the distribution of sample means and the
central limit theorem.

Appendix I-Experiment 12

Sampling Distribution of Means and Central Limit Theorem


Number of persons: 2 or 3.

Object
To demonstrate that the distribution of the means of samples of size n, taken
from a rectangular population with standard deviation σ, tends towards the
normal with standard deviation σ/√n.

Method
From the green rod population M6/3 (rectangularly distributed with mean of
6.0 and standard deviation of 0.258), take 50 random samples each of size 4,
replacing the rods after each sample and mixing them, before drawing the next
sample of 4 rods.
Measure the lengths of the rods in the sample and record them in table 33.

Analysis
1. Calculate, to 3 places of decimals, the means of the 50 samples and summarise
them into a grouped frequency distribution using table 34.
2. Also in table 34, calculate the mean and standard deviation of the sample
means and record these estimates along with those of other groups in table 35.
Observe how they vary amongst themselves around the theoretically
expected values.
3. In table 36, summarise the frequencies obtained by all groups and draw, on
page 57, the frequency histogram for the combined results. Observe the shape
of the histogram.

Sample no.   1   2   3   4   5   6   7   8   9   10
Total
Average

Sample no.   11  12  13  14  15  16  17  18  19  20
Total
Average

Sample no.   21  22  23  24  25  26  27  28  29  30
Total
Average

Sample no.   31  32  33  34  35  36  37  38  39  40
Total
Average

Sample no.   41  42  43  44  45  46  47  48  49  50
Total
Average

Table 6.1 (Table 33 of the laboratory manual)



Class interval*   Mid point   Tally marks    u    f    fu    fu²
5.600-5.650       5.625                     -5
5.675-5.725       5.700                     -4
5.750-5.800       5.775                     -3
5.825-5.875       5.850                     -2
5.900-5.950       5.925                     -1
5.975-6.025       6.000                      0
6.050-6.100       6.075                      1
6.125-6.175       6.150                      2
6.200-6.250       6.225                      3
6.275-6.325       6.300                      4
6.350-6.400       6.375                      5

Totals of +ve terms
Total of -ve terms
Net totals
Table 6.2 (Table 34 of the laboratory manual)

Calculation of Distribution Mean, x̄, and Standard Deviation, s'

6.000 is the mid point of the class denoted by u = 0
Class width = 0.075
The mean, x̄, of the distribution is given by

x̄ = 6.000 + 0.075 (Σfu/Σf) =

The standard deviation, s', of the distribution is given by

s' = 0.075 √{[Σfu² - (Σfu)²/Σf]/(Σf - 1)} =

* Strictly the class intervals should read 5.5875-5.6625 and the next 5.6625-5.7375 etc.
but the present tabulation makes summarising simpler.
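The whole experiment can also be simulated. The sketch below (our own illustration in standard-library Python) approximates the discrete rod population by a continuous rectangular distribution of the same mean and roughly the same standard deviation; the simulated means cluster round 6.0 with standard deviation close to the theoretical 0.258/√4 ≈ 0.129.

```python
import random
import statistics

random.seed(1)
# Continuous rectangular stand-in for the rod population: half-width 0.45
# gives sd = 0.45/sqrt(3) ~ 0.26, close to the manual's 0.258.
means = [statistics.fmean(random.uniform(5.55, 6.45) for _ in range(4))
         for _ in range(5000)]
print(round(statistics.fmean(means), 2), round(statistics.stdev(means), 3))
```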

Figure 6.27. 'Page 57' of the laboratory manual.


7 Estimation and Significance Testing (I): 'Large Sample' Methods

7.1 Syllabus
Point and interval estimates; hypothesis testing; risks in sampling; tests for
means and proportions; sample sizes; practical significance; exact and approximate
tests.

7.2 Resume of Theory


7.2.1 Point Estimators
A point estimator is a number obtained from a sample and used to estimate a
population parameter. For example, the average of a random sample is an
estimator of the mean of the population from which it came. The sample
median can also be used to estimate the mean of a symmetrical population as
can other sample statistics. There are certain statistically desirable properties
that point estimators should possess (unbiasedness, consistency, efficiency,
sufficiency) and which make one estimator better than another for a particular
purpose.
However, regardless of the estimator used, it is necessary to allow for
uncertainty due to sampling variation, i.e. the numerical value obtained from the
sample will not be exactly the same as the parameter value and an interval must
be defined within which we can be reasonably confident that the parameter lies.

7.2.2 Confidence Intervals


Two numbers are calculated to determine the ends of an interval within which
we can state that the population parameter lies. A probability is attached to the
calculated interval and signifies the confidence we have in stating that the
parameter actually falls within the interval. What this means is that if we found,
say, a 95% confidence interval for a parameter, if such an interval were calculated
for each of a large number of individual sample estimates, 95 out of every 100
intervals in the long run would contain the parameter and five would not.
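This long-run interpretation can be demonstrated by simulation; in the sketch below (standard-library Python, with an arbitrarily chosen population of mean 50 and σ = 6) a known-σ 95% interval is built around each of 2000 sample means and the proportion covering the true mean settles near 0.95.

```python
import random
from statistics import NormalDist, fmean

random.seed(2)
pop = NormalDist(50, 6)            # an arbitrary population for illustration
n, trials, covered = 16, 2000, 0
for _ in range(trials):
    xbar = fmean(pop.samples(n))
    half = 1.96 * 6 / n ** 0.5     # known-sigma 95% half-width
    covered += xbar - half <= 50 <= xbar + half
print(covered / trials)            # close to 0.95
```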

Confidence intervals are calculated from the sampling distribution of the
particular sample estimator being used.

7.2.3 Hypothesis Testing


In science, a theory is developed to 'explain' the occurrence of an observed
phenomenon. Further observations, usually coupled with deliberate experiments,
are made to test the theory. The theory will be accepted as an adequate model
until observations are made which it cannot satisfactorily 'explain'. In this case
modification, or abandonment of the theory in favour of another one, is
necessary. This is the approach used in statistical hypothesis testing.
An hypothesis is set up concerning a population; for example, it may be a
statement about the value of one or more parameters of the population or
perhaps about its form, i.e. that it is normal or exponential, etc. Statistical
techniques are necessary to decide whether observations agree with such
hypotheses because variation, and hence uncertainty, is usually present.
A statistical hypothesis is usually of the null type. As examples, consider
the following.
(1) In testing whether a coin is biased, the hypothesis would be set up that it
was fair, i.e. the probability of a 'head' on one toss is 0.5.
(2) In testing the efficiency of a new drug, it would be assumed as a hypothesis
that it was no different in cure potential from the standard drug in current use.
(3) A new teaching method has been introduced; to assess whether it gives an
improvement in its end product compared with the previous method, the
hypothesis set up would be that it made no difference, i.e. that it was of the
same effectiveness.
(4) To determine whether an overall 100 k.p.h. speed limit on previously
unrestricted roads reduces accidents, the hypothesis would be set up that it
makes no difference. The same method would be used to assess the effect of
breathalyser tests.

7.2.4 Errors Involved in Hypothesis Testing


In deciding, on the basis of observed data subject to variation, whether or not
to accept a statistical hypothesis, two types of error may be made.

Type I error (the α risk)


This first kind of error is the risk of rejecting the original hypothesis when it
is, in fact, true. The risk is expressed in probability terms and is of magnitude α.

Type II error (the β risk)


This error is the risk of accepting (or better, failing to reject) the original
hypothesis when, in fact, it is false. As for α, the risk is expressed as a probability
of magnitude β.

7.2.5 Hypothesis (Significance) Testing


A test of a statistical hypothesis is a procedure for deciding whether or not to
accept the hypothesis. This decision is made by assessing the significance of the
observed results, i.e. are they so unlikely on the basis of the test hypothesis that
the latter must be rejected in favour of some alternative hypothesis?
For example, in (1) in section 7.2.3 on testing the bias of a coin, the test
would consist of counting the number of 'heads' in some convenient number of
tosses and calculating the probability that such a result could have been
obtained if the hypothesis was true.
When this probability has been calculated and it turns out to be very small,
two explanations are possible. Either the hypothesis is false or else a rare event
has occurred by chance.
It is customary to choose the first of these two alternatives when the
probability is below a given level. In fact it will be seen that this probability is
the risk of rejecting the hypothesis when in fact it is true. The levels of α are
arbitrary but conventional values used are
α = 0.05, α = 0.01 and α = 0.001

7.2.6 Sample Size


The magnitude of α can be fixed for a given test but β depends on the variability
of the basic variable, the extent to which the test hypothesis is false (if it is
false) and n, the sample size used. Generally the only one of these that can be
altered at will is the sample size although there may be practical limitations of
time, cost, or feasibility restricting even this.
Nevertheless, it is useful to know the sample size required to achieve given
levels of α and β under given conditions of variability and parameter value of the
population.

7.2.7 Tests for Means and Proportions


This section deals with some of the standard tests for population means and
proportions. In both cases the following problems are considered.
I. Is the sample mean (or proportion) consistent with the value of the
population mean assumed under the test hypothesis?
II. Do two different sample means (or proportions) indicate a significant
difference in population means?
III. What is a confidence interval for the mean of a population?
IV. What is a confidence interval for the difference between the means of
two populations?
Table 7.1 summarises the requirements for tackling these four problems.
The notation used is:
Sample size n (n₁ and n₂ for problems II and IV)
Problem I
    Variables:   u = (x̄ - μ)/(σ/√n)
    Attributes (for n large):   u = (p - π)/√[π(1 - π)/n]

Problem II
    Variables:   u = [(x̄₁ - x̄₂) - (μ₁ - μ₂)]/√(σ₁²/n₁ + σ₂²/n₂)
    which, where the null hypothesis assumes μ₁ = μ₂, reduces to
                 u = (x̄₁ - x̄₂)/√(σ₁²/n₁ + σ₂²/n₂)
    Attributes:  u = [(p₁ - p₂) - (π₁ - π₂)]/√[π₁(1 - π₁)/n₁ + π₂(1 - π₂)/n₂]
    which, where the null hypothesis is π₁ = π₂, reduces to
                 u = (p₁ - p₂)/√[π̂(1 - π̂)(1/n₁ + 1/n₂)]
    where π̂ is the best estimate of π₁ or π₂ and π̂ = (n₁p₁ + n₂p₂)/(n₁ + n₂)

Problem III
    Variables: the 100(1 - α)% confidence interval is
                 x̄ - u_α/2(σ/√n) ≤ μ ≤ x̄ + u_α/2(σ/√n)
    Attributes: the 100(1 - α)% confidence interval is approximately
                 p - u_α/2√[p(1 - p)/n] ≤ π ≤ p + u_α/2√[p(1 - p)/n]

Problem IV
    Variables: the 100(1 - α)% confidence interval is
                 (x̄₁ - x̄₂) ± u_α/2√(σ₁²/n₁ + σ₂²/n₂) as limits for μ₁ - μ₂
    Attributes: the 100(1 - α)% confidence interval is approximately
                 (p₁ - p₂) ± u_α/2√[p₁(1 - p₁)/n₁ + p₂(1 - p₂)/n₂] as limits for π₁ - π₂
Table 7.1

Variables     Population mean μ (μ₁ and μ₂ for problems II and IV)
              Population standard deviation σ (σ₁ and σ₂ for problems II and
              IV) assumed known
              Sample mean x̄ (x̄₁ and x̄₂ for II and IV)
Proportions   Population proportion π (π₁ and π₂ for problems II and IV)
              Sample proportion p (p₁ and p₂ for problems II and IV)
u is the standardised normal variate. In the case of variables, σ (and σ₁ and σ₂
as appropriate) is assumed to be known or calculated from a sample of size n
larger than about 30.

7.2.8 Practical Significance


Even if a significant result is obtained, i.e. an observed sample is so extreme that
the test hypothesis is no longer tenable, such statistical significance need not
mean that the result is of any practical significance. For example, if a particular
kind of light bulb has a mean life of 1220 h and a test of significance detects
that, after a fairly expensive modification of the lamp-making process, the mean
life is increased, the modification is unlikely to be worth permanent incorporation
if the new mean life is only 1225 h, say.
In summary then, the decision made as a result of a significance test depends
on the possible consequences of that decision together with any other relevant
information that may be available.

7.2.9 Exact and Approximate Tests


This section was given the sub-heading of 'Large Sample' Methods which is a
common classification in the literature. Parts of chapter 8 refer to 'small sample'
methods. The authors believe that a better classification would be into 'exact'
and 'approximate' tests.
For example, in table 7.1, both of the tests using variables as well as the
confidence interval estimation are exact for any n, provided that the populations
involved are normal and that the variance σ² (and σ₁² and σ₂² as appropriate) is
known. The tests and intervals become approximate when the populations
involved are not normal but because of the central limit theorem, the error
involved is very small for n larger than about 4.
Approximations are also introduced when σ is unknown but is estimated
from the sample data. In this case, provided n is greater than about 30, very
little error is involved. It is in cases where a test statistic is approximately
normally distributed (which is often the case for large n) that the description of
large sample methods is applied. Note, however, that the u-test can be exact for
any n under appropriate conditions.
For attributes, as mentioned in table 7.1, all the procedures are approximate
since they depend on the tendency of the binomial distribution towards

normality for large n (and preferably with π neither small nor large); see chapter 5.

7.2.10 Interpretation of Significant Results


Great care must be taken in the interpretation of a significant result. If a sample
result is extreme (i.e. significantly different from expectation) the rules given
lead to rejection of the test hypothesis. However, it is possible to get such a
result when the hypothesis is true because the sample is not random but is
heavily biased. This bias may arise either in the initial selection of the sample or
in the subsequent extraction of numerical data from the members of the sample
or in other subtle ways.
For example, in a coin-tossing experiment, the null hypothesis of a 50%
probability of heads may be rejected by the sample evidence for a particular
coin, the conclusion being that the coin is biased. However, the true situation
could be that the coin is not biased but the method of tossing it (i.e. of
sampling) is biased. This possibility should be considered in the initial
design of the experiment so that such a mistake is not made in the final conclusion.
To repeat, there is more to statistics than knowing which formula to substitute
the numbers into: the relevance and validity of the numbers must be considered
and any interpretation of results closely matched to the circumstances of the
case.

7.2.11 Worked Examples


1. From long experience, a variable is known to be normally distributed with
standard deviation 6.0 about any given value of the mean, i.e. whatever the
current mean is, the variability about that mean is constant. A random sample
of 16 items from the population has a mean of 53.0. Is the current population
mean 50.0?
Set up the test hypothesis H₀: mean (i.e. E[x]) = 50 and the alternative
hypothesis H₁: E[x] ≠ 50.
The means of samples of size 16 will have standard error of 6/√16 = 1.5 and
thus the sample mean of 53 deviates from the overall assumed mean by
u = (53 - 50)/1.5 = 2 standard errors
Since this is a two-sided test (i.e. if the mean is not 50, there is no knowledge
that it must be larger), the result is significant at the 5% level as the observed
value of |u| is greater than 1.96. In fact, the actual significance level corresponds
to about 4.5%. If the consequences of wrong rejection of the hypothesis were
not very serious-as measured in terms of money, safety, inconvenience-then it
would be reasonable to reject the assumption that the mean of the population is
currently 50 units.
In general, having shown that there is some evidence (but not complete proof)
that the mean is not 50 units, the next question is: what is it?
The sample mean provides the best estimate of the population mean but an

allowance must be made for sampling fluctuations; this is done by using the
standard error of the sample mean to determine a confidence interval for the
population mean.
For 95% confidence, the interval (conventionally symmetric in tail
probability) is
x̄ - 1.96σ/√n and x̄ + 1.96σ/√n, i.e.
53 - 1.96 x 1.5 and 53 + 1.96 x 1.5
53 - 2.94 and 53 + 2.94
50.06 and 55.94, say 50.1 and 55.9
Notice that the interval does not include the previously assumed mean of 50.0.
In this respect, the two procedures (hypothesis testing and interval estimation)
are equivalent since the test hypothesis will be rejected at the 5% level of
significance if the observed sample mean is more than 1.96 standard errors on
either side of the assumed mean, and if this is the case the 95% confidence
interval cannot include the assumed mean. This argument applies in the two-
sided case for any significance level a and associated confidence probability
(I-a).
Also note, that in this example, the standard deviation was known and the
test and confidence interval estimation was perfectly valid for any size of
sample.
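The test and the interval of this example take only a few lines of standard-library Python (our own sketch):

```python
from statistics import NormalDist

se = 6.0 / 16 ** 0.5                       # standard error of the mean, 1.5
u = (53.0 - 50.0) / se                     # 2.0 standard errors
p = 2 * (1 - NormalDist().cdf(u))          # attained two-sided level, ~0.045
lo, hi = 53 - 1.96 * se, 53 + 1.96 * se    # 95% confidence interval
print(round(u, 2), round(p, 3), round(lo, 2), round(hi, 2))
```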
2. A synthesis of pre-determined motion times gives a nominal time of 0.10
min for the operation of piecing-up on a ring frame, compiled after analysis of
the standard method. 160 observed readings of the piecing-up element had an
average of 0.103 min and standard deviation of 0.009 min. Is the observed
time really different from the nominal time?
Here, the population σ is not known but an estimate based on 160 (random)
readings will be satisfactory.
Set up H₀: real mean element time μ₀ = 0.100 min; H₁: real mean element
time μ₁ ≠ 0.100 min

u = (x̄ - μ₀)/(σ/√n) = (0.103 - 0.100)/(0.009/√160) = 4.22
This is significant at the 1 in 1000 level (|u| > 3.09), the actual type I error being
less than 6 parts in 100000 (table 3*).
Ninety-nine per cent confidence limits for the real mean piecing-up time
under the conditions applying during the sampling of the 160 readings are

0.103 ± 2.58 x 0.009/√160 = 0.103 ± 0.00184, i.e.

0.1012 to 0.1048 min



Thus, the evidence suggests that the synthesis of the mean operation time
tends to underestimate the actual time by something between 1% and 5%.
Whether this is of any practical importance depends on what use is going to be
made of the synthetic time. Perhaps the method of synthesising the time may be
worth review in order to bring it into line with reality.
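The u-value, its actual type I risk and the 99% interval are easily checked (standard-library Python, a sketch with our own names):

```python
from statistics import NormalDist

n, xbar, s = 160, 0.103, 0.009
se = s / n ** 0.5
u = (xbar - 0.100) / se                    # ~4.22
alpha = 2 * (1 - NormalDist().cdf(u))      # actual two-sided type I risk
lo, hi = xbar - 2.58 * se, xbar + 2.58 * se
print(round(u, 2), round(lo, 4), round(hi, 4))
```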
3. In special trials carried out on two furnaces each given a different basic mix,
furnace A in 200 trials gave an average output time of 7.10 h while 100 trials
with furnace B gave an average output time of 7.15 h.
Given that, from previous records, the variance of furnace A is 0.09 h² and
of B is 0.07 h² and an assurance that these variances did not change during the
trials, is furnace A more efficient than B?
First of all, set up the test hypothesis that there is no difference in furnace
efficiencies (i.e. average output times). The test is two-sided since there is no
reason to suppose that if one is more efficient then it is known which one it will
be.
Set up
H₀: μA = μB, i.e. μA - μB = 0
H₁: μA - μB ≠ 0

The appropriate test statistic is
u = [(x̄A - x̄B) - (μA - μB)]/√(σA²/nA + σB²/nB)
which becomes, on substituting the observed data and the test assumption
regarding (μA - μB),
u = [(7.10 - 7.15) - 0]/√(0.09/200 + 0.07/100) = -0.05/0.034 = -1.47
Since this is numerically less than 1.96, or any higher value of u corresponding
to a smaller α, the difference in mean output times has not been shown to be
statistically significant at any reasonable type I error level.
Note: Even if a very highly significant value of u had been obtained (say
|u| > 4.0) then the question could still not have been answered because of the
way the trials had been set up. The two furnaces may have been different in
mean output times (efficiencies) but because different basic mixes had always
been used in the furnaces, it is not apparent how much of the efficiency
difference was due to the different mixes and how much was due to inherent
properties of the furnaces (including, perhaps, the crews who operate them). To
determine whether the mix differences, furnace differences or a combination of

both are responsible for differing mean output times would require a properly
designed experiment (this experiment is not designed to answer the question
posed).
In addition, it was assumed that the variances of the output times would be
unchanged during the special trials. This may often be a questionable assumption
and is unnecessary in this example since the sample variances of the 200 and
100 trials respectively could be substituted for σA² and σB² with very little effect
on the significance test.
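The furnace comparison reduces to two lines of arithmetic, sketched here in Python (our own check):

```python
from math import sqrt

se_diff = sqrt(0.09 / 200 + 0.07 / 100)   # std error of difference, ~0.034
u = (7.10 - 7.15) / se_diff               # ~-1.47, not significant
print(round(se_diff, 3), round(u, 2))
```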

4. In a given year for a random sample of 1000 farms with approximately


the same area given to wheat, the average yield of wheat per hectare (10 000 m²)
was 2000 kg/ha, with standard deviation of 192 kg/ha. The following year, for a
random sample of 2000 farms, the average was 2020 kg/ha, with standard
deviation of 224 kg/ha. Does the second year show an increased yield?
In this case, because of the large samples, each of them greater than about 30,
the sample variances can be used instead of the unknown population variances.
Set up H₀: no difference in mean yield per hectare, i.e. μ₁ - μ₂ = 0
H₁: mean yields per hectare different between the years, i.e. μ₁ - μ₂ ≠ 0

u = (2020 - 2000)/√(192²/1000 + 224²/2000) = 20/7.87 = 2.54

This is almost significant at the 1% level and suggests that the mean yield for the
whole population of farms is greater in the second year.
As a word of warning, such a conclusion may not really be valid since the two
samples may not cover in the same way the whole range of climatic conditions,
soil fertility, farming methods, etc. The significant result may be due as much
to the samples' being non-representative as to a real change in mean yield for the
whole population. The extent of each would be impossible to determine without
proper design of the survey. There are many methods of overcoming this, one
of which would be to choose a representative sample of farms and use the same
farms in both years.
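The u-value for the two years, with the sample variances standing in for the unknown population variances, can be checked as follows (a sketch in standard-library Python):

```python
from math import sqrt

se_diff = sqrt(192 ** 2 / 1000 + 224 ** 2 / 2000)  # sample variances used
u = (2020 - 2000) / se_diff                         # ~2.54
print(round(u, 2))
```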
5. A further test of the types illustrated in examples 3 and 4 can be made
when the population variances are unknown but there is a strong a priori
suggestion that they are equal. In this case, the two sample variances can be pooled to make the test more efficient, i.e. to reduce β for given α and total sample size (n₁ + n₂).
A group of boys and girls were given an intelligence test by a personnel
officer. The mean scores, standard deviations and the numbers of each sex

are given in table 7.2. Were the boys significantly more intelligent than the
girls?

                     Boys   Girls
Mean score            124     121
Standard deviation     11      10
Number                 72      50

Table 7.2
The question as stated is trivial. If the test really does measure that which is
termed 'intelligence', then on average that group of boys was more intelligent
than that group of girls, although as a group they were more variable than the
girls.
However, if the boys are a random sample from some defined population of
boys and similarly for the girls, then any difference in average intelligence
between the populations may be tested for.
Assuming that there is a valid reason for saying that the two populations have the same variances, the two sample variances can be pooled by taking a weighted average using the sample sizes as weights (strictly the degrees of freedom, see chapter 8, are used as weights, but this depends on whether the degrees of freedom were used in calculating the quoted standard deviations; in any case, since the sample sizes here are large, the error introduced will be negligible).
Pooled estimate of variance of individual scores

s² = (72 × 11² + 50 × 10²)/(72 + 50) = 13 712/122 ≈ 112

(Note: The variances are pooled, not the standard deviations.)

u = [(x̄_B − x̄_G) − 0]/√[s²(1/n_B + 1/n_G)], where s² is the pooled variance

= (124 − 121)/√[112(1/72 + 1/50)] = 3√3600/√(112 × 122) = (3 × 60)/117 = 1.54

Thus there is no evidence that the populations of boys and girls differ in average intelligence. This conclusion does not mean that there is not a difference, merely that if there is one we have not sufficient evidence to demonstrate it; and even if we did, it may be so small as to be of no practical importance at all.
Confidence limits for the difference between two population means can be
set in the same way as in examples (1) and (2) above.

Thus 95% confidence limits for the difference in mean intelligence are given by

x̄_B − x̄_G ± 1.96√(s_B²/n_B + s_G²/n_G)

or, using the pooled variance, by

x̄_B − x̄_G ± 1.96√[s²(1/n_B + 1/n_G)] = (124 − 121) ± 1.96√[112(1/72 + 1/50)]

= 3 ± 1.96√(112 × 122/3600)

= 3 ± 1.96 × 1.95 = 3 ± 3.82, i.e. −0.82 to 6.82
including the null value, 0, as it must from the significance test.
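The pooled test and the corresponding confidence interval can be verified with a short routine. The following Python sketch is not from the text; note that the interval differs very slightly from the book's ±3.82 because the pooled variance is not rounded to 112 here.

```python
import math

def pooled_test_and_ci(m1, s1, n1, m2, s2, n2, u_crit=1.96):
    """u statistic and confidence interval for the difference of two means,
    pooling the variances (not the standard deviations), weighted by
    sample size as in the text."""
    s2_pooled = (n1 * s1**2 + n2 * s2**2) / (n1 + n2)
    se = math.sqrt(s2_pooled * (1/n1 + 1/n2))
    u = (m1 - m2) / se
    half_width = u_crit * se
    return u, (m1 - m2 - half_width, m1 - m2 + half_width)

# Boys vs girls from table 7.2
u, ci = pooled_test_and_ci(124, 11, 72, 121, 10, 50)
print(round(u, 2))  # 1.54, agreeing with the text
print(round(ci[0], 2), round(ci[1], 2))  # interval of about -0.8 to 6.8
```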
Note: The use of 1.96 instead of 2.0 is somewhat pedantic in practical terms; it is retained in this chapter to serve as a reminder that the appropriate u-factor is found from the tables* of the normal distribution in conjunction with the choice of α.

Examples Concerning Proportions


6. A programmed learning course has been introduced to train operators for a
precision job in a company. Ten per cent of the operators trained by the previous
method were found to be unsuitable for the job. Of 100 operators trained by the
new method, eight were not suitable. Is the new method better than the old?
Set up the null hypothesis that both methods are the same in their effect and
hence have the same 'failure' rate, i.e.
H₀: π = 0.10

π is the probability that any one operator will not benefit from the course and is assumed constant for all operators.
The sample proportion of operators, p, who do not benefit from the course will be binomially distributed with mean of π and standard error of √[π(1 − π)/n].
For large n, this binomial distribution can be approximated by the normal
distribution with the same parameters.
Thus,

u ≈ (p − π)/√[π(1 − π)/n] = (0.08 − 0.10)/√(0.1 × 0.9/100) = −0.02/0.03 = −0.67

Since this is not at all a low probability result, there is no evidence that the
new method is any more or less effective than the previous one.
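The single-proportion test of example 6 reduces to one line of arithmetic; the following Python sketch (not part of the original text) performs it.

```python
import math

def proportion_u(p_obs, pi0, n):
    """Normal-approximation u statistic for a sample proportion p_obs
    against a hypothesised population proportion pi0."""
    se = math.sqrt(pi0 * (1 - pi0) / n)
    return (p_obs - pi0) / se

# Example 6: 8 unsuitable out of 100, against a 10% failure rate
print(round(proportion_u(0.08, 0.10, 100), 2))  # -0.67
```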

7. A certain type of seed is supposed to have a germination rate of 80%. If 50 seeds are tested and 14 fail to germinate, does this mean that the batch from
which they came is below specification?
Set up
H₀: π = 0.80
Use the normal approximation to give

u ≈ (p − π)/√[π(1 − π)/n] = (0.72 − 0.80)/√(0.8 × 0.2/50) = −0.08√50/√0.16 = −1.41

This is not numerically large enough to reject the test hypothesis; the type I error would correspond to just under 16%.
A slight improvement can be made in the adequacy of the normal approximation
by making the so-called correction for continuity. However, with the large
sample sizes generally required for use of the normality condition, this refinement
will not usually be worth incorporating. It is given here as an example.
36 or fewer germinating seeds can be considered as 36.5 or fewer on a continuous
scale. 36.5 corresponds to 0.73 as a proportion of 50 and the corrected value for
u becomes

u ≈ (0.73 − 0.80)/√(0.8 × 0.2/50) = −0.07√50/0.4 = −1.24
The type I error corresponding to such a value of u (two tails) is about 21.5%; too high for most people to contemplate making.
Both this example and example (6) could have been done using the number
of occurrences rather than the proportion of occurrences in a sample. The
approaches are identical but for setting confidence limits, the proportion method
is better.
Standardising the number of 'successes', x, in n trials gives

u ≈ (x − nπ)/√[nπ(1 − π)]

which on dividing top and bottom by n gives

u ≈ (p − π)/√[π(1 − π)/n]
The exact test can be carried out for this example since the appropriate

parameters are tabulated in table 1* (cumulative binomial probabilities).


For an assumed germination rate of 80%, the expected (mean) number of
seeds germinating out of 50 tested is 50 x 0.8 = 40. Because of the method of
tabulating (i.e. for proportions ≤ 0.50), the problem is best discussed in terms
of seeds failing to germinate, the expected number being 50 x 0.2 = 10.
The probability of 14 or more failing to germinate is 0.1106 and the probability of six or fewer failing to germinate is 1 − 0.8966 = 0.1034, i.e. a total probability (magnitude of type I error) of 21.40%, which compares favourably with the refined normal approximation.
As a further point, if a 5% significance level is specified for this problem (two-sided test since the true germination rate could be above or below 80%), using table 1* with n = 50 and π (tabulated as p) = 0.20, the acceptance region for failed seeds is from 5 up to 15 inclusive, with the critical region split as near equally as possible between the two tails (1.85% in the lower tail and 3.08% in the upper tail).
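The tabulated binomial tail probabilities quoted above can be reproduced directly. The following Python sketch is not from the text; it sums the exact binomial probabilities rather than reading them from table 1*.

```python
from math import comb

def binom_tail_ge(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k, n + 1))

# n = 50 seeds, failure probability 0.2 under the test hypothesis
p_upper = binom_tail_ge(14, 50, 0.2)     # 14 or more failures
p_lower = 1 - binom_tail_ge(7, 50, 0.2)  # 6 or fewer failures
print(round(p_upper, 4), round(p_lower, 4))  # about 0.1106 and 0.1034
print(round(p_upper + p_lower, 4))  # about 0.214, as quoted in the text
```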
Approximate confidence limits for the seed population germination rate are found as

95%: p ± 1.96√[p(1 − p)/n] = 0.72 ± 1.96√(0.72 × 0.28/50) = 0.72 ± 0.124 = 59.6% to 84.4%

99%: p ± 2.58√[p(1 − p)/n] = 0.72 ± 2.58√(0.72 × 0.28/50) = 0.72 ± 0.164 = 55.6% to 88.4%
As mentioned earlier, these confidence limits are approximate because of the use of the normal distribution and because of the substitution of the sample proportion, p, in place of the population proportion π in the expression for the standard error of p.
Note: The standard error, and hence the size of the confidence interval,
depends mainly on the actual size of the sample and not, for practical purposes,
on the proportion which the sample is of the population. The latter usually
only becomes important when it is about 20% or more. In such a case, the
formula used in the example overestimates the standard error a little bit, i.e. the
probability associated with the calculated interval is a little higher than stated.
Thus in this example, the 50 seeds provide the same information about the
overall germination rate whether they were taken randomly from a batch of
1000 seeds or a batch of 1 000 000 seeds (or any other large number).

8. As an extension of the previous example, suppose two seedsmen, A and B, each produce large quantities of nominally the same type of seed. Under standard test conditions, out of 200 seeds from A, 180 germinate, whilst
255 germinate out of 300 from B. Has A a better germination rate than B?

Set up the null hypothesis that both germination rates are the same, i.e.

H₀: π_A = π_B

An approximately normal test statistic can be set up (see summary table 7.1) as

u ≈ [(p_A − p_B) − (π_A − π_B)]/√[π_A(1 − π_A)/n_A + π_B(1 − π_B)/n_B]

Under the null hypothesis, π_A = π_B = some value π, say, and the test statistic becomes

u ≈ (p_A − p_B)/√[π(1 − π)(1/n_A + 1/n_B)]

The actual value of π, however, is unknown and it is usual to replace it by its pooled sample estimate, p, obtained as a weighted average of the two sample proportions p_A and p_B, the sample sizes being the weights.
Thus

p = (n_A p_A + n_B p_B)/(n_A + n_B) = (180 + 255)/500 = 435/500 = 0.87

u = (0.90 − 0.85)/√[0.87 × 0.13(1/200 + 1/300)] = 0.05/0.0307 = 1.63
Since this value does not exceed 1.96, numerically, there is no evidence at the
5% level of a difference between the seeds of A and B as far as germination rate
is concerned.
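The two-proportion test with a pooled estimate can be written as a short routine. The following Python sketch is not part of the original text.

```python
import math

def two_proportion_u(x1, n1, x2, n2):
    """u statistic for comparing two sample proportions, using the
    pooled estimate of the common proportion under H0."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)  # pooled estimate, weighted by sample size
    se = math.sqrt(p * (1 - p) * (1/n1 + 1/n2))
    return (p1 - p2) / se

# Example 8: seedsman A, 180 of 200 germinate; seedsman B, 255 of 300
print(round(two_proportion_u(180, 200, 255, 300), 2))  # 1.63
```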
9. Example (1) of this section was concerned with a normal variable with standard deviation of 6.0 units, this being assumed constant whatever the mean of the population. The null hypothesis was set up that the mean was 50.0 units.
(a) If a two-sided test of this hypothesis is carried out at the 1% level of significance, what will be the type II error if a sample of size 16 is taken and the population mean is actually equal to
(i) 51.0 units?
(ii) 53.0 units?
(b) What size of sample would be necessary to reject the test hypothesis with probability 90% when the population mean is actually 48.0 units? The significance level (type I error) remains at 1%.
(a) (i) Figure 7.1 shows the essence of the solution. The solid distribution is how x̄ is assumed to be distributed under the null hypothesis, the critical region being given by the shaded area in its two tails. The boundaries of the acceptance region for a 1% significance level are at

50 ± 2.58 × 6/√16 = 46.13 and 53.87

Figure 7.1 (sampling distributions of x̄, standard error 6/√16 = 1.5: solid curve centred on 50 with 0.005 in each tail beyond 46.13 and 53.87; dotted curve centred on the actual mean 51)

The dotted distribution shows how x̄ is actually distributed. If the observed sample mean falls in the acceptance region, the null hypothesis would not be rejected and a type II error would be committed. The dotted shaded area is the probability of making such an error and to find it, we need

u = (53.87 − 51.0)/(6/√16) = +1.93

and

u = (46.13 − 51.0)/(6/√16) = −3.25

The tail areas corresponding to these values are 0.0268 and 0.0006 approximately.
The type II error is therefore equal to

1 − (0.0268 + 0.0006) = 0.9726
(ii) The solution to this part is the same as that for part (i) except that the
actual distribution of the sample mean will be centred around 53.0.
The values of u corresponding to the limits of the acceptance region are

u = (53.87 − 53.0)/(6/√16) = 0.58

and

u = (46.13 − 53.0)/(6/√16) = −4.58

The type II error is therefore given by


1 − (0.2810 + 0.0000) = 0.7190
(b) Here the risks are fixed for specific values of the population mean; the sample size, n, is to be found. The requirements are shown in figure 7.2, in which x₁* and x₂* are the lower and upper boundaries of the acceptance region. Half per cent of the sample mean distribution assumed under the null hypothesis will lie outside each of these boundaries (1% type I error with a two-sided alternative).

Figure 7.2 (sampling distributions of x̄, standard error 6/√n: solid curve centred on 50.0 with 0.005 in each tail beyond x₁* and x₂*; dotted curve centred on 48.0 with 0.10 of its area above x₁*)

The dotted distribution shows how the means of samples of size n will be distributed when the population average is actually 48.0 units. The extreme part of the right-hand tail of this distribution will lie above x₂*, but such a minute proportion in this case as to be negligible.
The following equations can be set up.

x₁* = 48.0 + 1.28 × 6/√n (7.1)

x₁* = 50.0 − 2.58 × 6/√n (7.2)

Subtracting one equation from the other leads to

(1.28 + 2.58) × 6/√n = (50.0 − 48.0)

or

√n = 3.86 × 6/2.0 = 11.58, i.e. n = 134

The critical values of x̄ are thus

50.0 ± 2.58 × 6/√134 = 48.66 and 51.34


Part (b) of this example postulated the requirement that if the mean is 48.0 units (or, more generally, if it differs from the test value by more than 2.0 units), the chance of detecting such a difference should be 90%. This requirement would have been determined by the practical aspects of the problem. However, if the actual population mean were less than 48.0 (or bigger than 52.0), the probability of committing a type II error with a sample size of 134 would be less than 10%; and if the population mean were actually between 48.0 and 50.0, this probability would be greater than 10%.
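The type II error and sample-size calculations of example 9 can be checked numerically. The following Python sketch is not from the text; it evaluates the normal tail areas directly instead of reading them from tables, so its answers differ from the book's in the last decimal place or so.

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def type_ii_error(mu0, mu1, sigma, n, u_crit=2.58):
    """Probability of accepting H0: mu = mu0 when the mean is really mu1,
    for a two-sided test with acceptance region mu0 +/- u_crit * sigma/sqrt(n)."""
    se = sigma / math.sqrt(n)
    lower, upper = mu0 - u_crit * se, mu0 + u_crit * se
    return phi((upper - mu1) / se) - phi((lower - mu1) / se)

print(round(type_ii_error(50.0, 51.0, 6.0, 16), 4))  # about 0.972 (text: 0.9726)
print(round(type_ii_error(50.0, 53.0, 6.0, 16), 4))  # about 0.719, as in the text

# Part (b): n from (1.28 + 2.58) * 6/sqrt(n) = 2.0
print(round(((1.28 + 2.58) * 6 / 2.0) ** 2, 1))  # about 134.1, i.e. n = 134
```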
10. What is the smallest random sample of seeds necessary for it to be asserted,
with a probability of at least 0.95, that the observed sample germination
proportion deviates from the population germination rate by less than 0.03?
The standard error of a sample proportion is √[π(1 − π)/n], where π is the population proportion and n the sample size. Assuming that n will be large enough for the approximation of normality to apply reasonably well to the distribution of p, the problem requires that

1.96√[π(1 − π)/n] = 0.03

giving

n = (1.96/0.03)² π(1 − π)

π, the quantity to be estimated, is unknown (if it were known, there would be no need to estimate it) and this creates a slight difficulty in determining n. However, π(1 − π) takes its maximum value of ¼ when π = ½, and

n = (1.96/0.03)² × ¼ ≈ 1070

would certainly satisfy the conditions of the problem (whatever the value of π).
Alternatively, if an estimate is available of the likely value of π, this can be used instead of π as an approximation. Such an estimate may come from previous experience of the population or perhaps from a pilot random sample; the pilot sample estimate can be used to determine the total size necessary. If the pilot sample is at least as big as this, no further sampling is needed. If it was not, the extra number of observations required can be found approximately. If such extra sampling is not possible for some reason (too costly, not enough time), the confidence probabilities of types I and II errors will be modified (adversely).
For this example, if the seed population germination rate is usually about 80%, then the required value of sample size for at most a deviation of 0.03 (i.e. 3%) with probability of 0.95 is

n = (1.96/0.03)² × 0.8 × 0.2 ≈ 680

(c.f. 1070 before).
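The sample-size formula of example 10 is easily wrapped in a function. The following Python sketch is not from the text; its exact answers (1068 and 683) round to the approximate figures quoted above.

```python
import math

def sample_size_for_proportion(half_width, u_crit=1.96, pi_guess=0.5):
    """Smallest n so that u_crit * sqrt(pi(1-pi)/n) <= half_width.
    With pi_guess = 0.5 the answer is conservative (the worst case),
    since pi(1-pi) is greatest at pi = 1/2."""
    return math.ceil((u_crit / half_width) ** 2 * pi_guess * (1 - pi_guess))

print(sample_size_for_proportion(0.03))                # 1068, worst case
print(sample_size_for_proportion(0.03, pi_guess=0.8))  # 683, using pi = 0.8
```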



7.3 Problems for Solution


1. In production of a tinned product, records show that the standard deviation
of filled weights is 0.025 kg. A sample of six tins gave the following weights:
1.04, 0.97, 0.99, 1.00, 1.02, 1.01 kg.
(a) If the process is required to give an average weight of 1.00 kg does the
filling machine require re-setting?
(b) Determine confidence limits for the actual process average.

2. In a dice game, if an odd number appears you pay your opponent 1p and if an even number turns up, you receive 1p from him. If, after 200 throws, you are losing 50p and the dice are your opponent's, would you be justified in feeling cheated?
3. A company, to determine the utilisation of one of its machines, makes
random spot checks to find out for what proportion of time the machine is in
use. It is found to be in use during (a) 49 out of 100 checks, and (b) 280 out of
600 checks.
Find, in each case, the percentage of time the machine is in use, stating the confidence limits.
How many random spot checks would have to be made to be able to estimate
the machine utilisation to within ± 2%?
4. In a straight election contest between two candidates, a survey poll of 2000
gave 1100 supporting candidate A. Assuming sample opinion to represent
performance at the election, will candidate A be elected?
5. In connection with its marketing policy, a firm plans a market research
survey in a country area and another survey in a town. A random sample of the
people living in the areas is interviewed and one question they are asked is
whether or not they use a product of the firm concerned. The results of this
question are:
Town: Sample size = 2000, no. of users = 180
Country: Sample size = 2000, no. of users = 200
Does this result show that the firm's product is used more in the country than in
town?
6. In a factory, sub-assemblies are supplied by two sub-contractors. Over a
period, a random sample of 200 from supplier A was 5% defective, while a
sample of 300 from supplier B was 3% defective.
Does this signify that supplier B is better than supplier A?
A further sample of 400 items from B contained eight defective sub-assemblies.
What is the position now?
7. If men's heights are normally distributed with mean of 1.73 m and standard

deviation of 0.076 m and women's heights are normally distributed with mean
of 1.65 m and standard deviation of 0.064 m, and if, in a random sample of 100
married couples, 0.05 m was the average value of the difference between
husband's height and wife's height, is the choice of partner in marriage influenced
by consideration of height?
8. For the data of problem 3 (page 46), chapter 2, estimate 99% confidence
limits for the mean time interval between customer arrivals. Also find the
number of observations necessary to estimate the mean time to within 0.2 min.
9. An investigation of the relative merits of two kinds of electric battery showed
that a random sample of 100 batteries of brand A had an average lifetime of
24.2 h, with a standard deviation of 1.8 h, while a random sample of 80 batteries
of brand B had an average lifetime of 24.5 h with a standard deviation of 1.5 h.
Use a significance level of 0.01 to test whether the observed difference between
the two average lifetimes is significant.
10. Two chemists, A and B, each perform independent repeat analyses on a
homogeneous mixture to estimate the percentage of a given constituent.
The repeatability of measurement has a standard deviation of 0.1 % and is
the same for each analyst. Four determinations by A have a mean of 28.4% and
five readings by B have a mean of 28.2%.
(a) Is there a systematic difference between the analysts?
(b) If each analyst carries out the same number of observations as the other,
what should this number be in order to detect a systematic difference between
the analysts of 0.3% with probability of at least 99%, the level of significance
being 1%?

7.4 Solutions to Problems


1. The observed sample mean is x̄ = (1.04 + 0.97 + 0.99 + 1.00 + 1.02 + 1.01)/6 = 1.005 kg
(a) Assuming the mean net weight of individual cans is 1.00 kg, i.e.

H₀: E[x̄] = μ₀ = 1.00 kg
H₁: E[x̄] = μ₁ ≠ 1.00 kg

then

u = (x̄ − μ₀)/(σ/√n) = (1.005 − 1.000)/(0.025/√6) = 0.49
The probability of such a deviation is about 62% and so there is no real
evidence that the process average is not 1.00 kg, i.e. the sample data are quite consistent with a setting of 1.00 kg, although a type II error could be committed in deciding not to re-set the process.
(b) Confidence limits for the actual current process average are, for two levels
of confidence

95%: 1.005 ± 1.96 × 0.025/√6 = 1.005 ± 0.020 = 0.985 and 1.025 kg

99%: 1.005 ± 2.58 × 0.025/√6 = 1.005 ± 0.026 = 0.979 and 1.031 kg
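The whole of solution 1 can be reproduced in a few lines. The following Python sketch is not part of the original text.

```python
import math

def u_statistic(xbar, mu0, sigma, n):
    """u test of a sample mean against mu0, population sigma known."""
    return (xbar - mu0) / (sigma / math.sqrt(n))

weights = [1.04, 0.97, 0.99, 1.00, 1.02, 1.01]
xbar = sum(weights) / len(weights)
u = u_statistic(xbar, 1.00, 0.025, len(weights))
print(round(xbar, 3), round(u, 2))  # 1.005 0.49

# 95% and 99% confidence limits for the process average
for u_crit in (1.96, 2.58):
    half = u_crit * 0.025 / math.sqrt(len(weights))
    print(round(xbar - half, 3), round(xbar + half, 3))
# 0.985 1.025
# 0.979 1.031
```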

2. Losing 50p in 200 throws means that there must have been 125 odd numbers (losing results) and 75 even numbers (winners) in 200 throws. Set up the null hypothesis that the dice is unbiased.

H₀: π = 0.5 (π = the probability of an odd number)
H₁: π ≠ 0.5

The total number of odd numbers will be binomially distributed and since n = 200 and π = ½ we know that

u ≈ (x − nπ)/√[nπ(1 − π)] = (124.5 − 100)/√(200 × ½ × ½), making the correction for continuity

= 24.5/√50 = 3.46

The probability of such a deviation is certainly less than 0.0007 and it therefore seems likely that the dice is biased towards odd numbers.

3. 95% confidence limits for the proportional utilisation of the machine are approximately

p ± 1.96√[p(1 − p)/n]

which gives

(a) 0.49 ± 1.96√(0.49 × 0.51/100) = 0.49 ± 0.098 = 39.2 to 58.8%

and

(b) 280/600 ± 1.96√(280 × 320/600³) = 0.467 ± 0.040 = 42.7 to 50.7%

Note: The standard error has been reduced by a factor of approximately √6, the square root of the ratio of the two sample sizes.

Also, since π is near to 0.5, for 95% confidence estimation, the required number of spot checks is given by

1.96√(0.5 × 0.5/n) = 0.02, i.e. n = 98² × ¼ = 2401

For a 99% confidence interval of width (2 × 0.02), the required n is found from

2.58√(0.5 × 0.5/n) = 0.02, i.e. n = 129² × ¼ = 4160

4. 99% confidence limits for the population proportional support for candidate A are

1100/2000 ± 2.58√(1100 × 900/2000³) = 0.55 ± 0.0287 = 0.521 to 0.579

Thus it is virtually certain that candidate A will be elected.

5. Assume that there is no difference in the proportion of people using the product either in the country (π_C) or in the town (π_T).
The best estimate under H₀ of the common usage rate

= (200 + 180)/4000 = 0.095

Then

u ≈ [(0.10 − 0.09) − 0]/√[0.095 × 0.905(1/2000 + 1/2000)] = 1.08

There is no evidence that the proportion of people in the country area using the product is any different from that in the town.
6. Assume that the percentage of sub-assemblies which are defective is the same in the long run for both suppliers.
Thus

H₀: π_A = π_B

Assuming H₀ to be true, the best estimate of each supplier's defective proportion is

π̂ = (200 × 0.05 + 300 × 0.03)/(200 + 300) = 19/500 = 0.038

Thus

u = [(0.05 − 0.03) − 0]/√[0.038 × 0.962(1/200 + 1/300)] = 1.145

There is no evidence of a difference between the suppliers.
With the additional evidence, assuming that the underlying conditions remain unchanged, the test may be carried out again.
Pooling all the information,

π̂ = (10 + 9 + 8)/(200 + 300 + 400) = 27/900 = 0.03

u ≈ [(10/200 − 17/700) − 0]/√[0.03 × 0.97(1/200 + 1/700)] = 1.88

This value of u nearly reaches its critical value for a 5% (two-sided) level of significance; the actual level is about 6%. There is thus some suspicion that B is better than A, but what action is taken depends on the consequences of the possible alternative decisions.
7. Set up the test hypothesis that the choice of marriage partner is not influenced by the height of either. In this case, in a married couple, the height of the man and of the woman is each a random selection from the distributions of men's and women's heights respectively.

Figure 7.3 Average height excess for 100 couples (normal distribution with mean 0.08 m and standard error 0.0099 m; observed value 0.05 m)

The excess of the man's height over the woman's height will be a normal variable with mean of (1.73 − 1.65) m and variance of (0.076² + 0.064²) m². The average difference (excess) of height taken over a random sample of 100 such married couples will be normally distributed (i.e. from one sample of 100 to another) with mean of 1.73 − 1.65 = 0.08 m and variance of (0.076² + 0.064²)/100, and have a standard error of √0.00987/√100 = 0.0099 m.
The observed average difference was 0.05 m, corresponding to a u value of

(0.05 − 0.08)/0.0099 = −3.03

The (two-sided) significance level corresponding to this is approximately 0.0024 and thus it seems reasonable to conclude that the choice of marriage partner is not independent of height.
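The key step in solution 7 is that the variance of a difference of independent normal variables is the sum of the variances. The following Python sketch is not from the text; it reproduces the standard error and u value (with slightly less rounding than the book).

```python
import math

# Variance of (husband's height - wife's height) is the sum of the
# two population variances; the average over 100 couples has that
# variance divided by 100
var_diff = 0.076**2 + 0.064**2
se_mean_diff = math.sqrt(var_diff / 100)
u = (0.05 - (1.73 - 1.65)) / se_mean_diff
print(round(se_mean_diff, 4), round(u, 2))  # 0.0099 -3.02
```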
8. The observed data of problem 3, chapter 2, are distributed in a skew pattern
with a calculated mean of 1.29 min and standard deviation of 1.14 min. The
figure of 1.29 min is the average of 56 individual readings, and by the central
limit theorem, such an average can be expected to be normally distributed. The
appropriate confidence limits can be found using the sample standard deviation
as an estimate of the population standard deviation since it is based on more than
30 readings; such an approximation will be good enough for most practical
purposes.

Figure 7.4 Time between successive customers
Figure 7.5 Average of 56 time intervals

99% confidence limits for the mean time between arrivals are

1.29 ± 2.58 × 1.14/√56 = 1.29 ± 0.39 = 0.90 and 1.68 min

The number of observations, n, necessary to estimate the population mean to within 0.2 min (99% confidence) is given by equating the sampling error to the required error, i.e.

2.58σ/√n = 0.2, giving n = (2.58 × 1.14/0.2)² ≈ 216
9. Set up the test hypothesis that there is no difference in mean lifetimes
between the two brands.

H₀: E[x̄_A − x̄_B] = μ_A − μ_B = 0
H₁: E[x̄_A − x̄_B] = μ_A − μ_B ≠ 0

An appropriate statistic is

u = [(x̄_A − x̄_B) − 0]/√(σ_A²/n_A + σ_B²/n_B)

the denominator being the standard error of the difference of two sample means based on samples of size n_A and n_B respectively.

Thus, substituting the sample variances for the population variances,

u = [(24.2 − 24.5) − 0]/√(1.8²/100 + 1.5²/80) = −0.3/√0.0605 = −0.3/0.246 = −1.22

Since this value is not numerically larger than 2.58, there is no evidence of a difference in mean lifetimes between A and B.
10. (a) Assume there is no systematic difference between the analysts, i.e. the means of an infinitely large number of analyses of the same material would be equal for A and B.
Under such a null hypothesis we may use the test statistic

u = [(x̄_A − x̄_B) − 0]/√[σ²(1/n_A + 1/n_B)] = (28.4 − 28.2)/√[0.1²(¼ + ⅕)] = 0.2/(0.1√0.45) = 2.98

This is significant at the 1% level (i.e. |u| > 2.58) and we can conclude (with only a small type I error) that there is a systematic difference between the analysts, A giving a higher result than B on average. Thus at least one of them, and possibly both, gives a biased estimate of the actual percentage composition.
99% confidence limits for the extent of this systematic difference are given by

(28.4 − 28.2) ± 2.58√[0.1²(¼ + ⅕)] = 0.2 ± 0.173 = 0.027 and 0.373%
(b) Figure 7.6 shows the requirements of this problem.

(x̄_A − x̄_B)* = 0 + 2.58 × √[σ²(1/n_A + 1/n_B)] (7.3)

(x̄_A − x̄_B)* = 0.3 − 2.33 × √[σ²(1/n_A + 1/n_B)] (7.4)

Figure 7.6 (distributions of x̄_A − x̄_B under the null hypothesis, centred on 0, and under a systematic difference of 0.3; (x̄_A − x̄_B)* is the upper critical value)

An equivalent pair of equations would be obtained for a systematic difference of −0.3%.
Note: In writing down equations (7.3) and (7.4), the minute part of the left-hand tail of the dotted distribution falling in the lower part of the critical region has been ignored.
Putting n_A = n_B = n gives the required number of readings by each analyst as

n = [(2.58 + 2.33)²/0.3²] × 2 × 0.1² = 4.91² × 2/9 = 5.36

Thus each analyst should do six tests, the probability of detecting a systematic
difference of 0.3% between them (if it exists) being greater than the required
minimum of 99%.
In fact the required minimum power would still be achieved if one analyst
took six tests and the other five in order to reduce the total cost or effort
involved.
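The sample-size calculation of part (b) can be wrapped in a small function. The following Python sketch is not part of the original text; 2.58 sets the 1% two-sided significance level and 2.33 the 99% power requirement.

```python
import math

def n_per_group(delta, sigma, u_alpha=2.58, u_beta=2.33):
    """Observations per analyst needed to detect a systematic difference
    delta between two means with common standard deviation sigma,
    at the significance and power implied by u_alpha and u_beta."""
    n_exact = (u_alpha + u_beta) ** 2 * 2 * sigma**2 / delta**2
    return n_exact, math.ceil(n_exact)

exact, n = n_per_group(0.3, 0.1)
print(round(exact, 2), n)  # 5.36 6, i.e. six tests each as in the text
```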

7.5 Practical Laboratory Experiments and Demonstrations


Since this chapter is concerned with 'large sample' methods, all the experiments and demonstrations illustrating the basic concepts of significance have been left over to chapter 8.
In view of the sample sizes required, experimenting is not very effective for the methods outlined in this chapter.
8 Sampling theory and significance testing (II): 't', 'F' and χ² tests

8.1 Syllabus Covered


Unbiased estimate of population variance; degrees of freedom; small sampling theory; 't' test of significance; confidence limits using 't'; paired comparisons; 'F' test of significance for two variances; χ² test of significance; goodness of fit tests; contingency tables.

8.2 Resume of Theory and Basic Concepts

8.2.1 Unbiased Estimate of Population Variance


In chapter 7 the use of significance testing for large samples, or for samples where an independent estimate of the population variance was available, was discussed. The 'u' test was described for comparing a sample mean with a given hypothesis and also for testing significant differences between two population means.
In this chapter the problems outlined are different in that the sample sizes are small and no independent estimate of the population variance is available; an estimate from the sample has to be used for the population variance.
In obtaining an unbiased estimate of the population variance from sample data the following formula must be used

s² = Σᵢ(xᵢ − x̄)²/(n − 1) (8.1)

where x̄ = sample average.
Note: If an independent estimate of the population mean μ is available, the sample estimator of variance is

s² = Σᵢ(xᵢ − μ)²/n (8.2)

The denominators in both equations (8.1) and (8.2) are called the degrees of freedom of the variance estimate.
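Equations (8.1) and (8.2) differ only in the reference mean and the divisor. The following Python sketch is not from the text; the data list is simply an illustration.

```python
def sample_variance(xs, mu=None):
    """Unbiased variance estimate: divide by n-1 when the mean is
    estimated from the sample (equation 8.1), by n when the true
    population mean mu is known (equation 8.2)."""
    if mu is None:
        m, dof = sum(xs) / len(xs), len(xs) - 1
    else:
        m, dof = mu, len(xs)
    return sum((x - m) ** 2 for x in xs) / dof

data = [1.04, 0.97, 0.99, 1.00, 1.02, 1.01]
print(round(sample_variance(data), 6))           # divisor n-1 = 5
print(round(sample_variance(data, mu=1.0), 6))   # divisor n = 6
```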

8.2.2 Degrees of Freedom


This concept of degrees of freedom is very difficult to define exactly but it can
be considered as the number of independent variates. This number of independent
variates or degrees of freedom is equal to the total number of variates less the
number of independent linear constraints on the variates.
For example, in equation (8.1), in estimating the population variance, the sample mean x̄ is used in the equation, thus reducing the number of comparisons or degrees of freedom to n − 1. No such reduction is necessary in equation (8.2). When dealing with χ² goodness of fit testing, a further explanation of this concept of degrees of freedom will be given.

8.2.3 The 'u' Test with Small Samples


The arbitrary division of significance testing between large sampling theory (or approximate methods) in chapter 7 and the small sampling theory (or exact methods) in this chapter necessitates the repeating of one test in order to maintain consistency.

Testing the Hypothesis that the Mean of a Normal Population has a Specific Value μ₀ (Population Variance Known)
Here, providing the population variance is known (and therefore the sample estimate of variance is not used), the 'u' test is appropriate whatever the sample size.
Thus

u = (x̄ − μ₀)/(σ/√n)
is calculated and the significance level is determined.

Example
In an intelligence test on ten pupils the following scores were obtained: 105, 120, 90, 85, 130, 110, 120, 115, 125, 100.
Given that the average score for the class before the special tuition for the test was 105, with standard deviation 8.0, has the special tuition improved the performance?
Here, since the standard deviation is given, and if the assumption is made that the tuition method does not change this variation, then the u test is applicable.
Null hypothesis: tuition has made no improvement.
Average score in test

x̄ = (105 + 120 + ... + 125 + 100)/10 = 110

Here a one-tailed test can be used if it is again assumed that tuition could not
have worsened the performance.
Thus

u = (110 − 105)/(8/√10) = 1.98

From table 3*,

for 5%, u = 1.64
for 1%, u = 2.33

This result is significant at the 5% level; there is evidence of an improvement.
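The one-tailed u test above is quickly verified. The following Python sketch is not part of the original text.

```python
import math

scores = [105, 120, 90, 85, 130, 110, 120, 115, 125, 100]
xbar = sum(scores) / len(scores)
# u test against the pre-tuition mean 105, known sigma = 8.0
u = (xbar - 105) / (8.0 / math.sqrt(len(scores)))
print(xbar, round(u, 2))  # 110.0 1.98, beyond the one-tailed 5% point 1.64
```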

8.2.4 The 't' Test of Significance


Testing the Hypothesis that the Mean of a Normal Population has a Specific Value μ₀ (Population Variance Unknown)
Here the sample of size n is used to give the estimate of the population variance

s² = Σ(xᵢ − x̄)²/(n − 1) = [Σxᵢ² − (Σxᵢ)²/n]/(n − 1) (8.3)

The null hypothesis is set up that the sample has come from a normal population with mean μ₀.
W. S. Gosset, under the nom de plume of 'Student', examined the following statistic

t = (x̄ − μ₀)/(s/√n) (8.4)

and showed that it is not distributed normally but in a form which depends on
the degrees of freedom (v) if the null hypothesis is true. Table 7* sets out the
various percentage points of the 't' distribution for a range of degrees of freedom.
Obviously t tends to the statistic u in the limit where v~oo, i.e. tis
approximately normally distributed for large degrees of freedom v. Reference
to the table* shows that, as most textbooks assert, where the degrees of freedom
exceed 30, the normal approximation can be used or the 't' test can be replaced
by the simpler 'u' test.
Note: For a two-tailed test note that a 5% significance level requires
α = 0.025 in table 7* and for a 1% significance level, α = 0.005 (α is a proportion,
not a percentage). See section 8.2.6.
Sampling Theory and Significance Testing (II) 173

Again, one-tailed tests are only used when a priori logic clearly shows that
the alternative population mean must be on one side of the hypothesised value μ₀.
See section 8.2.7.

Testing the Hypothesis that the Means of Two Normal Populations are μx and μy
Respectively - Variances Equal but Unknown
Note: The assumption must hold that the variances of the two populations
are the same (i.e. σx² = σy²) since we are going to pool two sample variances and
this only makes sense if they are both estimates of the same thing: a common
population variance. If σx² does not equal σy², then the statistic given below is not
distributed like t.
The two sample variances sx² and sy² are pooled to give a best estimate of the
common population variance

s² = [(nx - 1)sx² + (ny - 1)sy²]/(nx + ny - 2)     (8.5)

where nx and ny are the sizes of the two samples, and

t = [(x̄ - ȳ) - (μx - μy)] / [s√(1/nx + 1/ny)]     (8.6)

with (nx + ny - 2) degrees of freedom. The usual test hypothesis is that the
populations have equal means and under this assumption (μx - μy) = 0 and the
test statistic reduces to

t = (x̄ - ȳ) / [s√(1/nx + 1/ny)]     (8.7)
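Equations (8.5)-(8.7) translate directly into a short standard-library Python sketch (the function name is ours, not the book's); the critical value must still be read from table 7.

```python
import math

def pooled_t(xs, ys):
    """Two-sample t of equations (8.5)-(8.7), assuming equal population variances."""
    nx, ny = len(xs), len(ys)
    xbar, ybar = sum(xs) / nx, sum(ys) / ny
    ssx = sum((x - xbar) ** 2 for x in xs)
    ssy = sum((y - ybar) ** 2 for y in ys)
    s2 = (ssx + ssy) / (nx + ny - 2)                       # pooled variance (8.5)
    t = (xbar - ybar) / math.sqrt(s2 * (1 / nx + 1 / ny))  # statistic (8.7)
    return t, nx + ny - 2                                  # t and its degrees of freedom
```

For instance, the boys' and girls' marks of example 5 later in this chapter give t ≈ 0.96 with 14 degrees of freedom, matching the worked solution.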

t-Test Using Paired Comparisons


In many problems, the power of the significance test can be increased by pairing
the results and testing the hypothesis that the mean difference between paired
readings is equal to μ₀.
Note: This approach is only legitimate provided that there is a valid reason
for pairing the observations. This validity is determined by the way in which
the experimental observations are obtained.
Let the number of paired readings = n
Let the difference of the ith pair = dᵢ

Then

s² = Σ(dᵢ - d̄)²/(n - 1)     where d̄ = Σdᵢ/n

and

t = (d̄ - μ₀)/(s/√n)     (8.8)

The test hypothesis is usually of the null type where there is assumed to be no
difference on average in the paired readings, i.e. μ₀ = 0. In this case the test
statistic t is given by

t = d̄/(s/√n)     (8.9)
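A minimal sketch of equations (8.8)-(8.9) in standard-library Python (assumed names, not the book's):

```python
import math

def paired_t(before, after, mu0=0.0):
    """Paired t: mean difference over its standard error, with n - 1 d.f."""
    d = [a - b for a, b in zip(after, before)]
    n = len(d)
    dbar = sum(d) / n
    s2 = sum((di - dbar) ** 2 for di in d) / (n - 1)
    return (dbar - mu0) / math.sqrt(s2 / n)
```

The typing speeds of problem 1 later in the chapter (before: 40, 42, 40; after: 45, 50, 42) give t ≈ 2.89 with 2 degrees of freedom, as in the worked solution.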
Confidence Limits for Population Mean
Where the degrees of freedom are less than about 30, the confidence limits for
the population mean μ₀ are:

for 95% confidence limits    x̄ ± t₀.₀₂₅,ν s/√n

for 99% confidence limits    x̄ ± t₀.₀₀₅,ν s/√n

This is similar to the large sample case except that t is used instead of u.
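The one-sample statistic of equation (8.4) can be sketched in standard-library Python (a hedged illustration, with a name of our choosing; the percentage point still comes from table 7):

```python
import math

def one_sample_t(xs, mu0):
    """t of equation (8.4): (xbar - mu0) / (s / sqrt(n)), with n - 1 d.f."""
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)   # variance estimate (8.3)
    return (xbar - mu0) / math.sqrt(s2 / n)
```

For the coded canning data of example 1 below, one_sample_t([1, 2, 4, 4, 2], 1) gives about 2.67, agreeing with the worked value.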

8.2.5 The 'F' Test of Significance


For testing the hypothesis that the variances of two normal populations are
equal.
Again, a null hypothesis is set up that the variances are the same.
Let sx² and sy² be the two sample estimates of variance. Then

F = sx²/sy²     where sx² > sy²     (8.10)

If F is greater than F₀.₀₂₅ (see table 9*) for (nx - 1) degrees of freedom of the
numerator and (ny - 1) degrees of freedom for the denominator, then the
difference is significant at the 5% level (α = 0.05). For F to be significant at the
1% level, use F₀.₀₀₅ (actually F₀.₀₁ will have to be used, giving a 2% significance
level of F).
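Equation (8.10) amounts to a one-line function; this sketch (stdlib Python, our naming) forms the ratio with the larger estimate on top, leaving the degrees of freedom and table 9 look-up to the reader.

```python
def variance_ratio(s2_a, s2_b):
    """F of equation (8.10): the larger variance estimate over the smaller."""
    return max(s2_a, s2_b) / min(s2_a, s2_b)
```

For example, the machine data of problem 3 later in the chapter give variance_ratio(0.5**2, 0.2**2) = 6.25.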

8.2.6 The χ² Test of Significance


Definition of χ²
Let x₁, x₂, ... xₙ be n normal variates from a population with mean μ and
standard deviation σ.
Then

χₙ² = [(x₁ - μ)/σ]² + [(x₂ - μ)/σ]² + ... + [(xₙ - μ)/σ]²     (8.11)

the suffix n denoting the number of degrees of freedom. Obviously the larger n
is, the larger χ² tends to be, and the percentage points of the sampling
distribution of χ² are given in table 8*.
For example, where n = 1 the numerical value of the standardised normal
deviate u exceeds 1.96 with 5% probability and 2.58 with 1% probability (i.e.
with half the probability in each tail). Consequently χ² with one degree of
freedom has 5% and 1% points at 1.96² and 2.58², or 3.841 and 6.635.
However, for higher degrees of freedom the distribution of χ² is much more
difficult to calculate, but it is fully tabulated in table 8*.

Goodness of Fit Test using χ²

A most important use of the χ² distribution is in a significance test for the
'goodness of fit' between observed data and an hypothesis.
Let k = number of cells or comparisons
OJ = observed frequency in ith cell
E j = expected frequency in ith cell from the hypothesis
r = number of restrictions, derived from the observed readings, which
have to be used when fitting the hypothesis.
Then

Σᵢ (Oᵢ - Eᵢ)²/Eᵢ     (summed over the k cells)

is distributed like χ² with (k - r) degrees of freedom.
For the use of this test all the Eᵢ values must be greater than 5. If any are
less than 5, the data must be grouped.
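The goodness-of-fit statistic is a direct sum; a standard-library Python sketch (our naming), with the greater-than-5 rule enforced as a guard:

```python
def chi_squared(observed, expected):
    """Goodness-of-fit statistic: sum of (O - E)^2 / E over the k cells."""
    assert all(e > 5 for e in expected), "group cells so every E exceeds 5"
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

For the grouped goals-per-match data of example 4 below, chi_squared([11, 11, 15, 8, 5, 7], [10.6, 12.4, 12.7, 9.9, 6.1, 5.4]) gives about 1.63.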

Application of χ² to Contingency Tables


When the total frequency can be divided between two factors, and each factor
subdivided into various levels, then the table formed is called a contingency
table. Data in the form of a contingency table give one of the simplest methods
of testing the relationship between two factors.
Consider the following contingency table (table 8.1) with the first factor
(F₁) at a levels and the second factor (F₂) at b levels. The individual cell totals
Oij give the observed frequency of readings at the ith level of factor F₁ and the
jth level of factor F₂.

               Factor 1
               1       2       3     ...   i     ...   a        Row totals

Factor 2
    1          O11     O21     O31         Oi1         Oa1      Σi Oi1
    2          O12     O22
    3          O13
    :
    j          O1j                 ...     Oij                  Σi Oij
    :
    b          O1b                                     Oab      Σi Oib

Column
totals         Σj O1j      ...     Σj Oij      ...     Σj Oaj   Σi Σj Oij

Table 8.1

Σi Σj Oij = total frequency
Σj Oij = total frequency at the ith level of factor 1 (column total)
Σi Oij = total frequency at the jth level of factor 2 (row total)

These tables are generally used to test the hypothesis that the factors are
independent.
If this hypothesis is true, then the expected cell frequency is

Eij = (Σj Oij × Σi Oij)/(Σi Σj Oij)

i.e. (column total × row total)/(total frequency), and

Σi Σj (Oij - Eij)²/Eij

is distributed as χ² with (a - 1)(b - 1) degrees of freedom.
It can be shown that only (a - 1)(b - 1) of the comparisons are independent
since the row and column totals of expected frequencies must be the same as
the row and column totals of observed frequencies.
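The expected-frequency rule can be sketched in standard-library Python (a hedged illustration with names of our choosing):

```python
def expected_frequencies(table):
    """E_ij = (row total x column total) / grand total, under independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]
```

For the weaver-training data of table 8.2 below, with rows [32, 12], [14, 22], [6, 9], the first expected frequency is 44 × 52/95 ≈ 24.1, and the expected frequencies sum to the grand total of 95 as they must.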

8.2.7 One- and Two-tailed Tests


This whole question of one- and two-tailed tests is a subject of considerable
controversy among statisticians.
However, the following points of guidance are useful in deciding which to
apply.
(1) In general, if ever in doubt use the two-tailed test since this plays safer.
(2) Only if, from a priori knowledge, it can be definitely stated that the
change must be in one direction only, can the one-tailed test be used.
The observations apply, of course, to all significance tests and it is hoped
that the examples given will clarify this confusing problem.

8.2.8 Examples on the Use of the Tests


1. A canning machine is required to turn out cans weighing 251 g on the average.
A random sample of five is drawn from the output and the cans are found to
weigh 251, 252, 254, 254, 252 g respectively. Can it be said that the machine
produces cans of average weight 251 g?
Coding the variate by subtracting 250 gives x = 1, 2, 4, 4, 2: Σx = 13,
Σx² = 41.
Null hypothesis is set up that the process is running at an average of 251 g.
Mean

x̄ = 13/5 = 2.6

Estimated variance of population

s² = [Σx² - (Σx)²/n]/(n - 1) = (41 - 13²/5)/4 = 1.8

Estimated standard deviation of population s = 1.34

Estimated standard error of sample mean  e(x̄) = 1.34/√5 = 0.6

On the null hypothesis that the (coded) population average is 1

t = (2.6 - 1)/0.6 = 2.67

From tables* (4 degrees of freedom)

t₀.₀₂₅ = 2.78     t₀.₀₀₅ = 4.6

or the results are not significant. However, since the t value is close to the 5%
level (two-tailed) it is possible that if a larger sample were taken a difference
might be shown.
2. A weaving firm has been employing two methods of training weavers. The
first is the method of 'accelerated training', the second, 'the traditional' method.
Although it is accepted that the former method enables weavers to be trained
more quickly, it is desired to test the long-term effects on weaver efficiency. For
this purpose the varying efficiency of the weavers who have undergone training
during a period of years has been calculated, and is given in table 8.2.
Is there any significant difference between training methods?

                     Specialised       Traditional
                     training method   method          Total
                     A                 B

Above shed average   32                12              44
Below shed average   14                22              36
Insufficient data     6                 9              15

Total                52                43              95

Table 8.2. Training schemes and weaver efficiency

Null hypothesis set up that there is no difference in the methods.


E_A1 = (52/95) × 44 = 24.1     E_B1 = 44 - 24.1 = 19.9
E_A2 = (52/95) × 36 = 19.7     E_B2 = 36 - 19.7 = 16.3
E_A3 = (52/95) × 15 = 8.2      E_B3 = 15 - 8.2 = 6.8

χ² = 7.9²/24.1 + 7.9²/19.9 + 5.7²/19.7 + 5.7²/16.3 + 2.2²/8.2 + 2.2²/6.8
   = 2.59 + 3.14 + 1.65 + 2.00 + 0.59 + 0.79
   = 10.76

Degrees of freedom = (3 - 1)(2 - 1) = 2

χ²₀.₀₅ = 5.991     χ²₀.₀₁ = 9.210



                 Special Training   Traditional   Total
                 A                  B

Above average    O: 32              O: 12         44
                 E: 24.1            E: 19.9

Below average    O: 14              O: 22         36
                 E: 19.7            E: 16.3

Insufficient     O: 6               O: 9          15
data             E: 8.2             E: 6.8

Total            52                 43            95

Table 8.3

The result is significant at the 1% level.
There is evidence that the training methods differ in their long-term
efficiency.

3. In a study of two processes in industry the following data were obtained.

Process 1 Process 2

Sample size 50 60
Mean 10.2 11.1
Standard deviation 2.7 2.1

Table 8.4

Is there any evidence of a difference in variability between the processes?


A further sample taken on process 1, gave: sample size = 100, mean = 10.6,
standard deviation = 3.1.
What is the significance of the difference in variability now?
Null hypothesis set up is that there is no difference in the variation of the
two processes.
F = Greater estimate of population variance / Lesser estimate of population variance
  = 2.7²/2.1² = 1.65

Referring to table 9*

Degrees of freedom of greater estimate ν₁ = 49, read as 24 (safer than ∞)
Degrees of freedom of lesser estimate ν₂ = 59, read as 60

To be significant, F must reach 1.88 at the 5% level or 2.69 at the 0.2% level.


The difference is not significant.
Variance of combined sample for process 1

s² = [s₁²(n₁ - 1) + s₂²(n₂ - 1)]/(n₁ + n₂ - 2) = (2.7² × 49 + 3.1² × 99)/(50 + 100 - 2)
   = (357 + 951)/148 = 8.83

F = 8.83/2.1² = 2.00

From table 9*
Degrees of freedom of greater estimate = 149, read as ∞
Degrees of freedom of lesser estimate = 59, read as 60
5% significance level = 1.48
0.2% significance level = 1.89

The difference is highly significant, or there is strong evidence that the process
variations are different.
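The pooling step of this example can be sketched in standard-library Python (our naming; the third decimal differs slightly from the rounded working above):

```python
def combined_variance(s1, n1, s2, n2):
    """Variance of two samples from the same process, pooled as in example 3."""
    return (s1 ** 2 * (n1 - 1) + s2 ** 2 * (n2 - 1)) / (n1 + n2 - 2)

v = combined_variance(2.7, 50, 3.1, 100)   # about 8.84
f = v / 2.1 ** 2                           # about 2.00, as in the text
```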

4. For example (1), page 69, in chapter 3, for goals scored per soccer match,
test whether this distribution agrees with the Poisson law.
Null hypothesis: the distribution agrees with the Poisson law.

No. of goals/match      0    1    2     3     4    5    6    7    8   Total

Actual frequency (O)    2    9   11    15     8    5    5    1    1     57
Poisson frequency (E)  2.6  8.0 12.4  12.7  9.9  6.1  3.2  1.4  0.8     57

Table 8.5

(After grouping, the first two cells give O = 11, E = 10.6 and the last three
give O = 7, E = 5.4.)

In table 8.5 the last three class intervals must be grouped to give each class
interval an expected value greater than 5; so must the first two.

χ² = (11 - 10.6)²/10.6 + (11 - 12.4)²/12.4 + ... + (5 - 6.1)²/6.1 + (7 - 5.4)²/5.4
   = 1.63

Degrees of freedom = 6 - 1 - 1 = 4, since the totals are made the same and the
Poisson distribution is fitted with the same mean as the actual distribution.
Referring to table 8*

χ²₀.₀₅ = 9.488     χ²₀.₀₁ = 13.277

Thus there is no evidence from the data for rejecting the hypothesis; the
pattern of variation shows no evidence of not having arisen randomly.
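The Poisson frequencies of table 8.5 can be regenerated from the raw counts with a short sketch (standard-library Python; the book's final cell covers the tail of 8 or more goals and is rounded, so the last figures differ slightly):

```python
import math

freq = [2, 9, 11, 15, 8, 5, 5, 1, 1]      # goals 0..8; total is 57
n = sum(freq)
mean = sum(k * f for k, f in enumerate(freq)) / n   # fitted Poisson mean
expected = [n * math.exp(-mean) * mean ** k / math.factorial(k)
            for k in range(len(freq))]               # expected cell frequencies
```

The fitted mean is about 3.12 goals per match, and expected[3] reproduces the tabulated 12.7.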

5. In a mixed sixth form the marks of eight boys and eight girls in a subject
were
Boys: 25, 30, 42, 44, 59, 73, 82, 85; boys' average = 55
Girls: 32, 36, 40, 41, 46, 47, 54, 72; girls' average = 46
Do these figures support the theory that boys are better than girls in this
subject?
Null hypothesis: boys and girls are equally good at the subject.
From the sample of boys  x̄₁ = 55  s₁² = 540.57
From the sample of girls  x̄₂ = 46  s₂² = 156.86
Applying the F test to test that the population variances are not different gives

F = 540.57/156.86 = 3.46 (not significant)

Best estimate of population variance by pooling

s² = [(n₁ - 1)s₁² + (n₂ - 1)s₂²]/(n₁ + n₂ - 2)
   = (7 × 540.57 + 7 × 156.86)/(8 + 8 - 2) ≈ 349.0

Standard error of difference between means

e(x̄₁ - x̄₂) = √[349(1/8 + 1/8)] = 9.35

t = (55 - 46 - 0)/9.35 = 9/9.35 = 0.96

with 14 degrees of freedom.

From table 7*

t₀.₀₂₅ = 2.14 (two-sided 5%)

There is no evidence from these data that boys are better than girls
(see discussion of example 5, chapter 7, p 153).

6. In designing a trial to test whether or not the conversion of a machine has
reduced its variability, a sample of 20 on the new process is taken.

Previous machine standard deviation before conversion = 2.8 mm. For the new
process, calculated from sample of 20, standard deviation = 1.7 mm.
What is the significance of this test?
It can be assumed that the process change could not have increased the
variation of the process.
Null hypothesis: no change has occurred in process variation. Thus, a
one-tailed test can be used

F = 2.8²/1.7² = 2.71     ν₁ = ∞ (the previous standard deviation is the
                         population value), ν₂ = 19 (use 18)

from table 9*. Thus

F₀.₀₅ = 1.92     F₀.₀₁ = 2.57

Therefore, the result is highly significant and the change can be assumed to have
reduced the process variation.

8.3 Problems for Solution


1. Three women take an advanced typing course in order to increase their
speed. Before the course their rates are 40, 42, 40 words per minute. After the
course their speeds are 45, 50 and 42 words per minute respectively. Is the
course effective?
2. Table 8.6 gives the data obtained in an analysis of the labour turnover
records of the departments of a factory. Is there any evidence that departmental
factors affect labour turnover and if so, which departments?

Average Number of
Department labour force leavers/year

A 60 15
B 184 16
C 162 15
D 56 12
E 30 4
F 166 25
G 182 25
H 204 18

Table 8.6

3. Table 8.7 gives the data obtained on process times of two types of machine.
Is machine A more variable than machine B?

Machine A Machine B

Average time 2.5 2.3


Standard deviation 0.5 0.2
Sample size 100 80

Table 8.7

4. A change made to a process was tested by timing two sets of different


workers. Those using the new process completed the job in
32, 32, 33, 33, 33, 34, 34, 35, 39, 45 s
Using the old process, another group completed it in
31, 32, 32, 33, 33, 34, 37, 43, 47, 48 s
Is the new process quicker?

5. In designing a trial to test whether or not the conversion of a machine has


reduced its variability, a sample of 13 items is taken from the new process.
Previous machine standard deviation before conversion = 2.8 mm; standard
deviation from new process = 1.7 mm.
Is a reduced variability demonstrated?

6. The number of cars per hour passing an intersection, counted from 11 p.m.
to midnight over nine days, was 7, 10, 5, 1, 0, 6, 11, 4, 9.
Does this represent an increase over the previous average per hour of three
cars?

7. In a time study, only 18 readings of an element could be taken as the order


was nearly finished. They were as follows, in minutes
0.12,0.14,0.16,0.12,0.12,0.17,0.15,0.14,0.12,
0.11,0.12,0.12,0.12,0.15,0.17,0.13,0.14,0.14
Within what limits (95% confidence) would you expect the actual average
time for this element to lie?

8. A coin is tossed 200 times and heads appear only 83 times. Is the coin
biased?

9. A new advertising campaign is tried out in addition to normal advertising


in six selected areas and the sales for a three-month period compared with

those of the six areas before the special campaign. The data are given in table 8.8.
Has the new campaign had any effect on the sales?

Sales before campaign Sales after campaign

Area 1 £2000 £2500


2 £3600 £3000
3 £2500 £3100
4 £3000 £2800
5 £2800 £3400
6 £2900 £3200

Table 8.8

8.4 Solution to Problems


1. The null hypothesis is set up that the advanced typing course will not
affect the speed of the typists.
This is a paired 't' test since, by considering the differences only, the variation
due to the varying basic efficiency of the individuals is eliminated.

Typist    Difference x    x²

1         5               25
2         8               64
3         2                4

          Σx = 15         Σx² = 93

x̄ = 5

Table 8.9

Estimated population variance

s² = [Σx² - (Σx)²/n]/(n - 1) = (93 - 15²/3)/2 = 9     s = 3

Thus

t = (x̄ - 0)/(s/√n) = (5 - 0)/(3/√3) = 2.89

with 2 degrees of freedom.

Reference to table 7* (using a two-tailed test)

t₀.₀₅/₂ = 4.303     t₀.₀₁/₂ = 9.925

or the result is not significant.
Thus there is no evidence from this sample that the new course improves
speed.
It is not surprising that no evidence was found from this trial because of the
small sample taken. In practice, when more girls were tested, the new course was
shown to be more effective, illustrating that a non-significant result does not
mean no difference, but that no evidence of a difference has been found.

2. Null hypothesis: there is no difference in turnover rate between departments.


The expected number of leavers and the χ² contributions are given in table 8.10.

            Average           Number of          Expected number    Contribution
Dept.       labour            leavers/year       of leavers/year    to χ²
            force/year        (Oᵢ)               (Eᵢ)

A             60              15                  7.5               7.5
B            184              16                 23.0               2.1
C            162              15                 20.0               1.2
D + E         86 (56 + 30)    16 (12 + 4)        10.8               2.5
F            166              25                 20.8               0.9
G            182              25                 22.8               0.2
H            204              18                 25.5               2.2

Totals      1044             130                130.4               χ² = 16.6

Table 8.10

Since in department E the expected number of leavers is less than five, it has
to be grouped with another department. It is logical to group it with a similar
department, or one whose effect on the number of leavers would be expected to
be similar. Having no other a priori logic, and since there is little difference
between the observed and expected frequency for department E, it has little
effect; here it is combined with department D, the next smallest.

Average turnover rate = (130/1044) × 100% = 12.5% per year

Expected number of leavers per year in department A

= (130/1044) × 60 = 7.5

and so on.
Thus χ² = 16.6 with (7 - 1) or 6 degrees of freedom, since only the total was
used to set up the hypothesis.
Reference to table 8*

χ²₀.₀₅ = 12.592     χ²₀.₀₁ = 16.812     χ²₀.₀₀₁ = 22.457

Thus the result is significant at the 5% (1/20) level, or there is evidence of
differences between departments.
When such a result is obtained it is usually possible to isolate the
heterogeneous departments by locating the department with the largest
contribution to χ². Provided the χ² is significant at 1 degree of freedom, i.e.
exceeds 3.841 at the 5% significance level, this department should be excluded
from the data and the analysis repeated until χ² is not significant. If the result
is significant with no single contribution greater than 3.841, the conclusion
can only be drawn that the heterogeneity is not due to one or two specific
departments but to general variation between them all.
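The screening procedure just described can be sketched in standard-library Python (names are ours; the χ² total differs from the rounded working by a few tenths because the expected frequencies are not rounded here):

```python
labour  = {'A': 60, 'B': 184, 'C': 162, 'DE': 86, 'F': 166, 'G': 182, 'H': 204}
leavers = {'A': 15, 'B': 16, 'C': 15, 'DE': 16, 'F': 25, 'G': 25, 'H': 18}

rate = sum(leavers.values()) / sum(labour.values())       # overall turnover rate
contrib = {d: (leavers[d] - rate * labour[d]) ** 2 / (rate * labour[d])
           for d in labour}                               # (O - E)^2 / E per dept
worst = max(contrib, key=contrib.get)                     # candidate to exclude
```

Running this shows department A has by far the largest contribution, and the total exceeds the 5% point of 12.592 for 6 degrees of freedom, matching the conclusion above.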
The results of repeating the analysis excluding department A are given in
table 8.11.

            Average           Number of          Expected number    Contribution
Dept.       labour            leavers/year       of leavers/year    to χ²
            force/year

B            184              16                 21.5               1.40
C            162              15                 19.0               0.84
D + E         86 (56 + 30)    16 (12 + 4)        10.0               3.60
F            166              25                 19.4               1.61
G            182              25                 21.3               0.64
H            204              18                 23.8               1.41

Totals       984             115                115.0               9.50

Table 8.11

Average turnover rate = (115/984) × 100% = 11.7%

Thus χ² = 9.50 with (6 - 1) or 5 degrees of freedom

χ²₀.₀₅ = 11.070     χ²₀.₀₁ = 15.086     χ²₀.₀₀₁ = 20.517


or the result is not significant; there is no evidence of differences between the
remaining departments. It would, however, be worth checking further on
department D, since its contribution to the total χ² has been 'watered down'
by having the data from E combined with it.

3. This is an 'F' test.
Assume no a priori knowledge - use a two-tailed test.

F = 0.5²/0.2² = 6.25

Referring to table 9*, use ν₁ = 24, ν₂ = 60 (on the safe side)

5% significance level = 1.88     0.2% significance level = 2.69

Clearly the present value is highly significant: the product from machine A is
more variable than the product from machine B.

4. Null hypothesis: the change to the process has not affected the time.
Let x₁ = time of new process
    x₂ = time of old process
then, mean of new process x̄₁ = 35 s.
Variance of new process estimated from sample

s₁² = Σ(xᵢ - x̄₁)²/(n - 1) = 16.44

Mean of old process x̄₂ = 37 s.
Variance of old process estimated from sample

s₂² = 42.67

In order to apply the 't' test, the variances of the two populations must be
the same.
Using the 'F' test to test that the population variances are the same gives

F = 42.67/16.44 = 2.59     with ν₁ = 9 degrees of freedom
                                ν₂ = 9 degrees of freedom

Table 9* shows that this is not significant, or there is no evidence of a
difference in population variances.
Thus, pooling the two estimates s₁² and s₂² to give the best estimate

s² = [(n₁ - 1)s₁² + (n₂ - 1)s₂²]/(n₁ + n₂ - 2) = 29.6

Standard error of the difference between the means

e(x̄₁ - x̄₂) = √[29.6(1/10 + 1/10)] = 2.43

t = (37 - 35 - 0)/2.43 = 0.82

with 18 degrees of freedom.

From table 7*

t₀.₀₅/₂ = 2.101     t₀.₀₂/₂ = 2.878

or the result is not significant at the 0.05 level; there is no evidence that the
change has reduced the time.

5. Null hypothesis: conversion has not reduced the variability.
Thus

F = 2.8²/1.7² = 2.71     ν₁ = ∞ degrees of freedom
                          ν₂ = 12 degrees of freedom

Referring to table 9* for ν₁ = ∞, ν₂ = 12 gives

F₀.₀₅ = 2.30     F₀.₀₁ = 3.36

The result is significant at the 5% level but not at the 1% level. Some further
sampling would probably be in order so as to reduce the errors involved in
reaching a decision.
Strictly, the one-sided F test (used because there is, say, prior knowledge
that the conversion cannot possibly increase product variation but may reduce
it) should be applied as follows.
Observed

F = 1.7²/2.8² = 0.37

The lower 5% point of F with ν₁ = 12 and ν₂ = ∞ is obtained from table 9* as

F₀.₉₅,₁₂,∞ = 1/F₀.₀₅,∞,₁₂ = 1/2.30 = 0.435

Since the observed value of F is lower than this, the reduction in variation is
significant (statistically) at the 5% level.
The lower 1% point of F is

1/3.36 = 0.30

and the observed F is not significantly low at this level.



6. Null hypothesis: there has been no increase in the number of cars.
From the sample

Σx² = 429

Sample mean

x̄ = 53/9 = 5.9

Variance

s² = (429 - 53²/9)/8 = 14.6

Standard deviation

s = 3.82

t = (5.9 - 3)/(3.82/√9) = 2.28

with 8 degrees of freedom.
From table 7*

t₀.₀₅/₂ = 2.306

Thus the result is not quite significant at the 5% level. On the present data no
real increase in mean traffic flow is shown.

7. Sample mean

x̄ = Σxᵢ/n = 0.136 min

Estimate of population variance

s² = Σ(xᵢ - x̄)²/(n - 1) = 0.000212     s = 0.0146 min

Let μ₀ = unknown true population average. Then for 95% confidence

x̄ - t₀.₀₂₅ s/√n < μ₀ < x̄ + t₀.₀₂₅ s/√n

0.136 - 2.11 × 0.0146/√18 < μ₀ < 0.136 + 2.11 × 0.0146/√18

or inside limits 0.136 ± 0.0073.

8. This problem will be solved using two alternative methods: the 'u' test and
the χ² test.

1st Method - the 'u' Test

Hypothesis: the coin is unbiased.
∴ Probability of a head = 0.50
The sampling distribution of the number of heads in 200 trials has

μ = np = 200 × 0.50 = 100

σ = √[np(1 - p)] = √(200 × 0.5 × 0.5) = 7.07

u = (83.5 - 100)/7.07 = -16.5/7.07 = -2.33

From table 3*, the probability of 83 or fewer heads = 0.01; by symmetry the
probability of 117 or more heads is 0.01.

Figure 8.1

2nd Method - the 'χ²' Test

             Heads    Tails

Observed O     83      117
Expected E    100      100

Table 8.12

             Heads    Tails

O             83.5    116.5
E            100      100

Table 8.13. Using Yates's correction

χ² = (83.5 - 100)²/100 + (116.5 - 100)²/100 = 5.445

with 1 degree of freedom.

From table 8* the probability of a χ² this high, or higher, is approximately 0.02.
However, in calculating χ², the tabulation includes both tails of the normal
distribution, of which it is the sum of squares.
Hence the probability of getting χ² > 5.445 is 0.02 if both the probability of
83 or fewer heads and also that of 117 or more are included.
Thus the probability of 83 or fewer = 0.01, which agrees with the result of
the 'u' test.
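Both methods can be checked numerically with a standard-library Python sketch (the 0.5 is the continuity correction of tables 8.12-8.13); with one degree of freedom, χ² is exactly u².

```python
import math

n, heads, p = 200, 83, 0.5
mu = n * p                                     # expected heads
sigma = math.sqrt(n * p * (1 - p))             # binomial standard deviation
u = (heads + 0.5 - mu) / sigma                 # u test with continuity correction
chi2 = ((heads + 0.5 - mu) ** 2 / mu
        + (n - heads - 0.5 - mu) ** 2 / mu)    # chi-squared of table 8.13
```

Here u ≈ -2.33 and χ² = 5.445, and u² equals χ², which is why the two tests agree.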

9. Here, since from a priori knowledge it can be stated that the new campaign
can only increase the sales rate, a one-tailed test can be used for extra
'power' in the test.
Again the paired 't' test is applicable. Null hypothesis: the new campaign has
not increased the sales.

Area      Difference in sales x

1         +500
2         -600
3         +600
4         -200
5         +600
6         +300

Average   +200

Table 8.14

Code the data

x' = x/100

Thus x̄' = 2

s'² = Σ(x' - x̄')²/(n - 1) = 24.4     s' = 4.93

t = (2 - 0)/(4.93/√6) = 0.99

with 5 degrees of freedom

t₀.₀₅ = 2.015 (one-tailed test)

or the result is not significant; there is no evidence of an increase in the sales
rate.

8.5 Practical Laboratory Experiments and Demonstrations


The concept of significance is perhaps one of the most difficult to grasp in
statistics, i.e. that one cannot prove a hypothesis, only offer evidence on a
probability basis for its rejection.
Here again practical participative laboratory experimentation gives the most
effective vehicle for putting across this concept.

8.5.1 Experiment 14-the 't' Test of Significance


(This experiment is from the Laboratory Manual pages 62-65.)
Given to students in groups of two or three after lectures on significance
testing and 't' test of means.
In this experiment, use is made of the two normal populations of rods
supplied in the Kit.† While it is appreciated that realism can be introduced to
experiments by using components from industry, experience has shown the
necessity of having standard populations available, especially as they are used
extensively throughout the experiments.
In Appendix I the instruction sheet, recording forms and analysis and
summary sheets for the experiment are given together with a set of results
obtained.

                        Red rod       Yellow rod
                        population    population

Mean μ                  6.0           6.2
Standard deviation σ    0.2           0.2

Table 8.15

The population parameters are given in table 8.15. These parameters are
chosen so that for the first part of the experiment with sample sizes n = 10,
approximately half the groups will establish a significant difference between the
populations while the other half will show no significant difference at the 5%
probability level. Since each group summarises the results of all the groups, this
experiment brings out much more clearly than any lecture could do, the
concept of significance.
In the second part of the experiment where each sample size is increased to
30, the probability is such that all groups generally establish (95% probability)
a significant difference. The experiment demonstrates that there is a connection
between the two types of error inherent in hypothesis testing by sampling and
the amount of sampling carried out. To complete this experiment, including the
full analysis, takes approximately 40 min.

† Available from Technical Prototypes, 1A Westholme Street, Leicester.



8.5.2 Experiment 15-the 'F' Test


(This experiment is described in pages 66-68 of the Laboratory Manual.)
The same rod populations as for experiment 14 again demonstrate the
basic concepts of the test.

8.5.3 Experiment 16-Estimation of Population Mean


(Pages 69-71 of Laboratory Manual.)

8.5.4 Experiment 17-Estimation of Population Mean (Small Sample)


(Pages 72-74 of Laboratory Manual.)

8.5.5 Experiment 18-Estimation of Population Variance


(Pages 75-76 of Laboratory Manual.)
Note: All these experiments use the standard rod populations supplied with
the Statistical Kit No.1.

8.5.6 Experiment 19-The χ² Test


Using data from experiment 1 this experiment is described on pages 77-79 of the
Laboratory Manual.

Appendix 1
Object
To test whether the means of two normal populations are significantly different
and to demonstrate the effect of sample size on the result of the test.
Method
Take a random sample of size 10 from each of the two populations (red and
yellow rods) and record the lengths in table 1. Return the rods to the
appropriate population.
Also take a random sample of size 30 (a few rods at a time) from each of the
two populations (red and yellow rods) and record the lengths in table 2.
Analysis
(1) Code the data, as indicated, in tables 1 and 2.
(2) Calculate the observed value of't' for the two samples of size 10 and again
for the samples of size 30.
(3) Summarise your results with those of other groups in tables 3 and 4.
Observe whether a significant difference is obtained more often with the
samples of size 30 than with the smaller samples.

Notes
The 't' test used is only valid provided the variances of the two populations
are equal. This requirement is, in fact, satisfied in the present experiment.
Table 1

Yellow population

Rod lengths x   Coded data x'   x'²
6.1             0.3             0.09
5.8             0.0             0.00
6.2             0.4             0.16
6.1             0.3             0.09
6.2             0.4             0.16
6.0             0.2             0.04
6.3             0.5             0.25
6.2             0.4             0.16
6.3             0.5             0.25
6.5             0.7             0.49
                Σx' = 3.7       Σx'² = 1.69

Red population

Rod lengths y   Coded data y'   y'²
5.6             0.0             0.00
6.2             0.6             0.36
6.2             0.6             0.36
5.8             0.2             0.04
5.6             0.0             0.00
5.8             0.2             0.04
6.0             0.4             0.16
6.2             0.6             0.36
6.0             0.4             0.16
6.3             0.7             0.49
                Σy' = 3.7       Σy'² = 1.97

In order to reduce the subsequent arithmetic, and to keep all numbers positive,
the coded values, x', are used in the calculation. The coded data can be obtained
by subtracting from all readings the smallest observed rod length in the sample.
The coded values, y', may be obtained in a similar way for the sample of red rods.
If a is the length of the shortest yellow rod in the sample, the mean, x̄, of the
sample is

x̄ = a + Σx'/10 = 5.8 + 3.7/10 = 6.17

The variance, sx², of the yellow sample is

sx² = [Σx'² - (Σx')²/10]/9 = (1.69 - 1.37)/9 = 0.0355

If b is the length of the shortest red rod in the sample, the mean, ȳ, of the
sample is

ȳ = b + Σy'/10 = 5.6 + 3.7/10 = 5.97

The variance, sy², of the red sample is

sy² = [Σy'² - (Σy')²/10]/9 = (1.97 - 1.37)/9 = 0.0667

The pooled estimate of variance, s², is

s² = {[Σx'² - (Σx')²/10] + [Σy'² - (Σy')²/10]}/18 = 0.0512

t = (x̄ - ȳ)/[s√(1/nx + 1/ny)] = (6.17 - 5.97)/(√0.0512 × 0.447) = 1.96
Table 2

Yellow population (30) and Red population (30)

[The 30 recorded rod lengths and coded values for each population are entered
as in table 1, giving Σx' = 12.0, Σx'² = 5.86 for the yellow sample and
Σy' = 11.4, Σy'² = 5.12 for the red sample.]

The analysis is exactly similar to that for the samples of size 10.
If a and b denote the lengths of the shortest yellow and red rods (in the
samples of 30) respectively,

x̄ = a + Σx'/30 = 6.10     ȳ = b + Σy'/30 = 5.98

The pooled estimate of variance, s², is

s² = {[Σx'² - (Σx')²/30] + [Σy'² - (Σy')²/30]}/(30 + 30 - 2)

t = (x̄ - ȳ)/[s√(1/nx + 1/ny)]

which reduces to

t = (x̄ - ȳ)√870 / √{Σx'² + Σy'² - (1/30)[(Σx')² + (Σy')²]}
  = (0.12 × 29.5)/√{10.98 - (1/30)(144 + 130)} = 3.54/1.36 = 2.6
Table 3
Summary Table - samples of size 10

Group   Sample means     Difference   Value of   Whether significant at 5%
        x̄       ȳ        (x̄ - ȳ)      t          level (two-tail test)

1       6.17    5.97     0.20         1.96       No
2       6.11    6.01     0.10         1.27       No
3       6.38    5.94     0.44         4.3        Yes
4       6.25    6.02     0.23         2.16       Yes
5       6.22    5.89     0.33         3.63       Yes
6       6.09    5.96     0.13         1.58       No
7       6.13    6.13     0            0          No
8       6.17    5.97     0.20         2.44       Yes

The value of |t| which must be exceeded for the observed difference to be
significant at the 5% level = 2.101
Table 4
Summary Table - samples of size 30

Group   Sample means      Difference   Value of   Whether significant at 5% level
        x̄       ȳ         (x̄ - ȳ)      t          (two-tail test)

1       6.10    5.98      0.12         2.6        YES
2       6.15    6.03      0.12         2.73       YES
3       6.29    5.97      0.32         6.9        YES
4       6.18    5.99      0.19         3.92       YES
5       6.18    5.96      0.22         4.06       YES
6       6.12    6.06      0.06         1.17       NO
7       6.19    5.99      0.20         3.64       YES
8       6.22    5.94      0.28         6.75       YES

The value of |t| which must be exceeded for the observed difference to be significant at the 5% level = 2.002
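As a numerical check (this Python sketch is not part of the original text), the group 1 statistic for samples of 30 can be recomputed from the coded column totals quoted above; only those totals are assumed.

```python
from math import sqrt

# Group 1, samples of size 30: coded totals as recorded in the text
# (x' = x - 5.7 for yellow rods, y' = y - 5.6 for red rods).
n = 30
sum_x, sum_x2 = 12.0, 5.86   # yellow
sum_y, sum_y2 = 11.4, 5.12   # red

xbar = 5.7 + sum_x / n       # 6.10
ybar = 5.6 + sum_y / n       # 5.98

# Pooled estimate of variance (coding leaves sums of squares about
# the mean unchanged)
ss = (sum_x2 - sum_x ** 2 / n) + (sum_y2 - sum_y ** 2 / n)
s2 = ss / (n + n - 2)

t = (xbar - ybar) / sqrt(s2 * (1 / n + 1 / n))
print(round(t, 2))  # 2.6, to be compared with the 5% level 2.002
```

The value agrees with the first row of Table 4.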
9 Linear regression theory

9.1 Syllabus
Assumption for use of regression theory; least squares; standard errors;
confidence limits; prediction limits; correlation coefficient and its meaning in
regression analysis; transformations to give linear regression.

9.2 Résumé of Theory Covered


9.2.1 Basic Concepts
Regression analysis is concerned with the relationship between variables. In this
chapter, only the linear relationship between a dependent variable, y, and an
independent variable, x, will be discussed.
Regression analysis can be extended to cover curvilinear relationships
between two variables and the relationship between a variable y and m other
variables x1, x2, ..., xm. This is called multiple regression analysis and details
can be found in textbooks on mathematical statistics.
The data for regression analysis may take two forms:
(1) The natural pairing of variables such as: height and weight, height of
son and height of father, the output of a department per week and the average
cost, or the sales of a product in an area and the advertising expenditure in that
area.
(2) The independent variable x is given assigned values and for each value of
x, a range of values of y is obtained. This type of data normally arises when the
experimental design is under the control of the analyst and data in this form
are from many points of view preferable to data of class (1). For example, in
establishing relationships between cutting tool life and speed, the experimenter
may vary speed (x) over a finite number of values and then take a number of
observations of tool life (y) at each of these levels.
Note: It is important to appreciate clearly that the regression relationship
calculated only holds over the range of variation of x used in the calculation.
Any extrapolation of the relationships can only be carried out based upon
SPS-8 197
198 Statistics: Problems and Solutions

a priori knowledge or assumptions that this relationship will hold for other
values of x.

9.2.2 Assumptions Required for Linear Regression Analysis


The following assumptions are required for the use of regression theory and for
the use of significance testing in the theory.
(1) The dependent variable (y) is normally distributed for each value of the
independent variable (x).
(2) The independent variable x is either free from error or subject to negligible
error only.
(3) The variance of y for all values of x is constant.
Note: It is also possible in advanced theory to apply regression analysis in cases
where the variance of y is a function of x.

9.2.3 Basic Theory


The regression line is fitted by the method of least squares. Given the population
(theoretical) regression line as

η = α + β(x - x̄)

then the best estimate of this line is given by

Y = a + b(x - x̄)

where

a = Σ fᵢyᵢ / Σ fᵢ

and

b = Σ fᵢ(yᵢ - ȳ)(xᵢ - x̄) / Σ fᵢ(xᵢ - x̄)²
a and b are unbiased estimates of α and β respectively.
These estimates minimise the residual variance of y about the regression
line and for this reason the approach is known as the 'method of least squares'.
Note: In the first form of data where there is a natural pairing of the points,
fᵢ = 1 for all i, and the regression coefficients are given by the following
formulae

a = Σ yᵢ/n (estimate of α)    b = Σ(yᵢ - ȳ)(xᵢ - x̄) / Σ(xᵢ - x̄)² (estimate of β)

where n = number of pairs of observations.
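As an illustration (not from the original text), the paired-data formulae can be written out directly in code; the tiny data set below is invented purely to check the arithmetic.

```python
def fit_line(x, y):
    """Least-squares estimates a = ybar and b for Y = a + b(x - xbar)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    b = sum((yi - ybar) * (xi - xbar) for xi, yi in zip(x, y)) \
        / sum((xi - xbar) ** 2 for xi in x)
    return xbar, ybar, b

# Invented check data lying exactly on y = 2 + 0.5x
x = [1, 2, 3, 4]
y = [2.5, 3.0, 3.5, 4.0]
xbar, a, b = fit_line(x, y)
print(a, b)  # 3.25 0.5
```

Because the points are exactly collinear, the fit recovers the slope 0.5 and the mean 3.25 exactly.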

Since this book is concerned with giving an introduction to the theory, the
examples given will be for this case of paired variables. It should be stressed,
however, that for cases where fᵢ > 1 a more rigorous theory can be developed
and, in fact, a test for linearity can be incorporated into the analysis. Details
of this more advanced analysis can be found in most mathematical statistics
textbooks.
This omission of an independent test of linearity means that linearity usually
has to be assumed a priori, and this assumption should in all cases be checked
by drawing a scatter diagram.

9.2.4 Significance Testing


It is, of course, necessary not only to calculate the statistics 'a' and 'b' but also
to be able to test their significance. This point cannot be stressed strongly
enough. In addition it should be noted that even if a regression coefficient is
found to be significant, it does not necessarily imply a causal relationship between
the variables.
The standard errors of the coefficients are:

standard error of a,  εa = √(s²/n)

standard error of b,  εb = √[s²/Σ(xᵢ - x̄)²]

where s² = residual variance about the regression line

  = Σ(yᵢ - Yᵢ)²/(n - 2)
where Yi = estimate from regression line.
The significance of a and b can, therefore, be tested by the 't' test (see
chapter 8) or, alternatively, as shown in some textbooks, by an 'F' test, (see for
example Weatherburn, A First Course in Mathematical Statistics, C. U.P., pages
193 and 224, example 8).

The t-Test of the Significance of an Observed Regression Coefficient b


Set up the null hypothesis that β = 0, i.e. that there is no linear relationship
between y and x and thus the values of yare independent of the values of x.
Remember that in this simple theory, it is necessary to assume that the
only possible relation between y and x is a linear one.
Under the assumptions given in section 9.2.2, the statistic t = (b - 0)/εb

will be distributed like Student's t with (n - 2) degrees of freedom. The degrees


of freedom of εb are 2 less than the number of points (pairs of values of x and y)
since the residual sum of squares about the fitted regression line is subject to
two independent constraints corresponding to the two constants calculated from
the data and used to fit the regression line.
The value of t

given by the data can be referred to table 7* of the Statistical Tables and if it is
significantly large, judged usually on a two-sided basis, there is thus evidence of a
linear relationship between y and x.

9.2.5 Confidence Limits for the Regression Line


The standard error, εYᵢ, of the regression estimate, Yᵢ, is given by

εYᵢ = √[εa² + εb²(xᵢ - x̄)²]

100(1 - α)% confidence limits for the precision of estimation of the regression
line are then given by Yᵢ ± t(α/2, ν) εYᵢ for given xᵢ, where ν = n - 2.
Note: The confidence limits are closest together at the average value, x̄, of
the independent variable.

9.2.6 Prediction Limits


The confidence limits defined in section 9.2.5 relate to the position of the
assumed 'true' regression line. If the relation is to be used to predict the value
of y that would be observed corresponding to a given value of x, then, in
addition to the uncertainty about the 'true' regression line, the scatter of
individual values of y about this 'true' line must also be allowed for.
The standard error of a single value of y corresponding to a given value,
xᵢ, is

εyᵢ = √[s² + εa² + εb²(xᵢ - x̄)²]

obtained by adding the variance of a single value of y to the variance of the
regression estimate, Yᵢ.
Thus, for a particular xᵢ, there is a probability of 100(1 - α)% that the
corresponding value of y that would be observed will lie in the interval

Yᵢ ± t(α/2, n-2) εyᵢ

9.2.7 Correlation Coefficient (r)


A measure closely related to the regression coefficient (b) is the correlation
coefficient (r).
The correlation coefficient (r) is a measure of the degree of (linear) association
between the two variables and is defined as

r = Σ(xᵢ - x̄)(yᵢ - ȳ) / √[Σ(xᵢ - x̄)² Σ(yᵢ - ȳ)²]
The observed correlation coefficient can be tested for significant departure from
zero but, as in the case of the regression coefficient, b, a significant value does
not necessarily imply any causal relationship between x and y.
The residual variance about the regression line, defined in section 9.2.4 as the
sum of the squared deviations of each observed value of y from its estimated value
using the fitted regression equation, this sum being divided by (n - 2), its degrees
of freedom, is related to the correlation coefficient and to the total variance
of y.
Thus

s² = Σ(yᵢ - Yᵢ)²/(n - 2) = (1 - r²) Σ(yᵢ - ȳ)²/(n - 2)

   = (1 - r²) [Σ(yᵢ - ȳ)²/(n - 1)] × (n - 1)/(n - 2)

   = sy²(1 - r²)(n - 1)/(n - 2)

which for large n is approximately equal to sy²(1 - r²).


For large n, it follows that a useful interpretation of this result is that r²
measures the proportion of the total variance of y that is 'explained' by the
linear relation between y and x.
r² can take values between 0 and 1 inclusive and hence for any set of data,
r will be in the range

-1 ≤ r ≤ +1

When r = ± 1, then the total variance of y is completely explained by the


variation in x or in other words the relationship is deterministic.
Figure 9.1 shows three sets of data with different values of the correlation
coefficient (r). In the first two cases, the regression coefficient b is the same.
A further useful relationship is that between the regression coefficient (b)
and the correlation coefficient (r) and which is

b = r × sy/sx

where sx² is the variance of the values of x, i.e.

sx² = Σ(xᵢ - x̄)²/(n - 1)

(a) r = +1.0    (b) r = +0.5    (c) r = 0

Figure 9.1

In practice, it is usual for all these calculations to be carried out on some


form of calculating machine or computer, though there is no reason, apart
from the tedious arithmetic involved, why they should not be done 'by hand'
preferably with suitable coding of the data.
The coefficients are computed as follows.

The correlation coefficient

r = [Σxᵢyᵢ - (Σxᵢ)(Σyᵢ)/n] / √{[Σxᵢ² - (Σxᵢ)²/n][Σyᵢ² - (Σyᵢ)²/n]}

Also the regression coefficient

b = r × sy/sx

Thus the following totals are required for the computation:

n, Σxᵢ, Σyᵢ, Σxᵢ², Σyᵢ², Σxᵢyᵢ

9.2.8 Transformations
In some problems the relationship between the variables, when plotted or from
a priori knowledge, is found not to be linear. In many of these cases it is possible
to transform the variables to make use of linear regression theory.
For example, in his book Statistical Theory with Engineering Applications
(Wiley), Hald discusses the problem of the relationship between tensile strength
of cement (y) and its curing time (x).
From a priori knowledge a relationship of the form y = A e^(-B/x) is to be
expected.
The simple logarithmic transformation therefore gives

log10 y = log10 A - (B/x) log10 e

or: the logarithm of the tensile strength is a linear function of the reciprocal
value of the curing time, and the theory of linear regression can then be applied.
Note: The requirement that the variance of y is constant for all x must, of
course, hold in the transformation and this must be checked. Usually a visual
check is adequate.
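The transformation can be sketched in code. In the example below (Python; the constants A and B and the x-values are invented purely for illustration), data of the form y = A e^(-B/x) are generated, log10 y is regressed on 1/x, and the slope -B log10 e is recovered.

```python
from math import exp, log10

# Invented constants, for illustration only
A, B = 50.0, 4.0
xs = [1.0, 2.0, 4.0, 8.0, 16.0]
ys = [A * exp(-B / x) for x in xs]

u = [1.0 / x for x in xs]        # transformed independent variable
v = [log10(y) for y in ys]       # transformed dependent variable

# Ordinary least squares on the transformed pairs
n = len(u)
ubar, vbar = sum(u) / n, sum(v) / n
slope = sum((vi - vbar) * (ui - ubar) for ui, vi in zip(u, v)) \
        / sum((ui - ubar) ** 2 for ui in u)

# The fitted slope should equal -B * log10(e)
print(round(slope, 4), round(-B * log10(exp(1)), 4))  # -1.7372 -1.7372
```

Since the generated data are exactly of the assumed form, the transformed points are exactly collinear and the slope is recovered to machine precision.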

9.2.9 Example on the Use of Regression Theory


The following example has been selected to illustrate the various concepts,
computational methods and analysis.
In order to keep the computation inside reasonable limits, the number of
observations has been kept small; in practice, however, in many actual problems
hundreds of readings are involved, but with the use of computers the
computation is no problem.
The data given in table 9.1 show the relationship between the scores of
postgraduate students in a numeracy test on interview and their performance
in the final quantitative examination.

Student                       1    2    3    4    5    6    7    8    9   10
Numeracy test score (pts)   200  175  385  300  350  125  440  315  275  230
Final exam performance (%)   55   45   71   61   62   50   74   67   65   52

Table 9.1

What is the best relationship between test score and final performance?
Before any analysis is started the scatter diagram must be plotted to test the
assumption that the relationship is linear. This diagram shows no evidence of
non-linearity (figure 9.2).

[The scatter diagram plots final exam performance (%) against test score, showing the ten observed points, the fitted regression line, and the 95% confidence and prediction limits.]
Figure 9.2. Regression line with 95% confidence limits and prediction limits.

Here y = final exam performance


x = numeracy test score.
No attempt has been made to code the data and the various summations
required for analysis are given below
n = 10          Σxᵢyᵢ = 175 960
Σyᵢ = 602       Σyᵢ² = 37 050
Σxᵢ = 2795      Σxᵢ² = 868 325

Total variance of x

sx² = [Σx² - (Σx)²/n]/(n - 1) = [868 325 - (2795)²/10]/9 = 87 122/9 = 9680.3

Total variance of y

sy² = [37 050 - (602)²/10]/9 = 809.6/9 = 90.0

Correlation Coefficient

r = [175 960 - (2795 × 602)/10] / √(87 122 × 810) = 7701/8401 = 0.92

Regression Coefficients

a = ȳ = 602/10 = 60.2

b = r × sy/sx = (7701/8401) × √(90/9680) = 0.088

The regression line is thus given by


Y = 60.2 + 0.088(x - 279.5) = 35.6 + 0.088x

Residual Variance about the Regression Line


The approximate residual variance using the relation given in 9.2.7 is

s² = sy²(1 - r²) = 90(1 - 0.92²) = 13.8

Note: This will be slightly in error through squaring a rounded value of r.
A better approach, to ensure arithmetical accuracy, would be to calculate
s² as

s² = 90.0 (1 - 7701²/8401²) = 14.4

However, in addition, in this example, n (=10) is not very large and the
approximation used will lead to an underestimate of the actual residual variance.
Using the exact expression gives

s² = sy²(1 - r²)(n - 1)/(n - 2) = 14.4 × 9/8 = 16.2

and this will be used in the remaining calculations since without this correction,
the bias of the estimator is -1/9 (11%) of the true value.
The residual standard deviation, s, is √16.2 = 4.02

Standard Errors of the Regression Coefficients

εa = √(s²/n) = √(16.2/10) = 1.27

εb = √[s²/Σ(x - x̄)²] = √(16.2/87 122) = 0.0136
Significance of b
Assuming E[b] = β = 0, the observed value of t is

t = (0.088 - 0)/0.0136 = 6.47

with 8 degrees of freedom.


Reference to table 7* shows that this value exceeds the 0.1% level of t
(5.041 for the two-sided test) and hence the observed value of b = + 0.088 is
very significantly different from zero. This implies that there is a strong linear
relation between y and x, i.e. between final quantitative examination
performance and initial numeracy test score.
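The whole worked example can be verified with a short script. The sketch below (Python, added here purely as a check; it is not part of the original text) keeps full precision rather than the rounded intermediate values, so its last figures differ slightly from the text's 0.088, 16.2 and 6.47.

```python
from math import sqrt

# Data of table 9.1: numeracy test score x, final exam mark y
x = [200, 175, 385, 300, 350, 125, 440, 315, 275, 230]
y = [55, 45, 71, 61, 62, 50, 74, 67, 65, 52]
n = len(x)

# Corrected sums of squares and products
sxx = sum(v * v for v in x) - sum(x) ** 2 / n                     # 87 122.5
syy = sum(v * v for v in y) - sum(y) ** 2 / n                     # 809.6
sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n  # 7701

r = sxy / sqrt(sxx * syy)               # 0.917
a = sum(y) / n                          # 60.2
b = sxy / sxx                           # 0.0884
s2 = (syy - sxy ** 2 / sxx) / (n - 2)   # exact residual variance, about 16.1
eps_b = sqrt(s2 / sxx)                  # about 0.0136
t = b / eps_b                           # about 6.5, 8 degrees of freedom
```

The small discrepancies against the text (t = 6.5 rather than 6.47) come entirely from the text's rounding of r and b at intermediate steps.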

Confidence Limits for the Regression Line


The standard error, €Yj, of the regression estimate is
€Yj =y[€~ + €~(Xj - X)2] =Y[1.27 2 + 0.0136 2 (xj- 279.5)2]
Thus for
Xj =x = 279.5, €Yt =";1.27 2 = 1.27
Xj = 380 (or 179), €Yj = ";(1.27 2 + 0.0136 2 x 100.5 2 ) = 1.87
Xj = 440 (or 119), €Yj = ";(1.27 2 + 0.01362 x 160.5 2 ) = 2.53

For any given value of xᵢ, the confidence limits for the regression estimate
(i.e. of the mean value of y for that value of x) are found as

Yᵢ ± t(α/2, n-2) εYᵢ

For 95% limits, the appropriate value of t (table 7*) is 2.306; table 9.2 shows
the derivation of the actual limits for a range of values of x.
The scatter diagram (drawn before any computations were carried out, in
order to check that the basic regression assumptions were not obviously violated),
the fitted regression line and 95% confidence limits are shown in figure 9.2.
From figure 9.2 or table 9.2, there is 95% confidence that the average final
examination percentage for all candidates who score 330 points in their initial
numeracy test will lie between 61.3% and 67.9%.

xᵢ       Yᵢ      εYᵢ     2.31εYᵢ   Lower 95% limit    Upper 95% limit
                                  (Yᵢ - 2.31εYᵢ)     (Yᵢ + 2.31εYᵢ)

119     46.1    2.53    5.8       40.3               51.9
179     51.4    1.87    4.3       47.1               55.7
229     55.8    1.44    3.3       52.5               59.1
279.5   60.2    1.27    2.9       57.3               63.1
330     64.6    1.44    3.3       61.3               67.9
380     69.0    1.87    4.3       64.7               73.3
440     74.3    2.53    5.8       68.5               80.1

Table 9.2
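Any entry of table 9.2 can be reproduced as follows (a checking sketch in Python, not part of the original text; it uses the rounded values quoted in the text, so agreement is to about the last digit).

```python
from math import sqrt

# Rounded values taken from the worked example in the text
s2, n, sxx = 16.2, 10, 87122.5
xbar, a, b = 279.5, 60.2, 0.088
t8 = 2.306                       # t(0.025, 8) from table 7*

def conf_limits(xi):
    """95% confidence limits for the regression estimate at xi."""
    Yi = a + b * (xi - xbar)
    eYi = sqrt(s2 / n + (s2 / sxx) * (xi - xbar) ** 2)
    return Yi - t8 * eYi, Yi + t8 * eYi

lo, hi = conf_limits(330)
print(round(lo, 1), round(hi, 1))  # 61.3 68.0 (table 9.2 gives 61.3, 67.9)
```

The tiny difference in the upper limit arises because the table uses the rounded factor 2.31 rather than 2.306.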

Prediction Limits for a Single Value of y for Given x


The standard error, εyᵢ, of a single value of y corresponding to a given value of
xᵢ is

εyᵢ = √[s² + εa² + εb²(xᵢ - x̄)²]

Limits within which 95% of all possible values of y for a given xᵢ will lie are
found as

Yᵢ ± 2.31 εyᵢ

These limits are calculated in table 9.3 and are also drawn in figure 9.2.

xᵢ       Yᵢ      εyᵢ    Lower 95%           Upper 95%
                       prediction limit     prediction limit
                       (Yᵢ - 2.31εyᵢ)       (Yᵢ + 2.31εyᵢ)

119     46.1    4.8    35.0                 57.2
179     51.4    4.4    41.2                 61.6
229     55.8    4.3    45.9                 65.7
279.5   60.2    4.2    50.5                 69.9
330     64.6    4.3    54.7                 74.5
380     69.0    4.4    58.8                 79.2
440     74.3    4.8    63.2                 85.4

Table 9.3

From the figures in table 9.3, it can be expected, for example, that 95% of
candidates scoring 330 points in their numeracy test will achieve a final
examination mark between 55% and 74% inclusive, 5% of candidates gaining
marks outside this range.
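The prediction interval for a single candidate scoring 330 can be checked in the same way (Python sketch, not part of the original text, again using the text's rounded values).

```python
from math import sqrt

# Rounded values from the worked example in the text
s2, n, sxx = 16.2, 10, 87122.5
xbar, a, b = 279.5, 60.2, 0.088
t8 = 2.306                       # t(0.025, 8) from table 7*

xi = 330
Yi = a + b * (xi - xbar)
# Prediction standard error adds s^2 to the variance of the estimate
eyi = sqrt(s2 + s2 / n + (s2 / sxx) * (xi - xbar) ** 2)
lo, hi = Yi - t8 * eyi, Yi + t8 * eyi
print(round(lo, 1), round(hi, 1))  # 54.8 74.5 (table 9.3: 54.7, 74.5)
```

Note how much wider this interval is than the corresponding confidence interval for the mean (61.3 to 67.9).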

Note: Such predictions are only likely to be at all valid if the sampled data
used to calculate the regression relation are representative of the same population
of students (and examination standards) for which the prediction is being made.
In other words, care must be taken to see that inferences really do apply to the
population or conditions for which they are made.
The danger of extrapolation has been mentioned. The regression equation
indicates that students scoring zero in the test, on average, gain a final mark of
35.6%. This may be so but it is very likely that the relation between the two
examination performances is not linear over all values of x. Conclusions on the
given data should only be made for x in the range 125 to 440.

9.3 Problems for Solution


1. The shear strength of electric welds in metal sheets of various thickness is
given in table 9.4.

Thickness of sheets (mm)    Shear strength of sheets (kg)

0.2 102
0.3 129
0.4 201
0.5 342
0.6 420
0.7 591
0.8 694
0.9 825
1.0 1014
1.1 1143
1.2 1219

Table 9.4

Calculate the linear relationship between strength and thickness and give the
limits of accuracy of the regression line.

2. The following problem is based on an example in Ezekiel's Methods of


Correlation Analysis and shows, for 20 farms, the annual income in dollars
together with the size of the farm in hectares (i.e. units of 10 000 m²). The
data are given in table 9.5.
Find the best linear relationship between the size of farm and income and

Size of farm (ha) Income ($)


(x) (y)

60 960
220 830
180 1260
80 610
120 590
100 900
170 820
110 880
160 860
230 760
70 1020
120 1080
240 960
160 700
90 800
110 1130
220 760
110 740
160 980
80 800

Table 9.5

state the limits of error in using this relationship to predict farm income from
farm size.

3. The data obtained from a controlled experiment to determine the


relationship between y and x are given below
x 5 10 15 20 30 40 55 65 80
y 7.2 14.7 21.0 27.5 30.0 35.0 37.3 40.2 41.8
Calculate the linear regression line.

4. A manufacturer of optical equipment has the following data on the unit


cost of certain custom-made lenses and the number of units in each order.
Number of units     1   3   5  10  12   (x)
Cost per unit (£)  58  55  40  37  22   (y)
(a) Calculate the regression coefficients and thus the regression equation

which will enable the manufacturer to predict the unit cost of these lenses in
terms of the number of lenses contained in each order.
(b) Estimate the unit cost of an order for eight lenses.

5. The work of wrapping parcels of similar boxes was broken down into eight
elements. The sum of the basic seconds per parcel (i.e. of these eight elements)
together with the number of boxes in each parcel is given in table 9.5.

Number of boxes   Sum of basic             Number of boxes   Sum of basic
in parcel (x)     seconds per parcel (y)   in parcel (x)     seconds per parcel (y)

 1                130                      22                260
 6                200                      27                190
13                150                      34                290
19                200                      42                270

Table 9.5

(a) Calculate the constant basic seconds per parcel and the basic seconds for
each additional box in the parcel.
Calculate the linear regression and test its significance.
(b) What would be the best estimate of the basic seconds for wrapping a
parcel of 18 boxes?

6. A manufacturer of farm tools wishes to study the relationship between his


sales and the income of farmers in a certain area. A sample of 11 regions showing
the income level of farmers in that area, together with the total sales to the
area, gave the data in table 9.6. Of what use is this information to the
manufacturer?

Income level of Total sales to Income level of Total sales to


farms in area farms in area farms in area farms in area
($) ($) ($) ($)
---------------
1300 2800 1300 3000
900 1900 1200 2600
1400 3200 800 3300
1000 2400 1400 1500
800 1700 700 1600
900 2000

Table 9.6

7. The following example illustrates the application of regression analysis to


time series.
The annual sales of a product over eight years are given below
1960 1961 1962 1963 1964 1965 1966 1967
300 215 450 325 375 300 375 400
Estimate the best linear time trend and calculate confidence limits for
forecasting.

9.4 Solutions to Problems


1. Let x = thickness of sheet (mm)
y = shear strength of sheet (kg)
n = 11            Σxᵢ² = 6.49
Σxᵢ = 7.7         Σyᵢ² = 5 692 958
Σyᵢ = 6680.0      x̄ = 0.7
Σxᵢyᵢ = 6008      ȳ = 607.3

Variance of x
sx² = [6.49 - (7.7)²/11]/10 = 1.10/10 = 0.110

Total Variance of y

sy² = [5 692 958 - (6680)²/11]/10 = 1 636 376.2/10 = 163 637.6

Correlation Coefficient
r = [6008 - (7.7 × 6680)/11] / √(1.10 × 1 636 376.2) = 1332/1341.6 = +0.9928

The proportion of the total variance of y 'explained' by the linear regression
relation between y and x is approximately 0.9928² or 98.6%.

Regression Line

a = ȳ = 607.3

b = r × sy/sx = 0.9928 × √(163 637.6/0.110) = 1210.9



The linear regression line is given as

Y - 607.3 = 1210.9(x - 0.7) or Y = -240.3 + 1210.9x

Standard Errors
The estimated residual variance about the regression line is

s² = sy²(1 - r²)(n - 1)/(n - 2) = 163 637.6(1 - 0.9928²) × 10/9 = 2609

thus

εa = √(s²/n) = √(2609/11) = 15.4 and εb = √[s²/Σ(x - x̄)²] = √(2609/1.10) = 48.7
Test of Significance of b
From the evidence of the scatter diagram and the high value of r, the observed
value of b is expected to be significant. In confirmation, the test gives

t = (1210.9 - 0)/48.7 = 24.9
a very highly significant value of t for 9 degrees of freedom.

Confidence Limits and Prediction Limits


The estimated standard error of the regression line is
εYᵢ = √[εa² + εb²(x - x̄)²]

and the estimated standard error of a single predicted value of y for given x is

εyᵢ = √[s² + εa² + εb²(x - x̄)²]
Table 9.7 shows some values of these two standard errors for particular
values of x, together with the 95% confidence and prediction limits using the
appropriate t-value of 2.26 (9 degrees of freedom).
The information in this table, as well as the observed data are plotted in
figure 9.3.
Notice that the fitted 'best' line does not go through the origin. In fact the
origin is not contained within the 95% confidence interval for the 'true'
regression line-which is equivalent to saying that the intercept of the fitted
line is significantly (5% level) different from zero. From inspection of the
observed data, there is a suggestion that the true relation curves towards the
origin for low values of sheet thickness. In short, do not extrapolate for
thickness values below 0.2 mm and bear in mind that the calculated relationship
for sheet thicknesses of 0.2 mm and just above may underestimate the average
shear strength of welds.

                 95% confidence limits           95% prediction limits
                 for regression line             for single values

xᵢ      Yᵢ       εYᵢ     Yᵢ ± 2.26εYᵢ            εyᵢ     Yᵢ ± 2.26εyᵢ

0.2     1.9      28.81   -63, 67                 58.64   -131, 134
0.3     123.0    24.83   67, 179                 56.79   -5, 251
0.4     244.1    21.23   196, 292                55.31   119, 369
0.5     365.2    18.22   324, 406                54.23   243, 488
0.6     486.2    16.15   450, 523                53.57   365, 607
0.7     607.3    15.40   572, 642                53.35   487, 728
0.8     728.4    16.15   692, 765                53.57   607, 849
0.9     849.5    18.22   808, 891                54.23   727, 972
1.0     970.6    21.23   923, 1019               55.31   846, 1096
1.1     1091.7   24.83   1036, 1148              56.79   963, 1220
1.2     1212.8   28.81   1148, 1278              58.64   1080, 1345

Table 9.7
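As a check on this solution, the following sketch (Python, not in the original text) recomputes b, r and t directly from the raw data of table 9.4.

```python
from math import sqrt

# Data of table 9.4: sheet thickness x (mm), weld shear strength y (kg)
x = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2]
y = [102, 129, 201, 342, 420, 591, 694, 825, 1014, 1143, 1219]
n = len(x)

sxx = sum(v * v for v in x) - sum(x) ** 2 / n
syy = sum(v * v for v in y) - sum(y) ** 2 / n
sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n

b = sxy / sxx                              # about 1210.9
r = sxy / sqrt(sxx * syy)                  # about 0.9928
s2 = (syy - sxy ** 2 / sxx) / (n - 2)      # residual variance
t = b / sqrt(s2 / sxx)                     # about 24.9
```

These values agree with the text's 1210.9, +0.9928 and 24.9 to the quoted precision.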

[The diagram plots shear strength of sheets (kg) against thickness of sheets (mm), showing the observed points, the fitted regression line, the 95% confidence limits and the wider 95% prediction limits.]

Figure 9.3. Regression line with 95% confidence limits and prediction limits.

In the following solutions, since the calculations are all similar to that of
problem 1, the detailed computations are not given.

2. Here the scatter diagram (figure 9.4) shows little evidence of a relationship
but, on the other hand, it does not offer any evidence against the linearity
assumption so the computation is as follows.
n = 20               x̄ = 139.5
Σx = 2790            ȳ = 872.0
Σy = 17 440          sx² = 3194.47
Σx² = 449 900        sy² = 28 711.58
Σy² = 15 753 200     r = +0.0078
Σxy = 2 434 300      sx = 56.5
                     sy = 169.4

b = 0.0078 × 169.4/56.5 = +0.0234

[The scatter diagram plots income ($) against size of farm (ha); the points show no apparent trend, and the horizontal line y = ȳ = 872 is marked.]

Figure 9.4

Regression Line

Y - 872 = 0.0234(x - 139.5) or Y = 868.7 + 0.0234x

Significance of b
From inspection of the scatter diagram (figure 9.4) and the low value of r (the
significance of which can be tested using table 10*), the observed value of b is
not expected to differ significantly from zero.
Residual variance

s² = 28 711.58(1 - 0.0078²) × 19/18 = 30 305
Standard error of b

εb = √{30 305/[449 900 - (2790)²/20]} = √(30 305/60 695) = 0.71

Thus the observed value of

t = (0.0234 - 0)/0.71 = 0.033

which is clearly not significant. (For the slope of the fitted regression line to be
significantly different from zero, at the 5% level, the observed value of t would
have to be numerically larger than 2.101.)
Thus, until further evidence to the contrary is obtained, farm income can be
assumed to be independent of farm size, at least for the population of farms
covered by the sample of 20 farms.
Since the data show no evidence of a relation between farm size and income,
there is little point in retaining the fitted regression equation. The best estimate
of the mean income of farms in the given population is therefore $872.
Ninety-five per cent confidence limits for this mean income are given by
872 ± 2.101 × 169.4/√20 = 872 ± 79.6 = $792.4 to $951.6

Ninety-five per cent prediction limits for the income of an individual farm
are given as
872 ± 2.101 × 169.4 × √(1 + 1/20) = $872 ± 364.7
= $507.3 to $1236.7
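A quick computation confirms how weak the relationship in table 9.5 is (Python sketch, not in the original text).

```python
from math import sqrt

# Data of table 9.5: farm size x (ha), income y ($)
x = [60, 220, 180, 80, 120, 100, 170, 110, 160, 230,
     70, 120, 240, 160, 90, 110, 220, 110, 160, 80]
y = [960, 830, 1260, 610, 590, 900, 820, 880, 860, 760,
     1020, 1080, 960, 700, 800, 1130, 760, 740, 980, 800]
n = len(x)

sxx = sum(v * v for v in x) - sum(x) ** 2 / n                     # 60 695
syy = sum(v * v for v in y) - sum(y) ** 2 / n                     # 545 520
sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n  # 1420

r = sxy / sqrt(sxx * syy)                  # about 0.0078
b = sxy / sxx                              # about 0.0234
s2 = (syy - sxy ** 2 / sxx) / (n - 2)
t = b / sqrt(s2 / sxx)                     # about 0.033
```

With r of the order of 0.008 and t far below 2.101, the data give no evidence of any linear relation between farm size and income.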
3. This problem is of interest since the assumption of linearity can be quite
safely rejected after drawing the scatter diagram (figure 9.5). There is therefore
no point in trying to fit a single linear relationship to the data.

[The scatter diagram of y against x shows the points following a smooth curve: the relationship is not linear, so the analysis cannot be continued.]
Figure 9.5

In practice either a polynomial or other mathematical function would be


fitted to the observed data or else a suitable transformation of the values of
either x or y or both would be used to give an approximately linear relation. In
this latter case, the standard methods could be used to find the linear regression
relation between the transformed y-values and the transformed x-values.
However, since both methods are beyond the scope of this chapter, the
answer here is that linear regression analysis cannot validly be used directly with
these data. Although there appears to be a relationship, it is not linear.

4. The scatter diagram (figure 9.6) indicates quite a strong relationship between
unit cost and order size, and a simple linear relation would probably be adequate,
at least in the range of order size considered. Such a simple model would be
inadequate for extrapolation purposes since the cost per unit would be expected
to tend towards a fixed minimum value as order size was increased indefinitely
and therefore some sort of exponential relation would be a better fit for such
purposes.

[The scatter diagram plots cost per unit (£) against number of units in order, with the fitted regression line and 95% confidence limits.]
Figure 9.6

(a) The required totals of the basic data are


x = number of units in an order
y = cost per unit (£)
n = 5
Σx = 31         x̄ = 6.2
Σx² = 279       sx² = 21.7
Σy = 212        ȳ = 42.4
Σy² = 9842      sy² = 213.3
Σxy = 1057      r = -0.9459

Regression Line

a = ȳ = 42.4

b = r × sy/sx = -0.9459 × √(213.3/21.7) = -2.97

(Y - 42.4) = -2.97(x - 6.2) or Y = 60.8 - 2.97x

Significance of b
Residual variance
s² = 213.3[1 - (-0.9459)²] × 4/3 = 29.94
Standard error of b

εb = √[s²/Σ(x - x̄)²] = √(29.94/86.8) = 0.587

Observed value of
t = (-2.97 - 0)/0.587 = -5.06

Reference to table 7* for 3 degrees of freedom shows that the value of |t|


for significance at the 1% level is 5.841 and at the 2% level is 4.541. The
observed value of t falls between the two and it may reasonably be inferred that
the slope of the 'true' regression line is different from zero and is negative, the
best estimate of its value being -2.97.

Confidence Limits for the Regression Line

The standard error of the regression estimate is

εYᵢ = √[εa² + εb²(xᵢ - x̄)²] = √{29.94[1/5 + (xᵢ - x̄)²/86.8]}
95% confidence limits for the regression estimate at several values of Xj are
derived in table 9.8, figure 9.6 showing these limits plotted on the scatter diagram.

xᵢ      Yᵢ      εYᵢ     Yᵢ - 3.18εYᵢ    Yᵢ + 3.18εYᵢ

1       57.8    3.91    45.4            70.2
3       51.9    3.09    42.1            61.7
6.2     42.4    2.45    34.6            50.2
10      31.1    3.31    20.6            41.6
12      25.2    4.19    11.9            38.5

Table 9.8

(b) To estimate the unit cost of an order for eight lenses, substitution of
x = 8 can be made in the regression equation giving
Y = 60.8 - 2.97 × 8 = £37.0

This figure is the 'best' estimate of the average cost per lens over all possible
orders of eight lenses.
The uncertainty of this figure (£37.0) is given by the interval (at 95%
confidence) £28.50 to £45.50.
If required, the cost per lens for a randomly selected order for eight lenses
is likely to be (95% probability) in the interval, £17.64 to £56.36, a very wide
range indeed.
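The figures in part (b) can be confirmed directly (Python sketch, not in the original text).

```python
# Data of the problem: number of units x, cost per unit y (£)
x = [1, 3, 5, 10, 12]
y = [58, 55, 40, 37, 22]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n        # 6.2, 42.4
sxx = sum((xi - xbar) ** 2 for xi in x)    # 86.8
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))  # -257.4

b = sxy / sxx                              # about -2.97
Y8 = ybar + b * (8 - xbar)                 # about 37.1, i.e. roughly £37
```

Using the unrounded slope gives £37.1 for an order of eight lenses, against the text's £37.0 from the rounded equation Y = 60.8 - 2.97x.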

5. The scatter diagram (figure 9.7) does not show any evidence against the
assumption of linearity and in this example, a priori logic suggests that it would
be a reasonable model of the situation.
Let x = the number of boxes in a parcel and y = the number of basic seconds
per parcel.

[The scatter diagram plots basic seconds per parcel against number of boxes in parcel, with the fitted regression line and 95% confidence limits.]
Figure 9.7

The following totals are obtained from the data (without coding)

n = 8
Σx = 164        x̄ = 20.5
Σx² = 4700      sx² = 191.14
Σy = 1690       ȳ = 211.25
Σy² = 380 100   sy² = 3298.21
Σxy = 39 130    r = +0.8069

Regression Line

a = ȳ = 211.25

b = r × sy/sx = 0.8069 × √(3298.21/191.14) = 3.35

(Y - 211.25) = 3.35(x - 20.5) or Y = 142.6 + 3.35x

Significance of b
Residual variance
s² = 3298.21(1 - 0.8069²) × 7/6 = 1342.5
Standard error of b

εb = √(1342.5/1338) = 1.002

Observed value of

t = (3.35 - 0)/1.002 = 3.34

Reference to table 7* shows that this value, having 6 degrees of freedom, falls
between the 2% and 1% levels of significance (3.143 and 3.707 respectively). The
slope of the regression line can therefore be assumed to be different from zero
with b = 3.35 as its best estimate.

Confidence Limits for the Regression Line


The standard error of the regression estimate for given xᵢ is

εYᵢ = √{1342.5 × [1/8 + (xᵢ − 20.5)²/1338]}

Table 9.9 shows values of εYᵢ for certain xᵢ, together with 95% confidence
limits for the regression estimate at each point. The scatter diagram (figure 9.7)
also has the 95% confidence limits drawn on it.

xᵢ      Yᵢ       εYᵢ      Yᵢ − 2.45εYᵢ    Yᵢ + 2.45εYᵢ

1       145.95   23.44     88.5           203.4
5       159.35   20.22    109.8           208.9
10      176.10   16.69    135.2           217.0
20      209.60   12.96    177.8           241.4
30      243.10   16.07    203.7           282.5
40      276.60   23.44    219.2           334.0

Table 9.9

The analysis therefore gives the following estimates


(a) The constant basic seconds per parcel (i.e. the value of Y at x = 0) = 142.6 s,
and the basic seconds per additional box = 3.35 s.
(b) The average time to wrap a parcel of 18 boxes is 202.9, or 203 s,
although the 95% prediction interval for the time taken to wrap a single parcel
of 18 boxes is from 107 s to 298 s.
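As a sketch of how table 9.9 and the part (b) figures arise, the code below (ours, not the book's) evaluates the regression estimate, its 95% confidence limits for the mean, and the 95% prediction limits for a single parcel at a chosen x, using the quantities computed above and t = 2.45 with 6 degrees of freedom.

```python
from math import sqrt

# Quantities from the problem 5 analysis above
n, x_bar, Sxx = 8, 20.5, 1338.0
a, b = 142.6, 3.35          # fitted line Y = 142.6 + 3.35x
s2 = 1342.5                 # residual variance, 6 degrees of freedom
t6 = 2.45                   # two-sided 5% point of t with 6 d.f.

def regression_estimate(x):
    """Fitted value, 95% confidence limits for the mean, and
    95% prediction limits for a single parcel of x boxes."""
    y_hat = a + b * x
    se_mean = sqrt(s2 * (1 / n + (x - x_bar) ** 2 / Sxx))
    se_single = sqrt(s2 * (1 + 1 / n + (x - x_bar) ** 2 / Sxx))
    return (y_hat,
            (y_hat - t6 * se_mean, y_hat + t6 * se_mean),
            (y_hat - t6 * se_single, y_hat + t6 * se_single))

y18, ci18, pi18 = regression_estimate(18)
print(f"Y(18) = {y18:.1f}")                                        # approx. 202.9 s
print(f"95% prediction interval: {pi18[0]:.0f} to {pi18[1]:.0f}")  # roughly 107 to 298 s
```

Evaluating `regression_estimate` at x = 5, 10, 20, 30 and 40 reproduces the rows of table 9.9 to within rounding.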

6. Here, in order to reduce the computation slightly, all the basic data have
been coded into units of $100; i.e. $1300 becomes 13 etc.
The scatter diagram (figure 9.8) illustrates the case of 'fliers' or 'outliers', i.e.
readings which do not appear to belong to the bivariate distribution. These
suspect readings are marked as A and B in figure 9.8. Whenever such observations
occur in practice, a decision has to be made as to whether or not to exclude
them. Special tests to assist in this are available but are beyond the level of this
book and all that can be said here is that the source of the readings should be
carefully examined and if any reason is found for their not being homogeneous
with the others, they should then be rejected. In many cases, a commonsense
approach will indicate what should be done.
In this example, the two points, A and B, clearly do not conform and a closer
examination of the situation would probably isolate a reason so that the points
could validly be excluded. However, to demonstrate their strong effect on the
analysis, the points A and B have been retained in fitting the regression line.

[Scatter diagram of total sales ($) against income level of farms ($); the fitted regression line is not significant, and the outlying points A and B are marked]
Figure 9.8

x = income level (in $100)
y = total sales (in $100)

n = 11
Σx = 117      x̄ = 10.64
Σx² = 1313    sx² = 6.85
Σy = 260      ȳ = 23.64
Σy² = 6580    sy² = 43.45
Σxy = 2827    r = 0.3566

Regression Line

a = ȳ = 23.64

b = 0.3566 × √(43.45/6.85) = 0.898

(Y − 23.64) = 0.898(x − 10.64)    i.e.    Y = 14.09 + 0.898x (in $100)

or, converting back to the original units,
Y = 1409 + 0.898x (in $)

Significance of b
The residual variance about the line is
s² = 43.45 × (1 − 0.3566²) × 10/9 = 42.14

The standard error of b is
εb = √(42.14/68.55) = 0.784, where 68.55 = Σ(x − x̄)²

Observed
t = (b − 0)/εb = 0.898/0.784 = 1.15

a value which is not significantly high.
The regression line calculated above could therefore be misleading since the
observed data as a whole show no evidence of a linear relation between y and x.
However, as mentioned above, the analysis can be carried out omitting
readings A and B if a valid reason to do so is found. If this is done, the
calculations give
n = 9
Σx = 95
Σx² = 1053
Σy = 212
Σy² = 5266
Σxy = 2353

leading to
r = 0.9854 and Y = −66.15 + 2.29x (in $)
The fact that just two points have obscured the relationship should be noted,
as should the assistance given by the scatter diagram towards interpretation of the
situation.
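The numerical effect of the two outliers can be seen directly. This sketch (our own, using only the totals quoted above) computes r and the fitted line first from all eleven farms and then with A and B excluded; the function name and variables are invented for illustration.

```python
from math import sqrt

def fit(n, sx, sx2, sy, sy2, sxy):
    """Return (r, intercept, slope) for a least-squares line
    computed from the raw totals n, Σx, Σx², Σy, Σy², Σxy."""
    Sxx = sx2 - sx ** 2 / n
    Syy = sy2 - sy ** 2 / n
    Sxy = sxy - sx * sy / n
    r = Sxy / sqrt(Sxx * Syy)
    b = Sxy / Sxx
    a = sy / n - b * sx / n
    return r, a, b

# All 11 farms (units of $100)
r_all, a_all, b_all = fit(11, 117, 1313, 260, 6580, 2827)
# Outlying points A and B removed
r_sub, a_sub, b_sub = fit(9, 95, 1053, 212, 5266, 2353)

print(f"all points:      r = {r_all:.4f}, Y = {a_all:.2f} + {b_all:.3f}x")
print(f"without A and B: r = {r_sub:.4f}, Y = {a_sub:.2f} + {b_sub:.3f}x")
```

Excluding the two points raises r from about 0.36 (not significant with 9 degrees of freedom) to about 0.99, confirming how strongly just two observations can obscure a relationship.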

7. This example illustrates the simple application of regression analysis to time series data.

Note: No attempt will be made to justify forecasting from such analysis
(beware of extrapolation), but the method of fitting the linear regression line
is given.
As usual, the scatter diagram is plotted and is shown in figure 9.9.
To reduce the size of numbers involved in the computation, the years are
coded, 1960 being taken as Year 1 and so on up to 1967 as Year 8.

[Scatter diagram of total sales against year, 1960–1967 (coded 1–8), with the fitted regression line]
Figure 9.9
The scatter diagram shows no strong relationship between the variables
(sales and time), nor is there any apparent evidence of non-linearity, so the
results for the straight line regression are as shown below.
n = 8
Σx = 36        x̄ = 4.5
Σx² = 204      sx² = 6.0
Σy = 2740      ȳ = 342.5
Σy² = 975600   sy² = 5307.1
Σxy = 12880    r = 0.4403

Regression Line

a = ȳ = 342.5

b = 0.4403 × √(5307.1/6.0) = 13.095

(Y − 342.5) = 13.095(x − 4.5)    i.e.    Y = 283.57 + 13.09x

where x is in coded units.

Significance of b
The residual variance about the line is
s² = 5307.1 × [1 − 0.4403²] × 7/6 = 4991.3

The standard error of b is
εb = √(4991.3/42) = 10.90, where 42 = Σ(x − x̄)²

and
t = (b − 0)/εb = 13.09/10.90 = 1.20

with 6 degrees of freedom.
Reference to table 7* shows that this is not significantly different from
zero, that is, there is no evidence of a relationship between sales and time. In
this case there is no point in using the regression equation above to estimate sales
for 1968 (Year 9) or beyond. The average yearly sales figure of 342 is probably
as good a figure as any to use for making a short-term forecast on the basis of the
information given.
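The problem 7 computation follows exactly the same pattern as the earlier ones. The sketch below (ours, with invented variable names) recomputes the slope and its t-value from the coded totals, confirming that the apparent upward trend is not significant.

```python
from math import sqrt

# Totals with years coded 1 (1960) to 8 (1967)
n, sum_x, sum_x2 = 8, 36, 204
sum_y, sum_y2, sum_xy = 2740, 975600, 12880

# Corrected sums of squares and products
Sxx = sum_x2 - sum_x ** 2 / n          # 42.0
Syy = sum_y2 - sum_y ** 2 / n
Sxy = sum_xy - sum_x * sum_y / n

b = Sxy / Sxx                          # slope, approx. 13.10
a = sum_y / n - b * sum_x / n          # intercept, approx. 283.6

s2 = (Syy - b * Sxy) / (n - 2)         # residual variance, approx. 4991
se_b = sqrt(s2 / Sxx)                  # approx. 10.9
t = b / se_b                           # approx. 1.20 -- not significant with 6 d.f.

print(f"Y = {a:.2f} + {b:.3f}x (coded years), t = {t:.2f}")
```

Since t is well below the 5% point of t with 6 degrees of freedom (2.45), the fitted line should not be used for forecasting, as the text concludes.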
