
Intro Stats for B.Sc. Math Students

This document is an introductory chapter to a textbook on introductory statistics. It defines key statistical concepts such as population, sample, variables, attributes and frequency distributions. A population refers to all the individuals in a group being studied, while a sample is a subset of the population. Variables are quantities that vary and take different values, while attributes are qualitative characteristics. Frequency distributions organize data into groups or classes based on common values of the variables.

Uploaded by Eshita Varshney
Copyright © All Rights Reserved

School of Distance Education

INTRODUCTORY
STATISTICS
Complementary course for
B.Sc. MATHEMATICS
I Semester
(2019 Admission)

UNIVERSITY OF CALICUT
SCHOOL OF DISTANCE EDUCATION
Calicut University P.O. Malappuram, Kerala, India 673 635

19553


UNIVERSITY OF CALICUT
SCHOOL OF DISTANCE EDUCATION

STUDY MATERIAL

I Semester
Complementary Course for B.Sc. Mathematics
INTRODUCTORY STATISTICS (STA1 C01)

Prepared and Scrutinised by:
Dr. [Link],
Director,
Academic Staff College,
University of Calicut.

© Reserved

INDEX
MODULE I
MODULE II
MODULE III
MODULE IV

Module 1

INTRODUCTION

The term statistics seems to have been derived from the Latin word
‘status’, the Italian word ‘statista’ or the German word ‘Statistik’, each of
which means political state.
The word ‘Statistics’ is usually interpreted in two ways. In the first sense,
it is used as a plural noun to refer simply to a collection of numerical facts.
In the second, it is used as a singular noun to denote the methods generally
adopted in the collection and analysis of numerical facts. In the singular
sense the term ‘Statistics’ is better described as statistical methods.
Different authors have defined statistics in different ways. According
to Croxton and Cowden, statistics may be defined as the ‘‘collection,
organisation, presentation, analysis and interpretation of numerical data’’.

Population and sample


Population
An aggregate of individual items relating to a phenomenon under
investigation is technically termed a ‘population’. In other words, a
collection of objects pertaining to a phenomenon of statistical enquiry is
referred to as a population or universe. Suppose we want to collect data
regarding the income of college teachers under the University of Calicut;
then the totality of these teachers is our population.
In a given population, the individual items are referred to as elementary
units, elements or members of the population. A population may be finite
or infinite. When the number of units under investigation is determinable,
it is called a finite population; for example, the college teachers under
Calicut University form a finite population. When the number of units in a
phenomenon is indeterminable, e.g. the stars in the sky, it is called an
infinite population.

Sample
When a few items are selected for statistical enquiry from a given population, they are called a ‘sample’. A sample is a small part, or subset, of the population. Say, for instance, there are 3000 workers in a factory and one wants to study their consumption pattern. By selecting only 300 workers from the group of 3000, a sample for the study has been taken. This sample is not studied just for its own sake; the motive is to know the true state of the population. From the sample study, statistical inference about the population can be made.

Census and Sample Method
In any statistical investigation, one is interested in studying the population characteristics. This can be done either by studying all the items in the population or by studying a part drawn from it. If we study each and every element of the population, the process is called the census method; if we study only a sample, the process is called a sample survey, sample method or sampling. For example, the Indian population census and a socio-economic survey of a whole village by a college planning forum are examples of census studies. The National Sample Survey enquiries are examples of sample studies.

Advantages of Sampling
1. The sample method is comparatively more economical.
2. The sample method ensures completeness and a high degree of accuracy due to the small area of operation.
3. It is possible to obtain more detailed information in a sample survey than in a complete enumeration.
4. Sampling is also advocated where a census is neither necessary nor desirable.
5. In some cases sampling is the only feasible method. For example, suppose we have to test the sharpness of blades. If we test each blade, perhaps the whole of the product will be wasted; in such circumstances the census method is not suitable and sampling techniques are more useful.
6. A sample survey is much more scientific than a census because in it the extent of the reliability of the results can be known, whereas this is not always possible in a census.

Variables and Attributes
A quantity which varies from one person to another, from one time to another or from one place to another is called a variable. It is actually a numerical value possessed by an item; for example, the price of a given commodity, wages of workers, production and weights of students.
An attribute is a qualitative characteristic possessed by each individual in a group. It cannot assume numerical values; for example, sex, honesty and colour. This means that a variable is always a quantitative characteristic.
Data concerned with a quantitative variable are called quantitative data, and data corresponding to a qualitative variable are called qualitative data.
We can divide quantitative variables into two types: (i) discrete and (ii) continuous. Variables which can assume only distinct or particular values are called discrete or discontinuous variables; for example, the number of children per family and the number of rooms in a house. Variables which can take any numerical value within a certain range are known as continuous variables. The height of a boy is a continuous variable, for it changes continuously within a given range of heights. The same is the case with weight, production, price, demand, income, marks, etc.

Frequency Distribution
Types of Frequency Distribution
Erricker states that a ‘‘frequency distribution is a classification according to the number possessing the same values of the variables’’. It is simply a table in which data are grouped into classes and the number of cases which fall in each class is recorded. These numbers are usually termed ‘frequencies’. There are discrete frequency distributions and continuous frequency distributions.

1. Discrete Frequency Distribution
If we have a large number of items in the data, it is better to prepare a frequency array and condense the data further. A frequency array is prepared by listing once, in order, all the values occurring in the series and noting the number of times each value occurs. This is called a discrete frequency distribution or ungrouped frequency distribution.
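As a minimal sketch of the frequency array just described, the Python code below lists each distinct value once, in ascending order, with the number of times it occurs. The function name `frequency_array` is our own choice for illustration. The data are the children-per-family figures from the illustration that follows.

```python
from collections import Counter


def frequency_array(observations):
    """List each distinct value once, in ascending order, with its frequency."""
    counts = Counter(observations)   # number of times each value occurs
    return sorted(counts.items())    # (value, frequency) pairs, smallest value first


# Number of children in each of 25 families (data of the illustration)
children = [1, 4, 3, 2, 1, 2, 0, 2, 1, 2, 3, 2, 1, 0, 2, 3, 0, 3, 2,
            1, 2, 2, 1, 4, 2]

table = frequency_array(children)
for value, freq in table:
    print(value, freq)
print("Total", sum(freq for _, freq in table))
```

The output has the same five rows as the tally table (3, 6, 10, 4 and 2 families) and a total of 25. Here the counter replaces the tally marks, and sorting reproduces the step of listing the values once and in order.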

Illustration: The following data give the number of children per family in each of 25 families: 1, 4, 3, 2, 1, 2, 0, 2, 1, 2, 3, 2, 1, 0, 2, 3, 0, 3, 2, 1, 2, 2, 1, 4, 2. Construct a frequency distribution.

No. of children   Tally marks     No. of families
0                 |||             3
1                 ||||| |         6
2                 ||||| |||||     10
3                 ||||            4
4                 ||              2
Total                             25

2. Continuous Frequency Distribution
An important method of condensing and presenting data is the construction of a continuous frequency distribution or grouped frequency distribution. Here the data are classified according to class intervals.

The following are the rules generally adopted in forming a frequency table for a set of observations.
1. Note the difference between the largest and smallest values in the given set of observations.
2. Determine the number of classes into which the difference can be divided.
3. The classes should be mutually exclusive, that is, they should not overlap.
4. Arrange a paper with 3 columns: classes, tally marks and frequency.
5. Write down the classes in the first column.
6. Go through the observations and put tally marks in the respective classes.
7. Write the sum of the tally marks of each class in the frequency column.
8. Note that the sum of the frequencies of all classes should be equal to the total number of observations.

Concepts of a Frequency Table
i. Class limits: The observations which constitute a class are called class limits. The left hand side observations are called lower limits and the right hand side observations are called upper limits.
ii. Working classes: The classes of the form 0-9, 10-19, 20-29, ... are called working classes or nominal classes. They are obtained by the inclusive method of classification, where both the limits of a class are included in the same class itself.
iii. Actual classes: If we leave out either the upper limit or the lower limit from each class, it is called the exclusive method of classification. The classes so obtained are called ‘actual classes’ or ‘true classes’. The classes -0.5 to 9.5, 9.5 to 19.5, 19.5 to 29.5, ... are the actual classes of the above working classes. Classes of the type 0-10, 10-20, 20-30, ... are also treated as actual classes. There will be no break in the actual classes. We can convert working classes to the corresponding actual classes using the following steps.
1. Note the difference between one upper limit and the next lower limit.
2. Divide the difference by 2.
3. Subtract that value from the lower limits and add the same to the upper limits.
For example,

Working classes   Frequency   Actual classes
1-2.9             2           0.95-2.95
3-4.9             8           2.95-4.95
5-6.9             10          4.95-6.95
7-8.9             5           6.95-8.95

iv. Class boundaries: The class limits of the actual classes are called actual class limits or class boundaries.
v. Class mark: The class mark or mid value of a class is the average
of the upper limit and lower limit of that class. The mid values of the working classes and of the corresponding actual classes are the same. For example, the class marks of the classes 0-9, 10-19, 20-29, ... are respectively 4.5, 14.5, 24.5, ...
vi. Class interval: The class interval or width of a class is the difference between the upper limit and the lower limit of an actual class. Note that the difference between the class limits of a working class is not the class interval. The class interval is usually denoted by ‘c’, ‘i’ or ‘h’.

Example
Construct a frequency distribution for the following data.

70 45 33 64 50 25 65 74 30 20
55 60 65 58 52 36 45 42 35 40
51 47 39 61 53 59 49 41 20 55
46 48 52 64 48 45 65 78 53 42

Solution

Classes   Tally marks        Frequency
20-29     |||                3
30-39     |||||              5
40-49     ||||| ||||| ||     12
50-59     ||||| |||||        10
60-69     ||||| ||           7
70-79     |||                3
Total                        40

Cumulative Frequency Distribution
An ordinary frequency distribution shows the number of observations falling in each class. But there are instances where we want to know how many observations lie below or above a particular value, or between two specified values. This type of information is found in cumulative frequency distributions.

Cumulative frequencies are determined on either a less than basis or a more than basis. Thus we get less than cumulative frequencies (<CF) and greater than or more than cumulative frequencies (>CF). Less than CF give the number of observations falling below the upper limit of a class, and greater than CF give the number of observations lying above the lower limit of the class. Less than CF are obtained by successively adding the frequencies of all the previous classes, including the class against which it is written. The cumulation starts from the lowest class and proceeds to the highest (usually from top to bottom). They are based on the upper limits of the actual classes.
A more than CF distribution is obtained by cumulating the frequencies starting from the highest class and proceeding to the lowest class (i.e. from bottom to top). More than CF are based on the lower limits of the actual classes.

Classes   f    UL   <CF                  LL   >CF
0-10      2    10   2                    0    3+7+10+8+5+2 = 35
10-20     5    20   2+5 = 7              10   3+7+10+8+5 = 33
20-30     8    30   2+5+8 = 15           20   3+7+10+8 = 28
30-40     10   40   2+5+8+10 = 25        30   3+7+10 = 20
40-50     7    50   2+5+8+10+7 = 32      40   3+7 = 10
50-60     3    60   2+5+8+10+7+3 = 35    50   3

EXERCISES
Multiple Choice Questions
1. A qualitative characteristic is also known as
a. attribute b. variable
c. variate d. frequency
2. A variable which assumes only integral values is called
a. continuous b. discrete
c. random d. None of these

3. An example of an attribute is
a. height b. weight
c. age d. sex
4. The number of students having a smoking habit is a variable which is
a. continuous b. discrete
c. neither discrete nor continuous d. None of these
5. A series showing the set of all distinct values individually with their frequencies is known as
a. grouped frequency distribution
b. simple frequency distribution
c. cumulative frequency distribution
d. none of the above
6. A series showing the set of all values in classes with their corresponding frequencies is known as
a. grouped frequency distribution
b. simple frequency distribution
c. cumulative frequency distribution
d. none of the above
12. If the lower and upper limits of a class are 10 and 40 respectively, the mid point of the class is
a. 25.0 b. 12.5 c. 15.0 d. 30.0
13. In grouped data, the number of classes preferred is
a. minimum possible b. adequate
c. maximum possible d. any arbitrarily chosen number
14. Class interval is measured as:
a. the sum of the upper and lower limit
b. half of the sum of upper and lower limit
c. half of the difference between upper and lower limit
d. the difference between upper and lower limit
16. A grouped frequency distribution with uncertain first or last classes is known as:
a. exclusive class distribution
b. inclusive class distribution
c. open end distribution
d. discrete frequency distribution

Very Short Answer Questions
17. Define the term ‘statistics’.
18. Define the term population.
19. What is sampling?
20. What is a frequency distribution?
21. Distinguish between discrete and continuous variables.

Short Essay Questions
22. Explain the different steps in the construction of a frequency table for a given set of observations.
23. Explain the terms (i) class interval (ii) class mark (iii) class frequency.
24. Distinguish between census and sampling.
25. What are the advantages of sampling over census?
26. State the various stages of statistical investigation.

Long Essay Questions
27. Present the following data of marks secured in Statistics (out of 100) by 60 students in the form of a frequency table with 10 classes of equal width, the lowest class being 0-9.

41 17 83 60 54 91 60 58 70 07
67 82 33 45 57 48 34 73 54 62
36 52 32 72 60 33 07 77 28 30
42 93 43 80 03 34 56 66 23 63
63 11 35 85 62 24 00 42 62 33
72 53 92 87 10 55 60 35 40 57

28. Following is a cumulative frequency distribution showing the marks secured and the number of students in an examination:

Marks      No. of students (F)
Below 10   12
Below 20   30
Below 30   60
Below 40   100
Below 50   150
Below 60   190
Below 70   220
Below 80   240
Below 90   250

Obtain the (simple) frequency table from it. Also prepare a ‘More than’ cumulative frequency table.

Module 2

MEASURES OF CENTRAL TENDENCY

A measure of central tendency gives a single representative value for a set of usually unequal values. This single value is the point of location around which the individual values of the set cluster. Hence averages are also known as measures of location.
The important measures of central tendency or statistical averages are the following.
1. Arithmetic Mean
2. Geometric Mean
3. Harmonic Mean
4. Median
5. Mode
Weighted averages and positional values, viz., quartiles, deciles and percentiles, are also considered in this chapter.

Criteria or Desirable Properties of an Average
1. It should be rigidly defined: that is, it should have a formula and procedure such that different persons who calculate it for a set of values get the same answer.
2. It should have sampling stability: a number of samples can be drawn from a population, and the average of one sample is likely to be different from that of another. It is desired that the average of any sample is not much different from that of any other.

1. Arithmetic Mean
The arithmetic mean (AM), or simply the mean, is the most popular and widely used average. It is the value obtained by dividing the sum of all the given observations by the number of observations. The AM is denoted by $\bar{x}$ (x bar).
Definition for a raw data
For raw or ungrouped data, if $x_1, x_2, x_3, \ldots, x_n$ are $n$ observations, then
$$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}, \quad \text{i.e.,} \quad \bar{x} = \frac{\sum x}{n},$$
where the symbol $\sum$ (sigma) denotes summation.

Example 1
Calculate the AM of 12, 18, 14, 15, 16.
Solution
$$\bar{x} = \frac{\sum x}{n} = \frac{12 + 18 + 14 + 15 + 16}{5} = \frac{75}{5} = 15$$

Definition for a frequency data
For frequency data, if $x_1, x_2, x_3, \ldots, x_n$ are $n$ observations, or the middle values of $n$ classes, with the corresponding frequencies $f_1, f_2, \ldots, f_n$, then the AM is given by
$$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_n x_n}{f_1 + f_2 + \cdots + f_n} = \frac{\sum fx}{\sum f}, \quad \text{i.e.,} \quad \bar{x} = \frac{\sum fx}{N},$$
where $N = \sum f$ is the total frequency.

Example 2
The following data indicate the daily earnings (in rupees) of 40 workers in a factory.

Daily earnings : 5  6  7   8   9
No. of workers : 3  8  12  10  7

Calculate the average income per worker.
Solution

Daily earnings (x)   No. of workers (f)   fx
5                    3                    15
6                    8                    48
7                    12                   84
8                    10                   80
9                    7                    63
Total                40                   290

$$\bar{x} = \frac{\sum fx}{N} = \frac{290}{40} = 7.25$$
The average income per worker is 7.25 rupees.

Example 3
Calculate the AM of the following data.

Class     : 0-4  4-8  8-12  12-16
Frequency : 1    4    3     2

Solution

Class   f    Mid value (x)   fx
0-4     1    2               2
4-8     4    6               24
8-12    3    10              30
12-16   2    14              28
Total   10                   84

$$\bar{x} = \frac{\sum fx}{N} = \frac{84}{10} = 8.4$$
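The two definitions above can be checked with a short Python sketch. For grouped data it assumes that the class mid values are used as the x values, as in Example 3. The helper names `arithmetic_mean`, `frequency_mean` and `mid_values` are hypothetical and chosen only for this illustration.

```python
def arithmetic_mean(xs):
    """AM of raw data: sum of the observations divided by their number."""
    return sum(xs) / len(xs)


def frequency_mean(xs, fs):
    """AM of frequency data: sum of f*x divided by N = sum of f.

    For a continuous distribution, xs should be the class mid values.
    """
    if len(xs) != len(fs):
        raise ValueError("each value needs exactly one frequency")
    n_total = sum(fs)                                  # N
    return sum(f * x for x, f in zip(xs, fs)) / n_total


def mid_values(classes):
    """Mid value (class mark) of each (lower, upper) class."""
    return [(lower + upper) / 2 for lower, upper in classes]


print(arithmetic_mean([12, 18, 14, 15, 16]))                # 15.0  (Example 1)
print(frequency_mean([5, 6, 7, 8, 9], [3, 8, 12, 10, 7]))  # 7.25  (Example 2)
classes = [(0, 4), (4, 8), (8, 12), (12, 16)]
print(frequency_mean(mid_values(classes), [1, 4, 3, 2]))   # 8.4   (Example 3)
```

The raw-data formula is the special case of the frequency formula in which every frequency is 1.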

Shortcut Method: Raw data
Suppose the values of the variable under study are large. Choose any value in between them, preferably a value that lies more or less in the middle, called the arbitrary origin or assumed mean and denoted by $A$. Take the deviations of every value from the assumed mean $A$.
Let $d = x - A$, so that $x = A + d$. Taking summation of both sides and dividing by $n$, we get
$$\bar{x} = A + \frac{\sum d}{n}$$

Example 4
Calculate the AM of 305, 320, 332, 350.
Solution
Take A = 320.

x       d = x - 320
305     -15
320     0
332     12
350     30
Total   27

$$\bar{x} = A + \frac{\sum d}{n} = 320 + \frac{27}{4} = 320 + 6.75 = 326.75$$

Shortcut Method: Frequency Data
When the frequencies and the values of the variable x are large, the calculation of the AM is tedious, so a simpler method is adopted. The deviations of the mid values of the classes are taken from a convenient origin. Usually the mid value of the class with the maximum frequency is chosen as the arbitrary origin or assumed mean. Thus change the x values to ‘d’ values by the rule
$$d = \frac{x - A}{c},$$
where $A$ is the assumed mean, $c$ the class interval and $x$ the mid values. Then the formula for calculating the AM is
$$\bar{x} = A + \frac{\sum fd}{N} \times c$$

Example 5
Calculate the AM from the following data.

Weekly wages : 0-10  10-20  20-30  30-40  40-50
Frequency    : 3     12     20     10     5

Solution
Take A = 25 and c = 10.

Weekly wages   f    Mid value x   d = (x - 25)/10   fd
0-10           3    5             -2                -6
10-20          12   15            -1                -12
20-30          20   25            0                 0
30-40          10   35            1                 10
40-50          5    45            2                 10
Total          50                                   2

$$\bar{x} = A + \frac{\sum fd}{N} \times c = 25 + \frac{2}{50} \times 10 = 25 + 0.4 = 25.4$$
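A minimal Python sketch of the two shortcut formulas follows. The function names are hypothetical. The sketch only shows the bookkeeping of the deviations $d$. Because the assumed mean $A$ is arbitrary, any choice should reproduce the direct mean.

```python
def shortcut_mean_raw(xs, a):
    """Assumed-mean method for raw data: mean = A + (sum of d) / n, with d = x - A."""
    deviations = [x - a for x in xs]          # d = x - A
    return a + sum(deviations) / len(xs)


def step_deviation_mean(mids, fs, a, c):
    """Shortcut for frequency data: mean = A + (sum of f*d / N) * c, with d = (x - A) / c."""
    n_total = sum(fs)                                          # N
    sum_fd = sum(f * (x - a) / c for x, f in zip(mids, fs))    # sum of f*d
    return a + c * sum_fd / n_total


# Example 4: assumed mean A = 320
print(shortcut_mean_raw([305, 320, 332, 350], 320))   # 326.75

# Example 5: mid values of the weekly-wage classes, A = 25, c = 10
mids = [5, 15, 25, 35, 45]
freqs = [3, 12, 20, 10, 5]
print(step_deviation_mean(mids, freqs, 25, 10))       # 25.4
```

If A is changed to 15 in the second call, the result is unchanged apart from floating-point rounding. This is why the assumed mean can be chosen for convenience, usually as the mid value of the class with the highest frequency.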
Properties
1. The AM is preserved under a linear transformation. That is, if each $x_i$ is changed to $y_i$ by the rule $y_i = a + b x_i$, then $\bar{y} = a + b\bar{x}$.
2. The mean of a sum of variables is equal to the sum of the means of the variables.
3. The algebraic sum of the deviations of every observation from the A.M. is zero, i.e. $\sum (x - \bar{x}) = 0$.
4. If $n_1$ observations have an A.M. $\bar{x}_1$ and $n_2$ observations have an A.M. $\bar{x}_2$, then the AM of the combined group of $n_1 + n_2$ observations is given by
$$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}$$

Example 6
Let the average mark of 40 students of class A be 38, and the average mark of 60 students of another class B be 42. What is the average mark of the combined group of 100 students?
Here $n_1 = 40$, $\bar{x}_1 = 38$, $n_2 = 60$, $\bar{x}_2 = 42$.
$$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2} = \frac{(40 \times 38) + (60 \times 42)}{40 + 60} = \frac{1520 + 2520}{100} = \frac{4040}{100} = 40.4$$
Note
The above property can be extended. When there are three groups, the combined mean is given by
$$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + n_3\bar{x}_3}{n_1 + n_2 + n_3}$$
5. The sum of the squares of the deviations of the observations from the AM is always minimum, i.e. $\sum (x - \bar{x})^2 \le \sum (x - A)^2$ for any value $A$.

Merits and Demerits
Merits
The arithmetic mean, the most widely used average, has the following merits.
1. It is rigidly defined. Clear-cut mathematical formulae are available.
2. It is based on all the items. The magnitudes of all the items are considered for its computation.
3. It lends itself to algebraic manipulation. The total of a set, the combined mean, etc. can be calculated.
4. It is simple to understand and not difficult to calculate. Because of its practical use, provisions are made in calculators to find it.
5. It has sampling stability. It does not vary very much when samples are repeatedly taken from one and the same population.
6. It is very useful in day-to-day activities, in later chapters of Statistics and in many disciplines of knowledge.
7. Many forms of the formula are available. The form that is appropriate and easy for the data on hand can be used.
Demerits
1. It is unduly affected by extreme items. One very large item may pull up the mean of the set to such an extent that its representative character is questioned. For example, the mean mark is 35 for 3 students whose individual marks are 0, 5 and 100.
2. Theoretically, it cannot be calculated for open-end data.
3. It cannot be found graphically.
4. It is not defined for qualities (attributes).

Weighted Arithmetic Mean
In calculating the simple arithmetic mean, it was assumed that all items are of equal importance. This may not always be true. When items vary in importance, they must be assigned weights in proportion to their relative importance. Thus, a weighted mean is the mean of weighted items. The weighted arithmetic mean is the sum of the products of the values and their respective weights divided by the sum of the weights.
Symbolically, if $x_1, x_2, x_3, \ldots, x_n$ are the values of the items and $w_1, w_2, \ldots, w_n$ are their respective weights, then
$$\text{WAM} = \frac{w_1x_1 + w_2x_2 + w_3x_3 + \cdots + w_nx_n}{w_1 + w_2 + w_3 + \cdots + w_n} = \frac{\sum wx}{\sum w}$$
Weighted AM is preferred in computing the average of percentages, ratios or rates relating to different classes of a group of observations. WAM is also invariably applied in the computation of birth and death rates and index numbers.

Example 7
A student obtains 60 marks in Statistics, 48 marks in Economics, 55 marks in Law, 72 marks in Commerce and 45 marks in Taxation in an examination. The weights of the marks are respectively 2, 1, 3, 4, 2. Calculate the simple AM and the weighted AM of the marks.
Solution
$$\text{Simple AM} = \frac{\sum x}{n} = \frac{60 + 48 + 55 + 72 + 45}{5} = \frac{280}{5} = 56$$

Marks (x)   Weights (w)   wx
60          2             120
48          1             48
55          3             165
72          4             288
45          2             90
Total       12            711

$$\text{WAM} = \frac{\sum wx}{\sum w} = \frac{711}{12} = 59.25$$

Geometric Mean
The geometric mean (GM) is the appropriate root (corresponding to the number of observations) of the product of the observations. If there are $n$ observations, the GM is the $n$-th root of the product of the $n$ observations.

Definition for a raw data
If $x_1, x_2, x_3, \ldots, x_n$ are $n$ observations,
$$\text{GM} = \sqrt[n]{x_1 \cdot x_2 \cdots x_n}$$
Using logarithms, we can calculate the GM using the formula
$$\text{GM} = \text{Antilog}\left(\frac{\sum \log x}{n}\right)$$

Definition for a frequency distribution
For a frequency distribution, if $x_1, x_2, x_3, \ldots, x_n$ are $n$ observations with the corresponding frequencies $f_1, f_2, \ldots, f_n$,
$$\text{GM} = \sqrt[N]{x_1^{f_1} \cdot x_2^{f_2} \cdots x_n^{f_n}}$$
Using logarithms,
$$\text{GM} = \text{Antilog}\left(\frac{\sum f \log x}{N}\right), \quad \text{where } N = \sum f.$$

Note
1. The GM is the appropriate average for calculating index numbers and average rates of change.
2. The GM can be calculated only for non-zero and non-negative (that is, positive) values.
3. Weighted GM $= \text{Antilog}\left(\dfrac{\sum w \log x}{\sum w}\right)$, where the $w$'s are the weights assigned.
Example 8
Calculate the GM of 2, 4, 8.
Solution
$$\text{GM} = \sqrt[n]{x_1 \cdot x_2 \cdots x_n} = \sqrt[3]{2 \times 4 \times 8} = \sqrt[3]{64} = 4$$

Example 9
Calculate the GM of 4, 6, 9, 11 and 15.
Solution

x       log x
4       0.6021
6       0.7782
9       0.9542
11      1.0414
15      1.1761
Total   4.5520

$$\text{GM} = \text{Antilog}\left(\frac{\sum \log x}{n}\right) = \text{Antilog}\left(\frac{4.5520}{5}\right) = \text{Antilog}\, 0.9104 = 8.136$$

Example 10
Calculate the GM of the following data.

Classes   : 1-3  4-6  7-9  10-12
Frequency : 8    16   15   3

Solution

Classes   f    x    log x    f log x
1-3       8    2    0.3010   2.4080
4-6       16   5    0.6990   11.1840
7-9       15   8    0.9031   13.5465
10-12     3    11   1.0414   3.1242
Total     42                 30.2627

$$\text{GM} = \text{Antilog}\left(\frac{\sum f \log x}{N}\right) = \text{Antilog}\left(\frac{30.2627}{42}\right) = \text{Antilog}\, 0.7205 = 5.254$$

Merits and Demerits
Merits
1. It is rigidly defined. It has a clear-cut mathematical formula.
2. It is based on all the items. The magnitude of every item is considered for its computation.
3. It is not as unduly affected by extreme items as the A.M., because it gives less weight to large items and more weight to small items.
4. It can be algebraically manipulated. The G.M. of the combined set can be calculated from the GMs and sizes of the sets.
5. It is useful in averaging ratios and percentages. It is suitable for finding the average rate (not amount) of increase or decrease and for computing index numbers.
Demerits
1. It is neither simple to understand nor easy to calculate; the use of logarithms makes the computation easier.
2. It has less sampling stability than the A.M.
3. It cannot be calculated for open-end data.
4. It cannot be found graphically.
5. It is not defined for qualities. Further, when one item is zero, the GM is zero and thereby loses its representative character. It cannot be calculated if even one value or one mid value is negative.
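The logarithmic GM formulas can be sketched in Python as shown below. This is an illustration with hypothetical function names. For grouped data it assumes that the class mid values are used as the x values, as in Example 10. Working with the sum of logarithms instead of the product itself also avoids forming very large products, which is the same reason the text uses logarithms.

```python
import math


def geometric_mean(xs):
    """GM of raw data: antilog(sum of log x / n). All values must be positive."""
    if any(x <= 0 for x in xs):
        raise ValueError("GM needs positive values")
    return 10 ** (sum(math.log10(x) for x in xs) / len(xs))


def grouped_geometric_mean(mids, fs):
    """GM of a frequency distribution: antilog(sum of f*log x / N), N = sum of f."""
    if any(x <= 0 for x in mids):
        raise ValueError("GM needs positive values")
    n_total = sum(fs)
    return 10 ** (sum(f * math.log10(x) for x, f in zip(mids, fs)) / n_total)


print(round(geometric_mean([2, 4, 8]), 3))                       # Example 8
print(round(geometric_mean([4, 6, 9, 11, 15]), 3))               # Example 9
classes = [(1, 3), (4, 6), (7, 9), (10, 12)]
mids = [(lo + hi) / 2 for lo, hi in classes]                     # 2, 5, 8, 11
print(round(grouped_geometric_mean(mids, [8, 16, 15, 3]), 3))    # Example 10
```

Rounded to three decimals, the printed values are 4.0, 8.136 and 5.254. These agree with the worked solutions, which used four-figure log tables.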

Harmonic Mean
The harmonic mean (HM) of a set of observations is defined as the reciprocal of the arithmetic mean of the reciprocals of the observations.

Definition for a raw data
If $x_1, x_2, x_3, \ldots, x_n$ are $n$ observations,
$$\text{HM} = \frac{1}{\frac{1}{n}\left(\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}\right)} = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}} = \frac{n}{\sum \frac{1}{x}}$$

Definition for a frequency data
If $x_1, x_2, x_3, \ldots, x_n$ are $n$ observations with the corresponding frequencies $f_1, f_2, f_3, \ldots, f_n$, then
$$\text{HM} = \frac{N}{f_1\frac{1}{x_1} + f_2\frac{1}{x_2} + \cdots + f_n\frac{1}{x_n}} = \frac{N}{\sum \frac{f}{x}}, \quad \text{where } N = \sum f.$$
Note 1 The HM can be calculated only for non-zero and non-negative values.
Note 2 The HM is appropriate for finding the average speed when the distances travelled at different speeds are equal. The weighted HM is appropriate when the distances are unequal. The HM is also suitable for studying rates.
Note 3 Weighted HM $= \dfrac{\sum w}{\sum \frac{w}{x}}$, where the $w$'s are the weights assigned.

Example 11
Calculate the HM of 2, 3, 4, 5 and 7.
Solution
$$\text{HM} = \frac{n}{\sum \frac{1}{x}} = \frac{5}{\frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \frac{1}{7}} = \frac{5 \times 420}{210 + 140 + 105 + 84 + 60} = \frac{2100}{599} \approx 3.51$$

Example 12
Calculate the HM of 5, 11, 12, 16, 7, 9, 15, 13, 10 and 8.
Solution

x    1/x       x    1/x
5    0.2000    9    0.1111
11   0.0909    15   0.0667
12   0.0833    13   0.0769
16   0.0625    10   0.1000
7    0.1429    8    0.1250
Total of 1/x = 1.0593

$$\text{HM} = \frac{n}{\sum \frac{1}{x}} = \frac{10}{1.0593} = 9.44$$

Merits and Demerits
Merits
1. It is rigidly defined. It has a clear-cut mathematical formula.
2. It is based on all the items. The magnitude of every item is considered for its computation.
3. It is less affected by extremely large items than the A.M. or even the G.M.
4. It gives less weight to larger items and greater weight to smaller items.
5. It can be algebraically manipulated. The H.M. of the combined set can be calculated from the H.M.s and sizes of the sets. For example,
$$\text{HM}_{12} = \frac{N_1 + N_2}{\frac{N_1}{\text{HM}_1} + \frac{N_2}{\text{HM}_2}}$$
6. It is suitable for finding the average speed.
Demerits
1. It is neither simple to understand nor easy to calculate.
2. It has less sampling stability than the A.M.
3. Theoretically, it cannot be calculated for open-end data.
4. It cannot be found graphically.
5. It is not defined for qualities. It cannot be calculated when at least one item or one mid value is zero or negative.
6. It gives undue weightage to small items and the least weightage to large items. It is not generally used for analysing business or economic data.

Median
The median is defined as the middle-most observation when the observations are arranged in ascending or descending order of magnitude. That means the number of observations preceding the median is equal to the number of observations succeeding it. The median is denoted by M.

Definition for a raw data
For raw data, if there is an odd number of observations, there will be only one middle value and it will be the median. That means, if there are n observations arranged in order of their magnitude, the $\left(\frac{n+1}{2}\right)$th observation will be the median. If there is an even number of observations, the average of the two middle values will be the median. That means the median will be the average of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2} + 1\right)$th observations.

Definition for a frequency data
For a frequency distribution, the median is defined as the value of the variable which divides the distribution into two equal parts. The median can be calculated using the following formula:
$$M = l + \frac{\frac{N}{2} - m}{f} \times c$$
where
l - lower limit of the median class
Median class - the class in which the (N/2)th observation is found to lie
N - total frequency
m - cumulative frequency of the classes preceding the median class
c - class interval of the median class
f - frequency of the median class

Example 13
Find the median height from the following heights (in cm) of 9 soldiers: 160, 180, 175, 179, 164, 178, 171, 164, 176.
Solution
Step 1. The heights are arranged in ascending order:
160, 164, 164, 171, 175, 176, 178, 179, 180.
Step 2. The position of the median, $\frac{n+1}{2} = \frac{9+1}{2} = 5$, is calculated.
Step 3. The median is identified as the 5th value: M = 175 cm.
Note that $\frac{n+1}{2}$ may be a fraction. In that case the median is found as follows.

Example 14

Find the median weight from the following weights (in Kgs) of 10 soldiers.
75, 71, 73, 70, 74, 80, 85, 81, 86, 79

Solution
Step 1. Weights are arranged in ascending order:
70, 71, 73, 74, 75, 79, 80, 81, 85, 86
Step 2. Position of median = (n + 1)/2 = (10 + 1)/2 = 5.5 is calculated.
Step 3. Median is found. It is the mean of the values at the 5th and 6th positions, and so M = (75 + 79)/2 = 77 Kgs.

Example 15

Find the median for the following data.

Height in cms   : 160  164  170  173  178  180  182
No. of soldiers :   1    2   10   22   19   14    2

Solution
Step 1. Heights are arranged in ascending order. Cumulative frequencies (c.f.) are found. (They help to know the values at different positions.)

Height in cms.   No. of Soldiers   C.f.
160                    1              1
164                    2              3
170                   10             13
173                   22             35
178                   19             54
180                   14             68
182                    2             70
Total                 70

Step 2. Position of median, (N + 1)/2 = (70 + 1)/2 = 35.5, is calculated.
Step 3. Median is identified as the average of the values at the positions 35 and 36. The values are 173 and 178 respectively.
M = (173 + 178)/2 = 175.5 cm

Example 16

Calculate median for the following data

Class : 0-5  5-10  10-15  15-20  20-25
f     :   5    10     15     12      8

Solution

Class     f    CF
0-5       5     5
5-10     10    15
10-15    15    30
15-20    12    42
20-25     8    50
Total    50

Median class is 10-15.

$$M = l + \frac{\frac{N}{2} - m}{f} \times c$$

Here l = 10, N/2 = 50/2 = 25, c = 5, m = 15, f = 15

$$M = 10 + \frac{(25 - 15) \times 5}{15} = 10 + \frac{10}{3} = 10 + 3.33 = 13.33$$
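The median rules described above, the middle value of ordered raw data and $M = l + \frac{N/2 - m}{f} \times c$ as used in Example 16, can be expressed as a short Python sketch. It is an illustration, not part of the original method: `median_raw` and `median_grouped` are hypothetical names, and the grouped version assumes the classes are supplied as (lower, upper) boundaries in ascending, exclusive form.

```python
def median_raw(values):
    """Median of raw data: the ((n+1)/2)th value of the sorted data,
    or the average of the (n/2)th and (n/2 + 1)th values when n is even."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2                      # 0-based index of the ((n+1)/2)th value when n is odd
    if n % 2 == 1:
        return xs[mid]
    return (xs[mid - 1] + xs[mid]) / 2


def median_grouped(classes, freqs):
    """M = l + (N/2 - m) / f * c for a continuous frequency distribution.

    classes: list of (lower, upper) class boundaries in ascending, exclusive form
    freqs:   matching class frequencies
    """
    N = sum(freqs)
    half = N / 2
    m = 0                             # cumulative frequency before the current class
    for (lower, upper), f in zip(classes, freqs):
        if f > 0 and m + f >= half:   # first class whose CF reaches N/2: the median class
            return lower + (half - m) / f * (upper - lower)
        m += f
    raise ValueError("N/2 is never reached; check the frequencies")


# Example 13 (odd n), Example 14 (even n) and Example 16 (grouped data)
print(median_raw([160, 180, 175, 179, 164, 178, 171, 164, 176]))   # 175
print(median_raw([75, 71, 73, 70, 74, 80, 85, 81, 86, 79]))        # 77.0
print(round(median_grouped([(0, 5), (5, 10), (10, 15), (15, 20), (20, 25)],
                           [5, 10, 15, 12, 8]), 2))                 # 13.33
```

For a discrete frequency table such as Example 15, one option is to repeat each value according to its frequency and pass the expanded list to `median_raw`, which gives 175.5 for that data.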

Example 17

Calculate median for the data given below.

Classes : 0-6  7-13  14-20  21-27  28-34  35-41
f       :   8    17     28     15      9      3

Solution

Class     f    Actual class    CF
0-6       8    -0.5-6.5         8
7-13     17    6.5-13.5        25
14-20    28    13.5-20.5       53
21-27    15    20.5-27.5       68
28-34     9    27.5-34.5       77
35-41     3    34.5-41.5       80
Total    80

Median class is 13.5-20.5, l = 13.5, N/2 = 80/2 = 40, c = 7, m = 25, f = 28

$$M = l + \frac{\frac{N}{2} - m}{f} \times c = 13.5 + \frac{(40 - 25) \times 7}{28} = 13.5 + \frac{15}{4} = 13.5 + 3.75 = 17.25$$

Graphical Determination of Median

Median can be determined graphically using the following steps.
1. Draw the less than or more than ogive.
2. Locate N/2 on the Y axis.
3. At N/2 draw a perpendicular to the Y axis and extend it to meet the ogive.
4. From the point of intersection drop a perpendicular to the X axis.
5. The point at which the perpendicular meets the X axis will be the median value.

Median can also be determined by drawing the two ogives simultaneously. Here drop a perpendicular from the point of intersection of the two ogives to the X axis. This perpendicular will meet the X axis at the median value.
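Reading the median off a less than ogive amounts to interpolating along the curve. If one assumes the ogive is drawn as straight segments joining the points (upper class boundary, cumulative frequency), the graphical reading is exactly linear interpolation, and for Example 17 it reproduces the formula value 17.25. The sketch below is a hypothetical illustration under that assumption (a freehand smooth ogive would give a slightly different reading); the helper names are illustrative, and `to_actual_classes` assumes inclusive classes with a gap of 1 between consecutive classes, as in Example 17.

```python
def to_actual_classes(inclusive, gap=1):
    """Convert inclusive classes such as 0-6, 7-13 into actual (exclusive)
    classes -0.5-6.5, 6.5-13.5 by widening each end by half the gap."""
    half = gap / 2
    return [(lo - half, hi + half) for lo, hi in inclusive]


def less_than_ogive(boundaries, freqs):
    """Points (x, CF) of the less than ogive: CF 0 at the first lower boundary,
    then each upper boundary paired with the cumulative frequency up to it."""
    points = [(boundaries[0][0], 0)]
    cf = 0
    for (_, upper), f in zip(boundaries, freqs):
        cf += f
        points.append((upper, cf))
    return points


def read_ogive(points, y):
    """x value where the polygonal ogive reaches height y (linear interpolation)."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y1 > y0 and y0 <= y <= y1:
            return x0 + (y - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("y lies outside the range of the ogive")


# Example 17: N = 80, so the median is read at N/2 = 40 on the Y axis
classes = to_actual_classes([(0, 6), (7, 13), (14, 20), (21, 27), (28, 34), (35, 41)])
ogive = less_than_ogive(classes, [8, 17, 28, 15, 9, 3])
print(read_ogive(ogive, 40))   # 17.25, the same value as the formula
```

The same `read_ogive` call at heights N/4 and 3N/4 gives the graphical quartiles under the same straight-segment assumption.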


Merits and Demerits

Merits
1. It is not unduly affected by extreme items.
2. It is simple to understand and easy to calculate.
3. It can be calculated for open end data.
4. It can be determined graphically.
5. It can be used to deal with qualitative data.

Demerits
1. It is not rigidly defined. When there is an even number of individual observations, the median is approximately taken as the mean of the two middle most observations.
2. It is not based on the magnitude of all the items. It is a positional measure. It is the value of the middle most item.
3. It cannot be algebraically manipulated. For example, the median of the combined set cannot be found from the medians and the sizes of the individual sets alone.
4. It is difficult to calculate when there is a large number of items which are to be arranged in order of magnitude.
5. It does not have sampling stability. It varies more markedly than the A.M. from sample to sample although all the samples are from one and the same population.
6. Its use is lesser than that of the A.M.

Mode

Mode is that value of the variable which occurs the maximum number of times in a set of observations. Thus, mode is the value of the variable which occurs most frequently. Usually statements like 'average student', 'average buyer', 'the typical firm', etc. refer to the mode of the phenomena. Mode is denoted by Z or Mo. For raw data as well as for a discrete frequency distribution we can locate the mode by inspection.

For a frequency distribution mode is defined as the value of the variable having the maximum frequency. For a continuous frequency distribution it can be calculated using the formula given below:

$$Z = l + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times c$$

where l : lower limit of the modal class
Modal class : class having the maximum frequency
$\Delta_1$ : difference between the frequency of the modal class and that of the premodal class
$\Delta_2$ : difference between the frequency of the modal class and that of the post modal class
c : class interval

For applying this formula, the class intervals should be (i) of equal size, (ii) in ascending order and (iii) in exclusive form.

Example 18

Determine the mode of
420, 395, 342, 444, 551, 395, 425, 417, 395, 401, 390

Solution
Mode = 395
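Both ways of finding the mode described above, inspection of raw data and the formula $Z = l + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times c$ for a continuous distribution, can be sketched as follows. The function names are hypothetical. Two choices are assumptions of this sketch rather than statements of the text: `modes_raw` returns every value that attains the maximum frequency (so bimodal data gives two values), and `mode_grouped` treats a missing premodal or post modal class as having frequency 0 when the modal class is the first or last class.

```python
from collections import Counter


def modes_raw(values):
    """Values occurring the maximum number of times (more than one if multimodal)."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, k in counts.items() if k == top)


def mode_grouped(boundaries, freqs):
    """Z = l + d1 / (d1 + d2) * c for equal, ascending, exclusive classes.

    d1 = f(modal) - f(premodal), d2 = f(modal) - f(post modal).
    Assumption of this sketch: a missing neighbouring class counts as frequency 0.
    """
    i = max(range(len(freqs)), key=lambda j: freqs[j])   # first class with maximum frequency
    f_pre = freqs[i - 1] if i > 0 else 0
    f_post = freqs[i + 1] if i + 1 < len(freqs) else 0
    d1 = freqs[i] - f_pre
    d2 = freqs[i] - f_post
    if d1 + d2 == 0:
        raise ValueError("formula undefined: modal class ties with both neighbours")
    lower, upper = boundaries[i]
    return lower + d1 / (d1 + d2) * (upper - lower)


# Example 18 (raw data) and the data of Example 20 (actual classes -0.5-9.5, 9.5-19.5, ...)
print(modes_raw([420, 395, 342, 444, 551, 395, 425, 417, 395, 401, 390]))   # [395]
bounds = [(-0.5 + 10 * k, 9.5 + 10 * k) for k in range(6)]
print(round(mode_grouped(bounds, [5, 10, 17, 33, 22, 13]), 2))            # 35.43
```

For a discrete distribution such as Example 19, expanding each size by its frequency and calling `modes_raw` returns [7].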


Example 19

Determine the mode

Size of shoes     :  3   4   5   6   7   8   9
No. of pairs sold : 10  25  32  38  61  47  34

Solution
Mode = Z = 7

Example 20

Calculate mode for the following data

Classes : 0-9  10-19  20-29  30-39  40-49  50-59
f       :   5     10     17     33     22     13

Solution

Classes    f    Actual class
0-9        5    -0.5-9.5
10-19     10    9.5-19.5
20-29     17    19.5-29.5
30-39     33    29.5-39.5
40-49     22    39.5-49.5
50-59     13    49.5-59.5

Modal class is 29.5-39.5

$$Z = l + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times c$$

l = 29.5, $\Delta_1$ = 33 - 17 = 16, $\Delta_2$ = 33 - 22 = 11, c = 10

$$Z = 29.5 + \frac{16}{16 + 11} \times 10 = 29.5 + 5.93 = 35.43$$

For a symmetrical or moderately asymmetrical distribution, the empirical relation is

Mean - Mode = 3(Mean - Median)

This relation can be used for calculating any one measure, if the remaining two are known.

Example 21

In a moderately asymmetrical distribution Mean is 24.6 and Median is 25.1. Find the value of mode.

Solution
We have
Mean - Mode = 3(Mean - Median)
24.6 - Z = 3(24.6 - 25.1)
24.6 - Z = 3(-0.5) = -1.5
Z = 24.6 + 1.5 = 26.1

Example 22

In a moderately asymmetrical distribution Mode is 48.4 and Median is 41.6. Find the value of Mean.

Solution
We have,
Mean - Mode = 3(Mean - Median)

$$\begin{aligned}
\bar{x} - 48.4 &= 3(\bar{x} - 41.6) \\
\bar{x} - 48.4 &= 3\bar{x} - 124.8 \\
3\bar{x} - \bar{x} &= 124.8 - 48.4 \\
2\bar{x} &= 76.4 \\
\bar{x} &= 76.4/2 = 38.2
\end{aligned}$$
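Because the empirical relation Mean - Mode = 3(Mean - Median) is linear, it can be rearranged once for each unknown, which is what Examples 21 and 22 do by hand. The helpers below are a hypothetical sketch of those rearrangements; the relation itself is only approximate for real data, so the results inherit that approximation.

```python
def mode_from(mean, median):
    """Mean - Mode = 3(Mean - Median)  =>  Mode = 3 Median - 2 Mean."""
    return 3 * median - 2 * mean


def mean_from(mode, median):
    """Mean - Mode = 3 Mean - 3 Median  =>  Mean = (3 Median - Mode) / 2."""
    return (3 * median - mode) / 2


def median_from(mean, mode):
    """3 Median = 3 Mean - Mean + Mode  =>  Median = (2 Mean + Mode) / 3."""
    return (2 * mean + mode) / 3


print(round(mode_from(24.6, 25.1), 2))   # Example 21: 26.1
print(round(mean_from(48.4, 41.6), 2))   # Example 22: 38.2
```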


Merits and Demerits

Merits
1. Mode is not unduly affected by extreme items.
2. It is simple to understand and easy to calculate.
3. It is the most typical or representative value in the sense that it has the greatest frequency density.
4. It can be calculated for open-end data.
5. It can be determined graphically. It is the x-coordinate of the peak of the frequency curve.
6. It can be found for qualities also. The quality which is observed more often than any other quality is the modal quality.

Demerits
1. It is not rigidly defined.
2. It is not based on all the items. It is a positional value.
3. It cannot be algebraically manipulated. The mode of the combined set cannot be determined as in the case of the A.M.
4. Many a time, it is difficult to calculate. Sometimes a grouping table and a frequency analysis table have to be formed.
5. It is less stable than the A.M.
6. Unlike other measures of central tendency, it may not exist for some data. Sometimes there may be two or more modes, and so it is said to be ill defined.
7. It has very limited use. Modal wage, modal size of shoe, modal size of family, etc., are determined. Consumer preferences are also dealt with.

Partition Values

We have already noted that the total area under a frequency curve is equal to the total frequency. We can divide the distribution or area under a curve into a number of equal parts by choosing some points, like the median. They are generally called partition values or quantiles. The important partition values are quartiles, deciles and percentiles.

Quartiles

Quartiles are partition values which divide the distribution or area under a frequency curve into 4 equal parts at 3 points, namely Q1, Q2 and Q3. Q1 is called the first quartile or lower quartile, Q2 is called the second quartile, middle quartile or median, and Q3 is called the third quartile or upper quartile.

In other words, Q1 is the value of the variable such that the number of observations lying below it is N/4 and above it is 3N/4. Q2 is the value of the variable such that the number of observations on either side of it is equal to N/2. And Q3 is the value of the variable such that the number of observations lying below Q3 is 3N/4 and above Q3 is N/4.

Deciles and Percentiles

Deciles are partition values which divide the distribution or area under a frequency curve into 10 equal parts at 9 points, namely D1, D2, ..., D9.

Percentiles are partition values which divide the distribution into 100 equal parts at 99 points, namely P1, P2, P3, ..., P99. Percentile is a very useful measure in education and psychology. Percentile ranks or scores can also be calculated. Kelly's measure of skewness is based on percentiles.

Calculation of Quartiles

The method of locating quartiles is similar to the method used for finding the median. Q1 is the value of the item at the ((n + 1)/4)th position and Q3 is the value of the item at the (3(n + 1)/4)th position when actual values are known. In the case of a frequency distribution, Q1 and Q3 can be calculated as follows.

$$Q_1 = l_1 + \frac{\frac{N}{4} - m}{f} \times c$$

where l1 - lower limit of Q1 class
Q1 class - the class in which the (N/4)th item falls
m - cumulative frequency up to Q1 class
c - class interval
f - frequency of Q1 class

$$Q_3 = l_3 + \frac{\frac{3N}{4} - m}{f} \times c$$

where l3 - lower limit of Q3 class
Q3 class - the class in which the (3N/4)th item falls
m - cumulative frequency up to Q3 class
c - class interval
f - frequency of Q3 class

We can combine these three formulae (for Q1, Q2 = median and Q3) and write them as

$$Q_i = l_i + \frac{\frac{iN}{4} - m}{f} \times c, \quad i = 1, 2, 3$$

In a similar fashion deciles and percentiles can be calculated as

$$D_i = l_i + \frac{\frac{iN}{10} - m}{f} \times c, \quad i = 1, 2, 3, \ldots, 9$$

$$P_i = l_i + \frac{\frac{iN}{100} - m}{f} \times c, \quad i = 1, 2, 3, \ldots, 99$$

Graphical Determination of Quartiles

Quartiles can be determined graphically by drawing the ogives of the given frequency distribution. So draw the less than ogive of the given data. On the Y axis locate N/4, N/2 and 3N/4. At these points draw perpendiculars to the Y axis and extend them to meet the ogive. From the points of intersection drop perpendiculars to the X axis. The point corresponding to the CF N/4 is Q1, the point corresponding to the CF N/2 is Q2, and the point corresponding to the CF 3N/4 is Q3.
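Because $Q_i$, $D_i$ and $P_i$ differ only in the divisor (4, 10 or 100), a single function can cover all three formulas. The sketch below is hypothetical (`partition_value` is an illustrative name); it assumes exclusive, ascending class boundaries and takes m as the cumulative frequency of the classes before the one containing the (iN/k)th item. With the data of Example 25 it gives about 39.69, 46.11 and 51.11 for the three quartiles.

```python
def partition_value(boundaries, freqs, i, k):
    """l + (i*N/k - m) / f * c.

    k = 4 gives the quartile Q_i, k = 10 the decile D_i, k = 100 the percentile P_i.
    boundaries: list of (lower, upper) class boundaries, ascending and exclusive.
    """
    if not 0 < i < k:
        raise ValueError("i must satisfy 0 < i < k")
    N = sum(freqs)
    target = i * N / k                # the (iN/k)th item
    m = 0                             # cumulative frequency before the current class
    for (lower, upper), f in zip(boundaries, freqs):
        if f > 0 and m + f >= target:
            return lower + (target - m) / f * (upper - lower)
        m += f
    raise ValueError("target position exceeds the total frequency")


# Data of Example 25: classes 30-35, 35-40, ..., 60-65
bounds = [(30 + 5 * j, 35 + 5 * j) for j in range(7)]
freqs = [10, 16, 18, 27, 18, 8, 3]
print([partition_value(bounds, freqs, i, 4) for i in (1, 2, 3)])
```

Since D5 and P50 use the same position as Q2, the function returns the median for (i, k) = (2, 4), (5, 10) or (50, 100).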

Introductory Statistics 40 Introductory Statistics 41


School of Distance Education School of Distance Education

Example 23

Find Q1, Q3, D2, D9, P16 and P65 for the following data: 282, 754, 125, 765, 875, 645, 985, 235, 175, 895, 905, 112 and 155.

Solution
Step 1. Arrange the values in ascending order:
112, 125, 155, 175, 235, 282, 645, 754, 765, 875, 895, 905 and 985.
Step 2. Position of Q1 is (n + 1)/4 = (13 + 1)/4 = 14/4 = 3.5.
Similarly, the positions of Q3, D2, D9, P16 and P65 are 10.5, 2.8, 12.6, 2.24 and 9.1 respectively.
Step 3.
Q1 = 155 + 0.5(175 - 155) = 165
Q3 = 875 + 0.5(895 - 875) = 885
D2 = 125 + 0.8(155 - 125) = 149.0
D9 = 905 + 0.6(985 - 905) = 953
P16 = 125 + 0.24(155 - 125) = 132.20
P65 = 765 + 0.1(875 - 765) = 776.0

Note
The value at the 12.6th position (D9) is obtained as the value at the 12th position + 0.6 × (value at the 13th position - value at the 12th position).

Example 24

Find Q1, Q3, D4, P20 and P99 for the data given below.

Mark            : 25  35  40  50  52  53  67  75  80
No. of students :  3  29  32  41  49  54  38  29  27

Solution
Step 1. The cumulative frequencies of the marks, given in ascending order, are found.

Marks   No. of students   Cumulative frequency
25             3                   3
35            29                  32
40            32                  64
50            41                 105
52            49                 154
53            54                 208
67            38                 246
75            29                 275
80            27                 302

Step 2. The positions of Q1, Q3, D4, P20 and P99 are found. They are
(N + 1)/4 = 303/4 = 75.75
3(N + 1)/4 = (3 × 303)/4 = 227.25
4(N + 1)/10 = (4 × 303)/10 = 121.20
20(N + 1)/100 = (20 × 303)/100 = 60.60
99(N + 1)/100 = (99 × 303)/100 = 299.97


Step 3. The marks of students at those positions are found.
Q1 = 50 + 0.75(50 - 50) = 50 Marks
Q3 = 67 + 0.25(67 - 67) = 67 Marks
D4 = 52 + 0.20(52 - 52) = 52 Marks
P20 = 40 + 0.60(40 - 40) = 40 Marks
P99 = 80 + 0.97(80 - 80) = 80 Marks

Note
Refer to the above example for the method of finding the values of items whose positions are fractions.

Example 25

Calculate quartiles for the following data

Classes : 30-35  35-40  40-45  45-50  50-55  55-60  60-65
Freq.   :    10     16     18     27     18      8      3

Solution

Class     f    CF
30-35    10    10
35-40    16    26
40-45    18    44
45-50    27    71
50-55    18    89
55-60     8    97
60-65     3   100
Total   100

$$Q_1 = l_1 + \frac{\frac{N}{4} - m}{f} \times c = 35 + \frac{(25 - 10) \times 5}{16} = 35 + \frac{75}{16} = 35 + 4.69 = 39.69$$

$$Q_2 = l_2 + \frac{\frac{N}{2} - m}{f} \times c = 45 + \frac{(50 - 44) \times 5}{27} = 45 + \frac{10}{9} = 45 + 1.11 = 46.11$$

$$Q_3 = l_3 + \frac{\frac{3N}{4} - m}{f} \times c = 50 + \frac{(75 - 71) \times 5}{18} = 50 + \frac{10}{9} = 50 + 1.11 = 51.11$$
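When the actual values are known, Examples 23 and 24 locate each partition value at the i(n + 1)/k th position of the ordered data (k = 4, 10 or 100) and interpolate between neighbouring values when that position is fractional. The sketch below assumes exactly this rule; `positional_value` is a hypothetical name. Python's standard `statistics.quantiles`, with its default `method='exclusive'`, is documented to use the same (n + 1)-based positions, so it serves as a cross-check; other software may use different quantile definitions and give slightly different answers.

```python
import math
import statistics


def positional_value(values, i, k):
    """Value at the i(n+1)/k th position of the ordered data (Q_i for k = 4,
    D_i for k = 10, P_i for k = 100), interpolating linearly between the
    two neighbouring values when the position is fractional."""
    xs = sorted(values)
    n = len(xs)
    pos = i * (n + 1) / k
    if not 1 <= pos <= n:
        raise ValueError("position outside 1..n; the rule does not apply")
    whole = math.floor(pos)
    frac = pos - whole
    if whole == n:
        return xs[-1]
    # value at position `whole` + frac * (next value - value at position `whole`)
    return xs[whole - 1] + frac * (xs[whole] - xs[whole - 1])


data = [282, 754, 125, 765, 875, 645, 985, 235, 175, 895, 905, 112, 155]  # Example 23
print(positional_value(data, 1, 4), positional_value(data, 3, 4))           # 165.0 885.0
# Cross-check: the default 'exclusive' method uses the same (n + 1)-based positions
print(statistics.quantiles(data, n=4))
```

For a discrete frequency table such as Example 24, the values can be repeated according to their frequencies before calling `positional_value`; this reproduces Q1 = 50, Q3 = 67, D4 = 52, P20 = 40 and P99 = 80.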


EXERCISES
Multiple Choice Questions
1. Mean is a measure of
a. location or central value b. dispersion
c. correlation d. none of the above
2. If a constant value 50 is subtracted from each observation of a set, the mean of the set is:
a. increased by 50 b. decreased by 50
c. not affected d. zero
3. If the grouped data has open end classes, one cannot calculate:
a. median b. mode c. mean d. quartiles
4. Harmonic mean is better than other means if the data are for:
a. speed or rates b. heights or lengths
c. binary values like 0 & 1 d. ratio or proportions
5. Extreme values have no effect on:
a. average b. median
c. geometric mean d. harmonic mean
6. If the A.M. of a set of two observations is 9 and its G.M. is 6, then the H.M. of the set of observations is:
a. 4 b. 36 c. 3 d. 1.5
7. The A.M. of two numbers is 6.5 and their G.M. is 6. The two numbers are:
a. 9, 6 b. 9, 5 c. 7, 6 d. 4, 9
8. If the two observations are 10 and -10 then their harmonic mean is:
a. 10 b. 0 c. 5 d. ∞
9. The median of the variate values 11, 7, 6, 9, 12, 15, 19 is:
a. 9 b. 12 c. 15 d. 11
10. The second decile divides the series in the ratio:
a. 1:1 b. 1:2 c. 1:4 d. 2:5
11. For further algebraic treatment, geometric mean is:
a. suitable b. not suitable
c. sometimes suitable d. none of the above
12. The percentage of values of a set which lie beyond the third quartile is:
a. 100 percent b. 75 percent
c. 50 percent d. 25 percent
13. In a distribution, the value around which the items tend to be most heavily concentrated is called:
a. mean b. median
c. third quartile d. mode
14. Sum of the deviations about mean is
a. zero b. minimum c. maximum d. one
15. The suitable measure of central tendency for qualitative data is:
a. mode b. arithmetic mean
c. geometric mean d. median
16. The mean of the squares of the first eleven natural numbers is:
a. 46 b. 23 c. 48 d. 42
The percentage of items in a frequency distribution lying between upper and lower quartiles is:
a. 80 percent b. 40 percent
c. 50 percent d. 25 percent


Very Short Answer Questions
17. What is central tendency?
18. Define median and mode.
19. Define harmonic mean.
20. Define partition values.
21. State the properties of AM.
22. In a class of boys and girls, the mean mark of 10 boys is 38 and the mean mark of 20 girls is 45. What is the average mark of the class?
23. Define deciles and percentiles.
24. Find the combined mean from the following data.

                   Series x   Series y
Arithmetic mean        12         20
No. of items           80         60

Short Essay Questions
25. Define mode. How is it calculated? Point out two
26. Define AM, median and mode and explain their uses.
27. Give the formulae used to calculate the mean, median and mode of a frequency distribution and explain the symbols used in them.
28. How will you determine the three quartiles graphically from a less than ogive?
29. Three samples of sizes 80, 40 and 30 having means 12.5, 13 and 11 respectively are combined. Find the mean of the combined sample.
30. Explain the advantages and disadvantages of arithmetic mean as an average.
31. For finding out the 'typical' value of a series, what measure of central tendency is appropriate?
32. Explain AM and HM. Which one is better, and why?
33. Prove that the weighted arithmetic mean of the first n natural numbers whose weights are equal to the corresponding numbers is equal to (2n + 1)/3.
34. Show that the GM of a set of positive observations lies between the AM and the HM.
35. What are the essential requisites of a good measure of central tendency? Compare and contrast the commonly employed measures in terms of these requisites.
36. Discuss the merits and demerits of the various measures of central tendency. Which particular measure is considered the best and why? Illustrate your answer.
37. What is the difference between simple and weighted average? Explain the circumstances under which the latter should be used in preference to the former.
38. Find the average rate of increase in population which in the first decade has increased by 12 percent, in the next by 16 percent, and in the third by 21 percent.
39. A person travels the first mile at 10 km. per hour, the second mile at 8 km. per hour and the third mile at 6 km. per hour. What is his average speed?

Long Essay Questions
40. Compute the AM, median and mode from the following data
Age last birthday : 15-19  20-24  25-29  30-34  35-39  40-44
No. of persons    :     4     20     38     24     10
41. Calculate arithmetic mean, median and mode for the following data.
Age           : 55-60  50-55  45-50  40-45  35-40  30-35  25-30  20-25
No. of people :     7     13     15     20     30     33     28     14
42. Calculate mean, median and mode from the following data
Class       Frequency
Up to 20        52
20-30          161


30-40          254
40-50          167
50-60           78
60-80           64
Over 80         52

43. Calculate mean, median and mode.
Central wage in Rs.  : 15  20  25  30  35  40  45
No. of wage earners  :  3  25  19  16   4   5   6

44. (i) Find the missing frequencies in the following distribution, given that N = 100 and the median of the distribution is 110.
(ii) Calculate the arithmetic mean of the completed frequency distribution.
Class     : 20-40  40-60  60-80  80-100  100-120
Frequency :     6      9      -      14       20
Class     : 120-140  140-160  160-180  180-200
Frequency :      15        -        8        7

MODULE 1

PART III

MEASURES OF DISPERSION

By dispersion we mean spreading, scatteredness or variation. Dispersion measures the extent to which the items vary from some central value. Since measures of dispersion give an average of the differences of various items from an average, they are also called averages of the second order.

Desirable properties of an ideal measure of dispersion
The following are the requisites for an ideal measure of dispersion.
1. It should be rigidly defined and its value should be definite.
2. It should be easy to understand and simple to calculate.
3. It should be based on all observations.
4. It should be capable of further algebraic treatment.
5. It should be least affected by sampling fluctuations.

Methods of Studying Variation
The following measures of variability or dispersion are commonly used.
1. Range 2. Quartile Deviation
3. Mean Deviation 4. Standard Deviation

Here the first two are called positional measures of dispersion. The other two are called calculation measures of dispersion.


Absolute and Relative Dispersion

Absolute measures and relative measures are the two kinds of measures of dispersion. The former are used to assess the variation among a set of values. The latter are used whenever the variability of two or more sets of values is to be compared. Relative measures give pure numbers, which are free from the units of measurement of the data. Even data in different units and with unequal average values can be compared on the basis of relative measures of dispersion. The smaller the value of a relative measure, the smaller the variation of the set and the greater its consistency. The terms stability, homogeneity, uniformity and consistency are used as if they are synonyms.

1. Range

Definition
Range is the difference between the greatest (largest) and the smallest of the given values.

In symbols, Range = L - S, where L is the greatest value and S is the smallest value.

The corresponding relative measure of dispersion is defined as

$$\text{Coefficient of Range} = \frac{L - S}{L + S}$$

Example 1
The price of a share over a six-day week fluctuated as follows:
156 165 148 151 147 162
Calculate the Range and its coefficient.

Solution
Range = L - S = 165 - 147 = 18

$$\text{Coefficient of Range} = \frac{L - S}{L + S} = \frac{165 - 147}{165 + 147} = \frac{18}{312} = 0.0577$$

Example 2
Calculate coefficient of range from the following data:

Mark            : 10-20  20-30  30-40  40-50  50-60
No. of students :     8     10     12      8      4

Solution

$$\text{Coefficient of Range} = \frac{L - S}{L + S} = \frac{60 - 10}{60 + 10} = \frac{50}{70} = 0.7143$$

Merits and Demerits

Merits
1. It is the simplest to understand and the easiest to calculate.
2. It is used in Statistical Quality Control.

Demerits
1. Its definition does not seem to suit the calculation for data with class intervals. Further, it cannot be calculated for open-end data.
2. It is based on the two extreme items and not on any other item.
3. It does not have sampling stability. Further, it is calculated for samples of small sizes only.
4. It cannot be mathematically manipulated further.
5. It is a very rarely used measure. Its scope is limited to very few considerations in Quality Control.
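The range and its coefficient need only the largest and smallest values. The hypothetical helper below computes both; for grouped data it is simply handed the two class limits that Example 2 uses (the lower limit of the first class and the upper limit of the last class).

```python
def range_and_coefficient(values):
    """Range = L - S and coefficient of range = (L - S) / (L + S)."""
    L, S = max(values), min(values)
    return L - S, (L - S) / (L + S)


# Example 1: share prices over a six-day week
r, coeff = range_and_coefficient([156, 165, 148, 151, 147, 162])
print(r, round(coeff, 4))     # 18 0.0577

# Example 2: classes 10-20, ..., 50-60, taking L = 60 and S = 10 as the text does
r2, coeff2 = range_and_coefficient([10, 60])
print(r2, round(coeff2, 4))   # 50 0.7143
```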


2. Quartile Deviation

Definition
Quartile deviation is half of the difference between the first and the third quartiles. In symbols,

$$Q.D. = \frac{Q_3 - Q_1}{2}$$

where Q.D. is the abbreviation.

Among the quartiles Q1, Q2 and Q3, the range is Q3 - Q1, i.e., the inter-quartile range is Q3 - Q1, and Q.D., which is (Q3 - Q1)/2, is the semi-inter-quartile range.

$$\text{Coefficient of Quartile Deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$

This is also called the quartile coefficient of dispersion.

Example 3
Find the Quartile Deviation for the following:
391, 384, 591, 407, 672, 522, 777, 733, 1490, 2488

Solution
Before finding Q.D., Q1 and Q3 are found from the values in ascending order:
384, 391, 407, 522, 591, 672, 733, 777, 1490, 2488

Position of Q1 is (n + 1)/4 = (10 + 1)/4 = 2.75
Q1 = 391 + 0.75(407 - 391) = 403
Position of Q3 is 3(n + 1)/4 = 3 × 2.75 = 8.25
Q3 = 777 + 0.25(1490 - 777) = 955.25

$$QD = \frac{Q_3 - Q_1}{2} = \frac{955.25 - 403.00}{2} = 276.125$$

Example 5
Calculate Quartile deviation for the following data. Also calculate the quartile coefficient of dispersion.

Class : 20-30  30-40  40-50  50-60  60-70  70-80  80-90  90-100
f     :     6     18     25     50     37     30     24      10

Solution

Classes     f    CF
20-30       6     6
30-40      18    24
40-50      25    49
50-60      50    99
60-70      37   136
70-80      30   166
80-90      24   190
90-100     10   200

N/4 = 200/4 = 50, so the Q1 class is 50-60: l1 = 50, c = 10, m = 49, f = 50

$$Q_1 = l_1 + \frac{\frac{N}{4} - m}{f} \times c = 50 + \frac{(50 - 49) \times 10}{50} = 50 + \frac{1}{5} = 50 + 0.2 = 50.2$$

3N/4 = 150, so the Q3 class is 70-80: l3 = 70, m = 136, f = 30, c = 10

$$Q_3 = l_3 + \frac{\frac{3N}{4} - m}{f} \times c = 70 + \frac{(150 - 136) \times 10}{30} = 70 + \frac{14}{3} = 70 + 4.67 = 74.67$$


$$QD = \frac{Q_3 - Q_1}{2} = \frac{74.67 - 50.20}{2} = \frac{24.47}{2} = 12.23$$

$$\text{Quartile coefficient of dispersion} = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{74.67 - 50.20}{74.67 + 50.20} = \frac{24.47}{124.87} = 0.196$$

Merits and Demerits

Merits
1. It is rigidly defined.
2. It is easy to understand and simple to calculate.
3. It is not unduly affected by extreme values.
4. It can be calculated for open-end distributions.

Demerits
1. It is not based on all observations.
2. It is not capable of further algebraic treatment.
3. It is much affected by fluctuations of sampling.

Mean Deviation

The Mean Deviation is defined as the arithmetic mean of the absolute values of the deviations of observations from some origin, say mean or median or mode.

Thus for raw data

$$\text{M.D. about Mean} = \frac{\sum |x - \bar{x}|}{n}$$

where $|x - \bar{x}|$ stands for the absolute deviation of x from $\bar{x}$ and is read as modulus of $(x - \bar{x})$ or mod $(x - \bar{x})$.

Instead of taking deviations from the mean, if we use the median we get the mean deviation about the median.

$$\text{M.D. about Median} = \frac{\sum |x - M|}{n}$$

For frequency data, MD about Mean is given by

$$(MD)_{\bar{x}} = \frac{\sum f |x - \bar{x}|}{N}, \quad N = \sum f$$

$$\text{MD about Median } (MD)_M = \frac{\sum f |x - M|}{N}$$

Note
Whenever nothing is mentioned about the measure of central tendency from which deviations are to be considered, deviations are to be taken from the mean and the required MD is the MD about mean.

(i) Coefficient of MD (about mean) = MD about mean / Mean
(ii) Coefficient of MD (about median) = MD about median / Median

Example 6
Calculate MD about Mean of 8, 24, 12, 16, 10, 20


Solution

x     $x - \bar{x}$    $|x - \bar{x}|$
8          -7               7
24          9               9
12         -3               3
16          1               1
10         -5               5
20          5               5
Total 90                   30

$$\bar{x} = \frac{90}{6} = 15, \qquad \text{MD about mean} = \frac{\sum |x - \bar{x}|}{n} = \frac{30}{6} = 5$$

Example 8
Calculate MD about Mean and the coefficient of MD

Classes : 0-10  10-20  20-30  30-40  40-50
f       :    5     15     17     11      2

Solution

Class     f    x     fx    $x - \bar{x}$   $|x - \bar{x}|$   $f|x - \bar{x}|$
0-10      5    5     25        -18             18                 90
10-20    15   15    225         -8              8                120
20-30    17   25    425          2              2                 34
30-40    11   35    385         12             12                132
40-50     2   45     90         22             22                 44
Total    50        1150                                          420

$$\bar{x} = \frac{\sum fx}{N} = \frac{1150}{50} = 23$$

$$(MD)_{\bar{x}} = \frac{\sum f|x - \bar{x}|}{N} = \frac{420}{50} = 8.4$$

$$\text{Coefficient of MD} = \frac{\text{MD about mean}}{\text{Mean}} = \frac{8.4}{23} = 0.3652$$

Merits and Demerits

Merits
1. It is rigidly defined.
2. It is easy to calculate and simple to understand.
3. It is based on all observations.
4. It is not much affected by the extreme values of items.
5. It is stable.

Demerits
1. It is mathematically illogical to ignore the algebraic signs of deviations.
2. No further algebraic manipulation is possible.
3. It gives more weight to large deviations than smaller ones.

Standard Deviation

The standard deviation is the most useful and the most popular measure of dispersion. The deviations of the observations from the AM are considered and then each is squared. The sum of squares is divided by the number of observations. The square root of this value is known as the standard deviation. Thus standard deviation (SD) is defined as the square root of the AM of the squares of the deviations of observations from the AM. It is denoted by $\sigma$ (sigma). We can calculate SD using the following formulae.

So for raw data, if $x_1, x_2, x_3, \ldots, x_n$ are n observations,

$$SD = \sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$$

For frequency data, if $x_1, x_2, x_3, \ldots, x_n$ are n observations or middle values of n classes with the corresponding frequencies $f_1, f_2, \ldots, f_n$, then

$$SD = \sigma = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{N}}$$
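The definitions of mean deviation and standard deviation given above translate almost line by line into code. In the hypothetical helpers below, raw data is the special case where every frequency is 1, and the SD divides by N (not N - 1), matching the definition in this text. They reproduce Examples 6 and 8 and can be used to check the standard deviations worked out in Examples 9 to 14.

```python
import math


def weighted_mean(xs, fs=None):
    """Arithmetic mean; fs are frequencies (all 1 for raw data)."""
    fs = [1] * len(xs) if fs is None else fs
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)


def mean_deviation(xs, fs=None, about=None):
    """sum f|x - A| / N, where A is the mean unless another centre
    (for example the median) is passed as `about`."""
    fs = [1] * len(xs) if fs is None else fs
    centre = weighted_mean(xs, fs) if about is None else about
    return sum(f * abs(x - centre) for x, f in zip(xs, fs)) / sum(fs)


def standard_deviation(xs, fs=None):
    """sqrt(sum f (x - mean)^2 / N), dividing by N as in the definition above."""
    fs = [1] * len(xs) if fs is None else fs
    m = weighted_mean(xs, fs)
    return math.sqrt(sum(f * (x - m) ** 2 for x, f in zip(xs, fs)) / sum(fs))


# Example 6 (raw data) and Example 8 (class mid values 5, 15, ..., 45)
print(mean_deviation([8, 24, 12, 16, 10, 20]))                      # 5.0
md = mean_deviation([5, 15, 25, 35, 45], [5, 15, 17, 11, 2])
print(md, round(md / weighted_mean([5, 15, 25, 35, 45], [5, 15, 17, 11, 2]), 4))  # 8.4 0.3652
```

For classes, the mid values play the role of x, exactly as in Example 8; passing `about=` the median gives the mean deviation about the median.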


The square of the SD is known as 'Variance' and is denoted as $\sigma^2$; i.e., SD is the positive square root of the variance.

Simplified formula for SD
For raw data, we have

$$\begin{aligned}
\sigma^2 &= \frac{1}{n}\sum (x - \bar{x})^2 = \frac{1}{n}\sum \left(x^2 - 2x\bar{x} + \bar{x}^2\right) \\
&= \frac{1}{n}\sum x^2 - 2\bar{x}\,\frac{1}{n}\sum x + \bar{x}^2\,\frac{1}{n}\sum 1 \\
&= \frac{\sum x^2}{n} - 2\bar{x}\cdot\bar{x} + \bar{x}^2 \\
&= \frac{\sum x^2}{n} - \bar{x}^2
\end{aligned}$$

$$\therefore \sigma = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2}$$

In a similar way, for frequency data

$$\sigma = \sqrt{\frac{\sum f x^2}{N} - \left(\frac{\sum f x}{N}\right)^2}$$

Short Cut Method

For raw data,

$$\sigma = \sqrt{\frac{\sum d^2}{n} - \left(\frac{\sum d}{n}\right)^2}, \quad \text{where } d = x - A$$

For frequency data,

$$\sigma = c \times \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2}, \quad \text{where } d = \frac{x - A}{c},$$

A - assumed mean, c - class interval.

The relative measure of dispersion based on SD, or coefficient of SD, is given by

$$\text{Coefficient of SD} = \frac{SD}{AM} = \frac{\sigma}{\bar{x}}$$

Importance of Standard Deviation

Standard deviation is always associated with the mean. It gives satisfactory information about the effectiveness of the mean as a representative of the data. The larger the standard deviation, the less the observations are concentrated about the mean, and vice versa. Whenever the standard deviation is small, the mean is accepted as a good average.

According to the definition of standard deviation, it can never be negative. When all the observations are equal, the standard deviation is zero. Therefore a small value of $\sigma$ suggests that the observations are very close to each other, and a big value of $\sigma$ suggests that the observations are widely different from each other.

Properties of Standard Deviation

1. Standard deviation is not affected by change of origin.

Proof
Let $x_1, x_2, \ldots, x_n$ be a set of n observations. Then

$$\sigma_x = \sqrt{\frac{1}{n}\sum (x_i - \bar{x})^2}$$

Choose $y_i = x_i + c$ for i = 1, 2, 3, ..., n, so that $\bar{y} = \bar{x} + c$.
Then

$$\begin{aligned}
y_i - \bar{y} &= x_i - \bar{x} \\
\sum (y_i - \bar{y})^2 &= \sum (x_i - \bar{x})^2 \\
\frac{1}{n}\sum (y_i - \bar{y})^2 &= \frac{1}{n}\sum (x_i - \bar{x})^2
\end{aligned}$$

i.e.,

$$\sqrt{\frac{1}{n}\sum (y_i - \bar{y})^2} = \sqrt{\frac{1}{n}\sum (x_i - \bar{x})^2}$$

i.e., $\sigma_y = \sigma_x$

Hence the proof.

2. Standard deviation is affected by change of scale.

Proof
Let $x_1, x_2, \ldots, x_n$ be a set of n observations. Then

$$\sigma_x = \sqrt{\frac{1}{n}\sum (x_i - \bar{x})^2}$$

Choose $y_i = c\,x_i + d$, i = 1, 2, 3, ..., n, where c and d are constants. This fulfils the idea of changing the scale of the original values.

Now $\bar{y} = c\,\bar{x} + d$, so

$$\begin{aligned}
y_i - \bar{y} &= c\,(x_i - \bar{x}) \\
\sum (y_i - \bar{y})^2 &= c^2 \sum (x_i - \bar{x})^2 \\
\frac{1}{n}\sum (y_i - \bar{y})^2 &= c^2\,\frac{1}{n}\sum (x_i - \bar{x})^2
\end{aligned}$$

i.e.,

$$\sqrt{\frac{1}{n}\sum (y_i - \bar{y})^2} = |c|\,\sqrt{\frac{1}{n}\sum (x_i - \bar{x})^2}$$

i.e., $\sigma_y = |c| \times \sigma_x$

SD of y values = |c| × SD of x values

Hence the proof.

Note
If there are k groups, then the S.D. $\sigma$ of the k groups combined is given by the formula

$$(n_1 + n_2 + \cdots + n_k)\,\sigma^2 = n_1\sigma_1^2 + n_2\sigma_2^2 + \cdots + n_k\sigma_k^2 + n_1 d_1^2 + n_2 d_2^2 + \cdots + n_k d_k^2$$

where $n_i$, $\bar{x}_i$ and $\sigma_i$ are the size, mean and SD of the i-th group, $\bar{x}$ is the combined mean, and $d_i = \bar{x}_i - \bar{x}$.

Coefficient of Variation

Coefficient of variation (CV) is the most important relative measure of dispersion and is defined by the formula

$$\text{Coefficient of Variation} = \frac{\text{Standard deviation}}{\text{Arithmetic mean}} \times 100$$

$$CV = \frac{SD}{AM} \times 100 = \frac{\sigma}{\bar{x}} \times 100$$

CV is thus the ratio of the SD to the mean, expressed as a percentage. According to Karl Pearson, coefficient of variation is the percentage variation in the mean.

Coefficient of Variation is the most widely used and popular relative measure. The group which has a smaller C.V. is said to be more consistent, more uniform or more stable. A larger coefficient of variation indicates greater variability, or less consistency, uniformity or stability.

Example 9
Calculate SD of 23, 25, 28, 31, 38, 40, 46
Solution

x      $x - \bar{x}$    $(x - \bar{x})^2$
23         -10               100
25          -8                64
28          -5                25
31          -2                 4
38           5                25
40           7                49
46          13               169
Total 231                    436

$$\bar{x} = \frac{231}{7} = 33, \qquad SD = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}} = \sqrt{\frac{436}{7}} = 7.89$$

Example 10
Calculate SD of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10

Solution

x     :  1   2   3   4   5   6   7   8   9   10
$x^2$ :  1   4   9  16  25  36  49  64  81  100

$\sum x = 55$, $\sum x^2 = 385$

$$\bar{x} = \frac{\sum x}{n} = \frac{55}{10} = 5.5$$

$$SD = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2} = \sqrt{\frac{385}{10} - 5.5^2} = \sqrt{38.5 - 30.25} = \sqrt{8.25} = 2.87$$

Example 11
Calculate SD of 42, 48, 50, 62, 65

Solution

x     d = x - 50    $d^2$
42        -8          64
48        -2           4
50         0           0
62        12         144
65        15         225
Total     17         437

$$SD = \sqrt{\frac{\sum d^2}{n} - \left(\frac{\sum d}{n}\right)^2} = \sqrt{\frac{437}{5} - \left(\frac{17}{5}\right)^2} = \sqrt{87.4 - 11.56} = \sqrt{75.84} = 8.71$$

Example 12
Calculate SD of the following data

Size (x)  : 10  12  14  16  18
Frequency :  2   4  10   3   1

Solution

x      f     fx     $fx^2$
10     2     20      200
12     4     48      576
14    10    140     1960
16     3     48      768
18     1     18      324
Total 20    274     3828


$$\sigma = \sqrt{\frac{\sum fx^2}{N} - \left(\frac{\sum fx}{N}\right)^2} = \sqrt{\frac{3828}{20} - \left(\frac{274}{20}\right)^2}$$
$$= \sqrt{191.4 - (13.7)^2} = \sqrt{191.40 - 187.69} = \sqrt{3.71} = 1.93$$

Example 13
Calculate SD of the following data
Classes:  0-4   4-8   8-12   12-16   16-20
f:         3     8     17      10       2
Solution
Classes    f     x     fx     fx²
0-4        3     2      6      12
4-8        8     6     48     288
8-12      17    10    170    1700
12-16     10    14    140    1960
16-20      2    18     36     648
Total     40          400    4608
$$\sigma = \sqrt{\frac{\sum fx^2}{N} - \left(\frac{\sum fx}{N}\right)^2} = \sqrt{\frac{4608}{40} - \left(\frac{400}{40}\right)^2} = \sqrt{115.2 - 100} = \sqrt{15.2} = 3.90$$

Example 14
Calculate mean, SD and CV for the following data
Classes:  0-6   6-12   12-18   18-24   24-30
f:         5     12      30      10       3
Solution
Here $d = \frac{x - 15}{6}$, i.e., A = 15 and c = 6.
Class     f     x     d     fd     fd²
0-6       5     3    −2    −10     20
6-12     12     9    −1    −12     12
12-18    30    15     0      0      0
18-24    10    21     1     10     10
24-30     3    27     2      6     12
Total    60                 −6     54
$$\bar{x} = A + c\,\frac{\sum fd}{N} = 15 + 6 \times \frac{-6}{60} = 15 - 0.6 = 14.4$$
$$\sigma = c\sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2} = 6\sqrt{\frac{54}{60} - \left(\frac{-6}{60}\right)^2} = 6\sqrt{0.90 - 0.01} = 6\sqrt{0.89} = 6 \times 0.9434 = 5.66$$
$$CV = \frac{SD}{AM} \times 100 = \frac{5.66}{14.4} \times 100 = 39.31\%$$

Merits and Demerits
Merits
1. It is rigidly defined; its value is always definite and is based on all the observations, and the actual signs of the deviations are used.
2. As it is based on the arithmetic mean, it has all the merits of the arithmetic mean.
3. It is the most important and widely used measure of dispersion.

4. It is amenable to further algebraic treatment.
5. It is less affected by fluctuations of sampling, and hence stable.
6. Squaring the deviations makes all the deviations positive; as such there is no need to ignore the signs (as in mean deviation).
7. It is the basis for measuring the coefficient of correlation, and for sampling and statistical inference.
8. The standard deviation provides the unit of measurement for the normal distribution.
9. It can be used to calculate the combined standard deviation of two or more groups.
10. The coefficient of variation, which is based on the mean and standard deviation, is considered to be the most appropriate method for comparing the variability of two or more distributions.
Demerits
1. It is not easy to understand, and it is difficult to calculate.
2. It gives more weight to extreme values, because the deviations are squared.
3. It is affected by the value of every item in the series.
4. As it is an absolute measure of variability, it cannot by itself be used for the purpose of comparison.
5. It has not found favour with economists and businessmen.

EXERCISES
Multiple Choice Questions
1. Which of the following is a unit-less measure of dispersion?
a. standard deviation      b. mean deviation
c. coefficient of variation      d. range
2. Formula for coefficient of variation is:
a. $C.V. = \frac{\text{SD}}{\text{mean}} \times 100$      b. $C.V. = \frac{\text{mean}}{\text{SD}} \times 100$
c. $C.V. = \frac{\text{mean} \times \text{SD}}{100}$      d. $C.V. = \frac{100}{\text{mean} \times \text{SD}}$
3. For a symmetrical distribution, $Md \pm QD$ covers:
a. 25 percent of the observations
b. 50 percent of the observations
c. 75 percent of the observations
d. 100 percent of the observations
(Md = median and Q.D. = quartile deviation.)
4. Sum of squares of the deviations is minimum when deviations are taken from:
a. mean      b. median      c. mode      d. zero
5. If a constant value 5 is subtracted from each observation of a set, the variance is:
a. reduced by 5      b. reduced by 25
c. unaltered      d. increased by 25
6. Which measure of dispersion ensures the highest degree of reliability?
a. range      b. mean deviation
c. quartile deviation      d. standard deviation

7. If the mean deviation of a distribution is 20.20, the standard deviation of the distribution is:
a. 15.15      b. 25.25
c. 30.30      d. none of the above
8. The mean of a series is 10 and its coefficient of variation is 40 percent; the variance of the series is:
a. 4      b. 8      c. 12      d. none of the above
9. Which measure of dispersion can be calculated in the case of open-end intervals?
a. range      b. standard deviation
c. coefficient of variation      d. quartile deviation
Very Short Answer Questions
10. What are the uses of standard deviation?
11. Why are measures of dispersion called averages of the second order?
12. For the numbers 3 and 5, show that SD = (1/2) Range.
13. Define CV and state its use.
14. State the desirable properties of a measure of dispersion.
15. Define quartile deviation.
16. Give the empirical relation connecting QD, MD and SD.
Short Essay Questions
17. Define coefficient of variation. What is its relevance in economic studies?
18. What is a relative measure of dispersion? Distinguish between absolute and relative measures of dispersion.
19. Calculate the coefficient of variation for the following distribution.
x:  0   1   2   3   4   5   6
f:  1   4  13  21  16   7   3
20. For the following data, compute the standard deviation.
x:  10  20  30  40  50  60
f:   3   5   7  20   8   7
21. Calculate the median and quartile deviation for the following data.
x:  60  62  64  66  68  70  72
f:  12  16  18  20  15  13   9
22. Calculate SD for the following data.
Class interval:  0-5   5-10   10-15   15-20   20-25   25-30
Frequency:        4      8      14       6       3
Long Essay Questions
23. Compute the coefficient of variation from the data given below.
Marks less than:  10  20  30  40  50  60  70  80  90  100
No. of students:   5  13  25  48  65  80  92  97  99  100
24. Calculate the standard deviation of the following series.
More than:  0   10  20  30  40  50  60  70
Frequency:  100 90  75  50  25  15   5   0
25. The mean and the standard deviation of a group of 50 observations were calculated to be 70 and 10 respectively. It was later discovered that an observation 17 was wrongly recorded as 70. Find the mean and the standard deviation (i) if the incorrect observation is omitted, (ii) if the incorrect observation is replaced by the correct value.

26. Calculate the standard deviation and the coefficient of variation of raw data for which n = 50, $\sum (x_i - \bar{x}) = 10$ and $\sum (x_i - \bar{x})^2 = 400$.
27. For two samples of size 10 each, we have the following values:
$\sum x = 71$, $\sum x^2 = 555$, $\sum y = 70$, $\sum y^2 = 525$.
Compare the variability of these two samples.
28. Define coefficient of variation. Indicate its use. A factory produces two types of electric lamps, A and B. In an experiment relating to their life, the following results were obtained:
Life in hours     No. of lamps
                    A      B
500-700             5      4
700-900            11     30
900-1100           26     12
1100-1300          10      8
1300-1500           8      6
Compare the variability of the two types of lamps using C.V.

MODULE II
CORRELATION AND REGRESSION
CURVE FITTING
We have already studied the behaviour of a single variable characteristic by analysing univariate data using summary measures, viz., measures of central tendency, measures of dispersion, measures of skewness and measures of kurtosis.
Very often in practice a relationship is found to exist between two (or more) variables. For example, there may exist some relation between the heights and weights of a group of students; the yield of a crop is found to vary with the amount of rainfall over a particular period; the prices of some commodities may depend upon their demand in the market; etc.
It is frequently desirable to express this relationship in mathematical form by formulating an equation connecting the variables, and to determine the degree and nature of the relationship between the variables. Curve fitting, correlation and regression serve these purposes.
Curve fitting
Let x be an independent variable and y be a variable depending on x. Here we say that y is a function of x and write it as y = f(x). If f(x) is a known function, then for any allowable values $x_1, x_2, \dots, x_n$ of x, we can find the corresponding values $y_1, y_2, \dots, y_n$ of y and thereby determine the pairs $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$, which constitute bivariate data. These pairs of values of x and y give us n points on the curve y = f(x).
Suppose we consider the converse problem. That is, suppose we are
given n values $x_1, x_2, \dots, x_n$ of an independent variable x and corresponding values $y_1, y_2, \dots, y_n$ of a variable y depending on x. Then the pairs $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$ give us n points in the xy-plane. Generally, it is not possible to find the actual curve y = f(x) that passes through these points. Hence we try to find a curve that serves as the best approximation to the curve y = f(x). Such a curve is referred to as the curve of best fit. The process of determining a curve of best fit is called curve fitting. The method generally employed for curve fitting is known as the method of least squares, which is explained below.
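The curve-fitting task just described can be sketched in plain Python. This is a minimal illustration, not part of the original course material. It assumes the polynomial model $y = a_0 + a_1x + \dots + a_kx^k$ and forms the k + 1 normal equations $\partial S/\partial a_j = 0$. It then solves them by Gaussian elimination. The helper names `normal_equations`, `solve` and `fit_polynomial` are hypothetical, chosen only for this example. With k = 1 the equations reduce to the straight-line case, and with k = 2 to the parabola case.

```python
from typing import List, Sequence, Tuple


def normal_equations(xs: Sequence[float], ys: Sequence[float],
                     k: int) -> Tuple[List[List[float]], List[float]]:
    """Return (matrix, rhs) of the k + 1 normal equations for
    y = a0 + a1*x + ... + ak*x**k.

    Equation j (j = 0..k) reads: sum(x**j * y) = sum over m of a_m * sum(x**(j + m)).
    """
    power_sums = [sum(x ** p for x in xs) for p in range(2 * k + 1)]
    matrix = [[power_sums[j + m] for m in range(k + 1)] for j in range(k + 1)]
    rhs = [sum((x ** j) * y for x, y in zip(xs, ys)) for j in range(k + 1)]
    return matrix, rhs


def solve(matrix: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve matrix * a = rhs by Gaussian elimination with partial pivoting.

    Assumes a unique solution exists (at least k + 1 distinct x values).
    """
    n = len(rhs)
    aug = [list(map(float, row)) + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Move the row with the largest pivot into place to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(aug[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (aug[r][n] - s) / aug[r][r]
    return coeffs


def fit_polynomial(xs: Sequence[float], ys: Sequence[float], k: int) -> List[float]:
    """Least-squares coefficients [a0, a1, ..., ak] of the k-th degree curve of best fit."""
    matrix, rhs = normal_equations(xs, ys, k)
    return solve(matrix, rhs)


if __name__ == "__main__":
    a, b = fit_polynomial([1, 2, 3, 4, 5], [14, 13, 9, 5, 2], k=1)
    print(f"line of best fit: y = {a:.4f} + ({b:.4f})x")
    print(f"estimate at x = 3.5: {a + b * 3.5:.4f}")
```

For the five points used in the demo, the normal equations are 43 = 5a + 15b and 97 = 15a + 55b. The sketch should therefore return a = 18.2 and b = −3.2, up to floating-point rounding. Forming and solving the normal equations directly is fine for small classroom data. For large or badly scaled data, a library least-squares routine such as NumPy's `numpy.linalg.lstsq` is the usual choice.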
Method of least squares
This is a method for finding the unknown coefficients in a curve that serves as the best approximation to the curve y = f(x). The basic ideas of this method were created by A. M. Legendre and C. F. Gauss.
"The principle of least squares says that the sum of the squares of the errors between the observed values and the corresponding estimated values should be the least."
Suppose it is desired to fit a k-th degree curve given by
$$y = a_0 + a_1x + a_2x^2 + \dots + a_kx^k \quad \dots(1)$$
to the given pairs of observations $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$. The curve has k + 1 unknown constants. Hence if n = k + 1, we get k + 1 equations on substituting the values of $(x_i, y_i)$ in equation (1), which give a unique solution for the values $a_0, a_1, a_2, \dots, a_k$. However, if n > k + 1, there are more equations than unknowns and in general no set of values satisfies all of them, so we use the method of least squares.
Now let
$$y_e = a_0 + a_1x_i + a_2x_i^2 + \dots + a_kx_i^k$$
be the estimated value of y when x takes the value $x_i$. But the corresponding observed value of y is $y_i$. Hence if $e_i$ is the residual or error for this point,
$$e_i = y_i - y_e = y_i - a_0 - a_1x_i - a_2x_i^2 - \dots - a_kx_i^k$$
To make the sum of squares minimum, we have to minimise
$$S = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n}\left(y_i - a_0 - a_1x_i - a_2x_i^2 - \dots - a_kx_i^k\right)^2 \quad \dots(2)$$
By differential calculus, S will have its minimum value when
$$\frac{\partial S}{\partial a_0} = 0, \quad \frac{\partial S}{\partial a_1} = 0, \quad \dots, \quad \frac{\partial S}{\partial a_k} = 0,$$
which gives k + 1 equations called normal equations. Solving these equations we get the best values of $a_0, a_1, a_2, \dots, a_k$. Substituting these values in (1) we get the curve of best fit.
Now we consider the fitting of some curves.

1. Fitting of a Straight Line y = a + bx
Suppose we wish to have a straight line that serves as the best approximation to the actual curve y = f(x) passing through n given points $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$. This line will be referred to as the line of best fit, and we take its equation as
$$y = a + bx \quad \dots(1)$$
where a and b are the parameters to be determined. Let $y_e$ be the value of y corresponding to the value $x_i$ of x as determined by equation (1). The value $y_e$ is called the estimated value of y.
When $x = x_i$, the observed value of y is $y_i$. Then the difference $y_i - y_e$ is called the residual or error. By the principle of least squares, we have
$$S = \sum (y_i - y_e)^2 = \sum (y_i - a - bx_i)^2 \quad \dots(2)$$
We determine a and b so that S is minimum (least). Two necessary conditions for this are
$$\frac{\partial S}{\partial a} = 0, \qquad \frac{\partial S}{\partial b} = 0$$
Using (2), these conditions yield the following equations:
$$\sum (y_i - a - bx_i) = 0 \quad \text{or} \quad \sum y_i = na + b\sum x_i \quad \dots(3)$$
and
$$\sum (y_i - a - bx_i)\,x_i = 0 \quad \Rightarrow \quad \sum x_iy_i = a\sum x_i + b\sum x_i^2 \quad \dots(4)$$
These two equations, (3) and (4), called normal equations, serve as two simultaneous equations for determining a and b. Putting the values of a and b so determined in (1), we get the equation of the line of best fit for the given data.
Example 1
Fit a straight line to the following data:
x:   1   2   3   4   5
y:  14  13   9   5   2
Estimate the value of y when x = 3.5

Solution
We note that n = 5, and form the following table.
x     y     x²    xy
1    14      1    14
2    13      4    26
3     9      9    27
4     5     16    20
5     2     25    10
Σx = 15    Σy = 43    Σx² = 55    Σxy = 97
Hence the normal equations that determine the line of best fit are
43 = 5a + 15b
97 = 15a + 55b
These give a = 18.2 and b = −3.2. Hence, for the given data, the line of best fit is y = 18.2 − 3.2x.
When x = 3.5, the estimated value of y (found from the line of best fit) is y = 18.2 − (3.2)(3.5) = 7.

2. Fitting of a Parabola y = a + bx + cx²
Suppose we wish to have a parabola (second degree curve) as the curve of best fit for data consisting of n given pairs $(x_i, y_i)$, $i = 1, 2, \dots, n$.
Let us take the equation of the parabola, called the 'parabola of best fit', in the form
$$y = a + bx + cx^2 \quad \dots(1)$$
where a, b, c are constants to be determined.
Let $y_e$ be the value of y corresponding to the value $x_i$ of x determined by equation (1). Then the sum of squares of the errors between the observed and estimated values of y is given by
$$S = \sum_{i=1}^{n}(y_i - y_e)^2$$
Using (1), this becomes
$$S = \sum_{i=1}^{n}(y_i - a - bx_i - cx_i^2)^2 \quad \dots(2)$$
We determine a, b, c so that S is least. Three necessary conditions for this are $\frac{\partial S}{\partial a} = 0$, $\frac{\partial S}{\partial b} = 0$ and $\frac{\partial S}{\partial c} = 0$. Using (2), these conditions yield the following normal equations:
$$\sum y_i = na + b\sum x_i + c\sum x_i^2 \quad \dots(3)$$
$$\sum x_iy_i = a\sum x_i + b\sum x_i^2 + c\sum x_i^3 \quad \dots(4)$$
$$\sum x_i^2y_i = a\sum x_i^2 + b\sum x_i^3 + c\sum x_i^4 \quad \dots(5)$$
Solve (3), (4) and (5) to determine a, b and c. Putting the values of a, b, c so determined in (1), we get the equation of the parabola of best fit for the given data.
Example 2
Fit a parabola to the following data:
x:  1   2   3   4   5   6   7   8   9
y:  2   6   7   8  10  11  11  10   9
Estimate y when x = 4.5
Solution
Here n = 9, and we form the following table:
x     y     x²     x³      x⁴      xy     x²y
1     2      1      1       1       2       2
2     6      4      8      16      12      24
3     7      9     27      81      21      63
4     8     16     64     256      32     128
5    10     25    125     625      50     250
6    11     36    216    1296      66     396
7    11     49    343    2401      77     539
8    10     64    512    4096      80     640
9     9     81    729    6561      81     729
Σx = 45   Σy = 74   Σx² = 285   Σx³ = 2025   Σx⁴ = 15333   Σxy = 421   Σx²y = 2771
The normal equations that determine the parabola of best fit are
74 = 9a + 45b + 285c
421 = 45a + 285b + 2025c
2771 = 285a + 2025b + 15333c
Solving these equations, we obtain a = −0.9286, b = 3.5232 and c = −0.2673. Hence the parabola of best fit for the given data is
y = −0.9286 + 3.5232x − 0.2673x²
For x = 4.5, the parabola of best fit gives the estimated value of y as
y = −0.9286 + 3.5232 × 4.5 − 0.2673 × 4.5² ≈ 9.51

EXERCISES
Multiple choice questions
1. The equation $Y = ab^x$ for $b > 1$ represents
a) an exponential growth curve
b) an exponential decay curve
c) a parabola
d) none of the above
2. For fitting of curves, we use
a) method of moments
b) method of least squares
c) method of maximum likelihood
d) all the above
3. While fitting a straight line y = a + bx, the value of b measures
a. the rate of change in y w.r.t. x
b. the proportional variation in y w.r.t. the variation in x
c. both (a) and (b)
d. neither (a) nor (b)
Very short answer questions
4. What is the principle of least squares?
5. What do you mean by curve fitting?
6. What are normal equations?
7. Write down the normal equations to fit a straight line y = ax + b.
Short essay questions
8. How will you fit a straight line to the given data by the method of least squares?
9. What is the method of least squares? How will you use it to fit a parabola of second degree?
10. Explain the principle of the least squares method of fitting a second degree curve of the form Y = a + bx + cx² for 'n' pairs of values.
11. Write short notes on
a. Method of least squares
b. Curve fitting
c. Normal equations.
Long essay questions
12. Fit a straight line by the method of least squares to the following data.
x:  0    1     2     3     4
y:  1   1.8   3.3   4.5   6.3
13. Fit a straight line y = a + bx to the following data.
x:  1     2   3     4   6   8
y:  2.4   3   3.6   4   5   6
14. Fit a straight line y = ax + b to the following data.
x:  1    2    3    4    5    6    7
y:  80   90   92   83   94   99   92
15. Fit a parabola y = a + bx + cx² to the following data:
x:  0    1     2     3     4
y:  1   1.8   1.3   2.5   6.3
16. Fit a curve of the form y = ax + bx² for the data given below.
x:  1     2     3     4      5
y:  1.8   5.1   8.9   14.1   19.8

CORRELATION AND REGRESSION
Introduction
In the earlier chapters we have discussed the characteristics and shapes of distributions of a single variable, e.g., the mean, S.D. and skewness of the distributions of variables such as income, height, weight, etc. We shall now study two (or more) variables simultaneously and try to find the quantitative relationship between them. For example, the relationship between two variables like (1) income and expenditure, (2) height and weight, (3) rainfall and yield of crops, (4) price and demand, etc. will be examined here. The methods of expressing the relationship between two variables are due mainly to Francis Galton and Karl Pearson.
Correlation
Correlation is a statistical measure for finding out the degree (or strength) of association between two (or more) variables. By 'association' we mean the tendency of the variables to move together. If two variables X and Y are so related that movements (or variations) in one, say X, tend to be accompanied by corresponding movements (or variations) in the other, Y, then X and Y are said to be correlated. The movements may be in the same direction (i.e., either both X and Y increase or both of them decrease) or in opposite directions (i.e., one, say X, increases and the other, Y, decreases). Correlation is said to be positive or negative according as these movements are in the same or in opposite directions. If Y is unaffected by any change in X, then X and Y are said to be uncorrelated.
In the words of L.R. Conner:
"If two or more quantities vary in sympathy so that movements in the one tend to be accompanied by corresponding movements in the other, then they are said to be correlated."
Correlation may be linear or non-linear. If the amount of variation in X bears a constant ratio to the corresponding amount of variation in Y, then the correlation between X and Y is said to be linear. Otherwise it is non-linear. The correlation coefficient (r) measures the degree of linear relationship (i.e., linear correlation) between two variables.

Determination of Correlation
Correlation between two variables may be determined by any one of
the following methods:
1. Scatter Diagram
2. Co-variance Method or Karl Pearson’s Method
3. Rank Method

Scatter Diagram
The existence of correlation can be shown graphically by means of a scatter diagram. Statistical data relating to simultaneous movements (or variations) of two variables can be represented graphically by points. One of the two variables, say X, is shown along the horizontal axis OX and the other variable Y along the vertical axis OY. All the pairs of values of X and Y are then plotted as points (or dots) on graph paper. This diagrammatic representation of bivariate data is known as a scatter diagram.
The pattern of these points, and in particular the direction of the scatter, reveals the nature and strength of correlation between the two variables.
The following are some scatter diagrams showing different types of correlation between two variables.
In Fig. 1 and 3, the movements (or variations) of the two variables are in the same direction and the scatter diagram shows a linear path. In this case, correlation is positive or direct.
In Fig. 2 and 4, the movements of the two variables are in opposite directions and the scatter shows a linear path. In this case, correlation is negative or indirect.
In Fig. 5 and 6, the points (or dots), instead of showing any linear path, lie around a curve or form a swarm. In this case linear correlation is very small and we can take r = 0.
In Fig. 1 and 2, all the points lie on a straight line. In these cases correlation is perfect and r = +1 or −1 according as the correlation is positive or negative.

Karl Pearson's Correlation Coefficient
We have remarked in the earlier section that a scatter diagram gives us only a rough idea of how the two variables, say x and y, are related. We cannot draw defensible conclusions by merely examining data from the scatter diagram. In other words, we cannot simply look at a scatter diagram
and conclude that, since more than half of the points appear to lie nearly on a straight line, there is a positive or negative correlation between the variables. On the other hand, neither can we conclude that there is no correlation at all. We need a quantity (represented by a number) which measures the extent to which x and y are related. The quantity used for this purpose is known as the coefficient of correlation, usually denoted by $r_{xy}$ or r. The coefficient of correlation $r_{xy}$ measures the degree (or extent) of relationship between the two variables x and y and is given by the following formula:
$$r_{xy} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{n\,\sigma_x\sigma_y} \quad \dots(1)$$
where $X_i$ and $Y_i$ $(i = 1, 2, \dots, n)$ are the two sets of values of x and y respectively, and $\bar{X}, \bar{Y}, \sigma_x, \sigma_y$ are respectively the corresponding means and standard deviations, so that
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n}X_i, \qquad \bar{Y} = \frac{1}{n}\sum_{i=1}^{n}Y_i,$$
$$\sigma_x^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2 = \frac{1}{n}\sum_{i=1}^{n}X_i^2 - \bar{X}^2 \quad \text{and} \quad \sigma_y^2 = \frac{1}{n}\sum_{i=1}^{n}(Y_i - \bar{Y})^2 = \frac{1}{n}\sum_{i=1}^{n}Y_i^2 - \bar{Y}^2.$$
The above definition of the correlation coefficient was given by Karl Pearson in 1890 and is called Karl Pearson's correlation coefficient after his name.
Definition
If $(X_1, Y_1), (X_2, Y_2), \dots, (X_n, Y_n)$ are n pairs of observations on two variables X and Y, then the covariance of X and Y, written as Cov(X, Y), is defined by
$$\text{Cov}(X, Y) = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})$$
Covariance indicates the joint variation between the two variables. So the correlation coefficient or the coefficient of correlation (r) between X and Y is defined by
$$r = \frac{\text{Cov}(X, Y)}{\sigma_x\sigma_y}$$
where $\sigma_x, \sigma_y$ are the standard deviations of X and Y respectively.
The formula for the correlation coefficient r may be written in different forms.
i. If $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$, then
$$r = \frac{\sum x_iy_i}{n\,\sigma_x\sigma_y} = \frac{\frac{1}{n}\sum x_iy_i}{\sqrt{\frac{1}{n}\sum x_i^2}\,\sqrt{\frac{1}{n}\sum y_i^2}} = \frac{\sum x_iy_i}{\sqrt{\sum x_i^2 \sum y_i^2}}$$
ii. We have
$$\text{Cov}(X, Y) = \frac{1}{n}\sum (X_i - \bar{X})(Y_i - \bar{Y}) = \frac{1}{n}\sum \left(X_iY_i - X_i\bar{Y} - \bar{X}Y_i + \bar{X}\bar{Y}\right)$$
$$= \frac{\sum X_iY_i}{n} - \bar{Y}\,\frac{\sum X_i}{n} - \bar{X}\,\frac{\sum Y_i}{n} + \frac{n\bar{X}\bar{Y}}{n} = \frac{\sum X_iY_i}{n} - \bar{X}\bar{Y} - \bar{X}\bar{Y} + \bar{X}\bar{Y}$$
$$= \frac{\sum X_iY_i}{n} - \bar{X}\bar{Y} = \frac{\sum X_iY_i}{n} - \left(\frac{\sum X_i}{n}\right)\left(\frac{\sum Y_i}{n}\right)$$
Now,
$$r = \frac{\text{Cov}(X, Y)}{\sigma_x\sigma_y} = \frac{\frac{\sum X_iY_i}{n} - \left(\frac{\sum X_i}{n}\right)\left(\frac{\sum Y_i}{n}\right)}{\sqrt{\frac{\sum X_i^2}{n} - \left(\frac{\sum X_i}{n}\right)^2}\,\sqrt{\frac{\sum Y_i^2}{n} - \left(\frac{\sum Y_i}{n}\right)^2}} \quad \dots(2)$$
iii. By multiplying the numerator and denominator of (2) by $n^2$, we have
$$r = \frac{n\sum X_iY_i - (\sum X_i)(\sum Y_i)}{\sqrt{n\sum X_i^2 - (\sum X_i)^2}\,\sqrt{n\sum Y_i^2 - (\sum Y_i)^2}}$$
Theorem
The correlation coefficient is independent of (not affected by) a change of origin and scale of measurement.
Proof
Let $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$ be a set of n pairs of observations. Then
$$r_{xy} = \frac{\frac{1}{n}\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\frac{1}{n}\sum (x_i - \bar{x})^2}\,\sqrt{\frac{1}{n}\sum (y_i - \bar{y})^2}} \quad \dots(1)$$
Let us transform $x_i$ to $u_i$ and $y_i$ to $v_i$ by the rules
$$u_i = \frac{x_i - x_0}{c_1}, \qquad v_i = \frac{y_i - y_0}{c_2} \quad \dots(2)$$
where $x_0, y_0$ are arbitrary constants and $c_1, c_2$ are positive constants. From (2), we have
$$x_i = c_1u_i + x_0 \quad \text{and} \quad y_i = c_2v_i + y_0$$
$$\bar{x} = x_0 + c_1\bar{u} \quad \text{and} \quad \bar{y} = y_0 + c_2\bar{v}$$
where $\bar{u}$ and $\bar{v}$ are the means of the $u_i$ and the $v_i$ respectively. Hence
$$x_i - \bar{x} = c_1(u_i - \bar{u}) \quad \text{and} \quad y_i - \bar{y} = c_2(v_i - \bar{v})$$
Substituting these values in (1), we get
$$r_{xy} = \frac{\frac{1}{n}\sum c_1(u_i - \bar{u})\,c_2(v_i - \bar{v})}{\sqrt{\frac{1}{n}\sum c_1^2(u_i - \bar{u})^2}\,\sqrt{\frac{1}{n}\sum c_2^2(v_i - \bar{v})^2}} = \frac{\frac{1}{n}\sum (u_i - \bar{u})(v_i - \bar{v})}{\sqrt{\frac{1}{n}\sum (u_i - \bar{u})^2}\,\sqrt{\frac{1}{n}\sum (v_i - \bar{v})^2}} = r_{uv}$$
Here we observe that if we change the origin and choose a new scale, the correlation coefficient remains unchanged. (If $c_1$ and $c_2$ had opposite signs, only the sign of r would change.) Hence the proof.
Here, $r_{uv}$ can be further simplified as
$$r_{xy} = r_{uv} = \frac{\text{Cov}(u, v)}{\sigma_u\sigma_v}$$


$$= \frac{\frac{1}{n}\sum_{i=1}^{n}u_iv_i - \bar{u}\bar{v}}{\sqrt{\frac{1}{n}\sum u_i^2 - \bar{u}^2}\,\sqrt{\frac{1}{n}\sum v_i^2 - \bar{v}^2}} = \frac{n\sum u_iv_i - \sum u_i \sum v_i}{\sqrt{n\sum u_i^2 - (\sum u_i)^2}\,\sqrt{n\sum v_i^2 - (\sum v_i)^2}}$$

Limits of Correlation Coefficient
We shall now find the limits of the correlation coefficient between two variables and show that it lies between −1 and +1, i.e., $-1 \le r_{xy} \le 1$.
Proof
Let $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$ be the given pairs of observations. Then
$$r_{xy} = \frac{\frac{1}{n}\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\frac{1}{n}\sum (x_i - \bar{x})^2}\,\sqrt{\frac{1}{n}\sum (y_i - \bar{y})^2}}$$
We put $X_i = x_i - \bar{x}$, $Y_i = y_i - \bar{y}$. Then
$$\sigma_x^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{1}{n}\sum_{i=1}^{n}X_i^2 \quad \dots(1)$$
Similarly
$$\sigma_y^2 = \frac{1}{n}\sum_{i=1}^{n}Y_i^2 \quad \dots(2)$$
and
$$r_{xy} = \frac{\sum_{i=1}^{n}X_iY_i}{n\,\sigma_x\sigma_y} \quad \dots(3)$$
Now we have
$$\sum_{i=1}^{n}\left(\frac{X_i}{\sigma_x} \pm \frac{Y_i}{\sigma_y}\right)^2 = \frac{\sum X_i^2}{\sigma_x^2} + \frac{\sum Y_i^2}{\sigma_y^2} \pm \frac{2\sum X_iY_i}{\sigma_x\sigma_y}$$
$$= \frac{n\sigma_x^2}{\sigma_x^2} + \frac{n\sigma_y^2}{\sigma_y^2} \pm 2n\,r_{xy} \quad \text{using (1), (2), (3)}$$
$$= 2n \pm 2n\,r_{xy} = 2n(1 \pm r_{xy})$$
The left-hand side of the above identity is a sum of squares of n numbers and hence it is positive or zero.
Hence $1 \pm r_{xy} \ge 0$, or $r_{xy} \ge -1$ and $r_{xy} \le 1$,
or $-1 \le r_{xy} \le 1$,
i.e., the correlation coefficient lies between −1 and +1. Hence the proof.
Note:
If $r_{xy} = 1$, we say that there is perfect positive correlation between x and y.
If $r_{xy} = -1$, we say that there is perfect negative correlation between x and y.
If $r_{xy} = 0$, we say that there is no (linear) correlation between the two variables, i.e., the two variables are uncorrelated.
If $r_{xy} > 0$, we say that the correlation between x and y is positive (direct).
If $r_{xy} < 0$, we say that the correlation between x and y is negative (indirect).
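The computational form (iii) of Karl Pearson's coefficient can be checked numerically, together with the two results just proved: invariance under a change of origin and scale, and $-1 \le r \le 1$. The following is a minimal Python sketch that assumes two plain lists of equal length. The function name `pearson_r` is a hypothetical helper for illustration, not a library function.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Karl Pearson's r via the computational form
    r = [n*sum(XY) - sum(X)*sum(Y)] / sqrt([n*sum(X^2) - sum(X)^2] * [n*sum(Y^2) - sum(Y)^2]).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 pairs")
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    denom = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    if denom == 0:
        raise ValueError("r is undefined when either series has zero variance")
    return (n * sxy - sx * sy) / denom


if __name__ == "__main__":
    X = [1, 2, 3, 4, 5, 6, 7]
    Y = [6, 8, 11, 9, 12, 10, 14]
    r = pearson_r(X, Y)
    # Change of origin and scale with POSITIVE scale factors (c1 = 2, c2 = 3)
    # should leave r unchanged; a negative factor on one variable would flip its sign.
    u = [(x - 4) / 2 for x in X]
    v = [(y - 10) / 3 for y in Y]
    print(round(r, 4), round(pearson_r(u, v), 4))
```

The demo uses the seven pairs of Example 8. Both printed values should agree with the hand computation $r = 29/\sqrt{28 \times 42} \approx 0.8457$.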
X Y x = X X y =Y Y x2 y2 xy 7.6
Using (1) 0.28 
3 y
1 6 3 4 9 16 12
2 8 2 2 4 4 4 7.6 760
or, 0.84  y  7.6, or y  
3 11 1 1 1 1 1 0.84 84
4 9 0 1 0 1 0 = 9.048
5 12 1 2 1 4 2
Example 10
6 10 2 0 4 0 0
7 14 3 4 9 16 12 Calculate Pearson’s coefficient of correlation between advertisement
cost and sales as per the data given below:
28 70 28 42 29
Advt cost in ’000 Rs: 39 65 62 90 82 75 25 98 36 78
 X 28  Y 70 Sales in lakh Rs: 47 53 58 86 62 68 60 91 51 84
X    4 an d Y    10
n 7 n 7
Solution
Karl Pearson’s coefficient of correlation (r) is given by

 xy Karl Pearson’s coefficient of correlation (r) is given by

x2 y2
r =  xy
wher e x  X  X and y  Y Y
 x2  y2
r=
29
= = 0.8457
28 42
X Y x X X y  Y Y x2 y2 xy
Example 9
39 47  26  19 676 361 494
Karl Pearson’s coefficient of correlation between two variables X and
65 53 0  13 0 169 0
Y is 0.28 their covariance is +7.6. If the variance of X is 9, find the
standard deviation of Y-series. 62 58 3 8 9 64 24
Solution 90 86 25 20 625 400 500
Karl Pearson’s coefficient of correlation r is given by 82 62 17 4 289 16  68
75 68 10 2 100 4 20
cov ( X , Y )
r = 25 60  40 6 1600 36 240
 x y
98 91 33 25 1089 625 825
Here r =0.28, Cov (X, Y) = 7.6 and  x2  9;  x  3 .

Introductory Statistics 90 Introductory Statistics 91


School of Distance Education School of Distance Education
Example 8
108 63 8 13 64 169 104
Find the coefficient of correlation from the following data: 106 66 6 16 36 256 96
X: 1 2 3 4 5 6 7 100 62 0 12 0 144 0
104 69 4 19 16 361 76
Y: 6 8 11 9 12 10 14
105 61 5 11 25 121 55
36 51  29  15 841 225 435
78 84 13 18 169 324 234   96 84 1128 1380 312
n  u i vi   u i vi
650 660 0 0 5398 2224 2704
 u i2  ( u i )2 n  vi2  ( vi )2
r =
n
 X 650  Y 660 12  312  96  84
X    65 ; Y    66
n 10 n 10 =
12  1128  (96 )2 12  1380  (84 )2
2704
r = 0.78 =  0.67
5398  2224
Example 12
Example 11 A computer while calculating the correlation coefficient between two
Calculate Pearson’s coefficient of correlation from the following taking variables X and Y from 25 pairs of observations obtained the following results:
100 and 50 as the assumed average of X and Y respectively: n  25,  X  125,  Y  100,  X 2  650,  Y 2  460 and
 X Y  508 . It was, however, discovered at the time of checking that two
X: 104 111 104 114 118 117 105 108 106 100 104 105 pairs of observations were not correctly copied. They were taken as (6, 14) and
(8, 6), while the correct values were (8, 12) and (6, 8). Prove that the correct
Y: 57 55 47 45 45 50 64 63 66 62 69 61
value of the correlation coefficient should be 2/3.
Solution Solution
2 2
X Y u = X  100 v = Y  50 u v uv When the two incorrect pairs of observations are replaced by the correctpairs,
the revised results for the whole series are:
104 57 4 7 16 49 28  X = 125  (Sum of two incorrect values of X) +
111 55 11 5 121 25 55 (Sum of two correct values of X)
104 47 4 3 16 9  12 = 125  (6 + 8) + (8 + 6) = 125
114 45 14 5 196 25  70
Similarly
118 45 18 5 324 25  90
Y = 100  (14 + 6) + (12 + 8) = 100
117 50 17 0 289 0 0
2 = 650  (62 + 82) + (82 + 62) = 650
105 64 5 14 25 196 70 X
X Y = 508  (6  14  8  6)  (8  12  6  8) Y 2 = 460  (142 + 62) + (122 + 82)
= 460  232 + 208 = 436 and
Introductory Statistics 92 Introductory Statistics 93
School of Distance Education School of Distance Education
Proof:
$= 508 - 132 + 144 = 520$;

Correct value of the correlation coefficient is

$$r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{n\sum X^2 - (\sum X)^2}\,\sqrt{n\sum Y^2 - (\sum Y)^2}} = \frac{25 \times 520 - 125 \times 100}{\sqrt{25 \times 650 - 125^2}\,\sqrt{25 \times 436 - 100^2}} = \frac{500}{25 \times 30} = \frac{2}{3}$$

Rank Correlation Coefficient

Simple correlation coefficient (or product-moment correlation coefficient) is based on the magnitudes of the variables. But in many situations it is not possible to find the magnitude of the variable at all. For example, we cannot measure beauty or intelligence quantitatively. In such cases it is still possible to rank the individuals in some order. Rank correlation is based on the rank or order, not on the magnitude of the variable. It is most suitable when the individuals (or variables) can be arranged in order of merit or proficiency. If the ranks assigned to the individuals range from 1 to n, then Karl Pearson's correlation coefficient between the two series of ranks is called the rank correlation coefficient. Spearman's formula for the rank correlation coefficient (R) is

$$R = 1 - \frac{6\sum d^2}{n(n^2 - 1)} \quad \text{or} \quad R = 1 - \frac{6\sum d^2}{n^3 - n}$$

where d is the difference between the ranks of the two series and n is the number of individuals in each series.

Derivation of Spearman's Formula for Rank Correlation Coefficient

$$R = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ be the ranks of n individuals in two characters (or series). Spearman's rank correlation coefficient R is the product-moment correlation coefficient between these ranks and, therefore, we can write

$$R = \frac{\operatorname{Cov}(x, y)}{\sigma_x \sigma_y} \qquad \ldots (1)$$

where

$$\operatorname{Cov}(x, y) = \frac{1}{n}\sum (x_i - \bar{x})(y_i - \bar{y})$$

But the ranks of n individuals are the natural numbers 1, 2, ..., n arranged in some order depending on the qualities of the individuals. Therefore $x_1, x_2, \ldots, x_n$ are the numbers 1, 2, ..., n in some order, so that

$$\sum x = 1 + 2 + \cdots + n = \frac{n(n+1)}{2} \quad \text{and} \quad \sum x^2 = 1^2 + 2^2 + \cdots + n^2 = \frac{n(n+1)(2n+1)}{6}$$

Hence

$$\bar{x} = \frac{\sum x}{n} = \frac{n+1}{2}, \qquad \sigma_x^2 = \frac{\sum x^2}{n} - \bar{x}^2 = \frac{(n+1)(2n+1)}{6} - \frac{(n+1)^2}{4} = \frac{n+1}{12}(4n + 2 - 3n - 3) = \frac{n^2 - 1}{12}$$

Similarly,

$$\bar{y} = \frac{n+1}{2} \quad \text{and} \quad \sigma_y^2 = \frac{n^2 - 1}{12}$$

Let $d_i = x_i - y_i$; then $d_i = (x_i - \bar{x}) - (y_i - \bar{y})$ [since $\bar{x} = \bar{y}$]. Therefore

$$\frac{\sum d_i^2}{n} = \frac{\sum \left[(x_i - \bar{x}) - (y_i - \bar{y})\right]^2}{n} = \frac{\sum (x_i - \bar{x})^2}{n} + \frac{\sum (y_i - \bar{y})^2}{n} - \frac{2\sum (x_i - \bar{x})(y_i - \bar{y})}{n} = \sigma_x^2 + \sigma_y^2 - 2\operatorname{Cov}(x, y)$$

or,

$$2\operatorname{Cov}(x, y) = \frac{n^2 - 1}{12} + \frac{n^2 - 1}{12} - \frac{\sum d_i^2}{n} = \frac{2(n^2 - 1)}{12} - \frac{\sum d_i^2}{n}$$

or,

$$\operatorname{Cov}(x, y) = \frac{n^2 - 1}{12} - \frac{\sum d_i^2}{2n}$$

Hence, from (1), since $\sigma_x \sigma_y = \frac{n^2 - 1}{12}$, we get

$$R = \frac{\dfrac{n^2 - 1}{12} - \dfrac{\sum d_i^2}{2n}}{\dfrac{n^2 - 1}{12}} = 1 - \frac{6\sum d^2}{n(n^2 - 1)} \qquad \text{[omitting } i\text{]}$$

Example 13

| Student (Roll No.) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Marks in Maths | 78 | 36 | 98 | 25 | 75 | 82 | 90 | 62 | 65 | 39 |
| Marks in Stat. | 84 | 51 | 91 | 60 | 68 | 62 | 86 | 58 | 53 | 47 |

Calculate the rank correlation coefficient.

Solution

In Mathematics, the student with Roll No. 3 gets the highest mark, 98, and is ranked 1; Roll No. 7, securing 90 marks, has rank 2; and so on. Similarly, we can find the ranks of the students in Statistics.

| Roll No. | Maths Marks | Maths Rank (x) | Stat. Marks | Stat. Rank (y) | d = x − y | d² |
|---|---|---|---|---|---|---|
| 1 | 78 | 4 | 84 | 3 | 1 | 1 |
| 2 | 36 | 9 | 51 | 9 | 0 | 0 |
| 3 | 98 | 1 | 91 | 1 | 0 | 0 |
| 4 | 25 | 10 | 60 | 6 | 4 | 16 |
| 5 | 75 | 5 | 68 | 4 | 1 | 1 |
| 6 | 82 | 3 | 62 | 5 | −2 | 4 |
| 7 | 90 | 2 | 86 | 2 | 0 | 0 |
| 8 | 62 | 7 | 58 | 7 | 0 | 0 |
| 9 | 65 | 6 | 53 | 8 | −2 | 4 |
| 10 | 39 | 8 | 47 | 10 | −2 | 4 |
| Total | | | | | | Σd² = 30 |

Applying Spearman's formula:

$$R = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 30}{10(10^2 - 1)} = 1 - \frac{2}{11} = \frac{9}{11} = 0.82$$

Regression

In some situations, one may need to know the probable value of one variable corresponding to a certain value of another variable. This is possible using the mathematical relation between the two variables. The scatter diagram, explained above, helps to ascertain the nature of the relationship, such as linear (straight line), second-degree polynomial (parabola), etc. Discussion in

this book is restricted to linear relations between two variables.

During a study of hereditary characteristics, Sir Francis Galton found that the heights of different groups of sons had the tendency to regress, that is, to go back towards the overall average height of all groups of fathers. He called the lines of the average relationship the lines of regression. They are also referred to as estimating equations because, based on the value of one variable, one can predict or estimate the value of the other variable.

Suppose we are given n pairs of values $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ of two variables x and y. If we fit a straight line to these data by taking x as the independent variable and y as the dependent variable, then the straight line obtained is called the regression line of y on x. Its slope is called the regression coefficient of y on x. Similarly, if we fit a straight line to the data by taking y as the independent variable and x as the dependent variable, the line obtained is the regression line of x on y; the reciprocal of its slope is called the regression coefficient of x on y.

Equation for regression lines

Let

$$y = a + bx \qquad \ldots (1)$$

be the equation of the regression line of y on x, where a and b are determined by solving the normal equations obtained by the principle of least squares:

$$\sum y_i = na + b\sum x_i \qquad \ldots (2)$$

$$\sum x_i y_i = a\sum x_i + b\sum x_i^2 \qquad \ldots (3)$$

Dividing equation (2) by n, we get

$$\frac{1}{n}\sum y_i = a + b\,\frac{1}{n}\sum x_i$$

or

$$\bar{y} = a + b\bar{x} \qquad \ldots (4)$$

where $\bar{x}$ and $\bar{y}$ are the means of the x and y series. Substituting for a from (4) in (1), we get the equation

$$y - \bar{y} = b(x - \bar{x}) \qquad \ldots (5)$$

Solving equations (2) and (3) for b after eliminating a, we get the value of b as

$$b = \frac{n\sum x_i y_i - (\sum x_i)(\sum y_i)}{n\sum x_i^2 - (\sum x_i)^2} = \frac{\dfrac{\sum x_i y_i}{n} - \bar{x}\bar{y}}{\dfrac{\sum x_i^2}{n} - \bar{x}^2} \quad \text{(dividing each term by } n^2\text{)} = \frac{\operatorname{Cov}(x, y)}{\sigma_x^2} = \frac{P_{xy}}{\sigma_x^2}$$

where $P_{xy} = \operatorname{Cov}(x, y)$. Substituting b in (5), we get the regression equation of y on x as

$$y - \bar{y} = \frac{P_{xy}}{\sigma_x^2}(x - \bar{x}) \qquad \ldots (6)$$

Similarly, when x depends on y, the regression equation of x on y is obtained as

$$x - \bar{x} = \frac{P_{xy}}{\sigma_y^2}(y - \bar{y}) \qquad \ldots (7)$$

Let us denote $P_{xy}/\sigma_x^2$ as $b_{yx}$ and $P_{xy}/\sigma_y^2$ as $b_{xy}$. Thus

$$b_{yx} = \frac{P_{xy}}{\sigma_x^2}, \qquad b_{xy} = \frac{P_{xy}}{\sigma_y^2}$$

Here $b_{yx}$ is called the regression coefficient of y on x and $b_{xy}$ is called the regression coefficient of x on y. So we can rewrite the regression equation of y on x as

$$y - \bar{y} = b_{yx}(x - \bar{x})$$

and the regression equation of x on y as

$$x - \bar{x} = b_{xy}(y - \bar{y})$$

Some remarks

1. The slope of the regression line of y on x is $b_{yx} = r\,\sigma_y/\sigma_x$, and the slope of the regression line of x on y is the reciprocal of $b_{xy}$, which is $\sigma_y/(r\sigma_x)$.

2. Since $b_{yx} = r(\sigma_y/\sigma_x)$ and $\sigma_x$ and $\sigma_y$ are positive, it follows that r has the same sign as $b_{yx}$.

3. Since $b_{xy} = r(\sigma_x/\sigma_y)$, we readily find that $(b_{yx})(b_{xy}) = r^2$. Since $r^2 \ge 0$, $b_{xy}$ and $b_{yx}$ cannot have opposite signs. Thus r, $b_{xy}$ and $b_{yx}$ always have the same sign. Also $|r| = \sqrt{b_{yx} b_{xy}}$; that is, |r| is the geometric mean of $b_{xy}$ and $b_{yx}$. Since $|r| \le 1$, it follows that $|b_{xy}| < 1$ whenever $|b_{yx}| > 1$, and vice versa.

4. Since the arithmetic mean of two positive numbers is never less than their geometric mean, when $b_{yx}$ and $b_{xy}$ are positive we have

$$\frac{1}{2}(b_{yx} + b_{xy}) \ge \sqrt{b_{yx} b_{xy}} = r$$

Thus, the arithmetic mean of $b_{xy}$ and $b_{yx}$ is never less than the coefficient of correlation. (When both coefficients are negative, the same argument applied to $|b_{yx}|$ and $|b_{xy}|$ gives $\frac{1}{2}|b_{yx} + b_{xy}| \ge |r|$.)

5. The two lines of regression always pass through the point $(\bar{x}, \bar{y})$.

6. The regression equation of y on x is used for estimating or predicting the value of y for a given value of x, and the regression equation of x on y is used for estimating or predicting x for a specified value of y.

SOLVED PROBLEMS

Example 21

Calculate the coefficient of correlation for the following ages of husbands and wives.

Age of husband (x): 23 27 28 29 30 31 33 35 36 39
Age of wife (y): 18 22 23 24 25 26 28 29 30 32

Solution

We have

$$\bar{x} = \frac{1}{n}\sum X_i = \frac{311}{10} = 31.1, \qquad \bar{y} = \frac{1}{n}\sum Y_i = \frac{257}{10} = 25.7$$

We prepare the following table, where $x_i = X_i - \bar{x}$ and $y_i = Y_i - \bar{y}$.

| $X_i$ | $x_i$ | $x_i^2$ | $Y_i$ | $y_i$ | $y_i^2$ | $x_i y_i$ |
|---|---|---|---|---|---|---|
| 23 | −8.1 | 65.61 | 18 | −7.7 | 59.29 | 62.37 |
| 27 | −4.1 | 16.81 | 22 | −3.7 | 13.69 | 15.17 |
| 28 | −3.1 | 9.61 | 23 | −2.7 | 7.29 | 8.37 |
| 29 | −2.1 | 4.41 | 24 | −1.7 | 2.89 | 3.57 |
| 30 | −1.1 | 1.21 | 25 | −0.7 | 0.49 | 0.77 |
| 31 | −0.1 | 0.01 | 26 | 0.3 | 0.09 | −0.03 |
| 33 | 1.9 | 3.61 | 28 | 2.3 | 5.29 | 4.37 |
| 35 | 3.9 | 15.21 | 29 | 3.3 | 10.89 | 12.87 |
| 36 | 4.9 | 24.01 | 30 | 4.3 | 18.49 | 21.07 |
| 39 | 7.9 | 62.41 | 32 | 6.3 | 39.69 | 49.77 |
| Total | | 202.90 | | | 158.10 | 178.30 |

Now,

$$r = \frac{\sum x_i y_i}{\sqrt{\sum x_i^2 \sum y_i^2}} = \frac{178.30}{\sqrt{202.90 \times 158.10}} = 0.9955$$

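Several computations in this chapter can be cross-checked with a short script. The sketch below is in Python, chosen only for illustration (the text itself contains no code), and the function names `pearson_and_regression`, `ranks_descending` and `spearman_r` are our own. It computes r and the two regression coefficients from deviations about the means, as in Example 21, checks Remark 3 ($b_{yx} b_{xy} = r^2$), and applies Spearman's formula to the marks of Example 13, assuming, as in that example, that there are no tied marks.

```python
from math import sqrt
from statistics import fmean


def pearson_and_regression(xs, ys):
    """Return (r, b_yx, b_xy) computed from deviations about the means."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 pairs)")
    x_bar, y_bar = fmean(xs), fmean(ys)
    dx = [x - x_bar for x in xs]
    dy = [y - y_bar for y in ys]
    sxy = sum(a * b for a, b in zip(dx, dy))  # n * Cov(x, y)
    sxx = sum(a * a for a in dx)              # n * var(x)
    syy = sum(b * b for b in dy)              # n * var(y)
    r = sxy / sqrt(sxx * syy)
    return r, sxy / sxx, sxy / syy            # r, b_yx, b_xy


def ranks_descending(values):
    """Rank 1 for the largest value; assumes there are no tied values."""
    if len(set(values)) != len(values):
        raise ValueError("tied values need average ranks, not handled here")
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]


def spearman_r(xs, ys):
    """R = 1 - 6 * sum(d^2) / (n (n^2 - 1)), with d the rank difference."""
    rx, ry = ranks_descending(xs), ranks_descending(ys)
    n = len(rx)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


# Example 21: ages of husbands (x) and wives (y)
husband = [23, 27, 28, 29, 30, 31, 33, 35, 36, 39]
wife = [18, 22, 23, 24, 25, 26, 28, 29, 30, 32]
r, b_yx, b_xy = pearson_and_regression(husband, wife)
print(f"r = {r:.4f}, b_yx = {b_yx:.4f}, b_xy = {b_xy:.4f}")
print(f"b_yx * b_xy = {b_yx * b_xy:.4f}, r^2 = {r * r:.4f}")  # Remark 3

# Example 13: Spearman's R equals Pearson's r computed on the ranks
maths = [78, 36, 98, 25, 75, 82, 90, 62, 65, 39]
stats = [84, 51, 91, 60, 68, 62, 86, 58, 53, 47]
R = spearman_r(maths, stats)
r_ranks, _, _ = pearson_and_regression(ranks_descending(maths), ranks_descending(stats))
print(f"Spearman R = {R:.4f}, Pearson r on ranks = {r_ranks:.4f}")
```

Run as is, the script should report r ≈ 0.9955, in agreement with Example 21, and R = 9/11 ≈ 0.8182 for Example 13, both from Spearman's formula and from Pearson's r applied to the ranks, which is exactly the identity used in the derivation. Tied values would need average ranks, which this sketch deliberately rejects.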
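Example 22

Calculate the coefficient of correlation for the following data.

x: 6 2 10 4 8
y: 9 11 5 8 7

Solution

Here we prepare the following table.

| X | Y | X² | Y² | XY |
|---|---|---|---|---|
| 6 | 9 | 36 | 81 | 54 |
| 2 | 11 | 4 | 121 | 22 |
| 10 | 5 | 100 | 25 | 50 |
| 4 | 8 | 16 | 64 | 32 |
| 8 | 7 | 64 | 49 | 56 |
| Total: 30 | 40 | 220 | 340 | 214 |

$$r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{n\sum X^2 - (\sum X)^2}\,\sqrt{n\sum Y^2 - (\sum Y)^2}} = \frac{5 \times 214 - 30 \times 40}{\sqrt{5 \times 220 - 30^2}\,\sqrt{5 \times 340 - 40^2}} = \frac{-130}{\sqrt{200}\,\sqrt{100}} = -0.919$$

Example 23

Find the correlation coefficient between X and Y given

x: 10 16 13 12 15 17 14
y: 20 33 25 27 26 30 30

Solution

Here we prepare the following table, with $u_i = X_i - 14$ and $v_i = Y_i - 25$.

| X | Y | $u_i$ | $v_i$ | $u_i^2$ | $v_i^2$ | $u_i v_i$ |
|---|---|---|---|---|---|---|
| 10 | 20 | −4 | −5 | 16 | 25 | 20 |
| 16 | 33 | 2 | 8 | 4 | 64 | 16 |
| 13 | 25 | −1 | 0 | 1 | 0 | 0 |
| 12 | 27 | −2 | 2 | 4 | 4 | −4 |
| 15 | 26 | 1 | 1 | 1 | 1 | 1 |
| 17 | 30 | 3 | 5 | 9 | 25 | 15 |
| 14 | 30 | 0 | 5 | 0 | 25 | 0 |
| Total | | −1 | 16 | 35 | 144 | 48 |

Since the correlation coefficient is independent of change of origin,

$$r_{xy} = r_{uv} = \frac{n\sum u_i v_i - (\sum u_i)(\sum v_i)}{\sqrt{n\sum u_i^2 - (\sum u_i)^2}\,\sqrt{n\sum v_i^2 - (\sum v_i)^2}} = \frac{7 \times 48 - (-1) \times 16}{\sqrt{7 \times 35 - (-1)^2}\,\sqrt{7 \times 144 - 16^2}} = \frac{336 + 16}{\sqrt{245 - 1}\,\sqrt{1008 - 256}} = \frac{352}{\sqrt{244}\,\sqrt{752}} = 0.82$$

Example 24

Calculate the rank correlation coefficient from the following data specifying the ranks of 7 students in two subjects.

Rank in the first subject: 1 2 3 4 5 6 7
Rank in the second subject: 4 3 1 2 6 5 7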


Solution

Here n = 7. Let x and y denote respectively the ranks in the first and second subjects. We prepare the following table.

| $x_i$ | $y_i$ | $d_i = x_i - y_i$ | $d_i^2$ |
|---|---|---|---|
| 1 | 4 | −3 | 9 |
| 2 | 3 | −1 | 1 |
| 3 | 1 | 2 | 4 |
| 4 | 2 | 2 | 4 |
| 5 | 6 | −1 | 1 |
| 6 | 5 | 1 | 1 |
| 7 | 7 | 0 | 0 |
| Total | | | 20 |

Spearman's rank correlation coefficient is

$$R = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6 \times 20}{7 \times (7^2 - 1)} = 0.643$$

Example 25

Find the rank correlation coefficient between marks in two subjects A and B scored by 10 students.

A: 88 72 95 60 35 46 52 58 30 67
B: 65 90 86 72 30 54 38 43 48 75

Solution

The following table is prepared.

| A | B | Rank in A | Rank in B | $d_i$ | $d_i^2$ |
|---|---|---|---|---|---|
| 88 | 65 | 2 | 5 | −3 | 9 |
| 72 | 90 | 3 | 1 | 2 | 4 |
| 95 | 86 | 1 | 2 | −1 | 1 |
| 60 | 72 | 5 | 4 | 1 | 1 |
| 35 | 30 | 9 | 10 | −1 | 1 |
| 46 | 54 | 8 | 6 | 2 | 4 |
| 52 | 38 | 7 | 9 | −2 | 4 |
| 58 | 43 | 6 | 8 | −2 | 4 |
| 30 | 48 | 10 | 7 | 3 | 9 |
| 67 | 75 | 4 | 3 | 1 | 1 |
| Total | | | | | 38 |

$$R = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6 \times 38}{10 \times (10^2 - 1)} = 1 - 0.2303 = 0.7697$$

Example 26

The coefficient of rank correlation of marks obtained by 10 students in two subjects was computed as 0.5. It was later discovered that the difference in ranks in the two subjects obtained by one of the students was wrongly taken as 3 instead of 7. Find the correct coefficient of rank correlation.

Solution

Here R = 0.5 and n = 10. Then we have

$$0.5 = 1 - \frac{6\sum d^2}{10(10^2 - 1)}$$

or $\sum d^2 = 82.5$.

Deleting the wrong item from this and adding the correct item to it, we obtain the corrected

$$\sum d^2 = 82.5 - 3^2 + 7^2 = 122.5$$

Consequently, the correct coefficient of rank correlation is

$$R = 1 - \frac{6 \times 122.5}{10(10^2 - 1)} = 0.2576$$

Example 27

The following are the data on the average height of the plants and the weight of yield per plot recorded from 10 plots of a rice crop.

Height (X) (cm): 28 26 32 31 37 29 36 34 39 40
Yield (Y) (kg): 75 74 82 81 90 80 88 85 92 95

Find (i) the correlation coefficient between X and Y, (ii) the regression coefficients, and hence write down the regression equation of y on x and that of x on y, and (iii) the probable value of the yield of a plot having an average plant height of 98 cm.

Solution

Here we prepare the following table, taking $u_i = X_i - 34$ and $v_i = Y_i - 80$.

| X | Y | $u_i$ | $v_i$ | $u_i^2$ | $v_i^2$ | $u_i v_i$ |
|---|---|---|---|---|---|---|
| 28 | 75 | −6 | −5 | 36 | 25 | 30 |
| 26 | 74 | −8 | −6 | 64 | 36 | 48 |
| 32 | 82 | −2 | 2 | 4 | 4 | −4 |
| 31 | 81 | −3 | 1 | 9 | 1 | −3 |
| 37 | 90 | 3 | 10 | 9 | 100 | 30 |
| 29 | 80 | −5 | 0 | 25 | 0 | 0 |
| 36 | 88 | 2 | 8 | 4 | 64 | 16 |
| 34 | 85 | 0 | 5 | 0 | 25 | 0 |
| 39 | 92 | 5 | 12 | 25 | 144 | 60 |
| 40 | 95 | 6 | 15 | 36 | 225 | 90 |
| Total | | −8 | 42 | 212 | 624 | 267 |

(i) Since the correlation coefficient is independent of change of origin,

$$r_{xy} = r_{uv} = \frac{n\sum u_i v_i - (\sum u_i)(\sum v_i)}{\sqrt{n\sum u_i^2 - (\sum u_i)^2}\,\sqrt{n\sum v_i^2 - (\sum v_i)^2}} = \frac{10 \times 267 - (-8)(42)}{\sqrt{10 \times 212 - (-8)^2}\,\sqrt{10 \times 624 - 42^2}} = \frac{3006}{\sqrt{2056}\,\sqrt{4476}} = \frac{3006}{45.343 \times 66.903} = 0.991$$

(ii) Regression coefficients are also independent of change of origin, so they can be computed from u and v. The regression coefficient of y on x is

$$b_{yx} = \frac{n\sum u_i v_i - (\sum u_i)(\sum v_i)}{n\sum u_i^2 - (\sum u_i)^2} = \frac{3006}{2056} = 1.462$$

The regression coefficient of x on y is

$$b_{xy} = \frac{n\sum u_i v_i - (\sum u_i)(\sum v_i)}{n\sum v_i^2 - (\sum v_i)^2} = \frac{3006}{4476} = 0.672$$

Taking A = 34 and B = 80 as the working origins,

$$\bar{x} = A + \frac{\sum u_i}{n} = 34 + \frac{-8}{10} = 33.2, \qquad \bar{y} = B + \frac{\sum v_i}{n} = 80 + \frac{42}{10} = 84.2$$

The regression equation of y on x is

$$y - \bar{y} = b_{yx}(x - \bar{x})$$

i.e., $y - 84.2 = 1.462(x - 33.2)$, i.e., $y = 1.462x + 35.66$.

The regression equation of x on y is

$$x - \bar{x} = b_{xy}(y - \bar{y})$$

i.e., $x - 33.2 = 0.672(y - 84.2)$, i.e., $x = 0.672y - 23.38$.

(iii) To estimate the yield (y), we use the regression equation of y on x,

$$y = 1.462x + 35.66$$

When x = 98, $y = 1.462 \times 98 + 35.66 = 178.94$ kg.

Example 28

For the regression lines $4x - 5y + 33 = 0$ and $20x - 9y - 107 = 0$, find (a) the mean values of x and y, (b) the coefficient of correlation between x and y, and (c) the variance of y, given that the variance of x is 9.

Solution

Since the lines of regression pass through $(\bar{x}, \bar{y})$, we have

$$4\bar{x} - 5\bar{y} + 33 = 0$$

$$20\bar{x} - 9\bar{y} - 107 = 0$$

Solving these equations, we get the mean values of x and y as $\bar{x} = 13$, $\bar{y} = 17$. We rewrite the given equations respectively as

$$y = \frac{4}{5}x + \frac{33}{5}, \qquad x = \frac{9}{20}y + \frac{107}{20}$$

so that $b_{yx} = \frac{4}{5}$ and $b_{xy} = \frac{9}{20}$. (The opposite assignment of the two lines would give $b_{yx} b_{xy} = \frac{20}{9} \times \frac{5}{4} > 1$, which is impossible since $r^2 \le 1$.) Therefore, the coefficient of correlation between x and y is

$$r = \sqrt{b_{xy} b_{yx}} = \sqrt{\frac{9}{20} \times \frac{4}{5}} = 0.6$$

Here the positive sign is taken since both $b_{xy}$ and $b_{yx}$ are positive.

Since $b_{yx} = r\,\dfrac{\sigma_y}{\sigma_x} = \dfrac{4}{5}$ and $\sigma_x^2 = 9$ (given), we get

$$\sigma_y = \frac{4\sigma_x}{5r} = \frac{4 \times 3}{5 \times 0.6} = 4$$

Thus, the variance of y is $\sigma_y^2 = 16$.

EXERCISES

Multiple Choice Questions

1. The idea of product moment correlation was given by
a) R.A. Fisher b) Sir Francis Galton
c) Karl Pearson d) Spearman

2. Correlation coefficient was invented in this year
a) 1910 b) 1890
c) 1908 d) None of the above

3. The unit of correlation coefficient is
a) kg/cc b) percent
c) non-existing d) none of the above

4. If r = 0, the variables X and Y are
a) linearly related b) independent
c) not linearly related d) none of the above

5. In a scatter diagram, if all dots lie on a line falling from the left-hand top to the right-hand bottom, then the value of r is
a) +1 b) 0
c) −1 d) 1

6. The formula for rank correlation coefficient is
a) $1 - \dfrac{6\sum d^2}{n(n^2 - 1)}$ b) $1 + \dfrac{6\sum d^2}{n(n^2 - 1)}$
c) $1 - \dfrac{\sum d^2}{n(n^2 - 1)}$ d) None of the above

7. Rank correlation coefficient was discovered by
a) Charles Spearman b) Karl Pearson
c) R.A. Fisher d) Francis Galton

8. In a regression line of y on x, the variable x is known as
a) independent variable b) regressor
c) explanatory variable d) all the above

9. If $b_{yx}$ and $b_{xy}$ are two regression coefficients, they have
a) same sign b) opposite sign
c) either same or opposite signs d) nothing can be said

10. If $r_{xy} = -1$, the relation between X and Y is of the type
a) When Y increases, X also increases
b) When Y decreases, X also decreases
c) X is equal to Y
d) When Y increases, X proportionately decreases

11. The correlation between two variables is of order
a) 2 b) 1
c) 0 d) none of the above

12. If the simple correlation coefficient is zero, then the regression coefficient is equal to .............
a. 1 b. 2
c. 0 d. −1

13. Correlation coefficient is a ............. number.
a. imaginary b. unit based
c. pure d. None of these

14. If $b_{yx} > 1$, then $b_{xy}$ is
a) less than 1 b) greater than 1
c) equal to 1 d) equal to 0

15. Given the regression lines $X + 2Y - 5 = 0$, $2X + 3Y - 8 = 0$ and $\sigma_x^2 = 12$, the value of $\sigma_y^2$ is
a) 16 b) 4 c) 3/4 d) 4/3

Very Short Answer Questions

16. What is correlation?
17. Enumerate the different types of correlation.
18. What is meant by perfect correlation?
19. What is meant by spurious correlation?
20. Give the formula for the product moment correlation coefficient.
21. Give the significance of the values r = +1, r = −1 and r = 0.
22. What is the use of a scatter diagram?
23. What are the advantages of the rank correlation coefficient?
24. Why are there two regression lines?
25. What are regression coefficients?


Short Essay Questions

26. Distinguish between correlation and regression.
27. Define the coefficient of correlation and show that it is independent of change of origin and unit of measurement.
28. Explain how the coefficient of correlation measures the linear relationship between two variables.
29. Define (1) line of regression and (2) regression coefficient. Show that the coefficient of correlation is the geometric mean of the coefficients of regression.
30. State the important properties of regression coefficients. Prove any one of these properties.
31. What are regression lines? Why are there two regression lines?
32. What is correlation? Enunciate the different types of correlation between two variables.

Long Essay Questions

33. Compute the coefficient of correlation between X and Y presented in the table below:
X: 1 3 4 6 8 9 11 14
Y: 1 2 4 4 5 7 8 9
34. Find the correlation coefficient between x and y given the following sets of values of x and y:
x: 1 2 4 5 8 9
y: 4 6 7 10 11 15
35. From the following information, obtain the correlation coefficient:
$N = 12$; $\sum x = 30$; $\sum y = 5$; $\sum x^2 = 670$; $\sum y^2 = 285$; $\sum xy = 334$.
36. For the following pairs of values, obtain the correlation coefficient:
X: 4 6 5 9 6 11 8
Y: 6 14 10 17 12 18 14
37. Calculate the coefficient of correlation for the following data:
x: 28 45 40 38 35 33 40 32 36 33
y: 23 34 33 34 30 26 28 31 36 35

MODULE III

PROBABILITY THEORY

CLASSICAL DEFINITION OF PROBABILITY

Introduction

In everyday language, the word probability describes events that do not occur with certainty. When we look at the world around us, we have to conclude that our world functions more on uncertainty than on certainty. Thus we speak of the probability of rain tomorrow, the probability that an electric appliance will be defective, or even the probability of nuclear war. The concept of probability has been an object of debate among philosophers, logicians, mathematicians, statisticians, physicists and psychologists for the last couple of centuries, and this debate is not likely to be over in the foreseeable future.

Probability is a number associated with an event, intended to represent its 'likelihood', 'chance of occurring', 'degree of uncertainty' and so on. Probability theory has its origin in 'games of chance'. Now it has become a fundamental tool of scientific thinking.

Classical Definition of Probability

Some Important Concepts

1. Random experiment

It is a physical phenomenon at the completion of which we observe certain results. There are some experiments, called deterministic experiments, whose outcomes can be predicted. But in some cases, we can never predict the outcome before the experiment is performed. An experiment, whether natural, conceptual, physical or hypothetical, is called a random experiment if the exact outcome of the trials of the experiment is unpredictable. In other words, by a random experiment we mean an experiment such that
1. It should be repeatable under uniform conditions.
2. It should have several possible outcomes.
3. One should not be able to predict the outcome of a particular trial.

Example: Tossing a coin, rolling a die, the lifetime of a machine, the length of tables, the weight of a newborn baby, the weather condition of a certain region, etc.

2. Trial and Event

A trial is an attempt to produce an outcome of a random experiment. For example, if we toss a coin or throw a die, we are performing trials. The outcomes in an experiment are termed events or cases. For example, getting a head or a tail in tossing a coin is an event. Usually events are denoted by capital letters like A, B, C, etc.

3. Equally likely events

Events or cases are said to be equally likely when we have no reason to expect one rather than the other. For example, in tossing an unbiased coin the two events head and tail are equally likely because we have no reason to expect head rather than tail. Similarly, when we throw a die, the occurrences of the numbers 1, 2, 3, 4, 5 or 6 are equally likely events.

4. Exhaustive events

The set of all possible outcomes in a trial constitutes the set of exhaustive cases. In other words, the totality of all possible outcomes of a random experiment forms the exhaustive cases. For example, in the case of tossing a coin there are two exhaustive cases, head or tail. In throwing a die there are six exhaustive cases, since any one of the six faces 1, 2, ..., 6 may come uppermost. In the random experiment of throwing two dice, the number of exhaustive cases is $6^2 = 36$. In general, in throwing n dice, the number of exhaustive cases is $6^n$.

5. Mutually exclusive events

Events are said to be mutually exclusive (or incompatible, or disjoint) if the happening of any one of them precludes the happening of all the others in a trial, that is, if no two or more of them can happen simultaneously in the same trial. For example, the events of turning up a head or a tail in tossing a coin are mutually exclusive. In throwing a die, all the six faces numbered 1 to 6 are mutually exclusive since, if any one of these faces comes up, the possibility of the others in the same trial is ruled out.

6. Favourable cases

The cases which entail the occurrence of an event are said to be favourable to the event. For example, while throwing a die, the occurrences of 2, 4 or 6 are the favourable cases which entail the occurrence of an even number.

Classical Definition (Mathematical or 'a priori')

The classical definition is the oldest and simplest definition of probability. It is sometimes called the equally-likely events approach. It is also known as the Laplace definition. From a practical point of view it is the most useful definition of probability.

Definition

If a trial results in n mutually exclusive, equally likely and exhaustive cases and m of them are favourable ($m \le n$) to the happening of an event A, then the probability of A, designated as P(A), is defined as

$$P(A) = \frac{m}{n} = \frac{\text{number of favourable cases}}{\text{total number of cases}} \qquad \ldots (1)$$

Obviously, $0 \le P(A) \le 1$.

Note 1
If A is an impossible event, then P(A) = 0.
If A is a sure event, then P(A) = 1.
If A is a random event, then 0 < P(A) < 1.

Note 2
We can express the probability given by (1) by saying that the odds in favour of A are m : (n − m), or the odds against A are (n − m) : m.

Limitations of classical definition

The above definition of mathematical probability fails in the following cases.
1. In the classical or a priori definition of probability, only equally likely cases are taken into consideration. If the events cannot be considered equally likely, the classical definition fails to give a good account of the concept of probability.

2. When the total number of possible outcomes n becomes infinite or countably infinite, this definition fails to give a measure of probability.
3. If we deviate from games of chance like tossing a coin, throwing a die, etc., this definition cannot be applied.
4. Another limitation is that it does not contribute much to the growth of probability theory.

Frequency Definition of Probability

Let the trials be repeated a large number of times under essentially homogeneous conditions. The limit of the ratio of the number of times an event A happens (m) to the total number of trials (n), as the number of trials tends to infinity, is called the probability of the event A. It is, however, assumed that the limit is unique as well as finite.

Symbolically,

$$P(A) = \lim_{n \to \infty} \frac{m}{n}$$

Remark 1. The application of this definition to many problems cannot be extensive, since n is usually finite and the limit of the ratio cannot normally be taken, as it leads to mathematical difficulties. Besides, the definition of probability thus proposed by Von Mises would involve a mixture of empirical and theoretical concepts, which is usually avoided in the modern axiomatic approach.

Remark 2. The two definitions of probability are apparently different. The mathematical definition is the relative frequency of favourable cases to the total number of cases, while in the statistical definition it is the limit of the relative frequency of the happening of the event.

Set Theory

Set: A set is a collection of well-defined objects. The following are typical examples of sets.
1. The students Minu, Jithu, Hari and Devi.
2. The odd numbers 1, 3, 5, 7, 9.
3. The rivers in South India.
4. The metropolitan cities of India.

Sets are usually denoted by capital letters A, B, C, X, Y, Z, etc. The items which are included in a set are called the elements of the set. If A = {3, 5, 12, 14, 21} is a set, then 3 is an element of set A, and this is written as 3 ∈ A. This is read as 'element 3 belongs to set A'. Thus the symbol '∈' denotes 'belongs to'. On the other hand, 8 does not belong to set A in the above case; the symbol '∉' is used to indicate 'does not belong to', i.e., 8 ∉ A implies that element 8 is not a member of set A.

There are two methods of representing sets, viz.
1. Roster method 2. Rule method

1. Roster Method

Here each and every element of the set is listed.
Example: i. A = {a, e, i, o, u} ii. B = {2, 3, 5, 7} iii. Y = {6, 1, 5, 2, 4, 3}

Note
The flower brackets { } are used for denoting a set. The order in which the elements of a set are listed in the { } brackets is immaterial.

2. Rule Method

Here a rule is stated by which all the elements of the set are intended to be identified.
Example: A = {x / x is a vowel among English alphabets}
This is read as 'set A is the set of all x such that x is a vowel among English alphabets'.

Types of Sets

We have different kinds of sets. Consider the following.

1. Finite Set

A set which contains a finite number of elements is called a 'finite set'.
Example:
i. Set A has only five elements, i.e., A = {1, 2, 6, 8, 10}
ii. B = {x / x is a composite number between 12 and 18}, i.e., B = {14, 15, 16}


iii. Y = {x / x shows a number on a die}. This is the same as Y = {1, 2, 3, 4, 5, 6}.

2. Infinite Set

A set which contains an infinite number of elements is called an 'infinite set'.
Example:
i. X = {x / x is a natural number}, i.e., X = {1, 2, 3, 4, ...}
ii. Y = {..., −2, −1, 0, +1, +2, ...}

3. Singleton Set

A set containing only one element is called a 'singleton set'.
Example:
i. A = {0}
ii. B = {x / x is an even number between 3 and 5}, i.e., B = {4}

4. Null Set

A set which does not contain any element is called an 'empty set', 'void set' or 'null set'.
Example:
i. Let A denote the set of names of boys in a girls' college. Then A = { }, since no boy is admitted to a girls' college.
ii. T = {x / x is a perfect square between 10 and 15}, i.e., T = { }, since no perfect square exists between 10 and 15.

A null set is denoted by the Greek letter φ (read as phi).
Example: T = φ implies 'set T is a null set'. But T = {φ} implies 'set T is a singleton set with φ as an element'.

5. Universal Set

A universal set is the set of all elements which are taken into consideration in a discussion. It is usually denoted by the capital letter U or is otherwise defined in the context. In this text we shall use S to indicate a universal set, since it is more convenient for application to probability. For instance, let S = {1, 2, 3, 4, 5, 6} be a universal set showing the possible numbers on a die.

6. Sub-sets and Super-sets

Let A and B be two sets. If every element of B is present in A, then B is a 'subset' of A, i.e., B ⊆ A. In other words, A is a 'superset' of B, i.e., A ⊇ B.
Example:
i. If A = {a, b, c, d, e} and B = {a, d}, then B ⊆ A or A ⊇ B.
ii. If A = {2, 4} and B = {1, 2, 3, 4, 5}, then A ⊆ B or B ⊇ A.

7. Equal Sets

Two sets A and B are said to be equal if A ⊆ B and B ⊆ A, and this is denoted by A = B.
Example: Let A = {3, 2, 5, 6} and B = {2, 5, 6, 3}. Here all the elements of A are elements of B (i.e., A ⊆ B) and all the elements of B are elements of A (i.e., B ⊆ A). Hence A = B.

8. Equivalent Sets

Two sets A and B are said to be equivalent if they have an equal number of elements, and this is denoted by A ~ B. For example,


let A = {X, Y, Z} and B = {1, 2, 3}. Then A and B are said to be equivalent sets, and this is denoted by A ~ B.

9. Power Set

The power set is defined as the collection of all subsets of a given set. It is also called the master set.
Example: The power set of the set {a, b, c} is {φ, {a}, {b}, {c}, {a, b}, {b, c}, {c, a}, {a, b, c}}.
The number of elements in a set is called the cardinality of the set. Thus the cardinality of the power set of a set having 3 elements is $2^3 = 8$. Generally, the cardinality of the power set of a set having n elements is $2^n$.

Venn Diagrams

Sets can be represented diagrammatically using Venn diagrams. These were introduced by John Venn, an English logician. Here the universal set is represented by a rectangle and all other subsets by circles, triangles, etc. Venn diagrams are especially useful for representing various set operations. Hence we first learn about set operations and then employ the Venn diagram approach to represent them.

Set Operations

The basic set operations are (i) union, (ii) intersection, (iii) difference and (iv) complement.

1. Union of Sets

If A and B are two sets, then the union of sets A and B is the set of all elements which belong to either A or B or both (i.e., which belong to at least one of them). It is denoted by A ∪ B.
That is, x ∈ A ∪ B implies x ∈ A or x ∈ B.
Example: If A = {3, 8, 5} and B = {3, 6, 8}, then A ∪ B = {3, 8, 5} ∪ {3, 6, 8} = {3, 8, 5, 6}.

2. Intersection of Sets

If A and B are two sets, then the intersection of A and B is the set of all elements which are common to both of them. The intersection of sets A and B is denoted by A ∩ B.
That is, x ∈ A ∩ B implies x ∈ A and x ∈ B.
Example: If A = {2, 5} and B = {5, 7, 9}, then A ∩ B = {2, 5} ∩ {5, 7, 9} = {5}.

Disjoint Sets

Two sets are said to be 'disjoint' or 'mutually exclusive' if they do not have any common element, i.e., A ∩ B = φ or A ∩ B = { }, a null set.
Example: If A = {1, 2} and B = {a, b, c}, then A ∩ B = φ.

3. Difference of Sets

If A and B are two sets, A − B is the difference of the two sets A and B, which contains all elements which belong to A but not to B.
That is, x ∈ A − B implies x ∈ A and x ∉ B.
Example: If A = {0, 1, 2, 3} and B = {2, 3, 5, 7}, then A − B = {0, 1}.

4. Complement of Sets

Suppose A is a subset of some universal set S. Its complementary set is the set of all elements of the universal set S which do not belong to the set A. The complementary set of A is denoted by A′ (A dash), Ā (A bar) or $A^c$ (A complement).
That is, $x \in A^c$ implies $x \notin A$ but $x \in S$.
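The set operations just defined correspond to Python's built-in `set` operators `|` (union), `&` (intersection) and `-` (difference). The sketch below, in Python purely for illustration, reproduces the examples of this section; `complement` is a hypothetical helper of our own, written to take the universal set S explicitly, because a complement is only defined relative to a known S.

```python
def complement(a, universe):
    """Return A^c: the elements of the universal set S that are not in A."""
    if not a <= universe:
        raise ValueError("A must be a subset of the universal set S")
    return universe - a


# Worked examples from the text
union = {3, 8, 5} | {3, 6, 8}                  # A union B
intersection = {2, 5} & {5, 7, 9}              # A intersection B
difference = {0, 1, 2, 3} - {2, 3, 5, 7}       # A - B
comp = complement({2, 4}, {1, 2, 3, 4, 5})     # A^c with S = {1, 2, 3, 4, 5}
disjoint = {1, 2}.isdisjoint({"a", "b", "c"})  # True when A and B share nothing

print("A union B        =", sorted(union))
print("A intersection B =", sorted(intersection))
print("A - B            =", sorted(difference))
print("A complement     =", sorted(comp))
print("disjoint:", disjoint)
```

Sorting is only for display: a Python set has no fixed order, just as the order of elements inside { } is immaterial.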


Note

The complement of a set A cannot be found unless the elements of the universal set S are known.

Example: If S = {1, 2, 3, 4, 5} and A = {2, 4}, then $A^c$ = {1, 3, 5}.

Algebra of Sets

The following results are very useful in the context of probability theory. We can easily verify the results by choosing the sets appropriately.
1. A ∪ B = B ∪ A, A ∩ B = B ∩ A (commutative property)
2. (A ∪ B) ∪ C = A ∪ (B ∪ C), (A ∩ B) ∩ C = A ∩ (B ∩ C) (associative property)
3. (A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C), (A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C) (distributive property)
4. A ∪ S = S, A ∩ S = A; A ∪ φ = A, A ∩ φ = φ
5. A ∪ $A^c$ = S, A ∩ $A^c$ = φ
6. $(A \cup B)^c = A^c \cap B^c$, $(A \cap B)^c = A^c \cup B^c$

More generally,

$$\left(\bigcup_{i=1}^{n} A_i\right)^c = \bigcap_{i=1}^{n} A_i^c, \qquad \left(\bigcap_{i=1}^{n} A_i\right)^c = \bigcup_{i=1}^{n} A_i^c \qquad \text{(De Morgan's laws)}$$

Set terminology

The following terminology is used in calculating the probability of occurrence of events, where the events are represented by sets.
1. For one event A
i. Occurrence of the event: A
ii. Non-occurrence of the event: $A^c$
2. For two events A and B
i. Occurrence of none: $A^c \cap B^c$
ii. Occurrence of both A and B: $A \cap B$
iii. Occurrence of exactly one: $(A \cap B^c) \cup (A^c \cap B)$
iv. Occurrence of at least one: $A \cup B$
3. For three events A, B and C
i. Occurrence of all: $A \cap B \cap C$
ii. Occurrence of none: $A^c \cap B^c \cap C^c$
iii. Occurrence of exactly one: $(A \cap B^c \cap C^c) \cup (A^c \cap B \cap C^c) \cup (A^c \cap B^c \cap C)$
iv. Occurrence of exactly two: $(A \cap B \cap C^c) \cup (A \cap B^c \cap C) \cup (A^c \cap B \cap C)$
v. Occurrence of at least one: $A \cup B \cup C$

Permutations and Combinations

Fundamental Principle

If an event A can happen in $n_1$ ways and another event B can happen in $n_2$ ways, then the number of ways in which both the events A and B can happen in a specified order is $n_1 \times n_2$.
For example, if there are three routes from X to Y and two routes from Y to Z, then the destination Z can be reached from X in 3 × 2 = 6 ways.
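The fundamental principle can be confirmed by listing every journey explicitly. In this Python sketch the route labels `r1`, `r2`, `r3` (X to Y) and `s1`, `s2` (Y to Z) are hypothetical names made up for illustration; `itertools.product` pairs each route of the first stage with each route of the second, so the number of journeys is $n_1 \times n_2$.

```python
from itertools import product

# Hypothetical labels: three routes from X to Y, two routes from Y to Z
x_to_y = ["r1", "r2", "r3"]
y_to_z = ["s1", "s2"]

# Every journey is an ordered pair (route X -> Y, route Y -> Z)
journeys = list(product(x_to_y, y_to_z))
for first, second in journeys:
    print(f"X -{first}-> Y -{second}-> Z")

# Fundamental principle: n1 * n2 ways
print(len(journeys), "=", len(x_to_y), "x", len(y_to_z))
```

The loop prints the six journeys, from `X -r1-> Y -s1-> Z` to `X -r3-> Y -s2-> Z`. Passing more lists to `product` extends the same count to more than two stages.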

Permutation
Definition: A permutation is an arrangement which can be made by taking some (say r) or all of n things at a time, with attention given to the order of arrangement of the selected objects.
Mathematicians use a neat notation for the permutation (i.e., arrangement) of n objects taking r objects at a time, writing it as ${}^{n}P_{r}$ or ${}_{n}P_{r}$. Here the letter P stands for "permutation" (i.e., a rule for arrangement).
Suppose we want to arrange 3 students A, B and C by choosing 2 of them at a time. This arrangement can be done in the following ways:
AB, BC, CA, BA, CB and AC.
The arrangement of 3 things taken 2 at a time is denoted by ${}^{3}P_{2}$. Therefore, ${}^{3}P_{2} = 6 = 3 \times 2$.
In general, suppose there are n objects to be permuted in a row taking all at a time. This can be done in ${}^{n}P_{n}$ different ways, given by
$${}^{n}P_{n} = n(n-1)(n-2)\cdots 3 \cdot 2 \cdot 1.$$
Example: ${}^{4}P_{4} = 4 \cdot 3 \cdot 2 \cdot 1 = 24$.
The number of permutations of n things taken r at a time (r < n) is given by
$${}^{n}P_{r} = n(n-1)\cdots(n-r+1).$$
For example, ${}^{7}P_{5} = 7 \times 6 \times 5 \times 4 \times 3 = 2520$.
Factorial notation
We have a compact notation for the full product $n(n-1)(n-2)\cdots 3 \cdot 2 \cdot 1$. It is written as $n!$ and read as "n factorial". So
$${}^{n}P_{n} = n! = n(n-1)(n-2)\cdots 3 \cdot 2 \cdot 1.$$
For example, ${}^{6}P_{6} = 6! = 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1 = 720$. By definition, $0! = 1$.
We have
$${}^{n}P_{r} = n(n-1)\cdots(n-r+1) = \frac{n(n-1)(n-2)\cdots(n-r+1)\,(n-r)(n-r-1)\cdots 3 \cdot 2 \cdot 1}{(n-r)(n-r-1)\cdots 3 \cdot 2 \cdot 1} = \frac{n!}{(n-r)!}.$$
Results
1. The number of permutations of n objects taken r at a time when repetition is allowed is $n^r$.
2. The number of permutations of n objects taken all n at a time when repetition is allowed is $n^n$.
3. The number of permutations of n objects of which $n_1$ are of one kind, $n_2$ are of another kind, $n_3$ are of another kind, and so on, taking all n together, is
$$\frac{n!}{n_1!\,n_2!\,n_3!\cdots n_k!}, \quad \text{where } n_1 + n_2 + \cdots + n_k = n.$$
Combination
A combination is a grouping, selection or collection of all or part of a given number of things without reference to their order of arrangement.
If three letters a, b, c are given, ab, bc, ca are the only combinations of the three things taken two at a time, and their number is denoted by ${}^{3}C_{2}$. The other permutations ba, cb and ac are not new combinations; they are obtained by permuting the letters within each combination. So
$${}^{3}P_{2} = {}^{3}C_{2} \times 2!, \quad \text{or} \quad {}^{3}C_{2} = \frac{{}^{3}P_{2}}{2!} = \frac{3 \cdot 2}{1 \cdot 2} = 3.$$
Combination of n different things taken r at a time (r < n)
The number of combinations of n different things taken r at a time is denoted by ${}^{n}C_{r}$, ${}_{n}C_{r}$ or $\binom{n}{r}$. It is given by
$${}^{n}C_{r} = \frac{n(n-1)(n-2)\cdots(n-r+1)}{1 \cdot 2 \cdot 3 \cdots r}, \quad \text{or} \quad {}^{n}C_{r} = \frac{n!}{r!\,(n-r)!}.$$

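The factorial formulas for ${}^{n}P_{r}$, ${}^{n}C_{r}$ and permutations with repeated kinds translate directly into code. The sketch below is a minimal illustration in Python 3.8+ (needed for `math.perm` and `math.comb`); the helper names `n_p_r`, `n_c_r` and `multiset_permutations` are hypothetical, and the word STATISTICS is just an example of objects of several kinds.

```python
from math import comb, factorial, perm

def n_p_r(n, r):
    """nPr = n!/(n - r)!, computed from the factorial formula."""
    return factorial(n) // factorial(n - r)

def n_c_r(n, r):
    """nCr = n!/(r!(n - r)!)."""
    return factorial(n) // (factorial(r) * factorial(n - r))

def multiset_permutations(*counts):
    """n!/(n1! n2! ... nk!) where n = n1 + n2 + ... + nk."""
    total = factorial(sum(counts))
    for c in counts:
        total //= factorial(c)
    return total

# Values worked out in the text
assert n_p_r(3, 2) == 6
assert n_p_r(4, 4) == 24
assert n_p_r(7, 5) == 2520
assert n_p_r(6, 6) == factorial(6) == 720
assert n_c_r(3, 2) == n_p_r(3, 2) // factorial(2) == 3

# The standard library gives the same counts directly
assert perm(7, 5) == n_p_r(7, 5)
assert comb(10, 4) == n_c_r(10, 4)

# Repetition allowed: n^r arrangements, e.g. 3-letter strings over 4 letters
print(4 ** 3)  # 64
# Letters of "STATISTICS": S x3, T x3, I x2, A x1, C x1
print(multiset_permutations(3, 3, 2, 1, 1))  # 50400
```

Integer division is exact at every step here, because each partial quotient is itself a count of arrangements.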

For example,
$${}^{7}C_{3} = \frac{7!}{3!\,4!} = \frac{7 \times 6 \times 5}{1 \times 2 \times 3} = 35, \qquad {}^{10}C_{4} = \frac{10!}{4!\,6!} = \frac{10 \times 9 \times 8 \times 7}{1 \times 2 \times 3 \times 4} = 210.$$
Important results
1. ${}^{n}C_{n} = \dfrac{n!}{n!\,0!} = 1$. This is the combination of n things taken all at a time.
2. ${}^{n}C_{0} = \dfrac{n!}{0!\,n!} = 1$. This is the combination of n things taken none at a time.
3. ${}^{n}C_{r} = {}^{n}C_{n-r}$. This says that
$${}^{10}C_{8} = {}^{10}C_{2} = \frac{10 \times 9}{1 \times 2} = 45, \qquad {}^{12}C_{9} = {}^{12}C_{3} = \frac{12 \times 11 \times 10}{1 \times 2 \times 3} = 220, \qquad {}^{100}C_{98} = {}^{100}C_{2} = \frac{100 \times 99}{1 \times 2} = 4950.$$
4. ${}^{n}C_{r} + {}^{n}C_{r-1} = {}^{n+1}C_{r}$.

SOLVED PROBLEMS
Example 1
What is the probability that a leap year selected at random will contain 53 Sundays?
Solution
A leap year has 366 days, i.e., 52 weeks plus 2 more days. The possible combinations for these two days are: (i) Sunday and Monday (ii) Monday and Tuesday (iii) Tuesday and Wednesday (iv) Wednesday and Thursday (v) Thursday and Friday (vi) Friday and Saturday (vii) Saturday and Sunday.
For a leap year to contain 53 Sundays, one of these two days must be a Sunday. Two of the 7 cases, (i) and (vii), are favourable.
Required probability = 2/7.
Example 2
Three coins are tossed. What is the probability of getting (i) all heads (ii) exactly one head (iii) exactly two heads (iv) at least one head (v) at least two heads (vi) at most one head (vii) at most two heads (viii) no head?
Solution
When three coins are tossed, the possible outcomes are {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}.
i. P(all heads) = 1/8
ii. P(exactly one head) = 3/8
iii. P(exactly two heads) = 3/8
iv. P(at least one head) = 7/8
v. P(at least two heads) = 4/8
vi. P(at most one head) = 4/8
vii. P(at most two heads) = 7/8
viii. P(no head) = 1/8

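Example 2 is a direct application of the classical definition: count favourable outcomes among equally likely ones. The sketch below enumerates the 8 outcomes and computes each probability exactly; it assumes fair, independent coins as in the text, and also spot-checks the four "important results" on combinations for small n.

```python
from fractions import Fraction
from itertools import product
from math import comb

# Sample space for three tosses: 8 equally likely outcomes
outcomes = list(product("HT", repeat=3))

def prob(condition):
    """Classical probability: favourable outcomes / total outcomes."""
    favourable = sum(1 for w in outcomes if condition(w.count("H")))
    return Fraction(favourable, len(outcomes))

answers = {
    "all heads": prob(lambda h: h == 3),
    "exactly one head": prob(lambda h: h == 1),
    "exactly two heads": prob(lambda h: h == 2),
    "at least one head": prob(lambda h: h >= 1),
    "at least two heads": prob(lambda h: h >= 2),
    "at most one head": prob(lambda h: h <= 1),
    "at most two heads": prob(lambda h: h <= 2),
    "no head": prob(lambda h: h == 0),
}
for event, p in answers.items():
    print(f"{event}: {p}")

# Important results on combinations, checked for small n
for n in range(1, 12):
    assert comb(n, n) == comb(n, 0) == 1
    for r in range(1, n + 1):
        assert comb(n, r) == comb(n, n - r)
        assert comb(n, r) + comb(n, r - 1) == comb(n + 1, r)
```

`Fraction` reduces automatically, so the answers 4/8 in the text are printed as 1/2.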

Example 3
What is the probability of getting a spade or an ace from a pack of cards?
Solution
There are 13 spades and 4 aces, one of which is the ace of spades, so
P(Spade or Ace) = (13 + 4 − 1)/52 = 16/52.
Example 4
A box contains 8 red, 3 white and 9 blue balls. If 3 balls are drawn at random, determine the probability that (a) all three are blue (b) 2 are red and 1 is white (c) at least one is white and (d) one of each colour is drawn.
Solution
Assume that the balls are drawn from the box one by one without replacement.
a) $$P(\text{all three are blue}) = \frac{{}^{9}C_{3}}{{}^{20}C_{3}} = \frac{7}{95}$$
b) $$P(\text{2 red and 1 white}) = \frac{{}^{8}C_{2} \times {}^{3}C_{1}}{{}^{20}C_{3}} = \frac{84}{1140} = \frac{7}{95}$$
c) $$P(\text{at least one is white}) = 1 - P(\text{none is white}) = 1 - \frac{{}^{17}C_{3}}{{}^{20}C_{3}} = 1 - \frac{34}{57} = \frac{23}{57}$$
d) $$P(\text{one of each colour}) = \frac{{}^{8}C_{1} \times {}^{3}C_{1} \times {}^{9}C_{1}}{{}^{20}C_{3}} = \frac{18}{95}$$
Example 5
What is the probability of getting 9 cards of the same suit in one hand at a game of bridge?
Solution
One hand in a game of bridge consists of 13 cards. Total number of possible cases = ${}^{52}C_{13}$.
The number of ways in which a particular player can have 9 cards of one suit is ${}^{13}C_{9}$, and the number of ways in which the remaining 4 cards come from the other three suits (39 cards) is ${}^{39}C_{4}$. Since there are 4 suits in a pack of cards, the total number of favourable cases = $4 \times {}^{13}C_{9} \times {}^{39}C_{4}$.
$$\text{Required probability} = \frac{4 \times {}^{13}C_{9} \times {}^{39}C_{4}}{{}^{52}C_{13}}$$

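The combinatorial answers to Examples 4 and 5 can be evaluated exactly with `math.comb` and `fractions.Fraction`. This is a sketch under the text's own assumptions (3 balls drawn without replacement from 20; a 13-card hand from a 52-card pack); the variable names are illustrative only.

```python
from fractions import Fraction
from math import comb

total = comb(20, 3)  # ways to draw 3 balls from 8 red + 3 white + 9 blue

p_all_blue = Fraction(comb(9, 3), total)
p_two_red_one_white = Fraction(comb(8, 2) * comb(3, 1), total)
p_at_least_one_white = 1 - Fraction(comb(17, 3), total)
p_one_of_each = Fraction(comb(8, 1) * comb(3, 1) * comb(9, 1), total)

print(p_all_blue, p_two_red_one_white, p_at_least_one_white, p_one_of_each)
# 7/95 7/95 23/57 18/95

# Example 5: 9 cards of one suit in a 13-card bridge hand
p_nine_same_suit = Fraction(4 * comb(13, 9) * comb(39, 4), comb(52, 13))
print(float(p_nine_same_suit))
```

Multiplying by 4 suits does not double count here, because a 13-card hand cannot hold 9 cards of two different suits.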

AXIOMATIC DEFINITION OF PROBABILITY
The mathematical and statistical definitions of probability have their own disadvantages, so they do not contribute much to the growth of probability theory. The axiomatic definition is due to A.N. Kolmogorov (1933), a Russian mathematician, and is mathematically the best definition of probability, since it eliminates most of the difficulties encountered in using other definitions. This axiomatic approach is based on measure theory. Here we introduce it by means of set operations.
Sample space
A sample space is the set of all conceivable outcomes of a random experiment. The sample space is usually denoted by S or Ω. The notion of a sample space comes from Richard von Mises.
Every indecomposable outcome of a random experiment is known as a sample point or elementary outcome. The number of sample points in the sample space may be finite, countably infinite or uncountably infinite. A sample space with a finite or countably infinite number of elements is called a discrete sample space. A sample space with a continuum of points is called a continuous sample space.
Example
1. The sample space obtained in the throw of a single die is a finite sample space, i.e., S = {1, 2, 3, 4, 5, 6}.
2. The sample space obtained in connection with the random experiment of tossing a coin again and again until a head appears is a countably infinite sample space, i.e., S = {H, TH, TTH, TTTH, …}.
3. Consider the lifetime of a machine. The outcomes of this experiment form a continuous sample space, i.e., $S = \{t : 0 < t < \infty\}$.

Event
An event is a subset of the sample space. In other words, of all the possible outcomes in the sample space of an experiment, some outcomes satisfy a specified description, which we call an event.
Field of events (F)
Let S be the sample space of a random experiment. Then a collection or class of sets F is called a field or algebra if it satisfies the following conditions:
1. F is nonempty.
2. The elements of F are subsets of S.
3. If $A \in F$, then $A^C \in F$.
4. If $A \in F$ and $B \in F$, then $A \cup B \in F$.
For example, let S = {1, 2, 3, 4, 5, 6}. Choose F as the set with elements ∅, S, {5, 6} and {1, 2, 3, 4}. Then F satisfies all four conditions, so F is a field.
More generally, when $A \subset S$, $F = \{\emptyset, A, A^C, S\}$ forms a field. Trivially, F with just two elements ∅ and S forms a field.
σ-field or σ-algebra of events
Let S be a nonempty set and F be a collection of subsets of S. Then F is called a σ-field or σ-algebra if
1. F is nonempty;
2. the elements of F are subsets of S;
3. if $A \in F$, then $A^C \in F$; and
4. the union of any countable collection of elements of F is an element of F, i.e., if $A_i \in F$ for $i = 1, 2, 3, \ldots$, then $\bigcup_{i=1}^{\infty} A_i \in F$.
The σ-algebra F is also called a Borel field and is often denoted by B.

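For a finite sample space, the four field conditions can be tested directly by checking closure under complement and union. The sketch below is a minimal illustration, assuming events are represented as Python `frozenset`s; `is_field` is a hypothetical helper name. On a finite S a field is automatically a σ-field, since only finitely many distinct unions exist.

```python
from itertools import combinations

S = frozenset({1, 2, 3, 4, 5, 6})
F = {frozenset(), S, frozenset({5, 6}), frozenset({1, 2, 3, 4})}

def is_field(F, S):
    """Check the four conditions for a field (algebra) of subsets of S."""
    if not F:                                    # 1. F is nonempty
        return False
    if any(not A <= S for A in F):               # 2. elements are subsets of S
        return False
    if any((S - A) not in F for A in F):         # 3. closed under complement
        return False
    # 4. closed under pairwise (hence finite) union
    return all((A | B) in F for A, B in combinations(F, 2))

print(is_field(F, S))  # True
# Dropping {1, 2, 3, 4} breaks closure under complement
print(is_field({frozenset(), S, frozenset({5, 6})}, S))  # False
```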

Examples
1. B = {∅, S}
2. B = {∅, A, $A^C$, S}
3. B = {∅, A, B, S}, provided $A \cup B = S$ and $A \cap B = \emptyset$
4. The power set of S always forms a Borel field.
Function and Measure
We know that a function or mapping is a correspondence between the elements of a set X (called the domain) and a set Y (called the range) by a rule or principle. When the elements of the domain are sets and the elements of the range are real numbers, the function is called a set function. A set function is usually denoted by P(A) or μ(A), where A represents an arbitrary set in the domain.
In a set function, if $A_1, A_2, \ldots, A_n$ are disjoint sets in the domain and if
$$\mu(A_1 \cup A_2 \cup \cdots \cup A_n) = \mu(A_1) + \mu(A_2) + \cdots + \mu(A_n),$$
then the set function is said to be additive.
If a set S is partitioned into a countable number of disjoint sets $A_1, A_2, \ldots$ and if a set function defined on them satisfies
$$\mu(A_1 \cup A_2 \cup \cdots) = \mu(A_1) + \mu(A_2) + \cdots, \quad \text{i.e.,} \quad \mu\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} \mu(A_i),$$
then the set function is said to be countably additive.
Measure
A set function which is non-negative and totally (countably) additive is called a measure. A measure is called a probability measure if
$$\mu(A_1 \cup A_2 \cup \cdots \cup A_n) = \mu(A_1) + \mu(A_2) + \cdots + \mu(A_n) = 1,$$
where $\bigcup_{i=1}^{n} A_i = S$ and $A_i \cap A_j = \emptyset$, $i \ne j$.
In probability theory, the probability measure is denoted by P instead of μ.

Axiomatic definition
Let S be the sample space and let B be the class of events constituting the Borel field. Then, for each $A \in B$, a real-valued set function P(A) is known as the probability of the occurrence of A if P(A) satisfies the following three axioms.
Axiom 1 (Non-negativity): $0 \le P(A) \le 1$ for each $A \in B$.
Axiom 2 (Norming): $P(S) = 1$.
Axiom 3 (Countable additivity): If $A_1, A_2, \ldots$ is a finite or infinite sequence of elements in B such that $A_i \cap A_j = \emptyset$, $i \ne j$, then
$$P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i).$$
Probability Space
From the axiomatic definition of probability, we can conceive of a probability space as the triplet (S, B, P), where S represents the sample space, B is the class of subsets of S constituting a Borel field, and P is the probability function with domain B satisfying axioms 1, 2 and 3 of probability given above.
"Probability space" is a single term that gives us an expedient way to assume the existence of all three components in its notation. The three components are related: B is a collection of subsets of S, and P is a function that has B as its domain. The main use of a probability space is in providing a convenient method of stating background assumptions for future definitions, theorems, etc.

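A finite probability space (S, B, P) can be written out explicitly, which makes the three axioms checkable. The sketch below assumes a fair die, takes B to be the power set of S (example 4 above), and defines P by summing point weights; on a finite space, countable additivity reduces to the finite additivity that the code checks. The names `weights` and `P` are illustrative.

```python
from fractions import Fraction
from itertools import chain, combinations

# Fair die; B is taken to be the power set of S (assumed example)
S = frozenset(range(1, 7))
weights = {s: Fraction(1, 6) for s in S}

def P(event):
    """Set function: probability of an event = sum of its point weights."""
    return sum((weights[s] for s in event), Fraction(0))

B = [frozenset(c)
     for c in chain.from_iterable(combinations(sorted(S), k) for k in range(len(S) + 1))]

assert len(B) == 2 ** 6
assert all(0 <= P(A) <= 1 for A in B)              # Axiom 1
assert P(S) == 1                                   # Axiom 2
assert all(P(A | C) == P(A) + P(C)                 # Axiom 3 (finite form)
           for A in B for C in B if not A & C)
print(P({2, 4, 6}))  # 1/2
```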

Note: The axiomatic definition of probability proposed by Kolmogorov reveals that numbers in the interval [0, 1] can be assigned as probabilities of events in some initial class of elementary events. Using these probabilities, we can determine the probability of any event which may be of interest. The calculus of probability begins after the assignment of probabilities, represented by the symbols $p_1, p_2, p_3, \ldots$, which are usually determined on the basis of some past experience or some empirical study.
Theorems in Probability
The following are some consequences of the axioms of probability which have general applications, and so they are called theorems. Venn diagrams help in understanding these theorems.
Theorem 1
The probability of an impossible event is zero, i.e., $P(\emptyset) = 0$.
Proof
Let ∅ be the impossible event. Then $S \in B$ and $\emptyset \in B$. We have $S \cup \emptyset = S$, so $P(S \cup \emptyset) = P(S)$,
i.e., $P(S) + P(\emptyset) = P(S)$, since S and ∅ are disjoint,
i.e., $1 + P(\emptyset) = 1$, by axiom 2.
$\therefore P(\emptyset) = 0$.
Note: The condition P(A) = 0 does not imply that A = ∅.

Example
Consider an experiment of tossing a coin infinitely many times. The outcomes may be represented as infinite sequences of the form HHTHTTTHHT…, so that the sample space S consists of infinitely many such sequences. The event "heads only", given by the sequence {HHHH…}, is not empty. However, the chance of such an outcome is, at least intuitively, zero: tails should come up sooner or later.
Theorem 2
Probability is finitely additive, i.e.,
$$P(A_1 \cup A_2 \cup \cdots \cup A_n) = P(A_1) + P(A_2) + \cdots + P(A_n), \quad \text{where } A_i \cap A_j = \emptyset,\ i \ne j.$$
Proof
Consider the infinite sequence of events $A_1, A_2, \ldots, A_n, \emptyset, \emptyset, \emptyset, \ldots$, which are pairwise disjoint since the $A_i$'s are disjoint. Then by axiom 3,
$$P(A_1 \cup A_2 \cup \cdots \cup A_n \cup \emptyset \cup \emptyset \cup \cdots) = P(A_1) + P(A_2) + \cdots + P(A_n) + P(\emptyset) + P(\emptyset) + \cdots,$$
i.e., since $P(\emptyset) = 0$ by Theorem 1,
$$P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} P(A_i).$$
Theorem 3 (Monotonicity)
If $A \subset B$, then $P(A) \le P(B)$.
Proof
From the Venn diagram, we have $B = A \cup (A^C \cap B)$, so
$$P(B) = P[A \cup (A^C \cap B)] = P(A) + P(A^C \cap B), \text{ since } A \text{ and } A^C \cap B \text{ are disjoint}$$


$= P(A) + \text{a non-negative quantity}$, i.e., $P(B) \ge P(A)$, or $P(A) \le P(B)$.
Note: From the above, we get $P(B - A) = P(B) - P(A)$, since $A^C \cap B = B - A$.
Theorem 4
Probability is countably subadditive, i.e., for every sequence of events $A_1, A_2, \ldots$,
$$P\left(\bigcup_{i=1}^{\infty} A_i\right) \le P(A_1) + P(A_2) + P(A_3) + \cdots$$
Proof
By considering infinite operations on events, we can write the union of events as a union of disjoint events,
$$\bigcup_{i=1}^{\infty} A_i = A_1 \cup (A_1^C \cap A_2) \cup (A_1^C \cap A_2^C \cap A_3) \cup \cdots$$
Hence, by axiom 3,
$$P\left(\bigcup_{i=1}^{\infty} A_i\right) = P(A_1) + P(A_1^C \cap A_2) + P(A_1^C \cap A_2^C \cap A_3) + \cdots \le P(A_1) + P(A_2) + P(A_3) + \cdots,$$
since $P(A_1^C \cap A_2) \le P(A_2)$, etc., by Theorem 3. That is,
$$P\left(\bigcup_{i=1}^{\infty} A_i\right) \le \sum_{i=1}^{\infty} P(A_i).$$
Theorem 5 (Complementation)
$P(A^C) = 1 - P(A)$
Proof
We have $A \cup A^C = S$, so $P(A \cup A^C) = P(S)$, i.e., $P(A) + P(A^C) = 1$, by axioms 2 and 3. Hence $P(A^C) = 1 - P(A)$,
i.e., P[non-occurrence of an event] = 1 − P[occurrence of that event].

Theorem 6 (Addition theorem for two events)
If A and B are any two events,
$$P(A \cup B) = P(A) + P(B) - P(A \cap B).$$
Proof
From the Venn diagram, we can write $A \cup B = A \cup (A^C \cap B)$. Since $A \cap (A^C \cap B) = \emptyset$,
$$P(A \cup B) = P[A \cup (A^C \cap B)] = P(A) + P(A^C \cap B) \quad \cdots (1)$$
On the other hand, $B = (A \cap B) \cup (A^C \cap B)$, and since $(A \cap B) \cap (A^C \cap B) = \emptyset$,
$$P(B) = P(A \cap B) + P(A^C \cap B), \quad \text{i.e.,} \quad P(A^C \cap B) = P(B) - P(A \cap B) \quad \cdots (2)$$
On substituting (2) in (1), we get
$$P(A \cup B) = P(A) + P(B) - P(A \cap B).$$
Corollary
(1) If $A \cap B = \emptyset$, then $P(A \cup B) = P(A) + P(B)$.
(2) $P(A \cup B) = 1 - P[(A \cup B)^C] = 1 - P(A^C \cap B^C)$,
i.e., P[occurrence of at least one event] = 1 − P[none of them occurring].
Theorem 7 (Addition theorem for three events)
If A, B, C are any three events,
$$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(B \cap C) - P(A \cap C) + P(A \cap B \cap C).$$
Proof
Let $B \cup C = D$. Then
$$P(A \cup B \cup C) = P(A \cup D) = P(A) + P(D) - P(A \cap D), \text{ by Theorem 6}$$


$$= P(A) + P(B \cup C) - P[A \cap (B \cup C)]$$
$$= P(A) + P(B) + P(C) - P(B \cap C) - P[(A \cap B) \cup (A \cap C)]$$
$$= P(A) + P(B) + P(C) - P(B \cap C) - \{P(A \cap B) + P(A \cap C) - P(A \cap B \cap C)\}$$
$$= P(A) + P(B) + P(C) - P(A \cap B) - P(B \cap C) - P(A \cap C) + P(A \cap B \cap C).$$
Corollary
(1) If the events A, B, C are mutually exclusive, $P(A \cup B \cup C) = P(A) + P(B) + P(C)$.
(2) $P(A \cup B \cup C) = 1 - P[(A \cup B \cup C)^C] = 1 - P(A^C \cap B^C \cap C^C)$.
Probability in a finite sample space with equally likely points
For certain random experiments there is a finite number of outcomes, say n, and the probability attached to each outcome is 1/n. The classical definition of probability is generally adopted for these problems, but the axiomatic definition is applicable as well.
Definition: Let $E_1, E_2, \ldots, E_n$ be n sample points or simple events in a discrete or finite sample space S. Suppose the set function P, with domain the collection of all subsets of S, satisfies the following conditions:
(i) $P(E_1) = P(E_2) = \cdots = P(E_n) = \dfrac{1}{n}$
(ii) $P(S) = P(E_1 \cup E_2 \cup \cdots \cup E_n) = P(E_1) + P(E_2) + \cdots + P(E_n) = \dfrac{1}{n} + \dfrac{1}{n} + \cdots \ (n \text{ terms}) = \dfrac{n}{n} = 1$
(iii) If A is any event which contains m sample points, say $E_1, E_2, \ldots, E_m$, then, since $E_i \cap E_j = \emptyset$, $i \ne j$,
$$P(A) = P(E_1 \cup E_2 \cup \cdots \cup E_m) = P(E_1) + P(E_2) + \cdots + P(E_m) = \frac{1}{n} + \frac{1}{n} + \cdots \ (m \text{ terms}) = \frac{m}{n}.$$

This shows that P(A) satisfies all the axioms of probability. Thus the classical definition is a particular case of the axiomatic definition. In other words, the axiomatic definition reduces to the classical definition of probability when it is defined on a discrete or finite sample space with equally likely points.
SOLVED PROBLEMS
Example 10
A die is rolled. If x is the number shown on the die, 7x coins are tossed. If y is the number of heads, (x, y) is recorded. Write down the sample space of this experiment.
Solution
If x = 1, 7 coins are tossed; if x = 2, 14 coins are tossed; and so on, up to x = 6, when 42 coins are tossed. Let y denote the number of heads obtained. With x = 1, y can be 0, 1, …, 7, so the pair (x, y) takes the values (1, 0), (1, 1), (1, 2), …, (1, 7). Thus the required sample space is
S = {(1, 0), (1, 1), (1, 2), …, (1, 7),
(2, 0), (2, 1), (2, 2), …, (2, 14),
(3, 0), (3, 1), (3, 2), …, (3, 21),
(4, 0), (4, 1), (4, 2), …, (4, 28),
(5, 0), (5, 1), (5, 2), …, (5, 35),
(6, 0), (6, 1), (6, 2), …, (6, 42)}
Example 11
If $A_1, A_2, A_3$ are three events which are exhaustive, show that $B_1 = A_1$, $B_2 = A_1^C \cap A_2$, $B_3 = A_1^C \cap A_2^C \cap A_3$ are exhaustive and mutually exclusive.
Solution
Since $A_1, A_2, A_3$ are exhaustive, we have $A_1 \cup A_2 \cup A_3 = S$.
We have to show that $B_1 \cup B_2 \cup B_3 = S$.

Now
$$B_1 \cup B_2 \cup B_3 = A_1 \cup (A_1^C \cap A_2) \cup (A_1^C \cap A_2^C \cap A_3)$$
$$= \{(A_1 \cup A_1^C) \cap (A_1 \cup A_2)\} \cup (A_1^C \cap A_2^C \cap A_3)$$
$$= \{S \cap (A_1 \cup A_2)\} \cup (A_1^C \cap A_2^C \cap A_3)$$
$$= (A_1 \cup A_2) \cup \{(A_1 \cup A_2)^C \cap A_3\}$$
$$= \{(A_1 \cup A_2) \cup (A_1 \cup A_2)^C\} \cap (A_1 \cup A_2 \cup A_3)$$
$$= S \cap S = S,$$
i.e., the events $B_1$, $B_2$ and $B_3$ are exhaustive.
To show that $B_1$, $B_2$ and $B_3$ are mutually exclusive, we check each pair:
$$B_1 \cap B_2 = (A_1 \cap A_1^C) \cap A_2 = \emptyset \cap A_2 = \emptyset,$$
$$B_1 \cap B_3 = (A_1 \cap A_1^C) \cap A_2^C \cap A_3 = \emptyset,$$
$$B_2 \cap B_3 = A_1^C \cap (A_2 \cap A_2^C) \cap A_3 = \emptyset.$$
Hence the events $B_1$, $B_2$ and $B_3$ are mutually exclusive.
Example 12
In a swimming race the odds that A will win are 2 to 3 and the odds that B will win are 1 to 4. Find the probability and the odds that A or B wins the race.
Solution
We have
$$P(A) = \frac{2}{2 + 3} = \frac{2}{5}, \qquad P(B) = \frac{1}{1 + 4} = \frac{1}{5}.$$
P(A or B) = P(A) + P(B), since A and B are mutually exclusive,
$$= \frac{2}{5} + \frac{1}{5} = \frac{3}{5}.$$
∴ The odds that A or B wins are 3 to 2.

Example 13
Given P(A) = 0.30, P(B) = 0.78 and $P(A \cap B) = 0.16$, find (i) $P(A^C \cap B^C)$ (ii) $P(A^C \cup B^C)$ (iii) $P(A \cap B^C)$.
Solution
Given P(A) = 0.30, P(B) = 0.78 and $P(A \cap B) = 0.16$.
(i) $P(A^C \cap B^C) = P[(A \cup B)^C] = 1 - P(A \cup B) = 1 - \{P(A) + P(B) - P(A \cap B)\} = 1 - \{0.30 + 0.78 - 0.16\} = 0.08$
(ii) $P(A^C \cup B^C) = P[(A \cap B)^C] = 1 - P(A \cap B) = 1 - 0.16 = 0.84$
(iii) $P(A \cap B^C) = P[A - (A \cap B)] = P(A) - P(A \cap B) = 0.30 - 0.16 = 0.14$
Example 14
The probability that a student passes a Statistics test is 2/3 and the probability that he passes both the Statistics and Mathematics tests is 14/45. The probability that he passes at least one test is 4/5. What is the probability that he passes the Mathematics test?
Solution
Define A: the student passes the Statistics test; B: he passes the Mathematics test.
Given P(A) = 2/3, $P(A \cap B) = 14/45$, $P(A \cup B) = 4/5$. We have to find P(B). By the addition theorem,
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$, i.e., $\dfrac{4}{5} = \dfrac{2}{3} + P(B) - \dfrac{14}{45}$.
$$P(B) = \frac{4}{5} - \frac{2}{3} + \frac{14}{45} = \frac{36 - 30 + 14}{45} = \frac{20}{45} = \frac{4}{9}.$$

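Examples 13 and 14 only use the addition and complementation theorems, so they can be recomputed exactly. The sketch below uses `Fraction("0.30")` and similar to avoid floating-point rounding, and adds a brute-force check of Theorem 7 on a fair die; the events in that last check are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

# Example 13 (exact decimals via Fraction strings)
pA, pB, pAB = Fraction("0.30"), Fraction("0.78"), Fraction("0.16")
p_union = pA + pB - pAB          # Theorem 6
p_neither = 1 - p_union          # P(A^C ∩ B^C), by De Morgan and Theorem 5
p_not_both = 1 - pAB             # P(A^C ∪ B^C)
p_A_only = pA - pAB              # P(A ∩ B^C)
print(float(p_neither), float(p_not_both), float(p_A_only))  # 0.08 0.84 0.14

# Example 14: solve P(A ∪ B) = P(A) + P(B) - P(A ∩ B) for P(B)
p_math = Fraction(4, 5) - Fraction(2, 3) + Fraction(14, 45)
print(p_math)  # 4/9

# Brute-force check of Theorem 7 on a fair die (hypothetical events)
def P(E):
    return Fraction(len(E), 6)

A, B, C = {1, 2, 3}, {2, 4, 6}, {3, 4, 5}
lhs = P(A | B | C)
rhs = P(A) + P(B) + P(C) - P(A & B) - P(B & C) - P(A & C) + P(A & B & C)
assert lhs == rhs
```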

CONDITIONAL PROBABILITY
Definition
Let A and B be any two events. The probability of the event A given that the event B has already occurred, or the conditional probability of A given B, denoted by P(A | B), is defined as
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) \ne 0.$$
Similarly, the conditional probability of B given A is defined as
$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}, \quad P(A) \ne 0.$$
Remarks:
(i) For P(B) > 0, P(A | B) is in general different from P(A).
(ii) P(A | B) is not defined if P(B) = 0.
(iii) P(B | B) = 1.
Theorem
For a fixed B with P(B) > 0, P(A | B) is a probability function (or probability measure).
Proof
Here we have to show that conditional probability satisfies all the axioms of probability.
(i) $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)} \ge 0$, by axiom 1.
(ii) $P(S \mid B) = \dfrac{P(S \cap B)}{P(B)} = \dfrac{P(B)}{P(B)} = 1$.
(iii) For any two disjoint events A and C,

$$P(A \cup C \mid B) = \frac{P[(A \cup C) \cap B]}{P(B)} = \frac{P[(A \cap B) \cup (C \cap B)]}{P(B)}, \text{ by the distributive property}$$
$$= \frac{P(A \cap B) + P(C \cap B)}{P(B)}, \text{ since } A \cap B \text{ and } C \cap B \text{ are disjoint}$$
$$= \frac{P(A \cap B)}{P(B)} + \frac{P(C \cap B)}{P(B)} = P(A \mid B) + P(C \mid B).$$
That is, conditional probability satisfies all the axioms of probability. Therefore P(A | B) is a probability function or probability measure.
Multiplication law of probability
Theorem
For any two events A and B,
$P(A \cap B) = P(A)\,P(B \mid A)$, $P(A) > 0$
$\phantom{P(A \cap B)} = P(B)\,P(A \mid B)$, $P(B) > 0$,
where P(A | B) and P(B | A) are the conditional probabilities of A given B and of B given A, respectively.
Independent Events
Definition
Two or more events are said to be independent if the probability of any one of them is not affected by supplementary knowledge concerning the materialisation of any number of the remaining events. Otherwise they are said to be dependent.
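On a finite space with equally likely points, the definition of P(A | B) and the multiplication law can be checked directly. The sketch below assumes a fair die; the events A ("at most 3"), B ("even") and C are hypothetical choices, and `P_given` is an illustrative helper that refuses to divide by P(B) = 0, matching remark (ii).

```python
from fractions import Fraction

S = frozenset(range(1, 7))  # fair die, equally likely points

def P(E):
    return Fraction(len(E & S), len(S))

def P_given(A, B):
    """Conditional probability P(A | B) = P(A ∩ B) / P(B)."""
    if P(B) == 0:
        raise ValueError("P(A | B) is not defined when P(B) = 0")
    return P(A & B) / P(B)

A = frozenset({1, 2, 3})  # hypothetical: "at most 3"
B = frozenset({2, 4, 6})  # hypothetical: "even"
print(P_given(A, B))      # 1/3

# Multiplication law: P(A ∩ B) = P(B) P(A | B) = P(A) P(B | A)
assert P(A & B) == P(B) * P_given(A, B) == P(A) * P_given(B, A)

# For fixed B, P(. | B) behaves as a probability measure
C = frozenset({5, 6})     # disjoint from A
assert P_given(S, B) == 1
assert P_given(A | C, B) == P_given(A, B) + P_given(C, B)
```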

Independence of two events A and B
An event A is said to be independent (statistically independent) of an event B if the conditional probability of A given B, i.e., P(A | B), is equal to the unconditional probability of A.
In symbols, P(A | B) = P(A).
Similarly, if the event B is independent of A, we must have P(B | A) = P(B).
Since $P(A \cap B) = P(A)\,P(B \mid A)$ and since P(B | A) = P(B) when B is independent of A, we must have $P(A \cap B) = P(A)\,P(B)$.
Hence the events A and B are independent if
$$P(A \cap B) = P(A)\,P(B).$$
Pairwise and mutual independence
Definition
A set of events $A_1, A_2, \ldots, A_n$ is said to be pairwise independent if every pair of different events is independent, that is, $P(A_i \cap A_j) = P(A_i)\,P(A_j)$ for all i and j, $i \ne j$.
Definition
A set of events $A_1, A_2, \ldots, A_n$ is said to be mutually independent if
$$P(A_i \cap A_j \cap \cdots \cap A_r) = P(A_i)\,P(A_j) \cdots P(A_r)$$
for every subset $\{A_i, A_j, \ldots, A_r\}$ of $\{A_1, A_2, \ldots, A_n\}$. That is, the probabilities of the intersections of every two, every three, …, every n of the events are the products of the respective probabilities.
For example, three events A, B and C are said to be mutually independent if
$P(A \cap B) = P(A)\,P(B)$,
$P(B \cap C) = P(B)\,P(C)$,
$P(A \cap C) = P(A)\,P(C)$ and
$P(A \cap B \cap C) = P(A)\,P(B)\,P(C)$.

Note 1
For the mutual independence of n events $A_1, A_2, \ldots, A_n$, the total number of conditions to be satisfied is $2^n - 1 - n$. In particular, for three events we have $4 = 2^3 - 1 - 3$ conditions for their mutual independence.
Note 2
Pairwise or mutual independence of events $A_1, A_2, \ldots, A_n$ is defined only when $P(A_i) \ne 0$ for i = 1, 2, …, n.
Note 3
Pairwise independence does not imply mutual independence.
Theorem
Mutual independence of events implies pairwise independence of events. The converse is not true.
Proof
From the definition of mutual independence, it is clear that mutual independence implies pairwise independence. We shall show that the converse is not necessarily true, i.e., pairwise independence does not imply mutual independence. We can illustrate this by means of an example due to S.N. Bernstein.
Let S = {1, 2, 3, 4}, where P({i}) = 1/4 for i = 1, 2, 3, 4, and let A = {1, 2}, B = {1, 3} and C = {1, 4}. Then P(A) = P(B) = P(C) = 1/2. Consider the collection of events A, B, C. These events are pairwise independent but not mutually independent.
They are pairwise independent, since
$P(A \cap B) = 1/4 = P(A)\,P(B)$
$P(B \cap C) = 1/4 = P(B)\,P(C)$
$P(A \cap C) = 1/4 = P(A)\,P(C)$
But $P(A \cap B \cap C) = P(\{1\}) = 1/4$, while
$$P(A)\,P(B)\,P(C) = \frac{1}{2} \times \frac{1}{2} \times \frac{1}{2} = \frac{1}{8}.$$
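Bernstein's example can be verified exhaustively by testing the product rule on every sub-collection of two or more events. The sketch below assumes the four points are equally likely, as in the text; `independent` and `n_conditions` are hypothetical helper names, and `math.prod` requires Python 3.8+. The last line also confirms the count $2^n - 1 - n$ from Note 1 for n = 3.

```python
from fractions import Fraction
from itertools import combinations
from math import prod

S = {1, 2, 3, 4}  # each point has probability 1/4

def P(E):
    return Fraction(len(E), len(S))

events = {"A": {1, 2}, "B": {1, 3}, "C": {1, 4}}

def independent(names):
    """Does the product rule hold for this sub-collection of events?"""
    sets = [events[n] for n in names]
    return P(set.intersection(*sets)) == prod(P(E) for E in sets)

pairwise = all(independent(pair) for pair in combinations(events, 2))
mutual = pairwise and independent(("A", "B", "C"))
print(pairwise, mutual)  # True False

# Note 1: one product condition per sub-collection of size >= 2
def n_conditions(n):
    return sum(1 for k in range(2, n + 1) for _ in combinations(range(n), k))

assert n_conditions(3) == 2 ** 3 - 1 - 3 == 4
```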

Thus $P(A \cap B \cap C) \ne P(A)\,P(B)\,P(C)$. Hence they are not mutually independent.
Multiplication Theorem (independent events)
If A and B are two independent events,
$$P(A \cap B) = P(A)\,P(B).$$
Proof
We have, for any two events A and B, $P(A \cap B) = P(A)\,P(B \mid A)$.
Since A and B are independent, we have P(B | A) = P(B).
$\therefore P(A \cap B) = P(A)\,P(B)$.
Note
If A and B are independent, the addition theorem can be stated as
$$P(A \cup B) = P(A) + P(B) - P(A)\,P(B).$$
Theorem
If A and B are two independent events, then
(i) A and $B^C$ are independent
(ii) $A^C$ and B are independent
(iii) $A^C$ and $B^C$ are independent.
Proof
Since A and B are independent, we have P(A | B) = P(A), P(B | A) = P(B) and $P(A \cap B) = P(A)\,P(B)$.
(i) Now,
$$P(A \cap B^C) = P(A)\,P(B^C \mid A) = P(A)[1 - P(B \mid A)] = P(A)[1 - P(B)] = P(A)\,P(B^C),$$
i.e., A and $B^C$ are independent.

(ii)
$$P(A^C \cap B) = P(B)\,P(A^C \mid B) = P(B)[1 - P(A \mid B)] = P(B)[1 - P(A)] = P(A^C)\,P(B),$$
i.e., $A^C$ and B are independent.
(iii)
$$P(A^C \cap B^C) = P[(A \cup B)^C] = 1 - P(A \cup B) = 1 - \{P(A) + P(B) - P(A \cap B)\}$$
$$= 1 - P(A) - P(B) + P(A)\,P(B), \text{ since } P(A \cap B) = P(A)\,P(B)$$
$$= [1 - P(A)] - P(B)[1 - P(A)] = [1 - P(A)][1 - P(B)] = P(A^C)\,P(B^C),$$
i.e., $A^C$ and $B^C$ are independent.
Bayes' Theorem
If $B_1, B_2, \ldots, B_n$ are mutually exclusive and exhaustive events with $P(B_i) > 0$ for each i, and A is any event with P(A) > 0, then
$$P(B_i \mid A) = \frac{P(B_i)\,P(A \mid B_i)}{\sum_{j=1}^{n} P(B_j)\,P(A \mid B_j)}, \quad i = 1, 2, \ldots, n.$$
Note 1
Here the probabilities $P(B_i \mid A)$, i = 1, 2, …, n, are the probabilities determined after observing the event A, and $P(B_i)$, i = 1, 2, …, n, are the probabilities given beforehand. Hence the $P(B_i)$ are called "a priori" probabilities and the $P(B_i \mid A)$ are called "a posteriori" probabilities. The probabilities $P(A \mid B_i)$, i = 1, 2, …, n, are called "likelihoods" because they indicate how likely the event A under consideration is to occur, given each and every a priori event. Bayes' theorem gives a relationship between $P(B_i \mid A)$ and $P(A \mid B_i)$, and thus it involves a type of inverse reasoning. Bayes' theorem plays an important role in applications. This theorem is due to Thomas Bayes.

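Bayes' theorem, with the theorem on total probability as its denominator, is short enough to write as a function of the priors $P(B_i)$ and likelihoods $P(A \mid B_i)$. The sketch below is a minimal illustration; `bayes` is a hypothetical helper that assumes the $B_i$ are mutually exclusive and exhaustive, and it is applied to the numbers of Examples 8 and 9 later in this section.

```python
from fractions import Fraction

def bayes(priors, likelihoods):
    """Posterior P(B_i | A) for mutually exclusive, exhaustive B_1..B_n.

    priors[i] = P(B_i), likelihoods[i] = P(A | B_i).
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)  # theorem on total probability: P(A)
    if total == 0:
        raise ValueError("P(A) must be positive")
    return [j / total for j in joint]

# Example 9 (urns): transferred ball white/black, then a white ball is drawn
posterior = bayes([Fraction(3, 5), Fraction(2, 5)], [Fraction(3, 7), Fraction(2, 7)])
print(posterior[0])  # 9/13

# Example 8 (house): faulty/good design, then the house collapses
house = bayes([Fraction("0.1"), Fraction("0.9")], [Fraction("0.95"), Fraction("0.45")])
print(float(house[0]))  # 0.19
```

The posteriors always sum to 1, which is a quick sanity check that the priors really cover all the cases.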
Note 2
In the case of two events A and B satisfying the assumptions P(A) > 0 and 0 < P(B) < 1, we have
$$P(B \mid A) = \frac{P(B)\,P(A \mid B)}{P(B)\,P(A \mid B) + P(B^C)\,P(A \mid B^C)}.$$
Example 1
Let A and B be two events associated with an experiment and suppose P(A) = 0.5 while P(A or B) = 0.8. Let P(B) = p. For what values of p are (a) A and B mutually exclusive (b) A and B independent?
Solution
Given P(A) = 0.5, $P(A \cup B) = 0.8$, P(B) = p.
(a) If A and B are mutually exclusive,
$P(A \cup B) = P(A) + P(B)$, i.e., 0.8 = 0.5 + p
∴ p = 0.3
(b) If A and B are independent, we have
$P(A \cup B) = P(A) + P(B) - P(A)\,P(B)$, i.e., 0.8 = 0.5 + p − 0.5p
∴ 0.5p = 0.3, so p = 3/5.
Example 2
If A and B are two events such that P(A) = 1/3, P(B) = 1/4 and $P(A \cap B) = 1/8$, find P(A | B) and $P(A \mid B^C)$.
Solution
Given P(A) = 1/3, P(B) = 1/4, $P(A \cap B) = 1/8$.
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{1/8}{1/4} = \frac{4}{8} = \frac{1}{2}$$

$$P(A \mid B^C) = \frac{P(A \cap B^C)}{P(B^C)} = \frac{P(A) - P(A \cap B)}{1 - P(B)} = \frac{1/3 - 1/8}{1 - 1/4} = \frac{5/24}{3/4} = \frac{5}{18}$$
Example 3
The odds that A speaks the truth are 3 : 2 and the odds that B speaks the truth are 5 : 3. In what percentage of cases are they likely to contradict each other on an identical point?
Solution
Define the events
A: A speaks the truth
B: B speaks the truth
∴ P(A) = 3/5, $P(A^C) = 2/5$; P(B) = 5/8, $P(B^C) = 3/8$.
They will contradict each other on an identical point when A speaks the truth and B tells a lie, or conversely.
$$P(\text{they contradict each other}) = P[(A \cap B^C) \cup (A^C \cap B)] = P(A \cap B^C) + P(A^C \cap B), \text{ since the events are mutually exclusive}$$
$$= P(A)\,P(B^C) + P(A^C)\,P(B), \text{ since A and B are independent}$$
$$= \frac{3}{5} \times \frac{3}{8} + \frac{2}{5} \times \frac{5}{8} = \frac{19}{40},$$
i.e., in 47.5% of the cases, A and B contradict each other.
Example 4
A husband and wife appear in an interview for two vacancies in a firm. The probability of the husband's selection is 1/7 and that of the wife's selection is 1/5. What is the probability that (a) both of them will be selected (b) only one of them will be selected (c) none of them will be selected?
Solution
Let us define the events as
A: the husband gets selected
B: the wife gets selected
∴ P(A) = 1/7, P(B) = 1/5; $P(A^C) = 6/7$; $P(B^C) = 4/5$.
(a) $P(\text{both of them will be selected}) = P(A \cap B) = P(A)\,P(B)$, since A and B are independent,
$$= \frac{1}{7} \times \frac{1}{5} = \frac{1}{35}$$
(b) P(only one of them will be selected)
$$= P[(A \cap B^C) \cup (A^C \cap B)] = P(A \cap B^C) + P(A^C \cap B) = P(A)\,P(B^C) + P(A^C)\,P(B) = \frac{1}{7} \times \frac{4}{5} + \frac{6}{7} \times \frac{1}{5} = \frac{10}{35}$$
(c) P(none of them will be selected)
$$= P(A^C \cap B^C) = P(A^C)\,P(B^C) = \frac{6}{7} \times \frac{4}{5} = \frac{24}{35}$$
Example 5
If A, B and C are independent, show that $A \cup B$ and C are independent.
Solution
Since A, B and C are independent, we have
$P(A \cap B) = P(A)\,P(B)$, $P(B \cap C) = P(B)\,P(C)$, $P(A \cap C) = P(A)\,P(C)$ and $P(A \cap B \cap C) = P(A)\,P(B)\,P(C)$.
We have to show that $P[(A \cup B) \cap C] = P(A \cup B)\,P(C)$.

Now,
$$P[(A \cup B) \cap C] = P[(A \cap C) \cup (B \cap C)] = P(A \cap C) + P(B \cap C) - P(A \cap B \cap C)$$
$$= P(A)\,P(C) + P(B)\,P(C) - P(A)\,P(B)\,P(C) = P(C)[P(A) + P(B) - P(A)\,P(B)] = P(A \cup B)\,P(C),$$
i.e., $A \cup B$ and C are independent.
Example 6
A problem in statistics is given to 3 students A, B and C whose chances of solving it are 1/2, 3/4 and 1/4 respectively. What is the probability that the problem will be solved?
Solution
Let us define the events as
A: the problem is solved by student A
B: the problem is solved by student B
C: the problem is solved by student C
∴ P(A) = 1/2, P(B) = 3/4 and P(C) = 1/4.
The problem will be solved if at least one of them solves it. That means we have to find $P(A \cup B \cup C)$. Since the students work independently,
$$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(B \cap C) - P(A \cap C) + P(A \cap B \cap C)$$
$$= P(A) + P(B) + P(C) - P(A)\,P(B) - P(B)\,P(C) - P(A)\,P(C) + P(A)\,P(B)\,P(C)$$
$$= \frac{1}{2} + \frac{3}{4} + \frac{1}{4} - \frac{1}{2} \cdot \frac{3}{4} - \frac{3}{4} \cdot \frac{1}{4} - \frac{1}{2} \cdot \frac{1}{4} + \frac{1}{2} \cdot \frac{3}{4} \cdot \frac{1}{4} = \frac{29}{32}$$


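Examples 3, 4 and 6 all combine probabilities of independent events. The sketch below packages the two recurring patterns, "at least one occurs" (via the complement, as in the Aliter for Example 6) and "exactly one occurs", as hypothetical helpers `p_at_least_one` and `p_exactly_one`. It assumes the events are mutually independent, as these examples do, and uses `math.prod` (Python 3.8+).

```python
from fractions import Fraction
from math import prod

def p_at_least_one(ps):
    """P(at least one of several independent events) = 1 - prod(1 - p)."""
    return 1 - prod(1 - p for p in ps)

def p_exactly_one(ps):
    """P(exactly one of several independent events occurs)."""
    total = Fraction(0)
    for i, p in enumerate(ps):
        others_fail = prod(1 - q for j, q in enumerate(ps) if j != i)
        total += p * others_fail
    return total

# Example 4: husband 1/7, wife 1/5 -> both, only one, none
husband_wife = [Fraction(1, 7), Fraction(1, 5)]
print(prod(husband_wife), p_exactly_one(husband_wife), 1 - p_at_least_one(husband_wife))
# 1/35 2/7 24/35

# Example 6: at least one of three students solves the problem
students = [Fraction(1, 2), Fraction(3, 4), Fraction(1, 4)]
print(p_at_least_one(students))  # 29/32

# Example 3: A and B contradict when exactly one of them tells the truth
print(p_exactly_one([Fraction(3, 5), Fraction(5, 8)]))  # 19/40
```

The printed 2/7 is the reduced form of the 10/35 obtained in Example 4(b).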

Aliter
$$P(A \cup B \cup C) = 1 - P[(A \cup B \cup C)^C] = 1 - P(A^C \cap B^C \cap C^C) = 1 - P(A^C)\,P(B^C)\,P(C^C)$$
$$= 1 - \left(1 - \frac{1}{2}\right)\left(1 - \frac{3}{4}\right)\left(1 - \frac{1}{4}\right) = 1 - \frac{3}{32} = \frac{29}{32}$$
Example 7
A purse contains 2 silver coins and 4 copper coins, and a second purse contains 4 silver coins and 3 copper coins. If a coin is selected at random from one of the purses, what is the probability that it is a silver coin?
Solution
Define the events
$B_1$: selection of the 1st purse
$B_2$: selection of the 2nd purse
A: selection of a silver coin
$P(B_1) = P(B_2) = 1/2$
$P(A \mid B_1) = 2/6$, $P(A \mid B_2) = 4/7$
By the theorem on total probability,
$$P(A) = P(A \cap B_1) + P(A \cap B_2) = P(B_1)\,P(A \mid B_1) + P(B_2)\,P(A \mid B_2)$$
$$= \frac{1}{2} \times \frac{2}{6} + \frac{1}{2} \times \frac{4}{7} = \frac{1}{6} + \frac{2}{7} = \frac{7 + 12}{42} = \frac{19}{42}$$

Example 8
Suppose that there is a chance for a newly constructed house to collapse whether the design is faulty or not. The chance that the design is faulty is 10%. The chance that the house collapses if the design is faulty is 95%, and otherwise it is 45%. It is seen that the house collapsed. What is the probability that it is due to faulty design?
Solution
Let $B_1$ and $B_2$ denote the events that the design is faulty and that the design is good, respectively. Let A denote the event that the house collapses. Then we are interested in $P(B_1 \mid A)$, the probability that the design is faulty given that the house collapsed. We are given
$P(B_1) = 0.1$ and $P(B_2) = 0.9$
$P(A \mid B_1) = 0.95$ and $P(A \mid B_2) = 0.45$
Hence, by Bayes' theorem,
$$P(B_1 \mid A) = \frac{P(B_1)\,P(A \mid B_1)}{P(B_1)\,P(A \mid B_1) + P(B_2)\,P(A \mid B_2)} = \frac{(0.1)(0.95)}{(0.1)(0.95) + (0.9)(0.45)} = 0.19$$
Example 9
Two urns I and II contain respectively 3 white and 2 black balls, and 2 white and 4 black balls. One ball is transferred from urn I to urn II and then one is drawn from the latter. It happens to be white. What is the probability that the transferred ball was white?
Solution
Define
$B_1$: transfer a white ball from urn I to urn II
$B_2$: transfer a black ball from urn I to urn II

A: select a white ball from urn II
Here, $P(B_1) = 3/5$, $P(B_2) = 2/5$
$P(A \mid B_1) = 3/7$, $P(A \mid B_2) = 2/7$
We have to find $P(B_1 \mid A)$. By Bayes' theorem,
$$P(B_1 \mid A) = \frac{P(B_1)\,P(A \mid B_1)}{P(B_1)\,P(A \mid B_1) + P(B_2)\,P(A \mid B_2)} = \frac{\frac{3}{5} \times \frac{3}{7}}{\frac{3}{5} \times \frac{3}{7} + \frac{2}{5} \times \frac{2}{7}} = \frac{9/35}{13/35} = \frac{9}{13}$$
EXERCISES
Multiple choice questions
1. Probability is a measure lying between
a) −∞ to +∞ b) −∞ to +1
c) −1 to +1 d) 0 to 1
2. Classical probability is also known as
a) Laplace's probability b) mathematical probability
c) a priori probability d) all the above
3. Each outcome of a random experiment is called
a) primary event b) compound event
c) derived event d) all the above
4. If A and B are two events, the probability of occurrence of either A or B is given by
a) P(A) + P(B) b) $P(A \cup B)$
c) $P(A \cap B)$ d) P(A)P(B)
5. The probability of intersection of two disjoint events is always
a) infinity b) zero
c) one d) none of the above

6. If $A \subset B$, the probability P(A | B) is equal to
a) zero b) one
c) P(A)/P(B) d) P(B)/P(A)
7. The probability of two persons being born on the same day (ignoring date) is
a) 1/49 b) 1/365
c) 1/7 d) none of the above
8. The probability of throwing an odd sum with two fair dice is
a) 1/4 b) 1/16 c) 1 d) 1/2
9. If P(A | B) = 1/4 and P(B | A) = 1/3, then P(A)/P(B) is equal to
a) 3/4 b) 7/12
c) 4/3 d) 1/12
10. If four whole numbers are taken at random and multiplied, the chance that the first digit in their product is 0, 3, 6 or 9 is
a) $(2/5)^3$ b) $(1/4)^3$ c) $(2/5)^4$ d) $(1/4)^4$
Fill in the blanks
11. The classical definition of probability was given by ……….
12. An event consisting of only one point is called ……….
13. Mathematical probability cannot be calculated if the outcomes are ……….
14. In statistical probability, n is never ……….
15. If A and B are two events, then $P(A \cup B)$ is ……….
16. The axiomatic definition of probability was propounded by ……….
17. Bayes' rule is also known as ……….
18. If an event is not simple, it is a ……….


Very short answer questions
19. Define a simple event.
20. Define random experiment.
21. Define equally likely cases.
22. State the statistical definition of probability.
23. Define conditional probability.
24. State Bayes’ rule.
Short essay questions
25. Define sample space and event. When will you say that two events are mutually exclusive?
26. Define random experiment, sample space and event. A coin is repeatedly tossed till a head turns up. Write down the sample space.
27. Give the classical and axiomatic definitions of probability. Explain how the axiomatic definition is more general than the classical one.
28. Define (i) mutually exclusive events, (ii) equally likely events and (iii) independent events, and give an example of each.
29. Give Von Mises’ definition of empirical probability. Compare this with the classical definition of probability.
30. State and prove the addition theorem of probability.
31. Define conditional probability.
32. State and prove the addition and multiplication theorems of probability.
33. Show that
P(A ∩ B) ≤ P(A) ≤ P(A ∪ B) ≤ P(A) + P(B)
34. State and prove Bayes’ theorem.
35. Define conditional probability. Prove that if P(A) > P(B) then P(A|B) > P(B|A).
36. Let A, B and C denote events. If P(A|C) ≥ P(B|C) and P(A|Cᶜ) ≥ P(B|Cᶜ), then show that P(A) ≥ P(B).
Long essay questions
37. Two unbiased dice are tossed. What is the probability that the sum of points scored on the two dice is 8?
38. From a group consisting of 6 men and 4 women a committee of 3 is to be chosen by lot. What is the probability that all 3 are men?
39. Two events A and B are statistically independent. P(A) = 0.39, P(B) = 0.21 and P(A or B) = 0.47. Find the probability that
(a) neither A nor B will occur
(b) both A and B will occur
(c) B will occur given that A has occurred
(d) A will occur given that B has occurred.
40. If P(A) = 0.3, P(B) = 0.2 and P(A ∪ B) = 0.4, find P(A ∩ B). Examine whether A and B are independent.
41. The probability that A hits a target is 1/4 and the probability that B hits it is 2/5. What is the probability that the target will be hit if A and B each shoot at the target?
42. A coin is tossed four times. Assuming that the coin is unbiased, find the probability that out of four times, two times result in head.
43. Two urns each contain balls of different colours as stated below.
urn I : 4 black; 3 red; 3 green.
urn II : 3 black; 6 red; 1 green.
An urn is chosen at random and two balls are drawn from it. What is the probability that one is green and the other is red?
44. If two dice are rolled, what is the probability that the sum is 7 if we know that at least one die shows 4?
45. There are three urns containing balls of different colours as stated below:
Urn I : 4 red, 2 black, 4 green.
Urn II : 3 red, 4 black, 5 green.
Urn III: 2 red, 4 black, 2 green.


An urn is chosen at random and two balls are drawn from it.
What is the probability that both are red?
46. Three urns are given, each containing red and white chips as
indicated.
Urn 1 : 6 red and 4 white.
Urn 2 : 2 red and 6 white.
Urn 3 : 1 red and 8 white.
(i) An urn is chosen at random and a ball is drawn from the
urn. The ball is red. Find the probability that the urn chosen
was urn 1.
(ii) An urn is chosen at random and two balls are drawn
without replacement from this urn. If both balls are red, find
the probability that urn 1 was chosen. Under these
conditions, what is the probability that urn 3 was chosen?
47. State Bayes’ theorem. A box contains 3 blue and 2 red balls
while another box contains 2 blue and 5 red balls. A ball
drawn at random from one of the boxes turns out to be blue.
What is the probability that it came from the first box?
48. In a factory, machines A, B and C produce 2000, 4000 and
5000 items in a month respectively. Of their output, 5%,
3% and 7% respectively are defective. From the factory’s products one is
selected at random and inspected. What is the probability
that it is good? If it is good, what is the probability that it is
from machine C?
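The total probability and Bayes’ theorem calculations of Examples 7, 8 and 9 above can be checked with exact rational arithmetic. The following is a minimal Python sketch; the helper names total_probability and bayes are our own (hypothetical), and each example is encoded as a list of prior probabilities P(Bi) together with a list of likelihoods P(A|Bi) for a partition B1, B2.

```python
from fractions import Fraction as Fr

def total_probability(priors, likelihoods):
    """P(A) = sum over i of P(B_i) P(A | B_i), for a partition B_1, ..., B_n."""
    return sum(p * l for p, l in zip(priors, likelihoods))

def bayes(priors, likelihoods, i):
    """Posterior P(B_i | A) = P(B_i) P(A | B_i) / P(A)."""
    return priors[i] * likelihoods[i] / total_probability(priors, likelihoods)

# Example 7: each purse chosen with probability 1/2; A = a silver coin is drawn.
p_silver = total_probability([Fr(1, 2), Fr(1, 2)], [Fr(2, 6), Fr(4, 7)])

# Example 8: B1 = faulty design, B2 = good design; A = the house collapses.
p_faulty = bayes([Fr(1, 10), Fr(9, 10)], [Fr(95, 100), Fr(45, 100)], 0)

# Example 9: B1 = white ball transferred, B2 = black ball transferred; A = white drawn from urn II.
p_white_transferred = bayes([Fr(3, 5), Fr(2, 5)], [Fr(3, 7), Fr(2, 7)], 0)

print(p_silver, p_faulty, p_white_transferred)  # 19/42 19/100 9/13
```

Here 19/100 = 0.19, in agreement with Example 8. Using Fraction rather than floats keeps answers such as 9/13 exact.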


MODULE IV

RANDOM VARIABLE
AND
PROBABILITY DISTRIBUTIONS
We have seen that probability theory was generally
characterised as a collection of techniques to describe, analyse and
predict random phenomena. We then introduced the concept of
sample space, identified events with subsets of this space and
developed some techniques of evaluating probabilities of events.
The purpose of this chapter is to introduce the concepts of
random variables, distribution functions and density functions. A thorough
understanding of these concepts is essential for the
development of this subject.
Random variables, to be introduced now, can be regarded
merely as useful tools for describing events. A random variable will
be defined as a numerical function on the sample space S.
Definition
A random variable (r.v.) is a real valued function defined over
the sample space. So its domain of definition is the sample space S
and its range is the real line extending from −∞ to +∞. In other words,
a r.v. is a mapping from the sample space to the real numbers. Random
variables are also called chance variables or stochastic variables. A
random variable is denoted by X or X(ω), ω ∈ S.
In symbols, X : S → ℝ = (−∞, +∞).


The above definition of a random variable is, as such, not complete,
because not every function defined on S is a random variable: it
has to satisfy some basic requirements.
From the point of view of modern mathematics, an acceptable
definition of a random variable is given below.
A random variable X is a function whose domain is S and whose range is the
set of real values from −∞ to +∞ such that the set {X ≤ x} ∈ B, the
Borel field (σ-field of events) on S, for every real number x. That means random variables
are functions on S which are measurable w.r.t. the Borel field.
Here we can note that each set of the form {X ≤ x} is an event.
As an illustration, consider a random experiment consisting of
two tosses of a coin. Then the sample space is given by S = {HH,
HT, TH, TT}.
To each outcome ω in the sample space there corresponds
a real number X(ω), here the number of heads in ω. It can be presented in tabular form as

Outcome (ω)   : HH HT TH TT
Value of X(ω) : 2 1 1 0

Thus we have defined a one dimensional random variable as a
real valued function on S which is associated with a random
experiment. That is, a one dimensional random variable is a
measurable function X(ω) with domain S and range (−∞, +∞) such
that for every real number x,
the event {ω : X(ω) ≤ x} ∈ B.
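To make the idea of a random variable as a function on S concrete, here is a small Python sketch for the two-toss illustration. It assumes the coin is fair, so that each of the four outcomes has probability 1/4 (an assumption not stated above); the names S, P, X and prob_X_le are illustrative. The function prob_X_le computes the probability of the event {ω : X(ω) ≤ x}.

```python
from fractions import Fraction

# Sample space of two tosses; a fair coin is ASSUMED, so each outcome has probability 1/4.
S = ["HH", "HT", "TH", "TT"]
P = {w: Fraction(1, 4) for w in S}

def X(w):
    """The random variable: X(w) = number of heads in the outcome w."""
    return w.count("H")

def prob_X_le(x):
    """P(X <= x), the probability of the event {w : X(w) <= x}."""
    event = [w for w in S if X(w) <= x]
    return sum((P[w] for w in event), Fraction(0))

print([X(w) for w in S])                          # [2, 1, 1, 0]
print(prob_X_le(0), prob_X_le(1), prob_X_le(2))   # 1/4 3/4 1
```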


Example
In a coin tossing experiment, we note that
S = {ω1, ω2}, where ω1 = Head, ω2 = Tail.
Now define
X(ω) = 1 if ω = Head (H)
= 0 if ω = Tail (T).
Here the random variable X(ω) takes only two values, as ω can
be either head or tail. Such a random variable is known as a Bernoulli
random variable.
Remark: If X1 and X2 are r.v.s and C is a constant,
then (i) CX1 is a r.v.
(ii) X1 + X2 is a r.v.
(iii) X1 − X2 is a r.v.
(iv) max[X1, X2] is a r.v.
(v) min[X1, X2] is a r.v.
Random variables are of two types: (i) discrete and (ii) continuous. A
random variable X is said to be discrete if it takes a finite
or a countably infinite number of values. The
possible values of a discrete random variable can be labelled as x1,
x2, x3, .... Examples are the number of defective articles produced in a factory in
a day, the number of deaths due to road accidents in a day in a
city, and the number of patients arriving at a doctor’s clinic in a day.
A random variable which is not discrete is said to be continuous.
That means it can assume all the (uncountably many) values in a
specified interval of the form [a, b].
A few examples of continuous random variables are given below.
(1) A man brushes his teeth every morning; X represents the
time he takes for brushing next time. (2) X represents the height of
a student randomly chosen from a college.
(3) X represents the service time of a doctor on his next patient.
(4) X represents the lifetime of a tube.


Note that r.v.s are denoted by capital letters X, Y, Z etc. and the
corresponding small letters are used to denote the values of a r.v.
We are not so much interested in random variables themselves as in
events defined in terms of random variables. From the
definition of the random variable X, we have seen that each set of
the form {X ≤ x} is an event. The basic types of events that we shall
consider are the following:
{X = a}, {X = b}, {a < X < b}, {a < X ≤ b}, {a ≤ X < b},
{a ≤ X ≤ b}, where −∞ ≤ a ≤ b ≤ ∞. Since these subsets are
events, it is permissible to speak of their probabilities. Thus with every
random variable we can associate its probability distribution or
simply distribution.
Definition
By a distribution of the random variable X we mean the
assignment of probabilities to all events defined in terms of this
random variable.
Now we shall discuss the probability distributions in the case of
discrete as well as continuous random variables.
Probability Distributions
i. Discrete:
The probability distribution, or simply distribution, of a discrete r.v.
is a list of the distinct values xi of X with their associated
probabilities
f(xi) = P(X = xi).
Thus let X be a discrete random variable assuming the values
x1, x2, ..., xn from the real line, and let the corresponding probabilities be
f(x1), f(x2), ..., f(xn). Then P(X = xi) = f(xi) is called the probability mass
function (pmf) or probability function of X, provided it satisfies the
conditions
(i) f(xi) ≥ 0 for all i
(ii) Σ f(xi) = 1
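The two conditions can be checked mechanically. Below is a minimal Python sketch, assuming the pmf is stored as a dictionary from the values xi to exact fractions f(xi); the function name is_pmf and the three sample dictionaries are our own illustrations. With floating-point probabilities the test Σ f(xi) = 1 would need a tolerance instead of exact equality.

```python
from fractions import Fraction

def is_pmf(f):
    """True if f (a dict {x_i: f(x_i)}) satisfies (i) f(x_i) >= 0 and (ii) sum of f(x_i) = 1."""
    nonnegative = all(p >= 0 for p in f.values())
    sums_to_one = sum(f.values()) == 1
    return nonnegative and sums_to_one

good = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
bad_sum = {0: Fraction(1, 2), 1: Fraction(1, 3)}        # sums to 5/6
bad_sign = {0: Fraction(3, 2), 1: Fraction(-1, 2)}      # sums to 1 but has a negative value

print(is_pmf(good), is_pmf(bad_sum), is_pmf(bad_sign))  # True False False
```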


The probability distribution of X may be stated either in the form of a
table or in the form of a formula. The formula gives f(x), which
represents P(X = x), in terms of x; it can also be denoted by p(x). A formula is
always convenient, but one need not be available for every random variable. We
can also sketch the graph of a probability distribution or probability mass
function, as described below.
Graphical presentation of a probability distribution
It is usually true that we cannot appreciate the salient features of a
probability distribution by looking at the numbers in a table. The two main
ways to graph a discrete probability distribution are the line diagram and
the probability histogram.
Line diagram
The distinct values of the r.v. X are marked on the X axis. At each
value x, a vertical line is drawn whose height is equal to its probability f(x)
or p(x). For example, the line diagram for the following probability
distribution is shown below.
Value x : 1 2 3 4
f(x) : 1/8 1/4 1/2 1/8

[Figure: Line Diagram, with vertical lines of height f(x) at x = 1, 2, 3 and 4; the vertical axis is marked 1/8 to 4/8.]


Probability Histogram
On the X axis we take the values of the r.v. With each value x as centre, a
vertical rectangle is drawn whose area is equal to the probability f(x). Note
that in plotting a probability histogram the area of a rectangle must be
equal to the probability of the value at its centre, so that the total area
under the histogram is equal to the total probability (i.e. unity). For
example, the probability histogram of the following probability distribution is
given below.

x : 0 0.5 1.0 1.5 2.0
f(x) : 0.10 0.20 0.30 0.25 0.15

[Figure: Probability histogram. Rectangles of width 0.5 are centred at x = 0, 0.5, 1.0, 1.5 and 2.0; the tallest, at x = 1.0, has height 0.6 and area 0.6 × 0.5 = 0.3.]

The probability histogram is recommended for distributions
with equally spaced x values. When the spacing of the x values is
unequal, the line diagram should be used. One advantage of the
probability histogram is that we can compare two or more
probability distributions to determine the nature and extent of their
similarities and dissimilarities.
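Because the area, not the height, of each rectangle represents f(x), each height must be f(x) divided by the common spacing of the x values. The short Python sketch below (variable names are illustrative) computes the rectangle heights for the histogram example above and confirms that the total area is 1.

```python
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
fs = [0.10, 0.20, 0.30, 0.25, 0.15]
width = xs[1] - xs[0]   # equal spacing of the x values = width of every rectangle (0.5)

# Height of each rectangle chosen so that height * width = f(x).
heights = [p / width for p in fs]
total_area = sum(h * width for h in heights)

for x, h in zip(xs, heights):
    left, right = x - width / 2, x + width / 2
    print(f"x = {x:.1f}: base [{left:.2f}, {right:.2f}], height {h:.2f}")
print(f"total area = {total_area:.2f}")
```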


ii. Continuous
We now turn our attention to describing the probability
distribution of a random variable that can assume all the values in
an interval. The probability distribution of a continuous random
variable can be visualised as a smoothed form of the relative
frequency histogram based on a large number of observations.
Because probability is interpreted as long run relative frequency,
the curve obtained as the limiting form of the relative frequency
histograms represents the manner in which the total probability is
distributed over the range of possible values of the random variable
X. The mathematical function f(x) whose graph
produces this curve is called the probability density function of the
continuous r.v. X.
Definition
If X is a continuous random variable and if P(x ≤ X ≤ x + dx) =
f(x)dx, then f(x) is called the probability density function (pdf) of the
continuous r.v. X, provided it satisfies the conditions (i) f(x) ≥ 0 ∀ x and
(ii) $\int_{-\infty}^{\infty} f(x)\,dx = 1$.

We can justify the term ‘density function’ to some extent from the
following argument. We have
$$\int_{x}^{x+\Delta x} f(t)\,dt = P(x \le X \le x + \Delta x).$$
When Δx is very small, the mean value theorem for integrals gives the
approximation
$$\int_{x}^{x+\Delta x} f(t)\,dt \approx f(x)\,\Delta x$$
$$\therefore\ f(x)\,\Delta x \approx P(x \le X \le x + \Delta x)$$
$$\therefore\ f(x) \approx \frac{P(x \le X \le x + \Delta x)}{\Delta x} = \frac{\text{total probability in the interval } (x, x + \Delta x)}{\text{length of the interval}}$$
= probability per unit length
= density of probability.
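The ratio P(x ≤ X ≤ x + Δx)/Δx can be watched approaching f(x) numerically. The Python sketch below assumes, purely for illustration, the density f(x) = 3x² on [0, 1], whose distribution function is F(x) = x³ there; the probability of the small interval is computed as F(x + Δx) − F(x).

```python
def F(x):
    """Distribution function of the ASSUMED density: 0 for x < 0, x^3 on [0, 1], 1 for x > 1."""
    if x < 0:
        return 0.0
    if x > 1:
        return 1.0
    return x ** 3

def f(x):
    """The ASSUMED density f(x) = 3x^2 on [0, 1], 0 elsewhere."""
    return 3 * x ** 2 if 0 <= x <= 1 else 0.0

x0 = 0.5
for dx in (0.1, 0.01, 0.001, 0.0001):
    ratio = (F(x0 + dx) - F(x0)) / dx     # P(x0 <= X <= x0 + dx) / dx
    print(f"dx = {dx:<7} probability per unit length = {ratio:.5f}")
print("f(0.5) =", f(x0))                  # f(0.5) = 0.75
```

As Δx shrinks, the printed ratio moves towards f(0.5) = 0.75, which is exactly the sense in which f(x) is a density of probability.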


Result 1
$$P(a \le X \le b) = P(a \le X < b) = P(a < X \le b) = P(a < X < b) = \int_a^b f(x)\,dx$$
= the area under the curve y = f(x), enclosed between the ordinates
drawn at x = a and x = b.
Result 2
The probability that a continuous r.v. X will assume a particular value
is zero, i.e., P(X = a) = 0.
Distribution function
Definition
For any random variable X, the function of the real variable x
defined as $F_X(x) = P(X \le x)$ is called the cumulative probability
distribution function or simply the cumulative distribution function (cdf)
of X. We can note that the probability distribution of a random
variable X is determined by its distribution function.
If X is a discrete r.v. with pmf p(x), then the cumulative
distribution function is defined as
$$F_X(x) = P(X \le x) = \sum_{t \le x} p(t).$$
If X is a continuous r.v. with pdf f(x), then the distribution function
is defined as
$$F_X(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt.$$

For convenience we will write $F_X(x)$ as F(x) or F.

Properties of distribution function


If F(x) is the distribution function of a r.v. X then it has the
following properties.
1. F(x) is defined for all real values of x.
2. F(−∞) = 0, F(+∞) = 1.


3. 0 ≤ F(x) ≤ 1.
4. F(a) ≤ F(b) if a < b.
That means F(x) is non-decreasing.
5. If X is discrete, F(b) − F(a) = P(a < X ≤ b).
6. For a discrete r.v., the graph of F(x) is a step function
or a staircase function.
7. F(x) is a continuous function of x on the right.
8. If X is continuous, F(b) − F(a) = P(a ≤ X ≤ b) = area under the
probability curve between x = a and x = b.
9. F(x) possesses a continuous graph if X is continuous. If F(x)
possesses a derivative, then
$$\frac{dF(x)}{dx} = f(x).$$
10. The discontinuities of F(x) are at most countable.

Moments of a continuous probability distribution
Let X be a random variable with the pdf f(x). Then for a
continuous r.v., we can calculate the measures of central tendency
and the measures of dispersion as follows. For a discrete r.v., we
have to replace integration by summation.
1. The arithmetic mean of X is
$$\mu = \int_{-\infty}^{\infty} x\, f(x)\,dx.$$
2. The median M is determined by solving the equation
$$\int_{-\infty}^{M} f(x)\,dx = \frac{1}{2} \quad\text{or}\quad \int_{M}^{\infty} f(x)\,dx = \frac{1}{2}.$$
3. The mode (Z) is determined by solving the equation f′(x) = 0 and
verifying the condition f″(x) < 0 at the mode.
4. The geometric mean G is given by the equation
$$\log G = \int (\log x)\, f(x)\,dx.$$
5. The harmonic mean H can be calculated from the equation


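$$\frac{1}{H} = \int \frac{1}{x}\, f(x)\,dx.$$
6. The quartiles Q1 and Q3 are determined by solving the equations
$$\int_{-\infty}^{Q_1} f(x)\,dx = \frac{1}{4}; \qquad \int_{-\infty}^{Q_3} f(x)\,dx = \frac{3}{4}.$$
7. The rth central moment is determined by the equation
$$\mu_r = \int_{-\infty}^{\infty} (x - \mu)^r f(x)\,dx, \quad r = 1, 2, 3, \ldots$$
where μ is the mean of X.
8. In particular, the variance of X is calculated as
$$\mu_2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx.$$
9. The mean deviation (MD) about the mean is given by
$$\text{MD} = \int_{-\infty}^{\infty} |x - \mu|\, f(x)\,dx.$$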
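When a closed form is inconvenient, these integrals can be approximated numerically. The following Python sketch assumes the illustrative density f(x) = 3x² on [0, 1], for which the mean is 3/4, the variance 3/80 and the median 2^(−1/3) ≈ 0.7937. It uses the midpoint rule for the integrals (items 1, 8 and 9) and bisection to solve the median and quartile equations (items 2 and 6). The helper names integrate and quantile are our own.

```python
def f(x):
    """ASSUMED illustrative density: f(x) = 3x^2 on [0, 1], 0 elsewhere."""
    return 3 * x * x if 0 <= x <= 1 else 0.0

def integrate(g, lo, hi, n=100_000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))

A, B = 0.0, 1.0   # support of f; outside it f is zero

mean = integrate(lambda x: x * f(x), A, B)                 # item 1
var = integrate(lambda x: (x - mean) ** 2 * f(x), A, B)    # item 8 (mu_2)
md = integrate(lambda x: abs(x - mean) * f(x), A, B)       # item 9

def quantile(p, tol=1e-10):
    """Solve integral_A^q f(x) dx = p for q by bisection (p = 1/2 gives the median)."""
    lo, hi = A, B
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if integrate(f, A, mid, 2_000) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

median = quantile(0.5)
q1, q3 = quantile(0.25), quantile(0.75)
print(round(mean, 4), round(var, 4), round(median, 4))   # 0.75 0.0375 0.7937
print(round(q1, 4), round(q3, 4), round(md, 4))
```

Bisection is used for the median and quartiles because the left-hand side of those equations is non-decreasing in the upper limit, so the root is always bracketed.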

SOLVED PROBLEMS
Example 1
Obtain the probability distribution of the number of heads when
three coins are tossed together.
Solution
When three coins are tossed, the sample space is given by
S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}. Here the r.v. X,
defined as the number of heads obtained, takes the values 0, 1,
2 and 3 from the real line w.r.t. each outcome in S. Since the eight outcomes are
equally likely, we can assign probabilities to each value of the r.v. as follows.
P{X = 0} = P{TTT} = 1/8
P{X = 1} = P{HTT or THT or TTH} = 3/8
P{X = 2} = P{HHT or HTH or THH} = 3/8
P{X = 3} = P{HHH} = 1/8


Thus the probability distribution of X is given by

X = x    : 0 1 2 3 Total
P(X = x) : 1/8 3/8 3/8 1/8 1
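The same distribution can be obtained by brute-force enumeration of the sample space. A Python sketch, assuming fair coins so that all 2³ = 8 outcomes are equally likely:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# The 8 equally likely outcomes of tossing three fair coins, e.g. "HHT".
outcomes = ["".join(t) for t in product("HT", repeat=3)]

# X = number of heads; count how many outcomes give each value of X.
counts = Counter(w.count("H") for w in outcomes)
dist = {x: Fraction(counts[x], len(outcomes)) for x in sorted(counts)}

for x, p in dist.items():
    print(x, p)   # prints the lines 0 1/8, 1 3/8, 2 3/8, 3 1/8
```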

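Example 2
A lot contains 25 items, 5 of which are defective. Four items
are chosen at random. Find the probability distribution of the
number of defectives obtained.
Solution
Let X be the number of defectives obtained. Here X takes the
values 0, 1, 2, 3 and 4. If the items are drawn without replacement,
$$P\{X = 0\} = \frac{\binom{5}{0}\binom{20}{4}}{\binom{25}{4}} = \frac{4845}{12650} = \frac{969}{2530}$$
$$P\{X = 1\} = \frac{\binom{5}{1}\binom{20}{3}}{\binom{25}{4}} = \frac{5700}{12650} = \frac{1140}{2530}$$
$$P\{X = 2\} = \frac{\binom{5}{2}\binom{20}{2}}{\binom{25}{4}} = \frac{1900}{12650} = \frac{380}{2530}$$
$$P\{X = 3\} = \frac{\binom{5}{3}\binom{20}{1}}{\binom{25}{4}} = \frac{200}{12650} = \frac{40}{2530}$$
$$P\{X = 4\} = \frac{\binom{5}{4}\binom{20}{0}}{\binom{25}{4}} = \frac{5}{12650} = \frac{1}{2530}$$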


Thus the probability distribution of X is

X = x    : 0 1 2 3 4 Total
P(X = x) : 969/2530 1140/2530 380/2530 40/2530 1/2530 1

We can write this probability distribution as a probability function
as shown below.
$$f(x) = P(X = x) = \frac{\binom{5}{x}\binom{20}{4-x}}{\binom{25}{4}}, \quad x = 0, 1, 2, 3, 4$$
= 0, elsewhere.
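The probability function above is the hypergeometric formula, which can be evaluated exactly with Python’s math.comb (available from Python 3.8) and fractions.Fraction. A minimal sketch; the names N, D, n and f are ours:

```python
from fractions import Fraction
from math import comb

N, D, n = 25, 5, 4   # lot size, defectives in the lot, items drawn without replacement

def f(x):
    """P(X = x) = C(5, x) C(20, 4 - x) / C(25, 4) for x = 0, ..., 4, and 0 elsewhere."""
    if not 0 <= x <= n:
        return Fraction(0)
    return Fraction(comb(D, x) * comb(N - D, n - x), comb(N, n))

for x in range(n + 1):
    print(x, f(x) * 2530, "/ 2530")     # numerators 969, 1140, 380, 40, 1
print(sum(f(x) for x in range(n + 1)))  # 1
```

Fraction reduces automatically (for example 1140/2530 is stored as 114/253), so the values are multiplied by 2530 only to display them over the common denominator used in the table.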

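Example 3
Examine whether f(x) as defined below is a pdf.
f(x) = 0 ; x < 2
= (3 + 2x)/18 ; 2 ≤ x ≤ 4
= 0 ; x > 4.
Solution
To show that f(x) is a pdf we have to show that f(x) ≥ 0 for all x and
$$\int_{-\infty}^{\infty} f(x)\,dx = 1.$$
Clearly f(x) ≥ 0 for all x. Also
$$\int_{-\infty}^{\infty} f(x)\,dx = \frac{1}{18}\int_{2}^{4} (3 + 2x)\,dx = \frac{1}{18}\left[3x + x^2\right]_{2}^{4} = \frac{1}{18}\left[(12 + 16) - (6 + 4)\right] = \frac{18}{18} = 1.$$

So f(x) is a pdf.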


Example 4
If the distribution function F(x) is given by
⎧ 0< ≤1

F(x) =
⎨− + 0< ≤2

⎩ 1 <2
find the density function.
Solution
We know that
$$\frac{dF(x)}{dx} = f(x).$$
( )
Here f(x) = = , when 0 < x ≤ 1

= (3 − ), when 1 < x ≤ 2
= 0, otherwise.

Example 5
Examine whether the following is a distribution function.
F(x) = 0, x < −a
= (1/2)(x/a + 1), −a ≤ x ≤ a
= 1, x > a
Solution
a. F(x) is defined for all real values of x.
b. F(−∞) = 0;
c. F(∞) = 1;
d. F(x) is non-decreasing.
e. F′(x) = 1/(2a), −a < x < a
= 0, x < −a or x > a.


F’(x) satisfies the pdf properties.


∴ F(x) is a distribution function.

Example 6
Given
F(x) = 0, x < 0
= x², 0 ≤ x ≤ 1
= 1, x > 1,
determine (a) P(X ≤ 0.5) (b) P(0.5 ≤ X ≤ 0.8) (c) P(X > 0.9).
Solution
(a) P(X ≤ 0.5) = F(0.5) = (0.5)² = 0.25
(b) P(0.5 ≤ X ≤ 0.8) = P(0.5 < X ≤ 0.8) (since P(X = 0.5) = 0 for a continuous r.v.)
= F(0.8) − F(0.5)
= (0.8)² − (0.5)² = 0.39
(c) P(X > 0.9) = 1 − P(X ≤ 0.9) = 1 − F(0.9)
= 1 − (0.9)² = 0.19
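All three probabilities follow directly from F. A small Python sketch (the function name F mirrors the text; results are rounded before printing because of floating-point representation):

```python
def F(x):
    """Distribution function of Example 6: 0 for x < 0, x^2 on [0, 1], 1 for x > 1."""
    if x < 0:
        return 0.0
    if x <= 1:
        return x * x
    return 1.0

p_a = F(0.5)              # P(X <= 0.5)
p_b = F(0.8) - F(0.5)     # P(0.5 <= X <= 0.8); P(X = 0.5) = 0 for a continuous r.v.
p_c = 1 - F(0.9)          # P(X > 0.9)
print(round(p_a, 2), round(p_b, 2), round(p_c, 2))   # 0.25 0.39 0.19
```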
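Example 7
A random variable X has the density function
f(x) = K/(1 + x²), if −∞ < x < ∞
= 0, elsewhere.
Determine K and the distribution function.

Solution
We know that $\int_{-\infty}^{\infty} f(x)\,dx = 1$,
$$\text{i.e.}\quad \int_{-\infty}^{\infty} \frac{K}{1 + x^2}\,dx = 1$$
$$K\left[\tan^{-1} x\right]_{-\infty}^{\infty} = 1$$
$$K\left[\frac{\pi}{2} - \left(-\frac{\pi}{2}\right)\right] = 1$$
$$K\pi = 1, \quad K = \frac{1}{\pi}$$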


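The distribution function is
$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt = \frac{1}{\pi}\int_{-\infty}^{x} \frac{dt}{1 + t^2} = \frac{1}{\pi}\left[\tan^{-1} t\right]_{-\infty}^{x}$$
$$= \frac{1}{\pi}\left[\tan^{-1} x - \left(-\frac{\pi}{2}\right)\right] = \frac{1}{2} + \frac{1}{\pi}\tan^{-1} x, \quad -\infty < x < \infty.$$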
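A quick numerical check of K = 1/π and of the distribution function, written as a Python sketch. Because the integral runs over (−∞, ∞), the check replaces ±∞ by large finite limits, an approximation chosen here only for illustration.

```python
import math

K = 1 / math.pi   # from K * pi = 1

def F(x):
    """Distribution function F(x) = 1/2 + (1/pi) * arctan(x)."""
    return 0.5 + math.atan(x) / math.pi

big = 1e12                                         # stands in for infinity
total = K * (math.atan(big) - math.atan(-big))     # K [arctan x] from -big to big
print(total)                                       # very close to 1
print(F(-1), F(0), F(1))                           # approximately 0.25, 0.5, 0.75
```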
Example 8
Evaluate the distribution function F(x) for the following density
function and calculate F(2)
0< ≤1
F(x) = (4 − ) 1 < ≤ 4
0 ℎ
Solution
By definition, $F(x) = \int_{-\infty}^{x} f(y)\,dy$.
Therefore, for any value of x such that −∞ < x < 0,
$$F(x) = \int_{-\infty}^{x} 0\,dy = 0,$$
since f(x) = 0 in this interval. For any x in 0 < x ≤ 1,


F(x) =∫ 0 dy + ∫ dy =0+ =
For any x in 1 < x ≤ 4,
F(x) = ∫ 0 dy + ∫ dy + ∫ (4 − y) dy
= 0+ + ∫ (4 − y) dy

= + 4 −


Evidently, for x ≥ 4, F(x) = 1, and hence F(x) can be written as,

0 −∞ < < 0

⎪ 0< ≤1
⎨ + 4 − 1< ≤4

⎩ 1 ≥4

Hence, F(2) = + 4 −

at x = 2

Example 9
The probability mass function of a r.v. X is given as follows.

x 0 1 2 3 4 5

p(x) k2 ′ 5 2k2 k
4 2
4 2

Find (a) k and (b) the distribution function of X.


Solution
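(a) We know that Σ p(x) = 1. Adding the six probabilities in the table and simplifying, we get
4k² + 3k − 1 = 0
$$k = \frac{-3 \pm \sqrt{9 + 16}}{8} = \frac{-3 \pm 5}{8} = \frac{1}{4} \text{ or } -1$$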


Since probabilities cannot be negative, the root k = −1 is rejected; hence k = 1/4.


(b) The probability distribution of X is

x 0 1 2 3 4 5 Total

p(x) 1/16 1/16 10/16 1/16 2/16 1/16 1

So the distribution function of X is given by
F(x) = P(X ≤ x) = 0 for x < 0
= 1/16 for 0 ≤ x < 1
= 2/16 for 1 ≤ x < 2
= 12/16 for 2 ≤ x < 3
= 13/16 for 3 ≤ x < 4
= 15/16 for 4 ≤ x < 5
= 1 for x ≥ 5
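A step distribution function like this one can be built from the pmf by accumulating the probabilities and locating the last jump point that is ≤ x. The Python sketch below uses the standard bisect and itertools modules and also recovers the two roots of 4k² + 3k − 1 = 0; the names xs, ps, cum and F are illustrative.

```python
import math
from bisect import bisect_right
from fractions import Fraction
from itertools import accumulate

# Roots of 4k^2 + 3k - 1 = 0; only k = 1/4 gives non-negative probabilities.
roots = [(-3 + s * math.sqrt(9 + 16)) / 8 for s in (1, -1)]
print(roots)   # [0.25, -1.0]

# Distribution of X with k = 1/4.
xs = [0, 1, 2, 3, 4, 5]
ps = [Fraction(m, 16) for m in (1, 1, 10, 1, 2, 1)]
cum = list(accumulate(ps))   # value of F at each jump point

def F(x):
    """Step distribution function F(x) = P(X <= x)."""
    i = bisect_right(xs, x)   # number of jump points <= x
    return Fraction(0) if i == 0 else cum[i - 1]

for x in (-1, 0, 1.5, 2, 3, 4.2, 5, 9):
    print(x, F(x))
```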
Example 10
Evaluate the distribution function for the following probability
function.
f(x) = for x = -l

= for x = 0

= for x = 2

= for x = 3

= 0 elsewhere


Solution
By definition, the distribution function F(x) is
$$F(x) = P(X \le x) = \sum_{t \le x} f(t).$$

F(-1) = ∑ f(x) = 0 + =

F(0) = ∑ f(x) = 0 + + =

F(2) = + + =

F(3) = + + + =1

F(x) = 1 for x ≥ 3

This may be written in a better way as follows:


F(x) = 0 for x < - l or F(x) = 0,

= for x = - l = ,-l≤x<0

= for x = 0 = ,0≤x<2

= for x = 2 = ,2≤x<3

= 1 for x ≥ 3 = 1, 3 ≤ x < ∞


Since F(x) is a step function in this case, its graph is a staircase, as
shown below. In general, we can expect the distribution function of a
discrete variate to be a step function.
[Figure: Distribution function of a discrete variable. F(x) is a step function of x with jumps at x = −1, 0, 2 and 3, reaching 1 at x = 3.]


Example 13
A continuous r.v. X has the pdf f(x) = 3x², 0 ≤ x ≤ 1.
Find a and b such that
(i) P(X ≤ a) = P(X > a)
(ii) P(X > b) = 0.05
Solution
(i) $P(X \le a) = \int_0^a f(x)\,dx = \int_0^a 3x^2\,dx = a^3$
$P(X > a) = \int_a^1 f(x)\,dx = \int_a^1 3x^2\,dx = 1 - a^3$
Since P(X ≤ a) = P(X > a),
a³ = 1 − a³


i.e., 2a³ = 1
$$a^3 = \frac{1}{2}, \quad a = \left(\frac{1}{2}\right)^{1/3}$$
(ii) $P(X > b) = \int_b^1 3x^2\,dx = 1 - b^3$
$$1 - b^3 = 0.05, \quad b^3 = 0.95, \quad b = (0.95)^{1/3} \approx 0.98$$

Example 14
Verify that the following is a distribution function:
F(x) = 0 ; x < −a
= (1/2)(x/a + 1) ; −a ≤ x ≤ a
= 1 ; x > a
Solution
Obviously the properties (i), (ii), (iii) and (iv) are satisfied. Also we
observe that F(x) is continuous at x = a and x = −a as well.
Now
$$\frac{d}{dx}F(x) = \frac{1}{2a}, \quad -a \le x \le a$$
= 0, otherwise
= f(x) (say).
In order that F(x) is a distribution function, f(x) must be a p.d.f. Thus
we have to show that
$$\int_{-\infty}^{\infty} f(x)\,dx = 1.$$
Indeed,
$$\int_{-\infty}^{\infty} f(x)\,dx = \int_{-a}^{a} f(x)\,dx = \frac{1}{2a}\int_{-a}^{a} dx = 1.$$
Hence F(x) is a d.f.

Example 15
Let the distribution function of X be
F(x) = 0 if x < −1
= (x + 2)/4 if −1 ≤ x < 1
= 1 if x ≥ 1.
Find P(X = 1).
Solution
We know that P(X = a) = F(a + 0) − F(a − 0).
∴ P(X = 1) = F(1 + 0) − F(1 − 0)
= 1 − (1 + 2)/4 = 1 − 3/4
= 1/4

EXERCISES
Multiple choice questions
1. The outcomes of tossing a coin three times are a variable of the type
a. Continuous r.v. b. Discrete r.v.
c. Neither discrete nor continuous d. Discrete as well as continuous
2. The weight of persons in a country is a r.v. of the type
a. discrete b. continuous
c. neither a nor b d. both a and b
3. Let X be a continuous r.v. with pdf
f(x) = kx, 0 ≤ x ≤ 1;
= k, 1 ≤ x ≤ 2;
= 0 otherwise.
The value of k is equal to
a. 1/4 b. 2/3
c. 2/5 d. 3/4

Fill in the blanks
4. A r.v. is a .......................... function.
5. A continuous r.v. can assume .......................... number of values within the specified range.
6. The total area under a probability curve is ..........................
7. A continuous r.v. X has pdf f(x) = kx, 0 < x < 1; the value of k = ..........................
8. In terms of the distribution function F(x), $\int_a^b f(x)\,dx$ = ..........................
9. The distribution function F(x) lies between ..........................
Very short answer questions
10. Define a random variable.
11. What are the two types of r.v.s?
12. Define probability mass function.
13. Define probability density function.
14. What are the axioms of pdf?
15. Define distribution function of a r.v.
Short essay questions
16. Define distribution function of a random variable and write down its properties.
17. What is the relationship between distribution function and density function?
18. State the properties of probability density function.
19. What is a random variable? Show by an example that any function defined over the sample space need not be a random variable.
20. Find the constant C such that the function
f(x) = Cx², 0 < x < 3
= 0, otherwise
is a pdf and compute P(1 < X < 2).
21. A c.d.f. F(x) is defined as
F(x) = 0, x < 1
= (1/16)(x − 1)⁴, 1 ≤ x ≤ 3
= 1, x > 3.
Find the p.d.f.
Long essay questions
22. A random variable X has pmf given by P(X = 1) = 1/2, P(X = 2) = 1/3 and P(X = 3) = 1/6. Write down the distribution function of X.
23. Find k, P(12 ≤ X ≤ 20) and P(X > 16) if the following is the probability mass function of X.
X : 8 12 16 20 24
f(x) : 1/8 1/6 k/6 1/4 1/12
24. An r.v. Z takes the values 2, 0, 3 and 8 with probabilities 1/12, 1/2, 1/6 and 1/4 respectively. Write down its d.f. and find P(Z ≤ 1) and P(−1 ≤ Z < 8).
25. Define distribution function of a random variable. If X is a random variable with probability mass function
p(x) = x²/30, x = 1, 2, 3, 4
= 0, otherwise,
write down the distribution function of X.
26. Two coins are tossed. X represents the number of heads produced. Determine the probability distribution and the distribution function of X.

27. Examine whether the following can be a p.d.f. If so, find k and P(2 ≤ X ≤ 3).
f(x) = k(x/3 + 1/2), 2 ≤ x ≤ 4, and 0 elsewhere.
28. Obtain the distribution function F(x) for the following p.d.f.
f(x) = x/3, 0 ≤ x ≤ 1
= (5/27)(4 − x), 1 ≤ x ≤ 4
= 0, otherwise
29. For the p.d.f. f(x) = 3ax², 0 ≤ x ≤ a, find a and P(X ≤ 1/2 | 1/3 ≤ X ≤ 2/3).
30. Examine whether
f(x) = 0, x < 2 or x > 4
= x/9 + 1/6, 2 ≤ x ≤ 4
is a p.d.f. If so, calculate P(2 < X < 3).
31. If f(x) = (3/4)(1 − x²) if −1 ≤ x ≤ 1
= 0, otherwise,
show that
F(x) = 0, x < −1
= 1/2 + (3/4)x − (1/4)x³, −1 ≤ x ≤ 1
= 1, x > 1.

CHANGE OF VARIABLE
The change of variable technique is a method of finding the distribution of a function of a random variable. In many probability problems, the form of the density function or the mass function may be so complex as to make computation difficult. This technique provides a compact description of a distribution, and makes it relatively easy to compute the mean, variance etc. We will illustrate this technique by means of examples, separately for the discrete and continuous cases. Here we mention only the univariate case.
1. Suppose that the random variable X takes on the three values −1, 0 and 1 with probabilities 11/32, 16/32 and 5/32 respectively. Let us transform the random variable X by taking Y = 2X + 1.
The random variable Y then takes on the values −1, 1 and 3 respectively, where
P(Y = −1) = P(2X + 1 = −1) = P(X = −1) = 11/32
P(Y = 1) = P(2X + 1 = 1) = P(X = 0) = 16/32
P(Y = 3) = P(2X + 1 = 3) = P(X = 1) = 5/32
Thus the probability distribution of Y is
Y    : −1 1 3 Total
p(y) : 11/32 16/32 5/32 1
2. In the above example, if we transform the r.v. X as Y = X², the possible values of Y are 0 and 1.
Therefore, P(Y = 0) = P(X = 0) = 16/32
P(Y = 1) = P(X = −1 or X = 1)
= P(X = −1) + P(X = 1)
= 11/32 + 5/32 = 16/32


Thus the probability distribution of Y is

Y 0 1 Total
p(y) 16/32 16/32 1

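3. A r.v. X has the density f(x) = (x + 2)/6, 0 < x < 2.
Let
g(X) = 0 if 0 < X ≤ 1
= 1 if 1 < X ≤ 3/2
= 2 if X > 3/2.
We can find the probability mass function of g(X) as follows.
$$P\{g(X) = 0\} = P(0 < X \le 1) = \int_0^1 f(x)\,dx = \frac{1}{6}\int_0^1 (x + 2)\,dx = \frac{1}{6}\left[\frac{x^2}{2} + 2x\right]_0^1 = \frac{1}{6}\cdot\frac{5}{2} = \frac{5}{12} = \frac{20}{48}$$
$$P\{g(X) = 1\} = P(1 < X \le 3/2) = \int_1^{3/2} \frac{x + 2}{6}\,dx = \frac{1}{6}\left[\frac{x^2}{2} + 2x\right]_1^{3/2} = \frac{1}{6}\left[\left(\frac{9}{8} + 3\right) - \left(\frac{1}{2} + 2\right)\right] = \frac{13}{48}$$
$$P\{g(X) = 2\} = P(X > 3/2) = \int_{3/2}^{2} \frac{x + 2}{6}\,dx = \frac{1}{6}\left[\frac{x^2}{2} + 2x\right]_{3/2}^{2} = \frac{1}{6}\left[(2 + 4) - \left(\frac{9}{8} + 3\right)\right] = \frac{15}{48}$$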


The probability distribution of g(x) is


g(x) 0 1 2 Total
P[g(x)] 20/48 13/48 15/48 1

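4. Let X be a continuous r.v. with pdf
f(x) = 2x, 0 < x < 1
= 0, otherwise.
Define Y = 3X + 1, and let G(y) be the distribution function of Y.
Then
$$G(y) = P(Y \le y) = P(3X + 1 \le y) = P\left(X \le \frac{y - 1}{3}\right) = \int_0^{(y-1)/3} 2x\,dx = \left[x^2\right]_0^{(y-1)/3} = \left(\frac{y - 1}{3}\right)^2 \quad\text{if } 0 < \frac{y - 1}{3} < 1,$$
i.e., G(y) = (y − 1)²/9 if 1 < y < 4,
with G(y) = 0 for y ≤ 1 and G(y) = 1 for y ≥ 4.
The density function of Y is given by
g(y) = G′(y) = (2/9)(y − 1), 1 < y < 4
= 0, otherwise.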
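For the discrete cases, examples 1 to 3 all follow one rule: P(Y = y) is the sum of P(X = x) over all x with g(x) = y. The Python sketch below implements that rule (the helper transform_pmf is our own name), applies it to Y = 2X + 1 and Y = X², and rechecks example 3 exactly, using the antiderivative F(x) = (x²/2 + 2x)/6 of f(x) = (x + 2)/6 on (0, 2). Clamping x to [0, 2] simply makes F behave as a distribution function outside the support.

```python
from collections import defaultdict
from fractions import Fraction as Fr

def transform_pmf(pmf, g):
    """pmf of Y = g(X): P(Y = y) is the sum of P(X = x) over all x with g(x) = y."""
    out = defaultdict(Fr)
    for x, p in pmf.items():
        out[g(x)] += p
    return dict(sorted(out.items()))

# Examples 1 and 2: X takes -1, 0, 1 with probabilities 11/32, 16/32, 5/32.
pX = {-1: Fr(11, 32), 0: Fr(16, 32), 1: Fr(5, 32)}
pY_linear = transform_pmf(pX, lambda x: 2 * x + 1)   # one-to-one: probabilities carried over
pY_square = transform_pmf(pX, lambda x: x * x)       # X = -1 and X = 1 both give Y = 1

# Example 3: F(x) = (x^2/2 + 2x)/6 on [0, 2].
def F(x):
    x = min(max(Fr(x), Fr(0)), Fr(2))   # clamp to the support so F(0) = 0 and F(2) = 1
    return (x * x / 2 + 2 * x) / 6

pG = {0: F(1) - F(0), 1: F(Fr(3, 2)) - F(1), 2: F(2) - F(Fr(3, 2))}

for name, dist in [("2X + 1", pY_linear), ("X^2", pY_square), ("g(X)", pG)]:
    print(name, {k: str(v) for k, v in dist.items()})
```

Fractions are printed in lowest terms, so 16/32 appears as 1/2 and 20/48 as 5/12.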
Remark
From the above examples, we can observe that we can
determine the probability distribution of Y directly from the probability
distribution of X.
Let X be a r.v. defined on the sample space S. Let Y = g(X) be a
single valued and continuous transformation of X. Then Y = g(X) is also


a r.v. defined on S. Now we are interested in determining the pdf of
the new r.v. Y = g(X), given the pdf of the r.v. X.
Result
Let X be a continuous r.v. with pdf f(x). Let Y = g(X) be a strictly
monotone (increasing or decreasing) function of X. Assume g(x) is
differentiable for all x. Then the pdf of the r.v. Y is given by
$$f_Y(y) = f(x)\left|\frac{dx}{dy}\right|,$$
where x is expressed in terms of y.
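The Result can be expressed as a small higher-order function. The Python sketch below (function names are hypothetical) builds f_Y(y) = f(x)|dx/dy| from f_X, the inverse x = h(y) and its derivative, and compares it with the density obtained directly in example 4 above for f(x) = 2x on (0, 1) and Y = 3X + 1.

```python
def f_X(x):
    """pdf of X in example 4: f(x) = 2x on (0, 1), 0 otherwise."""
    return 2 * x if 0 < x < 1 else 0.0

def transformed_pdf(f, h, dh_dy):
    """pdf of Y = g(X) for strictly monotone g with inverse x = h(y): f(h(y)) * |dh/dy|."""
    return lambda y: f(h(y)) * abs(dh_dy(y))

# Y = 3X + 1  =>  x = h(y) = (y - 1)/3 and dx/dy = 1/3.
f_Y = transformed_pdf(f_X, lambda y: (y - 1) / 3, lambda y: 1 / 3)

def g_direct(y):
    """Density of Y found in example 4: (2/9)(y - 1) for 1 < y < 4, 0 otherwise."""
    return 2 * (y - 1) / 9 if 1 < y < 4 else 0.0

for y in (0.5, 1.5, 2.5, 3.9, 4.5):
    print(y, round(f_Y(y), 6), round(g_direct(y), 6))
```

The absolute value in transformed_pdf is what lets the same helper handle decreasing transformations, where dx/dy is negative.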

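Example 1
Let X be a continuous r.v. with pdf f(x). Let Y = X². Find the pdf
and the distribution function of Y.
Solution
Let X be a continuous r.v. with pdf f(x) and distribution
function F(x). Let Y = X². Let G(y) be the distribution function of Y
and g(y) its pdf. Then, for y > 0,
$$G(y) = P(Y \le y) = P(X^2 \le y) = P(|X| \le \sqrt{y}) = P(-\sqrt{y} \le X \le \sqrt{y}) = F(\sqrt{y}) - F(-\sqrt{y}) \qquad \ldots (1)$$
Now
$$g(y) = \frac{d}{dy}G(y) = F'(\sqrt{y})\,\frac{1}{2\sqrt{y}} + F'(-\sqrt{y})\,\frac{1}{2\sqrt{y}} = \frac{1}{2\sqrt{y}}\left[F'(\sqrt{y}) + F'(-\sqrt{y})\right] = \frac{1}{2\sqrt{y}}\left[f(\sqrt{y}) + f(-\sqrt{y})\right] \qquad \ldots (2)$$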


Note 1
If, however, the random variable X is of the discrete type, the
distribution function of Y is given by
$$G(y) = \begin{cases} 0 & \text{if } y < 0 \\ F(\sqrt{y}) - F(-\sqrt{y}) + P(X = -\sqrt{y}) & \text{if } y \ge 0 \end{cases} \qquad \ldots (3)$$
If the point −√y is not a jump point of the r.v. X, then
P(X = −√y) = 0 and the above result becomes identical with
(1) given above.
Note 2
Let x1, x2, ... be the jump points of the r.v. X and y1, y2, ... the
corresponding jump points of the r.v. Y according to the relation yi = xi².
Then, for yi > 0, P(Y = yi) = P(X² = yi) = P(X = √yi) + P(X = −√yi).
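Example 2
A r.v. X has density $f(x) = Kx^2 e^{-x^3}$, x > 0. Determine K and the
density of Y = X³.
Solution:
Given $f(x) = Kx^2 e^{-x^3}$, x > 0.
We know that $\int_0^\infty f(x)\,dx = 1$, i.e.,
$$\int_0^\infty Kx^2 e^{-x^3}\,dx = 1.$$
Put x³ = t, so that 3x² dx = dt. Then
$$\frac{K}{3}\int_0^\infty e^{-t}\,dt = 1$$
$$\frac{K}{3}\left[-e^{-t}\right]_0^\infty = -\frac{K}{3}(0 - 1) = \frac{K}{3} = 1$$
$$\therefore K = 3$$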


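The pdf of Y is given by
$$f_Y(y) = f(x)\left|\frac{dx}{dy}\right|, \quad \text{where } x = y^{1/3},\ \frac{dx}{dy} = \frac{1}{3}y^{-2/3}$$
$$= 3x^2e^{-x^3}\cdot\frac{1}{3}y^{-2/3}$$
$$= 3y^{2/3}e^{-y}\cdot\frac{1}{3}y^{-2/3}$$
$$= e^{-y}, \quad y > 0$$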
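Both results can be checked numerically. The sketch below, a small set of illustrative helpers written only for this check, integrates $3x^2e^{-x^3}$ with Simpson's rule (a numerical method swapped in here, not part of the text) to confirm that K = 3 makes the total probability 1. It then evaluates the change-of-variable density at a few points and compares it with $e^{-y}$.

```python
import math

K = 3.0  # value of K found above

def f_x(x):
    """Density of X in Example 2: K x^2 e^{-x^3}, x > 0."""
    return K * x**2 * math.exp(-x**3) if x > 0 else 0.0

def f_y(y):
    """Density of Y = X^3 by change of variable: x = y^(1/3), dx/dy = (1/3) y^(-2/3)."""
    if y <= 0:
        return 0.0
    x = y ** (1.0 / 3.0)
    return f_x(x) * (1.0 / 3.0) * y ** (-2.0 / 3.0)

def simpson(func, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    s = func(a) + func(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * func(a + i * h)
    return s * h / 3.0

# The mass beyond x = 5 is e^{-125}, so [0, 5] stands in for [0, infinity).
total = simpson(f_x, 0.0, 5.0)
print("integral of f over (0, 5):", total)
for y in (0.5, 1.0, 2.0):
    print(y, f_y(y), math.exp(-y))
```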

Example 3
a. If X has a uniform distribution in [0, 1] with pdf
$$f(x) = \begin{cases} 1, & 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases}$$
find the distribution (p.d.f.) of −2 log X.
b. If X has a standard Cauchy distribution with p.d.f.
$$f(x) = \frac{1}{\pi(1+x^2)}, \quad -\infty < x < \infty,$$
find a p.d.f. for X².
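Solution
a. Let Y = −2 log X. Then the distribution function G of Y is
$$G(y) = P(Y \le y) = P(-2\log X \le y)$$
$$= P(\log X \ge -y/2) = P(X \ge e^{-y/2})$$
$$= 1 - P(X \le e^{-y/2})$$
$$= 1 - \int_0^{e^{-y/2}} f(x)\,dx = 1 - \int_0^{e^{-y/2}} 1\,dx = 1 - e^{-y/2}$$
$$g(y) = \frac{d}{dy}G(y) = \frac{1}{2}e^{-y/2}, \quad 0 < y < \infty \qquad \ldots(i)$$
[∵ X ∈ [0, 1], Y = −2 log X ranges from 0 to ∞]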


Note
(i) is the p.d.f. of a chi-square distribution with 2 degrees of freedom.
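A quick simulation agrees with (i). The sketch assumes Python's `random` module as the source of uniform numbers and uses an arbitrary seed. It compares the empirical distribution function of −2 log U with G(y) = 1 − e^{−y/2} and checks that the sample mean is close to 2, the mean of a chi-square distribution with 2 degrees of freedom.

```python
import math
import random

rng = random.Random(2019)          # arbitrary fixed seed, so the run is reproducible
n = 200_000
# 1 - rng.random() lies in (0, 1], so log never receives 0
ys = [-2.0 * math.log(1.0 - rng.random()) for _ in range(n)]

def G(y):
    # distribution function derived above: 1 - e^{-y/2}, y > 0
    return 1.0 - math.exp(-y / 2.0) if y > 0 else 0.0

for y in (0.5, 2.0, 5.0):
    empirical = sum(v <= y for v in ys) / n
    print(y, round(empirical, 4), round(G(y), 4))
print("sample mean:", sum(ys) / n)   # chi-square with 2 d.f. has mean 2
```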
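b. Let Y = X². Then, for y > 0,
$$G(y) = P(Y \le y) = P(X^2 \le y) = P(-\sqrt{y} \le X \le \sqrt{y})$$
$$= \int_{-\sqrt{y}}^{\sqrt{y}} \frac{dx}{\pi(1+x^2)} = 2\int_0^{\sqrt{y}} \frac{dx}{\pi(1+x^2)}$$
[∵ the integrand is an even function of x.]
$$G(y) = \frac{2}{\pi}\Big[\tan^{-1}x\Big]_0^{\sqrt{y}} = \frac{2}{\pi}\tan^{-1}\sqrt{y}, \quad 0 < y < \infty$$
The pdf g(y) of Y is given by
$$g(y) = \frac{d}{dy}G(y) = \frac{2}{\pi}\cdot\frac{1}{1+y}\cdot\frac{1}{2\sqrt{y}}$$
$$= \frac{1}{\pi\sqrt{y}\,(1+y)}, \quad 0 < y < \infty$$
[which is a beta distribution of the second kind with parameters 1/2 and 1/2, since $B(\tfrac{1}{2}, \tfrac{1}{2}) = \pi$]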
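The result can be cross-checked in three ways: against formula (2) of Example 1 applied to the Cauchy density, against the beta density of the second kind $y^{m-1}/\{B(m,n)(1+y)^{m+n}\}$ with m = n = 1/2, and against a numerical derivative of G. The sketch below uses only Python's `math` module; the function names are illustrative.

```python
import math

def cauchy_pdf(x):
    return 1.0 / (math.pi * (1.0 + x * x))

def G(y):
    # distribution function derived above: (2/pi) tan^{-1}(sqrt y), y > 0
    return (2.0 / math.pi) * math.atan(math.sqrt(y)) if y > 0 else 0.0

def g_derived(y):
    # pdf derived above: 1 / (pi sqrt(y) (1 + y)), y > 0
    return 1.0 / (math.pi * math.sqrt(y) * (1.0 + y)) if y > 0 else 0.0

def g_general(y):
    # formula (2) of Example 1: [f(sqrt y) + f(-sqrt y)] / (2 sqrt y)
    r = math.sqrt(y)
    return (cauchy_pdf(r) + cauchy_pdf(-r)) / (2.0 * r)

def beta2_pdf(y, m, n):
    # beta distribution of the second kind: y^(m-1) / (B(m, n) (1 + y)^(m + n))
    B = math.gamma(m) * math.gamma(n) / math.gamma(m + n)
    return y ** (m - 1) / (B * (1.0 + y) ** (m + n))

for y in (0.25, 1.0, 3.0):
    print(y, g_derived(y), g_general(y), beta2_pdf(y, 0.5, 0.5))
```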


EXERCISES
Multiple choice questions
1. If a r.v. X has pdf f(x) = 3x, 0 < x < 1; = 0, otherwise,
a. $f(y) = \frac{3}{16}(y\ ?\ 3)$   b. $f(y) = \frac{3}{4}(3\ ?\ y)$
c. $f(y) = \frac{3}{16}(3\ ?\ y)$   d. $f(y) = \frac{3}{4}(3\ ?\ y)$
2. The distribution type of the variable $y = -2\sum_{i=1}^{n}\log X_i$ is the same as that of the variable
a. $-2\log x_i$   b. $2\log\left(\prod_{i=1}^{n} x_i\right)^{-1}$
c. both a and b   d. neither a nor b
3. If X is a r.v. with distribution function F(x), then $P(X^2 \le y)$ is
a. $P(-\sqrt{y} \le X \le \sqrt{y})$   b. $P(X \le \sqrt{y}) - P(X \le -\sqrt{y})$
c. $F(\sqrt{y}) - F(-\sqrt{y})$   d. All the above.
Very short answer questions:
4. Define monotone increasing function.
5. Define monotone decreasing function.
6. What do you mean by the change of variable technique?
Short essay questions:
7. Let X have the density f(x) = 1, 0 < x < 1. Find the p.d.f. of $Y = e^X$.
8. If X has uniform distribution in (0, 1) with p.d.f. f(x) = 1, 0 < x < 1, find the p.d.f. of $Y = -2\log X$.
9. Let X be a r.v. with probability distribution
x    :  −1      0     1
p(x) :  11/32   1/2   5/32
Find the probability function of X² and X² + 2.
10. If the probability mass function (p.m.f.) of X is
$g(x) = k(x^2 + 2)$, x = 1, 2, 3; = 0, otherwise,
find k. Also obtain the p.m.f. of $Y = X^2 + 1$.
Long essay questions:
11. If X has the p.d.f. $f(x) = e^{-x}$, x > 0, find the p.d.f. of $Y = e^{-X}$.
12. If the p.d.f. of X is f(x) = 1/2 if −1 ≤ x ≤ 1; = 0, otherwise, what is the p.d.f. of $Y = X^2$?
13. f(x) = k(x + 2), 1 ≤ x ≤ 5, is the p.d.f. of a r.v. Find k and hence find the p.d.f. of $Y = X^2$.
14. $f(x) = \frac{1}{2}$, 0.9 ≤ x ≤ 1; = 0, otherwise. Determine g(y) if $Y = 2X^2$.
15. X has the p.d.f. $f(x) = \dfrac{kx^3}{(1+2x)^6}$, x > 0; = 0, elsewhere. Determine k and also the density function of $Y = \dfrac{2X}{1+2X}$.
16. Given $f(x) = \dfrac{x^2}{9}$, 0 < x < 3; = 0, otherwise. Find the density of $Y = X^3$.