Intro Stats for B.Sc. Math Students
INTRODUCTORY
STATISTICS
Complementary course for
B.Sc. MATHEMATICS
I Semester
(2019 Admission)
UNIVERSITY OF CALICUT
SCHOOL OF DISTANCE EDUCATION
Calicut University P.O. Malappuram, Kerala, India 673 635
INDEX
MODULE I
MODULE II
MODULE III
MODULE IV
UNIVERSITY OF CALICUT
SCHOOL OF DISTANCE EDUCATION
STUDY MATERIAL
I Semester
Complementary Course
for B Sc. Mathematics
INTRODUCTORY STATISTICS (STA1 C01)
Prepared and Scrutinised by:
Dr. [Link],
Director,
Academic Staff College,
University of Calicut.
©
Reserved
Module 1
INTRODUCTION
The term statistics seems to have been derived from the Latin word
‘status’, the Italian word ‘statista’ or the German word ‘statistik’, each of
which means political state.
The word ‘Statistics’ is usually interpreted in two ways. In the first sense
the word is a plural noun that refers to a collection of numerical facts.
In the second it is a singular noun that denotes the methods generally
adopted in the collection and analysis of numerical facts. In the
singular sense the term ‘Statistics’ is better described as statistical methods.
Different authors have defined statistics in different ways. According
to Croxton and Cowden, statistics may be defined as the ‘‘collection,
organisation, presentation, analysis and interpretation of numerical data’’.
Advantages of Sampling
1. The sample method is comparatively more economical.
2. The sample method ensures completeness and a high degree of
accuracy due to the small area of operation.
3. It is possible to obtain more detailed information in a sample survey
than in complete enumeration.
4. Sampling is also advocated where census is neither necessary nor
desirable.
5. In some cases sampling is the only feasible method. For example, we
have to test the sharpness of blades. If we test each blade, perhaps
the whole of the product will be wasted; in such circumstances the
census method will not be suitable. Under these circumstances
sampling techniques will be more useful.
6. A sample survey is much more scientific than census because in it
the extent of the reliability of the results can be known whereas this
is not always possible in census.
Types of Frequency Distribution
Erricker states “frequency distribution is a classification according to
the number possessing the same values of the variables’’. It is simply a
table in which data are grouped into classes and the number of cases
which fall in each class is recorded. Here the numbers are usually termed
as ‘frequencies’. There are discrete frequency distributions and continuous
frequency distributions.
1. Discrete Frequency Distribution
If we have a large number of items in the data it is better to prepare a
frequency array and condense the data further. Frequency array is prepared
by listing once and consecutively all the values occurring in the series and
noting the number of times each such value occurs. This is called discrete
frequency distribution or ungrouped frequency distribution.
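The frequency array for a discrete variable can be sketched in a few lines of Python. This is only an illustration: the function name `discrete_frequency_table` is an assumption of this sketch, and the data are the 25 family sizes used in the illustration on the number of children per family.

```python
from collections import Counter

# Number of children per family in each of 25 families (data of the illustration).
children = [1, 4, 3, 2, 1, 2, 0, 2, 1, 2, 3, 2, 1,
            0, 2, 3, 0, 3, 2, 1, 2, 2, 1, 4, 2]

def discrete_frequency_table(values):
    """List each distinct value once, in ascending order, with its frequency."""
    counts = Counter(values)
    return [(value, counts[value]) for value in sorted(counts)]

table = discrete_frequency_table(children)
for value, freq in table:
    # One bar per observation stands in for the tally marks.
    print(f"{value:>3}  {'|' * freq:<12} {freq:>3}")
print("Total", sum(freq for _, freq in table))
```

The total of the frequency column should equal the number of observations, which gives a quick check on the tallying.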
Frequency Distribution
Illustration: The following data give the number of children per family
in each of 25 families: 1, 4, 3, 2, 1, 2, 0, 2, 1, 2, 3, 2, 1, 0, 2, 3, 0, 3, 2,
1, 2, 2, 1, 4, 2. Construct a frequency distribution.
No of children    Tally marks     No of families
0                 |||             3
1                 ||||| |         6
2                 ||||| |||||     10
3                 ||||            4
4                 ||              2
Total                             25
2. Continuous Frequency Distribution
An important method of condensing and presenting data is that of the
construction of a continuous frequency distribution or grouped frequency
distribution. Here the data are classified according to class intervals.
The following are the rules generally adopted in forming a frequency
table for a set of observations.
1. Note the difference between the largest and smallest value in the
given set of observations.
2. Determine the number of classes into which the difference can be
divided.
3. The classes should be mutually exclusive. That means they do not
overlap.
4. Arrange a paper with 3 columns: classes, tally marks and frequency.
5. Write down the classes in the first column.
6. Go through the observations and put tally marks in the respective
classes.
7. Write the sum of the tally marks of each class in the frequency
column.
8. Note that the sum of the frequencies of all classes should be equal
to the total number of observations.
Concepts of a Frequency Table
i. Class limits: The observations which constitute a class are called class
limits. The left hand side observations are called lower limits and the
right hand side observations are called upper limits.
ii. Working classes: The classes of the form 0-9, 10-19, 20-29,... are
called working classes or nominal classes. They are obtained by the
inclusive method of classification where both the limits of a class are
included in the same class itself.
iii. Actual classes: If we are leaving either the upper limit or the lower
limit from each class, it is called exclusive method of classification.
The classes so obtained are called ‘actual classes’ or ‘true classes’.
The classes −0.5 - 9.5, 9.5 - 19.5, 19.5 - 29.5,... are the actual
classes of the above working classes. The classes of the type 0-10, 10
- 20, 20 - 30,... are also treated as actual classes. There will be no
break in the actual classes. We can convert working classes to the
corresponding actual classes using the following steps.
1. Note the difference between one upper limit and the next lower
limit.
2. Divide the difference by 2.
3. Subtract that value from the lower limits and add the same to the
upper limits.
For example
Working Classes    Frequency    Actual Classes
1-2.9              2            0.95-2.95
3-4.9              8            2.95-4.95
5-6.9              10           4.95-6.95
7-8.9              5            6.95-8.95
iv. Class boundaries: The class limits of the actual classes are called
actual class limits or class boundaries.
v. Class mark: The class mark or mid value of a class is the average
of the upper limit and lower limit of that class. The mid values of working
classes and the corresponding actual classes are the same. For example,
the class marks of the classes 0 - 9, 10 - 19, 20 - 29,... are respectively
4.5, 14.5, 24.5,...
vi. Class interval: The class interval or width of a class is the difference
between upper limit and lower limit of an actual class. It is better to
note that the difference between the class limits of a working class is
not the class interval. The class interval is usually denoted by ‘c’, ‘i’ or
‘h’.
Example
Construct a frequency distribution for the following data
70 45 33 64 50 25 65 74 30 20
55 60 65 58 52 36 45 42 35 40
51 47 39 61 53 59 49 41 20 55
46 48 52 64 48 45 65 78 53 42
Solution
Classes    Tally marks        Frequency
20-29      |||                3
30-39      |||||              5
40-49      ||||| ||||| ||     12
50-59      ||||| |||||        10
60-69      ||||| ||           7
70-79      |||                3
Total                         40
Cumulative Frequency Distribution
An ordinary frequency distribution shows the number of observations
falling in each class. But there are instances where we want to know
how many observations are lying below or above a particular value or
in between two specified values. Such type of information is found in
cumulative frequency distributions.
Cumulative frequencies are determined on either a less than basis or
more than basis. Thus we get less than cumulative frequencies (<CF) and
greater than or more than cumulative frequencies (>CF). Less than CF
give the number of observations falling below the upper limit of a class
and greater than CF give the number of observations lying above the lower
limit of the class. Less than CF are obtained by adding successively the
frequencies of all the previous classes including the class against which it
is written. The cumulation is started from the lowest size of the class to
the highest size (usually from top to bottom). They are based on the
upper limit of actual classes.
More than CF distribution is obtained by finding the cumulation or total
of frequencies starting from the highest size of the class to the lowest
class (i.e., from bottom to top). More than CF are based on the lower limit
of the actual classes.
Classes   f    UL   <CF                    LL   >CF
0-10      2    10   2 = 2                  0    3+7+10+8+5+2 = 35
10-20     5    20   2+5 = 7                10   3+7+10+8+5 = 33
20-30     8    30   2+5+8 = 15             20   3+7+10+8 = 28
30-40     10   40   2+5+8+10 = 25          30   3+7+10 = 20
40-50     7    50   2+5+8+10+7 = 32        40   3+7 = 10
50-60     3    60   2+5+8+10+7+3 = 35      50   3 = 3
EXERCISES
Multiple Choice Questions
1. A qualitative characteristic is also known as
a. attribute b. variable
c. variate d. frequency
2. A variable which assumes only integral values is called
a. continuous b. discrete
c. random d. None of these
3. An example of an attribute is
a. Height b. weight
c. age d. sex
4. Number of students having smoking habit is a variable which is
a. Continuous b. discrete
c. neither discrete nor continuous
d. None of these
5. A series showing the sets of all distinct values individually with their
frequencies is known as
a. grouped frequency distribution
b. simple frequency distribution
c. cumulative frequency distribution
d. none of the above
6. A series showing the sets of all values in classes with their
corresponding frequencies is known as
a. grouped frequency distribution
b. simple frequency distribution
c. cumulative frequency distribution
d. none of the above
12. If the lower and upper limits of a class are 10 and 40 respectively, the
mid point of the class is
a. 25.0 b. 12.5 c. 15.0 d. 30.0
13. In a grouped data, the number of classes preferred are
a. minimum possible b. adequate
c. maximum possible d. any arbitrarily chosen number
14. Class interval is measured as:
a. the sum of the upper and lower limit
b. half of the sum of upper and lower limit
c. half of the difference between upper and lower limit
d. the difference between upper and lower limit
16. A group frequency distribution with uncertain first or last classes is
known as:
a. exclusive class distribution
b. inclusive class distribution
c. open end distribution
d. discrete frequency distribution
Very Short Answer Questions
17. Define the term ‘statistics’.
18. Define the term population.
19. What is sampling?
20. What is a frequency distribution?
21. Distinguish between discrete and continuous variables.
Short Essay Questions
22. Explain the different steps in the construction of a frequency table
for a given set of observations.
23. Explain the terms (i) class interval (ii) class mark (iii) class frequency.
24. Distinguish between census and sampling.
25. What are the advantages of sampling over census?
26. State the various stages of statistical investigation.
Long Essay Questions
27. Present the following data of marks secured in Statistics (out of 100)
of 60 students in the form of a frequency table with 10 classes of
equal width, the lowest class being 0-9
41 17 83 60 54 91 60 58 70 07
67 82 33 45 57 48 34 73 54 62
36 52 32 72 60 33 07 77 28 30
42 93 43 80 03 34 56 66 23 63
63 11 35 85 62 24 00 42 62 33
72 53 92 87 10 55 60 35 40 57
1. Arithmetic Mean
The arithmetic mean (AM) or simply mean is the most popular and
widely used average. It is the value obtained by dividing the sum of all given
observations by the number of observations. AM is denoted by $\bar{x}$ (x bar).
No of workers : 3 8 12 10 7
Calculate the average income per worker.
Let d = x − A. Taking summation of both sides and dividing by n, we
get
$\bar{x} = A + \frac{\sum d}{n}$
Example 4
Calculate the AM of 305, 320, 332, 350
Solution
x        d = x − 320
305      −15
320      0
332      12
350      30
Total    27
$\bar{x} = A + \frac{\sum d}{n} = 320 + \frac{27}{4} = 320 + 6.75 = 326.75$
Shortcut Method: Frequency Data
When the frequencies and the values of the variable x are large the
calculation of AM is tedious. So a simpler method is adopted. The
deviations of the mid values of the classes are taken from a convenient
origin. Usually the mid value of the class with the maximum frequency is
chosen as the arbitrary origin or assumed mean. Thus change x values to
‘d’ values by the rule $d = \frac{x - A}{c}$, where c is the class interval. Then
$\bar{x} = A + \frac{\sum fd}{N} \times c$
Example 5
Calculate AM from the following data
Weekly wages : 0-10 10-20 20-30 30-40 40-50
Frequency    : 3    12    20    10    5
Solution
Weekly wages    f     Mid value x    d = (x − 25)/10    fd
0-10            3     5              −2                 −6
10-20           12    15             −1                 −12
20-30           20    25             0                  0
30-40           10    35             1                  10
40-50           5     45             2                  10
Total           50                                      2
(The negative products add to −18 and the positive products to 20, so $\sum fd = 2$.)
$\bar{x} = A + \frac{\sum fd}{N} \times c = 25 + \frac{2}{50} \times 10 = 25 + 0.4 = 25.4$
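The assumed-mean shortcut for raw and for grouped data can be checked numerically. The following is a minimal sketch, not a prescribed routine: the function names are assumptions of this example, and the inputs are the figures of Examples 4 and 5.

```python
def mean_assumed(values, A):
    """AM of raw data through deviations d = x - A from an assumed mean A."""
    deviations = [x - A for x in values]
    return A + sum(deviations) / len(values)

def grouped_mean_step_deviation(mids, freqs, A, c):
    """AM of grouped data with d = (x - A) / c, i.e. A + (sum(f*d) / N) * c."""
    N = sum(freqs)
    sum_fd = sum(f * (x - A) / c for x, f in zip(mids, freqs))
    return A + sum_fd / N * c

# Example 4: raw data with assumed mean 320.
ex4 = mean_assumed([305, 320, 332, 350], A=320)

# Example 5: weekly wages with class mid values 5, 15, ..., 45, A = 25, c = 10.
ex5 = grouped_mean_step_deviation([5, 15, 25, 35, 45], [3, 12, 20, 10, 5], A=25, c=10)

print(ex4, round(ex5, 2))
```

Any convenient A gives the same mean; choosing the mid value of the class with the largest frequency only keeps the products f × d small.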
Example 8
Calculate GM of 2, 4, 8
Solution
$GM = \sqrt[n]{x_1 x_2 \cdots x_n} = \sqrt[3]{2 \times 4 \times 8} = \sqrt[3]{64} = 4$
Example 9
Calculate GM of 4, 6, 9, 11 and 15
Solution
x        log x
4        0.6021
6        0.7782
9        0.9542
11       1.0414
15       1.1761
Total    4.5520
$GM = \text{Antilog}\left(\frac{\sum \log x}{n}\right) = \text{Antilog}\left(\frac{4.5520}{5}\right) = \text{Antilog}\,0.9104 = 8.136$
Example 10
Calculate GM of the following data
Classes   : 1-3 4-6 7-9 10-12
Frequency : 8   16  15  3
Solution
Classes    f     x     log x     f log x
1-3        8     2     0.3010    2.4080
4-6        16    5     0.6990    11.1840
7-9        15    8     0.9031    13.5465
10-12      3     11    1.0414    3.1242
Total      42                    30.2627
$GM = \text{Antilog}\left(\frac{\sum f \log x}{N}\right) = \text{Antilog}\left(\frac{30.2627}{42}\right) = \text{Antilog}\,0.7205 = 5.254$
Merits and Demerits
Merits
1. It is rigidly defined. It has a clear cut mathematical formula.
2. It is based on all the items. The magnitude of every item is considered
for its computation.
3. It is not as unduly affected by extreme items as A.M. because it gives
less weight to large items and more weight to small items.
4. It can be algebraically manipulated. The G.M. of the combined set
can be calculated from the GMs and sizes of the sets.
5. It is useful in averaging ratios and percentages. It is suitable to find
the average rate (not amount) of increase or decrease and to compute
index numbers.
Demerits
1. It is neither simple to understand nor easy to calculate. Usage of
logarithm makes the computation easy.
2. It has less sampling stability than the A.M.
3. It cannot be calculated for open-end data.
4. It cannot be found graphically.
5. It is not defined for qualities. Further, when one item is zero, it is
zero and thereby loses its representative character. It cannot be
calculated if even one value or one mid value is negative.
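The logarithmic route to the geometric mean used in Examples 8 to 10 can be sketched as below. The function name `geometric_mean` and its optional `freqs` argument are assumptions of this sketch; the code simply averages the common logarithms and takes the antilog.

```python
import math

def geometric_mean(values, freqs=None):
    """GM = antilog(sum(f * log10(x)) / N); every value must be positive."""
    if freqs is None:
        freqs = [1] * len(values)          # raw data: each value counted once
    if any(x <= 0 for x in values):
        raise ValueError("GM is not computed when a value is zero or negative")
    N = sum(freqs)
    log_sum = sum(f * math.log10(x) for x, f in zip(values, freqs))
    return 10 ** (log_sum / N)

gm8 = geometric_mean([2, 4, 8])                           # Example 8
gm9 = geometric_mean([4, 6, 9, 11, 15])                   # Example 9
gm10 = geometric_mean([2, 5, 8, 11], [8, 16, 15, 3])      # Example 10, class mid values

print(round(gm8, 3), round(gm9, 3), round(gm10, 3))
```

Small differences in the last digit from the worked answers are to be expected, since the examples use four-figure logarithm tables.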
5. It can be algebraically manipulated. The H.M. of the combined set
can be calculated from the H.M.s and sizes of the sets. For example,
$HM_{12} = \frac{N_1 + N_2}{\frac{N_1}{HM_1} + \frac{N_2}{HM_2}}$
6. It is suitable to find the average speed.
Demerits
1. It is neither simple to understand nor easy to calculate.
2. It has less sampling stability than the A.M.
3. Theoretically, it cannot be calculated for open-end data.
4. It cannot be found graphically.
5. It is not defined for qualities. It is not calculated when at least one
item or one mid value is zero or negative.
6. It gives undue weightage to small items and least weightage to largest
items. It is not used for analysing business or economic data.
Median
Median is defined as the middle most observation when the observations
are arranged in ascending or descending order of magnitude. That means
the number of observations preceding median will be equal to the number
of observations succeeding it. Median is denoted by M.
Definition for a raw data
For a raw data if there are odd number of observations, there will be
only one middle value and it will be the median. That means, if there are n
observations arranged in order of their magnitude, the size of the $\frac{n+1}{2}$th
observation will be the median. If there are even number of observations
the average of the two middle values will be the median. That means, median
will be the average of the $\frac{n}{2}$th and $\left(\frac{n}{2} + 1\right)$th observations.
Definition for a frequency data
For a frequency distribution median is defined as the value of the variable
which divides the distribution into two equal parts. The median can be
calculated using the following formula.
$M = l + \frac{\frac{N}{2} - m}{f} \times c$
where, l - lower limit of median class
Median class - the class in which the N/2 th observation falls
N - total frequency
m - cumulative frequency up to (but not including) the median class
c - class interval of the median class
f - frequency of median class
Example 13
Find the median height from the following heights (in cms.) of 9 soldiers.
160, 180, 175, 179, 164, 178, 171, 164, 176
Solution
Step 1. Heights are arranged in ascending order:
160, 164, 164, 171, 175, 176, 178, 179, 180.
Step 2. Position of median = $\frac{n+1}{2} = \frac{9+1}{2}$ is calculated. It is 5.
Step 3. Median is identified (5th value) M = 175 cms.
It is to be noted that $\frac{n+1}{2}$ may be a fraction, in which case, median is
found as follows.
Example 14
Find the median weight from the following weights (in Kgs) of 10
soldiers. 75, 71, 73, 70, 74, 80, 85, 81, 86, 79
Solution
Step 1. Weights are arranged in ascending order:
70, 71, 73, 74, 75, 79, 80, 81, 85, 86
Step 2. Position of median = $\frac{n+1}{2} = \frac{10+1}{2} = 5\frac{1}{2}$ is calculated.
Step 3. Median is found. It is the mean of the values at 5th
and 6th positions and so M = $\frac{75 + 79}{2}$ = 77 Kgs.
Example 15
Find the median for the following data.
Height in cms   : 160 164 170 173 178 180 182
No. of soldiers : 1   2   10  22  19  14  2
Solution
Step 1. Heights are arranged in ascending order. Cumulative
frequencies (c.f) are found. (They help to know the
values at different positions)
Height in cms.    No. of Soldiers    C.f.
160               1                  1
164               2                  3
170               10                 13
173               22                 35
178               19                 54
180               14                 68
182               2                  70
Total             70
Step 2. Position of median, $\frac{N+1}{2} = \frac{70+1}{2} = 35\frac{1}{2}$ is calculated.
Step 3. Median is identified as the average of the values at the
positions 35 and 36. The values are 173 and 178 respectively.
M = $\frac{173 + 178}{2}$ = 175.5 cm
Example 16
Calculate median for the following data
Class : 0-5 5-10 10-15 15-20 20-25
f     : 5   10   15    12    8
Solution
Class    f     CF
0-5      5     5
5-10     10    15
10-15    15    30
15-20    12    42
20-25    8     50
Total    50
$M = l + \frac{\frac{N}{2} - m}{f} \times c$. Median class is 10-15.
Here l = 10, N/2 = 50/2 = 25, c = 5, m = 15, f = 15
$M = 10 + \frac{25 - 15}{15} \times 5 = 10 + \frac{10 \times 5}{15} = 10 + \frac{10}{3} = 10 + 3.33 = 13.33$
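The three ways of locating the median used in Examples 13 to 16 can be sketched as follows. The function names are assumptions made for this illustration; `median_grouped` applies the formula M = l + ((N/2 − m)/f) × c with m taken as the cumulative frequency of the classes before the median class.

```python
def median_raw(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

def median_discrete(values, freqs):
    """Median of a discrete frequency distribution, found by repeating each value f times."""
    expanded = [v for v, f in zip(values, freqs) for _ in range(f)]
    return median_raw(expanded)

def median_grouped(classes, freqs):
    """Median of a grouped distribution; classes are (lower, upper) actual class limits."""
    N = sum(freqs)
    if N == 0:
        raise ValueError("no observations")
    half = N / 2
    m = 0                                   # cumulative frequency before the current class
    for (lower, upper), f in zip(classes, freqs):
        if m + f >= half:                   # first class whose cumulative frequency reaches N/2
            return lower + (half - m) / f * (upper - lower)
        m += f

ex13 = median_raw([160, 180, 175, 179, 164, 178, 171, 164, 176])
ex14 = median_raw([75, 71, 73, 70, 74, 80, 85, 81, 86, 79])
ex15 = median_discrete([160, 164, 170, 173, 178, 180, 182], [1, 2, 10, 22, 19, 14, 2])
ex16 = median_grouped([(0, 5), (5, 10), (10, 15), (15, 20), (20, 25)], [5, 10, 15, 12, 8])

print(ex13, ex14, ex15, round(ex16, 2))
```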
Total 80
$M = l + \frac{\frac{N}{2} - m}{f} \times c = 13.5 + \frac{(40 - 25) \times 7}{28}$
$= 13.5 + \frac{15 \times 7}{28} = 13.5 + \frac{15}{4}$
= 13.5 + 3.75
= 17.25
Example 18
Determine the mode of
420, 395, 342, 444, 551, 395, 425, 417, 395, 401, 390
Solution
Mode = 395
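For raw data the mode can be found by counting how often each value occurs. The sketch below is illustrative only; the name `modes` is an assumption, and the function returns every value that attains the highest frequency so that data with more than one mode are visible.

```python
from collections import Counter

def modes(values):
    """Return all values that occur with the highest frequency, in ascending order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

# Data of Example 18.
data18 = [420, 395, 342, 444, 551, 395, 425, 417, 395, 401, 390]
print(modes(data18))
```

When every value occurs equally often the function returns all of them, which corresponds to the case where no single mode can be identified.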
Demerits
1. It is not rigidly defined.
2. It is not based on all the items. It is a positional value.
3. It cannot be algebraically manipulated. The mode of the combined set
cannot be determined as in the case of AM.
4. Many a time, it is difficult to calculate. Sometimes grouping table and
frequency analysis table are to be formed.
5. It is less stable than the A.M.
6. Unlike other measures of central tendency, it may not exist for some
data. Sometimes there may be two or more modes and so it is said to be
ill defined.
7. It has very limited use. Modal wage, modal size of shoe, modal size of
family, etc., are determined. Consumer preferences are also dealt with.
Partition Values
We have already noted that the total area under a frequency curve is
equal to the total frequency. We can divide the distribution or area under
a curve into a number of equal parts choosing some points like median.
They are generally called partition values or quantiles. The important
partition values are quartiles, deciles and percentiles.
Deciles and Percentiles
Deciles are partition values which divide the distribution or area under
a frequency curve into 10 equal parts at 9 points namely D1, D2, .........,
D9.
Percentiles are partition values which divide the distribution into 100
equal parts at 99 points namely P1, P2, P3, .... P99. Percentile is a very
useful measure in education and psychology. Percentile ranks or scores
can also be calculated. Kelly’s measure of skewness is based on percentiles.
Calculation of Quartiles
The method of locating quartiles is similar to the method used for
finding median. Q1 is the value of the item at the
(n + 1)/4 th position and Q3 is the value of the item at the
3(n + 1)/4 th position when actual values are known. In the case of a
frequency distribution Q1 and Q3 can be calculated as follows.
$Q_1 = l_1 + \frac{\frac{N}{4} - m}{f} \times c$
$Q_i = l_i + \frac{\frac{iN}{4} - m}{f} \times c, \quad i = 1, 2, 3$
where, as in the median formula, $l_i$ is the lower limit of the $Q_i$ class and
m is the cumulative frequency up to that class,
c - class interval
f - frequency of the $Q_i$ class (the Q3 class when i = 3)
Example 23
Find Q1, Q3, D2, D9, P16, P65 for the following data. 282, 754, 125,
765, 875, 645, 985, 235, 175, 895, 905, 112 and 155.
Solution
Step 1. Arrange the values in ascending order
112, 125, 155, 175, 235, 282, 645, 754, 765, 875,
895, 905 and 985.
Step 2. Position of Q1 is $\frac{n+1}{4} = \frac{13+1}{4} = \frac{14}{4} = 3.5$
Similarly positions of Q3, D2, D9, P16 and P65 are 10.5,
2.8, 12.6, 2.24 and 9.1 respectively.
Step 3.
Q1 = 155 + 0.5(175 − 155) = 165
Q3 = 875 + 0.5(895 − 875) = 885
D2 = 125 + 0.8(155 − 125) = 149.0
D9 = 905 + 0.6(985 − 905) = 953
P16 = 125 + 0.24(155 − 125) = 132.20
Solution
Marks    No of students    Cumulative frequency
25       3                 3
35       29                32
40       32                64
50       41                105
52       49                154
53       54                208
67       38                246
75       29                275
80       27                302
Step 1. The cumulative frequencies of marks given in ascending order
are found.
Step 2. The positions of Q1, Q3, D4, P20 and P99 are found.
They are
$\frac{N+1}{4} = \frac{303}{4} = 75.75$
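The position rule of Example 23 (the k-th of `parts` partition values sits at position k(n + 1)/parts, with linear interpolation between neighbouring ordered values) can be sketched as below. The function name `partition_value` and its arguments are assumptions of this example.

```python
def partition_value(values, k, parts):
    """k-th partition value (parts = 4 quartiles, 10 deciles, 100 percentiles)."""
    s = sorted(values)
    n = len(s)
    pos = k * (n + 1) / parts          # 1-based position, possibly fractional
    whole = int(pos)                   # position of the lower neighbouring value
    frac = pos - whole                 # fractional part used for interpolation
    if whole < 1:
        return s[0]
    if whole >= n:
        return s[-1]
    return s[whole - 1] + frac * (s[whole] - s[whole - 1])

data23 = [282, 754, 125, 765, 875, 645, 985, 235, 175, 895, 905, 112, 155]
q1 = partition_value(data23, 1, 4)
q3 = partition_value(data23, 3, 4)
d2 = partition_value(data23, 2, 10)
d9 = partition_value(data23, 9, 10)
p16 = partition_value(data23, 16, 100)

print(q1, q3, round(d2, 2), round(d9, 2), round(p16, 2))
```

Other position rules are in use elsewhere, so library routines may return slightly different values; Python's `statistics.quantiles` with its default "exclusive" method appears to follow the same (n + 1) rule, but that is worth confirming against its documentation.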
EXERCISES
Multiple Choice Questions
1. Mean is a measure of
a. location or central value b. dispersion
c. correlation d. none of the above
2. If a constant value 50 is subtracted from each observation of a set,
the mean of the set is:
a. increased by 50 b. decreased by 50
c. is not affected d. zero
3. If the grouped data has open end classes, one cannot calculate:
a. median b. mode c. mean d. quartiles
4. Harmonic mean is better than other means if the data are for:
a. speed or rates b. heights or lengths
c. binary values like 0 & 1 d. ratio or proportions
5. Extreme values have no effect on:
a. average b. median
c. geometric mean d. harmonic mean
6. If the A.M. of a set of two observations is 9 and its G.M. is 6, then
the H.M. of the set of observations is:
a. 4 b. 3√6 c. 3 d. 1.5
7. The A.M. of two numbers is 6.5 and their G.M. is 6. The two numbers
are:
a. 9, 6 b. 9, 5 c. 7, 6 d. 4, 9
8. If the two observations are 10 and −10 then their harmonic mean is:
a. 10 b. 0 c. 5 d. ∞
9. The median of the variate values 11, 7, 6, 9, 12, 15, 19 is:
a. 9 b. 12 c. 15 d. 11
10. The second decile divides the series in the ratio:
a. 1:1 b. 1:2 c. 1:4 d. 2:5
11. For further algebraic treatment, geometric mean is:
a. suitable b. not suitable
c. sometimes suitable d. none of the above
12. The percentage of values of a set which is beyond the third quartile is:
a. 100 percent b. 75 percent
c. 50 percent d. 25 percent
13. In a distribution, the value around which the items tend to be most
heavily concentrated is called:
a. mean b. median
c. third quartile d. mode
14. Sum of the deviations about mean is
a. zero b. minimum c. maximum d. one
15. The suitable measure of central tendency for qualitative data is:
a. mode b. arithmetic mean
c. geometric mean d. median
16. The mean of the squares of first eleven natural numbers is:
a. 46 b. 23 c. 48 d. 42
The percentage of items in a frequency distribution lying between
upper and lower quartiles is:
a. 80 percent b. 40 percent
c. 50 percent d. 25 percent
Very Short Answer Questions
17. What is central tendency?
18. Define Median and mode.
19. Define harmonic mean.
20. Define partition values.
21. State the properties of AM.
22. In a class of boys and girls the mean marks of 10 boys is 38 and the
mean marks of 20 girls is 45. What is the average mark of the class?
23. Define deciles and percentiles.
24. Find the combined mean from the following data.
                   Series x    Series y
Arithmetic mean    12          20
No of items        80          60
Short Essay Questions
25. Define mode. How is it calculated. Point out two
26. Define AM, median and mode and explain their uses.
27. Give the formulae used to calculate the mean, median and mode of a
frequency distribution and explain the symbols used in them.
28. How will you determine three quartiles graphically from a less than
ogive?
29. Three samples of sizes 80, 40 and 30 having means 12.5, 13 and 11
respectively are combined. Find the mean of the combined sample.
30. Explain the advantages and disadvantages of arithmetic mean as an
average.
31. For finding out the ‘typical’ value of a series, what measure of
central tendency is appropriate?
32. Explain AM and HM. Which one is better? And why?
33. Prove that the weighted arithmetic mean of first n natural numbers
whose weights are equal to the corresponding number is equal to
(2n + 1)/3.
34. Show that the GM of a set of positive observations lies between AM &
HM.
35. What are the essential requisites of a good measure of central
tendency? Compare and contrast the commonly employed measures
in terms of these requisites.
36. Discuss the merits and demerits of the various measures of central
tendency. Which particular measure is considered the best and why?
Illustrate your answer.
37. What is the difference between simple and weighted average? Explain
the circumstances under which the latter should be used in preference
to the former.
38. Find the average rate of increase in population which in the first
decade has increased 12 percent, in the next by 16 per cent, and in
third by 21 percent.
39. A person travels the first mile at 10 km. per hour, the second mile at
8 km. per hour and the third mile at 6 km. per hour. What is his
average speed?
Long Essay Questions
40. Compute the AM, median and mode from the following data
Age last birth day : 15-19 20-24 25-29 30-34 35-39 40-44
No of persons : 4 20 38 24 10
41. Calculate Arithmetic mean, median and mode for the following data.
Age          : 55-60 50-55 45-50 40-45 35-40 30-35 25-30 20-25
No of people : 7     13    15    20    30    33    28    14
42. Calculate mean, median and mode from the following data
Class       Frequency
Up to 20    52
20-30       161
Methods of Studying Variation
The following measures of variability or dispersion
are commonly used.
1. Range 2. Quartile Deviation
3. Mean Deviation 4. Standard Deviation
Absolute and Relative Dispersion
Absolute measures and relative measures are the
two kinds of measures of dispersion. The former are
used to assess the variation among a set of values.
The latter are used whenever the variability of two or
more sets of values are to be compared. Relative
measures give pure numbers, which are free from the
units of measurements of the data. Even data in
different units and with unequal average values can
be compared on the basis of relative measures of
dispersion. Less is the value of a relative measure,
less is the variation of the set and more is the
consistency. The terms stability, homogeneity,
uniformity and consistency are used as if they are
synonyms.
1. Range
Definition Range is the difference between the
greatest (largest) and the smallest of the given values.
In symbols, Range = L − S where L is the greatest
value and S is the smallest value.
The corresponding relative measure of dispersion is
defined as
Coefficient of Range = $\frac{L - S}{L + S}$
Example 1
The price of a share for a six-day week is fluctuated as
follows:
156 165 148 151 147 162
Calculate the Range and its coefficient.
Solution
Range = L − S = 165 − 147 = 18
Coefficient of Range = $\frac{L - S}{L + S} = \frac{165 - 147}{165 + 147}$ = 0.0577
Example 2
Calculate coefficient of range from the following
data:
Mark            : 10-20 20-30 30-40 40-50 50-60
No. of students : 8     10    12    8     4
Solution
Coefficient of Range = $\frac{L - S}{L + S} = \frac{60 - 10}{60 + 10}$ = 0.7143
Merits and Demerits
Merits
1. It is the simplest to understand and the easiest to
calculate.
2. It is used in Statistical Quality Control.
Demerits
1. Its definition does not seem to suit the calculation
for data with class intervals. Further, it cannot
be calculated for open-end data.
2. It is based on the two extreme items and not on
any other item.
3. It does not have sampling stability. Further, it is
calculated for samples of small sizes only.
4. It could not be mathematically manipulated
further.
5. It is a very rarely used measure. Its scope is
limited to very few considerations in Quality
Control.
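Range and its coefficient are simple enough to verify directly. In this minimal sketch the function names are assumptions; for grouped data such as Example 2 the extreme class limits are passed in place of individual observations.

```python
def value_range(values):
    """Range = L - S, the largest value minus the smallest."""
    return max(values) - min(values)

def coefficient_of_range(values):
    """Relative measure (L - S) / (L + S), a pure number."""
    L, S = max(values), min(values)
    return (L - S) / (L + S)

prices = [156, 165, 148, 151, 147, 162]        # Example 1
range1 = value_range(prices)
coeff1 = coefficient_of_range(prices)

# Example 2: grouped data, so only the extreme class limits 10 and 60 enter.
coeff2 = coefficient_of_range([10, 60])

print(range1, round(coeff1, 4), round(coeff2, 4))
```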
Coefficient of Quartile Deviation = $\frac{Q_3 - Q_1}{Q_3 + Q_1}$
This is also called quartile coefficient of dispersion.
Example 3
Find the Quartile Deviation for the following:
391, 384, 591, 407, 672, 522, 777, 733, 1490, 2488
Solution
Before finding Q.D., Q1 and Q3 are found from the
values in ascending order:
384, 391, 407, 522, 591, 672, 733, 777, 1490, 2488
Position of Q1 is $\frac{n+1}{4} = \frac{10+1}{4}$ = 2.75
Q1 = 391 + 0.75 (407 − 391) = 403
Position of Q3 is $\frac{3(n+1)}{4}$ = 3 × 2.75 = 8.25
Q3 = 777 + 0.25 (1490 − 777) = 955.25
QD = $\frac{Q_3 - Q_1}{2} = \frac{955.25 - 403.00}{2}$ = 276.125
Class     f     CF
50-60     50    99
60-70     37    136
70-80     30    166
80-90     24    190
90-100    10    200
$Q_1 = l_1 + \frac{\frac{N}{4} - m}{f} \times c$, where $\frac{N}{4} = \frac{200}{4} = 50$, $l_1 = 50$, c = 10, m = 49, f = 50
$= 50 + \frac{50 - 49}{50} \times 10 = 50 + \frac{10}{50} = 50 + \frac{1}{5}$
= 50 + 0.2 = 50.2
$Q_3 = l_3 + \frac{\frac{3N}{4} - m}{f} \times c = 70 + \frac{150 - 136}{30} \times 10$
$= 70 + \frac{14 \times 10}{30}$ = 70 + 4.67 = 74.67
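The raw-data working of Example 3 can be reproduced with a short sketch. The helper `quartile` and the function `quartile_deviation` are names assumed for this illustration; quartiles are located at positions k(n + 1)/4 with linear interpolation, as in the worked solution.

```python
def quartile(values, k):
    """k-th quartile (k = 1, 2, 3) at position k(n + 1)/4 of the ordered data."""
    s = sorted(values)
    n = len(s)
    pos = k * (n + 1) / 4
    whole = int(pos)
    frac = pos - whole
    if whole < 1:
        return s[0]
    if whole >= n:
        return s[-1]
    return s[whole - 1] + frac * (s[whole] - s[whole - 1])

def quartile_deviation(values):
    """Return (QD, coefficient of QD) = ((Q3 - Q1)/2, (Q3 - Q1)/(Q3 + Q1))."""
    q1, q3 = quartile(values, 1), quartile(values, 3)
    return (q3 - q1) / 2, (q3 - q1) / (q3 + q1)

data3 = [391, 384, 591, 407, 672, 522, 777, 733, 1490, 2488]
q1 = quartile(data3, 1)
q3 = quartile(data3, 3)
qd, coeff_qd = quartile_deviation(data3)

print(q1, q3, qd, round(coeff_qd, 4))
```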
The square of the SD is known as ‘Variance’ and is
denoted as $\sigma^2$. SD is the positive square root of
variance.
Simplified formula for SD
For a raw data, we have
$\sigma^2 = \frac{1}{n}\sum (x - \bar{x})^2 = \frac{1}{n}\sum (x^2 - 2x\bar{x} + \bar{x}^2)$
$= \frac{1}{n}\sum x^2 - 2\bar{x}\,\frac{1}{n}\sum x + \bar{x}^2\,\frac{1}{n}\sum 1$
$= \frac{\sum x^2}{n} - 2\bar{x}\cdot\bar{x} + \bar{x}^2$
$= \frac{\sum x^2}{n} - \bar{x}^2$
$\therefore \sigma = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2}$
In a similar way, for a frequency data
$\sigma = \sqrt{\frac{\sum f x^2}{N} - \left(\frac{\sum f x}{N}\right)^2}$
Short Cut Method
For a raw data, $\sigma = \sqrt{\frac{\sum d^2}{n} - \left(\frac{\sum d}{n}\right)^2}$ where d = x − A
For a frequency data, $\sigma = c \times \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2}$
where $d = \frac{x - A}{c}$, A - assumed mean, c - class
interval.
The relative measure of dispersion based on SD or
coefficient of SD is given by
Coefficient of SD = $\frac{SD}{AM} = \frac{\sigma}{\bar{x}}$
Importance of Standard Deviation
Standard deviation is always associated with the
mean. It gives satisfactory information about the
effectiveness of mean as a representative of the data.
More is the value of the standard deviation less is the
concentration of the observations about the mean and
vice versa. Whenever the standard deviation is small
mean is accepted as a good average.
According to the definition of standard deviation, it
can never be negative. When all the observations are
equal standard deviation is zero. Therefore a small value
of $\sigma$ suggests that the observations are very close to
each other and a big value of $\sigma$ suggests that the
observations are widely different from each other.
Properties of Standard Deviation
1. Standard deviation is not affected by change of
origin.
Proof
Let $x_1, x_2, \ldots, x_n$ be a set of n observations.
Then $\sigma_x = \sqrt{\frac{1}{n}\sum (x_i - \bar{x})^2}$
Choose $y_i = x_i + c$ for i = 1, 2, 3... n
Then $\bar{y} = \bar{x} + c$
$y_i - \bar{y} = x_i - \bar{x}$
$\sum (y_i - \bar{y})^2 = \sum (x_i - \bar{x})^2$
$\frac{1}{n}\sum (y_i - \bar{y})^2 = \frac{1}{n}\sum (x_i - \bar{x})^2$
ie., $\sqrt{\frac{1}{n}\sum (y_i - \bar{y})^2} = \sqrt{\frac{1}{n}\sum (x_i - \bar{x})^2}$
ie., $\sigma_y = \sigma_x$
Hence the proof
2. Standard deviation is affected by change of scale.
Proof
Let $x_1, x_2, \ldots, x_n$ be a set of n observations.
Then $\sigma_x = \sqrt{\frac{1}{n}\sum (x_i - \bar{x})^2}$
Choose $y_i = c x_i + d$, i = 1, 2, 3... n and c and d are constants. This fulfils
the idea of changing the scale of the original values.
Now $\bar{y} = c\bar{x} + d$
$y_i - \bar{y} = c (x_i - \bar{x})$
$(y_i - \bar{y})^2 = c^2 (x_i - \bar{x})^2$
$\frac{1}{n}\sum (y_i - \bar{y})^2 = c^2\,\frac{1}{n}\sum (x_i - \bar{x})^2$
ie., $\sqrt{\frac{1}{n}\sum (y_i - \bar{y})^2} = |c|\,\sqrt{\frac{1}{n}\sum (x_i - \bar{x})^2}$
ie., $\sigma_y = |c| \times \sigma_x$
Note
If there are k groups then the S.D. of the k groups combined is given
by the formula.
$(n_1 + n_2 + \cdots + n_k)\,\sigma^2 = n_1\sigma_1^2 + n_2\sigma_2^2 + \cdots + n_k\sigma_k^2 + n_1 d_1^2 + n_2 d_2^2 + \cdots + n_k d_k^2$
where $d_i$ is the deviation of the mean of the i-th group from the combined mean.
Coefficient of Variation
Coefficient of variation (CV) is the most important
relative measure of dispersion and is defined by the formula.
Coefficient of Variation = $\frac{\text{Standard deviation}}{\text{Arithmetic mean}} \times 100$
CV = $\frac{SD}{AM} \times 100 = \frac{\sigma}{\bar{x}} \times 100$
CV is thus the ratio of the SD to the mean, expressed
as a percentage. According to Karl Pearson, Coefficient
of variation is the percentage variation in the mean.
Coefficient of Variation is the widely used and most
popular relative measure. The group which has less C.V
is said to be more consistent or more uniform or more
stable. More coefficient of variation indicates greater
variability or less consistency or less uniformity or less
stability.
Example 9
$SD = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}} = 7.89$

Total 17 437
$SD = \sqrt{\frac{\sum d^2}{n} - \left(\frac{\sum d}{n}\right)^2} = 8.70$
Example 10
Calculate SD of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Solution
x : 1 2 3 4 5 6 7 8 9 10
$\bar{x} = \frac{\sum x}{n} = \frac{55}{10} = 5.5$
$SD = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2} = \sqrt{\frac{385}{10} - 5.5^2} = \sqrt{38.5 - 30.25} = \sqrt{8.25} = 2.87$

Example 12
Calculate SD of the following data
x : 10 12 14 16 18
f : 2 4 10 3 1
Solution
x       f     fx     fx²
10      2     20     200
12      4     48     576
14     10    140    1960
16      3     48     768
18      1     18     324
Total  20    274    3828

$SD = \sqrt{\frac{\sum f x^2}{N} - \left(\frac{\sum f x}{N}\right)^2} = \sqrt{\frac{3828}{20} - \left(\frac{274}{20}\right)^2}$
$= \sqrt{191.4 - (13.7)^2} = \sqrt{191.40 - 187.69} = \sqrt{3.71} = 1.92$

Example 13
Calculate SD of the following data
Classes : 0-4 4-8 8-12 12-16 16-20
f : 3 8 17 10 2
Solution
Classes    f     x     fx     fx²
0-4        3     2      6      12
4-8        8     6     48     288
8-12      17    10    170    1700
12-16     10    14    140    1960
16-20      2    18     36     648
Total     40          400    4608

$SD = \sqrt{\frac{\sum f x^2}{N} - \left(\frac{\sum f x}{N}\right)^2} = \sqrt{\frac{4608}{40} - \left(\frac{400}{40}\right)^2} = \sqrt{115.2 - 100} = \sqrt{15.2} = 3.89$

Example 14
Calculate mean, SD and CV for the following data
Classes : 0-6 6-12 12-18 18-24 24-30
f : 5 12 30 10 3
Solution
Class      f     x     d = (x − 15)/6     fd     fd²
0-6        5     3         −2            −10     20
6-12      12     9         −1            −12     12
12-18     30    15          0              0      0
18-24     10    21          1             10     10
24-30      3    27          2              6     12
Total     60                              −6     54

$\bar{x} = A + c \times \frac{\sum f d}{N} = 15 + 6 \times \frac{-6}{60} = 15 - \frac{6}{10} = 15 - 0.6 = 14.4$
$SD = c \times \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2} = 6 \times \sqrt{\frac{54}{60} - \left(\frac{-6}{60}\right)^2}$
$= 6 \times \sqrt{0.90 - 0.01} = 6 \times \sqrt{0.89} = 6 \times 0.9434 = 5.66$
$CV = \frac{SD}{AM} \times 100 = \frac{5.66}{14.4} \times 100 = 39.30\%$

Merits and Demerits
Merits
1. It is rigidly defined and its value is always definite and based on all the observations and the actual signs of deviations are used.
2. As it is based on arithmetic mean, it has all the merits of arithmetic mean.
3. It is the most important and widely used measure of dispersion.
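
The mean, SD and CV of a grouped frequency distribution can be checked numerically. The following is a minimal sketch, assuming the class mid-points and frequencies of Example 14; the function name `grouped_stats` is illustrative only and not part of the study material.

```python
import math

def grouped_stats(midpoints, freqs):
    """Return (mean, SD, CV%) for a grouped frequency distribution.

    Uses the direct formulas: mean = sum(f*x)/N and
    SD = sqrt(sum(f*x^2)/N - mean^2), with N = sum(f).
    """
    n_total = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n_total
    mean_sq = sum(f * x * x for x, f in zip(midpoints, freqs)) / n_total
    sd = math.sqrt(mean_sq - mean ** 2)
    cv = sd / mean * 100
    return mean, sd, cv

# Mid-points and frequencies of the classes 0-6, 6-12, ..., 24-30
mean, sd, cv = grouped_stats([3, 9, 15, 21, 27], [5, 12, 30, 10, 3])
print(round(mean, 2), round(sd, 2), round(cv, 2))
```

The direct formula and the short cut (step deviation) method give the same mean and SD; the CV may differ in the last decimal place from a hand calculation that rounds the SD first.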
26. Calculate the standard deviation and the coefficient of variation of a raw data for which n = 50, $\sum (x_i - x) = 10$, $\sum (x_i - x)^2 = 400$
27. For two samples of size 10 each, we have the following values
$\sum x = 71$; $\sum x^2 = 555$; $\sum y = 70$; $\sum y^2 = 525$. Compare the variability of these two samples.
28. Define coefficient of variation. Indicate its use. A factory produces two types of electric lamps A and B. In an experiment relating to their life, the following results were obtained:
Life in hours     No. of lamps
                  A     B
500-700 ..        5     4

MODULE II
CORRELATION AND REGRESSION
CURVE FITTING
We have already studied the behaviour of a single variable characteristic by analysing univariate data using the summary measures, viz., measures of central tendency, measures of dispersion, measures of skewness and measures of kurtosis.
Very often in practice a relationship is found to exist between two (or more) variables. For example, there may exist some relation between heights and weights of a group of students; the yield of a crop is found to vary with the amount of rainfall over a particular period; the prices of some commodities may depend upon their demands in the market, etc.
It is frequently desirable to express this relationship in mathematical form by formulating an equation connecting the variables and to determine the degree and nature of the relationship between the variables. Curve fitting, Correlation and Regression respectively serve these purposes.
We note that n = 5, and form the following Table.

$x_i$    $y_i$    $x_i^2$    $x_i y_i$
1        14       1          14
2        13       4          26
3         9       9          27
4         5      16          20
5         2      25          10
$\sum x_i = 15$, $\sum y_i = 43$, $\sum x_i^2 = 55$, $\sum x_i y_i = 97$

These sums give the normal equations that determine the line of best fit.

where a, b, c are constants to be determined.
Let $y_e$ be the value of y corresponding to the value $x_i$ of x determined by equation (1). Then the sum of squares of the error between observed value of y and estimated value of y is given by
$$S = \sum_{i=1}^{n} (y_i - y_e)^2$$
Using (1), this becomes,
$$S = \sum_{i=1}^{n} (y_i - a - b x_i - c x_i^2)^2 \quad .... (2)$$
We determine a, b, c so that S is least. Three necessary conditions for this are $\frac{\partial S}{\partial a} = 0$, $\frac{\partial S}{\partial b} = 0$, and $\frac{\partial S}{\partial c} = 0$. Using (2), these conditions yield the normal equations.
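
As an illustration of how the sums in the five-point table are used, the sketch below solves the two normal equations of a straight line. It assumes the line is written as y = a + bx, so that the normal equations are Σy = na + bΣx and Σxy = aΣx + bΣx²; the function name `fit_line` is hypothetical.

```python
def fit_line(n, sum_x, sum_y, sum_xx, sum_xy):
    """Solve the normal equations of y = a + b*x by Cramer's rule.

    sum_y  = n*a     + b*sum_x
    sum_xy = a*sum_x + b*sum_xx
    """
    det = n * sum_xx - sum_x * sum_x
    a = (sum_y * sum_xx - sum_x * sum_xy) / det
    b = (n * sum_xy - sum_x * sum_y) / det
    return a, b

# Sums taken from the table: n = 5, Σx = 15, Σy = 43, Σx² = 55, Σxy = 97
a, b = fit_line(5, 15, 43, 55, 97)
print(a, b)
```

Under these assumptions the fitted line is y = 18.2 − 3.2x, which is close to the tabulated values (for example, x = 1 gives 15 against the observed 14).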
Determination of Correlation
Correlation between two variables may be determined by any one of
the following methods:
1. Scatter Diagram
2. Co-variance Method or Karl Pearson’s Method
3. Rank Method
Scatter Diagram
The existence of correlation can be shown graphically by means of a scatter diagram. Statistical data relating to simultaneous movements (or variations) of two variables can be graphically represented by points. One of the two variables, say X, is shown along the horizontal axis OX and the other variable Y along the vertical axis OY. All the pairs of values of X and Y are now shown by points (or dots) on the graph paper. This diagrammatic representation of bivariate data is known as scatter diagram.
The scatter of these points and also the direction of the scatter reveal the nature and strength of correlation between the two variables.
The following are some scatter diagrams showing different types of
correlation between two variables.
In Fig. 1 and 3, the movements (or variations) of the two variables are in the same direction and the scatter diagram shows a linear path. In this case, correlation is positive or direct.
In Fig. 2 and 4, the movements of the two variables are in opposite directions and the scatter shows a linear path. In this case correlation is negative or indirect.
In Fig. 5 and 6, points (or dots), instead of showing any linear path, lie around a curve or form a swarm. In this case correlation is very small and we can take r = 0.
In Fig. 1 and 2, all the points lie on a straight line. In these cases correlation is perfect and r = +1 or −1 according as the correlation is positive or negative.
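and $\sigma_y^2 = \frac{1}{n}\sum (Y_i - \bar{Y})^2 = \frac{1}{n}\sum Y_i^2 - \bar{Y}^2$
The above definition of the correlation co-efficient was given by Karl Pearson in 1890 and is called Karl Pearson's Correlation Co-efficient after his name.

Definition
If $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be n pairs of observations on two variables X and Y, then the covariance of X and Y, written as cov (X, Y), is defined by
$$Cov(X, Y) = \frac{1}{n}\sum (X_i - \bar{X})(Y_i - \bar{Y})$$
Covariance indicates the joint variations between the two variables.

$$Cov(X, Y) = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$$
$$= \frac{1}{n}\sum (X_i Y_i - X_i \bar{Y} - \bar{X} Y_i + \bar{X}\bar{Y})$$
$$= \frac{\sum X_i Y_i}{n} - \bar{Y}\,\frac{\sum X_i}{n} - \bar{X}\,\frac{\sum Y_i}{n} + \frac{n \bar{X}\bar{Y}}{n}$$
$$= \frac{\sum X_i Y_i}{n} - \bar{X}\bar{Y} - \bar{X}\bar{Y} + \bar{X}\bar{Y}$$
$$= \frac{\sum X_i Y_i}{n} - \bar{X}\bar{Y} = \frac{\sum X_i Y_i}{n} - \frac{\sum X_i}{n} \cdot \frac{\sum Y_i}{n}$$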
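
The definition of covariance and its computational form can be compared numerically. The sketch below is only an illustration: the function names are hypothetical, and the data are the five (x, y) pairs 1..5 and 14, 13, 9, 5, 2 used as a small test case.

```python
import math

def mean(values):
    return sum(values) / len(values)

def cov_definition(x, y):
    """Covariance as (1/n) * sum of products of deviations from the means."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def cov_shortcut(x, y):
    """Covariance as (sum of x*y)/n minus the product of the means."""
    return sum(a * b for a, b in zip(x, y)) / len(x) - mean(x) * mean(y)

def pearson_r(x, y):
    """Correlation coefficient: covariance divided by the product of the SDs."""
    return cov_definition(x, y) / math.sqrt(cov_definition(x, x) * cov_definition(y, y))

x = [1, 2, 3, 4, 5]
y = [14, 13, 9, 5, 2]
print(cov_definition(x, y), cov_shortcut(x, y), pearson_r(x, y))
```

Both forms give the same covariance (up to floating point rounding), and the negative sign of r reflects that y falls as x rises in this data.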
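and conclude that since more than half of the points appear to be nearly in a straight line, there is a positive or negative correlation between the

Now,
$$r = \frac{Cov(X, Y)}{\sigma_x \sigma_y} = \frac{\dfrac{\sum X_i Y_i}{n} - \dfrac{\sum X_i}{n} \cdot \dfrac{\sum Y_i}{n}}{\sqrt{\dfrac{\sum X_i^2}{n} - \left(\dfrac{\sum X_i}{n}\right)^2}\,\sqrt{\dfrac{\sum Y_i^2}{n} - \left(\dfrac{\sum Y_i}{n}\right)^2}} \quad ...(2)$$
iii. By multiplying the numerator and the denominator of (2) by $n^2$, we have
$$r = \frac{n\sum X_i Y_i - (\sum X_i)(\sum Y_i)}{\sqrt{n\sum X_i^2 - (\sum X_i)^2}\,\sqrt{n\sum Y_i^2 - (\sum Y_i)^2}}$$

Theorem
The correlation coefficient is independent of (not affected by) the change of origin and scale of measurement.
Proof
Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ be a set of n pairs of observations.
$$r_{xy} = \frac{\frac{1}{n}\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\frac{1}{n}\sum (x_i - \bar{x})^2}\,\sqrt{\frac{1}{n}\sum (y_i - \bar{y})^2}} \quad ...(1)$$
Let us transform $x_i$ to $u_i$ and $y_i$ to $v_i$ by the rules,
$$u_i = \frac{x_i - x_0}{c_1} \text{ and } v_i = \frac{y_i - y_0}{c_2} \quad ...(2)$$
where $x_0, y_0, c_1, c_2$ are arbitrary constants.
From (2), we have
$x_i = c_1 u_i + x_0$ and $y_i = c_2 v_i + y_0$
$\bar{x} = x_0 + c_1 \bar{u}$ and $\bar{y} = y_0 + c_2 \bar{v}$
where $\bar{u}$ and $\bar{v}$ are the means of the $u_i$'s and $v_i$'s respectively.
$x_i - \bar{x} = c_1 (u_i - \bar{u})$ and $y_i - \bar{y} = c_2 (v_i - \bar{v})$
Substituting these values in (1), we get
$$r_{xy} = \frac{\frac{1}{n}\sum_{i=1}^{n} c_1 (u_i - \bar{u})\, c_2 (v_i - \bar{v})}{\sqrt{\frac{1}{n}\sum_{i=1}^{n} c_1^2 (u_i - \bar{u})^2}\,\sqrt{\frac{1}{n}\sum_{i=1}^{n} c_2^2 (v_i - \bar{v})^2}}$$
$$= \frac{\frac{1}{n}\sum_{i=1}^{n} (u_i - \bar{u})(v_i - \bar{v})}{\sqrt{\frac{1}{n}\sum_{i=1}^{n} (u_i - \bar{u})^2}\,\sqrt{\frac{1}{n}\sum_{i=1}^{n} (v_i - \bar{v})^2}}$$
$$= \frac{\frac{1}{n}\sum_{i=1}^{n} (u_i - \bar{u})(v_i - \bar{v})}{\sigma_u \sigma_v} = r_{uv}$$
Here, we observe that if we change the origin and choose a new scale, the correlation co-efficient remains unchanged. Hence the proof.
Here, $r_{uv}$ can be further simplified as
$$r_{xy} = \frac{Cov(u, v)}{\sigma_u \sigma_v}$$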
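$$= \frac{\frac{1}{n}\sum u_i v_i - \bar{u}\bar{v}}{\sqrt{\frac{1}{n}\sum u_i^2 - \bar{u}^2}\,\sqrt{\frac{1}{n}\sum v_i^2 - \bar{v}^2}}$$
$$= \frac{n\sum u_i v_i - \sum u_i \sum v_i}{\sqrt{n\sum u_i^2 - (\sum u_i)^2}\,\sqrt{n\sum v_i^2 - (\sum v_i)^2}}$$

and
$$r_{xy} = \frac{\sum_{i=1}^{n} x_i y_i}{n\,\sigma_x \sigma_y} \quad ...(3)$$
Now we have
$$n\,\sigma_x \sigma_y = n \sqrt{\frac{\sum x_i^2}{n}}\,\sqrt{\frac{\sum y_i^2}{n}} = \sqrt{\sum x^2 \sum y^2}$$
$$\therefore\; r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$$
where $x = X - \bar{X}$ and $y = Y - \bar{Y}$

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} = \frac{29}{\sqrt{28 \times 42}} = 0.8457$$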
Example 9
Karl Pearson's coefficient of correlation between two variables X and Y is 0.28 and their covariance is +7.6. If the variance of X is 9, find the standard deviation of Y-series.
Solution
Karl Pearson's coefficient of correlation r is given by
$$r = \frac{cov(X, Y)}{\sigma_x \sigma_y}$$
Here r = 0.28, Cov (X, Y) = 7.6 and $\sigma_x^2 = 9$; $\sigma_x = 3$.
Hence $\sigma_y = \frac{cov(X, Y)}{r\,\sigma_x} = \frac{7.6}{0.28 \times 3} = 9.05$

X     Y     x = X − X̄    y = Y − Ȳ     x²     y²     xy
39    47      −26          −19         676    361    494
65    53        0          −13           0    169      0
62    58       −3           −8           9     64     24
90    86       25           20         625    400    500
82    62       17           −4         289     16    −68
75    68       10            2         100      4     20
25    60      −40           −6        1600     36    240
98    91       33           25        1089    625    825
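$$= \frac{1}{n}\sum (x_i - \bar{x})^2 + \frac{1}{n}\sum (y_i - \bar{y})^2 - \frac{2}{n}\sum (x_i - \bar{x})(y_i - \bar{y})$$
$$= \sigma_X^2 + \sigma_Y^2 - 2\,cov(x, y)$$
or, $2\,cov(x, y) = \frac{n^2 - 1}{12} + \frac{n^2 - 1}{12} - \frac{\sum d_i^2}{n} = \frac{2(n^2 - 1)}{12} - \frac{\sum d_i^2}{n}$
or, $cov(x, y) = \frac{n^2 - 1}{12} - \frac{\sum d_i^2}{2n}$
Hence, from (1), we get
$$R = \left[\frac{n^2 - 1}{12} - \frac{\sum d_i^2}{2n}\right] \div \frac{n^2 - 1}{12} = 1 - \frac{6\sum d^2}{n(n^2 - 1)} \quad \text{[omitting i]}$$

Example 13
Student (Roll No.) : 1 2 3 4 5 6 7 8 9 10
Marks in Maths. : 78 36 98 25 75 82 90 62 65 39

Student   Marks in Maths   Rank   Marks in second subject   Rank    d     d²
1              78            4              84                3      1     1
2              36            9              51                9      0     0
3              98            1              91                1      0     0
4              25           10              60                6      4    16
5              75            5              68                4      1     1
6              82            3              62                5     −2     4
7              90            2              86                2      0     0
8              62            7              58                7      0     0
9              65            6              53                8     −2     4
10             39            8              47               10     −2     4
Total                                                                   Σd² = 30

Applying Spearman's formula:
$$R = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 30}{10(10^2 - 1)} = 1 - \frac{18}{99} = 1 - \frac{2}{11} = \frac{9}{11} = 0.82$$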
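
The ranking and the Σd² step of Spearman's formula can be reproduced with a short program. This is a minimal sketch, assuming no tied marks (rank 1 is given to the highest mark); the names `ranks_desc` and `spearman` are illustrative, and the second list of marks is read off the solution table of Example 13.

```python
def ranks_desc(values):
    """Rank 1 for the largest value; assumes all values are distinct."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

def spearman(x, y):
    """Spearman's rank correlation R = 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(x)
    rank_x, rank_y = ranks_desc(x), ranks_desc(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

maths = [78, 36, 98, 25, 75, 82, 90, 62, 65, 39]
second = [84, 51, 91, 60, 68, 62, 86, 58, 53, 47]
R = spearman(maths, second)
print(round(R, 2))
```

With tied marks the simple ranking above would have to be replaced by average ranks, which this sketch does not attempt.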
$y - \bar{y} = b\,(x - \bar{x})$ .... (5)        $y - \bar{y} = b_{yx}\,(x - \bar{x})$
Example 23
Find the correlation coefficient between X and Y given
x: 10 16 13 12 15 17 14
y: 20 33 25 27 26 30 30
Solution

Example 24
Calculate the rank correlation coefficient from the following data specifying the ranks of 7 students in two subjects.
Rank in the first subject : 1 2 3 4 5 6 7
Rank in the second subject : 4 3 1 2 6 5 7
Solution
Here n = 7. Let x and y denote respectively the ranks in the first and second subjects.
x     y     d = x − y     d²
1     4        −3          9
2     3        −1          1
3     1         2          4
4     2         2          4
5     6        −1          1
6     5         1          1
7     7         0          0
Total                  Σd² = 20
The Spearman's rank correlation coefficient is
$$R = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6 \times 20}{7(7^2 - 1)} = 0.643$$

Example 25
Find the rank correlation coefficient between marks in two subjects A and B scored by 10 students
A: 88 72 95 60 35 46 52 58 30 67
B: 65 90 86 72 30 54 38 43 48 75
Solution
The following table is prepared.
A     B     Ranks in A   Ranks in B    $d_i$    $d_i^2$
88    65        2            5          −3        9
72    90        3            1           2        4
95    86        1            2          −1        1
60    72        5            4           1        1
35    30        9           10          −1        1
46    54        8            6           2        4
52    38        7            9          −2        4
58    43        6            8          −2        4
30    48       10            7           3        9
67    75        4            3           1        1
Total                                    Σd² = 38
$$R = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6 \times 38}{10(10^2 - 1)} = 1 - 0.2303 = 0.7697$$

Example 26
The coefficient of rank correlation of marks obtained by 10 students in two subjects was computed as 0.5. It was later discovered that the difference in marks in two subjects obtained by one of the students was wrongly taken as 3 instead of 7. Find the correct coefficient of rank correlation.
Corrected $\sum d^2 = 82.5 - 3^2 + 7^2 = 122.5$.
Correct $R = 1 - \frac{6 \times 122.5}{10(10^2 - 1)} = 0.2576$

Example 27
The following are the data on the average height of the plants and weight of yield per plot recorded from 10 plots of rice crop.
Height (X) (cms) : 28 26 32 31 37 29 36 34 39 40
Yield (Y) (kg) : 75 74 82 81 90 80 88 85 92 95
Find (i) correlation coefficient between X and Y (ii) the regression coefficients and hence write down the regression equation of y on x and that of x on y (iii) probable value of the yield of a plot having an average plant height of 98 cms.

X      Y      u = X − 34    v = Y − 80     u²     v²     uv
...
31     81        −3             1           9      1     −3
29     80        −5             0          25      0      0
36     88         2             8           4     64     16
34     85         0             5           0     25      0
39     92         5            12          25    144     60
40     95         6            15          36    225     90
Total            −7            42         219    624    277

$\bar{x} = A + \frac{\sum u_i}{n} = 34 - \frac{7}{10} = 33.3$
$\bar{y} = B + \frac{\sum v_i}{n} = 80 + \frac{42}{10} = 84.2$

i. $r_{xy} = r_{uv} = \frac{n\sum u_i v_i - \sum u_i \sum v_i}{\sqrt{n\sum u_i^2 - (\sum u_i)^2}\,\sqrt{n\sum v_i^2 - (\sum v_i)^2}}$
$= \frac{10 \times 277 - (-7)(42)}{\sqrt{10 \times 219 - (-7)^2}\,\sqrt{10 \times 624 - (42)^2}} = \frac{3064}{46.271 \times 66.903} = 0.989$

ii. The regression coefficient of y on x is
$b_{yx} = \frac{n\sum u_i v_i - \sum u_i \sum v_i}{n\sum u_i^2 - (\sum u_i)^2} = \frac{3064}{2141} = 1.431$
The regression coefficient of x on y is
$b_{xy} = \frac{n\sum u_i v_i - \sum u_i \sum v_i}{n\sum v_i^2 - (\sum v_i)^2} = \frac{3064}{4476.01} = 0.684$
The regression equation of y on x is
$y - \bar{y} = b_{yx}(x - \bar{x})$
ie., $y - 84.2 = 1.431\,(x - 33.3)$
ie., $y = 1.431x + 36.55$
The regression equation of x on y is
$x - \bar{x} = b_{xy}(y - \bar{y})$
ie., $x - 33.3 = 0.684\,(y - 84.2)$
ie., $x = 0.684y - 24.29$

iii. To estimate the yield (y), the regression equation of y on x is
$y = 1.431x + 36.55$
when x = 98, $y = 1.431 \times 98 + 36.55 = 176.79$ kg

Example 27
For the regression lines $4x - 5y + 33 = 0$ and $20x - 9y = 107$, find (a) the mean values of x and y, (b) the coefficient of correlation between x and y, and (c) the variance of y given that the variance of x is 9.
Solution
The given regression lines are
$4x - 5y + 33 = 0$
$20x - 9y - 107 = 0$
Solving these equations, we get the mean values of x and y as $\bar{x} = 13$, $\bar{y} = 17$. We rewrite the given equations respectively as
$y = \frac{4}{5}x + \frac{33}{5}$, $x = \frac{9}{20}y + \frac{107}{20}$ so that $b_{yx} = \frac{4}{5}$, $b_{xy} = \frac{9}{20}$
Therefore, the coefficient of correlation between x and y is
$r = \sqrt{b_{xy}\,b_{yx}} = \sqrt{\frac{9}{20} \times \frac{4}{5}} = 0.6$
Here positive sign is taken since both $b_{xy}$ and $b_{yx}$ are positive.
Since $r\,\frac{\sigma_y}{\sigma_x} = b_{yx} = \frac{4}{5}$, and $\sigma_x^2 = 9$ (given), we get
$\sigma_y = \frac{4\sigma_x}{5r} = \frac{4 \times 3}{5 \times 0.6} = 4$
Thus, the variance of y is $\sigma_y^2 = 16$.

EXERCISES
Multiple Choice Questions
1. The idea of product moment correlation was given by
a) R.A. Fisher b) Sir Francis Galton
c) Karl Pearson d) Spearman
2. Correlation coefficient was invented in this year
a) 1910 b) 1890
c) 1908 d) None of the above
10. If $r_{xy} = -1$, the relation between X and Y is of the type
a) When Y increases, X also increases
b) When Y decreases, X also decreases
c) X is equal to Y
d) When Y increases, X proportionately decreases.

21. Give the significance of the values r = +1, r = −1 and r = 0.
22. What is the use of scatter diagram?
23. What are the advantages of rank correlation coefficient?
24. Why are there two regression lines?
25. What are regression coefficients?
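
The worked example on the two regression lines 4x − 5y + 33 = 0 and 20x − 9y = 107 can be verified with a few lines of code. This is only a sketch: the helper `solve_2x2` is a hypothetical name, and the choice of which line is "y on x" follows the worked solution (b_yx = 4/5, b_xy = 9/20).

```python
import math
from fractions import Fraction

def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 by Cramer's rule."""
    det = a1 * b2 - a2 * b1
    return Fraction(c1 * b2 - c2 * b1, det), Fraction(a1 * c2 - a2 * c1, det)

# The regression lines intersect at the point of means (x-bar, y-bar).
x_bar, y_bar = solve_2x2(4, -5, -33, 20, -9, 107)

b_yx = Fraction(4, 5)    # slope of y = (4/5)x + 33/5
b_xy = Fraction(9, 20)   # slope of x = (9/20)y + 107/20

r = math.sqrt(b_yx * b_xy)      # positive root, since both slopes are positive
sigma_x = 3                     # given: variance of x is 9
sigma_y = b_yx * sigma_x / r    # from b_yx = r * sigma_y / sigma_x

print(x_bar, y_bar, r, sigma_y)
```

A quick consistency check is that the product b_yx × b_xy must not exceed 1; if it did, the two lines would have been assigned the wrong way round.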
Example
i. A = {0}
ii. B = {x/x is an even number between 3 and 5} i.e., B = {4}

4. Null Set
A set which does not contain any element is called an 'empty set' or 'void set' or 'Null Set'.
Example:
i. Set A denotes names of boys in a girls' college.
i.e., A = { } since no boy is admitted to a girls' college.
ii. T = {x/x is a perfect square between 10 and 15}
i.e., T = { } since no number which is a perfect square exists between 10 and 15.
A null set is denoted by the Greek letter $\phi$ (read as phi)

Example:
i. If A = {a, b, c, d, e} and B = {a, d}
then $B \subseteq A$ or $A \supseteq B$
ii. If A = {2, 4} and B = {1, 2, 3, 4, 5}
then $A \subseteq B$ or $B \supseteq A$

7. Equal Sets
Two sets A and B are said to be equal if $A \subseteq B$ and $B \subseteq A$, and this is denoted by A = B
Example:
i. Let A = {3, 2, 5, 6} and B = {2, 5, 6, 3}
Here all the elements of A are elements of B (ie., $A \subseteq B$) and all the elements of B are elements of A (ie., $B \subseteq A$). Hence A = B
Example:
Let A = {X, Y, Z} and B = {1, 2, 3}. Then A and B are said to be equivalent sets and are denoted by $A \sim B$

9. Power Set
The power set is defined as the collection of all subsets of a given set. It is also called Master set.
Example:
The power set of a given set {a, b, c} is {$\phi$, {a}, {b}, {c}, {a, b}, {b, c}, {c, a}, {a, b, c}}.
The number of elements in a set is called cardinality of the set. Thus the cardinality of the power set of a given set having 3 elements is $2^3$. Generally the cardinality of the power set of a given set having n elements is $2^n$.

Venn Diagrams
Sets can be represented diagrammatically using Venn diagrams. These were introduced by John Venn, an English logician.
Here, the Universal set is represented by a rectangle and all other sub-sets by circles or triangles etc. Venn diagrams are especially useful for representing various set operations. Hence we first learn about set operations and employ the Venn diagrammatic approach to represent the same.

Set Operations
The basic set operations are (i) union (ii) intersection (iii) complement and (iv) difference.

1. Union of Sets
If A and B are two sets, then the 'union' of sets A and B is the set of all elements which belong to either A or B or both (i.e., which belong to at least one). It is denoted by $A \cup B$.
That is, $x \in A \cup B$ implies $x \in A$ or $x \in B$
Example:
If A = {3, 8, 5} and B = {3, 6, 8}
then $A \cup B$ = {3, 8, 5} $\cup$ {3, 6, 8} = {3, 8, 5, 6}

2. Intersection of Sets
If A and B are two sets, then the 'intersection' of A and B is the set of all elements which are common to both of them. Intersection of sets A and B is denoted by $A \cap B$
That is, $x \in A \cap B$ implies $x \in A$ and $x \in B$
Example:
If A = {2, 5} and B = {5, 7, 9}
then $A \cap B$ = {2, 5} $\cap$ {5, 7, 9} = {5}

Disjoint Sets
Two sets are said to be 'disjoint' or 'mutually exclusive' if they do not have any common element between them
$(A \cap B) = \phi$ or $(A \cap B)$ = { }, a null set
Example:
If A = {1, 2} and B = {a, b, c}, $(A \cap B) = \phi$

3. Difference of Sets
If A and B are two sets, A – B is the difference of two sets A and B which contains all elements which belong to A but not to B.
That is, $x \in A - B$ implies $x \in A$ and $x \notin B$
Example:
If A = {0, 1, 2, 3} and B = {2, 3, 5, 7}
then A – B = {0, 1}

4. Complement of Sets
Suppose A is a subset of some Universal set S. Its complementary set is the set of all elements of the Universal set S which do not belong to the set A. The complementary set of A is denoted by $A'$ (A dash) or $\bar{A}$ (A bar) or $A^C$ (A complement).
That is, $x \in A^C$ implies $x \notin A$ but $x \in S$
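
The set operations described above map directly onto Python's built-in `set` type. The sketch below reuses the sets from the examples; the universal set `S` is an assumption made only to illustrate the complement, and `power_set` is a hypothetical helper name.

```python
from itertools import combinations

def power_set(items):
    """All subsets of a finite set, returned as a list of frozensets."""
    items = list(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

union = {3, 8, 5} | {3, 6, 8}            # elements in either set
intersection = {2, 5} & {5, 7, 9}        # elements common to both sets
difference = {0, 1, 2, 3} - {2, 3, 5, 7} # elements of A not in B
disjoint = {1, 2}.isdisjoint({"a", "b", "c"})

S = set(range(10))                       # assumed universal set, for illustration
complement = S - {0, 1, 2, 3}            # elements of S not in A

subsets = power_set({"a", "b", "c"})
print(union, intersection, difference, disjoint, complement, len(subsets))
```

The length of `subsets` illustrates the rule that a set with n elements has $2^n$ subsets.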
$$\left(\bigcup_{i=1}^{n} A_i\right)^C = \bigcap_{i=1}^{n} A_i^C \quad \text{(De Morgan's Laws)}$$

Fundamental Principle:
If an event 'A' can happen in '$n_1$' ways and another event 'B' can happen in '$n_2$' ways, then the number of ways in which both the events A and B can happen in a specified order is '$n_1 \times n_2$'.
If there are three routes from X to Y and two routes from Y to Z, then the destination Z can be reached from X in 3 × 2 = 6 ways.

Factorial notation
We have a compact notation for the full expression given by the product n (n – 1) (n – 2) … 3. 2. 1. This is written as n!, read as 'n factorial'.
So, nPn = n! = n (n – 1) (n – 2) … 3. 2. 1.
6P6 = 6! = 6. 5. 4. 3. 2. 1 = 720
By definition, 0! = 1
We have, nPr = n (n – 1) … (n – r + 1)
$$= \frac{n(n-1)(n-2)\ldots(n-r+1) \times (n-r)(n-r-1)\ldots 3.2.1}{(n-r)(n-r-1)\ldots 3.2.1} = \frac{n!}{(n-r)!}$$

So 3P2 = 3C2 × 2!
or $3C2 = \frac{3P2}{2!} = \frac{3.2}{1.2} = 3$

Combination of n different things taken r at a time (r ≤ n)
The number of combinations of n different things taken r at a time is denoted as nCr or $^nC_r$ or $\binom{n}{r}$. It is given by
$$nCr = \frac{nPr}{r!} = \frac{n(n-1)(n-2)\ldots(n-r+1)}{1 \cdot 2 \cdot 3 \ldots r}$$
or $nCr = \frac{n!}{r!\,(n-r)!}$
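
The factorial formulas for nPr and nCr can be tried out directly. A minimal sketch follows; `n_p_r` and `n_c_r` are illustrative names, and `math.comb` (available from Python 3.8) is used only as a cross-check.

```python
import math

def n_p_r(n, r):
    """Number of permutations of n things taken r at a time: n! / (n - r)!."""
    return math.factorial(n) // math.factorial(n - r)

def n_c_r(n, r):
    """Number of combinations of n things taken r at a time: nPr / r!."""
    return n_p_r(n, r) // math.factorial(r)

print(n_p_r(6, 6))                 # 6! = 720
print(n_p_r(3, 2), n_c_r(3, 2))    # 3P2 = 6 and 3C2 = 3
print(n_c_r(52, 13) == math.comb(52, 13))
```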
Example 3
What is the probability of getting a spade or an ace from a pack of cards?
Solution
There are 13 spades and 3 aces of other suits, so 16 cards are favourable.
P (Spade or Ace) = 16/52

Example 4
A box contains 8 red, 3 white and 9 blue balls. If 3 balls are drawn at random, determine the probability that (a) all three are blue (b) 2 are red and 1 is white (c) at least one is white and (d) one of each colour is drawn.
Solution
Assume that the balls are drawn from the urn one by one without replacement.
a) P(all the three are blue) = $\frac{9C3}{20C3} = \frac{7}{95}$
b) P(2 red and 1 white) = $\frac{8C2 \times 3C1}{20C3} = \frac{7}{95}$
c) P(at least 1 is white) = 1 – P (None is white)
$= 1 - \frac{17C3}{20C3} = 1 - \frac{34}{57} = \frac{23}{57}$

Example 5
What is the probability of getting 9 cards of the same suit in one hand at a game of bridge?
Solution
One hand in a game of bridge consists of 13 cards. Total number of possible cases = 52C13
The number of ways in which a particular player can have 9 cards of one suit is 13C9 and the number of ways in which the remaining 4 cards are of some other suit is 39C4. Since there are 4 suits in a pack of cards, the total number of favourable cases = 4 × 13C9 × 39C4.
Required probability = $\frac{4 \times 13C9 \times 39C4}{52C13}$
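
The counting in Example 4 can be checked with exact fractions. The sketch below assumes, as the solution does, that the three balls are drawn without replacement so that all 20C3 selections are equally likely; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

total = comb(20, 3)  # ways of choosing 3 balls out of 8 + 3 + 9 = 20

p_all_blue = Fraction(comb(9, 3), total)
p_two_red_one_white = Fraction(comb(8, 2) * comb(3, 1), total)
p_at_least_one_white = 1 - Fraction(comb(17, 3), total)
p_one_of_each = Fraction(8 * 3 * 9, total)

print(p_all_blue, p_two_red_one_white, p_at_least_one_white, p_one_of_each)
```

Part (d), one ball of each colour, is not worked in the text above; the value computed here follows from choosing one ball from each colour group.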
AXIOMATIC DEFINITION OF PROBABILITY
The mathematical and statistical definitions of probability have their own disadvantages. So they do not contribute much to the growth of the probability theory. The axiomatic definition is due to A.N. Kolmogorov (1933), a Russian mathematician, and is mathematically the best definition of probability since it eliminates most of the difficulties that are encountered in using other definitions. This axiomatic approach is based on measure theory. Here we introduce it by means of set operations.

Sample space
A sample space is the set of all conceivable outcomes of a random experiment. The sample space is usually denoted by S or $\Omega$. The notion of a sample space comes from Richard Von Mises.
Every indecomposable outcome of a random experiment is known as a sample point or elementary outcome. The number of sample points in the sample space may be finite, countably infinite or uncountably infinite. A sample space with a finite or countably infinite number of elements is called a discrete sample space. A sample space with a continuum of points is called a continuous sample space.
Example
1. The sample space obtained in the throw of a single die is a finite sample space, ie. S = {1, 2, 3, 4, 5, 6}
2. The sample space obtained in connection with the random experiment of tossing a coin again and again until a head appears is a countably infinite sample space.
ie. S = {H, TH, TTH, TTTH, ...}
3. Consider the life time of a machine. The outcomes of this experiment form a continuous sample space.
ie., S = { t : 0 < t < $\infty$ }

Event
An event is a subset of the sample space. In other words, "of all the possible outcomes in the sample space of an experiment, some outcomes satisfy a specified description, which we call an event."

Field of events (F)
Let S be the sample space of a random experiment. Then the collection or class of sets F is called a field or algebra if it satisfies the following conditions.
1. F is nonempty
2. the elements of F are subsets of S.
3. if $A \in F$, then $A^C \in F$
4. if $A \in F$ and $B \in F$ then $A \cup B \in F$
For example, let S = {1, 2, 3, 4, 5, 6}
Choose F as the set with elements $\phi$, S, {5, 6} and {1, 2, 3, 4}. Then F satisfies all the four conditions. So F is a field.
More generally, when $A \subseteq S$, F = {$\phi$, A, $A^C$, S} forms a field. Trivially, F with just two elements $\phi$ and S forms a field.

$\sigma$-field or $\sigma$-algebra of events
Let S be a nonempty set and F be a collection of subsets of S. Then F is called a $\sigma$-field or $\sigma$-algebra if
1. F is nonempty
2. The elements of F are subsets of S
3. If $A \in F$, then $A^C \in F$ and
4. The union of any countable collection of elements of F is an element of F.
i.e., if $A_i \in F$, i = 1, 2, 3, …, then $\bigcup_{i=1}^{\infty} A_i \in F$
The $\sigma$-algebra F is also called Borel field and is often denoted by B.
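
For a finite sample space, the four conditions defining a field can be checked mechanically. The following is a minimal sketch; the function name `is_field` is hypothetical, and the example reuses S = {1, 2, 3, 4, 5, 6} with F = {φ, S, {5, 6}, {1, 2, 3, 4}} from the text.

```python
def is_field(sample_space, collection):
    """Check the four field conditions for a finite collection of subsets."""
    S = frozenset(sample_space)
    F = {frozenset(A) for A in collection}
    if not F:                                  # 1. F is nonempty
        return False
    if any(not A <= S for A in F):             # 2. every element is a subset of S
        return False
    if any(S - A not in F for A in F):         # 3. closed under complementation
        return False
    return all(A | B in F for A in F for B in F)  # 4. closed under union

S = {1, 2, 3, 4, 5, 6}
print(is_field(S, [set(), S, {5, 6}, {1, 2, 3, 4}]))  # the field from the text
print(is_field(S, [set(), {5, 6}]))                   # complement of {5, 6} is missing
```

On a finite sample space every field is also a σ-field, since only finitely many distinct unions can be formed.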
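We have $S = S \cup \phi$
$P(S \cup \phi) = P(S)$
i.e., $P(S) + P(\phi) = P(S)$, since S and $\phi$ are disjoint.
i.e., $1 + P(\phi) = 1$, by axiom 2
$\therefore P(\phi) = 0$
Note:
The condition P(A) = 0 does not imply that $A = \phi$

$$P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} P(A_i)$$

Theorem 3 (Monotonicity)
If $A \subseteq B$, then $P(A) \le P(B)$
Proof
From the Venn diagram
We have $B = A \cup (A^C \cap B)$
$P(B) = P[A \cup (A^C \cap B)]$
$= P(A) + P(A^C \cap B)$ since A and $A^C \cap B$ are disjoint
Since $P(A^C \cap B) \ge 0$, it follows that $P(A) \le P(B)$.

i.e., $\bigcup_{i=1}^{\infty} A_i = A_1 \cup (A_1^C \cap A_2) \cup (A_1^C \cap A_2^C \cap A_3) \cup \ldots$
$P\left(\bigcup_{i=1}^{\infty} A_i\right) = P(A_1) + P(A_1^C \cap A_2) + P(A_1^C \cap A_2^C \cap A_3) + \ldots$
$\le P(A_1) + P(A_2) + P(A_3) + \ldots$
Since $P(A_1^C \cap A_2) \le P(A_2)$ etc., i.e.,
$$P\left(\bigcup_{i=1}^{\infty} A_i\right) \le \sum_{i=1}^{\infty} P(A_i)$$

Theorem 5 (Complementation)
$P(A^C) = 1 - P(A)$
Proof
We have $A \cup A^C = S$
$P(A \cup A^C) = P(S)$
i.e., $P(A) + P(A^C) = 1$, by axiom 2 and 3
$\therefore P(A^C) = 1 - P(A)$
i.e., P[Non occurrence of an event] = 1 – P[Occurrence of that event]

$P(B) = P[(A \cap B) \cup (A^C \cap B)]$
$= P(A \cap B) + P(A^C \cap B)$ since $(A \cap B) \cap (A^C \cap B) = \phi$
$\therefore P(A^C \cap B) = P(B) - P(A \cap B)$ .. (2)
On substituting (2) in (1) we get,
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
Corollary
(1) If $A \cap B = \phi$, $P(A \cup B) = P(A) + P(B)$
(2) $P(A \cup B) = 1 - P[(A \cup B)^C] = 1 - P(A^C \cap B^C)$
i.e., P[the occurrence of at least one event] = 1 – P[None of them is occurring]

Theorem 7 (Addition theorem for 3 events)
If A, B, C are any three events,
$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(B \cap C) - P(A \cap C) + P(A \cap B \cap C)$
Proof
Let $B \cup C = D$. Then $P(A \cup B \cup C) = P(A \cup D)$
$= P(A) + P(D) - P(A \cap D)$, by theorem 6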
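
The addition theorem for three events and the complementation rule can be checked on a small equally likely sample space. In the sketch below, the die sample space and the three events A, B and C are chosen purely for illustration and do not come from the text.

```python
from fractions import Fraction

S = set(range(1, 7))  # one throw of a fair die, all outcomes equally likely

def P(event):
    """Classical probability: favourable outcomes over total outcomes."""
    return Fraction(len(event & S), len(S))

A = {2, 4, 6}   # an even number
B = {3, 6}      # a multiple of 3
C = {4, 5, 6}   # a number greater than 3

lhs = P(A | B | C)
rhs = (P(A) + P(B) + P(C)
       - P(A & B) - P(B & C) - P(A & C)
       + P(A & B & C))
print(lhs, rhs)

# Complementation: P(A^C) = 1 - P(A)
print(P(S - A), 1 - P(A))
```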
Since $P(A \cap B) = P(A)\,P(B|A)$ and since $P(B|A) = P(B)$ when B is independent of A, we must have $P(A \cap B) = P(A) \cdot P(B)$

$P(A_i \cap A_j \cap \ldots \cap A_r) = P(A_i)\,P(A_j) \ldots P(A_r)$ for every subset $(A_i, A_j, \ldots, A_r)$ of $A_1, A_2, \ldots, A_n$
That is, the probabilities of every two, every three, …, every n of the events are the products of the respective probabilities.
For example, three events A, B and C are said to be mutually independent if
$P(A \cap B) = P(A)\,P(B)$
$P(B \cap C) = P(B)\,P(C)$
$P(A \cap C) = P(A)\,P(C)$ and
$P(A \cap B \cap C) = P(A)\,P(B)\,P(C)$

Note 3
Pairwise independence does not imply mutual independence.
Then P(A) = P(B) = P(C) = 1/2
and consider the collection of events A, B, C. These events are pairwise independent but not mutually independent.
Since they are pairwise independent we have,
$P(A \cap B) = 1/4 = P(A)P(B)$
$P(B \cap C) = 1/4 = P(B)P(C)$
$P(A \cap C) = 1/4 = P(A)P(C)$
But $P(A \cap B \cap C) = P(\{1\}) = 1/4 \ne P(A) \cdot P(B) \cdot P(C) = \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{8}$
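
The distinction between pairwise and mutual independence can be illustrated by direct enumeration. The sample space and events below are an assumption made for this sketch (four equally likely outcomes with A = {1, 2}, B = {1, 3}, C = {1, 4}); they are chosen to be consistent with the probabilities quoted in Note 3, whose own setup is not reproduced here.

```python
from fractions import Fraction
from itertools import combinations

S = {1, 2, 3, 4}  # assumed: four equally likely outcomes

def P(event):
    return Fraction(len(event), len(S))

events = {"A": {1, 2}, "B": {1, 3}, "C": {1, 4}}

# Pairwise independence: P(X ∩ Y) = P(X) P(Y) for every pair
pairwise = all(P(events[x] & events[y]) == P(events[x]) * P(events[y])
               for x, y in combinations(events, 2))

# Mutual independence additionally needs the triple product rule
triple = events["A"] & events["B"] & events["C"]
mutual = pairwise and P(triple) == P(events["A"]) * P(events["B"]) * P(events["C"])

print(pairwise, mutual, P(triple))
```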
Proof
Since A and B are independent, we have
$P(A|B) = P(A)$, $P(B|A) = P(B)$ and $P(A \cap B) = P(A) \cdot P(B)$
(i) Now, $P(A \cap B^C) = P(A)\,P(B^C|A)$
$= P(A)\,[1 - P(B|A)]$
$= P(A)\,[1 - P(B)]$
$= P(A)\,P(B^C)$
ie., A and $B^C$ are independent

Aliter
$P(A \cup B \cup C) = 1 - P[(A \cup B \cup C)^C]$
$= 1 - P(A^C \cap B^C \cap C^C)$
$= 1 - P(A^C)\,P(B^C)\,P(C^C)$
$= 1 - \left(1 - \frac{1}{2}\right)\left(1 - \frac{3}{4}\right)\left(1 - \frac{1}{4}\right)$
$= 29/32$

Example 7
A purse contains 2 silver coins and 4 copper coins and a second purse contains 4 silver coins and 3 copper coins. If a coin is selected at random from one of the purses, what is the probability that it is a silver coin?
Solution
Define the events
B1 – selection of 1st purse
B2 – selection of 2nd purse
A – selection of silver coin
P(B1) = P(B2) = 1/2
P(A|B1) = 2/6, P(A|B2) = 4/7
By theorem on total probabilities
$P(A) = P(A \cap B_1) + P(A \cap B_2)$
$= P(B_1)\,P(A|B_1) + P(B_2)\,P(A|B_2)$
$= \frac{1}{2} \cdot \frac{2}{6} + \frac{1}{2} \cdot \frac{4}{7}$
$= \frac{1}{6} + \frac{2}{7} = \frac{7 + 12}{42} = \frac{19}{42}$

Note 1
Here the probabilities P(Bi | A) for i = 1, 2, …, n are the probabilities determined after observing the event A and P(Bi) for i = 1, 2, ....., n are the probabilities given beforehand. Hence P(Bi) for i = 1, 2, ......, n are called 'a priori' probabilities and P(Bi | A) for i = 1, 2, ....., n are called 'a posteriori' probabilities. The probabilities P(A|Bi), i = 1, 2, ....., n are called 'likelihoods' because they indicate how likely the event A under consideration is to occur, given each and every 'a priori' probability. Bayes' theorem gives a relationship between P(Bi | A) and P(A | Bi) and thus it involves a type of inverse reasoning. Bayes' theorem plays an important role in applications. This theorem is due to Thomas Bayes.

Example 8
Suppose that there is a chance for a newly constructed house to collapse whether the design is faulty or not. The chance that the design is faulty is 10%. The chance that the house collapses if the design is faulty is 95% and otherwise it is 45%. It is seen that the house collapsed. What is the probability that it is due to faulty design?
Solution
Let B1 and B2 denote the events that the design is faulty and the design is good respectively. Let A denote the event that the house collapses. Then we are interested in the event (B1|A), that is, the event that the design is faulty given that the house collapsed. We are given,
P(B1) = 0.1 and P(B2) = 0.9
P(A|B1) = 0.95 and P(A|B2) = 0.45
Hence
$$P(B_1|A) = \frac{P(B_1) \cdot P(A|B_1)}{P(B_1) \cdot P(A|B_1) + P(B_2) \cdot P(A|B_2)}$$
$$= \frac{(0.1)(0.95)}{(0.1)(0.95) + (0.9)(0.45)} = 0.19$$

Example 9
Two urns I and II contain respectively 3 white and 2 black balls, 2 white and 4 black balls. One ball is transferred from urn I to urn II and then one is drawn from the latter. It happens to be white. What is the probability that the transferred ball was white?
Solution
Define
B1 - Transfer a white ball from Urn I to Urn II
B2 - Transfer a black ball from Urn I to Urn II.
A - Select a white ball from Urn II.
Here, P(B1) = 3/5, P(B2) = 2/5
P(A|B1) = 3/7, P(A|B2) = 2/7
We have to find P(B1|A).
By Bayes' theorem,
$$P(B_1|A) = \frac{P(B_1) \cdot P(A|B_1)}{P(B_1)\,P(A|B_1) + P(B_2)\,P(A|B_2)}$$
$$= \frac{3/5 \times 3/7}{3/5 \times 3/7 + 2/5 \times 2/7} = \frac{9/35}{13/35} = \frac{9}{13}$$

EXERCISES
Multiple choice questions
1. Probability is a measure lying between
a) –∞ to +∞ b) –∞ to +1
c) –1 to +1 d) 0 to 1
2. Classical probability is also known as
a) Laplace's probability b) mathematical probability
c) a priori probability d) all the above
3. Each outcome of a random experiment is called
a) primary event b) compound event
c) derived event d) all the above
4. If A and B are two events, the probability of occurrence of either A or B is given by
a) P(A) + P(B) b) $P(A \cup B)$
c) $P(A \cap B)$ d) P(A)P(B)
5. The probability of intersection of two disjoint events is always
a) infinity b) zero
c) one d) none of the above
6. If $A \subseteq B$, the probability P(A|B) is equal to
a) zero b) one
c) P(A)/P(B) d) P(B)/P(A)
7. The probability of two persons being born on the same day (ignoring date) is
a) 1/49 b) 1/365
c) 1/7 d) none of the above
8. The probability of throwing an odd sum with two fair dice is
a) 1/4 b) 1/16 c) 1 d) 1/2
9. If P(A|B) = 1/4, P(B|A) = 1/3, then P(A)/P(B) is equal to
a) 3/4 b) 7/12
c) 4/3 d) 1/12
10. If four whole numbers are taken at random and multiplied, the chance that the first digit in their product is 0, 3, 6 or 9 is
a) $(2/5)^3$ b) $(1/4)^3$ c) $(2/5)^4$ d) $(1/4)^4$

Fill in the blanks
11. Classical definition of probability was given by ……….
12. An event consisting of only one point is called ……….
13. Mathematical probability cannot be calculated if the outcomes are ……….
14. In statistical probability n is never ……….
15. If A and B are two events, the $P(A \cap B)$ is ……….
16. Axiomatic definition of probability is propounded by ……….
17. Bayes' rule is also known as ……….
18. If an event is not simple, it is a ……….
An urn is chosen at random and two balls are drawn from it.
What is the probability that both are red?
46. Three urns are given each containing red and white chips as
indicated.
Urn 1 : 6 red and 4 white.
Urn 2 : 2 red and 6 white.
Urn 3 : 1 red and 8 white.
(i) An urn is chosen at random and a ball is drawn from the
urn. The ball is red. Find the probability that the urn chosen
was urn 1.
(ii) An urn is chosen at random and two balls are drawn
without replacement from this urn. If both balls are red, find
the probability that urn 1 was chosen. Under these
conditions, what is the probability that urn 3 was chosen?
47. State Bayes' theorem. A box contains 3 blue and 2 red balls
while another box contains 2 blue and 5 red balls. A ball
drawn at random from one of the boxes turns out to be blue.
What is the probability that it came from the first box?
48. In a factory machines A, B and C produce 2000, 4000 and
5000 items in a month respectively, Out of their output 5%,
3% and 7% are defective. From the factory’s products one is
selected at random and inspected. What is the probability
that it is good? If it is good, what is the probability that it is
from machine C?
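
Bayes' theorem calculations of the kind used in the worked examples of this module (the collapsed house and the two urns) follow one pattern: multiply each prior by its likelihood and divide by the total. The sketch below is a hedged illustration; the function name `posterior` is hypothetical and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Return P(B_i | A) for each i, given P(B_i) and P(A | B_i)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)  # P(A), by the theorem on total probability
    return [j / total for j in joint]

# House example: P(faulty) = 0.1, P(collapse | faulty) = 0.95,
# P(collapse | good) = 0.45
house = posterior([Fraction(1, 10), Fraction(9, 10)],
                  [Fraction(95, 100), Fraction(45, 100)])

# Urn example: P(white transferred) = 3/5, then P(white drawn) = 3/7 or 2/7
urn = posterior([Fraction(3, 5), Fraction(2, 5)],
                [Fraction(3, 7), Fraction(2, 7)])

print(house[0], urn[0])
```

The posterior probabilities returned for any one problem always add up to 1, which is a useful check on hand calculations.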
MODULE IV
RANDOM VARIABLE
AND
PROBABILITY DISTRIBUTIONS
We have seen that probability theory was generally
characterised as a collection of techniques to describe, analyse and
predict random phenomena. We then introduced the concept of
sample space, identified events with subsets of this space and
developed some techniques of evaluating probabilities of events.
The purpose of this chapter is to introduce the concepts of
random variables, distribution functions and density functions. A thorough
understanding of these concepts is essential for the
development of this subject.
Random variables, to be introduced now, can be regarded
merely as useful tools for describing events. A random variable will
be defined as a numerical function on the sample space S.
Definition
A random variable (r.v.) is a real valued function defined over
the sample space. So its domain of definition is the sample space S
and its values lie on the real line extending from −∞ to +∞. In other words
a r.v. is a mapping from sample space to real numbers. Random
variables are also called chance variables or stochastic variables. It
is denoted by X or X(ω).
In symbols, X : S → R = (−∞, +∞)
For example, if X(ω) is the number of heads in two tosses of a coin:
Outcome (ω) : HH HT TH TT
Values of X(ω) : 2 1 1 0
Example
In coin tossing experiment, we note that
S = {ω1, ω2} where ω1 = Head, ω2 = Tail
Now define
X(ω) = 0 if ω = Tail (T)
X(ω) = 1 if ω = Head (H)
Here the random variable X(ω) takes only two values as ω can
be either head or tail. Such a random variable is known as Bernoulli
random variable.
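
Since a random variable is simply a function on the sample space, it can be written as one. The sketch below assumes two tosses of a fair coin, so that the four outcomes are equally likely; the names `sample_space`, `X` and `distribution` are illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Sample space of two coin tosses: HH, HT, TH, TT
sample_space = ["".join(toss) for toss in product("HT", repeat=2)]

def X(omega):
    """Random variable: number of heads in the outcome omega."""
    return omega.count("H")

# Probability distribution induced by X, assuming equally likely outcomes
distribution = defaultdict(Fraction)
for omega in sample_space:
    distribution[X(omega)] += Fraction(1, len(sample_space))

print(dict(distribution))
```

Each probability P(X = x) is obtained by adding the probabilities of all outcomes ω that X maps to x.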
Remark: If X1 and X2 are r.v.s. and C is a constant,
then (i) C X1 is a r.v.
(ii) X1 + X2 is a r.v.
(iii) X1 − X2 is a r.v.
(iv) max[X1, X2] is a r.v.
(v) min[X1, X2] is a r.v.
Random variables are of two types: (i) discrete (ii) continuous. A
random variable X is said to be discrete if its range consists of a finite
number of values or a countably infinite number of values. The
possible values of a discrete random variable can be labelled as x1,
x2, x3, ... e.g. the number of defective articles produced in a factory in
a day, the number of deaths due to road accidents in a day in a
city, the number of patients arriving at a doctor's clinic in a day etc.
A random variable which is not discrete is said to be continuous.
That means it can assume any of the infinitely many values in a
specified interval of the form [a, b].
A few examples of continuous random variables are given below
(1) A man brushes his teeth every morning. X represents the
time taken for brushing next time
(2) X represents the height of a student randomly chosen from a college
(3) X represents the service time of a doctor on his next patient
(4) X represents the life time of a tube etc.
Note that r.v.s. are denoted by capital letters X, Y, Z etc. and the
corresponding small letters are used to denote the values of a r.v.
We are not interested in random variables as such; rather, we will be
interested in events defined in terms of random variables. From the
definition of the random variable X, we have seen that each set of
the form {X ≤ x} is an event. The basic types of events that we shall
consider are the following.
{X = a}, {X = b}, {a < X < b}, {a < X ≤ b}, {a ≤ X < b},
{a ≤ X ≤ b} where −∞ ≤ a ≤ b ≤ ∞. Since the above subsets are
events, it is permissible to speak of their probabilities. Thus with every
random variable we can associate its probability distribution or
simply distribution.
Definition
By a distribution of the random variable X we mean the
assignment of probabilities to all events defined in terms of this
random variable.
Now we shall discuss the probability distributions in the case of
discrete as well as continuous random variables.
Probability Distributions
i. Discrete:
The probability distribution or simply distribution of a discrete r.v.
is a list of the distinct values xi of X with their associated
probabilities
f(xi) = P(X = xi).
Thus let X be a discrete random variable assuming the values
x1, x2, ..., xn from the real line. Let the corresponding probabilities be
f(x1), f(x2), ..., f(xn). Then P(X = xi) = f(xi) is called the probability mass
function or probability function of X, provided it satisfies the
conditions
(i) f(xi) ≥ 0 for all i
(ii) Σ f(xi) = 1
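The two conditions on a probability mass function can be checked mechanically. The following is a minimal sketch in Python; the function name is_pmf and the sample values are illustrative assumptions, not part of the text. Exact fractions are used so that the sum can be compared with 1 without rounding error.

```python
from fractions import Fraction


def is_pmf(probabilities):
    """Check conditions (i) f(xi) >= 0 for all i and (ii) sum of f(xi) = 1."""
    values = list(probabilities)
    return all(p >= 0 for p in values) and sum(values) == 1


# Illustrative distribution on the values 0, 1, 2, 3 (assumed numbers).
f = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

print(is_pmf(f.values()))                          # both conditions hold
print(is_pmf([Fraction(1, 2), Fraction(1, 4)]))    # fails (ii): sum is 3/4
print(is_pmf([Fraction(3, 2), Fraction(-1, 2)]))   # fails (i): negative value
```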
[Figure: Line Diagram. Vertical axis: probability, marked 1/8, 2/8, 3/8, 4/8; horizontal axis: values 0, 1, 2, 3, 4, 5]
Probability Histogram
On the X axis we take values of the r.v. With each value x as centre, a
vertical rectangle is drawn whose area is equal to the probability f(x). Note
that in plotting a probability histogram the area of a rectangle must be
equal to the probability of the value at the centre, so that the total area
under the histogram is equal to the total probability (i.e. unity). For
example the probability histogram of the following probability distribution is
given below.
[Figure: Probability histogram. Vertical axis: probability, marked .1 to .6; one rectangle is labelled Area = .6 x .5]
ii. Continuous
We now turn our attention to describing the probability
distribution of a random variable that can assume all the values in
an interval. The probability distribution of a continuous random
variable can be visualised as a smooth form of the relative
frequency histogram based on a large number of observations.
Because probability is interpreted as long run relative frequency,
the curve obtained as the limiting form of the relative frequency
histograms represents the manner in which the total probability is
distributed over the range of possible values of the random variable
X. The mathematical function denoted by f(x) whose graph
produces this curve is called the probability density function of the
continuous r.v. X.
Definition
If X is a continuous random variable and if P(x ≤ X ≤ x + dx) =
f(x)dx, then f(x) is called the probability density function (pdf) of the
continuous r.v. X, provided it satisfies the conditions (i) f(x) ≥ 0 ∀ x and
(ii) ∫_{−∞}^{∞} f(x) dx = 1.
We can justify the term ‘density function’ to some extent from the
following argument. We have
∫_{x}^{x+∆x} f(t) dt = P(x ≤ X ≤ x + ∆x).
When ∆x is very small, the mean value theorem gives us the
approximation
∫_{x}^{x+∆x} f(t) dt ≈ f(x) ∆x
∴ f(x) ∆x ≈ P(x ≤ X ≤ x + ∆x)
∴ f(x) ≈ P(x ≤ X ≤ x + ∆x) / ∆x
= (Total probability in the interval (x, x + ∆x)) / ∆x
Result 1
P(a ≤ X ≤ b) = P(a ≤ X < b) = ∫_{a}^{b} f(x) dx
= the area under the curve
y = f(x), enclosed between the ordinates
drawn at x = a and x = b.
Result 2
The probability that a continuous r.v. X will assume a particular value
is zero, i.e., P(X = a) = 0.
Distribution function
Definition
For any random variable X, the function of the real variable x
defined as F_X(x) = P(X ≤ x) is called the cumulative probability
distribution function or simply the cumulative distribution function (cdf)
of X. We can note that the probability distribution of a random
variable X is determined by its distribution function.
If X is a discrete r.v. with pmf p(x), then the cumulative
distribution function is defined as
F_X(x) = P(X ≤ x) = Σ_{t ≤ x} p(t).
If X is a continuous r.v. with pdf f(x), then the distribution function
is defined as
F_X(x) = P(X ≤ x) = ∫_{−∞}^{x} f(t) dt.
3. 0 ≤ F(x) ≤ 1
4. F(a) ≤ F(b) if a < b
That means F(x) is non decreasing.
5. If X is discrete, F(b) − F(a) = P(a < X ≤ b)
6. For a discrete r.v. the graph of F(x) is a step
function or a staircase function.
7. F(x) is a continuous function of x on the right.
8. If X is continuous, F(b) − F(a) = P(a ≤ X ≤ b) = Area under the
probability curve between x = a and x = b.
9. F(x) possesses a continuous graph, if X is continuous. If F(x)
possesses a derivative, then dF(x)/dx = f(x).
10. The discontinuities of F(x) are at the most countable.
=∫ f(x)dx
6. Quartiles are determined by solving the equations
∫_{−∞}^{Q1} f(x) dx = 1/4 ; ∫_{−∞}^{Q3} f(x) dx = 3/4
7. The rth central moment is determined by the equation
μ_r = ∫_{−∞}^{∞} (x − μ)^r f(x) dx, r = 1, 2, 3, ...
where μ is the mean of X.
8. In particular, the variance of X is calculated as
μ_2 = ∫_{−∞}^{∞} (x − μ)^2 f(x) dx
9. MD about Mean is given by
MD = ∫_{−∞}^{∞} |x − mean| f(x) dx
SOLVED PROBLEMS
Example 1
Obtain the probability distribution of the number of heads when
three coins are tossed together.
Solution
When three coins are tossed, the sample space is given by
S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}. Here the r.v. X,
defined as the number of heads obtained, takes the values 0, 1,
2 and 3 from the real line w.r.t. each outcome in S. We can assign
probabilities to each value of the r.v. as follows.
P{X = 0} = P{TTT} = 1/8
P{X = 1} = P{HTT or THT or TTH} = 3/8
P{X = 2} = P{HHT or HTH or THH} = 3/8
P{X = 3} = P{HHH} = 1/8
X = x       0     1     2     3     Total
P(X = x)    1/8   3/8   3/8   1/8   1
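The distribution in Example 1 can also be obtained by listing the sample space and counting heads. The sketch below assumes fair coins, so that all eight outcomes are equally likely; the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2^3 equally likely outcomes of tossing three fair coins.
outcomes = list(product("HT", repeat=3))

# X(outcome) = number of heads; count how many outcomes give each value.
counts = Counter(outcome.count("H") for outcome in outcomes)

# P(X = x) = (number of favourable outcomes) / (total number of outcomes).
pmf = {x: Fraction(n, len(outcomes)) for x, n in sorted(counts.items())}

for x, p in pmf.items():
    print(x, p)
```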
Example 2
From a lot containing 25 items, 5 of which are defective, 4 items
are chosen at random. Find the probability distribution of the
number of defectives obtained.
Solution
Let X be the number of defectives obtained. Here X takes the
values 0, 1, 2, 3 and 4. If the items are drawn without replacement,
P{X = 0} = (5C0 . 20C4) / 25C4 = 969/2530
P{X = 1} = (5C1 . 20C3) / 25C4 = 1140/2530
P{X = 2} = (5C2 . 20C2) / 25C4 = 380/2530
P{X = 3} = (5C3 . 20C1) / 25C4 = 40/2530
P{X = 4} = (5C4 . 20C0) / 25C4 = 1/2530
X = x       0          1           2          3         4        Total
P(X = x)    969/2530   1140/2530   380/2530   40/2530   1/2530   1
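The probabilities in Example 2 follow the usual counting rule for sampling without replacement: choose x of the 5 defectives and 4 − x of the 20 good items, out of all ways of choosing 4 from 25. A short sketch, with illustrative names N, D, n and p_defectives that are not in the text:

```python
from fractions import Fraction
from math import comb

N, D, n = 25, 5, 4  # lot size, number of defectives, sample size


def p_defectives(x):
    """P(X = x) when n items are drawn without replacement."""
    return Fraction(comb(D, x) * comb(N - D, n - x), comb(N, n))


pmf = {x: p_defectives(x) for x in range(n + 1)}

for x, p in pmf.items():
    print(x, p)  # fractions are printed in lowest terms
```

Note that Python reduces each fraction, so 1140/2530 appears as 114/253; the values are the same.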
= 0, elsewhere.
Example 3
Examine whether f(x) as defined below is a pdf.
f(x) = 0 ; x < 2
= (1/18)(3 + 2x) ; 2 ≤ x ≤ 4
= 0 ; x > 4.
Solution
Clearly f(x) ≥ 0 for all x. To show that f(x) is a pdf we have to show that
∫_{−∞}^{∞} f(x) dx = 1
Here ∫_{−∞}^{∞} f(x) dx = ∫_{2}^{4} (1/18)(3 + 2x) dx
= (1/18) [3x + x^2]_{2}^{4}
= (1/18) [(12 + 16) − (6 + 4)]
= (1/18) × 18 = 1
So f(x) is a pdf.
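The conclusion of Example 3 can be cross-checked numerically. The sketch below assumes the density (1/18)(3 + 2x) on 2 ≤ x ≤ 4, which is the form used in the solution, and uses a simple midpoint rule; the helper integrate is an illustrative function, not a library routine.

```python
def f(x):
    """Density from Example 3, assumed to be (3 + 2x)/18 on [2, 4]."""
    return (3 + 2 * x) / 18 if 2 <= x <= 4 else 0.0


def integrate(func, a, b, steps=10_000):
    """Approximate the integral of func over [a, b] by the midpoint rule."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))


total = integrate(f, 2, 4)
non_negative = all(f(2 + i * 0.01) >= 0 for i in range(201))

print(total, non_negative)
```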
Example 4
If the distribution function F(X) is given to be
⎧ 0< ≤1
⎪
F(x) =
⎨− + 0< ≤2
⎪
⎩ 1 <2
find the density function.
Solution
We know that
dF(x)/dx = f(x)
( )
Here f(x) = = , when 0 < x ≤ 1
= (3 − ), when 1 < x ≤ 2
= 0, otherwise.
Example 5
Examine whether the following is a distribution function.
F(x) = 0 ; x < −a
= (1/2)(x/a + 1) ; −a ≤ x ≤ a
= 1 ; x > a
Solution
a. F(x) is defined for all real values of x.
b. F(−∞) = 0
c. F(∞) = 1
d. F(x) is non decreasing.
e. F'(x) = 1/(2a) ; −a < x < a
= 0 ; x < −a or x > a
Example 6
Given F(x) = 0 ; x < 0
= x^2 ; 0 ≤ x ≤ 1
= 1 ; x > 1
Determine (a) P(X ≤ 0.5) (b) P(0.5 ≤ X ≤ 0.8) (c) P(X > 0.9)
Solution
(a) P(X ≤ 0.5) = F(0.5) = (0.5)^2 = 0.25
(b) P(0.5 ≤ X ≤ 0.8) = P(0.5 < X ≤ 0.8)
= F(0.8) − F(0.5)
= (0.8)^2 − (0.5)^2 = 0.39
(c) P(X > 0.9) = 1 − P(X ≤ 0.9) = 1 − F(0.9)
= 1 − (0.9)^2 = 0.19
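The three probabilities in Example 6 all come from differences of the distribution function. A minimal sketch, taking F(x) = x^2 on 0 ≤ x ≤ 1 as used in the solution (the names p_a, p_b, p_c are illustrative):

```python
def F(x):
    """Distribution function of Example 6: 0 below 0, x^2 on [0, 1], 1 above 1."""
    if x < 0:
        return 0.0
    if x <= 1:
        return x ** 2
    return 1.0


p_a = F(0.5)           # P(X <= 0.5)
p_b = F(0.8) - F(0.5)  # P(0.5 < X <= 0.8), same as P(0.5 <= X <= 0.8) here
p_c = 1 - F(0.9)       # P(X > 0.9)

print(round(p_a, 2), round(p_b, 2), round(p_c, 2))
```

Because X is continuous, including or excluding an end point does not change the probability, which is why (b) can be computed as F(0.8) − F(0.5).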
Example 7
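A random variable X has the density function
f(x) = K/(1 + x^2) if −∞ < x < ∞
Determine K and the distribution function.
Solution
We know that ∫_{−∞}^{∞} f(x) dx = 1
ie. ∫_{−∞}^{∞} K/(1 + x^2) dx = 1
K [tan^{-1} x]_{−∞}^{∞} = 1
K [π/2 − (−π/2)] = 1
Kπ = 1, K = 1/π
The distribution function is
F(x) = ∫_{−∞}^{x} f(y) dy = (1/π) ∫_{−∞}^{x} dy/(1 + y^2)
= (1/π) [tan^{-1} y]_{−∞}^{x}
= (1/π) [tan^{-1} x − (−π/2)]
= (1/π) [tan^{-1} x + π/2]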
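As a rough check of Example 7, assume the density is K/(1 + x^2) on the whole real line, which is the form consistent with the tan^{-1} steps of the solution, so that K = 1/π and F(x) = (1/π)(tan^{-1} x + π/2). The sketch below confirms that F takes the expected values and that its slope matches the density; the step size h is an arbitrary choice.

```python
from math import atan, pi


def pdf(x):
    """Assumed density of Example 7 with K = 1/pi."""
    return 1 / (pi * (1 + x * x))


def cdf(x):
    """Distribution function F(x) = (arctan(x) + pi/2) / pi."""
    return (atan(x) + pi / 2) / pi


# Central difference approximation to F'(1), which should be close to f(1).
h = 1e-5
slope = (cdf(1 + h) - cdf(1 - h)) / (2 * h)

print(cdf(0.0), cdf(1.0), slope, pdf(1.0))
```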
Example 8
Evaluate the distribution function F(x) for the following density
function and calculate F(2)
0< ≤1
F(x) = (4 − ) 1 < ≤ 4
0 ℎ
Solution
By definition, F(x) = ∫ f (y)dy
= + 4 −
0 −∞ < < 0
⎧
⎪ 0< ≤1
⎨ + 4 − 1< ≤4
⎪
⎩ 1 ≥4
Hence, F(2) = + 4 −
at x = 2
Example 9
The probability mass function of a.r.v.X is given as follows.
x 0 1 2 3 4 5
p(x) k2 ′ 5 2k2 k
4 2
4 2
x 0 1 2 3 4 5 Total
= for x = 0
= for x = 2
= for x = 3
= 0 elsewhere
Solution
By definition, the distribution function F(x) is,
F(x) = P(X ≤ x) = ∑ f(x)
F(-1) = ∑ f(x) = 0 + =
F(0) = ∑ f(x) = 0 + + =
F(2) = + + =
F(3) = + + + =1
F(x) = 1 for x ≥ 3
= for x = - l = ,-l≤x<0
= for x = 0 = ,0≤x<2
= for x = 2 = ,2≤x<3
= 1 for x ≥ 3 = 1, 3 ≤ x < ∞
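[Figure: Graph of the distribution function F(x), a step function. Vertical axis marked in eighths from 1/8 to 1; horizontal axis X marked −∞, −1, 0, 1, 2, 3, ∞]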
ie., 2a^3 = 1
a^3 = 1/2, a = (1/2)^{1/3} = 0.79
(ii) P(X > b) = ∫_{b}^{1} 3x^2 dx = 1 − b^3
1 − b^3 = 0.05, b^3 = 0.95, b = 0.98
Example 14
Verify that the following is a distribution function:
F(x) = 0 ; x < −a
= (1/2)(x/a + 1) ; −a ≤ x ≤ a
= 1 ; x > a
Solution
Obviously the properties (i), (ii), (iii) and (iv) are satisfied. Also we
observe that F(x) is continuous at x = a and x = −a as well.
Now, d/dx F(x) = 1/(2a), −a ≤ x ≤ a
= 0, otherwise
= f(x) (say)
In order that F(x) is a distribution function, f(x) must be a p.d.f. Thus
we have to show that
∫_{−∞}^{∞} f(x) dx = 1
∫_{−∞}^{∞} f(x) dx = ∫_{−a}^{a} f(x) dx = (1/(2a)) ∫_{−a}^{a} dx = 1
Hence F(x) is a d.f.
Example 15
Let the distribution function of X be
F(x) = 0 if x < −1
= (x + 2)/4 if −1 ≤ x < 1
= 1 if x ≥ 1
Find P(X = 1)
Solution
We know that P(X = a) = F(a + 0) − F(a − 0)
P(X = 1) = F(1 + 0) − F(1 − 0)
= 1 − 3/4
= 1/4
EXERCISES
Multiple Choice questions
1. The outcomes of tossing a coin three times are a variable of the type
a. Continuous r.v. b. Discrete r.v.
c. Neither discrete nor continuous
d. Discrete as well as continuous
2. The weight of persons in a country is a r.v. of the type
a. discrete b. continuous
c. neither a nor b d. both a and b
3. Let X be a continuous r.v. with pdf
f(x) = kx, 0 ≤ x ≤ 1 ;
= k, 1 ≤ x ≤ 2 ;
= 0 otherwise
The value of k is equal to
a. 1/4 b. 2/3
c. 2/5 d. 3/4
27. Examine whether the following can be a p.d.f. If so find k and
P(2 ≤ X ≤ 3)
f(x) = k(x/3 + 1/2), 2 ≤ x ≤ 4 and 0 elsewhere.
28. Obtain the distribution function F(x) for the following p.d.f.
f(x) = x/3, 0 < x ≤ 1
= (5/27)(4 − x), 1 < x ≤ 4
= 0, otherwise
29. For the p.d.f. f(x) = 3a x^2, 0 ≤ x ≤ a find a and
P(X ≤ 1/2 / 1/3 ≤ X ≤ 2/3)
30. Examine whether
f(x) = 0, x < 2 or x > 4
= x/9 + 1/6, 2 ≤ x ≤ 4
is a p.d.f. If so, calculate P(2 < X < 3)
31. If f(x) = (3/4)(1 − x^2) if −1 ≤ x ≤ 1
= 0, otherwise
Show that F(x) = 0, x ≤ −1
= 1/2 + (3/4)x − (1/4)x^3, −1 < x ≤ 1
= 1, x > 1
CHANGE OF VARIABLE
Change of variable technique is a method of finding the distribution of a
function of a random variable. In many probability problems, the form of
the density function or the mass function may be complex so as to make
computation difficult. This technique will provide a compact description of
a distribution and it will be relatively easy to compute mean, variance etc.
We will illustrate this technique by means of examples separately for
discrete and continuous cases. Here we mention only the univariate case.
1. Suppose that the random variable X takes on three values −1, 0 and
1 with probabilities 11/32, 16/32 and 5/32 respectively. Let us transform
the random variable X, taking Y = 2X + 1.
The random variable Y then takes on the values −1, 1 and 3 respectively,
where
P(Y = −1) = P(2X + 1 = −1) = P(X = −1) = 11/32
P(Y = 1) = P(2X + 1 = 1) = P(X = 0) = 16/32
P(Y = 3) = P(2X + 1 = 3) = P(X = 1) = 5/32
Thus the probability distribution of Y is
Y       −1      1       3      Total
p(y)    11/32   16/32   5/32   1
2. In the above example, if we transform the r.v. X as Y = X^2, the
possible values of Y are 0 and 1.
Therefore, P(Y = 0) = P(X = 0) = 16/32
P(Y = 1) = P(X = −1 or X = 1)
= P(X = −1) + P(X = 1)
= 11/32 + 5/32 = 16/32
Y       0       1       Total
p(y)    16/32   16/32   1
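For a discrete r.v. the change of variable amounts to adding up the probabilities of all values of X that map to the same value of Y. The sketch below reproduces the two transformations Y = 2X + 1 and Y = X^2 discussed above; the helper name transform_pmf is an illustrative assumption.

```python
from collections import defaultdict
from fractions import Fraction


def transform_pmf(pmf, g):
    """Return the pmf of Y = g(X) by pooling the probabilities of equal g(x)."""
    result = defaultdict(Fraction)  # missing keys start at probability 0
    for x, p in pmf.items():
        result[g(x)] += p
    return dict(result)


pmf_x = {-1: Fraction(11, 32), 0: Fraction(16, 32), 1: Fraction(5, 32)}

pmf_linear = transform_pmf(pmf_x, lambda x: 2 * x + 1)  # one-to-one map
pmf_square = transform_pmf(pmf_x, lambda x: x * x)      # -1 and 1 both map to 1

print(pmf_linear)
print(pmf_square)
```

When the map is one-to-one, as with Y = 2X + 1, the probabilities are simply carried over; when it is not, as with Y = X^2, the probabilities of X = −1 and X = 1 are added.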
Let g(x) = 0 ; 0 < x ≤ 1
= 1 ; 1 < x ≤ 3/2
= 2 ; x > 3/2
We can find the probability mass function of g(x) as follows.
P{g(x)= 0} = P(0 < x ≤ 1) = ∫ f(x) dx
= ∫ dx = +2
= +2 = +2 = × = =
=
/
P{g(x) = 1} = P(1 < x ≤ 3/2) = ∫ f(x)dx
/
/
= ∫ dx = +2
= +3 - +2 =
P{g(x) = 2} = P{X > 3/2} = ∫ f(x)dx
2 2
1
= ∫ dx =
6 2
+2 3
2
= (2 + 4 ) − +3 =
= if 0 < <1
ie., G(y) = ( − 1) if 0 < y < 4
= 0; otherwise
The density function of y is given by
G(y) = G’(y) = ( − 1), 1 < <4
= 0, otherwise
Remark
From the above examples, we can observe that we can
determine the probability distribution of Y from the probability
distribution of X directly.
Let X be a r.v. defined on the sample space S. Let Y = g(X) be a
single valued and continuous transformation of X. Then g(x) is also
Example 1
Let X be a continuous r.v. with pdf f(x). Let Y = X^2. Find the pdf
and the distribution function of Y.
Solution
Let X be a continuous r.v. with pdf f(x) and distribution
function F(x). Let Y = X^2. Let G(y) be the distribution function of Y
and g(y) its pdf.
Then, for y > 0, G(y) = P(Y ≤ y)
= P(X^2 ≤ y) = P{|X| ≤ √y}
= P{−√y ≤ X ≤ √y}
= F(√y) − F(−√y) ....(1)
Now g(y) = d/dy G(y)
= (1/(2√y)) F'(√y) + (1/(2√y)) F'(−√y)
= (1/(2√y)) [F'(√y) + F'(−√y)]
= (1/(2√y)) [f(√y) + f(−√y)] .....(2)
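The result for Y = X^2, namely g(y) = [f(√y) + f(−√y)] / (2√y) for y > 0, can be written as a small function that turns any density f into the density of the square. The sketch below is illustrative only; as a test case it assumes X is uniform on (−1, 1), a distribution chosen here for convenience and not taken from the text.

```python
from math import sqrt


def density_of_square(f):
    """Given the pdf f of X, return the pdf g of Y = X^2 (zero for y <= 0)."""
    def g(y):
        if y <= 0:
            return 0.0
        r = sqrt(y)
        return (f(r) + f(-r)) / (2 * r)
    return g


def uniform_pdf(x):
    """Assumed example: X uniform on (-1, 1), so f(x) = 1/2 there."""
    return 0.5 if -1 < x < 1 else 0.0


g = density_of_square(uniform_pdf)

# For this choice of f, g(y) = 1 / (2 * sqrt(y)) on 0 < y < 1.
print(g(0.25), g(2.0))
```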
Note 1
If, however, the random variable X is of the discrete type, the
distribution function of Y is given by
G(y) = 0 if y < 0
= F(√y) − F(−√y) + P(X = −√y) if y ≥ 0 ..... (3)
If the point −√y is not a jump point of the r.v. X, then
P(X = −√y) = 0 and the above result becomes identical with
(1) given above.
Note 2
Let x1, x2, ... be the jump points of the r.v. X and y1, y2, ... be the
jump points corresponding to the r.v. Y according to the relation yi =
xi^2.
Then, for yi > 0, P(Y = yi) = P(X^2 = yi) = P(X = √yi) + P(X = −√yi)
Example 2
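A r.v. X has density f(x) = K x^2 e^{−x^3}, x > 0. Determine K and the
density of Y = X^3.
Solution :
Given f(x) = K x^2 e^{−x^3}, x > 0.
We know that ∫_{0}^{∞} f(x) dx = 1
∫_{0}^{∞} K x^2 e^{−x^3} dx = 1
Put x^3 = t, so that 3x^2 dx = dt
(K/3) ∫_{0}^{∞} e^{−t} dt = 1
(K/3) [−e^{−t}]_{0}^{∞} = 1
−(K/3)(0 − 1) = 1
∴ K = 3
∴ f(x) = 3x^2 e^{−x^3}, x > 0.
Since y = x^3, dy/dx = 3x^2, and the density of Y is
g(y) = f(x) |dx/dy| = 3x^2 e^{−x^3} . 1/(3x^2)
= e^{−x^3}
= e^{−y}, y > 0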
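A numerical cross-check of Example 2, assuming the density is K x^2 e^{−x^3} for x > 0 (the form consistent with the substitution used in the solution). With K = 3 the total probability should be 1, and since Y = X^3 is increasing, P(X ≤ 1) should equal P(Y ≤ 1) = 1 − e^{−1} under the density e^{−y}. The helper integrate and the upper limit 5 are illustrative choices; the tail beyond 5 is negligible.

```python
from math import exp


def f(x):
    """Assumed density of X in Example 2 with K = 3."""
    return 3 * x ** 2 * exp(-x ** 3) if x > 0 else 0.0


def integrate(func, a, b, steps=20_000):
    """Approximate the integral of func over [a, b] by the midpoint rule."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))


total = integrate(f, 0, 5)   # should be close to 1
p_x = integrate(f, 0, 1)     # P(X <= 1), the same event as Y <= 1
p_y = 1 - exp(-1)            # P(Y <= 1) from the density e^{-y}

print(total, p_x, p_y)
```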
Example 3
a. If X has a uniform distribution in [0, 1] with p.d.f.
f(x) = 1, 0 ≤ x ≤ 1
= 0, otherwise
Find the distribution (p.d.f.) of −2 log X.
b. If X has a standard Cauchy distribution with p.d.f.
f(x) = (1/π) . 1/(1 + x^2), −∞ < x < ∞
Find a p.d.f. for X^2.
Solution
a. Let Y = −2 log X. Then the distribution function G of Y
is
G(y) = P(Y ≤ y) = P(−2 log X ≤ y)
= P(log X ≥ −y/2) = P(X ≥ e^{−y/2})
= 1 − P(X ≤ e^{−y/2})
= 1 − ∫_{0}^{e^{−y/2}} f(x) dx = 1 − ∫_{0}^{e^{−y/2}} 1 . dx = 1 − e^{−y/2}
∴ g(y) = d/dy G(y) = (1/2) e^{−y/2}, 0 < y < ∞ ....(i)
Note
(i) is the p.d.f. of a chi-square distribution with 2 degrees of
freedom.
b. Let Y = X^2
G(y) = P(Y ≤ y) = P(X^2 ≤ y) = P(−√y ≤ X ≤ +√y)
= ∫_{−√y}^{√y} f(x) dx = 2 ∫_{0}^{√y} f(x) dx
∴ g(y) = d/dy G(y) = 2 f(√y) . 1/(2√y)
= (1/π) . 1/(√y (1 + y)) ; 0 < y < ∞
[which is a beta distribution of second kind]
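Part (a) of Example 3 can be illustrated by simulation: draw uniform values, apply Y = −2 log X, and compare the proportion of values not exceeding some y with G(y) = 1 − e^{−y/2}. This is only a sketch; the seed, the sample size and the point y = 2 are arbitrary choices, and 1 − random.random() is used so that the argument of the logarithm is never zero.

```python
import random
from math import exp, log

random.seed(0)  # fixed seed so that the run is repeatable
n = 100_000

# 1 - random.random() lies in (0, 1], which avoids log(0).
samples = [-2 * log(1 - random.random()) for _ in range(n)]

y = 2.0
empirical = sum(s <= y for s in samples) / n  # observed proportion with Y <= 2
theoretical = 1 - exp(-y / 2)                 # G(2) = 1 - e^{-1}

print(empirical, theoretical)
```

With a sample of this size the two values should agree to about two decimal places.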
c. F ( y ) F ( y ) d. All the above.
Very short answer questions:
4. Define monotone increasing function.
5. Define monotone decreasing function.
6. What do you mean by change of variable technique?
Short essay questions:
7. Let X have the density f(x) = 1, 0 < x < 1. Find the p.d.f. of Y = e^X.
8. If X has uniform distribution in (0, 1) with p.d.f. f(x) = 1, 0 < x < 1, find the
p.d.f. of Y = 2 log X.
9. Let X be a r.v. with probability distribution
x    : −1      0     1
p(x) : 11/32   1/2   5/32
15. X has the p.d.f. f(x) = kx^3 / (1 + 2x)^6, x > 0
= 0, elsewhere.
Determine k and also the density function of Y = 2X / (1 + 2X)
16. Given f(x) = x^2/9, 0 < x < 3
= 0, otherwise
Find the density of Y = X^3